Smartphone interventions offer unprecedented opportunities to support access to various forms of health care almost anywhere in the world, especially where the rate of device ownership is high but the availability (1) or uptake (2) of face-to-face services is low. However, many users of smartphone health interventions delete them within weeks of download (3). Poor user experience (UX) is a contributing factor, and a consequent threat to ubiquitous smartphone-supported care. Indeed, negative UX leads to poor adoption and variable use of technology-based interventions (4–6). Analysis of UX can provide useful data for enhancing the design and implementation of smartphone interventions, as well as improving target outcomes through increased adherence, fidelity, and reach. Addressing poor UX can increase user satisfaction (7), while tailoring design to users' needs can improve treatment adherence and patient empowerment (8).
Mental disorders are a leading cause of disability, yet uptake of care services remains low. Interventions aiming to address barriers to reducing mental health-related morbidity, such as stigma, poor mental health literacy (2), and fragmented health services (9), have a UX dimension. For example, the selection of language and imagery, as well as the direct use of attitude-changing interventions (such as “myth-busting” (10)), are not only relevant to users' experience of stigma, but must also be designed through user-focused methods to be effective (11).
Smartphone interventions have the potential to assist in overcoming these barriers by providing confidential care, tailored toward users' literacy, needs, and availability (12). Design thinking (DT) (13) can help realize these advantages, yet remains underutilized in mental health research (14). DT involves building solutions based on empathy toward users' nuanced needs, multidisciplinary ideation, and experimentation using prototyping and iterative development (14). UX testing is based on the principles of DT, and aims to collect and utilize data reflecting user values and behaviors, and to direct design to satisfy users' purposes (15). UX research requires balancing the demands of the scientific method with the subjectivity of user feedback. While this can be challenging, it allows the development of bespoke products that more effectively meet the needs of users (16).
UX testing remains underinvestigated in the digital mental health literature. Some studies have recognized the importance of optimizing the UX of digital interventions, but few provide guidance on how best to do this. For example, in a review paper, Bakker et al. (17) recommended using reminder functions, simple interface designs, and links to crisis support services to improve user engagement. However, methods available for collecting and using UX data in design to realize these recommendations were not explored. Torous et al. (18) reviewed eleven studies on smartphone apps for schizophrenia, and found some form of UX testing in most, but this commonly involved obtaining only general, unstructured feedback from users throughout the design process. Additionally, Feather et al. (19) conducted a systematic review of 21 studies on web-based interventions for mental health conditions, such as depression. They found UX data were most commonly collected to improve understanding of barriers to use. Some studies used qualitative methods, such as interviewing users during field trials of an intervention. Others applied questionnaires to examine UX through constructs including user satisfaction and acceptability. However, most measures were not assessed for rigor, and little detail was provided on how the data were applied. Nicholas et al. (20) investigated how users formulate UX assessments, examining trends in public user reviews of smartphone apps for bipolar disorder. Users focused on functionality, perceived effects on relationships with clinicians, and how easily apps could be integrated into existing care plans. However, potential applications for these data were not examined in detail.
Limited understanding of user characteristics, user-centric design practices, and field utility assessments remain barriers to realizing the full potential of smartphones to improve mental health outcomes (21). UX-based design methods, the reporting of UX data, and the development of standards for UX testing have been recognized as key issues (22). However, a synthesis of potential approaches to help guide researchers, clinicians, and patients is lacking. The objective of this review was to respond by providing a critical overview of UX methods used by researchers and practitioners seeking to improve the design and implementation of mental health smartphone interventions, and to use these findings to identify opportunities for future UX evaluations directed toward this goal.
Methods
Within this paper, “user experience” (UX) refers to the dynamic combination of subjective and contextual factors that shape users' engagement with products or systems (23;24). Approaches to examining UX vary, ranging from quantitative system performance data, such as time taken to complete tasks, to user opinions and construct-based measures, including user-satisfaction questionnaires (13). A principled approach to the measurement of UX recognizes it as a multidimensional phenomenon that can be characterized using multiple, overlapping constructs from a wide range of theoretical backgrounds. In this context, a construct is a coherent, well-operationalized account of some aspect of subjective experience. Constructs relevant to UX include satisfaction, acceptability, feasibility, utility, likeability, learnability, credibility, and usability (21;25;26).
Reflecting the rapid pace of research on mental health smartphone interventions, the heterogeneity of study types, and a still-emerging focus on UX testing in relation to smartphone apps, we conducted a narrative review based on the research question: “In the research literature, what methods have been applied in practice to integrate user experience data into the design and implementation of mental health smartphone interventions?” We adopted a sample-based search strategy. We did not critically review every paper published in this area, but instead scanned available literature to identify key examples of major types of UX approaches within mental health. We focused on studies investigating the role of UX testing in mental health research, clinical work, and other settings involving mental health smartphone interventions.
PubMed, PsycINFO, and Scopus were searched from their inception to February 2019 with the following search terms: cell phone, smartphone, telemedicine, psychiatry, mental disorders, user experience, user perceptions, user testing, analytics, evaluation, and usability. We selected representative papers in English for each category from the last 6 years if they examined a mental health smartphone intervention, included an assessment of data reflecting the users' perception of the intervention, and drew conclusions specifically about UX. As UX testing is still emerging in the literature on mental health smartphone interventions, the quality of research was considered and discussed, but not used to exclude studies.
Currently accessible approaches are diverse, both in terms of the processes involved and the underlying ontological frameworks. Consequently, there is no universally agreed organizing principle. We therefore divided findings from our search into (a) studies using a situated approach to define bespoke UX measurement and evaluation and (b) studies using methods based on normative UX constructs, such as usability. We discuss how these approaches can be applied in design and implementation processes, as well as guide clinical use of mental health smartphone interventions. Table 1 summarizes our search findings.
Table 1. Summary of approaches to user experience testing of mental health smartphone interventions and potential applications in design and implementation

Results
Situated Approaches
Situated approaches were characterized by the use of investigator-specified UX methods or metrics closely tied to the specific contexts where evaluation occurred. Quantitative methods within this group included bespoke measures of product adherence, for example, task completion rates used for subgroup analyses of study outcomes. Qualitative techniques included collating unstructured user opinions about functional and emotional aspects of the design and implementation of specific products. Grounded qualitative approaches that did not draw on formal UX constructs also fell into this group.
Simple bespoke UX-focused feedback offers a pragmatic and potentially efficient way to summarize perceptions of an existing app design. Mackintosh et al. (27) examined the UX of a mobile app for the treatment of emotional dysregulation among veterans. The app had functions for practicing emotional regulation skills, monitoring symptoms, and recording physiological data, and was used in between face-to-face anger management therapy sessions. A bespoke technology feedback questionnaire assessed “ease of use,” “frustration,” and “helpfulness” on a six-point scale, with data summarized as score averages. Treatment engagement was also measured according to the dropout rate (attendance of fewer than twelve therapy sessions), session attendance, and treatment completion. According to questionnaire data, users' perceptions of the app's ease and helpfulness increased after 3 months of consistent use, while levels of frustration decreased. Treatment engagement did not differ for those using the app alongside face-to-face therapy. On the basis of these overall positive scores, the app was considered helpful, easy to use, and efficient. Summary statistics such as these have particular value as repeated measures and may provide a basis for iterative UX improvement, as the sketch below illustrates. However, single-dimension measures offer little diagnostic potential: had users indicated poor subjective UX along the dimensions being measured, this approach would offer limited insight as to why.
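As a minimal illustration of this kind of repeated-measures summary, the following sketch computes per-dimension mean scores at each timepoint. The column names, scale, and data are hypothetical rather than drawn from Mackintosh et al. (27).

```python
# Minimal sketch: summarizing bespoke six-point questionnaire data as
# repeated measures. Column names and data are hypothetical.
import pandas as pd

responses = pd.DataFrame({
    "user_id":     [1, 1, 2, 2, 3, 3],
    "timepoint":   ["baseline", "3_months"] * 3,
    "ease_of_use": [3, 5, 4, 5, 2, 4],   # rated 1-6
    "frustration": [4, 2, 3, 2, 5, 3],   # rated 1-6
    "helpfulness": [3, 5, 3, 4, 2, 5],   # rated 1-6
})

# Mean score per dimension at each timepoint; repeating this across
# design iterations gives a simple basis for tracking UX change.
summary = responses.groupby("timepoint")[
    ["ease_of_use", "frustration", "helpfulness"]
].mean()
print(summary.round(2))
```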
Potentially addressing this limitation, grounded qualitative methods allow intervention designers to understand the detail of UX issues that users consider important. Orlowski et al. (28) used thematic analysis of interviews with youth mental health workers about smartphone-based care. Participants were asked about current use of technology at work, barriers to increased use, and how technology could affect their professional role. Responses revealed that youth mental health workers felt technology-based care was not “standard” practice and could challenge their ability to make interpersonal connections with patients. Concerns were raised about the amount of specific training needed to ensure staff could access the maximum benefits of such technology. These results highlight how UX insights can reveal areas for improvement beyond the technical intervention itself, such as ensuring that technology-led change is integrated into clinical workflows. Practical design strategies that respond to these issues might include training and workflow process engineering, in addition to changes to the app design.
A situated alternative to user feedback is to use observable data, such as automatically collected user analytics. Attwood et al. (29) evaluated the UX of an app designed to help monitor and reduce alcohol consumption. Download and deletion rates were used as indicators of usage patterns. About 42% of users deleted the app within 1 week, and by the end of the 12-week program only 5% of the initial users remained. The use of particular app functions was also examined. For instance, female users, those aged 35–44, users exhibiting “high-risk” drinking behaviors (more than twice the upper limit of recommended intake on four or more occasions within 1 week), and those with specific goals of reducing alcohol intake tended to use goal-setting functions more frequently than others. An advantage of quantitative UX data is that they can be explicitly linked to outcomes through modeling or subgroup analysis: multiple linear regression identified gender, age, season of download, and baseline drinking as together explaining more than a quarter of the variance in alcohol consumption after 1 month of app use.
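The following sketch illustrates this style of analysis: a multiple linear regression of a clinical outcome on usage analytics and demographics, reporting the proportion of variance explained. The variable names and synthetic data are assumptions for illustration, not the variables or model used by Attwood et al. (29).

```python
# Minimal sketch: linking automatically collected usage analytics to a
# clinical outcome via multiple linear regression. All variables and
# data below are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age_band":        rng.choice(["18-34", "35-44", "45+"], size=n),
    "gender":          rng.choice(["female", "male"], size=n),
    "baseline_drinks": rng.poisson(20, size=n),
    "goal_set_uses":   rng.poisson(5, size=n),  # goal-setting function usage
})
# Synthetic outcome: weekly alcohol units after 1 month of app use.
df["units_month1"] = (0.7 * df["baseline_drinks"]
                      - 0.5 * df["goal_set_uses"]
                      + rng.normal(0, 4, size=n))

model = smf.ols(
    "units_month1 ~ baseline_drinks + goal_set_uses + C(age_band) + C(gender)",
    data=df,
).fit()
print(model.rsquared)  # proportion of outcome variance explained
print(model.summary())
```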
Approaches Based on UX Constructs
In contrast to situated measures, constructs can provide a normative conceptual framework for investigating and interpreting the UX. Multiple constructs can be applied, each examining a different UX dimension. Universal construct definitions and validated methods add precision to UX assessment and allow comparison of apps built for the same purpose. Usability is a popular construct for measuring the UX because of its commonly cited universal definition and specific assessment methodologies. Usability is defined by the International Organization for Standardization (ISO) as “the extent to which a system, product or service can be used by specified users to achieve a specified goal with effectiveness, efficiency and satisfaction in a specified context of use” (30). Systems with high usability are associated with increased occupational value, while low usability is linked to user frustration and workflow disruption (31;32). For other constructs, definitions are not always agreed upon, and evaluation methods are not always robust.
Similar to situated approaches, construct-based approaches to UX evaluation can be based on instrument-based measurement, qualitative inquiry, or both. In mixed approaches, the quantitative method provides normative data to benchmark and compare UX along the selected dimension, while the qualitative method fills in contextual details to deepen understanding of UX problems and inform product refinement. For instance, Vilardaga et al. (33) had patients with a mental illness evaluate the usability of a popular smoking cessation app. Usability was measured with the System Usability Scale (SUS), a 10-item questionnaire that gives a total score out of 100 to represent overall usability. The SUS has been validated, found to be reliable (α = .91), and there are agreed schedules for categorizing scores (34). A separate structured interview explored specific issues drawn from the underlying construct of usability, relating to navigation, motivation, utility of features, barriers to use, potential situations where the app may be pleasant or engaging, and opportunities for giving feedback about design and implementation. The app scored 65.6/100 on the SUS. Interview data mapped to this moderate–poor rating included content being difficult to understand for some users, and differences in perceptions of functions: some found smoking cessation reminders helpful, while others found them annoying. However, there was no subsequent refinement of the app, nor follow-up measurement of the UX.
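The SUS scoring procedure itself is standardized and simple to compute: each of the ten items is rated 1–5, odd-numbered (positively worded) items contribute (response − 1), even-numbered (negatively worded) items contribute (5 − response), and the sum is multiplied by 2.5 to give a 0–100 score. The sketch below implements this; the example response set is hypothetical.

```python
# Minimal sketch of the standard SUS scoring procedure.
def sus_score(responses):
    """responses: list of ten integers (1-5), in item order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses rated 1-5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1 (odd-numbered)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Hypothetical respondent whose answers yield a score of 65.0,
# comparable to the 65.6 reported by Vilardaga et al. (33).
print(sus_score([4, 2, 4, 3, 3, 2, 4, 3, 3, 2]))
```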
One of the potential benefits of construct-based approaches is comparison of normative findings between competing intervention variants or across studies. Dubad et al. (35) conducted a systematic review assessing the psychometric properties, usability, and clinical effects of mood-monitoring smartphone apps for young people aged 10–24 years. They used an earlier, broader ISO definition of usability as “the capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions” (36). However, studies were compared mainly according to rates of participation and app use, and available data reflecting users' perceptions of apps, such as whether they found them easy to use. Based on these data, Dubad et al. (35) concluded that users have overall positive perceptions of mental health apps and are willing to use them in real life. Although the usability construct provided a framework for study comparisons, data from usability metrics themselves were not compared. This may reflect the still limited use of standardized tools across the literature, yet such comparison remains a major potential advantage of the usability construct and associated measures, such as the SUS (37).
Construct-based measures can also be combined to form bespoke UX measures. Ben-Zeev et al. (38) drew on several constructs and validated standard instruments to analyze the UX of a smartphone app for schizophrenia that collected background behavioral data, such as location and physical movement, to map behavioral tendencies. A 26-item questionnaire was used to measure acceptability and usability, with items drawn from construct-based questionnaires including the SUS (39), the Post-Study System Usability Questionnaire (40), the Technology Acceptance Model measurement scales (41), and the Usefulness, Satisfaction, and Ease of Use questionnaire (42). Data indicated users were comfortable using the app (95%), understood its functions (70%), did not find it difficult to use (70%), and were interested in functions providing feedback about their behavior (65%) as well as functions suggesting coping mechanisms in times of distress (65%). However, there was no measure of change in acceptability or usability over time, only a one-off finding that the app was considered highly acceptable and usable.
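Results like these are typically reported as item-level endorsement percentages. The sketch below shows one way such figures can be derived from raw Likert responses; the items, data, and endorsement threshold (ratings of 4 or 5) are hypothetical rather than taken from Ben-Zeev et al. (38).

```python
# Minimal sketch: item-level endorsement percentages from a bespoke
# Likert questionnaire. Items and data are hypothetical; endorsement is
# taken here as a rating of 4 ("agree") or 5 ("strongly agree").
import pandas as pd

ratings = pd.DataFrame({
    "comfortable_using_app":  [5, 4, 5, 4, 2],
    "understood_functions":   [4, 3, 5, 4, 2],
    "found_difficult_to_use": [2, 1, 2, 3, 4],
})

endorsement = (ratings >= 4).mean() * 100  # percent endorsing each item
print(endorsement.round(0))
```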
Not all UX constructs have established definitions or conceptual and operational frameworks, which can limit the interpretation and utility of UX data. For instance, Povey et al. (43) conducted a qualitative study evaluating the acceptability of a cognitive behavioral therapy app and a suicide prevention app for Aboriginal and Torres Strait Islander Australians in controlled settings. Povey et al. (43) reported acceptability was influenced by personal factors (e.g., aspects of users' illness), environmental factors (e.g., stigma toward mental health conditions), and app functionality (e.g., “ease of use”). However, acceptability was not defined, and the qualitative methodology used was not specific to the construct. An established definition is needed to benchmark the UX of apps, and could have helped determine whether the apps' UX met a standard at which clinicians could recommend their use. Construct-specific, validated methods of analysis within a defined conceptual framework could also have enabled practical design and implementation recommendations. For instance, had Povey et al. (43) demonstrated that environmental factors are stronger predictors of app acceptability than personal factors, more focus could be given to implementation rather than design in communities where mental illness is strongly stigmatized.
Discussion
Mental health smartphone interventions are increasingly used in clinical practice. One survey estimated that there were almost 250 apps for depression alone (44). Despite the potential value of UX data across the process of intervention design, evaluation, and commissioning, there is no gold standard for assessing the UX of smartphone-based mental health interventions. In this narrative review, our objective was to summarize the major types of approaches that have been used in practice. Our review demonstrates how a range of methods, which we partitioned according to their underlying theoretical basis, have been used in the design and implementation of mental health smartphone interventions. Situated approaches provide qualitative and quantitative data to help understand the needs, behaviors, and patterns of use of a specific user group, as well as bespoke measures of particular aspects of the UX. Constructs can assist by structuring analysis of the UX around agreed definitions and robust methodologies, enabling measurement and comparison of data using validated instruments, as well as analysis of multiple aspects of the UX in parallel.
The utility of any UX data depends on the stage of development of an intervention, the interests of the user and designer, and the strength of the method being used. The design and implementation process for smartphone interventions is rarely linear and requires repeated adaptation to the changing needs of patients and clinicians. Both situated and construct-based UX measures can be helpful at various stages. Our findings highlight how, in the early stages of design, situated methods can assist in determining the real needs of users and help anticipate responses to potential barriers to use. During implementation, situated approaches also enable identification of UX issues that arise when interventions are deployed into real-world use, as well as understanding of how UX needs differ according to the type of user (45). As interventions become more established, construct-based approaches using established definitions and validated measurement techniques can help track how the UX changes over time or with new iterations of an intervention. These findings also emphasize a trade-off: dogmatic adherence to theoretical constructs may miss actionable UX issues that grounded methods would identify, while analysis driven solely by situated methods may miss deeper insights available through broader theoretical frameworks.
Recent work (46) reveals several UX methods being used in eHealth, such as eye-tracking (47) and card sorting (48), which did not appear in the studies we reviewed. Notably, experimental paradigms such as split testing (49) were lacking. There was also an absence of organized use of higher-level framework methodologies, including DT, which are being used to enhance the design of smartphone interventions in other areas, such as heart failure (50). Importantly, however, this study was not an exhaustive systematic review; it aimed to identify and analyze UX methods currently in use in mental health. Additionally, studies may not always report on preliminary UX testing of interventions. Hence, not all UX methods may have been detected.
Our review also highlights how, without clear definitions and validated measurement, construct-based UX measurement can be significantly limited. For example, Povey et al. (43) did not specifically define the constructs used, making it difficult to draw useful conclusions about the UX or understand how these data could be applied to improve the apps reviewed. From the point of view of comparison, this means that two studies may both proclaim to assess the acceptability of a smartphone app while assessing quite different aspects of the UX. Further, where several constructs are used at once, each must be defined and measured separately if clear conclusions are to be drawn. This can allow for the development of bespoke combinations of constructs and validated measures (38). However, subtle differences in meaning between constructs can be lost when constructs are bundled without rationale or validated measurement. For instance, some studies use satisfaction as a measure of acceptability (51), while other literature recognizes acceptability as a distinct measure of the extent of behavior change resulting from an intervention, which can differ from levels of satisfaction toward the same intervention (52). Future studies should investigate the development of a battery of constructs with established definitions and validated tools for measuring the UX of smartphone interventions, to increase their utility across the literature and in practice, particularly in mental health.
The findings of our review can help in navigating emerging evidence-based tools for evaluating the UX of smartphone interventions in clinical practice. Commonly used tools, such as star rating systems and user comments, provide limited UX insights and can be difficult to generalize beyond one individual's experience. One option is to deploy either situated or construct-based methods to evaluate each candidate product in turn, but this is often not feasible on skills or resource grounds. A promising recent development has been the creation of simplified assessment methods that provide heuristic or streamlined approaches to detect UX issues, without necessarily requiring in-depth evaluation of each product. The pros and cons of these tools are summarized in Table 2. Many use a combination of situated and construct-based measures. For instance, the American Psychiatric Association (APA) provides a publicly available online 5-step “App Evaluation Model” to help clinicians and patients decide on adopting smartphone apps (53). The model includes a step on evaluating the UX, specifically focusing on “ease of use.” Rather than recommending specific construct-based measures, the model suggests UX issues for consideration, such as whether the app can be customized or is easy to use on a long-term basis. Alternatively, stakeholders can rely on UX judgments made by clearing houses. PsyberGuide, a nonprofit website that publishes evaluations of mental health apps, includes an assessment of UX using the Mobile App Rating Scale (MARS), a 23-item questionnaire assessing UX constructs including engagement, functionality, and aesthetics, as well as quality of information, subjective quality, and perceived impact (54). MARS gives an overall score to reflect “app quality,” and has been shown to have high levels of inter-rater reliability and internal consistency (54). Given the emergence of such evidence-based tools, clinicians and researchers should be skilled in navigating the merits and challenges associated with the included situated and construct-based measures, and should ensure the rationale behind their use is justified. Additionally, as in other areas of medicine where organizations recommend the use of certain tools, clinicians and patients should expect the evidence behind particular UX assessment approaches to be provided and peer reviewed.
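As a minimal sketch of how a MARS-style quality score can be computed, the example below assumes the published scoring convention: items rated 1–5, each subscale scored as the mean of its items, and overall app quality taken as the mean of the objective subscale scores. The item ratings are hypothetical; details should be verified against the instrument itself (54).

```python
# Minimal sketch of MARS-style scoring. Ratings below are hypothetical;
# the scoring convention (subscale means, then the mean of the objective
# subscales as overall app quality) is an assumption to verify against
# the published scale (54).
from statistics import mean

subscale_ratings = {
    "engagement":    [4, 3, 4, 3, 4],
    "functionality": [5, 4, 4, 5],
    "aesthetics":    [3, 4, 3],
    "information":   [4, 4, 3, 4, 3, 4, 4],
}

subscale_scores = {name: mean(items) for name, items in subscale_ratings.items()}
app_quality = mean(subscale_scores.values())
print(subscale_scores, round(app_quality, 2))
```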
Table 2. Summary of the pros and cons of structured clinical tools using situated and construct-based UX assessment methods

Our review highlighted that no studies examined the potential relationships between clinical outcomes and the UX, though this remains an important area for future research. As Leigh notes, it is difficult to imagine how governments would increase support for app development and implementation based on evidence focusing explicitly on usability without clear links to improved patient-reported outcomes (55). However, some studies in our review, such as Attwood et al. (29), suggest that situated UX data can provide insight into behavioral aspects of illness, which, through subgroup analysis, can help improve understanding of target populations and the kinds of interventions most likely to provide benefit. For instance, future research involving subgroup analysis of UX data from smartphone apps tracking manic behaviors in people with bipolar disorder could help clarify triggers for relapse in certain groups and guide the timing of interventions for maximum benefit.
In addition, with the development of machine learning and novel human–machine interfaces, such as conversational agents and “Internet of Things” devices, there is potential for even greater intervention complexity in digital mental health. Future interventions will have broader scope for personalization and adaptive intervention design, with clinicians and designers potentially able to modify the format and type of care provided according to patients' needs and response to therapy, as in Tess, the psychological artificial intelligence chatbot built by X2AI Inc. (56). UX assessment is already being conducted for these interventions (56), and could become particularly helpful in identifying which specific aspects of interventions facilitate the best clinical outcomes. Future studies could use combinations of situated and construct-based measures to explore the potential effects of particular design and implementation modifications on clinical outcomes, such as whether sleep monitoring affects the timing of access to care in patients becoming unwell.
Conclusions
The spectrum of methods examined in this review highlights how informed trade-offs can guide approaches to UX assessment of mental health smartphone interventions with reference to resources, expertise, and service demands. Simple, bespoke measures may help detect easily fixable UX issues, while complex decisions, such as comparison of iterations or competitive selection of one intervention over another, require robust construct-based methods. Moving forward, more sophisticated UX assessment tools should be used for mental health smartphone interventions, and the relationship between UX data and clinical outcomes further explored.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflicts of interest
The opinions expressed in this paper are those of the authors and do not necessarily represent the decisions, policies, or views of the WHO.