There are many uses for dimensional scales assessing non-specific (or 'general') psychological distress. Such measures are frequently used in epidemiological and clinical research, as well as in health service settings. A psychological distress measure may be used as a proxy or screener for mental health status (e.g. Payton, 2009; Sunderland et al. 2011), to assess treatment effectiveness (e.g. Christensen et al. 2004; Duijts et al. 2011), as a risk factor for clinical outcomes (e.g. Overholser et al. 1997; Linton, 2000; Stansfeld et al. 2002) or as a screener to determine whether clinical assessment or intervention may be required (e.g. Carlson et al. 2010; Baksheev et al. 2011). Because extensive clinical interviews and disorder-specific measures impose a substantial response burden, brief and accurate psychological distress measures have become popular in research where mental health is a focus or a consideration. The importance of such measures has grown with the increasing emphasis on broad assessment of symptom severity (Brown & Barlow, 2005; Cuthbert, 2014), and a number of national health surveys use psychological distress measures to assess the severity of non-specific psychological symptoms (Hickman et al. 2014; Sunderland et al. 2014; Kaul et al. 2017). The present study aimed to compare the psychometric properties of multiple measures of psychological distress and to develop crosswalk tables that facilitate comparison across different measures.
Many instruments are available to measure psychological distress. Their development is varied, but most have their origins in conceptualisations of the symptoms of internalising mental disorders. This focus on specific symptoms of common mental disorders may be problematic, as features of less common manifestations of distress may be overlooked (Batterham et al. 2016). While there have been comparisons of selected psychological distress scales (e.g. McCabe et al. 1996; Furukawa et al. 2003; Ventevogel et al. 2007; Batterham et al. 2016), there is limited comparative information on which measures are optimal for assessing psychological distress in the community. Such evaluation may adopt a number of approaches, including: examining the precision of distress measures in identifying clinical caseness for multiple mental health problems (Furukawa et al. 2003; Ventevogel et al. 2007; Batterham et al. 2016); testing psychometric properties of the measures, such as unidimensionality and item fit (McCabe et al. 1996; Batterham et al. 2016); or examining the relative precision of measures across a latent dimension of distress (Batterham et al. 2016). Evaluation of scale performance also needs to account for the efficiency of the measure, as there may be trade-offs between the length (and respondent burden) of an instrument and its precision.
While comparisons of measures are important in helping researchers and clinicians to choose among them, measures are not always selected on the basis of precision and efficiency, so inconsistency in measure selection is likely to persist in research and clinical settings. Consequently, a common metric for distress measures would provide an important resource for researchers and clinicians. By equating scores across multiple scales using Item Response Theory (IRT)-based approaches to develop conversion tables, or crosswalks, transforming scores from one scale to another becomes relatively straightforward (Choi et al. 2014). Researchers may use this information to aggregate data more efficiently across studies that use different measures of psychological distress, or to facilitate more rapid development of individual patient meta-analyses (e.g. Bower et al. 2013; Culverhouse et al. 2013). Clinicians may use crosswalk tables as a resource to interpret scores on disparate distress scales. Such crosswalks have been developed for measures of depression (Choi et al. 2014; Wahl et al. 2014; Umegaki & Todo, 2017), anxiety (Schalet et al. 2014) and pain (Askew et al. 2013). However, to our knowledge, no study has developed crosswalks for measures of psychological distress.
The present study used a community-based sample of 3620 Australian adults who completed eight measures of psychological distress (each denoted with the number of items included): the Patient Health Questionnaire-4 (PHQ-4; Kroenke et al. 2009), Kessler-10 (K10; Kessler et al. 2002), Kessler-6 (K6, a subset of the K10; Furukawa et al. 2003), Distress Questionnaire-5 (DQ5; Batterham et al. 2016), Mental Health Inventory-5 (MHI-5; Berwick et al. 1991), Hopkins Symptom Checklist-25 (HSCL-25; Sandanger et al. 1998), Self-Report Questionnaire-20 (SRQ-20; Beusenberg & Orley, 1994) and the one-item Distress Thermometer (DT; Mitchell, 2007; Donovan et al. 2014). Measures were chosen on the basis of brevity and previous evidence of validity and reliability. The study aimed to compare these eight measures on precision in assessing clinical caseness across seven common internalising mental disorders, evidence for unidimensionality and precision across a latent dimension of distress. IRT was then used to calibrate a single item bank comprising all items from the eight scales, for the purpose of developing crosswalk tables for the different measures. As such, scores from each of the scales could be used to estimate the severity of psychological distress on the same continuum, regardless of the specific instrument administered.
Method
Participants and procedure
Participants were recruited using advertisements on the online social media platform Facebook between January and February 2016. Advertisements linked directly to the survey and targeted Australian adults aged 18 years or older. The survey was implemented online using Qualtrics survey software. Of the 5379 individuals who clicked on the survey link, 5220 (97%) consented to and commenced the survey, and 3620 (67%) completed it in full. There were no missing data, as responses were required for all questions except age and gender; participants were encouraged to discontinue if they were uncomfortable with the survey.
Potential participants were provided with a detailed information sheet outlining what the survey involved and listing mental health resources, including helplines and crisis team contact information. After presentation of the information sheet, informed consent was collected online. The survey included the measures of psychological distress, in addition to measures assessing major depressive disorder (MDD), generalised anxiety disorder (GAD), social anxiety disorder (SAD), panic disorder (PD), post-traumatic stress disorder (PTSD), obsessive compulsive disorder (OCD), adult attention deficit hyperactivity disorder (ADHD), substance use disorder (SUD), suicidality, psychosis, eating disorders, fatigue, sleep disturbance, help seeking, disclosure, demographics and personality. The survey took approximately 40–60 min to complete. No incentive was provided to participants. Ethical approval was obtained from the ANU Human Research Ethics Committee (protocol: #2015/717).
Measures
Psychological distress was assessed using each of the eight identified distress measures: PHQ-4, K10/K6, DQ5, MHI-5, HSCL-25, SRQ-20, and DT. Each of these distress measures is brief, easy to administer, freely available for research use, and has previously been shown to have robust measurement properties, including strong associations with presence of internalising disorders.
Clinical diagnosis was assessed using a DSM-5 symptom checklist, developed by the authors as a self-report assessment of clinical diagnosis based on DSM-5 criteria (Batterham et al. 2017a, b). The checklist queried respondents about the presence or absence of symptoms based directly on DSM-5 definitions for each disorder of interest. Eight checklist items were used to assess SAD; 21 for PD; 14 for GAD; 15 for MDD (including items to exclude hypomania); 22 for PTSD; 14 for OCD; and 21 for ADHD. Each item reflected a single DSM-5 criterion for the disorder of interest, although some criteria were probed across multiple questions and additional items were used to exclude alternative diagnoses. Example items included: 'In the past six months, did social situations nearly always make you feel frightened or anxious?', 'During the past six months, has your behaviour or difficulty in paying attention caused problems at home, work, school, or socially?' and 'In the past month, has there been a time when you unexpectedly felt intense fear or discomfort?' The checklist was designed along similar principles to the electronic version of the Mini International Neuropsychiatric Interview (MINI; Zbozinek et al. 2012) in terms of structure (binary and categorical self-report items with conditional skip logic) and response burden. However, the checklist used in the current study was developed independently of the MINI, is non-proprietary and is based on DSM-5 rather than DSM-IV criteria. As with the electronic MINI, the checklist has not yet been validated against DSM-5 clinician diagnosis or structured clinical interview, but it contains items reflecting each of the symptom criteria used in DSM-5 to generate a diagnosis. The full checklist has been published previously (Batterham et al. 2017a) and is available from the authors.
Demographic factors were reported to describe the sample, and were based on self-reported measures of age group, sex (male, female, other), educational attainment, employment status, location (metropolitan, regional, rural), and language spoken at home.
Analysis
The performance of each scale in identifying individuals meeting the checklist criteria for one or more DSM-5 mental disorders (MDD, GAD, PD, SAD, PTSD, OCD or ADHD) was compared by examining the area under the receiver operating characteristic curve (AUC). The AUC summarises the performance of each scale against the checklist criteria across all possible cut-points, providing a metric for comparing scales that does not require selection of a specific cut-point. The sensitivity and specificity of each scale in identifying clinical criteria, as measured by the checklist, were then compared at two cut-points: the cut-point prescribed by the authors of the scale and the optimal cut-point based on the Youden index (i.e. the highest combination of sensitivity and specificity), to ensure that no screeners were disadvantaged by performance in the current sample. Correlations between scale scores were estimated to examine whether the scales measured a similar construct. Internal consistency was assessed for each scale using Cronbach's alpha. The unidimensional structure of each measure was tested using confirmatory factor analysis (CFA) with limited-information weighted least squares estimation with mean and variance adjustment (WLSMV), an estimator suitable for categorical/ordinal data based on the tetrachoric/polychoric correlation matrix. Excellent model fit was defined using established cut-off values of ⩾0.95 for the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) and ⩽0.05 for the Root Mean Square Error of Approximation (RMSEA), whereas acceptable model fit was defined as ⩾0.90 for CFI and TLI and ⩽0.08 for RMSEA (Hu & Bentler, 1998, 1999). Items with a factor loading <0.4 on a single-factor solution were identified as poorly fitting (Brown, 2015). As RMSEA can be influenced by factors including the number of parameters in the model, decisions on fit were based on a range of indicators, including CFI/TLI, RMSEA, item loadings and the percentage of variance explained by a single factor. From the CFA models, test information curves were generated to examine the precision of each measure across a latent dimension of distress.
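For illustration, these scale comparisons can be sketched in R. The study used SPSS and Mplus for these steps, so the code below is an analogous, minimal sketch rather than the original analysis; the data frame `df` (with a binary checklist indicator `any_disorder` and a `k10` total score) and the item-response data frame `dq5_items` (columns `dq1`–`dq5`) are assumed names for illustration only.

```r
library(pROC)    # ROC curves, AUC and Youden-optimal cut-points
library(psych)   # Cronbach's alpha
library(lavaan)  # CFA with a WLSMV-type estimator for ordinal items

# AUC of one screener (here a K10 total score) against checklist caseness
roc_k10 <- roc(df$any_disorder, df$k10)
auc(roc_k10)

# sensitivity and specificity at the Youden-optimal cut-point
coords(roc_k10, "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))

# internal consistency of one scale
alpha(dq5_items)$total$raw_alpha

# unidimensional CFA on the polychoric correlation matrix (cf. Mplus WLSMV)
fit <- cfa("distress =~ dq1 + dq2 + dq3 + dq4 + dq5",
           data = dq5_items, ordered = TRUE, estimator = "WLSMV")
fitMeasures(fit, c("cfi", "tli", "rmsea"))  # compare to the cut-offs above
```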
IRT analysis was then used to calibrate the eight psychological distress measures on a single common metric. To enable this calibration, items from all scales were incorporated into a single item bank. The IRT model assumed that this item bank was unidimensional, an assumption that was tested using a CFA model analogous to those conducted for the separate measures. Data from all items in the combined item bank were then modelled using the 2-parameter logistic IRT model (for binary data) and the 2-parameter logistic graded response model (for ordinal data), estimated with the Expectation-Maximisation approach outlined by Bock & Aitkin (1981) and implemented in the mirt package for R (Chalmers, 2012). The residual correlation matrix (the correlations between item pairs that remain after accounting for the correlation expected under the model) was used to identify locally dependent items. Any item pair with a residual correlation greater than 0.3 in absolute value was flagged for inspection, as local dependence can artificially inflate the discrimination parameters of the IRT model. To examine the impact of locally dependent items, the parameters were re-estimated after dropping one item from each flagged pair and compared with the original estimates (Reeve et al. 2007).
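A minimal mirt sketch of this calibration step is shown below, assuming the combined responses are held in a data frame `bank` (a hypothetical name); the graded response model reduces to the 2PL for dichotomous items, so both item types can be fitted together. Yen's Q3 statistic is used here as one concrete implementation of the residual-correlation check described above.

```r
library(mirt)

# unidimensional graded response model over the combined item bank;
# binary items are handled as two-category graded items (equivalent to 2PL)
mod <- mirt(bank, model = 1, itemtype = "graded")

# residual correlations after accounting for the latent trait (Yen's Q3);
# flag pairs with |Q3| > 0.3 as candidates for local dependence
q3 <- residuals(mod, type = "Q3")
flagged <- which(abs(q3) > 0.3 & upper.tri(q3), arr.ind = TRUE)
flagged  # row/column indices of flagged item pairs
```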
The fit of each item to the IRT model was examined using the S-X2 statistic proposed by Orlando & Thissen (2000) for dichotomous data and by Kang & Chen (2011) for polytomous data. The significance level of the S-X2 values was adjusted using the Bonferroni method. Items that exhibited poor fit were excluded from the analysis and the item parameters for the remaining items were re-estimated; these steps were repeated until all items exhibited good model fit. The procedure outlined by Wahl et al. (2014) was then used to generate item parameters for the items excluded from the final model: the parameters for each excluded item were estimated one at a time, with the parameters of all fitting items fixed at their previously estimated values. This placed the excluded items on the metric estimated by the items with good model fit.
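The iterative fit-and-drop loop, and the anchored re-estimation of excluded items, might look like the following mirt sketch (the `itemfit` column names, the `excluded` data frame and the item name `hscl_headache` are assumptions for illustration; check them against your mirt version).

```r
# iteratively drop items with Bonferroni-significant S-X2 misfit, then refit
repeat {
  fit_tab <- itemfit(mod, fit_stats = "S_X2")
  bad <- which(fit_tab$p.S_X2 < 0.05 / nrow(fit_tab))  # Bonferroni threshold
  if (length(bad) == 0) break
  keep <- setdiff(colnames(bank), fit_tab$item[bad])
  bank <- bank[, keep]
  mod  <- mirt(bank, 1, itemtype = "graded")
}

# to place an excluded item back on the anchored metric, build a parameter
# table, fix the fitting items at their final estimates (est = FALSE) and
# free only the excluded item's parameters before refitting:
sv <- mirt(cbind(bank, hscl_headache = excluded$hscl_headache), 1,
           itemtype = "graded", pars = "values")
# ... overwrite sv$value for anchor items with the final estimates, set their
# sv$est to FALSE, then: mod_anchor <- mirt(..., pars = sv)
```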
As the sample was not representative of the general population, with higher levels of psychopathology and an over-representation of females, a weighting scheme was applied to the estimation of IRT parameters for the derivation of normative estimates. This scheme was designed to make the sample more representative of the general population in terms of age, gender and psychopathology distributions. The weighting scheme used Australian data on the population prevalence of anxiety, affective and substance use disorders, accounting for comorbidity between these disorder categories within each age and gender group. Data on mental disorders were obtained from the 2007 Australian National Survey of Mental Health and Wellbeing (Slade et al. 2009), while data on the age and gender distributions of the general population were obtained from Australian Bureau of Statistics population estimates (Australian Bureau of Statistics, 2014). Weights were generated for 3577 (99%) participants, excluding 43 participants who did not report age or gender.
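A sketch of constructing such post-stratification weights follows, assuming a named vector `pop_share` of population proportions for each age × gender × disorder-status cell (derived from the NSMHWB and ABS sources cited above, not reproduced here). Whether weights can be passed directly into the IRT estimation depends on the software; recent mirt versions expose a `survey.weights` argument, which should be verified before use.

```r
# post-stratification: weight = population share / sample share for each cell
cell <- interaction(df$age_group, df$gender, df$any_disorder, drop = TRUE)
sample_share <- prop.table(table(cell))
w <- as.numeric(pop_share[as.character(cell)] /
                sample_share[as.character(cell)])

# weighted estimation of the normative item parameters (argument name
# assumed from recent mirt versions; verify locally)
mod_w <- mirt(bank, 1, itemtype = "graded", survey.weights = w)
```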
The weighted item parameters can be used to generate theta (latent psychological distress) scores for each respondent on the common metric, regardless of which psychological distress measure was administered. Calculating theta scores from individual response data requires a statistical package such as the mirt package in R (Chalmers, 2012). However, crosswalk or score-conversion tables can be generated to easily convert raw summed scores from each psychological distress scale to theta scores on the normative common metric. These crosswalk tables were constructed in the current study using an expected a posteriori (EAP) summed-score method (Thissen et al. 1995).
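In mirt, the EAP summed-score conversion can be tabulated directly with `fscores`. The sketch below assumes `k10_items` names the K10 columns of the item bank and that a parameter table `sv_k10` has already fixed the item parameters at their joint-calibration values (as in the earlier sketch); the `Sum.Scores` and `F1` column names follow mirt conventions and may differ across versions.

```r
# expected theta (EAP) for every possible raw summed score on one scale,
# using a model whose parameters are anchored to the joint calibration
mod_k10 <- mirt(bank[, k10_items], 1, itemtype = "graded", pars = sv_k10)
xwalk <- fscores(mod_k10, method = "EAPsum", full.scores = FALSE)

# convert the theta metric (mean 0, s.d. 1) to the T-score metric (50, 10)
xwalk$T_score <- 50 + 10 * xwalk$F1
head(xwalk)  # one row per raw summed score: the crosswalk table
```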
Descriptive and classical test statistics were derived using SPSS v23 (IBM Corp., Chicago, IL, USA). CFA models were estimated using Mplus v7.4 (Muthén & Muthén, Los Angeles, CA, USA). All IRT models and crosswalk tables were generated using R v3.3.2 (R Foundation for Statistical Computing, Vienna, Austria) with the mirt package (Chalmers, 2012).
Results
Sample characteristics are provided in Table 1, broken down by absence or presence of criteria for one or more DSM-5 mental disorders (MDD, GAD, PD, SAD, PTSD, OCD or ADHD), as assessed using the self-report symptom checklist. The sample had an elevated prevalence of mental disorders, with 41% meeting checklist criteria for at least one current mental disorder (9% with MDD, 22% GAD, 5% PD, 15% SAD, 11% PTSD, 12% OCD and 19% adult ADHD). The sample had a relatively uniform age distribution, although participation declined after 65 years of age. Females (80%) were over-represented. Geographical location was largely representative of the Australian population. Factors significantly associated with the presence of a mental disorder included younger age, other gender (including transgender), lower educational attainment, unemployment and living in a metropolitan or regional area. Screener scores are also displayed in the table, with all indicating markedly greater distress among individuals meeting criteria for a mental disorder.
Table 1. Characteristics of the sample based on absence or presence of a mental disorder as assessed using a self-report symptom checklist (n = 3620)

PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Notes: bold values indicate p < 0.05.
Comparison of distress measures
Table 2 indicates that scores on all distress measures were highly correlated, with most having correlations >0.8 with all other screeners. The Distress Thermometer had correlations of 0.7 with the other measures, which is very strong for a single item, although slightly weaker than the correlations among the other measures. Internal consistency for all screening scales was high, with α ⩾ 0.89. Table 3 shows the psychometric properties of each screener against clinical caseness for any DSM-5 mental disorder, based on the checklist. The AUC was ⩾0.85 for all scales except the Distress Thermometer. The HSCL-25 had the largest AUC (0.891), which was significantly greater than that of all other screeners except the DQ5. Two cut-points for each screener were tested: the cut-point prescribed by the authors of the scale and the optimal cut-point based on the Youden index. Based on the latter criterion, the SRQ-20 had the highest sensitivity and the DQ5 the highest specificity. In addition, the PHQ-4, DQ5 and MHI-5 had sensitivity not significantly lower than that of the SRQ-20, while the K10 and MHI-5 had specificity not significantly lower than that of the DQ5. Only the DQ5 had AUC, sensitivity and specificity that were not significantly different from the highest-performing scale on each metric.
Table 2. Pearson correlations between distress screeners, with internal consistency (Cronbach alpha) and proportion of variance explained by a single factor

PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Table 3. Psychometric properties of distress screeners, with comparison to DSM-5 criteria for any internalising disorder (n = 3620)

AUC, area under the curve; PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Notes: bold values indicate p < 0.05.
The fit of the items within each scale to a unidimensional latent construct was tested using CFA, with fit statistics shown in Table 4. The Distress Thermometer could not be tested as it includes only one item. The table indicates that only the DQ5, HSCL-25 and SRQ-20 had adequate fit based on all three criteria (⩾0.90 for CFI and TLI and ⩽0.08 for RMSEA). Loadings of items on the latent dimension were adequate for all items in all scales: 0.84–0.91 for the PHQ-4, 0.71–0.90 for the K10, 0.65–0.93 for the K6, 0.76–0.90 for the DQ5, 0.65–0.89 for the MHI-5, 0.47–0.91 for the HSCL-25 (five items loaded <0.6) and 0.49–0.88 for the SRQ-20 (five items loaded <0.6). Fit tended to be better for longer scales, particularly based on RMSEA, with the exception of the DQ5, which had good fit despite its brevity. A single factor explained more than 70% of the variance in each scale (Table 2), except for the HSCL-25 (47%) and SRQ-20 (38%), with subsequent factors accounting for 8% or less of additional variance, providing further support for unidimensionality. Online Supplementary Fig. S1 displays the information curves for the scales, indicating the location on the latent dimension where each scale was most accurate. Scales with more items tended to provide greater information, with the exception of the SRQ-20. Most of the scales were accurate within a range of −2 to +4 standard deviations from the population mean (θ = 0). However, the SRQ-20, PHQ-4 and DT had limited accuracy in the upper tail of the distress dimension.
Table 4. Summary of fit statistics from Confirmatory Factor Analyses of each screener

PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Note: bold items indicate RMSEA < 0.08, CFI > 0.95 or TLI > 0.95.
IRT analysis and development of the common metric
The unidimensional CFA of the combined item bank for the eight psychological distress scales evidenced acceptable model fit (CFI = 0.93, TLI = 0.93, RMSEA = 0.043). The standardised factor loadings for all items were significant, ranging between 0.51 and 0.93. Inspection of the residual correlation matrix after fitting the IRT model indicated that 11 item pairs, comprising 19 unique items, had a residual correlation >0.3 (range 0.31–0.59). One item from each pair was excluded from the item pool, the parameters were re-estimated and the difference in the discrimination parameters of the remaining items from each pair was calculated. The impact of local dependence on the remaining parameters was minimal, with an average difference in the discrimination parameters of 0.03 (range 0.02–0.05). Consequently, all items were retained in the item bank. The S-X2 item fit statistics indicated that eight items displayed poor fit (Bonferroni-adjusted p value = 0.05/70). These items included those targeting feelings of worthlessness and hopelessness from the K10, DQ5, SRQ-20 and HSCL-25, feeling depressed and unhappy from the K10 and SRQ-20, and experiencing headaches from the HSCL-25. These items were excluded from the final estimation of the fitted item parameters. Figure 1 provides the crosswalk conversions for the seven multi-item distress measures (excluding the one-item Distress Thermometer), with raw scores on each scale converted to theta scores (population mean of 0 and standard deviation of 1) as well as T scores (population mean of 50 and standard deviation of 10). By converting raw scores to theta or T scores, it is possible to convert or pool data from multiple scales onto a common metric. Online Supplementary Table S1 provides the weighted IRT parameters for the final fitted items and the parameters for the excluded items placed on the metric derived from the fitted items. Online Supplementary Table S2 provides the information from Fig. 1 in tabular format, which may be more convenient for researchers looking to convert scores across multiple scales.

Fig. 1. Crosswalk conversion for seven measures of psychological distress.
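As a hypothetical usage example (raw scores, object and column names invented for illustration), harmonising scores from two different scales via the crosswalk reduces to two table lookups on the common metric:

```r
# a K10 raw score of 24 and an MHI-5 raw score of 14 are comparable if they
# map to similar T scores on the common distress metric
t_k10  <- xwalk_k10$T_score[xwalk_k10$Sum.Scores == 24]
t_mhi5 <- xwalk_mhi5$T_score[xwalk_mhi5$Sum.Scores == 14]
c(K10 = t_k10, MHI5 = t_mhi5)
```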
Discussion
The current study compared the psychometric performance of eight distress measures. In terms of identifying individuals at risk of a DSM-5 internalising disorder (as assessed using a comprehensive self-reported symptom checklist), most scales performed adequately, with sensitivity >0.80 and specificity >0.75. The DQ5 performed optimally, with sensitivity, specificity and area under the curve not significantly different from the highest-performing measure on each psychometric indicator, together with evidence of good fit to a unidimensional latent construct. The K6, HSCL-25 and DT had significantly poorer sensitivity and specificity than the best-performing measures, with weaker evidence of fit to a unidimensional latent construct. In addition, information curves suggested that the SRQ-20, PHQ-4 and DT may have modest accuracy at higher levels of the distress spectrum, potentially limiting their ability to distinguish between individuals with moderate and severe levels of distress.
The results of this study indicate that the DQ5 was the most robust screener for distress in this sample, an interesting outcome given that it was one of the briefest scales tested. The marginal differences in the performance of the other screeners should not necessarily deter use of more established distress measures such as the K6/K10. Nevertheless, researchers and clinicians should be aware that such scales may not robustly measure a unidimensional latent construct of distress and may be slightly less accurate in identifying distress than other screeners. The superior performance of the DQ5 may arise from the approach to its development in an independent population sample, which took into account a broad range of internalising symptoms rather than focusing only on symptoms of MDD and GAD (Batterham et al. 2016). Using distress screeners as a proxy for risk of mood disorders, or of specific disorders such as MDD, may lead to different conclusions about the most appropriate screening tool. The present study used a broad definition of psychological distress, incorporating internalising symptoms beyond specific disorders, in accordance with other research on non-specific psychological distress.
There are many potential practical applications of the crosswalk tables provided in this study. Researchers may use the crosswalk tables to rapidly harmonise distress data from multiple studies, enabling them to aggregate data more accurately for applications such as meta-analyses, with less bias than other approaches such as reliance on cut-off scores or Z scores (Bauer & Hussong, 2009; Choi et al. 2014). Such pooling of data provides greater power to observe small but potentially important effects in the population, such as identifying novel genetic and other biological markers for mental disorders (Hussong et al. 2013). Pooling of data also enables comparison of outcomes across independent treatment programs and patient populations. In addition, when moving to more accurate, newly developed measures of distress, researchers may use the crosswalk tables to maintain consistency with historical data (Kaat et al. 2017). Clinicians may similarly use the crosswalk tables to compare distress scores generated from different scales, whether because other clinicians use different distress measures or because of a change of distress scale in their own practice.
This is the first study to compare such a broad array of distress measures in a general community sample, and it provides clear guidelines for harmonisation of different distress measures across studies. However, there are some limitations that should be taken into account in interpreting the findings. First, some distress measures were not used in the current study, such as the GHQ-12, which is a proprietary measure (Goldberg, 1992), and other less commonly used screening tools. Further research is required to incorporate other measures into the crosswalk tables; such work could build on the data presented here using anchor measures. Second, the sample was not representative of the general Australian population, particularly in terms of gender and prevalence of mental disorders. The weighting scheme addressed this issue to some extent for the crosswalk tables, which are intended for use in general population samples. Due to the self-selection of the sample, further testing of the crosswalks in other population-based samples would provide additional confidence in their use as normative indicators. Nevertheless, this self-selection likely had a similar impact on each of the scales, suggesting that scale comparisons using the crosswalk tables would remain valid. Third, the standard used to assess clinical criteria was a self-report checklist, due to the practical and financial constraints involved in conducting a large-scale population-based assessment. The checklist systematically assessed all criteria for internalising disorders based on DSM-5 criteria, similar to other tools such as the MINI (Zbozinek et al. 2012). Nevertheless, self-report of DSM-5 criteria may not be consistent with clinician-administered measures, so clinician diagnosis would provide a more rigorous standard for diagnostic comparison. Finally, distress manifesting in externalising behaviours or thought disorder was not assessed in the current study and may not be adequately captured by the identified measures.
In conclusion, the present study evaluated the psychometric properties of eight distress measures by comparing them against DSM-5 criteria for internalising disorders and examining whether they fit a unidimensional latent construct. Although most measures performed adequately in assessing criteria for DSM-5 internalising disorders, the DQ5 outperformed the other measures. The crosswalk tables presented here may facilitate the pooling and conversion of data for research and clinical purposes.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291717002835.
Acknowledgements
This study was funded by NHMRC Project Grant 1043952. PJB and ALC are supported by NHMRC Fellowships 1083311 and 1122544 respectively. The authors thank Dr Sonia McCallum for managing the data collection for the study. The authors declare no conflicts of interest.