There are many uses for dimensional scales assessing non-specific (or 'general') psychological distress. Such measures are frequently used in epidemiological and clinical research, as well as in health service settings. A psychological distress measure may be used as a proxy or screener for mental health status (e.g. Payton, 2009; Sunderland et al. 2011), to assess treatment effectiveness (e.g. Christensen et al. 2004; Duijts et al. 2011), as a risk factor for clinical outcomes (e.g. Overholser et al. 1997; Linton, 2000; Stansfeld et al. 2002) or as a screener to determine whether clinical assessment or intervention may be required (e.g. Carlson et al. 2010; Baksheev et al. 2011). Because extensive clinical interviews and disorder-specific measures impose a substantial response burden, brief and accurate psychological distress measures have become popular in research where mental health is a focus or a consideration. The importance of such measures has grown with the increasing emphasis on broad assessment of symptom severity (Brown & Barlow, 2005; Cuthbert, 2014), and a number of national health surveys use psychological distress measures to assess the severity of non-specific psychological symptoms (Hickman et al. 2014; Sunderland et al. 2014; Kaul et al. 2017). The present study aimed to compare the psychometric properties of multiple measures of psychological distress and to develop crosswalk tables that facilitate comparison across different measures.
Many instruments are available to measure psychological distress. Their development is varied, but most have their origins in conceptualisations of the symptoms of internalising mental disorders. This focus on specific symptoms of common mental disorders may be problematic, as features of less common manifestations of distress may be overlooked (Batterham et al. 2016). While there have been comparisons of selected psychological distress scales (e.g. McCabe et al. 1996; Furukawa et al. 2003; Ventevogel et al. 2007; Batterham et al. 2016), there is limited comparative information on which measures are optimal for assessing psychological distress in the community. Such evaluation may adopt a number of approaches, including: examining the precision of distress measures in identifying clinical caseness for multiple mental health problems (Furukawa et al. 2003; Ventevogel et al. 2007; Batterham et al. 2016); testing psychometric properties of the measures, such as unidimensionality and item fit (McCabe et al. 1996; Batterham et al. 2016); or examining the relative precision of measures across a latent dimension of distress (Batterham et al. 2016). Evaluation of scale performance also needs to account for the efficiency of the measure, as there may be trade-offs between the length (and respondent burden) of an instrument and its precision.
While comparisons of measures are important in helping researchers and clinicians to choose among them, measures are not always selected on the basis of precision and efficiency, so inconsistency in measure selection is likely to persist in research and clinical settings. Consequently, a common metric for distress measures would provide an important resource for researchers and clinicians. By equating scores across multiple scales using Item Response Theory (IRT)-based approaches to develop conversion tables, or crosswalks, transforming scores from one scale to another becomes relatively straightforward (Choi et al. 2014). Researchers may use this information to aggregate data more efficiently across studies that use different measures of psychological distress, or to facilitate more rapid development of individual patient meta-analyses (e.g. Bower et al. 2013; Culverhouse et al. 2013). Clinicians may use crosswalk tables as a resource to interpret scores on disparate distress scales. Such crosswalks have been developed for measures of depression (Choi et al. 2014; Wahl et al. 2014; Umegaki & Todo, 2017), anxiety (Schalet et al. 2014) and pain (Askew et al. 2013). However, to our knowledge, no study has developed crosswalks for measures of psychological distress.
The present study used a community-based sample of 3620 Australian adults who completed eight measures of psychological distress (each denoted with the number of items included): the Patient Health Questionnaire-4 (PHQ-4; Kroenke et al. 2009), Kessler-10 (K10; Kessler et al. 2002), Kessler-6 (K6, a subset of the K10; Furukawa et al. 2003), Distress Questionnaire-5 (DQ5; Batterham et al. 2016), Mental Health Inventory-5 (MHI-5; Berwick et al. 1991), Hopkins Symptom Checklist-25 (HSCL-25; Sandanger et al. 1998), Self-Report Questionnaire-20 (SRQ-20; Beusenberg & Orley, 1994) and the one-item Distress Thermometer (DT; Mitchell, 2007; Donovan et al. 2014). Measures were chosen on the basis of brevity and previous evidence of validity and reliability. The study aimed to compare these eight measures on precision in assessing clinical caseness across seven common internalising mental disorders, evidence for unidimensionality and precision across a latent dimension of distress. IRT was then used to calibrate a single item bank comprising all items from the eight scales, for the purpose of developing crosswalk tables for the different measures. As such, scores from each of the scales could be used to estimate the severity of psychological distress on the same continuum, regardless of the specific instrument administered.
Method
Participants and procedure
Participants were recruited using advertisements on the online social media platform Facebook between January and February 2016. Advertisements linked directly to the survey and targeted Australian adults aged 18 years or older. The survey was implemented online using Qualtrics survey software. Of the 5379 individuals who clicked on the survey link, 5220 (97%) consented to and commenced the survey, and 3620 (67%) completed it in full. There were no missing data, as responses were required for all questions except age and gender; participants were encouraged to discontinue if they were uncomfortable with the survey.
Potential participants were provided with a detailed information sheet outlining what the survey involved and listing mental health resources, including helplines and crisis team contact information. After presentation of the information sheet, informed consent was collected online. The survey included the measures of psychological distress, in addition to measures assessing major depressive disorder (MDD), generalised anxiety disorder (GAD), social anxiety disorder (SAD), panic disorder (PD), post-traumatic stress disorder (PTSD), obsessive compulsive disorder (OCD), adult attention deficit hyperactivity disorder (ADHD), substance use disorder (SUD), suicidality, psychosis, eating disorders, fatigue, sleep disturbance, help seeking, disclosure, demographics and personality. The survey took approximately 40–60 min to complete. No incentive was provided to participants. Ethical approval was obtained from the ANU Human Research Ethics Committee (protocol: #2015/717).
Measures
Psychological distress was assessed using each of the eight identified distress measures: PHQ-4, K10/K6, DQ5, MHI-5, HSCL-25, SRQ-20, and DT. Each of these distress measures is brief, easy to administer, freely available for research use, and has previously been shown to have robust measurement properties, including strong associations with presence of internalising disorders.
Clinical diagnosis was assessed using a DSM-5 symptom checklist, developed by the authors as a self-report assessment of clinical diagnosis based on DSM-5 criteria (Batterham et al. 2017a, b). The checklist queried respondents about the presence or absence of symptoms based directly on DSM-5 definitions for each disorder of interest. Eight checklist items were used to assess SAD; 21 for PD; 14 for GAD; 15 for MDD (including items to exclude hypomania); 22 for PTSD; 14 for OCD; and 21 for ADHD. Each item reflected a single DSM-5 criterion for the disorder of interest, although some criteria were probed across multiple questions and additional items were used to exclude alternative diagnoses. Example items included: 'In the past six months, did social situations nearly always make you feel frightened or anxious?', 'During the past six months, has your behaviour or difficulty in paying attention caused problems at home, work, school, or socially?' and 'In the past month, has there been a time when you unexpectedly felt intense fear or discomfort?' The checklist was designed along similar principles to the electronic version of the Mini International Neuropsychiatric Interview (MINI; Zbozinek et al. 2012) in terms of structure (binary and categorical self-report items with conditional skip logic) and response burden. However, the checklist used in the current study was developed independently of the MINI, is non-proprietary and is based on DSM-5 rather than DSM-IV criteria. As with the electronic MINI, the checklist has not yet been validated against DSM-5 clinician diagnosis or structured clinical interview, but it contains items reflecting each of the symptom criteria used in DSM-5 to generate a diagnosis. The full checklist has been published previously (Batterham et al. 2017a) and is available from the authors.
Demographic factors were reported to describe the sample, and were based on self-reported measures of age group, sex (male, female, other), educational attainment, employment status, location (metropolitan, regional, rural), and language spoken at home.
Analysis
The performance of each scale in identifying individuals meeting the checklist criteria for one or more DSM-5 mental disorders (MDD, GAD, PD, SAD, PTSD, OCD or ADHD) was compared by examining the area under the receiver operating characteristic curve (AUC). The AUC summarises the performance of each scale against the checklist criteria across all possible cut-points, providing a metric for comparing scales that does not require selection of a specific cut-point. The sensitivity and specificity of each scale in identifying clinical criteria, as measured by the checklist, were then compared at two cut-points: the cut-point prescribed by the authors of the scale and the optimal cut-point based on the Youden index (i.e. the highest combination of sensitivity and specificity), to ensure that no screeners were disadvantaged by performance in the current sample. Correlations between scale scores were estimated to examine whether the scales measured a similar construct. Internal consistency was assessed for each scale using Cronbach's alpha. The unidimensional structure of each measure was tested using confirmatory factor analysis (CFA) with limited-information weighted least squares estimation with mean and variance adjustment (WLSMV), an estimator suitable for categorical/ordinal data based on the tetrachoric/polychoric correlation matrix. Excellent model fit was defined using established cut-off values of ⩾0.95 for the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) and ⩽0.05 for the Root Mean Square Error of Approximation (RMSEA), whereas acceptable model fit was defined as ⩾0.90 for CFI and TLI and ⩽0.08 for RMSEA (Hu & Bentler, 1998, 1999). Items with a factor loading <0.4 on a single-factor solution were identified as poorly fitting (Brown, 2015). As RMSEA can be influenced by factors including the number of parameters in the model, decisions on fit were based on a range of indicators, including CFI/TLI, RMSEA, item loadings and the percentage of variance explained by a single factor. From the CFA models, test information curves were generated to examine the precision of each measure across a latent dimension of distress.
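For illustration, these scale comparisons can be sketched in R. The study used SPSS and Mplus for these steps, so the code below is an analogous, minimal sketch rather than the original analysis; the data frame `df` (with a binary checklist indicator `any_disorder` and a `k10` total score) and the item-response data frame `dq5_items` (columns `dq1`–`dq5`) are assumed names for illustration only.

```r
library(pROC)    # ROC curves, AUC and Youden-optimal cut-points
library(psych)   # Cronbach's alpha
library(lavaan)  # CFA with a WLSMV-type estimator for ordinal items

# AUC of one screener (here a K10 total score) against checklist caseness
roc_k10 <- roc(df$any_disorder, df$k10)
auc(roc_k10)

# sensitivity and specificity at the Youden-optimal cut-point
coords(roc_k10, "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))

# internal consistency of one scale
alpha(dq5_items)$total$raw_alpha

# unidimensional CFA on the polychoric correlation matrix (cf. Mplus WLSMV)
fit <- cfa("distress =~ dq1 + dq2 + dq3 + dq4 + dq5",
           data = dq5_items, ordered = TRUE, estimator = "WLSMV")
fitMeasures(fit, c("cfi", "tli", "rmsea"))  # compare to the cut-offs above
```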
IRT analysis was then used to calibrate the eight psychological distress measures on a single common metric. To enable this calibration, items from all scales were incorporated into a single item bank. The IRT model assumed that this item bank was unidimensional, an assumption that was tested using a CFA model analogous to those conducted for the separate measures. Data from all items in the combined item bank were then modelled using the 2-parameter logistic IRT model (for binary data) and the 2-parameter logistic graded response model (for ordinal data), estimated with the Expectation-Maximisation approach outlined by Bock & Aitkin (1981) and implemented in the mirt package for R (Chalmers, 2012). The residual correlation matrix (the correlations between item pairs that remain after accounting for the correlation expected under the model) was used to identify locally dependent items. Any item pair with a residual correlation greater than 0.3 in absolute value was flagged for inspection, as local dependence can artificially inflate the discrimination parameters of the IRT model. To examine the impact of locally dependent items, the parameters were re-estimated after dropping one item from each flagged pair and compared with the original estimates (Reeve et al. 2007).
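A minimal mirt sketch of this calibration step is shown below, assuming the combined responses are held in a data frame `bank` (a hypothetical name); the graded response model reduces to the 2PL for dichotomous items, so both item types can be fitted together. Yen's Q3 statistic is used here as one concrete implementation of the residual-correlation check described above.

```r
library(mirt)

# unidimensional graded response model over the combined item bank;
# binary items are handled as two-category graded items (equivalent to 2PL)
mod <- mirt(bank, model = 1, itemtype = "graded")

# residual correlations after accounting for the latent trait (Yen's Q3);
# flag pairs with |Q3| > 0.3 as candidates for local dependence
q3 <- residuals(mod, type = "Q3")
flagged <- which(abs(q3) > 0.3 & upper.tri(q3), arr.ind = TRUE)
flagged  # row/column indices of flagged item pairs
```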
The fit of each item to the IRT model was examined using the S-X2 statistic proposed by Orlando & Thissen (2000) for dichotomous data and by Kang & Chen (2011) for polytomous data. The significance level of the S-X2 values was adjusted using the Bonferroni method. Items that exhibited poor fit were excluded from the analysis and the item parameters for the remaining items were re-estimated; these steps were repeated until all items exhibited good model fit. The procedure outlined by Wahl et al. (2014) was then used to generate item parameters for the items excluded from the final model: the parameters for each excluded item were estimated one at a time, with the parameters of all fitting items fixed at their previously estimated values. This placed the excluded items on the metric estimated by the items with good model fit.
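The iterative fit-and-drop loop, and the anchored re-estimation of excluded items, might look like the following mirt sketch (the `itemfit` column names, the `excluded` data frame and the item name `hscl_headache` are assumptions for illustration; check them against your mirt version).

```r
# iteratively drop items with Bonferroni-significant S-X2 misfit, then refit
repeat {
  fit_tab <- itemfit(mod, fit_stats = "S_X2")
  bad <- which(fit_tab$p.S_X2 < 0.05 / nrow(fit_tab))  # Bonferroni threshold
  if (length(bad) == 0) break
  keep <- setdiff(colnames(bank), fit_tab$item[bad])
  bank <- bank[, keep]
  mod  <- mirt(bank, 1, itemtype = "graded")
}

# to place an excluded item back on the anchored metric, build a parameter
# table, fix the fitting items at their final estimates (est = FALSE) and
# free only the excluded item's parameters before refitting:
sv <- mirt(cbind(bank, hscl_headache = excluded$hscl_headache), 1,
           itemtype = "graded", pars = "values")
# ... overwrite sv$value for anchor items with the final estimates, set their
# sv$est to FALSE, then: mod_anchor <- mirt(..., pars = sv)
```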
As the sample was not representative of the general population, with higher levels of psychopathology and an over-representation of females, a weighting scheme was applied to the estimation of IRT parameters for the derivation of normative estimates. This scheme was designed to make the sample more representative of the general population in terms of age, gender and psychopathology distributions. The weighting scheme used Australian data on the population prevalence of anxiety, affective and substance use disorders, accounting for comorbidity between these disorder categories within each age and gender group. Data on mental disorders were obtained from the 2007 Australian National Survey of Mental Health and Wellbeing (Slade et al. 2009), while data on the age and gender distributions of the general population were obtained from Australian Bureau of Statistics population estimates (Australian Bureau of Statistics, 2014). Weights were generated for 3577 (99%) participants, excluding 43 participants who did not report age or gender.
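A sketch of constructing such post-stratification weights follows, assuming a named vector `pop_share` of population proportions for each age × gender × disorder-status cell (derived from the NSMHWB and ABS sources cited above, not reproduced here). Whether weights can be passed directly into the IRT estimation depends on the software; recent mirt versions expose a `survey.weights` argument, which should be verified before use.

```r
# post-stratification: weight = population share / sample share for each cell
cell <- interaction(df$age_group, df$gender, df$any_disorder, drop = TRUE)
sample_share <- prop.table(table(cell))
w <- as.numeric(pop_share[as.character(cell)] /
                sample_share[as.character(cell)])

# weighted estimation of the normative item parameters (argument name
# assumed from recent mirt versions; verify locally)
mod_w <- mirt(bank, 1, itemtype = "graded", survey.weights = w)
```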
The weighted item parameters can be used to generate theta (latent psychological distress) scores for each respondent on the common metric, regardless of which psychological distress measure was administered. Calculating theta scores from individual response data requires a statistical package such as the mirt package in R (Chalmers, 2012). However, crosswalk or score-conversion tables can be generated to easily convert raw summed scores from each psychological distress scale to theta scores on the normative common metric. These crosswalk tables were constructed in the current study using an expected a posteriori (EAP) summed-score method (Thissen et al. 1995).
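In mirt, the EAP summed-score conversion can be tabulated directly with `fscores`. The sketch below assumes `k10_items` names the K10 columns of the item bank and that a parameter table `sv_k10` has already fixed the item parameters at their joint-calibration values (as in the earlier sketch); the `Sum.Scores` and `F1` column names follow mirt conventions and may differ across versions.

```r
# expected theta (EAP) for every possible raw summed score on one scale,
# using a model whose parameters are anchored to the joint calibration
mod_k10 <- mirt(bank[, k10_items], 1, itemtype = "graded", pars = sv_k10)
xwalk <- fscores(mod_k10, method = "EAPsum", full.scores = FALSE)

# convert the theta metric (mean 0, s.d. 1) to the T-score metric (50, 10)
xwalk$T_score <- 50 + 10 * xwalk$F1
head(xwalk)  # one row per raw summed score: the crosswalk table
```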
Descriptive and classical test statistics were derived using SPSS v23 (IBM Corp., Chicago, IL, USA). CFA models were estimated using Mplus v7.4 (Muthén & Muthén, Los Angeles, CA, USA). All IRT models and crosswalk tables were generated using R v3.3.2 (R Foundation for Statistical Computing, Vienna, Austria) with the mirt package (Chalmers, 2012).
Results
Sample characteristics are provided in Table 1, broken down by absence or presence of criteria for one or more DSM-5 mental disorders (MDD, GAD, PD, SAD, PTSD, OCD or ADHD), as assessed using the self-report symptom checklist. The sample had an elevated prevalence of mental disorders, with 41% meeting checklist criteria for at least one current mental disorder (9% with MDD, 22% GAD, 5% PD, 15% SAD, 11% PTSD, 12% OCD and 19% adult ADHD). The sample had a relatively uniform age distribution, although participation declined after 65 years of age. Females (80%) were over-represented. Geographical location was largely representative of the Australian population. Factors significantly associated with the presence of a mental disorder included younger age, other gender (including transgender), lower educational attainment, unemployment and living in a metropolitan or regional area. Screener scores are also displayed in the table, with all indicating markedly greater distress among individuals meeting criteria for a mental disorder.
Table 1. Characteristics of the sample based on absence or presence of a mental disorder as assessed using a self-report symptom checklist (n = 3620)

PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Notes: bold values indicate p < 0.05.
Comparison of distress measures
Table 2 indicates that scores on all distress measures were highly correlated, with most having correlations >0.8 with all other screeners. The Distress Thermometer had correlations of 0.7 with the other measures, which is very strong for a single item, although slightly weaker than the correlations among the other measures. Internal consistency for all screening scales was high, with α ⩾ 0.89. Table 3 shows the psychometric properties of each screener against clinical caseness for any DSM-5 mental disorder, based on the checklist. The AUC was ⩾0.85 for all scales except the Distress Thermometer. The HSCL-25 had the largest AUC (0.891), which was significantly greater than that of all other screeners except the DQ5. Two cut-points for each screener were tested: the cut-point prescribed by the authors of the scale and the optimal cut-point based on the Youden index. Based on the latter criterion, the SRQ-20 had the highest sensitivity and the DQ5 the highest specificity. In addition, the PHQ-4, DQ5 and MHI-5 had sensitivity not significantly lower than that of the SRQ-20, while the K10 and MHI-5 had specificity not significantly lower than that of the DQ5. Only the DQ5 had AUC, sensitivity and specificity that were not significantly different from the highest-performing scale on each metric.
Table 2. Pearson correlations between distress screeners, with internal consistency (Cronbach alpha) and proportion of variance explained by a single factor

PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Table 3. Psychometric properties of distress screeners, with comparison to DSM-5 criteria for any internalising disorder (n = 3620)

AUC, area under the curve; PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Notes: bold values indicate p < 0.05.
The fit of the items within each scale to a unidimensional latent construct was tested using CFA, with fit statistics shown in Table 4. The Distress Thermometer could not be tested as it includes only one item. The table indicates that only the DQ5, HSCL-25 and SRQ-20 had adequate fit based on all three criteria (⩾0.90 for CFI and TLI and ⩽0.08 for RMSEA). Loadings of items on the latent dimension were adequate for all items in all scales: 0.84–0.91 for the PHQ-4, 0.71–0.90 for the K10, 0.65–0.93 for the K6, 0.76–0.90 for the DQ5, 0.65–0.89 for the MHI-5, 0.47–0.91 for the HSCL-25 (five items loaded <0.6) and 0.49–0.88 for the SRQ-20 (five items loaded <0.6). Fit tended to be better for longer scales, particularly based on RMSEA, with the exception of the DQ5, which had good fit despite its brevity. A single factor explained more than 70% of the variance in each scale (Table 2), except for the HSCL-25 (47%) and SRQ-20 (38%), with subsequent factors accounting for 8% or less of additional variance, providing further support for unidimensionality. Online Supplementary Fig. S1 displays the information curves for the scales, indicating the location on the latent dimension where each scale was most accurate. Scales with more items tended to provide greater information, with the exception of the SRQ-20. Most of the scales were accurate within a range of −2 to +4 standard deviations from the population mean (θ = 0). However, the SRQ-20, PHQ-4 and DT had limited accuracy in the upper tail of the distress dimension.
Table 4. Summary of fit statistics from Confirmatory Factor Analyses of each screener

PHQ-4, Patient Health Questionnaire-4; K10, Kessler-10; K6, Kessler-6; DQ5, Distress Questionnaire-5; MHI-5, Mental Health Inventory-5; HSCL, Hopkins Symptom Checklist; SRQ, Self-Report Questionnaire.
Note: bold items indicate RMSEA < 0.08, CFI > 0.95 or TLI > 0.95.
IRT analysis and development of the common metric
The unidimensional CFA of the combined item bank for the eight psychological distress scales evidenced acceptable model fit (CFI = 0.93, TLI = 0.93, RMSEA = 0.043). The standardised factor loadings for all items were significant, ranging between 0.51 and 0.93. Inspection of the residual correlation matrix after fitting the IRT model indicated that 11 item pairs, comprising 19 unique items, had a residual correlation >0.3 (range 0.31–0.59). One item from each pair was excluded from the item pool, the parameters were re-estimated and the difference in the discrimination parameters of the remaining items from each pair was calculated. The impact of local dependence on the remaining parameters was minimal, with an average difference in the discrimination parameters of 0.03 (range 0.02–0.05). Consequently, all items were retained in the item bank. The S-X2 item fit statistics indicated that eight items displayed poor fit (Bonferroni-adjusted p value = 0.05/70). These items included those targeting feelings of worthlessness and hopelessness from the K10, DQ5, SRQ-20 and HSCL-25, feeling depressed and unhappy from the K10 and SRQ-20, and experiencing headaches from the HSCL-25. These items were excluded from the final estimation of the fitted item parameters. Figure 1 provides the crosswalk conversions for the seven multi-item distress measures (excluding the one-item Distress Thermometer), with raw scores on each scale converted to theta scores (population mean of 0 and standard deviation of 1) as well as T scores (population mean of 50 and standard deviation of 10). By converting raw scores to theta or T scores, it is possible to convert or pool data from multiple scales onto a common metric. Online Supplementary Table S1 provides the weighted IRT parameters for the final fitted items and the parameters for the excluded items placed on the metric derived from the fitted items. Online Supplementary Table S2 provides the information from Fig. 1 in tabular format, which may be more convenient for researchers looking to convert scores across multiple scales.

Fig. 1. Crosswalk conversion for seven measures of psychological distress.
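As a hypothetical usage example (raw scores, object and column names invented for illustration), harmonising scores from two different scales via the crosswalk reduces to two table lookups on the common metric:

```r
# a K10 raw score of 24 and an MHI-5 raw score of 14 are comparable if they
# map to similar T scores on the common distress metric
t_k10  <- xwalk_k10$T_score[xwalk_k10$Sum.Scores == 24]
t_mhi5 <- xwalk_mhi5$T_score[xwalk_mhi5$Sum.Scores == 14]
c(K10 = t_k10, MHI5 = t_mhi5)
```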
Discussion
The current study compared the psychometric performance of eight distress measures. In terms of identifying individuals at risk of a DSM-5 internalising disorder (as assessed using a comprehensive self-reported symptom checklist), most scales performed adequately, with sensitivity >0.80 and specificity >0.75. The DQ5 performed optimally, with sensitivity, specificity and area under the curve not significantly different from the highest-performing measure on each psychometric indicator, together with evidence of good fit to a unidimensional latent construct. The K6, HSCL-25 and DT had significantly poorer sensitivity and specificity than the best-performing measures, with weaker evidence of fit to a unidimensional latent construct. In addition, information curves suggested that the SRQ-20, PHQ-4 and DT may have modest accuracy at higher levels of the distress spectrum, potentially limiting their ability to distinguish between individuals with moderate and severe levels of distress.
The results of this study indicate that the DQ5 was the most robust screener for distress in this sample, an interesting outcome given that it was one of the briefest scales tested. The marginal differences in the performance of the other screeners should not necessarily deter use of more established distress measures such as the K6/K10. Nevertheless, researchers and clinicians should be aware that such scales may not robustly measure a unidimensional latent construct of distress and may be slightly less accurate in identifying distress than other screeners. The superior performance of the DQ5 may arise from the approach to its development in an independent population sample, which took into account a broad range of internalising symptoms rather than focusing only on symptoms of MDD and GAD (Batterham et al. 2016). Using distress screeners as a proxy for risk of mood disorders, or of specific disorders such as MDD, may lead to different conclusions about the most appropriate screening tool. The present study used a broad definition of psychological distress, incorporating internalising symptoms beyond specific disorders, in accordance with other research on non-specific psychological distress.
There are many potential practical applications of the crosswalk tables provided in this study. Researchers may use the crosswalk tables to rapidly harmonise distress data from multiple studies, enabling them to aggregate data more accurately for applications such as meta-analyses, with less bias than other approaches such as reliance on cut-off scores or Z scores (Bauer & Hussong, 2009; Choi et al. 2014). Such pooling of data provides greater power to observe small but potentially important effects in the population, such as identifying novel genetic and other biological markers for mental disorders (Hussong et al. 2013). Pooling of data also enables comparison of outcomes across independent treatment programs and patient populations. In addition, when moving to more accurate, newly developed measures of distress, researchers may use the crosswalk tables to maintain consistency with historical data (Kaat et al. 2017). Clinicians may similarly use the crosswalk tables to compare distress scores generated from different scales, whether because other clinicians use different distress measures or because of a change of distress scale in their own practice.
This is the first study to compare such a broad array of distress measures in a general community sample, and it provides clear guidelines for harmonisation of different distress measures across studies. However, there are some limitations that should be taken into account in interpreting the findings. First, some distress measures were not used in the current study, such as the GHQ-12, which is a proprietary measure (Goldberg, 1992), and other less commonly used screening tools. Further research is required to incorporate other measures into the crosswalk tables; such work could build on the data presented here using anchor measures. Second, the sample was not representative of the general Australian population, particularly in terms of gender and prevalence of mental disorders. The weighting scheme addressed this issue to some extent for the crosswalk tables, which are intended for use in general population samples. Due to the self-selection of the sample, further testing of the crosswalks in other population-based samples would provide additional confidence in their use as normative indicators. Nevertheless, this self-selection likely had a similar impact on each of the scales, suggesting that scale comparisons using the crosswalk tables would remain valid. Third, the standard used to assess clinical criteria was a self-report checklist, due to the practical and financial constraints involved in conducting a large-scale population-based assessment. The checklist systematically assessed all criteria for internalising disorders based on DSM-5 criteria, similar to other tools such as the MINI (Zbozinek et al. 2012). Nevertheless, self-report of DSM-5 criteria may not be consistent with clinician-administered measures, so clinician diagnosis would provide a more rigorous standard for diagnostic comparison. Finally, distress manifesting in externalising behaviours or thought disorder was not assessed in the current study and may not be adequately captured by the identified measures.
In conclusion, the present study evaluated the psychometric properties of eight distress measures by comparing them against DSM-5 criteria for internalising disorders and examining whether they fit a unidimensional latent construct. Although most measures performed adequately in assessing criteria for DSM-5 internalising disorders, the DQ5 outperformed the other measures. The crosswalk tables presented here may facilitate the pooling and conversion of data for research and clinical purposes.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291717002835.
Acknowledgements
This study was funded by NHMRC Project Grant 1043952. PJB and ALC are supported by NHMRC Fellowships 1083311 and 1122544 respectively. The authors thank Dr Sonia McCallum for managing the data collection for the study. The authors declare no conflicts of interest.