Introduction
The prevalence of common mental disorders (CMDs), a term used to describe neurotic and non-psychotic affective disorders (depressive and anxiety disorders) (Goldberg & Huxley, 1992), ranges from 10% to 40% of adults in primary-care settings (Goldberg & Lecrubier, 1995). Although there is now compelling evidence of the efficacy of antidepressant and psychosocial treatments for CMDs in primary care (NICE, 2004), there is still a wide gap between the efficacy and effectiveness of specific treatments in routine practice (Simon, 1998; Thompson et al. 2000; Hodges et al. 2001). A major challenge in closing this gap has been the low levels of physician recognition of CMD. Physician education has been found to be associated with an increase in the recognition of CMD, but this is often transient and does not result in lasting improvements in patients' clinical outcomes (Gerrity et al. 1999; Thompson et al. 2000). Overall, strategies to improve recognition rates have yielded disappointing results (Hodges et al. 2001). However, systematic reviews of collaborative care programmes for treatment of CMDs in primary care show that the use of systematic procedures for detection of CMDs significantly improves clinical outcomes (Bower et al. 2006).
Rates of physician recognition of CMD are generally low in developing countries (Patel et al. 1998a), and improving them is a challenge because of the high patient loads, poor undergraduate training in these skills, and the stigma associated with mental illness and somatic presentations of mental disorders. It is not surprising, then, that even in developed countries, practice guidelines now advocate the routine use of screening questionnaires given the high burden of CMDs and low recognition rates in routine clinical encounters (NICE, 2004). The past two decades have seen a number of screening questionnaires being designed in developed countries (notably the USA and the UK) and by the World Health Organization (WHO). Many of these questionnaires have been adopted by international investigators for one- and two-stage epidemiological investigations. The value of these questionnaires for screening in routine clinical contexts has, to the best of our knowledge, scarcely been evaluated and compared. Evaluating and comparing them was the objective of the present study, which is the first stage of a larger clinical trial aimed at developing and evaluating a collaborative stepped care intervention for CMDs in primary care in India, where screening for CMDs is the first step of the intervention. We chose five screening questionnaires for evaluation on the basis of their international use, brevity (a maximum of 20 questions, to ensure feasibility as a routine clinical questionnaire) and the face validity of their items.
Method
Study design
The design of this study was a cross-sectional survey in primary health care.
Setting
The study was located in the state of Goa on the west coast of India. Goa has a population of roughly 1.4 million, and has been the setting of a series of studies on the epidemiology and treatment of CMD (Patel et al. 1998a, b, 2002, 2003, 2006). The main language is Konkani. In India, primary health care is provided by government-run primary health centres (PHCs) and privately managed general practitioners (GPs). This study was conducted in three PHCs and in two GP clinics.
Sample
The study was nested in a larger programme that was screening all adult attenders in the selected facilities. Attenders aged below 18 years and those requiring urgent medical attention were excluded. All other attenders were deemed eligible and a systematic sample of these was selected. In three clinics with relatively smaller numbers of daily attenders, every second patient was invited to participate in our study; in the other two sites, every fourth and fifth patient respectively was invited to participate.
Data collection
All subjects who consented to participate were interviewed in two stages. The first stage comprised a brief sociodemographic questionnaire (age, sex, education, occupation), followed by a pair of screening questionnaires administered in a face-to-face interview setting, and verbal responses of the participants were noted. The four questionnaires were paired, with each pair having two sets in alternate order to create 12 sets of paired questionnaires. These were allocated in random order to consecutive eligible participants. This was followed by the second stage, a reference standard structured diagnostic interview carried out by a trained interviewer who was blind to the first stage findings.
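The pairing scheme described above can be sketched as follows. This is a minimal illustration, not the study's actual randomization procedure: it assumes that the four administered questionnaires were the PHQ, GHQ-12, SRQ and K10 (with the K6 scored from K10 items), and the `allocate` helper and its seed are hypothetical.

```python
import random
from itertools import permutations

# Assumption: four administered instruments; the K6 is derived from K10 items.
QUESTIONNAIRES = ["PHQ", "GHQ-12", "SRQ", "K10"]


def ordered_pairs(instruments):
    """Return every ordered pair of distinct instruments.

    Four instruments give 6 unordered pairs, each in 2 orders: 12 sets.
    """
    return list(permutations(instruments, 2))


def allocate(n_participants, pair_sets, seed=0):
    """Hypothetical allocation: draw one paired set at random per participant."""
    rng = random.Random(seed)
    return [rng.choice(pair_sets) for _ in range(n_participants)]


PAIR_SETS = ordered_pairs(QUESTIONNAIRES)
```

Each participant then receives one of the 12 sets, for example `("GHQ-12", "SRQ")`, meaning the GHQ-12 is administered first and the SRQ second.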
Primary Health Questionnaire (PHQ)
The nine-item PHQ (PHQ-9) is the depression screening module of the full PHQ, a self-administered version of the Primary Care Evaluation of Mental Disorders (PRIME-MD) diagnostic instrument for CMDs (Spitzer et al. 1999). It has been found to be a useful questionnaire for screening depression among primary-care patients because of its brevity and its ability to help to establish a DSM-IV-based diagnosis of major depression (Chen et al. 2006). The PHQ has been used in studies of depression in developing countries (Wulsin et al. 2002; Adewuya et al. 2006), including South Asia (Hussain et al. 2000; Malhotra et al. 2004).
General Health Questionnaire (GHQ)
The GHQ was originally developed in the UK (Goldberg & Williams, 1988) and has since become one of the most widely used screening questionnaires internationally, including in India (Shamasundar et al. 1986a, b; Gautam et al. 1987; Patel, 1999). The short 12-item version of the GHQ has been used previously in studies in Goa (Patel et al. 1998b).
Self-Reporting Questionnaire (SRQ)
This 20-item questionnaire was originally developed by an international team of investigators on behalf of the WHO. It was subsequently used in one of the earliest multinational studies of CMDs in developing countries (Harding et al. 1980), which included an Indian site. It has been used by a number of investigators in developing countries, including in India (Sen, 1987; Srinivasan & Suresh, 1990; Pothen et al. 2003).
Kessler Psychological Distress Scale (K10)
The K10 is a 10-item questionnaire developed on the basis of item response theory models (Kessler et al. 2002). It has been used extensively in many countries as part of the World Mental Health Surveys (Andrews & Slade, 2001; Kessler et al. 2002; Furukawa et al. 2003), although, to date, no validity data have been published from developing countries. A shortened 6-item version of the questionnaire (K6) has also been advocated as a screening measure.
Of the five screening questionnaires, two of which (the K10 and K6) shared six items, most were either already available in local languages (such as the GHQ-12) or in another Indian language (e.g. the K10). Those not available in local languages were translated using the standard, stepwise method of translation (Sartorius & Kuyken, 1994). All questionnaires were piloted to assess feasibility issues, for example regarding the scoring method. The reference period for reporting complaints varies from 2 weeks in the GHQ-12 to 30 days in the K6/K10. The questionnaires were modified to make them more feasible for use in busy clinics (the GHQ-12 and K10/K6 scoring was made dichotomous), and the reference period for reporting symptoms was standardized to 2 weeks for all questionnaires (as the ICD-10 diagnosis was based on a 2-week duration of symptoms).
The reference standard diagnostic interview was the Revised Clinical Interview Schedule (CIS-R), a structured interview for use by lay interviewers for the measurement and diagnosis of CMD in community and primary-care settings (Lewis et al. 1992). The CIS-R inquires about the experience of symptoms of CMD in 14 domains (e.g. fatigue, depression, panic). It generates a total score that provides a dimensional measure of CMD. Data can also be analysed using the Programmable Questionnaire System (PROQSY) software program (available from Professor G. Lewis, University of Bristol), which generates ICD-10 diagnoses for the following CMDs: depressive episode, phobias, generalized anxiety disorder, panic disorder, obsessive-compulsive disorders, and mixed anxiety-depression disorder. The CIS-R has been used extensively in India, and specifically in Goa (Sen & Williams, 1987; Patel et al. 1998a, b, 2003, 2006). The translation and field testing of the CIS-R in earlier studies in Goa are reported elsewhere (Patel et al. 1998b). We used four case criteria derived from the CIS-R: an ICD-10 diagnosis of any CMD; an ICD-10 diagnosis of depressive episode; a cut-off score of 11/12 (i.e. a score of 12 or more signifying case-level morbidity); and a cut-off score of 17/18 as an indicator of ‘severe’ morbidity.
Ethical consideration
All patients were required to provide written informed consent before the interviews. The study received approval from the ethical committees of the London School of Hygiene and Tropical Medicine and Sangath (the Goan collaborating institution). All participants who were found to be suffering from a CMD were offered services from the primary-care doctor and a mental health counsellor located in the clinic.
Data analysis
The sample was categorized into cases/non-cases based on the CIS-R outputs; thus, any subject with an ICD-10 diagnosis of CMD was classified as being a ‘case’. We estimated sensitivity, specificity, positive predictive values (PPVs) and positive likelihood ratios [i.e. sensitivity/(1 – specificity); Zweig & Campbell, 1993] for each cut-off score for each of the screening questionnaires against the CIS-R case criteria of ‘any CMD’ and for ‘major depressive disorders’ only. We plotted receiver operating characteristic (ROC) curves, which measure the overall predictive value of a questionnaire, for each of the questionnaires and estimated areas under the curve (AUCs) using SPSS version 14.0 (SPSS Inc., Chicago, IL, USA). We estimated the internal consistency of each of the questionnaires using Cronbach's α. The degree of correlation between questionnaires was measured using Spearman's coefficient.
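As an illustration of the coefficients named above, the following sketch computes sensitivity, specificity, PPV and the positive likelihood ratio for a single cut-off. It is not the SPSS procedure used in the study; the function name, the argument names and the convention that a score at or above `threshold` counts as screen positive are assumptions made for the example.

```python
def screen_metrics(scores, is_case, threshold):
    """Compare a screening score with a reference-standard diagnosis.

    scores    -- questionnaire total score for each participant
    is_case   -- True where the reference standard classifies a case
    threshold -- lowest score counted as screen positive
                 (a cut-off written 5/6 corresponds to threshold=6)
    """
    tp = fp = fn = tn = 0
    for score, case in zip(scores, is_case):
        positive = score >= threshold
        if positive and case:
            tp += 1
        elif positive and not case:
            fp += 1
        elif not positive and case:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominators) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    # Positive likelihood ratio: sensitivity / (1 - specificity).
    lr_positive = ratio(sensitivity, 1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "lr_positive": lr_positive,
    }
```

Calling the function once per candidate threshold gives the table of coefficients from which cut-offs can be compared.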
Results
We invited 602 eligible attenders to participate; of these, two refused participation in the first stage and two refused participation in the second stage. Thus, we had complete data on 598 participants (participation rate 99.3%), of whom 337 (56.4%) were women. The average age of the participants was 37.5 years (s.d.=14.2, range 18–83 years). The majority were married (65.2%), Hindu (92%) and spoke Konkani (88%). Based on the CIS-R data, 92 participants (15.4%) were diagnosed as cases of CMD. Among these, mixed anxiety-depression was the most frequent diagnosis (90 participants, 15.1%), followed by depression (33 participants; 5%). Pure anxiety disorders (i.e. agoraphobia, specific phobias and panic disorder) were diagnosed in 15 participants (2.5%). A total of 46 participants had at least two co-morbid diagnoses.
Distribution, internal consistency and correlation between questionnaires
The mean scores on each questionnaire and their internal consistency are shown in Table 1. The SRQ, GHQ and K10 showed high internal consistency (Cronbach's α>0.8) while the PHQ and K6 demonstrated moderately high levels of internal consistency (Cronbach's α 0.79 and 0.74 respectively). As shown in Table 2, the highest correlations (Spearman's coefficient) were between the SRQ and the GHQ (ρ=0.79), PHQ (0.82) and K10 (0.84). The lowest correlations were between the K6 and the GHQ (ρ=0.58) and the PHQ (0.57).
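For readers unfamiliar with the internal consistency coefficient reported here, Cronbach's α for a questionnaire with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the function name and the data layout (one list of item scores per respondent) are assumptions for illustration, not the study's analysis code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-score lists.

    item_scores -- e.g. [[1, 0, 1], [0, 0, 1], ...]; every inner list
                   holds one respondent's scores on the same k items.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(k)
    ]
    # Sample variance of the total score across respondents.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With dichotomous scoring, as used here for the GHQ-12 and K10/K6, each item score is simply 0 or 1.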
Table 1. Distribution of scores and internal consistency of questionnaires
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151023072358902-0495:S0033291707002334_tab1.gif?pub-status=live)
GHQ-12, 12-item General Health Questionnaire; K10 and K6, 10- and 6-item Kessler Psychological Distress Scales; PHQ, Primary Health Questionnaire; SRQ, Self-Reporting Questionnaire; s.d., standard deviation.
Table 2. Correlation between questionnaires using Spearman's correlation coefficient
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151023072358902-0495:S0033291707002334_tab2.gif?pub-status=live)
GHQ, General Health Questionnaire; K10 and K6, 10- and 6-item Kessler Psychological Distress Scales; PHQ, Primary Health Questionnaire; SRQ, Self-Reporting Questionnaire.
Case detection properties of the screening questionnaires
ROC curves for the assessment against the ICD-10 ‘any CMD’ criterion are shown in Fig. 1. The AUC was highest for the GHQ (0.90) but was also above 0.8 for all the other questionnaires, indicating that all of them discriminate well between cases and non-cases. The AUCs derived from the ROC analyses for all the case criteria are presented in Table 3.
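The AUC can be read as the probability that a randomly chosen case scores higher on the questionnaire than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustrative stand-in for the SPSS ROC procedure used in the study, and the function name is an assumption.

```python
def roc_auc(scores, is_case):
    """Area under the ROC curve via pairwise case/non-case comparison."""
    case_scores = [s for s, c in zip(scores, is_case) if c]
    noncase_scores = [s for s, c in zip(scores, is_case) if not c]
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in case_scores:
        for noncase_score in noncase_scores:
            if case_score > noncase_score:
                wins += 1.0
            elif case_score == noncase_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(noncase_scores))
```

The pairwise loop is quadratic in sample size, which is acceptable for a sample of a few hundred participants; rank-based formulas give the same value more efficiently.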
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627184144-32535-mediumThumb-S0033291707002334_fig1g.jpg?pub-status=live)
Fig. 1. Receiver operating characteristic (ROC) areas under the curve (AUCs) for questionnaires using the ICD-10 diagnosis for any common mental disorder (CMD) criterion. GHQ-12, 12-item General Health Questionnaire; K6 and K10, 6- and 10-item Kessler Psychological Distress Scales; PHQ, Primary Health Questionnaire; SRQ, Self-Reporting Questionnaire.
Table 3. Receiver operating characteristic (ROC) areas under the curve (AUCs) for screening questionnaires against four case criteria
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151023072358902-0495:S0033291707002334_tab3.gif?pub-status=live)
CMD, Common mental disorder; CIS-R, Revised Clinical Interview Schedule; GHQ, General Health Questionnaire; K10 and K6, 10- and 6-item Kessler Psychological Distress Scales; PHQ, Primary Health Questionnaire; SRQ, Self-Reporting Questionnaire.
We chose cut-off scores, at or above which a patient would be classified as having a CMD, for use in clinical settings where we sought to minimize the resources allocated to patients who did not suffer from a CMD. Thus, we deemed a balance between sensitivity and PPV (a minimum of 50% for both) to be mandatory for an acceptable questionnaire. Table 4 presents, for each questionnaire, the optimal cut-offs against the ICD-10 ‘any CMD’ criterion that met our criteria for acceptability, together with the corresponding coefficients, including likelihood ratios. No cut-off on the PHQ met the acceptability criteria. For example, at a cut-off of 12/13, the sensitivity was 61% but the PPV was only 40%.
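The acceptability rule stated above (sensitivity and PPV both at least 50%) can be applied to every candidate cut-off as in the following sketch. The function name, default thresholds expressed as proportions, and the toy data layout are assumptions for illustration; the labels follow the article's notation, in which 5/6 means that scores of 6 or more are screen positive.

```python
def acceptable_cutoffs(scores, is_case, min_sensitivity=0.5, min_ppv=0.5):
    """List cut-offs whose sensitivity and PPV both reach the minimums.

    Returns tuples of (label, sensitivity, ppv), where the label "5/6"
    denotes a cut-point between the values 5 and 6.
    """
    results = []
    for threshold in range(1, max(scores) + 1):
        pairs = list(zip(scores, is_case))
        tp = sum(1 for s, c in pairs if s >= threshold and c)
        fp = sum(1 for s, c in pairs if s >= threshold and not c)
        fn = sum(1 for s, c in pairs if s < threshold and c)
        if tp + fp == 0 or tp + fn == 0:
            continue  # PPV or sensitivity undefined at this threshold
        sensitivity = tp / (tp + fn)
        ppv = tp / (tp + fp)
        if sensitivity >= min_sensitivity and ppv >= min_ppv:
            label = f"{threshold - 1}/{threshold}"
            results.append((label, sensitivity, ppv))
    return results
```

An empty result corresponds to the situation reported for the PHQ, where no cut-off satisfied both minimums.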
Table 4. Acceptable cut-off scores for the questionnaires against the ICD-10 diagnosis for any common mental disorder (CMD) criterion
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151023072358902-0495:S0033291707002334_tab4.gif?pub-status=live)
GHQ, General Health Questionnaire; K10 and K6, 10- and 6-item Kessler Psychological Distress Scales; SRQ, Self-Reporting Questionnaire.
a 5/6 represents a cutpoint between the values 5 and 6, etc.
There was no significant difference in the duration for completion of the questionnaires, each requiring an average of 3 min per subject. We found no significant problems with the comprehension of any of the questions on the questionnaires.
Conclusions
We describe the results of a comparison of five widely used screening questionnaires for the detection of CMDs (depressive and anxiety disorders) in adults attending primary care. The high participation rate of 99.3% can be explained by two factors: the first is the encouragement given by the primary-care doctor to participate in the study, and the second is the intervention being offered to patients who were identified as having CMD. All five questionnaires showed moderate to high discriminating ability in relation to a structured clinical interview; the GHQ and SRQ performed the best as compared to the others irrespective of the case criterion used; the poorest performance was for the shorter questionnaires (K6 and the PHQ). We found no significant differences in the time required to complete the questionnaires. All the questionnaires showed moderate to high degrees of correlation with one another, the poorest correlations being between the shortest questionnaires.
The relatively poorer performance of the K6 may be partly attributable to the change in its scoring system in our study; however, we also modified the scoring system of the GHQ. Furthermore, if we had retained the original five-point scoring system of the K questionnaires, they would not have satisfied our requirements of feasibility and acceptability for a routine screening questionnaire in a busy primary-care setting. The ROC AUCs estimated for the diagnosis of depression and for severe psychiatric morbidity with all questionnaires also showed similar trends. All five questionnaires had relatively good internal consistency. This supports the dimensional concept of CMD and the strong correlation between anxiety and depression in primary care (Lewis, 1992; Jacob et al. 1998) and calls into question the clinical and construct validity of the distinction between anxiety and depressive disorders in primary care.
Thus, our study findings are similar to those of earlier comparisons that showed that the operating characteristics of screening instruments were similar and concluded that selection of a particular instrument should be determined by issues such as feasibility, administration and scoring times, and the instruments' ability to serve additional purposes, such as monitoring severity or response to therapy (Mulrow et al. 1995). For use in routine clinical care, a questionnaire should simultaneously identify all cases (i.e. be highly sensitive) and yield few false positives among those identified as cases (i.e. have a high PPV), so that health resources are not misallocated. None of the questionnaires met these dual criteria. Although the GHQ-12 showed the best balance of discriminating ability and internal consistency, achieving a sensitivity of over 70% with the GHQ-12 would have led to one in three ‘cases’ being a false positive. However, if we wanted to improve the PPV such that only one in four ‘cases’ was a false positive, the sensitivity would fall to about 50%. The positive likelihood ratio reflects both sensitivity and specificity, as it is the ratio of the probability of being identified as a case among true cases to that among true non-cases. Among our tests, the highest positive likelihood ratio value, against the ICD-10 any case criterion, was for the GHQ with a 7/8 cut-off, followed by the SRQ with a 12/13 cut-off. However, both of these cut-off points had relatively low sensitivity (52% and 55% respectively), failing to identify almost half of true cases. Nevertheless, a number of cut-off points for the GHQ had impressive likelihood ratio values, which are independent of the prevalence of the disorder.
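The contrast drawn above, that the positive likelihood ratio is independent of prevalence whereas the PPV is not, can be made concrete with a short calculation. The sensitivity and specificity values in the usage note are hypothetical and are not taken from Table 4; only the 15.4% prevalence comes from this sample.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ depends only on the test's sensitivity and specificity."""
    return sensitivity / (1 - specificity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV by Bayes' rule; it changes with the prevalence of the disorder."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.70
SPECIFICITY = 0.90

ppv_study = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.154)
ppv_higher = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.30)
lr_plus = positive_likelihood_ratio(SENSITIVITY, SPECIFICITY)
```

Under these assumed values the likelihood ratio is the same in both settings, while the PPV is lower at the 15.4% prevalence observed here than it would be in a hypothetical clinic where 30% of attenders are cases.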
Thus, the choice of an appropriate cut-off score for use of these questionnaires in routine primary care may depend on whether an additional assessment by the primary-care physician to confirm the diagnosis of CMD is feasible and reliable. If not, then we advocate a higher cut-off score to ensure that primary-care resources are not misallocated to non-cases.
One limitation of our study may be the choice of our reference standard. There is no gold standard interview for the diagnosis of CMD in primary care; the Composite International Diagnostic Interview (CIDI; Wittchen et al. 1991) was considered as an alternative to the CIS-R but it was not selected because its length and complexity made its use in busy primary-care clinics unfeasible. The CIS-R is a derivative of one of the oldest interviews in psychiatric research (the CIS). Apart from being one of the most widely used lay interviews for the diagnosis of CMD in developing countries, it has also been used extensively in India. One of its great strengths is that it requires only 20 to 30 min to complete and generates ICD-10-compatible diagnoses. However, the high AUCs found for the GHQ-12 may be partly the result of the shared history of the two instruments; it is notable that when the K6 and K10 are compared against the CIDI, instruments that likewise share a history, the AUCs reported are higher than those we found in our study. For example, in an Australian study that compared the K6/K10 with the GHQ, both the K6 and the K10 were significantly better than the GHQ-12 (Furukawa et al. 2003). In another study from Canada (Cairney et al. 2007), the K6 and K10 performed very well as predictors of 1-month depression, with AUCs exceeding 0.9.
Thus, this study shows that there is little to choose between the questionnaires evaluated, all being relatively similar in their ability to identify cases of CMD and narrowly defined major depressive disorder. Screening in routine clinical practice may need to be combined with physician assessment of screen positives to reduce the proportion of false positives identified.
Acknowledgements
This study was supported by a Wellcome Trust Senior Clinical Research Fellowship award to V. Patel. We are grateful to the Directorate of Health Services (Government of Goa) and the two private practitioners for permission to conduct the study in their clinics.
Declaration of Interest
None.