Composite International Diagnostic Interview screening scales for DSM-IV anxiety and mood disorders

Published online by Cambridge University Press:  18 October 2012

R. C. Kessler*
Affiliation:
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
J. R. Calabrese
Affiliation:
Department of Psychiatry, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, OH, USA
P. A. Farley
Affiliation:
Clinical Services, EPI-Q, Inc., Oakbrook Terrace, IL, USA
M. J. Gruber
Affiliation:
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
M. A. Jewell
Affiliation:
Clinical Services, EPI-Q, Inc., Oakbrook Terrace, IL, USA
W. Katon
Affiliation:
School of Medicine, University of Washington, Seattle, WA, USA
P. E. Keck Jr.
Affiliation:
Lindner Center of HOPE and Department of Psychiatry, University of Cincinnati College of Medicine, Cincinnati, OH, USA
A. A. Nierenberg
Affiliation:
Depression Clinical and Research Program and the Bipolar Clinic and Research Program, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
N. A. Sampson
Affiliation:
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
M. K. Shear
Affiliation:
Columbia University School of Social Work, New York, NY, USA
A. C. Shillington
Affiliation:
Clinical Services, EPI-Q, Inc., Oakbrook Terrace, IL, USA
M. B. Stein
Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
M. E. Thase
Affiliation:
Departments of Psychiatry, University of Pennsylvania School of Medicine, Philadelphia Veterans Affairs Medical Center, and the University of Pittsburgh Medical Center, Philadelphia and Pittsburgh, PA, USA
H.-U. Wittchen
Affiliation:
Institute of Clinical Psychology and Psychotherapy, Technische Universität Dresden, Dresden, Germany
*Address for correspondence: R. C. Kessler, Ph.D., Department of Health Care Policy, Harvard Medical School, 180 Longwood Avenue, Boston, MA 02115, USA. (Email: Kessler@hcp.med.harvard.edu)

Abstract

Background

Lack of coordination between screening studies for common mental disorders in primary care and community epidemiological samples impedes progress in clinical epidemiology. Short screening scales based on the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI), the diagnostic interview used in community epidemiological surveys throughout the world, were developed to address this problem.

Method

Expert reviews and cognitive interviews generated CIDI screening scale (CIDI-SC) item pools for 30-day DSM-IV-TR major depressive episode (MDE), generalized anxiety disorder (GAD), panic disorder (PD) and bipolar disorder (BPD). These items were administered to 3058 unselected patients in 29 US primary care offices. Blinded SCID clinical reinterviews were administered to 206 of these patients, oversampling screened positives.

Results

Stepwise regression selected optimal screening items to predict clinical diagnoses. Excellent concordance [area under the receiver operating characteristic curve (AUC)] was found between continuous CIDI-SC and DSM-IV/SCID diagnoses of 30-day MDE (0.93), GAD (0.88), PD (0.90) and BPD (0.97), with only 9–38 questions needed to administer all scales. CIDI-SC versus SCID prevalence differences are insignificant at the optimal CIDI-SC diagnostic thresholds (χ²₁ = 0.0–2.9, p = 0.09–0.94). Individual-level diagnostic concordance at these thresholds is substantial (AUC 0.81–0.86, sensitivity 68.0–80.2%, specificity 90.1–98.8%). Likelihood ratio positive (LR+) exceeds 10 and LR− is 0.1 or less at informative thresholds for all diagnoses.

Conclusions

CIDI-SC operating characteristics are equivalent (MDE, GAD) or superior (PD, BPD) to those of the best alternative screening scales. CIDI-SC results can be compared directly to general population CIDI survey results or used to target and streamline second-stage CIDIs.

Type
Original Articles
Copyright
Copyright © Cambridge University Press 2012 

Introduction

Although research on the community epidemiology of mental disorders (i.e. general population incidence, prevalence, risk factors, consequences) is an active area of investigation (Susser et al. 2006; Kessler & Üstün, 2008b; Tsuang et al. 2011), research on clinical epidemiology (i.e. prevalence, severity, long-term course in treatment samples) is underdeveloped, especially in primary care settings. Indeed, the most important clinical epidemiological study of mental disorders in primary care remains the World Health Organization (WHO) Collaborative Study on Psychological Problems in General Health Care (Üstün & Sartorius, 1995), a study carried out nearly two decades ago that led to few extensions (e.g. Wittchen et al. 2002; Barkow et al. 2003; Kisely & Simon, 2005; Kisely et al. 2006). More sustained long-term clinical epidemiological studies exist in specialty treatment samples (Katz et al. 1979; Bruce et al. 2005), but those studies are now outdated because of changes in the composition of patient populations since the studies were initiated (Kessler et al. 2005).

We know from primary care screening studies that untreated mental disorders are common in primary care (Lowe et al. 2008; Gili et al. 2011). However, screening studies tell us little about the natural history of these disorders, as screening studies typically focus on current prevalence or treatment response. Yet information is needed on episode recurrence and onset of secondary disorders to understand the public health significance and long-term cost-effectiveness of primary care screening, outreach and treatment quality improvement (Barrett et al. 2005; Konnopka et al. 2009). This integration of primary care with public health is now an area of considerable policy interest (Committee on Integration to Improve Population Health, 2012).

One way to build a critical mass of such data would be to blend longitudinal clinical epidemiological studies with community epidemiological surveys. For example, several new community epidemiological surveys in the WHO World Mental Health (WMH) Survey Initiative (Kessler & Üstün, 2008a) are using a dual-frame sampling approach with parallel samples of (i) patients in primary care (both with and without detected and undetected mental disorders) and (ii) other household residents in the same communities. This design facilitates comparisons of illness prevalence and course among treated and untreated cases by collecting successive snapshots of current prevalence of disorders over multiple points in time.

Screening scales will be used in the primary care segment of these surveys as the first stage in a two-stage approach to oversample patients with current mental disorders for second-stage interviews. Screening scale responses are being ‘preloaded’ in the computerized scripts of the second-stage interviews to control question skip logic (e.g. skipping sections based on negative screening responses; expanding questions based on positive screening responses). The screening scales used for this purpose are based on the WHO Composite International Diagnostic Interview (CIDI; Kessler & Üstün, 2004), the diagnostic interview used in the WMH surveys and most other psychiatric epidemiological surveys throughout the world. Psychometric analyses of these disorder-specific CIDI screening scales (CIDI-SC) have been reported previously for adult attention deficit/hyperactivity disorder (ADHD; Kessler et al. 2007), insomnia (Kessler et al. 2010a) and overall serious mental illness (SMI; Kessler et al. 2010b). The current report presents comparable results for the CIDI-SC scales of major depressive episode (MDE), bipolar disorder (BPD), generalized anxiety disorder (GAD) and panic disorder (PD). Although the results presented are for cross-sectional rather than longitudinal analyses, they are relevant for the longitudinal design described above because the latter is made up of a series of cross-sectional snapshots.

Method

Screening scale development

The CIDI

The CIDI is a fully structured research diagnostic interview developed for use by trained lay interviewers to generate diagnoses of lifetime and recent DSM-IV-TR/ICD-10 disorders (Robins et al. 1988). Clinical reappraisal studies document generally good concordance of CIDI diagnoses with blinded clinical diagnoses (Wittchen, 1994; Kessler et al. 1998). The CIDI uses extensive skip logic to reduce interview length. This skip logic is also used in the CIDI-SC based on the assumption that tablet computers will be used to administer, score and print out summary screening scale results.

Expanding the CIDI item pool

All CIDI symptom questions operationalize DSM/ICD criteria using simple descriptive language (Robins et al. 1988). However, validation studies find some CIDI questions less concordant than others with independent clinical assessments (Wittchen et al. 1995; Kessler et al. 2006; Green et al. 2011). We consequently expanded the CIDI item pool in developing the CIDI-SC scales by reviewing a wide range of other diagnostic instruments to generate alternative symptom questions. The expanded question set was reviewed iteratively, with diagnostic experts using their judgment to pinpoint alternative questions they considered potentially useful and to help revise and prioritize indicators. Previous methodological research has shown that such iterative expert review is often the most useful form of pretesting (Converse & Presser, 1986; Presser & Blair, 1994; Groves et al. 2009).

Pilot testing

Once preliminary symptom questions were generated, a convenience sample of 15 psychiatric out-patients with each diagnosis was administered the disorder-specific symptom questions. Cognitive debriefing interviews (Willis, 2005) assessed problems in conceptual understanding and question wording. These interviews were conducted by professional cognitive interviewers using the ‘think aloud’ method (Presser et al. 2004) to elicit initial respondent reactions and collect alternative terminologies for confusing phrases. The results were presented to the diagnostic experts for review and final question revision.

The clinical reappraisal study

The sample

The revised questions were then administered to 3058 patients sampled from 29 primary care offices selected to include practices in both urban and rural areas in all four US Census Regions (Northeast, South, Midwest and West). No other stratification criteria were used in selecting practices. Practices were recruited through the Primary Care Network (www.primarycarenet.org). The original sample design called for a quota sample of 100 completed interviews in each of 30 offices with an unselected consecutive sample of patients. This sample size was selected to allow for the second-stage assessment of at least 30 screened positives for even the least common disorder (BPD) assuming plausible prevalence estimates and second-stage response rates. However, because one selected office dropped out after office recruitment ended, other offices in the same sample stratum were asked to continue data collection for 2 days beyond the time they met their quota, yielding a sample slightly larger than the originally targeted 3000 and based on 29, not 30, offices.

Respondent recruitment began by giving a ‘study fact brochure’ to patients as they checked in that explained the study as a test of a new screening questionnaire for common anxiety and mood disorders. The brochure explained that the study needed people aged ⩾18 years both with and without the disorders to complete a 15-min laptop computer questionnaire in the waiting room; that participants would be remunerated US$25; that some participants would be asked to participate in a telephone follow-up interview that could take up to 1 h to complete; and that telephone respondents would receive an additional US$50. The brochure emphasized that responses were confidential and decisions about participation would not affect health-care treatment or benefits. Patients who informed the office receptionist that they were interested in the study then provided written informed consent and received a laptop computer to complete the questionnaire in the waiting room. Telephone numbers provided in the questionnaire were used to contact respondents and administer clinical reappraisal interviews within 3 days of the visit. The Human Subjects Committee of the New England Institutional Review Board (www.neirb.com) approved these recruitment, consent and field procedures.

The clinical reappraisal interview

Each CIDI-SC respondent was classified as ‘very likely’, ‘likely’, ‘possible’ or ‘no’ on each screening scale. A probability subsample of 30 respondents classified ‘very likely’ and 20 classified ‘likely-possible’ was selected for each scale with replacement and administered the clinical reappraisal interview. The sampling fraction varied across disorders due to prevalence differences to make the sample well-distributed across practices. Fifty patients who screened ‘no’ on all screening scales were also interviewed. The total clinical reappraisal sample of 206 is less than 50 × 5=250 because some respondents were independently selected for multiple disorders.

The clinical interview was an abridged Research Version, Non-Patient Edition of the Structured Clinical Interview for DSM-IV (SCID-I; First et al. 2002) focused on the four syndromes under study: 30-day MDE and GAD and lifetime and 30-day PD and mania/hypomania. Experienced SCID interviewers administered the interviews under the supervision of a study collaborator (P.E.K.) blinded to CIDI-SC responses. 30-day PD was defined as lifetime PD with 30-day panic attacks and/or persistent concern about additional attacks, worry about implications/consequences of attacks, or significant change in behavior due to attacks. 30-day BPD was defined as lifetime mania/hypomania with either 30-day MDE or 30-day mania/hypomania. SCID diagnoses were made without diagnostic hierarchy rules but with organic exclusions. Organic exclusions were not made in the screening scales. Each SCID disorder was classified as severe or non-severe to determine whether CIDI-SC could differentially detect more severe cases. BP-I versus BP-II defined BPD severity whereas the distinction between severe and non-severe cases of the other disorders was based on SCID interviewer assessments of whether there were (i) many symptoms more than needed for diagnosis, (ii) several symptoms that were particularly severe and/or (iii) marked impairment in social or occupational functioning associated with the disorder.

Analysis methods

The clinical reappraisal sample was weighted to adjust for oversampling of patients screened as ‘very likely’, ‘likely’ or ‘possible’. Iterative stepwise logistic regression was then used (0.05-level entry criterion) to predict SCID diagnoses from CIDI-SC symptoms to determine the minimum CIDI-SC question set needed to approximate SCID diagnoses. An unweighted summary CIDI-SC score was created for each diagnosis from this minimum symptom set and receiver operating characteristic curve (ROC) analysis (Margolis et al. 2002) was used to estimate the area under the ROC curve (AUC) for each scale. The AUC is the probability of correctly identifying SCID cases from CIDI-SC scores in paired comparisons of randomly selected pairs of SCID cases and non-cases, where CIDI-SC tied scores are assigned a 50% chance of correct classification (Kraemer, 1992). The AUC has a predicted value of 0.5 when the screening scale is completely unrelated to the true score and 1.0 when perfectly related. CIDI-SC scores were not weighted to avoid overfitting in the absence of a large enough sample for cross-validation.
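
The paired-comparison definition of the AUC given above can be illustrated with a minimal Python sketch. The function name and the scores are hypothetical, and the sketch ignores the sampling weights applied in the study; it is meant only to show how tied scores contribute a 50% chance of correct classification.

```python
from itertools import product


def pairwise_auc(case_scores, noncase_scores):
    """Probability that a randomly chosen case outscores a randomly
    chosen non-case, counting tied scores as half a correct pair."""
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for case, noncase in product(case_scores, noncase_scores):
        if case > noncase:
            correct += 1.0   # case ranked above non-case
        elif case == noncase:
            correct += 0.5   # tie: 50% chance of correct classification
    return correct / (len(case_scores) * len(noncase_scores))


# Hypothetical screening scores, not study data.
cases = [3, 5, 5, 7]
noncases = [0, 1, 3, 5]
print(pairwise_auc(cases, noncases))  # prints 0.84375
```

A scale unrelated to case status would give a value near 0.5 under this definition, and a scale that ranks every case above every non-case would give 1.0.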

Each CIDI-SC score was then collapsed so that SCID prevalence estimates increased monotonically across screening scale strata but did not differ significantly within strata using the logic of stratum-specific likelihood ratio (LR) analysis (Pepe, 2003). McNemar χ² tests then tested the significance of differences between CIDI-SC and SCID prevalence estimates. Significance tests were based on Taylor series design-based standard errors to adjust for data weighting (Wolter, 1985).
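
As a rough illustration of the McNemar comparison of paired prevalence estimates, the following Python sketch computes the textbook statistic from the two discordant cells of a screen-by-SCID table. It is a simplification: it assumes simple random sampling and uses hypothetical counts, whereas the tests reported here rely on design-based standard errors for weighted data. The function names are illustrative only.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def mcnemar_test(screen_only, scid_only):
    """Textbook McNemar test (no continuity correction).

    screen_only: respondents positive on the screen but negative on the SCID.
    scid_only:   respondents positive on the SCID but negative on the screen.
    """
    discordant = screen_only + scid_only
    if discordant == 0:
        return 0.0, 1.0  # no discordant pairs: prevalence estimates are identical
    chi2 = (screen_only - scid_only) ** 2 / discordant
    return chi2, chi2_1df_pvalue(chi2)


# Hypothetical discordant counts, not study data.
chi2, p = mcnemar_test(screen_only=12, scid_only=8)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

Only the discordant cells enter the statistic, because respondents classified the same way by both instruments contribute equally to the two prevalence estimates.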

Individual-level concordance was evaluated using the AUC and Cohen's κ (Cohen, 1960). Although κ is the traditional measure used in psychiatric research, it is not emphasized here because it varies across populations that differ in prevalence even when sensitivity (SN; the percentage of true cases correctly classified) and specificity (SP; the percentage of true non-cases correctly classified) are constant (Cook, 1998). The AUC, in comparison, is a function of SN and SP, which are considered the fundamental parameters of agreement. The AUC equals (SN + SP)/2 when the screen is dichotomous. AUC scores between 0.5 and 1.0 are often interpreted in parallel with κ as slight (AUC = 0.5–0.6, κ = 0.0–0.2), fair (AUC = 0.6–0.7, κ = 0.2–0.4), moderate (AUC = 0.7–0.8, κ = 0.4–0.6), substantial (AUC = 0.8–0.9, κ = 0.6–0.8) and almost perfect (AUC ⩾0.9, κ ⩾0.8) (Landis & Koch, 1977). We also report total classification accuracy (TCA), the proportion of all respondents whose CIDI-SC and SCID classifications are consistent.
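
The contrast between κ and the AUC can be made concrete with a short Python sketch. It holds SN and SP fixed, varies prevalence, and recomputes both measures for a dichotomous screen. The SN and SP values are illustrative, the prevalence values are assumptions, and the function names are hypothetical.

```python
def dichotomous_auc(sn, sp):
    """AUC of a dichotomous screen: the mean of sensitivity and specificity."""
    return (sn + sp) / 2.0


def cohen_kappa(sn, sp, prevalence):
    """Cohen's kappa implied by SN, SP and true prevalence (all proportions)."""
    true_pos = sn * prevalence
    true_neg = sp * (1.0 - prevalence)
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    observed = true_pos + true_neg          # observed agreement
    screen_pos = true_pos + false_pos       # proportion screening positive
    # Chance agreement: both positive or both negative under independence.
    expected = (screen_pos * prevalence
                + (1.0 - screen_pos) * (1.0 - prevalence))
    return (observed - expected) / (1.0 - expected)


sn, sp = 0.80, 0.90  # illustrative values held constant
for prevalence in (0.05, 0.10, 0.20):  # assumed prevalences
    print(f"prevalence={prevalence:.2f}  "
          f"AUC={dichotomous_auc(sn, sp):.2f}  "
          f"kappa={cohen_kappa(sn, sp, prevalence):.2f}")
```

In this sketch the AUC stays the same on every row while κ rises as prevalence increases, which is the population dependence described above.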

In addition, we report disaggregated measures of operating characteristics, including SN and SP, positive predictive value (PPV; proportion of screened positives confirmed by the SCID), negative predictive value (NPV; proportion of screened negatives confirmed as non-cases by the SCID), LR positive [LR+; SN/(1 − SP)] and LR negative [LR−; (1 − SN)/SP]. LR+ and LR− assess the relative proportions of screened positives versus screened negatives confirmed as cases (LR+) or non-cases (LR−). LR+ values ⩾5 and LR− values ⩽0.2 are generally considered useful, whereas LR+ values ⩾10 and LR− values ⩽0.1 are considered sufficient to rule in/out diagnoses (Haynes et al. 2006).
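
These definitions can be collected in a minimal Python sketch that derives each operating characteristic from the four cells of a screen-by-SCID table. The counts are hypothetical and unweighted, and the function name is illustrative; the study estimates were computed on weighted data.

```python
def operating_characteristics(tp, fp, fn, tn):
    """Operating characteristics from 2x2 counts.

    tp, fn: SCID cases screened positive / negative.
    fp, tn: SCID non-cases screened positive / negative.
    """
    sn = tp / (tp + fn)  # sensitivity
    sp = tn / (tn + fp)  # specificity
    return {
        "SN": sn,
        "SP": sp,
        "PPV": tp / (tp + fp),  # screened positives confirmed as cases
        "NPV": tn / (tn + fn),  # screened negatives confirmed as non-cases
        "LR+": sn / (1.0 - sp) if sp < 1.0 else float("inf"),
        "LR-": (1.0 - sn) / sp if sp > 0.0 else float("inf"),
        "TCA": (tp + tn) / (tp + fp + fn + tn),  # total classification accuracy
    }


# Hypothetical counts, not study data.
for name, value in operating_characteristics(tp=40, fp=15, fn=10, tn=135).items():
    print(f"{name}: {value:.3f}")
```

Unlike PPV and NPV, the two likelihood ratios depend only on SN and SP, so they do not change with the prevalence of the disorder in the sample.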

Comparison with other widely used screening scales

To compare CIDI-SC operating characteristics with other screening scales, a 1990–2012 Medline search of screening scale validity studies was carried out using search terms ‘screening’, ‘validity’, ‘sensitivity’, ‘specificity’, ‘case finding’ and ‘AUC’ crossed with ‘depression’, ‘bipolar disorder’, ‘manic depression’, ‘generalized anxiety disorder’ and ‘panic disorder’. We focused on studies where screening scales were compared to blinded clinical reappraisal interviews in samples of patients, community residents or internet users. Only key studies were considered.

Results

Stepwise logistic regression

Separate stepwise logistic regression analyses were used to predict each SCID disorder from the corresponding CIDI screening items.

MDE

Three CIDI questions were entered stepwise to predict 30-day dysphoria (sad–depressed, down–discouraged) and anhedonia (little–no interest in day-to-day activities) in the total sample. Among respondents with dysphoria and/or anhedonia, five additional questions were entered to screen for other DSM-IV Criterion A symptoms of MDE or the Criterion C requirement of clinically significant distress or impairment. The AUC for the continuous CIDI-SC scale with these eight questions was 0.93.

GAD

Two CIDI questions were entered to screen for 30-day DSM-IV GAD Criterion A (excessive anxiety–worry about multiple events–activities) in the total sample. Among Criterion A screened positives, five additional questions were entered to screen for Criteria B (difficulty controlling worry), C (restless, difficulty relaxing) and E (clinically significant distress or impairment). The AUC for the continuous CIDI-SC scale with these seven questions was 0.88.

PD

Two CIDI questions were entered to screen for having lifetime attacks of intense fear or discomfort that came on very suddenly in the total sample. Among respondents with such attacks, seven follow-up questions were entered about psycho-physiological symptoms. Among patients with such symptoms, an additional question asked about symptoms reaching a peak within 10 min and two questions asked about the Criterion A1 DSM-IV PD requirement that attacks be recurrent and unexpected. Four questions asked about Criterion A2 that attacks be followed by a month of persistent concern about another attack, worry about implications or significant change in behavior. Final questions then asked about 30-day prevalence. The AUC for the continuous CIDI-SC scale with these 15 symptom questions crossed with reports of 30-day recency was 0.90.

BPD

Two CIDI questions were entered to screen for lifetime DSM-IV mania-hypomania Criterion A (distinct periods of abnormally persistently elevated, expansive or irritable mood) in the total sample. Among respondents who endorsed at least one such question, four additional questions were entered to screen for Criterion B (more talkative than usual, racing thoughts, psychomotor agitation, excessive involvement in activities having high potential for painful consequences) and two for Criterion D (mania)/E (hypomania) involving presence–absence of marked impairment or hospitalization. A final question then asked about 30-day prevalence. The AUC for the continuous CIDI-SC scale with these eight questions for lifetime or 30-day mania–hypomania crossed with the CIDI-SC screen for 30-day MDE to define 30-day BPD was 0.97.

The three CIDI-SC diagnostic stem questions for MDE combined with two for GAD, two for PD and two for BPD create a set of only nine items that screen out the majority of primary care patients in less than 3 min. The maximum number of items (40) can be completed in no more than 8 min.

Concordance of DSM-IV CIDI-SC and SCID diagnoses

CIDI-SC versus SCID prevalence estimate differences are insignificant for all disorders at optimal (for estimating prevalence) CIDI-SC thresholds (χ²₁ = 0.0–2.9, p = 0.09–0.98) (Table 1). Aggregate diagnostic concordance at these thresholds is substantial for all disorders (AUC = 0.81–0.86), with proportions of SCID cases detected (SN) of 68.0–80.2%. The proportions of SCID non-cases classified correctly (SP) are 90.1–98.8%. Lower SN than SP is expected for thresholds designed to estimate prevalence without bias when only a minority of patients has a disorder, in which case LR+ is more informative than SN. LR+ is >10 for three of the four CIDI-SC at the optimal thresholds, indicating that screened positives are much more likely than screened negatives to be confirmed as SCID cases. LR+ is 8.1 for MDE, an informative but not definitive value. The proportions of screened positives at the optimal thresholds confirmed as SCID cases (PPV) are in the range 48.2% (GAD) to 73.7% (BPD) (Table 2).

Table 1. Consistency of DSM-IV diagnoses based on the CIDI screening scales (CIDI-SC) at their optimal (to estimate prevalence) thresholds and based on the SCID (n = 206)

CIDI, Composite International Diagnostic Interview; SCID, Structured Clinical Interview for DSM-IV; AUC, area under the receiver operating characteristic curve; TCA, total classification accuracy; SN, sensitivity of the screening scale at the designated threshold; SP, specificity of the screening scale at the designated threshold; MDE, major depressive episode; GAD, generalized anxiety disorder; PD, panic disorder; BPD, bipolar disorder; s.e., standard error.

a Prevalence estimates based on the CIDI-SC do not differ significantly from those based on the SCID for any of the diagnoses (p = 0.09–0.98).

Table 2. CIDI screening scale (CIDI-SC) classification of DSM-IV/SCID cases and non-cases at different thresholds on the CIDI-SC (n = 206)a

CIDI, Composite International Diagnostic Interview; SCID, Structured Clinical Interview for DSM-IV; p, proportion of patients who screened positive on the CIDI-SC at the designated threshold; SN, sensitivity of the CIDI-SC at the designated threshold; LR+, likelihood ratio positive of the CIDI-SC at the designated threshold; PPV, positive predictive value of the CIDI-SC at the designated threshold; SP, specificity of the CIDI-SC at the designated threshold; LR−, likelihood ratio negative of the CIDI-SC at the designated threshold; NPV, negative predictive value of the CIDI-SC at the designated threshold; MDE, major depressive episode; GAD, generalized anxiety disorder; PD, panic disorder; BPD, bipolar disorder; s.e., standard error.

The screen for MDE, the only one where LR+ is <10, can be made more conservative by raising the threshold (LR+ = 24.5, PPV = 85.9%), but at the cost of reducing SN from 80.2% to 46.5%. All four CIDI-SC can be made less conservative by lowering their thresholds, increasing SN to between 94.8% (BPD) and 100% (GAD and PD), but at the cost of increasing the estimated prevalence and decreasing LR+ and PPV. The only disorder where this change is efficient is BPD, with an estimated prevalence increasing from 4.4% to 7.0%, LR+ and PPV both remaining high (31.6, 59.1%) and SN increasing from 74.0% to 94.8%.

Although SP is above 90% for all disorders, this is not a definite rule-out when only a small minority of respondents has the disorder, in which case LR− is more informative. LR− is ⩽0.2 for only two diagnoses at the optimal threshold (MDE and PD) whereas LR− is never below 0.2, meaning that none of the diagnoses can be ruled out confidently with CIDI-SC scores below the optimal diagnostic threshold. However, thresholds can be lowered to produce LR− values less than 0.1 for all disorders, although at the cost of reducing SN. For MDE, 54.1% of patients can be ruled out [i.e. at a threshold where 45.9% (100%–45.9% = 54.1%) of patients screen positive] with LR− 0.1 (NPV = 99.7%). For GAD, 55.7% of patients can be ruled out with LR− 0.0 (NPV = 100%). For PD, 86.3% of patients can be ruled out with LR− 0.2 and 37.9% with LR− 0.0. None of the PD screened negatives at the lower thresholds and 2.8% at the next lowest threshold had a SCID diagnosis. For BPD, 93% of patients can be ruled out with LR− 0.1 (NPV = 99.8%).

Severe and non-severe cases

SN is higher for severe than non-severe cases of all four diagnoses at the optimal threshold (Table 3). SN is 85.4–92.9% for severe MDE, PD and BPD versus 68.9–69.6% for non-severe cases and 70.8% versus 59.6% for severe and non-severe GAD. However, none of the severe versus non-severe SN differences are statistically significant because of the small numbers of cases (χ²₁ = 0.2–2.4, p = 0.12–0.68).

Table 3. CIDI screening scale (CIDI-SC) sensitivity (SN) and likelihood ratio positive (LR+) for detecting severe and non-severe DSM-IV/SCID cases (n = 206)

CIDI, Composite International Diagnostic Interview; SCID, Structured Clinical Interview for DSM-IV; SN, sensitivity of the CIDI-SC at the designated threshold; LR+, likelihood ratio positive of the CIDI-SC at the designated threshold; MDE, major depressive episode; GAD, generalized anxiety disorder; PD, panic disorder; BPD, bipolar disorder; s.e., standard error.

a Although SN is consistently higher for severe than non-severe cases, none of these differences is statistically significant (p = 0.12–0.68).

Comparisons with other screening scales

MDE

The nine-item Patient Health Questionnaire (PHQ-9; Spitzer et al. 1999) is the most widely used major depression screening scale. Reviews of many PHQ-9 primary care validity studies (Gilbody et al. 2007; Wittkampf et al. 2007; Kroenke et al. 2010; Manea et al. 2012) show a central tendency of the AUC to be 0.85–0.88, which does not differ meaningfully from the CIDI-SC MDE AUC of 0.85. (See online Appendix Tables 1–4 for detailed results.) CIDI-SC SN and SP (0.80, 0.90) are also in the middle of the PHQ-9 ranges (0.77–0.88, 0.88–0.94). The AUC of other MDE screening scales is generally lower (0.72–0.84) and LR+ uninformative (Zigmond & Snaith, 1983; Broadhead et al. 1995; Farvolden et al. 2003; Hunter et al. 2005; Donker et al. 2009; Gaynes et al. 2010; Houston et al. 2011). One exception was found in a community survey of the Center for Epidemiologic Studies Depression scale (CES-D; Radloff, 1977), with an AUC of 0.94 (Beekman et al. 1997), but other CES-D validity studies found considerably lower AUC, at 0.76–0.82 (Schulberg et al. 1985; Klinkman et al. 1997; Thomas et al. 2001).

GAD

The CIDI-SC GAD AUC (0.81) is in the middle of the range for the GAD screening scales reviewed (0.74–0.85) (Broadhead et al. 1995; Farvolden et al. 2003; Kroenke et al. 2007; Donker et al. 2009, 2011; Houston et al. 2011). However, CIDI-SC SN and SP (0.68, 0.94) are closest to those of one specialty treatment screening scale, the Web-Based Depression and Anxiety Test (WB-DAT; Farvolden et al. 2003). Other screening scales have higher SN (0.83–0.93) but much lower SP (0.45–0.82). CIDI-SC and WB-DAT consequently have much higher LR+ (11.1, 10.5) than other scales (1.7–4.9), indicating higher confirmation of screened positives. This can be illustrated using Bayes' theorem to calculate post-test probability of SCID GAD for screened positives (Altman & Bland, 1994), which shows that for a true GAD prevalence of 5–15%, confirmation of screened positives would be only 21–46% for other screening scales but much higher for WB-DAT (36–65%) and CIDI-SC (37–66%). Caution is needed in interpreting the WB-DAT results, however, as they were obtained in a specialty treatment setting.
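
The Bayes' theorem calculation behind these post-test probabilities can be reproduced with a few lines of Python. The sketch converts prevalence to pre-test odds, multiplies by LR+ and converts back to a probability. The function name is hypothetical; the LR+ values and the 5–15% prevalence range are those quoted in the paragraph above.

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Post-test probability from pre-test probability and a likelihood ratio."""
    pre_test_odds = prevalence / (1.0 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# LR+ values quoted in the text for the two GAD screens.
for scale, lr_positive in (("CIDI-SC", 11.1), ("WB-DAT", 10.5)):
    low = post_test_probability(0.05, lr_positive)
    high = post_test_probability(0.15, lr_positive)
    print(f"{scale}: {low:.0%}-{high:.0%}")
# prints:
# CIDI-SC: 37%-66%
# WB-DAT: 36%-65%
```

Applying the same function to the highest LR+ among the other scales (4.9) gives roughly 21% and 46% at the two ends of the prevalence range.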

PD

The CIDI-SC PD AUC (0.85) is at the upper end of the range for the PD screening scales reviewed (0.69–0.88) (Broadhead et al. 1995; Stein et al. 1999; Farvolden et al. 2003; Lowe et al. 2003; Hunter et al. 2005; Bunevicius et al. 2007; Kroenke et al. 2007; Donker et al. 2009). CIDI-SC has among the highest LR+ (12.3) along with WB-DAT (12.5) and one of two GAD-7 (19.0) validity studies (Lowe et al. 2003). The high LR+ in that GAD-7 study, however, is offset by a much lower LR+ (3.9) in a second much larger GAD-7 study (Kroenke et al. 2007). The scales with high LR+ are distinguished much more by their high SP (0.94–0.96) than by high SN. If we assume that the true PD prevalence is in the range 5–15% in primary care and that the SN–SP estimates are accurate, confirmation of screened positives would be 35–65% for CIDI-SC and WB-DAT, 17–77% for the GAD-7, and no higher than 20–45% for other screening scales.

BPD

Although the Mood Disorder Questionnaire (MDQ; Hirschfeld et al. 2000) is by far the most widely used BPD screening scale, the vast majority of MDQ studies focus on patients in treatment for depression and investigate whether those with BPD can be distinguished from non-bipolar depressives (Hirschfeld et al. 2000, 2005; Miller et al. 2004, 2011; Weber Rouget et al. 2005; Parker et al. 2008, 2012; Twiss et al. 2008; Zimmerman et al. 2009). This focus reflects the fact that the MDQ was developed to address the under-detection of BPD among depressed patients (Hirschfeld & Vornik, 2004). We are aware of only two MDQ validity studies that evaluated its ability to distinguish patients with BPD from all other patients (including those without depression) in settings other than a specialty clinic (Hirschfeld et al. 2003; Dodd et al. 2009). These studies were both carried out in community samples. The MDQ AUC was fairly low in both studies (AUC = 0.62) compared to much higher AUCs (0.86–0.96) for the CIDI-SC BPD scale at its two informative thresholds.

Only one other BPD screening scale, the Mood Swings Questionnaire (MSQ; Parker et al. 2006), had an AUC as high as the CIDI-SC, but this was in a study in a mental health specialty clinic among patients presenting for treatment of depression. Two subsequent studies in that same clinic produced lower MSQ AUC estimates (0.73–0.81; Parker et al. 2008, 2012). Other BPD screening scales reviewed had lower AUC (0.66–0.81; Hunter et al. 2005; Gaynes et al. 2010). The advantage of CIDI-SC over these other scales can be traced to high CIDI-SC SN at its anti-conservative threshold (0.95). Although, as noted earlier, high SN is often accompanied by low LR+, this is not the case with CIDI-SC BPD, where LR+ is 31.6 at the anti-conservative threshold and 62–85% of screened positives would be confirmed as SCID cases if the true BPD prevalence was in the range 5–15%.

Discussion

CIDI-SC operating characteristics are equivalent to the best alternative screening scales for MDE and GAD and superior to other screening scales for PD and BPD. CIDI-SC results can be compared directly to general population epidemiological CIDI surveys because CIDI-SC items all come from the CIDI. Such nested screening scales can be useful in targeting and streamlining CIDI follow-up interviews by ‘pre-loading’ CIDI-SC responses into the CIDI computerized interview program to guide interview question skip logic. Such an integrated computerized CIDI interviewing system is currently in development and includes options for self-administering CIDI-SC on tablet computers in primary care waiting rooms, web-based CIDI-SC self-administration to track treatment response, and interviewer-based CIDI interview administration using pre-loaded CIDI-SC responses.

The fact that AUCs of continuous CIDI-SC scales in ROC analyses (0.88–0.97) are considerably higher than AUCs of dichotomized CIDI-SC scales at their unbiased thresholds (0.81–0.86) means that meaningful variation in SCID prevalence exists throughout the CIDI-SC scale ranges. One implication, as shown in the comparative analyses of LR+ and LR− at multiple thresholds, is that different thresholds can be useful for screening in cases than for screening them out. Importantly, the CIDI-SC has excellent LR+ and LR− at multiple informative thresholds. Furthermore, continuous CIDI-SC scores can be converted into predicted probabilities of clinical diagnoses in epidemiological studies to yield more accurate estimates of prevalence than are obtained by dichotomizing scores and classifying each respondent as either a definite case or a non-case. This predicted probability approach is discussed in more detail elsewhere (Kessler et al. 2010b).
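The contrast between dichotomized and predicted-probability prevalence estimates can be sketched as follows. Everything in this example is hypothetical: the score range, the calibration table of P(SCID diagnosis | score), the respondent scores and the function names are invented for illustration and are not taken from the paper or from Kessler et al. (2010b), which should be consulted for the actual method.

```python
# Hypothetical calibration table: probability of a SCID diagnosis at each
# screening score, as might be estimated in a clinical reappraisal sample.
p_case_given_score = {0: 0.01, 1: 0.05, 2: 0.20, 3: 0.55, 4: 0.90}

# Hypothetical screening scores for ten survey respondents.
scores = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4]


def prevalence_dichotomized(scores, threshold):
    """Count every respondent at or above the threshold as a definite case."""
    return sum(s >= threshold for s in scores) / len(scores)


def prevalence_predicted(scores, table):
    """Average the predicted probability of being a case over respondents."""
    return sum(table[s] for s in scores) / len(scores)


print(prevalence_dichotomized(scores, threshold=3))
print(round(prevalence_predicted(scores, p_case_given_score), 3))
```

In this toy sample the dichotomized estimate treats respondents scoring 3 as certain cases and those scoring 2 as certain non-cases, whereas the predicted-probability estimate lets each respondent contribute in proportion to the calibrated probability at their score.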

Despite these positive findings, several study limitations are noteworthy. First, CIDI-SC SN is lower for GAD than for other diagnoses. Disaggregation shows that this is because the CIDI-SC has difficulty operationalizing the DSM-IV requirement that worries be excessive. CIDI-SC questions for this requirement have a higher threshold than the SCID. A similar result was found in an earlier study of the full CIDI (Wittchen et al. 1995). Concerns exist about clinician ability to determine when worries are excessive (Ruscio et al. 2005), leading to the suggestion that more concrete guidance be given in DSM-5 about defining excessiveness (Andrews et al. 2010). Although such guidance does not appear in currently proposed DSM-5 criteria (www.dsm5.org/ProposedRevisions), new behavioral requirements (proposed DSM-5 Criterion D) of marked avoidance, time–effort, procrastination or seeking reassurance might help to establish a threshold for excessiveness that could be the basis for improving GAD SN in a revised CIDI-SC.

Second, although we want to use CIDI-SC results to create a cross-walk between general population CIDI epidemiological surveys and primary care CIDI-SC screening studies, no guarantee exists that CIDI-SC operating characteristics will be similar in community epidemiological surveys and primary care samples. It is consequently important to include CIDI-SC in future CIDI community surveys and validate their operating characteristics relative to diagnoses based on the full CIDI and SCID. Such methodological studies are currently underway in new CIDI surveys in the WHO WMH Survey Initiative (Kessler & Üstün, 2008a).

Third, our clinical reappraisal sample was relatively small because of funding limitations, precluding cross-validation, subgroup analysis, or analysis of information values across the range of continuous CIDI-SC scores to evaluate sensitivity to change. These limitations make it especially important to replicate the current study in independent primary care samples, to investigate the stability of the encouraging results reported here and to carry out analyses of the clinical sensitivity of variation in continuous CIDI-SC scores to assess the severity of anxiety and depression. Larger replication studies could also help to establish an empirical foundation for determining whether even shorter versions of CIDI-SC might be developed based on computerized adaptive testing (Gibbons et al. 2011).

Supplementary material

For supplementary material accompanying this paper visit: www.hcp.med.harvard.edu/wmhcidi/resources.php.

Acknowledgments

Support for this study was provided by AstraZeneca. The sponsor had no role in the design and conduct of the study; collection, management, analysis and interpretation of the data; and preparation, review or approval of the manuscript.

Declaration of Interest

R. C. Kessler has consulted for AstraZeneca, Analysis Group, Bristol–Myers Squibb, Cerner-Galt Associates, Eli Lilly and Company, GlaxoSmithKline Inc., HealthCore Inc., Health Dialog, Integrated Benefits Institute, John Snow Inc., Kaiser Permanente, Matria Inc., Mensante, Merck & Co. Inc., Ortho-McNeil Janssen Scientific Affairs, Pfizer Inc., Primary Care Network, Research Triangle Institute, Sanofi-Aventis Groupe, Shire US Inc., SRA International Inc., Takeda Global Research & Development, Transcept Pharmaceuticals Inc., and Wyeth-Ayerst; has served on advisory boards for Appliance Computing II, Eli Lilly and Company, Mindsite, Ortho-McNeil Janssen Scientific Affairs, Plus One Health Management and Wyeth-Ayerst; and has had research support for his epidemiological studies from Analysis Group Inc., Bristol–Myers Squibb, Eli Lilly and Company, EPI-Q, GlaxoSmithKline, Johnson & Johnson Pharmaceuticals, Ortho-McNeil Janssen Scientific Affairs, Pfizer Inc., Sanofi-Aventis Groupe, and Shire US Inc. J. R. Calabrese has received research grant support from Abbott, AstraZeneca, Bristol-Myers Squibb, Cephalon, Eli Lilly and Company, GlaxoSmithKline, Janssen, Repligen, Sunovion/DSPA, Takeda and Wyeth; has consulted to or served on advisory boards of AstraZeneca, Bristol–Myers Squibb, Cephalon, Sunovion/DSPA, Forest, GlaxoSmithKline, Janssen, Johnson and Johnson, Lundbeck, Neurosearch, OrthoMcNeil, Otsuka, Pfizer, Repligen, Schering-Plough, Servier, Solvay, Supernus, Synosia, and Wyeth; and has provided CME lectures supported by Abbott, AstraZeneca, Bristol–Myers Squibb, GlaxoSmithKline, Janssen, Johnson and Johnson, Lundbeck, Merck, Sanofi Aventis, Schering-Plough, Pfizer, Solvay, and Wyeth. P. A. Farley, M. A. Jewell and A. C. Shillington are employees of EPI-Q, the organization that implemented the primary care screening and recruitment for the SCID clinical reappraisal interviews in this study. In addition, they carry out contract research for AstraZeneca, Cephalon, Merck & Co. Inc., Sanofi-Aventis, GlaxoSmithKline, Genentech, Biogen, Roche, Transcept Pharmaceuticals Inc., Lundbeck, Shire US Inc., Takeda, Novartis, Pfizer, Abbott, and Adolor. Their compensation related to these activities is limited to their salary. They also own stock in EPI-Q. P. E. Keck is employed by the University of Cincinnati College of Medicine and University of Cincinnati Physicians, the organization that carried out the clinical reappraisal interviews in this study. He is presently or has been in the past year a principal or co-investigator on research studies sponsored by Alkermes, AstraZeneca, Cephalon, GlaxoSmithKline, Eli Lilly and Company, Marriott Foundation, National Institute of Mental Health (NIMH), Orexigen, Pfizer Inc., and Shire. He has been reimbursed for consulting to, in the past 2 years: 2011: Pamlab, 2012: Bristol–Myers Squibb. Patents: Dr Keck is a co-inventor on United States Patent No. 6,387,956: Shapira NA, Goldsmith TD, Keck PE Jr. (University of Cincinnati) Methods of treating obsessive-compulsive spectrum disorder comprises the step of administering an effective amount of tramadol to an individual. Filed 25 March 1999; approved 14 May 2002. Dr Keck has received no financial gain from this patent. A. A. Nierenberg has consulted for the American Psychiatric Association, Appliance Computing Inc. (Mindsite), Basilea, Brain Cells Inc., Brandeis University, Bristol–Myers Squibb, Dey Pharmaceuticals, Dainippon Sumitomo, Eli Lilly and Company, EpiQ, L.P./Mylan Inc., Novartis, PGx Health, Shire, Schering-Plough, Takeda Pharmaceuticals, and Targacept; consulted through the MGH Clinical Trials Network and Institute (CTNI) for AstraZeneca, Brain Cells Inc., Dainippon Sumitomo/Sepracor, Johnson and Johnson, Labopharm, Merck, Methylation Science, Novartis, PGx Health, Shire, Schering-Plough, Targacept and Takeda/Lundbeck Pharmaceuticals; received grant/research support from NIMH, PamLabs, Pfizer Pharmaceuticals, and Shire; received honoraria from Belvoir Publishing, University of Texas Southwestern Dallas, Hillside Hospital, American Drug Utilization Review, American Society for Clinical Psychopharmacology, Baystate Medical Center, Columbia University, CRICO, Dartmouth Medical School, IMEDEX, Israel Society for Biological Psychiatry, Johns Hopkins University, MJ Consulting, New York State, Medscape, MBL Publishing, National Association of Continuing Education, Physicians Postgraduate Press, SUNY Buffalo, University of Wisconsin, University of Pisa, University of Michigan, University of Miami, APSARD, ISBD, SciMed, Slack Publishing and Wolters Kluwer Publishing; was currently or formerly on the advisory boards of Appliance Computing Inc., Brain Cells Inc., Eli Lilly and Company, Johnson and Johnson, Takeda/Lundbeck, Targacept, and InfoMedic; owns stock options in Appliance Computing Inc. and Brain Cells Inc.; has copyrights to the Clinical Positive Affect Scale and the MGH Structured Clinical Interview for the Montgomery Asberg Depression Scale exclusively licensed to the MGH Clinical Trials Network and Institute (CTNI); and has a patent extension application for the combination of buspirone, bupropion, and melatonin for the treatment of depression. M. B. Stein has a financial interest/arrangement or affiliation with one or more organizations that could be perceived as a real or apparent conflict of interest in the context of the subject of this presentation. He receives or has in the past 3 years received Research Support from: Hoffmann-La Roche; is currently or in the past 3 years has been a Consultant for Care Management Technologies; and receives payment for editorial work from Depression and Anxiety (Publisher: Wiley-Blackwell). M. E. Thase has served as an advisory/consultant for Alkermes, AstraZeneca, Bristol–Myers Squibb Company, Eli Lilly and Company, Dey Pharma, L.P., Forest Laboratories, Gerson Lehman Group, GlaxoSmithKline (ended 2008), Guidepoint Global, H. Lundbeck A/S, MedAvante Inc., Merck and Co. Inc. (formerly Schering Plough and Organon), Neuronetics Inc., Novartis (ended 2008), Otsuka, Ortho-McNeil Pharmaceuticals (Johnson & Johnson), Pamlab, L.L.C., Pfizer (formerly Wyeth Ayerst Pharmaceuticals), PGx Inc., Shire US Inc., Sunovion Pharmaceuticals Inc., Supernus Pharmaceuticals, Takeda, Transcept Pharmaceuticals; has received grant support from Agency for Healthcare Research and Quality, Eli Lilly and Company, Forest Pharmaceuticals, GlaxoSmithKline (ended July 2010), National Institute of Mental Health, Otsuka Pharmaceuticals, Sepracor Inc. (ended January 2009); is or has been on the Speakers bureaus for AstraZeneca (ended June 2010), Bristol–Myers Squibb Company, Dey Pharmaceutical, Eli Lilly and Company (ended June 2009), Merck and Co. Inc., Pfizer (formerly Wyeth Ayerst Pharmaceuticals); holds equity in MedAvante Inc.; receives royalties from American Psychiatric Foundation, Guilford Publications, Herald House and W.W. Norton & Company Inc.; and has a spouse employed by Embryon (Formerly Advogent; Embryon does business with BMS and Pfizer/Wyeth). H.-U. Wittchen has consulted for and received research grants from Lundbeck, Pfizer Inc., Novartis, Abbott, Sanofi-Aventis, Eli Lilly and Company, Merck and Co., and GlaxoSmithKline.

References

Altman, DG, Bland, JM (1994). Diagnostic tests 2: Predictive values. British Medical Journal 309, 102.
Andrews, G, Hobbs, MJ, Borkovec, TD, Beesdo, K, Craske, MG, Heimberg, RG, Rapee, RM, Ruscio, AM, Stanley, MA (2010). Generalized worry disorder: a review of DSM-IV generalized anxiety disorder and options for DSM-V. Depression and Anxiety 27, 134–147.
Barkow, K, Maier, W, Üstün, TB, Gansicke, M, Wittchen, HU, Heun, R (2003). Risk factors for depression at 12-month follow-up in adult primary health care patients with major depression: an international prospective study. Journal of Affective Disorders 76, 157–169.
Barrett, B, Byford, S, Knapp, M (2005). Evidence of cost-effective treatments for depression: a systematic review. Journal of Affective Disorders 84, 1–13.
Beekman, AT, Deeg, DJ, Van Limbeek, J, Braam, AW, De Vries, MZ, Van Tilburg, W (1997). Criterion validity of the Center for Epidemiologic Studies Depression scale (CES-D): results from a community-based sample of older subjects in The Netherlands. Psychological Medicine 27, 231–235.
Broadhead, WE, Leon, AC, Weissman, MM, Barrett, JE, Blacklow, RS, Gilbert, TT, Keller, MB, Olfson, M, Higgins, ES (1995). Development and validation of the SDDS-PC screen for multiple mental disorders in primary care. Archives of Family Medicine 4, 211–219.
Bruce, SE, Yonkers, KA, Otto, MW, Eisen, JL, Weisberg, RB, Pagano, M, Shea, MT, Keller, MB (2005). Influence of psychiatric comorbidity on recovery and recurrence in generalized anxiety disorder, social phobia, and panic disorder: a 12-year prospective study. American Journal of Psychiatry 162, 1179–1187.
Bunevicius, A, Peceliuniene, J, Mickuviene, N, Valius, L, Bunevicius, R (2007). Screening for depression and anxiety disorders in primary care patients. Depression and Anxiety 24, 455–460.
Cohen, J (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.
Committee on Integration to Improve Population Health (2012). Primary Care and Public Health: Exploring Integration to Improve Population Health. National Academy Press: Washington, DC.
Converse, J, Presser, S (1986). Survey Questions: Handcrafting the Standardized Questionnaire. Sage: Thousand Oaks, CA.
Cook, RJ (1998). Kappa and its dependence on marginal rates. In The Encyclopedia of Biostatistics (ed. Armitage, P. and Colton, T.), pp. 2166–2168. Wiley: New York, NY.
Dodd, S, Williams, LJ, Jacka, FN, Pasco, JA, Bjerkeset, O, Berk, M (2009). Reliability of the Mood Disorder Questionnaire: comparison with the Structured Clinical Interview for the DSM-IV-TR in a population sample. Australian and New Zealand Journal of Psychiatry 43, 526–530.
Donker, T, van Straten, A, Marks, I, Cuijpers, P (2009). A brief Web-based screening questionnaire for common mental disorders: development and validation. Journal of Medical Internet Research 11, e19.
Donker, T, van Straten, A, Marks, I, Cuijpers, P (2011). Quick and easy self-rating of Generalized Anxiety Disorder: validity of the Dutch web-based GAD-7, GAD-2 and GAD-SI. Psychiatry Research 188, 58–64.
Farvolden, P, McBride, C, Bagby, RM, Ravitz, P (2003). A Web-based screening instrument for depression and anxiety disorders in primary care. Journal of Medical Internet Research 5, e23.
First, MB, Spitzer, RL, Gibbon, M, Williams, JBW (2002). Structured Clinical Interview for DSM-IV Axis I Disorders, Research Version, Non-Patient Edition (SCID-I/NP). Biometrics Research, New York State Psychiatric Institute: New York, NY.
Gaynes, BN, DeVeaugh-Geiss, J, Weir, S, Gu, H, MacPherson, C, Schulberg, HC, Culpepper, L, Rubinow, DR (2010). Feasibility and diagnostic validity of the M-3 checklist: a brief, self-rated screen for depressive, bipolar, anxiety, and post-traumatic stress disorders in primary care. Annals of Family Medicine 8, 160–169.
Gibbons, LE, Feldman, BJ, Crane, HM, Mugavero, M, Willig, JH, Patrick, D, Schumacher, J, Saag, M, Kitahata, MM, Crane, PK (2011). Migrating from a legacy fixed-format measure to CAT administration: calibrating the PHQ-9 to the PROMIS depression measures. Quality of Life Research 20, 1349–1357.
Gilbody, S, Richards, D, Brealey, S, Hewitt, C (2007). Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): a diagnostic meta-analysis. Journal of General Internal Medicine 22, 1596–1602.
Gili, M, Luciano, JV, Serrano, MJ, Jimenez, R, Bauza, N, Roca, M (2011). Mental disorders among frequent attenders in primary care: a comparison with routine attenders. Journal of Nervous and Mental Disease 199, 744–749.
Green, JG, Avenevoli, S, Finkelman, M, Gruber, MJ, Kessler, RC, Merikangas, KR, Sampson, NA, Zaslavsky, AM (2011). Validation of the diagnoses of panic disorder and phobic disorders in the US National Comorbidity Survey Replication Adolescent (NCS-A) supplement. International Journal of Methods in Psychiatric Research 20, 105–115.
Groves, RM, Fowler, FJ Jr., Couper, MP, Lepkowski, JM, Singer, E, Tourangeau, R (2009). Survey Methodology, 2nd edn. Wiley: New York, NY.
Haynes, RB, Sackett, DL, Guyatt, GH, Tugwell, P (2006). Clinical Epidemiology: How to Do Clinical Practice Research, 3rd edn. Lippincott Williams & Wilkins: Philadelphia, PA.
Hirschfeld, RM, Cass, AR, Holt, DC, Carlson, CA (2005). Screening for bipolar disorder in patients treated for depression in a family medicine clinic. Journal of the American Board of Family Practice 18, 233–239.
Hirschfeld, RM, Holzer, C, Calabrese, JR, Weissman, M, Reed, M, Davies, M, Frye, MA, Keck, P, McElroy, S, Lewis, L, Tierce, J, Wagner, KD, Hazard, E (2003). Validity of the Mood Disorder Questionnaire: a general population study. American Journal of Psychiatry 160, 178–180.
Hirschfeld, RM, Vornik, LA (2004). Recognition and diagnosis of bipolar disorder. Journal of Clinical Psychiatry 65 (Suppl. 15), 5–9.
Hirschfeld, RM, Williams, JB, Spitzer, RL, Calabrese, JR, Flynn, L, Keck, PE Jr., Lewis, L, McElroy, SL, Post, RM, Rapport, DJ, Russell, JM, Sachs, GS, Zajecka, J (2000). Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire. American Journal of Psychiatry 157, 1873–1875.
Houston, JP, Kroenke, K, Faries, DE, Doebbeling, CC, Adler, LA, Ahl, J, Swindle, R, Trzepacz, PT (2011). A provisional screening instrument for four common mental disorders in adult primary care patients. Psychosomatics 52, 48–55.
Hunter, EE, Penick, EC, Powell, BJ, Othmer, E, Nickel, EJ, Desouza, C (2005). Development of scales to screen for eight common psychiatric disorders. Journal of Nervous and Mental Disease 193, 131–135.
Katz, MM, Secunda, SK, Hirschfeld, RM, Koslow, SH (1979). NIMH clinical research branch collaborative program on the psychobiology of depression. Archives of General Psychiatry 36, 765–771.
Kessler, RC, Adler, LA, Gruber, MJ, Sarawate, CA, Spencer, T, Van Brunt, DL (2007). Validity of the World Health Organization Adult ADHD Self-Report Scale (ASRS) Screener in a representative sample of health plan members. International Journal of Methods in Psychiatric Research 16, 52–65.
Kessler, RC, Akiskal, HS, Angst, J, Guyer, M, Hirschfeld, RM, Merikangas, KR, Stang, PE (2006). Validity of the assessment of bipolar spectrum disorders in the WHO CIDI 3.0. Journal of Affective Disorders 96, 259–269.
Kessler, RC, Coulouvrat, C, Hajak, G, Lakoma, MD, Roth, T, Sampson, N, Shahly, V, Shillington, A, Stephenson, JJ, Walsh, JK, Zammit, GK (2010a). Reliability and validity of the Brief Insomnia Questionnaire in the America Insomnia Survey. Sleep 33, 1539–1549.
Kessler, RC, Demler, O, Frank, RG, Olfson, M, Pincus, HA, Walters, EE, Wang, P, Wells, KB, Zaslavsky, AM (2005). Prevalence and treatment of mental disorders, 1990 to 2003. New England Journal of Medicine 352, 2515–2523.
Kessler, RC, Green, JG, Gruber, MJ, Sampson, NA, Bromet, E, Cuitan, M, Furukawa, TA, Gureje, O, Hinkov, H, Hu, CY, Lara, C, Lee, S, Mneimneh, Z, Myer, L, Oakley-Browne, M, Posada-Villa, J, Sagar, R, Viana, MC, Zaslavsky, AM (2010b). Screening for serious mental illness in the general population with the K6 screening scale: results from the WHO World Mental Health (WMH) Survey Initiative. International Journal of Methods in Psychiatric Research 19 (Suppl. 1), 4–22.
Kessler, RC, Üstün, TB (2004). The World Mental Health (WMH) Survey Initiative Version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). International Journal of Methods in Psychiatric Research 13, 93–121.
Kessler, RC, Üstün, TB (2008a). Overview and future directions for the World Mental Health Survey Initiative. In The WHO World Mental Health Surveys: Global Perspectives on the Epidemiology of Mental Disorders (ed. Kessler, R. C. and Üstün, T. B.), pp. 555–568. Cambridge University Press: New York, NY.
Kessler, RC, Üstün, TB (eds) (2008b). The WHO World Mental Health Surveys: Global Perspectives on the Epidemiology of Mental Disorders. Cambridge University Press: New York, NY.
Kessler, RC, Wittchen, H-U, Abelson, JM, McGonagle, KA, Schwarz, N, Kendler, KS, Knäuper, B, Zhao, S (1998). Methodological studies of the Composite International Diagnostic Interview (CIDI) in the US National Comorbidity Survey. International Journal of Methods in Psychiatric Research 7, 33–55.
Kisely, S, Scott, A, Denney, J, Simon, G (2006). Duration of untreated symptoms in common mental disorders: association with outcomes: international study. British Journal of Psychiatry 189, 79–80.
Kisely, S, Simon, G (2005). An international study of the effect of physical ill health on psychiatric recovery in primary care. Psychosomatic Medicine 67, 116–122.
Klinkman, MS, Coyne, JC, Gallo, S, Schwenk, TL (1997). Can case-finding instruments be used to improve physician detection of depression in primary care? Archives of Family Medicine 6, 567–573.
Konnopka, A, Leichsenring, F, Leibing, E, Konig, HH (2009). Cost-of-illness studies and cost-effectiveness analyses in anxiety disorders: a systematic review. Journal of Affective Disorders 114, 14–31.
Kraemer, HC (1992). Evaluating Medical Tests: Objective and Quantitative Guidelines. Sage: Newbury Park, CA.
Kroenke, K, Spitzer, RL, Williams, JB, Lowe, B (2010). The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. General Hospital Psychiatry 32, 345–359.
Kroenke, K, Spitzer, RL, Williams, JB, Monahan, PO, Lowe, B (2007). Anxiety disorders in primary care: prevalence, impairment, comorbidity, and detection. Annals of Internal Medicine 146, 317–325.
Landis, JR, Koch, GG (1977). The measurement of observer agreement for categorical data. Biometrics 33, 159–174.
Lowe, B, Grafe, K, Zipfel, S, Spitzer, RL, Herrmann-Lingen, C, Witte, S, Herzog, W (2003). Detecting panic disorder in medical and psychosomatic outpatients: comparative validation of the Hospital Anxiety and Depression Scale, the Patient Health Questionnaire, a screening question, and physicians' diagnosis. Journal of Psychosomatic Research 55, 515–519.
Lowe, B, Spitzer, RL, Williams, JB, Mussell, M, Schellberg, D, Kroenke, K (2008). Depression, anxiety and somatization in primary care: syndrome overlap and functional impairment. General Hospital Psychiatry 30, 191–199.
Manea, L, Gilbody, S, McMillan, D (2012). Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis. Canadian Medical Association Journal 184, E191E196.CrossRefGoogle ScholarPubMed
Margolis, DJ, Bilker, W, Boston, R, Localio, R, Berlin, JA (2002). Statistical characteristics of area under the receiver operating characteristic curve for a simple prognostic model using traditional and bootstrapped approaches. Journal of Clinical Epidemiology 55, 518524.CrossRefGoogle ScholarPubMed
Miller, CJ, Johnson, SL, Kwapil, TR, Carver, CS (2011). Three studies on self-report scales to detect bipolar disorder. Journal of Affective Disorders 128, 199210.CrossRefGoogle ScholarPubMed
Miller, CJ, Klugman, J, Berv, DA, Rosenquist, KJ, Ghaemi, SN (2004). Sensitivity and specificity of the Mood Disorder Questionnaire for detecting bipolar disorder. Journal of Affective Disorders 81, 167171.CrossRefGoogle ScholarPubMed
Parker, G, Fletcher, K, Barrett, M, Synnott, H, Breakspear, M, Hyett, M, Hadzi-Pavlovic, D (2008). Screening for bipolar disorder: the utility and comparative properties of the MSS and MDQ measures. Journal of Affective Disorders 109, 8389.CrossRefGoogle ScholarPubMed
Parker, G, Graham, R, Hadzi-Pavlovic, D, Fletcher, K, Hong, M, Futeran, S (2012). Further examination of the utility and comparative properties of the MSQ and MDQ bipolar screening measures. Journal of Affective Disorders 138, 104109.CrossRefGoogle ScholarPubMed
Parker, G, Hadzi-Pavlovic, D, Tully, L (2006). Distinguishing bipolar and unipolar disorders: an isomer model. Journal of Affective Disorders 96, 6773.CrossRefGoogle ScholarPubMed
Pepe, MS (2003). Statistical Analysis of Medical Tests for Classification and Prediction. Oxford University Press: New York, NY.CrossRefGoogle Scholar
Presser, S, Blair, J (1994). Survey pretesting: do different methods yield different results? Sociological Methodology 24, 73104.CrossRefGoogle Scholar
Presser, S, Couper, MP, Lessler, JT, Martin, E, Martin, J, Rothgeb, JM, Singer, E (2004). Methods for testing and evaluating survey questions. Public Opinion Quarterly 68, 109130.CrossRefGoogle Scholar
Radloff, LS (1977). The CES-D Scale: a self-report depression scale for research in the general population. Applied Psychological Measurement 1, 385401.CrossRefGoogle Scholar
Robins, LN, Wing, J, Wittchen, HU, Helzer, JE, Babor, TF, Burke, J, Farmer, A, Jablenski, A, Pickens, R, Regier, DA, Sartorius, N, Towle, L (1988). The Composite International Diagnostic Interview. An epidemiologic instrument suitable for use in conjunction with different diagnostic systems and in different cultures. Archives of General Psychiatry 45, 1069–1077.
Ruscio, AM, Lane, M, Roy-Byrne, P, Stang, PE, Stein, DJ, Wittchen, HU, Kessler, RC (2005). Should excessive worry be required for a diagnosis of generalized anxiety disorder? Results from the US National Comorbidity Survey Replication. Psychological Medicine 35, 1761–1772.
Schulberg, HC, Saul, M, McClelland, M, Ganguli, M, Christy, W, Frank, R (1985). Assessing depression in primary medical and psychiatric practices. Archives of General Psychiatry 42, 1164–1170.
Spitzer, RL, Kroenke, K, Williams, JB (1999). Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire. Journal of the American Medical Association 282, 1737–1744.
Stein, MB, Roy-Byrne, PP, McQuaid, JR, Laffaye, C, Russo, J, McCahill, ME, Katon, W, Craske, M, Bystritsky, A, Sherbourne, CD (1999). Development of a brief diagnostic screen for panic disorder in primary care. Psychosomatic Medicine 61, 359–364.
Susser, E, Schwartz, S, Morabia, A, Bromet, EJ (2006). Psychiatric Epidemiology: Searching for the Causes of Mental Disorders. Oxford University Press: New York, NY.
Thomas, JL, Jones, GN, Scarinci, IC, Mehan, DJ, Brantley, PJ (2001). The utility of the CES-D as a depression screening measure among low-income women attending primary care clinics. The Center for Epidemiologic Studies-Depression. International Journal of Psychiatry in Medicine 31, 25–40.
Tsuang, MT, Tohen, M, Jones, P (eds) (2011). Textbook of Psychiatric Epidemiology. Wiley: New York, NY.
Twiss, J, Jones, S, Anderson, I (2008). Validation of the Mood Disorder Questionnaire for screening for bipolar disorder in a UK sample. Journal of Affective Disorders 110, 180–184.
Üstün, TB, Sartorius, N (eds) (1995). Mental Illness in General Health Care: An International Study. Wiley: New York, NY.
Weber Rouget, B, Gervasoni, N, Dubuis, V, Gex-Fabry, M, Bondolfi, G, Aubry, JM (2005). Screening for bipolar disorders using a French version of the Mood Disorder Questionnaire (MDQ). Journal of Affective Disorders 88, 103–108.
Willis, GB (2005). Cognitive Interviewing: A Tool for Improving Questionnaire Design. Sage: Thousand Oaks, CA.
Wittchen, HU (1994). Reliability and validity studies of the WHO-Composite International Diagnostic Interview (CIDI): a critical review. Journal of Psychiatric Research 28, 57–84.
Wittchen, HU, Kessler, RC, Beesdo, K, Krause, P, Hofler, M, Hoyer, J (2002). Generalized anxiety and depression in primary care: prevalence, recognition, and management. Journal of Clinical Psychiatry 63 (Suppl. 8), 24–34.
Wittchen, HU, Kessler, RC, Zhao, S, Abelson, J (1995). Reliability and clinical validity of UM-CIDI DSM-III-R generalized anxiety disorder. Journal of Psychiatric Research 29, 95–110.
Wittkampf, KA, Naeije, L, Schene, AH, Huyser, J, van Weert, HC (2007). Diagnostic accuracy of the mood module of the Patient Health Questionnaire: a systematic review. General Hospital Psychiatry 29, 388–395.
Wolter, KM (1985). Introduction to Variance Estimation. Springer-Verlag: New York, NY.
Zigmond, AS, Snaith, RP (1983). The hospital anxiety and depression scale. Acta Psychiatrica Scandinavica 67, 361–370.
Zimmerman, M, Galione, JN, Ruggero, CJ, Chelminski, I, McGlinchey, JB, Dalrymple, K, Young, D (2009). Performance of the mood disorders questionnaire in a psychiatric outpatient setting. Bipolar Disorders 11, 759–765.

Table 1. Consistency of DSM-IV diagnoses based on the CIDI screening scales (CIDI-SC) at their optimal (to estimate prevalence) thresholds and based on the SCID (n = 206)

Table 2. CIDI screening scale (CIDI-SC) classification of DSM-IV/SCID cases and non-cases at different thresholds on the CIDI-SC (n = 206)a

Table 3. CIDI screening scale (CIDI-SC) sensitivity (SN) and likelihood ratio positive (LR+) for detecting severe and non-severe DSM-IV/SCID cases (n = 206)