Deconstructing major depression: a validation study of the DSM-IV symptomatic criteria

V. Lux; K. S. Kendler

doi:10.1017/S0033291709992157

Published online by Cambridge University Press: 11 January 2010

V. Lux
Affiliation: Department of Psychology, Free University Berlin, Germany
K. S. Kendler*
Affiliation: Virginia Institute for Psychiatric and Behavioral Genetics and Departments of Psychiatry and Human and Molecular Genetics, Virginia Commonwealth University, Richmond, VA, USA
* Address for correspondence: Dr K. S. Kendler, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Box 980126, Richmond, VA 23298–0126, USA. (Email: kendler@vcu.edu)

Abstract

Background

The DSM-IV symptomatic criteria for major depression (MD) derive primarily from clinical experience with modest empirical support.

Method

The sample comprised 1015 Caucasian twins (518 males, 497 females) from a population-based registry who met criteria for MD in the year prior to the interview. Logistic regression analyses were conducted to compare the associations of (1) single symptomatic criteria and (2) two groups of criteria reflecting cognitive and neurovegetative symptoms with a wide range of potential validators, including demographic factors, risk for future episodes, risk of MD in the co-twin, characteristics of the depressive episode, the pattern of co-morbidity and personality traits.

Results

The individual symptomatic criteria showed widely varying associations with the pattern of co-morbidity, personality traits, features of the depressive episode and demographic characteristics. When examined separately, the two criteria groups showed robust differences in their patterns of association with the validators, with the cognitive criteria generally producing stronger associations than the neurovegetative criteria.

Conclusions

Among depressed individuals, individual DSM-IV symptomatic criteria differ substantially in their predictive relationship with a range of clinical validators. These results challenge the equivalence assumption for the symptomatic criteria for MD and suggest a greater than expected degree of ‘covert’ heterogeneity among these criteria. Part of this heterogeneity is captured by the distinction between cognitive and neurovegetative symptoms, with cognitive symptoms being more strongly associated with most clinically relevant characteristics. Detailed psychometric evaluation of DSM-IV criteria is overdue.

Keywords

Criteria symptoms; DSM-IV; heterogeneity; major depression; psychiatric diagnosis

Type: Original Articles
Information: Psychological Medicine, Volume 40, Issue 10, October 2010, pp. 1679–1690

DOI: https://doi.org/10.1017/S0033291709992157
Copyright © Cambridge University Press 2010

Introduction

A key component of the current approach to the diagnosis of major depression (MD) in DSM-IV is the symptomatic or ‘A’ criteria. These nine criteria (‘depressed mood’, ‘loss of interest’, ‘appetite or weight change’, ‘hyper- or insomnia’, ‘psychomotor agitation or retardation’, ‘fatigue’, ‘feelings of worthlessness or guilt’, ‘diminished ability to think or concentrate or indecisiveness’ and ‘suicidal ideation or thoughts of death’) derive, with only modest alteration, from the list of symptoms proposed in the Feighner et al. (1972) criteria. These criteria were created by consensus of senior clinicians at Washington University in the early 1970s and were particularly influenced by a study by Cassidy et al. (1957) that analyzed symptom frequencies in manic-depressive patients as compared with those with medical illnesses (Cassidy et al. 1957; Zimmerman et al. 2006b; Kendler et al. 2009). The symptoms included in the Cassidy study were, in turn, partly based on those utilized in a 1950 report, which examined symptoms in 50 cases of melancholia (Stone & Burris, 1950).

Since then, many studies have examined the diagnostic reliability of this classification (for example, Spitzer & Fleiss, 1974; Spitzer et al. 1978; Keller et al. 1995), the differentiation between clinical and subclinical depression (for example, Kendler & Gardner, 1998; Parker et al. 1998), the validity of subtypes (Frances et al. 1990; Kendler et al. 1996; Schotte et al. 1997; Chen et al. 2000; Sullivan et al. 2002; Carragher et al. 2009) and the relationship between single symptoms and depressive subtypes with severity of MD (Faravelli et al. 1996). By contrast, we are aware of only two studies that have addressed, even indirectly, the validity of these widely used symptomatic criteria for MD (McGlinchey et al. 2006; Zimmerman et al. 2006a). Both concluded that further efforts at validation were needed. Zimmerman et al. (2006a) examined the content validity of the DSM-IV classification of MD and found the criteria were somewhat redundant. The same research group examined the ability of single depressive symptoms to differentiate between patients with and without a diagnosis of MD and found that the individual criteria performed quite differently in this regard (McGlinchey et al. 2006).

In this report, we examine the validity of the DSM-IV symptomatic criteria for MD in a sample of ∼1000 cases of MD reported in the last year in a population-based cohort of twins (Kendler & Prescott, 2006). First, we examine the ability of the individual criteria to predict uniquely a broad set of validators, including risk of MD in co-twin, demographic characteristics, lifetime co-morbidities, characteristics at index episode, prior depression history and risk for future episodes. Based on these results, we then aggregate two sets of symptomatic MD ‘A’ criteria and compare their performance on the same set of validators.

Methods

Sample and measurements

Participants in this report derive from two inter-related studies in Caucasian same-sex twin pairs from the Virginia Adult Twin Study of Psychiatric and Substance Use Disorders (VATSPSUD) (Kendler & Prescott, 2006). All subjects were ascertained from the Virginia Twin Registry – a population-based register formed from a systematic review of birth certificates in the Commonwealth of Virginia. Female–female (FF) twin pairs, from birth years 1934–1974, became eligible if both members previously responded to a mailed questionnaire in 1987–1988, the response rate to which was ∼64%. The first face-to-face interview (FF1) was completed by 92% (n=2163) of the eligible twins. These twins participated in three subsequent interviews with cooperation rates ranging from 85 to 93%. Participating twin pairs were identified for the male–male and male–female (MMMF) cohort from twins with birth years 1940–1974 initially ascertained directly from registry records. The first MMMF interview (MMMF1) was conducted by telephone with a response rate of 72% (n=6812). This sample was re-interviewed once with an 83% response rate. Zygosity was determined by discriminant function analyses using standard twin questions validated against DNA genotyping in 496 pairs (Kendler & Prescott, 1999).

MD diagnoses were based on a module in the FF1 and MMMF1, where every subject was asked whether they experienced each of the disaggregated criteria symptoms for DSM-IV MD in the year prior to the interview. Separate questions were asked for psychomotor agitation and retardation, insomnia and hypersomnia, weight loss and gain and appetite increase and decrease.

DSM-IV criteria for last-year MD were met by 217 twins from the FF1 and 798 twins from the MMMF1 interviews; the prevalence rates were 10% [95% confidence interval (CI) 8.8–11.4%] in the FF1 and 11.7% (95% CI 11.0–12.5%) in the MMMF1. Of these 1015 twins, 518 were males and 497 were females. At the time of the first interview, their ages ranged from 18 to 57 years with a mean of 34.5. There were 83 twin pairs in which both twins were diagnosed with MD, and two more pairs from two triplets who were diagnosed with MD.
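
The reported prevalence intervals can be checked from the counts given in the text (217 of 2163 for FF1, 798 of 6812 for MMMF1). The interval method is not stated; the following minimal Python sketch assumes a Wilson score interval, which agrees with the reported bounds after rounding to one decimal place. The function name `wilson_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(cases, n, level=0.95):
    """Wilson score interval for a proportion.

    Assumption: the paper does not name its interval method; Wilson is
    used here only because it matches the reported bounds after rounding.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = cases / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts taken from the text: 217 of 2163 (FF1) and 798 of 6812 (MMMF1).
ff1_lo, ff1_hi = wilson_interval(217, 2163)
mmmf1_lo, mmmf1_hi = wilson_interval(798, 6812)
print(f"FF1:   {217 / 2163:.1%} ({ff1_lo:.1%} to {ff1_hi:.1%})")
print(f"MMMF1: {798 / 6812:.1%} ({mmmf1_lo:.1%} to {mmmf1_hi:.1%})")
```

A simple Wald interval gives slightly different bounds for the smaller FF1 sample, which is why the Wilson form is assumed here.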

For our first analysis, we created binary dummy variables for each of the nine criteria, indicating the presence of each as part of the depressive syndrome. A disaggregated criterion was counted as present when at least one of its disaggregated symptoms was reported (e.g. ‘weight loss’ for appetite/weight change). In addition, a criterion was not counted as part of the depressive syndrome when it was reported as due to medication or illness.

For comparing the two symptom groups – cognitive and neurovegetative – we counted the number of endorsed symptoms separately for each group. Cognitive symptoms included ‘depressed mood’, ‘loss of interest’, ‘worthlessness/guilt’ and ‘suicidal ideation’ and neurovegetative symptoms included ‘sleep’, ‘appetite/weight’, ‘psychomotor’ changes and ‘fatigue’. ‘Trouble concentrating’ was excluded from this analysis (see Results for details).
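
The coding rules just described can be sketched in Python as follows. All item names are hypothetical, since the actual VATSPSUD interview variables are not given, and applying the medication/illness exclusion per item rather than per criterion is also an assumption.

```python
# Hypothetical item names; the real interview variable names are not given.
DISAGGREGATED = {
    "depressed_mood": ["depressed_mood"],
    "loss_of_interest": ["loss_of_interest"],
    "appetite_weight": ["appetite_increase", "appetite_decrease",
                        "weight_gain", "weight_loss"],
    "sleep": ["insomnia", "hypersomnia"],
    "psychomotor": ["agitation", "retardation"],
    "fatigue": ["fatigue"],
    "worthlessness_guilt": ["worthlessness_guilt"],
    "trouble_concentrating": ["trouble_concentrating"],
    "suicidal_ideation": ["suicidal_ideation"],
}
COGNITIVE = ["depressed_mood", "loss_of_interest",
             "worthlessness_guilt", "suicidal_ideation"]
NEUROVEGETATIVE = ["sleep", "appetite_weight", "psychomotor", "fatigue"]


def criterion_dummies(responses, excluded=frozenset()):
    """Return a 0/1 dummy per criterion.

    responses: dict mapping item name -> True if endorsed.
    excluded:  items attributed to medication or illness (assumed per item).
    A criterion is present if at least one of its items counts.
    """
    return {
        criterion: int(any(responses.get(item, False) and item not in excluded
                           for item in items))
        for criterion, items in DISAGGREGATED.items()
    }


def group_counts(dummies):
    """Cognitive and neurovegetative counts; trouble concentrating is left out."""
    cognitive = sum(dummies[c] for c in COGNITIVE)
    neurovegetative = sum(dummies[c] for c in NEUROVEGETATIVE)
    return cognitive, neurovegetative


example = {"depressed_mood": True, "weight_loss": True, "insomnia": True,
           "fatigue": True, "worthlessness_guilt": True,
           "trouble_concentrating": True}
dummies = criterion_dummies(example, excluded={"fatigue"})
cognitive_count, neurovegetative_count = group_counts(dummies)
```

In this made-up example, fatigue is dropped because it is attributed to illness, so the respondent has two cognitive and two neurovegetative criteria, with trouble concentrating counted toward the total but toward neither group.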

The VATSPSUD includes a rich set of data about future episodes, co-twin history of MD, lifetime co-morbidities, demographic characteristics and characteristics of the index depressive episode (Kendler & Prescott, 2006). For demographic characteristics, characteristics of the index depressive episode and last year co-morbidity with generalized anxiety disorder (GAD), the data came from the same interview wave. For future episodes, depressive episodes of the co-twin and all other co-morbidities, data were obtained from all interview waves to maximize available information.

GAD was diagnosed using the DSM-III-R criteria (APA, 1987) requiring a minimum of 1 month duration. Panic disorder was also diagnosed using the DSM-III-R criteria. We also examined a broad definition of panic (‘panic broad’), for which the only criterion was a positive response to a probe question for lifetime panic attacks (‘Thinking back over your entire life, have you ever had a spell or attack when you suddenly felt frightened or extremely uncomfortable in a situation in which you didn't expect to feel that way?’). Real danger or clear phobic stimuli were excluded. ‘Any phobia’ was diagnosed using an adaptation of DSM-III criteria (APA, 1980) requiring one or more unreasonable fears, including fears of different animals, social phobia and agoraphobia, that objectively interfered with the respondent's life. Nicotine dependence was defined as a score ⩾7 on the Fagerström Tolerance Questionnaire (Fagerström, 1978) and alcohol dependence and illicit drug dependence were diagnosed using DSM-IV criteria (APA, 1994). Adult antisocial personality traits were defined as meeting three or more of the DSM-III-R (APA, 1987) ‘C criteria’ for antisocial personality disorder. Extraversion was assessed with eight and neuroticism with 12 items from the short form of the self-administered Eysenck Personality Questionnaire (Eysenck et al. 1985). For ‘co-occurring anxiety symptoms’ we used a binary variable indicating whether the respondent endorsed at least one of two anxiety symptoms for which we had separate questions: ‘felt anxious, nervous or worried’ and ‘muscles felt tense or felt jumpy or shaky inside’ lasting at least 5 days in the last year prior to the interview. ‘Chronic MD’ was defined as a depressive episode lasting 12 months or longer.
For ‘something happened before the depressive episode’, we used a question asking whether ‘something happened to make you feel that way or did the feeling just come on you “out of the blue”?’ for the worst depressive episode in the last year. ‘Seeking help’ was assessed by a question asking whether the respondent went to get help from health professionals, ministers, self-help groups or anyone else. Finally, childhood sexual abuse was defined by a positive response to the question: ‘Have you ever been sexually abused or molested?’ and when the abuse happened before the age of 16 years.

Data analysis

We conducted logistic regression analyses with the LOGISTIC procedure in SAS (SAS Institute, 2005). Our approach was to compare the odds ratios (ORs) of each test to examine possible patterns of differences, with the p values indicating the general probability of results arising from chance. (For our approach to the question of multiple testing, see Limitations.) For the first set of analyses, the validator variable functioned as the dependent variable, while the dummy variable indicating that the specific symptom was part of the syndrome was the predictor variable. For the second set of analyses, which compared the two symptom groups, the number of cognitive or number of neurovegetative symptoms was the predictor variable. Depending on the validator variable, we fitted a binary or cumulative logit model to the data, with age and sex as covariates. In addition, when comparing the single symptoms, we included the number of positive criteria symptoms as a covariate in the model to control for the overall clinical severity of the depressive episode as indexed by the number of endorsed criteria. When we examined the two symptom groups, the symptom counts for both groups were included in the model to obtain the unique predictive power of the cognitive and the neurovegetative symptom count, controlling for the number of symptoms endorsed in the other symptom group. In addition, we tested whether the observed differences in the ORs between the two groups were significant using the TEST statement of the LOGISTIC procedure in SAS (SAS Institute, 2005).
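
In notation, the two binary logit specifications described above might be written as follows. The symbols are illustrative shorthand and do not appear in the original analysis: Y_i is a binary validator for subject i, S_ij the dummy for criterion j, T_i the total number of endorsed criteria, and C_i and N_i the cognitive and neurovegetative counts. The cumulative logit variant for ordinal validators is not shown.

```latex
% Single-criterion model: adjusted for age, sex and total criteria count
\operatorname{logit}\Pr(Y_i = 1)
  = \beta_0 + \beta_j S_{ij} + \beta_a\,\mathrm{age}_i + \beta_s\,\mathrm{sex}_i + \beta_t T_i,
\qquad \mathrm{OR}_j = e^{\beta_j}

% Symptom-group model: both counts entered together, no adjustment for T_i
\operatorname{logit}\Pr(Y_i = 1)
  = \gamma_0 + \gamma_C C_i + \gamma_N N_i + \gamma_a\,\mathrm{age}_i + \gamma_s\,\mathrm{sex}_i,
\qquad H_0:\ \gamma_C = \gamma_N
```

Under this reading, the test of the difference between the two group ORs corresponds to the null hypothesis shown on the last line.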

When risk of MD in co-twin was the dependent variable, zygosity was included as covariate in the model. p Values are reported two-tailed except for risk of MD in co-twin, where we report one-tailed values, given the prior prediction of twin resemblance.
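
As a rough illustration of how a fitted coefficient maps to the quantities reported in the tables, the sketch below converts a log-odds estimate and its standard error into an OR, a 95% CI and a two-tailed or one-tailed p value. It assumes a Wald z test on the normal scale; the helper name and the example numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def summarize_coefficient(beta, se, one_tailed=False, level=0.95):
    """Convert a logit coefficient and its SE to (OR, CI, p value).

    Assumes a Wald z test. The one-tailed p value is directional and
    tests for a positive association (beta > 0) only.
    """
    norm = NormalDist()
    z = beta / se
    half_width = norm.inv_cdf(0.5 + level / 2) * se
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - half_width), math.exp(beta + half_width))
    if one_tailed:
        p_value = 1.0 - norm.cdf(z)
    else:
        p_value = 2.0 * (1.0 - norm.cdf(abs(z)))
    return odds_ratio, ci, p_value


# Hypothetical coefficient: log(1.5) with a standard error of 0.2.
beta_example, se_example = math.log(1.5), 0.2
or_two, ci_two, p_two = summarize_coefficient(beta_example, se_example)
or_one, ci_one, p_one = summarize_coefficient(beta_example, se_example,
                                              one_tailed=True)
print(f"OR {or_two:.2f}, 95% CI {ci_two[0]:.2f} to {ci_two[1]:.2f}")
print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```

For a positive estimate, the one-tailed p value is half the two-tailed one, which is why the co-twin analyses can reach nominal significance more easily than the others.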

In interpreting these results, it is important to note that the first set of analyses, which examined individual criteria, controlled for the total number of endorsed symptoms. In these analyses, an OR <1 would indicate that, compared with all subjects who endorsed n criteria, those who endorsed symptom X predicted the validator more poorly. However, in our analyses of criterion groups, controlling for the number of endorsed symptoms had the undesirable effect of causing a strong negative correlation between the predictive effects of the two symptom groups. Therefore, in these analyses, where we did not control for total endorsed criteria, we would expect nearly all ORs to exceed 1, as more symptoms tend to predict these validators. Our goal here was to compare the strength of these ORs between the two symptom groups.

Results

Logistic regression analyses of single symptoms

The first nine columns of Table 1 show the results from the logistic regression analyses of the nine A criteria with each of the 25 validator variables. ORs, 95% CI and levels of statistical significance are presented, controlling for age, sex and the total number of endorsed criteria. For the cumulative logit models, the ORs reflect the impact of a 1 s.d. change in the dependent variable.

Table 1. Comparison of major depression (MD) criteria symptoms

^a One-tailed p values, regression model with zygosity as additional covariate;

n, sample size, differences in sample sizes throughout validators due to missing data; BL, binary logit model; CL, cumulative logit model, covariates included in the model: age, sex and criteria symptom count; xxx, no valid regression model; OR, odds ratio; CI, confidence interval.

* p<0.1, shaded cells are significant at the p<0.05 level, ** p<0.05, *** p<0.01.

† p<0.001, ‡ p<0.0001.

In total, 36 of the 225 analyses (16%) were statistically significant (p<0.05), far in excess of chance expectations (Feild & Armenakis, 1974). Loss of interest was the symptom with the fewest (only one) and suicidal ideation was the symptom with the most significant results (nine). The means (s.d.) of the ORs for the nine criteria across all the analyses ranged from 1.99 (1.18) for depressed mood to 1.17 (0.14) for trouble concentrating.
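
A rough sense of how far 36 significant results exceed chance can be obtained with a binomial calculation. The sketch below assumes, unrealistically, that the 225 tests are independent and that each has a 5% false-positive rate. The symptoms and validators are correlated, so the result is only an order-of-magnitude guide and is not necessarily the procedure used in the original analysis.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests, n_significant, alpha = 225, 36, 0.05
expected_by_chance = n_tests * alpha  # about 11 of 225 tests
tail_probability = binomial_upper_tail(n_significant, n_tests, alpha)
print(f"expected by chance: {expected_by_chance:.2f}")
print(f"P(at least {n_significant} significant): {tail_probability:.2e}")
```

Under these simplifying assumptions about 11 significant results would be expected, and observing 36 or more would be very improbable.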

These results reveal a pattern of relationships between the criteria and the validators that was unexpectedly complex and variable. Of the diverse set of findings, nine were noteworthy.

(1) Regarding co-morbidities with anxiety disorders, psychomotor agitation/retardation predicted co-morbidity with GAD and was strongly positively associated with prominent anxiety symptoms, whereas appetite/weight change was negatively associated with the anxiety symptoms. In addition, suicidal ideation positively predicted fully syndromal panic disorder.
(2) With regard to co-morbidities with substance use disorders and adult antisocial personality traits, suicidal ideation positively predicted risk for illicit drug dependence and adult antisocial personality traits, while appetite or weight changes were negatively associated with both these validator variables. Yet, only worthlessness/guilt predicted co-morbidity with alcohol dependence and nicotine dependence, while fatigue was negatively associated with nicotine dependence.
(3) None of the nine A criteria was significantly related to a new episode of MD or a diagnosis of MD in the co-twin. However, examining non-significant findings (p>0.05), we observed a heterogeneous pattern of relationships for these validators. For example, loss of interest showed a positive but depressed mood a negative association with risk for future episodes. Problems sleeping and fatigue tended to be positively related and depressed mood and suicidal ideation negatively related to the risk for MD in the co-twin.
(4) Significant sex differences emerged for four of the criteria; depressed mood, appetite/weight changes and fatigue significantly predicted being female, while psychomotor agitation/retardation was associated with being male.
(5) The criteria varied substantially in their relationships to other demographic characteristics. For example, age differences were found for two criteria; depressed mood and psychomotor agitation/retardation had a significant positive correlation with current age. In addition, four symptoms were associated with education, with sleep changes and fatigue being associated with more, and psychomotor agitation/retardation and suicidal ideation with fewer, years of education. Also, depressed mood was associated with lower and trouble concentrating with higher family income.
(6) Regarding the two personality traits – neuroticism and extraversion – feeling worthless/guilty and suicidal ideation predicted higher neuroticism scores, while appetite/weight change and sleep changes were associated with lower neuroticism scores. In addition, worthlessness/guilt was associated with a lower extraversion score.
(7) Only suicidal ideation predicted a longer duration of the index episode and chronic depression, while fatigue was positively associated with shorter episodes and negatively associated with ‘chronic depression’ (defined as lasting ⩾12 months).
(8) Loss of interest was associated with episodes precipitated by a particular life event (‘something happened’), while suicidal ideation was associated with experiencing the depressive episode ‘out of the blue’.
(9) None of the symptoms was significantly associated with the number of prior depressive episodes, while only sleep problems predicted (older) age of first onset of MD.

Logistic regression analyses of the two symptom groups

A review of the results obtained at the individual symptom level reveals a tendency for predictions from neurovegetative symptoms (most typically appetite/weight, sleep problems, psychomotor changes and fatigue) to differ qualitatively from those obtained for the ‘cognitive/emotional’ symptoms (hereafter ‘cognitive’) of sad mood, loss of interest, guilt and suicidal ideation. For example, a significantly elevated risk for alcohol, nicotine and illicit drug dependence and antisocial traits is associated with symptoms of guilt or suicidal ideation, whereas a significantly decreased risk is associated with symptoms of fatigue or appetite changes. Levels of neuroticism are positively associated with guilt or suicidal ideation but negatively associated with appetite and sleep changes. Fatigue predicts shorter and suicidal ideation longer depressive episodes. Similar trends that do not reach significance are seen for co-morbidity with panic disorder (positive with suicidal ideation and negative with appetite and sleep problems) and alcoholism (positive with guilt and negative with appetite and sleep problems), help-seeking (positive with suicidal ideation and negative with psychomotor changes and fatigue) and age at onset (positive with sleep problems and negative with depressed mood, loss of interest and suicidal ideation). While this trend was far from uniform (and did not appear to involve trouble concentrating in either group), we judged it sufficiently prominent to warrant further investigation.

We therefore examined, in our sample of individuals with MD in the last year, the ability of the sum of the symptoms endorsed in these two groups to predict the same set of validators we examined for the individual symptoms. Table 2 shows the results of these analyses.

Table 2. Comparison of cognitive and neurovegetative criteria count (logistic regression)

n, Sample size, differences due to missing data; BL, binary logit model; CL, cumulative logit model, covariates included in the model: sex, age and symptom count of the other symptom group; MD, major depression; OR, odds ratio; CI, confidence interval.

Cognitive symptoms: depressed mood; loss of interest; feeling worthless/guilty; suicidal ideation.

Neurovegetative symptoms: appetite/weight change; sleep; psychomotor agitation/retardation; fatigue.

Not counted: trouble concentrating.

^a One-tailed p values, zygosity included as covariate in the model.

Two-tailed p values:

* p<0.1, ** p<0.05, *** p<0.01, † p<0.001, ‡ p<0.0001.

ORs, 95% CI and levels of statistical significance are presented, controlling only for age, sex and the number of endorsed criteria symptoms of the other symptom group. We also controlled for zygosity when the ‘MD diagnosis in co-twin’ was the dependent variable. For the cumulative logit models, the ORs of 1 s.d. increase of the dependent variable are reported in the table.

Strikingly, for 17 of the 25 validator variables, the predictive power of the two symptom groups was significantly different from one another. Out of these results, eight were noteworthy.

(1) The number of endorsed cognitive criteria better predicted co-morbidities with other disorders than did the number of neurovegetative criteria, and these differences were significant except for GAD and our broad definition of panic.
(2) The number of cognitive criteria better predicted both higher neuroticism scores and lower extraversion scores and these differences were also significant.
(3) Regarding characteristics of the index episode, the cognitive criteria count significantly better predicted the duration of the episode, chronic MD defined as a depressive episode lasting longer than 12 months and seeking help, while the number of neurovegetative criteria better predicted that the depressive episode occurred ‘out of the blue’.
(4) The cognitive symptom count also significantly better predicted the number of prior episodes.
(5) The two symptom scores did not predict age at interview while the neurovegetative criteria count better predicted being female; this difference was significant.
(6) Regarding the other demographic characteristics, the cognitive criteria count was associated with fewer years of schooling and lower income; these differences were also significant.
(7) The cognitive criteria count significantly better predicted childhood sexual abuse.
(8) We found no significant differences between the two criteria counts regarding the prediction of a new depressive episode and an MD diagnosis in the co-twin.

Discussion

The aim of our analysis was to examine carefully the validity of the individual DSM-IV symptomatic or ‘A’ criteria for MD. In a large epidemiological sample of individuals who reported episodes of MD in the last year, we began by comparing the nine criteria on a wide range of potentially validating characteristics, none of which played any role in the diagnostic process. Our analyses focused on potential similarities and differences in the patterns of association of the nine criteria symptoms with these characteristics. In these analyses, we controlled for the number of endorsed A criteria so that an OR >1 meant that the presence of this criterion predicted the validator more strongly than expected for individuals with the same number of endorsed symptoms without that particular criterion. An OR <1 meant that the presence of this criterion predicted the validator more poorly than expected for individuals with the same number of endorsed symptoms without that particular criterion.

These results revealed an unexpected degree of heterogeneity in the performance of the individual criteria. Indeed, each of the nine criteria had a relatively unique pattern of findings with regard to our set of validator variables, suggesting a surprising degree of evidence for ‘covert heterogeneity’ within the MD syndrome. The relationships we observed support the notion of heterogeneity of the classification of MD discussed in the literature (e.g. Merikangas et al. 1994; Kendler et al. 1996; Chen et al. 2000; Carragher et al. 2009; Hybels et al. 2009).

To further characterize this heterogeneity, we also examined an intermediate level between the MD diagnosis and the single criterion by comparing the number of endorsed criteria separately for two symptom groups – cognitive and neurovegetative – through the same set of validator variables. Of note, in recent exploratory factor analyses of depressive symptoms run with the entire VATSPSUD sample, we found a good statistical fit for a two-factor model with one factor reflecting cognitive and the other neurovegetative symptoms. When we compared these two symptom groups, we did not control for the total number of endorsed symptoms. Rather, we specifically examined whether the predictive power differed between the two groups of criteria. As our results show, some of this heterogeneity can be represented by the differences between the number of endorsed cognitive and neurovegetative criteria, with these differences being significant for most of the validator variables. Therefore, co-morbidities with other psychiatric disorders and the personality traits – higher neuroticism and lower extraversion – were most strongly associated with the cognitive criteria group, while we found significant sex differences (predicting being female) for the number of endorsed neurovegetative criteria.

Some of our individual findings have precedents in the literature. For example, Keller et al. (2007) found that thoughts of self-harm were not associated with adverse life events, which is partly reflected in our finding that suicidal ideation predicted the depressive episode being experienced ‘out of the blue’. Although sex differences for psychomotor changes and fatigue have already been reported for our sample by Khan et al. (2002), our approach was probably not sensitive enough to also show the sleep differences that Khan et al. reported in their study of matched twins. Corresponding to our results, Angst and Dobler-Mikola (1984) found that females report weight change more often when diagnosed with MD. Their finding of females reporting more feelings of worthlessness/guilt is not supported by our results. In addition, other studies found no sex differences for expressed depressive symptoms (Middeldorp et al. 2006). Furthermore, some associations between substance use disorders and psychomotor agitation and appetite/weight loss were reported in the literature (for example, Balazs et al. 2006; Leventhal et al. 2008), but they are not comparable with our results because of the use of disaggregated symptoms in these studies.

The question of homogeneity versus heterogeneity of MD, including the distinction of ‘types’ of depression by symptom groups or causes (endogenous versus reactive), has a long history in psychiatry (for the homogeneity/unitary position, see Mapother, 1926; Lewis, 1934; Kendell, 1969; Akiskal & McKinney, 1975; for the heterogeneity/binary position, see, for example, Gillespie, 1929; Kiloh & Garside, 1963; Eysenck, 1970). Because of a lack of empirical support for a clear binary concept of MD, the DSM-III and DSM-IV definitions were based on the unitary position, accompanied by the hope of finally settling these debates (see Parker, 2000; van Praag, 2000). However, the validity of the current DSM-IV classification of MD has been questioned (see e.g. Parker, 2005; Zimmerman et al. 2006b). The amount of heterogeneity that we found when comparing the performance of criteria symptoms supports these concerns. Yet, our results indicate that only some of this heterogeneity can be captured by a simple binary structure of cognitive versus neurovegetative symptoms.

Our results also have implications for the basic structure of DSM diagnostic categories. Beginning with DSM-III, ‘polythetic’ diagnostic models formed the basis for DSM diagnoses. Such models implicitly assume an approximate equivalence of the individual criteria – that is, that each criterion plays a similar and roughly interchangeable role in the diagnostic process. Our results question this assumption for the MD symptomatic criteria. In fact, we found that different symptoms or groups of symptoms of MD are associated with different clinical characteristics.

While each DSM revision has been justifiably proud of the increasing role of empirical findings in making nosologic decisions, in fact most of the diagnostic categories and the diagnostic criteria they contain have been accepted for historical rather than strictly empirical reasons (Kendler & Zachar, 2008). Given the deep clinical wisdom and experience contained in these traditions, this is not an illogical approach. However, these results provide some challenge to this approach and, at a minimum, suggest that detailed psychometric evaluation of these historically defined diagnostic entities is overdue.

Finally, we find that cognitive–emotional criteria are rather consistently stronger predictors of our validators than are the neurovegetative criteria. In contrast, common clinical teaching highlights the neurovegetative or ‘biological’ symptoms as the ‘core’ features of depression. These results are broadly consistent with the work of Beck and colleagues, which emphasizes the primacy of cognitive changes in the etiology of MD (Beck & Alford, 2008).

Limitations

These results should be interpreted in the light of four potentially important methodological concerns. First, our sample is limited to white twins born in the Commonwealth of Virginia and these results may or may not extrapolate to other samples. Second, the nature of our validator variables made it difficult to account formally, in most of our analyses, for the non-independence of observations in our twin data. However, only ∼17% of our data result from twin pairs and by exploring formal corrections for the binary logit models we found no substantial effects. Third, given the number of tests we performed, some of the significant findings are surely a result of chance effects. To correct formally for multiple testing, a strict Bonferroni correction (see Abdi, 2007) would require setting the p value threshold to around 0.0002. This would surely be too conservative, because of the substantial correlations between a number of symptoms and between some of the validators. Furthermore, our goal here is not so much to strictly evaluate every single hypothesis; rather, we advocate taking an experiment-wide approach, asking whether the broad pattern of relationships between symptoms and validators is likely to be a chance effect. As we demonstrate, this is very unlikely to be the case, as the total number of significant findings far exceeds that expected under the null hypothesis. While we cannot be confident that any one of our findings is not due to chance effects, the probability that the entire pattern of results arose solely from false positives due to multiple testing is extremely low.
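The arithmetic behind the multiple-testing argument can be sketched as follows. The number of tests is not stated in this section; the value of 250 used below is an assumption obtained by dividing a nominal alpha of 0.05 by the quoted threshold of about 0.0002, and the observed count of significant findings is purely hypothetical. The binomial tail also assumes independent tests, which, as noted above, does not hold for correlated symptoms and validators, so it should be read only as a rough guide.

```python
from math import comb

ALPHA = 0.05      # nominal per-test significance level
N_TESTS = 250     # assumption: roughly 0.05 / 0.0002, not a figure from the paper
OBSERVED = 40     # hypothetical count of nominally significant findings


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n_tests, alpha), i.e. k or more false positives
    when every null hypothesis is true and the tests are independent."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


threshold = bonferroni_threshold(ALPHA, N_TESTS)
expected_false_positives = ALPHA * N_TESTS
tail = prob_at_least(OBSERVED, N_TESTS, ALPHA)

print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Expected significant results under the null: {expected_false_positives:.1f}")
print(f"P(at least {OBSERVED} significant by chance): {tail:.2e}")
```

The experiment-wide question is then whether the observed number of significant findings is plausible given the expected count under the null, rather than whether each individual test survives the Bonferroni threshold.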

Fourth, we created the two symptom groups based partly on our initial results and partly on conceptual and clinical grounds. Only the decision to drop ‘trouble concentrating’ was based on empirical results from our single symptom analysis. Other groupings of symptoms might have revealed other structures. However, the distinction between cognitive and neurovegetative symptoms has a long precedent in the depression literature (Koenig et al. 1993; Beck & Alford, 2008; Carragher et al. 2009) and has been empirically supported by factor analyses conducted subsequently in this sample.

Conclusion

In individuals meeting DSM-IV criteria for MD in the last year in an epidemiological sample, the individual symptomatic criteria showed strongly varying patterns of relationships with a broad set of clinically relevant validators including demographic characteristics, co-morbidities, characteristics of depressive episodes and personality traits. This pattern is partly reflected by a dichotomy between cognitive and neurovegetative symptoms. Therefore, regarding the characteristics represented by endorsed symptoms, the current DSM-IV definition of MD represents a construct of substantial heterogeneity. These results challenge our understanding of MD as a homogeneous categorical entity. Hopefully, future research will determine whether a distinction between cognitive and neurovegetative symptoms is clinically relevant to the etiology, course and treatment of MD.

Acknowledgements

This work was supported by the American Psychiatric Association and NIH grants MH-0828 and MH/DA/AA 49492.

Declaration of Interest

None.

References

Abdi, H (2007). Bonferroni and Sidak corrections for multiple comparisons. In Encyclopedia of Measurement and Statistics (ed. Salkind, N. J.), pp. 103–107. Sage: Thousand Oaks, CA.

Akiskal, HS, McKinney, WT Jr. (1975). Overview of recent research in depression. Integration of ten conceptual models into a comprehensive clinical frame. Archives of General Psychiatry 32, 285–305.

Angst, J, Dobler-Mikola, A (1984). Do the diagnostic criteria determine the sex ratio in depression? Journal of Affective Disorders 7, 189–198.

APA (1980). Diagnostic and Statistical Manual of Mental Disorders, 3rd edn. American Psychiatric Association: Washington, DC.

APA (1987). Diagnostic and Statistical Manual of Mental Disorders – Revised, 3rd edn. American Psychiatric Association: Washington, DC.

APA (1994). Diagnostic and Statistical Manual of Mental Disorders, 4th edn. American Psychiatric Association: Washington, DC.

Balazs, J, Benazzi, F, Rihmer, Z, Rihmer, A, Akiskal, KK, Akiskal, HS (2006). The close link between suicide attempts and mixed (bipolar) depression: implications for suicide prevention. Journal of Affective Disorders 91, 133–138.

Beck, AT, Alford, BA (2008). Depression: Causes and Treatment, 2nd edn. University of Pennsylvania Press: Pennsylvania.

Carragher, N, Adamson, G, Bunting, B, McCann, S (2009). Subtypes of depression in a nationally representative sample. Journal of Affective Disorders 113, 88–99.

Cassidy, WL, Flanagan, NB, Spellman, M, Cohen, ME (1957). Clinical observations in manic-depressive disease: a quantitative study of one hundred manic-depressive patients and fifty medically sick controls. Journal of the American Medical Association 164, 1535–1546.

Chen, LS, Eaton, WW, Gallo, JJ, Nestadt, G (2000). Understanding the heterogeneity of depression through the triad of symptoms, course and risk factors: a longitudinal, population-based study. Journal of Affective Disorders 59, 1–11.

Eysenck, HJ (1970). Classification of depressive illnesses. British Journal of Psychiatry 117, 241–250.

Eysenck, SBG, Eysenck, HJ, Barrett, P (1985). A revised version of the psychoticism scale. Personality and Individual Differences 6, 21–29.

Fagerström, KO (1978). Measuring degree of physical dependence to tobacco smoking with reference to individualization of treatment. Addictive Behaviors 3, 235–241.

Faravelli, C, Servi, P, Arends, JA, Strik, WK (1996). Number of symptoms, quantification, and qualification of depression. Comprehensive Psychiatry 37, 307–315.

Feighner, JP, Robins, E, Guze, SB, Woodruff, RA Jr., Winokur, G, Munoz, R (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry 26, 57–63.

Feild, H, Armenakis, A (1974). On use of multiple tests of significance in psychological research. Psychological Reports 35, 427–431.

Frances, A, Pincus, HA, Widiger, TA, Davis, WW, First, MB (1990). DSM-IV: work in progress. American Journal of Psychiatry 147, 1439–1448.

Gillespie, RD (1929). The clinical differentiation of types of depression. Guy's Hospital Report 9, 306–344.

Hybels, CF, Blazer, DG, Pieper, CF, Landerman, LR, Steffens, DC (2009). Profiles of depressive symptoms in older adults diagnosed with major depression: latent cluster analysis. American Journal of Geriatric Psychiatry 17, 387–396.

Keller, MB, Klein, DN, Hirschfeld, RMA, Kocsis, JH, McCullough, JP, Miller, I, First, MB, Holzer, CP, Keitner, GI, Marin, DB, Shea, T (1995). Results of the DSM-IV mood disorders field trial. American Journal of Psychiatry 152, 843–849.

Keller, MC, Neale, MC, Kendler, KS (2007). Association of different adverse life events with distinct patterns of depressive symptoms. American Journal of Psychiatry 164, 1521–1529.

Kendell, RE (1969). Continuum model of depressive illness. Proceedings of the Royal Society of Medicine – London 62, 335–339.

Kendler, KS, Eaves, LJ, Walters, EE, Neale, MC, Heath, AC, Kessler, RC (1996). The identification and validation of distinct depressive syndromes in a population-based sample of female twins. Archives of General Psychiatry 53, 391–399.

Kendler, KS, Gardner, CO Jr. (1998). Boundaries of major depression: an evaluation of DSM-IV criteria. American Journal of Psychiatry 155, 172–177.

Kendler, KS, Munoz, RA, Murphy, G (2009). The development of the Feighner criteria: an historical perspective. American Journal of Psychiatry. Published online: 15 December 2009. doi:10.1176/appi.ajp.2009.09081155.

Kendler, KS, Prescott, CA (1999). A population-based twin study of lifetime major depression in men and women. Archives of General Psychiatry 56, 39–44.

Kendler, KS, Prescott, CA (2006). Genes, Environment, and Psychopathology: Understanding the Causes of Psychiatric and Substance Use Disorders. Guilford Press: New York.

Kendler, KS, Zachar, P (2008). The incredible insecurity of psychiatric nosology. In Philosophical Issues in Psychiatry (ed. Kendler, K. S. and Parnas, J.), pp. 368–383. The Johns Hopkins University Press: Baltimore, MD.

Khan, AA, Gardner, CO, Prescott, CA, Kendler, KS (2002). Gender differences in the symptoms of major depression in opposite-sex dizygotic twin pairs. American Journal of Psychiatry 159, 1427–1429.

Kiloh, LG, Garside, RF (1963). The independence of neurotic depression and endogenous depression. British Journal of Psychiatry 109, 451–463.

Koenig, HG, Cohen, HJ, Blazer, DG, Krishnan, KR, Sibert, TE (1993). Profile of depressive symptoms in younger and older medical inpatients with major depression. Journal of the American Geriatric Society 41, 1169–1176.

Leventhal, AM, Francione, WC, Zimmerman, M (2008). Associations between depression subtypes and substance use disorders. Psychiatry Research 161, 43–50.

Lewis, AJ (1934). Melancholia: a clinical survey of depressive states. Journal of Mental Science 80, 277–378.

McGlinchey, JB, Zimmerman, M, Young, D, Chelminski, I (2006). Diagnosing major depressive disorder VIII: are some symptoms better than others? Journal of Nervous and Mental Disease 194, 785–790.

Mapother, E (1926). Discussion on manic-depressive psychosis. British Medical Journal 2, 872–885.

Merikangas, KR, Wicki, W, Angst, J (1994). Heterogeneity of depression. Classification of depressive subtypes by longitudinal course. British Journal of Psychiatry 164, 342–348.

Middeldorp, CM, Wray, NR, Andrews, G, Martin, NG, Boomsma, DI (2006). Sex differences in symptoms of depression in unrelated individuals and opposite-sex twin and sibling pairs. Twin Research and Human Genetics 9, 632–636.

Parker, G (2000). Classifying depression: Should paradigms lost be regained? American Journal of Psychiatry 157, 1195–1203.

Parker, G (2005). Beyond major depression. Psychological Medicine 35, 467–474.

Parker, G, Wilhelm, K, Asghari, A (1998). Depressed mood states and their inter-relationship with clinical depression. Social Psychiatry and Psychiatric Epidemiology 33, 10–15.

SAS Institute (2005). SAS OnlineDoc Version 9.1.3. SAS Institute Inc.: Cary, NC.

Schotte, CK, Maes, M, Cluydts, R, Cosyns, P (1997). Cluster analytic validation of the DSM melancholic depression. The threshold model: integration of quantitative and qualitative distinctions between unipolar depressive subtypes. Psychiatric Research 71, 181–195.

Spitzer, RL, Endicott, J, Robins, E (1978). Research diagnostic criteria: rationale and reliability. Archives of General Psychiatry 35, 773–782.

Spitzer, RL, Fleiss, JL (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry 125, 341–347.

Stone, TT, Burris, BC (1950). Melancholia; clinical study of 50 selected cases. Journal of the American Medical Association 142, 165–168.

Sullivan, PF, Prescott, CA, Kendler, KS (2002). The subtypes of major depression in a twin registry. Journal of Affective Disorders 68, 273–284.

van Praag, HM (2000). Nosologomania: a disorder of psychiatry. World Journal of Biological Psychiatry 1, 151–158.

Zimmerman, M, McGlinchey, JB, Young, D, Chelminski, I (2006a). Diagnosing major depressive disorder III: can some symptoms be eliminated from the diagnostic criteria? Journal of Nervous and Mental Disease 194, 313–317.

Zimmerman, M, McGlinchey, JB, Young, D, Chelminski, I (2006b). Diagnosing major depressive disorder introduction: an examination of the DSM-IV diagnostic criteria. Journal of Nervous and Mental Disease 194, 151–154.

Table 1. Comparison major depression (MD) criteria symptoms

Table 2. Comparison of cognitive and neurovegetative criteria count (logistic regression)
