
Measuring melancholia: the utility of a prototypic symptom approach

Published online by Cambridge University Press:  16 September 2008

G. Parker*
Affiliation:
School of Psychiatry, University of New South Wales, Sydney, Australia Black Dog Institute, Sydney, Australia
K. Fletcher
Affiliation:
School of Psychiatry, University of New South Wales, Sydney, Australia Black Dog Institute, Sydney, Australia
M. Hyett
Affiliation:
Black Dog Institute, Sydney, Australia
D. Hadzi-Pavlovic
Affiliation:
School of Psychiatry, University of New South Wales, Sydney, Australia Black Dog Institute, Sydney, Australia
M. Barrett
Affiliation:
Black Dog Institute, Sydney, Australia
H. Synnott
Affiliation:
Black Dog Institute, Sydney, Australia
*Address for correspondence: Professor G. Parker, Black Dog Institute, Hospital Road, Randwick, NSW 2031, Australia. (Email: g.parker@unsw.edu.au)

Abstract

Background

Melancholia has long resisted classification, with many of its suggested markers lacking specificity. The imprecision of depressive symptoms, in addition to self-report biases, has limited the capacity of existing measures to delineate melancholic depression as a distinct subtype. Our aim was to develop a self-report measure differentiating melancholic and non-melancholic depression, weighting differentiation by prototypic symptoms and determining its comparative classification success with a severity-based strategy.

Method

Consecutively recruited depressed out-patients (n=228) rated 32 symptoms by prototypic or ‘characteristic’ relevance (using the Q-sort strategy) and severity [using the Severity-based Depression Rating System (SDRS) strategy]. Clinician diagnosis of melancholic/non-melancholic depression was the criterion measure, but two other formal measures of melancholia (Newcastle and DSM-IV criteria) were also tested.

Results

The prevalence of ‘melancholia’ ranged from 20.9% to 54.2% across the subtyping measures. The Q-sort measure had the highest overall correct classification rate in differentiating melancholic and non-melancholic depression (81.6%), with such decisions supported by validation analyses.

Conclusions

In differentiating a melancholic subtype or syndrome, prototypic symptoms should be considered as a potential alternative to severity-based ratings.

Type
Original Articles
Copyright
Copyright © Cambridge University Press 2008

Introduction

… melancholia may be a naturally occurring phenotype, qualitatively distinguishable from non-melancholic depression. (Leventhal & Rehm, 2005)

One of the longest-standing controversies in psychiatry involves the classification of depression. The possibility exists that, within the overall set of depressive conditions, there is a depressive subtype (variably termed endogenous, endogenomorphic, autonomous, vital and melancholia over time) that has resisted definition for a number of reasons. First, many suggested markers (Parker & Hadzi-Pavlovic, 1996; Taylor & Fink, 2006) may measure depression per se rather than being specific to melancholia, with Nelson & Charney (1981) identifying few features with distinct specificity. Second, apart from the possible exception of psychotic features, depressive symptoms are imprecise and do not approach the optimum criterion of being ‘necessary and sufficient’ to allow categorical definition. Most symptoms vary in severity and are subject to a range of self-reporting biases (e.g. denial, minimizing or magnification).

Such problems necessarily limit the capacity of any measure (especially a severity-weighted one) to delineate and measure melancholia, and the standard approach is to undertake a ‘semi-structured interview’ (Leventhal & Rehm, 2005), with DSM-IV criteria being ‘the most frequently used’ reference point. Few other measures have been developed. The Bech–Rafaelsen Melancholia Scale (MES; Bech & Rafaelsen, 1980), developed as an extension of the Hamilton Depression Scale, is one of the few symptom measures. However, Smolka & Stieglitz (1999), as well as Bech (2002), state that the MES is more a measure of depression severity than of melancholia. Other symptom-based measures of melancholia operationalize DSM criteria, such as the Inventory to Diagnose Depression (IDD; Zimmerman & Coryell, 1987) and the self-report version of the Inventory of Depressive Symptomatology (IDS-SR; Rush et al. 2003). A rare example of a self-report measure designed to discriminate melancholic depression is the Levine–Pilowsky Depression Questionnaire (LPDQ; Pilowsky et al. 1969), which was developed within a sample including 38 patients diagnosed with ‘endogenous depression’ and 38 patients with ‘neurotic depression’ but which did not return particularly impressive overall correct classification rates in that sample.

In terms of composite illness measures, DSM-III-R criteria included three non-symptom criteria (i.e. no significant antecedent personality disturbance; previous depressive episodes followed by remission; and previous good response to physical treatments). The Newcastle Scale (Carney et al. 1965) comprises symptoms, an illness course variable and also an assessment of personality ‘adequacy’ and precipitating events. However, measurement of any syndrome or entity is preferably limited to symptoms rather than incorporating illness course and treatment response variables, which are better viewed as validators.

In our earlier studies (Parker & Hadzi-Pavlovic, 1996) we considered the comparative capacity of sets of ‘endogeneity symptoms’ and observable CORE signs of psychomotor disturbance to define and differentiate melancholic depression. The 18-item CORE measure was superior to the symptom measure in differentiating ‘melancholia’, and superior to endogeneity symptom scores across a range of validation strategies involving psychosocial and biological variables, predicting response to both antidepressant drug therapy and electro-convulsive therapy (ECT). The greater discrimination of ‘signs’ encouraged us to argue that melancholia required observational rating of psychomotor disturbance.

However, two limitations have emerged over time. First, observable psychomotor disturbance is less distinctive in younger patients with seemingly ‘true melancholia’. Second, valid rating of psychomotor disturbance requires observing patients at or near nadir of their depressive episode, a limitation in clinical practice when many patients do not present for diagnosis at their worst.

We therefore sought to develop a symptom-based measure in an attempt to overcome such limitations, and judged that there were four key issues to address. First, we aimed to improve on previous candidate sets of potentially specific melancholic symptoms. As ‘psychomotor disturbance’ is the most consistently identified marker of melancholia (Rush & Weissenburger, 1994), we introduced seven items capturing both motoric and concentration components of psychomotor disturbance. Second, we included those ‘endogeneity’ symptoms most consistently identified in previous reviews. Third, although the heterogeneous nature of non-melancholic depression ensures that it has no definable symptoms, we tested the utility of certain symptoms (e.g. anger, irritability) that are more likely to be reported by those with non-melancholic disorders. Fourth, we needed to overcome limitations to most depression severity self-report measures, where ratings may reflect individual response biases (Demyttenaere & De Fruyt, 2003) rather than capture true ‘severity’, and with such biases more likely in non-endogenous and dysthymic patients (Rush et al. 1987). We therefore sought to rate symptoms in terms of their ‘prototypic’ status, adopting a Q-sort strategy. By also including a standard Likert-type scale rating severity of the same set of symptoms, we could quantify the comparative discrimination offered by each strategy.

Lacking any ‘gold standard’ reference for any melancholic versus non-melancholic depression decision, we used ‘clinician diagnosis’ as the reference measure, with two clinicians making independent ratings, and with the senior clinician essentially adopting a ‘Longitudinal, Expert, All Data’ (LEAD; Spitzer, 1983) strategy to decision making, effectively by assessing longitudinal and cross-sectional information from multiple sources. Finally, we report several validation analyses of the measure.

Method

General

Patients referred to our out-patient Depression Clinic for diagnostic and management advice completed a computerized Mood Assessment Program (MAP), which collects (by self-report and clinician rating) sociodemographics, illness details (including assessing features of bipolar disorder), personality and treatment details, and lifetime rates of anxiety disorders. MAP data included: (i) sociodemographic (age, sex, marital status) data; (ii) current depressive disorder characteristics [duration of episode, and severity as assessed by two state depression measures, the 10-item Depression in the Medically Ill (DMI-10; Parker et al. 2002) and the Quick Inventory of Depressive Symptomatology Self-Report (QIDS-SR; Rush et al. 2003)]; (iii) depressive disorder history (age of onset of first depressive episode); (iv) family history of depression, bipolar disorder and alcoholism; (v) developmental difficulties with parents; (vi) stressful life events during the preceding 12 months; (vii) previous receipt of ECT and its self-reported level of effectiveness; (viii) overall functioning limitations (clinician-rated and patient-rated) or disability levels; and (ix) ongoing and pre-episode personality functioning limitations (clinician-rated).

MAP administration was followed by a semi-structured interview assessing DSM-IV criteria for major depressive disorder and melancholia. This interviewer (K.F.) was available for 187 (82%) of the patients and blind to MAP diagnostic decisions and all referral information.

The intake psychiatrists (M.B. and H.S.), both of whom have had extensive clinical experience in assessing those with mood disorders, then undertook a detailed clinical assessment for 60–90 minutes, derived Newcastle Scale ratings (cut-off score ⩾6 assigning melancholic depression) subject to interview time constraints, and subsequently presented the history to the senior psychiatrist (G.P.). Here, documentation from referring physicians and any other source was reviewed, followed by the senior psychiatrist interviewing the patient (and often a relative) to clarify diagnostic issues. All psychiatrists were blind to MAP diagnostic decisions, with the senior psychiatrist also blind to Newcastle Scale results. As noted, we therefore collected longitudinal and cross-sectional data from patients, from their referring physicians and often from multiple sources, seeking to adopt a LEAD strategy.

Although the clinic is a tertiary referral centre, the majority of patients were referred by general practitioners for diagnostic and management clarification, and only a minority had a referral from a psychiatrist for a treatment-resistant depression. Nevertheless, such referral nuances may have weighted the sample to a more diagnostically difficult group than if recruitment had occurred in a general practice or a general psychiatrist setting.

Study-specific nuances

Q-sorting is an ‘ipsative’ technique whereby the ‘sorter’ is required to examine each item in relation to every other item presented. By using a forced distribution, the sorter is restricted to limited grid positions in which to place symptoms, thus addressing the limitation of Likert-type scales, where all items may be inflated or minimized by a rater bias (Block, 1961). For a detailed discussion of Q-sort methodology, the reader is referred to the article by Watts & Stenner (2005).

Q-methodology ordinarily requires 40–80 items (Watts & Stenner, 2005), with some researchers adopting 100 or more items in some instances (e.g. Jones, 1985). Large numbers of items are burdensome (particularly for depressed individuals), whereas smaller item sets risk inadequate coverage of the area of interest. We sought to maximize the validity of the methodology in our sample by using a forced-normal distribution (Fig. 1) that minimizes the potential for subjects to rank symptoms at the extreme points, an aspect that we believe is particularly salient for the population of interest, who are noted to often ‘rate up’ symptom severity to emphasize the seriousness of their condition. The forced-normal distribution dictates the number of items that can be assigned to each ranking position. Bearing these issues in mind, and respecting the need for an approximately equal number of ‘melancholic’ and ‘non-melancholic’ symptoms to reduce potential bias, we wrote 32 descriptors (see Table 1) that we judged (from historical and our own clinical studies) as having some specificity to melancholic or to non-melancholic depression. The same 32-item set was presented for self-report completion using the computerized MAP, first using the Q-sort format and then the Severity-based Depression Rating System (SDRS) to generate Q-sort scores and SDRS scores.

Fig. 1. Final Q-sort grid template, showing how patients allocated prescribed numbers of features from ‘least characteristic’ through ‘neutral’ to ‘most characteristic’.

Table 1. Item mean scores for clinically diagnosed (‘extremely confident’) melancholic and non-melancholic patients (n=74)

Mel, Melancholic depression; N-Mel, non-melancholic depression.

+ indicates higher mean score in melancholic subset; − indicates higher mean score in non-melancholic subset.

Patients were instructed to focus on when they were at the ‘worst’ of their depression, to ignore medication side-effects and to effectively rank items from ‘extremely characteristic’ to ‘extremely uncharacteristic’. They used a computer mouse to assign items and generate a final grid template comprising the two ‘most characteristic’ items (scoring +4), the next three ‘most characteristic’ (+3), the next four ‘most characteristic’ (+2), the next four ‘most characteristic’ (+1), and also a similar grid for the least characteristic items (respectively scoring −4, −3, −2 and −1) and with six ‘neutral’ items (scoring 0) also selected. The computer strategy allowed items to be progressively assembled, reviewed and altered by the patient before their final confirmation, as shown in Fig. 1.
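The forced distribution described above can be checked with a minimal sketch. The slot counts are taken from the text (2, 3, 4 and 4 items at each of the four positive and four negative positions, plus 6 neutral items); the names `GRID`, `is_valid_sort` and the `item_1` ... `item_32` labels are hypothetical and are not part of the MAP software.

```python
from collections import Counter

# Number of items allowed at each score, as described in the text.
GRID = {+4: 2, +3: 3, +2: 4, +1: 4, 0: 6, -1: 4, -2: 4, -3: 3, -4: 2}

def is_valid_sort(placements):
    """Return True if a completed sort fills every grid position exactly.

    `placements` maps an item label to the score of the column it was
    placed in (hypothetical data structure, for illustration only).
    """
    return Counter(placements.values()) == Counter(GRID)

# Build one arbitrary valid sort by filling the grid column by column.
scores = [score for score, slots in GRID.items() for _ in range(slots)]
example_sort = {f"item_{i + 1}": s for i, s in enumerate(scores)}

total_items = sum(GRID.values())                    # number of grid slots
total_score = sum(s * n for s, n in GRID.items())   # sum of all item scores
```

Because the grid is symmetric, every valid sort has the same total score of zero, whichever items a patient places where; only the placement of individual items carries information.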

Patients were subsequently presented with the same 32-item set (SDRS measure) and asked to rate whether they had experienced each symptom severely, moderately, mildly or not at all (scored 3, 2, 1 and 0 respectively), in relation to the same reference period for their episode.

Without any reference to the MAP, both the assessing and the senior psychiatrist (G.P.) were required to make a judgment about whether the patient had either a melancholic or a non-melancholic depressive disorder, and rate the confidence of their diagnosis (5=extremely confident, 4=very confident, 3=neutral, 2=not very confident, 1=not at all confident).

Sample characteristics

The provisional sample comprised 234 patients clinically diagnosed with a unipolar depressive episode at the Institute's Depression Clinic during 2005–2007. Of those assessed, six were removed from the data analysis because of invalid self-report data as determined by the assessing clinician, generally reflecting distinctly impaired concentration or memory.

Results

Study samples

Analyses were undertaken on the whole sample (n=228) and on a subset of 74 patients judged by the senior psychiatrist as allowing an ‘extremely confident’ subtyping decision of either melancholic (n=86, n=19 for total and subset samples respectively) or non-melancholic (n=142, n=55 for total and subset samples respectively) depression. Table 5 details further sample characteristics including mean age and gender distribution.

Mean depression severity for the whole sample was calculated as 14.6 (s.d.=5.4) for the QIDS-SR and 19.7 (s.d.=7.6) for the DMI-10. Agreement between the independent allocations (melancholic versus non-melancholic) made by the assessing and the senior psychiatrist was high for both the whole sample (κ=0.78) and the ‘extremely confident’ subset (κ=0.89).
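As an illustration of the agreement statistic reported above, the following sketch computes Cohen's κ from a square table of two raters' allocations. The counts in the example are hypothetical (the paper reports only the κ values, not the underlying cross-tabulation), and the function name `cohens_kappa` is our own.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of patients placed in category i by rater A
    and category j by rater B.
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical counts: rows = rater A (melancholic, non-melancholic),
# columns = rater B (melancholic, non-melancholic).
example_table = [[40, 10],
                 [5, 45]]
example_kappa = cohens_kappa(example_table)
```

In this made-up table the raters agree on 85 of 100 patients, chance agreement is 0.50, and κ is therefore 0.70.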

Item testing

Q-sort scores returned by those with clinically diagnosed melancholic and non-melancholic depression were calculated in the ‘extremely confident’ subset, and items ranked (see Table 1) in terms of their diagnostic weighting and differentiation. Those with melancholia were more likely to prioritize items assessing psychomotor disturbance, mood and energy worse in the morning, anhedonia and mood non-reactivity, whereas those with non-melancholic depression were more likely to prioritize anger and irritability, guilt and feeling suicidal.

Our principal analyses were undertaken on the whole sample of 228 patients rather than being restricted to ‘clear-cut’ cases. Q-sort symptom scores for the 32 items were entered into a logistic regression, with the senior psychiatrist's diagnosis (i.e. melancholic versus non-melancholic depression) as the dependent grouping variable, to calculate an overall Q-sort test score. For comparative purposes, the analyses were repeated using the 32-item SDRS scores. The logistic regression model for the Q-sort measure correctly classified 81.6% of the total sample (72.1% of melancholic patients and 87.3% of non-melancholic patients) and 85.1% of the ‘extremely confident’ subset (73.7% melancholic and 89.1% non-melancholic) when compared to senior Clinician diagnosis. The SDRS yielded overall correct classification rates of 77.6% (65.1% melancholic and 85.2% non-melancholic) in the total sample and 77.0% (68.4% melancholic and 80.0% non-melancholic) in the ‘extremely confident’ sample.
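The overall classification rates follow arithmetically from the group sizes and the group-specific rates. The sketch below back-calculates them, assuming (as an approximation) that each reported percentage rounds to a whole number of correctly classified patients; the function name is hypothetical.

```python
def overall_correct_rate(n_mel, rate_mel, n_non, rate_non):
    """Overall proportion correctly classified, from group sizes and rates."""
    correct_mel = round(n_mel * rate_mel)   # melancholic patients correctly classified
    correct_non = round(n_non * rate_non)   # non-melancholic patients correctly classified
    return (correct_mel + correct_non) / (n_mel + n_non)

# Group sizes and rates as reported in the text.
qsort_total = overall_correct_rate(86, 0.721, 142, 0.873)   # (62 + 124) / 228
qsort_subset = overall_correct_rate(19, 0.737, 55, 0.891)   # (14 + 49) / 74
sdrs_total = overall_correct_rate(86, 0.651, 142, 0.852)    # (56 + 121) / 228
sdrs_subset = overall_correct_rate(19, 0.684, 55, 0.800)    # (13 + 44) / 74
```

The four results reproduce the reported 81.6%, 85.1%, 77.6% and 77.0%, so the group-specific and overall figures are mutually consistent.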

Receiver operating characteristic (ROC) analyses were undertaken on Q-sort and SDRS test scores to evaluate their comparative diagnostic subtyping propensities as quantified by the area under the curve (AUC) of both measures. Table 2 lists AUC values for both measures and tests for pair-wise differences. The highest AUC was produced by the Q-sort measure within the diagnostically ‘extremely confident’ subset; however, no significant differences were demonstrated across samples or measurement strategies.
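For readers unfamiliar with the AUC, a minimal sketch follows. It uses the standard interpretation of the AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counting one half). The scores are invented for illustration and are not study data; the function name `auc` is our own, and this is not necessarily how the AUC values in Table 2 were computed.

```python
def auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison of test scores."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0      # case ranked above non-case
            elif c == n:
                wins += 0.5      # tie counts as half
    return wins / (len(case_scores) * len(noncase_scores))

# Hypothetical test scores (e.g. predicted probabilities of melancholia).
melancholic = [0.9, 0.8, 0.4]
non_melancholic = [0.7, 0.3, 0.2]
example_auc = auc(melancholic, non_melancholic)   # 8 of 9 pairs correctly ordered
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.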

Table 2. Area under the curve (AUC) comparison for SDRS and Q-sort measures

SDRS, Severity-based Depression Rating System; s.e., standard error; CI, confidence interval.

The optimal cut-off score for each measure was calculated as corresponding to the maximum of Kraemer's (1992) QROC criterion κ (0.5, 0), weighting false positives and negatives equally. QROC- and ROC-derived cut-off values, sensitivity, specificity, positive and negative predictive values are presented in Table 3.

Table 3. Receiver operating characteristic (ROC) analyses for SDRS and Q-sort measures

SDRS, Severity-based Depression Rating System; Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value.

Examination of the performance of both measures within the total sample quantified the Q-sort measure as having marginally higher specificity values than the SDRS measure (regardless of whether QROC or ROC cut-off scores were used). For the ‘extremely confident’ subset, the Q-sort strategy had higher sensitivity values than the SDRS strategy, whereas the SDRS strategy returned slightly higher specificity values. In terms of the positive predictive power (i.e. correctly allocating ‘true cases’ of melancholia), the Q-sort strategy was only superior to the SDRS within the whole sample set. Overall, the Q-sort strategy had superior negative predictive power (i.e. correctly allocating ‘true cases’ of non-melancholic depression) to the SDRS in both samples. As an ideal screening measure should have high specificity and negative predictive values, we suggest that the Q-sort strategy is marginally superior to the SDRS as a potentially useful screening tool based on the values derived in this sample.
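The four indices compared above are all derived from the same 2×2 table of test result against criterion diagnosis. The sketch below shows the definitions. The example counts are not taken from Table 3 (which uses QROC and ROC cut-offs); they are back-calculated, as an assumption, from the logistic regression classification rates reported earlier for the Q-sort in the total sample (62 of 86 melancholic and 124 of 142 non-melancholic patients correctly classified). The function name is hypothetical.

```python
def diagnostic_indices(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # melancholic cases correctly detected
        "specificity": tn / (tn + fp),  # non-melancholic cases correctly excluded
        "ppv": tp / (tp + fp),          # positive results that are true cases
        "npv": tn / (tn + fn),          # negative results that are true non-cases
    }

# Back-calculated counts (an assumption, see lead-in): 86 melancholic and
# 142 non-melancholic patients by senior clinician diagnosis.
qsort_indices = diagnostic_indices(tp=62, fn=24, tn=124, fp=18)
```

Note that sensitivity and specificity depend only on the test, whereas the predictive values also depend on the prevalence of melancholia in the sample, so they would differ in settings with a different case mix.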

Assignment and agreement between the differing subtyping strategies

The five differing systems assigned varying percentages to a melancholic category: 54.2% by DSM-IV, 37.7% by Clinician, 35.1% by Q-sort, 33.8% by SDRS and 20.9% by the Newcastle Scale. Table 4 examines the overall diagnostic agreement between each of the various measures, limited to the subset of 107 subjects where we had complete DSM-IV data. κ coefficients indicate that DSM-IV assignment was minimally in agreement with all other measures, that Newcastle assignment was slightly associated with Q-sort and SDRS and moderately so with Clinician diagnosis, and that Q-sort, SDRS and Clinician diagnosis were all moderately associated with each other.

Table 4. Diagnostic agreement (κ values) between differing diagnostic strategies (whole sample)

SDRS, Severity-based Depression Rating System.

κ values refer to overall diagnostic agreement between measures.

a Diagnostic decisions were derived by logistic regression analysis of the 32-item set.

* Significant at the p<0.05 level (two-tailed). ** Significant at the p<0.01 level (two-tailed). *** Significant at the p<0.001 level (two-tailed).

Validation analyses

Several ascriptions to melancholia have been detailed (Parker & Hadzi-Pavlovic, 1996; Taylor & Fink, 2006), including an older age, older age at first onset, greater severity, a more ‘endogenous’ onset (as against being reactive to life event stressors), exposure to fewer developmental stressors in childhood including dysfunctional parenting, being less likely to be associated with a personality disorder, a stronger family history of depression and a more specific response to physical treatments, particularly ECT.

Table 5 examines the degree to which assignment to melancholic depression by the five differing systems could be supported by several validating variables. Those assigned as melancholic by all five systems had a higher mean current age and, apart from DSM-IV diagnoses, a higher age of onset of depression, and tended to have longer episodes than non-melancholic subjects. Those assigned as having melancholic depression by all systems apart from DSM-IV were less likely to report difficulty with parents in their developmental years (significant for Q-sort, SDRS and Clinician) and fewer stressful life events in the preceding 12 months. Q-sort, Newcastle and Clinician-assigned melancholic patients were significantly less likely to be rated with personality dysfunction. Receipt of ECT was more likely for those assigned as having melancholic depression by all systems other than by DSM-IV criteria, but no system identified any self-reported differential efficacy to ECT across those with melancholic or non-melancholic allocations.

Table 5. Validator variables

SDRS, Severity-based Depression Rating System; Mel, melancholic depression; N-Mel, non-melancholic depression; ECT, electro-convulsive therapy.

Note: n are indicated where analyses were based on subsamples of patients.

a Effectiveness ratings: 1=not at all effective, 2=somewhat effective, 3=very effective.

b Higher scores indicate more severe dysfunction.

* Significant at the p<0.05 level (two-tailed). ** Significant at the p<0.01 level (two-tailed). *** Significant at the p<0.001 level (two-tailed).

Melancholia: do study items support a separate type?

According to Kendell (1989), a bimodal distribution is indicative of the existence of two discrete depressive subtypes. We therefore undertook mixture analyses (Agha & Ibrahim, 1984) to examine for any evidence of a bimodal distribution. First, summed SDRS scores for all 32 items were analysed, with tests examining for one versus two populations being non-significant. As the same test could not be undertaken for the Q-sort scores because of the nature of the data (scores sum to zero), five items were selected on the basis that they produced a mean score difference of 1.0 or more between clinician-diagnosed melancholic and non-melancholic patients (see Table 1) in terms of Q-sort grid placement, and were most weighted to a clinical diagnosis of ‘melancholia’. Neither Q-sort nor SDRS scores produced a bimodal distribution for this item set.

Discussion

We sought to develop a self-report measure of melancholia, and by testing the same item set with contrasting prototypic and severity-based rating strategies, sought to quantify the extent to which the items themselves and/or the methodological strategy might influence differentiation. McKeown & Thomas (1988) noted that the Q-method ‘retains a somewhat fugitive status within the larger scientific community’. Our results suggested a slight advantage to the prototypic approach, as the Q-sort strategy (through logistic regression analyses) correctly classified 81.6% of the total sample as having a ‘melancholic’ or ‘non-melancholic’ depression when compared to Clinician diagnosis. In light of the statistically comparable AUC values for both measures derived from ROC analyses, we tentatively position the Q-sort strategy as a potential alternative to the SDRS as a screening tool for melancholic depression because of its higher specificity and negative predictive values when using optimal cut-off scores based on the total sample. We suggest that by testing such an approach, and quantifying comparative classification success of prototypic and severity-based measures, we allow other researchers the opportunity to consider whether a prototypic approach may have utility to a similar or independent inquiry. It may be that, in some inquiries, rating severity of features is more important than rating characteristic features, whereas for other inquiries (particularly in differentiating clinical subtypes) the converse may hold. The Q-sort strategy effectively ‘forces’ subjects to rank items according to a normal distribution, thus overcoming any tendency to rate at the extreme ends of any dimensional scale, where such ratings can reflect contrasting minimizing and maximizing biases.

In any sample of depressed patients, some will evidence ‘clear-cut’ characteristic depressive patterns whereas others will have less distinctive patterns. We elected to focus our analyses on all subjects rather than the more diagnostically ‘confident’ or clear-cut subjects, as the latter approach theoretically risks optimizing classification rates, and with such subjects not likely to represent those presenting to primary and secondary care facilities. Our overall sample comprised patients referred to a tertiary consultative service, with many having diagnostically unclear conditions and generating low ‘confidence’ ratings. In light of this context issue, the high differentiation achieved by the Q-sort strategy in this sample is noteworthy. As our severity-based (SDRS) measure was almost as successful, it is likely that high differentiation emerged more from the item set rather than from the contrasting strategy. As the Q-sort was completed prior to the SDRS, many might have been cued by the Q-sort procedure preceding the SDRS presentation, artificially assisting SDRS differentiation, and arguing for a counterbalanced approach in replication studies.

Examined against several potential validators, additional support for the Q-sort solution emerged. Those assigned by the Q-sort strategy to a melancholic class had a profile consistent with ascriptions of ‘melancholia’: that it appears at an older age, has a more ‘endogenous’ background rather than reflecting distal and proximal antecedent psychosocial factors, and is more likely to require ECT.

Q-sort assignment (though showing some agreement with clinician, SDRS and Newcastle assignments) was at considerable variance with DSM-IV assignment, whereas only one validation variable (i.e. receipt of ECT) was significant in relation to DSM-IV assignment. Thus, Q-sort replication and extension (e.g. aetiological) studies should focus on both intrinsic utility and comparative utility with DSM-IV decision rules.

One key limitation is noted. By using a clinician-based diagnostic decision as the reference criterion, there is a risk of circularity, in clinically weighting the diagnosis of melancholia to certain items, and also selecting similar items for the Q-sort, so creating associations. However, the selected Q-sort items respected historical emphases in describing melancholia. In addition, as demonstrated in Table 1, many of the selected items emerged as minimally differentiating or non-differentiating, arguing against a simple circularity process. Finally, high (blinded) inter-rater agreement in judging whether a patient had a melancholic or non-melancholic depressive disorder argues against idiosyncratic diagnostic subtyping.

Although we asked patients to ignore medication side-effects, we accept that any such impact cannot be reliably ensured. However, if operative, we would anticipate that it would have impacted similarly on both rating strategies and therefore been largely ‘controlled’.

Although we sought to measure ‘melancholia’ rather than establish its status as a categorical subtype, analyses failed to find any evidence of a bimodal distribution of scores, a finding that would have supported a categorical binary model. However, no symptom item showed absolute specificity to either diagnostic subtype, with most (whether weighted to melancholic or non-melancholic depression) returning minor mean differences at best. Thus, and as indicated in the Introduction, most depressive symptoms are imprecise and lack clear specificity. Exceptions might include symptoms that tap categorical constructs (e.g. psychotic symptoms or psychomotor signs).

In our previous studies of melancholia (Parker & Hadzi-Pavlovic, 1996), latent class analyses of CORE item probabilities established low differentiation of symptoms at best, whereas ‘signs’ of psychomotor disturbance had low latent class probabilities in the non-melancholic class and distinct probabilities in the melancholic class. This allowed us (in conjunction with mixture analyses rejecting a unimodal distribution) to conclude that observable psychomotor disturbance went a considerable way to meeting ‘necessary and sufficient’ criteria for the definition of melancholia. Thus, bimodality is likely to depend on the presence of items that are specific to one class and, as we had no such variable (e.g. psychotic symptoms) in our item set, such a distribution could not be demonstrated.

We can reconcile results from these two differing study approaches to suggest that a percentage of those with ‘true melancholia’ are likely to have psychotic features and another percentage are likely to have overt psychomotor disturbance. If melancholia is defined by the presence of one or both of those features, then a categorical distinction from those with residual depressive conditions is likely to be possible, and a bimodal distribution demonstrable on quantifying such constituent features. However, if melancholia is alternatively defined merely by a set of symptoms that are over- or under-represented in those with melancholia (as indicated here and in general), the absence of clear-cut symptom specificity will prevent bimodality being demonstrated. Our study was designed to develop a symptom measure of melancholia, not to test the hypothesis that melancholia is a categorical entity.

Our measure could be reduced from its current set of 32 items, preserving those most specific to both melancholia and non-melancholic depression. However, we would argue that the initial step should encompass replication studies preserving the full set for testing across differing rating instructions, particularly in evaluating their prototypic status as against severity-based strategies. Refinement of the item set, and decisions as to whether a Q-sort strategy is the optimal strategy, might then best occur following analysis of multiple data sets. We do note, however, that item reduction does not necessarily represent a key objective for the Q-sort strategy, as any derived algorithms rely on item placements within the grid as opposed to the specific number of items selected. We conclude that the present study has advanced the objective of producing a symptom-based strategy for distinguishing melancholic and non-melancholic depression. Future research will seek to clarify and refine item sets adopting the Q-sort strategy, or any strategy that seeks to measure ‘melancholia’ on the basis of the comparative relevance of symptom descriptors, and in particular to determine whether the prototypical nature of symptoms, as opposed to symptom severity, is the more salient domain for differentiating any melancholic depressive subtype.
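The point that a Q-sort algorithm depends on grid placements rather than on the number of items can be sketched as follows. This is a hypothetical illustration only, not the algorithm derived in the study: the item names, the signed placement values, the item weights and the cut-off are all assumptions introduced for the example.

```python
def qsort_score(placements, weights):
    """Sum weight x grid placement over the items that carry a weight.

    placements: item name -> grid position (assumed coding: negative for
                'least characteristic', 0 for 'neutral', positive for
                'most characteristic').
    weights:    item name -> assumed weight (+1 for an item weighted to
                melancholic, -1 for an item weighted to non-melancholic
                depression).
    """
    missing = [item for item in weights if item not in placements]
    if missing:
        raise ValueError(f"items not placed in the grid: {missing}")
    return sum(weights[item] * placements[item] for item in weights)


def classify(score, cutoff=0):
    """Assign a subtype from the score; the cut-off is hypothetical."""
    return "melancholic" if score > cutoff else "non-melancholic"


# Hypothetical patient: three items placed in the grid, two of them weighted.
placements = {"item_a": 3, "item_b": -2, "item_c": 0}
weights = {"item_a": 1, "item_b": -1}

score = qsort_score(placements, weights)
print(score, classify(score))
```

Because the score is driven by where each weighted item sits in the grid, unweighted items such as the hypothetical `item_c` can remain in the sort without altering the result, which is one reason item reduction need not be a priority for such a strategy.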

Acknowledgements

This work was supported by Program Grant 510135 from the National Health and Medical Research Council of Australia, an Infrastructure Grant from NSW Health and by the Heine Foundation.

Declaration of Interest

None.

References

Agha, M, Ibrahim, MT (1984). Algorithm AS 203: maximum likelihood estimation of mixtures of distributions. Applied Statistics 33, 327–332.
Bech, P (2002). The Bech–Rafaelsen Melancholia Scale (MES) in clinical trials of therapies in depressive disorders: a 20-year review of its use as outcome measure. Acta Psychiatrica Scandinavica 106, 252–264.
Bech, P, Rafaelsen, OJ (1980). The use of rating scales exemplified by a comparison of the Hamilton and the Bech–Rafaelsen Melancholia Scale. Acta Psychiatrica Scandinavica 62 (Suppl. 285), 128–132.
Block, J (1961). The Q-sort Method in Personality Assessment and Psychiatric Research. Charles C. Thomas: Springfield, IL.
Carney, MW, Roth, M, Garside, RF (1965). The diagnosis of depressive syndromes and the prediction of E.C.T. response. British Journal of Psychiatry 111, 659–674.
Demyttenaere, K, De Fruyt, J (2003). Getting what you ask for: on the selectivity of depression rating scales. Psychotherapy and Psychosomatics 72, 61–70.
Jones, EE (1985). Manual for the Psychotherapy Process Q-sort. Unpublished manuscript, University of California, Berkeley.
Kendell, RE (1989). Clinical validity. Psychological Medicine 19, 45–55.
Kraemer, HC (1992). Evaluating Medical Tests: Objective and Quantitative Guidelines. Sage: Newbury Park, CA.
Leventhal, AM, Rehm, LP (2005). The empirical status of melancholia: implications for psychology. Clinical Psychology Review 25, 25–44.
McKeown, BF, Thomas, DB (1988). Q Methodology. Sage: Newbury Park, CA.
Nelson, JC, Charney, DS (1981). The symptoms of major depressive illness. American Journal of Psychiatry 138, 1–13.
Parker, G, Hadzi-Pavlovic, D (1996). Melancholia: A Disorder of Movement and Mood. Cambridge University Press: New York.
Parker, G, Hilton, T, Bains, J, Hadzi-Pavlovic, D (2002). Cognitive-based measures screening for depression in the medically ill: the DMI-10 and DMI-18. Acta Psychiatrica Scandinavica 105, 419–426.
Pilowsky, I, Levine, S, Boulton, DM (1969). The classification of depression by numerical taxonomy. British Journal of Psychiatry 115, 937–945.
Rush, AJ, Hiser, W, Giles, DE (1987). A comparison of self-reported versus clinician-rated symptoms in depression. Journal of Clinical Psychiatry 48, 246–248.
Rush, AJ, Trivedi, MH, Ibrahim, HM, Carmody, TJ, Arnow, B, Klein, DN, Markowitz, JC, Ninan, PT, Kornstein, S, Manber, R, Thase, ME, Kocsis, JH, Keller, MB (2003). The 16-item Quick Inventory of Depressive Symptomatology (QIDS) Clinician Rating (QIDS-C) and Self-Report (QIDS-SR): a psychometric evaluation in patients with chronic major depression. Biological Psychiatry 54, 573–583.
Rush, AJ, Weissenburger, JE (1994). Melancholic symptom features and DSM-IV. American Journal of Psychiatry 151, 489–498.
Smolka, M, Stieglitz, RD (1999). On the validity of the Bech–Rafaelsen Melancholia Scale (BRMS). Journal of Affective Disorders 54, 119–128.
Spitzer, RL (1983). Psychiatric diagnosis: are clinicians still necessary? Comprehensive Psychiatry 24, 399–411.
Taylor, MA, Fink, M (2006). Melancholia: The Diagnosis, Pathophysiology, and Treatment of Depressive Illness. Cambridge University Press: New York.
Watts, S, Stenner, P (2005). Doing Q methodology: theory, method and interpretation. Qualitative Research in Psychology 2, 67–91.
Zimmerman, M, Coryell, W (1987). The Inventory to Diagnose Depression (IDD): a self-report scale to diagnose major depressive disorder. Journal of Consulting and Clinical Psychology 55, 55–59.

Fig. 1. Final Q-sort grid template, in which patients allocate prescribed numbers of features from ‘least characteristic’ through ‘neutral’ to ‘most characteristic’.

Table 1. Item mean scores for clinically diagnosed (‘extremely confident’) melancholic and non-melancholic patients (n=74)

Table 2. Area under the curve (AUC) comparison for SDRS and Q-sort measures

Table 3. Receiver operating characteristic (ROC) analyses for SDRS and Q-sort measures

Table 4. Diagnostic agreement (κ values) between differing diagnostic strategies (whole sample)

Table 5. Validator variables