Introduction
There is increasing awareness of the importance of subjective measures, including quality of life, in medical care. Such measures are often referred to as patient-reported outcomes or ‘PROs’ (Greenhalgh, 2009). Emotional distress and depression are important PROs that have a major effect on quality of life (Moussavi et al. 2007). Consequently, it has been recommended that medical patients, such as those with cancer (Carlson et al. 2012), are screened for emotional distress and depression (Pignone et al. 2002; NICE, 2009), but only if there are facilities to provide treatment for identified cases (USPSTF, 2009). Despite an extensive literature on such screening (Carlson et al. 2012), there is limited information on the practicalities of carrying it out, an important aspect of which is when and where to administer the screening measures.
The most convenient and widely used strategy is to administer a questionnaire, such as the Hospital Anxiety and Depression Scale (HADS; Zigmond & Snaith, 1983), in the medical clinic, taking advantage of the time patients spend waiting to go into their consultation. The patient's questionnaire score is then used to determine whether they have a significant level of distress that requires attention and whether they need a further assessment to determine whether they have a depressive disorder.
However, there is a potential problem with this strategy: measuring distress in the clinic prior to the consultation might result in a transient inflation of the score because of the clinical context and the anticipation of the consultation. This phenomenon would be similar to that referred to as the ‘white-coat effect’ in the measurement of blood pressure (Gerin et al. 2006). If such inflation were to occur it would result in false positives in the identification of patients suffering from significant distress and would lead to more patients than necessary being given assessment interviews for depression. Such an effect would therefore be important in increasing both inconvenience to patients and the costs to clinical services.
As far as we are aware, although there are studies of the test–retest reliability of measures of quality of life and distress (Hjermstad et al. 1995; Bakker et al. 2009), of the course of distress over a series of cancer consultations (van Dooren et al. 2005) and of the influence of the content of the consultation on distress (van Dulmen et al. 1995), this particular question has not been specifically addressed in the published literature.
We therefore aimed to find out whether oncology patients who were high scorers on the HADS questionnaire, completed while waiting for their cancer consultation in clinic, remained high scorers when completing a repeat HADS questionnaire a week later at home. Specifically, we aimed to determine: (a) what proportion of the patients who scored high (total score of ⩾15) on the HADS prior to their consultation still had a high score when reassessed at home 1 week later; and (b) how much the mean HADS score had changed between these two occasions and how much any fall could be accounted for by regression to the mean.
Method
To address the research question we analysed data that had been routinely collected by an established distress and depression screening service operating in multiple cancer out-patient clinics in Scotland, UK.
Routine screening procedure
The screening service was in operation in numerous clinics, each specializing in one of a variety of cancer types including breast, colorectal, gynaecological, lung and genito-urinary. All patients attending the clinics were asked to complete the HADS on touch-screen computers (or, where computers were not available, on paper) prior to their medical consultation. The results of screening were given to their cancer clinician at the time of the consultation. In addition, all patients who had scored high on the HADS in clinic were telephoned at home, approximately 1 week later, and assessed for depression using the major depression component of the Structured Clinical Interview for DSM-IV (SCID; First et al. 1999).
Collection of repeat HADS scores
As part of routine clinical service data collection during March and April 2009, patients who had scored ⩾15 on the HADS in the clinic were asked to complete the HADS again at home over the telephone, immediately before they were given the routine interview to assess them for depression. We analysed these clinical data to address the research question.
Ethical approval
We obtained ethical approval from the local Research Ethics Committee to use the data in this way and also obtained each patient's permission to use their anonymized clinical data for research.
Measure
The HADS is the most extensively studied distress scale in cancer patients and is very widely used as a first stage in screening medical patients for depression (Vodermaier et al. 2009). The HADS asks patients how they have been feeling over the past 2 weeks. It has 14 items: seven on each of the anxiety and depression subscales. Each item is rated from 0 to 3, resulting in a total HADS score between 0 and 42, with higher scores indicating more severe symptoms (Zigmond & Snaith, 1983). A recent review concluded that the HADS was an effective measure of emotional distress but that the subscales were unable to differentiate consistently between anxiety and depression (Cosco et al. 2012). A total HADS score of ⩾15 has been reported to be optimal to identify cancer patients likely to have major depression on further assessment (Walker et al. 2007).
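As an illustration of the scoring rule described above, the following minimal sketch sums the seven ratings on each subscale and applies the ⩾15 cut-off to the total. The function name `score_hads` and the returned dictionary are hypothetical and are not part of the screening service's software. The sketch assumes that each rating has already been converted to its scored 0 to 3 value.

```python
def score_hads(anxiety_items, depression_items, cutoff=15):
    """Return subscale scores, total score and high-score flag for one HADS.

    Each argument is a sequence of seven item ratings, each already
    scored 0-3 (an assumption of this sketch).
    """
    for name, items in (("anxiety", anxiety_items), ("depression", depression_items)):
        if len(items) != 7:
            raise ValueError(f"{name} subscale needs exactly 7 items")
        if any(rating not in (0, 1, 2, 3) for rating in items):
            raise ValueError(f"{name} ratings must be integers from 0 to 3")

    anxiety = sum(anxiety_items)
    depression = sum(depression_items)
    total = anxiety + depression  # ranges from 0 to 42
    return {
        "anxiety": anxiety,
        "depression": depression,
        "total": total,
        "high": total >= cutoff,  # high scorer as defined in this study
    }


if __name__ == "__main__":
    print(score_hads([2, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]))
```

A total of exactly 15 counts as a high score because the cut-off is inclusive.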
Analysis
We analysed these data to determine whether patients with high HADS scores measured in the clinic prior to their consultation still had high scores when measured later at home. We therefore calculated the proportion of patients who still had a high score (⩾15) when the HADS was repeated at home. We also determined the mean change in the total HADS score between clinic and home.
Individual patient distress scores vary over time. Patients scoring high or low are likely to score closer to the mean score of all assessed patients on later reassessment, a phenomenon known as ‘regression to the mean’. If all patients who completed a first HADS also completed a second HADS, we would expect the effect of these variations on the mean score of the whole group to even out. However, as we only had follow-up data for the subsample of initial high scorers, we would expect the average of the reassessed scores in this subsample to be lower because of this ‘regression to the mean’ effect (Barnett et al. 2005). Therefore, to isolate the effect of the clinic from this phenomenon we estimated the size of the anticipated regression to the mean. This involved using more than 5000 HADS scores that had been collected by the screening service in similar clinics from 2007 to 2010 to obtain details of the overall distribution of HADS scores in this population. These details included the variance and covariance of repeated scores. The technical details of this approach are provided in the Appendix and described elsewhere (Das & Mulder, 1983). Finally, we conducted an exploratory analysis describing and comparing the changes in the HADS anxiety and depression subscales to determine whether these differed in the amount they changed.
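Regression to the mean in a selected subsample can be illustrated with a small simulation. The sketch below is not the method used in this study. It assumes a hypothetical population in which each patient has a constant true score plus independent normal measurement error on each occasion, with no clinic effect at all. The means and standard deviations are arbitrary illustrative values and were not estimated from the screening data.

```python
import random
from statistics import mean


def simulate_rtm(n_patients=20000, cutoff=15, seed=0):
    """Mean first and second scores among simulated initial high scorers.

    Both occasions share the same true score, so any fall in the mean
    is due to selection on the first score alone.
    """
    rng = random.Random(seed)
    first_high, second_high = [], []
    for _ in range(n_patients):
        true_score = rng.gauss(10.0, 6.0)         # hypothetical true distress
        first = true_score + rng.gauss(0.0, 3.0)  # first measurement
        second = true_score + rng.gauss(0.0, 3.0) # repeat measurement
        if first >= cutoff:                       # follow up high scorers only
            first_high.append(first)
            second_high.append(second)
    return mean(first_high), mean(second_high)


if __name__ == "__main__":
    first_mean, second_mean = simulate_rtm()
    print(f"mean at first measurement: {first_mean:.2f}")
    print(f"mean at repeat measurement: {second_mean:.2f}")
```

Because the simulated setting is identical on both occasions, the lower mean at the repeat measurement arises purely from selecting patients on a high first score.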
Results
The service had offered screening to all patients attending the cancer clinics except for a small number (<5%) who were unable to complete questionnaires because they were too unwell or had severe cognitive or communication problems. A further 10% of patients were missed by the service, mainly because they were taken straight to their consultation before being screened, and an additional 7% refused to participate in screening.
A total of 1691 patients were screened in clinic during the period from which the data analysed were derived. Of these, 395 scored high on the HADS in clinic and 329 were listed for further assessment at home (the remainder were not listed for a variety of reasons including a recent depression assessment, cognitive or communication problems or exclusion by their clinician, usually because they were considered to be too ill). Repeat HADS were not available on 111 of these patients for several reasons, but mainly because they were not contacted by the screening service within the 1-month time window used for the analysis. The final patient sample is shown in Fig. 1. A total of 218 patients were given a repeat HADS at home by the screening service during the data collection period. This is the sample analysed.

Fig. 1. Derivation of patient sample. The patients initially identified with distress (HADS ⩾15) were screened during the period from 25 February 2009 to 31 March 2009.
In the analysed sample, 159 (73%) patients were female and the median age was 61 years [interquartile range (IQR) 53–70 years]. Almost all of the patients were attending follow-up appointments. The median interval between the clinic and repeat HADS assessments was 6 days (IQR 5–8 days). The 111 patients who did not have a repeat HADS at home had similar distributions of sex, age and clinic HADS scores and attended similar types of cancer clinics. However, there were more new and good prognosis patients included in the sample reassessed. The patients' characteristics and the comparison of those with and without a HADS rated at home are shown in Table 1.
Table 1. Characteristics of the analysed sample compared with those in the group of eligible patients not included

HADS, Hospital Anxiety and Depression Scale; s.d., standard deviation.
a Age in years and HADS scores were compared using the Wilcoxon rank sum test. All other p values were from χ2 tests.
b Appointment type was unknown for 10 patients.
c Poor prognosis was defined as a life expectancy of <3 months for lung cancer patients and <12 months for patients with other cancers. Prognosis was unknown for six patients.
Figure 2 shows the distributions of HADS scores when patients were (a) assessed in clinic and (b) reassessed at home. Fig. 3 shows the change in HADS scores for each individual patient. As a result of the large variance in the HADS scores, there was also considerable variability in the change scores between the two assessments despite a high intra-class correlation between repeated measurements (ICC = 0.83).

Fig. 2. Hospital Anxiety and Depression Scale (HADS) scores of patients (n=218) in the study sample (a) when assessed in clinic and (b) when reassessed at home.

Fig. 3. Change in Hospital Anxiety and Depression Scale (HADS) total score from clinic to follow-up at home plotted against initial HADS score in clinic. Circles indicate patients whose reassessment score fell below 15. Patients plotted above the dashed line had a higher HADS score on reassessment whereas those below the line had a lower score. A degree of ‘jitter’ was applied to separate out overlapping data points.
Almost three-quarters (72.5%; 158/218) of the initial high-scoring patients were still high scorers at reassessment [95% confidence interval (CI) 66.6–78.4]. The mean change in total HADS score was a reduction of 1.74 points (95% CI 1.09–2.39).
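The proportion and its interval can be approximately reproduced from the counts given. The sketch below uses a normal-approximation (Wald) interval. This choice is an assumption, as the text does not state how the interval was calculated, and the lower limit may differ from the reported value in the first decimal place because of rounding.

```python
from math import sqrt


def wald_ci(successes, n, z=1.96):
    """Proportion with a normal-approximation 95% confidence interval."""
    p = successes / n
    half_width = z * sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width


if __name__ == "__main__":
    # 158 of the 218 reassessed patients still scored 15 or more at home.
    p, lower, upper = wald_ci(158, 218)
    print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
```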
Our estimate of the regression-to-mean effect was an average reduction of 1.21 points (95% CI 1.02–1.43). Hence regression to the mean potentially accounts for the majority of this observed fall in mean score, meaning that the effect of measuring in clinic was very small. The exploratory analysis of changes in HADS subscales found a mean reduction in the anxiety subscale of 1.26 points (95% CI 0.84–1.67) and in the depression subscale of 0.48 points (95% CI 0.12–0.85). The difference between the scales in the reduction in scores was statistically significant (p < 0.001).
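The arithmetic behind this comparison is simple and is sketched below using the two point estimates reported above. The variable names are illustrative. The calculation ignores the uncertainty in both estimates, so the residual should be read only as a rough indication.

```python
# Point estimates reported in the Results (HADS points).
observed_fall = 1.74   # mean reduction from clinic to home
rtm_estimate = 1.21    # expected reduction from regression to the mean

# Part of the fall not accounted for by regression to the mean.
residual_fall = observed_fall - rtm_estimate

# Share of the observed fall accounted for by regression to the mean.
rtm_share = rtm_estimate / observed_fall

if __name__ == "__main__":
    print(f"residual fall: {residual_fall:.2f} points")
    print(f"share explained by regression to the mean: {100 * rtm_share:.0f}%")
```

On these figures about 0.5 points of the fall remain after allowing for regression to the mean, on a scale that ranges from 0 to 42.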
Discussion
We had hypothesized that patients' HADS scores might be transiently inflated when measured in the clinic prior to the consultation because of the potentially stressful clinical surroundings and anticipation of the upcoming appointment. If that were the case it would question the utility of this widely used strategy for screening for distress and depression in medical clinics. We found that the majority of the patients who scored high on the HADS in clinic prior to their cancer consultation (72.5%) were still high scorers when reassessed at home a week later. That also means that 27.5% of patients who had scored high in the clinic were no longer high scorers when reassessed later at home. However, further analysis indicates that despite large variability at the individual patient level, the mean HADS total score in the sample fell by only 1.74 points between the two assessments, most of which could reasonably be attributed to the natural tendency for individuals who score high on an initial measurement to score lower on later reassessment (regression to the mean), independent of the setting in which the measurement was made. Our hypothesis was therefore not supported and measuring distress in the clinic prior to the consultation is a reasonable strategy to adopt.
There was considerable individual variability in the size of change scores between the two assessments despite a high intra-class correlation between repeated measurements from the same patient. This was due to large overall variance in the scores, a property common to measures of psychological distress. It is unclear whether this variation is due to a large random error in the measurements or a reflection of actual fluctuations in the severity of distress over time. Nonetheless, our sample of 218 patients was sufficiently large to estimate the mean change for the sample with reasonable accuracy.
It is notable that, whereas the screening service used the total score in the HADS to define significant distress, the fall in score was slightly larger on the anxiety subscale. This may be because the consultation has a greater transient effect on anxiety than on depression. It may also imply that scales that measure only depressive symptoms are even less subject to a clinic effect.
We are not aware of any studies that have directly addressed the question we have posed. We identified a test–retest reliability study of the European Organization for Research and Treatment of Cancer (EORTC) quality of life measures, which include emotional functioning, that compared questionnaire scores administered to 270 patients attending routine post-treatment follow-up visits to cancer clinics with their score at home 4 days later and found generally good agreement (Hjermstad et al. 1995). Other studies that have administered repeated psychological assessment have examined distress trajectories over longer periods of time (Hinnen et al. 2008) or before and after consultations (van Dooren et al. 2005) but we found none that directly addressed the possible effect of the clinical context on the measurement score.
There were limitations to this study. First, we analysed data collected by a routine screening service operating in cancer clinics; the findings may not therefore generalize to other clinical settings. Second, the service administered a second HADS only to patients who had scored high in clinic. This meant that our observed HADS scores obtained at home underestimated the true proportion of patients who would have scored high had all patients been reassessed, as it would be likely that some of the patients who scored low in clinic would have scored high on the second occasion. This limitation was addressed by estimating the regression to the mean. Third, there were missing data from patients who could not be contacted during the limited time window in which repeat HADS were administered. However, the characteristics of patients on whom we had analysable data and those on whom we did not were mostly similar; systematic bias is therefore unlikely. Fourth, there may be limits to the intrinsic test–retest reliability of the HADS (as opposed to real changes in symptoms) but this is unlikely to be large over this time period, or to represent a systematic bias. Fifth, patients completed the HADS on a touch-screen computer or on paper in the clinic, but the follow-up assessment was carried out by reading out the scale over the telephone. It is possible that administering the HADS over the telephone causes patients to score differently. Previous studies have found good agreement between self-completed and verbally completed distress screening questionnaires, with a tendency for the latter to record a lower score (Pinto-Meza et al. 2005; Cheung et al. 2006). Such a bias, if present, would reduce further the observed fall in HADS score attributable to the effect of measurement in the clinic. Future studies could use the same mode of administration to avoid this issue.
Sixth, the content of the consultation and its meaning for the patient, whether positive or negative, might have accounted for some of the changes in scores and we were not able to assess this. However, most of the consultations were for follow-up and not for the communication of new diagnoses. The effect of consultation type could be addressed in future studies. Seventh, because the results of the screening were given to the clinician before the consultation, it is possible that they might have taken action to address the distress, for example by referring the patient for psychological treatment. This is, however, very unlikely to have occurred within 1 week. Finally, although the average change in scores was small, the intra-patient variability was high, with some patients scoring very differently on reassessment. It is possible, therefore, that a minority of patients are affected considerably by the clinic setting. Consequently, we cannot rule out the possibility of an important ‘clinic effect’ for some individuals.
Conclusions
In conclusion, most patients who scored high on the HADS administered in clinic prior to their medical consultation remained high scorers when reassessed at home a week later. There was only a small reduction in mean score, most of which could be attributed to regression to the mean. Therefore, the widely used strategy of asking patients to complete a screening questionnaire for distress while they wait for their clinic appointment is a reasonable method of identifying those who have significant distress and also a useful first step in identifying those who require an interview for the assessment of possible depressive disorder. The increasing use of telephones and the internet provides opportunities to screen patients away from the clinic, thereby potentially avoiding the issue of clinic context. However, the pre-consultation waiting time has long provided an opportunity to undertake such clinic-based screening, and is likely to continue to do so in the future.
Appendix
Estimating the regression-to-the-mean effect
As only patients with an initial high score were followed up, the scores on reassessment were subject to regression to the mean. We estimated the average drop in scores caused by this effect as follows.
Suppose that a patient's HADS score, $H$, is the sum of their (constant) true underlying score, $S$, and an independent error term, $e$, where $S$ is distributed according to some arbitrary density function with variance $\sigma_s^2$ and the errors are normally distributed with mean 0 and variance $\sigma_e^2$. The total variance is then $\mathrm{Var}(H) = \sigma_t^2 = \sigma_s^2 + \sigma_e^2$, and $\rho = \sigma_s^2/\sigma_t^2$ is the intra-patient correlation between repeated scores from the same individual.
We wanted to estimate the expected difference between a pair of repeated HADS scores, $H_1$ and $H_2$, conditional on $H_1$ being ⩾15. That is, we wanted to estimate $E[H_1 - H_2 \mid H_1 \geq 15]$.
For a continuous $H$ and a cut-off $h_c$ it can be shown that
$$E[H_1 - H_2 \mid H_1 > h_c] = \sigma_t^2 (1 - \rho) \frac{g(h_c)}{1 - G(h_c)},$$
where $g(h_c)$ is the probability density function for $H$ evaluated at $h_c$, and $G(h_c)$ is the corresponding cumulative distribution function (Das & Mulder, 1983). From the large sample of scores collected by the screening service in similar clinics from 2007 to 2010, we obtained empirical estimates of $g(h_c)$ and $G(h_c)$. Using data from the 5215 patients who had HADS scores measured in subsequent clinic visits during this period, we estimated $\sigma_t^2$, and estimated $\rho$ as the correlation of scores obtained 1 week apart. We did this by modelling the covariance matrix of repeated scores in a linear regression with random intercept and exponential covariance structure to account for a decreasing correlation over time. A 95% quantile-based CI for the regression-to-mean estimate was derived through bootstrapping.
Although the HADS (range 0–42) is technically a discrete scale, we applied the above result to it, introducing a continuity correction by evaluating $g(\cdot)$ and $G(\cdot)$ at $h_c = 14.5$ on an approximating theoretical continuous curve. The approach was verified through simulation studies and sensitivity analysis.
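The calculation can be sketched as follows, assuming the cited result takes the form of total variance times (1 − ρ) times g(h_c) / (1 − G(h_c)). The function name `rtm_effect` is hypothetical. For illustration only, the example uses a normal distribution as a stand-in for the distribution of HADS scores, whereas the study used empirical estimates of the density and distribution functions. The mean, standard deviation and correlation below are arbitrary values, not estimates from the screening data.

```python
from statistics import NormalDist


def rtm_effect(total_variance, rho, density_at_cutoff, cdf_at_cutoff):
    """Expected fall on retest among patients scoring above the cut-off.

    Computes total_variance * (1 - rho) * g(h_c) / (1 - G(h_c)).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie between 0 and 1")
    upper_tail = 1.0 - cdf_at_cutoff
    if upper_tail <= 0.0:
        raise ValueError("no probability mass above the cut-off")
    return total_variance * (1.0 - rho) * density_at_cutoff / upper_tail


# Hypothetical inputs: a normal stand-in for the score distribution.
scores = NormalDist(mu=10.0, sigma=7.0)
cutoff = 14.5  # continuity-corrected cut-off for a threshold of 15
rho = 0.8      # illustrative correlation between repeated scores

estimate = rtm_effect(scores.variance, rho, scores.pdf(cutoff), scores.cdf(cutoff))

if __name__ == "__main__":
    print(f"expected regression to the mean: {estimate:.2f} points")
```

Under normality this expression reduces to the familiar form (1 − ρ) σ_t φ(z) / (1 − Φ(z)), where z is the standardized cut-off, which provides a simple check on the function.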
Acknowledgements
We thank the clinical service and patients who provided the data. This work was funded as part of a Programme Grant from Cancer Research UK (CRUK), ref. C5547/A7375.
Declaration of Interest
None.