Introduction
Around a third of stroke survivors suffer from low mood, with about 10% receiving a diagnosis of major depression and many experiencing co-morbid anxiety (House et al. 1991; Ayerbe et al. 2013). Mood disturbance has been linked to greater dependence in activities of daily living, institutionalization and mortality, and poorer quality of life (Kotila et al. 1999; House et al. 2001; Pohjasvaara et al. 2001; Williams et al. 2004; Donnellan et al. 2010). However, the management of mood disorders post-stroke is often suboptimal; problems are frequently undiagnosed and inadequately treated (Hackett et al. 2005) and most survivors report insufficient help to deal with their emotional needs (National Audit Office, 2010). This may result from difficulties in professionals' recognition of the symptoms of low mood in patients with stroke because of the overlap between stroke-related impairments, hospitalization and somatic symptoms of mood disorder (Hart & Morris, 2007).
In acknowledgement of this shortcoming, improving management and access to psychological services has become a priority (NHS Improvement, 2011). Timely diagnosis is an essential element, facilitating early access to treatment and improving prognosis (Jorge et al. 2003; Mitchell et al. 2009); thus, clinical guidelines recommend that stroke survivors should be routinely screened for the presence of mood disorders, using a validated tool (National Stroke Foundation, 2010; NICE, 2010). Screening rates have improved substantially over the past decade (Bowen et al. 2005; National Audit Office, 2010) but 20% of appropriate patients are still not screened (NSSA, 2011). One issue that affects implementation of the guidelines is the choice of screening tool; clinicians report that lack of knowledge and consensus about the best measures to use are barriers (Burton et al. 2013). As part of an ongoing programme of work to select and implement effective measurement tools in stroke rehabilitation, we aimed to systematically review the psychometric properties of tools to screen for mood disorders post-stroke to identify the most suitable for clinical practice. To facilitate uptake in clinical practice, we also aimed to identify optimal cut-off scores for major and any degree of depression and anxiety, and assess clinical utility (or feasibility).
Method
Study identification and selection
Electronic databases (AMED, PsycINFO, CINAHL, Medline and EMBASE) were searched from their inception to May 2013, using the following keywords: stroke OR cerebrovascular accident OR CVA AND screen* OR tool OR measure* OR questionnaire OR scale AND mood OR depression OR anxiety OR emotion OR distress. All searches were limited to English language and human studies.
We also searched the reference lists of selected articles, previously published reviews and the Stroke Group of the Cochrane Library. The titles, abstracts and then full texts were screened by two independent reviewers to identify articles that reported validation of tools to screen for low mood, distress and/or anxiety in people with stroke. Articles that assessed the properties of mood screening tools, reported both sensitivity and specificity compared with a gold standard measure and aimed to identify people who needed further evaluation or treatment were selected. We excluded studies of tools that were not designed for screening but were intended to make a full assessment of mood or a diagnosis, including the Montgomery–Asberg Depression Rating Scale (MADRS; Montgomery & Asberg, 1979), the Hamilton Rating Scale for Depression (HAMD; Hamilton, 1960), the Cornell Scale (Alexopoulos et al. 1988), the Geriatric Mental State Examination (GMS; Copeland et al. 1987), the Post-Stroke Depression Rating Scale (PSDS; Gainotti et al. 1997) and the Symptom Checklist 90 (SCL-90; Derogatis et al. 1973). We also excluded studies that assessed tools measuring generic related constructs such as quality of life; that merely validated a language translation of a tool; that were conference papers or abstracts from which the data could not be extracted; or in which fewer than 50% of the participants had suffered a stroke or data from people with stroke could not be extracted.
Data extraction
We independently extracted data from the selected articles on the participant samples and settings (where available), selection criteria, tools evaluated, type of disorder assessed, and sensitivity and specificity. Cut-off scores for major depression and any degree of depression and anxiety were identified. Positive (PPV) and negative predictive values (NPV) at each cut-off score were calculated from available data where possible. Final data were agreed by consensus, with a third party to arbitrate if necessary. Sensitivity ⩾0.8 and specificity ⩾0.6 were considered sufficiently accurate. There is often a trade-off between sensitivity and specificity, and these (widely used) criteria are considered appropriate for clinical practice, where the costs of failing to identify an individual with difficulties are greater than the costs of further evaluation of those who may not require treatment (Bennett & Lincoln, 2006).
Any cut-off score that did not yield data meeting these criteria in at least one study was excluded and any tools that did not report any cut-off scores with sufficient sensitivity and specificity were rejected.
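The accuracy criteria and the cut-off selection rule described above can be illustrated with a short Python sketch. It assumes that each validation study reports (or allows reconstruction of) a 2 × 2 table against the gold standard; the `StudyResult` layout, the function name `retained_cutoffs` and all counts are hypothetical and are not data from this review.

```python
from dataclasses import dataclass

SENS_MIN = 0.8  # sensitivity criterion stated in the text
SPEC_MIN = 0.6  # specificity criterion stated in the text


@dataclass(frozen=True)
class StudyResult:
    """One tool/cut-off result from one validation study (hypothetical layout)."""
    tool: str
    cutoff: int
    tp: int  # screen positive, gold standard positive
    fp: int  # screen positive, gold standard negative
    fn: int  # screen negative, gold standard positive
    tn: int  # screen negative, gold standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: proportion of screen positives that are true cases.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value: proportion of screen negatives that are true non-cases.
        return self.tn / (self.tn + self.fn)

    def is_accurate(self) -> bool:
        return self.sensitivity >= SENS_MIN and self.specificity >= SPEC_MIN


def retained_cutoffs(results):
    """Map each tool to the cut-offs that met both criteria in at least one study.

    Tools with no qualifying cut-off are absent from the result, i.e. rejected.
    """
    kept = {}
    for r in results:
        if r.is_accurate():
            kept.setdefault(r.tool, set()).add(r.cutoff)
    return {tool: sorted(cutoffs) for tool, cutoffs in kept.items()}


# Made-up counts for two fictional tools; not data from the review.
results = [
    StudyResult("Tool A", cutoff=5, tp=18, fp=12, fn=2, tn=68),
    StudyResult("Tool A", cutoff=8, tp=14, fp=4, fn=6, tn=76),
    StudyResult("Tool B", cutoff=10, tp=10, fp=30, fn=10, tn=50),
]
print(retained_cutoffs(results))
```

In this made-up example only the lower cut-off for Tool A is retained; Tool B has no cut-off that meets both criteria and would be rejected. PPV and NPV depend on the prevalence in each study sample, so they are not directly comparable across studies in the way that sensitivity and specificity are.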
Tools that met these criteria were then assessed for clinical utility (the feasibility of using a tool in clinical practice) from the original articles (where possible), marketing material (including costs), the tools' authors and instruction manuals. A previously published tool to assess clinical utility of outcome measures (Connell & Tyson, 2012) was reviewed and adapted by a consultation group of occupational therapists and clinical psychologists working in stroke services in a large UK conurbation to reflect their priorities for selecting screening tools. These are summarized as follows: as access to clinical psychology is limited for most stroke survivors, it is important that screening tools can be completed by any member of the multidisciplinary team to identify moderate to severe difficulties for onward referral (NSSA, 2008). This is often undertaken in addition to their traditional workload so screening tools need to be quick and easy to administer, with minimal training requirements. Finally, they need to be inexpensive or, preferably, freely available, particularly because, in the current financial climate, a cheaper tool would be chosen over one incurring costs if it performed equally well in terms of psychometrics.
The final utility criteria and scores were:
(a) Time to administer and score the measure: 2 = ⩽5 min; 1 = 6–10 min; 0 = ⩾11 min.
(b) Initial costs for purchase of the measure (e.g. starter kit including manual): 2 = freely available; 1 = cost < £100; 0 = cost ⩾£100 or unavailable.
(c) Additional cost per record form: 1 = no additional costs; 0 = additional cost or unavailable.
(d) Need for specialist training to administer and score the measure: 1 = no specialist training required; 0 = specialist training required.
Summing these scores gave a maximum of six points, with higher scores indicating greater clinical utility. Tools that scored < 6 were rejected at this stage.
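The scoring scheme can be expressed as a small Python sketch. The function and parameter names are hypothetical; treating an unknown administration time as a score of 0, and treating any time above 5 min and up to 10 min as a score of 1, are assumptions made for illustration, as the criteria are stated only in whole minutes.

```python
from typing import Optional

MAX_SCORE = 6  # 2 (time) + 2 (initial cost) + 1 (record forms) + 1 (training)


def utility_score(
    minutes: Optional[float],
    initial_cost_gbp: Optional[float],
    per_form_cost_gbp: Optional[float],
    needs_specialist_training: bool,
) -> int:
    """Sum criteria (a) to (d); None means the information was unavailable."""
    # (a) Time to administer and score. Unknown time scores 0 (assumption).
    if minutes is None:
        time_pts = 0
    elif minutes <= 5:
        time_pts = 2
    elif minutes <= 10:
        time_pts = 1
    else:
        time_pts = 0

    # (b) Initial purchase cost: free = 2, under GBP 100 = 1, otherwise 0.
    if initial_cost_gbp is None:
        initial_pts = 0
    elif initial_cost_gbp == 0:
        initial_pts = 2
    elif initial_cost_gbp < 100:
        initial_pts = 1
    else:
        initial_pts = 0

    # (c) Additional cost per record form: none = 1, any cost or unavailable = 0.
    form_pts = 1 if per_form_cost_gbp == 0 else 0

    # (d) Specialist training: not required = 1, required = 0.
    training_pts = 0 if needs_specialist_training else 1

    return time_pts + initial_pts + form_pts + training_pts


def meets_utility_criteria(score: int) -> bool:
    """Only tools with the maximum score are retained."""
    return score == MAX_SCORE


# Illustrative inputs only; not values taken from Table 1.
print(utility_score(5, 0, 0, False))
print(utility_score(8, 50, 2, True))
```

Because only tools scoring the maximum are retained, the rule amounts to requiring that a tool is quick (5 min or less), free to obtain and use, and needs no specialist training.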
Results
The searches revealed 30 papers that met the selection criteria, involving 3751 stroke survivors and 27 screening tools. All 27 tools were tested to detect depression and eight were also used to identify anxiety. The tools' progress through the review is summarized in Fig. 1 and detailed below. The selected tools are described in Table 1 and details of the population tested are presented in Table 2. Most of the selected papers recruited their participants through acute admissions to hospital (Parikh et al. 1988; Williams et al. 2005; Bennett et al. 2006; Lightbody et al. 2007; Healey et al. 2008; Lee et al. 2008; Hacker et al. 2010; de Man-van Ginkel et al. 2012a; Kang et al. 2012), often consecutively (Watkins et al. 2001a, b, 2007; Aben et al. 2002; Benaim et al. 2004; Tang et al. 2004a; Berg et al. 2009; Sagen et al. 2009; de Man-van Ginkel et al. 2012b; Tham et al. 2012). Others recruited from in-patient rehabilitation facilities (Tang et al. 2004b, c; Turner-Stokes et al. 2005; Roger & Johnson-Greene, 2009; de Man-van Ginkel et al. 2012a), or a mixed in-patient and community-dwelling population (Shinar et al. 1986; Lincoln et al. 2003; Sivrioglu et al. 2009; Turner et al. 2012); only one study recruited solely in the community (Agrell & Dehlin, 1989). Three studies involved participants who were also taking part in a clinical trial (O'Rourke et al. 1998; Lincoln et al. 2003; Williams et al. 2005). Most assessments were made in the acute (within 1 month) (Shinar et al. 1986; Watkins et al. 2001a, b, 2007; Aben et al. 2002; Tang et al. 2004b, c; Bennett et al. 2006; Lightbody et al. 2007; Lee et al. 2008; Berg et al. 2009; Roger & Johnson-Greene, 2009; Hacker et al. 2010; Kang et al. 2012) or subacute (from 1 to 6 months) stages post-stroke (Johnson et al. 1995; O'Rourke et al. 1998; Tang et al. 2004a; Turner-Stokes et al. 2005; Williams et al. 2005; Bennett et al. 2006; Watkins et al. 2007; Healey et al. 2008; Berg et al. 2009; Sagen et al. 2009; de Man-van Ginkel et al. 2012a, b). Five papers considered mood disorders in the long term (more than 6 months) after stroke (Agrell & Dehlin, 1989; Berg et al. 2009; Sivrioglu et al. 2009; Kang et al. 2012; Turner et al. 2012) and one study combined assessment scores of participants between 1 week and 2 years post-stroke (Parikh et al. 1988).

Fig. 1. Study flowchart. ADRS, Aphasia Depression Rating Scale; BASDEC, Brief Assessment Schedule Depression Cards; BDI, Beck Depression Inventory (FS, Fast Screen); CES-D, Center for Epidemiologic Studies Depression Scale; DISCs, Depression Intensity Scale Circles; GDS, Geriatric Depression Scale (30-item); GDS-15, 15-item Geriatric Depression Scale; GHQ-28, 28-item General Health Questionnaire; HADS, Hospital Anxiety and Depression Scale; K10, Kessler-10; NGRS, Numbered Graphic Rating Scale; PHQ-2, two-item Patient Health Questionnaire; PHQ-9, nine-item Patient Health Questionnaire; SADQ-H, Stroke Aphasic Depression Questionnaire – Hospital version; SADQ-10, 10-item SADQ; SADQ-H10, 10-item SADQ-H; SIDI, Stroke Inpatient Depression Inventory; SoDS, Signs of Depression Scale; VAMS, Visual Analogue Mood Scales; VASES, Visual Analogue Self-Esteem Scale; WDI, Wakefield Depression Inventory; Zung SDS, Zung Self-Rating Depression Scale.
Table 1. Brief description of each identified measure meeting psychometric criteria in at least one validation study

ADRS, Aphasia Depression Rating Scale; BASDEC, Brief Assessment Schedule Depression Cards; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory Second Edition; CES-D, Center for Epidemiologic Studies Depression Scale; GDS, Geriatric Depression Scale (30-item); GDS-15, 15-item Geriatric Depression Scale; GHQ-28, 28-item General Health Questionnaire; HADS, Hospital Anxiety and Depression Scale; PHQ-2, two-item Patient Health Questionnaire; PHQ-9, nine-item Patient Health Questionnaire; SADQ-H, Stroke Aphasic Depression Questionnaire – Hospital version; SADQ-H10, 10-item Stroke Aphasic Depression Questionnaire – Hospital version; SoDS, Signs of Depression Scale; VAMS, Visual Analogue Mood Scale; n.a., not available.
Table 2. Descriptions of the selected papers

ADRS, Aphasia Depression Rating Scale; BASDEC, Brief Assessment Schedule Depression Cards; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory, Second Edition; CES-D, Center for Epidemiologic Studies Depression Scale; CIDI, Composite International Diagnostic Interview (Robins et al. 1988); GDS, Geriatric Depression Scale (30-item); GDS-15, 15-item Geriatric Depression Scale; GHQ-28, 28-item General Health Questionnaire; HADS-A, Hospital Anxiety and Depression Scale – Anxiety subscale; HADS-D, Hospital Anxiety and Depression Scale – Depression subscale; HADS-T, Hospital Anxiety and Depression Scale – Total score; IQR, interquartile range; MADRS, Montgomery–Asberg Depression Rating Scale; MINI, Mini International Neuropsychiatric Interview (Sheehan et al. 1998); NPV, negative predictive value; PAS, Psychiatric Assessment System; PHQ-2, two-item Patient Health Questionnaire; PHQ-9, nine-item Patient Health Questionnaire; PPV, positive predictive value; PSE, Present State Examination (Wing et al. 1974); SADQ-H, Stroke Aphasic Depression Questionnaire – Hospital version; SADQ-H10, 10-item Stroke Aphasic Depression Questionnaire – Hospital version; SADS, Schedule for Affective Disorders and Schizophrenia (Endicott & Spitzer, 1978); SCAN, Schedules for Clinical Assessment in Neuropsychiatry (WHO, 1992); SCID, Structured Clinical Interview for DSM-IV (First et al. 1995); s.d., standard deviation; SoDS, Signs of Depression Scale; VAMS, Visual Analogue Mood Scale.
The identified studies used a range of criterion measures as the reference gold standard; most used a psychiatrist's opinion (Agrell & Dehlin, 1989; Benaim et al. 2004), usually based on a semi-structured interview or assessment tool (Shinar et al. 1986; Parikh et al. 1988; Johnson et al. 1995; O'Rourke et al. 1998; Aben et al. 2002; Lincoln et al. 2003; Tang et al. 2004a, b, c; Williams et al. 2005; Lightbody et al. 2007; Healey et al. 2008; Roger & Johnson-Greene, 2009; Sagen et al. 2009; Kang et al. 2012; de Man-van Ginkel et al. 2012b; Tham et al. 2012; Turner et al. 2012). Most used DSM criteria (Shinar et al. 1986; Parikh et al. 1988; Johnson et al. 1995; O'Rourke et al. 1998; Aben et al. 2002; Lincoln et al. 2003; Tang et al. 2004a, b, c; Turner-Stokes et al. 2005; Williams et al. 2005; Lightbody et al. 2007; Healey et al. 2008; Lee et al. 2008; Berg et al. 2009; Roger & Johnson-Greene, 2009; Sagen et al. 2009; Sivrioglu et al. 2009; de Man-van Ginkel et al. 2012b; Kang et al. 2012; Tham et al. 2012; Turner et al. 2012) for classification of psychiatric disorders, although one also used ICD criteria (Lincoln et al. 2003). Others used another screening or assessment tool as the gold standard (Watkins et al. 2001a, b, 2007; Bennett et al. 2006; Hacker et al. 2010; de Man-van Ginkel et al. 2012b).
Screening for depression
Measures identified
Twenty-seven tools met the inclusion criteria and fell into three categories:
(1) Verbal tools for those who could self-report their mood (n = 15): the Beck Depression Inventory (BDI; Beck et al. 1961), BDI Fast Screen (BDI-S; Beck et al. 2000) and BDI – Second Edition (BDI-II; Beck et al. 1996), the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff, 1977), the 28-item General Health Questionnaire (GHQ-28; Goldberg & Williams, 1988), the Geriatric Depression Scale (GDS, 30 items; Yesavage et al. 1983) and its 15-item version (GDS-15; Sheikh & Yesavage, 1986), the Hospital Anxiety and Depression Scale (HADS; Zigmond & Snaith, 1983), the Kessler-10 (K10; Kessler et al. 2003), the two-item Patient Health Questionnaire (PHQ-2; Kroenke et al. 2003) and the nine-item version (PHQ-9; Spitzer et al. 1999), the Stroke Inpatient Depression Inventory (SIDI; Rybarczyk et al. 1996), the Wakefield Depression Inventory (WDI; Snaith et al. 1971), the Yale question (Lachs et al. 1990; Mahoney et al. 1994) and the Zung Self-Rating Depression Scale (Zung SDS; Zung, 1965).
(2) Tools involving visual aids (visual analogue scales or pictures) to aid self-report for those with communication problems (n = 7): the Brief Assessment Schedule Depression Cards (BASDEC; Adshead et al. 1992), the Depression Intensity Scale Circles (DISCs; Turner-Stokes et al. 2005), the distress thermometer (Holland et al. 2011), the Numbered Graphic Rating Scale (NGRS; Turner-Stokes et al. 2005), ‘smiley faces’ (Lee et al. 2008), Visual Analogue Mood Scales (VAMS) ‘sad item’ (Stern, 1997) and Visual Analogue Self-Esteem Scales (VASES; Brumfitt & Sheeran, 1999).
(3) Observer-rated measures for people unable to self-report their mood due to communication or cognitive impairments (n = 5): the Aphasia Depression Rating Scale (ADRS; Benaim et al. 2004), the Stroke Aphasic Depression Questionnaire (SADQ; Sutcliffe & Lincoln, 1998), SADQ – Hospital version (SADQ-H; Lincoln et al. 2000) and its 10-item version (SADQ-H10; Lincoln et al. 2000), and the Signs of Depression Scale (SoDS; Hammond et al. 2000).
The sensitivity and specificity for each cut-off score for the depression screening tools are detailed in Table 2. Eleven tools did not meet sensitivity and specificity criteria at any cut-off scores and were rejected (see Fig. 1). This left 16 tools that could accurately detect depression in people with stroke. Ten were verbal self-report tools, two used visual aids and four were observational measures; these are described briefly in the following sections.
Verbal self-report tools
The verbal self-report tools were all questionnaires, completed either by the person with stroke or verbally through interview with a health-care professional, apart from the single-question ‘Yale’ tool (Lachs et al. 1990; Mahoney et al. 1994), which was reported verbally. All were developed to identify problems in a general population and subsequently applied to stroke.
The questionnaires ranged from two (PHQ-2) to 30 questions (GDS). Five used four- or five-point Likert scales for people to rate the severity or frequency of their symptoms (GHQ-28, HADS, PHQ-2, PHQ-9 and CES-D). Three used a ‘yes/no’ response format (GDS, GDS-15 and Yale question) whereas the BDI and its second edition BDI-II included four multiple choice statements that graded symptom severity. Most scales were designed to detect depression alone; exceptions were the GHQ-28, which is a measure of general distress, and the HADS, which includes separate scales for depression and anxiety. The time scale over which people rated their mood varied: present mood state (Yale question), the previous week (BDI, CES-D, GDS, GDS-15 and HADS) and the past 2 weeks (BDI-II, GHQ-28, PHQ-2 and PHQ-9) in line with DSM (APA, 1987, 1994, 2000). Most tools include somatic items (BDI, BDI-II, CES-D, GHQ-28 and PHQ-9) whereas others seek to limit their inclusion (GDS and GDS-15) or omit them completely (HADS, PHQ-2 and Yale).
Tools incorporating visual aids
Two tools incorporated visual prompts to aid self-report. The BASDEC consists of 19 cards, each with a single printed statement, that the person sorts into ‘true’ and ‘false’ piles to describe their present mood state. The VAMS ‘sad item’ involves a vertical line with a cartoon sad face at one end with a verbal descriptor and a neutral face at the opposite end. The person points to where they see themselves on the scale at the present time. Neither measure incorporated somatic symptoms of low mood.
Observational measures
Four observational measures were identified for use with people who could not self-report. The SADQ-H and the SADQ-H10 require an observer to rate the frequency of behaviours related to low mood over the previous week, using a four-point Likert scale, with elevated scores on two consecutive weeks suggesting low mood. The SoDS requires the observer to rate the presence of behaviours related to low mood over the previous week in a ‘yes/no’ response format. The ADRS uses an interview with, or observation of, the person to indicate severity. All of the measures except for the SoDS incorporated somatic mood-related symptoms.
Optimal cut-off scores
Having selected screening tools that could accurately identify depression, the optimal cut-off score to detect either major depression or any depressive disorder was explored (Tables 3 and 4). Sufficient data were available for only four verbal screening tools to identify effective cut-off scores for both major depression and any depressive disorder (BDI, GHQ-28, HADS total score and HADS depression subscale). Of the observational tools and those incorporating visual prompts, only the ADRS, BASDEC, SADQ-H and VAMS ‘sad item’ demonstrated effective cut-off scores although no distinction was made between severity of depression. For all tools, data were reported for multiple cut-off scores, but the results were so highly varied that optimal cut-off scores could not be identified for any of them.
Table 3. Sensitivity and specificity of verbal self-report tools to detect major depression or any depressive disorder

BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory, Second Edition; CES-D, Center for Epidemiologic Studies Depression Scale; GDS, Geriatric Depression Scale (30-item); GDS-15, 15-item Geriatric Depression Scale; GHQ-28, 28-item General Health Questionnaire; HADS, Hospital Anxiety and Depression Scale; PHQ-2, two-item Patient Health Questionnaire; PHQ-9, nine-item Patient Health Questionnaire.
Cut-off scores with at least one study reporting 80% sensitivity and 60% specificity criteria have been included. Where no studies reached the sensitivity and specificity criteria at a cut-off score, the cut-off score has been removed. Sensitivity and specificity levels that reach the selection criteria are highlighted in bold.
Table 4. Sensitivity and specificity of screening tools for post-stroke depression for those who are unable to self-report or require visual prompts

ADRS, Aphasia Depression Rating Scale; BASDEC, Brief Assessment Schedule Depression Cards; SADQ-H, Stroke Aphasic Depression Questionnaire – Hospital version; SADQ-H10, 10-item Stroke Aphasic Depression Questionnaire – Hospital version; SoDS, Signs of Depression Scale; VAMS, Visual Analogue Mood Scale.
Cut-off scores with at least one study reporting 80% sensitivity and 60% specificity criteria have been included. Where no studies reached the sensitivity and specificity criteria at a cut-off score, the cut-off score has been removed. Sensitivity and specificity levels that reach the selection criteria are highlighted in bold.
Clinical utility
The 16 selected screening tools were then assessed for clinical utility (Table 1). All of the tools took less than 15 min to administer; however, some of the reported administration times were not specific for people with stroke, who often have communication and cognitive problems and may therefore take longer to complete the mood screen. All except the ADRS could be administered without specialist training; the time needed to complete the ADRS could not be ascertained. Ten tools were freely available (ADRS, CES-D, GDS, GDS-15, PHQ-2, PHQ-9, SADQ-H, SADQ-H10, SoDS and the Yale question), one (BDI-II) required the purchase of an initial starter kit, which costs under £100 (approximately US$150), and three other tools cost over £100 (GHQ-28, HADS and VAMS). All tools that incurred costs for initial purchase also generated additional costs per person screened, either for paper record forms or for electronic reports. Two tools (BASDEC and BDI) were not available for use.
Seven tools met all clinical utility criteria (see Fig. 1): four verbal measures (PHQ-2, PHQ-9, GDS-15 and Yale) and three observational measures (SADQ-H, SADQ-H10 and SoDS). None of the tools that incorporated visual prompts to aid self-report met the clinical utility criteria (Table 5).
Table 5. Sensitivity and specificity of verbal self-report screening tools for post-stroke anxiety

HADS, Hospital Anxiety and Depression Scale.
Cut-off scores with at least one study meeting 80% sensitivity and 60% specificity criteria have been included. Where no studies reached the sensitivity and specificity criteria at a cut-off score, the cut-off score has been removed. Sensitivity and specificity that reach selection criteria are highlighted in bold.
Screening for anxiety
Eight screening tools tested to detect anxiety post-stroke were identified: three were verbal tools (GDS, GHQ-28 and HADS), three were observational (SADQ-H, SADQ-H10 and SoDS) and two used visual analogue scales (VAMS and VASES; Fig. 1). Only the HADS anxiety subscale was designed specifically to measure anxiety, although the total HADS has also been used; these two were the only measures to yield adequate sensitivity and specificity data at any cut-point (Table 5). None of the visual analogue scales or observational measures met selection criteria.
Clinical utility
Clinical utility for the HADS is mixed (Table 1): it can be administered in under 5 min with minimal training for staff, but it incurs both initial and recurrent costs and therefore does not meet clinical utility criteria.
Discussion
Our extensive search strategies identified a wide range of tools to screen for mood disorders after stroke, but only the SADQ-H met both the psychometric and clinical utility criteria for both major depression and any depressive disorder; none of the tools incorporating verbal self-report or visual aids met all these criteria. The HADS, BDI, ADRS, BASDEC, GHQ-28 and VAMS ‘sad item’ all yielded good psychometric data for both major depression and any depressive disorder, indicating that they could accurately identify stroke survivors who needed further assessment and possibly treatment for depression. However, the BASDEC and the BDI are currently unavailable (the former could not be located and the latter was superseded by the BDI-II); the other tools incur financial costs (although ‘pirate’ copies can be downloaded from the internet), require specialist training and/or are time-consuming to administer. Two tools met the clinical utility criteria and yielded acceptable psychometrics at a specific cut-off score for either major depression or any depressive disorder. The GDS-15 can detect any depressive disorder (but not specifically major depression) and so may be best used as an initial screen to identify people with stroke who require further evaluation. The PHQ-9 can detect major depression, although sensitivity drops to 78% in identifying milder symptoms (Williams et al. Reference Williams, Brizendine, Plue, Bakas, Tu, Hendrie and Kroenke2005). The HADS (both total score and anxiety subscale) was the only effective tool to identify anxiety, but it does incur a financial cost.
There has only been one previous systematic review of screening tools for depression after stroke and this focused on the detection of major depression only (Meader et al. Reference Meader, Moe-Byrne, Llewellyn and Mitchell2014). One of our objectives in this study was to enable implementation of screening tools by identifying optimal cut-off scores for the tools we could recommend. However, this proved impossible because of the heterogeneity of participant characteristics, study designs and observed sensitivity and specificity estimates. Some of the heterogeneity was due to the varied choice of criterion measures against which the accuracy of the screening tools was judged. A semi-structured interview by a psychiatrist is widely accepted as the ‘gold standard’ reference criterion, but the use of different interview tools and diagnostic criteria can result in variation in diagnostic accuracy. The variability and accuracy of these criterion measures will affect the reported accuracy of the screening tools that are measured against them. More consistent use of a criterion measure would enhance meta-analysis and comparison between tools.
A limitation of the psychiatric interview and diagnostic tools is that they are difficult for people with communication and cognitive difficulties to complete. Consequently, many studies have used other screening tools as the criterion measure (Watkins et al. Reference Watkins, Daniels, Jack, Dickinson and van den Broek2001a, Reference Watkins, Leathley, Daniels, Dickinson, Lightbody, van den Broek and Jack b, Reference Watkins, Lightbody, Sutton, Holcroft, Jack, Dickinson, van den Broek and Leathley2007; Bennett et al. Reference Bennett, Thomas, Austen, Morris and Lincoln2006; Lee et al. Reference Lee, Tang, Yu and Cheung2008; Hacker et al. Reference Hacker, Stark and Thomas2010; de Man-van Ginkel et al. Reference de Man-van Ginkel, Gooskens, Schepers, Schuurmans, Lindeman and Hafsteinsdóttir2012a) or excluded people with these problems (even those that tested observational tools for people unable to self-report), making the assumption that the results from more able populations would generalize to people with stroke. There is limited evidence to support or refute this assumption. Communication and cognitive problems are common after stroke, and excluding people with these problems limits the generalizability and relevance of the findings to clinical practice. Further research is needed involving pragmatic samples of people with stroke at all stages of recovery and survivorship, using predefined cut-off scores to establish the optimal thresholds so that people with both major and mild disorders can be identified.
Most of the selected tools were originally developed for a psychiatric population and then applied to stroke survivors, assuming that the experience of mood disorder after stroke has the same construct as in other populations. Although stroke survivors display similar distributions of symptom scores to people with other physical illnesses (House et al. Reference House, Dennis, Mogridge, Warlow, Hawton and Jones1991), mood disorders may be experienced differently in someone with additional physical or cognitive impairments (Gainotti et al. Reference Gainotti, Azzoni and Marra1999). It is a matter of modern health policy that measurement tools should reflect the issues that are important and relevant to service users (so-called patient-reported outcome measures) (Darzi, Reference Darzi2008) and that service users' views and expertise should be involved in all levels of research. Thus, it is notable that none of the selected screening tools included stroke survivors' perspectives in their construction and, although they have apparent face validity, the content validity for stroke survivors is unknown.
The inclusion of somatic items in the mood screening tools is particularly controversial because these can overlap with symptoms of stroke and/or effects of being in hospital. For example, many included items assess fatigue, concentration and memory problems or altered activity or sleeping patterns, which are common impairments after stroke independent of any emotional difficulties or hospitalization. Inclusion of such items could inflate the scores and might lead to ineffective clinical decision making or inaccurate research conclusions. To avoid the confounding effects of somatic items, some screening tools exclude them; however, the impact is unclear as there is some evidence that somatic symptoms are among the best differentiators between stroke survivors with and without depression (de Coster et al. Reference de Coster, Leentjens, Lodder and Verhey2005). It might be better to adjust the cut-off scores to reflect the increase in prevalence of these symptoms in the stroke population. Further research is required to investigate the construct of post-stroke depression and anxiety and to establish the content validity of the screening tools for stroke survivors.
A related issue is the construct validity of the screening tools for stroke survivors. Most selected tools were developed using classical test theory and are scored by summing the scores from the different items to produce a ‘total’ score. This is a controversial approach; many proponents of item response theory would consider this an inappropriate use of categorical data that could produce misleading scores (Tennant & Conaghan, Reference Tennant and Conaghan2007). Three of the selected tools have been subjected to Rasch analysis in relatively small samples: the BDI-II, GDS and HADS depression subscale. These studies report inconsistent findings regarding the unidimensionality of the tools, and all identified redundant or disordered items, or ineffective scoring methods that did not fit the model (Pickard et al. Reference Pickard, Dalal and Bushnell2006; Tang et al. Reference Tang, Wong, Chiu and Ungvari2007; Siegert et al. Reference Siegert, Tennant and Turner-Stokes2010). Only one of the studies provided data to enable the ordinal scoring data to be transformed into interval level data to allow use of parametric statistics and change scores to be calculated (Siegert et al. Reference Siegert, Tennant and Turner-Stokes2010), and it remains a moot point whether this makes an important difference to the data and how they are reported. Further development of the tools should include Rasch analysis to ensure an effective and efficient scale structure.
Most of the selected tools rely on the person's ability to self-report their mood, which is often compromised after stroke. In an attempt to overcome this, several tools use visual aids to facilitate non-verbal responses. However, these are based on the assumption that stroke survivors are able to interpret the visual aids. Work applying visual aids in tools to measure pain suggests that many stroke survivors, particularly those with right-hemisphere damage, find this difficult (Benaim et al. Reference Benaim, Froger, Cazottes, Gueben, Porte, Desnuelle and Pelissier2007). Acceptability and clinical utility of these tools need to be examined with people with stroke with different types and severity of impairment.
Finally, we identified a wide range of tools to detect depression in stroke survivors, but there are few standardized tools to detect anxiety and emotionalism. Further work is needed to develop person-centred tools for these purposes.
Limitations
A limitation of this study is that its quality depends on the quality of the articles identified. There is evidence of selection bias in the two studies that excluded participants who did not report low mood (Lincoln et al. Reference Lincoln, Nicholl, Flannaghan, Leonard and Van der Gucht2003; Williams et al. Reference Williams, Brizendine, Plue, Bakas, Tu, Hendrie and Kroenke2005), thereby artificially increasing the prevalence of depression in the sample and probably affecting the reported PPV and NPV. We also included studies from around the world and note that the construct of depression could vary in different cultures, such as collectivist societies, which may have contributed to the heterogeneity of cut-off scores. Furthermore, we only included published English language studies and so may have missed relevant publications in other languages or unpublished data. To produce a generalizable result, we included studies that assessed stroke survivors at all stages of recovery, from the acute hospital setting to several years post-stroke. This may have contributed to our difficulties in ascertaining the optimum cut-off score for each tool, as sensitivity and specificity values at different cut-offs have been demonstrated to vary over time in the period following stroke (Berg et al. Reference Berg, Lonnqvist, Palomaki and Kaste2009). Finally, although we used recommended sensitivity and specificity criteria to reflect clinical priorities, alternative criteria may be warranted in different situations; for example, higher specificity may be required where resources for further assessment are limited. Further research should examine and compare the factor structure of depression globally and at different stages post-stroke to identify the most effective tools for each situation.
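The dependence of PPV and NPV on sample prevalence noted above follows directly from Bayes' theorem and can be illustrated with a short sketch. The function name and all input values below are assumptions chosen for illustration: sensitivity and specificity are set at the 80% and 60% selection thresholds, and the two prevalence values are hypothetical rather than taken from any included study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a screening test, given its sensitivity and
    specificity and the prevalence of the condition in the sample."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Expected proportions of the sample in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical comparison: the same tool in an unselected sample and in a
# sample enriched for depression by excluding people without low mood.
ppv_unselected, npv_unselected = predictive_values(0.80, 0.60, prevalence=0.30)
ppv_enriched, npv_enriched = predictive_values(0.80, 0.60, prevalence=0.60)
```

With these illustrative inputs, PPV rises from about 0.46 to 0.75 and NPV falls from about 0.88 to 0.67 when prevalence doubles, even though sensitivity and specificity are unchanged. Predictive values from enriched samples therefore should not be compared directly with those from unselected samples.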
Clinical utility is rarely considered in tool development although it is key to uptake in clinical practice and research. We worked with clinicians in a range of stroke services in one of the largest conurbations in the UK to develop our measure of clinical utility, so we are confident that it is representative of the issues that limit implementation in the UK, at least. However, although the barriers to implementing tools in practice are fairly universal, the cut-off points may be context specific. Different models of health care may have different funding limits or time available for assessment. A further limitation is our reliance on reported administration times within the general population, as there were few reports within a stroke population. It is likely that screening tools would take longer to administer with patients with stroke-related impairments and activity limitations so the reported administration times may be underestimates. To address this, we contacted the authors or publishers of the selected tools for further information regarding clinical utility, but the information was not always available and should be considered a limitation.
Conclusions
The following tools can accurately screen for depression in stroke survivors in clinical practice: the GDS-15 can detect any depressive disorder and the PHQ-9 can detect major depression, whereas the SADQ-H can be used with stroke survivors who are unable to self-report. The HADS (both the total scale and the anxiety subscale) can effectively identify anxiety post-stroke but clinical utility is limited by the costs involved. We were unable to establish the optimal cut-off scores for these or the other selected tools.
Acknowledgements
This project was funded through a Knowledge Transfer Partnership (grant no. 0007812) by the Technology Strategy Board and the Greater Manchester and Cheshire Cardiovascular Network.
Declaration of Interest
None.