Introduction
The Epworth Sleepiness Scale was introduced by Johns in 1991. It asks respondents to rate their likelihood of falling asleep in eight situations, each scored from zero to three, giving a total score of 0 to 24.Reference Johns1 It has gained huge popularity, has been translated into seven languages, and is used all over the world for assessment of daytime sleepiness and sleep apnoea.
There have been many studies of the association between Epworth scoring and sleep apnoea. Several studies have assessed the association between the Epworth Scale and various snoring-related parameters, such as the Apnoea–Hypopnoea Index (AHI), Mean Sleep Latency Test and Mean Wakefulness Test. However, no study to our knowledge has analysed the cause–effect association between Epworth scoring and sleep apnoea on a one-to-one basis, without introducing other confounding factors.
Introducing other parameters into the association has the potential to introduce bias and problems related to intercorrelations between the independent variables, unless the confounding factors have been controlled for.Reference Gottlieb, Yao, Redline, Ali and Mahowald2 Problems associated with such covariates have rarely been addressed in the literature. It is therefore not surprising that the evidence in the literature is inconclusive with regard to the diagnostic value of the Epworth Scale in patients with sleep apnoea.
It is imperative that the association between the Epworth Scale and obstructive sleep apnoea syndrome (OSAS) should be assessed on a one-to-one basis, in order to assess the efficacy of the Scale in the screening of patients presenting with sleep apnoea. With this objective in mind, we carried out a retrospective analysis using data from the sleep study database at Monklands Hospital.
Materials and methods
The first part of the project involved a systematic literature review. We searched the Cochrane, DARE (Database of Abstracts of Reviews of Effects), EMBASE (Excerpta Medica Database), CINAHL (Cumulative Index to Nursing and Allied Health Literature) and Medline (1996 onwards) databases. In addition, we searched for established guidelines, systematic reviews and evidence-based summaries. The search terms used were ‘Epworth’, ‘snoring’ and ‘apnoea’.
A total of 107 English language studies were identified, including eight Cochrane reviews and three systematic reviews. Each of these papers was studied for any demonstration of an association between the Epworth Sleepiness Scale and sleep apnoea. Only 16 studies were suitable for inclusion in our review. The results of these studies were categorised as type one (a positive association between Epworth scoring and sleep apnoea) and type two (no significant association between Epworth scoring and sleep apnoea). These studies are detailed in Table I.
Table I Studies of ESS vs snoring parameters
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160802135953-43465-mediumThumb-S0022215111003082_tab1.jpg?pub-status=live)
ESS = Epworth Sleepiness Scale; AHI = Apnoea–Hypopnoea Index; OSAS = obstructive sleep apnoea syndrome; SMD = standardised mean difference; OR = odds ratio; MSLT = Mean Sleep Latency Test; min = minute; ROC = receiver operating characteristic; AUC = area under the curve; RDI = Respiratory Disturbance Index; PSG = polysomnogram; ANOVA = analysis of variance; EDS = Excessive Daytime Sleepiness
Each study was reviewed and, wherever possible, the effect size of the Epworth Scale in relation to the AHI or Respiratory Disturbance Index was calculated using Cohen's definition.Reference Cohen18 In studies that used a correlation coefficient between the Epworth scores and the AHI or the Respiratory Disturbance Index, the effect size (unweighted) was the magnitude of the correlation coefficient itself.
Cohen's definition of small, medium and large effect sizes was used to identify effect sizes in these studies. In studies in which comparison of mean Epworth scores was undertaken, the effect size was defined as a standardised mean difference between the groups.Reference Cohen18 RevMan version 5 software (Cochrane Incorporated) was used to compute effect sizes in the various studies and to perform a partial meta-analysis.
In some studies, categorisation of patients by AHI or Respiratory Disturbance Index values (for comparison with Epworth scores) resulted in more than two groups. In these studies, the group with the highest mean and standard deviation values was compared with the group with the lowest mean and standard deviation values, in order to estimate the effect size as described by Cohen.Reference Cohen18
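As a minimal sketch of the standardised mean difference described above, the following computes Cohen's d from two groups' summary statistics, together with an approximate standard error. The function names and the input values are hypothetical and are not taken from any of the reviewed studies; RevMan may also apply a small-sample correction that is omitted here.

```python
import math

def standardised_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def smd_standard_error(d, n_a, n_b):
    """Approximate large-sample standard error of d."""
    return math.sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b)))

# Hypothetical summaries: highest-AHI group vs lowest-AHI group (illustrative only)
d = standardised_mean_difference(11.0, 5.0, 50, 8.0, 5.0, 50)
se = smd_standard_error(d, 50, 50)
ci_95 = (d - 1.96 * se, d + 1.96 * se)  # approximate 95 per cent confidence interval
```

By Cohen's conventional thresholds, values of d around 0.2, 0.5 and 0.8 are read as small, medium and large effects respectively.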
A Forest plot was generated using RevMan software, and this plotted effect sizes and 95 per cent confidence intervals (CI) of the effect size (Figure 1).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160802135953-49283-mediumThumb-S0022215111003082_fig1g.jpg?pub-status=live)
Fig. 1 Forest plot showing effect sizes as standard mean difference and 95 per cent confidence intervals, for studies using comparison of means. RDI = Respiratory Disturbance Index; AHI = Apnoea–Hypopnoea Index; IV = Inverse Variance method; SD = standard deviation; CI = confidence interval
The sleep study database of Monklands Hospital was used to extract the details of the patients who underwent a sleep study at our centre during a 10-year period from 1998 to 2008. All patients referred with loud, constant snoring, including those with possible sleep apnoea, underwent a sleep study. The sample size was 343 patients.
All patients had a basic sleep assessment in the form of pulse oximetry and periodic observation. Patients having more than two desaturations per hour underwent five-channel home polysomnography. Evidence in the current literature uses a more stringent form of reference, which directs that only patients with more than five desaturations per hour should undergo full polysomnography.Reference Netzer, Eliasson, Netzer and Kristo19 In Woodhead and colleagues' study, the presence of OSAS was taken to be suggested by more than 10 falls in saturation of more than 3 per cent from the baseline.Reference Woodhead, Davies and Allen20 Our low threshold for referral ensured the least likelihood of a false negative result as regards identification of sleep apnoea. We used a validated definitionReference Lacasse, Godbout and Sériés21 of the AHI, involving more than 15 episodes per hour, as the cut-off point for defining sleep apnoea in a patient.
The Epworth score and presence or absence of sleep apnoea were recorded for each patient, using the definitions mentioned in the above paragraph. Various statistical measures were used to analyse the association between the two results, using no other variable apart from the Epworth score to analyse the predictability of sleep apnoea.
The first part of the analysis compared the mean Epworth score in the group with sleep apnoea with that in the group without sleep apnoea. Student's t-test was used to compare the mean values. A logistic regression analysis was used to assess how well the Epworth score predicted the probability of sleep apnoea, with the strength of the association expressed as an odds ratio. For the purposes of validation and illustration of results, a receiver operating characteristic (ROC) curve analysis was used to determine any association between the Epworth score and the probability of OSAS occurrence.
Results and analysis
Literature review
The first part of the study involved a literature review. This included a large number of studies using Epworth scoring. We found 16 studies directly investigating the association between the Epworth score and OSAS. Studies that involved evaluation of continuous positive airway pressure treatment were not included in our analysis. Five of the 16 studies found a significant association between Epworth score and OSAS (Table I) (i.e. a type one result). Eleven of the 16 studies showed no significant association between Epworth score and OSAS (i.e. a type two result) (Table I).
Significance was determined by the statistical inference drawn by the study authors. The differentiation between type one and type two results was primarily based on the study authors' results and/or conclusions. Post hoc effect sizes were determined wherever possible using the criteria mentioned above. Two studiesReference Herzog, Kühnel, Bremert, Herzog, Hosemann and Kaftan3, Reference Montoya, Bedialauneta, Larracochechea, Ibarguen, Del Rey and Fernandez5 used regression analysis with an outcome measure of OSAS as a dependent variable, and these were the studies in which the cause–effect relationship could be established. The regression coefficient in these studies was small (Table I).
In studies that compared the mean Epworth score among groups defined by the AHI and the Respiratory Disturbance Index, a uniform cut-off threshold for sleep apnoea was not present. Therefore, this differentiation was made based on the authors' recommendations in each individual study.
Six of the 16 studies showed a weak correlation between the Epworth score and the AHI; two studies used regression analysis with the AHI as the outcome variable, and the Epworth Scale was found to have small regression coefficients. Two studies found significant correlations between the Epworth score and the Mean Sleep Latency Test, but concluded that the latter was more useful in predicting OSAS.
A formal, complete meta-analysis was not performed as the data and outcome measures in all the studies could not be pooled. This was primarily because of the use of a combination of parametric and non-parametric measures to investigate the correlations between the Epworth score and the AHI.
Effect sizes were calculated wherever possible, and are shown in Figures 1 and 2. It was possible to perform a partial meta-analysis, in order to determine the effect size, in those studies that used comparison of mean Epworth scores among patient groups. Standardised mean difference was used to calculate the effect size.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160715230251-87895-mediumThumb-S0022215111003082_fig2g.jpg?pub-status=live)
Fig. 2 Bar chart showing correlation coefficients (Pearson's R or Spearman's rho) for studies correlating Epworth Sleepiness Scale with the Apnoea–Hypopnoea Index or the Respiratory Disturbance Index.
Effect sizes were weighted by the inverse variance method, and a random-effects model was applied for the meta-analysis; this model assumes that the effect size is not uniform across the various populations studied. This was statistically justified as the chi-square test for heterogeneity was significant (p < 0.01). However, this may also be clinically explained by the lack of objectivity in the Epworth Scale scores across the population samples. This may partially be due to the lack of a uniform AHI cut-off threshold to define sleep apnoea.
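The pooling step can be sketched as follows. This is an illustration only: it assumes the DerSimonian–Laird estimator of between-study variance, which the text does not name, and the effect sizes and standard errors are hypothetical rather than those plotted in Figure 1.

```python
import math

def random_effects_pool(effects, std_errors):
    """Pool effect sizes with inverse-variance weights and a DerSimonian-Laird
    estimate of the between-study variance (tau squared)."""
    w = [1.0 / se**2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each study's own variance
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, q, tau2

# Hypothetical standardised mean differences and their standard errors
effects = [0.2, 0.8, 1.4]
std_errors = [0.2, 0.2, 0.2]
pooled, se_pooled, q, tau2 = random_effects_pool(effects, std_errors)
```

Q is compared with a chi-square distribution on (number of studies minus one) degrees of freedom; a significant result, as reported here, is the usual justification for preferring a random-effects model over a fixed-effect one.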
The total effect size was 0.77, which is deemed to be large by CohenReference Cohen18 (Figure 1). In the case of correlations, Cohen described a correlation coefficient of up to 0.2 as a low effect, up to 0.4 as moderate, and greater than 0.4 as a large effect.Reference Cohen18
Study
In our study, 105 patients had OSAS whereas 238 patients did not (Figure 3), and the mean Epworth score was 8.75 (95 per cent CI 8.30–9.19). The sample of 343 patients had a representative age distribution (median age 45 years).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160715230251-68596-mediumThumb-S0022215111003082_fig3g.jpg?pub-status=live)
Fig. 3 Bar chart showing patients with and without sleep apnoea.
The Epworth scores were normally distributed, as suggested by the Kolmogorov–Smirnov test (z = 1.22, p = 0.103). The mean Epworth score was 10.94 (95 per cent CI 9.46–11.42) in the OSAS group and 7.73 (95 per cent CI 7.04–8.41) in the non-OSAS group (Figure 4). The difference in mean values was statistically significant (Student's t-test, p = 0.003).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160715230251-93499-mediumThumb-S0022215111003082_fig4g.jpg?pub-status=live)
Fig. 4 Box plot showing mean Epworth scores for patients with and without obstructive sleep apnoea syndrome. The difference was statistically significant (p = 0.003).
Logistic regression was used to assess the predictive potential of the Epworth score as regards the probability of OSAS occurrence. Table II shows the mean Epworth scores in the various groups with sleep apnoea as defined by the AHI. It is however worth noting that, for the purposes of analysis, the ‘mild’ group (with an AHI of 15 or less) was not included in the group defined as having sleep apnoea. This approach has been validated in recent literature.Reference Lacasse, Godbout and Sériés21 The goodness of fit statistic (using the Hosmer and Lemeshow test) gave no evidence of a poor fit between the analysis model and the data (p = 0.455).
Table II Epworth score by AHI
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151021031659837-0164:S0022215111003082_tab2.gif?pub-status=live)
AHI = Apnoea–Hypopnoea Index; Pts = patients; ESS = Epworth Sleepiness Scale score
The Epworth scoring was able to explain 7 to 10 per cent of the variance in the probability of occurrence of OSAS (Cox and Snell R² = 0.07, Nagelkerke R² = 0.10), with only 69.4 per cent of the cases being correctly predicted by the model. The odds ratio for the Epworth score was 1.118 (95 per cent CI 1.068–1.171). This implied that, for each one-point increase in the Epworth score, the odds of OSAS increased by a factor of approximately 1.12.
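To illustrate what a per-point odds ratio of 1.118 means in practice, the sketch below converts it into odds multipliers for larger score differences. The helper names are hypothetical, and the baseline probability used is simply the sample prevalence (105 of 343), chosen for illustration because the fitted intercept is not reported.

```python
import math

ODDS_RATIO_PER_POINT = 1.118  # reported odds ratio for a one-point increase

def odds_multiplier(points):
    """Factor by which the odds of OSAS change for a given rise in Epworth score."""
    return ODDS_RATIO_PER_POINT ** points

def shift_probability(baseline_prob, points):
    """Apply the odds multiplier to a baseline probability (illustrative only)."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_multiplier(points)
    return new_odds / (1 + new_odds)

beta = math.log(ODDS_RATIO_PER_POINT)          # implied logistic regression coefficient
five_point = odds_multiplier(5)                # odds multiplier for a five-point rise
shifted = shift_probability(105 / 343, 5)      # assumed baseline: sample prevalence
```

Under these assumptions, a five-point difference in Epworth score multiplies the odds by roughly 1.75, which shows how modest the per-point effect is.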
A receiver operating characteristic curve (Figure 5) was used to analyse the predictive potential and to validate the results. Essentially, this is a graph that plots sensitivity on the y axis and the false positive rate (1 − specificity) on the x axis, to determine the association between sensitivity and the false positive rate. Any curve above the reference line shown in Figure 5 implies a positive association. Calculating the area under the curve is a good way of determining the predictive potential.Reference McNeil and Hanley23 An area under the curve of 0.5 to 0.7 implies a marginally useful test, 0.7 to 0.9 a good test and greater than 0.9 an excellent test.Reference McNeil and Hanley23
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160715230251-58810-mediumThumb-S0022215111003082_fig5g.jpg?pub-status=live)
Fig. 5 Receiver operating characteristic curve showing the relationship between the sensitivity and the false positive rate (i.e. 1 – specificity). Blue plot = sensitivity vs (1 − specificity); green line = reference line
The area under the curve for the Epworth score was 0.672, making it only a marginally useful test. Strictly, this means that in 67.2 per cent of pairings of an OSAS patient with a non-OSAS patient, the OSAS patient would have the higher Epworth score. This is of a similar magnitude to the estimate made by logistic regression analysis, whereby nearly 69 per cent of cases were correctly classified, and the two analyses therefore support each other.
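One standard way to obtain the area under the curve is the rank-based (Mann–Whitney) formulation: the proportion of OSAS and non-OSAS patient pairs in which the OSAS patient has the higher score, with ties counted as half. The sketch below assumes this formulation; the function name and the scores are hypothetical and are not the study data.

```python
def auc_from_scores(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied pairs contribute one half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical Epworth scores (illustrative only)
osas_scores = [12, 9, 15, 7]
non_osas_scores = [6, 9, 4, 11]
auc = auc_from_scores(osas_scores, non_osas_scores)
```

A value of 0.5 corresponds to the reference line (no discrimination), and 1.0 to perfect separation of the two groups.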
Discussion
Numerous studies have assessed the association of the Epworth Sleepiness Scale with OSAS, but few have compared the effect sizes of the relative groups. This has led to confusing conclusions, many made indirectly. Our study analyses indicate that, although the mean Epworth score was significantly greater in the OSAS group, this does not translate into a clinically significant cause–effect association, because the effect size of the Epworth score is low. This is a unique conclusion which contrasts with other authors' interpretations of their findings.
The studies we reviewed used various statistical measures, and a large number of variables were included in their analyses. This distracted attention from the primary question of whether the Epworth Sleepiness Scale is a good predictor of OSAS. Effect size estimation was not strongly emphasised in the studies we reviewed, despite the fact that this is the only measure that will indicate the strength of the relationship between Epworth scoring and OSAS. Moreover, with one exception (Smith et al.),Reference Smith, Oei, Douglas, Brown, Jorgensen and Andrews10 our review studies used inclusion criteria for their other variables which were neither sufficiently clear nor consistently validated on statistical grounds. In our study, other variables were not added into the regression model, as this would have affected the intercorrelation between variables and subsequently altered the findings of our analysis.
To our best knowledge, ours is the only study to provide a dual validation of the association of Epworth scoring with OSAS on a one-to-one basis, along with effect size estimation. Our findings add to the growing body of evidence questioning the importance of the Epworth Sleepiness Scale as a tool in the evaluation of sleep apnoea.
A large number of ENT centres depend heavily on Epworth scoring to screen patients with sleep apnoea.Reference Olson, Cole and Ambrogetti7, Reference Mediano, Barcelo, de la Pena, Gozal, Agusti and Barbe24, Reference Mackay25 Use of the Epworth Scale as a screening tool has been justified in a recent study that suggested that only patients with an Epworth score of more than 11 should be referred for a sleep study.Reference Mackay25 Although the Epworth Scale is a relatively simple scoring system and can be easily completed in the out-patients department, relying solely on Epworth scores for OSAS screening could exclude a significant proportion of patients with OSAS (Table III).
Table III Effect of ESS cut-off on screening numbers and OSAS false negatives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160802135953-88904-mediumThumb-S0022215111003082_tab3.jpg?pub-status=live)
*Compared with total patient (pt) numbers. ESS = Epworth Sleepiness Scale; OSAS = obstructive sleep apnoea syndrome
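The kind of tabulation summarised in Table III can be sketched as below: for a given Epworth cut-off, count how many patients would be referred and how many OSAS patients would be missed. The function name and the patient records are hypothetical, and the rule that only scores above the cut-off are referred is an assumption based on the "more than 11" criterion quoted above.

```python
def screening_outcome(records, cutoff):
    """records: list of (epworth_score, has_osas) pairs.
    Patients scoring above the cut-off are referred for a sleep study."""
    referred = [r for r in records if r[0] > cutoff]
    missed = [r for r in records if r[0] <= cutoff and r[1]]
    total_osas = sum(1 for r in records if r[1])
    sensitivity = (total_osas - len(missed)) / total_osas if total_osas else None
    return {
        "referred": len(referred),
        "false_negatives": len(missed),
        "sensitivity": sensitivity,
    }

# Hypothetical patients (illustrative only, not the study data)
patients = [(14, True), (9, True), (6, True), (12, False),
            (5, False), (8, False), (16, True), (10, False)]
result = screening_outcome(patients, cutoff=11)
```

Raising the cut-off reduces the number of sleep studies required but increases the number of OSAS patients who are never investigated.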
Since the first introduction of the Epworth Sleepiness Scale by JohnsReference Johns1 in 1991, there have been numerous studies on its validation. Johns used a rigorous exploratory factor analysis to justify the inclusion of the eight points in the analysis. A recent study by Smith and colleagues,Reference Smith, Oei, Douglas, Brown, Jorgensen and Andrews10 however, refutes Johns' inclusion criteria by means of a confirmatory factor analysis, which provides a more stringent validation of Johns' results. In addition, a six-point scale (instead of an eight-point scale) was found to have more favourable inter-reliability statistics.Reference Smith, Oei, Douglas, Brown, Jorgensen and Andrews10
Various studies have shown that there is a substantial amount of confusion regarding the validity of the Epworth scale. Some good quality type one studies have relied upon comparison of mean Epworth scores in patients with and without sleep apnoea. This difference has been noted to be significant, and the same result is reproduced in our study. In our patients, the mean Epworth score was 7.73 in the non-OSAS group and 10.73 in the OSAS group. This difference was statistically significant (p = 0.003).
However, such a comparison is not in itself a reliable method of establishing a cause–effect relationship, as it suggests that every patient with OSAS is likely to have a high Epworth score, but this does not mean that any patient with a high Epworth score is likely to have OSAS. There are many other reasons why a patient may have daytime somnolence and fatigue. Conversely, a low Epworth score does not exclude sleep apnoea.
Thus, in addition to a simple comparison of means, we must determine the variance in the probability of the occurrence of OSAS that is accounted for by the Epworth score. This was 7 to 10 per cent in our study, suggesting that close to 90 per cent of the variance was due to other factors. The confounding effect of these covariates has rarely been taken into account in other studies.
Effect size is important in showing the strength of a relationship. In our study, this was represented by the odds ratio. The odds of having OSAS went up 1.118 times if the Epworth Sleepiness Scale score went up by one unit. Although this is significant (95 per cent CI 1.068–1.171), the effect size is very low as an odds ratio of one would be consistent with no effect at all.
Figure 1 shows the results of our partial meta-analysis involving studies using mean value comparisons. The total effect size was 0.77, suggesting that one could expect a difference of approximately 80 per cent of a standard deviation between the mean Epworth scores of OSAS and non-OSAS patients. This effect size may be statistically significant, but may be inappropriate for justifying Epworth scoring as a screening test, since the large number of false negatives generated may be considered unacceptable (Table III).
The large effect size was also to some extent reflected in the significant difference in the Epworth scores of OSAS and non-OSAS patients in our study. However, by simply comparing the mean values, the effect of other variables is not included. Since the various studies used different cut-off points to define OSAS, this itself has the potential to introduce an inclusion bias towards larger effect sizes, as some studies included healthy volunteers whereas others only included patients with OSAS. Therefore, one can expect the actual effect size to be smaller, as seen in Figure 2, which shows the correlation coefficients of various studies.
Two studies that used a comparison of mean Epworth scores (Kumar et al. Reference Kumar, Bhatia, Tripathi, Srivastava and Jain4 and Chung et al. Reference Chung15) reported relatively large effect sizes. However, in the former the error margin was large (wide confidence intervals), and in the latter a selection bias cannot be ruled out as the ‘control’ group were people who worked as associates or colleagues in the same institution as the principal investigator. If these studies were excluded from our analysis, the effect size would reduce further.
An interesting observation is the low effect size in terms of the correlation coefficient reported in a few studies that also reported a standardised mean difference showing a moderate effect. A possible explanation is the presence of other confounding variables that would lead to a low effect size when measuring correlation. Figure 2 shows the universally low effect sizes obtained for correlations between Epworth scoring and AHI or Respiratory Disturbance Index parameters.
The literature abounds with descriptions of confounding factors presumed to affect the Epworth Sleepiness Scale. Body mass index (BMI) has been shown to be a predictive factor for hypertension in an Asian population.Reference Leng, Mosharraf-Hossain, Chan and Tan26 This identifies BMI as one of the many Epworth Scale confounding factors. Another studyReference Banks, Barnes, Tarquinio, Pierce, Lack and Leon27 showed that predictive factors such as age and subjective sleep history, among others, accounted for nearly 12.8 per cent of the observed variance, and reported that almost 33 per cent of subjects had discrepant Epworth scores. Again, this shows the importance of confounding factors.
Some studies used a simple comparison of means to justify the predictive ability of the Epworth Sleepiness Scale.Reference Kumar, Bhatia, Tripathi, Srivastava and Jain4, Reference Resta, Carratu, Carpagnano, Maniscalco, Di Gioia and Lacedonia28 However, this does not imply a cause–effect relationship. A few studiesReference Olson, Cole and Ambrogetti7, Reference Kingshott, Sime, Engleman and Douglas13, Reference Osman, Osborne, Hill and Lee14, Reference Thong and Pang16 used correlation coefficients to study Epworth Scale associations; however, this method does not establish a definite cause–effect relationship either.
Interestingly, only two studiesReference Herzog, Kühnel, Bremert, Herzog, Hosemann and Kaftan3, Reference Montoya, Bedialauneta, Larracochechea, Ibarguen, Del Rey and Fernandez5 used regression analysis, both of which found the regression coefficient and/or the correlation coefficient to be significant but small. A standardised regression coefficient generated by a regression analysis using many predictor variables along with the Epworth score is a measure of a semi-partial correlation between the Epworth score and a measure of the dependent variable (OSAS).
There is no debate that the Epworth Sleepiness Scale is a good, simple, easy screening method for excessive daytime sleepiness, and is also useful in the prediction of certain parameters associated with daytime sleepiness and snoring. Excessive daytime sleepiness irrespective of OSAS is in itself associated with significant morbidity.Reference Kessler and Rodenstein29
There is an abundance of support for the validity of the Epworth Scale in current literature, including studies using a non-English version of the Scale.Reference Beiske, Kjelsberg, Ruud and Stavem30 The correlation between the Epworth Scale and the Mean Sleep Latency Test has been found to be not significant.Reference Beiske, Kjelsberg, Ruud and Stavem30
A few of our type two studies did not assess the cause–effect relationship. Fong et al. Reference Fong, Ho and Wing11 used analysis of variance to compare the mean Epworth scores of groups of patients graded according to OSAS severity. They found no significant difference between Epworth scores, and concluded that the Mean Sleep Latency Test varied significantly between these groups.
The importance of clinical assessment alone (including Epworth scoring) in detecting non-apnoeic snorers has been emphasised in the literature.Reference Lim and Curry31 The sensitivity in this particular study was reported as more than 90 per cent. This implies that a group of simple snorers with high Epworth scores, and therefore excess daytime sleepiness, could be detected by clinical assessment alone, irrespective of their Epworth scoring. Apart from the fact that this requires very rigorous clinical history-taking, the subjective element of such an exercise cannot be ignored. This is adequately justified by Table III, which shows the number of apnoeic patients missed by Epworth Scale assessment. It has been suggested that the Epworth Sleepiness Scale is highly variable when administered sequentially to a population, suggesting that its reproducibility in the clinical setting is not optimum.Reference Nguyen, Baltzan, Small, Wolkove, Guillon and Palayew22
Osman and colleaguesReference Osman, Osborne, Hill and Lee14 have suggested that the Epworth Sleepiness Scale is not a good predictor of OSAS, on the basis of a poor correlation observed between the Epworth score and the AHI. They believed this to be due to the fact that excessive daytime sleepiness can be present in simple snorers (caused by unknown mechanisms); this has also been mentioned by Gottlieb and colleagues.Reference Gottlieb, Yao, Redline, Ali and Mahowald2
• The Epworth Sleepiness Scale is only marginally useful in predicting the occurrence of obstructive sleep apnoea syndrome (OSAS)
• Epworth scoring should not be used alone to screen patients for OSAS
• Patients with loud, constant snoring and a history suggestive of OSAS (or no reliable history) should undergo sleep assessment regardless of their Epworth score
Overall, the evidenceReference West, Bennett, Deegan, Merry, Watson and Jones32 suggests that any patient with a history of possible OSAS should have some form of sleep assessment regardless of their Epworth scoring. The importance of dividing referrals into those with possible sleep apnoea and those with simple snoring has been described.Reference West, Bennett, Deegan, Merry, Watson and Jones32 In the UK, this will be particularly important with the introduction of the forthcoming 18-week referral-to-treatment period, so that patients can obtain a cross-specialty referral for a different condition from that of the initial referral if necessary. An exception will be some specialist ENT centres that have the expertise to perform sleep assessments in the same unit.
The importance of pulse oximetry, and its predictive potential, are well documented.Reference Netzer, Eliasson, Netzer and Kristo19 This measure is a valid predictor of sleep apnoea.Reference Netzer, Eliasson, Netzer and Kristo19 Not all patients in our study underwent full five-channel nocturnal polysomnography, due to the need to optimise resource allocation. Thus, there was the potential for false negative results. However, this was minimised by having a liberal threshold for polysomnography referral. Evidence in the current literature justifies the instigation of therapeutic measures for OSAS simply on the basis of pulse oximetry, suggesting that the possibility of false negatives during such a process may be statistically insignificant.Reference Netzer, Eliasson, Netzer and Kristo19