Published online by Cambridge University Press: 28 May 2004
Objectives: The clinical assessment efficiency of the CAGE questionnaire for alcohol abuse based on diagnostic accuracy has not been fully established to date because of the varied and inconclusive gold standards used as diagnostic criteria. CAGE has also been reported to miss almost half of risk drinkers because of inadequately set criteria for the positive recognition of alcohol abuse. This study aims to establish the diagnostic accuracy of CAGE in different treatment settings.
Methods: A hybrid of the receiver operating characteristic (ROC) and the Taguchi method was used, as this approach has been shown to evaluate diagnostic performance and accuracy in hypothetical clinical settings. Data from three cross-clinical treatment settings (general medicine outpatients, medical inpatients, and psychiatric inpatients) were analyzed by means of a stepwise application of a manageable number of statistical indices: the area under the ROC curve (AUC), the leveling factor (p′), and signal-to-noise ratios (S/N; standardized S/N [SS/N]).
Results: The selected settings yielded similar AUCs but portrayed different trade-offs on the ROC curves, signaling the presence of different critical CAGE scores. Analysis of the sensitivity and specificity data for general medicine outpatients, medical inpatients, and psychiatric inpatients by p′, S/N, SS/N, and their dependent relation resulted in critical CAGE scores of 1, 1, and 2 and high diagnostic accuracy levels of 76.84 percent, 86 percent, and 76.84 percent, respectively.
Conclusions: By setting these critical CAGE scores as the minimum detection levels of alcohol abuse, early intervention before the onset of serious alcohol-related problems is possible. This will decrease the health-care costs of the patient and, in addition, reduce the psychological and social burdens inherent to alcohol abuse both on the patient and society. Having its critical scores reliably identified and diagnostic accuracy fully determined, CAGE can now improve the detection rate of problem drinking individuals substantially.
The CAGE questionnaire was first developed by Ewing and Rouse (4). Since then, it has been recognized as a valid and reliable screening instrument for alcohol abuse and dependence in clinical settings. The CAGE was initially validated by Mayfield in psychiatric inpatients (6). Bush subsequently studied the CAGE questionnaire in medical inpatients (3). Next, Buchsbaum applied the CAGE to elderly general medicine outpatients (1;2).
The brevity and applicability of the CAGE for varied clinical settings make it particularly attractive as a screen for detecting alcohol abuse. To date, the CAGE has been used not only in a variety of other patient groups, such as college and university students, but also for measuring the dimensions of alcohol problems in general populations (7;8).
CAGE is an acronym arising from key concepts contained in each of the following four questions of the CAGE questionnaire: (i) Have you ever felt you should Cut down on your drinking? (ii) Have people Annoyed you by criticizing your drinking? (iii) Have you ever felt bad or Guilty about your drinking? (iv) Have you ever had a drink first thing in the morning to steady your nerves or to get rid of a hangover (Eye-opener)?
Each of the above questions yields a binary response. The more affirmative answers given to the four component questions, the more positive the CAGE. Each affirmative answer counts as “1” point, whereas each negative answer counts as “0” points. The sum of these points is called the CAGE score. A CAGE score enables a physician to stratify patients along a continuum of risk for alcohol abuse or dependence: the higher the CAGE score, the greater the probability of problem drinking. Usually, a dichotomous model is used in the interpretation of CAGE scores, that is, all patients at or above a predetermined cut-off point are assigned the same risk of alcohol abuse. Therefore, a “positive CAGE” represents all possible CAGE scores at or above the borderline cut-off point, where there is a real risk of alcoholism, whether this point is the same or different in various treatment settings (10).
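The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only; the function names `cage_score` and `is_positive` are hypothetical, and the sketch assumes that a score at or above the cut-off counts as a positive CAGE.

```python
from typing import Sequence


def cage_score(answers: Sequence[bool]) -> int:
    """Count the affirmative answers to the four CAGE questions (C, A, G, E)."""
    if len(answers) != 4:
        raise ValueError("the CAGE questionnaire has exactly four questions")
    return sum(1 for answer in answers if answer)


def is_positive(score: int, cutoff: int) -> bool:
    """Dichotomous reading (assumption): score at or above the cut-off is positive."""
    return score >= cutoff


# Example respondent: affirmative on "Cut down" and "Guilty" only.
answers = [True, False, True, False]
score = cage_score(answers)
print(score, is_positive(score, cutoff=2))
```

The same respondent would be classified differently under a cut-off of 3, which is why the choice of cut-off matters in each treatment setting.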
Each treatment setting is an independent system that involves patient groups suffering from various levels of problem drinking, which may need to be diagnosed by adjusting medical decision-making criteria. Therefore, patients in different treatment settings are likely to have different borderline cut-off points for the risk of alcohol abuse. Patient groups may indicate a different positivity criterion of CAGE with absolute risk for alcohol abuse, because the definition of a positive CAGE differs with the inherent characteristics of each independent system. The point at which CAGE is considered positive carries clinical significance. The choice of the borderline cut-off point in a treatment setting must be influenced by the relative importance of sensitivity and specificity. This in turn depends on the decisions to be made when a case is detected, the implications of missing a case, and the clinical services available. Such a borderline CAGE score can provide a structured, disciplined, and reliable means of detecting individuals at risk and of prompting further investigation into suspected alcohol abuse. This borderline can also be called the “critical CAGE score,” because it should always be chosen where the patient's risk for alcohol abuse and problem drinking is identified with high diagnostic accuracy (10).
The values of sensitivity and specificity incorporated in the studies of Buchsbaum et al., Bush et al., and Mayfield et al., together form a fundamental basis for this study (2;3;6). These values given for each CAGE score result from that score being used as a cut-off point, so that all those with that score or above are deemed positive, and those with a lower score are deemed negative.
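To make the cut-off rule concrete, the following sketch computes sensitivity and specificity at each CAGE score used as a cut-off. The score lists are hypothetical illustration data, not values from the three source studies, and the function name `sens_spec` is an assumption.

```python
def sens_spec(scores_abuse, scores_nonabuse, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are deemed positive."""
    tp = sum(s >= cutoff for s in scores_abuse)      # abusers correctly flagged
    fn = len(scores_abuse) - tp                      # abusers missed
    tn = sum(s < cutoff for s in scores_nonabuse)    # non-abusers correctly cleared
    fp = len(scores_nonabuse) - tn                   # non-abusers wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical CAGE scores for illustration only.
abuse = [4, 3, 3, 2, 2, 1, 1, 0]
nonabuse = [0, 0, 0, 0, 1, 0, 1, 2]

for cutoff in range(1, 5):
    sens, spec = sens_spec(abuse, nonabuse, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Raising the cut-off lowers sensitivity and raises specificity, which is the trade-off the remainder of the analysis tries to balance.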
The direct effect of misclassification on diagnostic accuracy becomes clearer when the true-positive rate (sensitivity) is plotted against the false-positive rate (1 − specificity) as the CAGE score is varied. This can be accomplished by means of receiver operating characteristic (ROC) curves. These curves yield a single measure for evaluating diagnostic accuracy, namely the area under the curve (AUC). This area quantifies the information value of the CAGE independently of the cut-off points. It indicates the likelihood of correct classification and reflects overall performance: an area of 1.0 represents a perfectly accurate CAGE, and an area of 0.5 represents a CAGE that provides no discrimination between alcoholic and nonalcoholic individuals. The closer the AUC is to 1, the more likely the CAGE study is to distinguish abuse from nonabuse accurately. The AUC can be used both to evaluate the diagnostic accuracy of a CAGE study and to compare the performances of CAGE studies.
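As a rough illustration of how an AUC follows from per-cut-off sensitivity and specificity, the sketch below builds the ROC points and integrates them with the trapezoidal rule. The sensitivity and specificity pairs are hypothetical, and trapezoidal integration is one common choice rather than necessarily the procedure used in the source studies.

```python
def roc_auc(points):
    """Trapezoidal area under ROC points given as (false_positive_rate, true_positive_rate)."""
    # Anchor the curve at (0, 0) and (1, 1), then sort by false-positive rate.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical (sensitivity, specificity) at CAGE cut-offs 1..4.
sens_spec_by_cutoff = {
    1: (0.875, 0.625),
    2: (0.625, 0.875),
    3: (0.375, 1.0),
    4: (0.125, 1.0),
}
roc_points = [(1.0 - spec, sens) for sens, spec in sens_spec_by_cutoff.values()]
auc = roc_auc(roc_points)
print(f"AUC = {auc:.4f}")
```

With no ROC points other than the two anchors, the function returns 0.5, the no-discrimination value mentioned above.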
The areas under the constructed ROC curves were estimated along with standard errors (SE) and 95 percent confidence intervals. For this purpose, the Hanley-McNeil method was used, which is based on the nonparametric Wilcoxon statistic (5). The AUCs were calculated for the studies of Buchsbaum et al., Mayfield et al., and Bush et al. The ROC of Buchsbaum et al. yielded an area of 0.89 with an SE of 0.013, whereas that of Mayfield et al. gave an area of 0.91 with an SE of 0.017. The ROC of Bush et al. had an area of 0.90 with an SE of 0.021. The SEs associated with the AUCs were obtained from the variance of the Wilcoxon statistic (2).
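The sketch below shows one way such figures can be handled. The standard-error function uses the widely cited Hanley-McNeil approximation, with sample sizes that are purely hypothetical because the group sizes are not stated here. The confidence intervals assume a normal approximation (AUC ± 1.96 SE), which is an assumption of this sketch, applied to the AUCs and SEs reported above.

```python
import math


def hanley_mcneil_se(auc, n_abuse, n_nonabuse):
    """Hanley-McNeil approximation to the standard error of an AUC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_abuse - 1) * (q1 - auc ** 2)
        + (n_nonabuse - 1) * (q2 - auc ** 2)
    ) / (n_abuse * n_nonabuse)
    return math.sqrt(variance)


def ci95(auc, se):
    """Normal-approximation 95 percent interval (assumption of this sketch)."""
    return auc - 1.96 * se, auc + 1.96 * se


# AUC and SE values as reported in the text.
reported = {
    "Buchsbaum et al.": (0.89, 0.013),
    "Mayfield et al.": (0.91, 0.017),
    "Bush et al.": (0.90, 0.021),
}
for study, (auc, se) in reported.items():
    low, high = ci95(auc, se)
    print(f"{study}: AUC={auc:.2f}, 95% CI=({low:.3f}, {high:.3f})")

# Hypothetical group sizes, only to show how the SE shrinks as groups grow.
print(hanley_mcneil_se(0.90, 50, 50), hanley_mcneil_se(0.90, 200, 200))
```

Under this approximation the three intervals overlap substantially, which is consistent with the observation that the AUCs alone do not separate the studies.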
Because the total AUC reflects the overall performance of CAGE, these close AUC results indicate that under most conditions the three studies have yielded essentially the same results. These areas also have a readily interpretable probabilistic meaning: the AUC is the probability that a randomly selected pair of observations, one drawn from each of the two underlying distributions, will be ranked (and thus classified) correctly. This corresponds to the probability that a true positive (TP) and a true negative (TN) selected at random will be correctly ordered by the CAGE. Each CAGE study yielded an AUC of approximately 0.90. This finding means that an individual randomly selected from a group of alcoholic individuals will, with 90 percent probability, give a more positive CAGE result than an individual randomly selected from a group of nonalcoholic individuals.
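The pairwise reading of the AUC can be checked directly by counting correctly ordered pairs, with ties counted as one half (the usual Wilcoxon/Mann-Whitney convention, assumed here). The scores below are hypothetical illustration data and the function name `auc_by_ranking` is an assumption.

```python
def auc_by_ranking(scores_abuse, scores_nonabuse):
    """Fraction of (abuse, nonabuse) pairs ordered correctly; ties count one half."""
    wins = 0.0
    for a in scores_abuse:
        for n in scores_nonabuse:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_abuse) * len(scores_nonabuse))


# Hypothetical CAGE scores for illustration only.
abuse = [4, 3, 3, 2, 2, 1, 1, 0]
nonabuse = [0, 0, 0, 0, 1, 0, 1, 2]
print(f"P(correct ordering) = {auc_by_ranking(abuse, nonabuse):.4f}")
```

For discrete scores such as CAGE, this pair-counting value coincides with the trapezoidal area under the empirical ROC curve built from the same data.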
From Taguchi's perspective, the dynamic goal of CAGE can be summarized as finding the combination of sensitivity and specificity levels that produces levels of performance in direct proportion to the correct classification (signal) of alcoholics and nonalcoholics while producing minimum variation due to misclassification (noise) at each cut-off point. Thus, we can implement a new index to evaluate the diagnostic accuracy of the CAGE, namely the signal-to-noise (S/N) ratio. This ratio was formulated by Dr. Genichi Taguchi as part of the Taguchi method for improving the performance of industrial processes under varying settings. Applying Taguchi's philosophy, physicians will be provided with screening instruments that have consistent maximum performance and minimum loss-to-society, and whose variations due to misclassification factors are significantly reduced.
The S/N ratio is the ratio of the level of performance of the desired function to the variability of the undesired function. Therefore, the S/N ratio provides an index of robustness and can be calculated at each CAGE score from the following equation:
Here, we will use the S/N ratio and its standardized form to rule in the possible performance differences between the three studies by means of the unique performance evaluation criteria it includes. The S/N ratio is a single measure that analyzes the influences of the means and standard deviations of each correct classification (TP and TN) and misclassification (false positive and false negative) on diagnostic performance at the cut-off points where the study is defined. Thus, it can characterize the inherent trade-off between the true and false decisions in a classification system.
A more sophisticated form of S/N ratio is called the standardized signal-to-noise ratio (SS/N) that is used in binary systems. This ratio is different from the normal S/N ratio in that it directly incorporates the levelling factors (p′) calculated at the mid-point of the density functions of abuse and nonabuse for each CAGE score instead of measuring the diagnostic performance from classification and misclassification rates (10). The equation for determining SS/N ratio for a given CAGE score is as follows:
The ideal function of CAGE is the performance state where all patients known to have alcohol abuse and individuals known not to have alcohol abuse are transformed into the intended function as TPs and TNs, respectively. In a perfect world, 100 percent of the alcoholic patients and nonalcoholic individuals are desired to be transformed into intended function, but in reality, there are always misclassifications leading to unintended function. These misdiagnoses are a loss-to-society, leading to all sorts of costs, including inappropriate or unnecessary therapeutic and diagnostic activity cost, patient-care cost, as well as emotional and financial damages that the patients and their families have to face.
Consistent performance with high diagnostic efficiency can only be achieved by moving the mean performance to the intended function as well as reducing the variations around the intended function caused by the development parameters of the diagnosis such as the cognitive block. When an S/N ratio of approximately 3 dB is observed, a strong effect of classification characteristics on the diagnostic performance is known to have occurred. In addition, the higher the S/N ratio of the CAGE score, the higher the performance of the CAGE study at that cut-off point. The S/N ratios for the CAGE studies of Buchsbaum et al., Bush et al., and Mayfield et al., along with the SS/N ratio at each CAGE score, are calculated and given in Table 1.
As observed from Table 1, there is an inverse relationship between the S/N and SS/N ratios. This relationship permits the simultaneous verification of the performance of each CAGE score. The lowest SS/N ratio and highest S/N ratio for the CAGE study of Bush et al. are achieved at the CAGE score of 1, whereas the same criteria are met at a point between the CAGE scores of 1 and 2 for the studies of Buchsbaum et al. and Mayfield et al. At these critical CAGE scores, the performance of the three studies was at its maximum. As for the optimum interval, the relative costs, risks, ease, and convenience of the lower score and the higher score should be determined.
The diagnostic accuracy of the CAGE, however, can be determined by combining the two ratios in a single measure, namely (SS/N – S/N). The smaller the difference at the critical CAGE score, the more accurately the CAGE study rules in the presence of alcohol abuse. Because S/N and SS/N depend on the predetermined cut-off points, the value of (SS/N – S/N)min enables us to find whether the lower score or the higher score of the critical interval performs better. It also provides a means of comparing the CAGE studies to determine which is diagnostically most accurate.
In an ideal world, the performance of the CAGE is at its maximum, that is, 100 percent, and there is no loss-to-society. The p′ index is also at its maximum level. This state of the levelling factor is the ideal state of p′, which is denoted by p′ideal. This statistic can only be calculated at the critical cut-off point of each CAGE study from the following sensitivity and specificity equivalence:
The p′ideal results recorded were 0.15 for Bush et al., whereas Buchsbaum et al. and Mayfield et al. each had values of 0.19. When the sensitivity and specificity values at the cut-off point are adjusted to be equal, then no variation between S/N ratio and SS/N ratio is observed. This idea is expressed in the following equation:
Thus, it is now possible to design robust questionnaires and locate the critical cut-off point where the (SS/N – S/N) variation is at its minimum. Similarly, this minimization of the (SS/N – S/N) statistic can optimize the diagnostic performance of CAGE. The (SS/N – S/N) values for the studies of Buchsbaum et al., Bush et al., and Mayfield et al. are calculated at each CAGE score. Then, the (SS/N – S/N)min values are recorded below.
The study of Bush et al. yields an (SS/N – S/N) value of 0.053 at a CAGE score of 1. The (SS/N – S/N) index also makes a judgment between the upper and lower limiting scores of the possible optimum interval. For the study of Buchsbaum et al., the minimum (SS/N – S/N) value of 0.166 is obtained at the lower score of the optimum interval, that is, at the CAGE score of 1. Therefore, the role of the CAGE in screening is greatest for medical inpatients and general medicine outpatients who respond affirmatively to any of the CAGE questions. For the study of Mayfield et al., the minimum difference, also 0.166, is achieved at the upper limit of the interval, that is, at a CAGE score of 2. This finding suggests that the CAGE can effectively identify alcohol abuse in psychiatric inpatients if a CAGE score of 2 is used.
In two of three comparisons, the critical CAGE score yielded the same cut-off point with (SS/N – S/N)min values as it did with the best operating point measured from the ROC plot. In the only discrepant case, the difference was rather small: the critical CAGE score was selected to be the higher border of the optimum interval instead of the lower. The consistency between the best operating point and critical CAGE score in the studies is remarkable in terms of the confirmation of results by using two different methods (9;10).
In our analysis, the levelling factor is used for calculating the trade-off between sensitivity and specificity at each CAGE score and finding the score with the optimal trade-off. When the factor at the critical CAGE score (p′critical) is compared with the p′ideal value from the following equation, the diagnostic accuracy of each CAGE study is obtained:
The p′critical values recorded at the critical CAGE scores were 0.129 for Bush et al., whereas Buchsbaum et al. and Mayfield et al. had an equal value of 0.146. The closer the p′critical is to the p′ideal, the more efficient the CAGE study. The Bush et al. study was the most efficient, with a gap between p′ideal and p′critical of 0.021, compared with 0.044 for each of the other two studies.
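The reported figures can be cross-checked with simple arithmetic. The gaps of 0.021 and 0.044 equal p′ideal − p′critical, and the reported accuracy levels (86 percent and 76.84 percent) are numerically consistent with the ratio p′critical / p′ideal. Reading the accuracy as that ratio is an inference from the numbers in this sketch, not a formula taken from the text.

```python
# p' values as reported in the text for each study.
p_ideal = {"Bush et al.": 0.15, "Buchsbaum et al.": 0.19, "Mayfield et al.": 0.19}
p_critical = {"Bush et al.": 0.129, "Buchsbaum et al.": 0.146, "Mayfield et al.": 0.146}

gap = {}
accuracy_percent = {}
for study in p_ideal:
    # Distance of the critical levelling factor from its ideal state.
    gap[study] = p_ideal[study] - p_critical[study]
    # Assumed reading: accuracy as the ratio of critical to ideal levelling factor.
    accuracy_percent[study] = 100.0 * p_critical[study] / p_ideal[study]
    print(f"{study}: gap={gap[study]:.3f}, accuracy={accuracy_percent[study]:.2f}%")
```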
The lowest value recorded throughout the complete study was a (SS/N – S/N)min of 0.053. The CAGE questionnaire performed on medical inpatients has discriminated abuse from nonabuse with the highest accuracy. It is also remarkable that the (SS/N – S/N)min yielded equal values at the critical CAGE scores for the studies of Mayfield and Buchsbaum. This finding shows that CAGE performs equally well in the psychiatric inpatients and general medicine outpatients but at different critical cut-off points.
The ROC/Taguchi method appears to be a promising method for determining diagnostic accuracy and critical cut-off points in clinical settings. Observation was made by translating the CAGE scores into quantitative performance indices enhanced by the richness of the information available regarding the patient population. A stepwise application of the statistical elements incorporated in this method showed the inherent discrimination capacity of each CAGE score and introduced a solution to problematic cases of similar AUCs. Therefore, the hybrid method can further assist physicians in improving the identification rate of patients with health problems by facilitating the difficult task of standardizing the borderline cut-off points for medical applications.
The diagnostic accuracy is 86 percent among medical inpatients (critical score 1), 76.84 percent among general medicine outpatients (critical score 1), and 76.84 percent among psychiatric inpatients (critical score 2). The diagnostic accuracy of 86 percent for medical inpatients is high, and high diagnostic accuracy is obtained in the other treatment settings as well. This finding supports the reliability and validity of the CAGE questionnaire in screening individuals with alcohol dependencies. With regular use, we believe that the CAGE questionnaire is a sensitive and efficient screening instrument for identifying patients likely to have serious alcohol-related problems. It can improve both recognition and treatment rates for these patients with high diagnostic accuracy. This high accuracy of CAGE will in turn significantly reduce the physical, social, psychological, and legal costs to society of misclassified patients.
Once alcohol abuse has been initially detected, a treatment program will require the concerted effort not only of the physician but also of a qualified social advisor, a marriage counsellor, and, in extreme cases, even a financial advisor. Total commitment must be evident from all parties involved from the very beginning if the potential benefits are to be achieved.
Further research should concentrate on the determination of critical CAGE scores in other everyday treatment settings. Standardizing these findings as screening protocols will minimize the time needed for the diagnosis of, and intervention in, problem drinking. We also believe that there is a need to establish, in every clinical setting, a gold standard against which the CAGE can be tested and which would constitute definitive diagnostic evidence. The definition of an appropriate gold standard for alcohol abuse will unquestionably change with the development of superior techniques and research methodologies.
Predictor Indices of Diagnostic Performance