Introduction
Dysphonia is a multifactorial disorder which affects expressive communication, mood and general health status.Reference Millar, Deary, Wilson and MacKenzie1 As discussed in the preceding paper,Reference Webb, Carding, Deary, Mackenzie, Steen and Wilson2 a number of well-validated measures are now available to assess these different domains. The responsiveness to change of these very different tools is, however, much less clear. There are comparatively few studies of the outcomes of dysphonia treatment, and most use limited outcome data sets in a highly selected patient group (for example, patients undergoing vocal fold medialisation). Furthermore, the number of studies of surgical as opposed to voice therapy outcomes is exceedingly small. Our objective was to compare the sensitivity to change of a number of different self-reported, perceptual and quality of life measures following conservative (i.e. speech and language therapy) and surgical intervention in a large, heterogeneous group of voice patients.
The specific aims of the study were: (1) to estimate the responsiveness to change of self-reported and perceptual ratings of voice quality and voice-related quality of life, in a large cohort of heterogeneous dysphonic patients; and (2) to estimate the range of effect sizes for voice therapy and surgical interventions, using self-reported and observer-rated measures, in order to inform future prospective, controlled trials.
Methods
The approach adopted was to identify patients with voice-related problems and then, using a range of generic and voice-specific measures, to assess their quality of life before and after medical intervention.
Three self-reported voice scales were completed by patients. The vocal performance questionnaireReference Deary, Webb, MacKenzie, Wilson and Carding3 consists of 12 items which address the physical aspects of the voice problem and its social and emotional impact, scored to give a total score. The voice handicap indexReference Jacobson, Johnson, Grywalski, Silbergleit, Jacobson and Benninger4 is a 30-item questionnaire with questions grouped into three content domains, representing the functional, emotional and physical aspects of voice disorders. Its sensitivity to change in voice was evaluated on a sample of 37 subjects with various vocal fold abnormalities.Reference Benniger, Ahuja, Gardner and Grywalski5 The voice symptom scaleReference Deary, Wilson, Carding and MacKenzie6 is a 30-item scale with three content domains and a total score, the reliability and validity of which have been assessed in a series of studies involving over 800 subjects.Reference Wilson, Webb, Carding, Steen, MacKenzie and Deary7
Perceptual, observer-rated analysis of voice quality was performed using the grade–roughness–breathiness–asthenia–strain scale.Reference Hirano8–Reference De Bodt, Wuyts, Van de Heyning and Croux10 All voices were recorded on digital audiotape, following a standard procedure, both before and after treatment.Reference Webb, Carding, Deary, MacKenzie, Steen and Wilson11 The recorded voice sample included rote counting and speaking the days of the week, prolonged /a/ and /i/ vowels, and three sentences from the Rainbow Passage. The five grade–roughness–breathiness–asthenia–strain parameters were scored using a four-point rating scale, from zero (normal) to three (extreme). Each participant was scored on a standard pro forma by an independent expert rater. Independent raters were blinded to treatment group and treatment status, but aware of each patient's age and sex.
The short form 36 (SF 36)Reference Jenkinson, Coulter and Wright12 is an extensively validated, self-administered, 36-item questionnaire assessing quality of life. It has eight subscales and two global domains (mental health and physical health), and a large body of normative data is available.Reference Brazier, Harper, Jones, O'Cathain, Thomas and Usherwood13 The SF 36 is known to be abnormal in patients with voice disorders.Reference Wilson, Millar, Deary and MacKenzie14
All patients also completed the hospital anxiety and depression scale.Reference Zigmond and Snaith15
Patients
One hundred and forty-four patients complaining of hoarseness and attending out-patient clinics in Newcastle and Glasgow were assessed by the above measures, before and after intervention. The patients included a subgroup of patients described in our companion paper.Reference Webb, Carding, Deary, Mackenzie, Steen and Wilson2 The patient exclusion criteria were: no intervention undertaken; defaulting from follow-up; laryngeal cancer; age less than 18 years; and impaired language or receptive communication skills. At the initial out-patient appointment, each participant completed the three self-reported voice questionnaires, the SF 36 and the hospital anxiety and depression scale, and also had their voice recorded. Ninety patients received a course of speech and language therapy, while 54 patients underwent laryngeal surgery.
Analysis
For each questionnaire or rating scale, the effect size was defined as the change in mean score divided by the standard deviation of change scores. The effect size is independent of scale and sample size and can be used to make comparisons between the different questionnaires and different groups of subjects. It is acceptedReference Cohen16,Reference Kazis, Anderson and Meenen17 that values around 0.2 represent small effect sizes, values around 0.5 represent medium effect sizes and values around 0.8 represent large effect sizes. If subjects experience an improvement in quality of life, the outcome measure with the largest effect size is clearly the most sensitive to change.
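The definition above can be illustrated with a minimal sketch. The scores below are hypothetical and are not study data; the function names, the `lower_is_better` flag and the rule of reporting the nearest of the 0.2, 0.5 and 0.8 benchmarks are assumptions made for illustration only.

```python
from statistics import mean, stdev

# Conventional benchmark values quoted in the text.
BENCHMARKS = {0.2: "small", 0.5: "medium", 0.8: "large"}


def effect_size(baseline, follow_up, lower_is_better=True):
    """Mean improvement divided by the SD of the individual change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    # Orient each change so that a positive value means improvement.
    sign = 1 if lower_is_better else -1
    changes = [sign * (pre - post) for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def nearest_benchmark(es):
    """Label an effect size by its closest benchmark (illustrative rule only)."""
    closest = min(BENCHMARKS, key=lambda b: abs(abs(es) - b))
    return BENCHMARKS[closest]


# Hypothetical pre- and post-treatment totals for six patients (not study data).
pre_scores = [30, 28, 35, 22, 40, 31]
post_scores = [20, 25, 24, 21, 30, 27]
es = effect_size(pre_scores, post_scores)
print(f"effect size = {es:.2f} ({nearest_benchmark(es)})")
```

Because the change is divided by its own standard deviation, the result carries no units, which is what allows questionnaires with different score ranges to be compared. For an instrument on which higher scores indicate better health, `lower_is_better=False` keeps a positive value pointing towards improvement.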
Results
Table I shows the mean baseline and follow-up scores for each patient group, along with the mean improvement in quality of life (with a 95 per cent confidence interval (CI)) and an estimate of effect size. A paired t-test indicates those improvements that are statistically significant.
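The quantities reported for each measure (mean improvement, its 95 per cent CI and the paired t statistic) can be sketched as follows. The statistical software used in the study is not stated; this sketch uses only the Python standard library, and the interval is built with a normal quantile as an approximation to the Student's t quantile, which is an assumption that becomes less accurate in small samples. The scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_summary(baseline, follow_up, confidence=0.95):
    """Mean change, approximate CI and paired t statistic for pre/post scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    # Change is baseline minus follow-up, so a fall in score is positive.
    changes = [pre - post for pre, post in zip(baseline, follow_up)]
    n = len(changes)
    mean_change = mean(changes)
    std_error = stdev(changes) / sqrt(n)
    # Normal quantile used in place of the t quantile (approximation).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "n": n,
        "mean_change": mean_change,
        "ci": (mean_change - z * std_error, mean_change + z * std_error),
        "t": mean_change / std_error,  # compare with t on n - 1 degrees of freedom
    }


# Hypothetical pre- and post-treatment totals for six patients (not study data).
summary = paired_summary([30, 28, 35, 22, 40, 31], [20, 25, 24, 21, 30, 27])
print(summary)
```

The t statistic would be referred to a t distribution with n − 1 degrees of freedom to obtain a p value; that step is omitted here because the standard library does not provide the t distribution.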
Table I Responsiveness to change of voice-related quality of life measures
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160629074824-22268-mediumThumb-S0022215107007839_tab1.jpg?pub-status=live)
Data represent scores for the various measures. *Mean change from baseline in the direction consistent with an improvement in quality of life (QOL) (a negative score implies a deterioration in quality of life). †Effect sizes = improvement in quality of life divided by the standard deviation (SD) of change scores. CI = confidence intervals; VPQ = vocal performance questionnaire; SLT = speech and language therapy; surg = surgery; VHI = voice handicap index; VoiSS = voice symptom scale; GRBAS = grade–roughness–breathiness–asthenia–strain scale; SF 36 = short form 36; HAD = hospital anxiety and depression scale
Both groups of subjects reported medium to large improvements on all three voice questionnaires. The smallest changes were in the emotional subscale of the voice handicap index (effect sizes = 0.44 and 0.48 for speech and language therapy and surgery groups, respectively) and in the physical symptoms subscale of the voice symptom scale (effect sizes = 0.38 and 0.43 for speech and language therapy and surgery groups, respectively). The largest changes in the individual scales were in the physical aspects of voice subscale of the voice handicap index (effect sizes = 0.71 and 0.81 for speech and language therapy and surgery groups, respectively) and in the voice impairment subscale of the voice symptom scale (effect sizes = 0.78 and 1.00 for speech and language therapy and surgery groups, respectively). The effect sizes corresponding to the change in the total score, respective to speech and language therapy and surgery, for each of the three voice questionnaires, were: vocal performance questionnaire, 1.04 and 0.82; voice handicap index, 0.62 and 0.72; and voice symptom scale, 0.78 and 1.06. The two patient groups were very similar both at baseline and follow up (Figure 1), with no significant differences between them.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160629074628-85772-mediumThumb-S0022215107007839_fig1g.jpg?pub-status=live)
Fig. 1 Box and whisker plots of pre- and post-intervention scores for the three self-reported questionnaires completed by (a) the speech and language therapy group, and (b) the surgery group. Scores have been numerically standardised for comparison. The x axis shows the number of fully completed questionnaire sets, by questionnaire. The lower and upper edges of each box represent the 25th and 75th percentiles, respectively. Medians are indicated by the horizontal line within each box. The range is denoted by the whiskers; individual outliers are indicated as circles. QOL = quality of life; VPQ = vocal performance questionnaire; VHI = voice handicap index; VoiSS = voice symptom scale
The changes in the three voice questionnaires were highly correlated, as follows: vocal performance questionnaire vs voice symptom scale, 0.74 (95 per cent CI: 0.65, 0.81); vocal performance questionnaire vs voice handicap index, 0.76 (95 per cent CI: 0.68, 0.83); and voice symptom scale vs voice handicap index, 0.83 (95 per cent CI: 0.76, 0.87). All correlations were significant (p < 0.0001). The correlation between the voice symptom scale and the voice handicap index was greater, due in part to the small number of shared items between the two questionnaires. Changes in the subscale components of the voice handicap index and the voice symptom scale were also significantly correlated. The greatest correlation was between the voice handicap index physical aspects of health subscale and the voice symptom scale impairment subscale (ρ = 0.76; 95 per cent CI: 0.67, 0.82). Changes in the three voice handicap index subscales were correlated with each other (all correlations > 0.6). Changes in the three subscales of the voice symptom scale were less strongly correlated with each other (correlations between 0.2 and 0.6). The weakest correlations were between the voice symptom scale physical symptom subscale and all the other subscales (correlations between 0.2 and 0.4).
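A correlation between two sets of change scores, with a confidence interval, might be computed along the following lines. The text does not state which correlation coefficient or interval method was used; this sketch assumes a Pearson coefficient and the Fisher z transformation, and the change scores are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation between two paired lists of change scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, confidence=0.95):
    """Approximate CI for a correlation via the Fisher z transformation."""
    if n < 4:
        raise ValueError("at least four pairs are needed")
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) / sqrt(n - 3)
    z = atanh(r)
    return tanh(z - half_width), tanh(z + half_width)


# Hypothetical change scores on two questionnaires for eight patients.
change_a = [10, 3, 11, 1, 10, 4, 7, 2]
change_b = [18, 5, 20, 4, 15, 9, 12, 1]
r = pearson_r(change_a, change_b)
low, high = fisher_ci(r, len(change_a))
print(f"r = {r:.2f} (95% CI: {low:.2f}, {high:.2f})")
```

The interval is computed on the transformed scale and mapped back, so it is not symmetric about r and always stays within −1 and 1.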
For both groups of subjects, there was some evidence of change on the grade–roughness–breathiness–asthenia–strain scale, but the effect sizes were much smaller than those observed with the self-reported measures. For subjects undergoing speech and language therapy, there were small or medium effects in each grade–roughness–breathiness–asthenia–strain component. For subjects undergoing surgery, the effect sizes were smaller in all components except for roughness, the least sensitive component.
There was little evidence of substantive change in any of the generic health status instruments – all effect sizes were less than 0.3. Thus, although some of the changes were statistically significant, the small effect sizes suggest that they were not clinically important.
Discussion
The pattern of effect sizes observed in Table I indicates very small changes in the generic health status instruments in comparison with the voice-specific measures. This supports the conclusion of a previous study which assessed voice therapy alone.Reference MacKenzie, Millar, Sellars, Wilson and Deary18 Following intervention, there were fairly large changes in self-reported, voice-related quality of life. Similar effect sizes were observed across the three self-reported voice questionnaires. The changes in total scores were highly correlated, suggesting that the three self-reported voice questionnaires were detecting changes in the same sets of patients.
When considering the voice handicap index and the voice symptom scale component scores, greatest change was observed in the voice symptom scale impairment subscale and in the voice handicap index physical aspects of voice subscale. Inspection of the individual items that make up these two subscales suggests that they are more or less equivalent; both relate to voice quality. The very high correlation between the changes in these two subscales supports this suggestion. The remaining two voice handicap index components (functional and emotional aspects of health) broadly overlap with the voice symptom scale emotion component; this is reflected in the correlation in change scores between these components. The voice symptom scale physical symptom component has almost no overlap with any of the voice handicap index components; this is reflected in the much weaker correlations of this subscale with the voice handicap index subscales, and indeed with the other voice symptom scale components.
The grade–roughness–breathiness–asthenia–strain components were less sensitive to change than the self-reported measures. Roughness did not alter much, and was also the only parameter which consistently failed to show any relationship with the total or subscales of any of the three self-reported measures.Reference Webb, Carding, Deary, Mackenzie, Steen and Wilson2
The participants reported comparatively little change in the generic health status questionnaires; a much larger change was observed in the self-reported voice scales. The SF 36 and other generic quality of life instruments have proven validity and reliability, and it is reasonable to suppose that any general, subjective increase in wellbeing would be reflected in these instruments. Similarly, there was little change in the mood variables (the hospital anxiety and depression scales and the SF 36 mental health component score). This suggests that the changes observed in the voice measures did not simply arise from a general improvement in subjective wellbeing as a result of receiving a medical intervention, but rather reflected real improvement in voice-related quality of life. At the same time, the lack of change in the generic measures also highlights the need for a voice-specific questionnaire in assessing voice treatment effectiveness; the SF 36 does not include voice-sensitive components.
• Several self-reported voice tools seem responsive to change
• The differing sensitivity of the various available tools is not known
• Identification of the effect sizes of different interventions according to different tools would be useful in the standardisation of interventions and in clinical voice outcome reporting
• The three self-reported voice tools studied (the vocal performance questionnaire, the voice handicap index and the voice symptom scale) all showed large effect sizes following either voice therapy or phonosurgery
• Smaller effect sizes were noted for the grade–roughness–breathiness–asthenia–strain perceptual rating scale, administered by an expert rater
• General health status measures were the least responsive to change
Our work suggests that all three self-reported voice measures are capable of detecting change. There is no strong evidence, on the basis of sensitivity to change evaluation, to favour one measure over the other two. Whichever one is chosen, the means and standard deviations given in Table I can be used to help determine sample size when planning an evaluation of an intervention. In the present, large sample of patients with benign disorders, the vocal performance questionnaire and the voice symptom scale showed the largest overall effect sizes, while the voice handicap index sensitivity was somewhat lower for both conservative and surgical interventions. The voice handicap index was, however, developed partly in laryngectomy patients; therefore, the responsiveness pattern may well change in subjects with laryngeal malignancy. Figure 1 shows that, especially for the surgery group, the vocal performance questionnaire and the voice handicap index inter-quartile ranges shrank post-treatment. This possibly supplies evidence for a floor effect in these questionnaires, which was not seen in the voice symptom scale.
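As an illustration of how a mean difference and a standard deviation might feed into sample size planning, the sketch below applies the standard normal-approximation formula for comparing two independent groups. The two-arm design, the function name and the input values are assumptions for illustration; they are not taken from Table I.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a mean difference `delta`.

    Normal-approximation formula for two equal-sized independent groups
    with a common standard deviation `sd` and a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical planning values: detect a 5-point difference when SD is 10.
print(n_per_arm(delta=5.0, sd=10.0))
```

Only the ratio of `delta` to `sd` matters, so a measure with a larger effect size for the same intervention requires fewer patients. A calculation based on the t distribution would give a slightly larger number.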
The data shown here are among the first to record the approximate level of benefit from a heterogeneous group of phonosurgical patients. There was no prior expectation as to the relative effects of speech and language therapy and surgery. Both interventions produced very similar changes in self-reported voice quality, although the expert raters recorded larger improvements in perceived voice quality in the group undergoing speech and language therapy. The study was, however, designed principally to assess the behaviour of the clinical outcomes studied, rather than to compare different subgroups of intervention. The groups were not prospectively matched at baseline, either for disease severity or for diagnostic spread; no treatment comparison inference is therefore possible. Nonetheless, the results do provide useful evidence that the measures used can in future be applied across a range of interventions likely to affect voice-related quality of life.
Acknowledgements
This research was supported by a grant from the Wellcome Trust.