Optimising outcome assessment of voice interventions, II: sensitivity to change of self-reported and observer-rated measures

I N Steen; K MacKenzie; P N Carding; A Webb; I J Deary; J A Wilson

doi:10.1017/S0022215107007839

Published online by Cambridge University Press: 14 May 2007

I N Steen: Affiliation:
Institute of Health and Society, Newcastle University, Royal Infirmary, Glasgow, UK
K MacKenzie: Affiliation:
Department of Otolaryngology Head Neck Surgery, Royal Infirmary, Glasgow, UK
P N Carding: Affiliation:
Department of Otolaryngology Head and Neck Surgery, Newcastle University, Glasgow, UK
A Webb: Affiliation:
Department of Otolaryngology Head and Neck Surgery, Newcastle University, Glasgow, UK
I J Deary: Affiliation:
Department of Psychology, University of Edinburgh, Scotland, UK.
J A Wilson*: Affiliation:
Department of Otolaryngology Head and Neck Surgery, Newcastle University, Glasgow, UK
*: Address for correspondence: Prof. Janet A Wilson, Dept of Otolaryngology Head Neck Surgery, Freeman Hospital, Newcastle upon Tyne NE7 7DN, UK. Fax: (44) 191 223 1246 E-mail: j.a.wilson@ncl.ac.uk

Abstract

Objectives:

A wide range of well validated instruments is now available to assess voice quality and voice-related quality of life, but comparative studies of the responsiveness to change of these measures are lacking. The aim of this study was to assess the responsiveness to change of a range of different measures, following voice therapy and surgery.

Design:

Longitudinal, cohort comparison study.

Setting:

Two UK voice clinics.

Participants:

One hundred and forty-four patients referred for treatment of benign voice disorders, 90 undergoing voice therapy and 54 undergoing laryngeal microsurgery.

Main outcome measures:

Three measures of self-reported voice quality (the vocal performance questionnaire, the voice handicap index and the voice symptom scale), plus the short form 36 (SF 36) general health status measure and the hospital anxiety and depression score. Perceptual, observer-rated analysis of voice quality was performed using the grade–roughness–breathiness–asthenia–strain scale. We compared the effect sizes (i.e. responsiveness to change) of the principal subscales of all measures before and after voice therapy or phonosurgery.

Results:

All three self-reported voice measures had large effect sizes following either voice therapy or surgery. Outcomes were similar in both treatment groups. The effect sizes for the observer-rated grade–roughness–breathiness–asthenia–strain scale scores were smaller, although still moderate. The roughness subscale in particular showed little change after therapy or surgery. Only small effects were observed in general health and mood measures.

Conclusion:

The results suggest that the use of a voice-specific questionnaire is essential for assessing the effectiveness of voice interventions. All three self-reported measures tested were capable of detecting change, and scores were highly correlated. On the basis of this evaluation of different measures' sensitivities to change, there is no strong evidence to favour any one of the vocal performance questionnaire, the voice handicap index and the voice symptom scale over the others.

Keywords

Voice; Voice Quality; Quality of Life; Outcome Assessment (Healthcare); Outcome Measures

Type: Main Articles
Information: The Journal of Laryngology & Otology, Volume 122, Issue 1, January 2008, pp. 46–51

DOI: https://doi.org/10.1017/S0022215107007839
Copyright: Copyright © JLO (1984) Limited 2007

Introduction

Dysphonia is a multifactorial disorder which affects expressive communication, mood and general health status.¹ As discussed in the preceding paper,² a number of well validated measures is now available to assess these different domains. The responsiveness to change of these very different tools is, however, much less clear. There are comparatively few studies of the outcomes of dysphonia treatment, and most use limited outcome data sets in a highly selected patient group (for example, vocal fold medialisation). Furthermore, the number of studies of surgical as opposed to voice therapy outcomes is exceedingly small. Our objective was to compare the sensitivity to change of a number of different self-reported, perceptual and quality of life measures following conservative (i.e. speech and language therapy) and surgical intervention in a large, heterogeneous group of voice patients.

The specific aims of the study were: (1) to estimate the responsiveness to change of self-reported and perceptual ratings of voice quality and voice-related quality of life, in a large cohort of heterogeneous dysphonic patients; and (2) to estimate the range of effect sizes for voice therapy and surgical interventions, using self-reported and observer-rated measures, in order to inform future prospective, controlled trials.

Methods

The approach adopted was to identify patients with voice-related problems and then, using a range of generic and voice-specific measures, to assess their quality of life before and after medical intervention.

Three self-reported voice scales were completed by patients. The vocal performance questionnaire³ consists of 12 items which address the physical aspects of the voice problem and its social and emotional impact, scored to give a total score. The voice handicap index⁴ is a 30-item questionnaire with questions grouped into three content domains, representing the functional, emotional and physical aspects of voice disorders. Its sensitivity to change in voice was evaluated on a sample of 37 subjects with various vocal fold abnormalities.⁵ The voice symptom scale⁶ is a 30-item scale with three content domains and a total score, the reliability and validity of which have been assessed in a series of studies involving over 800 subjects.⁷

Perceptual, observer-rated analysis of voice quality was performed using the grade–roughness–breathiness–asthenia–strain scale.⁸⁻¹⁰ All voices were recorded on digital audiotape, following a standard procedure, both before and after treatment.¹¹ The recorded voice sample included rote counting and speaking the days of the week, prolonged /a/ and /i/ vowels, and three sentences from the Rainbow Passage. The five grade–roughness–breathiness–asthenia–strain parameters were scored using a four-point rating scale, from zero (normal) to three (extreme). Each participant was scored on a standard pro forma by an independent expert rater. Independent raters were blinded to treatment group and treatment status, but aware of each patient's age and sex.

The short form 36 (SF 36)¹² is an extensively validated, self-administered, 36-item questionnaire assessing quality of life. It has eight subscales and two global domains (mental health and physical health), and a large body of normative data is available.¹³ The SF 36 is known to be abnormal in patients with voice disorders.¹⁴

All patients also completed the hospital anxiety and depression scale.¹⁵

Patients

One hundred and forty-four patients complaining of hoarseness and attending out-patient clinics in Newcastle and Glasgow were assessed by the above measures, before and after intervention. The patients included a subgroup of patients described in our companion paper.² The patient exclusion criteria were: no intervention undertaken; defaulting from follow-up; laryngeal cancer; age less than 18 years; and impaired language or receptive communication skills. At the initial out-patient appointment, each participant completed the three self-reported voice questionnaires, the SF 36 and the hospital anxiety and depression scale, and also had their voice recorded. Ninety patients received a course of speech and language therapy, while 54 patients underwent laryngeal surgery.

Analysis

For each questionnaire or rating scale, the effect size was defined as the change in mean score divided by the standard deviation of change scores. The effect size is independent of scale and sample size and can be used to make comparisons between the different questionnaires and different groups of subjects. It is accepted¹⁶,¹⁷ that values around 0.2 represent small effect sizes, values around 0.5 represent medium effect sizes and values around 0.8 represent large effect sizes. If subjects experience an improvement in quality of life, the outcome measure with the largest effect size is clearly the most sensitive to change.
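The effect size definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names, the `higher_is_worse` flag and the example scores are assumptions, and labelling a value by its nearest benchmark is just one simple reading of "values around 0.2, 0.5 and 0.8".

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up, higher_is_worse=True):
    """Mean improvement divided by the SD of the paired change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired, equal-length lists")
    # Orient each change so that a positive value means improvement.
    if higher_is_worse:
        changes = [b - f for b, f in zip(baseline, follow_up)]
    else:
        changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def nearest_benchmark(es):
    """Label an effect size by the closest of the 0.2 / 0.5 / 0.8 benchmarks (an assumed rule)."""
    benchmarks = {0.2: "small", 0.5: "medium", 0.8: "large"}
    closest = min(benchmarks, key=lambda b: abs(abs(es) - b))
    return benchmarks[closest]


# Hypothetical scores for three patients (not data from this study).
baseline = [5, 6, 7]
follow_up = [4, 4, 4]
es = effect_size(baseline, follow_up)
print(round(es, 2), nearest_benchmark(es))
```

Because the denominator is the standard deviation of the change scores rather than of the baseline scores, the result does not depend on the units of the questionnaire, which is what allows instruments with different score ranges to be compared.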

Results

Table I shows the mean baseline and follow-up scores for each patient group, along with the mean improvement in quality of life (with a 95 per cent confidence interval (CI)) and an estimate of effect size. A paired t-test indicates those improvements that are statistically significant.

Table I Responsiveness to change of voice-related quality of life measures

Data represent scores for the various measures. *Mean change from baseline in the direction consistent with an improvement in quality of life (QOL) (a negative score implies a deterioration in quality of life). †Effect sizes = improvement in quality of life divided by the standard deviation (SD) of change scores. CI = confidence intervals; VPQ = vocal performance questionnaire; SLT = speech and language therapy; surg = surgery; VHI = voice handicap index; VoiSS = voice symptom scale; GRBAS = grade–roughness–breathiness–asthenia–strain scale; SF 36 = short form 36; HAD = hospital anxiety and depression scale

Both groups of subjects reported medium to large improvements on all three voice questionnaires. The smallest changes were in the emotional subscale of the voice handicap index (effect sizes = 0.44 and 0.48 for speech and language therapy and surgery groups, respectively) and in the physical symptoms subscale of the voice symptom scale (effect sizes = 0.38 and 0.43 for speech and language therapy and surgery groups, respectively). The largest changes in the individual scales were in the physical aspects of voice subscale of the voice handicap index (effect sizes = 0.71 and 0.81 for speech and language therapy and surgery groups, respectively) and in the voice impairment subscale of the voice symptom scale (effect sizes = 0.78 and 1.00 for speech and language therapy and surgery groups, respectively). The effect sizes for the change in total score on each of the three voice questionnaires, for speech and language therapy and surgery respectively, were: vocal performance questionnaire, 1.04 and 0.82; voice handicap index, 0.62 and 0.72; and voice symptom scale, 0.78 and 1.06. The two patient groups were very similar both at baseline and at follow-up (Figure 1), with no significant differences between them.

Fig. 1 Box and whisker plots of pre- and post-intervention scores for the three self-reported questionnaires completed by (a) the speech and language therapy group, and (b) the surgery group. Scores have been numerically standardised for comparison. The x axis shows the number of fully completed questionnaire sets, by questionnaire. The lower and upper edges of each box represent the 25th and 75th percentiles, respectively. Medians are indicated by the horizontal line within each box. The range is denoted by the whiskers; individual outliers are indicated as circles. QOL = quality of life; VPQ = vocal performance questionnaire; VHI = voice handicap index; VoiSS = voice symptom scale

The changes in the three voice questionnaires were highly correlated, as follows: vocal performance questionnaire vs voice symptom scale, 0.74 (95 per cent CI: 0.65, 0.81); vocal performance questionnaire vs voice handicap index, 0.76 (95 per cent CI: 0.68, 0.83); and voice symptom scale vs voice handicap index, 0.83 (95 per cent CI: 0.76, 0.87). All correlations were significant (p < 0.0001). The correlation between the voice symptom scale and the voice handicap index was greater, due in part to the small number of shared items between the two questionnaires. Changes in the subscale components of the voice handicap index and the voice symptom scale were also significantly correlated. The greatest correlation was between the voice handicap index physical aspects of health subscale and the voice symptom scale impairment subscale (ρ = 0.76; 95 per cent CI: 0.67, 0.82). Changes in the three voice handicap index subscales were correlated with each other (all correlations > 0.6). Changes in the three subscales of the voice symptom scale were less strongly correlated with each other (correlations between 0.2 and 0.6). The weakest correlations were between the voice symptom scale physical symptom subscale and all the other subscales (correlations between 0.2 and 0.4).
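The text does not say how the confidence intervals for these correlations were obtained. One common approach, shown here purely as an assumed illustration, is the Fisher z transformation. The function name and the sample size of 120 are hypothetical; the number of complete questionnaire pairs behind each reported correlation is not given in the text.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z transformation."""
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    z = math.atanh(r)                       # transform r to the z scale
    se = 1 / math.sqrt(n - 3)               # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Back-transform the interval limits to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical sample size; r = 0.83 is the VoiSS vs VHI correlation quoted above.
low, high = correlation_ci(0.83, 120)
print(round(low, 2), round(high, 2))
```

The interval is asymmetric around r because the transformation stretches values near 1, which is also a feature of the intervals reported above.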

For both groups of subjects, there was some evidence of change on the grade–roughness–breathiness–asthenia–strain scale, but the effect sizes were much smaller than those observed with the self-reported measures. For subjects undergoing speech and language therapy, there were small or medium effects in each grade–roughness–breathiness–asthenia–strain component. For subjects undergoing surgery, the effect sizes were smaller in all components except for roughness, the least sensitive component.

There was little evidence of substantive change in any of the generic health status instruments – all effect sizes were less than 0.3. Thus, although some of the changes were statistically significant, the small effect sizes suggest that they were not clinically important.
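The distinction drawn above between statistical significance and effect size follows from the arithmetic of the paired t-test: with the effect size defined as mean change over the standard deviation of change, the t statistic equals the effect size multiplied by the square root of the number of pairs. The sketch below checks this identity on made-up change scores and then applies it to an illustrative case; the function names and all numbers are assumptions, not study data.

```python
import math
from statistics import mean, stdev


def paired_t(changes):
    """Paired t statistic: mean change divided by its standard error."""
    n = len(changes)
    return mean(changes) / (stdev(changes) / math.sqrt(n))


def t_from_effect_size(es, n):
    """The same statistic written through the effect size: t = ES * sqrt(n)."""
    return es * math.sqrt(n)


# Hypothetical change scores, used only to check the identity numerically.
changes = [1, 2, 3, 2, 0, 4]
es = mean(changes) / stdev(changes)
assert math.isclose(paired_t(changes), t_from_effect_size(es, len(changes)))

# Illustrative case: a small effect size (0.25) in a group of 90 patients.
print(round(t_from_effect_size(0.25, 90), 2))
```

For 90 pairs the two-sided 5 per cent critical value of t is roughly 1.99, so an effect size of 0.25 would reach statistical significance while remaining small on the benchmarks given in the Analysis section.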

Discussion

The pattern of effect sizes observed in Table I indicates very small changes in the generic health status instruments in comparison with the voice-specific measures. This supports the conclusion of a previous study which assessed voice therapy alone.¹⁸ Following intervention, there were fairly large changes in self-reported, voice-related quality of life. Similar effect sizes were observed across the three self-reported voice questionnaires. The changes in total scores were highly correlated, suggesting that the three self-reported voice questionnaires were detecting changes in the same sets of patients.

When considering the voice handicap index and the voice symptom scale component scores, greatest change was observed in the voice symptom scale impairment subscale and in the voice handicap index physical aspects of voice subscale. Inspection of the individual items that make up these two subscales suggests that they are more or less equivalent; both relate to voice quality. The very high correlation between the changes in these two subscales supports this suggestion. The remaining two voice handicap index components (functional and emotional aspects of health) broadly overlap with the voice symptom scale emotion component; this is reflected in the correlation in change scores between these components. The voice symptom scale physical symptom component has almost no overlap with any of the voice handicap index components; this is reflected in the much weaker correlations of this subscale with the voice handicap index subscales, and indeed with the other voice symptom scale components.

The grade–roughness–breathiness–asthenia–strain components were less sensitive to change than the self-reported measures. Roughness did not alter much, and was also the only parameter which consistently failed to show any relationship with the total or subscales of any of the three self-reported measures.²

The participants reported comparatively little change in the generic health status questionnaires; a much larger change was observed in the self-reported voice scales. The SF 36 and other generic quality of life instruments have proven validity and reliability, and it is reasonable to suppose that any general, subjective increase in wellbeing would be reflected in these instruments. Similarly, there was little change in the mood variables (the hospital anxiety and depression scales and the SF 36 mental health component score). This suggests that the changes observed in the voice measures did not simply reflect a general improvement in subjective wellbeing as a result of receiving a medical intervention, but rather a real improvement in voice-related quality of life. At the same time, the lack of change in the generic measures also highlights the need for a voice-specific questionnaire in assessing voice treatment effectiveness; the SF 36 does not include voice-sensitive components.

• Several self-reported voice tools seem responsive to change
• The differing sensitivity of the various available tools is not known
• Identification of the effect sizes of different interventions according to different tools would be useful in the standardisation of interventions and in clinical voice outcome reporting
• The three self-reported voice tools studied (the vocal performance questionnaire, the voice handicap index and the voice symptom scale) all showed large effect sizes following either voice therapy or phonosurgery
• Smaller effect sizes were noted for the grade–roughness–breathiness–asthenia–strain perceptual rating scale, administered by an expert rater
• General health status measures were the least responsive to change

Our work suggests that all three self-reported voice measures are capable of detecting change. There is no strong evidence, on the basis of sensitivity to change evaluation, to favour one measure over the other two. Whichever one is chosen, the means and standard deviations given in Table I can be used to help determine sample size when planning an evaluation of an intervention. In the present, large sample of patients with benign disorders, the vocal performance questionnaire and the voice symptom scale showed the largest overall effect sizes, while the voice handicap index sensitivity was somewhat lower for both conservative and surgical interventions. The voice handicap index was, however, developed partly in laryngectomy patients; therefore, the responsiveness pattern may well change in subjects with laryngeal malignancy. Figure 1 shows that, especially for the surgery group, the vocal performance questionnaire and the voice handicap index inter-quartile ranges shrank post-treatment. This possibly supplies evidence for a floor effect in these questionnaires, which was not seen in the voice symptom scale.
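The remark above about using means and standard deviations to plan sample size can be illustrated with the standard normal-approximation formula for comparing two group means. This is a generic sketch, not a method described by the authors: the function name, the standard deviation of 10 points and the target difference of 5 points are hypothetical, and Table I values are not reproduced here.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Patients per arm to detect a mean difference `delta` between two groups
    with common standard deviation `sd` (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical planning values: SD of 10 points, target difference of 5 points.
print(n_per_group(sd=10.0, delta=5.0))
```

The normal approximation slightly underestimates the number needed when a t-test is used for the final analysis, so a planner would usually add a small margin and allow for loss to follow-up.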

The data shown here are among the first to record the approximate level of benefit from a heterogeneous group of phonosurgical patients. There was no prior expectation as to the relative effects of speech and language therapy and surgery. Both interventions produced very similar changes in self-reported voice quality, although the expert raters recorded larger improvements in perceived voice quality in the group undergoing speech and language therapy. The study was, however, designed principally to assess the behaviour of the clinical outcomes studied, rather than to compare different subgroups of intervention. The groups were not prospectively matched at baseline, either for disease severity or for diagnostic spread; no treatment comparison inference is therefore possible. Nonetheless, the results do provide useful evidence that the measures used can in future be applied across a range of interventions likely to affect voice-related quality of life.

Acknowledgements

This research was supported by a grant from the Wellcome Trust.

References

1 Millar, A, Deary, IJ, Wilson, JA, MacKenzie, K. Is an organic/functional distinction psychologically meaningful in patients with dysphonia? J Psychosom Res 1999;46:497–505

2 Webb, AL, Carding, PN, Deary, IJ, MacKenzie, K, Steen, IN, Wilson, JA. Optimising outcome assessment of voice interventions, I: reliability and validity of three self-reported scales. J Laryngol Otol 2007:1–5 [Epub ahead of print]

3 Deary, IJ, Webb, A, MacKenzie, K, Wilson, JA, Carding, PN. Short self-report voice symptom scales: psychometric characteristics of the VHI-10 and the VPQ. Otolaryngol Head Neck Surg 2004;131:232–5

4 Jacobson, BH, Johnson, A, Grywalski, C, Silbergleit, A, Jacobson, G, Benninger, MS. The Voice Handicap Index (VHI): development and validation. Am J Speech Lang Pathol 1997;6:66–70

5 Benninger, MS, Ahuja, AS, Gardner, G, Grywalski, C. Assessing outcomes for dysphonic patients. J Voice 1998;12:540–50

6 Deary, IJ, Wilson, JA, Carding, PN, MacKenzie, K. VoiSS: a patient derived voice symptom scale. J Psychosom Res 2003;54:483–9

7 Wilson, JA, Webb, AL, Carding, PN, Steen, IN, MacKenzie, K, Deary, IJ. Comparing the Voice Symptom Scale (VoiSS) and the Voice Handicap Index: structure and content. Clin Otolaryngol 2004;29:169–74

8 Hirano, M. Clinical Examination of Voice. New York: Springer-Verlag, 1981

9 Dejonckere, PH, Obbens, C, de Moor, GM, Wieneke, GH. Perceptual evaluation of dysphonia: reliability and relevance. Folia Phoniatrica 1993;45:76–83

10 De Bodt, M, Wuyts, FL, Van de Heyning, PH, Croux, C. Test-retest of the GRBAS scale: influence of experience and professional background on perceptual ratings of voice quality. J Voice 1997;11:74–80

11 Webb, AL, Carding, PN, Deary, IJ, MacKenzie, K, Steen, N, Wilson, JA. The reliability of three perceptual evaluation scales for dysphonia. Eur Arch ORL 2004;261:429–34

12 Jenkinson, C, Coulter, A, Wright, L. Short-form SF-36 health survey questionnaire: normative data for adults of working age. Br Med J 1993;306:1437–40

13 Brazier, JE, Harper, R, Jones, NMB, O'Cathain, A, Thomas, KJ, Usherwood, T et al. Validating the SF-36 health survey questionnaire: a new outcome measure for primary care. BMJ 1992;305:160–4

14 Wilson, JA, Millar, A, Deary, IJ, MacKenzie, K. The quality of life impact of dysphonia. Clin Otolaryngol 2002;27:179–82

15 Zigmond, AS, Snaith, RP. The hospital anxiety and depression scale. Acta Psychiatrica Scand 1983;67:361–70

16 Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Hillsdale, NJ: Erlbaum, 1988

17 Kazis, LE, Anderson, JJ, Meenan, RF. Effect sizes for interpreting changes in health status. Med Care 1989;27(3 Suppl):S178–89

18 MacKenzie, K, Millar, A, Sellars, C, Wilson, JA, Deary, IJ. Is voice therapy an effective treatment for dysphonia? A randomised controlled trial. BMJ 2001;323:658–61