Introduction
Laryngeal voice production is an aerodynamic and myoelastic event. It is determined by the interaction of glottal airflow, subglottic pressure and vocal fold tension.Reference Terada, Saeki, Toh, Uwa, Sagawa and Takayasu1 Following total laryngectomy, a neopharynx (neoglottis) or pharyngoesophageal segment becomes the new location of voice production.Reference Grolman, Eerenstein, Tange, Canu, Bogaardt and Dijkhuis2 In tracheoesophageal voice, aerodynamic power generated by the lungs is modulated and transformed into sound by the myoelastic tonicity of the pharyngoesophageal segment and neoglottis. This transformed power is passed as a sound through the remaining vocal tract and exits via the mouth.Reference Kazi, Kanagalingam, Venkitaraman, Prasad, Clarke and Nutting3
As experience with the tracheoesophageal puncture procedure and associated voice prostheses has evolved, it has become evident that tracheoesophageal voice can be limited by pharyngoesophageal muscle spasm induced by insufflated air.Reference Singer, Blom and Hamaker4 The myoelastic tonicity of the neoglottis varies with different techniques of neopharyngeal construction after total laryngectomy. In such patients, the goal is not just creation of an intact neopharynx that does not leak. The luminal diameter of this neopharynx should be sufficient to allow the passage of a food bolus, and should allow for either primary or secondary tracheoesophageal voice restoration, but should not be so flaccid as to adversely affect post-operative voice quality.Reference Blom, Pauloski and Hamaker5 Pharyngoesophageal myotomy and plexus neurectomy have been the ‘gold standard’ for surgical management of the neopharynx. Other methods have also been used effectively, such as non-muscle, half-muscle and transverse repairs.Reference Deschler, Doherty, Reed, Hayden and Singer6–Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8
Alaryngeal, tracheoesophageal voice quality can vary significantly depending on which type of hypopharyngeal repair is used. Thus, this study aimed to evaluate the effect of primary, cross-over, zigzag neopharyngeal construction (neopharyngoplasty), a novel technique, on quantitative and qualitative acoustic parameters of tracheoesophageal voice, compared with pharyngoesophageal myotomy, a standard technique, following total laryngectomy with partial pharyngectomy.
Patients and methods
Patient population
Thirty patients were recruited prospectively. These patients voluntarily provided informed consent and completed the investigational protocol, which had been approved by the relevant institutional review board.
All patients were diagnosed as having primary laryngeal carcinoma (stage III or IV), and were identified as candidates for treatment with total laryngectomy and partial pharyngectomy, at the otolaryngology department of Tanta University Hospital, Egypt. Patients underwent pre-operative assessment for tracheoesophageal voice, regarding manual dexterity, visual acuity, pulmonary function, status of articulation, and neurological and psychological stability.
Diabetic patients with a history of prior radiotherapy who were felt to be at high risk of wound healing complications were pre-operatively excluded from this study.
All included patients had at least one ear with a hearing level within normal limits (i.e. less than 20 dB HL at 0.5, 1 and 2 kHz), and all were Arabic speakers.
Surgical technique
Total laryngectomy with or without neck dissection was performed in the standard fashion.
Patients included in this study underwent partial pharyngectomy, provided that the width of the open, relaxed hypopharyngeal remnant was not less than 3 cm at its narrowest point, without compromising oncological goals. In all patients, primary tracheoesophageal puncture was performed.
Patients were randomly allocated into two groups of 15, to receive one of two types of neopharyngeal repair: either pharyngoesophageal myotomy (group one) or cross-over, zigzag neopharyngoplasty (group two).
In group one patients, a primary, posterior, midline pharyngoesophageal myotomy was performed in the usual manner, from the midpoint of the hypopharynx down to the level of the tracheostoma.Reference Blom, Pauloski and Hamaker5 This was followed by closure of the neopharynx in a three-layered fashion that approximated the mucosal edges, reinforced by a second and third layer of submucosal tissue and constrictor muscles, to form an I-shaped closure line (Figure 1).
In group two patients, the constrictor muscles were freed from lateral attachments, to increase mobility, and then carefully dissected from the lining mucosal layer for about 10 to 15 mm along its length, to create mucosal and constrictor muscle flaps on each side (Figure 2), without compromising the vascular integrity of these flaps. The mucosal flaps were closed in the midline in a two-layered fashion incorporating the mucosal and submucosal layers. Each constrictor muscle flap was then divided into two equal flaps (upper and lower) by two transverse incisions (10 mm long), one midway between the tongue base and the oesophageal inlet and the other at the level of the oesophageal inlet. Each upper flap was mobilised over the midline closure, and the side with the least restriction and the best vascularity was chosen to cross the midline, to be sutured to the submucosal connective tissue and the free border of the contralateral upper flap, 8 to 10 mm lateral to the midline. The reverse was performed for the lower muscular flaps. This created a lateral vertical suture line from the level of the tongue base to the midpoint of the neopharynx, and another from the latter level to the oesophageal inlet on the contralateral side. Three transverse suture lines were also created: the first at the base of the tongue; the second between the inferior border of the upper crossing flap and the superior border of the lower crossing flap; and the third at the oesophageal inlet (Figure 3).
Post-operative care and outcome measures
Once adequate healing had been demonstrated, a voice prosthesis (Provox® 2™; Atos Medical, Hörby, Sweden) of appropriate size was inserted in the tracheoesophageal puncture. All patients received instruction on the care and use of their prosthesis.
All patients completed voice recordings, conducted according to a previously published protocol, either three months post-operatively or six months after post-operative radiotherapy.Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 Signal recordings were performed in a quiet room with a condenser microphone positioned 30 cm from the patient's mouth. Speech signals were passed to the microphone amplifier, processed by a pulse code modulator and stored on a video cassette recorder. Data acquisition was performed with an analogue-to-digital converter with a resolution of 16 bits, which was accurate to within ±0.0015 per cent of full scale.
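The quoted accuracy figure matches the size of one quantisation step of a 16-bit converter. The following is a minimal sketch of that arithmetic, assuming an ideal converter whose error equals one step out of 2^16 levels; the function name is illustrative and does not come from the study.

```python
def lsb_percent_of_full_scale(bits: int) -> float:
    """Size of one quantisation step, as a percentage of full scale."""
    levels = 2 ** bits  # number of distinct output codes
    return 100.0 / levels


# One step of a 16-bit converter, to four decimal places
step = lsb_percent_of_full_scale(16)
print(f"{step:.4f} per cent of full scale")  # prints "0.0015 per cent of full scale"
```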
Both quantitative and qualitative acoustic parameters were measured.
Each patient was asked to sustain the vowel /a/ at a comfortable pitch and at a conversational loudness, on a single deep breath for as long as possible, over three successive trials. Voice signals and a one-second sample from the middle of the phonation were analysed in order to calculate each patient's voice amplitude, dynamic range, shimmer, fundamental frequency, jitter, maximum phonation time, percentage number of pauses and harmonic-to-noise ratio.
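The perturbation measures can be illustrated with a short sketch. The study does not state the exact formulas used by its analysis software, so the definitions below are assumptions: jitter is taken as the mean absolute difference between consecutive cycle periods, expressed as a percentage of the mean period, and shimmer as the mean absolute amplitude ratio between consecutive cycles, in dB. The function names and sample values are hypothetical.

```python
import math


def jitter_percent(periods):
    """Mean absolute cycle-to-cycle period difference, as % of the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


def shimmer_db(amplitudes):
    """Mean absolute cycle-to-cycle amplitude ratio, in dB."""
    steps = [abs(20.0 * math.log10(b / a))
             for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)


# Hypothetical cycle data (not taken from the study)
periods_ms = [10.0, 10.4, 9.8, 10.2, 10.1]
peak_amplitudes = [1.00, 0.90, 1.05, 0.95, 1.00]
print(jitter_percent(periods_ms), shimmer_db(peak_amplitudes))
```

Under these definitions, a perfectly periodic signal with constant amplitude yields zero for both measures, and larger values indicate a less stable voice source.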
Patients were then asked to read a standard passage (‘Al-Fat-ha’, in Arabic, the initial phrases of the Holy Qur'an). A group of three trained listeners evaluated the following qualitative voice parameters: intelligibility, communicative effectiveness, fluency, speaking rate and wetness. The listeners rated patients from one to 10 on a 10 cm line, with the results expressed in millimetres.
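As a minimal sketch of how such ratings might be handled, the snippet below converts a mark on the 10 cm line to millimetres and averages the scores of several listeners. Averaging across listeners is an assumption, as the text does not say how the three ratings were combined; the function names and marks are hypothetical.

```python
def vas_score_mm(mark_cm: float) -> float:
    """Convert a mark on a 10 cm rating line to a score in millimetres."""
    if not 0.0 <= mark_cm <= 10.0:
        raise ValueError("mark must lie on the 10 cm line")
    return mark_cm * 10.0


def mean_listener_score(marks_cm):
    """Average the millimetre scores given by several listeners."""
    scores = [vas_score_mm(mark) for mark in marks_cm]
    return sum(scores) / len(scores)


# Hypothetical marks from three listeners for one patient and one parameter
print(mean_listener_score([9.1, 9.3, 9.2]))
```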
Statistical analysis
The mean and standard deviation of each voice parameter were calculated for each group. Group means were compared using the Student t-test for independent samples, performed with the Statistical Package for the Social Sciences (SPSS, Chicago, Illinois, USA). Each p value was compared with an α level of 0.05 to determine statistical significance.
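The comparison described above can be sketched from summary statistics alone. This is an illustration under stated assumptions, not the SPSS procedure itself: it assumes the classic equal-variance form of the Student t-test and a two-sided test, and uses the tabulated critical value of 2.048 for 28 degrees of freedom at α = 0.05. The function name and input values are hypothetical and are not taken from the study.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student t statistic (equal variances assumed) and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Two-sided critical value of t for alpha = 0.05 and df = 28 (two groups of 15)
T_CRIT_DF28 = 2.048

# Hypothetical group summaries: mean, standard deviation, group size
t_stat, df = pooled_t(12.0, 4.0, 15, 10.0, 4.0, 15)
significant = abs(t_stat) > T_CRIT_DF28
print(f"t = {t_stat:.3f}, df = {df}, significant = {significant}")
```

Comparing the statistic with a tabulated critical value avoids the need for a statistics library; a package such as SPSS reports the exact p value instead.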
Results
Over a five-year period, 30 consecutive patients under the care of the same surgeon (the author) were prospectively enrolled into the present randomised, two-arm study. All patients underwent total laryngectomy with partial pharyngectomy, followed by either primary pharyngoesophageal myotomy in group one (15 cases) or cross-over, zigzag neopharyngoplasty in group two (15 cases).
The patient profiles of the two surgical groups were roughly equivalent (Table I). Patients comprised 28 men and two women, with ages ranging from 45 to 66 years (mean 53.9 ± 7.32 years) in group one and 44 to 63 years (mean 54.7 ± 7.54 years) in group two. Patients’ tumours were staged, according to the tumour–node–metastasis classification, as either stage III (70 per cent) or stage IV (30 per cent). Supraglottic lesions represented 20 per cent of the total cases, the remaining 80 per cent being endolaryngeal lesions.
Table I notes: Data represent patients unless otherwise specified. *n = 15. SD = standard deviation; ipsilat = ipsilateral; bilat = bilateral; PF = pharyngocutaneous fistula
One patient in each group developed a post-operative pharyngocutaneous fistula. In both cases, the fistula was managed conservatively, and healed completely within one to two weeks.
At the time of the study, all patients were clinically free of complications and maintained a regular diet.
Quantitative voice parameters
Quantitative analysis of voice intensity (sound pressure level), measured in dB, was undertaken for soft and loud speech, and demonstrated an adequate dynamic range in all patients (Figure 4). The mean intensity for soft speech was 55.71 ± 6.23 dB in group one and 57.34 ± 5.88 dB in group two, whereas that for loud speech was 69.36 ± 4.21 dB in group one and 77.62 ± 3.92 dB in group two. The dynamic range for groups one and two, as indicated by the difference between soft and loud sound pressure levels, was 13.65 ± 3.01 dB and 17.28 ± 3.22 dB, respectively. Amplitude perturbation (shimmer), indicated by cycle-to-cycle differences in intensity, was evaluated for each patient for loud speech only (Figure 4); the mean shimmer value was 2.27 ± 0.33 dB in group one and 1.01 ± 0.21 dB in group two. Voice intensity parameters were better in group two than in group one; the difference was not statistically significant for soft intensity (p = 0.25), but was statistically significant for loud intensity (p = 0.01), dynamic range (p = 0.03) and shimmer (p = 0.03).
Fundamental frequency and frequency perturbation (i.e. cycle-to-cycle difference in frequency, also known as jitter) were measured in both groups for both soft and loud speech (Figure 4). The mean soft and loud fundamental frequency values were greater in group two (88.01 ± 3.18 and 131.54 ± 4.16 Hz, respectively) than in group one (87.73 ± 3.26 and 122.33 ± 4.84 Hz, respectively). The mean soft and loud jitter values were greater in group one (3.55 ± 0.12 and 4.86 ± 0.62 per cent, respectively) than in group two (3.11 ± 0.08 and 3.01 ± 0.53 per cent, respectively). No statistically significant differences in soft fundamental frequency or soft jitter were observed between the groups (p = 0.22 and p = 0.13, respectively). However, statistically significant differences between the two groups were found for loud fundamental frequency and loud jitter (p = 0.01 for both).
The mean values for temporal quantitative parameters were: maximum phonation time, 13.26 ± 2.01 seconds in group one and 14.69 ± 1.98 seconds in group two; percentage number of pauses, 7.82 ± 1.70 per cent in group one and 7.69 ± 1.81 per cent in group two; and harmonic-to-noise ratio, −2.22 ± 0.33 dB in group one and −2.06 ± 0.41 dB in group two (Figure 5). These values were found to be almost equivalent in the two groups (although tending to be greater in group two), except in the case of percentage number of pauses. No statistically significant difference was found for these temporal measures, comparing the two study groups (p > 0.05).
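The negative harmonic-to-noise ratios reported above can be read more easily as power ratios. The sketch below assumes the common definition of the ratio as ten times the base-10 logarithm of harmonic power over noise power; the study does not state which definition its analysis software used, and the function names are illustrative.

```python
import math


def hnr_db(harmonic_power: float, noise_power: float) -> float:
    """Harmonic-to-noise ratio in dB from harmonic and noise power."""
    return 10.0 * math.log10(harmonic_power / noise_power)


def power_ratio_from_db(hnr: float) -> float:
    """Harmonic power divided by noise power for a given ratio in dB."""
    return 10.0 ** (hnr / 10.0)


# Group means reported in the text, converted to linear power ratios
for label, value_db in (("group one", -2.22), ("group two", -2.06)):
    print(label, round(power_ratio_from_db(value_db), 2))
```

Under this definition, any negative value corresponds to a power ratio below one, meaning that the noise component carries more power than the harmonic component.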
Qualitative voice parameters
Trained listeners evaluated qualitative aspects of the patients’ voice quality, generating the following mean values: intelligibility, 92.1 ± 0.87 mm in group one and 92.4 ± 0.79 mm in group two; communicative effectiveness, 90.8 ± 1.11 mm in group one and 91.2 ± 1.03 mm in group two; fluency, 88.7 ± 0.77 mm in group one and 90.9 ± 0.64 mm in group two; speaking rate, 81.6 ± 1.31 mm in group one and 86.3 ± 0.98 mm in group two; and wetness, 19.5 ± 0.71 mm in group one and 18.8 ± 0.86 mm in group two (see Figure 6). There were no statistically significant differences between the groups as regards intelligibility, communicative effectiveness or wetness (p = 0.32, 0.34 and 0.33, respectively). However, there were statistically significant differences as regards fluency and speaking rate (p = 0.04 for both).
Discussion
Following total laryngectomy, tension in the muscular wall of the neopharynx (neoglottis) is essential for deglutition and anti-reflux protection, and also plays an important role in alaryngeal voice production. A critical level of myoelastic tonicity must exist to permit air flow and adequate phonation. Hypertonicity will trap air and force it into the stomach, with resultant distension. Hypotonicity will produce lower vocal pitches, while a rigid but adynamic neopharynx will produce a coarse whisper.Reference Singer, Blom and Hamaker9 Modifications in neopharyngeal construction have been introduced in an effort to prevent voice-limiting pharyngoesophageal spasm; these include primary myotomy,Reference Hamaker, Singer, Blom and Daniels10 pharyngeal plexus neurectomy,Reference Singer, Blom and Hamaker9 non-muscle repair,Reference Clevens, Esclamado, Martshorn and Lewin11 half-muscle repairReference Deschler, Doherty, Reed, Hayden and Singer6 and transverse repair.Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 The cross-over, zigzag neopharyngoplasty technique used in group two of the current study is the result of similar efforts.
Pharyngoesophageal spasm is manifested clinically by a sudden increase in sound pressure level with abrupt termination of produced sound. Following tracheoesophageal puncture, Singer and BlomReference Singer and Blom12 believed that pharyngoesophageal spasm was the cause of failure to achieve satisfactory speech in 28–55 per cent of patients, due to lack of management of the neopharynx. Near-complete acquisition of tracheoesophageal voice was reported in patients with disabling spasm who underwent a secondary pharyngoesophageal myotomy.Reference Singer and Blom12 Furthermore, Hamaker et al. Reference Hamaker, Singer, Blom and Daniels10 suggested that primary pharyngoesophageal myotomy be performed in all patients. However, Baugh et al. Reference Baugh, Baker and Lewis13 reported that the procedure may be associated with a flaccid neopharynx and a deep, breathy, unacceptable voice.
In the current series, patients were categorised according to their type of hypopharyngeal repair, either: (1) pharyngoesophageal myotomy with an I-shaped, three-layered closure line including the mucosa, submucosa and constrictor muscles (group one); or (2) vertical, two-layered closure including the mucosa and submucosa, followed by repair of the constrictors in a cross-over, zigzag fashion (group two). The two groups were then compared with each other as regards different quantitative and qualitative acoustic voice parameters, within a prospective clinical trial, in order to evaluate the best method with which to alleviate neopharyngeal spasm after total laryngectomy with partial pharyngectomy.
In a previous prospective study by the author and colleagues,Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 both quantitative and qualitative tracheoesophageal voice parameters were assessed in patients undergoing four types of primary neopharyngeal repair: myotomy, pharyngeal plexus neurectomy, non-muscle repair and transverse repair. No statistically significant differences were found for quantitative voice parameters, comparing the four surgical groups. It was thus concluded that all four neopharyngeal repair types successfully prevented post-operative pharyngoesophageal spasm.
Quantitative voice results from the current series were comparable with those reported by Yoshida et al.,Reference Yoshida, Hamaker, Singer and Blom14 Most et al. Reference Most, Tobin and Mimran15 and Albirmawy et al. Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 Although results were better in group two, the improvement was only statistically significant for loud intensity (p = 0.01), dynamic range (p = 0.03), shimmer (p = 0.03), loud fundamental frequency (p = 0.01) and loud jitter (p = 0.01). These results favour the cross-over technique, and can be explained by the ability of this technique: (1) to maintain the upper oesophageal sphincter intact without a myotomy, achieving higher intensity values; and (2) to elongate and relax the healing tension line of the constrictors of the partially resected hypopharynx, by use of a zigzag pattern, over a lax mucosal layer. These effects combine to achieve more favourable soft and loud fundamental frequencies, phonation time, harmonic-to-noise ratio, shimmer, soft and loud jitter, and percentage number of pauses.
The pharyngeal plexus neurectomy, reported by Singer et al. Reference Singer, Blom and Hamaker9 and Albirmawy et al.,Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 is highly effective in preventing hypertonicity, and produces significantly higher voice fundamental frequencies, due to the greater resting tension of the upper oesophageal sphincter. However, failure occurs in some cases due to incomplete resection of all branches of the plexus.
Clevens et al. Reference Clevens, Esclamado, Martshorn and Lewin11 used a non-muscle closure technique for primary voice restoration, with 100 per cent success. However, their fistula rate was almost double that for patients undergoing the three-layer technique with muscle closure.
Deschler et al. Reference Deschler, Doherty, Reed, Hayden and Singer6 reported using the half-muscle neopharyngeal closure technique. Their fistula rate was acceptably low. Because only one constrictor muscle flap is used for reinforcement, a circumferential muscle ring (which may spasm) is not created, but a completely patulous conduit is likewise avoided.
Hamaker and CheesmanReference Hamaker, Cheesman, ED, MI and RC16 and Albirmawy et al. Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 have reported that patients undergoing horizontal closure of the neopharynx were 100 per cent successful in establishing voice. However, the majority of these patients were considered somewhat hypotonic, in comparison with patients undergoing other techniques.
In the current study, half the patient population underwent cross-over neopharyngoplasty, in an attempt to gain the benefits of several different surgical techniques while limiting their drawbacks. The technique provided vascular reinforcement for the pharyngeal closure, as derived from myotomy and half-muscle repair techniques. Unlike the myotomy procedure, it required no further incisions and posed no additional risk to mucosal integrity; nevertheless, the fistula rate was equal in the two surgical groups, and acceptably low (one of 15 patients in each group, 6.7 per cent). Similarly to the plexus neurectomy technique, tone-inducing muscle remained intimately related to the neopharyngeal conduit, allowing acceptable fundamental frequency and jitter levels which were acoustically similar to, if not better than, those of the control group (group one). A patulous conduit, as can occur with the non-muscle repair and transverse repair techniques, was also avoided. A degree of flaccidity was achieved, accounting for the slightly higher phonation time and harmonic-to-noise ratio. A degree of critical tonicity was also achieved, at the level of the intact upper oesophageal sphincter, which accounted for the slightly higher intensity levels, compared with the control group.
Studies of alaryngeal voice have indicated that tracheoesophageal speakers demonstrate impaired pitch modulation, compared with laryngeal speakers, but can produce a range of fundamental frequencies and achieve high proficiency in lexicon stress control.Reference Deschler, Doherty, Reed and Singer17
Moon and WeinbergReference Moon and Weinberg18 evaluated the aerodynamic and myoelastic factors contributing to tracheoesophageal voice, and supported the theory that factors other than effort level (as measured by flow rate and tracheal pressure) serve to modulate fundamental frequency. They concluded that tracheoesophageal voice is an aerodynamic and myoelastic event with passive and active components.
Omori et al. Reference Omori, Kojima, Nonomura and Fukushima19 undertook fluoroscopic and electromyographic studies, and reported that the ability to actively control contraction at the level of the pharyngoesophageal segment can provide alaryngeal speakers with a method of actively modulating pitch. Changes in fundamental frequency and perceptual values would therefore be dependent on myoelastic alteration of the pharyngoesophageal segment, and also upon aerodynamic effects.
Hui et al. Reference Hui, Wei, Yuen, Lam and Nho20 reported that, although a pharyngeal remnant of 1.5 cm was adequate to maintain swallowing function after primary closure, insertion of a voice prosthesis did not enable satisfactory phonation in patients with a neopharynx of such a small diameter. Iwai et al. Reference Iwai, Tsuji, Tachikawa, Inoue, Izumikawa and Yamamichi21 found that primary closure of a pharyngeal remnant smaller than 3 cm often resulted in neopharyngeal stenosis or phonation dysfunction several years after surgery.
In the current series, either primary pharyngoesophageal myotomy or the cross-over neopharyngoplasty technique was used in patients with a hypopharyngeal remnant width of not less than 3 cm, in order to obtain better results for swallowing and phonation.
Study results demonstrated that the cross-over neopharyngoplasty technique not only succeeded in preventing hypopharyngeal spasm (a common consequence of partial pharyngectomy) but also succeeded in maintaining and augmenting the myoelastic activity of the pharyngoesophageal segment, as shown by patients’ more favourable results for amplitude, fundamental frequencies, phonation time, harmonic-to-noise ratio, shimmer, jitter and percentage number of pauses, compared with the myotomy control group.
Patients undergoing cross-over neopharyngoplasty also showed a trend towards more favourable qualitative voice results. Group two patients had higher scores for intelligibility, communicative effectiveness, fluency and speaking rate; improvements in fluency and speaking rate were statistically significant. Although the mean wetness score was higher in group one than group two, this difference was not statistically significant.
These results agree with the quantitative and qualitative tracheoesophageal voice findings for standard laryngectomised patients reported by Robbins et al.,Reference Robbins, Fisher, Blom and Singer22 Blood,Reference Blood23 Blom et al.,Reference Blom, Pauloski and Hamaker5 Most et al.,Reference Most, Tobin and Mimran15 Deschler et al. Reference Deschler, Doherty, Reed, Hayden and Singer6 and Cornu et al. Reference Cornu, Vlantis, Elliott and Gregor24 However, the diversity of tracheoesophageal voice assessment methods and techniques used in these studies makes it difficult to accurately compare study results.
• This prospective study of 30 total laryngectomy plus partial pharyngectomy patients compared tracheoesophageal voice parameters for cross-over neopharyngeal construction vs pharyngoesophageal myotomy
• The cross-over neopharyngoplasty modification of hypopharyngeal closure simply and effectively prevented pharyngoesophageal spasm and maintained effective voice amplitude, fundamental frequencies, temporal measures and perceptual parameters, compared with controls
Perceptual analysis of tracheoesophageal voice has traditionally been considered the gold standard for evaluation of alaryngeal voice. Nevertheless, such analysis requires extensive training, which is time-consuming and expensive.Reference Van As, Koopmans-Van Beinum, Pols and Hilgers25 Therefore, attention has turned to acoustic, quantitative voice analysis, which can be performed quickly and objectively.Reference Van Gogh, Festen, Verdonck-de Leeuw, Parker, Traissac and Cheesman26
In the current study, in order to reliably assess patients’ voices, both objective (quantitative) and perceptual (qualitative) voice parameters were evaluated using standard patterns of voice analysis.Reference Albirmawy, El-Guindy, Elsheikh, Saafan and Darwish8 Perceptual parameters were assessed by trained listeners using voice alone (rather than audiovisual input), in order to avoid any biasing effect of visual cues. There was no need for naïve listeners, the use of whom would have required additional, multiple statistical comparisons. The three trained listeners’ individual measurements showed a high level of inter-observer agreement, and correlated well with quantitative voice analysis results, supporting the validity of the qualitative voice parameters. However, the small sample size of the current study may have limited the statistical power of the analysis.
Conclusion
The cross-over, zigzag neopharyngoplasty technique is a simple and straightforward variation for reconstruction of a partially resected hypopharynx after total laryngectomy. Results for quantitative and qualitative acoustic parameters of tracheoesophageal voice are comparable to those of pharyngoesophageal myotomy. Keeping the pharyngoesophageal sphincter intact helps improve voice amplitude and shimmer. In addition, repairing the constrictor muscle flaps in a zigzag fashion helps relax the healing tension line of these muscles, prevent circumferential muscular spasm, reinforce and vascularise the mucosal suture line, and maintain a critical level of myoelastic tonicity to facilitate successful phonation, resulting in favourable results for fundamental frequencies, jitter, temporal measures and perceptual values.
Acknowledgements
Special thanks are expressed to Dr Mohamed N Elsheikh for revising the manuscript, and to the members of the Speech Therapy Department for their kind help during voice training sessions and assessment of voice parameters.