Introduction
Auditory dys-synchrony, also known as auditory neuropathy, is a recently described hearing disorder with unique physiological and perceptual consequences. Its major characteristic is disrupted auditory nerve activity in the presence of normal or near-normal cochlear outer hair cell function.1 Auditory brainstem responses (ABRs) are either absent or severely abnormal in these patients. Advances in the accurate assessment of outer hair cell function have made auditory dys-synchrony easier to diagnose. Persons with this condition generally show poor speech perception2 and derive only limited benefit from hearing aids.3 In the present study, speech perception in individuals with auditory dys-synchrony was studied by lengthening the transition duration and the voice onset time. The effects of lengthened transition duration upon speech perception are presented here. The effects of voice onset time modification, of combined modification of voice onset time and transition duration, and their relationship with temporal processing ability are reported in a companion paper.
In individuals with sensorineural hearing loss, the prevalence of auditory dys-synchrony has been estimated at approximately 0.54 per cent.2 Prevalence rates are higher (around 10 per cent) in hyperbilirubinaemic infants.4 The exact lesion site and pathophysiology of auditory dys-synchrony are not yet completely understood.
The aetiological factors affecting auditory dys-synchrony are only now being established, and appear diverse. Neonates exposed to hyperbilirubinaemia and anoxia seem to be at increased risk of auditory dys-synchrony.4 Genetic factors have also been identified; Starr et al.5 reported a novel mutation in the MPZ gene in a family with hereditary motor sensory neuropathy and deafness.
The mechanism of auditory dys-synchrony may include: loss of inner hair cells;6 dysfunction of synaptic junctions between inner hair cells and the auditory nerve;7 and auditory nerve demyelination or axonal loss, or both.8
Individuals with auditory dys-synchrony show marked deficits in processing temporal (i.e. time-based) auditory information, but relatively good processing of auditory intensity and frequency information.9 It is now well established that the speech identification problems of individuals with auditory dys-synchrony are disproportionate to their degree of hearing loss.1 Indeed, poor speech perception that is disproportionate to the pure tone hearing thresholds is the cardinal characteristic of auditory dys-synchrony. In affected individuals, the degree of speech perception appears to depend on the extent of suprathreshold temporal distortion of speech cues, rather than on access to the speech spectrum.9–13 This is in contrast to subjects with cochlear hearing loss, who typically demonstrate loudness recruitment, broadening of auditory filters and normal processing of temporal information, at least at high sensation levels. Persons with cochlear hearing loss derive significant benefit from hearing aids employing nonlinear compression circuits, which are designed around abnormal outer hair cell function. Hence, such aids are seldom of benefit to individuals with auditory dys-synchrony, whose outer hair cell function is normal.14 Moreover, such hearing aids either do not change the temporal parameters of the speech signal (with linear amplification) or reduce its amplitude fluctuations (with a non-linear amplitude-compression circuit).15 Other appliances and strategies that may help individuals with auditory dys-synchrony include frequency modulation systems, cochlear implants, perceptual training, speech reading and cued speech.
Cochlear implantation may be a viable option for some patients with auditory dys-synchrony.15–19 However, many patients, particularly in developing countries, will find the cost of a cochlear implant prohibitive.
Previous studies on the speech perception and psychophysical abilities of individuals with auditory dys-synchrony have suggested that temporal processing is markedly affected. In fact, there is reasonable evidence to suggest that individuals with auditory dys-synchrony find it more difficult to perceive short duration dynamic sounds than long duration steady sounds.11,13
Therefore, the purpose of the present study was to investigate the effect of the transition duration of speech segments (a short duration temporal cue) upon the speech perception of individuals with auditory dys-synchrony. The specific aims of the study were (1) to measure ‘just noticeable differences’ in speech segment transition duration, using speech stop consonants, in individuals with auditory dys-synchrony compared with normal hearing persons, and (2) to investigate the effect of lengthened speech segment transition durations upon the speech perception of individuals with auditory dys-synchrony.
Materials and methods
Experiment one: subjects
Two groups of subjects participated in experiment one. The first group consisted of 30 individuals with auditory dys-synchrony (19 men and 11 women; age range 16–30 years, mean age 22.4 years). The second group consisted of 30 age- and gender-matched normal hearing individuals. All auditory dys-synchrony subjects were recruited from the audiology department of the All India Institute of Speech and Hearing, Mysore.
Prior to recruitment, all subjects underwent a structured interview conducted by the first author, an audiologist and speech-language pathologist. These interviews established that no subject had a history of middle-ear disease, noise exposure or ototoxic drug use, nor any previous use of amplification or hearing rehabilitation.
Of the 30 auditory dys-synchrony subjects, 25 reported onset of auditory dys-synchrony between the ages of 16 and 20 years, three reported onset at 22–24 years, and the remaining two reported onset at 25 years. No specific aetiology could be identified in most subjects. Two participants reported that the problem had started after they had given birth to their first child. All the subjects had clinically normal speech.
The results of various audiological tests conducted on the auditory dys-synchrony subjects are given in Table I.
S no = subject number; y = years; PTA = pure tone average (of 0.5, 1 and 2 kHz); RE = right ear; LE = left ear; SIS = speech identification score (for monosyllables);22 TEOAE = transient evoked otoacoustic emissions; SNR = signal to noise ratio; ABR = auditory brainstem response; OAE supprn = otoacoustic emission suppression; F = female; M = male; A = absent; peaked = sharp peak at a single frequency, with thresholds at immediately adjacent frequencies worse by more than 10 dB; rising = decrease in threshold of 5 dB or more per octave
Figure 1 shows representative ABR and otoacoustic emission waveforms for a subject with auditory dys-synchrony.
The conditions and procedures used for audiological evaluation were as described by Kumar and Jayaram.2 The methods used to measure contralateral suppression of otoacoustic emissions were the same as those described by Kumar and Vanaja.23
Before selection for the study, all auditory dys-synchrony subjects underwent an ENT examination to exclude any external or middle-ear pathology. Similarly, all subjects underwent a neurological examination, conducted by a qualified neurologist, to exclude any peripheral neuropathy or space-occupying lesion.
Subjects in the normal hearing group had hearing thresholds within 15 dB HL at octave frequencies between 250 Hz and 8 kHz, and normal results on immittance evaluation.
All subjects were native speakers of Kannada, a Dravidian language spoken by approximately 55 million people in South India.24
Experiment one: stimuli preparation
Consonant–vowel syllables with voiceless stop consonants (i.e. velar /ka/, alveolar /ta/, retroflex /ta/ and bilabial /pa/) and their voiced cognates were used in the study. These syllables, spoken in isolation by a 25-year-old, male, native Kannada speaker, were digitally recorded onto a data acquisition system at 16 kHz sampling frequency with a 16-bit analogue-to-digital converter.
The recorded syllables were then altered using the pitch synchronised overlap and add technique, accessed via the Praat software package (Amsterdam, The Netherlands).25 This technique enables temporal lengthening of a stimulus in the time domain while preserving most of its physical characteristics, such as spectral shape, amplitude distribution and periodicity.26 Accurate segmentation of natural speech is difficult, so identification of the transition was confirmed as follows. In transition regions, the sound frequency changed by more than 200 Hz (within 25–30 ms, depending on the speech sound), whereas in the steady-state portion of the vowel the change in frequency was less than 40 Hz in 30 ms. Furthermore, pitch synchronised overlap and add processing was applied within 40 ms of the stimulus onset. Thus, we were reasonably certain that only the formant transition of each syllable had been lengthened. It is unlikely that any significant part of the steady-state segment of the vowel was also lengthened, although such a possibility cannot be excluded.
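The pitch synchronised overlap and add processing itself was carried out in Praat. As a rough illustration of the underlying idea only, the sketch below applies a plain overlap-add stretch to just the first 40 ms of a signal; this is a simplification, since true pitch synchronised overlap and add places its analysis windows at pitch marks, which this sketch does not do. The function name and parameter values are illustrative, not those used in the study.

```python
import numpy as np

def stretch_onset(signal, sr, onset_ms=40.0, factor=2.0, frame_ms=10.0):
    """Lengthen only the first `onset_ms` of `signal` by `factor`,
    using a plain overlap-add stretch; the rest of the signal is
    passed through unchanged.  (True pitch synchronised overlap and
    add would place the analysis windows at pitch marks.)"""
    n_onset = int(sr * onset_ms / 1000.0)
    onset, rest = signal[:n_onset], signal[n_onset:]

    frame = int(sr * frame_ms / 1000.0)
    hop_in = frame // 2                    # 50 per cent analysis overlap
    hop_out = int(hop_in * factor)         # wider synthesis spacing
    win = np.hanning(frame)

    n_frames = max(1, (len(onset) - frame) // hop_in + 1)
    out = np.zeros(hop_out * (n_frames - 1) + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = onset[i * hop_in : i * hop_in + frame] * win
        out[i * hop_out : i * hop_out + frame] += seg
        norm[i * hop_out : i * hop_out + frame] += win
    out /= np.maximum(norm, 1e-8)          # undo the window weighting
    return np.concatenate([out, rest])
```

Because the synthesis hop exceeds the analysis hop, the onset frames are laid down over a longer span, lengthening the transition region while the steady-state vowel portion is concatenated unchanged.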
Figure 2 shows the waveform, spectrogram and spectrum of the unmodified syllable /ba/, while Figure 3 shows the same parameters for the modified syllable /ba/.
Experiment one: procedure
Both the unmodified and modified speech materials were presented, in turn, through a loudspeaker placed at an angle of 0° azimuth and a distance of 1 m from the subject. The presentation level of the sound signals was kept constant at 40 dB SL (referenced to the average of thresholds at 500 Hz, 1 kHz and 2 kHz). If 40 dB SL could not be reached because of audiometric limits or subject discomfort, testing was performed at an intensity 10 dB below the discomfort level; this was required for three participants (subjects 8, 20 and 29).
The loudspeaker output was calibrated at the beginning of the experiment and then regularly during the test period, using a sound level meter (Quest 1800; Quest Technologies, Wisconsin, USA) and microphone (Quest 4180).
Subjects were tested individually in a sound treated room. Sound signals were fed from a personal computer, at a sampling frequency of 44 kHz, to an audiometer (Maico MA-53; Minnesota, USA), and presented through a loudspeaker connected to the audiometer, positioned as described above. The just noticeable difference was determined using an adaptive tracking technique, parameter estimation through sequential testing (PEST), with an AX same-different discrimination paradigm (where A = anchor stimulus and X = variable stimulus): subjects were asked to indicate whether A was the same as X or not. The inter-stimulus interval between the anchor and variable stimuli was 500 ms. The step size and the direction of change of the variable stimulus were determined by the PEST technique.27 Each subject's just noticeable difference was the difference in transition duration between the anchor and variable stimuli required to achieve 69 per cent correct responses. Test trials included an equal number of catch trials, consisting of either two identical anchor or two identical non-anchor stimuli.
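The full parameter estimation through sequential testing rules (Wald sequential tests, step-doubling heuristics) are more elaborate than can be shown here; the sketch below illustrates only the general adaptive-tracking idea, using a simplified reversal-based staircase. All names and numerical values are illustrative, not those used in the study.

```python
def adaptive_track(respond, start=80.0, step=16.0,
                   min_step=1.0, max_trials=60):
    """Track a just noticeable difference adaptively.

    `respond(delta)` returns True when a transition-duration
    difference of `delta` ms is detected.  The step size is halved
    at each response reversal -- a simplification of the full PEST
    rules, which use Wald sequential tests to decide when and how
    far to move.
    """
    delta, last = start, None
    for _ in range(max_trials):
        correct = respond(delta)
        if last is not None and correct != last:
            step = max(step / 2.0, min_step)   # reversal: shrink the step
        # move toward threshold on a correct response, away otherwise
        delta = max(delta - step, min_step) if correct else delta + step
        last = correct
    return delta
```

With a simulated deterministic listener whose threshold is 20 ms, the track converges to the neighbourhood of that threshold within a few dozen trials.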
Experiment two: speech identification
Speech identification scores for unmodified speech stimuli were obtained for both the auditory dys-synchrony subjects and the normal hearing subjects. Ten repetitions of each of the eight unmodified speech signals were randomly presented and speech identification scores were noted. Stimuli were played through a personal computer at a sampling frequency of 44.1 kHz, and then fed into a calibrated clinical audiometer (Maico MA-53). Subjects received the stimuli through a loudspeaker connected to the audiometer. The loudspeaker was positioned at an angle of 90° azimuth and a distance of 1 m from the subject. The sound presentation level was kept constant at 40 dB SL for all subjects (the reference was average thresholds at 500 Hz, 1 kHz and 2 kHz).
Written responses were obtained from literate subjects. For illiterate subjects, the experimenter and another native Kannada speaker noted down the responses independently; only those responses for which there was 100 per cent agreement between the two observers were used for further analysis.
Subsequently, the speech signal transition duration was lengthened by a time period equal to multiples of the just noticeable difference time, and the effect of this lengthening upon the subject's speech identification score was determined. Only subjects with auditory dys-synchrony participated in this experiment. Four modified speech sounds were generated to evaluate the effect of lengthening of the transition duration upon the subject's perception of consonant–vowel syllables. The first modified speech sound was the syllable /pa/ with the transition duration lengthened by a time period equal to one just noticeable difference (the mean just noticeable difference of the normal listeners was used). Similarly, three more modified speech sounds were generated wherein the transition duration was lengthened by time periods equal to two, three or four just noticeable difference times.
Each modified speech sound was presented to the subject 10 times. Presentation stopped when the subject identified the correct sound, or the sound eliciting the closest response to the target as determined by feature analysis; feature analysis considered voicing and place of articulation information. The order of presentation of the sounds was randomised to minimise practice effects.
Results
Experiment one
Repeated measures analysis of variance revealed a statistically significant difference between the just noticeable difference times for altered syllable stop consonant transition durations, comparing the auditory dys-synchrony and normal hearing groups (F (1, 63) = 4471; p < 0.01).
Independent sample t-testing was conducted to establish the statistical significance of differences between the just noticeable difference times for each speech sound signal, comparing the two groups. Results revealed a statistically significant difference in the two groups' just noticeable difference times, for all sound signals. Figure 4 shows the means and 95 per cent confidence intervals for the observed just noticeable difference times.
Experiment two: individual data
Figure 5 shows individual subjects' speech identification scores for unmodified and modified (i.e. lengthened transition duration) speech sounds.
Speech identification scores varied between 0 and 87 per cent for the unmodified speech sounds. When presented with the unmodified speech sounds, 11 auditory dys-synchrony subjects had a speech identification score of 0 per cent, while only four auditory dys-synchrony subjects had speech identification scores of more than 50 per cent.
Subjects' speech identification scores for the modified speech sounds ranged between 0 and 100 per cent. When presented with the modified speech sounds, six auditory dys-synchrony subjects had a speech identification score of 100 per cent.
Experiment two: auditory dys-synchrony group data
Figure 6 shows mean speech identification scores for unmodified and transition duration lengthened speech sounds. Data from all subjects have been combined for each speech sound.
Table II gives a group stimulus-response matrix for unmodified speech sound identification by the auditory dys-synchrony subjects. Each stimulus was presented 10 times, resulting in a total of 300 presentations of each stimulus for the 30 auditory dys-synchrony subjects. However, the row total for each stimulus was less than 300, as on some occasions the subjects could not identify the stimuli and reported hearing only noise. In this stimulus-response matrix, the number in each cell represents the number of times the speech sound shown in the row heading was identified as the speech sound shown in the column heading. The number of correct responses (for all sounds) can be obtained by summing the numbers occurring along the main diagonal of the table (i.e. from the top left data cell to the bottom right data cell). The matrix reveals that auditory dys-synchrony subjects correctly identified /da/ more than any other sound. The syllable /ga/ was the next best identified. However, identification scores did not exceed 50 per cent for any of the unmodified speech sounds presented. The stimulus-response matrix shows little consistent grouping of results among phoneme categories. The two exceptions were (1) the syllable /ba/, which was frequently confused with its unvoiced cognate /pa/, and (2) the syllable /da/, which was confused with /ga/. The syllables /ta/ and /pa/ were only rarely identified correctly.
Bold numbers indicate correct responses. Data represent combined responses for all subjects (percentage correct scores, rounded to nearest integer).
Table III gives a group stimulus-response matrix for modified speech sound identification, for the auditory dys-synchrony subjects. This table includes only those data for the just noticeable difference modification that resulted in near-normal perception of the consonant–vowel syllable. The data in this stimulus-response matrix should not be directly compared with those in Table II, as the numbers of presentations of unmodified and modified speech sounds differed. It is noteworthy that confusion between speech sounds was greatly reduced when the transition duration was lengthened, and the pattern of errors between speech sounds was more consistent. Lengthening of the transition duration reduced ‘place’ confusion between the phonemes. Confusion of /ba/ with /pa/ was reduced but not eliminated. Also, /dha/ was frequently confused with /da/.
Bold numbers indicate correct responses. Data represent combined responses for all subjects (values in parentheses indicate percentage correct scores, rounded to the nearest integer).
* Lengthened transition duration.
Sequential information transfer analysis28 was performed on the group data for each experimental condition, to assess the amount of information transferred from stimulus to response for each set of phonetic features. The analysis was performed using the Feature Information Xfer software package (developed by the University College London linguistics department), which implements the procedure described by Wang and Bilger.28 In sequential information transfer analysis, the features go through a number of iterations; in the present study, there were two. The first iteration is the same as the information transmission analysis described by Miller and Nicely,29 in which the information transmitted for each feature is calculated. In subsequent iterations, the feature with the highest percentage of information transmitted in the previous iteration is held constant and partialled out. Sequential information transfer analysis thus helps to estimate the redundancy of a specific feature in its contribution to the perception of monosyllables.
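The first-iteration computation, Miller and Nicely's transmitted information, can be sketched directly from a stimulus-response confusion matrix. The following is a minimal illustration of that measure only, not the Feature Information Xfer implementation.

```python
import numpy as np

def transmitted_information(confusions):
    """Transmitted (mutual) information, in bits, between stimulus and
    response, estimated from a confusion matrix whose rows are stimuli
    and columns are responses (Miller and Nicely's measure, i.e. the
    first iteration of sequential information transfer analysis)."""
    p = np.asarray(confusions, dtype=float)
    p = p / p.sum()                       # joint probabilities
    px = p.sum(axis=1, keepdims=True)     # stimulus marginals
    py = p.sum(axis=0, keepdims=True)     # response marginals
    mask = p > 0                          # 0 log 0 is taken as 0
    return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())
```

For eight equiprobable stimuli identified perfectly, this returns log2(8) = 3 bits, the ceiling quoted for Table V; a uniform confusion matrix returns 0 bits.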
Table IV shows the features assigned to the eight speech sounds used in the study. These eight speech sounds, all stop consonants, were differentiated basically by their voicing and place of articulation.
+ = present; – = absent; b = bilabial; a = alveolar; d = dental; v = velar
Table V shows the relative information transmitted by each feature, expressed as bits, for the unmodified and modified speech sounds. The maximum information that can be transmitted for eight stimuli is three bits (i.e. the maximum transmissible information is log2(n) bits, where n = the number of stimuli; log2(8) = 3). In Table V, the numbers in the second and third columns represent the proportion of input information transmitted: zero indicates no transmission of that particular feature, while a value of 1 indicates maximum transmission. Numbers in the fourth column represent the total information transmitted, which can range between 0 and 3 bits. In general, the patterns seen in the total transferred information, and in the information transfer for each phonetic feature, are similar to those found in the stimulus-response matrices shown in Tables II and III and in a simple tally of consonants correctly identified.
*Lengthened transition duration.
In individuals with auditory dys-synchrony, lengthening the transition duration appears to result in better transmission of placement information, compared with voicing information.
Discussion
The important findings of this study were: (1) individuals with auditory dys-synchrony have severely impaired temporal processing abilities (as shown by their long just noticeable difference times), and (2) lengthening of speech signal transition duration significantly improved speech perception in such individuals.
The existence of temporal processing deficits in individuals with auditory dys-synchrony has been well documented using non-speech signals.10–12 Data from the present study indicate that individuals with auditory dys-synchrony also have severe difficulty differentiating speech segments with altered transition durations, as indicated by their long just noticeable difference times, which were approximately three to four times longer than those of normal hearing listeners. This may mean that such individuals have difficulty discriminating between speech sounds that differ only (or mainly) in their temporal characteristics.
Kraus et al.13 reported a subject with auditory dys-synchrony who had marked difficulty discriminating between speech sounds with differing spectral onsets. Transitions comprise rapid spectral changes occurring at the onset of a speech stimulus. Results from the present study, together with those of Kraus et al.,13 provide evidence that individuals with auditory dys-synchrony have difficulty processing temporal and spectral information presented at the onset of speech stimuli.
In the present study, confusion matrices and sequential information transfer analysis showed that lengthening the transition duration of the speech stimuli resulted in better transfer of information. However, the results of sequential information transfer analysis should be interpreted with caution. Such analysis is robust when the stimuli occur with equal frequency; this could not be ensured in the present study, as the number of presentations differed across speech sounds depending on whether or not they were identified in the unmodified condition. Speech syllables identified in their unmodified condition were, of course, not included in the subsequent experiments. This limitation should be kept in mind when interpreting the results of sequential information transfer analysis.
It has been shown that individuals with auditory dys-synchrony have difficulty in processing short duration sounds.11 Formant transitions are short (<50 ms), but they are important acoustic cues to the place of articulation in stop-vowel syllables. Lengthening the transition duration of a speech segment may enhance its perception by an individual with auditory dys-synchrony, but the mechanism is unclear. Similar results have been reported in language learning impaired children: Tallal et al.30,31 reported improved speech perception in such children when speech stimuli were modified to increase the duration of formant transitions and the modulation depth.
• Auditory dys-synchrony is characterised by disrupted activity of the auditory nerve in the presence of normal or near-normal functioning of the cochlear outer hair cells
• Speech perception is adversely affected
• Conventional amplification is of little help
• Modifying the temporal structure of speech stimuli may improve speech perception in individuals with this disorder
We hypothesise that lengthening the transition duration of a speech signal reduces the modulation frequency without altering the modulation depth or the overall spectrogram of the stimulus. This hypothesis was tested by lengthening sinusoidally amplitude modulated white noise by a factor of two; this reduced the modulation frequency but produced no change in modulation depth or spectrogram. Figure 7 shows the spectrogram and waveform of the unaltered and the lengthened white noise. Data from our laboratory, as well as from others, suggest that individuals with auditory dys-synchrony have difficulty in processing high modulation frequencies.11–13 Reducing the modulation frequency (by lengthening the speech signal transition duration) may therefore augment such individuals' speech perception, as their modulation detection is better at low modulation frequencies than at high modulation frequencies.
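This manipulation can be illustrated numerically. The sketch below generates sinusoidally amplitude modulated white noise, lengthens it by a factor of two by naive sample repetition (a crude stand-in, for illustration only, for the lengthening used to produce Figure 7), and estimates the dominant modulation frequency from the rectified signal: the modulation frequency halves while the modulation depth is untouched.

```python
import numpy as np

def sam_noise(duration_s, fm_hz, sr=16000, depth=1.0, seed=0):
    """Sinusoidally amplitude modulated white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * sr)) / sr
    carrier = rng.standard_normal(t.size)
    return (1.0 + depth * np.sin(2.0 * np.pi * fm_hz * t)) * carrier

def envelope_peak_hz(signal, sr=16000):
    """Dominant modulation frequency: the strongest component in the
    spectrum of the rectified signal (a crude envelope estimate)."""
    env = np.abs(signal)
    spectrum = np.abs(np.fft.rfft(env - env.mean()))
    return np.fft.rfftfreq(env.size, 1.0 / sr)[spectrum.argmax()]

# Lengthening by a factor of two (here, crudely, by repeating each
# sample) halves the modulation frequency without touching the depth.
x = sam_noise(1.0, 20.0)          # 1 s of noise with 20 Hz modulation
y = np.repeat(x, 2)               # twice as long at the same sample rate
```

The envelope peak of `x` sits at 20 Hz and that of the lengthened `y` at 10 Hz, matching the halving of modulation frequency described above.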
Conclusion
The present study provides indirect evidence that hearing aids which process speech to enhance certain critical short duration events may benefit persons with auditory dys-synchrony. One such modification is enhancement (lengthening) of the transition duration, which seems to improve perception of both manner and place cues in these individuals. Hence, new types of hearing aid which augment or modify the temporal information of speech may improve speech recognition in patients with auditory dys-synchrony.
Acknowledgement
The authors would like to thank the authorities of the All India Institute of Speech and Hearing, Mysore, for providing the infrastructure for the study. This study forms part of the first author's doctoral thesis.