Published online by Cambridge University Press: 01 July 2005
Pure word deafness (PWD) is a rare neurological syndrome characterized by severe difficulties in understanding and reproducing spoken language, with sparing of written language comprehension and speech production. The pathognomonic disturbance of auditory comprehension appears to be associated with a breakdown in processes involved in mapping auditory input to lexical representations of words, but the functional locus of this disturbance and the localization of the responsible lesion have long been disputed. We report here on a woman with PWD resulting from a circumscribed unilateral infarct involving the left superior temporal lobe who demonstrated significant problems processing transitional spectrotemporal cues in both speech and nonspeech sounds. On speech discrimination tasks, she exhibited poor differentiation of stop consonant-vowel syllables distinguished by voicing onset and brief formant frequency transitions. Isolated formant transitions could be reliably discriminated only at very long durations (>200 ms). By contrast, click fusion threshold, which depends on millisecond-level resolution of brief auditory events, was normal. These results suggest that the problems with speech analysis in this case were not secondary to general constraints on auditory temporal resolution. Rather, they point to a disturbance of left hemisphere auditory mechanisms that preferentially analyze rapid spectrotemporal variations in frequency. The findings have important implications for our conceptualization of PWD and its subtypes. (JINS, 2005, 11, 456–470.)
Pure word deafness (PWD) is a rare neurological syndrome characterized by severe difficulties in understanding and reproducing spoken language, with otherwise intact speech production and written language comprehension (Kussmaul, 1877; Lichtheim, 1885). The deficiencies in decoding language occur at a stage of analysis that markedly impairs the processing of speech sounds, words, phrases, and sentences, but leaves environmental sound recognition and identification relatively preserved. Because deficits are specific to the auditory modality, this pattern of impairment has also been referred to as “word sound deafness” (Kohn & Friedman, 1986; Franklin, 1989) or “verbal auditory agnosia” (Ulrich, 1978; Wang et al., 2000). In the majority of published reports, PWD is a sequela of cerebrovascular accidents, but it has also been described in association with tumor (Goldstein, 1948), seizures (Stefanatos, 1993; Fung et al., 2000), head injury (Seliger et al., 1991), degenerative aphasia (Mesulam, 1982; Croisile et al., 1991; Otsuki et al., 1998), encephalitis (Goldstein et al., 1975), and drug toxicity (Donaldson et al., 1981).
The syndrome is of significant theoretical importance because it provides clinical support for the modularity of speech perception as separable from nonverbal auditory recognition systems and more central language computational networks (Allport & Funnell, 1981; Polster & Rose, 1998; Poeppel, 2001; Pinard et al., 2002). However, longstanding disagreement exists regarding the nature of the pathognomonic auditory comprehension deficits and their neurological basis (Goldstein, 1974; Buchman et al., 1986; Praamstra et al., 1991). Phenomenological descriptions, in some cases, have implicated a substantial dissolution of auditory perceptual processing, sufficient in severity to impede recognition of the acoustic characteristics of the human voice (Miceli, 1982). Speech is described in such terms as “a noise” (Coslett et al., 1984; Buchman et al., 1986), “a hurr or buzzing” (Mendez & Geehan, 1988), like “wind in the trees” (Ziegler, 1952), or “the rustling of leaves” (Luria, 1966). However, other subjective descriptions allude to subtler disturbances, suggesting a continuum of severity. Spoken communications are recognized as speech, but a breakdown appears to occur early in the process of mapping the acoustic input to lexical representations of words. Discourse sounds like “jabbering” or “a foreign language” (Denes & Semenza, 1975; Auerbach et al., 1982; Buchman et al., 1986; Mendez & Geehan, 1988), or simply does not “register” (Saffran et al., 1976). There are also frequent suggestions that perceptual or cognitive resources are incapable of keeping up with the rate at which speech is produced: “words just run together” (Klein & Harper, 1956) or “come too quickly” (Albert & Bear, 1974).
Experimental investigations of word deafness have yielded substantively different conceptions of the functional locus of the underlying processing disturbances. Several reports have noted fundamental problems in basic aspects of auditory temporal processing, including deficient intensity-duration functions (Kanshepolsky et al., 1973) and poor resolution of temporally distinct auditory events (Albert & Bear, 1974; Auerbach et al., 1982; Tanaka et al., 1987; Buchtel & Stewart, 1989; Best & Howard, 1994; Godefroy et al., 1995). Based on such findings, it has been argued that PWD results from general limitations in fine-grained auditory temporal analysis (Albert & Bear, 1974; Auerbach et al., 1982; Phillips & Farmer, 1990) that are particularly detrimental to language comprehension because they impede the ability to perceive brief spectrotemporal cues in speech that are important to the derivation of linguistic meaning. In contrast, others have noted systematic patterns of error on speech discrimination and identification tasks that implicate problems at phonetic levels of analysis, a higher-order stage of auditory processing specific to speech (Saffran et al., 1976; Caramazza et al., 1983; Metz-Lutz & Dahl, 1984; Praamstra et al., 1991).
There is also disagreement regarding the neuroanatomical substrate of the syndrome. The majority of cases reported in the literature demonstrate bilateral temporal lobe lesions, particularly involving the middle and posterior portions of the superior temporal gyrus or underlying geniculotemporal pathways (Bauer & Zawacki, 2000; Poeppel, 2001). However, PWD was first described in patients considered to have unilateral lesions of the left posterior temporal lobe (Kussmaul, 1877; Lichtheim, 1885), and a number of subsequent case reports have affirmed that subcortical-cortical lesions in this localization can result in the symptom complex (Saffran et al., 1976; Kamei et al., 1981; Metz-Lutz & Dahl, 1984; Shindo et al., 1991; Takahashi et al., 1992; Wang et al., 2000).
To reconcile these contrasting views, it has been suggested that there may be at least two types of PWD (Auerbach et al., 1982; McCarthy et al., 1990; Phillips & Farmer, 1990). One proposed form is a sequela of bitemporal lesions and is associated with “prephonemic” disturbances of auditory temporal resolution. By contrast, a second form stems from lesions of the left superior temporal lobe and underlying white matter and is linked to impairment at phonetic levels of processing.
In this article, we describe the results of a detailed analysis of auditory processing in a prototypical case of PWD resulting from a well-defined, discrete, unilateral left temporal lesion. Our findings revealed substantial impairment in the ability to process rapid spectrotemporal variations in speech and nonspeech sounds. However, other aspects of auditory processing were relatively spared, even though they required a higher degree of temporal resolution. These observations have important implications for our conceptualization of PWD and its subtypes.
NH (PR00001-NH)
Human subjects policy at the sponsoring institution now precludes identifying participants by their actual initials. The identifier used here is a code. Future reports issued from our institution that involve this subject will identify her by this same code (PR00001-NH).
As awareness returned postoperatively, NH noted that she could not understand the dialogue in programs aired on her hospital room television, although she could pick out words periodically. She readily oriented to and could hear spoken communications by hospital staff but had profound difficulty comprehending the meaning of simple verbal questions or statements. She was transferred to an inpatient rehabilitation hospital at 3 weeks post-onset.
Assessment of speech and language during the fourth week revealed average oromotor function. There was no facial asymmetry, and labial, lingual, and velar movements were within normal limits. Expressive language was fluent with good intelligibility, articulatory precision, rate, and prosody. She produced infrequent phonemic paraphasias. Object naming was adequate but repetition was impaired at the word level. Understanding of basic one-step commands generally required frequent repetition along with a decreased rate of speech. While she had briefly demonstrated reading problems postoperatively, these had resolved at the time of this evaluation. Narrative writing skills were also adequate.
Following a week of inpatient rehabilitation, she was discharged to a Day Program, which she attended for 4 weeks. She continued to receive outpatient speech-language therapy twice weekly. While she learned compensatory strategies for her receptive problems, the auditory processing deficits remained unchanged. The investigations described below were completed during this time.
NH's history prior to the events surrounding her aneurysm bleed was noncontributory. She had no history of seizures, significant head injury, or neurologic disease. She had attended public schools and generally received B's and C's in her mainstream classes. On completing the 11th grade, she obtained her General Education Diploma and worked as a salesperson until she started having children.
On physical examination a few months post-onset, she was found to have nonnodular thyroid swelling but thyroid function tests were normal. Her balance was slightly impaired. She could stand on her right foot for 12 s and on her left for 15 s. She did not demonstrate evidence of lateralized motor deficits, although she reported that her right hand had diminished sensation and fatigued easily. She demonstrated a full range of motion in her upper and lower extremities and, consistent with her subjective report of somatosensory changes, she had mildly decreased sensation to sharps on the right side of her body.
An audiological screening was completed using a standard calibrated clinical audiometer, with intensity varied in 5 dB steps. Pure-tone thresholds were within normal limits for the left ear: her threshold was 10 dB at 250 Hz, 500 Hz, 750 Hz, 1 kHz, 2 kHz, 3 kHz, and 4 kHz, while at 8 kHz her threshold was 15 dB. In the right ear, sensitivity remained between 10 and 20 dB at frequencies from 250 Hz to 3 kHz. However, slightly raised thresholds (30 dB) were evident at 4 and 8 kHz.
Proton density, T1-, and T2-weighted magnetic resonance images were obtained using a 1.5 Tesla scanner (see Fig. 1). These scans revealed encephalomalacia and gliosis involving the dorsal surface of the left temporal lobe extending into subadjacent white matter. This included the planum polare, the transverse temporal (Heschl's) gyrus, and the planum temporale. In the lateromedial plane, the abnormalities extended from the dorsal convexity of the superior temporal gyrus to the insula. At points, there was also minimal involvement of adjoining intrasylvian cortex in the frontoparietal operculum. Homologous areas of the right temporal and perisylvian region appeared intact. A small amount of artifact was present from the implanted coil.
Magnetic resonance images showing encephalomalacia and gliosis in the left temporal region involving the planum polare, the transverse temporal (Heschl's) gyrus, and the planum temporale and subadjacent white matter. Abnormalities extended in the lateromedial axis from the dorsal surface of the superior temporal gyrus to the insula. This study was obtained 4 months post-onset.
Nonverbal cognitive ability, assessed with the General Ability Measure for Adults (GAMA; Naglieri & Bardos, 1997), revealed a nonverbal IQ of 87, corresponding to the upper end of the low average range. Academic abilities were commensurate with expectations given her general level of cognitive function and educational background. On the Wide Range Achievement Test–Third Edition (Jastak & Wilkinson, 1992), her reading/word recognition skills fell in the low average range in comparison with other individuals her age. On the Gates-MacGinitie Reading Tests (MacGinitie & MacGinitie, 1978), her score on the reading comprehension subtest corresponded to a grade level equivalent of 10.9.
Assessment of speech and language utilizing the Western Aphasia Battery (WAB) (Kertesz, 1982) revealed marked problems with auditory language comprehension and repetition. She had substantial difficulty understanding sequential commands (e.g., “point to the window and then the door”) (20/80) and in correctly answering biographical/nonbiographical questions requiring a basic “yes/no” response (27/60). She made only a few mistakes in her auditory recognition of color names, real objects, body parts, numbers, and letters (56/60), although this appeared to be facilitated by the fact that items are identified from a small closed set of alternatives (6) and one repetition is allowed. Consistent with other reports of PWD (Shindo et al., 1991; Jacobs & Schneider, 2003), she also appeared to utilize lip reading to facilitate her auditory comprehension. Her ability to correctly repeat single words, phrases and sentences was significantly impaired (39/100).
By contrast, expressive language was relatively spared. Speech was fluent and grammatically correct with normal phrase length. Rare phonemic paraphasias were noted and appeared related to periodic lapses in self-monitoring. She obtained a perfect score (60/60) on the naming subtest of the WAB, and word fluency was adequate (14/20). Performance on the sentence completion (2/10) and responsive speech (2/10) subtests was poor secondary to her comprehension deficits. Reading comprehension (34/40) on the WAB was within normal limits. Spontaneous writing was adequate, but she had significant difficulties writing to dictation. Overall, she obtained an Aphasia Quotient of 69.7.
Several ancillary measures of language were also administered. On the Token Test (De Renzi, 1978), a measure of auditory comprehension using nonredundant commands, she scored at the 1st percentile in comparison with individuals her age. Her performance on measures of speech discrimination was also impaired. On Benton's Phoneme Discrimination Test (Benton et al., 1983), she was required to make perceptual judgments (same-different) on pairs of nonsense syllables/words. Half of the pairs differed in one major phonemic feature. She correctly discriminated only 18 of 30 items. A score less than 22 is below the lowest performance of the normative controls and is considered “defective.” Similarly, she performed poorly on a word discrimination task from the Psycholinguistic Assessments of Language in Aphasia (Kay et al., 1992) requiring her to match single spoken words to a corresponding picture in a three-alternative forced-choice format. Whereas the mean score for control subjects is 39/40 (SD = 1.7), NH obtained a score of 33/40.
NH was administered a dichotic listening test using words (Damasio & Damasio, 1980) presented at approximately 60 dB above threshold. During practice trials with monaural presentation, she demonstrated significant difficulty, correctly repeating only 30% of items. There did not appear to be a remarkable difference in performance comparing left and right ears, although she subjectively noted that speech presented to the right ear was more difficult to identify. On dichotic presentation, she demonstrated comparable levels of performance. However, right ear extinction was apparent in a complete inability to accurately recall words presented to the right ear.
To further examine the extinction effect, NH was administered the consonant-vowel dichotic task described by Hugdahl et al. (1991). Stimuli consisted of a set of 36 dichotic consonant-vowel pairs produced by combining the stop consonants—/ba/, /da/, /ga/, /ka/, /pa/, /ta/—in all possible combinations (including six identical pairs, e.g., /ba/-/ba/). Three attentional conditions were administered: (1) a nonforced (NF) attention condition, in which she was asked to indicate what she heard on each trial; (2) a forced left (FL) condition, in which she was asked to recall only stimuli presented to the left ear; and (3) a forced right (FR) condition, in which she was instructed to focus on and identify only right ear stimuli. Following Hugdahl and Asbjornsen (Manual for Dichotic Listening with CV-Syllables), only the first response was scored from each trial. In the NF condition, NH demonstrated better recall from the left ear (26.67%) compared to the right ear (16.67%). This ear asymmetry was accentuated in the FL condition (L ear = 40%, R ear = 3.32%), but there was little change in her overall level of performance. In the FR condition, left ear recall was 36.67% while right ear recall was 16.67%. When monaural consonant-vowels were presented to the right ear, her accuracy (23.3%) was not substantially different from her best right ear performance on the dichotic pairs. Similarly, when monaural stimuli were presented to only the left ear, her performance (40%) remained close to her best left-ear performance with the dichotic pairs. When compared to age norms, she demonstrated depressed recall of information presented to her right ear in all dichotic conditions.
NH was administered a 40-item environmental sounds recognition test (Stefanatos & Madigan, 2000). Each item was a 2 s segment of an environmental sound corresponding to one of four sound categories: (1) human nonverbal (e.g., laughing, coughing); (2) man-made, inanimate (e.g., car crash, toilet flushing); (3) nonhuman, animate (e.g., dog barking, cow mooing); and (4) natural, inanimate (e.g., wind, fire crackling). After hearing each sound, she was asked to point to the corresponding picture in a four-alternative forced-choice paradigm. Each response card included a pictorial representation corresponding to the target, an acoustic foil, a semantic foil, and an object that was neither acoustically nor semantically related to the target. Acoustic distractors produced sounds similar to the target but were from a different semantic category. Semantic distractors were from the same semantic category as the target but were acoustically disparate. A third distractor type was neither semantically nor acoustically confusable with the target. She produced only four errors on this entire task, which is within normal limits based on comparison with a small normative sample. She produced two acoustic errors (volcano erupting → pistol firing, heavy footsteps on wooden stairway → hammering a nail) and two semantic errors (avalanche → tree falling, baby cooing → baby sleeping, mouth closed). These findings suggested that her auditory processing problems did not extend to environmental sounds. Moreover, she claimed no change in her perception and appreciation of music, except that she could no longer understand the lyrics.
Three parallel tasks were devised to examine NH's perception of speech (vowels and consonant-vowels) and nonspeech sounds (complex tones). All stimuli were digitally synthesized (16-bit resolution at a sampling rate of 44.1 kHz) using PRAAT 4.1 software to permit precise control over acoustic parameters such as onset/offset, envelope shape, and frequency characteristics. The digital waveforms were converted to analog signals by a high-performance external USB soundcard/amplifier (Edirol UA-5) and passed through custom-built attenuators prior to transduction by Sennheiser HD-580 headphones. Unless otherwise indicated, tasks required a perceptual discrimination between pairs of stimuli presented with an 800 ms interstimulus interval (ISI). She was instructed to listen to each pair and press one button on a response pad to indicate that the two sounds presented were the “same” or an adjacent button to indicate they were “different.” Stimulus presentation and response collection were controlled using E-Prime software (Psychology Software Tools, 2001). Each task included six tokens that were contrasted in all possible combinations.
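The trial structure described here (six tokens contrasted in all possible pairings, with a same/different response) can be sketched in a few lines of Python. The study used E-Prime for presentation; this is only an illustrative reconstruction, and the token labels are placeholders, not the study's stimulus files.

```python
import itertools
import random

# Placeholder labels standing in for the six synthesized tokens of one task.
TOKENS = ["i", "I", "E", "ae", "a", "u"]

def build_trials(tokens, reps=1, seed=0):
    """Pair every token with every token (including itself), as in a
    same/different discrimination task, then shuffle the trial order."""
    pairs = list(itertools.product(tokens, repeat=2)) * reps
    random.Random(seed).shuffle(pairs)
    # Attach the correct response to each trial.
    return [(a, b, "same" if a == b else "different") for a, b in pairs]

trials = build_trials(TOKENS)  # 36 trials: 6 "same", 30 "different"
```

With six tokens this yields 36 pairings per pass, of which six are identical pairs, matching the all-combinations design described above.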
On a vowel discrimination task, NH was asked to discriminate between pairs of the following vowels: /i/ as in beet, /ɪ/ as in bit, /ɛ/ as in bet, /æ/ as in bat, /ɑ/ as in cot, and /u/ as in boot. These were generated utilizing the parameters for vowel formant frequencies outlined by Peterson and Barney (1952). The second task, a nonspeech analogue of the vowel discrimination task, required NH to discriminate pairs of complex tones. Each token comprised four pure-tone frequency components corresponding to the center formant frequencies of the equivalent vowel described above. A third task assessed consonant-vowel (CV) discrimination. The CV tokens were synthesized by pairing six stop consonants (/b/, /d/, /g/, /p/, /t/, and /k/) with one of the synthesized vowels, /ɑ/. The consonantal burst appropriate to each CV was sampled from natural productions by a male speaker and inserted at the onset of each CV. All stimuli were 250 ms in duration. Spectrograms of a representative complex tone, vowel, and CV are provided in Figures 2A, 2B, and 2C, respectively. As can be appreciated in these illustrations, both complex tone and vowel stimuli were characterized by steady-state spectra, whereas CV syllables contained rapidly changing spectrotemporal cues at onset. Normal controls perform at or near ceiling on all three of these tasks (92–100%).
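A steady-state complex tone of the kind described, a sum of pure-tone components at vowel formant center frequencies, can be synthesized in a minimal NumPy sketch. The component frequencies below are illustrative values in the general range of Peterson and Barney (1952) for /ɑ/, not the study's exact parameters, and the 10 ms raised-cosine onset/offset ramp is an assumption.

```python
import numpy as np

FS = 44100  # sampling rate in Hz, as reported in the study

def complex_tone(component_freqs, dur=0.250, ramp=0.010, fs=FS):
    """Sum of equal-amplitude sinusoids at the given frequencies,
    normalized and shaped with raised-cosine onset/offset ramps."""
    t = np.arange(int(dur * fs)) / fs
    wave = sum(np.sin(2 * np.pi * f * t) for f in component_freqs)
    wave = wave / np.max(np.abs(wave))
    n_ramp = int(ramp * fs)
    env = np.ones_like(wave)
    ramp_shape = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    env[:n_ramp] = ramp_shape          # fade in
    env[-n_ramp:] = ramp_shape[::-1]   # fade out
    return wave * env

# Illustrative /ɑ/-like component frequencies (Hz); hypothetical values.
tone = complex_tone([730, 1090, 2440, 3400])
```

The 250 ms duration matches the stimuli described above; writing the array to a 16-bit file would complete the chain described in the Methods.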
These spectrograms plot the frequency spectra of the four types of speech and nonspeech stimuli. A represents a four-component complex tone, B depicts the corresponding vowel, C shows a CV syllable, and D provides examples of the single formant stimuli. The extent of these formant transitions was modeled after the second formant in the CV stimuli.
Each contrast was presented 12 times in random order in the course of two sessions. NH demonstrated significant difficulties in discriminating the consonant-vowel syllables. She correctly identified 69% of the items, which was not significantly better than chance (Binomial Test). There were no systematic error patterns related to distinctive feature contrasts. She demonstrated as much difficulty with contrasts of voicing as she did with place of articulation. However, she performed better when speech sounds differed by more than one distinctive feature (both voicing and place of articulation).
By contrast, she was able to correctly discriminate complex tone contrasts on 90% of trials, which is not substantially different from performance seen in age- and sex-matched control subjects. An intermediate level of performance was evident on the vowel discrimination task, on which she correctly identified 82% of the contrasts, making errors on closely related vowel contrasts. This is slightly lower than performance we have seen in controls.
Because formant frequency transitions are fundamental cues to distinguishing consonants, we devised a task to assess her ability to discriminate transitions in a single formant. By parametrically varying the time course of transitions, we aimed to determine which durations posed difficulty for her. The extent of the frequency transition was modeled after second formant transitions in the CV stimuli. The up-going formant ramp started at 900 Hz and transitioned to a steady-state frequency of 1240 Hz. The comparison down-going formant ramp started at 1580 Hz and transitioned to the same steady-state frequency (1240 Hz). These stimuli were paired in all possible combinations: up-up, down-down, up-down, down-up. The duration of the formant transition was the same for each pair. Five different durations were examined (40, 80, 120, 200, and 500 ms), with 20 trials per stimulus pair. Examples of the single formant stimuli are depicted in Figure 2D.
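A single-formant ramp of this kind can be approximated as a frequency-modulated sinusoid whose instantaneous frequency glides linearly to the steady-state value and then holds. The transition endpoints below are the values reported above; the 500 ms total duration and the absence of an amplitude envelope are simplifying assumptions, since the article does not specify them.

```python
import numpy as np

FS = 44100  # sampling rate (Hz)

def formant_ramp(f_start, f_steady, trans_dur, total_dur=0.500, fs=FS):
    """Sinusoid whose instantaneous frequency glides linearly from
    f_start to f_steady over trans_dur seconds, then stays steady."""
    n = int(total_dur * fs)
    t = np.arange(n) / fs
    freq = np.where(t < trans_dur,
                    f_start + (f_steady - f_start) * t / trans_dur,
                    f_steady)
    # Phase is the running integral of instantaneous frequency.
    phase = 2 * np.pi * np.cumsum(freq) / fs
    return np.sin(phase)

# Study values: up-going 900 -> 1240 Hz, down-going 1580 -> 1240 Hz,
# here with the shortest (40 ms) transition.
up = formant_ramp(900, 1240, trans_dur=0.040)
down = formant_ramp(1580, 1240, trans_dur=0.040)
```

Generating the same pair at each of the five transition durations reproduces the parametric manipulation described above.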
NH's performance on the single formant ramp perceptual discrimination task is depicted graphically in Figure 3. She responded with 45 to 70% accuracy on ramp durations from 40 to 200 ms. Binomial tests suggested that her discrimination accuracy was no better than chance for all but the 500 ms ramp duration. Normal listeners score between 93 and 100% on these discriminations, so her performance remained somewhat poor even at the longest ramp duration.
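An exact binomial comparison against chance of the kind reported here is straightforward to compute. The article does not state whether one- or two-sided tests were used, so the one-sided formulation below is only one plausible reading.

```python
from math import comb

def binom_p(k, n, p=0.5):
    """One-sided exact binomial p-value: probability of observing k or
    more correct responses out of n trials under chance performance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With 20 trials per stimulus pair and 50% chance on same/different
# judgments, roughly 15/20 correct (75%) is needed to reject chance
# at alpha = .05, which is why 45-70% accuracy does not differ from chance.
```

Pooling trials across the four pair types at each duration would raise n and sharpen the test; the qualitative conclusion is the same either way.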
Results from the single formant perceptual discrimination task. The abscissa is the duration of the formant transition and the ordinate is the percent correct.
Auditory temporal resolution was assessed with a click fusion task in which very brief, spectrally broadband stimuli (clicks) are presented in rapid succession separated by an intervening period of silence. When this ISI is extremely short, the temporal boundary between each click is insufficient to result in separate percepts, so listeners “fuse” two clicks and report hearing one. Extending the duration of the ISI beyond a threshold value results in the perception of two distinct clicks.
In the current implementation of this task, the duration of a silent interval inserted between two 0.5 ms binaurally presented square wave clicks was varied from 0 to 10 ms in 1 ms steps. Ten stimuli were presented at each ISI. NH was asked to press one button on a response box if one click was heard, and another button if two clicks were perceived. Figure 4 plots her percent correct identifications as a function of ISI.
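The click-pair construction can be sketched as follows. The clicks are unit-amplitude rectangular pulses, as in the task description; the 50 ms of trailing silence is an arbitrary assumption added so each stimulus has a definite length.

```python
import numpy as np

FS = 44100  # sampling rate (Hz)

def click_pair(isi_ms, click_dur_ms=0.5, fs=FS):
    """Two rectangular clicks of click_dur_ms separated by a silent
    inter-stimulus interval of isi_ms, with a short silent tail."""
    n_click = max(1, int(round(click_dur_ms * fs / 1000)))
    n_isi = int(round(isi_ms * fs / 1000))
    click = np.ones(n_click)
    gap = np.zeros(n_isi)
    tail = np.zeros(int(0.05 * fs))  # 50 ms trailing silence (assumed)
    return np.concatenate([click, gap, click, tail])

# ISIs from 0 to 10 ms in 1 ms steps, as in the task.
stimuli = {isi: click_pair(isi) for isi in range(0, 11)}
```

At 0 ms ISI the two pulses abut and form a single longer click, which is consistent with the fused single-click percept reported below.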
Results from the click fusion task. The abscissa denotes the duration of the silent ISI between 0.5 ms square wave clicks. The ordinate is the percent correct.
NH was able to discriminate one from two “clicks” with 100% accuracy when separated by an ISI of 3 ms or more. She correctly identified clicks with no intervening silence (0 ms) as a single click on 80% of trials. Overall, her performance compares favorably with results from normal listeners (Patterson & Green, 1970). The findings suggest that NH was able to adequately process brief temporal cues in sound defined by onset/offset characteristics.
Following Wernicke's (1874) seminal description of “sensory aphasia,” Kussmaul (1877) noted similar deficits in auditory comprehension in a patient who did not demonstrate the coexisting problems with reading or the copious “paraphasias” (a term he coined) that Wernicke had noted. Kussmaul postulated that this symptom complex constituted a distinguishable syndrome, “word deafness,” that resulted from destruction of the first left temporal gyrus. He contrasted this with an analogous disturbance in processing language in the visual modality, “word blindness,” which he believed was secondary to left angular gyrus and supramarginal gyrus lesions. When the two syndromes occurred together, he suggested, they represented Wernicke's sensory aphasia.
Lichtheim (1885) also regarded “isolated word deafness” as a distinct clinical entity, characterizing it in terms of selective deficits in auditory language comprehension, repetition, and writing to dictation. He supported this with a description of a case who demonstrated early symptoms of Wernicke's aphasia that improved rapidly, leaving a more selective pattern of deficits that spared volitional speech, reading, copying written material, and spontaneous writing (p. 460). While Kussmaul had implicated destruction of auditory cortex, Lichtheim speculated that word deafness resulted from a deep left temporal lesion that essentially isolated Wernicke's area from the “auditory reception center” (primary auditory cortex). His conceptualization implied that Wernicke's area and auditory cortex might independently be capable of normal or near normal functioning. Subsequently, several case studies confirmed an association of PWD with deep unilateral superior temporal lesions (Liepmann & Storch, 1902; Schuster & Taterka, 1926).
A number of reports also emerged suggesting that PWD could result from bilateral temporal lesions (Pick, 1892; Déjerine & Serieux, 1898; Ballet, 1903; Barrett, 1910; Henschen, 1919, 1920). In contrast to the unilateral cases, these patients frequently demonstrated persisting aphasic disturbances or collateral symptomatology suggestive of a resolving general auditory agnosia or cortical deafness (Buchman et al., 1986). The rarity of cases fully meeting criteria for PWD prompted some to question the existence of the syndrome. Pierre Marie (1906) remarked in no uncertain terms that PWD was “a simple myth” stating “I must declare, first of all, that to my knowledge pure word deafness does not exist from either the clinical or the anatomical pathological point of view … it is impossible to find an authentic case of this pretended clinical form” (translated by Cole & Cole, 1971, pp. 77–78). Head (1926) concurred, noting that in every instance where case reports provided sufficient detail, nonverbal perceptual impairment existed or there was clear evidence of an aphasic disturbance apparent in problems executing both written and oral commands. Despite persisting concerns over definitional issues, many cases that did not entirely comply with the original criteria were subsequently reported as PWD, in association with unilateral left temporal or bitemporal lesions.
The broadening concept of PWD arguably contributed to confusion regarding its validity, its basis, and its boundaries that continues to the present day. Both the selectivity for verbal material and the preservation of other language functions remain focal points in contemporary controversies (Goldstein, 1974; Buchman et al., 1986). A current review of the literature reveals a predominance of cases (∼72%) secondary to bitemporal infarcts, typically involving fairly symmetric cortico-subcortical lesions compromising middle and posterior regions of the superior temporal gyrus (Poeppel, 2001), some with sparing of Heschl's gyrus (Bauer & Zawacki, 2000). Word deafness resulting from unilateral lesions is more rarely reported, and in only a few cases has the location of the lesion been adequately delineated (Hamanaka et al., 1980; Kamei et al., 1981; Metz-Lutz & Dahl, 1984; Takahashi et al., 1992; Wang et al., 2000). In all but one unilateral case, PWD has been reported in association with left hemisphere pathology.
As a consequence of both the higher proportion of bitemporal lesions and the general rarity of cases that adhere to strict diagnostic criteria, prevalent views regarding the functional basis of PWD are disproportionately influenced by studies in which word deafness was identified as a symptom mixed with other aphasic components (Miceli, 1982; Buchman et al., 1986; Buchtel & Stewart, 1989; Praamstra et al., 1991) or with features of a broader auditory recognition disorder (Auerbach et al., 1982; Vignolo, 1982). As a case in point, the only report of PWD resulting from a unilateral right temporal lesion (Roberts et al., 1987) describes a patient who lost his ability to recognize both spoken words and musical tunes and arguably should not be regarded as a “pure” case of word deafness. In reviewing the literature, Buchman et al. (1986) suggested that the modifier “pure” be dropped but supported the concept that “word deafness” was a distinct clinical entity.
NH represents one of the rare instances where the clinical picture conforms well to the original descriptions of PWD. Consistent with Lichtheim's description, she exhibited early signs of a broader aphasic disturbance resembling a Wernicke's aphasia that rapidly resolved in the course of a couple of weeks, leaving her with a remarkably circumscribed disturbance of auditory language comprehension. She demonstrated moderate to severe difficulties in understanding and repeating spoken language with otherwise intact speech production and nonauditory language comprehension. Both reading and spontaneous writing were preserved. In addition, there was no evidence of a broader auditory agnosia. She demonstrated intact environmental sound recognition and reported no change or difficulties in her appreciation of music, although she complained that she could no longer understand the lyrics. Her peripheral hearing sensitivity in the frequency range most critical to speech was broadly within normal limits, although her audiogram showed mildly raised thresholds at 4 and 8 kHz in the ear contralateral to her lesion.
Similar audiological findings have been described in other unilateral cases of PWD (e.g., Saffran et al., 1976) although interaural disparities in sensitivity have also been evident at somewhat lower (250–1000 Hz) frequencies (Takahashi et al., 1992; Wang et al., 2000). By comparison, patients with word deafness secondary to bitemporal lesions commonly demonstrate bilateral increases in pure-tone thresholds (Lhermitte et al., 1971; Kanshepolsky et al., 1973; Phillips & Farmer, 1990) that are also frequently associated with a gradient whereby higher frequencies (≥4 kHz) are more affected (Jerger et al., 1972; Auerbach et al., 1982; Motomura et al., 1986; Tanaka et al., 1987; Yaqub et al., 1988; Praamstra et al., 1991; Griffiths, Rees & Green, 1999).
Analysis of NH's ability to perceive speech revealed a substantial impairment in her ability to distinguish between stop consonants while her perception of vowels was significantly better. This pattern of phonemic imperception has been previously described in PWD (Saffran et al., 1976; Auerbach et al., 1982; Miceli, 1982; Yaqub et al., 1988) but is not specific to it. Inordinate difficulty with consonant processing can emerge in aphasia (Miceli et al., 1978; Baker et al., 1981; Gow & Caplan, 1996; Caramazza et al., 2000) and problems with specific stop consonants (for example, alveolars t, d) can be secondary to moderate hearing loss (Walden & Montgomery, 1975). However, NH's difficulties with phonemic perception are both more severe and pervasive than those typically associated with aphasia or mild to moderate hearing loss. In the context of her overall presentation, they suggest that a fundamental breakdown exists in processes that mediate the mapping of acoustic features of consonants onto discrete phonological representations or in the representations themselves. Viewed from the perspective of current models of speech perception (McClelland & Elman, 1986; Franklin, 1989; Marslen-Wilson & Warren, 1994; Frauenfelder & Floccia, 1998; Norris et al., 2000), this would impede the matching of acoustic input with lexical entries that in turn could result in failures to activate the retrieval of meaning attributes of words in the semantic system.
The prelexical impairment in auditory analysis associated with “word sound deafness” can be contrasted with “word meaning deafness” (Bramwell, 1927; Kohn & Friedman, 1986) in which patients have severe difficulties understanding auditory language but are able to repeat speech and write to dictation. Preservation of the ability to repeat speech in these cases suggests that their analysis of phonological information is relatively intact (but see Tyler & Moss, 1997). Rather, the disruption of language comprehension in word meaning deafness appears to occur at the level of lexical access, possibly secondary to a failure to match the intact phonological code with corresponding lexical representations or to a “post access” failure of lexical items to activate corresponding representations in the semantic system (Ellis, 1984; Kohn & Friedman, 1986; Franklin et al., 1996).
NH also exhibited significant perturbations in dichotic listening performance that are consistent with previous descriptions of unilateral cases of PWD (Albert & Bear, 1974; Saffran et al., 1976). Specifically, she demonstrated extinction of words presented to the right ear during dichotic stimulation and relative suppression of right ear recall when listening to dichotic CV syllables. A strong left ear advantage on dichotic listening tasks has also been observed in patients with aphasia and interpreted as evidence of functional neuroplasticity and a greater role of right hemisphere mechanisms in mediating speech processing (Moore & Papanicolaou, 1988, 1992). It is somewhat unusual to obtain complete or near complete extinction of the right ear in aphasia (Niccum et al., 1986), although it can occur with lesions to Heschl's gyrus or geniculotemporal pathways with or without aphasia.
Several factors may account for this pattern in NH. First, the extent and distribution of her lesion likely prevents right ear input from reaching Wernicke's area via crossed geniculocortical auditory pathways. Right ear input may conceivably reach Wernicke's area via transcallosal pathways, although it must compete against left ear input arriving at right auditory cortex via stronger contralateral pathways. Given that contralateral pathways are functionally predominant over ipsilateral channels, this competition is biased against the right ear input. Degradation of the signal may result from the circuitous path to the language dominant hemisphere, and this may be further affected by alterations in hearing competence seen contralateral to hemispheric lesions (Linebaugh, 1978; Niccum & Speaks, 1991). Finally, it is also possible that transcallosal pathways have been compromised by the subcortical extension of her lesion (Gazzaniga et al., 1973). Selective attention appeared to have little effect on this asymmetry.
It is noteworthy that NH showed generally lower overall performance on dichotic word tasks than has been noted in other unilateral cases of PWD. We used an open set of words while both Saffran et al. (1976) and Albert and Bear (1974) used small (≤12) closed sets of dichotic stimuli (monosyllabic names and digits). Because patients with PWD appear to make use of contextual cues (Saffran et al., 1976; Best & Howard, 1994), repeated presentation of distinct stimuli from a closed set may have contributed to the relatively better overall accuracy in those studies.
The fundamental nature of the processing disturbance that characterizes PWD has not been firmly established, nor is it clear that the same mechanism is operative in all cases. Recent attempts to identify the functional locus of the verbal auditory recognition disorder in these patients have focused on detailed analyses of their difficulties with speech perception (Denes & Semenza, 1975; Saffran et al., 1976; Auerbach et al., 1982; Yaqub et al., 1988; Praamstra et al., 1991). A key issue concerns the basis for the dissociation between consonant and vowel perception and whether a lawful breakdown occurs in their processing of particular phonetic features during consonant perception.
Vowels are characterized acoustically by a formant frequency structure that maintains a relatively steady state for 100–150 ms. Although not invariably the case (Tanaka et al., 1987; Praamstra et al., 1991; Tanji et al., 2003), the reasonably static cues for vowel perception, such as the frequency of the first formant (F1) and the relative spacing between F1 and F2 (Delattre et al., 1952; Ladefoged, 2001), are generally processed without substantial difficulty in PWD, even in patients with bitemporal lesions (Auerbach et al., 1982; Miceli, 1982; Yaqub et al., 1988).
By contrast, the processing of consonants is impaired in both unilateral and bitemporal cases of PWD (Denes & Semenza, 1975; Auerbach et al., 1982; Yaqub et al., 1988; Kazui et al., 1990; Godefroy et al., 1995; Jacobs & Schneider, 2003). This has been observed when tested by means of natural CV syllables (Denes & Semenza, 1975; Saffran et al., 1976) as well as computer-synthesized speech sounds (Saffran et al., 1976; Miceli, 1982). The perception of consonants necessitates successful online perceptual elaboration of short-term acoustic features in the speech signal. Cues important for the perception of stop consonants of English (p, t, k, b, d, g) include the spectrum of brief (5–15 ms) consonantal noise bursts (Fant, 1973), and the rate, duration, and direction of rapid (20–50 ms) formant frequency transitions (Delattre et al., 1952; Liberman et al., 1967; Keating & Blumstein, 1978; Diehl, 1981). The frequency of the burst and the trajectory (rising vs. falling) of formant transitions give rise to the perception of different places of articulation (labial, alveolar, or velar). There are also multiple cues to distinguish voiced (b, d, g) from voiceless (p, t, k) consonants. A primary cue related to voiced-voiceless distinctions is the “voice-onset time,” which corresponds to the time lag between the consonantal burst and the onset of voicing (the first glottal pulse).
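To make the timescales concrete, a formant transition can be sketched as a brief sinusoidal frequency glide. The sketch below is illustrative only; the sampling rate and frequency values are assumptions for demonstration, not the calibrated stimuli used in this study.

```python
import math

def synth_transition(f_start, f_end, dur_ms, sr=16000):
    """Synthesize a single-formant frequency glide as a sine sweep.

    A linear glide from f_start to f_end (Hz) over dur_ms milliseconds,
    sampled at sr Hz. Phase is accumulated sample by sample so the
    instantaneous frequency changes smoothly across the sweep.
    """
    n = int(sr * dur_ms / 1000)
    phase = 0.0
    out = []
    for i in range(n):
        f = f_start + (f_end - f_start) * i / n  # instantaneous frequency
        phase += 2 * math.pi * f / sr
        out.append(math.sin(phase))
    return out

# A 40 ms rising glide in the second-formant region (values hypothetical)
rising = synth_transition(1100, 1800, 40)
# The mirror-image falling glide would cue a different place of articulation
falling = synth_transition(1800, 1100, 40)
```

At a 16 kHz sampling rate the entire 40 ms cue spans only 640 samples, which underscores how little acoustic evidence the listener has for distinguishing one stop consonant from another.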
Some studies have suggested that patients with PWD demonstrate a systematic breakdown involving the coding of particular phonetic features. This is of considerable interest because it may specify the level of phonemic processing where problems arise (Saffran et al., 1976; Caramazza et al., 1983; Metz-Lutz & Dahl, 1984). Saffran et al. (1976) noted greater difficulty in identifying consonants differentiated by voiced-voiceless distinctions than place of articulation in a patient with word deafness secondary to left hemisphere pathology. More frequently, however, the converse pattern has been observed (Miceli, 1982; Yaqub et al., 1988), or patients demonstrate roughly equal problems with consonant perception whether the critical feature contrast is dependent on place or voicing cues (Tanaka et al., 1987; Praamstra et al., 1991). Given the relatively small number of cases, variations in underlying pathology, and differences in methodology, there is insufficient information at present to discern whether systematic dissimilarities in consonant perception may exist between patients with unilateral and bilateral lesions. However, the observed patterns of error do not appear to be predictable on the basis of problems at the level of phonological representation. Consequently, it has been suggested that their difficulties with speech perception may be related to prephonemic disturbances (Saffran et al., 1976; Auerbach et al., 1982).
The precise nature and extent of prephonemic disturbances associated with PWD remain to be elaborated. A number of studies have suggested that deficits in temporal resolution may be key to their difficulties in processing speech. While normal individuals demonstrate click fusion thresholds on the order of 1 to 3 ms (Miller & Taylor, 1948; Patterson & Green, 1970; Hirsh, 1975), it has been reported that patients with PWD require 15–300 ms between two clicks before they are able to reliably make this discrimination (Albert & Bear, 1974; Auerbach et al., 1982; Tanaka et al., 1987; Best & Howard, 1994; Godefroy et al., 1995). Auerbach et al. (1982) proposed that this elevated threshold reflected impaired auditory temporal acuity, which in turn could cause special difficulties with consonant perception by impeding the analysis of rapid formant frequency transitions.
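The click-fusion paradigm described above is structurally simple: two very brief clicks separated by a silent inter-stimulus interval (ISI), with the listener reporting one event or two. A minimal sketch of such a stimulus follows; the 48 kHz sampling rate and rectangular click shape are assumptions for illustration rather than the procedure used with this patient.

```python
def click_pair(isi_ms, click_ms=0.5, sr=48000, amp=1.0):
    """Build a two-click stimulus: click, silent gap, click.

    Listeners with normal temporal acuity typically report two distinct
    events once the silent ISI exceeds roughly 1-3 ms; the elevated
    thresholds reported in PWD reach 15-300 ms.
    """
    click = [amp] * int(sr * click_ms / 1000)  # rectangular 0.5 ms click
    gap = [0.0] * int(sr * isi_ms / 1000)      # silent inter-stimulus interval
    return click + gap + click

# A 2 ms gap: near the upper edge of the normal fusion threshold
stim = click_pair(isi_ms=2.0)
# Total duration = 0.5 + 2.0 + 0.5 = 3.0 ms of signal
```

An adaptive procedure would vary `isi_ms` from trial to trial to estimate the shortest gap at which the listener reliably reports two clicks.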
The link between impairment of auditory temporal resolution and problems with specific aspects of speech perception or language comprehension in PWD has remained speculative. Indirect evidence supporting an association may be garnered from a study by Godefroy et al. (1995) who followed a patient with “auditory agnosia” in the course of his recovery. They found that click fusion thresholds assessed in the acute stage were significantly elevated. However, as the word deafness resolved, there was a corresponding normalization of click fusion thresholds. The parallel recovery of word deafness and click fusion strengthens the argument for a common underlying disturbance, although clearly it is not possible to suggest causal relations from such correlations.
The precise nature of the temporal processing deficits indexed by faulty click fusion also remains at issue. Kanshepolsky et al. (1973) found that as the duration of auditory stimuli became very short, sound intensity had to be raised appreciably (20 to 25 dB) in order for their patient with word deafness to hear the stimulus. A few studies have shown that these problems tend to co-occur in patients with word deafness secondary to bitemporal infarction, particularly with lesions involving or extending into subcortical white matter (Motomura et al., 1986; Tanaka et al., 1987). However, Buchtel and Stewart (1989) suggest that poor temporal resolution is evident in PWD in the absence of difficulties perceiving brief stimuli in isolation. Noting that their patient with PWD was completely unable to perceive clicks due to their brevity, they utilized a modified fusion task, replacing clicks with 30 ms tone bursts that their patient had no difficulty hearing. Manipulating the duration of the silent ISI between these tone bursts, they found that their patient demonstrated profound difficulties in discriminating one from two tones presented in rapid succession. Whereas normal subjects reliably differentiated two tones when separated by silent intervals of approximately 15 ms, their patient required a silent gap of approximately 250 ms.
Based on observations of a frequent co-occurrence of abnormal intensity-duration functions and elevated fusion thresholds, Tanaka et al. (1987) speculated on a common neurophysiological basis for these diverse disturbances of timing. They suggested that such problems might be explicable in terms of abnormal persistence of neural activity, prolonged refractory periods, or slow initial recruitment. Some studies have observed that timing anomalies are also evident in PWD in the perception of rapid visual events (Best & Howard, 1994; Tanaka et al., 1987) raising the possibility of a supramodal timing deficit. Extending a conceptualization by Auerbach et al. (1982), Best and Howard (1994) suggested that a slow or inconsistent neural clock may be the most parsimonious explanation for the diverse temporal processing deficits associated with PWD. They conceived of this biological timing mechanism in terms of a reciprocal loop from the cortex to the cerebellum and suggested that it may serve as a central timing mechanism for several neural systems.
A problem with the slow clock hypothesis and other accounts positing that PWD is based in fundamental disturbances of temporal resolution is that the supporting data are derived almost entirely from patients with bilateral lesions. Only one study has examined click fusion in a unilateral case of PWD, and while the findings were abnormal (Albert & Bear, 1974), the observed threshold (15 ms) was substantially lower than that described in several cases of word deafness secondary to bitemporal lesions (30 to 300 ms) (Auerbach et al., 1982; Motomura et al., 1986; Tanaka et al., 1987).
Evidence from NH suggests that temporal processing deficits can exist in PWD in the absence of basic deficiencies in resolving distinct auditory events. NH demonstrated a normal click fusion threshold, yet had profound difficulties with the spectrotemporal analysis of transient modulations in formant frequency occurring within a temporal window of up to 200 ms. Because short-term dynamic patterns such as frequency modulations (FM) are ubiquitous in speech and serve as fundamental cues to the phonemic identification of consonants, the observed deficits would pose significant obstacles in understanding spoken language. By contrast, she appeared able to process slower modulations. Although not specifically examined, this may support satisfactory processing of suprasegmental aspects of speech analysis such as prosody and intonation contours, consistent with a case described by Coslett et al. (1984).
Deficiencies in FM analysis of tonal stimuli have also been described by Wang et al. (2000) in a PWD patient with a unilateral left hemisphere lesion involving cortical and subcortical white matter of left temporal lobe extending superiorly into frontoparietal regions. Their patient had difficulty in discriminating the directional trajectory (up-going vs. down-going) of pure-tone glides 302 ms in duration as well as 50 ms linear frequency ramps followed by a steady-state tone. These observations are broadly consistent with our findings with formant frequency modulations. Wang et al. (2000) did not utilize a measure of temporal acuity, so it cannot be discerned whether the deficits observed in their case were also independent of more basic problems with temporal resolution.
Overall, our findings provide compelling evidence that prephonemic auditory temporal processing disturbances can occur in PWD secondary to unilateral left temporal lobe lesions and result in substantial problems with auditory comprehension. The dissociation between click fusion and frequency modulation analysis observed in NH cannot readily be accommodated by the concept of a slow clock as suggested by Best and Howard (1994). Rather, the results appear to implicate specialized mechanisms that mediate rapid FM analysis. Evidence from animal neurophysiological studies and human psychophysical experiments suggests that temporally varying sounds undergo specialized analysis involving modulation sensitive neural mechanisms that are distinct from processes underlying the analysis of steady-state sounds (Kay, 1974). These mechanisms or “channels” are not merely concerned with the detection of a change from one frequency to another but are sensitive to the instantaneous temporal properties of frequency change such as the rate, shape, direction, and periodicity of modulation (Kay & Matthews, 1972; Green & Kay, 1973; Collins & Cullen, 1978; Gardner & Wilson, 1979; Regan & Tansley, 1979). They are physiologically distinguishable from analogous mechanisms that process amplitude modulations in sound (Regan & Tansley, 1979; Kay, 1982). The temporal tuning that characterizes the response properties of FM sensitive neurons, their connectivity with other auditory neurons, and their hierarchical organization in the auditory system suggest an intrinsic capacity to track modulations in time.
Frequency modulation sensitive mechanisms have functional characteristics critical to speech reception (Kay, 1982). Studies of the cortical organization of these mechanisms suggest that they are elaborated in the superior temporal cortex in areas confluent with primary auditory cortex and classical language reception areas (Arlinger et al., 1982; Hari & Makela, 1986; Makela et al., 1987). Recent functional magnetic resonance imaging studies have localized FM sensitive mechanisms to so-called belt and parabelt association cortex anterolateral and lateral to Heschl's gyrus (Johnsrude et al., 1997; Binder et al., 2000; Zatorre et al., 2002) in areas that appear to be specialized for processing rapid time varying aspects of sound (Hart et al., 2003). Joanisse and Gati (2003) observed that speech and dynamically varying nonspeech stimuli produce remarkable overlap in neural activation on functional magnetic resonance imaging. Indeed, they suggest that differences appeared to be related to the degree of activation rather than differences in spatial localization.
Interestingly, these areas have been implicated in cortical mapping studies as critical to consonant perception. Examining patients undergoing cortical mapping in preparation for surgical treatment of intractable epilepsy, Boatman et al. (1995) demonstrated that electrical stimulation to lateral mid and posterior superior temporal cortex disrupted the perception of consonant-vowel syllables but not vowels. These regions may form a functional subunit of a posterior stream of auditory processing distributed along the supratemporal cortical plane (Wise et al., 2001).
While there is evidence of bilateral mediation of the speech code, there is growing support from dichotic listening, electrophysiological, neuromagnetic, and functional neuroimaging studies to suggest that there are hemispheric differences in the computational networks that specialize in the analysis of rapid acoustic modulations in speech. Poeppel (2003) has proposed that speech signals are asymmetrically analyzed in the time domain, with left-hemisphere mechanisms preferentially extracting information over shorter (25–50 ms) temporal integration windows while right-hemisphere mechanisms integrate temporal variations over longer (150–250 ms) windows. Our findings are broadly consistent with this viewpoint. However, the dissociation between click fusion and frequency modulation analysis suggests that the temporal window concept is in need of a higher degree of specification. Specifically, our results suggest that window size may depend on the type of information that is being analyzed in the time domain. Differentiation of auditory objects based on information present in amplitude modulations (on and off) was not affected in our patient with PWD. We would therefore suggest that hemispheric asymmetries are more related to the analysis of rapid spectrotemporal variations that require the tracking of relational structure or temporal patterns in sound.
The specificity of the auditory comprehension deficit observed in NH affirms our understanding of the modular organization of the speech recognition system and suggests that this can be impaired by neural damage with relative sparing of more central language computational networks. In addition, the data presented here support a modular view of auditory processing and suggest that substantial difficulties involving the analysis of rapid frequency changes over time can exist despite adequate temporal resolution. The results are in keeping with emerging conceptions that human speech perception is based on multiple, hierarchical processing pathways and that there are left hemispheric mechanisms that are particularly adept at high-speed processing of acoustic cues important to the perception of speech.
We concur with Head (1926) and more contemporary conceptualizations (Ellis & Young, 1988) that perceptual impairment does indeed exist beyond that for words in cases of PWD. This may be an inescapable consequence of the interactive and highly interconnected architecture that subserves the processing of speech. Because language emerged as a system that codes linguistically important acoustic differences related to subtle variations in articulation, it poses special challenges to auditory temporal processing capacities of the brain. We are rarely called upon in everyday life to make subtle distinctions between environmental sounds based on rapid spectrotemporal cues lasting a few tens of milliseconds. A possible example might be to distinguish whether a violin has been plucked or bowed by listening to the difference in the attack. It is perhaps not surprising then that a temporal processing disorder of the kind we observed would be clinically most evident in processing speech.
We thank the Albert Einstein Society and the Pennsylvania Department of Health for funding research programs that allowed us to study this patient. We also gratefully acknowledge Denise Haas and Sharon Kaplan of the Center for Communication Disorders at Moss Rehabilitation Hospital for insights on this patient.
Magnetic resonance images showing encephalomalacia and gliosis in the left temporal region involving the planum polare, the transverse temporal (Heschl's) gyrus, and the planum temporale and subadjacent white matter. Abnormalities extended in the lateromedial axis from the dorsal surface of the superior temporal gyrus to the insula. This study was obtained 4 months post-onset.
These spectrograms plot the frequency spectra of the four types of speech and nonspeech stimuli. A represents a four component complex tone, B depicts the corresponding vowel , C shows the CV syllable , and D provides examples of the single formant stimuli. The extent of these formant transitions was modeled after the second formant in .
Results from the single formant perceptual discrimination task. The abscissa is the duration of the formant transition and the ordinate is the percent correct.
Results from the click fusion task. The abscissa denotes the duration of the silent ISI between 0.5 ms square wave clicks. The ordinate is the percent correct.