Regressive voicing assimilation is a common phonological pattern in which a contrast between voiced and voiceless obstruents is neutralized in the position before another obstruent, with the preceding obstruent agreeing with the following one in voicing (Passy 1891: 168; Cho 1990; Lombardi 1995, 1999; Wetzels & Mascaró 2001). Laryngeally mixed obstruent clusters do not occur in languages with regressive voicing assimilation: voiced obstruents do not occur before voiceless ones, and voiceless obstruents do not occur before voiced ones. Languages that have been reported to have such a pattern are listed in (1), and other reported variants are listed in (2)–(5).
(1) An obstruent must agree in voicing category with a following obstruent.
a. Slavic: Russian (Halle 1959: 64), Polish (Rubach 1984: 206), Slovak (Rubach 1993: 280), Serbian-Croatian (Partridge 1972: 20), Slovene (Herrity 2000: 22), Czech (Heim 1976: 14), Belarusian (Rubach 2008: 463)
b. Germanic: Dutch (Booij 1995: 59–60), Yiddish (Katz 1987: 29–30)
c. Romance: Latin (Niedermann 1910: 67–68), Walloon (Francard & Morin 1986: 454–455), Rumanian (between words – Avram 1986: 565)
d. Sanskrit (Whitney 1891: 55)
e. Breton (Le Dû 1986: 446)
f. Semitic: Sudanese Arabic, Turkish Arabic, and Maltese Arabic (Abu-Mansour 1996), Modern Hebrew (except [x] – Barkai 1972: 90)
g. Lithuanian (Dambriunas, Klimas & Smalstieg 1966: 17)
h. Hungarian (Vago 1980: 34–35)
(2) An obstruent must agree with a following consonant in voicing (including a following sonorant consonant).
a. Catalan (Wheeler 1979: 310–313)
b. Sanskrit (between words – Selkirk 1980: 115)
c. Lango (Noonan 1992: 17)
(3) A voiced obstruent must agree with a following obstruent in voicing, but a voiceless obstruent can stand before a voiced one.
a. French (within a phonological word – Dell 1995: 12)
b. Yathê (Wetzels & Mascaró 2001: 228)
c. Afar (noncoronals only – Bliese 1976: 160)
d. Sawai (between morphemes – Whisler 1992: 24)
e. Yorkshire English (between words – Wells 1982: 367)
f. Makkan Arabic (Abu-Mansour 1996: 217)
(4) An obstruent must agree with a preceding obstruent in voicing category.
a. Turkish (Lees 1961: 50)
b. Uyghur (Hahn 1991: 82)
c. Dutch (only if the second obstruent is a fricative or a coronal suffix – Booij 1995: 58–64)
(5) Obstruents in an obstruent cluster must agree in voicing class and must be voiceless.
a. Osage (H. Wolf 1952)
b. Modern Irish (Ó Siadhail 1989)
c. Hixkaryana (Derbyshire 1985)
d. Swedish (stops only – Helgason & Ringen 2007, 2008)
Admittedly, the statements above are based for the most part on transcription data, which systematically overstates the categoriality of sound patterns. Experimental studies in a number of languages have found that patterns that had been described as categorical regressive voicing assimilation were in fact incomplete, i.e. the voiced and voiceless obstruent classes were acoustically and perceptually distinct even in the assimilation context. This has been found in Catalan (Charles-Luce 1993), Hungarian (Jansen 2004), Dutch (Jansen 2004, Warner et al. 2004), and French (Snoeren, Hallé & Segui 2006). On the other hand, Burton & Robblee (1997) found in Russian no significant differences in voicing measures between underlyingly voiced and voiceless obstruents in the assimilation context. Ernestus & Baayen (2006: 47) suggest that such incomplete neutralization reflects the analogical influence on the production of alternating lexical items by activation of related items with a different voicing class. According to this plausible view, an item in which a final [t] alternates in related forms with final [d] will show more of the phonetic properties of [d] than an item in which the final consonant is non-alternating [t].
It has long been recognized that regressive voicing assimilation corresponds to the phonetic realization of voicing contrasts in obstruent clusters (e.g. Sievers 1901: 290; Passy 1891: 168–169). Articulatory studies such as Yoshioka, Löfqvist & Hirose (1981) and Munhall & Löfqvist (1992) have shown that laryngeal gestures for successive consonants in a consonant cluster tend to overlap and blend, particularly at faster speech rates. Acoustic studies have shown that there is a longer voiced interval in an obstruent before a voiced consonant than before a voiceless consonant in English (Haggard 1978; Docherty 1992: 165; Stevens et al. 1992; Smith 1997; Jansen 2004), French (Snoeren et al. 2006), and Syrian Arabic (Barry & Teifour 1999). The effects of such anticipatory laryngeal coarticulation are gradient, rate-dependent, and vary in strength within the affected segment depending on the proximity to the conditioning sound (Cohn 1993, Zsiga 1995).
The acoustic effects of anticipatory laryngeal coarticulation can impact the identification of voicing categories. Stevens et al. (1992) excised VCCV sequences from English sentences, and had listeners identify the consonants in the consonant cluster. Voiceless–voiced sequences were identified correctly 56.3% of the time, but the most common error (29.8% of the identification judgments for such clusters) was misidentification of the first consonant as voiced. Likewise, mixed clusters with the intended sequence voiced–voiceless were correctly identified in 75.7% of the cases, and the most common error was misidentification of the first consonant as voiceless (16.5% of the identification judgments). Similarly, Snoeren et al. (2006: 252) found that the voicing category of a French word-final stop from a position before an obstruent of the opposite voicing class was misidentified more often than a word-final stop from a position preceding a consonant of the same voicing class.
These perceptual results suggest how laryngeal coarticulation within consonant clusters (a gradient pattern in the realization of voicing classes) can be related to the phonological pattern of regressive voicing assimilation (a restriction on the distribution of voicing categories). Overlap between laryngeal gestures within clusters could lead to a tendency among listeners, especially inexperienced ones, to identify consonants within such a cluster as matching in voicing class. If they generalize on this basis, they could infer that there is a restriction on distribution requiring that the consonants belong to the same class. If such a pattern spread through the speech community, the result would be the emergence in the language of regressive voicing assimilation.
This sound change would be an instance of hypocorrection (Ohala 1981, 1993), in which a listener fails to compensate in perception for a production effect. The result, if the pattern spreads from that speaker through a speech community, is phonologization (Hyman 1976), in which a gradient production pattern is reinterpreted diachronically as a phonological pattern in the distribution of categories.
Such a diachronic account would explain some basic properties of regressive voicing assimilation. The pattern is common and has emerged independently in many different languages because it is a straightforward reinterpretation of a pervasive pattern of laryngeal coarticulation. The obstruent changes in voicing because an obstruent affected by laryngeal coarticulation tends to get mistaken for an obstruent of the opposite voicing, rather than some other segment type. The assimilation tends to be regressive because the coarticulation pattern is anticipatory, and the voicing cues in prevocalic position are more informative than those in postvocalic position (Warner 1998), so that an error in voicing categorization is more likely to occur in the first of two successive obstruents than in the second (cf. place assimilation – Ohala 1990). The pattern tends to be restricted to obstruents because partially devoiced sonorants are so low in intensity that they tend to be mistaken for silence rather than for any class of voiceless segment (Myers & Hansen 2007).
Empirical support exists already, in the studies cited above, for the articulatory and acoustic stages of this diachronic scenario (i.e. for laryngeal coarticulation and its acoustic effects). The present study is an investigation of the next, perceptual stage of the sound change. The experiments presented here test the hypothesis that listeners tend to identify the voicing class of obstruents as matching that of a following obstruent.
To investigate the perception of voicing in laryngeally mixed clusters, one needs to look at a language in which such clusters are allowed. In the present study, the language of the materials and the subjects is English, which has a contrast in voicing in obstruent clusters (e.g. afghan, vodka, obtain, football – Westbury 1979: 12). The first study is a production study (Experiment 1), which investigated how the voicing cues of a fricative are affected by the category of a following segment. The second study is a perception study (Experiment 2), which tested whether listeners’ identification of the voicing class of an obstruent is affected by following segment context.
1 Production study (Experiment 1)
The production study investigates the effect of a following segment on acoustic correlates of voicing in voiced and voiceless fricatives in American English.
1.1 Methods
Eight adult native speakers of American English participated in the production study. They read aloud materials presented to them in randomized order at five-second intervals in a timed PowerPoint presentation, while seated in a sound-isolated booth. They were instructed to produce each sentence as if it were an isolated utterance. Their productions were recorded on a digital solid-state recorder at a sampling rate of 44.1 kHz and 16-bit amplitude resolution.
Each test word was a member of a minimal pair distinguished by voicing in a final fricative. These nine minimal pairs are listed in (6).
(6) Minimal pairs of test words
Since the duration of the vowel before a consonant is a cue for the voicing class of that consonant (Chen 1970), factors that affect vowel duration have been held constant. The test syllable was always the final syllable of the word, and always a syllable bearing main word stress. To control for intrinsic vowel duration effects (Lindblom 1968), the test vowel was always high and long (i.e. tense). Each utterance was a sentence consisting of ten syllables, to control for the effects of utterance length on duration measures (Lindblom 1968).
Each test word occurred in four sentence-medial contexts:
(i) before a word beginning with a vowel (That is the conclusive proof in the case)
(ii) before a word beginning with a nasal (The police believe they have the proof now)
(iii) before a word beginning with a voiced plosive (The tent was put to the proof by this storm)
(iv) before a word beginning with a voiceless plosive (He showed the proof to the defense lawyer).
The word following the test word was always the same in the corresponding test sentences for both members of a minimal pair. There were thus 18 test words in four contexts for 72 test sentences for each subject (listed in the appendix), and a total of 576 test sentences (72 × 8) for the whole subject pool. Of these, eight utterances were excluded due to misreading of the test item context, or hesitation or pause following the test item. This left a total of 568 tokens.
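The token counts in this design follow from simple arithmetic, which can be sketched in Python as below. The variable names are illustrative and do not come from the study.

```python
# Design counts as described in the text; variable names are illustrative.
minimal_pairs = 9
test_words = minimal_pairs * 2   # one voiced-final and one voiceless-final word per pair
contexts = 4                     # vowel, nasal, voiced plosive, voiceless plosive
speakers = 8
excluded = 8                     # misreadings, hesitations, or pauses

sentences_per_speaker = test_words * contexts        # 72
total_sentences = sentences_per_speaker * speakers   # 576
tokens_analyzed = total_sentences - excluded         # 568

print(sentences_per_speaker, total_sentences, tokens_analyzed)
```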
The duration of the following intervals was measured on the basis of the waveform and the spectrogram:
(7) Measured intervals
• the vowel preceding the test fricative
• the test fricative
• voiced and voiceless intervals within the test fricative
The onset of the vowel was defined as the onset of the amplitude rise in the waveform. The most obvious acoustic manifestation of the fricative was the interval of noise. But in a number of cases, particularly with voiced fricatives in the prevocalic context, there was no noise, but there was an interval with a low-amplitude periodic waveform indicating a period of constriction. An example is given in Figure 1, which presents the waveform and spectrogram of the sequence [ˈɡɹiʋɨ] from the sentence She didn't have time to grieve about it. The vowel [i] is marked in the spectrogram as the interval a, and c marks the frictionless continuant [ʋ].
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_fig1g.gif?pub-status=live)
Figure 1 Spectrogram showing the frictionless realization of a voiced fricative (marked c) in the sequence [ˈɡɹiʋɨ] from the sentence She didn't have time to grieve about it. The vowel preceding the fricative is marked as a.
To accommodate such undershot sonorant realizations of the fricatives, the constriction interval was measured, whether or not it included a noise interval. The onset of the fricative was defined as the earlier of two time-points: either the onset of noise, or the beginning of an interval of wave cycles that are locally minimal in amplitude and complexity (departures from sinusoidal form). The offset of the fricative was measured as the later of either the noise offset or the offset of the minimal wave cycles as defined above. The voiced interval within the fricative was the interval including a quasi-periodic cycle in the waveform, and the voiceless interval had no such quasi-periodic cycle. There were 12 instances in the dataset in which the voiced interval was discontinuous, including two subintervals at the onset and offset of the fricative. In these cases both subintervals were included in the duration of the voiced interval.
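The measurement criteria above can be restated as a small Python sketch. It is only an illustration of the definitions: the function name, the argument layout, and the example time values are hypothetical, and it assumes that hand-labelled interval boundaries (in seconds) are already available and that all voiced subintervals lie within the constriction.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def fricative_measures(
    noise: Optional[Interval],
    minimal_cycles: Optional[Interval],
    voiced_subintervals: List[Interval],
) -> dict:
    """Derive the fricative measures from labelled landmarks (hypothetical helper).

    `noise` is the frication noise interval and `minimal_cycles` the interval of
    low-amplitude, near-sinusoidal cycles; either may be None if absent.
    """
    landmarks = [iv for iv in (noise, minimal_cycles) if iv is not None]
    if not landmarks:
        raise ValueError("need a noise interval or a minimal-cycle interval")
    onset = min(start for start, _ in landmarks)   # earlier of the two onsets
    offset = max(end for _, end in landmarks)      # later of the two offsets
    duration = offset - onset
    # A discontinuous voiced interval contributes the sum of its subintervals.
    voiced = sum(end - start for start, end in voiced_subintervals)
    return {
        "onset": onset,
        "offset": offset,
        "duration": duration,
        "voiced": voiced,
        "voiceless": duration - voiced,
    }


# Hypothetical token with voicing at both edges of the fricative.
example = fricative_measures(
    noise=(0.510, 0.570),
    minimal_cycles=(0.500, 0.560),
    voiced_subintervals=[(0.500, 0.515), (0.560, 0.570)],
)
print(example)
```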
The three measures were expected to reflect the voicing category of the fricative. The vowel preceding a voiced fricative is expected to have a longer duration than one before a voiceless fricative (Chen 1970, Kluender, Diehl & Wright 1988, de Jong 1991). A voiceless fricative is expected to be longer in duration than a voiced one (Denes 1955, Stevens et al. 1992). A voiced fricative is also expected to have a longer interval of voicing than a voiceless fricative (Raphael 1972, C. Wolf 1978, Hogan & Roszypal 1980, Stevens et al. 1992, Kingston & Diehl 1994, Smith 1997).
Due to anticipatory laryngeal coarticulation, these measures are also expected to be affected by whether the segment following the test fricative is voiced or voiceless. A voiceless obstruent following the fricative would be expected to push the acoustic measures of the fricative toward where they would be for a voiceless fricative: shorter vowel duration, longer fricative duration, and a shorter voiced interval within the constriction. The other segments that follow the fricative are all voiced and so would be expected to influence the fricative in the opposite direction.
For each measure, a mixed model analysis was conducted in which talker and word were treated as random factors, adjusting the intercept according to subject and item pair. The fixed, experimental factors were voicing (voiced/voiceless) and following segment (nasal/voiced obstruent/voiceless obstruent/vowel). The alpha level was p ≤ .05.
1.2 Results
1.2.1 Voicing duration
Figure 2 presents the percentile distribution of the duration of the voiced interval during the fricative, according to the voicing of the fricative and the category of the following segment. The mean duration of the voiced portion of the constriction interval was greater in voiced (42 ms) than in voiceless fricatives (26 ms). Within the voiceless fricatives, there is little difference depending on what class of segment follows the fricative. Within the voiced fricatives, the most notable difference is that the mean voicing duration when a voiceless obstruent follows (34 ms) is less than that for any of the other following segment groups (nasal: 45 ms; voiced obstruent: 46 ms; vowel: 45 ms).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_fig2g.gif?pub-status=live)
Figure 2 Duration of the voiced interval in the fricative (ms) by fricative voicing class and following segment class.
Both main effects were significant: voicing (F(1,560) = 155.3, p < .001) and following segment (F(3,560) = 6.3, p < .001). The interaction between the two factors was also significant (F(3,560) = 6.2, p < .001). Examining each voicing class separately, we find that the effect of following segment is significant in voiced fricatives (F(3,277) = 9.1, p < .001), but not in voiceless fricatives (F < 1).
Within the voiced fricatives, the following segment classes were compared pairwise, with a Bonferroni adjustment of the alpha level to .008 (= .05/6). Voiced fricatives with a following voiceless obstruent had a significantly shorter voiced interval compared to voiced fricatives with any other following segment class: vowels (F(1,139) = 26.5, p < .001), nasals (F(1,139) = 19.8, p < .001), and voiced obstruents (F(1,139) = 20.5, p < .001). None of the other pairs of groups were significantly different from each other.
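The adjusted alpha level follows from the number of pairwise comparisons among the four following-segment classes. A minimal Python sketch of that calculation (variable names are illustrative):

```python
from itertools import combinations

# The four following-segment classes compared pairwise in the text.
levels = ["nasal", "voiced obstruent", "voiceless obstruent", "vowel"]
pairs = list(combinations(levels, 2))   # all unordered pairs of classes

alpha = 0.05
adjusted_alpha = alpha / len(pairs)     # Bonferroni: divide by the number of tests

print(len(pairs), round(adjusted_alpha, 3))
```

With four classes there are six unordered pairs, so each pairwise test is evaluated against .05/6, which rounds to .008.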
Thus the duration of the voiced interval in the test fricative was significantly longer in voiced than in voiceless fricatives, as has been found previously (Raphael 1972, C. Wolf 1978, Hogan & Roszypal 1980, Stevens et al. 1992, Kingston & Diehl 1994, Smith 1997). Voicing duration was significantly less in voiced fricatives before a voiceless obstruent than before a voiced segment of any sort, but in voiceless fricatives the measure was not significantly affected by the following segment type, in agreement with Docherty (1992: 165) and Stevens et al. (1992).
1.2.2 Fricative duration
Figure 3 presents the percentile distribution of the fricative duration according to the voicing of the fricative, and the class of the following segment. The mean duration of voiceless fricatives (119 ms) was greater than that of voiced fricatives (71 ms). Both main effects and their interaction were significant: voicing (F(1,560) = 53.8, p < .001), following segment (F(3,560) = 8.9, p < .001), voicing × following segment (F(3,560) = 7.6, p < .001). The effect of the following segment was significant within the class of voiceless fricatives (F(3,283) = 12.4, p < .001), but not in voiced fricatives (F(3,277) = 2.2, p = .08).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_fig3g.gif?pub-status=live)
Figure 3 Duration of the fricative (ms) by fricative voicing class and following segment class.
Focusing on the voiceless fricatives, a pair-wise comparison of following segment classes (with a Bonferroni-adjusted alpha level of .008) revealed that voiceless fricatives with a following obstruent were significantly shorter than those followed by a sonorant: voiced obstruent vs. nasal (F(1,142) = 34.3, p < .001), voiced obstruent vs. vowel (F(1,141) = 25.5, p < .001), voiceless obstruent vs. nasal (F(1,142) = 10.9, p = .001), voiceless obstruent vs. vowel (F(1,141) = 9.1, p = .003). None of the other pairs were significantly different.
In summary, fricative duration was significantly longer in voiceless than in voiced fricatives, replicating a result in such studies as Denes (1955), Cole & Cooper (1975), and Stevens et al. (1992). This voicing cue was not significantly affected by the voicing of the following segment, but in voiceless fricatives it was affected by whether the following segment was a sonorant or an obstruent.
1.2.3 Vowel duration
The percentile distribution of the duration of the vowel is presented in Figure 4. The mean duration of the vowel was longer before a voiced fricative (169 ms) than before a voiceless fricative (128 ms).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_fig4g.gif?pub-status=live)
Figure 4 Duration of preceding vowel (ms) by fricative voicing class and following segment class.
The main effects were both significant: voicing (F(1,560) = 92.9, p < .001) and following segment (F(3,560) = 4.5, p = .004). The interaction between the two factors was not significant. Comparing the following segment classes pair-wise with a Bonferroni adjustment (α = .008), we find that only two pairs are significantly different: voiceless obstruent vs. vowel (F(1,282) = 13.6, p < .001), and voiceless obstruent vs. voiced obstruent (F(1,283) = 9.2, p = .003). Voiceless obstruents were, however, not significantly different from nasals in this measure (F(1,283) = 3.3, p = .07).
The vowel was significantly shorter before a voiceless fricative than before a voiced fricative, as expected (Chen 1970). But the vowel was not consistently shorter when the consonant after that postvocalic fricative (i.e. C2 in a VC1C2 sequence) was voiceless than when that consonant was voiced. This was also the case when the measure was vowel duration relative to the duration of the vowel + fricative sequence (i.e. V/V + C), as proposed by Kohler (1979).
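The relative measure can be written as V / (V + C). The Python sketch below applies it to the overall condition means reported in this section, purely to illustrate the formula. That is an assumption made for illustration: the study's analysis would be run on per-token values, and a ratio of means is not the same as a mean of per-token ratios. The function name is hypothetical.

```python
def relative_vowel_duration(vowel_ms: float, fricative_ms: float) -> float:
    """Vowel duration as a proportion of the vowel + fricative sequence."""
    total = vowel_ms + fricative_ms
    if total <= 0:
        raise ValueError("durations must sum to a positive value")
    return vowel_ms / total


# Reported means: vowel 169 ms and fricative 71 ms (voiced fricatives),
# vowel 128 ms and fricative 119 ms (voiceless fricatives).
voiced_ratio = relative_vowel_duration(169, 71)
voiceless_ratio = relative_vowel_duration(128, 119)

print(round(voiced_ratio, 3), round(voiceless_ratio, 3))
```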
1.2.4 Discussion: production study
In summary, voiced fricatives in this study had a significantly shorter voiced interval before a following voiceless consonant than before any voiced segment (obstruent, sonorant consonant, or vowel). The duration of the voiced interval in voiceless fricatives was, on the other hand, not significantly affected by the following segment class. The voicing class of the following segment also had no significant effect on either the duration of the test fricative or the duration of the vowel preceding that fricative. These results are consistent with those of Smith (1997: 483) and Jansen (2004: 137), who found that the voicing of a following segment affected the duration of voicing in obstruents in English, but not the segment duration cues that are also associated with voicing class. Kuzla, Cho & Ernestus (2007) report a similar pattern in German: the proportion of voicing in a fricative is less after a voiceless stop than after a vowel, but the voicing of the preceding segment does not affect the duration of the fricative noise interval.
One consequence of this separation of the voicing cue from the duration cues for the voicing contrast is that they cannot be mechanically or causally linked, despite their frequent co-occurrence (Kingston & Diehl 1994). The effect of the following segment on voicing within the fricative reflects an anticipation of the glottal spreading gesture of the following segment in the preceding one. The supralaryngeal gestures that determine vowel and fricative length are largely unaffected.
The effect of the following segment was limited to devoicing. Voiced fricatives were affected by a following voiceless obstruent, but voiceless fricatives were not significantly affected in any of the three measures by a following voiced obstruent. This finding is consistent with the results for English of Docherty (1992: 165) and Stevens et al. (1992). The asymmetry between voiced and voiceless fricatives is likely due to the inherent fragility of voicing in fricatives (Haggard 1978, Ohala 1983, Stevens et al. 1992). As Ohala (1983) points out, there is a delicate balance between the trans-glottal pressure drop required to maintain voicing, and the relatively high pressure behind the oral constriction required to maintain turbulent oral airflow. As a result, English voiced fricatives often include voiceless intervals even in prevocalic position (Haggard 1978, Smith 1997). The inherent resistance of fricatives to voicing facilitates the devoicing effect of a following voiceless obstruent, but it resists the voicing effect that would be expected if the laryngeal configuration of a following voiced consonant were anticipated in the fricative.
However, it should be noted that Snoeren et al. (2006) found the opposite asymmetry in French word-final stops preceding a consonant in a following word. The duration of voicing in a voiceless stop was significantly more affected by the voicing class of a following consonant than it was in a voiced stop. The difference between this study and those on English might be partly due to the fact that the focus of the French study is voicing in stops, while that of the English studies is voicing in fricatives. But it could also be related to the language-particular implementation of the voicing contrast. Both voiced and voiceless stops in French have a lower VOT than the corresponding stops in English (Kessinger & Blumstein 1997). This indicates a smaller glottal width for the voiceless category in French, with less of a scope for devoicing neighboring segments than in English.
2 Perception study (Experiment 2)
The materials produced in the production experiment were used as the basis for stimuli in a perception experiment, designed to investigate the effect of a following segment on the identification of the voicing category of a fricative. Test words contrasting in the voicing of the final consonant were excised from their carrier sentences and presented to listeners for identification. It was expected that, due to the acoustic effects of the following consonant, listeners would tend to identify obstruents as matching in voicing class with the obstruent that followed in the sentence from which the word was excised. In particular, given the results of the production study, voiced fricatives were expected to be identified as voiceless if they came from a position before a voiceless obstruent, without identification of voiceless fricatives being affected by following segment context.
2.1 Methods
The stimuli for the perception experiment consisted of the 568 test items from the production experiment. These were all members of the minimal pairs listed in (6), all distinguished only by the voicing class of the word-final fricative. These items were excised from the carrier sentence, cutting at the closest zero crossing. The items were all normalized to the same peak intensity level (–20 dB relative to full scale). To make the onset of the recording less abrupt, a 10 ms interval of silence was introduced at the beginning of the recording, and a 50 ms fade-in was introduced for all files beginning with a continuant.
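The stimulus preparation steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, samples are assumed to be floats with full scale at 1.0, the fade-in is assumed to be linear (the text does not specify its shape), and the fade is assumed to be applied before the silence is prepended.

```python
import math
from typing import List


def nearest_zero_crossing(samples: List[float], index: int) -> int:
    """Return the sample index closest to `index` where the signal crosses zero."""
    def is_crossing(i: int) -> bool:
        if samples[i] == 0.0:
            return True
        return i > 0 and samples[i - 1] * samples[i] < 0

    for offset in range(len(samples)):
        for i in (index - offset, index + offset):
            if 0 <= i < len(samples) and is_crossing(i):
                return i
    return index  # no crossing found: fall back to the requested index


def normalize_peak(samples: List[float], target_dbfs: float = -20.0) -> List[float]:
    """Scale so that the peak absolute amplitude sits at `target_dbfs` re full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]


def fade_in(samples: List[float], sample_rate: int, ms: float = 50.0) -> List[float]:
    """Apply a linear fade-in (assumed shape) over the first `ms` milliseconds."""
    n = min(len(samples), int(sample_rate * ms / 1000))
    return [s * (i / n) for i, s in enumerate(samples[:n])] + list(samples[n:])


def prepend_silence(samples: List[float], sample_rate: int, ms: float = 10.0) -> List[float]:
    """Insert `ms` milliseconds of digital silence before the signal."""
    return [0.0] * int(sample_rate * ms / 1000) + list(samples)


# Hypothetical usage on a synthetic tone standing in for a recorded sentence.
sr = 44100
tone = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 10)]
start = nearest_zero_crossing(tone, 1000)
end = nearest_zero_crossing(tone, 4000)
stimulus = prepend_silence(fade_in(normalize_peak(tone[start:end]), sr), sr)
```

In this sketch the fade-in is applied unconditionally; in the procedure described above it would be applied only to files beginning with a continuant.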
The test word was removed from its discourse context to eliminate discourse-context clues to its identity. In a pilot study the test word was presented together with the following word (e.g. proof in), since this following word was identical for both members of a minimal pair, but it was found that subjects were disconcerted by the fact that the sequence to be identified was not a coherent unit of the language.
Because there was no other modification of the files, and in particular no modification of duration or f0, the recordings retain their prosodic characteristics and sound as if they were incomplete snippets from a longer utterance, rather than being a complete one-word utterance. Subjects were told that the recordings had been excised from longer sentences, and were warned that this might make them sound somewhat odd.
Sixteen adult native speakers of American English participated as subjects in the perception study. The stimuli were presented to them over headphones from a computer. They were blocked by minimal pair, and within each block they were presented in randomized order every two seconds. The minimal pair for the current block was presented visually on the screen, with the left-hand word in blue and the right-hand word in red. Subjects identified which of the two words they heard by pressing the corresponding key of a response pad: the left-hand blue button or the right-hand red button. 50% of the blocks had the voiced-final word on the left, and 50% had it on the right. Responses were recorded on the computer.
There were 568 stimuli and 16 subjects for a total of 9088 potential trials (16 × 568). Two subjects inadvertently exited the experiment before it was over, leaving 103 stimuli unpresented. There were 188 cases in which the subject did not succeed in responding within the two seconds allowed between stimuli. Two responses with suspiciously short response times (below 200 ms) were excluded, since they were interpreted as either inadvertent button presses or late responses to the previous stimulus. Finally, two subjects with normal error rates in all other minimal pairs had fewer than 50% correct responses to the pair lose/loose. This suggested that they were confused by the unusual spelling distinction in this pair (in which a final consonant voicing contrast correlates with a difference in the spelling of the vowel), and their responses for this minimal pair were excluded. These exclusions left 8677 responses for the analysis.
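The trial accounting can be checked with a short Python sketch. The number of excluded lose/loose responses is not stated in the text; the value computed below is only what the other figures imply, on the assumption that the exclusion categories do not overlap. Variable names are illustrative.

```python
stimuli = 568
listeners = 16
potential_trials = stimuli * listeners   # 9088

unpresented = 103    # two listeners exited early
timeouts = 188       # no response within two seconds
too_fast = 2         # response times below 200 ms
analyzed = 8677      # responses retained for analysis

before_pair_exclusion = potential_trials - unpresented - timeouts - too_fast
# Implied (not reported) count of excluded lose/loose responses.
implied_lose_loose_excluded = before_pair_exclusion - analyzed

print(potential_trials, before_pair_exclusion, implied_lose_loose_excluded)
```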
2.2 Results
Table 1 gives the counts and percentages for each response in each condition. Boldface marks correct responses. Overall, 90.5% (7857/8677) of the responses were correct, indicating that English speakers are, as expected, well able to identify contrasting voicing categories in word-final position.
Table 1 Responses by stimulus category.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_tab1.gif?pub-status=live)
Nevertheless, the proportion of voiced and voiceless responses varied according to the position from which the stimulus was excised. In Table 1, the lowest percentage of correct responses was for voiced stimuli drawn from the position before a voiceless obstruent (84%), and the second lowest was for voiceless stimuli are drawn from before a voiced obstruent (87% correct). The highest rate of errors, then, came in cases of mixed-voicing obstruent clusters, where the final obstruent of the test word preceded an obstruent of the opposite voicing class.
The statistical analysis used mixed-model logistic regression. The dependent variable was the subject's response, with a voiceless response (identification of the voiceless-final word) coded as 1, and the opposite response coded as 0. Subject (listener), talker, and minimal pair were treated as random effects. The fixed effects were voicing (voiced stimulus/voiceless stimulus) and following segment class (nasal/voiced obstruent/voiceless obstruent/vowel). In the factor voicing, voiceless was coded as 1, and voiced as 0. The multi-level factor following segment class was contrast-coded, with nasal as the (initial) default level. The alpha level was set at .05. The results of this analysis are given in Table 2. The effect of voicing is significant, and the coefficient is positive, indicating that, all else being equal, a voiceless stimulus was significantly more likely than a voiced stimulus to be identified as voiceless. This factor reflects the highly accurate identification of voicing categories.
Table 2 Voiceless response as a function of stimulus voicing class and following segment class (logistic regression results).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_tab2.gif?pub-status=live)
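For readers who want the model behind Table 2 written out, one way to express the specification described above is given below. This is a sketch under stated assumptions rather than the exact model that was fitted: it reads "contrast-coded with nasal as the default level" as treatment (dummy) coding, and it assumes random intercepts only, since the text does not say whether random slopes were included. The symbols are introduced here for illustration and do not appear in the original.

```latex
% y = 1 for a voiceless response, 0 for a voiced response
% V = 1 for a voiceless stimulus, 0 for a voiced stimulus
% F^{vd}, F^{vl}, F^{vow}: indicators for a following voiced obstruent,
%   voiceless obstruent, or vowel (following nasal is the reference level)
% s_i, t_j, m_k: random intercepts for subject, talker, and minimal pair (assumed)
\operatorname{logit} P(y_{ijk} = 1)
  = \beta_0 + \beta_1 V
  + \beta_2 F^{\mathrm{vd}} + \beta_3 F^{\mathrm{vl}} + \beta_4 F^{\mathrm{vow}}
  + \beta_5 V F^{\mathrm{vd}} + \beta_6 V F^{\mathrm{vl}} + \beta_7 V F^{\mathrm{vow}}
  + s_i + t_j + m_k
```

Under this reading, a positive \beta_1 corresponds to the reported finding that a voiceless stimulus favors a voiceless response, and changing the reference level of the following segment factor changes which pairwise comparisons the coefficients \beta_2 to \beta_4 express.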
The effect of the following segment was more complicated. The effect of a following voiced obstruent was significant, and the negative coefficient indicates that a fricative before a voiced obstruent was less likely to be identified as voiceless than a fricative before the default following segment class (nasal). The effect of a following voiceless obstruent was significant, and favored a voiceless response. A following vowel did not have a significant effect compared to a nasal. The interactions of voicing with the different following segment classes are all significant as well.
By redoing the analysis with different default values for the following segment factor, we can determine which levels differ significantly from which. A following voiceless obstruent significantly favors a voiceless response whether it is compared to a following nasal, as in Table 2, a voiced obstruent (z = 7.6, p < .001), or a vowel (z = 5.7, p < .001). A following voiced obstruent significantly favors a voiced response whether it is compared to a nasal, as in Table 2, a voiceless obstruent (z = –7.6, p < .001), or a vowel (z = –2.4, p = .02). The effect of a following nasal is not significantly different from that of a following vowel.
Within the subset of responses to voiced stimuli, a following voiceless obstruent was a significant factor favoring a voiceless response whether it was compared to a nasal (z = 6.0, p < .001), a voiced obstruent (z = 7.9, p < .001), or a vowel (z = 5.9, p < .001). With voiced stimuli, a following voiced obstruent was a significant factor favoring a voiced response whether it was compared to a nasal (z = –2.3, p = .02) or a vowel (z = –2.4, p = .02). The effect of a following nasal did not differ significantly from that of a following vowel in this subset.
Within the subset of responses to voiceless stimuli, neither a following voiced obstruent nor a following voiceless obstruent had a consistent significant effect on the response. A following voiceless obstruent significantly favored a voiceless response compared to a following voiced obstruent (z = 2.3, p = .02), but otherwise the only significant effect of following segment type was that of a following nasal (surprisingly) favoring a voiceless response, whether compared to a following voiced obstruent (z = 6.7, p < .001), a voiceless obstruent (z = 4.7, p < .001), or a following vowel (z = 5.9, p < .001).
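The z statistics and p-values reported in the preceding paragraphs can be related to each other with a short calculation. The sketch below assumes that the reported p-values are two-tailed tests of a Wald z statistic against the standard normal distribution, which is the usual output of logistic mixed-model software but is not stated in the text. The function name is introduced here for illustration.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z statistic under the standard normal."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-tailed area.
    return math.erfc(abs(z) / math.sqrt(2))


# z values quoted in the text for the subset of voiced stimuli
for z in (-2.3, -2.4, 6.0):
    print(f"z = {z:+.1f}  two-tailed p = {two_tailed_p(z):.3g}")
```

On this assumption, a z of about ±2.3 or ±2.4 rounds to p = .02, and any |z| above roughly 3.3 falls below p = .001, which is consistent with how the comparisons above are reported.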
2.3 Discussion: perception study
Listeners displayed significant tendencies to identify a voiced fricative as voiceless when it was drawn from the position preceding a voiceless obstruent, and as voiced when it was drawn from a position preceding a voiced obstruent. But they showed no significant tendency to identify a fricative as voiced before a nasal or a vowel, nor did they show any tendency to identify a voiceless fricative as voiced before a segment of any voiced class. In this, the perceptual results followed the production study, in which it was found that the acoustic effect of the following segment was limited to the devoicing effect of a voiceless obstruent on a preceding voiced fricative.
Stevens et al. (Reference Stevens, Blumstein, Glicksman, Burton and Kurowski1992) reported a listening test in which a V–fricative–fricative–V sequence was excised from recorded sentences and presented to listeners to identify the medial cluster from a set of four alternatives differing just in voicing (voiceless–voiceless, voiceless–voiced, voiced–voiceless, voiced–voiced). This experiment differed from the one reported here mainly in the task. In the present experiment, subjects heard actual words of English, and their task was a forced-choice identification of the stimulus as one of two words differing in voicing of the final consonant. In the Stevens et al. (Reference Stevens, Blumstein, Glicksman, Burton and Kurowski1992) study, on the other hand, subjects heard a meaningless VCCV excerpt from a nonsense sentence, and they identified the medial consonant cluster from a set of four choices. As in the present study, Stevens et al. (Reference Stevens, Blumstein, Glicksman, Burton and Kurowski1992) found a tendency for listeners to identify a voiced fricative as voiceless when it preceded a voiceless obstruent (in their experiment a fricative). Of the voiced–voiceless sequences, 16.5% were identified as voiceless–voiceless, compared to only 4.6% identified as voiced–voiced and 3.2% identified as voiceless–voiced. Unlike in the present study, they also found that a voiceless fricative tended to be identified as voiced when it occurred before a voiced obstruent. Of the voiceless–voiced clusters, 29.8% were misidentified as voiced–voiced, compared to 10.1% as voiceless–voiceless and 3.8% as voiced–voiceless. They do not present a statistical analysis of their results.
Snoeren et al. (Reference Snoeren, Hallé and Segui2006) had subjects identify French words with word-final voiced or voiceless stops, excised from contexts with a following consonant with the opposite voicing or from other contexts without such a following consonant. They found that the word-final stop in the former context tended to be identified as having the same voicing as the following consonant, and that this effect was greater for voiceless than for voiced stops. This asymmetry between voiced and voiceless targets runs in the opposite direction from the one in our results. It probably reflects the differences between English and French in acoustic effects of a following segment on voicing cues, discussed in Section 1.2.4 above.
Because the test word was excised from its sentence context and presented alone, subjects were not able to use their abilities to compensate for the effects of segment context (Lindblom & Studdert-Kennedy Reference Lindblom and Studdert-Kennedy1967, Mann & Repp Reference Mann and Repp1980), or to fill in missing information top-down from discourse context (Warren Reference Warren1970). The results therefore provide a baseline picture of how the acoustic effects of the following segment impact voicing class identification, when information about the phonetic and discourse context is removed.
3 Conclusion
The production study was designed to test the hypothesis that, due to laryngeal coarticulation, the voicing cues of a word-final fricative would tend toward matching the voicing category of the following segment. This hypothesis was only supported in part: the voicing interval of a voiced fricative was significantly shorter before a voiceless obstruent than before a voiced segment (obstruent, nasal, or vowel). But this measure was not significantly affected in a voiceless fricative by voicing of the following segment, nor were the voicing cues of vowel duration and fricative duration affected by the voicing class of the following segment.
The results of the perception study paralleled those of the production study. Subjects displayed a significant tendency to identify a voiced fricative as voiceless when it was excised from the position before a voiceless obstruent, but they did not display a similar tendency to identify a voiceless fricative as voiced before a voiced obstruent or sonorant.
The perception results suggest that these listeners assigned a greater weight to the voicing duration cue, or some unmeasured cue with the same distribution, than to the vowel duration or consonant duration cues to the voicing contrast. If they had attended more to the vowel and consonant duration cues, which were not significantly affected by the following segment, they would not have displayed a sensitivity to the following segment in their identification judgments.
The results are also relevant to the proposed diachronic account of the emergence of phonological regressive voicing assimilation. The proposal was based on the hypothesis that the voicing cues of both voiced and voiceless obstruents would be influenced through overlap of laryngeal gestures in the direction of the voicing category of a following obstruent. This would result in listeners tending to mistake the voicing category of the first obstruent for that of the second. Generalization on this basis would lead language learners to adopt regressive voicing assimilation as a restriction on the distribution of voicing categories.
The finding that the effect of the following segment, both in acoustic measurements and in perception, was limited to the devoicing of a voiced obstruent before a voiceless obstruent suggests that this scenario requires revision. The simplest interpretation would be that the phonetic state of affairs in English would lead directly not to the general regressive voicing assimilation exemplified in (1), but to the more restricted pattern in (3), in which a voiced obstruent is required to match a following obstruent in voicing class, but a voiceless obstruent is not. This regressive devoicing assimilation pattern has been reported to occur not only in Yorkshire English (between words – Wells Reference Wells1982: 367), but also in French (within a phonological word – Dell Reference Dell1995: 12), Yathê (Wetzels & Mascaró Reference Wetzels and Mascaró2001: 228), Afar (noncoronals only – Bliese Reference Bliese and Bender1976: 160), Sawai (between morphemes – Whisler Reference Whisler, Burquest and Laidig1992: 24), and Makkan Arabic (Abu-Mansour Reference Abu-Mansour and Eid1996: 217). If a language learner made errors in the identification of voicing class as the subjects in the perception experiment did, and then made generalizations on that basis about the distribution of voiced and voiceless obstruents, he or she could well end up with the generalization that a voiced obstruent cannot occur before a voiceless one.
The more general version of regressive voicing assimilation in (1), according to which an obstruent must match a following obstruent in voicing category, could develop from the regressive devoicing assimilation pattern as a generalization of the phonological distribution to include the whole class of obstruents as the target. Alternatively, it could be that this more symmetrical assimilation pattern emerges from a phonetic pattern different from that found in English – one in which both voiced and voiceless obstruents are affected in their voicing cues by the voicing class of a following obstruent.
Acknowledgements
I would like to thank Adrian Simpson and three anonymous JIPA reviewers for very useful comments on this paper, as well as the audience at the 2009 meeting of the Acoustical Society of America in Portland.
Appendix. Test sentences
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090351152-0593:S0025100309990284:S0025100309990284_tab3.gif?pub-status=live)