1 Introduction
English and some other Germanic languages present an interesting case for the role of phonation in the implementation of a voice contrast in obstruents. From a phonetic perspective, studies have shown that in isolation and in post-pausal position, phonation in languages like English and German most often does not occur in either voiceless or voiced obstruents (e.g. Lisker & Abramson 1964, Suomi 1980, Docherty 1992, Jessen & Ringen 2002, Moosmüller & Ringen 2004, Beckman, Jessen & Ringen 2013). Instead, stops are acoustically distinguished by characteristics such as voice onset time (VOT), f0, and F1 onset frequency (Lisker & Abramson 1967, Zlatin 1979, Lisker 1986, Hanson 2009). Studies have shown that English (and German) inter-sonorant voiced stops do in fact usually contain a considerable amount of phonation, but phonation is still not obligatory and the amount and location of phonation is highly dependent on surrounding phonetic factors such as the preceding consonant or the prosodic boundary (Suomi 1980, Docherty 1992, Jacewicz, Fox & Lyle 2009, Davidson 2016). Similar results for fricatives confirm that the surrounding context is also a critical factor; for example, there is significantly more phonation in voiced fricatives when preceded by sonorants than by voiceless sounds (Haggard 1978, Docherty 1992, C. Smith 1997, Davidson 2016).
Because of the lack of obligatory phonation in English and German voiced obstruents, there are still some questions about how phonation is controlled as an articulatory and acoustic cue for potentially distinguishing between obstruents that are voiceless and voiced (or fortis and lenis, as some authors prefer, Iverson & Salmons 1995, Jessen & Ringen 2002, Moosmüller & Ringen 2004, Purnell et al. 2005). While it has been observed that phonation from surrounding sonorants and vowels can carry over into following voiceless obstruents (e.g. Docherty 1992, Pirello, Blumstein & Kurowski 1997, Shih, Möbius & Narasimhan 1999, Koenig & Lucero 2008), the contextual factors corresponding to the amount of phonation produced for voiced stops in English are comparatively better understood. To extend previous research focusing on minimal pairs and other laboratory-based speech such as words in short carrier phrases, this study examines the implementation of phonation in voiceless obstruents taken from a relatively large corpus of read speech. An advantage of the larger corpus is that the target voiceless obstruents are taken from a variety of contexts, including a range of preceding sounds, different phrase and word positions, and different stress patterns that are produced in the course of a narrative. Moreover, the corpus includes a large number of talkers.
Following the study in Davidson (2016) on voiced obstruents, this investigation examines the extent to which voiceless obstruents seem to take advantage of surrounding phonetic environments that are conducive to prolonging phonation, or whether the presence of a glottal opening gesture for voiceless obstruents levels these contexts, leading to a similar rate of phonation across all voiceless obstruents regardless of the surrounding context.
1.1 Presence of phonation in obstruents
Previous research on the implementation of phonation in English obstruents has largely focused on the obstruents that are usually characterized as voiced: [b d ɡ v ð z ʒ]. Many studies of stops in both American and British English have found that in utterance-initial or post-pausal position, fewer than a quarter of the tokens were produced with any voicing at all before the burst (Lisker & Abramson 1967, Suomi 1980, Keating 1984, Docherty 1992, Davidson 2016), but other studies have shown that some speakers produce more than 50% of tokens with some amount of prevoicing (B. Smith 1978, Flege 1982, Westbury 1983). Studies of utterance-initial fricatives also show that there is generally a split between voicing throughout, partial voicing, and no voicing at all (Docherty 1992, Pirello et al. 1997). Taken together, these results suggest that there is great individual variability in phonation during obstruents among English speakers in this prosodic position.
When voiced obstruents are preceded by another phoneme, the effects of the surrounding context are particularly influential on the realization of phonation. Most studies that examine stops in intervocalic position show that between 80% and 90% are fully phonated (Flege & Brown 1982, Westbury 1983, Keating 1984). Intervocalic fricatives are also typically produced with phonation, though the phonation is not always complete (Haggard 1978, Stevens et al. 1992, C. Smith 1997). Measurements of phonation for obstruents in phrase-medial position – including word-initial, medial, and final obstruents – in Davidson (2016) provide comprehensive evidence about the effect of preceding sounds beyond vowels. The main results demonstrate that phrase-medial stops are most often fully voiced when preceded by nasals, followed by vowels and approximants. For stops, all preceding obstruents lead to significantly more full devoicing and less complete voicing in the target stop, regardless of the underlying voicing specification of the preceding obstruent. Fricatives maintain more consistent rates of full phonation (~31% of tokens measured) and partial phonation (~50%) regardless of preceding sound, though preceding nasals significantly decrease rates of full voicing and increase rates of full devoicing.
Lastly, Davidson (2016) also reports on ‘voicing shape’, or where during the obstruent constriction phonation is present when there is partial phonation of a stop or fricative. Here, this measure will be renamed ‘phonation shape’ to better reflect the cue that is being examined. This measure was carried out because previous studies do sometimes report on whether an obstruent is partially or fully phonated, but for partially phonated obstruents, there is less available information about where in the closure or constriction phonation is more likely to be present. Many voiced obstruents exhibit phonation due to carryover voicing or bleed from a preceding sonorant. The second most common phonation shape is trough, which occurs when there is phonation at the beginning of the obstruent which dies out and then emerges again before the offset of the obstruent. Whereas the bleed pattern is relatively common for both stops and fricatives, the trough pattern is much more common for fricatives. Negative VOT, referring to phonation beginning partway through the closure and extending after the burst or the end of frication, was very rare.
These previous phonetic results for English voiced obstruents demonstrate two points that are relevant to understanding differences in gestural requirements for voiceless and voiced obstruents. Because phonation is far from obligatory in English, English is a relevant test case for understanding which aerodynamic and articulatory conditions are conducive to the natural prolongation of phonation and which ones restrict it. Though these results suggest that English may not need an obligatory gesture to ensure vocal fold vibration in the representation of voiced obstruents (consistent with the lack of an overt voicing gesture in Articulatory Phonology representations of English, for example, Browman & Goldstein 1992), it is generally assumed that the representation of voiceless obstruents (in English and other languages) does contain a specification for a laryngeal opening gesture (e.g. Löfqvist 1980; Browman & Goldstein 1990, 1992; Munhall & Löfqvist 1992; Hoole 1997). The timing of this gesture with respect to the oral gesture of the obstruent may condition how much ‘edge phonation’ (cf. ‘edge vibrations’ in Lisker & Abramson 1964) – partial phonation that carries over from the preceding sound or that begins during the transition from a consonant to a vowel – is present during the constriction of stops and fricatives. In a study of intraoral air pressure in German, Koenig, Fuchs & Lucero (2011) find that there is an asymmetry in the amount of intraoral pressure (Pio) observed at obstruent edges: Pio is greater at phonation offset (at the beginning of the target consonant) than at phonation onset (after the target consonant), which contributes to differences in the proportion of phonation present at the edges of the closure.
Moreover, studies suggest that the timing and magnitude of the glottal opening gesture differ substantially between aspirated stops, where it is large and shows little variability in coordination with the oral gesture, and unaspirated stops, where it has a decreased magnitude and may not even always be present for all speakers (Cooper 1991, Fuchs 2005). As summarized in Hoole (1997), articulatory studies have shown that across languages that have both aspirated voiceless stops and voiceless fricatives, the onset and peak of glottal abduction are earlier with respect to the onset of oral constriction for fricatives than for stops (Löfqvist & Yoshioka 1984, Löfqvist & McGarr 1987). If an active laryngeal opening gesture and an increase in intraoral air pressure largely prevent carryover or anticipatory phonation from surrounding voiced segments, this may be another way – in addition to VOT or f0 of the following vowel, for example – that languages like English distinguish between overall phonetic distributions of phonologically voiced and voiceless stops. One goal of this paper is to rely on results from both acoustic outcomes and articulatory research on glottal opening gestures to understand patterns of phonation in obstruents in English.
Previous examinations of the amount of phonation that is produced during voiceless obstruent constrictions in a variety of languages have noted that a small proportion of the constriction period is often phonated. For Southern British English, Docherty (1992) reports that in post-pausal and post-voiceless obstruent position, voiceless stops do not have any phonation during the constriction, but when they are preceded by a vowel, between 14% and 18% of the constriction contains voicing (depending on the individual consonant) for a majority of the stops. A majority of word-final stops, also preceded and followed by a vowel, have slightly higher rates of phonation bleeding into the closure (15%–24% of the closure duration). The effects of the surrounding context are similar for fricatives, but the proportion of phonation during the constriction is typically lower: between 3.5% and 11%. Although Docherty does not seem to specifically analyze the end of the obstruent, it seems that all of the phonation that is present occurs as carryover phonation from a preceding vowel (since there is no phonation at all in the post-pausal case). In a study on only fricatives, Pirello et al. (1997) confirm that when there is phonation during the constriction for their American English speakers in the laboratory speech that they elicit, it is almost always within the first 30 ms of the frication noise.
Related results have been found for voiceless stops in other languages. In an analysis of one speaker each of Spanish, Italian, Mandarin, Hindi and German, without taking word or phrase position into account, Shih et al. (1999) find that there is often phonation at the beginning of obstruent constrictions in all of these languages, though the likelihood of the phonation persisting into later portions of the closure depends on the language. For the Spanish and Italian speakers, the probability of phonation occurring at the beginning of the closure is between 50% and 60%, and decreases linearly in 20% intervals over the duration of the obstruent constriction. In Mandarin, which does not have phonologically voiced stops, the likelihood of phonation occurring at the beginning of a voiceless unaspirated stop is nearly 80%, and decreases only to around 40% by the end of the closure period. Aspirated stops have a 60% likelihood of phonation at the beginning, but decrease to nearly 0% likelihood by the end. For Hindi, which does have voiced stops, the likelihood of phonation in both types of voiceless stops starts around 50%, but quickly drops to nearly 0% before the midpoint of the stop is reached. For German, which is most similar to English, the likelihood of phonation in voiceless stops starts around 50%, and linearly decreases to 0% by the end of the constriction period (see also Möbius 2004, for German).
In a comparison of minimal stop pairs such as Liebe–Lippe in Austrian German, Moosmüller & Ringen (2004) find that the percentage of voicing during the voiceless closure for six speakers ranged from 10%–40%, as compared to 30%–100% for the voiced stop. An articulatory and acoustic examination of spontaneous speech from two speakers of Greek shows that nearly 40% of /t/ tokens were produced with partial or full voicing (fricative and approximant productions of /t/ are also reported, and it is unclear whether the voicing coincides with the lenited forms). Phonation during /k/ also occurred, but at slightly lower rates. Finally, a small number of intervocalic /s/ tokens were produced with full or partial voicing (Nicolaidis 2001). An experiment on the production of final voiceless stops in French showed that singleton obstruents contain phonation during approximately 30% of the closure period, ranging from 0% to 63% (Snoeren, Hallé & Segui 2006). A comparison of intervocalic voiceless stops in Spanish and French shows that Spanish has quite a bit of phonation, with 32.7% of voiceless stops in a corpus of spontaneous speech being produced as completely voiced and 61.8% with phonation for at least half of the stop closure duration. The comparable numbers in French are 8.5% and 31.8% (Torreira & Ernestus 2011).
Overall, the authors of these studies are mostly unconcerned that the phonation that occurs in voiceless obstruents could potentially be encroaching on the contrast between the phonological categories of voiced–voiceless (or lenis–fortis). For Standard German, Jessen & Ringen (2002: 201) note that the voiceless obstruents are ‘all voiceless (except for a very short tail of voicing into closure, which is probably universal and not perceivable)’. Lisker & Abramson (1964) argue that in English the ‘edge vibrations’ are present both acoustically and articulatorily, via information from transillumination of the larynx, but they speculate that the glottal pulses are probably too low in amplitude to be audible to listeners. On the other hand, Docherty (1992: 129) observes that ‘the frequency with which VOICED obstruents are “devoiced” means that there is a good deal of overlap in voicing timing patterns between sounds which would typically be labeled differently as “voiced” or “voiceless”’.
In order to adjudicate between these two positions for American English, the current study provides a more complete analysis of the presence of phonation in American English obstruents. Because this study uses a bigger corpus of connected (read) speech (37 speakers, 12,500 obstruents) as compared to the individual sentences or carrier phrases used in most other studies, it is possible to more comprehensively examine the effects of phrase position, word position, preceding context, and stress on the implementation of phonation during voiceless obstruents. Where relevant, the results from this study will be compared to the findings in Davidson (2016), which uses the same corpus to examine phonation in voiced obstruents. Moreover, this study more carefully examines whether the phonation during voiceless constrictions must be carryover voicing from a preceding sonorant, or whether it is also possible that there is some anticipatory phonation for an upcoming vowel. This is investigated by employing the ‘phonation shape’ metric that was introduced in Davidson (2016). The results based on these questions have implications for the cross-linguistic representation and coordination of laryngeal gestures for both voiceless and voiced obstruents.
2 Method
2.1 Participants
The obstruents from 37 speakers investigated in this experiment come from a corpus of read stories that were originally collected for two other studies: Bouavichith & Davidson (2013), a study of intervocalic stop reduction, and Davidson & Erker (2014), which focused on hiatus resolution.
The 13 speakers in Bouavichith & Davidson (six female and seven male) were all college students in the upper Midwest, and were between the ages of 18–25 years. The 24 speakers (17 female and seven male) in Davidson & Erker were college students in New York City at the time of the study. They ranged in age from 18–25 years. Most speakers were from hometowns in the mid-Atlantic and New England, but there were also speakers originally from the Midwest (Chicago, Michigan, Minnesota), and one each from Georgia, Texas, and New Mexico. Only one participant reported a history of speech or hearing disorders, having had speech therapy as a small child for the misarticulation of /s/. Since the speech of the participant was no longer affected, this participant's data was retained. All speakers were compensated for their participation.
2.2 Materials
The corpus of voiceless obstruents collected for the study comes from the three short stories read by the participants in Davidson & Erker (2014) and the five short stories in Bouavichith & Davidson (2013). These same recordings were also used as the data for Davidson (2016). In Davidson & Erker, the recordings were made in a soundproof room with a Marantz PMD-660 digital solid state recorder. The recordings for Bouavichith & Davidson were made in a quiet room with no background noise, using a TASCAM DR-40 digital solid state recorder. A Shure WH30XLR cardioid condenser head-mounted microphone was used in both studies. The targets consisted of all instances of the stops /p t k/, fricatives /f θ s ʃ h/, and the affricate /tʃ/.
Since the materials in the corpus were originally designed for other purposes, the number of different obstruents and their positions in the phrases and words were not controlled, but they were counted and factored into the current analysis. Several factors led to the exclusion of some voiceless obstruents. First, obstruents in function words such as it, it's, this, what, who, how, she, him, her, etc., were excluded. The /h/ in forms of the verb ‘to have’ was also excluded since it was often not pronounced by the speakers. Extremely frequent prepositions such as at, to, for were also omitted. Second, flaps deriving from underlying /t/ and /t/ followed by a nasal, when realized as a glottal stop, were excluded. Third, /t/ was excluded when it was before /t, d, ð/, since /t/ was never released in this environment and therefore could not be distinguished from the following sound. Similarly, /s, ʃ/ before a sibilant fricative were excluded for the same reason. Fourth, a stop before another stop was only included if it was released before the closure of the following stop began; otherwise, it would not be possible to distinguish between the end of the first stop and the beginning of the second stop. Fifth, any target obstruents that were produced as approximants (including formant structure) or stops produced with frication throughout the closure were not included in the corpus. Finally, any word containing a final stop with clear evidence of glottalization (either glottal striations throughout the closure, or glottalization on the surrounding vowels using the criteria described in Redi & Shattuck-Hufnagel 2001) was excluded. Only obstruents that were produced in their canonical realizations were included in this study, and those that were weakened or glottalized were not analyzed since the purpose of the study is to examine the distribution of phonation in obstruents.
The obstruents identified as target sounds were segmented using Praat textgrids (Boersma & Weenink 2016) that were created for each story using the Penn Forced Aligner (Yuan & Liberman 2008). The automated boundaries were then manually adjusted to ensure proper segmentation of the frication period of the fricatives and the closure period of the stops. Since there were few affricates relative to the stops and fricatives (3.8% of the total data set), only the closure portion was segmented and affricates were grouped with the stops. For both types of obstruents, when adjacent segments were vowels or sonorant consonants, the onset or offset of the second formant was used to determine the edge of the frication or closure boundaries. While this segmenting convention could potentially increase the proportion of tokens that are coded as having some voicing during the closure, it was adopted because it is often difficult to determine a distinction between F1 and f0 on a waveform and spectrogram and we did not want to erroneously exclude tokens that do contain voicing. For stops, the closure was demarcated at the onset of the burst, except in the few cases where no burst was visible, in which case the closure was segmented at the onset of F2. When a released stop was adjacent to a stop or a fricative, a boundary was placed at the offset of the stop burst or frication and the onset of the closure. In the case of adjacent fricatives (almost always a sibilant–non-sibilant boundary), a boundary was placed where there was an abrupt change between the high intensity of the sibilant and the low intensity of the non-sibilant. A marked change in the concentration of the frication noise was also usually evident at this juncture. The data were segmented by a research assistant who was not informed of the purpose of the study.
Each file was double checked by the author to ensure that the segmentation criteria had been met, and about 3% of the total data was adjusted after the original segmentation.
Since a main purpose of this study is to investigate the effect of the surrounding context, word and phrase position, and lexical stress on voicing in the target obstruents, this information was collected by converting each of the stories into the Carnegie Mellon University (CMU) pronouncing dictionary transcription system and then searching these transcripts to keep track of a number of variables. First, the sounds preceding and following the target obstruent were classified as to whether they were vowels, approximants, nasals, voiced fricatives, voiceless fricatives, voiced stops, or voiceless stops. Affricates were grouped with stops with the matching voicing specification to remain consistent with the grouping of the target affricates with the stops. Second, using the CMU notations that indicate stress (a ‘1’ following the vowel when it is stressed, and ‘0’ when it is unstressed), the target obstruents were also coded for the stress specification of the preceding and following vowels. Third, the obstruents were coded for word position (word-initial, medial or final). Fourth, the phrase position of the obstruents was designated as phrase-initial (preceded by a pause), phrase-medial (in the middle of an utterance, regardless of the word position) or phrase-final (followed by a pause). The determination of phrase/utterance-initial and phrase/utterance-final was based on whether there was a comma, period, exclamation mark or colon in the transcript. As a check on this orthographically-based metric, we determined that these punctuation marks typically coincided with the ‘sp’ notation that is inserted in the textgrid when the Forced Aligner detects a period of silence that is long enough to be characteristic of a pause. However, this method of marking pauses has some limitations which must be considered.
In particular, one voiceless /t/ following a comma was removed from the dataset after noting that it was the one consonant in phrase-initial position that was often voiced by some speakers.¹ Subsequent inspection of this sentence indicated that speakers did not always pause as a result of the comma, and so phonation from the previous nasal carried over to the target consonant. Once this sentence was removed from the database, no other phrase-initial voiceless stops with phonation were observed.
Due to the large number of speakers, and obstruents that met the criteria for inclusion in the study (N = 12,498), our manual acoustic segmentation was limited to the target obstruents. As a result, we were not able to investigate each individual obstruent to ensure, for example, that the adjacent consonant was not deleted, or that stress was produced as would be expected from the CMU transcription. While the large number of tokens should mitigate this concern to some extent, it is important to make note of the potential limitations of this method.
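The stress coding described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used: the function names are hypothetical, and the treatment of the CMU secondary-stress digit ‘2’ as unstressed is an assumption, since the text specifies only ‘1’ and ‘0’.

```python
import re

# In CMU-style transcriptions, vowels carry a trailing stress digit,
# e.g. "AH0" (unstressed) or "IY1" (primary stress); consonants have none.
VOWEL = re.compile(r"^[A-Z]+([012])$")


def stress_of(phone):
    """Return 'stressed', 'unstressed', or None for a non-vowel phone."""
    match = VOWEL.match(phone)
    if match is None:
        return None
    # Assumption: only '1' counts as stressed; how secondary stress ('2')
    # was handled is not stated in the text.
    return "stressed" if match.group(1) == "1" else "unstressed"


def neighboring_stress(phones, target_index):
    """Stress of the nearest vowel before and after the target obstruent."""
    before = [stress_of(p) for p in reversed(phones[:target_index])]
    after = [stress_of(p) for p in phones[target_index + 1:]]
    preceding = next((s for s in before if s is not None), None)
    following = next((s for s in after if s is not None), None)
    return preceding, following


# "on tour" transcribed as AA1 N / T UH1 R; the target /t/ is at index 2.
print(neighboring_stress(["AA1", "N", "T", "UH1", "R"], 2))
```

As the limitations noted above imply, a lookup of this kind reflects the dictionary transcription and not what each speaker actually produced.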
After identifying the target obstruent, a Praat script was used to measure the duration of the obstruent interval and the proportion of phonation in the interval. The latter measurement was only applied to target obstruents in phrase-medial position, since acoustic data alone do not show where phrase-initial stops begin, or where phrase-final stops end if they are unreleased. (While the proportion of phonation in fricatives could have been measured regardless of phrase position, that measurement was not included here in order to be able to compare stops and fricatives.) The proportion of phonation was obtained using the fraction of locally unvoiced frames measure that is implemented in Praat's Voice Report. The pitch settings were optimized for voice analysis as described in the Praat manual (for other uses of this measure, see also Bárkányi & Kiss 2009, Eager 2015, Davidson 2016). This measure reports the proportion of voicelessness in an interval, which was converted to a proportion of phonation.
The amount of phonation during the closure of a voiceless obstruent is analyzed in two ways, both as a categorical classification and as a measure of partial phonation. For the categorical measure, each obstruent was classified as to whether it was phonated (greater than 90% of the interval was identified as voiced by the Praat Voice Report), unphonated (less than 10% of the interval was identified as voiced) or partially phonated (between 10% and 90% of the interval was voiced). This range was chosen (instead of 0% and 100%) to err on the conservative side, so that cases were not excluded from being fully unphonated or phonated by just one glottal pulse.
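The conversion and the three-way classification can be expressed compactly. The sketch below is illustrative and uses hypothetical function names; it assumes the fraction of locally unvoiced frames has already been read from the Voice Report as a value between 0 and 1.

```python
def phonation_proportion(fraction_unvoiced):
    """Convert a proportion of voicelessness into a proportion of phonation."""
    return 1.0 - fraction_unvoiced


def classify_phonation(proportion_voiced, lower=0.10, upper=0.90):
    """Apply the 10%/90% thresholds described in the text."""
    if proportion_voiced > upper:
        return "phonated"
    if proportion_voiced < lower:
        return "unphonated"
    # Everything from 10% to 90% inclusive counts as partial phonation.
    return "partially phonated"


# An interval that is 96% unvoiced has only about 4% phonation.
print(classify_phonation(phonation_proportion(0.96)))
```

With these thresholds, a token with a single stray glottal pulse at one edge still falls in the unphonated category, which is the conservative behavior the range was chosen to provide.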
In addition to calculating the proportion of phonation for the whole duration of the frication or closure, the duration of each fricative and stop was also divided into thirds in order to determine the shape of phonation in the partially phonated category. This measure indicates, for example, whether the proportion of phonation steadily increased or decreased over each of the three intervals, or whether it decreased from the first to the second and then increased again. By examining the smaller intervals, we can determine, for example, whether there is phonation bleed from a preceding sound only, or perhaps whether there is both carryover phonation at the left edge and anticipatory phonation at the right edge. This measure is described in more detail in Section 3.2.
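To make the idea of dividing an interval into thirds concrete, the following sketch computes the proportion of phonation in each third from frame-level voicing decisions and assigns a coarse label. The labels ‘bleed’ and ‘trough’ follow the terms introduced above, but the decision rules, the ‘anticipatory’ label, and the function names are assumptions made for illustration; the actual criteria are those given in Section 3.2.

```python
def proportion_by_thirds(voiced_frames):
    """Proportion of voiced frames in each third of an obstruent interval."""
    n = len(voiced_frames)
    if n < 3:
        raise ValueError("need at least three frames to form thirds")
    bounds = [0, n // 3, (2 * n) // 3, n]
    return [
        sum(voiced_frames[start:end]) / (end - start)
        for start, end in zip(bounds, bounds[1:])
    ]


def describe_shape(thirds):
    """Hypothetical labeling rule for a partially phonated token."""
    first, second, third = thirds
    if first > second and third > second:
        return "trough"        # phonation dies out, then re-emerges
    if first >= second >= third and first > third:
        return "bleed"         # carryover from the preceding sound
    if first <= second <= third and first < third:
        return "anticipatory"  # phonation builds toward the offset
    return "other"


# Twelve frames with voicing only in the first third of the closure.
frames = [True] * 4 + [False] * 8
print(describe_shape(proportion_by_thirds(frames)))
```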
3 Results
3.1 Categorical voicing measure
The analyses in this section examine the role of segmental and prosodic variables on the proportion of obstruents realized as fully phonated (N = 219), unphonated (N = 5077), or partially phonated (N = 7202) as defined in Section 2.2. In order to facilitate comparison with the voiced stops that were analyzed in Davidson (2016), the analyses for this study match those that were performed for the voiced stops.
3.1.1 Effects of phrase position
The first analysis focuses on the effect of phrase position on the presence of phonation in voiceless obstruents. In phrase-initial stops, it is not possible to be certain whether the appearance of phonation coincides with the onset of the closure or whether it begins after the closure, just as it is not possible to know whether the end of phonation coincides with the end of a stop closure in final position if the stop is not released. Therefore, for the analysis of phrase position, phonation is treated as a binary variable with the levels ‘unphonated’ and ‘phonated’, which indicates the presence of phonation of any duration. This is a simple model that includes only manner of the target obstruent and phrase position, since variables such as preceding or following stress, or preceding segment will not apply to all of the phrase positions. A more complex model including prosodic and segmental factors will follow using only phrase medial stimuli.
The analysis of phrase position is a binomial mixed effects regression using lme4 in R (Bates et al. 2014, R Development Core Team 2013), with fixed effects of obstruent manner (stop, fricative) and phrase position (initial, medial, or final). Words and speakers are treated as random effects, with random intercepts for words and random slopes (for manner and phrase position) and intercepts for speakers. There are significantly fewer stops produced with any phonation than fricatives (β = –0.96, p < .001). There are significantly fewer tokens with phonation in initial and final position than in medial position (initial: β = –3.80, p < .001, final: β = –1.69, p < .001), and fewer tokens with phonation in initial position than in final position (β = –2.11, p = .002). There were no significant interactions between manner and phrase position. These results are given in Table 1.
Table 1 Proportion of obstruents with phonation (vs. no phonation) in initial, medial and final phrase position (medial phrase position includes all possible word positions).
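As a reading aid for the coefficients reported above, the sketch below converts the log-odds estimates into odds ratios and checks that the initial-versus-final contrast equals the difference between the two medial-baseline coefficients. It is an arithmetic illustration only, not a re-analysis: the variable names are hypothetical, and the equality assumes treatment coding with medial position as the reference level.

```python
import math

# Log-odds coefficients as reported in the text (baseline: medial position).
beta_stop = -0.96      # stops vs. fricatives
beta_initial = -3.80   # initial vs. medial
beta_final = -1.69     # final vs. medial


def odds_ratio(beta):
    """Exponentiate a log-odds coefficient to obtain an odds ratio."""
    return math.exp(beta)


# Contrast between initial and final position on the log-odds scale.
initial_vs_final = beta_initial - beta_final

print(round(initial_vs_final, 2))
for name, beta in [("stop", beta_stop),
                   ("initial", beta_initial),
                   ("final", beta_final)]:
    print(name, round(odds_ratio(beta), 3))
```

An odds ratio below 1 indicates lower odds of any phonation relative to the baseline, so the more negative coefficient for initial position corresponds to the smallest odds ratio of the three.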

3.1.2 Phrase-medial position: Effects of word position, stress, and preceding segment
The next analysis examines the effects of word position, preceding stress, following stress, preceding segment, and obstruent duration on whether the target obstruents are realized with or without phonation. For this analysis, a subset of data containing only obstruents in phrase-medial position (but including all potential word positions) was analyzed to ensure that none of the variables were undefined for some levels (e.g. preceding stress in phrase-initial position). Since the proportion of fully phonated obstruents was extremely low (1.7%) and these tokens were not distributed across all of the word positions and manner types, they were combined with the partially phonated obstruents to create a binomial variable (phonated, unphonated) for the categorical analysis of phonation. The fixed effects included target manner (stop, fricative), word position (initial, medial, final), preceding stress (stressed, unstressed), following stress (stressed, unstressed), preceding segment (approximant+vowel [referred to as ‘approximant’ in the figures], nasals, voiced fricatives, voiceless fricatives, voiced stops, and voiceless stops), and a numeric predictor of obstruent duration. Note that the stress variables account for preceding or following stress, but because of the many word and phrase types in the corpus, the target obstruent could be between two stressed or two unstressed syllables, or could be surrounded by syllables of opposite stress specifications. Since there were so many combinations to account for, only the preceding and following stress specifications were used in order to make the analysis more manageable.
Because there were expected to be significant differences between stops and fricatives, interaction terms for manner and preceding segment were also included. The baseline values for the predictors were fricative, medial word position, unstressed (for preceding and following stress), and vowels for preceding segment. Random intercepts for words and speakers are included, but random slopes were not included as that model failed to converge.
Results for this analysis are given in Table 2, and the fixed effects are illustrated in Figures 1–3. These results demonstrate that there are significantly fewer stops with phonation than fricatives. As for word position, there is phonation significantly less often in initial position than in medial position, but the almost-significant interaction between manner and word position indicates that this effect may be due to the bigger difference between the proportion of tokens with some phonation in medial versus initial position for fricatives (initial: 60%, medial: 75%) than for stops (initial: 59%, medial: 54%). When stress precedes the target obstruent, the obstruent is more likely to be produced with no phonation, but there is no effect of following stress. As for the preceding segment, all other consonant sounds lead to significantly less phonation as compared to a preceding vowel/approximant. The interaction between target stops and preceding nasals (e.g. ‘on tour’) indicates that there is significantly more phonation in the closure for stops than for fricatives when preceded by nasals (and indeed, this is the category where almost all of the fully voiced tokens appear). There is no robust effect of obstruent duration, though there is a trend toward producing longer closure intervals as fully voiceless.
Table 2 Logistic regression coefficients in log-odds for voicing categories for phrase-medial obstruents.

* = significance of at least p < .05

Figure 1 Proportions of phrase-medial voiceless, partially voiced, and fully voiced obstruents by word position. Note that fully voiced tokens are included for illustrative purposes although they are combined with the partially voiced tokens for the statistical analysis.

Figure 2 Proportions of phrase-medial unvoiced, partially voiced, and fully voiced obstruents by preceding vowel's specification for stress (top), and following vowel's specification for stress (bottom).

Figure 3 Proportions of unphonated, partially phonated, and fully phonated obstruents by preceding segment.
The results from this section provide an overview of the factors that lead to at least some phonation during voiceless obstruent constrictions. Overall, regardless of phrase or word position, fewer stops are produced with any phonation at all as compared to fricatives. As for phrase position, both phrase initial and final obstruents are less often produced with phonation than obstruents in phrase medial position, but overall the most tokens with no phonation are produced in absolute utterance initial position. Obstruents in medial position show effects of both word position and surrounding context. Whereas more fricatives are produced with phonation in word medial position than in word initial and final position, there is no effect of position for stops. As for the effect of the preceding sound, approximants and nasals both lead to more than 50% of the target obstruents being produced with phonation, but the effect is greater for stops than for fricatives. For stops, 86% preceded by nasals are produced with phonation, compared to 66% preceded by approximants. For fricatives, on the other hand, a preceding approximant gives rise to 81% with some phonation, versus 54% for preceding nasals. By comparison, no more than 20% of target obstruents have any phonation at all when preceded by any obstruent, whether voiced or not. The results for the effects of stress show that there is more voicelessness when stress precedes the target obstruent but no effect when it follows, and the findings for duration show only a marginal trend toward fewer tokens with phonation as the closure duration gets longer. The results for stress, which are not on the surface consistent with previous findings, will be considered further in the General Discussion.
To better understand these results, and to address the question of whether all of the phonation present during a voiceless obstruent is necessarily carryover voicing from the preceding sound, as suggested by some authors (e.g. Lisker & Abramson Reference Lisker and Abramson1964, Docherty Reference Docherty1992), the next section presents results from the analysis of phonation shape. As briefly mentioned in the introduction, phonation shape refers to the proportion of voicing at the beginning, middle, and end of the obstruent constriction, as measured by dividing up the constriction into three equally spaced intervals. In the following section, the phonation shape for partially-phonated voiceless obstruents is investigated.
3.2 Partial phonation shape
Davidson (Reference Davidson2016) found four possible descriptive phonation shapes in American English obstruents that are phonologically voiced. The first type is called bleed, which refers to the presence of phonation that carries over from the preceding sonorant but dissipates before the onset of the stop release or before the end of frication. This is illustrated in Figure 4 (top panel). The next type is called trough, which indicates a pattern in which phonation carries over from the preceding sound, then dies out, but then reappears before the end of the stop closure or fricative constriction (see Figure 4, middle panel). Negative VOT is the third type, after standard descriptions of phonation that often starts partway through the closure or frication period and continues beyond the end of the stop closure or aperiodic frication (see Figure 4, bottom panel). The last type, called hump, refers to partial phonation that is present only in the middle of the constriction period and does not extend to either edge. This type is exceedingly rare for voiceless stops and not present for all manner/phrase position combinations.

Figure 4 Examples of the main three partial phonation shapes: bleed, trough, and negative VOT. Top panel: Example of bleed in [k] of the phrase ‘to Queen’. Middle panel: Example of trough in [s] in the word ‘noticing’. Bottom panel: Example of negative VOT in [h] in the phrase ‘still hungry’.
The analysis of phonation shape for obstruents in phrase-medial position was operationalized as follows. An obstruent was characterized as bleed if the proportion of phonation decreased from the first to the third interval (often being completely absent in the second and third intervals), and as negative VOT if the proportion of voicing increased from the first to the third interval (typically being completely absent in the first, and often second, intervals). An obstruent was labeled as trough if there was a greater proportion of voicing in the first and third intervals than in the second (usually with no phonation in the middle interval), and as hump if the proportion of voicing increased from the first to the second interval and then decreased again. A more detailed breakdown including word position is given in Figure 5, but overall, 67% of phrase-medial obstruents were produced with the bleed pattern and 29% with the trough pattern. Only 1% of tokens showed the hump pattern and 3% were produced with the negative VOT pattern.
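The operational definitions above can be sketched as a small classifier over the three per-interval proportions of phonation. This is only an illustration of the stated criteria, not the procedure actually used in the study: the function name, the order in which the criteria are checked (non-monotonic shapes first), and the handling of ties and of fully phonated or unphonated tokens are all assumptions made here.

```python
def classify_shape(p1, p2, p3):
    """Label a token from the proportion of phonation (0.0-1.0) in the
    first, second, and third intervals of the constriction.

    Assumption: trough and hump are checked before bleed and negative
    VOT, so a token with a dip in the middle counts as trough even if
    its third interval is lower than its first.
    """
    if p1 == 0 and p2 == 0 and p3 == 0:
        return "unphonated"
    if p1 == 1 and p2 == 1 and p3 == 1:
        return "fully phonated"
    if p1 > p2 and p3 > p2:
        return "trough"        # phonation at both edges, less in the middle
    if p2 > p1 and p2 > p3:
        return "hump"          # phonation peaks in the middle interval
    if p1 > p3:
        return "bleed"         # carryover phonation that dies out
    if p3 > p1:
        return "negative VOT"  # phonation starts late and continues
    return "unclassified"      # ties not covered by the definitions


# Hypothetical tokens, one per shape.
EXAMPLES = {
    (0.8, 0.0, 0.0): classify_shape(0.8, 0.0, 0.0),
    (0.5, 0.0, 0.2): classify_shape(0.5, 0.0, 0.2),
    (0.0, 0.0, 0.5): classify_shape(0.0, 0.0, 0.5),
    (0.0, 0.4, 0.0): classify_shape(0.0, 0.4, 0.0),
}
for proportions, label in EXAMPLES.items():
    print(proportions, "->", label)
```

Checking the non-monotonic shapes first matters because a trough token with weak final phonation also satisfies the looser description of bleed (less phonation in the third interval than in the first).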

Figure 5 Proportion of voicing shapes for partially phonated fricatives and stops in phrase-medial position. ‘Initial’, ‘medial’, and ‘final’ refer to word position. See text for explanation of ‘voicing shapes’.
In order to investigate the differences between stops and fricatives that are visually evident in Figure 5, a logistic analysis of trough and bleed patterns was carried out for obstruents containing partial phonation. Negative VOT and hump patterns are not included in the analysis because they comprise an extremely small proportion of the data. This binomial logistic regression included fixed effects of manner (stop, fricative) and word position (initial, medial, final). The reference values were fricatives and word initial position. Words were included as random factors with random intercepts, and subjects with random slopes and intercepts for all of the fixed effects. There was a significant effect of manner, since stops are produced with significantly less of the trough pattern than fricatives (β = –1.63, p < .001). For word position, final position has significantly less trough than initial position (β = –0.63, p < .002), and an interaction between manner and word-final position occurs because there is more bleed for final stops than for final fricatives (β = 0.87, p < .002).
In addition to this categorical analysis of phonation shape, an examination of the continuous proportion of phonation over the three intervals of the constriction addresses the amount of phonation produced at the beginning, middle and end of the constriction interval. While the trough pattern, for example, indicates phonation at both the beginning and end of the obstruent, it does not reveal which interval has a higher proportion of phonation. This analysis is a mixed effects linear regression using lme4 in R, again on the same subset of phrase-medial words. The dependent variable is proportion of phonation, and the fixed effects are manner (stop, fricative), word position (initial, medial, final), interval (first, second, third), and obstruent duration. Manner, word position and interval were fully crossed. Obstruent duration was included as a main effect and in two-way interactions with manner, word position and interval, in order to avoid a proliferation of three- and four-way interactions. Words were included as random factors with random intercepts, and subjects with random slopes and intercepts for manner and word position. The fixed effects are sum coded. Using the guidelines established by Gelman & Hill (Reference Gelman and Hill2006), and widely used in psychological and linguistic research, a factor was considered significant if the absolute value of the t-value was greater than 2. The proportions of voicing in the three intervals for each manner and word position are shown in Figure 6.
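Two of the conventions mentioned above, sum coding of the fixed effects and the |t| > 2 criterion, can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the choice of which level receives the –1 codes is an assumption, since the text does not say which level of each factor was left out.

```python
def sum_contrasts(levels):
    """Return sum (deviation) codes for a factor as {level: [codes]}.

    Assumption: the last level listed is the one coded -1 in every
    column; each column then sums to zero across levels.
    """
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            coding[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            coding[level] = [-1] * (k - 1)
    return coding


def is_significant(t_value, threshold=2.0):
    """Apply the |t| > 2 rule of thumb used for the linear model."""
    return abs(t_value) > threshold


# Hypothetical ordering of the word position levels.
WORD_POSITION_CODES = sum_contrasts(["initial", "medial", "final"])
for level, codes in WORD_POSITION_CODES.items():
    print(level, codes)
```

With sum coding, each coefficient is read as a deviation from the grand mean over the levels of the factor rather than from a single baseline level.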

Figure 6 Proportion of phonation for phrase-medial stops and fricatives by interval. ‘Initial’, ‘medial’, and ‘final’ refer to word position.
Statistical results are presented in Table 3. Overall, there is significantly more phonation in the first interval as compared to the other two, and no significant difference between the second and third intervals. There is generally more phonation in medial position than in final position, but no significant difference between initial and final positions. The interactions between obstruent type and interval indicate that there is more carryover phonation for stops in both the first and second intervals as compared to fricatives, but Figure 6 shows that stops and fricatives have the same proportion of phonation by the time the sound is in the third interval. The interaction between the first interval and word-initial position is due to the fact that the proportion of phonation is greatest for fricatives (relative to fricatives in other word positions) but lowest for stops (relative to the other word positions for stops) in this position. The three-way interactions for stops, both the first and second interval, and initial word position are attributable to the greater difference between the proportion of phonation in initial and medial stops relative to those positions in fricatives, as compared to final position. The significant interactions between interval, manner and word position with obstruent duration all indicate that a longer duration gives rise to a lower proportion of phonation.
Table 3 Linear mixed effects regression model for proportion of voicing for phrase-medial obstruents.

* = significant t-values – those with an absolute value greater than 2
(*) = marginally significant t-values (t = 1.9)
The results shown in Figure 6 are further informed by dividing the data into the proportions for bleed and trough separately. These results are given in Table 4, showing that for the third interval, the proportions of phonation are all under 3% for the bleed tokens and between 16% and 28% for the trough tokens. (Note that the number of bleed tokens with phonation in the third interval is very low: N = 124, or 2.6% of all items in the bleed category.)
Table 4 Proportion of phonation for phrase-medial fricatives and stops by interval (1st, 2nd, 3rd) divided into those categorized as having the bleed or trough patterns. ‘Initial, medial, final’ refer to word position.

The results from both the categorical phonation shape and the proportion of phonation by intervals indicate that in phrase medial position, stops with partial phonation largely reflect the bleed pattern and fricatives are more evenly split between the trough and bleed patterns, with a decrease in the trough pattern in final position. However, as illustrated in Figure 6 and Table 4, even the small rise in phonation from the second to third intervals for the trough pattern is a much lower proportion of phonation than in the initial interval: for fricatives, an average of 17% in the third interval (collapsing over word position) vs. 47% for the first interval and for stops, 26% vs. 62%. Moreover, the proportion of phonation in voiceless obstruents is in some ways different from the pattern seen for voiced fricatives in Davidson (Reference Davidson2016), which showed greater rates of phonation in the third interval (voiced fricative average over all word positions is 43%; the voiced stop average of 22% is similar to voiceless stops).
4 Discussion
Consistent with previous studies that have observed some amount of phonation during the constrictions of voiceless obstruents, this study also finds such evidence of phonation. Similar to the voiced obstruents in Davidson (Reference Davidson2016), it is conditioned by factors such as surrounding segmental context and phrase position. The phonation shape measure for both types of obstruent confirms that partial phonation can be present in multiple ways, with the bleed and trough patterns being the most frequently observed.
4.1 Influence of contextual factors on obstruent phonation
4.1.1 Effects of phrase position
The results for phrase position for voiceless obstruents indicate that the greatest proportion of phonation is found in phrase-medial position (collapsing over word position), followed by phrase-final and then phrase-initial positions. More than half of both stops and fricatives in medial position have partial phonation, whereas final fricatives have twice as much phonation as final stops (44% vs. 22%). In contrast, just over 10% of phrase-initial fricatives and stops have some partial phonation. The somewhat greater proportion of phonation in phrase-final position as compared to phrase initial position is expected, because of the asymmetry of the lower amount of threshold pressure needed to maintain phonation from a preceding sonorant as compared to the higher level required to initiate it after a pause (Plant, Freed & Plant Reference Plant, Freed and Plant2004, Koenig et al. Reference Koenig, Fuchs and Lucero2011).
Regarding the presence of any phonation in post-pausal position, inspection of voiceless fricative tokens indicates that the cases of partial voicing in utterance-initial position usually correspond to the ‘negative VOT’ phonation shape pattern, and the amount of phonation that overlaps with frication is generally less than 10% of the constriction duration. While negative VOT for voiceless stops does not seem to occur in this environment, there is evidence of Lisker & Abramson's (Reference Lisker and Abramson1964) ‘edge vibrations’ on the right for phrase initial fricatives.
4.1.2 Effects of word position
Phrase-medial position allows for the examination of the effect of word position on phonation during voiceless obstruents, which is an effect that is potentially distinct from phrase position. Whereas phrase position had some significant effects, with phrase-initial and phrase-final obstruents patterning differently from one another, word position largely had no effect either within or across the two different manners. Figure 1 shows that for fricatives and stops in word-initial, medial, and final position, between 40% and 46% of the tokens in each category were completely unphonated, except for word-medial fricatives, which had significantly more tokens with partial phonation (only 24% were unphonated). The greater likelihood of phonation in the word-medial fricatives may be due to less variable coordination between the target fricative and surrounding vowels/approximants within a word than across a word boundary (Nam Reference Nam, Cole and Ignacio Hualde2007, Cho, Yoon & Kim Reference Cho, Yoon and Kim2014), which could allow for edge phonation on both sides. Although a similar coordination relationship may occur for medial stops, the overlap on the right in particular is more conducive to anticipatory phonation for fricatives, as research shows that stops have larger rates of peak airflow, minimum flow, and open quotient (the proportion of time in a glottal cycle when the vocal folds are open) than fricatives do, which would tend to delay phonation (Löfqvist, Koenig & McGowan Reference Löfqvist, Koenig and McGowan1995, McGowan, Koenig & Löfqvist Reference McGowan, Koenig and Löfqvist1995). Thus, phonation is more likely to begin before the end of frication as compared to the stop release. This is corroborated by the results for phonation shape in Figure 5, which show greater proportions of the trough pattern for fricatives than for stops.
In comparison, voiced obstruents in Davidson (Reference Davidson2016) exhibit substantially greater rates of partial and full voicing in all phrase medial positions (between 80% and 95%, except for word-initial stops, which had 65% of tokens with at least some phonation; see Davidson (Reference Davidson2016) for an explanation of this effect).
4.1.3 Effects of stress placement
The results for stress placement reveal a counterintuitive finding: target obstruents with preceding stress are significantly more likely to be produced as fully unphonated (e.g. rituals [ˈrɪtʃuəlz]) than when the preceding syllable is unstressed, whereas there is no effect of stressed vs. unstressed for the following syllable. This result is unexpected since previous phonetic accounts of lenition, or consonant reduction, in intervocalic position, have shown that voiced consonants in the onset of unstressed syllables can be produced with full voicing, and often with formant structure, or without a burst (Lavoie Reference Lavoie2001, Warner & Tucker Reference Warner and Tucker2011, Bouavichith & Davidson Reference Bouavichith and Davidson2013, Davidson Reference Davidson2016). In line with flapping of coronal stops, this tends to be a weakened position in American English.
Previous outcomes for voiced obstruents might lead to the expectation that voiceless obstruents would also show some amount of weakening in post-stressed or pre-unstressed position, which could be manifested as at least partial phonation during the obstruent constriction (Gurevich Reference Gurevich, van Oostendorp, Ewen, Hume and Rice2011). However, in this study this environment is complicated by the fact that phrase-medial position has tokens that cross word boundaries, and it is possible that two stressed syllables could occur in a row across a word boundary (e.g. some famous), depending on whether or not speakers avoid stress clash. Investigation of the breakdown of the stress on the syllable following post-stressed obstruents shows that 57% are stressed. In this post-stressed environment, there are substantial differences between tokens with some phonation and those that are fully voiceless depending on the following stress specification: 58% of obstruents are fully voiceless when there is following stress, versus 31% when the following vowel is unstressed. Thus, the relatively large number of target obstruents between stressed vowels might account for the apparent finding that there is greater voicelessness in post-stress obstruents.
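The way the mix of following-stress contexts drives the pooled post-stress rate can be made concrete with a weighted average of the two conditional rates given above. The sketch below assumes that the three percentages all describe the same set of post-stress tokens, which the text implies but does not state outright. The variable names are illustrative, and the pooled values are derived here rather than reported in the study.

```python
# Percentages taken from the paragraph above, expressed as proportions.
P_FOLLOWING_STRESSED = 0.57      # post-stress tokens followed by a stressed syllable
P_VOICELESS_IF_STRESSED = 0.58   # fully voiceless, given following stress
P_VOICELESS_IF_UNSTRESSED = 0.31 # fully voiceless, given no following stress

# Pooled rate of fully voiceless tokens among post-stress obstruents.
POOLED_VOICELESS = (
    P_FOLLOWING_STRESSED * P_VOICELESS_IF_STRESSED
    + (1 - P_FOLLOWING_STRESSED) * P_VOICELESS_IF_UNSTRESSED
)

# Share of the fully voiceless post-stress tokens that sit between two stresses.
SHARE_BETWEEN_STRESSES = (
    P_FOLLOWING_STRESSED * P_VOICELESS_IF_STRESSED / POOLED_VOICELESS
)

print(f"pooled fully voiceless rate: {POOLED_VOICELESS:.2f}")
print(f"share of those between two stresses: {SHARE_BETWEEN_STRESSES:.2f}")
```

Under these assumptions the pooled rate comes out at about 46%, and roughly 71% of the fully voiceless post-stress tokens are those followed by another stressed syllable, which is in line with the suggestion that obstruents between two stressed vowels account for much of the apparent post-stress effect.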
4.1.4 Effects of preceding context
The results for preceding segment are clear: vowels/approximants and nasals are the contexts that give rise to substantial partial voicing for voiceless obstruents, whereas any type of preceding obstruent markedly increases the likelihood that a voiceless obstruent will be produced as fully voiceless. This is similar to the pattern of partial voicing and devoicing that was found for voiced obstruents in Davidson (Reference Davidson2016). The effect of a preceding obstruent in conditioning the voicelessness of a subsequent obstruent, regardless of the first obstruent's own underlying voicing specification, is consistent with previous research arguing that it is especially difficult to maintain phonation throughout the extended constriction duration of a consonant cluster or sequence, so even if the first consonant is specified for voicing, it would be unlikely for phonation from that consonant to carry over into the second one (Westbury & Keating Reference Westbury and Keating1986, Ohala & Kawasaki-Fukumori Reference Ohala, Kawasaki-Fukumori, Eliasson and Hakon Jahr1997).
One notable finding for both voiced and voiceless target obstruents is that a preceding nasal has different effects before a fricative than before a stop. For stops, a preceding nasal leads to more partial voicing and is the only preceding segment that leads to appreciable full voicing in the target voiceless stop (16%). If the raising of the velum occurs considerably later than onset of the constriction of the target stop, partial or even full voicing could ensue due to nasal venting (Rothenberg Reference Rothenberg1968, Westbury Reference Westbury1983, Ohala & Ohala Reference Ohala and Ohala1993, Solé Reference Solé, Vigário, Frota and João Freitas2009). Davidson (Reference Davidson2016) argued that for voiced stops, English speakers can take advantage of this preferential aerodynamic environment to achieve a fully voiced segment, and the current study suggests that there is little imperative to realize a voiceless stop with no phonation (by quickly raising the velum) following a nasal in many cases. In fact, to the extent that there is partial phonation following obstruents, it is often because the target obstruent is two segments from a nasal, as in the /k/ of different kind. In these cases, the first obstruent may be deleted or assimilated to the second one, or the nasal can lend phonation to both following obstruents. On the other hand, for fricatives, it has been argued that an open velopharyngeal port could impair strong frication noise (Solé Reference Solé, Vigário, Frota and João Freitas2009), so speakers may make more effort to quickly close the velum to protect the frication noise in both voiceless and voiced obstruents. To verify these speculations, the timing of the velum movement before voiced and voiceless obstruents would have to be measured (see Bell-Berti Reference Bell-Berti1993, for evidence that surrounding context can affect the timing and speed of the opening and closing of the velum).
4.2 Phonation shape: Where ‘edge vibrations’ are found
While an overall picture of unphonated versus partially phonated voiceless obstruents provides insight into how linguistic factors such as adjacent segment, phrase and word position, and surrounding stress condition the presence of phonation in obstruents, it does not address where in the constriction the phonation occurs. For voiceless obstruents, it is of interest to know if phonation is always carryover voicing from a preceding phonated sound, or if the aerodynamic conditions also permit voicing to begin before the constriction ends. Such cases have already been discussed above (e.g. partial voicing in phrase-initial fricatives), but they are corroborated by the graphs of obstruents produced with partial phonation in Figure 5 and Figure 6. When voiceless obstruents contain some phonation, the trough and bleed patterns are the most common phonation shapes. However, whereas fricatives are more evenly split between trough and bleed, more than 80% of stops with partial voicing are produced with the bleed pattern. An examination of the data suggests that the trough pattern in stops tends to coincide with intervocalic consonants that are produced with a closure, but do not necessarily have a burst, suggesting that the intraoral pressure was relatively low and therefore more conducive to starting the vocal fold vibration necessary for the following vowel/approximant (Koenig et al. Reference Koenig, Fuchs and Lucero2011). As for fricatives, weaker frication at the edges at the moments of transition and overlap between the fricative and adjacent sonorants is more likely to allow for the presence of short periods of edge phonation (Docherty Reference Docherty1992), whereas the typical high pressure and open larynx up to the burst in a voiceless stop ensures that phonation will not begin until after the burst is released (Löfqvist et al. Reference Löfqvist, Koenig and McGowan1995, Fuchs & Koenig Reference Fuchs and Koenig2009).
More on the timing of the oral and laryngeal gesture will be discussed below.
When the general pattern of phonation shape is compared to the proportion of phonation in each of the three intervals of the closure duration as shown in Figure 6, an even clearer picture emerges. First, for both stops and fricatives, the proportion of phonation drops substantially from the first to the second interval. For the obstruents categorized as trough, reported in Table 4, phonation falls to less than 5% of the second interval and then increases to 16%–28% in the third interval. For those categorized as bleed, there is a steep drop-off from the first to the second interval and nearly no phonation at all in the third interval. The shape distributions between stops and fricatives are consistent with what was found for voiced obstruents in Davidson (Reference Davidson2016). However, the overall rates of phonation in each interval for voiceless obstruents are considerably lower than those for their voiced counterparts. For example, whereas voiceless obstruents show only a small increase in edge voicing in the third interval, voiced fricatives rose back up to 59% (collapsing over word position) and voiced stops to 62% (keeping in mind that, as with the voiceless obstruents, the total number of voiced stops in the trough category was much smaller than the number of fricatives).
One last point to make is that a different picture of phonation during obstruent intervals is drawn depending on whether one looks at a categorical measure of phonation (totally phonated, partially phonated, fully unphonated) or at a proportional measure of phonation shape. Whereas the categorical measure indicates that more fricatives have some phonation than stops do, the phonation shape measure indicates that the phonation that does occur is more extensive for stops than for fricatives. While both partially phonated stops and fricatives have much more phonation in the first interval than in the other two, stops in all word positions still have more phonation than fricatives in all word positions in the first interval, as shown in Figure 6. This is consistent with an analysis of airflow in stops and fricatives by Koenig et al. (Reference Koenig, Fuchs and Lucero2011), who note that ‘[c]losely-approximated vocal folds entering stops should facilitate phonation and allow it to persist to a higher level of [intraoral pressure], whereas earlier abduction for fricatives would inhibit phonation’ (3235). It is also consistent with the comparison of voiceless fricatives and stops in Docherty (Reference Docherty1992), who found that when voiceless stops have phonation during the closure, it tends to be longer in duration than the counterpart phonation in fricatives. Docherty also discusses the difference between amounts of bleed in fricatives and stops in terms of the coordination of the laryngeal opening gesture relative to the oral gesture; we turn to this point in the last section.
4.3 Implications for the gestural representation of obstruents
The current results, in combination with previous acoustic findings and articulatory studies of laryngeal movement and oral pressure, have ramifications for proposals for the representation of voicing contrasts within gestural theories like Articulatory Phonology. The typical distinction that Articulatory Phonology makes between voiced and voiceless stops in American English is that there is no specific laryngeal gesture for voiced stops, because vocal fold vibration is considered the default when a speaker is in the speech-ready state (Browman & Goldstein Reference Browman and Goldstein1986, Reference Browman and Goldstein1992). In contrast, voiceless stops are specified as having glottal abduction, and in the case of aspirated stops, the peak of the glottal opening is timed to coincide with the offset of the oral gesture. Moreover, the glottal gesture in this case extends beyond the release of the oral closure, which gives rise to the acoustic realization as aspiration. This timing relationship is based on articulatory reports using methods such as transillumination, electromyography, photo‐electric glottography, and fiberoptic endoscopy (e.g. Löfqvist & Yoshioka Reference Löfqvist and Yoshioka1981, Yoshioka, Löfqvist & Hirose Reference Yoshioka, Löfqvist and Hirose1981, Löfqvist & McGarr Reference Löfqvist, McGarr, Baer, Sasaki and Harris1987, Cooper Reference Cooper1991). While there is evidence from English that unaspirated, post-stress voiceless stops also have a glottal opening gesture, transillumination data suggests that the glottal gesture is completed prior to the release of the oral closure (Löfqvist & McGarr Reference Löfqvist, McGarr, Baer, Sasaki and Harris1987). 
Further data from languages including English, German, and Danish indicate that the laryngeal abduction gesture produced during unaspirated stops is also smaller in magnitude than in aspirated stops (Hutters Reference Hutters1985, Cooper Reference Cooper1991, Jessen Reference Jessen1998), and is typically subject to much more across-subject variability (Lisker et al. Reference Lisker, Sawashima, Abramson and Cooper1970, Fuchs Reference Fuchs2005).
The laryngeal specification for American English voiceless fricatives differs from voiceless stops in both timing and amplitude. Löfqvist & McGarr (Reference Löfqvist, McGarr, Baer, Sasaki and Harris1987) report that the amplitude of the laryngeal opening for fricatives is larger than it is for stops (see also Tsuchida, Cohn & Kumada Reference Tsuchida, Cohn and Kumada2000, for duration), and that there is a timing difference between the onset of the laryngeal opening for fricatives and for stops: ‘the onset of glottal abduction occurs 10–20 ms before the formation of the oral constriction in voiceless fricatives, while the same event coincides with, or lags behind 10–15 ms, the onset of oral closure in voiceless stops’ (399) (see also Löfqvist & Yoshioka Reference Löfqvist and Yoshioka1984). These differences in timing as well as their consequences for oral air pressure are useful for understanding the relationship between articulatory implementation and the acoustic patterns shown in this study. Although stops and fricatives mostly have similar rates of partial voicing (~50%–60%, with the exception of word-medial fricatives, as discussed above), stops have a greater proportion of phonation in the first interval than fricatives do (Figure 6). This may be due to the 10–15 ms lag for the onset of the glottal abduction in stops relative to the oral closure, which would give the vocal folds more time to continue vibrating even after the oral closure occurs, as shown by the preponderance of the bleed pattern for stops in Figure 5.
Löfqvist & McGarr (Reference Löfqvist, McGarr, Baer, Sasaki and Harris1987) also note that the abduction gesture, at least for stops in stressed position, is 5–15 ms longer than that for fricatives (though see also Munhall, Ostry & Parush Reference Munhall, Ostry and Parush1985, who report that the durations of the abduction gesture for stops and fricatives are similar); if this is corroborated in larger samples, it would be consistent with the greater proportion of the trough pattern seen for fricatives.
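The timing argument above can be made concrete with a small worked sketch. It takes the abduction-onset ranges quoted from Löfqvist & McGarr and computes how long the glottis remains adducted after the oral constriction is formed. The function and constant names are hypothetical, and the sketch assumes, as a deliberate simplification, that carryover phonation is only possible until abduction begins; in real speech the folds may keep vibrating briefly after that point.

```python
def adducted_window_ms(abduction_onset_ms: float) -> float:
    """Time (ms) after oral constriction onset before glottal abduction begins.

    abduction_onset_ms is measured relative to the formation of the oral
    constriction: negative means abduction leads, positive means it lags.
    Simplifying assumption: no carryover window exists if abduction leads.
    """
    return max(0.0, abduction_onset_ms)


def window_range(onsets: tuple[float, float]) -> tuple[float, float]:
    """Apply adducted_window_ms to the two ends of an onset range."""
    earliest, latest = onsets
    return (adducted_window_ms(earliest), adducted_window_ms(latest))


# Onset ranges (ms) as quoted in the text from Löfqvist & McGarr (1987).
STOP_ONSETS = (0.0, 15.0)          # coincides with, or lags behind, oral closure
FRICATIVE_ONSETS = (-20.0, -10.0)  # leads the oral constriction by 10-20 ms

if __name__ == "__main__":
    for label, onsets in (("stops", STOP_ONSETS), ("fricatives", FRICATIVE_ONSETS)):
        low, high = window_range(onsets)
        print(f"{label}: glottis still adducted for {low:.0f}-{high:.0f} ms")
```

Under these assumptions, stops have a window of up to 15 ms at the left edge and fricatives have none. This is in line with the greater proportion of phonation in the first interval of stops.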
The representations that have been posited for American English in Articulatory Phonology treatments are largely consistent with the results of the current study. Aspirated stops have received ample treatment, and are usually represented as having a glottal opening gesture that extends beyond the offset of the oral gesture to capture the existence of post-aspiration (Browman & Goldstein Reference Browman and Goldstein1986, Reference Browman and Goldstein1992). In many graphic representations and in some computational models, the onset of the laryngeal gesture is not synchronous with the onset of the oral gesture (e.g. Browman & Goldstein Reference Browman and Goldstein1992, Goldstein & Fowler Reference Goldstein, Fowler, Schiller and Meyer2003, Nam et al. Reference Nam, Goldstein, Saltzman and Byrd2004, Goldstein, Byrd & Saltzman Reference Goldstein, Byrd, Saltzman and Arbib2006), which is consistent with Löfqvist & McGarr (Reference Löfqvist, McGarr, Baer, Sasaki and Harris1987) and captures the bleed pattern seen in the current study. Regarding the relationship between the laryngeal gesture and the oral gesture for unaspirated or post-stress stops, the combination of articulatory and acoustic results indicates that the laryngeal gesture should have a smaller magnitude than the one for aspirated stops, and that it should end before the offset of the oral gesture. Browman & Goldstein (Reference Browman and Goldstein1992), relying on findings about gestural magnitude by Cooper (Reference Cooper1991), observe that both stress and position can lead to gestural reduction in some contexts. According to the acoustic results from this study, there is nearly the same proportion of partially phonated voiceless stops in stressed (aspirated) and unstressed (unaspirated) syllables (see Figure 2), and the bleed pattern is by far the most prevalent for stops regardless of word position (see Figure 5).
This suggests either that the start of the laryngeal gesture relative to the oral gesture is similar for both aspirated and unaspirated stops in order to allow for bleed in both cases, or that a shorter laryngeal gesture with a smaller magnitude does not affect oral air pressure enough to prevent carryover phonation even if it does start simultaneously with the oral gesture. Fricatives, on the other hand, are typically depicted with coextensive laryngeal abduction and oral constriction gestures (Goldstein et al. Reference Goldstein, Byrd, Saltzman and Arbib2006). However, though the laryngeal abduction gesture begins when the oral constriction gesture does, it takes some time to reach peak oral pressure for strong frication, and then the articulators move away from their peaks (Löfqvist & McGarr Reference Löfqvist, McGarr, Baer, Sasaki and Harris1987). It is at these transitional edges when frication is weaker that phonation can occur when the fricative is overlapped by vowels/approximants on either side.
In the representation of voiced obstruents, Articulatory Phonology accounts usually simply do not represent any laryngeal gesture at all on the laryngeal tier (Browman & Goldstein Reference Browman and Goldstein1986, Reference Browman and Goldstein1992; Goldstein & Fowler Reference Goldstein, Fowler, Schiller and Meyer2003; Goldstein et al. Reference Goldstein, Byrd, Saltzman and Arbib2006; Best & Hallé Reference Best and Hallé2010; Parrell Reference Parrell2011). However, McGowan & Saltzman (Reference McGowan and Saltzman1995) argue that an aerodynamically motivated tract variable can be used to account for the presence of phonation (whether partial or full) during voiced obstruents. They propose transglottal pressure (PT) as a candidate, with the volume of the supralaryngeal back cavity as the appropriate articulator to control PT. By allowing the rate of PT decrease to have language-specific values, a task dynamics model can account both for languages that require obligatory phonation, even when it is not aerodynamically advantageous, as in utterance-initial position or in consonant clusters (e.g. Abdelli-Beruh Reference Abdelli-Beruh2004, for French; Burton & Robblee Reference Burton and Robblee1997, Samokhina Reference Samokhina2010, Ringen & Kulikov Reference Ringen and Kulikov2012, for Russian; Keating Reference Keating1984, for Polish, among others) and for languages like English, for which the empirical findings show that the presence of phonation is extremely context sensitive (e.g. Docherty Reference Docherty1992, C. Smith Reference Smith1997, Davidson Reference Davidson2016).
McGowan & Saltzman (Reference McGowan and Saltzman1995) further discuss evidence that speakers seem to actively control the expansion of vocal tract volume in various ways in order to enhance the likelihood that phonation will continue for voiced stops (Bell-Berti Reference Bell-Berti1975; Westbury Reference Westbury1983; Narayanan, Alwan & Haker Reference Narayanan, Alwan and Haker1995; Solé Reference Solé, Vigário, Frota and João Freitas2009, Reference Solé, Lee and Zee2011; Proctor, Shadle & Iskarous Reference Proctor, Shadle and Iskarous2010).
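The idea that a language-specific rate of PT decrease can separate obligatory from context-sensitive phonation can be illustrated with a toy calculation. The sketch below is not McGowan & Saltzman's task dynamic formulation. It assumes, purely for illustration, that PT decays exponentially during an oral closure and that phonation stops once PT falls below a fixed threshold. The function names, the exponential form, and all numerical values are hypothetical and are not measurements from this or any other study.

```python
import math


def voicing_duration_ms(p0: float, threshold: float, decay_per_ms: float) -> float:
    """Time (ms) until PT(t) = p0 * exp(-decay_per_ms * t) reaches threshold.

    Assumption: phonation continues only while PT exceeds the threshold.
    """
    if p0 <= threshold:
        return 0.0
    return math.log(p0 / threshold) / decay_per_ms


def phonated_fraction(closure_ms: float, p0: float, threshold: float,
                      decay_per_ms: float) -> float:
    """Proportion of the closure that is phonated, capped at 1.0."""
    return min(1.0, voicing_duration_ms(p0, threshold, decay_per_ms) / closure_ms)


if __name__ == "__main__":
    # Illustrative values only: pressures in cm H2O, closure duration in ms.
    P0, THRESHOLD, CLOSURE_MS = 8.0, 2.0, 80.0
    # A slow decay stands in for active cavity expansion, a fast decay for
    # a closure with no such enhancement.
    for label, rate in (("slow PT decrease", 0.01), ("fast PT decrease", 0.05)):
        frac = phonated_fraction(CLOSURE_MS, P0, THRESHOLD, rate)
        print(f"{label}: {frac:.2f} of the closure is phonated")
```

With these made-up settings, the slow decay keeps PT above threshold for the whole closure, while the fast decay allows phonation for only about a third of it. The contrast corresponds to the difference between languages with obligatory phonation and languages like English.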
Some recent task dynamic modeling further underscores where aerodynamic tract variables like transglottal pressure might be especially useful. In an experimental and computational study focusing on the magnitude and duration of oral gestures in differentiating between /p/ and /b/ in northern peninsular Spanish, Parrell (Reference Parrell2011) uses the Task Dynamic Application (TaDA, Nam et al. Reference Nam, Goldstein, Saltzman and Byrd2004) to model the /p/ with both an oral labial gesture and a laryngeal gesture, and the /b/ with an oral gesture only. It is unclear what happens in the model when /b/ is in utterance-initial position, but even in intervocalic position, /b/ becomes devoiced if its duration is long. Parrell explains that this occurs within the TaDA model because the model does not manipulate either oral cavity expansion or the position of the vocal folds. Yet this perhaps unintentional devoicing also underscores the need to specify gestures and articulators like PT and the volume of the supralaryngeal back cavity in order to enhance vocal fold vibration for languages that obligatorily implement phonation in voiced stops.
In contrast to gestural phonology frameworks, this distinction between Germanic languages and ‘true voicing’ languages has been extensively discussed within phonological theories employing distinctive features, leading to some researchers hypothesizing privative features such as [voice], [spread glottis], [constricted glottis], or the need for the underspecification of [voice] to capture languages like English and German (Iverson & Salmons Reference Iverson and Salmons1995, Tsuchida et al. Reference Tsuchida, Cohn and Kumada2000, Jessen & Ringen Reference Jessen and Ringen2002, Honeybone Reference Honeybone, Van Oostendorp and Van de Weijer2005, Beckman, Jessen & Ringen Reference Beckman, Jessen and Ringen2009, Beckman et al. Reference Beckman, Jessen and Ringen2013, Nicolae & Nevins Reference Nicolae and Nevins2016). In order to account for differences in languages that have more or less passive phonation for obstruents that are not specified for laryngeal opening/spread glottis, Beckman et al. (Reference Beckman, Jessen and Ringen2013) further explore numerically specified features to account for varying degrees of phonation. While gestural frameworks may be a natural fit for accounting for variation in phonation, such frameworks have yet to attempt a full classification of laryngeal contrasts more generally. In fact, a proper gestural treatment of laryngeal contrasts will require the proposal of either new tract variables or applications of existing ones on the laryngeal tier to capture contrastive laryngeal differences beyond just voicing and aspiration, including implosives and ejectives (e.g. Iverson & Salmons Reference Iverson and Salmons1995, Gallagher Reference Gallagher2011).
While it is beyond the scope of this study to develop a full set of gestural representations for laryngeal contrasts, the empirical data reported here, in addition to the acoustic and articulatory data presented for other languages in the literature, should be relevant to determining the proper timing and magnitude of laryngeal representations cross-linguistically.
5 Conclusion
Investigating the presence of phonation during voiceless obstruents in American English allows for the study of the interaction between a laryngeal specification for abduction and the natural phonetic environments that may either promote phonation or prevent it. Davidson (Reference Davidson2016) showed that for voiced obstruents, the presence of phonation is strongly affected by articulatory and aerodynamic considerations, such that voiced obstruents can be almost always devoiced in some challenging environments (for example, after a pause or at the end of an utterance) while environments conducive to voicing give rise to fully phonated obstruents at high rates. In this corpus study, naturalistically produced voiceless obstruents show some similarities, but unlike their voiced counterparts, the aerodynamic effects seem to be tempered by the requirements that are imposed by a laryngeal abduction gesture. In more favorable environments like intervocalic position, the incursion of partial phonation is seen at the edges, but it is largely limited to the beginning of the target obstruent. Where phonation is found overlapping with the offset of the constriction, it is mainly for fricatives and is a much smaller proportion of phonation as compared to voiced stops at the right edge. The acoustic patterns found in this study illustrate the acoustic consequences of laryngeal abduction gestures that have been observed in articulatory studies. The body of empirical data – both acoustic and articulatory, and the cross-linguistic studies that have been undertaken – would be a useful source of evidence for building a representational system of laryngeal contrasts within a gestural framework.
Acknowledgements
I would like to thank editor Amalia Arvaniti and three anonymous reviewers for their comments and feedback on this paper. I also thank Zack Jaggers for his help in analyzing the data for this study. Suzy Ahn, Gillian Gallagher, Sang-Im Lee-Kim, Jason Shaw, Colin Wilson and members of the NYU PEP Lab provided helpful comments on this work. I also received input from audiences at the Acoustical Society of America meeting in Hawaii, Harvard University, University of Southern California, and the Conversational Speech and Lexical Representations workshop in Nijmegen. This research was supported by faculty research funds from New York University.