1 Introduction
Abramson & Whalen (2017: 84) claim that after 50 years, voice onset time (VOT) ‘has proven to be a robust measure of the acoustic realization of the consonantal voicing distinctions in most languages’. Affricates can also be distinguished from stops using VOT (Abramson 1989, 1995). Moreover, VOT has been shown to be affected by place and vowel quality. In general, VOT is longest for affricates, followed by velars, dentals or alveolars, and bilabials (Abramson 1995, Cho & Ladefoged 1999). VOT is also significantly longer before high vowels, while vowel advancement effects on VOT are mixed (Nearey & Rochet 1994, Morris, McCrea & Herring 2008).
Recently, tone has been shown to affect the VOT of voiceless unaspirated and aspirated stops in Chinese language varieties. For Mandarin (ISO 639-3 cmn) voiceless unaspirated stops, VOT was significantly longer before the high and falling-rising tones in contrast to the rising tone, while for aspirated stops, VOT was significantly longer before the falling-rising tone compared to the falling tone (Chen, Peng & Chao 2009). In Hakka (ISO 639-3 hak), another Chinese variety, the shortest VOT was associated with the two stopped-syllable tones (Chen et al. 2009).
The effect of place and nasality on VOT, closure duration, and the voiceless interval has been studied in two nasal harmony languages: Paraguayan Guaraní (ISO 639-3 gug), a member of the Tupí family (Walker 1999), and Desano (ISO 639-3 des), a Tukanoan language (Silva 2008). Both researchers found that the mean VOT of intervocalic voiceless stops is significantly longer and closure duration is significantly shorter in nasal words. However, their results differ regarding the voiceless interval. Walker (1999) found no significant difference in the mean length of the voiceless interval of nasal and oral words, while Silva (2008) found that the voiceless interval is significantly longer in oral words compared to nasal words.
While the laryngeal timing of obstruents has been investigated in a wide variety of languages, investigations of the laryngeal timing of the obstruents of Karenic languages are limited. Except for the investigations of Kayan plosives, vowels, and tones with six speakers (Luangthongkum 2010) and Sgaw Karen tone perception with four speakers (Brunelle & Finkeldey 2011), all other acoustic studies of Karenic languages have reported on the speech of a single speaker (Abramson 1995, Sun 2016). Furthermore, Abramson (1995) is the only study that has investigated VOT in a Pwo Karen language variety.
Given the state of research on Karenic languages and context effects on VOT, closure duration, and the voiceless interval, the main purpose of this study is to provide a first description of some of the acoustic aspects of Northern Pwo Karen (N. Pwo) stops and affricates. N. Pwo (ISO 639-3 pww) is an under-documented Karenic language with a three-way stop distinction (voiceless unaspirated, voiceless aspirated, and voiced) and a two-way affricate distinction (voiceless unaspirated and voiceless aspirated). The phoneme inventory includes both oral and nasal vowels, as well as six tones: four modal tones and two glottalized tones. A secondary purpose of this study is to investigate the effects of place, nasality, tone, and vowel quality on the VOT, closure duration, and voiceless interval of obstruents in one language.
In the rest of the paper, we consider studies of context effects on VOT, closure duration, and the voiceless interval (Section 2). We then describe the methodology of the current study, including the N. Pwo syllable shapes and phoneme inventory, participants, materials, procedure, and measurements (Section 3). Then, the results of the acoustic analysis are presented (Section 4), followed by discussion (Section 5) and the conclusion (Section 6).
2 Previous work on VOT, closure duration, and the voiceless interval
In this overview of previous studies of context effects on VOT, closure duration, and the voiceless interval, place effects are considered first (Section 2.1). Then, vowel quality effects on VOT are detailed (Section 2.2), followed by nasality effects on VOT, closure duration, and the voiceless interval (Section 2.3), and then tonal effects on VOT (Section 2.4). The section ends with the questions and predictions considered in this paper (Section 2.5).
2.1 Place effects on VOT, closure duration, and the voiceless interval
Place effects on VOT have received more attention than vowel quality, nasality, or tone effects. One common finding is that velar or uvular stops are associated with the longest VOT, while bilabial stops are associated with the shortest VOT (Cho & Ladefoged 1999: 218). When the two alveolo-palatal affricates, /tɕ/ and /tɕʰ/, are included in a study of VOT, the affricates show the longest VOT, followed by the velar, and then dental stops, while the bilabial stops exhibit the shortest VOT (Abramson 1989, 1995). Often, however, a difference in VOT between labial and dental or alveolar stops is not found (e.g. Abramson & Lisker 1972, Cho & Ladefoged 1999).
Cho & Ladefoged (1999: 213) suggest possible explanations for place effects on VOT. For unaspirated stops, the differences in VOT could be due to the size of the supralaryngeal cavity behind the constriction. For example, the size of the cavity is smaller for /k/, causing greater pressure that takes longer to fall, so VOT is longer. Conversely, the volume of the cavity in front of the constriction is larger, so the pressure takes longer to fall, again resulting in longer VOT for stops produced further back in the mouth. Slower movement of the back of the tongue could also result in longer VOT compared to the quicker movement of the tip of the tongue or lower lip. Finally, the extent of the contact area of the articulators could affect the length of VOT. A smaller contact area would contribute to shorter VOT, while a greater contact area would contribute to longer VOT. For voiceless aspirated stops, they suggest that intraoral pressure drops more slowly for /kʰ/ than for /tʰ/ or /pʰ/. This results in a slower reduction of the glottal opening after the release of the aspirated stop, which results in longer VOT further back in the mouth. They also mention the reciprocal timing relationship between closure duration and VOT in the belief that the duration of the vocal fold opening is fixed. However, Docherty (1992) has shown that the duration of the voiceless interval is not necessarily of fixed length, in contrast to Weismer (1980).
Until 2010, the only acoustic study of a Karenic language was Abramson (1995), a study of the laryngeal timing of the obstruents of one Pwo Karen speaker from Paa Sangngaam village in Chiang Rai province, Thailand, and a Sgaw Karen (ISO 639-3 ksw) speaker from Taunggyi, Southern Shan State, Myanmar. Both language varieties are mutually unintelligible with N. Pwo. In this study, the Pwo Karen obstruents included bilabial, alveolar, and velar voiceless aspirated and unaspirated stops, voiceless alveolo-palatal aspirated and unaspirated affricates, and bilabial and alveolar voiced stops. The Sgaw Karen obstruents included the same stops, along with voiceless aspirated and unaspirated sibilants. Abramson’s results show a separation between voiceless aspirated and unaspirated obstruents, as well as voiced stops, at each place of articulation for both languages. Concerning the inclusion of the affricates in this study, Abramson (1995: 159–161) comments that their inclusion is justified, with the proviso that the voicing lag from the release of the affricate to the glottal pulsing of the vowel is composed of frication, not the aspiration that is characteristic of the voiceless aspirated stops. Furthermore, the aspirated affricate evidences a longer period of frication as opposed to the unaspirated affricate.
Since 2010, the Karen Linguistics Project of Chulalongkorn University, Bangkok, Thailand, has produced several studies on phonetic aspects of Karenic languages. A study of four female and two male speakers of Kayan (ISO 639-3 pdu), which is distantly related to N. Pwo, found that the VOT differences between labial and alveolar plosives were not statistically significant, while VOT differences between the labial and alveolar plosives compared to the velar plosive were statistically significant (Luangthongkum 2010).
Concerning place effects on closure duration, Weismer (1980), in a study of English obstruents, reported that when VOT is longer, closure duration is shorter and vice versa. Therefore, one would expect labials to have the longest closure duration, followed by dentals, velars, and then alveolo-palatals, which is the opposite of the pattern expected for VOT. However, Yao (2007) reports wide speaker variation for both the VOT and closure duration of English word-initial voiceless aspirated stops, which are not utterance-initial, in the Buckeye corpus (Pitt et al. 2005). The results show that the average closure duration of /p/ was significantly longer than /t/ and /k/ for all speakers. Also, the average closure duration of /k/ was greater than the average closure duration of /t/. Furthermore, the difference in closure duration between /t/ and /k/ was much less significant. Finally, the variability in closure duration was ‘most sensitive’ to the target stop and the preceding environment, while VOT was the most sensitive to speaker, speaking rate, the following phone, and word frequency (Yao 2007: 220). Thus, closure duration and VOT are not necessarily in a reciprocal relationship across all stops. They are sensitive to both linguistic context and non-linguistic variables.
As for the voiceless interval, we found two definitions in the literature. Weismer (1980: 429) stated that ‘the voiceless interval for voiceless stops will generally include the duration of the closure plus the voice-onset time’. Yao (2007) also reports on total duration (closure duration + VOT) in his corpus study of English. In contrast, other researchers define the voiceless interval as closure duration plus VOT, minus the offset voicing, or voicing bleed (Davidson 2016), of the preceding vowel (Suomi 1980, Docherty 1992, Walker 1999, Silva 2008). For this study, we use the second definition of the voiceless interval.
Weismer (1980) reported that the mean duration of the voiceless interval of intervocalic voiceless stops and fricatives was constant, regardless of either the place or manner of articulation. Suomi (1980), who studied voicing in English and Finnish stops, also concluded that the voiceless interval is constant regardless of place of articulation. However, Docherty (1992) determined that the voiceless interval is not constant based on his study of the timing of British English obstruents. He also noted that both Weismer (1980) and Suomi (1980) concluded that the voiceless interval is constant based on mean voiceless interval measurements, without considering the variation across speakers, as evidenced by the wide standard deviations.
2.2 Vowel quality effects on VOT
After place, vowel quality is the next most studied VOT context effect. Significantly longer VOT preceding high vowels has been observed in English and French (Nearey & Rochet 1994, Morris et al. 2008), as well as in Mandarin Chinese (Rochet & Fei 1991, Chao & Chen 2008). Results for vowel advancement are not as straightforward. For Mandarin Chinese, Rochet & Fei (1991) found that VOT was significantly longer for /p/ when followed by /u/, while VOT was the longest for /t/ before /i/. In contrast, Chen, Chao & Peng (2007) reported that front vs. back vowels do not significantly affect the VOT of Mandarin voiceless aspirated and unaspirated stops. Concerning voiced stops, Nearey & Rochet (1994) did not find any significant vowel context effects on the VOT of French voiced stops.
2.3 Nasality effects on VOT, closure duration, and the voiceless interval
Walker (1999) investigated VOT, closure duration, and the voiceless interval of intervocalic voiceless unaspirated stops in nasal and oral words in Paraguayan Guaraní, which is known for its nasal harmony. Then, Silva (2008) repeated Walker’s study with Desano, which also exhibits nasal harmony. In these two languages, nasal harmony affects both vowels and voiced obstruents. In contrast, voiceless obstruents do not nasalize, although they do not block the spreading of nasality. In Walker’s study, longer VOT and shorter closure duration were associated with voiceless stops in nasal words as opposed to oral words. Furthermore, in an interaction with place of articulation, VOT was significantly longer for both /p/ and /t/ in nasal words as opposed to oral words. For /k/, however, average VOT in nasal and oral words was not significantly different.
For closure duration, both Walker (1999) and Silva (2008) found that the closure duration of /p/ and /t/ was longer for oral words as opposed to nasal words, while the closure duration for /k/ in nasal and oral contexts was not significantly different. Walker (1999: 78) suggests that the reason for the lack of a difference in velar closure duration in relation to VOT in nasal and oral contexts is that ‘aspects of the timing with velars are highly fixed in comparison to other stops’. These results indicate that closure duration can also be affected by place and nasality.
Concerning the voiceless interval, Walker (1999) found that the average length of the voiceless interval was not significantly different in nasal vs. oral contexts for /p/ and /t/. However, she found that the length of the voiceless interval for /k/ is significantly longer in oral contexts. For Desano, Silva (2008) reports a smaller difference in the voiceless interval for /p/ between nasal and oral contexts, while the difference in the voiceless interval for /t/ and /k/ is larger between nasal and oral contexts. Therefore, he concludes that the length of the voiceless interval is not constant. These studies show that the voiceless interval can be affected by both place and nasality. Therefore, if the voiceless interval can be affected by place and nasality, it may be affected by other context effects, such as tone and vowel quality.
2.4 Tonal effects on VOT
Liu et al. (2008) found that the VOT of Mandarin Chinese voiceless aspirated stops preceding Tone 2 (35) and Tone 3 (214) was significantly longer than the VOT preceding Tone 4 (51). The reason for this is that ‘the longer VOT values in the MR [mid-rising] and FR [falling-rising] tones may be related to the fact that both tones have a rising component in their pitch contours … The anticipated increase of tension in the voicing source and pitch may have delayed the onset of vibration of the vibratory structure’ (Liu et al. 2008: 215).
Chen et al. (2009) reported the same results for Mandarin aspirated stops. However, when they excluded the non-words in the data, they found that the VOT of voiceless aspirated stops was significantly longer preceding Tone 3 (214) compared to Tone 4 (51). The VOT of unaspirated stops was significantly longer preceding Tone 1 (55) and Tone 3 (214) compared to Tone 2 (35).
When Chen et al. (2009) examined the effect of tone on the voiceless stops in real words of Hakka (ISO 639-3 hak), another Chinese variety, they found that the VOT of both aspirated and unaspirated stops was significantly longer before Tone 1 (24) and Tone 5 (11). In addition, the VOT of unaspirated stops was significantly shorter before Tone 8 (55), while the VOT of aspirated stops was significantly shorter before Tone 4 (32) and Tone 8 (55). Both Tone 4 and Tone 8 only occur in stopped syllables, which are characterized as ‘short, rapid and ended by a stop’ (Chen et al. 2009: 557).
In Asian languages, tones are complexes of pitch, voice quality, and duration (Andruski & Ratliff 2000, Brunelle 2009). For example, Brunelle & Finkeldey (2011: 375) describe Sgaw Karen Tone 4 as a glottalized tone that is distinguished by shorter duration and glottalized voice quality. The Vietnamese (ISO 639-3 vie) B2 tone is described as shorter than other tones and ending on a glottal stop (Brunelle 2009: 80). Furthermore, the realization of a glottalized tone can range ‘from a strong laryngealization to a full glottal stop’ (Brunelle 2009: 80). In addition, stop-final syllables, including syllables with a glottal stop coda, are treated the same. Duanmu (1994: 78) observes that
in Asian tone languages stopped syllables (i.e. those whose coda is [ʔ] or a glottalized [p], [t], or [k]) do have different pitch contours from unstopped syllables; usually, the pitch contour is shorter on a stopped syllable than on unstopped syllables. As a result, tones on stopped syllables are usually listed separately in traditional descriptions.
However, shorter tone duration does not necessarily correlate with shorter VOT. Sun (2016) provides a preliminary investigation of the interaction between breathiness and the voiceless aspirated and unaspirated stops of one Sgaw Karen speaker. Sgaw Karen has a modal mid tone, two breathy tones (breathy high and breathy low), and two glottalized tones (high glottalized and low glottalized). The mid tone duration is the longest, followed by the breathy tones. The two glottalized tones are the shortest. In addition, aspirated stops do not co-occur with the low breathy tone at all, while real words with an aspirated stop and a high breathy tone are rare and only occur with /pʰ/ and /tʰ/. The average VOT of /pʰ/- and /tʰ/-initial syllables is the longest before the mid (112–121 ms) and glottalized tones (115–130 ms), while the shortest VOT occurs before the high breathy tone (85–98 ms). For /p/ and /t/, VOT is somewhat longer before the glottalized tones (8–10 ms) compared to the mid tone (6–9 ms), while the shortest VOT occurs before the breathy tones (2–5 ms). These results show that neither tone duration nor the presence of aspiration with a breathy tone predicts shorter VOT. As Liu et al. (2008) suggest, the greater articulatory effort required to produce particular tones could result in longer VOT. Less articulatory effort is required to produce breathy phonation (Marasek 1997, Gick, Wilson & Derrick 2013), which could contribute to shorter VOT before the Sgaw Karen breathy tones.
To summarize, for voiceless aspirated stops, longer VOT tends to be associated with modal tones with rising components and low tones, while shorter VOT is associated with falling and stopped-syllable tones, although stops that occur before breathy tones can exhibit even shorter VOT. For voiceless unaspirated stops, results are mixed. In Mandarin, the longest VOT occurs preceding Tone 1 (55) and Tone 3 (214), while the shortest VOT occurs preceding Tone 2 (35). In Hakka, the longest VOT occurs preceding both the rising and low tones and the shortest VOT occurs preceding Tone 8, one of the stopped-syllable tones.
2.5 Questions and predictions
Given the results of previous research, this study seeks to answer three questions:
Question 1: How is the VOT of stops and affricates affected by vowel quality, tone, and nasality in a language with a three-way distinction in stops, such as N. Pwo?
First, studies on the effect of vowel quality on VOT seem to have been limited to languages with a two-way stop distinction. Second, tone has been shown to affect the VOT of voiceless unaspirated and aspirated stops in Chinese language varieties and in Sgaw Karen. However, the effect of tone on the VOT of voiced stops is yet to be investigated. N. Pwo is a ‘true voicing’ tonal language, which makes it an ideal candidate for investigating tonal effects on voiced stop VOT. Third, the effect of following nasal vowels on VOT has not been investigated in any language, so this is one gap that this study aims to fill.
Question 2: Are VOT and closure duration in a reciprocal relationship when affected by vowel quality, a post-obstruent nasalized vowel, or tone?
It has been claimed that VOT and closure duration are in a reciprocal relationship with each other. For place, when VOT is shorter, closure duration is longer and vice versa (Weismer 1980). Both Walker (1999) and Silva (2008) have shown that this reciprocity is maintained in nasal and oral words in a nasal harmony language. However, it is not known whether this reciprocal relationship between VOT and closure duration holds for tone, nasalized vowels, or vowel quality effects.
Question 3: Is the voiceless interval affected by place, vowel quality, nasality, and tone in a three-way stop language, such as N. Pwo?
Reports of the constant length of the voiceless interval by place and nasality have been mixed. Furthermore, it is not known whether vowel quality, nasalized vowels, and tone have any effect on the voiceless interval.
Based on the results of previous studies, we expect contextual effects on VOT that differ by type (voiceless aspirated, voiceless unaspirated, or voiced). For place effects, we predict that VOT will be the longest for the alveolo-palatals (affricates), followed by the velars, and then the dentals and bilabials. It is also possible that the difference in the VOT of dentals and bilabials will not be significant. Since N. Pwo has modal and glottalized tones, we predict that VOT will be the longest preceding the mid and high tones, which both rise slightly, as well as the low tone, while VOT will be the shortest before the falling and glottalized tones. VOT will also be the longest before high vowels. Moreover, the VOT of alveolo-palatal aspirated and unaspirated obstruents before front vowels will be the longest, while the VOT of the bilabial plosives before front vowels will be the shortest. Finally, no vowel context effects are predicted for voiced stops.
As for the reciprocal relationship between VOT and closure duration that has been observed in other languages, we expect that closure duration will be the longest for bilabials, followed by the dentals, velars, and then affricates. As for a nasality effect, it is not possible to make a prediction since N. Pwo has nasalized vowels, not nasal words that come about through nasal harmony. It is also not possible to predict the effects of vowel quality and tone on closure duration, due to the lack of previous work. Finally, we expect context effects on the voiceless interval, although it is not possible to predict what effects there might be based on previous work.
3 Methods
For this study of the effects of place, vowel quality, tone, and nasality on VOT, closure duration, and the voiceless interval, meaningful N. Pwo words were recorded in a carrier phrase that placed the target stop or affricate in a word-initial, intervocalic position. Information about the relevant phonological inventory of N. Pwo is detailed in Section 3.1, followed by the speech materials in Section 3.2, the participants in Section 3.3, the recording procedure in Section 3.4, and the acoustic measurements in Section 3.5.
3.1 Northern Pwo Karen
N. Pwo is an isolating language with major and minor syllables. Major syllables (C(C)VT) are stressed and make use of the full inventory of consonants, vowels, and tone. In contrast, minor syllables (Cə) are reduced and never occur alone or word-finally. They have a limited single initial consonant inventory and unstressed /ə/ as the only vowel, with no tone. See Pittayaporn (2015) for an examination of this areal phenomenon of Mainland Southeast Asian languages.
The N. Pwo stop and affricate inventory (Table 1) includes voiceless unaspirated, voiceless aspirated, and voiced stops, along with voiceless unaspirated and aspirated alveolo-palatal (laminal post-alveolar) affricates (Cooke, Hudspith & Morris 1976, Phillips 2009).
Table 1 Northern Pwo Karen stops and affricates.

The vowel inventory (Table 2) includes both monophthongs and diphthongs consisting of a low /ɐ/ with high /i ɨ u/ offglides. Monophthongs can be either oral or nasal while diphthongs are only oral. Length is not contrastive.
Table 2 Northern Pwo Karen vowels.

Finally, N. Pwo has six tones (Table 3): four modal tones, High (H), Mid (M), Low (L), and Falling (F), and two glottalized tones, Mid-glottalized (MQ) and Falling-glottalized (FQ), which end in laryngealization or a glottal stop. The tones are characterized using the Chao (1967) tone numbers, which provide an approximation of the contour patterns of lexical tones in a five-level system.
Table 3 Northern Pwo Karen tones.

1 = Low, 2 = Half-Low, 3 = Middle, 4 = Half-High, 5 = High
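As a reading aid for the tone numbers used in this paper, the following minimal sketch expands a Chao tone string into its five-level labels and a rough contour shape. The function names and the shape categories are illustrative assumptions, not part of the original transcription system, and the examples are the Mandarin tone values cited in Section 2.4.

```python
# Illustrative helper; names and shape labels are assumptions for this sketch.
CHAO_LEVELS = {1: "Low", 2: "Half-Low", 3: "Middle", 4: "Half-High", 5: "High"}


def chao_levels(tone: str) -> list:
    """Expand a Chao tone string such as '35' into level labels."""
    return [CHAO_LEVELS[int(digit)] for digit in tone]


def contour_shape(tone: str) -> str:
    """Classify the pitch movement implied by a Chao tone string."""
    digits = [int(digit) for digit in tone]
    # Keep only the non-zero pitch steps between successive digits.
    steps = [b - a for a, b in zip(digits, digits[1:]) if b != a]
    if not steps:
        return "level"
    if all(step > 0 for step in steps):
        return "rising"
    if all(step < 0 for step in steps):
        return "falling"
    if steps[0] < 0:
        return "falling-rising"
    return "rising-falling"


for tone in ("55", "35", "214", "51"):
    print(tone, chao_levels(tone), contour_shape(tone))
```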
3.2 Materials
A list was created of all the possible major CV syllables occurring with the ten oral obstruents /p pʰ b t tʰ d tɕ tɕʰ k kʰ/, seven oral vowels /i e ɛ ɐ u o ɔ/ and six nasal vowels /ĩ ẽ ɐ̃ ũ õ ɔ̃/, along with the six tones (high, mid, low, falling, mid-glottalized and falling-glottalized). In this list, the phonemic diphthongs were excluded, as well as the high and mid central vowels due to their rarity. Also, the N. Pwo vowel inventory does not include a nasalized /ɛ̃/ in opposition to the oral /ɛ/ (see Table 2).
When this list of 840 possible major syllables was tested with a N. Pwo speaker, only 289 meaningful words were found in which the target stops and affricates occur in the initial major syllable of a mono- or polysyllabic word (see Appendix A). Phonotactically, nasalized vowels never co-occur with the glottalized tones. Also, in the elicitation list, no voiced stops and no tokens of /i/ and /u/ occur with the mid-glottalized tone, no tokens of /e/ and /o/ occur with the falling-glottalized tone, and no tokens of /ɔ/ occur with the high, mid, and falling tones. The distribution of place, vowel, and tone combinations in the data set is tabulated in Appendix B.
3.3 Participants
Twelve participants, six men and six women, were recruited by an assistant, who is a speaker of N. Pwo and a leader in the N. Pwo Karen community in Chiang Mai, Thailand. Participants’ ages ranged from 17 to 37 years. Eleven participants were born and grew up in southern Mae Hong Son province, while one participant came from southern Chiang Mai province.
All participants were multilingual speakers of at least N. Pwo and Standard Thai, as well as Northern Thai, a lingua franca in northern Thailand. Some were also fluent in Sgaw Karen, a related Karenic language, and some were familiar with English. Most of the participants were either in university or vocational school, or they had already graduated. The single exception was one speaker with a Grade 9 education. All the participants were readers of Standard Thai and all but three of them were proficient readers of the N. Pwo writing system, which uses an adapted Thai script. These three participants transliterated problem words using Thai spelling to help them read the words correctly.
3.4 Procedure
Recordings were carried out in a quiet room at Payap University, Chiang Mai, Thailand, with a Marantz PMD661 digital recorder and either a Shure WH20 head-mounted or a Shure SM57 microphone. The data collection was conducted by the N. Pwo assistant who gave instructions in a mix of Standard Thai and N. Pwo, although communications eventually became N. Pwo exclusively. Participants had the opportunity to practice reading through the list prior to recording, and reading practice also occurred during the recording, as necessary. The N. Pwo assistant checked for misreadings, while the first author ran the recording equipment, checked for hesitations, and collated the list of words to be reread.
Participants were asked to read the word list at a normal speaking rate, in 200-item blocks. The words were read in a carrier phrase meaning ‘Did you hear _________?’ (Lit.: ‘You hear _________ ques’). Each word was repeated twice, in a randomized list of 578 items, by 12 speakers (578 × 12), for a total of 6,936 tokens. Any reading mistakes were noted, and speakers were asked to reread items at the end of the recording session. Reading mistakes were most often due to hesitations or the mispronunciation of tone. Generally, recording took about 45 minutes to an hour per participant. For participants who were not literate in N. Pwo, the recording time was longer, as they took more time to practice and reread mispronounced words.
3.5 Measurements
Several measurements related to the target stop in each token were marked using Praat (Boersma & Weenink 2012). For the voiceless obstruents, voiced closure (VDCLO) was measured from the last visible instance of the F2 and/or F3 formants to the end of the voicing bleed, which was determined based on a change in the pattern of the waveform and the end of voicing in the spectrogram. Voiceless closure (VLCLO) was measured from the end of the voicing bleed of the preceding vowel to the burst release of the target obstruent. VOT (REL) was measured from the burst release to the onset of voicing of the following vowel. The vowel onset was marked when the pattern of the waveform changed and the striations in the spectrogram became darker. In cases of multiple burst releases, VOT was measured from the first burst release. Figure 1 illustrates these measurements for an example syllable. For the purposes of this study, VOT is labelled REL, closure duration is voiced closure (VDCLO) + voiceless closure (VLCLO), and the voiceless interval is voiceless closure (VLCLO) + VOT (REL).

Figure 1 Waveform and spectrogram of the example syllable (VDCLO – voiced closure, VLCLO – voiceless closure, REL – release).
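The relationship between the marked boundaries and the three derived measures for voiceless obstruents can be sketched as follows. This is a minimal illustration, assuming four boundary times per token; the class name, field names, and the millisecond values are hypothetical and are not taken from the study's annotation files or data.

```python
from dataclasses import dataclass


@dataclass
class StopBoundaries:
    """Hypothetical boundary times (ms) for one voiceless obstruent token."""
    vowel_offset: float   # last visible F2/F3 of the preceding vowel
    bleed_end: float      # end of the voicing bleed
    burst: float          # first burst release
    voicing_onset: float  # onset of voicing in the following vowel


def derive_measures(b: StopBoundaries) -> dict:
    """Compute the measures defined in Section 3.5 from the boundaries."""
    vdclo = b.bleed_end - b.vowel_offset   # voiced closure
    vlclo = b.burst - b.bleed_end          # voiceless closure
    rel = b.voicing_onset - b.burst        # VOT, labelled REL
    return {
        "VDCLO": vdclo,
        "VLCLO": vlclo,
        "VOT": rel,
        "closure_duration": vdclo + vlclo,
        "voiceless_interval": vlclo + rel,
    }


# Made-up token: 12 ms of voicing bleed, 68 ms voiceless closure, 15 ms VOT.
token = StopBoundaries(vowel_offset=100.0, bleed_end=112.0,
                       burst=180.0, voicing_onset=195.0)
measures = derive_measures(token)
print(measures)
```

Because the voicing bleed is assigned to VDCLO, it counts towards closure duration but is excluded from the voiceless interval, which matches the second definition discussed in Section 2.1.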
N. Pwo voiced stops are voiced for more than 50% of the closure, which makes N. Pwo a ‘true voicing’ language (Davidson 2016, Abramson & Whalen 2017). Voiced closure (VDCLO) or negative VOT was measured from the last visible instance of the F2 and/or F3 formants of the preceding vowel to the onset of voicing in the following vowel (Lisker & Abramson 1964, Klatt 1975). Figure 2 illustrates this measurement for an example syllable.

Figure 2 Waveform and spectrogram of the voiced closure (VDCLO) of .
4 Results
Nine tokens were removed due to mispronunciations, and a further forty-two tokens were removed due to hesitations. This left 6,885 tokens for the analysis. Most of the hesitations were produced by three of the six female speakers and three of the six male speakers, while most of the mispronunciations were produced by five of the twelve speakers. Even so, the data set included at least one iteration of every word by every participant.
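The token counts can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text and the per-type N values given in the captions of Tables 4, 5, and 9; the variable names are illustrative.

```python
ITEMS_PER_SPEAKER = 578  # randomized list length, each word repeated twice
SPEAKERS = 12

recorded = ITEMS_PER_SPEAKER * SPEAKERS
removed = {"mispronunciations": 9, "hesitations": 42}
analysed = recorded - sum(removed.values())

# Per-type N values as given in the captions of Tables 4, 5, and 9.
per_type = {"aspirated": 3536, "unaspirated": 2133, "voiced": 1216}

print(recorded, analysed, sum(per_type.values()))
```

The per-type counts sum to the number of tokens retained for analysis, so the three type-specific data sets partition the full data set.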
These data were analysed using linear mixed effects regression in R (R Core Team 2019), with the lme4 package (Bates et al. 2019). The package lmerTest (Kuznetsova et al. 2019) was used to produce degrees of freedom and p-values using the Satterthwaite approximation. However, we note that Bates et al. (2015: 34) indicate that any method for approximating degrees of freedom for linear mixed-effects regression is ‘at best ad hoc’.
In addition to random effects for speaker and word, all possible random slopes with speaker were tested and included in the final model if they improved model fit and did not cause convergence issues. All the dependent variables (VOT, closure duration, and the voiceless interval) had slightly skewed distributions, and the unaspirated VOT had a multimodal distribution, likely related to place of articulation. We transformed all the dependent variables using a base-10 logarithm, which better approximated a normal distribution and maintained consistency across all the models. Furthermore, since the differences were so large between the voiceless aspirated, voiceless unaspirated, and voiced obstruents, we analysed the different types separately after modelling an overall effect.
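The effect of the base-10 logarithm on a right-skewed duration distribution can be illustrated with a small sketch. The durations below are invented for illustration and are not N. Pwo measurements; the moment-based skewness function is one common definition, chosen here as an assumption.

```python
import math


def skewness(values):
    """Moment-based sample skewness (third central moment over variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed durations in milliseconds.
durations_ms = [45, 52, 58, 60, 63, 70, 75, 88, 110, 160]
logged = [math.log10(v) for v in durations_ms]

print(f"raw skewness:   {skewness(durations_ms):.2f}")
print(f"log10 skewness: {skewness(logged):.2f}")
```

For these invented values the transformed distribution is less skewed than the raw one, which is the property the transformation is meant to provide. It does not by itself address multimodality.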
The factors tested were: Type (aspirated, unaspirated, voiced), Place of Articulation (bilabial, dental, alveolo-palatal, velar), Nasality (oral, nasal), Tone (high, mid, low, falling, mid-glottalized, falling-glottalized), Vowel Height (high /i ̘ u u⃜/, non-high /e e⃜ ɛ ɐ ɐ⃜ o o⃜ ɔ ɔ⃜/), and Vowel Advancement (front /i ̘ e e⃜ ɛ/, back /ɐ ɐ⃜ u u⃜ o o⃜ ɔ ɔ⃜/). The Syllable variable registered the number of syllables in a word (one syllable, two syllables, three syllables). A backward stepwise model-fitting procedure was used in which non-significant effects were excluded from the model. The testing of context effect interactions was limited to interactions with place of articulation. None of the factors were fully crossed since data was not available in many cases due to the phonotactics of the language and the lack of meaningful words for all logical combinations of obstruents, vowels, and tones. The combinations of obstruents, vowels, and tones in the data set are shown in Appendix B. Place interactions were included in the final model if they were significant and did not cause convergence issues. In the results to follow, the analysis of voice onset time is reported first (Section 4.1), followed by closure duration (Section 4.2), and then the voiceless interval (Section 4.3).
4.1 Voice onset time
A first linear mixed effects model of the entire dataset – with speaker and word random effects, a vowel advancement random slope with speaker, and no interactions – shows a large effect of Type. In Figure 3, the unaspirated obstruents exhibit the shortest VOT followed by the aspirated obstruents, which are significantly longer (p < .001). Voiced stops evidence negative VOT. Since the difference in VOT by type was so large, a separate analysis was carried out for the voiceless aspirated and unaspirated tokens to better explore context effects.

Figure 3 (Colour online) Raincloud plot of mean VOT by phoneme and type (unaspirated obstruents in orange, aspirated obstruents in green, and voiced stops in lavender).
Since negative VOT is equivalent to closure duration, the voiced stop results are reported under closure duration in Section 4.2. Aspirated obstruent results are reported first (Section 4.1.1), followed by the unaspirated obstruents (Section 4.1.2). The section ends with a summary of the VOT results (Section 4.1.3).
4.1.1 Aspirated obstruents
A linear mixed effects regression model of the aspirated tokens, which included speaker and word random effects, showed significant main effects of place, nasality, tone, vowel height, and vowel advancement, along with a significant interaction of place and vowel advancement (Table 4). The model also included a nasality random slope with speaker. The model intercept is Syllables: 1s, Place: Bilabial, Nasality: Nasal, Tone: Falling-glottalized, Vowel Height: Non-High, and Vowel Advancement: Back.
Table 4 VOT for aspirated tokens; N = 3536.

Vadv = Vowel advancement. Significance codes: *** p < .001, ** p < .01, * p < .05
For place, the reported and releveled models show that VOT is significantly longer for the alveolo-palatal aspirate, followed by the velar aspirate, and then the dental and bilabial aspirates. The difference in VOT between the dental and bilabial aspirates is not significant. VOT preceding oral vowels is significantly longer than VOT preceding nasal vowels. For tone, after releveling to investigate all possible comparisons, we found that the longest VOT occurs preceding the mid tone, while the shortest VOT occurs preceding the falling-glottalized tone. VOT preceding the remaining tones falls between these two extremes in stepwise fashion, starting with the low tone. Thus, VOT preceding the low tone is significantly less than VOT preceding the mid tone, but not significantly different from VOT preceding the high tone. VOT preceding the high tone is also not significantly different from VOT preceding either the low tone or the mid-glottalized tone, and so on. VOT is significantly longer preceding high vowels as opposed to non-high vowels. Finally, an interaction between place and vowel advancement shows that VOT is significantly longer for the alveolo-palatal aspirate preceding front vowels compared to the velar and dental aspirates. The bilabial aspirate evidences the shortest VOT before front vowels.
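Two points of interpretation may help when reading the model tables. With a log10-transformed dependent variable, a coefficient β corresponds to a multiplicative change of 10^β in the original duration scale, and with treatment coding, the difference between two non-reference levels of a factor equals the difference between their coefficients. The sketch below illustrates both under stated assumptions: the coefficient values are hypothetical (they are not taken from Table 4), the function names are illustrative, and the arithmetic applies only to a factor that is not involved in an interaction. Refitting a releveled model additionally provides standard errors and p-values, which this arithmetic does not.

```python
def relevel(coefs, old_ref, new_ref):
    """Re-express treatment-coded coefficients relative to a new reference level."""
    full = {old_ref: 0.0, **coefs}
    shift = full[new_ref]
    return {level: beta - shift for level, beta in full.items() if level != new_ref}


def ratio(beta):
    """Multiplicative change in duration implied by a log10-scale coefficient."""
    return 10 ** beta


# Hypothetical log10-scale place coefficients, with bilabial as the reference level.
place = {"dental": 0.01, "velar": 0.08, "alveolo-palatal": 0.20}

vs_velar = relevel(place, old_ref="bilabial", new_ref="velar")
for level, beta in vs_velar.items():
    print(f"{level:>16} vs velar: beta = {beta:+.2f}, ratio = {ratio(beta):.2f}")
```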
4.1.2 Unaspirated obstruents
The unaspirated tokens show significant main effects of place, tone, and vowel height, along with significant place and nasality and place and vowel height interactions (Table 5). The model includes a nasality random slope with speaker. Finally, the model intercept is Place: Bilabial, Nasality: Nasal, Tone: Falling-glottalized, and Vowel Height: Non-High.
Table 5 VOT for unaspirated tokens; N = 2133.

Vheight = Vowel height. Significance codes: *** p < .001, ** p < .01, * p < .05, . p < .1
The place of articulation results, which included releveling, indicate a significant difference in VOT at each place of articulation across all comparisons in the expected directions. Interestingly, the VOT of the dental non-aspirate is significantly shorter than the VOT of the bilabial non-aspirate. For nasality, VOT is significantly longer before oral vowels for the alveolo-palatal and velar non-aspirates compared to the bilabial non-aspirate. The VOT of the dental non-aspirate is not significantly different from the other three places of articulation before oral vowels. Considering tone, VOT preceding a mid or low tone is significantly longer than VOT preceding the falling glottalized tone. VOT preceding the high, falling, and mid-glottalized tones is not significantly different in comparison to the mid, low, and falling-glottalized tones. Before high vowels, the VOT of the bilabial non-aspirate is significantly longer compared to the alveolo-palatal non-aspirate. In addition, the VOT of the dental and velar non-aspirates before high vowels is not significantly different from either the bilabial or the alveolo-palatal non-aspirates.
4.1.3 Summary of VOT results
To summarize, the effects of place, nasality, tone, and vowel quality are specific to whether the voiceless obstruent is aspirated or unaspirated. As shown in Table 6, VOT is the longest for the alveolo-palatals, while it is the shortest for the bilabials and/or the dentals. There is no significant difference in VOT between the dental and bilabial aspirates, while the short lag VOT of the bilabial non-aspirate is significantly longer than the VOT of the dental non-aspirate. VOT is significantly longer before oral vowels for the aspirates. In an interaction between place and nasality, VOT is significantly longer for the alveolo-palatal and velar unaspirated obstruents preceding oral vowels in comparison to the bilabial non-aspirate. The dental non-aspirate does not differ significantly from the other places of articulation before oral vowels.
Table 6 Summary of VOT effects (longest to shortest VOT).

Note: Slashes between items (e.g. pʰ/tʰ) indicate no significant difference in VOT. An equals sign indicates no significant difference between contiguous items. Configurations like M/L (H, F, MQ) FQ indicate that while VOT is significantly different between M/L and FQ, it is not significantly different between H, F, and MQ and M/L or FQ.
Concerning tone, the VOT of the voiceless aspirated and unaspirated obstruents is consistently the longest preceding the mid tone and the shortest preceding the falling-glottalized tone. Considering vowel height, the longest VOT occurs before high vowels for both aspirated and unaspirated obstruents, although in an interaction between place and vowel height, VOT is significantly longer for the bilabial non-aspirate before high vowels in contrast to the alveolo-palatal non-aspirate. Finally, VOT is significantly longer for the alveolo-palatal aspirate compared to the bilabial aspirate before front vowels, with the velar and dental aspirates patterning together.
4.2 Closure duration
For the investigation of context effects on closure duration, the voiced plosives were included in the analysis since closure duration and negative VOT are the same (Abramson & Whalen 2017). Although a mixed effects model for all the data failed to converge, comparison of mean closure duration by place of articulation showed a substantial effect of Type. Figure 4 shows that the mean closure duration of aspirated stops (76 ms) is shorter than the mean closure duration of the voiced stops (109 ms), which is shorter than the unaspirated stops (119 ms). Therefore, as with VOT, we separated tokens by type to better examine context effects.

Figure 4 (Colour online) Raincloud plot of mean closure duration by phoneme and type (aspirated obstruents in green, voiced stops in lavender, and unaspirated obstruents in orange).
Voiceless aspirated obstruents are covered first (Section 4.2.1), followed by voiceless unaspirated obstruents (Section 4.2.2), and then voiced stops (Section 4.2.3). The section ends with a summary of the closure duration results (Section 4.2.4).
4.2.1 Voiceless aspirated obstruents
A linear mixed effects model with the voiceless aspirated tokens shows main effects of place, tone, vowel height, and vowel advancement, with no significant place interactions. The model includes a vowel advancement random slope with speaker (Table 7). The model intercept is Syllables: 1s, Place: Bilabial, Tone: Falling-glottalized, Vowel Height: Non-High, and Vowel Advancement: Back.
Table 7 Closure duration for aspirated tokens; N = 3536.

Significance codes: *** p < .001, ** p < .01, * p < .05
The final and releveled models indicate that the closure duration of the bilabial aspirate is significantly longer than the alveolo-palatal aspirate, which evidences the shortest closure duration. The difference in closure duration between the dental and velar aspirates is not significant. Thus, significant differences in closure duration occur between the bilabial, dental/velar, and alveolo-palatal places of articulation. For tone, closure duration is significantly longer before the mid-glottalized and falling-glottalized tones, while closure duration is significantly shorter preceding the mid and high tones. The difference in closure duration between the mid-glottalized, falling-glottalized, and falling tones is not significant. In addition, the difference in closure duration between the falling and low tones is not significant, nor is the difference in closure duration between the low, mid, and high tones. Considering vowel quality, closure duration is significantly longer preceding both high and front vowels.
4.2.2 Voiceless unaspirated obstruents
For the voiceless unaspirated obstruents, closure duration shows significant main effects of place and vowel advancement, with significant interactions between place and vowel height and place and vowel advancement. A vowel advancement random slope with speaker is also included in the model (Table 8). The model intercept is Syllables: 1s, Place: Bilabial, Vowel Height: Non-High, and Vowel Advancement: Back.
Table 8 Closure duration for unaspirated tokens; N = 2133.

Vheight = Vowel height, Vadv = Vowel advancement. Significance codes: *** p < .001, ** p < .01, * p < .05
The combination of the reported and releveled models shows that closure duration is the longest for the bilabial and dental stops, with no significant difference between them. The velar unaspirated stop evidences the next longest closure duration, followed by the voiceless unaspirated affricate. In an interaction with place, closure duration is also significantly longer for the dental and alveolo-palatal non-aspirates before high vowels as opposed to the bilabial and velar non-aspirates. Finally, closure duration for the bilabial non-aspirate before front vowels is significantly longer compared to the dental, velar, and alveolo-palatal non-aspirates.
4.2.3 Voiced stops
The negative VOT or voiced closure of the voiced plosives /b/ and /d/ shows significant main effects of place, tone, vowel height, and vowel advancement, with no significant effects of nasality and no significant place interactions (Table 9). The model includes a vowel height random slope with speaker. The model intercept is Syllables: 1s, Place: Bilabial, Tone: Falling-glottalized, Vowel Height: Non-High, and Vowel Advancement: Back.
Table 9 Closure duration for voiced tokens; N = 1216.

Significance codes: *** p < .001, ** p < .01, * p < .05
The voiced closure of the bilabials is significantly longer than the voiced closure of the dentals. For tone, voiced closure is significantly longer before a falling-glottalized tone, compared to the high, mid, or falling tones. No significant difference in voiced closure occurs with the low tone. Note that the dataset does not include any tokens of voiced stops with the mid-glottalized tone. Voiced closure is also significantly longer preceding both high and front vowels.
4.2.4 Summary of closure duration results
To summarize, closure duration for the voiceless aspirated, voiceless unaspirated, and voiced tokens shows effects of place, tone, vowel height, and vowel advancement, but not nasality – see Table 10. In this table, the bilabial aspirate, the bilabial and dental non-aspirates, and the bilabial voiced stop evidence the longest closure duration, while the alveolo-palatal affricates and the dental voiced stop show the shortest closure duration. In addition, the closure duration of the dental and velar aspirates and the velar non-aspirate is significantly different compared to the bilabial or alveolo-palatal places of articulation. Significant tonal effects on closure duration are limited to the aspirates and the voiced stops. The longest closure duration precedes the mid-glottalized and falling-glottalized tones for the aspirates and the falling-glottalized tone for the voiced stops. The shortest closure duration precedes the mid and high tones for the aspirates and the high, mid, and falling tones for the voiced stops. For vowel height, the longest closure duration precedes high vowels for the aspirates and the voiced stops, while closure duration is significantly longer for the dental and alveolo-palatal non-aspirates before high vowels in opposition to the bilabial and velar non-aspirates. Finally, the longest closure duration for the aspirates and the voiced stops occurs preceding front vowels. In addition, closure duration is significantly longer for the bilabial non-aspirate before front vowels in opposition to the dental, velar, and alveolo-palatal non-aspirates.
Table 10 Summary of significant closure duration effects (longest to shortest closure duration).

Note: Slashes between items (e.g. tʰ/kʰ) indicate no significant difference in closure duration. Configurations like MQ/FQ (F) (L) M/H indicate that while closure duration is significantly different between MQ/FQ and M/H, it is not significantly different between F and MQ/FQ, (F) and (L), and (L) and M/H.
4.3 The voiceless interval
A voiceless interval mixed effects model for all the data, which included type and vowel advancement random slopes with speaker, shows an effect of Type. As shown in Figure 5, the aspirated tokens exhibit longer voiceless intervals compared to the unaspirated tokens (p < .001). As a result, we analyse context effects of the aspirated and unaspirated voiceless interval separately.

Figure 5 (Colour online) Raincloud plot of mean voiceless interval by phoneme and type (unaspirated obstruents in orange and aspirated obstruents in green).
The results for voiceless aspirated obstruents are presented first (Section 4.3.1), followed by the voiceless unaspirated obstruents (Section 4.3.2). The section ends with a summary of the voiceless interval results (Section 4.3.3).
4.3.1 Voiceless aspirated obstruents
The analysis of the voiceless interval of aspirates shows main effects of place, nasality, vowel height, and vowel advancement, with a significant place and vowel advancement interaction (Table 11). The model includes nasality and vowel height random slopes with speaker. The model intercept is Syllables: 1s, Place: Bilabial, Nasality: Nasal, Vowel Height: Non-High, and Vowel Advancement: Back.
Table 11 The voiceless interval for aspirates; N = 3536.

Vadv = Vowel advancement. Significance codes: *** p < .001, ** p < .01, * p < .05, . p < .1
The reported and releveled models indicate that the voiceless interval of the velar aspirate is the longest, followed by the bilabial aspirate. The dental and alveolo-palatal aspirates evidence the shortest voiceless interval, with no significant difference between them. The voiceless interval is significantly longer preceding oral as opposed to nasal vowels. It is also significantly longer preceding high vowels. Finally, in an interaction between place and vowel advancement, the voiceless interval of the alveolo-palatal aspirate is significantly longer preceding front vowels compared to the bilabial, dental, and velar aspirates.
4.3.2 Unaspirated obstruents
The analysis of the voiceless interval of voiceless unaspirated obstruents shows significant main effects of place, nasality, and vowel advancement, with a significant place and vowel height interaction (Table 12). No random slopes are included in the model. The model intercept is Syllables: 1s, Place: Bilabial, Nasality: Nasal, Vowel Height: Non-High, and Vowel Advancement: Back.
Table 12 The voiceless interval for non-aspirates; N = 2133.

Vheight = Vowel height. Significance codes: *** p < .001, ** p < .01, * p < .05, . p < .1
The final and releveled models indicate that the voiceless interval of the bilabial non-aspirate is the longest and the alveolo-palatal non-aspirate voiceless interval is the shortest. The difference in the voiceless interval is not significant between the bilabial and velar non-aspirates and the velar and dental non-aspirates. Before high vowels, the voiceless interval is significantly longer for the alveolo-palatal non-aspirate, while it is significantly shorter for the bilabial and velar non-aspirates. Moreover, the voiceless interval of the dental non-aspirate before a high vowel is not significantly different from either the alveolo-palatal non-aspirate or the bilabial and velar non-aspirates. Finally, the voiceless interval of non-aspirates is significantly longer before front vowels.
4.3.3 Summary of voiceless interval results
In sum, as with both VOT and closure duration, the effects of place, nasality, and vowel quality on the voiceless interval are unique to each type. The significant differences in the length of the voiceless interval by context are detailed in Table 13.
Table 13 Summary of voiceless interval effects (longest to shortest voiceless interval).

Note: Slashes between items (e.g. pʰ/tʰ/kʰ) indicate no significant difference in voiceless interval duration. Configurations like p (k) t > tɕ indicate that while the voiceless interval between p, t, and tɕ is significantly different, it is not significantly different between p and (k) and (k) and t.
Comparing place effects on the voiceless interval of the aspirates, the velar aspirate is the longest, followed by the bilabial aspirate, and then the dental and alveolo-palatal aspirates. For the non-aspirates, the voiceless interval of the bilabial non-aspirate is the longest. The velar non-aspirate does not differ significantly from either the bilabial or the dental non-aspirates, although the voiceless interval of the dental non-aspirate is significantly shorter than the bilabial non-aspirate. The alveolo-palatal non-aspirate evidences the shortest voiceless interval compared to the bilabial, velar, and dental non-aspirates. The voiceless interval of the aspirated obstruents is significantly longer preceding high vowels, while the voiceless interval of the alveolo-palatal non-aspirate is the longest and the bilabial and velar non-aspirates are the shortest before high vowels. The voiceless interval of the dental non-aspirate before high vowels does not differ significantly from the other non-aspirates. Finally, in an interaction with place, the voiceless interval is the longest for the alveolo-palatal aspirate before front vowels compared to the other aspirates, while the voiceless interval of the non-aspirates is significantly longer before front vowels.
5 Discussion
In this study of the effects of place, nasality, tone, and vowel quality on the VOT, closure duration, and voiceless interval of the stops and affricates of N. Pwo, we recorded stop and affricate-initial words in a carrier phrase, then measured the voiced closure, voiceless closure, and release of these obstruents. Due to the large effect of type (voiceless unaspirated, voiceless aspirated, and voiced obstruents), we modelled the results for each type, first with VOT (release) as the dependent variable, then closure duration (voiced closure + voiceless closure), and the voiceless interval (voiceless closure + release). In the discussion to follow, VOT results are considered first (Section 5.1), followed by the closure duration (Section 5.2) and voiceless interval results (Section 5.3).
5.1 Voice onset time
Effects on VOT included place (Section 5.1.1), nasality (Section 5.1.2), tone (Section 5.1.3), and vowel quality (Section 5.1.4).
5.1.1 Place effects
We predicted that VOT would be the longest for the alveolo-palatal affricates, followed by the velars, then the dentals, and finally the bilabials, although it was possible that the difference in VOT between the dentals and bilabials would not be significant. As expected, the N. Pwo aspirated alveolo-palatal affricate exhibits the longest VOT, followed by the velar aspirate. The bilabial and dental aspirates have the shortest VOT, which does not differ significantly. This pattern is in line with Kayan, a related Karenic language (Luangthongkum 2010), where VOT is not significantly different between the alveolar and bilabial aspirated stops. Non-significant differences between the bilabial and alveolar or dental stops have also been reported for other languages (Lisker & Abramson 1964, Cho & Ladefoged 1999).
For the non-aspirates, the alveolo-palatal affricate has the longest VOT, followed by the velar non-aspirate. However, the bilabial non-aspirate has the next longest VOT, while the dental non-aspirate has the shortest VOT. Although this pattern of bilabial stops with longer VOT is uncommon, it is not unheard of. Lisker & Abramson (1964) report such a pattern for Tamil voiceless unaspirated plosives.
5.1.2 Nasality effects
Nasalized vowels affected the VOT of both the aspirated and unaspirated obstruents. For the aspirated obstruents, VOT is significantly longer before oral vowels, while the VOT of the alveolo-palatal and velar non-aspirates is significantly longer before oral vowels compared to the bilabial non-aspirate. The dental non-aspirate does not differ significantly from the other places of articulation before oral vowels.
The N. Pwo results are the opposite of the findings of Walker (1999) and Silva (2008). One possible explanation for longer VOT before N. Pwo oral vowels is that, before nasalized vowels, anticipatory velopharyngeal coarticulation disperses the build-up of stop air pressure, allowing for an earlier onset of voicing. In the case of Walker (1999) and Silva (2008), where longer VOT is found in a nasal harmony context, anticipatory velopharyngeal coarticulation is less likely because speakers have to create sufficient velopharyngeal closure so that oral pressure can be achieved to produce a burst release, likely creating a situation that increases the VOT in a nasal context. This interaction between VOT and nasality could be explored by comparing a language with phonemic nasalization to a language that has incomplete nasalization. The production of nasality has also been shown to involve lingual, labial, pharyngeal, and velic articulations to different degrees by speakers of Northern Metropolitan French (Carignan 2014). Given this situation, further exploration of the interaction of VOT with nasalized vowels and in nasal harmony contexts, both acoustically and articulatorily, is necessary.
5.1.3 Tone effects
When we considered the influence of tone on VOT, the longest VOT was predicted to occur preceding the N. Pwo mid tone, which rises slightly, as well as the low tone. The shortest VOT was predicted to occur preceding the glottalized and falling tones. For both the voiceless aspirated and unaspirated obstruents, the longest VOT occurred consistently preceding the mid tone. The low tone, which falls slightly, was associated with the next longest VOT for the aspirated obstruents, while unaspirated VOT preceding the low tone did not differ significantly from the mid tone. The shortest VOT occurred preceding the falling-glottalized tone, while VOT preceding the high, falling, and mid-glottalized tones ranged between these two extremes.
The association of the longest VOT with the N. Pwo mid and low tones and shortest VOT with the falling and glottalized tones is in keeping with the results for Hakka. In Hakka, VOT preceding both Tone 1 (24) and Tone 5 (11) is significantly longer for both aspirated and unaspirated voiceless stops. VOT is the shortest preceding stopped-syllable tones for both unaspirated stops (Tone 8 (55)) and aspirated stops (Tone 4 (32), Tone 8 (55)). Phonetically, the N. Pwo glottalized tones are considered to consist of pitch with a final glottal stop, which makes them comparable with the Hakka stopped-syllable tones. One difference, however, is that VOT preceding the N. Pwo falling tone is shorter along with the two glottalized tones, which differs from the Hakka pattern where only the closed syllable tones have the shortest VOT. However, Mandarin aspirated stops also exhibit significantly shorter VOT before Tone 4, a falling tone.
Based on the results for Mandarin, Hakka, Sgaw Karen, and N. Pwo, it appears that the VOT of voiceless stops is likely to be shorter before falling or glottalized tones, which are likely of shorter duration than the mid, low, and high tones. However, in Sgaw Karen the results indicate that VOT before breathy tones can be even shorter, even though the duration of the glottalized tones is shorter than the breathy tones. Recall that Sgaw Karen aspirated stops do not occur with the low breathy tone and their occurrence with the high breathy tone is rare. Also, the VOT of unaspirated stops is the shortest before breathy tones in Sgaw Karen. So, the Sgaw Karen results indicate that neither aspiration nor tone duration can account for shorter VOT preceding breathy tones.
One possible explanation for these patterns is that increased oral flow and stiffer vocal folds in preparation for the rising tones cause longer VOT. Liu et al. (2008) also provide a similar explanation for longer VOT with rising tones. However, this explanation does not account for the longer VOT before a low tone in open syllables, a pattern observed in both Hakka and N. Pwo. Furthermore, since breathy phonation requires less articulatory effort (Marasek 1997, Gick et al. 2013), it makes sense that the Sgaw Karen breathy tones are associated with the shortest VOT.
Another possible contributor to tonal effects on VOT is the fundamental frequency of the following vowel, which has been shown to be significantly higher following an aspirated consonant (House & Fairbanks 1953, Hombert, Ohala & Ewan 1979, Francis et al. 2006, Kirby 2018). Since VOT tends to be longer before high vowels, this circumstance could contribute to longer VOT before rising tones, although Kirby (2018) found that the effect of aspiration on the f0 of the following vowel was absent when target words appeared in a carrier phrase, which was the context of the N. Pwo tokens. The present results, which investigate the effect of tone on VOT, and these other results that investigate the effect of stops on the fundamental frequency of the following vowel suggest a need for further research on the two-way interaction between VOT and tone.
5.1.4 Vowel quality effects
Concerning vowel height, we predicted that VOT would, in general, be longer preceding high vowels. In N. Pwo, VOT is significantly longer preceding high vowels for both voiceless aspirates and non-aspirates. This result is in line with the findings in other languages (Rochet & Fei 1991, Nearey & Rochet 1994, Morris et al. 2008). Like the explanation for the tone results, longer VOT before high vowels may be due to an increase in the tension of the vocal folds and their length. Based on their study of the effect of vowels on the vocal mechanism, Higgins, Netsell & Schulte (1998: 723) suggest that vowel effects may be brought about by ‘a combination of the mechanical influence of laryngeal cartilage position, the suspected reflexive coupling of supralaryngeal and laryngeal neurons, and learned neural adjustments on the part of the speaker’.
For vowel advancement, we predicted that the VOT of the dental, alveolo-palatal, and velar voiceless aspirated and voiceless unaspirated obstruents would be the longest before front vowels, while the VOT of bilabial stops would be the shortest. For N. Pwo, the VOT of the alveolo-palatal aspirate is significantly longer before front vowels, followed by the velar and dental aspirates, with the VOT of the bilabial aspirate the shortest before a front vowel, as predicted. The voiceless unaspirated obstruents showed no effect of vowel advancement.
To summarize, the N. Pwo place effects on VOT behaved as expected, except that the VOT of the bilabial unaspirated stop was significantly longer than the dental unaspirated stop. We also found that nasality, tone, and vowel quality played a significant part in predicting the duration of VOT.
5.2 Closure duration
The question for closure duration was whether VOT and closure duration are in a reciprocal relationship when affected by vowel quality, a post-obstruent nasalized vowel, or tone. Place effects on closure duration were expected to pattern in the opposite direction of place effects on VOT, based on the findings of Weismer (1980), Walker (1999), and Silva (2008). Thus, longer closure duration accompanies shorter VOT and vice versa (Maddieson 1999).
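One simple way to make the notion of reciprocity concrete is to compare rank orders: a pattern is fully reciprocal when ordering the categories from shortest to longest VOT gives the same sequence as ordering them from longest to shortest closure duration. The following is a minimal sketch under that assumption; the function name `is_reciprocal` and the millisecond values are hypothetical placeholders, not measurements from this study, and the check ignores statistical significance and ties.

```python
def is_reciprocal(vot_ms: dict[str, float], closure_ms: dict[str, float]) -> bool:
    """Return True when the VOT ranking exactly reverses the closure ranking.

    Both arguments map a category label (e.g. a place of articulation)
    to a mean duration in milliseconds.
    """
    # Categories ordered from shortest to longest VOT.
    by_vot = sorted(vot_ms, key=vot_ms.get)
    # Categories ordered from longest to shortest closure duration.
    by_closure = sorted(closure_ms, key=closure_ms.get, reverse=True)
    return by_vot == by_closure


# Placeholder values for illustration only (not data from the study).
example_vot = {"bilabial": 60.0, "dental": 65.0, "velar": 80.0}
example_closure = {"bilabial": 110.0, "dental": 95.0, "velar": 90.0}
reciprocal = is_reciprocal(example_vot, example_closure)
```

In practice, a comparison of this kind would need to rest on the significance of the pairwise differences rather than on raw means, which is why partial reciprocity is the more common outcome.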
In N. Pwo, the patterns of place are largely reciprocal between VOT and closure duration for both the voiceless aspirated and unaspirated obstruents. Both the bilabial and dental aspirates show the shortest VOT, while only the bilabial aspirate evidences the longest closure duration. The next longest closure duration is shared by both the dental and velar aspirates. These results resemble those of Yao (2007), which showed that the closure duration of the aspirated bilabial was significantly longer than that of both the alveolar and velar aspirates. Between the alveolar and velar aspirates, the difference in closure duration was not as significant, which parallels the N. Pwo pattern where the difference between the dental and velar aspirate closure duration was not significant. For the unaspirated obstruents, the VOT and closure duration of the alveolo-palatal and velar non-aspirates are in a reciprocal relationship; the alveolo-palatal non-aspirate shows the longest VOT and the shortest closure duration, followed by the velar non-aspirate. The only departure from full reciprocity is that the VOT of the bilabial non-aspirate is significantly longer than that of the dental non-aspirate, while the difference between the closure durations of the bilabial and dental non-aspirates is not significant. As Yao (2007) concluded, VOT and closure duration are affected by different aspects of the context. As a result, VOT and closure duration do not exhibit perfectly reciprocal patterns.
Tone is the only other context effect that shows a reciprocal pattern between VOT and closure duration, although this only holds for the aspirated obstruents. For the aspirated obstruents, the longest VOT occurs preceding the mid tone, with the low tone the next longest. The shortest VOT occurs preceding the falling and falling-glottalized tones. This pattern is partly reversed for aspirated closure duration: the longest closure duration occurs preceding the mid-glottalized and falling-glottalized tones and does not differ significantly from the closure duration preceding the falling tone. The shortest closure duration occurs preceding both the mid and high tones, which do not differ significantly from the low tone.
As for the other obstruent types, non-aspirate closure duration is not affected by tone, while the voiced closure of voiced stops evidences a similar pattern to voiceless aspirated closure. The longest voiced closure occurs preceding the falling-glottalized tone and the shortest voiced closure occurs preceding the high, mid, and falling tones. The voiced closure preceding the low tone falls between that of the falling-glottalized tone and that of the high, mid, and falling tones.
Nasality was the only context effect that did not affect the closure duration of any of the obstruent types. This means that VOT is longer before oral vowels with no change in closure duration before either oral or nasal vowels. This was not the case for either of the nasal harmony languages (Walker 1999, Silva 2008), where longer VOT in nasal words correlated with shorter closure duration.
Finally, both vowel height and vowel advancement showed an overall parallel relationship between VOT and closure duration for the voiceless obstruents. This means that both VOT and closure duration are the longest preceding high, front vowels. However, in a place interaction with vowel height, the results showed a reciprocal pattern. Specifically, the bilabial non-aspirate has the longest VOT and the alveolo-palatal non-aspirate has the shortest VOT before high vowels. In contrast, the dental and alveolo-palatal non-aspirates have the longest closure duration, and the bilabial and velar non-aspirates have the shortest closure duration before high vowels. If there is a place interaction with vowel advancement, the pattern parallels the longest to shortest VOT or closure duration by place. Thus, before front vowels, the alveolo-palatal aspirate has the longest VOT and the bilabial aspirate has the shortest VOT, while the bilabial non-aspirate has the longest closure duration and the dental, velar, and alveolo-palatal non-aspirates have the shortest closure duration. As for the voiced closure of the voiced stops, it is also significantly longer preceding both high and front vowels. This result is contrary to our prediction that voiced closure would not be affected by vowel advancement, based on the lack of a vowel advancement effect on French voiced stops (Nearey & Rochet 1994).
This account of context effects on VOT and closure duration has described three general patterns: 1) a reciprocal pattern between VOT and closure duration for place in the voiceless obstruents and for tone in the voiceless aspirates, 2) an effect of nasality on VOT but not on closure duration, and 3) a parallel pattern between VOT and closure duration for both vowel height and vowel advancement if no place interactions are involved. Voiced stops are in a class of their own due to their voiced closure without any VOT, although context effects on voiced closure pattern similarly to those on voiceless closure duration. By including the voiceless interval in the discussion, some reasons for the N. Pwo patterns can be proposed.
5.3 The voiceless interval
For the voiceless interval (voiceless closure + voice onset time), the question was whether the voiceless interval could be affected by place, vowel quality, nasality, or tone. Table 14 summarizes the results for VOT, closure duration, and the voiceless interval together. The results in Table 14 demonstrate that the voiceless interval is not constant and is subject to context effects, contrary to reports that the voiceless interval remains constant regardless of place (Suomi 1980, Weismer 1980) or nasality (Walker 1999). Whether the voiceless interval is constant could, however, be language-specific: Walker (1999) reports that the voiceless interval is constant between oral and nasal words, while Silva (2008) reports that it is not constant for another Amazonian nasal harmony language.
Table 14 Context effects on VOT, closure duration and the voiceless interval (longest to shortest).

Note: Slashes between items (e.g. pʰ/tʰ) indicate no significant difference in VOT, closure duration, or voiceless interval duration. Configurations like High p (t, k) tɕ indicate that, before high vowels, the VOT of t and k does not differ significantly from that of either p or tɕ, while the VOT of p and tɕ does differ significantly.
These context effects pattern in one of three ways in relation to VOT and closure duration. In the first pattern, the context effect is the same across VOT, closure duration, and the voiceless interval. This would indicate that the context effect affects the whole consonant. In the second pattern, a context effect on the voiceless interval is the same as either VOT or closure duration. If the voiceless interval patterns the same as VOT, this would mean that the context effect influences VOT more than closure duration. The opposite is true when the voiceless interval patterns the same as closure duration. Finally, in the third pattern, a context effect does not affect the voiceless interval and the effect on VOT and closure duration is in a reciprocal relationship.
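The three-way classification described above can be sketched as a small decision procedure. This is only an illustrative sketch: the function name `classify_pattern`, the sign coding (+1 for longer, -1 for shorter, 0 for no significant effect), and the returned labels are assumptions introduced here, not part of the analysis reported in this study.

```python
def classify_pattern(vot: int, closure: int, interval: int) -> str:
    """Classify a context effect from the sign of its effect on each measure.

    Each argument is +1 (longer), -1 (shorter), or 0 (no significant effect),
    all stated relative to the same context (e.g. "before high vowels").
    """
    # Pattern 1: the same effect on VOT, closure duration, and the interval.
    if interval != 0 and vot == closure == interval:
        return "whole consonant"
    # Pattern 2a: the interval follows VOT but not closure duration.
    if interval != 0 and interval == vot and closure != interval:
        return "VOT-dominant"
    # Pattern 2b: the interval follows closure duration but not VOT.
    if interval != 0 and interval == closure and vot != interval:
        return "closure-dominant"
    # Pattern 3: no effect on the interval; VOT and closure move in opposite directions.
    if interval == 0 and vot != 0 and closure == -vot:
        return "reciprocal"
    return "unclassified"
```

For example, an effect that lengthens all three measures, such as the one reported here for high vowels, would be coded `classify_pattern(1, 1, 1)` and fall under the first pattern, whereas opposite effects on VOT and closure duration with no effect on the interval would be coded `classify_pattern(1, -1, 0)` and fall under the third.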
Considering context effects that affect the whole consonant, vowel height is the only effect that patterns the same across VOT, closure duration, and the voiceless interval. Thus, VOT, closure duration, and the voiceless interval are all significantly longer before high vowels. It is possible that it takes more time to build up the tension necessary to produce a high vowel, which involves the entire voiceless interval. This same reasoning would apply to the closure of voiced stops, which is also significantly longer before high vowels.
Vowel advancement presents mixed patterns when place interactions occur. In general, the longest VOT, closure duration, and voiceless interval occur before front vowels. Even with a place interaction, it is the aspirated alveolo-palatal affricate that has the longest VOT and voiceless interval before front vowels. In contrast, we found no effect of vowel advancement on the VOT of unaspirated obstruents, while closure duration is the longest for the bilabial non-aspirate before front vowels as opposed to the other obstruents. The voiceless interval of the non-aspirates is also the longest before front vowels. This pattern also holds for the voiced closure of voiced stops. Taken together, it appears that vowel advancement is also a property of the whole consonant, although it is possible that the short-lag VOT of unaspirated obstruents may not allow for a vowel advancement effect.
Considering the second pattern in which context effects on the voiceless interval are similar to either VOT or closure duration, place effects on closure duration and the voiceless interval are similar for all obstruent types. These place effects are the opposite for VOT. This would seem to indicate that the influence of place is stronger on closure duration than VOT. In contrast, nasality only affects VOT and the voiceless interval. Since nasality affects both VOT and the voiceless interval in N. Pwo, this might indicate that the influence of nasality is stronger on VOT than closure duration.
In the third pattern, the voiceless interval shows no effect of tone. In addition, tone only affects the closure duration of the aspirated obstruents and the voiced stops. For the aspirated obstruents, tone effects on closure duration are the opposite of VOT; VOT is the longest before the mid tone, while closure duration is the shortest before the mid tone. In this instance, tonal effects cancel each other out, which may be the reason that tone has no effect on the voiceless interval. Voiced closure parallels the pattern of the aspirated obstruent closure duration to some extent. Like the aspirated obstruents, the longest voiced closure occurs preceding the falling-glottalized tone and the shortest closure duration occurs preceding the high, mid, and falling tones.
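The cancellation argument can be written out explicitly. As a sketch, let VI and CD stand for the voiceless interval and the voiceless closure duration, and let Δ denote the difference in a measure between two tonal contexts; these symbols are introduced here only for illustration and do not appear elsewhere in the paper.

```latex
\begin{align}
\mathrm{VI} &= \mathrm{CD} + \mathrm{VOT} \\
\Delta\mathrm{VI} &= \Delta\mathrm{CD} + \Delta\mathrm{VOT} \approx 0
  \quad \text{when} \quad \Delta\mathrm{CD} \approx -\Delta\mathrm{VOT}
\end{align}
```

On this view, a tone that lengthens VOT while shortening closure duration by a comparable amount leaves the voiceless interval essentially unchanged, which is consistent with the absence of a tone effect on the voiceless interval.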
In sum, vowel quality effects largely impact the acoustic characteristics of the whole consonant. In contrast, place has a stronger influence on closure duration, while nasality has a stronger influence on VOT. Finally, tone does not affect the length of the voiceless interval at all and the reciprocal patterns of tonal effects on VOT and closure duration cancel each other out for the aspirated obstruents, while the effect of tone on the voiced closure of the voiced stops patterns like the effect of tone on the closure duration of the aspirated obstruents.
6 Conclusion
In addition to our primary goal of documenting some acoustic aspects of N. Pwo Karen for the first time, we set out to answer three questions. The first question was concerned with whether the VOT of N. Pwo stops and affricates is affected by tone, vowel quality, and nasality, since N. Pwo is a ‘true voicing’ language, with a three-way distinction in stops. We found that VOT is affected by all three of these context effects.
The second question asked whether VOT and closure duration evidence a reciprocal relationship when affected by vowel quality, a post-obstruent nasalized vowel, or tone. In a reciprocal relationship, when VOT is long, closure duration is short and when closure duration is long, VOT is short. We found that only place and tone had reciprocal effects on VOT and closure duration, while nasality only affected VOT, and vowel quality effects were similar for VOT and closure duration.
The third question asked whether the voiceless interval was affected by place, vowel quality, nasality, and tone. We found that the voiceless interval was affected by place, nasality, and vowel quality, but not tone. These patterns show that the voiceless interval in N. Pwo is not constant.
It is apparent from this study that in order to develop an understanding of the laryngeal timing of obstruents, it is necessary to examine at least closure duration, in addition to VOT, while keeping in mind that context has an effect on both VOT and closure duration, as well as the voiceless interval.
In addition, questions remain, especially about the effects of nasality and tone on VOT and closure duration. For both tone and nasality, the configuration of a language’s tonal or nasal inventory seems to have a bearing on their effects, considering the divergent patterns of nasal harmony contexts and post-obstruent nasalized vowels. Moreover, obstruents have been shown to affect the fundamental frequency of the following vowel, and this pattern needs to be considered together with the effects of tone on obstruents. Finally, more studies of tone effects on VOT, closure duration, and the voiceless interval are needed to determine whether the patterns seen in N. Pwo are found in other languages with a similar tonal inventory. Tonal duration measurements would also provide insight into tonal effects on laryngeal timing.
Acknowledgements
The authors acknowledge the assistance of Mr. Sanit Thutphithak, who facilitated the data collection, and the twelve participants who provided the data for this study. We also benefited from the input of three anonymous reviewers.
Appendix A. Word list

Appendix B. Distribution of context effects
Table A1 Quantity of vowel, tone, and place combinations in the data set.

H = High, M = Mid, L = Low, F = Falling, MQ = Mid-glottalized, FQ = Falling-glottalized
Dark grey shading indicates an impossible vowel + tone combination. Lighter grey shading indicates a possible vowel/tone combination that does not occur in the data set.