Final obstruent devoicing is prevalent crosslinguistically. While many languages phonologically neutralize final obstruents (e.g., German, Catalan, Russian), others do so variably; in English, complete and partial devoicing of final obstruents is relatively common (e.g., Docherty, 1992 and references therein). The prevalence of this phenomenon can be attributed to the aerodynamics of voiced obstruents, which require airflow through the glottis yet prevent supraglottal airflow from being readily released. Thus, “active oral tract expansion (for example, by tongue root advancement or larynx lowering) is necessary to maintain airflow in an obstruent” but “cannot be continued indefinitely or controlled tightly” (Hayes & Steriade, 2004:8). While these aerodynamic difficulties are exacerbated in prepausal position, obstruent devoicing in English also variably occurs word- and phrase-medially and initially.
Despite its ubiquity in varieties of Mainstream U.S. English (MUSE), final /z/ devoicing has been consistently cited as a salient feature of Latinx¹ Englishes (e.g., Frazer, 1996; Thomas & Van Hofwegen, 2019; Thompson, 1975), that is, varieties spoken by individuals and communities with an ethnic background and descent from Mexico, Central America, or South America. For these varieties, final /z/ devoicing has been attributed to substrate effects: Spanish lacks phonemic /z/. However, it is not clear from the existing literature that speakers of Latinx Englishes devoice final /z/ at higher rates than speakers of MUSE; the strong connection listeners make between this feature and Latinx speakers remains unexplained. The primary goal of the present study is to explore whether the acoustic characteristics of devoiced final /z/ might provide clues as to why devoicing is so much more salient in these language varieties.
In addition to exploring the acoustic characteristics of devoiced final /z/ in Latinx English, a secondary goal of the present study is to evaluate the potential benefits of analyzing this phenomenon as continuous rather than categorical. That is, while it is well established that voicing is a gradient phonetic process, existing work on this variable has conceived of it segmentally ([z] vs. [s], [z] vs. [z̥], etc.). This methodological approach requires researchers to draw arbitrary lines between variants, relying on dichotomous auditory judgments and hindering replicability. We thus aim to provide the first instrumental study of final /z/ devoicing in Latinx English in which we analyze this phenomenon gradiently via the continuous measure of percent voicelessness and subsequently compare our findings to existing segmental accounts.
Through an acoustic analysis of the naturalistic speech of eighteen Latinx high schoolers in Southern California, the paper endeavors to address two distinct research questions:
1. In other dialects of English, research has shown that devoiced /z/ retains various acoustic characteristics that prevent it from fully neutralizing with /s/. Is there evidence for more complete neutralization that might explain the feature's increased salience in Latinx dialects?
2. When /z/ devoicing is examined gradiently rather than categorically, do the following segment, the morphological status of /z/, and the gender identity of the speaker (the most robust predictors of /z/ devoicing in previous accounts of the phenomenon) still condition variation in the expected ways?
The sections that follow review the literature that serves as a foundation for these research questions.
(De)voicing as a gradient property
Previous studies of /z/ devoicing in both Latinx and other dialects of English have largely treated this variable as categorical, coding variants in either a bipartite or tripartite manner. Yet crosslinguistic studies on the inverse phenomenon of /s/ voicing have made compelling cases, both theoretical and practical, for treating voicing as gradient.
In a paper examining /s/ voicing assimilation in a non-/s/ weakening dialect of Peninsular Spanish, Campos-Astorkiza (2014) found ample variation in the degree of voicing before a following voiced obstruent, an environment thought to trigger phonological voicing assimilation across dialects of Spanish. The author concluded that “[/s/ voicing] is not a categorical process, but rather gradient and incomplete in many cases” and attributed this gradience to “increased gestural overlap between two adjacent and contradictory glottal gestures,” which “results in gestural blending and, consequently, gradient surface assimilation.”² The author further emphasized that “the question is not whether assimilation takes place or not, but rather to what extent the involved laryngeal gestures overlap; there could be no or minimal overlap, or total overlap” (Campos-Astorkiza, 2014:31). Earlier work on other dialects of Spanish (e.g., Romero, 1999; Schmidt & Willis, 2011) has made similar claims.
Investigations of /s/ voicing in other languages such as Greek have yielded parallel results and led scholars to arrive at similar conclusions about the gradient nature of voicing. Pelekanou and Arvaniti (2001), for example, examined /s/ voicing sandhi in two regional dialects of Greek and found that, in /s/+sonorant environments, degree of voicing is unsystematic. In line with Campos-Astorkiza (2014), the authors argued that the fact that identical phonological environments engendered different degrees of voicing showed that “/s/-voicing should best be treated as a gradient phenomenon, i.e., as the result of gestural overlap between the gestures of the vocal folds” (Pelekanou & Arvaniti, 2001:73). Baltazani (2006) later reaffirmed Pelekanou and Arvaniti's results in a similar paper and cautiously situated her findings within Articulatory Phonology (Browman & Goldstein, 1986, 1989, 1992). Akin to the explanations offered by the scholars discussed above, Baltazani (2006:9) framed gradient voicing assimilation as “variable reduction in the amplitude of the opening gesture of the glottis, which is responsible for voicelessness.”
/z/ devoicing as a feature of Latinx Englishes
Because /z/ devoicing is common across dialects of English, scholars have sought to explain why it is such a salient feature of Latinx Englishes. Early accounts of /z/ devoicing posited that frequency may be the answer. In one of the first studies of this feature, for example, Thompson (1975) examined the speech of forty second-generation Mexican American men in Austin, Texas, in an auditory coding task and found that twenty-three of his participants devoiced /z/ between 10% and 15% of the time (typically after /ɹ/, as in ‘cars’ /kɑɹz/, also a feature of Anglo speech in Austin), while the other seventeen devoiced at a rate of 35-75%. He concluded that only this latter group showed increased devoicing attributable to contact with Spanish and thus proposed a threshold of 25% for Spanish-influenced /z/ devoicing.
Some more recent work has shown that Mexican American speakers do, indeed, devoice /z/ at rates at or above this threshold. For example, in a study of twenty-four Mexican American teens in San Antonio, Texas, Bayley and Messing (2008) found that final /z/ devoicing occurred in 39.1% of the tokens. In the same community a handful of years later, Bayley and Holland (2014) examined final /z/ devoicing among thirteen new speakers and found that the rate of devoicing was almost identical to that obtained by Bayley and Messing.
That said, not all findings have been so consistent. Frazer (1996), for example, found that Mexican American speakers in Sterling and Rock Falls, Illinois, devoiced /z/ either very infrequently or not at all, with individual rates ranging from 0% to 16%, all well under Thompson's (1975) threshold. Similarly, in a comprehensive study of Chicanx English speakers in Austin, Texas, Galindo (1987) found that only 10% of tokens of /z/ were devoiced to [s]. While differences in methodology³ (specifically, what constitutes devoicing) and the social characteristics of participants may have contributed to these disparate findings, it is nevertheless difficult to conclude from this body of literature that speakers of Latinx Englishes devoice /z/ at higher rates than speakers of other dialects.
Studies of /z/ devoicing in MUSE further complicate the story that frequency explains the salience of this feature in Latinx Englishes. In an examination of English /z/ devoicing via electroglottography (EGG) measures, for example, Smith (1997) found substantial variation in devoicing among four Midwestern and Western U.S. English-speaking participants. Binning tokens into three groups, “voiced” (tokens in which more than 90% of the fricative showed vocal fold vibration), “partially devoiced” (tokens in which 25-90% of the fricative was voiced), and “devoiced” (tokens in which less than 25% of the fricative was voiced), Smith found that speakers produced “devoiced” tokens at rates of approximately 24-67% and “partially devoiced” tokens at rates of approximately 17-45%.⁴ All participants devoiced or partially devoiced /z/ more than 50% of the time, a rate higher than those found in various studies of /z/ devoicing in Latinx Englishes. Other studies of /z/ devoicing in MUSE have yielded similar results. For example, José (2010) found that White speakers in Northwestern Indiana devoiced postsonorant /z/ at a rate of 34% overall, well above Thompson's (1975) threshold.
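The three-way binning described above can be sketched in a few lines of Python. This is an illustration only, not Smith's procedure: the function names are hypothetical, the input is assumed to be the percentage of each fricative that shows vocal fold vibration, and the treatment of tokens falling exactly on the 25% and 90% boundaries is an assumption based on the ranges reported above.

```python
from collections import Counter

BINS = ("voiced", "partially devoiced", "devoiced")


def bin_token(percent_voiced):
    """Assign one /z/ token to a bin from the percentage of the fricative that is voiced."""
    if not 0.0 <= percent_voiced <= 100.0:
        raise ValueError("percent_voiced must lie between 0 and 100")
    if percent_voiced > 90.0:       # more than 90% voiced
        return "voiced"
    if percent_voiced >= 25.0:      # 25-90% voiced (boundary handling is an assumption)
        return "partially devoiced"
    return "devoiced"               # less than 25% voiced


def bin_rates(percent_voiced_values):
    """Return the share (in %) of tokens that fall into each of the three bins."""
    labels = [bin_token(v) for v in percent_voiced_values]
    if not labels:
        raise ValueError("need at least one token")
    counts = Counter(labels)        # bins with no tokens count as 0
    return {label: 100.0 * counts[label] / len(labels) for label in BINS}
```

Moving either cutoff changes the reported rates without any change in the underlying signal, which is one reason rates from studies with different coding schemes are hard to compare.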
Just as the frequency of /z/ devoicing does not suffice to explain its salience in Latinx Englishes, nor do the contexts in which the phenomenon occurs: the phonological environments and morphological classes that favor devoicing have been found to be almost identical in Latinx and non-Latinx dialects of English. (Findings for Latinx Englishes are reviewed in the following section; for other dialects of English, see, for example, Holmes [1996], José [2010], and Smith [1997].) We thus turn to possible acoustic explanations.
Accounts of /z/ devoicing in non-Latinx dialects of English have examined the acoustic cues that potentially differentiate devoiced /z/ from /s/. Smith (1997), for example, found that, even when fully or partially devoiced, /z/ retains many of the acoustic cues that typically differentiate voiced fricatives from their voiceless counterparts; that is, the duration of /z/ is significantly shorter than that of /s/, the duration of the vowel preceding /z/ is significantly longer, and airflow for /z/ is significantly lower. Smith concluded that /z/ and /s/ resist complete neutralization, even when the former is produced without vocal fold vibration. As no similar analyses (to our knowledge) have been conducted with speakers of Latinx Englishes, we hypothesize that more complete neutralization of devoiced /z/ and /s/, a loss of contrast that would make /z/ devoicing more perceptually salient, might acoustically distinguish Latinx English /z/ devoicing.
Existing sociophonetic accounts of Latinx English /z/ devoicing
The voicing and manner of the following segment and the morphological class of /z/ have consistently been identified as the best predictors of devoicing in Latinx Englishes. Unsurprisingly, given expected regressive assimilatory effects, Bayley and Messing (2008), Bayley and Holland (2014), Doviak and Hudson-Edwards (1980), and others have identified the voicing of the following segment as the most important determinant of /z/ devoicing. Most recently, Bayley and Holland (2014) found that /z/ was devoiced most frequently when followed by an /s/ (e.g., /dɑɡzslip/ ‘dogs sleep’), a pause (e.g., /dɑɡz/ ‘dogs’), or another voiceless consonant (e.g., /dɑɡzkʌdəl/ ‘dogs cuddle’). While voiced following segments generally disfavor devoicing, manner matters: among voiced segments, devoicing is most prevalent preceding nasals and liquids (e.g., /dɑɡzno͡ʊ/ ‘dogs know’) and least prevalent preceding glides and vowels (e.g., /dɑɡzit/ ‘dogs eat’). These results echo, for the most part, those of Galindo (1987) and are summarized in Figure 1.
With respect to morphology, Bayley and Messing (2008) found that inflectional /-z/ (e.g., /dɑɡ-z/ ‘dog-s’) favored devoicing (except in cases of the third person singular, e.g., /d͡ʒɑɡ-z/ ‘jog-s’), while /z/ that occurred as part of a monomorpheme (e.g., /lɛnz/ ‘lens’) was less likely to be affected. Bayley and Holland (2014) further extended this result, showing that plurals, possessives (e.g., /dɑɡ-z/ ‘dog's’), and contracted copulas (e.g., /hiz/ ‘he's’) all favored devoicing, while third person singular was neutral, and monomorphemes disfavored it. The authors noted that, while this finding “is the opposite of what we find in well-studied variables, such as English coronal stop deletion where inflected forms tend to resist deletion (Bayley, 1994; Labov, 1989),” the case of /z/ devoicing is different in that “there is no loss of information when an affix is devoiced. That is, from a functional perspective, plurals, possessives, and contracted copulas retain the same information whether the affix is realized as [s] or as [z]” (396-98).
Finally, gender has been consistently identified as a predictor of /z/ devoicing in Latinx Englishes, with scholars, regardless of when and where research is conducted, finding that women devoice /z/ at higher rates than their male counterparts (e.g., Bayley & Holland, 2014; Bayley & Messing, 2008; Frazer, 1996; Galindo, 1987; among others). It is worth noting that this finding is not unique to Latinx English. Rather, gender seems to be the most robust social predictor of /z/ devoicing in other dialects of English as well (e.g., José, 2010; Verhoeven, Hirson, & Basavaraj, 2011).
Methods
Participants
The speakers in this study are eighteen self-identified Chicanx high school students (15–17 years old) residing in Los Angeles, San Bernardino, and Riverside Counties of Southern California. Eleven identify as female, and seven identify as male. All students report English and Spanish as their first languages and indicate that they use both English and Spanish on a daily basis. Figure 2 shows the geographic locations of the various communities from which the participants hail,⁵ with the Latinx population (which ranges from 28.2% [Loma Linda] to 96.2% [East Los Angeles]) represented by dot size, based on data from the U.S. Census (U.S. Census, 2020). While the Census data reflect those who identify as any type of Latinx, the students in the current study all have Mexican American heritage and identify as Chicanx (Fought, 2003).
Speakers were interviewed during June and July of 2019 while participating in a Summer Enrichment Program (SEP) sponsored by Pomona College. The SEP provides enrichment opportunities, such as supplementary English and math classes, for high-achieving, low-income/first-generation high schoolers in Southern California, with the goal of assisting them in gaining admission to selective colleges and universities. One hundred percent of the students participating in the SEP identified as students of color, with 52% identifying as Hispanic/Latinx. All interviews, lasting approximately thirty minutes each, were conducted by six high school seniors also participating in the SEP, under the direction of a research team led by the first author and two undergraduate students. These SEP student researchers had participated in the program since their first year of high school and, as incoming seniors, had the opportunity to collaborate with faculty and undergraduate students on a research project. They were trained in research ethics and sociolinguistic interview procedures before embarking on the interview process.
Given the original research team's interest in the relationship between the use of nonstandard linguistic forms and school experience, interviews focused on topics related to language use, discipline and punishment, and race/ethnicity. As the interviewers were peers of the participants speaking in a private setting, students may have felt more comfortable using their authentic speech style and openly discussing their experiences at school (Rickford & McNair-Knox, 1994). Three of the interviewers identified as Chicanx, two as Asian American, and one as African American. Each interviewer conducted 3-5 one-on-one peer interviews as part of the study. Data collection met the ethical standards required by Pomona College and was approved through its Institutional Review Board (IRB).
Analysis
All interviews were orthographically transcribed and annotated by the first author and the undergraduate researchers on the research team using ELAN (ELAN, 2019) and were subsequently force-aligned using DARLA (Reddy & Stanford, 2015). For each recording, all instances of expected word-final [z]⁶ were identified and coded for preceding segment, following segment, and word on an interval tier in Praat (Boersma & Weenink, 2020). If the segment preceding /z/ was a vowel, then that vowel was also segmented. We extracted TextGrid information as well as measures of percent voicelessness, fricative duration, preceding vowel duration (if applicable), and center of gravity (COG) for 2,382 tokens of /z/ using a Praat script (Brown, 2014). Each participant contributed an average (mean) of 132 tokens of /z/ to the final dataset. While no prior studies (to our knowledge) have examined the role of COG in maintaining perceptual contrast between [z̥] and [s], we thought it prudent to do so given that voicing often pulls down COG values even when a low pass-band filter has been applied (Lindhout, 2016; Niebuhr, Lancia, & Meunier, 2008), giving voiceless fricatives reliably higher COGs than their voiced counterparts.
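The measure of percent voicelessness can be illustrated with a short Python sketch. How the Praat script cited above computes the measure is not detailed here, so the frame-based definition below (the share of analysis frames within the fricative for which no voicing is detected) is an assumption, as are the function name and the input format.

```python
def percent_voiceless(frame_is_voiced):
    """Percentage of analysis frames in a fricative with no detected voicing.

    frame_is_voiced: iterable of booleans, one per analysis frame inside the
    fricative interval (True = voicing detected). Hypothetical input format.
    """
    frames = list(frame_is_voiced)
    if not frames:
        raise ValueError("the fricative interval contains no frames")
    unvoiced = sum(1 for voiced in frames if not voiced)
    return 100.0 * unvoiced / len(frames)
```

Under this definition, a fricative whose voicing dies out halfway through scores 50, whereas a binary coding scheme would have to assign it to one category or the other.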
Once tokens and their corresponding acoustic and temporal measures had been extracted, the research team coded them for morphological class: third person singular /-z/ (e.g., /ɡo͡ʊ-z/ ‘goes’), other inflectional /-z/ (e.g., /dɑɡ-z/ ‘dogs’), or lexical /z/ (e.g., /lɛnz/ ‘lens’). Before conducting statistical analyses, we converted COG measures to z-scores using the scale() function in R (R Core Team, 2020) in an effort to account for variance in vocal tract size. All statistical analyses were conducted in R, and all data visualizations were produced using the ggplot2 package (Wickham, 2016).
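The z-score conversion can be sketched in Python as follows. With its default arguments, R's scale() subtracts the mean and divides by the sample standard deviation, which is what the sketch reproduces. Whether the scores were computed within each speaker or over the pooled data is not stated above; the per-speaker grouping shown here is an assumption (it is the variant that would address differences in vocal tract size), and the function names and the (speaker, COG) input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore(values):
    """Center on the mean and divide by the sample (n - 1) standard deviation."""
    values = list(values)
    m = mean(values)
    s = stdev(values)  # raises StatisticsError if fewer than two values
    if s == 0:
        raise ValueError("cannot z-score values with zero variance")
    return [(v - m) / s for v in values]


def zscore_by_speaker(tokens):
    """Z-score COG within each speaker (assumed grouping).

    tokens: list of (speaker_id, cog_hz) pairs. Returns (speaker_id, z) pairs
    in the original token order.
    """
    by_speaker = defaultdict(list)
    for speaker, cog in tokens:
        by_speaker[speaker].append(cog)
    stats = {spk: (mean(v), stdev(v)) for spk, v in by_speaker.items()}
    return [(spk, (cog - stats[spk][0]) / stats[spk][1]) for spk, cog in tokens]
```

After per-speaker normalization, two speakers whose raw COG values differ by a constant offset receive identical scores, so between-speaker differences in overall spectral range no longer drive the comparison between /s/ and /z/.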
Results
Almost half of the /z/s analyzed (48%) were categorically voiced (37.3%) or voiceless (10.4%); the other 52% were intermediately devoiced. Figure 3 summarizes the distribution of data in a density plot.
While previous studies have shown that Latinx English speakers tend to devoice /z/ at rates of more than 25%, a gradient analysis reveals that the proportion of categorically voiceless tokens among our speakers is substantially lower. That said, almost two-thirds of tokens are devoiced to some degree, highlighting the potential issues posed by treating this phenomenon segmentally and the necessity of examining voicing in more precise detail.
Acoustic/temporal cues differentiating devoiced /z/ from /s/
Given that, in other dialects of English, research has shown that devoiced /z/ retains various acoustic characteristics that prevent it from fully neutralizing with /s/, we first investigated whether there was evidence for more complete neutralization in our data that might explain the feature's increased salience in this dialect. For all tokens of final /z/ and final /s/ for which percent voicelessness was greater than zero (n = 2100), we compared four acoustic or temporal measures that typically distinguish phonemic /z/ and /s/: percent voicelessness, COG, fricative duration, and the duration of the preceding vowel (when applicable; n = 1645). In the case of incomplete neutralization, we would expect all of these measures to be significantly higher for /s/ except for the duration of the preceding vowel, which tends to be longer before /z/ (and voiced obstruents in general).
Using the lme4 package (Bates, Maechler, Bolker, & Walker, 2015) in R, we constructed four linear mixed-effects models. Each model examined the effect of phoneme (/s/ versus /z/) on one of the four acoustic/temporal measures and included “speaker” as a random effect. This model structure was selected because, when the model is run with random effects of both “speaker” and “word,” R yields a “singular fit” warning, which indicates model overfitting (see Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). It was also clear that the slope of the effect of “word” in this model was close to zero, indicating that “word” has very limited explanatory power and thus does not add to the reliability of the model.
The first model, the output of which appears in Table 1, examined percent voicelessness and revealed that, as expected, /s/ is significantly more voiceless than partially and fully devoiced /z/.
Significance codes: *p < .05; **p < .01; ***p < .001. Dashes indicate reference level of variable.
The second model, the output of which appears in Table 2, examined normalized COG and revealed that, as expected, tokens of devoiced /z/ have a significantly lower normalized COG than those of /s/.
For ease of comparison, differences in raw (unnormalized) COG for /s/ and devoiced /z/ are shown in Figure 4.
While the effects of phoneme on percent voicelessness and COG are as expected, the third model revealed no such parallel for fricative duration. That is, while voiced fricatives are thought to be generally shorter than their voiceless counterparts (Kaiser, 1997), our speakers’ devoiced /z/s are significantly longer than their /s/s. The output of this model is presented in Table 3.
To ensure that the observed differences are not simply an artifact of the speakers producing longer-than-expected /z/s even in the absence of devoicing, we compared duration across three categories: realizations of final /s/, devoiced realizations of final /z/, and fully voiced realizations of final /z/ (the reader will recall that this third category was excluded from the analyses above). This comparison, presented in Figure 5, reveals that fully voiced final /z/s are much shorter than their devoiced counterparts, confirming that this lengthening is, indeed, unique to the latter variant.
The fourth and final model, the output of which appears in Table 4, examined the duration of the vowel preceding /s/ and devoiced /z/ (where applicable) and revealed that, again unexpectedly, vowels preceding /z/ are significantly shorter than those preceding /s/.
In summary, results show that in some respects devoiced /z/ remains distinct from /s/: as expected, devoiced /z/ is significantly less voiceless than /s/ and has a significantly lower COG. However, unexpectedly, devoiced /z/ has a significantly longer fricative duration and a significantly shorter preceding vowel duration than /s/, a pair of results that runs counter to general tendencies for voiced fricatives to be shorter and have longer preceding vowels than their voiceless counterparts.
Linguistic and social constraints
The second goal of the present study is to ascertain whether, when examined gradiently, following segment, morphological status of /z/, and gender identity of the speaker still condition variation in the expected ways. To address this question, we tested the effects of these three variables on percent voicelessness. Following Chappell and García (2017) and Bolyanatz and Brogan (2021), percent voicelessness was modeled using zero-one inflated beta regression in the gamlss package (Rigby & Stasinopoulos, 2005). Zero-one inflated beta regression is preferable to standard linear regression when the values of the dependent variable are bounded 0 ≤ y ≤ 1 and the data are nonnormally distributed.
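To make the logic of a zero-one inflated beta distribution concrete, the following Python sketch evaluates its log density for a single observation. It is a minimal illustration under stated assumptions rather than the model fitted here: the function name is hypothetical, and the mean/precision parameterization (mu, phi) is a common textbook choice that is not necessarily the one used internally by gamlss. The distribution places point masses p0 and p1 on exactly 0 and exactly 1 and spreads the remaining probability over the open interval with a beta density.

```python
import math


def zoib_logpdf(y, p0, p1, mu, phi):
    """Log density of a zero-one inflated beta variable.

    y   : observed proportion, 0 <= y <= 1 (percent voicelessness / 100)
    p0  : probability that y is exactly 0 (fully voiced token)
    p1  : probability that y is exactly 1 (fully voiceless token)
    mu  : mean of the beta component, 0 < mu < 1
    phi : precision of the beta component, phi > 0
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    if p0 <= 0.0 or p1 <= 0.0 or p0 + p1 >= 1.0:
        raise ValueError("need p0 > 0, p1 > 0, and p0 + p1 < 1")
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    if y == 0.0:
        return math.log(p0)          # point mass at zero
    if y == 1.0:
        return math.log(p1)          # point mass at one
    a = mu * phi                     # beta shape parameters
    b = (1.0 - mu) * phi
    log_beta_density = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )
    return math.log(1.0 - p0 - p1) + log_beta_density
```

In a regression setting, each of these parameters can be linked to predictors such as following segment and gender; the sketch covers only the distributional part. The point masses are what allow the model to accommodate the many tokens that are exactly 0% or 100% voiceless, which an ordinary beta distribution cannot represent.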
The initial model included three fixed effects: following segment (with eight levels reflecting both voicing and manner distinctions), morphological status of /z/, and gender, as well as a random effect for “speaker.” The final model includes the fixed effects of following segment and gender; the morphological status of /z/ was removed because it was not statistically significant, a finding that stands in contrast with segmental work on /z/ devoicing. Table 5 summarizes the model output.
Akin to previous work on this phenomenon, gender serves as a significant predictor of /z/ devoicing. However, it does so in an unexpected direction: men produce significantly more voiceless /z/s than their female counterparts. This finding, as well as the effect of the following segment, is discussed in more detail in the following section.
Discussion
Shades of neutralization
In our effort to ascertain what acoustic distinctions, if any, are retained between devoiced /z/ and /s/ in this dialect, we found that devoiced /z/ was significantly less voiceless and had a significantly lower COG than /s/. While these results are expected based on the existing literature, the findings with respect to fricative and preceding vowel duration are not. In naturalistic speech, voiced fricatives are reliably shorter than their voiceless counterparts while the vowels preceding voiced fricatives are reliably longer. Smith (1997) confirmed that devoiced /z/ remained significantly shorter than /s/ while the preceding vowel remained significantly longer among her MUSE speakers. However, our speakers’ devoiced /z/s were, in fact, significantly longer than their /s/s, while the vowel preceding /z/ was significantly shorter than that preceding /s/. These findings are notable as previous work has found that, the more advanced a neutralization process driven by substrate effects is in an ethnolect, the more stigmatized the feature (Chand, 2009).
While the limited amount of instrumental research on this variable makes us cautious in our interpretation of these asymmetries, we propose that, because fricative and preceding vowel duration are strong perceptual cues for voicing, the unexpectedly long devoiced /z/ and the unexpectedly short vowel preceding it may help explain the particularly high salience of this feature in Latinx Englishes. Of course, perceptual research is needed to confirm this hypothesis, and we encourage other scholars to pursue this line of inquiry.
/z/ devoicing: categories versus continua
The data presented in Figure 3 highlight the inherently gradient nature of final /z/ devoicing. When coding the data categorically, where does one draw the line? If “devoiced” means 100% voiceless, the frequency of devoicing in our data is about 10%; if “devoiced” means not fully voiced, the frequency is roughly 63%. As the body of literature on Latinx /z/ devoicing continues to grow, it would behoove scholars to either treat this feature as the gradient phenomenon it is or bin segmental categories (i.e., [s]/[z̥]/[z]) according to continuous acoustics to aid in replicability.
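The dependence of the reported devoicing rate on the chosen cutoff can be illustrated with a small Python sketch. The token values below are invented for illustration and are not drawn from our data; the function name and the three criteria are likewise hypothetical.

```python
def devoicing_rate(percent_voiceless_values, is_devoiced):
    """Share (in %) of tokens that a given categorical criterion counts as devoiced."""
    values = list(percent_voiceless_values)
    if not values:
        raise ValueError("need at least one token")
    hits = sum(1 for v in values if is_devoiced(v))
    return 100.0 * hits / len(values)


# Invented percent-voicelessness values for ten tokens of final /z/.
toy_tokens = [0.0, 0.0, 0.0, 0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 100.0]

strict = devoicing_rate(toy_tokens, lambda v: v == 100.0)   # fully voiceless only
majority = devoicing_rate(toy_tokens, lambda v: v > 50.0)   # mostly voiceless
lenient = devoicing_rate(toy_tokens, lambda v: v > 0.0)     # any loss of voicing

print(strict, majority, lenient)  # prints: 20.0 40.0 60.0
```

The same ten tokens yield a devoicing rate of 20%, 40%, or 60% depending solely on the criterion, which is why studies that do not report their cutoff are difficult to replicate or compare.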
Moreover, our sociophonetic analysis reveals that, when the dependent variable is treated as gradient, the linguistic predictors of /z/ devoicing shift in turn. Particularly of note is the absence of a significant effect for morphological status of /z/, which has reliably predicted /z/ devoicing in segmental accounts. Results with respect to following segment are more nuanced. Figure 6 juxtaposes our findings with those of existing segmental accounts; key differences are marked in italics.
As seen in Figure 6, while a following /s/, pause, and non-/s/ obstruent affect devoicing as in segmental accounts, we find differences vis-à-vis where following sonorants fall on the devoicing continuum. In previous accounts, less sonorous sonorants (nasals and liquids) tended to favor devoicing more than more sonorous segments (glides and vowels). However, in our model, vowels and glides favored devoicing more than liquids, which favored devoicing least of any following sound type. Upon examining the distribution of values for percent voicelessness by following segment in Figure 7, we see that nasals and vowels have a higher concentration of values on the right (more voiceless) end of the distribution as compared to liquids, which have more values clustered below 50% voicelessness.
Crucially, while nasals and liquids look almost identical at the poles of the distribution, potentially explaining why they tend to cluster together in segmental accounts, the shape of the distribution of values for nasals is much more akin to that of glides and vowels. A gradient analysis of final /z/ devoicing takes these underlying patterns into account.
A novel gender finding
As described in earlier sections of this paper, much previous work on final /z/ devoicing in Latinx Englishes has found that women devoice final /z/ at higher rates than their male counterparts (e.g., Bayley & Holland, 2014; Bayley & Messing, 2008). While this may be surprising given the general tendency for men to produce nonstandard variants at higher rates than women (Labov, 2001:293), it aligns with physiological expectations: men tend to have longer vocal tracts and thicker vocal folds (Beck, 1999), both of which can make glottal adduction harder to control.⁷
Our finding that male participants devoice /z/ at significantly higher rates than their female counterparts is inconsistent with the existing body of work and runs counter to what we might expect if variation were driven purely by physiological differences. Thus, we propose that, in the speech community in question, the characteristics, qualities, and so on indexed through use of the nonstandard variant (devoiced [z]) may play a crucial role in explaining patterns of variation. With this proposal in mind, it is worth noting that our male speakers also produce significantly more voiceless /z/s preceding voiced consonants as compared to their female counterparts, as shown in Figure 8.
This finding is rather curious, as the environment before a voiced consonant is not only the most phonetically unnatural context for devoicing but is also the only context in which Spanish phonologically voices /s/, in words such as /besbol/ [ˈbez.bol] ‘baseball.’ We argue that this finding lends credence to our proposal that /z/ devoicing carries social meaning for speakers, as scholars have frequently cited use of an already-salient variable in phonetically unnatural contexts as a means of emphasizing in-group solidarity. Schilling-Estes (Reference Schilling-Estes2000:159), for example, in her account of /ay/ monophthongization among Lumbee speakers in North Carolina, noted that the “social salience [of /ay/ monophthongization] may have led speakers […] to heighten their usage levels for the variant in phonetically less favored as well as highly favored contexts in order to increase the prominence of an already highly noticeable feature.” Future research should investigate the potential social meaning of final /z/ devoicing among Chicanx youth in Southern California.
Conclusion
The goals of the present study were twofold. First, we endeavored to explore the acoustic nature of final /z/ devoicing among speakers of Chicanx English. We compared realizations of devoiced /z/ with those of /s/ across multiple acoustic and temporal measures. We found that, for measures of percent voicelessness and COG, devoiced /z/ and /s/ remained sufficiently differentiated. However, while previous studies of /z/ devoicing in other dialects of English show that devoiced /z/ retains a shorter fricative duration and a longer preceding vowel than /s/, we found that these contrasts were not just lost but reversed: our speakers’ devoiced /z/s were significantly longer than their /s/s, and vowels preceding devoiced /z/ were significantly shorter than those preceding /s/. These novel findings raise the question of whether these durational asymmetries might explain the elevated salience of this variable in this dialect.
The present study also aimed to provide a sociophonetic account of final /z/ devoicing in Chicanx English and championed the use of acoustic measures to do so. Previous work on this variable has treated it categorically, which is problematic both theoretically and practically. First, given the robust evidence that voicing is a gradient process, our analytical methods should reflect that gradience. Furthermore, because of the gradient nature of this variable, binning variants into artificial categories is necessarily subjective, as “the relationship between phonological representation and phonetic realization becomes complex and no clear boundary between /z/ and /s/ can be established” (Thomas & Van Hofwegen, Reference Thomas, Van Hofwegen and Thomas2019:63). In this paper, we showed that the wide distribution of voicelessness across tokens of /z/ requires that the variable either be treated as continuous or be binned acoustically to aid replicability.
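To make the contrast between a continuous measure and acoustic binning concrete, the following is a minimal sketch, assuming each token of /z/ comes with frame-level voicing decisions (for example, from a pitch tracker). The function names, the frame representation, and the 25% and 75% cut-offs are illustrative assumptions, not the procedure or thresholds used in this study.

```python
from typing import Sequence


def percent_voiceless(voiced_frames: Sequence[bool]) -> float:
    """Return the share of unvoiced frames in a token, as a percentage.

    `voiced_frames` is a hypothetical representation: one boolean per
    analysis frame, True where voicing was detected.
    """
    if not voiced_frames:
        raise ValueError("token has no frames")
    unvoiced = sum(1 for is_voiced in voiced_frames if not is_voiced)
    return 100.0 * unvoiced / len(voiced_frames)


def bin_token(pct: float, low: float = 25.0, high: float = 75.0) -> str:
    """Bin a continuous value using explicit, reportable thresholds.

    The default cut-offs are placeholders chosen for illustration only.
    """
    if pct < low:
        return "voiced"
    if pct < high:
        return "partially devoiced"
    return "devoiced"


# One hypothetical token: voicing dies out after the first two frames.
frames = [True, True, False, False, False, False, False, False]
pct = percent_voiceless(frames)   # continuous outcome for modeling
label = bin_token(pct)            # acoustically defined category
```

Because the thresholds are stated numerically rather than judged by ear, another researcher could apply the same bins to new data, or skip binning entirely and model the continuous value directly.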
Moreover, when comparing our sociophonetic model to those presented in previous studies, we found that the linguistic predictors of /z/ devoicing differed slightly when the outcome variable was treated as continuous. We also found that male speakers devoiced final /z/ significantly more than their female counterparts (and did so in more phonetically unnatural environments), a finding that runs counter to much of the existing literature. We cautiously take this as some evidence that this variable carries social meaning for these male speakers, who are physiologically predisposed to producing less voicelessness, not more. Recognizing that social meaning is locally constructed, we believe that additional in-depth, qualitative research is required to unearth the meaning of this variable in this particular community.
Finally, while durational measures were not the focus of this paper, one limitation of our methodology is that we did not normalize duration by local speech rate, which could affect the duration measures. Additionally, we did not conduct a full prosodic analysis examining pause length and boundaries, which could also affect variation at the individual level. Future work that aims to corroborate our findings with respect to fricative length should consider integrating speech rate into its analyses to avoid this shortcoming.
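One simple way future work might fold local speech rate into a duration measure is sketched below. It is an assumption for illustration, not a method used in this study: the fricative duration is divided by the mean syllable duration in a surrounding window, so the result is expressed in syllable-lengths rather than milliseconds. The function name, the window, and the example values are hypothetical.

```python
def rate_normalized_duration(
    segment_ms: float, window_ms: float, n_syllables: int
) -> float:
    """Express a segment's duration relative to local speech rate.

    `window_ms` is the duration of a hypothetical stretch of speech
    around the token, and `n_syllables` the syllables it contains.
    """
    if window_ms <= 0 or n_syllables <= 0:
        raise ValueError("window must have positive duration and syllables")
    mean_syllable_ms = window_ms / n_syllables
    return segment_ms / mean_syllable_ms


# Hypothetical token: a 120 ms fricative in a 2000 ms window of 10 syllables.
normalized = rate_normalized_duration(120.0, 2000.0, 10)
```

An alternative with the same aim would be to keep raw duration as the outcome and enter local speech rate as a predictor in the statistical model.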
Acknowledgments
We are grateful to our participants as well as the student interviewers for sharing their time and stories with the research team, and to Pomona College's Department of Linguistics and Cognitive Science for financial support. No external funding was obtained in conducting this research.
Competing interests
The authors declare none.