1 Introduction
Cross-linguistic surveys of fricative systems (Maddieson 1984, Ladefoged & Maddieson 1996) show that languages rarely contrast more than two phonemic sibilant fricatives of the same phonation type. Even less common is the use of secondary articulations to contrast sibilants within the same primary place. In Maddieson's (1984) survey of sound inventories, only five out of 317 languages have a palatalized dental/alveolar /sʲ/ (compared to 276 languages with a plain dental/alveolar /s/), and only three languages have a palatalized post-alveolar /ʃʲ/ (compared to 163 languages with a non-palatalized posterior sibilant). These small numbers can be explained by the relative synchronic/diachronic instability of secondary articulation contrasts. The articulatorily complex /sʲ/ (involving the partly conflicting tongue tip and tongue body gestures) is prone to shifting in place to /ʃ/ or /ɕ/, as part of a more general place-changing palatalization of coronals before front vowels and /j/ (Bhat 1978, among others). The posterior /ʃʲ/, on the other hand, does not differ considerably from /ʃ/ or /ɕ/ (which are inherently more or less palatalized), and thus is often subject to merger or depalatalization (see Carlton 1990 on palatalization and depalatalization processes in Slavic). Interestingly, the few languages that do have phonemic /sʲ/ and/or /ʃʲ/ (Bulgarian, Kashmiri, Ket, Lithuanian, Paez, Russian, and Yurak in Maddieson 1984) have those segments as part of a robust palatalized vs. non-palatalized contrast involving consonants of other manners and places. This suggests that the historical stability of these consonants is at least in part contingent on their integration into a broader system of phonemic contrasts.
Another potentially important factor is the sufficient acoustic differentiation of palatalized and non-palatalized anterior and posterior sibilants – the question that is specifically investigated in this paper.
Russian is one of the few languages that have contrastive palatalized sibilant fricatives. These consonants – the dental/alveolar /sʲ/ and the palatalized post-alveolar (prepalatal) /ʃʲ/ – are contrastive with the non-palatalized dental/alveolar /s/ and the retroflex (apical post-alveolar) /ʂ/ (Timberlake 2004; see footnote 1). The secondary articulation contrast intersects with the primary place contrast between the anterior (also called ‘hissing’) and posterior (‘hushing’) coronal fricatives. Sample words with the four-way contrast among voiceless fricatives in initial, medial, and final position are shown in Table 1.
Table 1 Sample words with Russian voiceless sibilant fricatives (in bold) in various positions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab1.gif?pub-status=live)
Anterior fricatives are well-integrated into the Russian consonant inventory, being part of the phonemic non-palatalized vs. palatalized distinction (which is particularly robust among anterior coronals, i.e. /t d s z n l r/ vs. /tʲ dʲ sʲ zʲ nʲ lʲ rʲ/; Timberlake 2004). In contrast, the status of posterior fricatives within the palatalization correlation (set of contrasts) is less clear. Partly this is due to phonetic characteristics of these consonants other than secondary articulation (Jones & Ward 1969, Matusevich 1976, Avanesov 1984). For example, /ʂ/ is characterized by a retroflex-like (tip-up) primary constriction, a retraction of the tongue back (see Keating 1991, Hamann 2004), and some degree of lip rounding (Jones & Ward 1969, Matusevich 1976; but see Ladefoged & Maddieson 1996: 148, who attributed no lip rounding to the sound). In contrast, the tongue tip for /ʃʲ/ is lowered, with the constriction made with the blade in the prepalatal region; the lips are also rounded, and the consonant has overall longer duration (i.e. [ʃʲː]) (Bolla 1981). This length can be attributed to the historical development of this consonant from a sequence /ʂ/ + /ʧʲ/ (via [ʃʲʧʲ]; Matusevich 1976, Timberlake 2004; see footnote 2). Given the marginal role of consonant length in Russian (see footnote 3), some researchers in the past have disputed the phonemic status of /ʃʲ/, still considering it an underlying sequence of distinct phonemes (/ʂ/ and /ʧʲ/; Baudouin de Courtenay 1964, Zinder 1963; but see Scherba 1957, Avanesov 1948, cited in Matusevich 1976; see also Timberlake 2004).
Another reason for the special treatment of the posterior fricatives /ʂ/ and /ʃʲ/ is their distinct phonological behavior. These segments fail to alternate with each other in contexts where other non-palatalized and palatalized consonants do (e.g. [duʂa] – [duʂe], *[duʃʲe] ‘soul, nom.sg – gen.sg’; compare [kasa] – [kasʲe] ‘plait, nom.sg – gen.sg’). Given this, Timberlake (2004: 57) refers to the posterior fricatives as ‘immutable hard and soft’ (non-palatalized and palatalized) sounds, in contrast to the ‘mutable’ /s/ and /sʲ/, among other consonants.
While there have been a number of articulatory studies of Russian sibilant fricatives (Skalozub 1963, Bolla 1981, Kedrova et al. 2008, Litvin 2014, among others), relatively little has been done on their acoustic properties. The few studies that did explore the acoustics of Russian fricatives were often limited to a subset of the contrasts or vowel contexts, and/or did not always examine acoustic differences quantitatively (Fant 1960, Shupljakov, Fant & de Serpa-Leitao 1968, Derkach, Fant & de Serpa-Leitao 1970, Bolla 1981, Zsiga 2000, Padgett & Żygis 2007). The goal of this study is to provide a systematic investigation of durational and spectral properties distinguishing the Russian four-way contrast in voiceless sibilant fricatives. This is done by examining a number of acoustic variables (consonant duration, frication noise intensity and frequency, as well as adjacent vowel duration and formants F1–F3) in the four fricatives occurring before five vowels and in four positional/stress conditions. Taken more broadly, the investigation of Russian fricatives is intended to contribute to phonetic documentation of cross-linguistically rare fricative contrasts (Ladefoged & Maddieson 1996, Gordon, Barthmaier & Sands 2002, among others).
2 Acoustic properties of Russian fricatives: Previous work
2.1 Duration
Although many descriptive phonetic studies mention length as a distinctive property of /ʃʲ/, surprisingly little quantitative data exist on the duration properties of fricatives in Russian. Bolla's (1981) acoustic description of various Russian consonants (based on a single male speaker) includes the following duration measurements for voiceless sibilant fricatives: 195 ms for /s/, 177 ms for /sʲ/, 187 ms for /ʂ/, and 290 ms for /ʃʲ/. Thus, the palatalized posterior fricative in his data is 1.55 times longer than its non-palatalized counterpart. It should be noted that the phonetic contexts in which the fricatives occur are not identical, and therefore, it is not clear to what extent these measurements are representative of the contrast in general. Duration measurements were also performed by Kochetov & Radišić (2009), who examined the production of four fricatives in [aˈCa]-type nonsense words by four speakers. The mean duration values for the fricatives were 180 ms for /s/, 196 ms for /sʲ/, 180 ms for /ʂ/, and 222 ms for /ʃʲ/, thus giving a ratio of 1.23 for the latter two consonants. Notably, only three of the four speakers significantly distinguished /ʃʲ/ from some of the other fricatives (/ʂ/ and /s/, but not /sʲ/). This suggests that the durational distinction is not as clear as previously reported, or perhaps has diminished over time. It is also possible, however, that these results are dialect-specific, given that all four speakers in the study were from the same region in Russia (the city of Perm).
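The two length ratios quoted above follow directly from the reported mean durations. The short Python sketch below reproduces that arithmetic; the dictionary layout and the function name `posterior_length_ratio` are illustrative choices, not part of either cited study.

```python
# Mean fricative durations (ms) as quoted in the text for the two earlier studies.
DURATIONS_MS = {
    "Bolla 1981": {"s": 195, "sʲ": 177, "ʂ": 187, "ʃʲ": 290},
    "Kochetov & Radišić 2009": {"s": 180, "sʲ": 196, "ʂ": 180, "ʃʲ": 222},
}


def posterior_length_ratio(durations):
    """Duration of palatalized /ʃʲ/ divided by duration of non-palatalized /ʂ/."""
    return durations["ʃʲ"] / durations["ʂ"]


# Ratios rounded to two decimals, as they are reported in the text.
ratios = {
    study: round(posterior_length_ratio(values), 2)
    for study, values in DURATIONS_MS.items()
}

for study, ratio in ratios.items():
    print(f"{study}: /ʃʲ/ is {ratio} times as long as /ʂ/")
```

A ratio well below 2 in both studies is what motivates the question of whether /ʃʲ/ still behaves like a long consonant.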
Regardless of segment-specific length differences, consonants as a class can differ in duration depending on contextual factors such as syllable/word position and stress. For example, Kochetov & Lobanova (2007) found that Komi-Permyak sibilant fricatives were significantly shorter word-finally than word-initially, while Zsiga (2000) reported significant shortening of Russian stops in unstressed syllables. These contextual factors may well interact with segment-specific length differences (e.g. /ʃʲ/ vs. /ʂ/ in onset vs. coda); however, such interactions have not been investigated for Russian fricatives.
2.2 Fricative spectra: Intensity and frequency
Bolla (1981) observed that palatalized fricatives /sʲ/ and /ʃʲ/ showed higher intensity noise compared to their non-palatalized counterparts. No comparison was made between anterior and posterior fricatives. Kochetov & Radišić (2009), however, did not find intensity to distinguish palatalized and non-palatalized fricatives. Instead, intensity measurements in their data differentiated anterior and posterior fricatives, with /s/ and /sʲ/ being significantly less intense than /ʂ/ and /ʃʲ/.
Bolla's (1981) informal examination of fricative spectra revealed higher concentrations of energy for anterior fricatives (above 4000 Hz with peaks around 5000–6000 Hz for /s/ and above 4500 Hz with peaks between 4750 Hz and 6000 Hz for /sʲ/) compared to posterior fricatives (above 3500 Hz with the main peak at 4500–5000 Hz for /ʂ/ and, lower overall, above 2500 Hz with the main peak at around 3000 Hz for /ʃʲ/). These noise distribution patterns are expected, given the ‘hissing’ and ‘hushing’ auditory classification of the sounds corresponding to the more anterior and more posterior sources of noise, respectively (see Hughes & Halle 1956, Jones & Ward 1969). However, specific details of fricative spectra should be taken with caution, given that the contexts for the fricatives were not matched.
Zsiga's (2000) study examined Russian /s ʂ sʲ/ in the VC(#)V context in real words produced by five speakers (four males and one female). Measurements of the centre of gravity (COG; also called ‘centroid’) of fricative noise were performed at the onset, midpoint, and offset of each fricative. Zsiga reported a midpoint COG of about 6000 Hz for /s/ and /sʲ/, and a COG of below 5000 Hz for /ʂ/. These values were overall similar to COG values for English fricatives /s/ (5900 Hz) and /ʃ/ (5100 Hz) produced by five English speakers. Interestingly, however, Russian /sʲ/ exhibited little or no change in COG over time, compared to the English /s/ + /j/ sequence, where COG gradually lowered.
Padgett & Żygis (2007) measured COG of four Russian voiceless fricatives (also at three points) produced by four speakers in nonsense CV syllables, as part of a larger study of Slavic fricative contrasts. They found that /s/ and /sʲ/ had COG around 6000 Hz (but somewhat lower for /sʲ/, especially at the fricative offset), while /ʂ/ and /ʃʲ/ had COG around 3000 Hz. These findings are similar to Kochetov & Radišić (2009), who reported an average midpoint COG of 5870 Hz for /s/, 5520 Hz for /sʲ/, 3705 Hz for /ʂ/, and 3845 Hz for /ʃʲ/ produced before /a/ by their four speakers (two males and two females). Significant differences in their results involved only pairs of anterior and posterior fricatives (consistently for all four speakers). One other study that examined Russian fricative spectra is Funatsu & Kiritani (1998), whose purpose was to investigate acoustic and perceptual differences between Russian and Japanese fricatives. The authors’ data, based on Russian /s sʲ ʂ/ produced by three native speakers word-initially before /a o u/, showed a binary distribution consistent with other studies: COG at the fricative midpoint was higher for the anterior /s/ and /sʲ/ (4400 Hz and 4300 Hz) and lower for the posterior /ʂ/ (2300 Hz).
2.3 CV formant transitions
It has long been observed that palatalized consonants as a class, regardless of their manner and primary place of articulation, are cued by formant transitions from and to the adjacent vowels (Fant 1960, Shupljakov et al. 1968, Derkach et al. 1970, Purcell 1979). Specifically, these studies determined that in the same phonetic context, palatalized consonants are associated with considerably higher F2 and somewhat lower F1 than non-palatalized ones, especially at the onset of the following vowel. These differences are the result of fronting and raising of the tongue body for secondary palatalization, timed at the release of the consonant constriction. A measure that combines the effects of the first two formants, F2–F1 difference, has been useful in capturing differences between palatalized vs. non-palatalized contrasts (Iskarous & Kavitskaya 2010). Purcell's (1979) examination of dental/alveolar palatalized and non-palatalized stops /d/ and /dʲ/ (among other stops) produced by four speakers (two males and two females) in nonsense VCV words revealed average F1 and F2 differences of 82 Hz and 409 Hz, respectively, giving an F2–F1 difference of 1884 Hz for palatalized and 1393 Hz for non-palatalized coronal consonants. Similar findings were obtained by Shupljakov et al. (1968) and Derkach et al. (1970), whose stimuli included sibilant fricatives.
Focusing specifically on fricatives, Funatsu & Kiritani (1998) measured F2 at the onset of postconsonantal vowels /a/, /o/, and /u/. They found that F2 was considerably higher after the palatalized /sʲ/ than the non-palatalized /s ʂ/, with the magnitude of this difference affected by the vowel context. Combined with COG of the fricative, F2 could distinguish three fricatives in three vowel contexts (with /ʃʲ/ not considered by the authors). The study by Kochetov & Radišić (2009) examined the effect of the four fricatives on F1 through F3 of adjacent vowels. They found that /sʲ/ and /ʃʲ/ had a lower F1 (490 Hz and 545 Hz) and higher F2 (1950 Hz and 1975 Hz) at the onset of the following vowel /a/, compared to /s/ and /ʂ/ (F1: 634 Hz and 630 Hz; F2: 1550 Hz and 1515 Hz). This gave F2–F1 differences of around 1450 Hz for palatalized and around 900 Hz for non-palatalized fricatives. Differences in F3 were not consistent. Overall, this suggests that both anterior and posterior fricatives in Russian have transitions similar to other palatalized and non-palatalized consonants, and thus appear to be strongly integrated into the phonemic system. Again, however, the latter findings with respect to fricatives were based on a single vowel context, and thus remain to be confirmed.
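As a worked illustration of the F2–F1 measure, the following Python sketch recomputes the differences from the vowel-onset formant values quoted above for Kochetov & Radišić (2009). The variable and function names are hypothetical; only the four (F1, F2) pairs come from the text.

```python
# (F1, F2) in Hz at the onset of the following /a/, as quoted in the text.
ONSET_FORMANTS_HZ = {
    "s": (634, 1550),
    "sʲ": (490, 1950),
    "ʂ": (630, 1515),
    "ʃʲ": (545, 1975),
}

PALATALIZED = ("sʲ", "ʃʲ")
NON_PALATALIZED = ("s", "ʂ")


def f2_minus_f1(f1_hz, f2_hz):
    """Combined measure of palatalization: larger values mean higher F2 and/or lower F1."""
    return f2_hz - f1_hz


differences = {c: f2_minus_f1(*formants) for c, formants in ONSET_FORMANTS_HZ.items()}


def class_mean(consonants):
    """Average F2-F1 difference over a class of consonants."""
    return sum(differences[c] for c in consonants) / len(consonants)


mean_palatalized = class_mean(PALATALIZED)
mean_non_palatalized = class_mean(NON_PALATALIZED)

print(differences)
print(mean_palatalized, mean_non_palatalized)
```

The class means come out close to the rounded figures of about 1450 Hz and about 900 Hz given above.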
2.4 Summary
Table 2 presents a summary of previous acoustic studies of Russian sibilant fricatives, including numbers of speakers, examined contrasts and contexts, acoustic measurements, and major findings (see footnote 4). It can be seen that, although Russian fricatives have received a fair share of attention in the phonetic literature, few studies have examined fricative spectra and vowel transitions together, and even fewer have combined either of these measurements with those of consonant duration. Moreover, none of the studies have examined fricative realizations across positional or stress contexts. Taken together, however, the results suggest that Russian sibilant fricatives are distinguished by a combination of fricative spectra (the anterior vs. posterior contrast) and formant vowel transitions (the secondary articulation contrast). Spectral differences within the anterior and posterior categories appear to be smaller, and often not significant. Further, there is evidence that /ʃʲ/ may not be much longer than the other fricatives, contrary to the common assumption in the descriptive phonetic literature on Russian.
Table 2 A summary of previous acoustic studies of Russian fricatives. ‘>’ stands for ‘greater than’; f = female, m = male.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab2.gif?pub-status=live)
3 Russian sibilant fricatives: Phonotactics and historical development
The phonetic documentation of Russian fricatives is also expected to provide some insights into the phonological patterning and historical development of these sounds. While fricatives in Modern Standard Russian can occur in a range of contexts (as noted in Section 1 above; see Table 1), the palatalized fricatives /sʲ/ and /ʃʲ/ are in general more restricted than their non-palatalized counterparts /s/ and /ʂ/. Specifically, Table 3 (based on the discussion in Timberlake 2004: 58–60; see footnote 5) illustrates these distributional differences across four consonant cluster contexts – word-medially post- and preconsonantally (C_V and V_C), word-finally postconsonantally (C_#), and word-initially preconsonantally (#_C). All four fricatives can occur in the first context (C_V). In the second context (V_C), /sʲ/ and /ʃʲ/ can occur only before hetero-organic and homorganic consonants, respectively. In the third context (C_#), /sʲ/ is completely absent, while /ʃʲ/ can occur only after a sonorant consonant. Finally, in the fourth context (#_C), neither of the palatalized fricatives can occur. Notably, none of these restrictions apply to the non-palatalized fricatives. The observed restrictions on palatalized sibilants may well be due to the way these consonants are acoustically cued. If palatalized sibilants are primarily distinguished by contextual cues such as formant transitions into the following vowel, these consonants would be disfavoured in contexts that lack this vowel – namely the V_C, C_#, and #_C contexts (see Kochetov 2006 on Russian palatalized stops). It is therefore of interest to understand both consonant-internal (fricative noise frequency and intensity) and contextual (vowel transitions) acoustic properties of Russian fricatives.
Table 3 Phonotactic distribution of Russian voiceless fricatives in clusters. Yes = present, no = absent, Cht = hetero-organic consonant, Chm = homorganic consonant, Cobs = obstruent; dark and light shading indicates, respectively, more and less restricted patterns.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab3.gif?pub-status=live)
Another question of interest is the historical development of Russian and Slavic sibilant systems (Hamann 2004, Padgett & Żygis 2007, Żygis & Padgett 2010). As Padgett & Żygis (2007) note, Russian sibilant fricatives have undergone a series of phonetic changes which can be best viewed as perceptual optimization through dispersion (Liljencrants & Lindblom 1972, Lindblom & Maddieson 1988, Flemming 2002). This process is schematically represented in Figure 1, based on the discussion in Borkovskii & Kuznetsov (1965). At Stage 1 (Old Russian, prior to the 12th century), the language exhibited a three-way sibilant contrast among the non-palatalized dental /s/, palatalized dental /sʲ/, and a palatalized post-alveolar or alveolopalatal /ʃʲ/. At Stage 2 (by the 14th century), the latter consonant became retroflex (or apical post-alveolar), thus shifting away from /sʲ/. The resulting gap in the sibilant inventory was filled with a sequence /ʂ/ + /ʧʲ/ at Stage 3. This sequence was originally produced as [ʃʲʧʲ], subsequently developing at Stage 4 into a new phoneme /ʃʲ/, which was inherently long ([ʃʲː]). Presumably the shift of the original /ʃʲ/ to /ʂ/ was driven by the need to perceptually differentiate the former from the palatalized /sʲ/, which is potentially unstable (Hamann 2004, Padgett & Żygis 2007). The addition of the fourth fricative to the system, /ʃʲ/, could have served to stabilize the already well-dispersed sibilant contrasts, while extending the set of non-palatalized vs. palatalized contrasts to posterior coronals (i.e. re-using the already highly functional articulatory mechanism; see Clements 2003 on ‘feature economy’).
In other words, the historical development of Russian sibilant fricatives possibly presents an interesting test case for Lindblom & Maddieson's (1988: 72) proposed phonetic universal: ‘Consonant inventories tend to evolve so as to achieve maximal perceptual distinctiveness at minimum articulatory cost’ (see footnote 6). Whether this interpretation of the historical development of Russian fricatives and its relevance to universal phonetic principles are correct depends to a large degree on our understanding of acoustic properties differentiating the four-way fricative contrast.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig1g.gif?pub-status=live)
Figure 1 A diagram outlining stages in the historical development of Russian voiceless sibilant fricatives.
In sum, this paper is motivated by the need to acoustically document a typologically uncommon sibilant contrast, while also seeking answers to the long-standing questions of phonological patterning and sound change involving Russian fricatives.
4 Method
4.1 Speakers
The participants were 10 native speakers of Standard Russian, five females and five males. They were all born and raised in Russia (see footnote 7) at least until the age of 16, and had lived in Canada for no more than six years. Their age ranged from 19 to 29 years, with a median of 21.5 years. At the time of the experiment, the participants were University of Toronto undergraduate or graduate students (none of whom had taken linguistics courses). While fluent in English, the participants reported using Russian on a day-to-day basis. They also reported having normal speech and hearing.
4.2 Materials and the procedure
There were three sets of materials, all of which included words with the four fricatives /s sʲ ʂ ʃʲ/. In the first (and the largest) set, these consonants appeared word-initially and word-medially before the stressed vowels /a e i o u/ (see footnote 8). For example, the set of words with these consonants in word-initial position before /o/ included [sok] ‘juice’, [sʲok] ‘whipped’, [ʂok] ‘shock’, and [ʃʲok] ‘cheeks, gen’. There were a total of 40 words in this set, a complete list of which is given in Table A1 in the appendix. The purpose of this set was to systematically investigate the four-way contrast and its realization in two prevocalic positions in a variety of vowel contexts. Set 2 was designed to examine differences between the same consonants word-initially and word-finally. It included eight words (four of which were from Set 1) with four fricatives at the beginning or at the end of words, all either before or after a stressed /o/. The word-final context is also interesting because of the previously noted optional degemination of /ʃʲ/ (Avanesov 1984, Timberlake 2004). Set 3 was designed to explore the effect of stress, and included eight words where the same consonants appeared before stressed and unstressed /a/. Four of these words were from Set 1. Both word lists are given in Tables A2 and A3 in the appendix. All the items were randomized, embedded in a carrier phrase [ˈpapa ___ paftaˈrʲil] ‘Dad repeated ___.’ and presented in the Cyrillic script. Three repetitions of each item were recorded, giving 144 items per speaker (48 items × 3 repetitions).
Audio recordings were made in a soundproof booth using a Fostex FR-2 digital audio recorder and an ATR3035 condenser microphone with the quantization of 16 bits and the sampling rate of 44000 Hz. Altogether, 1431 tokens (the intended 1440 items, i.e. 144 items × 10 speakers, minus nine omissions or reading errors) were collected.
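The item and token counts in this section can be verified with a short calculation. The Python sketch below assumes only the figures stated in Sections 4.1 and 4.2 (10 speakers, three word sets, three repetitions, nine lost tokens); the constant names are illustrative.

```python
# Design of the word lists, as described in Section 4.2.
CONSONANTS = 4          # /s sʲ ʂ ʃʲ/
VOWELS = 5              # /a e i o u/
POSITIONS = 2           # word-initial and word-medial
SET1_WORDS = CONSONANTS * VOWELS * POSITIONS   # 40 words

SET2_WORDS = 8          # initial vs. final position; 4 words shared with Set 1
SET3_WORDS = 8          # stressed vs. unstressed /a/; 4 words shared with Set 1
SHARED_WITH_SET1 = 4 + 4

# Unique items read by each speaker.
unique_items = SET1_WORDS + SET2_WORDS + SET3_WORDS - SHARED_WITH_SET1

REPETITIONS = 3
SPEAKERS = 10           # Section 4.1
LOST_TOKENS = 9         # omissions or reading errors

items_per_speaker = unique_items * REPETITIONS
intended_total = items_per_speaker * SPEAKERS
collected_total = intended_total - LOST_TOKENS

print(unique_items, items_per_speaker, intended_total, collected_total)
```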
4.3 Analysis
The collected audio data were annotated in Praat (Boersma & Weenink 2015). All fricatives and following vowels were segmented out based on a visual inspection of the waveform and spectrogram. The onset and the offset of the fricative were taken to be the onset and the offset of fricative noise (appearing on the spectrogram above 2500 Hz), typically coinciding with the lack of vocal fold vibration. The onset and the offset of the vowel before the fricative were taken to be the onset of vocal fold vibration (the first glottal pulse) and the onset of frication (see above). The onset and the offset of the vowel after the fricative were taken to be the offset of frication and the offset of vocal fold vibration. These measurements were based on the waveform with reference to the corresponding spectrogram, where the boundaries tended to coincide with the onset and offset of the second formant, F2. Figure 2 presents a sample annotated token of /sʲ/ in [sʲok], indicating the intervals used in the analysis – the frication of /sʲ/ and the following vowel /o/.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig2g.jpeg?pub-status=live)
Figure 2 An example of data annotation for a token of the word [sʲok] ‘whipped’ produced by speaker F1, with approximate analysis windows (see the text) shown below the waveform.
Annotated data were subjected to the following measurements using a Praat script:
• Consonant duration (in ms).
• Amplitude difference (in dB), calculated by subtracting the amplitude at the midpoint of the fricative (C-mid, see Figure 2) from the amplitude taken at the midpoint of the following vowel, or the preceding vowel for word-final fricatives (V-mid), measured using a 25 ms Gaussian window. Lower amplitude difference corresponds to greater intensity of a fricative. The difference, rather than the absolute amplitude, was used to normalize for inter-speaker differences in recording levels (see Jongman, Wayland & Wong 2000, Kochetov & Lobanova 2007).
• Centre of gravity of fricative noise (COG, or the first spectral moment, in Hz), measured at three points in time: at the onset (C-on), the midpoint (C-mid), and the offset of the fricative (C-off), using a 25 ms Gaussian window and a Hann pass-band filter from 500 Hz to 10000 Hz. The windows were either aligned to fricative edges (C-on and C-off) or centred at the midpoint (C-mid). The low cutoff was set to exclude low-frequency room noise or voicing leakage from surrounding vowels, if any (see Zsiga 2000, Nowak 2006). The COG measure was chosen as it is most commonly used to study spectral differences in fricatives (Jongman et al. 2000, Gordon et al. 2002, among others), including Russian fricatives (see Table 2; but see Jesus & Shadle 2002 and Spinu, Vogel & Bunnell 2012 on some limitations of and alternatives to the method).
• Formants F1, F2, and F3 (Hz) measured at three points within the following vowel (or the preceding vowel for word-final fricatives) – the onset (V-on), the midpoint (V-mid), and the offset (V-off), using a 25 ms Gaussian window and the Formant (Burg) algorithm. The windows were either aligned to vowel edges (V-on and V-off) or centred at the midpoint (V-mid). Additionally, F2–F1 differences were calculated to determine the magnitude of the difference between palatalized and non-palatalized consonants (Iskarous & Kavitskaya 2010).
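To make the COG measure concrete, the following Python sketch computes a power-weighted first spectral moment over a 25 ms Gaussian-tapered window, restricted to the 500–10000 Hz band. It is only an approximation under stated assumptions and not the Praat script used in the study: the Gaussian width, the hard band limits (in place of a Hann-shaped pass band), the power weighting, the 22050 Hz demo sampling rate, and the pure-tone test signals are all choices made for illustration.

```python
import cmath
import math


def gaussian_window(n):
    """Gaussian taper of length n; the width (n / 6) is an illustrative assumption."""
    mid = (n - 1) / 2
    sigma = n / 6
    return [math.exp(-0.5 * ((i - mid) / sigma) ** 2) for i in range(n)]


def centre_of_gravity(samples, sample_rate, low_hz=500.0, high_hz=10000.0):
    """Power-weighted mean frequency (Hz) of the windowed signal within a band."""
    n = len(samples)
    tapered = [s * w for s, w in zip(samples, gaussian_window(n))]
    weighted_sum = 0.0
    total_power = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq < low_hz or freq > high_hz:
            continue  # hard band limits stand in for the pass-band filter
        # Direct DFT coefficient for bin k (slow, but dependency-free).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(tapered))
        power = abs(coeff) ** 2
        weighted_sum += freq * power
        total_power += power
    if total_power == 0.0:
        raise ValueError("no spectral energy inside the analysis band")
    return weighted_sum / total_power


def sine(freq_hz, sample_rate, duration_s):
    """Synthetic pure tone used as a stand-in for frication noise."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 22050   # Hz; chosen only to keep the demo fast
WINDOW_S = 0.025      # 25 ms analysis window, as in the text

cog_high = centre_of_gravity(sine(6000.0, SAMPLE_RATE, WINDOW_S), SAMPLE_RATE)
cog_low = centre_of_gravity(sine(3000.0, SAMPLE_RATE, WINDOW_S), SAMPLE_RATE)
print(round(cog_high), round(cog_low))
```

For a pure tone the COG falls close to the tone's frequency; for real frication noise it summarizes where the bulk of the energy lies, which is why anterior ‘hissing’ fricatives are expected to show higher values than posterior ‘hushing’ ones.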
The results for Set 1 were analyzed statistically using Repeated Measures Analyses of Variance (RM ANOVAs) separately for each dependent variable (duration, COG, etc.) with within-subjects factors Consonant (/s sʲ ʂ ʃʲ/) and Position (initial and medial), and between-subjects factor Gender (female, male). These were based on means for each participant, averaged over five vowel contexts (before /a e i o u/). Bonferroni post-hoc tests were performed to investigate differences among the four consonants. Fricative variation due to vowel context was not investigated; however, relevant descriptive statistics by vowel are presented in the appendix (Tables A4–A9). Prior to the RM ANOVA analyses, the results for Set 1 were explored using hierarchical clustering analysis (employing Ward's clustering method and the measure of squared Euclidean distance). This was done based on mean values for each consonant in initial and medial positions using all dependent variables, with values transformed to z-scores. Data from Sets 2 and 3 were analyzed using paired-samples t-tests (two-tailed) to examine differences between the same consonants in initial and final positions or before stressed and unstressed vowels. For reasons of space, only significant (p ≤ .05) effects and interactions are reported.
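The paired-samples t-tests used for Sets 2 and 3 compare two conditions within the same speakers. A minimal Python sketch of the test statistic is given below; the function name `paired_t` and the duration values are invented purely for illustration and are not data from this study. The p-value lookup (two-tailed, from the t distribution with n − 1 degrees of freedom) is left out.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation, n - 1).
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical per-speaker mean durations (ms) of one fricative in two positions.
initial_ms = [152, 160, 148, 171, 155, 149, 163, 158, 150, 166]
final_ms = [140, 151, 139, 160, 150, 141, 149, 152, 143, 155]

t_value, df = paired_t(initial_ms, final_ms)
print(f"t({df}) = {t_value:.2f}")
```

With 10 speakers the test has 9 degrees of freedom, so each speaker contributes one difference score per comparison.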
5 Results
5.1 Contrasts in prevocalic stressed position
5.1.1 Overview
This section provides an overview of acoustic differences among the four voiceless fricatives, as suggested by the hierarchical clustering analysis. The dendrogram in Figure 3 is based on all acoustic variables used in the analysis, both internal cues (consonant duration, amplitude difference, and COG at three points) and contextual cues (vowel formants F1, F2, and F3 at three points). Distance between cases is shown on the x-axis at the top, increasing from 0 to 25 and above. Higher values indicate more acoustically dissimilar cases, which are represented on the y-axis. Going from right to left, two main clusters emerge – those for non-palatalized and palatalized fricatives. Within each of those, two smaller clusters involve place differences – between /s/ and /ʂ/, and between /ʃʲ/ and /sʲ/. Each consonant case is further split into two sub-clusters by position. This clustering analysis thus suggests that the main acoustic difference among the examined phones is secondary articulation, followed by place of articulation, and then by position. Notably, the analysis also suggests that place differences are greater among the non-palatalized fricatives, while positional differences are somewhat greater for the palatalized fricatives.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig3g.gif?pub-status=live)
Figure 3 A clustering dendrogram showing the relation among the fricatives /s/, /s j /, /ʂ/, and /ʃ j / in initial and medial position based on all acoustic cues.
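The input to such a clustering (z-scored variables compared by squared Euclidean distance) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses only four of the mean values reported later in Section 5.1 (pooled over position), standardizes with the sample standard deviation, and omits the Ward linkage step. Because the full analysis used all variables in both positions, these distances are not expected to reproduce Figure 3. The ASCII labels are illustrative stand-ins for the four fricatives.

```python
from itertools import combinations
from statistics import mean, stdev

# Order of cases: /s/, /s j/, /ʂ/, /ʃ j/ (ASCII labels are illustrative).
LABELS = ["s", "sj", "sh", "shj"]

# Means reported in Section 5.1 (a small subset of the variables analyzed).
MEANS = {
    "duration_ms": [143, 146, 147, 160],
    "cog_midpoint_hz": [7236, 6989, 3884, 4296],
    "f1_hz": [500, 438, 532, 473],
    "f2_onset_hz": [1520, 1896, 1573, 1927],
}


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def sq_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Standardize each variable across the four cases.
standardized = {name: zscores(vals) for name, vals in MEANS.items()}

# One z-scored profile (vector of variables) per consonant.
profiles = {
    label: [standardized[name][i] for name in MEANS]
    for i, label in enumerate(LABELS)
}

# Pairwise squared Euclidean distances between profiles.
distances = {
    (a, b): sq_euclidean(profiles[a], profiles[b])
    for a, b in combinations(LABELS, 2)
}

for (a, b), d in distances.items():
    print(f"{a} vs. {b}: {d:.2f}")
```

Standardizing first keeps variables measured in thousands of Hz (COG) from swamping those measured in tens of ms (duration) when distances are computed.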
To what extent is the observed differentiation of the fricatives due to internal or contextual cues as a whole, or due to specific acoustic variables, such as duration, intensity, COG, and formant frequencies? The subsequent sections will explore these questions in detail.
5.1.2 Internal cues: Consonant duration and spectral properties
An RM ANOVA for consonant duration showed main effects of Consonant (F(3,24) = 25.242, p < .001) and Position (F(1,8) = 19.979, p = .002). Bonferroni post-hoc tests revealed that the consonant differences involved /ʃ j / on the one hand and the other fricatives on the other hand (/s/: p < .001; /ʂ/: p < .01; /s j /: p < .05). The palatalized posterior fricative was significantly longer than the others, although absolute differences were relatively small, below 20 ms. Means and standard deviations for consonant duration were 143 (28) ms for /s/, 146 (26) ms for /s j /, 147 (27) ms for /ʂ/, and 160 (26) ms for /ʃj/. Based on the latter two values, the ratio between the presumed geminate /ʃj/ and the singleton /ʂ/ was 1.09. The significant effect of Position was due to the longer duration of initial fricatives compared to intervocalic fricatives. This difference was on average 10 ms. These consonant and position differences are illustrated in Figure 4. Gender was not significant, and there were no significant interactions of any of the factors.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig4g.gif?pub-status=live)
Figure 4 Mean consonant duration (in seconds) values for /s/, /s j /, /ʂ/, and /ʃ j / by position (initial and medial).
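The duration ratio quoted above follows directly from the reported means; the short sketch below recomputes it, along with the ratios of /ʃ j / to the two anterior fricatives. The ASCII labels are illustrative stand-ins for the four fricatives.

```python
# Mean consonant durations (ms) reported in the text.
# Keys are ASCII stand-ins for /s/, /s j/, /ʂ/, /ʃ j/.
DURATION_MS = {"s": 143, "sj": 146, "sh": 147, "shj": 160}


def duration_ratio(longer: str, shorter: str) -> float:
    """Ratio of the mean duration of one fricative to that of another."""
    return DURATION_MS[longer] / DURATION_MS[shorter]


for other in ("s", "sj", "sh"):
    print(f"shj / {other}: {duration_ratio('shj', other):.2f}")

# Largest absolute difference among the four means, in ms.
spread_ms = max(DURATION_MS.values()) - min(DURATION_MS.values())
print(f"largest difference: {spread_ms} ms")
```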
For amplitude difference, an RM ANOVA showed a main effect of Consonant (F(3,24) = 10.135, p < .001). Based on Bonferroni post-hoc tests, significant differences involved /s j / on the one hand and the posterior fricatives /ʂ/ and /ʃ j / on the other (both ps < .01); the differences between /s/ vs. /ʂ/ and /ʃ j / did not reach significance. As seen in Figure 5, anterior fricatives show on average higher amplitude difference (i.e. being less intense) than posterior fricatives. The factors Position and Gender were not significant; there were no significant interactions of any of the factors.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig5g.gif?pub-status=live)
Figure 5 Mean amplitude difference (dB) values for /s/, /s j /, /ʂ/, and /ʃ j /.
For COG (illustrated in Figure 6), RM ANOVAs were performed at each of the three time points: the onset, the midpoint, and the offset. The analyses revealed a main effect of Consonant at all three points in time (F(3,24) = 208.216, p < .001; F(3,24) = 200.145, p < .001; F(3,24) = 46.885, p < .001, respectively), as well as a main effect of Gender at the midpoint and the offset (F(1,8) = 7.657, p < .05; F(1,8) = 6.823, p = .047). Among the consonants, all pairs were significantly different from each other at the fricative onset: /s/ vs. /ʂ/ and /ʃ j / (p < .001), /s/ vs. /s j / (p < .01), /s j / vs. /ʂ/ and /ʃ j / (p < .001), /ʂ/ vs. /ʃ j / (p < .05). COG consonant differences at the midpoint were significant only for the fricative pairs that differed in place: anterior /s/ and /s j / vs. posterior /ʂ/ and /ʃ j / (all ps < .001). The same was observed for the consonant differences in COG taken at the fricative offset: /s/ vs. /ʂ/ and /ʃ j / (p < .001 and p < .01, respectively), /s j / vs. /ʂ/ and /ʃ j / (p < .001 and p < .01, respectively). Notably, COG differences between /ʂ/ and /ʃ j / were not significant at the midpoint, and at best marginally significant at the offset (p = .071). On average, COG values at the midpoint were 7236 Hz for /s/, 6989 Hz for /s j /, 3884 Hz for /ʂ/, and 4296 Hz for /ʃ j /. The above-mentioned Gender effect was due to higher COG for the female speakers compared to the male speakers. This difference was on average 1141 Hz at the midpoint and 1284 Hz at the offset. The factor Position was not significant; however, at two time points, there were significant interactions of Consonant × Position (offset: F(3,24) = 6.169, p = .003) and Consonant × Position × Gender (midpoint: F(3,24) = 3.080, p = .047). The first interaction was due to lower COG of /s/ in the word-medial position compared to the word-initial position. The second interaction appears to be due to an on average greater difference between the two posterior fricatives /ʂ/ and /ʃ j / (higher COG for the latter) produced by male speakers in the intervocalic position, compared to the word-initial position or to the female speakers’ productions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig6g.gif?pub-status=live)
Figure 6 Mean COG (Hz) values for /s/, /s j /, /ʂ/, and /ʃ j / at the onset, midpoint, and offset of frication, separately for female and male speakers.
Figure 6 plots mean COG at three points in time for the four fricatives, separately for female and male speakers. It can be seen that there is a clear distinction between anterior fricatives showing higher COG (above 5000 Hz for females and 4000 Hz for males) and posterior fricatives showing lower COG (below the same thresholds). Within the anterior fricatives, average values are slightly lower for /s j / than /s/ (more so for the males and mainly at the fricative onset). Within the posterior ones, values are on average higher for /ʃ j / than /ʂ/. Neither of these differences, however, was significant at the midpoint or the offset. Means and standard deviations for selected internal cue variables by vowel context are given in Tables A4–A6 in the appendix.
Table 4 provides a summary of statistical results for each of the internal cue dependent variables. It can be seen that the Consonant effect turned out to be highly significant in all analyses. Some pairs of consonants, however, were distinguished by internal cue variables better than others. Specifically, the most acoustically differentiated were the pairs that differ in place (/ʃ j / vs. /s j / and /s/, /ʂ/ vs. /s j / and /s/). Considerably less distinguished were the pairs having the same place and differing in secondary articulation, with /ʂ/ vs. /ʃ j / differing in duration and COG at the onset (and marginally in COG at the offset), and /s/ vs. /s j / differing only in COG at the onset. Among the variables, COG was useful in distinguishing most contrasts (primarily involving place differences), followed by intensity (place) and duration (/ʃ j / vs. the others).
Table 4 A summary of RM ANOVAs and Bonferroni post-hoc results for the factor Consonant: Internal cue variables. ‘<’ and ‘>’ indicate, respectively, higher and lower values; significance: *** = p < .001, ** = p < .01, * = p < .05, ns = not significant; cells with significant results are shaded.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab4.gif?pub-status=live)
5.1.3 Contextual cues: Spectral properties of the following vowel
Turning to contextual cues, RM ANOVAs for vowel formants were performed at each of the three time points: the onset, the midpoint, and the offset (see Figure 7). The results for F1 showed a main effect of Consonant at all three points (onset: F(3,24) = 26.586, p < .001; midpoint: F(3,24) = 25.245, p < .001; offset: F(3,24) = 5.316, p = .006). Significant consonant differences, determined by post-hoc tests, showed lower F1 after palatalized fricatives compared to non-palatalized ones: for all pairs at the onset (/s j / vs. /s/: p < .01; /s j / vs. /ʂ/: p < .001; /ʃ j / vs. /ʂ/: p < .01) and at the midpoint (/s j / vs. /s/ and /ʂ/: p < .01; /ʃ j / vs. /s/ and /ʂ/: p < .001), as well as for one pair at the offset (/ʃ j / vs. /ʂ/: p < .01). There was a main effect of Gender at the midpoint (F(1,8) = 6.311, p = .036), with females having higher F1 than males. This difference was on average 70 Hz at the vowel midpoint. The factor Position was not significant, and there were no significant interactions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig7g.gif?pub-status=live)
Figure 7 Mean formant (F1, F2, F3; Hz) values for /s/, /s j /, /ʂ/, and /ʃ j / at the onset, midpoint, and offset of the following vowel, separately for female and male speakers.
RM ANOVAs for F2 showed a main effect of Consonant at all three points (onset: F(3,24) = 96.273, p < .001; midpoint: F(3,24) = 54.918, p < .001; offset: F(3,24) = 19.065, p < .001). Significant consonant differences, determined by post-hoc tests, showed higher F2 for palatalized fricatives compared to the non-palatalized ones: for all pairs at the onset (/s j / vs. /s/ and /ʂ/: p < .001; /ʃ j / vs. /s/ and /ʂ/: p < .001) and at the midpoint (/ʃ j / vs. /s/: p < .01; /ʃ j / vs. /ʂ/: p < .001; /s j / vs. /s/: p < .01; /s j / vs. /ʂ/: p < .001), as well as for two pairs at the offset (/ʃ j / vs. /s/ and /ʂ/: p < .01). There was a main effect of Gender at all three points (onset: F(1,8) = 19.174, p = .002; midpoint: F(1,8) = 17.028, p = .003; offset: F(1,8) = 7.851, p = .023), with female speakers having significantly higher F2 than male speakers. This difference was on average 139 Hz at the vowel midpoint. There were, however, significant Consonant × Gender interactions at two points (onset: F(3,24) = 11.067, p < .001; midpoint: F(3,24) = 4.668, p < .05). These interactions were due to the greater Gender differences in F2 of the palatalized fricatives. The factor Position was not significant; there were no other significant interactions.
RM ANOVAs for F3 showed a main effect of Consonant at the first two points (onset: F(3,24) = 11.456, p < .001; midpoint: F(3,24) = 13.288, p < .001). Post-hoc tests revealed significantly lower F3 at the onset for /ʂ/ compared to /s/ and /ʃ j / (p < .01 and p < .05, respectively), and higher F3 at the midpoint for /s/ compared to /ʂ/ and /ʃ j / (p < .01 and p < .05, respectively). The factor Position was not significant; however, there was a significant Consonant × Position interaction at two points (midpoint: F(3,24) = 6.540, p = .002; offset: F(3,24) = 5.054, p = .007). This interaction was due to the difference between /s j / and /ʃ j / (lower F3 for the latter) word-medially but not word-initially. There was a main effect of Gender at the vowel onset (F(1,8) = 10.878, p = .011), with F3 being higher for females than males (by 270 Hz).
To illustrate some of these differences, F1–F3 at three points are plotted in Figure 7, separately by consonant and gender. It can be seen that the main difference between palatalized and non-palatalized consonants is in the F2 transition. This transition starts high for /s j / (on average 1896 Hz) and /ʃ j / (1927 Hz) and gradually declines. The transition for /s/ and /ʂ/ is largely flat (starting at 1520 Hz and 1573 Hz, respectively) or slightly declining. Being quite robust at the onset, F2 differences between palatalized and non-palatalized consonants are still large at the midpoint, and noticeable towards the offset of the vowel. Vowels after palatalized consonants also have somewhat lower F1, although these differences are much smaller in magnitude (with values being on average 438 Hz for /s j /, 473 Hz for /ʃ j / vs. 500 Hz for /s/ and 532 Hz for /ʂ/). Altogether, palatalized consonants exhibit a higher separation of F1 and F2 (1458 Hz and 1454 Hz) compared to non-palatalized consonants (1020 Hz and 1041 Hz). F3 is slightly lower for /ʂ/ than for the other fricatives (on average 2671 Hz for /ʂ/ vs. 2815 Hz for /s/, 2765 Hz for /s j /, and 2737 Hz for /ʃ j /).
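The F2–F1 separation values above can be recomputed from the reported formant means, as in the sketch below. It also derives the average gap between palatalized and non-palatalized consonants, which comes to roughly 425 Hz. The ASCII labels are illustrative stand-ins for the four fricatives.

```python
from statistics import mean

# Mean formant values (Hz) reported in the text.
# Keys are ASCII stand-ins for /s/, /s j/, /ʂ/, /ʃ j/.
F1 = {"s": 500, "sj": 438, "sh": 532, "shj": 473}
F2_ONSET = {"s": 1520, "sj": 1896, "sh": 1573, "shj": 1927}

# F2 minus F1 for each consonant.
separation = {c: F2_ONSET[c] - F1[c] for c in F1}

# Average separation within each secondary-articulation class.
palatalized = mean([separation["sj"], separation["shj"]])
plain = mean([separation["s"], separation["sh"]])
gap = palatalized - plain

for c, value in separation.items():
    print(f"{c}: F2 - F1 = {value} Hz")
print(f"palatalized minus non-palatalized: {gap} Hz")
```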
Table 5 provides a summary of statistical results for each of the contextual dependent variables. It can be seen that the Consonant effect was significant in all analyses involving F1 and F2, as well as in all but one analysis of F3. Vowel duration was not affected by consonant contrasts. Considering significant differences among the consonants, the pairs that differed in palatalization were the most distinct: /ʃ j / vs. /ʂ/ and /s/, and /s j / vs. /ʂ/ and /s/ (in declining order). Place-only differences were either distinguished by F3, as for /s/ vs. /ʂ/, or not distinguished at all, as for /ʃ j / vs. /s j /. Means and standard deviations for selected contextual cue variables by vowel context are given in Tables A7–A9 in the appendix.
Table 5 A summary of RM ANOVAs and Bonferroni post-hoc results for the factor Consonant: Contextual cue variables. ‘<’ and ‘>’ indicate, respectively, higher and lower values; significance: *** = p < .001, ** = p < .01, * = p < .05, ns = not significant; cells with significant results are shaded.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab5.gif?pub-status=live)
5.1.4 Summary
To summarize, the analyses of internal cue variables showed a robust differentiation of fricatives by place (in COG and amplitude difference). Contrasts within places of articulation (anterior /s/ vs. /s j / and posterior /ʂ/ vs. /ʃ j /) were not significantly distinguished by internal cue variables, with the exception of COG at the onset (both pairs) and duration (the latter pair). On the other hand, the analyses of contextual cue variables showed a robust differentiation of secondary articulation differences – /s/ vs. /s j / and /ʂ/ vs. /ʃ j / (in both F1 and F2), but a rather limited differentiation of the primary place (only /s/ vs. /ʂ/ in F3).
Differences in position involved primarily consonant duration (initial > medial), as well as COG or formants for specific consonants. Some of these differences can be attributed to positional enhancement (e.g. higher COG in the more prominent word-initial position), while others could reflect contextual differences specific to lexical items (e.g. presence of postvocalic palatalized consonants). Gender differences were found for fricative COG and formants F1–F3, with females showing higher values than males for both spectral properties. These differences are fully expected, given the well-established physiological and sociophonetic differences in these parameters (e.g. Jongman et al. Reference Jongman, Wayland and Wong2000, Munson et al. Reference Munson, McDonald, DeBoe and White2006). Somewhat unexpectedly, however, these differences were greater for some consonants than others – specifically for anterior fricatives in COG and for palatalized fricatives in F1 and F2. This resulted in the place and secondary articulation contrasts being more dispersed for females than males.
Overall, COG and the first two formants emerged as the primary acoustic characteristics of the four-way contrast, with the former measure being crucial to distinguishing the primary place of articulation (anterior vs. posterior), and the latter measure distinguishing the secondary articulation (palatalized vs. non-palatalized) of the fricatives. As shown in Figure 8, these two parameters clearly define the contrasts in both word-initial and word-medial positions. The same can be said about individual results shown in Figure 9 (averaged by position): all 10 speakers show a relatively symmetrical dispersion of the fricative contrasts, although to a lesser extent in the F2–F1 dimension for some of the male speakers. (See Figure A1 in the appendix for the results by vowel context.)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig8g.gif?pub-status=live)
Figure 8 Plots showing mean COG at the midpoint (Hz) by mean F2–F1 difference at the vowel onset (Hz) for the fricatives /s/, /s j /, /ʂ/, and /ʃ j / separately by position – word-initial and word-medial (averaged for vowel contexts and speakers).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig9g.gif?pub-status=live)
Figure 9 Plots showing mean COG at the midpoint (Hz) by mean F2–F1 difference at the vowel onset (Hz) for the fricatives /s/, /s j /, /ʂ/, and /ʃ j / separately for each speaker (averaged for positions and vowel contexts).
5.2 Fricatives in other contexts
This section presents results for the remaining comparisons – initial/final position and stress.
5.2.1 Word-final position
Words from Set 2 (see appendix Table A2) were investigated using paired-samples t-tests (two-tailed) to determine differences between word-initial and word-final fricatives. Compared to initial fricatives, the same consonants in final position had shorter duration (/s/: t(9) = –16.680, p < .001; /s j /: t(9) = –11.673, p < .001; /ʂ/: t(9) = –9.197, p < .001; /ʃ j /: t(9) = –17.195, p < .001). As can be seen in Figure 10, this reduction was quite substantial, ranging between 73 ms and 91 ms (i.e. 50–60% of the initial duration of the consonants), being the largest for /ʃ j /. This resulted in the /ʃ j / duration being shorter than for its ‘short’ non-palatalized counterpart /ʂ/ (73 ms vs. 80 ms), effectively neutralizing the length contrast. COG at the midpoint of /s j / was lower in final position compared to initial position (t(9) = 5.386, p < .001). The difference was about 950 Hz.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig10g.gif?pub-status=live)
Figure 10 Mean consonant duration (in seconds) values for /s/, /s j /, /ʂ/, and /ʃ j / by position, initial and final (both next to vowel /o/).
All word-final fricatives showed higher F2 (measured at the vowel offset) compared to word-initial fricatives (measured at the vowel onset) (/s/: t(9) = 4.486, p < .01; /s j /: t(9) = 3.192, p < .05; /ʂ/: t(9) = 4.896, p < .01; /ʃ j /: t(9) = 2.939, p < .05). This difference was on average 170 Hz, and is possibly due to differences in the make-up of word-initial and word-final stimuli (with the other vowel-adjacent consonant being a velar or alveolar/labial, respectively). In addition, word-final anterior fricatives showed lower F3 than their counterparts in word-initial position (/s/: t(9) = –3.051, p < .05; /s j /: t(9) = –7.773, p < .001), by on average 240 Hz. These differences appear to be due to F3 lowering by the preceding /r/ for the items [zbros] and [bros j ].
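For readers unfamiliar with the test behind the t(9) values in this section, the following is a minimal sketch of a paired-samples t statistic for 10 speakers. The per-speaker durations are hypothetical and are not data from the study; the function name `paired_t` is likewise an illustrative choice. Rather than computing an exact p-value, the sketch compares |t| against 2.262, the standard two-tailed critical value at α = .05 for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on x minus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical per-speaker mean durations in ms (NOT data from the study).
final_ms = [72, 80, 69, 75, 78, 70, 74, 77, 71, 76]
initial_ms = [160, 171, 150, 158, 166, 149, 163, 169, 152, 161]

t, df = paired_t(final_ms, initial_ms)

# Two-tailed critical value of t at alpha = .05 with df = 9.
T_CRIT_DF9 = 2.262

print(f"t({df}) = {t:.3f}; significant at .05: {abs(t) > T_CRIT_DF9}")
```

Subtracting initial from final values yields a negative t when final fricatives are shorter, matching the sign convention of the duration results reported above.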
5.2.2 Unstressed vowel context
Words from Set 3 (see Table A3) were examined for the effect of stress on the fricatives. Compared to their stressed counterparts, fricatives in unstressed syllables were characterized by shorter duration (/s/: t(9) = 4.935, p < .01; /s j /: t(9) = 8.870, p < .001; /ʂ/: t(9) = 7.815, p < .001; /ʃ j /: t(9) = 3.677, p < .01) and lower F1 at the onset of the vowel (/s/: t(9) = 4.935, p < .01; /s j /: t(9) = 5.286, p < .01; /ʂ/: t(9) = 4.006, p < .01, /ʃ j /: t(9) = 3.677, p < .01). As seen in Figure 11, the duration differences were relatively small, on average 25 ms, which corresponds to a 20% reduction in duration in unstressed syllables. This reduction did not affect the duration difference between /ʃ j / and the other fricatives, as the former was still on average longer than the other fricatives (136 ms vs. 99–104 ms). It should be noted that the /ʂ/ vs. /ʃ j / duration ratio observed here (1.25–1.30) is somewhat higher than in the analysis reported in Section 5.1.2 above. This difference possibly reflects some variation in the duration of /ʃ j / by position and vowel context. The above-mentioned F1 differences were on the order of 100 Hz, and seem to reflect some raising (centralization) of /a/ to [ʌ] or [ə] in unstressed syllables. In addition, vowels adjacent to palatalized fricatives in unstressed position showed lower F3 than vowels adjacent to the same consonants in stressed position (/s j /: t(9) = 3.465, p < .01; /ʃ j /: t(9) = 3.464, p < .01). This difference, the source of which is unclear, was about 140 Hz.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig11g.gif?pub-status=live)
Figure 11 Mean consonant duration (in seconds) values for /s/, /s j /, /ʂ/, and /ʃ j / in two stress conditions (stressed and unstressed).
5.2.3 Summary
The results presented in this section showed that duration was important in signaling positional and stress effects, with word-final and unstressed fricatives being considerably shorter than their word-initial and stressed counterparts. An important result of this positional shortening is the neutralization of the durational distinction between /ʃ j / and the other fricatives, a contrast that was already relatively small in the more prominent initial and medial positions. Other positional differences involved less consistent variation in COG, F2, and F3, possibly reflecting effects of non-adjacent consonants or vowel centralization. The positional shortening of final consonants seems unexpected given the literature documenting the propensity of final consonants to lengthen (e.g. Beckman & Edwards Reference Beckman, Edwards, Kingston and Beckman1990). It should be noted, however, that lengthening usually involves utterance-final consonants, while the word-final consonants in this study were utterance-medial, embedded in a carrier phrase. Shortening of word-final fricatives was also reported by Kochetov & Lobanova (Reference Kochetov and Lobanova2007) for Komi-Permyak.
6 Discussion and conclusion
The results of the study demonstrate that Russian voiceless sibilant fricatives are robustly distinguished by spectral differences – the frequency concentration of the fricative noise and the formants of an adjacent vowel. The higher COG for anterior fricatives /s s j / and the lower COG for posterior fricatives /ʂ ʃ j / is consistent with previous analyses of the Russian contrast (Bolla Reference Bolla1981, Funatsu & Kiritani Reference Funatsu and Kiritani1998, Zsiga Reference Zsiga2000, Padgett & Żygis Reference Padgett and Żygis2007, Kochetov & Radišić Reference Kochetov, Radišić, Babyonyshev, Kavitskaya and Reich2009), and is expected based on what we know about the articulation of these consonants (e.g. Skalozub Reference Skalozub1963, Bolla Reference Bolla1981, Avanesov Reference Avanesov1984, Kedrova et al. Reference Kedrova, Anisimov, Zaharov and Pirogov2008). What is new, however, is the finding that the anterior/posterior differences are substantial and significant throughout the fricative (and regardless of the vowel context). While this is not surprising for the non-palatalized fricatives /s/ and /ʂ/ (see Gordon et al. Reference Gordon, Barthmaier and Sands2002 on place differences in sibilants across languages), it is notable for the palatalized pair, /s j / and /ʃ j /. Cross-linguistically, anterior coronals are prone to shifting to the posterior place when overlapped by or adjacent to a palatal articulation (Bhat Reference Bhat and Greenberg1978, among others). We could therefore expect /s j / to spectrally approach /ʃ j /, at least at its offset. This is clearly not the case in our data: the COG of /s j / was consistently high (/s/-like) throughout the frication, including the fricative offset. This is in line with Zsiga's (Reference Zsiga2000) results, which also showed a temporally stable COG for Russian /s j /, in contrast with a gradually decreasing COG in English /s+j/ sequences (e.g. /sj/ → [sʃ] in miss you). The lack of a similar palatal assimilation in Russian presumably reflects a language-particular constraint on the coarticulation of the overlapping tongue tip and tongue body gestures, effectively suppressing the otherwise expected categorical or gradient change in place. As palatalized consonants in Russian also contrast with consonant + glide sequences (e.g. [s j el] ‘sat down’ vs. [sjel] ‘has eaten’), it would be interesting to examine how these contrasts are implemented acoustically and articulatorily.
Also new is the finding of within-category differences in COG at least at the onset of frication: the palatalized /ʃ j / had higher COG than the non-palatalized /ʂ/, while the palatalized /s j / had lower COG than the non-palatalized /s/. These differences (and particularly for /ʂ/ vs. /ʃ j /) were also observed numerically at the midpoint and the offset of frication, although they did not reach significance. A non-significant tendency towards this was previously noted in Padgett & Żygis (Reference Padgett and Żygis2007) and observed in Kochetov & Radišić (Reference Kochetov, Radišić, Babyonyshev, Kavitskaya and Reich2009), the studies examining sibilant fricatives in the context before /a/.
Another internal acoustic property – fricative noise intensity (reflected in the measure of amplitude difference) – was also found to contribute to the realization of primary place contrasts. This is in part consistent with Kochetov & Radišić’s (Reference Kochetov, Radišić, Babyonyshev, Kavitskaya and Reich2009) findings (where both posterior fricatives before /a/ were more intense) and different from Bolla's (Reference Bolla1981) observation (where palatalized fricatives were louder; not controlled for the following vowel).
While fricative noise spectral differences were crucial to distinguishing place contrasts, formant differences during the vowel were important for the palatalized vs. non-palatalized distinction. Higher F2 of an adjacent (and particularly the following) vowel is a well-known correlate of consonant palatalization, reflecting the decreasing size of the front cavity (Fant Reference Fant1960, Shupljakov et al. Reference Shupljakov, Fant and de Serpa-Leitao1968, Derkach et al. Reference Derkach Miron and de Serpa-Leitao1970, Purcell Reference Purcell1979, among others). To a lesser degree this applies to lower F1, corresponding to the raising of the tongue (e.g. Derkach et al. Reference Derkach Miron and de Serpa-Leitao1970). The resulting F2–F1 difference at the vowel onset in our data was found to be consistently higher for palatalized consonants than non-palatalized ones, being on average 425 Hz (and ranging from 345 Hz to 510 Hz depending on the vowel context). This is comparable to the difference obtained by Purcell (Reference Purcell1979) for Russian coronal stops across five vowel contexts (490 Hz) and by Kochetov & Radišić (Reference Kochetov, Radišić, Babyonyshev, Kavitskaya and Reich2009) for the fricatives in the context of /a/ (550 Hz). Moreover, the current results showed that formant differences were not limited to the onset of the vowel, but were almost as robust at vowel midpoint, and – at least for /ʃ j / – still significant at the vowel offset. This reflects the large temporal extent of the palatalization gesture in Russian, as well as a seemingly greater degree of palatalization of /ʃ j / compared to /s j /. The lower F3 for /ʂ/ (compared to /s/) can be attributed to the consonant's tip-up (retroflex-like) articulation. This difference, however, was fairly small (e.g. 200 Hz for the /ʂ/ – /s/ pair), in comparison to the robust differences found for the true retroflex/non-retroflex contrasts as in Toda (Gordon et al. Reference Gordon, Barthmaier and Sands2002). This suggests that the Russian /ʂ/ is relatively weakly retroflexed, comparable to apical post-alveolar fricatives in other Slavic and Finno-Ugric languages (Hamann Reference Hamann2004, Kochetov & Lobanova Reference Kochetov and Lobanova2007). Given this, an alternative IPA transcription for the Russian non-palatalized posterior fricative would be / / (apical post-alveolar; see Lee & Zee Reference Lee and Zee2003 for Standard Chinese).
Overall, the study shows that the primary acoustic distinction among the four Russian consonants is that of the secondary articulation, followed by primary place, and then positional differences (see the results of the hierarchical clustering analysis in Figure 3). Interestingly, the results also show that place differences are greater among the non-palatalized fricatives, while positional differences are somewhat greater for the palatalized fricatives. This possibly reflects the lesser stability and greater variability of palatalized consonants in the system of contrasts. The finding that the palatalized /s j / and /ʃ j / are distinguished from their non-palatalized counterparts /s/ and /ʂ/ primarily by F1 and F2 of adjacent vowels is important. It explains the contextual restrictions on the distribution of palatalized fricatives. Recall from Table 3 that /s j / and /ʃ j / are partially restricted word-medially before consonants (V_C; but not before vowels, C_V) and partially or fully avoided word-finally after consonants (C_#) and word-initially before consonants (#_C). Note that the last three contexts lack the following vowels, and thus the cues to the palatalization distinction are considerably reduced (to the preceding vowel or sonorant consonant transitions in V_C and C_#) or almost absent (in #_C). Overall, this provides evidence for the role of acoustic cues in the distribution of phonological contrasts (Steriade Reference Steriade1997; see Kochetov Reference Kochetov2006 on Russian stops). Further, while phonotactically, Russian /ʂ/ and /ʃ j / pattern just like other non-palatalized or palatalized consonants, they behave differently in alternations (Timberlake Reference Timberlake2004; see Section 1 above). The ‘immutable’ status of posterior fricatives can be attributed to their somewhat greater acoustic contrast compared to /s/ vs. /s j /, as manifested in on average larger COG differences and a longer span of F1/F2 differences across the following vowel.
The overall robust and symmetrical differentiation of the Russian sibilant contrasts across the examined contexts provides support for the acoustic/perceptual dispersion approach to fricative inventories (Padgett & Żygis Reference Padgett and Żygis2007 on Russian and Polish; Żygis & Padgett Reference Żygis and Padgett2010 on Polish). Recall that Russian fricatives have evolved to become sufficiently distinct from each other (see Figure 1). As our results show, this enhancement must have proceeded primarily along the dimensions of F1/F2 vowel formants and frequency of fricative noise (although somewhat differently than in other Slavic languages; see Padgett & Żygis Reference Padgett and Żygis2007 on Polish). This study thus further extends the application of the concepts of perceptual optimization and dispersion (Liljencrants & Lindblom Reference Liljencrants and Lindblom1972) to consonants, beyond the more familiar cases of vowel inventories (see Lindblom & Maddieson Reference Lindblom, Maddieson, Hyman and Li1988, Flemming Reference Flemming2002; but see Ohala Reference Ohala1979).
Thus it appears that the Russian sibilant fricative inventory has evolved, in Lindblom & Maddieson's (Reference Lindblom, Maddieson, Hyman and Li1988) words, ‘to achieve maximal perceptual distinctiveness at minimum articulatory cost’.
Somewhat unexpectedly, duration turned out to contribute rather little to distinguishing the fricative contrasts. Previous descriptive phonetic accounts have often considered /ʃ j / to be geminate (i.e. /ʃː j /), being considerably longer than the other fricatives. Bolla's (Reference Bolla1981) measurements of voiceless fricatives (not controlled for vowel context) showed this to be true, with /ʃ j / being 1.55 times longer than /ʂ/. A more recent study by Kochetov & Radišić (Reference Kochetov, Radišić, Babyonyshev, Kavitskaya and Reich2009), however, found smaller differences, with /ʃ j / (before /a/) being 1.23 times longer than /ʂ/, but not significantly different from /s j /. It was not clear to what extent those results were reflecting dialect or vowel context differences. Our current results, nevertheless, are fairly consistent with the latter study, showing even smaller, albeit significant, durational differences, with the average ratio of 1.09 for /ʃ j / vs. /ʂ/. The analysis of a smaller set of data in Section 5.2.2 above showed a somewhat higher ratio of 1.25–1.30, likely reflecting some variation in the realization of /ʃ j / across positions and vowel contexts. Overall, however, these ratios are considerably lower than ratios reported for Russian contrastive and morphologically-conditioned geminates by Dmitrieva (Reference Dmitrieva and Kubozono2017), which are in the 1.6–1.8 range. This suggests that /ʃ j / in contemporary Russian is only marginally distinguished by duration, which is in contrast to the many mid-to-late 20th century descriptive and phonetic accounts (Jones & Ward Reference Jones and Ward1969, Matusevich Reference Matusevich1976, Bolla Reference Bolla1981, Avanesov Reference Avanesov1984). The marginal utility of duration is further confirmed by positional comparisons, showing that the word-final /ʃ j / was not different in duration from the other fricatives. 
Previous descriptive accounts mentioned the optionality of degemination of /ʃʲ/ in this context (Timberlake 2004). Our results, however, suggest that the process is no longer optional, as the lack of contrast was exhibited both at the level of the group and at the level of individual speakers. Overall, the results suggest a change in progress, likely reflecting a fuller integration of /ʃʲ/ into the palatalized vs. non-palatalized system of contrasts, where consonant duration does not play a role. This may in fact be another example of the application of the principles of minimizing articulatory cost (Lindblom & Maddieson 1988) or enforcing feature economy (Clements 2003). At the same time, consonant duration in our results was useful in distinguishing positional variation, with considerably shorter values exhibited by all consonants in word-final and unstressed positions, compared to initial (stressed) position. Similar effects of word-final position on fricative duration were previously reported for Komi-Permyak (Kochetov & Lobanova 2007), while stress effects on stop duration were documented for Russian (Zsiga 2000).
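The comparison of duration ratios above can be laid out as a short worked example. The following Python sketch is illustrative only and is not part of the study's analysis: it takes the /ʃʲ/ vs. /ʂ/ ratios quoted in the text (1.55, 1.23, 1.09) and checks each against the 1.6–1.8 range reported for Russian geminates. The function names and the durations in the usage lines (109 ms and 100 ms) are hypothetical, chosen only to show how such a ratio is formed.

```python
# Duration ratios of /ʃʲ/ relative to /ʂ/, as quoted in the text above.
REPORTED_RATIOS = {
    "Bolla 1981": 1.55,
    "Kochetov & Radišić 2009 (before /a/)": 1.23,
    "current study (average)": 1.09,
}

# Range of ratios reported for Russian geminates (Dmitrieva 2017).
GEMINATE_RANGE = (1.6, 1.8)


def duration_ratio(longer_ms, shorter_ms):
    """Ratio of two mean durations in ms (hypothetical helper)."""
    if shorter_ms <= 0:
        raise ValueError("durations must be positive")
    return longer_ms / shorter_ms


def in_geminate_range(ratio, bounds=GEMINATE_RANGE):
    """True if the ratio lies within the reported geminate range."""
    low, high = bounds
    return low <= ratio <= high


def shortfall(ratio, bounds=GEMINATE_RANGE):
    """Distance below the lower bound of the range (0.0 if not below)."""
    return max(0.0, round(bounds[0] - ratio, 2))


if __name__ == "__main__":
    for source, ratio in REPORTED_RATIOS.items():
        print(source, ratio, in_geminate_range(ratio), shortfall(ratio))
    # Hypothetical mean durations, for illustration only (not measured data).
    print(duration_ratio(109.0, 100.0))
```

On these figures none of the three ratios reaches the lower bound of the geminate range, and the ratio from the current study lies furthest below it.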
To conclude, this study provides an acoustic snapshot of the complex and historically dynamic set of voiceless sibilant fricative contrasts in contemporary Russian. Further phonetic work on complex sibilant inventories across languages is expected to provide new insights into the mechanisms underlying their historical development and to further enrich the phonetic typology of fricative contrasts.
Acknowledgments
The paper has benefited greatly from insightful comments and suggestions by Shigeto Kawahara, Mayuki Matsui, and anonymous reviewers for the Journal of the International Phonetic Association. Thanks to Milica Radišić for assistance with the collection, annotation, and extraction of data. The research was supported by grants from the Social Sciences and Humanities Research Council of Canada (#416-2006-1006 and #435-2015-2013).
Appendix. Materials and means by vowel context
Table A1 Set 1 of the stimuli: words with four fricatives in word-initial and word-medial position before five stressed vowels.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab6.gif?pub-status=live)
Table A2 Set 2 of the stimuli: words with four fricatives in word-initial and word-final position.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab7.gif?pub-status=live)
Table A3 Set 3 of the stimuli: words with four fricatives before stressed and unstressed vowels. Vstr = stressed vowel, Vunstr = unstressed vowel.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab8.gif?pub-status=live)
Table A4 Mean consonant duration (ms) for four consonants, by vowel context and gender, averaged over positions and speakers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab9.gif?pub-status=live)
Table A5 Mean amplitude difference (dB) for four consonants, by vowel context and gender, averaged over positions and speakers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab10.gif?pub-status=live)
Table A6 Mean COG (Hz) at the frication midpoint for four consonants, by vowel context and gender, averaged over positions and speakers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab11.gif?pub-status=live)
Table A7 Mean F1 (Hz) at the vowel onset for four consonants, by vowel context and gender, averaged over positions and speakers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab12.gif?pub-status=live)
Table A8 Mean F2 (Hz) at the vowel onset for four consonants, by vowel context and gender, averaged over positions and speakers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab13.gif?pub-status=live)
Table A9 Mean F3 (Hz) at the vowel onset for four consonants, by vowel context and gender, averaged over positions and speakers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_tab14.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171116060716312-0860:S0025100317000019:S0025100317000019_fig12g.gif?pub-status=live)
Figure A1 Plots showing mean COG at the midpoint (Hz) by mean F2–F1 difference at the vowel onset (Hz) for the fricatives /s/, /sʲ/, /ʂ/, and /ʃʲ/ separately in each vowel context (averaged over positions and speakers).