1 Background
1.1 Introduction
Glottalic (or ejective) stops, common in some languages (Maddieson 2011), are not usually associated with the segmental phonology of English. We can, nevertheless, hear them outside in the street and on the radio on a daily basis. Such stops are produced with a glottalic airstream, which, at the extreme, is capable of generating a strong acoustic effect with a brief pulse-like spectral spike, shown in Figure 1. The articulation is achieved by the larynx being briskly pushed upwards with the tightly closed glottis, while there is an occlusion in the oral cavity (Ladefoged & Maddieson 1996), followed by the oral release of this relatively high pressure air. To reach a fully unphonated state and a complete blockage of pulmonic air passage (found in the strongest ejectives), there might be a simultaneous adduction of ventricular folds with a variable constriction of the sphincter mechanism, similar to the ones observed for glottal stops (Roach 1979, Catford 1982, Esling & Harris 2005). Less extreme combinations of glottal and supraglottal closures may generate appreciably non-pulmonic bursts or releases to the oral constriction without being obviously ejective.
Figure 1 A Middle Class Scottish English boy from Rosyth (3 years 11 months old) saying Hey, I got Mister Cook in a naturalistic home-playing situation. There is a strong (spectrally pulse-like) ejective final [k’] in Cook at the end of the phrase.
A recent study of the geographical distribution of languages with phonemic ejectives (Everett 2013) found that ejectives tend to occur in world regions close to areas of high elevation (above 1500 m). As an explanation for this geographical bias, Everett (2013) proposes that ejective sounds are easier to produce at high altitudes due to reduced atmospheric pressure, and/or that they might mitigate high-altitude hypoxia, as the air volume from the lungs is not consumed for speech during ejective production. While this apparent correlation between high altitudes and distributions of ejective phonemes requires a more rigorous proof of causation (see detailed discussion in Roberts & Winters 2013), it is clear that any (non-phonological) ejective sounds in the British Isles do not fit this pattern, with the highest peak, Ben Nevis (1344 m), not exceeding Everett's high-altitude criterion.
There have been notes of phonetic ejectives in English in the past (Catford 1982, Ladefoged 1993, Ladefoged & Maddieson 1996, Chirrey 1999, Fabricius 2000, Ogden 2009), suggesting that ejectivisation has been present in various English varieties for at least three decades, but so far almost no systematic studies are available. Catford (1982: 70) attributed the ejective realisation of final /ptk/ to ‘pathological speech and some northern English dialects’. Wells (1982 vol. 1: 261) mentioned ejectivisation in both northern and southern English dialects, and considered it as ‘an emphatic articulation of the glottal component’ in the sense that ejectivisation is somehow connected to stop preglottalisation. Wells’ view of ejectives is more recently shared by Ogden (2009), who (also tentatively) described it as ‘a rearrangement in time of the constrictions needed to produce glottally reinforced voiceless plosives’ (Ogden 2009: 164).
It is an open question whether this rearrangement of glottalisation timing in British English varieties is segmentally epiphenomenal (Wells 1982, Ogden 2009); or alternatively, epiphenomenal due to phonotactic influences, as in German, where ejectives are a result of the higher air pressure building up in the supralaryngeal cavity during temporal overlap of a final stop and glottalisation of the next vowel (Simpson 2007, in press); or whether this is an allophone (or free variant) emerging as part of the phonetic system, possibly as a para- or sociolinguistic marker (see McCarthy & Stuart-Smith 2013 for the latter).
Recent systematic research begins to give an indication of how common ejectives might be. McCarthy (2011) carried out a sociophonetic study of word-final /k/ in Glaswegian secondary school girls. He showed that up to 65% of all final /k/ variants were a type of ejective, with more of them occurring in a read speech condition (in which speakers might strive to closely approximate a language standard) than in casual speech. In addition, ethnicity played a role in the distribution: girls of Glaswegian Asian background used less strong ejectives than non-Asian Glaswegian ones, possibly indicating that the phenomenon arises as a language-internal development.
Our own longitudinal study of seven Scottish English pre-school children of middle class background undertaken in semi-spontaneous play in 2002–2004 (Gordeeva & Scobbie 2011) showed that the subjects could use ejectives categorically, i.e. some children did not produce them at all. Among the children using ejectives (five out of seven), 13.5% of all word-final obstruents involved glottalic stops, as in the example in Figure 1 and one other, similar example. Prosodically, the ejectives appeared significantly more frequently in phrase-final positions. Contrastively, they appeared mostly in lexemes ending with phonologically voiceless stops, but 11.7% of the ejectives also appeared in the items pig and food accompanied by complete devoicing. This suggested that /−voice/ and phrase-finality are not the exclusive contexts of ejectives. Generally, from that study it could be argued that ejectives act as phonetic variants of fully released pulmonic stops of either phonological voicing specification, and potentially they could indeed have a socio-phonetic meaning (e.g. correlate of age or social class) and/or a paralinguistic meaning, such as emphatic or clear articulation, as also suggested by Wells (1982).
To address some of these open topics, we will therefore analyse the relationship between the glottalic airstream mechanism and aspirated or glottalised articulation near this obstruent locus in a new group of adult speakers undertaking a standard laboratory speech task, and further explore the possible use of ejectives in English phonology, specifically as a correlate of (hence potential cue to) the obstruent /voice/ contrast.
1.2 Preglottalisation and ejectivisation of Scottish English stops
As just discussed, stops with glottalic airstream (also ejectivisation) are little reported in English. On the other hand, glottalisation (as a secondary articulation, but also preglottalisation or the glottal reinforcement of stops) is extremely common in various dialects of English throughout England and Scotland (Roach 1979, Wells 1982, Milroy et al. 1994, Chirrey 1999, Stuart-Smith 1999). Glottal stops lacking any oral constriction are also a common realisation of English /t/ (less so for other stops), and we will assume that these are an extension of glottalised stops, and not discuss them much further. The term ‘glottal stop’ often covers a range of productions from complete sustained glottal adduction through to weak creak and diminished intensity (clearly analysed in Docherty & Foulkes 2005). They differ most obviously from ejectives in that the latter necessitate an oral constriction for which the glottalic constriction can act as an airstream mechanism.
Glottalised and glottalic stops alike can result in irregular phonation in the vowel preceding the oral stop, also referred to as ‘creakiness’ in taxonomies of phonatory settings (Catford 1982, Laver 1994). A ‘phonatory setting’, whether indexical or a systematic part of a linguistic variety, can be parametrically described as an average value of relevant parameters globally, indicative of a long-term influence of habitual neutrality (Honikman 1964). In such taxonomies, habitual creakiness of a speaker should be unrelated to the preglottalisation of stops as a secondary articulation of the segment. However, it may be that in fact stop production and creaky phonation in preceding vowels are simultaneously indicative of both segmental contrastiveness and a variety's phonatory setting, by virtue of both eventually acting on the same ‘susceptible segments’ in pre-stop vowels and having the same acoustic manifestation of irregular periodicity memorably described by Catford (1982: 98) as ‘the sound of a stick being run along a railing’. For this reason, we do not necessarily intend to terminologically separate the two levels in this paper, and use ‘creaky’ and ‘breathy’ simply as attributes of the notions of ‘phonatory settings’ or ‘voice quality’, and reserve the more articulatory terms ‘glottalisation’ and ‘aspiration’ for other contexts.
Creaky voice quality has indeed been reported in Scottish English as an indexical parameter. In Edinburgh specifically, voice quality was found to correlate with socio-economic status, whereby creaky phonatory settings seem to predominate in speakers with higher social status, such as middle class (Esling 1978).
Figure 2 shows both vowel glottalisation and a strong ejective release of a stop co-occurring at a phrase-final locus. The glottalisation is typically acoustically manifested in irregularities, such as pitch-asynchronous secondary pitch spikes. This is especially clear in the figure in the electroglottographic (EGG) time-derivative in the lower pane in approximately the last seven pitch periods of the vowel. In this context, the ejectivisation cannot be epiphenomenal in Simpson's (2007, in press) phonotactic sense, since there is no vowel following the ejective; it is rather followed by a pause. However, such co-occurrence could be epiphenomenal if ejectivisation involves a rearrangement in time of the glottal constriction as suggested by Wells (1982) and Ogden (2009).
Figure 2 An example of creaky phonation followed by a strong phrase-final ejective stop in the word bought produced by a Scottish English male subject. The panes contain (from the top): time-aligned acoustic waveform, spectrogram and EGG time-derivative.
1.3 Preaspiration of Scottish English /−voice/ fricatives
The situation with word-final fricatives in Scottish English is very interesting in the light of potential glottalisation and ejectivisation in the stop series. Periodicity (or phonetic voicing), its timing and segmental durations are known to be important correlates of the voicing contrast in most English varieties (Haggard 1978, Docherty 1992, Smith 1997), including the relationship between the Scottish vowel length rule and obstruent /voice/ in Scottish English (Scobbie, Hewlett & Turk 1999). Our recent studies have also found that preaspiration is a secondary (to periodicity) correlate of the /voice/ contrast in Scottish English word-final fricatives (Gordeeva & Scobbie 2010), and for some middle class speakers it is primary. The large extent of preaspiration in fricatives, and cross-linguistic differences in the alignment of aspiration parameters (Gordeeva 2007), merited a conclusion that preaspiration in Scottish English is a variety-specific phonetic structure. Our data showed that preaspiration (locally breathy phonation) occurs in vowels before fricatives due to an abducted glottis configuration timed substantially prior to the supralaryngeal stricture. Any epiphenomenal ejectivisation of fricatives is unlikely to occur, given that the fricatives are contributing neither a full glottal closure nor an oral closure which might tend to overlap and capture higher air pressure in an oral-glottal cavity.
Different languages feature both aspirated and glottalic obstruents in one and the same inventory, e.g. Quechua, Georgian or Navajo (Ladefoged & Maddieson 1996). In principle, phonetically there is nothing against a language completely precluding preaspiration where there is preglottalisation or the other way around (see examples and discussion in Gordon & Ladefoged 2001). Scottish English appears (given our current state of knowledge of phonology and phonetics) to be a less idealised situation, in which such phonetic properties as preaspiration and preglottalisation are unsteady, might occur to different degrees in different speakers and might in turn affect the production of ejective stops if they are epiphenomenal in the sense suggested by Wells (1982) and Ogden (2009).
1.4 Research questions
We aim to provide empirical evidence of any potential complementarity of phonatory settings (creakiness and breathiness) before voiceless stops and fricatives in Scottish English, and their rates of co-occurrence in individual speakers.
Establishing this should enable us to address the hypothesis that creaky speakers produce ejective releases more and breathy speakers do so less. A confirmation should indicate that ejectives are indeed an epiphenomenal result of glottalisation (Wells 1982, Ogden 2009). A negative result should indicate that individuals with ejectives produce them as a phonetic variant tentatively connected to some (socio-)phonetic or paralinguistic context. To shed more light on what phonetic context this might be, we will look at Scottish English /±voice/ in word-final stops and fricatives and consider it against preglottalisation or preaspiration in individual speakers, and the amounts of ejectives as a function of stop /±voice/.
It is also the purpose of this study to provide an acoustic description of Scottish English ejectives and how they relate to the acoustic measures (e.g. closure duration and burst intensity) shown as relevant in previous research into phonemic ejectives (Lindau 1984, Grawunder, Simpson & Khalilov 2010, Vicenik 2010).
2 Method
We used a combination of impressionistic categorisation, electroglottographic and acoustic methods to address the research questions. A range of acoustic techniques used for the analysis of aspiration and periodicity was already established in our previous work (Gordeeva & Scobbie 2010). However, to ensure the accuracy of periodicity and glottalisation analysis we performed EGG recordings.
The acoustic analyses and annotations of segmental boundaries were augmented by an impressionistic categorisation of stop release properties and speakers’ phonatory settings in the preceding vowel. The categorisation was necessary because ejectivisation, glottalisation and aspiration are non-phonological in Scottish English and these variable properties needed to be identified for each separate case. The annotations are described in Section 2.3 below. The categorisations served as predicted variables throughout the study, and they helped to establish both non-parametric distributions and parametric (acoustic and EGG) discriminability of glottalisation and aspiration in relation to ejectivisation and phonological /voice/.
2.1 Subjects and recordings
The subjects were five Scottish English males (SP1–5) recruited in Edinburgh. All were informally assessed as being of middle class background. SP1–4 were in the age range of 40–50 years; SP5 was between 20 and 25 years old. The subjects were not paid for their participation. The materials were presented on a computer screen, one carrier sentence at a time. The speakers were instructed to read out the prompts from the screen in a natural way that would be clear to a small classroom public. The speech rate was kept constant.
The recordings were performed in a sound-insulated booth with a sampling frequency of 22.05 kHz (stereo) and 16-bit resolution. The right channel contained the acoustic recording. The left channel was coupled to the Laryngograph Processor™ for the EGG recording. A directional headset microphone with a fixed position was used to ensure that any time delay between the acoustic and EGG recordings was minimal and constant, and so that further analyses were not biased by this factor. The EGG electrodes were gently strapped around the speaker's neck and larynx. The recording was done in one single session to ensure that the electrodes always registered the laryngeal movement from the same fixed location.
2.2 Materials
Target words contained coda stops (Table 1) or fricatives (Table 2) following a range of pre-consonantal vowels. The targets were matched phonetically and varied in /±voice/ of the final stops or fricatives (e.g. ‘greed’ and ‘greet’ in Table 1). Each target was embedded in carrier sentences in four different contexts, presented here in (1)–(4).
(1) I say SHEEP, and not SHIP
(2) I say SHIP, and not SHEEP
(3) I can say ship AGAIN.
(4) That's the word SHIP.
Table 1 List of the target words with word-final coda stops (total n = 690).

Table 2 List of the target words with word-final coda fricatives (total n = 205).

To control for cross-subject sentence prosody, the participants were asked to emphasise the words in uppercase (as in the above examples (1)–(4)). An utterance was re-recorded if the sentence accent fell on the wrong word. This resulted in a somewhat different number of tokens recorded per speaker. The materials were interspersed with materials from an unrelated study involving vowel quality. The variation in sentence prosody was exploratory, and was not to be addressed in this study as a target variable. It was used as a selection criterion for some statistical tests to avoid its influence. In any case, the distributions of stop properties resulting from this prosodic range should be more realistic in approaching the variation in natural language.
Table 3 sums up the total number of tokens used per speaker for coda stops and fricatives. The structure of the context for final /voice/ followed the Scottish vowel length rule (SVLR), a three-way context conditioning of vowel length, where the suffix morpheme -ed triggers a longer duration than tautomorphemic /d/ (Scobbie, Turk & Hewlett 1999).
Table 3 Number of tokens per speaker for word-final coda stops and fricatives across all contexts (total n = 895).

2.3 Phonetic categorisation and segmental analyses
The phonetic analysis scheme of phonatory settings and ejectivisation was conceptualised by both authors, performed by the first author and validated by the second. All boundary annotations and further acoustic analyses were performed in the acoustic recording in Praat (Boersma & Weenink 2009).
The first step to capture these ranges was to annotate segmental vowel and obstruent onset along with the phonetic labels, as shown in Figure 3. Additionally, for the fricatives, we time-marked the onset of preaspiration in vowel–fricative transitions (procedure in Gordeeva & Scobbie 2010), or the onset of stop burst, as applicable (time of marks ‘M1’ and ‘G5’ in Figure 3). For the annotation of onsets we used a set of common spectral and waveform criteria, such as aperiodic excitation changes in the spectrum around F2–F4, visible formant level weakening, concave decrease of amplitude envelope to consonantal levels and concave increases of waveform amplitude. Annotations of sentence accent were recorded in a separate matrix: 705 out of the total of 895 tokens contained sentence accent.
Figure 3 An example of segment-onset annotations of target words. M1 and G5 labels are put at the beginning of burst onset. Phoneme interval labels (in SAMPA here) are put at the beginning of the corresponding segment.
As a second step, we inserted a composite impressionistic categorisation for each target word with final stops at the time of the burst onset (see labels ‘M1’ and ‘G5’ in Figure 3). Each label consisted of two parts: phonation type and airstream. The contents of the labels are described in Table 4. To determine each label, we listened to the whole word rhyme and impressionistically evaluated vowel and stop characteristics following the scheme in Table 4.
Table 4 Categorical variables and annotation used in this study.
As already discussed, vowels are ‘susceptible’ segments for phonatory settings (Catford 1982, Laver 1994). Therefore, our evaluation of phonatory settings was based on the laryngeal characteristics of the vowels before word-final stops or fricatives (Part 1 in Table 4). For example, ‘M’ and ‘G’ in the composite labels ‘M1’ and ‘G5’ in Figure 3 mean that the former vowel was perceived as modal (speaker neutral), and the latter vowel as glottalised (creaky). Since we focus on phonation type here, we disregarded non-laryngeal voice quality features (Mackenzie Beck 2005) to reduce the number of variables. Other phonation types, such as harshness, falsetto or whisper did not occur in our data, so are not discussed further. In these data, the cases of whispery-creaky phonation only rarely happened before /−voice/ fricatives. In such cases, we indicated the onset of whispery phonation as ‘A’ (aspiration) irrespective of the co-occurring creak, since whispery phonation overruled our overall percept of creakiness.
Finally, to capture glottalic–pulmonic range we had to account for the airstream mechanism along with the strength of release, eventual uncertainties in decision, and additional characteristics of the burst (e.g. glottal or fricated, see Part 2 in Table 4 above). For example, ‘1’ and ‘5’ in the composite labels ‘M1’ and ‘G5’ in Figure 3 mean that the former final stop was perceived as a pulmonic normal/strong post-aspirate, and the latter one as a strong ejective.
2.4 Electroglottographic and acoustic analyses
In this study we aimed to capture a range of glottalisation and aspiration as well as the acoustic characteristics of ejective stops. To capture this phonetic spectrum, we needed a reliable approximation of sound periodicity. Therefore, we performed simultaneous EGG and acoustic recordings. An overview of all the acoustic and EGG measurements used in this study is given in Table 5. We assume that this range of variables should capture the main differences between glottalisation, ejectivisation and aspiration in the obstruent itself or the preceding vowel. The periodicity-dependent measures in this study (Part 1 in Table 5), such as jitter and VoiceOff, were derived from the Lx of the EGG recordings.
Table 5 Overview of the acoustic and EGG-based variables used in this study.
EGG directly infers the contact area of the vocal folds, and estimates the source periodicity without the confounding effect of the supralaryngeal structures (e.g. Marasek 1997, Fourcin 2000). When the vocal folds are abducted the EGG current is minimal (lows in pane A in Figure 4). During the closing phase, the EGG current grows and reaches its maximum velocity near zero-crossing (tx on pane A in Figure 4). A time-derivative (Lx) of the EGG waveform (pane B in Figure 4) reflects the velocity of amplitude changes and the spikes permit a rather accurate analysis of pitch and its (ir)regularity. The EGG waveforms were high-pass filtered at 50 Hz to get rid of the possible direct current (DC) component due to exhalation interfering with periodicity.
Figure 4 Linear (pane A) and time-derived (pane B) representation of the EGG signal.
The time-derivative was computed using the formula $\mathit{Lx}_i = (U_{i+1} - U_i) \times s$, where $U$ is amplitude, $i$ is sample number, and $s = 1/(U_{\max} + 0.01)$ is a scale factor for amplitude rescaling relative to the maximal amplitude in the file, $U_{\max}$.
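The formula can be illustrated with a minimal Python sketch. The function name `egg_time_derivative` and the sample values are hypothetical, and the sketch assumes that the maximal amplitude is the largest sample value in the file (the text does not say whether absolute values were used).

```python
def egg_time_derivative(samples):
    """Scaled first difference of an EGG waveform.

    Implements Lx_i = (U[i+1] - U[i]) * s with s = 1 / (U_max + 0.01).
    """
    if len(samples) < 2:
        return []
    # Assumption: U_max is the largest sample value in the whole file.
    u_max = max(samples)
    scale = 1.0 / (u_max + 0.01)
    # One output value per pair of neighbouring samples.
    return [(samples[i + 1] - samples[i]) * scale
            for i in range(len(samples) - 1)]


# Illustrative values only, not data from the study.
lx = egg_time_derivative([0.0, 0.5, 0.99, 0.5, 0.0])
```

The output is one sample shorter than the input, since each value is the difference between two neighbouring samples.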
Jitter, or pitch-period variation in time, is a good correlate of glottalisation (creaky phonation) as opposed to modal voice (Gordon & Ladefoged 2001: 397). Jitter was computed from DC-filtered 8 kHz Lx waveforms throughout the complete vowel preceding coda stops and fricatives. For calculation, we used the cross-correlation algorithm. This measure (especially when derived from the EGG signal) is good at estimating pitch period irregularities, such as double creak spikes shown in the vowel in Figure 2. Other measures (such as shimmer and harmonics-to-noise ratio) were also considered in the preliminary stages, but were discarded from the final results due to their cross-correlation with jitter.
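As an illustration of what a jitter value expresses, the sketch below computes a relative ("local") jitter from a list of pitch-period durations. The text does not state which jitter variant was used, so the choice of local jitter, the function name `local_jitter` and the period values are all assumptions made for the example.

```python
def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period (a unitless ratio)."""
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(periods[i + 1] - periods[i])
             for i in range(len(periods) - 1)]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Illustrative period durations in seconds, not data from the study.
regular = local_jitter([0.010, 0.010, 0.010, 0.010])    # perfectly periodic
irregular = local_jitter([0.010, 0.006, 0.011, 0.005])  # creak-like alternation
```

A perfectly regular sequence gives zero, while alternating long and short periods of the kind seen in creaky phonation give a much larger ratio.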
Modal phonation is well reflected in periodic stretches of speech (certainly in combination with parallel aspiration and glottalisation estimation). Voicing Offset Ratio (VoiceOff, %) is both a measure of phonatory modality and phonological obstruent /voice/ developed in our previous fricative study (Gordeeva & Scobbie 2010: 181). VoiceOff estimates the timing of periodicity in vowel–obstruent (VC) sequences relative to the obstruent onset. The measure quantifies the timing either prior to or post the onset of obstruent stricture. If the periodicity offset occurs in the pre-consonantal vowel, it normalises the above timing as a negative percentage relative to the absolute duration of V (in ms). If the offset occurs in C, then VoiceOff is normalised as a positive percentage relative to the absolute duration of C. In this study, VoiceOff was calculated from the Lx derivative of the EGG waveforms using the cross-correlation algorithm with the minimum of 75 Hz and maximum of 350 Hz. The minima/maxima were based on vowel f0 ranges in the complete data set. The VoiceOff values are rescaled ratios (%), and are, therefore, normalised for any intrinsic vowel and stop articulation differences reflected in absolute V or C duration.
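The normalisation just described can be sketched as follows. This is one reading of the verbal definition, not the authors' script: the function name `voice_off`, its arguments and the example times are hypothetical, and the periodicity offset time is assumed to have been located beforehand.

```python
def voice_off(t_offset, v_onset, c_onset, c_end):
    """Voicing offset as a signed percentage relative to obstruent onset.

    Negative: periodicity ends inside the vowel (scaled by V duration).
    Positive: periodicity ends inside the obstruent (scaled by C duration).
    All times are in seconds.
    """
    v_dur = c_onset - v_onset
    c_dur = c_end - c_onset
    if v_dur <= 0 or c_dur <= 0:
        raise ValueError("segment durations must be positive")
    if t_offset < c_onset:
        return 100.0 * (t_offset - c_onset) / v_dur
    return 100.0 * (t_offset - c_onset) / c_dur


# Illustrative times: vowel 0.100-0.250 s, obstruent 0.250-0.350 s.
early = voice_off(0.220, 0.100, 0.250, 0.350)  # offset 30 ms before C onset
late = voice_off(0.275, 0.100, 0.250, 0.350)   # offset 25 ms into C
```

With these made-up times, voicing that stops 30 ms before the obstruent in a 150 ms vowel yields about -20%, and voicing that continues 25 ms into a 100 ms obstruent yields about +25%.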
BpZCR, or band-pass filtered zero-crossing rate (per sec), is a correlate of aspiration (breathy phonation). This measure was also developed in our earlier study (Gordeeva & Scobbie 2010: 183). A standard ZCR measure is heavily affected by the presence of low-frequency periodicity and DC-component deviations due to e.g. exhalation. Flexible band-pass filtering (lower limit defined at 1.5 × maximum f0 for each vowel and an upper limit at 5.5 kHz) removes these low spectral frequencies while keeping the high frequencies, which are also found to contribute to about 80% of human perception of breathiness (Klatt & Klatt 1990, Hillenbrand, Cleveland & Ericson 1994). We thus acquire a periodicity-independent and perceptually-motivated measure of aspiration, suitable for varying periodicity of any vowel, including those with voiceless portions. In this study, bpZCR was averaged throughout the final 20% of the vowel duration.
To analyse ejective stops, in addition to the categorical scales in Section 2.3 and the above phonatory acoustic measures, we used a number of acoustic durational and burst intensity measures found to be relevant in previous research into phonemic ejectives (Lindau 1984, Grawunder et al. 2010, Vicenik 2010). Some other measures that have previously been applied to phonological ejectives such as voice onset time (see overview in Grawunder et al. 2010) are not applicable to Scottish English ejectives occurring in syllable codas. Therefore, the scope of the acoustic measures summarised in Table 5 above fits into syllable rhymes.
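A rough sketch of the bpZCR idea is given below, assuming NumPy is available. The filter design is not specified in the text, so a simple frequency-domain (brick-wall) band-pass is used here purely as a stand-in; the function name `bp_zcr`, its parameters and the synthetic test signal are hypothetical.

```python
import numpy as np


def bp_zcr(vowel, fs, f0_max, upper_hz=5500.0, tail_fraction=0.2):
    """Zero-crossing rate (per second) of the band-passed final part of a vowel."""
    vowel = np.asarray(vowel, dtype=float)
    lower_hz = 1.5 * f0_max  # lower limit scales with the vowel's maximum f0
    # Assumption: brick-wall band-pass via the FFT (filter type not in the source).
    spectrum = np.fft.rfft(vowel)
    freqs = np.fft.rfftfreq(len(vowel), d=1.0 / fs)
    spectrum[(freqs < lower_hz) | (freqs > upper_hz)] = 0.0
    filtered = np.fft.irfft(spectrum, n=len(vowel))
    # Keep only the final portion of the vowel (20% by default).
    start = int(round(len(filtered) * (1.0 - tail_fraction)))
    tail = filtered[start:]
    # Count sign changes between neighbouring samples.
    crossings = np.count_nonzero(np.signbit(tail[1:]) != np.signbit(tail[:-1]))
    return crossings / (len(tail) / fs)


# Synthetic example: a 100 Hz "voicing" component plus a weaker 3 kHz component.
fs = 22050
t = np.arange(4410) / fs
demo = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)
rate = bp_zcr(demo, fs, f0_max=100.0)
```

In this synthetic case the 100 Hz component falls below the 150 Hz lower limit and is removed, so the rate reflects only the 3 kHz component, which is the periodicity-independent behaviour the measure is meant to have.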
2.5 Statistical analyses
The statistical analyses were carried out using IBM (SPSS) Statistics 19 software. The specific factorial design is described in individual results sections.
In Section 3.1 below, on categorical analyses, we used some descriptive statistics and non-parametric Chi-square association tests. The latter were only run for the parts of the analyses fulfilling the prescribed minimum requirements for validity.
Linear Discriminant Analysis (LDA) was used in Section 3.2 below. The ‘stepwise’ LDA was chosen, since it makes no assumptions about which predictor should have higher priority than others. The order of predictor entry was determined by Wilks’ Lambda, with an F-value of 3.84 for predictor entry and 2.71 for removal.
3 Results
3.1 Glottalisation versus ejectivisation in categorical terms
In this section, we address four questions, which should help us to relate glottalisation and ejectivisation in terms of our impressionistic categories and clarify any association of ejectives to phonatory settings and phonological /±voice/.
3.1.1 How does glottalisation in Scottish English /±voice/ stops relate to individual speakers’ phonatory settings in the preceding vowels?
Glottalisation was quantified as the percentage of ‘creaky’ labels of all phonatory settings labels (‘modal’, ‘breathy’ or ‘creaky’) in vowels before /±voice/ stops (n = 658). The distribution of these categories per speaker and stop /±voice/ is shown in Figure 5.
Figure 5 Percentages of creaky, breathy and modal labels in pre-stop vowels per speaker and stop /±voice/.
Figure 5 shows that SP3 and SP5 are the biggest glottalisers, with glottalisation rates of 74% and 94%, respectively, in /−voice/ stop contexts. SP2 shows the lowest glottalisation rate, 31%. Overall, in pre-stop context, breathy phonation is rarely present, so that the five speakers can be considered to vary phonatory settings for stop /voice/ along the modal–creaky phonatory continuum. Glottalisation rate is higher in voiceless stops individually ranging from 31% to 91%, as opposed to from 2% to 10% in voiced stops.
We ran Chi-square tests to measure the association between ‘creaky’ and ‘modal’ auditory labels (across speakers) and the factor stop /±voice/. ‘Breathy’ labels (n = 8) were excluded because the resulting zero-count cells violated the minimum requirements for test validity. The cross-tabulation of percentages and counts per category is presented in Table 6. The Chi-square test showed that there was a very highly significant association (χ2 = 227.856; df = 1; p < .001) between ‘creaky’ and ‘modal’ auditory labels and stop /voice/: i.e. the creaky phonation was associated with /−voice/ stops and the modal phonation was more frequent in /+voice/ ones.
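For readers unfamiliar with the test, a 2 × 2 association of this kind can be sketched in Python as below. The counts are placeholders, not the study's data, and the sketch uses the plain Pearson statistic without continuity correction, which is an assumption about the procedure; the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with its p-value for df = 1."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with one degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Placeholder counts (rows: creaky, modal; columns: /-voice/, /+voice/).
chi2, p = chi_square_2x2(30, 10, 10, 30)
```

With df = 1, as in the test reported above, the p-value follows directly from the complementary error function, so no statistics library is needed for the illustration.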
Table 6 Percentage and counts for each speaker of ‘creaky’ and ‘modal’ categorical labels as a function of /±voice/ in stops.
3.1.2 How does ejectivisation relate to pre-stop phonatory settings?
To address this question, we looked at the overall distributions (%) of these factors per phonatory setting (‘breathy’, ‘creaky’ and ‘modal’). The analysis was run on the voiceless subset of stops (n = 370). Stops were classified as ‘glottalic’ if they were labelled as jointly ‘weak’ or ‘strong’ glottalic stops in impressionistic categories (see Section 2.3 above). All other stops were classified as ‘pulmonic’. Unsure airstream labels (n = 50) were excluded from the analyses.
Figure 6 shows that 23.6% of ejective stops co-occur with ‘creaky’, 13.1% with ‘modal’ phonatory setting and none co-occur with ‘breathy’ labels. This confirms the trend that more ejectives co-occur with stop preglottalisation. However, in a merely epiphenomenal relationship between the two factors we would expect no glottalic stops to co-occur with modal phonation whatsoever (being full, regular and lacking creak or breath). Instead, there is an incremental increase in ejectivisation.
Figure 6 Pulmonic and glottalic stops across all /−voice/ stops as percentage per creaky, modal or breathy phonation in the preceding vowel.
3.1.3 How does ejectivisation relate to stop /±voice/?
To address the question whether ejectivisation of stops is conditioned by phonological /voice/, we looked at the distributions (%) of these factors per speaker in all stops (n = 670). Stops were classified as ‘glottalic’ if they were labelled as jointly ‘weak’ or ‘strong’ glottalic stops in impressionistic categories (see Section 2.3 above). Unsure airstream labels (n = 50) were excluded from the analyses. All other stops were classified as ‘pulmonic’.
Individual speakers’ results are visualised in Figure 7 and the numbers are reported in Table 7. The results show that the top-rate glottaliser (SP5 in Figure 5) produces no ejectives at all. The lowest-rate glottaliser (SP2) does not produce ejectives either. SP1 produces 38% pulmonic and 62% glottalic stops in the voiceless series. It is noteworthy that he also produces 26% glottalic stops in /+voice/ stop context.
Figure 7 Percentage of pulmonic and glottalic stops per speaker and /±voice/ stops.
Table 7 Percentage and counts for each speaker of pulmonic and glottalic stop burst labels as a function of /±voice/ in stops.
This result echoes the distribution of ejectives reported in our Edinburgh child data (Gordeeva & Scobbie 2011) and mentioned in the literature review, which was likewise non-contrastive with respect to /voice/. SP3 and SP4 produce glottalic stops at smaller rates than SP1. SP3 produces them irrespective of the stop /voice/ factor.
3.1.4 Do individuals with longer preaspiration in fricatives have less glottalisation or ejectivisation in stop series?
To address this question we analysed individual patterns by plotting the rates of long preaspiration against the rates of preglottalisation and ejectivisation per speaker. The result is shown in Figure 8. Pane A of the figure shows the relationship between the rates of fricative preaspiration and stop preglottalisation in individual speakers in a /−voice/ series (n = 511 for both stops and fricatives). Pane B shows the relationship between the rates of fricative preaspiration and stop ejectivisation in the same subset. The y-axis shows the percentage of all preglottalised stop tokens (n = 195). The x-axis shows the percentage of all preaspirated fricatives. Preaspiration was considered ‘long’ if it exceeded 50 ms (total n = 26, i.e. 18% of all /−voice/ fricatives).
Figure 8 Speaker-specific relationship from categorical labels between (A) stop preglottalisation rate (y-axis) and (B) stop ejectivisation and (A, B) fricative preaspiration (x-axes) of longer than 50 ms.
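The per-speaker rates plotted on the x-axes can be thought of as a simple thresholding step. The sketch below, with hypothetical speaker labels and durations (not our data), counts a fricative token as having ‘long’ preaspiration when its duration exceeds the 50 ms criterion stated above, and returns the percentage of such tokens per speaker. The function and variable names are illustrative only.

```python
from collections import defaultdict

LONG_PREASPIRATION_MS = 50.0  # criterion stated in the text

def long_preaspiration_rate(tokens):
    """Return {speaker: % of fricative tokens with preaspiration > 50 ms}.

    `tokens` is an iterable of (speaker, preaspiration_duration_ms) pairs.
    """
    totals = defaultdict(int)
    long_counts = defaultdict(int)
    for speaker, duration_ms in tokens:
        totals[speaker] += 1
        if duration_ms > LONG_PREASPIRATION_MS:
            long_counts[speaker] += 1
    return {s: 100.0 * long_counts[s] / totals[s] for s in totals}

# HYPOTHETICAL tokens, for illustration only.
tokens = [
    ("SP_A", 62.0), ("SP_A", 30.0), ("SP_A", 0.0), ("SP_A", 55.0),
    ("SP_B", 10.0), ("SP_B", 50.0), ("SP_B", 20.0), ("SP_B", 75.0),
]
rates = long_preaspiration_rate(tokens)
print(rates)
```

Note that a token of exactly 50 ms is not counted as ‘long’ here, following the wording ‘exceeded 50 ms’.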
The speakers with ejectives in the voiceless stop series (SP1: 62%, SP3: 10% and SP4: 17%) do produce a smaller rate of long preaspiration in fricatives (12%, 11% and 7% respectively), and have a preglottalisation rate between 50% and 80%. However, the top-rate glottaliser, SP5, shows the top rate of long preaspiration in /−voice/ fricatives (54%). Also recall (from Figure 7) that the same SP5 produces no ejectives at all.
It appears that preglottalisers tend to produce smaller rates of ‘long’ preaspiration and the rates are in fact similar, but SP5 shows the contrary pattern. Once again phonatory settings of individual speakers do not seem to act like ‘communicating vessels’ here: less preaspiration does not necessarily mean more preglottalisation or the other way around.
3.2 Glottalisation versus ejectivisation in acoustic terms
In this section, we address the questions which should help us to relate glottalisation to ejectivisation in terms of acoustic and EGG measures and further clarify any relations of ejectives to phonatory settings and phonological /voice/.
3.2.1 What is the acoustic footprint of Scottish English ejectives?
First of all, we look at the acoustic footprint of Scottish English ejectives to establish if Scottish ejectives are acoustically similar to the phonemic ejectives reported for other languages (Lindau 1984, Grawunder et al. 2010, Vicenik 2010), and/or whether they correlate to phonatory settings in the preceding vowels as a result of their epiphenomenal relationship with preceding glottalisation (Wells 1982, Ogden 2009). If the correlates of phonatory settings (e.g. creakiness expressed as jitter) result in high correlation size, the hypothesis should be confirmed that ejectiveness is a consequence of glottalisation in the preceding vowel.
The non-parametric analyses in Section 3.1.3 above revealed that ejective stops occur in three out of five speakers. Thus, for the acoustic analysis of ejectives we selected a subset of word-final voiceless stops for the three speakers with ejectives (SP1, SP3 and SP4) (n = 174 tokens with sentence accent to avoid its confounding effect on the durational measures). We ran stepwise LDA with the predicted variable ‘ejective’ (‘Yes’ or ‘No’, based on both ‘strong’ and ‘weak’ ejective labels). All the acoustic measures from Table 5 above were subjected to the analysis as dependent variables. Next to the durational and intensity correlates of phonemic ejectives discussed already, we included the correlates of phonatory settings (bpZCR, jitter and VoiceOff).
The dependent variables entered in the LDA are shown in Table 8 along with the resulting correlation size to standardised canonical discriminant functions. The variables are listed in the order of importance. The variables in parentheses were automatically excluded by LDA from the classification because they did not successfully contribute to the classification.
Table 8 List of variables used for LDA prediction of the factor ‘ejective’ and corresponding correlation size to standardised canonical functions of each entered variable. Variables in parentheses were automatically excluded by LDA as predictors.

The LDA result showed that 78.2% of the original cases (above the 50% chance level for a binary variable) were correctly classified as ejective or pulmonic stops based on burst duration, burst intensity and VoiceOff. The percentages and number of cases are summed up in Table 9. Jitter was not selected by LDA as a contributing factor for the predicted variable ‘ejective’. We discuss the results further in the discussion.
Table 9 Number of tokens and % of voiceless stops perceived as ejectives versus factor ‘ejective’ predicted by LDA.
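To make the classification step concrete, here is a minimal sketch of a two-class linear discriminant with equal priors, restricted to two predictors (burst duration and burst intensity). It is a plain Fisher discriminant, not the stepwise procedure used for Table 8, and the token values are hypothetical; all function names are ours.

```python
def mean_vector(rows):
    """Column means of a list of (x0, x1) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(group_a, group_b):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group in (group_a, group_b):
        m = mean_vector(group)
        for r in group:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fit_lda(pulmonic, ejective):
    """Return (weights, threshold) of a two-class discriminant, equal priors."""
    m0, m1 = mean_vector(pulmonic), mean_vector(ejective)
    s = pooled_covariance(pulmonic, ejective)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def predict(w, threshold, row):
    """True if the token is classified as 'ejective'."""
    return w[0] * row[0] + w[1] * row[1] > threshold

# HYPOTHETICAL tokens: (burst duration in ms, burst intensity in dB).
pulmonic = [(40, 55), (45, 58), (38, 54), (50, 57), (30, 60)]
ejective = [(20, 62), (25, 60), (18, 65), (35, 56), (22, 63)]

w, threshold = fit_lda(pulmonic, ejective)
correct = sum(not predict(w, threshold, r) for r in pulmonic)
correct += sum(predict(w, threshold, r) for r in ejective)
accuracy = correct / (len(pulmonic) + len(ejective))
print(f"correctly classified: {100 * accuracy:.1f}%")
```

The proportion of correctly classified original cases computed at the end corresponds in kind to the percentage reported above, although the figure obtained from these invented tokens has no bearing on our results.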

3.2.2 How does ejectivisation relate to stop /±voice/?
To determine whether factors ‘voice’ and ‘ejective’ are interrelated (i.e. can be predicted with the same list of dependent variables as in Table 8 for ejectives), we ran LDA with the same dependent acoustic variables as in Section 3.2.1 above to predict ‘voice’. The analysis was based on all /±voice/ stops for the same speakers and tokens as in Section 3.2.1 (n = 313). The list of the dependent variables is shown in Table 10 next to the correlation size to standardised canonical discriminant functions. The variables in the table are listed in the order of importance. Variables marked with parentheses were automatically excluded by LDA from the classification.
Table 10 List of variables used for LDA prediction of factor ‘voice’ and corresponding correlation size to standardised canonical functions of each entered variable. Variables in parentheses were automatically excluded by LDA as predictors.

The result showed that 93.9% of the original cases were correctly classified for ‘voice’ based on the dependent variables: VoiceOff, jitter and consonant duration. The percentages and number of cases are summed up in Table 11.
Table 11 Number of tokens and % of /±voice/ stops and predicted ‘voice’ membership by LDA.

The result showed that factors ‘ejective’ in Section 3.2.1 and ‘voice’ in this section rely on a different set of acoustic variables.
Figure 9 presents the means for the four most important acoustic variables from the two LDA analyses run for ‘ejective’ and ‘voice’ in this and the previous section.
Figure 9 Means (+ 1 SD) for three speakers with glottalic stops for four acoustic variables. Left column (blocked pattern) contrasts phonetically pulmonic and glottalic stops (black and grey, respectively). Right column (striped pattern) contrasts /+voice/ and /−voice/ stops (grey and black, respectively).
The left panes represent pulmonic and glottalic series. The top left pane shows that the burst duration is longer in the pulmonic stop series than in the glottalic one. The mean differences for other acoustic variables are much smaller, but nonetheless are quite consistent for the three speakers with ejectives. The burst intensity is higher in glottalic stops. Interestingly, there is somewhat more jitter in pulmonic stops compared to the glottalic stop series. The difference indicates that there is in fact more glottalisation in the pulmonic stop series, contrary to the epiphenomenal assumption in the literature (Wells 1982, Ogden 2009). The differences in VoiceOff are found in the lowest left pane of the figure. As a reminder (see Section 2.4 above for details), negative values indicate the timing (%) of the periodicity offset prior to the stop closure (0 in Figure 9), while positive values indicate the timing after it. Figure 9 shows that the glottalic series has a consistently earlier periodicity offset compared to the pulmonic stop series. The differences in jitter and VoiceOff (the glottal activity patterns measured from the EGG signal) are consistent with somewhat earlier glottal closure, and less glottalisation in the glottalic series.
For the ‘voice’ stop series in the right panes of the figure, the differences are larger. The longer duration of the voiceless stops compared to voiced is consistent with longer post-release aspiration. Burst intensity is higher in the /+voice/ series due to the presence of periodicity. There is consistently more jitter (preglottalisation) in the /−voice/ series. Finally, there are large differences in the timing of periodicity (VoiceOff): the /+voice/ series has the offset far after the stop closure onset, and the /−voice/ series has the offset before the stop closure onset.
3.2.3 Do individuals with longer preaspiration in fricatives have less preglottalisation in stops?
Linear Discriminant Analysis was used to evaluate, per speaker, the relative strength of the three acoustic variables: voicing (VoiceOff), aspiration (bpZCR) and glottalisation (jitter) in predicting /±voice/ for coda stops and fricatives. For this test we only included the acoustic variables relating to phonatory settings to show how much impact quality of phonation in the preceding vowel can have on the categorisation of phonological /voice/ in Scottish English if we disregard traditional durational measures. We included all stops and fricatives listed in Table 3 above. The results of the LDA classification are presented in Table 12 below.
Table 12 Percentage of correct classification by LDA (three variables) in predicting /±voice/ in coda stops and fricatives per speaker.

The results are above 92% of correct discrimination and show that the subset of acoustic variables is highly representative for the encoding of /±voice/ in our data. Pooled within-groups correlations of /±voice/ in stops are presented in Figure 10 for each variable. The main correlate of /±voice/ in stops is timing of voicing (VoiceOff) for all speakers, except for SP5, who (it will be recalled) is also the biggest perceptually-labelled glottaliser: SP5's main correlate is glottalisation (jitter) with the correlation size of .75. Quite consistently, glottalisation is the secondary correlate for the other speakers.
Figure 10 Pooled within-group correlation size for the three acoustic variables (voicing, aspiration and glottalisation) used in LDA as predictors of /±voice/ across stops per speaker.
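For readers unfamiliar with the measure, the sketch below shows one common way of computing a pooled within-groups correlation between a predictor and the discriminant scores: both are centred on their group means before an ordinary correlation is taken, so that only within-group covariation contributes. We assume here that the reported values are structure coefficients of this usual kind; the function name and the data are hypothetical.

```python
import math

def pooled_within_corr(values, scores, groups):
    """Correlate a predictor with discriminant scores after removing
    each group's mean from both (pooled within-groups correlation)."""
    def centre(xs):
        sums, counts = {}, {}
        for x, g in zip(xs, groups):
            sums[g] = sums.get(g, 0.0) + x
            counts[g] = counts.get(g, 0) + 1
        return [x - sums[g] / counts[g] for x, g in zip(xs, groups)]

    xc, sc = centre(values), centre(scores)
    sxy = sum(a * b for a, b in zip(xc, sc))
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in sc)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL jitter values and discriminant scores for six tokens.
groups = ["-voice", "-voice", "-voice", "+voice", "+voice", "+voice"]
jitter = [3.0, 4.0, 5.0, 1.0, 1.5, 2.0]
scores = [0.9, 1.4, 1.6, -1.5, -1.2, -1.1]
r = pooled_within_corr(jitter, scores, groups)
print(f"pooled within-groups correlation = {r:.2f}")
```

Because the group means are removed first, a large difference between the /+voice/ and /−voice/ means of a variable does not by itself inflate this correlation.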
Pooled within-groups correlations of /±voice/ in fricatives are presented in Figure 11. The main correlate of /±voice/ in fricatives is timing of voicing (VoiceOff) for all speakers, except for SP5 and SP3: their main correlate for fricative voice is aspiration (bpZCR). Aspiration is the consistent secondary correlate for the other three speakers.
Figure 11 Pooled within-group correlation size for the three acoustic variables (voicing, aspiration and glottalisation) used in LDA as predictors of /±voice/ across fricatives per speaker.
Importantly for the validity of this study, the acoustic comparison of aspiration and glottalisation (bpZCR and jitter) is derived independently of our categorical labels. The acoustic analysis is in agreement with our impressionistic results in Sections 3.1.1 and 3.1.4 above in that some individuals (such as SP5 here) can use voice quality as a primary correlate to encode the Scottish English /voice/ obstruent contrast. Indeed, we could state that modal voice for the /+voice/ pole contrasts with creak for /−voice/ stops and breathiness for /−voice/ fricatives. All of our speakers use aspiration and glottalisation at the very least as a secondary correlate; this despite their apparently ‘contrasting’ laryngeal configuration for glottal abduction and adduction. Thus, it seems reasonable to conclude that preglottalisation and preaspiration serve as local phonetic attributes of phonological /−voice/ in Scottish English.
4 Conclusions and discussion
Several findings in this study support the following conclusions.
(i) Preglottalisation is an important part of some individuals’ phonologisation of the stop /±voice/ contrast in Scottish English.
(ii) The production of glottalic stops seems not to be bound to the factor preglottalisation.
(iii) Glottalic stop allophones in Scottish English seem to have a similar acoustic footprint as phonemic ejectives known in the literature.
(iv) The production of glottalic stops is not related to phonological stop /±voice/.
The conclusions are borne out by the combination of (and broad agreement in) impressionistic, EGG and acoustic analyses of phonatory settings, obstruent /voice/ and ejectiveness.
With regard to (i), we show that the different extent of phonetic voicing (VoiceOff) is the most important correlate of /±voice/ across all speakers, confirming established views (see e.g. Docherty 1992, Haggard 1978, Smith 1997). While the presence of preaspiration is at least a secondary correlate of /−voice/ in fricatives for all five speakers (as also shown in Gordeeva & Scobbie 2010), glottalisation is a consistent secondary correlate of stop /−voice/ for all the speakers (apart from SP5, who even uses it as primary). As appears from the categorical analysis in Section 3.1.1 and Figure 5 above, SP3 and SP5 are predominantly creaky speakers in voiceless stop contexts, yet both employ aspiration as a primary correlate of the word-final /voice/ contrast in fricatives (refer to Figure 11). Moreover, LDA analysis in Section 3.2.3 above shows that speakers such as SP5 vary glottalisation primarily as a function of /±voice/. The differences in glottalisation rates between the categorical (Section 3.1.1) and Linear Discriminant Analyses (Section 3.2.3) for SP3, for example, might be explained by the fact that LDA may detect VoiceOff as a sufficiently discriminating variable with a very high probability even in the presence of glottalisation. In this respect, human perception and machine learning may differ, and considering both may contribute to a more complete picture. Nevertheless, both the categorical and the acoustic analyses show that the tendency for fricative preaspiration does not preclude the tendency for stop preglottalisation, but rather seems to form a speaker-dependent strategy to employ these local phonatory characteristics as correlates of voiceless coda obstruents, next to periodicity and its timing.
Whether stop glottalisation and previously reported phonatory creakiness in middle class Scottish English speakers (Esling 1978) form an overlapping continuum contributing to perception of glottalisation/creakiness is a question beyond the scope of this study.
In our speakers, the presence and timing of voicing vs. phonation type (preglottalisation and preaspiration) compete to be the main correlate of phonological /±voice/ in word-final obstruents. An alternative view is that they are both equally important. Either way of looking at this result ought to be relevant for typological comparisons of English to other languages, as well as for research in other areas where the phonetic exponent of phonological systems, or exposure to phonetic forms, matters for making predictions about how speakers of English might sound. A binary distinctive feature like /voice/ is physically represented by a set of co-varying complex phonetic cues whereby, as previously suggested, ‘absence of one or several such cues may be compensated by the presence of others, or by recovery processes that rely on listeners’ knowledge and expectations’ (Hawkins 2010: 60). It would be misleading to propose a single invariant feature to characterise Scottish English /voice/.
With regard to (ii), the claim concerning the relationship between glottalisation and ejectivisation, the analyses show that ejective stops are not likely to be a natural consequence of increased glottalisation (Wells 1982, Ogden 2009), but are separate phonetic variants serving some functions (phonetic, sociophonetic or paralinguistic) other than phonological /voice/. This lack of epiphenomenal connection to glottalisation supports the argument brought forward by Simpson (in press) that the above ‘natural’ bond for English does not necessarily hold unless the mechanism of pressure build-up is present. The alternative account, drawing on the epiphenomenal nature of ejectives in German (Simpson 2007), does not necessarily hold in this study since ejectivisation occurs across both utterance-medial and final positions, and thus is not necessarily influenced by the glottalisation in the following vowels (even though this option should not be overlooked in English in the contexts with post-stop glottalised vowels).
In fact, the analyses reveal that jitter is somewhat lower in glottalic voiceless stops compared to the pulmonic counterparts (Figure 9 above). This pattern is valid for the three ‘glottalic’ speakers considered in Section 3.2.2, and is consistent with somewhat less glottalisation. The latter and earlier cessation in VoiceOff shown there could arguably be explained by a somewhat earlier full adduction of glottis in the glottalic stop series resulting in a total lack of creak and periodicity in these parts of the signal. Nevertheless, jitter is not borne out as a contributing factor to ‘ejectiveness’ in LDA classification (Section 3.2.1). Besides, SP5 is the biggest pre-stop glottaliser (see Figures 5 and 10), yet he produces no ejective stops at all (see Figure 7). Similarly in categorical analyses, SP1 produces most ejectives of the five speakers and shows the fourth lowest rate of preglottalisation out of five (Figure 5). Finally, the categorical analyses in Section 3.1.2 show that ejective stops can also co-occur with impressionistically perceived modal phonation (see Figure 6).
With regard to (iii) and (iv), concerning the acoustic footprint of Scottish English ejectives and the relationship between ejectivisation and /−voice/, we have shown that ejectivisation is mapped to the acoustic features in the following order of priority: (a) shorter stop burst duration, (b) higher burst intensity, and (c) somewhat earlier cessation of voicing in the preceding vowel (lower VoiceOff value). Ejectiveness has a high (78%) rate of correct classification in LDA tests (Section 3.2.1) based on these measures. On the one hand, this analysis puts the Scottish English non-normative ejectives in a similar phonetic order of durational and intensity correlates attested for ejectives classified as phonologically contrastive in other non-European languages (see also e.g. Grawunder et al. 2010, Lindau 1984, Vicenik 2010). On the other hand, the phonetic footprint is sufficiently different from the primary correlates of /−voice/ stops in the order of importance, i.e. VoiceOff and jitter (as shown in Section 3.2.2), so it is quite plausible that the two can co-occur but be perceptually relevant for different aspects of sound structure. They are manifested at the same word-final locus, but are conditioned by different factors (still to be understood).
The presence of ejective stops in Scottish English is a noteworthy phonetic finding, since ejectives are only sporadically mentioned in relation to English (Ladefoged & Maddieson 1996, Chirrey 1999, Ogden 2009, McCarthy & Stuart-Smith 2013, Simpson, in press). Even considering the small number of subjects here and in our previous child language study (Gordeeva & Scobbie 2011), the sociophonetic factor of speaker age does not seem to be important in the adult or child groups considered (3;5–50 years), in the sense that ejectives are already present in both groups: some youngsters and adults produce them, while others do not produce them at all. (The latter should not mean that no age-related frequency differences can be expected in larger subject samples, if the production of ejectives is spreading in such varieties as Scottish English here.) It seems that ejectives form a rather idiosyncratic, independent phonetic variable, and may, we tentatively suggest, be more prevalent among middle class speakers. Ejectives should therefore be studied as a potential sociolinguistic variable in future research, to see if they attain greater phonological status, or convey social meanings, and become salient and non-idiolectal.
Although the range of functions of ejectives in the Scottish English sound system (and in other varieties of English) requires further study, a stylistically emphatic articulation and an articulation/breathing control trade-off seem a plausible further hypothesis as the explanation for the occurrence of ejective stops in our data, both child (Gordeeva & Scobbie 2011) and adult, since high intensity glottalic bursts are produced with lesser air volume and yet (impressionistically) radiate clarity (also suggested in Ogden 2009). The speakers in this study were all subject to the same stylistic condition, namely a standard laboratory speech task requiring natural but clear articulation, while the children in the previous study interacted with an adult in a semi-spontaneous play situation (Gordeeva & Scobbie 2011).
In general, British English varieties seem to explore variable phonatory correlates other than voicing, which go well beyond the aspects of voice disorder or even individual phonatory habits (Gordon & Ladefoged 2001). This happens at different linguistic levels in wide-spread geographical areas in the British Isles, such as preaspiration, preglottalisation, voice onset time realised with breathiness and use of socio-linguistic markers such as glottal stops in young speakers. There are many languages where this local phonatory specification of segments simply does not happen to the same extent (e.g. Gordeeva 2007). Ejectives in modern English are, moreover, so conspicuous that anecdotal observation and awareness of their use seem to have far outpaced academic study, but the research literature is now exploring their socio-linguistic meaning (McCarthy & Stuart-Smith 2013). Perhaps therefore it is not so surprising that we can report the systematic (non-epiphenomenal) ways in which another laryngeal toy, the glottalic stop, is being played with by speakers of English.
Acknowledgements
This paper is a revised and extended version of our previous working paper (Gordeeva & Scobbie 2011). We would like to thank the ESRC (UK) (grants PTA-026-27-0368 and R000-22-2032) and Queen Margaret University (Edinburgh, Scotland) for funding this research. The authors are very grateful to the anonymous reviewers for their thorough comments and suggestions.