1 Introduction: LVT phonology
Sounds of the lower vocal tract (LVT), which involve articulatory activity of the larynx and pharynx, pose a challenge to traditional phonological analysis (see e.g. Clements & Hume 1995, Rice 2011). Unlike sounds made in the upper vocal tract (UVT), many LVT sounds intertwine phonation and articulation, and their production often requires broad changes to vocal-tract shape, not just localized articulatory movements. These aspects are not well represented by current theoretical models of phonology.
Most contemporary views of the phonological status of LVT sounds reflect, on the one hand, the arguments of Halle & Stevens (1971) concerning the features relevant for describing laryngeal state, and on the other, those of Hayward & Hayward (1989) and McCarthy (1994) that laryngeal, pharyngeal, and uvular sounds constitute a natural class of sounds, traditionally called gutturals or post-velars. Halle & Stevens’ (1971) features focus on glottal state and fail to connect it to the state of the rest of the vocal tract. The McCarthy (1994: 198–202) model posits that, since the somatosensory endowment of the pharynx looks sparse in comparison to that of the oral region, one cannot define an articulator comparable to [labial], [coronal], or [dorsal] in order to define the guttural class. Instead, McCarthy argues that the guttural sounds share an entire orosensory region of articulation, designated [pharyngeal], and this property of their representation underlies their unity as a natural class. In this paper, we argue that this view does not capture the richness of interaction between the components of the LVT system. Likewise, the proposal for a tongue root articulator (e.g. Bessell 1992, S. Rose 1996), which remains popular in more recent analyses of LVT phonology (e.g. Bin-Muqbil 2006), neither adequately captures the phonological possibilities of this region nor adequately characterizes the phonetic character of pharyngeal and laryngeal sounds.
The difficulty these proposals face is in accounting for the (phonological) coupling between the laryngeal and supralaryngeal systems. Several studies have observed that there must be a connection between the guttural sounds and other types of sounds which involve an association between the pharynx and larynx (Colarusso 1985, Czaykowska-Higgins 1987, Trigo 1991, Halle 1995). These include ‘tense’, ‘head’, or ‘harsh’ vocal registers in Tibeto-Burman and Mon-Khmer languages, cross-height or so-called ‘ATR’ harmony in languages of the Niger-Congo and Nilo-Saharan families, pharyngealized vowels found in Caucasian and some Tungusic languages, and ‘sphincteric’ vowels found in Khoisan-group languages. A central challenge these phenomena pose is how to relate phonatory quality to vowel quality: there is thus a need for a model that possesses a notion of voice quality, a notion which combines these (and other) properties.
Speech sound systems have a dual nature as both phonetic and phonological, and neither of these operates in isolation from the other. The approach we take embraces the notion of phonetically ‘grounded’ phonology (Archangeli & Pulleyblank 1994) but places emphasis on the physical mechanisms of speech as the foundation for discreteness (Stevens 1989). Drawing inspiration from these approaches, we propose the notion of phonological potentials and a theoretical framework based on these called the Phonological Potentials Model (PPM), which we apply to the consideration of LVT phonology. As we argue below, the idea of phonological potentials allows for relations between these phonetic properties to be established in a way that has relevance for phonological patterns of the LVT.
The paper is organized as follows: Section 1.1 introduces the basic elements of the PPM, and Section 1.2 returns to the question of how the model is relevant for LVT phonology. Sections 2–4 exemplify the model based on LVT phenomena, focusing on epilaryngeal vibration and its interaction with phonatory, tonal, and vowel quality, the link between laryngeal and lingual articulation, and the link between pharyngeals and palatals. Section 5 returns to the conception of the PPM and discusses how it fits into current understanding of phonology.
1.1 Phonological potentials
Phonological potentials are the physical properties of speech that form, bias, or otherwise influence the discrete structure and patterning of speech sound systems. Such a relation between the gradient and discrete aspects of speech sound systems is attributed to the quantal nature of speech (Stevens 1972, 1989; Stevens & Keyser 2010). Quantal properties arise from nonlinear mappings between physical domains. The basic distinction between linear and nonlinear functions is represented in Figure 1. Stevens (1972, 1989) originally focused on nonlinear relations between articulation and acoustics, but speculation about the generalizability of this idea to other domains of speech, such as the relation between biomechanics and articulation, was put forth by Fujimura & Kakita (1979). Their work focused on biomechanical saturation effects, exemplified by the stiffening of the tongue and its bracing against the teeth and palate in the vowel /i/, which help to produce stable cross-sectional areas robust to variation in muscle contraction. Such effects are thought not to be limited to the tongue but rather to pervade the functioning of the entire vocal tract (see also Fujimura 1989; Schwartz et al. 1997: 257; Perkell 2012).
Thus, the nonlinear relation between articulation and acoustics that Stevens posited is not a restricted phenomenon, but rather a general one. In nature, nonlinearities of physical systems underlie categorical state transitions because small linear changes in state can create sudden changes in system behaviour (an example is neuron firing, which only occurs once the depolarization of the plasma membrane passes a certain threshold).
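The contrast between linear and nonlinear mappings can be illustrated with a minimal numerical sketch. It is not drawn from Stevens' work: the logistic function, its threshold and steepness values, and all function names are assumptions chosen purely for illustration. The sketch compares how much an output varies when the same small input perturbation is applied in a plateau region and in a transition region of a nonlinear mapping.

```python
import math


def linear_map(x):
    """Linear mapping: output changes in proportion to input everywhere."""
    return x


def quantal_map(x, threshold=0.5, steepness=20.0):
    """Hypothetical nonlinear (logistic) mapping with two plateaus
    separated by a steep transition around `threshold`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))


def output_spread(mapping, centre, jitter=0.05):
    """Size of the output change when the input varies by +/- jitter
    around `centre` (a crude stand-in for articulatory variability)."""
    return abs(mapping(centre + jitter) - mapping(centre - jitter))


# Same input variability, applied in three different situations.
plateau_spread = output_spread(quantal_map, centre=0.9)     # stable region
transition_spread = output_spread(quantal_map, centre=0.5)  # near threshold
linear_spread = output_spread(linear_map, centre=0.9)       # linear baseline

print(f"plateau:    {plateau_spread:.4f}")
print(f"transition: {transition_spread:.4f}")
print(f"linear:     {linear_spread:.4f}")
```

Under these assumed parameters, the plateau absorbs most of the input variation while the transition region amplifies it, which is the sense in which a nonlinear mapping can yield stable, category-like regions separated by abrupt changes.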
Here we conceive of phonological potentials as analogous to potentials in physics, such as electric (field) potential, gravitational potential, potential energy, and so forth. All these concepts describe continuous physical states, which are often nonlinear in character and which can potentially influence the behaviour of a system,Footnote 1 but which do not strictly determine this behaviour. Predicting actual behaviour requires consideration of initial conditions, boundary conditions, and other applied forces. Knowing the potential of a system, however, does go a long way towards accounting for system behaviour and, ideally, predicting how it might evolve.Footnote 2 Similarly, the aim of the PPM is to employ phonological potentials in accounting for the substance and behaviour of speech sound systems. This connection to the physical concept of potential could productively lead to a mathematical formalism (similar, but, as discussed immediately below, different in aim, to that outlined in Gafos & Benus 2006) to characterize the probabilistic nature of phonological potentials. However, this paper does not pursue this approach, since the priority is to first sketch out the relevant factors that characterize phonological potentials in any given physical domain of speech.
The basic idea behind phonological potentials is not new: they are physiological biases acting on speech sound system organization. They have been given various names before, such as ‘functional pressures’ (Napoli, Sanders & 2014: 424; see also Boersma 1998), ‘attractors’ (borrowing from dynamic systems theory; see Gafos & Benus 2006, see also Wedel 2011: 133), or ‘selective pressures’ (borrowing from evolutionary biology; e.g. Winter 2014). Because phonological potentials are biases and not requirements, they parallel the notion of violability in OT grammar (Prince & Smolensky 2004). Likewise, Gafos & Benus (2006) interpret cognition as a system characterizable with the mathematics of nonlinear dynamical systems, such that ‘the essential constructs of phonological cognition are dynamical in nature’ (p. 3). They introduce the notion of ‘potential’ to frame cognition as a nonlinear dynamical system. However, phonological potentials in the present work do not require us to posit any sort of cognitive construct (e.g. constraints) for them to be useful in analysis: such cognitive constructs would arguably repeat the work that is done by the body operating in time and space (see Ohala 2011). Also, unlike Gafos & Benus (2006), we do not attempt to go so far as to implement the idea of phonological potentials in an explicit mathematical formalism. (Such an endeavour might be possible, but in this work we are more interested in characterizing the physiological elements and their interactions that shape phonological potential.)
The notion of phonological potentials, then, together with its origin as an analogy borrowed from physics, is intended to reframe the phonetics–phonology relationship such that systematicities of speech sound systems can be viewed in a way that simultaneously admits variation and constraint. Their nonlinearity provides a basis for the continuous-to-discrete mapping underlying the phonetics–phonology interface that leads to symbolic phonological cognitive processing, and this parallels Gafos & Benus (2006) with the exception that this mapping exists outside of cognition, being grounded in the physiology of the body: an embodied phonology. The primary challenge in understanding speech sound systems is to characterize what the phonological potentials are and how they become realized within and across languages. This is by no means trivial given the extreme complexity of speech when considered broadly from a multi-dimensional, multi-physical-domain perspective. This systemic complexity is part of what results in so much cross-linguistic diversity in speech sound systems. The first step we take in characterizing phonological potentials is thus narrowed in scope to the biomechanical-articulatory domain (though not exclusively, since we will sometimes consider other domains, such as the articulatory-aerodynamic domain). The focus is squarely on LVT sounds. (Note that this narrowing of scope to physical domains primarily associated with speech production is not intended to downplay the importance of other domains such as those associated with speech acoustics and perception, and we discuss these later on in the paper.)
1.2 Phonological potentials of the biomechanical-articulatory domain
The concept of phonological potentials is very broad, as potentials span many physical domains of speech. In the biomechanical-articulatory domain, we discuss two types of phonological potentials: neuromuscular modules (NMMs), and their synergistic relations, which help to characterize the types of interactions that might occur from the physiological states that the NMMs produce. Overall, phonological potentials continuously act on speech sound systems to shape how these might pattern synchronically or change diachronically. While the primary conceptual orientation in this paper is on biomechanical-articulatory considerations, a complete model would take account of other physical dimensions such as aerodynamics, perception, neurophysiology, and so forth.
The notion of neuromuscular modules is derived from Gick & Stavness’s (2013) proposal that speech production is characterized by a set of primitives employed in the process of speech motor control (see also Gick 2016). Such primitives are characterized as functionally-defined neuromotor pathways underlying spatio-temporal patterns of muscle activation formed around physical structures of the vocal tract, such as the lips. Gick (2016: 179) further characterizes them as modules that comprise a pre-existing (and thus acquirable and not necessarily genetically specified) grouping of muscles bound in a fixed activation ratio, meaning that activation of the module causes a proportional increase in the contraction of those muscles forming the grouping (with the constituent muscles and the relative activation levels remaining fixed). The concept of neuromuscular modules echoes earlier notions proposed to solve the degrees of freedom problem in motor control. These include Bernstein’s (1967) ‘muscle synergies’ (NB: this term should not be confused with the use of synergy in the current paper, further elaborated below) and the subsequent ‘coordinative structures’ put forth by Easton (1972) and ultimately applied to theorizing about speech production (Fowler et al. 1980). A concise summary of this terminology in relation to the idea of neuromuscular modules, along with a survey of the literature relating these to general motor control and also to speech, can be found in Gick (2016).
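The idea of a grouping of muscles bound in a fixed activation ratio can be sketched as a small data structure. This is an illustration only, not an implementation from Gick & Stavness or Gick: the class name, the placeholder muscle names (`muscle_a`, `muscle_b`), and the weights are all hypothetical. The point is simply that a single activation level scales every muscle in the grouping proportionally, so the ratio between muscles does not change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeuromuscularModule:
    """Hypothetical module: a fixed set of muscles with fixed relative weights."""
    label: str
    ratios: dict  # muscle name -> relative activation weight (assumed values)

    def activate(self, level):
        """Scale every muscle by the same module-level activation (0 to 1)."""
        if not 0.0 <= level <= 1.0:
            raise ValueError("level must lie in [0, 1]")
        return {muscle: weight * level for muscle, weight in self.ratios.items()}


# Placeholder muscles and weights, for illustration only.
module_p = NeuromuscularModule("p", {"muscle_a": 1.0, "muscle_b": 0.5})

weak = module_p.activate(0.4)
strong = module_p.activate(0.8)

# The ratio between the two muscles is the same at both activation levels.
print(weak["muscle_a"] / weak["muscle_b"])
print(strong["muscle_a"] / strong["muscle_b"])
```

A single control parameter thus stands in for many muscle-level parameters, which is one way of picturing how such modules reduce the degrees of freedom in motor control.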
No attempt has yet been made to develop a formal model around the notion of neuromuscular modules. To emphasize their unique status, we symbolize NMMs with white square brackets, 〚 〛. Very loosely, we could imagine that each segment-sized unit in speech has its own specific NMM responsible for its production. This is an oversimplification. We might expect highly frequent sequences of phonemes are also implemented as NMMs. However, for the scope of this paper, we only consider segment-sized NMMs. This should at least help to clarify that NMMs are not themselves abstract constructs like phonemes, but rather concrete aspects of the motor system which enable us to produce speech sounds, and possibly entire sequences of such sounds, with reliability and efficiency.Footnote 3
An important property of NMMs is that their usefulness in producing sounds depends on how much they exploit quantal properties of the biomechanical-articulatory domain (Fujimura & Kakita 1979, Fujimura 1989, Gick et al. 2011, Nazari et al. 2011, Moisik & Gick 2017). As with other physical domains, the nonlinear nature of the biomechanical-articulatory domain means that certain regions of parameter space (e.g. muscle activation) are more physically stable than others. A key assumption here is that NMMs will be more likely to form around stable regions in this space because doing so provides advantages such as feed-forward operation, reduced dependency on cortical feedback, and robustness to various sources of noise and error (Gick & Stavness 2013). The more robust an NMM is in this way, the greater its phonological potential is, and the most robust NMMs (such as stop NMMs) will be nearly universal. Thus, NMMs are phonological potentials that become realized in the neuromuscular control systems of speakers of a given language in association with suitably sized units of speech production (on the spatio-temporal scale of, but not limited to, the phoneme) formed around the language’s speech sound system.
There is another source of phonological potentials associated with NMMs. Each NMM is associated with a set of physiological states. For example, 〚p〛 is an NMM underlying the phoneme /p/, which is associated with physiological states such as lip closure, a raised mandible, closure of the velopharyngeal port, and abduction of the vocal folds. All of these physiological states have consequences for how this NMM would be expected to interact with other NMMs (underlying other speech sounds), and this is where the additional phonological potential is found. While NMMs themselves can be said to have a certain phonological potential related to how likely it is that they will become the biomechanical-articulatory basis for a phoneme, the physiological states constituting them also have phonological potential related to how NMMs interact with each other.
We assume two basic types of interactions between physiological states here: physiological states which work well together are synergistic (for example, closing the lips and the jaw); those that do not are anti-synergistic (for example, opening the jaw while maintaining closed lips). We use curly braces to denote physiological states, {}. (A state for bilabial closure of the lips might be rendered as {blc}, where blc stands for bilabial closure. Further details are given in Section 1.3.) These relations amongst physiological states, referred to collectively here as synergistic relations, are meant to parallel or even equate to the notions of articulatory cooperation and conflict (e.g. Gick & Wilson 2006). However, we avoid these terms (i.e. articulatory cooperation and conflict), since the relations we are describing are not framed around (traditional) articulators per se (e.g. between the lips and the jaw) but rather physiological states of the vocal tract achieved through articulation (e.g. between a closed lip state and a closed jaw state). Also, the reader must take care to note that the terms ‘synergy’, ‘anti-synergy’, and ‘synergistic relations’ as they appear in this paper bear no direct or indirect relationship to the term ‘muscle synergies’, which more closely resembles the notion of NMMs.
Synergistic relations are interpreted as a type of phonological potential that may influence the behaviour of associated NMMs and the speech sound patterns in which these participate. That is to say, synergistic relations encode a complex set of assumptions (grounded in speech physiology research) that specify the tendencies and susceptibilities of how different physiological states interact when they are engaged by different NMMs. They are potential because they are expected behaviours, but the realization of such behaviour in any given speech sound system will vary in relation to numerous other factors (which gives rise to a significant amount of variation and diversity of such systems), such as the composition of the phonological inventory or prosodic influences. Without a notion of synergistic relations, there is no obvious way to express how the NMMs are likely to interact: i.e. to specify their potential for forming phonological patterns. Because synergistic relations are relations between the articulatory actions of body structures (as implemented by NMMs), their strength and influence – their phonological potential – vary in a continuous, gradient (but also possibly nonlinear) way that depends on the speech sounds involved.
Figure 2 presents a summary of these details. This figure shows speech sound systems as the confluence of two facets of phonological potential: latent and realized (demarcated by the dashed line). Realized phonological potential (right side of the dashed line) occurs as cognitive abstractions, featuring various levels of complexity (such as segmental and syllable structure) and existing within the minds of speakers belonging to a speech community (a social system). Latent phonological potential (left side of the dashed line) is found in the physical domains of speech (grey boxes with black outline, Figure 2), which are not mutually exclusive and show overlap (and alignment patterns, as discussed in Section 5). The domain of interest in the present paper is the biomechanical-articulatory domain, and it has been expanded to highlight the two key aspects of phonological potential in this domain: neuromuscular modules and the corresponding synergistic relations. Even though the goal has been to focus on the biomechanical-articulatory domain, almost unavoidably aspects of other domains enter the discussion (and in reality, these domains interact simultaneously).
The phonological potential of these domains is characterized by various forms of nonlinear behaviour in the different physical domains and interaction amongst the domains. In the biomechanical-articulatory domain, these take the form of quantal biomechanical effects, such as that demonstrated for the lips by Gick et al. (2011, see also Nazari et al. 2011). Although we do not explore this area in detail, we could posit various synergistic relations for the various physiological states involved in the case of the lips. Two examples might be bilabial closure (or {blc}) and jaw closure (or {jc}), and these would characterize neuromuscular modules, such as 〚p〛 and 〚m〛, associated with the corresponding bilabial sounds, [p] and [m]. Note that by invoking the idea of neuromuscular modules, we are reminded of the differentiation of the biomechanical-articulatory action associated with 〚p〛 and 〚m〛. Despite surface similarities, these must use slightly different combinations of muscles and/or muscle activation ratios or (at least) muscle activation: e.g. experimental evidence indicates that the bilabial closure in [p] is stronger than that in [m] (Gick et al. 2012). The states {blc} and {jc} are synergistic (each facilitates or even reinforces the action of the other); jaw opening, {jo}, would be (at least) anti-synergistic with jaw closure, {jc} (as these are contradictory states for the jaw to be in). (Further details about the formal representation of synergistic relations and neuromuscular modules are given below in Section 1.3.)
It is sensible to speak of the potential associated with segment-sized units of sound, such as [p] or [m] (large grey arrows), which have the potential to become realized as units in the cognitive abstraction of speech sound systems (hence these arrows cross the latent/realized line). Realized potential is, of course, no longer potential but phonological reality, thus the phonological potentials are properly located on the left side of the latent-realized line. This further implies that phonological potential entirely arises from the way our bodies can be used to produce and receive speech signals: in our model, the body is the basis of phonology. While this is most obvious for segment-sized units, aspects of prosody such as rhythm and intonation are also grounded in the natural modes of oscillation of vocal tract movements (and corresponding cortical oscillation patterns, see, e.g. Luo & Poeppel 2012).
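The relations among the lip and jaw states mentioned so far can be written down as a small lookup table. This is a sketch under stated assumptions, not the paper's formalism: the numeric coding (+1 for synergy, -1 for anti-synergy, 0 for no relation posited) and the function names are hypothetical, and the table records only the three relations given as examples in the text.

```python
SYNERGY = 1        # states facilitate each other
ANTI_SYNERGY = -1  # states work against each other
UNSPECIFIED = 0    # no relation posited in this sketch

# Unordered pairs of physiological states, using the paper's abbreviations:
# blc = bilabial closure, jc = jaw closure, jo = jaw opening.
RELATIONS = {
    frozenset({"blc", "jc"}): SYNERGY,
    frozenset({"jo", "jc"}): ANTI_SYNERGY,
    frozenset({"jo", "blc"}): ANTI_SYNERGY,
}


def relation(state_a, state_b):
    """Look up the relation between two states; order does not matter."""
    return RELATIONS.get(frozenset({state_a, state_b}), UNSPECIFIED)


def describe(state_a, state_b):
    """Render a relation as a readable label."""
    labels = {
        SYNERGY: "synergistic",
        ANTI_SYNERGY: "anti-synergistic",
        UNSPECIFIED: "unspecified",
    }
    return f"{{{state_a}}} and {{{state_b}}}: {labels[relation(state_a, state_b)]}"


print(describe("blc", "jc"))
print(describe("jc", "jo"))
```

A binary coding like this is deliberately coarse. Since the strength of a synergistic relation is described as gradient, a fuller sketch would presumably replace the integers with continuous weights.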
Note that while other physical domains are equally important, the main focus in this paper is the biomechanical-articulatory domain. The phonological potentials associated with the biomechanical-articulatory domain are of two sorts: (i) neuromuscular modules (NMMs), which are primitives of speech motor control and interpreted here as the physical embodiment of segment-sized phonological units (but may span entire sequences of phonemes); and (ii) synergistic relations, which are the various forms of interactions amongst physiological states which NMMs engage and which help to account for tendencies in speech-sound patterning associated with the biomechanical-articulatory domain.
Phonological potentials must be realized in actual languages (reflected by the collective knowledge speakers have of their languages), but they always act (as a bias or ‘pressure’) on their associated speech sounds (grey arrows, Figure 2), synchronically and diachronically; we assume that this realization is emergent for speakers (sensu Mielke 2008) and most probably involves numerous forms and layers of abstraction that give rich structure to the lexicon (and enable operations such as rhyming). Thus, in this model, ‘phonological’ patterns begin in the body. With these definitions we are now ready to discuss LVT phonology.
1.3 Synergistic relations of the lower vocal tract
The model of the LVT phonology we present here takes a ‘whole larynx’ approach (Moisik & Esling Reference Hassan, Esling, Moisik, Crevier-Buchman, Lee and Zee2011). Previous phonological models do not connect activity at the glottal level to that of the rest of the vocal tract, primarily because the larynx in these models is treated as if it were just the glottis. On the basis of Moisik (Reference Moisik2013), the epilarynx – the part of the larynx situated between the vocal folds and the base of the tongue – is assumed to be central in linking laryngeal behaviour to the rest of the vocal tract and thus in determining many of the potentials found in LVT phonology. Anatomically, the epilarynx is formed by the ventricular folds, aryepiglottic folds, and epiglottis (a more detailed description is given in Moisik Reference Moisik2013: Chapter 2). As Figure 3 depicts, the epilarynx is a tube-shaped structure nested within the relatively independent pharyngeal tube.
The key action of the epilarynx (arrows at 2, Figure 3) is posteroanterior narrowing or epilaryngeal constriction (or just constriction where the context is clear). Note that this is in stark contrast to pharyngeal constriction (arrow at 1, Figure 3). The two actions are mostly independent: pharyngeal constriction does not necessarily entail epilaryngeal constriction. Likewise, epilaryngeal constriction can occur without pharyngeal constriction. They do however synergize in ways that will be described below. Note that the epilarynx is a composite structure consisting of two levels, the ventricular fold level and the aryepiglottic fold level (above the vocal fold level beneath it). Epilaryngeal constriction can occur at either or both of these two levels. The ventricular level is important because it can mechanically connect epilaryngeal action to the vocal folds; the aryepiglottic level is important because it can form a range of strictures to produce different pharyngeal manners of articulation, and, furthermore, it can form an additional sound source (Edmondson et al. Reference Edmondson, Padayodi, Hassan, Esling, Trouvain and Barry2007).
Because the epilarynx is the principal articulator of sounds traditionally classified as pharyngeal sounds (Esling Reference Esling1996, Reference Esling1999, Reference Esling2005; Esling & Harris Reference Esling, Harris, Solé, Recasens and Romero2003; Edmondson & Esling Reference Edmondson and Esling2006; Heselwood Reference Heselwood2007), it is essential to understanding the guttural/post-velar class, whose prototypical members are argued to be pharyngeals (Hayward & Hayward Reference Hayward and Hayward1989, Moisik Reference Moisik2013, Sylak-Glassman Reference Sylak-Glassman2014). Laryngeals and uvulars, the other places of articulation associated with gutturals/post-velars, are peripheral to the class. Laryngeals sometimes fail to pattern the same way as the other members (S. Rose Reference Rose1996). On the other hand, uvulars do not have cohesion within the guttural class, since /q/ is often phonologically non-guttural (Hayward & Hayward Reference Hayward and Hayward1989: 179; Trigo Reference Trigo1991: 122–126; McCarthy Reference McCarthy and Keating1994: 202–204; S. Rose Reference Rose1996: 98–101; Bin-Muqbil Reference Bin-Muqbil2006: 243–247). Thus, it is important to consider how the epilarynx works when trying to understand LVT sound systems.
There is extensive documentation of the physiological and phonetic nature of the epilarynx (Lindqvist-Gauffin Reference Lindqvist-Gauffin1972; Fink Reference Fink1974; Laver Reference Laver1980; Painter Reference Painter1986; Esling Reference Esling1996, Reference Esling1999, Reference Esling2005; Esling & Harris Reference Esling, Harris, Solé, Recasens and Romero2003; Edmondson & Esling Reference Edmondson and Esling2006; Moisik Reference Moisik2013; Moisik & Esling Reference Moisik and Esling2014). Yet most traditional phonological models ignore the epilarynx and assume strict (phonological) lingual-laryngeal independence. This is particularly evident in the laryngeal–supralaryngeal dichotomization of distinctive features and their organization (Clements Reference Clements1985; Steriade Reference Steriade, McDonough and Plunkett1987) and the glotto-centrism of Halle & Stevens’ (Halle & Stevens Reference Halle and Stevens1971) model. As mentioned, these earlier models are inconsistent with later observations made by several phonologists (Czaykowska-Higgins Reference Czaykowska-Higgins1987, Trigo Reference Trigo1991, Halle Reference Halle1995) that phonological models must address lingual–laryngeal–pharyngeal interactions found in LVT sounds.
In the approach we take here, phonological potentials of the LVT can be expressed as the set of possible NMMs and their tendencies for interaction when juxtaposed in a string of speech. Each NMM used in producing LVT sounds engages certain states of the vocal tract, such as tongue retraction, larynx raising, vocal fold adduction and so forth, and these are a useful means for characterizing the NMMs. Table 1 contains a list of these states and their abbreviations, illustrating a key principle of epilaryngeal function, namely, that it effectively couples the vocal folds to the tongue through mechanical linkage (Moisik Reference Moisik2013), and thus forms a nexus linking the various structures needed for LVT articulation. (NB: Table 1 is not exhaustive and only contains those physiological states judged to be most relevant for the present paper; elaborations are certainly possible.)
All the physiological states and their relations are based on previous research on epilaryngeal physiology and speech functioning (Esling Reference Esling2005, Edmondson & Esling Reference Edmondson and Esling2006, Moisik Reference Moisik2013). These states all refer to aspects of vocal tract configuration viewed at a coarse level of detail: e.g. {vfo} is any kind of vocal fold opening. The lingual states – tongue fronting {tfr}, tongue raising (back and up) {tra} , and tongue retraction (back and down) {tre} are based directly on Esling’s (Reference Esling2005) Laryngeal Articulator Model (discussed further in Moisik Reference Moisik2013).
The synergistic relations among physiological states constituting the model are of two sorts: there are synergies and anti-synergies. A mapping or synergy network diagram of the states in Table 1 is given in Figure 4. (Note that not all possibilities are shown; the model is simplified to what is relevant for discussion in this paper.) Synergies (denoted in-text by {↔}), indicate that two states complement each other in some facilitative way (such as the way jaw closure is facilitative of bilabial closure, as discussed in Section 1.2). Synergies do not imply a particular function, but may be useful towards achieving various functional goals associated with NMMs, such as the epilaryngeal constriction {epc} characterizing an aryepiglotto-epiglottal stop ⟦ʡ⟧. An example is the synergy between tongue retraction and larynx raising, symbolized as {tre ↔ ↑lx}; this synergy exists because of linguo–hyo–laryngeal linkage. The hyoid bone yokes the tongue and larynx together via the hyoglossus and thyrohyoid muscles, respectively. If the larynx is not actively pulled down (especially via the sternothyroid and sternohyoid), then the combination of these states will help to produce a forceful constriction within the lower pharynx and promote (but not guarantee) epilaryngeal stricture. Opposite to this is the synergy between tongue fronting and larynx lowering, {tre ↔ ↓lx}. Pharyngeal expansion achieved with tongue fronting is complimented by simultaneously lowering the larynx, so these states can work synergistically to facilitate NMMs which involve dilation of the pharynx, such as vowel NMMs like 〚i〛 and 〚u〛. (The relationships involving tongue state and larynx height are discussed further in Section 3.)
A notable pair of synergies is found in the relationship between adductive tension of the vocal folds and pitch-related states. As discussed in Laver (Reference Laver1980: 133) and Hombert, Ohala & Ewan (Reference Hombert, Ohala and Ewan1979: 47–48), abducting the vocal folds, as in breathy voice production, is associated with reduced medial compression and adductive tension. This tends to create the conditions which promote lower pitch, thus {vfo ↔ Lf0}. Increasing these tensioning parameters promotes higher pitch (for evidence in relation to the action of the lateral cricoarytenoid muscles, see Hirano & Ohala Reference Hirano and Ohala1969; for discussion about the effects of abduction on vocal fold vibration rate see Watson Reference Watson1998: 3646), thus {vfc ↔ Hf0}. Of course, these synergies are facilitative tendencies that can be overridden (reflecting their nature as phonological potentials). One way to do so would be to contract the cricothyroid muscles or even raise the larynx in the case of the former and perhaps lower the larynx in the case of the latter (Honda Reference Honda, Bell-Berti and Raphael1995, Honda et al. Reference Honda, Hirai, Masaki and Shimada1999). Thus, we can also identify the synergies between larynx height and the pitch states, {↑lx ↔ Hf0} (raised larynx synergizes with high pitch) and {↓lx ↔ Lf0} (larynx lowering synergizes with low pitch), as represented by Figure 4. Pitch also relates to lingual state via hyo–laryngeal linkage demonstrated to be relevant in characterizing intrinsic f0 of vowels (Honda & Fujimura Reference Honda, Fujimura, Gauffin and Hammerberg1991, Whalen et al. Reference Whalen, Gick, Kumada and Honda1998, Hoole & Honda Reference Hoole, Honda, Clements and Ridouane2011), but this effect is not addressed in the present work (and would require characterization of hyoid state).
Another intriguing relationship not modeled here is the way that the cricothyroid muscles can help arrest vocal fold vibration by increasing their tension.Footnote 4 This mechanism has been claimed to account for the tendency for voiceless consonants, particularly obstruents, to raise the f0 at the beginning of following vowels (Löfqvist et al. Reference Löfqvist, Baer, McGarr and Story1989). (Both of these details and the many others which we neglect here would be interesting points of future expansion.)
Anti-synergies (denoted by {…}) are conflicts between physiological states: they do not preclude a particular combination of states, but some compensation is needed within the system; for example, epilaryngeal vibration at higher glottal pitch {epv … Hf0} is not impossible, but requires the antithetical combination of epilaryngeal constriction and longitudinal vocal fold tension (Esling Reference Esling2005, Esling & Moisik Reference Esling, Moisik, Gibbon, Hirst and Campbell2012) needed for increasing glottal pitch. As described by Fink (Reference Fink1974) and verified by Moisik (Reference Moisik2013: Chapter 4.2, inter alia), epilaryngeal constriction favours larynx raising (and larynx raising can induce epilaryngeal constriction), thus these states synergize {↑lx ↔ epc}, but larynx lowering counteracts and hence is anti-synergistic with epilaryngeal constriction {↓lx … epc}. The main states for the tongue – retraction {tre}, raising {tra}, and fronting {tfr} – are all anti-synergistic with each other. Tongue double bunching {tdb}, which can for now be thought of as a lingual state which forms two strictures, one palatal and one pharyngeal (discussed further in Section 4), synergizes with tongue fronting {tdb ↔ tfr} and retraction {tdb ↔ tre}, but is anti-synergistic with tongue raising {tdb … tra}. Diametrically opposed states, such as {Hf0} and {Lf0} or {↑lx} and {↓lx}, can be interpreted as being very unlikely to be combinable or outright impossible (but we will not focus on fully examining these edge cases here). Synergistic relations can be indirect, meaning that a physiological state, such as tongue retraction {tre}, although not directly connected via synergistic relations to the epilaryngeal vibration {epv} state, is indirectly synergistic with it because of the synergy with epilaryngeal constriction, or {tre ↔ epc ↔ epv}. Likewise, tongue fronting {tfr} is indirectly anti-synergistic with epilaryngeal vibration, {tfr … epc ↔ epv}.
Indirect synergies/anti-synergies make sense insofar as they do not have any contradicting direct anti-synergies/synergies: so there is no indirect synergy between larynx lowering and epilaryngeal constriction, for instance via {↓lx ↔ vfo ↔ epv ↔ epc}, because these are directly anti-synergistic: {↓lx … epc}.
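The distinction between direct and indirect relations can be illustrated with a small sketch that treats part of the synergy network as a signed graph. This is only an illustration under stated assumptions, not an implementation belonging to the PPM: the names `DIRECT` and `relation` are hypothetical, only a handful of the relations mentioned in the text are encoded, and the rules that the sign of a chain is the product of its link signs and that only the shortest chains are consulted are simplifying assumptions. A direct relation, when present, takes precedence over any indirect chain, as described above.

```python
SYN, ANTI = +1, -1  # synergy (↔) and anti-synergy (…)

# A subset of the direct relations mentioned in the text (not exhaustive).
DIRECT = {
    frozenset(("tre", "↑lx")): SYN,
    frozenset(("tfr", "↓lx")): SYN,
    frozenset(("↑lx", "epc")): SYN,
    frozenset(("↓lx", "epc")): ANTI,
    frozenset(("tre", "epc")): SYN,
    frozenset(("tfr", "epc")): ANTI,
    frozenset(("epc", "epv")): SYN,
    frozenset(("vfo", "epv")): SYN,
    frozenset(("vfo", "↓lx")): SYN,
}


def neighbours(state):
    """Yield (other_state, sign) for every direct relation involving state."""
    for pair, sign in DIRECT.items():
        if state in pair:
            (other,) = pair - {state}
            yield other, sign


def relation(a, b, max_hops=3):
    """Return ('direct' | 'indirect' | 'none', set of signs) for states a, b.

    A direct relation blocks any indirect one. Otherwise the signs of the
    shortest chains (up to max_hops links) are collected; the sign of a
    chain is assumed to be the product of its link signs.
    """
    direct = DIRECT.get(frozenset((a, b)))
    if direct is not None:
        return "direct", {direct}

    found = {}  # number of links -> set of chain signs

    def walk(node, sign, visited):
        hops = len(visited) - 1
        if node == b:
            found.setdefault(hops, set()).add(sign)
            return
        if hops == max_hops:
            return
        for nxt, link_sign in neighbours(node):
            if nxt not in visited:
                walk(nxt, sign * link_sign, visited | {nxt})

    walk(a, SYN, frozenset({a}))
    if not found:
        return "none", set()
    return "indirect", found[min(found)]


print(relation("tre", "epv"))   # indirect synergy via {tre ↔ epc ↔ epv}
print(relation("tfr", "epv"))   # indirect anti-synergy via {tfr … epc ↔ epv}
print(relation("↓lx", "epc"))   # direct anti-synergy blocks any indirect chain
```

Returning a set of signs rather than a single value leaves room for cases where equally short chains disagree, which this sketch makes no attempt to resolve.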
The synergy network in Figure 4 has the following structure: there are three principal quality zones – phonatory quality, tonal quality, and vowel quality – which are grouped together under the higher-order category of voice quality (which is the aggregate of the former three and which is not to be confused here with phonation type). Approximately situated within each zone are the physiological states that are associated with the different quality types. Vocal fold states associated with high and low pitches, {Hf0} and {Lf0}, are important for defining tonal quality. Phonatory quality (e.g. modal voice, breathy voice, etc.) relates to the states for vocal fold opening and closure, {vfo} and {vfc}, the states for epilaryngeal constriction and vibration, {epc} and {epv}, and the states for larynx elevation, high {↑lx} and low {↓lx} (although these have associations with tonal and vowel quality, too). Finally, vowel quality mainly relates to states of the tongue, {tfr}, {tra}, {tre}, and {tdb}. Note that, as with Table 1 above, the states and relations shown here are not exhaustive and are only those necessary for accounting for LVT phenomena.
As discussed in Section 1.2, NMMs are (roughly) characterizable by some combination of physiological states. For example, a glottal stop [ʔ], which can be understood to be produced by an NMM 〚ʔ〛 controlling vocal fold closure, is associated with relatively strong vocal fold closure {vfc}, as depicted in Figure 5a by the dark association line between 〚ʔ〛 and {vfc}. (The darkness of the association lines can be used to express how strongly engaged a particular physiological state is.) Glottal stop might also be implemented (by some other speaker, for example) by a 〚̙ʔ〛, which indicates vocal fold closure plus additional reinforcement by the ventricular folds (the lower part of the epilarynx). As Figure 5a shows, vocal fold closure is very important for 〚̙ʔ〛, but epilaryngeal constriction {epc} and larynx raising {↑lx} are also incorporated into this NMM. The grey association line indicates that these two states are not likely to be as strongly engaged as they might be in, say, an NMM for 〚ʡ〛, shown in Figure 5b. Note that the NMMs do not necessarily correspond to particular phonemes: all three of the NMMs in Figure 5 might represent the NMMs that three different individuals have for a glottal stop phoneme. The ‘epiglottal stop’ NMM 〚ʡ〛 would be much less likely to be observed (although see Lindqvist-Gauffin Reference Lindqvist-Gauffin1972).
The synergistic relations in the network specify what other states might tend to appear in patterns involving 〚ʔ〛. For example, the network expresses that larynx raising, which tends to close the vocal folds {vfc ↔ ↑lx}, is more compatible with glottal stop than larynx lowering, which tends to open the vocal folds {vfo ↔ ↓lx}. This particular relation is grounded in empirical research on laryngeal physiology by Fink (Reference Fink1974), which concludes that larynx height interacts positively with laryngeal plication or folding, but its phonetic expression has also been characterized (Laver Reference Laver1980: 31; Esling & Moisik Reference Esling, Moisik, Gibbon, Hirst and Campbell2012; Moisik, Lin & Esling Reference Moisik, Lin and Esling2014). The network also expresses that epilaryngeal constriction {epc} is compatible with glottal stop, but epilaryngeal vibration {epv} and vocal fold opening {vfo} are less compatible. As we pointed out in Section 1.1, the actual probabilities for these tendencies could, in principle, be characterized, but this is a long term goal of the PPM and not an immediate concern in the present work, which is aimed at sketching out the basic theoretical framework. Thus, in accounting for observed patterns, it is enough to understand that some states work very well together (a synergy, {X ↔ Y}) and others are quite strongly opposed (an anti-synergy, {X … Y}). Overall, the synergy network in Figure 4 should be taken as a set of hypotheses about interactions between the physiological states that are most relevant for the discussion in the present paper.
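The way the network indicates which other states tend to accompany an NMM can be sketched as a simple lookup. The following is a minimal illustration under stated assumptions, not part of the PPM itself: `RELATIONS` and `classify` are hypothetical names, only the tendencies just described for vocal fold closure are encoded, and coding the lower compatibility of {vfo} with {vfc} as an anti-synergy is an assumption made for the sketch. No probabilities are modelled.

```python
# Direct relations involving vocal fold closure, as described in the text.
# +1 = synergy (↔), -1 = anti-synergy (…).
RELATIONS = {
    frozenset(("vfc", "↑lx")): +1,
    frozenset(("vfc", "epc")): +1,
    frozenset(("vfc", "epv")): -1,
    frozenset(("vfc", "vfo")): -1,  # assumption: opposed states coded as anti-synergy
    frozenset(("vfo", "↓lx")): +1,
}


def classify(nmm_states, candidates):
    """Sort candidate states by their direct relations to an NMM's states."""
    result = {"facilitated": set(), "opposed": set(),
              "mixed": set(), "unlinked": set()}
    for cand in candidates:
        signs = {RELATIONS[frozenset((s, cand))]
                 for s in nmm_states
                 if frozenset((s, cand)) in RELATIONS}
        if not signs:
            result["unlinked"].add(cand)
        elif signs == {+1}:
            result["facilitated"].add(cand)
        elif signs == {-1}:
            result["opposed"].add(cand)
        else:
            result["mixed"].add(cand)
    return result


# A glottal stop NMM characterized only by vocal fold closure {vfc}.
glottal_stop = {"vfc"}
print(classify(glottal_stop, ["↑lx", "↓lx", "epc", "epv", "vfo"]))
```

In this sketch {↓lx} comes out as unlinked, because its lower compatibility with glottal stop is only indirect, arising through its synergy with vocal fold opening.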
1.4 Outline of LVT issues addressed
The application of the PPM framework to LVT phonology is carried out in the context of three phenomena that have been discussed in previous literature. These phenomena are useful grounds for illustrating the depth and breadth of the model. Each topic focuses on particular subsections of the synergy network so that the design of the model and its applicability to LVT phonology can be introduced in a gradual and simpler way. The first issue (Section 2) concerns epilaryngeal vibration (sphincteric/strident/‘growled’ phonation). The second issue (Section 3) concerns the fact that laryngeals pattern as if they were ‘placeless’ in some languages and ‘pharyngeal’ in others. The third and final issue (Section 4) concerns pharyngeal sounds and their relationship with palatals.
2 Epilaryngeal vibration and phonatory, tonal, and vowel quality
Epilaryngeal vibration – vibration of any part of the epilaryngeal tube structures (Figure 3), but especially the aryepiglottic folds – has never before been directly incorporated into a phonological model. Nevertheless, the PPM accounts for the potential phonological status of epilaryngeal vibration. It also specifies that epilaryngeal vibration favours abducted vocal fold states, that it is biased towards low tonal states, and that it is most compatible with retracted lingual states.
Several factors may have contributed to the omission of epilaryngeal vibration from phonological models. Unlike vocal fold vibration, the phonetic and phonological properties of epilaryngeal vibration are relatively under-documented. In addition, epilaryngeal vibration is often stigmatized as an inherently pathological phonation type, or is thought to lead to vocal pathology, and is therefore not considered deserving of phonological status, which may lead its occurrence to be dismissed by linguists. Numerous comments can be called up in relation to this last point. For example, Gordon & Ladefoged (Reference Gordon and Ladefoged2001: 401) remark ‘[i]f the !Xóõ did not exist, and someone had suggested that [sphincteric/strident voice] could be used in a language, scholars would probably have said that this was a ridiculous notion’.Footnote 5
Despite the impediments to understanding epilaryngeal vibration, there is evidence of its status as phonologically distinctive. The strongest case for distinctive epilaryngeal vibration is what is variably referred to as ‘sphincteric’, ‘strident’, or ‘epiglottalized’ phonation in languages of the Khoisan group. Noteworthy examples are !Xóõ (Traill Reference Traill, Singer and Lundy1986) and N|uu (Miller et al. Reference Miller, Brugman, Sands, Namaseb, Exter and Collins2009). Traill (Reference Traill, Singer and Lundy1986) shows laryngoscopic evidence that !Xóõ does indeed use epilaryngeal vibration, specifically at the aryepiglottic level and with vocal fold opening. Although no visual evidence is available for N|uu, the presence of epilaryngeal vibration in the ‘epiglottalized’ vowels is supported by an acoustic analysis (Moisik Reference Moisik2013: Chapter 3.1.4).
As discussed above (Section 1.3), sounds traditionally described as pharyngeals fundamentally rely on epilaryngeal constriction. In the PPM, this property is associated with epilaryngeal vibration through a physiological synergy: {epc ↔ epv}. Thus pharyngeals should show increased proneness to epilaryngeal vibration. In support of this, aryepiglottic trilling is visually attested in some variants of the pharyngeal consonants for Iraqi Arabic (Hassan et al. Reference Hassan, Esling, Moisik, Crevier-Buchman, Lee and Zee2011), and a combination of aryepiglottic and epiglottal vibration has been attested in Somali (Edmondson et al. Reference Edmondson, Padayodi, Hassan, Esling, Trouvain and Barry2007). Furthermore, Catford (Reference Catford1977a: 163) reports ‘bleat-like’ trilling associated with the pharyngeals of the Abkhazo-Adyghe languages. Impressionistic labels used in describing pharyngeals, such as ‘remarkably raucous’ (Colarusso Reference Colarusso1985: 367), are also suggestive of trilling.Footnote 6 Epilaryngeal vibration is confirmed in acoustic data for Agul (Moisik Reference Moisik2013: 124–129) featuring a female speaker of the Burkikhan dialect and male speaker of the Tpig dialect and a range of syllable and segmental contexts. In all of the above cases, it is very possible that epilaryngeal vibration is allophonically restricted, possibly occurring even as free variation or subject to prosodic factors. Further research is required to confirm where epilaryngeal vibration occurs and ascertain the exact details of the phonological factors governing its distribution.
Given the range of observations cited here, it seems plausible that phonologies can incorporate epilaryngeal vibration. What makes this phonological potential particularly interesting is the constraining of its distribution in ways that reflect the synergies and anti-synergies governing the vocal tract configuration required to produce epilaryngeal vibration. In this light, the next three sections examine how epilaryngeal vibration relates to vocal fold, tonal, and lingual states.
2.1 Epilaryngeal vibration and phonatory quality
The tendency for epilaryngeal vibration to favour abducted vocal fold states is specified by links between vocal fold states associated with phonatory quality and epilaryngeal vibration: specifically, epilaryngeal vibration is synergistic with vocal fold opening (abduction), {epv ↔ vfo}, but anti-synergistic (less likely) with vocal fold closure (adduction) {epv … vfc}; note that these synergistic relations arise in interactions with aerodynamic factors discussed below. Figure 6a (which is just a subsection of the full synergy network shown in Figure 4) depicts the synergistic relations for epilaryngeal vibration, {epv}, epilaryngeal constriction, {epc}, and the vocal fold states, {vfo} and {vfc}. (Remember that epilaryngeal vibration benefits from epilaryngeal constriction, hence {epv ↔ epc}.) At the very least, there are two NMMs implementing epilaryngeal vibration, one without concomitant vocal fold vibration, 〚ʜ〛, and one with it, 〚ʢ〛. These would be interpreted as being associated to the states depicted in Figure 6b.
The reasoning (and this admittedly is entering the territory of the articulatory-aerodynamic domain) behind the postulated synergistic relations, {vfo ↔ epv} and {vfc … epv}, is that epilaryngeal vibration requires an airflow source and, by inference, we may suppose that vibration can more reliably be initiated when there is more flow. This follows Solé (Reference Solé2002), who argues that, because of the increased flow, voiceless alveolar trills have a broader tolerance for variation in pressure and lingual configuration than voiced ones. It is reasonable to suppose that similar principles apply to epilaryngeal vibration.
The above considerations are aeromechanical in nature: an abducted glottal configuration permits more airflow and this makes it easier to drive epilaryngeal vibration. However, phonological models do not typically admit aerodynamic considerations into the explanation of phonological patterns (although, see Ohala Reference Ohala, Lee and Zee2011). A relevant example is Trigo’s (Reference Trigo1991: 118) phonological analysis of !Xóõ phonatory contrasts (see Table 2). Trigo claims that ‘breathy-pharyngealized’ (sphincteric) vowels ‘fill the gap’ (p. 118) in attested vowel typologies because they combine [spread glottis] with the feature for pharyngealized vowels (which is [RL] for Trigo).
In our framework, the synergy between epilaryngeal vibration and abducted vocal fold states {vfo ↔ epv} accounts for the tendency of sphincteric vowels in !Xóõ to be realized with an abducted vocal fold configuration, and with minor, if any, vocal fold vibration (see Traill Reference Traill, Singer and Lundy1986). This reflects the phonological exploration of a physical design space and stands in contrast to Trigo’s analysis, which posits that !Xóõ has instead explored a featural design space. Although such features have a phonetic grounding (Halle & Stevens Reference Halle and Stevens1971), Trigo’s interpretation makes no use of this basis, and, thus, it does not allow us to directly relate speech sound systems to their physical reality, obscuring the actual mechanisms behind such contrasts. Epilaryngeal vibration or ‘growling’ with concomitant vocal fold opening also appears in the Zhenhai dialect of Wu (P. Rose Reference Rose1989: 238): adopting the PPM allows us to generalize across these cases.
[RL] = [raised larynx]; [sg] = [spread glottis]; [cg] = [constricted glottis].
2.2 Epilaryngeal vibration and tonal quality
The PPM specifies restrictions on the distribution of epilaryngeal vibration in relation to tonal quality, and, as Figure 7 depicts, the most important of these is a tendency towards low tonal states, {Lf0}. There are several physiological factors operating here, beginning with epilaryngeal constriction, which itself asymmetrically relates to pitch production: it synergizes with low pitch {epc ↔ Lf0} but is anti-synergistic with high pitch {epc … Hf0}. The reasoning is that, when the epilarynx constricts, it can add to the effective (i.e. vibrating) mass of the vocal folds via vocal-ventricular fold coupling (Laver Reference Laver1980: 123), and this can perturb the vocal folds towards lower rates of vibration (Moisik & Esling Reference Moisik and Esling2014). Conversely, increasing glottal pitch tends to rely on increasing longitudinal tension (mainly achieved by means of increased cricothyroid activity). This is associated in turn with posteroanterior expansion of the larynx via rotation about the cricothyroid joint (which causes the distance between the thyroid notch and the cricoid lamina to increase). Thus, high pitch states are not favourable for epilaryngeal constriction, which functions most efficiently when there is substantial posteroanterior narrowing. In other words, laryngeal adjustments for raising pitch – stretching the vocal folds from front to back – are opposite in nature to those required for epilaryngeal constriction – drawing the arytenoids and attached aryepiglottic folds up and forwards toward the tubercle of the retracting epiglottis – and the dimension of this opposition falls along the posteroanterior axis of the larynx (Edmondson & Esling Reference Edmondson and Esling2006: 169).
In regard to epilaryngeal vibration, low pitch is a product of the naturally low frequency vibration of the epilarynx (Moisik, Esling & Crevier-Buchman Reference Moisik, Esling and Crevier-Buchman2010; Moisik Reference Moisik2013: 123, 119–131), hence {epv ↔ Lf0}, which can cause glottal pitch to lower via entrainment (an effect of oscillators sharing an energy source). This may complement a bias towards low pitch perception because of the acoustic-perceptual effects of subharmonic structure created by amplitude modulation (by the vibrating epilaryngeal structures) of the glottal pulse.
As mentioned in Section 2.1, Zhenhai employs epilaryngeal vibration in relation to its tonal system. Zhenhai has two tonal registers characterized as Yin and Yang tone sets. Yin tones have high f0 onsets; Yang tones are distinguished by low f0 onsets which occur, subject to various conditioning factors (some of which are discussed further in Section 2.3), with one of three phonation types: whisper, whispery voice, and ‘growling’ (epilaryngeal vibration). In evaluating the Zhenhai system, P. Rose (Reference Rose1989) concludes that all three phonation types are phonetically related by sharing ‘epiglottalization’ (in P. Rose’s terms, p. 240), and this property thus defines the Yang register. In support of this is the fact that Yin and Yang tone contours are essentially the same, neglecting the initial, low pitch component of the Yang tonal contours (p. 243), which P. Rose interprets as the realization of the register component.
In the PPM, the Yang register can be considered to be based on NMMs that share the {epc} state; thus, the restriction of the three (constricted) phonation types to the low tone register is interpreted as an expression of the epilaryngeal constriction and low pitch synergy, {epc ↔ Lf0}, and, relatedly, the corresponding anti-synergy involving high pitch, {epc … Hf0}. That growl is restricted to the Yang register thus relates to the {epv ↔ epc} synergy, but more relevant is its further restriction to just the low f0 regions of the Yang register. The interpretation of this then is that Zhenhai has realized the {epv ↔ Lf0} and {epv … Hf0} synergistic relations (and, relatedly, {epc ↔ Lf0} and {epc … Hf0}). It must be remembered that, in principle, growling/epilaryngeal vibration should be possible with higher f0, but our linguistic example shows the bias of {epv} towards low f0 is instantiated in Zhenhai.
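The contrast between growl at low and at high f0 can be made concrete by tallying the relations that hold among the states in each configuration. The sketch below is illustrative only: `TONAL_RELATIONS` and `tally` are hypothetical names, and counting relations without weights is an assumption, since the strengths of these tendencies are not quantified here.

```python
from itertools import combinations

# Relations among epilaryngeal and pitch states discussed in this section.
# +1 = synergy (↔), -1 = anti-synergy (…).
TONAL_RELATIONS = {
    frozenset(("epc", "epv")): +1,
    frozenset(("epc", "Lf0")): +1,
    frozenset(("epc", "Hf0")): -1,
    frozenset(("epv", "Lf0")): +1,
    frozenset(("epv", "Hf0")): -1,
}


def tally(states):
    """Return (synergies, anti-synergies) among all pairs of the given states."""
    pairs = [frozenset(p) for p in combinations(sorted(states), 2)]
    signs = [TONAL_RELATIONS[p] for p in pairs if p in TONAL_RELATIONS]
    return signs.count(+1), signs.count(-1)


print(tally({"epc", "epv", "Lf0"}))  # growl with low f0
print(tally({"epc", "epv", "Hf0"}))  # growl with high f0
```

Under these assumptions the low-f0 configuration involves only synergies, whereas the high-f0 configuration involves two anti-synergies; this is consistent with growl being possible in principle at higher f0 while being biased towards low f0.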
Epilaryngeal vibration is also found in Jianchuan Bai, which has tonal-register contrasts traditionally described as a tense–lax system. Using laryngoscopic evidence, Edmondson & Esling (Reference Edmondson and Esling2006: 173) show that the tense register – or ‘harsh register’ – is produced with epilaryngeal constriction. Although modal, breathy, and harsh phonation types are phonetically attested, the overall system (Table 3) has two contrasting registers differing by whether there is epilaryngeal stricture (constricted) or not (unconstricted).
Breathiness is restricted to the unconstricted register and occurs on the mid-falling tone (‘field, soil’, Table 3), and non-breathy realizations of this tone are not reported to occur (Edmondson et al. Reference Edmondson, Esling, Harris, Li and Lama2001, Edmondson & Esling Reference Edmondson and Esling2006). The implication under this analysis is that, more than just being a default modal register, the unconstricted register is diametrically opposed to the constricted register via anti-constriction: adjustments to the laryngeal mechanism which oppose the engagement of epilaryngeal stricture, such as larynx lowering and vocal fold opening or less adductory effort. When the tone level is high, modal phonation occurs; when it is low or falling, breathiness results. These relations are congruent with the mutual compatibility of larynx lowering, (slight) vocal fold opening, and low pitch. Put differently, Bai realizes the synergistic relations associated with larynx lowering, {↓lx … epc} but also {↓lx ↔ Lf0} and {↓lx ↔ vfo}, for the unconstricted mid-falling tone; the unconstricted mid- and high-level tones do not involve breathiness, and no synergies, other than {↑lx ↔ Hf0}, apply (meaning, we expect larynx raising will probably occur with H tone and modal voice is fully compatible with this state).
Note: All syllables are [tɕiː]. The [ii] allows for easier representation of phonation type transitions. [V̰̰] = harsh voice; [Vʢ] = epilaryngeal vibration.
In Bai, and similar to Zhenhai, epilaryngeal vibration is restricted to tones of the harsh register (‘nervous’, ‘to hurry’, and ‘flag’, Table 3), which once again reflects the {epv ↔ epc} synergy. At higher tone levels (‘leech’ and ‘to mail’, Table 3), the {epc … Hf0} and {epv … Hf0} anti-synergies act, and the result is a state where the lower epilarynx (i.e. the ventricular folds) impinges on the vocal folds, but the upper epilarynx is relatively patent and evidently under more tension than the non-high pitch configurations (Edmondson & Esling Reference Edmondson and Esling2006: 176) and, as expected according to the synergistic relations, there is no epilaryngeal vibration.
2.3 Epilaryngeal vibration and vowel quality
As noted in Section 1.3 (see Figure 4), the PPM adopts the lingual states defined by Esling (Reference Esling2005) – fronted {tfr}, raised {tra}, and retracted {tre}. The PPM specifies particular relations between vowel quality and laryngeal state: epilaryngeal constriction {epc} is most compatible with a retracted lingual state, {tre} (i.e. the state for open, back vowels), as expressed by {epc ↔ tre}; it is neither facilitated by nor anti-synergistic with tongue raising (neither {epc ↔ tra} nor {epc … tra}), but it is anti-synergistic with tongue fronting, {epc … tfr}. Given its synergy with epilaryngeal constriction {epc ↔ epv}, epilaryngeal vibration is also potentially predisposed towards retracted lingual states. The mapping in Figure 8 relates specific vowel qualities (depicted as hypothetical NMMs with the self-same IPA character for a given vowel quality) to the physiological states assumed in the PPM. The grey-scale value of the association line is meant to represent the extent to which a given NMM can be characterized by a particular state (e.g. 〚ɑ〛 has far greater tongue retraction than 〚i〛) and to underscore the gradient nature of this mapping.
Vowel systems of !Xóõ, Ju|’hoansi, and N|uu help to demonstrate the link between epilaryngeal vibration (and constriction) and retracted {tre} vowel qualities. Table 4 shows the distribution of vowels within two major register categories found in these languages. The pharyngealized set – which includes the sphincteric/epiglottalized vowels (i.e. those with epilaryngeal vibration) – tends towards Esling’s (Reference Esling2005: 23) canonical retracted vowels [ɑ ɒ ʌ ɔ] (i.e. both relatively low and back, in traditional terms), i.e. those vowels whose NMMs are strongly associated with {tre}.
Evidently, some vowels are more likely to appear in the pharyngealized series than others: for example, pharyngealized [i] does not appear at all, while less strongly fronted (and therefore more retracted) [e] does occur. These differences in vowel behaviour suggest a gradient rather than a categorical requirement for retraction, which fits with the model given in Figure 8. Most permissive is N|uu, which has pharyngealized mid-vowels [eʢ oʢ] and [uʢ]. (Note that non-pharyngealized mid-vowels in N|uu have ‘[–ATR]’ allophones [ɛ ɔ], but the pharyngealized mid-vowels are realized as ‘[+ATR]’ [eʢ oʢ]: this suggests that pharyngealization unexpectedly – from a traditional [ATR] perspective – counteracts vowel ‘laxing’, a point which will be taken up in Section 4.) In !Xóõ, pharyngealization (of which sphincteric is a subtype) only occurs with /a o u/. Ju|’hoansi has the most restricted pharyngealized vowel set [ɑʢ] and [ɔʢ], closely matching Esling’s set of retracted vowels. Thus the synergy between retracted vowels and epilaryngeal constriction and vibration, {tre ↔ epc ↔ epv}, is expressed in these vowel systems. The absence of [iˤ] or [iʢ] reflects the dual aversion of epilaryngeal stricture states to fronted vowels {epc … tfr} and vowels which themselves are synergistic with larynx lowering (as NMMs for [i] would be characterized by larynx lowering synergies associated with tongue fronting, {↓lx ↔ tfr}, and larynx lowering is anti-synergistic with epilaryngeal constriction, {↓lx … epc}). The occurrence of [uˤ] or [uʢ] is interpreted in relation to its weaker association with {tfr} (see Figure 8) and the lack of strong anti-synergy between {epc}/{epv} and {tra} (see also Figure 4).
Returning to Zhenhai Wu, vowel height is a conditioner of phonation type in the Yang (low tone) register, particularly in regard to whether growl (epilaryngeal vibration) will occur or not (P. Rose Reference Rose1989: 243): relatively open oral vowels (e.g. [œʢ ɛʢ aʢ]) and nasalized vowels are possible targets for growl. In contrast, whisper, which is also a function of epilaryngeal stricture but lacking epilaryngeal vibration, has nearly complementary distribution, tending to occur on relatively close vowels (e.g. whispery [i y]). Those vowels with greater opening are characterized by more tongue retraction, {tre}, while closer vowels will exploit tongue fronting, {tfr}, especially palatal vowels like [i] (Honda Reference Honda1996, Takano & Honda Reference Takano and Honda2007). Greater pharyngeal constriction in more open vowels can be supported by larynx raising, which has been observed by Perkell (Reference Perkell1969); vowels requiring pharyngeal expansion benefit (to varying degrees) from tongue fronting synergistically combined with larynx lowering, especially for close round vowels although more open rounded vowels may also show larynx lowering, but of a lesser degree (Wood Reference Wood1979: 33). Thus, the relatively open vowels, which have indirect synergistic connections to {epv} via direct synergies with epilaryngeal constriction, {tre ↔ epc ↔ epv}, tend to trigger the growl; the relatively close vowels are not synergistic with epilaryngeal vibration in this way. According to the model, these vowels are indirectly anti-synergistic with epilaryngeal vibration via {tfr ↔ ↓lx … epc ↔ epv}. Thus we would expect that close vowels (and especially close front vowels, particularly those with lip rounding) would reduce the likelihood of occurrence of epilaryngeal vibration, leaving epilaryngeal constriction and the abducted vocal fold state (resulting in whisper or whispery voice; see Section 2.1). 
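The chains of relations used in this argument can be read mechanically, as the following sketch of a small parser for the brace notation suggests. It is an illustration under assumptions rather than a formal definition: `chain_relation` is a hypothetical name, a chain made up only of synergies is read as an indirect synergy between its endpoints, a chain containing exactly one anti-synergy is read as an indirect anti-synergy, and chains with more than one anti-synergy are left undetermined because the text does not address them.

```python
def chain_relation(chain):
    """Read a chain such as '{tfr ↔ ↓lx … epc ↔ epv}'.

    Returns (first_state, last_state, reading), where reading is
    'synergy', 'anti-synergy', or 'undetermined'.
    """
    # Pad the link symbols so chains written without spaces also split cleanly.
    padded = chain.strip("{} ").replace("↔", " ↔ ").replace("…", " … ")
    tokens = padded.split()
    states, links = tokens[0::2], tokens[1::2]
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError(f"not a well-formed chain: {chain!r}")
    if any(link not in ("↔", "…") for link in links):
        raise ValueError(f"unknown link symbol in chain: {chain!r}")

    anti_links = links.count("…")
    if anti_links == 0:
        reading = "synergy"
    elif anti_links == 1:
        reading = "anti-synergy"
    else:
        reading = "undetermined"  # assumption: not covered by the text
    return states[0], states[-1], reading


print(chain_relation("{tre ↔ epc ↔ epv}"))
print(chain_relation("{tfr ↔ ↓lx … epc ↔ epv}"))
```

The first chain links open, retracted vowels to growl; the second links close, fronted vowels to its absence.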
As for why nasalized vowels might predispose growling, this may be due to the chain effect of palatoglossus–hyoglossus–thyrohyoid: they all tend to pull towards the hyoid. We might suspect that the nasalized vowels are actually produced with a lower tongue height and are perhaps more retracted. No articulatory data are available for Zhenhai to determine whether this is true, but in other languages with nasalized vowels, there is a tendency for nasalized front vowels to be produced with a lower height and all nasalized vowels to have a more retracted position (e.g. Carignan et al. Reference Carignan, Shosted, Fu, Liang and Sutton2015). Conversely, more open vowels tend to have greater passive nasalization (Ladefoged Reference Ladefoged1971: 34), and this may be relevant in diachronic development of nasalized vowels (Ruhlen Reference Ruhlen, Ferguson, Hyman and Ohala1975).
The case of Bai demonstrates that, even in [i] contexts, extreme epilaryngeal stricture and vibration can occur (Edmondson & Esling Reference Edmondson and Esling2006: 176). This underscores the need for a gradient model of phonology that deals in terms of potentials, not absolutes. Bai thus falls on the extreme end of the vowel-quality–epilaryngeal-vibration continuum by exhibiting relaxed restrictions on which vowels can occur with epilaryngeal vibration (although it may also be that Bai exploits a mechanism for simultaneously producing epilaryngeal and palatal stricture discussed in Section 4).
Altogether, epilaryngeal vibration exhibits a strong tendency to be paired with relatively open and relatively retracted vowels, which represents an instantiation of the synergistic connection between tongue retraction and epilaryngeal constriction. The fact that a combination of epilaryngeal vibration with close vowels can occur, as in the Bai case, reflects the fact that the anti-synergies between these two diametrically opposed states can, potentially, be overcome, but this would, by hypothesis, be less frequent than those forms which show stronger synergistic combinations.Footnote 7
3 The lingual-laryngeal link
According to the PPM, there is a potential for vocal fold configuration to be influenced by supralaryngeal vocal tract state via the epilarynx. The nature of this interaction most importantly pertains to closed vocal fold states in relation to open vowels, especially retracted ones. The framework also specifies a relationship between open vocal fold states and relatively close and non-retracted vowels, but the number of synergistic interactions is less than that which is posited for the former case. Unlike vocal fold closure {vfc}, which is synergistic with a raised larynx {vfc ↔ ↑lx} and with epilaryngeal constriction {vfc ↔ epc}, the open vocal fold state, {vfo}, benefits from larynx lowering {vfo ↔ ↓lx} and is anti-synergistic with epilaryngeal constriction {vfo … epc}. This set of relations leads to two tendencies concerning sound patterns associated with the NMMs which employ constricted vocal fold states: (i) glottal stop should be more interchangeable with pharyngeal sounds, which use epilaryngeal stricture, than glottal fricative; (ii) glottal stop and phonatory qualities with glottal constriction (e.g. creaky voice) should tend to pattern with relatively open and especially retracted vowels.
In relation to tendency (i), there is some evidence showing that glottal stop more directly patterns with pharyngeals than glottal fricative does. Prunet (Reference Prunet and Hudson1996: 191) demonstrates that Proto-Ethiopian Semitic *ʔ and *ʕ merged as /ʔ/ in Inor, which is described as having ‘glottal closure’ and /a/ characteristics (p. 192). Paradis & LaCharité (Reference Paradis and LaCharité2001: 285–286; also Halle Reference Halle1995: 18) make an extensive case for structural similarity between /ʔ/ and /ʕ/ in their analysis of adaptations of post-velars in loan-word phonology. As was established in Esling (Reference Esling1996, Reference Esling2005) and further elucidated by Heselwood (Reference Heselwood2007: 18, 24–25), near-stop or full stop variants (i.e. those employing complete aryepiglotto-epiglottal stricture) are commonly found in relation to /ʕ/. Perhaps the best example comes from Tigre laryngeal-pharyngeal neutralization data, which have appeared in numerous places in the literature but which have never been given a full analysis (although, see Moisik, Czaykowska-Higgins & Esling Reference Moisik, Czaykowska-Higgins and Esling2012). Tigre, which contrasts /h ħ ʔ ʕ/ and ejective consonants (S. Rose Reference Rose1996: 92), has an optional process that neutralizes the contrast between /ʔ/ and /ʕ/ in the presence of pharyngeals and/or ejectives anywhere else in the word. For example, /ʔaddaħa/ ‘noon’ is variably realized as [ʕaddaħa] or [ʔaddaħa] (Raz Reference Raz1983: 5; Hayward & Hayward Reference Hayward and Hayward1989: 181; McCarthy Reference McCarthy and Keating1994: 224). Critically, /h/ and /ħ/ do not neutralize under the same conditions, suggesting there is a relatively greater phonological distance between these sounds compared to that between /ʔ/ and /ʕ/.
This situation is accounted for in the PPM: glottal stop NMMs employing {vfc} (vocal fold closure) synergize with epilaryngeal constriction (see Figure 5), which, based on Esling (Reference Esling1996, Reference Esling1999), is a principal physiological state associated with /ʕ/ (and any NMMs which realize it).
Tendency (ii)Footnote 8 commonly manifests in Salish languages as glottal-stop-triggered vowel lowering. Languages in which this occurs include St’át’imcets/Lillooet (S. Rose Reference Rose1996), Klallam (Thompson, Thompson & Efrat Reference Thompson, Thompson and Efrat1974; Bessell Reference Bessell1992: 323; Montler Reference Montler and Bates1998), and Sliammon (Blake Reference Blake2000). Furthermore, glottal stop neutralizes the contrast between /ə/ and /a/ in Nxaʔamxčín (Bessell Reference Bessell1992: 142). This tendency is also attested in hiatus resolution patterns. The general pattern is for glottal stop epenthesis to occur between adjacent vowels when one of those is relatively open (especially /a/); in the context of other vowels, the glides [j] and [w] surface in their corresponding homorganic vowel contexts. Numerous cases have been reported in the literature, such as Malay (Onn Reference Onn1976, Durand Reference Durand, Anderson and Durand1986; see also Bessell Reference Bessell1992: 338–341), Karanga and Zezuru dialects of Shona (Mudzingwa Reference Mudzingwa2010: 161–177), Kiribati (Groves, Groves & Jacobs Reference Groves, Groves and Jacobs1985), Tamil (Christdas Reference Christdas1988; Lombardi Reference Lombardi2002: 225), Ilokano (Hayes & Abad Reference Hayes and Abad1989; Lombardi Reference Lombardi2002: 227), and Dutch (Booij Reference Booij1995: 65–66; Lombardi Reference Lombardi2002: 226 fn. 6).Footnote 9 In Mongsen Ao (Coupe Reference Coupe2003: 43), creaky voice is contrastive only on the vowel /a/ (to the exclusion of /i ʉ u ə/). In German, glottal stop occurs more often before low than high vowels (Pompino-Marschall & Żygis Reference Pompino-Marschall and Żygis2010), although the sample was limited. Experimental lingual ultrasound evidence shows that tongue root retraction is more favourable for laryngealization (Lancia & Grawunder Reference Lancia, Grawunder, Fuchs, Grice, Hermes, Lancia and Mücke2014).
In the PPM, the increased proneness of constricted glottal configurations towards supralaryngeal interactions, as in glottal-stop vowel-lowering cases, reflects synergistic interactions within the LVT. The PPM proposes several hypopharyngeal stop NMMs (stops in the general region of the lower pharynx-larynx), as shown in Figure 9; recent biomechanical modeling work by Moisik & Gick (Reference Moisik and Gick2017) provides some support for these, although it suggests that 〚ʔ〛 is unstable to the point that it will often become 〚ʔ̙〛. These NMMs engage varying degrees of epilaryngeal stricture, from ventricular reinforcement, 〚ʔ̙〛, to full closure of the epilarynx, 〚ʡ〛. All of them are associated with the {vfc ↔ epc} synergy, but they vary with respect to how complex and (presumably) biomechanically stable the closure is (e.g. as it only has vocal fold closure {vfc}, 〚ʔ〛 should be more prone to spontaneous phonation than when the ventricular folds compress into the vocal folds, as in 〚ʔ̙〛). The glottal fricative, which is based on an abducted vocal fold state {vfo}, does not synergize with epilaryngeal constriction in the same way that glottal stop does.
At the heart of the lingual-laryngeal link is the synergistic relation between epilaryngeal constriction and lingual retraction: {tre ↔ epc}. Lingual retraction is also synergistic with larynx raising {tre ↔ ↑lx}, which itself is favourable for epilaryngeal constriction. Thus, while the lingual-laryngeal system exhibits a great deal of flexibility, it is not unbiased: the bias is that states with laryngeal constriction favour retracted lingual states. Note that basic glottal stop 〚ʔ〛 does not alone predispose retraction, since vocal fold closure and lingual position do not have any synergies per se, but the closure in 〚ʔ〛 does synergize with epilaryngeal stricture {vfc ↔ epc}. It is through the composition of these synergistic effects that glottal stop has the potential to bias or be biased by relatively retracted or open vowels.
As noted above, close vowels, /i/ and especially /u/, are specified as being synergistic with larynx lowering rather than raising. They also act against the influence of lingual retraction because they employ expansion of the pharynx by engagement of the posterior genioglossus muscles (e.g. Lindau Reference Lindau1975, Wood Reference Wood1979, Honda Reference Honda1996). Thus, these vowels have less potential to pattern with glottal stop (and other constricted states): larynx lowering is interpreted as anti-synergistic with both vocal fold closure (indirectly) and epilaryngeal stricture (directly), {epc … ↓lx ↔ vfo … vfc}, and the relatively advanced lingual position of such vowels is anti-synergistic with epilaryngeal stricture {tfr … epc}.
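As a purely illustrative aid, the relations stated so far can be collected into a small signed graph and composed along chains. The following is a minimal sketch under stated assumptions, not part of the PPM itself: the state labels `lx_up` and `lx_down` stand in for ↑lx and ↓lx, the names `RELATIONS`, `build_graph`, and `composed_sign` are hypothetical, and treating an indirect relation as the product of the signs along a shortest chain is a simplifying assumption made here for illustration only.

```python
from collections import deque

# Relations as stated in the text: +1 = synergy (↔), -1 = anti-synergy (…).
RELATIONS = {
    ("vfc", "lx_up"): +1,    # {vfc ↔ ↑lx}
    ("vfc", "epc"): +1,      # {vfc ↔ epc}
    ("vfo", "lx_down"): +1,  # {vfo ↔ ↓lx}
    ("vfo", "epc"): -1,      # {vfo … epc}
    ("vfo", "vfc"): -1,      # {vfo … vfc}
    ("tre", "epc"): +1,      # {tre ↔ epc}
    ("tre", "lx_up"): +1,    # {tre ↔ ↑lx}
    ("epc", "lx_down"): -1,  # {epc … ↓lx}
    ("tfr", "epc"): -1,      # {tfr … epc}
}


def build_graph(relations):
    """Turn the pairwise relations into an undirected signed adjacency map."""
    graph = {}
    for (a, b), sign in relations.items():
        graph.setdefault(a, {})[b] = sign
        graph.setdefault(b, {})[a] = sign
    return graph


def composed_sign(graph, start, goal):
    """Breadth-first search for a shortest chain from start to goal.

    Returns (sign, path), where sign is the product of the edge signs
    along the chain (an assumed composition rule), or None if the two
    states are not connected in the graph.
    """
    queue = deque([(start, +1, [start])])
    seen = {start}
    while queue:
        node, sign, path = queue.popleft()
        if node == goal:
            return sign, path
        for nxt, edge in graph.get(node, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, sign * edge, path + [nxt]))
    return None


GRAPH = build_graph(RELATIONS)

if __name__ == "__main__":
    for a, b in [("vfc", "tre"), ("vfc", "lx_down"), ("vfo", "tre"), ("tfr", "vfc")]:
        sign, path = composed_sign(GRAPH, a, b)
        label = "synergistic" if sign > 0 else "anti-synergistic"
        print(f"{a} ~ {b}: {label} via {' - '.join(path)}")
```

Under these assumptions, vocal fold closure composes positively with tongue retraction (via the epilarynx or larynx raising) and negatively with larynx lowering, which mirrors the qualitative bias described in this section. The sketch says nothing about the strength of any relation; it only tracks direction.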
To illustrate the above cases visually, we can consider the range of larynx configurations across [i], [a], and [u] shown in Figure 10. Close vowels such as [i] and [u] tend to co-occur with larynx lowering and pharyngeal expansion driven by genioglossus posterior activity: the consequence is that the epilarynx is passively expanded in both the vertical and posteroanterior dimensions. The vowel [a] relies on tongue retraction and exhibits higher larynx position; the result is a passively narrowed epilarynx. Several researchers have identified similar patterns for these and similar vowels in relation to larynx height with the general tendency being that vowel height tends to be inversely related to larynx height (Perkell Reference Perkell1969: 40; Ewan & Krones Reference Ewan and Krones1974; Lindau Reference Lindau1975: 56; Ewan Reference Ewan, Jaeger, Kawasaki and Riordan1979: 47). For instance, Perkell’s (Reference Perkell1969: 40) American English participant’s vowels exhibited larynx height according to the ranking [æ a i ɪ ɛ ʊ u], from highest to lowest. Wood’s (Reference Wood1979: 26–27) vocal tract area functions are for a range of vowels produced by a Southern British English (SBE) speaker and a speaker of Egyptian Arabic (EA). In either case, [i] has lower larynx height than [æ], and is lower than [ɑ] and [a] for the SBE speaker but comparable in height to these vowels for the EA speaker, as judged by either the position of the upper epilarynx (immediately before area expansion associated with the valleculae) or by the location of the glottis. The [i] vowel also has larynx height approaching or as low as that of [u], [ʊ] for the SBE speaker. However, it should be noted that there are exceptions to the tendency: while [u] seems to be universally lower in larynx height, [i] and low vowels like [a] and [ɑ] vary in larynx height by language.
(Indeed, even in the ranking derived from Perkell’s participant the vowels [ɪ] and [ɛ], despite being lower in tongue height than [i], are also lower in larynx height, defying the simple inverse relationship between these variables. The EA speaker in Wood’s study shows similar discrepancies with these vowels.) In Ewan (Reference Hombert, Ohala and Ewan1979), which looks at a wider range of languages than Ewan & Krones (Reference Ewan and Krones1974), including English, French, Japanese, Taiwanese, Mandarin, Vietnamese, and Thai, [o] and [u] always have a low larynx height, but only in roughly half of the cases is [i] lower than [a]. The variation may reflect the competing benefit of hyoid (and larynx) lowering to help with opening the jaw for low vowels, and, for [i], the disadvantage of traction on the hyoid from the larynx, as hyoid advancement would be expected to benefit tongue root advancement in [i]. This variation aside, in any case, the acoustico-auditory benefits to vowel quality are manifest: lowering the larynx will drive F1 lower for high/close vowels and raising the larynx will drive F1 higher for low/open vowels, and we could imagine this will help maximize the perceptual difference between these vowels.Footnote 10 Similar explanations arise for considering how larynx lowering complements the acoustic signature (lowered resonances) of lip rounding (Perkell Reference Perkell1969; Riordan Reference Riordan1977; Wood Reference Wood1979: 33, 1986; Hoole & Kroos Reference Hoole, Kroos, Mannell and Robert-Ribes1998), and the doubling up of these articulatory-acoustic considerations probably underlies the strong tendency for [u] to have a low larynx height. The work of Ewan (Reference Hombert, Ohala and Ewan1979) and Hoole & Kroos (Reference Hoole, Kroos, Mannell and Robert-Ribes1998) also indicates that the patterns exhibit interspeaker variation.
For the purposes of illustration, the synergistic relationships relating glottal stop to the canonical open and close vowels, /ɑ/, /i/, and /u/, are depicted in Figure 11. In (a), /ʔ/ (left side) is associated with /ɑ/ (right side) through its potential to be realized using 〚ʔ̙〛 (see also Figure 5), which bears physiological states that are synergistic with those of 〚ɑ〛. Slight (hence light grey) larynx raising and slight (also hence light grey) epilaryngeal constriction (in the form of ventricular fold adduction) synergize with tongue retraction, {↑lx ↔ tre} and {epc ↔ tre}, found in the typical 〚ɑ〛.
The model also accounts for why /ʔ/ is less likely to pattern with close vowels like /i/ and /u/. This would be attributed to the anti-synergies between these sounds, as depicted for /i/ in Figure 11b. First, there is antipodal weak (light grey) association with the larynx height states between these sounds. Second, the core vocal fold closed state of glottal stop is (indirectly) anti-synergistic with larynx lowering weakly associated with 〚i〛, {vfc … vfo ↔ ↓lx}, and possible epilaryngeal constriction of NMM implementations of /ʔ/ is likewise anti-synergistic with tongue fronting strongly associated with 〚i〛, {epc … tfr}. Finally, those /ʔ/ with epilaryngeal constriction do not benefit from larynx lowering that might occur with 〚i〛, as expressed by {epc … ↓lx}. Nothing stops /ʔ/ from being produced in the context of an /i/ (anti-synergies can be overcome), but the phonological potential – the bias – for /ʔ/ and /i/ sounds to become associated is not as strong as that between /ʔ/ and relatively open vowels. The greater tendency for rounded vowels, especially /u/, to be expressed with larynx lowering makes these vowels even less likely to pattern with glottal stop in this way. The indirect anti-synergy between vocal fold opening and larynx raising {vfo … vfc ↔ ↑lx} inhibits epilarynx-mediated interactions between /h/ and /ɑ/, which accounts for the asymmetric behaviour of /ʔ/ and /h/.
Stepping back, the synergistic relations help to characterize (minimally) the biomechanical-articulatory factors stacking the odds against such an association (and likewise favouring that between glottal stop and low vowels, especially /ɑ/). We also want to remember that these will act in tandem with phonological potentials in other domains that may or may not align with these biomechanical-articulatory tendencies (as discussed further in Section 5).
4 The pharyngeal-palatal link
It has been recognized for some time (going as far back, at least, as Trubetzkoy Reference Trubetzkoy1939) that pharyngeals do not always do the ‘expected’ thing (Comrie Reference Comrie2005: 2) to neighbouring vowels, but can in fact become associated ‘with fronted vowel qualities’ as in Tsez. This potential palatal component of pharyngeals – or pharyngeal-palatal link – is not addressed by the traditional and widely assumed analysis that pharyngeals are [RTR], are formed from primary stricture of the pharyngeal tube (see Figure 3), and behave, for the most part, on par with uvulars in relation to phonological patterning, such as vowel alternations (Bessell Reference Bessell1992, McCarthy Reference McCarthy and Keating1994, Halle Reference Halle1995, S. Rose Reference Rose1996, Paradis & LaCharité Reference Paradis and LaCharité2001, Bin-Muqbil Reference Bin-Muqbil2006).
According to the PPM, however, the pharyngeal-palatal link can be accounted for by inspection of the synergistic network. The pharyngeal-palatal link, as it is depicted in Figure 12 (grey region), is an expression of the phonological potential for pharyngeals to become associated with a palatal stricture via engagement of a double-bunching configuration, {tdb} (see below; for more detailed discussion, see Moisik Reference Moisik2013: Chapters 5.3 and 5.4). Since the {tfr} of NMMs that form palatal sounds, such as 〚ʃ〛 and 〚i〛, is anti-synergistic with {epc}, palatal sounds themselves are unlikely to induce epilaryngeal stricture. However, pharyngeals and pharyngealized sounds, which have the potential to be associated with NMMs which exploit the {tdb} configuration (‘〚ʕʴ〛, 〚iˤ〛 etc’ in Figure 12), are at least complementary with palatal sounds insofar as the lingual constriction is concerned. The potential for double-bunching on pharyngeals and pharyngealized sounds is not obviously a necessity, and pharyngeal consonant NMMs possessing only simple tongue retraction {tre} (‘〚ʕ〛, 〚ħ〛 etc’ in Figure 12) are also expected to be possible. It is an open empirical question as to what the distribution of tongue double-bunching {tdb} is like for pharyngeal sounds in general.
There is good evidence that pharyngeals are often quite different from uvulars in their vowel effects, as Comrie’s (Reference Comrie2005: 2) observation suggests. Hayward & Hayward (Reference Hayward and Hayward1989: 183) point out that the gutturals in D’opaasunte, /ʛ χ ʕ ħ ʔ h/, which neutralize the /e/ ∼ /a/ contrast, cause lowering to /a/, and this vowel is realized with ‘marked fronting (to [æ])’ only when preceded by a pharyngeal (p. 183). In fact, they argue, based on similar evidence, that it makes no sense to classify pharyngeal sounds as [+back, +low]. McCarthy (Reference McCarthy and Keating1994: 197) notes that the Arabic low vowel /a/ is realized as more front [æ] near pharyngeals and as [ɑ] in the context of uvulars (for example, [ħæːl] ‘condition’ vs. [χɑːl] ‘maternal uncle’). Herzallah (Reference Herzallah1990: 29, 59) and S. Rose (Reference Rose1996: 87) describe similar patterns in Palestinian and Iraqi Arabic, respectively.
Caucasian pharyngeals and pharyngealization provide more striking evidence of this pharyngeal-fronting pattern such that a relationship between pharyngeals and palatal sounds becomes apparent. Lak (Anderson Reference Anderson and Kaye1997) has the vowels /i a u iˤ aˤ uˤ/; pharyngealized vowels are realized as [eˤ], [æˤ] and [œˤ] and, when present in a word, cause /k/ and /l/ to palatalize (p. 980). Furthermore, Anderson (Reference Anderson and Kaye1997: 975) suggests that pharyngealization has autosegmental status, that it is blocked by dental sibilants, and that Russian–Lak bilinguals pharyngealize Russian [a] and [u] when these vowels follow a palatal(ized) sound. Bezhta pharyngeals morphophonologically participate in a palatal harmony which groups /ɑ o u i s z ts ts’/ and pharyngeal-palatal /aˤ oˤ uˤ i e ʃ ʒ ʧ ʧ’ ʕ ħ/ (Kibrik & Testelets Reference Kibrik, Testelets and Job2004: 221–222). Catford (Reference Catford1977b: 291) observes that Abkhaz /ɥ/ (the labial-palatal approximant) is cognate with Abaza /ʕʷ/, which suggests a palatal constriction was present in the proto Abkhaz-Abaza segment and probably persists in the Abaza 〚ʕʷ〛. Furthermore, pharyngealization in Tsakhur causes ‘fronting’ of back vowels (Catford 1977b: 294–295), particularly /u o/ (the same is true for Udi). Catford (Reference Catford, Bless and Abbs1983: 348–350) recapitulates that pharyngealized vowels in Tsakhur and Udi are condensed in the vowel space, clarifying that ‘the front ones seem to be lowered and retracted, and the back appear fronted’.
Catford provides a key insight into this connection by pointing out the parallelism between the lingual state in Caucasian pharyngealization and the pharyngeal-palatal stricture associated with American English ‘double-bunching’ /r/ variants. According to Catford (Reference Catford, Bless and Abbs1983: 349), double bunching is when ‘[t]he tongue root at about the level of the tip of the epiglottis bulges backwards into the pharynx, while a depression is formed in the dorsal surface of the tongue approximately opposite the uvula, with a further upward bulge further forward on the tongue’. Articulatory and acoustic data support the connection: X-ray imaging of Tsakhur [oˤ] and Udi [aˤ] (based on Gaprindashvili Reference Gaprindashvili1966, Catford Reference Catford, Bless and Abbs1983: 350, 2002: 176–177) shows a distinct double-bunched configuration (see Figure 13a and b). Catford’s acoustic data (1983: 349; for additional discussion, see Moisik Reference Moisik2013: 302–305) demonstrate that, like American English /r/, F3 lowering and F2–F3 approximation occur for pharyngealized vowels in both Tsakhur and Udi, a fact that is further verified for Tsez in Maddieson, Rajabov & Sonnenschein (Reference Maddieson, Rajabov and Sonnenschein1996). Pharyngealized vowels in !Xóõ (Figure 13c) also show this double-bunching configuration of the tongue (Hess Reference Hess1998; Moisik Reference Moisik2013: 318).
The most extreme example of this pharyngeal-palatal phonological potential is found in those Caucasian languages, such as Lak and Bezhta, where palatal and pharyngeal segments form a class and pattern together, and the other cases, where pharyngealized vowels are associated with a degree of fronting. Those cases in other languages, such as (some varieties of) Arabic, indicate that the bias towards the {tdb} configuration is quite typical of pharyngeals, even if in such cases its phonological expression is limited to subtle influences on vowel quality, differing from that which is exerted by other gutturals. Furthermore, double bunching is anti-synergistic with tongue raising, {tdb … tra}, meaning that vowels like /u/ become strangely coloured and lose much of their original quality. Udi illustrates this effect: Catford (Reference Catford1977b: 294–295) reports that /u o/ are particularly compromised, having a ‘distinctly central quality’ when pharyngealized. We attribute this tendency to the concavity of the velar surface of the tongue when in the double-bunched configuration.
The pharyngeal-palatal link falls outside the scope of the traditional conception of pharyngeals as pharyngeal-tube strictures. The reality is that the primary stricture is of the epilaryngeal tube (refer to Figure 3), and this opens up the potential for the hybridizing of two seemingly disparate areas of articulation (palatal and pharyngeal) in the form of NMMs which exploit the tongue double-bunching {tdb} state. This link also helps to express the unique voice quality associated with pharyngeals, which is often neglected given the tendency for traditional analyses to focus on vowel quality, as observed in Section 2.2 (Edmondson & Esling Reference Edmondson and Esling2006: 177), and to miss aspects of voice quality, which simultaneously implicate changes to tonal, phonatory, and vowel quality (see Figure 4 for how this is represented in the proposed model). This unique voice quality (pharyngealized or raised larynx voice, depending on pitch level) undoubtedly serves an important function in the perceptual identity of such sounds. Finally, it is also evident that the notion of pharyngealization cannot be treated as a simple homogenous phenomenon. The effects described in this section are wholly different from those that might be better described as uvularization (Hoberman Reference Hoberman1985: 7; Czaykowska-Higgins Reference Czaykowska-Higgins1987). This would minimally include Arabic emphatic consonants that exhibit secondary oro-pharyngeal stricture (for lingual ultrasound evidence, see Zeroual, Esling & Hoole Reference Zeroual, Esling, Hoole, Majeed Hassan and Heselwood2011) and uvularization and epiglottalization in Ju|’hoansi (Miller-Ockhuizen Reference Miller-Ockhuizen2003, Miller Reference Miller2007).
5 Implications and conclusions
This paper has elaborated and explored the Phonological Potentials Model with regard to lower vocal tract phonology and a focus on the biomechanical-articulatory domain. As argued here, phonological potentials are biases that act on speech sound systems: they come in many forms and act within many different physical domains of speech. This sort of approach has been advocated for years by a few researchers (especially, e.g. Ohala Reference Ohala, Hardcastle and Mackenzie Beck2005, and similarly Stevens & Keyser Reference Stevens and Keyser2010), but never before advanced as a cohesive framework to understand speech sound systems.
Our work is firmly grounded in an articulatory phonetic understanding of lower vocal tract function that has accumulated over the past twenty years; however, the present work offers new synthesis, interpretation, and application of these results. We have taken the observations present in this research and distilled and reworked them into a compact abstraction of the complex structural interactions found in this region (depicted using the ‘wiring diagram’ in Figure 4). We then fused this with new ideas about speech motor control to indicate that speech biomechanics is intrinsically self-discretizing and modularized (the NMMs).
Finally, we applied the resulting conceptual model to the analysis of a wide range of under-documented languages, most of which have never been given much treatment from previous models of phonology and have also not been thoroughly addressed in the articulatory phonetic literature that forms the empirical grounding of the present model. The results of this embodied approach help to characterize some of the possibilities (e.g. epilaryngeal vibration as a distinctive phonation type and its complex interactions with phonatory, tonal, and vowel quality) and patterns (e.g. pharyngeal-palatal patterning) that were previously considered unexpected and were not addressed by previous models, but which have turned out to arise in connection with the LVT. The PPM also represents for the first time an effort to capture the notion of and role played by voice quality (i.e. holistic sound quality associated with overall vocal tract articulatory setting) advocated for many years by researchers such as Laver (Reference Laver1980), but never used in direct analysis of phonological patterns.
We hypothesize that a better characterization of how these different physiological domains coincide to bias certain types of sound patterns (i.e. their phonological potential) will give greater clarity to how and why speech sounds are organized in the way that they are and what makes some patterns more likely than others.
To be more concrete, this situation is depicted in Figure 14. Physical-physiological domains (black lines in Figure 14) constitute speech subspaces wherein various (for example, biomechanical-articulatory or articulatory-acoustic) nonlinear relations constitute biases or ‘warpings’ (which appear as bumps on the lines in Figure 14) of the subspace associated with specific phonological patterns: this is the essence of the notion of phonological potential. The phonological potentials operating in this subspace may (Figure 14a), or may not (Figure 14b), show certain alignments or agreements. For example, a biomechanical-articulatory property such as tissue-on-tissue contact may make a speech sound reliably producible, while a given nonlinearity in the corresponding articulatory-acoustic domain may likewise reinforce this reliability; further complementary properties might likewise be found in the acoustic-auditory domain. A strong alignment (Figure 14a) of such properties would be expected to give rise to a strong overall phonological potential towards some pattern, and it would be expected then that the sound pattern in question would be realized repeatedly in the phonologies of many languages. The converse is when there is poor alignment (Figure 14b). In this case, the phonological potential is weak and phonologies might diverge towards many different types of patterns with possibly a very different nature. This latter case is interpreted in the PPM as the basis behind relatively uncommon phonological phenomena, such as epilaryngeal vibration.
As formulated in the PPM, by the assumption that all humans are fundamentally similar in terms of their vocal tract structures and physiology and the biases acting on these, languages are all subject to the same phonological potentials. This then must drive cross-linguistic similarity in speech sound organization.Footnote 11 But phonological potentials, by their nature, are not deterministic mechanisms; they are stochastic in nature. We might best conceive of them as acting probabilistically rather than as absolute constraints (otherwise, they would not be potentials). The numerous interacting layers of phonological potential across the various physical domains give rise to the possibility of variation and thus cross-linguistic diversity. In addition to this, there are also many physically-external factors (such as sociophonetic mechanisms) that influence the developmental trajectory of a language, many of which are stochastic in nature. Thus, the exact trajectory a given language takes results from the complex interaction of the phonological potentials, the external factors, and the noise inherent in these systems. A parallel may be drawn to development of biological organisms (including people), which show remarkable ‘canalization’ towards reoccurring and stable forms but which also are subject to variation arising from stochastic molecular processes and environmental factors (Hallgrímsson, Willmore & Hall Reference Hallgrímsson, Willmore and Hall2002).
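The probabilistic reading of phonological potentials can be made concrete with a toy simulation. The following is a minimal sketch under stated assumptions and is not a component of the PPM: the additive combination of per-domain biases, the logistic link function, the bias values in `ALIGNED` and `MISALIGNED`, and the names `adoption_probability` and `simulate` are all hypothetical choices made for illustration. Each simulated language independently either shows a pattern or not, with a probability shaped by the summed biases.

```python
import math
import random


def adoption_probability(domain_biases):
    """Map summed per-domain biases to a probability.

    The logistic function is an assumed link, chosen only because it
    turns any real-valued sum into a value between 0 and 1.
    """
    return 1.0 / (1.0 + math.exp(-sum(domain_biases)))


def simulate(domain_biases, n_languages=10_000, seed=0):
    """Return the fraction of simulated languages that show the pattern."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    p = adoption_probability(domain_biases)
    hits = sum(rng.random() < p for _ in range(n_languages))
    return hits / n_languages


# Hypothetical bias values for three physical domains
# (e.g. biomechanical-articulatory, articulatory-acoustic, acoustic-auditory).
ALIGNED = [1.0, 1.0, 1.0]        # all domains favour the pattern
MISALIGNED = [1.0, -1.5, -1.0]   # domains pull in different directions

if __name__ == "__main__":
    print("aligned:   ", simulate(ALIGNED))
    print("misaligned:", simulate(MISALIGNED))
```

With these made-up numbers, the aligned case recurs in most simulated languages and the misaligned case in a minority, yet neither outcome is categorical: the pattern is never guaranteed and never excluded, which is the sense in which potentials are biases rather than absolute constraints.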
The model presented here represents an elaboration of the phonetic–phonological mapping, particularly as it concerns segmental phenomena. It is not an alternative approach to traditional feature-and-constraint-oriented phonology of the past half century, but such models might excessively ascribe cognitive status to phonological phenomena which are deeply rooted in the functioning of the speech mechanism. Such could be especially true for models such as Feature Geometry, which seem to replicate the layout of the vocal tract but fail to capture many of the nuances of how it works, such as aerodynamic factors (e.g. Ohala Reference Ohala, Hardcastle and Mackenzie Beck2005, Reference Ohala, Lee and Zee2011). The PPM approach differs from models that propose a strict division between the computational/grammatical part of phonology and the part associated with phonetic realization (e.g. Hale & Reiss Reference Hale and Reiss2008). In the broad view, wherein all physical domains of speech are considered, the sum of all phonological potentials can be taken to represent the substrate from which both phonological categories/groupings and processes emerge (sensu Mielke Reference Mielke2008): these phonological units are the abstract realizations of phonological potentials. Put differently, the emergence of feature patterns is not a random, undirected process but instead one that is governed in highly specific ways by the phonological potentials associated with the many physiological domains that speech occupies. These domains set the scope for what is possible and what is not, for which sounds will likely pattern together and which will not, and the likely processes that will arise in association with the units of speech.
The PPM contributes to new foundations from which to understand speech sound systems, and it is firmly rooted in developments in phonetics and phonology that indicate the necessity of an embodied approach. It parallels work in Message-Oriented Phonology which examines biases outside of phonology proper that are claimed to support the accuracy of message transmission (see Hall et al. Reference Hall, Hume, Jaeger and Wedel2016). And while the PPM deals primarily with presumably universal biases intrinsic to the speech apparatus, it is also fully compatible with research concerning extrinsic linguistic biases which may shape phonological diversity and expose the adaptive nature of language as evolving (in cultural and evolutionary terms) in the context of the environment, demography, and biology of its speakers (e.g. Everett Reference Everett2013, Everett, Blasi & Roberts Reference Everett, Blasi and Roberts2015, Lupyan & Dale Reference Lupyan and Dale2010, Dediu & Ladd Reference Dediu and Ladd2007; for debate surrounding the effect of environment on language, particularly in relation to tone, see Dediu & De Boer Reference Dediu and Boer2016). But, again, cross-linguistic variation does not depend on these factors, since phonological potentials are not absolutes but rather tendencies towards certain patterns. It would be a mistake, however, to rule out these other sources of variation without proper scientific investigation. Such research is helping to expose how incredibly rich and complex the (cultural and even biological) evolutionary landscape of language is.
Within the context of the LVT, future hypothesis testing can be conducted to validate and elaborate the model. One important consideration is that we are assuming the validity of the many different cases serving as the empirical foundation of the work. All of these phenomena deserve further phonetic and phonological investigation, which may lead to new insights that need to be reflected in the PPM. With contemporary tools, such as computational modeling, it is becoming possible to reveal the extent of the quantal nature of speech biomechanics, which will help to identify the set of NMMs and to validate and further explore the synergies acting on those NMMs. Biomechanical simulation research that examines such effects in the context of the larynx is underway and provides support for some of the NMMs discussed in this article (Moisik & Gick 2017). More sophisticated computational models would allow cross-domain effects (aerodynamics interacting with biomechanics) to be explored. Agent-based simulations in the iterative learning framework might likewise help us explore how phonological potentials interact to influence the dynamics of sound change and the influence of variation at many different levels (e.g. Morley 2013, Dediu, Janssen & Moisik 2017). Together with data from phonological databases (such as Phoible; see Moran, McCloy & Wright 2014), such work would help to further identify and validate phonological potentials, how they operate across various physical domains of speech, and how they act to influence speech sound systems as they evolve through time.
ABBREVIATIONS
- LVT: lower vocal tract (from uvula to larynx)
- UVT: upper vocal tract (from uvula to lips)
- PPM: Phonological Potentials Model
- NMM: neuromuscular modules (symbolized with IPA characters in 〚 〛 brackets)
- vfo*: vocal folds open (abducted)
- vfc*: vocal folds closed (adducted/pre-phonation posture)
- epc*: epilaryngeal constriction
- epv*: epilaryngeal vibration
- tfr*: tongue fronting
- tre*: tongue retraction
- tra*: tongue raising
- tdb*: tongue double bunching
- ↑lx*: raised larynx
- ↓lx*: lowered larynx
- Hf0*: increased tension, less vibrating mass (high f0/pitch)
- Lf0*: decreased tension, more vibrating mass (lower f0/pitch)
- {A ↔ B}: states A and B are synergistic (mutually compatible articulation)
- {A … B}: states A and B are anti-synergistic (in articulatory conflict)
* = physiological state of the vocal tract