Since the research of Dorian (1973, 1994), languages undergoing language shift have often been noted to show greater category-internal and social variation. Many studies linking language shift to increased variability (e.g., Cook, 1989; Holloway, 1997) are based on speakers who are, according to some authority, not fully fluent in the language at hand. This has prompted explanations of high variability in terms of incomplete language acquisition. However, in other language shift contexts, even speakers who fully acquired the language show high variability; for example, Hellwig and Schneider-Blum (2014) attributed high variability in vowel productions by fluent Tabaq speakers to migration and contact with Arabic rather than incomplete acquisition. The concept of “language attrition” (a decrease in L1 exposure following successful acquisition) has been proposed as an alternative explanation. Bird (2008), for one, argued that increased variability in the realization of laryngealization in St’át’imcets resonants is due to reduced exposure to St’át’imcets on the part of its speakers.
The acquisition of a second grammatical system, typically of a socioeconomically dominant language, has also been argued to prompt changes in shifting languages (Campbell & Muntzel, 1989). While the idea that an L1 and an L2 can influence each other has often been proposed and supported (e.g., Flege, 1987), convergence is not automatic: Torres Cacoullos and Travis (2018:203–4) and Stanford (2008) showed that morphosyntax and phonetics can be resistant to change in situations of long-term language contact.
This study investigates the effects of reduced L1 exposure (operationalized as residence away from an L1-dominant community) and multilingualism on two developments in vowel productions in Sanapaná (ISO639-3: spn, Enlhet-Enenlhet, Paraguay): (1) movement of /e, o/ in the typologically rare /e, a, o/ system toward the high corners of the vowel space and (2) increased intracategory variability of /e, o/. Sanapaná provides an ideal context to investigate the effects of these factors independently.
Language shift among the Sanapaná is fairly advanced: only a third of the ethnic group still speak Sanapaná. Those who still speak the language vary on both parameters of interest. Regarding multilingualism, there is a continuum from near-monolingual Sanapaná speakers to speakers also fluent in Paraguayan Guaraní and Paraguayan Spanish, both of which have high vowels (Table 1). Regarding L1 exposure/immersion, there is a continuum from speakers who use Sanapaná for daily communication to speakers who live away from their home community and hardly use Sanapaná. Crucially, the two continua are not strongly correlated with each other.
This paper has three goals. First, it presents the first acoustic study of the Sanapaná vowel system. Second, it tests predictions of an exemplar-based model of phonetics and phonology regarding the effects of language shift on intracategory variability and vowel quality in an /e, a, o/ system (Section “Exemplar Theory, Language Shift, and Multilingualism”). Third, it tries to separate two manifestations of language shift (being multilingual and losing L1 exposure) from each other and from age as predictors for change. The data are limited in sample size and register diversity (with data only from word list repetitions) due to the logistical challenges of fieldwork in small-scale Indigenous communities (see, for example, Kasstan, 2019:699). Nevertheless, such data from underrepresented communities form a valuable testing ground for hypotheses based on data from better-resourced languages.
The Sanapaná and their language
Sanapaná, one of the six Enlhet-Enenlhet languages, is spoken by around one thousand of the 2,500 Sanapaná people (autodenomination /nenɬet/; DGEEC, 2012).
Sociolinguistic situation
The Enlhet-Enenlhet peoples started the process of language and culture shift in the late nineteenth century. Because of the invasion of their hunting grounds in the Gran Chaco, many moved to factory towns on the Paraguay River. There, they adopted Paraguayan Guaraní for communication with other indigenous groups and Latinos (Villagra & Bonifacio, 2015). Nowadays, the Sanapaná mostly live in seven rural communities in the Presidente Hayes department. In La Esperanza (where this study was conducted, Figure 1) and Anaconda, over 85% of the population speaks Sanapaná (DGEEC, 2012). Only here is the language still transmitted, largely as children's L1. In Nueva Promesa, nearly 60% of the population speaks Sanapaná. In Laguna Pato, Xákmok Kásek, Karandilla Poty, and Karanda'y Puku, less than 15% use Sanapaná with any frequency. These communities form a continuum from incipient to nearly complete language shift.
Even in La Esperanza and Anaconda, the language is under pressure, and speakers vary in their degree of multilingualism and exposure to Sanapaná. Some (mostly older, female) speakers only have passive knowledge of Spanish and Guaraní, but most speakers are fairly fluent in at least one of the two. Formal schooling is conducted in Spanish and Guaraní. Many Sanapaná people across age groups also spend long stretches of time working on Mennonite landholdings or in nearby cities (Loma Plata and Filadelfia, Figure 1), where they have little opportunity to use Sanapaná. Younger generations often move to these cities for education. It is therefore not uncommon for people to be immersed in an L2 environment for weeks or months at a time.
Overall, Sanapaná speakers are using Guaraní and Spanish in ever more social domains. While a situation of stable multilingualism where Sanapaná continues to be used in the home could arise, Guaraní seems to be gaining ground even there. When a Sanapaná speaker marries a Guaraní speaker (e.g., someone from another Sanapaná community), Guaraní is usually spoken in the home.Footnote 1
Sanapaná vowel system
Sanapaná has three contrastive vowels: mid front /e/, mid back /o/, and low central /a/ (Gomes, 2013; Van Gysel, 2017). Unlike related Enlhet Norte (Unruh & Kalisch, 1997) and Enxet Sur (Elliott, 2021), Sanapaná has no vowel length distinction. Figure 2 illustrates the system based on three near-monolingual speakers. Formant values are not normalized; ellipses span 1.5 standard deviations around the category mean.
The description of the nonlow vowels as mid is supported by their mean F1. At around 500 Hz (men) and 500–600 Hz (women), they are closer to the average F1 of 475 Hz listed for Spanish /e, o/ by Pasquale (2009:246) than to the F1 of 300 Hz for Spanish /i, u/. These mean F1 values for Sanapaná /e, o/ also approximate values for Mexico City Spanish /e, o/ (Figure 3; no measurements were found for Paraguayan Spanish).
This system is typologically uncommon on two counts. First, small inventories are rare in crosslinguistic databases of phoneme inventories: only forty-nine of the 1,672 languages (2.9%) in the Phonetics Information Base and Lexicon (PHOIBLE) and nineteen of the 317 languages (6%) in the UCLA Phonological Segments Inventory Database (UPSID) have a vowel system with three or fewer contrasting qualities (Moran, McCloy, & Wright, 2014; Schwartz, Boë, Vallée, & Abry, 1997). Second, the exact vowel qualities are unexpected. Over 90% of vowel spaces in the world's languages contain /i, a, u/ (Vallée, 1994). Of the forty-nine small systems in PHOIBLE, only four are like Sanapaná in having no high vowels.
Preliminary observations
Gomes (2013:108–11) noted that Sanapaná /e/ and /o/ vary from [ɛ] to [i] and from [ɔ] to [o], respectively. During fieldwork, I rarely encountered [ɛ] or [ɔ]-like productions but did notice variability in vowel productions, with raising and fronting of /e/ and raising and backing of /o/.
Smaller vowel systems have been found to typically allow more (coarticulatory) variability than larger systems, because, in the latter, variability increases the risk of category overlap (Manuel, 1990). Therefore, one might expect high intracategory variability of /e, o/ to be a stable feature of Sanapaná. Raising and fronting (of /e/) or backing (of /o/) could be internal developments as well. Given the rarity of /e, a, o/ systems, and Liljencrants and Lindblom's (1972) Dispersion Theory, one might hypothesize movement of /e, o/ toward the periphery to be functionally motivated by contrast maximization. In this case, one would expect raising to be predicted by date of birth, but not by factors specifically related to language shift.
However, one could also hypothesize external motivations for both developments. Since increased variability has often been related to language attrition (e.g., Cook, 1989), one might expect speakers immersed in an L2 environment to show greater variation in vowel productions than daily users of Sanapaná. One might also expect multilingual speakers to show convergence with their L2 vowel systems, which contain high vowels,Footnote 2 and therefore show higher and fronter /e/ and higher and backer /o/ than monolinguals. An exemplar-theoretic model of phonetic and phonological knowledge predicts exactly these two developments in the scenario of language shift from Sanapaná to Guaraní/Spanish.
Exemplar theory, language shift, and multilingualism
Exemplar theory was introduced into linguistics by researchers in the usage-based framework (Bybee, 2001; Pierrehumbert, 2001). Its adoption was prompted by observations of frequency-based, word-specific variation (e.g., Hooper, 1976).
Perception and production in exemplar theory
Exemplar theory posits that perceived utterances are stored in memory with rich phonetic and contextual detail. Such stored “exemplars” establish memory links with each other based on perceptual and contextual similarity, forming linguistic and social categories as emergent schemas over connected exemplars (Bybee, 2001:7).
In perception, hearers make categorization judgments to interpret utterances. Existing exemplars in the perceptual representation have an activation level, determined contextually (based on their similarity to incoming tokens) and by their overall frequency and recency (Pierrehumbert, 2001:141). Categories with many highly activated exemplars close to a new token have a greater chance of “winning the competition” for its categorization (Kruschke, 1992). Because of the continuous storage of new exemplars, strengthening of frequent ones, and decay of infrequent ones, categories are inherently dynamic (Elliott & Anderson, 1995; Pierrehumbert, 2001:140).
When producing a category, all exemplars in the production representation within a fixed-size neighborhood around a randomly selected exemplar of this category are weighted by activation level and averaged to form a production target (Pierrehumbert, 2001:150; Rosenbaum, Engelbrecht, Busche, & Loukopoulos, 1993). Frequent, prototypical, and contextually appropriate exemplars are most strongly activated and therefore influence this averaging most, drawing productions toward the category prototype. This mechanism, entrenchment, constrains variation in production over time (Pierrehumbert, 2001:149–52).
Rather than understanding speech production and perception as separate cognitive processes, recent arguments have been made in favor of unified perceptual and production representations (e.g., Wedel, 2004). Support for this position comes from, for example, phonetic convergence between interlocutors (Pardo, 2006), and neuroimaging studies observing activation in motor areas of the brain during perception, and in areas closely related to speech perception during production (e.g., Casserly & Pisoni, 2010; Tourville, Reilly, & Guenther, 2008). However, it has also been shown that neural overlap between perception and production systems is partial (e.g., Grabski, Schwartz, Lamalle, Villain, Vallée, Baciu, Le Bas, & Sato, 2013), and that learning in L2 production and perception does not always progress in tandem (e.g., Baese-Berk, 2019). This paper remains agnostic as to whether perceptual and production representations are separate or unified. My argumentation will only assume that they are connected and, specifically, that exemplars of perceived tokens can be activated to some extent during production, therefore influencing production targets and outputs.
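The categorization competition described above can be illustrated with a toy sketch. It is not the formulation of any of the cited models: the exponential similarity kernel, the `width` parameter, the function name `categorize`, and the example coordinates are all assumptions made for illustration only.

```python
import math

def categorize(token, exemplars, width=1.0):
    """Return the label whose exemplars receive the highest summed activation.

    token: (height, advancement) pair for the incoming token.
    exemplars: list of (label, (height, advancement), strength) triples,
    where strength stands in for frequency and recency (hypothetical).
    """
    scores = {}
    for label, position, strength in exemplars:
        distance = math.dist(token, position)
        # Assumed kernel: activation falls off exponentially with distance.
        activation = strength * math.exp(-distance / width)
        scores[label] = scores.get(label, 0.0) + activation
    return max(scores, key=scores.get)

# Hypothetical three-exemplar memory: two /e/ tokens and one /a/ token.
cloud = [
    ("e", (9.0, 1.5), 1.0),
    ("e", (9.5, 1.2), 1.0),
    ("a", (6.0, 3.0), 1.0),
]
label = categorize((10.5, 1.0), cloud)
```

In this toy memory, a token that is higher than any stored /e/ exemplar is still labeled /e/, simply because it lies much further from the only competing category.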
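The averaging step can be sketched as follows. This is a deliberately reduced, one-dimensional illustration under stated assumptions: the function name `production_target`, the uniform activations, and the neighborhood radius are hypothetical, and the cited models operate over richer phonetic spaces.

```python
import random

def production_target(exemplars, radius, rng):
    """Activation-weighted mean of the neighborhood of a random exemplar.

    exemplars: list of (value, activation) pairs on one phonetic dimension.
    radius: half-width of the fixed-size neighborhood (assumed parameter).
    """
    seed_value, _ = rng.choice(exemplars)
    neighbors = [(v, a) for v, a in exemplars if abs(v - seed_value) <= radius]
    total = sum(a for _, a in neighbors)
    return sum(v * a for v, a in neighbors) / total

# Hypothetical cloud spanning 0..4 with equal activation levels.
cloud = [(float(v), 1.0) for v in range(5)]
rng = random.Random(0)
targets = [production_target(cloud, radius=1.0, rng=rng) for _ in range(200)]
```

Even when an exemplar at the edge of the cloud is selected, averaging over its neighbors pulls the target inward, so the range of targets is narrower than the range of stored exemplars. That narrowing is the entrenchment effect in miniature.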
Multilingualism in an exemplar-based approach
The Speech Learning Model (Flege, 1987) of L2 acquisition predicts new categories to be established for L2 sounds perceptually and acoustically different from L1 categories. For L2 categories perceptually similar or identical to L1 categories, no new category is established: the L2 sound is categorized in terms of the nearest L1 category (“equivalence classification,” Flege, 1987). Equivalence classification is expected to result in bidirectional, convergent change between the L1 and L2 categories if they are similar but not identical (Flege, 1995).
This prediction is compatible with exemplar theory. Various studies have shown that bilinguals’ L1 and L2 are typically both activated even in monolingual settings, in perception (Chen & Marian, 2017; Marian & Spivey, 2003) and production (Kroll, Bogulski, & McClain, 2012). If this is the case, perceived L2 tokens can conceivably be integrated in exemplar clouds of L1 categories, affect their emergent schemas, and affect the categorization of subsequently perceived L1 tokens. Similarly, L2 production representations would be activated to some degree during L1 production and would be able to influence production targets. In both processes, L1 tokens would presumably be more influential, as they would have stronger contextual activation levels. The activation level of L2 tokens would be expected to increase as the L2 is used more frequently. This mechanism would explain findings such as those of Major (1993), where L1 English speakers showed VOTs for English consonants approaching the norms of their L2, Portuguese, and Pasquale (2009), where the /ɪ, ʊ/ of L1 Quechua speakers showed raising toward [i, u] under influence of their L2 Spanish.
This model offers specific predictions for the Sanapaná case. Consider the vowel system of a monolingual Sanapaná speaker upon first exposure to Spanish/Guaraní vowel tokens (Figure 4; cf. Bird, 2008:395). The L1 vowels each have a distinct exemplar cloud, both in perception and production. These are densely populated and characteristic of frequent language use. Entrenchment draws productions toward the category means (interior, thick-line ellipses).
Perceived L2 mid front and back vowels are expected to fall generally within the range of variation represented in the Sanapaná /e, o/ exemplar clouds. Therefore, equivalence classification with Sanapaná /e, o/ is expected to apply, and they are not expected to significantly alter the structure of L1 exemplar clouds: L2 tokens mostly strengthen the existing L1 exemplars of which they are perceived to be instances. As for L2 /i, u/, it is not a priori clear whether new categories will be established or whether equivalence classification will apply, so that L2 /i, u/ tokens are categorized as (nonprototypical) instances of L1 /e, o/.
There are empirical and theoretical reasons to assume the latter outcome. Empirically, in many Spanish loans, /i, u/ are replaced by Sanapaná /e, o/:
Theoretically, the assimilation of L2 /i, u/ to L1 /e, o/ is not surprising either. Johnson, Flemming, and Wright (1993) showed that American English speakers tend to judge higher and fronter /i/ tokens as better instances of their category than tokens in the middle of the exemplar cloud: higher, fronter /i/ tokens are furthest away from all other vowel categories, increasing their likelihood of being categorized as /i/. In the Sanapaná case, L2 /i, u/ are somewhat close to /e, o/ but far away from the other L1 vowel exemplars, potentially prompting their categorization as /e, o/. Furthermore, it is not unheard of for L2 contrasts between high and mid vowels to be assimilated to the same L1 category. L1 Quechua speakers, for instance, do not categorically distinguish mid from high vowels in their L2 Spanish (Pérez Silva, 2017). The available evidence, then, points toward L2 /i, u/ tokens being categorized as (perhaps nonprototypical) instances of /e, o/ by Sanapaná speakers learning Spanish or Guaraní.
Figure 5 illustrates the hypothesized effect of Spanish/Guaraní fluency on a Sanapaná speaker's vowel exemplar clouds. In the perceptual representation, memory links are established between all perceived tokens labeled as L2 /i/, L2 /e/, and L1 /e/ (and mutatis mutandis for /o, u/). For multilinguals, the exemplar clouds linked to Sanapaná /e, o/ tokens therefore presumably extend higher and fronter, and higher and further back, respectively, than for monolinguals. Whether this also happens in production representations is an open question: accurate perception of an L2 contrast is not a necessary precondition for accurate production of this contrast (de Leeuw, Stockall, Lazaridou-Chatzigoga, & Gorba Masip, 2021). Nevertheless, if perceptual and production representations are linked, L2 /i, u/ exemplars from the perceptual representation would on this hypothesis be able to affect future L1 /e, o/ production targets and cause some convergence toward the L2 high vowels.
In production, upon initial L2 exposure, L1 tokens are still more numerous and more highly activated in terms of sheer frequency and contextual activation (speaking Sanapaná activates Sanapaná tokens more strongly than L2 tokens). They are more influential in the averaging process to establish production targets, keeping Sanapaná /e, o/ productions close to the original L1 clouds. Upon increased exposure to L2 /i, u/ tokens, the higher frequency of L2 exemplars in memory makes it more likely for them to have a high enough activation level to influence L1 productions by drawing them toward the upper corners of the vowel space.
As long as a speaker maintains frequent exposure to their L1, L1 exemplar clouds are not expected to decay. Entrenchment is expected to remain functional and constrain within-category variation. In the context of this study, the model laid out above thus predicts that Sanapaná speakers with Spanish/Guaraní proficiency will have higher and fronter /e/ and higher and backer /o/ productions in Sanapaná than monolinguals, but that they will not show greater within-category variability.
Language attrition in an exemplar-based approach
As language shift progresses, speakers typically use their L2 more and more in production and perception. Nevertheless, speakers of a minority language can be highly fluent in the dominant language but still use their community language frequently. Therefore, the effect of decreased L1 exposure (operationalized as residence away from an L1 community) on Sanapaná vowel productions warrants independent analysis. In an exemplar model, decreased L1 exposure is expected to cause a breakdown of entrenchment and to result in more intracategory variability.
Upon losing L1 exposure, the rate at which perceived L1 tokens strengthen L1 categories decreases and may become lower than the rate of L1 exemplar decay, causing existing L1 perceptual exemplars to have a lower activation level and to eventually get lost. In parallel, as someone speaks their L1 less and less often, the rate at which L1 production exemplars are strengthened may become lower than their rate of decay. When a peripheral exemplar is randomly selected in the production process, there are few highly activated exemplars within the averaged region of exemplar space. Averaging does not draw the production target toward the category prototype, and an atypical output is produced. Such an exemplar model without entrenchment causes intracategory variation to increase (Pierrehumbert, 2001:144–6). The loss of entrenchment is not by itself expected to shift the central tendency of the categories whose exemplars are lost: for a category that is not influenced by a nearby L2 category, no such shift is predicted.
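The balance between strengthening and decay can be made concrete with a small bookkeeping sketch. It simplifies the account above in several ways that are assumptions of this illustration, not claims of the model: new tokens are stored as fresh exemplars instead of strengthening existing ones, decay is exponential with a hypothetical time constant `tau`, and exemplars are lost once they fall below a hypothetical `threshold`.

```python
import math

def surviving_exemplars(tokens_per_step, steps, tau=10.0, threshold=0.1):
    """Count exemplars still above the activation threshold after `steps`."""
    decay = math.exp(-1.0 / tau)
    activations = []
    for _ in range(steps):
        # Older exemplars lose activation; those below threshold are lost.
        activations = [a * decay for a in activations if a * decay >= threshold]
        # Newly experienced tokens enter memory at full activation.
        activations.extend([1.0] * tokens_per_step)
    return len(activations)

# Hypothetical exposure rates for a daily user and an infrequent user.
daily_use = surviving_exemplars(tokens_per_step=5, steps=100)
rare_use = surviving_exemplars(tokens_per_step=1, steps=100)
```

Under these assumptions the infrequent user ends up with a much sparser cloud, which is the precondition for the breakdown of entrenchment described above.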
Figure 6 represents the hypothesized vowel exemplar clouds of a Sanapaná speaker who no longer frequently uses Sanapaná and is immersed in an L2 environment. L1 /e, o/ exemplars in the perceptual and production representation have decayed. Consequently, exemplar clouds for front and back vowels are less dense, and entrenchment cannot work as it does in Figures 4–5. Averaging over L1 /e, o/ tokens in the production or perceptual representation no longer draws production targets toward the category prototype. This mechanism leads us to the hypothesis that multilinguals who do not use Sanapaná frequently will show increased intracategory variability in Sanapaná /e, o/ productions compared with their monolingual peers and multilinguals who do use Sanapaná frequently.
Furthermore, it is mainly exemplars of Sanapaná /e, o/ that decay, while perceptual and production representations of L2 /i, e, o, u/ are still frequently strengthened. Therefore, the proportion of high vowels to mid vowels represented in these clouds increases, as do the former's activation levels. The chance that a higher exemplar is chosen as a production target for /e/ or /o/ increases, as does the influence exerted by higher tokens in the averaging process. Consequently, this model also predicts that multilinguals with low Sanapaná exposure will show higher and fronter /e/ and higher and backer /o/ in Sanapaná productions than monolinguals and multilinguals who do use Sanapaná frequently.
Research hypotheses
This exemplar-theoretic interpretation of current knowledge on second language acquisition, language attrition, and the language contact situation of Sanapaná leads to the following hypotheses regarding the effects of multilingual fluency and decreased L1 exposure (operationalized in this paper as residence in or away from a Sanapaná-speaking community) on the central tendency and variability of Sanapaná /e, o/:
1. Multilingual Sanapaná speakers will show more peripheral mean /e, o/ productions than monolinguals.
2. Multilingual Sanapaná speakers will not show greater intracategory variability in /e, o/ productions than monolinguals.
3. Speakers immersed in an L2 environment will show more peripheral mean /e, o/ productions than speakers immersed in an L1 environment.
4. Speakers immersed in an L2 environment will show greater intracategory variability in /e, o/ productions than speakers immersed in an L1 environment.
In previous studies on language shift (e.g., Maddieson, Avelino, & O'Connor's [2009] study of variability in glottalized lateral fricatives in Oaxaca Chontal), speakers’ position on a language shift continuum is often operationalized through age as a predictor of language change. However, younger speakers are not automatically also more multilingual or more mobile. Therefore, I treat multilingualism, loss of L1 exposure, and date of birth (operationalized as speaker age) as independent predictors. If language shift can indeed be analyzed successfully through the effects of multilingualism and loss of L1 exposure, one would not expect age to have an independent effect on Sanapaná vowel productions. Therefore, the following hypotheses are tested as well:
5. There will be no effect of age alone on the position of /e, o/, once multilingualism and L1 exposure have been factored in.
6. There will be no effect of age alone on intracategory variability, once multilingualism and L1 exposure have been factored in.
Should such an independent effect of age be found anyway, this could point toward an internally motivated change in progress observable in apparent time.
Methodology
A wordlist reading task was conducted with eleven speakers whose lives are centered on La Esperanza, where Sanapaná is the main language of daily communication. They differ, however, in how much time they have spent in this community in recent years.
Data collection
Data were collected over five weeks in La Esperanza (June-July 2018). Three participants are near-monolinguals living in La Esperanza full time. They were found by asking around for people who did not know any Spanish or Guaraní and only ever spoke Sanapaná. Communication in Spanish between them and me required an interpreter, and they self-reported not knowing any other language than Sanapaná. Four participants reported being proficient in Spanish and/or Guaraní, living in La Esperanza, and having lived there for at least the last three years—they have daily exposure to Sanapaná. Four participants are multilinguals living away from La Esperanza: one had been living in the city of Loma Plata for nine years, one had been living in La Huerta, an agricultural school, for three years, and two had spent most of the last three years living on Mennonite landholdings.
Table 2 presents the participants’ place of residence, multilingual fluency, age, and gender. Age and gender were not sampled in a balanced way.Footnote 3
Participants produced three repetitions of each item on a list of fifty words, mostly consisting of plants, animals, and body parts (Appendix A). Both /e/ and /o/ occurred in the last syllable, in a C_CVelar# frame (velars are the most common word-final codas). Preceding contexts were not balanced, as care was taken to limit the list largely to basic vocabulary—preceding contexts were labial, coronal, or /h/ for /e/, and labial, coronal, or dorsal for /o/. Since the last syllable of words in isolation (i.e., the target syllable) typically has most prominence, target words were not inserted in carrier phrases.
Seven participants read the list from a sheet of paper. The four others (three monolinguals, one multilingual) were not comfortable reading the words and repeated them after one of three native speaker assistants (two men, one woman; all multilingual and living in La Esperanza). Priming may have resulted in monolingual participants converging with the normal speech patterns of the multilingual assistants, reducing the difference between monolinguals and multilinguals. On the other hand, a reviewer notes that performing the role of expert language worker may have favored more conservative pronunciations by the assistants, which may in turn have primed monolinguals to some extent.
Recordings mostly took place in the house where I stayed. Since this was out of the way for some participants, and some objected to working indoors, four sessions took place at the quietest available outside location. A Movo WMIC50 wireless lavalier microphone with wind shield was attached to each participant's shirt and recorded in .WAV format onto a Zoom H4N Pro recorder. Recordings were made at a 44.1 kHz sampling rate and 16-bit depth.
Coding
Target vowels were segmented in Praat (Boersma & Weenink, 2018). The start of each vowel was labeled at the onset of voicing when preceded by an obstruent (/p, t, k, s, ɬ/). After approximants or nasals, parts of the acoustic signal were spliced off little by little to find the start of the vowel (see Thomas, 2011:142). The end of each vowel was labeled at the end of F2 energy or periodicity in the waveform, whichever came first. The first three formants and corresponding bandwidths were extracted at the midpoint of each vowel using a Praat script. The formant extraction command was set to a frequency range of 5500 Hz (women) or 5000 Hz (men), and five formants. For tokens with high bandwidths (B1 over 300 Hz, B2 or B3 over 500 Hz) or formant values further than 1.5 standard deviations from the speaker's category mean, the formants were checked manually, adjusting the frequency range and number of formants searched for. Because of noise in the recording, the production of an unexpected target syllable, or an unclear F3 trace (especially for /o/), 17.5% of tokens were discarded.Footnote 4 Table 2 above shows the number of remaining tokens per participant.
Two of the three Sanapaná vowels are target variables in this study, so there were not enough stable reference points to perform vowel-extrinsic normalization. Using NORM (Thomas & Kendall, 2017), the Bark Difference Metric method was therefore applied to obtain measures of perceptual height (Bark-converted F3 minus Bark-converted F1) and advancement (Bark-converted F3 minus Bark-converted F2). Higher values correspond to perceptually higher and backer vowels. Formula (1) was used for Bark-conversion (Traunmüller, 1997).
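The screening criteria for manual checking can be expressed as a short filter. This is a sketch, not the Praat script used in the study: the dictionary layout, the function name `needs_manual_check`, the use of the sample standard deviation, and the inclusion of the token itself in the category mean are all assumptions.

```python
from statistics import mean, stdev

def needs_manual_check(token, category_tokens, sd_limit=1.5):
    """Flag a token for manual formant inspection.

    token and each item of category_tokens are dicts holding F1, F2, F3,
    B1, B2, B3 in Hz for one speaker and one vowel category (hypothetical
    layout).
    """
    # Bandwidth criterion: B1 over 300 Hz, B2 or B3 over 500 Hz.
    if token["B1"] > 300 or token["B2"] > 500 or token["B3"] > 500:
        return True
    # Distance criterion: any formant beyond 1.5 SD of the category mean.
    for formant in ("F1", "F2", "F3"):
        values = [t[formant] for t in category_tokens]
        if abs(token[formant] - mean(values)) > sd_limit * stdev(values):
            return True
    return False

# Hypothetical /e/ tokens for one speaker; only F1 varies here.
base = dict(F2=1800, F3=2600, B1=100, B2=200, B3=250)
tokens = [dict(base, F1=f1) for f1 in (500, 510, 490, 505, 495, 650)]
flagged = [needs_manual_check(t, tokens) for t in tokens]
```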
(1) $Z_i = \dfrac{26.81}{1 + 1960/F_i} - 0.53$
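For readers who want to reproduce the normalization outside NORM, formula (1) and the two Bark difference measures can be written out directly. The function names and the example formant values are hypothetical; only the conversion formula and the two subtractions come from the text.

```python
def hz_to_bark(f_hz):
    """Bark conversion as given in formula (1)."""
    return 26.81 / (1 + 1960 / f_hz) - 0.53

def bark_difference(f1, f2, f3):
    """Return (height, advancement) as (Z3 - Z1, Z3 - Z2)."""
    z1, z2, z3 = (hz_to_bark(f) for f in (f1, f2, f3))
    return z3 - z1, z3 - z2

# Hypothetical front-vowel token with formants in Hz.
height, advancement = bark_difference(f1=500, f2=1900, f3=2600)
```

Lowering F1 raises the height value and lowering F2 raises the advancement value, which matches the statement that higher values correspond to perceptually higher and backer vowels.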
Since /e, o/ are hypothesized to be moving both vertically and horizontally along the diagonals of the vowel space (converging with /i/ and /u/, respectively), their height and advancement measures were combined into one value. Following Dodsworth (2013), Bark-normalized height is subtracted from Bark-normalized advancement for the overall position of /e/ in perceptual space, and Bark-normalized height is added to Bark-normalized advancement for /o/. In both cases, higher absolute values indicate more peripheral tokens.
(2) $d(V_i, V_{\bar{\imath}}) = \sqrt{(F1_i - F1_{\bar{\imath}})^2 + (F2_i - F2_{\bar{\imath}})^2}$
I follow Lane, Denny, Guenther, Matthies, Menard, Perkell, Stockmann, Tiede, Vick, and Zandipour (2005) in using the Euclidean distance between a token and the speaker-specific category mean (“vowel dispersion”) to model within-category variability, as in formula (2), where V is a vowel category, i is the i-th token of this category, and ī is the speaker-specific category mean. This Euclidean distance calculation makes it possible to capture variability in the F1 and F2 dimensions as one single variable. Dispersion was calculated over the normalized height and advancement values.
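The combination of the two measures into one diagonal value can be sketched as below. This follows the description in the preceding paragraph only; the function name `diagonal_position` and the example values are hypothetical.

```python
def diagonal_position(vowel, height, advancement):
    """Combine Bark-difference height and advancement into one value.

    For /e/ height is subtracted from advancement; for /o/ it is added.
    """
    if vowel == "e":
        return advancement - height
    if vowel == "o":
        return advancement + height
    raise ValueError("only /e/ and /o/ are target vowels")

# Hypothetical tokens: a more peripheral and a less peripheral /e/ and /o/.
e_peripheral = diagonal_position("e", height=10.0, advancement=1.0)
e_central = diagonal_position("e", height=9.0, advancement=1.5)
o_peripheral = diagonal_position("o", height=10.0, advancement=5.0)
o_central = diagonal_position("o", height=9.0, advancement=4.0)
```

For /e/ the values are negative, so it is their absolute size that grows as a token becomes higher and fronter.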
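Formula (2) applied to the normalized values amounts to the following computation. The function name `dispersions` and the example tokens are hypothetical; the input is assumed to be the (height, advancement) pairs of one speaker for one vowel category.

```python
import math
from statistics import mean

def dispersions(tokens):
    """Euclidean distance of each token from the speaker's category mean.

    tokens: list of (height, advancement) pairs for one speaker and vowel.
    """
    center = (mean(h for h, _ in tokens), mean(a for _, a in tokens))
    return [math.dist(token, center) for token in tokens]

# Hypothetical tokens placed symmetrically around (1.0, 1.0).
example = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
distances = dispersions(example)
```

Each token receives its own dispersion value, so the measure can serve as a token-level dependent variable in a regression.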
Statistical analysis
Four mixed-effects regression models were set up in R (R Core Team, 2018) using the lmer() function (Bates, Maechler, Bolker, Walker, Bojesen, Singmann, Dai, Scheipl, Grotendieck, Green, & Fox, 2019) for dispersion of /e/ and /o/ and position of /e/ and /o/ along the front or back diagonal. The following language-internal fixed effects were added:
– Vowel duration (in log milliseconds, centered)
– Preceding context (labial, coronal, or /h/ for /e/; labial, coronal, or dorsal for /o/)
Preceding context is added to capture any co-articulatory effects of the preceding consonant. Duration is added to compensate for any prosodic effects. Furthermore, shorter vowels may be expected to show undershoot effects (Lindblom & Moon, 1989), that is, to show a smaller opening gesture and, therefore, be higher. These factors interact with the sociolinguistic factors of interest in ways that are not easily explained (see section “Results”). Language-external factors in the models, besides multilingualism and place of residence, included age. However, there may not be a wide enough age range represented within each factor level of multilingualism and place of residence to assess the independent effect of age. Also included was gender. Though women have often been found to adopt prestigious variants more readily than men of the same social class (e.g., Labov, 1990), it is not clear what the relations of gender groups in La Esperanza to forms of linguistic prestige are.
All possible two-way interactions were tested, except for those of place of residence with multilingualism and with gender (there were no monolingual or female participants living away from La Esperanza) and that of multilingualism and gender (this intersection creates subgroups of one participant, making it impossible to use by-speaker random intercepts). Random intercepts were added for speaker and target word. Upon each round of backward selection, the predictor with the highest p-value was removed from the model, unless it was part of a significant interaction, until only factors significant at the p = 0.05 level remained. At this point, overly influential observations were removed (see footnote 5), and the model was double-checked. Each model was bootstrapped with the validate() function (Harrell, 2019), taking 2000 resamplings of a number of observations equal to the size of the original dataset (with random replacement) to judge the robustness of the effect of predictors indicated as significant in the original regression model (see footnote 6).
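The resampling logic behind the bootstrap step can be sketched as follows. This is not the validate() procedure and does not fit a mixed-effects model; as a stand-in, it re-estimates a simple least-squares slope on each resample. All names and data are hypothetical; only the scheme (2000 resamples, each the size of the original dataset, drawn with replacement) comes from the text.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (stand-in for refitting the full model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=2000, seed=1):
    """Re-estimate the slope on resamples drawn with replacement,
    each with as many observations as the original dataset."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with no variation in x
        slopes.append(ols_slope(bx, by))
    return slopes


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval over the bootstrap estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]
```

An effect would count as robust under this sketch if the percentile interval of its bootstrap estimates excludes zero; whether that matches the criterion applied in the study is not stated in the text.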
Results
Figure 7 shows the differences between the normalized vowel productions of monolinguals (MO), multilinguals living in La Esperanza (MU1), and (for men) multilinguals living elsewhere (MU2). The ellipses show a region of 1.5 standard deviations around the category mean. Tokens of /a/ from all groups are averaged.
Especially for men, predictions regarding the position of /e, o/ seem to be borne out. Multilingual men seemingly have more peripheral /e, o/ than monolinguals, and this tendency is greater for those living away from La Esperanza. Women's /e/ productions show the same tendency. The four following subsections present the regression models for dispersion and position of /e, o/, always discussing first the external variables and their interactions, and then the remaining internal variables.
Dispersion of /e/
The model for dispersion of /e/ is based on 726 F1/F2 measurements (ten outliers discarded, Appendix B). After bootstrapping, no sociolinguistic variables significantly affect dispersion. Token duration does not affect dispersion either. Following coronals (n = 639), /e/ tokens are on average 0.57 Bark (53 Hz) away from the category mean in Euclidean distance units (see footnote 7). Following /h/ (n = 62), dispersion is not significantly different (p = 0.14). Following labials (n = 25), /e/ shows a near-significant trend toward greater dispersion: tokens are on average 0.75 Bark (69 Hz) away from the mean (p = 0.05).
Dispersion of /o/
The model for dispersion of /o/ is based on 611 measurements (three outliers discarded, Appendix C). The final model contains four pairwise interactions (see footnote 8). Multilingualism, firstly, interacts significantly with preceding context and age (Figures 8 and 9).
Following coronals, tokens by multilinguals (n = 41) are on average 0.91 Bark away from the category mean. Tokens by monolinguals (n = 17) show a near-significant trend toward being further away from the mean (p = 0.05, 0.28 Bark). Following labials, multilinguals have /o/ 0.69 Bark (64 Hz) further away from the mean than expected (n = 91 for monolinguals, n = 207 for multilinguals). After dorsals, multilinguals have /o/ 0.57 Bark (53 Hz) further away from the mean than expected (n = 63 for monolinguals, n = 192 for multilinguals). As shown in Figure 9, both monolinguals and multilinguals show greater dispersion as they grow older. This effect is, however, greater for monolinguals (the slope is 0.02 Bark/year greater).
Place of Residence, secondly, has a significant main effect on the dispersion of /o/. Residents of La Esperanza (n = 421) have /o/ on average 0.11 Bark (10 Hz) closer to the category mean than speakers living elsewhere (n = 190).
The effect of gender interacts significantly with those of preceding context and age. For tokens following coronals, women show an average dispersion of 1.18 Bark (n = 17). For men, dispersion does not differ significantly (n = 41). Following labials, productions by women are on average 0.53 Bark (49 Hz) further away from the mean than expected (n = 219 for men, n = 79 for women). Following dorsals, tokens by women are on average 0.34 Bark (32 Hz) further away from the mean than expected (n = 190 for men, n = 65 for women). The slope for the effect of age is 0.02 Bark per year greater for men than for women. While younger women have /o/ on average further away from the category mean than men their age, this gender effect weakens as speakers age.
Token duration does not significantly affect dispersion of /o/.
Position of /e/
The model for position of /e/ is based on 726 tokens (ten outliers discarded, Appendix D). Place of residence is the only significant external factor. Residents of La Esperanza (n = 476) have /e/ productions on average 0.80 Bark (74 Hz) less peripheral on the front diagonal than those living elsewhere (n = 250). Preceding context also significantly predicts the position of /e/. Tokens following labials (n = 23) show an average value of 8.7 Bark; tokens following coronals (n = 643) are on average 0.62 Bark (58 Hz) more peripheral. Tokens following /h/ (n = 60) are another 0.52 Bark (48 Hz) more peripheral than tokens following coronals.
Position of /o/
The model for position of /o/ is based on 612 measurements (two outliers discarded, Appendix E). There are four pairwise interactions. The effect of multilingualism interacts with those of preceding context and duration (Figures 10 and 11).
Following dorsals, tokens of /o/ by monolinguals (n = 63) and multilinguals (n = 192) do not have a significantly different position. Following labials and coronals, multilinguals have /o/ more peripheral than expected: 0.64 Bark (60 Hz) in the former context (n = 91 for monolinguals, n = 206 for multilinguals); 0.85 Bark (79 Hz) in the latter (n = 17 for monolinguals, n = 42 for multilinguals).
For multilingual speakers, a token's duration does not significantly predict its position, while for monolinguals the value for the duration slope is 2.67 Bark lower per unit of logged duration. So, for monolinguals only, longer /o/ tokens are less peripheral than shorter tokens. Place of residence also interacts with preceding context (Figure 12), such that following dorsals, speakers living away from La Esperanza have /o/ 0.87 Bark (81 Hz) more peripheral than expected (n = 172 for La Esperanza residents, n = 83 for others).
Gender has a significant main effect on the position of /o/: women have an average value of 17.74 Bark, that is, 1.67 Bark (156 Hz) more peripheral than men.
The last significant interaction is between age and token duration (Figure 13). Older speakers generally have more peripheral /o/ than younger speakers (positive slopes in Figure 13). However, the slope for longer tokens, in the right-hand panel, is steeper than that for shorter tokens, in the left-hand panel, indicating a stronger effect of age for longer tokens. In other words, older speakers show more peripheral /o/ across the board, but the difference between older and younger speakers is largest for longer tokens. With each additional unit of logged duration, the slope for age becomes 0.06 Bark per year greater.
Discussion: Variability and convergence in language shift scenarios
The previous section showed that different social groups of Sanapaná speakers show different degrees of variability in their vowel productions and have their category means at different positions in the vowel space. The interactions between social factors and language-internal factors such as vowel duration and preceding context may lead to a better understanding of the origins of social differences in terms of articulatory processes, but the following subsections focus exclusively on the effects of multilingualism, place of residence, and age.
Multilingualism and place of residence
Multilingualism and/or place of residence significantly influence three of the four outcome variables (Table 3). The effect of place of residence on vowel quality is unequivocal: speakers living away from La Esperanza have more peripheral /e/ and /o/ in all contexts. They also show greater dispersion of /o/ (but not /e/) in all contexts. Together, these data confirm hypotheses 3 and 4. This supports the exemplar theoretic predictions that the decay of L1 mid vowel exemplars would cause entrenchment to break down (increasing intracategory variability), and that it would cause L2 high vowel exemplars linked to the /e, o/ clouds to become more influential in establishing production targets, moving /e, o/ toward the upper front and back corners of the vowel space.
The effect of multilingualism is less consistent. Multilingual speakers show neither different dispersion nor a different position in the vowel space for their /e/ productions than monolinguals. For /o/, multilinguals show higher and backer productions than monolinguals after labials and coronals, but not dorsals, partially supporting exemplar theoretic hypotheses. Monolinguals (but not multilinguals) also show lower and fronter tokens at longer durations. All in all, these findings partially support hypothesis 1: multilinguals do not consistently show more peripheral /e, o/, but wherever a significant difference between monolinguals and multilinguals is found it is in the expected direction.
Multilinguals also show greater dispersion than monolinguals after labials and dorsals, but they show a trend toward lower dispersion following coronals. Should either of these tendencies toward differences in variability between monolinguals and multilinguals be confirmed upon further data collection, it would contradict the exemplar-theoretic prediction that multilingualism should not independently affect within-category variability (hypothesis 2).
In sum, the exemplar-theoretic interpretation of current models of language shift proposed here seems to account for the effects of language attrition on Sanapaná vowels. It confirms Bird's (2008) findings that speakers with reduced L1 exposure show increased intracategory variation, presumably through the decay of exemplars associated with L1 categories. It goes beyond Bird's (2008) predictions in explaining how decreased L1 exposure may cause further convergence with the L2 than mere fluency in this contact language. In fact, this study does not conclusively support convergence to L2 categories as a necessary outcome of L2 fluency when not accompanied by long-term removal from a majority-L1 environment. This suggests that even though Spanish and Guaraní expand into ever more social domains in La Esperanza, L1-to-L2 convergence in Sanapaná speakers may only come into play once the L1 is no longer frequently used.
Age
Speaker age only significantly affects dispersion and position of /o/. The effects of age go in the opposite direction from those of multilingualism and place of residence. Younger speakers show more central /o/ than older speakers, especially at longer durations—an unexpected development if we treat age as a proxy for language shift, since shifting speakers are expected to show more L2 influence (and therefore more peripheral /o/). Older men, older multilinguals, and older monolinguals all show greater dispersion than their younger counterparts. Women show the opposite effect. Such developments are unlikely to have resulted from a breakdown of entrenchment, as there is no a priori reason to assume that with equal L1 exposure older speakers would have less dense exemplar clouds than younger speakers. While these effects must be interpreted with caution due to the limited sample of this study, they may point toward ongoing language-internal developments unrelated to the language shift process.
Based on the data examined here, it then seems possible to tease apart the effects of multilingualism and decreasing L1 exposure on the phonetics and phonology of a language under pressure, and to study them independently. Using only age as a proxy for language shift, it would have been impossible to observe the different effects that multilingualism and place of residence have on Sanapaná vowel productions, and the hypotheses advanced by a usage-based model of language shift could not have been tested in as much depth as they were. The value of including age as a predictor independently from such factors relating in a concrete way to speakers’ linguistic experiences must be confirmed in further studies.
Summary and conclusions
This study has presented the first report of social variation in Sanapaná phonetics. It has investigated the influence of multilingualism, decreased L1 exposure, and age on variability in Sanapaná /e, o/ productions and on the quality of these vowels. The main findings are the following:
1. Multilingual Sanapaná speakers have more peripheral /o/ than monolinguals following coronals and labials.
2. Multilingual Sanapaná speakers do not consistently show greater intracategory variability in /e, o/ productions than monolinguals.
3. Sanapaná speakers living in a majority-L2 environment have more peripheral /e, o/ than speakers living in La Esperanza.
4. Sanapaná speakers living in a majority-L2 environment show greater intracategory variability in /o/ productions than speakers living in La Esperanza.
These findings have allowed me to argue for the use of multilingual fluency and degree of L1 exposure as proxies of advancement on the language shift continuum in variationist work on endangered languages as they seem to be theoretically and practically separable, at least in the current dataset. They may have different, motivated, and perhaps conflicting effects on processes of language change. Adding age to the analysis may help to tease out manifestations of any language-internal developments going on simultaneously. Although age was not sampled in a balanced way, and its effects must be interpreted with caution, the following effects may be taken as starting points for future studies on variation in Sanapaná vowels:
5. Younger speakers have more central /o/ productions than older speakers.
6. Younger speakers show less variability in /o/ productions than older speakers.
Studying multilingualism and degree of L1 exposure separately has allowed me to propose an extension to Bird's (2008) exemplar-theoretic account of language shift. I argue that both phonetic convergence by multilingual speakers and increased variability paired with even stronger convergence by speakers with reduced L1 exposure are predicted by an exemplar model that assumes that L2 tokens are perceived and categorized in the same way as L1 tokens. On the one hand, L2 exemplars linked in memory to similar L1 exemplars may cause convergence in productions by multilinguals. On the other hand, the decay of L1 exemplars and the increased proportion of stored L2 exemplars may account for the increased variability and further convergence in productions by speakers removed from an L1-dominant environment. The present data from Sanapaná support the predictions of this model as to the consequences of reduced L1 exposure. They are inconclusive regarding the predictions made about the consequences of multilingualism in itself: it is possible that multilingualism does not in itself affect Sanapaná vowel productions but only does so in conjunction with a decrease in L1 exposure.
The present findings bring up new questions. Multilingual Sanapaná speakers seemingly establish memory links between Spanish/Guaraní high vowel exemplars and their Sanapaná mid vowel exemplars. How different from L1 categories, then, does an L2 sound need to be in order to establish its own category without influencing L1 exemplar clouds? The nature of the effects of language-internal variables found here must also be investigated, as must the role of gender in the sociolinguistic landscape of Sanapaná. The interaction effects on vowel quality and dispersion containing preceding context, vowel duration, and gender may point toward articulatory origins of the sound changes discussed here and to different roles men and women play in the advancing process of language shift. A corpus of Sanapaná oral history is now available (Van Gysel, 2020). It makes it more realistic to gain deeper insights into the relation between language-internal, social, and typological factors in phonetic change in Sanapaná on the basis of naturalistic data, and to assess the potential of exemplar models of linguistic representation to account for them.
Acknowledgments
I thank the people of La Esperanza and its leaders, Marino Ortega and Sanlorenzo Cantero, for hosting and helping me throughout my fieldwork. I specifically thank all participants in this study and Cristino Benitez, Dionicia Echeverría, and Valenciano Cabrera for their help as language experts. I thank Caroline Smith, Chris Koops, Rosa Vallejos, and audiences at the 13th HDLS conference and the 3rd UC Berkeley Symposium on Amazonian Languages for feedback on earlier versions of this work. This work was partly funded by the Foundation for Endangered Languages, the University of New Mexico's Latin American and Iberian Institute / Tinker Foundation, and the Endangered Languages Documentation Programme SG0523.
Competing Interests
The author declares none.
Appendix A. Word List for Data Collection
Appendix B. Regression Model for Dispersion of /e/
Appendix C. Regression Model for Dispersion of /o/
Appendix D. Regression Model for Position of /e/
Appendix E. Regression Model for Position of /o/