Reflections of the French nasal vowel shift in orthography on Twitter

Published online by Cambridge University Press:  13 July 2022

James Law*
Affiliation:
Brigham Young University

Abstract

Non-standard orthography on social media provides a useful supplementary data source for sociophonetic research. Regarding an ongoing chain shift in Northern Metropolitan French nasal vowels, spellings reflecting shifted vowel targets are observed on Twitter. These non-standard spellings, e.g. avont [avɔ̃] for avant /avɑ̃/ ‘before’, provide insight into speakers’ awareness of this change and its lexical distribution. Tweets with shifted and standard spellings of 306 word forms containing the phonemes /ɛ̃/, /œ̃/, /ɑ̃/ and /ɔ̃/ were collected from an 870-million word Internet Archive corpus of French tweets from 2011–2017. Shifted spellings were found for all four vowels and 168 words. The shifted spelling rate is lower than that of comparable variables in English and is not conditioned by stress, grammatical category, frequency, or phonological context, which affect the distribution of shifted nasal vowels in speech. However, frequent words show more indications of intentional misspelling, such as repetition and capitalization of the target vowel, suggesting that some speakers are conscious of the variation and comment on it using salient words. The results also contribute to an ongoing debate about a possible merger between /ɛ̃/ and /œ̃/, supporting the hypothesis of an incomplete merger where /ɛ̃/ shifts towards [ɑ̃] but /œ̃/ does not.

Type
Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press

1. Introduction

In the study of phonetic variation, written data is normally only used when spoken data is unavailable (Linell, 2005). Writing is an often unreliable approximation of spoken language, especially in languages such as French that have imperfect correspondences between characters and sounds in their writing systems. However, when it includes non-standard orthography, written data can in fact be a rich source of information regarding phonetic variation. While standard orthography shows only adherence to a convention, non-standard orthography captures some of the variability inherent in spoken language (Jaffe, 2000). In some cases, non-standard orthography may even provide insights about the unconscious phonological system which could not be revealed by traditional measures of production and perception (Caravolas, 1996). In computer-mediated communication, non-standard orthography can be common, especially on social media (Baeza-Yates and Rello, 2012; McCulloch, 2019). Written language data from social media may in many cases provide a useful supplement to spoken data in sociophonetic research. In this article, I consider a case where written data from Twitter allows new insights on prior research on a vowel shift in spoken French, contributing to ongoing debate about the nature and salience of the shift.

There are several reasons why written data containing non-standard orthography can be a useful complement to available spoken data in phonetic research. Androutsopoulos (2000) points out that non-standard orthography is used in ways that reflect both standard pronunciations (e.g. <wuz> for ‘was’) and colloquial or regional pronunciations. In some cases, especially in media where production is rarely edited, these orthographies may be unintentional, whether they are the result of ignorance or typos (Wengelin, 2002; He and Wang, 2009). Even when unintentional, misspellings may provide insight into the structure of a speaker’s phonemic inventory. However, in some cases non-standard orthography represents an intentional flouting of convention to assert one’s identity (Sebba, 2003). Writers also use non-standard orthography intentionally to characterize the speech patterns of others, either for satirical purposes or to add an element of orality and authenticity to a text (Preston, 2000; Honeybone and Watson, 2013; Tatman, 2016). These intentional respellings can reveal which phonological features of a variety are most salient within the speech community (Honeybone and Watson, 2013). The presence of non-standard orthography in a text impacts how readers make judgments regarding the identity of the quoted speaker, affecting their stance towards them (Jaffe and Walton, 2000). Non-standard orthography can therefore provide insight into how speakers see themselves and others within their speech community, as well as how they see varieties outside of their own.

Working primarily on English, researchers of non-standard orthography on Twitter have generally found that it follows the same distributional patterns as the spoken variation it encodes. Non-standard orthographies associated with a particular variety cluster together, with multiple dialectal features appearing in a single tweet (Tatman, 2015). The same contextual factors that favour a variant in speech can favour the encoding of that variant in writing (Eisenstein, 2013). Pragmatic and sociolinguistic variables impact spoken and written variation in similar ways (Eisenstein, 2015; Tatman, 2015). Diachronically, non-standard orthographies in tweets written by those in the United States tend to begin in coastal urban areas before spreading to inland urban areas and finally rural areas, mirroring the typical spread of many innovations in speech (Eisenstein, 2018). Despite the difference in modality, non-standard orthographies on Twitter behave in much the same way as their associated oral variants.

Additional affordances of written social media data are its scalability and lack of interference from the researcher as observer. Available quantities of traditionally collected spoken data are limited by logistical and technical constraints (Kendall, 2006). The naturalness of traditionally collected spoken data can be called into question due to the Observer’s Paradox (Labov, 1972; Tatman, 2015). Social media data, on the other hand, is not elicited for research purposes but produced for communicative purposes. The naturalness of social media data means that certain features may appear which would be suppressed in a typical research setting.

2. The French nasal vowel shift

By traditional accounts, there are four nasal vowels in the Modern French vowel inventory, represented as /ɛ̃/, /œ̃/, /ɑ̃/, and /ɔ̃/ (Rochet, 1976: 39). However, a great deal of variation exists in this system. I will use these symbols within slashes to represent the standard phonemes while placing symbols representing their phonetic realizations, including shifted variants, within square brackets. In the supralocal variety dominant in most of French-speaking Europe, which I refer to as Northern Metropolitan French (NMF), ongoing change within the nasal vowel system has been the subject of much research and debate (Néron, 2017). For many NMF speakers, /œ̃/ has lost its rounding, and Lonchamp (1978: 294–295) suggests that it is better represented as [ʌ̃]. /ɛ̃/ has also become lower and perhaps more centralized, moving closer to [æ̃], [ɑ̃] or [ə̃] (Lonchamp, 1978; Walter, 1994). There is ongoing debate over whether a distinction is maintained between /ɛ̃/ and /œ̃/ or a complete merger of the two vowels has occurred. Walter (1994), Pooley (2006) and Mooney (2016) claim that /ɛ̃/ and /œ̃/ have merged in NMF, although the four-vowel distinction is maintained by some speakers in Belgium, Switzerland and southern France (Coquillon and Turcsan, 2012; Hambye and Simon, 2012; Racine and Andreassen, 2012). Malécot and Lindsay (1976) find that speakers do not make a distinction in their production of these two vowels, but when asked to indicate the most ‘natural’ pronunciation in a listening task, they preferentially indicate higher realizations for /ɛ̃/ and lower realizations for /œ̃/. Sampson (1999) claims that stigmatization of this merger has led many Parisian speakers to maintain a distinction between these vowels. Meanwhile, Hansen (2001a) finds that although /œ̃/ has shifted towards [ɛ̃], it has not followed the shift of /ɛ̃/ towards [ɑ̃], indicating an incomplete merger. Also, while /œ̃/ has lost its rounded lip protrusion, it may not have acquired the feature of spread lips which characterizes /ɛ̃/ (Hansen, 2012).

There is evidence that the lowering of /ɛ̃/ may have begun as early as the fifteenth century (Straka, 1979: 514). A lowered and retracted tongue position enhances the existing formant contrasts provided by nasalization of /ɛ/, so Carignan (2014) suggests that this shift was triggered by a tendency to increase the perceptual difference between the nasal vowel and its oral counterpart. This tendency may have been driven in part by a misperception by listeners that a raised first formant was due to tongue lowering rather than velopharyngeal coupling only (Beddor, Krakow and Goldstein, 1986). Both of these articulatory movements naturally increase the frequency of the first formant. The intrusion of /ɛ̃/ into the vowel space of /ɑ̃/ pushed it towards [ɔ̃] beginning in the nineteenth century (Passy, 1891: 254; Sampson, 1999: 87). This development appears to be ongoing. Walter (1994) found that younger speakers’ production of /ɑ̃/ was shifted back with a tendency towards rounding, while older speakers maintained a more conservative pronunciation of /ɑ̃/. Mettas (1973) had already found, 21 years earlier, that this shift was a significant feature of young women’s speech. Some have suggested that /ɑ̃/ and /ɔ̃/ could be on track to merge, resulting in a binary nasal vowel system (Fónagy, 1989; Sampson, 1999; Hansen, 2001a). The more widely accepted scenario is that /ɔ̃/ is shifting upward and becoming more rounded to maintain a distinction with /ɑ̃/ (Mettas, 1973; Walter, 1994; Hansen, 2012; Carignan, 2014). This shifted pronunciation of /ɔ̃/ is even higher than [õ], approaching [ũ] (Walter, 1994). The most widely accepted view of the NMF nasal vowel inventory is therefore a tripartite system containing [æ̃/ɑ̃], [ɔ̃] and [ũ], with a possible fourth nasal vowel phoneme traditionally transcribed as /œ̃/ but actually pronounced [ɛ̃/æ̃].

This creates a mismatch between the orthographic conventions of written French and the most common modern pronunciation of nasal vowels. /ɛ̃/ is usually written as <in>, <ein>, <ain>, or sometimes <en>, even though its modern pronunciation would be more clearly represented with <an>. /œ̃/ is written as <un>, despite no longer having a rounded pronunciation. /ɑ̃/, usually written as <an> or <en>, has a pronunciation that would suggest the spelling <on>. Since written French represents /u/ as <ou>, a more natural spelling of /ɔ̃/ might be <oun>, rather than the conventional <on>. These non-standard spellings would associate each nasal vowel to the oral vowel with the most similar place of articulation.

The situation is further complicated by the variation in nasal vowel realization in other dialects. I have already mentioned the lack of a /ɛ̃/-/œ̃/ merger in some European varieties, which also applies to several African varieties of French (Bordal, 2012; Boutin, Gess and Guèye, 2012). Mooney (2016) also describes a chain shift in the nasal vowels of the Béarn variety located in southwest France, motivated by transfer of the /ɑ̃/ > /ɔ̃/ shift from NMF, resulting in /ɔ̃/ becoming centralized and /œ̃/ being fronted. The counterclockwise chain shift in NMF contrasts with a clockwise chain shift in the nasal vowels of Canadian French varieties (Hansen, 2001a).

In addition to this dialectal variation, the shifting of NMF nasal vowel articulation has also been found to be related to certain linguistic factors. Hansen (2001b) reports that nasal vowels in NMF are more strongly shifted when they are found in stressed position, which generally falls on the final syllable of a prosodic group. She also finds that rounding of /ɑ̃/ to [ɔ̃] is conditioned by a following labial consonant and a preceding labial segment of any type. She does not investigate the effect of adjacent labial segments on the shift from /ɔ̃/ to [ũ], although the increased labiality of [ũ], often pronounced with extended lips, suggests that this shift might be similarly affected (Mettas, 1973; Hansen, 2012). Hansen also found a weak effect for frequency, with frequent words showing more shifted pronunciations of /ɑ̃/, as well as an effect for grammatical category, with adverbs leading this same change by a significant margin, followed by nouns, verbs, adjectives and prepositions, with numerals lagging behind. Grammatical category also appeared to impact the other nasal vowels in a similar way, although she does not explore this effect in detail. Previous work on English suggests that phonological factors can impact the use of non-standard orthography on Twitter (Eisenstein, 2015), although it remains to be seen if the same is true for this particular phenomenon in French. I therefore consider the effects of stress, adjacent labial segments, frequency, and grammatical category on non-standard spelling of nasal vowels.

The heterogeneity of the French nasal vowel inventory and shifts within it complicates the identification of parallels between phonetic and orthographic realizations. It would be expected that orthographic representation of nasal vowels on Twitter would vary according to the dialect of the user. However, such a hypothesis presupposes that such non-standard orthographies are found at all, and in sufficient quantities to chart their distribution. Most previous research on non-standard French on Twitter has focused on primarily syntactic features common to informal social media writing such as ne-deletion and emoticons, with little attention paid to phonologically-motivated orthography (Abitbol et al., 2018; Magué et al., 2020). The primary objective of this study is consequently to contribute to a basic account of phonologically-motivated orthography in French tweets. A secondary objective is to use these written data to explore remaining questions regarding the phonology of French nasal vowels, although a more robust sociophonetic investigation is reserved for future research. Despite the availability of some location-based metadata on Twitter, it is often not possible to determine the location from which a tweet is sent, much less the language varieties spoken by its writer (Dredze et al., 2013). Geographically specific analysis of orthographic variation in tweets has been performed successfully (e.g. Eisenstein, 2015; Tatman, 2015) and would be more informative to sociophonetics, but at the expense of ascertaining a sense of the general scope of this previously undocumented phenomenon in French. I opt in this article to include data that cannot be reliably located in northern and central France, while factoring in location data when it is available.

The present study considers the following research questions regarding the use of non-standard orthography on Twitter as a reflection of the nasal vowel shift in NMF:

  • (Q1) Which shifted nasal vowels are most frequently represented by non-standard orthography?

  • (Q2) Do non-standard orthographies of words containing /œ̃/ reflect the pronunciation [ɑ̃] (indicating a complete merger with /ɛ̃/) or [ɛ̃] (indicating an incomplete merger with /ɛ̃/, where /œ̃/ does not participate in the lowering of /ɛ̃/)?

  • (Q3) Are orthographic reflections of shifted nasal vowels conditioned by stress, adjacent labial segments, frequency and grammatical category in the same way that oral production of shifted nasal vowels is?

  • (Q4) Is the use of non-standard orthography reflecting shifted nasal vowels intentional or unintentional? If intentional, what motivates it?

Q1 and Q2 address the variability found in descriptions of the French nasal vowel shift. Given the difficulty of determining most sociolinguistically relevant information about users sampled in a Twitter corpus, the possible contribution to ongoing debates about the nature of the shift across dialects is limited. However, Twitter data can uniquely contribute to our understanding of particularly salient modality-crossing features of the NMF dialect which is dominant in the corpus. Q3 applies findings by Hansen (2001b) to written data. Q4 focuses on one of the most significant affordances of social media data, namely insight into speaker attitudes regarding linguistic variation.

3. Data

The data were found in a corpus of 88,034,726 French tweets from the years 2011 to 2017. Most tweets are limited to 140 characters. In November 2017, Twitter increased its character limit to 280, so a small portion of the corpus may contain longer tweets (Perez, 2017). The corpus contains over 870 million words. The tweets are part of a collection assembled by the Internet Archive (Archive Team 2018, www.archive.org/details/twitterstream). The data exclude ‘retweets’, or shared copies of tweets. The selection of French tweets was based on the user’s chosen language, not on the actual content of tweets, so the corpus contains some tweets in other languages written by users who selected French as their language on their Twitter account. For only 2.6% of tweets, the latitude and longitude coordinates from which the tweet was sent were shared by users. However, location can often be inferred based on user-provided time zone or location fields. Approximately 86% of tweets in the corpus include data for at least one of these fields. Tweets whose location could not be determined from these combined fields were not rejected, but tweets located outside of France were excluded from most of the analysis.
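The selection criteria described above (no retweets, account language set to French) can be illustrated with a minimal Python sketch. It is not the author's code. The field names `retweeted_status` and `user.lang` are assumptions based on the Twitter v1.1 JSON layout and may not match the archive files exactly.

```python
def is_eligible(tweet):
    """Return True for an original tweet from an account set to French.

    `tweet` is a dict parsed from one JSON record. The keys used here
    ("retweeted_status", "user", "lang") are assumed, not confirmed by the article.
    """
    # A retweet carries the shared tweet under "retweeted_status" (assumed key).
    if "retweeted_status" in tweet:
        return False
    # Selection is by the user's chosen language, not by tweet content.
    return tweet.get("user", {}).get("lang") == "fr"
```

Because the filter looks only at the account setting, a tweet written in another language by a user whose account is set to French still passes, which matches the caveat noted above.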

I assembled lists of the 100 most frequent French word forms containing each of the four nasal vowel phonemes. Word frequencies were taken from the Lexique 3.80 database’s film captions corpus (New et al., 2007, www.lexique.org). The advantage of this frequency measure is that film captions approximate informal speech more closely than many other corpora that are based on books or Wikipedia. Inflectional variants are considered as individual word forms, so pense ‘think.1sg’ and penses ‘think.2sg’ are separate items. The same is true of polysemous words, which share a form but have a different grammatical category, so bien ‘well’ (ADV) and bien ‘good’ (NOUN) are separate items. Some word forms are counted twice because they contain two different nasal vowels, such as content /kɔ̃tɑ̃/ ‘happy’ which appears within the 100 most frequent words for both /ɔ̃/ and /ɑ̃/. The 400 items represent 256 different lemmas and 340 unique word forms.

I searched in the corpus for these items in their standard spellings as well as with non-standard spellings that reflect shifted nasal vowels. Table 1 shows the non-standard spellings that correspond to each shifted vowel. Nasal vowels are represented in written French by an orthographic vowel or combination of vowels followed by either <n> or <m> word-finally or before another consonant. I only altered the orthographic vowels for non-standard spellings, maintaining the <n> or <m> from the standard spelling. For example, possible non-standard spellings of ainsi /ɛ̃si/ ‘thus’ are ensi and ansi, but not emsi or amsi. The search allowed unlimited repetition of the target vowel, such as pooonse for pense ‘think’, sometimes used for emphasis or to reflect vowel lengthening (Brody and Diakopoulos, 2011).

Table 1. Orthographic representations of nasal vowels
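One way to express the search rule described above (respell only the vowel letters, keep the <n> or <m>, allow any amount of vowel repetition) is a regular expression built per word. The Python sketch below is illustrative only. The function name and its arguments are hypothetical, and it handles only the first occurrence of the nasal grapheme in a word.

```python
import re

def shifted_pattern(word, grapheme, new_vowel):
    """Build a regex for `word` with the vowel letters of `grapheme` respelled.

    `grapheme` is the standard spelling of the nasal vowel including its
    final <n> or <m> (e.g. "en", "ain"); `new_vowel` is the replacement
    vowel letters only (e.g. "o", "a", "ou").
    """
    head, found, tail = word.partition(grapheme)
    if not found or grapheme[-1] not in "nm":
        raise ValueError("grapheme must occur in word and end in n or m")
    nasal = grapheme[-1]  # the nasal consonant letter is never altered
    # Each replacement vowel letter may repeat any number of times.
    vowel = "".join(re.escape(letter) + "+" for letter in new_vowel)
    return re.compile(
        r"\b" + re.escape(head) + vowel + nasal + re.escape(tail) + r"\b",
        re.IGNORECASE,
    )
```

For instance, `shifted_pattern("pense", "en", "o")` matches ponse and pooonse but not pense, and `shifted_pattern("ainsi", "ain", "a")` matches ansi but not amsi.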

A change in vowel orthography may affect the pronunciation of adjacent letters. In written French, <g> followed by <e> is pronounced [ʒ], while <g> followed by <o> is pronounced [g]. If genre [ʒɑ̃ʁ] ‘like, type’ is spelled gonre, it suggests not just a change of [ɑ̃] to [ɔ̃] but also a change of [ʒ] to [g]. I adjusted for words like this by including additional non-standard spellings that account for the impact on adjacent letters. In the case of genre, I searched for the non-standard spellings gonre, geonre and jonre.

The orthography <en> is used for both /ɛ̃/ and /ɑ̃/. Generally, <en> is realized as /ɛ̃/ when it follows [j] in words like bien /bjɛ̃/ ‘well’ or orthographic <é> in words like européen /øʁɔpeɛ̃/ ‘European’, and it is realized as /ɑ̃/ in most other contexts, although there are exceptions such as examen /ɛgzamɛ̃/ ‘test’. For words containing /ɛ̃/ spelled with <en>, only <an> was included as a possible non-standard spelling. For other words containing /ɛ̃/ and words containing /œ̃/, <en> is included as one of the possible non-standard spellings reflecting the shift to [æ̃/ɑ̃].

For words with more than one nasal vowel, I included spellings where either or both vowels are shifted. For instance, the search for non-standard spellings of content /kɔ̃tɑ̃/ ‘happy’ included countent [kũtɑ̃], contont [kɔ̃tɔ̃] and countont [kũtɔ̃].
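The combinations for a word with two nasal vowels can be enumerated as a Cartesian product over the alternatives at each position. The Python sketch below is an illustration under that assumption. The segmentation of content into pieces and the name `respellings` are hypothetical.

```python
from itertools import product

def respellings(segments):
    """List every spelling in which at least one nasal vowel is respelled.

    `segments` is a list of (standard, alternatives) pairs; pieces with
    no alternatives are copied unchanged.
    """
    options = [[standard] + alternatives for standard, alternatives in segments]
    standard_word = "".join(standard for standard, _ in segments)
    spellings = ["".join(combo) for combo in product(*options)]
    # Drop the fully standard spelling; keep everything else.
    return [s for s in spellings if s != standard_word]

# content /kɔ̃tɑ̃/: <on> may become <oun>, <en> may become <on>.
CONTENT = [("c", []), ("on", ["oun"]), ("t", []), ("en", ["on"]), ("t", [])]
```

Calling `respellings(CONTENT)` yields contont, countent and countont, the three spellings listed above.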

I excluded 34 word forms whose non-standard spellings would overlap with another common word. For instance, a non-standard spelling of sens /sɑ̃s/ ‘direction’ reflecting the /ɑ̃/ > [ɔ̃] shift would be sons, an existing French word meaning ‘sounds’. The search therefore included standard and non-standard spellings of 306 unique word forms.

4. Methods

I conducted the search in R 3.5.0 (R Core Team, 2018). The search was not accent-sensitive except when the presence of an accent distinguished between two different words on the search list, such as demande ‘request’ and demandé ‘requested’.
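The search itself was run in R. As a rough illustration of what an accent-insensitive comparison with an optional accent-sensitive mode could look like, here is a Python sketch using Unicode decomposition. The function names and the `accent_sensitive` flag are assumptions, not part of the original procedure.

```python
import unicodedata

def strip_accents(text):
    """Remove combining diacritics after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def same_word(token, target, accent_sensitive=False):
    """Compare a token to a target word, ignoring case.

    Accents are ignored unless `accent_sensitive` is True, which would be
    used for pairs such as demande / demandé that are both on the list.
    """
    if accent_sensitive:
        normalize = lambda s: unicodedata.normalize("NFC", s).lower()
    else:
        normalize = lambda s: strip_accents(s).lower()
    return normalize(token) == normalize(target)
```

With the default setting, demande and demandé compare as equal; with `accent_sensitive=True` they are kept apart.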

The search for non-standard spellings returned a total of 14,869 tokens. From these, I removed by hand those in a language other than French. Tweets containing code-switching were kept as long as the target word was in a French context or could not represent a word of the matrix language. I also removed tweets where context showed that the identified word was a non-standard spelling of a different word than the target. For example, one non-standard spelling of the target word importun ‘inappropriate’ is importan. In context, it became clear that most results with this spelling were typos of the more common word important ‘important’ and did not represent a shifted nasal vowel.

After these false results were removed, 3,333 tokens of non-standard spellings remained, found in 3,312 unique tweets. For polysemous words like bien ‘well, good’ that were included multiple times in the list of target words under different grammatical categories, I coded the category of each token by hand based on context. A total of 42,057,128 tokens of standard spellings were found. For polysemous words in these results, I calculated an estimate of the distribution across each available grammatical category, assuming the same distribution as reported in the Lexique 3.80 database. For instance, bien ‘good, well’ is included in the list of target words as an adverb, a noun and an adjective. The adverbial use is the most common, with Lexique 3.80 reporting a ratio of 97:2:1 for ADV:NOUN:ADJ. I therefore divided the tokens of bien according to this ratio to estimate frequency.
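The proportional split described above amounts to weighting a token count by the category ratio. A minimal Python sketch follows; the function name is hypothetical and the count of 1,000 in the usage note is an invented round number, not a corpus figure.

```python
def apportion(total, weights):
    """Split `total` tokens across categories in proportion to `weights`."""
    weight_sum = sum(weights.values())
    return {category: total * weight / weight_sum
            for category, weight in weights.items()}
```

With the 97:2:1 ratio, `apportion(1000, {"ADV": 97, "NOUN": 2, "ADJ": 1})` assigns 970 tokens to the adverb, 20 to the noun and 10 to the adjective.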

Once the results for polysemous words were separated by grammatical category, I calculated the percentage of tweets containing non-standard spellings of each word. This is the ratio of tweets with non-standard spellings of a word to the sum of those with any spelling of the word, standard or non-standard, multiplied by 100.
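The percentage defined above can be written as a one-line function. The Python sketch below uses a hypothetical function name; the usage note applies it to the overall token counts reported earlier in this section.

```python
def nonstandard_percentage(nonstandard, standard):
    """Percentage of all spellings of a word that are non-standard."""
    total = nonstandard + standard
    if total == 0:
        raise ValueError("word does not occur in the corpus")
    return 100 * nonstandard / total
```

Applied to the 3,333 non-standard and 42,057,128 standard tokens, `nonstandard_percentage(3333, 42057128)` gives a value that rounds to 0.008.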

I used syllable position as a surrogate variable to code for stress, which falls on the final syllable of a prosodic group in NMF. Determining prosodic groups in tweets without acoustic data would be unreliable, so my coding represents susceptibility to stress, rather than actual phonological stress. If the target vowel is in the final syllable of a word, or is in a monosyllabic word, then it is susceptible to stress. Auxiliary verbs and relative pronouns are the exception as they do not appear at the end of prosodic groups and are therefore not susceptible to stress. If the vowel is in an earlier syllable, then it is not susceptible to stress.
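The coding rule above reduces to two checks: the grammatical category and the position of the target syllable. The Python sketch below assumes that syllable counts are already available and uses hypothetical category labels (`AUX`, `PRO:rel`); neither detail is specified in the text.

```python
# Categories that do not end a prosodic group (labels are assumed).
NEVER_STRESSED = {"AUX", "PRO:rel"}

def susceptible_to_stress(vowel_syllable, n_syllables, category):
    """Code susceptibility to stress from syllable position.

    `vowel_syllable` is the 1-based index of the syllable containing the
    target vowel; `n_syllables` is the number of syllables in the word.
    """
    if category in NEVER_STRESSED:
        return False
    # A monosyllable satisfies this trivially (1 == 1).
    return vowel_syllable == n_syllables
```

For content /kɔ̃tɑ̃/, the first vowel is coded as not susceptible (`susceptible_to_stress(1, 2, "ADJ")`) and the second as susceptible (`susceptible_to_stress(2, 2, "ADJ")`).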

I coded the nasal vowel in a word as having a preceding labial if a labial segment (/b/, /p/, /m/, /f/, /v/, /w/, /y/, or /u/) was present in either of the two sound segments to the left of the target vowel. I coded a nasal vowel for a following labial if the sound segment immediately following it was [+labial]. I included two segments before the nasal vowel because the effect found by Hansen (2001b) for preceding context was stronger and applied to both consonants and vowels.
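The two context codes can be sketched as a window check over a segmented transcription. In the Python sketch below, the segment lists are supplied by hand, and the same labial set is assumed for the following context; the text only specifies [+labial] there.

```python
# Labial segments listed in the text.
LABIALS = {"b", "p", "m", "f", "v", "w", "y", "u"}

def labial_context(segments, i):
    """Return (preceding_labial, following_labial) for the vowel at index i.

    `segments` is a list of phoneme strings, e.g. ["a", "v", "ɑ̃"].
    """
    # Up to two segments to the left of the target vowel.
    preceding = any(s in LABIALS for s in segments[max(0, i - 2):i])
    # Only the single segment immediately to the right.
    following = i + 1 < len(segments) and segments[i + 1] in LABIALS
    return preceding, following
```

For avant /avɑ̃/, `labial_context(["a", "v", "ɑ̃"], 2)` codes a preceding labial and no following labial. For putain /pytɛ̃/, the /y/ two segments back is enough to code a preceding labial.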

I was able to infer the country of origin for 49% (1636/3333) of the retrieved tokens of non-standard spellings. Just under 2.8% (92/3333) contained latitude and longitude coordinates, which I used to determine the country. And 67% (2244/3333) included a user-provided location, although many of these did not refer to real locations. Any that contained “France” or one of a list of cities and regions in France were coded as originating from France. The others were cross-referenced with a list of other French-speaking countries and major cities in those countries and were coded with the indicated country. A further 64% (2137/3333) of the tokens provided the user’s time zone. Users have the option to associate their account with a time zone by selecting the time zone of a major city. Several options are provided for each time zone, so some tweets may indicate “Amsterdam” as their time zone while others indicate “Paris”, even though both cities are in the same time zone. I coded those with the time zone “Paris” as originating from France if the country of origin could not be determined in any other way but did not use time zone to locate tweets in any other countries.
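The location coding above can be read as a fallback chain. The Python sketch below is one possible reading, not the procedure actually used: it assumes coordinates take priority over the location field (the text does not state an order), takes the country for coordinates as already resolved, and uses tiny placeholder place lists that stand in for the real ones.

```python
# Placeholder lists; the lists used in the study are not given in the text.
FRENCH_PLACES = {"france", "paris", "lyon", "marseille"}
OTHER_PLACES = {"montréal": "Canada", "dakar": "Senegal", "bruxelles": "Belgium"}

def infer_country(coord_country, location, time_zone):
    """Infer a tweet's country from coordinates, location text, then time zone."""
    if coord_country:                      # country already derived from lat/long
        return coord_country
    loc = (location or "").lower()
    if any(place in loc for place in FRENCH_PLACES):
        return "France"
    for place, country in OTHER_PLACES.items():
        if place in loc:
            return country
    # Time zone is a last resort and only ever assigns France.
    if time_zone == "Paris":
        return "France"
    return None
```

A tweet with an unusable location string and the time zone "Amsterdam" stays unlocated, while the same tweet with the time zone "Paris" is coded as France.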

I coded tokens of non-standard spellings for five properties that I collectively refer to as “intentionality markers”. These are orthographic and contextual properties which suggest that a non-standard spelling is being used intentionally by the writer, rather than the target word being unintentionally misspelled. Orthographic intentionality markers on the target word itself are letter repetition (e.g. pooonse for pense), capitalization (e.g. pOnse), and punctuation (e.g. “ponse” or *ponse). I coded a token for the intentionality marker of capitalization if either the letters representing the nasal vowel were capitalized in an otherwise lowercase word or the target word was capitalized in an otherwise lowercase sentence or tweet. I did not consider it an intentionality marker if the entire sentence or tweet was capitalized. A fourth intentionality marker, co-occurrence, is the use of multiple non-standard spellings of nasal vowels in different words within the same tweet. The final intentionality marker, metalinguistic commentary, is applied to tweets with a metalinguistic topic, usually involving criticism of the pronunciation or spelling of other people. A single token may include multiple intentionality markers, e.g. *pOOOnse.
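The three orthographic markers lend themselves to simple string checks. The Python sketch below is a rough approximation under stated assumptions: the set of flagging punctuation marks is a guess, repetition is detected by comparing length against the unrepeated non-standard spelling, and co-occurrence and metalinguistic commentary are not covered.

```python
MARKS = '"*«»“”'  # punctuation treated as flagging the word (assumed set)

def orthographic_markers(raw, canonical, tweet):
    """Code repetition, capitalization and punctuation for one token.

    `raw` is the token as found (e.g. "*pOOOnse"), `canonical` is the
    unrepeated lowercase non-standard spelling (e.g. "ponse"), and
    `tweet` is the full text containing the token.
    """
    letters = raw.strip(MARKS)
    rest = tweet.replace(raw, "", 1)
    # Capitals inside an otherwise lowercase word, e.g. pOnse.
    mixed_case = (letters != letters.lower()
                  and not letters.isupper()
                  and not letters.istitle())
    # Whole word in capitals while the rest of the tweet is not.
    shouted = letters.isupper() and any(ch.islower() for ch in rest)
    return {
        "repetition": len(letters) > len(canonical),
        "capitalization": mixed_case or shouted,
        "punctuation": letters != raw,
    }
```

For the token *pOOOnse in "je *pOOOnse que oui", all three markers are coded as present. A token PONSE inside a tweet written entirely in capitals is not coded for capitalization.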

5. Results

A total of 3,312 tweets were found to contain non-standard spellings of one or more of the target words, representing 0.0038% of the total corpus. And 21 of these are included twice in the results because they contain non-standard spellings of two target words, leaving a total of 3,333 tokens. By comparison, nearly 28 million tweets contain one or more target words in their standard spellings, representing 32% of the corpus. A larger proportion of these tweets are found in earlier years. In 2011–2013, 0.012% (2109/17832208) of target word forms have non-standard spellings, while in 2014–2017, only 0.005% (1224/24228253) do. The available date range is small, however, so this is more likely due to chance and fluctuations in the size of the corpus than any historical change in writing patterns.

Of the 1,636 tokens for which location could be determined, 1,193 (73%) came from metropolitan France. Of the others, the majority (284) came from Africa, with 108 from North America, 26 from other European nations with French as an official language, and 25 from elsewhere. These 443 tokens originating from outside of France are excluded from most of the analysis, leaving 2,890 tokens. However, given that not all speakers of NMF live in metropolitan France, these tokens will be considered in portions of the analysis as indicated.
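The proportions reported in this section follow directly from the counts given. The short Python sketch below recomputes them as a consistency check; the variable names are introduced here for illustration, and every number is taken from the text above.

```python
def pct(part, whole, digits):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)

CORPUS_TWEETS = 88_034_726

share_of_corpus = pct(3_312, CORPUS_TWEETS, 4)   # tweets with non-standard spellings
early_rate = pct(2_109, 17_832_208, 3)           # 2011-2013
late_rate = pct(1_224, 24_228_253, 3)            # 2014-2017
france_share = pct(1_193, 1_636, 0)              # located tokens from France

outside_france = 284 + 108 + 26 + 25             # Africa, N. America, Europe, elsewhere
remaining_tokens = 3_333 - outside_france        # tokens kept for most analyses
```

The results agree with the reported figures: 0.0038%, 0.012%, 0.005% and 73%, with 443 tokens located outside France and 2,890 remaining.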

In the following sections I report the results relevant to each of the four research questions.

5.1 Q1: Which vowels are most represented by non-standard orthography?

All four nasal vowels are represented by non-standard orthography. Table 2 shows the number of tokens of non-standard and standard spellings of items in the wordlist separated by vowel, as well as the percentage of non-standard spellings out of all spellings.

Table 2. Tokens of non-standard spellings of each nasal vowel

By raw token counts, /ɛ̃/ and /ɑ̃/ are represented by non-standard spellings much more often than /ɔ̃/ or /œ̃/. However, when these results are normalized for frequency, a different pattern emerges. The vowel /œ̃/ is found in fewer words than the other nasal vowels, and consequently the 100 most frequent words containing /œ̃/ include some very infrequent words. The low frequency of these words results in very high percentages of non-standard spellings if any such spellings are found. The word with the highest rate of non-standard orthography is emprunterais ‘borrow.1/2s.cond’, which is only found once with a non-standard spelling (emprinterais). That single token, however, represents 9.1% of all tokens of emprunterais in the corpus. Because of these low-frequency words, /œ̃/ actually has the highest percentage of non-standard spellings. Across all four vowels, 0.008% (3333/42062570) of tokens of target word forms use a non-standard spelling.

The words with the most tokens of non-standard orthography are bien /bjɛ̃/ ‘well’ (bian: 327), gentil /ʒɑ̃ti/ ‘nice’ (gontil: 178, jontil: 74, geontil: 15, total: 267), putain /pytɛ̃/ ‘whore’ (putan: 206, puten: 21, total: 227), quelqu’un /kɛlkœ̃/ ‘someone’ (quelqu’in/quelquin: 147, quelqu’en/quelquen: 26, quelquan: 2, total: 175) and gens /ʒɑ̃/ ‘people’ (geons: 74, gons: 58, jons: 22, total: 154). The mean non-standard percentage of these five words is 0.09%. The vowels /ɛ̃/, /ɑ̃/ and /œ̃/ are represented in these top five results. The word containing /ɔ̃/ with the most tokens of non-standard orthography is bon /bɔ̃/ ‘good’ (boun), with 49 tokens and a non-standard percentage of 0.006%. Out of 306 unique word forms, at least one non-standard spelling was found for 168 of them.

5.2 Q2: Which possible shift from /œ̃/ is reflected more in orthography?

Among words containing /œ̃/, orthographies reflecting shifts to both [ɛ̃] and [ɑ̃] are found in the corpus. However, the shift to [ɛ̃] is more frequently represented. At least 280 tweets have non-standard spellings that reflect /œ̃/ > [ɛ̃], while only 51 tweets have non-standard spellings that reflect /œ̃/ > [ɑ̃]. The mean percentage of non-standard spellings for /œ̃/ > [ɛ̃] is 0.27%, while that for /œ̃/ > [ɑ̃] is 0.08%.

There is not a clear separation between the words which are spelled to reflect /œ̃/ > [ɛ̃] and those which are spelled to reflect /œ̃/ > [ɑ̃]. Only two words have more than 10 tokens where the spelling reflects /œ̃/ > [ɑ̃]: quelqu’un ‘someone’ (28) and lundi ‘Monday’ (landi: 11, lendi: 2, total: 13). These two words account for 80% of non-standard spellings reflecting /œ̃/ > [ɑ̃]. Both words also rank highly among the words which reflect /œ̃/ > [ɛ̃]. Quelqu’un is the word with the most tokens reflecting /œ̃/ > [ɛ̃], with 147 tokens spelled quelqu’in, quelquin, or quelqu in. Lundi (lindi: 26, laindi: 1) is in third place behind chacun ‘each one’ (chaquin: 13, chakin: 13, chacin: 3, chacain: 1). The shift of /œ̃/ > [ɛ̃] is orthographically represented both more frequently and across a greater variety of words than /œ̃/ > [ɑ̃].

5.3 Q3: Is orthography affected by factors that affect speech?

I used two fixed-effect linear regression models to test for the effect of the identified potential factors on the percentage of non-standard spellings. I did not include individual words as a random effect because the sample of words is not random, being based on frequency and extensive enough to cover the large majority of instances of nasal vowels in the corpus. The first model includes susceptibility to stress (based on syllable position), grammatical category and frequency, and is applied to all the data, except for the tweets located outside of France which are excluded. These factors have no effect on the percentage of non-standard spellings, F(16,427) = 0.7548, p > 0.05. The model does not improve when the tweets located outside of France are included, F(16,427) = 0.7126, p > 0.05.

The second linear regression tests for the effect of preceding labial context and following labial context on the percentage of non-standard spellings. Because the shifts affecting /œ̃/ and /ɛ̃/ do not involve an increase in rounding, I exclude data on these vowels from this model. The resulting models show that labial context has no effect on the use of non-standard orthography to represent the vowels /ɑ̃/ and /ɔ̃/, F(2,181) = 0.4261, p > 0.05. In this model as well, including the tweets located outside of France does not produce an effect, F(2,181) = 0.4337, p > 0.05.

Although there are no statistical effects for these factors, it is worth noting an observation regarding the mean percentages of non-standard spellings for each grammatical category. Nouns have the highest mean percentage of non-standard spellings, at 0.102%. They are followed in order by verbs, adjectives, indefinite pronouns, indefinite adjectives, interjections, adverbs, possessive pronouns, prepositions, conjunctions and possessive adjectives, with relative and personal pronouns, auxiliaries and numerals having no tokens of non-standard orthography. This order corresponds roughly to the pattern found by Hansen (2001b), where word classes with more lexical semantic content (nouns, verbs, adjectives, adverbs) were more likely to be pronounced with shifted nasal vowels than functional word classes (numerals, prepositions). Given the results of the statistical tests, this observation should not be given too much weight. It does suggest, however, that lexical factors may have a small degree of influence on the use of non-standard orthography. This influence cannot be statistically assessed given the general rarity of non-standard orthography and the resulting small token counts, especially when divided across a fairly large number of grammatical categories.

5.4 Q4: Is non-standard orthography intentional?

I infer the intentionality of non-standard spellings based on five intentionality markers: repetition, capitalization, punctuation, co-occurrence and metalinguistic commentary. Examples of each marker taken from the corpus are shown in (1). Usernames have been removed from these examples to protect the anonymity of users who are not public figures.

(1a), (1b) and (1c) are standard examples of the use of repetition, capitalization and punctuation, respectively, to draw attention to the shifted nasal vowel. In (1d), the writer reflects the /ɑ̃/ > [ɔ̃] shift in two words, rentres and dedans, and also characterizes another feature of the Parisian dialect, the pronunciation of a full schwa after a final consonant before a pause ([aʁɛtə]), a context that traditionally calls for a drop of schwa ([aʁɛt]) (Hansen, 2012: 165). This tweet therefore uses the intentionality marker of co-occurrence, as well as repetition for the words arreteuuuh and dedooons. (1e) likewise uses co-occurrence, with the /ɑ̃/ > [ɔ̃] shift reflected in the words prends and français (the use of e instead of ais in fronce is a common abbreviation and does not necessarily reflect any non-standard phonetic value). Pragmatically, this example also criticizes the language skills of the addressee by telling them to take French classes. This tweet is a response to a tweet by another user which reads C cho don ma taite (C’est chaud dans ma tête ‘It’s hot in my head (I’m angry)’). This tweet uses several abbreviations and other non-standard orthographies, including a spelling of dans ‘in’ which indicates a shift from /ɑ̃/ to [ɔ̃]. The response in (1e) is therefore meant to mock the forms used by the other user. The metalinguistic commentary marker applies to any such tweets which explicitly comment on linguistic forms, spoken or written.
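
A first-pass screen for two of these markers, repetition and capitalization, could be sketched as follows. This is a minimal illustration and not the coding procedure used in the study: the threshold of three identical letters, the treatment of all-capital words, and the function names are all assumptions.

```python
import re

# Assumption: three or more identical letters in a row count as repetition,
# so ordinary double letters (as in "arrete") are not flagged.
REPEAT = re.compile(r"([^\W\d_])\1{2,}", re.IGNORECASE)

def has_repetition(word: str) -> bool:
    """True if any letter is repeated three or more times in sequence."""
    return REPEAT.search(word) is not None

def has_internal_capitals(word: str) -> bool:
    """True for mixed-case forms with a capital after the first letter.

    Assumption: fully capitalized words are ignored, since they do not
    single out the target vowel.
    """
    return any(ch.isupper() for ch in word[1:]) and not word.isupper()

for form in ["dedooons", "arreteuuuh", "arrete", "bOOOn", "Bian"]:
    print(form, has_repetition(form), has_internal_capitals(form))
```

Punctuation, co-occurrence and metalinguistic commentary depend on the surrounding tweet and are harder to capture with a pattern of this kind, so a screen like this would still need manual review.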

Out of 2,890 tokens of non-standard orthography, 401 (13.9%) use at least one of these intentionality markers. These include 292 tokens with repetition, 41 with capitalization, 41 with punctuation, 28 with co-occurrence and 29 with metalinguistic commentary. Repetition is the most common intentionality marker by a large margin. The margin increases if tweets located outside of France are included, for a total of 363 tweets containing letter repetition. Although literature on these intentionality markers as a group is scarce, the rate of co-occurrence does appear low in light of Tatman’s (2015: 103) finding that 84% of tweets in a sample representing Scottish English contained multiple non-standard spellings. However, that figure includes a variety of Scottish dialectal features, while co-occurrence in the present study is limited to the representation of shifted nasal vowels.

Intentionality markers are used at a greater rate with high-frequency words than low-frequency words. I used a simple logistic regression to model the effect of frequency on the presence of intentionality markers. For every one-unit increase in word frequency, the log odds of at least one intentionality marker being used increase by 2.61 × 10⁻⁷ (N = 2890, p = 0.00131; see Table 3). Another way to observe this effect is by simply comparing the mean word frequencies of tokens with at least one intentionality marker and those without intentionality markers. Tokens with intentionality markers have a higher mean frequency (M = 616985.9) than those without these markers (M = 510034.2), t(512.06) = -3.0181, p = 0.0027.
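
To give the coefficient a more intuitive scale, the sketch below converts it into an odds multiplier for a given difference in word frequency, and shows the form of an unequal-variance (Welch) t statistic. The choice of Welch's test is an assumption inferred from the fractional degrees of freedom reported above; the function names and the toy samples are hypothetical and are not corpus data.

```python
import math
from statistics import mean, variance

COEF = 2.61e-7  # reported change in log odds per one-unit increase in word frequency

def odds_multiplier(freq_increase, coef=COEF):
    """Factor by which the odds of a marker change for a given frequency increase."""
    return math.exp(coef * freq_increase)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: an unequal-variance test, inferred from the fractional
    degrees of freedom reported in the text.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Gap between the two reported mean word frequencies.
gap = 616985.9 - 510034.2
print(f"odds multiplier for the gap: {odds_multiplier(gap):.3f}")

# Hypothetical toy samples, only to show the call; not corpus data.
t, df = welch_t([1, 2, 3, 4], [2, 3, 4, 5])
print(f"t = {t:.3f}, df = {df:.2f}")
```

Under these assumptions the multiplier for the gap is about 1.03, so a coefficient that looks negligible per unit of frequency still corresponds to odds roughly 3% higher across the difference between the two group means.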

Table 3. Table of regression results for the presence of intentionality markers

Significance codes: * 0.05, ** 0.01, *** 0.001

Related to frequency, intentionality markers are used more often with /ɛ̃/ and /ɑ̃/ than with the other two vowels. At least 13% of tweets reflecting /ɛ̃/ > [ɑ̃] have intentionality markers (N = 160), and 19% of tweets reflecting /ɑ̃/ > [ɔ̃] have them (N = 216). Compare this with 10% of tweets reflecting /ɔ̃/ > [ũ] (N = 19), 2% of tweets reflecting /œ̃/ > [ɑ̃] (N = 1) and 1.8% of tweets reflecting /œ̃/ > [ɛ̃] (N = 5). This cannot be entirely attributed to frequency differences among the vowels, however. Words containing /ɔ̃/ actually have a higher mean frequency (M = 191,655) than words containing /ɛ̃/ (M = 129,923) or /ɑ̃/ (M = 177,132). It may be that the shift in /ɔ̃/ receives less commentary via intentionality markers because its raising is less salient than the shifts seen with /ɛ̃/ and /ɑ̃/.

6. Discussion

The reflection of shifted French nasal vowels in orthography is relatively widespread throughout the lexicon, as shifted spellings are observed for 168 out of 306 word forms investigated. However, the phenomenon is rare when compared to some documented cases of phonologically motivated non-standard orthography in English. Across the entire word list, only 0.008% of tokens use a non-standard spelling reflecting a shifted pronunciation of a nasal vowel. Among the five words with the highest incidence of non-standard nasal vowel orthography, non-standard spellings represent only 0.09% of tokens. As a point of comparison, Eisenstein (2015) found a t-deletion rate in the word ‘next’ of 0.499% before consonants and 0.474% before vowels in English tweets, and a g-deletion rate in the word ‘going’ of 15.3% before consonants and 28.5% before vowels in English tweets. Without more research on phonologically-motivated orthographic variation in French computer-mediated communication, it is unclear whether this low frequency reflects an overall lower rate of this type of non-standard orthography in French than in English, or only applies to variation in nasal vowels. One difference to note between the French nasal vowel shift and t- and g-deletion in English is that the nasal vowel shift is older and more fully established. Speakers have had more time to become accustomed to the conventional pairings of orthography and phonetic realization, even if the letters do not represent similar places of articulation for oral and nasal vowels. It may be that for more recent and ongoing changes in French, non-standard orthographic representations are more common.

The rates of non-standard orthography for the vowels /ɛ̃/ and /ɑ̃/ are close to the mean. The rate for the vowel /œ̃/ is exceptionally high, but this appears to be due to the inclusion of some very low-frequency words containing this vowel in the word list. /œ̃/ is only found in a relatively small number of French words, so in selecting 100 words for each vowel in order to create a balanced word list, I included some rare words which skew the rates for this vowel.

For the vowel /ɔ̃/, the rate of non-standard spellings is surprising. /ɔ̃/ has the highest token count of standard spellings, and the lowest token count of non-standard spellings. This vowel also has fewer non-standard spellings with intentionality markers than /ɛ̃/ or /ɑ̃/. I see three possible reasons for the reduced use of non-standard spellings and intentionality markers with this vowel. First, /ɔ̃/ > [ũ] is a subtler change than the other nasal vowel shifts. /ɛ̃/ > [ɑ̃] involves a shift from a mid, front vowel to a low, back vowel. /ɑ̃/ > [ɔ̃] involves raising and the addition of rounding. /œ̃/ > [ɛ̃] involves a loss of rounding, and /œ̃/ > [ɑ̃] adds to that lowering and backing. The shift from /ɔ̃/ to [ũ] is perhaps less noticeable, given that it only involves raising from a mid vowel to a high vowel, and /ɔ̃/ is already rounded. The letter <o> in French can represent either /ɔ/ or /o/, depending on the word and the phonological context. Because of this, speakers with a raised pronunciation of /ɔ̃/ can already read <on> as [õ] rather than [ɔ̃], and from there raising to [ũ] is an even smaller deviation from the standard orthography. Second, the spelling <oun> is perhaps less likely to be used because it is not already a standard spelling for a different nasal vowel. The other three nasal vowels have shifted into roughly the same vowel space previously occupied by another nasal vowel. <In>, <ein> and <ain> are standard spellings of /ɛ̃/ and non-standard spellings of /œ̃/. <An> and <en> are standard spellings of /ɑ̃/ and non-standard spellings of /ɛ̃/ and /œ̃/. <On> is a standard spelling of /ɔ̃/ and a non-standard spelling of /ɑ̃/. <Oun> is not a standard spelling of any vowel, although <ou> is a standard spelling of /u/. Because /ɔ̃/ has moved into a new vowel space within the nasal vowel system, no conventional spelling is available to represent its shifted pronunciation phonetically. Although clearly some speakers have chosen to use <oun>, the novelty of this spelling may be reducing its use. A final and related reason is that there may be other ways Twitter users represent [ũ] orthographically which I did not include in my search. One possibility is the use of repetition or capitalization without changing the letter used, e.g. bOOOn for bon ‘good’. I decided not to include these as non-standard spellings of /ɔ̃/ because they do not necessarily represent a change in place of articulation. On their own, repetition and capitalization could indicate an increase in vowel quantity or loudness without a change in vowel quality.

I searched for orthographies reflecting two possible endpoints for shifts from /œ̃/: [ɛ̃] and [æ̃/ɑ̃]. While most scholars agree that /œ̃/ has merged with /ɛ̃/ in NMF, Hansen (2001a) claims that the merger is incomplete, because while /ɛ̃/ shifts to [æ̃/ɑ̃], /œ̃/ does not. The occurrence of orthographies where /œ̃/ is spelled <an> or <en> seems at first to contradict this hypothesis. However, as previously noted, the spelling <en> represents /ɛ̃/ in some words, so its use is not an unambiguous indication of the pronunciation [æ̃/ɑ̃]. Only 18 tokens of /œ̃/ are spelled <an>, and 11 of them are a single word: landi (lundi ‘Monday’). Around 33 tokens of /œ̃/ are spelled <en> (including two tokens of lendi), and an additional 280 are spelled <in>, <ein>, or <ain>, indicating a pronunciation like [ɛ̃]. If /œ̃/ and /ɛ̃/ are completely merged for most speakers, then one would expect words with /œ̃/ to be spelled either with their standard orthography or with an orthography matching their pronunciation ([æ̃/ɑ̃]). The fact that non-standard spellings of /œ̃/ reflecting the pronunciation [ɛ̃] are much more common supports Hansen’s hypothesis of an incomplete merger, perhaps with certain words like lundi having a fully merged vowel for some speakers.

In previous work on phonologically motivated orthographic variation in English tweets, the use of non-standard spellings was conditioned by many of the same factors that affect the respective variants in speech. This finding is one reason that non-standard orthography represents such a compelling new data source in sociophonetic research. In the present study, however, the factors considered had no statistical effect on the orthography of nasal vowels. I will suggest three possible reasons for the lack of any measurable effect from these factors. The first is the lower rate of non-standard spellings of French nasal vowels compared to similar variables investigated in English (e.g. Eisenstein, 2015). Whether limited to nasal vowels or affecting other non-standard French orthography, lower token counts make it less likely for conditioning factors to achieve statistical significance. A second possible reason involves a limitation of this study. Although tweets specifically located outside of France were excluded for much of the analysis, data collection was not limited to a narrow dialect area. Hansen’s (2001b) findings are based on data from 42 Parisians, and while the NMF nasal vowel shift has been documented well beyond Paris, the particular conditioning factors identified by Hansen have not. Of course, limiting data collection for the present study to tweets reliably identified as written by Parisians would exacerbate the limitations of token counts. Finally, there is the possibility that the factors that condition this variable in speech simply do not apply as strongly to tweets. The factors of stress and adjacent labial segment in particular operate on an articulatory level of speech production that does not automatically extend to silent writing. Although frequency and grammatical category might be expected to play a similar role in writing as in speech, perhaps factors specific to the modality of written social media should be given greater weight. These include keyboard layouts, message length limits, and the role of predictive typing and spelling correction tools. The universality of Eisenstein’s (2015) findings regarding phonologically-motivated non-standard spellings is called into question by these results, and further study in this area is warranted.

Evidence of intentional respelling is found more often with high-frequency words than with low-frequency words. The written form of a frequent word is encountered more often, reinforcing mental access to the correct spelling. Although non-standard orthography is used for both frequent and infrequent words, the motivation for its use may therefore differ. Users may produce non-standard spellings like emprinterais due to ignorance or carelessness while producing non-standard spellings like bian as an intentional act of satire, criticism, rebellion, or identity assertion.

Of course, a use of non-standard orthography may be intentional without providing any explicit intentionality markers as evidence of this fact. Another possible source of evidence about intentionality is the orthography used in other tweets by the same user, as well as tweets in dialogue. Four of the five tweets in (1) are replies to a tweet by a different user, and 49% (1618/3312) of the tweets containing non-standard orthography of nasal vowels are replies. In cases like (1e), both the original tweet and the reply contain non-standard orthography of nasal vowels. Orthographic patterns can become part of the shared practice of participants in a dialogue, even if the participants are aware that the orthography is non-standard. Particularly strong evidence for intentionality would be if a user spells nasal vowels with standard orthography in certain tweets, directed at a certain audience and about a certain topic, and with non-standard orthography in other tweets, with a different audience and topic. Such an analysis is provided by Tatman (2015) for some dialectal variants of English. This type of detailed qualitative analysis is beyond the scope of the current contribution. However, pragmatic analysis such as this may be critical to our understanding of when and why Twitter users use non-standard orthography.

7. Conclusion

The investigation of phonologically motivated orthographic variation on social media is a new and promising area of research in sociophonetics. Non-standard orthography that reflects phonetic variation can provide different kinds of insights than acoustic data while giving access to data from a wider population than it would ever be possible to record acoustically. Most work in this area has focused on English, but this study confirms that the same type of orthographic variation is found in French tweets.

The use of written data supplements previous research on the French nasal vowel system in multiple ways. Acoustic and perceptual data has been inconclusive on the question of whether /œ̃/ has completely merged with /ɛ̃/. The written data examined here adds a new perspective on this issue, providing evidence in support of the hypothesis that the merger is incomplete in NMF, although the two vowels may be merged for some speakers and some words. Twitter data also informs our understanding of the social salience and perception of shifted nasal vowels. From the written data, it appears that Twitter users are most conscious of the change in the vowels /ɛ̃/ and /ɑ̃/ and in more frequent words, since in those cases non-standard spellings are often accompanied by markers of intentionality. The raising and extra rounding of /ɔ̃/ appears to be less salient, and non-standard spellings of less frequent words are more likely to be unintentional. When Twitter users make explicit comment on non-standard spellings of nasal vowels, it is sometimes to mock or criticize others, although it is often not clear whether the criticism is directed towards an especially shifted pronunciation of the vowels or towards accidental misspelling. While examples like (1e) criticize misspelling as associated with a lack of education, a more thorough qualitative analysis of the data would be required to better understand the attitudes that these tweets reveal towards shifted pronunciation of nasal vowels. These insights highlight the usefulness of written social media data as a secondary resource in sociophonetic research.

Continued work of this type on a variety of languages and phonetic variables is needed to determine the extent to which factors such as articulatory assimilation, which by its nature would only be expected to impact spoken language, transfer to non-standard orthography. Future work should also compare the realization of particular phonetic variables across multiple written platforms. As digital social media have become an increasingly important part of global cultures, communities of practice have developed within each available platform. Different platforms are used by different demographics and have their own registers and genre conventions. The patterns found in the orthography of French nasal vowels on Twitter may not be the same as those found on Instagram or Reddit. Written social media of all varieties are a rich source of authentic language data that is less bound by convention than other written genres. This data has many benefits that can be exploited in sociophonetic research.

Competing interests

The author declares none.

Acknowledgements

I am grateful to Alex Rosenfeld, Lars Hinrichs, and the Texas Advanced Computing Center for their generous assistance in the preparation of the corpus used in this study. I would also like to thank the editors and reviewers of this manuscript for their invaluable feedback and suggestions.

References

Abitbol, J. L., Karsai, M., Magué, J.-P., Chevrot, J.-P., and Fleury, E. (2018). Socioeconomic Dependencies of Linguistic Patterns in Twitter: a Multivariate Analysis. In: WWW 2018: The 2018 Web Conference, April 23–27, 2018, Lyon, France. New York: ACM, pp. 1125–1134.
Androutsopoulos, J. (2000). Non-standard spellings in media texts: The case of German fanzines. Journal of Sociolinguistics, 4.4: 514–533.
Archive Team. (2018). The Twitter Stream Grab. URL: www.archive.org/details/twitterstream, retrieved 1 June 2018.
Baeza-Yates, R., and Rello, L. (2012). On measuring the lexical quality of the web. In: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality – WebQuality ’12. New York: Association for Computing Machinery, pp. 1–6.
Beddor, P. S., Krakow, R. A., and Goldstein, L. M. (1986). Perceptual Constraints and Phonological Change: A Study of Nasal Vowel Height. In: C. Ewen and J. Anderson (eds), Phonology Yearbook, vol. 3. Cambridge: Cambridge University Press, pp. 197–217.
Bordal, G. (2012). A phonological study of French spoken by multilingual speakers from Bangui, the capital of the Central African Republic. In: R. Gess, C. Lyche and T. Meisenburg (eds), Phonological Variation in French. Amsterdam: John Benjamins, pp. 23–44.
Boutin, B. A., Gess, R. and Guèye, G. M. (2012). French in Senegal after three centuries: A phonological study of Wolof speakers’ French. In: R. Gess, C. Lyche and T. Meisenburg (eds), Phonological Variation in French. Amsterdam: John Benjamins, pp. 45–72.
Brody, S. and Diakopoulos, N. (2011). Cooooooooooooooollllllllllllll‼‼‼‼‼‼‼ Using Word Lengthening to Detect Sentiment in Microblogs. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Edinburgh: Association for Computational Linguistics, pp. 562–570.
Caravolas, M. (1996). Six-year-olds’ phonological and orthographic representations of vowels: A study of 1st grade Québec-French children. McGill University Doctoral dissertation.
Carignan, C. (2014). An acoustic and articulatory examination of the “oral” in “nasal”: The oral articulations of French nasal vowels are not arbitrary. Journal of Phonetics, 46: 23–33.
Coquillon, A. and Turcsan, G. (2012). An overview of the phonological and phonetic properties of Southern French: Data from two Marseille surveys. In: R. Gess, C. Lyche and T. Meisenburg (eds), Phonological Variation in French. Amsterdam: John Benjamins, pp. 105–128.
Dredze, M., Paul, M. J., Bergsma, S., and Tran, H. (2013). Carmen: A twitter geolocation system with applications to public health. In: Expanding the Boundaries of Health Informatics Using Artificial Intelligence: Papers from the AAAI 2013 Workshop. Palo Alto, California: AAAI Press, pp. 20–24.
Eisenstein, J. (2013). Phonological factors in social media writing. In: Proceedings of the Workshop on Language Analysis in Social Media – 2013 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-HLT 2013. Stroudsburg, Pennsylvania: Association for Computational Linguistics, pp. 11–19.
Eisenstein, J. (2015). Systematic patterning in phonologically-motivated orthographic variation. Journal of Sociolinguistics, 19.2: 161–188.
Eisenstein, J. (2018). Identifying Regional Dialects in On-Line Social Media. In: Boberg, C., Nerbonne, J. and Watt, D. (eds), The Handbook of Dialectology. Oxford: Wiley, pp. 368–383.
Fónagy, I. (1989). Le français change de visage? Revue romane, 24.2: 225–253.
Hambye, P. and Simon, A. C. (2012). The variation of pronunciation in Belgian French: From segmental phonology to prosody. In: Gess, R., Lyche, C. and Meisenburg, T. (eds), Phonological Variation in French. Amsterdam: John Benjamins, pp. 129–150.
Hansen, A. B. (2001a). Les changements actuels des voyelles nasales du français parisien : confusions ou changement en chaine ? La linguistique, 37.2: 33–48.
Hansen, A. B. (2001b). Lexical diffusion as a factor of phonetic change: The case of Modern French nasal vowels. Language Variation and Change, 13.2: 209–252.
Hansen, A. B. (2012). A study of young Parisian speech: Some trends in pronunciation. In: Gess, R., Lyche, C. and Meisenburg, T. (eds), Phonological Variation in French. Amsterdam: John Benjamins, pp. 151–172.
He, T. and Wang, W. (2009). Invented spelling of EFL young beginning writers and its relation with phonological awareness and grapheme-phoneme principles. Journal of Second Language Writing, 18.1: 44–56.
Honeybone, P. and Watson, K. (2013). Salience and the sociolinguistics of Scouse spelling: Exploring the phonology of the Contemporary Humorous Localised Dialect Literature of Liverpool. English World-Wide, 34.3: 305–340.
Jaffe, A. (2000). Introduction: Non-standard orthography and non-standard speech. Journal of Sociolinguistics, 4.4: 497–513.
Jaffe, A. and Walton, S. (2000). The voices people read: Orthography and the representation of non-standard speech. Journal of Sociolinguistics, 4.4: 561–587.
Kendall, T. (2006). Recording and environmental effects in sociolinguistic interviews: Implications for sociophonetic analysis. The Journal of the Acoustical Society of America, 119.5: 3337.
Labov, W. (1972). Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Linell, P. (2005). The written language bias in linguistics: its nature, origins and transformations. New York: Routledge.
Lonchamp, F. (1978). Recherches sur les indices perceptifs des voyelles orales et nasales: Application à la structure du système vocalique français et de diverses autres langues. University of Nancy Master thesis.
Magué, J.-P., Rossi-Gensane, N. and Halté, P. (2020). De la segmentation dans les tweets : signes de ponctuation, connecteurs, émoticônes et émojis. Corpus, 20.
Malécot, A. and Lindsay, P. (1976). The Neutralization of /ɛ̃/-/œ̃/ in French. Phonetica, 33.1: 45–61.
McCulloch, G. (2019). Because Internet: Understanding the new rules of language. New York: Riverhead Books.
Mettas, O. (1973). Les réalisations vocaliques d’un sociolecte parisien. Travaux de l’Institut de Phonétique de Strasbourg, 5: 1–11.
Mooney, D. (2016). Transmission and diffusion: Linguistic change in the regional French of Béarn. Journal of French Language Studies, 26.3: 327–352.
Néron, M. (2017). French Nasals: An Objective View. Journal of Singing, 73.4: 413–420.
New, B., Brysbaert, M., Veronis, J., and Pallier, C. (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 28.4: 661–677. URL: http://www.lexique.org, retrieved 25 July 2018.
Passy, P. É. (1891). Étude sur les changements phonétiques et leurs caractères généraux. Paris: Firmin-Didot.
Perez, S. (2017). Twitter officially expands its character count to 280 starting today. TechCrunch. URL: https://techcrunch.com/2017/11/07/twitter-officially-expands-its-character-count-to-280-starting-today/, retrieved 14 August 2018.
Pooley, T. (2006). On the geographical spread of Oïl French in France. Journal of French Language Studies, 16.3: 357–390.
Preston, D. R. (2000). Mowr and mowr bayud spellin’: Confessions of a sociolinguist. Journal of Sociolinguistics, 4.4: 615–621.
R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. URL: https://www.R-project.org/, retrieved 18 August 2018.
Racine, I. and Andreassen, H. N. (2012). A phonological study of a Swiss French variety: Data from the canton of Neuchâtel. In: Gess, R., Lyche, C. and Meisenburg, T. (eds), Phonological Variation in French. Amsterdam: John Benjamins, pp. 173–210.
Rochet, B. L. (1976). The Formation and the Evolution of the French Nasal Vowels. Tübingen: Niemeyer.
Sampson, R. (1999). Nasal vowel evolution in Romance. Oxford: Oxford University Press.
Sebba, M. (2003). Spelling rebellion. In: Androutsopoulos, J. K. and Georgakopoulou, A. (eds), Discourse Constructions of Youth Identities. Amsterdam: John Benjamins, pp. 151–172.
Straka, G. (1979). Remarques sur les voyelles nasales, leur origine et leur évolution en français. In: Les sons et les mots: Choix d’études de phonétique et de linguistique. Paris: Klincksieck, pp. 501–531.
Tatman, R. (2015). # go awn: Sociophonetic Variation in Variant Spellings on Twitter. Working Papers of the Linguistics Circle, 25.2: 97–108.
Tatman, R. (2016). “I’m a spawts guay”: Comparing the use of sociophonetic variables in speech and Twitter. University of Pennsylvania Working Papers in Linguistics, 22.2: 160–170.
Walter, H. (1994). Variétés actuelles des voyelles nasales du français. Communication and Cognition, 27.1–2: 223–236.
Wengelin, Å. (2002). Text production in adults with reading and writing difficulties. Göteborg University Doctoral dissertation.