I. Introduction
Vocal fry is the phenomenon in which an individual drops their voice to the lowest range possible and emits a low, growly, creaky tone of voice. The affect came to public attention in the late 2000s, when a number of prominent female reality television stars, notably Jersey Shore's Snooki and the cast of Keeping Up with the Kardashians, were scrutinized for their manners of speech. Vocal fry has been regarded as a generational vocal style, often compared to “valley girl” speak, in which speakers add “like” superfluously in a sentence, or uptalk, in which assertions are delivered with the intonation of questions. Like previous generational vocal styles, vocal fry has been met with derision by cultural critics, such as National Public Radio's On the Media co-host Bob Garfield, who in a recent interview on the magazine's language blog called the phenomenon “annoying, I mean, really annoying,” “vulgar,” and “repulsive” (Garfield and Vuolo 2013). Garfield's sentiments echo reactions discussed in other national news and culture outlets such as The Atlantic (Khazan 2014) and This American Life (Glass 2015).
Critics of vocal fry often claim that it is an exclusively female vocal pattern, and some, such as those calling recently to complain to This American Life about female reporters using vocal fry, say that the voicing is so distracting that they cannot understand what is being said (Glass 2015). It is an extreme reaction to claim that a phonation is so distracting as to prevent uptake of the semantic content of a speech act associated with it. Moreover, this claim is made predominantly about female users of vocal fry, despite fry being a phonation employed by users across a range of gender expressions; for instance, Ira Glass, the host of This American Life, calls out his own vocal fry in his reporting on the issue. In the mid-2010s, influential social-science research into female vocal fry associated vocal fry use with lower job prospects for American women in the workplace (Anderson et al. 2014), whereas no parallel findings were apparent for male vocal fry. This research reinforced, and is widely cited by, cultural commentators and members of the public. The research, and the cultural commentators, recommend that women stop using vocal fry in order to improve their communicative abilities.
In this article, we are interested in the scholarly and public distaste for female vocal fry, as we see this distaste as a form of limiting women's communicative autonomy. From valley girl speak and uptalk to vocal fry, the generational vocal stylings of young women are widely recognized as linguistically innovative and socially complex, and these innovations are often derided by linguistically mainstream commentators who discourage their use. What is fascinating about vocal fry in particular is that it does not, in its use in American English, carry semantic or even grammatical content, and a common reaction to it (exemplified in the cases above) is a reaction that refuses to engage with the said content of women's speech. This is the particular reaction to vocal fry that we aim to characterize here. We call it the “negative non-content-based response” to speech acts performed using female vocal fry; it occurs when hearers report being unable to understand what a speaker is saying because the sound of the speaker's voice is annoying. We address this negative hearer-reaction to female vocal fry from the perspectives of philosophy of language, feminist epistemology, and linguistics. Our primary interest is in characterizing the dismissive reaction to female vocal fry as a phenomenon within pragmatic accounts of language use. However, because this reaction is one that manifests along gendered lines, and because both speaker-use of fry and hearer-reactions are indexed to gender as well as to other intersectional features of the speaker's personal identity, it is both impossible and undesirable to explicate the pragmatics of the negative hearer reaction to female vocal fry without orienting it in the context of larger conversations about the power dynamics of language use.
Likewise, because vocal fry is produced by a more straightforwardly biological mechanism than many speech phenomena, and because detractors have used so-called scientific arguments to disparage the practice of vocal fry, we incorporate empirical perspectives into our discussion.
In what follows, we construct a framework that explains not only how women are judged negatively for vocal frying, but why these particular negative judgments serve to negate, dismiss, or otherwise erase the asserted content of their utterances. This framework consists of two main parts: the construction of a distinction between content-based and non-content-based responses to utterances, and the explication of the negative response to female vocal fry through the pragmatic and cognitive-linguistic construct of communicative echoing. The first part of this framework identifies and fills a gap in the philosophy of language literature, and the full account aims to extend recent work on silencing and other forms of epistemic injustice. To assemble our framework, in section II we use empirical studies on contemporary female vocal fry to describe patterns of use and the negative reaction we wish to explicate. We argue that this reaction occurs due to female vocal fry's violation of the frequency code, a phonetic construct in which vocal intonation is indexed to gender presuppositions. Then, in section III we characterize our distinction between content-based and non-content-based responses to utterances, arguing that the dismissal of asserted content plays a key role in contributing to the specific type of negative response to female vocal fry that concerns us. Section IV contains the second part of our framework, which uses Deirdre Wilson and Dan Sperber's echoic account of irony to argue that the negative reaction to vocal fry fails to interpret the speaker's utterance as an assertion (Wilson and Sperber 2012). We argue that in cases of non-content-based responses to female vocal fry, women are judged as annoying because their use of vocal fry is mistakenly taken as a signal of inappropriately echoing another authoritative utterance, rather than making an assertion.
We use results from section II to show that this is indicative of sexist attitudes and is generally an inappropriate response to instances of female vocal fry. Section V orients our discussion with respect to contemporary pragmatics and feminist epistemology, and section VI contains brief concluding remarks.
II. Vocal fry in sociolinguistics: the frequency code
Although it has been getting new media attention of late, vocal fry has been studied as a type of phonation¹ for over fifty years. In this section, our historical survey demonstrates that vocal fry is neither a novel phenomenon in the English-speaking world nor one produced exclusively by female speakers. Nonetheless, there is some evidence that a) it is becoming more frequent among some communities of young women, especially in certain American communities, and that b) female vocal fry may be viewed as a different sort of speech act from male fry.
Some early research treated fry as a type of voice disorder, but it has been studied as a nonclinical voicing since at least the 1960s (Catford 1964; Hollien et al. 1966). Vocal fry is produced by a slackening of the vocal cords such that they vibrate irregularly and lower the fundamental frequency of voice pitch. It is usually located on a spectrum of voicing types produced by voluntary changes in the anatomy of the vocal cords during speech. Such spectra vary in terminology and specificity (for example, Catford 1964; Gordon and Ladefoged 2001), but generally identify at least five phonation types, arranged from least constricted to most constricted vocal cords: whisper, breathy voice, modal (that is, “normal”) voice, vocal fry, and full glottal stop (as in, the sound produced during the hyphenation in an utterance of “uh-oh”). Falsetto is sometimes included on such spectra between breathy and modal voices.
Early descriptions of the sound produced by vocal fry reveal unpleasant metaphorical associations that persist in contemporary complaints about the annoying quality of the voicing. In his 1964 survey of phonation, for instance, phonetician J. C. Catford writes of fry, “The auditory effect is of a rapid series of taps, like a stick being run along a railing” (Catford 1964, 32). More recently, definitions of fry are given in terms of acoustic mechanisms, as in the following: “Pitch, which has been observed to be extremely low, is controlled by aerodynamic factors and not by varying the longitudinal tension. The F0 [fundamental frequency] and amplitude variation of consecutive glottal pulses is further known to be very irregular. ‘Double pulsing’ is also a frequent characteristic of creak, where two pulses, with different amplitude and duration, occur within what appears to be one cycle” (Gobl and Chasaide 2003, 13), or “a train of discrete laryngeal excitations, or ‘pulses’ of low frequency . . . it is a phonational register occurring at frequencies below those of the modal register” (Hollien et al. 1966, 263–64), or “the creaky phonation is characterized by irregularly spaced pitch periods and decreased acoustic intensity relative to modal phonation” (Gordon and Ladefoged 2001, 387).
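To make the acoustic criteria in these definitions concrete, the following minimal Python sketch computes two simple quantities from a sequence of glottal-pulse periods: the mean fundamental frequency, and a local jitter ratio (the average change between consecutive periods, relative to the mean period). The function names and the period values are hypothetical illustrations, not measurements or methods drawn from the studies cited; actual analyses work from recorded speech.

```python
from statistics import mean


def mean_f0_hz(periods_s):
    """Mean fundamental frequency (Hz) implied by glottal-pulse periods in seconds."""
    return 1.0 / mean(periods_s)


def local_jitter(periods_s):
    """Mean absolute difference between consecutive periods, divided by the mean period.

    Larger values indicate more irregularly spaced pulses.
    """
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return mean(diffs) / mean(periods_s)


# Hypothetical, invented period sequences (seconds); NOT measured data.
modal_periods = [0.0050, 0.0051, 0.0050, 0.0049, 0.0050]   # short, regular periods
creaky_periods = [0.020, 0.012, 0.025, 0.015, 0.022]       # long, irregular periods

for label, periods in [("modal", modal_periods), ("creaky", creaky_periods)]:
    print(label, round(mean_f0_hz(periods), 1), "Hz; jitter", round(local_jitter(periods), 3))
```

On these invented values, the creaky sequence yields a much lower mean frequency and a much higher jitter ratio than the modal one, which is the pattern the quoted definitions describe in prose.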
These descriptions are associated with the detection of fry by spectrograms, which measure patterns of vocal pitch frequency and intensity over time. Contemporary sociolinguistic and linguistic-anthropological analysis of vocal fry in English-speaking subjects² typically uses spectrogram measurements to detect instances of fry, then determines whether the speaker's subject of discussion, speaking context, personal identity, or other features correlate with the presence or absence of frying. For instance:
• John Esling correlates more frequent use of fry with higher social status in Edinburgh speakers (Esling 1978);
• Jane Stuart-Smith correlates it with male speakers in Glasgow (Stuart-Smith 1999);
• Carmen Fought identifies it as a central feature of Southern California Chicano English (Fought 2003);
• Jeannine Carpenter associates it with projections of masculinity in teenage males (Carpenter 2006);
• Norma Mendoza-Denton argues that it “assists [Chola/Chicana] gang girls with the construction of a hardcore persona” when telling fight narratives (Mendoza-Denton 2011, 266).
In an influential article linking phonation pitch to gender expression, John Ohala develops what he terms the “frequency code,” which links higher-pitched phonation types like falsetto with femininity and lower-pitched phonation, including fry, with masculinity (Ohala 1994; see also the review of vocal masculinity and femininity in Pisanski and Feinberg 2017). The frequency code is implicit or explicit in many of the articles and books listed above, and recent work on gender and vocal fry typically either reinforces or calls into question the associations of the frequency code. According to the frequency code, a woman speaking with vocal fry will sound “like a man”; likewise, a man speaking in falsetto will sound “like a woman.”
The frequency code is often rationalized biologically, appealing to the idea that men³ typically have bigger larynxes and thus lower fundamental frequencies in their phonation. Sociolinguist Robert Podesva, whose research program centers around investigating associations between phonation types and personal identity markers, puts the point thus:
The creaky voice pattern may arise from iconic associations between creaky voice and masculinity—the low pitch characterizing creaky voice is interpreted as resembling masculinity, due to gross tendencies for men to have lower pitched voices than women. This indexical association can be recruited to link creaky voice to stances conventionally associated with men, like toughness, at higher orders of indexicality. . . . Similar ideological processes link falsetto, and its characteristically high pitch, to femininity. (Podesva 2013, 427)
The frequency code rationalizes certain features of the echoic account of female fry that we develop below. The misfiring that occurs in cases of dismissive reactions to female fry, like the ones referenced in the introduction, may originate in confusions of (sociolinguistic) utterance indexicality.⁴ Masculine-indexed utterances issuing from female speakers present conflicted messages of what sort of thing the speech act is. Accounting for this conflict is one of our main aims in presenting our echoic account of female vocal fry in section IV. In section V, we argue that this conflict cannot be accounted for by existing feminist epistemologies.
It is worth noting that frequency-code explanations of female fry as masculine need not appeal to speaker intent; that is, a woman need not be consciously indexing masculinity to produce fry that is interpreted by an audience as echoing masculine voicing. In fact, there are cases where the indexed associations between fry and masculinity have only developed in a community after female fryers have encoded fry in other utterance contexts. Mendoza-Denton's account of Chola/Chicana fight narratives, for instance, argues that though her initial data on fry demonstrates its contextual use in the construction of a “hardcore” persona that was not necessarily male-imitating, the phonation became associated with masculinity through its later uptake in the (male) Chicano rap community. Discussing fry in “gang girl” communities, she argues:
[C]reaky voice is not necessarily associated with masculinity, but instead with a counterhegemonic gendered performance of being “hardcore,” once it was picked up as part of the Chicano gangster persona in popular gangster rap music and film, it acquired strong associations with Chicano men. What for the girls involved in gangs was a discourse device manipulating low-tones in narratives acquired an indirect indexicality that pointed to Chicano masculinity once it became telescoped through the wider media lens that associated gangster rapper styles exclusively with men. (Mendoza-Denton 2011, 270)
Although the initial “gang girl” speakers were not explicitly trying to sound masculine, the fry that they used in their fight narratives was appropriated by the male members of their gang communities as signaling masculinity.
A variety of recent studies tie vocal fry not to masculinity but to other indexical and nonindexical utterance markers, including increased intimacy of a verbal exchange (Speck 2006), demonstrations of authority by college-age women (Lefkowitz and Sicoli 2007), and speech about emotional topics (Loss and Zold 2014). Most of the sociolinguistic research on vocal fry centers around the speaker's utterances, rather than audience perception of those utterances, but a few studies have investigated audience reactions (for example, Gobl and Chasaide 2003; Yuasa 2010; Loss and Zold 2014). In these, female vocal fry is correlated with a range of affects from association with specific emotions, for example, boredom (Gobl and Chasaide 2003) or nonaggression (Yuasa 2010), to level of education (Yuasa 2010). Sara Loss and Elizabeth Zold argue that young women (under forty) use fry to communicate authority, whereas older women (over forty) use it to communicate about emotional topics (Loss and Zold 2014). This corroborates Daniel Lefkowitz and Mark Sicoli's study of college-age women in Virginia, for whom vocal fry was a marker of authoritative speech (Lefkowitz and Sicoli 2007). None of these studies discuss whether fry in these latter instances is used intentionally, nor whether young women fry users associate their use with sounding “like a man.”
III. The negative reaction to vocal fry as a non-content-based response
In this section we discuss the set of reactions to female vocal fry that are embodied in the assessment of fry as annoying or difficult to listen to. We show that this response to female vocal fry is a non-content-based response and discuss how non-content-based responses limit the communicative and agentive autonomy of those on the receiving end of the response.
As the previous section highlights, sociolinguistics has developed a nuanced set of empirical conclusions associating female use of vocal fry with a striving for authority. In Anglophone linguistic communities, male voices have historically been the voices of power and authority. So, the implication of some of the research surveyed above is that women adopt vocal fry in order to gain authority by sounding more like men.⁵
Empirical evidence supports this implication: in addition to the research discussed above, a recent study found that women who are a part of a male-dominated workplace are more likely to use vocal fry (Anderson et al. 2014). In a disappointing if unsurprising twist, that same study also reported that female users of vocal fry, rather than being seen as more authoritative, are perceived as less confident and less competent by their coworkers. Certain cultural commentators who identify as feminists have pointed to this negative reaction to vocal fry as evidence that women should cease using vocal fry in order to sound more professional (Katz 2014; Wolf 2015). Indeed, some speech-language pathologists even offer therapy plans to rid speakers of vocal fry. Although it is not central to our thesis, we do wish to point out that there is no evidence that engaging in vocal fry is harmful to healthy vocal systems; we view this workplace voice-policing as analogous to coworkers making suggestions about the type of clothing and amount of makeup their women colleagues should wear, and we hope readers take it as a given that these latter practices should be discouraged.
The negative reaction to vocal fry is not exclusive to men, but media stories on vocal fry and YouTube comments on videos of women using vocal fry typically feature more men than members of other genders reporting negative reactions to vocal fry. This may be merely statistical: if fry is more prevalent in male-dominated workplaces, it is more likely to be used around men than around women in the workplace. Regardless, the negative reaction to female use of vocal fry that is our primary concern here is expressed with some variant of “that is an annoying sound.” It appears in interviews in news stories about the recent rise of female vocal fry as a sociolinguistic phenomenon, as illustrated in our introduction, and it is almost invariably expressed multiple times among the top comments in YouTube videos about, or featuring, female vocal fry. Common extensions of this reaction are for hearers to report that they cannot stand to listen to vocal fry or to request or demand that speakers cease using vocal fry.
This negative reaction to female vocal fry indicates that women users of vocal fry are being assessed and responded to on the basis of how they sound, rather than by the content of their speech. We call this type of response a non-content-based response to a speech act, to contrast it with content-based responses to speech acts. A content-based response is one that evaluates, and responds to, the linguistic or conceptual content of a speech act, whereas a non-content-based response is one that does not engage the content presented in a speech act. For example, if a speaker says, “It's cold outside,” some content-based responses include, “Yes, it sure is,” “Oh, it's not so bad,” “It's supposed to be warmer tomorrow,” “Do you want an extra layer?” and, “I have some lip balm.” Each of these responses engages with the idea that it is cold outside, either by agreeing or disagreeing, or by acknowledging the content tacitly by generating a related response. Non-content-based responses to, “It's cold outside,” include responses such as, “What'd you say?” “No hablo ingles,” “Oh, I didn't see you there,” “Hump day, am I right, buddy?” and “Shh! There's a bear behind you.” Non-content-based responses acknowledge and respond to the fact that a speech act has been performed but do not engage with the linguistic content of that speech act. In other words, non-content-based responses respond to the fact that something was said but not to what was said. This distinction has not, to the best of our knowledge, been previously drawn in the philosophy of language, and we believe it is an essential distinction to understanding the negative reaction to vocal fry—and that it may prove useful in unpacking other types of exchanges (including some that have become all too common in 2020, such as “I think your mic is muted” and “Your video froze”) as well.
III.i The Effects of Non-Content-Based Responses
An interesting feature of non-content-based responses is that, when a speaker is intending to communicate, receiving a non-content-based response slows the process in at least two ways. First, non-content-based responses require the speaker to “try again” in their original attempt to present content, or to give up on the project of eliciting a content-based response; and second, some non-content-based responses generate response requirements of their own, for example, by changing the subject or deflecting from the original speaker's topic.
Consider the bear response in the examples above: The speaker who receives the response, “Shh! There's a bear behind you,” to her statement “It's cold outside,” is now presented with a variety of practical and communicative challenges that she was likely not anticipating when she uttered her initial comment on the weather. To wit: If there is a bear, dare she speak again? What is the set of actions with the best likelihood of survival? Is her interlocutor being serious or playing a prank on her? If it is a prank, is it one intended good-naturedly, or is the interlocutor acting cruelly? If there is no bear, how important is it to restart the conversation about the weather, and is it worth the awkwardness of repeating herself? And so forth.
Non-content-based responses invariably stall, and can foreclose, the transmission of content over the course of a linguistic exchange. For example, journalist Jessica Grose recounted interviewing an older man for Businessweek who responded to her questions by telling her that she sounded like his granddaughter (Grose 2015). Such a response places Grose in a difficult communicative position. As a result of receiving this non-content-based response, Grose must decide whether to pursue her interlocutor's comment as part of the interview, ignore it and risk offending him, take offense herself, alter her vocal tone for the duration of the interview, and so on.
Because non-content-based responses impede linguistic exchanges, the particular negative response to female vocal fry we are interested in has some systematic communicative consequences. In the workplace, these can translate to additional material consequences as well. For example, suppose Shohreh goes to her boss and asks for a raise by saying, “I believe I deserve a raise due to all my hard work,” while using vocal fry. She justifies the raise by listing her recent accomplishments, explaining her dedication to the company, and reciting statistics that demonstrate that she is making below-average pay for women with her qualifications in her field. Rather than responding to the content of her request, her boss responds by telling her that her voice is annoying and that she should change the way she speaks. As a result of this non-content-based response, Shohreh must either give up or start anew in her pursuit of the raise.
III.ii Non-Content-Based Responses and Personal Identity
Some situations call for non-content-based responses, such as if there is genuinely a bear behind you who will be provoked by your continued speech. There are also some cases in which respondents may form the judgment that, based on a speaker's previous utterances or actions, non-content-based responses are the most reasonable response to that particular speaker or that type of utterance. We have in mind here times when inappropriate or offensive content is delivered in a speech act. At such times, the act warrants some response, but not one that legitimizes its content through engagement. In these instances, when the respondent's prior assessment has engaged with the content of a speaker's pattern of utterances, a non-content-based response to a particular utterance is warranted. In such cases, then, a respondent has every right, and perhaps some moral duty, to ignore the content of the speech act. For example, suppose a man at a coffee shop approaches a woman whom he does not know and says, “You would look prettier if you smiled more.” The woman may surmise that regardless of her beliefs about the speech act's content, no fruitful conversation will follow from engaging on that content with a speaker who has delivered it in such a context. At that point, the woman is justified in delivering a non-content-based response, such as, “I'm working.”
This example shows that assessment of whether a particular response to an utterance is content-based or non-content-based is not always clear-cut or binary: unlike “Shh! There's a bear behind you,” the woman's response of “I'm working” was generated in a wider context of her forming the judgment that sexist utterances do not always merit content-based responses. In this wider lens, she does respond to the content of the utterance by choosing not to engage it with a content-based response. In future work, we hope to develop a more robust account of content-based and non-content-based responses that accommodates these differing levels of context.⁶
An important component in this example is that the respondent in such cases is responding to some features of the original speaker's personal identity. It is who is speaking that is being judged, more than the content of the particular speech act. This is the case as well in the negative response to female vocal fry. The dismissal of female vocal fry as annoying and hard to listen to ties intimately to a dismissal of aspects of personal and gender identity. Grose writes evocatively about the experience of receiving this negative response:
I remember one [listener-commenter] in particular said I sounded like “a valley girl and a faux socialite,” and there were a couple of comments that echoed that, and the tenor of them was pretty nasty. And before that I had never really thought about my voice, one way or the other. No one had ever commented on it to me. I was hurt—that sounds a little silly, I'm a big girl, I write all the time on the Internet, and so I'm used to criticism, but there's something really personal about your voice, and especially if it's something you've never thought about as unpleasant. It's not fun to hear that people find it irritating. (Grose 2015)
We agree with Grose that voices are deeply personal aspects of communication, and we see our project as one that aims to incorporate this intimate and under-appreciated dimension of speech acts into contemporary philosophy of language. In addition to being personal and individual, though, voices are also typically gendered in a way that other dimensions of the speech act, such as grammar and discursive function, are not. When women are judged negatively for deploying vocal fry, the judgment being made is not about the content of their speech, but about the manner of their utterance. As a result, the linguistic exchanges such women participate in are stalled and can be halted in their entirety. This limits such women's abilities to communicate, and consequently their abilities to act as autonomous agents.
The distinction between content-based and non-content-based responses is crucial to our understanding of the communicative misfire that occurs when instances of female vocal fry are met with the negative response we have described. In the next section, we draw on work in cognitive linguistics to propose an explanation for this misfire. Our account, which we call the echoic account of vocal fry, aims to explicate both the variety of conscious and nonconscious motivations women have for employing vocal fry and the patterning behind the non-content-based negative response. Once we have laid out our account, section V orients our view with respect to related work in feminist epistemology.
IV. The echoic account of vocal fry
Before we proceed further, we want to reiterate that our diagnosis of the misfire that produces the non-content-based response to female vocal fry is not an apologetic for those who perform the response. Both authors of this article are consummate female vocal fryers with no interest in reshaping our vocal patterning for the comfort of our audiences. We are interested in accounting for where the misfire comes from in order to put it in the light, so that it, and the non-content-based response, can be seen as the sexisms that they are.
Characterizing the dismissive “annoyance” response to female vocal fry as non-content-based has delivered part of the complex diagnosis of female vocal fry's reception that we aim to make here. Female vocal fry's violation of the frequency code offers something of an explanation, although certainly not an excuse, for the psychology behind the response: the sociolinguistic research surveyed in section II shows that vocal fry is indexed as masculine, directly or indirectly, which means that female performances of vocal fry may be heard as a conflicted or confused utterance, not unlike hearing someone you believe to be named “Monika” introducing herself as “Julia.”
We believe this perceived conflict can be understood as a conflict in the hearer's mind about the nature of the speech act performed by the speaker. If the co-author of this article who is named Monika says, “I am Monika,” the nature of that speech act is fairly straightforward: Monika is asserting content in a declarative sentence, perhaps in order to communicate which of the co-authors she is to a conference audience, or to introduce herself to a new colleague, or to provide a classroom example of a declarative sentence. However, if the co-author of this article who is named Monika says, “I am Julia,” and the hearer knows that the speaker is in fact Monika, then the hearer is presented with a problem to solve: What did this speaker mean? It may be that Monika is playing a prank on the hearer, or that she is indicating that she will be standing in for Julia to deliver a lecture, or perhaps that she has decided to change her legal name simply in order to make the examples in co-authored articles much more confusing. This conflict is jarring for the hearer, and it can hinder the flow of conversation.
In canonical work that traces its origins to pragmatic theories within the philosophy of language (for example, Austin 1975; Grice 1989), linguists Sperber and Wilson develop the relevance theory of utterances, which emphasizes the existence of communicative intentions as a dimension of speech acts distinct from informative intentions (Sperber and Wilson 1986). Sperber and Wilson aim to characterize a variety of communicative intentions through further identification of nonsemantic features of utterances and their responses within communicative environments.
A central aim of their work (indeed, the eponymous aim) is to determine when and how different types of utterances achieve relevance. They contrast utterances that achieve relevance through descriptive use with those that achieve relevance through interpretive use. Descriptive utterances deliver straightforward asserted content, whereas interpretive utterances achieve relevance, and therefore meaning, by expressing some propositional attitude toward the asserted content. The meaningful content that it is raining is delivered descriptively when the sentence “It's raining” is uttered to describe the state of the present weather; the same meaningful content is delivered interpretively when the sentence, “I doubt that it's raining,” is uttered. In particular, for Sperber and Wilson, in this latter case the content that it is raining is delivered echoically. Echoic utterances achieve their relevance by expressing propositional attitudes toward other utterances: echoic utterances convey thoughts that are not directly about an actual or possible state of affairs, but about another thought that it resembles in content (Wilson and Sperber 2012).
Sperber and Wilson use their device of the echoic utterance to construct a theory of ironic utterances. In the echoic theory of irony, a speaker's ironic utterance is interpreted not as an assertion of any genuine said-content, but rather as the expression of a mocking or derisive attitude toward the descriptive expression, by an actual or hypothetical other speaker, of that content. Expressing such attitudes often occurs nonsemantically, through contextual features of the utterance such as timing, environment, and vocal tone. Sperber and Wilson argue that depending on the vocal tone of an utterance, the assertive propositional attitude in the ironical utterance differs from the optative or normative propositional attitude of the people whose thought is being echoed. This is to say that the way that an utterance is delivered can influence the way that the utterance is understood by the audience. In ironic utterances in particular, vocal tone can reign over the attributive utterance: in successful cases of irony, it governs the way that the referenced attitude is understood by the audience. Vocal tone, according to Sperber and Wilson, can either expose a way that the speaker wishes the attitude to be understood, or mimic the attitude itself (Wilson and Sperber 2012).
This is a technical way of saying something that most of us know pre-philosophically: that the way something is said can influence its meaning. For example, suppose that one morning, Monika and Julia are trying to figure out what to do on their day off.
Julia: It's such a nice day for a hike.
Monika: Let's go on a hike, then.
Monika and Julia proceed to pack up, travel to a nearby park, and head down the trail. Unfortunately, they did not check the weather forecast closely for the region of the park, and they are soaked by an afternoon shower in the middle of their foray. As they are wringing out their clothes, Monika quips:
It's such a nice day for a hike. [Footnote 7]
In this remark, Monika references both Julia's initial utterance, and Julia's attitude in that utterance, to contribute to what is ultimately understood by her audience as biting sarcasm. In this case, Monika is clearly echoing Julia, lending to Julia's understanding that Monika is trying to comment ironically on their failed hiking trip by referencing Julia's earlier comment.
In ironic utterances, the referenced attitude is usually the opposite of what the speaker is trying to convey. In the case of Monika and Julia's bedraggled hiking trip, the attitude referenced is the positivity surrounding a hike. The audience then derives that what this utterance and referenced attitude are meant to express is that the hiking trip has failed. By referencing the positive attitude ironically, Monika reinforces her expression of a negative attitude toward the trip at the time of her ironic utterance.
The echoic account of irony may be usefully extended to explain the conflict that generates the negative reaction to vocal fry. Analogously to cases of irony, in the negative reaction to vocal fry, the annoyed hearer may be understood as interpreting the speaker's remark as a dismissive reference to utterances delivered by male-voiced speakers. We believe that in its violation of the frequency code, female vocal fry may appear to some hearers to be presenting the notion of a male voice and any utterances it makes as a distorted echo, a dismissive reference to the idea of men talking. This interpretation extends the echoic account of irony beyond referencing the content of utterances to referencing vocal tones themselves, and to doing so with a particular interpretive attitude.
Sperber and Wilson discuss how vocal tones may help guide how the interpretation of an attitude is meant to be taken (Sperber and Wilson 1986). Our extension of their echoic account of irony posits not only that vocal tone guides interpretation, but also that it itself may be an object of reference in echoic utterances; that is, that vocal tone is such an important part of speech acts that it can play the role of the echoed utterance. In the echoic account of vocal fry, the negative reaction to vocal fry is accounted for at least in part by the hearer processing the speaker's utterance as one that is dismissively echoic of male voices. Unlike in the hiking example above, the supposed dismissive attitude is not expressed toward a particular piece of asserted content, but rather toward a vocal tone type itself. The fact that this supposed dismissal targets vocal tone, rather than asserted content, helps to explicate why the annoyance reaction is often followed by a non-content-based response: if an utterance spoken with female vocal fry is interpreted as aiming to express a dismissive echo of male vocal tone, then the content of the utterance is secondary at best.
As we discussed earlier, a non-content-based response is sometimes appropriate, such as when one speaker hasn't heard another's utterance and requests them to repeat what they said, or when a conversation needs to be halted in order not to attract the attention of bears. Some echoic utterances attract non-content-based responses on the basis of individual circumstances. Others attract non-content-based responses on the basis of personal identity features of the speaker, as discussed above. As an additional example, consider the echoic appropriation of the chant “Black Lives Matter!” in the form of the chant “All Lives Matter.” By echoing the structure of the “Black Lives Matter” chant, people who chant “All Lives Matter” are, wittingly or unwittingly, dismissing the complex set of structural injustices targeted against Black people of which the “Black Lives Matter” chant is intended to remind us. It is appropriate, we believe, simply to refuse to engage with the content of the “All Lives Matter” chant, because those who chant it are typically either racist, or they are ignorant of the dismissively echoic aspect of their utterance (which may amount to the same thing). Either set of circumstances creates a barrier to engagement with the content of their chant, which, while not insurmountable, is reasonable to see as grounds for halting or redirecting conversation. Responding to an “All Lives Matter” chant with, “I'm working,” “Okay, Boomer,” or even, “Shh! There's a bear behind you,” is, in our view, an appropriate deployment of a non-content-based response.
With the echoic account of vocal fry, we can understand the annoyance response to female vocal fry as a type of non-content-based response to an utterance that is taken to be echoic. The contexts in which the annoyance response is offered are systematic: it is deployed on women more than on men; it is, at least anecdotally, deployed by men more than by women; and white cultural critics consider it particularly in the context of the (presumed-white) workplace, especially in male-dominated workplaces. So the response is best understood as a response to personal identity features of the speaker, rather than simply to a set of particular contingent environmental circumstances. The annoyance reaction is prevalent in male-dominated workplaces, that is, in places where male voices are both more common than female voices and where it is more common to find a male voice than a female one in a position of authority or leadership. Social-science research has shown that both men and women express a preference for male-indexed voices in leadership positions, including both apparently male voices and deeper female voices (Anderson and Klofstad 2012). [Footnote 8] An implication of these circumstances and this research is that, in workplace annoyance reactions (and perhaps elsewhere), the annoyed hearers may be consciously or subconsciously interpreting female vocal fry users as dismissively referencing not simply the gender identity but the authoritative stance of male-voiced coworkers.
This interpretation is echoed in the panicked response of cultural commentators (for example, Wolf 2015; van Edwards n.d.) encouraging women to stop using vocal fry if they want to be taken seriously in the workplace. If a hearer believes a woman to be dismissing or deriding a male figure of authority through the use of vocal fry, they may view such an action as warranting a non-content-based response. In our view, both this reaction and the more common annoyance reaction tacitly interpret female vocal fry utterances as dismissively echoing an utterance type to which the speakers, by dint of gender and professional identity, are not entitled, rather than as making any asserted claim to be engaged with.
The non-content-based response to female vocal fry is hard to understand in part because the complex set of associations that lead hearers to call it annoying may or may not be consciously deployed. In articulating the echoic account of vocal fry, we have extended Sperber and Wilson's echoic account of irony to develop the notion that vocal tone types, and not merely asserted content, are the sorts of things that can be echoed. Sperber and Wilson's account of echoing then generates the mechanics necessary to understand how utterances that are taken to be echoic warrant distinct sets of responses from descriptive utterances. We suggested that echoic utterances in particular may generate non-content-based responses, either due to the context of the utterance or due to personal identity features of the speaker. In the case of the annoyance reaction to female vocal fry, we believe that personal identity features of the speaker are implicated as part of the warrant for the non-content-based response.
Again, our account is not meant to excuse the annoyance reaction. On the contrary, we believe that by highlighting the sexist attitudes at play in the uptake of female vocal fry as a dismissive echo of male-authority voices, we can shine a light on the fact that deeming vocal fry “annoying” is not a mere aesthetic preference. Instead, it is a restriction on the expressive autonomy of people, and especially of women. In the next section we orient our view against other communication- and language-oriented feminist epistemology.
V. Non-content-based responses, silencing, and discursive injustice
Feminist epistemology has widely acknowledged that the personal or group identity of a speaker can affect the variety of responses to a speech act performed by that speaker, and that power dynamics between speakers further constrain this variety of responses to the disadvantage of the disempowered (for example, Collins 1990; Crenshaw 1991; Langton 1993; Hornsby 1995; Fricker 2007; Dotson 2011; Kukla 2012; Maitra 2018). It should be evident at this stage that we see the annoyance response to female vocal fry as a thread in this tapestry. We have shown that the non-content-based nature of this response can stall or foreclose conversations to the disadvantage of the woman performing vocal fry, and we have attributed this response to sexist interpretations of the nature of vocal-fried speech acts. In this section, we offer a few points to frame our account of the non-content-based response and the echoic account of vocal fry against existing work on the complex interrelations between speaker identity and audience identity in pragmatic philosophies of language and feminist epistemology.
In Black Feminist Thought, Patricia Hill Collins characterizes systematic ways in which Black women's statuses as knowers are undermined through structurally oppressive, stereotype-driven “controlling images” that diminish their epistemic power (Collins 1990). As a result of this oppression, the ability of Black women to speak their truths and to transmit content as knowers is limited; this is what Collins identifies as silencing. We see the non-content-based response to female vocal fry as a form of silencing, given that it is a personal-identity-driven way in which certain speakers’ abilities to communicate are limited.
By identifying the non-content-based response as a variant of silencing, we do not wish to locate it anywhere near the severity or structural entrenchment of the forms of oppression with which Collins articulates the concept. The negative response to female vocal fry is a much smaller and more surmountable obstacle; it is also one that does not cleanly fit into the typologies of epistemic and testimonial injustice that arose as scholars like Rae Langton and Jennifer Hornsby drew connections between Austinian speech-act theory and feminist analyses of structural systems of oppression (Langton 1993; Hornsby 1995; Hornsby and Langton 1998). These approaches in the philosophy of language aim to characterize relations between speakers and hearers as a foundational part of the interpretation and analysis of speech acts. For instance, Hornsby's requirement of reciprocity between speakers and hearers consists in the need for a successful speech act to communicate its content to its audience (Hornsby 1995). This cannot be achieved if there is a failure of uptake on the part of the hearer. Although Hornsby does not measure failures of reciprocity by the yardstick of hearer responses, we believe the receipt of a non-content-based response is sufficient for a failure of reciprocity: if one's respondent engages with the fact but not the content of one's speech, then that content has by definition not been reciprocated.
Continuing in the construction of theories of language use that incorporate speaker identities, Quill Kukla and Mark Lance develop a typology of speech acts that disambiguates between agent-relative and agent-neutral inputs and outputs of speech acts (Kukla and Lance 2009). They offer the example of a colonel ordering a cadet to “Drop and give me ten push-ups!” This speech act generates a different demand upon the hearer (the cadet) than the same utterance said by a child to a sibling, or by one passing stranger to another. The differences in the force of the demand are what Kukla and Lance mean by agent-relativity.
It is not always the case that a non-content-based response is agent-relative; supposing that the respondent who utters, “Shh! There's a bear behind you,” is sincere, and that they do not hold bear-attack-level grudges against anyone in their vicinity, it is reasonable to suppose that anyone who said anything to that respondent in that moment would receive the same response. However, female vocal fry is not a bear behind you. As the empirical studies and media reports discussed above showed, though it is the case that both men and women use vocal fry, there is a significantly different response to the use of vocal fry by women than by men, especially in the workplace. In the non-content-based negative response to female vocal fry, respondents are determining that, since a woman uses vocal fry, the content of her speech act does not warrant engaging.
This determination is related to, but distinct from, the phenomenon of discursive injustice. Discursive injustice, a term coined by Kukla, describes a set of agent-relative responses wherein women and minorities are routinely judged as making a speech act that is less forceful than intended. In cases of discursive injustice, an assertion is seen as a request, in virtue of the speaker's social status playing a role in the uptake of the speech act. Kukla gives the example of Celia, a manager at a factory, whose employees routinely misread her work demands as requests. Celia suffers from discursive injustice (Kukla 2012).
There is a similar impact in Celia's case as in Shohreh's case above, in that in both cases a female speaker's ability to generate the desired responses to her utterances is limited by respondents’ assessments of features of her personal identity. However, the negative response to vocal fry is not a case of discursive injustice, as the respondent does not mistake a particular type of discursive function for a less forceful one. The non-content-based response occurs before the respondent even manages to engage with the discursive function of the speaker's utterance. In cases of discursive injustice, the audience is still engaging with the content of the speech act, whereas in the negative response to vocal fry, the speaker receives a non-content-based dismissal.
Though it is not an instance of discursive injustice, the non-content-based response received by our hypothetical Shohreh and nonhypothetically described by Grose still finds an anchor point in the feminist epistemology literature from which discursive injustice arose. In her work on testimony, epistemic violence, and epistemic power, Kristie Dotson considers a variety of interrelated limits on speakers’ and knowers’ communicative and epistemic autonomies (Dotson 2011; 2014; 2018). She defines epistemic violence in testimony as “a refusal, intentional or unintentional, of an audience to communicatively reciprocate a linguistic exchange owing to pernicious ignorance” (Dotson 2011, 238). The force of this definition hinges on the notion of pernicious ignorance, which is in turn defined as ignorance that is reliable and harmful. By mistaking the intentions of instances of female vocal fry in the manner outlined in the echoic account, we believe that hearers who engage the non-content-based response to female vocal fry are committing pernicious ignorance.
This diagnosis pairs with a treatment plan: recognizing that female vocal fry is more than an aesthetic linguistic choice generates new burdens on communicators to reinterpret speech acts and, optimistically, to achieve greater reciprocity and thereby empower the speech of women. More modestly, recognition by both users and hearers of female vocal fry that users are generally trying to achieve a communicative goal other than the one outlined in the echoic story may be able to disrupt the flow from the echoic account to the non-content-based response. We may still be told our voices are annoying, but knowing what tacit, and perhaps explicit, assumptions underlie that response generates new options for users to restart stalled conversations after receiving the response—and importantly, options other than changing their voices to sound more pleasant to men.
VI. Vocal tone in speech acts and the burden of the hearer
The non-content-based response to women users of vocal fry is a frequent enough occurrence to warrant significant cultural commentary. The fact that the response is to vocal tone rather than to the asserted content of a speech act means that most pragmatic accounts of speech acts, which emphasize the mechanics of interpretation and response to content (even when content is generalized beyond semantics), fail to apply. We have argued for a way to understand the response by characterizing it as a) non-content-based and b) driven by a mistaken understanding of female vocal fry as dismissively echoic of men's voices. We began this analysis with a historical overview of sociolinguistics research on vocal fry, in order to empirically ground our conceptualization of the reaction pattern, and we ended by orienting our views in the dialectical settings of contemporary work on pragmatics and feminist epistemology. Throughout, we have aimed to show that the reaction we considered commits an injustice against those who experience it.
We believe there is more work to be done in this area, particularly in the analysis of how other intersectional features of personal identity, notably perceived age and ethnicity, may affect the likelihood of a woman receiving a non-content-based response to a speech act performed using vocal fry. We hope that the frameworks we present here may be useful as a starting point for such work. It is also worth noting that the account we have developed does not aim to, nor does it, characterize the complex of non-authority-seeking reasons women may have for using vocal fry, such as in-grouping and intimacy among friends, signaling of conversational code-switching, celebrity-imitation, conveyance of detachment or boredom, and others.
As we emphasized in developing the echoic account of female vocal fry, our account goes some way to explain the reaction of the hearer, but it is not intended as an excuse for writing off the negative reaction to women's vocal fry as a mere misunderstanding. The negative reaction to female vocal fry is rooted in normative expectations about how men and women should sound, and even though there is some biological foundation for the frequency code, it is outdated and sexist to hold that utterances violating gendered expectations about vocal tone should warrant non-content-based responses. This is to say that those who are using vocal fry when they speak aren't usually aiming to reference an attitude of derision toward men, so non-content-based responses are generally inappropriate and rude, on par with responding “That's annoying,” to Monika's declaration, “I am Monika.” Women who experience this reaction are subject to an injustice that has yet to be characterized in the literature, in which their voices, but not their ideas, are being heard. Our framework suggests that rather than telling women to change their ways, we would be better off asking ourselves why we assume that echoic derision is what women are doing by adopting masculinized phonations. By shifting the burden from speaker to hearer, we can better understand and correct for vocal fry's negative reputation.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/hyp.2020.55.
Acknowledgments
Both authors thank Ásta for her helpful early discussions of the ideas that generated this paper. Monika thanks the members of the Bay Area Feminism and Philosophy Workshop for their feedback on this work. Julia thanks the members of STARS for their accountability and discussions.
Monika Chao is a PhD student in philosophy at the University of California, Berkeley. She holds an MA from San Francisco State University, where she wrote a thesis titled “An Echoic Account of Vocal Fry.” Her research interests include philosophy of language, feminist philosophy, and philosophy of science.
Julia R. S. Bursten is an Assistant Professor of philosophy at the University of Kentucky, where she also holds a secondary appointment in the Department of Gender and Women's Studies. Her work has appeared in Philosophy of Science, British Journal for the Philosophy of Science, Studies in the History and Philosophy of Modern Physics, Journal of Physical Chemistry Letters, and Nature Nano. She edited a collection of essays titled Perspectives on Classification in Synthetic Sciences: Unnatural Kinds (Routledge, 2019). From 2014 to 2018 she served as co-chair of the Philosophy of Science Association Women's Caucus.