INTRODUCTION
The view that speakers agentively use linguistic variation to construct their own identities as well as larger social structures has become increasingly important in the study of sociolinguistic variation (e.g., Eckert 2000, Zhang 2005). This view requires an understanding of variation as indexing social objects such as speech acts, activities, and stances (Johnstone 2007, Ochs 1992), and identity features, including social category membership (Podesva 2006, Podesva, Roberts & Campbell-Kibler 2001). Through their indexical relationships to these social objects, linguistic acts convey social information to listeners, representing attempted moves that may be supported, ignored, or challenged. But while speakers have a great deal of leeway in choosing resources, successful performances must prompt others to interpret them in desirable ways, making listeners' reactions central to the use of sociolinguistic variation and its development over time.
As yet, little effort has been devoted to understanding how listeners form reactions to sociolinguistic variation, although much has been aimed at understanding what some common reactions are and how speakers anticipate them. The role of the listener in constructing the social and linguistic identities of others has been theorized from multiple perspectives (Bell 2001, Butler 2001, Giles & Powesland 1975), and a number of studies have investigated sociolinguistic perceptions (for overviews, see Campbell-Kibler 2005, Giles & Billings 2004). Much of this work has been aimed at discovering the social stereotypes associated with whole languages or language varieties (e.g. Lambert et al. 1960). Studies of the perception of more detailed sociolinguistic variation have so far focused on investigating how good listeners are at detecting individual variables (Labov et al. 2005, Plichta & Preston 2005) and/or whether their reactions can be aligned with the associations deduced from production studies (Fridland, Bartlett, & Kreuz 2004, Labov 1966). Some work has addressed how perceptions of regionally marked varieties may be influenced by listeners' age (Ball 1983), sex, and regional background (Labov et al. 2005, Paltridge & Giles 1984), while other work has explored how listeners use their own speech habits as points of reference in their evaluations, for example favoring speakers socially whose speech rate is close to their own (Aune & Kikuchi 1993, Street, Brady, & Putman 1983). Less explored have been the ways in which differences in personality, mood, situational goals, or other interpersonal factors may contribute to listener perceptions. These factors have the potential to shed light on the more fundamental question of how sociolinguistic perceptions function – a question that is key to understanding how a given use of a variable will play out and, ultimately, what the relationship is between linguistic variation and social space.
This article explores examples of demographically similar listeners reacting differently to a single variable, even when it is used by the same speaker in the same linguistic context. The cue in question is the English variable (ING) – the alternation between word-final [in] or [ən], both referred to here as -in, and [iŋ], called -ing – studied via an expanded form of the Matched Guise Technique (MGT) that used matched pairs of recordings of spontaneous speech, digitally manipulated to differ only in tokens of (ING). Data were gathered in open-ended discussions of the stimuli in group interviews and through an online experiment. Other reports on these data have documented the ways in which the meaning of (ING) is influenced by contextual factors, including other linguistic cues, particularly regional accent (Campbell-Kibler 2007). This discussion will present three patterns in which subsets of listeners showed different responses to the same (ING) variant from the same speaker – for example, some listeners marked a speaker's -in guise as compassionate while others labeled it condescending. These differences of opinion relate not to disagreements about (ING) alone, but to a difference in how the listeners incorporate their understanding of the variable into their image of the speaker. The results show that listeners use the process of listening to exploit (consciously or automatically) the multiple meanings available for a given piece of socially significant linguistic structure.
The next section discusses the various ways in which listeners may differ from one another in their interpretation of a variable, with attention to those relevant for the current discussion. It also touches on the existing literature on listener agency and the ways listener reactions help to shape speaker choices. The third section describes the methods employed in collecting and analyzing the data, including the development of the stimuli, the collection of open-ended metalinguistic commentary, and the design of the questionnaire-based experiment. This is followed by a section describing the divergent reactions to (ING) found in the experimental data and examples from the group interview data to illuminate some of the social logic behind the patterns found. Finally, the fifth section explores some theoretical issues concerning the nature of social meaning and how these findings contribute to our understanding of them.
UNDERSTANDING LISTENER AGENCY
One need not be a sociolinguist to know that the construction of a sociolinguistic performance can be a difficult and at times dangerous project. The audiences for whom we perform on a day-to-day basis are not obligated to accept our accounts of ourselves, even if they share a common ground with us regarding the basic meaning of our semiotic choices. As a result, the process of constructing linguistic (and other social) performances is not like encoding a secret message, where we can trust that the recipient is seeking to uncover exactly the message we intended to send, whether they succeed or fail. Instead, social performance is more like choosing a name for a child: We may study name books and quiz friends about childhood memories of insulting nicknames, but once the name is chosen, we ultimately have no control over what someone gets called on the playground – that is, what interpretations others assign to our chosen resources. Indeed, as on the playground, we may expect that specific audiences will make it a point to assign either the most damaging, most supportive, or most amusing interpretation possible. Sociolinguistic choices in these different environments are likely to differ, just as they differ when audiences vary in other respects (Bell 1984, 2001; Giles & Powesland 1975).
Listener reactions to specific resources within a performance influence not only the behavior of a given speaker but the listeners' own future behavior toward that speaker, as well as their own deployment of the resources themselves. This cycle, which is fundamental to sociolinguistic variation and change, depends crucially on the process of sociolinguistic perception, the details of which are poorly understood. One obvious point about this process is that listeners have at least some latitude in determining what aspects of a performance to attend to and what to do with them. The perceptions of each listener are shaped by previous uses and analyses of the variable, but they are not fully determined by this history. The knowledge built up over time provides a set of possible understandings, and one of the open questions in studies of variation is how, in a given setting, members of that set come to be understood as dominant, both immediately in the mind of the listener and over the course of an interaction. Chun 2006 has investigated how linguistic performances can be recast stylistically after the fact by both the original speaker and others. Her data involve explicit reframing of an utterance as having sounded “preppy,” which serves within a non-preppy setting to mark particular topics and linguistic markers as belonging to this social category. The present article focuses on a listener's immediate response, rather than on the reconstructions through interaction, asking how listeners first form an understanding of the significance of a given use in a given context.
To think about the various points at which two listeners may hear the same speaker use the same cue and yet have different reactions, let us consider two audience members at a political rally, listening to a politician use an r-less variety of American English speech (associated with several regionally marked varieties of U.S. English, including the South, Boston, and New York City). One way for listener variation to translate into differences in perception is for listeners to have different meanings for the variable. For example, speakers of different varieties may have had different exposure to the same variable, so that one associates r-lessness, for example, with high status while another hears it as low status. But even if two listeners have the same sociolinguistic knowledge of a variable, their interpretations may differ if they disagree about the speaker using it, so that a listener who thinks a politician is a shrewd manipulator may interpret his r-lessness as a false and calculated move to gain the trust of the electorate, while another may believe the speaker is honestly reflecting his “true” speech patterns. Further, even when their factual knowledge and assumptions about a speaker agree, they may have divergent emotional reactions, such that they interpret a performance in a more or less positive light, so that while two listeners both evaluate the r-lessness as honest, a negatively inclined listener hears it as lack of intellect while another thinks it signifies strong local ties. This divergence can continue through many layers, as when both hear the variable as connoting local ties, but one sees those ties as positive loyalty to a community while another connects them to corruption and cronyism.
The instances of disagreement that will be presented in the fourth section fall on the latter end of this spectrum, centering not on different understandings of (ING) (though variation of that sort arose as well) but rather on different ways of incorporating the same or similar meanings into an overall picture.
These disagreements exhibit a common structure: a choice between assigning a given quality to a speaker and attributing to him or her an attempt to exhibit that quality. In each case, the speaker's use of (ING) as one or the other variant potentially contributes a particular meaning (compassion, intelligence, or physical masculinity) to the speaker's situational self-presentation. Some listeners apply that meaning, increasing their perceptions of that speaker with respect to the quality in question (Elizabeth is more compassionate, Valerie more intelligent, and Sam is a jock). Other listeners, hearing the same utterances, deny that meaning but interpret the speaker as intending to convey it (Elizabeth is more condescending, Valerie is less intelligent and trying to impress, and Sam is annoying and less masculine). Note that neither the listener's interpretations nor the speaker's intentions need be conscious. They could be a set of learned automatic processes whose relationship to conscious ideologies and judgments is as yet unknown. I will return in the concluding section both to the topic of intention and to that of conscious vs. automatic processing, but for now it is important simply that the data to be presented will illustrate this tension between speaker intention and successful social moves. Before presenting them, however, I will first describe the methods used in collecting the responses.
METHODS
The data discussed here come from a study using an expanded form of the Matched Guise Technique (MGT). The MGT contrasts listener reactions to samples of recorded speech that have been designed to differ in specific and controlled ways, typically to compare reactions to different languages (e.g. Bourhis 1984) or language varieties (e.g. Purnell, Idsardi, & Baugh 1999), though other variables have also been investigated (e.g., speech rate in Ray & Zahn 1999). The same speakers and texts are used to produce the different versions, to ensure that differences in reactions are directly attributable to the qualities under investigation. This study goes beyond the typical MGT approach by using digitally manipulated recordings of spontaneous speech, by combining open-ended interviews with an experiment, and by intentionally varying message content.
The easiest and most frequently used technique for creating alternate speech samples for MGT work is to ask speakers to shift styles deliberately. This can be problematic for individual sociolinguistic variables, however, as there is no guarantee that only the variable(s) of interest will be altered. Direct manipulation of the acoustic stream allows for much more precise alterations, and advances in technology have made it easier and faster, prompting work investigating the effects of altered pitch and speech rate (Apple, Streeter, & Krauss 1979) or altered vowel formants (Fridland et al. 2004, Plichta & Preston 2005). My study used a “cut and paste” approach, inserting tokens of -in and -ing into the original recordings (also seen in Labov et al. 2005), creating matched pairs in which the (ING) tokens are either all -in or all -ing. By splicing only the tokens of interest (and in some cases immediately surrounding syllables) I could ensure that no other material varied between the two versions, creating matched pairs differing only in tokens of (ING). This manipulation technique also allowed me to work with spontaneous speech samples, taken from informal interviews, rather than with speech read or performed especially for the purpose. There is clear evidence that read and spontaneous speech differ in systematic ways (Hirose & Kawanami 2002) and that listeners perceive these differences (Guaïtella 1999, Mehta & Cutler 1988), making it problematic to generalize to other contexts listener perceptions based on read or recited speech.
In addition, stimuli based on spontaneous speech increase the realism of the judgment task, allowing listeners to hear the speakers not only as animators of the speech, but as authors and principals as well (Goffman 1981).
The study brought together qualitative and quantitative data, using group interviews to develop experimental materials that were maximally appropriate for the population and stimuli (similar to techniques seen in Ladegaard 2000, Williams et al. 1976, and Wölck 1985). In addition to providing the basis for designing the experimental questionnaire, the social descriptions and metalinguistic commentary from the interviews give insight into possible reasoning behind the more limited quantitative data. Conversely, the experimental findings allow for statistical methods of evaluating the generalizability of the patterns uncovered.
Finally, the inclusion of multiple samples from each speaker increased both the richness and the generalizability of the results. While MGT work often seeks to eliminate influence from content and speech context, this goal has been shown to be methodologically impossible (Giles et al. 1990, Smyth, Jacobs & Rogers 2003), as well as theoretically problematic, since listeners will respond to the speech contexts they imagine even when none is specified (Bradac, Cargile & Hallett 2001). As the next section will show, content played a strong role in listener reactions, from straightforward judgments – as when the speaker Ivan was overwhelmingly described as lazy when complaining about the amount of effort it takes to attend movies – to more subtle effects on the role of (ING). Similarly, the variation among individual speakers proved to have profound impacts on the contribution of (ING) to listeners' perceptions, as different voice qualities, levels of dynamism, and topics of conversation shaped the aspects of social perception that were available for manipulation by (ING). This variability is precisely why it was important to include multiple speakers and multiple excerpts from each.
The study design incorporated the regional background of both speaker and listener as a central variable, because there is reason to believe that speakers in the southern United States use -in more often and perhaps in different ways than others (Hazen 2005, Labov 1966). Both speakers and listeners for the study were university students from North Carolina and California. The speakers, whose pseudonyms are given in Table 1, were two men and two women from each location, all but one of whom had grown up in the state (Elizabeth, one of the California women, was originally from Seattle).
Table 1. Speakers, by region and sex.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027133114241-0172:S0047404508080974jra_tab001.gif?pub-status=live)
The stimuli for the study were excerpted from informal, hour-long interviews with the speakers, focused primarily on work or school and hobbies or family. Before the interviews, I outlined the overall structure of the study and explained that I would be manipulating excerpts of their speech and playing them for others, but did not tell them what linguistic features I would be changing. After each interview, I met with the speaker again to record alternate (ING) variants. Given the utterance “I'm planning on going to grad school,” in the original, the speaker repeated: “I'm planning on going to grad school” and “I'm plannin' on goin' to grad school.” Speakers attempted to capture the speed and intonation of the original as much as possible, but this was a difficult task and subsequent manipulation proved necessary.
I selected four excerpts from each speaker's interview, ranging from 10 to 20 seconds in length, containing from 2 to 6 tokens of (ING) and varying with respect to content, as described above. The alternate (ING) variants were spliced into copies of the original excerpt using the software package Praat. Regardless of which variant appeared in the original, both the -in and -ing versions were altered, to minimize potential confounds. Praat's functions for manipulating intensity, pitch, and duration allowed me to adjust the alternates with respect to these qualities to match the originals and each other as closely as possible.
The first phase of data collection was a set of open-ended interviews, conducted on campuses in California and North Carolina with groups of one to six participants, though most were groups of two or three. Interviews began by eliciting general reactions to the speakers, playing two recordings from each of four speakers (all male or all female). During the first pass through the clips, I asked general questions about each speaker:
• What can you tell me about Jason?
• Does he sound competent or good at what he does?
• Is he someone you would be likely to be friends with?
• Who do you think he's talking to? What is the context of the conversation?
• Where do you think he is from?
In the second half of the interviews, we listened to the same recordings again, in their matched pairs. I explained the goal of the study and asked listeners to comment explicitly on the effect of (ING), eliciting intuitions on the general character of (ING) and the influence it had on the different performances. In all, I analyzed data from 20 interviews for a total of 55 participants.
The second phase of data collection was an experiment in which a new set of respondents evaluated the speakers on rating scales and descriptor lists. The experiment was conducted over the World Wide Web, with participants recruited through word-of-mouth e-mail and classified advertisements in university newspapers, again targeting university students in both California and North Carolina. Subjects ranged in age from 18 to 22 years and were predominantly White (60%), with substantial subpopulations of Asian (28%) and Black (13%) subjects. Listeners were recruited who had not participated in the interviews, and they were not told that the study was investigating (ING). These listeners heard only a single recording from each of the eight speakers, meaning that the members of the matched pairs were heard by different listeners. A total of 124 participants completed the study. An additional 36 began it but failed to finish, and their data were removed from the analyses out of concern that they may not have been taking the study in earnest.
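The between-listener design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how recordings and guises were allotted to listeners, so independent random choice is assumed here, and the speaker and recording labels, along with the function name assign_stimuli, are hypothetical placeholders.

```python
import random

# Placeholder labels (hypothetical): 8 speakers, 4 excerpts each, 2 guises.
SPEAKERS = [f"speaker_{i}" for i in range(1, 9)]
RECORDINGS = [f"recording_{i}" for i in range(1, 5)]
GUISES = ["-in", "-ing"]


def assign_stimuli(rng):
    """Build one listener's playlist: exactly one clip per speaker.

    Because each listener hears a single (recording, guise) pair per
    speaker, the two members of any matched pair are necessarily judged
    by different listeners. Random assignment is an assumption.
    """
    playlist = [(s, rng.choice(RECORDINGS), rng.choice(GUISES)) for s in SPEAKERS]
    rng.shuffle(playlist)  # randomize presentation order (also an assumption)
    return playlist


playlist = assign_stimuli(random.Random(0))
for speaker, recording, guise in playlist:
    print(speaker, recording, guise)
```

Under a scheme like this, any comparison between the -in and -ing versions of a recording is a comparison across groups of listeners rather than within a single listener.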
To keep the survey under 15 minutes, no distracters were used, the interview phase having established that the (ING) tokens were not salient enough to reveal the goal of the study on their own. Instead, most participants seemed to interpret the study as being about regional accents. The survey instrument, shown in Appendix A, was developed using descriptors gathered from the interviews in the first phase of data collection, the literature on (ING), and previous MGT work. The instrument began by asking listeners to rate the speaker on seven qualities (e.g. educated, shy/outgoing). After these ratings came sets of descriptions in checkbox form, so that listeners could select those appropriate to the speaker, each as an independent binary choice. The first set of checkbox descriptions contained identities or personal characteristics such as redneck or artist while the second focused on situational or state qualities such as polite or joking. Finally came questions about regional background, whether the speaker was from the city, the country, or the suburbs, and likely class background.
I used logistic regression to investigate the influence of the independent variables (speaker, recording, (ING) variant and listener school, gender, regional background and race) on the checkbox variables, as well as co-occurrence between checkbox variables (e.g. articulate, artist). To analyze the ratings variables (e.g. not at all educated/very educated), I used analysis of variance on linear regression models, including looking at the relationship between checkbox variables and rating variables by using the checkbox variable as a term. None of the listener demographic factors, including gender, race, or regional background, affected the results presented here, and I will not be discussing them further.
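The logic of the checkbox analysis can be illustrated in miniature. The following is a minimal sketch, not the analysis run for this study: it assumes a single binary predictor (the (ING) guise) and a single checkbox descriptor, in which case the maximum-likelihood logistic regression has a closed form and the slope equals the log odds ratio of the 2×2 table. The counts and the function name fit_binary_predictor are hypothetical, and the models described above also included speaker, recording, and listener factors.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_binary_predictor(sel_in, n_in, sel_ing, n_ing):
    """Closed-form fit of  logit P(selected) = b0 + b1 * heard_in.

    sel_in / n_in   : listeners who ticked the descriptor / who heard -in
    sel_ing / n_ing : the same two counts for the -ing guise
    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error of b1.
    """
    cells = [sel_in, n_in - sel_in, sel_ing, n_ing - sel_ing]
    if min(cells) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    b0 = logit(sel_ing / n_ing)      # log odds of selection in the -ing guise
    b1 = logit(sel_in / n_in) - b0   # change in log odds for the -in guise
    se_b1 = math.sqrt(sum(1.0 / c for c in cells))
    return b0, b1, se_b1


# Hypothetical counts, NOT taken from the study's tables.
b0, b1, se_b1 = fit_binary_predictor(sel_in=15, n_in=40, sel_ing=5, n_ing=40)
odds_ratio = math.exp(b1)
z = b1 / se_b1
print(f"odds ratio = {odds_ratio:.2f}, Wald z = {z:.2f}")
```

A positive b1 in this toy setting means the descriptor was ticked more often by listeners who heard the -in guise. With additional predictors the fit no longer has a closed form and would need an iterative routine from a statistics package.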
It is important to be clear about the role of the statistical techniques used in the next section, and in particular the generalizations that they are and are not able to support. Findings regarding the specific evaluative responses may be generalized to other listeners within the same population – young university students at high-prestige schools – hearing samples of these particular speakers recorded in a similar social setting: speaking informally to someone they do not know. Working with another population and/or a fresh set of speakers, it is virtually guaranteed that the actual pattern of selections would be different. What can be taken beyond this population and these speakers is the more fundamental observation that listeners have options in how they engage with a sociolinguistic performance, based on overall interpersonal reactions to a given speaker.
The core results of this article are three instances in which the listeners in the experiment disagreed in their interpretations of (ING) for a given speaker. The differences I will document all centered around the tension described above: the degree to which listeners were willing to credit a social move as successful, as opposed to seeing it as the speaker's attempted account of herself. These disagreements offer insight into the shape of the network of connected meanings that a variable such as (ING) may take on, and into how listeners hearing the same performance may settle on different portions of that network. The following section will explain the experimental patterns pointing to disagreements among listeners regarding the interpretation of (ING).
LISTENER VARIATION IN INTERPRETING (ING)
In this section I examine three instances where listeners disagreed about the impact of (ING) on their image of a given speaker. These examples are not the only cases in which (ING) influenced perceptions of a single speaker, nor the only cases where different listeners selected contradictory descriptors for the same speaker. They are the instances in which multiple statistical effects combine to support a more complete and more robust picture of the disagreement than would be available from a single result. Further, a basic pattern is shared across all three instances, in that each case involves some listeners accepting a positive potential meaning of (ING) as a fair reflection of the speaker's nature, while others recognize the same (or a related) meaning but instead respond to it as a failed attempt.
The first example concerns Elizabeth, a highly dynamic speaker who elicited some of the strongest responses in the group interviews. In two of her four recordings, she is heard discussing groups of people to which she does not herself belong. The transcripts of the two recordings, “Discussion” and “Theme park,” are given below with the altered (ING) tokens in bold. Note that in “Theme park,” the future modal gonna was not altered.
(1) Discussion. And I don't think a lot of the people who were sort of at this lower level who were doing the data entry and who were actually ordering the things got involved in the discussions of what kind of effect this new system would have on the work and how the system could be structured to redesign the work.
(2) Theme park. And you go there and you might ride one ride and then you sit somewhere and you have a nice restaurant meal. And they're, you know, they're the family and this is the one time they're ever gonna make it there and they're trying to bulldoze through the park and stand in line and dash around. And you're just kind of sitting there watching it all go by.
Of the listeners who heard one of these two recordings (N = 67), some selected either the term compassionate or the term condescending as a description of Elizabeth. Not surprisingly, however, the sets of listeners selecting these two qualities were virtually disjoint (only one listener selected both). Both of these terms were significantly more likely to be selected when the listener had heard Elizabeth's -in guise, as opposed to -ing, as Table 2 shows.
Table 2. Compassionate and condescending selections for Elizabeth's “other” recordings, by (ING).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027133114241-0172:S0047404508080974jra_tab002.gif?pub-status=live)
The topics of these recordings, dealing with the behavior of others, open the door for Elizabeth to be heard as either compassionate or condescending. The content itself, however, does not completely reveal the stance she is taking toward the others depicted. Most interview participants heard “Theme park” as dismissive of or even mocking those families trying to dash around, but one, already positively disposed toward Elizabeth (see ex. 6 below), enthusiastically described her as a young mother wanting to take her family to stand in lines at Disneyworld. Sample “Discussion” elicited more disagreement, with some interview participants hearing Elizabeth as complaining about higher-ups, as in (3), while others heard her as complaining about the workers themselves, in (4) and (5).
(3)
(4)
(5)
Both of these excerpts from Elizabeth involve her presenting an apparently ambiguous stance toward a group of other people (those uninvolved in the systemic change at work, and those attending the theme park). Two possible such stances are compassion and condescension. But she is much more likely to be heard as either one if she uses -in. To understand this pattern, we need to understand a few things about Elizabeth and how she sounds to these listeners. She was described as the most dynamic of all the speakers, rated the most outgoing, with a mean of 5.21 on a 6-point scale, far more than Jason, who had the next highest mean (4.43). This finding in the survey echoed the statements of the interview participants, who described Elizabeth with terms like energetic. Her dynamism seemed to polarize interview participants, prompting them to take strong stands liking her or disliking her. Elizabeth inspired some of the most positive comments, as in (6), and the most negative, as in (7), found in all the interviews.
(6)
(7)
Another important aspect of Elizabeth's speech is her perceived regional background. I have reported elsewhere on the patterns in these data regarding regional accent (Campbell-Kibler 2007). One facet of this pattern was a common observation that -in belonged more appropriately in the speech of those North Carolina speakers who were perceived as Southern (one was not), while -ing sounded more natural for the three California speakers, including Elizabeth, who were heard as aregional, and described as being “from anywhere.” As one of these “anywhere speakers,” Elizabeth was characterized as accent-free and as someone who was educated and articulate enough to “say her G's.” The other two speakers I will discuss, Valerie and Sam, shared this accent-free status. Descriptions of naturalness primarily centered on what interview participants believed to be the most common form for a given speaker, but also invoked ideas of which variant would involve less effort for the speaker to produce. These perceptions loosely reflected the reality of these speakers, in that the four California speakers used almost no tokens of -in in their original interviews, while the four from North Carolina used a mixture of the two variants, but the imagined divide was much stronger than reality, as Southerners were typically described as saying -in only.
Because Elizabeth was seen as a “natural” -ing speaker, her use of -in stands out to listeners and is available to be interpreted as a sociolinguistic move. Given the content of these recordings, some listeners interpret that move in relation to the others that she is discussing. Depending on the listeners' opinion of the speaker and/or how generous they are feeling, they will interpret this move differently, since marked informality may be seen as either condescending (indicating lack of respect for her subject) or compassionate (indicating a more connected stance). The meaning of (ING) in this context is not fixed but varies for different listeners, based in part on their reactions to Elizabeth overall.
Both interpretations may be intensified by the two (ING) tokens in the final sentence of “Theme park.” The phrase is marked off with dramatic prosodic shifts: Where her description of families is loud and contains several changes of speech rate, in the final clause she reduces her volume and its variability, and delivers the entire phrase at an even rate and a higher pitch, underlining the shift from her description of the frenetic families to her description of her own detached state. The relevance of (ING) for the humorous effect of this performance may be seen in the fact that these two tokens were produced as -in in the original interview, the only two -in tokens spontaneously uttered by any of the four California speakers. The use of humor in this context may further polarize listeners, as it marks Elizabeth as funny but also widens the gap she is drawing between herself and the others she is talking about.
These reactions to Elizabeth demonstrate that (ING) may be interpreted as a marker not only of the speaker's background, character, or mood, but also of situation-specific stances, for example as an indicator of how the speaker is orienting toward his or her subject matter (Johnstone 2007, Ochs 1992). They also point to the degree to which listeners feel entitled to read qualities into a speaker's linguistic cues that speakers are unlikely to have included deliberately, given that these listeners are unlikely to have believed that Elizabeth was trying to sound condescending.
The second example is from Valerie, another perceptually aregional speaker, who was heard as young (44% of listeners selected college aged) and confident (65%). Unlike Elizabeth, Valerie inspired disagreement in her -ing guise, though she, too, was heard as a “natural” -ing speaker. In Valerie's case, the role of (ING) seemed to be to either disrupt or support a larger personal style, to which listeners responded differently. Some but not all of the listeners hearing Valerie's -ing guise described her as annoying. Those that did also were more likely than other listeners to describe her as trying to impress her audience and rated her as less intelligent.
Table 3 shows the listeners' selections of the descriptions of annoying and trying to impress, presented separately for -in and -ing guises. Each smaller table gives the percentages of listeners selecting each possible combination of the two responses, out of all the listeners hearing that guise. Listeners hearing Valerie's -ing guise selected both annoying (16.6%) and trying to impress (42.4%) significantly more often than those hearing her -in guise (6.9% and 19.0%, respectively). Further, the -ing guise not only increased selections of both qualities, but it also prompted a favoring relationship between them: Listeners selecting both qualities formed nearly one-third of the trying to impress responses and most of the annoying selections. This contrasts strongly with the -in guise, where the two categories shared no responses. Put another way, a little less than half of the listeners hearing Valerie's -ing guise thought she was trying to impress her audience, and a substantial subset of these found her annoying. These reactions contrast with her -in guise, which provoked some selections of the two qualities, but fewer and apparently unrelated.
Table 3. Annoying and trying to impress selections for Valerie, by (ING).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627171308-31860-mediumThumb-S0047404508080974jra_tab003.jpg?pub-status=live)
This pattern of negative responses not only involved listeners finding Valerie annoying, but also implicated their evaluations of her intelligence. Table 4 shows the average ratings of intelligence for Valerie, broken down by (ING) and whether the listeners selected annoying or not. Listeners who selected annoying rated Valerie as significantly less intelligent than those who did not, and a non-significant trend suggests that this effect may be increased in her -ing guise. The highest intelligence ratings come from listeners hearing her -ing guise who did not select annoying, while the lowest come from those hearing the same guise who did. Intelligence ratings in response to Valerie's -in guises show a more moderate connection to annoying.
Table 4. Intelligence ratings for Valerie, by (ING) and annoying.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027133114241-0172:S0047404508080974jra_tab004.gif?pub-status=live)
The experimental results show, then, that approximately one-eighth of the listeners hearing Valerie's -ing guise thought that she was an unintelligent young woman who was trying to impress her listeners, presumably to sound more intelligent. Further, these listeners were annoyed by this performance. We see evidence of interview participants having this negative reaction to Valerie and her intelligence in examples (8) and (9), both from participants hearing her -ing guise.
(8)
(9)
Not all interview participants shared this view of Valerie. Some listeners accepted her confidence and read her as an interesting and vibrant person, as in (10), from a listener who had heard Valerie's -in guise. Note that Greg comments on her not sounding like she was making an effort to impress, a spontaneous comment on his part.
(10)
The comments in (8) and (9) underline the strong role that message topic played in how interview participants constructed images of the speakers. Despite acknowledging the limited nature of the brief recordings, interview participants tended to position the topic of each clip as central to a speaker's identity. The clip under discussion in (8) and (9) featured Valerie describing some aspects of the history major and what the coursework for one was like, and on the basis of it, the listeners in (8) deduced (accurately) that she was herself a history major and (inaccurately) that she was complaining about difficulties (in the original interview the complexities she describes were presented as attractions of the field). This information, whether accurate or not, is foundational to the personality descriptions listeners provide, as they attach attributes to their fundamental image of a college student majoring in history. Greg, in (10), heard a different side of Valerie, as she spoke about having gone backpacking through Norway with her father. While this did not cause his positive opinion on its own (others heard the same recording and described her as “spoiled”), it surely shaped his reaction. The role of content of this type is quite literally inseparable from the analysis of the meaning of variation, although at times it is difficult to tell precisely how the content is contributing to the overall picture.
Valerie, like Elizabeth, is heard as accent-free and a “natural” -ing user, and as such it is -in, the unexpected form, that can be most reasonably seen as its own social move. In Valerie's speech, it appears the effect of the move is to disrupt a larger style. For some listeners, Valerie's entire self-presentation reads as someone who is trying too hard to sound smart. Her use of -ing is not the only cue, given that other speakers do not evoke this response, but it contributes significantly enough that when it is absent and -in is used instead, this reaction is eliminated.
The last example of listener divergence concerns Sam, one of the California men. Like Valerie, Sam comes across as extremely young (described as a teenager 40% of the time, half again as often as his closest follower), but he also sounds somewhat hesitant, with many long pauses in his speech (he was described as confident 28% of the time, in the middle of the group). Sam triggered an annoyed reaction in some listeners, just as Valerie did, but in response to his -in guise rather than -ing, and the reaction was associated with lowered perceptions of masculinity, rather than of intelligence. This reaction is in contrast to another, where Sam is seen as a jock (a person primarily interested in sports) and as more masculine. This latter reaction was not confined to -in; as Table 5 shows, there is only a trend suggesting a potential relationship between (ING) and selections of jock. Evoked by both guises, this reaction provides a marked contrast to the annoyed response which is specific to Sam's -in guise. Both these reactions were seen primarily in Sam's recordings on recreational topics.
Table 5. Annoying and jock selections for Sam, by (ING).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027133114241-0172:S0047404508080974jra_tab005.gif?pub-status=live)
Table 5 shows that Sam is more likely to be described as annoying in his -in guise and shows the responses describing him as a jock in both of his guises. As is the case with Elizabeth, however, the listeners giving these responses are not the same: Only one person selected both annoying and jock as descriptors for Sam's -in guise. This disfavoring relationship between annoying and jock is robust across all responses to Sam (p = 0.002), and there is an interaction indicating it is even stronger in his -in guise (p = 0.011).
Table 6. Masculine ratings for Sam, by selections of annoying and jock.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027133114241-0172:S0047404508080974jra_tab006.gif?pub-status=live)
The term jock is available as a descriptor for one type of physical masculinity² (Connell 1995). Reflecting this meaning, listeners rated Sam as significantly more masculine when they also described him as a jock. Selections of annoying were associated with significantly lower masculinity ratings, underscoring the difference between this set of listener reactions and those associated with jock.
These masculinity ratings suggest that the listeners selecting jock in reaction to Sam are doing so because they have credited him with a successful bid for a particular kind of physical masculinity. In contrast, those selecting annoying discount Sam's masculinity, compared to other listeners, suggesting that they are interpreting -in in his case as either an entirely different social move, or an unsuccessful version of a move the jock selectors are accepting. (ING)'s connection to masculinity, and particularly to physical masculinity, is well documented. Survey studies have shown men in some cases using higher levels of -in than women in similar socioeconomic categories and similar speech activities (e.g., Labov 1966, Shuy, Wolfram & Riley 1967), while others have tied -in use to particular types of working-class or physical masculinity (Fischer 1958, Kiesling 1998). Because this annoyed and masculinity-lowering reaction is associated primarily with Sam's -in guise, it suggests that these listeners are interpreting his use of -in as an unsuccessful bid for this masculine identity.
Unlike the other two speakers, Sam does not evoke the same responses in both the experiment and the group interviews. Interview participants did not bring up Sam's masculinity for discussion, nor did they remark on him as particularly annoying. Instead, interview participants were most likely to describe Sam as recognizably similar to themselves, interested in typical teen or college activities and concerns. There are several potential reasons for this divergence between the interview participants and the experimental subjects. One is that, for whatever reason, this reaction is a relatively subconscious one that is not available enough to introspection to be articulated in interviews. More likely, the number and size of the interviews meant that these reactions simply did not come up, particularly given that they were primarily in response to only two recordings, further limiting the pool of interview participants to hear them. Finally, because of Sam's position as the last speaker heard in the male-speaker interview sessions, responses to him tended to be shorter and less detailed than those regarding other speakers, owing to fatigue, so it is possible that participants having this reaction simply did not mention it.
All three of these sets of reactions provide insight into the ways in which even similar listeners may construct very different social images from identical linguistic cues heard in identical performances. In particular, they show the ways in which listeners are at liberty to accept or discount potentially valuable social meanings carried by specific linguistic variables. The final section will discuss the implications of these findings for sociolinguistic theory, particularly with respect to our understandings of speaker/listener diversity, the role of speaker intention, and the degree to which the social moves of both speaker and listener should be seen as under conscious control.
DISCUSSION
Early work in sociolinguistic variation explored variable linguistic behaviors while emphasizing the commonality of “linguistic norms,” the evaluative beliefs that speakers hold regarding their own and others' linguistic traits. Labov incorporated this common evaluative understanding into the very definition of a speech community, built around “participation in a set of shared norms” (Labov 1972:121). These norms were reflected in the ways that, in New York City for example, variables stratified with respect to class were also associated with increased self-consciousness regarding speech, suggesting that whatever their own linguistic patterns, “most New Yorkers think or feel that particular variants are better, or more correct, or are endowed with superior status” (Labov 1966:405). Despite the use of the word “most,” which allows for occasional outliers to stray, this picture is one in which, by and large, every speaker/hearer has the same evaluative reaction to a given form, and diverges from his/her neighbors primarily in linguistic production, not in assignment of social value.
This picture has since been challenged in a number of ways, most centrally in its focus on those evaluational norms based on systems of socioeconomic prestige. Rickford 1986 pointed out that this shared-norms approach assumes a consensual model whereby all participants in the social order agree on their place in the social hierarchy and on the location of socially significant linguistic cues. He argued that the model was inadequate for his own data from Cane Walk, Guyana, where members of the Estate Class, lower on the social hierarchy and faced with more economic hardships and physically taxing work, largely, though not unanimously, rejected values associated with the standard language market and, indeed, the very notion of reshaping the self in order to advance in the social hierarchy.
Not only is this system of prestige-related norms viewed differently by speakers in different places on the prestige hierarchy, but it also represents only a limited set of the evaluation systems at play in the sociolinguistic world. As Eckert (2000:227) puts it, “While vernacular speakers know that their speech is stigmatized in the context of a global hierarchy, their day-to-day life unfolds in a place that is in many ways orthogonal to global prestige and stigma.” Sociolinguistic cues often index meanings, like toughness, flamboyance, or intimacy, that are not straightforwardly about prestige and that relate to each other in complex ways. Many such meanings are indexed indirectly (Ochs 1992), through the connection of linguistic variants to particular points in the social landscape.
Inherent in this multiplicity of meanings is the opportunity for contestation. Listener/speakers gather and deploy social (including linguistic) resources and use them to craft the stylistic personae (Half Moon Bay Style Collective 2006) and interactional stances (Johnstone 2007) that are most useful to them. As listeners, they then must take the cues they observe, which may or may not be the same ones the speakers perceived themselves to be using, and construct an image of the persona and stance of their interlocutor.
Despite the ability of speakers to construct their linguistic performances with particular social goals in mind, the meaning of these performances does not reside in the speaker's intention. In this respect, social meaning differs fundamentally from semantic meaning as defined by Grice: that a speaker A means something by an utterance x when “A intended the utterance of x to produce some effect in an audience by means of the recognition of this intention” (Grice 1957:385). Grice distinguished this special linguistic sense of meaning, located in speaker intention, from natural meaning, such as that used in “Those spots meant measles,” in which the connection between the spots and the measles rests in the natural world and cannot be attributed to the intention of a particular person. Sociolinguistic meaning occupies an interesting position with respect to this distinction, since from some theoretical perspectives social meaning is a kind of natural meaning, with some even using the intentional/non-intentional divide as a guide to the boundary between the fields of pragmatics, devoted to linguistic meanings (i.e., those intended by the speaker), and sociolinguistics, devoted to the study of natural meanings (e.g., regional variation) (Levinson 1983:29). The social meaning-based approach to variation challenges this classification, attributing a sizable amount of agency to speakers' use of socially loaded variation, but without moving social meaning unproblematically into Grice's understanding of non-natural meaning. Metalinguistic understandings of social meaning in my data take speaker intention into account variably, based on the kind of meaning being evaluated and whether it is seen as a legitimate object of intention. Valerie's use of -ing combined with her other characteristics successfully means intelligent to exactly those listeners who do not perceive that move as intentional.
Those who do think she intends “intelligence” by her social cues react by seeing her as less intelligent. In the eyes of these listeners, the social meaning “intelligence” is only legitimate as a Gricean natural meaning. In another context, perhaps a job interview, or for another meaning, such as politeness, however, intention does not conflict with the meaning itself and may resemble semantic meaning instead.
Discussions of intention inevitably raise the specter of another, largely neglected question as to the nature of social meaning: What aspects of sociolinguistic processing are subject to conscious control, and to what degree? We understand linguistic processing (particularly comprehension) to consist of rapid processes that operate without the conscious control of the speaker. Sociolinguistic cues are part of this linguistic system and so must be processed automatically in at least some ways. What is not clear is the degree to which conscious control plays a role in their social functions. Many sociolinguistic variables function in the social world with little conscious awareness on the part of speakers, a fact that has been seen as evidence that the role of social meaning in variation must necessarily be limited. After observing that speakers were not conscious of the centralization of diphthongs in his Martha's Vineyard study, Labov concluded that the variables “can hardly therefore be the direct objects of social affect” (Labov 1972:40). Instead, he inferred, there must be some more general category, a grouping of variables, a style, in the California Style Collective's 1993 sense of the term, to which speakers are orienting. While it is likely that groupings of variables will turn out to be important in understanding the cognitive representation of variation, the assumption that only consciously available cues may index social meanings is open to question. Speakers clearly do in some cases subject their language to conscious manipulation to accomplish a social goal, but there is no reason to assume that social meaning requires conscious reasoning. Linguistic processing is accepted as being performed rapidly and automatically despite the fact that it is also understood as necessarily dependent on a great deal of complex real-world knowledge.
Likewise, researchers in the field of social cognition have shown that a tremendous amount of social perception can and does happen automatically (Wyer 2004). The degree to which speakers have conscious knowledge and control of different sociolinguistic behaviors is a fascinating question, but conscious reasoning need not be the only way in which sociolinguistic information may be understood and used.
These intertwined problems of intention and consciousness/automaticity lie at the heart of our current questions regarding the cognitive nature of sociolinguistic variation. The results presented in this article have demonstrated that even though sociolinguistic variation need not be consciously processed during perception, it is factored into social evaluations in complex ways. Differences between listeners (potentially mood, personal disposition, or idiosyncratic reactions to specific speakers) shape the role of variation, in this case by changing listeners' conceptualizations of speakers as either honestly conveying their identity traits (intelligent, masculine) and stances (compassionate) or intentionally, and therefore unsuccessfully, trying to perform these constructs. The work that remains is untangling exactly how these processes of sociolinguistic perception occur and to which other processes they are tied.
APPENDIX A: SURVEY INSTRUMENT
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627171508-20606-mediumThumb-S0047404508080974jra_tabU009.jpg?pub-status=live)