Until very recently, sociolinguistic research on the North American quotative system tended to focus on a few, by now well-researched, variants, such as like and go, as in (1) and (2) (Bakht-Rofheart, 2002; Barbieri, 2005, 2007; Buchstaller, 2004; Buchstaller & D'Arcy, 2009; Cukor-Avila, 2002; Romaine & Lange, 1991; Tagliamonte & D'Arcy, 2007).
(1) I'm like “oh my uncle's calling me it must be important”
(2) I go “I seen you following me for a couple of miles now.”
Only in the last few years has the literature started to pick up on another, apparently new variant, quotative all, as in (3)–(5) (see Bayley & Santa Ana, 2004; Rickford, 2000; Singler, 2001).
(3) He's all “well let me check em alright oh I'm sorry bout that”
(4) I'm all, “Dude, you're not helping your cause!”
(5) She's all “Ooh- he's so wonderful—I'm all in love with him— he's all in love with me.”
All's extension to quotative function is recent. Quotative all is not listed in the Oxford English Dictionary (OED), nor in any other modern dictionary except the fourth edition of the American Heritage Dictionary. The Switchboard Corpus I (collected from 1988 to 1992) and the Santa Barbara Corpus of Spoken American English (collected in 1988) each contain only one token of quotative all. The earliest report of quotative all we have found is in the fall 1982 issue of the newsletter Not Just Words, edited by Danny Alford at the University of California at Berkeley. In terms of its regional pattern, quotative all has previously been attested primarily in California (Alford, 1982–1983; Fought, 2003; Rickford, 2000; Rickford, Buchstaller, Wasow, & Zwicky, 2007; Waksler, 2001; Wimmer, 1990) but also in Arizona (Barbieri, 2005), Texas (Bayley & Santa Ana, 2004), New York (Singler, 2001), and Ontario, Canada (Tagliamonte & D'Arcy, 2004, 2007), and even in England (Buchstaller, 2004).
In earlier work, we discussed the relationship between all in intensifier and quotative function (Buchstaller & Traugott, 2006) as well as its social and linguistic constraints (Rickford et al., 2007). We showed that the frequency of all in the quotative system has decreased considerably in recent years and that this overall decline goes hand in hand with a shift in its constraint hierarchy. In this paper, we zoom in on the change of this relatively new variant. Using a combination of quantitative variationist and computational methodology, we focus on the recent history of the quotative variant in apparent and real time. As a first step, we trace the relative frequency of all in the set of quotative introducers used in recordings of California youth from 1990/1994 until 2004–2005. Moving beyond the Californian context, we then discuss the results of a collaborative research project with Google Inc., which allowed us to track the diachronic development of all versus other quotative options in greater detail. Focusing on the distribution of quotative variants with different types of interpretation (speech, thought, or stereotypes) across time, we show that all has indeed taken on a quite specialized function within the quotative pool.
The investigation of both real- and apparent-time data leads us to conclude that quotative all is a rather short-lived innovation. It exhibits a steep drop-off both in the comparison between the interviews conducted in the 1990s and those conducted in 2004–2005 and in the Google corpus spanning the years 1982–2006; in both instances, it is being replaced by like, which has been attested since the 1980s. The extent of the shift from all to like also shows up in the proliferation of the intermediate form all like, as in (6) and (7):
(6) He's all like “You know little punk. Say another word just keep on…”
(7) She was all like um “Yeah at my school knitting was banned”
Looking specifically at the interaction between all and all like across real time, we will detail the extent to which all gave way to like in the first few years of this century. The rise and fall of quotative all, observed as language change in progress, offers insight into similar short-term innovations and their actualization in earlier English (cf. Middle English stinten ‘to stop V-ing’). Before we turn to our analysis, we first discuss the data sets on which this study is based.
DATA
For this paper, we will report on three principal sources:
1. The 1990/1994 Wimmer/Fought tape-recorded corpus (WFTRC), collected in California from 12 high school and undergraduate students and young adults, who were all born in California and had never left the state for any protracted period. The corpus consists of two sets of conversational recordings. One set was collected by Ann Wimmer for her Stanford senior honors thesis in 1990; it includes six middle-class white speakers (ages 14–23 years), all from the San Francisco Bay Area in Northern California. The second set, which includes six Chicano (Mexican American) speakers (ages 17–20 years) from the Los Angeles area (Southern California), was collected by Carmen Fought in 1994. These recordings, which yielded 473 quotations, including 134 tokens of all (including all here and all like) and 97 tokens of like, served as a comparative base for our later corpus, recorded at Stanford in 2004–2005.
2. The 2004–2005 Stanford tape-recorded corpus (STRC) consists of sociolinguistic interviews with 17 Stanford University undergraduates (ages 17–22 years) and 1 graduate student (22 years old), 11 students from Gunn High School in Palo Alto, California (ages 14–18 years), and 3 young adults from San Francisco and Southern California (ages 24–27 years). The speakers were of various ethnicities, but most of them could be counted as middle class (being children of highly educated parents, living in relatively affluent areas, and attending a highly esteemed school or university). All speakers are native Californians and/or have spent most of their lives in California. By comparing this corpus with the earlier 1990/1994 corpus, we were able to pinpoint how all has changed, both in its relative frequency and in its internal constraints. This tape-recorded corpus yielded 1,134 quotatives, including 26 tokens of all or all like and 820 tokens of like.
3. The Google Newsgroups corpus. To get a more fine-grained sense of the relative frequency of quotative all over the past two decades, we searched a massive archive of Internet newsgroup[Footnote 1] postings hosted at Google. According to Google's Web page, when Google acquired the database from Deja.com in 2001, it contained about 500 million individual messages.[Footnote 2] Google Groups now exceeds one billion postings (hence many billions of words), and it is steadily growing.[Footnote 3]
We now move on to the discussion of our findings. We first discuss the patterning of quotative all across time in the California data and then the Internet searches.
FINDINGS
Variationist analysis of spoken California corpora
The overall distribution of the most frequent variants in the California corpora has shifted extensively within the last decade. For the California adolescents recorded in 1990/1994, all is the most frequent single variant in the quotative system, used by three-quarters of the speakers in our corpus (9 of 12). By the 2004–2005 period, however, the picture had changed dramatically: only about one-third of the 32 adolescents and young adults we interviewed used the form at all, and even among these speakers, all was clearly a minority variant.
Given the inverse numerical relationship between all users and nonusers across the two corpora, we present our data separately for speakers who did and did not use the quotative variant all. Table 1 includes the speakers in the 1990/1994 data whose system contains all. Although there is considerable variation across these speakers, all is their most frequent single variant: all and all like together make up about 37% of their quotatives, with quotative like amounting to 20% and say and other quotes (including unframed ones) making up another 16% to 19% each. Table 2 shows the three speakers in the 1990/1994 data who did not use all. What distinguishes the two groups, adopters versus nonadopters,[Footnote 4] is their age. Indeed, Ann Wimmer reported that age is the most important constraint in the 1990 corpus: “All of the high school students interviewed used it [all], but none of the college age speakers did.… No one in the study over the age of 19 was heard to use this variable at any time” (Wimmer, 1990:10).
Table 1. Quotative variants of speakers in the Wimmer/Fought 1990/1994 corpus who used all or all like
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-16315-mediumThumb-S0954394510000098_tab1.jpg?pub-status=live)
Notes: W = white, MA = Mexican American, M = male, F = female. Los Gatos (Wimmer's 1990 research site) is in the San Francisco Bay Area, Northern California; Los Angeles (Fought's 1994 research site) is in Southern California. ALL includes 52 tokens of all here used by Brandon.
Table 2. Quotative variants of speakers in the Wimmer/Fought 1990/1994 corpus who did not use all or all like
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-15602-mediumThumb-S0954394510000098_tab2.jpg?pub-status=live)
Notes: W = white, MA = Mexican American, M = male, F = female. Burlingame and Los Gatos are in the San Francisco Bay Area, Northern California; Los Angeles is in Southern California.
In our corpus collected from California adolescents and young adults a decade later, quotative all has decreased markedly both in overall frequency and in the proportion of speakers who use it. Tables 3 and 4 show that in our 2004–2005 corpus, all users are clearly in the minority (10 of 32 speakers). Note that even among those speakers whose system contains all, like has clearly established itself as the default quotative introducer (72%), whereas all and all like amount to only 6%. Among the nonusers of all, like has the same share of the system, 72%, with slightly higher frequencies of go and say.
Table 3. Quotative variants of speakers in Stanford Tape Recorded Corpus (STRC 2004–2005) who used all or all like
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-44944-mediumThumb-S0954394510000098_tab3.jpg?pub-status=live)
Notes: W = white, MA = Mexican American, F = female, M = male, C = college student, H = high school student, JC = junior college student.
Table 4. Quotative variants of speakers in Stanford Tape Recorded Corpus (STRC 2004–2005) who did not use all or all like
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-60858-mediumThumb-S0954394510000098_tab4.jpg?pub-status=live)
Notes: A = Asian, B = black, CH = Chinese, CR = Creole, J = Japanese, P = Punjabi, PI = Pacific Islander, W = white, F = female, M = male, C = college student, H = high school student, JC = junior college student, G = graduate student, N = nonstudent, Atl = Atlanta, SFO = San Francisco.
We decided to zoom in on the competition between all and like across time, concentrating on the speakers whose system contains quotative all. Figure 1 compares the composition of the quotative system of the all users in our 1990/1994 and 2004–2005 corpora. The crossover pattern is evident: all, which in the 1990/1994 data amounted to almost as large a share as all other quotatives together (mainly say, go, and unframed), has been relegated to only 6% in the 2004–2005 data, and like clearly dominates the system. Indeed, all and like switch places as the primary quotative, while the other variants (say, think, go, zero, etc.) change far less in overall proportion. This highly significant change (χ²(2) = 217.851, p < .001) is largely driven by the replacement of all with like as the preferred quotative.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-91318-mediumThumb-S0954394510000098_fig1g.jpg?pub-status=live)
Figure 1. Relative frequency of all and all like, like and other quotatives among the speakers who use all or all like in the 1990/1994 and 2004–2005 data sets (in %).
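The χ²(2) value reported above comes from a test of independence between corpus and quotative choice. Two degrees of freedom is what a 2 × 3 table yields, (2 − 1) × (3 − 1) = 2, which fits our reading of Figure 1 as two corpora crossed with three categories (all/all like, like, other); that reading is an assumption, since the underlying counts appear only in the table images. The sketch below computes Pearson's statistic from any table of counts; the counts in it are invented and are not the paper's data. For exactly two degrees of freedom, the upper-tail probability has the closed form exp(−χ²/2), which suffices to confirm that 217.851 lies far beyond p = .001; for other table shapes, a library routine such as scipy.stats.chi2_contingency would be the usual choice.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df2(stat):
    """Upper-tail p-value; this closed form is exact only for 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Toy 2 x 3 table (rows: two corpora; columns: all/all like, like, other).
# These counts are invented for illustration; they are NOT the paper's data.
toy = [[10, 20, 30],
       [30, 20, 10]]
print(chi_square_independence(toy))   # (20.0, 2)
print(p_value_df2(217.851) < 0.001)   # True
```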
The overall trend across real time is also borne out when we look at individual speakers within these two data sets: whereas at least four speakers in Wimmer's (1990) and Fought's (1994) recordings used 10 or more tokens of quotative all, the highest number used by any one speaker in our 2004–2005 tape-recorded corpus was only 6. The movement away from all and toward like between the 1990s and the 2000s becomes even more dramatic if we recall that in the 1990s, all was categorically constrained by age: in Wimmer's 1990 data, only the high school students used the new incoming form all (42% among the quotative options); none of the college-age speakers did.
Importantly, the extent of the shift from all to like also shows up in the development of a combined form: all like, as exemplified in (8) and (9).
(8) I'd be all like “You know I'm thirteen, right?”
(9) He's all like “You got any weapons in the car”
There are no all like tokens whatsoever in Wimmer's (1990) corpus. By the mid-1990s, three tokens of all like appear in Fought's corpus. In our 2004–2005 corpus, all like is the primary sequence in which quotative all is used (17 of 26 all tokens). As is evident in Figure 2, the increase in the proportion of all like across the three data sets (1990, 1994, and 2004–2005) is quite dramatic.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-48790-mediumThumb-S0954394510000098_fig2g.jpg?pub-status=live)
Figure 2. All like ratio as calculated by the proportion of all like out of all quotatives introduced by all like and all (in %).
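The ratio plotted in Figure 2 is a simple proportion. As a minimal sketch (the function name is ours, for illustration), it takes the number of all like tokens and the number of bare all tokens; the only figures used are the 2004–2005 counts given above, 17 all like out of 26 all-type tokens, hence 9 bare all.

```python
def all_like_ratio(n_all_like, n_all_alone):
    """Percentage of all-type quotatives that take the combined form all like."""
    total = n_all_like + n_all_alone
    if total == 0:
        raise ValueError("no all-type quotatives to compare")
    return 100 * n_all_like / total

# 2004-2005 corpus: 17 all like tokens out of 26 all-type tokens (9 bare all).
print(round(all_like_ratio(17, 26 - 17), 1))  # 65.4
```

Any data set with no all like tokens, such as Wimmer's 1990 recordings, yields a ratio of 0%.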
It is furthermore remarkable that the only nine tokens of bare quotative all in our 2004–2005 tape recordings come from college students; all of our high school students used all like instead.[Footnote 5] By the time of our 2004–2005 corpus, all like had become the primary sequence in which all is used as a quotative, and the only one used by the younger speakers.[Footnote 6]
The demise of all within the set of quotative introducers used by the California youth represented in our corpora is also confirmed by the input probabilities of two separate VARBRUL runs on the two data sets.[Footnote 7] Table 5 shows that all is much more likely to occur overall in the 1990/1994 corpus (input probability .34) than in the 2004–2005 corpus (input probability .04). But it is not only the input probabilities that have decreased sharply, showing that the overall frequency of all has diminished; the constraints that govern quotative all have also changed across the two corpora. We discussed the constraint hierarchy of all across time in detail in Rickford et al. (2007).[Footnote 8] Here, we only briefly point to the major changes in the constraints that govern quotative all.
Table 5. VARBRUL analysis of the factor groups conditioning quotative all among the speakers who use the form in the 1990/1994 corpus and the 2004–2005 corpus (see Rickford et al., 2007)a
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-62807-mediumThumb-S0954394510000098_tab5.jpg?pub-status=live)
[ …] = non-significant constraint. Not significant in either corpus: gender, sexual orientation, drama/animation.
a For the 1990/1994 data, we have excluded Carl, a marginal all user who produced only a single token of all among his 45 quotatives. With him included, the results of our VARBRUL run would look slightly different, with N = 365 and an input probability of .29. Only one factor group comes out as significant, namely tense (present = .77, past = .33, other [future, conditional, etc.] = .07); all other factor groups are non-significant.
b The category non-third-person also includes one token of the very rare second-person generic you in the string you're just all, “I can't do this.”
For both VARBRUL runs, we included seven factor groups in the analysis: (1) tense and modality (present nonmodal, past nonmodal or modal/quasiauxiliaries), (2) subject type (full singular or plural NPs, first or third person pronouns including it, and unframed quotes), (3) birds of a feather (priming effects with respect to the quotative choice in the preceding five turns, operationalized here as the occurrence of a different quotative [alternation], of the same quotative [perseverance] or of no quotative), (4) speech or thought encoding, (5) drama/animation (the [non]occurrence of voice or sound effects), (6) gender, and (7) ethnicity.
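The birds of a feather factor group can be operationalized mechanically. The sketch below is a hypothetical helper, not the coding procedure actually used in the study. It assumes each transcript turn has been reduced to the quotative it contains (or None) and, following the definition of perseverance given for quotative all (another token of the same form within the five preceding lines), treats the same quotative anywhere in the window as perseverance even if a different quotative also occurs there.

```python
def classify_priming(turns, window=5):
    """Label each quotative token by what occurs in the preceding `window` turns.

    `turns` holds one entry per turn: a quotative label such as "all" or
    "like", or None when the turn contains no quotative.
    Returns (index, label) pairs for every quotative token.
    """
    labels = []
    for i, current in enumerate(turns):
        if current is None:
            continue
        preceding = [q for q in turns[max(0, i - window):i] if q is not None]
        if current in preceding:
            labels.append((i, "perseverance"))   # same quotative recently used
        elif preceding:
            labels.append((i, "alternation"))    # only different quotatives nearby
        else:
            labels.append((i, "no quotative"))   # no reported speech/thought nearby
    return labels

turns = ["all", None, "like", "all", None, None, None, None, None, "all"]
print(classify_priming(turns))
# [(0, 'no quotative'), (2, 'alternation'), (3, 'perseverance'), (9, 'no quotative')]
```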
Table 5 shows that while ethnicity had a significant effect in the 1990/1994 data, with white speakers slightly favoring quotative all compared with Chicano speakers, none of the social factors tested came out as significant in the 2004–2005 data.[Footnote 9] In the 1990/1994 data, the occurrence of all is conditioned by tense/modality in the quotative frame, with present nonmodal contexts strongly favoring its occurrence (factor weight .75). In the later corpus, however, tense/modality does not have a significant effect: the few tokens of all in the 2004–2005 corpus occur with a broad range of tense and aspect markers (compare (10) and (11)).
In the 1990/1994 data, all is mainly used with present time reference:
(10) You know so we're all “Three more questions, who cares?”
In the 2004–2005 data, all is also used with future time reference and habitual would:
(11) a. He'd be all “It's a it's a black guy.”
b. I'll be all like “Stop it. Don't text me.”
As regards the role of tense/modality, the numerical loss seems to go hand in hand with a loss of constraints, from a very high range of .69 in the 1990/1994 data set to a nonsignificant outcome in 2004–2005.
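Factor weights, ranges, and input probabilities can be related to ordinary logistic regression. A commonly cited correspondence, which we assume here rather than take from the VARBRUL runs themselves, is that factor weights equal the inverse logit of sum-coded (sum-to-zero) coefficients within a factor group and that the input probability is the inverse logit of the intercept; the range of a factor group is then the difference between its highest and lowest weight. The coefficients below are invented for illustration. The last line only re-expresses the input probabilities reported in Table 5 (.34 and .04) on the log-odds scale.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability (a factor weight or input probability)."""
    return 1 / (1 + math.exp(-x))

def logit(p):
    """Map a probability back to the log-odds scale."""
    return math.log(p / (1 - p))

def factor_weights(coefficients):
    """Convert the sum-coded coefficients of one factor group into weights."""
    if abs(sum(coefficients.values())) > 1e-9:
        raise ValueError("sum-coded coefficients of a factor group must sum to zero")
    return {level: inv_logit(beta) for level, beta in coefficients.items()}

def weight_range(weights):
    """Difference between the most and least favoring factor in a group."""
    return max(weights.values()) - min(weights.values())

# Invented coefficients, for illustration only (not estimates from either corpus).
tense = factor_weights({"present": 1.0, "past": -0.2, "other": -0.8})
print({level: round(w, 2) for level, w in tense.items()})
print(round(weight_range(tense), 2))

# Input probabilities from Table 5 expressed as intercepts on the log-odds scale.
print(round(logit(0.34), 2), round(logit(0.04), 2))  # -0.66 -3.18
```

Because weights are centered at .5 under this coding, a weight above .5 favors all and one below .5 disfavors it, which is how the factor weights in the text are read.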
However, one other factor group continues to have a bearing on the occurrence of quotative all, albeit with varying strength and direction. In the earlier corpus, all tended to cluster (see (12a)): perseverance, which we define as the occurrence of another token of all within the five preceding lines, favored its occurrence with a factor weight of .65 in 1990/1994. Importantly, there are also several clustered examples in the corpus collected by Rachelle Waksler in San Francisco from spring 1997 until fall 2000, which formed the basis for her 2001 article (e.g., (12b)). This is worth noting because it extends the period in which such sequences can be documented by another 6 years or so, which is potentially significant for a short-lived trend (the rise and fall of all) that lasted essentially just 20 years.
In the 1990/1994 data, all is mainly used in clusters:
(12) a. He's all “What are you doing here?”
I'm all “You called me in.”
He's all “For what? For what?”
Examples from Waksler (2001), collected 1997–2000:
b. And so he's all “NO, I'm not getting out of the car.”…
And then I was all “Well could you please give him a message for me, please?”
He's all “What?”
I'm all “Tell him to leave Mary alone.”
And he's all “OK.”
And he's all “Well I'm supposed to give YOU a message.”
And I was all “Whatever!”
By 2004–2005, however, all mainly occurs either where it is preceded by other quotative options (a context we termed alternation; factor weight .71; see (13)) or where no reported speech or thought occurs in the preceding context (factor weight .61). In our 2004–2005 corpus, all is very strongly disfavored in clustered contexts.[Footnote 10]
(13) I asked some guys in Portuguese where the academy is
And they're all “It's right here”
And I went there and asked the lady when they trained
And she's like “come back at eight”
Finally, by 2004–2005, all has acquired two constraints: the type of quote reported and the type of subject. We discuss each in turn. In the 1990/1994 data set, all was used indiscriminately with speech and thought. By 2004–2005, however, its uses had narrowed: it is now mainly used to introduce reported speech rather than thought (compare (14a) and (14b)).
In 2004–2005, all is mainly used for the introduction of speech rather than thought:
(14) a. SPEECH: He's all “Stay right there”
b. THOUGHT: it was all like “Oh my God I'm gonna fail”
The second constraint that was significant only in the 2004–2005 corpus is the subject type with which the quotative occurs. Importantly, this factor group harbors two intersecting constraints: full NP versus pronoun, and first versus third person. In the 2004–2005 data, full NPs strongly disfavor the occurrence of all (factor weight .20), whereas subject pronouns either favor it or have no effect. Among the pronouns, we also notice a person hierarchy: whereas all is strongly favored by third-person pronouns (factor weight .71), which include singular as well as plural forms (see (15)), first-person I or we has a neutral effect on the occurrence of the form.[Footnote 11] Interestingly, while the literature on quotation discusses the role of third-person it in the development of quotative like (see Buchstaller, 2004; Tagliamonte & Hudson, 2001), only one quotation in our corpus was framed by a form of it + all (see (14b)).
In 2004–2005, all is mainly used with third-person pronouns:
(15) a. They're all, “gotta get to the arcade!”
b. So he's all, “yeah, come over ‘n’ use it.”
The difference in constraint hierarchy and direction between the 1990/1994 and 2004–2005 data means that change has indeed taken place, both in relative frequency and in constraint patterning. As all decreases in frequency, it loses one linguistic constraint, namely tense and modality, and gains two new ones, subject type and speech/thought representation. The birds of a feather effect continues to exert an influence on the occurrence of the form, albeit with a much larger range than in the earlier corpus. Overall, the development of this form supports the view that all is a rather short-lived innovation that has ceded its territory to like and all like in recent years. After a high in the late 1990s, the overall use of quotative all is clearly in decline.
However, thus far, we have based our claims solely on California data. We are not in a position to state how robust and generalizable these findings are across geographical space. We also do not have any information about the more fine-grained temporal detail of what happened before and between the collection of the two data sets, a problem endemic to real-time analysis in sociolinguistics. As a second step, therefore, we set out to test the hypothesis that the frequency of all has dwindled in recent years in a larger, more finely time-differentiated corpus. We also wanted to give a wider geographical angle to our investigation.
Lacking large-scale corpora collected within the sociolinguistic research paradigm that span the full period since the first attestations of quotative all, while also exhibiting wide geographical coverage, we decided to work with data from the World Wide Web. More specifically, we drew on corpora culled from Google, the Web-based search engine. It is worth noting here that most of the material in the Google corpus (as far as we can determine its provenance) is from the United States. Hence, while the scope of the Internet searches is indeed broader than California and does include a multitude of sources, it is still mainly based on U.S. data. To what extent this is the case is notoriously difficult to assess and cannot be determined here. The following sections detail our analysis of the Google corpus.
An analysis of the newsgroups data
The extent to which the language of Internet newsgroups is comparable to spoken language is a point of contention (Androutsopoulos & Ziegler, 2004; Crystal, 2001; Tagliamonte & Denis, 2005). Here, we do not intend to argue that newsgroups contain the same frequency and general distribution of quotatives as spoken interaction, although Jones and Schieffelin (2007) have shown that another type of new media, instant messaging, is very rich in quotations, which seem to serve functions similar to those of reported speech in spoken interaction. The aim of this second section is rather to describe in some detail the methods and outcome of our collaborative project with Google, which investigated the use of quotative all in Internet newsgroups. We believe that the methods we employed for our work on quotative all can be applied to other kinds of linguistic research projects and therefore have the potential to substantially enrich the kinds of corpus-based analysis used in variation studies, sociolinguistics, and other linguistic subfields.
Our analysis of the Google corpus proceeded in two steps. The first step was a pilot study, which we reported in Rickford et al. (2007), so we provide only a brief summary of it here. We then describe in some detail the second step, which builds on the findings of the preliminary analysis.
The pilot study, carried out in 2005, used Google's interface to the newsgroups corpus to search for examples of quotative all.[Footnote 12] Google's search tool only allows simple string searches and ignores punctuation, so finding quotatives among the millions of occurrences of all in the newsgroups corpus was not straightforward.[Footnote 13] We therefore constructed a number of strings containing all that we thought would have a good chance of matching quotative uses. In a nutshell, these consisted of a singular subject pronoun with a contracted present-tense form of be, followed by all and a word that seemed likely to begin a quote, such as a wh-word, yeah, no, shit, it, or the like: for example, “I'm all yeah” or “I'm all shit.”[Footnote 14]
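Because the pilot tool accepted only plain strings, the search amounted to enumerating every combination of subject frame, all, and likely quote starter. A minimal sketch, assuming three pronoun frames and a handful of the starters named above (the full inventory of strings used in the pilot is not listed here, so both lists are illustrative):

```python
from itertools import product

# Illustrative lists only; the pilot study's full inventory is not given in the text.
SUBJECT_FRAMES = ["I'm", "he's", "she's"]
QUOTE_STARTERS = ["what", "why", "yeah", "no", "shit", "it"]

def pilot_queries(frames, starters):
    """Build plain search strings of the form '<pronoun + be> all <starter>'."""
    return [f"{frame} all {word}" for frame, word in product(frames, starters)]

queries = pilot_queries(SUBJECT_FRAMES, QUOTE_STARTERS)
print(len(queries))   # 18
print(queries[:2])    # ["I'm all what", "I'm all why"]
```

Since the search ignored punctuation, a string such as I'm all yeah would match both I'm all, “yeah...” and non-quotative sequences with the same words, which is one source of the noise in the pilot results.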
The resulting hits were examined and the quotatives culled, producing 354 examples over the period 1982–2004. These were then grouped by year of posting. To determine whether the rate of quotative all was changing during the period covered by the newsgroup archive, we needed some measure of the size of each year's archive: a crude metric of the rate of quotative all is the number of instances found in a given year divided by the total size of that year's archive. Unfortunately, Google does not make the size of the newsgroup archive for each year publicly available. In our pilot study, we therefore attempted to estimate the relative sizes of the yearly archives by searching for some very common words (such as word, other, make, see, way, people, first, the) and comparing the number of hits across years. On the basis of this method, the tentative conclusion of the first stage of our project was that quotative all first appeared in the newsgroups in the mid-1990s, became rapidly more common until about 1999, and then declined precipitously in frequency (see Rickford et al., 2007:20). We could not be confident about this conclusion, however, because of several methodological limitations, which we address in what follows.
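The text does not spell out how the probe-word counts were combined. One plausible reading, which is our assumption, is to express each year's hits for every probe word relative to a reference year and average those ratios; the result serves as an index of archive size, that is, the denominator of the crude rate. The hit counts below are invented.

```python
def relative_sizes(probe_hits, reference_year):
    """Estimate each year's archive size relative to a reference year.

    probe_hits maps a common word (e.g. "people", "first") to {year: hit count};
    every probe word is assumed to have a count for every year. Each year's
    index is the mean, over probe words, of hits[year] / hits[reference_year].
    """
    years = sorted({year for hits in probe_hits.values() for year in hits})
    index = {}
    for year in years:
        ratios = [hits[year] / hits[reference_year] for hits in probe_hits.values()]
        index[year] = sum(ratios) / len(ratios)
    return index

# Invented hit counts for two probe words and two years.
probe_hits = {
    "people": {1995: 200, 2000: 800},
    "first": {1995: 100, 2000: 500},
}
print(relative_sizes(probe_hits, 1995))  # {1995: 1.0, 2000: 4.5}
```

Averaging ratios, rather than summing raw hits, keeps a single very frequent word such as the from dominating the index; this is a design choice of the sketch, not something the pilot is reported to have done.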
To advance our understanding of the development of all and to test our hypothesis that all has indeed dwindled in recent years, we collaborated with Thorsten Brants, a researcher at Google Inc., and David Hall, a Stanford undergraduate who was employed by Google for two months in the summer of 2006. This collaboration allowed us to improve in a variety of ways on the standard tools we had previously used to search the Google Groups archive. The most serious limitation we had run into during the pilot study was the lack of a reliable measure of the size of the newsgroup archive for each year. To test whether the frequency of any form is changing, the raw frequency of its occurrences in the archive is useful only if it is accompanied by information about how the size of the archive changed over time.[Footnote 15] Although Google remained reluctant to disclose the absolute size of its newsgroup corpus, during our summer project it provided us with numbers indicating the relative size of each year's archive,[Footnote 16] which allowed us to normalize our raw year-by-year counts of the different quotatives. This normalization was necessary to make the data comparable across time and thus to make the newsgroup searches a reliable source of data on changing rates of quotative usage.
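Once relative archive sizes are available, normalization is a division per year. The sketch below assumes the relative sizes arrive as a per-year mapping (the figures Google actually supplied are not reproduced here; all numbers are invented) and shows how a raw count can double while the normalized rate falls.

```python
def normalized_rates(counts, relative_size, per=1.0):
    """Divide raw yearly counts by each year's relative archive size.

    `per` rescales the result (e.g. per=1_000_000 for a per-million-style index).
    """
    return {year: per * counts[year] / relative_size[year] for year in counts}

# Invented numbers: raw hits double from 9 to 18 while the archive grows 4.5-fold.
raw = {1995: 9, 2000: 18}
size = {1995: 1.0, 2000: 4.5}
print(normalized_rates(raw, size))  # {1995: 9.0, 2000: 4.0}
```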
A second methodological problem we had run into during our pilot study was that our search tool was restricted to the search bar that Google makes available on its Web site, so the search mechanism was essentially keyword search with a few minor enhancements. However, because all is an extremely common word,[Footnote 17] and only a tiny fraction of its uses are quotative, it was impossible to retrieve all, and only, the quotative uses through keyword searches. As already mentioned, in the pilot study we attempted to circumvent this problem by constructing linguistic environments that we hoped would yield a relatively high rate of quotative hits and then going through the hits by hand. But even with this method, the signal-to-noise ratio of our searches was relatively low: the 354 instances of quotative all that we found had to be culled from thousands of hits. In our collaboration with Google, we were able to search in a way that was sensitive to punctuation and therefore to reduce the amount of noise substantially.
Our Google partners developed a search tool allowing regular expressions[Footnote 18] in search patterns, which made the searches far more efficient. Preliminary attempts to find regular expressions that would yield all, or nearly all, quotatives resulted in far too many hits to be analyzed individually. Moreover, an examination of random samples of those hits revealed a rather poor signal-to-noise ratio: the vast majority of the hits were not quotative uses.[Footnote 19] We therefore modified our strategy. We used our existing compilation of quotative examples, including the 1990/1994 and the 2004–2005 California data as well as other examples such as those in Waksler's (2001) article, to look for words that were common as the first word of a quotation. Selecting the most common lexical items, their most common spelling variants, and a few closely related words, we constructed a regular expression that included a left context of a singular pronoun and contracted copula, followed by all, followed by an optional comma and quotation marks, and finally one of our likely quotation-initial words. The regular expression can be summarized as in Figure 3.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-50456-mediumThumb-S0954394510000098_fig3g.jpg?pub-status=live)
Figure 3. Regular expression for the Google newsgroup search.
The procedure for the regular expression search was as follows. First, we searched the newsgroup corpus using the regular expression in Figure 3.Footnote 20 In Figure 3, W stands for one of our likely quotation-initial words, which are: are(n't), blah, can('t), could, do, dude, fuck, gee, get, give, hey, hi, how, if, is(n't), lets, look, no, oh, ok, OK, okay, ooh, shit, shut up, thank, uh, um, well, what, when, where, who, whoa, why, will, wow, yeah, yes. By including only these lexemes in the template (and thereby limiting hits to strings that contained these exact sequences), we inevitably missed many quotes that did not begin with one of these words.
However, narrowing the search to these typical quote beginnings also dramatically increased the proportion of quotatives in the output. Of the 913 hits for all, only 162 (18%) were noise, and for the other quotative introducers the noise rate was even lower.
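Because the actual expression survives only as the Figure 3 image, the following Python sketch is a hypothetical reconstruction from the prose description, not the pattern used in the Google search. It assumes that the singular pronoun plus contracted copula is limited to I'm, he's, she's, and it's (with straight or curly apostrophes) and that matching is case-insensitive, so that ok and OK collapse. Following the text, only an optional comma and an optional quotation mark may intervene between the quotative and W, so a colon, as in (18), is not matched. The names `QUOTE_INITIAL`, `_phrase`, and `build_pattern` are illustrative.

```python
import re

# Quotation-initial words W as listed in the text. Matching is case-insensitive,
# so "ok" also covers "OK"; multiword items such as "shut up" allow any whitespace.
QUOTE_INITIAL = [
    "are", "aren't", "blah", "can", "can't", "could", "do", "dude", "fuck",
    "gee", "get", "give", "hey", "hi", "how", "if", "is", "isn't", "lets",
    "look", "no", "oh", "ok", "okay", "ooh", "shit", "shut up", "thank", "uh",
    "um", "well", "what", "when", "where", "who", "whoa", "why", "will", "wow",
    "yeah", "yes",
]


def _phrase(text: str) -> str:
    """Escape a word or phrase, letting internal spaces match any whitespace."""
    return r"\s+".join(re.escape(part) for part in text.split())


def build_pattern(quotative: str) -> re.Pattern:
    """Hypothetical Figure 3-style pattern for one quotative slot.

    Left context: singular pronoun + contracted copula (I'm, he's, she's, it's);
    then the quotative; then an optional comma and an optional opening quotation
    mark; then one of the quotation-initial words W.
    """
    words = "|".join(_phrase(w) for w in sorted(QUOTE_INITIAL, key=len, reverse=True))
    return re.compile(
        r"\b(?:I['’]m|(?:he|she|it)['’]s)\s+"  # pronoun + contracted copula
        + _phrase(quotative)                    # the quotative slot, e.g. "all"
        + r"\s*,?\s*[\"“‘']?\s*"               # optional comma, optional quote mark
        + r"(?:" + words + r")\b",              # quotation-initial word W
        re.IGNORECASE,
    )


if __name__ == "__main__":
    all_pattern = build_pattern("all")
    examples = [
        "and he's all “yeah, you babysat her once",         # (16): match
        "He's all“well let me check em",                    # (3): match, no space
        "I'm all, “Dude, you're not helping your cause!”",  # (4): match
        "It's all “what ifs” but like it or not",           # (22): noise, still matches
        "She's all in love with him",                       # no W follows: no match
        "I'm all: “Who's this Arrow Guy?”",                 # (18): colon, no match
    ]
    for text in examples:
        print(bool(all_pattern.search(text)), text)
```

The match on (22) illustrates why hand coding was still needed: an emphatic, nonquotative use cannot be told apart from a genuine quote by the pattern alone. Passing "all like" to `build_pattern` fills the parametric slot with the two-word quotative instead.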
A final methodological problem of the pilot study was that in 2004 we looked at only one quotative, all. Without checking the rates of other quotatives, however, our study of quotative all lacked adequate controls. Even if we could be confident that the rate of all was dropping, the drop could have resulted from changes in what newsgroups were used for: perhaps changing technologies were leading quotative-rich discourse to migrate to other venues, such as blogs or instant messaging. To ensure accountability with respect to the behavior of the competing variants, we therefore searched not only for all but also for the quotatives say, go, and like.
Using essentially the same method but substituting other quotatives in the parametric slot (cf. Figure 3), we then searched the corpus for all like, like, and say/go. The overall output can be seen in Table 6.Footnote 21 The searches for like and say/go yielded too many hits to examine individually (10,938 for like and 132,036 for say/go), so we worked with a random sample of 1,000 hits for each. Finally, every one of the 3,118 hits in our data set (914 all, 203 all like, 1,000 like, 1,000 say/go) was hand-coded into one of four categories. We now define these categories, illustrating them with output from the Google searches.
Table 6. Raw output from regular expression search on the Google newsgroup corpus
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151021084149517-0234:S0954394510000098_tab6.gif?pub-status=live)
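The text does not specify how the random samples of 1,000 hits for like and say/go were drawn. As a minimal sketch, assuming a simple random sample without replacement from the full list of hits, that step and the projection factor later used to scale sample counts back up could look like this in Python (the function names are hypothetical):

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_hits(hits: Sequence[T], k: int = 1000, seed: Optional[int] = None) -> List[T]:
    """Draw a simple random sample of k hits without replacement.

    If there are no more than k hits (as with all and all like), every hit is
    kept and no sampling is needed.
    """
    if len(hits) <= k:
        return list(hits)
    return random.Random(seed).sample(list(hits), k)


def projection_factor(total_hits: int, sample_size: int) -> float:
    """Factor for scaling counts in a sample back up to the full set of hits."""
    return total_hits / sample_size


if __name__ == "__main__":
    # Placeholder hit IDs standing in for the 132,036 say/go hits.
    say_go_hits = [f"hit-{i}" for i in range(132_036)]
    sample = sample_hits(say_go_hits, k=1000, seed=42)
    print(len(sample), projection_factor(len(say_go_hits), len(sample)))
```

Fixing the seed makes the sample reproducible, which is convenient when a sample is hand-coded once and then reused in later calculations.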
Speech
This category consists of quotes in the traditional sense, namely reports of words said or written, as in (16).
(16) She said “so you're [sic] baby juts [sic] turned one, I think I met her”
and he's all “yeah, you babysat her once, you were great, like $10 an hour”
Thought
These quotations appeared to be reports of thoughts that were not actually uttered or committed to writing, as in (17) and (18).
(17) No matter how many times I see this subject line, my first thought is that it's a score, and I'm all “Who the hell could beat somebody 420–1?”
(18) So, I been reading these posts and I'm all: “Who's this Arrow Guy?”
Stereotypes
These quotes characterize a person or a situation through an utterance that might typically be produced by that person or in that situation. This category is exemplified in (19) and (20) with all and in (21) with say.Footnote 22
(19) What a bunch of whiner troops we have! It's all “could we please have some body armor so our limbs aren't blown off” and “some metal shielding on our humvees might help us to die less.”
(20) You seem to be under the impression that we think that once you've sinned then it's all ‘oh dear, game's over, that's us condemned to the eternal fires’.
(21) When they say “You'd better stay overnight for observation.”
It means “I want everyone to get a good laugh at this one.”
The category stereotype is new to discussions of quotatives.Footnote 23 It constitutes only a relatively small fraction of the examples of every quotative variant except all, so it is perhaps not surprising that it had not been noted before. But as we began examining the all data from the Google Groups search, it became evident that a great many of the examples served to characterize people or situations through quotes without actually attributing words or specific thoughts to them. We therefore added this category to our study.
Nonquotatives
This category consists of examples that should not be considered quotes, such as cases of quotation marks used for emphasis, quotes around proverbs or clichés, or discussions of the use of nonstandard quotatives (of which there were several in our data), as exemplified in (22)–(25).
(22) It's all “what ifs” but like it or not, Oct 4 was a big deal.
(23) it all depends on whether you consider reporting things 'too good to be true' has no grey area at all, or if it's all 'yes' and 'no' with great lines between them.
(24) Here in So. Calif. the most recent incarnation of “go” in lieu of “say” is And'm all “No Waaaaaaay!!” And then she's all “Yeah, waaaaaay!” Well all right, so there's a verb in there, but…
(25) I recall this even from elementary school. Two other annoying slang substitutes for “say” are “like” and “all”.
The coding of every hit into these four categories was carried out by Nick Romero, an undergraduate student at Stanford, and questionable cases were reviewed (and occasionally changed) by at least one of the authors. Full contexts from the newsgroups were available to us, and were consulted in the majority of cases, so that informed decisions could be made about the classifications. The raw data from our four searches of the Google corpus (for all, all like, like, and say/go) are summarized in Tables 7–10.Footnote 24
Table 7. All quotations by quotative category and year (raw data)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-48824-mediumThumb-S0954394510000098_tab7.jpg?pub-status=live)
Table 8. Like quotations by quotative category and year (raw data)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-98599-mediumThumb-S0954394510000098_tab8.jpg?pub-status=live)
Table 9. All like quotations by quotative category and year (raw data)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-52497-mediumThumb-S0954394510000098_tab9.jpg?pub-status=live)
Table 10. Say/go quotations by quotative category and year (raw data)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-83148-mediumThumb-S0954394510000098_tab10.jpg?pub-status=live)
Note first that noise (category 4, the nonquotatives) constituted under 5% of the data in the like (34 of 1,000), all like (2 of 203), and say/go (23 of 1,000) searches, but 18% of the all data (162 of 913). Hence, although the bulk of the material consisted of usable data from categories 1–3, the output for quotative all nevertheless contained a sizeable proportion of noise. More importantly, all leads the way in the stereotype category: 38.5% of the all quotes fall into this category, compared with only 23.3% for like and only 4.8% for the combined say/go tokens. Hence, as we pointed out previously, all seems to be doing something fundamentally different from the older quotatives say/go and, probabilistically, from like.
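The noise percentages above follow directly from the reported counts. As a quick check, this Python snippet recomputes the noise rate (category 4 hits as a share of all coded hits) and the number of usable quotative tokens for each search, using only the figures given in this paragraph; the names `CODED` and `noise_rate` are illustrative.

```python
# (nonquotative hits, total hits coded) per search, as reported in the text
CODED = {
    "all": (162, 913),
    "all like": (2, 203),
    "like": (34, 1000),
    "say/go": (23, 1000),
}


def noise_rate(noise: int, total: int) -> float:
    """Percentage of coded hits that were nonquotatives (category 4)."""
    return 100 * noise / total


if __name__ == "__main__":
    for quotative, (noise, total) in CODED.items():
        usable = total - noise  # categories 1-3, the tokens kept for normalization
        print(f"{quotative:9} noise {noise_rate(noise, total):5.1f}%  usable {usable}")
```

This yields about 17.7% for all (rounded to 18% in the text) and between 1.0% and 3.4% for the other three searches.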
In order to trace the development of the quotative variants in real time, we needed to normalize the raw output of our searches. Because we were given not the absolute word frequencies for the archive but only the relative sizes of the newsgroups on a year-by-year basis, we computed normalized figures that take account of fluctuations in newsgroup size per year, as follows. We took the numbers from Tables 7–10, excluded the nonquotatives, and adjusted for the relative size of each year's newsgroup archive by dividing the number of actual examples of each quotative for each year by the percentage of the total newsgroup corpus contained in that year's archive. In the case of like and say/go, we also projected the rates to the full set of hits, since we had examined only random samples of 1,000 examples (by multiplying the like rates by 10.939 and the say/go rates by 132.036).Footnote 25 Finally, we plotted the normalized rates of each quotative over time in Figures 4–7. Because token numbers for all of the quotative variants were generally very low in the pre-1995 newsgroup postings, we collapsed these years into one composite figure. The reader is advised to consult the (nonnormalized) frequencies in Tables 7–10 for information about the patterning of the individual quotatives in these earlier years. We turn now to the results of these manipulations, one quotative at a time, starting with all.
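A minimal Python sketch of this normalization follows. It assumes that each year's archive size is expressed as a fraction of the whole corpus (dividing by percentages instead would rescale every value by the same factor of 100 and leave the shape of the curves unchanged) and that the pre-1995 years are pooled by summing both their token counts and their corpus shares; the paper does not say how the pooling was computed. Because no absolute word counts were available, the result is a relative index rather than a rate per million words. The function names and all token counts and corpus shares in the usage example are invented placeholders, not values from Tables 7–10.

```python
from typing import Dict, Optional, Tuple


def normalized_rate(tokens: int, corpus_share: float,
                    total_hits: Optional[int] = None,
                    sample_size: Optional[int] = None) -> float:
    """Normalize one year's count of quotative tokens (categories 1-3 only).

    tokens:       usable quotative tokens for that year (nonquotatives excluded)
    corpus_share: that year's archive as a fraction of the whole corpus, in (0, 1]
    total_hits, sample_size: if only a random sample was coded (like, say/go),
                  the result is scaled up by total_hits / sample_size
    """
    if not 0 < corpus_share <= 1:
        raise ValueError("corpus_share must be a fraction in (0, 1]")
    rate = tokens / corpus_share
    if total_hits is not None and sample_size is not None:
        rate *= total_hits / sample_size
    return rate


def pooled_rate(years: Dict[int, Tuple[int, float]], **projection: int) -> float:
    """Pool several sparse years (e.g. pre-1995) into one composite figure.

    Assumption: tokens and corpus shares are both summed before normalizing.
    """
    tokens = sum(t for t, _ in years.values())
    share = sum(s for _, s in years.values())
    return normalized_rate(tokens, share, **projection)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(normalized_rate(40, 0.25))  # a fully coded quotative such as all
    print(normalized_rate(20, 0.5, total_hits=132_036, sample_size=1000))  # sampled
    print(pooled_rate({1993: (1, 0.02), 1994: (3, 0.03)}))  # composite early years
```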
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-67683-mediumThumb-S0954394510000098_fig4g.jpg?pub-status=live)
Figure 4. Rate of all in the Google newsgroups, computed by taking the totals of quotative categories 1–3 and adjusting for the size of each year's newsgroup archive (frequency count).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-55572-mediumThumb-S0954394510000098_fig5g.jpg?pub-status=live)
Figure 5. Rate of like in the Google newsgroups, computed by taking the totals of quotative categories 1–3 and projecting the rates based on the fact that we had only examined random samples of 1,000 examples (frequency count).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-24742-mediumThumb-S0954394510000098_fig6g.jpg?pub-status=live)
Figure 6. Rate of say/go in the Google newsgroups, computed by taking the totals of quotative categories 1–3 and projecting the rates based on the fact that we had only examined random samples of 1,000 examples (frequency count).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704232337-76205-mediumThumb-S0954394510000098_fig7g.jpg?pub-status=live)
Figure 7. Rate of all like in the Google newsgroups, computed by taking the totals of quotative categories 1–3 and adjusting for the size of each year's newsgroup archive (frequency count).
The earliest occurrence of all in the newsgroup corpus is a category 3 quote, a stereotype, which occurred in 1982. It is given in example (26).
(26) Those mercenaries sure lead a life, don't they? It's all “What Ho! Roger, we've been double-crossed! Let's take over the country!” and “Aargh, I'm hit-kill me…
After this lone occurrence, we did not find another token of quotative all in the newsgroup corpus until 1993. Let us now consider the development of all in real time in the Google newsgroups from this point on, starting with the composite pre-1995 figure.
The lines in Figure 4 represent the year-by-year distribution of quotative all broken down into the categories speech, thought, and stereotypes. The topmost line (dotted, with triangles) represents the total occurrences of quotative all in our newsgroup corpus. The two lines below it indicate how the total is divided among speech (the area below the lowest line), thought (the area between the lowest line and the second line), and stereotypes (the area between the second line and the top line). The fact that the top line lies well above the other two shows that stereotypes account for a substantial fraction of the tokens of quotative all. Overall, Figure 4 shows that all is used mainly for speech and stereotypes; the category thought contributes little to its overall frequency. And, as we pointed out before, the main locus of occurrence of all is the introduction of stereotypes, especially in the period when all is most frequent, between 1999 and 2005.
Returning to the question of whether quotative all is in decline, the data presented here support the conclusions we drew on the basis of our pilot project: usage of quotative all increased during the 1990s, peaked in 1999, and has been declining rapidly over the past six or seven years. Our larger, more recent study also shows that it is especially in the stereotype category that all first expands and then dwindles in frequency, whereas the speech and thought categories, while declining slightly since 1999, remain relatively stable. Importantly, this rate of decline is not matched by the other quotatives. Let us first discuss quotative like, which is depicted in Figure 5. Rising sharply after 1995, like fluctuates in frequency over time but seems to reach a high in 2006 after a steady rise (discounting an unexplained trough in 2004 and 2005). Notably, like seems to be used in almost equal proportions to introduce speech, thought, and stereotypes.
Furthermore, the relative proportions of these categories remain fairly stable over time, except in 2006. However, because our database for 2006 was relatively small (including newsgroup postings from only the first six weeks of the year), we treat the 2006 figure with caution. Hence, like and all seem to be fundamentally distinguished by their propensity to introduce reported thought. Whereas like occurs with speech, thought, and stereotypes in almost equal measure (see Buchstaller, 2004, who also found that like is used in equal proportions with quotes of various epistemic stances), the fraction of reported thought framed by all is negligible. It is important to note, however, that both like and all introduce speech and stereotypes, which sets them apart from the traditional quotatives say and go. Consider now Figure 6, which plots say and go across time.
Clearly, say/go are used virtually exclusively for true quotes: the categories thought and stereotype contribute little to their overall frequency. This is also reflected in the low numbers in the columns for categories 2 and 3 in Table 10. As for the development of say/go, the curve exhibits considerable year-by-year variation and a very slow long-term decline, but nothing like the rapid drop-off of all.
These findings thus lend support to our earlier claim that all has declined in the last few years. Its lower frequencies since 1999 cannot be attributed entirely to the possibility that populations that use a lot of quotation have left the newsgroups and migrated to a newer, possibly hipper, medium such as blogs. If that were the case, we would see a similar trend for like and say/go, and this is clearly not what we find. The curve for all looks so different from those for the other quotatives that it seems safe to rule out any attempt to explain its shape as a function of more general changes in what people use newsgroups for.
Finally, we discuss the figure for all like. There are so many fewer examples of all like than of the other quotatives that we are hesitant to draw any conclusions from the recent dearth of examples (see Table 9). However, one point in particular needs to be addressed. Earlier, we suggested that the move from all to like is accompanied by the development of the form all like. If this were the case, we would expect all like to rise in frequency at the point when the transition actually happens, namely around 1999. Figure 7 shows that this is indeed the case. All like starts at low frequencies (under 8), rises until 1999, and continues to increase more gradually between 1999 and 2004; in the last two years, examples are almost nonexistent. Hence, on the basis of these findings (which, admittedly, rest on relatively low token numbers), we conjecture that all like developed in tandem with all and continued to rise during the demise of all. In the last two years, when all has clearly been ousted by like, all like has also almost disappeared.
Figure 7 fits well with the pattern found in the California data (see Figure 2). In 1990, the California adolescents did not produce any tokens of all like. By 1994, some tokens of all like had appeared in California. The 2004–2005 California data seem to have caught all like at its high point, just before the sharp drop in frequency visible in the newsgroup data. It would clearly be worthwhile to follow up the California study with another time slice to see whether the drop in all like frequency in the Google data is replicated in California.
CONCLUSIONS
In this paper, we have investigated the development of quotative all using two different data sources: traditional sociolinguistic interviews and a Web-based newsgroup corpus. In both our California data and the Google newsgroup data, all has declined dramatically in real time. Importantly, its numerical decline has also significantly affected the constraints that govern it, both in the direction of the constraints and in the types of factor groups involved.
The trajectory of quotative all discussed here is interesting from the perspective of language change in progress, as it provides a direct window on a phenomenon often observed in historical texts: the short-term flourishing of a linguistic form or usage. In the case of quotative all, we clearly have an instance not simply of innovation in the individual, but of change in the sense of spread to many users (Milroy, 1992, 2003; Weinreich, Labov, & Herzog, 1968). Earlier examples of such changes that were relatively short-lived in the textual evidence for Standard English include the use of auxiliary do in affirmative clauses, such as [T]here I did see the whole Consent of the Realm against it (1554, Throckmorton qtd. in Nevalainen, 2004:202), and several aspectualizers, such as stinten and finen, both meaning ‘finish,’ and both short-lived in Middle English (Brinton, 1988:151).Footnote 26 In some cases, such as all and do, the form becomes realigned with other uses; in others, such as finen, the form ceases to be used. Emergent structures are inherently unstable (Bybee & Hopper, 2001), so it is not surprising that such a pattern of development and dissolution occurs, despite a tendency for analysts to expect a new phenomenon, especially one of a grammatical nature, to persist. (Contrast this with the loss of the verbal coda in topic-restricting as far as constructions, a change that has been in progress since the 19th century and appears to be moving forward in terms of both frequency and the linguistic environments affected [Rickford, Wasow, Mendoza-Denton, & Espinoza, 1995]).
Our newsgroup study has added an interesting angle to our earlier findings. Perhaps the most remarkable result is that quotatives differ in important ways in their distribution across the three subtypes we identified (speech, thought, and stereotypes). Clearly, say/go are used virtually exclusively for true quotes. Like, on the other hand, is used as much to introduce thoughts or stereotypes as to introduce speech. All is unique in its frequent use to introduce stereotypes, particularly during its peak period of use, from 2000 to 2004. This suggests that all is functionally somewhat different from the other quotatives examined here.
We also hope to have shown that Google newsgroups (and similar data, to the extent that they exist and are made available at other sites) are valuable sources for studying recent trends in language variation and change (see also Hoffmann, 2007; Hundt, Nesselhauf, & Biewer, 2007). The collaboration with Google gave us the opportunity to search a huge amount of chronologically organized data using punctuation-sensitive regular expressions, a more powerful tool than the search methods Google makes available to everyone. In principle, we could have done our searches using the standard Google search tools, but doing so would have been vastly more time-consuming and error-prone. The one thing we obtained from the collaboration that we absolutely could not have had without it, however, is accurate data on the relative sizes of the archives year by year. The Web provides linguists with a corpus so large that it would have been unimaginable just a few years ago. Unfortunately, its very size and the variety of its contents make it unwieldy as a source of linguistic data. The newsgroups provide a much smaller, but still immense, corpus, with a modicum of useful organization built in. Two particularly attractive features of the newsgroup archives are that they can be searched by language and that they are organized chronologically. The latter property allowed us to study change in language usage over a time span far shorter than those usually considered in diachronic linguistics. We recommend this tool to others interested in studying ongoing changes that are detectable in the written form of language.Footnote 27