Introduction
In her widely circulated 2017 article for The Chronicle of Higher Education, Sarah Valentine recounts a conversation with her graduate mentor about the isolation and discrimination Black students face while studying in Russia. Valentine was told “that the Slavic field had always been more concerned with the political and cultural dynamics existing between the various Slavic groups…than with ‘outside concerns.’”Footnote 1 Along with this narrow definition of our field comes a serious implication: Valentine's well-being, even as a member of “the Slavic field” itself, was an “outside concern;” Blackness, for her field, was and is an “outside concern.”Footnote 2
Testimonials like Valentine's indicate that Slavic Studies has a problem with racist exclusion that works by the elision of racial issues.Footnote 3 Even the 2020 “AATSEEL Statement Concerning Systemic Racism and Police Brutality in the United States,” one of the most productive plans for anti-racist action to emerge from our field's leadership, opens: “AATSEEL does not generally make statements about public issues unless they directly relate to the Slavic field.”Footnote 4 Framing US-based racism as an exception to this rule is factually incorrect: not only does racism in the US have urgent reverberations in Eurasia, eastern Europe, and Russia, but also nothing relates more directly to “the Slavic field” than the well-being of that field's own members when their livelihood is hampered by racial discrimination and violence.
These present-day conversations have roots in the history of the field of Slavic Studies and its perceived and real marginalization. Articles traded year after year between luminaries from Roman Jakobson to Ronald Grigor Suny have contained calls for greater attention to Slavic “diversity” as it was threatened by the monolith of a Russo-centric Soviet state and by the crisis of that state's dispersal.Footnote 5 A redefinition on geographic terms ultimately took hold, adding “Eurasia” to the title of our discipline's organizational body in 2010.Footnote 6 Yet, as Ani Kokobobo recently noted, the majority of PhD-granting institutions in the US have retained the phrase “Slavic Languages and Literatures” in the title of their departments.Footnote 7 We ask: how are these decisions about what the field is reflected in what the field writes?
Anti-racist movements in other predominantly white disciplines point toward a methodology for approaching this question. In recent months, Princeton Classics Professor Dan-El Padilla Peralta has not only highlighted disparities within his field but argued that the very foundation of Classics is racist.Footnote 8 The New York Times Magazine framed these arguments as “a crisis of identity” in a field seeking “to shed its self-imposed reputation as an elitist subject overwhelmingly taught and studied by white men.”Footnote 9 As Padilla argues, however, this reputation may be impossible to shed without a radical transformation of the discipline's foundations. The much newer field of Digital Humanities (DH) has been shaped by internal review that brings methodology and representation to the fore.Footnote 10 Tara McPherson has observed that digital media and the Civil Rights era emerged as intertwined responses in a shared Cold War context. This foundational imbrication necessitates the incorporation of “race from the outset…as a ghost in the digital machine.”Footnote 11
This article offers a first attempt to articulate the “ghosts in our [Slavic] machines” through a new assessment of the shape of our field that stops pretending race is incidental. Current engagement with racism in the United States requires a response that is both immediate and sustained; using what Alex Gil and his collaborators call “nimble” digital methodologies or “rapid response research,” we can quickly and broadly assess a field that occupies a unique racial position in the US academy and has found that position to be in need of revision since the field's very inception.Footnote 12 Our approach merges theories of race and DH methods to ask how published articles in “Slavic Studies” do and do not reflect critically on race.Footnote 13 Because the textual field of Slavic Studies has never previously been analyzed as a corpus, the application of these methods is preliminary; however, our results gesture toward concrete circumstances and actionable steps. Namely, in some areas of identity (such as gender), Slavic Studies research has generated conversations that are robust enough to be visible from a digital bird's-eye view. Although individual articles stretching back two decades demonstrate the importance of race in Russia and Eurasia, the absence of any large, digitally detectable conversations on this topic reinforces Black students’ observations of negligence and ignorance. Meanwhile, works of scholarship that do offer a critical apparatus for thinking about race and Eurasia also demonstrate a great potential for interdisciplinary impact. Such writings can point an anti-racist path forward for the field as a whole.
Methods and Results
Digital Humanities offers the ability to study materials at a scale that would be impossible for any single scholar to grasp. Three specific computational methods allow us to ask research questions of entire fields and disciplines in a sample of over 100,000 scholarly texts: frequency analysis, topic modeling, and perspectival modeling. For predominantly English-language academic sources, we analyzed 41,251 texts, including both articles and books, provided by JSTOR Data for Research within a “Slavic Studies” cluster. For each text, JSTOR provides a list of all the words in the text and their frequency. For scholarship in Russian, we included texts from thirty-seven journals. We used each of these samples to ask how our field represents two categories of socially perceived identity: race and gender. Complete information about our methods, corpus, models, and results, as well as additional images, is available in this article's companion GitHub repository.Footnote 14
Preliminary Approaches
Reading discrimination on a digital scale does not only mean using a computer to scan articles for racist or sexist labels. We initially attempted to screen our samples for outright hate speech using Hatebase, a multilingual repository of 3,700 derogatory terms.Footnote 15 Term frequency analysis indicated how many times each Hatebase term appears in our samples, and in which texts each term appears most often. Pursuing this blunt approach demonstrates that discrimination in Slavic Studies publications is of a different kind. Slurs and derogatory terms do appear in our corpus by the thousands, but they are primarily used in quoted or reported speech with varying degrees of contextualization.
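The term frequency step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in the companion repository: it assumes each text arrives as a dictionary of word counts (the shape of the JSTOR word lists), and the function name `lexicon_frequencies`, the document IDs, and the placeholder lexicon terms are all hypothetical.

```python
from collections import Counter

def lexicon_frequencies(doc_counts, lexicon):
    """Total each lexicon term across all documents and record the
    document in which each term appears most often."""
    totals = Counter()
    top_doc = {}
    for doc_id, counts in doc_counts.items():
        for term in lexicon:
            n = counts.get(term, 0)
            if n == 0:
                continue
            totals[term] += n
            # Keep the (document, count) pair with the highest count so far.
            if term not in top_doc or n > top_doc[term][1]:
                top_doc[term] = (doc_id, n)
    return totals, top_doc

# Toy word-count data; "term_a" and "term_b" stand in for lexicon entries.
doc_counts = {
    "doc_a": {"babylon": 3, "term_a": 2, "virgin": 1},
    "doc_b": {"term_a": 5, "mother": 4},
    "doc_c": {"politburo": 7},
}
lexicon = {"term_a", "term_b"}

totals, top_doc = lexicon_frequencies(doc_counts, lexicon)
print(totals["term_a"])   # total occurrences across the toy corpus
print(top_doc["term_a"])  # document that uses the term most often
```

A count like this says how often and where a term appears, but nothing about whether it is quoted, reported, or contextualized, which is why manual reading has to follow.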
For example, a manual survey of texts that frequently use the word “whore(s)” found that this word (which appears 201 times in our English corpus) often arises in immutable, textually derived phrases (such as “Whore of Babylon”) or in critical analyses of historical archetypes (alongside “virgin” and “mother”). Similarly, the presence of slurs and often-offensive terms about Blackness in our sample indicates that race figures prominently in the sources Slavicists use, perhaps even more prominently than sex work. The word “negro” and its plural appear 732 times in our corpus; the n-word and its plural appear thirty-seven times, typically but not always in primary sources (such as the title of a novel by Joseph Conrad).Footnote 16 Our Russian corpus yielded comparable ratios, with shliukha and bludnitsa arising a total of 3,004 times and translations of derogatory terms about Black people appearing 5,597 times.
The English word “whore” is overtly dehumanizing in common parlance; the n-word has been used for centuries to label millions of people as livestock or worse. Casual repetition of these words should never be publishable. However, frequency analysis is incapable of quantifying how well scholars contextualize these words and address empirical abuses. When discriminatory terms about race and other identity formations appear in our field's source materials, what matters is whether researchers contextualize broader histories of racism and discrimination.Footnote 17 Digital methods can indicate whether such expertise has developed in Slavic Studies scholarship.
When a scholarly field is collectively interested in understanding a topic, it leaves textual traces of that interest beyond individual words. It includes many words about that topic in the same texts, as scholars undertake extended discussions rather than encountering the topic tangentially. A field's areas of interest spawn interconnected citations, as the same names become associated with certain terms. The limited results of our frequency analysis led to a search for such networks of conversation using a different method of text analysis called topic modeling. We applied three rounds of topic modeling, each using a different algorithm, to texts in the JSTOR Slavic Studies sample. Then, we manually examined the subject areas found to be important in this sample, searching for conversations about embodied identity.
Topic modeling is a form of machine learning that searches for statistically significant themes within texts.Footnote 18 Each time it runs, the model outputs a set number of “topics,” each of which is actually a long chain of characteristic words. Topic modeling clusters words from its input corpus with no indication from humans as to why the sample is interesting or what the words in it mean.Footnote 19 However, a computer can tell that when a Slavic Studies publication contains the word “feminism,” it is also likely related to publications that contain words like “gay,” “girls,” “dowry,” “zhenotdel,” and “sex.” This particular set of statistically connected words is sampled from the list of 200 terms that our third round of modeling labeled (arbitrarily) as topic #1.Footnote 20 A human can look at this “topic” and surmise that gender relations are a subject of great interest in Slavic Studies. Humans can also tell how robust this interest is by examining the topic's formative “fingerprint,” that is, the frequently used and highly connected words that models put at the front of each topic. The gender relations “fingerprint” contains not only abstract concepts (like “sexuality,” the 22nd term) and relational terms (“papa,” 4th, and “mama,” 5th) but also historical figures (Maria “pokrovskaia,” 23rd; Sophia “parnok,” 29th), and current scholars (Wendy “rosslyn,” 25th; Helena “goscilo,” 43rd). This network of ideas, histories, and people has played a significant role in allowing conversations about gender to coalesce in our field.
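Reading a topic's “fingerprint” amounts to sorting the topic's words by weight and noting where terms of interest fall. The sketch below illustrates that ranking step only; it does not perform topic modeling itself. The weights are invented toy values, not output from our models, and the helper names `fingerprint` and `rank_of` are hypothetical.

```python
def fingerprint(topic_weights, size=50):
    """Return the `size` highest-weighted terms of a topic, best first.
    Ties are broken alphabetically so the order is reproducible."""
    ranked = sorted(topic_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]

def rank_of(term, ranked_terms):
    """1-based position of `term` in the ranked list, or None if absent."""
    try:
        return ranked_terms.index(term) + 1
    except ValueError:
        return None

# Toy topic: invented weights, for illustration only.
toy_topic = {
    "women": 0.09,
    "gender": 0.07,
    "feminism": 0.04,
    "sexuality": 0.03,
    "dowry": 0.01,
}

top = fingerprint(toy_topic, size=3)
print(top)                       # the three strongest terms, in order
print(rank_of("feminism", top))  # position within the fingerprint
print(rank_of("dowry", top))     # None: outside the fingerprint
```

Ranks reported in the text, such as “sexuality” as the 22nd term, are positions in a list of this kind.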
By contrast, on this macroscopic scale, our field writes about racial and ethnic histories without acknowledging the presence of race and ethnicity in those histories. A smaller cluster of scholarship has repeatedly demonstrated the potency of racialization and ethnicity as descriptors for identity in Central Asia, Russia, and eastern Europe.Footnote 21 Topic modeling makes clear, however, that the broader field has siloed these conversations: they are not statistically detectable alongside gender relations, the Politburo, post-Soviet politics, or individual centuries of Russian literary history. Our algorithm proposed one topic (#11) that used race as a fingerprint term (“racial,” 19th), but the rest of the topic's fingerprint (including “reich,” 2nd; “himmler,” 3rd; and “volksdeutsche,” 13th) indicated that the term was only significant in relation to scholarship about the Nazi regime.Footnote 22 Another topic (#79) pointed toward widespread scholarly conversations about nationalities policy without referencing ethnicity.Footnote 23 The gender relations topic showed that terms like “girls” and “husband” must be understood through frameworks like “sex” and “feminism”; by contrast, it remains possible to write about various nations and their histories without a robust understanding of the fundamental role race and ethnicity play in those histories.Footnote 24
As topic modeling points to gaps in understanding, it also illustrates how fields can understand identity multidimensionally, as a matter of cultural history, political thought, and current scholarship. The intersection between Slavic Studies and gender studies offers one such example. So does the intersection between Slavic Studies and Africana Studies: an algorithm separate from the one described above pinpointed one topic that centered on Black history, including both historical terms and conceptual terms (such as “race” and “racial”) in its fingerprint.Footnote 25 This was our data's only reflection of the sectors of Slavic Studies that examined what race is doing in racialized histories (beyond Nazism). To examine how studies of gender and Blackness could shape Slavic Studies on a broader scale, it is necessary to move beyond undirected topic modeling and toward a human-supervised method of classifying texts, that is, toward a digital method that looks specifically for developments in Slavic Studies scholarship on race and gender and analyzes those developments over time.
Perspectival Modeling
In his groundbreaking work Distant Horizons: Digital Evidence and Literary Change, Ted Underwood offers a novel method called “perspectival modeling” that makes it possible to quantify long-term change in literary genres on a massive scale.Footnote 26 This method trains classification models for various periods across time, articulating differences among the periodized models in a quantifiable measure of change. In this regard, the term “perspective” is important. Underwood measures, from the perspective of an earlier time, how texts become increasingly dissimilar and unfamiliar while still being clearly identifiable as part of a common genre. We adapted Underwood's technique in two ways: by using it to understand scholarly fields rather than literary genres and by enabling it to measure granular change over one short period of time within a single model. This new method pinpoints increased linguistic overlap in recent years between Slavic Studies and two fields that focus on identity: Gender Studies and African American Studies.
Our team trained one classification model for each of these fields.Footnote 27 The model sorting Slavic Studies texts from Gender Studies ones predicted the correct discipline for 98.8% of its samples (40,764 correct, 478 incorrect). The African American Studies model was correct 98.9% of the time (98,074 correct, 1,070 incorrect). The high accuracy of these models suggests that real and significant differences exist between Slavic Studies and the other two classes. With these models, we ran predictions for every text in our JSTOR Slavic Studies sample. Nearly all the texts were correctly classified as Slavic, but we also recorded a score measuring similarity to other disciplines. This score can be used to analyze individual works, or it can be tracked over time to measure continuous, non-periodized developments in entire fields.
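The accuracy figures follow directly from the counts of correct and incorrect predictions reported above. As a quick worked check (the function name `accuracy` is ours, introduced only for illustration):

```python
def accuracy(correct, incorrect):
    """Share of predictions that matched the true discipline."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no predictions to score")
    return correct / total

# Counts reported in the text for the two classification models.
gender_acc = accuracy(40764, 478)
african_american_acc = accuracy(98074, 1070)

print(f"Gender Studies model: {gender_acc:.1%}")                      # 98.8%
print(f"African American Studies model: {african_american_acc:.1%}")  # 98.9%
```

Accuracy alone does not show what a model has learned to rely on, so a high figure is a starting point for interpretation rather than a conclusion.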
For example, Amanda Bellows's American Slavery and Russian Serfdom in the Post-Emancipation Imagination is a Slavic Studies title that has a very high prediction score for African American Studies (0.999) and a very low score for Slavic (0.0003).Footnote 28 This text is a comparative study with equal focus on cultural responses to emancipation in Russia and the US. With such a balance, we might expect the predictions to be 50% Slavic and 50% African American. However, a prediction score of 0.5 would indicate very high ambiguity. Such a score would show that the text contains nothing very characteristic or distinctive of either discipline. With very few similar works in the corpus, the model has learned that a text with any African American-related subject matter is most likely a work of African American Studies and is highly unlikely to be a work of Slavic Studies.Footnote 29 (Figure 1).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220228150510494-0096:S0037677921000784:S0037677921000784_fig1.png?pub-status=live)
Figure 1: Slavic Studies Texts Incorrectly Classified as Gender Studies or African American Studies
On the scale of the entire field, however, the ambiguity we might expect from crossover texts does emerge over time. As seen in Figure 2, the model becomes less certain of its classifications for both African American Studies and Slavic Studies between 2015 and 2020.Footnote 30 This means overlap with African American Studies has become less of an anomaly in Slavic Studies.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220228150510494-0096:S0037677921000784:S0037677921000784_fig2.png?pub-status=live)
Figure 2: Area of Unusual Ambiguity, 2015–2020
Since 2015, the mean Slavic prediction score for Slavic Studies texts has been declining. This shift occurs at the same time that significantly more texts from Slavic Studies are miscategorized by the models because they bear resemblance to texts in other disciplines. This is true for both Gender Studies and African American Studies. Rather than using the model purely as a quantitative measure of change, we found that the models’ predictions are most useful as an interpretive tool. If the model has identified a significant change in the field, how might we account for that shift? What has it found?
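Tracking these two quantities, the mean Slavic prediction score and the number of miscategorized texts per year, is a simple aggregation once each text has a publication year and a score. The sketch below assumes records of the form `(year, slavic_score)` and treats any score below 0.5 as a miscategorization; both the record shape and the 0.5 cutoff are our assumptions for illustration, and the scores are toy values rather than results.

```python
from collections import defaultdict

def yearly_summary(records, threshold=0.5):
    """Group (year, slavic_score) pairs by year and report the mean score
    and the number of texts scoring below `threshold`."""
    by_year = defaultdict(list)
    for year, score in records:
        by_year[year].append(score)
    summary = {}
    for year in sorted(by_year):
        scores = by_year[year]
        summary[year] = {
            "mean_score": sum(scores) / len(scores),
            "misclassified": sum(1 for s in scores if s < threshold),
        }
    return summary

# Toy scores, invented for illustration only.
toy_records = [
    (2014, 1.0), (2014, 1.0),
    (2015, 1.0), (2015, 0.5),
    (2016, 0.75), (2016, 0.25),
]

for year, stats in yearly_summary(toy_records).items():
    print(year, stats["mean_score"], stats["misclassified"])
```

A falling mean alongside a rising miscategorization count flags a period worth reading closely; it does not by itself explain what changed.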
Among the Slavic Studies texts classified most strongly as Gender Studies, there is a clear connection between language and classification. Nearly all the titles are written in Spanish, such as ¿Se puede hablar hoy de populismo en Rusia? (Can we speak today of populism in Russia?)Footnote 31 Between 1991 and 2020, the Gender Studies sample averages ninety-six titles in Spanish per year. The Slavic Studies sample averages only 2.8 Spanish texts per year, and in many years it has none at all.Footnote 32 In 2001, however, there were thirteen Spanish-language titles in the Slavic sample, and in 2018 there were twenty-five. Rather than recording “similarity” between fields, the model learned to associate Spanish with Gender Studies to such a degree that any Slavic titles published in Spanish are miscategorized.Footnote 33
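One way to surface a confound of this kind is to cross-tabulate predicted labels against publication language. The sketch below assumes each text carries a language tag and a predicted label; the tags, labels, and function name `label_share_by_language` are hypothetical, and the rows are toy data rather than our results.

```python
from collections import Counter

def label_share_by_language(rows):
    """For (language, predicted_label) pairs, return the share of each
    predicted label within each language: {language: {label: share}}."""
    rows = list(rows)  # allow any iterable, since we pass over it twice
    pair_counts = Counter(rows)
    language_totals = Counter(lang for lang, _ in rows)
    shares = {}
    for (lang, label), n in pair_counts.items():
        shares.setdefault(lang, {})[label] = n / language_totals[lang]
    return shares

# Toy rows, invented for illustration only.
toy_rows = [
    ("es", "gender"), ("es", "gender"), ("es", "gender"), ("es", "slavic"),
    ("en", "slavic"), ("en", "slavic"), ("en", "slavic"), ("en", "gender"),
]

shares = label_share_by_language(toy_rows)
print(shares["es"])  # label shares among Spanish-language texts
print(shares["en"])  # label shares among English-language texts
```

If one language is overwhelmingly assigned to one label regardless of subject matter, the classifier is likely keying on language rather than on disciplinary content.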
Perspectival modeling offers a useful method for investigating macroscopic changes over time and for identifying significant features of a collection. Here, a machine learning model has learned, with great accuracy, how to distinguish between works from Slavic Studies and two other disciplines. While the model's internal logic and decision making are not accessible to us, we can nonetheless investigate the model's outputs and interpret its findings. In this case, the model identified a significant change in Slavic Studies scholarship beginning in 2015. The investigation of this shift revealed the significance of language in the model's classifications. Because the machine makes no assumptions about what differentiates scholarly fields, it can arrive at genuine insights that a human researcher might never have noticed.Footnote 34
Asking where robust discussions about race in the field do take place shows our study in yet another light. Students, former students, and non-tenured faculty have written the vast majority of blog posts and articles detailing the structures of racism that constrain who can enter this field and stay in it.Footnote 35 Moreover, these discussions have taken place almost entirely outside of peer-reviewed scholarly journals, despite our field's affinity for journal-based self-reflection. Perceived bodily difference, race, and Blackness have not been “outside concerns” for ASEEES NewsNet or All the Russias, either as subject matter for research in Eurasia or as matters that shape Eurasian Studies in the US. These research areas do not show up as prominent subfields in our analysis of journal data, however. Public-facing platforms must be available to scholars in precarious positions; but, when the resulting essays highlight the same patterns of casual discrimination in research advising, study abroad, and other areas for years, it becomes clear that our field cannot keep doing anti-racism and scholarship in two separate forums. The disparity between the robust apparatuses that students and scholars of color have developed for grappling with race through our field's online spaces and the absence of any similarly explicit critical tools in our topic modeling and perspectival modeling results calls for a significant shift in the most established and prestigious institutions of our field. As leading pedagogues and younger researchers pursue anti-racist work, peer-reviewed journals will have the opportunity to welcome and solicit anti-racist research as they have done this year. This shift will not rectify the direct interpersonal aggressions many people of color in the field report, but it will produce a newly proactive and sustained mechanism for grappling with the field's inequities in writing and for recognizing those who do so.
On the path toward that mechanism, we consider our multi-pronged digital approach to be but a first step: a nimble offering in response to our present moment in a time of crisis. Between this article and our accompanying GitHub repository, our hope is that scholars can replicate our dataset and use it for further research. We implore other scholars to build upon our pilot study, analyzing the publishing history of prominent Slavic Studies journals from a range of angles that may help this field reach more ethical practices and norms.