Leveraging the vast quantities of material published online to address pressing social questions – particularly in a reliable and conceptually valid manner – is a critical frontier for social scientists. Over the last two decades, the Internet Archive, a non-profit organization attempting to archive the Internet, has collected and curated 1.1 billion captures of content posted on the US government's ‘.gov’ Internet domain. This unique archive includes original web content posted on senators' and representatives' official websites (including policy platforms, blogs and daily updates), complete documents (for example, PDF copies of speeches or press releases), congressional transcripts (for example, the documentation of a congressional session) and, importantly, content that has been subsequently amended or removed. Such archives provide largely untapped resources for measuring attributes, behaviors and outcomes relevant to political science research with unprecedented granularity, scalability and attention to shifts over time.
The Internet Archive's .gov collection – which includes captures of content published on US legislators' official websites – offers a range of applications for understanding legislators' attitudes and political behavior. For example, scholars could use Internet archives to capture the relative attention and gravitas a legislator gives to certain policy initiatives over others. Doing so introduces new approaches to quantifying legislators' policy priorities and provides large-scale ways to measure and compare the deliberative strategies legislators employ to shape policy (Schonhardt-Bailey 2008). These data could similarly be used to complement well-established measures of legislators' political ideology (Lewis et al. 2018) with data about how their public displays of conservatism or liberalism shift over time, wane or crystallize at particular political moments – and what social cues they use to signal those ideologies (Diermeier et al. 2012). Internet archives could also help quantify when and how legislators appeal to various constituent identities (Collingwood 2019) or how their rhetorical behavior varies across racial and gender identities (Bicquelet, Weale and Bara 2012). Meanwhile, these data could reveal the emotional tones or dispositions (such as hope or fear) legislators most commonly invoke as part of their public profile.
In this letter, we leverage the Internet Archive's ‘.gov’ collection to measure a new dimension of legislators’ religiosity: their public use of religious rhetoric on official congressional websites. This scalable, time-variant measure, which represents the frequency with which religion infuses a legislator's public profile, aligns with difficult-to-collect conventional approaches to measuring legislators' personal religious practices. It enables researchers to isolate the unique ways in which legislators apply religious concepts to their legislative behavior or to understand how legislators' political objectives shape their public displays of religiosity. In doing so, we demonstrate that the Internet Archive – specifically its 90-terabyte collection of text from the ‘.gov’ domain – introduces unprecedented opportunities to develop nuanced and cost-effective approaches to analyzing political behavior.
Legislators' Religious Attributes
Virtually all current and former members of the US Congress self-identify as religious. For many legislators, their own religious identity, and that of their constituencies, influences their political behavior (Fastnow, Grant and Rudolph 1999; Guth and Kellstedt 2005; Marchetti and O'Connell 2017; Oldmixon 2005; Yamane and Oldmixon 2006). Religious identity and campaign rhetoric can also provide candidates with a vehicle to amass public support (Clifford and Gaskins 2016; Coe and Chapp 2017; Domke and Coe 2008), prime and channel voters' predispositions (Tesler 2015), or help activate partisan voting (Campbell, Green and Layman 2010). Meanwhile, denominational lobbying efforts catalyze religious membership as an efficient vehicle for influencing congressional behavior (Djupe, Olson and Gilbert 2005; Mihut 2011).
However, religious identity is not the only politically informative dimension of a legislator's religious attributes. Identity can mask intra-denominational partisan divides (Guth et al. 2006, 225–26) and does not always reliably influence political behavior (Jones-Correa and Leal 2001). Measuring a legislator's religiosity – the depth of her religious beliefs and/or the extent to which she integrates those beliefs into her political behavior – can therefore usefully complement research on elected officials' religious identity or affiliation.
Unfortunately, quantifying personal religiosity is challenging. Religiosity is often measured according to the frequency with which a person attends religious services or practices religious rituals like prayer or meditation (ANES 2016). However, these expressions are often unobservable and people routinely over-report them (Hadaway and Marler 2005). Guth (2014) advanced efforts to quantify legislators' religiosity by recording the observable religious activities of each of the 435 members of the 112th House of Representatives over a one-year period (2012). While informative, this approach is time intensive and cannot feasibly be scaled over time or across Congresses. Because it is static, it also provides limited insights into how a legislator publicly mobilizes her religious beliefs. We use the Internet Archive to develop a scalable, time-varying approach to measuring a legislator's public-facing religiosity. Doing so surmounts barriers that often prevent analyses of the personal roots that shape legislative behavior (Burden 2007).
Internet Archive Data
Methods used to capture and archive the Internet yield unavoidably messy, unstructured data; the Internet Archive data are no exception. These data should generally be treated as neither complete nor representative. Lists of seed uniform resource locators (URLs) – the starting points for any web crawl – are not randomly generated, they change from one crawl period to another, and they are finite (while the Internet's expanse is arguably infinite). However, we identify three reasons why our particular Internet-archived data meet the high standard required for social science analysis. First, the Internet Archive uses the best available methods for curating the virtually limitless Internet. Secondly, web-crawling technology has improved dramatically over time. For example, Internet Archive software crawled the US White House domain three times in 1997, at least once a week in 2008, and more than once a day by 2012 (Appendix Figure 1).
Thirdly, the Library of Congress contracted with the Internet Archive to achieve crawls that neared completion across the US government domain during the last three months of each election year (2004–2012). To do so, the initiative identified as many ‘.gov’ seed URLs as possible and activated complete crawls through all layers of subsequently linked URLs. This dramatically increased the amount of government material captured and archived, making the election-year government collection among the most comprehensive in the Internet Archive. We limit our analysis to election years between 2006 and 2012 and aggregate our data to the year level. Doing so allows us to proceed as if these data represent a nearly complete universe of congressional website material.
Measuring Public-Facing Religiosity
The Internet Archive's government collection was hosted on a Hadoop distributed computing cluster (Gade, Wilkerson and Washington 2017). We created a unique regular expression to identify the URL root for each member of the 109th–112th Congresses (for example, murray.senate.gov for Senator Patty Murray, D-WA). We then scraped the collection for all website captures within that domain. Extracted content may include floor speech transcripts, opinion pieces, constituent newsletters, policy platforms, legislative priorities and any other material legislators publish on their websites.
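The per-member URL matching described above can be illustrated with a minimal sketch. The function name `member_domain_pattern`, the exact pattern and the sample URLs are assumptions made for illustration; the letter does not report the regular expressions actually used.

```python
import re


def member_domain_pattern(name_slug: str, chamber: str) -> re.Pattern:
    """Compile a regex matching URLs rooted at <name_slug>.<chamber>.gov.

    Hypothetical helper: the authors' actual expressions are not given.
    """
    host = re.escape(f"{name_slug}.{chamber}.gov")
    # Optional scheme and "www.", then the exact host, which must be followed
    # by a path, port, query, fragment or the end of the string. The lookahead
    # rejects look-alike hosts such as murray.senate.gov.example.com.
    return re.compile(rf"^(?:https?://)?(?:www\.)?{host}(?=[/:?#]|$)", re.IGNORECASE)


murray = member_domain_pattern("murray", "senate")

# Illustrative URLs only; these are not drawn from the archive.
sample_urls = [
    "http://murray.senate.gov/public/index.cfm",
    "https://www.murray.senate.gov",
    "http://notmurray.senate.gov/",
    "http://murray.senate.gov.example.com/",
]
murray_urls = [url for url in sample_urls if murray.search(url)]
```

Anchoring on the host rather than searching for the name anywhere in the URL keeps pages that merely mention a member (for example, in a committee site's path) out of that member's corpus.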
Next, we calculated the yearly proportion of religious terms (relative to the total yearly word count) that appeared in the text of each legislator's domain. We based these counts on two independent lists of religious words: the Linguistic Inquiry and Word Count (LIWC)'s designated religious terms and the religious terms used by 2012 US presidential candidates in their campaign stump speeches (Chapp 2012). We removed potential confounders from these lists (for example, ‘minister’ could reference ‘prime minister’). To avoid duplication, we limit our analyses to data from legislators' original website captures and content added since the most recent previous capture. Our unit of analysis (legislator-year) mirrors those typically applied to members of Congress and improves upon measures of religiosity that do not change over time.
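As a rough illustration of the legislator-year measure, the sketch below computes the share of religious terms among all words while counting repeated content only once. The term list is a hypothetical placeholder (not the LIWC or Chapp lists), exact-token matching stands in for whatever matching rules the word lists use, and paragraph-level de-duplication is an assumed simplification of the capture-differencing step described above.

```python
import re
from collections import defaultdict

# Hypothetical placeholder terms; NOT the LIWC or Chapp (2012) lists.
RELIGIOUS_TERMS = {"god", "faith", "prayer", "church", "bless"}

TOKEN = re.compile(r"[a-z']+")


def yearly_religious_share(captures, terms=RELIGIOUS_TERMS):
    """Return {(legislator, year): religious words / total words}.

    `captures` is an iterable of (legislator, year, text) tuples, assumed to
    be in chronological order. A paragraph already seen for a legislator is
    skipped, so only content new since earlier captures is counted.
    """
    seen = defaultdict(set)    # legislator -> paragraphs already counted
    hits = defaultdict(int)    # (legislator, year) -> religious-term count
    totals = defaultdict(int)  # (legislator, year) -> total word count
    for legislator, year, text in captures:
        for paragraph in text.split("\n"):
            paragraph = paragraph.strip()
            if not paragraph or paragraph in seen[legislator]:
                continue
            seen[legislator].add(paragraph)
            tokens = TOKEN.findall(paragraph.lower())
            totals[(legislator, year)] += len(tokens)
            hits[(legislator, year)] += sum(token in terms for token in tokens)
    return {key: hits[key] / totals[key] for key in totals if totals[key]}


# Two invented captures of the same site; the first paragraph repeats.
example = [
    ("murray", 2008, "We offer a prayer for the families.\nJobs bill passed."),
    ("murray", 2008, "We offer a prayer for the families.\nNew faith forum announced."),
]
shares = yearly_religious_share(example)
```

In this invented example the repeated paragraph is counted once, so the 2008 share is 2 religious terms out of 14 words.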
The resulting behavioral measure represents a legislator's religious rhetoric. This rhetoric may include religious holiday messages, offers of prayers amid turmoil, announcements of collaborative relationships with religious leaders, biographical information or religious justifications for policy initiatives. Distinct from measuring legislators' private behavior or the depth of their underlying beliefs, this measure represents the level to which a given legislator integrates and models religious concepts (broadly defined) as relevant to her political objectives and public profile. Scholars could fine-tune these data and methods to pinpoint legislators' specific religious identities, traditions or affiliations, or to identify the particular policies to which they apply religious justifications.
Evaluating Religiosity Measure
We conduct two analyses to assess the validity of our resulting measure of public-facing religiosity. Our first and main analysis uses Guth's (2014) manual scoring of House of Representatives members' religious activities in 2012 to estimate the relationship between representatives' personal religious practices and public religious rhetoric. If measured appropriately, we expect these two dimensions of religiosity to be associated with one another. Indeed, linear regression analyses demonstrate a strong, positive relationship between a representative's religious activity and rhetoric (Table 1). This relationship is stable and statistically reliable (p < 0.01). It holds across modeling approaches, including models with distinct outcome variables based on different lists of religious words and among models that include relevant control variables.
Table 1. House of Representatives: comparing measures of religious rhetoric to religious practice
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210912153340023-0165:S0007123420000290:S0007123420000290_tab1.png?pub-status=live)
*p < 0.1; **p < 0.05; ***p < 0.01
This is the most important means of validating our measure of public-facing religiosity. It demonstrates the internal validity of the measure and assuages concerns that certain legislators may invoke religiosity indiscriminately or without a grounding in personal practice or conviction. We can conclude that, at least to some extent, legislators' political invocation of religious concepts mirrors their personal religious practices. Furthermore, this analysis demonstrates the meaningfulness of archived Internet data and supports the premise that using these data can provide conceptually valid and reliable measures that have not been previously available to political scientists.
Table 1 demonstrates that individual-level evaluations of personal religious practice among US House members hold a strong, positive statistical association with those members’ use of religious rhetoric on their congressional websites. We next consider the US Senate. Since there is no equivalent member survey of religiosity among senators, we ask whether their religious rhetoric usage correlates with individual- and constituency-level demographic variables. Christian conservatism, public evangelism and Republicanism are increasingly intertwined (Dowland 2015). We therefore expect a positive association between a senator's ideological conservatism (indicated by voting patterns) and her usage of religious rhetoric. We also expect that senators who represent states with more ‘very religious’ constituents will employ religious rhetoric more frequently. Finally, prior research suggests that Christian male candidates disproportionately benefit from using religious rhetoric on the campaign trail. We therefore expect that male senators, once elected, will disproportionately use religious rhetoric.
This analysis allows us to evaluate and demonstrate our measure's usefulness in various congressional contexts and to assess the generalizability of this approach to measuring legislator attributes. Because states are larger and more heterogeneous than House congressional districts, this Senate-based approach to measuring the effects of constituent demographics on members' religious rhetoric also represents a ‘tougher test’ of our measure, relative to the House of Representatives.
We estimate these relationships using three separate model specifications: beta regression, panel regression and ordinary least squares regression. This provides the most transparent and robust assessment of the factors that might affect senators' use of religious rhetoric. Because our dependent variable is a proportion, beta regression is the most conventional and appropriate approach. However, because beta models do not allow for fixed effects, we include a panel model with fixed effects for year and senator (which requires removing fixed senator-level variables like gender or race). Finally, due to concerns about the bias–variance trade-off in statistical modeling (Shalizi 2013), we include an ordinary least squares (OLS) model. Given the structure of our variables, this OLS approach should provide the toughest test of our measure.
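For readers less familiar with beta regression, one common way to write such a model is sketched below in the mean–precision parameterization. The logit link, the covariate shorthand (Conservatism, VeryReligious, Male) and the control vector $\mathbf{z}_{it}$ are assumptions made for illustration; the exact specification is the one reported in Table 2.

```latex
\begin{aligned}
y_{it} &\sim \mathrm{Beta}\bigl(\mu_{it}\phi,\ (1-\mu_{it})\phi\bigr), \qquad 0 < y_{it} < 1,\\
\operatorname{logit}(\mu_{it}) &= \beta_0 + \beta_1\,\mathrm{Conservatism}_{it}
  + \beta_2\,\mathrm{VeryReligious}_{it} + \beta_3\,\mathrm{Male}_{i}
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_{it}
\end{aligned}
```

Here $y_{it}$ is senator $i$'s proportion of religious terms in year $t$, $\mu_{it}$ is its conditional mean and $\phi$ is a precision parameter, so that $\mathrm{E}[y_{it}] = \mu_{it}$ and $\mathrm{Var}[y_{it}] = \mu_{it}(1-\mu_{it})/(1+\phi)$. Because the variance depends on the mean, the model accommodates the bounded, skewed distribution typical of small proportions.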
We find support for each of the above hypotheses. Ideological conservatism has a sizable, positive relationship with our measure of religious rhetoric (Table 2). Controlling for other relevant factors, senators who represent states with more ‘very religious’ constituents are also more likely to use religious rhetoric. This is especially noteworthy given that Senate constituencies tend to be more heterogeneous than House constituencies. Male senators tend to use religious rhetoric more frequently than their female colleagues, a finding that aligns with prior research (Calfano and Djupe 2011).
Table 2. Senate: relationships between senator and state characteristics and religious rhetoric
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210912153340023-0165:S0007123420000290:S0007123420000290_tab2.png?pub-status=live)
*p < 0.1; **p < 0.05; ***p < 0.01
Thus we conclude that the internet-based measure of public-facing religiosity we introduce here correlates with individual-level reports of religious activity among lawmakers in the US House of Representatives, and with member and constituency demographic differences among lawmakers in the US Senate. The first result provides confirmation that our measure correlates with underlying personal religious behavior. The second is less precise in the sense that positive correlations between senators' religious rhetoric usage and their general voting behavior, constituency characteristics or gender may be less indicative of their personal religious convictions. However, this second analysis provides additional evidence that this Internet-Archive measure of members’ attributes is conceptually valid across congressional chambers and constituencies. Thus our measure of public-facing religiosity has ‘face validity’ when applied to both chambers of Congress. Future research may try to disentangle the personal and political motivations driving religious rhetoric usage among lawmakers.
There are important differences between our main (Table 1) and supplementary (Table 2) analyses. Our main analysis was intended to demonstrate associations between two forms of religious behavior (religious rhetoric and religious practice); these associations likely reflect underlying religious convictions that determine various forms of religious behavior. Given these conceptual similarities, this analysis should yield distinctly observable, positive relationships between religious practice and rhetoric. However, we would expect associations between demographics and religious behavior (Table 2) – which capture the relationship between identity and belief-based religious rhetoric – to be informative but far more subtle (see Table 2's lower R-squared values). Together, these analyses support the credibility of our measure of legislators' public-facing religiosity (religious rhetoric) as a meaningful dimension of legislators' religiosity.
Researchers using Internet archives to evaluate rhetorical associations must account for the unavoidable noise, structural similarities, idiosyncrasies and linguistic false positives embedded within text-based big data. We have modeled an approach to doing so in this letter, and demonstrated one approach to using these data to derive improved measures of important political science concepts.
Contributions
This research makes three applied methodological contributions to social science research. First and most importantly, it marks a step forward as political scientists seek to harness the rapid expansion of Internet data to answer previously unapproachable political questions. In doing so, we model an approach to leveraging the tremendous breadth of Internet data to advance research in the digital age.
Secondly, we develop a meaningful, scalable and time-varying approach to measuring legislator religiosity. Informative in its own right, this measure can also help scholars identify religion's political outcomes; understand how politics reshape religious orientations, affiliations and public behavior (Campbell et al. 2018; Djupe, Neiheisel, and Sokhey 2018; Margolis 2018); and distinguish between political and religious ideologies. As American partisan politics have polarized (Layman 1999), religion's political identities and outcomes are increasingly mediated through partisanship and political ideology (Marietta 2009; Newman et al. 2016, 294; Norris and Inglehart 2011, 211; Oldmixon and Hudson 2008; Yamane and Oldmixon 2006).
However, conflating conservatism with religiosity fails to sufficiently capture crucial distinctions between the two, particularly among issues over which they may conflict. As a result, this measure provides scholars with new tools to evaluate – and perhaps challenge – assumptions that identity-related variables (like political ideology or religious affiliation) function as sufficient proxies for religiosity (here, as with past studies, understood as public-facing religiosity). Congress scholars will likely gain valuable new insights if they adopt and incorporate this public-facing dimension of religiosity into their future research on congressional politics. Most importantly, these analyses provide strong evidence that archived congressional website text can be used to measure and analyze scalable, time-variant legislator attributes, like religiosity.
Finally, this analysis improves upon existing approaches to analyzing legislators' attributes and constituent communications. Congress scholars have gained valuable political insights from analyzing disaggregated slices of legislators' public profiles, like their press releases, newsletters or floor speeches (Maltzman and Sigelman 1996; Osborn and Mendez 2010). However, these approaches do not comprehensively capture legislators' multifaceted communication platforms. Congressional websites represent and aggregate these and other public engagement materials. Analyzing this aggregated material as a whole offers a more comprehensive analysis of legislators' public engagement (Esterling, Lazer, and Neblo 2010). We invite scholars to adopt our presented methods of gathering and measuring data to analyze other legislator attributes, policy priorities or communication strategies.
Supplementary materials
Data replication sets are available in Harvard Dataverse at: https://doi.org/10.7910/DVN/GNL6XG, and online appendices at https://doi.org/10.1017/S0007123420000290.
Acknowledgements
We would like to thank the eScience Data Incubator program at the University of Washington (2015), the Archives Unleashed Workshop Series and audiences and discussants at the American Political Science Association Annual Meeting (2015), the Text-as-Data Conference (2017) and the Political Methodology Annual Meeting (2018) for their intellectual and methodological contributions to projects that laid the groundwork for this research note. We also thank Altiscale and Start Smart Labs (in particular Ellen R. Salisbury and Raymie Stata) for making this research possible by hosting the .GOV collection on a public cluster and making it available to researchers free of charge. The original effort to use the Internet Archive for political science research was supported by NSF award number #1243917 - PI-NET: Poli-Informatics Research Coordination Network. Any mistakes are our own.