Although women and racial and ethnic minorities remain underrepresented throughout the United States, the racial, ethnic, and gender diversity of candidates in state and federal elections has never been greater. Fifteen years ago, before the election of the country’s first Black president, many social scientists and most pundits would have thought today’s more diverse political reality was unlikely. As evidence, they could point to the stunning amount of racial resentment held by white voters, including Democrats (Kinder and Sanders 1996; Krupnikov and Piston 2015). They could highlight the historical rarity of nonwhite and women officeholders at the local, state, and federal levels (Clark 2019; Lublin 1997). In particular, they would note that even when racial and ethnic minority individuals held office, it usually was in heavily gerrymandered and geographically segregated majority-minority districts (Lublin 1997), resulting in few opportunities for candidates of color to win in majority-white districts. Given all of this evidence, and in addition to the Shelby County v. Holder (2013) decision gutting the 1965 Voting Rights Act, scholars and pundits had every reason to consider Obama’s 2008 victory as an outlier (Kinder and Dale-Riddle 2012), a lucky break (Lewis-Beck, Tien, and Nadeau 2010), and a precursor to an even greater white-voter backlash against minority candidates (Hajnal 2006).
Around that same time, researchers realized that much of the work on elections was hampered by a difficult data problem. Although scholars of race, ethnicity, and gender representation in the United States had some demographic information about officeholders, we knew little about candidates who lost. Before the social media revolution of the late 2000s, collecting biographical information about candidates required surveys (Broockman et al. 2013; Maestas et al. 2006), interest-group publications (e.g., National Association of Latino Elected and Appointed Officials and Joint Center for Political and Economic Studies), or a narrower focus on fewer congressional races, each of which involved tradeoffs of coverage or bias.
Largely due to these difficulties, a large-scale, over-time state legislative dataset of candidate race, ethnicity, and gender characteristics does not exist. In an age when many details of candidates (including “major in college” and “current car”) are available on websites such as Project Vote Smart and Ballotpedia, none of these sites provides variables about candidate race, ethnicity, and gender. Although Ballotpedia provides some photographs from its candidate surveys, coding every candidate in every cycle and then matching them with district information is resource intensive. We wanted to have consistent, valid, and publicly available data about the thousands of candidates who run for state government so we could answer questions about elections and representation in the United States, but the data did not exist. In 2012, we embarked on a project that brought together Klarner’s state legislative candidate lists (Klarner 2018b; Klarner et al. 2013), interest-group publications, and online sources such as Ballotpedia and Facebook to code the race, ethnicity, and gender of state legislative candidates. The evolution of social media, online campaigns, and journalism in the past 15 years has made biographical information about election also-rans easier to collect systematically. As a team, we were able to code the candidates from 15 states between 2012 and 2016, but the task was cumbersome and limited.
To expand on these efforts, we created the Candidate Characteristics Cooperative (C3), a hand-coded database of primary- and general-election candidates for state legislative elections held in 2018. We identified 19 methodologically diverse contributors across the country; in return for coding a single state, they were offered access to the complete dataset during an embargo period of 12 months. We provided a list of the primary-election candidates and relevant electoral data and asked contributors to code the race, ethnicity, and gender of the candidates using a rubric that we had developed. Contributors submitted their completed state files to us and we compiled these data into a single, uniform file.
The result of this pilot project was a hand-coded database of all state legislative primary- and general-election candidates from 2018. The database covers approximately 14,000 unique major- and minor-party candidates; using the techniques described previously, contributors were able to identify the race and ethnicity of 94% of them. Coding was highly consistent across contributors: 26% of candidates were coded by more than one contributor and, 96% of the time, contributors produced the same race and ethnicity coding despite not coordinating efforts beyond receiving the provided rubric. Given that 22 different researchers (i.e., team leaders, graduate students, and undergraduates) hand-coded candidates, this degree of correspondence indicates that the hand-coding method produces consistent, replicable results. The C3 dataset has similarly complete coding of candidate gender as well as supplemental information on the ancestry and national origin, occupation, and religion of many candidates (Shah, Juenke, and Fraga 2022).
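As an illustration of how overlap and agreement figures like these can be computed, the following is a minimal sketch in Python. It assumes a hypothetical record layout of (candidate, coder, label) tuples and uses made-up records; it is not the C3 file format or the procedure the project used, and simple percent agreement is only one of several possible consistency measures.

```python
from collections import defaultdict


def coding_agreement(records):
    """Return (share of candidates coded by 2+ coders,
    share of those candidates whose coders all gave the same label).

    `records` is an iterable of (candidate_id, coder_id, label) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    labels = defaultdict(dict)
    for candidate_id, coder_id, label in records:
        labels[candidate_id][coder_id] = label  # one label per coder per candidate

    if not labels:
        raise ValueError("no records supplied")

    # Keep only candidates coded by more than one contributor,
    # reduced to the set of distinct labels they received.
    overlap = [set(by_coder.values())
               for by_coder in labels.values() if len(by_coder) > 1]
    share_overlap = len(overlap) / len(labels)

    # Agreement is undefined when no candidate was coded more than once.
    share_agree = (sum(len(s) == 1 for s in overlap) / len(overlap)
                   if overlap else None)
    return share_overlap, share_agree


# Made-up illustrative records (not C3 data).
example = [
    ("cand-1", "coder-A", "Latina/o"),
    ("cand-1", "coder-B", "Latina/o"),
    ("cand-2", "coder-A", "White"),
    ("cand-2", "coder-C", "Black"),
    ("cand-3", "coder-B", "Black"),
    ("cand-4", "coder-C", "White"),
]
share_overlap, share_agree = coding_agreement(example)
print(f"coded by more than one contributor: {share_overlap:.0%}")
print(f"identical coding among those: {share_agree:.0%}")
```

Percent agreement is easy to interpret but does not adjust for agreement expected by chance; a chance-corrected statistic could be substituted without changing the overall structure.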
During the past eight years, we have learned much about elections involving racial and ethnic minority and women candidates. First, contrary to the reasonable expectations of many race scholars, we found that Black and Latina/o candidates did well when they were on state legislative ballots (Juenke 2014; Shah 2014; Juenke and Shah 2016). However, we also discovered that candidates of color were rarely found in elections in majority-white districts. By adding new data (that is, election losers) to descriptive representation models, we discovered that the empirical focus of these models needed to change, from voters choosing officeholders to candidates structuring voter choices. The results of this shift mirrored the contemporaneous findings in the gender literature, generally demonstrating that when women run, they can win, despite facing high levels of hostile sexism from many voters and donors (Barnes, Branton, and Cassese 2017; Cassese and Holman 2017; Crowder-Meyer and Cooperman 2018; Sanbonmatsu 2006).
Second, we discovered that one of the main reasons minority officeholders were underrepresented across the United States is that minority candidates were not appearing on ballots. What had for decades been understood as a voter-“demand” problem had turned into a candidate-“supply” problem. We demonstrated that white people, even in racially conservative white districts, vote for nonwhite candidates if the candidates belong to their political party and signal that they will represent their political team. Ted Cruz, Mia Love, Marco Rubio, and Tim Scott are a few recent examples in very conservative states.
Most recently, we have answered two interesting and important questions with the data. Building on our “supply-side” theory, we first asked: “Has the number of candidates of color and women who win elections increased over time?” As noted previously, racial and ethnic minorities and women are underrepresented at virtually all levels of government. However, 2018 witnessed an increase in women candidates (Dittmar 2018), candidates of color (Schneider 2018), and women candidates of color (Bejarano and Smooth 2018). Most ran as Democrats, which coincided with the expected Democratic “wave” in 2018 (Klarner 2018a) and demonstrated that disparities in candidate partisanship drive aggregate increases and decreases in gender and minority representation. Consequently, we hypothesized that supply-side factors would drive aggregate increases in officeholding for women and candidates of color rather than gender, racial, and ethnic demand-side voter factors, which would indicate changing preferences for minority and women’s representation. To examine changes in candidacy and officeholding over time (Fraga, Shah, and Juenke 2020), we tested this hypothesis by comparing the emergence and success of women and candidates of color in states in 2018 to the same states that we coded in previous years (i.e., 2012, 2014, and 2016). We found that 2018 very well may have marked a turning point in women and minority representation in the United States, not because women and candidates of color were more likely to win their elections but rather because more of them ran for office. About 30% more women won state legislative office in 2018 than in previous years, and the number of women who lost compared to previous years almost doubled. These numbers are roughly similar for candidates of color and women of color in 2018, which provides early indications of a historic change in who will hold elected office in the future (Fraga, Shah, and Juenke 2020).
The second question we asked using these data was: “Are the effects of representation transitive?” The shift in focus to racial and ethnic minority candidate emergence and supply produces new opportunities for research addressing minority representation. Chief among these is the possibility that minority candidates may be discouraged from seeking office due to a perceived inability to win or that their likelihood of winning might be affected by up-ballot or down-ballot representation, an insight drawn from the literature on gender and politics. Parties exert substantial control over who seeks office in legislative elections (Brown 2014; Hassell 2016), and scholars have determined that gender underrepresentation may be a function of partisan recruitment and gatekeeping (Crowder-Meyer 2013; Fox and Lawless 2005; Karpowitz, Monson, and Preece 2017; Lawless 2011). Party elites appear to discourage women from running due to a perception that they are less likely to win and less qualified as candidates (Niven 2006; Sanbonmatsu 2002), a perception that may change with the success of women candidates (Doherty, Dowling, and Miller 2019; MacManus 1981; Sanbonmatsu 2006). If elites believe that minority candidates are less likely to win in heavily white districts, minority candidates may be similarly discouraged from seeking office in these areas. Using our data from 2012 and 2014, we found evidence that the presence of minority higher-level officeholders positively affects the chances of minority down-ballot success (Fraga, Juenke, and Shah 2020). Leveraging information about the overlap between congressional and state legislative districts, we demonstrate that the victories of candidates of color for Congress reduce the co-ethnic and racial demographic thresholds associated with state legislative candidacy. This suggests that perceptions of minority-candidate viability play a key role in structuring contemporary disparities in who runs for office.
In summary, scholars have considered the different ways in which women and minority candidates’ paths to office, campaign strategies, and representational styles may differ from those of their white and/or male counterparts but only infrequently have been able to engage in large-scale systematic study of these phenomena using data from real-world elections. The C3 database allows scholars to answer vital questions about diversity, inclusion, and representation at a time when more women and candidates of color are running for office than ever before. The 2018 data are now available to all scholars and the public (Fraga, Juenke, and Shah 2021). In creating a multiyear database, we want to provide a valuable resource to scholars interested in taking a deeper look at the characteristics of thousands of state legislative candidates and officeholders, the forerunners of change in American politics, and build a foundation on which individual researchers can add their own data. In doing so, we hope to promote collaborative data collection and support research in race, ethnic, and gender politics more broadly.
Data Availability Statement
Research documentation and data that support the findings of this study are openly available at the PS: Political Science & Politics Dataverse at https://doi.org/10.7910/DVN/VHAPHV.