Although election forecasting has provided an excellent proving ground for theories of voting behavior (Lewis-Beck and Stegmaier 2014; Linzer 2014; Sides 2014), some political scientists contend that forecasting is now at a crossroads. It is ironic that in the age of Big Data, in which many extol the power of web-generated data and imbue it with the ability to solve some of humanity’s vexing problems (Lohr 2012; see Footnote 1), it is a “lack of data” that limits the development of more accurate electoral-forecasting models (Linzer 2014).
Based on my experience with online media, I wondered whether the treasure trove of information produced online might provide a solution to the data problem that confronts forecasting. Footnote 2 In 2012, I hypothesized that Facebook metrics might be paired with electoral fundamentals in a simple model to predict the outcomes of individual congressional races and produce continually updated forecasts. Specifically, I asked: “Can candidate-page fan and engagement statistics tracked by Facebook be used to forecast congressional-campaign results?”
The initial answer to this question is: quite possibly. The performance of the Facebook Model in forecasting the vote in seven hotly contested campaigns for US Senate in 2012 indicates that readily available and transparent Facebook metrics—paired in a model with fundamental electoral benchmarks similar to those used in national-election forecasting—may provide an accurate new tool for predicting the results of individual congressional contests.
The story of the Facebook Model’s performance begins, as it should, with the proven election theories in which it is rooted and the one additional theoretical insight that makes it a promising hybrid.
THEORY
Lewis-Beck and Stegmaier (2014) advance five fundamental theories of voting behavior that have been positively tested by forecasting. Four of the theories form the backbone of the Facebook Model. The model begins with the premises that voters are retrospective (Fiorina 1978; Lewis-Beck and Stegmaier 2000, 2014; Sides 2014); incumbency matters (Campbell 2014); and although “campaigns [can] influence…electoral outcome[s],” the partisan preferences of voters are not “easily swayed” (Lewis-Beck and Stegmaier 2014). The Facebook Model adds to the latter two premises the following simple theoretical insight:
Although voters’ partisan preferences are not easily swayed, the willingness of partisans and fence sitters to publicly commit to and engage with a candidate can be influenced by the candidate’s campaign. Furthermore, the more supporters that a campaign enlists and engages in political action before Election Day, the more likely that campaign is to win.
Two basic Facebook metrics can be used to quantify a campaign’s success in enlisting and engaging Facebook users. Candidate “likes” track the number of Facebook users who enlist in a campaign by becoming fans of a candidate’s page. Footnote 3 Facebook’s “people talking about this” (PTAT) statistic Footnote 4 measures engagement by counting interactions between users and a candidate’s page. I argue that these Facebook metrics provide a real-time measure of a campaign’s effectiveness in enlisting and engaging supporters. When these data are included in a model with the electoral fundamentals used in standard forecasting, the outcomes of individual Senate races can be predicted. In 2012, my team used this model to predict outcomes in seven US Senate races starting eight weeks before Election Day (MacWilliams and Erikson 2012).
FACEBOOK’S RELEVANCE TO FORECASTING
Why might Facebook data be a useful tool for estimating campaign effectiveness? Since the inception of social media use by candidates in 2006, research has found that political activity on Facebook mirrors offline political action.
First, in terms of enlistment, Facebook fans are described as “a proxy for the underlying enthusiasm and intensity of support a candidate generates” (Williams and Gulati 2007). A significant correlation between online fans and offline vote share was documented even when controlling for campaign expenditures, press coverage, and organizing (Williams and Gulati 2008).
In terms of engagement, scholars have found a significant relationship between online and offline participation in which greater Facebook political activity is correlated with increased political action offline (Park, Kee, and Valenzuela 2009; Vesnic-Alujevic 2012) and is a “significant predictor of other forms of political participation” (Vitak et al. 2011). Political engagement on Facebook leads to “mobilizing political participation” offline (Feezell, Conroy, and Guerrero 2009).
The mobilizing effect of Facebook messages distributed peer-to-peer or en masse is also potent. A randomized test conducted in 2010 (N = 61 million) of third-party, get-out-the-vote Facebook messages found that they “directly influenced the voting behaviors of millions of Americans” (Bond et al. 2012).
Second, the Facebook data used in my model are standardized measurements that are readily accessible and regularly tracked. These data avoid many of the limits and methodological challenges found in many Big Data datasets, including Twitter (Boyd and Crawford 2011). The “right now” availability of Facebook data and resulting lack of historical record (Bollier and Firestone 2010), however, remain a challenge that can be surmounted only by capturing data weekly, as our team did during the closing weeks of the 2012 election and has continued to do since September 2013.
Third, Facebook is ubiquitous. In 2013, Pew Research reported that “Facebook is popular across a diverse mix of demographic groups” (Duggan and Smith 2014). Of those Americans who are online, 71% are on Facebook, 63% of whom check Facebook at least once a day. Moreover, 45% of Internet users 65 and older now use Facebook. This represents a 28-percentage-point growth in seniors’ use of Facebook in only one year (Project 2013). Facebook is no longer simply a social medium; it has become a social utility that campaigns are using to reach, activate, and mobilize voters. Facebook users comprise neither a random nor a perfectly selected sample of the American electorate, and they are not conceptualized as such in the model. Instead, the relative effectiveness of campaigns in enlisting, engaging, and mobilizing Facebook users is theorized as a proxy for estimating the effectiveness of a campaign to generate support, activism, and votes among voters, much as Americans’ views of the economy in presidential forecasting models are used as a tool for estimating retrospective voting.
THE MODEL
Following the best practices of forecasting, the Facebook Model is steeped in theory, parsimony, and transparency. It is founded on the assumption that past election results and incumbency are fundamentals that play an important role in shaping electoral outcomes (Brody and Sigelman 1983; Campbell 2009; Campbell and Garand 2000; Lewis-Beck and Rice 1992; Rosenstone 1983). The model adds to this foundation a participation variable (quantified through social media statistics generated by Facebook) that theoretically captures the effectiveness of each campaign’s efforts to enlist and engage voters, as well as their potential to mobilize voters on Election Day. The Facebook Model is specified as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042601309-0193:S1049096515000797:S1049096515000797_equ1.gif?pub-status=live)
In the model, Senate vote is the forecasted percentage of the two-party vote won by either major-party candidate. It is a function of three terms: the partisan vote index (PVI), which measures past election results; incumbency; and the estimated candidate-participation advantage generated from Facebook metrics.
PVI is estimated regularly by the Cook Political Report and has been used to undergird other election forecasts (Cook 2012). For example, Campbell’s 2012 House Seats-in-Trouble forecasting model used the Cook Political Report’s race-by-race analysis, which is predicated in large part on PVI (Campbell 2012b).
PVI averages the electoral performance of many candidates in a state or district over time to calculate existing partisan advantage. In this way, it captures the increasing polarization that presents statistical challenges to presidential models (Campbell 2014) but negates the fundamental advantages enjoyed by some Congressional incumbents.
The second fundamental variable, incumbency, is added to the Facebook Model to correct this PVI shortcoming. In presidential forecasting models, incumbency often is captured by a dichotomous variable. Whether a simple binary term adequately quantifies presidential incumbency is a contested question (Campbell 2014). Conceptualizing incumbency in a similar manner for individual Senate races is even more problematic, given the obvious electoral variations among Senate incumbents. Thus, in the Facebook Model, incumbency advantage or disadvantage is determined by calculating how an incumbent performed, compared to the reported PVI, in the previous election. An incumbent Senator who won by five more percentage points in 2006 than predicted by the PVI would enjoy a five-percentage-point incumbent advantage in the 2012 model, provided that the PVI had remained constant in the intervening years. Footnote 5
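The incumbency calculation described above reduces to a single subtraction. The following minimal sketch assumes that both the incumbent's previous result and the PVI expectation are expressed as percentages of the two-party vote; the function name and the numbers are hypothetical and are not taken from the 2012 data.

```python
def incumbency_advantage(incumbent_share: float, pvi_expected_share: float) -> float:
    """Percentage points by which an incumbent beat (positive) or trailed
    (negative) the share implied by the PVI in the previous election."""
    return incumbent_share - pvi_expected_share


# Hypothetical example: the PVI implied 52% of the two-party vote,
# and the incumbent actually won 57% in the previous election.
advantage = incumbency_advantage(57.0, 52.0)
print(advantage)  # 5.0
```

A negative value would represent an incumbency disadvantage. The sketch does not handle the case in which the PVI shifted between elections.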
The third variable, which enables the model to produce a forecast, trend data, and nowcast (Lewis-Beck and Stegmaier 2014), is participation. The participation-advantage variable is theorized as a real-time measurement of the effectiveness of each campaign in enlisting and engaging Facebook users as well as its potential to mobilize the vote on Election Day. These three measurements (enlistment, engagement, and mobilization potential) are designed to capture and quantify the Facebook effects identified and studied by scholars since the 2006 elections (Bond et al. 2012; Feezell et al. 2009; Vesnic-Alujevic 2012; Vitak et al. 2011; Williams and Gulati 2007, 2008, 2009a, 2009b; Zhang, Johnson, Seltzer, and Bichard 2010).
The first component of participation advantage is Facebook “likes.” Likes are a measure of Facebook users’ decisions about a candidate before Election Day. The growth of a candidate’s likes or fan base (i.e., Enlist Growth) over time tracks the effectiveness of a campaign in enlisting support among Facebook users.
The second component of participation advantage is Facebook’s PTAT statistic. PTAT measures active engagement with candidates, beyond mere support, in real time. Facebook users who engage with candidates online are politically mobilized. Although engagement with a candidate ebbs and flows depending on campaign events, success in building this politically mobilized group of activists over time (i.e., Engage Growth) is another component of campaign effectiveness.
The third component of participation advantage is the potential of a campaign to mobilize voters at a particular time (i.e., Mobilization Potential). This is conceptualized as the number of engaged PTATs divided by the campaign’s current fan base.
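The three components can be illustrated with a short sketch. Mobilization Potential follows the definition in the text (PTAT divided by the current fan base). The growth definition is an assumption: the text does not state how growth is computed, so proportional week-over-week change is used here. All function names and figures are hypothetical.

```python
def growth(previous: float, current: float) -> float:
    """Proportional week-over-week change (an assumed definition of growth)."""
    if previous <= 0:
        raise ValueError("previous value must be positive")
    return (current - previous) / previous


def mobilization_potential(ptat: float, fans: float) -> float:
    """PTAT divided by the current fan base, as described in the text."""
    if fans <= 0:
        raise ValueError("fan base must be positive")
    return ptat / fans


# Hypothetical weekly snapshots for one candidate's Facebook page.
last_week = {"likes": 40_000, "ptat": 5_000}
this_week = {"likes": 44_000, "ptat": 5_500}

enlist_growth = growth(last_week["likes"], this_week["likes"])   # growth in fans
engage_growth = growth(last_week["ptat"], this_week["ptat"])     # growth in PTAT
mobilization = mobilization_potential(this_week["ptat"], this_week["likes"])
```

How the three values are combined into a single participation score is given by the model's own equation and is not reproduced in this sketch.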
In the Facebook Model, these three measures of campaign effectiveness are computed and combined weekly to produce the model’s dynamic participation-advantage variable (PA). How was this accomplished in 2012?
During each of the last nine weeks of the campaign, Facebook data for Senate candidates were collected and factored into the following equation to produce candidate participation scores (PS). For clarity, we use an example of a candidate named Hertz:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042601309-0193:S1049096515000797:S1049096515000797_equ2.gif?pub-status=live)
The participation score of Hertz’s opponent, Avis, also was calculated using the same formula.
Because Hertz and Avis are competing for votes from the same pool of voters, a Relative Participation Score (RPS) was calculated for each candidate, as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042601309-0193:S1049096515000797:S1049096515000797_equ3.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042601309-0193:S1049096515000797:S1049096515000797_equ4.gif?pub-status=live)
From these two RPS figures, an absolute candidate PA was calculated each week by subtracting one candidate’s RPS from the other candidate’s RPS, as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042601309-0193:S1049096515000797:S1049096515000797_equ5.gif?pub-status=live)
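The relative-score and advantage steps can be sketched as follows. The subtraction that yields PA follows the text. The normalization used for RPS (each candidate's PS divided by the sum of both scores) is an assumption made only for illustration, because the text describes RPS as relative to a shared pool of voters without spelling out the formula in words. The participation scores below are hypothetical.

```python
def relative_participation_scores(ps_a: float, ps_b: float) -> tuple[float, float]:
    """Assumed normalization: each participation score as a share of the total."""
    total = ps_a + ps_b
    if total <= 0:
        raise ValueError("combined participation score must be positive")
    return ps_a / total, ps_b / total


def participation_advantage(rps_own: float, rps_opponent: float) -> float:
    """PA: one candidate's RPS minus the other candidate's RPS."""
    return rps_own - rps_opponent


# Hypothetical participation scores for the two candidates in one week.
ps_hertz, ps_avis = 3.0, 1.0

rps_hertz, rps_avis = relative_participation_scores(ps_hertz, ps_avis)
pa_hertz = participation_advantage(rps_hertz, rps_avis)
print(rps_hertz, rps_avis, pa_hertz)  # 0.75 0.25 0.5
```

Under this assumed normalization the two RPS values sum to one, so the PA of one candidate is the negative of the other's.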
Finally, to produce the weekly campaign forecast, the Hertz–Avis vote was first divided equally between the two candidates and then adjusted to account for the PVI, Footnote 6 incumbency advantage, and weekly PA.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921042601309-0193:S1049096515000797:S1049096515000797_equ6.gif?pub-status=live)
The vote percentage for Avis is simply 100 minus the Hertz estimated percentage.
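The final forecasting step can be sketched under stated assumptions. The sketch starts from an even 50–50 split, as the text describes, and treats the PVI, incumbency, and participation adjustments as additive terms already expressed in percentage points for Hertz. The additive form and the scaling of the participation term are assumptions, and every number is hypothetical.

```python
def forecast_two_party_vote(
    pvi_adjustment: float,
    incumbency_adjustment: float,
    participation_adjustment: float,
) -> tuple[float, float]:
    """Return (candidate share, opponent share) of the two-party vote.

    All adjustments are assumed to be in percentage points and signed
    from the first candidate's point of view.
    """
    candidate = 50.0 + pvi_adjustment + incumbency_adjustment + participation_adjustment
    opponent = 100.0 - candidate  # the opponent receives the remainder
    return candidate, opponent


# Hypothetical inputs for Hertz in one week.
hertz, avis = forecast_two_party_vote(
    pvi_adjustment=2.0,
    incumbency_adjustment=1.5,
    participation_adjustment=-0.5,
)
print(hertz, avis)  # 53.0 47.0
```

Because only the participation term changes from week to week, rerunning this step with each week's value is what would produce a continually updated forecast.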
2012 FACEBOOK MODEL PERFORMANCE
In 2012, candidate likes and PTAT data from September 1 to November 3 for major-party candidates in 15 of the most competitive Senate races Footnote 7 were gathered daily using PageData. Footnote 8 Competitive races were chosen for two reasons: (1) they provide a more difficult prediction challenge, and (2) competitive Senate campaigns are more likely to use Facebook. The data from seven of the campaigns Footnote 9 were complete during the entire period studied and were used to assess the performance of the model. PageData metrics for the other eight campaigns were incomplete and therefore excluded from our analysis. Footnote 10 The model’s performance was assessed in the following three ways:
• accuracy of the weekly model forecasts immediately after Labor Day
• performance of the Facebook Model versus a model using only fundamental variables
• performance of the Facebook Model versus weekly aggregations of race-level polling data
Model Accuracy
To assess the accuracy of the Facebook Model predictions eight and seven weeks before Election Day, election results for the Senate races studied were converted to two-party candidate totals. These results provided the dependent variable against which the forecasted percentages produced by the model for the weeks ending September 14 and 21 were regressed. Footnote 11
The R-squared values for the first and second sets of Senate race predictions, produced by the model for the weeks ending September 14 and September 21, were 0.772 and 0.746, respectively. Moreover, in both weeks, the Facebook Model accurately predicted the ultimate Senate victors.
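For readers who wish to reproduce this kind of accuracy check, the sketch below computes the R-squared of a simple linear regression of actual two-party results on forecasted percentages. It assumes a bivariate regression with an intercept, in which case R-squared equals the squared Pearson correlation. The seven forecast and result pairs are hypothetical placeholders, not the 2012 data.

```python
def r_squared(forecasts: list[float], results: list[float]) -> float:
    """R-squared from regressing results on forecasts (one predictor plus intercept)."""
    if len(forecasts) != len(results) or len(forecasts) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(forecasts)
    mean_x = sum(forecasts) / n
    mean_y = sum(results) / n
    sxx = sum((x - mean_x) ** 2 for x in forecasts)
    syy = sum((y - mean_y) ** 2 for y in results)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(forecasts, results))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    return (sxy ** 2) / (sxx * syy)


# Hypothetical forecasts and actual two-party shares for seven races.
forecast_pct = [52.1, 48.7, 55.3, 50.9, 47.2, 53.8, 51.4]
actual_pct = [53.0, 47.9, 56.1, 52.2, 48.5, 52.7, 50.6]

fit = r_squared(forecast_pct, actual_pct)
```

With only seven races, an R-squared of this kind is sensitive to any single contest, which is worth keeping in mind when comparing weekly values.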
Model Performance versus Fundamentals
The performance of the Facebook Model also was tested against a fundamentals-only model that used PVI and incumbency variables to produce predictions. In this test, if the Facebook Model produced a higher R-squared than the fundamentals alternative, it added to the accuracy of the forecast.
In six of the eight assessment weeks, the Facebook Model (table 1) outperformed the fundamentals alternative. The two weeks in which the Facebook Model failed to outperform the alternative are an indication of the sensitivity of the model to relative changes in the performance of competing candidates. Averaging Facebook Model predictions over two weeks (a technique tested post-election that produces a rolling forecast) smooths out the volatility, maintains the trending and nowcasting capability of the model, and produces forecasts that exceed the fundamental baseline every week.
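The two-week averaging can be sketched as a simple rolling mean. The sketch assumes equal weights on the current and preceding week; the weighting is not specified in the text, and the weekly forecasts shown are hypothetical.

```python
def two_week_rolling_forecast(weekly_forecasts: list[float]) -> list[float]:
    """Average each week's forecast with the preceding week's (equal weights assumed).

    The first week has no predecessor, so the output is one element shorter.
    """
    return [
        (previous + current) / 2
        for previous, current in zip(weekly_forecasts, weekly_forecasts[1:])
    ]


# Hypothetical weekly forecasts for one candidate, in percent.
weekly = [52.0, 54.0, 51.0, 53.0]
rolling = two_week_rolling_forecast(weekly)
print(rolling)  # [53.0, 52.5, 52.0]
```

The rolling series moves less from week to week than the raw series, which illustrates the smoothing effect described above.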
Table 1 R-Squared of the Facebook Forecasting Model Predictions Versus R-Squared of Predictions Based on Static Fundamentals (PVI and Incumbency)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170425115807-86237-mediumThumb-S1049096515000797_tab1.jpg?pub-status=live)
The dependent variable is two-party vote.
Model Performance versus Poll-of-Polls Averages
Finally, the accuracy of Facebook Model predictions was evaluated against polling results, a benchmark suggested in the 2012 PS: Political Science and Politics Symposium (Campbell 2012a). First, the results of all 212 polls completed from September 8 to November 3, 2012, in the seven Senate campaigns under study were gathered from the Huffington Post Election Dashboard (HuffPost 2012). Starting with the week of September 8–14 and continuing through November 3, the results in each race were averaged, converted into two-party candidate totals to arrive at weekly poll-of-polls candidate estimates, and then regressed against election results.
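The poll-of-polls step can be illustrated as follows. The sketch averages each candidate's poll numbers within a week and then converts the averages to a two-party share, following the order given in the text. It assumes an unweighted mean across polls, and the poll figures are hypothetical.

```python
def two_party_share(candidate: float, opponent: float) -> float:
    """Candidate's percentage of the combined two-party total."""
    total = candidate + opponent
    if total <= 0:
        raise ValueError("combined support must be positive")
    return 100.0 * candidate / total


def weekly_poll_of_polls(polls: list[tuple[float, float]]) -> float:
    """Average (candidate, opponent) poll percentages for one race in one week,
    then convert the averages to a two-party share for the candidate."""
    if not polls:
        raise ValueError("at least one poll is required")
    avg_candidate = sum(c for c, _ in polls) / len(polls)
    avg_opponent = sum(o for _, o in polls) / len(polls)
    return two_party_share(avg_candidate, avg_opponent)


# Hypothetical polls for one race in one week: (candidate %, opponent %).
week_polls = [(48.0, 44.0), (46.0, 46.0)]
estimate = weekly_poll_of_polls(week_polls)
```

The conversion drops undecided and third-party respondents, which puts the poll estimates on the same two-party scale as the model forecasts and the election results.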
Table 2 compares the weekly poll-of-polls R-squared to that of the Facebook Model. The simple Facebook Model was a better predictor of outcomes in the Senate races studied in five of eight weeks. It is important to note that the Facebook Model was a better predictor of election results in four of the five weeks that were farthest from Election Day. In other words, when compared to poll-of-polls averages, the Facebook Model was better at forecasting outcomes the farther the prediction was from Election Day.
Table 2 R-Squared of the Facebook Forecasting Model Predictions Versus R-Squared of Averaged Polls-of-Polls Predictions
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170425115807-02766-mediumThumb-S1049096515000797_tab2.jpg?pub-status=live)
The dependent variable is two-party vote.
CONCLUSION
Election forecasting is a worthy pursuit that has tested the mettle of many voting-behavior theories (Sides 2014). Yet, as discussed in the 2014 PS: Political Science and Politics Symposium articles, it experiences several challenges, including lack of data, lack of timeliness, distance from the campaign narrative, inadequate specification of incumbency, partisan polarization, and national-level aggregation of results (Campbell 2014; Lewis-Beck and Stegmaier 2014; Linzer 2014; Sides 2014).
Since its inception, election forecasting has focused on the grand prize: predicting the outcome of presidential elections. However, in 2012, only 7 of the 12 regression models highlighted in PS: Political Science and Politics predicted the reelection of President Obama (Campbell 2012b; Lewis-Beck and Stegmaier 2014).
For the study of election forecasting to progress, more cases for experimentation and more reliable data sources are needed. The Facebook Model is an attempt to answer both of those needs.
The results of this exploratory investigation indicate that Facebook likes and PTAT metrics, when added to standard forecasting fundamentals, can produce surprisingly accurate vote forecasts in individual contests. The question remains, however, whether these results are an anomaly or a tool to expand the statistical forecasting of election results to campaigns for Congress. Only time and the testing of the model in future elections will determine if Facebook metrics are indeed a new tool to add to the forecasting toolbox.