Until recently, public opinion polls conducted before an election were the main tool for explaining voter reactions to the candidates and their campaigns, as well as for estimating the outcome. Generally, polls conducted closer to Election Day produced better estimates than those conducted earlier, but recent analyses explain volatility in the preelection polls and indicate how to produce earlier, more accurate estimates. Now, and for the last few presidential election cycles, forecasters and statistical modelers produce estimates of election outcomes, with attention to how far in advance the estimates are produced and with what level of accuracy. As more polling is conducted at the state level, data aggregators in particular are estimating the outcome in each state and translating that into estimates of electoral votes as well as shares of the popular vote at the national level.
With advances in statistical tools and new methodologies, the prominence of preelection polls in their estimation role has slightly diminished, although results from the trial-heat question that asks about candidate preference remain a primary ingredient in statistical models that have fared well in recent election cycles. The tendency of news coverage to focus more intently on who is ahead and behind, as well as to explain the candidates' trajectories across the campaign, has reduced the use of polls to provide empirical analyses of the electorate's responses to the candidates and the effectiveness of their campaigns (Patterson 2005).
INTRODUCTION
Public opinion polling has been a central element of election coverage in the media in the United States for more than 75 years. News organizations believe that elections are important and have real meaning and consequences for their readers and viewers. From an institutional perspective, it is relatively easy to organize coverage around an event that occurs on a fixed schedule, involves conflict, has multiple sources willing to be quoted, and has a definitive conclusion when the votes are counted, so a coverage package can be wrapped up. This allows news organizations to budget resources and to reallocate them along the way as they sense that campaign events warrant.
The 1936 presidential campaign marks the origin of the contemporary polling period, when George Gallup struck up a business relationship with The Washington Post to publish the results of his polls. Based on his understanding of the faulty methodology of The Literary Digest, the leading prognosticator of presidential elections until then, Gallup offered a money-back guarantee that he could do a better job, so the newspaper had little to lose. Gallup had a great deal to gain in public visibility that could stimulate business from commercial clients based on his reputation in the public sector. This was the start of a long-term symbiotic relationship between news organizations and pollsters that remains today, even taking into account the financial pressures that many news organizations face and the impact of new technologies and changing lifestyles on current polling methods.
Preelection polls can assist news organizations with their coverage in several ways. They provide content about the electorate's reactions to the candidates and their campaigns, such as measures of their issue preferences and responses to specific events. As part of a longitudinal design, they assess the shifting dynamics of the campaigns' impacts. Of course, they can also help assess who is ahead and who is behind, supporting the worst tendencies of news organizations to engage in "horse race" journalism. This phenomenon has been exacerbated by the use of technologies like interactive voice response (IVR) methods, which reduce the cost of polling and produce more frequent measures of candidate standing at both the state and national levels.
The coverage of election campaigns has evolved with the advent of technology, the 24-hour news cycle, and the use of polling information. In the 1940s and 1950s, election coverage was a newspaper story; the reporting was based on interviews with political elites interpreted by career political correspondents. Often a Sunday story preceding Election Day provided a summary of the campaign and possibilities for the outcome, and a Monday story focused on Election Day weather and its likely impact on turnout. A summary of the returns would appear in Thursday morning papers. When television took over Election Night coverage in the 1960s, the reporting of returns was faster, and eventually exit polls were developed to provide an analytical capability on Election Night. By the 1970s, the networks and major metropolitan dailies combined forces and resources to establish their own polling operations, giving them editorial control over the content and timing of polls. Further technological innovation allowed them to conduct quick-reaction polls after debates and other, sometimes unanticipated, events like a speech or a foreign policy development. In this period, polls were used to collect the basic independent variables that could explain why support shifted—or did not.
One problem preelection pollsters typically face is that their surveys have to serve different purposes. Many firms prepare three kinds of estimates for media distribution based on different samples or subsamples of the population. First, they want to measure things like presidential approval among adult citizens to maintain consistency with the time series established before the campaign started. Second, because voting in the United States is a two-step process that requires registration to establish eligibility, data are reported for registered voters, especially early in the campaign, and then compared to all adults. Finally, turnout rates in the United States are relatively low compared to other countries, typically ranging between 55% and 60% in recent elections, and people can cast their ballots in an increasing variety of ways. In recent elections, about one-third voted before the traditional "election" day by mail, in person at early voting centers, or by absentee ballot (Barreto et al. 2006). So pollsters have to identify those who have voted, determine the "likely electorate" among those who have not, and then combine these two segments in appropriate proportions (Erikson, Panagopoulos, and Wlezien 2004; Rogers and Aida 2013). These techniques are part of the "secret sauce" that distinguishes one firm from another, but the full details are not typically divulged for fear of forsaking a competitive advantage.
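To make the arithmetic of that combination concrete, the sketch below blends the two segments under stated assumptions. The early-vote share, the support levels, and the function itself are hypothetical illustrations, not any firm's actual procedure.

# A minimal sketch of combining already-voted and likely-voter segments.
# The 30% early-vote share and the support figures are invented.

def blended_estimate(early_voters, likely_voters, early_share=0.30):
    """Weight the two segments by their assumed share of the electorate.

    early_voters, likely_voters: lists of 1 (Candidate A) / 0 (Candidate B)
    early_share: assumed fraction of ballots already cast
    """
    support_early = sum(early_voters) / len(early_voters)
    support_likely = sum(likely_voters) / len(likely_voters)
    return early_share * support_early + (1 - early_share) * support_likely

# Example: 55% support among early voters, 48% among likely voters
print(blended_estimate([1] * 55 + [0] * 45, [1] * 48 + [0] * 52))  # ~0.501

Everything of consequence is in choosing early_share and the likely-voter screen; that is where the firms' undisclosed procedures differ.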
The accuracy of preelection polls has improved consistently over time, although some differences are seen across survey organizations, commonly referred to as "house effects" (Franklin 2008). Although the overall estimation of the outcome of the 2012 election was very good, some firms were consistently different across the campaign and in their final estimates. Some of this was expected based on past performance, as in the case of Rasmussen Reports and its consistent Republican bias or John Zogby's historical Democratic bias, measured as a difference from the actual proportion of the vote each candidate received. One unexpected source of such a biased estimate was the Gallup polls, which showed greater public support for Mitt Romney than the final vote tabulation provided (Sides 2012). As a result, Gallup undertook a systematic review of its polling procedures to improve its estimation. It produced an interim report on its analysis to date (Gallup 2013), and its effort extended into the fall with research conducted in conjunction with the statewide gubernatorial elections in New Jersey and Virginia. The results of these studies will be publicly available and should contribute to improved methods in the industry as a whole and greater transparency with regard to preelection polling procedures.
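Measured this way, a house effect is simply the signed deviation of a firm's final estimate from the certified result. A minimal sketch with invented firm names and margins (only the roughly 3.9-point 2012 national margin is an actual figure):

# Hypothetical final-poll margins (Democrat minus Republican, in points),
# compared against the certified 2012 national margin of roughly +3.9.
ACTUAL_MARGIN = 3.9

final_polls = {"Firm A": 1.5, "Firm B": 4.5, "Firm C": 3.0}  # invented

for firm, margin in final_polls.items():
    house_effect = margin - ACTUAL_MARGIN  # negative = Republican lean
    print(f"{firm}: house effect {house_effect:+.1f} points")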
PREELECTION POLLS AND THE DATA AGGREGATORS AND FORECASTERS
For some time, both forecasters and data aggregators—individuals who construct statistical models that use aggregate measures of public opinion such as trial-heat standings or presidential approval ratings—have relied on a limited set of public opinion measures in their work. The forecasters have used measures like presidential approval for some time; their major methodological obstacle has been the relatively small number of elections for which such information is available. In the last two presidential cycles in particular, the preelection polls were the key ingredient for the data aggregators and their prediction models for the outcome, in both popular and electoral vote terms. There are a number of reasons for this situation. First, the number of data points has increased exponentially at both the state and national levels, particularly with the advent of IVR polling. Second, the accuracy of the polls has been improving incrementally over time, despite lowered response rates and issues like the growing share of cell-phone-only or cell-phone-mostly households, where interviews are more difficult and costly to obtain.
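The small-n obstacle is easy to see in the canonical forecasting setup: a regression of the incumbent party's vote share on a handful of predictors, fit on only a dozen or so elections. The sketch below uses invented approval and vote-share figures; real models typically add economic growth and incumbency terms, and with so few cases the uncertainty around any forecast is substantial.

import numpy as np

# Illustrative only: a forecaster-style regression of the incumbent
# party's vote share on June presidential approval. All figures invented.
approval = np.array([36, 57, 40, 48, 55, 62, 45, 50])
vote_share = np.array([44.7, 53.9, 46.5, 49.2, 52.0, 54.7, 47.9, 50.3])

# Ordinary least squares with an intercept term
X = np.column_stack([np.ones_like(approval, dtype=float), approval])
coef, *_ = np.linalg.lstsq(X, vote_share, rcond=None)
intercept, slope = coef
print(f"forecast at 50% approval: {intercept + slope * 50:.1f}%")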
The data aggregators took advantage of the fact that the measurement of candidate preference in a hypothetical election held "today" has generally been standardized with a "trial-heat" question whose wording is very similar across organizations, although its placement in the questionnaire remains one of many sources of house-effect differences. In their models, the forecasters have similarly relied on presidential approval, a well-understood public opinion concept operationalized with relatively consistent wording. The data aggregators' models, in turn, compare and adjust between state-level and national-level data in both directions to refine their estimates. In addition, they adjust for the historical accuracy of different polling firms as well as the variance of their current estimates from a composite average at the state or national level. So certain elements of polling data are crucial inputs for both the forecasters and the data aggregators.
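A toy version of one such adjustment appears below: subtract each firm's estimated house effect from its current trial-heat margin before averaging. The firms, margins, and effects are all invented; real aggregators also weight by sample size, recency, and methodology.

from statistics import mean

# Hypothetical current trial-heat margins and estimated house effects.
polls = [
    {"firm": "Firm A", "margin": 2.0},
    {"firm": "Firm B", "margin": 5.0},
    {"firm": "Firm C", "margin": 3.5},
]
house_effects = {"Firm A": -1.5, "Firm B": 1.0, "Firm C": 0.0}

# Remove each firm's estimated lean before forming the composite average
adjusted = [p["margin"] - house_effects[p["firm"]] for p in polls]
print(f"raw average {mean(p['margin'] for p in polls):.2f}, "
      f"adjusted average {mean(adjusted):.2f}")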
In the 2012 campaign, some questions were raised about likely voter modeling and whether adjustments are necessary to the types of questions currently used to assess probabilities of voting. One interesting question about the 2012 campaign is the extent to which the highly successful targeting efforts of the Obama campaign were idiosyncratic and a one-time event, or whether they are the wave of the future. Recent presidential campaigns have forgone federal financing and raised huge amounts of campaign funds, almost all of which have been spent in a limited number of "battleground" states. While turnout declined in 2012 compared to 2008, the reduction was negligible or nonexistent in those 10 or 12 states (Hanmer 2013). As a result, the active and highly targeted Obama campaign may have created sampling issues for pollsters as well, ones that produced a systematic underrepresentation of his share of the vote in national samples because respondents from these states as a group were underrepresented relative to their share of the vote. These geographically dispersed states do not ordinarily form a regional stratum in the typical national sample design, but conceptually this may be a useful adjustment to standard stratification strategies in the future.
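One way to operationalize that idea is to treat the battleground states as an explicit stratum and post-stratify a national sample to their known share of the vote. The shares below are hypothetical; in practice the target would come from prior election returns.

# A sketch of post-stratifying a national sample so battleground-state
# respondents match their assumed share of the national vote.

TARGET_BG_SHARE = 0.25                           # hypothetical target
sample = {"battleground": 200, "other": 800}     # respondents by stratum
n = sum(sample.values())

# weight = target share / sample share, applied to each respondent
weights = {
    stratum: (TARGET_BG_SHARE if stratum == "battleground"
              else 1 - TARGET_BG_SHARE) / (count / n)
    for stratum, count in sample.items()
}
print(weights)  # battleground respondents weighted up from 20% to 25%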
Several polls using IVR techniques also estimated the outcome of the election quite well, although their lack of transparency hinders an understanding of how they accomplished this. Recent research (Traugott 2012) suggests that the raw data from IVR samples do not reflect the adult population of the United States very well and are biased toward older, white, female voters in expected ways. While the weighted estimates produced from the trial-heat question seem close to election outcomes, the lack of information about weighting or other adjustments makes it difficult to place much credence in them. This raises important questions of provenance and methodology for those who use such estimates in their models.
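The arithmetic below illustrates why such undisclosed weighting matters: reweighting a skewed raw sample to population targets can move the topline by several points. Every figure here is invented for illustration, and real adjustments are rarely divulged.

# Why weighting matters for a raw IVR sample skewed toward older voters.
sample_shares = {"65+": 0.40, "under 65": 0.60}  # hypothetical raw sample
population = {"65+": 0.17, "under 65": 0.83}     # hypothetical targets

support = {"65+": 0.42, "under 65": 0.55}        # candidate support by cell

raw = sum(sample_shares[c] * support[c] for c in support)
weighted = sum(population[c] * support[c] for c in support)
print(f"raw {raw:.3f} vs weighted {weighted:.3f}")  # ~0.498 vs ~0.528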
CONCLUSIONS
Overall, these trends in preelection polling, coupled with increasingly sophisticated statistical models of election outcomes used by forecasters and data aggregators, raise interesting questions about the meaning of the concept of “public opinion” in the contemporary period. First, empirical public opinion used to refer to the aggregated individual opinions of a sample of a population measured with valid and reliable survey questions. Until the 2012 election cycle, this was the purview of academic survey researchers and pollsters. Going forward, will the definition of “public opinion” take on new meanings with the work of the data aggregators, in particular, given their efforts in predicting election outcomes?
Second, some public pollsters will redesign their research to improve the precision of their preelection estimates to match the accuracy of the data aggregators. This may help them compete in the private sector for clients who are interested in the best available market research at the lowest cost, especially against low-cost data collection methodologies increasingly involving social media. If this happens, will average citizens be the losers as news organizations devote an even greater proportion of their coverage to the relative standing of the candidates but include less explanatory information about where that support comes from and why it might be shifting during the campaign?