Political scientists around the world have invested tremendous energy and resources into understanding the causes and correlates of voting (Blais 2006; Campbell 2013; Cox 2015). This large body of research relies, to a substantial degree, on post-election survey data in which people self-report their voting behavior. However, the turnout rate reported in survey samples, even in probability samples, is usually much higher than the official turnout rate (Burden 2000; Karp and Brockington 2005; Ansolabehere and Hersh 2012). This overestimation can compromise the validity of research findings concerning the determinants of electoral participation (Bernstein, Chadha and Montjoy 2001; Selb and Munzert 2013).
In the survey literature, turnout overestimation is conventionally attributed to three main causes: sampling error that overrepresents voters (Burden 2000); unintentional inaccurate recall of voting behavior (Stocké and Stark 2007; Waismel-Manor and Sarid 2011); and deliberate misreporting in response to the social desirability associated with voting (Näher and Krumpal 2012). In this paper, we focus on the third cause. Survey methodology research has shown that offering abstainers face-saving response options reduces social-desirability-induced overreporting by 4 to 8 percentage points. However, this finding rests on survey experiments conducted by phone in the United States after national elections, and the results from other countries are less supportive.
In this paper, we offer a thorough test of the efficacy of face-saving response items through a series of wording experiments embedded in 19 post-election surveys in Europe and Canada, covering four different levels of government. We believe it is important to study a wide variety of contexts, as we do not know to what extent results from the American case generalize to other countries and other elections. Social desirability is indeed likely to be lower in countries where individual turnout records are not public, and in elections that matter less in terms of policy impact. Our analyses reveal a distribution of effect sizes ranging from null to −18 percentage points, with a mean of −7.6 percentage points. Throughout, we treat lower reported turnout as a quality benchmark. As validated voting records are not available in the countries studied, we cannot directly confirm that overreporting is lower when reported turnout is lower. However, this benchmark has often been used in previous studies of similar wording experiments on face-saving response items in the United States (Belli et al. 1999; Belli, Moore and VanHoewyk 2006; Duff et al. 2007).
Addressing the Problem of Social-Desirability Bias
Social-desirability-induced overreporting is rooted in the fact that not all respondents are equally affected by social pressure. For example, the more educated a non-voter is, the more likely she is to report having voted (Anderson and Silver 1986; Silver, Anderson and Abramson 1986; Bernstein, Chadha and Montjoy 2001). This pattern is likely to inflate the estimated effect of variables associated with social-desirability bias in models of electoral participation.
Several solutions have been put forward to address the problem of social-desirability-induced overreporting, including using mail rather than face-to-face surveys (Preisendörfer and Wolter 2014), modifying the sequence of questions (Holbrook and Krosnick 2013), or using longer question preambles (Belli, Moore and VanHoewyk 2006). All these tools are designed to release, or at least reduce, the social pressure exerted on respondents when they are asked whether they voted. A more direct approach has shown that reminding American respondents that the investigator can check public records to verify whether they voted significantly reduces overreporting (Hanmer, Banks and White 2014). However, in many countries the law strictly prohibits access to voting records on privacy grounds.
One of the most promising solutions to the problem of social-desirability bias combines a short preamble, in which abstaining is presented as a legitimate choice, with response options that let respondents choose among several face-saving ways of indicating that they did not vote. In standard turnout questions, the respondent simply indicates whether she voted in the last election (“yes” or “no”). Face-saving versions add extra response categories that allow the respondent to justify not voting in a socially acceptable way, typically statements such as “I usually vote but not this time” and “I thought about voting but could not go.”
The evidence found in the literature suggests that including face-saving response items can diminish turnout overreporting in American post-election surveys. An experiment embedded in the 2002 American National Election Study (ANES) telephone survey, in which half the sample received the standard voting question and the other half the face-saving question, found that reported turnout in the midterm election was 8 percentage points lower among those who answered the face-saving question (p=0.002) (Duff et al. 2007). Another study implemented a similar experiment in a telephone survey after the 1998 Congressional election and reported a weaker effect: turnout was 4.6 percentage points lower when the turnout question included face-saving response items, although the effect did not reach conventional levels of statistical significance (p=0.078) (Belli, Moore and VanHoewyk 2006). The ANES has used face-saving question wording to measure electoral participation, alone or as part of a wording experiment, since 2000.
Our review of the literature reveals less consistent findings when the face-saving items were implemented in election studies outside the United States. However, it is unclear whether these failures to replicate the American findings reflect genuine contextual differences or inconsistencies in research designs. A telephone survey experiment implemented after the 2008 federal election in Austria reported a 4.6 percentage point reduction (p=0.080) (Zeglovits and Kritzinger 2014). This weak finding may be due to the small sample size or to the fact that the survey was fielded more than two years after the election, by which time the pressure to report socially desirable political behavior might have weakened.
A different telephone survey experiment conducted in two Israeli cities after the 2008 municipal elections (Waismel-Manor and Sarid 2011) produced quite a different result. The experiment, which compared the standard turnout question (“yes” or “no,” short preamble) with a turnout question combining face-saving response options and a long preamble aimed at reducing memory failures, validated reported turnout against individuals’ actual turnout records. Analyses did not reveal any substantive difference when responses were collected four to five weeks after the election (a 3 percentage point reduction, p=0.37). Furthermore, the data collected 12 months after the election show a disquieting pattern: the treatment question increased reported turnout by 7 percentage points (p=0.06). The authors suggest that the long preamble might in fact have increased social pressure, but only long after the election, when memory failures were most likely. With regard to the null finding right after the election, however, we cannot exclude the possibility that the social desirability of voting is weaker for municipal than for national elections.
Finally, a third study presents the results of an online survey experiment conducted in Sweden (Persson and Solevid 2014). Among other things, the experiment tested the impact of including face-saving response options in questions asking respondents whether they had engaged in each of 11 political behaviors during the 12 months preceding the survey. The study finds significant treatment effects on all the survey items except the turnout question. One might suggest that this null effect arises because social-desirability bias is weaker in online surveys, where respondents are not in direct contact with an interviewer. Yet this explanation fails to account for the significant effects observed for the other political behaviors. A more plausible explanation is that the turnout question confused respondents: it did not refer to any specific election but simply asked whether they had voted during the previous year. Of course, it is also possible that cultural differences partly account for why the face-saving question is more effective in some countries or regions than in others. Our study investigates this possibility.
This paper addresses the following three research questions: Is the inclusion of face-saving response options in a turnout question effective outside the United States? Is it only effective in national elections, or also at other levels of government? And, is it effective when respondents are interviewed online? We answer these questions by presenting the results of 19 survey experiments implemented in online surveys conducted in Europe and Canada. The surveys were fielded between 2011 and 2014, in the weeks following municipal, regional, national, and European elections.
Materials and Methods
Between 2011 and 2014, we conducted 19 two-wave (pre- and post-election) online panel surveys within the framework of the Making Electoral Democracy Work project. The surveys were fielded in Canada, France, Germany, Spain, and Switzerland. For each survey, an initial sample of ~1000 respondents was recruited from pre-existing survey panels. We used gender, age, and education (and language, where applicable) sampling quotas to ensure the diversity of the samples. On average, 75 percent of initial respondents completed both survey waves. Our final post-election sample for each survey is relatively large, usually 700–900 respondents. Observations are weighted by age, gender, and level of education to improve the representativeness of the resulting samples.
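For illustration, one common way to construct such weights is cell-based post-stratification: each respondent receives the ratio of her demographic cell’s population share to its sample share. The sketch below is written in Python with hypothetical column names and census margins; the exact weighting procedure used in the project is not detailed here and may differ (e.g., raking on marginal rather than joint distributions).

```python
import pandas as pd

def poststratification_weights(df, pop_shares,
                               cells=("age_group", "gender", "education")):
    """Cell weight = population share of the cell / sample share of the cell.

    `df` is a respondent data frame and `pop_shares` a Series of population
    shares indexed by the same cell variables; both are hypothetical.
    """
    sample_shares = df.groupby(list(cells)).size() / len(df)
    ratio = (pop_shares / sample_shares).rename("weight")
    # Attach each respondent's cell weight; weights average to ~1 by design.
    return df.join(ratio, on=list(cells))
```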
The 19 surveys cover 12 elections of different types: two municipal elections (Paris 2014 and Marseille 2014), six state/regional/cantonal elections (Catalonia 2012, Lucerne 2011, Zurich 2011, Lower Saxony 2013, Ontario 2011, and Québec 2012), three federal/national elections (France 2012, Spain 2011, and Switzerland 2011), and one European election (2014). Additional details concerning the 19 post-election surveys can be found in the online appendix. These include the dates of the elections, the turnout rate in the region from which respondents were sampled, the participation rate for each survey, and the dates of data collection.
The voting question was asked near the beginning of the post-election questionnaire. As in the studies discussed above, half of the respondents were randomly assigned to a standard voting question and the other half to a face-saving question. Table 1 reports the English translation of both questions, as used after the Québec and Ontario provincial elections. Both questions started with a preamble reminding the respondent that in each election many citizens do not vote. This type of reassuring preamble is typical of modern voting questions (see, e.g., the ANES and the British Election Study). The wordings used in other languages (French, German, Spanish, and Catalan) are displayed in the online appendix.
Table 1 Voting Questions Experiment
Among respondents who received the face-saving question (the treatment group), only those who indicated “I am sure I voted in the election” are considered voters; the remaining face-saving options are coded as abstention. Respondents who received the standard question form the control group, in which a “yes” answer counts as a vote. In both groups, we exclude respondents who chose the “don’t know” category; the proportion of “don’t know” answers was similar in the two groups. The full results and randomization checks are displayed in the online appendix.
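As an illustration, this coding rule could be implemented as follows; the column names and response labels are hypothetical stand-ins for the actual questionnaire codes.

```python
import numpy as np
import pandas as pd

def code_reported_vote(df: pd.DataFrame) -> pd.DataFrame:
    """Binary reported-turnout indicator under the coding rule above."""
    # Exclude "don't know" responses in both experimental groups.
    df = df[df["response"] != "don't know"].copy()
    # Treatment group: only the most affirmative option counts as a vote.
    # Control group: the standard "yes" counts as a vote.
    df["voted"] = np.where(
        df["group"] == "treatment",
        df["response"].eq("I am sure I voted in the election"),
        df["response"].eq("yes"),
    ).astype(int)
    return df
```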
Results
Figure 1 displays the results of our 19 survey experiments. Columns distinguish the results depending on the type of election. Rows identify the country and the region where the survey was conducted. Bars show the treatment effect (T_e) in percentage points, with 95 and 99 percent confidence intervals. Additional details concerning the results are displayed in the online appendix.
Fig. 1 Treatment effects of the inclusion of face-saving response options on reported turnout Note: The error spikes represent the 95 and 99 percent confidence intervals. For each survey, observations are weighted by age, gender, and education. The surveys for the municipal elections were conducted in the cities of Paris and Marseille (i.e., the biggest cities in Île-de-France and Provence-Alpes-Côte d’Azur, respectively).
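The treatment effects shown in Figure 1 are differences in weighted reported turnout between the treatment and control groups. Below is a minimal Python sketch of one such estimate with normal-approximation confidence intervals; the paper does not specify its exact variance estimator, so this is an illustration rather than a replication.

```python
import numpy as np

def treatment_effect(voted_t, w_t, voted_c, w_c, z95=1.960, z99=2.576):
    """Difference in weighted reported turnout, in percentage points."""
    p_t = np.average(voted_t, weights=w_t)  # weighted turnout, treatment
    p_c = np.average(voted_c, weights=w_c)  # weighted turnout, control
    te = 100 * (p_t - p_c)
    # Simple two-proportion standard error; a design-based estimator
    # would reflect the survey weights more faithfully.
    se = 100 * np.sqrt(p_t * (1 - p_t) / len(voted_t)
                       + p_c * (1 - p_c) / len(voted_c))
    return te, (te - z95 * se, te + z95 * se), (te - z99 * se, te + z99 * se)
```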
The first column reveals the treatment effect in two municipal elections in France. In both instances, the face-saving question reduced the reported turnout (as shown by negative effects). The treatment effect in Marseille (see the Provence-Alpes-Côte d’Azur row) is not significant (T_e = −5.99, p=0.232, N=517). However, in Paris (see the Île-de-France row), the effect is statistically significant (T_e = −7.52, p=0.019, N=856).
The second column of Figure 1 presents the results of six experiments implemented after regional elections. Our analyses show significant effects in four of them: Catalonia (T_e = −6.75, p=0.001, N=800), Lucerne (T_e = −12.53, p<0.001, N=904), Zurich (T_e = −8.24, p=0.005, N=843), and Québec (T_e = −10.0, p<0.001, N=724). The effect in Lower Saxony leans in the expected direction but fails to reach significance (T_e = −5.68, p=0.152, N=818). Strikingly, the face-saving question had virtually no effect on reported turnout in Ontario (T_e = −0.19, p=0.950, N=884).
The third column reports the results from survey experiments conducted after three national elections, with samples drawn from two regions in each country. The two bars at the top of the column show the treatment effects in the regions of Provence-Alpes-Côte d’Azur (T_e = −0.99, p=0.820, N=719) and Île-de-France (T_e = −0.70, p=0.873, N=748) in the post-election survey following the 2012 national legislative election in France. In this election, the treatment had no effect on reported turnout. The next two bars report the treatment effects in surveys conducted after the 2011 national election in Spain. The results show no clear impact of the question wording in Catalonia (T_e = −2.41, p=0.284, N=818), but a significant effect in Madrid (T_e = −5.99, p=0.003, N=823). Finally, the last two bars in the third column report the effects for the 2011 federal election in Switzerland. We find an effect in Lucerne (T_e = −9.09, p=0.003, N=844), but not in Zurich (T_e = −4.48, p=0.126, N=840).
The fourth column presents the results from five survey experiments conducted after the 2014 European election. The analyses reveal strong effects in Provence-Alpes-Côte d’Azur (T_e = −14.24, p<0.001, N=806), Île-de-France (T_e = −17.98, p<0.001, N=834), Catalonia (T_e = −12.23, p=0.002, N=811), and Madrid (T_e = −13.68, p=0.001, N=805). However, no significant effect is observed in Lower Saxony, although the bar leans in the negative direction (T_e = −5.93, p=0.210, N=791).
When all the observations collected in our 19 survey experiments are pooled and a single analysis is performed, we find that the face-saving question reduces reported turnout by 7.59 percentage points (p<0.001, N=15,185). Another way to summarize our finding is to calculate the mean treatment effect size across our experiments. This mean value equals −7.61 percentage points (SE=1.13, p<0.001, N=19).
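The second summary, the mean effect size and its standard error, can be reproduced directly from the 19 per-survey estimates reported in this section:

```python
import numpy as np
from scipy import stats

# The 19 treatment effects (percentage points) reported in the text.
effects = np.array([
    -5.99, -7.52,                                # municipal (Marseille, Paris)
    -6.75, -12.53, -8.24, -10.00, -5.68, -0.19,  # regional
    -0.99, -0.70, -2.41, -5.99, -9.09, -4.48,    # national
    -14.24, -17.98, -12.23, -13.68, -5.93,       # European
])
mean = effects.mean()                              # -7.61
se = effects.std(ddof=1) / np.sqrt(effects.size)   # 1.13
t_stat, p_value = stats.ttest_1samp(effects, 0.0)  # p < 0.001
print(f"mean = {mean:.2f}, SE = {se:.2f}, p = {p_value:.2g}")
```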
As mentioned above, our goal in this paper is to test whether the face-saving turnout question reduces reported turnout across a variety of contexts. An extensive examination of the factors that explain the variation in treatment effects between surveys goes beyond the scope of this paper. However, we conducted a preliminary, exploratory analysis of moderating factors by pooling the 19 surveys and estimating multilevel regressions in which we interacted the treatment with various individual-level and macro-level variables. The results are reported in Table 2. Details about the variables can be found in the online appendix.
Table 2 Multi-Level Regression Models
Note: Entries are coefficient estimates from logit models predicting the probability of reporting having voted. A random intercept at the survey level accounts for the multilevel structure of the data. In Models 2–4, we add an interaction term between the covariates included in the model and the treatment (see the interaction term column). Standard errors are in parentheses.
***p<0.001, **p<0.01, *p<0.05.
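To illustrate the structure of these models, the sketch below specifies a logit of reported turnout on the treatment, one individual-level moderator, and their interaction. The paper estimates a random intercept per survey; as a simple stand-in, the sketch conditions on the survey with fixed effects (a true random-intercept logit, for instance via statsmodels’ BinomialBayesMixedGLM or lme4’s glmer in R, would match the specification more closely). All variable names are hypothetical.

```python
import statsmodels.formula.api as smf

def interaction_model(pooled):
    """Logit of reported turnout on treatment x moderator, across surveys.

    `pooled` stacks the 19 post-election surveys; `voted` is the binary
    reported-turnout indicator, `treatment` flags the face-saving question,
    and `encouraged` is one hypothetical individual-level moderator.
    """
    # C(survey) adds survey dummies as a fixed-effects stand-in for the
    # paper's survey-level random intercept.
    return smf.logit("voted ~ treatment * encouraged + C(survey)",
                     data=pooled).fit()
```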
We find that the treatment effect is larger among several groups of respondents: those who reported being encouraged by a friend or acquaintance to vote for a party (compared with those who did not), those who said they decided whether to vote on Election Day itself (compared with those who decided earlier), those who believe voting is a choice (compared with those who believe it is a duty), and those who have completed a post-secondary degree. On average, the treatment effect is also larger in European elections (compared with other levels of government) and in elections where actual turnout is high. Finally, the treatment effect tends to be larger in surveys with a higher participation rate, that is, the proportion of completed questionnaires among the invitations sent to panel respondents.
These exploratory analyses do not yield a clear picture of the factors that moderate the effect of the face-saving turnout question on reported turnout; again, such an account is beyond the scope of this paper. They do show, however, that a multitude of individual-level and macro-level factors appear to matter.
Discussion
The results of our experiments demonstrate the utility of face-saving response items in countries outside the United States and at different levels of government. As such, this study makes several important contributions to survey methodology and to the large literature on election studies. First, it is worth noting that none of our 19 survey experiments shows the face-saving question increasing reported turnout; at worst, it simply has no effect. Second, this paper is the first to show that the face-saving question can be effective even when respondents are interviewed online. This finding suggests that social desirability can be deeply embedded in cultural conventions and habits, enough to shape respondents’ answers even when nobody is likely to judge them (see also Näher and Krumpal 2012; Persson and Solevid 2014). Finally, we find that face-saving response options can reduce reported turnout at several levels of government: municipal, regional, national, and supranational.
In light of our results, we strongly recommend that electoral studies outside the United States offer respondents face-saving options when asking them to report their voting behavior in post-election surveys. At the very least, this increases the likelihood of gathering honest responses, which improves the quality of the data used to understand voter behavior.
One limitation and two important questions arise from our findings and suggest directions for future research. First, our study used pre-existing survey panels and two-wave surveys, which are likely to bias the representativeness of the online samples in ways that cannot be corrected by demographic weights. For example, respondents who agree to complete two surveys within a relatively short period are likely to be more interested in politics than the rest of the population. This, in turn, may contribute to the gap between actual and reported turnout in our samples (Martinez 2003). Additionally, our pre-election surveys asked respondents whether they intended to vote, and prior work indicates that doing so may inadvertently influence respondents’ voting behavior on Election Day (Greenwald et al. 1987; however, Smith, Gerber and Orlich 2003 failed to replicate the original finding).
Second, what causes the differences in treatment effects from one survey to another? We conducted preliminary tests of potential moderators, but the results are mixed. On the one hand, the treatment effect is larger when actual turnout is higher; on the other hand, it is also larger in European elections, where turnout is usually low. More research that takes other factors into consideration, such as the nature of the campaign, is needed to resolve this apparent paradox.
Third, is the treatment equally effective across all subgroups of respondents? Previous studies have matched self-reported turnout with actual voting records, allowing researchers to identify the characteristics of abstainers who misreport in post-election surveys. Some of these studies show that abstainers who overreport voting closely resemble actual voters: they are interested in politics, identify strongly with a political party, and have achieved higher levels of education (Ansolabehere and Hersh 2012). Our preliminary tests suggest directions for future work: our most promising finding is that face-saving response items have a greater effect on the probability of reporting turnout among respondents who said they were encouraged to vote for a party by friends or acquaintances and among those who decided whether to vote on Election Day. To the best of our knowledge, this is the first study to explore these factors. Future research could replicate these findings in other contexts and with instruments tailored to measure these aspects precisely.
Acknowledgements
The data used in the paper were collected within the Making Electoral Democracy Work Project, which is funded by the Social Sciences and Humanities Research Council of Canada.