
Do Voters Benchmark Economic Performance?

Published online by Cambridge University Press:  26 June 2019

Vincent Arel-Bundock*
Affiliation:
Department of Political Science, Université de Montréal
André Blais
Affiliation:
Department of Political Science, Université de Montréal
Ruth Dassonneville
Affiliation:
Department of Political Science, Université de Montréal
*Corresponding author. Email: vincent.arel-bundock@umontreal.ca

Type: Letter
Copyright © Cambridge University Press 2019

Abstract

The conventional theory of economic voting is that voters reward or punish the incumbent government based on how the domestic economy is performing. Recently, scholars have challenged that view, arguing that voters use relative assessments to gauge government performance. From this perspective, what matters is not how well the national economy is doing per se, but rather how it performs relative to an international or historical reference point.

This article revisits prominent published works in that emerging tradition, and finds that the available evidence does not support the benchmarking hypothesis. The authors come to this conclusion after taking a close look at the regression models that are typically used to test benchmarking. The article shows algebraically that the way in which those models are specified invites a fundamental misreading of the evidence. Finally, the study proposes an alternative regression equation that can be used to test benchmarking, avoids common misinterpretations and facilitates the assessment of complex, conditional theories of relative evaluation.

Background

Economic voting is one of the most important accountability mechanisms at work in electoral democracies. The fact that voters reward or punish the incumbent government based on how the domestic economy is performingFootnote 1 is traditionally viewed as normatively desirable, for it reflects popular control of representatives.Footnote 2

Recently, a number of scholars have challenged this optimistic view, by pointing out that domestic economic growth is often a weak proxy for government performance. When the local economy moves in sync with secular trends or global shocks, governments may be rewarded or punished for events beyond their control.Footnote 3 This is especially true in integrated economies, where domestic fortunes are tightly linked to events abroad, and responsibility is blurred.Footnote 4 Democratic accountability thus requires more from voters than a simple response to local economic conditions.Footnote 5

A new and influential strand of research argues that voters do, in fact, make rational judgements about government performance, because their evaluations are relative.Footnote 6 What matters to rational voters may not be how well the national economy is doing per se, but rather how it performs compared to the economies of other countries, or relative to some historical benchmark.

Benchmarking is a powerful idea, which can be traced back to the work of Powell and Whitten.Footnote 7 As these authors point out, voters are likely to ‘evaluate government relative to some expectations about how the economy should have performed’.Footnote 8 But since expectations are difficult to measure, ‘it seems reasonable to use the international average levels of growth, inflation, and unemployment to estimate a baseline against which each country’s citizens could judge the performance of their own economy’.Footnote 9 This approach is intuitive, since ‘abundant research in other domains of social science supports the proposition that individuals are sensitive to comparative assessments’.Footnote 10

Yet there are also good reasons to doubt that voters benchmark economic performance. First, the benchmarking hypothesis is at odds with a dominant view on the cognitive limitations of ordinary voters. Indeed, a long tradition of research in political science has depicted the citizenry as poorly informedFootnote 11 and biased.Footnote 12 It is difficult to imagine how such an unsophisticated electorate could systematically and accurately compare how well the national economy is performing relative to other countries or a historical benchmark. Secondly, even if some authors posit that the media could facilitate benchmarking by making implicit comparisons in their news coverage, evidence of the underlying mechanism is rather weak. For instance, Kayser and Peress (henceforth, KP) report that high-information voters – those most exposed to the media – do not engage in more benchmarking than low-information voters.Footnote 13 Finally, some empirical studies claim that voters act based on relative economic conditions, but others find that when it comes to evaluating government performance, ‘the effect of luck is larger than the effect of competence’.Footnote 14 In short, the theoretical case for benchmarking is implausible, and the empirical record is mixed.

In this article, we show that the empirical evidence of benchmarking is extremely weak. We argue that the way in which regression models are typically specified to test benchmarking is needlessly complicated, that it invites a fundamental misreading of the evidence and that it may lead researchers astray. We propose a simpler model specification that can be used to test benchmarking, which avoids common misconceptions and carries powerful intuitions about the theory. We also show how this simple model can be enhanced to test more complicated theories, such as when voters benchmark against multiple reference points, or when the strength of benchmarking depends on the context. We revisit a prominent empirical study of Benchmarking Across Borders,Footnote 15 and conduct a faithful replication of the models reported in that article. When correctly interpreted, the results do not support the contention that voters make rational comparative evaluations.

Our findings have important implications for the field of economic voting, and for our understanding of the mechanisms that underpin democratic accountability. More generally, our article makes useful contributions to political science methodology by highlighting the shortcomings of a widely used empirical strategy, and by proposing a better way to test theories of relative evaluation.

Benchmarking vs. Conventional Economic Voting

The core intuition of benchmarking is illustrated in Figure 1, which shows how domestic growth and a reference point can affect support for the incumbent party. The solid line represents the domestic growth rate during the election year ($G_y$), and the dashed line represents the growth rate that voters use as a benchmark to evaluate the incumbent government’s performance. Depending on the analyst’s theory, the reference point could be the international growth rate ($G_i$) or the historical level of growth in the country under study ($G_h$).

Figure 1 Marginal effects of domestic economic growth and benchmark growth on votes for the incumbent.

The conventional view of economic voting is that votes for the incumbent (V) are tied to domestic growth. As we move from left to right, $G_y$ increases in Figure 1a but stays constant in 1b. Thus the conventional prediction is that votes for the incumbent will increase in 1a but stay constant in 1b.Footnote 16

In contrast, proponents of benchmarking argue that what matters to voters is not domestic growth per se, but rather the difference between domestic growth and the benchmark ($G_y - G_i$).Footnote 17 When the solid line is above the dashed line, domestic growth outperforms the benchmark, and voters should reward the incumbent. When the solid line is below the dashed line, domestic growth underperforms relative to the benchmark, and voters should punish the incumbent. As we move from left to right in Figure 1, the ‘performance gap’ or ‘competence signal’ increases in Figure 1a and decreases in 1b. Thus, benchmarking predicts that votes for the incumbent will increase in 1a and decrease in 1b.

These expectations can be restated using the language of multiple regression. When domestic growth increases and the reference point is held constant (Figure 1a), both theories predict that the incumbent’s vote share will increase. In other words, both theories predict that the marginal effect of domestic growth will be positive: $\partial V / \partial G_y > 0$. When the reference point increases and domestic growth is held constant (Figure 1b), benchmarking predicts that votes for the incumbent will decrease. In other words, benchmarking predicts that the marginal effect of the reference point will be negative: $\partial V / \partial G_i < 0$.

If we hope to discriminate between benchmarking and conventional economic voting, the main quantity of interest is the marginal effect of the reference point, since this is where benchmarking theory makes a distinctive prediction.

How do Scholars Test Benchmarking?

Conventional theories of economic voting are typically tested using models of this form:

$$V = \beta_{y} G_{y} + \Psi\Omega + \nu, \tag{1}$$

where V is the incumbent’s vote share, $G_y$ is the domestic economic growth rate during the election year, Ω is a vector of control variables, and ν is a disturbance term. Clearly, Model 1 cannot be used to test benchmarking, since it ignores relative evaluations altogether.

In their seminal article, Powell and WhittenFootnote 18 estimate a regression equation of this form:

$$V = \lambda_{y-i} (G_{y} - G_{i}) + \Phi\Omega + \upsilon, \tag{2}$$

with $G_i$ equal to a ‘reference point’, the international economic growth rate. Model 2 takes us very close to the benchmark story: when the gap between $G_y$ and $G_i$ is positive, the domestic economy outperforms the global economy, and voters should reward the incumbent government for its competence.

As KP note, however, Model 2 cannot be used to distinguish between benchmarking and conventional economic voting, because it suffers from omitted variable bias.Footnote 19 Indeed, the composite variable ($G_y - G_i$) is highly correlated with the level of domestic economic growth ($G_y$).Footnote 20 As a result, we cannot parse out the effect of benchmarking from conventional economic voting, and $\lambda_{y-i}$ captures both phenomena. Model 2 is thus useful if we want to estimate something akin to the ‘total effect’ of domestic growth and benchmarking on voting behavior, but not if we wish to compare and contrast the two theories.
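The confounding in Model 2 can be illustrated with a small simulation. This is a minimal sketch, not an analysis of any real dataset: the data-generating process, the parameter values and the helper name `slope` are all assumptions chosen for illustration. Votes are generated from domestic growth alone, so the true effect of the reference point is zero, yet the coefficient on the gap still comes out positive.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Hypothetical data-generating process with NO benchmarking:
# votes depend on domestic growth only; the reference point has zero effect.
g_i = [rng.gauss(2.0, 1.0) for _ in range(n)]          # international growth
g_y = [0.5 * gi + rng.gauss(1.0, 1.0) for gi in g_i]   # domestic growth
votes = [40.0 + 1.0 * gy + rng.gauss(0.0, 2.0) for gy in g_y]

def slope(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Model 2 without controls: regress votes on the composite gap (G_y - G_i).
gap = [gy - gi for gy, gi in zip(g_y, g_i)]
lambda_hat = slope(gap, votes)
print(f"Model 2 coefficient on (G_y - G_i): {lambda_hat:.2f}")
```

Under these assumed parameters the population value of the gap coefficient is 0.6, even though voters in the simulation never look at the reference point. A positive estimate from Model 2 therefore cannot, on its own, separate benchmarking from conventional economic voting.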

To solve this problem, KP introduce an additional control for the reference point:

$$V = \theta_{y-i} (G_{y} - G_{i}) + \theta_{i} G_{i} + \Gamma\Omega + \epsilon. \tag{3}$$

In this model, $G_y - G_i$ represents a ‘decomposed’ or ‘local’ component of growth, whereas $G_i$ aims to control for changes in the reference point. Model 3 has had tremendous influence in the field. At the time of writing, KP’s article has been cited over 160 times, and several other researchers have adopted and adapted their empirical strategy.

A Widespread Misconception

An intuitive – but ultimately incorrect – way to interpret Model 3 would be to focus on the gap between $G_y$ and $G_i$, and to treat the $\theta_{y-i}$ coefficient as the effect of relative economic performance on votes for the incumbent.

For example, Aytaç argues that a positive estimate of $\theta_{y-i}$ provides ‘evidence for the hypothesis that voters reward (punish) incumbents on whose watch the economy performs relatively better (worse) in domestic and international comparisons’.Footnote 21 KP define ‘local growth’ as the gap between $G_y$ and $G_i$, and interpret $\theta_{y-i}$ as measuring the association between ‘an increase in local growth’ and an ‘increase in the leader party’s vote share’.Footnote 22 Goplerud and Schleiter follow in those footsteps, and discuss $\theta_{y-i}$ as the effect of some ‘benchmarked’ or ‘local’ component of growth on voting behavior.Footnote 23 Using data on the American states, Ebeid and Rodden also interpret $\theta_{y-i}$ as the effect of ‘relative state conditions’ on votes.Footnote 24

If the domestic economy outperforms a reference point, it may be reasonable for voters to infer that the government is doing good work. In that spirit, Leigh treats $\theta_{y-i}$ as the effect of ‘government competence’ on votes for the incumbent.Footnote 25 In the American context, Wolfers considers the gap between state- and national-level economic growth, and calls $\theta_{y-i}$ the ‘effect of competence’.Footnote 26

Interpreting $\theta_{y-i}$ as the effect of relative economic performance on votes for the incumbent appeals to common sense, but it is a mistake. The root of the problem lies in the fact that $G_i$ appears twice on the right-hand side of Equation 3. This redundancy changes the substantive meaning of our regression coefficients.

To see how, take the partial derivative of Equation 3 with respect to $G_y$, and find the marginal effect of domestic growth:

$$\frac{\partial V}{\partial G_{y}} = \theta_{y-i}. \tag{4}$$

This simple exercise demonstrates that the coefficient associated with $G_y - G_i$ is exactly equivalent to the marginal effect of $G_y$. Against intuitive common sense, $\theta_{y-i}$ does not measure the effect of relative economic performance on votes for the incumbent. Since $\theta_{y-i}$ is the marginal effect of domestic growth, finding a positive coefficient for ‘benchmarked’ or ‘local’ growth is actually supportive of conventional economic voting.
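The algebra can be made explicit by expanding the composite term in Equation 3 and collecting terms. The following is a worked restatement that uses only the symbols already defined:

$$\begin{aligned}
V &= \theta_{y-i}(G_{y} - G_{i}) + \theta_{i} G_{i} + \Gamma\Omega + \epsilon \\
  &= \theta_{y-i} G_{y} + (\theta_{i} - \theta_{y-i})\, G_{i} + \Gamma\Omega + \epsilon ,
\end{aligned}$$

so that

$$\frac{\partial V}{\partial G_{y}} = \theta_{y-i}, \qquad \frac{\partial V}{\partial G_{i}} = \theta_{i} - \theta_{y-i}.$$

Written this way, the marginal effect of the reference point is the difference between two coefficients, not either coefficient taken alone.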

Tests of benchmarking based on Equation 3 have been repeatedly misinterpreted in prestigious scientific journals, by leading scholars of economic voting. The inclusion of duplicate regressors on the right-hand side of Equation 3 has been a source of widespread confusion in the economic voting literature.Footnote 27 To put this confusion to rest, we need a simpler, more direct test of benchmarking.

A Simpler Test of Benchmarking

From Figure 1, we learned that both benchmarking and conventional economic voting predict that the marginal effect of domestic growth should be positive. In contrast, only benchmarking predicts that the marginal effect of international growth should be negative. The simplest and most direct way to test those predictions is to estimate a model of this form:

$$V = \delta_{y} G_{y} + \delta_{i} G_{i} + \Gamma\Omega + \epsilon. \tag{5}$$

Since Model 3 includes redundant regressors, it carries no more information than the simpler Model 5. In fact, Models 3 and 5 are perfectly equivalent from a logical standpoint, and they produce identical numerical results: the marginal effect of domestic growth, the marginal effect of international growth,Footnote 28 the intercept, the control variables’ coefficients, the residuals and all fit statistics are always the same in both models. In the online appendix, we present side-by-side estimates using Models 3 and 5 to illustrate this point numerically.
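The equivalence can also be checked on simulated data. The sketch below is not the authors' replication code: the data-generating process, the parameter values and the helper names (`solve`, `ols`) are assumptions made for illustration, and control variables are omitted. It uses only the Python standard library so that it runs without extra dependencies.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Return OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in zip(*columns)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated data (hypothetical parameter values, not estimates from any study).
rng = random.Random(1)
n = 500
g_i = [rng.gauss(2.0, 1.0) for _ in range(n)]
g_y = [0.5 * gi + rng.gauss(1.0, 1.0) for gi in g_i]
votes = [40.0 + 1.0 * gy - 0.3 * gi + rng.gauss(0.0, 2.0)
         for gy, gi in zip(g_y, g_i)]
gap = [gy - gi for gy, gi in zip(g_y, g_i)]

a3, theta_gap, theta_i = ols([gap, g_i], votes)   # Model 3: gap and G_i
a5, delta_y, delta_i = ols([g_y, g_i], votes)     # Model 5: G_y and G_i

print(f"dV/dG_y:   Model 3 = {theta_gap:.6f}, Model 5 = {delta_y:.6f}")
print(f"dV/dG_i:   Model 3 = {theta_i - theta_gap:.6f}, Model 5 = {delta_i:.6f}")
print(f"intercept: Model 3 = {a3:.6f}, Model 5 = {a5:.6f}")
```

Each pair of printed numbers should agree up to floating-point error, because the regressors of Model 3 are an invertible linear transformation of the regressors of Model 5.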

Yet even if the two models are formally equivalent, the simpler specification has major advantages in terms of transparency, presentation and interpretation.

First, the correct interpretation of KP’s Model 3 is highly counterintuitive: the coefficient that they call ‘Local Component of Growth’ in their regression tables ($\theta_{y-i}$) does not measure the effect of the local economy’s relative performance on votes for the incumbent. As we showed above, this has been a major source of confusion in the field, and benchmarking results have been repeatedly misinterpreted in print. Our simpler specification avoids this problem.Footnote 29

Secondly, Equation 5 directly translates the theoretical intuitions conveyed by Figure 1, and it immediately reveals the relevant test statistics. Recall that the discriminating test of benchmarking is that international growth should have a negative marginal effect on votes for the incumbent. In Model 5, the marginal effect of international growth is the $\delta_i$ coefficient, and we can simply look at its p-value. In Model 3, the marginal effect of international growth is a linear combination of coefficients ($\partial V / \partial G_i = \theta_i - \theta_{y-i}$), and we must conduct an extra Wald test to know if that combination is negative and statistically significant.
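The extra step required by Model 3 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in any of the cited studies: the data are simulated, the helper names (`solve`, `inverse`, `fit`) are hypothetical, controls are omitted, and the covariance matrix is the classical homoskedastic one, not the robust estimator reported in the tables. The sketch computes the t-statistic on $\delta_i$ in Model 5 and the Wald statistic for $\theta_i - \theta_{y-i}$ in Model 3.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def inverse(a):
    """Invert a square matrix by solving against each unit vector."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def fit(columns, y):
    """OLS with intercept; returns coefficients and classical covariance matrix."""
    X = [[1.0] + list(row) for row in zip(*columns)]
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)       # residual variance
    inv = inverse(xtx)
    cov = [[s2 * inv[i][j] for j in range(k)] for i in range(k)]
    return beta, cov

# Simulated data (hypothetical parameters chosen only for illustration).
rng = random.Random(2)
n = 300
g_i = [rng.gauss(2.0, 1.0) for _ in range(n)]
g_y = [0.5 * gi + rng.gauss(1.0, 1.0) for gi in g_i]
votes = [40.0 + 1.0 * gy + rng.gauss(0.0, 2.0) for gy in g_y]
gap = [gy - gi for gy, gi in zip(g_y, g_i)]

# Model 5: the test statistic is read directly off the G_i coefficient.
b5, c5 = fit([g_y, g_i], votes)
t_model5 = b5[2] / math.sqrt(c5[2][2])

# Model 3: test the linear combination theta_i - theta_{y-i}.
b3, c3 = fit([gap, g_i], votes)
estimate = b3[2] - b3[1]
variance = c3[2][2] + c3[1][1] - 2.0 * c3[1][2]
t_model3 = estimate / math.sqrt(variance)

print(f"t-statistic, Model 5 (delta_i): {t_model5:.4f}")
print(f"t-statistic, Model 3 (theta_i - theta_gap): {t_model3:.4f}")
```

The two statistics should coincide up to floating-point error. The difference is one of convenience: Model 5 reports the relevant quantity in the regression table, whereas Model 3 requires the covariance between two coefficients, which published tables rarely include.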

Finally, as we show below, our simpler specification offers a solid foundation on which we can build empirical tests for theories of benchmarking where voters compare multiple reference points, or where the strength of benchmarking is context dependent.

Replication: Benchmarking Across Borders

As we explained above, the key quantity of interest for tests of benchmarking is the marginal effect of international growth (holding domestic growth constant). Unfortunately, KP do not consistently report the statistics that are needed to test if that quantity is distinguishable from zero.Footnote 30 As a result, readers cannot assess the strength of the evidence simply based on the findings printed in Benchmarking Across Borders.

To see if KP’s data support their theory, we re-estimated all of their models using the authors’ replication files, and we computed all the quantities of interest.Footnote 31 Table 1 shows the results for four models,Footnote 32 estimated using KP’s preferred measure of international growth (an index constructed via principal component analysis).

Table 1 OLS regression models with incumbent vote share as the dependent variable

Note: robust standard errors in parentheses. *p<0.1, **p<0.05, ***p<0.01

Baseline Specification

In Column 1 of Table 1, we see that the $G_y$ coefficient is positive. This is consistent with both benchmarking and conventional economic voting. The $G_i$ coefficient is negative and statistically significant. This is consistent with benchmarking.Footnote 33 However, those results are not credible, because the model in Column 1 is fatally underspecified.

Controls, Lags and Fixed Effects

Ensuring that results are robust to the inclusion of controls and a lagged dependent variable is a minimum standard for most modern research on economic voting. In Column 2 of Table 1, we follow KP and add the same control variables as in their article; Column 3 includes the incumbent’s vote share in the previous election, and Column 4 includes both a lagged dependent variable and country fixed effects.

The three new models are consistent with conventional economic voting: the marginal effects of domestic growth in Columns 2 to 4 are all positive and statistically significant. However, none of the three models supports benchmarking: the marginal effects of international growth in Columns 2 to 4 are all indistinguishable from zero. As soon as we introduce control variables, a lagged dependent variable or country fixed effects – widely recognized best practices in the field – the evidence of benchmarking evaporates.

Alternative Measures of International Growth

The models in Table 1 were all estimated using an index of international growth constructed by principal component analysis. This is KP’s preferred measure of $G_i$, but the authors also consider two alternatives: a trade-weighted average of growth rates around the world, and the international median.

In Benchmarking Across Borders, the choice between those three measures is rather inconsequential, because KP conclude that the evidence supports benchmarking, regardless of how they measure $G_i$. Substantively, the authors take this to mean that ‘voters respond to their country’s deviation from various measures of average international performance’.Footnote 34 Moreover, KP do not offer a real theoretical defense of their preferred measure, and fit statistics do not give us strong reasons to favor one measure of $G_i$ over another.Footnote 35

Nevertheless, access to these two alternative measures of international growth is useful, because it allows us to probe the sensitivity of benchmarking tests to how we measure the reference point. In the online appendix, we replicate the eight regression models that KP estimated using aggregate-level data and their two alternative measures of international growth. None of those eight models shows evidence of benchmarking: the marginal effect of international growth is never distinguishable from zero.

Individual-Level Survey Data

Moving beyond aggregate-level data, KP also study benchmarking using individual-level surveys. Once again, their empirical specification resembles Model 3, and the quantity of interest is the marginal effect of international growth. In the online appendix, we replicate all twelve of KP’s individual-level models. None of those models allows us to reject the null of ‘no benchmarking’.

Three More Empirical Claims

In the online appendix, we consider three more empirical claims from the original article: (1) a statistically insignificant estimate of $\theta_i$ constitutes evidence of ‘full benchmarking’, (2) the substantive effect of decomposed growth is more important than the substantive effect of domestic growth and (3) at several points in time, the magnitude of the benchmarked economic vote is greater than the magnitude of the non-benchmarked economic vote. Our assessment is that these claims do not add credence to the theory.

Do Voters Benchmark Economic Performance?

In their article, KP ‘argue that previous research has fundamentally misunderstood and hence incorrectly estimated how economic assessments are made’.Footnote 36 They contend that ‘voters respond more to national deviations from an international average rate of growth than to the growth rate itself’.Footnote 37 They claim that their empirical analysis reveals ‘strong evidence of cross-national benchmarking on economic growth both at the aggregate and at the individual level, across time periods, and across subsamples’.Footnote 38 Finally, after conducting extensive robustness checks, they conclude that their ‘main results are not altered’,Footnote 39 and that the evidence is ‘clearly inconsistent with no benchmarking’.Footnote 40

We re-evaluated benchmarking on KP’s own terms, using their original data, logically equivalent statistical models, the same null hypothesis testing framework and an evaluation criterion that they explicitly endorsed.Footnote 41 Yet our substantive conclusions are strikingly different.

When models include control variables, a lagged dependent variable or country fixed effects, we cannot reject the null of ‘no benchmarking’. When we use alternative measures of international growth, we cannot reject the null of ‘no benchmarking’. When we test the theory using individual-level survey data, we cannot reject the null of ‘no benchmarking’. In fact, out of the twenty-four regression models that we replicated, only one model – without controls or lagged dependent variable – supports the theory. In the other twenty-three tests, the critical quantity of interest does not cross (or even approach) conventional thresholds of statistical significance.Footnote 42 Put simply, the evidence in Benchmarking Across Borders amounts to little more than a null result.

How to Test Benchmarking With Multiple Reference Points

The models considered above show little evidence of benchmarking. This surprising result could be an artifact of several factors. For instance, our models may be too simple to capture the complex processes at work, or KP’s dataset may be too small to conduct well-powered tests. In this section, we consider how to adapt our barebones empirical framework to the more complex case in which voters compare domestic economic performance to multiple reference points. Then, we illustrate by studying a larger dataset drawn from a more recent study of benchmarking.

AytaçFootnote 43 develops a reference point theory that is highly reminiscent of KP’s, but which makes two important substantive changes. First, the author argues that voters use two reference points to assess their government’s performance: the level of international growth ($G_i$) and their own country’s historical level of growth ($G_h$).

Secondly, Aytaç points out that these reference points could be compared to two alternative measures of the incumbent’s performance: domestic growth during the election year ($G_y$) or domestic growth during the incumbent’s full term in office ($G_t$). The term-based measure is preferable if we adopt a rational voter model, since such voters can extract more information about the quality of government by observing performance over a longer period. The election year measure is preferable if we take the view – dominant in political psychology – that voters are cognitively limited, myopic and that they use end heuristics when engaging in retrospective evaluations.Footnote 44 Here, we remain agnostic and estimate models using both measures.

In our framework, testing theories of benchmarking with multiple reference points is straightforward: we simply introduce the new reference point variable additively in Model 5. Again, there is evidence of benchmarking if the marginal effect of domestic growth is positive, and if the marginal effects of the benchmarks are negative.

In Table 2, we illustrate this by estimating six models using Aytaç’s replication data.Footnote 45 In a first set of three models, we compare international and historical growth to domestic growth in the election year. In a second set of three models, we compare international and historical growth to the average domestic growth rate during the incumbent’s full term in office. We include the same control variables as Aytaç.

Table 2 Benchmarking with multiple reference points and alternative measures of domestic growth

Note: OLS regressions with country-clustered standard errors. Robust standard errors in parentheses. *p<0.1, **p<0.05, ***p<0.01

All six of the models in Table 2 show evidence of conventional economic voting: the coefficients for domestic economic growth ($G_y$ or $G_t$) are all positive and statistically significant. In contrast, none of the models allows us to reject the null hypothesis of ‘no international benchmarking’: the $G_i$ coefficient is never statistically significant at the α=0.1 level.

The two right-most models in Table 2 show evidence of historical benchmarking: the $G_h$ coefficient is negative and statistically significant. However, it is important to point out that those models rely on a highly unconventional assumption. Indeed, they assume that voters have long enough memories to accurately compare the average level of growth during the incumbent’s full term in office to the average level of growth during the previous government’s term. This assumption clashes with common wisdom in the field of economic voting, in which ‘virtually all macro-studies assume a short lag, generally of one year’.Footnote 46 Most studies of benchmarking also use short-term measures of domestic growth.Footnote 47

In sum, the results in Table 2 offer strong support for the conventional theory of economic voting, but evidence of benchmarking is mixed. None of the models allows us to confidently reject the absence of international benchmarking, and the only models that support historical benchmarking require us to jettison the widespread assumption that voters are myopic.

How to Test Conditional Theories of Benchmarking

The regression models that we have studied so far were relatively underspecified. Indeed, one of the major contributions of Powell and WhittenFootnote 48 was to point out that the level of economic voting depends on the institutional context (for example, clarity of responsibility). Similarly, there are good reasons to think that benchmarking will vary across populations: some voters – such as those with high information – might engage in more relative economic evaluations than others.

If benchmarking is truly conditional, then the ‘pooled’ models that we estimated above would be inappropriate, and our null results would not be surprising. For this reason, it is extremely important to develop regression models capable of testing conditional benchmarking hypotheses. Again, this is very easy to do in our simple empirical framework.

We use the same starting point as before (Figure 1). Benchmarking predicts that the marginal effect of domestic growth should be positive, and that the marginal effect of the reference point should be negative. If a moderating variable M increases (decreases) the salience of the reference point, then the marginal effect of domestic growth should be more (less) positive, and the marginal effect of the reference point should be more (less) negative where M is high.

This idea can be captured by a simple extension of Model 5:

$$V = \delta_{y} G_{y} + \delta_{i} G_{i} + \delta_{ym} G_{y} \times M + \delta_{im} G_{i} \times M + \delta_{m} M + \Gamma\Omega + \epsilon, \tag{6}$$

where M stands for a variable that moderates comparative economic assessments.

As usual, a positive marginal effect of domestic growth ($\delta_y + \delta_{ym} M > 0$) would be consistent with both conventional economic voting and benchmarking. A negative marginal effect of international growth ($\delta_i + \delta_{im} M < 0$) would be consistent with benchmarking. The slopes of those marginal effects ($\delta_{ym}$ and $\delta_{im}$) measure the extent to which M moderates relative economic assessments.

To illustrate how one can apply Model 6, we revisit a secondary set of tests from Aytaç,Footnote 49 where the author studies if benchmarking is more prevalent in countries with high trade intensity, GDP per capita or average level of schooling.Footnote 50 We assess the moderating effect of all three variables,Footnote 51 include the same control variables as in Table 2, and use Aytaç’s two alternative measures of domestic growth ($G_y$ and $G_t$). The full regression results are reported in the online appendix.
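Because the marginal effect of international growth in Model 6 depends on M, its standard error does too. A minimal sketch of the calculation is given below. The function name `marginal_effect_gi` and every number passed to it are hypothetical placeholders, not estimates from any dataset discussed here, and the 1.96 multiplier assumes a normal approximation for a 95 per cent interval.

```python
import math

def marginal_effect_gi(delta_i, delta_im, var_i, var_im, cov_i_im, m):
    """Marginal effect of G_i at moderator value m, with a 95% interval.

    The estimate is delta_i + delta_im * m. Its variance follows from the
    variance of a linear combination of two estimated coefficients:
    Var(delta_i) + m^2 * Var(delta_im) + 2 * m * Cov(delta_i, delta_im).
    """
    estimate = delta_i + delta_im * m
    variance = var_i + m ** 2 * var_im + 2.0 * m * cov_i_im
    se = math.sqrt(variance)
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se

# Made-up inputs for illustration only.
for m in (0.0, 1.0, 2.0):
    est, low, high = marginal_effect_gi(
        delta_i=-0.2, delta_im=0.1,
        var_i=0.04, var_im=0.01, cov_i_im=-0.01, m=m,
    )
    print(f"M = {m:.1f}: effect = {est:+.3f}, 95% CI = [{low:+.3f}, {high:+.3f}]")
```

Evaluating the function over a grid of M values and plotting the estimate with its interval gives a figure of the same kind as a marginal effects plot. Evidence of conditional benchmarking would require the interval to lie below zero over some range of M.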

Figure 2 shows the estimated marginal effect of international growth in six models. None of the marginal effects is clearly negative, and most lines are nearly flat. These results, estimated using a dataset that is over twice the size of KP’s, offer no evidence of international benchmarking, and no evidence that trade, income or education increase the salience of comparative economic assessments.

Figure 2 Marginal effect of international growth on votes for the incumbent. Note: the figure displays the results of six regression models with three different moderators and two alternative measures of domestic growth. 95 per cent confidence intervals in gray.

Conclusion

In this article, we reinterpreted the theory of benchmarking and explained that, all else equal, it predicts that votes for the incumbent should be positively related to domestic growth, but negatively related to reference points. By recasting the theory’s predictions in terms of the marginal effects of domestic growth and the reference points, we showed that benchmarking could be tested using a simpler linear model that excludes duplicate regressors, immediately produces the relevant discriminating statistics and greatly facilitates interpretation.

We reanalyzed data from prominent studies that have claimed to present evidence clearly supportive of benchmarking. Across a range of models, we found robust evidence that domestic growth affects voting behavior, but very little sign of benchmarking. We therefore conclude that benchmarking is an interesting hypothesis, but that it is not supported by the available evidence.

These results should not be interpreted as a wholesale rejection of the benchmarking hypothesis. Indeed, it seems reasonable to expect that some populations may be more responsive to international or historical comparisons than others. For example, voters may be particularly attuned to the economic performance of neighboring countries or rivals.Footnote 52 At the individual level, some types of voters (for example, politically sophisticated ones) may also be more prone to compare domestic with global economic performance or present economic conditions with previous ones. The idea that some voters evaluate economic performance in relative terms has some intuitive appeal, but the idea that most voters systematically and accurately compare with an ‘objective’ benchmark seems rather implausible given citizens’ cognitive limitations.

Perhaps most importantly, we have shown that there are great risks in using composite measures to test theories of relative evaluation. We have demonstrated that there is a straightforward way to test the benchmarking hypothesis, which is to avoid composite variables, and to simply enter each term additively in the regression equation. Using such an approach, we can formulate clear tests of the benchmarking hypotheses: all else equal, the benchmark should have a negative marginal effect on support for the incumbent party (or other relevant dependent variables). This simple empirical framework can also be extended in straightforward fashion to test theories of benchmarking with multiple reference points or context conditionality. We hope to have provided clear guidelines for further research on this complex and important question.
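As a rough illustration of the additive approach, the following Python sketch fits both a composite specification (a gap term plus the benchmark) and an additive specification (domestic growth plus the benchmark) by ordinary least squares. It is a minimal sketch under stated assumptions, not the replication code: the data are simulated, and the names `ols`, `compare_specifications`, `g_y`, `g_i` and `vote` are hypothetical. The simulated votes respond to domestic growth only, so there is no benchmarking by construction.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and their estimated covariance matrix."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * xtx_inv


def compare_specifications(n=500, seed=0):
    """Fit the composite and additive specifications on simulated data."""
    rng = np.random.default_rng(seed)
    g_i = rng.normal(2.0, 1.5, n)              # simulated international growth
    g_y = 0.6 * g_i + rng.normal(1.0, 1.5, n)  # simulated domestic growth
    # Assumed data-generating process: votes respond to domestic growth only,
    # so the true marginal effect of the benchmark is zero.
    vote = 40.0 + 1.0 * g_y + rng.normal(0.0, 3.0, n)
    const = np.ones(n)

    # Composite specification: vote ~ (g_y - g_i) + g_i
    theta, cov_theta = ols(np.column_stack([const, g_y - g_i, g_i]), vote)
    # Additive specification: vote ~ g_y + g_i
    delta, cov_delta = ols(np.column_stack([const, g_y, g_i]), vote)

    # Test statistic for H0: theta_yi == theta_i in the composite model.
    var_diff = cov_theta[1, 1] + cov_theta[2, 2] - 2.0 * cov_theta[1, 2]
    return {
        "theta_yi": theta[1],
        "theta_i": theta[2],
        "delta_y": delta[1],
        "delta_i": delta[2],
        "t_wald": (theta[2] - theta[1]) / np.sqrt(var_diff),
        "t_delta_i": delta[2] / np.sqrt(cov_delta[2, 2]),
    }


if __name__ == "__main__":
    for name, value in compare_specifications().items():
        print(f"{name:>10s}: {value: .3f}")
```

The two specifications span the same column space, so the additive coefficient on the benchmark equals the difference between the two composite coefficients, and its t-statistic matches the statistic for the test of coefficient equality. In a simulation like this one, the composite coefficient on the benchmark tends to be positive even though the benchmark has no effect on votes, which is the kind of misreading that the additive specification is meant to avoid.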

Supplementary Material

The data, replication instructions, and codebook can be found in Harvard Dataverse at: http://dx.doi.org/10.7910/DVN/OCNIWD. The online appendices are available at: https://doi.org/10.1017/S0007123418000236

Acknowledgments

We thank Erdem Aytaç, Timothy Hellwig, Michael Lewis-Beck, seminar participants at Texas A&M University, and our regular lunch companions Chez Valère.

Financial Support

This work was supported by the Social Sciences and Humanities Research Council of Canada, and the Fonds de Recherche Société et Culture du Québec.

Related Article

A comment by Mark A. Kayser and Michael Peress, “Benchmarking Across Borders: An Update and Response”, is published in the British Journal of Political Science and can be found here: https://doi.org/10.1017/S0007123418000625.

Footnotes

4 Duch and Stevenson 2010; Fernàndez-Albertos 2006; Hellwig and Samuels 2014.

7 Powell and Whitten 1993.

8 Powell and Whitten 1993, 396.

9 Powell and Whitten 1993, 396.

10 Kayser and Peress 2012, 664.

12 Achen and Bartels 2016; Taber and Lodge 2006.

13 Kayser and Peress 2012.

15 Kayser and Peress 2012.

16 Many proponents of the conventional view would remain agnostic with respect to Figure 1b.

17 For simplicity, we discuss international benchmarking, but if the analyst is interested in historical benchmarking, the relevant performance gap would be $G_y - G_h$.

18 Powell and Whitten 1993.

19 Kayser and Peress 2012, 663.

20 In KP’s dataset the correlation coefficient between $G_y$ and $(G_y - G_i)$ equals 0.83.

22 Kayser and Peress 2012, 669.

23 Goplerud and Schleiter 2016, 444.

24 Ebeid and Rodden 2006, 537–9.

25 Leigh 2009. The quantities are interpreted slightly differently in Leigh’s model, since the author uses a conditional logit model, but the methodological problem remains the same.

27 For a short review of economic benchmarking, see Healy and Malhotra (2013), 296–7.

28 It is easy to show algebraically that $\delta_y = \theta_{yi}$, and that $\delta_i = \theta_i - \theta_{yi}$.

29 Although KP’s model includes a single variable to represent the difference between $G_y$ and $G_i$, it does not offer a single parameter to measure the association between that gap and votes for the incumbent.

30 KP only report the results of the relevant Wald test of coefficient equality for three of their twenty-four empirical models.

31 All models were estimated using our simpler Equation 5, but since Models 3 and 5 are strictly equivalent, our conclusions are unaffected when we estimate the original models.

32 Column 1 corresponds to Table 1, Model 3 in Benchmarking Across Borders. Column 2 corresponds to Table 3, Model 2. Column 3 corresponds to Table 3, Model 5. Column 4 corresponds to Table 3, Model 8.

33 Note that there is no evidence of benchmarking with respect to unemployment, even in this model.

34 Kayser and Peress 2012, 669.

35 Raftery (1995) considers that there is ‘strong’ evidence against a model when its Bayesian Information Criterion (BIC) is 6 to 10 points larger than another model. In Table 2 of their article, KP report that the gap in BIC between the principal components and the median growth models is between 3.2 and 4.4.

36 Kayser and Peress 2012, 680.

37 Kayser and Peress 2012, 680.

38 Kayser and Peress 2012, 662.

39 Kayser and Peress 2012, 670.

40 Kayser and Peress 2012, 669.

41 Our hypothesis that the marginal effect of international growth should be negative is formally equivalent to the ‘partial benchmarking’ hypothesis discussed in KP (2012, 668). According to KP, the appropriate way to test partial benchmarking is to check if a Wald test allows us to reject the null that $\theta_{yi}$ and $\theta_i$ are equal. The p-value that this Wald test produces is identical to the p-value of the $\delta_i$ coefficient in our model.

42 The p-values of the $\delta_i$ coefficient for all twenty-four models are: 0.233, 0.006, 0.904, 0.472, 0.187, 0.713, 0.329, 0.316, 0.479, 0.454, 0.575, 0.957, 0.330, 0.209, 0.233, 0.690, 0.395, 0.223, 0.702, 0.389, 0.246, 0.798, 0.443, 0.321.

44 Healy and Lenz 2014.

45 In the online appendix, we explain why these models do not replicate Aytaç’s faithfully.

46 Lewis-Beck and Stegmaier 2013, 378.

47 Ebeid and Rodden 2006; Kayser and Peress 2012; Powell and Whitten 1993.

48 Powell and Whitten 1993.

50 The interaction models that we report here are slightly different from those estimated by Aytaç (2018). In the online appendix, we take a close look at Aytaç’s interaction specifications. Our discussion highlights some of the pitfalls of testing conditional benchmarking using composite variables and redundant regressors.

51 For comparison, the moderators are all rescaled to the [0, 1] interval.

52 Jérôme, Jérôme-Speziari and Lewis-Beck 2001; Hansen, Olsen and Bech 2015.

References

Achen, CH and Bartels, LM (2016) Democracy for Realists: Why Elections Do Not Produce Responsive Government. Princeton, NJ: Princeton University Press.
Anderson, CJ (2007) The end of economic voting? Contingency dilemmas and the limits of democratic accountability. Annual Review of Political Science 10, 271–296.
Arel-Bundock, Vincent, Blais, André and Dassonneville, Ruth (2018) “Replication Data for: Do Voters Benchmark Economic Performance?”, https://doi.org/10.7910/DVN/OCNIWD, Harvard Dataverse, V1, UNF:6:36YYmCnYskcHu6JKRlV3gw==.
Aytaç, SE (2018) Relative economic performance and the incumbent vote: a reference point theory. Journal of Politics 80 (1):16–29.
Bartels, L (2012) Elections in hard times. Public Policy Research 19 (1):44–50.
Converse, PE (2000) Assessing the capacity of mass electorates. Annual Review of Political Science 3, 331–353.
Duch, RM and Stevenson, R (2010) The global economy, competence, and the economic vote. Journal of Politics 72 (1):105–123.
Ebeid, M and Rodden, J (2006) Economic geography and economic voting: evidence from the US States. British Journal of Political Science 36 (3):527–547.
Fernàndez-Albertos, J (2006) Does internationalisation blur responsibility? Economic voting and economic openness in 15 European countries. West European Politics 29 (3):28–46.
Fiorina, MP (1981) Retrospective Voting in American National Elections. New Haven, CT: Yale University Press.
Goplerud, M and Schleiter, P (2016) An index of assembly dissolution powers. Comparative Political Studies 49 (4):427–456.
Hansen, KM, Olsen, AL and Bech, M (2015) Cross-national yardstick comparisons: a choice experiment on a forgotten voter heuristic. Political Behavior 37 (4):767–789.
Healy, A and Lenz, GL (2014) Substituting the end for the whole: why voters respond primarily to the election-year economy. American Journal of Political Science 58 (1):31–47.
Healy, A and Malhotra, N (2013) Retrospective voting reconsidered. Annual Review of Political Science 16, 285–306.
Hellwig, T and Samuels, D (2014) Voting in open economies: the electoral consequences of globalization. Comparative Political Studies 40 (3):283–306.
Jérôme, B, Jérôme-Speziari, V and Lewis-Beck, MS (2001) Évaluation économique et vote en France et en Allemagne. In Reynié, D and Cautrès, B (eds), L’opinion européenne. Paris: Presses de Sciences Po.
Kayser, MA and Peress, M (2012) Benchmarking across borders: electoral accountability and the necessity of comparison. American Political Science Review 106 (3):661–684.
Key, VO (1966) The Responsible Electorate. Cambridge, MA: Harvard University Press.
Leigh, A (2009) Does the world economy swing national elections? Oxford Bulletin of Economics and Statistics 71 (2):163–181.
Lewis-Beck, MS (1988) Economics and Elections: The Major Western Democracies. Ann Arbor: University of Michigan Press.
Lewis-Beck, MS and Stegmaier, M (2013) The VP-function revisited: a survey of the literature on vote and popularity functions after over 40 years. Public Choice 157 (3/4):367–385.
Powell, BG and Whitten, GD (1993) A cross-national analysis of economic voting: taking account of the political context. American Journal of Political Science 37 (2):391–414.
Przeworski, A, Stokes, SC and Manin, B (1999) Democracy, Accountability, and Representation, 2nd ed. Cambridge: Cambridge University Press.
Raftery, AE (1995) Bayesian model selection in social research. Sociological Methodology 25, 111–163.
Taber, CS and Lodge, M (2006) Motivated skepticism in the evaluation of political beliefs. American Journal of Political Science 50 (3):755–769.
Wolfers, J (2002) Are Voters Rational? Evidence from Gubernatorial Elections. Technical report 1730. Graduate School of Business, Stanford University.
Zaller, J (1992) The Nature and Origins of Mass Opinion. Cambridge: Cambridge University Press.