Arel-Bundock, Blais and Dassonneville (2018, ABD hereafter) offer an unusual critique of our article, ‘Benchmarking across Borders’. They find no methodological flaws, produce identical empirical results and concede that their proposed specification (Model 5) is mathematically identical to that used in Kayser and Peress (2012, KP hereafter). ABD make two claims: (1) that their preferred specification is an innovation that improves interpretation and (2) that the empirical evidence presented in KP does not support benchmarking. The first is unpersuasive and the second depends on a selective reading of the evidence. We address these issues below and update the individual-level dataset from KP to increase statistical power, finding additional evidence of benchmarking.
Value Added?
Let us examine the improvement in clarity from ABD’s innovation. KP define local growth as the deviation of national economic growth from an international benchmark, yielding coefficients on local growth and on the international growth benchmark. This allows us to test three relevant null hypotheses (the null of no-benchmarking, the null of benchmarking and the null of no economic voting) as well as the possibility of partial benchmarking. In the last paragraph of page 668 of KP, we concisely and correctly discuss how to interpret the three relevant null hypotheses and how to interpret the relative size of the local and international (aka global) coefficients as the degree of benchmarking. ABD spend four pages describing how to test only one of the relevant nulls and then present a model (Model 5) with coefficients on un-benchmarked growth ($G_y$) and international growth ($G_i$). Readers confronted with coefficients such as those in ABD’s Table 1 are supposed to know that the coefficient on international growth does not capture the effect of international growth on the vote but, rather, the effect of cross-national benchmarking. This is a dubious improvement to clarity.
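The relationship between the two specifications can be sketched in a few lines of algebra. The notation below is illustrative and is an assumption introduced here, not the notation of KP or ABD: $G_y$ is national growth, $G_i$ is the international benchmark, $\eta$ is the systematic part of the vote model (controls omitted), and $\beta_L$, $\beta_G$, $\gamma_y$, $\gamma_i$ are labels for the coefficients.

$$
\begin{aligned}
\text{deviation form (KP):}\quad & \eta = \beta_L\,(G_y - G_i) + \beta_G\,G_i \\
\text{levels form (ABD, Model 5):}\quad & \eta = \gamma_y\,G_y + \gamma_i\,G_i \\
\text{so that}\quad & \gamma_y = \beta_L, \qquad \gamma_i = \beta_G - \beta_L
\end{aligned}
$$

Read this way, the three nulls discussed in the text map across the two forms as follows:

$$
\begin{aligned}
\text{no-benchmarking:}\quad & \beta_L = \beta_G &&\Longleftrightarrow\quad \gamma_i = 0 \\
\text{benchmarking:}\quad & \beta_G = 0 &&\Longleftrightarrow\quad \gamma_y + \gamma_i = 0 \\
\text{no economic voting:}\quad & \beta_L = \beta_G = 0 &&\Longleftrightarrow\quad \gamma_y = \gamma_i = 0
\end{aligned}
$$

Under this reading, the coefficient on international growth in the levels form is the difference $\beta_G - \beta_L$, not the effect of international growth itself.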
Table 1 Expanded individual-level results for benchmarking in the economic vote
Note: Columns 1–3 report the results for a conditional logit model with standard errors clustered by election study. Columns 4–6 report the results for a conditional logit model with election-choice random effects. +p<0.10, *p<0.05, **p<0.01, ***p<0.001
Nor is ABD’s specification superior on the grounds that it does not require a Wald test of the equality of two coefficients: it instead requires a Wald test for the null of benchmarking and for the null of no economic voting. Moreover, unlike ABD’s specification, the KP specification allows the easy calculation of a point estimate of the degree of partial benchmarking (by dividing the global coefficient by the local coefficient). ABD’s preferred specification is mathematically equivalent, as they acknowledge (for example, in footnote 40), but it is less intuitive for the interpretation that matters and adds no value.
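The equivalence described above can be checked numerically. The following is a minimal sketch on synthetic data, assuming a linear model fitted by ordinary least squares as a stand-in for the conditional logit used in KP. The variable names, the simulated coefficients and the helper functions `ols` and `wald` are all hypothetical and introduced only for illustration.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their classical (homoskedastic) covariance."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * xtx_inv

def wald(beta, cov, r):
    """Wald statistic for the single linear restriction r @ beta = 0."""
    r = np.asarray(r, dtype=float)
    return float((r @ beta) ** 2 / (r @ cov @ r))

# Synthetic data only: none of these numbers come from KP or ABD.
rng = np.random.default_rng(0)
n = 500
g_y = rng.normal(2.0, 1.5, n)   # hypothetical national growth
g_i = rng.normal(2.0, 1.0, n)   # hypothetical international benchmark
local = g_y - g_i               # local growth as a deviation from the benchmark
vote = 1.0 * local + 0.2 * g_i + rng.normal(0.0, 1.0, n)

const = np.ones(n)
# Deviation form: intercept, local growth, international growth.
a, cov_a = ols(np.column_stack([const, local, g_i]), vote)
# Levels form: intercept, national growth, international growth.
b, cov_b = ols(np.column_stack([const, g_y, g_i]), vote)

# No-benchmarking null: an equality test in the deviation form,
# a single-coefficient test in the levels form.
w_nobench_dev = wald(a, cov_a, [0, 1, -1])
w_nobench_lev = wald(b, cov_b, [0, 0, 1])
# Benchmarking null: a single-coefficient test in the deviation form,
# a sum restriction in the levels form.
w_bench_dev = wald(a, cov_a, [0, 0, 1])
w_bench_lev = wald(b, cov_b, [0, 1, 1])

print("levels coefficient on international growth:", b[2])
print("global minus local coefficient:", a[2] - a[1])
print("no-benchmarking Wald statistics:", w_nobench_dev, w_nobench_lev)
print("benchmarking Wald statistics:", w_bench_dev, w_bench_lev)
```

In this sketch each null yields the same Wald statistic in either form. What differs is which null reduces to a test on a single coefficient and which needs a linear restriction across two coefficients.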
Mechanism
ABD additionally argue that ‘when correctly interpreted, the results [in KP] do not support the contention that voters make rational comparative evaluations’. They arrive at this broad-stroke claim, however, by impugning an admittedly improbable mechanism that we do not espouse, neglecting the evidence for the mechanism that we do espouse, ignoring additional evidence of the effect that we report, and then testing only one possible null. Consider the mechanism. ABD consider implausible the assumption that voters are sufficiently informed and rational to assess foreign growth rates, compare them to those in their country and hold politicians accountable. We also do not hold this to be the most plausible mechanism, which is why we develop and test a much different mechanism in KP that ABD barely engage.
We believe that voters (both low and high information) learn about economic growth primarily through media coverage, and that the most likely way that voters incorporate expectations is through consuming media that already contains this information. The media pre-benchmark economic news by reporting more positively (negatively) on the economy when it outperforms (underperforms) the international economy. We explicitly develop and refer to this pre-benchmarking mechanism throughout our article and test it empirically with newspaper data (KP, 679–80, Table 7). KP also present evidence that low- and high-information voters engage in similar amounts of benchmarking, which is not consistent with a sophisticated voter mechanism for benchmarking (KP appendix, pp. 3–9).[Footnote 1] ABD, in turn, critique a straw man mechanism of hyper-informed and sophisticated voters that we did not propose. Addressing a tested mechanism rather than an inferred caricature would be a fairer way to assess an argument.
The Evidence for Benchmarking
ABD are no less selective in choosing how to interpret results and which empirical evidence to engage. Focusing solely on statistical tests of the equality of local and international growth effects, ABD ignore more than just our evidence in support of the media mechanism (Table 7 in KP) and against the voter sophistication mechanism for benchmarking. They also neglect KP’s finding that un-benchmarked, relative to benchmarked, growth has a less stable effect over time and across political institutions (Figure 2 and Table 4 in KP). The point estimates in almost all of our models for the degree of (partial) benchmarking suggest substantial benchmarking. The coefficients on global growth are usually near zero, and the coefficients on local growth are large, positive and statistically significant, but the two coefficients are statistically distinguishable only in the aggregate data (KP, Table 1) and in the test of the media mechanism (KP, Table 7). So, should we be troubled by the failure to reject the no-benchmarking null elsewhere?
Given that the individual-level sample in KP includes only thirty-one independent observations of the economy – one for each election study – the most likely explanation for the failure to reject the null of no-benchmarking is a lack of statistical power. We added the analysis of aggregate-level data in KP to see whether the no-benchmarking null could be rejected in a sample with more independent observations of the economy – it was (KP Table 1) – and whether the overall pattern in the coefficients would hold up – it did. That the coefficients on local and global growth in KP are not statistically differentiable at standard levels outside of Tables 1 and 7 could arise from voter behavior or simply from low statistical power.
Fortunately, in one of our two samples, new data and new methods allow us to markedly increase statistical power. Since KP, the Comparative Study of Electoral Systems (CSES) project has released Modules 3 and 4, which increase the number of elections and independent economic observations in our individual-level sample from thirty-one to fifty-eight. In addition, we previously employed a conditional logit model with clustered standard errors, which is conservative because it allows for an arbitrary within-cluster correlation structure. Given the nature of the CSES sample, the data are exchangeable within clusters, and thus can also be analyzed using a conditional logit model with election-choice random effects, further improving power.
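A rough back-of-the-envelope calculation indicates how much the larger sample alone might tighten the estimates. It rests on an assumption that is not tested here: that the standard error of an election-level coefficient shrinks with the inverse square root of the number of independent election-level observations $N$, all else equal. It is not a result reported in KP, and it leaves aside any further gain from the random effects estimator.

$$
\frac{\mathrm{SE}_{N=58}}{\mathrm{SE}_{N=31}} \approx \sqrt{\frac{31}{58}} \approx 0.73
$$

Under that assumption, moving from thirty-one to fifty-eight elections would reduce standard errors by roughly a quarter.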
Table 1 reports the results from running conditional logit models with clustered standard errors (as in KP) and election-choice random effects on the updated CSES election sample.[Footnote 2] The conditional logit model with clustered standard errors yields similar coefficients to KP, with reduced standard errors for global growth. The Wald test reports that the local and global growth coefficients are now statistically distinguishable using the median measure. When applying the random effects estimator, the coefficients are significantly different for both the median and principal components measures of international growth. Neither model using the trade measure of international growth can reject the no-benchmarking null, but the model results with the trade measure were also the weakest in KP, suggesting that a country’s five largest trade partners may not be the most relevant benchmark.[Footnote 3] These results show that ABD’s solitary critique of KP – the failure to reject the null of no-benchmarking outside of Tables 1 and 7 of KP – is nothing more than a power issue.
Conclusion
ABD paint a dire picture of the state of benchmarking research. However, they critique a straw man mechanism, neglect an explicitly developed and tested mechanism, ignore evidence in support of benchmarking, and selectively interpret any ambiguity as evidence for their preferred theory.[Footnote 4] Their positive contribution is a mathematically identical specification that requires readers to interpret the coefficient on its key variable, international growth, as something other than the effect of international growth. A correct reading of the evidence in KP suggests that benchmarking is supported and no-benchmarking is not. Analysis of an extended individual-level sample provides additional evidence in support of benchmarking.
Supplementary Material
Data replication sets can be found in Kayser and Peress (2018), Harvard Dataverse, at: https://doi.org/10.7910/DVN/MX5NIV
Related Article
This is a comment on a letter by Vincent Arel-Bundock, Andre Blais and Ruth Dassonneville, “Do Voters Benchmark Economic Performance?” published in the British Journal of Political Science and available here: doi:10.1017/S0007123418000236