Introduction
Introducing a recent symposium on data use in comparative politics, Mudde and Schedler (2010: 411) dishearteningly note that although we have recently witnessed
vigorous and sophisticated debates on the quality of cross-national data … we still do not see evaluations of data quality appear in academic journals on a regular basis, and uncertainty over the quality of the cross-national data we use continues to be pervasive in many areas of research.
A number of influential articles and books have indeed targeted the quality of cross-national data sets (e.g. Adcock and Collier, 2001; Munck and Verkuilen, 2002; Goertz, 2005). Systematic templates for assessing and guiding both the data-generating process and the content validity of the data do exist. Arguably, however, the proposed guidelines have filtered into the actual practices of those analyzing comparative politics only to a limited extent (Munck, 2009, 2010). One of the most striking examples is to be found in the rule of law research agenda.
Over recent decades, the rule of law has become one of the most celebrated concepts in politics and academia alike. At times, the underlying phenomenon is construed as something close to a panacea for developing countries, politically as well as economically (Tamanaha, 2004: 1–6; The Economist, 2008). Yet, paradoxically, this interest in the rule of law has in no way been accompanied by a consensus on how the concept should be defined (Tamanaha, 2004; Belton, 2005; Møller and Skaaning, 2010; Skaaning, 2010).
Consequently, scholars analyze very different things while maintaining that they deal with the rule of law. This is probably one reason why empirical analyses of the causes and consequences of the rule of law have produced strikingly dissimilar results, depending on which rule of law measure has been used (see, e.g. Knack, 1996; Norton, 1998; Barro, 2000; Ali, 2003; Andrews and Montinola, 2004; Joireman, 2004; Butkiewicz and Yanikkaya, 2006). In this paper, we pursue this observation systematically and demonstrate that the present lack of equivalence in cross-national studies of the rule of law mirrors the fact that the most influential rule of law measures are not interchangeable.
This is probably the right place to note that we refrain from assessing the measurement validity (Adcock and Collier, 2001) and ‘concept-measure consistency’ (Goertz, 2005) of the individual indices, as such exercises presuppose precisely what is missing, namely a widespread consensus about the definition of the rule of law. Instead, we heed Mudde and Schedler's (2010: 412–413) call to carry out two specific sets of assessments of the ‘substantive implications’ of using different data sets. First, we assess the sensitivity of descriptive inferences to data selection. Second, we assess the sensitivity of causal inferences to data selection. On this basis, we develop some more general analytical points about the ‘good governance’ research field and the problem of cumulativity in the Conclusion.
Posing the questions
More precisely, we pose four questions. First, do the dominant measures of the rule of law differ with respect to the conceptualization, that is, as regards the defining attributes? Second, do the measures correlate with each other? Third, do the measures lend support to the same explanatory factors derived from the literature on the rule of law? Finally, do non-random patterns of missing data introduce bias with respect to the investigated explanatory factors in one or more of the indices?
Before attempting to answer these questions, however, the present lack of consensus needs to be addressed. This is by no means unique to the rule of law research agenda. Most sub-fields in the social sciences are characterized by similar definitional controversies, which suggests, to some extent, that a definitional consensus is not to be expected. Nor is the lack of consensus necessarily a problem, insofar as the scholarly community is conscious of the lack of interchangeability. If scholars carefully opt for the rule of law index that best accords with their stipulated definition of the rule of law, and if their conclusions clearly state what has in fact been explained or used as an explanatory variable, the problem of interchangeability disappears. Indeed, if analysts conscientiously choose their measures with reference to what Adcock and Collier (2001) term content validity, the present lack of consensus may even benefit research, as more dimensions of the rule of law are scrutinized.
However, such conscientious treatment of indices is not the norm at present. Most researchers simply opt for the index that best fits their data requirements without discussing its content validity. Bolaky and Freund (2006: 23, fn. 16) thus choose the Law and Order Index (Political Risk Services (PRS) below) rather than that of the World Bank (Worldwide Governance Indicators (WGI) below) for the sole reason that the latter does not contain year-by-year data for the period from the early 1990s onwards. Burnside and Dollar (2004: 9) also select the Law and Order Index, as a proxy for institutional quality, because it is ‘available for many developing countries going back to the 1980s’. Even more tellingly, when describing their preferred measure of the dependent variable (i.e. rule of law), Andrews and Montinola (2004: 72) merely note that ‘PRS provides the most complete data for the period of interest’. Knack (2002: 12) captures the standard practice well when he observes of the Law and Order Index that:
Because of its much better cross-country coverage … the ICRG [PRS] indicators have been the most widely used governance indicators in the cross-country empirical literature on economic performance.
The list could be extended to include studies that do not explicitly cite coverage as the criterion for their choice of data set but, rather, remain silent (and perhaps agnostic) on the criteria for selecting a particular measure, or simply point out that the measure employed has been used in previous studies.Footnote 1
When data convenience rather than content validity becomes the norm, non-interchangeability is indeed problematic. This is further accentuated by the fact that most analyses refrain from using several indices to test the robustness of their findings, which means that the results hinge completely on the chosen index.Footnote 2 In addition, the problem is compounded because many of the indices scrutinized in this article, even though they were selected partly for their wide coverage, exclude different parts of the world, and tend to do so in non-random ways. Bearing this in mind, we set out to assess whether the dominant measures are interchangeable.
Selecting rule of law indices
Among readily available data sets, we have identified seven rule of law measures, based on three criteria. First, they target the actual (de facto) level of the rule of law rather than formal (de jure) commitments. Second, they primarily build on experts’ assessments in the form of standards-based data.Footnote 3 Third, with the exception of the Ibrahim Index (see below), they cover the contemporary level of the rule of law in at least a majority of the world's countries.
The seven measures are presented in Table 1. The first is the rule of law sub-index from the Bertelsmann Transformation Index (BTI; 2006), which covers almost all non-OECD countries with more than two million inhabitants. The second is Freedom House's (FH; 2006) sub-category scores for the rule of law found in the Freedom in the World survey. The third is the law and order index provided by Political Risk Services (PRS) in its International Country Risk Guide (2007). The fourth is the aggregate rule of law indicator constructed by Kaufmann et al. (2007) for the Worldwide Governance Indicators (WGI), which is based on many different data sources, including most of the other measures mentioned. The fifth is the sub-category scores for legal structure and security of property rights included in the Fraser Institute's (FI; 2009) Economic Freedom of the World data. The sixth is the property rights measure that forms part of the Index of Economic Freedom provided by the Heritage Foundation (HF; 2009) and the Wall Street Journal. The seventh and final measure is the Ibrahim Index of African Governance provided by the Mo Ibrahim Foundation (IF; 2010).
BTI = Bertelsmann Transformation Index; FH = Freedom House; PRS = Political Risk Services; WGI = Worldwide Governance Indicators; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation.
Generally, these indices cover different years and differ significantly with regard to the number of countries included. The information presented in the table applies to 2005, which is employed in the subsequent analyses because it is one of the very few years that all seven measures cover. Henceforth, we use the abbreviations of Table 1 to refer to the respective measures.
With the partial exception of the Ibrahim index,Footnote 4 the seven measures arguably suffer from what may be termed an ‘OECD bias’, as they measure the world from the point of view of the well-functioning and affluent OECD countries.Footnote 5 Critics have pointed this out on a number of occasions. FH has thus been accused of having a right-wing bias and of overstating the level of freedom in ‘US-friendly’ countries (Chomsky and Herman, 1988; Bollen, 1992: 205; Barahona, 2007; Giannone, 2010). Likewise, the BTI has been accused of being ahistorical and Eurocentric, which implies that post-colonial countries are not assessed properly (Koelble and LiPuma, 2008). A similar charge has been leveled against the World Bank, which has been accused of supporting Western interests in general, and US business interests specifically, in pushing for a neo-liberal agenda (Uvin, 2002; Moore, 2007). Finally, the measures of HF and FI have been said to have a libertarian bent (Card and Freeman, 2002: 3; cf. Ashby and Sobel, 2008: 332).
At the same time, and as demonstrated in the conceptual appraisal below, the indices measure starkly different things. One important insight of this paper is precisely that two distinct clusters of indices can be identified. Some indices (FH and BTI) seem to target what is best termed ‘political constitutionalism’, whereas others (PRS, HF, and FI) seem to measure ‘public order’. Most strikingly, a conceptual slide occurs when the protection of ‘property rights’ is understood as a measure of the rule of law. At most, such rights constitute one sub-component of the more general concept. This also comes out in Table 2, which illustrates the differences in defining attributes among the seven measures. One might thus, somewhat paradoxically, simultaneously question whether the indices are too uniformly Eurocentric and whether they simply measure different things, such as distinct aspects of the overarching concept of good governance rather than the rule of law.
BTI = Bertelsmann Transformation Index; FH = Freedom House; PRS = Political Risk Services; WGI = Worldwide Governance Indicators; IF = Mo Ibrahim Foundation.
Note: Where the data providers have not provided an explicit definition, we present the indicators.
Neither of these two problems, however, affects the objective of this paper. As mentioned, our aim is not to take stock of the validity of the rule of law measures, or of governance measures in general for that matter (see, e.g. Arndt and Oman, 2006; Thomas, 2007; Williams and Siddique, 2008), but to scrutinize their interchangeability. Fraser's and Heritage's property rights measures, for instance, are repeatedly used as indicators of the rule of law in the literature (cf. Munck, 2003; Ríos-Figueroa and Staton, 2008), which is why we include them in this appraisal. More generally, we have attempted to include the most prominent indices that are actually employed as proxies for the rule of law in contemporary analyses. Using Adcock and Collier's (2001) valuable distinction, we compare indices that are based on different ‘systematized concepts’ while sharing the same ‘background concept’ (the rule of law). That we are not really comparing like with like is thus part of the problem that calls for assessment in the first place.
A preliminary conceptual appraisal
The literature largely uses these indices as if they were interchangeable. The first question is whether this is conceptually warranted. As Table 1 makes clear, only four of the indices (BTI, FH, IF, and WGI) actually use the term ‘rule of law’; the fifth (PRS) opts instead for the more ambiguous ‘law and order’, whereas the sixth (FI) and seventh (HF) focus on the legal structure and/or property rights. These differences in nomenclature indicate that the measures are not interchangeable even at the conceptual level. This tentative conclusion is borne out as soon as we descend to the level of the defining attributes, illustrated in Table 2.
As this overview makes clear, little or no consensus on the conceptualization of the rule of law exists when we focus on the actual defining attributes underpinning the indices. The rule of law research agenda thus differs significantly from the related democratization research agenda. To illustrate, consider Casper and Tufis (2003), who demonstrate that three prominent measures of democracy are not genuinely interchangeable. However, Casper and Tufis (2003: 197) also show that all of these indices start from Dahl's (1971) definition of polyarchy and, consequently, that they are highly correlated (correlation coefficients between 0.85 and 0.92). Not even the first of these two premises is fulfilled in our case. In brief, the rule of law research agenda has not settled on a particular systematized concept.
Correlations between the indices
What about the second premise of Casper and Tufis’ (2003) analysis, namely that the indices are highly correlated?Footnote 6 Considering the lack of consensus on the systematized concept, one would expect the correlations among rule of law measures to be considerably lower and less consistent. This is indeed the case, as illustrated in Table 3.
BTI = Bertelsmann Transformation Index; FH = Freedom House; PRS = Political Risk Services; WGI = Worldwide Governance Indicators; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation.
Note: Kendall's tau-b correlation coefficients, N in parentheses.
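Table 3 reports Kendall's tau-b for each pair of indices, with the number of jointly covered countries in parentheses. The sketch below shows one way such a pairwise entry can be computed in plain Python. It is an illustration under stated assumptions, not the authors' procedure: scores are assumed to be aligned by country, and countries lacking a score on either index are assumed to be dropped for that pair only (consistent with the varying N in the table, but not stated in the text). The function names and example scores are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as a missing country score."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def kendall_tau_b(x: Sequence[Optional[float]],
                  y: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Return (tau_b, N) for two index score lists aligned by country.

    Countries missing on either index are dropped for this pair only,
    so N can differ from one pair of indices to the next.
    """
    pairs: List[Tuple[float, float]] = [
        (a, b) for a, b in zip(x, y) if not (_is_missing(a) or _is_missing(b))
    ]
    n = len(pairs)
    s = 0         # concordant minus discordant country pairs
    untied_x = 0  # country pairs not tied on x
    untied_y = 0  # country pairs not tied on y
    for i in range(n):
        for j in range(i + 1, n):
            dx = (pairs[i][0] > pairs[j][0]) - (pairs[i][0] < pairs[j][0])
            dy = (pairs[i][1] > pairs[j][1]) - (pairs[i][1] < pairs[j][1])
            s += dx * dy
            untied_x += dx != 0
            untied_y += dy != 0
    if untied_x == 0 or untied_y == 0:
        return float("nan"), n  # undefined when one index is constant
    return s / math.sqrt(untied_x * untied_y), n


# Hypothetical scores for five countries; the fourth lacks a score on index A.
index_a = [2.0, 4.0, 4.0, None, 6.0]
index_b = [1.0, 3.0, 2.0, 5.0, 4.0]
tau, n = kendall_tau_b(index_a, index_b)
```

The tie correction in the denominator is what distinguishes tau-b from the simpler tau-a; it matters for indices reported on coarse point scales, where many countries share the same score. The quadratic pair loop is adequate for samples of a few hundred countries.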
The seven rule of law measures correlate in the range of 0.21 to 0.80 (Kendall's tau-b). Two observations stand out. First, the correlation between BTI and FH and those among WGI, FI, and HF are very high. Second, FH, BTI, and PRS show a relatively low degree of co-variation with a number of the other measures.
These patterns also emerge in a factor analysis using principal component extraction, which yields two components with eigenvalues above 1, accounting for 65% and 20% of the variance, respectively.Footnote 7 BTI, FH, and PRS obtain the lowest loadings on the first component. Moreover, PRS has a comparatively high positive loading on the second component, whereas BTI and FH have moderately high negative loadings on it.
The consistently high correlations between the WGI measure and all the other indices are hardly surprising, since the latter make up important parts of the former (or vice versa, as is to some extent the case with FI). Beyond that, however, only the correlation between BTI and FH reaches a similar level. Most striking are the low correlations of PRS with FH and BTI (0.11 and 0.36, respectively) and of BTI with FI and HF (0.40 and 0.51, respectively).
This indicates that some of the indices do not measure the same empirical phenomenon. To be more precise, two or even three clusters seem to exist. First, the FH and BTI indices correlate strongly. Second, the same is true of FI and HF. Finally, WGI and IF can be linked with both clusters for the simple reason that they largely subsume the other measures.
To probe these patterns further, we follow Adcock and Collier's (2001: 540–541) recommendation to assess the correlation between the measures and measures of neighboring, yet distinct, concepts. The most obvious choice is a measure capturing a purely Schumpeterian definition of democracy, that is, one that only includes bare-bones electoral aspects.Footnote 8 We use the sub-component ‘electoral self-determination’ from the CIRI Human Rights Data SetFootnote 9 to capture such a minimalist conception of democracy. The results are reported in the first row of Table 4.
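The Kaiser rule mentioned above (retaining components whose eigenvalue exceeds 1) and the associated variance shares can be sketched with NumPy as follows. This is a hedged illustration rather than a reconstruction of the authors' analysis: the text does not state which correlation matrix the components were extracted from or how missing values were handled, so the sketch assumes complete cases and a Pearson correlation matrix, and it runs on simulated data rather than the actual indices. The function name is hypothetical.

```python
import numpy as np


def kaiser_pca(scores):
    """Principal components retained under the Kaiser rule (eigenvalue > 1).

    `scores` is a countries-by-indices array with no missing values.
    Returns (eigenvalues, variance_shares, loadings) for retained components.
    """
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues > 1.0
    # Eigenvalues of a correlation matrix sum to the number of indices,
    # so each share equals eigenvalue / number of indices.
    shares = eigenvalues / eigenvalues.sum()
    # Loadings scale each eigenvector by sqrt(eigenvalue). The overall sign of
    # a component is arbitrary; only signs relative to other indices matter.
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues[keep], shares[keep], loadings


# Simulated example: two pairs of indices, each pair driven by its own factor.
rng = np.random.default_rng(0)
order_factor = rng.normal(size=200)
constitution_factor = rng.normal(size=200)
simulated = np.column_stack([
    order_factor, order_factor + 0.3 * rng.normal(size=200),
    constitution_factor, constitution_factor + 0.3 * rng.normal(size=200),
])
eigs, shares, loadings = kaiser_pca(simulated)
```

If the reported analysis was run on the correlation matrix of all seven indices, shares of 65% and 20% correspond to eigenvalues of roughly 0.65 × 7 = 4.55 and 0.20 × 7 = 1.4, both above the Kaiser threshold. Because component signs are arbitrary, a "negative loading" for BTI and FH is only meaningful relative to the positive loading of PRS on the same component.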
WGI = Worldwide Governance Indicators; PRS = Political Risk Services; FH = Freedom House; BTI = Bertelsmann Transformation Index; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation; CIRI = The Cingranelli & Richards Human Rights Dataset.
Note: Kendall's tau-b correlation coefficients, N in parentheses.
Two things are of interest here. First, PRS is once again the odd one out, as its correlation with electoral self-determination is a mere 0.08. Second, for FH and BTI, the correlations are on a par with the equivalent correlations among the rule of law indices (except that between FH and BTI). This suggests that the empirical convergence of the rule of law indices is relatively low. In a nutshell, different measures of the same concept should correlate more highly with each other than with measures of distinct concepts, even if the same factors cause the phenomena denoted by the distinct concepts (which is quite plausible in the case of democratic elections and the rule of law).
This exercise can also be used in a more constructive way. To spell out the distinction between the identified clusters (FH and BTI vs. FI and HF vs. PRS), Table 4 also contains a number of other measures affiliated with the rule of law concept: freedom of speech and freedom of assembly and association, as measured by further CIRI sub-components; violations of personal integrity rights (such as killings and torture), as measured by the Political Terror Scale provided by Gibney, Cornett, and Wood;Footnote 10 the level of criminality and violent crime, as measured by two sub-components of the Global Peace Index;Footnote 11 and corruption, as measured by the Corruption Perceptions Index compiled by Transparency International.Footnote 12 All data refer to 2005, except the two Global Peace Index measures of crime, which refer to 2007, the first year for which data were available.
The objective of these additional tests is to establish which particular dimensions each of the rule of law indices appears to emphasize. The picture revealed by Table 4 is fairly straightforward. The FH–BTI cluster correlates relatively strongly with the political rights of expression, assembly/association, and free elections, but relatively weakly with the measures of crime and corruption. The reverse holds for the FI–HF cluster. Hence, FH and BTI seem to tap into what could be termed political constitutionalism, whereas FI and HF tap into what is probably best termed public order. Unsurprisingly, the latter tendency is even more pronounced for PRS, whose ‘law and order’ label thus seems apt. Finally, IF and WGI correlate strongly on both dimensions but, relatively speaking, seem somewhat closer to the order dimension than to the constitutionalism dimension.
This information not only underlines that the measures tap different latent dimensions, thereby calling their interchangeability into question; it should also help scholars choose the measures best suited to the definition of the rule of law they are concerned with. Insofar as the rule of law is equated with political constitutionalism, FH and BTI seem the superior options. If, au contraire, one is on the lookout for a measure of order, PRS probably has the competitive edge, followed by FI and HF, respectively. Finally, should one wish to capture a very encompassing definition of the rule of law (an outlook that clashes with our recommendations in the Conclusion), the composite (and muddled) measures of WGI or IF are preferable.
Correlations with explanatory variables
The assessment carried out above shows that the rival measures seem to tap into the same empirical phenomenon only to a limited extent. This assertion should, of course, be qualified: it holds less strongly for the pairwise comparisons of BTI and FH and of FI and HF, and WGI seems to be associated with both clusters. Still, the correlations show that the indices are not interchangeable tout court.
What is more, Casper and Tufis (2003) have demonstrated that even highly correlated democracy measures may produce different explanatory results when used as dependent variables in cross-national analyses (see also Bollen and Paxton, 2000; Hadenius and Teorell, 2005). One would a fortiori expect this problem to be more pronounced for the rule of law measures.
To carry out this third test of the interchangeability of the indices, we have derived a number of explanatory variables from studies of cross-national differences in compliance with the rule of law (the operationalizations of these variables are presented in Appendix A). These are: oil production (Barro, 2000; Hansson and Olsson, 2006), wealth (Barro, 2000; Joireman, 2004), country size (Hansson and Olsson, 2006), ethno-religious fractionalization (Weingast, 1997; Hayo and Voigt, 2005; Hansson and Olsson, 2006), legal system (Hayek, 1973; Eisenberg, 1988; Joireman, 2004), communist past (Hoff and Stiglitz, 2004; Sandholtz and Taagepera, 2005), and religion (Barro, 2000; Hayo and Voigt, 2005).
This exercise does not depend on including all theoretically relevant independent variables. The objective is not to explain the causes of the rule of law but to test whether the results are relatively similar or dissimilar when the seven measures are used interchangeably as dependent variables in multiple OLS regression analyses.Footnote 13 To ease interpretation, all measures have been rescaled to range from 0 (lowest level of rule of law) to 100 (highest level of rule of law).
To investigate whether the results differ depending on case coverage, we have run two sets of regressions: first, for all countries included in each of the data sets (Table 5) and, second, for the 116 countries included in fiveFootnote 14 of the seven indices (Table 6). The rule of law data once again refer to 2005.
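The text does not say how the measures were rescaled to the common 0 to 100 range. A common choice, assumed in the sketch below, is a linear transformation from each index's nominal endpoints, with the scale reversed for any index on which low values denote more rule of law. The helper name and the example scales are hypothetical.

```python
def rescale_0_100(value: float, scale_min: float, scale_max: float,
                  higher_is_better: bool = True) -> float:
    """Map a score linearly from [scale_min, scale_max] onto [0, 100].

    100 always denotes the highest level of the rule of law; set
    higher_is_better=False for an index where low values are better.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    share = (value - scale_min) / (scale_max - scale_min)
    if not higher_is_better:
        share = 1.0 - share  # flip so that 100 still means "most rule of law"
    return 100.0 * share


# Hypothetical index scored from 0 to 16: a score of 8 becomes 50.0.
print(rescale_0_100(8, 0, 16))
# Hypothetical reversed 1-to-7 index: the best score (1) becomes 100.0.
print(rescale_0_100(1, 1, 7, higher_is_better=False))
```

Using nominal endpoints, rather than the minimum and maximum observed in a given sample, keeps each country's rescaled score independent of which other countries happen to be covered, which matters given the coverage differences examined in this paper. A linear rescaling changes the size of unstandardized coefficients but not their sign or significance.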
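The second set of regressions restricts every model to one common set of countries. Assuming this means keeping only countries with a non-missing score on each of the five indices (the text does not describe the procedure further), the restriction amounts to a set intersection, sketched below with hypothetical index names, country codes, and scores.

```python
from typing import Dict, Optional, Set


def common_sample(coverage: Dict[str, Dict[str, Optional[float]]]) -> Set[str]:
    """Return the countries with a non-missing score on every index.

    `coverage` maps an index name to {country code: score or None}.
    """
    covered = [
        {country for country, score in scores.items() if score is not None}
        for scores in coverage.values()
    ]
    return set.intersection(*covered) if covered else set()


# Hypothetical scores for three made-up countries on two indices.
example = {
    "index_a": {"AAA": 70.0, "BBB": None, "CCC": 40.0},
    "index_b": {"AAA": 65.0, "BBB": 55.0},
}
print(sorted(common_sample(example)))  # ['AAA']
```

A country absent from an index's dictionary and a country present with `None` are treated alike, as not covered.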
WGI = Worldwide Governance Indicators; FH = Freedom House; BTI = Bertelsmann Transformation Index; PRS = Political Risk Services; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation.
*P < 0.1, **P < 0.05, ***P < 0.01 (two-tailed test). Results refer to regressions with all countries included.
Note: Unstandardized coefficients reported with (heteroscedasticity-consistent) robust standard errors in parentheses.
WGI = Worldwide Governance Indicators; FH = Freedom House; PRS = Political Risk Services; FI = Fraser Institute; HF = Heritage Foundation.
*P < 0.1, **P < 0.05, ***P < 0.01 (two-tailed test). Results refer to regressions only including countries covered by all five measures.
Note: Unstandardized coefficients reported with (heteroscedasticity-consistent) robust standard errors in parentheses.
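The table notes above specify unstandardized OLS coefficients with heteroscedasticity-consistent standard errors. The text does not say which variant was used; the NumPy sketch below assumes HC1 (the HC0 "sandwich" estimator scaled by n/(n - k)) and fits one model for one rule of law measure. Repeating it for each measure with identical regressors mirrors the design of Tables 5 and 6 in outline only. Function and variable names are hypothetical, and the data are simulated.

```python
import numpy as np


def ols_hc1(y, X):
    """OLS coefficients and HC1 robust standard errors.

    y: length-n outcome (e.g. a rule of law measure rescaled to 0-100).
    X: n-by-p matrix of explanatory variables; a constant is prepended.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over countries of e_i^2 * x_i x_i'.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = (n / (n - k)) * (xtx_inv @ meat @ xtx_inv)
    return beta, np.sqrt(np.diag(cov))


# Simulated data in which the error variance grows with the regressor.
rng = np.random.default_rng(1)
wealth = rng.uniform(0, 10, size=150)
rule_of_law = 20 + 5 * wealth + rng.normal(scale=1 + wealth)
beta, robust_se = ols_hc1(rule_of_law, wealth[:, None])
```

The robust correction changes only the standard errors, and hence the significance stars, not the coefficients themselves. For real analyses a maintained library (for instance statsmodels, which offers HC0 to HC3 covariance types) is preferable to hand-rolled matrix algebra.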
When including all countries, the picture is as follows. A few explanatory variables consistently show a significant (wealth) or non-significant (ethno-religious fractionalization, common law) association with the rule of law. However, the remaining variables exhibit stark dissimilarities depending on the measure used.
The statistically significant associations for some variables (not colonized, communist past, Muslim) even point in opposite directions. Using BTI, we find a negative relationship with never having been colonized, whereas we find a positive relationship when relying on WGI, PRS, and FI. Likewise, a communist past has a significant negative relationship with the rule of law using WGI and HF but a positive relationship using PRS. Finally, being a Muslim country has a strongly significant negative effect on the rule of law using BTI and FH but, in contrast, a significant (albeit weaker) positive effect when employing PRS, FI, and HF.
In sum, the association between the explanatory variables and the rule of law ranges from positive to non-existent to negative, depending on which index is used to operationalize the rule of law. Most disturbing are the significant positive associations of PRS with a communist past and with a Muslim population, associations that completely contradict the theoretical expectations of the studies from which we derived these factors.
What happens when we narrow the focus to the 116 countries covered by all the indices (save BTI, which is excluded as it does not cover the OECD countries, and IF, which is excluded as it only covers African countries)? As illustrated in Table 6, the results are somewhat more consistent with one another. A communist past and Muslim roots still produce contradictory significant associations, but less markedly than in Table 5. This indicates that some of the present problems in the literature could be alleviated by confining one's attention to the countries covered by more data sets, that is, by more self-conscious restrictions on the empirical scope of the inquiry.
Non-random patterns of missing data
Such an empirical restriction of the scope of comparison may, however, carry problems of its own. In an assessment of measures of judicial independence, Ríos-Figueroa and Staton (2008) identify systematic patterns of missing data across a selection of indices. Given their distinct empirical coverage, one might expect the rule of law measures included in our assessment to suffer from the same problem.
To see whether this is so, we use the simple test of non-random ‘missingness’ devised by Ríos-Figueroa and Staton (2008). For each of the independent variables included in the previous regression analyses, we compare the average scores of countries that are included in and countries that are missing from four of the data sets, viz. PRS, BTI, FI, and HF. FH and WGI are not assessed because they cover virtually all independent countries of the world, meaning that practically no country data are missing; IF is excluded because of its focus on one region only.
The question is whether there is a statistically significant difference between the two groups (covered vs. missing) on each of the independent variables. If so, any analysis that restricts its scope according to data availability is likely to produce biased estimates. The results of the missingness analysis are reported in Table 7.
PRS = Political Risk Services; BTI = Bertelsmann Transformation Index; FI = Fraser Institute; HF = Heritage Foundation.
*P < 0.1, **P < 0.05, ***P < 0.01 (two-tailed test).
Note: Entries are average differences (subtracted means in parentheses) between countries that are covered and missing, respectively, by the rule of law measures.
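The covered-versus-missing comparison described above can be sketched as a two-sample difference-of-means test. The text does not spell out the exact test statistic, so the sketch assumes Welch's unequal-variance t-test, computed with the Python standard library; a two-tailed p-value additionally requires the t distribution (for instance `2 * scipy.stats.t.sf(abs(t_stat), dof)`). The function name and example values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def missingness_test(covered: Sequence[float],
                     missing: Sequence[float]) -> Tuple[float, float, float]:
    """Welch comparison of one explanatory variable across two country groups.

    Returns (difference in means, t statistic, degrees of freedom), with the
    difference taken as covered minus missing; each group needs >= 2 countries.
    """
    v_cov = variance(covered) / len(covered)  # squared standard error, covered
    v_mis = variance(missing) / len(missing)  # squared standard error, missing
    diff = mean(covered) - mean(missing)
    t_stat = diff / math.sqrt(v_cov + v_mis)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (v_cov + v_mis) ** 2 / (
        v_cov ** 2 / (len(covered) - 1) + v_mis ** 2 / (len(missing) - 1))
    return diff, t_stat, dof


# Hypothetical values of a wealth variable for covered and missing countries.
diff, t_stat, dof = missingness_test([7.1, 7.8, 6.9, 7.4], [9.2, 10.1, 9.6])
```

With this sign convention, a negative difference means that the covered countries score lower on the variable than the missing ones, which is the pattern the text reports for wealth in the case of BTI.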
It turns out that the differences between included and missing countries are indeed significant for a number of the variables. For wealth, country size, and common law, the differences are significant across the board. For variables such as not colonized, communist past, Muslim, and Protestant, the differences are significant for at least one index. Notice also that the direction of the bias, as expressed by the sign of the differences, sometimes varies. The countries included in BTI are thus significantly poorer than those missing, whereas for PRS, FI, and HF the included countries are significantly more affluent than the missing ones.
Such non-random missingness obviously diminishes our ability to infer from the sample to the general population of all countries.Footnote 15 At the very least, scholars therefore need to abstain from such inferences or, alternatively, justify why it is possible to infer from the included to the missing countries. Ríos-Figueroa and Staton (2008: 23) find it striking that patterns of missingness are almost never dealt with, or even commented upon, by scholars using data sets on judicial independence. Unfortunately, this observation applies equally well to studies of the rule of law.
The missingness bias afflicts all the data sets reviewed in this article that do not have universal coverage (i.e. all save FH and WGI). Furthermore, and most importantly for our purposes, the dissimilar systematic biases of these indices further undermine their interchangeability.
Conclusion
We have compared seven different indices, which – as may be inferred from the somewhat nonchalant practices present in the literature – have been taken to measure the same phenomenon, viz., the rule of law. As should be clear from our analyses, however, this does not seem to be the case. As soon as we reach the level of the systematized concept, the indices differ significantly. This is further reflected in the fact that the scores do not correlate highly with each other – at least not vis-à-vis the correlations between the most prominent democracy indices. It is also reflected in the fact that the results of an extensive explanatory model differ starkly, depending on which of the measures is employed as a dependent variable.
As such, the seven indices are clearly not interchangeable. The differences in measurement and coverage of the rule of law indices turn out to have considerable consequences for substantive findings, much more so than in the neighboring field of democratization studies. The character of the tests carried out in this article means that we are only able to assess the correspondence between the indices in general, not the extent to which the indicators actually measure the stipulated attributes. Having said that, one particular conclusion concerning the validity of individual indices does seem warranted. The analyses have shown that PRS is very much out of tune with the other measures with regard to both conceptual definition and empirical results, and indeed even with regard to its somewhat ambiguous choice of term (‘law and order’). Hence, one should be especially cautious about explanatory conclusions on the rule of law that are based on this index. It is all the more striking that, as shown in the Introduction, this particular measure has until recently been the one most often employed in cross-national analyses of the rule of law, owing to its wide coverage.
The six other indices also produce relatively dissimilar empirical results when employed in an explanatory model. Yet they do seem to form three clusters, each of which captures important aspects of the same phenomenon. First, FH and BTI are so consistently linked that they can almost be used interchangeably in empirical analyses of the rule of law (in the guise of political constitutionalism). Second, FI and HF also show affinities in both correlations and results (including the fact that they seem to tap into order). Third, WGI taps into all the other indices and therefore offers a more general, but also very muddled, measure; the same applies to IF.
This could be read as a classic half-full/half-empty conclusion. If PRS is disregarded, the glass seems half full, since some interchangeability is possible depending on whether one equates the rule of law with constitutionalism, with order, or with both. However, the problems detected in the analysis of missingness show that the differing empirical scopes of the data sets further undermine their suitability for robustness tests. Most obviously, BTI suffers from massive problems of non-random missingness vis-à-vis FH. Coupled with the differences in conceptualization, correlation, and explanatory results across FH, BTI, WGI, FI, and HF, this leads to a relatively pessimistic conclusion about interchangeability, even within the clusters described above.
The general objective of this paper has been to provide easy-to-understand ‘knowledge of how the specific measures we select affect the empirical inferences we draw’ (Mudde and Schedler, 2010: 413). But what are the more specific analytical consequences of the results? When different data sets purporting to measure the same phenomenon produce different results, an explicit validity test is warranted. In such a situation, we need a basis for selecting certain data sets and not others (cf. Hawken and Munck, 2009). This calls for a careful appraisal of the content validity of the indices vis-à-vis the definition of the rule of law employed by each scholar. However, previous large-N studies exploring the sources of the rule of law have frequently neglected this exercise, as researchers have tended to select a single data set on the basis of its cross-spatial and temporal coverage.
Based on our analysis, it seems likely that the results of such studies would not prove robust if other indices were also employed. Their results therefore hinge entirely on what the chosen rule of law proxy actually measures and on the empirical scope it covers – something that requires careful attention and scrupulous elucidation.
A simple recommendation can be made on this basis. Scholars should be very clear about their definition of the rule of law and they should select data sets in accordance with this definition and with the empirical scope conditions of their theories. As Gerring (1999: 391) pertinently points out, scholars ‘have an obligation to state explicitly why (on the basis of which criteria) certain properties and terms were chosen, or excluded’. If this is done, then the lack of consensus may even spur research on the rule of law.
However, if the scholarly community does not in fact wish the rule of law indices to measure different things, then the recommendation changes. In that case, what the burgeoning research agenda on the rule of law needs now is a generally accepted definition at the level of the systematized concept, comparable to Robert A. Dahl's seminal definition of polyarchy in the democratization literature. Needless to say, this is a necessary but not a sufficient condition, as the subsequent step of measurement also needs to be handled properly.
Until these issues are faced head-on, the problems plaguing the rule of law research agenda are likely to persist. These problems, in turn, speak volumes about the lack of maturity – and cumulativity – of the good governance research field in general (cf. Doornbos, 2003). The very essence of science is, after all, to establish a common language based on sound conceptual premises. Research can only become truly cumulative if the vagueness and ambiguity of ordinary language are reduced. That, in turn, calls for either self-conscious and systematic disagreement about the definition of concepts such as the rule of law or, conversely, explicit agreement. Neither seems to be in sight at the moment.
Acknowledgements
We gratefully acknowledge the valuable comments on earlier versions of this paper from three anonymous reviewers, Kim Mannemar Sønderskov, Gerardo Munck, and participants in presentations at the Sandbjerg Estate, Denmark and Georgia State University, Atlanta.
Appendix A: Operationalization of explanatory factors
Oil production
Oil production is operationalized using the IMF's (2007) list of hydrocarbon-rich countries (2000–05) found in the Guide on Resource Revenue Transparency. The listed countries, which rely heavily on oil production for government revenues, receive a value of 1; all remaining countries receive a 0, so the variable is dichotomous.
Wealth
Wealth is measured using a standard indicator, namely the natural log of GDP per capita (purchasing power parity), based on data from the Penn World Tables (2007) for the year 2004.
Country size
Country size is measured as the natural log of a country's total area in square kilometers, based on data from the World Development Indicators provided by the World Bank (2008).
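Both the wealth and the country size variables are natural-log transformations of a raw, strictly positive quantity. The sketch below illustrates only that transformation. The figures are invented placeholders rather than Penn World Tables or World Bank values, and the function name is hypothetical.

```python
import math


def log_transform(raw: dict[str, float]) -> dict[str, float]:
    """Natural log of a strictly positive raw value (e.g. GDP per capita, area in km2)."""
    transformed = {}
    for country, value in raw.items():
        if value <= 0:
            raise ValueError(f"{country}: log requires a positive value, got {value}")
        transformed[country] = math.log(value)
    return transformed


# Hypothetical GDP per capita (PPP) and total area (km2) for two countries.
gdp_per_capita = {"A": 1_000.0, "B": 30_000.0}
area_km2 = {"A": 50_000.0, "B": 9_000_000.0}

log_wealth = log_transform(gdp_per_capita)
log_size = log_transform(area_km2)
```

Logging compresses the long right tail of both distributions. Equal ratios then become equal differences: a move from 1,000 to 2,000 counts as much as a move from 15,000 to 30,000.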
Ethno-religious fractionalization
Data on the degree of heterogeneity are taken from Alesina et al.'s (2003) scores on ethnic and religious fractionalization. As the same logic applies to both ethnic and religious fractionalization, a combined measure has been constructed by taking, for each country, the higher of the two original scores.
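The combined fractionalization measure simply keeps, for each country, the larger of its ethnic and religious scores. Both scores lie between 0 and 1 in the Alesina et al. data. In the minimal sketch below, the scores are hypothetical. The source does not say how countries with only one available score are handled. Dropping such countries here is purely an assumption of this sketch.

```python
def combined_fractionalization(
    ethnic: dict[str, float], religious: dict[str, float]
) -> dict[str, float]:
    """Per country, the higher of the ethnic and religious fractionalization scores.

    Assumption: countries lacking either score are left out rather than imputed.
    """
    shared = set(ethnic) & set(religious)
    return {country: max(ethnic[country], religious[country]) for country in sorted(shared)}


# Hypothetical scores; country C lacks a religious score and is therefore dropped.
ethnic_scores = {"A": 0.12, "B": 0.75, "C": 0.40}
religious_scores = {"A": 0.60, "B": 0.30}
combined = combined_fractionalization(ethnic_scores, religious_scores)
print(combined)  # {'A': 0.6, 'B': 0.75}
```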
Not colonized
Data on the colonial past from La Porta et al. (1999) have been used to construct a dummy variable distinguishing between countries that have never been colonized and former colonies. The few missing values have been filled in with scores based on information from the CIA's The World Factbook, https://www.cia.gov/library/publications/the-world-factbook.
Common law
The data used to distinguish countries with a (predominantly) common law system from countries with another legal system (mainly civil law) are taken from La Porta et al. (1999). The few missing values have been filled in with scores based on information from the CIA's The World Factbook, https://www.cia.gov/library/publications/the-world-factbook.
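For both the colonial-past and the common-law dummies, the gap-filling step amounts to using La Porta et al.'s coding where it exists and a coding derived from The World Factbook otherwise. A minimal sketch follows. It assumes both sources have already been reduced to 0/1 codes per country, and the function name and codes are hypothetical.

```python
def fill_missing(primary: dict[str, int], fallback: dict[str, int]) -> dict[str, int]:
    """Use the primary coding where available; otherwise use the fallback coding.

    Countries absent from both sources stay missing (they are simply not returned).
    """
    filled = dict(fallback)
    filled.update(primary)  # the primary source takes precedence wherever both exist
    return filled


# Hypothetical common-law codes: 1 = common law, 0 = other legal system.
la_porta_codes = {"A": 1, "B": 0}
factbook_codes = {"B": 1, "C": 0}  # hand-coded fallback; its code for B is overridden
common_law = fill_missing(la_porta_codes, factbook_codes)
print(sorted(common_law.items()))  # [('A', 1), ('B', 0), ('C', 0)]
```

Letting the primary source win on overlap keeps the secondary source confined to the few gaps, which matches the limited role it plays in the text.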
Communist past
A dummy variable for communist and post-communist countries has been constructed.
Dominant religion
Following the procedure used by Steven Fish (2002), we have constructed two dummy variables distinguishing countries where Islam and Protestantism, respectively, are the dominant (plurality or majority) religion.
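Under the plurality-or-majority rule, a religion is dominant when its adherents form the largest group, whether or not they exceed half the population. A minimal sketch follows, assuming religious composition is available as population shares for each country. The religion labels, shares and function name are hypothetical. Ties are not treated specially: `max` returns the first of the tied religions in the mapping's order.

```python
def religion_dummies(shares: dict[str, float]) -> dict[str, int]:
    """Return Islam and Protestant dummies from one country's religious shares.

    The dominant religion is the one with the largest share (plurality or majority).
    Ties resolve to whichever tied religion appears first in the mapping.
    """
    dominant = max(shares, key=shares.get)
    return {
        "islam": int(dominant == "islam"),
        "protestant": int(dominant == "protestant"),
    }


# Hypothetical composition: Protestants form a plurality but not a majority.
print(religion_dummies({"protestant": 0.40, "catholic": 0.35, "none": 0.25}))
# {'islam': 0, 'protestant': 1}
```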