On the limited interchangeability of rule of law measures

Jørgen Møller; Svend-Erik Skaaning

doi:10.1017/S1755773910000421

Published online by Cambridge University Press: 01 April 2011

Jørgen Møller*, Associate Professor, PhD, Department of Political Science, Aarhus University, Bartholins Allé, Århus C, Denmark
Svend-Erik Skaaning*, Associate Professor, PhD, Department of Political Science, Aarhus University, Bartholins Allé, Århus C, Denmark
* E-mails: jm@ps.au.dk, skaaning@ps.au.dk

Abstract

Over the past decade, empirical research on the causes and consequences of the rule of law has expanded and, in the process, become extremely influential. However, we show that a number of widely used indices of the rule of law are not interchangeable. This lack of interchangeability is reflected in the fact that they are based on different defining attributes, to some extent cover distinct empirical scopes, do not correlate highly with each other, and lend support to different explanatory factors. Until a consensus has been established with respect to the conceptualization of the rule of law, scholars are thus not free to opt for the measure that fits their data requirements best regarding spatial and/or temporal scope. Instead, they must carefully assess the content validity of each measure vis-à-vis their stipulated definition of the rule of law. Given the amount of money and time poured into the rule of law agenda, the problems identified reflect the lack of maturity of ‘good governance’ research.

Keywords

rule of law; indices; interchangeability; missingness; content validity

Type: Research Article
Information: European Political Science Review, Volume 3, Issue 3, November 2011, pp. 371–394

DOI: https://doi.org/10.1017/S1755773910000421
Copyright: © European Consortium for Political Research 2011

Introduction

Introducing a recent symposium on data use in comparative politics, Mudde and Schedler (2010: 411) note with some dismay that although we have recently witnessed

vigorous and sophisticated debates on the quality of cross-national data … we still do not see evaluations of data quality appear in academic journals on a regular basis, and uncertainty over the quality of the cross-national data we use continues to be pervasive in many areas of research.

A number of influential articles and books have indeed targeted the quality of cross-national data sets (e.g. Adcock and Collier, 2001; Munck and Verkuilen, 2002; Goertz, 2005). Systematic templates for assessing and guiding both the data-generating process and the content validity of the data do exist. Arguably, however, the proposed guidelines have found their way into the actual practices of those analyzing comparative politics only to a limited extent (Munck, 2009, 2010). One of the most striking examples is to be found in the rule of law research agenda.

Over recent decades, the rule of law has become one of the most celebrated concepts within politics and academia alike. At times, the underlying phenomenon is construed as something close to a panacea for developing countries – politically as well as economically (Tamanaha, 2004: 1–6; The Economist, 2008). Yet, paradoxically, the interest in the rule of law has in no way been accompanied by a consensus as to how the concept should be defined (Tamanaha, 2004; Belton, 2005; Møller and Skaaning, 2010; Skaaning, 2010).

Consequently, scholars analyze very different things while maintaining that they deal with the rule of law. This is probably one reason why the actual empirical analyses of the causes and consequences of the rule of law have produced strikingly dissimilar results, depending on which rule of law measure has been used (see, e.g. Knack, 1996; Norton, 1998; Barro, 2000; Ali, 2003; Andrews and Montinola, 2004; Joireman, 2004; Butkiewicz and Yanikkaya, 2006). In this paper, we pursue this observation systematically and demonstrate that the present lack of equivalence in cross-national studies on the rule of law mirrors the fact that the most influential rule of law measures are not interchangeable.

This is probably the right place to note that we refrain from assessing the measurement validity (Adcock and Collier, 2001) and ‘concept-measure consistency’ (Goertz, 2005) of the individual indices, as such exercises basically presuppose what is missing, namely a widespread consensus about the definition of the rule of law. Instead, we heed Mudde and Schedler's (2010: 412–413) call for carrying out two specific sets of assessments of the ‘substantive implications’ of using different data sets. First, we assess the sensitivity of descriptive inferences to data selection. Second, we assess the sensitivity of causal inferences to data selection. On this basis, we develop some more general analytical points about the ‘good governance’ research field and the problem of cumulativity in the Conclusions.

Posing the questions

More precisely, we pose four questions. First, do the dominant measures of the rule of law differ with respect to the conceptualization, that is, as regards the defining attributes? Second, do the measures correlate with each other? Third, do the measures lend support to the same explanatory factors derived from the literature on the rule of law? Finally, do non-random patterns of missing data introduce bias with respect to the investigated explanatory factors in one or more of the indices?

Before attempting to answer these questions, however, the present lack of consensus needs to be addressed. This is definitely not something unique to the rule of law research agenda. Most sub-fields in the social sciences are characterized by similar definitional controversies. To some extent, this seems to indicate that a definitional consensus is not to be expected. What is more, it is not a problem insofar as the scholarly community is aware of the lack of interchangeability. If scholars carefully opt for the rule of law index which best accords with their stipulated definition of the rule of law and if their conclusions clearly state what has in fact been explained or used as an explanatory variable, the problem of interchangeability disappears. Indeed, if analysts conscientiously choose their measures with reference to what Adcock and Collier (2001) term content validity, the present lack of consensus may be beneficial for research as more dimensions of the rule of law are scrutinized.

However, such conscientious treatment of indices is not the norm at present. Most researchers simply opt for the index that fits their data requirements best without discussing the content validity of these measures. Bolaky and Freund (2006: 23, fn. 16) thus choose the Law and Order Index (Political Risk Services (PRS) below) rather than that of the World Bank (World Governance Indicators (WGI) below) for the sole reason that the latter does not contain year-by-year data for the early 1990s and onwards. Burnside and Dollar (2004: 9) also select the Law and Order Index – as a proxy for institutional quality – because it is ‘available for many developing countries going back to the 1980s’. Even more tellingly, when describing their preferred measure of the dependent variable (i.e. rule of law), Andrews and Montinola (2004: 72) merely note that ‘PRS provides the most complete data for the period of interest’. Knack (2002: 12) captures the standard practice well in observing about the Law and Order Index that:

Because of its much better cross-country coverage … the ICRG [PRS] indicators have been the most widely used governance indicators in the cross-country empirical literature on economic performance.

The list could be expanded with studies that do not explicitly argue that coverage was the criterion for the choice of data set but, rather, remain silent (and maybe agnostic) on the criteria for selecting a particular measure, or bluntly refer to the fact that the measure employed has been used in previous studies. [Footnote 1]

When data convenience rather than content validity becomes the norm, non-interchangeability is indeed problematic. This is further accentuated by the fact that most analyses refrain from using several indices to test the robustness of their findings, which means that the results hinge completely on the chosen index. [Footnote 2] In addition, the problem is compounded because even many of the indices scrutinized in this article – which are selected, among other criteria, due to their wide coverage – exclude different parts of the world, and tend to do so in non-random ways. Bearing this in mind, we set out to assess whether the dominant measures are interchangeable.

Selecting rule of law indices

Among readily available data sets, we have identified seven rule of law measures, based on three criteria. First, they target the actual (de facto) level of the rule of law, rather than formal (de jure) commitments. Second, they primarily build on experts’ assessments in the form of standards-based data. [Footnote 3] Third, with the exception of the Ibrahim Index (see below), the scope of the data covers the contemporary degree of rule of law in at least the majority of the countries of the world.

The seven measures are presented in Table 1. The first is the rule of law sub-index from the Bertelsmann Transformation Index (BTI; 2006), which covers almost all non-OECD countries with more than two million citizens. The second is Freedom House's (FH; 2006) sub-category scores for the rule of law found in the Freedom in the World Survey. The third is the law and order index provided by Political Risk Services in its International Country Risk Guide (2007). The fourth is the aggregate rule of law indicator constructed by Kaufman et al. (2007) in the context of the WGI, which is based on many different data sources, including most of the other measures mentioned. The fifth is the sub-category scores for legal structure and security of property rights included in the Fraser Institute's (FI; 2009) Economic Freedom of the World Data. The sixth is the measure of property rights, which is part of the Index of Economic Freedom provided by the Heritage Foundation (HF; 2009) and the Wall Street Journal. The seventh – and final – measure is the Ibrahim Index of African Governance provided by the Mo Ibrahim Foundation (IF; 2010).

Table 1 Originator, name, and scope of rule of law measures, 2005

BTI = Bertelsmann Transformation Index; FH = Freedom House; PRS = Political Risk Services; WGI = World Governance Indicators; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation.

Generally, these indices cover different years and differ significantly with regard to the number of countries included. The information presented in the table applies to 2005, which is employed in the subsequent analyses because it is one of the very few years that all seven measures cover. Henceforth, we use the abbreviations of Table 1 to refer to the respective measures.

With the partial [Footnote 4] exception of the Ibrahim index, the seven measures arguably suffer from what may be termed an ‘OECD bias’ as they measure the world from the point of view of the well-functioning and affluent OECD countries. [Footnote 5] Critics have pointed to this on a number of occasions. FH has thus been accused of having a right-wing bias and of overstating the level of freedom in ‘US-friendly’ countries (Chomsky and Herman, 1988; Bollen, 1992: 205; Barahona, 2007; Giannone, 2010). Likewise, the BTI has been criticized for being ahistorical and Eurocentric, which implies that post-colonial countries are not assessed properly (Koelble and Lipuma, 2008). A similar charge has been leveled against the World Bank, which has been accused of supporting Western interests in general, and US business interests specifically, as it has pushed for a neo-liberal agenda (Uvin, 2002; Moore, 2007). Finally, the measures of the HF and the FI have been said to have a libertarian bent (Card and Freeman, 2002: 3; cf. Ashby and Sobel, 2008: 332).

At the same time, and as demonstrated in the conceptual appraisal below, the indices measure starkly different things. One important insight of this paper is exactly that two more particular clusters of indices can be identified. Some indices (FH and BTI) seem to target what is best termed ‘political constitutionalism’, whereas others (PRS, HF, and FI) seem to measure ‘public order’. Most strikingly, a conceptual slide occurs when the degree of ‘property rights’ is understood as a measure of the rule of law. At most, such rights constitute one sub-component of the more general concept. This also comes out in Table 2, which illustrates the differences in defining attributes among the seven measures. One might thus – somewhat paradoxically – simultaneously question whether the indices are too uniformly Eurocentric and whether they simply measure different things, such as distinct aspects of the overarching concept of good governance rather than the rule of law.

Table 2 Conceptualization of rule of law measures

BTI = Bertelsmann Transformation Index; FH = Freedom House; PRS = Political Risk Services; WGI = World Governance Indicators; IF = Mo Ibrahim Foundation.

Note: Where the data providers have not provided an explicit definition, we present the indicators.

Neither of these two problems, however, affects the objective of this paper. As mentioned, our aim is not to take stock of the validity of the rule of law measures – or governance measures in general for that matter (see, e.g. Arndt and Oman, 2006; Thomas, 2007; Williams and Siddique, 2008) – but to scrutinize their interchangeability. Fraser's and Heritage's property rights measures are, for instance, time and again used as indicators of the rule of law in the literature (cf. Munck, 2003; Ríos-Figueroa and Staton, 2008), which is why we include them in this appraisal. More generally, we have attempted to include the most dominant indices which are actually employed as proxies for the rule of law in contemporary analyses. Using Adcock and Collier's (2001) valuable distinction, we compare indices which are based on different ‘systematized concepts’ while agreeing on the ‘background concept’ (the rule of law). That we are not really comparing like with like is thus part of the problem which calls for assessment in the first place.

A preliminary conceptual appraisal

The literature to a large extent uses these indices as if they were interchangeable. The first question is whether that is warranted conceptually. As Table 1 makes clear, only four of the indices (BTI, FH, IF, and WGI) actually employ the wording ‘rule of law’; the fifth (PRS) opts instead for the more ambiguous ‘law and order’, whereas the sixth (FI) and seventh (HF) focus on the legal structure and/or property rights. These differences in nomenclature indicate that the measures are not interchangeable even on the conceptual level. This tentative conclusion is borne out as soon as we descend to the level of the defining attributes, illustrated in Table 2.

As should be clear from this overview, little or no consensus concerning the conceptualization of the rule of law exists when we focus on the actual defining attributes underpinning the indices. The rule of law research agenda thus differs significantly from the related democratization research agenda. To illustrate this, it is pertinent to touch upon Casper and Tufis (2003), who demonstrate that three prominent measures of democracy are not genuinely interchangeable. However, Casper and Tufis (2003: 197) also show that all of these indices commence from Dahl's (1971) definition of polyarchy and, consequently, that they are highly correlated (correlation coefficients between 0.85 and 0.92). Not even the first of these two premises is fulfilled in our case. In brief, the rule of law research agenda has not settled on a particular systematized concept.

Correlations between the indices

What about the second premise of Casper and Tufis’ (2003) analysis: that the indices are highly correlated? [Footnote 6] Considering the lack of consensus on the systematized concept, one would expect a much lower consistency of correlations in the case of rule of law measures. Such is indeed the case, as illustrated in Table 3.

Table 3 Bivariate correlations between rule of law indices (2005)

Note: Kendall's tau-b correlation coefficients, N in parentheses.

The seven rule of law measures correlate in the range of 0.21 to 0.80 (Kendall's tau-b). More particularly, two observations can be made. First, the correlations between BTI and FH and among WGI, FI, and HF are very high. Second, FH, BTI, and PRS show a relatively low degree of co-variation with a number of the other measures.
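
To make the statistic behind these coefficients concrete, the following is a minimal Python sketch of Kendall's tau-b with pairwise deletion of countries that are missing from either index, which is why each coefficient comes with its own N. It is an illustration under stated assumptions, not the authors' code: the function names and the toy country scores are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long score lists; ties are allowed."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both indices: enters neither count
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_x = concordant + discordant + only_y_tied  # pairs not tied on x
    untied_y = concordant + discordant + only_x_tied  # pairs not tied on y
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


def tau_b_between(index_a, index_b):
    """Return (tau-b, N) using only countries scored by both indices."""
    shared = sorted(set(index_a) & set(index_b))
    tau = kendall_tau_b([index_a[c] for c in shared],
                        [index_b[c] for c in shared])
    return tau, len(shared)


# Hypothetical scores keyed by country code, invented purely for illustration.
index_a = {"A": 1, "B": 1, "C": 2, "D": 5}
index_b = {"A": 1, "B": 2, "C": 3, "E": 4}
tau, n = tau_b_between(index_a, index_b)
print(f"tau-b = {tau:.2f}, N = {n}")
```

Because N differs from pair to pair under pairwise deletion, each coefficient in a table of this kind rests on a somewhat different set of countries.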

These patterns also emerge in a factor analysis (principal component) that extracts two components (with eigenvalues above 1, accounting for 65 and 20%, respectively, of the variation). [Footnote 7] BTI, FH, and PRS obtain the lowest loadings on the first component. Moreover, PRS exhibits a higher positive loading on the second component, whereas the loadings on this component are negative, and moderately high, in the cases of BTI and FH.

The consistently high correlations between the WGI measure and all the other indices are hardly surprising, considering the fact that the latter make up important parts of the former (or vice versa, as is to some extent the case with respect to FI). Yet, other than that, only the correlations between BTI and FH operate on the same level. Most striking are the low correlations between PRS and both FH and BTI (0.11 and 0.36, respectively) and between BTI and both FI and HF (0.40 and 0.51, respectively).

This indicates that some of the indices do not measure the same empirical phenomenon. To be more exact, two or even three clusters seem to exist. First, the FH and BTI indices correlate strongly. Second, the same is the case for FI and HF. Finally, WGI and IF can be linked with both clusters for the simple reason that they subsume the other measures to a large extent.

To further probe these patterns, we follow Adcock and Collier's (2001: 540–541) recommendation to assess the correlation between the measures and measures of neighboring, yet distinct, concepts. The most obvious choice is a measure capturing a purely Schumpeterian definition of democracy, that is, one which only includes bare-bones electoral aspects. [Footnote 8] We use the sub-component ‘electoral self-determination’ from the CIRI Human Rights Data set [Footnote 9] to capture such a minimalist conception of democracy. The results are reported in the first row of Table 4.

Table 4 Correlation between the rule of law measures (2005) and measures of neighboring or affiliated concepts

WGI = World Governance Indicators; PRS = Political Risk Services; FH = Freedom House; BTI = Bertelsmann Transformation Index; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation; CIRI = The Cingranelli & Richards Human Rights Dataset.

Note: Kendall's tau-b correlation coefficients, N in parentheses.

Two things are of interest here. First, PRS is once again the odd one out as its correlation with electoral self-determination is a mere 0.08. Second, as regards FH and BTI, the correlations with electoral self-determination are on a par with the equivalent correlations among the rule of law indices (except that between FH and BTI). This goes to show that the empirical convergence of the rule of law indices is relatively low. In a nutshell, different measures of the same concept should correlate more highly with one another than with measures of distinct concepts, even if the same factors cause the phenomena denoted by the distinct concepts (which is quite plausible in the case of democratic elections and the rule of law).

This exercise can also be used in a more constructive way. To unfold the distinction between the identified clusters (FH and BTI vs. FI and HF vs. PRS) in detail, Table 4 also contains a number of other measures affiliated with the rule of law concept: the rights of freedom of speech and freedom of assembly and association as measured by yet other subcomponents from the CIRI, violations of personal integrity rights (such as killings and torture) as measured by the Political Terror Scale provided by Gibney, Cornett, and Wood, [Footnote 10] the level of criminality and violent crime as measured by two subcomponents of the Global Peace Index, [Footnote 11] and corruption as measured by the Corruption Perception Index compiled by Transparency International. [Footnote 12] All data refer to the year 2005, except the two measures of crime from the Global Peace Index, which refer to 2007, that is, the first year for which data were available.

The objective of these additional tests is to scrutinize which particular dimensions each of the rule of law indices appears to emphasize. The picture revealed by Table 4 is fairly straightforward. The FH–BTI cluster correlates relatively strongly with the political rights of expression, assembly/association, and free elections but relatively weakly with the measures of crime and corruption. The contrary is the case for the FI–HF cluster. Hence, FH and BTI seem to tap into what could be termed political constitutionalism, whereas FI and HF tap into what is probably best termed public order. Unsurprisingly, the latter tendency is even more pronounced for the PRS, the ‘law and order’ label of which thus seems apt. Finally, IF and WGI correlate strongly on both dimensions but seem somewhat closer to the order dimension than the constitutionalism dimension, relatively speaking.

This information not only goes to underline that the measures tap different latent dimensions, thereby questioning their interchangeability; it should also assist scholars in choosing the measures best suited for the definition of the rule of law they are concerned with. Insofar as the rule of law is equated with political constitutionalism, FH and BTI seem the superior options. If, au contraire, one is on the lookout for a measure of order, PRS probably has the competitive edge, followed by FI and HF, respectively. Finally, should one wish to capture a very encompassing definition of the rule of law – an outlook which clashes with our recommendations in the Conclusions – the composite (and muddled) measures of WGI or IF are preferable.

Correlations with explanatory variables

The assessment carried out above shows that the rival measures seem to tap into the same empirical phenomenon only to a limited extent. This assertion should of course be qualified: it holds less for the pairwise comparisons of BTI with FH and of FI with HF, and the WGI seems to be associated with both clusters. Still, the correlations show that the indices are not interchangeable tout court.

What is more, the analysis of Casper and Tufis (2003) has demonstrated that even the highly correlated democracy measures may produce different explanatory results when used as dependent variables in cross-national analyses (see also Bollen and Paxton, 2000; Hadenius and Teorell, 2005). One would, a fortiori, expect this to be even more pronounced for the rule of law measures.

To carry out this third test of the interchangeability of the indices, we have derived a number of explanatory variables from studies of cross-national differences in compliance with the rule of law (the operationalizations of these variables are presented in Appendix A). These are: oil production (Barro, 2000; Hansson and Olsson, 2006), wealth (Barro, 2000; Joireman, 2004), country size (Hansson and Olsson, 2006), ethno-religious fractionalization (Weingast, 1997; Hayo and Voigt, 2005; Hansson and Olsson, 2006), legal system (Hayek, 1973; Eisenberg, 1988; Joireman, 2004), communist past (Hoff and Stiglitz, 2004; Sandholz and Taagepera, 2005), and religion (Barro, 2000; Hayo and Voigt, 2005).

This exercise does not depend upon including all theoretically relevant independent variables. The objective is not to explain the causes of the rule of law but to test whether the results are relatively similar or dissimilar when the seven measures are used interchangeably as dependent variables in multiple OLS regression analyses. [Footnote 13] To ease the interpretation, all measures have been calibrated to range from 0 (lowest level of rule of law) to 100 (highest level of rule of law).
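
The text does not spell out how this calibration was carried out. One plausible reading, assumed here, is a linear rescaling of each index from its theoretical minimum and maximum; the sketch below illustrates that reading only. The function name, the `reverse` flag (for scales on which a higher raw score would mean less rule of law), and the example bounds are assumptions made for illustration.

```python
def rescale_0_100(score, lowest, highest, reverse=False):
    """Linearly map a raw score from [lowest, highest] onto [0, 100].

    Assumption: `lowest` and `highest` are the theoretical end points of the
    original scale, not the observed sample minimum and maximum.
    """
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    if not lowest <= score <= highest:
        raise ValueError("score lies outside the stated scale")
    scaled = 100.0 * (score - lowest) / (highest - lowest)
    # Flip the scale when a high raw score denotes a LOW level of rule of law.
    return 100.0 - scaled if reverse else scaled


# Hypothetical raw scores on a 0-6 scale and on a reversed 1-7 scale.
print(rescale_0_100(3, 0, 6))
print(rescale_0_100(1, 1, 7, reverse=True))
```

Putting every index on a common 0 to 100 range makes unstandardized regression coefficients comparable in magnitude across dependent variables, which is presumably the point of the calibration.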

In order to investigate whether the results might differ depending on the case coverage, we have run two sets of regressions: first, for all countries included in each of the data sets (Table 5) and, second, for the 116 countries included in five [Footnote 14] of the seven indices (Table 6). The rule of law data once again cover 2005.

Table 5 Summary of regression results with rule of law indices as dependent variable (all countries, 2005)

WGI = World Governance Indicators; FH = Freedom House; BTI = Bertelsmann Transformation Index; PRS = Political Risk Services; FI = Fraser Institute; HF = Heritage Foundation; IF = Mo Ibrahim Foundation.

*P < 0.1, **P < 0.05, ***P < 0.01 (two-tailed test). Results refer to regressions with all countries included.

Note: Unstandardized coefficients reported with (heteroscedasticity-consistent) robust standard errors in parentheses.

Table 6 Summary of regression results with rule of law indices as dependent variable (common country coverage, 2005)

WGI = World Governance Indicators; FH = Freedom House; PRS = Political Risk Services; FI = Fraser Institute; HF = Heritage Foundation.

*P < 0.1, **P < 0.05, ***P < 0.01 (two-tailed test). Results refer to regressions only including countries covered by all five measures.

Note: Unstandardized coefficients reported with (heteroscedasticity-consistent) robust standard errors in parentheses.

When including all countries, the picture is as follows. A few explanatory variables consistently show a significant (wealth) or non-significant (ethno-religious fractionalization, common law) association with the rule of law. However, the remaining variables exhibit stark dissimilarities depending on the measure used.

The statistically significant associations for some variables (not colonized, communist past, Muslim) even point in opposite directions. Using BTI, we encounter a negative relationship with the lack of a colonized past, whereas we find a positive relationship when pinning our faith on WGI, PRS, and FI. Likewise, a communist past produces a significant negative relationship with the rule of law using WGI and HF but a positive relationship using the PRS. Finally, a status as a Muslim country has a strongly significant negative effect on the rule of law using the BTI and FH yet, in contrast, a significant (albeit weaker) positive effect when employing PRS, FI, and HF.

In sum, the association between the explanatory variables and the rule of law ranges from positive to non-existent to negative, depending on which index is used to operationalize the rule of law. Most disturbing are the positive significant associations between PRS and a communist past and a Muslim population, associations which completely contradict the theoretical expectations of the studies from which we have derived these factors.

What happens when we narrow the focus to the 116 countries covered by all the indices (save BTI, which is excluded as it does not cover the OECD countries, and IF, which is excluded as it only covers African countries)? As illustrated in Table 6, the results are somewhat more in line with one another. A communist past and Muslim roots still produce contradictory significant associations, but even this is less pronounced than in Table 5. This indicates that some of the present problems in the literature could be alleviated by confining one's attention to the countries covered by more data sets, that is, by more self-conscious restrictions on the empirical scope of the inquiry.

Non-random patterns of missing data

Such an empirical restriction of the scope of comparison may, however, carry problems of its own. In an assessment of measures of judicial independence, Ríos-Figueroa and Staton (2008) identify systematic patterns of missing data across a selection of indices. Given their distinct empirical coverage, one might expect that the rule of law measures included in our assessment suffer from the same problem.

To see if this is indeed so, we use the simple test of non-random ‘missingness’ devised by Ríos-Figueroa and Staton (2008). For each of the independent variables included in the previous regression analyses, we compare the average scores of countries that are included and countries that are missing in four of the data sets, viz. PRS, BTI, FI, and HF. FH and WGI are not assessed because they cover virtually all independent countries of the world, meaning that country data are not missing; IF is excluded because of its focus on one region only.

The question is whether there is a statistically significant difference between the scores on each of the independent variables across the two groups (covered vs. missing). If this is the case, it is likely to introduce biased estimates into any analysis that restricts the scope in accordance with the data availability. The results of the missingness analysis are reported in Table 7.
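
A minimal sketch of such a covered-versus-missing comparison follows, assuming a difference in means evaluated with Welch's t statistic (unequal variances). The source does not state which test statistic is used, so this is one reasonable choice rather than the authors' procedure; the function name and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def missingness_gap(covariate, covered):
    """Compare a covariate between countries covered by an index and those missing.

    covariate: dict mapping country -> value of one explanatory variable
    covered:   set of countries that the rule of law index includes
    Returns (difference in means, Welch t statistic), covered minus missing.
    """
    inside = [v for c, v in covariate.items() if c in covered]
    outside = [v for c, v in covariate.items() if c not in covered]
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("need at least two countries in each group")
    diff = mean(inside) - mean(outside)
    # Welch standard error: the two groups may have unequal variances.
    se = sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    if se == 0:
        raise ValueError("no within-group variation; t statistic undefined")
    return diff, diff / se


# Hypothetical wealth values for four countries; the index covers A and B only.
wealth = {"A": 10.0, "B": 12.0, "C": 2.0, "D": 4.0}
diff, t_stat = missingness_gap(wealth, covered={"A", "B"})
print(f"difference = {diff:.1f}, t = {t_stat:.2f}")
```

A positive difference means that covered countries score higher on the covariate than missing ones; turning the t statistic into a p-value would additionally require the t distribution with Welch–Satterthwaite degrees of freedom, which is omitted here.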

Table 7 Simple non-random patterns in missing values in rule of law measures, 2005

PRS = Political Risk Services; BTI = Bertelsmann Transformation Index; FI = Fraser Institute; HF = Heritage Foundation.

*P < 0.1, **P < 0.05, ***P < 0.01 (two-tailed test).

Note: Entries are average differences (subtracted means in parentheses) between countries that are covered and missing, respectively, by the rule of law measures.

It turns out that the differences between included and missing countries are indeed significant for a number of the variables. In the cases of wealth, country size, and common law, the differences are significant across the board. For variables such as not colonized, communist past, Muslim, and Protestant, the differences are significant for at least one index. Notice also that the direction of the bias, as expressed by the sign of the differences, sometimes varies. The countries included in BTI are thus significantly poorer than those missing, whereas in the cases of PRS, FI, and HF the countries included are significantly more affluent than their missing equivalents.

Such non-random missingness obviously diminishes our ability to infer from the sample to the general population of all countries.¹⁵ At the very least, scholars therefore need to abstain from such inferences or, alternatively, justify why it is possible to infer from the included to the missing countries. Ríos-Figueroa and Staton (2008: 23) note that it is striking that patterns of missingness are almost never dealt with – or even commented upon – by scholars using data sets on judicial independence. Unfortunately, this observation travels well into studies of the rule of law.

The missingness bias afflicts all the data sets reviewed in this article that do not have universal coverage (i.e. all save FH and WGI). Furthermore, and most importantly for our purposes, the dissimilar systematic biases of these indices further undermine their interchangeability.

Conclusion

We have compared seven different indices, which – as may be inferred from the somewhat nonchalant practices present in the literature – have been taken to measure the same phenomenon, viz., the rule of law. As should be clear from our analyses, however, this does not seem to be the case. As soon as we reach the level of the systematized concept, the indices differ significantly. This is further reflected in the fact that the scores do not correlate highly with each other – at least not vis-à-vis the correlations between the most prominent democracy indices. It is also reflected in the fact that the results of an extensive explanatory model differ starkly, depending on which of the measures is employed as a dependent variable.

As such, the seven indices are clearly not interchangeable. The differences in measurement and coverage of the rule of law indices turn out to have considerable consequences for substantive findings – and much more so than in the neighboring field of democratization studies. The character of the tests carried out in this article means that we are only able to assess the correspondence between the indices in general, and not the extent to which the indicators actually measure the stipulated attributes. Having said that, one particular conclusion concerning the validity of the particular indices does seem warranted. The analyses have shown that PRS is very much out of tune with the other measures with regard to conceptual definition, empirical results, and even its somewhat ambiguous choice of term (‘law and order’). Hence, one should be especially cautious of explanatory conclusions about the rule of law based on this index. It is all the more striking that – as shown in the Introduction – this particular measure has until recently been the one most often employed in cross-national analyses of the rule of law, due to its wide coverage.

The six other indices also produce relatively dissimilar empirical results when employed in an explanatory model. Yet at least they seem to make up three clusters of indices, each of which captures important aspects of the same phenomenon. First, FH and BTI are so consistently linked that they can almost be used interchangeably in empirical analyses of the rule of law (in the guise of political constitutionalism). Second, FI and HF also show affinities, both concerning correlations and results (including the fact that they seem to tap into order). Third, WGI taps into all the other indices and therefore offers a more general, but also very muddled, measure. This statement also applies to IF.

This could be read as a classical half-full/half-empty conclusion. If PRS is disregarded, the glass seems half full as some interchangeability is allowed for, depending on whether one equates the rule of law with constitutionalism or order or both. However, the problems detected in the analysis of missingness show that the differing empirical scopes of the data sets further undermine their suitability in robustness tests. Most obviously, BTI suffers from massive problems of non-random missingness vis-à-vis FH. Coupled with the differences in conceptualization, correlation, and explanatory results across FH, BTI, WGI, FI, and HF, this makes for a relatively pessimistic conclusion about interchangeability, even within the described clusters.

The general objective of this paper has been to provide easy-to-understand ‘knowledge of how the specific measures we select affect the empirical inferences we draw’ (Mudde and Schedler, 2010: 413). But what are the more particular analytical consequences of the results? When different data sets of the same phenomenon produce different results, an explicit validity test is warranted. More particularly, in such a situation we need a basis for selecting certain data sets and not others (cf. Hawken and Munck, 2009). This calls for a careful appraisal of the content validity of the indices vis-à-vis the definition of the rule of law employed by each scholar. However, previous large-N studies exploring the sources of rule of law have frequently neglected this exercise as researchers have tended to select single data sets based on cross-spatial and temporal coverage.

Based on our analysis, it seems fair to expect that the results of such analyses are unlikely to prove robust if other indices are also employed. This means that their results hinge entirely on what the employed rule of law-proxy actually measures and the empirical scope it covers – something that needs careful attention and scrupulous elucidation.

A simple recommendation can be made on this basis. Scholars should be very clear about their definition of the rule of law and they should select data sets in accordance with this definition and with the empirical scope conditions of their theories. As Gerring (1999: 391) pertinently points out, scholars ‘have an obligation to state explicitly why (on the basis of which criteria) certain properties and terms were chosen, or excluded’. If this is done, then the lack of consensus may even spur research on the rule of law.

However, if the scholarly community does not in fact wish the rule of law indices to measure different things, then the recommendation changes. In that case, what the ascending research agenda on the rule of law needs now is a definition on the level of the systematized concepts that becomes generally accepted, as did Robert A. Dahl's seminal definition of polyarchy in the democratization literature. Needless to say, this is a necessary but not a sufficient condition, as the subsequent step of measurement also needs to be handled properly.

Until these issues are faced head-on, the problems plaguing the rule of law research agenda are likely to persist. These problems, in turn, speak volumes about the lack of maturity – and cumulativity – of the good governance research field in general (cf. Doornbos, 2003). The very essence of science is, after all, to establish a common language based on sound conceptual premises. Research can only become truly cumulative if the vagueness and ambiguity of ordinary language are reduced. That, in turn, calls for either self-conscious and systematic disagreement about the definition of concepts such as the rule of law or, contrariwise, for explicit agreement. Neither seems to be beckoning at the moment.

Acknowledgements

We gratefully acknowledge the valuable comments on earlier versions of this paper from three anonymous reviewers, Kim Mannemar Sønderskov, Gerardo Munck, and participants in presentations at the Sandbjerg Estate, Denmark and Georgia State University, Atlanta.

Appendix A: Operationalization of explanatory factors

Oil production

Oil production is operationalized using IMF's (2007) list of hydro-carbon rich countries (2000–05) found in the Guide on Resource Transparency. The countries listed rely heavily on oil production for government revenues and receive the value of 1. The remaining countries receive a 0, meaning that the variable is treated as a dichotomy.

Wealth

Wealth is measured using a standard wealth indicator, namely (natural log of) GDP per capita (purchasing power parity) based on data from the Penn World Tables (2007) for the year 2004.

Country size

The (natural) log of a country's total area in square kilometers is used to measure this variable, based on data from the World Development Indicators provided by the World Bank (2008).
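As a minimal sketch of the two log transformations described above (wealth and country size), the snippet below applies the natural logarithm to raw values. The helper name `log_transform` and the input figures are hypothetical and are not taken from the Penn World Tables or the World Development Indicators.

```python
import math


def log_transform(raw):
    """Return the natural log of a strictly positive raw value."""
    if raw <= 0:
        raise ValueError("the natural log is undefined for non-positive values")
    return math.log(raw)


# Hypothetical raw inputs for one fictional country.
gdp_per_capita_ppp = 20000.0   # purchasing power parity, illustrative only
total_area_km2 = 43000.0       # square kilometers, illustrative only

wealth = log_transform(gdp_per_capita_ppp)
country_size = log_transform(total_area_km2)
print(round(wealth, 3), round(country_size, 3))
```

The log scale compresses the long right tails of both variables, so that a doubling of GDP per capita or of area counts the same at every level.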

Ethno-religious fractionalization

Data on the degree of heterogeneity are taken from Alesina et al.'s (2003) scores on ethnic and religious fractionalization. As the same logic applies to both ethnic and religious fractionalization, a combined measure has been constructed based on the maximum value of the two original indices for each country.
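The combined measure can be sketched as follows, assuming each country has one ethnic and one religious fractionalization score on the same scale. The function name and the scores are hypothetical placeholders, not values from the Alesina et al. data.

```python
def combined_fractionalization(ethnic, religious):
    """Take the larger of the two fractionalization scores for a country."""
    return max(ethnic, religious)


# Hypothetical (ethnic, religious) scores for two fictional countries.
scores = {"X": (0.30, 0.65), "Y": (0.72, 0.10)}

combined = {
    country: combined_fractionalization(ethnic, religious)
    for country, (ethnic, religious) in scores.items()
}
print(combined)
```

Taking the maximum means that a country counts as heterogeneous if it is divided along either dimension, rather than averaging a deep cleavage away against a shallow one.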

Not colonized

Data on the colonial past from La Porta et al. (1999) have been used to construct a dummy variable distinguishing between countries that have never been colonized and former colonies. The few missing values have been filled in by scores based on information from the CIA's The World Factbook, https://www.cia.gov/library/publications/the-world-factbook.

Common law

The data used to distinguish (predominant) common law countries from countries with another legal system (mainly civil law) are taken from La Porta et al. (1999). The few missing values have been filled in by scores based on information from the CIA's The World Factbook, https://www.cia.gov/library/publications/the-world-factbook.

Communist past

A dummy variable for communist and post-communist countries has been constructed.

Dominant religion

Following the procedure used by Steven Fish (2002), we have constructed two dummy variables, distinguishing between countries where Islam and Protestantism, respectively, are the dominant (plurality or majority) religions.
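One way to read the plurality rule is sketched below: the religion with the largest population share is treated as dominant, and the two dummies are derived from it. This is an illustration under that assumption, not a reproduction of Fish's coding; the function name, religion labels, and shares are hypothetical.

```python
def dominant_religion_dummies(shares):
    """Return (muslim, protestant) dummies from religion -> population share.

    The religion with the largest share (plurality or majority) is treated
    as dominant. Ties are resolved by dictionary order, which is an
    arbitrary choice made for this sketch.
    """
    dominant = max(shares, key=shares.get)
    return int(dominant == "Islam"), int(dominant == "Protestantism")


# Hypothetical population shares for three fictional countries.
country_p = {"Protestantism": 0.55, "Catholicism": 0.30, "Islam": 0.05}
country_q = {"Islam": 0.48, "Orthodox": 0.40, "Protestantism": 0.02}
country_r = {"Catholicism": 0.80, "Protestantism": 0.10, "Islam": 0.01}

print(dominant_religion_dummies(country_p))
print(dominant_religion_dummies(country_q))
print(dominant_religion_dummies(country_r))
```

Countries where neither religion is dominant receive a 0 on both dummies and thus form the reference category.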

Footnotes

¹ For instance, in a recent paper, Bleaney and Dimico (2008: 6) provide a very parsimonious argumentation for their selection of indicators to measure their dependent variables: ‘To proxy institutions we use two alternative indices. The first one is an index of property rights protection … published by the Fraser Institute. The second measure, which is available for considerably more countries, is an index of the rule of law supplied by Kaufmann et al.’

² Exceptions do exist (e.g. Norton, 1998; Sunde et al., 2008). Joireman (2001) uses the civil liberty scores provided by FH as an alternative measure of rule of law but neither FH itself nor the great majority of scholars consider this measure to solely capture the rule of law. Tellingly, Andrews and Montinola (2004) use it as an independent variable in their model trying to explain the strength of rule of law.

³ As Hawken and Munck (2009) have demonstrated in an appraisal of indices on corruption, the class of the sources may have strong ramifications for the data.

⁴ ‘Partial’ because the Ibrahim Index – though created as a counterweight to the extant ‘Western’ measures – is basically developed by Harvard scholars, is not based on an alternative ‘non-Western’ conception of the rule of law, and actually combines a number of existing data generated by, for example, the Economist Intelligence Unit and the World Bank. As such, the index bears comparison with the WGI.

⁵ We thank an anonymous reviewer for calling attention to this point of contention.

⁶ Claims for the validity of measures are commonly based on their high statistical correlations with other measures of the same phenomenon. However, strongly correlated measures may all share biases and errors (Forewaker and Krznaric, 2001: 5; Munck and Verkuilen, 2002: 29) so that high correlations do not signify a high level of validity. It is, nevertheless, important for the question of interchangeability to scrutinize whether the very high correlation between different measures of democracy (cf. Coppedge and Reinicke, 1990: 61; Alvarez et al., 1996: 18–21) is repeated with the rule of law measures.

⁷ The IF is excluded from this analysis because of its low coverage.

⁸ With a ‘thicker’ definition – for example, of liberal democracy – the rule of law would be included in the intension of the concept, meaning that the measure would, if valid, automatically correlate highly with rule of law measures.

⁹ http://ciri.binghamton.edu/.

¹⁰ Based on the State Department's Country Reports on Human Rights Practices. The data constructed by Gibney, Cornett, and Wood can be found at http://www.politicalterrorscale.org/.

¹¹ http://www.visionofhumanity.org/gpi-data/.

¹² http://www.transparency.org/policy_research/surveys_indices/cpi/2005.

¹³ Owing to lack of variation (which ultimately is due to low country coverage), two of the variables are excluded from the model as regards the IF.

¹⁴ BTI and IF have been excluded as their incorporation would severely limit the number of countries covered by all indices.

¹⁵ Technically, this constitutes truncation on the dependent variable.

References

Adcock, R. and Collier, D. (2001), ‘Measurement validity: a shared standard for qualitative and quantitative research’, American Political Science Review 95: 529–546.

Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S. and Wacziarg, R. (2003), ‘Fractionalization’, Journal of Economic Growth 8: 155–194.

Ali, A.M. (2003), ‘Institutional differences as sources of growth differences’, Atlantic Economic Journal 31: 348–362.

Alvarez, M., Cheibub, J.A., Limongi, F. and Przeworski, A. (1996), ‘Classifying political regime types’, Studies in Comparative International Development 31: 3–36.

Andrews, J.T. and Montinola, G.R. (2004), ‘Veto players and the rule of law in emerging democracies’, Comparative Political Studies 37: 55–87.

Arndt, C. and Oman, C. (2006), The Uses and Abuses of Governance Indicators, OECD: OECD Publishing.

Ashby, N.J. and Sobel, R.S. (2008), ‘Income inequality and economic freedom in the U.S. States’, Public Choice 134: 329–346.

Barahona, D. (2007), ‘The freedom house files’. Monthly Review, March.

Barro, R. (2000), ‘Rule of law, democracy, and economic performance’, in O'Driscoll G.P. Jr., Holmes K.R. and Kirkpatrick M. (eds), 2000 Index of Economic Freedom, Washington and New York: Heritage Foundation and Wall Street Journal, pp. 31–49.

Belton, R.K. (2005), ‘Competing definitions of the rule of law’, Carnegie Endowment. Carnegie Paper no. 55.

Bertelsmann Transformation Index (2006), ‘Bertelsmann Transformation Index’. Retrieved 15 March 2008 from http://www.bertelsmann-transformation-index.de/11.0.html?&L=1

Bleaney, M. and Dimico, A. (2008), ‘Biogeographical conditions, the transition to agriculture and long-run growth’. CREDIT Research Paper no. 08/15.

Bolaky, B. and Freund, C. (2006), ‘Trade, regulations, and growth’. Paper presented at the IMF Trade and Growth Conference, January 9, 2006, Washington, DC.

Bollen, K.A. (1992), ‘Political rights and political liberties’, in T.B. Jabine and R.P. Claude (eds), Human Rights and Statistics: Getting the Record Straight, Philadelphia: University of Pennsylvania Press, pp. 188–213.

Bollen, K. and Paxton, P. (2000), ‘Subjective measures of liberal democracy’, Comparative Political Studies 31: 58–86.

Burnside, C. and Dollar, D. (2004), ‘Aid, policies, and growth’. World Bank Policy Research Working Paper no. 3251.

Butkiewicz, J.L. and Yanikkaya, H. (2006), ‘Institutional quality and economic growth’, Economic Modelling 23: 648–661.

Card, D. and Freeman, R. (2002), ‘What have two decades of British economic reform delivered?’. Retrieved 7 September 2010 from http://www.nber.org/papers/w8801

Casper, G. and Tufis, C. (2003), ‘Correlation versus interchangeability: the limited robustness of empirical findings on democracy using highly correlated data sets’, Political Analysis 11: 196–203.

Chomsky, N. and Herman, E.S. (1988), Manufacturing Consent. New York, Scottsville: Pantheon Books.

Coppedge, M. and Reinicke, W. (1990), ‘Measuring polyarchy’, Studies in Comparative International Development 25: 51–72.

Dahl, R.A. (1971), Polyarchy: Participation and Opposition, New Haven: Yale University Press.

Doornbos, M. (2003), ‘“Good governance”: the metamorphosis of a policy metaphor’, Journal of International Affairs 57: 3–17.

Eisenberg, M.A. (1988), The Nature of the Common Law, Cambridge: Harvard University Press.

Fish, M.S. (2002), ‘Islam and authoritarianism’, World Politics 55: 4–37.

Forewaker, J. and Krznaric, R. (2001), ‘How to construct a database of liberal democratic performance’, Democratization 8: 1–25.

Fraser Institute (2009), ‘Economic freedom of the world’. Retrieved 15 November 2009 from http://www.freetheworld.com

Freedom House (2006), ‘Freedom in the world’. Retrieved 15 March 2008 from http://www.freedomhouse.org/template.cfm?page=15

Gerring, J. (1999), ‘What makes a concept good?’, Polity 31: 357–393.

Giannone, D. (2010), ‘Political and ideological aspects of the measurement of democracy’, Democratization 17: 68–97.

Goertz, G. (2005), Social Science Concepts: A Users’ Guide, Princeton: Princeton University Press.

Hadenius, A. and Teorell, J. (2005), ‘Assessing alternative indices of democracy’. IPSA Concepts & Methods Working Papers 6.

Hansson, G. and Olsson, O. (2006), ‘Country size and the rule of law’. Working Paper 200, School of Economics: University of Gothenburg.

Hawken, A. and Munck, G.L. (2009), ‘Do you know your data? Measurement validity in corruption research’, Manuscript.

Hayek, F.A. (1973), Law, Legislation and Liberty, Chicago: University of Chicago Press.

Hayo, B. and Voigt, S. (2005), ‘Explaining de facto judicial independence’. Marburg Working Papers on Economics 200507, Philipps-Universität Marburg: Department of Economics.

Heritage Foundation (2009), ‘Index of economic freedom’. Retrieved 15 November 2009 from http://www.heritage.org/index

Hoff, K. and Stiglitz, J. (2004), ‘After the big bang? Obstacles to the emergence of rule of law in post-communist societies’, American Economic Review 94: 753–763.

IMF (2007), ‘Guide on resource transparency’. Retrieved 15 March 2008 from http://www.imf.org/external/np/pp/2007/eng/051507g.pdf

Joireman, S.F. (2001), ‘Inherited legal systems and effective rule of law’, Journal of Modern African Studies 39: 571–596.

Joireman, S.F. (2004), ‘Colonization and the rule of law’, Constitutional Political Economy 15: 315–338.

Kaufman, D., Kray, A. and Mastruzzi, M. (2007), ‘Governance matters VI: governance indicators for 1996–2006’. Retrieved 15 March 2008 from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=999979

Knack, S. (1996), ‘Institutions and the convergence hypothesis’, Public Choice 87: 207–228.

Knack, S. (2002), ‘Governance and growth’. The IRIS Discussion Papers on Institutions & Development no. 2/05.

Koelble, T.A. and Lipuma, E. (2008), ‘Democratizing democracy’, Democratization 15: 1–28.

La Porta, R., Lopez-de-Silanes, F., Shleifer, A. and Vishny, R. (1999), ‘The quality of government’, Journal of Law, Economics, and Organization 15: 222–279.

Mo Ibrahim Foundation (2010), ‘The Ibrahim Index of African Governance’. Retrieved 7 September 2010 from http://www.moibrahimfoundation.org/en/section/the-ibrahim-index

Møller, J. and Skaaning, S.-E. (2010), ‘Systematizing thin and thick conceptions of the rule of law’. Paper presented at the APSA 2010 Annual Meeting. Available from SSRN: http://ssrn.com/abstract=1643367

Moore, D. (ed.) (2007), The World Bank. KwaZulu-Natal University Press.

Mudde, C. and Schedler, A. (2010), ‘Introduction: rational data choice’, Political Research Quarterly 63: 410–416.

Munck, G.L. (2003), ‘Measures of democracy, governance and rule of law’. Paper prepared for the World Bank Workshop on Understanding Growth and Freedom from the Bottom Up, July 15–17, Washington, DC.

Munck, G. (2009), Measuring Democracy, Baltimore: The Johns Hopkins University Press.

Munck, G. (2010), ‘Comparative politics: taking stock and looking forward’. Paper prepared for presentation at the 2010 APSA Annual Meeting, September 2–5, Washington, DC.

Munck, G.L. and Verkuilen, J. (2002), ‘Conceptualizing and measuring democracy’, Comparative Political Studies 35: 5–34.

Nanda, P. (2006), ‘The “Good Governance” concept revisited’, The ANNALS of the American Academy of Political Science 603: 269–283.

Norton, S.W. (1998), ‘Poverty, property rights, and human well-being’, Cato Journal 18: 233–245.

Penn World Tables (2007), ‘PWT 6.2’. Retrieved 15 March 2008 from http://pwt.econ.upenn.edu/php_site/pwt_index.php

Political Risk Services (2007), ‘ICRG’. Retrieved 15 March 2008 from http://www.prsgroup.com/ICRG.aspx

Ríos-Figueroa, J. and Staton, J. (2008), ‘Unpacking the rule of law’. Committee of Concepts and Methods Working Paper no. 21.

Sandholz, W. and Taagepera, R. (2005), ‘Corruption, culture, and communism’, International Review of Sociology 15: 109–131.

Skaaning, S.-E. (2010), ‘Measuring the rule of law’, Political Research Quarterly 63: 449–460.

Sunde, U., Cervallati, M. and Fortunato, P. (2008), ‘Are all democracies equally good?’, Economic Letters 99(3): 552–556.

Tamanaha, B. (2004), On the Rule of Law, Cambridge: Cambridge University Press.

The Economist (2008), Economics and the Rule of Law. March 13. http://www.economist.com/node/10849115?story_id=10849115

Thomas, M. (2007), ‘What do the worldwide governance indicators measure?’. Retrieved 1 May 2009 from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1007527

Uvin, P. (2002), ‘On high moral grounds: the incorporation of human rights by the development enterprise’, PRAXIS 17: 1–11.

Weingast, B.R. (1997), ‘The political foundations of democracy and the rule of law’, American Political Science Review 91: 245–263.

Williams, A. and Siddique, A. (2008), ‘The use (and abuse) of governance indicators in economics’, Economics of Governance 9: 131–175.

World Bank (2008), ‘World Development Indicators’. Retrieved 15 March 2008 from http://publications.worldbank.org/online
