Published online by Cambridge University Press: 01 June 2004
In contrast to statistical methods, a number of case study methods—collectively referred to as Mill's methods, used by generations of social science researchers—only consider deterministic relationships. They do so to their detriment because heeding the basic lessons of statistical inference can prevent serious inferential errors. Of particular importance is the use of conditional probabilities to compare relevant counterfactuals. A prominent example of work using Mill's methods is Theda Skocpol's States and Social Revolutions. Barbara Geddes's widely assigned critique of Skocpol's claim of a causal relationship between foreign threat and social revolution is valid if this relationship is considered to be deterministic. If, however, we interpret Skocpol's hypothesized causal relationship to be probabilistic, Geddes's data support Skocpol's hypothesis. But Skocpol, unlike Geddes, failed to provide the data necessary to compare conditional probabilities. Also problematic for Skocpol is the fact that when one makes causal inferences, conditional probabilities are of interest only insofar as they provide information about relevant counterfactuals.

Jasjeet S. Sekhon thanks Walter R. Mebane Jr., Henry Brady, Bear Braumoeller, Shigeo Hirano, Gary King, John Londregan, Bruce Rusk, Theda Skocpol, Suzanne M. Smith, Jonathan N. Wand, the editors of Perspectives on Politics, and three anonymous reviewers for valuable comments and advice.
“Nothing can be more ludicrous than the sort of parodies on experimental reasoning which one is accustomed to meet with, not in popular discussion only, but in grave treatises, when the affairs of nations are the theme…. ‘How can such or such causes have contributed to the prosperity of one country, when another has prospered without them?’ Whoever makes use of an argument of this kind, not intending to deceive, should be sent back to learn the elements of some one of the more easy physical sciences.”—John Stuart Mill1
Mill 1872, 298.
Case studies have their own role in the progress of political science. They permit discovery of causal mechanisms and new phenomena, and can help draw attention to unexpected results. They should complement statistics. Unfortunately, however, case study research methods often assume deterministic relationships among the variables of interest; and failure to heed the lessons of statistical inference often leads to serious inferential errors, some of which are easy to avoid.
The canonical example of deterministic research methods is the set of rules (or what are often called canons) of inductive inference formalized by John Stuart Mill in his book A System of Logic.2
Ibid.
Cohen and Nagel 1934.
Przeworski and Teune 1970.
Mill's and related methods are valid only when the hypothesized relationship between the cause and effect of interest is unique and deterministic. These two conditions imply the absence of measurement error, because in the presence of such error, the relationship would cease to be deterministic. These conditions strongly restrict the methods' applicability. When Mill's methods of inductive inference are not applicable, conditional probabilities5 must be obtained and compared.
A conditional probability is the probability of an event given that another event has occurred. For example, the probability that the total of two dice will be greater than 10 given that the first die is a 4 is a conditional probability.
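The definition can be made concrete by brute-force enumeration. The sketch below (an illustration, not from the article) counts the 36 equally likely outcomes of two fair dice; note that with the first die fixed at 4, a total above 10 is impossible, so this particular conditional probability is zero.

```python
from itertools import product

def cond_prob(event, given, outcomes):
    """P(event | given) by enumeration: outcomes satisfying both events,
    divided by outcomes satisfying the conditioning event."""
    given_outcomes = [o for o in outcomes if given(o)]
    joint = [o for o in given_outcomes if event(o)]
    return len(joint) / len(given_outcomes)

# All 36 equally likely rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# P(total > 10 | first die is 4): with a 4, the total is at most 10.
p = cond_prob(lambda r: sum(r) > 10, lambda r: r[0] == 4, rolls)
print(p)  # 0.0
```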
Needless to say, although Mill was familiar with the work of Pierre-Simon Laplace and other nineteenth-century statisticians, by today's standards, his understanding of estimation and hypothesis testing was simplistic, limited, and—especially in terms of estimation—often erroneous. He did, however, understand that if we want to make valid empirical inferences, we need to obtain and compare conditional probabilities when there may be more than one possible cause of an effect or when the causal relationship is complicated by interaction effects.
The importance of comparing the conditional probabilities of relevant counterfactuals is sometimes overlooked by even good political methodologists. Barbara Geddes, in an insightful and often assigned article on case selection problems in comparative politics, neglects this issue when discussing Theda Skocpol's book States and Social Revolutions.7
Skocpol explores the causes of social revolutions, examining the ones that occurred in France, Russia, and China, as well as the fact that revolutions did not occur in England, Prussia/Germany, and Japan. Geddes seriously questions Skocpol's claim of a causal relationship between foreign threat and social revolution.8
Geddes 1990, figure 10.
My discussion does not, however, in any way undermine Geddes's criticism of Skocpol's research design for selecting on the dependent variable. In fact, Skocpol failed to provide the data necessary to compare the conditional probabilities.
Skocpol clearly believes she is relying on Mill's methods. She states that “[c]omparative historical analysis has a long and distinguished pedigree in social science” and that “[i]ts logic was explicitly laid out by John Stuart Mill in his A System of Logic.”9
Skocpol 1979, 36.
Skocpol does not make clear that she is, at best, using the Indirect Method of Difference, which is, as we shall see, much weaker than the Direct Method of Difference.
The key probabilistic idea upon which statistical causal inference relies is conditional probability.11
Holland 1986.
In this article, I outline Mill's methods, showing the serious limitations of his canons and the need to formally compare conditional probabilities in all but the most limited of situations. I then discuss Geddes's critique of Skocpol and posit several elaborations and corrections. I go on to show how difficult it is to establish a relationship between the counterfactuals of interest and the estimated conditional probabilities. I conclude that case study researchers should use the logic of statistical inference and that quantitative scholars should be more careful in how they interpret the conditional probabilities they estimate.
The application of the five methods Mill discusses has a long history in the social sciences. I am hardly the first to criticize the use of these methods in all but very special circumstances. For example, W. S. Robinson, who is well known in political science for his work on the ecological inference problem,12 offered an early critique of these methods.13
Ecological inferences are about individual behavior, based on data of group behavior.
Robinson 1951.
Adam Przeworski and Henry Teune advocate the use of what they call the “most similar” design and the “most different” design.14
Przeworski and Teune 1970.
Mill describes his views on scientific investigations in A System of Logic, first published in 1843.16
For all page referencing in A System of Logic, I have used a reprint of the eighth edition, which was initially published in 1872. The eighth edition was the last printed in Mill's lifetime. Of all the editions, the eighth and the third were especially revised and supplemented with new material.
Mill 1872.
Lieberson 1991.
Here, I will review Mill's first three canons and show the importance of taking chance into account as well as comparing conditional probabilities when chance variations cannot be ignored.19
I do not review Mill's other two canons, the Method of Residues and the Method of Concomitant Variations, because they are not directly relevant to my discussion.
“If two or more instances of the phenomenon under investigation have only one circumstance in common, the circumstance in which alone all the instances agree is the cause (or effect) of the given phenomenon.”—John Stuart Mill20
Mill 1872, 255.
A possible cause—i.e., antecedent—may consist of more than one event or condition.21
Per Mill, I use the word antecedent to mean “possible cause.” Neither Mill nor I intend to imply that events must be ordered in time to be causally related.
Let us assume that the antecedents under consideration are A, B, C, D, E, and the effect we are interested in is a. Suppose that in one observation we note the antecedents A, B, C, and in another we note the antecedents A, D, E. If we observe the effect a in both cases, we may conclude, following Mill's Method of Agreement, that A is the cause of a. We conclude this because A is the only antecedent that occurs in both cases—i.e., the observations agree on the presence of A. When using this method, we seek observations that agree on the effect, a, and the supposed cause, A, but differ in the presence of other antecedents.
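Treating each observation as a set of antecedents, the Method of Agreement reduces to a set intersection across cases that share the effect. A minimal sketch of this logic (the antecedent names follow the example above):

```python
def method_of_agreement(cases):
    """Mill's Method of Agreement: given cases (sets of antecedents) that
    all exhibit the effect, the candidate cause is whatever antecedent(s)
    the cases have in common."""
    return set.intersection(*(set(c) for c in cases))

# Two observations that both produced effect a:
case1 = {"A", "B", "C"}
case2 = {"A", "D", "E"}
print(method_of_agreement([case1, case2]))  # {'A'}
```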
“If an instance in which the phenomenon under investigation occurs, and an instance in which it does not occur, have every circumstance in common save one, that one occurring only in the former; the circumstance in which alone the two instances differ is the effect, or the cause, or an indispensable part of the cause, of the phenomenon.”
In the Direct Method of Difference, we require, contrary to the Method of Agreement, observations that are alike in every way except one: they differ in the presence or absence of the antecedent we conjecture to be the true cause of a. If we seek to discover the effects of antecedent A, we must introduce A into some set of circumstances we consider relevant, such as B, C; and having noted the effects produced, we must compare them with the effects of B, C when A is absent. If the effect of A, B, C is a, b, c, and the effect of B, C is b, c, it is evident, under this argument, that the cause of a is A.
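The Direct Method of Difference can be sketched the same way: compare one case containing the antecedent and its effects to an otherwise identical case without it, and attribute the extra effect to the extra antecedent. This is only the deterministic elimination logic, not a causal inference procedure:

```python
def method_of_difference(with_case, without_case):
    """Mill's Direct Method of Difference: the two cases must be identical
    except for one antecedent; the effect present only in the first case
    is attributed to that antecedent."""
    ante_w, eff_w = with_case
    ante_wo, eff_wo = without_case
    extra_antecedent = set(ante_w) - set(ante_wo)
    extra_effect = set(eff_w) - set(eff_wo)
    if len(extra_antecedent) != 1:
        raise ValueError("cases must differ in exactly one antecedent")
    return extra_antecedent.pop(), extra_effect

# A, B, C -> a, b, c versus B, C -> b, c: A is inferred to cause a.
cause, extra = method_of_difference((("A", "B", "C"), ("a", "b", "c")),
                                    (("B", "C"), ("b", "c")))
print(cause, extra)  # A {'a'}
```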
Both the Method of Agreement and the Direct Method of Difference are based on a process of elimination. This process has been understood since the time of Francis Bacon to be a centerpiece of inductive reasoning.23
Pledge 1939.
Mill asserts that the Direct Method of Difference is commonly used in experimental science while the Method of Agreement, which is substantially weaker, is employed when experimentation is impossible.24
Mill 1872.
The requirement of a manipulation by the researcher has troubled many philosophers of science. However, the claim is not that causality requires a human manipulation—only that if we wish to measure the effect of a given antecedent, we gain much if we are able to manipulate the antecedent. For instance, we can then be confident that the antecedent caused the effect, and not the other way around. See Brady 2002.
The Direct Method of Difference accurately describes only a small subset of experiments. The method is too restrictive even if the relationship between antecedent A and effect a is deterministic. In particular, the control group B, C and the group with the intervention A, B, C need not be exactly alike (aside from the presence or absence of A). It would be fantastic if the two groups were exactly alike, but this is rarely possible to bring about. Some laboratory experiments are based on this strong assumption; but a more common assumption, one that brings in statistical concerns, is that observations in both groups are balanced before our intervention. In other words, before we apply the treatment, the distributions of both observed and unobserved variables in both groups are presumed to be equal. For example, if group A is the southern states in the United States and group B is the northern states, the two groups are not balanced. The distribution of a long list of variables is different between the groups.
Random assignment of treatment ensures, if the sample is large and if other assumptions are met, that the control and treatment groups are balanced even on unobserved variables.26
Aside from having a large sample size, experiments also need to meet a number of other conditions. See Campbell and Stanley 1966 for an overview particularly relevant for the social sciences. An important problem in experiments dealing with human beings is the issue of compliance. Full compliance implies that every person assigned to treatment actually receives it and every person assigned to control does not. Fortunately, if noncompliance is an issue, there are a number of possible corrections that make few and reasonable assumptions. See Barnard et al. 2003.
Baseline variables are the variables observed before treatment is applied.
More formally, random assignment results in the treatment being stochastically independent of all baseline variables as long as the sample size is large and other assumptions are satisfied.
If the balance assumption is satisfied, a modern experimenter estimates the relative causal effect by comparing the conditional probability of some outcome when the treatment is received with the outcome's conditional probability when the treatment is not received. In the canonical experimental setting, conditional probabilities can be directly interpreted as causal effects.
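The estimator described here is just a difference of two conditional proportions. A toy simulation (all numbers invented) in which random assignment is used and treatment truly raises the outcome probability from 0.30 to 0.50:

```python
import random

random.seed(0)

# Simulate a randomized experiment: treatment raises the outcome
# probability from 0.30 to 0.50 (invented numbers).
n = 100_000
treated, control = [], []
for _ in range(n):
    t = random.random() < 0.5                    # random assignment
    y = random.random() < (0.50 if t else 0.30)  # outcome
    (treated if t else control).append(y)

# Estimated relative causal effect: difference of conditional
# probabilities, P(outcome | treated) - P(outcome | control).
effect = sum(treated) / len(treated) - sum(control) / len(control)
print(round(effect, 2))  # close to 0.20
```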
Complications arise when randomization of treatment is not possible. With observational data (which are found in nature, not as a product of experimental manipulation), many obstacles may prevent conditional probabilities from being directly interpreted as estimates of causal effects. Also problematic are experiments that prevent simple conditional probabilities from being interpreted as relative causal effects. (School voucher experiments are a good example of this phenomenon.)29
Barnard et al. 2003 discuss in detail a broken school voucher experiment and a correction using stratification.
In an experiment, much can go wrong (e.g., compliance and missing data problems), but the fact that there is a manipulation can be very helpful in correcting the problems. See Barnard et al. 2003. Corrections are more problematic in the absence of an experimental manipulation because additional assumptions are required.
A primary reason that case study researchers find deterministic methods appealing is the power of the methods. For example, Mill's Direct Method of Difference can determine causality with only two observations. We assume that the antecedents A, B, C and B, C are exactly alike except for the manipulation of A; we also assume deterministic causation as well as the absence of measurement error and interactions among antecedents. Once probabilistic factors are introduced, though, we need larger numbers of observations to make useful inferences. Unfortunately, because of the power of deterministic methods, social scientists with only a small number of observations are tempted to rely heavily on Mill's methods—particularly the Method of Agreement, which we have discussed, and the Indirect Method of Difference.
“If two or more instances in which the phenomenon occurs have only one circumstance in common, while two or more instances in which it does not occur have nothing in common save the absence of that circumstance, the circumstance in which alone the two sets of instances differ is the effect, or the cause, or an indispensable part of the cause, of the phenomenon.”—John Stuart Mill31
Mill 1872, 259.
This method arises by a “double employment of the Method of Agreement.”32
Ibid., 258.
However, this double use of the Method of Agreement is clearly inferior. The Indirect Method of Difference cannot fulfill the requirements of the Direct Method of Difference, for “the requisitions of the Method of Difference are not satisfied unless we can be quite sure either that the instances affirmative of a agree in no antecedents whatever but A, or that the instances negative of a agree in nothing but the negation of A.”33
Ibid., 259.
Many researchers are unclear about these distinctions between the Indirect and Direct Methods of Difference. They often simply state that they are using the Method of Difference when they are actually using only the Indirect Method of Difference. For example, Skocpol asserts that she is using both the Method of Agreement and the “more powerful” Method of Difference when she is at best using the weaker Method of Agreement twice.34
Skocpol 1979.
In sum, scholars who claim to be using the Method of Agreement and the Method of Difference may actually be using the Indirect Method of Difference, the weaker sibling of the Direct Method of Difference. This weakness would not be of much concern if the phenomena we studied were simple. However, in the social sciences, we encounter serious causal complexities.
Mill's methods of inductive inference are valid only if the mapping between antecedents and effects is unique and deterministic.35
Mill 1872.
Mill's methods have additional limitations that are outside the scope of this discussion. For example, there is a set of conditions, call it z, that always exists but is unconnected with the phenomenon of interest. The star Sirius, for instance, is always present (but not always observable) whenever it rains in Boston. Are Sirius and its gravitational force causally related to rain in Boston? Significant issues arise from this question, but I do not have room to address them here.
The foregoing has a number of implications—most important, for deterministic methods such as Mill's to work, there must be no measurement error. For even if there were a deterministic relationship between antecedent A and effect a, if we were able to measure either A or a only with some random measurement error, the resulting observed relationship would be probabilistic. We might, for instance, mistakenly think we have observed antecedent A (because of measurement error) in the absence of a. In such a situation, the process of elimination would lead us to conclude that A is not a cause of a.
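This point can be made concrete with a small simulation (all rates invented): a perfectly deterministic relationship between A and a looks probabilistic once A is recorded with error.

```python
import random

random.seed(1)

# Deterministic world: a occurs exactly when A does.
# But we record A with a 10% measurement-error rate (invented).
n = 100_000
error_rate = 0.10
observed = []
for _ in range(n):
    A = random.random() < 0.5
    a = A                                            # truly deterministic
    A_recorded = A if random.random() > error_rate else not A
    observed.append((A_recorded, a))

# Observed P(a | recorded A) is no longer 1, so elimination-based
# methods would wrongly reject A as the cause.
with_A = [a for (rec, a) in observed if rec]
print(round(sum(with_A) / len(with_A), 2))  # about 0.9, not 1.0
```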
To my knowledge, no modern social scientist argues that the conditions of uniqueness and lack of measurement error hold in the social sciences. However, the question of whether deterministic causation is plausible has a sizable literature.37
See Waldner 2002 for an overview.
Ontology is the branch of philosophy concerned with the study of existence itself.
Little 1998, chapter 11.
Bennett 1999.
Epistemology is the branch of philosophy concerned with the theory of knowledge—in particular, the nature and derivation of knowledge, its scope, and the reliability of claims to knowledge.
For example, if we can accurately estimate the probability distribution of A causing a, does that mean that we can explain any particular occurrence of a? After surveying three prominent theories of probabilistic causality in the mid-1980s, Wesley Salmon noted that “the primary moral I drew was that causal concepts cannot be fully explicated in terms of statistical relationships; in addition … we need to appeal to causal processes and causal interactions.” Salmon 1989, 168. I do not think these metaphysical issues ought to concern practicing scientists.
Faced with multiple causes and interactions, what are we to do? There are two dominant responses. One relies on statistical tests that account for conditional probabilities and counterfactuals; the other, on detailed (usually formal) theories that make precise, distinct empirical predictions. The statistical approach is adopted by fields such as medicine that have access to large data sets and are able to conduct field experiments. In these fields experiments may be possible, but the available experimental manipulations are not strong enough to satisfy the requirements of the Direct Method of Difference. There are also fields in which researchers can conduct laboratory experiments with such strong manipulations and careful controls that a researcher may reasonably claim to have obtained exact balance and the practical absence of measurement error. These manipulations and controls allow generalizations of the Direct Method of Difference to be used. Deductive theories generally play a prominent role in such fields.43
Mill places great importance on deduction in the three-step process of “induction, ratiocination, and verification.” Mill 1872, 304. But on the whole, although the term ratiocinative is in the title of Mill's treatise and even appears before the term inductive, Mill devotes little space to the issue of deductive reasoning.
These two responses are not mutually exclusive. Economics, for example, is a field that depends heavily on both formal theories and statistical tests. Indeed, unless the proposed formal theories are nearly complete, there will always be a need to take random factors into account. And even the most ambitious formal modeler will no doubt concede that a complete deductive theory of politics is probably impossible. Given that our theories are weak, our causes complex, and our data noisy, we cannot avoid conditional probabilities. Thus, even researchers sympathetic to finding necessary or sufficient causes are often led to probability.44
For example, see Ragin 2000.
Mill asks us to consider the situation in which we wish to ascertain the relationship between rain and any particular wind—say, the west wind. A particular wind will not always lead to rain, but the west wind may make rain more likely because of some causal relationship.45
Since a particular wind will not always lead to rain, this implies, according to Mill, that “the connection, if it exists, cannot be an actual law.” Mill 1872, 346. However, he concedes that rain may be connected with a particular wind through some kind of causation. The fact that Mill reserves the word law to refer to deterministic relationships need not detain us.
How can we determine if rain and a particular wind are causally related? The simple answer is to observe whether rain occurs with one wind more frequently than with any other. But we need to take into account the baseline rate at which a given wind occurs. For example:
In England, westerly winds blow about twice as great a portion of the year as easterly. If, therefore, it rains only twice as often with a westerly as with an easterly wind, we have no reason to infer that any law of nature is concerned in the coincidence. If it rains more than twice as often, we may be sure that some law is concerned; either there is some cause in nature which, in this climate, tends to produce both rain and a westerly wind, or a westerly wind has itself some tendency to produce rain.46
Ibid., 346–7.
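Mill's point is about base rates. A worked toy example (all counts invented) shows why rain occurring twice as often with westerly winds is, by itself, no evidence at all when westerly winds blow twice as often:

```python
# Invented counts for illustration: westerly winds blow twice as often.
days_westerly, days_easterly = 240, 120

# Case 1: it rains exactly twice as often with westerly winds.
rain_westerly, rain_easterly = 60, 30
p_rain_given_w = rain_westerly / days_westerly   # 0.25
p_rain_given_e = rain_easterly / days_easterly   # 0.25
# The conditional probabilities are equal: no evidence of a connection.

# Case 2: it rains three times as often with westerly winds.
p_rain_given_w2 = 90 / days_westerly             # 0.375
# Now P(rain | westerly) > P(rain | easterly): evidence that "some law
# is concerned," in Mill's phrase.
print(p_rain_given_w, p_rain_given_e, p_rain_given_w2)
```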
Formally, we are interested in the following inequality:

P (rain | westerly wind, Ω) > P (rain | not westerly wind, Ω)   (1)

where Ω is a set of background conditions we consider necessary for a valid comparison. The probabilistic answer to our question is to compare the relevant conditional probabilities and to see if the difference between the two is significant.47
Mill had almost no notion of formal hypothesis testing, for it was rigorously developed only after Mill had died. He knew that the hypothesis test must be done, but he did not know how to formally do it. See Mill 1872.
If we find that P (rain | westerly wind, Ω) is significantly larger than P (rain | not westerly wind, Ω), we would have some evidence of a causal relationship between westerly wind and rain. But many questions would remain unanswered. For example, we do not know whether the wind caused rain or vice versa. What is more disconcerting, there may be a common cause that results in both rain and the westerly wind; and without this common cause, the inequality above would be reversed. These caveats should alert us that there is much more to establishing causality than merely estimating some conditional probabilities. I will return to this issue in the penultimate section of this article.
Geddes provides an excellent and wide-ranging discussion of case selection issues.48
Geddes 1990.
Geddes's central critique is that Skocpol offers no contrasting cases when trying to establish her claim of a causal relationship between foreign threat and social revolution in her examination of the revolutions that occurred in France, Russia, and China. Geddes does point out that Skocpol provides contrasting cases—namely, England, Prussia/Germany, and Japan—when attempting to establish the importance of two causal variables: dominant classes having an independent economic base and peasants having autonomy.49
Ibid.
Skocpol asserts that “developments within the international states system as such—especially defeats in wars or threats of invasion and struggles over colonial controls—have directly contributed to virtually all outbreaks of revolutionary crises.”50
Skocpol 1979, 23.
Geddes argues that many nonrevolutionary countries in the world have suffered foreign pressures at least as great as those suffered by the revolutionary countries Skocpol considers, but revolutions are nevertheless rare. Geddes points out that Skocpol first selects countries that have had revolutions and then notices that these countries have faced international threat—i.e., Skocpol has selected on the dependent variable. Such a research design ignores countries that are threatened but do not undergo revolution. In a proper (for instance, random) selection of cases, one could “determine whether revolutions occur more frequently in countries that have faced military threats or not.”51
Geddes 1990, 144.
In short, Geddes claims that Skocpol has no variance in her dependent variable.52
There have been a variety of responses to this charge. David Collier and James Mahoney 1996 concede that such a selection of cases does not allow a researcher to analyze covariation. As they note, the no-variance problem is not exclusively an issue with the dependent variable, and studies that lack variance on an independent variable are obviously also unable to analyze covariation with that variable. Collier and Mahoney argue, however, that a no-variance research design may all the same allow for fruitful inferences. Indeed, it is still possible to apply Mill's Method of Agreement. I have already discussed the Method of Agreement and the problems associated with it; see also Collier 1995.
Some scholars, contrary to Geddes, assert that Skocpol does have variation in her dependent variable, even when she considers the relationship between foreign threat and revolution. See Mahoney 1999, Table 2; Collier and Mahoney 1996. My discussion does not depend on resolving this disagreement.
Geddes's 1990 analysis assumes that Skocpol's theory posits variables individually necessary and collectively sufficient for social revolution. Douglas Dion (1998), in contrast, argues that Skocpol is proposing conditions that are necessary but not sufficient for social revolution.
Goldstone 1997, 108–9.
Burawoy 1989, 768.
Sewell 1996, 260.
Mahoney offers the most elaborate description of Skocpol's research design.57
He argues that Skocpol employs Mill's methods but that she also uses ordinal comparisons and narrative. Mahoney 1999. For our purposes here, it is the ordinal comparisons that matter. In the conclusion of this article, I discuss the importance of the narrative and process-tracing aspects of Skocpol's research design.
Mahoney 1999, 1164.
In the original article, Geddes 1990, this is figure 10.
Ibid., 143.
Ibid., 145.
Ibid.
Relationship in Latin America between Defeat in War and Revolution
In general, I find Geddes's application of Skocpol's theory to Latin American countries to be both reasonable and sympathetic to Skocpol's hypothesis.63
But one objection is that “none of the Latin American countries analyzed by Geddes fits Skocpol's specification of the domain in which she believes the causal patterns identified in her book can be expected to operate.” Collier and Mahoney 1996, 81. Skocpol does assert in her book that she is concerned with revolutions in wealthy, politically ambitious agrarian states that have not experienced colonial domination. Moreover, she explicitly excludes two cases (Mexico 1910 and Bolivia 1952) that Geddes includes in her analysis. I agree with Geddes, however, that it is not clear why the domain of Skocpol's precise causal theory should be so restricted. I do not attempt to resolve Skocpol and Geddes's disagreement regarding parameters. The following discussion is of interest no matter who is right on this point.
Another set of objections to Geddes's analysis concerns her operationalization of concepts. For instance, Dion 1998 claims that Mexico (1910) and Nicaragua (1979) should be moved to the “No Revolution” / “Not Defeated within 20 Years” cell. If Dion is right, one cannot eliminate the possibility that foreign threat is a necessary condition for social revolution. His argument is based, in part, on the understanding that the presence of a large number of cases in the “No Revolution” / “Not Defeated within 20 Years” cell is irrelevant in terms of evaluating necessary causation. This assumption is inaccurate—see Seawright 2002a and Seawright 2002b for details.
I acknowledge that such disagreements with Geddes may be legitimate, but they cut both ways. Goldstone, for example, argues that France was relatively free of foreign threat but nevertheless underwent revolution. Based on this and other points of contention over Skocpol's analysis, Goldstone concludes that “the incidence of war is neither a necessary nor a sufficient answer to the question of the causes of state breakdown.” Goldstone 1991, 20.
Since my main interest here is methodological, I set aside these substantive disagreements and accept both Skocpol's and Geddes's operationalizations.
Geddes 1990, 144.
In order to decide whether the data support a probabilistic association, we need to compare two conditional probabilities. Recalling our discussion of winds and rain (1), we are interested in the following probabilities:

P (revolution | serious foreign threat, Ω)   (2)

P (revolution | no serious foreign threat, Ω)   (3)

where Ω is the set of background conditions we consider necessary for valid comparisons (such as village autonomy and dominant classes who are economically independent). The probabilistic version of Skocpol's hypothesis is that the probability of revolution given foreign threat (2) is greater than the probability of revolution given the absence of foreign threat (3). Geddes never makes this comparison, but her table offers us the data to do so. We may estimate the first conditional probability of interest (2) to be 1/8 = 0.125. In other words, according to the table, one observation (Bolivia, 1952) of the eight that experienced serious foreign threat underwent revolution.
An estimate of the second conditional probability of interest, the probability of revolution given that there is no serious foreign threat (3), must still be obtained. However, it is not clear from Geddes's table how many countries are in the “No Revolution” / “Not Defeated within 20 Years” cell. She only labels them “all others.” Nevertheless, any reasonable manner of filling this cell will result in an estimate for (3) that is a very small proportion—a much smaller proportion than the one-in-eight estimate obtained for (2). For example, let's take an extremely conservative approach and assume that in this “all others” cell we shall only include countries that do not appear in the other three cells of the table. Countries may appear multiple times in the table (notice Bolivia). We are left with four countries: Ecuador, El Salvador, Guatemala, and Honduras. Let us assume further that every 20 years since independence during which neither a revolution nor a defeat in a foreign war occurred in a given country counts as one observation for the “No Revolution” / “Not Defeated within 20 Years” cell of the table. This 20-year window is consistent with Geddes's decision to allow for 20 years between foreign defeat and revolution. Considering only these four countries, we arrive at 684 such years and hence roughly 34 observations. Since we have 34.2 20-year blocks with neither foreign defeat nor revolution, and the table contains three revolutions with no prior foreign defeat, our estimate of (3) is 3/(3 + 34.2) ≈ 0.081. This number, 0.081, is much smaller than our estimate of (2), which is 0.125.
Instead of only considering the four countries that do not appear anywhere else in the table, if we consider all of the countries in 20-year blocks (starting from the date of independence and ending in 1989) with neither a revolution nor a foreign defeat, we are left with roughly 67 observations (1,337 years). This yields an estimate for (3) of 3/(3 + 66.85) ≈ 0.043. Obviously, if we count every year as an observation, our estimate of (3) becomes even smaller.
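The arithmetic is simple enough to check directly. In the sketch below, the counts come from the discussion (one revolution among eight seriously threatened observations; 684 and 1,337 no-defeat country-years); the numerator of three revolutions without a prior foreign defeat is an assumption about the counts in Geddes's table:

```python
# Estimate (2): P(revolution | serious foreign threat).
# One of the eight seriously threatened observations (Bolivia, 1952)
# underwent revolution.
p2 = 1 / 8                      # 0.125

# Estimate (3), conservative version: 684 country-years without defeat
# or revolution -> 34.2 twenty-year blocks. The numerator of 3
# (revolutions without a prior foreign defeat) is an assumption
# about Geddes's table.
blocks_conservative = 684 / 20  # 34.2
p3_conservative = 3 / (3 + blocks_conservative)

# Broader version: 1,337 country-years -> roughly 67 blocks.
blocks_all = 1337 / 20          # 66.85
p3_all = 3 / (3 + blocks_all)

print(round(p2, 3), round(p3_conservative, 3), round(p3_all, 3))
```

Either way the blocks are counted, the estimate of (3) is well below the estimate of (2).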
It is not obvious how to determine whether these estimated differences between (2) and (3) are statistically significant. What is the relevant statistical distribution—a sampling distribution? a Bayesian posterior distribution?—of revolutions and significant foreign threats? However one answers that question, Geddes is clearly incorrect when she asserts that her table offers evidence that contradicts Skocpol's conclusions. Indeed, depending on the distributions of the key variables, the table may offer support for Skocpol's substantive points.65
Some researchers may be tempted to make the usual assumption that all of the observations are independent. Even under that assumption, Pearson's well-known χ² test of independence is inappropriate for these data because of the small number of observed counts in some cells. A reliable Bayesian method shows that 93.82 percent of the posterior density is consistent with our estimate of (2) being larger than our estimate of (3). Geddes's original table ends in 1989 because her article was published in 1990. If the table is updated to the end of 2003, the only change is that the count in the “No Revolution” / “Not Defeated within 20 Years” cell becomes 73. The Bayesian method then shows that 94.61 percent of the posterior density is consistent with our estimate of (2) being larger than our estimate of (3). See Sekhon 2003 for details.
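Sekhon 2003 gives the details of the Bayesian method actually used; since it is not spelled out here, the following is only a generic sketch of one such comparison. It assumes uniform Beta(1, 1) priors and approximate counts of my own choosing (1 revolution in 8 defeated cases; 3 revolutions in roughly 70 undefeated observations), so its output will not match the percentages reported in the text.

```python
import random

random.seed(0)

def posterior_prob_greater(s1, n1, s2, n2, draws=100_000):
    """Monte Carlo estimate of P(p1 > p2) when p1 and p2 have
    independent Beta(1 + successes, 1 + failures) posteriors."""
    wins = 0
    for _ in range(draws):
        p1 = random.betavariate(1 + s1, 1 + n1 - s1)  # revolution | defeat
        p2 = random.betavariate(1 + s2, 1 + n2 - s2)  # revolution | no defeat
        wins += p1 > p2
    return wins / draws

# Assumed counts: 1 revolution of 8 defeated cases, 3 of ~70 undefeated.
prob = posterior_prob_greater(1, 8, 3, 70)
```

Even with these rough counts, the bulk of the posterior density puts the probability of revolution higher after a foreign defeat than without one.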
Since publishing States and Social Revolutions, Skocpol has argued that “comparative historical analyses proceed through logical juxtapositions of aspects of small numbers of cases. They attempt to identify invariant causal configurations that necessarily (rather than probably) combine to account for outcomes of interest.”66
Skocpol 1984, 378.
Lieberson 1994.
Nothing in this article should be taken as disagreement with Geddes's critique of Skocpol's research design. There is broad consensus that selection on the dependent variable leads to serious biases in inferences when probabilistic associations are of interest. But there is no consensus about the problems caused by selection issues when testing for necessary or sufficient causation. This is because scholars do not agree on what information is relevant in such testing.68
Indeed, some even reject the logic of deterministic elimination when counterexamples are observed—the logic upon which Mill's methods are based. To reach this conclusion, they rely on a particular form of measurement error.69
Braumoeller and Goertz 2000.
Dion 1998.
Ragin 2000.
These attempts to bridge the gap between deterministic theories of causality and notions of probability are interesting. Although it is outside the scope of this article to fully engage them, I will note that once we admit that measurement error and causal complexity are problems, it is unclear what benefit there is in assuming that the underlying (but unobservable) causal relationship is in fact deterministic. This is an untestable proposition and hence one that should not be relied upon. It would appear to be more fruitful and straightforward to rely instead fully on the apparatus of statistical causal inference.72
This article has some similarities with Jason Seawright's 2002a discussion of how to test for necessary or sufficient causation. Seawright and I, however, have different goals. He assumes that one wants to test for necessary or sufficient causation, and then goes on to demonstrate that all four cells in Geddes's table contain relevant information for such tests, even the “No Revolution” / “Not Defeated within 20 Years” cell. Nothing in Seawright 2002a alters the conclusion that, based on Table 1, one is able to reject the hypothesis that foreign threat is a necessary and/or sufficient cause of revolution. But I argue that one should test for probabilistic causation in the social sciences. And there is no disagreement in the literature that for such tests all four cells of Table 1 are of interest.
No matter what inference one makes based on Table 1, Geddes is correct in saying that this exercise does not constitute a definitive test of Skocpol's argument. As we have seen, many of the decisions leading to the construction of Table 1 are debatable. But even if we resolve these debates in favor of Table 1, my conditional probability estimates may not provide accurate information about the counterfactuals of interest—e.g., whether a given country undergoing revolution would have been less likely to undergo revolution if it had not, ceteris paribus, faced the foreign threat it did. Moving from estimating conditional probabilities to making judgments about counterfactuals we never observe is tricky business.
Although conditional probability is at the heart of causal inference, by itself it is not enough to support such inferences. Underlying conditional probability is a notion of counterfactual inference. It is possible to have a causal theory that makes no reference to counterfactuals,73
but counterfactual theories of causality are by far the norm, especially in statistics.74
Splawa-Neyman 1990 [1923]; Rubin 1974; Rubin 1978; Holland 1986; Rubin 1990.
We have to depend on other means to obtain information about what would occur if A were present and if A were not. In many fields, a common alternative to the Direct Method of Difference is a randomized experiment. For example, we can contact Jane to prompt her to vote as part of a turnout study, or we can not contact her. But we cannot do both. If we contact her, we must estimate what would have happened if we had not contacted her, in order to determine what effect contacting Jane has on her behavior (whether she votes or not). We could seek to compare Jane's behavior with that of someone we did not contact who is exactly like her. The reality, however, is that no one is exactly like Jane (aside from the treatment received). So instead, in a randomized experiment, we obtain a group of people (the larger the better), contacting a randomly chosen subset and assigning the remainder to the control group (not to be contacted). We then observe the difference in turnout rates between the two groups and attribute any differences to our treatment.
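The logic of the turnout experiment just described can be illustrated with a toy simulation. The baseline turnout rate, the treatment effect, and the sample size below are hypothetical numbers chosen for illustration; the point is only that random assignment makes the difference in group turnout rates an estimate of the average effect of contact.

```python
import random

random.seed(42)

BASELINE = 0.40   # turnout probability if not contacted (hypothetical)
EFFECT = 0.10     # increase in that probability if contacted (hypothetical)
N = 20_000

treated = [0, 0]  # [votes, people] among the contacted
control = [0, 0]  # [votes, people] among the not contacted

for _ in range(N):
    contacted = random.random() < 0.5                # coin-flip assignment
    p_vote = BASELINE + (EFFECT if contacted else 0.0)
    group = treated if contacted else control
    group[1] += 1
    group[0] += random.random() < p_vote             # True counts as 1 vote

effect_estimate = treated[0] / treated[1] - control[0] / control[1]
```

Because assignment is random, the two groups are balanced in expectation, and effect_estimate converges on the true effect of 0.10 as N grows.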
In principle, the process of random assignment results in the observed and unobserved baseline variables of the two groups being balanced.75
This occurs with arbitrarily high probability as the sample size grows.
Gerber and Green 2000; Imai (forthcoming); Rubin 1974; Rubin 1978.
In the case of Skocpol's work on social revolutions, we would like to know whether countries that faced foreign threat would have been less likely to undergo revolution had they not faced such a threat, and vice versa. It is possible to consider foreign threat the treatment and revolution the outcome of interest. But suppose that countries with weak states are more likely to undergo revolution and also more likely to be attacked by foreign adversaries. In that case, the treatment group (countries that faced external threat) and the control group (those that did not) are not balanced. Thus, any inferences about the counterfactual of interest based on the estimated conditional probabilities in the previous section would be erroneous. How erroneous the inferences will be depends on how unbalanced the two groups are.
Aspects of the previous two paragraphs are well understood by political scientists, especially if we replace the term unbalanced groups with the nearly synonymous confounding variables, or omitted variables. But the core counterfactual motivation is often forgotten. This situation may arise when quantitative scholars attempt to estimate partial effects.77
These are the effects a given antecedent has when all of the other variables are held constant.
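The weak-state scenario described above can be made concrete with a second toy simulation, again under assumed numbers of my own. Here foreign threat has no causal effect at all, but state weakness raises both the probability of being threatened and the probability of revolution; the naive comparison of conditional probabilities then manufactures a "treatment effect" out of the imbalance.

```python
import random

random.seed(7)

N = 50_000
threatened = [0, 0]      # [revolutions, observations] given foreign threat
unthreatened = [0, 0]    # [revolutions, observations] given no threat

for _ in range(N):
    weak = random.random() < 0.5
    # Weak states are threatened more often ...
    threat = random.random() < (0.7 if weak else 0.3)
    # ... and revolt more often, regardless of threat (true effect = 0).
    revolt = random.random() < (0.30 if weak else 0.05)
    group = threatened if threat else unthreatened
    group[1] += 1
    group[0] += revolt

naive_diff = threatened[0] / threatened[1] - unthreatened[0] / unthreatened[1]
```

Even though threat never enters the revolution probability, naive_diff comes out clearly positive with these numbers, because threat proxies for weakness: the comparison of conditional probabilities answers the wrong counterfactual question.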
A good example of these issues is offered by the literatures that developed in the aftermath of the 2000 U.S. presidential election. A number of scholars have tried to estimate the relationship between voters' race and uncounted ballots. Ballots are uncounted because they contain either undervotes (no votes) or overvotes (more than the legal number of votes).78
See Herron and Sekhon 2003 and Herron and Sekhon (forthcoming) for a review of the literature and relevant empirical analyses.
No general solutions or methods ensure that the statistical quantities we estimate provide useful information about the counterfactuals of interest. The solution, which almost always relies on research design and statistical methods, depends on the precise research question under consideration. But all too often, the problem is ignored, and the regression coefficient itself is considered to be an estimate of the partial causal effect. In sum, estimates of conditional means and probabilities are an important component of establishing causal effects, but they are not enough. We must also establish the relationship between the counterfactuals of interest and the conditional probabilities we have managed to estimate.79
Many other issues are important in examining the quality of the conditional probabilities we have estimated. A prominent example is how and when we can legitimately combine a given set of observations—a question that has long been central to statistics. (In fact, a standard objection to statistical analysis is that observations rather different from one another should not be combined.) The original purpose of least squares was to give astronomers a way of combining and weighting their discrepant observations in order to obtain better estimates of the locations and motions of celestial objects. (See Stigler 1986.) A large variety of techniques can help analysts decide when it is valid to combine observations. For example, see Bartels 1996; Mebane and Sekhon 2004. This is a subject to which political scientists need to give more attention.
This article has by no means offered a complete discussion of causality and what it takes to demonstrate a causal relationship. There is much more to this process than just conditional probabilities or even counterfactuals. For example, it is often important to find the causal mechanism at work—to understand the sequence of events leading from A to a. I agree with qualitative researchers that case studies are particularly helpful in learning about such mechanisms. Process tracing is often cited as being especially useful in this regard.80
Process tracing is the enterprise of using narrative and other qualitative methods to determine the mechanisms by which a particular antecedent produces its effects. See George and McKeown 1985.
The importance of searching for causal mechanisms is often overestimated by political scientists, and this sometimes leads to an underestimate of the importance of comparing conditional probabilities. We do not need to have much or any knowledge about mechanisms in order to know that a causal relationship exists. For instance, owing to rudimentary experiments, aspirin has been known to help with pain since Felix Hoffmann synthesized a stable form of acetylsalicylic acid in 1897. In fact, the bark and leaves of the willow tree (rich in the substance called salicin) have been known to help alleviate pain at least since the time of Hippocrates. But only in 1971 did John Vane discover aspirin's biological mechanism of action.81
He was awarded the 1982 Nobel Prize for Medicine for his discovery.
In clinical medicine, case studies continue to contribute valuable knowledge even though large-n statistical research dominates. Although the coexistence of case studies and large-n studies is sometimes uneasy, as shown by the rise of outcomes research, it is nevertheless extremely fruitful; clinicians and scientists are more cooperative than their counterparts in political science.82
Returning to the aspirin example, it is interesting to note that Lawrence Craven, a general practitioner, noticed in 1948 that the 400 men to whom he had prescribed aspirin did not suffer any heart attacks. But it was not until 1985 that the U.S. Food and Drug Administration (FDA) first approved the use of aspirin for the purposes of reducing the risk of heart attack. The path from Craven's observation to the FDA's action required a large-scale randomized experiment.
Vandenbroucke 2001.
Eckstein 1975.
Table 1. Relationship in Latin America between Defeat in War and Revolution