1. Introduction
The question of the transitivity of causality has been the subject of much debate. As Paul and Hall (2013, 3) say, "Causality seems to be transitive. If C causes D and D causes E, then C thereby causes E." The appeal to transitivity is quite standard in informal scientific reasoning: we say things like "the billiards expert hit ball A, causing it to hit ball B, causing it to carom into ball C, which then drops into the pocket." It then seems natural to conclude that the billiards expert's shot caused ball C to drop into the pocket.
Paul and Hall (2013, 215) suggest that "preserving transitivity is a basic desideratum for an adequate analysis of causation." Hall (2000, 198) is even more insistent, saying, "That causation is, necessarily, a transitive relation on events seems to many a bedrock datum, one of the few indisputable a priori insights we have into the workings of the concept." Lewis (1986, 2000) imposes transitivity in his influential definition of causality, by taking causality to be the transitive closure ("ancestral," in his terminology) of a one-step causal dependence relation.
But numerous examples have been presented that cast doubt on transitivity. Paul and Hall (2013) give a sequence of such counterexamples; Hall (2000) gives others. I review two such examples in the next section. This leaves us in a somewhat uncomfortable position. It seems so natural to think of causality as transitive. In light of the examples, should we just give up on these intuitions? Paul and Hall (2013, 219) suggest that "what's needed is a more developed story, according to which the inference from 'C causes D' and 'D causes E' to 'C causes E' is safe provided such-and-such conditions obtain—where these conditions can typically be assumed to obtain, except perhaps in odd cases." The goal of this article is to provide sufficient conditions for causality to be transitive. I formalize this using the structural equations framework of Halpern and Pearl (2001, 2005). The properties that I require suggest that these conditions apply to any definition of causality that depends on counterfactual dependence and uses structural equations (see, e.g., Hitchcock 2001, 2007; Woodward 2003; Halpern and Pearl 2005; Glymour and Wimberly 2007; Hall 2007; Halpern 2015, for examples of such approaches). These conditions may explain why, although causality is not transitive in general (and is not guaranteed to be transitive according to any of the counterfactual accounts mentioned above), we tend to think of causality as transitive and are surprised when it is not.
2. Defining Causation Using Counterfactuals
In this section, I review some of the machinery of structural equations needed to define causality. For definiteness, I use the same formalism as that given by Halpern and Pearl (2005).
2.1. Causal Structures
Approaches based on structural equations assume that the world is described in terms of random variables and their values. Some random variables may have a causal influence on others. This influence is modeled by a set of structural equations. It is conceptually useful to split the random variables into two sets: the exogenous variables, whose values are determined by factors outside the model, and the endogenous variables, whose values are ultimately determined by the exogenous variables. For example, in a voting scenario, we could have endogenous variables that describe what the voters actually do (i.e., which candidate they vote for), exogenous variables that describe the factors that determine how the voters vote, and a variable describing the outcome (who wins). The structural equations describe how the outcome is determined (majority rules; a candidate wins if A and at least two of B, C, D, and E vote for him; etc.).
Formally, a causal model M is a pair , where
is a signature, which explicitly lists the endogenous and exogenous variables and characterizes their possible values, and
defines a set of modifiable structural equations, relating the values of the variables. A signature
is a tuple
, where
is a set of exogenous variables,
is a set of endogenous variables, and
associates with every variable
a nonempty set
of possible values for Y (i.e., the set of values over which Y ranges). For simplicity, I assume here that
is finite, as is
for every endogenous variable
. The relation
associates with each endogenous variable
a function denoted
such that
. This mathematical notation just makes precise the fact that
determines the value of X, given the values of all the other variables in
. If there is one exogenous variable U and three endogenous variables, X, Y, and Z, then
defines the values of X in terms of the values of Y, Z, and U. For example, we might have
, which is usually written as
.Footnote 1 Thus, if
and
, then
, regardless of how Z is set.
The structural equations define what happens in the presence of external interventions. Setting the value of some variable $X$ to $x$ in a causal model $M$ results in a new causal model, denoted $M_{X \leftarrow x}$, which is identical to $M$, except that the equation for $X$ in $\mathcal{F}$ is replaced by $X = x$.
Following Halpern and Pearl (2005), I restrict attention here to what are called recursive (or acyclic) models. This is the special case in which there is some total ordering $\preceq$ of the endogenous variables (the ones in $\mathcal{V}$) such that, unless $X \preceq Y$, $Y$ is independent of $X$; that is, $F_Y(\ldots, x, \ldots) = F_Y(\ldots, x', \ldots)$ for all $x, x' \in \mathcal{R}(X)$. I write $X \prec Y$ if $X \preceq Y$ and $X \neq Y$. If $X \prec Y$, then the value of $X$ may affect the value of $Y$, but the value of $Y$ cannot affect the value of $X$. It should be clear that if $M$ is an acyclic causal model, then given a context, that is, a setting $\vec{u}$ for the exogenous variables in $\mathcal{U}$, there is a unique solution for all the equations. We simply solve for the variables in the order given by $\preceq$. The value of the variables that come first in the order, that is, the variables $X$ such that there is no variable $Y$ such that $Y \prec X$, depends only on the exogenous variables, so their value is immediately determined by the values of the exogenous variables. The values of variables later in the order can be determined once we have determined the values of all the variables earlier in the order.
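For readers who find an operational view helpful, here is a minimal sketch (in Python; the representation, function names, and toy equations are mine, not part of the formalism) of a recursive causal model: each structural equation is a function of the other variables' values, a context fixes the exogenous variables, an intervention replaces an equation by a constant, and the unique solution is computed by working through the endogenous variables in causal order.

```python
# A minimal, illustrative sketch of a recursive (acyclic) causal model.
# Each equation maps the current value assignment to a variable's value;
# an intervention X <- x replaces the equation for X by the constant x.

def solve(equations, order, context, interventions=None):
    """Return the unique solution of an acyclic model in the given context.

    equations:     dict mapping each endogenous variable to a function of the
                   current (partial) value assignment
    order:         endogenous variables listed consistently with the ordering <=
    context:       dict giving the values of the exogenous variables
    interventions: dict of variables set by intervention (overrides equations)
    """
    interventions = interventions or {}
    values = dict(context)                      # exogenous values come first
    for var in order:                           # then solve in causal order
        if var in interventions:
            values[var] = interventions[var]    # equation replaced by a constant
        else:
            values[var] = equations[var](values)
    return values

# The toy example from the text: X = U + Y, with Z irrelevant to X.
equations = {"Y": lambda v: 3, "Z": lambda v: 7, "X": lambda v: v["U"] + v["Y"]}
print(solve(equations, ["Y", "Z", "X"], {"U": 2}))            # X = 5
print(solve(equations, ["Y", "Z", "X"], {"U": 2}, {"Y": 0}))  # intervention Y <- 0 gives X = 2
```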
It is sometimes helpful to represent a causal model graphically. Each node in the graph corresponds to one variable in the model. An arrow from one node to another indicates that the former variable figures as a nontrivial argument in the equation for the latter. The graphical representation is useful for visualizing causal models, and will be used in the next section.
2.2. A Language for Reasoning about Causality
To define causality carefully, it is useful to have a language to reason about causality. Given a signature $\mathcal{S} = (\mathcal{U}, \mathcal{V}, \mathcal{R})$, a primitive event is a formula of the form $X = x$, for $X \in \mathcal{V}$ and $x \in \mathcal{R}(X)$. A causal formula (over $\mathcal{S}$) is one of the form $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$, where
• $\varphi$ is a Boolean combination of primitive events,
• $Y_1, \ldots, Y_k$ are distinct variables in $\mathcal{V}$, and
• $y_i \in \mathcal{R}(Y_i)$.
Such a formula is abbreviated as $[\vec{Y} \leftarrow \vec{y}]\varphi$. The special case in which $k = 0$ is abbreviated as $\varphi$. Intuitively, $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$ says that $\varphi$ would hold if $Y_i$ were set to $y_i$, for $i = 1, \ldots, k$.
A causal formula $\psi$ is true or false in a causal model, given a context. As usual, I write $(M, \vec{u}) \models \psi$ if the causal formula $\psi$ is true in causal model $M$ given context $\vec{u}$. The $\models$ relation is defined inductively. If the variable $X$ has value $x$ in the unique (since we are dealing with acyclic models) solution to the equations in $M$ in context $\vec{u}$ (i.e., the unique vector of values for the endogenous variables that simultaneously satisfies all equations in $M$ with the variables in $\mathcal{U}$ set to $\vec{u}$), then $(M, \vec{u}) \models X = x$. The truth of conjunctions and negations is defined in the standard way. Finally, $(M, \vec{u}) \models [\vec{Y} \leftarrow \vec{y}]\varphi$ if $(M_{\vec{Y} \leftarrow \vec{y}}, \vec{u}) \models \varphi$.
2.3. Defining Causality
The basic intuition behind counterfactual definitions of causality is that A is a cause of B if there is counterfactual dependence between A and B: if A had not occurred (although it did), then B would not have occurred. It is well known that counterfactual dependence does not completely capture causality; there are many examples in the literature where people say that A is a cause of B despite the fact that B does not counterfactually depend on A (at least, not in this simple sense). Nevertheless, all the counterfactual definitions of causality (as well as people's causality ascriptions) agree that this simple type of counterfactual dependence gives a sufficient condition for causality. For the purposes of this article, I consider only cases in which this counterfactual dependence holds.
More formally, say that $X = x$ is a but-for cause of $\varphi$ in $(M, \vec{u})$ (where $\varphi$ is a Boolean combination of primitive events) if $(M, \vec{u}) \models (X = x) \wedge \varphi$ (so both $X = x$ and $\varphi$ hold in context $\vec{u}$) and there exists some $x'$ such that $(M, \vec{u}) \models [X \leftarrow x']\neg\varphi$. Thus, with a but-for cause, changing the value of $X$ to something other than $x$ changes the truth value of $\varphi$; that is, $\varphi$ counterfactually depends on $X$.
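The definition can be checked mechanically. The following sketch is illustrative only (it reuses the solve helper from the sketch in section 2.1, and the representation of events as predicates is an assumption of mine, not the article's); it tests but-for causality by trying every alternative value of X.

```python
def is_but_for_cause(equations, order, context, ranges, X, x, phi):
    """X = x is a but-for cause of the event phi in (M, u) if X = x and phi hold
    in the actual solution and some alternative setting x' of X makes phi false."""
    actual = solve(equations, order, context)          # solve() from the earlier sketch
    if actual[X] != x or not phi(actual):
        return False
    return any(not phi(solve(equations, order, context, {X: x_alt}))
               for x_alt in ranges[X] if x_alt != x)

# With the toy model from section 2.1: Y = 3 is a but-for cause of X = 5.
print(is_but_for_cause(equations, ["Y", "Z", "X"], {"U": 2}, {"Y": [0, 3]},
                       "Y", 3, lambda v: v["X"] == 5))   # True
```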
All the complications in counterfactual approaches to causality arise in how they deal with cases of causality that are not but-for causality. Roughly speaking, the idea is that $X = x$ is a cause of $\varphi$ if the outcome $\varphi$ counterfactually depends on $X$ under the appropriate contingency (i.e., holding some other variables fixed at certain values). While the various approaches to defining causality differ in exactly how this is done, they all agree that a but-for cause should count as a cause. So, for simplicity, in this article I consider only but-for causality and do not bother to give a general definition of causality.
3. Sufficient Conditions for Transitivity
In this section I present two different sets of conditions sufficient for transitivity. Before doing that, I give two counterexamples to transitivity, since these motivate the conditions. The first example is taken from (an early version of) Hall (2004) and is also considered by Halpern and Pearl (2005).
Example 1. Consider the following scenario:
Billy contracts a serious but nonfatal disease, so he is hospitalized. Suppose that Monday’s doctor is reliable and administers the medicine first thing in the morning, so that Billy is fully recovered by Tuesday afternoon. Tuesday’s doctor is also reliable and would have treated Billy if Monday’s doctor had failed to. Given that Monday’s doctor treated Billy, it’s a good thing that Tuesday’s doctor did not treat him: one dose of medication is harmless, but two doses are lethal.
Suppose that we are interested in Billy’s medical condition on Wednesday. We can represent this using a causal model with three variables:
• MT for Monday’s treatment (1 if Billy was treated Monday; 0 otherwise);
• TT for Tuesday’s treatment (1 if Billy was treated Tuesday; 0 otherwise); and
• BMC for Billy’s medical condition (0 if Billy feels fine on Wednesday; 1 if Billy feels sick on Wednesday; 2 if Billy is dead on Wednesday).
We can then describe Billy’s condition as a function of the four possible combinations of treatment/nontreatment on Monday and Tuesday. I omit the obvious structural equations corresponding to this discussion; the causal graph is shown in figure 1.
Figure 1. Billy’s medical condition
In the context in which Billy is sick and Monday's doctor treats him, $MT = 1$ is a but-for cause of $TT = 0$: because Billy is treated Monday, he is not treated on Tuesday morning. And $TT = 0$ is a but-for cause of Billy's being alive ($BMC = 0$). However, $MT = 1$ is not a cause of Billy's being alive. It is clearly not a but-for cause; Billy will still be alive if MT is set to 0. Indeed, it is not even a cause under the more general definitions of causality, according to all the approaches mentioned above; no setting of the other variables will lead to a counterfactual dependence between MT and $BMC = 0$. This shows that causality is not transitive according to these approaches. Although $MT = 1$ is a cause of $TT = 0$ and $TT = 0$ is a cause of $BMC = 0$, $MT = 1$ is not a cause of $BMC = 0$. (Of course, according to Lewis [1986, 2000], who takes the transitive closure of the one-step dependence relation, $MT = 1$ is a cause of $BMC = 0$.) QED
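The failure of transitivity here is easy to verify mechanically. The following self-contained sketch (illustrative only; the encoding is mine) implements the equations just described and confirms the three claims above.

```python
# Billy's medical condition: TT = 1 - MT; BMC is 2 (dead) with two doses,
# 1 (sick) with no dose, and 0 (feels fine) with exactly one dose.
def billy(mt, tt=None):
    tt = (1 - mt) if tt is None else tt          # intervene on TT by passing a value
    doses = mt + tt
    bmc = 2 if doses == 2 else (1 if doses == 0 else 0)
    return tt, bmc

assert billy(1) == (0, 0)                        # actual context: treated Monday only
assert billy(0)[0] == 1                          # MT = 1 is a but-for cause of TT = 0
assert billy(1, tt=1)[1] != 0                    # TT = 0 is a but-for cause of BMC = 0
assert all(billy(mt)[1] == 0 for mt in (0, 1))   # but BMC = 0 however MT is set
```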
Although this example may seem somewhat forced, there are many quite realistic examples of lack of transitivity with exactly the same structure. Consider the body's homeostatic system. An increase in external temperature causes a short-term increase in core body temperature, which in turn causes the homeostatic system to kick in and return the body to normal core body temperature shortly thereafter. But if we say that the increase in external temperature happened at time 0 and the return to normal core body temperature happened at time 1, we certainly would not want to say that the increase in external temperature at time 0 caused the body temperature to be normal at time 1 (see footnote 2).
There is another reason that causality is intransitive, which is illustrated by the following example, due to McDermott (1995).
Example 2. Suppose that a dog bites Jim’s right hand. Jim was planning to detonate a bomb, which he normally would do by pressing the button with his right forefinger. Because of the dog bite, he presses the button with his left forefinger. The bomb still goes off.
Consider the causal model with variables DB (the dog bites, with values 0 and 1), P (the press of the button, with values 0, 1, and 2, depending on whether the button is not pressed at all, pressed with the right hand, or pressed with the left hand), and B (the bomb goes off). We have the obvious equations: DB is determined by the context, $P = DB + 1$, and $B = 1$ if P is either 1 or 2. In the context in which $DB = 1$, it is clear that $DB = 1$ is a but-for cause of $P = 2$ (if the dog had not bitten, P would have been 1), and $P = 2$ is a but-for cause of $B = 1$ (if P were 0, then B would be 0), but $DB = 1$ is not a but-for cause of $B = 1$. And again, $DB = 1$ is not a cause of $B = 1$, even under a more general notion of causation. Whether or not the dog had bitten Jim, the button would have been pressed, and the bomb would have detonated. QED
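As with example 1, the claims can be verified directly; the sketch below (illustrative only, with the encoding mine) captures the equations $P = DB + 1$ and $B = 1$ exactly if the button is pressed.

```python
# Dog-bite example: P = DB + 1 (right or left forefinger), B = 1 iff the button is pressed.
def bomb(db, p=None):
    p = (db + 1) if p is None else p              # intervene on P by passing a value
    b = 1 if p in (1, 2) else 0
    return p, b

assert bomb(1) == (2, 1)                          # actual context: dog bites, left-hand press
assert bomb(0)[0] == 1                            # DB = 1 is a but-for cause of P = 2
assert bomb(1, p=0)[1] == 0                       # P = 2 is a but-for cause of B = 1
assert all(bomb(db)[1] == 1 for db in (0, 1))     # but the bomb goes off however DB is set
```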
As I said, I believe that we feel that causality is transitive because, in typical settings, it is. My belief is based mainly on introspection here and informal polling of colleagues. Even when told that causality is not transitive, people seem to find it hard to construct counterexamples. This suggests that when they think about their everyday experience of causality, they come up with examples in which causality is transitive. If there were many counterexamples available in everyday life, it would be easier to generate them.
I now give two sets of simple conditions that are sufficient to guarantee transitivity. Specifically, I give conditions to guarantee that if $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ in $(M, \vec{u})$ and $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, then $X_1 = x_1$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$.
The first set of conditions assumes that $X_1$, $X_2$, and $X_3$ each has a default setting. We can think of the default setting as the result of doing nothing. This makes sense, for example, in the billiards example at the beginning of the article, where we can take the default setting for the shot to be the expert doing nothing and the default setting for the balls to be that they are not in motion. Let the default setting be denoted by the value 0.
Proposition 1. Suppose that (a) $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ in $(M, \vec{u})$, (b) $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, (c) $x_3 \neq 0$, (d) $(M, \vec{u}) \models [X_1 \leftarrow 0](X_2 = 0)$, and (e) $(M, \vec{u}) \models [X_1 \leftarrow 0, X_2 \leftarrow 0](X_3 = 0)$. Then $X_1 = x_1$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$.
Proof. If $X_2 = 0$ in the unique solution to the equations in the causal model $M_{X_1 \leftarrow 0}$ in context $\vec{u}$ and $X_3 = 0$ in the unique solution to the equations in $M_{X_1 \leftarrow 0, X_2 \leftarrow 0}$ in context $\vec{u}$, then it is immediate that $X_3 = 0$ in the unique solution to the equations in $M_{X_1 \leftarrow 0}$ in context $\vec{u}$. That is, $(M, \vec{u}) \models [X_1 \leftarrow 0](X_3 = 0)$. It follows from assumption a that $(M, \vec{u}) \models X_1 = x_1$. We must thus have $x_1 \neq 0$, since otherwise setting $X_1$ to 0 would be setting it to its actual value, so we would have $(M, \vec{u}) \models X_3 = 0$, which contradicts assumptions b and c. Thus, $X_1 = x_1$ is a but-for cause of $X_3 = x_3$, since the value of $X_3$ depends counterfactually on that of $X_1$: by assumptions a and b, $(M, \vec{u}) \models X_1 = x_1 \wedge X_3 = x_3$, while $(M, \vec{u}) \models [X_1 \leftarrow 0]\neg(X_3 = x_3)$, because $(M, \vec{u}) \models [X_1 \leftarrow 0](X_3 = 0)$ and $x_3 \neq 0$. QED
Although the conditions of proposition 1 are clearly rather specialized, they arise often in practice. Conditions d and e say that if $X_1$ remains in its default state, then so will $X_2$, and if both $X_1$ and $X_2$ remain in their default states, then so will $X_3$. (These assumptions are very much in the spirit of the assumptions that make a causal network self-contained, in the sense defined by Hitchcock [2007].) Put another way, this says that the reason for $X_2$ not being in its default state is $X_1$ not being in its default state, and the reason for $X_3$ not being in its default state is $X_1$ and $X_2$ both not being in their default states. The billiards example can be viewed as a paradigmatic example of when these conditions apply. It seems reasonable to assume that if the expert does not shoot, then ball A does not move, and if the expert does not shoot and ball A does not move (in the context of interest), then ball B does not move, and so on.
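As an illustration, the following sketch (with invented equations for a minimal shot/ball A/ball B chain; the encoding is mine, not the article's) checks conditions c, d, and e of proposition 1 directly, with 0 as the default value everywhere.

```python
# Three-variable chain (shot -> ball A -> ball B) with default value 0 throughout.
def balls(shot, a=None, b=None):
    a = shot if a is None else a                  # ball A moves iff the shot is made
    b = a if b is None else b                     # ball B moves iff ball A hits it
    return a, b

actual_a, actual_b = balls(1)                     # the expert shoots: both balls move
assert balls(0) == (0, 0)                         # condition d: [shot <- 0](A = 0)
assert balls(0, a=0) == (0, 0)                    # condition e: [shot <- 0, A <- 0](B = 0)
assert actual_b != 0                              # condition c: the actual outcome is not the default
# Hence, by proposition 1, the shot is a but-for cause of ball B's motion.
```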
Of course, the conditions of proposition 1 do not apply in either example 1 or example 2. The obvious default values in example 1 are $MT = TT = 0$, but the equations say that in all contexts $\vec{u}$ of the causal model $M$ for this example, we have $(M, \vec{u}) \models [MT \leftarrow 0](TT = 1)$, so condition d fails. In the second example, if we take $DB = 0$ and $P = 0$ to be the default values of DB and P, then in all contexts $\vec{u}$ of the causal model $M$, we have $(M, \vec{u}) \models [DB \leftarrow 0](P = 1)$, so again condition d fails.
While proposition 1 is useful, there are many examples in which there is no obvious default value. When considering the body's homeostatic system, even if there is arguably a default value for core body temperature, what is the default value for the external temperature? But it turns out that the key ideas of the proof of proposition 1 apply even if there is no default value. Suppose that $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ in $(M, \vec{u})$ and $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$. Then to get transitivity, it suffices to find values $x_1'$, $x_2'$, and $x_3'$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$, $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$, and $x_3' \neq x_3$. The argument in the proof of proposition 1 then shows that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_3 = x_3')$ (see footnote 3). It then follows that $X_1 = x_1$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$. In proposition 1, $x_1'$, $x_2'$, and $x_3'$ were all 0, but there is nothing special about the fact that 0 is a default value here. As long as we can find some values $x_1'$, $x_2'$, and $x_3'$ satisfying these requirements, these conditions apply. I formalize this as proposition 2, which is a straightforward generalization of proposition 1.
Proposition 2. Suppose that there exist values $x_1'$, $x_2'$, and $x_3'$ such that (a) $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ in $(M, \vec{u})$, (b) $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, (c) $x_3' \neq x_3$, (d) $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$, and (e) $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$. Then $X_1 = x_1$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$.
To see how these ideas apply, suppose that a student receives an A+ in a course, which causes her to be accepted at Cornell University (her top choice, of course), which in turn causes her to move to Ithaca. Further suppose that if she had received an A in the course she would have gone to university $u'$ and as a result moved to city $c'$, and if she had gotten anything else, she would have gone to university $u''$ and moved to city $c''$. This story can be captured by a causal model with three variables: G for her grade, U for the university she goes to, and C for the city she moves to. There are no obvious default values for any of these three variables. Nevertheless, we have transitivity here: the student's A+ was a cause of her being accepted at Cornell, and being accepted at Cornell was a cause of her move to Ithaca; it seems like a reasonable conclusion that the student's A+ was a cause of her move to Ithaca. And, indeed, transitivity follows from proposition 2. We can take the student getting an A to be $x_1'$ (the setting $G = A$), the student being accepted at university $u'$ to be $x_2'$, and the student moving to $c'$ to be $x_3'$ (assuming that $u'$ is not Cornell and that $c'$ is not Ithaca, of course).
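The witnesses $x_1'$, $x_2'$, and $x_3'$ can be exhibited concretely. In the sketch below (illustrative only; the names u_prime, c_prime, and so on are placeholders for the unspecified alternative university and city), setting the grade to A plays the role of $x_1'$, the resulting university plays the role of $x_2'$, and the resulting city plays the role of $x_3'$.

```python
# Grade -> university -> city, with invented placeholder names for the
# alternative university/city (u', c', u'', c''); only the structure matters.
UNIV = {"A+": "Cornell", "A": "u_prime"}              # any other grade leads to u''
CITY = {"Cornell": "Ithaca", "u_prime": "c_prime"}    # u'' leads to c''

def student(grade, univ=None):
    u = univ if univ is not None else UNIV.get(grade, "u_doubleprime")
    c = CITY.get(u, "c_doubleprime")
    return u, c

assert student("A+") == ("Cornell", "Ithaca")          # the actual situation
# Proposition 2 witnesses: setting the grade to A gives U = u' and C = c',
# holding both the grade and the university fixed still gives C = c',
# and c' is not Ithaca (condition c).
u_alt, c_alt = student("A")
assert student("A", univ=u_alt) == (u_alt, c_alt) and c_alt != "Ithaca"
```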
The conditions provided in proposition 2 are not only sufficient for causality to be transitive, they are necessary as well, as the following result shows.
Proposition 3. If $X_1 = x_1$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, then (for any endogenous variable $X_2$ distinct from $X_1$ and $X_3$) there exist values $x_1'$, $x_2'$, and $x_3'$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$, $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$, and $x_3' \neq x_3$.
Proof. Since $X_1 = x_1$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, there must exist values $x_1'$ and $x_3' \neq x_3$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_3 = x_3')$. Let $x_2'$ be such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$. Since $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2' \wedge X_3 = x_3')$, it easily follows that $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$. QED
In light of propositions 2 and 3, understanding why causality is so often taken to be transitive comes down to finding sufficient conditions to guarantee the assumptions of proposition 2. I now present another set of conditions sufficient to guarantee the assumptions of proposition 2 (and thus sufficient to make causality transitive), motivated by the two examples showing that causality is not transitive. To deal with the problem in example 2, I require that for every value $x_2'$ in the range of $X_2$, there is a value $x_1'$ in the range of $X_1$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$. This requirement holds in many cases of interest; it is guaranteed to hold if $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ and $X_2$ is a binary variable (i.e., takes on only two values), since but-for causality requires that two different values of $X_1$ result in different values of $X_2$. But this requirement does not hold in example 2; no setting of DB can force P to be 0.
Imposing this requirement still does not deal with the problem in example 1. To do that, we need one more condition. Say that a variable Y depends on X if there is some setting of all the variables in $\mathcal{U} \cup \mathcal{V}$ other than X and Y such that varying the value of X in that setting results in Y's value varying; that is, there is a setting $\vec{z}$ of the variables other than X and Y and values $x$ and $x'$ of X such that $F_Y(x, \vec{z}) \neq F_Y(x', \vec{z})$.
Up to now I have used the phrase "causal path" informally; I now make it more precise. A causal path in a causal model M is a sequence $(Y_1, \ldots, Y_k)$ of variables such that $Y_{j+1}$ depends on $Y_j$ for $j = 1, \ldots, k - 1$. Since there is an edge from $Y_j$ to $Y_{j+1}$ in the causal graph for M exactly if $Y_{j+1}$ depends on $Y_j$, a causal path is just a (directed) path in the causal graph. A causal path from $X_1$ to $X_3$ is just a causal path whose first node is $X_1$ and whose last node is $X_3$. Finally, Y lies on a causal path from $X_1$ to $X_3$ if Y is a node (possibly $X_1$ or $X_3$) on a directed path from $X_1$ to $X_3$.
The additional condition that I require for transitivity is that $X_2$ must lie on every causal path from $X_1$ to $X_3$. Roughly speaking, this says that all the influence of $X_1$ on $X_3$ goes through $X_2$. This condition does not hold in example 1; as figure 1 shows, there is a direct causal path from MT to BMC that does not include TT. However, this condition does hold in many examples of interest. Going back to the example of the student's grade, the only way that the student's grade can influence which city the student moves to is via the university that accepts the student.
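This mediation condition is a purely graph-theoretic one, so it can be checked directly on the causal graph. The sketch below (illustrative; the adjacency-list encoding is mine) enumerates the directed paths between two variables and tests whether a given variable lies on all of them; it reports that TT is not on every path from MT to BMC in figure 1, whereas U is on every path from G to C in the student example.

```python
# Enumerate directed paths in a causal graph (adjacency-list DAG) and check
# whether a mediator lies on every path from source to target.
def paths(graph, source, target, prefix=()):
    prefix = prefix + (source,)
    if source == target:
        return [prefix]
    return [p for nxt in graph.get(source, []) for p in paths(graph, nxt, target, prefix)]

def on_every_path(graph, source, target, mediator):
    ps = paths(graph, source, target)
    return bool(ps) and all(mediator in p for p in ps)

billy_graph = {"MT": ["TT", "BMC"], "TT": ["BMC"]}        # the graph of figure 1
student_graph = {"G": ["U"], "U": ["C"]}
print(on_every_path(billy_graph, "MT", "BMC", "TT"))      # False: there is a direct MT -> BMC path
print(on_every_path(student_graph, "G", "C", "U"))        # True
```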
The following result summarizes the second set of conditions sufficient for transitivity.
Proposition 4. Suppose that $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ in the causal setting $(M, \vec{u})$, $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, and the following two conditions hold:
a) for every value $x_2' \in \mathcal{R}(X_2)$, there exists a value $x_1' \in \mathcal{R}(X_1)$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$;
b) $X_2$ is on every causal path from $X_1$ to $X_3$.
Then $X_1 = x_1$ is a but-for cause of $X_3 = x_3$.
The proof of proposition 4 is not hard, although we must be careful to get all the details right. The high-level idea of the proof is easy to explain, though. Suppose that $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$. Then there must be some values $x_2' \neq x_2$ and $x_3' \neq x_3$ such that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](X_3 = x_3')$. By assumption, there exists a value $x_1'$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$. The requirement that $X_2$ is on every causal path from $X_1$ to $X_3$ guarantees that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](X_3 = x_3')$ implies $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$. Roughly speaking, $X_2$ "screens off" the effect of $X_1$ on $X_3$, since it is on every causal path from $X_1$ to $X_3$. Now we can apply proposition 2. I defer the formal argument to the appendix.
It is easy to construct examples showing that the conditions of proposition 4 are not necessary for causality to be transitive. Suppose that $X_1 = x_1$ causes $X_2 = x_2$, $X_2 = x_2$ causes $X_3 = x_3$, and there are several causal paths from $X_1$ to $X_3$. Roughly speaking, the reason that $X_1 = x_1$ may not be a but-for cause of $X_3 = x_3$ is that the effects of $X_1$ on $X_3$ may "cancel out" along the various causal paths. This is what happens in the homeostasis example. If $X_2$ is on all the causal paths from $X_1$ to $X_3$, then, as we have seen, all the effect of $X_1$ on $X_3$ is mediated by $X_2$, so the effects of $X_1$ on $X_3$ along different causal paths cannot "cancel out." But even if $X_2$ is not on all the causal paths from $X_1$ to $X_3$, the effects of $X_1$ on $X_3$ may not cancel out along the causal paths, and $X_1 = x_1$ may still be a cause of $X_3 = x_3$. That said, it seems difficult to find a weakening of the condition in proposition 4 that is simple to state and suffices for causality to be transitive.
Appendix A Proof of Proposition 4
To prove proposition 4, I need a preliminary result, which states a key (and obvious) property of causal paths: if there is no causal path from X to Y, then changing the value of X cannot change the value of Y. Although it is intuitively obvious, proving it carefully requires a little bit of work.
Lemma 1. If Y and all the variables in $\vec{X}$ are endogenous, $Y \notin \vec{X}$, and there is no causal path from a variable in $\vec{X}$ to Y, then for all sets $\vec{W}$ of variables disjoint from $\vec{X}$ and Y and all settings $\vec{x}$ and $\vec{x}'$ for $\vec{X}$, $y$ for Y, and $\vec{w}$ for $\vec{W}$, we have
$(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}](Y = y)$ iff $(M, \vec{u}) \models [\vec{W} \leftarrow \vec{w}](Y = y)$
and
$(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{W} \leftarrow \vec{w}](Y = y)$ iff $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}](Y = y)$.
Proof. Define the maximum distance of a variable Y in a causal model M, denoted $\mathit{maxdist}(Y)$, to be the length of the longest causal path from an exogenous variable to Y. We prove the result by induction on $\mathit{maxdist}(Y)$. If $\mathit{maxdist}(Y) = 1$, then the value of Y depends only on the values of the exogenous variables, so the result trivially holds. If $\mathit{maxdist}(Y) > 1$, let $Z_1, \ldots, Z_k$ be the endogenous variables on which Y depends. These are the endogenous parents of Y in the causal graph (i.e., these are exactly the endogenous variables Z such that there is an edge from Z to Y in the causal graph). For each $Z_j$, $\mathit{maxdist}(Z_j) < \mathit{maxdist}(Y)$: for each path from an exogenous variable to $Z_j$, there is a longer path to Y, namely, the one formed by adding the edge from $Z_j$ to Y. Moreover, there is no causal path from a variable in $\vec{X}$ to any of $Z_1, \ldots, Z_k$, nor is any of $Z_1, \ldots, Z_k$ in $\vec{X}$ (for otherwise there would be a path from a variable in $\vec{X}$ to Y, contradicting the assumption of the lemma). Thus, the inductive hypothesis holds for each of $Z_1, \ldots, Z_k$ (and any $Z_j$ in $\vec{W}$ is fixed at the same value by the intervention in any case). Since the value of each of $Z_1, \ldots, Z_k$ does not change when we change the setting of $\vec{X}$ from $\vec{x}$ to $\vec{x}'$ (or drop the setting of $\vec{X}$ altogether), and the value of Y depends only on the values of $Z_1, \ldots, Z_k$ and $\vec{u}$ (i.e., the values of the exogenous variables), the value of Y cannot change either. QED
I can now prove proposition 4. I restate it here for the convenience of the reader.
Proposition 4. Suppose that $X_1 = x_1$ is a but-for cause of $X_2 = x_2$ in the causal setting $(M, \vec{u})$, $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, and the following two conditions hold:
a) for every value $x_2' \in \mathcal{R}(X_2)$, there exists a value $x_1' \in \mathcal{R}(X_1)$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$;
b) $X_2$ is on every causal path from $X_1$ to $X_3$.
Then $X_1 = x_1$ is a but-for cause of $X_3 = x_3$.
Proof
Since $X_2 = x_2$ is a but-for cause of $X_3 = x_3$ in $(M, \vec{u})$, there must exist $x_2' \neq x_2$ and $x_3' \neq x_3$ such that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](X_3 = x_3')$. By assumption, there exists a value $x_1'$ such that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_2 = x_2')$. I claim that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_3 = x_3')$. This follows from a more general claim. I show that if Y is on a causal path from $X_2$ to $X_3$, then
$(M, \vec{u}) \models [X_2 \leftarrow x_2'](Y = y)$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](Y = y)$. (A1)
Although it is not obvious, this is essentially the argument sketched in the main part of the text. Literally the same argument as that given below for the proof of (A1) also shows that
$(M, \vec{u}) \models [X_1 \leftarrow x_1'](Y = y)$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](Y = y)$. (A2)
Define a partial order $\preceq'$ on the endogenous variables that lie on a causal path from $X_2$ to $X_3$ by taking $Y \preceq' Y'$ if Y precedes $Y'$ on some causal path from $X_2$ to $X_3$. Since M is a recursive model, if $Y \prec' Y'$, we cannot have $Y' \prec' Y$ (otherwise there would be a cycle). I prove (A1) by induction on the $\preceq'$ ordering. The least element in this ordering is clearly $X_2$; $X_2$ must come before every other variable on a causal path from $X_2$ to $X_3$. Clearly $(M, \vec{u}) \models [X_2 \leftarrow x_2'](X_2 = x_2')$, and just as clearly $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_2 = x_2')$. Thus, (A1) holds for $X_2$. This completes the base case of the induction.
For the inductive step, let Y be a variable other than $X_2$ that lies on a causal path from $X_2$ to $X_3$, and suppose that (A1) holds for all variables $Y'$ such that $Y' \prec' Y$. Let $Z_1, \ldots, Z_k$ be the endogenous variables that Y depends on in M. For each of these variables $Z_j$, either there is a causal path from $X_1$ to $Z_j$ or there is not. If there is, then the path from $X_1$ to $Z_j$ can be extended to a directed path P from $X_1$ to $X_3$, by going from $X_1$ to $Z_j$, from $Z_j$ to Y, and from Y to $X_3$ (since Y lies on a causal path from $X_2$ to $X_3$). Since, by assumption, $X_2$ lies on every causal path from $X_1$ to $X_3$, $X_2$ must lie on P. Moreover, $X_2$ must precede Y on P. (Proof: Since Y lies on a path Q from $X_2$ to $X_3$, $X_2$ must precede Y on Q. If Y preceded $X_2$ on P, then there would be a cycle, which is a contradiction.) Since $X_2$ precedes Y on P and $Z_j$ immediately precedes Y on P, it follows that either $Z_j = X_2$, in which case $Z_j$ takes the value $x_2'$ under both interventions, or $Z_j$ lies on the segment of P from $X_2$ to $X_3$ and $Z_j \prec' Y$, so by the inductive hypothesis, $(M, \vec{u}) \models [X_2 \leftarrow x_2'](Z_j = z_j)$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](Z_j = z_j)$.
Now if there is no causal path from $X_1$ to $Z_j$, then there also cannot be a causal path R from $X_2$ to $Z_j$ (otherwise there would be a causal path from $X_1$ to $Z_j$ formed by appending R to a causal path from $X_1$ to $X_2$, which must exist since, if not, it easily follows from lemma 1 that $X_1 = x_1$ would not be a cause of $X_2 = x_2$). Since there is no causal path from $X_1$ or $X_2$ to $Z_j$, by lemma 1, we must have that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](Z_j = z_j)$ iff $(M, \vec{u}) \models Z_j = z_j$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](Z_j = z_j)$.
Since the value of Y depends only on the values of $Z_1, \ldots, Z_k$ and $\vec{u}$, and I have just shown that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](Z_j = z_j)$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](Z_j = z_j)$ for each j, it follows that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](Y = y)$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](Y = y)$. This completes the proof of the induction step. Since $X_3$ is on a causal path from $X_2$ to $X_3$, it follows that $(M, \vec{u}) \models [X_2 \leftarrow x_2'](X_3 = x_3')$ iff $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$. Since $(M, \vec{u}) \models [X_2 \leftarrow x_2'](X_3 = x_3')$ by construction, we have that $(M, \vec{u}) \models [X_1 \leftarrow x_1', X_2 \leftarrow x_2'](X_3 = x_3')$ and hence, by (A2), that $(M, \vec{u}) \models [X_1 \leftarrow x_1'](X_3 = x_3')$, as desired. Since $x_3' \neq x_3$ and $(M, \vec{u}) \models X_1 = x_1 \wedge X_3 = x_3$, it follows that $X_1 = x_1$ is a but-for cause of $X_3 = x_3$. QED