1. Introduction
Mechanisms are usually viewed as hierarchical: a mechanism's higher-level behavior is influenced by, and decomposes into, its lower levels. To draw quantitative predictions from a model of a mechanism, the model must capture this hierarchical aspect. The recursive Bayesian network (RBN) formalism was put forward as a means to model mechanistic hierarchies (Casini et al. Reference Casini, Illari, Russo and Williamson2011). The formalism extends the Bayesian network (BN) formalism, already used to model same-level causal relations probabilistically (Pearl Reference Pearl2000). In RBNs, higher-level variables decompose into lower-level causal BNs. The relations between higher- and lower-level variables are constitutive.
This proposal was criticized by Gebharter (Reference Gebharter2014) and Gebharter and Kaiser (Reference Gebharter, Kaiser, Kaiser, Scholz, Plenge and Hüttemann2014), on two main grounds: descriptive adequacy (it is unclear when the formalism is applicable to real mechanisms) and conceptual adequacy (RBNs do not allow one to draw interlevel inferences for explanation and intervention). To overcome such limitations, Gebharter (Reference Gebharter2014) has made the alternative proposal that decomposition involves arrows rather than variables. In particular, he proposes an alternative formalism, also extending the BN formalism, namely, multilevel causal models (MLCMs).
Decomposing variables and decomposing arrows are two alternative ways of modeling mechanistic hierarchies probabilistically. Here, I argue that the former option is superior to the latter. I proceed as follows. In section 2, I present and illustrate RBNs and MLCMs. In section 3, I argue against decomposing arrows. MLCMs lead to counterintuitive notions of mechanistic decomposition and mechanistic explanation. In section 4, I defend RBNs from the criticism. RBNs do allow interlevel causal explanation, via the uncoupling of interlevel causal relations into a constitutional step and a causal step. RBNs also allow reasoning about interlevel interventions; believing otherwise depends on either wrongly assuming that changes cannot transmit along constitutional arrows or demanding that RBNs represent intervention variables, which the formalism is not meant to represent.
2. The Two Formalisms
Both RBNs and MLCMs are extensions of the BN formalism. A BN consists of a directed acyclic graph (DAG), whose nodes are the variables in a finite set (each variable taking finitely many possible values), and the probability distribution
of each variable Vi conditional on its parents Pari, P(Vi|Pari). The DAG and the probability distribution are linked by the Markov Condition:
(MC) For any Vi ∈ V, Vi ⊥ NDi | Pari, where NDi is the set of Vi's non-descendants.
In words, each variable is probabilistically independent of its non-descendants, conditional on its parents. For instance, in the DAG in figure 1, V 4 is independent of V 1 and V 5, conditional on V 2 and V 3. In BN jargon, V 2 and V 3 ‘screen off’ V 4 from V 1 and V 5.

Figure 1. Example of a Bayesian network.
A BN determines a joint probability distribution over its nodes via P(v 1 … vn) = ∏i P(vi|pari), where vi is an assignment of a value to Vi, and pari is the assignment of values to its parents induced by the assignment v = v 1 … vn. In a causally interpreted BN, the arrows in the DAG stand for direct causal relations, and the network can be used to infer the effects of interventions and make probabilistic predictions (Pearl Reference Pearl2000). In this case, MC is called the Causal Markov Condition (CMC).
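The factorization licensed by MC can be sketched with a minimal example. The conditional probability tables below are made up, and the DAG shape is an assumption based on the description of figure 1 (V 2 and V 3 screening off V 4 from V 1 and V 5):

```python
import itertools

# Hypothetical CPTs for binary variables in an assumed DAG:
# V1 -> V2, V1 -> V3, V2,V3 -> V4, V2,V3 -> V5.
p1 = {0: 0.4, 1: 0.6}                                   # P(V1)
p2 = {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}}   # P(V2|V1)
p3 = {(0,): {0: 0.5, 1: 0.5}, (1,): {0: 0.1, 1: 0.9}}   # P(V3|V1)
p4 = {(a, b): {0: 0.9 - 0.3*a - 0.2*b, 1: 0.1 + 0.3*a + 0.2*b}
      for a in (0, 1) for b in (0, 1)}                   # P(V4|V2,V3)
p5 = {(a, b): {0: 0.8 - 0.4*a - 0.1*b, 1: 0.2 + 0.4*a + 0.1*b}
      for a in (0, 1) for b in (0, 1)}                   # P(V5|V2,V3)

def joint(v1, v2, v3, v4, v5):
    """P(v) = prod_i P(vi | pari), as licensed by MC."""
    return (p1[v1] * p2[(v1,)][v2] * p3[(v1,)][v3]
            * p4[(v2, v3)][v4] * p5[(v2, v3)][v5])

# The joint distribution sums to 1 ...
total = sum(joint(*v) for v in itertools.product((0, 1), repeat=5))

# ... and V4 is screened off from V1 by V2 and V3:
def p(v4, given):
    """P(V4 = v4 | V1, V2, V3 = given)."""
    a, b, c = given
    num = sum(joint(a, b, c, v4, v5) for v5 in (0, 1))
    den = sum(joint(a, b, c, x4, v5) for x4 in (0, 1) for v5 in (0, 1))
    return num / den

assert abs(total - 1.0) < 1e-9
assert abs(p(1, (0, 1, 1)) - p(1, (1, 1, 1))) < 1e-9  # V1's value is redundant
```

Changing V 1's value leaves the conditional probability of V 4 untouched once V 2 and V 3 are fixed, which is exactly what screening off amounts to numerically.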
2.1. Recursive Bayesian Networks
RBNs represent hierarchies by decomposing variables (Casini et al. Reference Casini, Illari, Russo and Williamson2011). One of the motivations behind this choice is that scientists often talk of properties at different levels that stand in a constitutive relation with one another.Footnote 1 Another motivation—only implicit in Casini et al. (Reference Casini, Illari, Russo and Williamson2011)—is that decomposing variables has the additional advantage of making ‘interlevel causation’ intelligible, by uncoupling (problematic) cases of interlevel downward or upward causation into two (less problematic) steps, a constitutional, across-level step and a causal, same-level step (Craver and Bechtel Reference Craver and Bechtel2007). RBNs make this idea formally precise.
Mechanistic hierarchy is interpreted via the notion of ‘recursive decomposition’ of variables. An RBN is a BN defined over a finite set V of variables whose values may themselves be RBNs. A variable is called a network variable if one or more of its possible values is an RBN and a simple variable otherwise. A standard BN is an RBN whose variables are all simple. An RBN x that occurs as the value of a network variable in RBN y is said to be at a lower level than y; variables in y are the direct superiors of variables in x, while variables in the same network are peers. If an RBN contains no infinite descending chains (i.e., if each descending chain of networks terminates in a standard BN), then it is well founded. Only well-founded RBNs are considered here.
Consider a toy RBNFootnote 2 over V = {C, S} with joint distribution P, where C = {1, 0} represents whether an organism’s tissue is cancerous, and S = {yes, no} represents survival after 5 years (fig. 2). Suppose S is a simple variable, but C is a network variable, with each of its two values denoting a lower-level (standard) BN that represents a state of the mechanism for cancer. I ignore many of the factors also responsible for cancer, such as DNA damage response mechanisms, and focus only on the unregulated cell growth and division, D, that results from mutations in the so-called growth factor, G.

Figure 2. Top level of an RBN representing the relation between a network variable, cancer (C), and a simple variable, survival (S).
To C = 1 corresponds a lower-level network c 1 (see fig. 3, left) with joint distribution Pc1, representing a functioning control mechanism, with a probabilistic dependence (and a causal connection) between G and D. And to C = 0 corresponds a lower-level network c 0 (fig. 3, right) with joint distribution Pc0, representing a malfunctioning growth mechanism, with no dependence (and no causal connection) between G and D. Since these two lower-level networks are standard BNs, the RBN is well founded and fully described by the above three networks.

Figure 3. Lower-level networks decomposing the binary network variable C in figure 2 into, respectively, a relation and the absence of a relation between growth factor (G) and division (D).
If an RBN is to be used to model a mechanism, the arrows at the various levels of the RBN signify causal connections. In addition, just as standard causally interpreted BNs are subject to the CMC, a similar condition applies to causally interpreted RBNs, called the Recursive Causal Markov Condition (RCMC). Let V′ = {V1, …, Vm} (m ≥ n) be the variable set of an RBN closed under the inferiority relation (i.e., V′ contains the variables in V, their direct inferiors, the direct inferiors of those, and so on). Let NIDi indicate the set of non-inferiors-or-descendants of Vi, and DSupi the set of direct superiors of Vi. Then,
(RCMC) For any Vi ∈ V′, Vi ⊥ NIDi | Pari ∪ DSupi.
In words, each variable is independent of those variables that are neither its effects (i.e., descendants) nor its inferiors, conditional on its direct causes (i.e., parents) and its direct superiors. Applied to the toy example, given the value of C, the values of its constituents G and D are redundant for inferring C’s effect S.
RCMC adds to CMC a recursive MC (RMC), to the effect that variables at any level are probabilistically independent of non-inferiors or peers given their direct superiors. Since the screening off that holds in virtue of RMC depends on constitutional rather than causal facts, not all dependencies identified by the RCMC can be causally interpreted.
While some authors treat CMC as a necessary truth, others argue against its universal validity (e.g., Williamson Reference Williamson2005). A similar stance is adopted with respect to RCMC. RCMC is a modeling assumption in need of testing or justification, not a necessary truth. Thus, whether the formalism allows one to adequately represent a mechanism is an empirical rather than stipulative matter.
Inference in RBNs proceeds via a formal device called a flattening. Let N1, …, Nk be the network variables in V′. For each assignment n = n 1 … nk of values to the network variables, we can construct a standard BN, the flattening of the RBN with respect to n, denoted by n ↓, by taking as nodes the simple variables in V′ plus the assignments N1 = n 1, …, Nk = nk to the network variables and including an arrow from one variable to another if the former is a parent or direct superior of the latter in the original RBN. The conditional probability distributions are constrained by those in the original RBN, such that, where the network-variable assignment w is the direct superior of Vi in the RBN, P↓(vi | pari) = Pw(vi | pari). The flattenings determine a joint distribution over V′ via P(v 1 … vm) = ∏i P(vi | pari, dsupi), where the probabilities on the right-hand side are determined by a flattening induced by v 1 … vm.Footnote 3 Notice that MC holds in the flattening because RCMC holds in the RBN. Only, since the arrows that link variables to their direct inferiors are constitutional, CMC is not satisfied.Footnote 4
In the cancer example, the flattening with respect to c 1 is c 1↓ (see fig. 4, left), where P(c 1) = 1 and P(S|c 1) are determined by the top-level distribution P, and where P(G) and P(D|G) are determined by the lower-level distribution Pc1. Analogously, the flattening with respect to c 0 is c 0↓ (see fig. 4, right), where P(c 0) = 1 and P(S|c 0) are determined by the top-level distribution P, and where P(G) and P(D) are determined by the lower-level distribution Pc0.

Figure 4. Flattenings of the RBN represented in figures 2 and 3.
In each case, the required probabilities are determined by the original RBN. Given the joint distribution, the causally interpreted RBN may be used to draw quantitative inferences for explanation and intervention, both within and across levels.
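The computation can be made concrete with a minimal sketch. All numbers below are made up (the text specifies none); the point is only to show that the flattening determines a joint distribution over {C, S, G, D} and that, per RCMC, G and D are redundant for inferring S once C is known:

```python
# Top level: P(C), P(S|C); lower levels: Pc1(G), Pc1(D|G) and Pc0(G), Pc0(D).
# All numbers are hypothetical.
P_C = {1: 0.1, 0: 0.9}
P_S_given_C = {1: {'yes': 0.4, 'no': 0.6}, 0: {'yes': 0.95, 'no': 0.05}}
P_G = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.3, 0: 0.7}}        # P(G | C = c)
P_D = {1: {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}},   # c1: D depends on G
       0: {1: {1: 0.5, 0: 0.5}, 0: {1: 0.5, 0: 0.5}}}   # c0: D independent of G

def joint(c, s, g, d):
    """Joint over V' = {C, S, G, D} via the flattening induced by C = c."""
    return P_C[c] * P_S_given_C[c][s] * P_G[c][g] * P_D[c][g][d]

total = sum(joint(c, s, g, d)
            for c in (0, 1) for s in ('yes', 'no')
            for g in (0, 1) for d in (0, 1))

def p_s_given(c, g, d):
    """P(S = yes | C = c, G = g, D = d)."""
    num = joint(c, 'yes', g, d)
    den = sum(joint(c, s, g, d) for s in ('yes', 'no'))
    return num / den

assert abs(total - 1.0) < 1e-9
# Given C, the constituents G and D are redundant for inferring S:
assert abs(p_s_given(1, 0, 0) - p_s_given(1, 1, 1)) < 1e-9
assert abs(p_s_given(1, 0, 0) - P_S_given_C[1]['yes']) < 1e-9
```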
2.2. Multilevel Causal Models
Differently from RBNs, MLCMs decompose arrows rather than variables.Footnote 5 A mechanistic hierarchy has to do with ‘marginalizing out’ variables when moving from a lower-level graph to a higher-level graph. In short, the formalism exploits the following idea: when the value of a variable X in the set of Y’s parents Par(Y) is unknown, the probability of Y may be calculated by summing over X’s possible values, P(y | par(Y) \ {x}) = Σx P(y | par(Y)) P(x | par(Y) \ {x}), thereby marginalizing X out. As a result, one gets a truncated distribution over V \ {X}, consistent with the original one over V.
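Marginalizing out is just summation over the unknown variable's values. A minimal sketch with made-up numbers, for the simplest case where X is Y's only parent:

```python
# Hypothetical prior P(X) and conditional P(Y|X):
P_X = {0: 0.6, 1: 0.4}
P_Y_given_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# P(y) = sum_x P(y|x) P(x): the truncated distribution once X is
# marginalized out.
P_Y = {y: sum(P_Y_given_X[x][y] * P_X[x] for x in P_X) for y in (0, 1)}

assert abs(P_Y[0] - (0.9 * 0.6 + 0.2 * 0.4)) < 1e-9  # 0.62
assert abs(sum(P_Y.values()) - 1.0) < 1e-9
```

The truncated distribution over {Y} is consistent with the original one over {X, Y}, which is exactly the consistency that the restriction relation below demands of P and P*.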
Let us indicate a causal model by 〈V, E, P〉, where 〈V, E〉 is a DAG with variable set V and edge set E, and P is the probability distribution associated with the DAG. Let X ↔ Y indicate that two variables X and Y are effects of a latent common cause (i.e., a cause of X and Y not represented within the graph over some variable set V), and let P* ↑ V indicate the ‘restriction’ of the probability distribution P* to a variable set V. The restriction of a lower-level causal model 〈V*, E*, P*〉 to a higher-level causal model 〈V, E, P〉 is defined as follows (Gebharter Reference Gebharter2014, 147):
(Restriction) 〈V, E, P〉 is a restriction of 〈V*, E*, P*〉 if and only if
a V ⊆ V*, and
b P = P* ↑ V, and
c for all X, Y ∈ V:
c.1 if there is a directed path from X to Y in 〈V*, E*〉 and no vertex on this path different from X and Y is in V, then X → Y is in 〈V, E〉, and
c.2 if X and Y are connected by a common cause path π in 〈V*, E*〉 or by a path π free of colliders containing a bidirected edge in 〈V*, E*〉, and no vertex on this path π different from X and Y is in V, then X ↔ Y is in 〈V, E〉, and
d no path not implied by c is in 〈V, E〉.
That is, the lower-level structure 〈V*, E*, P*〉 represents the higher-level structure 〈V, E, P〉 if and only if 〈V, E, P〉 is the restriction of 〈V*, E*, P*〉 uniquely determined when V* is restricted to V. The restriction is such that information about causal relations and existence of common causes in 〈V*, E*〉 is preserved by 〈V, E〉, and the probabilistic information of P* is consistent with P upon marginalizing out variables in V* \ V.
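Clause b, the probabilistic half of the definition, can be sketched numerically: the higher-level distribution P must be the marginal of P* over the retained variables. The chain structure and numbers below are hypothetical:

```python
import itertools

# Hypothetical lower-level joint P* over V* = {X, Y, Z}, a chain X -> Y -> Z.
P_star = {}
for x, y, z in itertools.product((0, 1), repeat=3):
    px = 0.5                        # P(X)
    py = 0.8 if y == x else 0.2     # P(Y|X)
    pz = 0.7 if z == y else 0.3     # P(Z|Y)
    P_star[(x, y, z)] = px * py * pz

# Restriction of P* to V = {X, Z}: marginalize Y out (P = P* restricted to V).
P = {(x, z): sum(P_star[(x, y, z)] for y in (0, 1))
     for x in (0, 1) for z in (0, 1)}

assert abs(sum(P.values()) - 1.0) < 1e-9
```

(Clauses c and d then require the higher-level edge set to preserve the causal information: here, the directed path X → Y → Z with Y not in V would license X → Z in the restriction.)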
A ‘multilevel causal model’ is so defined (Gebharter Reference Gebharter2014, 148):
(MLCM) 〈M 1 = 〈V 1, E 1, P 1〉, …, Mn = 〈Vn, En, Pn〉〉 is a multi-level causal model if and only if
a M 1, …, Mn are causal models, and
b every Mi with 1 < i ≤ n is a restriction of M 1, and
c M 1 satisfies CMC.
That is, an MLCM is an ordered set of causal models 〈M 1 = 〈V 1, E 1, P 1〉, …, Mn = 〈Vn, En, Pn〉〉, where the bottom-level, unrestricted causal model M 1 satisfies CMC. (Higher-level models may not satisfy CMC.) Each causal model in the MLCM represents a mechanism.
The information on the hierarchical relations among the nested mechanisms in the MLCM is contained in a ‘level graph’ (Gebharter Reference Gebharter2014, 149):
(Level graph) A graph G = 〈V, E〉 is called an MLCM 〈M 1 = 〈V 1, E 1, P 1〉, …, Mn = 〈Vn, En, Pn〉〉’s level graph if and only if
a V = {M 1, …, Mn}, and
b for all Mi = 〈Vi, Ei, Pi〉 and Mj = 〈Vj, Ej, Pj〉 in V: Mi → Mj is in G if and only if Vi ⊂ Vj and there is no Mk = 〈Vk, Ek, Pk〉 in V such that Vi ⊂ Vk ⊂ Vj holds.
A level graph G = 〈V, E〉 is constructed from an MLCM by adding dashed (non-causal) arrows between any two models Mi and Mj, Mi → Mj, if and only if Vi is the largest proper subset of Vj in the MLCM, so that Mi is, so to speak, the closest restriction of Mj.
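The construction rule can be sketched directly from clause b: draw Mi → Mj just in case Vi ⊂ Vj with no model's variable set strictly in between. The variable sets below are illustrative (they are not read off figure 5):

```python
# Models identified by their (hypothetical) variable sets.
models = {
    'M1': {'X', 'Y', 'Z', 'W'},
    'M2': {'X', 'Z', 'W'},
    'M3': {'Y', 'Z', 'W'},
    'M4': {'X', 'Z'},
    'M5': {'Z', 'W'},
}

def level_edges(models):
    """Mi -> Mj iff Vi is a proper subset of Vj with no Vk strictly between."""
    edges = set()
    for i, Vi in models.items():
        for j, Vj in models.items():
            if Vi < Vj and not any(Vi < Vk < Vj for Vk in models.values()):
                edges.add((i, j))
    return edges

E = level_edges(models)
assert ('M4', 'M2') in E      # {'X','Z'} sits directly under {'X','Z','W'}
assert ('M4', 'M1') not in E  # M2 lies strictly between M4 and M1
```

Note that the resulting ordering is only partial: pairs like M 2 and M 3 here stand in no subset relation, so no dashed arrow connects them.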
Figure 5 represents a level graph. Since the ordering among graphs is only partial, there may be graph pairs (e.g., M 2 and M 3; M 4 and M 3) that do not stand in a restriction relation. Figure 6 depicts a more concrete example, that is, a two-level water dispenser mechanism.Footnote 6 The room temperature T influences a sensor S; S and the status of a tempering button, B, cause the heater H to be on/off; H causes the temperature of the water dispensed, W.

Figure 5. A level graph (reprinted from Gebharter Reference Gebharter2014, 150).

Figure 6. Dispenser mechanism (reprinted from Gebharter Reference Gebharter2014, 151).
3. Criticism of MLCMs
It is unclear whether hierarchies, as analyzed in terms of the notion of ‘marginalizing out’, are mechanistic—that is, whether they represent mechanistic decompositions and grant mechanistic explanations. First, it is unclear whether MLCMs represent mechanistic decompositions. High-level causal models in an MLCM, for instance, M 2 and M 3 in figure 5, are just more coarse-grained representations of one and the same structure, that is, M 1, such that some of the information in M 1 is missing at the higher level, as the term ‘restriction’ suggests.
Second, it is unclear whether MLCMs represent mechanistic explanations. Admittedly, there is a sense in which one explains the relation between, say, the room temperature T and the water temperature W by uncovering the mediating role of the sensor S and the heater H. However, this sort of explanation is different from the explanation whereby one decomposes the cancer mechanism C and uncovers the role of growth factor G and division D. Variables G and D have an obvious mechanistic role—insofar as they constitute C; instead, S and H seem to have a purely causal role.
The inadequacy of the MLCM notions of mechanistic decomposition and explanation is made more explicit by looking at the kind of hierarchical relations allowed by the formalism. Consider the ‘decompositions’ in figure 5, which correspond to restricting (i) V 1 to V 2, (ii) V 1 to V 3, and (iii) V 3 to V 5. In all such cases, instead of opening a black box (as is common in mechanistic explanation), one ‘creates’ a box and does not, strictly speaking, decompose anything. In (i), the decomposition is ‘filling a blank’: the absence of probabilistic and causal dependencies among variables is explained by direct causation, a hidden common cause structure, or combinations thereof that involve new variables, too. The absence of probabilistic and causal dependencies between X and Z in M 2 is explained by the structure X ↔ Y ← Z in M 1 (more on this case of ‘explanation’ below). Since there is no arrow between X and Z in M 2, and since mechanisms require causal dependencies, what mechanism is X ↔ Y ← Z in M 1 a decomposition of? In (ii) and (iii), in contrast, the decomposition is in fact ‘adding stuff’. For instance, Z ↔ W in M 5 is ‘decomposed’ into Y ← Z ↔ W in M 3. But in what sense is a lower-level mechanism that includes an isolated effect not included in the higher level a decomposition of the higher-level mechanism?
Relatedly, ‘explanations’ do not seem to correspond to some of the represented restrictions either. Consider the restriction of M 4 to M 5. Here, the common cause structure Z ↔ W is ‘explained’ by the absence of probabilistic or causal dependence between Z and a new variable X, which is apparently disconnected from whatever mechanism is responsible for Z ↔ W. An even more striking case of lack of explanation is the ‘decomposition’ of X and Z in M 2 into X ↔ Y ← Z in M 1. A first issue—arguably unintentional (cf. Gebharter Reference Gebharter2014, 146 n. 8)—is that the bidirected arrow in M 1 violates condition c of an MLCM, namely, that M 1 satisfies CMC. Still, even if condition c were satisfied, the problem would remain that, if decompositions are to explain, this sort of decomposition should not be allowed at any level. Intuitively, hidden common cause structures such as X ↔ Y are—insofar as hidden—non-explanatory. They add a mystery rather than remove it. A (drastic) solution that comes to mind is to forbid bidirected arrows at any level. This would entail, however, that restrictions that marginalize out common causes are disallowed, too, which is undesirable because—if one buys into the MLCM framework—the corresponding decompositions would seem (more) explanatory. One may of course impose further conditions to distinguish good from bad restrictions, but it is not obvious how one should proceed in a non ad hoc way, without clear intuitions on the explanatoriness of bidirected arrows.
In sum, the resulting account of mechanistic hierarchies is at best incomplete and at worst inadequate. To prove RBNs’ superiority, it remains to be shown whether RBNs survive Gebharter’s (Reference Gebharter2014) and Gebharter and Kaiser’s (Reference Gebharter, Kaiser, Kaiser, Scholz, Plenge and Hüttemann2014) objections. The next section endeavors to establish that they do.
4. Defense of RBNs
RBNs interpret mechanistic hierarchy via the operation of ‘recursive decomposition’, which in turn depends on RCMC. Two kinds of objections were raised against RCMC. First, about empirical adequacy: it is unclear when RCMC holds and thus whether the formalism is applicable to real mechanisms. Second, about conceptual adequacy: RCMC prevents RBNs from being useful for interlevel reasoning for explanation and intervention.
Let us begin with the first objection: “it is neither obvious that RCMC holds in general, nor is it clear how one could distinguish cases in which it holds from cases in which it does not” (Gebharter and Kaiser Reference Gebharter, Kaiser, Kaiser, Scholz, Plenge and Hüttemann2014, sec. 3.5.3). Agreed, RCMC may not hold in general, nor did Casini et al. (Reference Casini, Illari, Russo and Williamson2011) claim that it does. When does it hold, then? Intuitively, RCMC holds when higher-level differences in some functional property, or phenomenon, depend on differences in its underlying structure, or mechanism, such that the state of the phenomenon makes the states of its constituents in the underlying mechanism redundant with respect to (among other things) the phenomenon’s causes or effects. Not all higher-level phenomena are so dependent on structures and thus representable by network variables. Thus, RBNs may incur a problem of too limited applicability, which is an empirical matter. On the face of it, many biological phenomena seem representable by means of network variables. For instance, it seems appropriate to represent the different effects of a tissue on survival as dependent on differences in the tissue’s underlying cellular structure. In contrast, if my argument in section 3 is correct, MLCMs appear conceptually inadequate—marginalizations may satisfy the restriction condition and yet not correspond to mechanistic decompositions.
Finally, let us come to the objection that RBNs do not support interlevel reasoning for explanation and for prediction of the results of interventions: “[Casini et al.’s] approach does (i) not allow for a graphical representation of how a mechanism’s macro variables are causally connected to the mechanism’s causal micro structure, which is essential when it comes to causal explanation, and it (ii) leads to the fatal consequence that a mechanism’s macro variables’ values cannot be changed by any intervention on the mechanism’s micro structure whatsoever” (Gebharter Reference Gebharter2014, 139).
Explanation first. Since there are no arrows between variables at different levels that are screened off by network variables, Gebharter claims that it is unclear over which causal paths probabilistic influence propagates between such higher- and lower-level variables (Reference Gebharter2014, 143–44). True, there are no such arrows. But this is because, by assumption, screened-off variables influence each other, if at all, only via network variables. When RCMC is satisfied, probabilistic influence propagates constitutionally (rather than causally) across the flattening’s dashed arrows and causally across same-level solid arrows.
Let us now show how the second objection is ill founded, with reference to the difference in the toy example in section 2.1 between the unconditional probability of S = s 1 and the probability of S = s 1 conditional on a ‘do’ intervention (Pearl Reference Pearl2000) that sets D = d 1. The former equals P(c 0) P(s 1|c 0) + P(c 1) P(s 1|c 1). The latter is obtained by first removing the arrow G → D from c 1, so that both flattenings have the same structure (see fig. 7), and then calculating P(s 1|do(D = d 1)) = P(s 1d 1) / P(d 1), where P(s 1d 1) = Σi P(c i) P(d 1|c i) P(s 1|c i) and P(d 1) = Σi P(c i) P(d 1|c i), the probabilities being determined by the post-intervention flattenings.

Figure 7. Flattening representing the structure assumed by the flattenings in figure 4 after an intervention on D.
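Plugging made-up numbers into this recipe (the text specifies none) shows that intervening on the micro variable D does shift the probability of the macro variable S:

```python
# Hypothetical post-intervention quantities, after G -> D is removed from c1
# so that both flattenings factorize as P(c) P(s|c) P(g|c) P(d|c).
P_C = {1: 0.1, 0: 0.9}
P_S1_given_C = {1: 0.4, 0: 0.95}   # P(S = s1 | C = c)
P_D1_given_C = {1: 0.7, 0: 0.5}    # P(D = d1 | C = c) in the mutilated model

# P(s1 | do(d1)) = P(s1 d1) / P(d1), computed in the mutilated model:
p_s1_d1 = sum(P_C[c] * P_D1_given_C[c] * P_S1_given_C[c] for c in (0, 1))
p_d1 = sum(P_C[c] * P_D1_given_C[c] for c in (0, 1))
p_do = p_s1_d1 / p_d1

# Contrast with the unconditional P(s1) = sum_c P(c) P(s1|c):
p_s1 = sum(P_C[c] * P_S1_given_C[c] for c in (0, 1))

assert p_do != p_s1  # intervening on D makes a difference to S
```

On these numbers the intervention lowers P(s 1), because setting D = d 1 makes a constitutional difference to C, which in turn makes a causal difference to S, exactly the two-step path described in the next paragraph.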
Gebharter objects that “according to the RBN approach, intervening on a mechanism’s microvariables does not have any probabilistic influence on any one of the macrovariables whatsoever” (Reference Gebharter2014, 145) because if one were to use an intervention variable I to intervene on a lower-level variable, the intervention “would—and this can directly be read off the BN’s associated graph’s topology …—not have any probabilistic influence on any macrovariable at all” (145). In the cancer example, an intervention ID on D would not have any effect on S. I think this objection rests on one of the following two misinterpretations.
First, it is true that ci screens off D from S, and thus there is no D → S causal arrow. However, interventions on D can still make a difference to S, as the lack of causal connections in the flattening does not block changes along constitutional arrows. It is important to stress that, although the dashed arrows point downward in the flattening, this is because of technical reasons only, having to do with the condition for MC to hold across levels. One may use the downward-pointing arrows to reason—constitutionally—in both directions. Here, changing D makes a constitutional difference to C, which makes a causal difference to S.
Second, it is true that RCMC says that S is independent of any variable that is neither its descendant nor its inferior (here, none), conditional on its direct causes (here, C) and direct superiors (here, none). But RCMC is assumed to hold in V′ and not in the expanded set V′ ∪ {ID}. The reason is that RBNs are meant to represent decompositions of (properties of) wholes into (properties of) their parts; they are not meant to represent parts that do not belong to any whole, such as ID. The graph topology cannot represent such parts. Thus, one cannot read off the graph topology that such intervention variables have no effect.
More generally, in an RBN, everything one gets at lower levels must be the result of (recursively) decomposing the top level. This is not a limitation of RBNs but a means to an end. One cannot represent interventions as variables.Footnote 7 Yet, one can represent interventions as operators, which change the values of either top-level variables or lower-level variables into which network variables (recursively) decompose. The two representations correspond to two well-known strategies for representing interventions, exemplified by Woodward’s (Reference Woodward2003) interventionist semantics and Pearl’s (Reference Pearl2000) do-calculus, respectively. Although both strategies are in principle legitimate, only the latter is relevant to the task for which RBNs were developed, that is, to represent mechanistic decompositions.
5. Conclusion
Decomposing variables and decomposing arrows are alternative ways of modeling mechanistic hierarchies by means of BNs. The two options have been made precise by, respectively, RBNs and MLCMs. I argued that RBNs are better than MLCMs at analyzing mechanistic hierarchies and interpreting interlevel mechanistic reasoning. From a conceptual point of view, the argument establishes that the notion of mechanistic hierarchy has a tight connection to the notion of recursive decomposition but no such connection to the notion of marginalizing out.