Both symposiasts jump right into their favourite topics, modularity and Pearl's own do-calculus. But these concern only a small part of the lessons argued for in Hunting Causes and Using Them: Studies in Philosophy and Economics (HC&UT), so I will describe the basics of the book before addressing their comments.
Pearl, it seems, wants my book to do what his does, and more. But mine has a very different programme, which he ignores, and moreover I argue that what Pearl wants can't be obtained. HC&UT has two announced aims. First, a defence of causal pluralism. There are a variety of different kinds of causal systems; methods for discovering causes differ across different kinds of systems as do the inferences that can be made from causal knowledge once discovered. As to causal models, these must have different forms depending on what they are to be used for and on what kinds of systems are under study.
If causal pluralism is right, Pearl's demand to tell economists how they ought to think about causation is misplaced; and his own are not the methods to use. They work for special kinds of problems and for special kinds of systems – those whose causal laws can be represented as Pearl represents them. HC&UT argues these are not the only kinds there are, nor uncontroversially the most typical. Pearl's disappointment in HC&UT depends on his failure to take seriously the demands on science that pluralism makes and his neglect of the actual strategies – good strategies – employed to deal with them. But it has also to do with his failure to read the positive advice that is there – for instance, on how and how not to build models mimicking Galilean experiments. These can reveal the stable contributions causes make (in Pearl's loose vocabulary, ‘stable characteristics’), which he would use the do-calculus to discover, as in the smoking/cancer case. Here are two methods of very different sorts, with different advantages and disadvantages, for the same job – a point Pearl does not take on board. Or consider the more usual use to which the do-calculus is put: calculating the truth value of singular causal counterfactuals. Chapter 9 of HC&UT gives lots of advice here, from sorting out the kinds of counterfactuals at stake to the advice, unwelcome to Pearl, that this job needs purpose- and system-specific models. These can be conventional causal models of the kind Pearl builds, estimated from probabilities; but often they should be very different in form (cf. the Lucas discussion below) and constructed from a variety of different kinds of assumptions, including facts about stable contributions that can be carried over from a Pearl do-model or a Galilean thought experiment. At any rate Pearl would be sure to find the advice I offer unpalatable, reflecting as it does what I take to be characteristic of science: it's hard, and nuanced; there are no templates and no set procedures.
My second major thesis is, as Steel records, that metaphysics, methods and use must walk hand-in-hand. The methods used to infer causal claims must underwrite the uses to which those claims are put. Metaphysics can glue the two together. We infer the charge of a particle by measuring its deflection in an electromagnetic field. Then we use the measured charge to predict how strongly the particle will attract and repel others. Electromagnetic theory, which tells us ‘what charge is’, justifies the leap between the two.
What about causation? At the end of the studies of accounts of causation collected in HC&UT I arrived at a frightening conclusion: For causality, we don't have any glue. Nothing available, neither empirical nor philosophical, shows how our methods for inferring causes justify the uses to which we typically put causal knowledge. Theories in both philosophy and economics tend to be either too close to method – e.g. the probabilistic theory of causality and related causal Bayes nets, invariance accounts, and even Heckman and Pearl counterfactuals – or too close to use – as with David Hendry and Kevin Hoover.
The book has three parts. Part I defends my two central theses. It includes a paper on warranting causal claims that draws a distinction now catching on in evidence-based policy discussions: between methods (like RCTs, Bayes nets, some econometric modelling and deduction from theory) that clinch their results – if the assumptions of the method are met, the results deductively imply the tested hypothesis – and those that merely vouch for their results (like qualitative comparative analysis, case-control studies and ethnographic methods).
Part II focuses on the two accounts most vocally defended now as providing universal characterizing features of causality: Bayes nets and invariance/modularity methods. In keeping with my focus on hunting causes, I discuss Bayes nets methods for causal discovery, and in keeping with my interest in clinching methods, I focus on methods that are provably valid for inferring causal relations, taking as axioms the three standard causal Bayes nets assumptions: minimality, causal Markov (CMC) and faithfulness. Chapter II.2 of HC&UT is titled ‘What is wrong with Bayes nets?’ Answer: nothing. What is wrong is taking them to be universally applicable. They are good methods for causal inference just where their axioms hold and, I argue, each of the axioms fails in a variety of real cases.Footnote 1
Still, Bayes nets have powerful virtues. The axioms provide a ‘metaphysics’Footnote 2, or ‘implicit definition’, for the causal relations under study.Footnote 3 Various methods for causal discovery are then provably valid whenever the axioms are satisfied. So metaphysics and method are properly joined.
And use? He sees this as part of the interpretation of the arrow in a causal DAG. But the axioms already constrain its interpretation. So it is necessary to show that the axioms imply the intended interpretation or at least that the two are consistent.Footnote 4 I myself take the axioms to provide all the interpretation there is and the uses to be exactly those that can be proven, thus marrying metaphysics, method and use.Footnote 5 So, consider a DAG and a probability measure for a situation satisfying the axioms. Suppose the causal structure and probability change in specified ways. The axioms provide a theorem machine that might predict what follows. Modularity theorists focus narrowly on one special kind of change. But the axioms can generate results for all sorts of changes. We thus have a powerful tool for prediction. Powerful, but epistemically demanding: The predictions are only as secure as the combination of the DAG and probability measure supposed, our model of the changes and the assumption that the axioms are satisfied.
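To make the ‘theorem machine’ concrete, here is a minimal sketch – not from HC&UT; the three-variable chain and all the numbers are invented for illustration – of how the causal Markov factorization, together with a specified change that fixes one variable from outside its usual mechanism, yields a new predicted distribution:

```python
# Minimal sketch (not from HC&UT): a three-node chain X -> Y -> Z whose
# probabilities satisfy the causal Markov condition, used as a 'theorem
# machine' to predict the distribution after a specified change.
# All numbers are invented for illustration.
from itertools import product

# Conditional probability tables for the chain X -> Y -> Z.
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # p_z_given_y[y][z]

def joint(x, y, z):
    """CMC factorization: P(x, y, z) = P(x) P(y|x) P(z|y)."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

def joint_after_setting_y(x, y, z, y_set):
    """Distribution after a change that fixes Y from outside:
    drop Y's old mechanism, keep the others, i.e. P(x) P(z|y_set)."""
    return (p_x[x] * p_z_given_y[y_set][z]) if y == y_set else 0.0

# Predicted P(Z = 1) before and after the change that sets Y to 1.
before = sum(joint(x, y, 1) for x, y in product((0, 1), repeat=2))
after = sum(joint_after_setting_y(x, y, 1, y_set=1) for x, y in product((0, 1), repeat=2))
print(f"P(Z=1) before: {before:.3f}, after setting Y=1: {after:.3f}")
```

The predicted numbers are only as trustworthy as the assumed graph, the assumed probabilities and the assumed model of the change – exactly the epistemic demands just noted.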
Aside: note that I don't claim that causal Bayes nets suppose indeterminism. To the contrary, the independence of the ‘error’ terms, which is necessary for CMC, is most natural assuming determinism, supposing ‘error’ terms represent omitted causes. Suppes offered a probabilistic theory of causality, not a theory of probabilistic causality. My point in comparing the two is that faithfulness and CMC were already heavily criticized in Suppes's theory, the first from Simpson-paradox examples, the second because it is not appropriate if causes are probabilistic.Footnote 6 So, like Suppes's theory, causal Bayes nets don't hold for many causal relations. (Readers of Economics and Philosophy may be interested to note that my criticisms of faithfulness are in line with those of Kevin Hoover.)
Part II also contains two chapters showing that Daniel Hausman and James Woodward's two separate attempts to derive CMC from modularity fail. This would have been a nice argument for their anti-pluralist claims that modularity is the key to causality. Despite lengthy and determined attempts, though, their derivations are invalid.
Part III focuses on economics – which readers of Economics and Philosophy might be most interested in. It studies accounts by Hausman, Hendry, Heckman, Hoover,Footnote 7 T. F. Cooley and Stephen LeRoy, Julian Reiss, and Herbert Simon, and includes a chapter on using economic models to learn about causal relations in the world. This latter depends on my long-standing claim, contrary, for example, to Robert Sugden, that economic models can be used to discover the contributions of causal capacities.Footnote 8
Turn now to Steel's and Pearl's central topic, modularity. Prima facie, modular systems seem very special. But a number of authors suppose that modularity is the hallmark of causality. Besides Pearl and Woodward, these include Hoover, possibly Simon, Cooley and LeRoy, and Hausman. I have two objections to the usual claims about modularity.
First, it is not a hallmark of causality. Recall the Phillips curve, a canonical example of a non-modular causal connection – one that, à la Robert Lucas, breaks down under attempts to manipulate the cause (inflation) to control the effect (unemployment). There are two standard responses to this failure. First: shaky equations just aren't causal. To reply, I turn to examples like the toaster and the carburettor, where other conventional criteria – pushes, pulls, energy interchanges – argue that the connections are causal. The lemonade-biscuit machine is an example where judgements about causality based on mechanical criteria run opposite to those based on a manipulationist view. My verdict is pluralism: systems can be causal in different ways.
The second response is Steel's: modularity is more common than I think. I agree I had no business saying modularity generally fails, especially since I think the idea of a default position is mistaken. In cases where it matters, do your best to figure out whether modularity obtains.
Still, he must have more luck with machines than I do. I destroyed my toaster jiggling bits inside. And my car mechanic always breaks one thing when he fixes another. Nor am I surprised. We want the causal processes in our day-to-day machines to be stable across reasonably hard use. The very shields that protect them make it hard to manipulate the internal causes separately. Also, typical machines, as well as many biological systems, are like the Lucas example: The causal connections under study depend on the stability of an underlying generating structure.Footnote 9 As with the carburettor, with efficiency in mind, we design structures to guarantee a number of causal processes at once – but doing so makes it difficult to change one by itself.
Pearl believes that with his model structure he can build a model where modularity does not fail for the Phillips curve or my toaster – just add more variables. Perhaps he thinks of adding as a cause of unemployment a dummy variable, ‘agents mistake inflation for a sector-specific price rise’, with something like a delta function allowing one functional form for the relation between inflation and unemployment given a ‘yes’ value and a different functional form for ‘no’. This won't work. There is an endless variety of features beyond this kind of agent mistake that shift the underlying behaviour and expectations that give rise to the Phillips curve. What is needed is not the insistence that ‘proper’ causal laws be resistant to change but rather the understanding of how the laws arise from more basic structures and how the structures work, which I have argued for decades is not by just more and more causal laws.Footnote 10 HC&UT also gives a variety of arguments against a second strategy: pretending that descriptions of these underlying structures can be represented by proper random variables that will then be included as causes in the causal laws the structures give rise to. This is a metaphysical mistake; worse, it is of no practical use.
The most important question about modularity, however, is: Why want it? Ease of repair, as he argues, is one good reason and I think his work here is important both practically and philosophically. Another is that modularity makes causal claims testable. That's why I say ‘epistemically convenient’ rather than ‘modular’. Concomitant variation is well-known to be a weak indicator for causality. Given epistemic convenience it can become a sure test. This is the main point of the chapter on modularity.
I study epistemically convenient linear deterministic systems (ECLDSs), which are much like those Pearl studies. Modularity is secured by special variation-free causes – causes that can be set independently of one another – one for each effect, that cause nothing else in the system except by causing it. HC&UT shows that for an ECLDS, regression equations that give correct predictions for the effects as these special causes change one-by-oneFootnote 11 are causally correct – a great feature to have since we have good tools for estimating regressions.Footnote 12 There is also a weaker result.Footnote 13 ECLDSs pair each effect with a cause all its own. Then functionally correct equations in which none of these special causes appears where it does not belong are causally correct.
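To give the flavour of the first result – a toy construction of my own, not the book's proof; the variable names, coefficients and sample size are all invented – consider a two-equation ECLDS in which each effect has a variation-free cause all its own. Regressing each effect on its causes then returns the structural coefficients:

```python
# Toy illustration (my own construction, not the book's theorem): a small
# linear deterministic system in which each effect has a special cause all
# its own that affects nothing else except through it.  Regressing each
# effect on its causes then recovers the structural coefficients.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Variation-free special causes u1 and u2: each can be set independently
# and each appears in only one structural equation.
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)

# Structural (causal) equations of the system:
#   y1 = 2.0 * u1
#   y2 = 0.5 * y1 + 1.5 * u2
y1 = 2.0 * u1
y2 = 0.5 * y1 + 1.5 * u2

# Regress y2 on its putative causes (y1, u2); in this set-up the fitted
# coefficients coincide with the structural ones.
X = np.column_stack([y1, u2])
coef, *_ = np.linalg.lstsq(X, y2, rcond=None)
print("estimated coefficients:", coef)   # approximately [0.5, 1.5]
```

Nothing here is a proof, of course; it merely displays the pattern the theorem licenses when the defining conditions of an ECLDS are met.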
It is a shame that Pearl did not notice these results since they add crucially to his programme. The conditions defining ECLDSs provide the ‘metaphysics’ of the causal relations Pearl studies; the do-calculus, a machine for predicting, hence use. But method? How do we first discover the causal system? Simple econometrics teaches how to identify ECLDSs; my two theorems provide sufficient conditions for an identified system to be causally correct. When these conditions are satisfied, we know we are entitled to the predictions the do-calculus provides.
Be careful though. Chapter 9 of HC&UT is all about counterfactuals in economics; it discusses Heckman, Cooley and LeRoy, Hausman, Reiss and, in addition, Pearl. I focus in this reply on Pearl since that is what he has done. HC&UT queries what counterfactuals say and what they are good for. In Pearl's system the answer to the first is clear. Suppose a situation can be represented with a set of Pearl-type equations, with values or probabilities specified for the external variables. The antecedent of the counterfactual plus the do-semantics dictates what changes occur in the equations.Footnote 14 The new equations dictate new probabilities and values, including the counterfactual consequent. So it's a great predictive tool – so long as the first equations are correct and the changes in the situation are those represented in the antecedent and do-semantics. Unlike Pearl, I think we make far more kinds of change, and in far more varied kinds of systems, than he can handle, including ones that change the underlying structure, à la Lucas. This is my second problem with focusing on modularity: it's of very limited use.
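To fix ideas, the recipe described a few sentences back can be put in miniature (my own toy equations, not Pearl's or the book's): fix the external variables, replace the antecedent variable's equation with the stipulated value, and recompute everything downstream.

```python
# Sketch of the do-surgery reading of a counterfactual (toy equations of my
# own for illustration): given values for the external variables, replace the
# equation for the antecedent variable with the stipulated value and
# recompute everything downstream.

def solve(u, w, do_y=None):
    """Pearl-type structural equations with external variables u, w.
    If do_y is given, Y's own equation is discarded and Y is set to do_y."""
    x = 2 * u                                  # X's equation
    y = do_y if do_y is not None else x + w    # Y's equation (replaced under do)
    z = 3 * y                                  # Z's equation
    return x, y, z

# Actual situation: u = 1, w = 0  ->  X = 2, Y = 2, Z = 6.
print(solve(u=1, w=0))
# Counterfactual 'had Y been 5': same externals, surgery on Y's equation.
print(solve(u=1, w=0, do_y=5))                 # -> X = 2, Y = 5, Z = 15
```

The sketch makes plain what the method requires: the original equations must be correct, and the only change contemplated is the one the surgery represents.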
My point is obviously not that we should be able to predict ‘when there is not enough information’ to do so, but rather that Pearl's methods are limited in the counterfactuals they can evaluate. And often there are other methods that work where Pearl's won't. After all, Lucas did not just claim the Phillips curve breaks down when inflation is manipulated; he produced a rational-expectations model to show it. And I offer ways to model situations where causes act probabilistically and CMC breaks down, models from which it is equally possible to derive what results from specified changes. Also, like my lemonade-biscuit model, a Hoover-style modelFootnote 15 can give correct predictions about real manipulations without laying out the (mechanical) causal laws, as can models in which manipulated variables are superexogenous even if not causal by other standards. Or – HC&UT talks about ‘implementation-neutral’ counterfactuals. We might want a causal connection to hold under ‘any’ implementation, without knowing what those implementations will be, as when we send products to places where users live very differently. Often we have good reasons to suppose a counterfactual is, or isn't, true, given an implementation-neutral interpretation. But this can't be extracted by do-semantics.
HC&UT focuses explicitly on counterfactuals that provide direct predictions of what will happen when we act. In his discussion here Pearl embraces a different kind: mere conceptual changes in a representation scheme.Footnote 16 Fine, but what are these counterfactuals good for? Not testing, which I have stressed is a nice thing that modularity under real changes can secure. A case Pearl defends is P(cancer/do(smoking)). For decades I have called this quantity ‘the strength of smoking's capacity to affect cancer’, and only a few pages away from Pearl's quote from me, I argue contra Heckman that it makes sense even where manipulating smoking by itself is not actually possible. So I admit the importance of this quantity. But I add a warning. This quantity may measure the contribution of a ‘stable biological characteristic’, but evidence that there is a stable contribution must come from elsewhere.Footnote 17 We can extract P(x/do(y)) for any x and y from Pearl-type equations governing a situation,Footnote 18 but this does not mean this quantity has significance for other situations. This is why I stress that ‘capacity’ is an especially strong metaphysical notion, beyond that of ‘causal law’, and knowing that a factor has a capacity requires more evidence, different in kind, than is required for causal-law claims.
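For the record, one standard route by which such a quantity is extracted from a Pearl-type model – assuming, purely for illustration, that a set of covariates z blocking the back-door paths from smoking to cancer is observed – is Pearl's adjustment formula, P(cancer/do(smoking)) = Σ_z P(cancer/smoking, z) P(z). As stressed above, this gives the quantity for the situation the equations govern; it does not by itself license carrying the number to other situations.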
P(x/do(y)) nicely illustrates the theme of hunting and using causes. We extract this quantity from Pearl-equations with do-semantics. We get those equations by econometric modelling plus, say, my sufficiency tests for a causal interpretation.Footnote 19 But to use it in the way Pearl and I suggest, we still need to establish a great deal more. My concern is to see to it that our methods for doing so are appropriate to our uses.