1. Introduction: The Efficiency Question
Economic modeling has been criticized for being too idealized, being intellectually isolated from neighboring fields, and having a poor predictive record.Footnote 1 A particular debate has arisen more recently over whether economic models explain and, if so, in what sense. In this article, I urge a shift in philosophical focus. The reason is that a different issue is much more pressing. Both sides agree that models are sometimes useful and sometimes not. What really matters is, how often are they useful? Thus, should economists do more such modeling or should they invest their efforts elsewhere? What matters is not whether models can play the explanatory role that one side insists they can and the other insists they cannot. Rather, if defenders are right, what matters is how well, in fact, models do play their explanatory role. And likewise, if critics are right, what matters is how well models play an alternative role.
I label this the efficiency question. To answer it requires, so to speak, an epistemic cost-benefit analysis. The costs are the resources invested into modeling, such as mathematical training of students, and perhaps more notably the opportunity costs, such as fieldwork methods not taught and fieldwork not done. The benefits are the successful explanations, predictions, and interventions that modeling leads to. What is the balance—compared to alternatives?
And what are those alternatives? The answer is any mix of methods other than the current one and, in particular, mixes that put less emphasis on orthodox theory. Freed of orthodox methodological constraints, economics would arguably be able to take advantage of a much wider range of empirical methods—generating results that in a virtuous circle could then feed back into more theory development, just as they do in many other sciences. These empirical methods include qualitative methods such as interviews and ethnographic observation; questionnaires; small-N causal inference, such as qualitative comparative analysis; purely predictive models; causal process tracing; causal inference from observational statistics; machine learning from big data; historical studies; randomized controlled trials; laboratory experiments; and natural and quasi experiments.Footnote 2 Each of these has its own strengths and weaknesses, but each is already widely practiced and has a developed and rigorous methodological literature. Turning to them is in no way a return to the fuzzy verbal analysis that is the pejorative memory of much prewar economics.
What is the optimal balance between, on one hand, building up a library of orthodox rational choice models and, on the other hand, pursuing more applied, contextual work and utilizing a wider range of empirical methods? Current practice is already a mixture of the two, so the question becomes, is it the right mixture? Of course, the associated cost-benefit analysis can be done only imperfectly and approximately. In reality, it is hard to count up explanations and predictions in an objective way, hard to weigh these versus other goals of science, and hard also to evaluate the counterfactual of whether a different allocation of resources would have done better. But implicitly, efficiency analyses are unavoidable and are being done already, namely, every time a researcher chooses, or a graduate school teaches, one method rather than another or journals or prizes or hirers choose one paper or candidate rather than another.Footnote 3 The status quo is not inevitable, as shown by different practices in other social sciences and also within economics by the recent ‘empirical turn’ (sec. 5 below). So the efficiency question must be faced. And it is surely better to face it explicitly than to leave it to inertia and sociological winds. Efficiency analysis is worth our time. Indeed, it is arguably the most practically important issue in philosophy of economics.
It might be objected that such efficiency analysis is impossible because orthodox theory and the various alternatives are too entangled to be separated. For example, the very hypothesis a field trial tests might be derived from theory.Footnote 4 This is true up to a point, but not up to the point that the possibility of efficiency analysis can be wished away. This is again best demonstrated by examples, which will illustrate both the feasibility and the value of efficiency analysis.
What is the best way to do efficiency analysis? In practice, it is via case studies, that is, the details of actual examples, rather than via some abstract calculus. Moreover, its answers are typically not especially sensitive to the exact philosophical account of explanation that we happen to endorse. For this very reason, at least for the purpose of efficiency analysis in economics, philosophical attention should be diverted away from theories of explanation. These claims are again best demonstrated by example, to which I turn in a moment.
Two kinds of efficiency analysis are possible. The first is global: does the current overall allocation of resources serve economics well compared to a different allocation? This is the most challenging to assess because of the vast range of costs and benefits involved. The second kind of efficiency analysis is local: given a particular explanandum, what methods should be used to tackle it and in what proportion? This is much more tractable, and many case studies are in part just such analyses already.
The plan of the article is as follows: in the next section, I present an example of efficiency analysis at the local level. Then I explain how the efficiency question has been neglected by the philosophy of economics literature, before showing how it has been neglected by the wider scientific modeling literature too. At the end, I return to the issue of efficiency analysis at the global level.
2. Local Efficiency Analysis: Prisoner’s Dilemma and World War I Truces
According to JSTOR, almost 22,000 journal articles have appeared about the Prisoner’s Dilemma since 1970. A striking aspect of this huge literature is its overwhelmingly theoretical focus. Much of it concerns developments of the basic game: versions with multiple moves or players, versions with asynchronous moves, iterated versions, evolutionary versions, and many other tweaks besides. Research muscle has been bet on theoretical development. Empirical applications, by contrast, are conspicuously thin on the ground.Footnote 5
It is in fact hard to find serious attempts to apply the Prisoner’s Dilemma to explain actual historical or contemporary phenomena, as opposed to informal mentions or offhand remarks. I will focus here on one of the few such attempts, namely, the well-known analysis by Axelrod (1984) of the live-and-let-live system of spontaneous truces in World War I (WWI).Footnote 6 Has the Prisoner’s Dilemma literature’s theoretical focus borne fruit in this case? I answer ‘no’ and thus argue that there is good reason to think that, with respect to this particular explanandum, research muscle has been allocated inefficiently.Footnote 7
Axelrod models the situation in the WWI trenches as an iterated Prisoner’s Dilemma. What behaviors should we expect? To answer that, Axelrod ran a series of computer tournaments from which he inferred that the optimal strategy is tit-for-tat with initial cooperation; that is, we should expect players initially to cooperate and thereafter to repeat whatever the other player did in the previous period.Footnote 8 Axelrod then draws on the fascinating and detailed account of WWI trench warfare by the historian Tony Ashworth (1980), itself based on extensive archives, letters, and interviews with veterans. Axelrod’s explicit goal (1984, 71) is to explain how informal truces could have arisen spontaneously on the Western front despite constant pressure against them from senior commanders. His case is that, upon analysis, the implicit payoffs for each side were those of an indefinitely iterated Prisoner’s Dilemma and that cooperation (that is, a truce) is therefore exactly his theory’s prediction.
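For readers unfamiliar with the game, the tit-for-tat rule just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a reconstruction of Axelrod's tournament code: the payoff numbers, the function names (`tit_for_tat`, `always_defect`, `play`), and the choice of opponent are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Tuple

COOPERATE, DEFECT = "C", "D"

# Illustrative payoffs (an assumption, not taken from the text):
# each entry is (payoff to player A, payoff to player B).
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),  # mutual cooperation, i.e. a truce
    (COOPERATE, DEFECT): (0, 5),     # A is exploited
    (DEFECT, COOPERATE): (5, 0),     # A exploits B
    (DEFECT, DEFECT): (1, 1),        # mutual defection
}

# A strategy maps the opponent's past moves to the next move.
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then repeat the opponent's previous move."""
    return COOPERATE if not opponent_history else opponent_history[-1]


def always_defect(opponent_history: List[str]) -> str:
    """Hypothetical comparison strategy: defect in every round."""
    return DEFECT


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int, str, str]:
    """Play an iterated game; return both total scores and move sequences."""
    moves_a: List[str] = []
    moves_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each player sees only the other's earlier moves.
        move_a = a(moves_b)
        move_b = b(moves_a)
        payoff_a, payoff_b = PAYOFFS[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b, "".join(moves_a), "".join(moves_b)


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 5))
    print(play(tit_for_tat, always_defect, 5))
```

Under these assumed payoffs, two tit-for-tat players cooperate in every round, whereas against a constant defector tit-for-tat cooperates once and then retaliates in each later round. The sketch shows only the strategic logic of reciprocity; it carries no empirical weight regarding the trenches.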
Many historical details seem to support Axelrod’s case, such as the limited retaliations that followed breaches of a truce or the demonstrations of force capability via harmless means in order to establish a threat credibly but nondisruptively. Perhaps the most striking evidence is how the live-and-let-live system eventually broke down. The (unwitting) cause of this was a policy, dictated by senior command, of frequent raids, that is, carefully prepared attacks on enemy trenches. If successful, prisoners would be taken; if not, casualties would be proof of the attempt. Since raids and retaliations could be easily monitored by senior officers, covert cooperation between the two sides became impossible. It is no coincidence, Axelrod argues, that exactly then the truces broke down.
Is this a case, then, of theoretical work earning its explanatory keep and thus of research resources being allocated wisely? That is certainly how it is usually reported, including by Axelrod. But, alas, closer inspection shows the opposite. To begin with, by Axelrod’s own admission some elements of the story deviate from his predictions. The norms of most truces, for instance, were not tit-for-tat but more like three-tits-for-tat; that is, typically retaliation for the breach of a truce was roughly three times stronger than the original breach. More seriously, a vital element in sustaining the truces was the development of what Axelrod terms ethics and rituals: local truce norms became ritualized and their observance quickly acquired a moral tinge in the eyes of soldiers. This made truces much more robust and is crucial to explaining their persistence, as Axelrod concedes. Yet, as Axelrod also concedes, Prisoner’s Dilemma says nothing about it. Indeed, he comments that this emergence of ethics is modeled most easily as a change in the players’ payoffs, that is, as a different game altogether (1984, 85).
There are several other important shortfalls in addition to those noted by Axelrod. First, his theory predicts there should be no truce breaches at all, but in fact breaches were common. Second, as a result (and as Axelrod does acknowledge), a series of dampening mechanisms had to be developed in order to defuse postbreach cycles of retaliation. Again, the tit-for-tat analysis is silent about this vital element. Third, it is not just that truces had to be robust against continuous minor breaches; the bigger story is that often no truces arose at all. Indeed, Ashworth examined regimental and other archives in some detail to arrive at the estimate that, overall, there were truces only about one-quarter of the time (1980, 171–75). That is, on average, three-quarters of the front was not in a condition of live-and-let-live. Prisoner’s Dilemma is silent as to why. Finally, Axelrod’s explanations are after the fact; there are no novel predictions. Thus, it is difficult to rule out wishful rationalization or the possibility that other games might fit the evidence just as well.
There is no mystery, meanwhile, as to what the actual explanations of these various phenomena are, for they are given clearly by Ashworth and indeed in many cases are explicit in the letters of the original soldiers. Thus, for instance, elite and nonelite units had different attitudes and incentives, for various well-understood reasons. These in turn led to truces occurring overwhelmingly only between nonelite units, again for well-understood reasons. Why did breaches of truces occur frequently, even before raiding became widespread? Ashworth explains via detailed reference to different incentives for different units (artillery vs. frontline infantry, for instance) and to the fallibility of the mechanisms in place for controlling individual hotheads (1980, 153–71). And so on. Removing our Prisoner’s Dilemma lens, we see that we have perfectly adequate explanations already.
Overall, we cannot reasonably claim that Axelrod’s theoretical analysis explains the WWI truces. It is not empirically adequate, it misses crucial elements even in those areas where at face value it is empirically adequate, and it is silent on obvious related explananda: not just why truces persisted but also why they arose on only a minority of occasions, how they originated, and (to some degree) when and why they broke down. Meanwhile, we already have an alternative that does explain all of these things—namely, Ashworth’s historical account.
This comparative verdict holds true given any plausible theory of explanation or of prediction’s relation to explanation. We have no empirical warrant for thinking that Prisoner’s Dilemma identified the relevant causes, thus negating claims of causal explanation. Deductive-nomological, unification, and mathematical accounts of scientific explanation similarly require an empirical warrant that is absent in this case. Some recent accounts of explanation by models, as we will see below, do put less emphasis on empirical warrant. But what matters here is the relative explanatory achievement of Ashworth and Axelrod, and given the disparity in empirical success, no plausible account of explanation would prefer Axelrod.
But even if it fails to explain, perhaps Prisoner’s Dilemma instead can earn its keep here heuristically? Alas, not so. The first reason is that it does not lead us to any explanations that we did not have already. Ubiquitous quotations in Ashworth show that soldiers were very well aware of the basic strategic logic of reciprocity and of the importance of a credible threat for deterring breaches (1980, 150). They were well aware too of why frequent raiding rendered truces impossible to sustain, an outcome indeed that many ruefully anticipated even before the policy was implemented (191–98). In other words, Prisoner’s Dilemma is following here, not leading.
The second reason Prisoner’s Dilemma lacks heuristic value is that it actively diverts attention away from the aspects that were actually important. I have in mind the crucial features mentioned above: how truces originated, the causes and management of the many small breaches of them, the importance of ethics and ritualization to their maintenance, why truces occurred only in some sections of the front rather than in a majority of them, and so on.
Again, these basic conclusions about the case are robust against differences within the philosophical literature over precisely how best to analyze heuristic or other nonexplanatory virtues, such as understanding.
A common fallback defense here is that at least Prisoner’s Dilemma offers the virtue of systematization over mere singular explanation, as befits social science as opposed to history. Thus, it is claimed, Prisoner’s Dilemma sheds light on cooperation in general, not just in the specific setting of WWI trench warfare. As it were, global efficiency analysis still favors it even if this local one does not. In reply: true enough, models that explain or give heuristic value over many different cases are indeed highly desirable and would accordingly be endorsed by a global efficiency analysis. But Prisoner’s Dilemma does neither and meanwhile uses up huge resources along the way. As Julian Reiss, Robert Sugden, and others have argued, the only way to get a reliable sense of what theoretical input is actually useful is to do detailed empirical investigations, so resources would be better directed toward those rather than toward yet more theoretical development. Empirical success in particular cases is arguably a necessary condition for usefulness across many (Northcott, forthcoming b). Correctly understanding what actually encouraged cooperation in the WWI case, for instance, is an essential first step if that case is truly to teach us about cooperation in other cases too. But Prisoner’s Dilemma directs our attention to the wrong things.
Local efficiency analyses will inevitably be based on case studies. When studying a case in detail the efficiency question becomes tractably local and concrete, and the verdict often becomes correspondingly clear, so that worries about how exactly to define and weigh up explanations, predictions, and other virtues become unimportant. In the WWI case, the verdict is that resources put into the theoretical Prisoner’s Dilemma analysis were not well spent. They would have been better directed to the history department.
3. The Philosophy of Economics Literature
There is a standard view about how orthodox economic models are, and should be, used. Roughly, no one imagines that any given model will be applicable to every problem; instead, economists build up a library of such models, thereby increasing the repertoire available for any particular application. All such models obey the same orthodox fundamentals, at least in large part. In this way, advocates say, any model is guaranteed to be precise, its conclusions to be derived rigorously and to be clearly testable, and above all its analysis to be ‘economic’ in the sense of being couched as the result of rational agent choices in the face of incentives. Within this orthodox framework, many quite different policy conclusions may be supported; the framework itself merely enforces rigor of method, not any particular policy stance (Rodrik 2015).
On this view, economic models study the interaction of causal variables in a shielded environment. In this respect, they follow the Galilean method standard in natural sciences for centuries. Model application is a judgment of fit between model and target: we should choose the model that captures the causes that are actually important in any particular case. A model then offers causal explanations, since it shows that a particular effect is to be expected given a particular arrangement of causes. Such models are, of course, idealized. But their idealizations hurt only when they impede the Galilean project, that is, when we cannot give the model a causal interpretation and use it to intervene successfully. Something like this view is endorsed by much influential work in philosophy of economics, for instance, that of early Cartwright (1989) and Mäki (1992). It is also endorsed (sometimes implicitly) by the majority of economists themselves.
This view of economic modeling continues a long tradition stretching back to Mill (1843). He argued that the ever-changing mix of causes in uncontrolled field cases makes accurate prediction a naive and infeasible goal. Instead, theory should state core causal tendencies, such as human agents’ tendency to maximize their wealth. In any particular application, we compose relevant tendencies in a deductive way and then add in as necessary local ‘disturbing causes’, that is, causal factors not captured by theory but that are also present. In this way, deductive theory is claimed to be more empirically fruitful than predictive alternatives because it offers generalizability; that is, it offers the prospect of empirical success in many applications by adding in different disturbing causes each time. This justifies prioritizing modeling orthodoxy over empirical fit, a prioritization that is frequently apparent in economic practice (Reiss 2008, 106–22; Northcott, forthcoming a).
There have been many criticisms of this orthodoxy, addressing, among other things, idealization, social ontology, and the foundations of rational choice theory (Elster 1988; Rosenberg 1992; Lawson 1997; Cartwright 1999). But these criticisms, being general and fundamental in nature, have tended not to distinguish the orthodoxy’s empirical successes from its failures. They are not nuanced enough to yield practical advice as to what mix of methods will serve economics best going forward.
More recently, much criticism has targeted the view, implied by the orthodoxy, that economic models explain. The objection is that, on the contrary, economic models do not explain (Northcott and Alexandrova 2013). It is charged that they do not satisfy the usual criteria for causal explanation, in particular, that their idealizations mean that they do not state true causes. Instead, models are taken to play various other roles. One such alternative role is that they offer ‘how-possibly’ explanations, that is, derivations that speak only to possibility in the idealized world of the model (Aydinonat 2008; Grüne-Yanoff 2009; Forber 2010). Another is that models are useful only heuristically, serving to suggest initial categories or lines of inquiry but not themselves earning warrant from empirical success. Instead, that warrant accrues to whatever much narrower-scope causal hypothesis is eventually confirmed empirically, a hypothesis that is typically not derivable from the general model (Alexandrova 2008; Alexandrova and Northcott 2009; Northcott, forthcoming b).
In response, it has been argued that the explanatory claims of models can be established after all, by means of robustness analysis, that is, by showing that a model’s derivations are robust with respect to variation of some of its assumptions (Kuorikoski, Lehtinen, and Marchionni 2010). Moreover, if we understand explanation sufficiently broadly (Ylikoski and Aydinonat 2014), then models may still explain even if they are best understood as mere how-possibly explanations or heuristic aids.
But regardless of whether models can indeed explain, that still does not tell us whether to put resources into more modeling or instead into other methods. For that, we would need to know in addition how often and efficiently models explain—or how often and efficiently they perform their nonexplanatory role.
Overall, the efficiency question so far has not been a primary focus of philosophy of economics. And moreover, what has been the primary focus, such as whether models explain, has now reached such an advanced degree of refinement that it no longer has much new to say about how research effort in economics should be allocated. The best allocation, as in the WWI example, may often be obvious regardless of our precise preferred theory of explanation, in which case further emphasis on the latter will not help with the efficiency question. That is the reason for urging a refocusing of philosophical attention.
4. The Scientific Modeling Literature
Turn next to the wider scientific modeling literature. In effect, it too has largely neglected the efficiency question.
Begin by noting that the modeling literature has been “nearly unanimous in saying that models have to be representative in order to give us knowledge” (Knuuttila 2005, 1260). Chakravartty (2010, 171) explains why: “a scientific representation is something that facilitates practices such as interpretation and inference with respect to its target system. … How could such practice be facilitated were it not for some sort of similarity between the representation and the thing it represents—is it a miracle?” The core idea is that target systems are objects in the world with a structure that a model’s structure in some way maps onto. Various accounts have been offered of the representation relation between model and target, initially including isomorphism, partial isomorphism, and similarity (Frigg 2006). (More recently, accounts have become influential that analyze representation in terms of practical function or inferential role; see below.)
Across science, models often clearly are explanatory, and in such cases a focus on representation is eminently sensible: successfully representing a cause immediately yields a causal explanation, for instance, and successful representation explains empirical success too. In nonexplanatory cases matters are subtler because the model itself does not explain and it might not predict successfully either. On the heuristicist view, for instance, what matters to (causal) explanation is instead whether an eventual causal hypothesis represents, not whether the initial heuristic model does. Thus, we need assume nothing about any representation relation between the initial model and the target.Footnote 9 That does still leave a link between representation and explanation, but now in a different place. In the WWI case, for example, Ashworth’s historical explanations succeed precisely because they truly represent actual causes.
But the efficiency question concerns something different: is orthodox modeling a good way to achieve successful representations? The superiority of Ashworth’s explanations is clear on any plausible view of representation, just as it was on any plausible view of explanation. Accordingly, at least in the WWI case, debating the best theory of representation sheds no light on the efficiency question, any more than debating the best theory of explanation did. What is required instead is a comparison of how well different methodological approaches achieve successful representations—in other words, efficiency analysis.
One virtue of the recent modeling literature is that it allows for failures of representation as well as successes, but again that is different from assessing which methods best avoid such failures.
Finally, a separate strand of the modeling literature has concerned the relation between models and parent theories. Reacting against the close relation posited by the semantic view of theories, a rival view has become very influential in the last couple of decades, namely, that of models as mediators (Morgan and Morrison 1999). Very roughly, this sees models as being autonomous from both general theories and particular phenomena. This autonomy allows models to act as epistemic tools, facilitating interventions and serving as instruments of exploration in their own right. One example is the Prisoner’s Dilemma game, which is distinct both from general economic principles and from particular examples of strategic cooperation.
There has been some interesting convergence between the mediator and representation strands of the literature. In particular, as noted above, more recent accounts of representation often define it in practical terms such as inferential role (Suarez 2015). Knuuttila (2011) emphasizes models’ role in this regard precisely as epistemic tools. But again, notwithstanding the interest of these accounts for other purposes, what matters for efficiency analysis is a separate question, namely, which methods produce models that are good epistemic tools or good mediators.
5. Conclusion: Global Efficiency Analysis and the Empirical Turn
Recently, economics has seen a much-remarked ‘empirical turn’. For instance, in the five most prestigious journals in economics, the percentage of papers that are purely theoretical (that is, free of any empirical data) fell from 57% in 1983 to 19% in 2011 (Hamermesh 2013). Moreover, not only is there more empirical work in prestigious venues but also this empirical work is less often theory based as opposed to ‘atheoretical’; that is, it tests particular theoretical models less often as opposed to establishing previously untheorized causal relations. Biddle and Hamermesh (2016) report that whereas in the 1970s all microeconomic empirical papers in the top five journals exhibited a theoretical framework, in the 2000s there was a resurgence of atheoretical studies. Citation numbers suggest that the atheoretical work is at least as influential. Angrist and Pischke (2010) also report the rise of atheoretical practice in several subfields.
One obvious possibility is that the empirical turn has been caused by, in effect, an accumulation of global efficiency analyses by practitioners, which have motivated an overall shift in research emphasis from pure theory toward empirical application. There is anecdotal evidence for this conjecture but as yet no more than that, and we must await more detailed work by historians of economics. But the empirical turn is in any case significant here for other reasons. Its mere existence shows that theoretical and empirical work are sufficiently distinct for it to be meaningful to speak of a shift in resources from one to the other. It also shows that the discipline’s norms and incentives are not so entrenched as to make such a shift practically impossible. As a result, it now becomes incumbent on philosophers to evaluate such shifts: is the empirical turn a good thing? And presumably any such evaluation would be precisely some form of global efficiency analysis.
Overall, efficiency analyses, both local and global, are inevitable and happening anyway. As philosophers of economics we should be assessing them explicitly, as well as carrying out such analyses ourselves. We should not be restricting our work just to further examination of the epistemic properties of models; instead, let us widen our view to include also the organization of the discipline as a whole. In common with economists themselves, I take the efficiency question to be of greater practical importance to economics than are the minutiae of explanation or representation. It deserves greater attention.