
On the Evidential Import of Unification

Published online by Cambridge University Press:  01 January 2022


Abstract

There are two senses in which a hypothesis may be said to unify evidence: (1) ability to increase the mutual information of a set of evidence statements; (2) explanation of commonalities in phenomena by positing a common origin. On Bayesian updating, only Mutual Information Unification contributes to incremental support. Defenders of explanation as a confirmatory virtue that makes independent contribution must appeal to some relevant difference between humans and Bayesian agents. I argue that common origin unification has at best a limited heuristic role in confirmation. Finally, Reichenbachian common cause hypotheses are shown to be instances of Mutual Information Unification.

Type: Research Article

Copyright © The Philosophy of Science Association

1. Introduction

Myrvold (2003) identified what was described therein as “one interesting sense” in which a theory can unify phenomena. This consists of the ability of the theory to render distinct phenomena informative (or more informative) about each other. Call this Mutual Information Unification (MIU). This sense lends itself nicely to a probabilistic explication, and it can be shown that unification in this sense contributes to incremental evidential support of the theory by the phenomena unified.

There is another sense of unification, having to do with hypotheses that posit a common origin for the phenomena in question, be it a common cause or some other type of explanation. Call this Common Origin Unification (COU). As emphasized by Lange (2004) and Schupbach (2005), the two senses are logically independent; neither is a necessary or a sufficient condition for the other, even though, in a number of interesting cases, they are concomitants of each other.

In this article, the respective roles of these two notions of unification in connection with the bearing of evidence on a theory are discussed. There are, of course, other questions one might ask, and other roles for a notion of unification to play apart from contributing to confirmation. Having a common explanation for disparate phenomena can contribute to deeper understanding, which is one goal of scientific research. Insofar as it contributes to such understanding, COU may play the role of a cognitive value.Footnote 1 As such, it can play a legitimate role in questions such as that of which research program to pursue; a theory might be regarded as more worthy of development on account of its potential for affording understanding.Footnote 2 This is a different matter from the question at issue in this article, which is whether unification ought to be regarded as contributing to the evidential support of a theory by the phenomena unified.

On the question of the respective roles of these two notions of unification in theory confirmation, on a Bayesian analysis, the answer is clear: MIU contributes to incremental evidential support, and there is no scope, within Bayesian updating, for COU to add to the evidential support of the theory (see sec. 4).

There may be some who do not take this to settle the normative issue and who will maintain that, despite the Bayesian verdict, we ought to take explanatory power of a hypothesis as a confirmatory virtue. Advocates of such a view would have to reject the idea that we should take consideration of a Bayesian agent updating via conditionalization as normative for those of us who are not such agents. This presents a challenge for such advocates. If it is rational, or reasonable, or otherwise well and good for us to do what is impossible for a Bayesian agent updating its credences via conditionalization, that is, to take COU to be something that makes a contribution to evidential support above and beyond what it contributes to MIU, then this must be grounded in some relevant difference between us and Bayesian agents. It is incumbent on an explanationist to give an account of what that difference is.

In what follows, these points are first illustrated by means of a simple example that, despite its artificiality, shares some salient features with cases of actual scientific interest. Section 3 presents the probabilistic measures of MIU introduced in Myrvold (2003), and section 4 exhibits their impact on evidential support. Section 5 outlines possible reactions to the Bayesian verdict regarding the respective confirmatory roles of the two types of unification. Section 6 addresses the question of whether there is still a role for COU to play in hypothesis assessment, in assessing priors rather than in assessing incremental evidential support (the answer is no). Finally, section 7 shows how Reichenbachian common causes fit into the schema of MIU.

2. Two Kinds of Unification

Consider the following toy example, of no use except for introducing the issues at hand, although it does share some salient features with a multitude of real-world cases of genuine scientific interest. You are about to be presented with two data streams, S1 and S2, each of which will be a sequence of 10 heads or tails. You know that they have been produced by coin flipping, but you are not sure of exactly the procedure used or whether the coin or coins involved are fair.

Suppose that you have nonzero credences in both of the following hypotheses:

H1: A fair coin was flipped 10 times, and the results of this series of coin flips are reported in both data streams.

H2: Two fair coins were flipped 10 times each, and each data stream reports the results of one of these series of coin flips.

I invite you to consider the effect of the evidence on these two hypotheses. That evidence consists of specification of the two data streams:

S1: HHHTTHTHHT

S2: HHHTTHTHHT

Let E1 be the proposition that S1 is the string given above, and E2 the corresponding proposition about S2.

Now, if you have nonnegligible prior credence that the strings might have been produced by radically unfair coins, E1 and E2 might boost your confidence in the fairness of the coins, and hence conditionalizing on each of E1 and E2, separately, might boost your credence in both H1 and H2. But, when taken together, E1 and E2 strongly favor H1 over H2.

There are two features of this example that I would like to draw your attention to. The first feature is that H1, if true, renders E1 informative about what data stream S2 will be. Conditional on H1, knowing E1 permits one to anticipate the truth of E2. That is, H1 exhibits MIU with respect to the evidence set {E1, E2}. A hypothesis has this property, with respect to a set of evidential propositions, if conditionalizing on that hypothesis increases the mutual informativeness of the set. Obviously, this is the sort of thing that comes in degrees. In our toy example, conditional on H1, knowledge of E1 permits one to anticipate, with certainty, all details of E2. In more interesting cases the increase of informativeness will be less than maximal. Probabilistic measures of degree of this sort of unification will be introduced below.

The second feature is that H1 posits a common origin of the two data streams and thus is ripe to be the subject of what Janssen (2002) has called a COI story, for Common Origin Inference. In addition to MIU, H1 exhibits COU.

The two concepts are of a manifestly very different sort. One belongs to a cluster of concepts involving information, states of knowledge, and the like; the other is related to concepts of cause and explanation.Footnote 3 As already mentioned, they are logically independent. A hypothesis can posit a common origin for two (or more) evidential propositions without making them mutually informative about each other, as the propositions could be about independent aspects of their posited common origin; thus, we can have COU without MIU. Furthermore, once two or more evidential propositions are known, that is, have been absorbed into one’s background knowledge, they are no longer informative about each other, although any common origin they might have remains, and again we have COU without MIU. One can also trivially construct hypotheses that exhibit MIU without COU. With respect to our toy example, consider the hypothesis

H3: Two fair coins were flipped 10 times each, each data stream reports the results of one of these series of coin flips, and the results of each series of flips just happened to be the same.

This hypothesis, if true, also renders one data stream informative about the other. Of course, before the evidence, one would expect credence in H3 to be low, lower than credence in H2 by a factor of 1,024.
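The factor of 1,024 is just 2^10, the number of equiprobable 10-flip strings. As a quick check, one can compute the likelihood that each hypothesis assigns to the specific pair of matching streams; the sketch below is in Python, and the variable names are mine, not the article's.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Likelihood of the specific matching streams E1 and E2 under each hypothesis.
lik_H1 = half**10   # one fair coin; both streams report the same 10 flips
lik_H2 = half**20   # two independent fair coins; 20 independent flips
lik_H3 = half**10   # two coins stipulated to agree; the common string is one of 2**10

print(lik_H1 / lik_H2)   # 1024: H1's likelihood advantage over H2
print(lik_H3 / lik_H2)   # 1024: H3 enjoys exactly the same advantage
```

H3 buys its likelihood advantage over H2 at the price of the improbable stipulation it tacks on, which is the point developed below.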

Although artificial, our toy example has a multitude of parallels in actual science. One is the case of heliocentric vs. geocentric world systems, discussed by Janssen (2002), Myrvold (2003), and Henderson (2014). The analog of H1 is what was called hC in Myrvold (2003), that is, the heliocentric hypothesis that all planets have circular or nearly circular orbits centered at or near the sun, and the analog of H2 is the bare-bones geocentric hypothesis hP, which posits that, for each planet, there is a deferent circle centered near the earth and that the planet travels on an epicycle whose center travels on the deferent, with no assumption made about any connections between the motions of different planets or between planetary motions and the motion of the sun. The analog of H3 is the geocentric hypothesis conjoined with the sun-planet parallelism condition; this is the hypothesis hSP, or the strengthened Ptolemaic hypothesis.

One can find analogs in cases in which a hypothesis turns disparate, prima facie unrelated phenomena into agreeing measurements of some theoretical parameter. The classic case is Perrin’s argument for the existence of atoms. Perrin (1913, sec. 119; 1916, sec. 120) adduces 13 distinct phenomena that, on the atomic hypothesis, count as measurements of Avogadro’s number. The analog of H1 is that atoms exist, and hence there is a common origin explanation of the agreement of these measurements; the analog of H2 would be the hypothesis that matter is continuously divisible, and the analog of H3 would be the hypothesis that adds to H2 the stipulation that Perrin’s 13 phenomena yield values that just happen to agree within experimental error, even though they are not agreeing measurements of any physically meaningful quantity. Another example is the quantum hypothesis, which turns disparate phenomena into agreeing measurements of Planck’s constant (see Kao 2015).

3. Probabilistic Measures of Unification

Consider a Bayesian agent whose credences are represented by a probability function Cr. We define the mutual information of a pair of propositions, {p1,p2}, relative to background b, byFootnote 4

(1)  I(p_1, p_2 \mid b) = \log_2\left( \frac{\mathrm{Cr}(p_2 \mid p_1 \wedge b)}{\mathrm{Cr}(p_2 \mid b)} \right) = \log_2\left( \frac{\mathrm{Cr}(p_1 \wedge p_2 \mid b)}{\mathrm{Cr}(p_1 \mid b)\,\mathrm{Cr}(p_2 \mid b)} \right).

If p1 and p2 are probabilistically independent on b, then I(p1, p2 | b) is zero; it is positive if conditionalizing on one boosts credence in the other, negative if conditionalizing on one lowers credence in the other.
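In the toy example these quantities can be computed directly. The sketch below (Python; an illustration of mine, not anything from the text) applies definition (1) conditional on H1 and on H2: given H1 the streams must match, so learning E1 yields 10 bits about E2; given H2 they are independent, so it yields none.

```python
import math

def mutual_info(p12, p1, p2):
    """I(p1, p2) = log2(Cr(p1 & p2) / (Cr(p1) * Cr(p2)))."""
    return math.log2(p12 / (p1 * p2))

# Conditional on H1: Cr(E1) = Cr(E2) = Cr(E1 & E2) = 2**-10 (certain match).
print(mutual_info(2**-10, 2**-10, 2**-10))   # 10.0 bits

# Conditional on H2: Cr(E1 & E2) = Cr(E1) * Cr(E2) (independence).
print(mutual_info(2**-20, 2**-10, 2**-10))   # 0.0 bits
```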

For a larger set, p = {p_1, p_2, …, p_n}, we add up the information yielded by p1 about p2, the information yielded by p1 ∧ p2 about p3, and so on, up to the information about pn yielded by the conjunction of all the others.Footnote 5

(2)  I(p_1, \ldots, p_n \mid b) = I(p_1, p_2 \mid b) + I(p_1 \wedge p_2, p_3 \mid b) + \cdots + I(p_1 \wedge \cdots \wedge p_{n-1}, p_n \mid b) = \sum_{k=1}^{n-1} I\left( \bigwedge_{i=1}^{k} p_i,\, p_{k+1} \,\middle|\, b \right).

Although the form of (2) does not make this obvious, this quantity is independent of the order in which the elements of the set p are taken, and we have

(3)  I(p_1, \ldots, p_n \mid b) = \log_2\left( \frac{\mathrm{Cr}(p_1 \wedge p_2 \wedge \cdots \wedge p_n \mid b)}{\mathrm{Cr}(p_1 \mid b)\,\mathrm{Cr}(p_2 \mid b) \cdots \mathrm{Cr}(p_n \mid b)} \right) = \log_2\left( \frac{\mathrm{Cr}\left( \bigwedge_{i=1}^{n} p_i \,\middle|\, b \right)}{\prod_{i=1}^{n} \mathrm{Cr}(p_i \mid b)} \right).

With a slight abuse of notation, we will write I(p | b) for I(p_1, …, p_n | b). We will also drop, as irrelevant, the base of the logarithm, since changing base is only a matter of a constant multiplicative factor.
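The order-independence claimed for (2) is easy to verify numerically. The following sketch (Python; the joint distribution is randomly generated purely for illustration) computes the chain sum of (2) for every ordering of three propositions and checks that each agrees with the closed form (3):

```python
import itertools
import math
import random

random.seed(0)

# A random joint credence over three binary propositions, indexed by truth-value triples.
joint = {tv: random.random() for tv in itertools.product([True, False], repeat=3)}
total = sum(joint.values())
joint = {tv: v / total for tv, v in joint.items()}

def cr(fixed):
    """Credence that the propositions in `fixed` (index -> truth value) all hold."""
    return sum(v for tv, v in joint.items() if all(tv[i] == b for i, b in fixed.items()))

def chain_info(order):
    """The chain sum of eq. (2), taking the propositions in the given order."""
    s = 0.0
    for k in range(1, len(order)):
        conj = {i: True for i in order[:k]}
        both = {**conj, order[k]: True}
        s += math.log2(cr(both) / (cr(conj) * cr({order[k]: True})))
    return s

# The closed form of eq. (3).
direct = math.log2(cr({0: True, 1: True, 2: True})
                   / (cr({0: True}) * cr({1: True}) * cr({2: True})))

for perm in itertools.permutations(range(3)):
    assert abs(chain_info(list(perm)) - direct) < 1e-9
print("every ordering agrees with the closed form")
```

The agreement is no accident: each term in the chain sum telescopes, leaving exactly the ratio in (3).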

The mutual information I(p | b) is the logarithm of a quantity that appears in Keynes (1921, sec. 14.8) as the coefficient of dependence, with an attribution to unpublished work by W. E. Johnson.Footnote 6 It was called a measure of similarity by Wayne (1995) and Myrvold (1996) and taken by Shogenji (1999) as a measure of coherence of a set of propositions.

We will say that a hypothesis h MIUnifies a set e = {e_1, …, e_n}, relative to background b, if and only if

(4)  I(e \mid h \wedge b) > I(e \mid b).

This suggests a way to measure the degree to which a hypothesis MIUnifies a set of evidential propositions.Footnote 7

(5)  \mathrm{MIU}_1(e; h \mid b) = I(e \mid h \wedge b) - I(e \mid b).

We might also be interested in whether a hypothesis does a better job of unifying a set of propositions than its negation. Define

(6)  \mathrm{MIU}_2(e; h \mid b) = \mathrm{MIU}_1(e; h \mid b) - \mathrm{MIU}_1(e; \bar{h} \mid b) = I(e \mid h \wedge b) - I(e \mid \bar{h} \wedge b).

The two are not ordinally equivalent and, indeed, need not agree as to sign.

Suppose a hypothesis h unifies a body of evidence, relative to background b. That is, suppose the evidence is more mutually informative conditional on h ∧ b than on b alone. Then MIU1(e; h | b) is positive. But whether MIU2(e; h | b) is negative or positive depends on whether h̄ unifies the evidence more. If I(e | h̄ ∧ b) is greater than I(e | h ∧ b), then, even if MIU1(e; h | b) is positive, MIU2(e; h | b) is negative. In fact, all four combinations of signs of MIU1 and MIU2 are possible, although it is easy to show that, unless e1 and e2 are, when taken individually, oppositely relevant to h (i.e., unless one of them is positively relevant and the other negatively relevant), if MIU1(e1, e2; h | b) is positive, MIU2(e1, e2; h | b) is also positive. See the appendix for details.
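To see how MIU1 and MIU2 can differ numerically, consider the toy example under the illustrative (and entirely made-up) assumption that Cr(H1) = Cr(H2) = 1/2, ignoring H3. A Python sketch:

```python
import math

def info(p12, p1, p2):
    return math.log2(p12 / (p1 * p2))

pH1, pH2 = 0.5, 0.5                 # illustrative priors, not from the text

# Likelihoods for the specific matching streams E1, E2.
pE1_H1 = pE2_H1 = pE12_H1 = 2**-10  # certain match, given H1
pE1_H2 = pE2_H2 = 2**-10
pE12_H2 = 2**-20                    # independence, given H2

# Unconditional (mixture) credences.
pE1 = pH1 * pE1_H1 + pH2 * pE1_H2
pE2 = pH1 * pE2_H1 + pH2 * pE2_H2
pE12 = pH1 * pE12_H1 + pH2 * pE12_H2

I_b  = info(pE12, pE1, pE2)             # I(e | b): about 9.0014 bits
I_H1 = info(pE12_H1, pE1_H1, pE2_H1)    # I(e | H1 & b) = 10 bits
I_H2 = info(pE12_H2, pE1_H2, pE2_H2)    # I(e | H2 & b) = 0 bits

MIU1 = I_H1 - I_b                       # about 0.9986 bits
MIU2 = I_H1 - I_H2                      # 10 bits, since not-H1 just is H2 here
print(MIU1, MIU2)
```

Here the two measures agree in sign but differ markedly in magnitude: most of the mutual information is already present unconditionally, because H1 itself has substantial prior credence.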

Both of these quantities are special cases of a comparative measure of unification:

(7)  \mathrm{MIU}_c(e; h_1, h_2 \mid b) = I(e \mid h_1 \wedge b) - I(e \mid h_2 \wedge b).

On McGrew’s account of consilience, h1 is said to be more consilient than h2 with respect to e to the extent that I(e | h1 ∧ b) > I(e | h2 ∧ b) or, equivalently, to the extent that MIUc(e; h1, h2 | b) > 0 (McGrew 2003, 562).

Readers are asked to kindly refrain from engaging in a battle of the intuitions over whether MIU1 or MIU2 is the One True Measure of degree of unification. They are simply measuring different things, and if you have intuitions that are incompatible with properties that one or another of these quantities possesses, then your intuitions are about some other concept.Footnote 8

4. The Evidential Value of Unification

To some readers, it might seem obvious that what counts when it comes to confirmation is COU, with MIU being a poor cousin that hardly merits the illustrious family surname. A view of this sort is expressed by Marc Lange, who writes,

the examples I have given suggest that insofar as theories that unify in the stronger,Footnote 9 ontological-explanatory sense derive greater support in virtue of the unification they achieve, they do so not solely in virtue of their achieving unification in the weaker, creating-mutual-positive relevance sense. The stronger sense of unification is epistemically significant. In the case of the light-quantum hypothesis, hC and hL both supply unity in the weaker sense, but Einstein took hL to receive greater support from the phenomena than hC by virtue of hL’s unifying those phenomena in an ontological-explanatory sense. (2004, 212)

Here hL is Einstein’s light quantum hypothesis, and hC is the hypothesis that hL is false but nevertheless, by sheer coincidence, light behaves as if it were quantized. According to Lange, hL receives greater support from the phenomena unified than does hC.

It is not entirely clear whether incremental or absolute support is meant, where incremental support has to do with an increase in credibility lent to a hypothesis by the evidence, and absolute support with the credibility of the hypothesis, taking all known considerations into account. If absolute, this suggests that the case of hC is analogous to that of our toy example’s H3, which is accorded a low prior because it posits an improbable coincidence. But, if the claim is to be a counterexample to anything in Myrvold (2003), incremental support must be what is meant. Let us therefore consider the position that, when it comes to incremental support, it is COUnification, not MIUnification, that counts.

A Bayesian analysis renders the opposite verdict: when it comes to incremental support of a hypothesis, it is MIUnification, rather than COUnification, that matters. One popular measure of the degree to which an evidential proposition e lends incremental confirmation to a hypothesis h, relative to background b, is the ratio of posterior probability of h to its prior probability. This is, of course, ordinally equivalent to its logarithm. Let us define

(8)  R(h; e \mid b) = \log\left( \frac{\mathrm{Cr}(h \mid e \wedge b)}{\mathrm{Cr}(h \mid b)} \right).

Another is the ratio of the posterior odds of h to its prior odds, or, equivalently, the logarithm of this, called weight of evidence by Good (1950). Define

(9)  W(h; e \mid b) = \log\left( \frac{\mathrm{Cr}(h \mid e \wedge b) / \mathrm{Cr}(\bar{h} \mid e \wedge b)}{\mathrm{Cr}(h \mid b) / \mathrm{Cr}(\bar{h} \mid b)} \right) = \log\left( \frac{\mathrm{Cr}(e \mid h \wedge b)}{\mathrm{Cr}(e \mid \bar{h} \wedge b)} \right).

As Myrvold (2003) pointed out, on either way of measuring incremental confirmation, we have a contribution of unification to confirmation.Footnote 10 The incremental support, as measured by R, of h by e can be decomposed into a sum of increments due to the individual members of e, plus an additional term that is the degree of MIUnification (positive or negative) of e by h, as measured by MIU1.

(10)  R\left( h; \bigwedge_{i=1}^{n} e_i \,\middle|\, b \right) = \sum_{i=1}^{n} R(h; e_i \mid b) + \mathrm{MIU}_1(e; h \mid b).

The result for W takes the same form, with MIU2 in place of MIU1.

(11)  W\left( h; \bigwedge_{i=1}^{n} e_i \,\middle|\, b \right) = \sum_{i=1}^{n} W(h; e_i \mid b) + \mathrm{MIU}_2(e; h \mid b).

These relations can be readily verified by the reader.
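For the skeptical reader, here is a numerical check of (10) on the toy example (a Python sketch; the equal priors are illustrative assumptions of mine, not the article's):

```python
import math

pH1 = 0.5                            # illustrative prior: Cr(H1) = Cr(H2) = 1/2
pE1_H1 = pE2_H1 = pE12_H1 = 2**-10   # likelihoods given H1 (streams must match)
pE1_H2 = pE2_H2 = 2**-10
pE12_H2 = 2**-20                     # likelihoods given H2 (independent streams)

pE1  = pH1 * pE1_H1  + (1 - pH1) * pE1_H2
pE2  = pH1 * pE2_H1  + (1 - pH1) * pE2_H2
pE12 = pH1 * pE12_H1 + (1 - pH1) * pE12_H2

def R(lik, pE):
    """Eq. (8) for H1: log of the ratio of posterior to prior credence."""
    return math.log2((pH1 * lik / pE) / pH1)

R_total  = R(pE12_H1, pE12)                   # left-hand side of eq. (10)
R_single = R(pE1_H1, pE1) + R(pE2_H1, pE2)    # support from each item singly
MIU1 = math.log2(pE12_H1 / (pE1_H1 * pE2_H1)) - math.log2(pE12 / (pE1 * pE2))

assert abs(R_total - (R_single + MIU1)) < 1e-12
print(R_total, R_single, MIU1)
```

Each item of evidence, taken singly, gives H1 no boost at all (each string is equally likely under either hypothesis), so the entire increment of roughly one bit is the MIU1 term.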

Since the MIU term is not the only contribution to the increment of confirmation, it would be incorrect to gloss these results as saying that hypotheses that are more unifying receive more confirmation. Although it would not be incorrect to say that, ceteris paribus, a hypothesis that achieves a higher degree of MIUnification of the evidence is accorded greater incremental support, this is strictly weaker than what is conveyed in equations (10) and (11), and there is no advantage in making the ceteris paribus claim when it is a trivial matter to say how things stand when all else is not equal.

Imagine, now, a Bayesian agent that had numerical credences, which it updated by conditionalizing on new items of evidence.Footnote 11 Then, depending on how we measured degree of incremental confirmation, the confirmational boost accorded to h by a set e of evidential propositions would be given by (10) or (11). In each case the additional confirmational boost, beyond that attributable to the items of evidence taken singly, is given by the MIUnification term.

Applied to our toy example: the fact that H1 and H3 make E1 and E2 informative about each other is reflected in the likelihoods, Cr(E1 ∧ E2 | H1) and Cr(E1 ∧ E2 | H3), which are higher than Cr(E1 ∧ E2 | H2) by a factor of 1,024. Thus, relative to H2, credence in H1 and H3 is boosted:

(12)  \frac{\mathrm{Cr}(H_1 \mid E_1 \wedge E_2)}{\mathrm{Cr}(H_1)} = \frac{\mathrm{Cr}(H_3 \mid E_1 \wedge E_2)}{\mathrm{Cr}(H_3)} = 1024 \times \frac{\mathrm{Cr}(H_2 \mid E_1 \wedge E_2)}{\mathrm{Cr}(H_2)}.

It does not follow, of course, that H3 gets final credence comparable to that of H1. Since H3 posits an improbable coincidence, it is accorded a lower prior probability, lower than that of H2 by a factor of 1,024; the additional confirmational boost it receives is just enough to bring it up to posterior credence equal to that of H2 (which, of course, must be the case, since, given the evidence, H3 is true if and only if H2 is).
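This bookkeeping can be made explicit (a Python sketch; the priors are illustrative, chosen only to respect the factor-of-1,024 relations stated above):

```python
from fractions import Fraction

# Unnormalized priors: H1 and H2 comparable, H3 penalized by the
# improbable coincidence it posits.
prior = {"H1": Fraction(1), "H2": Fraction(1), "H3": Fraction(1, 1024)}

# Likelihoods of the specific matching streams E1 and E2.
lik = {"H1": Fraction(1, 2**10), "H2": Fraction(1, 2**20), "H3": Fraction(1, 2**10)}

unnorm = {h: prior[h] * lik[h] for h in prior}
total = sum(unnorm.values())
posterior = {h: v / total for h, v in unnorm.items()}

print(posterior["H3"] == posterior["H2"])  # True: the boost exactly offsets the low prior
print(posterior["H1"] / posterior["H2"])   # 1024
```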

There is a close parallel between this case and the case of geocentric vs. heliocentric world systems, and also the case of the light quantum, considered by Lange. In the case of planetary motion, on both the heliocentric hypothesis and the strengthened Ptolemaic hypothesis HSP, features of one planet’s apparent motion are informative about features of others’ (see Janssen [2002] and Myrvold [2003] for discussion). In the case of the heliocentric hypothesis, HC, these have a common origin in the motion of our vantage point as observers on earth; for HSP, they are the consequence of the posited sun-planet parallelism. Against a background that includes little or no information about observed planetary motions, both of these get a confirmational boost from the celestial phenomena, due to the MIU component of incremental confirmation. It does not follow that they end up with equal posterior credence. Arguably, HSP, on that background, should be accorded markedly lower prior credence than the bare-bones geocentric hypothesis HP, as it posits a relation that HP by itself would not lead one to anticipate. The two hypotheses HC and HSP get the same incremental confirmation on the evidence. Therefore, posterior credence in HC will be markedly higher than posterior credence in HSP unless prior credence in HC is markedly lower than prior credence in HP.

Something similar can be said in regard to Lange’s case of the light quantum hypothesis. Let us grant that the light quantum hypothesis plays a unificatory role. Lange asserts that Einstein took the observed phenomena to lend greater support to the light quantum hypothesis than to the hypothesis that, by sheer coincidence, all observable phenomena are as if the light quantum hypothesis were true. The suggestion is that such a judgment is the right one, given the evidence available to Einstein in 1905. In order for this claim to be relevant to the issue at hand, this must mean that the phenomena lend greater incremental support to the light quantum hypothesis than to the coincidence hypothesis. One might also regard the hypothesis of coincidence as so implausible as to be dismissed out of hand. But this would mean according it a low prior, which is consistent with the Bayesian account of the virtue of unification.

5. Possible Reactions to the Bayesian Verdict

Bayesian updating leaves no room for an additional confirmatory boost to be attached to hypotheses with greater explanatory power; the contribution to incremental support comes via the MIUnification term. There is a tension between this Bayesian verdict and the thought that COUnification should play a role in incremental confirmation above and beyond its contribution to MIUnification. We have here an exact parallel with van Fraassen’s argument against those who would take explanatory power of a theory to yield an extra confirmatory boost, beyond that yielded by conditionalization on the evidence (van Fraassen 1989, 166–69).

One reaction might be to downplay the distinction, focusing on cases in which explanationist and Bayesian judgments agree. One might be tempted to declare that hypotheses that provide ‘lovelier’ explanations are precisely those that bestow higher likelihood on the evidence. This is not tenable as a general thesis. Although, in many interesting cases, explanation and likelihood go together, the connection is not so tight that they never come apart. The interesting question is what the explanationist will say about the cases in which they do come apart.

One possible reaction, in my opinion the correct one, is to use the Bayesian verdict to correct any intuitions one might have that are in tension with it. The ability of a theory to unify disparate phenomena by positing a common origin plays a confirmatory role only insofar as the posited common origin renders distinct phenomena informative (or more informative) about each other. A temptation to assign it a stronger role in confirmation might be ascribed, in part, to a conflation of distinct questions (a conflation encouraged by philosophers’ overuse of the phrase “theory choice,” a phrase that conflates distinct sorts of choices). Certainly, a hypothesis’s power to explain, if true, can contribute to making it worthwhile to pursue a project of developing a theory that includes that hypothesis, and it can contribute to the value of accepting the hypothesis, if true; we should only be wary of thinking that everything that contributes to making a hypothesis pursuit-worthy also lends it greater credibility. The temptation might also be ascribed, in part, to not distinguishing between incremental confirmation and overall credibility in the light of all evidence. The most obvious examples that exhibit MIUnification without COUnification are those, such as our H3, that achieve it by brute fiat, by tacking on an improbable conjunct, and we rightly regard these as implausible.

This suggests one way in which an explanationist might retrench; the import of COUnification might be relegated to informing priors. While, certainly, common-origin considerations sometimes play a role in assessing prior credibility, I am skeptical that anything beyond a very limited role can be defended; more on this in the next section.

The only other avenue of defense for an advocate of an explanationist thesis would be to deny that considerations of how a Bayesian agent would update have normative force for the judgments of human scientists. A defense along these lines of thought would have to ground it in some relevant difference between us and Bayesian agents. We are certainly different from Bayesian agents in a number of ways. We do not have precise numerical degrees of belief; our judgments about how likely or unlikely a hypothesis is tend to be vague. Moreover, as an abundance of empirical evidence shows, routinely our qualitative judgments of the relative credibility of various propositions are not even compatible with the existence of numerical credences satisfying the axioms of probability, and our changes in credences are often not in accord with Bayesian conditionalization.

The usual understanding of facts of this sort is that they are due to cognitive limitations and that some of them can be understood as resulting from usually reliable heuristics, of the sort that any agent with limited cognitive capacities would be well advised to employ as an alternative to spending excessive time on cogitation. In taking such limitations into account, one does not ipso facto abandon the domain of normativity for descriptive psychology. From a decision-theoretic point of view, deployment of such heuristics can be regarded as rational behavior for a cognitively limited agent. This involves what Good (1971, 1976) called Type II Rationality: decision making that takes into account the cost in time and cognitive effort of the act of deliberation.

Peter Lipton has offered a limited defense of explanationism along these lines. We are often not very good, he notes, at judging likelihoods correctly.

My thought is this. In many real life situations, the calculation that the Bayesian formula would have us make does not, in its bare form, meet the requirement of epistemic effectiveness: it is not a recipe we can readily follow. … My suggestion is that explanatory considerations of the sort to which Inference to the Best Explanation appeals are often more accessible than those probabilistic principles to the inquirer on the street or in the laboratory, and provide an effective surrogate for certain components of the Bayesian calculation. On this proposal, the resulting transition of probabilities in the face of new evidence might well be just as the Bayesian says, but the process that actually brings about the change is explanationist. (Lipton 2004, 113–14; see also Lipton 2001, 110–11)

On such a view, when a judgment needs to be made on the fly, it is better to invoke an explanationist heuristic than to spend time thinking through likelihoods; this will, one hopes, provide judgments that are not too far off, either most of the time or in the most significant cases. Although Lipton suggests that the division of labor between Bayesianism and explanationism maps onto the distinction between normative and descriptive accounts, he also uses language that suggests that we cognitively limited agents are well advised to employ explanationist considerations as a surrogate for doing a Bayesian calculation: “explanatory considerations help us to perform what is in effect a Bayesian calculation” (Lipton 2004, 120). This suggests that considerations of Type II rationality are in play.

Using a heuristic of this sort as a surrogate for a considered evaluation of likelihoods carries with it a risk of error, in those cases in which COU and MIU come apart. Presumably, Lipton would agree that, in such cases, if an accurate appraisal of the import of the evidence matters, one should correct the explanationist judgment by reference to the Bayesian one. On Lipton’s view, the role of explanationist considerations is severely constrained.

Can a stronger defense of explanationism be mounted? It is doubtful. Since such a defense would have to be grounded in some difference between cognitively limited humans and Bayesian agents, it is hard to see any role for explanationist considerations beyond the limited heuristic role envisaged by Lipton.

6. A Prior Preference for Unifying Hypotheses?

We have considered cases (in the toy example, H1 and H3; in the case of planetary motion, HC and HSP; and in the light quantum case, hL and hC) in which each of a pair of hypotheses possesses the same ability to render items of evidence informationally relevant to one another, but they do so in different ways. In each of these cases one does it by virtue of positing a common origin for prima facie unrelated phenomena, the other by brute fiat, in positing an unexplained correlation between the phenomena. In each of these cases, the hypothesis that involves a common origin is, arguably, less implausible than the one that posits brute coincidence.

One might be tempted to generalize, positing that, whenever we have MIU without COU, there will be a corresponding hypothesis that achieves precisely the same MIUnification via COUnification, and we should accord much less prior credence to the hypothesis that exhibits MIU without COU than to the one that achieves it via COU. This would mean that there is a role for COU, not in incremental confirmation but in setting priors.

Anything so sweeping would be a mistake, I think. There are patterns in the world of all sorts, some due to some sort of common origin, some not. We should not demand that a common origin be found for every similarity between two phenomena. Given any pattern in the phenomena, however, it will be possible to cook up an artificial MIUnifying hypothesis. We ought not seek a common origin lurking behind every such hypothesis.

Perhaps, then, the generalization should be that, when we do have a pair of hypotheses that both induce the same informational relevance relations among a body of phenomena, one doing it via COUnification and the other by brute fiat, we should attach higher prior credence to the COUnifying hypothesis. This is still too sweeping. When we have a case of two hypotheses h 1 and h 2 of roughly equal prior credibility and create a third h 3 by tacking on to h 2 some conjunct with low prior plausibility, then, indeed, in such a case we should place lower credence in h 3 than in h 1. But not all cases will be like that, and a COUnifying hypothesis might be deemed implausible on other grounds. Take, for example, Ptolemy’s attitude toward heliocentric hypotheses. Since Ptolemy recognized that in the observed phenomena there is a connection between the apparent motion of the sun and that of the other planets, he was in a position to appreciate the COUnifying power of heliocentrism. But, since he accepted Aristotelian physics for terrestrial phenomena, he thought that terrestrial phenomena ruled out a diurnal rotation of the earth (see Ptolemy Reference Toomer1984, bk. I, sec. 7); for him, it was reasonable to place low credence in heliocentric theories that posited such a rotation.

One can exhibit plenty of hypothesis pairs in which the less unifying, less explanatory hypothesis has less prior credibility because the less explanatory hypothesis posits an implausible coincidence. But the emphasis should be on the credibility-diminishing role of coincidence, rather than any prior conviction that nature is unified. What H_3, the strengthened Ptolemaic hypothesis, and Lange's h_C have in common is that, in each case, we have a hypothesis to which is tacked on some additional condition that one would not expect to hold in the absence of evidence that it does, and hence we have a hypothesis that ought to be accorded low prior credence. Rather than a sweeping preference for COUnification, I suggest that the methodological adage that underwrites low prior credence in such hypotheses is

Place little prior credence in things you take to be improbable.

This is, I hope, unobjectionable! It is, of course, utterly empty, but I am skeptical that anything stronger could be defended as a maxim of more than very limited scope.

It would be a mistake to raise this bland but unobjectionable maxim into a global rejection of hypotheses that posit coincidences. Improbable things do happen, after all. Moreover, in some cases it is reasonable to accept hypotheses that posit an improbable coincidence. The evidence available to you in the toy example strongly suggests a common cause. But if you were to obtain strong evidence that the two data streams were the result of independent tosses of two fair coins, then it would be reasonable to accord high credence to H_3. For a real-world case: Ptolemy propounded a geocentric system with an unexplained sun-planet parallelism because he thought he had strong evidence to rule out hypotheses that involved a moving earth.

7. Unification and Reichenbachian Common Causes

Among unifying hypotheses are those that posit a Reichenbachian common cause to explain some observed statistical correlation (Reichenbach 1956, sec. 19). This type of hypothesis fits well within the schema of the Bayesian account of unification, but, since this might not be obvious, it is worth showing how it fits.

Consider two sequences of propositions, {A_i, i = 1, …, n} and {B_i, i = 1, …, n}. Given such sequences, let n(A) be the number of true instances of the A_i's, and let f(A) = n(A)/n be the relative frequency of true instances of the A_i's. Define f(B) and f(AB) similarly. Let E_1 be a proposition expressing which of the A_i's are true and which are false. For example, in our toy example, A_i could be the proposition that the ith element of S_1 is heads, and E_1 would be

A_1 A_2 A_3 Ā_4 Ā_5 A_6 Ā_7 A_8 A_9 Ā_10.

Let E_2 be the evidence statement specifying the B-sequence.

A statistically significant difference between f(AB) and the product f(A)f(B) is thought to call for explanation. A Reichenbachian Common Cause of an observed correlation between A and B is a third sequence {C_i} that screens off their correlation. That is,

(13)  Pr(A_i B_i | C_i) = Pr(A_i | C_i) Pr(B_i | C_i);  Pr(A_i B_i | C̄_i) = Pr(A_i | C̄_i) Pr(B_i | C̄_i).

A hypothesis that posits a common cause of this sort, if it leads one to expect correlations close to those observed, clearly can be supported by evidence in which there is an observed statistical correlation between two sequences of events. Such a hypothesis can be a MIUnifying hypothesis, in the sense of making the evidence statements E_1 and E_2 mutually informative.

This might seem paradoxical. A common cause screens off the correlations between the A_i's and B_i's; how can it be that, at the same time, there is a confirmational boost associated with rendering them informative about each other?

The answer to this is that the hypothesized common causes C_i screen off the correlations, but a hypothesis H_cc that posits that there are common causes of the right sort can render the truth or falsity of A_i informative about the truth or falsity of B_i, and hence render E_1 and E_2 mutually informative. That is, a hypothesis that there is a common cause of the right sort will lead one to expect correlations between the A_i's and B_i's, and so count as MIUnifying with respect to the evidence set {E_1, E_2}, relative to a background against which the observed correlations are unexpected.

Moreover, each event C_i can count as a common origin of A_i and B_i. Let H_cc be some hypothesis according to which there exists a sequence {C_i} satisfying (13). Suppose that, on the supposition of H_cc, C_i is a probability raiser for both A_i and B_i, as a cause should be, and suppose that, according to H_cc, for each i, C_i and C̄_i both have nonzero probability. Then, even though, for each i, the truth or falsity of C_i screens off informational relations between A_i and B_i, the supposition of H_cc leads one to expect correlations between the A_i's and the B_i's:

(14)  Pr(A_i B_i | H_cc) > Pr(A_i | H_cc) Pr(B_i | H_cc).

Let us now see in more detail how this works. We consider the bearing of the statistical evidence stemming from observation of the A-sequence and the B-sequence on members of a family of hypotheses, each of which posits the existence of a Reichenbachian common cause. For simplicity, we consider only hypotheses on which distinct A_i's are independent and identically distributed, as are {B_i} and the conjunctions {A_i B_i}. The statistical data can be accounted for on a hypothesis positing C_i's that are also independently and identically distributed. Any hypothesis positing a common cause of this sort can be characterized by five parameters:

(15)  p = Pr(C_i),  a_1 = Pr(A_i | C_i),  a_0 = Pr(A_i | C̄_i),  b_1 = Pr(B_i | C_i),  b_0 = Pr(B_i | C̄_i).

Probabilities for the A_i's and B_i's, conditional on a hypothesis of this sort, are

(16)  Pr(A_i | H_cc) = p a_1 + (1 − p) a_0,  Pr(B_i | H_cc) = p b_1 + (1 − p) b_0,

and their covariance is

(17)  Cov(A_i, B_i | H_cc) = Pr(A_i B_i | H_cc) − Pr(A_i | H_cc) Pr(B_i | H_cc) = p(1 − p)(a_1 − a_0)(b_1 − b_0).

As pointed out by Reichenbach (1956, 159–61), and as can be readily seen from (17), if p ∈ (0, 1) and a_1 − a_0 and b_1 − b_0 are both positive, then, conditional on the hypothesis H_cc, the A_i's are positively correlated with the B_i's. Obviously, the same conclusion follows if a_1 − a_0 and b_1 − b_0 are both negative; also, the A_i's are negatively correlated with the B_i's if a_1 − a_0 and b_1 − b_0 have opposite sign, and they are uncorrelated if the C_i's are irrelevant to either the A_i's or the B_i's, that is, if a_1 = a_0 or b_1 = b_0.¹² The family of all such hypotheses, thus, includes as a special case those that posit no common cause for A_i and B_i.
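The relations (16) and (17) are easy to check numerically. The sketch below (the parameter values are illustrative assumptions, not taken from the text) computes the marginals and the joint probability from the five parameters of (15), using the screening-off condition (13), and verifies Reichenbach's closed form for the covariance.

```python
# Numerical check of eqs. (16)-(17) for a common-cause hypothesis
# characterized by the five parameters of eq. (15).
# Parameter values here are illustrative, not from the text.

def marginals_and_cov(p, a1, a0, b1, b0):
    pr_a = p * a1 + (1 - p) * a0             # Pr(A_i | H_cc), eq. (16)
    pr_b = p * b1 + (1 - p) * b0             # Pr(B_i | H_cc), eq. (16)
    # Screening off, eq. (13): A_i and B_i are independent given C_i.
    pr_ab = p * a1 * b1 + (1 - p) * a0 * b0  # Pr(A_i B_i | H_cc)
    cov = pr_ab - pr_a * pr_b                # eq. (17), first line
    return pr_a, pr_b, cov

p, a1, a0, b1, b0 = 0.5, 0.9, 0.2, 0.8, 0.3
pr_a, pr_b, cov = marginals_and_cov(p, a1, a0, b1, b0)

# Reichenbach's closed form: Cov = p(1 - p)(a1 - a0)(b1 - b0).
closed_form = p * (1 - p) * (a1 - a0) * (b1 - b0)
assert abs(cov - closed_form) < 1e-9
assert cov > 0  # a1 > a0 and b1 > b0, so the correlation is positive
```

Setting a_1 = a_0 (or b_1 = b_0) makes the covariance vanish, as the text notes: such hypotheses posit no common cause of the correlation.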

We inquire into the degree of support lent to common-cause hypotheses, with various values of the parameters, by the pair {E_1, E_2}. Let H_cc be some hypothesis of the form considered above. We have, from (10),

(18)  R(H_cc; E_1 E_2) = R(H_cc; E_1) + R(H_cc; E_2) + MIU_1({E_1, E_2}; H_cc).

Since we are interested in comparing degrees of support for different hypotheses on a fixed body of evidence, it is useful to compare log-likelihoods, as, for two different hypotheses, the differences between their R-values will be the same as the differences between the respective log-likelihoods. The log-likelihoods can be partitioned in a manner parallel to our partitioning of R:

(19)  log Pr(E_1 E_2 | H_cc) = log Pr(E_1 | H_cc) + log Pr(E_2 | H_cc) + I(E_1, E_2 | H_cc).

The first two terms of this are

(20)  log Pr(E_1 | H_cc) = n(A) log Pr(A_i | H_cc) + n(Ā) log Pr(Ā_i | H_cc);  log Pr(E_2 | H_cc) = n(B) log Pr(B_i | H_cc) + n(B̄) log Pr(B̄_i | H_cc).

These are maximized by a hypothesis H_cc that has Pr(A_i | H_cc) = f(A) and Pr(B_i | H_cc) = f(B). That is, these terms are largest for hypotheses that posit probabilities for the A_i's and B_i's that are equal to the observed relative frequencies.

The mutual information of E_1 and E_2, conditional on a hypothesis H_cc, is

(21)  I(E_1, E_2 | H_cc) = n(AB) I(A_i, B_i | H_cc) + n(AB̄) I(A_i, B̄_i | H_cc) + n(ĀB) I(Ā_i, B_i | H_cc) + n(ĀB̄) I(Ā_i, B̄_i | H_cc).

Once Pr(A_i | H_cc) and Pr(B_i | H_cc) are fixed, this is maximized by taking

(22)  Pr(A_i B_i | H_cc) = f(AB).

Thus, in the expression (19) for the log-likelihood, we see that the first two terms reward hypotheses whose probabilities for A_i and B_i are close to the observed relative frequencies of these, and the last term, which corresponds to unification in the Mutual Information sense, rewards hypotheses with theoretical correlations close to the observed statistical correlations. What goes for log-likelihoods goes also for the evidential support R. Thus, when there is a difference between f(AB) and f(A)f(B), a common-cause hypothesis on which this difference is expected, by virtue of appropriate values of the parameters, counts as a MIUnifying hypothesis and thereby achieves greater support.

For example, consider a case in which we have two sequences {A_i}, {B_i}, with a significant positive correlation between them: f(AB) is much larger than f(A)f(B). Consider two hypotheses, H_cc and H′_cc, which posit the existence of sequences {C_i} and {C′_i}, respectively, such that

(23)  Pr(A_i | H_cc) = Pr(A_i | H′_cc) ≈ f(A);  Pr(B_i | H_cc) = Pr(B_i | H′_cc) ≈ f(B).

Suppose now that H_cc correctly predicts the correlations, but H′_cc does not. That is, Pr(A_i B_i | H_cc) is close to f(AB), but Pr(A_i B_i | H′_cc) is not. In such a case we will have

(24)  MIU_1({E_1, E_2}; H_cc) > MIU_1({E_1, E_2}; H′_cc).

Thus, for appropriate values of the parameters, the hypothesis H_cc affords MIUnification to the evidence set {E_1, E_2}, even though, in individual cases, the supposition C_i does not render A_i informative about B_i.
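The comparison can be illustrated numerically. In the sketch below (frequencies, sample size, and parameter values are illustrative assumptions, not taken from the text), two hypotheses agree on the marginal probabilities for the A_i's and B_i's, but only one assigns a joint probability close to the observed f(AB); on the decomposition in (19)-(21), that hypothesis earns the larger mutual-information term and hence the larger log-likelihood.

```python
import math

# Log-likelihood of the evidence pair {E1, E2} under an i.i.d. hypothesis,
# partitioned by the four joint outcomes, as in eqs. (19)-(21).
# All numbers below are illustrative assumptions, not from the text.

def log_likelihood(pA, pB, pAB, n, fA, fB, fAB):
    # Observed counts of the four joint outcomes AB, A~B, ~AB, ~A~B
    counts = [n * fAB, n * (fA - fAB), n * (fB - fAB), n * (1 - fA - fB + fAB)]
    # Probabilities the hypothesis assigns to those outcomes
    probs = [pAB, pA - pAB, pB - pAB, 1 - pA - pB + pAB]
    return sum(c * math.log(q) for c, q in zip(counts, probs))

n, fA, fB, fAB = 100, 0.5, 0.5, 0.4   # observed: f(AB) >> f(A) f(B) = 0.25

# One hypothesis predicts the correlation; the other agrees on the
# marginals but posits probabilistic independence of A_i and B_i.
ll_correlated = log_likelihood(0.5, 0.5, 0.4, n, fA, fB, fAB)
ll_independent = log_likelihood(0.5, 0.5, 0.25, n, fA, fB, fAB)
assert ll_correlated > ll_independent  # the MIUnifying hypothesis wins
```

Since the two hypotheses share the marginals, the first two terms of (19) are equal, and the whole difference in support is the mutual-information term.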

This does not prevent C_i from being regarded as a common origin of A_i and B_i. To take an example used by Lange in section 3 of his article, suppose that we take the clinical evidence to establish that some disease C can cause symptoms A and B. Then, if we observe A and B in some patient, this will raise our credence that C also occurs in that patient, even if the symptoms A and B are independent, conditional on C. In such a case, the support provided by the symptoms A and B to the hypothesis that the patient has disease C is just the sum of the supports given to the hypothesis by the individual items by themselves.

Lange raises the question whether we should place more credence in a hypothesis that posits a single disease than in one that posits two independent origins of the symptoms A and B. Suppose there are two other diseases D_1 and D_2, such that A but not B is a symptom of D_1 and B but not A is a symptom of D_2, and suppose further that the chance that a patient with D_1 exhibits symptom A is the same as that of a patient with C, and that the chance that a patient with D_2 exhibits symptom B is the same as that of a patient with C. Then, upon observation of both symptoms, the confirmational boost afforded to the hypothesis that the patient has C is the same as the boost afforded to the hypothesis that the patient has both D_1 and D_2. The issue then comes down to priors. Is the joint occurrence of D_1 and D_2 much rarer than the occurrence of C? If the answer is yes—as would be the case if the three diseases are equally rare, and D_1 and D_2 uncorrelated—then we should place more credence in the hypothesis that the patient has C. If not—if the disease C is so rare and D_1 and D_2 so common that more patients contract both D_1 and D_2 than C—then our credences should favor the two-disease hypothesis. It would clearly be a mistake for one's credences to favor the C-hypothesis merely on the basis of a preference for common origin explanations.
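Since the symptom likelihoods are stipulated to be equal, the comparison reduces to prior odds. A short sketch makes this vivid (all disease rates here are hypothetical numbers chosen for illustration, not taken from the text):

```python
# Posterior odds of the one-disease hypothesis C against the two-disease
# hypothesis D1 & D2, when both assign the same chances to symptoms A, B.
# All rates are hypothetical illustration values.

def posterior_odds(prior_C, prior_D1, prior_D2, lik_A, lik_B):
    # Equal likelihoods cancel, so the posterior odds equal the prior odds.
    prior_D1D2 = prior_D1 * prior_D2      # D1 and D2 assumed uncorrelated
    return (prior_C * lik_A * lik_B) / (prior_D1D2 * lik_A * lik_B)

# Three equally rare diseases: the common-origin hypothesis wins on priors.
assert posterior_odds(0.001, 0.001, 0.001, 0.8, 0.7) > 1

# C far rarer than D1 and D2: the two-disease hypothesis wins instead.
assert posterior_odds(1e-8, 0.05, 0.05, 0.8, 0.7) < 1
```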

8. Conclusion

MIU is not the same as common origin explanation and is neither a necessary nor a sufficient condition for a hypothesis to play an explanatory role. Nevertheless, in a host of interesting cases, MIUnification is a concomitant of common origin explanation. Moreover, when a hypothesis that renders an otherwise puzzling coincidence comprehensible by providing a common origin explanation does receive an incremental confirmational boost from a body of evidence, beyond that provided by the individual items of evidence, that boost stems from MIUnification.

So, at least, is the verdict delivered by a Bayesian analysis; there is no room in Bayesian conditionalization for an extra confirmatory boost that is due to COU. A proponent of an explanationist thesis, to the effect that we ought to take hypotheses that involve common origin explanations to receive greater incremental support than hypotheses that achieve the same degree of MIU without explanation, should be in a position to explain why what is impossible for a Bayesian agent is rational for us. As we have seen, there is a limited heuristic role for considerations of COU, based on considerations of Type II rationality. It is doubtful whether any stronger explanationist thesis can be defended.

Appendix

Given a probability function Pr and propositions h, e_1, e_2, define

(A1)  U_1 = [Pr(e_1 e_2 | h) / (Pr(e_1 | h) Pr(e_2 | h))] × [Pr(e_1) Pr(e_2) / Pr(e_1 e_2)];
(A2)  U_2 = [Pr(e_1 e_2 | h̄) / (Pr(e_1 | h̄) Pr(e_2 | h̄))] × [Pr(e_1) Pr(e_2) / Pr(e_1 e_2)].

Then we have

(A3)  MIU_1(e_1, e_2; h) = log U_1;
(A4)  MIU_2(e_1, e_2; h) = log(U_1/U_2).

Thus, MIU_1(e_1, e_2; h) is positive if and only if U_1 > 1, negative if and only if U_1 < 1, and zero if and only if U_1 = 1, and MIU_2(e_1, e_2; h) is positive if and only if U_1 > U_2, negative if and only if U_1 < U_2, and zero if and only if U_1 = U_2.

We want to show that each of the following four alternatives can be realized by some probability function.

  1. MIU_1 > 0 and MIU_2 > 0; that is, U_1 > 1 and U_1 > U_2.

  2. MIU_1 > 0 and MIU_2 < 0; that is, 1 < U_1 < U_2.

  3. MIU_1 < 0 and MIU_2 > 0; that is, U_2 < U_1 < 1.

  4. MIU_1 < 0 and MIU_2 < 0; that is, U_1 < 1 and U_1 < U_2.

It is easy to show (see lemma 1, below) that, if either e_1 or e_2 is irrelevant to h, then if U_1 > 1, U_2 < 1, and vice versa. Thus, it is easy to construct examples that satisfy conditions 1 and 4. Take Pr(e_1 | h) = Pr(e_1). Then, on any probability function with U_1 > 1, we will have U_2 < 1 < U_1, and condition 1 will be satisfied. Similarly, if Pr(e_1 | h) = Pr(e_1), on any probability function with U_1 < 1, we will have U_1 < 1 < U_2, and condition 4 will be satisfied.

For condition 2, we need to have both U_1 and U_2 greater than 1. As is shown in lemma 1, below, this is possible only if e_1 and e_2 are relevant to h in opposite directions, that is, only if R(h; e_1) and R(h; e_2) have opposite signs. Here is one way to do it. Take, for simplicity, Pr(h) = Pr(e_1) = Pr(e_2) = 1/2, and take Pr(e_1 e_2) = 1/4. Take Pr(e_1 | h) = 0.7, Pr(e_2 | h) = 0.3, and Pr(e_1 e_2 | h) = 0.24. The reader can readily verify that these are consistent and that they determine the full probability function on Boolean combinations of {h, e_1, e_2}. In particular, they entail that Pr(e_1 | h̄) = 0.3, Pr(e_2 | h̄) = 0.7, and Pr(e_1 e_2 | h̄) = 0.26. We thus have U_1 = 24/21 and U_2 = 26/21, satisfying the desired conditions.

For condition 3, we can take the probability assignment described in the previous paragraph and create a new one by interchanging e_2 and ē_2. We have, once again, Pr(h) = Pr(e_1) = Pr(e_2) = 1/2, Pr(e_1 e_2) = 1/4, Pr(e_1 | h) = 0.7, and Pr(e_1 | h̄) = 0.3. We also have Pr(e_2 | h) = 0.7, and Pr(e_1 e_2 | h) = 0.46. These further entail that Pr(e_2 | h̄) = 0.3, and Pr(e_1 e_2 | h̄) = 0.04. We thus have U_1 = 46/49 and U_2 = 4/9, and so U_2 < U_1 < 1, and condition 3 is satisfied.
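These two numerical examples can be verified mechanically from the definitions (A1) and (A2), using exact rational arithmetic (a sketch; the variable names are mine):

```python
from fractions import Fraction as F

# Verify the appendix's examples for conditions 2 and 3 directly
# from the definitions (A1)-(A2) of U1 and U2.

def U(pr_e1e2_h, pr_e1_h, pr_e2_h, pr_e1, pr_e2, pr_e1e2):
    return (pr_e1e2_h / (pr_e1_h * pr_e2_h)) * (pr_e1 * pr_e2 / pr_e1e2)

half, quarter = F(1, 2), F(1, 4)

# Condition 2: Pr(e1|h) = 0.7, Pr(e2|h) = 0.3, Pr(e1 e2|h) = 0.24, etc.
U1_c2 = U(F(24, 100), F(7, 10), F(3, 10), half, half, quarter)
U2_c2 = U(F(26, 100), F(3, 10), F(7, 10), half, half, quarter)
assert U1_c2 == F(24, 21) and U2_c2 == F(26, 21)
assert 1 < U1_c2 < U2_c2                    # condition 2 holds

# Condition 3: obtained by interchanging e2 with its negation.
U1_c3 = U(F(46, 100), F(7, 10), F(7, 10), half, half, quarter)
U2_c3 = U(F(4, 100), F(3, 10), F(3, 10), half, half, quarter)
assert U1_c3 == F(46, 49) and U2_c3 == F(4, 9)
assert U2_c3 < U1_c3 < 1                    # condition 3 holds
```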

Having shown that all four alternatives are possible, we now prove the lemma alluded to above.

Lemma 1. Let {h, e_1, e_2} be logically independent propositions, and let Pr be a probability function on the Boolean algebra generated by this set. We assume that the denominators of the relevant fractions are nonzero and define U_1 and U_2 as above.

  • a) If Pr(h | e_1) = Pr(h) or Pr(h | e_2) = Pr(h), then if U_1 > 1, U_2 < 1, and vice versa.

  • b) If U_1 and U_2 are both less than one, then either both e_1 and e_2 are positively relevant to h or they are both negatively relevant to h.

  • c) If U_1 and U_2 are both greater than one, then one of {e_1, e_2} is positively relevant to h, and the other negatively relevant.

Proof. Let

(A5)  p = Pr(h);  q = Pr(h̄) = 1 − p;  α_1 = Pr(h | e_1)/Pr(h);  α_2 = Pr(h | e_2)/Pr(h);  β_1 = Pr(h̄ | e_1)/Pr(h̄);  β_2 = Pr(h̄ | e_2)/Pr(h̄).

This allows us to write

(A6)  U_1 = (1/(α_1 α_2)) · Pr(e_1 e_2 | h)/Pr(e_1 e_2);  U_2 = (1/(β_1 β_2)) · Pr(e_1 e_2 | h̄)/Pr(e_1 e_2).

Once p, α_1, α_2, β_1, and β_2 are fixed, this yields a constraint on U_1 and U_2:

(A7)  p α_1 α_2 U_1 + q β_1 β_2 U_2 = 1.

It is convenient to write this in terms of a weighted average of U_1 and U_2. Define

(A8)  w_1 = p α_1 α_2 / (p α_1 α_2 + q β_1 β_2);  w_2 = q β_1 β_2 / (p α_1 α_2 + q β_1 β_2).

Then (A7) becomes

(A9)  w_1 U_1 + w_2 U_2 = 1 / (p α_1 α_2 + q β_1 β_2),

with w_1 and w_2 both nonnegative, and

(A10)  w_1 + w_2 = 1.

It is instructive to rewrite the right-hand side of (A9) using the fact that p α_1 + q β_1 = p α_2 + q β_2 = 1. A bit of algebraic manipulation yields

(A11)  w_1 U_1 + w_2 U_2 = 1 − p q (α_1 − β_1)(α_2 − β_2) / (p α_1 α_2 + q β_1 β_2).

From (A11) it is readily apparent that, if either e_1 or e_2 is irrelevant to h—that is, if α_1 = β_1 or α_2 = β_2—then

(A12)  w_1 U_1 + w_2 U_2 = 1,

and in such a case, if U_1 > 1, then U_2 < 1, and vice versa. If we want to construct a case in which U_1 and U_2 are both greater than one, this requires the right-hand side of (A11) to be greater than one, which means that α_1 − β_1 and α_2 − β_2 must have opposite sign: one of {e_1, e_2} must be positively relevant to h, and the other negatively relevant. If we want to construct a case in which U_1 and U_2 are both less than one, then α_1 − β_1 and α_2 − β_2 must have the same sign: e_1 and e_2 are either both positively relevant or both negatively relevant to h. QED

Footnotes

I thank Michel Janssen, Marc Lange, Bill Harper, and Molly Kao for helpful discussions. I am grateful to Clark Glymour for raising the question, addressed in sec. 7, of how common-cause explanations fit into the framework. This work was supported, in part, by a grant from the Social Sciences and Humanities Research Council of Canada.

1. I am grateful to Michel Janssen for making this suggestion. See Myrvold (2011), and references therein, on the subject of how to incorporate cognitive values into a Bayesian framework.

2. Cf. Salmon (2001, 130): “the scientist might say that Halley’s hypothesis is worth pursuing, not because it is more likely to be true, but because, if it should turn out to be true, it would be extremely valuable in terms of informational content.”

3. Similar remarks apply to probabilistic measures of explanatory power such as those proposed by Popper (1954, 1959), Good (1960), Schupbach and Sprenger (2011), and Crupi and Tentori (2012). Glymour (2015) has argued that it would be a grave mistake to take any of these probabilistic notions as an explication of explanatory power. This seems to be generally accepted by recent authors; Schupbach and Sprenger, for example, are clear that what is proposed is a measure of strength of explanation between propositions bearing an antecedently identified explanatory relation to each other.

4. A note on notation. We use concatenation for conjunction and an overbar for negation: p̄ is the negation of p. We use boldface letters to denote sets of propositions. Note that these are sets and are not replaceable by a single proposition that is their conjunction. Thus, {p_1, p_2} is not the same set as {p_1 p_2, T}, where T is the logically true proposition, although the conjunction of their members is the same. This matters because we will be concerned with the mutual informativeness of members of a set of propositions; p_1 and p_2 may be mutually informative, although the logically true proposition is not informative about their conjunction or anything else.

5. Obviously, a single number cannot capture all the informational relations there could be between elements of a set of more than two members. This would require a specification of all I(q, q′∣b), where q and q′ range over all conjunctions of elements of p. But it is this quantity that will be useful for the purposes at hand.

6. I am indebted to Brössel (2015) for pointing this out.

7. This quantity is the logarithm of a quantity that was referred to as an “interaction term” in Myrvold (1996) and is called focused correlation in Wheeler (2009), Schlosshauer and Wheeler (2011), and Wheeler and Scheines (2013). What we are calling MIU_1 was called U (for unification) in Myrvold (2003); MIU_2 was discussed therein, although not given its own name.

8. And if your intuitions find it repugnant to use the word “unification” in connection with either of these, then feel free to use a different word.

9. This is a slip; the two senses are, as Lange emphasizes, logically independent.

10. Equation (10) corresponds to (6) of Myrvold (1996) and to (12) of Myrvold (2003); (11) corresponds to (13) of Myrvold (2003). Closely related results appear already in Keynes (1921, 151–54); in particular, our eq. (11) is essentially the same as Keynes’s (48).

11. I say “it” because a being with precise numerical credences would be far from human.

12. These probabilistic facts were familiar in the statistical literature well before Reichenbach’s use of them; see Yule (1911, secs. 4.6–7).

References

Brössel, P. 2015. “Keynes’s Coefficient of Dependence Revisited.” Erkenntnis 80:521–53.
Crupi, V., and Tentori, K. 2012. “A Second Look at the Logic of Explanatory Power with Two Novel Representation Theorems.” Philosophy of Science 79:365–85.
Glymour, C. 2015. “Probability and the Explanatory Virtues.” British Journal for the Philosophy of Science 66:591–604.
Good, I. J. 1950. Probability and the Weighing of Evidence. London: Griffin.
Good, I. J. 1960. “Weight of Evidence, Corroboration, Explanatory Power, Information and the Utility of Experiments.” Journal of the Royal Statistical Society B 22:319–22.
Good, I. J. 1971. “Twenty-Seven Principles of Rationality.” In Foundations of Statistical Inference, ed. V. P. Godambe and D. A. Sprott, 123–27. Toronto: Holt, Rinehart & Winston. Repr. in I. J. Good, Good Thinking: The Foundations of Probability and Its Applications (Minneapolis: University of Minnesota Press, 1983), 15–19.
Good, I. J. 1976. “The Bayesian Influence; or, How to Sweep Subjectivism under the Carpet.” In Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, ed. W. L. Harper and C. Hooker, 2:125–74. Dordrecht: Reidel. Repr. in I. J. Good, Good Thinking: The Foundations of Probability and Its Applications (Minneapolis: University of Minnesota Press, 1983), 22–55.
Henderson, L. 2014. “Bayesianism and Inference to the Best Explanation.” British Journal for the Philosophy of Science 65:687–715.
Janssen, M. 2002. “COI Stories: Explanation and Evidence in the History of Science.” Perspectives on Science 10:457–522.
Kao, M. 2015. “Unification and the Quantum Hypothesis in 1900–1913.” Philosophy of Science 82:1200–1210.
Keynes, J. M. 1921. A Treatise on Probability. London: Macmillan.
Lange, M. 2004. “Bayesianism and Unification: A Reply to Wayne Myrvold.” Philosophy of Science 71:205–15.
Lipton, P. 2001. “Is Explanation a Guide to Inference? A Reply to Wesley C. Salmon.” In Explanation: Theoretical Approaches and Applications, ed. G. Hon and S. S. Rackover, 93–120. Dordrecht: Kluwer.
Lipton, P. 2004. Inference to the Best Explanation. 2nd ed. London: Routledge.
McGrew, T. 2003. “Confirmation, Heuristics, and Explanatory Reasoning.” British Journal for the Philosophy of Science 54:553–67.
Myrvold, W. C. 1996. “Bayesianism and Diverse Evidence: A Reply to Andrew Wayne.” Philosophy of Science 63:661–65.
Myrvold, W. C. 2003. “A Bayesian Account of the Virtue of Unification.” Philosophy of Science 70:399–423.
Myrvold, W. C. 2011. “Epistemic Values and the Value of Learning.” Synthese 187:547–68.
Perrin, J. 1913. Les Atomes. Paris: Alcan.
Perrin, J. 1916. Atoms. Trans. D. L. Hammick. New York: Van Nostrand.
Popper, K. R. 1954. “Degree of Confirmation.” British Journal for the Philosophy of Science 5:143–49.
Popper, K. R. 1959. The Logic of Scientific Discovery. New York: Basic.
Ptolemy. 1984. Ptolemy’s Almagest. Trans. G. J. Toomer. London: Duckworth.
Reichenbach, H. 1956. The Direction of Time. Berkeley: University of California Press.
Salmon, W. C. 2001. “Reflections of a Bashful Bayesian: A Reply to Peter Lipton.” In Explanation: Theoretical Approaches and Applications, ed. G. Hon and S. S. Rackover, 121–35. Dordrecht: Kluwer.
Schlosshauer, M., and Wheeler, G. 2011. “Focused Correlation, Confirmation, and the Jigsaw Puzzle of Variable Evidence.” Philosophy of Science 78:376–92.
Schupbach, J. N. 2005. “On a Bayesian Analysis of the Virtue of Unification.” Philosophy of Science 72:594–607.
Schupbach, J. N., and Sprenger, J. 2011. “The Logic of Explanatory Power.” Philosophy of Science 78:105–27.
Shogenji, T. 1999. “Is Coherence Truth Conducive?” Analysis 59:338–45.
van Fraassen, B. 1989. Laws and Symmetry. Oxford: Oxford University Press.
Wayne, A. 1995. “Bayesianism and Diverse Evidence.” Philosophy of Science 62:111–21.
Wheeler, G. 2009. “Focused Correlation and Confirmation.” British Journal for the Philosophy of Science 60:79–100.
Wheeler, G., and Scheines, R. 2013. “Coherence and Confirmation through Causation.” Mind 122:135–70.
Yule, G. U. 1911. An Introduction to the Theory of Statistics. London: Griffin.