1. Introduction
A central problem of formal epistemology and philosophy of science is explicating what it means for a piece of evidence E to confirm a hypothesis H (see, e.g., Crupi 2015). Prominent accounts of confirmation define the degree to which E confirms or supports H in terms of the probabilistic relations between E and H. Accordingly, given a probability distribution p, several different measures of confirmation (or inductive support) C(H, E) can be defined: two well-known examples are the difference measure D(H, E) = p(H∣E) − p(H) proposed by Carnap (1950/1962) and the ratio measure R(H, E) = p(H∣E)/p(H) introduced by Keynes (1921). It is customary to say that C(H, E) is a Bayesian measure (of confirmation) when the probabilities occurring in its definition are epistemic probabilities, that is, express the degrees of belief of some (ideal) inquirer in the relevant propositions. For instance, D and R are usually construed as Bayesian measures, with the initial probability p(H) expressing the degree of belief in the truth of H of an inquirer who lacks any relevant (empirical) evidence, and the final probability p(H∣E) the inquirer’s degree of belief in H once evidence E is taken into account.
In this article, we focus on a specific class of Bayesian measures of confirmation, that is, so-called incremental measures. Such measures, like D and R above, are supposed to express how much learning E increases the probability of H. In other words, if C is an incremental measure, then C(H, E) expresses the probability increment occurring in the shift from p(H) to p(H∣E). Over the last 15 years, confirmation theorists have been exploring a plethora of incremental measures, grounded in significantly different intuitions concerning confirmation. The problem of assessing the relative adequacy of such measures has recently attracted increasing attention among philosophers of science (e.g., Festa 1999, 2012; Fitelson 1999, 2007; Kuipers 2000; Zalabardo 2009; Crupi, Festa, and Buttasi 2010; Iranzo and Martínez de Lejarza 2012; Glass and McCartney 2014; Roche 2014, 2015a; Roche and Shogenji 2014; Crupi 2015). An important motivating issue for this line of inquiry is what Fitelson (1999) has called the problem of measure sensitivity: roughly, the fact that the soundness of many philosophical and methodological arguments surrounding the notion of confirmation crucially depends on the specific measure adopted to explicate this notion (see also Festa 1999; Brössel 2013). The assumption underlying, more or less explicitly, the ongoing discussion is that an adequate incremental measure should exhibit, so to speak, an appropriate “grammar” (as Crupi et al. [2010] put it), that is, a set of properties that formally express plausible intuitions about confirmation.
The grammar of incremental confirmation is the main topic of our article, too. In particular, we will consider some “likelihood principles” recently proposed in the literature as essential properties of any adequate incremental measure. A likelihood principle requires certain relationships to hold between C(H, E) and the likelihoods of H and ¬H with respect to E, that is, between C(H, E) and the probabilities p(E∣H) and p(E∣¬H) or, at least, between C(H, E) and one of those values. Such principles play a key role in current discussions of Bayesian confirmation, with “likelihoodist” theorists arguing that they are necessary, or even sufficient, to adequately define incremental measures. In this article, we aim to contribute to the ongoing discussion in this area by presenting some new results concerning the different likelihood principles currently on the market. More precisely, our main results are as follows: First, we offer a systematic survey of different likelihood principles (some of them new), characterize their content, and study the logical relations between them, which have so far often remained obscure or unnoticed in the literature. Second, we show that none of the likelihood principles proposed so far is satisfied by all incremental measures; in particular, the so-called Weak Law of Likelihood, which plays a prominent role in recent analyses of Bayesian confirmation, is violated by some incremental measures that are grounded in appealing core intuitions. Third, we argue that the above-mentioned likelihood principles have to be supplemented with new ones, including some prima facie very strange principles, which we call antilikelihood principles; in fact, quite surprisingly, some intuitively appealing incremental measures satisfy at least one such antilikelihood principle.
The upshot of our discussion is that some purportedly basic properties of confirmation are not so fundamental as they are widely believed to be, since some incremental measures violate them. This implies that the grammar of Bayesian confirmation is richer, and the notion of incremental measure more flexible, than previously thought.
Our discussion will proceed as follows. In section 2 we outline the grammar of Bayesian confirmation, stating the basic properties defining incremental measures. Section 3 introduces some (old and new) likelihood principles and explores their logical and conceptual relations. In section 4, we present a new measure of confirmation, which violates some widely accepted likelihoodist intuitions and satisfies a surprising antilikelihood principle. Section 5 concludes the article by discussing some prospective implications of our results for ongoing work on the grammar of Bayesian confirmation. The proofs of all theorems appear in the appendix.
2. Incremental Measures of Bayesian Confirmation
Here we summarize some more or less well-known properties of incremental measures of Bayesian confirmation. First, we introduce the qualitative notion of confirmation (sec. 2.1); then, we formally define the concept of incremental measure (sec. 2.2). We also introduce a distinction between “universal” properties of confirmation (characterizing all incremental measures) and “structural” properties (isolating specific classes of such measures), which will play a central role in the rest of the article.
Two points are worth noting before we start. First, in the literature the term “incremental confirmation” is usually employed in a rather loose way, so that different scholars end up attaching quite different meanings to it. Our definition of incremental confirmation will amount to a rigorous explication of a widely shared use of this term: roughly, an incremental measure is a function of p(H∣E) and p(H) that increases when p(H∣E) increases. Absent a fully general consensus on the meaning of “incremental confirmation,” however, our definition will exclude some confirmation measures that are occasionally labeled “incremental” in the literature.
Second, in order to ensure mathematical definiteness, we will focus on hypotheses H and pieces of evidence E with nonextreme probability values, that is, such that 0 < p(H), p(E) < 1. The only exception will be the occasional reference to tautological (logically true) evidence; as far as notation is concerned, the tautology will be denoted, as usual, by ⊤.
2.1. Qualitative Confirmation
The starting point of the search for appropriate measures of confirmation is the notion of qualitative confirmation. The guiding intuition is that the specific confirmatory relation occurring between H and E depends on whether and how the initial probability of H is changed by learning E. This qualitative notion of confirmation is usually defined as follows:
Qualitative Confirmation. For any H and E,
E confirms H if and only if p(H∣E) > p(H) (confirmation in narrow sense);
E is neutral for H if and only if p(H∣E) = p(H) (neutrality);
E disconfirms H if and only if p(H∣E) < p(H) (disconfirmation).
The above threefold classification of the confirmatory relations between H and E should be construed, to use the statistical jargon, as a qualitative ordinal variable. This means that the “intensity” of confirmation decreases in the shift from confirmation (in the narrow sense) to neutrality and from neutrality to disconfirmation. In other words, Qualitative Confirmation expresses an ordinal ranking (i.e., not a mere trichotomy) of different confirmation relations, which can then provide a foundation for an appropriate quantitative notion of confirmation. Many confirmation theorists would presumably agree that any such notion should be compatible with Qualitative Confirmation in the sense that it should extend the partial ordering just defined. We propose to formalize this requirement in terms of the following principle:
Compatibility (with qualitative confirmation). For any H1, H2 and E1, E2:
i) If E1 confirms H1 and E2 is neutral for H2, then C(H1, E1) > C(H2, E2);
ii) If E1 is neutral for H1 and E2 disconfirms H2, then C(H1, E1) > C(H2, E2);
iii) If E1 is neutral for H1 and E2 is neutral for H2, then C(H1, E1) = C(H2, E2).
As we will see in a moment, all incremental measures are indeed compatible with Qualitative Confirmation in the sense just specified; however, not all measures satisfying Compatibility need to be incremental. Accordingly, Compatibility is too weak a requirement to isolate incremental measures; other principles are needed, which will now be discussed.
2.2. The Grammar of Incremental Measures of Confirmation
An incremental measure of confirmation C(H, E) may be informally described as a function of p(H∣E) and p(H) that increases when p(H∣E) increases. The grammar of incremental measures is currently being thoroughly investigated by various authors (see esp. Festa 1999, 2012; Fitelson 1999, 2007; Crupi et al. 2010; Brössel 2013; Hájek and Joyce 2013; Roche 2014; Roche and Shogenji 2014; Crupi 2015). Following Crupi (2015), we formally characterize incremental measures by means of the following basic principles, or axioms:
P1. Formality. There exists a function f such that, for any H and E, C(H, E) = f[p(H∣E), p(H), p(E)].
P2. Tautological evidence. For any H1 and H2, C(H1, ⊤) = C(H2, ⊤).
P3. Final probability. For any H, E1, E2, if p(H∣E1) ⋛ p(H∣E2), then C(H, E1) ⋛ C(H, E2).
Some remarks about the intuitive meaning of P1–P3 will be useful. P1 requires that C(H, E) depends only on p(H∣E), p(H), and p(E). Since this triple of probabilities entirely determines the probability distribution p over the algebra generated by H and E, P1 amounts to requiring that C(H, E) depends only on this distribution. P2 requires that all hypotheses receive the same degree of confirmation by the tautological evidence ⊤. Finally, P3 requires that the confirmation of a hypothesis is a strictly increasing function of its final probability.
By definition, all incremental measures satisfy the basic principles P1–P3 and all the other principles that can be logically derived from them (some examples follow in a moment). Such (basic and derived) principles isolate the properties characterizing all incremental measures and are thus labeled “universal” principles. Yet, certain incremental measures may exhibit further interesting properties, which are specified by corresponding “structural” principles. Such principles are logically independent of the basic principles (i.e., they are neither entailed by nor incompatible with P1–P3) and characterize specific (classes of) incremental measures. Two examples of universal principles and one of a structural principle are given below.
Our first example of a universal principle is Compatibility. Indeed, one can check that the basic principles P1–P3 jointly entail Compatibility, that is, that any incremental measure is compatible with Qualitative Confirmation. Our second example is the following consequence of the basic principles, actually of P3 alone (cf. Crupi et al. 2010, theorem 3):
(IFPD) Initial and Final Probability Dependence. There exists a function g such that, for any H and E, C(H, E) = g[p(H∣E), p(H)].
The above principle requires that any incremental measure is uniquely determined by the initial and final probability of H, that is, that C(H, E) depends only on p(H∣E) and p(H). Note that IFPD is stronger than the basic principle P1 since the latter allows for the possibility that C(H, E) depends not only on p(H∣E) and p(H) but also on p(E).
It is worth noting that IFPD does not say anything on how C(H, E) should depend on the initial probability p(H). The reader familiar with traditional incremental measures like D and R might suspect that the relations between C(H, E) and p(H) should be ruled by the following condition:
(IP) Initial probability. For any H1, H2, and E, if p(H1∣E) = p(H2∣E) and p(H1) ⋛ p(H2), then C(H1, E) ⋚ C(H2, E).
Condition IP, which is, so to speak, the counterpart of principle P3 for final probability, requires that the confirmation of a hypothesis is a strictly decreasing function of its initial probability. As is easy to check, most traditional incremental measures satisfy IP. For this reason, one might feel that IP is a universal principle of incremental confirmation. That this is not the case is shown by the fact that there are incremental measures that satisfy all the basic principles P1–P3 but still violate IP. One example is the measure p(H)[p(H∣E) − p(H)], introduced by Crupi et al. (2010, 89, theorem 1). Thus, IP provides our first example of a structural principle of Bayesian confirmation.
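The failure of IP for this measure can be illustrated with a small numerical sketch. The Python snippet below is only an illustration: the function names and the probability values are hypothetical choices, not taken from the article. It keeps the final probability fixed and compares two initial probabilities under D and under the measure p(H)[p(H∣E) − p(H)].

```python
def difference(prior, posterior):
    """Difference measure D(H, E) = p(H|E) - p(H)."""
    return posterior - prior


def weighted_difference(prior, posterior):
    """The measure p(H)[p(H|E) - p(H)] mentioned in the text."""
    return prior * (posterior - prior)


# Illustrative values (assumed): two hypotheses with the same final
# probability given E but different initial probabilities.
posterior = 0.9
low_prior, high_prior = 0.2, 0.4

d_low = difference(low_prior, posterior)
d_high = difference(high_prior, posterior)
w_low = weighted_difference(low_prior, posterior)
w_high = weighted_difference(high_prior, posterior)

# IP asks for more confirmation of the initially less probable hypothesis.
print("D respects IP here:", d_low > d_high)
print("weighted measure respects IP here:", w_low > w_high)
```

With these values D ranks the initially less probable hypothesis higher, as IP requires, while the weighted measure ranks the two hypotheses the other way round.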
3. Strong and Weak Likelihood Principles for Bayesian Confirmation
As noted in the introduction, likelihood principles require that certain relationships hold between C(H, E) and (one or both of) the likelihoods p(E∣H) and p(E∣¬H). In what follows, we consider one universal and five structural principles of this kind (throughout this section, unless stated otherwise, “principle(s)” will always mean “likelihood principle(s)”). To begin with, it is worth noting how the basic principle P1 can be reformulated, so to speak, in likelihoodist terms. To recall, P1 states that C(H, E) only depends on the probability distribution over the algebra generated by H and E. Such a distribution can be determined by different triples of probability values, [p(H∣E), p(H), p(E)] being only one of them. Any such triple can be used in P1 to make C(H, E) dependent on the relevant probabilities. For our purposes, the following logically equivalent reformulation of P1 will be useful:
(LF) Likelihood form. There exists a function k such that, for any H and E, C(H, E) = k[p(E∣H), p(E∣¬H), p(E)].
LF says that C(H, E) can be expressed in “evidential terms,” that is, as a function of the initial probability p(E) of evidence E and of its conditional probabilities p(E∣H) and p(E∣¬H). The label for this principle is justified by noting that p(E) can be construed as the likelihood p(E∣⊤) of a tautological hypothesis. Accordingly, LF says that C(H, E) depends only on the three likelihoods p(E∣H), p(E∣¬H), and p(E).
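The passage between the two kinds of triples can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original argument: the function names are hypothetical, and the conversion assumes p(E∣H) ≠ p(E∣¬H), since otherwise the likelihoods and p(E) do not fix p(H). It relies only on the law of total probability and on Bayes's theorem.

```python
import math


def from_likelihoods(lik_h, lik_not_h, p_e):
    """Map (p(E|H), p(E|not-H), p(E)) to (p(H), p(H|E)).

    From p(E) = p(H) p(E|H) + [1 - p(H)] p(E|not-H) we get
    p(H) = [p(E) - p(E|not-H)] / [p(E|H) - p(E|not-H)];
    Bayes's theorem then gives p(H|E) = p(H) p(E|H) / p(E).
    """
    if math.isclose(lik_h, lik_not_h):
        raise ValueError("p(H) is not determined when the two likelihoods coincide")
    prior = (p_e - lik_not_h) / (lik_h - lik_not_h)
    posterior = prior * lik_h / p_e
    return prior, posterior


def difference_in_likelihood_form(lik_h, lik_not_h, p_e):
    """The difference measure D written as a function of the three likelihoods."""
    prior, posterior = from_likelihoods(lik_h, lik_not_h, p_e)
    return posterior - prior


# Illustrative values (assumed): p(E|H) = 0.8, p(E|not-H) = 0.2, p(E) = 0.5.
prior, posterior = from_likelihoods(0.8, 0.2, 0.5)
d_value = difference_in_likelihood_form(0.8, 0.2, 0.5)
print(prior, posterior, d_value)
```

Any measure defined on p(H) and p(H∣E) can be wrapped in the same way, which is what makes the comparison of measures in terms of likelihoods possible.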
Given the reformulation of P1 as LF, a new likelihood principle can be immediately stated as follows:
(EL) Equal Likelihoods. For any E, H1, H2, if p(E∣H1) = p(E∣H2) and p(E∣¬H1) = p(E∣¬H2), then C(H1, E) = C(H2, E).
Not surprisingly, given LF one can immediately check that:
Theorem 1. EL is a universal principle.
Admittedly, EL is an extremely weak principle, indeed a nearly trivial one. Still, we submit that it is the only universal likelihood principle of incremental confirmation. Our conjecture is supported by the fact that, as we will see in a moment, all likelihood principles considered so far in the literature are indeed structural principles.
Confirmation theorists have been discussing several likelihood principles apart from LF and EL (which, by the way, have both often remained unnoticed or implicit in the literature). Below we consider five such principles and illustrate their intuitive meaning. Afterward, we point out the logical relations occurring among them. Finally, we show that each of these principles is, indeed, a structural principle (i.e., that it is satisfied by some incremental measures and violated by others).
The best known likelihood principle is probably the so-called Law of Likelihood (LL), which is stated as follows:
(LL) If p(E∣H1) ⋛ p(E∣H2), then C(H1, E) ⋛ C(H2, E).
This “law” expresses the intuition that E confirms H1 more than H2 just when H1 is better than H2 in predicting E (i.e., when E is more probable given H1 than given H2). Given LF, LL amounts to saying that C(H, E) does not depend on p(E∣¬H) and is a strictly increasing function of p(E∣H).
The above principle naturally suggests the following one, which, to the best of our knowledge, has never been considered in the literature:
(N) If p(E∣¬H1) ⋚ p(E∣¬H2), then C(H1, E) ⋛ C(H2, E).
The above principle can be seen as the “negative” counterpart of LL (hence the label). In fact, it is formally similar to LL in that it makes confirmation dependent only on one likelihood value; however, this is the likelihood of the negation of the relevant hypotheses, not of the hypotheses themselves as in the case of LL. More precisely, N expresses the intuition that E confirms H1 more than H2 just when ¬H1 is worse than ¬H2 in predicting E, that is, when E is less probable given ¬H1 than given ¬H2. Given LF, N amounts to saying that C(H, E) does not depend on p(E∣H) and that it is a strictly decreasing function of p(E∣¬H).
As anticipated in section 1, an important principle of Bayesian confirmation is the so-called Weak Law of Likelihood (WLL). Among the several versions of WLL introduced in the literature, we focus on the following version:
(WLL) If p(E∣H1) > p(E∣H2) and p(E∣¬H1) < p(E∣¬H2), then C(H1, E) > C(H2, E).
The antecedent of WLL says that H1 predicts E “uniformly better” than H2 (this terminology is borrowed from Joyce [2008], sec. 3). This means that E is both more probable assuming that H1 is true than assuming that H2 is true and less probable assuming that H1 is false than assuming that H2 is false. Thus, WLL says that if H1 predicts E uniformly better than H2, then E confirms H1 more than H2. Given LF, WLL means that C(H, E) increases when the likelihood of H increases and the likelihood of ¬H decreases. Two structural likelihood principles closely related to WLL are as follows:
(WLL-L) If p(E∣H1) > p(E∣H2) and p(E∣¬H1) = p(E∣¬H2), then C(H1, E) > C(H2, E);
(WLL-N) If p(E∣H1) = p(E∣H2) and p(E∣¬H1) < p(E∣¬H2), then C(H1, E) > C(H2, E).
WLL-L expresses the intuition that if H1 is better than H2 in predicting E and ¬H1 is as good as ¬H2 in predicting E, then E confirms H1 more than H2. Analogously, WLL-N says that if H1 is as good as H2 in predicting E and ¬H1 is worse than ¬H2 in predicting E, then E confirms H1 more than H2. Given LF, WLL-L says that, for fixed values of p(E∣¬H), C(H, E) is a strictly increasing function of p(E∣H), while WLL-N states that, for fixed values of p(E∣H), C(H, E) is a strictly decreasing function of p(E∣¬H).
The five principles just introduced illustrate different ways in which C(H, E) may depend on one or both of the likelihoods. It is easy to check that such principles are genuinely different from each other, in the sense that no one of them is logically equivalent to another. However, they are by no means logically independent. The following theorem clarifies the relevant logical relations among the five principles (see also fig. 1, where these relations are graphically presented):
Theorem 2. The following logical relations hold:
i) LL entails WLL.
ii) N entails WLL.
iii) WLL-L & WLL-N entails WLL.
iv) LL and N are incompatible.
v) LL entails WLL-L.
vi) N entails WLL-N.
Theorem 2 maps the logical space of a fairly representative family of likelihood principles. As anticipated, none of them is universal: each is structural, meaning that it is satisfied by some incremental measures and violated by others. To prove this, one only needs to consider two further measures apart from the difference and the ratio measures, D and R, introduced in section 1. The first is the “odds counterpart” of D, where the initial and final odds of H are defined as usual as o(H) = p(H)/p(¬H) and o(H∣E) = p(H∣E)/p(¬H∣E):
Odds difference. OD(H, E) = o(H∣E) − o(H).
The second, well-known measure was originally introduced by Gaifman (1979, 120):
Gaifman. G(H, E) = p(¬H)/p(¬H∣E).
Then one can check that all likelihood principles LL, N, WLL, WLL-L, and WLL-N are structural:
Theorem 3. Each of the likelihood principles LL, N, WLL, WLL-L, and WLL-N is satisfied by at least one of the incremental measures R, OD, and G and is violated by at least one of them.
Theorem 3 is not entirely surprising, since, for instance, LL is well known to be violated by many incremental measures, like D. Still, some other results above are probably much less expected. Let us consider, for instance, the fact that WLL is a structural principle (similar considerations apply to WLL-L and WLL-N). As many scholars have remarked, all traditional measures of confirmation in the literature do satisfy WLL (e.g., Fitelson 2007; Joyce 2008; Crupi et al. 2010; Roche and Shogenji 2014). Indeed, WLL appears to be such an undemanding principle that it has often been regarded as a very minimal adequacy condition for Bayesian confirmation. In view of this, the fact that WLL is not a universal principle has significant implications for the grammar of Bayesian confirmation; in the next section, we explore some of them.
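A numerical spot check may help to see how a measure can fail WLL. The sketch below is an illustration under stated assumptions, not a proof: it takes R(H, E) = p(H∣E)/p(H), G(H, E) = p(¬H)/p(¬H∣E), and OD(H, E) = o(H∣E) − o(H) as the definitions of the three measures, and the likelihood values are hypothetical, chosen so that H1 predicts E uniformly better than H2.

```python
def prior_posterior(lik_h, lik_not_h, p_e):
    """Recover (p(H), p(H|E)) from p(E|H), p(E|not-H), p(E)."""
    prior = (p_e - lik_not_h) / (lik_h - lik_not_h)
    return prior, prior * lik_h / p_e


def odds(p):
    return p / (1.0 - p)


def ratio(prior, post):
    return post / prior


def gaifman(prior, post):
    return (1.0 - prior) / (1.0 - post)


def odds_difference(prior, post):
    return odds(post) - odds(prior)


P_E = 0.5
# Assumed values: p(E|H1) > p(E|H2) and p(E|not-H1) < p(E|not-H2).
h1 = prior_posterior(0.90, 0.39, P_E)
h2 = prior_posterior(0.60, 0.40, P_E)

measures = {"R": ratio, "G": gaifman, "OD": odds_difference}
results = {name: (m(*h1), m(*h2)) for name, m in measures.items()}

for name, (c1, c2) in results.items():
    print(f"{name}: C(H1,E)={c1:.4f}  C(H2,E)={c2:.4f}  H1 ranked higher: {c1 > c2}")
```

In this instance R and G rank H1 above H2, in line with WLL, whereas OD ranks H2 above H1. A single instance of this kind shows a violation but can never establish that a measure satisfies a principle in general.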

Figure 1. Logical relations among the five likelihood principles discussed in this article. Arrows represent logical entailment; dashed lines, logical incompatibility (⊥); dotted lines, logical independence; and solid lines, conjunction (&).
4. Antilikelihood Principles for Bayesian Confirmation
In the previous section we argued that no likelihood principle—with the exception of the very weak principle EL—holds universally, that is, for all incremental measures as defined in section 2.2. Up to this point, however, the reader may still think that this finding is not dramatically interesting for the grammar of Bayesian confirmation. After all, the fact that WLL is not a universal principle may depend on the specific properties of “strange” measures like OD. Below, we show that, in contrast with this idea, there are intuitively well-motivated incremental measures that not only violate WLL (and other traditional likelihood principles) but indeed satisfy some “antilikelihood principles” that express intuitions diametrically opposed to the ones underlying WLL.
4.1. Confirmation as Reduction of Improbability
At the beginning of this article, we presented the central intuition concerning Bayesian confirmation by saying that evidence E confirms hypothesis H when E increases the initial probability of H. Such intuition, however, may be also expressed as follows: E confirms H when E decreases the initial improbability of H. In this way, confirmation is construed not as increment of probability but as reduction of improbability. Accordingly, there are two ways of thinking about an incremental measure C(H, E). The first is the familiar one for which C(H, E) is a function of p(H) and p(H∣E) that increases when p(H∣E) increases. The second is as follows: given suitable definitions of the initial and final improbability of H—denoted by imp(H) and imp(H∣E), respectively—an incremental measure C(H, E) is a function of imp(H) and imp(H∣E) that increases when imp(H∣E) decreases.
The notion of confirmation as reduction of improbability may appear as just a logically equivalent, but more cumbersome, rendition of the traditional one. Still, we submit, there are at least two reasons to consider the reduction-of-improbability intuition as a useful complement of the increment-of-probability intuition. First, the reduction-of-improbability intuition provides a new interpretation of some old incremental measures, where the new interpretation appears, in some cases, more appealing than that provided by the increment-of-probability intuition. Second, the reduction-of-improbability intuition is heuristically fruitful, in the sense that it immediately suggests new incremental measures that would hardly be thought of on the basis of the increment-of-probability intuition.
In what follows, we use the term “improbability” as a neutral one, to denote any measure that decreases when p(H) increases. To mention but three examples, the initial improbability of H may be defined as follows:
imp1(H) = 1 − p(H);
imp2(H) = 1/p(H);
imp3(H) = 1/o(H).
The final improbability of H, imp(H∣E), is defined in the same way, simply by substituting p(H∣E) and o(H∣E) for p(H) and o(H), respectively, in the definitions above.
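On the basis of such definitions, nearly all traditional incremental measures of confirmation can be taken as benchmarks to define new measures in terms of reduction of improbability. For instance, D and R suggest the following triples of improbability difference measures ID(H, E) and of improbability ratio measures IR(H, E):
ID1(H, E) = imp1(H) − imp1(H∣E);
ID2(H, E) = imp2(H) − imp2(H∣E);
ID3(H, E) = imp3(H) − imp3(H∣E);
IR1(H, E) = imp1(H)/imp1(H∣E);
IR2(H, E) = imp2(H)/imp2(H∣E);
IR3(H, E) = imp3(H)/imp3(H∣E).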
It is not difficult to check that three of the above measures (i.e., ID1, IR1, and IR2) are indeed identical to measures that we already know from the foregoing sections. Moreover, a fourth measure, IR3, turns out to be the same as the very well-known odds ratio measure of confirmation OR(H, E) = o(H∣E)/o(H) (famously advocated by Good 1950). Summing up:
Theorem 4. For any H and E:
ID1(H, E) = D(H, E);
IR1(H, E) = G(H, E);
IR2(H, E) = R(H, E);
IR3(H, E) = OR(H, E).
Thus, the reduction-of-improbability intuition leads, on the one hand, to a new interpretation of the four traditional incremental measures D, G, R, and OR. On the other hand, it also suggests two new measures, ID2 and ID3, which, given the definitions of imp2(H) and imp3(H), can be expressed as follows:
ID2(H, E) = 1/p(H) − 1/p(H∣E);
ID3(H, E) = 1/o(H) − 1/o(H∣E).
Note that both ID2 and ID3 express confirmation as the difference between the initial and final improbability of H, where improbability is defined as the inverse of either the relevant probabilities (for ID2) or the corresponding odds (for ID3). As we will see in a moment, measures ID2 and ID3 are genuinely new incremental measures of confirmation and display some interesting properties, especially as far as likelihood principles are concerned.
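The identities stated in theorem 4 can be spot-checked numerically. The following Python sketch assumes the three improbability functions 1 − p(H), 1/p(H), and 1/o(H), and builds the difference and ratio measures from them; the dictionary and function names, as well as the probability values, are hypothetical and serve only the illustration.

```python
import math


def odds(p):
    return p / (1.0 - p)


# Assumed improbability functions, indexed as imp1, imp2, imp3.
IMPROBABILITY = {
    1: lambda p: 1.0 - p,
    2: lambda p: 1.0 / p,
    3: lambda p: 1.0 / odds(p),
}


def imp_difference(i, prior, post):
    """ID_i(H, E): initial minus final improbability."""
    imp = IMPROBABILITY[i]
    return imp(prior) - imp(post)


def imp_ratio(i, prior, post):
    """IR_i(H, E): initial over final improbability."""
    imp = IMPROBABILITY[i]
    return imp(prior) / imp(post)


prior, post = 0.3, 0.6  # illustrative values for p(H) and p(H|E)

checks = {
    "ID1 = D": math.isclose(imp_difference(1, prior, post), post - prior),
    "IR1 = G": math.isclose(imp_ratio(1, prior, post), (1 - prior) / (1 - post)),
    "IR2 = R": math.isclose(imp_ratio(2, prior, post), post / prior),
    "IR3 = OR": math.isclose(imp_ratio(3, prior, post), odds(post) / odds(prior)),
}
print(checks)
```

The check covers a single pair of probabilities; the general identities follow by simple algebra from the definitions.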
4.2. The Normalized Difference Measure
To begin with, it is easy to check that the two measures ID2 and ID3 introduced in the previous section are indeed incremental measures (i.e., satisfy P1–P3 from sec. 2). Also, it is not difficult to see that they can be reformulated as follows:
ID2(H, E) = [p(H∣E) − p(H)]/[p(H∣E)p(H)];
ID3(H, E) = [o(H∣E) − o(H)]/[o(H∣E)o(H)].
The above reformulation shows how both ID2 and ID3 can be construed, more traditionally, as increment-of-probability measures. In fact, ID2 is the probability difference D normalized by the factor 1/[p(H∣E)p(H)], while ID3 is the normalization of the odds difference OD by the factor 1/[o(H∣E)o(H)].
What is perhaps less obvious to note is that:
Theorem 5. For any H and E, ID2(H, E) = ID3(H, E).
Thus, not only do ID2 and ID3 have the same form: they are indeed the same measure, which is invariant if expressed either as the normalized difference of the relevant probabilities or as the normalized difference of the corresponding odds. For this reason, we refer to this new measure as the “normalized (probability or odds) difference” measure, in symbols ND(H, E).
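The coincidence of the two measures rests on the identity 1/o(H) = 1/p(H) − 1, so that the constant cancels when initial and final improbabilities are subtracted. A minimal numerical sketch of this point is given below; the function names and the test values are hypothetical, and ID2 and ID3 are taken to be the differences of inverse probabilities and of inverse odds, respectively.

```python
import math


def odds(p):
    return p / (1.0 - p)


def id2(prior, post):
    """Difference of inverse probabilities: 1/p(H) - 1/p(H|E)."""
    return 1.0 / prior - 1.0 / post


def id3(prior, post):
    """Difference of inverse odds: 1/o(H) - 1/o(H|E)."""
    return 1.0 / odds(prior) - 1.0 / odds(post)


def normalized_probability_difference(prior, post):
    return (post - prior) / (post * prior)


def normalized_odds_difference(prior, post):
    o0, o1 = odds(prior), odds(post)
    return (o1 - o0) / (o1 * o0)


# Illustrative (prior, posterior) pairs: confirmation, disconfirmation, neutrality.
pairs = [(0.1, 0.5), (0.3, 0.6), (0.7, 0.2), (0.5, 0.5)]

all_equal = all(
    math.isclose(id2(a, b), id3(a, b), abs_tol=1e-12)
    and math.isclose(id2(a, b), normalized_probability_difference(a, b), abs_tol=1e-12)
    and math.isclose(id3(a, b), normalized_odds_difference(a, b), abs_tol=1e-12)
    for a, b in pairs
)
print("all four expressions agree:", all_equal)
```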
The ND measure has a number of interesting properties. For instance, consider the following, prima facie highly counterintuitive, principle:
(RWLL-N) If p(E∣H1) = p(E∣H2) and p(E∣¬H1) > p(E∣¬H2), then C(H1, E) > C(H2, E).
Note that this principle is, so to speak, the reversal of principle WLL-N from section 3, in the sense that it is obtained from WLL-N by reversing the inequality sign in the second part of the antecedent (hence the label, which stands for “reversed WLL-N”).
Accordingly, RWLL-N says that if H1 is as good as H2 in predicting E and ¬H1 is better than ¬H2 in predicting E, then E confirms H1 more than H2. Given LF, RWLL-N amounts to saying that, for fixed values of p(E∣H), C(H, E) is an increasing function of p(E∣¬H). For this reason, RWLL-N can be called an antilikelihood principle of Bayesian confirmation.
Of course, RWLL-N is a structural principle, since it is violated by many measures of confirmation, including all the traditional ones. That this is the case can be seen by observing that:
Theorem 6. RWLL-N is incompatible with WLL-N.
Thus, all measures satisfying WLL-N will violate RWLL-N. Still, one can prove that:
Theorem 7. ND satisfies RWLL-N.
It immediately follows, due to the above illustrated incompatibility between RWLL-N and WLL-N, that:
Theorem 8. ND violates WLL-N.
Moreover, one can prove that ND also violates the Weak Law of Likelihood:
Theorem 9. ND violates WLL.
To sum up, we have shown that there is an intuitively motivated measure of confirmation as reduction of improbability, namely, ND, that violates WLL (and hence LL and N) and satisfies a strongly antilikelihood principle, namely, RWLL-N.
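These two results can be illustrated with concrete numbers. The sketch below is not a substitute for the proofs in the appendix: it assumes ND(H, E) = 1/p(H) − 1/p(H∣E), recovers p(H) and p(H∣E) from the three likelihoods, and uses hypothetical likelihood values to exhibit one instance in which WLL fails for ND and one instance that conforms to RWLL-N.

```python
def nd_from_likelihoods(lik_h, lik_not_h, p_e):
    """ND(H, E) = 1/p(H) - 1/p(H|E), computed from p(E|H), p(E|not-H), p(E)."""
    prior = (p_e - lik_not_h) / (lik_h - lik_not_h)
    posterior = prior * lik_h / p_e
    return 1.0 / prior - 1.0 / posterior


P_E = 0.5

# WLL antecedent: p(E|H1) > p(E|H2) and p(E|not-H1) < p(E|not-H2).
nd_h1 = nd_from_likelihoods(0.61, 0.10, P_E)
nd_h2 = nd_from_likelihoods(0.60, 0.40, P_E)
wll_violated = nd_h1 <= nd_h2

# RWLL-N antecedent: equal p(E|H), different p(E|not-H).
nd_high_neg_lik = nd_from_likelihoods(0.60, 0.40, P_E)
nd_low_neg_lik = nd_from_likelihoods(0.60, 0.10, P_E)
rwll_n_instance = nd_high_neg_lik > nd_low_neg_lik

print(f"ND(H1,E)={nd_h1:.4f}  ND(H2,E)={nd_h2:.4f}  WLL violated: {wll_violated}")
print(f"higher p(E|not-H) gives higher ND: {rwll_n_instance}")
```

In the first comparison H1 predicts E uniformly better than H2, yet ND assigns it the lower value; the sharp drop in p(E∣¬H1) outweighs the small gain in p(E∣H1).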
5. Concluding Remarks
We have presented and discussed some more or less well-known likelihood principles for Bayesian confirmation. Given a very plausible characterization of incremental measures of Bayesian confirmation, all those principles (with the exception of EL) turned out to be independent of the basic conditions defining such measures. This implies that, as we have proven, each of those principles is satisfied by some incremental measures and violated by others. In turn, this means that none of the likelihood principles considered here can be taken as isolating a fundamental property of Bayesian confirmation. This is particularly true for the Weak Law of Likelihood, which is often regarded as a crucial ingredient of a Bayesian treatment of confirmation. To further show that this is not the case, we focused on the normalized difference measure ND, which “strongly” violates WLL and other likelihood conditions by satisfying what we called an antilikelihood principle of confirmation. Moreover, we showed that measure ND is both intuitively motivated (by construing confirmation as reduction of improbability) and formally equivalent to a normalization of such a traditional measure as D. For these reasons, it seems to us, little doubt exists that ND is a perfectly acceptable measure of confirmation.
One way to resist this conclusion would be as follows. Investigating the grammar of confirmation should aim at identifying intuitively plausible conditions on incremental measures. Now, one may argue, WLL clearly is one such condition, while RWLL-N is not. Accordingly, all measures that, like OD, fail to meet WLL should be rejected as inadequate explications of confirmation, and this is the case, a fortiori, for measures that, like ND, strongly violate WLL by meeting RWLL-N.
Here, we can just sketch two answers to this kind of objection. First, as a decades-long discussion on competing confirmation measures has shown, in this context intuitions are at least sometimes unreliable and typically insufficient as a guide to the choice of specific measures or conditions. For this reason, in the current article we aimed at a general characterization of (anti)likelihood principles, without committing ourselves from the beginning to one or more of them. That intuitively assessing the plausibility of different principles (and hence measures) may be problematic is shown by the following example. Which one is the most plausible likelihood principle among WLL, WLL-L, and WLL-N? Different answers to this question will exclude as inadequate different measures; for instance, G has to be excluded if one prefers WLL-L over WLL-N, while R has to go if the preference is reversed (see the appendix for relevant proofs). Still, given that all three conditions above are structural (nonuniversal) principles, it is far from obvious on which grounds the choice should be made. In short, it seems to us that, in light of theorem 3, the plausibility of WLL (or of any other structural principle) can be hardly taken for granted without providing some independent justification in its favor. (In this connection, it may be instructive to compare the task of justifying a universal principle like EL with that of motivating a structural one like WLL.)
Second, reasons to doubt the indisputableness of the intuitions underlying WLL and related principles are also suggested by current discussion of other adequacy requirements for confirmation measures apart from likelihood principles. As an example, let us consider the so-called Matthew principles for Bayesian confirmation studied by Festa (2012), Roche (2014), and Festa and Cevolani (2015). Two such principles read as follows:Footnote 17
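(MP) If p(H1) > p(H2) and p(E∣H1) = p(E∣H2) > p(E), then C(H1, E) > C(H2, E);
(RMP) If p(H1) > p(H2) and p(E∣H1) = p(E∣H2) > p(E), then C(H1, E) < C(H2, E).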
Both of the above principles concern two hypotheses H1 and H2 having different initial probability and such that H1 and H2 are “equally successful” in predicting E, in the sense that the likelihood of H1 and H2 on E is the same, and both hypotheses make the initial probability of E increase. Under these conditions, principle MP prescribes that an initially more probable hypothesis is more confirmed by its successes than a less probable one, while, according to RMP, an initially more probable hypothesis has to be less confirmed by its successes than a less probable one. Note that this latter condition has a distinctive Popperian flavor, leading one to prefer, in terms of confirmation, improbable (and, in this sense, highly informative) hypotheses over more probable ones.Footnote 18
Not surprisingly, one can prove that different incremental measures satisfy MP while violating RMP and vice versa (see Festa [2012, sec. 3] and Festa and Cevolani [2015] for a systematic analysis). For instance, as far as measure ND is concerned, we prove in the appendix that it can be expressed in the following equivalent form:
ND(H, E) = [p(E∣H) − p(E)] / [p(E∣H) · p(H)]
From this formulation, it is immediately clear that, for fixed values of p(E) and p(E∣H) with p(E∣H) > p(E), ND decreases as p(H) increases. It follows that, if p(E∣H1) = p(E∣H2) > p(E), an initially more probable hypothesis is less confirmed by E than an initially less probable one. In other words, ND meets RMP and violates MP (cf. Festa 2012, theorem 2.i; Roche 2014, theorem 2*.a).
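As a rough numerical illustration of the contrast between MP and RMP, the sketch below holds p(E) and p(E∣H) fixed and varies the prior p(H). It assumes the form ND(H, E) = [p(E∣H) − p(E)] / [p(E∣H) · p(H)] and the difference measure D(H, E) = p(H∣E) − p(H); the function names and the probability values are hypothetical and chosen only for illustration.

```python
def posterior(prior, lik, p_e):
    """Bayes's theorem: p(H|E) = p(E|H) * p(H) / p(E)."""
    return lik * prior / p_e

def d_measure(prior, lik, p_e):
    """Difference measure D(H, E) = p(H|E) - p(H)."""
    return posterior(prior, lik, p_e) - prior

def nd_measure(prior, lik, p_e):
    """Assumed form of ND: [p(E|H) - p(E)] / [p(E|H) * p(H)]."""
    return (lik - p_e) / (lik * prior)

# Illustrative values: E confirms H because p(E|H) = 0.8 > 0.5 = p(E).
P_E, LIK = 0.5, 0.8
# Priors must keep p(E|not-H) within [0, 1], i.e. p(H) <= p(E) / p(E|H) = 0.625.
priors = [0.1, 0.3, 0.5]

d_values = [d_measure(p, LIK, P_E) for p in priors]
nd_values = [nd_measure(p, LIK, P_E) for p in priors]

for p, d, nd in zip(priors, d_values, nd_values):
    print(f"p(H)={p:.1f}  D={d:.3f}  ND={nd:.3f}")
```

For these values D grows with the prior, in line with MP, whereas ND shrinks, in line with RMP.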
In sum, on the one hand measure ND meets both RWLL-N and RMP, and on the other hand it violates WLL, WLL-N, and MP. This fact alone suggests that intimate relations may obtain between (anti)likelihood principles and Matthew principles. Elsewhere (Festa and Cevolani 2015), we have shown that this is in fact the case and that some of the above principles are provably equivalent. Such equivalence results imply, in particular, that it is impossible to satisfy the intuitions underlying the weak laws of likelihood and, at the same time, a Popperian preference for improbable hypotheses as embodied in principle RMP.
In turn, this means that any argument in favor of RMP will immediately translate into an argument against WLL and other likelihood principles. Such an argument would thus provide an indirect justification of antilikelihood principles like RWLL-N. Indeed, a defense of RMP as a plausible principle of confirmation has been recently put forward by Festa (2012, sec. 3.3.2). If convincing, this argument would prove the intuitive plausibility of selected antilikelihood principles. In the end, it may well be that, as Roche (2015b, sec. 3.2 n. 17; italics added) puts it, “there is a sense of support [i.e., confirmation] on which any adequate support measure [i.e., confirmation measure] should fail to meet WLL.” Further research is needed to assess these preliminary results, which promise to open the way to a fruitful investigation of the deeper structure of Bayesian confirmation.
Appendix Proofs
We prove all the results presented in the article, except for those already proven elsewhere, for which we provided the relevant references in the text.
Theorem 1
EL is a universal principle. According to LF, C(H, E) can be expressed as a function of p(E∣H), p(E∣¬H), and p(E). It immediately follows that, for any E, if p(E∣H1) = p(E∣H2) and p(E∣¬H1) = p(E∣¬H2), then C(H1, E) = C(H2, E). In other words, LF entails EL, which is then a universal principle given that LF is equivalent to P1, which is a basic principle.
Theorem 2
As regards the logical relationships among the likelihood principles LL, N, WLL, WLL-L, and WLL-N introduced in section 3, we need to check the following cases.
LL implies WLL. Assume that WLL is false. This means that there are H1, H2, and E such that p(E∣H1) > p(E∣H2) and p(E∣¬H1) < p(E∣¬H2) while C(H1, E) ≤ C(H2, E), so that LL is violated. By contraposition, if LL holds, then also WLL holds.
N implies WLL. Assume that WLL is false. This means that there are H1, H2, and E such that p(E∣H1) > p(E∣H2) and p(E∣¬H1) < p(E∣¬H2) while C(H1, E) ≤ C(H2, E), so that N is violated. By contraposition, if N holds, then also WLL holds.
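The conjunction of WLL-L and WLL-N implies WLL. We prove that if WLL-L, WLL-N, and the antecedent of WLL (i.e., p(E∣H1) > p(E∣H2) and p(E∣¬H1) < p(E∣¬H2)) hold, then the consequent of WLL is also true. Let us consider a H3 such that p(E∣H3) = p(E∣H2) and p(E∣¬H3) = p(E∣¬H1). Then we have both p(E∣H1) > p(E∣H3) and p(E∣¬H1) = p(E∣¬H3), which, by WLL-L, imply C(H1, E) > C(H3, E). Moreover, we have both p(E∣H3) = p(E∣H2) and p(E∣¬H3) < p(E∣¬H2), which, by WLL-N, imply C(H3, E) > C(H2, E). Finally, from C(H1, E) > C(H3, E) and C(H3, E) > C(H2, E) we conclude, by transitivity, that C(H1, E) > C(H2, E), which is the consequent of WLL. Thus, WLL-L and WLL-N together imply WLL.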
LL and N are incompatible. We prove that the conjunction of LL and N leads to a contradiction. It is sufficient to consider two hypotheses H1 and H2 such that p(E∣H1) > p(E∣H2) and p(E∣¬H1) > p(E∣¬H2). Then it follows both that C(H1, E) > C(H2, E) from LL and that C(H1, E) < C(H2, E) from N, which is impossible.
LL implies WLL-L. Assume that WLL-L is false. This means that there are H1, H2, and E such that p(E∣H1) > p(E∣H2) and p(E∣¬H1) = p(E∣¬H2) while C(H1, E) ≤ C(H2, E), so that LL is violated. By contraposition, if LL holds, then also WLL-L holds.
N implies WLL-N. Assume that WLL-N is false. This means that there are H1, H2, and E such that p(E∣H1) = p(E∣H2) and p(E∣¬H1) < p(E∣¬H2) while C(H1, E) ≤ C(H2, E), so that N is violated. By contraposition, if N holds, then also WLL-N holds.
Theorem 3
We need to prove that all likelihood principles LL, N, WLL, WLL-L, and WLL-N are structural, that is, that each of them is satisfied by at least one incremental measure and violated by some other one. To this purpose, it is useful to consider the relevant measures expressed in “likelihood form,” that is, as functions of p(E∣H), p(E∣¬H), and p(E) only. In this way, it becomes immediately evident, for each of the five principles, which measures satisfy or violate it. The following lemma is easily proven:
Lemma 1. For any H and E:
R(H, E) = p(E∣H) / p(E)
G(H, E) = p(E) / p(E∣¬H)
Proof. As far as R is concerned: R(H, E) is defined as p(H∣E)/p(H), which, by Bayes’s theorem, is equal to p(E∣H)/p(E). For G, note that G(H, E) amounts to p(¬H)/p(¬H∣E), which, again by Bayes’s theorem, is equal to p(E)/p(E∣¬H). QED
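The likelihood forms can be spot-checked numerically. The sketch below assumes that lemma 1 reads R(H, E) = p(E∣H)/p(E) and G(H, E) = p(E)/p(E∣¬H), and that G is defined as p(¬H)/p(¬H∣E); both the assumed definition of G and the input values are illustrative rather than taken from the text.

```python
# Illustrative inputs (hypothetical values).
p_h = 0.3               # prior p(H)
p_e_given_h = 0.8       # likelihood p(E|H)
p_e_given_not_h = 0.2   # likelihood p(E|not-H)

# Law of total probability and Bayes's theorem.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e

# Probability forms: R as defined in the text, G under the assumed definition.
r_prob = p_h_given_e / p_h
g_prob = (1 - p_h) / (1 - p_h_given_e)

# Likelihood forms (assumed reading of lemma 1).
r_lik = p_e_given_h / p_e
g_lik = p_e / p_e_given_not_h

print(f"R: {r_prob:.4f} vs {r_lik:.4f}")
print(f"G: {g_prob:.4f} vs {g_lik:.4f}")
```

Written this way, R depends on H only through p(E∣H), and G only through p(E∣¬H), which is what the inspection arguments for LL, N, WLL-L, and WLL-N rely on.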
The two measures above are already sufficient to prove that four of our five principles (i.e., LL, N, WLL-L, and WLL-N) are structural. In fact, by inspection of the likelihood form of R and G above, one can easily check that:
LL is a structural principle. LL is satisfied by R and violated by G, since it is easy to see that, for any E, p(E∣H1) > p(E∣H2) implies R(H1, E) > R(H2, E) but not necessarily G(H1, E) > G(H2, E).
N is a structural principle. N is satisfied by G and violated by R, since it is easy to see that, for any E, p(E∣¬H1) < p(E∣¬H2) implies G(H1, E) > G(H2, E) but not necessarily R(H1, E) > R(H2, E).
WLL-L is a structural principle. WLL-L is satisfied by R and violated by G, since it is easy to see that, for any E, p(E∣H1) > p(E∣H2) and p(E∣¬H1) = p(E∣¬H2) implies R(H1, E) > R(H2, E) but G(H1, E) = G(H2, E).
WLL-N is a structural principle. WLL-N is satisfied by G and violated by R, since it is easy to see that, for any E, p(E∣H1) = p(E∣H2) and p(E∣¬H1) < p(E∣¬H2) implies G(H1, E) > G(H2, E) but R(H1, E) = R(H2, E).
WLL is a structural principle. As far as WLL is concerned, it is satisfied by the measures above (and by many others, including D), as is easily seen again by inspection of their likelihood forms (see also Roche and Shogenji [2014, 119–20], for relevant discussion and proofs). The following counterexample shows, however, that measure OD violates WLL. Consider the following probability distribution over statements E, H1, H2: p(H1 & H2 & E) = 0.15, p(H1 & H2 & ¬E) = 0.05, p(H1 & ¬H2 & E) = 0.10, p(H1 & ¬H2 & ¬E) = 0.02, p(¬H1 & H2 & E) = 0.15, p(¬H1 & H2 & ¬E) = 0.18, p(¬H1 & ¬H2 & E) = 0.05, and p(¬H1 & ¬H2 & ¬E) = 0.30. It can then be computed that p(E∣H1) = 0.78 > 0.57 = p(E∣H2) and p(E∣¬H1) = 0.29 < 0.32 = p(E∣¬H2) but OD(H1, E) = 0.78 < 0.87 = OD(H2, E), contrary to WLL, which would require OD(H1, E) > OD(H2, E).
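The figures in this counterexample can be recomputed directly from the joint distribution. The sketch below assumes that OD is the odds difference, OD(H, E) = o(H∣E) − o(H) with o(x) = x/(1 − x); this definition is not restated in the appendix, and the helper names are hypothetical.

```python
# Joint distribution from the counterexample, keyed by (H1, H2, E) truth values.
JOINT = {
    (True, True, True): 0.15, (True, True, False): 0.05,
    (True, False, True): 0.10, (True, False, False): 0.02,
    (False, True, True): 0.15, (False, True, False): 0.18,
    (False, False, True): 0.05, (False, False, False): 0.30,
}

def prob(event):
    """Total probability of the worlds (h1, h2, e) in which the event holds."""
    return sum(p for world, p in JOINT.items() if event(*world))

def cond(a, b):
    """Conditional probability p(a | b)."""
    return prob(lambda *w: a(*w) and b(*w)) / prob(b)

def neg(a):
    """Negation of an event."""
    return lambda *w: not a(*w)

def odds(x):
    return x / (1 - x)

def od(h, e):
    """Assumed definition of OD: odds(H|E) - odds(H)."""
    return odds(cond(h, e)) - odds(prob(h))

def H1(h1, h2, e): return h1
def H2(h1, h2, e): return h2
def E(h1, h2, e): return e

results = {
    "p(E|H1)": cond(E, H1), "p(E|H2)": cond(E, H2),
    "p(E|~H1)": cond(E, neg(H1)), "p(E|~H2)": cond(E, neg(H2)),
    "OD(H1,E)": od(H1, E), "OD(H2,E)": od(H2, E),
}
for name, value in results.items():
    print(f"{name} = {value:.2f}")
```

Under the assumed definition, the computed values agree with those reported above to two decimal places.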
This completes the proofs concerning the five likelihood principles of figure 1 from section 3. The following results concern the measures of confirmation as reduction of improbability from section 4.
Theorem 4
Given the definitions of the improbability measures imp1(H), imp2(H), and imp3(H), the following equalities are easily derived:






Theorem 5
As for measures ID2 and ID3 above, it is immediate to check that


Moreover, recalling that and
, one can easily prove that ID2 and ID3 are the same measure. In fact,
. But,
. Hence,
.
To see that ID2 (or ID3) is an incremental measure, it is sufficient to check that it satisfies P1 (by definition), P2 (since ID2(H, ⊤) = 0 for any H), and P3 (since, for any given H, ID2(H, E) increases as p(H∣E) increases). So ID2 is an incremental measure, as is ID3.
Theorem 6
To prove that WLL-N and RWLL-N are incompatible, we show that their conjunction leads to a contradiction. It is sufficient to consider two hypotheses H1 and H2 such that p(E∣H1) = p(E∣H2) and p(E∣¬H1) < p(E∣¬H2). Then it follows both that C(H1, E) > C(H2, E) from WLL-N and that C(H1, E) < C(H2, E) from RWLL-N, which is impossible.
It remains to prove that measure ND meets RWLL-N and violates WLL-N and WLL; the following lemmas will be useful in the proof.
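Lemma 2. For any H and E such that E is not neutral for H:
i) p(H) = [p(E) − p(E∣¬H)] / [p(E∣H) − p(E∣¬H)];
ii) p(H∣E) = p(E∣H) · [p(E) − p(E∣¬H)] / (p(E) · [p(E∣H) − p(E∣¬H)]).
Proof. (i) The proof starts from the “law of total probability” according to which p(E) = p(E∣H) · p(H) + p(E∣¬H) · p(¬H). It then follows that p(E) − p(E∣¬H) = p(H) · [p(E∣H) − p(E∣¬H)]. Since p(E∣H) ≠ p(E∣¬H) when E is not neutral for H, this latter equality implies p(H) = [p(E) − p(E∣¬H)] / [p(E∣H) − p(E∣¬H)]. (ii) As far as p(H∣E) is concerned, from Bayes’s theorem we have p(H∣E) = p(E∣H) · p(H) / p(E) and from the equality just obtained above we immediately derive that p(H∣E) = p(E∣H) · [p(E) − p(E∣¬H)] / (p(E) · [p(E∣H) − p(E∣¬H)]). QED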
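A quick numerical check of lemma 2 is sketched below. It assumes that the lemma expresses p(H) as [p(E) − p(E∣¬H)] / [p(E∣H) − p(E∣¬H)] and obtains p(H∣E) from it by Bayes’s theorem; the function names and the input values are hypothetical.

```python
def prior_from_likelihoods(p_e, lik_h, lik_not_h):
    """Assumed lemma 2(i): p(H) = [p(E) - p(E|~H)] / [p(E|H) - p(E|~H)]."""
    return (p_e - lik_not_h) / (lik_h - lik_not_h)

def posterior_from_likelihoods(p_e, lik_h, lik_not_h):
    """Assumed lemma 2(ii): Bayes's theorem applied to the prior above."""
    return lik_h * prior_from_likelihoods(p_e, lik_h, lik_not_h) / p_e

# Start from a known prior and two likelihoods (illustrative values) ...
true_prior, lik_h, lik_not_h = 0.3, 0.8, 0.2
# ... build p(E) by total probability and p(H|E) by Bayes's theorem ...
p_e = lik_h * true_prior + lik_not_h * (1 - true_prior)
true_posterior = lik_h * true_prior / p_e
# ... then recover both from p(E) and the two likelihoods alone.
recovered_prior = prior_from_likelihoods(p_e, lik_h, lik_not_h)
recovered_posterior = posterior_from_likelihoods(p_e, lik_h, lik_not_h)

print(f"prior: {true_prior:.4f} vs {recovered_prior:.4f}")
print(f"posterior: {true_posterior:.4f} vs {recovered_posterior:.4f}")
```

The division requires p(E∣H) ≠ p(E∣¬H), which is why the lemma is restricted to evidence that is not neutral for H.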
Lemma 2 allows us to study how, for fixed values of p(E) and p(E∣H), p(H) varies with respect to p(E∣¬H):
Lemma 3. For any H and E such that E is not neutral for H, for fixed values of p(E) and p(E∣H):
i) if E confirms H, p(H) decreases as p(E∣¬H) increases;
ii) if E disconfirms H, p(H) increases as p(E∣¬H) increases.
Proof. From lemma 2, we know that p(H) = [p(E) − p(E∣¬H)] / [p(E∣H) − p(E∣¬H)] when E is not neutral for H. We rewrite p(H) as a function of variables x = p(E), y = p(E∣H), and z = p(E∣¬H) as follows:
f(x, y, z) = (x − z) / (y − z)
We then study how f varies as z increases. To this purpose, we calculate the partial derivative of f with respect to z by applying the basic (quotient and difference) rules of the calculus:
∂f/∂z = [−(y − z) + (x − z)] / (y − z)² = (x − y) / (y − z)²
Since the denominator of the above equation is always positive, the partial derivative of p(H) has the same sign as the numerator (x − y). Recalling that x = p(E) and y = p(E∣H), we thus obtain (i) if E confirms H, then p(E∣H) > p(E) and hence the partial derivative of p(H) is negative; (ii) if E disconfirms H, then p(E∣H) < p(E) and hence the partial derivative of p(H) is positive. In sum: for fixed values of p(E) and p(E∣H), p(H) decreases as p(E∣¬H) increases, if E confirms H, and p(H) increases as p(E∣¬H) increases, if E disconfirms H. QED
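The sign argument can be cross-checked with a finite-difference approximation. The sketch below assumes f(x, y, z) = (x − z)/(y − z), with x = p(E), y = p(E∣H), and z = p(E∣¬H), and compares the quotient-rule derivative (x − y)/(y − z)² with a central difference; the sample points are hypothetical.

```python
def f(x, y, z):
    """Assumed form of p(H) in terms of x = p(E), y = p(E|H), z = p(E|~H)."""
    return (x - z) / (y - z)

def df_dz_analytic(x, y, z):
    """Quotient rule: d/dz [(x - z)/(y - z)] = (x - y) / (y - z)**2."""
    return (x - y) / (y - z) ** 2

def df_dz_numeric(x, y, z, h=1e-6):
    """Central finite difference of f with respect to z."""
    return (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)

# Illustrative points: each yields a prior f(x, y, z) strictly between 0 and 1.
confirming = (0.5, 0.8, 0.2)      # p(E|H) > p(E): E confirms H
disconfirming = (0.5, 0.2, 0.7)   # p(E|H) < p(E): E disconfirms H

for label, point in [("confirming", confirming), ("disconfirming", disconfirming)]:
    print(label, df_dz_analytic(*point), df_dz_numeric(*point))
```

The derivative is negative at the confirming point and positive at the disconfirming one, matching cases (i) and (ii).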
Coming now back to ND, we note that, for any H and E, ND(H, E) can be expressed as a function of p(E), p(E∣H), and p(H):
ND(H, E) = [p(E∣H) − p(E)] / [p(E∣H) · p(H)]
Moreover:
Lemma 4. For fixed values of p(E) and p(E∣H), ND is an increasing function of p(E∣¬H).
Proof. Given lemma 3, we can distinguish two cases. If E confirms H, p(E∣H) − p(E), and hence ND(H, E), is positive; moreover, as p(E∣¬H) increases, p(H) decreases and hence ND(H, E) increases. If E disconfirms H, p(E∣H) − p(E), and hence ND(H, E), is negative; moreover, as p(E∣¬H) increases, p(H) increases, the absolute value of ND(H, E) decreases, and hence ND(H, E) increases. In sum, for fixed values of p(E) and p(E∣H), ND(H, E) is an increasing function of p(E∣¬H). QED
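The monotonicity claimed in lemma 4 can be illustrated on a grid of values for p(E∣¬H). The sketch below assumes ND(H, E) = [p(E∣H) − p(E)] / [p(E∣H) · p(H)] with p(H) = [p(E) − p(E∣¬H)] / [p(E∣H) − p(E∣¬H)]; the grid and the fixed values are hypothetical, and a grid check is an illustration, not a proof.

```python
def nd(x, y, z):
    """Assumed ND as a function of x = p(E), y = p(E|H), z = p(E|~H)."""
    prior = (x - z) / (y - z)        # p(H) recovered from the likelihoods
    return (y - x) / (y * prior)     # [p(E|H) - p(E)] / [p(E|H) * p(H)]

def is_increasing(values):
    """True if each value is strictly larger than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))

# Confirming case: p(E) = 0.5, p(E|H) = 0.8, so p(E|~H) must lie below p(E).
confirm_nd = [nd(0.5, 0.8, z / 100) for z in range(5, 50, 5)]
# Disconfirming case: p(E) = 0.5, p(E|H) = 0.2, so p(E|~H) must lie above p(E).
disconfirm_nd = [nd(0.5, 0.2, z / 100) for z in range(55, 100, 5)]

print(is_increasing(confirm_nd), is_increasing(disconfirm_nd))
```

In the confirming case ND is positive and rises; in the disconfirming case it is negative and its absolute value falls, so ND rises there as well.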
Given the above results, we can then prove the following results about ND.
Theorem 7
ND meets RWLL-N. From lemma 4, we know that, for fixed values of p(E) and p(E∣H), ND increases as p(E∣¬H) increases. It follows that p(E∣H1) = p(E∣H2) and p(E∣¬H1) > p(E∣¬H2) imply ND(H1, E) > ND(H2, E), so that RWLL-N is satisfied.
Theorem 8
ND violates WLL-N. This follows immediately from theorems 6 and 7, since ND meets RWLL-N, which is incompatible with WLL-N.
Theorem 9
ND violates WLL. A counterexample will be sufficient to prove this. Consider the following probability distribution over statements E, H1, H2: p(H1 & H2 & E) = 0.03, p(H1 & H2 & ¬E) = 0.03, p(H1 & ¬H2 & E) = 0.10, p(H1 & ¬H2 & ¬E) = 0.15, p(¬H1 & H2 & E) = 0.01, p(¬H1 & H2 & ¬E) = 0.03, p(¬H1 & ¬H2 & E) = 0.25, and p(¬H1 & ¬H2 & ¬E) = 0.40. It can then be computed that p(E∣H1) ≃ 0.42 > 0.4 = p(E∣H2) and p(E∣¬H1) ≃ 0.38 < 0.39 ≃ p(E∣¬H2), but ND(H1, E) ≃ 0.23 < 0.25 = ND(H2, E), contrary to WLL, which would require ND(H1, E) > ND(H2, E).
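The counterexample can be recomputed from the joint distribution. The sketch below assumes the form ND(H, E) = [p(E∣H) − p(E)] / [p(E∣H) · p(H)]; the helper names are hypothetical.

```python
# Joint distribution from the counterexample, keyed by (H1, H2, E) truth values.
JOINT = {
    (True, True, True): 0.03, (True, True, False): 0.03,
    (True, False, True): 0.10, (True, False, False): 0.15,
    (False, True, True): 0.01, (False, True, False): 0.03,
    (False, False, True): 0.25, (False, False, False): 0.40,
}

def prob(event):
    """Total probability of the worlds (h1, h2, e) in which the event holds."""
    return sum(p for world, p in JOINT.items() if event(*world))

def cond(a, b):
    """Conditional probability p(a | b)."""
    return prob(lambda *w: a(*w) and b(*w)) / prob(b)

def neg(a):
    """Negation of an event."""
    return lambda *w: not a(*w)

def nd(h, e):
    """Assumed form of ND: [p(E|H) - p(E)] / [p(E|H) * p(H)]."""
    lik = cond(e, h)
    return (lik - prob(e)) / (lik * prob(h))

def H1(h1, h2, e): return h1
def H2(h1, h2, e): return h2
def E(h1, h2, e): return e

results = {
    "p(E|H1)": cond(E, H1), "p(E|H2)": cond(E, H2),
    "p(E|~H1)": cond(E, neg(H1)), "p(E|~H2)": cond(E, neg(H2)),
    "ND(H1,E)": nd(H1, E), "ND(H2,E)": nd(H2, E),
}
for name, value in results.items():
    print(f"{name} = {value:.2f}")
```

Under the assumed form, the computed likelihoods and ND values agree with the rounded figures reported above, and the antecedent of WLL holds while its consequent fails.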