1. Introduction
Along with Richard Jeffrey’s book The Logic of Decision (Jeffrey 1983) and the mathematically bountiful article “Updating Subjective Probability” (Diaconis and Zabell 1982), Jeffrey’s “Conditioning, Kinematics, and Exchangeability” (1988, 1992) is one of the foundational documents of probability kinematics. Among other things, it gives a beautifully lucid account of various equivalent formulations of the preconditions for updating a prior by conditioning or by probability kinematics. However, the section entitled “Successive Updating” contains a subtle error regarding the applicability of updating by so-called relevance quotients in order to ensure the commutativity of successive probability kinematical revisions. Upon becoming aware of this error, Jeffrey formulated the appropriate remedy, but he never discussed the issue in print. To head off any confusion, it seems worthwhile to alert readers of Jeffrey’s “Conditioning, Kinematics, and Exchangeability” to the aforementioned error and to document his remedy, placing it in the context of both earlier and subsequent work on commuting probability kinematical revisions.
Although this Discussion Note touches on some of the philosophical and methodological issues that arise in choosing the correct representation of what is learned from new evidence alone, it is intended primarily to clarify certain mathematical aspects of successive probability kinematical revisions. More detailed discussions of associated philosophical issues may be found in Field (1978), Lange (2000), Wagner (2002), and Hawthorne (2004).
2. Notation and terminology
In what follows, Ω denotes a set of possible states of the world, conceived as mutually exclusive and exhaustive, and A denotes an algebra of subsets (called events) of Ω. If p and q are finitely additive probability measures on A and $A \in $ A, the relevance quotient (terminology attributed to Carnap), denoted by $R_p^q(A)$, is defined by the formula $R_p^q(A) := q(A)/p(A)$. Typically, q is thought of as resulting from the revision of p upon encountering new evidence. In such cases, p is called the prior probability, and q is called the posterior probability. If q comes from p by conditioning on the event E, then

$$R_p^q(A) = \frac{p(A|E)}{p(A)} = \frac{p(E|A)}{p(E)}.$$

Note that $R_p^q(A)$ places implicit constraints on the prior p. As a simple example, if $R_p^q(A) = 2$, then, necessarily, $p(A) \le 1/2$. We will return to this apparently trivial observation later in this note.
If p and q are as stated previously, and A and B are events, the Bayes factor, denoted by $B_p^q(A:B)$, is defined by the formula $B_p^q(A:B) := \frac{q(A)/q(B)}{p(A)/p(B)}$, which is simply the ratio of the new to the old odds on A against B. Relevance quotients and Bayes factors are connected by the formula

$$B_p^q(A:B) = \frac{R_p^q(A)}{R_p^q(B)}. \tag{1}$$

When q comes from p by conditioning on E, $B_p^q(A:B)$ reduces to the familiar likelihood ratio $p(E|A)/p(E|B)$.
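These definitions are easy to check numerically. The following sketch uses a hypothetical four-state space and events of my own choosing (none of these numbers come from the text) to verify equation (1), the identity connecting Bayes factors and relevance quotients.

```python
# Hypothetical worked example: a four-state space with events as sets of states.
p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}   # prior over Omega = {1, 2, 3, 4}
E = {1, 2}                              # evidence event

def prob(m, A):
    """Probability of event A (a set of states) under measure m."""
    return sum(m[w] for w in A)

# Posterior from conditioning on E: q(w) = p(w)/p(E) for w in E, else 0.
pE = prob(p, E)
q = {w: (p[w] / pE if w in E else 0.0) for w in p}

def relevance_quotient(q_, p_, A):
    # R_p^q(A) = q(A)/p(A)
    return prob(q_, A) / prob(p_, A)

def bayes_factor(q_, p_, A, B):
    # B_p^q(A:B) = ratio of new to old odds on A against B
    return (prob(q_, A) / prob(q_, B)) / (prob(p_, A) / prob(p_, B))

A, B = {1, 4}, {2, 3}
# Equation (1): the Bayes factor is a ratio of relevance quotients.
lhs = bayes_factor(q, p, A, B)
rhs = relevance_quotient(q, p, A) / relevance_quotient(q, p, B)
assert abs(lhs - rhs) < 1e-12
```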
3. Probability kinematics
In the remainder of this note, all probability measures are assumed to be strictly coherent, in the sense that every nonempty event A is assigned a nonzero probability. This assumption, although inessential, allows us to avoid the distraction of continually having to postulate the positivity of various probabilities in theorems and their proofs.
Suppose that p is your prior probability on A, and E $ = \{ {E_1}, \ldots ,{E_n}\} $ is a partition of $\Omega $, with each ${E_i} \in $ A. New evidence prompts you to revise p to the posterior probability measure q as follows. Based on the total evidence, old as well as new, you first assess the posterior probabilities $q({E_i}) = {e_i}$, where, of course, ${e_1} + \cdots + {e_n} = 1$. Judging that you have learned nothing that would disturb any of the prior conditional probabilities $p(A|{E_i})$, you adopt the rigidity condition $q(A|{E_i}) = p(A|{E_i})$, for all $A \in $ A and $i = 1, \ldots ,n$. This fully and uniquely determines q (Jeffrey 1983) by the formula

$$q(A) = \sum_{i=1}^n e_i\, p(A|E_i). \tag{2}$$

When probability measures q and p are related by equation (2), we say that q has come from p by probability kinematics (henceforth, PK), or by Jeffrey conditioning, on the partition E.
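For a finite state space, equation (2) is a one-line computation. Here is a minimal sketch; the helper name `jeffrey_condition` and the illustrative numbers are my own, not from the text.

```python
def jeffrey_condition(p, partition, posteriors):
    """Revise a pointwise prior p by PK (equation (2)) on a partition.

    p          : dict state -> prior probability
    partition  : list of sets of states (mutually exclusive, exhaustive)
    posteriors : list of new probabilities e_i for the partition cells
    """
    q = {}
    for cell, e in zip(partition, posteriors):
        p_cell = sum(p[w] for w in cell)
        for w in cell:
            # Rigidity: within each cell, relative probabilities are
            # preserved, so q(w) = e_i * p(w | E_i).
            q[w] = e * p[w] / p_cell
    return q

p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
q = jeffrey_condition(p, [{1, 2}, {3, 4}], [0.5, 0.5])
assert abs(sum(q.values()) - 1) < 1e-12
# Rigidity check: q(. | E_i) = p(. | E_i), e.g. q(1)/q(2) = p(1)/p(2).
assert abs(q[1] / q[2] - p[1] / p[2]) < 1e-12
```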
4. Successive updating
4.1. The elementary model
Consider two possible successive updating schemes. In the first instance, p is first revised to q by PK on the partition E $ = \{ {E_1}, \ldots ,{E_n}\} $ of $\Omega $, with $q({E_i}) = {e_i}$, and then q is revised to r by PK on the partition F $ = \{ {F_1}, \ldots ,{F_m}\} $ of $\Omega $, with $r({F_j}) = {f_j}$. In the second instance, p is first revised to $q^\prime$ by PK on F, with $q^\prime({F_j}) = {f_j}$, and then $q^\prime$ is revised to $r^\prime$ by PK on E, with $r^\prime({E_i}) = {e_i}$ (figure 1).

Figure 1. Elementary successive updating.
If it turns out that $r^\prime = r$, the successive PK revisions are said to commute. It is straightforward to verify that

$$r(A) = \sum_{i=1}^n \sum_{j=1}^m \frac{e_i f_j}{p(E_i)\, q(F_j)}\, p(AE_iF_j) \tag{3}$$

and

$$r^\prime(A) = \sum_{i=1}^n \sum_{j=1}^m \frac{e_i f_j}{q^\prime(E_i)\, p(F_j)}\, p(AE_iF_j). \tag{4}$$

It is obvious from equations (3) and (4) that the conditions $q^\prime({E_i}) = p({E_i})$ and $q({F_j}) = p({F_j})$, which Diaconis and Zabell (1982) dub with the beautifully suggestive nomenclature Jeffrey independence, are sufficient to ensure commutativity. In fact, they prove that Jeffrey independence is necessary for commutativity as well. In general, however, $r^\prime$ may differ from $r$. Individuals who have found this troubling (see Lange [2000] for some sample references) presumably subscribe to the following two principles:
1. If the revisions of $p$ to $q$, and of $q^\prime$ to $r^\prime$, are based on identical new learning, and the revisions of $q$ to $r$, and of $p$ to $q^\prime$, are based on identical new learning, then it ought to be the case that $r^\prime = r.$

2. Identical new learning prompting the revisions of $p$ to $q$, and of $q^\prime$ to $r^\prime$, should be represented by the identities $r^\prime({E_i}) = q({E_i}) = {e_i}$, for all $i$. Identical new learning prompting the revisions of $q$ to $r$, and of $p$ to $q^\prime$, should be represented by the identities $q^\prime({F_j}) = r({F_j}) = {f_j}$, for all $j.$
Although the first of these principles seems uncontroversial, the second is profoundly mistaken. This was already noted by Carnap in correspondence with Jeffrey in the late 1950s, as described by Jeffrey (1975). Carnap pointed out (in the terminology of our current example) that the probabilities $r^\prime({E_i})$ are based not only on the new learning prompting the revision of the probabilities $q^\prime({E_i})$ but also on the totality of evidence incorporated in the latter probabilities. Similar remarks apply, of course, to the probabilities $q^\prime({F_j}).$ Carnap’s point was forcefully reiterated by Field (1978), who proceeded to identify the correct representation of what is learned from new evidence alone, the details of which we examine in the next subsection.
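The failure of commutativity in the elementary model is easy to exhibit numerically. The following sketch (state space, partitions, and weights are illustrative assumptions, not taken from the text) applies the two update sequences of figure 1 to the same prior and confirms that the resulting posteriors differ.

```python
def jeffrey_condition(p, partition, posteriors):
    # PK (equation (2)) on a partition of a finite state space.
    q = {}
    for cell, e in zip(partition, posteriors):
        p_cell = sum(p[w] for w in cell)
        for w in cell:
            q[w] = e * p[w] / p_cell
    return q

p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
E = [{1, 2}, {3, 4}]   # partition E
F = [{1, 4}, {2, 3}]   # partition F
e, f = [0.5, 0.5], [0.5, 0.5]

# First E with final weights e, then F with final weights f ...
r = jeffrey_condition(jeffrey_condition(p, E, e), F, f)
# ... versus first F with weights f, then E with weights e.
r_prime = jeffrey_condition(jeffrey_condition(p, F, f), E, e)

# The two posteriors differ: the elementary revisions do not commute.
assert any(abs(r[w] - r_prime[w]) > 1e-9 for w in p)
```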
4.2. The extended model: Field’s analysis
The term extended model refers to the generalization of figure 1 shown in figure 2. As our notation suggests, it is no longer assumed in the extended model that ${e^\prime_i} = {e_i}$ or that ${f^\prime_j} = {f_j}.$ Under what conditions do we get commutativity in this model? Hartry Field (1978), presumably inspired by the old Bayesian idea (Good 1950, 1983) that ratios of new to old odds furnish the correct representation of what is learned from new evidence alone, established the remarkable result that the classical PK formula in equation (2) can be transformed into the “re-parameterized” form

$$q(A) = \frac{\sum_{i=1}^n G_i\, p(AE_i)}{\sum_{i=1}^n G_i\, p(E_i)}, \quad\text{where } G_i := \left(\prod_{k=1}^n B_p^q(E_i:E_k)\right)^{1/n}. \tag{5}$$

Figure 2. Extended successive updating.
The analogous re-parameterizations of the classical formulas for $r,q^\prime,$ and $r^\prime$ are

$$r(A) = \frac{\sum_{j=1}^m g_j\, q(AF_j)}{\sum_{j=1}^m g_j\, q(F_j)}, \quad\text{where } g_j := \left(\prod_{k=1}^m B_q^r(F_j:F_k)\right)^{1/m}, \tag{6}$$

$$q^\prime(A) = \frac{\sum_{j=1}^m g^\prime_j\, p(AF_j)}{\sum_{j=1}^m g^\prime_j\, p(F_j)}, \quad\text{where } g^\prime_j := \left(\prod_{k=1}^m B_p^{q^\prime}(F_j:F_k)\right)^{1/m}, \tag{7}$$

and

$$r^\prime(A) = \frac{\sum_{i=1}^n G^\prime_i\, q^\prime(AE_i)}{\sum_{i=1}^n G^\prime_i\, q^\prime(E_i)}, \quad\text{where } G^\prime_i := \left(\prod_{k=1}^n B_{q^\prime}^{r^\prime}(E_i:E_k)\right)^{1/n}. \tag{8}$$

Combining equations (5)–(8) yields the successive PK revision formulas

$$r(A) = \frac{\sum_{i=1}^n\sum_{j=1}^m G_i g_j\, p(AE_iF_j)}{\sum_{i=1}^n\sum_{j=1}^m G_i g_j\, p(E_iF_j)} \tag{9}$$

and

$$r^\prime(A) = \frac{\sum_{i=1}^n\sum_{j=1}^m G^\prime_i g^\prime_j\, p(AE_iF_j)}{\sum_{i=1}^n\sum_{j=1}^m G^\prime_i g^\prime_j\, p(E_iF_j)}, \tag{10}$$
from which the following theorem follows immediately:
Theorem 1 The Field parameter identities

$$G^\prime_i = G_i, \ \text{for } 1 \le i \le n, \quad\text{and}\quad g^\prime_j = g_j, \ \text{for } 1 \le j \le m, \tag{11}$$

imply that $r^\prime = r$.

Proof. Obvious. □
It is important not to read more into these results than has so far been established. In figure 2, it is assumed that $p,q,r,q^\prime$, and $r^\prime$ are fully defined probability measures on the algebra A; that $q$ has come from $p$, and $r^\prime$ from $q^\prime$, by PK on E; and that $r$ has come from $q$, and $q^\prime$ from $p$, by PK on F. Then, if it is determined that ${G^\prime_i} = {G_i}$ and ${g^\prime_j} = {g_j}$, it follows that $r^\prime = r.$
Consider, however, the following different scenario, in which the preceding assumptions hold only for $p,q$, and $r$, and the parameters ${G_i}$ and ${g_j}$ have been determined. Can we then design revisions $q^\prime$ of $p$ by PK on F, and $r^\prime$ of $q^\prime$ by PK on E, so that we are guaranteed to have $r^\prime = r$? The natural move is to define $q^\prime$ and $r^\prime$ by the formulas

$$q^\prime(A) := \frac{\sum_{j=1}^m g_j\, p(AF_j)}{\sum_{j=1}^m g_j\, p(F_j)} \tag{12}$$

and

$$r^\prime(A) := \frac{\sum_{i=1}^n G_i\, q^\prime(AE_i)}{\sum_{i=1}^n G_i\, q^\prime(E_i)}. \tag{13}$$
To show that this is the right move, however, requires a proof of the following theorems. First, we need to take note of a key property of products of Field parameters.
Theorem 2 $\prod_{i=1}^n G_i = \prod_{i=1}^n G^\prime_i = \prod_{j=1}^m g_j = \prod_{j=1}^m g^\prime_j = 1$.

Proof. By equations (5) and (1),

$$\prod_{i=1}^n G_i = \prod_{i=1}^n \left(\prod_{k=1}^n \frac{R_p^q(E_i)}{R_p^q(E_k)}\right)^{1/n} = \left(\frac{\prod_{i=1}^n R_p^q(E_i)^n}{\left(\prod_{k=1}^n R_p^q(E_k)\right)^n}\right)^{1/n} = 1^{1/n} = 1.$$

The remaining three identities are proved in the same way. □
Next, we can show that the probabilities defined by equations (12) and (13) behave just as we intend.
Theorem 3 (i) The set function $q^\prime$ defined by equation (12) is a probability measure on A and comes from $p$ by PK on F. Moreover,

$$g^\prime_j := \left(\prod_{k=1}^m B_p^{q^\prime}(F_j:F_k)\right)^{1/m} = g_j. \tag{14}$$

(ii) The set function $r^\prime$ defined by equation (13) is a probability measure on A, and $r^\prime$ comes from $q^\prime$ by PK on E. Moreover,

$$G^\prime_i := \left(\prod_{k=1}^n B_{q^\prime}^{r^\prime}(E_i:E_k)\right)^{1/n} = G_i. \tag{15}$$

So by theorem 1, $r^\prime = r.$
Proof. (i) It is easy to show that $q^\prime$ is an additive set function on A and that $q^\prime(\Omega ) = 1.$ Also,

$$q^\prime(A|F_j) = \frac{q^\prime(AF_j)}{q^\prime(F_j)} = \frac{g_j\, p(AF_j)\big/\sum_{k=1}^m g_k\, p(F_k)}{g_j\, p(F_j)\big/\sum_{k=1}^m g_k\, p(F_k)} = p(A|F_j),$$

and so $q^\prime$ comes from $p$ by PK on F. Finally, equation (12) implies that $B_p^{q^\prime}({F_j}:{F_k}) = {g_j}/{g_k}$, and so

$$\left(\prod_{k=1}^m B_p^{q^\prime}(F_j:F_k)\right)^{1/m} = \left(\frac{g_j^m}{g_1 \cdots g_m}\right)^{1/m} = g_j,$$

by theorem 2. The proof of (ii) is similar. □
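Theorem 3 admits a simple numeric sanity check. The sketch below (all numbers are illustrative assumptions) computes the Field parameters $G_i$ and $g_j$ from given revisions p, q, and r, constructs $q^\prime$ and $r^\prime$ as in equations (12) and (13), and confirms that $r^\prime = r$.

```python
def jeffrey_condition(p, partition, posteriors):
    # PK (equation (2)) on a partition of a finite state space.
    q = {}
    for cell, e in zip(partition, posteriors):
        p_cell = sum(p[w] for w in cell)
        for w in cell:
            q[w] = e * p[w] / p_cell
    return q

def prob(m, A):
    return sum(m[w] for w in A)

def field_params(pri, post, cells):
    # G_i = geometric mean over k of the Bayes factors B(E_i : E_k),
    # i.e. the relevance quotients divided by their geometric mean.
    R = [prob(post, c) / prob(pri, c) for c in cells]
    geo = 1.0
    for x in R:
        geo *= x
    geo = geo ** (1.0 / len(R))
    return [x / geo for x in R]

def pk_from_params(m, cells, params):
    # Equations (12)/(13): weight m within each cell by its parameter,
    # then normalize.
    raw = {w: u * m[w] for cell, u in zip(cells, params) for w in cell}
    Z = sum(raw.values())
    return {w: x / Z for w, x in raw.items()}

p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
E = [{1, 2}, {3, 4}]
F = [{1, 4}, {2, 3}]
q = jeffrey_condition(p, E, [0.5, 0.5])
r = jeffrey_condition(q, F, [0.5, 0.5])

G = field_params(p, q, E)   # parameters of the first revision
g = field_params(q, r, F)   # parameters of the second revision

q_prime = pk_from_params(p, F, g)        # equation (12)
r_prime = pk_from_params(q_prime, E, G)  # equation (13)

assert all(abs(r[w] - r_prime[w]) < 1e-9 for w in p)  # r' = r
```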
In the next section, we will encounter an attempt to simplify Field’s analysis for which analogues of equations (14) and (15) fail to obtain.
5. Jeffrey’s proposal
In an attempt to simplify Field’s parameterization of JC, Jeffrey noted that in the extended model, the classical PK formula $q(A) = \sum_{i=1}^n e_i\, p(A|E_i)$ can be recast as $q(A) = \sum_{i=1}^n R_i\, p(AE_i)$, where ${R_i} = R_p^q({E_i})$. Similarly, one can recast the classical formulas for $r,q^\prime,$ and $r^\prime$ as $r(A) = \sum_{j=1}^m \rho_j\, q(AF_j)$, $q^\prime(A) = \sum_{j=1}^m \rho^\prime_j\, p(AF_j)$, and $r^\prime(A) = \sum_{i=1}^n R^\prime_i\, q^\prime(AE_i)$, where ${\rho _j} = R_q^r({F_j})$, ${\rho ^\prime_j} = R_p^{q^\prime}({F_j})$, and ${R^\prime_i} = R_{q^\prime}^{r^\prime}({E_i})$. It follows that

$$r(A) = \sum_{i=1}^n\sum_{j=1}^m R_i \rho_j\, p(AE_iF_j) \tag{16}$$

and

$$r^\prime(A) = \sum_{i=1}^n\sum_{j=1}^m R^\prime_i \rho^\prime_j\, p(AE_iF_j). \tag{17}$$

So if the relevance quotient identities

$$R^\prime_i = R_i, \ \text{for } 1 \le i \le n, \quad\text{and}\quad \rho^\prime_j = \rho_j, \ \text{for } 1 \le j \le m, \tag{18}$$

hold, then $r^\prime = r.$
Again, it is important to keep in mind here that this commutativity result depends on the assumption that p, q, r, q′, and r′ are fully defined probability measures on the algebra A; that $q$ has come from $p$, and $r^\prime$ from $q^\prime$, by PK on E; and that $r$ has come from $q$, and $q^\prime$ from $p$, by PK on F. Then, if it is determined that ${R^\prime_i} = {R_i}$ and ${\rho ^\prime_j} = {\rho _j}$, it follows that $r^\prime = r.$
But suppose that only p, q, and r have been assessed and the relevance quotients ${R_i}$ and ${\rho _j}$ have been evaluated. Can we then design PK revisions of $p$ to $q^\prime$ on F, and of $q^\prime$ to $r^\prime$ on E, so that we are guaranteed to have $r^\prime = r?$ Jeffrey proposed setting $q^\prime({F_j})$ equal to ${\rho _j}p({F_j})$ and setting $r^\prime({E_i})$ equal to ${R_i}q^\prime({E_i})$. That this may sometimes fail to do the trick can be seen from the example in Jeffrey’s table 2 (1988, 236; 1992, 134). In this example, $\Omega = \{ 1,2,3,4\}$, ${E_1} = \{ 1,2\}$, ${E_2} = \{ 3,4\}$, ${F_1} = \{ 1,4\}$, and ${F_2} = \{ 2,3\}.$ The prior p is defined by $p(i) = i/10,$ for $i = 1, \ldots ,4.$ The probability measure $q$ comes from $p$ by PK on $\{ {E_1},{E_2}\},$ with $q({E_1}) = q({E_2}) = 1/2$, and the probability measure $r$ comes from $q$ by PK on $\{ {F_1},{F_2}\},$ with $r({F_1}) = r({F_2}) = 1/2.$ In table 1, the distracting arithmetic mistakes in Jeffrey’s table have been corrected (with corrected values in parentheses) so that the error in his proposal for defining $q^\prime({F_j})$ and $r^\prime({E_i})$ stands out more clearly.
Table 1. Jeffrey’s Table 2 (arithmetic corrected)

Table 2. Table 1, rectified by normalizing

Note that we do in fact arrive at $r^\prime = r.$ This was, of course, predictable, in view of the commutativity of ordinary multiplication. But an odd thing occurs on the path from p to r′: we pass through what we have labeled “q′,” which fails to define a probability measure because its entries do not sum to 1. This is simply an illustration of the fact, remarked upon in section 2, that the relevance quotient $R_p^q(A)$ contains implicit constraints on the prior probability $p(A).$ So although the positive real numbers ${\rho _1}, \ldots ,{\rho _n}$ might function as a sequence of relevance quotients, in the sense that there exist probabilities ${\pi _1}, \ldots ,{\pi _n}$ with ${\pi _1} + \cdots + {\pi _n} = 1$ and ${\rho _1}{\pi _1} + \cdots + {\rho _n}{\pi _n} = 1$, this need not be the case for every sequence of probabilities that sum to 1, as we saw in table 1.

Upon becoming aware of this problem, Jeffrey proposed to repair the array marked “q′” by normalizing, that is, by dividing each of its entries by 441/437, which then defines, by the array marked $q^\prime$ in table 2, a genuine probability measure. But now, if the entries in the first row of $q^\prime$ are multiplied by 5/3, and the entries in the second row are multiplied by 5/7, the resulting array fails to define a probability measure because its entries, predictably, sum to 437/441. Dividing every entry in that array by 437/441 then defines, by the array marked $r^\prime$ in table 2, a genuine probability measure. Moreover, $r^\prime = r$, as intended.
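Jeffrey’s example, and the normalization repair, can be reproduced exactly with rational arithmetic. The sketch below follows the corrected numbers above; the helper names are my own.

```python
from fractions import Fraction as Fr

p = {1: Fr(1, 10), 2: Fr(2, 10), 3: Fr(3, 10), 4: Fr(4, 10)}
E = [{1, 2}, {3, 4}]
F = [{1, 4}, {2, 3}]

def prob(m, A):
    return sum(m[w] for w in A)

def pk(m, cells, new_probs):
    # PK (equation (2)) on a partition of the finite state space.
    out = {}
    for cell, e in zip(cells, new_probs):
        for w in cell:
            out[w] = e * m[w] / prob(m, cell)
    return out

q = pk(p, E, [Fr(1, 2), Fr(1, 2)])
r = pk(q, F, [Fr(1, 2), Fr(1, 2)])

# Relevance quotients for the two revisions.
R = [prob(q, c) / prob(p, c) for c in E]     # R_i = q(E_i)/p(E_i) = 5/3, 5/7
rho = [prob(r, c) / prob(q, c) for c in F]   # rho_j = r(F_j)/q(F_j)

# Jeffrey's proposal: "q'"(w) = rho_j * p(w) for w in F_j.
q_raw = {w: rho[j] * p[w] for j, cell in enumerate(F) for w in cell}
assert sum(q_raw.values()) == Fr(441, 437)   # not a probability measure!

# The remedy: normalize, multiply by R_i within E_i, and normalize again.
q_prime = {w: x / Fr(441, 437) for w, x in q_raw.items()}
r_raw = {w: R[i] * q_prime[w] for i, cell in enumerate(E) for w in cell}
assert sum(r_raw.values()) == Fr(437, 441)
r_prime = {w: x / Fr(437, 441) for w, x in r_raw.items()}

assert r_prime == r   # commutativity is restored, as in table 2
```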
Notice that the commutativity in table 2 is, contrary to what Jeffrey had hoped for, no longer accounted for by relevance quotient identities. What we get instead are the relevance quotient proportionalities ${\rho ^\prime_j} \propto {\rho _j}$ and ${R^\prime_i} \propto {R_i}$, with

$$\rho^\prime_j = \frac{437}{441}\,\rho_j \quad\text{and}\quad R^\prime_i = \frac{441}{437}\,R_i. \tag{19}$$

As we will see in section 7, analogous proportionalities prove to be the rule, rather than the exception, in the most general parameterization of JC.
6. The Jeffrey–Hendrickson parameterization of JC
It is ironic that while Jeffrey sought to simplify Field’s analysis of commutativity by employing relevance quotients, he had in hand, in Jeffrey and Hendrickson (1988/1989), the perfect parameterization of JC for accomplishing that task. The Jeffrey–Hendrickson transformation of the classical formula in equation (2) takes the form

$$q(A) = \frac{\sum_{i=1}^n B_i\, p(AE_i)}{\sum_{i=1}^n B_i\, p(E_i)}, \quad\text{where } B_i := B_p^q(E_i:E_1), \tag{20}$$

with the analogous transformations

$$r(A) = \frac{\sum_{j=1}^m b_j\, q(AF_j)}{\sum_{j=1}^m b_j\, q(F_j)}, \quad\text{where } b_j := B_q^r(F_j:F_1), \tag{21}$$

$$q^\prime(A) = \frac{\sum_{j=1}^m b^\prime_j\, p(AF_j)}{\sum_{j=1}^m b^\prime_j\, p(F_j)}, \quad\text{where } b^\prime_j := B_p^{q^\prime}(F_j:F_1), \tag{22}$$

and

$$r^\prime(A) = \frac{\sum_{i=1}^n B^\prime_i\, q^\prime(AE_i)}{\sum_{i=1}^n B^\prime_i\, q^\prime(E_i)}, \quad\text{where } B^\prime_i := B_{q^\prime}^{r^\prime}(E_i:E_1), \tag{23}$$

from which it follows that

$$r(A) = \frac{\sum_{i=1}^n\sum_{j=1}^m B_i b_j\, p(AE_iF_j)}{\sum_{i=1}^n\sum_{j=1}^m B_i b_j\, p(E_iF_j)} \quad\text{and}\quad r^\prime(A) = \frac{\sum_{i=1}^n\sum_{j=1}^m B^\prime_i b^\prime_j\, p(AE_iF_j)}{\sum_{i=1}^n\sum_{j=1}^m B^\prime_i b^\prime_j\, p(E_iF_j)}. \tag{24}$$
Theorem 4 The Jeffrey–Hendrickson parameter identities

$$B^\prime_i = B_i, \ \text{for } 1 \le i \le n, \quad\text{and}\quad b^\prime_j = b_j, \ \text{for } 1 \le j \le m, \tag{25}$$

are sufficient and, under the regularity conditions

$$p(E_iF_1) > 0, \ \text{for } 1 \le i \le n, \tag{26}$$

and

$$p(E_1F_j) > 0, \ \text{for } 1 \le j \le m, \tag{27}$$

necessary for $r^\prime = r$.

Proof. See Wagner (2002, theorems 3.1 and 4.1). □
Here again, commutativity depends on the assumption that p, q, r, q′, and r′ are fully defined probability measures on the algebra A; that $q$ has come from $p$, and $r^\prime$ from $q^\prime$, by PK on E; and that $r$ has come from $q$, and $q^\prime$ from $p$, by PK on F. Suppose, however, that only p, q, and r have been defined, and the parameters ${B_i}$ and ${b_j}$ have been determined. As in the case of Field’s parameterization, we can then design probability measures $q^\prime$ and $r^\prime$ by means of the definitions

$$q^\prime(A) := \frac{\sum_{j=1}^m b_j\, p(AF_j)}{\sum_{j=1}^m b_j\, p(F_j)} \tag{28}$$

and

$$r^\prime(A) := \frac{\sum_{i=1}^n B_i\, q^\prime(AE_i)}{\sum_{i=1}^n B_i\, q^\prime(E_i)}, \tag{29}$$

so that the following analogue of theorem 3 holds:
so that the following analogue of theorem 3 holds:
Theorem 5 (i) The set function $q^\prime$ defined by equation (28) is a probability measure on A and comes from $p$ by PK on F. Moreover, ${b^\prime_j}: = B_p^{q^\prime}({F_j}:{F_1}) = {b_j}$. (ii) The set function $r^\prime$ defined by equation (29) is a probability measure on A, and $r^\prime$ comes from $q^\prime$ by PK on E. Moreover, ${B^\prime_i}: = B_{q^\prime}^{r^\prime}({E_i}:{E_1}) = {B_i}$. Hence, (iii) $r^\prime = r.$
Proof. The proofs of (i) and (ii) are straightforward, and (iii) then follows from theorem 4. □
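Theorem 5 can likewise be checked numerically. In the sketch below (illustrative numbers and helper names are assumptions of mine), the Jeffrey–Hendrickson parameters are Bayes factors taken against the first cell of each partition, and the revisions of equations (28) and (29) recover r exactly.

```python
def prob(m, A):
    return sum(m[w] for w in A)

def pk(m, cells, new_probs):
    # PK (equation (2)) on a partition of a finite state space.
    return {w: e * m[w] / prob(m, cell)
            for cell, e in zip(cells, new_probs) for w in cell}

def jh_params(pri, post, cells):
    # Jeffrey-Hendrickson parameters: Bayes factors against the first cell,
    # B(A : A_1) = (post(A)/post(A_1)) / (pri(A)/pri(A_1)).
    def bf(c):
        return ((prob(post, c) / prob(post, cells[0]))
                / (prob(pri, c) / prob(pri, cells[0])))
    return [bf(c) for c in cells]

def pk_from_params(m, cells, params):
    # Equations (28)/(29): weight m within each cell and normalize.
    raw = {w: u * m[w] for cell, u in zip(cells, params) for w in cell}
    Z = sum(raw.values())
    return {w: x / Z for w, x in raw.items()}

p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
E, F = [{1, 2}, {3, 4}], [{1, 4}, {2, 3}]
q = pk(p, E, [0.5, 0.5])
r = pk(q, F, [0.5, 0.5])

B = jh_params(p, q, E)   # B_i = B_p^q(E_i : E_1)
b = jh_params(q, r, F)   # b_j = B_q^r(F_j : F_1)

q_prime = pk_from_params(p, F, b)        # equation (28)
r_prime = pk_from_params(q_prime, E, B)  # equation (29)

assert all(abs(r[w] - r_prime[w]) < 1e-9 for w in p)  # theorem 5 (iii)
```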
7. A comprehensive parameterization of JC
Consider the formula $\hat q(A) = \sum_{i=1}^n u_i\, p(AE_i),$ where p is a probability measure on A, and the parameters ${u_i}$ are any positive real numbers whatsoever. It is easy to check that $\hat q$ is a nonnegative, additive set function on A. So $\hat q$ is a probability measure if and only if $\hat q(\Omega ) = \sum_{i=1}^n u_i\, p(E_i) = 1.$ Consequently, whatever the value of $\sum_{i=1}^n u_i\, p(E_i)$ turns out to be (whether equal to 1 or not), the set function $q$, defined by

$$q(A) = \frac{\sum_{i=1}^n u_i\, p(AE_i)}{\sum_{i=1}^n u_i\, p(E_i)}, \tag{30}$$

is a probability measure on A. Moreover, because $q(A|{E_i}) = p(A|{E_i})$, for all $A \in $ A and $1 \le i \le n,$ $q$ comes from $p$ by PK on E.
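A quick numeric illustration of equation (30), with randomly drawn positive weights and an assumed illustrative state space: any positive parameters, once normalized this way, produce a probability measure that satisfies rigidity on E.

```python
import random

random.seed(0)
p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
E = [{1, 2}, {3, 4}]
u = [random.uniform(0.1, 10) for _ in E]   # arbitrary positive parameters

def prob(m, A):
    return sum(m[w] for w in A)

# Equation (30): q(A) = sum_i u_i p(A E_i) / sum_i u_i p(E_i).
Z = sum(ui * prob(p, Ei) for ui, Ei in zip(u, E))
q = {w: ui * p[w] / Z for ui, Ei in zip(u, E) for w in Ei}

assert abs(sum(q.values()) - 1) < 1e-12   # q is a probability measure
# Rigidity: within each cell q is proportional to p, so q(.|E_i) = p(.|E_i).
assert abs(q[1] / q[2] - p[1] / p[2]) < 1e-12
assert abs(q[3] / q[4] - p[3] / p[4]) < 1e-12
```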
Suppose now that E $ = \{ {E_1}, \ldots ,{E_n}\} $, F $ = \{ {F_1}, \ldots ,{F_m}\} $, and ${({u_i})_{1 \le i \le n}},{({u^\prime_i})_{1 \le i \le n}},{({v_j})_{1 \le j \le m}},$ and ${({v^\prime_j})_{1 \le j \le m}}$ are sequences of arbitrary positive real numbers. Consider the successive PK updating scenario shown in figure 3.

Figure 3. Generalized successive PK updating.
In figure 3, the probability measure q comes from p by PK on E in accord with the formula in equation (30). Similarly, r comes from q by PK on F, $q^\prime$ comes from p by PK on F, and $r^\prime$ comes from $q^\prime$ by PK on E by the analogous formulas

$$r(A) = \frac{\sum_{j=1}^m v_j\, q(AF_j)}{\sum_{j=1}^m v_j\, q(F_j)}, \tag{31}$$

$$q^\prime(A) = \frac{\sum_{j=1}^m v^\prime_j\, p(AF_j)}{\sum_{j=1}^m v^\prime_j\, p(F_j)}, \tag{32}$$

and

$$r^\prime(A) = \frac{\sum_{i=1}^n u^\prime_i\, q^\prime(AE_i)}{\sum_{i=1}^n u^\prime_i\, q^\prime(E_i)}. \tag{33}$$

It follows that

$$r(A) = \frac{\sum_{i=1}^n\sum_{j=1}^m u_i v_j\, p(AE_iF_j)}{\sum_{i=1}^n\sum_{j=1}^m u_i v_j\, p(E_iF_j)} \tag{34}$$

and

$$r^\prime(A) = \frac{\sum_{i=1}^n\sum_{j=1}^m u^\prime_i v^\prime_j\, p(AE_iF_j)}{\sum_{i=1}^n\sum_{j=1}^m u^\prime_i v^\prime_j\, p(E_iF_j)}. \tag{35}$$

From equations (34) and (35), a condition sufficient to ensure that $r^\prime = r$ is obvious.
Theorem 6 If there exists a constant $c$ such that ${u^\prime_i} = c \cdot {u_i}$, for $1 \le i \le n$ (symbolized by ${u^\prime_i} \propto {u_i}$), and there exists a constant d such that ${v^\prime_j} = d \cdot {v_j}$, for $1 \le j \le m$ (symbolized by ${v^\prime_j} \propto {v_j}$), then $r^\prime = r.$
Proof. Straightforward. □
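Theorem 6 can be sanity-checked numerically (illustrative numbers assumed): scaling the two parameter sequences by constants leaves the normalized revisions, and hence $r^\prime$, unchanged.

```python
def pk_from_params(m, cells, params):
    # Equations (30)-(33): weight m within each cell by its parameter,
    # then normalize.
    raw = {w: u * m[w] for cell, u in zip(cells, params) for w in cell}
    Z = sum(raw.values())
    return {w: x / Z for w, x in raw.items()}

p = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
E, F = [{1, 2}, {3, 4}], [{1, 4}, {2, 3}]
u, v = [2.0, 0.5], [3.0, 0.25]   # arbitrary positive parameters
c, d = 7.0, 0.2                  # arbitrary scaling constants

r = pk_from_params(pk_from_params(p, E, u), F, v)
# u'_i = c*u_i and v'_j = d*v_j, applied in the opposite order:
r_prime = pk_from_params(pk_from_params(p, F, [d * x for x in v]),
                         E, [c * x for x in u])

assert all(abs(r[w] - r_prime[w]) < 1e-9 for w in p)  # r' = r
```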
The proportionalities ${u^\prime_i} \propto {u_i}$ and ${v^\prime_j} \propto {v_j}$ turn out to be equivalent to certain Bayes factor identities. In order to prove this assertion, however, we need to establish a few preliminary results. We begin by establishing a connection between the rather abstract quantities ${u_i}$ appearing in the formula in equation (30) and certain Bayes factors.

Theorem 7 For all $1 \le i,k \le n$, ${u_i}/{u_k} = B_p^q({E_i}:{E_k})$.

Proof. By the definition of $B_p^q({E_i}:{E_k})$, along with the formula in equation (30), we have

$$B_p^q(E_i:E_k) = \frac{q(E_i)/q(E_k)}{p(E_i)/p(E_k)} = \frac{u_i\, p(E_i)/u_k\, p(E_k)}{p(E_i)/p(E_k)} = \frac{u_i}{u_k}. \ \Box$$
Remark. Analogous formulas for ${v_j}/{v_k}$, as well as for ${u^\prime_i}/{u^\prime_k}$ and ${v^\prime_j}/{v^\prime_k}$, should be obvious.
Theorem 8 In the successive updating scenario displayed in figure 3, the proportionality ${u^\prime_i} \propto {u_i}$ is equivalent to the Bayes factor identities

$$B_{q^\prime}^{r^\prime}(E_i:E_k) = B_p^q(E_i:E_k), \ \text{for all } 1 \le i,k \le n, \tag{36}$$

and the proportionality ${v^\prime_j} \propto {v_j}$ is equivalent to the Bayes factor identities

$$B_p^{q^\prime}(F_j:F_k) = B_q^r(F_j:F_k), \ \text{for all } 1 \le j,k \le m. \tag{37}$$
Proof. Suppose first that ${u^\prime_i} \propto {u_i}$, so that there exists a constant c such that ${u^\prime_i} = c \cdot {u_i}$, for $i = 1, \ldots ,n.$ By theorem 7,

$$B_{q^\prime }^{r^\prime }(E_i:E_k) = \frac{u^\prime_i}{u^\prime_k} = \frac{c \cdot u_i}{c \cdot u_k} = \frac{u_i}{u_k} = B_p^q(E_i:E_k).$$

Conversely, suppose that the identities in equation (36) hold. By theorem 7, equation (36), with $k = 1$, yields $\frac{u^\prime_i}{u^\prime_1} = \frac{u_i}{u_1}$, whence $u^\prime_i = c \cdot {u_i}$, where $c = {u^\prime_1}/{u_1}$. The proof that ${v^\prime_j} \propto {v_j}$ is equivalent to equation (37) is nearly identical. □
The probability kinematical formulas in equations (30)–(33) encompass, inter alia, (i) Field’s parameterizations ($u_i = G_i$, $v_j = g_j$, etc.); (ii) Jeffrey’s parameterizations, after normalization ($u_i = R_i$, $v_j = \rho_j$, etc.); and (iii) the Jeffrey–Hendrickson parameterizations ($u_i = B_i$, $v_j = b_j$, etc.).
In all of these cases, we have exhibited conditions sufficient to ensure commutativity. But the conditions necessary for commutativity have only been stated for the Jeffrey–Hendrickson parameters. Bayes factor identities play a crucial role in formulating such conditions for other parameterizations, as follows:
1. Recall that under the regularity conditions in equations (26) and (27), commutativity implies the Jeffrey–Hendrickson parameter identities in equation (25).

2. Observe that the identities in equation (25) imply the Bayes factor identities in equations (36) and (37) because $B_p^q({E_i}:{E_k}) = {B_i}/{B_k}$, and so forth.

3. Observe that the Bayes factor identities imply the Field parameter identities in equation (11).

4. Recall that, by theorem 8, the Bayes factor identities imply the parameter proportionalities ${u^\prime_i} \propto {u_i}$ and ${v^\prime_j} \propto {v_j}.$
We conclude the case for the primacy of Bayes factors in the representation of what is learned from new evidence alone with the final observation that in the elementary model of sequential PK revision represented in figure 1, the Bayes factor identities turn out to be both necessary and sufficient for Jeffrey independence.