
The Borel-Kolmogorov Paradox Is Your Paradox Too: A Puzzle for Conditional Physical Probability

Published online by Cambridge University Press:  01 January 2022


Abstract

The Borel-Kolmogorov paradox is often presented as an obscure problem that certain mathematical accounts of conditional probability must face. In this article, we point out that the paradox arises in the physical sciences, for physical probability or chance. By carefully formulating the paradox in this setting, we show that it is a puzzle for everyone, regardless of one’s preferred probability formalism. We propose a treatment that is inspired by the approach that scientists took when confronted with these cases.

Type
Decision Theory and Formal Epistemology
Copyright
Copyright 2021 by the Philosophy of Science Association. All rights reserved.

1. Introduction

A projectile is about to be launched at an obstacle. Suppose the collision is a chancy process, such that X, the speed of the projectile right after impact, might range anywhere from 0 to 1 meters per second, with each value equally likely. For instance, the probability of fast, the event that X exceeds half a meter per second, is 1/2. At the same time, a second projectile is to be launched toward another obstacle. The possible values of Y, the speed of this projectile right after impact, also range from 0 to 1 meters per second, with each value equally likely. Suppose X and Y are causally and probabilistically independent. Given this setup, what is P ( X > 1 / 2 | X = Y ) ? In other words, what is the probability of fast conditional on same speed?

A natural thought is that the probability of fast should not be affected by same speed: P ( X > 1 / 2 | X = Y ) = P ( X > 1 / 2 ) = 1 / 2 . Perhaps you do not share this thought, however, and are unready to commit to an answer until you have performed some calculations. In that case, you may still be willing to endorse the following weaker claim:

Uniqueness. In this setup, the physical probability of fast given same speed is well defined and determinable on the basis of the information given above.Footnote 1

In support of Uniqueness, note that fast and same speed are consistent, well-defined physical events. It would be odd to insist that there is no fact of the matter regarding the chance of one given the other. Moreover, it seems we have fully specified the relevant probabilistic and causal structure of the setup. What further information is needed?Footnote 2

Uniqueness is an attractive assumption. However, consider the following equally, if not more, attractive claim:

Equivalence. If B is necessarily equivalent to same speed, then the physical probability of fast given B equals the probability of fast given same speed (if well defined).Footnote 3

The thought is that if A, B are necessarily equivalent then one cannot happen without the other, and so (e.g.) if the occurrence of A raises the chance of C, then the occurrence of B must raise the chance of C and vice versa.

Unfortunately, Uniqueness and Equivalence are jointly untenable. Moreover, the problem is not restricted to this case. The projectile example was designed to be mathematically simple, but there are less artificial examples in which the same issue arises. Later in the article, we will see an example from statistical mechanics (Kac and Slepian 1959).

Although this formulation in terms of physical probability and Uniqueness and Equivalence is new, the mathematical aspect of this puzzle is familiar. It is a Borel-Kolmogorov-type paradox, in which conditioning on the same event B, where P(B) = 0, seems to lead to different values of P(A | B) depending on how B is ‘viewed’ (Kolmogorov 1933; Rao 1988). How to interpret this phenomenon is a controversial question. For instance, Kolmogorov suggests that this type of paradox “shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals 0 is inadmissible” (1956, 51). Others see the paradox as a problem internal to Kolmogorov’s probability formalism (Kadane, Schervish, and Seidenfeld 1999; Hájek 2003; cf. Gyenis, Hofer-Szabó, and Rédei 2017).

Why rehash this well-known paradox in the physical probability setting? We believe the paradox gains new force and significance when considered in the context of the physical sciences. After presenting the conflict between Uniqueness and Equivalence, we point out that the paradox arises regardless of whether we use the standard Kolmogorov formalism or axiomatize conditional probability directly. We also point out that the paradox has a new urgency because it calls into question the reliability of several of scientists’ probability judgments. We then propose a way to make headway on resolving the paradox. The proposal is inspired by the approach that scientists took toward an instance in statistical mechanics.

2. Uniqueness and Equivalence

In this section we survey multiple methods for determining the probability of fast given same speed and show that, given Equivalence, they each yield incompatible answers depending on how they are applied, suggesting a conflict between Uniqueness and Equivalence.

2.1. Limits

In elementary probability theory, there is a standard method for computing conditional probabilities given probability zero events, which we will call the limits method. Roughly speaking, one calculates P ( A | B ) as the limit of P ( A | B n ) , where the Bn’s are an appropriately chosen sequence of positive-probability events that converge to B.

Recall the projectile setup above. The idea is to consider the positive-probability event that X is close to Y, |X − Y| ≤ ε, and then compute the limit of the conditional probability as ε shrinks. Under our assumption that X and Y are independent and both uniformly distributed, the calculation

P(X > 1/2 | X = Y) = lim_{ε→0} P(X > 1/2 | |X − Y| ≤ ε)

yields 1/2, as expected. To see why, we can picture |X − Y| ≤ ε as a band of thickness ε around the diagonal line X = Y, as shown in figure 1A. To determine the conditional probability of fast given the event close speed, one takes the proportion of this band that overlaps with the right-hand side of the square (the area corresponding to X > 1/2). By the symmetry of the band, this proportion is clearly 1/2 and will stay at 1/2 as the thickness of the band shrinks.

Figure 1. Two measures of closeness. A, |X − Y| ≤ ε; B, |X/Y − 1| ≤ ε.

Crucially, however, different ways of quantifying the “closeness” between the two speeds lead to different results when the same method is applied.Footnote 4 We have considered whether the difference between the two recoil speeds is small, |X − Y| ≤ ε. But we might also consider whether their ratio is almost 1, |X/Y − 1| ≤ ε. And if we instead consider the limit as this ε shrinks, the calculation

P(X > 1/2 | X = Y) = lim_{ε→0} P(X > 1/2 | |X/Y − 1| ≤ ε)

yields a value greater than 1/2. Why? Note that pairs of small speeds are less likely to have a ratio near 1 (e.g., 0.01 and 0.001 have a ratio of 10, whereas 0.91 and 0.901 have a ratio near 1), and so |X/Y − 1| being small raises the probability that X is large. The situation is represented in figure 1B. The lopsided band corresponds to the event |X/Y − 1| ≤ ε. The proportion of the band that overlaps with the right-hand side is clearly greater than the proportion that overlaps with the left-hand side.

Let’s make the tension between these two results more precise. On the one hand, defining Z = X / Y , the limits method implies that:

  1. P(X > 1/2 | |X − Y| = 0) = lim_{ε→0} P(X > 1/2 | |X − Y| ≤ ε) = 1/2,

  2. P(X > 1/2 | |Z − 1| = 0) = lim_{ε→0} P(X > 1/2 | |Z − 1| ≤ ε) > 1/2.

On the other hand, since X = Y is necessarily equivalent to |X − Y| = 0 and |Z − 1| = 0, Equivalence implies:

  3. P(X > 1/2 | X = Y) = P(X > 1/2 | |X − Y| = 0) = P(X > 1/2 | |Z − 1| = 0).

And these three claims are inconsistent, assuming Uniqueness (in particular, assuming P ( X > 1 / 2 | X = Y ) is well defined).
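To make the two procedures easy to inspect side by side, here is a minimal Monte Carlo sketch of our own (not part of the original argument); the sample size, band widths, and seed are arbitrary choices for illustration.

```python
# Monte Carlo sketch of the two limiting procedures from section 2.1.
# X and Y are independent and uniform on [0, 1]. We estimate P(X > 1/2 | band)
# for the difference band |X - Y| <= eps and the ratio band |X/Y - 1| <= eps
# (written as |X - Y| <= eps * Y to avoid dividing by Y), then shrink eps.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000
X = rng.uniform(0.0, 1.0, n)
Y = rng.uniform(0.0, 1.0, n)

for eps in (0.1, 0.01, 0.001):
    diff_band = np.abs(X - Y) <= eps        # |X - Y| <= eps
    ratio_band = np.abs(X - Y) <= eps * Y   # equivalent to |X/Y - 1| <= eps
    print(f"eps = {eps}: "
          f"P(X > 1/2 | diff band) ~ {np.mean(X[diff_band] > 0.5):.3f}, "
          f"P(X > 1/2 | ratio band) ~ {np.mean(X[ratio_band] > 0.5):.3f}")
```

In our runs the difference band stays near 1/2 while the ratio band settles near 3/4 (the exact limit one can compute analytically for this uniform setup), in line with claims 1 and 2.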

Now, one might insist that the difference |X − Y| ≤ ε is the correct or more natural way to quantify closeness. First note, though, that we do not always take difference as the intuitive measure of closeness. Consider the population data of two counties C and D in years 1918 and 2018 shown in table 1. The absolute difference between the populations is the same in both years, but there is an intuitive sense in which the population of C is “closer” to the population of D in 2018 than it was in 1918. Second, note that even if we grant that absolute difference is the correct measure of closeness, whether the response succeeds is sensitive to one’s choice of units. Imagine the Logarians are in every respect like their earthly counterparts, except that they measure lengths in logameters rather than meters and speeds in logameters per second rather than meters per second. The Logarians might agree that absolute difference is the natural measure of closeness. However, because of their choice of scale, they will judge two speeds as close in absolute difference whenever we judge them as close in ratio. For the Logarians, the closer the two speeds are stipulated to be, the more likely it is for them to be fast. And so in the limit, when two speeds are so close that they are equal, the current method will yield the verdict that the probability of X > 1/2 is greater than 1/2. In fact, by choosing a suitable rescaling of units, one can make the limiting conditional probability any value one wants (Arnold and Robertson 2002); a numerical sketch of this sensitivity follows table 1.

Table 1. Population of C and D by Year

        1918    2018
C       10      100
D       20      110
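The sensitivity to rescaling can also be checked numerically. The sketch below (ours, with an arbitrary selection of rescalings) conditions on |g(X) − g(Y)| ≤ ε for three monotone transformations g of the speed scale; the answer shifts with g, as Arnold and Robertson’s result leads one to expect.

```python
# Sketch: the limiting conditional probability depends on how the speed scale is
# recorded. We condition on |g(X) - g(Y)| <= eps for three rescalings g.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000_000
X = 1.0 - rng.random(n)   # uniform on (0, 1], so the log stays finite
Y = 1.0 - rng.random(n)

rescalings = {
    "identity (meters/s)": lambda s: s,
    "square root": np.sqrt,
    "log (a 'Logarian' scale)": np.log,
}

eps = 0.005
for name, g in rescalings.items():
    band = np.abs(g(X) - g(Y)) <= eps
    print(f"{name:26s} P(X > 1/2 | |g(X) - g(Y)| <= eps) ~ "
          f"{np.mean(X[band] > 0.5):.3f}")
```

In our runs these come out roughly 0.50, 0.65, and 0.75, respectively.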

2.2. Independence Reasoning

Examining the limits method and reflecting on its failure, it seems the method does not make full use of a key piece of information about the setup, namely, that X and Y are both probabilistically and causally independent. Perhaps these independence features can allow us to determine that the answer is 1/2 without Equivalence raising trouble.

As a first attempt, one might propose the following principle:

Naive Independence. If two physical quantities X and Y are causally and probabilistically independent of each other, then the chance that X takes a specific value or range of values is unaffected by whether Y happens to sync up with X. In particular, for any x, the probability of X > x given X = Y equals the unconditional chance of X > x .

Since X is causally and probabilistically independent of Y in the projectile scenario, Naive Independence yields P ( X > 1 / 2 | X = Y ) = P ( X > 1 / 2 ) = 1 / 2 as desired.

As the name suggests, however, Naive Independence is false. It even gives the wrong verdict in simple finite cases. Imagine Jonny tosses a coin with bias a on the North Pole and Donny tosses a coin with bias b on the South Pole, where the two coin tosses are causally and probabilistically independent of one another. Let heads denote the event that Jonny’s coin lands heads, and let same side denote the event that Jonny and Donny’s coins land on the same side. The unconditional chance of heads is a. What is the probability of heads given same side? Naive Independence entails the answer is a. But suppose, for example, a = b = 0.6. Then

P(heads | same side) = P(heads ∩ same side) / P(same side) = 0.6² / (0.6² + 0.4²) > 0.6.

In general, one can check that P(heads|same side) = P(heads) if and only if Donny’s coin is unbiased ( b = 0.5 ). Intuitively, this is because if Donny’s coin is biased toward heads (tails), then same side means it is more likely that Jonny’s coin lands heads (tails). An upshot here is that causal and probabilistic independence of physical quantities is not always a direct guide to conditional chances involving those quantities.
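A quick computation makes this concrete (our own minimal sketch; the bias values are arbitrary):

```python
# P(heads | same side) for independent coins with biases a (Jonny) and b (Donny).
# Same side = both heads or both tails, so
#     P(heads | same side) = a*b / (a*b + (1 - a)*(1 - b)),
# which equals a exactly when b = 0.5 (for 0 < a < 1).
def heads_given_same_side(a: float, b: float) -> float:
    return a * b / (a * b + (1 - a) * (1 - b))

for a, b in [(0.6, 0.6), (0.6, 0.5), (0.6, 0.9), (0.3, 0.7)]:
    print(f"a = {a}, b = {b}: P(heads | same side) = "
          f"{heads_given_same_side(a, b):.3f}  vs  P(heads) = {a}")
```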

Nevertheless, the observation that heads is independent of same side when Donny’s coin is unbiased also suggests a charitable amendment to Naive Independence. Crucially, in the projectile example, Y is uniformly distributed, not ‘biased’ toward low or high speeds. At least in that special case, the thought goes, the occurrence or truth of X = Y should not affect the probability of X > x.

Independence Principle. If two physical quantities X and Y are causally and probabilistically independent of each other, have the same (marginal) distribution, and are uniformly distributed over their support,Footnote 5 then the distribution of X is unaffected by whether X happens to sync up with Y. In particular, for any x the chance of X > x given X = Y equals the unconditional chance of X > x .

This principle also implies the desired verdict of 1/2 in the projectile case. And it fares much better than Naive Independence. Indeed, one can prove that if chances are probabilities that satisfy the standard Kolmogorov axioms, then the principle is true whenever P ( X = Y ) > 0 .

Fact 1. Suppose X and Y are probabilistically independent with identical uniform distributions. If P ( X = Y ) > 0 , then P ( X > x | X = Y ) = P ( X > x ) for any x. (See the appendix for the proof.)

It would be very odd, one might think, if the Independence Principle were false only if P ( X = Y ) = 0 . Intuitively, the truth of this principle should not be sensitive to the chance of X = Y .

Remarkably, though, trouble looms even for this revised principle. First notice that analogous reasoning supports the following dependence principle:

Dependence Principle. If two physical quantities W and V are causally and probabilistically independent of each other, have the same (marginal) distribution, and are not uniformly distributed over their support, then there exists a w such that the probability of W > w given W = V does not equal the unconditional chance of W > w .

Indeed, one can prove that this principle is also true in the case in which P ( W = V ) > 0 .

Fact 2. Suppose W and V are probabilistically independent, have the same (marginal) distribution, and are not uniform over their support. If P(W = V) > 0, then there is a w such that P(W > w | W = V) ≠ P(W > w).

However, now we can derive a conflict, as follows. Consider the quantities E_X = (1/2)mX² and E_Y = (1/2)mY², which represent the kinetic energy of the projectiles after impact, where we suppose the two projectiles have the same mass m. Note that E_X and E_Y are not uniformly distributed, and, more generally, they satisfy the conditions of the Dependence Principle, with W = E_X and V = E_Y. Thus, by the Dependence Principle,

  1. There exists an a such that P(E_X > a | E_X = E_Y) ≠ P(E_X > a).

Yet, by the Independence Principle:

  2. For all x, P(X > x | X = Y) = P(X > x).

But since X > √(2a/m) if and only if E_X > a, and E_X = E_Y if and only if X = Y, Equivalence implies:

  3. P(X > √(2a/m) | X = Y) = P(E_X > a | E_X = E_Y).

Again these three claims are inconsistent, assuming Uniqueness (in particular assuming that the conditional probabilities are well defined). To see this, note that P(X > √(2a/m)) = P(E_X > a), so if P(E_X > a | E_X = E_Y) ≠ P(E_X > a), then P(X > x | X = Y) ≠ P(X > x) for some x.
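To see the conflict numerically, one can render ‘E_X = E_Y’ with the same band-shrinking device as in section 2.1, now applied to energies rather than speeds. This is only one contested way of cashing out the conditioning event, and the mass and band widths below are our own arbitrary choices, but it illustrates how the energy description pulls the answer away from 1/2.

```python
# Sketch: conditioning on the speeds being close vs. the kinetic energies being
# close. X, Y uniform on [0, 1]; with m = 2, E_X = 0.5*m*X**2 = X**2.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000_000
X = rng.uniform(0.0, 1.0, n)
Y = rng.uniform(0.0, 1.0, n)
EX, EY = X**2, Y**2   # kinetic energies, taking m = 2 for convenience

for eps in (0.1, 0.01, 0.001):
    speed_band = np.abs(X - Y) <= eps
    energy_band = np.abs(EX - EY) <= eps
    print(f"eps = {eps}: "
          f"P(X > 1/2 | speeds within eps) ~ {np.mean(X[speed_band] > 0.5):.3f}, "
          f"P(X > 1/2 | energies within eps) ~ {np.mean(X[energy_band] > 0.5):.3f}")
```

The speed band hovers near 1/2, while the energy band gives markedly smaller values that keep drifting down as ε shrinks: the same limiting event, described in terms of energies rather than speeds, yields a different verdict, which is just claims 1–3 in numerical form.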

2.3. Other Methods

There are other methods to derive P(X > 1/2 | X = Y) = 1/2 that one might attempt. For instance, consider the discretization method. Unlike the limits method, which considers |X − Y| ≤ ε as ε shrinks, the discretization method begins by modeling the projectile speeds as discrete quantities X̃_n and Ỹ_n, with n distinct possible values, and then taking lim_{n→∞} P(X̃_n > 1/2 | X̃_n = Ỹ_n).

However, as one might predict, this method yields different results depending on how the quantities are discretized. For example, depending on whether we discretize state space into increments of equal speed or increments of equal kinetic energy, we get different results even in the limit.
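Here is a small sketch of the discretization method under the two gridings just mentioned (reading X̃_n as X rounded to its cell); the grid sizes are arbitrary, and the equal-energy grid takes m = 2 so that energies, like speeds, run over [0, 1].

```python
# Discretization sketch: bin the unit speed interval into n cells of equal speed
# width, or into n cells of equal kinetic-energy width (m = 2, so E = X**2 and
# the energy cell edges are sqrt(k/n)). Under the continuous uniform model, the
# probability that X lands in a cell is just the cell's width in speed.
import numpy as np

def p_fast_given_equal(edges: np.ndarray) -> float:
    """P(discretized X > 1/2 | discretized X = discretized Y) for given cell edges."""
    widths = np.diff(edges)                 # P(X in cell k) for X uniform on [0, 1]
    same_cell = widths**2                   # P(X and Y both land in cell k)
    mids = 0.5 * (edges[:-1] + edges[1:])   # classify a cell as 'fast' by its midpoint
    return same_cell[mids > 0.5].sum() / same_cell.sum()

for n in (10, 100, 1000, 10_000):
    speed_edges = np.linspace(0.0, 1.0, n + 1)             # equal speed increments
    energy_edges = np.sqrt(np.linspace(0.0, 1.0, n + 1))   # equal energy increments
    print(f"n = {n:6d}: equal-speed grid -> {p_fast_given_equal(speed_edges):.3f}, "
          f"equal-energy grid -> {p_fast_given_equal(energy_edges):.3f}")
```

The equal-speed grid returns 1/2 at every n, while the equal-energy grid returns a value well below 1/2 that keeps falling as the grid is refined.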

We need not belabor the point. Reflecting on all these cases, Uniqueness and Equivalence seem in conflict. Crucially, this is not to say there is no good argument for the answer of 1/2. Perhaps there are good reasons to discretize X in a certain way or to take the difference approximation |X − Y| ≤ ε rather than the ratio. Our point is simply that such arguments would plausibly have to appeal to physical considerations beyond those supplied in the specification of the chance setup. It seems doubtful that the physical probability of fast given same speed supervenes on, or can be determined from, only the facts about X, Y and the situation that were explicitly provided. And this is what Uniqueness requires.

3. Is the Formalism to Blame?

It is tempting to chalk up the puzzles and difficulties of the last section to a limitation of our mathematical representation, in particular a limitation of standard Kolmogorov probability theory, according to which conditional probabilities are defined derivatively rather than axiomatized directly. As Gyenis et al. (2017, 2595) write, “According to [a certain popular view], the Borel-Kolmogorov Paradox poses a serious threat for the standard measure theoretic formalism of probability theory, in which conditional probability is a defined concept, and this is regarded as justification for attempts at axiomatizations of probability theory in which the conditional probability is taken as the primitive rather than a defined notion.” Proponents of this view include Kadane et al. (1999, 224–25, notation adapted), who see the paradox as a defect of Kolmogorov’s theory: “The seeming contradiction is often resolved by claiming that the transformation of variables only yields conditional probability given the sigma field of events determined by the random variable … not given individual events in the sigma field. This approach is unacceptable from the point of view of the statistician who, when given the information that X = Y has occurred, must determine the conditional distribution of X.”

We think this view is mistaken for two reasons. First, the tension between Uniqueness and Equivalence is formalism neutral. It exists insofar as one grants that P ( A | B ) equals the limit of P ( A | B n ) or that both the Independence Principle and the Dependence Principle are true. And these principles seem attractive even if one takes conditional probability to be primitive rather than derivative.

Second, none of the existing alternative probability formalisms really solves the puzzle, conceptualized as a tension between Uniqueness and Equivalence. For instance, in the projectile example, if we model P(·|·) as a primitive two-place function (as in Rényi 1955; Popper 1959), then there will be many P(·|·)’s that are compatible with the setup but differ on the value assigned to P(X > 1/2 | X = Y). In fact, one can prove that for any answer obtained from the limits method discussed in section 2.1, there exists a Popper function P(·|·) that yields that answer and is otherwise consistent with the setup.Footnote 6 Similarly, if we allow P to range over infinitesimals in addition to positive reals between 0 and 1, then the value of P(X > 1/2 | X = Y) typically depends on the choice of ultrafilters that specify how to sum up infinitesimal probabilities (see, e.g., Benci, Horsten, and Wenmackers 2016). In general, these nonstandard formalisms do not settle whether P(X > 1/2 | X = Y) = 1/2. They agree that this question can only be answered with respect to an appropriate Φ, where Φ is a primitive conditional probability function, an ultrafilter, a limiting procedure, a conditioning sigma field, and so on, but they leave open how a particular choice of Φ can be justified.Footnote 7

4. The Urgency of the Paradox

We have been arguing that the paradox is formalism neutral, and in this sense, even if you do not subscribe to the Kolmogorovian formalism, the Borel-Kolmogorov paradox is your paradox too (cf. Hájek 2007). But it is worth mentioning that even if you doubt this conclusion, the paradox should still concern you for a further reason.

Scientists use the standard formalism. They use the formalism to represent continuous chancy quantities and, crucially, to infer claims about conditional probabilities involving those quantities. For example, Sauve et al. (1999) consider a gamma ray shot toward an obstacle and allowed to scatter. They ask for the probability that the ray’s out-of-plane scatter angle ϕ is greater than a, given that the ray’s Compton scatter angle θ equals b, and reason from the premise that ϕ and θ are independent to the answer P(ϕ > a | θ = b) = P(ϕ > a), which they then use in further calculations. However, our discussion in section 2 casts into doubt the reliability of this independence-based reasoning. Thus, a question arises whether to try to recover the scientists’ judgment or to somehow explain away the role of these conditional probabilities in the model, not only in this case but also in many other cases in which theorists and practitioners apply the methods from section 2. We think discarding the judgments is a last resort, so we should explore strategies for resolving the paradox that might vindicate them.

5. Toward a Resolution

Since Uniqueness and Equivalence conflict, the first step in a proposed resolution to the paradox is to reject at least one of these premises. We think that rejecting Equivalence is a last resort. In the setting of rational credence, Cr(A|B) may plausibly differ from Cr(A|B′) even when B and B′ are metaphysically necessarily equivalent. This may happen, for instance, if rational credence is sensitive to intensional or hyperintensional differences; for example, our credences conditional on Hesperus is Phosphorus may differ from our credences conditional on Phosphorus is Phosphorus.Footnote 8 However, we find this kind of failure of Equivalence much less plausible in the case of physical probability. Physical probability, it seems, should not be sensitive to intensional and hyperintensional differences, at least not in ordinary scenarios like the projectile example.

We propose instead rejecting Uniqueness, in particular the determinability clause. In the projectile scenario, while there is a single true value of the physical probability P ( X > 1 / 2 | X = Y ) , which does not change depending on how X = Y is viewed or represented, this value is not determinable from the information provided. Either P ( X > 1 / 2 | X = Y ) taking the value it does is a primitive matter or, more plausibly, it takes the value it does in virtue of physical features of the chance setup that were not explicitly specified.Footnote 9

What kind of physical features could those be? On our treatment, this is the central question. While we do not have a general answer to this question, we think an example from statistical mechanics (Kac and Slepian 1959) offers us an important clue. We conclude the article by examining this case.

Consider a particle whose trajectory is a Gaussian process {x(t)}_t. In particular, fixing a time t, the displacement x(t) from the origin follows a Gaussian distribution with mean 0 meters and standard deviation 1 meter. Let v(t) denote the velocity of the particle at t, which we suppose also follows a Gaussian distribution and is independent of x(t). Given this setup, what is P(v(0) > 1/2 | x(0) = a)? In other words, what is the probability that the particle is fast at t = 0, given that it is located at point a?

Independence considerations yield the answer

P(v(0) > 1/2 | x(0) = a) = P(v(0) > 1/2) ≈ 0.31.

This is equivalent to the answer we obtain by fixing the time t = 0 and considering the limit as x(0) gets closer to the point a:

lim_{ε→0} P(v(0) > 1/2 | |x(0) − a| ≤ ε) = P(v(0) > 1/2) ≈ 0.31.

This procedure, depicted in figure 2A, is called taking the ‘vertical window’.

Figure 2. Vertical (A) and horizontal (B) windows.

However, suppose instead we consider the ‘horizontal window’ where we hold fixed the point a and consider a shrinking interval of time toward t = 0 (fig. 2B):

lim_{ε→0} P(v(0) > 1/2 | x(t) = a for some t ∈ [−ε, 0]) ≈ 0.47.

This second procedure yields a different answer, similarly to the situation in section 2.
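For intuition, here is a Monte Carlo sketch of the two windows (our own construction, not Kac and Slepian’s). We take x(0) and v(0) to be independent standard normals, as in the setup, and approximate the path near t = 0 by its linearization x(−s) ≈ x(0) − v(0)·s, so that ‘x(t) = a for some t ∈ [−ε, 0]’ becomes ‘a lies between x(0) − v(0)·ε and x(0)’. The level a, the window widths, and the unit velocity variance are assumptions of the sketch.

```python
# Vertical vs. horizontal window, estimated by Monte Carlo with a linearized path.
import numpy as np

rng = np.random.default_rng(3)
n = 5_000_000
a = 1.0                      # the conditioning level, x(0) = a
x0 = rng.standard_normal(n)  # displacement at t = 0
v0 = rng.standard_normal(n)  # velocity at t = 0, independent of x0

delta, eps = 0.01, 0.01
# Vertical window: |x(0) - a| <= delta.
vertical = np.abs(x0 - a) <= delta
# Horizontal window: the (linearized) path crosses level a during [-eps, 0],
# i.e., a lies between x(0) - v(0)*eps and x(0).
lo = np.minimum(x0 - v0 * eps, x0)
hi = np.maximum(x0 - v0 * eps, x0)
horizontal = (lo <= a) & (a <= hi)

print(f"vertical window:   P(v(0) > 1/2 | ...) ~ {np.mean(v0[vertical] > 0.5):.3f}")
print(f"horizontal window: P(v(0) > 1/2 | ...) ~ {np.mean(v0[horizontal] > 0.5):.3f}")
```

In our runs the vertical window lands near 0.31, while the horizontal window lands noticeably higher (in the mid-0.40s under this crude linearization and unit velocity variance), reproducing the qualitative gap between the two procedures rather than the exact figures above.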

So far, these findings parallel our earlier findings in the projectile case. However, in this case there is a more obvious physical consideration that breaks the stalemate between these two answers—and surprisingly, not in favor of the independence-based verdict. The consideration is that the process governing the particle is ergodic. Roughly speaking, ergodicity means the probabilistic properties of the process x(t) can be deduced from its frequency properties in a relevant time ensemble. It turns out that the horizontal window procedure is the only one that respects this equality between the probability and the time-ensemble limiting frequency. As Cramér and Leadbetter (2004, 221; notation adapted) write:

Further, one of the main properties which we shall require in many cases is that the conditional probability function should admit an appropriate interpretation in terms of a limit of relative frequencies. In the simple example given above, suppose that t_i denotes the time points in (0, T) at which crossings of the level a [by the variable x] occur. Then, one may wish to show that the proportion of those t_i for which v(t_i) > 1/2 converges (with probability one) to P(v(0) > 1/2 | x(0) = a) as T → ∞, under an appropriate definition of the conditional probability. The definition to be given will be such that “ergodic properties” of this nature may be demonstrated under appropriate conditions. Specifically, suppose we replace the conditioning event a − ε < x(0) ≤ a in P(v(0) > 1/2 | a − ε < x(0) ≤ a) by the event “x(t) = a for some t in the interval [−ε, 0].” A conditional probability P_2(v(0) > 1/2 | x(0) = a) may then be defined by [the horizontal window definition above].

Granted, these specific considerations will not generalize to all chancy processes where the paradox arises. But it is a useful illustration of how, rather than rejecting Equivalence, we might appeal to further physical features of the chance setup to determine the answer. The example also serves to further illustrate that the paradox discussed in this article, concerning the conflict between Equivalence and Uniqueness, is not just a contrived mathematical puzzle. It arises in routine scientific modeling and should be taken seriously.

Appendix. Proofs

To prove fact 1, we use the following lemma, which we leave as an exercise to the reader:

Lemma 1. Let X, Y be independent real-valued random variables with identical, uniform marginal distributions. Let P be a countably additive probability function. If P ( X = Y ) > 0 , then both X and Y take on only finitely many values.

Proof of fact 1. By the lemma it suffices to show P ( X = x | X = Y ) = P ( X = x ) for all x. If P ( X = x ) = 0 we are done, so suppose P ( X = x ) = P ( Y = x ) = α > 0 . Then

P(X = x | X = Y) = P(X = x, Y = x) / Σ_{x′} P(X = x′, Y = x′) = α·P(X = x) / (α · Σ_{x′ : P(X = x′) > 0} P(X = x′)) = P(X = x).

QED

Proof of fact 2. Note that

P(a < W ≤ b | W = V) ≤ P(a < W ≤ b, a < V ≤ b) / P(W = V) = P(a < W ≤ b)² / P(W = V)

for each interval (a, b]. Two cases: either (1) there is an (a, b] such that 0 < P(a < W ≤ b) < P(W = V), or (2) there is not. (1) For that (a, b], if P(W > b | W = V) ≠ P(W > b), we are done. Otherwise, P(W > a | W = V) = P(a < W ≤ b | W = V) + P(W > b | W = V) < P(a < W ≤ b) + P(W > b) = P(W > a). (2) This implies that W is discrete, with its atoms at least of weight P(W = V).Footnote 10 Since W and V are identically distributed, the same holds for V. Now since V is not uniform over its support, there are v, v′ such that P(V = v) > P(V = v′) > 0. Thus,

P(W = v′ | W = V) / P(W = v | W = V) = P(W = v′)·P(V = v′) / (P(W = v)·P(V = v)) < P(W = v′) / P(W = v),

and so either P(W = v′ | W = V) ≠ P(W = v′) or P(W = v | W = V) ≠ P(W = v) or both. Since a change in W’s posterior distribution implies a change in its cumulative distribution function, there must exist a w that satisfies the desired condition. QED
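As a sanity check on facts 1 and 2 (not part of the proofs), one can compute the relevant conditional probabilities exactly for small discrete distributions with P(X = Y) > 0; the supports and weights below are arbitrary.

```python
# Exact check of facts 1 and 2 for i.i.d. discrete X, Y (resp. W, V).
from fractions import Fraction as F

def cond_tail_given_equal(values, probs, t):
    """P(X > t | X = Y) for X, Y i.i.d. with the given finite pmf."""
    p_equal = sum(p * p for p in probs)
    p_tail_and_equal = sum(p * p for v, p in zip(values, probs) if v > t)
    return p_tail_and_equal / p_equal

def tail(values, probs, t):
    return sum(p for v, p in zip(values, probs) if v > t)

vals = [1, 2, 3, 4]

# Fact 1: uniform pmf -> conditioning on X = Y changes no tail probability.
unif = [F(1, 4)] * 4
print(all(cond_tail_given_equal(vals, unif, t) == tail(vals, unif, t) for t in vals))

# Fact 2: non-uniform pmf -> some tail probability is affected.
skew = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]
print([(t, cond_tail_given_equal(vals, skew, t), tail(vals, skew, t)) for t in vals])
```

With the uniform weights the check prints True; with the skewed weights the conditional and unconditional tails visibly disagree (e.g., 3/11 versus 1/2 at the threshold t = 1).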

Footnotes

We are grateful to Jacob Barandes, David Builes, Eliya Cohen, Kenny Easwaran, Adam Elga, Hans Halvorson, Michele Odisseas Impagnatiello, Jim Joyce, Boris Kment, Kyle Landrum, Sarah McGrath, Chris Register, Laura Ruetsche, Alejandro Naranjo Sandoval, Haley Schilling, and audiences at the Harvard Foundations of Physics Series and Princeton and MIT for their very helpful feedback and advice. The title of this article is inspired by the similar title “The Reference Class Problem Is Your Problem Too” by Alan Hájek (2007).

1. At least for now, we stay neutral on whether determinability is understood in a metaphysical sense (e.g., supervenience) or an epistemic sense (e.g., derivability).

2. To lend credence to the second clause, note that if P(X = Y) > 0, then the probability of fast given same speed is indeed determinable on the basis of the information specified above.

3. Here we have in mind nomological (or physical) necessity, although similar puzzles will arise even if we strengthen to metaphysical or logical necessity.

4. The mathematical backbone of this case is due to Lindley (1982) and Rescorla (2015).

5. The support of a real-valued random variable X is the set {x ∈ ℝ : P(X ∈ B(x, r)) > 0 for all r > 0}, where B(x, r) is the set of points whose distance from x is less than r.

6. This follows from theorem 5 of Dubins (1975).

7. That being said, one’s choice of formalism is not entirely irrelevant to the paradox. Different formalisms may fit more or less naturally with certain approaches to resolving the paradox. Kolmogorov’s formalism, for example, lends itself very naturally to approaches that reject Equivalence, whereas if one models probabilities using primitive conditional probability functions, Equivalence follows as a theorem of their axioms.

8. Rescorla (2015) analyzes the Borel-Kolmogorov paradox in the subjective setting and pursues this kind of idea.

9. A related treatment of the paradox is the ‘relativization’ approach (Kolmogorov 1933; Jaynes 2003; Easwaran 2008; Gyenis et al. 2017). The idea is that conditional probability is a three-place relation; the probability of A given B is always relativized to a sigma field that represents some “hypothetical experiment” from which B is drawn. Our proposal goes beyond this approach in two respects. First, while we think conditional (physical) probabilities are sensitive to contextual information about the chance setup, we are agnostic here about (i) whether this information is exhausted by information about the experiment, if there is one, and (ii) whether it can always be mathematically represented as a sigma field of events. Second, as Rescorla (2015, 755) points out, the relativization approach does not “explain why conditional probabilities are relativized” but merely “stipulates that they are.” Our proposal gives a partial explanatory story: as one might expect, physical probabilities are sensitive to physical features of the chance setup; what the paradox shows is that these features may not always be fully specified in textbook toy examples.

10. We leave this step as an exercise for the reader.

References

Arnold, B., and Robertson, C. 2002. “The Conditional Distribution of X Given X = Y Can Be Almost Anything!” In Advances on Theoretical and Methodological Aspects of Probability and Statistics, ed. Balakrishnan, N., 75–81. New York: Taylor & Francis.
Benci, V., Horsten, L., and Wenmackers, S. 2016. “Infinitesimal Probabilities.” British Journal for the Philosophy of Science 69 (2): 509–52.
Cramér, H., and Leadbetter, M. R. 2004. Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications. New York: Dover.
Dubins, L. E. 1975. “Finitely Additive Conditional Probabilities, Conglomerability and Disintegrations.” Annals of Probability 3 (1): 89–99.
Easwaran, K. K. 2008. The Foundations of Conditional Probability. Berkeley: University of California Press.
Gyenis, Z., Hofer-Szabó, G., and Rédei, M. 2017. “Conditioning Using Conditional Expectations: The Borel-Kolmogorov Paradox.” Synthese 194 (7): 2595–630.
Hájek, A. 2003. “What Conditional Probability Could Not Be.” Synthese 137 (3): 273–323.
Hájek, A. 2007. “The Reference Class Problem Is Your Problem Too.” Synthese 156 (3): 563–85.
Jaynes, E. T. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Kac, M., and Slepian, D. 1959. “Large Excursions of Gaussian Processes.” Annals of Mathematical Statistics 30 (4): 1215–28.
Kadane, J. B., Schervish, M. J., and Seidenfeld, T. 1999. “Statistical Implications of Finitely Additive Probability.” In Rethinking the Foundations of Statistics, ed. Kadane, J. B., Schervish, M. J., and Seidenfeld, T., 211–32. Cambridge: Cambridge University Press.
Kolmogorov, A. N. 1933. Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
Kolmogorov, A. N. 1956. Foundations of the Theory of Probability. Trans. N. Morrison. 2nd ed. New York: Chelsea.
Lindley, D. 1982. “The Bayesian Approach to Statistics.” In Some Advances in Statistics, ed. Oliveira, J. de and Epstein, B. New York: Academic Press.
Popper, K. R. 1959. The Logic of Scientific Discovery. London: Routledge.
Rao, M. 1988. “Paradoxes in Conditional Probability.” Journal of Multivariate Analysis 27 (2): 434–46.
Rényi, A. 1955. “On a New Axiomatic Theory of Probability.” Acta Mathematica Hungarica 6 (3–4): 285–335.
Rescorla, M. 2015. “Some Epistemological Ramifications of the Borel-Kolmogorov Paradox.” Synthese 192 (3): 735–67.
Sauve, A. C., Hero, A., Rogers, W. L., Wilderman, S., and Clinthorne, N. 1999. “3D Image Reconstruction for a Compton SPECT Camera Model.” IEEE Transactions on Nuclear Science 46 (6): 2075–84.