1. Introduction
A projectile is about to be launched at an obstacle. Suppose the collision is a chancy process, such that X, the speed of the projectile right after impact, might range anywhere from 0 to 1 meters per second, with each value equally likely. For instance, the probability of fast, the event that X exceeds half a meter per second, is 1/2. At the same time, a second projectile is to be launched toward another obstacle. The possible values of Y, the speed of this projectile right after impact, also range from 0 to 1 meters per second, with each value equally likely. Suppose X and Y are causally and probabilistically independent. Given this setup, what is $P(X > 1/2 \mid X = Y)$? In other words, what is the probability of fast conditional on same speed, the event that X = Y?
A natural thought is that the probability of fast should not be affected by same speed: $P(X > 1/2 \mid X = Y) = P(X > 1/2) = 1/2$. Perhaps you do not share this thought, however, and are unready to commit to an answer until you have performed some calculations. In that case, you may still be willing to endorse the following weaker claim:
Uniqueness. In this setup, the physical probability of fast given same speed is well defined and determinable on the basis of the information given above.Footnote 1
In support of Uniqueness, note that fast and same speed are consistent, well-defined physical events. It would be odd to insist that there is no fact of the matter regarding the chance of one given the other. Moreover, it seems we have fully specified the relevant probabilistic and causal structure of the setup. What further information is needed?Footnote 2
Uniqueness is an attractive assumption. However, consider the following equally, if not more, attractive claim:
Equivalence. If B is necessarily equivalent to same speed, then the physical probability of fast given B equals the probability of fast given same speed (if well defined).Footnote 3
The thought is that if A, B are necessarily equivalent then one cannot happen without the other, and so (e.g.) if the occurrence of A raises the chance of C, then the occurrence of B must raise the chance of C and vice versa.
Unfortunately, Uniqueness and Equivalence are jointly untenable. Moreover, the problem is not restricted to this case. The projectile example was designed to be mathematically simple, but there are less artificial examples in which the same issue arises. Later in the article, we will see an example from statistical mechanics (Kac and Slepian 1959).
Although this formulation in terms of physical probability, Uniqueness, and Equivalence is new, the mathematical aspect of this puzzle is familiar. It is a Borel-Kolmogorov-type paradox, in which conditioning on the same event B, where $P(B) = 0$, seems to lead to different values of $P(A \mid B)$ depending on how B is ‘viewed’ (Kolmogorov 1933; Rao 1988). How to interpret this phenomenon is a controversial question. For instance, Kolmogorov suggests that this type of paradox “shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals 0 is inadmissible” (1956, 51). Others see the paradox as a problem internal to Kolmogorov’s probability formalism (Kadane, Schervish, and Seidenfeld 1999; Hájek 2003; cf. Gyenis, Hofer-Szabó, and Rédei 2017).
Why rehash this well-known paradox in the physical probability setting? We believe the paradox gains new force and significance when considered in the context of the physical sciences. After presenting the conflict between Uniqueness and Equivalence, we point out that the paradox arises regardless of whether we use the standard Kolmogorov formalism or axiomatize conditional probability directly. We also point out that the paradox has a new urgency because it calls into question the reliability of several probability judgments made by scientists. We then propose a way to make headway on resolving the paradox. The proposal is inspired by the approach that scientists took toward an instance in statistical mechanics.
2. Uniqueness and Equivalence
In this section we survey multiple methods for determining the probability of fast given same speed and show that, given Equivalence, each yields incompatible answers depending on how it is applied, suggesting a conflict between Uniqueness and Equivalence.
2.1. Limits
In elementary probability theory, there is a standard method for computing conditional probabilities given probability-zero events, which we will call the limits method. Roughly speaking, one calculates $P(A \mid B)$ as the limit of $P(A \mid B_n)$, where the $B_n$'s are an appropriately chosen sequence of positive-probability events that converge to B.
Recall the projectile setup above. The idea is to consider the positive-probability event that X is close to Y, close speed $= \{|X - Y| < \varepsilon\}$, and then compute the limit of the conditional probability as ε shrinks. Under our assumption that X and Y are independent and both uniformly distributed, the calculation

$$\lim_{\varepsilon \to 0} P(X > 1/2 \mid |X - Y| < \varepsilon)$$

yields 1/2, as expected. To see why, we can picture $|X - Y| < \varepsilon$ as a band of thickness ε around the diagonal line $X = Y$, as shown in figure 1A. To determine the conditional probability of fast given the event close speed, one takes the proportion of this band that overlaps with the right-hand side of the square (the area corresponding to $X > 1/2$). By the symmetry of the band (the reflection $(x, y) \mapsto (1 - x, 1 - y)$ maps the band onto itself and swaps the two halves of the square), this proportion is exactly 1/2 and stays at 1/2 as the band shrinks.
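The symmetry argument can be checked numerically. The sketch below is a plain Monte Carlo estimate under the setup's assumptions (X and Y independent and uniform on [0, 1]); the function name, sample size, and seed are our own illustrative choices. For every ε the estimate should hover around 1/2, up to sampling noise.

```python
import random


def estimate_fast_given_close(eps: float, n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X > 1/2 | |X - Y| < eps) for X, Y iid Uniform(0, 1).

    Draws independent uniform pairs, keeps only those inside the band
    |X - Y| < eps, and returns the fraction of kept pairs with X > 1/2.
    """
    rng = random.Random(seed)
    kept = fast = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if abs(x - y) < eps:
            kept += 1
            if x > 0.5:
                fast += 1
    if kept == 0:
        raise ValueError("no samples landed in the band; increase n_samples")
    return fast / kept


for eps in (0.2, 0.05, 0.01):
    print(eps, round(estimate_fast_given_close(eps), 3))
```

Sampling is used here only because it mirrors the "shrinking band" picture directly; the exact value for this band is 1/2 for every ε.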

Figure 1. Two measures of closeness. A, $|X - Y| < \varepsilon$; B, $|X/Y - 1| < \varepsilon$.
Crucially, however, different ways of quantifying the “closeness” between the two speeds lead to different results when the same method is applied.Footnote 4 We have considered whether the difference between the two recoil speeds is small, $|X - Y| < \varepsilon$. But we might also consider whether their ratio is almost 1, $|X/Y - 1| < \varepsilon$. And if we instead consider the limit as this ε shrinks, the calculation

$$\lim_{\varepsilon \to 0} P(X > 1/2 \mid |X/Y - 1| < \varepsilon)$$

yields 3/4, a value greater than 1/2. Why? For a fixed absolute difference, smaller speeds have ratios farther from 1 (e.g., 0.01 and 0.001 have a ratio of 10, whereas 0.91 and 0.901 have a ratio near 1), and so X/Y being close to 1 raises the probability that X is large. The situation is represented in figure 1B. The lopsided band corresponds to the event $|X/Y - 1| < \varepsilon$; it widens as the speeds grow. The proportion of the band that overlaps with the right-hand side is clearly greater than the proportion that overlaps with the left-hand side.
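The ratio band can be handled the same way, but here a deterministic integral is easy to write down: for fixed x, the condition |x/y − 1| < ε holds exactly when x/(1 + ε) < y < x/(1 − ε). The sketch below (an illustrative helper of our own, using the midpoint rule) integrates the length of that y-interval, clipped to [0, 1], and shows the conditional probability settling near 3/4 as ε shrinks. Analytically, the band's width at x is proportional to x, so the limiting conditional density of X on the diagonal is 2x, which gives 3/4.

```python
def prob_fast_ratio_band(eps: float, n_grid: int = 100_000) -> float:
    """P(X > 1/2 | |X/Y - 1| < eps) for X, Y iid Uniform(0, 1), with 0 < eps < 1.

    For fixed x in (0, 1], |x/y - 1| < eps is equivalent to
    x / (1 + eps) < y < x / (1 - eps); we integrate the length of that
    y-interval (clipped to [0, 1]) over x with the midpoint rule.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    h = 1.0 / n_grid
    total = fast = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h
        lo = x / (1 + eps)
        hi = min(1.0, x / (1 - eps))
        area = max(0.0, hi - lo) * h
        total += area
        if x > 0.5:
            fast += area
    return fast / total


for eps in (0.1, 0.01, 0.001):
    print(eps, round(prob_fast_ratio_band(eps), 4))
```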
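Let’s make the tension between these two results more precise. On the one hand, defining $P(A \mid Z = z)$ as $\lim_{\varepsilon \to 0} P(A \mid |Z - z| < \varepsilon)$, the limits method implies that:
1. $P(X > 1/2 \mid X - Y = 0) = 1/2$,
2. $P(X > 1/2 \mid X/Y = 1) = 3/4$.
On the other hand, since $X = Y$ is necessarily equivalent to both $X - Y = 0$ and $X/Y = 1$, Equivalence implies:
3. $P(X > 1/2 \mid X - Y = 0) = P(X > 1/2 \mid X/Y = 1)$.
And these three claims are inconsistent, assuming Uniqueness (in particular, assuming $P(X > 1/2 \mid X = Y)$ is well defined).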
Now, one might insist that the difference $X - Y$ is the correct or more natural way to quantify closeness. First note, though, that we do not always take difference as the intuitive measure of closeness. Consider the population data of two counties C and D in 1918 and 2018 shown in table 1. The absolute difference between the populations is the same in both years, but there is an intuitive sense in which the population of C is “closer” to the population of D in 2018 than it was in 1918. Second, even if we grant that absolute difference is the correct measure of closeness, whether the response succeeds is sensitive to one’s choice of units. Imagine the Logarians are in every aspect like their earthly counterparts, except that they measure lengths in logameters rather than meters and speeds in logameters per second rather than meters per second. The Logarians might agree that absolute difference is the natural measure of closeness. However, because of their choice of scale, they will judge two speeds as close in absolute difference whenever we judge them as close in ratio. For the Logarians, the closer the two speeds are stipulated to be, the more likely it is for them to be fast. And so in the limit, when two speeds are so close that they are equal, the limits method will yield the verdict that the probability of fast given same speed is greater than 1/2. In fact, by choosing a suitable rescaling of units, one can make the limiting conditional probability any value one wants (Arnold and Robertson 2002).
Table 1. Population of C and D by Year
County | 1918 | 2018
---|---|---
C | 10 | 100
D | 20 | 110
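The claim that a suitable rescaling can produce any limiting value can be made concrete with a hypothetical one-parameter family of scales, g(x) = x^k, with g(x) = log x playing the role of k = 0 (the Logarians' scale). A standard change-of-variables argument, which we sketch rather than prove, says that measuring closeness as |g(X) − g(Y)| < ε makes the limiting conditional density of X on the diagonal proportional to 1/|g′(x)|, here x^(1−k). The code below just evaluates the resulting closed form and inverts it; it is an illustration under these assumptions, not the construction used by Arnold and Robertson.

```python
import math


def limiting_prob_fast(k: float) -> float:
    """Limit as eps -> 0 of P(X > 1/2 | |g(X) - g(Y)| < eps), X, Y iid Uniform(0, 1).

    Hypothetical scale: g(x) = x**k (g(x) = log x when k == 0). The limiting
    conditional density of X is taken to be proportional to 1/|g'(x)|, i.e. to
    x**(1 - k); integrating over (1/2, 1] and (0, 1] gives 1 - (1/2)**(2 - k).
    """
    if k >= 2:
        # x**(1 - k) is not integrable near 0, so this closed form does not apply.
        raise ValueError("the normalizing integral diverges for k >= 2")
    return 1.0 - 0.5 ** (2.0 - k)


def exponent_for_target(p: float) -> float:
    """Exponent k for which limiting_prob_fast(k) equals p, for any 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 2.0 + math.log2(1.0 - p)


print(limiting_prob_fast(1))     # difference scale (k = 1)
print(limiting_prob_fast(0))     # log scale, i.e. ratio closeness (k = 0)
print(exponent_for_target(0.9))  # scale that makes the limit 0.9
```

With k = 1 the formula recovers the difference-band answer of 1/2, and with k = 0 it recovers the ratio-band answer of 3/4.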

2.2. Independence Reasoning
Reflecting on the failure of the limits method, one might think it does not make full use of a key piece of information about the setup, namely, that X and Y are both probabilistically and causally independent. Perhaps these independence features can allow us to determine that the answer is 1/2 without Equivalence raising trouble.
As a first attempt, one might propose the following principle:
Naive Independence. If two physical quantities X and Y are causally and probabilistically independent of each other, then the chance that X takes a specific value or range of values is unaffected by whether Y happens to sync up with X. In particular, for any x, the probability of $X \le x$ given $X = Y$ equals the unconditional chance of $X \le x$.
Since X is causally and probabilistically independent of Y in the projectile scenario, Naive Independence yields $P(X > 1/2 \mid X = Y) = P(X > 1/2) = 1/2$, as desired.
As the name suggests, however, Naive Independence is false. It even gives the wrong verdict in simple finite cases. Imagine Jonny tosses a coin with bias a on the North Pole and Donny tosses a coin with bias b on the South Pole, where the two coin tosses are causally and probabilistically independent of one another. Let heads denote the event that Jonny’s coin lands heads, and let same side denote the event that Jonny and Donny’s coins land on the same side. The unconditional chance of heads is a. What is the probability of heads given same side? Naive Independence entails the answer is a. But suppose, for example, $a = 1/2$ and $b = 3/4$. Then

$$P(\text{heads} \mid \text{same side}) = \frac{ab}{ab + (1-a)(1-b)} = \frac{3/8}{3/8 + 1/8} = \frac{3}{4} \neq \frac{1}{2} = P(\text{heads}).$$
In general, one can check that P(heads | same side) = P(heads) if and only if Donny’s coin is unbiased ($b = 1/2$), assuming $0 < a < 1$. Intuitively, this is because if Donny’s coin is biased toward heads (tails), then learning same side makes it more likely that Jonny’s coin landed heads (tails). An upshot here is that causal and probabilistic independence of physical quantities is not always a direct guide to conditional chances involving those quantities.
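Because the coin case is finite, the conditional probability is a ratio of two products and can be computed exactly. The helper below (a hypothetical name of our own) uses Python's `fractions.Fraction` to avoid rounding: a = 1/2 with b = 3/4 gives 3/4, while any a paired with an unbiased b = 1/2 gives back a.

```python
from fractions import Fraction


def prob_heads_given_same_side(a: Fraction, b: Fraction) -> Fraction:
    """P(Jonny's coin lands heads | both coins land on the same side).

    Tosses are independent; a is the bias of Jonny's coin, b of Donny's.
    """
    both_heads = a * b
    both_tails = (1 - a) * (1 - b)
    return both_heads / (both_heads + both_tails)


half = Fraction(1, 2)
print(prob_heads_given_same_side(half, Fraction(3, 4)))           # biased Donny
print(prob_heads_given_same_side(Fraction(1, 3), Fraction(1, 2)))  # unbiased Donny
```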
Nevertheless, the observation that heads is independent of same side when Donny’s coin is unbiased also suggests a charitable amendment to Naive Independence. Crucially, in the projectile example, Y is uniformly distributed, not ‘biased’ toward low or high speeds. At least in that special case, the thought goes, the occurrence or truth of $X = Y$ should not affect the probability of $X \le x$.
Independence Principle. If two physical quantities X and Y are causally and probabilistically independent of each other, have the same (marginal) distribution, and are uniformly distributed over their support,Footnote 5 then the distribution of X is unaffected by whether X happens to sync up with Y. In particular, for any x the chance of $X \le x$ given $X = Y$ equals the unconditional chance of $X \le x$.
This principle also implies the desired verdict of 1/2 in the projectile case. And it fares much better than Naive Independence. Indeed, one can prove that if chances are probabilities that satisfy the standard Kolmogorov axioms, then the principle is true whenever $P(X = Y) > 0$.
Fact 1. Suppose X and Y are probabilistically independent with identical uniform distributions. If $P(X = Y) > 0$, then $P(X \le x \mid X = Y) = P(X \le x)$ for any x. (See the appendix for the proof.)
It would be very odd, one might think, if the Independence Principle were false only if $P(X = Y) = 0$. Intuitively, the truth of this principle should not be sensitive to the chance of $X = Y$.
Remarkably, though, trouble looms even for this revised principle. First notice that analogous reasoning supports the following dependence principle:
Dependence Principle. If two physical quantities W and V are causally and probabilistically independent of each other, have the same (marginal) distribution, and are not uniformly distributed over their support, then there exists a w such that the probability of $W \le w$ given $W = V$ does not equal the unconditional chance of $W \le w$.
Indeed, one can prove that this principle is also true in the case in which $P(W = V) > 0$.
Fact 2. Suppose W and V are probabilistically independent, have the same (marginal) distribution, and are not uniform over their support. If $P(W = V) > 0$, then there is a w such that $P(W \le w \mid W = V) \neq P(W \le w)$.
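In the finite case both facts reduce to one observation: for independent, identically distributed W and V, $P(W = w, W = V) = P(W = w)^2$, so conditioning on W = V reweights each atom by its own probability. The sketch below (illustrative code of our own, not the appendix proof) applies that reweighting to a uniform and to a non-uniform pmf: the uniform one comes back unchanged, in line with Fact 1, while the non-uniform one shifts toward its heavier atoms, so its CDF changes, in line with Fact 2.

```python
from fractions import Fraction


def conditional_on_tie(pmf: dict) -> dict:
    """Distribution of W given W = V, for W, V independent with the same pmf.

    P(W = w, W = V) = P(W = w) * P(V = w) = pmf[w] ** 2, so the conditional
    pmf is pmf[w] ** 2 divided by P(W = V), the sum of the squared masses.
    """
    tie = sum(p * p for p in pmf.values())
    if tie == 0:
        raise ValueError("P(W = V) = 0; the ratio formula does not apply")
    return {w: p * p / tie for w, p in pmf.items()}


def cdf(pmf: dict, w) -> Fraction:
    """P(W <= w) for a finite pmf."""
    return sum((p for v, p in pmf.items() if v <= w), Fraction(0))


uniform = {v: Fraction(1, 4) for v in range(4)}
skewed = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
print(conditional_on_tie(uniform) == uniform)               # unchanged
print(cdf(skewed, 0), cdf(conditional_on_tie(skewed), 0))   # CDF shifts
```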
However, now we can derive a conflict, as follows. Consider the quantities $E_X = \frac{1}{2} m X^2$ and $E_Y = \frac{1}{2} m Y^2$, which represent the kinetic energy of the projectiles after impact, where we suppose the two projectiles have the same mass m. Note that $E_X$ and $E_Y$ are not uniformly distributed and, more generally, satisfy the conditions of the Dependence Principle, with $W = E_X$ and $V = E_Y$. Thus, by the Dependence Principle,
1. There exists an a such that $P(E_X \le a \mid E_X = E_Y) \neq P(E_X \le a)$.
Yet, by the Independence Principle:
2. For all x, $P(X \le x \mid X = Y) = P(X \le x)$.
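But since $E_X = E_Y$ is necessarily equivalent to $X = Y$ (speeds are nonnegative), and $E_X \le a$ is the same event as $X \le \sqrt{2a/m}$, Equivalence implies:
3. $P(E_X \le a \mid E_X = E_Y) = P(X \le \sqrt{2a/m} \mid X = Y)$.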
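Again these three claims are inconsistent, assuming Uniqueness (in particular, assuming that the conditional probabilities are well defined). To see this, note that $P(E_X \le a) = P(X \le \sqrt{2a/m})$, so if $P(X \le \sqrt{2a/m} \mid X = Y) = P(E_X \le a \mid E_X = E_Y) \neq P(E_X \le a)$, then $P(X \le x \mid X = Y) \neq P(X \le x)$ for some x, namely $x = \sqrt{2a/m}$, contradicting claim 2.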
2.3. Other Methods
There are other methods to derive $P(X > 1/2 \mid X = Y)$ that one might attempt. For instance, consider the discretization method. Unlike the limits method, which considers $P(X > 1/2 \mid |X - Y| < \varepsilon)$ as ε shrinks, the discretization method begins by modeling the projectile speeds as discrete quantities $X_n$ and $Y_n$, with n distinct possible values, and then taking $\lim_{n \to \infty} P(X_n > 1/2 \mid X_n = Y_n)$.
However, as one might predict, this method yields different results depending on how the quantities are discretized. For example, depending on whether we discretize the state space into increments of equal speed or increments of equal kinetic energy, we get different results even in the limit.
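For finite n the discretization method can be carried out exactly, since a tie $X_n = Y_n$ reweights each bin by the square of its probability. The sketch below compares two hypothetical binnings of our own choosing: n bins of equal width in speed, and n bins of equal width in kinetic energy (the mass m cancels). Under these assumptions, the equal-speed binning gives 1/2 for every even n, whereas the equal-energy binning gives values well below 1/2 that keep shrinking as n grows.

```python
import math


def prob_fast_given_tie(bin_probs, is_fast) -> float:
    """P(fast | X_n = Y_n) for independent copies taking bin k with probability
    bin_probs[k]; given a tie, bin k has weight bin_probs[k] ** 2."""
    weights = [p * p for p in bin_probs]
    fast = sum(w for k, w in enumerate(weights) if is_fast(k))
    return fast / sum(weights)


def equal_speed(n: int) -> float:
    """Bins [k/n, (k+1)/n) in speed, each with probability 1/n.
    For even n, the bins with 2k >= n exactly cover the fast speeds (1/2, 1]."""
    if n % 2:
        raise ValueError("n must be even")
    return prob_fast_given_tie([1.0 / n] * n, lambda k: 2 * k >= n)


def equal_energy(n: int) -> float:
    """Bins of equal width in E = m X**2 / 2: bin k holds speeds
    [sqrt(k/n), sqrt((k+1)/n)), so its probability is the difference of roots.
    For n divisible by 4, the bins with 4k >= n exactly cover the fast speeds."""
    if n % 4:
        raise ValueError("n must be divisible by 4")
    probs = [math.sqrt((k + 1) / n) - math.sqrt(k / n) for k in range(n)]
    return prob_fast_given_tie(probs, lambda k: 4 * k >= n)


for n in (4, 400, 40_000):
    print(n, round(equal_speed(n), 4), round(equal_energy(n), 4))
```

The two binnings describe the same continuous setup; they differ only in which values of X are lumped together before conditioning on a tie.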
We need not belabor the point. Reflecting on all these cases, Uniqueness and Equivalence seem in conflict. Crucially, this is not to say there is no good argument for the answer of 1/2. Perhaps there are good reasons to discretize X in a certain way or to take the difference approximation $|X - Y| < \varepsilon$ rather than the ratio. Our point is simply that such arguments would plausibly have to appeal to physical considerations beyond those supplied in the specification of the chance setup. It seems doubtful that the physical probability of fast given same speed supervenes on, or can be determined from, only the facts about X, Y, and the situation that were explicitly provided. And this is what Uniqueness requires.
3. Is the Formalism to Blame?
It is tempting to chalk up the puzzles and difficulties of the last section to a limitation of our mathematical representation, in particular a limitation of standard Kolmogorov probability theory, according to which conditional probabilities are defined derivatively rather than axiomatized directly. As Gyenis et al. (2017, 2595) write, “According to [a certain popular view], the Borel-Kolmogorov Paradox poses a serious threat for the standard measure theoretic formalism of probability theory, in which conditional probability is a defined concept, and this is regarded as justification for attempts at axiomatizations of probability theory in which the conditional probability is taken as the primitive rather than a defined notion.” Proponents of this view include Kadane et al. (1999, 224–25, notation adapted), who see the paradox as a defect of Kolmogorov’s theory: “The seeming contradiction is often resolved by claiming that the transformation of variables only yields conditional probability given the sigma field of events determined by the random variable … not given individual events in the sigma field. This approach is unacceptable from the point of view of the statistician who, when given the information that
$X = Y$ has occurred, must determine the conditional distribution of X.”
We think this view is mistaken for two reasons. First, the tension between Uniqueness and Equivalence is formalism neutral. It arises as long as one grants that $P(A \mid B)$ equals the limit of $P(A \mid B_n)$ or that both the Independence Principle and the Dependence Principle are true. And these principles seem attractive even if one takes conditional probability to be primitive rather than derivative.
Second, none of the existing alternative probability formalisms really solves the puzzle, conceptualized as a tension between Uniqueness and Equivalence. For instance, in the projectile example, if we model $P(\cdot \mid \cdot)$ as a primitive two-place function (as in Rényi 1955; Popper 1959), then there will be many $P$'s that are compatible with the setup but differ on the value assigned to $P(X > 1/2 \mid X = Y)$. In fact, one can prove that for any answer obtained from the limits method discussed in section 2.1, there exists a Popper function $P$ that yields that answer and is otherwise consistent with the setup.Footnote 6 Similarly, if we allow P to range over infinitesimals in addition to positive reals between 0 and 1, then the value of $P(X > 1/2 \mid X = Y)$ typically depends on the choice of ultrafilters that specify how to sum up infinitesimal probabilities (see, e.g., Benci, Horsten, and Wenmackers 2016). In general, these nonstandard formalisms do not settle whether $P(X > 1/2 \mid X = Y) = 1/2$. They agree that this question can only be answered with respect to an appropriate Φ, where Φ is a primitive conditional probability function, an ultrafilter, a limiting procedure, a conditioning sigma field, and so on, but they leave open how a particular choice of Φ can be justified.Footnote 7
4. The Urgency of the Paradox
We have been arguing that the paradox is formalism neutral, and in this sense, even if you do not subscribe to the Kolmogorovian formalism, the Borel-Kolmogorov paradox is your paradox too (cf. Hájek 2007). But even if you doubt this conclusion, the paradox should still concern you for a further reason.
Scientists use the standard formalism. They use the formalism to represent continuous chancy quantities and, crucially, to infer claims about conditional probabilities involving those quantities. For example, Sauve et al. (1999) consider a gamma ray shot toward an obstacle and allowed to scatter. They ask for the probability that the ray’s out-of-plane scatter angle ϕ is greater than a, given that the ray’s Compton scatter angle θ equals b, and reason from the premise that ϕ and θ are independent to the answer $P(\phi > a \mid \theta = b) = P(\phi > a)$, which they then use in further calculations. However, our discussion in section 2 casts doubt on the reliability of this independence-based reasoning. Thus, a question arises whether to try to recover the scientists’ judgment or to somehow explain away the role of these conditional probabilities in the model, not only in this case but also in many other cases in which theorists and practitioners apply the methods from section 2. We think discarding the judgments is a last resort, so we should explore strategies for resolving the paradox that might vindicate them.
5. Toward a Resolution
Since Uniqueness and Equivalence conflict, the first step in a proposed resolution to the paradox is to reject at least one of these premises. We think that rejecting Equivalence is a last resort. In the setting of rational credence, Cr(A|B) may plausibly differ from Cr(A|B′) even when B and B′ are metaphysically necessarily equivalent. This may happen, for instance, if rational credence is sensitive to intensional or hyperintensional differences; for example, our credences conditional on Hesperus is Phosphorus may differ from our credences conditional on Phosphorus is Phosphorus.Footnote 8 However, we find this kind of failure of Equivalence much less plausible in the case of physical probability. Physical probability, it seems, should not be sensitive to intensional and hyperintensional differences, at least not in ordinary scenarios like the projectile example.
We propose instead rejecting Uniqueness, in particular its determinability clause. In the projectile scenario, while there is a single true value of the physical probability $P(X > 1/2 \mid X = Y)$, which does not change depending on how $X = Y$ is viewed or represented, this value is not determinable from the information provided. Either $P(X > 1/2 \mid X = Y)$ takes the value it does as a primitive matter or, more plausibly, it takes the value it does in virtue of physical features of the chance setup that were not explicitly specified.Footnote 9
What kind of physical features could those be? On our treatment, this is the central question. While we do not have a general answer to this question, we think an example from statistical mechanics (Kac and Slepian 1959) offers us an important clue. We conclude the article by examining this case.
Consider a particle whose trajectory is a Gaussian process $x(t)$. In particular, fixing a time t, the displacement x(t) follows a Gaussian distribution centered at the origin, with a mean of 0 meters and a standard deviation of 1 meter. Let v(t) denote the velocity of the particle at t, which we suppose also follows a Gaussian distribution and is independent of x(t). Given this setup, what is $P(|v(0)| > u \mid x(0) = a)$ for a fixed speed threshold $u > 0$? In other words, what is the probability that the particle is fast at $t = 0$, given that it is located at point a?
Independence considerations yield the answer

$$P(|v(0)| > u \mid x(0) = a) = P(|v(0)| > u).$$
This coincides with the answer we obtain by fixing the time $t = 0$ and considering the limit as x(0) gets closer to the point a:

$$\lim_{\varepsilon \to 0} P(|v(0)| > u \mid |x(0) - a| < \varepsilon) = P(|v(0)| > u).$$
This procedure, depicted in figure 2A, is called taking the ‘vertical window’.

Figure 2. Vertical (A) and horizontal (B) windows.
However, suppose instead we consider the ‘horizontal window’, where we hold fixed the point a and consider a shrinking interval of time toward $t = 0$ (fig. 2B):

$$\lim_{\varepsilon \to 0} P(|v(0)| > u \mid x(t) = a \text{ for some } t \in [-\varepsilon, 0]).$$

This second procedure yields a different answer, as in section 2.
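The gap between the two windows can be seen numerically. The sketch below rests on assumptions of our own that go beyond the text: x(0) ~ N(0, 1) as above, v(0) ~ N(0, σ_v²) independent of it with a hypothetical σ_v, “fast” meaning |v(0)| > u for a hypothetical threshold u, and, over a very short window, the path treated as the straight line x(t) = x(0) + v(0)t. Under these assumptions the vertical window returns P(|v(0)| > u) = erfc(u/(σ_v√2)) whatever a is, while as ε → 0 the horizontal window should approach exp(−u²/(2σ_v²)), which is larger, because faster paths are more likely to cross level a within a short interval.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def window_probs(a: float, u: float, eps: float, sigma_v: float = 1.0, n_grid: int = 20_000):
    """Return (vertical, horizontal) window values of P(|v(0)| > u | x(0) = a).

    Illustrative assumptions: x(0) ~ N(0, 1) and v(0) ~ N(0, sigma_v**2) are
    independent, and on [-eps, 0] the path is linear, x(t) = x(0) + v(0) * t.
    The integral over v is done with the midpoint rule on [-10, 10] * sigma_v.
    """
    half_width = 10.0 * sigma_v
    h = 2.0 * half_width / n_grid
    # Vertical window: P(|x(0) - a| < eps) does not depend on v(0) (independence).
    w_vertical = std_normal_cdf(a + eps) - std_normal_cdf(a - eps)
    num_v = den_v = num_h = den_h = 0.0
    for i in range(n_grid):
        v = -half_width + (i + 0.5) * h
        dens = math.exp(-0.5 * (v / sigma_v) ** 2) / (sigma_v * math.sqrt(2.0 * math.pi)) * h
        # Horizontal window: the linear path hits level a for some t in [-eps, 0]
        # exactly when x(0) lies between a and a + v * eps.
        w_horizontal = abs(std_normal_cdf(a + v * eps) - std_normal_cdf(a))
        den_v += dens * w_vertical
        den_h += dens * w_horizontal
        if abs(v) > u:
            num_v += dens * w_vertical
            num_h += dens * w_horizontal
    return num_v / den_v, num_h / den_h


vertical, horizontal = window_probs(a=0.5, u=1.0, eps=1e-3)
print(round(vertical, 4), round(horizontal, 4))
```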
So far, these findings parallel our earlier findings in the projectile case. However, in this case there is a more obvious physical consideration that breaks the stalemate between these two answers, and surprisingly, not in favor of the independence-based verdict. The consideration is that the process governing the particle is ergodic. Roughly speaking, ergodicity means the probabilistic properties of the process x(t) can be deduced from its frequency properties in a relevant time ensemble. It turns out that the horizontal window procedure is the only one that respects this equality between the probability and time ensemble limiting frequency. As Cramér and Leadbetter (2004, 221; notation adapted) write:
Further, one of the main properties which we shall require in many cases is that the conditional probability function should admit an appropriate interpretation in terms of a limit of relative frequencies. In the simple example given above, suppose that $t_i$ denotes the time points in (0, T) at which crossings of the level a [by the variable x] occur. Then, one may wish to show that the proportion of those $t_i$ for which $|v(t_i)| > u$ converges (with probability one) to $P(|v(0)| > u \mid x(0) = a)$ as $T \to \infty$, under an appropriate definition of the conditional probability. The definition to be given will be such that “ergodic properties” of this nature may be demonstrated under appropriate conditions. Specifically, suppose we replace the conditioning event $x(0) = a$ in $P(|v(0)| > u \mid x(0) = a)$ by the event “$x(t) = a$ for some t in the interval [−ε, 0].” A conditional probability $P(|v(0)| > u \mid x(0) = a)$ may then be defined by [the horizontal window definition above].
Granted, these specific considerations will not generalize to all chancy processes where the paradox arises. But it is a useful illustration of how, rather than rejecting Equivalence, we might appeal to further physical features of the chance setup to determine the answer. The example also serves to further illustrate that the paradox discussed in this article, concerning the conflict between Equivalence and Uniqueness, is not just a contrived mathematical puzzle. It arises in routine scientific modeling and should be taken seriously.
Appendix. Proofs
To prove fact 1, we use the following lemma, which we leave as an exercise to the reader:
Lemma 1. Let X, Y be independent real-valued random variables with identical, uniform marginal distributions. Let P be a countably additive probability function. If $P(X = Y) > 0$, then both X and Y take on only finitely many values.
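Proof of fact 1. By the lemma, X and Y take only finitely many values, say $x_1, \dots, x_n$, each with probability $1/n$, so it suffices to show $P(X = x \mid X = Y) = P(X = x)$ for all x. If $P(X = x) = 0$ we are done, since then $P(X = x \mid X = Y) \le P(X = x)/P(X = Y) = 0$, so suppose $P(X = x) = 1/n$. Then

$$P(X = x \mid X = Y) = \frac{P(X = x)\,P(Y = x)}{\sum_{i=1}^{n} P(X = x_i)\,P(Y = x_i)} = \frac{1/n^2}{n \cdot (1/n^2)} = \frac{1}{n} = P(X = x).$$

QED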
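Proof of fact 2. Note that

$$P(W \in (a, b] \mid W = V) \le \frac{P(W \in (a, b],\ V \in (a, b])}{P(W = V)} = \frac{P(W \in (a, b])^2}{P(W = V)}$$

for each interval (a, b]. Two cases: either (1) there is (a, b] such that $0 < P(W \in (a, b]) < P(W = V)$, or (2) there is not. (1) For that (a, b], if $P(W \in (a, b] \mid W = V) \neq P(W \in (a, b])$, we are done. Otherwise, $P(W \in (a, b]) = P(W \in (a, b] \mid W = V) \le P(W \in (a, b])^2 / P(W = V) < P(W \in (a, b])$, a contradiction. (2) This implies that W is discrete, with its atoms at least of weight $P(W = V)$.Footnote 10 Since W and V are identically distributed, the same holds for V. Now since V is not uniform over its support, there are v, v′ in its support such that $P(V = v) \neq P(V = v')$. Thus,

$$\frac{P(W = v \mid W = V)}{P(W = v' \mid W = V)} = \frac{P(W = v)\,P(V = v)}{P(W = v')\,P(V = v')} \neq \frac{P(W = v)}{P(W = v')},$$

and so either $P(W = v \mid W = V) \neq P(W = v)$ or $P(W = v' \mid W = V) \neq P(W = v')$ or both. Since a change in W’s posterior distribution implies a change in its cumulative distribution function, there must exist a w that satisfies the desired condition. QED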