1. Introduction
Presented with Bayesian confirmation theory, it is easy to feel cheated. One might have hoped for a substantive, detailed account of what sorts of evidence support what sorts of scientific hypotheses. Instead one is told how one’s evidence determines reasonable attitudes toward such hypotheses given a prior (an initial probability function). And one is told that different priors deliver different outputs, even for the same batch of total evidence.
One might worry that given this dependence, Bayesianism is ill-placed to explain the significant agreement observed among reasonable scientists or to deliver an objective account of confirmation in science.Footnote 1 In the face of this worry it is natural to seek comfort from some remarkable long-run convergence-to-the-truth and washing-out theorems. These theorems show that unless priors differ radically, differences between them become negligible in the long run, under the impact of a stream of common evidence. This is sometimes thought to take the sting out of the above worry, by showing that many differences between priors do not end up mattering.
But Belot (2013) argues that rather than helping Bayesianism, such convergence theorems are a liability to it. The argument is that the theorems preclude Bayesians from counting as rational a “reasonable modesty” about whether one’s opinions will approach the truth in the long run.
I will argue:
1. Long-run convergence theorems are no liability to finitely additive Bayesianism, a version of Bayesianism that rejects countable additivity as a requirement of rationality.Footnote 2 Defenders of finitely additive Bayesianism are free to count any amount of humility about convergence to the truth—even extreme pessimism—as rationally permissible.Footnote 3
2. Long-run convergence theorems are of no help to Bayesians responding to concerns about scientific agreement. In contrast, short-run convergence theorems (Howson and Urbach 2006, 238; Hawthorne 2014, sec. 5) are of some help. Those theorems do not require countable additivity.
Let me take these points in turn, starting with a brief explanation of long-run convergence theorems and an assessment of the charge that such theorems count against Bayesianism.
2. Long-Run Convergence Theorems
To introduce the long-run convergence theorems that are wielded in Belot (2013), consider an immortal investigator whose evidence consists of successive digits from a countably infinite binary sequence (a sequence consisting of zeros and ones). The investigator receives one digit of the sequence per day and is interested in H, a fixed hypothesis about the whole sequence. For example, H might be the proposition that after a certain point, the sequence consists of all ones. (For convenience I treat interchangeably a proposition about the sequence and the corresponding set of sequences for which that proposition holds.) From here on, I assume that this setup is in place unless otherwise noted.
Now apply Bayesian confirmation theory to this setup. In particular, suppose that the investigator starts with a prior probability function P defined over an appropriate domain that includes H, updates by conditionalization each time she receives a digit, and is certain of all of the above.
Formal construction. Let C denote $\{0,1\}^{\mathbb{N}}$, the collection of all functions from the positive integers $\mathbb{N}$ to {0, 1}. Thus, each element of C is a (countably) infinite binary sequence. For any finite or infinite binary sequence x and any positive integer i, we write x(i) to denote the ith term of x (that is, the binary digit that x assigns to i), and we write $x^i$ to denote the set $\{y \in C : y(j) = x(j) \text{ for } 1 \le j \le i\}$ of sequences from C whose first i digits match x. By “string” we mean “finite binary sequence,” and for convenience we count the null sequence (the sequence of length 0) as a string. Given string s, let [s] denote the set of sequences from C that start with s.
We assume that C is a Cantor space. That is, we assume that C is endowed with the product topology. In this context, the family of all sets [s] (for strings s) is a denumerable topological basis for the Cantor space C. Now let $\mathcal{B}$ denote the Borel σ-algebra generated by this topology. The pair $(C, \mathcal{B})$ accordingly forms a measurable space. In this article we restrict attention to probability functions on the Borel σ-algebra $\mathcal{B}$ over C.Footnote 4
We represent the prior beliefs of an ideal Bayesian-rational investigator by a probability function P on $\mathcal{B}$: a nonnegative, finitely additive set function on $\mathcal{B}$ satisfying $P(C) = 1$. We assume throughout that P is finitely additive: $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$ whenever $\{A_i\}$ is a finite set of pairwise disjoint members of $\mathcal{B}$. We will sometimes but not always make the stronger assumption that P is countably additive: $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$ whenever $\{A_i\}$ is a finite or countably infinite set of pairwise disjoint members of $\mathcal{B}$.
For every string s such that $P([s]) > 0$, we assume that when the investigator sees s as the first digits of the observed sequence, her new probability function is $P(\cdot \mid [s])$, the result of conditionalizing P on [s]. For the purposes of this article, we need not specify what happens when an investigator observes a string to which she had previously assigned probability 0.
Now return to our investigator. Before seeing any digits, she might wonder: In the limit of seeing more and more digits, how likely is it that I will arrive at the truth about H? In other words, how likely is it that my probability for H will converge to 1 if H is true and to 0 otherwise?
A pessimistic answer to that question is: I am unlikely to converge to the truth (about H). A more confident answer is: I will probably converge to the truth. A maximally confident answer is: my probability that I will converge to the truth equals 1.
A celebrated long-run convergence theorem entails that if the investigator’s probability function is countably additive, then she is committed to the maximally confident answer:
Theorem. For any countably additive probability function P on $\mathcal{B}$ and any hypothesis $H \in \mathcal{B}$,
$$P\left(\left\{x \in C : \lim_{i \to \infty} P(H \mid x^i) = 1_H(x)\right\}\right) = 1,$$
where $1_H$ is the indicator function for H (taking value 1 or 0 according to whether its argument is or is not a member of H).
Proof. This is an immediate consequence of the Lévy zero-one law (Lévy 1937).Footnote 5
So, countably additive Bayesian confirmation theory entails that in the above situation, rationality requires investigators to have full confidence that their opinions will converge to the truth. But Belot (2013) gives an ingenious argument that rationality requires no such thing.
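A finite simulation cannot establish a limit claim, but it can make the theorem's content concrete. The sketch below is only an illustration under assumptions that are not part of the text at this point: the prior is taken to be an equal mixture of two independent coin-flip measures with biases 1/3 and 2/3, and H is the hypothesis that ones have limiting relative frequency 2/3. The names `posterior_H` and `run` are hypothetical. Sequences are sampled from the prior itself, and the posterior for H is compared with the truth value of H.

```python
import random

def posterior_H(ones, zeros):
    """P(H | observed prefix) under the assumed equal mixture of biases 1/3 and 2/3.

    On any prefix, the likelihood ratio (bias 2/3 over bias 1/3) is
    2 ** (ones - zeros), so the posterior weight on the bias-2/3 component,
    which gives H probability 1, is 1 / (1 + 2 ** (zeros - ones)).
    """
    return 1.0 / (1.0 + 2.0 ** (zeros - ones))

def run(n_digits, rng):
    """Sample a prefix from the prior itself; return (truth value of H, posterior)."""
    bias = rng.choice([1 / 3, 2 / 3])       # pick a mixture component
    truth = 1.0 if bias == 2 / 3 else 0.0   # H holds with probability 1 iff bias is 2/3
    ones = zeros = 0
    for _ in range(n_digits):
        if rng.random() < bias:
            ones += 1
        else:
            zeros += 1
    return truth, posterior_H(ones, zeros)

rng = random.Random(1)  # fixed seed so the run is repeatable
results = [run(600, rng) for _ in range(20)]
worst_gap = max(abs(truth - post) for truth, post in results)
print(worst_gap)
```

In each sampled run the posterior ends up very close to the indicator value of H, which is the behavior the theorem says has prior probability 1 in the limit.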
3. The Orgulity Argument
Here is a stripped-down exposition of what I call the “orgulity argument,” the argument from Belot (2013) that poses the strongest challenge to Bayesianism.Footnote 6 Explaining the argument will require a bit of setup. As before, let H be a hypothesis about the infinite binary sequence that is in the domain of the investigator’s probability function. Examples of such hypotheses include that the sequence eventually becomes periodic, that it ends with the pattern “01010101… ,” that it is computed by a Turing Machine (that the function giving successive digits of the sequence is a computable function), or that it contains infinitely many zeros. (Note that H is not required to be countable; cf. Belot 2013, n. 32.)
Say that an investigator is open-minded with respect to H if for every finite batch of evidence, there is a finite extension of it that would lead her to assign probability greater than 1/2 to H and also a finite extension of it that would lead her to assign probability less than 1/2 to H.Footnote 7 An open-minded investigator commits to never irrevocably making up her mind about whether H or not-H is more likely.
Formally, we say that a probability function P is open-minded with respect to a hypothesis $H \in \mathcal{B}$ if for every string s, there are strings t and t′, each of which is an extension of s, such that $P(H \mid [t]) > 1/2$ and $P(H \mid [t']) < 1/2$. (A string t is said to be an extension of a length-n string s when t is at least as long as s and $t(i) = s(i)$ for $1 \le i \le n$.)
Remark. If a hypothesis $H \in \mathcal{B}$ fails to be dense in C, or if its complement C\H fails to be dense, then some initial segment of the observed sequence decisively settles whether H is true. As a result, no probability function is open-minded with respect to such a hypothesis, and hence no probability function is open-minded with respect to every $H \in \mathcal{B}$. For example, no probability function is open-minded with respect to the universal hypothesis C, since for any probability function P and any string s, $P(C \mid [s]) = 1$ whenever that conditional probability is well defined. Similarly, no probability function is open-minded with respect to the hypothesis $H_{111}$ that the sequence contains at least three ones, since for any probability function P and any extension t of the string 111, $P(H_{111} \mid [t]) = 1$ whenever that conditional probability is well defined.
In contrast, there are many countably additive probability functions that are open-minded with respect to hypotheses H that are both dense in C and have dense complements. This is true even if H is uncountable.
For an example of a countably additive prior that is open-minded with respect to a countably infinite hypothesis, see Belot (2013, nn. 34, 37). For an example of a countably additive prior P that is open-minded with respect to an uncountable hypothesis H, let H be the set $L_{2/3}$ of sequences in which 1 occurs with limiting relative frequency 2/3. (The limiting relative frequency of ones in a sequence x is defined to be $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x(i)$.) Now let P be
, where $B_p$ is the Bernoulli measure with bias p (the countably additive probability function on $\mathcal{B}$ that treats the digits of the sequence as independent random quantities, each with probability p of taking value 1).
To show that P is open-minded with respect to H, let s be any string of length n. We must show that there exist finite extensions t and t′ of s, one of which gives H conditional probability greater than 1/2 and the other of which gives H conditional probability less than 1/2. We can do that by letting t be the result of appending many zeros to s and letting t′ be the result of appending many ones to s.Footnote 8
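The appending strategy can be checked numerically. The following is a minimal sketch under an assumption made purely for illustration: the prior is taken to be the equal mixture of the Bernoulli measures with biases 1/3 and 2/3, which need not be the mixture intended above. Since the bias-2/3 measure gives $L_{2/3}$ probability 1 and the bias-1/3 measure gives it probability 0, the posterior for H is just the posterior weight on the bias-2/3 component. The helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def cylinder_prob(p, s):
    """Probability that independent bias-p digits begin with the string s."""
    ones = s.count("1")
    return p ** ones * (1 - p) ** (len(s) - ones)

def posterior_H(s, p_h=Fraction(2, 3), p_alt=Fraction(1, 3)):
    """P(H | [s]) for the assumed prior: equal weights on biases p_h and p_alt.

    H is the set of sequences with limiting relative frequency p_h, so the
    posterior of H equals the posterior weight on the bias-p_h component.
    """
    num = cylinder_prob(p_h, s)
    return num / (num + cylinder_prob(p_alt, s))

def extend_until(s, digit, wanted):
    """Append copies of `digit` to s until wanted(posterior) holds."""
    t = s
    while not wanted(posterior_H(t)):
        t += digit
    return t

s = "0010"
t_high = extend_until(s, "1", lambda q: q > HALF)  # enough ones push H above 1/2
t_low = extend_until(s, "0", lambda q: q < HALF)   # enough zeros push H below 1/2
print(posterior_H(s), t_high, posterior_H(t_high), t_low)
```

Under this assumed mixture it is appended ones that raise the probability of H and appended zeros that lower it; with a different mixture the roles of the two digits could be reversed.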
With the above setup in place, here is the first premise of the orgulity argument:
Premise 1. It is rationally permissible to be open-minded with respect to some hypothesis H.
Now take any Bayesian investigator with a countably additive prior who is open-minded about a hypothesis H and consider the set T of sequences that get her to converge to the truth about H. Formally, consider a countably additive probability function P that is open-minded with respect to H, and let $T = \{x \in C : \lim_{i \to \infty} P(H \mid x^i) = 1_H(x)\}$.
We noted above that convergence theorems entail that this investigator must assign probability 1 to T. We will now see that T is in one sense a “tiny” set.
Start by defining the Banach-Mazur game. In this game two players generate an infinite binary sequence together, starting with the empty sequence. The players alternate moves; at each move a player extends the sequence by appending whatever finite block of digits she wishes. The goal of the player who moves second is to have the resulting infinite sequence fall outside of some fixed set G.Footnote 9
G is said to be meager if there exists a winning strategy for the second player in this game—in other words, if the second player can force the generated sequence to avoid G. When a set of sequences is meager, it is “tiny” in one sense—it is easy to avoid. (This is just one of several equivalent characterizations of the meager sets.) It is sometimes said that sequences “typically” have a property if the set of sequences that fail to have the property is meager.
The following fact, which is a consequence of Belot’s mathematical observations, drives the orgulity argument:
Fact. Consider a Bayesian investigator who is open-minded with respect to a hypothesis. The set of sequences that get the investigator to converge to the truth about the hypothesis is meager. In other words: “typical” sequences prevent the investigator from converging to the truth about the hypothesis. Formally: for every countably additive probability function P on $\mathcal{B}$ and hypothesis $H \in \mathcal{B}$, if P is open-minded with respect to H, then T is meager. (Recall that T is the set of sequences for which P converges to the truth about H.) (Belot 2013, 498–99)
Proof. In the Banach-Mazur game, player 2 can force the generated sequence to be one that prevents the investigator from converging to the truth about H by, at each of her turns, appending “a string of bits that causes P to [assign probability greater than 1/2 to H] followed by a string of bits that causes P to [assign probability less than 1/2 to H]” (Belot 2013, n. 41). Player 2 can implement this strategy because P is open-minded.
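Player 2's strategy can be sketched for a finite number of rounds. The code below is an illustration only, and it assumes a particular open-minded prior that is not taken from the text: an equal mixture of Bernoulli measures with biases 1/3 and 2/3, with H the hypothesis that ones have limiting relative frequency 2/3. The function names and player 1's moves are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def posterior_H(s):
    """P(H | [s]) under the assumed equal mixture of biases 1/3 and 2/3.

    The likelihood ratio (bias 2/3 over bias 1/3) on s is 2 ** (ones - zeros).
    """
    ones = s.count("1")
    ratio = Fraction(2) ** (ones - (len(s) - ones))
    return ratio / (ratio + 1)

def player2_block(s):
    """A block that first drives P(H | .) above 1/2 and then below 1/2."""
    block = ""
    while posterior_H(s + block) <= HALF:
        block += "1"
    while posterior_H(s + block) >= HALF:
        block += "0"
    return block

def play(player1_moves):
    """Alternate moves; record the highest and lowest posterior during each reply."""
    s, swings = "", []
    for move in player1_moves:
        s += move                  # player 1 appends any block she likes
        block = player2_block(s)   # player 2 answers with her swinging block
        values = [posterior_H(s + block[:k]) for k in range(len(block) + 1)]
        swings.append((max(values), min(values)))
        s += block
    return s, swings

sequence, swings = play(["1111", "0", "110110", "000000"])
print(sequence)
print(swings)
```

Whatever player 1 appends, every one of player 2's replies takes the posterior for H above 1/2 and then below 1/2, so along the infinite sequence generated in this way the posterior cannot converge to 0 or to 1.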
Since we can see the truth of this fact, so can a reasonable open-minded investigator. She can see that typical sequences prevent her from converging to the truth about H. Given this, it seems permissible for her to have less than full confidence that she will converge to the truth about H. That is the next premise of the argument:
Premise 2. If it is rationally permissible to be open-minded about a hypothesis, then it is rationally permissible to have less than full confidence that one will converge to the truth about that hypothesis.
From premises 1 and 2 we get the conclusion of the orgulity argument:
Conclusion. It is rationally permissible to have less than full confidence that one will converge to the truth about some hypothesis.
We saw in section 2 that countably additive Bayesianism entails the negation of this conclusion. So if the argument is sound, then that theory stands refuted.
4. The Premises of the Orgulity Argument
Before advocating a reaction to the orgulity argument, let me address the plausibility of its premises. A defender of countably additive Bayesianism might try to reject premise 1 by claiming that open-mindedness is always irrational. That proposal is unappealing because (in the presence of countable additivity), it entails:
MaxCon For every hypothesis and every rational cautious investigator, there is some finite sequence of evidence that would get the investigator to become maximally confident about that hypothesis.
In the above statement, to say that an investigator is cautious is to say that no finite sequence of digits would get her to be maximally confident about whether the next digit is 0 or 1. And to say that an investigator is maximally confident about a claim is to say that her probability for that claim is either 0 or 1. For a proof that (assuming countable additivity) denying premise 1 entails MaxCon, see appendix A.
To see why MaxCon is false, consider an example. Suppose that H is the claim that the sequence contains infinitely many zeros. MaxCon entails that if a Bayesian investigator is cautious and rational, some finite sequence of digits would get her to assign probability 0 or 1 to H. But that is absurd. It is absurd that rationality in every case requires cautious investigators to count some finite string of digits as decisively settling whether the whole string contains infinitely many zeros. So premise 1 cannot reasonably be resisted.
What about premise 2? Belot (2013, 500) considers an opponent who rejects premise 2 for the following reason: sequences of evidence digits that prevent an open-minded investigator from converging to the truth are skeptical scenarios, and the investigator may therefore reasonably assign them total probability 0. Belot responds that such sequences are not skeptical scenarios. Whatever one thinks of that response, however, an additional response is available: even granting that the scenarios in question are skeptical scenarios, it does not immediately follow that they deserve zero probability.
As an example, consider a regularity that is very well confirmed: that gravity is attractive. Here is a skeptical scenario: 1 year from now, gravity will suddenly turn repulsive. Given our evidence, that scenario deserves only a minuscule amount of probability. But that the scenario is a skeptical one does not immediately show that it deserves absolutely no probability. So simply calling failure-to-converge scenarios “skeptical scenarios” does not on its own make it reasonable to reject premise 2.
Furthermore, the Fact gives at least some initial support to premise 2. It is unsettling to think that to be rational one must have full confidence that one will converge to the truth, given that “typical” sequences prevent one from doing so.
That said, there is at least one objection to premise 2 that is worth taking seriously. One might follow de Finetti (1970, 34–35) and argue that there is no reason to treat as unlikely or epistemologically negligible those hypotheses that are “small” from a topological or set-theoretic point of view. In the current case, one might affirm that open-minded agents should be fully confident that they will converge to the truth. When it is pointed out that this means assigning probability 0 to a property of sequences that is “typical” in the topological sense, one might simply reply: “So what? That notion of typicality has no relevance to this case.”
What would advance the orgulity argument against this objection is an independent reason for thinking that topological notions of size are relevant in the current case.Footnote 10 Bottom line: the orgulity argument has some force as an objection to countably additive Bayesianism, although there is at least one defensible line of resistance to its second premise.
5. Finitely Additive Bayesianism Permits Humility
Happily, the orgulity argument has no force at all against finitely additive Bayesianism, a version of Bayesianism that rejects countable additivity as an across-the-board requirement of rationality.Footnote 11 That is because finitely additive Bayesians—those Bayesians who reject countable additivity as a requirement of rationality—can comfortably accept the conclusion of the argument. They can accept that it is rationally permissible for an open-minded investigator in the sequence situation to be less than fully confident that she will converge to the truth.Footnote 12
Indeed, they can (if they wish) accept something much stronger. Let us say that an investigator in the sequence situation is completely pessimistic if she is fully confident that she will “converge to the false”—that her probability for H will converge to 0 if H is true and to 1 otherwise. It turns out that some open-minded investigators with finitely additive priors are completely pessimistic (for a proof, see app. B).
It follows that finitely additive Bayesians are free to count even complete pessimism as being rationally permissible. That is as much humility as anyone can demand. Furthermore, finitely additive Bayesianism has significant independent appeal.Footnote 13 Moral: the orgulity argument has no force against finitely additive Bayesianism, a viable alternative to countably additive Bayesianism.
6. Long-Term Convergence Theorems Do Not Address the Charge of Excessive Subjectivity
Recall from section 1 the motive given for appealing to long-run convergence theorems: the deliverances of Bayesianism depend on the choice of a prior probability function. As a result, Bayesianism faces the charge of being excessively subjective and of not sufficiently explaining agreement among reasonable scientists.
In response to those charges, it is tempting to appeal to long-run convergence theorems in order to show that differences between rational priors eventually disappear.Footnote 14 Bayesians who reject countable additivity in response to the orgulity argument cannot appeal to those theorems for that purpose, since those theorems assume countable additivity.Footnote 15 So it might seem that such Bayesians give up a valuable defense against charges of excessive subjectivity.
But in fact, they do not. For the long-run convergence theorems are red herrings. The theorems provide Bayesians with no defense against charges of excess subjectivity. To see why, let me distinguish two versions of the charge of excess subjectivity and explain why long-run convergence theorems do not address either of them.
First, let us pin down a target. Let Core Bayesianism be the conjunction of the following claims:
1. Ideally rational agents have personal probability functions that represent their degrees of belief.
2. Conditionalization on new evidence is a rational way of updating one’s probability function.
Now for the first version of the charge, posed in characteristically trenchant fashion in Earman (1992, 137):
If in the face of currently available evidence you assign a high degree of belief to the propositions that Velikovsky’s Worlds in Collision scenario is basically correct, that there are canals on Mars, that the earth is flat, etc., you will rightly be labeled as having an irrational belief system. And if you arrived at your present beliefs within the framework of Bayesian personalism, then the temptation is to say that at worst there is something rotten at the core of Bayesian personalism and at best there is an essential incompleteness in its account of procedural rationality.
The argument is this: (1) starting with a bizarre prior and conditionalizing on available evidence will result in irrational opinions, but (2) Core Bayesianism does not entail that those opinions are irrational, and (3) a true and complete theory of scientific inquiry would entail this, so (4) Core Bayesianism is false or incomplete.
This argument—call it the problem of deviant priors—is a fair challenge. Various replies are possible. One might, for example, embrace pure subjectivism and admit that it can be rational to believe that the earth is flat, on the basis of current evidence. Or one might adopt an objective Bayesianism of one kind or another and endorse or seek constraints on priors that rule out the deviant ones as irrational.
However, in no case does it help the Bayesian to appeal to long-term convergence theorems. For the most such theorems might show is that in the limit of infinite inquiry, the opinions of a scientist starting with a deviant prior would be likely to approach those of an ordinary scientist. And that conclusion would do nothing to answer the problem of deviant priors, which concerns a single time (now) at which an agent has opinions alleged to be irrational (Earman 1992, 148).
A less extreme challenge to Bayesianism is to explain the significant amount of agreement among actual scientists. One might pointedly ask the Bayesian: assuming the truth of Bayesianism, what explains scientific consensus about the basic structure of matter, the rough characteristics of the solar system, the boiling points of various liquids, the mechanism of photosynthesis, and so on? Is it merely a coincidence that the scientific method often gets scientists to rapidly agree? Call this the problem of scientific agreement.
In answering this problem, long-term convergence theorems are again of no help. For again, the most they could hope to show is that in the limit of infinite inquiry, scientists would likely approach agreement. That would not touch the question of why scientists have reached so much agreement already (Howson and Urbach 2006, 238).
7. Short-Term Convergence Theorems Help Address the Problem of Scientific Agreement
The previous section argued that long-term convergence theorems do not help the Bayesian answer charges of excessive subjectivity. One might wonder: Do any convergence theorems help the Bayesian address those charges?
When it comes to the problem of deviant priors, the answer is no. All convergence theorems place some constraints on priors, and opponents will always be able to ask about priors that violate the constraints.
But when it comes to the problem of scientific agreement, the answer is “at least a bit.” The remainder of this section describes some short-run convergence theorems and explains how they partly address the problem of scientific agreement.
As a toy example of such a theorem, consider two Bayesian agents who are about to observe what they both regard to be independent random draws from an urn containing 100 red and green balls in an unknown proportion. Straightforward calculations show that unless the agents start out with radically different opinions about what the proportion is, they will be confident that their opinions about the urn’s composition will become extremely similar—not just in the limit of infinite draws but soon (after a small number of draws). This result holds because the assumption of independent sampling is so powerful: it is easy for a small number of samples to vastly confirm one hypothesis about the composition of the urn over another. As a result, it is easy for initial differences of opinion about that composition to get swamped.
This toy theorem addresses a toy instance of the problem of scientific agreement. It explains why in the urn setup, unless scientists have extreme differences in priors, observing even a small number of samples from the urn will bring them into significant agreement. That gives the Bayesian a satisfying explanation of why agreement in such cases is so common.
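The urn case is easy to compute directly. The sketch below is an illustration under assumptions that go beyond the text: draws are made with replacement (so that they are independent), one agent starts with a uniform prior over the number of red balls, the other starts with a prior tilted toward red, and both see 35 red balls in 50 draws. The function names, priors, and data are all hypothetical. Disagreement is measured by the total variation distance between the agents' distributions over the urn's composition.

```python
def normalize(weights):
    """Scale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def posterior(prior, reds, draws, balls=100):
    """Posterior over the number of red balls after independent draws with replacement."""
    likelihoods = []
    for r in range(balls + 1):
        theta = r / balls  # chance of red on each draw if the urn holds r red balls
        likelihoods.append(theta ** reds * (1 - theta) ** (draws - reds))
    return normalize([p * l for p, l in zip(prior, likelihoods)])

def total_variation(p, q):
    """Half the summed absolute difference between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

uniform = normalize([1] * 101)                   # agent A: every composition equally likely
tilted = normalize([r + 1 for r in range(101)])  # agent B: leans toward many red balls

tv_before = total_variation(uniform, tilted)
tv_after = total_variation(posterior(uniform, 35, 50), posterior(tilted, 35, 50))
print(tv_before, tv_after)
```

With these assumed inputs the distance between the two agents' opinions drops substantially after the 50 draws, which is the swamping effect described above; more extreme differences in priors would take more draws to wash out.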
The above story can be generalized to many situations in which scientists start with similar statistical models and use repeated experiments to estimate the value of certain model parameters. For example, suppose that a particular team all agrees on the likely behavior of an apparatus designed to measure the speed of light, conditional on hypotheses about that speed. Then the above story might explain why those team members rapidly come to agree about the speed of light, as they observe the behavior of the apparatus. (For short-run convergence results for cases of roughly this kind, see Savage [1954, sec. 3.6], Edwards, Lindman, and Savage [1963], Earman [1992, 142–43], Howson and Urbach [2006, 239], and Hawthorne [2014, sec. 4.1].)
That shows that short-run convergence theorems are of at least some help in addressing the problem of scientific agreement. But there is more work to be done. For in most cases, scientists do not start with similar statistical models. What then? Here the Bayesian may appeal to short-run convergence results that rely on weaker assumptions.
I will not address this strategy in detail but will briefly say why it is promising. Start by adapting the setup from Hawthorne (1993, theorem 6): consider ideal Bayesian agents who initially disagree about a finite set of hypotheses. Suppose that the agents are exposed to a common stream of observations relevant to those hypotheses. Crucially, do not assume that the agents regard successive observations as independent random draws. Instead impose a much weaker condition: that the agents expect successive observations to be, on average, at least slightly informative about the hypotheses in question.Footnote 16
Given these conditions, we may pick one hypothesis H and ask the following question. Supposing that H is true, how many observations would it take to make it at least 99% likely that the observations confirm H over its rivals to some given degree? Remarkably, the proof of Hawthorne (1993, theorem 6) supplies answers to such questions.Footnote 17 And the answers to such questions can be parlayed into explicit bounds on how fast we may expect the agents to converge on the truth, given the truth of each competing hypothesis. If convergence is sufficiently fast in some particular domain, that would provide a satisfying Bayesian explanation for scientific agreement in that domain. It remains for future work to fill out the details of this approach.
The bottom line is that short-run convergence theorems provide a worked-out and convincing answer to the problem of scientific agreement in certain special cases, and there is reason to think that this answer can be extended to include many additional cases.
Happily, none of the short-run convergence theorems mentioned above rely on countable additivity. So finitely additive Bayesians can freely appeal to them.
Appendix A Proof That Countably Additive, Cautious Investigators Are Either Decisive or Extremely Open-Minded
In section 4 it is claimed that under the assumption of countable additivity, the falsity of premise 1 entails MaxCon. Here we prove a slightly stronger claim from which the above claim easily follows.
Say that an investigator with prior P is decisive with respect to $H \in \mathcal{B}$ if for some string s, $P(H \mid [s])$ equals 0 or 1. Say that an investigator with prior P is extremely open-minded with respect to H if for every ε > 0 and for every evidence string, there is a finite extension of that evidence that would lead her to assign probability less than ε to H and also a finite extension of it that would lead her to assign probability greater than 1 − ε to H.
Recall that an investigator with prior P is said to be cautious if for no string s, $P([s0] \mid [s]) \in \{0, 1\}$, where s0 denotes the concatenation of s and the string 0. Note that a cautious investigator assigns strictly positive probability to every string. For suppose that a probability function P assigns probability 0 to some string. Then there must be strings s and sd such that sd is the result of appending a single digit d to s, and $P([s]) > 0$ while $P([sd]) = 0$. (Recall that the length-0 sequence z counts as a string, and so $P([z]) = P(C) = 1$.) So $P([sd] \mid [s]) = 0$; hence, $P([s0] \mid [s]) \in \{0, 1\}$, and an investigator with prior P is not cautious.
Now for the main claim:
Claim. Suppose that an investigator is cautious and has a countably additive prior P, and let H be any member of $\mathcal{B}$. Then the investigator is decisive with respect to H or extremely open-minded with respect to H or both.
Proof. Consider a cautious investigator with countably additive prior P who is not decisive with respect to H. We will show that she is extremely open-minded with respect to H.
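Take any string s. Note that $P([s]) > 0$ (since the investigator is cautious) and $0 < P(H \mid [s]) < 1$ (since she is not decisive about H). Let $P' = P(\cdot \mid [s])$ be the result of conditionalizing P on [s]. For any $p \in [0, 1]$, let $M_p$ be the set of sequences x for which $\lim_{i \to \infty} P'(H \mid x^i) = p$. By the long-run convergence theorem described in section 2,
P′(P′ converges to the truth about H) = 1.
So $P'\big((M_1 \cap H) \cup (M_0 \setminus H)\big) = 1$. But $P'(H) < 1$, so $P'(M_0 \setminus H) > 0$. So $M_0$ is nonempty, and hence there exists a sequence x such that $\lim_{i \to \infty} P'(H \mid x^i) = 0$. So for any ε > 0 there exists an n such that $P'(H \mid x^n) < \varepsilon$. It follows that for any ε > 0, there is a finite extension s′ of s such that $P(H \mid [s']) < \varepsilon$. A similar argument shows that for any ε > 0, there is a finite extension s′ of s such that $P(H \mid [s']) > 1 - \varepsilon$. So the investigator is extremely open-minded. QED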
Appendix B Proof of The Existence of an Open-Minded, Completely Pessimistic Finitely Additive Probability Function
In the following definitions, p ranges over reals in the open unit interval (0, 1) and i ranges over $\mathbb{N}$. Let $L_p$ denote the set of infinite binary sequences whose limiting relative frequency of 1 equals p:
$$L_p = \left\{x \in C : \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} x(j) = p\right\}.$$
Given the formal setup of this article, the Bernoulli measure with bias p is the unique countably additive probability function $B_p$ on $\mathcal{B}$ such that for each $x \in C$ and each i,
$$B_p(x^i) = \prod_{j=1}^{i} p^{x(j)} (1 - p)^{1 - x(j)}.$$
Thus, $B_p$ treats the successive digits of the true but unknown infinite binary sequence as independent and identically distributed binary random quantities, each with probability p of taking the value 1 (and probability 1 − p of taking the value 0). Observe that by the strong law of large numbers it follows that $B_p(L_p) = 1$.
In what follows, the Bernoulli flip-flopper with bias p up to trial i is the unique countably additive probability function on
such that for each
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220105124557802-0638:S0031824800008886:S0031824800008886_df4.png?pub-status=live)
Thus, the flip-flopper treats the first $i$ digits of $x$ as independent and identically distributed binary random quantities, each with probability $p$ of taking the value 1, and any remaining digits as independent and identically distributed binary random quantities, each with probability $1 - p$ of taking the value 1. Observe that since the flip-flopper applies $B_p$ only up to trial $i$ and thereafter applies $B_{1-p}$, it follows from the strong law of large numbers that it assigns probability 1 to $L_{1-p}$.Footnote 18
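The difference between the two kinds of measure can be made concrete with a small computation. The following Python sketch is an illustration only, not part of the proof: it computes the probability that each measure assigns to the set of sequences beginning with a given finite string. The function names `bernoulli_cylinder` and `flip_flopper_cylinder` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def bernoulli_cylinder(s: str, p: Fraction) -> Fraction:
    """Probability that an i.i.d. sequence with bias p begins with the string s."""
    prob = Fraction(1)
    for digit in s:
        prob *= p if digit == "1" else 1 - p
    return prob


def flip_flopper_cylinder(s: str, p: Fraction, i: int) -> Fraction:
    """Probability that a sequence begins with s when the first i digits
    have bias p and every later digit has bias 1 - p."""
    head, tail = s[:i], s[i:]
    return bernoulli_cylinder(head, p) * bernoulli_cylinder(tail, 1 - p)


p = Fraction(9, 10)
s = "1111"
print(bernoulli_cylinder(s, p))        # 6561/10000
print(flip_flopper_cylinder(s, p, 2))  # 81/10000: the bias flips after two digits
# Once the switch point is at or beyond the end of the string, the two agree.
print(flip_flopper_cylinder(s, p, 4) == bernoulli_cylinder(s, p))  # True
```

In particular, whenever the switch point is at least as large as the length of the string, the flip-flopper and the Bernoulli measure with the same bias assign that string the same probability.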
The proof to follow depends on the notion of a Banach limit, which is introduced here in a way that closely follows Rao and Rao (1983, 39–40). Let $\ell^\infty$ be the space of bounded sequences of reals. A Banach limit $T$ is a nonnegative linear functional on $\ell^\infty$ such that $T(1, 1, 1, \ldots) = 1$ and for any $(a_1, a_2, a_3, \ldots) \in \ell^\infty$, $T(a_1, a_2, a_3, \ldots) = T(a_2, a_3, a_4, \ldots)$.
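It can be shown that for any Banach limit $T$ and any bounded sequence of reals $(a_1, a_2, \ldots)$, if $\lim_{n \to \infty} a_n$ exists, then $T(a_1, a_2, \ldots) = \lim_{n \to \infty} a_n$. For that reason, one may think of a Banach limit as generalizing the notion of the limit of a sequence of reals. When emphasizing this connection, we sometimes write $\operatorname{blim}_{n \to \infty} a_n$ for a Banach limit of the sequence $(a_1, a_2, \ldots)$.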
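As a brief worked illustration (not part of the original argument), the standard defining properties of a Banach limit $T$ fix its value on some sequences that have no ordinary limit. These properties are linearity, the normalization $T(1, 1, 1, \ldots) = 1$, and invariance under dropping the first term. Take the alternating sequence $a = (1, 0, 1, 0, \ldots)$ and its shift $a' = (0, 1, 0, 1, \ldots)$, so that $a + a' = (1, 1, 1, \ldots)$:

```latex
\begin{align*}
T(a) + T(a') &= T(a + a') = T(1, 1, 1, \ldots) = 1
  && \text{(linearity, normalization)} \\
T(a') &= T(a)
  && \text{(invariance under dropping the first term)} \\
\text{hence}\quad T(a) &= \tfrac{1}{2}.
\end{align*}
```

So every Banach limit assigns the value $1/2$ to the alternating sequence, even though that sequence does not converge.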
Banach limits are not unique, and we are guaranteed of their existence only nonconstructively, by way of the axiom of choice or a weaker axiom such as the ultrafilter lemma. Dependence on Banach limits is what makes the proof below nonconstructive.Footnote 19
Claim. There exists an open-minded, completely pessimistic finitely additive probability function. That is, there exists an open-minded finitely additive probability function P on and a hypothesis
such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220105124557802-0638:S0031824800008886:S0031824800008886_df5.png?pub-status=live)
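Proof. Let blim be a Banach limit. Now define $P_0$ and $P_1$ as follows. For any hypothesis $H$, let $P_0(H) = \operatorname{blim}_{i \to \infty} B^i_{.9}(H)$, and let $P_1(H) = \operatorname{blim}_{i \to \infty} B^i_{.1}(H)$, where $B^i_p$ denotes the Bernoulli flip-flopper with bias $p$ up to trial $i$.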
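It is easy to check that $P_0$ and $P_1$ are finitely additive probability measures. For example, whenever $H$ and $H'$ are disjoint sets of sequences, $P_0(H \cup H') = P_0(H) + P_0(H')$, because each Bernoulli flip-flopper is additive and blim is linear.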
(Informally, we can think of P 0 and P 1 in the following way: P 0 treats the sequence as if a large initial segment of it is generated by tosses of a coin biased toward 1, and the rest by a coin biased toward 0. And P 1 treats the sequence in exactly the opposite way. In each case, the initial segment is expected to be extremely long, in the following sense: for every n, however large, P 0 and P 1 treat the first n digits as if they are part of the initial segment. That is what forces P 0 and P 1 to be merely finitely additive.)
Let $P = \tfrac{1}{2} P_0 + \tfrac{1}{2} P_1$; $P$ is clearly a finitely additive probability function. We will now complete the proof by showing that $P$ is open-minded and completely pessimistic with respect to the hypothesis $L_{.9}$. Note that for any $x \in L_{.9}$ and for any $i$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220105124557802-0638:S0031824800008886:S0031824800008886_df6.png?pub-status=live)
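Equation (B1) holds by definition. Equation (B2) holds because $P_0(L_{.9}) = 0$ and $P_1(L_{.9}) = 1$, since for each $i$, the flip-flopper with bias .9 up to trial $i$ assigns probability 0 to $L_{.9}$ and the flip-flopper with bias .1 up to trial $i$ assigns probability 1 to $L_{.9}$, by the strong law of large numbers. Equation (B3) is simple algebra. Equation (B4) holds because for any binary sequence $x$ and any natural number $i$, $P_0([x{\upharpoonright}i]) = B_{.9}([x{\upharpoonright}i])$ and $P_1([x{\upharpoonright}i]) = B_{.1}([x{\upharpoonright}i])$, where $x{\upharpoonright}i$ is the string of the first $i$ digits of $x$. To see why, note that for every $j \geq i$, the flip-flopper with bias .9 up to trial $j$ agrees with $B_{.9}$ on $[x{\upharpoonright}i]$, and likewise for bias .1.Footnote 20
Now consider what happens to (B4) as $i$ approaches infinity: the proportion of ones in the first $i$ digits of $x$ approaches .9 (since $x \in L_{.9}$). As a result, $B_{.9}([x{\upharpoonright}i]) / B_{.1}([x{\upharpoonright}i])$ grows without bound, and hence (B4) approaches 0. So when $x \in L_{.9}$, $\lim_{i \to \infty} P(L_{.9} \mid [x{\upharpoonright}i]) = 0$. A similar argument shows that when $x \in L_{.1}$, $\lim_{i \to \infty} P(L_{.9} \mid [x{\upharpoonright}i]) = 1$.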
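The limiting behavior of (B4) can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's own computation. It assumes that $P$ weights $P_0$ and $P_1$ equally and that (B4) has the form $1 / (1 + B_{.9}/B_{.1})$, with both Bernoulli probabilities evaluated on the observed initial segment. For a segment with $k$ ones and $z$ zeros, $B_{.9}$ assigns $.9^k \, .1^z$ and $B_{.1}$ assigns $.1^k \, .9^z$, so the ratio is $9^{k - z}$. The function name `posterior_L9` is hypothetical.

```python
from fractions import Fraction


def posterior_L9(prefix: str) -> Fraction:
    """Assumed form of (B4): 1 / (1 + B_.9([prefix]) / B_.1([prefix])).

    The ratio of the two Bernoulli probabilities is 9 ** (ones - zeros).
    """
    ones = prefix.count("1")
    zeros = len(prefix) - ones
    ratio = Fraction(9) ** (ones - zeros)
    return 1 / (1 + ratio)


mostly_ones = "1111111110"   # a block with 90% ones
mostly_zeros = "0000000001"  # a block with 90% zeros

print(posterior_L9(""))  # 1/2 before any digits are seen
for blocks in (1, 2, 3):
    # Credence in L_.9 falls toward 0 along a sequence with 90% ones...
    print(blocks, float(posterior_L9(mostly_ones * blocks)))
    # ...and rises toward 1 along a sequence with 90% zeros.
    print(blocks, float(posterior_L9(mostly_zeros * blocks)))
```

On these assumptions the exponent $k - z$ grows by 8 with each ten-digit block of 90% ones, so the credence in $L_{.9}$ shrinks geometrically on exactly those sequences for which $L_{.9}$ is true.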
It follows that $P$ is open-minded, since for any initial segment of digits, appending a large enough finite block consisting of 90% zeros will force $P$ to assign a probability to $L_{.9}$ that is arbitrarily close to 1, and appending a large enough finite block consisting of 90% ones will force $P$ to assign a probability to $L_{.9}$ that is arbitrarily close to 0.
It also follows that $P$ is completely pessimistic, since $P(L_{.9} \cup L_{.1}) = 1$, and the above argument shows that $P$ converges to the wrong verdict about $L_{.9}$ for any sequence in $L_{.9} \cup L_{.1}$. QED