PROBABILISTIC STABILITY, AGM REVISION OPERATORS AND MAXIMUM ENTROPY

KRZYSZTOF MIERZEWSKI

doi:10.1017/S1755020320000404

PROBABILISTIC STABILITY, AGM REVISION OPERATORS AND MAXIMUM ENTROPY

Part of: Communication, information Artificial intelligence (68Txx) Foundations of probability theory General logic

Published online by Cambridge University Press: 21 October 2020

KRZYSZTOF MIERZEWSKI

Show author details

KRZYSZTOF MIERZEWSKI*: Affiliation:
LOGICAL DYNAMICS LAB, CENTER FOR THE STUDY OF LANGUAGE AND INFORMATION STANFORD UNIVERSITYSTANFORD, CA 94308, USAE-mail: kmierzew@stanford.edu

Article contents

Abstract
Tracking Bayesian conditioning
Preliminaries
The stability rule
The connection to AGM revision
Tracking and the No-Go theorem
How the stability rule fails tracking
Can we save tracking by changing the threshold?
Recovering AGM operators via Maximum Entropy
Conclusion
Footnotes
References

Rights & Permissions

Abstract

Several authors have investigated the question of whether canonical logic-based accounts of belief revision, and especially the theory of AGM revision operators, are compatible with the dynamics of Bayesian conditioning. Here we show that Leitgeb’s stability rule for acceptance, which has been offered as a possible solution to the Lottery paradox, allows to bridge AGM revision and Bayesian update: using the stability rule, we prove that AGM revision operators emerge from Bayesian conditioning by an application of the principle of maximum entropy. In situations of information loss, or whenever the agent relies on a qualitative description of her information state—such as a plausibility ranking over hypotheses, or a belief set—the dynamics of AGM belief revision are compatible with Bayesian conditioning; indeed, through the maximum entropy principle, conditioning naturally generates AGM revision operators. This mitigates an impossibility theorem of Lin and Kelly for tracking Bayesian conditioning with AGM revision, and suggests an approach to the compatibility problem that highlights the information loss incurred by acceptance rules in passing from probabilistic to qualitative representations of belief.

Keywords

AGM belief revision Bayesian conditioning probabilistic stability acceptance rules tracking maximum entropy

MSC classification

Primary: 03B42: Logics of knowledge and belief (including belief change)

Secondary: 68T27: Logic in artificial intelligence 60A05: Axioms; other general questions 94A17: Measures of information, entropy

Type: Research Article
Information: The Review of Symbolic Logic , Volume 15 , Issue 3 , September 2022 , pp. 553 - 590

DOI: https://doi.org/10.1017/S1755020320000404 [Opens in a new window]
Copyright: © The Author(s), 2020. Published by Cambridge University Press on behalf of The Association for Symbolic Logic

1 Tracking Bayesian conditioning

The Bayesian account of rational belief has both a static and a dynamic component. The static component amounts to the requirement that an agent’s belief states be represented as probability measures, meant to capture her degrees of belief, or credences. The dynamic one is embodied by the requirement that the revision of a credal state be carried out by Bayesian conditioning. By contrast, the more common notion of all-or-nothing belief encountered in classical epistemology, as well as in applied logic and logic-based artificial intelligence, is qualitativeFootnote ¹ : accordingly, doxastic states are often represented ‘qualitatively’ as (sets of) logical propositions, sometimes endowed with some extra structure (e.g., plausibility orderings). In this setting, perhaps the most prominent logic-based account of rational belief change is given by the AGM theory of belief revision (named after its creators Alchourrón, Gärdenfors, and Makinson) [Reference Alchourrón, Gärdenfors and Makinson1], in which revisions triggered by new information are modeled via AGM belief revision operators.

Thus, we have two rather intuitive representations of belief states—one probabilistic and one qualitative—and two corresponding accounts of rational revision of a belief state. It is then natural to ask how Bayesian belief dynamics differ from those of AGM revision. Such a comparison leaves us with a challenge both formal and methodological, motivated by two philosophical questions. Firstly, can one reduce the all-or-nothing notion of belief to a quantitative, probabilistic one? Secondly, what do these two accounts of belief dynamics have in common? Are they competing or compatible? Is there some common notion of dynamic rationality underlying both accounts? A particularly salient question is whether there is any precise sense in which we can see all-or-nothing belief as a coarse-grained counterpart of a Bayesian agent’s credences, and whether there are any circumstances under which we can see AGM revision as an approximation of Bayesian conditioning.

Answering these questions first requires a well-behaved translation between the two representations of belief. One prominent way to approach these questions is through the study of acceptance rules, which map each probability measure to a propositional belief state.Footnote ² Acceptance rules provide an intuitive framework for elucidating the logic(s) of uncertain acceptance, through a simple model that tells us, given an agent’s subjective probability measure, which hypotheses are accepted (or believed) simpliciter.Footnote ³

Although providing a reasonable acceptance rule has been fraught with difficulties—as evidenced by the ever-growing literature on Kyburg’s Lottery paradox [Reference Kyburg17]—several recent proposals have opened up some promising avenues [Reference Kelly and Lin14, Reference Leitgeb18, Reference Leitgeb21]. Notable among these is Leitgeb’s stability rule [Reference Leitgeb18, Reference Leitgeb, Baltag and Smets19, Reference Leitgeb20], which is based on the notion of probabilistic stability (itself adapted from Skyrms’ notion of probabilistic resiliency [Reference Skyrms31]). The key idea is that accepted propositions ought to follow from hypotheses that are resilient, or stable, under new information. The rule is promising in that it succeeds in preserving some intuitions behind the Lockean rule (which recommends the acceptance of all and only propositions with probability above a fixed threshold), while avoiding the Lottery paradox: it also allows to preserve the closure of accepted propositions under logical consequence.

In this paper, we investigate the extent to which Leitgeb’s stability rule can be employed to bridge the dynamics of Bayesian conditioning with the AGM theory of belief revision. The central question concerns what Lin and Kelly [Reference Kelly and Lin15] have called the tracking problem. Tracking is a simple commutativity condition that provides a criterion of dynamic compatibility between probabilistic and qualitative revision. Roughly, a qualitative belief revision method tracks Bayesian conditioning modulo an acceptance rule if, starting from any probabilistic credal state, translation through the acceptance rule followed by qualitative revision results in the same belief state as first using Bayesian conditioning followed by translation (see Figure 1). In this sense, a belief revision operator can be seen as emulating Bayesian conditioning at the qualitative level, where all-or-nothing beliefs are determined by an acceptance rule.

Fig. 1 The general tracking problem.

Here, we study the tracking problem for AGM revision in the light of Leitgeb’s rule. Our starting point is Lin and Kelly’s No-Go Theorem for tracking [Reference Kelly and Lin15], which shows that there exists no well-behaved revision policy that satisfies the AGM axioms of Inclusion and Preservation and, at the same time, tracks Bayesian conditioning. In particular, the No-Go Theorem entails that Bayesian conditioning cannot be tracked by AGM operators through Leitgeb’s rule. Yet, the principles behind the stability rule point to an interesting connection between subjective probability and AGM revision, and it has been argued that the rule offers hope for a reconciliation between Bayesian and AGM dynamics [Reference Leitgeb18, Reference Leitgeb21]. This raises the question of whether a weaker form of dynamic compatibility, short of full commutativity, can still be achieved via Leitgeb’s rule.

We consider two ways in which one may try to circumvent the No-Go Theorem so as to approximate agreement between AGM revision and Bayesian conditioning via the stability rule. First, we consider the possibility of generating a well-behaved AGM revision policy by appropriately adjusting the stability threshold at each update. We show that this method does not succeed: this failure raises some difficulties for the “peace project” between Bayesian and AGM-compliant revision operators, and sheds light on the role of thresholds in Leitgeb’s Humean Thesis on belief [Reference Leitgeb21].

We then prove our main result, which establishes that AGM revision can in fact emerge from Bayesian conditioning and the stability rule. We show that, in a precise sense, AGM revision can be seen as deriving from (1) Leitgeb’s rule, (2) Bayesian conditioning, and (3) a version of the maximum entropy principle (Theorems 8.1 and 8.4). In situations of information loss, or whenever the agent relies on a qualitative description of her information state—a plausibility ranking over hypotheses, or a belief set—the dynamics of AGM belief revision are compatible with Bayesian conditioning; indeed, through the maximum entropy principle, conditioning naturally generates AGM revision operators. This suggests that one could study qualitative revision operators as special cases of Bayesian reasoning which naturally arise in situations of information loss or incomplete probabilistic specification of the agent’s doxastic state.

2 Preliminaries

We work with probability spaces $(\Omega,\mathfrak {A},\mu )$ , with $\mathfrak {A}$ a set algebra (field of sets) over a sample space $\Omega $ , and $\mu $ a probability measure on $\mathfrak {A}$ . We represent propositions $A,B,..., X,Y,Z$ as elements of the set algebra $\mathfrak {A}$ . We let $\Delta _{\mathfrak {A}}$ denote the set of all probability distributions on $\mathfrak {A}$ . A belief set over the algebra of propositions $\mathfrak {A}$ will be identified by a single proposition in $\mathfrak {A}$ : we will work with logically closed belief sets on a finite space, where the belief set of an agent can be uniquely identified with the strongest accepted proposition, conceived of as the conjunction of all accepted propositions. An acceptance rule $\alpha $ maps a probability distribution $\mu $ in $\Delta _{\mathfrak {A}}$ to the logically strongest accepted proposition $\alpha (\mu )\in \mathfrak {A}$ . We then say that an agent accepts (or ‘believes’) a proposition $X\in \mathfrak {A}$ if and only if $\alpha (\mu )\subseteq X$ . By a slight abuse of terminology, the strongest accepted proposition will also be called the ‘belief set’.

In this framework, a qualitative (or propositional) belief revision operator is a function $\ast :\mathfrak {A}\times \mathfrak {A}\rightarrow \mathfrak {A}$ . It is understood that the first variable represents the current strongest accepted proposition, and the second the new revision input. Given a belief set K and a proposition X, the output $K^{\ast }X$ denotes the new belief set (strongest accepted proposition) obtained by revising K by new information X. We will sometimes write a revision by $X\in \mathfrak {A}$ as a function $(\cdot )^{\ast }X:\mathfrak {A}\rightarrow \mathfrak {A}$ parametrized by X.

We assume some acquaintance with the basics of AGM belief revision theory, as presented in [Reference Alchourrón, Gärdenfors and Makinson1, Reference Arló-Costa, Pedersen, Horsten and Pettigrew2, Reference Grove10]. Since the AGM revision postulates are usually formulated in terms of operators acting on sets of logical formulae, it is worth noting that we can adapt them to our context as follows: for any belief state $K\in \mathfrak {A}$ (where K is the strongest accepted proposition) and propositions $X, Y$ , the revision $\ast $ is AGM-compliant (or simply AGM) if we have the following:

• $K^{\ast }X\subseteq X$
• $K\cap X\subseteq K^{\ast }X$ (Inclusion)
• If $K\cap X\neq \emptyset $ , then $K^{\ast }X\subseteq K\cap X$ (Preservation)
• If $K^{\ast }X=\emptyset $ then $K=\emptyset $ or $X=\emptyset $
• $(K^{\ast }X)\cap Y\subseteq K^{\ast }(X\cap Y)$
• If $(K^{\ast }X)\cap Y\neq \emptyset $ , then $K^{\ast }(X\cap Y)\subseteq (K^{\ast }X)\cap Y$

As usual, a probability measure on $\mathfrak {A}$ is a function $\mu :\mathfrak {A}\rightarrow [0,1]$ which is additive (namely, $X\cap Y=\emptyset $ entails $\mu (X\cup Y)=\mu (X)+\mu (Y)$ ) and satisfies $\mu (\Omega )=1$ . Instead of $\mu (\{\omega \})$ we will write $\mu (\omega )$ for simplicity. For finite powerset algebras, a probability distribution on $\Omega $ will be identified with its probability mass function $\mu :\Omega \rightarrow [0,1]$ such that $\sum _{\omega \in \Omega }\mu (\omega )=1$ . Such a function extends uniquely to a probability measure on $\mathfrak {A}$ . We will often denote Bayesian conditioning on $X\in \mathfrak {A}$ in parametric form as $|_{ X}$ : as usual, we have $\mu _{ X}(Y):=\mu (Y|X)=\frac {\mu (Y\cap X)}{\mu (X)}$ .

For finite probability spaces with $\Omega =\{\omega _{1},..,\omega _{n}\}$ , we will identify probability measures $\mu $ with vectors $(\mu (\omega _{1}),...,\mu (\omega _{n}))\in \mathbb {R}^{n}$ , in which case the set $\Delta _{\mathfrak {A}}$ of probability measures on $\Omega $ is the regular $(n-1)$ -simplex $\Delta ^{n-1}$ . In the last section we will make use of the notion of Shannon entropy for probability distributions on finite spaces. When $\mathfrak {A}$ is a finite powerset algebra, the Shannon entropy $\mathcal {H}(\mu )$ of a distribution $\mu \in \Delta _{\mathfrak {A}}$ is defined as $\mathcal {H}(\mu )=\sum _{\omega \in \Omega }-\mu (\omega )\log \mu (\omega )$ . When $S\in \mathfrak {A}$ is some finite set, we write $\mathcal {H}(\mu \upharpoonright S):=\sum _{\omega \in S}-\mu (\omega )\log \mu (\omega )$ . Sometimes we may wish to distinguish $\mathcal {H}$ as a function of n arguments (e.g., seeing the argument $\mu $ as $(\mu (\omega _{1}),...,\mu (\omega _{n}))$ ), in which case we denote it as $\mathcal {H}_{n}(x_{1},...,x_{n})=\sum ^{n}_{i=1}-x_{i}\log x_{i}$ . A motivation for the notion of entropy (and its use in uncertain reasoning) can be found in [Reference Roman27, Reference Paris26, Reference Halpern11]; see [Reference Roman27] for basic properties of entropy measures. A useful fact is the following grouping property: whenever we have a finite partition of $\Omega $ so that $\Omega =\biguplus _{i\leq m} B_{i}$ , we have $\mathcal {H}(\mu )=\mathcal {H}_{m}(\mu (B_{1}),...,\mu (B_{m}))+\sum ^{m}_{i=1}\mu (B_{i})\mathcal {H}(\frac {1}{\mu (B_{i})}\mu \upharpoonright {B_{i}})$ , where the notation $k\mu $ denotes the measure defined as $(k\mu )(\omega )=k\cdot \mu (\omega )$ .

3 The stability rule

Stability-based acceptance principles were introduced by Leitgeb [Reference Leitgeb18], following an earlier proposal by Skyrms [Reference Skyrms31]. To set the stage for our investigation, we define the stability rule and explain how it can be seen as deriving from a few plausible requirements. We will then clarify how it provides a bridge from probabilistic reasoning to AGM revision, before presenting Lin and Kelly’s No-Go Theorem for AGM operators [Reference Kelly and Lin15].

Lockean acceptance can be understood as the conjunction of two principles. Suppose the agent’s credal state is represented by a probability measure $\mu $ on some fixed algebra $\mathfrak {A}$ . Let $\lambda _{t}(\mu )$ denote the strongest accepted proposition under the Lockean rule ( $\lambda $ ) with threshold $t\in (0.5,1]$ . Consider the following:

$$ \begin{align*} & (\rightarrow)\ \text{If}\ \lambda_{t}(\mu)\subseteq X \text{ then }\mu(X)\geq t\ { ({accept\ only\ propositions\ that\ have\ probability\ above}\ }t) \\ & (\leftarrow)\ \text{If } \mu(X)\geq t \text{ then } \lambda_{t}(\mu)\subseteq X\ { ({accept\ all\ propositions\ that\ have\ probability\ above}\ }t) \end{align*} $$

The conjunction of these principles constitutes what is known the Lockean thesis [Reference Foley7]. Both of them are intuitive but, as shown by the Lottery paradox, easily lead to accepting contradictions: for, let $\frac {1}{2}\leq t<\frac {n-1}{n}$ for some natural number $n>2$ , and suppose $\mu $ is a uniform distribution on some finite space $\Omega =\{\omega _{1},...,\omega _{n}\}$ (e.g., a lottery with n tickets, which the agent believes to be fair, where each $\{\omega _{i}\}$ represents the proposition ‘ticket i will win’, and it is assumed only one ticket can win). Then we have, for any $i\leq n$ : $\mu (\omega _{i})=1/n$ , so $\mu (\Omega \setminus \{\omega _{i}\})=\frac {n-1}{n}> t$ . Hence, $\lambda _{t}(\mu )\subseteq \Omega \setminus \{\omega _{i}\}$ . As this holds for any $i\leq n$ , it means that the agent believes of each ticket that it will not win. But it was an elementary assumption (encoded in the sample space $\Omega $ ) that some ticket will win (thus the probability that one of the tickets will win is 1). Formally, we see that $\lambda _{t}(\mu )\subseteq \bigcap _{i\leq n}\Omega \setminus \{\omega _{i}\}=\emptyset $ . So $\lambda _{t}(\mu )=\emptyset $ : i.e., the agent believes a contradiction.

To avoid the shortcomings of the Lockean rule, Leitgeb defends in [Reference Leitgeb18] an acceptance rule based on the notion of stably high probability. We first define the key notion of stability.

Definition 3.1 (Stability)

Let $(\Omega,\mathfrak {A},\mu )$ a probability space and $t\in (0.5,1]$ . A set $X\in \mathfrak {A}$ is $(\mu,t)$ -stable if and only if $\ \forall Y\in \mathfrak {A}$ such that $X\cap Y\neq \emptyset $ and $\mu (Y)>0$ , $\mu _{ {Y}}(X)\geq t$ .

Any $(\mu,t)$ -stable set has unconditional probability above the threshold (consider conditioning on the tautological proposition $\Omega $ ). Stability captures a notion of robustness under new information: a proposition X is $(\mu ,t)$ -stable if only learning a proposition inconsistent with X can bring the probability of X below the threshold.Footnote ⁴ In this sense, X has no defeaters—that is, propositions consistent with X which lower its probability below t. We can then say that a stable proposition is one whose probability is stably high.

In order to define the stability rule more precisely in our framework and understand its behavior, we recall some basic results about probabilistically stable sets from Leitgeb [Reference Leitgeb18, Reference Leitgeb21].Footnote ⁵ For our purposes, the following observation is significant:

Proposition 3.2 (Leitgeb [Reference Leitgeb18, Reference Leitgeb21])

Let $(\Omega ,\mathfrak {A},\mu )$ a probability space and $t\in (0.5,1]$ . Then the set ${\mathfrak {S}^{t}_{<1}(\mu ):=\{X\in \mathfrak {A}\,|\,\mu (X)<1\text { and }\,X\text { is }(\mu ,t)\text {-stable}\}}$ is well-ordered by set inclusion, and has order type at most $\omega $ .

Thus, the collection of all $(\mu ,t)$ -stable sets with probability less than 1 is well-ordered (and at most countable): as a consequence, whenever there is at least one such set for a given $\mu $ and t, a $\subseteq $ -least one exists. However, if the collection $\mathfrak {S}^{t}_{<1}(\mu )$ above is empty—i.e., all $(\mu ,t)$ -stable sets have measure 1—the well-order property is not guaranteed. In order to avoid this difficulty here, we follow Leitgeb in restricting our attention to those probability spaces which admit a $\subseteq $ -least set among all sets with measure 1. More formally, we work with probability spaces which satisfy the following Least Certain Set property (LCS): $\exists X\in \mathfrak {A}$ s.t. $\mu (X)=1$ and for any Y, if $\mu (Y)=1$ then $X\subseteq Y$ . Among the measures satisfying (LCS), we can find all probability measures on finite algebras, all countably additive measures on full powersets of countable sets, and all countably additive measures on regular spaces (s.t. $\mu (X)=0$ iff $X=\emptyset $ ) [Reference Leitgeb18]. Many probability spaces that are typically of interest in artificial intelligence, or used in canonical examples in formal epistemology, satisfy (LCS). In the following we will be dealing mostly with finite probability spaces, in which case this restriction shall not further concern us. (LCS) trivially ensures the following:

Proposition 3.3 (Leitgeb [Reference Leitgeb18, Reference Leitgeb21])

Let $(\Omega ,\mathfrak {A},\mu )$ a probability space satisfying (LCS), and $t\in (0.5,1]$ . Let $S_{\infty }$ the least measure-1 set in $\mathfrak {A}$ . Then the set $\mathfrak {S}^{t}(\mu ):=\mathfrak {S}^{t}_{<1}(\mu )\cup \{S_{\infty }\}$ is well-ordered by set-inclusion.

We call $\mathfrak {S}^{t}(\mu )$ the system of spheres generated by the measure $\mu $ , in reference to Grove’s construction of plausibility orderings for AGM revision [Reference Grove10], to which the link will be made shortly. When the $\mu $ and t are implicit, we can simply refer to the $(\mu ,t)$ -stable sets in $\mathfrak {S}^{t}(\mu )$ as spheres. We can define the stability rule for acceptance:

Definition 3.4 (The $\tau $ -rule)

For any probability measure $\mu $ on $\mathfrak {A}$ which satisfies (LCS), and any $t\in (0.5,1]$ , let $\mathfrak {S}^{t}(\mu )$ the system of spheres generated by $\mu $ . Then we define the map $\tau _{t}:\Delta _{\mathfrak {A}}\rightarrow \mathfrak {A}$ as

$$ \begin{align*} \tau_{t}(\mu):=\min\limits_{\subseteq}\mathfrak{S}^{t}(\mu) \end{align*} $$

Whenever the threshold is fixed and implicit in the discussion, we drop the subscript and denote this map as $\tau $ . Since, under (LCS), the system of spheres $\mathfrak {S}^{t}(\mu )$ is always well-ordered by $\subseteq $ , the expression $\tau _{t}(\mu )$ is clearly well-defined: the strongest accepted proposition under Leitgeb’s $\tau $ -rule is the logically strongest stable set for the given $\mu $ and t. A proposition X is accepted if and only if $\tau (\mu )\subseteq X$ , i.e. if it is entailed by the strongest stable set.

There are many equivalent, but conceptually distinct ways to motivate the stability rule: one simple way to do so is as follows. Leitgeb’s rule follows the basic intuition behind the Lockean rule: the idea is to preserve the ( $\rightarrow $ )-direction of the Lockean thesis, while modifying the ( $\leftarrow $ )-direction so that the agent is never led to accept a contradiction (thus avoiding Lottery-like paradoxes). Instead of believing all propositions with probability above the threshold, one restricts acceptance to only some of them. Consider the following:

Restricted Lockean Principle (RLP): Accept all $(\mu , t)$ -stable propositions.

The ( $\leftarrow $ )-direction of the Lockean thesis recommends the acceptance of all high-probability propositions (those with probability at least $t)$ . (RLP) restricts this requirement to stably high propositions: high-probability events whose conditional probabilities remain high after such conditioning. This can be seen as a natural extension of the Lockean high-probability criterion to (a certain type of) conditional beliefs. The stability rule can now be described by a simple recipe: first (1) accept all $(\mu , t)$ -stable propositions, and then (2) close under logic. The set of accepted propositions is the smallest logically closed set containing all $(\mu , t)$ -stable propositions (this follows immediately from the existence of a logically strongest stable proposition).

Alternatively, the stability rule can be described as follows: the accepted propositions are exactly those that are entailed by a probabilistically stable hypothesis. From this perspective, the stability rule can be seen as deriving from the following principle which captures the view of stable hypotheses as justifications:

The Stable Justification Principle given a threshold t and probability measure $\mu $ , accept all and only propositions that are entailed by a $(\mu ,t)$ -stable event in $\mathfrak {A}$ .

That is, an agent with subjective probability measure $\mu $ accepts exactly the propositions that admit a probabilistically stable justification.

Lastly, the stability rule $\tau $ can be seen as a particular version of the Humean thesis, which requires that each accepted proposition be probabilistically stable upon conditioning on any event that is not disbelieved (in the sense of its negation being accepted). The Humean thesis constitutes a more permissive variant of Leitgeb’s stability theory, developed in [Reference Leitgeb21]: it gives a ‘coherence’ restriction which tells us when the probabilistic and propositional representations of a doxastic state are in harmony, without making all-or-nothing beliefs supervenient on the probability space and the stability threshold. There, acceptance depends on an additional threshold parameter. We will discuss this account in §7. By contrast, in the above, both the stability rule $\tau $ and the Lockean rule $\lambda $ entirely specify exactly a unique belief set for each measure and each value of $t\in (0.5,1]$ : in each case we have well-defined maps $\lambda _{t}, \tau _{t}:\Delta _{\mathfrak {A}}\rightarrow \mathfrak {A}$ .

The stability rule always yields consistent and logically closed belief sets. The logical closure of accepted propositions is immediate, as we take all and only consequences of the strongest stable proposition. Note that any measure-1 set is always $(\mu ,t)$ -stable, as all its potential defeaters have measure 0 and cannot be conditioned upon: so, there is always some $(\mu ,t)$ -stable set in $\mathfrak {A}$ which can be chosen as strongest accepted proposition. Secondly, since $(\mu ,t)$ -stable sets always have probability at least t, the rule guarantees that the $(\rightarrow )$ -direction of the Lockean thesis is satisfied: supposing K is $(\mu ,t)$ -stable, we have that $K\subseteq X$ entails $t\leq \mu (K)\leq \mu (X)$ . This ensures that no contradiction is ever accepted. So the $\tau $ rule avoids Lottery-like paradoxes: one can easily see that, for any uniform distribution $\mu $ on a finite algebra $\mathfrak {A}=\mathcal {P}(\Omega )$ , the only $(\mu ,t)$ -stable set is $\Omega $ : take any $X\subset \Omega $ . Take $\omega \in X$ , $\omega ^{\prime }\not \in X$ and let $Y:=\{\omega ,\omega ^{\prime }\}$ . Then $\mu (Y)>0$ and $\mu _{ Y}(X)=\frac {\mu (X\cap Y)}{\mu (Y)}= \frac {\mu (\omega )}{\mu (\omega )+\mu (\omega ^{\prime })}=1/2<t$ . Thus, no such X is stable: we have $\tau _{t}(\mu )=\min _{\subseteq }\{\Omega \}=\Omega $ , so the agent accepts only the tautological proposition.

4 The connection to AGM revision

What makes the stability rule so interesting for our purpose is that, on the dynamic side, it is very closely connected to AGM revision operators. This connection, which Leitgeb explores in [Reference Leitgeb21, chap. 4], is made by noticing that each system $\mathfrak {S}^{t}(\mu )$ of $(\mu ,t)$ -stable sets can be seen as a system of spheres centered on $\tau (\mu )$ , in the sense of Grove [Reference Grove10]: i.e., (i) it is totally ordered by $\subseteq $ with minimum $\tau (\mu )$ , (ii) we have $S_{\infty }\in \mathfrak {S}^{t}(\mu )$ , and (iii) for every proposition $X\in \mathfrak {A}$ , if X intersects some $S\in \mathfrak {S}^{t}(\mu )$ , then there is a $\subseteq $ -minimal $S_{ X}\in \mathfrak {S}^{t}(\mu )$ which intersects X (this follows form the well-ordering of $\mathfrak {S}^{t}(\mu )$ ). Grove’s well-known representation theorem (see [Reference Grove10, Reference Arló-Costa, Pedersen, Horsten and Pettigrew2]) states that a revision operator $K^{\ast }(\cdot ):\mathfrak {A}\rightarrow \mathfrak {A}$ acting on a belief set K is AGM if and only if there is a system of spheres $\mathfrak {S}$ centered on K such that, for any X, we have $K^{\ast }X=S_{ X}\cap X$ , where $S_{ X}=\min _{\subseteq }\{S\in \mathfrak {S}\,|\,S\cap X\neq \emptyset \}$ . A consequence of this is that, given a system of the form $\mathfrak {S}^{t}(\mu )$ —which we can now call ‘sphere system’ with full legitimacy—we can define a revision operator on $\tau (\mu )$ as $\tau (\mu )^{\ast }X:=S_{ X}\cap X$ , where $S_{ X}:=\min _{\subseteq }\{S\in \mathfrak {S}^{t}(\mu )\,|\,S\cap X\neq \emptyset \}$ . Grove’s theorem then entails that this revision is AGM. We will simply refer to it as the revision operator generated by $\mathfrak {S}^{t}(\mu )$ , or equivalently, generated by $\tau $ (for some fixed threshold). The AGM operator generated by a system of spheres $\mathfrak {S}$ can also be described as follows: given a revision input X, take the restricted system of spheres $\mathfrak {S}\restriction X:=\{S\cap X\,|\, S\in \mathfrak {S}, S\cap X\neq \emptyset \}$ : the new belief set is given by the new smallest sphere $\min _{\subseteq } (\mathfrak {S}\restriction X)$ .

Observe that the sphere system $\mathfrak {S}^{t}(\mu )$ generates a ranking (total preorder) on $\Omega $ . First note that we can index all spheres in $\mathfrak {S}^{t}(\mu )$ by natural numbers (with the possible exclusion of $S_{\infty }$ ) so that $S_{i}\subseteq S_{j}$ if and only if $i\leq j$ : this is possible as the order type of $\mathfrak {S}^{t}_{<1}(\mu )$ is at most the ordinal $\omega $ . We can then define ranks generated by $\mathfrak {S}^{t}(\mu )$ as follows: $R_{0}:=S_{0}$ , and $R_{n+1}:=S_{n+1}\setminus S_{n}$ . If $\mathfrak {S}^{t}_{<1}(\mu )$ is infinite, we have $S_{\infty }=\bigcup _{i\in \mathbb {N}} S_{i}=\biguplus _{i\in \mathbb {N}} R_{i}$ . We say that, for $\omega $ , $\omega ^{\prime }\in \Omega $ , $\omega \preceq ^{\tau }_{\mu }\omega ^{\prime }$ if and only if $\min \{i\,|\,\omega \in R_{i}\}\leq \min \{i\,|\,\omega ^{\prime }\in R_{i}\}$ . We read $\omega \preceq ^{\tau }_{\mu }\omega ^{\prime }$ as saying that $\omega $ is at least as plausible as $\omega ^{\prime }$ . For finite powerset spaces, we can also treat all states in $\Omega $ with measure 0 as being less plausible than all other states. In this way, the $\tau $ -rule effectively translates a probability measure into a qualitative plausibility ordering, using the notion of stability as a bridge between the quantitative and propositional representations.

Thus, we see how the $\tau $ -rule generates a qualitative revision which is AGM. In this sense, it provides a qualitative revision policy: to each probability measure, it assigns not only a qualitative belief state, but also an (AGM-complying) qualitative revision operator for revising the latter. We can now turn to dynamics, and ask to what extent the resulting AGM revisions can be said to agree with Bayesian conditioning.

5 Tracking and the No-Go theorem

In general, a qualitative revision policy A maps each $\mu \in \Delta _{\mathfrak {A}}$ to a proposition $\alpha (\mu )$ and a revision operator $\ast $ applicable to that proposition,Footnote ⁶ and dependent only on $\mu $ . We then say the policy A is based on the underlying acceptance rule $\alpha $ . It is AGM whenever all revision operators it generates are. Following Lin and Kelly’s [Reference Kelly and Lin15] terminology, a revision policy is said to track conditioning if the following condition holds.

Definition 5.1 (Tracking)

A qualitative belief revision policy based on the acceptance rule $\alpha $ tracks Bayesian conditioning if we have the following commutativity property:

$$ \begin{align*} \forall \mu\in\Delta_{\mathfrak{A}}, \forall X\in\mathfrak{A}\ with\ \mu(X)>0, \alpha(\mu)^{\ast}{X}=\alpha(\mu_{ X}), \end{align*} $$

where $\ast $ is the associated revision operator.

This notion is illustrated in Figure 1. We say that AGM revision can track Bayesian conditioning modulo $\alpha $ if there is some AGM-complying revision policy that is based on $\alpha $ and tracks Bayesian conditioning. This corresponds to a straightforward requirement of agreement between the probabilistic and qualitative doxastic states under translation by $\alpha $ , which must persist under updating by new information. The problem of tracking Bayesian conditioning with qualitative revision operators seems to have been first explicitly addressed by Lin and Kelly in [Reference Kelly and Lin15]: there, they severely constrain the hope for a harmonious link between Bayesian kinematics and AGM operators, by proving the following:

Theorem 5.2 (The No-Go Theorem, Lin and Kelly [Reference Kelly and Lin15])

Let $|\Omega |>2$ , $\mathfrak {A}$ a field of sets over $\Omega $ , and let $\alpha :\Delta _{\mathfrak {A}}\rightarrow \mathfrak {A}$ be any sensible acceptance rule. Then no AGM revision policy based on $\alpha $ tracks Bayesian conditioning.

What is a sensible rule? Sensibility amounts to a list of four properties which are intended to give minimal conditions for acceptance rules to count as well-behaved. The most important of those conditions is that the acceptance rule should never lead one to accept the contradictory proposition $\emptyset $ . The other three give fairly natural constraints on the geometry of acceptance zones in $\Delta _{\mathfrak {A}}$ (an acceptance zone for a proposition X is simply the set of all measures in $\Delta _{\mathfrak {A}}$ where X is the strongest accepted proposition, according to the rule). We omit the exact definition of sensible rules, as the general case for arbitrary rules is not at the center of our attention here: for our purposes, it suffices to say that Leitgeb’s $\tau $ -rule can be easily checked to be sensible. Nonetheless, the No-Go Theorem deserves to be stated in its general form, as it indicates that the problem of reconciling AGM revision with Bayesian kinematics goes beyond the difficulties encountered by the $\tau $ -rule:Footnote ⁷ simply put, under relatively weak constraints on the acceptance rule, AGM revision cannot track Bayesian conditioning.

6 How the stability rule fails tracking

Let us take a closer look at those difficulties, to understand how the No-Go Theorem applies in our case. It entails that, once we fix a threshold, a sample space $\Omega $ and algebra $\mathfrak {A}$ (with $|\Omega |>2$ : we assume this henceforth), there always will be some $\mu \in \Delta _{\mathfrak {A}}$ and $X\in \mathfrak {A}$ s.t. $\mu (X)\neq 0$ and $\tau (\mu )^{\ast }{X}\neq \tau (\mu _{ X})$ , where $\ast $ is the $\tau $ -generated revision operator. Note the following:

Observation 6.1. Let $\mu \in \Delta _{\mathfrak {A}}$ , $t\in (0.5,1]$ , and $\ast $ the AGM revision generated by $\tau _{t}$ . Then $\forall X\in \mathfrak {A}$ with $\mu (X)>0$ , the set $\tau (\mu )^{\ast }{X}$ is $(\mu _{ X},t)$ -stable.

Proof. We show $S_{ X}\cap X$ is $(\mu _{ X},t)$ -stable (where $S_{ X}$ is the strongest stable set intersecting X, as defined above). Let $Y\in \mathfrak {A}$ such that $S_{ X}\cap X\cap Y\neq \emptyset $ and $\mu _{ X}(Y)>0$ . As $\mu _{ X}(Y)=\frac {\mu (Y\cap X)}{\mu (X)}$ , this entails $\mu (X\cap Y)>0$ . We have $S_{ X}\cap (X\cap Y)\neq \emptyset $ , so since $S_{ X}$ is $(\mu ,t)$ -stable, we can write $\mu _{X\cap Y}(S_{ X})\geq t$ . But $\mu _{X\cap Y}(S_{ X})=\frac {\mu ((S_{ X}\cap X)\cap Y)}{\mu (X\cap Y)}=\mu _{X\cap Y}(S_{ X}\cap X)$ ; we can write $\mu _{X\cap Y}(S_{ X}\cap X)=\mu _{ X}(S_{ X}\cap X\, |\, Y)\geq t$ , as required. □

Since $\tau (\mu _{ X})$ is the minimal $(\mu _{ X},t)$ -stable set, this observation entails that all revision cases yield $\tau (\mu _{ X})\subseteq \tau (\mu )^{\ast }{X}$ : thus, the belief state obtained by Bayesian conditioning followed by translation through the $\tau $ -rule is in general at least as logically strong as the one obtained by translation followed by the associated AGM revision. As a consequence, whenever tracking fails for the $\tau $ -rule (and the associated revision), we must have the strict entailment $\tau (\mu _{ X})\subset \tau (\mu )^{\ast }{X}$ . The converse holds trivially: this characterizes all revision cases for which $\tau $ -generated revision fails to commute with Bayesian conditioning (modulo $\tau $ ). Consider the following:

Example 6.2. Let $\Omega :=\{\omega _{1},...,\omega _{4}\}$ and $\mathfrak {A}$ the full power set algebra over $\Omega $ . Set $t=0.7$ . Consider the distribution $\mu =(0.5,0.33,0.12,0.05)\in \Delta _{\mathfrak {A}}$ . We have $\tau (\mu )=\{\omega _{1},\omega _{2},\omega _{3}\}$ . Let $X:=\{\omega _{1},\omega _{3},\omega _{4}\}$ . We have $\tau (\mu )^{\ast }X=\{\omega _{1},\omega _{3}\}=\tau (\mu )\cap X$ (this is in accordance with the Inclusion and Preservation postulates). But conditioning on X gives $\mu _{ X}\approx (0.746,0, 0.179,0.075)$ , and we get $\tau (\mu _{ X})=\{\omega _{1}\}$ : conditioning raises the probability of $\omega _{1}$ just enough to make it $(\mu _{ X},t)$ -stable. So $\tau (\mu _{ X})\subset \tau (\mu )^{\ast }{X}$ , and tracking fails.

As this example illustrates, we cannot always guarantee that $\tau (\mu )^{\ast }X$ will be the minimal $(\mu _{ X},t)$ -stable set. Thus the $\tau $ -generated revision operator $\ast $ cannot track Bayesian conditioning modulo $\tau $ . This shows how the No-Go Theorem affects the specific class of revision operators generated by $\tau $ (for any threshold t). Note, however, that the theorem extends to all AGM revision operators: no matter what AGM operator we begin with, a counterexample to commutativity will exists, effectively preventing the $\tau $ -rule itself—and not only the duo $\tau $ -rule + $\tau $ -generated revision—to establish the desired harmony between AGM revision and Bayesian conditioning.

Our example also illustrates one reason why this happens. Here, consider the qualitative revision $\tau (\mu )\mapsto \tau (\mu _{ X})$ , generated by Bayesian conditioning and the $\tau $ -rule. It was bad enough that this “Bayesian” revision did not coincide with the $\tau $ -generated revision. This does not, in itself, prevent the possibility of representing the former through an AGM-complying operation. But, to make things worse, it is clear that this cannot be done, as the revision $\tau (\mu )\mapsto \tau (\mu _{ X})$ simply fails to satisfy the AGM postulates. In particular, we have $\tau (\mu )\cap {X}\not \subseteq \tau (\mu _{ X})$ ; so the Inclusion postulate fails. Consider another case where tracking fails:

Example 6.3. The agent is given an urn. She knows that it is either of the type A—containing 30% black marbles and 70% white marbles, or B —containing 70% black and 30% white marbles. She believes option $\textbf {A}$ and option $\textbf {B}$ are equally plausible. Suppose she draws (with replacement) 10 marbles form the urn. How many black marbles would she have to draw to be convinced the urn is of type $\textbf {A}$ ?Our sample space contains the propositions $\{\textbf {A}\cap D_{i},\,\textbf {B}\cap D_{i}\,|\,0\leq i\leq 10\}$ , where A, B indicate which urn was given, while $D_{i}$ means that i black marbles have been drawn in our 10-draw trial. Here we assume a 50-50 prior distribution for urns A, B with likelihoods given by a binomial distribution

$$ \begin{align*} \mu(D_i\,|\,\textbf{U}_{\theta}) = {10 \choose i}\theta^{i}(1-\theta)^{10-i} \end{align*} $$

where $\textbf {A}=\textbf {U}_{0.3}$ and $\textbf {B}=\textbf {U}_{0.7}$ . We obtain the following joint distribution (approximate values):

For a threshold of 1/2, the system of spheres $\mathfrak {S}^{1/2}$ generates the following ranking (we have two ranks 0 and 1):

This means that, using the $\tau $ -generated revision policy, drawing 0, 1, 2, or 3 black marbles convinces the agent that she was given urn A : e.g., learning $D_{2}$ leaves only the propositions $\textbf {A}\cap D_{2}$ (with rank 0) and $\textbf {B}\cap D_{2}$ (with rank 1), so the agent believes $\textbf {A}\cap D_{2}$ . However, drawing 4 marbles yields disagreement between conditioning and revision: on the AGM side, the agent is undecided between the two urns, as she remains with the propositions $\textbf {A}\cap D_{4}$ and $\textbf {B}\cap D_{4}$ of equal rank. On the Bayesian side, however, we get $\mu (\textbf {A}\,|\, D_{4})\approx 0.845$ , while $\mu (\textbf {B}\,|\, D_{4})\approx 0.155$ . The proposition $\textbf {A}\cap D_{4}$ is then the strongest $\mu _{D_{4}}$ -stable set, and so gets the least rank: the agent believes the urn is of type A .

This down-to-earth example is a good illustration of how the cautiousness of AGM revision may prevent agreement with Bayesian conditioning.

While the No-Go Theorem precludes the possibility of perfect commutativity with AGM modulo the $\tau $ -rule, it is natural to ask if there is any way one could still make the case for a certain harmony between AGM and Bayesian reasoning. Despite its failures, the $\tau $ rule imposes itself as a natural tool for this purpose. Our discussion so far reveals two reasons why this is so: first, it is a well-motivated acceptance principle, which avoids Lottery-like paradoxes and remains close to Lockean intuitions. Secondly, it provides an elegant (though imperfect) bridge to qualitative revision dynamics through its connection with sphere-based models.

We will now consider two attempts at restoring harmony between conditioning and AGM revision. One idea consists in raising the threshold for stability after conditioning: an simple argument shows that this will not work in general. We then prove our main result, showing how the stability rule generates AGM revision operators from Bayesian conditioning via an application of the maximum entropy principle.

7 Can we save tracking by changing the threshold?

As the $\tau $ -rule depends on the threshold parameter t (which, according to Leitgeb, is “contextually determined” [Reference Leitgeb18, p. 14]), a straightforward idea suggests itself for regaining tracking. Whenever commutativity fails for the rule $\tau _{t}$ with some threshold t and update of $\mu $ by X, one could try and show that the belief set $\tau _{t}(\mu )^{\ast }X$ obtained by AGM revision corresponds to applying the stability rule to $\mu _{ X}$ , but for a different stability threshold. This would amount to showing that $\tau _{t}(\mu )^{\ast }X =\tau _{q}(\mu _{ X})$ , where q is some new threshold, so that the revision $\tau _{t}(\mu )\mapsto \tau _{q}(\mu _{ X})$ is AGM. One could then argue that selecting the AGM-compliant belief is a perfectly legitimate instance of stability-based acceptance, only with a different threshold (see Figure 2).

Fig. 2 Rationalizing AGM revision by raising the threshold: here we pick a new threshold $q > t$ .

For a simple illustration, recall Example 6.2: we have a distribution $\mu =(0.5,0.33,0.12,0.05)$ , and a threshold $t=0.7$ , hence $\tau (\mu )=\{\omega _{1},\omega _{2},\omega _{3}\}$ . Let $X:=\{\omega _{1}, \omega _{3}\}$ . Then $\mu _{ X}\approx (0.806, 0,0.194,0)$ and we get $\tau (\mu _{ X})=\{\omega _{1}\}$ . Then tracking fails, since $\tau (\mu )^{\ast }X=\{\omega _{1}, \omega _{3}\}=X$ . But now we can lift the threshold to a new value $q=0.807$ (any value above 0.806 will do). Then clearly $\tau (\mu _{ X})$ is not $(\mu _{ X},q)$ -stable (here, $\mu _{ X}(\tau (\mu _{ X}))$ is not even above the threshold). However $\tau (\mu )^{\ast }X$ is—indeed it has probability 1—and is also the least $(\mu _{ X},q)$ -stable set. We have thus ‘corrected’ the threshold as required: we approximate commutativity as we have $\tau _{t}(\mu )^{\ast }X=\tau _{q}(\mu _{ X })$ , with $\ast $ the AGM revision operator generated by $\tau _{t}$ .

It is easy to see, however, that the threshold-raising approach will not work in general. Define the degree of stability of $X\in \mathfrak {A}$ with respect to a measure $\mu \in \Delta _{\mathfrak {A}}$ as the value

$$ \begin{align*} \mathcal{S}(\mu,X):={\rm sup}\,\{q\in[0,1]\,|\,X\text{ is }(\mu,q)\text{-stable}\}. \end{align*} $$

Equivalently, $\mathcal {S}(\mu ,X)=\inf \big \{\mu (X\,|\,Y)\,|\,X\cap Y\neq \emptyset ,\, \mu (Y)>0\big \}$ . The degree of stability $\mathcal {S}(\mu ,X)$ measures how resilient X is under conditioning on propositions consistent with it: so that X is $(\mu ,t)$ -stable if and only if $\mathcal {S}(\mu ,X)\geq t$ .Footnote ⁸

Consider how the threshold raising method works when it does: we start with a measure $\mu \in \Delta _{\mathfrak {A}}$ and $X\in \mathfrak {A}$ such that $\tau _{t}(\mu _{ X})\subset \tau _{t}(\mu )^{\ast }X$ with both of those sets $(\mu _{ X},t)$ -stable. For the method to work, we need a threshold high enough so that $\tau _{t}(\mu _{ X})$ is no longer stable, but $\tau _{t}(\mu )^{\ast }X$ is. So it works if and only if there is some $q\in (0.5,1]$ for which $\tau _{t}(\mu )^{\ast }X$ is the least $(\mu _{ X},q)$ -stable set, while $\tau _{t}(\mu _{ X})$ is not stable for q. One can raise the threshold to “correct” the revision process only if $\mathcal {S}(\mu _{ X}, \tau (\mu _{ X}))<\mathcal {S}(\mu _{ X},\tau (\mu )^{\ast }X)$ . But this can easily fail, and so not all cases of tracking failure can be corrected by raising the threshold in this way. We illustrate this with a simple counterexample:

Example 7.1. Consider the same setting as in Example 6.2. We have $t=0.7$ , a distribution $\mu =(0.5,0.33, 0.12,0.05)$ , and $X=\{\omega _{1},\omega _{3},\omega _{4}\}$ . Here $\tau _{t}(\mu )=\{\omega _{1},\omega _{2},\omega _{3}\}$ . Then $\mu _{ X}\approx (0.746,0, 0.179,0.075)$ , and tracking fails since $\tau _{t}(\mu _{ X})=\{\omega _{1}\}$ and $\tau _{t}(\mu )^{\ast }X=\{\omega _{1},\omega _{3}\}$ . We want to find $q\in (0.5,1]$ such that $\tau _{t}(\mu )^{\ast }X=\tau _{q}(\mu _{ X})$ : for such a q, we must have $\tau _{t}(\mu )^{\ast }X$ as (the least) $(\mu _{ X},q)$ -stable set, while $\tau _{t}(\mu _{ X})$ is not stable. But we have the following degrees of stability with respect to $\mu _{ X}$ :

$$ \begin{align*} \mathcal{S}(\tau_{t}(\mu_{ X}))=\frac{\mu_{ X}(\omega_{1})} {\mu_{ X}(\omega_{1})+\mu_{ X}(\Omega\setminus\{\omega_{1}\})} = \frac{0.746}{1}=0.746 \end{align*} $$

And

$$ \begin{align*} \mathcal{S}(\tau_{t}(\mu)^{\ast}X)=\frac{\mu_{ X}(\omega_{3})} {\mu_{ X}(\omega_{3})+\mu_{ X}(\Omega\setminus\{\omega_{1},\omega_{3}\})}=\frac{0.179}{0.179+0.075}\approx0.705 \end{align*} $$

This means that the maximal q such that $\tau _{t}(\mu )^{\ast }X$ is $(\mu _{ X},q)$ -stable is $q= \mathcal {S}(\tau (\mu )^{\ast }X) \approx 0.705$ . But for $\tau _{t}(\mu _{ X})$ , this maximal threshold is at $q = 0.746$ . So any threshold q for which $\tau _{t}(\mu )^{\ast }X$ is $(\mu _{ X},q)$ -stable also makes $\tau _{t}(\mu _{ X})$ stable. We cannot raise the threshold so as to approximate commutativity in our sense.

Cases like this one are easy to generate. For any algebra $\mathfrak {A}$ of events generated by more than 3 atoms, there is always an open neighbourhood of measures in $\Delta _{\mathfrak {A}}$ which admit of a counterexample to tracking that cannot be corrected by threshold-raising.Footnote ⁹ The failure of the threshold-raising method is a simple reflection of the fact that degrees of stability of propositions do not respect their order of logical entailment.

This observation is relevant to a different approach to the tracking problem for the stability rule, which also relies on a variable threshold. This is an account developed more recently in Leitgeb’s book [Reference Leitgeb21], based on a distinct, more permissive version of the stability rule: it is what Leitgeb calls the Humean Thesis. Just for the purpose of this section, let the belief set of an agent be represented by a set $\mathbb {K}\subseteq \mathfrak {A}$ of accepted propositions. The Humean thesis is a constraint on accepted beliefs according to which a rational agent’s belief set $\mathbb {K}$ must satisfy

(1)

$$ \begin{align} A\in \mathbb{K} \text{ if and only if } \forall X\in Poss(\mathbb{K}) \text{ such that }\mu(X)>0,\,\mu(A\,|\,X)>t \end{align} $$

where $Poss(\mathbb {K}):=\{X\in \mathfrak {A}\,|\,X^{c}\not \in \mathbb {K}\}$ is the collection of doxastically possible hypotheses. A probabilistic reasoner who follows the Humean thesis believes all and only those hypotheses that are stable under conditioning on any proposition that is not disbelieved. Leitgeb shows that, on any finite probability space, the sets $\mathbb {K}$ satisfying (1) correspond exactly to sets of the form $\{X\in \mathfrak {A}\,|\, S\subseteq X\}$ where S is a $(\mu ,t)$ -stable set. So the Humean reasoner selects a stable set and closes under deduction. Further, we have the following

Theorem 7.2 (Leitgeb [Reference Leitgeb21])

Representation theorem for the Humean Thesis. Let $(\Omega ,\mathfrak {A},\mu )$ a finite probability space and $\mathbb {K}\subseteq \mathfrak {A}$ . Fix $t\in (0.5,1)$ . The following are equivalent.

(i) $\mathbb {K}$ is consistent $(\emptyset \not \in \mathbb {K})$ and satisfies the Humean thesis (1).
(ii) $\mathbb {K}=\{X\in \mathfrak {A}\,|\, S\subseteq X\}$ where S is a $(\mu ,t)$ -stable set.
(iii) $\mathbb {K}=\{X\in \mathfrak {A}\,|\, \mu (X)\geq q\}$ where $q=\mu (S)$ for a $(\mu ,t)$ -stable set S.

In (ii) and (iii), if $\mu (S)=1$ , then S is the least set of measure 1.

The Humean Thesis derives much of its strength from this representation theorem. The theorem shows that, on finite spaces, the ‘nondeterministic’ stability rule, which allows the choice of any $(\mu ,t)$ -stable set as strongest accepted proposition, coincides with the Humean thesis; and that both coincide with a version of the Lockean thesis in which the choice of threshold is constrained by the measure of stable sets.Footnote ¹⁰ In other words, it shows that three distinct ways of thinking about acceptance yield one and the same acceptance rule:

• acceptance as (Humean) stability under doxastically possible propositions,
• acceptance as entailment by a chosen probabilistically stable hypothesis,
• acceptance as selecting a consistent, logically closed belief given by the Lockean thesis (with the choice of the Lockean threshold set to the measure of some stable set).Footnote ¹¹

This account of acceptance evidently does not yield a functional acceptance rule whereby belief supervenes on credences and a stability threshold: a probability measure over a fixed algebra (even together with a fixed threshold) does not uniquely determine a set of accepted hypotheses. Given a probability measure, there exist as many possible belief sets for the agent as there are probabilistically stable hypotheses: one generates a belief set by selecting a $(\mu , t)$ -stable hypothesis and closing the belief set under logical consequence. The Humean thesis is best seen as a stuctural constraint specifying which $\langle $ probability function, belief set $\rangle$ pairs are admissible for rational agents: we allow exactly those pairs $(\mu , K)$ , with $\mu $ the agent’s credence function and K her strongest accepted proposition, where K is $(\mu ,t)$ -stable. This approach is in line with what Leitgeb [Reference Leitgeb21] calls the ‘dual system’ account of the relationship between credences and all-or-nothing belief, where neither attitude is reducible to the other, or taken as more fundamental. The more ‘reductive’ stability rule $\tau $ corresponds to selecting the logically strongest belief set satisfying either one of (i), (ii) or (iii). This canonical choice can be additionally justified as the collection of all propositions that admit a probabilistically stable justification. That is, the $\tau $ -rule amounts to accepting all and only those propositions X for which there exists some stable hypothesis entailing X (by contrast, under the general Humean Thesis, it is permitted to accept only some, and not all, propositions admitting a stable justification: one can pick any stable set as strongest accepted proposition, not necessarily the logically strongest one).

Cases of tracking failure bring to the fore a few notable differences between the general Humean Thesis and the stability rule $\tau $ . It is illuminating to consider the way in which the Humean Thesis deals with the problem of harmonising AGM revision with conditioning.

While the functional version of the stability rule that we consider here ( $\tau $ ) is subject to Lin and Kelly’s No-Go theorem, the Humean Thesis is not. Since $\tau (\mu )^{\ast }X$ is $(\mu _{X}, t)$ stable, it is obvious that conditioning on the initial measure $\mu $ and performing the corresponding AGM revision on $\tau (\mu )$ preserves the Humean thesis. The agent can perform the AGM-compliant revision by X and obtain the belief set $\tau (\mu )^{\ast }X$ , and still respect the Humean Thesis with respect to a new Lockean threshold set to $q:=\mu _{X}(\tau (\mu )^{\ast }X)$ —the probability, after conditioning, of the AGM-revised belief set. Then, according the the Humean Thesis, selecting $\tau (\mu )^{\ast }X$ as strongest accepted proposition is in fact in accordance with Lockean principles, since one gets the full Lockean thesis with respect to the updated measure $\mu _{ X}$ and threshold q. In other words, we get $\forall Y\in \mathfrak {A}$ , $\tau (\mu )^{\ast }X\subseteq Y$ if and only if $\mu _{ X}(Y)\geq q$ , as shown in [Reference Leitgeb18, p. 34]. And indeed, if the pair $(\mu , K)$ satisfies the Humean Thesis by K being a $(\mu ,t)$ -stable set, then the post-update pair $(\mu _{X}, K^{\ast }X)$ , with $K^{\ast }X$ the corresponding AGM-updated belief set, still conforms to the Humean thesis, for $K^{\ast }X$ is $(\mu _{X},t)$ -stable. This property is what Leitgeb [Reference Leitgeb21, p. 174] calls Robustness Persistence, and it is at the root of his most recent approach to reconcile Bayesian conditioning with AGM revision.

Our observations about the threshold-raising method illustrate an important component of the Humean approach: instances of tracking failure highlight the fact that the (‘Humean’) stability threshold and the Lockean threshold are, in general, distinct. In particular, in all cases where threshold-raising fails to rationalize the AGM revision induced by the stability rule, it is always the case that the stability threshold and the new, post-revision Lockean threshold cannot be equal.

Observation 7.3. Let $\mu \in \Delta _{\mathfrak {A}}$ , $X\in \mathfrak {A}$ , and let $\tau $ have a fixed threshold t. Let $K_{\textsf{AGM}}:= \tau (\mu )^{\ast }X$ the AGM-revised belief set, and $K_{\textsf{L}}:=\tau (\mu _{ X})$ the belief set obtained by conditioning followed by the $\tau $ -rule. Suppose the following hold:

• $K_{\textsf{L}}\subset K_{\textsf{AGM}}$ (tracking fails);
• $\mathcal {S}(\mu _{ X}, K_{\textsf{L}})\geq \mathcal {S}(\mu _{ X},K_{\textsf{AGM}})$ (the case is noncorrectible);
• $\mu _{ X}(K_{\textsf{L}})<1$ .

Set $q:=\mu _{ X}(K_{\textsf{AGM}})$ . Then $K_{\textsf{AGM}}$ is not $(\mu _{ X},q)$ -stable.

Proof. For some fixed $t\in (0.5,1]$ , take any ‘noncorrectible’ case as above. Suppose $\tau (\mu )^{\ast }X$ is $(\mu _{ X},q)$ -stable. Then, since we know $\mathcal {S}(\mu _{ X}, \tau (\mu _{ X}))\geq \mathcal {S}(\mu _{ X},\tau (\mu )^{\ast }X)$ , the set $\tau (\mu _{ X})$ is also $(\mu _{ X},q)$ -stable. This entails $\mu _{ X}(\tau (\mu _{ X}))\geq q$ . But we also know $\tau (\mu _{ X})\subset \tau (\mu )^{\ast }X$ . This entailsFootnote ¹² $\mu _{ X}(\tau (\mu _{ X}))<\mu _{ X}(\tau (\mu )^{\ast }X)$ , which means simply $\mu _{ X}(\tau (\mu _{ X}))<q$ . This is a contradiction. □

When we apply the method suggested by the Humean thesis to noncorrectible cases, by raising the threshold to q as above, we violate the Stable Justification Principle (SJP) with respect to q. For example:

Example 7.4. Consider again Example 6.2. There, set $\tau (\mu )^{\ast }X$ as strongest accepted proposition, and let $q:=\mu _{ X}(\tau (\mu )^{\ast }X)=0.925$ . It is easy to see that we obtain the full Lockean thesis for q, but $\tau (\mu )^{\ast }X$ itself is not $(\mu _{ X},q)$ -stable. For take $Y=\{\omega _{3},\omega _{4}\}$ . Then $\mu _{ X}(Y)=0.254>0$ and $(\tau (\mu )^{\ast }X)\cap Y=\{\omega _{3}\}\neq \emptyset $ ; but by conditioning on Y we get $\mu _{ X}(\tau (\mu )^{\ast }X\,|\,Y)= \frac {\mu _{ X}(\omega _{3})}{\mu _{ X}(Y)}=\frac {0.179}{0.254}=0.705<q$ . Thus Y is a defeater for $\tau (\mu )^{\ast }X$ .

Under the more permissive constraint on acceptance given by the Humean Thesis, tracking Bayesian conditioning by AGM revision is at least permissible, as it does not violate the constraints relating the agent’s probability measure to their accepted propositions. Thus the agent can follow AGM revisions by selecting an appropriate Lockean threshold q: but by doing so in cases like the above, the resulting strongest accepted proposition is not stable with respect to the chosen Lockean threshold q (though it is stable with respect the distinct stability threshold t). So the Lockean threshold must necessarily be distinct of the threshold for stability, if we want AGM revisions to preserve both the Lockean thesis and the principle that the strongest accepted proposition be stable.

The divergence between the Lockean and the Humean thresholds is no surprise, as it is built into the Humean Thesis: the two parameters play distinct functional roles in the theory. Roughly, the Lockean threshold determines high enough probability for belief, while the stability threshold determines high enough resiliency to be a candidate for the logically strongest belief. While the Humean stability threshold can remain fixed, the Lockean threshold depends on the given probability measure $\mu $ : it can only take a certain range of values, which depends on the exact structure of $(\mu ,t)$ -stable sets [Reference Leitgeb21, chap. 4]. Once we have the collection of $(\mu ,t)$ -stable sets (where t plays the role of the stability threshold), for any choice of a stable set K, the agent will comply with the Lockean thesis (e.g., for a Lockean threshold set to $q=\mu (K)$ ).

Let us take stock. The Humean Thesis imposes a joint constraint on probability measures and belief sets, and does not yield a functional acceptance rule: the agent’s probability measure (even together with a stability threshold) does not uniquely determine her belief set. The stability and Lockean thresholds are two distinct parameters. AGM-complying revisions are compatible with the Humean Thesis, and they correspond to a change in the Lockean threshold (which takes a value different from the stability threshold).

The merits of this approach notwithstanding, there are several reasons to investigate the possibility of having AGM-compliant revision operators in the context of the (more ‘reductive’) stability rule $\tau $ . Firstly, one may naturally wonder whether it is possible to mitigate the consequences of the No-Go result by exploring restricted forms of compatibility between conditioning and revision in the original setting for which the result was proved—that of acceptance rules, understood as functions from probability models to belief sets. While Lin and Kelly [Reference Kelly and Lin15] use their No-Go theorem as an argument against the AGM account, Leitgeb [Reference Leitgeb21, Appendix C] uses it as an argument against the supervenience of belief on credences, as captured by a fixed acceptance rule. We, on the other hand, investigate AGM revision in the context of the stability rule in the hope for a compromise solution, by identifying salient conditions under which AGM revision can come close to track conditioning, without leaving the framework of acceptance rules.

Among acceptance rules compliant with the Humean Thesis, the original stability rule $\tau $ is arguably the most canonical choice. It is attractive in part because it follows from a simple and plausible refinement of the Lockean thesis: it is a straightforward extension of the high-probability requirement to conditional probabilities. By contrast to the general Humean Thesis, the stability rule only relies on one threshold parameter: the threshold determines what counts as having high probability. A proposition is stable precisely by virtue of its probability being high—above the Lockean threshold—and remaining so under learning new information (as long as the new information does not logically contradict the proposition). In a sense, under the stability rule $\tau $ , the stability threshold is a Lockean threshold, but applied more broadly to conditional probabilities (and with the right-to-left direction of the Lockean thesis weakened appropriately). The stability rule thus obviates the need for an account of how the Humean and Lockean parameters relate to each other.

Further, while AGM-complying revisions are compatible with the Humean Thesis, the constraints placed by the latter are somewhat permissive: when learning new information, nothing prevents an agent from selecting, without violating the Humean Thesis, a belief set (and a corresponding Lockean threshold) that would result in a violation of the AGM postulates (e.g., a violation of the Inclusion axiom). The choice of AGM revisions then requires an independent justification (Leitgeb [Reference Leitgeb21, p.33], for instance, adopts the AGM account of belief revision as an independent assumption). This is perhaps the best that can be hoped for under a two-systems view of belief, where all-or-nothing beliefs do not supervene on the agent’s credences. But the question remains whether it is possible to establish a tighter connection between AGM operators and Bayesian conditioning, closer in spirit to the notion of compatibility captured by the tracking condition. One significant question is whether AGM revision can naturally arise from Bayesian conditioning for agents who do not have independent reasons to adopt the AGM postulates: is there a sense in which probabilistic reasoning, mediated through an acceptance rule, can systematically give rise to AGM revision operators?Footnote ¹³ Can AGM revision be seen as a qualitative counterpart to Bayesian conditioning for agents using the stability rule? As we will now see, these questions admit positive answers.

8 Recovering AGM operators via Maximum Entropy

In this section, we show that the consequences of Lin and Kelly’s No-Go theorem can be mitigated by considering another approach to the comparison of probabilistic and qualitative belief dynamics. We employ elementary notions from information theory to show how AGM revision can emerge from Bayesian conditioning in cases where the agent relies on a purely qualitative representation of her information. The results offered here suggest a general approach for studying qualitative revision operators as special cases of probabilistic reasoning under a specific kind of information loss.

Acceptance rules can be motivated as providing a coarse-gained representation of credences: this representation has an important theoretical role to play in reasoning situations where the agent, for reasons of limited time or cognitive resources, may be unwilling or unable to access her full numerical credences and reason probabilistically. For instance, these can be reasoning scenarios where it is difficult, or unreasonably costly for her to evaluate precise probabilities of the events in question (this will be discussed in more detail in §8.2); or perhaps she may simply not have fully formed probabilistic judgments and only possess relative plausibility judgments.

Consider then a situation in which only a qualitative representation of belief is available to the agent—say, a single proposition X, or a sphere system $\mathfrak {S}$ (equivalently, a total plausibility preorder on states: see §2). Suppose the agent is committed to Bayesian conditioning as an update method. A natural way to proceed, when faced with new information, would be to obtain a probabilistic representation $\mu $ of her doxastic state (appropriately compatible with X or $\mathfrak {S}$ ) in order to perform conditioning on $\mu $ and translate the result back into a qualitative representation, by means of a selected acceptance rule. Provided such a representative probability measure can be selected in a principled way, the qualitative revision thus generated (the result of first applying Bayesian conditioning to a ‘best’ probabilistic representative of the qualitative belief state, and then applying the acceptance rule) can be understood as the best a Bayesian can do: that is, if the agent endorses Bayesian conditioning and follows the acceptance rule in question, it is the rational revision that they would perform in situations where their doxastic state is only characterized qualitatively.

As we will now see, it turns out that, when the acceptance rule in question is Leitgeb’s $\tau $ -rule, this can be done in such a way that the resulting revision operation is always AGM (for finite probability spaces): in this sense, we show how AGM revision emerges in a natural way from purely probabilistic principles together with the $\tau $ -rule.

The key principle we will use the (following version of) the Maximum Entropy Principle.

Maximum Entropy Principle (MAXENT):

If all that is known to the agent is that a probability distribution is constrained to lie within some zone $\mathcal {N}\subseteq \Delta _{\mathfrak {A}}$ , the agent selects a distribution with maximum entropy among those in $\mathcal {N}$ , if such exist.

Versions of the maximum entropy principle abound in the information theory and artificial intelligence literature (see [Reference Halpern11]) and various justifications have been offered for its application, both in the context of statistical inference in particular sciences (e.g. in statistical mechanics, as famously advocated by Jaynes [Reference Jaynes13]) as well as in the general mathematical study of uncertain reasoning (for instance, see [Reference Paris26] or [Reference Halpern11]). The gist of the argument is that the Shannon entropy $\mathcal {H}(\mu )$ of a distribution $\mu $ measures the uncertainty about which state in $\Omega $ is the true one; so the higher the entropy of the distribution, the less biased it is toward any particular element of the sample space. In this way, if the only available constraint on the distribution is that it lies within some zone $\mathcal {N}\subseteq \Delta _{\mathfrak {A}}$ , selecting a maximal entropy distribution in $\mathcal {N}$ , if it exists, amounts to choosing a least biased distribution (or the most equivocal one) given the available information. Such a distribution is naturally thought to best represent the available information, in virtue of its being, in Jaynes’ words, “uniquely determined as the one which is maximally noncommittal with regard to missing information” [Reference Jaynes13, p. 623].

MAXENT can guide the selection of a probabilistic representation of the agent’s belief state. Suppose the we start from a qualitative representation $\mathcal {Q}$ of the agent’s beliefs ( $\mathcal {Q}$ could be a proposition in $\mathfrak {A}$ , or a sphere system $\mathfrak {S}$ ). Which measures count as compatible with $\mathcal {Q}$ ? It is reasonable to expect the chosen acceptance rule $\tau $ to help specify what counts as an acceptable probabilistic representation of $\mathcal {Q}$ . For a probability measure $\mu $ to represent a belief set K we require that $\mu $ generates the belief set K via our acceptance rule. More generally, for a measure $\mu $ to count as a representation we must at least require $\tau (\mu )=\mathcal {Q}$ if $\mathcal {Q}\in \mathfrak {A}$ , and $\mathfrak {S}^{t}(\mu )=\mathcal {Q}$ if $\mathcal {Q}$ is a system of spheres (recall there that $\mathfrak {S}^{t}(\mu )$ is the system of spheres given by the $(\mu ,t)$ -stable sets). Now, there will in general be many suitable representations of $\mathcal {Q}$ in that sense: the acceptance rule only constrains them to lie in a region $\mathcal {N}\subseteq \Delta _{\mathfrak {A}}$ . This is where MAXENT can put to work to select a minimally biased representation among those in $\mathcal {N}$ .

8.1 The main results

We can now give mathematical substance to our claim and prove the two main results. The first states that, starting from a belief set K, there exists a unique maximum entropy distribution $\mu $ subject to the constraint $\tau (\mu )=K$ , and any revision obtained by conditioning on $\mu $ , and then applying the stability rule, will be AGM-complying. The second theorem states that, given a system of spheres $\mathfrak {S}$ , the corresponding AGM revision operator is equivalent to using Bayesian conditioning and applying the stability rule to the maximum entropy measure among all the probability measures generating $\mathfrak {S}$ . In this case also, the maximum entropy measure is unique.

Theorem 8.1. Let $\mathfrak {A}$ finite, $K\neq \emptyset $ a proposition in $\mathfrak {A}$ and $\tau =\tau _{t}$ for some $t\in [0.5,1)$ . Then there is a unique maximal entropy distribution $\mu \in \Delta _{\mathfrak {A}}$ such that $\tau (\mu )=K$ . Moreover, for any $X\in \mathfrak {A}$ with $\mu (X)>0$ , we have $\tau (\mu _{ X})=K^{\ast }X$ , where $\ast $ is the AGM revision operator generated by $\mathfrak {S}^{t}(\mu )$ .

From here on, we assume without loss of generality that $\mathfrak {A}$ is a powerset algebra. First let us introduce the following useful notions: given a measure $\mu \in \Delta _{\mathfrak {A}}$ , we say that $\mu $ is rank-uniform if it is uniform on all ranks in the sphere system $\mathfrak {S}^{t}(\mu )$ associated with $\mu $ . We say that two measures $\rho $ and $\mu $ are rank-equivalent if for they generate the same ranks, and for any rank R in their associated systems of spheres we have $\rho (R)=\mu (R)$ . A measure $\rho $ entropy-dominates $\mu $ if $\mathcal {H}(\rho )\geq \mathcal {H}(\mu )$ . Lastly, we say that $\mu $ has m ranks if $\mathfrak {S}^{t}(\mu )$ does (equivalently, when $|\mathfrak {S}^{t}(\mu )|=m$ ). We will make use of the following observation:

Observation 8.2. Any $\mu \in \Delta _{\mathfrak {A}}$ is entropy-dominated by some rank-equivalent, rank-uniform probability measure $\rho \in \Delta _{\mathfrak {A}}$ .

Proof. We adapt a standard argument for the entropy-maximality of uniform distributions. Let $\mu \in \Delta _{\mathfrak {A}}$ . Write all $\mu $ -ranks as $R_{1},...,R_{m}$ (generated by the $\tau $ rule for some fixed threshold t). We have

(2)

$$ \begin{align} \mathcal{H}(\mu)=\sum\limits_{\omega\in\Omega}-\mu(\omega)\log\mu(\omega)=&\sum\limits^{m}_{i=1}\left(\sum\limits_{\omega\in R_{i}}-\mu(\omega)\log\mu(\omega) \right) \nonumber \\ =&\sum\limits^{m}_{i=1}\mathcal{H}(\mu\upharpoonright R_{i}) \end{align} $$

Now assume $\mu $ is not rank-uniform. Take the rank-equivalent, rank-uniform measure $\rho $ , defined as $\rho (\omega )=\frac {\mu (R_{i})}{|R_{i}|}$ for $\omega \in R_{i}$ . This generates the exact same system of spheres as $\mu $ : (i) clearly $\rho $ cannot create any new stable sets, as all states in the same rank have equal measure, and (ii) $\rho $ preserves the stability of any $(\mu ,t)$ -stable set. Suppose S is $(\mu ,t)$ -stable: this entails that $\frac {\mu (\omega _{m})}{\mu (\omega _{m})+\mu (S^{c})}\geq t$ , where $\omega _{m}$ is the state of minimal measure in S. But the definition of $\rho $ entails $\rho (\omega _{m})\geq \mu (\omega _{m})$ and $\rho (S^{c})=\mu (S^{c})$ , so we have $\frac {\rho (\omega _{m})}{\rho (\omega _{m})+\rho (S^{c})}\geq t$ : S is also $(\rho ,t)$ -stable. So $\mu $ and $\rho $ generate the same ranks. We also have, for each rank $R_{i}$ , $\rho (R_{i})=\mu (R_{i})$ .

For all ranks $R^{\prime }$ on which $\mu $ is uniform, we have $\mu \upharpoonright R^{\prime }=\rho \upharpoonright R^{\prime }$ , and so $\mathcal {H}(\mu \upharpoonright R^{\prime })=\mathcal {H}(\rho \upharpoonright R^{\prime })$ . Now let R be a rank on which $\mu $ is not uniform. We show $\mathcal {H}(\rho \upharpoonright R)>\mathcal {H}(\mu \upharpoonright R)$ : then, Equation (2) gives us the desired result. We have $\mathcal {H}(\rho \upharpoonright R)=\sum _{\omega \in R}\frac {\rho (R)}{|R|}\log \left (\frac {|R|}{\rho (R)}\right )=\rho (R)\log \left (\frac {|R|}{\rho (R)}\right )$ . Now consider the function $\theta :x\mapsto x\log x$ defined on $\mathbb {R}^{+}$ . This is strictly convex on $(0,\infty )$ . Using Jensen’s inequality,Footnote ¹⁴ we can write

$$ \begin{align*} \theta\left(\sum\limits_{\omega\in R}\frac{\mu(\omega)}{|R|}\right)<&\sum\limits_{\omega\in R}\frac{1}{|R|}\theta(\mu(\omega))=\frac{1}{|R|}\sum\limits_{\omega\in R}\theta(\mu(\omega)) \end{align*} $$

As $\sum _{\omega \in R}\mu (\omega )=\mu (R)=\rho (R)$ , we write

$$ \begin{align*} |R|\cdot\theta\Big(\frac{\rho(R)}{|R|}\Big)<&\sum\limits_{\omega\in R}\theta(\mu(\omega)) \\ \text{so } -|R|\cdot\theta\Big(\frac{\rho(R)}{|R|}\Big)>&-\sum\limits_{\omega\in R}\theta(\mu(\omega))=-\sum\limits_{\omega\in R}\mu(\omega)\log\mu(\omega) \end{align*} $$

Now the left-hand side equals $-|R|\cdot (\frac {-\rho (R)}{|R|})\log (\frac {|R|}{\rho (R)})=\rho (R)\log \left (\frac {|R|}{\rho (R)}\right )$ , while the right-hand side equals $\mathcal {H}(\mu \upharpoonright R)$ . So we have

$$ \begin{align*} \rho(R)\log\left(\frac{|R|}{\rho(R)}\right)=\mathcal{H}(\rho\upharpoonright R)>\mathcal{H}(\mu\upharpoonright R) \end{align*} $$

So $\rho $ entropy-dominates $\mu $ , as required. □

Note that we have shown something slightly stronger: that any non rank-uniform $\mu $ is strictly dominated by a rank-equivalent, rank-uniform measure. Another key observation is:

Observation 8.3. If $\mu $ is rank-uniform, then for any $X\in \mathfrak {A}$ , the revision $\tau (\mu )\mapsto \tau (\mu _{ X})$ is the AGM revision generated by $\mathfrak {S}^{t}(\mu )$ .

Proof. Suppose $\mu $ is rank-uniform. We show that $\mathfrak {S}^{t}(\mu _{ X})=\mathfrak {S}^{t}(\mu )\restriction X$ (recall that $\mathfrak {S}^{t}(\mu )\restriction X$ is the restriction of the system of spheres to X). By an argument similar to the one in Observation 6.1, we get $\mathfrak {S}^{t}(\mu )\upharpoonright X\subseteq \mathfrak {S}^{t}(\mu _{ X})$ . We only need the other inclusion. Suppose, for reductio, that $\mathfrak {S}^{t}(\mu _{ X})$ strictly refines $\mathfrak {S}^{t}(\mu )\upharpoonright X$ . Then there exist at least two states $\omega _{1},\omega _{2}\in X$ which both belong to the same $\mu $ -rank R, but are separated into different $\mu _{ X}$ -ranks: say $\omega _{1}\in R_{1}$ and $\omega _{2}\in R_{2}$ . As $\mu $ is rank-uniform, we must have $\mu (\omega _{1})=\mu (\omega _{2})$ and hence $\mu _{ X}(\omega _{1})=\frac {\mu (\omega _{1})}{\mu (X)}$ clearly equals $\mu _{ X}(\omega _{2})$ . But states with equal measure cannot have different ranks. Now $\mathfrak {S}^{t}(\mu _{ X})=\mathfrak {S}^{t}(\mu )\upharpoonright X$ entails $\tau (\mu _{ X})=\min \mathfrak {S}^{t}(\mu _{ X})=\tau (\mu )^{\ast } X$ : first recall that $\tau (\mu )^{\ast } X=S_{ X}\cap X$ , where $S_{ X}$ is the smallest sphere in $\mathfrak {S}^{t}(\mu )$ intersecting X. Suppose towards a contradiction that $\exists Y\in \mathfrak {S}^{t}(\mu )\upharpoonright X$ such that $Y\subset S_{ X}\cap X$ : then we have $Y=S^{\prime }\cap X$ for some $S^{\prime }\in \mathfrak {S}^{t}$ with $S^{\prime }\subset S_{ X}$ , contradicting the minimality of $S_{ X}$ . □

With this at hand, we can prove theorem 8.1.

Proof of Theorem 8.1. Let $\emptyset \neq K\in \mathfrak {A}$ . We can assume $X\neq \Omega $ (if $X=\Omega $ we simply take $\mu $ uniform on $\Omega $ and we are done). Let $\tau ^{-1}(K):=\{\mu \in \Delta _{\mathfrak {A}}\,|\,K=\tau (\mu )\}$ . We want to find a maximum entropy measure $\mu $ in $\tau ^{-1}(K)$ . We now define a distribution $\mu $ such that (1) $\mu $ is uniform on K, (2) $\mu $ is uniform on $\Omega \setminus K$ , and (3) K is $(\mu ,t)$ -stable. Given (1), condition (3) means that we must have

$$ \begin{align*} \frac{\mu(K)}{|K|}\geq\frac{t}{1-t}(1-\mu(K)) \end{align*} $$

Here we set $\mu $ to satisfy $\frac {\mu (K)}{|K|}=\frac {t}{1-t}(1-\mu (K))$ . In other words, among all distributions satisfying (1), (2) and (3), we pick the one that assigns a minimal measure to K. It is clear that $\mathfrak {S}^{t}(\mu )$ contains exactly two spheres (hence two ranks, K and $\Omega \setminus K$ ) and $\mu $ is uniform on both ranks. We claim $\mu $ is the unique distribution with maximum entropy in $\tau ^{-1}(K)$ , i.e., we show $\forall \rho \neq \mu $ in $\tau ^{-1}(K)$ , $\mathcal {H}(\rho )<\mathcal {H}(\mu )$ . Let $\rho \in \tau ^{-1}(K)$ , $\rho \neq \mu $ . By Observation 8.2 above, we know that each measure is entropy-dominated by a rank-equivalent measure which is uniform on all ranks. This means we can assume, without loss of generality, that $\rho $ is rank-uniform and show that $\mu $ strictly entropy-dominates $\rho $ . Here we sketch the rest of the argument, with the specific calculations relegated to the appendix (the proof of the next proposition contains the main ideas, and addresses a more general case). We first consider the case where $\rho $ has $>2$ ranks, and then the case where $\rho $ has 2 ranks, with $\rho (K)>\mu (K)$ . In both cases, we show $\mathcal {H}(\mu )>\mathcal {H}(\rho )$ . The first case is dealt with by a similar argument as in Observation 8.2. In the second case, we show that $\mathcal {H}(\rho )$ strictly decreases as $\rho (K)$ increases on the subspace of all rank-uniform measures with 2 ranks, with $\mathcal {H}(\rho )$ seen as a function of $\rho (K)$ . We show entropy is strictly decreasing for $\rho (K)\in [\mu (K),1]$ , and so is maximized exactly when $\rho (K)=\mu (K)$ , or equivalently for $\rho =\mu $ . So $\mu $ , as we have defined it, is the required maximum entropy distribution, and it is unique. Finally, $\mu $ is rank-uniform, so Observation 8.3 guarantees that any feasible revision $\tau (\mu )\mapsto \tau (\mu _{ X})$ is the AGM revision $K\mapsto K^{\ast }X$ generated by $\mathfrak {S}^{t}(\mu )$ . □

The upshot is that one can uniquely recover AGM revision via the maximum entropy principle in the case where the agent’s doxastic state is represented only by her belief set. Suppose we select, in accordance with MAXENT, the distribution which best represents her state of knowledge—namely, the maximum entropy distribution lying in the acceptance zone for K under $\tau $ . Then we have commutativity: using Bayesian conditioning and an application of the $\tau $ -rule, we get the same result as applying to K the AGM revision generated by $\tau $ and the corresponding maximum entropy distribution.

Further, it is not necessary to ‘forget’ this much information for this maximum-entropy method to work: a similar, and perhaps more telling result still holds if more information is available about the doxastic state. Recall from §2 that each system of spheres $\mathfrak {S}$ generates a plausibility ranking—a total preorder on states. Suppose the agent begins with a full plausibility ranking of the basic hypotheses $\omega \in \Omega $ (equivalently, suppose we begin with a system of spheres). This corresponds to a case where we do not retain complete information about the agent’s probability measure, but we have preserved more information than merely the agent’s raw propositional belief set. Note, in particular, that the system of spheres gives an intermediate description of her doxastic state which can be seen as encoding conditional beliefs, or belief-revision strategies. Then there is still a unique maximum entropy distribution generating this system of spheres: from this distribution, Bayesian conditioning always generates the AGM revision corresponding to the ranking:Footnote ¹⁵

Theorem 8.4. Let $\mathfrak {S}$ be a system of spheres on $\mathfrak {A}$ , with $\mathfrak {A}$ finite. Then there is a unique maximum entropy $\mu $ on $\mathfrak {A}$ such that $\mathfrak {S}^{t}(\mu )=\mathfrak {S}$ . Moreover, for any $X\in \mathfrak {A}$ with $\mu (X)>0$ , we have $\mathfrak {S}^{t}(\mu _{ X})=\mathfrak {S}^{t}(\mu )\upharpoonright X$ , and so the associated revision $\tau (\mu )\mapsto \tau (\mu _{ X})$ is AGM.

Proof. We maximize $\mathcal {H}$ over $[\mathfrak {S}]:=\{\mu \in \Delta \,|\, \mathfrak {S}^{t}(\mu )=\mathfrak {S}\}$ . By Observation 8.2, each non rank-uniform distribution is strictly dominated in entropy by a rank-uniform one generating the same system of spheres: so a maximum entropy distribution on this domain must be rank-uniform. Thus it is enough to maximize entropy on the set $\mathcal {U}$ of all rank-uniform measures $\mu $ such that $\mathfrak {S}^{t}(\mu )=\mathfrak {S}$ . We have

$$ \begin{align*}\arg\max\limits_{\mu\in\mathcal{U}} \mathcal{H}(\mu) = \arg\max\limits_{\mu\in[\mathfrak{S}]} \mathcal{H}(\mu) \end{align*} $$

For any $\mu \in \mathcal {U}$ , we express entropy as a function of the measure of the n ranks $R_{1},...,R_{n}$ generated by $\mathfrak {S}$ (we assume $n>1$ : if there is only one rank, we simply take the uniform distribution). Writing $x_{i}=\mu (R_{i})$ , we obtain ${\mathcal {H}(\mu )=h(x_{1},...,x_{n}):=\sum _{i=1}^{n}x_{i}\log (\frac {|R_{i}|}{x_{i}})}$ . We thus have a convex optimization problem with linear inequality constraints.Footnote ¹⁶ To see this, note that the stability constraints—that each $\bigcup ^{i}_{j\leq i}R_{j}$ be $(\mu ,t)$ -stable—are of the form:

$$ \begin{align*} \forall i\text{ with } 1\leq i< n,\,\,\,\frac{x_{i}}{|R_{i}|}\geq\frac{t}{1-t}\sum\limits_{j=i+1}^{n}x_{j} \end{align*} $$

since $\frac {x_{i}}{|R_{i}|}=\mu (\omega )$ for each $\omega \in R_{i}$ . So we are maximizing the (strictly concave) function $h(\textbf {x})=\sum _{i=1}^{n}x_{i}\log (\frac {|R_{i}|}{x_{i}})$ under one equality constraint $\sum _{i=1}^{n}x_{i}=1$ and linear inequality constraints given by $x_{i}\geq 0$ ( $i\leq n$ ) and ${g_{i}(\textbf {x})\geq 0}$ for each $i<n$ , where ${g_{i}(\textbf {x})=x_{i}-|R_{i}|\cdot \frac {t}{1-t}\sum _{j=i+1}^{n}x_{j}}$ . We want to maximize h on $\mathcal {D}:= \{ \textbf {x}\in \Delta ^{n-1}\,|\, {g_{i}(\textbf {x})\geq 0} \text { for all }i<n \}$ . Clearly we have a one-to-one correspondence between vectors in $\mathcal {D}$ and distributions in $\mathcal {U}$ : for every $\mu \in \mathcal {U}$ , we have $(\mu (R_{1}),...,\mu (R_{n}))\in \mathcal {D}$ with $h(\mu (R_{1}),...,\mu (R_{n})) = \mathcal {H}(\mu )$ , and for each $\textbf {x}\in \mathcal {D}$ we have a unique measure $\mu \in \mathcal {U}$ with $h(\textbf {x})=\mathcal {H}(\mu )$ defined by $\mu (\omega ) = x_{i}/ |R_{i}|$ where $\omega \in R_{i}$ . So $\arg \max _{\textbf {x}\in \mathcal {D}} h(\textbf {x})$ uniquely determines $\arg \max _{\mu \in \mathcal {U}} \mathcal {H}(\mu )$ .

The optimization region $\mathcal {D}$ is an intersection of closed half-spaces, and so it is a closed convex set. As a subset of the simplex $\Delta ^{n-1}$ , $\mathcal {D}$ is also bounded. This means that $\mathcal {D}$ is compact: and since the function h is continuous, it admits a maximum on $\mathcal {D}$ . Because the function is strictly concave, and we maximize it over a convex set, the maximum is unique.Footnote ¹⁷ Since

$$ \begin{align*}\max_{\textbf{x}\in \mathcal{D}} h(\textbf{x}) = \max_{\mu\in\mathcal{U}} \mathcal{H}(\mu),\end{align*} $$

such a maximum gives us the desired maximal entropy distribution. Lastly, this distribution is rank-uniform, and so by Observation 8.3 it generates an AGM revision. This suffices to establish the theorem. We now show an explicit computation of the maximum entropy distribution.

Explicit form. An explicit solution for the maximum entropy distribution can be obtained as follows. The maximum is reached exactly for the unique $\textbf {x}\in \Delta ^{n-1}$ which satisfies all equalities $g_{i}(\textbf {x})=0$ , where all the $g_{i}$ constraints are active. Solving this system of equations, an expression for this $\textbf {x}$ is obtained. Writing $r_{i}:=|R_{i}|$ and $k:=\frac {t}{1-t}$ , the solution for $\textbf {x}=(x_{1},...,x_{n})$ is

(3)

$$ \begin{align} x_{n}&=\frac{1}{ \prod_{j=1}^{n-1}(1+r_{j}k)} \end{align} $$

(4)

$$ \begin{align} x_{i}&= \frac{r_{i}k} {\prod_{j=1}^{i}(1+ r_{j}k)} \,\,\,\text{for } i\leq n-1 \end{align} $$

This solution can be checked via Lagrange multipliers by appealing to the Karush–Kuhn–Tucker conditions for convex optimization [Reference Boyd and Vandenberghe4, p. 244] (checking that this is indeed a feasible solution satisfying the KKT conditions is sufficient, since h is concave, h and the $g_{i}$ s are all differentiable, and the constraints are all linear). Here is a more elementary (and less cumbersome) argument.

Take any $\textbf {x}\in \mathcal {D}$ for which the first $i-1$ constraints are active ( $g_{j}(\textbf {x})=0$ for $j<i$ ), but the i-th one is not, i.e., $g_{i}(\textbf {x})> 0$ : this last inequality means we have

$$ \begin{align*} x_{i}> \frac{r_{i}k}{\prod^{i}_{j=1}(1+r_{j}k)} \end{align*} $$

We first show that we can then find another $\textbf {y}\in \mathcal {D}$ with higher entropy—that is, $h(\textbf {y})>h(\textbf {x})$ —and for which $g_{j}(\textbf {y})=0$ for all of $\{1,.., i\}$ . Given $\epsilon>0$ , define an $(\epsilon , i)$ -improvement of $\textbf {x}$ as

$$ \begin{align*} \textbf{x}^{\epsilon}_{i} = \big(x_{1}, \ldots, x_{i}-\epsilon, x_{i+1}+\epsilon,...,x_{n} \big). \end{align*} $$

Now consider the following lemma (which we prove below):

Lemma 8.5. Let $\textbf {x}\in \mathcal {D}$ with $g_{j}(\textbf {x})=0$ for all $j< i$ but $g_{i}(\textbf {x})> 0 (i<1)$ . We have

$$ \begin{align*} \forall \epsilon \in \Bigg(0, x_{i} - \frac{r_{i}k}{\prod^{i}_{j=1}(1+r_{j}k)}\Bigg], \,\, h(\textbf{x}^{\epsilon}_{i})>h(\textbf{x}). \end{align*} $$

Given the lemma, the result easily follows: take any $\textbf {x}\in \mathcal {D}$ for which some constraint is inactive. Pick the least such i for which $g_{i}(\textbf {x})>0$ . Take its $(\epsilon _{i}, i)$ -improvement with

$$ \begin{align*}\epsilon_{i}= x_{i} - \frac{r_{i}k}{\prod^{i}_{j=1}(1+r_{j}k)}\end{align*} $$

Note that $\textbf {x}^{\epsilon _{i}}_{i}\in \mathcal {D}$ . Proceeding in this way for each subsequent $j>i$ , we have

$$ \begin{align*}h\big(\textbf{x}\big)<h\big(\textbf{x}^{\epsilon_{i}}_{i}\big)< h\big( (\textbf{x}^{\epsilon_{i}}_{i})^{\epsilon_{i+1}}_{i+1} \big)<\cdots< h\big( (...(\textbf{x}^{\epsilon_{i}}_{i})...)^{\epsilon_{n-1}}_{n-1} \big).\end{align*} $$

The last element $(...(\textbf {x}^{\epsilon _{i}}_{i})...)^{\epsilon _{n-1}}_{n-1}$ in this improvement sequence is always equal to the solution $\textbf {x}^{\ast }$ given by Equations (3) and (4), since it is the unique vector $\textbf {x}^{\ast }$ satisfying all $g_{i}(\textbf {x}^{\ast })=0$ : at each step $i\leq n-1$ , the i-th coordinate is replaced by the term $r_{i}k/\prod ^{i}_{j=1}(1+r_{j}k)$ . Now every point $\textbf {y}\in \mathcal {D}$ different from $\textbf {x}^{\ast }$ has some inactive constraints, and so we can construct an improvement sequence culminating in $\textbf {x}^{\ast }$ , which witnesses that $h(\textbf {y})< h(\textbf {x}^{\ast })$ . This improvement procedure corresponds to moving along the boundary of the polytope $\mathcal {D}$ , at each step reaching the intersection of the first i hyperplanes $g_{i}(\textbf {x})=0$ . The procedure terminates at the maximum entropy point $\textbf {x}^{\ast }$ , as given by the intersection of all hyperplanes.

Proof of the Lemma. We first prove the lemma for $(\epsilon ,1)$ -improvements: that is, we show:

(5)

$$ \begin{align} \text{For any } \textbf{x}\in\mathcal{D}\text{ with }g_{1}(\textbf{x})> 0,\text{ we have }h(\textbf{x}^{\epsilon}_{1})>h(\textbf{x})\text{ for all }\epsilon \in \Big(0, x_{1} - \frac{r_{1}k}{1+r_{1}k}\Big]. \end{align} $$

Take any $\textbf {x}\in \mathcal {D}$ for which the first constraint is not active, i.e., $g_{1}(\textbf {x})> 0$ —this means we have

$$ \begin{align*} x_{1}> \frac{r_{1}k}{1+r_{1}k}. \end{align*} $$

(Recall that here $r_{i} := |R_{i}|\in \mathbb {N}\setminus \{0\}$ and $k := \frac {t}{1-t}> 1$ ). Now consider an $(\epsilon ,1)$ -improvement $ \textbf {x}^{\epsilon }_{1} = \big (x_{1}-\epsilon , x_{2}+\epsilon , x_{3},\ldots,x_{n} \big ). $ Their difference in entropy is equal to

(6)

$$ \begin{align} h(\textbf{x}^{\epsilon}_{1}) - h(\textbf{x}) &= x_{1} \log \Big(\frac{x_{1}}{x_{1}-\epsilon}\Big) + x_{2} \log \Big(\frac{x_{2}}{x_{2}+\epsilon}\Big)-\epsilon\log\Big(\frac{r_{1}}{x_{1}-\epsilon}\Big) + \epsilon\log\Big(\frac{r_{2}}{x_{2}+\epsilon}\Big) \end{align} $$

(7)

$$ \begin{align} &= x_{1} \log \Big(\frac{x_{1}}{x_{1}-\epsilon}\Big) + x_{2} \log \Big(\frac{x_{2}}{x_{2}+\epsilon}\Big)+ \epsilon\log\Big(\frac{r_{2}}{r_{1}}\cdot\frac{x_{1}-\epsilon}{x_{2}+\epsilon}\Big) \end{align} $$

We show that this is strictly positive for $\epsilon \in (0, x_{1} - \frac {r_{1}k}{1+r_{1}k}]$ . First note that

(8)

$$ \begin{align} \log\Big(\frac{r_{2}}{r_{1}}\cdot\frac{x_{1}-\epsilon}{x_{2}+\epsilon}\Big) \geq \log(r_{2}k)>0 \end{align} $$

This is so because $x_{1}-\epsilon \geq \frac {r_{1}k}{1+r_{1}k}$ , and it follows from this and $x_{1}+x_{2}\leq 1$ that $x_{2}+\epsilon \leq 1- \frac {r_{1}k}{1+r_{1}k}$ . Then we have that

$$ \begin{align*} \frac{x_{1}-\epsilon}{x_{2}+\epsilon} \geq r_{1}k, \,\,\text{and so }\,\,\frac{r_{2}}{r_{1}}\cdot\frac{x_{1}-\epsilon}{x_{2}+\epsilon} \geq r_{2}k, \end{align*} $$

establishing (8). Now we can write

(9)

$$ \begin{align} h(\textbf{x}^{\epsilon}_{1}) - h(\textbf{x}) \geq g(\epsilon) + \epsilon \log(r_{2}k), \end{align} $$

with $g(\epsilon ):= x_{1} \log \Big (\frac {x_{1}}{x_{1}-\epsilon }\Big ) + x_{2} \log \Big (\frac {x_{2}}{x_{2}+\epsilon }\Big )$ . Observe that $\frac {dg}{d\epsilon }=\frac {\epsilon (x_{1}+x_{2})}{(x_{1}-\epsilon )(x_{2}+\epsilon )}$ is strictly positive when $\epsilon \in (0, x_{1} - \frac {r_{1}k}{1+r_{1}k}]$ . Now we have $g(0) = 0$ and $g(\epsilon )$ strictly increases on this interval: it follows from (9) that $h(\textbf {x}^{\epsilon }_{1}) - h(\textbf {x})>0$ for $0<\epsilon \leq \frac {r_{1}k}{1+r_{1}k}$ , and statement (5) is established. This case suffices to prove the Lemma. To see why, consider some $\textbf {x}\in \mathcal {D}$ with $g_{j}(\textbf {x})=0$ for all $j< i$ but $g_{i}(\textbf {x})> 0 (i<n)$ . This means that we have

$$ \begin{align*} \textbf{x} = \Big( \frac{r_{1}k}{1+r_{1}k},\dots, \frac{r_{i-1}k} {\prod_{j=1}^{i-1}(1+ r_{j}k)}, x_{i},\dots x_{n}\Big) \end{align*} $$

with $x_{i}> \frac {r_{i}k} {\prod _{j=1}^{i}(1+ r_{j}k)}$ . Let $\epsilon \in \Big (0,x_{i} -\frac {r_{i}k} {\prod _{j=1}^{i}(1+ r_{j}k)}\Big ]$ . We show that $h(\textbf {x}^{\epsilon }_{i})> h(\textbf {x})$ . Consider the restricted distribution $\textbf {y}\in \mathbb {R}^{n-i+1}$ defined by re-normalizing:

$$ \begin{align*}(y_{i},...,y_{n}):=\frac{1}{1 - \sum^{i-1}_{j=1} x_{j} } (x_{i},\dots,x_{n} )\end{align*} $$

For convenience we write $\alpha := 1 - \sum ^{i-1}_{j=1} x_{j}$ . Note that

$$ \begin{align*} \alpha=\big(1 - \sum\limits^{i-1}_{j=1} x_{j}\big) \,\,=\,\,1-\sum\limits^{i-1}_{j=1} \Bigg[\frac{r_{j}k} {\prod_{\ell=1}^{j}(1+ r_{\ell}k)}\Bigg] \,\,=\,\, \frac{1}{{\prod_{j=1}^{i-1}(1+ r_{j}k)}} \end{align*} $$

so that

$$ \begin{align*} y_{m} = x_{m}\cdot\Big(\prod\limits_{j=1}^{i-1}(1+ r_{j}k) \Big) = \frac{1}{\alpha}x_{m} \text{ for all }m\in\{i,\dots,n\}. \end{align*} $$

Since $x_{i}> \frac {r_{i}k} {\prod _{j=1}^{i}(1+ r_{j}k)}$ , this entails that $y_{i}> \frac {r_{i}k}{1+r_{i}k}$ . Now, consider the restricted function $\tilde {h}= \sum ^{n}_{j=i}x_{j}\log (r_{j}/x_{j})$ (h restricted to the last $n-i+1$ coordinates). We apply the proof of (5) to the vector $(y_{i},...,y_{n})$ : for any positive $\epsilon ^{\ast }\leq y_{i} - \frac {r_{i}k}{1+r_{i}k}$ , we obtain an improvement

$$ \begin{align*} \textbf{y}^{\ast} =(y^{\ast}_{i}, \dots, y^{\ast}_{n}) := \Big(y_{i}-\epsilon^{\ast},\, y_{i+1}+\epsilon^{\ast}, \dots, y_{n}\Big) \end{align*} $$

with $\tilde {h}(\textbf {y}^{\ast } )>\tilde {h}(\textbf {y})$ . Here take $\epsilon ^{\ast }:=\frac {1}{\alpha }\epsilon $ , and note we have $\frac {1}{\alpha }\epsilon \leq \frac {1}{\alpha } \big (x_{i} -\frac {r_{i}k} {\prod _{j=1}^{i}(1+ r_{j}k)} \big ) = y_{i} - \frac {r_{i}k}{1+r_{i}k}$ . Now consider

$$ \begin{align*} \textbf{x}^{\ast} = \Big( \frac{r_{1}k}{1+r_{1}k},\dots, \frac{r_{i-1}k} {\prod_{j=1}^{i-1}(1+ r_{j}k)}, \alpha y^{\ast}_{i},\dots ,\alpha y^{\ast}_{n}\Big) \end{align*} $$

and observe that this corresponds to taking the improvement $\textbf {x}^{\ast } = \textbf {x}^{\epsilon }_{i}$ . Since $(x_{i},\dots , x_{n}) = (\alpha y_{i},\dots ,\alpha y_{n})$ , we getFootnote ¹⁸

$$ \begin{align*}h(\textbf{x}^{\ast} ) - h(\textbf{x}) = \tilde{h}(\alpha\textbf{y}^{\ast} ) - \tilde{h}(\alpha\textbf{y}) = \alpha [\tilde{h}(\textbf{y}^{\ast} ) - \tilde{h}(\textbf{y})]>0.\end{align*} $$

This concludes the proof of Lemma 8.5. □

Thus we can appeal to the maximum entropy principle to pick a unique distribution which generates an AGM revision, even when some more information about the credal state is preserved—i.e., we have the full plausibility ordering (ranking) encoded in a system of spheres. Assume the agent’s doxastic state is specified by a given system of spheres $\mathfrak {S}$ (equivalently, the corresponding plausibility ordering). Then, following MAXENT, take the relevant maximum entropy measure as a probabilistic representation of the doxastic state. From there, Bayesian conditioning followed by the $\tau $ -rule yields the same result as using AGM revision on the initial sphere system directly (i.e., taking the appropriate restriction of the system of spheres).

More explicitly, Proposition 8.4 and Equations (3) and (4) give us an immediate solution for the maximum entropy distribution that generates a given ranking: given a system of spheres $\mathfrak {S}$ with ranks $R_{1},\dots , R_{n}$ , the maximum entropy distribution such that $\mathfrak {S}^{t}(\mu )=\mathfrak {S}$ is the rank-uniform $\mu $ determined by:

(10)

$$ \begin{align} \mu (R_{n}) &=\frac{1}{ \prod_{j=1}^{n-1}(1+|R_{j}|\cdot \frac{t}{1-t})} \end{align} $$

(11)

$$ \begin{align} \mu (R_{i}) &= \frac{|R_{i}| \cdot \frac{t}{1-t} } {\prod_{j=1}^{i}(1+|R_{j}| \cdot \frac{t}{1-t})} \,\,\,\text{for } i\leq n-1 \end{align} $$

so that the probability of each state $\omega \in \Omega $ is given by

$$ \begin{align*}\mu(\omega)= \frac{\mu(R_{i})}{|R_{i}|} = \frac{\frac{t}{1-t} } {\prod_{j=1}^{i}(1+|R_{j}| \cdot \frac{t}{1-t})} , \end{align*} $$

where $R_{i}$ is the rank containing $\omega $ . This depends only on the threshold and the size of the ranks. Example 8.6 below illustrates an application of this result.

Example 8.6. Let $t=3/4$ , so that $k:=\frac {t}{1-t} = 3$ . Suppose we have a system of spheres $\mathfrak {S}$ given by the ranking depicted below.

We have $(r_{1}, r_{2}, r_{3}, r_{4}) = (3,2,4,2)$ . Then Equations (3) and (4) give us

$$ \begin{align*}\Big(\mu(R_{1}), \mu(R_{2}), \mu(R_{3}),\mu(R_{4}) \Big) = \Big(\frac{819}{910},\frac{78}{910} ,\frac{12}{910} ,\frac{1}{910} \Big) \end{align*} $$

and so the maximum entropy distribution that generates this ranking is given by

$$ \begin{align*}\arg\max\limits_{\mu\in[\mathfrak{S}]}\mathcal{H}(\mu) = \frac{1}{910}(273 , 273 , 273, 39, 39, 3 ,3 ,3, 3, 0.5 ,0.5)\end{align*} $$

Now consider updating on new information $X=\{\omega _{4}, \omega _{5}, \omega _{8}, \omega _{9}, \omega _{11}\}$ .

We see that $\mathfrak {S}(\mu _{ {X}}) = \mathfrak {S}\upharpoonright X$ , so that $\tau (\mu _{ {X}}) = \tau (\mu )^{\ast }X = \{\omega _{4}, \omega _{5}\}$ . Note that the choice of the maximum entropy distribution is significant here, as there also exist distributions that generate $\mathfrak {S}$ but do not commute with the corresponding AGM revision. Take for instance

$$ \begin{align*}\rho=\frac{1}{4060}\big(1218,1218,1218, 300, 80, 6, 6, 6, 6, 1, 1\big)\end{align*} $$

This also generates $\mathfrak {S}$ , although it is not rank-uniform since $\rho (\omega _{4})>\rho (\omega _{5})$ . Here $\mathfrak {S}(\rho _{ {X}})$ gives the ranking

$$ \begin{align*}\{\omega_{4}\} < \{\omega_{5}\} < \{\omega_{8}, \omega_{9}\} < \{\omega_{11}\}\end{align*} $$

and so $\tau (\rho _{ {X}})=\{\omega _{4}\} \neq \tau (\rho )^{\ast }X$ .

Lin and Kelly’s No-Go Theorem, as well as examples of tracking failure for the stability rule, make clear that AGM is too coarse-grained to fully track Bayesian conditioning: it cannot deal with retaining too much information about the probability measure. Theorems 8.1 and 8.4 show that, when some information about the initial probability measure is lost (or abstracted away from), AGM revision can emerge from Bayesian conditioning. These observations suggest the slogan “AGM = Stability Rule + Maximum Entropy + Bayesian Conditioning” for situations involving an incomplete probabilistic specification of the agent’s belief state.

8.2 Discussion

The general lesson from our results is that AGM reasoners can be rationalized as a certain kind of stability-complying, entropy-maximizing Bayesians: they behave as if they were probabilistic reasoners, albeit limited inasmuch as they only retain their information in a qualitative form (alternatively, their credences are only partially specified by the stability constraints implicitly encoded in their system of spheres/plausibility ranking). Through the stability rule, an AGM-compliant agent can be seen as revising her belief state in a way consistent with Bayesian norms: she behaves like a stability-driven Bayesian agent who conditions on distinguished, equivocal probabilistic representations of her full beliefs or ranking structure.Footnote ¹⁹ AGM revision is thereby characterized as a form of Bayesian conditioning for agents entertaining a qualitative representation of their belief state.

Besides giving a characterization of AGM revision as being generated by Bayesian update, our results also suggest that probabilistic reasoners who rely on the stability rule can justify conforming to AGM revision, in situations where their belief state is limited to a qualitative representation of this sort. When, and why, would a probabilistic reasoner rely on qualitative representations such as a belief set or plausibility ranking? The very need for rational acceptance rules is sometimes motivated by a concern for informational or cognitive economy. On this view, in many reasoning contexts, rational agents may or must resort to qualitative beliefs and revision strategies because storing their precise subjective probability distribution is too costly. When introducing the original form of the stability rule, Skyrms writes:

In actual practice, people do not carry around probability assignments in (0,1) for every contingent proposition; at a certain point they simply accept a proposition and in some sense treat it as datum. Carnap and others have argued that (…) any such [acceptance] rule would throw away probabilistic information which would be crucial in some decision theoretic context. Carnap’s objection (…) is only decisive within the context of a model where it costs nothing to store such information. Since no one, except perhaps God, is in this happy state, it is still of interest to investigate the properties of various rules of acceptance. [Reference Skyrms31, p. 152]

On this way of thinking, acceptance rules can be useful precisely because there are many circumstances in which a reasoner must rely on her qualitative belief representation. Acceptance rules guide the choice of the qualitative representation of information, and these qualitative beliefs independently play a genuine role in reasoning. They allow the reasoning task at hand to be simplified, say, by eschewing probability considerations which may unnecessarily complicate deliberations—particularly in the context of a low-stakes, mundane inference or decision problem. This can happen in cases where it is needlessly cognitively costly or difficult for the agent to operate on her credences, which may require weighing the precise impact of the current evidence, or eliciting one’s prior by carefully reflecting on one’s exact probability assignments.Footnote ²⁰

Acceptance rules then play an important function in scenarios where the agent cannot, or perhaps selects not to, appeal to fully specified numerical credences, but rather operates on her logical and qualitative belief state (with raw propositional beliefs, or plausibility rankings). One salient instance, discussed in some detail by Lin [Reference Lin23], concerns the case of practical ‘everyday’ reasoning and decision-making, where it is natural to model the reasoning process as directly involving accepted propositions to be used as premises in further practical reasoning (even if they fall short of probabilistic certainty). Cases like these suggest a view of credences and qualitative judgments as distinct modes of reasoning that, although constrained by each other, might be appropriate for different tasks. In particular, some tasks might be appropriately dealt with reasoning based on an intuitive plausibility ordering of outcomes, rather than a fully fledged probability measure.Footnote ²¹

From this perspective, Theorems 8.1 and 8.4 apply precisely to cases where the agent relies in this way on acceptance rules for reasoning: the qualitative representation is used to provide a quick-and-dirty, rough-and-ready, coarser representation of the agent’s belief state. The results tell us that agents who, in a given reasoning context, rely on such a representation given by the stability rule, will—by doing the best they can do qua Bayesian, entropy-maximizing reasoners—revise their beliefs in accordance with the AGM postulates; or, to shift the emphasis: by following AGM revision they will be doing the best they can do qua Bayesian, entropy maximizing reasoners. Starting from a raw belief set, they can simply apply the AGM revision given by a two-rank sphere system; starting from a full system of spheres, they can apply the associated AGM revision operator. In both cases, they will reach the same posterior belief state as they would through Bayesian conditioning on the corresponding maximum entropy probability measure. Figure 3 illustrates both ways of recovering AGM revision through the maximum entropy principle. In a sense, then, if a stability-driven reasoner takes her acceptance rule seriously—that is, if she considers it a viable tool for reasoning contexts where retaining probabilistic information is costly—then she may well have a principled reason to adopt AGM revision operators.

Fig. 3 AGM revision emerges from Bayesian conditioning via the maximum entropy principle.

Of course, this appeal to AGM revision will only be reasonable to the extent that the agent is justified in her initial reliance on qualitative representations like plausibility orders or belief sets. One may doubt whether a Bayesian agent (even an imperfect one) can ever be compelled, for computational or other reasons, to discard information and entertain only qualitative representations of belief via an acceptance rule. In either case, however, the characterization of AGM revision remains instructive: AGM reasoners behave as Bayesian agents who discard information, in the precise sense of relying on a maximum entropy representative of their qualitative beliefs. AGM revision operators result in this way from Bayesian conditioning for agents who, for better or worse, rely on information encoded in qualitative form.

8.3 Other approaches to the tracking problem

Let us briefly compare the results presented here to other approaches to the tracking problem in the literature. In their original work on tracking Bayesian conditioning with qualitative revision, Lin and Kelly [Reference Kelly and Lin14, Reference Kelly and Lin15] drop the AGM constraints and study the class of camera-shutter acceptance rules, based on odds-ratio comparisons between atomic states in the sample space $\Omega $ , and obtain a tracking result for what they call Shoham revision: revision operators based on restricting partial plausibility orders. This yields a logic of conditional belief corresponding to System $\textsf{P}$ , well-known from the study of nonmonotonic logics [Reference Kraus, Lehmann and Magidor16]. These acceptance rules constitute a further departure from the Lockean thesis, abandoning the high-probability requirement for belief (unless one imposes further restrictions of the threshold, making it dependent on the size of the sample space).

In order to obtain revisions tracking Bayesian conditioning, one can also fix a particular acceptance rule and study the revisions generated by Bayesian conditioning under this rule. In this vein, Shear and Fitelson [Reference Shear and Fitelson30] study the revisions generated by the Lockean rule for specific thresholds, giving a concrete bound on the Lockean threshold for which Lockean revision is guaranteed to be AGM, conditional on the initial Lockean beliefs of the agent being consistent and logically closed. In this case, since the Lockean rule can yield inconsistent beliefs in general, we must dispense with the requirement that the rule be always consistent in the sense of yielding a logically consistent belief set. Another option is to fix the stability rule $\tau $ and a threshold, and study the revision operators $\tau (\mu )\mapsto \tau (\mu _{ {X}})$ generated by Bayesian conditioning. This is done in [Reference Mierzewski25], where a complete characterization of such stability-generated revision operators is given for $t=1/2$ , through a representation theorem based on the theory of comparative probability orders. The characterization can be seen as giving selection function semantics for the nonmonotonic logic of conditional belief generated by the $\tau $ -rule. This logic is unusual as it obeys the Rational Monotonicity rule from nonmonotonic logic, thus staying close to AGM revision, while failing the Or rule, usually a common feature in preferential logics, capturing a qualitative form of the decision-theoretic sure-thing principle [Reference Savage28].

By contrast to these, the maximum entropy approach presented here highlights the information loss incurred when passing from credences to all-or-nothing beliefs. It takes seriously the lossy nature of acceptance rules and tries to account for it in connecting logical revision operators to conditioning; it insists on consistency and logical closure for any choice of threshold, preserves AGM in a limited sense, and provides a partial explanation of when agreement with Bayesian conditioning is possible (or even necessary, depending on one’s attitude as to the normative status of MAXENT and the stability ruleFootnote ²² ).

Some further questions

This approach invites several questions about the information-theoretic aspects of passing from the quantitative to the qualitative framework. The No-Go theorem of Lin and Kelly shows that if no information is lost in the specification of the probabilistic credal state, commutativity fails. Our result, on the other hand, establishes that, when some of the probabilistic information is forgotten, we can recover a certain kind of commutativity. The information loss in question is induced by the acceptance rule: by their very nature, acceptance rules (or joint policies for acceptance and qualitative revision) employ qualitative structures to capture the belief state of the agent, which carry only partial information about the initial probability measure.

The results offered here can be contrasted with Lin and Kelly’s theory [Reference Kelly and Lin15]. Their preferred acceptance/revision duo—the odds-ratio rule with Shoham revision—does not exhibit the same kind of regularity under information loss: in general one cannot, given only the strongest accepted proposition, uniquely recover a Shoham revision operation from the odds-ratio rule and MAXENT.Footnote ²³ One can then ask which rules do allow for such a procedure. For instance, given some reasonably minimal structural constraints on acceptance rules, can we characterize the class of rules that allow for generating AGM operators via maximum entropy?Footnote ²⁴ Is this class of rules definable in a convenient logical framework (and if so, what is its definitional complexity)? Can we provide a useful characterization of the stability rule as defined by elementary information-theoretic and/or geometric constraints?

Next, in the light of the stability rule, we can view AGM revision as conditioning on a default probabilistic representative of one’s qualitative belief state. What exactly is the cost of switching between the two representations—from one’s original prior to the default representative? Suppose the agent has prior $\mu $ , corresponding to a system of spheres $\mathfrak {S}(\mu )$ . We would like to see what happens if she instead uses the system of spheres, via its corresponding maximum entropy representative, in solving an inference or decision problem. The information loss incurred by relying on the qualitative representation can be measured by the Kullback–Leibler divergence (relative entropy) $\textsf{D}_{KL}(\mu \,||\, \rho _{\textsf{max}})$ from one’s original prior $\mu $ to the maximum entropy representative $\rho _{\textsf{max}}:=\textsf{argmax}_{\{\rho \in \Delta _{\mathfrak {A}}\,|\, \mathfrak {S}(\rho )= \mathfrak {S}(\mu ) \}} \mathcal {H}(\rho )$ of the system of spheres $\mathfrak {S}(\mu )$ . In evaluating the behavior of an acceptance rule, it would be interesting to investigate the extent to which this information loss affects the agent’s learning or decision-making. This approach appears closely related to recent work in cognitive science on resource-rational analyses of cognition, whereby one models cognitive tasks for agents with limited computational resources, who resort to approximate representations that throw away some of the available information [Reference Griffiths and Lieder9].Footnote ²⁵

What if the agent does not select a single measure as default representative of her belief state? A different approach is to treat all compatible credence functions on a par: that is, to treat the entire set of compatible probability measures as a representor for the agent’s belief state, in the manner of imprecise probability (IP) [Reference Levi22, Reference Walley35]. There, the connection to logical belief revision is much less clear: can we recover some canonical revision operators using the dynamics of credal sets through an acceptance rule? From this perspective, we can also investigate the behavior of acceptance rules in terms of IP-based decision theory [Reference Bradley and Steele5, Reference Walley35]: given a belief set or a plausibility ranking, the agent chooses bets and actions according to the set of representing probability functions, using a decision rule for imprecise probabilities. How does this use of an acceptance rule affect the agent’s decision-theoretic performance? We leave these questions for another occasion.

9 Conclusion

A stability-driven agent who complies with the probabilistic principles of maximum entropy and Bayesian conditioning, but who stores her information in a qualitative form—e.g. a belief set together with a plausibility ranking—will automatically comply with AGM revision. Thus AGM operators can be seen, in this sense, as actually resulting from Bayesian conditioning. They admit, at the very least, of a probabilistic reconstruction, and need not conflict with Bayesian belief kinematics. And if, for instance, one believes that non-numerical representations of belief have an independent role to play in reasoning, or that a certain type of information loss is inevitable in passing between the probabilistic and logical representations, the above results indicate that AGM revision can be a very natural option for Bayesians.

While Lin and Kelly’s No-Go theorem identifies a crucial conflict between AGM revision and Bayesian conditioning, making the prospects for a principled reconciliation of the two look rather dim, the results presented here suggest that ‘forgetting’ some information along the way—or simply taking our acceptance rules seriously as a device for simplifying our information state in a given reasoning context—provides a viable bridge between the two: one built from Leitgeb’s rule and purely probabilistic principles. This bridge crucially relies on a key feature of acceptance rules: they are forgetful. There may be more to be learnt from it. Perhaps at the root of it all lies another simple moral; agreements cannot always be forced and conflicts cannot always be resolved by negotiation. To make peace, sometimes all it takes is to forget.

Appendix A Additional details for Proposition 8.1

Proof. We fill in the details of the proof. Given $\mu $ as we defined it in the proof (see main text), we show that $\mathcal {H}(\mu )>\mathcal {H}(\rho )$ for any measure $\rho $ that has K as strongest stable set.

(i) First, consider the case where $|\mathfrak {S}^{t}(\rho )|>2$ , hence $\rho $ has $>2$ ranks. Take the measure $\rho ^{\prime }$ s.t. $\rho ^{\prime }\restriction K=\rho \restriction K$ , and $\rho ^{\prime }$ uniform on $\Omega \setminus K$ . Then $\rho ^{\prime }$ is a measure with 2 ranks, with minimal rank K: the fact that $\rho ^{\prime }\restriction K=\rho \restriction K$ clearly guarantees that K is the least $(\rho ^{\prime },t)$ -stable set, and since $\rho ^{\prime }$ is uniform on $\Omega \setminus K$ , the only other $(\rho ^{\prime },t)$ -stable set is $\Omega $ itself (as states with equal measure cannot belong to distinct spheres). Now we can write

$$ \begin{align*} \mathcal{H}(\rho) =& \sum\limits_{\omega\in K}-\rho(\omega)\log\left(\rho(\omega)\right) + \sum\limits_{\omega\in\Omega\setminus K}-\rho(\omega)\log\left(\rho(\omega)\right)\\ =& \mathcal{H}(\rho\restriction{K}) + \mathcal{H}(\rho\restriction K^{c}) \end{align*} $$

And similarly for $ \mathcal {H}(\rho ^{\prime })$ . This means we have $\mathcal {H}(\rho ^{\prime })-\mathcal {H}(\rho )=\mathcal {H}(\rho ^{\prime }\restriction K^{c})-\mathcal {H}(\rho \restriction K^{c})$ . Note that, since $\rho $ has strictly more than 2 ranks, it cannot be uniform on $K^{c}$ , while $\rho ^{\prime }$ is. Moreover we have $\rho ^{\prime }(K)=\rho (K)$ so $\rho ^{\prime }(K^{c})=\rho (K^{c})$ . We have shown in the proof of Observation 8.2 that this entails $\mathcal {H}(\rho ^{\prime }\restriction K^{c})>\mathcal {H}(\rho \restriction K^{c})$ , which gives us $\mathcal {H}(\rho ^{\prime })>\mathcal {H}(\rho )$ . So $\rho ^{\prime }$ is a (rank-uniform) measure with 2 ranks which strictly entropy-dominates $\rho $ . Thus it only remains to show that $\mu $ entropy-dominates all other rank-uniform measures with two ranks (and with minimal rank K). We take care of this in the next case.

(ii) We consider the case of (rank-uniform) measures in $\tau ^{-1}(K)$ with 2 ranks. We denote the set of all such measures $\mathcal {U}$ . Note that any $\rho \in \mathcal {U}$ is entirely specified by the value of $\rho (K)$ (also note that, by choice of $\mu $ , we cannot have $\rho (K)<\mu (K)$ since this contradicts the $(\rho ,t)$ -stability of K). We show that for $\rho \in \mathcal {U}$ , $\mathcal {H}(\rho )$ is maximised exactly when $\rho (K)=\mu (K)$ ; i.e., for $\rho =\mu $ .

Notice that for any measure $\rho \in \Delta _{\mathfrak {A}}$ , we can write

(⊛)

$$ \begin{align} \mathcal{H}(\rho)=\mathcal{H}_{2}(\rho(K),\rho(K^{c})) + \rho(K)\mathcal{H}\left(\frac{1}{\rho(K)}\rho\restriction K\right) + (1-\rho(K))\mathcal{H}\left(\frac{1}{(1-\rho(K))}\rho\restriction K^{c}\right) \end{align} $$

This equality can be derived directly from the definition of $\mathcal {H}$ , or alternatively it can be seen as an immediate consequence of the “grouping” property of entropyFootnote ²⁶ . Now if $\rho $ is uniform on K we have $\forall \omega \in K$ , $\rho (\omega )=\rho (K)/|K|$ , so $\frac {1}{\rho (K)}\rho (\omega )=\frac {1}{|K|}$ . This also means

$$ \begin{align*} \mathcal{H}\left(\frac{1}{\rho(K)}\rho\restriction K\right) &= \sum\limits_{\omega\in K}\frac{1}{\rho(K)}\rho(\omega)\log\left(\frac{1}{\frac{1}{\rho(K)}\rho(\omega)}\right) \\ &= \sum\limits_{\omega\in K} \frac{1}{|K|}\log(|K|)\\ &= \log(|K|) \end{align*} $$

By the same reasoning, when $\rho $ is uniform on $K^{c}$ we get

$$ \begin{align*} \mathcal {H}\left (\frac {1}{(1-\rho (K))}\rho \restriction K^{c}\right )=\log (|\Omega \setminus K|). \end{align*} $$

Now for $\rho \in \mathcal {U}$ we can rewrite (⊛) as

$$ \begin{align*} \mathcal{H}(\rho)&=\mathcal{H}_{2}(\rho(K),\rho(\Omega\setminus K))+\rho(K)\log(|K|)+(1-\rho(K))\log(|\Omega\setminus K|)\\ &= \mathcal{H}_{2}(\rho(K),1-\rho(K))+\rho(K)\log\left(\frac{|K|}{|\Omega\setminus K|}\right)+\log(|\Omega\setminus K|) \end{align*} $$

It remains to show that this expression is maximised on $\mathcal {U}$ exactly when $\rho (K)=\mu (K)$ . For convenience we write $\rho (K)=a$ and so

$$ \begin{align*} \mathcal {H}(\rho )=\mathcal {H}_{2}(a,1-a)+a\log \left (\frac {|K|}{|\Omega \setminus K|}\right )+\log (|\Omega \setminus K|). \end{align*} $$

We have

$$ \begin{align*} \mathcal{H}_{2}(a,1-a)&= a\log\left(\frac{1}{a}\right)+(1-a)\log\left(\frac{1}{1-a}\right)\\ &= a\log\left(\frac{1-a}{a}\right)-\log(1-a) \end{align*} $$

Differentiating $\mathcal {H}(\rho )$ w.r.t a, we obtain

$$ \begin{align*} \frac{d\mathcal{H}}{da}= \log\left(\frac{1-a}{a}\right)+\log\left(\frac{|K|}{|\Omega\setminus K|}\right) \end{align*} $$

By choice of $\mu $ , for any $\rho \in \mathcal {U}$ , we must have $a=\rho (K)\geq \mu (K)=\frac {t\cdot |K|}{1+t(|K|-1)}$ : so we are only concerned with the value of the derivative $d\mathcal {H}/da$ for $a\in [\mu (K),1]$ . First note that $t>1/2$ entailsFootnote ²⁷ $\frac {t\cdot |K|}{1+t(|K|-1)}>\frac {1/2\cdot |K|}{1+1/2(|K|-1)}=\frac {|K|}{|K|+1}$ . So $a\in [\mu (K),1]$ entails $a>\frac {|K|}{|K|+1}$ . This yields $\frac {1-a}{a}<\frac {1}{|K|}$ , so $\log \left (\frac {1}{a}-1\right )< \log \left (\frac {1}{|K|}\right )$ , and we get

$$ \begin{align*} \log\left(\frac{1-a}{a}\right)+\log\left(\frac{|K|}{|\Omega\setminus K|}\right) &< \log\left(\frac{1}{|K|}\right)+\log\left(\frac{|K|}{|\Omega\setminus K|}\right)=\log\left(\frac{1}{|\Omega\setminus K|}\right) \end{align*} $$

But it is clear that $\log \left (\frac {1}{|\Omega \setminus K|}\right )\leq 0$ ( recall $K\neq \Omega $ ). So we have $\log \left (\frac {1-a}{a}\right )+\log \left (\frac {|K|}{|\Omega \setminus K|}\right )<0$ . This means that $d\mathcal {H}/da$ is strictly negative for $a\in [\mu (K),1]$ . Thus $\mathcal {H}(\rho )$ strictly decreases on $\mathcal {U}$ as $\rho (K)$ increases, with $\mathcal {H}(\rho )$ seen as a function of $\rho (K)$ . Since entropy is strictly decreasing for $\rho (K)\in [\mu (K),1]$ , it is maximised exactly when $\rho (K)=\mu (K)$ , or equivalently for $\rho =\mu $ . So $\mu $ , as we have defined it, is the required maximum entropy distribution, and it is unique, as desired. □

Acknowledgments

The results in this paper were obtained as part of the author's MSc thesis, written under Alexandru Baltag's supervision at the Institute for Logic, Language and Computation, University of Amsterdam. Thanks to Alexandru Baltag for his guidance and advice throughout this project. Thanks to Thomas Icard and Francesca Zaffora Blando for their detailed and helpful comments on earlier drafts. Thanks to audiences at the University of Amsterdam, the Delft University of Technology, Stanford University and participants of the Formal Epistemology Workshop 2019 at the University of Turin for comments and feedback.

Footnotes

¹ The word qualitative suffers from a certain ambiguity. Here, by ‘qualitative’, we mean, roughly, models of belief states that do not explicitly involve a numerical (real-valued) measure on the relevant space of propositions.

² In the literature, a distinction is sometimes made between the notions of acceptance and belief [Reference van Fraassen34]. When acceptance is distinguished from belief, accepting a proposition is often characterised as an active, voluntary action or decision, in contrast with believing, conceived as a passive and involuntary disposition [Reference Cohen6]. For the purposes of this paper, we shall treat the two as equivalent. The results presented here apply to any binary propositional attitude, or any belief-like state that can be represented with the qualitative structures we employ here (belief sets, plausibility rankings). Our primary concern is a formal result concerning mathematical models of information states, and we shall not commit to any particular interpretation of the acceptance rule. In the following, one can for instance conceive of subjective probabilities as representing degrees of belief, with the ‘belief set’ representing the propositions accepted by the agent.

³ As we understand them here, acceptance rules presuppose a functional dependence between a rational agent’s credences, represented by a probability space (together with a numerical threshold) and her propositional belief state. We will later briefly discuss a version of Leitgeb’s stability theory (based on the Humean Thesis [Reference Leitgeb21]) that espouses a more permissive view of the relationship between credences and qualitative belief.

⁴ The notion of stability can be traced back to Skyrms’ work [Reference Skyrms31] on what he calls probabilistic resiliency. For standard probability measures such as here, $(\mu ,t)$ -stable sets are analogous to high-probability cores, studied by Arló-Costa and Pedersen in the context of dyadic probability functions [Reference Arló-Costa and Pedersen3].

⁵ In [Reference Leitgeb18], Leitgeb proves Proposition 3.2 under the assumption that $\mathfrak {A}$ is a $\sigma $ -algebra, and that the measure $\mu $ is $\sigma $ -additive. In [Reference Leitgeb21], he shows that $\sigma $ -additivity is not required.

⁶ We allow operators defined only for a restricted set of revision inputs, such as sets with positive measure under $\mu $ .

⁷ In fact, the No-Go Theorem as originally proven in [Reference Kelly and Lin15] is even more general, where the impossibility result is extended not only to AGM revision, but to any revision operators satisfying Inclusion and Preservation.

⁸ In Skyrms’ [Reference Skyrms31] terminology, this is what might be called absolute resiliency. Skyrms [Reference Skyrms31] defines resiliency as a property of probability claims (as opposed to propositions): the resiliency of the claim ‘ $\mu (X)=a$ ’ is $1-\textrm {sup}_{\{Y\,|\,X\cap Y\neq \emptyset \}} |\mu (X\,|\,Y)-a|$ . By contrast, the degree of stability $\mathcal {S}(\mu , X)$ does not measure how much the probability of X can change from its current value by conditioning on possible defeaters, but rather how low it can be brought in absolute terms.

⁹ For $\mathfrak {A}=\mathcal {P}(\Omega )$ with $|\Omega |=3$ , each revision which fails tracking is in fact correctible: this is because, in each such case we have $\tau (\mu _{ X})$ is a singleton, in which case it suffices to raise the threshold above $\mu _{ X}(\tau (\mu _{ X}))$ .

¹⁰ Leitgeb [Reference Leitgeb21, theorem 7, p. 121] shows something slightly stronger: namely, that (i) and (ii) above are equivalent to the statement that $\mathbb {K}$ is consistent, closed under consequence and conjunction, and is generated by some unique strongest proposition S such that it satisfies the right-to-left direction of the Lockean Thesis with threshold equal to $\mu (S)$ .

¹¹ It can be shown that the probabilities of stable sets are Lockean thresholds that yield a conjunctive, deductively closed, and consistent belief set.

¹² To see why this is strict: suppose we have equality here. This means $(\tau (\mu )^{\ast }X)\setminus \tau (\mu _{ X})$ has measure 0 under $\mu _{ X}$ . But we know, by assumption, that $\mu _{ X}(\tau (\mu )^{\ast }X)<1$ . So $\tau (\mu )^{\ast }X$ is a $(\mu _{ X},t)$ -stable set with probability $<1$ , and with a measure 0 subset. This cannot happen in general: for suppose S is $(\mu ,t)$ -stable, $\mu (S)<1$ and $A\subset S$ with $\mu (A)=0$ . Then $S^{c}\cup A$ is a defeater for S, contradicting the fact that S is stable.

¹³ This question also applies to more reductive interpretations the Humean Thesis: under one such reading, the beliefs of the agent are determined by the probability measure $\mu $ and a choice of Lockean threshold, where the choice depends on $\mu $ (such an interpretation is compatible with the Humean Thesis, but it is in no way entailed by it: see our remark above about the dual-system view of credence and belief, favoured by Leitgeb). In order to revise her beliefs in accordance with AGM, in situtations like the above, the agent must adopt—after the revision—a different Lockean threshold (distinct from both the stablity threshold and the previous Lockean threshold). In order to justify AGM revisions, one would want to explain why the Bayesian agent should feel compelled to select a Lockean threshold that generates the AGM-compliant revision; it would be desirable to justify this choice in a principled manner, preferably by appealing to reasons independently acceptable by Bayesian lights. Can this be done without independently assuming the AGM postulates? Looking at degrees of stability gives rise to a worry about this approach: when ‘forcing’ the relevant AGM-compliant belief state in noncorrigible cases of tracking failure like we saw above, inducing AGM revision in this way means choosing a belief state $\tau (\mu )^{\ast }X$ that is less stable (in the sense of degrees of stability $\mathcal {S}$ ) than the one given by Bayesian Conditioning and the $\tau $ rule. Presumably, if the stability requirement for the strongest accepted proposition has any plausibility at the outset—at least enough plausibility to convince the Bayesian reasoner to opt for stable sets as qualitative representations of belief—then perhaps it would be even more natural to advocate the choice of $\tau (\mu _{ X})$ as belief state, given that it is not only logically stronger, but also more stable than $\tau (\mu )^{\ast }X$ . The choice of the AGM-revised belief set is made at the cost of the stability of the strongest accepted proposition. On this reductive version of the two-threshold Humean Thesis, the choice of the AGM-appropriate Lockean threshold would require some additional motivation.

¹⁴ Jensen’s inequality states that whenever we have a strictly convex map $f:\mathbb {R}\rightarrow \mathbb {R}$ , then $\forall x_{i}\in Dom(f)$ (with $i\leq n$ ) and $\alpha _{i}\in [0,1]$ with $\sum _{i}\alpha _{i}=1$ , we have that

$$ \begin{align*} f\left(\sum\limits^{n}_{i=1}\alpha_{i}x_{i}\right)<\sum\limits^{n}_{i=1}\alpha_{i}f(x_{i}) \end{align*} $$

unless all $x_{i}$ s are equal and all $\alpha _{i}>0$ , in which case we have equality. Here we can use the strict inequality, as the $x_{i}$ correspond to the $\mu (\omega )$ s, which, by assumption, are not all equal.

¹⁵ The same holds if one remembers not only the plausibility ordering, but also the measures of each rank.

¹⁶ Note, however, that the acceptance zone $[\mathfrak {S}]$ is generally not convex: it is the restriction to rank-uniform measures that allows to optimize over a convex domain.

¹⁷ When f is strictly concave and $\mathcal {D}$ convex, if $\arg \max _{x\in \mathcal {D}} f(x) $ exists, it is unique [Reference Sundaram33, theorem 7.14, p. 186].

¹⁸ The second equality follows by first observing that $\tilde {h}(\alpha \textbf {y} ) = \alpha \tilde {h}(\textbf {y})+ \alpha \log \big (\frac {1}{\alpha }\big ) $ , since:

$$ \begin{align*} \tilde{h}(\alpha\textbf{y} ) &= \sum\limits_{i} \alpha y_{i}\log\left(\frac{r_{i}}{\alpha y_{i}}\right) = \sum\limits_{i} \alpha y_{i} \left( \log\left(\frac{1}{\alpha}\right) + \log\left(\frac{r_{i}}{y_{i}}\right) \right)\\ &= \sum\limits_{i} \alpha y_{i} \log\left(\frac{1}{\alpha}\right) + \sum\limits_{i} \alpha y_{i}\log\left(\frac{r_{i}}{y_{i}}\right) \\ &= \alpha \log\left(\frac{1}{\alpha}\right) \sum\limits_{i} y_{i} + \alpha \sum\limits_{i} y_{i}\log\left(\frac{r_{i}}{y_{i}}\right)\\ &= \alpha \log\left(\frac{1}{\alpha}\right) + \alpha \tilde{h}(\textbf{y}), \end{align*} $$

as long as $\sum _{i} y_{i} =1$ . So $\tilde {h}(\alpha \textbf {y}^{\ast } ) - \tilde {h}(\alpha \textbf {y})=\alpha \tilde {h}(\textbf {y}^{\ast } ) + \alpha \log \big (\frac {1}{\alpha }\big ) - \Big ( \alpha \tilde {h}(\textbf {y}) + \alpha \log \big (\frac {1}{\alpha }\big ) \Big ) = \alpha \Big (\tilde {h}(\textbf {y}^{\ast } ) - \tilde {h}(\textbf {y})\Big )$ , as desired.

¹⁹ In a sense, the AGM agent can be seen as a Bayesian reasoner who throws away information in a specific (and quantifiable) way, each time passing from a probability measure to the maximum entropy representative of her qualitative belief state. This raises further questions about the impact of such information loss on learning and decision-making performance: we shall return to this point in §8.3.

²⁰ It should be mentioned that, while the idea of resorting to a qualitative representation for reasons of cognitive economy is intuitive, it is hard to say what exactly is being economized in the passage from one representation to the other. In particular, computational complexity might not be the appropriate measure to appeal to: it is not clear that applying the stability rule and performing a qualitative revision presents a real advantage, from the point of view of computational tractability, to reasoning with (possibly approximate) probabilities directly. But the agent may be restricted to qualitative representations for other reasons (perhaps having little to do with computational complexity): for instance, Leitgeb [Reference Leitgeb21] sketches a ‘two-systems’ account of beliefs and credences in which the categorical beliefs are more closely tied to introspection, language processing and explicit reasoning than degrees of belief, and where “the states produced by the categorical categorical belief system could be structurally simpler[…] [and] easier to access consciously: introspection might work well for simple categorical beliefs while it might be hard, if not impossible, in the case of the much more demanding numerical beliefs” [Reference Leitgeb21, p. 23]. For accounts such as this one, the appeal to AGM revision, as a form of Bayesian conditioning on a least biased probability measure, remains a plausible rationale.

²¹ For instance, Lin [Reference Lin and Hung24] proposes to view the probabilistic and logical belief states as two co-existing ‘cognitive systems’, each appropriate for different reasoning problems:

Although the agent already possesses the probabilistic credences that [the subjective probability function] P assigns, those credences might not be “usable” immediately. For example, given that $P (A) = .9274\ldots$ and $P (A|E) = .0361\ldots$ , it does not mean that these credences have already had an impact on the agent’s behavior or thought. For these credences might be stored in a way that requires them to be decoded before they can have any impact on behavior or thought. Such a decoding process might correspond to, for example, retrieval of memory or deliberation over the evidence available. […] When the agent is solving a relatively simple problem—say, about grocery shopping rather than about stock investment—then it might be advantageous for the agent to possess and process the binary representations of belief. (…) If the agent implements a suitable planning algorithm that makes use of […] binary beliefs (rather than probabilistic credences), it might take a relatively short time for her to arrive at a quite reasonable solution. […] If a problem can be solved by a computation process that is more efficient because it works with the simpler, binary representations, then let the problem be solved that way. Reserve probabilistic representations for harder problems, such as those about stock investment. [Reference Lin and Hung24, p. 216]

²² We do not assume here that MAXENT is normatively binding, or that it is the only correct way to recover a canonical compatible measure. Our main purpose here is to prove the suggestive characterization of AGM reasoners as stability-complying, maximum-entropy Bayesians: we leave aside questions about the normative status of MAXENT, and ultimately the question of justification for AGM operators. It is certainly true that the use of MAXENT in this case may not appeal to everyone, and will be more convincing to the ‘objectivist’ brand of Bayesians who are happy to adopt equivocation as a norm in determining priors. There are well-known and legitimate worries about MAXENT as a general principle for probabilistic inference and updating [Reference Friedman and Shimony8, Reference Seidenfeld29, Reference Skyrms32]. One point worth mentioning is that the maximum entropy principle is not used here as an update method. We do not move from a prior distribution to a posterior: rather, MAXENT is used to elicit a probability measure in a case where we only have inequality constraints on the choice of compatible probability measure (no other information is given: the ranking constraints are all there is, and the situation is even compatible with the agent failing to have fully specified credences altogether). Arguably, in this case we do not treat the information given by the constraint $\mathfrak {S}^{t}(\mu ) = \mathfrak {T}$ (where $\mathfrak {T}$ is a system of spheres) as incoming data about an unknown distribution, about which the agent may have beliefs (these are the type of situations to which the Friedman–Shimony argument [Reference Friedman and Shimony8] against MAXENT apply).

²³ There is, in general, no unique maximum entropy measure generating the given belief set through Lin and Kelly’s odds-ratio rule.

²⁴ Natural constraints include the geometrical constraints imposed by Lin and Kelly [Reference Kelly and Lin14, Reference Kelly and Lin15], or convexity requirements for acceptance zones. The latter is reminiscent of the convexity requirement for credal sets advocated by Levi [Reference Levi22]. The qualitative belief state of the agent constrains the credences of the agent through the acceptance rule. It is tempting to see this as a generalization of Levi’s account of credence and full belief to other acceptance rules: thus acceptance bridges, in Levi’s terminology, full beliefs and the credal set of probability measures consistent with the former. This analogy is suggestive, even though in many ways this particular dependence on acceptance rules is not at all in line with Levi’s approach. Here it is worth noting that the ‘credal sets’ corresponding to belief sets, or to systems of spheres (i.e. the acceptance zone $\{\mu \in \Delta _{\mathfrak {A}}\,|\, \mathfrak {S}^{t}(\mu )=\mathfrak {T}\}$ for a system of spheres $\mathfrak {T}$ ) are not in general convex.

²⁵ It is worth comparing this approach to the questions investigated by Icard and Goodman [Reference Icard and Goodman12] in a study of the causal frame problem. There, the authors study the effects of employing a simplified representation in inference and decision problems: they consider the performance of an agent reasoning with a causal model (such as a Bayes net or a Hidden Markov Model) who, to reduce computational costs, discards information by operating on a simpler ‘submodel’ thereof (that is, a probability distribution over a model that ignores certain variables present in the original model). They suggest two questions concerning the effect of using such a submodel. Firstly, how much information do we lose with respect to the full model? That is, what is the KL divergence of the resulting approximate distribution from an initial ‘true’ distribution? Secondly, within a fixed decision problem, what is the difference in expected utilities between the original model and the approximate one? Both of these questions can naturally be asked here: the acceptance rule and the MAXENT principle determine the particular approximate distribution to use, and we can ask what is lost with respect to the original distribution, both in terms of information loss and expected utility. In particular, we can ask how the use of the maximum entropy measure affects the agent’s performance when reasoning about a causal problem, such as one modeled by a Bayes net.

²⁶ See the Preliminaries at the beginning of the paper.

²⁷ The expression for $\mu (K)$ increases in t. Differentiating $\frac {t\cdot |K|}{1+t(|K|-1)}$ w.r.t. t, we get $\frac {|K|}{(1+t(|K|-1))^{2}}$ , which is always positive for $|K|\geq 1$ .

References

BIBLIOGRAPHY

Alchourrón, C., Gärdenfors, P, & Makinson, D. (1985). On the logic of theory change: partial meet contraction and revision functions. The Journal of Symbolic Logic, 50(2), 510–530.CrossRef Google Scholar

Arló-Costa, H. & Pedersen, A. P. (2011). Belief revision. In Horsten, L., and Pettigrew, R., editors. Continuum Companion to Philosophical Logic. London: Continuum Press.Google Scholar

Arló-Costa, H. & Pedersen, A. P. (2012). Belief and probability: A general theory of probability cores. International Journal of Approximate Reasoning, 53(3), 293–315.CrossRef Google Scholar

Boyd, S. & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.CrossRef Google Scholar

Bradley, S. & Steele, K. (2016). Can free evidence be bad? value of information for the imprecise probabilist. Philosophy of Science, 83, 1–28.CrossRef Google Scholar

Cohen, J. (1992). An Essay on Belief and Acceptance, Oxford: Clarendon Press.Google Scholar

Foley, R. (1993). Working Without a Net. Oxford: Oxford University Press.Google Scholar

Friedman, K. & Shimony, A. (1971). Jaynes’s maximum entropy prescription and probability theory. Journal of Statistical Physics, 3(4), 620–663.CrossRef Google Scholar

Griffiths, T. & Lieder, F. (2019). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 4, 1–85.Google Scholar

Grove, A. (1988). Two modellings for theory change. Journal of Philosophical Logic, 17, 157–170.CrossRef Google Scholar

Halpern, J. (2003). Reasoning about Uncertainty. Cambridge, MA: MIT Press.Google Scholar

Icard, T. F. & Goodman, N. D. (2015). A resource-rational approach to the causal frame problem. Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena, CA, USA.Google Scholar

Jaynes, E. T (1957). Information theory and statistical mechanics. Physical Review Series II, 106(4), 620–663.Google Scholar

Kelly, K. & Lin, H. (2012). A geo-logical solution to the lottery-paradox, with applications to conditional logic. Synthese, 186(2), 531–575.Google Scholar

Kelly, K. & Lin, H. (2012). Propositional reasoning that tracks probabilistic reasoning. Journal of Philosophical Logic, 41(6), 957–981.Google Scholar

Kraus, S., Lehmann, D., & Magidor, M. (1990). Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44(1-2), 167–207.CrossRef Google Scholar

Kyburg, H. (1970). Probability and Inductive Logic. Toronto: Palgrave Macmillan.Google Scholar

Leitgeb, H. (2013). Reducing belief simpliciter to degrees of belief. Annals of Pure and Applied Logic, 164, 1338–1389.CrossRef Google Scholar

Leitgeb, H. (2014). Belief as a simplification of probability, and what it entails. In Baltag, A., & Smets, S., editors. Johan van Benthem on Logic and Information Dynamics. New York: Springer-Verlag, pp. 405–417.CrossRef Google Scholar

Leitgeb, H. (2014). The stability theory of belief. Philosophical Review, 123(2), 131–171.CrossRef Google Scholar

Leitgeb, H. (2017). The Stability of Belief. Oxford: Oxford University Press.CrossRef Google Scholar

Levi, I. (1980). The Enterprise of Knowledge: An Essay on Knowledge, Credal Probability, and Chances. Cambridge, MA: MIT Press.Google Scholar

Lin, H. (2013). Foundations of everyday practical reasoning. Journal of Philosophical Logic, 42, 831–862.CrossRef Google Scholar

Lin, H. (2016). Bridging the logic-based and probability-based approaches to artificial intelligence. In Hung, T.-W., editor. Rationality: Constraints and Contexts. Amsterdam: Elsevier, pp. 215–225.Google Scholar

Mierzewski, K. (2018). Probabilistic Stability: Dynamics, Nonmonotonic Logics, and Stable Revision. MSc Thesis, Institute for Logic, Language and Computation, University of Amsterdam.Google Scholar

Paris, J. B. (1994). The Uncertain Reasoner’s Companion: A Mathematical Perspective. Cambridge Tracts in Theoretical Computer Science, Vol. 39. Cambridge: Cambridge University Press.Google Scholar

Roman, S. (1997). Coding and Information Theory, Graduate Texts in Mathematics. New York: Springer-Verlag.Google Scholar

Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley & Sons Inc.Google Scholar

Seidenfeld, T. (1986). Entropy and uncertainty. Philosophy of Science, 53, 467–491.CrossRef Google Scholar

Shear, T. & Fitelson, B. (2019). Two approaches to belief revision. Erkenntnis, 84, 487–518.CrossRef Google Scholar

Skyrms, B. (1977). Resiliency, propensities, and causal necessity. The Journal of Philosophy, 74(11), 704–711.CrossRef Google Scholar

Skyrms, B. (1987). Updating, Supposing, and Maxent. Theory and Decision, 22(3), 225–246.CrossRef Google Scholar

Sundaram, R. K. (1996). A First Course in Optimization Theory. Cambridge: Cambridge University Press.CrossRef Google Scholar

van Fraassen, B. (1980). The Scientific Image. Oxford: Oxford University Press.CrossRef Google Scholar

Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Monographs on Statistics and Applied Probability, Vol. 42. London: Chapman and Hall.CrossRef Google Scholar

Fig. 1 The general tracking problem.

Fig. 2 Rationalizing AGM revision by raising the threshold: here we pick a new threshold $q > t$.

Fig. 3 AGM revision emerges from Bayesian conditioning via the maximum entropy principle.

Article contents

PROBABILISTIC STABILITY, AGM REVISION OPERATORS AND MAXIMUM ENTROPY

Abstract

Keywords

MSC classification

1 Tracking Bayesian conditioning

2 Preliminaries

3 The stability rule

Definition 3.1 (Stability)

Proposition 3.2 (Leitgeb [Reference Leitgeb18, Reference Leitgeb21])

Proposition 3.3 (Leitgeb [Reference Leitgeb18, Reference Leitgeb21])

Definition 3.4 (The $\tau $ -rule)

4 The connection to AGM revision

5 Tracking and the No-Go theorem

Definition 5.1 (Tracking)

Theorem 5.2 (The No-Go Theorem, Lin and Kelly [Reference Kelly and Lin15])

6 How the stability rule fails tracking

7 Can we save tracking by changing the threshold?

Theorem 7.2 (Leitgeb [Reference Leitgeb21])

8 Recovering AGM operators via Maximum Entropy

8.1 The main results

8.2 Discussion

8.3 Other approaches to the tracking problem

Some further questions

9 Conclusion

Appendix A Additional details for Proposition 8.1

Acknowledgments

Footnotes

References

BIBLIOGRAPHY

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests