1. Introduction
Following the work of Wolfgang Doeblin ([6], [7], [8]), many classical results from Markov chain theory have built on fundamental connections between total variation distance, Markov chains, and couplings. For some models, however, total variation convergence is too strong. In response, researchers have developed an alternative methodology based on monotonicity ([1], [10], [36]). In this line of research, transition probabilities are assumed to have a form of monotonicity not required in the classical theory. At the same time, mixing conditions are generally weaker, as is the notion of convergence to the stationary distribution. Further contributions to this approach can be found in [2], [16], [20], and [30]. For some recent extensions and applications in economics, see [21].
To give one example, consider a Markov chain {X_t} defined by
X_{t+1} = (X_t + W_{t+1})/2, t ≥ 0, (1.1)
where {W_t}_{t≥1} is an independent and identically distributed (i.i.d.) Bernoulli(1/2) random sequence, taking values 0 and 1 with equal probability. For the state space take S = [0, 1]. Let P^t(x, ·) be the distribution of X_t given X_0 = x ∈ S. Clearly, if X_t is a rational number in S then so is X_{t+1}. Similarly, if X_t is irrational then so is X_{t+1}. Thus, if x and y are rational and irrational, respectively, the distributions P^t(x, ·) and P^t(y, ·) are concentrated on disjoint sets, and, hence, when ‖ · ‖ is the total variation norm,
‖P^t(x, ·) − P^t(y, ·)‖ = 2 for all t ≥ 1.
Total variation convergence fails for this class of models.
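A minimal simulation sketch may help fix ideas. It assumes the law of motion displayed in (1.1); the helper name `simulate` is ours, not from the paper. The monotonicity check drives two seeds with a common shock sequence, and the final print illustrates (in floating point, which can only approximate the exact rational/irrational dichotomy) how seeds of different arithmetic type generate trajectories on disjoint sets.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, shocks):
    """Iterate X_{t+1} = (X_t + W_{t+1}) / 2 from X_0 = x0."""
    path = [x0]
    for w in shocks:
        path.append((path[-1] + w) / 2.0)
    return np.array(path)

T = 20
shocks = rng.integers(0, 2, size=T)  # i.i.d. Bernoulli(1/2) draws in {0, 1}

# Monotonicity: with a common shock sequence, paths preserve the order
# of their initial conditions.
lo, hi = simulate(0.0, shocks), simulate(1.0, shocks)
assert np.all(lo <= hi)

# A rational seed yields rational iterates and an irrational seed yields
# irrational iterates, so the exact time-t distributions from 0 and from
# 1/sqrt(2) sit on disjoint sets; floating point only illustrates this.
print(simulate(0.0, shocks)[-1], simulate(1.0 / np.sqrt(2), shocks)[-1])
```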
At the same time, the right-hand side of (1.1) is increasing in the current state for each fixed value of the shock W_{t+1}. Moreover, trajectories mix in a monotone sense: a trajectory starting at X_0 = 0 can approach 1 under a suitable string of shocks, and a trajectory starting at 1 can approach 0. Using these facts, one can show, using the results in [1], say, that a unique stationary distribution exists and that the distribution of X_t converges to it in a metric on the Borel probability measures that is complete but weaker than total variation.
This is one example where monotone methods can be used to establish some form of stability even though the classical conditions based around total variation convergence fail. Conversely, there are many models that the monotone methods developed in [1] and related papers cannot accommodate, while the classical theory based around total variation convergence handles them easily. One example is the simple ‘inventory’ model
where x^+ := max{x, 0}. Again, {W_t} is i.i.d. Assume that ln W_t is standard normal. The state space we take to be S = [0, K]. Figure 1 shows a typical trajectory when K = 100 and X_0 = 50.
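A simulation sketch of an inventory recursion of this general type is below. The restock-at-zero, order-up-to-K trigger in `step` is a hypothetical stand-in, not the exact specification in (1.2); what matters for the discussion is the kink from x^+ and the restocking jump, which is what breaks monotonicity.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 100.0  # capacity / order-up-to level

def step(x, w):
    """One period of a stylized inventory rule: meet demand w from stock x
    (the kink x^+ = max{x, 0}), and restock to K when stock runs out.  The
    restock-at-zero trigger is an assumption for illustration; the jump it
    creates is what breaks monotonicity."""
    y = max(x - w, 0.0)
    return K if y == 0.0 else y

x, path = 50.0, [50.0]
for _ in range(250):
    x = step(x, np.exp(rng.standard_normal()))  # ln W_t standard normal
    path.append(x)

print(min(path), max(path))  # the trajectory stays in S = [0, K]
```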
On the one hand, the monotone methods in [1] cannot be applied here because monotonicity fails with respect to the standard order on ℝ. On the other hand, the classical theory based around total variation convergence is straightforward to apply. For example, one can use Doeblin’s condition (see, e.g. [25, Theorem 16.2.3]) to show the existence of a unique stationary distribution to which the distribution of X_t converges in total variation, regardless of the distribution of X_0. In the terminology of [25], the process is uniformly ergodic.
The purpose of this paper is to show that both of these stability results (i.e. the two sets of results concerning the two models (1.1) and (1.2)), hitherto derived from two separate approaches, can be obtained within a single theoretical framework. More generally, we construct stability results that encompass all uniformly ergodic models in the sense of [25] and all monotone models shown to be stable in [1], while also extending to other monotone or partially monotone models on state spaces other than ℝ^n.
We begin our analysis by introducing what is shown to be a complete metric γ on the set of Borel probability measures on a partially ordered Polish space; it includes total variation distance, the Kolmogorov uniform distance, and the Bhattacharya distance ([1], [3]) as special cases. We show that many fundamental concepts from conventional Markov chain theory based on total variation distance and coupling have direct generalizations to the partially ordered setting when this new metric is adopted. Then, by varying the choice of partial order, we recover key aspects of both classical total-variation-based stability theory and monotone methods as special cases.
Before proceeding, we note an additional line of existing research that deals with Markov models for which the classical conditions of irreducibility and total variation convergence fail. Irreducibility is replaced by the assumption that the law of motion for the state is itself contracting ‘on average’, and this contractivity is then passed on to an underlying metric over distributions that conforms in some way to the topology of the state space. See, for example, [4], [17], or [35]. Such results can be used to show stability of our first example, which contracts on average with respect to the usual metric on ℝ. On the other hand, they cannot be directly applied to our second (i.e. inventory) example using the same metric, since the law of motion contains a jump. In [1] and [16] one can find other applications where monotone methods (including the results developed here) can be used, while the ‘contraction on average’ conditions of [4] and [35] do not in general hold. See, for example, the growth model studied in Section 6.B of [16], where a high marginal product of capital near zero generates rapid growth when the capital stock is low. One consequence is that the law of motion does not typically exhibit contraction on average in a neighbourhood of zero.
After preliminaries, we begin with a discussion of ‘ordered’ affinity, which generalizes the usual notion of affinity for measures. The concept of ordered affinity is then used to define the total ordered variation metric. Throughout the paper, longer proofs are deferred to Appendix A. The conclusion contains suggestions for future work.
2. Preliminaries
Let S be a Polish (i.e. separable and completely metrizable) space, let $\cal O$ be the open sets, let $\mathfrak C$ be the closed sets, and let $\cal B$ be the Borel sets. Let ${\cal M}_s$ denote the set of all finite signed measures on $(S, \cal B)$. In other words, ${\cal M}_s$ is the set of all countably additive set functions from $\cal B$ to ℝ. Let $\cal M$ and $\cal P$ be the finite measures and probability measures in ${\cal M}_s$, respectively. If κ and λ are in ${\cal M}_s$, then κ ≤ λ means that κ(B) ≤ λ(B) for all $B \in \cal B$. The symbol δ_x denotes the probability measure concentrated on x ∈ S.
Let bS be the set of all bounded $\cal B$-measurable functions from S into ℝ. If h ∈ bS and $\lambda \in \cal M_s$, then λ(h) := ∫ h dλ. For f and g in bS, the statement f ≤ g means that f(x) ≤ g(x) for all x ∈ S. Let
H := {h ∈ bS : −1 ≤ h ≤ 1} and H_0 := {h ∈ bS : 0 ≤ h ≤ 1}.
The total variation norm of $\lambda \in \cal M_s$ is ‖λ‖ := sup_{h∈H} |λ(h)|. Given μ and ν in $\cal P$, a random element (X, Y) taking values in S × S and defined on a common probability space (Ω, $\cal F$, ℙ) is called a coupling of (μ, ν) if μ = ℙ ∘ X^{−1} and ν = ℙ ∘ Y^{−1} (i.e. if the distribution of (X, Y) has marginals μ and ν, respectively; see, e.g. [22] or [34]). The set of all couplings of (μ, ν) is denoted below by $\cal C(\mu, \nu)$. A sequence $\{\mu_n\} \subset \cal P$ converges to $\mu \in \cal P$ weakly if μ_n(h) → μ(h) as n → ∞ for all continuous h ∈ bS. In this case we write $\mu_n {\buildrel {\rm w} \over \longrightarrow} \mu$.
Given μ and $\nu \in \cal M$, their measure-theoretic infimum μ ∧ ν is the largest element of $\cal M$ dominated by both μ and ν. It can be constructed by taking f and g to be densities of μ and ν, respectively, with respect to the dominating measure λ := μ + ν and setting (μ ∧ ν)(B) := ∫_B min{f(x), g(x)} λ(dx) for all $B \in \cal B$. The total variation distance between μ and ν is related to μ ∧ ν via ‖μ − ν‖ = ‖μ‖ + ‖ν‖ − 2‖μ ∧ ν‖. See, for example, [29]. For probability measures, we also have
$\|\mu - \nu\| = 2 \sup_{B \in \cal B} \{\mu(B) - \nu(B)\}.$ (2.1)
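For measures on a finite state space the densities f and g are just the probability vectors, so μ ∧ ν and the identities above can be checked directly. A minimal sketch (the vectors are arbitrary illustrations):

```python
import numpy as np

# Two probability vectors on a common finite state space.
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])

inf_measure = np.minimum(mu, nu)   # mu ∧ nu: pointwise minimum of densities
tv = np.abs(mu - nu).sum()         # total variation norm ||mu - nu||

# The identity ||mu - nu|| = ||mu|| + ||nu|| - 2 ||mu ∧ nu||.
assert np.isclose(tv, mu.sum() + nu.sum() - 2 * inf_measure.sum())
# The identity (2.1): on a finite space the supremum over sets is
# attained at B = {z : mu(z) > nu(z)}.
assert np.isclose(tv, 2 * np.maximum(mu - nu, 0).sum())
print(tv, inf_measure.sum())       # 0.6 and 0.7
```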
The affinity between two measures μ, ν in $\cal M$ is the value α(μ, ν) := (μ ∧ ν)(S). The following properties are elementary.
Lemma 2.1. For all $ (\mu, \nu) \in \cal M \times {\cal M}$, we have
(a) 0 ≤ α(μ, ν) ≤ min{μ(S), ν(S)},
(b) α(μ, ν) = μ(S) = ν(S) if and only if μ = ν,
(c) α(cμ, cν) = cα(μ, ν) for all c ≥ 0.
There are several other common representations of affinity. For example, when μ and ν are both probability measures, we have
$\alpha(\mu, \nu) = 1 - \sup_{B \in \cal B} \{\mu(B) - \nu(B)\} = \max_{(X,Y) \in \cal C(\mu, \nu)} \mathbb P\{X = Y\}.$ (2.2)
(See, e.g. [22] and [29].) The second equality in (2.2) states that, if $(X,Y) \in \cal C(\mu, \nu)$, then ℙ{X = Y} ≤ α(μ, ν), and, moreover, there exists an $(X, Y) \in \cal C(\mu, \nu)$ such that equality is attained. Any such coupling is called a maximal or gamma coupling. See Theorem 5.2 of [22]. From (2.1) and (2.2) we obtain
‖μ − ν‖ = 2(1 − α(μ, ν)). (2.3)
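For discrete distributions, a gamma coupling can be sampled by the standard two-branch construction: with probability α draw X = Y from the normalized common part μ ∧ ν, and otherwise draw X and Y independently from the residuals, which have disjoint supports. A sketch under the assumption of finite supports (the function name `gamma_coupling` is ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def gamma_coupling(mu, nu, n_draws):
    """Sample from a maximal coupling of probability vectors mu, nu:
    P{X = Y} equals the affinity alpha = sum_z min(mu(z), nu(z))."""
    common = np.minimum(mu, nu)
    alpha = common.sum()
    res_mu, res_nu = mu - common, nu - common   # disjointly supported residuals
    xs, ys = [], []
    for _ in range(n_draws):
        if rng.random() < alpha:                # couple: X = Y ~ common/alpha
            z = rng.choice(len(mu), p=common / alpha)
            xs.append(z); ys.append(z)
        else:                                   # residual branch: X != Y a.s.
            xs.append(rng.choice(len(mu), p=res_mu / (1 - alpha)))
            ys.append(rng.choice(len(nu), p=res_nu / (1 - alpha)))
    return np.array(xs), np.array(ys)

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
X, Y = gamma_coupling(mu, nu, 100_000)
print(np.mean(X == Y), np.minimum(mu, nu).sum())  # both close to 0.7
```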
3. Ordered affinity
We next introduce a generalization of affinity when S has a partial order. We investigate its properties in detail, since both our metric and the stability theory presented below rely on this concept.
3.1. Preliminaries
As before, let S be a Polish space. A closed partial order ‘⪯’ on S is a partial order such that its graph
$\mathbb G := \{(x, y) \in S \times S \colon x \preceq y\}$
is closed in the product topology. Henceforth, a partially ordered Polish space is any such pair (S, ⪯), where S is nonempty and Polish, and ‘⪯’ is a closed partial order on S. When no confusion arises, we denote it simply by S. The partial order is assumed to be closed in the theory developed below because we build a metric over Borel probability measures that depends on this partial order and closedness of the order is required to show that the metric is complete.
For such a space S, we call I ⊂ S increasing if x ∈ I and x ⪯ y implies that y ∈ I. We call h: S → ℝ increasing if x ⪯ y implies that h(x) ≤ h(y). We let $i\cal B,\,i\cal O$, and $i\mathfrak C$ denote the increasing Borel, open, and closed sets, respectively, while ibS is the increasing functions in bS. In addition,
• iH := H ∩ ibS = {h ∈ ibS : −1 ≤ h ≤ 1}, and
• iH_0 := H_0 ∩ ibS = {h ∈ ibS : 0 ≤ h ≤ 1}.
If $B \in \cal B$ then i(B) is the set of all y ∈ S such that x ⪯ y for some x ∈ B, while d(B) is the set of all y ∈ S such that y ⪯ x for some x ∈ B. Given μ and ν in $\cal M$, we say that μ is stochastically dominated by ν and write μ ⪯sd ν if μ(S) = ν(S) and μ(I) ≤ ν(I) for all $I \in i\cal B$. Equivalently, μ(S) = ν(S) and μ(h) ≤ ν(h) for all h in iH or iH_0. See [18].
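On S = ℝ with the usual order, the increasing Borel sets reduce to upper rays, so stochastic dominance between distributions on a finite grid x_0 < … < x_n amounts to a pointwise comparison of CDFs. A small check along these lines (this ray characterization on ℝ is standard; the helper name is ours):

```python
import numpy as np

def stochastically_dominated(mu, nu):
    """mu ⪯sd nu for probability vectors on ordered points x_0 < ... < x_n:
    every upper ray I = {x_k, ..., x_n} must satisfy mu(I) <= nu(I),
    i.e. the CDF of mu dominates the CDF of nu pointwise."""
    return bool(np.all(np.cumsum(mu) >= np.cumsum(nu)))

mu = np.array([0.4, 0.4, 0.2])   # mass concentrated low
nu = np.array([0.1, 0.4, 0.5])   # mass concentrated high
print(stochastically_dominated(mu, nu))  # True
print(stochastically_dominated(nu, mu))  # False
```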
One important partial order on S is the identity order, where x ⪯ y if and only if x = y. Then $i\cal B = \cal B$, ibS = bS, iH = H, iH_0 = H_0, and μ ⪯sd ν if and only if μ = ν.
Remark 3.1. Since S is a partially ordered Polish space, for any μ, ν in $\cal P$, we have μ = ν whenever μ(C) = ν(C) for all $C \in i\mathfrak C$, or, equivalently, μ(h) = ν(h) for all continuous h ∈ ibS. See [18, Lemma 1]. Hence, μ ⪯sd ν and ν ⪯sd μ imply that μ = ν.
Lemma 3.1. If $\lambda \in \cal M_s$ then $\sup_{I \in i\cal B} \lambda(I) = \sup_{h \in iH_0} \lambda(h)$ and
$\sup_{h \in iH} |\lambda(h)| = \max\Big\{\sup_{h \in iH} \lambda(h),\ \sup_{h \in iH} (-\lambda)(h)\Big\}.$ (3.1)
The proof is given in Appendix A.1. We can easily check that
3.2. Definition of ordered affinity
For each pair $(\mu, \nu) \in {\cal M} \times \cal M$, let
$\Phi(\mu, \nu) := \{(\mu', \nu') \in \cal M \times \cal M \colon \mu' \leq \mu,\ \nu' \leq \nu,\ \mu' \preceq_{sd} \nu'\}.$
We call Φ(μ, ν) the set of ordered component pairs for (μ, ν). Here ‘ordered’ means ordered by stochastic dominance. The set of ordered component pairs is always nonempty, since (μ ∧ ν, μ ∧ ν) is an element of Φ(μ, ν).
Example 3.1. If μ and ν are two measures satisfying μ ⪯sd ν, then (μ, ν) ∈ Φ (μ, ν).
Example 3.2. Given the two-point distributions μ = (δ_1 + δ_2)/2 and ν = (δ_0 + δ_1)/2, we have (μ′, ν′) ∈ Φ(μ, ν) when μ′ = ν′ = δ_1/2.
We call an ordered component pair (μ′, ν′) ∈ Φ(μ, ν) a maximal ordered component pair if no other pair has greater mass; that is, if
μ′(S) ≥ μ″(S) for all (μ″, ν″) ∈ Φ(μ, ν).
(We can restate this by replacing μ′(S) and μ″(S) with ν′(S) and ν″(S), respectively, since the two components of an ordered pair have equal mass by the definition of stochastic dominance.) We let Φ∗(μ, ν) denote the set of maximal ordered component pairs for (μ, ν). Thus, if
$\alpha_O(\mu, \nu) := \sup\{\mu'(S) \colon (\mu', \nu') \in \Phi(\mu, \nu)\},$ (3.3)
then
$\Phi^*(\mu, \nu) = \{(\mu', \nu') \in \Phi(\mu, \nu) \colon \mu'(S) = \alpha_O(\mu, \nu)\}.$
Using the Polish space assumption, one can show that maximal ordered component pairs always exist.
Proposition 3.1. The set Φ∗(μ, ν) is nonempty for all ${(\mu, \nu) \in \cal M \times \cal M}$.
Proof. Fix $(\mu, \nu) \in \cal M \times \cal M$, and let s := α_O(μ, ν). From the definition, we can take sequences $\{\mu'_n\}$ and $\{\nu'_n\}$ in $\cal M$ such that $(\mu'_n, \nu'_n) \in \Phi(\mu, \nu)$ for all n ∈ ℕ and $\mu'_n(S) \uparrow s$. Since $\mu'_n \leq \mu$ and $\nu'_n \leq \nu$ for all n ∈ ℕ, Prokhorov’s theorem [11, Theorem 11.5.4] implies that these sequences have convergent subsequences with $\mu'_{n_k} \buildrel {\rm w} \over \longrightarrow \mu'$ and $\nu'_{n_k} \buildrel {\rm w} \over \longrightarrow \nu'$ for some $\mu',\, \nu' \in \cal M$. We claim that (μ′, ν′) is a maximal ordered component pair.
Since $\mu'_{n_k} \buildrel {\rm w} \over \longrightarrow \mu'$ and $\mu'_n \leq \mu$ for all n ∈ ℕ, Theorem 1.5.5 of [15] implies that $\mu'_{n_k}(B) \to \mu'(B)$ for any Borel set B. Hence, μ′(B) ≤ μ(B) for all $B \in \cal B$; that is, μ′ ≤ μ. An analogous argument gives ν′ ≤ ν. Moreover, the definition of Φ(μ, ν) and stochastic dominance imply that $\mu'_n(S) = \nu'_n(S)$ for all n ∈ ℕ, and, therefore, μ′(S) = ν′(S). Also, for any $I \in i\cal B$, the fact that $\mu'_n(I) \leq \nu'_n(I)$ for all n ∈ ℕ gives μ′(I) ≤ ν′(I). Thus, μ′ ⪯sd ν′. Finally, μ′(S) = s, since $\mu'_n(S) \uparrow s$. Hence, (μ′, ν′) lies in Φ∗(μ, ν).
The value α_O(μ, ν) defined in (3.3) gives the mass of a maximal ordered component pair. We call it the ordered affinity from μ to ν. On an intuitive level, we can think of α_O(μ, ν) as the ‘degree’ to which μ is dominated by ν in the sense of stochastic dominance. Since (μ ∧ ν, μ ∧ ν) is an ordered component pair for (μ, ν), we have
α(μ, ν) ≤ α_O(μ, ν), (3.4)
where α(μ, ν) is the standard affinity defined in Section 2. In fact, α_O(μ, ν) generalizes the standard notion of affinity by extending it to arbitrary partial orders, as shown in the next lemma.
Lemma 3.2. If ‘⪯’ is the identity order then $\alpha_O = \alpha$ on $\cal M \times \cal M$.
Proof. Fix $(\mu, \nu) \in \cal M \times \cal M$, and let ‘⪯’ be the identity order (x ⪯ y if and only if x = y). Then ‘⪯sd’ also reduces to equality, from which it follows that the supremum in (3.3) is attained by (μ ∧ ν, μ ∧ ν). Hence, α_O(μ, ν) = α(μ, ν).
3.3. Properties of ordered affinity
Let us list some elementary properties of αO. The following list should be compared with Lemma 2.1. It shows that analogous results hold for αO as hold for α. (Lemma 2.1 is in fact a special case of Lemma 3.3 with the partial order set to the identity order.)
Lemma 3.3. For all ${ (\mu, \nu) \in \cal M \times \cal M}$, we have
(a) 0 ≤ α_O(μ, ν) ≤ min{μ(S), ν(S)},
(b) α_O(μ, ν) = μ(S) = ν(S) if and only if μ ⪯sd ν, and
(c) cα_O(μ, ν) = α_O(cμ, cν) whenever c ≥ 0.
Proof. Fix $(\mu, \nu) \in \cal M \times \cal M$. Claim (a) follows directly from the definitions. Regarding claim (b), suppose first that μ ⪯sd ν. Then (μ, ν) ∈ Φ(μ, ν) and, hence, α_O(μ, ν) = μ(S). Conversely, if α_O(μ, ν) = μ(S) then, since the only component μ′ ≤ μ with μ′(S) = μ(S) is μ itself, we must have (μ, ν′) ∈ Φ(μ, ν) for some ν′ ≤ ν with μ ⪯sd ν′. But then μ(I) ≤ ν′(I) ≤ ν(I) for any $I \in i\cal B$. Hence, μ ⪯sd ν.
Claim (c) is trivial if c = 0, so suppose instead that c > 0. Fix (μ′, ν′) ∈ Φ(μ, ν) such that α_O(μ, ν) = μ′(S). It is clear that (cμ′, cν′) ∈ Φ(cμ, cν), implying that
α_O(cμ, cν) ≥ cμ′(S) = cα_O(μ, ν). (3.5)
For the reverse inequality, we can apply (3.5) again to get
$c\alpha_O(\mu, \nu) = c\alpha_O(c^{-1}(c\mu), c^{-1}(c\nu)) \geq c \cdot c^{-1} \alpha_O(c\mu, c\nu) = \alpha_O(c\mu, c\nu).$
3.4. Equivalent representations
In (2.2) we noted that the affinity between two measures has several alternative representations. In our setting these results generalize as follows.
Theorem 3.1. For all $(\mu, \nu) \in \cal P \times \cal P$, we have
$\alpha_O(\mu, \nu) = 1 - \sup_{I \in i\cal B} \{\mu(I) - \nu(I)\} = \max_{(X,Y) \in \cal C(\mu, \nu)} \mathbb P\{X \preceq Y\}.$ (3.6)
Evidently (2.2) is a special case of (3.6), because (3.6) reduces to (2.2) when ‘⪯’ is set to equality. For example, when ‘⪯’ is equality,
$\alpha_O(\mu, \nu) = 1 - \sup_{B \in \cal B} \{\mu(B) - \nu(B)\} = 1 - \tfrac12 \|\mu - \nu\| = \alpha(\mu, \nu),$
where the last two steps are from (2.1) and (2.3). Note also that, as shown in the proof of Theorem 3.1, the supremum can also be written in terms of the open increasing sets $i\cal O$ or the closed decreasing sets $d\mathfrak C$. In particular,
$\sup_{I \in i\cal B} \{\mu(I) - \nu(I)\} = \sup_{G \in i\cal O} \{\mu(G) - \nu(G)\} = \sup_{D \in d\mathfrak C} \{\nu(D) - \mu(D)\}.$
One of the assertions of Theorem 3.1 is the existence of a coupling $(X, Y) \in \cal C(\mu, \nu)$ attaining ℙ{X ⪯ Y} = α_O(μ, ν). Let us refer to any such coupling as an order maximal coupling for (μ, ν).
Example 3.3. For (x, y) ∈ S × S, we have
$\alpha_O(\delta_x, \delta_y) = {\bf 1}_{\mathbb G}(x, y),$
as can easily be verified from the definition or either of the alternative representations in (3.6). The map $(x, y) \mapsto {\bf 1}_{\mathbb G}(x, y)$ is measurable because $\mathbb G$ is closed. As a result, for any $(X, Y) \in \cal C(\mu, \nu)$, we have
$\mathbb P\{X \preceq Y\} = \mathbb E\, {\bf 1}_{\mathbb G}(X, Y) \leq \alpha_O(\mu, \nu),$
with equality when (X, Y) is an order maximal coupling.
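On a finite state space, the coupling representation in (3.6) makes α_O directly computable by linear programming: maximize the coupling mass placed on the graph of ‘⪯’ subject to the marginal constraints. A sketch using scipy; the function name and the matrix `leq` encoding the order are our own conventions, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def ordered_affinity(mu, nu, leq):
    """alpha_O(mu, nu) = max P{X ⪯ Y} over couplings of (mu, nu), as in
    (3.6).  mu, nu are probability vectors on states 0..n-1, and
    leq[i, j] = 1 indicates i ⪯ j."""
    n = len(mu)
    c = -leq.astype(float).ravel()        # maximize mass on the order graph
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row sums: X has distribution mu
        A_eq[n + i, i::n] = 1.0           # column sums: Y has distribution nu
    b_eq = np.concatenate([mu, nu])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return -res.fun

# Example 3.2: mu = (d_1 + d_2)/2 and nu = (d_0 + d_1)/2 on {0, 1, 2}.
order = np.triu(np.ones((3, 3)))          # the usual order: i <= j
mu = np.array([0.0, 0.5, 0.5])
nu = np.array([0.5, 0.5, 0.0])
print(ordered_affinity(mu, nu, order))    # 0.5, attained with X = Y = 1
```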
3.5. Comments on Theorem 3.1
The existence of an order maximal coupling shown in Theorem 3.1 implies two well-known results that are usually treated separately. One is the Nachbin–Strassen theorem (see, e.g. Theorem 1 of [19] or Chapter IV of [22]), which asserts the existence of a coupling $(X, Y) \in \cal C(\mu, \nu)$ attaining ℙ{X ⪯ Y} = 1 whenever μ ⪯sd ν. The existence of an order maximal coupling for each (μ, ν) in $\cal P \times \cal P$ implies this statement, since, under the hypothesis that μ ⪯sd ν, we also have α_O(μ, ν) = 1. Hence, any order maximal coupling satisfies ℙ{X ⪯ Y} = 1.
The other familiar result implied by the existence of an order maximal coupling is the existence of a maximal coupling in the standard sense (see the discussion of maximal couplings after (2.2) and the result on page 19 of [22]). Indeed, if we take ‘⪯’ to be the identity order then (3.6) reduces to (2.2), as already discussed.
4. Total ordered variation
Let S be a partially ordered Polish space. Consider the function on $\cal P \times \cal P$ given by
$\gamma(\mu, \nu) := (1 - \alpha_O(\mu, \nu)) + (1 - \alpha_O(\nu, \mu)).$
We call γ(μ, ν) the total ordered variation distance between μ and ν. The natural comparison is with (2.3), which yields the same value when α_O is replaced by α. In particular, when ‘⪯’ is equality, ordered affinity reduces to affinity, and total ordered variation distance reduces to total variation distance. Since ordered affinities dominate affinities (see (3.4)), we have γ(μ, ν) ≤ ‖μ − ν‖ for all $(\mu, \nu) \in \cal P \times \cal P$.
Other, equivalent, representations of γ are available. For example, in view of (3.6), for any $(\mu, \nu) \in \cal P \times \cal P$, we have
$\gamma(\mu, \nu) = \sup_{I \in i\cal B} \{\mu(I) - \nu(I)\} + \sup_{I \in i\cal B} \{\nu(I) - \mu(I)\}.$ (4.1)
By combining Lemma 3.1 and (3.2), we also have
$\gamma(\mu, \nu) = \sup_{h \in iH_0} \{\mu(h) - \nu(h)\} + \sup_{h \in iH_0} \{\nu(h) - \mu(h)\}.$ (4.2)
It is straightforward to show that
$\sup_{I \in i\cal B} |\mu(I) - \nu(I)| \leq \gamma(\mu, \nu)$ for all $(\mu, \nu) \in \cal P \times \cal P$. (4.3)
Lemma 4.1. The function γ is a metric on $\cal P$.
Proof. The claim that γ is a metric follows in a straightforward way from the definition or the alternative representation (4.1). For example, the triangle inequality is easy to verify using (4.1). Also, γ (μ, ν) = 0 implies that μ = ν by (4.1) and Remark 3.1.
4.1. Connection to other modes of convergence
As well as total variation, the metric γ is closely related to the so-called Bhattacharya metric, which is given by
$\beta(\mu, \nu) := \sup_{h \in iH} |\mu(h) - \nu(h)|.$ (4.4)
See [1] and [3]. (In [3] the metric is defined by taking the supremum over iH_0 rather than iH, but the two definitions differ only by a positive scalar.) The Bhattacharya metric can be thought of as an alternative way to generalize total variation distance, in the sense that, like γ, the metric β reduces to total variation distance when ‘⪯’ is the identity order (since iH equals H under this order). From (3.1) we have
$\beta(\mu, \nu) = 2 \max\Big\{\sup_{I \in i\cal B}\{\mu(I) - \nu(I)\},\ \sup_{I \in i\cal B}\{\nu(I) - \mu(I)\}\Big\},$
and from this and (4.2) we have
γ(μ, ν) ≤ β(μ, ν) ≤ 2γ(μ, ν). (4.5)
Hence, β and γ are equivalent metrics.
The metric γ is also connected to the Wasserstein metric ([12], [13]). If ρ metrizes the topology on S then the Wasserstein distance between probability measures μ and ν is
$W(\mu, \nu) := \inf_{(X,Y) \in \cal C(\mu, \nu)} \mathbb E\, \rho(X, Y).$
The total ordered variation metric can be compared as follows. Consider the ‘directed semimetric’ $\hat \rho(x, y) := {\bf 1}\{x \npreceq y\}$. In view of (3.6) we have
$1 - \alpha_O(\mu, \nu) = \min_{(X,Y) \in \cal C(\mu, \nu)} \mathbb E\, \hat\rho(X, Y).$
Thus, γ(μ, ν) is obtained by summing two such ‘directed Wasserstein deviations’, one taken from each direction; the sum of the two directed deviations is a metric.
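On S = ℝ with the usual order, the increasing Borel sets are upper rays, so each directed term in γ is a one-sided Kolmogorov deviation between CDFs. A sketch for discrete distributions on a common grid, assuming the representation (4.1) (the helper name is ours):

```python
import numpy as np

def gamma_metric(mu, nu):
    """Total ordered variation on a grid x_0 < ... < x_n via (4.1): the
    increasing sets are upper rays, so each directed term is a one-sided
    Kolmogorov deviation between the CDFs."""
    diff = np.cumsum(mu) - np.cumsum(nu)   # F_mu - F_nu on the grid
    # Over upper rays, mu(I) - nu(I) equals F_nu - F_mu at the cut point.
    return max(np.max(-diff), 0.0) + max(np.max(diff), 0.0)

mu = np.array([0.4, 0.4, 0.2])   # mass concentrated low
nu = np.array([0.1, 0.4, 0.5])   # mass concentrated high
# Here mu ⪯sd nu, so one directed term vanishes and gamma = 0.3.
print(gamma_metric(mu, nu))
```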
Proposition 4.1. If $\{\mu_n\}_{n \geq 0} \subset \cal P$ is tight and γ(μ_n, μ_0) → 0, then $\mu_n \buildrel {\rm w} \over \longrightarrow \mu_0$.
Proof. Let {μ_n} and μ := μ_0 satisfy the conditions of the proposition. Take any subsequence of {μ_n} and observe that, by Prokhorov’s theorem, this subsequence has a subsubsequence converging weakly to some $\nu \in \cal P$. Along this subsubsequence, for any continuous h ∈ ibS, we have both μ_n(h) → μ(h) and μ_n(h) → ν(h). This is sufficient for ν = μ by Remark 3.1. Thus, every subsequence of {μ_n} has a subsubsequence converging weakly to μ, and hence so does the entire sequence.
4.2. Completeness
To obtain completeness of ($\cal P,\,\gamma$), we adopt the following additional assumption, which is satisfied if, say, compact sets are order bounded (i.e. lie in order intervals) and order intervals are compact. (For example, ℝn with the usual pointwise partial order has this property.)
Assumption 4.1. If K ⊂ S is compact then i(K) ∩ d(K) is also compact.
Theorem 4.1. If Assumption 4.1 holds then ($\cal P$, γ) is complete.
Remark 4.1. In [3] it was shown that β is a complete metric when S = ℝ^n. Given the equivalence of the metrics, Theorem 4.1 extends this result to partially ordered Polish spaces satisfying Assumption 4.1.
5. Applications
In this section we show that many results in classical and monotone Markov chain theory, hitherto treated separately, can be derived from the same set of results based around total ordered variation and ordered affinity.
Regarding notation, if {S_i} are partially ordered Polish spaces over i = 0, 1, 2, …, we often use a common symbol ‘⪯’ for the partial order on any of these spaces. On products of these spaces we use the product topology and the pointwise partial order, again denoted by ‘⪯’. For example, if (x_0, x_1) and (y_0, y_1) are points in S_0 × S_1, then (x_0, x_1) ⪯ (y_0, y_1) means that x_0 ⪯ y_0 and x_1 ⪯ y_1.
A function $P \colon S_0 \times \cal B_1 \to [0, 1]$ is called a Markov kernel from S_0 to S_1 if x ↦ P(x, B) is $\cal B_0$-measurable for each $B \in \cal B_1$ and B ↦ P(x, B) is in $\cal P_1$ for all x ∈ S_0. If S_0 = S_1 = S, we will call P a Markov kernel on S, or just a Markov kernel. Following standard conventions (see, e.g. [25]), for any Markov kernel P from S_0 to S_1, any h ∈ bS_1, and $\mu \in \cal P_0$, we define $\mu P \in \cal P_1$ and Ph ∈ bS_0 via
$(\mu P)(B) := \int P(x, B)\, \mu(dx)$ and $(Ph)(x) := \int h(y)\, P(x, dy).$
Also, μ ⊗ P denotes the joint distribution on S_0 × S_1 defined by
$(\mu \otimes P)(A \times B) := \int_A P(x, B)\, \mu(dx)$ for $A \in \cal B_0$ and $B \in \cal B_1$.
To simplify notation, we use P_x to represent the measure δ_x P = P(x, ·); P^m is the mth composition of P with itself.
5.1. Ordered affinity and monotone Markov kernels
Let S_0 and S_1 be Polish spaces, each partially ordered by ‘⪯’. A Markov kernel P from S_0 to S_1 is called monotone if Ph ∈ ibS_0 whenever h ∈ ibS_1. An equivalent condition is that μP ⪯sd νP whenever μ ⪯sd ν, or just that P(x, ·) ⪯sd P(y, ·) whenever x ⪯ y. It is well known (see, e.g. Proposition 1 of [19]) that if μ ⪯sd ν and P is monotone, then μ ⊗ P ⪯sd ν ⊗ P. Note that, when ‘⪯’ is the identity order, every Markov kernel is monotone.
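In the finite, totally ordered case, monotonicity of a kernel is a simple check: the condition P(x, ·) ⪯sd P(y, ·) for x ≤ y means the row CDFs fall as the current state rises. A sketch (the matrices are arbitrary illustrations; the helper name is ours):

```python
import numpy as np

def is_monotone(P):
    """Check monotonicity of a stochastic matrix on states 0 < 1 < ... < n-1:
    P(x, .) ⪯sd P(y, .) whenever x <= y, i.e. row CDFs fall as x rises."""
    cdfs = np.cumsum(P, axis=1)
    return bool(np.all(cdfs[:-1] >= cdfs[1:]))  # adjacent rows suffice

P_good = np.array([[0.7, 0.3, 0.0],
                   [0.2, 0.5, 0.3],
                   [0.0, 0.3, 0.7]])
P_bad = np.array([[0.0, 0.0, 1.0],   # state 0 jumps to the top:
                  [0.2, 0.5, 0.3],   # a restocking-style jump, not monotone
                  [0.0, 0.3, 0.7]])
print(is_monotone(P_good), is_monotone(P_bad))  # True False
```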
Lemma 5.1. If P is a monotone Markov kernel from S_0 to S_1 and μ, μ′, ν, and ν′ are probabilities in $\cal P_0$, then
$\alpha_O(\mu P, \nu P) \geq \alpha_O(\mu' P, \nu' P)$ whenever μ ⪯sd μ′ and ν′ ⪯sd ν.
Proof. Let P, μ, μ′, ν, and ν′ have the stated properties. In view of the equivalent representation in (3.6), the claim will be established if
$\sup_{I \in i\cal B} \{(\mu P)(I) - (\nu P)(I)\} \leq \sup_{I \in i\cal B} \{(\mu' P)(I) - (\nu' P)(I)\}.$
This holds by the monotonicity of P and the order of μ, μ′, ν, and ν′.
Lemma 5.2. If P is a monotone Markov kernel from S_0 to S_1, then
$\alpha_O(\mu P, \nu P) \geq \alpha_O(\mu, \nu)$ for all μ, ν in $\cal P_0$.
Proof. Fix μ, ν in $\cal P_0$, and let $(\hat \mu, \hat \nu)$ be a maximal ordered component pair for (μ, ν). From monotonicity of P and the fact that Markov kernels preserve the mass of measures, it is clear that $(\hat \mu P, \hat \nu P)$ is an ordered component pair for (μP, νP). Hence,
$\alpha_O(\mu P, \nu P) \geq (\hat\mu P)(S_1) = \hat\mu(S_0) = \alpha_O(\mu, \nu).$
On the other hand, for the joint distribution, the ordered affinity of the initial pair is preserved.
Lemma 5.3. If P is a monotone Markov kernel from S_0 to S_1, then
$\alpha_O(\mu \otimes P, \nu \otimes P) = \alpha_O(\mu, \nu)$ for all μ, ν in $\cal P_0$.
Proof. Fix μ, ν in $\cal P_0$, and let (X_0, X_1) and (Y_0, Y_1) be random pairs with distributions μ ⊗ P and ν ⊗ P, respectively. We have
$\mathbb P\{(X_0, X_1) \preceq (Y_0, Y_1)\} \leq \mathbb P\{X_0 \preceq Y_0\} \leq \alpha_O(\mu, \nu).$
Taking the supremum over all couplings in $\cal C(\mu \otimes P, \nu \otimes P)$ shows that α_O(μ ⊗ P, ν ⊗ P) is dominated by α_O(μ, ν).
To see the reverse inequality, let $(\hat \mu, \hat \nu)$ be a maximal ordered component pair for (μ, ν). Monotonicity of P now gives $\hat \mu \otimes P \preceq_{sd} \hat \nu \otimes P$. Using this and the fact that Markov kernels preserve the mass of measures, we see that $(\hat \mu \otimes P, \hat \nu \otimes P)$ is an ordered component pair for (μ ⊗ P, ν ⊗ P). Hence,
$\alpha_O(\mu \otimes P, \nu \otimes P) \geq (\hat\mu \otimes P)(S_0 \times S_1) = \hat\mu(S_0) = \alpha_O(\mu, \nu).$
5.2. Monotone Markov chains
Given $\mu \in \cal P$ and a Markov kernel P on S, a stochastic process {X_t}_{t≥0} taking values in $S^{\infty} := \times_{t=0}^\infty S$ will be called a Markov chain with initial distribution μ and kernel P if the distribution of {X_t} on S^∞ is
$Q_\mu := \mu \otimes P \otimes P \otimes \cdots.$
(The meaning of the right-hand side is clarified in [22, Section III.8], [19, p. 903], and [25, Section 3.4], for example.) If P is a monotone Markov kernel then $(x, B) \mapsto Q_x(B) := Q_{\delta_x}(B)$ is a monotone Markov kernel from S to S^∞. See Propositions 1 and 2 of [19].
There are various useful results about representations of Markov chains that are ordered almost surely. One is that, if the initial conditions satisfy μ ⪯sd ν and P is a monotone Markov kernel, then we can find Markov chains {X_t} and {Y_t} with initial distributions μ and ν and kernel P such that X_t ⪯ Y_t for all t almost surely. (See, e.g. Theorem 2 of [19].) This result can be generalized beyond the case where μ and ν are stochastically ordered, using the results presented above. For example, let μ and ν be arbitrary initial distributions and let P be monotone, so that Q_x is likewise monotone. By Lemma 5.3 we have
$\alpha_O(Q_\mu, Q_\nu) = \alpha_O(\mu, \nu).$
In other words, the ordered affinity of the entire processes is given by the ordered affinity of the initial distributions. It now follows from Theorem 3.1 that there exist Markov chains {X_t} and {Y_t} with initial distributions μ and ν and Markov kernel P such that
$\mathbb P\{X_t \preceq Y_t \text{ for all } t \geq 0\} = \alpha_O(\mu, \nu).$
The standard result is a special case, since μ ⪯sd ν implies that α_O(μ, ν) = 1, and, hence, the sequences are ordered almost surely.
5.3. Nonexpansiveness
It is well known that every Markov kernel is nonexpansive with respect to the total variation norm, so that
$\|\mu P - \nu P\| \leq \|\mu - \nu\|$ for all $(\mu, \nu) \in \cal P \times \cal P$. (5.1)
An analogous result is true for γ when P is monotone. That is,
$\gamma(\mu P, \nu P) \leq \gamma(\mu, \nu)$ for all $(\mu, \nu) \in \cal P \times \cal P$. (5.2)
The bound (5.2) follows directly from Lemma 5.2. Evidently (5.1) can be recovered from (5.2) by setting ‘⪯’ to equality.
Nonexpansiveness is interesting partly in its own right (we apply it in proofs below) and partly because it suggests that, with some additional assumptions, we can strengthen it to contractiveness. We expand on this idea below.
5.4. An order coupling bound for Markov chains
Doeblin [7] established and exploited the coupling inequality
$\|\mu_t - \nu_t\| \leq 2\, \mathbb P\{X_j \neq Y_j \text{ for any } j \leq t\},$ (5.3)
where μ_t and ν_t are the time t distributions of Markov chains {X_t} and {Y_t} generated by a common Markov kernel P. Even when the state space is uncountable, the right-hand side of (5.3) can often be shown to converge to zero by manipulating the joint distribution of (X_j, Y_j) to increase the chance of a meeting ([9], [14], [22], [25], [26], [27], [28], [31], [34]).
Consider the following generalization: given a monotone Markov kernel P on S and arbitrary $\mu, \nu \in \cal P$, we can construct Markov chains {X_t} and {Y_t} with common kernel P and respective initial conditions μ and ν such that
$\gamma(\mu P^t, \nu P^t) \leq \mathbb P\{X_j \npreceq Y_j \text{ for any } j \leq t\} + \mathbb P\{Y_j \npreceq X_j \text{ for any } j \leq t\}.$ (5.4)
This generalizes (5.3) because both the right- and left-hand sides of (5.4) reduce to the corresponding sides of (5.3) if we take ‘⪯’ to be equality.
To prove (5.4), we need only show that
$1 - \alpha_O(\mu P^t, \nu P^t) \leq \mathbb P\{X_j \npreceq Y_j \text{ for any } j \leq t\},$ (5.5)
since, once (5.5) is established, we can reverse the roles of {X_t} and {Y_t} in (5.5) to obtain $1 - \alpha_O( \nu P^t, \mu P^t) \leq \mathbb P\{Y_j \npreceq X_j \;\text {for any}\; j \leq t\}$ and then add this inequality to (5.5) to produce (5.4).
If {X_t} and {Y_t} are Markov chains with kernel P and initial conditions μ and ν, then (3.6) yields α_O(μP^t, νP^t) ≥ ℙ{X_t ⪯ Y_t}. Therefore, we need only construct such chains with the additional property that
$\mathbb P\{X_j \npreceq Y_j \text{ for any } j \leq t\} = 1 - \mathbb P\{X_t \preceq Y_t\}.$ (5.6)
Intuitively, we can do so by using a ‘conditional’ version of the Nachbin–Strassen theorem, producing chains that, once ordered, remain ordered almost surely. This can be formalized as follows. By [23, Theorem 2.3], there exists a Markov kernel M on S × S such that $\mathbb G$ is absorbing for M (i.e. $M((x,y), \mathbb G) = 1$ for all (x, y) in $\mathbb G$) and
$M((x, y), A \times S) = P(x, A)$ and $M((x, y), S \times B) = P(y, B)$
for all (x, y) ∈ S × S and all $A, B \in \cal B$. Given M, let η be a distribution on S × S with marginals μ and ν, let $Q_\eta := \eta \otimes M \otimes M \otimes \cdots$ be the induced joint distribution, and let {(X_t, Y_t)} have distribution Q_η on (S × S)^∞. By construction, X_t has distribution μP^t and Y_t has distribution νP^t. Moreover, (5.6) is valid because $\mathbb G$ is absorbing for M, and, hence, $\mathbb P\{(X_j, Y_j) \notin \mathbb G \text{ for any } j \leq t\} = \mathbb P\{(X_t, Y_t) \notin \mathbb G\}$.
5.5. Uniform ergodicity
Let S be a partially ordered Polish space satisfying Assumption 4.1, and let P be a monotone Markov kernel on S. A distribution π is called stationary for P if πP = π. Consider the value
$\sigma(P) := \inf_{(x, y) \in S \times S} \alpha_O(P_x, P_y),$ (5.7)
which can be understood as an order-theoretic extension of the Markov–Dobrushin coefficient of ergodicity ([5], [32]). It reduces to the usual notion when ‘⪯’ is equality.
Theorem 5.1. If P is monotone then
$\gamma(\mu P, \nu P) \leq (1 - \sigma(P))\, \gamma(\mu, \nu)$ for all $(\mu, \nu) \in \cal P \times \cal P$. (5.8)
Thus, strict positivity of σ(P) implies that μ ↦ μP is a contraction map on $ (\cal P, \gamma ) $. Moreover, in many settings, the bound in (5.8) cannot be improved upon, as we now see.
Lemma 5.4. If P is monotone, S is not a singleton, and any x, y in S have a lower bound in S, then, for each ξ > σ(P), there exist μ, ν in $\cal P$ such that
$\gamma(\mu P, \nu P) > (1 - \xi)\, \gamma(\mu, \nu).$ (5.9)
The significance of Theorem 5.1 is summarized in the next corollary.
Corollary 5.1. Let P be monotone, and let S satisfy Assumption 4.1. If there exists an m ∈ ℕ such that σ(P^m) > 0, then P has a unique stationary distribution π in $\cal P$, and
$\gamma(\mu P^t, \pi) \leq (1 - \sigma(P^m))^{\lfloor t/m \rfloor}\, \gamma(\mu, \pi)$ for all $\mu \in \cal P$ and t ∈ ℕ. (5.10)
Here ⌊x⌋ is the largest n ∈ ℕ with n ≤ x.
Proof of Corollary 5.1. Let P and μ be as in the statement of the corollary. The existence of a fixed point of μ ↦ μP, and hence a stationary distribution $\pi \in \cal P$, follows from Theorem 5.1 applied to P^m, Banach’s contraction mapping theorem, and the completeness of ($\cal P$, γ) shown in Theorem 4.1. The bound in (5.10) follows from (5.8) applied to P^m and the nonexpansiveness of P in the metric γ (see (5.2)).
5.6. Applications from the introduction
Let us see how Corollary 5.1 can be used to show stability of the two models discussed in the introduction, beginning with the monotone model in (1.1). The state space is S = [0, 1] with its standard order, and all assumptions are as per the discussion immediately following (1.1). We let P be the corresponding Markov kernel and consider the following coupling of (P_1, P_0). Let W be a draw from the Bernoulli(1/2) distribution, and let V = 1 − W. Then P_1 and P_0 are respectively the distributions of
X := (1 + W)/2 and Y := V/2.
Since X ≤ Y if and only if W = 0, we have, by Lemma 5.1 and (3.6),
$\alpha_O(P_x, P_y) \geq \alpha_O(P_1, P_0) \geq \mathbb P\{X \leq Y\} = \tfrac12$
for all x, y ∈ S. From the definition in (5.7) we then have $\sigma(P) \geq \tfrac12$, and global stability in the γ metric follows from Corollary 5.1.
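For this model the contraction rate can be checked exactly. Unfolding the recursion in (1.1) shows that the time-t distribution from seed x is uniform on 2^t equally spaced points; the sketch below (assuming that law of motion, with helper names of our own) computes γ between the time-t distributions from seeds 0 and 1 via the upper-ray representation on ℝ and compares it with the bound (1 − σ(P))^t from Corollary 5.1. The two coincide, so the bound is tight here.

```python
import numpy as np

def dist_t(x, t):
    """Exact time-t distribution of (1.1) from X_0 = x: unfolding the
    recursion shows X_t is uniform on the 2^t points (x + k) / 2^t."""
    k = np.arange(2 ** t)
    return (x + k) / 2 ** t, np.full(2 ** t, 2.0 ** -t)

def gamma_on_R(p_mu, w_mu, p_nu, w_nu):
    """gamma via one-sided Kolmogorov deviations on the merged grid."""
    grid = np.union1d(p_mu, p_nu)
    F_mu = np.array([w_mu[p_mu <= g].sum() for g in grid])
    F_nu = np.array([w_nu[p_nu <= g].sum() for g in grid])
    d = F_mu - F_nu
    return max(np.max(d), 0.0) + max(np.max(-d), 0.0)

for t in range(1, 8):
    (p0, w0), (p1, w1) = dist_t(0.0, t), dist_t(1.0, t)
    # gamma(P_0^t, P_1^t) equals the bound (1 - sigma)^t with sigma = 1/2.
    print(t, gamma_on_R(p0, w0, p1, w1), 0.5 ** t)
```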
Next we turn to the inventory model in (1.2), with state space S = [0, K] and {W_t} an i.i.d. process satisfying ℙ{W_t ≥ w} > 0 for every real w. Let {X_t} and {Y_t} be generated by (1.2), with common shock sequence {W_t} and respective initial conditions x, y in S. In view of (2.2), we have
$\alpha(P_x, P_y) \geq \mathbb P\{X_1 = Y_1\} \geq \mathbb P\{W_1 \geq K\} =: \kappa > 0.$
Letting ‘⪯’ be equality on S, we have α_O(P_x, P_y) = α(P_x, P_y) ≥ κ, and hence σ(P) ≥ κ. Global stability in the γ metric again follows from Corollary 5.1.
5.7. General applications
Next we show that Theorem 5.1, and in particular Corollary 5.1, covers as special cases two well-known results on Markov chain stability: one from the classical literature and one from the more recent monotone Markov chain literature.
Consider first the standard notion of uniform ergodicity. A Markov kernel P on S is called uniformly ergodic if it has a stationary distribution π and $\sup_{x \in S} \| P_x^t - \pi \| \to 0$ as t → ∞. Uniform ergodicity was studied by Markov [24] in a countable state space, and by Doeblin [6], Yosida and Kakutani [37], Doob [9], and many subsequent authors in a general state space. It is defined and reviewed in Chapter 16 of [25]. One of the most familiar equivalent conditions for uniform ergodicity [25, Theorem 16.0.2] is the existence of an m ∈ ℕ and a nontrivial $\phi \in \cal M$ such that $P^m_x \geq \phi$ for all x in S.
One can recover this result using Corollary 5.1. Take ‘⪯’ to be equality, in which case every Markov kernel is monotone, γ is total variation distance, and Assumption 4.1 is always satisfied. Moreover, σ(P^m) reduces to the ordinary ergodicity coefficient of P^m, evaluated using the standard notion of affinity, and, hence,
$\sigma(P^m) = \inf_{(x,y) \in S \times S} \alpha(P^m_x, P^m_y) \geq \phi(S) > 0.$
Thus, all the conditions of Corollary 5.1 are satisfied, and
$\sup_{x \in S} \|P^t_x - \pi\| \leq 2(1 - \sigma(P^m))^{\lfloor t/m \rfloor} \to 0$ as t → ∞.
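In the finite-state case the classical coefficient is directly computable and the contraction is easy to observe numerically. A sketch, with an arbitrary illustrative transition matrix and helper names of our own:

```python
import numpy as np

def dobrushin_alpha(P):
    """Classical ergodicity coefficient: the minimum over row pairs of the
    affinity sum_z min(P(x, z), P(y, z)).  With the identity order this
    is sigma(P)."""
    n = len(P)
    return min(np.minimum(P[x], P[y]).sum() for x in range(n) for y in range(n))

P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.3, 0.4],
              [0.1, 0.5, 0.4]])
sigma = dobrushin_alpha(P)                 # 0.6 for this matrix

pi = np.linalg.matrix_power(P, 200)[0]     # stationary distribution, numerically
mu = np.array([1.0, 0.0, 0.0])
for t in range(1, 6):
    mu = mu @ P
    # TV distance to pi versus the bound 2 (1 - sigma)^t from (5.10).
    print(t, np.abs(mu - pi).sum(), 2 * (1 - sigma) ** t)
```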
Now consider the setting of Bhattacharya and Lee [1], where S = ℝ^n, ‘⪯’ is the usual pointwise partial order ‘≤’ for vectors, and {g_t} is a sequence of i.i.d. random maps from S to itself, generating {X_t} via X_t = g_t(X_{t−1}) = g_t ∘ ⋯ ∘ g_1(X_0). The corresponding Markov kernel is P(x, B) = ℙ{g_1(x) ∈ B}. The random maps are assumed to be order preserving on S, so that P is monotone. Bhattacharya and Lee used a ‘splitting condition’, which assumes the existence of an x̄ ∈ S and an m ∈ ℕ such that
(a) s_1 := ℙ{g_m ∘ ⋯ ∘ g_1(y) ≤ x̄ for all y ∈ S} > 0, and
(b) s_2 := ℙ{g_m ∘ ⋯ ∘ g_1(y) ≥ x̄ for all y ∈ S} > 0.
Under these assumptions, they showed that $\sup_{x \in S} \beta(P^t_x, \, \pi)$ converges to zero exponentially fast in t, where β is the Bhattacharya metric introduced in (4.4). This finding extends earlier results by Dubins and Freedman [10] and Yahav [36] to multiple dimensions.
This result can be obtained as a special case of Corollary 5.1. Certainly S is a partially ordered Polish space and Assumption 4.1 is satisfied. Moreover, the ordered ergodicity coefficient σ(P^m) is strictly positive. To see this, suppose that the splitting condition is satisfied at m ∈ ℕ. Pick any x, y ∈ S, and let {X_t} and {Y_t} be independent copies of the Markov chain, starting at x and y, respectively. We have
$\alpha_O(P^m_x, P^m_y) \geq \mathbb P\{X_m \preceq Y_m\} \geq \mathbb P\{X_m \leq \bar x \leq Y_m\} \geq s_1 s_2.$
The last term is strictly positive by assumption. Hence, all the conditions of Corollary 5.1 are satisfied, a unique stationary distribution π exists, and $\sup_{x \in S} \gamma(P^t_x, \, \pi)$ converges to zero exponentially fast in t. We showed in (4.5) that β ≤ 2γ, so the same convergence holds for the Bhattacharya metric.
We can also recover a related convergence result due to Hopenhayn and Prescott [16, Theorem 2] that is routinely applied to stochastic stability problems in economics. They assumed that S is a compact metric space with a closed partial order, a least element a, and a greatest element b. They supposed that P is monotone and that there exist an x̄ in S and an m ∈ ℕ such that
$P^m_a(i(\bar x)) > 0$ and $P^m_b(d(\bar x)) > 0.$ (5.11)
In this setting, they showed that P has a unique stationary distribution π and $\mu P^t \buildrel {\rm w} \over \longrightarrow \pi$ as t → ∞ for any $\mu \in \cal P$. This result can be obtained from Corollary 5.1. Under the stated assumptions, S is Polish and Assumption 4.1 is satisfied. The coefficient σ(P^m) is strictly positive because, if we let {X_t} and {Y_t} be independent copies of the Markov chain starting at b and a, respectively, then, since $ (X_m, Y_m) \in {\cal C}(P^m_b, P^m_a)$, we have
$\alpha_O(P^m_b, P^m_a) \geq \mathbb P\{X_m \preceq Y_m\} \geq \mathbb P\{X_m \preceq \bar x\}\, \mathbb P\{\bar x \preceq Y_m\} = P^m_b(d(\bar x))\, P^m_a(i(\bar x)).$
The last term is strictly positive by (5.11). Positivity of σ(P^m) now follows from Lemma 5.1, since a ⪯ x, y ⪯ b for all x, y ∈ S. Hence, by Corollary 5.1, there exists a unique stationary distribution π and $\gamma(\mu P^t, \pi) \to 0$ as t → ∞ for any $\mu \in \cal P$. This convergence implies weak convergence by Proposition 4.1 and compactness of S.
Appendix A
In this appendix we collect the remaining proofs. Throughout, in addition to the notation defined above, cbS_0 denotes the set of all continuous functions h: S → [0, 1], while
$g(\mu, \nu) := \mu(S) - \alpha_O(\mu, \nu)$ for μ, ν in $\cal M$,
so that γ(μ, ν) = g(μ, ν) + g(ν, μ) on $\cal P \times \cal P$.
A.1. Proofs of Section 3 results
Proof of Lemma 3.1. For the first equality, fix $\lambda \in \cal M_s$ and let
$b(\lambda) := \sup_{h \in iH_0} \lambda(h)$ and $s(\lambda) := \sup_{I \in i\cal B} \lambda(I).$
Since 1_I ∈ iH_0 for all $I \in i\cal B$, we have b(λ) ≥ s(λ). To see the reverse inequality, let h ∈ iH_0. Fix n ∈ ℕ. Let r_j := j/n for j = 0, …, n. Define h_n ∈ iH_0 by
$h_n(x) := \max\{r_j \colon r_j \leq h(x)\}.$
Since h ≤ h_n + 1/n, we have
For j = 0, …, n, let $I_j := \{x \in S \colon h_n(x) \geq r_j\} \in i\cal B$. Note that
We have
We define f_0, …, f_{n−1} ∈ iH_0 and $A_0, \ldots, A_{n-1} \in i\cal B$ as follows. Let f_0 = h_n and A_0 = I_n. Evidently,
Now suppose that, for some j ∈ {0, 1, …, n − 2}, we have
If λ(I_{n−j−1} \ A_j) > 0 then define
Note that in this case
If λ(I_{n−j−1} \ A_j) ≤ 0 then define
In this case λ(f_{j+1}) − λ(f_j) = (r_{n−j−2} − r_{n−j−1}) λ(I_{n−j−1} \ A_j) ≥ 0. We also have (A.1) by construction. Thus, in both cases, we have (A.3) with j replaced by j + 1. Continuing in this way, we see that (A.3) holds for all j = 0, …, n − 1.
Let j = n − 1 in (A.3). From the definition of r_j and (A.2) we have r_0 = 0 and I_0 = S. Thus,
Since f_{n−1} = 1_{A_{n−1}} and A_{n−1} = I_j for some j ∈ {0, …, n − 1}, recalling (A.1) we have
Applying $\sup_{h \in iH_0}$ to the leftmost side, we see that b(λ) − 1/n ≤ s(λ). Since this is true for any n ∈ ℕ, we obtain b(λ) ≤ s(λ).
The claim (3.1) follows from |a| = max{a, −a} and interchange of max and sup.
Proof of Theorem 3.1. Let (X, Y) be a coupling of (μ, ν), and define
$\mu'(B) := \mathbb P\{X \in B,\ X \preceq Y\}$ and $\nu'(B) := \mathbb P\{Y \in B,\ X \preceq Y\}$ for $B \in \cal B$.
Clearly, μ′ ≤ μ, ν′ ≤ ν, and μ′(S) = ℙ{X ⪯ Y} = ν′(S). Moreover, for any increasing set $I \in i\cal B$, we clearly have μ′(I) ≤ ν′(I). Hence, (μ′, ν′) ∈ Φ(μ, ν) and ℙ{X ⪯ Y} = μ′(S) ≤ α_O(μ, ν). We now exhibit a coupling such that equality is attained. In doing so, we can assume that a := α_O(μ, ν) > 0; if not then, for any $(X,Y) \in \cal C(\mu, \nu)$, we have 0 ≤ ℙ{X ⪯ Y} ≤ α_O(μ, ν) = 0.
To begin, observe that, by Proposition 3.1, there exists a pair (μ′, ν′) ∈ Φ(μ, ν) with μ′(S) = ν′(S) = a. (If a = 1 then μ = μ′ ⪯sd ν′ = ν and the Nachbin–Strassen theorem applies directly, so we may assume a < 1.) Let
$\mu^r := (\mu - \mu')/(1 - a)$ and $\nu^r := (\nu - \nu')/(1 - a).$
By construction, μ^r, ν^r, μ′/a, and ν′/a are probability measures satisfying
$\mu = a(\mu'/a) + (1 - a)\mu^r$ and $\nu = a(\nu'/a) + (1 - a)\nu^r.$
We construct a coupling (X, Y) as follows. Let U, X′, Y′, X^r, and Y^r be random variables on a common probability space such that
(a) $X' {\mathop = \limits^{\rm D}} \mu' / a$, $Y' {\mathop = \limits^{\rm D}} \nu' / a$, $X^r {\mathop = \limits^{\rm D}} \mu^r$, and $Y^r {\mathop = \limits^{\rm D}} \nu^r$,
(b) U is uniform on [0, 1] and independent of (X′, Y′, Xr, Yr), and
(c) ℙ{X′ ⪯ Y′} = 1.
The pair in (c) can be constructed via the Nachbin–Strassen theorem [19, Theorem 1], since μ′/a ⪯sd ν′/a. Now let
(X, Y) := (X′, Y′) if U ≤ a and (X, Y) := (X^r, Y^r) otherwise.
Evidently, $(X,Y) \in \cal C(\mu, \nu)$. Moreover, for this pair, we have
$\mathbb P\{X \preceq Y\} \geq \mathbb P\{X' \preceq Y',\ U \leq a\}.$
By independence, the right-hand side is equal to ℙ{X′ ⪯ Y′} ℙ{U ≤ a} = a, so
$\mathbb P\{X \preceq Y\} \geq a = \alpha_O(\mu, \nu).$
We conclude that
$\alpha_O(\mu, \nu) = \max_{(X,Y) \in \cal C(\mu, \nu)} \mathbb P\{X \preceq Y\}.$ (A.4)
Next, observe that, for any $(X, Y) \in \cal C(\mu, \nu)$ and h ∈ iH_0, we have
$\mu(h) - \nu(h) = \mathbb E[h(X) - h(Y)] \leq \mathbb E[(h(X) - h(Y))\, {\bf 1}\{X \npreceq Y\}] \leq \mathbb P\{X \npreceq Y\}.$
Specializing to h = 1_I for some $I \in i\cal B$, we have $\mu(I) - \nu(I) \leq \mathbb P\{ X \npreceq Y \} = 1 - \mathbb P\{ X \preceq Y \}$. From this bound and (A.4), the proof of (3.6) will be complete if we can show that
$\sup_{(X,Y) \in \cal C(\mu, \nu)} \mathbb P\{X \preceq Y\} \geq 1 - \sup_{I \in i\cal B}\{\mu(I) - \nu(I)\}.$ (A.5)
To prove (A.5), let $\cal B \otimes \cal B$ be the product σ-algebra on S × S and let π_i be the ith coordinate projection, so that π_1(x, y) = x and π_2(x, y) = y for any (x, y) ∈ S × S. As usual, given Q ⊂ S × S, we let π_1(Q) be the set of all x ∈ S such that (x, y) ∈ Q for some y ∈ S, and similarly for π_2. Recall that $\mathfrak C$ is the closed sets in S and $d\mathfrak C$ is the decreasing sets in $\mathfrak C$. Strassen’s theorem [33] implies that, for any ∊ ≥ 0 and any closed set K ⊂ S × S, there exists a probability measure ξ on $(S \times S, \cal B \otimes \cal B)$ with marginals μ and ν such that ξ(K) ≥ 1 − ∊ whenever
$\nu(F) \leq \mu(\pi_1(K \cap (S \times F))) + \epsilon$ for all $F \in \mathfrak C$.
Note that if $F \in \mathfrak C$ then, since ‘⪯’ is a closed partial order, the smallest decreasing set d(F) containing F is also closed. Let $\epsilon := \sup_{D \in d\mathfrak C} \{ \nu(D) - \mu(D) \}$, so that
$\nu(D) \leq \mu(D) + \epsilon$ for all $D \in d\mathfrak C$.
Noting that d(F) can be expressed as $\pi_1(\mathbb G \cap (S \times F))$, it follows that, for any $F \in \mathfrak C$,
$\nu(F) \leq \nu(d(F)) \leq \mu(d(F)) + \epsilon = \mu(\pi_1(\mathbb G \cap (S \times F))) + \epsilon.$
Since ‘⪯’ is closed, $\mathbb G$ is closed, and Strassen’s theorem applies. From this theorem we obtain a probability measure ξ on the product space S × S such that $\xi(\mathbb G) \geq 1 - \epsilon$ and ξ has marginals μ and ν.
Because complements of increasing sets are decreasing and vice versa, we have
$\epsilon = \sup_{D \in d\mathfrak C} \{\nu(D) - \mu(D)\} = \sup_{G \in i\cal O} \{\mu(G) - \nu(G)\} \leq \sup_{I \in i\cal B} \{\mu(I) - \nu(I)\}.$ (A.6)
Now consider the probability space $(\Omega, \cal F, \mathbb P) = (S \times S, \cal B \otimes \cal B, \xi)$, and let X = π_1 and Y = π_2. We then have $\xi(\mathbb G) = \xi\{(x,y) \in S \times S \colon x \preceq y\} = \mathbb P\{X \preceq Y\}$. Combining this equality with (A.6) implies (A.5).
A.2. Proofs of Section 4 results
We begin with an elementary lemma.
Lemma A.1. For any μ, $\nu \in \cal M$, we have μ ≤ ν whenever μ(h) ≤ ν(h) for all h ∈ cbS_0.
Proof. Suppose that μ(h) ≤ ν(h) for all h ∈ cbS_0. We claim that
μ(F) ≤ ν(F) for all $F \in \mathfrak C$. (A.7)
To see this, let ρ be a metric compatible with the topology of S and let F be any closed subset of S. Let f_∊(x) := max{1 − ρ(x, F)/∊, 0} for ∊ > 0, x ∈ S, where ρ(x, F) := inf_{y∈F} ρ(x, y). Since ρ(·, F) is continuous and 0 ≤ f_∊ ≤ 1, we have f_∊ ∈ cbS_0. Let F_∊ := {x ∈ S : ρ(x, F) < ∊} for ∊ > 0. Note that f_∊(x) = 1 for all x ∈ F, and that f_∊(x) = 0 for all x ∉ F_∊. Thus,
μ(F) ≤ μ(f_∊) ≤ ν(f_∊) ≤ ν(F_∊). (A.8)
Since F = ∩_{∊>0} F_∊, we have lim_{∊↓0} ν(F_∊) = ν(F), so letting ∊ ↓ 0 in (A.8) yields μ(F) ≤ ν(F). Hence, (A.7) holds.
Let $B \in \cal B$ and fix ∊ > 0. Since all probability measures on a Polish space are regular, there exists a closed set F ⊂ B such that μ(B) ≤ μ(F) + ∊. Thus, by (A.7), we have μ(B) ≤ μ(F) + ∊ ≤ ν(F) + ∊ ≤ ν(B) + ∊. Since ∊ > 0 is arbitrary, this yields μ(B) ≤ ν(B). Hence, μ ≤ ν.
Proof of Theorem 4.1. Let {μ_n} be a Cauchy sequence in ($\cal P, \gamma$). Our first claim is that {μ_n} is tight. To show this, fix ∊ > 0. Let μ := μ_N be such that
γ(μ_n, μ) < ∊ for all n ≥ N. (A.9)
Let K be a compact subset of S such that μ(K) ≥ 1 − ∊, and let K̄ := i(K) ∩ d(K). We have
μ(i(K)) ≥ μ(K) ≥ 1 − ∊ and μ(d(K)) ≥ μ(K) ≥ 1 − ∊.
For n ≥ N, this bound, (4.3), (A.9), and the definition of K yield
μ_n(K̄) ≥ 1 − 4∊.
Hence, {μ_n}_{n≥N} is tight, because K̄ is compact by Assumption 4.1. It follows that {μ_n}_{n≥1} is likewise tight. As a result, by Prokhorov’s theorem, it has a subsequence that converges weakly to some $\mu^{*} \in \cal P$. We aim to show that γ(μ_n, μ*) → 0.
To this end, fix ϵ > 0 and let n_ϵ be such that γ(μ_m, μ_{n_ϵ}) < ϵ whenever m ≥ n_ϵ. Fix m ≥ n_ϵ and let ν := μ_m. For all n ≥ n_ϵ, we have γ(ν, μ_n) < ϵ. Fixing any such n ≥ n_ϵ, we observe that, since g(μ_n, ν) < ϵ, there exists $(\tilde{\mu}_{n}, \tilde{\nu}_{n}) \in \Phi(\mu_{n}, \nu)$ with $\|\tilde{\mu}_{n}\| = \|\tilde{\nu}_{n}\| > 1-\epsilon$. Multiplying $\tilde{\mu}_{n}$ and $\tilde{\nu}_{n}$ by $ (1-\epsilon)/\|\tilde{\mu}_n\| < 1 $, and denoting the resulting measures by $\tilde{\mu}_{n}$ and $\tilde{\nu}_{n}$ again, we have
$(\tilde{\mu}_{n}, \tilde{\nu}_{n}) \in \Phi(\mu_{n}, \nu)$ with $\|\tilde{\mu}_{n}\| = \|\tilde{\nu}_{n}\| = 1 - \epsilon$. (A.10)
Note that $\{\tilde{\nu}_{n}\}$ is tight. Since {μ_n} is tight, so is $\{\tilde{\mu}_{n}\}$. Thus, there exist subsequences $\{\mu_{n_{i}}\}_{i \in \mathbb N}$, $\{\tilde{\mu}_{n_{i}}\}_{i \in \mathbb N}$, and $\{\tilde{\nu}_{n_{i}}\}_{i \in \mathbb N}$ of {μ_n}, $\{\tilde{\mu}_{n}\}$, and $\{\tilde{\nu}_{n}\}$, respectively, such that, for some $\tilde{\mu}^{*}, \tilde{\nu}^{*} \in \cal M$ with $\|\tilde{\mu}^{*}\| = \|\tilde{\nu}^{*}\| = 1-\epsilon$, we have
$\mu_{n_i} \buildrel {\rm w} \over \longrightarrow \mu^*, \quad \tilde{\mu}_{n_i} \buildrel {\rm w} \over \longrightarrow \tilde{\mu}^*, \quad \tilde{\nu}_{n_i} \buildrel {\rm w} \over \longrightarrow \tilde{\nu}^*.$
Given h ∈ cbS_0, since $\tilde{\mu}_{n_{i}} (h) \leq \mu_{n_{i}} (h)$ and $\tilde{\nu}_{n_{i}} (h) \leq \nu (h)$ for all $i \in \mathbb N$ by (A.10), we have $\tilde{\mu}^{*} (h) \leq \mu^{*} (h)$ and $\tilde{\nu}^{*} (h) \leq \nu (h)$ by weak convergence. Thus, $\tilde{\mu}^{*} \leq \mu^{*}$ and $\tilde{\nu}^{*} \leq \nu$ by Lemma A.1. We have $\tilde{\mu}^{*} \preceq_{sd} \tilde{\nu}^{*}$ by [19, Proposition 3]. It follows that $ (\tilde{\mu}^{*}, \tilde{\nu}^{*}) \in \Phi(\mu^{*}, \nu) $. We have $g(\mu^{*}, \nu) \leq 1 - \|\tilde{\mu}^{*}\| = \epsilon$.
By a symmetric argument, we also have g(ν, μ*) ≤ ϵ. Hence, γ(ν, μ*) ≤ 2ϵ. Recalling the definition of ν, we have now shown that γ(μ_m, μ*) ≤ 2ϵ for all m ≥ n_ϵ. Since ϵ was arbitrary, this concludes the proof.
A.3. Proofs of Section 5 results
We begin with some lemmas.
Lemma A.2. If P is monotone then $\sigma(P) = \inf_{(\mu, \nu) \in \cal P \times \cal P} \alpha_O(\mu P, \nu P)$.
Proof. Let P be a monotone Markov kernel. It suffices to show that the inequality $\sigma(P) \leq \inf_{(\mu, \nu) \in \cal P \times \cal P} \alpha_O(\mu P, \nu P)$ holds, since the reverse inequality is obvious. By the definition of σ(P) and the identities in (3.6), the claim will be established if we can show that
$\sup_{x, y} \sup_{I \in i\cal B} \{P(x, I) - P(y, I)\} \geq \sup_{\mu, \nu} \sup_{I \in i\cal B} \{(\mu P)(I) - (\nu P)(I)\},$ (A.11)
where x and y are chosen from S and μ and ν are chosen from $\cal P$. Let s be the value of the right-hand side of (A.11) and let ϵ > 0. Fix μ, $\nu \in \cal P$ and $I \in i\cal B$ such that (μP)(I) − (νP)(I) > s − ϵ, or, equivalently,
$\int P(x, I)\, \mu(dx) - \int P(y, I)\, \nu(dy) > s - \epsilon.$
From this expression we see that there are x̄, ȳ ∈ S such that P(x̄, I) − P(ȳ, I) > s − ϵ. Since ϵ > 0 was arbitrary, it follows that $\sup_{x,y} \sup_{I \in i\cal B} \{ P(x, I) - P(y, I) \} \geq s$, as was to be shown.
Lemma A.3. Let P be a monotone Markov kernel. If μ, $\nu \in \cal M$ and $(\tilde{\mu}, \tilde{\nu})$ is an ordered component pair of (μ, ν), then
$\alpha_O(\mu P, \nu P) \geq \tilde\mu(S) + \alpha_O((\mu - \tilde\mu)P, (\nu - \tilde\nu)P).$
Proof. Fix μ, ν in $\cal M$ and $(\tilde{\mu}, \tilde{\nu}) \in \Phi(\mu, \nu)$. Consider the residual measures $\hat{\mu} := \mu - \tilde{\mu}$ and $\hat{\nu} := \nu - \tilde{\nu}$. Let (μ′, ν′) be a maximal ordered component pair of ($\hat{\mu} P, \hat{\nu} P$), and define
$\mu^* := \tilde\mu P + \mu'$ and $\nu^* := \tilde\nu P + \nu'.$
We claim that (μ*, ν*) is an ordered component pair for (μP, νP). To see this, note that
$\mu^* = \tilde\mu P + \mu' \leq \tilde\mu P + \hat\mu P = \mu P,$
and, similarly, ν* ≤ νP. The measures μ* and ν* also have the same mass, since
$\mu^*(S) = \tilde\mu(S) + \mu'(S) = \tilde\nu(S) + \nu'(S) = \nu^*(S).$
Moreover, since $\tilde{\mu} \preceq_{sd} \tilde{\nu}$ and P is monotone, we have $\tilde{\mu} P \preceq_{sd} \tilde{\nu} P$. Hence, μ* ⪯sd ν*, completing the claim that (μ*, ν*) is an ordered component pair for (μP, νP). As a result,
$\alpha_O(\mu P, \nu P) \geq \mu^*(S) = \tilde\mu(S) + \alpha_O(\hat\mu P, \hat\nu P).$
Proof of Theorem 5.1. Let μ, $\nu \in \cal P$. Let ($\tilde{\mu}, \tilde{\nu}$) be a maximal ordered component pair for (μ, ν). Let $\hat{\mu} = \mu - \tilde{\mu}$ and $\hat{\nu} = \nu - \tilde{\nu}$ be the residuals. Since $\|\tilde{\mu}\| = \|\tilde{\nu}\|$, we have $\|\hat{\mu}\| = \|\hat{\nu}\|$. Suppose first that $\|\hat{\mu}\| > 0$. Then $\hat{\mu} P/\|\hat{\mu}\|$ and $\hat{\nu} P/\|\hat{\mu}\|$ are both in $\cal P$. Thus, by Lemma A.2,
$\alpha_O(\hat\mu P/\|\hat\mu\|,\ \hat\nu P/\|\hat\mu\|) \geq \sigma(P).$
Applying the positive homogeneity property in Lemma 3.3 yields
$\alpha_O(\hat\mu P, \hat\nu P) \geq \sigma(P)\, \|\hat\mu\|.$
Note that this inequality trivially holds if $\|\hat{\mu}\| = 0$. From the definition of g, we can write the same inequality as $g(\hat{\mu} P, \hat{\nu} P) \leq (1 - \sigma(P)) g(\mu, \nu)$. If we apply Lemma A.3 to the latter we obtain
$g(\mu P, \nu P) \leq (1 - \sigma(P))\, g(\mu, \nu).$
Reversing the roles of μ and ν, we also have g(νP, μP) ≤ (1 − σ(P)) g(ν, μ). Thus,
$\gamma(\mu P, \nu P) = g(\mu P, \nu P) + g(\nu P, \mu P) \leq (1 - \sigma(P))\, \gamma(\mu, \nu),$
verifying the claim in (5.8).
Proof of Lemma 5.4. To see that (5.9) holds, fix ξ > σ(P) and suppose first that σ(P) = 1. Then (5.9) holds because the right-hand side of (5.9) can be made strictly negative by choosing μ, $\nu \in \cal P$ to be distinct. Now suppose that σ(P) < 1. It suffices to show that, for each ϵ > 0,
there exist x, y ∈ S with y ⪯ x and $x \npreceq y$ such that $\alpha_O(P_x, P_y) < \sigma(P) + \epsilon$. (A.12)
Indeed, if we take (A.12) as valid, set ϵ := ξ − σ(P), and choose x and y to satisfy the conditions in (A.12), then μ := δ_x and ν := δ_y give γ(μ, ν) = 1 and
$\gamma(\mu P, \nu P) \geq 1 - \alpha_O(P_x, P_y) > 1 - \sigma(P) - \epsilon = 1 - \xi = (1 - \xi)\, \gamma(\mu, \nu).$
Therefore, (5.9) holds.
To show that (A.12) holds, fix ϵ > 0. We can use σ(P) < 1 and the definition of σ(P) as an infimum to choose a δ ∈ (0, ϵ) and points x̄, ȳ ∈ S such that α_O(P_x̄, P_ȳ) < σ(P) + δ < 1. Note that x̄ ⪯ ȳ cannot hold here, because then we would have α_O(P_x̄, P_ȳ) = 1, a contradiction. Hence $\bar x \npreceq \bar y$. Let y be a lower bound of x̄ and ȳ, and let x := x̄. We claim that (A.12) holds for the pair (x, y).
To see this, observe that, by the monotonicity result in Lemma 5.1 and y ⪯ ȳ, we have
$\alpha_O(P_x, P_y) \leq \alpha_O(P_{\bar x}, P_{\bar y}) < \sigma(P) + \delta < \sigma(P) + \epsilon.$
Moreover, y ⪯ x because x = x̄ and y is by definition a lower bound of x̄. Finally, $x \npreceq y$ because if not then x̄ = x ⪯ y ⪯ ȳ, contradicting our assumption that $\bar x \npreceq \bar y$.