
A unified stability theory for classical and monotone Markov chains

Published online by Cambridge University Press:  12 July 2019

Takashi Kamihigashi*
Affiliation:
Kobe University
John Stachurski*
Affiliation:
Australian National University
*Postal address: Research Institute of Economics and Business, Kobe University, Japan. Email address: tkamihig@rieb.kobe-u.ac.jp
**Postal address: Research School of Economics, Australian National University, Australia. Email address: john.stachurski@anu.edu.au

Abstract

In this paper we integrate two strands of the literature on stability of general state Markov chains: conventional, total-variation-based results and more recent order-theoretic results. First we introduce a complete metric over Borel probability measures based on ‘partial’ stochastic dominance. We then show that many conventional results framed in the setting of total variation distance have natural generalizations to the partially ordered setting when this metric is adopted.

Type
Research Papers
Copyright
© Applied Probability Trust 2019 

1. Introduction

Following the work of Wolfgang Doeblin ([Reference Doeblin6], [Reference Doeblin7], [Reference Doeblin8]), many classical results from Markov chain theory have built on fundamental connections between total variation distance, Markov chains and couplings. For some models, however, total variation convergence is too strong. In response, researchers have developed an alternative methodology based on monotonicity ([Reference Bhattacharya and Lee1], [Reference Dubins and Freedman10], [Reference Yahav36]). In this line of research, transition probabilities are assumed to have a form of monotonicity not required in the classical theory. At the same time, mixing conditions are generally weaker, as is the notion of convergence to the stationary distribution. Further contributions to this approach can be found in [Reference Bhattacharya, Majumdar and Hashimzade2], [Reference Hopenhayn and Prescott16], [Reference Kamihigashi and Stachurski20], and [Reference Razin and Yahav30]. For some recent extensions and applications in economics, see [Reference Kamihigashi and Stachurski21].

To give one example, consider a Markov chain $\{X_t\}$ defined by

(1.1)\begin{align}\label{eq:bern} X_{t+1} = \frac{X_t + W_{t+1}}{2},\end{align}

where $\{W_t\}_{t \geq 1}$ is an independent and identically distributed (i.i.d.) Bernoulli($\tfrac12$) random sequence, taking values 0 and 1 with equal probability. For the state space take S = [0, 1]. Let $P^t(x, \cdot)$ be the distribution of $X_t$ given $X_0 = x \in S$. Clearly, if $X_t$ is a rational number in S then so is $X_{t+1}$. Similarly, if $X_t$ is irrational then so is $X_{t+1}$. Thus, if x and y are rational and irrational, respectively, the distributions $P^t(x, \cdot)$ and $P^t(y, \cdot)$ are concentrated on disjoint sets, and, hence, when ‖ · ‖ is the total variation norm,


\begin{align*}\| P^t(x, \cdot) - P^t(y, \cdot) \| = 2 \quad {{\rm for}\; {\rm all}\; t \in {\mathbb N}.} \end{align*}

Total variation convergence fails for this class of models.

At the same time, the right-hand side of (1.1) is increasing in the current state for each fixed value of the shock $W_{t+1}$. Moreover, trajectories mix in a monotone sense: a trajectory starting at $X_0 = 0$ can approach 1 given a suitable string of shocks, and a trajectory starting at 1 can approach 0. Using these facts, one can show, via the results in [Reference Bhattacharya and Lee1], say, that a unique stationary distribution exists and that the distribution of $X_t$ converges to it with respect to a complete metric on the Borel probability measures that is weaker than the total variation distance.
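Both features are easy to see numerically. The following minimal sketch (our illustration, not part of the original analysis; `simulate` and `kolmogorov_distance` are hypothetical helper names) draws many paths of (1.1) from a rational and an irrational start and reports the uniform distance between the two empirical CDFs, which shrinks toward the sampling noise floor even though the total variation distance between the exact distributions remains 2 at every t.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, t, n_paths):
    """Draw n_paths independent realizations of X_t from (1.1),
    all started at x0, using i.i.d. Bernoulli(1/2) shocks."""
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(t):
        w = rng.integers(0, 2, size=n_paths)
        x = (x + w) / 2.0
    return x

def kolmogorov_distance(a, b):
    """Uniform (Kolmogorov) distance between the empirical CDFs of a and b."""
    grid = np.sort(np.concatenate([a, b]))
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(fa - fb).max()

# Rational vs. irrational starting points: the supports of P^t(x, .) and
# P^t(y, .) stay disjoint, so total variation distance stays at 2, yet the
# distributions merge in the weaker uniform-distance sense.
for t in (1, 5, 10, 20):
    xs = simulate(0.0, t, 20_000)
    ys = simulate(1 / np.sqrt(2), t, 20_000)
    print(t, round(kolmogorov_distance(xs, ys), 4))
```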

This is one example where monotone methods can be used to establish some form of stability, despite the fact that the classical conditions based around total variation convergence fail. Conversely, there are many models that the monotone methods developed in [Reference Bhattacharya and Lee1] and related papers cannot accommodate, while the classical theory based around total variation convergence handles them easily. One example is the simple ‘inventory’ model

(1.2)\begin{align} \label{eq:invent} X_{t+1} = \begin{cases} (X_t - W_{t+1})_+ & \text{if } X_t > 0,\\ (K - W_{t+1})_+ & \text{if } X_t = 0, \end{cases} \end{align}

where $x_+ := \max\{x, 0\}$. Again, $\{W_t\}$ is i.i.d.; assume that $\ln W_t$ is standard normal. We take the state space to be S = [0, K]. Figure 1 shows a typical trajectory when K = 100 and $X_0 = 50$.

[Figure 1: A time series from the inventory model.]

On the one hand, the monotone methods in [Reference Bhattacharya and Lee1] cannot be applied here because of a failure of monotonicity with respect to the standard ordering of ℝ. On the other hand, the classical theory based around total variation convergence is straightforward to apply. For example, one can use Doeblin’s condition (see, e.g. [Reference Meyn and Tweedie25, Theorem 16.2.3]) to show the existence of a unique stationary distribution to which the distribution of Xt converges in total variation, regardless of the distribution of X 0. In the terminology of [Reference Meyn and Tweedie25], the process is uniformly ergodic.
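For readers who wish to reproduce Figure 1, the following sketch (ours; `inventory_path` is a hypothetical helper) simulates (1.2) under the stated log-normal demand assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100.0  # restocking threshold, as in Figure 1

def inventory_path(x0, t):
    """Simulate (1.2): stock is depleted by a log-normal demand W_t;
    once the stock hits zero it is restocked to K before depletion."""
    x, path = x0, [x0]
    for _ in range(t):
        w = np.exp(rng.standard_normal())  # ln W_t is standard normal
        x = max(x - w, 0.0) if x > 0 else max(K - w, 0.0)
        path.append(x)
    return np.array(path)

path = inventory_path(50.0, 500)  # X_0 = 50, as in Figure 1
print(path.min(), path.max())
```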

The purpose of this paper is to show that both of these stability results (i.e. the two sets of results concerning the two models (1.1) and (1.2)), which were based on two hitherto separate approaches, can be derived from the same theoretical framework. More generally, we construct stability results that encompass all uniformly ergodic models in the sense of [Reference Meyn and Tweedie25] and all monotone models shown to be stable in [Reference Bhattacharya and Lee1], and that extend to other monotone or partially monotone models on state spaces other than ℝ^n.

We begin our analysis by introducing what is shown to be a complete metric γ on the set of Borel probability measures on a partially ordered Polish space that includes total variation distance, the Kolmogorov uniform distance, and the Bhattacharya distance ([Reference Bhattacharya and Lee1], [Reference Chakraborty and Rao3]) as special cases. We show that many fundamental concepts from conventional Markov chain theory using total variation distance and coupling have direct generalizations to the partially ordered setting when this new metric is adopted. Then, by varying the choice of partial order, we recover key aspects of both classical total-variation-based stability theory and monotone methods as special cases.

Before commencing, we note an additional line of research that deals with Markov models for which the classical conditions of irreducibility and total variation convergence fail. Irreducibility is replaced by the assumption that the law of motion for the state is itself contracting 'on average', and this contractivity is then passed on to an underlying metric over distributions that conforms in some way to the topology of the state space. See, for example, [Reference Diaconis and Freedman4], [Reference Jarner and Tweedie17], or [Reference Wu and Shao35]. Such results can be used to show stability of our first example, which contracts on average with respect to the usual metric on ℝ. They cannot, however, be applied directly to our second (i.e. inventory) example using the same metric, since the law of motion contains a jump. In [Reference Bhattacharya and Lee1] and [Reference Hopenhayn and Prescott16] one can find other applications where monotone methods (including the results developed here) can be used, while the 'contraction on average' conditions of [Reference Diaconis and Freedman4] and [Reference Wu and Shao35] do not in general hold. See, for example, the growth model studied in Section 6.B of [Reference Hopenhayn and Prescott16], where a high marginal product of capital near zero generates rapid growth when the capital stock is low. One consequence is that the law of motion does not typically exhibit contraction on average in a neighbourhood of zero.

After preliminaries, we begin with a discussion of ‘ordered’ affinity, which generalizes the usual notion of affinity for measures. The concept of ordered affinity is then used to define the total ordered variation metric. Throughout the paper, longer proofs are deferred to Appendix A. The conclusion contains suggestions for future work.

2. Preliminaries

Let S be a Polish (i.e. separable and completely metrizable) space, let $\cal O$ be the open sets, let $\mathfrak C$ be the closed sets, and let $\cal B$ be the Borel sets. Let ${\cal M}_s$ denote the set of all finite signed measures on $(S, \cal B)$; in other words, ${\cal M}_s$ is the set of all countably additive set functions from $\cal B$ to ℝ. Let $\cal M$ and $\cal P$ be the finite measures and probability measures in ${\cal M}_s$, respectively. If κ and λ are in ${\cal M}_s$, then κ ≤ λ means that κ(B) ≤ λ(B) for all $B \in \cal B$. The symbol $\delta_x$ denotes the probability measure concentrated on x ∈ S.

Let bS be the set of all bounded $\cal B$-measurable functions from S into ℝ. If h ∈ bS and $\lambda \in {\cal M}_s$, then $\lambda(h) := \int h \, {\rm d}\lambda$. For f and g in bS, the statement f ≤ g means that f(x) ≤ g(x) for all x ∈ S. Let

\begin{align*}H \,: = \{ {h \in bS }:{ -1 \leq h \leq 1} \} \quad \text{and} \quad H_0 \,: = \{ { h \in bS }:{ 0 \leq h \leq 1}\}.\end{align*}

The total variation norm of $\lambda \in {\cal M}_s$ is $\|\lambda\| := \sup_{h \in H} |\lambda(h)|$. Given μ and ν in $\cal P$, a random element (X, Y) taking values in S × S and defined on a common probability space (Ω, $\cal F$, ℙ) is called a coupling of (μ, ν) if $\mu = \mathbb P \circ X^{-1}$ and $\nu = \mathbb P \circ Y^{-1}$ (i.e. if the distribution of (X, Y) has marginals μ and ν, respectively; see, e.g. [Reference Lindvall22] or [Reference Thorisson34]). The set of all couplings of (μ, ν) is denoted below by $\cal C(\mu, \nu)$. A sequence $\{\mu_n\} \subset \cal P$ converges to $\mu \in \cal P$ weakly if $\mu_n(h) \to \mu(h)$ as n → ∞ for all continuous h ∈ bS. In this case we write $\mu_n {\buildrel {\rm w} \over \longrightarrow} \mu$.

Given μ and ν in $\cal M$, their measure-theoretic infimum $\mu \wedge \nu$ is the largest element of $\cal M$ dominated by both μ and ν. It can be defined by taking f and g to be densities of μ and ν, respectively, under the dominating measure λ := μ + ν and defining $\mu \wedge \nu$ by $(\mu \wedge \nu)(B) := \int_B \min\{f(x), g(x)\}\, \lambda({\rm d}x)$ for all $B \in \cal B$. The total variation distance between μ and ν is related to $\mu \wedge \nu$ via $\|\mu - \nu\| = \|\mu\| + \|\nu\| - 2\|\mu \wedge \nu\|$. See, for example, [Reference Pollard29]. For probability measures, we also have

(2.1)\begin{align} \label{eq:sisf} \sup_{B \in \cal B} \{ \mu(B) - \nu(B) \} = \sup_{B \in \cal B} |\mu(B) - \nu(B) | = \tfrac12\| \mu - \nu \| . \end{align}

The affinity between two measures μ, ν in $\cal M$ is the value $\alpha(\mu, \nu) := (\mu \wedge \nu)(S)$. The following properties are elementary.

Lemma 2.1. For all $ (\mu, \nu) \in \cal M \times {\cal M}$, we have

  1. (a) 0 ≤ α(μ, ν) ≤ min{μ(S), ν(S)},

  2. (b) α(μ, ν) = μ(S) = ν(S) if and only if μ = ν,

  3. (c) $\alpha(c\mu, c\nu) = c\alpha(\mu, \nu)$ for all c ≥ 0.

There are several other common representations of affinity. For example, when μ and ν are both probability measures, we have

(2.2)\begin{align} \label{eq:affc}\alpha(\mu, \nu)= 1 - \sup_{B \in \cal B} | \mu(B) - \nu(B) |= \max_{(X, Y) \in \cal C(\mu, \nu)} \mathbb P\{ X = Y \}.\end{align}

(See, e.g. [Reference Lindvall22] and [Reference Pollard29].) The second equality in (2.2) states that, if $(X,Y) \in \cal C(\mu, \nu)$, then ℙ{X = Y} ≤ α(μ, ν), and, moreover, there exists an $(X, Y) \in \cal C(\mu, \nu)$ for which equality is attained. Any such coupling is called a maximal or gamma coupling. See Theorem 5.2 of [Reference Lindvall22]. From (2.1) and (2.2) we obtain

(2.3)\begin{align} \label{eq:tvmtap} \| \mu - \nu \| = 2(1 - \alpha(\mu, \nu)). \end{align}
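For measures represented by weights on a common finite support, all of these objects are directly computable. The following sketch (our illustration, not from the paper) checks the identity (2.3) on a small example.

```python
import numpy as np

def affinity(p, q):
    """alpha(mu, nu) = (mu ^ nu)(S) for measures given as weight
    vectors over a common finite support."""
    return np.minimum(p, q).sum()

def total_variation(p, q):
    """Total variation norm of mu - nu in the same representation."""
    return np.abs(p - q).sum()

mu = np.array([0.5, 0.5, 0.0])  # masses at three support points
nu = np.array([0.0, 0.5, 0.5])
print(affinity(mu, nu))            # 0.5
print(total_variation(mu, nu))     # 1.0
print(2 * (1 - affinity(mu, nu)))  # 1.0, agreeing with (2.3)
```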

3. Ordered affinity

We next introduce a generalization of affinity when S has a partial order. We investigate its properties in detail, since both our metric and the stability theory presented below rely on this concept.

3.1. Preliminaries

As before, let S be a Polish space. A closed partial order ‘⪯’ on S is a partial order ‘⪯’ such that its graph

\begin{align*}\mathbb G \,:= \{(x, y) \in S \times S : x \preceq y\}\end{align*}

is closed in the product topology. Henceforth, a partially ordered Polish space is any such pair (S, ⪯), where S is nonempty and Polish, and ‘⪯’ is a closed partial order on S. When no confusion arises, we denote it simply by S. The partial order is assumed to be closed in the theory developed below because we build a metric over Borel probability measures that depends on this partial order and closedness of the order is required to show that the metric is complete.

For such a space S, we call I ⊂ S increasing if x ∈ I and x ⪯ y imply that y ∈ I. We call h: S → ℝ increasing if x ⪯ y implies that h(x) ≤ h(y). We let $i\cal B,\,i\cal O$, and $i\mathfrak C$ denote the increasing Borel, open, and closed sets, respectively, while ibS is the set of increasing functions in bS. In addition,

  • iH := H ∩ ibS = {h ∈ ibS : −1 ≤ h ≤ 1}, and

  • $iH_0$ := $H_0$ ∩ ibS = {h ∈ ibS : 0 ≤ h ≤ 1}.

If $B \in \cal B$ then i(B) is the set of all y ∈ S such that x ⪯ y for some x ∈ B, while d(B) is the set of all y ∈ S such that y ⪯ x for some x ∈ B. Given μ and ν in $\cal M$, we say that μ is stochastically dominated by ν, and write $\mu \preceq_{sd} \nu$, if μ(S) = ν(S) and μ(I) ≤ ν(I) for all $I \in i\cal B$. Equivalently, μ(S) = ν(S) and μ(h) ≤ ν(h) for all h in iH or $iH_0$. See [Reference Kamae and Krengel18].

One important partial order on S is the identity order, under which x ⪯ y if and only if x = y. In this case $i\cal B = \cal B$, ibS = bS, iH = H, $iH_0 = H_0$, and $\mu \preceq_{sd} \nu$ if and only if μ = ν.

Remark 3.1. Since S is a partially ordered Polish space, for any μ, ν in $\cal P$, we have μ = ν whenever μ(C) = ν(C) for all $C \in i\mathfrak C$, or, equivalently, μ(h) = ν(h) for all continuous h ∈ ibS. See [Reference Kamae and Krengel18, Lemma 1]. Hence, $\mu \preceq_{sd} \nu$ and $\nu \preceq_{sd} \mu$ imply that μ = ν.

Lemma 3.1. If $\lambda \in \cal M_s$ then $\sup_{I \in i\cal B} \lambda(I) = \sup_{h \in iH_0} \lambda(h)$ and

(3.1)\begin{align}\label{eq:cfs2} \sup_{h \in iH} |\lambda(h)| = \max \Big\{\sup_{h \in iH} \lambda(h),\; \sup_{h \in iH} (-\lambda)(h) \Big\}. \end{align}

The proof is given in Appendix A.1. We can easily check that

(3.2)\begin{align}\label{eq:chho}\lambda \in \cal M_s \quad\text{and}\quad \lambda(S) = 0 \quad \Rightarrow \quad \sup_{h \in iH} \lambda(h) = 2 \sup_{h \in iH_0} \lambda(h). \end{align}

3.2. Definition of ordered affinity

For each pair $ (\mu, \nu) \in {\cal M} \times \cal M$, let

\begin{align*} {\Phi(\mu, \nu) \,:= \{ {(\mu', \nu') \in \cal M \times \cal M }: {\mu' \leq \mu,\, \nu' \leq \nu,\, \mu' {\preceq_{sd}} \nu'}\}}.\end{align*}

We call Φ(μ, ν) the set of ordered component pairs for (μ, ν). Here 'ordered' means ordered by stochastic dominance. The set of ordered component pairs is always nonempty, since $(\mu \wedge \nu, \mu \wedge \nu)$ is an element of Φ(μ, ν).

Example 3.1. If μ and ν are two measures satisfying $\mu \preceq_{sd} \nu$, then (μ, ν) ∈ Φ(μ, ν).

Example 3.2. Given the distributions $\mu = (\delta_1 + \delta_2)/2$ and $\nu = (\delta_0 + \delta_1)/2$, we have (μ′, ν′) ∈ Φ(μ, ν) when $\mu' = \nu' = \delta_1/2$.

We call an ordered component pair (μ′, ν′) ∈ Φ (μ, ν) a maximal ordered component pair if it has greater mass than all others; that is, if

\begin{align*} \mu''(S) \leq \mu'(S) \quad \text{for all } (\mu'', \nu'') \in \Phi(\mu, \nu). \end{align*}

(We can restate this by replacing μ′(S) and μ″(S) with ν′(S) and ν″(S), respectively, since the two components of an ordered pair have equal mass by the definition of stochastic dominance.) We let $\Phi^*(\mu, \nu)$ denote the set of maximal ordered component pairs for (μ, ν). Thus, if

(3.3)\begin{align}\label{eq:dsig} \alpha_O(\mu, \nu) \,:\!= \sup \{ \mu'(S) : (\mu', \nu') \in \Phi(\mu, \nu) \},\end{align}

then

\begin{align*}\Phi^*(\mu, \nu) = \{(\mu', \nu') \in \Phi(\mu, \nu) : \mu'(S) = \alpha_O(\mu, \nu)\}.\end{align*}

Using the Polish space assumption, one can show that maximal ordered component pairs always exist.

Proposition 3.1. The set $\Phi^*(\mu, \nu)$ is nonempty for all $(\mu, \nu) \in \cal M \times \cal M$.

Proof. Fix ${(\mu, \nu) \in \cal M \times \cal M}$, and let s := αO(μ, ν). From the definition, we can take sequences $\{\mu'_n\}$ and $\{\nu'_n\}$ in $\cal M$ such that $ (\mu'_n, \nu'_n) \in \Phi(\mu, \nu) $ for all n ∈ ℕ and $\mu'_n(S) \uparrow s$. Since $\mu'_n \leq \mu$ and $\nu'_n \leq \nu$ for all n ∈ ℕ, Prokhorov’s theorem [Reference Dudley11, Theorem 11.5.4] implies that these sequences have convergent subsequences with $\mu'_{n_k} \buildrel {\rm w} \over \longrightarrow \mu'$ and $\nu'_{n_k} \buildrel {\rm w} \over \longrightarrow \nu'$ for some $\mu',\, \nu' \in \cal M$. We claim that (μ′, ν′) is a maximal ordered component pair.

Since $\mu'_{n_k} \buildrel {\rm w} \over \longrightarrow \mu'$ and $\mu'_n \leq \mu$ for all n ∈ ℕ, Theorem 1.5.5 of [Reference Hernández-Lerma and Lasserre15] implies that, for any Borel set B, we have $\mu'_{n_k}(B) \to \mu'(B)$ in ℝ. Hence, μ′(B) ≤ μ(B) and, in particular, μ′ ≤ μ. An analogous argument gives ν′ ≤ ν. Moreover, the definition of Φ(μ, ν) and stochastic dominance imply that $\mu'_n(S) = \nu'_n(S)$ for all n ∈ ℕ, and, therefore, μ′(S) = ν′(S). Also, for any $I \in i\cal B$, the fact that $\mu'_n(I) \leq \nu'_n(I)$ for all n ∈ ℕ gives μ′(I) ≤ ν′(I). Thus, $\mu' \preceq_{sd} \nu'$. Finally, μ′(S) = s, since $\mu'_n(S) \uparrow s$. Hence, (μ′, ν′) lies in $\Phi^*(\mu, \nu)$.

The value $\alpha_O(\mu, \nu)$ defined in (3.3) gives the mass of the maximal ordered component pair. We call it the ordered affinity from μ to ν. On an intuitive level, we can think of $\alpha_O(\mu, \nu)$ as the 'degree' to which μ is dominated by ν in the sense of stochastic dominance. Since $(\mu \wedge \nu, \mu \wedge \nu)$ is an ordered component pair for (μ, ν), we have

(3.4)\begin{align}\label{eq:aldbal} 0 \leq \alpha(\mu, \nu) \leq \alpha_O(\mu, \nu),\end{align}

where α(μ, ν) is the standard affinity defined in Section 2. In fact, $\alpha_O(\mu, \nu)$ generalizes the standard notion of affinity by extending it to arbitrary partial orders, as shown in the next lemma.

Lemma 3.2. If '⪯' is the identity order then $\alpha_O = \alpha$ on $\cal M \times \cal M$.

Proof. Fix $(\mu, \nu) \in \cal M \times \cal M$, and let '⪯' be the identity order (x ⪯ y if and only if x = y). Then '$\preceq_{sd}$' also corresponds to equality, from which it follows that the supremum in (3.3) is attained by the pair $(\mu \wedge \nu, \mu \wedge \nu)$. Hence, $\alpha_O(\mu, \nu) = \alpha(\mu, \nu)$.

3.3. Properties of ordered affinity

Let us list some elementary properties of αO. The following list should be compared with Lemma 2.1. It shows that analogous results hold for αO as hold for α. (Lemma 2.1 is in fact a special case of Lemma 3.3 with the partial order set to the identity order.)

Lemma 3.3. For all ${ (\mu, \nu) \in \cal M \times \cal M}$, we have

  1. (a) 0 ≤ αO(μ, ν) ≤ min{μ(S), ν(S)},

  2. (b) $\alpha_O(\mu, \nu) = \mu(S) = \nu(S)$ if and only if $\mu \preceq_{sd} \nu$, and

  3. (c) $c\alpha_O(\mu, \nu) = \alpha_O(c\mu, c\nu)$ whenever c ≥ 0.

Proof. Fix $(\mu, \nu) \in \cal M \times \cal M$. Claim (a) follows directly from the definitions. Regarding claim (b), suppose first that $\mu \preceq_{sd} \nu$. Then (μ, ν) ∈ Φ(μ, ν) and, hence, $\alpha_O(\mu, \nu) = \mu(S)$. Conversely, if $\alpha_O(\mu, \nu) = \mu(S)$ then, since the only component μ′ ≤ μ with μ′(S) = μ(S) is μ itself, we must have (μ, ν′) ∈ Φ(μ, ν) for some ν′ ≤ ν with $\mu \preceq_{sd} \nu'$. But then μ(I) ≤ ν′(I) ≤ ν(I) for any $I \in i\cal B$. Hence, $\mu \preceq_{sd} \nu$.

Claim (c) is trivial if c = 0, so suppose instead that c > 0. Fix (μ′, ν′) ∈ Φ(μ, ν) such that $\alpha_O(\mu, \nu) = \mu'(S)$. It is clear that $(c\mu', c\nu') \in \Phi(c\mu, c\nu)$, implying that

(3.5)\begin{align} \label{eq:cin} c \alpha_O(\mu, \nu) = c \mu'(S) \leq \alpha_O(c\mu, c\nu). \end{align}

For the reverse inequality, we can apply (3.5) again (with cμ, cν in place of μ, ν and 1/c in place of c) to get

\begin{align*} \alpha_O(c\mu, c\nu) = c \left(\frac{1}{c}\right) \alpha_O(c\mu, c\nu) \leq c \alpha_O(\mu, \nu).\end{align*}

3.4. Equivalent representations

In (2.2) we noted that the affinity between two measures has several alternative representations. In our setting these results generalize as follows.

Theorem 3.1. For all ${(\mu, \nu) \in \cal P \times \cal P}$, we have

(3.6)\begin{align}\label{eq:affco}\alpha_O(\mu, \nu)= 1 - \sup_{I \in i\cal B} \{ \mu(I) - \nu(I) \}= \max_{(X, Y) \in \cal C(\mu, \nu)} \mathbb P\{ X \preceq Y \}.\end{align}

Evidently (2.2) is a special case of (3.6) because (3.6) reduces to (2.2) when ‘⪯’ is set to equality. For example, when ‘⪯’ is equality,

$$\mathop {\sup }\limits_{I \in i{\cal B}} \{ \mu (I) - \nu (I)\} = \mathop {\sup }\limits_{B \in {\cal B}} \{ \mu (B) - \nu (B)\} = \mathop {\sup }\limits_{B \in {\cal B}} |\mu (B) - \nu (B)|,$$

where the last step is from (2.1). Note also that, as shown in the proof of Theorem 3.1, the supremum can also be written in terms of the open increasing sets i𝒪 or the closed decreasing sets $d\mathfrak C$. In particular,

\begin{align*} \sup_{I \in i\cal B} \{ \mu(I) - \nu(I) \} = \sup_{I \in i\cal O} \{ \mu(I) - \nu(I) \} = \sup_{D \in d\mathfrak C} \{ \nu(D) - \mu(D) \}.\end{align*}

One of the assertions of Theorem 3.1 is the existence of a coupling $(X, Y) \in \cal C(\mu, \nu)$ attaining $\mathbb P\{X \preceq Y\} = \alpha_O(\mu, \nu)$. Let us refer to any such coupling as an order maximal coupling for (μ, ν).

Example 3.3. For (x, y) ∈ S × S, we have

\begin{align*} \alpha_O(\delta_x, \delta_y) = {\bf 1}\{x \preceq y\} = {\bf 1}_{\mathbb G}(x, y),\end{align*}

as can easily be verified from the definition or either of the alternative representations in (3.6). The map $ (x, y) \mapsto {\bf 1}_{\mathbb G}(x, y)$ is measurable due to the Polish assumption. As a result, for any $ (X, Y) \in \cal C(\mu, \nu)$, we have

\begin{align*} \mathbb E \alpha_O(\delta_X, \delta_Y) = \mathbb P\{X \preceq Y\} \leq \alpha_O(\mu, \nu),\end{align*}

with equality when (X, Y) is an order maximal coupling.
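For distributions on a finite subset of ℝ with the usual total order, the increasing sets restricted to the support are exactly the upper tails, so the first representation in (3.6) can be evaluated from survival functions. The sketch below (our illustration; `ordered_affinity` is a hypothetical helper) recovers the maximal mass 1/2 from Example 3.2.

```python
import numpy as np

def ordered_affinity(p, q):
    """alpha_O(mu, nu) = 1 - sup_I {mu(I) - nu(I)}, cf. (3.6). On an
    ordered finite support x_1 < ... < x_n the increasing sets are the
    upper tails; the empty set contributes 0 to the supremum."""
    tp = np.cumsum(p[::-1])[::-1]  # mu([x_k, infinity)) for each k
    tq = np.cumsum(q[::-1])[::-1]
    return 1.0 - max(0.0, float(np.max(tp - tq)))

# Example 3.2 on the support {0, 1, 2}:
mu = np.array([0.0, 0.5, 0.5])  # (delta_1 + delta_2) / 2
nu = np.array([0.5, 0.5, 0.0])  # (delta_0 + delta_1) / 2
print(ordered_affinity(mu, nu))  # 0.5: mu' = nu' = delta_1 / 2 is maximal
print(ordered_affinity(nu, mu))  # 1.0: nu is stochastically dominated by mu
```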

3.5. Comments on Theorem 3.1

The existence of an order maximal coupling shown in Theorem 3.1 implies two well-known results that are usually treated separately. One is the Nachbin–Strassen theorem (see, e.g. Theorem 1 of [Reference Kamae, Krengel and O’Brien19] or Chapter IV of [Reference Lindvall22]), which states the existence of a coupling $(X, Y) \in \cal C(\mu, \nu)$ attaining $\mathbb P\{X \preceq Y\} = 1$ whenever $\mu \preceq_{sd} \nu$. The existence of an order maximal coupling for each (μ, ν) in $\cal P \times \cal P$ implies this statement, since, under the hypothesis that $\mu \preceq_{sd} \nu$, we also have $\alpha_O(\mu, \nu) = 1$. Hence, any order maximal coupling satisfies $\mathbb P\{X \preceq Y\} = 1$.

The other familiar result implied by the existence of an order maximal coupling is the existence of a maximal coupling in the standard sense (see the discussion of maximal couplings after (2.2) and the result on page 19 of [Reference Lindvall22]). Indeed, if we take ‘⪯’ to be the identity order then (3.6) reduces to (2.2), as already discussed.

4. Total ordered variation

Let S be a partially ordered Polish space. Consider the function on $\cal P \times \cal P$ given by

\begin{align*} \label{eq:dtovd} \gamma(\mu, \nu) \,:\!= 2 - \alpha_O(\mu, \nu) - \alpha_O(\nu, \mu).\end{align*}

We call γ(μ, ν) the total ordered variation distance between μ and ν. The natural comparison is with (2.3): replacing $\alpha_O$ by the (symmetric) affinity α in the definition of γ yields 2(1 − α(μ, ν)), which is exactly ‖μ − ν‖. In particular, when '⪯' is equality, ordered affinity reduces to affinity, and total ordered variation distance reduces to total variation distance. Since ordered affinities dominate affinities (see (3.4)), we have γ(μ, ν) ≤ ‖μ − ν‖ for all $(\mu, \nu) \in \cal P \times \cal P$.

Other, equivalent, representations of γ are available. For example, in view of (3.6), for any $ (\mu, \nu) \in \cal P \times \cal P $, we have

(4.1)\begin{align}\label{eq:tvsio}\gamma(\mu, \nu)= \sup_{I \in i\cal B} (\mu - \nu)(I) + \sup_{I \in i\cal B} (\nu - \mu)(I).\end{align}

By combining Lemma 3.1 and (3.2), we also have

(4.2)\begin{align}\label{eq:nq} 2 \gamma(\mu, \nu) = \sup_{h \in iH} (\mu - \nu)(h) + \sup_{h \in iH} (\nu - \mu)(h).\end{align}

It is straightforward to show that

(4.3)\begin{align} \label{eq:idb} \sup_{I \in i\cal B} |\mu(I) - \nu(I)| \leq \gamma(\mu, \nu) \quad \text{and} \quad \sup_{D \in d\cal B} |\mu(D) - \nu(D)| \leq \gamma(\mu, \nu) .\end{align}

Lemma 4.1. The function γ is a metric on $\cal P$.

Proof. The claim that γ is a metric follows in a straightforward way from the definition or the alternative representation (4.1). For example, the triangle inequality is easy to verify using (4.1). Also, γ (μ, ν) = 0 implies that μ = ν by (4.1) and Remark 3.1.

4.1. Connection to other modes of convergence

As well as total variation, the metric γ is closely related to the so-called Bhattacharya metric, which is given by

(4.4)\begin{align} \label{eq:bhm} \beta(\mu, \nu) \,:\!= \sup_{h \in iH} |\mu(h) - \nu(h)| . \end{align}

See [Reference Bhattacharya and Lee1] and [Reference Chakraborty and Rao3]. (In [Reference Chakraborty and Rao3] the metric is defined by taking the supremum over iH 0 rather than iH, but the two definitions differ only by a positive scalar.) The Bhattacharya metric can be thought of as an alternative way to generalize total variation distance, in the sense that, like γ, the metric β reduces to total variation distance when ‘⪯’ is the identity order (since iH equals H under this order). From (3.1) we have

\begin{align*} \label{eq:cfsi} \frac{1}{2} \Big[ \sup_{h \in iH} \lambda(h) +\sup_{h \in iH} (-\lambda)(h) \Big] \leq \sup_{h \in iH} |\lambda(h)| \leq \sup_{h \in iH} \lambda(h) + \sup_{h \in iH} (-\lambda)(h), \end{align*}

and from this and (4.2) we have

(4.5)\begin{align} \label{eq:em} \gamma(\mu, \nu) \leq \beta(\mu, \nu) \leq 2 \gamma(\mu, \nu). \end{align}

Hence, β and γ are equivalent metrics.
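In the finite, totally ordered setting of the sketch following Example 3.3, γ can be computed from (4.1) and β from (4.4) via Lemma 3.1 and (3.2), which give β as twice the larger of the two one-sided tail gaps. The sketch below (ours) verifies the inequalities in (4.5) on an example where μ and ν are not stochastically ordered.

```python
import numpy as np

def tail_gaps(p, q):
    """d1 = sup_I (mu - nu)(I) and d2 = sup_I (nu - mu)(I), the suprema
    running over upper tails of the finite ordered support."""
    tp = np.cumsum(p[::-1])[::-1]
    tq = np.cumsum(q[::-1])[::-1]
    d1 = max(0.0, float(np.max(tp - tq)))
    d2 = max(0.0, float(np.max(tq - tp)))
    return d1, d2

def gamma(p, q):
    """Total ordered variation, via the representation (4.1)."""
    d1, d2 = tail_gaps(p, q)
    return d1 + d2

def beta(p, q):
    """Bhattacharya metric (4.4); by Lemma 3.1 and (3.2) it equals
    2 max(d1, d2) in this setting."""
    d1, d2 = tail_gaps(p, q)
    return 2 * max(d1, d2)

mu = np.array([0.2, 0.5, 0.3])
nu = np.array([0.1, 0.8, 0.1])
g, b = gamma(mu, nu), beta(mu, nu)
print(g, b, g <= b <= 2 * g)  # 0.3 0.4 True, consistent with (4.5)
```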

The metric γ is also connected to the Wasserstein metric ([Reference Gibbs and Su12], [Reference Givens and Shortt13]). If ρ metrizes the topology on S then the Wasserstein distance between probability measures μ and ν is

\begin{align*} w(\mu, \nu) \,:\!= \inf_{(X, Y) \in \cal C(\mu, \nu)} \mathbb E \, \rho(X, Y) .\end{align*}

The total ordered variation metric can be compared as follows. Consider the ‘directed semimetric’ $\hat \rho(x, y) \,:= \bf 1\{x {\npreceq} y\}$. In view of (3.6) we have

\begin{align*} \gamma(\mu, \nu) = \inf_{(X, Y) \in \cal C(\mu, \nu)} \mathbb E \, \hat \rho(X, Y) + \inf_{(X, Y) \in \cal C(\mu, \nu)} \mathbb E \, \hat \rho(Y, X) .\end{align*}

Thus, γ (μ, ν) is found by summing two partial, ‘directed Wasserstein deviations’. Summing the two directed differences from opposite directions yields a metric.

Proposition 4.1. If $\{\mu_n\}_{n \geq 0} \subset \cal P$ is tight and γ (μn, μ 0) → 0, then $\mu_n \buildrel {\rm w} \over \longrightarrow \mu_0$.

Proof. Let $\{\mu_n\}$ and μ := $\mu_0$ satisfy the conditions of the proposition. Take any subsequence of $\{\mu_n\}$ and observe that, by Prokhorov's theorem, this subsequence has a sub-subsequence converging weakly to some $\nu \in \cal P$. Along this sub-subsequence, for any continuous h ∈ ibS, we have both $\mu_n(h) \to \mu(h)$ (by (4.5), since γ(μ_n, μ) → 0) and $\mu_n(h) \to \nu(h)$ (by weak convergence). This is sufficient for ν = μ by Remark 3.1. Thus, every subsequence of $\{\mu_n\}$ has a sub-subsequence converging weakly to μ, and hence so does the entire sequence.

4.2. Completeness

To obtain completeness of ($\cal P,\,\gamma$), we adopt the following additional assumption, which is satisfied if, say, compact sets are order bounded (i.e. lie in order intervals) and order intervals are compact. (For example, ℝn with the usual pointwise partial order has this property.)

Assumption 4.1. If KS is compact then i(K) ∩ d(K) is also compact.

Theorem 4.1. If Assumption 4.1 holds then ($\cal P$, γ) is complete.

Remark 4.1. In [Reference Chakraborty and Rao3] it was shown that β is a complete metric when S = ℝn. Due to equivalence of the metrics, Theorem 4.1 extends this result to partially ordered Polish spaces where Assumption 4.1 is satisfied.

5. Applications

In this section we show that many results in classical and monotone Markov chain theory, hitherto treated separately, can be derived from the same set of results based around total ordered variation and ordered affinity.

Regarding notation, if $\{S_i\}$ are partially ordered Polish spaces over i = 0, 1, 2, …, we often use a common symbol '⪯' for the partial order on any of these spaces. On products of these spaces we use the product topology and the pointwise partial order; once again, the symbol '⪯' is used for the partial order. For example, if $(x_0, x_1)$ and $(y_0, y_1)$ are points in $S_0 \times S_1$, then $(x_0, x_1) \preceq (y_0, y_1)$ means that $x_0 \preceq y_0$ and $x_1 \preceq y_1$.

A function $P \colon S_0 \times \cal B_1 \to [0, 1]$ is called a Markov kernel from $S_0$ to $S_1$ if $x \mapsto P(x, B)$ is $\cal B_0$-measurable for each $B \in \cal B_1$ and $B \mapsto P(x, B)$ is in $\cal P_1$ for all $x \in S_0$. If $S_0 = S_1 = S$, we will call P a Markov kernel on S, or just a Markov kernel. Following standard conventions (see, e.g. [Reference Meyn and Tweedie25]), for any Markov kernel P from $S_0$ to $S_1$, any $h \in bS_1$, and $\mu \in \cal P_0$, we define $\mu P \in \cal P_1$ and $Ph \in bS_0$ via

\begin{align*} (\mu P)(B) = \int P(x, B) \mu({\rm d} x) \quad {\rm and} \quad (Ph)(x) = \int h(y) P(x, {\rm d} y).\end{align*}

Also, $\mu \otimes P$ denotes the joint distribution on $S_0 \times S_1$ defined by

\begin{align*} (\mu \otimes P)(A \times B) = \int_A P(x, B) \mu({\rm d} x).\end{align*}

To simplify notation, we use $P_x$ to represent the measure $\delta_x P = P(x, \cdot)$; $P^m$ is the mth composition of P with itself.

5.1. Order affinity and monotone Markov kernels

Let S be a Polish space partially ordered by '⪯'. A Markov kernel P from $S_0$ to $S_1$ is called monotone if $Ph \in ibS_0$ whenever $h \in ibS_1$. An equivalent condition is that $\mu P \preceq_{sd} \nu P$ whenever $\mu \preceq_{sd} \nu$, or just that $P(x, \cdot) \preceq_{sd} P(y, \cdot)$ whenever x ⪯ y. It is well known (see, e.g. Proposition 1 of [Reference Kamae, Krengel and O’Brien19]) that if $\mu \preceq_{sd} \nu$ and P is monotone, then $\mu P \preceq_{sd} \nu P$. Note that, when '⪯' is the identity order, every Markov kernel is monotone.
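On a finite, totally ordered state space these definitions reduce to matrix algebra, and monotonicity can be checked row by row through stochastic dominance of upper tails. A minimal sketch (ours; the kernel P below is an arbitrary illustrative example):

```python
import numpy as np

# A Markov kernel on a finite ordered state space {x_1 < x_2 < x_3} is a
# row-stochastic matrix with P[i, j] = P(x_i, {x_j}).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

mu = np.array([1.0, 0.0, 0.0])       # an initial distribution (point mass)
h = np.array([0.0, 0.5, 1.0])        # an increasing function on the states

print(mu @ P)                        # (mu P)(B) = int P(x, B) mu(dx)
print(P @ h)                         # (P h)(x) = int h(y) P(x, dy)
print(np.linalg.matrix_power(P, 5))  # P^5, the five-fold composition

def is_monotone(P):
    """P is monotone iff P(x, .) <=_sd P(y, .) whenever x <= y, i.e. iff
    the rows' upper-tail probabilities increase down the matrix."""
    tails = np.cumsum(P[:, ::-1], axis=1)[:, ::-1]
    return bool(np.all(np.diff(tails, axis=0) >= -1e-12))

print(is_monotone(P))  # True; consequently P @ h is increasing as well
```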

Lemma 5.1. If P is a monotone Markov kernel from S 0 to S 1 and μ, μ′, ν, and νare probabilities in $\cal P_0$, then

\begin{align*} \mu' {\preceq_{sd}} \mu \quad and \quad \nu {\preceq_{sd}} \nu' \quad \Rightarrow \quad \alpha_O(\mu P, \nu P) \leq \alpha_O(\mu' P, \nu' P). \end{align*}

Proof. Let P, μ, μ′, ν, and ν′ have the stated properties. In view of the equivalent representation in (3.6), the claim will be established if

\begin{align*} \sup_{I \in i\cal B} \{ (\mu P)(I) - (\nu P)(I) \} \geq \sup_{I \in i\cal B} \{ (\mu' P)(I) - (\nu' P)(I) \}.\end{align*}

This holds by the monotonicity of P and the order of μ, μ′, ν, and ν′.

Lemma 5.2. If P is a monotone Markov kernel from S 0 to S 1, then

\begin{align*} \alpha_O(\mu P, \nu P) \geq \alpha_O(\mu, \nu) \quad {for\; any\; \mu, \nu \; in \;\cal P_0}.\end{align*}

Proof. Fix μ, ν in $\cal P_0$, and let $(\hat \mu, \hat \nu)$ be a maximal ordered component pair for (μ, ν). From the monotonicity of P and the fact that Markov kernels preserve the mass of measures, it is clear that $(\hat \mu P, \hat \nu P)$ is an ordered component pair for (μP, νP). Hence,

\begin{align*} \alpha_O(\mu P, \nu P) \geq (\hat \mu P)(S) = \hat \mu(S) = \alpha_O(\mu, \nu).\end{align*}

On the other hand, for the joint distribution, the ordered affinity of the initial pair is preserved.

Lemma 5.3. If P is a monotone Markov kernel from S 0 to S 1, then

\begin{align*} \alpha_O(\mu \otimes P, \nu \otimes P) = \alpha_O(\mu, \nu) \quad {for\; any\; \mu,\nu\; in \;\cal P_0}.\end{align*}

Proof. Fix μ, ν in $\cal P_0$, and let (X 0, X 1) and (Y 0, Y 1) be random pairs with distributions μP and νP, respectively. We have

\begin{align*} \mathbb P\{ (X_0, X_1) \preceq (Y_0, Y_1) \} \leq \mathbb P\{ X_0 \preceq Y_0 \} \leq \alpha_O(\mu, \nu). \end{align*}

Taking the supremum over all couplings in $\cal C(\mu \otimes P, \nu \otimes P)$ shows that αO(μP, νP) is dominated by αO(μ, ν).

To see the reverse inequality, let $(\hat \mu, \hat \nu)$ be a maximal ordered component pair for (μ, ν). Monotonicity of P now gives $\hat \mu \otimes P \preceq_{sd} \hat \nu \otimes P$. Using this and the fact that Markov kernels preserve the mass of measures, we see that $(\hat \mu \otimes P, \hat \nu \otimes P)$ is an ordered component pair for $(\mu \otimes P, \nu \otimes P)$. Hence,

\begin{align*} \alpha_O(\mu \otimes P, \nu \otimes P) \geq (\hat \mu \otimes P)(S_0 \times S_1) = \hat \mu(S_0) = \alpha_O(\mu, \nu).\end{align*}

5.2. Monotone Markov chains

Given $\mu \in \cal P$ and a Markov kernel P on S, a stochastic process $\{X_t\}_{t \geq 0}$ taking values in $S^{\infty} \,:\!= \times_{t=0}^\infty S$ will be called a Markov chain with initial distribution μ and kernel P if the distribution of $\{X_t\}$ on $S^\infty$ is

\begin{align*} \boldsymbol Q_\mu \,:\!= \mu \otimes P \otimes P \otimes P \otimes \cdots\!.\end{align*}

(The meaning of the right-hand side is clarified in [Reference Lindvall22, Section III.8], [Reference Kamae, Krengel and O’Brien19, p. 903], and [Reference Meyn and Tweedie25, Section 3.4], for example.) If P is a monotone Markov kernel then $(x, B) \mapsto \boldsymbol Q_x(B) := \boldsymbol Q_{\delta_x}(B)$ is a monotone Markov kernel from S to $S^\infty$. See Propositions 1 and 2 of [Reference Kamae, Krengel and O’Brien19].

There are various useful results about representations of Markov chains that are ordered almost surely. One is that, if the initial conditions satisfy $\mu \preceq_{sd} \nu$ and P is a monotone Markov kernel, then we can find Markov chains $\{X_t\}$ and $\{Y_t\}$ with initial distributions μ and ν and kernel P such that $X_t \preceq Y_t$ for all t almost surely. (See, e.g. Theorem 2 of [Reference Kamae, Krengel and O’Brien19].) This result can be generalized beyond the case where μ and ν are stochastically ordered, using the results presented above. For example, let μ and ν be arbitrary initial distributions and let P be monotone, so that the kernel $(x, B) \mapsto \boldsymbol Q_x(B)$ is likewise monotone. By Lemma 5.3 we have

\begin{align*} \alpha_O(\boldsymbol Q_\mu, \boldsymbol Q_\nu) = \alpha_O(\mu \otimes \boldsymbol Q_x, \nu \otimes \boldsymbol Q_x) = \alpha_O(\mu, \nu).\end{align*}

In other words, the ordered affinity of the entire processes is given by the ordered affinity of the initial distributions. It now follows from Theorem 3.1 that there exist Markov chains {Xt} and {Yt} with initial distributions μ and ν and Markov kernel P such that

\begin{align*} \mathbb P\{ X_t \preceq Y_t \text{ for all } t \geq 0\} = \alpha_O(\mu, \nu).\end{align*}

The standard result is a special case, since $\mu \preceq_{sd} \nu$ implies that $\alpha_O(\mu, \nu) = 1$, and, hence, the sequences are ordered almost surely.
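A common-shock simulation makes this concrete for the model (1.1): the law of motion is increasing in the state, so two copies driven by the same shock sequence remain ordered once ordered. The sketch below (ours; `coupled_paths` is a hypothetical helper) starts the chains at $\delta_0 \preceq_{sd} \delta_1$, and the paths stay ordered for every t, in line with $\alpha_O(\delta_0, \delta_1) = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

def coupled_paths(x0, y0, t):
    """Run two copies of (1.1) with a common Bernoulli(1/2) shock sequence."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(t):
        w = float(rng.integers(0, 2))  # the same shock drives both chains
        x, y = (x + w) / 2, (y + w) / 2
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

xs, ys = coupled_paths(0.0, 1.0, 100)
print(bool(np.all(xs <= ys)))  # True: X_t <= Y_t for every t on this path
```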

5.3. Nonexpansiveness

It is well known that every Markov kernel is nonexpansive with respect to the total variation norm, so that

(5.1)\begin{align} \label{eq:netv} \| \mu P - \nu P \| \leq \| \mu - \nu \| \quad \text{for all } (\mu, \nu) \in \cal P \times \cal P. \end{align}

An analogous result is true for γ when P is monotone. That is,

(5.2)\begin{align} \label{eq:netov} \gamma(\mu P , \nu P ) \leq \gamma( \mu , \nu ) \quad \text{for all } (\mu, \nu) \in \cal P \times \cal P.\end{align}

The bound (5.2) follows directly from Lemma 5.2. Evidently (5.1) can be recovered from (5.2) by setting '⪯' to equality.

Nonexpansiveness is interesting partly in its own right (we apply it in proofs below) and partly because it suggests that, with some additional assumptions, we can strengthen it to contractiveness. We expand on this idea below.

5.4. An order coupling bound for Markov chains

Doeblin [Reference Doeblin7] established and exploited the coupling inequality

(5.3)\begin{align} \label{eq:d} \| \mu_t - \nu_t \| \leq 2 \, \mathbb P\{X_j \not= Y_j \text{ for any } j \leq t\} ,\end{align}

where μt and νt are the time t distributions of Markov chains {Xt} and {Yt} generated by common Markov kernel P. Even when the state space is uncountable, the right-hand side of (5.3) can often be shown to converge to zero by manipulating the joint distribution of (Xj, Yj) to increase the chance of a meeting ([Reference Doob9], [Reference Harris14], [Reference Lindvall22], [Reference Meyn and Tweedie25], [Reference Nummelin26], [Reference Orey27], [Reference Pitman28], [Reference Revuz31], [Reference Thorisson34]).

Consider the following generalization: given a monotone Markov kernel P on S and arbitrary $\mu, \nu \in \cal P$, we can construct Markov chains {Xt} and {Yt} with common kernel P and respective initial conditions μ and ν such that

(5.4)\begin{align} \label{eq:do} \gamma( \mu_t, \nu_t) \leq \mathbb P\{X_j \npreceq Y_j \text{ for any } j \leq t\} + \mathbb P\{Y_j \npreceq X_j \text{ for any } j \leq t\}. \end{align}

This generalizes (5.3) because both the right- and left-hand sides of (5.4) reduce to their counterparts in (5.3) if we take '⪯' to be equality.

To prove (5.4), we need only show that

(5.5)\begin{align} \label{eq:ido} 1 - \alpha_O( \mu P^t, \nu P^t) \leq \mathbb P\{X_j \npreceq Y_j \text{ for any } j \leq t\},\end{align}

since, once (5.5) is established, we can reverse the roles of $\{X_t\}$ and $\{Y_t\}$ in (5.5) to obtain $1 - \alpha_O( \nu P^t, \mu P^t) \leq \mathbb P\{Y_j \npreceq X_j \;\text {for any}\; j \leq t\}$ and then add this inequality to (5.5) to produce (5.4).

If {Xt} and {Yt} are Markov chains with kernel P and initial conditions μ and ν, then (3.6) yields αO(μPt, νPt) ≥ ℙ{XtYt}. Therefore, we need only construct such chains with the additional property that

(5.6)\begin{align} \label{eq:oia} \mathbb P\{X_j \npreceq Y_j \text{ for any } j \leq t\} = \mathbb P\{X_t \npreceq Y_t\}.\end{align}

Intuitively, we can do so by using a ‘conditional’ version of the Nachbin–Strassen theorem, producing chains that, once ordered, remain ordered almost surely. This can be formalized as follows. By [Reference Machida and Shibakov23, Theorem 2.3], there exists a Markov kernel M on S × S such that $\mathbb G$ is absorbing for M (i.e. $M((x,y), \mathbb G) = 1$ for all (x, y) in $\mathbb G$),

\begin{align*} P(x, A) = M((x,y), \, A \times S) \quad \text{and} \quad P(y, B) = M((x,y), \, S \times B) \end{align*}

for all (x, y) ∈ S × S and all $A, B \in \cal B$. Given M, let η be a distribution on S × S with marginals μ and ν, let $\boldsymbol Q_\eta := \eta \otimes M \otimes M \otimes \cdots$ be the induced joint distribution, and let $\{(X_t, Y_t)\}$ have distribution $\boldsymbol Q_\eta$ on $(S \times S)^\infty$. By construction, $X_t$ has distribution $\mu P^t$ and $Y_t$ has distribution $\nu P^t$. Moreover, (5.6) is valid because $\mathbb G$ is absorbing for M, and, hence, $\mathbb P\{(X_j, Y_j) \notin \mathbb G $ for any $ j \leq t\} = \mathbb P\{(X_t, Y_t) \notin \mathbb G\}$.

5.5. Uniform ergodicity

Let S be a partially ordered Polish space satisfying Assumption 4.1, and let P be a monotone Markov kernel on S. A distribution π is called stationary for P if πP = π. Consider the value

(5.7)\begin{align} \label{eq:odob} \sigma(P) \,:\!= \inf_{(x, y) \in S \times S} \alpha_O(P_x, P_y) ,\end{align}

which can be understood as an order-theoretic extension of the Markov–Dobrushin coefficient of ergodicity ([Reference Dobrushin5], [Reference Seneta32]). It reduces to the usual notion when ‘⪯’ is equality.
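On a finite, totally ordered state space, σ(P) can be computed directly by minimizing the ordered affinity over all pairs of rows. A sketch (ours; P is the illustrative monotone kernel used in the sketch of Section 5.1):

```python
import numpy as np

def ordered_affinity(p, q):
    """alpha_O via the upper-tail representation in (3.6)."""
    tp = np.cumsum(p[::-1])[::-1]
    tq = np.cumsum(q[::-1])[::-1]
    return 1.0 - max(0.0, float(np.max(tp - tq)))

def sigma(P):
    """The ordered ergodicity coefficient (5.7): the infimum of
    alpha_O(P_x, P_y) over all pairs of states (x, y)."""
    n = P.shape[0]
    return min(ordered_affinity(P[i], P[j]) for i in range(n) for j in range(n))

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
print(sigma(P))  # approximately 0.1 > 0: mu -> mu P contracts in gamma
```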

Theorem 5.1. If P is monotone then

(5.8)\begin{align} \label{eq:fbod} \gamma(\mu P, \nu P) \leq (1 - \sigma(P)) \, \gamma(\mu, \nu) \quad {for\; all }\;(\mu, \nu) \in \cal P \times \cal P.\end{align}

Thus, strict positivity of σ(P) implies that μ ↦ μP is a contraction mapping on $(\cal P, \gamma)$. Moreover, in many settings, the bound in (5.8) cannot be improved upon, as we now see.

Lemma 5.4. If P is monotone, S is not a singleton, and any x, y in S have a lower bound in S, then

(5.9)\begin{align} \label{eq:ev} \text{for all } \xi > \sigma(P), \text{ there exist } \mu, \nu \in {\cal P} \text{ such that } \gamma(\mu P, \nu P) > (1 - \xi) \gamma(\mu, \nu).\end{align}

The significance of Theorem 5.1 is summarized in the next corollary.

Corollary 5.1. Let P be monotone, and let S satisfy Assumption 4.1. If there exists an m ∈ ℕ such that σ(Pm) > 0, then P has a unique stationary distribution π in $\cal P$, and

(5.10)\begin{align} \label{eq:ifbod} \gamma(\mu P^t, \pi) \leq (1 - \sigma(P^m))^{{\lfloor t/m \rfloor}} \, \gamma(\mu, \pi)\quad {for \; all }\; \mu \in {\cal P}, t \geq 0.\end{align}

Here ⌊x⌋ is the largest n ∈ ℕ with nx.

Proof of Corollary 5.1. Let P and μ be as in the statement of the corollary. The existence of a fixed point of μ ↦ μP, and hence a stationary distribution $\pi \in \cal P$, follows from Theorem 5.1 applied to $P^m$, Banach's contraction mapping theorem, and the completeness of ($\cal P$, γ) shown in Theorem 4.1. The bound in (5.10) follows from (5.8) applied to $P^m$ and the nonexpansiveness of P in the metric γ (see (5.2)).

5.6. Applications from the introduction

Let us see how Corollary 5.1 can be used to show stability of the two models discussed in the introduction, beginning with the monotone model in (1.1). The state space is S = [0, 1] with its standard order, and all assumptions are as in the discussion immediately following (1.1). We let P be the corresponding Markov kernel and consider the following coupling of $(P_1, P_0)$. Let W be a draw from the Bernoulli($\tfrac12$) distribution, and let V = 1 − W. Then $P_1$ and $P_0$ are, respectively, the distributions of

\begin{align*}X \,:\!= \frac{1 + W}{2}\quad \text{and} \quad Y \,:\!= \frac{0 + V}{2} = \frac{1 - W}{2}.\end{align*}

Since X ⪯ Y if and only if W = 0, we have, by Lemma 5.1 and (3.6),

\begin{align*} \alpha_O(P_x, P_y) \geq \alpha_O(P_1, P_0) \geq \mathbb P\{ X \preceq Y \} = \tfrac{1}{2} \end{align*}

for all x, y ∈ S. From the definition in (5.7) we then have $\sigma(P) \geq \tfrac12$, and global stability in the γ metric follows from Corollary 5.1.
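In fact the bound is tight at this pair: using the upper-tail representation from (3.6), with $P_1$ putting mass 1/2 on each of 1/2 and 1 and $P_0$ putting mass 1/2 on each of 0 and 1/2, one computes $\alpha_O(P_1, P_0) = \tfrac12$ exactly, and hence $\sigma(P) = \tfrac12$ (take x = 1 and y = 0). A quick numerical check (ours):

```python
import numpy as np

def ordered_affinity(p, q):
    tp = np.cumsum(p[::-1])[::-1]  # upper-tail masses on {0, 1/2, 1}
    tq = np.cumsum(q[::-1])[::-1]
    return 1.0 - max(0.0, float(np.max(tp - tq)))

P1 = np.array([0.0, 0.5, 0.5])  # law of (1 + W)/2, supported on {1/2, 1}
P0 = np.array([0.5, 0.5, 0.0])  # law of (1 - W)/2, supported on {0, 1/2}
print(ordered_affinity(P1, P0))  # 0.5, so the coupling bound is attained
print(ordered_affinity(P0, P1))  # 1.0, since P_0 <=_sd P_1
```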

Next we turn to the inventory model in (1.2), with state space S = [0, K] and $\{W_t\}$ an i.i.d. process satisfying $\mathbb P\{W_t \geq w\} > 0$ for any real w. Let $\{X_t\}$ and $\{Y_t\}$ be generated by (1.2), with common shock sequence $\{W_t\}$ and respective initial conditions x, y in S. In view of (2.2), we have

$$\alpha ({P_x},{P_y}) \ge {\mathbb P \{ {X_1} = {Y_1}\}} \ge {\mathbb P \{{W_1} \ge K\}} = : \kappa > 0.$$

Letting ‘⪯’ be equality on S, we have $\alpha_O(P_x, P_y) = \alpha(P_x, P_y) \geq \kappa$, and hence σ(P) ≥ κ. Global stability in the γ metric follows from Corollary 5.1.

5.7. General applications

Next we show that Theorem 5.1 and in particular Corollary 5.1 cover as special cases two well-known results on Markov chain stability from the classical literature on one hand and the more recent monotone Markov chain literature on the other.

Consider first the standard notion of uniform ergodicity. A Markov kernel P on S is called uniformly ergodic if it has a stationary distribution π and $\sup_{x \in S} \| P_x^t - \pi \| \to 0$ as t → ∞. Uniform ergodicity was studied by Markov [Reference Markov24] in a countable state space, and by Doeblin [Reference Doeblin6], Yosida and Kakutani [Reference Yosida and Kakutani37], Doob [Reference Doob9], and many subsequent authors in a general state space. It is defined and reviewed in Chapter 16 of [Reference Meyn and Tweedie25]. One of the most familiar equivalent conditions for uniform ergodicity [Reference Meyn and Tweedie25, Theorem 16.0.2] is the existence of an m ∈ ℕ and a nontrivial $\phi \in \cal M$ such that $P^m_x \geq \phi$ for all x in S.

One can recover this result using Corollary 5.1. Take ‘⪯’ to be equality, in which case every Markov operator is monotone, γ is total variation distance, and Assumption 4.1 is always satisfied. Moreover, σ(Pm) reduces to the ordinary ergodicity coefficient of Pm, evaluated using the standard notion of affinity, and, hence,

\begin{align*} \sigma(P^m) = \inf_{(x, y) \in S \times S} \alpha(P^m_x, \, P^m_y) = \inf_{(x, y) \in S \times S} (P^m_x \wedge P^m_y) (S) \geq \phi(S) > 0.\end{align*}

Thus, all the conditions of Corollary 5.1 are satisfied, and

\begin{align*} \sup_{x \in S} \| P^t_x - \pi \| = \sup_{x \in S} \gamma( P^t_x \, , \, \pi) \leq 2 (1 - \sigma(P^m))^{{\lfloor t/m \rfloor}} \to 0 \quad\text{as } t \to \infty.\end{align*}

Now consider the setting of Bhattacharya and Lee [Reference Bhattacharya and Lee1], where S = ℝ^n, '⪯' is the usual pointwise partial order '≤' for vectors, and $\{g_t\}$ is a sequence of i.i.d. random maps from S to itself, generating $\{X_t\}$ via $X_t = g_t(X_{t-1}) = g_t \circ \cdots \circ g_1(X_0)$. The corresponding Markov kernel is $P(x, B) = \mathbb P\{g_1(x) \in B\}$. The random maps are assumed to be order preserving on S, so that P is monotone. Bhattacharya and Lee used a 'splitting condition', which assumes the existence of an $\bar x \in S$ and an m ∈ ℕ such that

  1. (a) $s_1 := \mathbb P\{g_m \circ \cdots \circ g_1(y) \leq \bar x$ for all $y \in S\} > 0$, and

  2. (b) $s_2 := \mathbb P\{g_m \circ \cdots \circ g_1(y) \geq \bar x$ for all $y \in S\} > 0$.

Under these assumptions, they show that $\sup_{x \in S} \beta(P^t_x, \, \pi)$ converges to zero exponentially fast in t, where β is the Bhattacharya metric introduced in (4.4). This finding extends earlier results by Dubins and Freedman [Reference Dubins and Freedman10] and Yahav [Reference Yahav36] to multiple dimensions.

This result can be obtained as a special case of Corollary 5.1. Certainly S is a partially ordered Polish space and Assumption 4.1 is satisfied. Moreover, the ordered ergodicity coefficient $\sigma(P^m)$ is strictly positive. To see this, suppose that the splitting condition is satisfied at m ∈ ℕ. Pick any x, y ∈ S, and let $\{X_t\}$ and $\{Y_t\}$ be independent copies of the Markov chain, starting at x and y, respectively. We have

\begin{align*} \sigma(P^m) \geq \mathbb P\{X_m \leq Y_m\} \geq \mathbb P\{X_m \leq \bar x \leq Y_m\} = \mathbb P\{X_m \leq \bar x \} \mathbb P\{\bar x \leq Y_m\} \geq s_1 s_2.\end{align*}

The last term is strictly positive by assumption. Hence, all the conditions of Corollary 5.1 are satisfied, a unique stationary distribution π exists, and $\sup_{x \in S} \gamma(P^t_x, \, \pi)$ converges to zero exponentially fast in t. We showed in (4.5) that β ≤ 2γ, so the same convergence holds for the Bhattacharya metric.

We can also recover a related convergence result due to Hopenhayn and Prescott [Reference Hopenhayn and Prescott16, Theorem 2] that is routinely applied to stochastic stability problems in economics. They assumed that S is a compact metric space with a closed partial order, a least element a, and a greatest element b. They supposed that P is monotone, and that there exists an $\bar x \in S$ and an m ∈ ℕ such that

(5.11)\begin{align} \label{eq:mmc} P^m(a, [\bar x, b]) > 0 \quad \text{and} \quad P^m(b, [a, \bar x]) > 0.\end{align}

In this setting, they showed that P has a unique stationary distribution π and $\mu P^t \buildrel {\rm w} \over \longrightarrow \pi$ for any $\mu \in \cal P$ as t → ∞. This result can be obtained from Corollary 5.1. Under the stated assumptions, S is Polish and Assumption 4.1 is satisfied. The coefficient σ(Pm) is strictly positive because, if we let {Xt} and {Yt} be independent copies of the Markov chain starting at b and a, respectively, then, since $ (X_m, Y_m) \in {\cal C}(P^m_b, P^m_a)$, we have

\begin{align*} \alpha_O(P^m_b, P^m_a) \geq \mathbb P\{X_m \leq Y_m\} \geq \mathbb P\{X_m \leq \bar x \leq Y_m\} = \mathbb P\{X_m \leq \bar x \} \mathbb P\{\bar x \leq Y_m\}.\end{align*}

The last term is strictly positive by (5.11). Positivity of $\sigma(P^m)$ now follows from Lemma 5.1, since a ⪯ x ⪯ b and a ⪯ y ⪯ b for all x, y ∈ S. Hence, by Corollary 5.1, there exists a unique stationary distribution π and $\gamma(\mu P^t, \pi) \to 0$ as t → ∞ for any $\mu \in \cal P$. This convergence implies weak convergence by Proposition 4.1 and compactness of S.

Appendix A

In this appendix we collect the remaining proofs. Throughout, in addition to notation defined above, $cbS_0$ denotes the set of all continuous functions h: S → [0, 1], while

\begin{align*} g(\mu, \nu) \,:\!= \| \mu \| - \alpha_O(\mu, \nu) \quad \text{for each $\mu, \nu \in \cal M$}. \end{align*}

A.1. Proofs of Section 3 results

Proof of Lemma 3.1. For the first equality, fix $\lambda \in \cal M_s$ and let

\begin{align*} s(\lambda) \,:\!= \sup_{I \in i\cal B} \lambda(I) \quad \text{and} \quad b(\lambda) \,:\!= \sup_{h \in iH_0} \lambda(h).\end{align*}

Since ${\bf 1}_I \in iH_0$ for all $I \in i\cal B$, we have b(λ) ≥ s(λ). To see the reverse inequality, let $h \in iH_0$. Fix n ∈ ℕ. Let $r_j = j/n$ for j = 0, …, n. Define $h_n \in iH_0$ by

\begin{align*} h_{n}(x) = \max\{r \in \{r_{0}, \ldots, r_{n}\} \colon r \leq h(x)\}.\end{align*}

Since $h \leq h_n + 1/n$, we have

(A.1)\begin{align} \label{calpie} \lambda (h) \leq \lambda (h_{n}) + \frac{\|\lambda\|}{n}.\end{align}

For j = 0, …, n, let $I_{j} \,:\!= \{x \in S \colon h_{n}(x) \geq r_{j}\} \in i\cal B$. Note that

(A.2)\begin{align} \label{capable} I_{n} = \{x \in S \colon h_{n}(x) = 1\} \subset I_{n-1} \subset \cdots \subset I_{0} = S.\end{align}

We have

$$\lambda (h_{n} )= \lambda(I_{n}) + \sum_{j=1}^{n} r_{n-j} \lambda(I_{n-j} \setminus I_{n-j+1}).$$

We define $f_0, \ldots, f_{n-1} \in iH_0$ and $A_{0}, \ldots, A_{n-1} \in i\cal B$ as follows. Let $f_0 = h_n$ and $A_0 = I_n$. Evidently,

\begin{align*} \lambda (\kern2pt f_{0}) \geq \lambda (h_{n}), \qquad f_{0}(x) = 1 \quad\text{for all } x \in A_{0}, \qquad f_{0}(x) = r_{n-1} \quad\text{for all } x \in I_{n-1} \setminus A_{0}.\end{align*}

Now suppose that, for some j ∈ {0, 1, …, n − 2}, we have

(A.3)\begin{align} \label{calfeutrage} \begin{gathered} \lambda (\kern2pt f_{j}) \geq \lambda (h_{n}),\qquad f_{j}(x) = 1 \quad\text{for all } x \in A_{j}, \\ f_{j}(x) = r_{n-j-1} \quad\text{for all } x \in I_{n-j-1} \setminus A_{j}.\end{gathered}\end{align}

If $\lambda(I_{n-j-1} \setminus A_j) > 0$ then define

\begin{align*} f_{j+1}(x) = \begin{cases} 1 & \text{if $x \in I_{n-j-1} \setminus A_{j}$,} \\ f_{j}(x) & \text{otherwise,}\end{cases} \quad \text{and} \quad A_{j+1} = I_{n-j-1}.\end{align*}

Note that in this case

\begin{gather*} \lambda (\kern2pt f_{j+1}) - \lambda (\kern2pt f_{j} ) = (1-r_{n-j-1}) \lambda(I_{n-j-1} \setminus A_{j}) > 0, \\ \label{calfater} f_{j+1}(x) = r_{n-j-2}\quad\text{for all } x \in I_{n-j-2} \setminus A_{j+1}. \end{gather*}

If $\lambda(I_{n-j-1} \setminus A_j) \leq 0$ then define

\begin{align*} f_{j+1}(x) = \begin{cases} r_{n-j-2} & \text{if $x \in I_{n-j-1} \setminus A_{j}$,} \\ f_{j}(x) & \text{otherwise,} \end{cases} \quad \text{and} \quad A_{j+1} = A_{j}. \end{align*}

In this case $\lambda(f_{j+1}) - \lambda(f_j) = (r_{n-j-2} - r_{n-j-1})\lambda(I_{n-j-1} \setminus A_j) \geq 0$. We also have $f_{j+1}(x) = r_{n-j-2}$ for all $x \in I_{n-j-2} \setminus A_{j+1}$ by construction. Thus, in both cases, we have (A.3) with j replaced by j + 1. Continuing in this way, we see that (A.3) holds for all j = 0, …, n − 1.

Let j = n − 1 in (A.3). From the definition of $r_j$ and (A.2) we have $r_0 = 0$ and $I_0 = S$. Thus,

\begin{align*} \begin{gathered} \lambda (\kern2pt f_{n-1}) \geq \lambda (h_{n}),\qquad f_{n-1}(x) = 1 \quad\text{for all }x \in A_{n-1}, \\ f_{n-1}(x) = 0 \quad\text{for all } x \in S \setminus A_{n-1}.\end{gathered} \end{align*}

Since $f_{n-1} = {\bf 1}_{A_{n-1}}$ and $A_{n-1} = I_j$ for some j ∈ {0, …, n − 1}, recalling (A.1) we have

\begin{align*} \lambda (h) - \frac{\|\lambda\|}{n} \leq \lambda (h_{n} ) \leq \lambda(A_{n-1}) \leq s(\lambda).\end{align*}

Applying $\sup_{h \in iH_0}$ to the leftmost side, we see that $b(\lambda) - \|\lambda\|/n \leq s(\lambda)$. Since this is true for any n ∈ ℕ, we obtain b(λ) ≤ s(λ).

The claim (3.1) follows from |a| = max{a, −a} and interchange of max and sup.

Proof of Theorem 3.1. Let (X, Y) be a coupling of (μ, ν), and define

\begin{align*} \mu'(B) \,:\!= \mathbb P\{X \in B,\, X \preceq Y\} \quad \text{and} \quad \nu'(B) \,:\!= \mathbb P\{Y \in B,\, X \preceq Y\}.\end{align*}

Clearly, μ′ ≤ μ, ν′ ≤ ν, and $\mu'(S) = \mathbb P\{X \preceq Y\} = \nu'(S)$. Moreover, for any increasing set $I \in i\cal B$, we have μ′(I) ≤ ν′(I). Hence, (μ′, ν′) ∈ Φ(μ, ν) and $\mathbb P\{X \preceq Y\} = \mu'(S) \leq \alpha_O(\mu, \nu)$. We now exhibit a coupling such that equality is attained. In doing so, we can assume that $a := \alpha_O(\mu, \nu) > 0$. If not then, for any $(X,Y) \in \cal C(\mu, \nu)$, we have $0 \leq \mathbb P\{X \preceq Y\} \leq \alpha_O(\mu, \nu) = 0$.

To begin, observe that, by Proposition 3.1, there exists a pair (μ′, ν′) ∈ Φ(μ, ν) with μ′(S) = ν′(S) = a. Let

$$\mu^r \,:\!= \frac{\mu - \mu'}{1 - a} \quad\text{and}\quad \nu^r \,:\!= \frac{\nu - \nu'}{1 - a}.$$

By construction, μr, νr, μ′/a, and ν′/a are probability measures satisfying

\begin{align*} \mu = (1 - a) \mu^r + a \Big(\frac{\mu'}{a}\Big) \quad \text{and} \quad \nu = (1 - a) \nu^r + a \Big(\frac{\nu'}{ a}\Big).\end{align*}

We construct a coupling (X, Y) as follows. Let U, X′, Y′, Xr, and Yr be random variables on a common probability space such that

  1. (a) $X' {\mathop = \limits^{\rm D}} \mu' / a$, $Y' {\mathop = \limits^{\rm D}} \nu' / a$, $X^r {\mathop = \limits^{\rm D}} \mu^r$, and $Y^r {\mathop = \limits^{\rm D}} \nu^r$,

  2. (b) U is uniform on [0, 1] and independent of (X′, Y′, Xr, Yr), and

  3. (c) ℙ{X′ ⪯ Y′} = 1.

The pair in (c) can be constructed via the Nachbin–Strassen theorem [Reference Kamae, Krengel and O’Brien19, Theorem 1], since $\mu'/a \preceq_{sd} \nu'/a$. Now let

\begin{align*} X \,:= {\bf 1}\{U \leq a\} X' + {\bf 1}\{U > a\} X^r \quad \text{and} \quad Y \,:= {\bf 1}\{U \leq a\} Y' + {\bf 1}\{U > a\} Y^r.\end{align*}

Evidently, $ (X,Y) \in \cal C(\mu, \nu) $. Moreover, for this pair, we have

\begin{align*} \mathbb P\{X \preceq Y\} \geq \mathbb P\{X \preceq Y, \; U \leq a \} = \mathbb P\{X' \preceq Y', \; U \leq a \} .\end{align*}

By independence, the right-hand side is equal to ℙ{X′ ⪯ Y′} ℙ{Ua} = a, so

$$\mathbb P\{X \preceq Y\} \geq a \,:\!= \alpha_O(\mu, \nu).$$

We conclude that

(A.4)\begin{align} \label{eq:affosmo} \alpha_O(\mu, \nu) = \max_{(X, Y) \in \cal C(\mu, \nu)} \mathbb P\{ X \preceq Y \}.\end{align}

Next, observe that, for any $ (X, Y) \in \cal C(\mu, \nu) $ and hibS, we have

\begin{align*} \mu(h) - \nu( h) & = {\mathbb E} h(X) - {\mathbb E} h(Y) \\ & = {\mathbb E} [ h(X) - h(Y)] {\bf 1}\{ X \preceq Y \} + {\mathbb E} [ h(X) - h(Y)] {\bf 1}\{ X \npreceq Y \} \\ & \leq {\mathbb E} [ h(X) - h(Y)] {\bf 1}\{ X \npreceq Y \} .\end{align*}

Specializing to $h = {\bf 1}_I$ for some $I \in i\cal B$, we have $\mu(I) - \nu(I) \leq \mathbb P\{ X \npreceq Y \} = 1 - \mathbb P\{ X \preceq Y \}$. From this bound and (A.4), the proof of (3.6) will be complete if we can show that

(A.5)\begin{align} \label{eq:ci3} \sup_{(X, Y) \in \cal C(\mu, \nu)} \mathbb P\{X \preceq Y\} \geq 1 - \sup_{I \in i\cal B} \{ \mu(I) - \nu(I) \}.\end{align}

To prove (A.5), let $\cal B \otimes \cal B$ be the product σ-algebra on S × S and let $\pi_i$ be the ith coordinate projection, so that $\pi_1(x, y) = x$ and $\pi_2(x, y) = y$ for any (x, y) ∈ S × S. As usual, given Q ⊂ S × S, we let $\pi_1(Q)$ be the set of all x ∈ S such that (x, y) ∈ Q for some y ∈ S, and similarly for $\pi_2$. Recall that $\mathfrak C$ is the closed sets in S and $d\mathfrak C$ is the decreasing sets in $\mathfrak C$. Strassen's theorem [Reference Strassen33] implies that, for any ∊ ≥ 0 and any closed set K ⊂ S × S, there exists a probability measure ξ on $(S \times S, \cal B \otimes \cal B)$ with marginals μ and ν such that ξ(K) ≥ 1 − ∊ whenever

\begin{align*} \nu(F) \leq \mu(\pi_1(K \cap (S \times F))) + \epsilon \quad\text{for all } F \in \mathfrak C.\end{align*}

Note that if $F \in \mathfrak C$ then, since '⪯' is a closed partial order, the smallest decreasing set d(F) containing F is also closed. Let $\epsilon \,:\!= \sup_{D \in d\mathfrak C} \{ \nu(D) - \mu(D) \}$, so that

\begin{align*} \epsilon \geq \sup_{F \in \mathfrak C} \{ \nu(d(F)) - \mu(d(F)) \} \geq \sup_{F \in \mathfrak C} \{ \nu(F) - \mu(d(F)) \}. \end{align*}

Noting that d(F) can be expressed as $\pi_1(\mathbb G \cap (S \times F))$, it follows that, for any $F \in \mathfrak C$,

\begin{align*} \nu(F) \leq \mu(\pi_1(\mathbb G \cap (S \times F))) + \epsilon.\end{align*}

Since ‘⪯’ is closed, $\mathbb G$ is closed, and Strassen’s theorem applies. From this theorem we obtain a probability measure ξ on the product space S × S such that $\xi(\mathbb G) \geq 1 - \epsilon$ and ξ has marginals μ and ν.

Because complements of increasing sets are decreasing and vice versa, we have

(A.6)\begin{align} \label{eq:fis} \sup_{I \in i\cal B} \{ \mu(I) - \nu(I) \} \geq \sup_{D \in d\mathfrak C} \{ \nu(D) - \mu(D) \} = \epsilon \geq 1 - \xi(\mathbb G). \end{align}

Now consider the probability space $(\Omega, \cal F, \mathbb P) = (S \times S, \cal B \otimes \cal B, \xi)$, and let $X = \pi_1$ and $Y = \pi_2$. We then have $\xi(\mathbb G) = \xi\{(x,y) \in S \times S : x \preceq y\} = \mathbb P\{X \preceq Y\}$. Combining this equality with (A.6) implies (A.5).

A.2. Proofs of Section 4 results

We begin with an elementary lemma.

Lemma A.1. For any μ, $\nu \in \cal M$, we have μ ≤ ν whenever μ(h) ≤ ν(h) for all $h \in cbS_0$.

Proof. Suppose that μ(h) ≤ ν(h) for all $h \in cbS_0$. We claim that

(A.7)\begin{align} \label{calamite} \text{$\mu(F) \leq \nu(F)$ for any closed set $F \subset S$.} \end{align}

To see this, let ρ be a metric compatible with the topology of S and let F be any closed subset of S. For ∊ > 0, let $f_\epsilon(x) := \max\{1 - \rho(x, F)/\epsilon,\, 0\}$ for x ∈ S, where $\rho(x, F) = \inf_{y \in F} \rho(x, y)$. Since ρ(·, F) is continuous and $0 \leq f_\epsilon \leq 1$, we have $f_\epsilon \in cbS_0$. Let $F_\epsilon = \{x \in S : \rho(x, F) < \epsilon\}$ for ∊ > 0. Note that $f_\epsilon(x) = 1$ for all x ∈ F, and that $f_\epsilon(x) = 0$ for all $x \notin F_\epsilon$. Thus,

(A.8)\begin{align} \label{calamine} \mu(F) \leq \mu (\kern2pt f_{\epsilon}) \leq \nu (\kern2pt f_{\epsilon}) \leq \nu(F_{\epsilon}).\end{align}

Since $F = \bigcap_{\epsilon > 0} F_\epsilon$, we have $\lim_{\epsilon \downarrow 0} \nu(F_\epsilon) = \nu(F)$, so letting ∊ ↓ 0 in (A.8) yields μ(F) ≤ ν(F). Hence, (A.7) holds.

Let $B \in \cal B$ and fix ∊ > 0. Since all finite Borel measures on a Polish space are regular, there exists a closed set $F \subset B$ such that μ(B) ≤ μ(F) + ∊. Thus, by (A.7), we have μ(B) ≤ μ(F) + ∊ ≤ ν(F) + ∊ ≤ ν(B) + ∊. Since ∊ > 0 was arbitrary, this yields μ(B) ≤ ν(B). Hence, $\mu \leq \nu$.
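The bump functions $f_\epsilon$ used above are easy to picture. As a minimal sketch, assuming the illustrative choices S = ℝ and F = [0, 1] (which are not part of the lemma), the following code checks the pointwise sandwich ${\bf 1}_F \leq f_\epsilon \leq {\bf 1}_{F_\epsilon}$ that delivers (A.8) upon integrating against μ and ν.

\begin{verbatim}
# Pointwise sandwich behind (A.8), on the toy example S = R, F = [0, 1].
import numpy as np

def rho_F(x):                           # distance from x to F = [0, 1]
    return np.maximum(np.maximum(-x, x - 1.0), 0.0)

def f_eps(x, eps):                      # f_eps(x) = max{1 - rho(x, F)/eps, 0}
    return np.maximum(1.0 - rho_F(x) / eps, 0.0)

xs, eps = np.linspace(-1.0, 2.0, 601), 0.25
ind_F = ((xs >= 0.0) & (xs <= 1.0)).astype(float)   # indicator of F
ind_F_eps = (rho_F(xs) < eps).astype(float)         # indicator of F_eps
assert np.all(ind_F <= f_eps(xs, eps))              # f_eps = 1 on F
assert np.all(f_eps(xs, eps) <= ind_F_eps)          # f_eps = 0 off F_eps
\end{verbatim}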

Proof of Theorem 4.1. Let $\{\mu_n\}$ be a Cauchy sequence in ($\cal P, \gamma $). Our first claim is that $\{\mu_n\}$ is tight. To show this, fix ∊ > 0. Let $\mu \,:\!= \mu_N$ be such that

(A.9)\begin{align} \label{eq:ngn} n \geq N \quad \Rightarrow \quad \gamma(\mu, \mu_n) < \epsilon.\end{align}

Let K be a compact subset of S such that μ(K) ≥ 1 − ∊ and let $\bar K \,:\!= i(K) \cap d(K)$. We have

\begin{align*} \mu_n(\bar K^c) = \mu_n( i(K)^c \cup d(K)^c ) \leq \mu_n( i(K)^c ) + \mu_n(d(K)^c ). \end{align*}

For $n \geq N$, this bound, (4.3), (A.9), and the definition of K yield

\begin{align*} \mu_n(\bar K^c) < \mu( i(K)^c ) + \mu(d(K)^c ) + 2\epsilon \leq \mu( K^c ) + \mu(K^c ) + 2\epsilon \leq 4 \epsilon. \end{align*}

Hence, $\{\mu_n\}_{n \geq N}$ is tight. It follows that $\{\mu_n\}_{n \geq 1}$ is likewise tight. As a result, by Prokhorov’s theorem, it has a subsequence that converges weakly to some $\mu^{*} \in \cal P$. We aim to show that $\gamma(\mu_n, \mu^*) \to 0$.

To this end, fix ϵ > 0 and let $n_\epsilon$ be such that $\gamma(\mu_m, \mu_{n_\epsilon}) < \epsilon$ whenever $m \geq n_\epsilon$. Fix $m \geq n_\epsilon$ and let $\nu \,:\!= \mu_m$. For all $n \geq n_\epsilon$, we have $\gamma(\nu, \mu_n) < \epsilon$. Fixing any such $n \geq n_\epsilon$, we observe that, since $g(\mu_n, \nu) < \epsilon$, there exists $(\tilde{\mu}_{n}, \tilde{\nu}_{n}) \in \Phi(\mu_{n}, \nu)$ with $\|\tilde{\mu}_{n}\| = \|\tilde{\nu}_{n}\| > 1-\epsilon$. Multiplying $\tilde{\mu}_{n}$ and $\tilde{\nu}_{n}$ by $ (1-\epsilon)/\|\tilde{\mu}_n\| < 1 $ and denoting the resulting measures by $\tilde{\mu}_{n}$ and $\tilde{\nu}_{n}$ again, we have

(A.10)\begin{align} \label{calembour} \tilde{\mu}_{n} \leq \mu_{n}, \qquad \tilde{\nu}_{n} \leq \nu, \qquad \|\tilde{\mu}_{n}\| = \|\tilde{\nu}_{n}\| = 1-\epsilon, \qquad \tilde{\mu}_{n} \preceq_{sd} \tilde{\nu}_{n}. \end{align}

Note that $\{\tilde{\nu}_{n}\}$ is tight, since $\tilde{\nu}_{n} \leq \nu$ for all n by (A.10). Since $\{\mu_n\}$ is tight, so is $\{\tilde{\mu}_{n}\}$. Thus, there exist subsequences $\{\mu_{n_{i}}\}_{i \in \mathbb N}, \{\tilde{\mu}_{n_{i}}\}_{i \in \mathbb N}$, and $\{\tilde{\nu}_{n_{i}}\}_{i \in \mathbb N}$ of $\{\mu_n\}$, $\{\tilde{\mu}_{n}\}$, and $\{\tilde{\nu}_{n}\}$, respectively, such that, for some $\tilde{\mu}^{*}, \tilde{\nu}^{*} \in \cal M$ with $\|\tilde{\mu}^{*}\| = \|\tilde{\nu}^{*}\| = 1-\epsilon$, we have

\begin{align*} \mu_{n_{i}} {\buildrel {\rm w} \over \longrightarrow} \mu^{*}, \qquad \tilde{\mu}_{n_{i}} {\buildrel {\rm w} \over \longrightarrow} \tilde{\mu}^{*}, \qquad \tilde{\nu}_{n_{i}} {\buildrel {\rm w} \over \longrightarrow} \tilde{\nu}^{*}, \qquad \tilde{\mu}_{n_{i}} \preceq_{sd} \tilde{\nu}_{n_{i}} \quad\text{for all } i \in \mathbb N. \end{align*}

Given $h \in cbS_0$, since $\tilde{\mu}_{n_{i}} (h) \leq \mu_{n_{i}} (h)$ and $\tilde{\nu}_{n_{i}} (h) \leq \nu (h)$ for all $i \in \mathbb N$ by (A.10), we have $\tilde{\mu}^{*} (h) \leq \mu^{*} (h)$ and $\tilde{\nu}^{*} (h) \leq \nu (h)$ by weak convergence. Thus, $\tilde{\mu}^{*} \leq \mu^{*}$ and $\tilde{\nu}^{*} \leq \nu$ by Lemma A.1. We have $\tilde{\mu}^{*} \preceq_{sd} \tilde{\nu}^{*}$ by [Reference Kamae, Krengel and O’Brien19, Proposition 3]. It follows that $ (\tilde{\mu}^{*}, \tilde{\nu}^{*}) \in \Phi(\mu^{*}, \nu) $, and, hence, $g(\mu^{*}, \nu) \leq 1 - \|\tilde{\mu}^{*}\| = \epsilon$.

By a symmetric argument, we also have $g(\nu, \mu^*) \leq \epsilon$. Hence, $\gamma(\nu, \mu^*) \leq 2\epsilon$. Recalling the definition of ν, we have now shown that, for all $m \geq n_\epsilon$, $\gamma(\mu_m, \mu^*) \leq 2\epsilon$. Since ϵ was arbitrary, this concludes the proof.
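To make the metric in Theorem 4.1 more tangible, the following sketch computes γ for distributions on a finite, totally ordered state space. It assumes, consistent with the identities in (3.6) used above, that $g(\mu, \nu) = \sup_{I \in i\cal B} \{\mu(I) - \nu(I)\}$ for probability measures, so that on {0, …, n − 1} with the usual order only the upper tails matter.

\begin{verbatim}
# gamma(mu, nu) = g(mu, nu) + g(nu, mu) on S = {0, ..., n-1}, where the
# increasing sets are the upper tails {k, ..., n-1} (plus the empty set).
import numpy as np

def g(mu, nu):
    tails = (mu - nu)[::-1].cumsum()[::-1]  # (mu - nu)({k, ..., n-1})
    return max(tails.max(), 0.0)            # the empty set contributes 0

def gamma(mu, nu):
    return g(mu, nu) + g(nu, mu)

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
print(g(mu, nu), g(nu, mu), gamma(mu, nu))  # 0.0, 0.3, 0.3
# g(mu, nu) = 0 because mu is stochastically dominated by nu here.
\end{verbatim}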

A.3. Proofs of Section 5 results

We begin with some lemmas.

Lemma A.2. If P is monotone then $\sigma(P) = \inf_{(\mu, \nu) \in \cal P \times \cal P} \alpha_O(\mu P, \nu P)$.

Proof. Let P be a monotone Markov kernel. It suffices to show that the inequality $\sigma(P) \leq \inf_{(\mu, \nu) \in \cal P \times \cal P} \alpha_O(\mu P, \nu P)$ holds, since the reverse inequality is obvious. By the definition of σ(P) and the identities in (3.6), the claim will be established if we can show that

(A.11)\begin{align} \label{eq:eled} \sup_{x,y} \sup_{I \in i\cal B} \{ P(x, I) - P(y, I) \} \geq \sup_{\mu,\nu} \sup_{I \in i\cal B} \{ \mu P(I) - \nu P(I) \}, \end{align}

where x and y are chosen from S and μ and ν are chosen from $\cal P$. Let s be the value of the right-hand side of (A.11) and let ϵ > 0. Fix $\mu, \nu \in \cal P$ and $I \in i\cal B$ such that $\mu P(I) - \nu P(I) > s - \epsilon$, or, equivalently,

\begin{align*} \int \{ P(x, I) - P(y, I) \} (\mu \times \nu)({\rm d} x, {\rm d} y) > s - \epsilon. \end{align*}

From this expression we see that there are $\bar x, \bar y \in S$ such that $P(\bar x, I) - P(\bar y, I) > s - \epsilon$. Hence, $\sup_{x,y} \sup_{I \in i\cal B} \{ P(x, I) - P(y, I) \} > s - \epsilon$; since ϵ > 0 was arbitrary, the left-hand side of (A.11) is at least s, as was to be shown.
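For completeness, the equivalence invoked between the last two displays holds because each kernel term integrates out the coordinate it does not depend on, μ and ν being probability measures:

\begin{align*} \int \{ P(x, I) - P(y, I) \} (\mu \times \nu)({\rm d} x, {\rm d} y) = \int P(x, I) \, \mu({\rm d} x) - \int P(y, I) \, \nu({\rm d} y) = \mu P(I) - \nu P(I). \end{align*}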

Lemma A.3. If $\mu, \nu \in \cal M$ and $(\tilde{\mu}, \tilde{\nu})$ is an ordered component pair of (μ, ν), then

\begin{align*} g(\mu P, \nu P) \leq g((\mu - \tilde{\mu}) P, (\nu - \tilde{\nu}) P). \end{align*}

Proof. Fix $\mu, \nu \in \cal M$ and $(\tilde{\mu}, \tilde{\nu}) \in \Phi(\mu, \nu)$. Consider the residual measures $\hat{\mu} \,:\!= \mu - \tilde{\mu} $ and $\hat{\nu} \,:\!= \nu - \tilde{\nu}$. Let $(\mu', \nu')$ be a maximal ordered component pair of ($\hat{\mu} P, \hat{\nu} P$), and define

\begin{align*} \mu^{*} \,:\!= \mu' + \tilde{\mu} P \quad \text{and} \quad \nu^{*} \,:\!= \nu' + \tilde{\nu} P.\end{align*}

We claim that $(\mu^*, \nu^*)$ is an ordered component pair for (μP, νP). To see this, note that

\begin{align*} \mu^{*} = \mu' + \tilde{\mu} P \leq \hat{\mu} P + \tilde{\mu} P = (\hat{\mu} + \tilde{\mu}) P = \mu P, \end{align*}

and, similarly, $\nu^* \leq \nu P$. The measures $\mu^*$ and $\nu^*$ also have the same mass, since

\begin{align*} \|\mu^{*}\| = \|\mu'\| + \|\tilde{\mu} P\| = \|\mu'\| + \|\tilde{\mu}\| = \|\nu'\| + \|\tilde{\nu}\| = \|\nu'\| + \|\tilde{\nu} P\| = \|\nu^{*}\|.\end{align*}

Moreover, since $\tilde{\mu} \preceq_{sd} \tilde{\nu}$ and P is monotone, we have $\tilde{\mu} P \preceq_{sd} \tilde{\nu} P$. Combining this with $\mu' \preceq_{sd} \nu'$ yields $\mu^* = \mu' + \tilde{\mu} P \preceq_{sd} \nu' + \tilde{\nu} P = \nu^*$, completing the claim that $(\mu^*, \nu^*)$ is an ordered component pair for (μP, νP). As a result,

\begin{align*} g(\mu P, \nu P) & \leq \|\mu P\| - \|\mu^{*}\| \\ &= \|\mu P\| - \|\mu'\| - \|\tilde{\mu} P\| \\ & = \|\mu\| - \|\mu'\| - \|\tilde{\mu}\| \\ & = \|\hat{\mu}\| - \|\mu'\| \\ & = \|\hat{\mu} P\| - \|\mu'\| \\ &= g(\hat{\mu} P, \hat{\nu} P). \end{align*}

Proof of Theorem 5.1. Let μ, $\nu \in \cal P$. Let ($\tilde{\mu}, \tilde{\nu}$) be a maximal ordered component pair for (μ, ν). Let $\hat{\mu} = \mu - \tilde{\mu}$ and $\hat{\nu} = \nu - \tilde{\nu}$ be the residuals. Since $\|\tilde{\mu}\| = \|\tilde{\nu}\|$, we have $\|\hat{\mu}\| = \|\hat{\nu}\|$. Suppose first that $\|\hat{\mu}\| > 0$. Then $\hat{\mu} P/\|\hat{\mu}\|$ and $\hat{\nu} P/\|\hat{\mu}\|$ are both in $\cal P$. Thus, by Lemma A.2,

\begin{align*} 1 - \alpha_O\left(\frac{\hat{\mu} P }{\|\hat{\mu}\|}, \frac{\hat{\nu} P}{\|\hat{\mu}\|}\right) \leq 1 - \sigma(P).\end{align*}

Applying the positive homogeneity property in Lemma 3.3 yields

\begin{align*} \|\hat{\mu}\| - \alpha_O(\hat{\mu} P , \hat{\nu} P) \leq (1 - \sigma(P)) \, \|\hat{\mu}\|. \end{align*}

Note that this inequality trivially holds if $\|\hat{\mu}\| = 0$. Since $(\tilde{\mu}, \tilde{\nu})$ is a maximal ordered component pair, we have $g(\mu, \nu) = \|\hat{\mu}\|$, so, from the definition of g, we can write the same inequality as $g(\hat{\mu} P, \hat{\nu} P) \leq (1 - \sigma(P)) g(\mu, \nu)$. If we apply Lemma A.3 to the latter we obtain

\begin{align*} g(\mu P, \nu P) \leq (1 - \sigma(P)) g(\mu, \nu). \end{align*}

Reversing the roles of μ and ν, we also have g(νP, μP) ≤ (1 − σ(P))g(ν, μ). Thus,

\begin{align*} \gamma(\mu P, \nu P) & = g(\mu P, \nu P) + g(\nu P, \mu P) \\ & \leq (1 - \sigma(P)) [g(\mu, \nu) + g(\nu, \mu)] \\ & = (1 - \sigma(P)) \gamma(\mu, \nu), \end{align*}

verifying the claim in (5.8).
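The contraction in (5.8) can also be observed numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it builds a simple monotone kernel on S = {0, …, n − 1} with the usual order (a ‘lazy’ mixture of the identity with a fixed distribution q, an invented example), computes σ(P) via Lemma A.2 together with the tail formula for g used earlier, and checks (5.8) on random pairs of distributions. For this particular kernel, $\mu P - \nu P = \lambda(\mu - \nu)$, so the bound holds with equality.

\begin{verbatim}
# Numerical check of gamma(mu P, nu P) <= (1 - sigma(P)) gamma(mu, nu)
# on a finite, totally ordered state space (toy illustration only).
import numpy as np

rng = np.random.default_rng(0)
n, lam = 5, 0.6
q = rng.dirichlet(np.ones(n))                         # fixed mixing distribution
P = (1 - lam) * np.tile(q, (n, 1)) + lam * np.eye(n)  # monotone Markov kernel

def g(mu, nu):                                        # sup over upper tails
    tails = (mu - nu)[::-1].cumsum()[::-1]
    return max(tails.max(), 0.0)

def gamma(mu, nu):
    return g(mu, nu) + g(nu, mu)

# sigma(P) = inf over (x, y) of alpha_O(P_x, P_y) = 1 - g(P_x, P_y)
sigma = min(1.0 - g(P[x], P[y]) for x in range(n) for y in range(n))

for _ in range(1000):                                 # check (5.8) on random pairs
    mu, nu = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
    assert gamma(mu @ P, nu @ P) <= (1 - sigma) * gamma(mu, nu) + 1e-12
print("modulus 1 - sigma(P) =", 1 - sigma)            # equals lam = 0.6 here
\end{verbatim}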

Proof of Lemma 5.4. To see that (5.9) holds, fix ξ > σ(P) and suppose first that σ(P) = 1. Then (5.9) holds because the right-hand side of (5.9) can be made strictly negative by choosing $\mu, \nu \in \cal P$ to be distinct. Now suppose that σ(P) < 1. It suffices to show that

(A.12)\begin{align} \label{eq:ev2} \text{for all } \epsilon > 0, \text{ there exist } x, y \in S \text{ such that } y \preceq x, \ x \npreceq y, \text{ and } \alpha_O(P_x, P_y) < \sigma(P) + \epsilon. \end{align}

Indeed, if we take (A.12) as valid, set $\epsilon \,:\!= \xi - \sigma(P)$, and choose x and y to satisfy the conditions in (A.12), then we have

\begin{align*} \gamma(P_x, P_y) = 2 - \alpha_O(P_x, P_y) - \alpha_O(P_y, P_x) > 2 - \xi - 1 = 1 - \xi = (1 - \xi) \gamma(\delta_x, \delta_y). \end{align*}

Therefore, (5.9) holds.

To show that (A.12) holds, fix ϵ > 0. We can use σ(P) < 1 and the definition of σ(P) as an infimum to choose a δ ∈ (0, ϵ) and points $\bar x, \bar y \in S$ such that $\alpha_O(P_{\bar x}, P_{\bar y}) < \sigma(P) + \delta < 1$. Note that $\bar x \preceq \bar y$ cannot hold here, because monotonicity would then give $P_{\bar x} \preceq_{sd} P_{\bar y}$ and, hence, $\alpha_O(P_{\bar x}, P_{\bar y}) = 1$, a contradiction. Thus, $\bar x \npreceq \bar y$. Let y be a lower bound of $\bar x$ and $\bar y$, and let $x \,:\!= \bar x$. We claim that (A.12) holds for the pair (x, y).

To see this, observe that, by the monotonicity result in Lemma 5.1 and $y \preceq \bar y$, we have

\begin{align*} \alpha_O(P_x, P_y) = \alpha_O(P_{\bar x}, P_y) \leq \alpha_O(P_{\bar x}, P_{\bar y}) < \sigma(P) + \delta < \sigma(P) + \epsilon. \end{align*}

Moreover, $y \preceq x$ because $x = \bar x$ and y is by definition a lower bound of $\bar x$ and $\bar y$. Finally, $x \npreceq y$ because otherwise $\bar x = x \preceq y \preceq \bar y$, contradicting the fact that $\bar x \npreceq \bar y$.

References

[1] Bhattacharya, R. and Lee, O. (1988). Asymptotics of a class of Markov processes which are not in general irreducible. Ann. Prob. 16, 1333–1347.
[2] Bhattacharya, R., Majumdar, M. and Hashimzade, N. (2010). Limit theorems for monotone Markov processes. Sankhyā A 72, 170–190.
[3] Chakraborty, S. and Rao, B. (1998). Completeness of the Bhattacharya metric on the space of probabilities. Statist. Probab. Lett. 36, 321–326.
[4] Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Review 41, 45–76.
[5] Dobrushin, R. L. (1956). Central limit theorem for nonstationary Markov chains. Theory Prob. Appl. 1, 65–80.
[6] Doeblin, W. (1937). Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples. Bull. Math. Soc. Roum. Sci. 39, (1) 57–115; (2) 3–61.
[7] Doeblin, W. (1938). Exposé de la théorie des chaînes simples constantes de Markov à un nombre fini d’états. Mathématique de l’Union Interbalkanique 2, 78–80.
[8] Doeblin, W. (1940). Éléments d’une théorie générale des chaînes simples constantes de Markoff. Ann. Sci. Éc. Norm. Supér. 57, 61–111.
[9] Doob, J. L. (1953). Stochastic Processes. Wiley, New York.
[10] Dubins, L. E. and Freedman, D. A. (1966). Invariant probabilities for certain Markov processes. Ann. Math. Statist. 37, 837–848.
[11] Dudley, R. (2002). Real Analysis and Probability. Cambridge University Press.
[12] Gibbs, A. L. and Su, F. E. (2002). On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435.
[13] Givens, C. R. and Shortt, R. M. (1984). A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31, 231–240.
[14] Harris, T. E. (1956). The existence of stationary measures for certain Markov processes. In Proc. Third Berkeley Symp. on Mathematical Statistics and Probability, Vol. 2, University of California Press, pp. 113–124.
[15] Hernández-Lerma, O. and Lasserre, J. (2003). Markov Chains and Invariant Probabilities. Springer.
[16] Hopenhayn, H. A. and Prescott, E. C. (1992). Stochastic monotonicity and stationary distributions for dynamic economies. Econometrica 60, 1387–1406.
[17] Jarner, S. and Tweedie, R. (2001). Locally contracting iterated functions and stability of Markov chains. J. Appl. Prob. 38, 494–507.
[18] Kamae, T. and Krengel, U. (1978). Stochastic partial ordering. Ann. Prob. 6, 1044–1049.
[19] Kamae, T., Krengel, U. and O’Brien, G. (1977). Stochastic inequalities on partially ordered spaces. Ann. Prob. 5, 899–912.
[20] Kamihigashi, T. and Stachurski, J. (2012). An order-theoretic mixing condition for monotone Markov chains. Statist. Probab. Lett. 82, 262–267.
[21] Kamihigashi, T. and Stachurski, J. (2014). Stochastic stability in monotone economies. Theoret. Econom. 9, 383–407.
[22] Lindvall, T. (2002). Lectures on the Coupling Method. Dover.
[23] Machida, M. and Shibakov, A. (2010). Monotone bivariate Markov kernels with specified marginals. Proc. Amer. Math. Soc. 138, 2187–2194.
[24] Markov, A. A. (1906). Extension of the law of large numbers to dependent quantities. Izv. Fiz.-Matem. Obsch. Kazan Univ. (2nd ser.) 15, 135–156.
[25] Meyn, S. P. and Tweedie, R. L. (2012). Markov Chains and Stochastic Stability. Springer Science & Business Media.
[26] Nummelin, E. (2004). General Irreducible Markov Chains and Non-Negative Operators. Cambridge University Press.
[27] Orey, S. (1959). Recurrent Markov chains. Pacific J. Math. 9, 805–827.
[28] Pitman, J. W. (1974). Uniform rates of convergence for Markov chain transition probabilities. Prob. Theory Relat. Fields 29, 193–227.
[29] Pollard, D. (2002). A User’s Guide to Measure Theoretic Probability. Cambridge University Press.
[30] Razin, A. and Yahav, J. A. (1979). On stochastic models of economic growth. Internat. Econom. Rev. 20, 599–604.
[31] Revuz, D. (2008). Markov Chains. Elsevier.
[32] Seneta, E. (1979). Coefficients of ergodicity: structure and applications. Adv. Appl. Prob. 11, 576–590.
[33] Strassen, V. (1965). The existence of probability measures with given marginals. Ann. Math. Statist. 36, 423–439.
[34] Thorisson, H. (2000). Coupling, Stationarity, and Regeneration. Springer, New York.
[35] Wu, W. B. and Shao, X. (2004). Limit theorems for iterated random functions. J. Appl. Prob. 41, 425–436.
[36] Yahav, J. A. (1975). On a fixed point theorem and its stochastic equivalent. J. Appl. Prob. 12, 605–611.
[37] Yosida, K. and Kakutani, S. (1941). Operator-theoretical treatment of Markoff’s process and mean ergodic theorem. Ann. Math. 42, 188–228.