Hostname: page-component-745bb68f8f-b6zl4 Total loading time: 0 Render date: 2025-02-06T01:45:37.128Z Has data issue: false hasContentIssue false

A note on the polynomial ergodicity of the one-dimensional Zig-Zag process

Published online by Cambridge University Press:  18 July 2022

Giorgos Vasdekis*
Affiliation:
University of Warwick
Gareth O. Roberts*
Affiliation:
University of Warwick
*
*Postal address: Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK.
*Postal address: Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK.
Rights & Permissions [Opens in a new window]

Abstract

We prove polynomial ergodicity for the one-dimensional Zig-Zag process on heavy-tailed targets and identify the exact order of polynomial convergence of the process when targeting Student distributions.

Type
Original Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

The Zig-Zag process is a piecewise deterministic Markov process (PDMP) that was recently used as a new way to construct Markov chain Monte Carlo (MCMC) algorithms. The one-dimensional Zig-Zag appeared in [Reference Bierkens and Roberts5] as a scaling limit of lifted Metropolis–Hastings (see [Reference Diaconis, Holmes and Neal12] and [Reference Turitsyn, Chertkov and Vucelja21]) applied to the Curie–Weiss model (see [Reference Levin, Luczak and Peres19]). The process was later extended to higher dimensions in [Reference Bierkens, Fearnhead and Roberts7] and has been proposed as a way to sample from posterior distributions in a Bayesian setting (see also [Reference Fearnhead, Bierkens, Pollock and Roberts14] and [Reference Vanetti, Bouchard-Côté, Deligiannidis and Doucet22]). Since then, its properties have been extensively studied in the literature (see e.g. [Reference Bierkens and Duncan4], [Reference Bierkens and Verduyn Lunel6], [Reference Bierkens, Kamatani and Roberts8], [Reference Bierkens, Nyquist and Schlottke9], etc.).

In [Reference Bierkens, Roberts and Zitt10] Bierkens, Roberts, and Zitt proved ergodicity and exponential ergodicity of the Zig-Zag process in arbitrary dimension, but a crucial assumption required for exponential ergodicity in their work is that the target density has exponential or lighter tails. In [Reference Vasdekis and Roberts23] Vasdekis and Roberts proved the converse result. The Zig-Zag sampler fails to be exponentially ergodic when the target distribution is heavy-tailed. On the other hand, it was shown in Theorem 1 of [Reference Bierkens, Roberts and Zitt10] that the process will converge to the invariant measure under very mild assumptions, including the heavy-tailed case. Furthermore, Andrieu, Dobson, and Wang [Reference Andrieu, Dobson and Wang1] used hypocoercivity techniques (see also [Reference Andrieu, Durmus, Nüsken and Roussel2]) to prove polynomial rates of convergence for the Zig-Zag process on heavy-tailed targets in arbitrary dimension.

In this note we will focus on the one-dimensional Zig-Zag process and prove polynomial ergodicity in the heavy-tailed scenario. The result applies in the special case where the tails of the target decay in the same manner as a Student distribution with $\nu$ degrees of freedom. In that case, we prove that the polynomial rate of convergence is arbitrarily close to $\nu$ but not more than $\nu$ . This improves upon the result stated in [Reference Andrieu, Dobson and Wang1] in the special case where $d=1$ , although their work provides convergence results for higher dimensions as well.

The rest of this paper is organised as follows. In Section 2 we recall the definition of the one-dimensional Zig-Zag process and state the main result concerning its polynomial ergodicity in heavy-tailed targets. In Section 3 we provide a proof of this result. Finally, in Section 4 we discuss the rates of polynomial convergence of the Zig-Zag process and compare them with those of other state-of-the-art Metropolis–Hastings algorithms.

2. Results

We begin by recalling the definition of the one-dimensional Zig-Zag process. Let $E = \mathbb{R} \times \{ -1,+1 \}$ , $U \in C^1(E)$ , and $\lambda\,:\, E \rightarrow \mathbb{R}_{\geq 0}$ with

\begin{equation*} \lambda(x,\theta)=[\theta U^{\prime}(x)]^++\gamma(x),\end{equation*}

where $\gamma$ is a non-negative integrable function and we write $a^+=\max\{ a,0 \}$ . The one-dimensional Zig-Zag process $(Z_t)_{t \geq 0}=(X_t,\Theta)_{t \geq 0}$ is a continuous-time Markov process with state space E which evolves as follows. If the process starts from $(x,\theta) \in E$ , then $X_t=x+t\theta$ and $\Theta_t=\theta$ for all $t < T_1$ , where $T_1$ is the first arrival time of a non-homogeneous Poisson process with rate $m_1(s)=\lambda(x+s\theta,\theta)$ . Then $X_{T_1}=x+T_1\theta$ , $\Theta_{T_1}=-\theta$ . Then $X_{t}=X_{T_1}+(t-T_1)\Theta_{T_1}$ and $\Theta_t=\Theta_{T_1}$ for all $t \in (T_1,T_1+T_2)$ , where $T_2$ is the first arrival time of a Poisson process with intensity $m_2(s)=\lambda(X_{T_1}+s\Theta_{T_1},\Theta_{T_1})$ . The process is then defined inductively up to time $T_n$ for all $n \in \mathbb{R}$ .

It was proved in [Reference Bierkens, Fearnhead and Roberts7] that with this choice of $\lambda$ , the process has measure $\mu$ as invariant, where

(1) \begin{equation} \mu=\pi \bigotimes \dfrac{1}{2}\delta_{\{-1,+1\}}\end{equation}

and

\begin{equation*}\pi({\mathrm{d}} x)=\dfrac{1}{Z}\exp\{ -U(x) \}{\mathrm{d}} x,\end{equation*}

with $Z=\int_{\mathbb{R}}\exp\{ -U(y) \}{\mathrm{d}} y<\infty$ .

Since this note concerns the polynomial ergodicity of the one-dimensional Zig-Zag process, we now recall the definition of polynomial ergodicity.

Definition 1. Let $(Z_t)_{t \geq 0}$ be a Markov process with state space E, having invariant probability measure $\mu$ , and let $k>0$ . We say that the process is polynomially ergodic of order k if there exists a function $M\,:\, E \rightarrow \mathbb{R}_{>0}$ such that for all $z \in E$ and $t \geq 0$ ,

\begin{equation*}\| \mathbb{P}_z ( Z_t \in \cdot ) - \mu ({\cdot}) \|_{TV} \leq \dfrac{M(z)}{t^k},\end{equation*}

where $\mathbb{P}_z$ denotes the law of the process starting from z.

We will make the following assumption, typically verified in practice.

Assumption 1. Assume that there exists a $\nu>0$ and a compact set $C \subset \mathbb{R}$ such that for all $x \notin C$ ,

(2) \begin{equation}|U^{\prime}(x)| \geq \dfrac{1+\nu}{|x|}.\end{equation}

Remark 1. This assumption directly implies that there exists a c such that for all $x \in \mathbb{R}$ , $U(x) \geq (1+\nu) \log\!(|x|)-c^{\prime}$ . This is an assumption made in [Reference Bierkens, Roberts and Zitt10] in order to prove non-evanescence of the Zig-Zag process, and it is a natural assumption given that the function $\exp\{ -U(x) \}$ must be integrable.

We also need the following assumption for the refresh rate $\gamma$ .

Assumption 2. Assume that the refresh rate satisfies

\begin{equation*}\lim_{|x|\rightarrow \infty}\dfrac{\gamma(x)}{ | U^{\prime}(x) |}=0.\end{equation*}

Assumption 2 ensures that the bouncing events will vastly outnumber the refresh events, at least in the tails of the target. While we have not been able to establish the necessity of this assumption for the conclusions of Theorem 1, some control over ${\gamma(x)}/{ | U^{\prime}(x) |}$ is definitely needed. Indeed, in the regime where $\lim_{|x|\rightarrow \infty}{\gamma(x)}/{ | U^{\prime}(x) |}=+\infty$ (as would be the case with constant refresh rate and heavy-tailed targets), random direction changes would outnumber systematic ones, leading to random walk/diffusive behaviour commonly associated with slow convergence. In fact, in this case, the algorithm would resemble a random walk Metropolis algorithm, which is known to converge at a slower polynomial rate. Therefore we believe that some control of the refresh rate relative to $|U^{\prime}|$ should be assumed for guarantees about specific rates of convergence to hold. This can be further supported by simulation studies. In Figure 1 we present the mean square error of estimating a tail probability ( $\mathbb{P}(X \geq 5)$ for a standard Cauchy target) using Zig-Zag with different refresh rates. We consider $\gamma(x)=0$ , $\gamma(x)=|U^{\prime}(x)|$ and $\gamma(x)=1$ . For each algorithm we generated 1000 independent realisations, all starting from $({-}5,+1)$ , and all realisations run until time $T=10^4$ . For each time less than T the average square error of the true probability (approximately equal to $0.0628$ ) is reported. It is clear that the smaller refresh rate leads to more rapid convergence.

Figure 1. Mean square error over time of three Zig-Zag algorithms with different refresh rates $\gamma(x)$ , described in the upper right corner. The algorithms target the Cauchy distribution and are used to estimate the probability mass the Cauchy distribution assigns to the event $[5,+\infty)$ . For all three algorithms, 1000 independent realisations were generated, and for each time less than $10^4$ , the average square error between the approximation of the probability and the actual probability (approximately $0.0628$ ) is reported. Evidently the fastest convergence is achieved by lowering the values of $\gamma$ .

Our main result is the following.

Theorem 1. (Polynomial ergodicity of Zig-Zag.) Suppose that U satisfies Assumption 1 and let C and $\nu>0$ as in (2). Suppose further that the refresh rate satisfies Assumption 2. Then, for any $k<\nu$ , there exist constants $B,\delta>0$ and $\beta \in (0,1)$ such that if we let

\begin{equation*} V_{\beta, \delta}(x,\theta)=\exp \{ \beta U(x) + \delta\ \mathrm{sgn}(x) \theta \},\end{equation*}

then for all $(x,\theta) \in \mathbb{R} \times \{ -1,+1 \}$ ,

(3) \begin{equation}\| \mathbb{P}_{x,\theta}(Z_t \in \cdot)-\mu({\cdot}) \|_{TV} \leq \dfrac{B V_{\beta, \delta}(x,\theta)}{t^{1+k}}+\dfrac{B}{t^{k}},\end{equation}

i.e. the process is polynomially ergodic of order k.

Remark 2. By carefully inspecting the proof of Theorem 1, we observe that Assumption 2 can be weakened in the following sense. If $\nu$ is as in Assumption 1 and if we fix $k<\nu$ , then in order to prove polynomial ergodicity of order k, it suffices that there exists a small $\eta >0$ and that we ask that

\begin{equation*} \lim_{|x| \rightarrow \infty}\dfrac{\gamma(x)}{|U^{\prime}(x)|} \leq M{,}\end{equation*}

where

(4) \begin{equation} M=M(k)=\biggl( \dfrac{k+1}{\nu +1}( 1+ \eta ) -\eta \biggr)\dfrac{\bigl( 1- \frac{1+k}{1+\nu}( 1+\eta ) \bigr)^{1+\eta}}{1- \bigl( 1- \frac{1+k}{1+\nu}( 1+\eta ) \bigr)^{1+\eta}}.\end{equation}

We observe, however, that M is not uniform in k. More precisely, assuming that $\eta$ is small enough so that $M(k)>0$ for all k, we have $\lim_{k \rightarrow \nu}M(k)=0$ . A proof of this remark will be given in Section 3.

An immediate corollary of Theorem 1 is the following characterisation of the order of polynomial convergence of the one-dimensional Zig-Zag on Student distributions. For the following lower bound on the total variation distance from the invariant measure, we use a type of argument similar to the proof of Theorem 2.1 of [Reference Vasdekis and Roberts23], suggested to us by Professor Anthony Lee in a private communication.

Corollary 1. Let $\nu>0$ and suppose $\pi$ is a Student distribution with $\nu$ degrees of freedom, that is,

(5) \begin{equation} \pi(x) = \dfrac{1}{Z} \biggl( 1+ \dfrac{x^2}{\nu} \biggr)^{\!-(\nu+1)/2}\end{equation}

and the Zig-Zag process targets $\mu$ as in (1), having a refresh rate that satisfies Assumption 2. For all $k<\nu$ , there exist $\beta \in (0,1)$ and $\delta , B>0$ such that for all $(x,\theta) \in \mathbb{R} \times \{ -1,+1 \}$ ,

\begin{equation*}\| \mathbb{P}_{x,\theta}(Z_t \in \cdot)-\mu({\cdot}) \|_{TV} \leq \dfrac{BV_{\beta, \delta}(x)}{t^{1+k}}+\dfrac{B}{t^{k}},\end{equation*}

that is, for all $k<\nu$ , the process is polynomially ergodic of order k. Furthermore, for any $k>\nu$ , the process is not polynomially ergodic of order k. More specifically, there exists a constant C such that for all $t>0$ large enough,

\begin{equation*}\|\mathbb{P}_{0,+1}(Z_t \in \cdot)-\mu({\cdot})\|_{TV} \geq \dfrac{C^{\prime}}{t^{\nu}}.\end{equation*}

Corollary 1 illustrates that Theorem 1 is close to being tight. At least for Student distributions with $\nu$ degrees of freedom, it is established that the process is polynomially ergodic for order $k<\nu$ and it is not for any order $k>\nu$ . Whether the process is polynomially ergodic of order $\nu$ is still an open question, but probably of limited practical importance.

3. Proofs

Before we prove Theorem 1 we recall the form of the strong generator of the Zig-Zag process.

Definition 2. We define $\mathcal{L}$ to be the operator acting on any $f \in C^1(E)$ so that for all $(x,\theta) \in E$ ,

\begin{equation*} \mathcal{L}f(x,\theta)=\theta f^{\prime}(x,\theta)+([\theta U^{\prime}(x)]^++\gamma(x))( f(x,-\theta)-f(x,\theta) ) .\end{equation*}

It can be proved (see e.g. [Reference Davis11]) that $\mathcal{L}$ is the restriction on $C^1$ of the strong generator of $(Z_t)_{t \geq 0}$ .

General conditions for polynomial ergodicity can be found in Theorems 3.2 and 3.4 of [Reference Douc, Fort and Guillin13] and Theorem 1.2 of [Reference Bakry, Cattiaux and Guillin3] (see also Corollary 6 of [Reference Fort and Roberts15] for some earlier results on sub-geometric convergence). Here we use a result found and proved in the presented form in the unpublished lecture notes by Hairer [Reference Hairer16].

Theorem 2. (Theorem 4.1 of [Reference Hairer16].) Let $(X_t)_{t \geq 0}$ a continuous-time Markov process on X with strong generator $\mathcal{L}$ . Suppose that there exists a function $V\,:\, X \rightarrow [1,+\infty)$ and a constant K such that for all $x \in X$ ,

(6) \begin{equation}\mathcal{L}V(x)\leq K- f(V)\end{equation}

for a function $f\,:\, [0,+\infty)\rightarrow [0,+\infty)$ which is strictly concave, increasing, with $f(0)=0$ , $\lim_{s \rightarrow +\infty}f(s)=+\infty$ . Suppose further that all the sub-level sets of V are pre-compact and small. Then the following hold.

  1. (i) There exists a unique invariant measure $\mu$ for the process such that $\int\! f(V(x)) \mu({\mathrm{d}} x)<\infty$ .

  2. (ii) Let $H_{f}(u)=\int_{1}^u1/f(s){\mathrm{d}} s$ . Then there exists a constant $B>0$ such that for every $x \in X$ ,

    \begin{equation*} \| \mathbb{P}_x(X_t \in \cdot)-\mu({\cdot}) \|_{TV} \leq \dfrac{B V(x)}{H^{-1}_{f}(t)}+\dfrac{B}{f \circ H^{-1}_{f}(t)}.\end{equation*}

Proof of Theorem 1.Suppose that U satisfies Assumption 1 and let $k<\nu$ . Select a such that $k=a/(1-a)$ , so that $a<1-1/(1+\nu)$ . For any $\tilde{\beta} \in (0,1)$ we see that there exists a $c_0>0$ such that for all $x \notin C$ ,

\begin{align*}\bigl( V_{\tilde{\beta}, \delta}(x,\theta) \bigr)^{1-a}|U^{\prime}(x)|\geq c_0 \exp \bigl\{ \tilde{\beta} (1-a)(1+\nu) \log\! |x| \bigr\}\dfrac{1+\nu}{|x|}=c_0(1+\nu) | x |^{\tilde{\beta} (1-a) (1+\nu)-1} .\end{align*}

Since $(1-a)(1+\nu)-1>0$ , there exists a $\beta$ close to 1 such that $\beta (1-a) (1+\nu)-1>0$ , so

(7) \begin{equation}\lim_{|x| \rightarrow \infty}V^{1-a}_{\beta}(x,\theta)|U^{\prime}(x)|=+\infty.\end{equation}

Now $V_{\beta, \delta} \in C^1$ and $\lim_{|x|\rightarrow \infty}V_{\beta, \delta}(x,\theta)=+\infty$ , so all the level sets are compact. Since the process is positive Harris recurrent and some skeleton is irreducible (see [Reference Bierkens, Roberts and Zitt10]), we get from Proposition 6.1 of [Reference Meyn and Tweedie20] that the level sets are also small. Since $\lim_{|x|\rightarrow \infty}U(x)=+\infty$ , it is clear that $V_{\beta, \delta}$ is bounded below away from 0, so by multiplying with an appropriate constant we can assume that $V_{\beta, \delta}(x,\theta)\geq 1$ for all $(x,\theta)$ . We calculate

\begin{equation*}\mathcal{L}V_{\beta, \delta}(x,\theta)=V_{\beta, \delta}(x,\theta)\bigl( \theta \beta U^{\prime}(x)+([\theta U^{\prime}(x)]^++\gamma(x))({\exp} \{ -2 \theta\ \mathrm{sgn}(x) \delta \}-1 ) \bigr).\end{equation*}

Note that due to Assumption 1 and since

\begin{equation*}U(x)\xrightarrow{|x| \rightarrow \infty} +\infty,\end{equation*}

we get that there exists a compact set $\tilde{C}$ such that for all $x \notin \tilde{C}$ , $\mathrm{sgn}(U^{\prime}(x))=\mathrm{sgn}(x)$ . Therefore, when $x \notin \tilde{C}$ and $\theta\ \mathrm{sgn}(x)>0$ ,

\begin{align*}\dfrac{\mathcal{L}V_{\beta, \delta}(x,\theta)}{V_{\beta, \delta}^a(x,\theta)}\leq V_{\beta, \delta}^{1-a}(x,\theta)|U^{\prime}(x)|\biggl[ \beta+\biggl( \dfrac{\gamma(x)}{|U^{\prime}(x)|}+1 \biggr) ({\exp} \{ -2 \delta \}-1 )\biggr],\end{align*}

and when $\theta \mathrm{sgn}(x) <0$ ,

\begin{equation*}\dfrac{\mathcal{L}V_{\beta, \delta}(x,\theta)}{V_{\beta, \delta}^a(x,\theta)}\leq V_{\beta, \delta}^{1-a}(x,\theta)|U^{\prime}(x)|\biggl[ -\beta +\dfrac{\gamma(x)}{|U^{\prime}(x)|}({\exp} \{ 2 \delta \} -1) \biggr].\end{equation*}

Overall, we have for $x \notin \tilde{C}$

(8) \begin{align} &\dfrac{\mathcal{L}V_{\beta, \delta}(x,\theta)}{V_{\beta, \delta}^a(x,\theta)} \leq V_{\beta, \delta}^{1-a}(x,\theta)|U^{\prime}(x)| \max \biggl\{ \beta+\biggl( \dfrac{\gamma(x)}{|U^{\prime}(x)|}+1 \biggr) ({\exp} \{ -2 \delta \}-1 ) ,\nonumber\\& \qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\biggl[\! -\beta +\dfrac{\gamma(x)}{|U^{\prime}(x)|}({\exp} \{ 2 \delta \} -1) \biggr] \biggr\}.\end{align}

Recall that

\begin{equation*}\dfrac{\gamma(x)}{|U^{\prime}(x)|}\xrightarrow{|x| \rightarrow \infty}0.\end{equation*}

Fix $\delta>-1/2\log\!(1-\beta)$ , and by possibly increasing $\tilde{C}$ we have that there exists a constant $c^{\prime}>0$ such that for all $x \notin \tilde{C}$ ,

\begin{equation*}\max \biggl\{ \beta+\biggl(\dfrac{\gamma(x)}{|U^{\prime}(x)|}+1 \biggr) ({\exp} \{ -2 \delta \}-1), -\beta +\dfrac{\gamma(x)}{|U^{\prime}(x)|}({\exp} \{ 2 \delta \} -1 ) \biggr\}<-c^{\prime}<0.\end{equation*}

Combining this with (7) and (8), we find that $V_{\beta, \delta}$ satisfies (6) with $f(u)=cu^a$ for $c=c^{\prime}/2$ and with K being an appropriate constant that bounds the continuous function $\mathcal{L}V_{\beta, \delta}+ f(V_{\beta, \delta})$ inside $\tilde{C}$ .

Therefore all the conditions of Theorem 2 are satisfied. Note that

\begin{equation*} H_{f}(s)=\int_1^s1/f(u){\mathrm{d}} u=c^{-1}\int_1^su^{-a}{\mathrm{d}} u=\dfrac{1}{c(1-a)}\bigl(s^{1-a}-1\bigr), \end{equation*}

so

\begin{equation*}H^{-1}_{f}(t)= ( 1+c(1-a)t )^{1/(1-a)}\end{equation*}

and therefore

\begin{equation*}f \circ H^{-1}_{f}(t)= c ( 1+c(1-a)t )^{a/(1-a)}.\end{equation*}

Since we picked a such that $k=a/(1-a)$ , meaning that $k+1=1/(1-a)$ , (3) follows.

Proof of Remark 2.Fix $k<\nu$ and set a such that $k=a/(1-a)$ , therefore $1-a=1/(1+k)$ . Our goal is to find an appropriate upper bound on ${\gamma(x)}/{|U^{\prime}(x)|}$ that will guarantee that the right-hand side of (8) is negative for appropriately chosen $\beta$ and $\delta$ . For this it suffices to prove that for some $\beta,\delta >0$ , the maximum appearing in the right-hand side of (8) is negative, bounded away from zero and that (7) holds. These two conditions will guarantee that the drift condition (6) holds for $V_{\beta , \delta}$ and we can conclude similarly to the proof of Theorem 1. Since from the proof of that theorem

\begin{align*}\bigl( V_{\tilde{\beta}, \delta}(x,\theta) \bigr)^{1-a}|U^{\prime}(x)|\geq c_0(1+\nu) | x |^{\tilde{\beta} (1-a) (1+\nu)-1},\end{align*}

in order to guarantee (7) it suffices to pick $\beta=(1+\eta)(1+k)/(1+\nu)$ for some $\eta >0$ . From the discussion after (8), we can pick

\begin{equation*}\delta=-\frac{1}{2}(1+\eta)\log\!(1-\beta)=-\frac{1}{2}(1+\eta)\log\!(1-(1+\eta)(1+k)/(1+\nu)).\end{equation*}

With this choice of $\beta, \delta$ , the first part of the maximum of the right-hand side of (8) will be negative, bounded away from zero. The second part of the maximum is given by

\begin{align*}-\beta +\dfrac{\gamma(x)}{|U^{\prime}(x)|}({\exp} \{ 2 \delta \} -1) & =-(1+\eta)\dfrac{1+k}{1+\nu}\\& \quad +\dfrac{\gamma(x)}{|U^{\prime}(x)|}\biggl( \biggl(\dfrac{1}{(1-(1+\eta)(1+k)/(1+\nu))}\biggr)^{(1+\eta)} -1\biggr).\end{align*}

Therefore, solving the inequality

\begin{equation*}-\beta +\frac{\gamma(x)}{|U^{\prime}(x)|}({\exp} \{ 2 \delta \} -1)<-\eta\end{equation*}

(which would guarantee that the maximum on the right-hand side of (8) is negative, bounded away from zero), we get

\begin{equation*} \dfrac{\gamma(x)}{|U^{\prime}(x)|} \leq M(k){,}\end{equation*}

where M(k) as in (4).

Proof of Corollary 1.Let $\pi$ as in (5). Then for all $\delta^{\prime}>0$ there exists a compact set C with $0 \in C$ , such that for all $x \notin C$ ,

\begin{align*}|U^{\prime}(x)|=\dfrac{(\nu+1)|x|}{\nu+x^2}\geq \dfrac{1+\nu-\delta^{\prime}}{|x|}.\end{align*}

Therefore, for every $\delta^{\prime}>0$ , the distribution satisfies Assumption 1, where the $\nu$ in that assumption is equal to $\nu-\delta^{\prime}$ . From Proposition 1, for all $k<\nu$ , there exists a $\beta \in (0,1), \delta >0$ and $B>0$ such that for all $(x,\theta) \in \mathbb{R} \times \{ -1,+1 \}$ ,

\begin{equation*}\| \mathbb{P}_{x,\theta}(Z_t \in \cdot)-\mu({\cdot}) \|_{TV} \leq \dfrac{B V_{\beta, \delta}(x)}{t^{1+k}}+\dfrac{B}{t^{k}}.\end{equation*}

Now suppose that the Zig-Zag starts from $x=0$ , $\theta=+1$ . There exists a $C_0$ and $K>0$ such that for all $|x| \geq K$ , $\pi(x)\geq C_0 |x|^{-\nu-1}$ . Fix a time $t>K$ . Let $A_t= \{ x\,:\, x>t \}$ . After time less than or equal to t has passed, the Zig-Zag will not have hit $A_t$ . We therefore get for all $t > K$

\begin{align*}\|\mathbb{P}_{0,+1}(Z_t \in \cdot)-\mu({\cdot})\|_{TV} &\geq| \mathbb{P}_{0,+1}(X_t \in A_t)-\pi(A_t) |\\ &=\pi(A_t)\\&= \int_{|x|>t} \pi(x){\mathrm{d}} x \\& \geq 2C_0\int_t^{+\infty}x^{-\nu-1}{\mathrm{d}} x\\ &=2\dfrac{C_0}{\nu}\dfrac{1}{t^{\nu}}.\end{align*}

4. Discussion

It is interesting to compare the Zig-Zag polynomial convergence rates with those of the one-dimensional random walk Metropolis (RWM) algorithm and the Metropolis-adjusted Langevin algorithm (MALA). In fact, it is shown in Propositions 4.1 and 4.3 of [Reference Jarner and Tweedie18] (see also [Reference Jarner and Roberts17]) that when targeting a Student distribution with $\nu$ degrees of freedom with any finite variance proposal RWM or with MALA, one gets polynomial order of convergence $(\nu/2)^-$ , that is, for any $\epsilon>0$ , the polynomial rate of convergence is at least $\nu/2-\epsilon$ . However, it is proved not to be $\nu/2$ . In [Reference Jarner and Roberts17], Jarner and Roberts provide these lower bounds for the convergence rates, while in [Reference Jarner and Tweedie18] they provide the upper bound. As proved in this note, the one-dimensional Zig-Zag has polynomial rate of convergence $\nu^-$ in the same setting, which is better than RWM or MALA. This phenomenon was also observed in simulations in [Reference Bierkens and Duncan4]. We conjecture that the advantage of the Zig-Zag is due its momentum, which, in a one-dimensional, unimodal setting with zero refresh rate, will force the process to never switch direction before it hits the mode. This diminishes any possible diffusive behaviour of the process at the tails and helps the algorithm converge faster. We should note here that better polynomial rates (and more precisely, arbitrarily better rates) can be achieved for random walk Metropolis (RWM) if one introduces a proposal with heavier tails. However, a natural analogue of this modification is to allow the Zig-Zag to speed up and move faster in areas of lower density. This idea is further discussed in [Reference Vasdekis and Roberts23] and proved to be able to provide exponentially ergodic algorithms even on heavy-tailed targets, which can outperform the simple Zig-Zag in the sense of having better effective sample size per number of likelihood evaluations.

Acknowledgements

The authors would like to acknowledge Professor Anthony Lee for an indication of the proof of the lower bound of the total variation distance in Corollary 1. They would also like to acknowledge Andrea Bertazzi, George Deligiannidis, and Krzysztof Latuszyński for helpful discussions. Furthermore, they would like to thank the anonymous referee and the Editor for helpful comments that improved this note.

Funding information

G. Vasdekis was supported by EPSRC as part of the MASDOC DTC (EP/HO23364/1) and the Department of Statistics at the University of Warwick (EP/N509796/1). He is also supported by NERC (NE/T00973X/1) under Dr Richard Everitt. G. O. Roberts was supported by EPSRC under the CoSInES (EP/R018561/1) and Bayes for Health (EP/R034710/1) programmes.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process of this article.

References

Andrieu, C., Dobson, P. and Wang, A. Q. (2021). Subgeometric hypocoercivity for piecewise-deterministic Markov process Monte Carlo methods. Electron. J. Prob. 26, 126.10.1214/21-EJP643CrossRefGoogle Scholar
Andrieu, C., Durmus, A., Nüsken, N. and Roussel, J. (2021). Hypocoercivity of piecewise deterministic Markov Process-Monte Carlo. Ann. Appl. Prob. 31, 24782517.10.1214/20-AAP1653CrossRefGoogle Scholar
Bakry, D., Cattiaux, P. and Guillin, A. (2008). Rate of convergence for ergodic continuous Markov processes: Lyapunov versus Poincaré. J. Funct. Anal. 254, 727759.10.1016/j.jfa.2007.11.002CrossRefGoogle Scholar
Bierkens, J. and Duncan, A. (2017). Limit theorems for the zig-zag process. Adv. Appl. Prob. 49, 791825.10.1017/apr.2017.22CrossRefGoogle Scholar
Bierkens, J. and Roberts, G. O. (2017). A piecewise deterministic scaling limit of lifted Metropolis–Hastings in the Curie–Weiss model. Ann. Appl. Prob. 27, 846882.10.1214/16-AAP1217CrossRefGoogle Scholar
Bierkens, J. and Verduyn Lunel, S. M. (2022). Spectral analysis of the zigzag process. Ann. Inst. H. Poincaré Prob. Statist. 58, 827860.10.1214/21-AIHP1188CrossRefGoogle Scholar
Bierkens, J., Fearnhead, P. and Roberts, G. O. (2019). The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47, 12881320.10.1214/18-AOS1715CrossRefGoogle Scholar
Bierkens, J., Kamatani, K. and Roberts, G. (2018). High-dimensional scaling limits of piecewise deterministic sampling algorithms. Available at arXiv:1807.11358.Google Scholar
Bierkens, J., Nyquist, P. and Schlottke, M. C. (2021). Large deviations for the empirical measure of the zig-zag process. Ann. Appl. Prob. 31, 28112843.10.1214/21-AAP1663CrossRefGoogle Scholar
Bierkens, J., Roberts, G. O. and Zitt, P. A. (2019). Ergodicity of the zigzag process. Ann. Appl. Prob. 29, 22662301.10.1214/18-AAP1453CrossRefGoogle Scholar
Davis, M. (2018). Markov Models and Optimization (Chapman and Hall/CRC Monographs on Statistics and Applied Probability). Routledge.10.1201/9780203748039CrossRefGoogle Scholar
Diaconis, P., Holmes, S. and Neal, R. M. (2000). Analysis of a nonreversible Markov chain sampler. Ann. Appl. Prob. 10, 726752.10.1214/aoap/1019487508CrossRefGoogle Scholar
Douc, R., Fort, G. and Guillin, A. (2009). Subgeometric rates of convergence of f-ergodic strong Markov processes. Stoch. Process. Appl. 119, 897923.10.1016/j.spa.2008.03.007CrossRefGoogle Scholar
Fearnhead, P., Bierkens, J., Pollock, M. and Roberts, G. O. (2018). Piecewise deterministic Markov processes for continuous-time Monte Carlo. Statist. Sci. 33, 386412.10.1214/18-STS648CrossRefGoogle Scholar
Fort, G. and Roberts, G. O. (2005). Subgeometric ergodicity of strong Markov processes. Ann. Appl. Prob. 15, 15651589.10.1214/105051605000000115CrossRefGoogle Scholar
Hairer, M. (2016). Convergence of Markov Processes. Unpublished lecture notes, available at http://www.hairer.org/Teaching.html.Google Scholar
Jarner, S. and Roberts, G. O. (2007). Convergence of heavy-tailed Monte Carlo Markov chain algorithms. Scand. J. Statist. 34, 781815.Google Scholar
Jarner, S. F. and Tweedie, R. L. (2003). Necessary conditions for geometric and polynomial ergodicity of random-walk-type. Bernoulli 9, 559578.10.3150/bj/1066223269CrossRefGoogle Scholar
Levin, D., Luczak, M. and Peres, Y. (2007). Glauber dynamics for the mean-field Ising model: cut-off, critical power law, and metastability. Prob. Theory Related Fields 146, 223265.10.1007/s00440-008-0189-zCrossRefGoogle Scholar
Meyn, S. P. and Tweedie, R. L. (1993). Stability of Markovian processes II: Continuous-time processes and sampled chains. Adv. Appl. Prob. 25, 487517.10.2307/1427521CrossRefGoogle Scholar
Turitsyn, K. S., Chertkov, M. and Vucelja, M. (2011). Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240, 410414.10.1016/j.physd.2010.10.003CrossRefGoogle Scholar
Vanetti, P., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. (2017). Piecewise-deterministic Markov chain Monte Carlo. Available at arXiv:1707.05296.Google Scholar
Vasdekis, G. and Roberts, G. O. (2021). Speed up zig-zag. Available at arXiv:2103.16620.Google Scholar
Figure 0

Figure 1. Mean square error over time of three Zig-Zag algorithms with different refresh rates $\gamma(x)$, described in the upper right corner. The algorithms target the Cauchy distribution and are used to estimate the probability mass the Cauchy distribution assigns to the event $[5,+\infty)$. For all three algorithms, 1000 independent realisations were generated, and for each time less than $10^4$, the average square error between the approximation of the probability and the actual probability (approximately $0.0628$) is reported. Evidently the fastest convergence is achieved by lowering the values of $\gamma$.