
On the occupancy problem for a regime-switching model

Published online by Cambridge University Press:  04 May 2020

Michael Grabchak*
Affiliation:
University of North Carolina Charlotte
Mark Kelbert*
Affiliation:
National Research University Higher School of Economics
Quentin Paris*
Affiliation:
National Research University Higher School of Economics
*Postal address: Department of Mathematics and Statistics, Charlotte, NC, USA. Email address: mgrabcha@uncc.edu
**Postal address: National Research University Higher School of Economics (HSE), Faculty of Economics, Department of Statistics and Data Analysis, Moscow, Russia. Email address: mkelbert@hse.ru
***Postal address: National Research University Higher School of Economics (HSE), Faculty of Computer Science, School of Data Analysis and Artificial Intelligence & HDI Lab, Moscow, Russia. Email address: qparis@hse.ru

Abstract

This article studies the expected occupancy probabilities on an alphabet. Unlike the standard situation, where observations are assumed to be independent and identically distributed, we assume that they follow a regime-switching Markov chain. For this model, we (1) give finite sample bounds on the expected occupancy probabilities, and (2) provide detailed asymptotics in the case where the underlying distribution is regularly varying. We find that in the regularly varying case the finite sample bounds are rate optimal and have, up to a constant, the same rate of decay as the asymptotic result.

Type
Research Papers
Copyright
© Applied Probability Trust 2020

1. Introduction

Let A be a finite or countably infinite set and let $X=(X_{n})_{n\ge 1}$ be a discrete-time A-valued stochastic process defined on some probability space $(\Omega, \mathcal F, \mathrm{P})$. We refer to the set A as the alphabet and to the elements of A as letters. These letters may represent different things in the context of different applications. For instance, in linguistics they may represent words in some language, while in ecology they may represent species in an ecosystem. From a general point of view, the occupancy problem (or urn scheme) is to describe the repartition of the process $(X_{n})_{n\ge 1}$ over the set A. In this context two quantities of interest are

\begin{equation*}L_n =\sum_{i=1}^{n}\textbf{1}\{X_{i}=X_{n+1}\} {\quad\mbox{and}\quad} M_{n,r}=\mathrm{P}\left\{L_n=r \,\vert\, X_{1},\dots,X_{n}\right\}\!.\end{equation*}

These quantities are related by the fact that

\begin{equation*}\mathrm{P}\{L_n=r\} = {\mathrm{E}}\{ M_{n,r}\}.\end{equation*}

In words, $L_{n}$ is the number of times that the letter observed at time $n+1$ had previously been observed, and $M_{n,r}$ is the probability that, given the observations up to time n, the letter observed at time $n+1$ will have already been seen r times. We refer to the quantities $M_{n,r}$ as the occupancy probabilities. The quantity $M_{n,0}$ is also sometimes called the missing mass. It corresponds to the probability of seeing a new letter at time $n+1$ . In certain ecological contexts, it represents the probability of discovering a new species. While properties of $L_n$ and $M_{n,r}$ have been thoroughly studied in the context where $X_1,X_2,\dots$ are independent and identically distributed (i.i.d.) random variables, we have seen no work in the literature relating to the case where they follow a more general stochastic process. In this paper we give such results for a class of Markov chains that form a regime-switching model. This model expands the scope of potential applications. Moreover, it is our hope that this paper will stimulate interest in studying this problem in the context of other, more general, processes.
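To make these definitions concrete, the following minimal Python sketch (an illustration, not part of the original paper) draws i.i.d. letters from a small toy distribution and computes $L_n$ together with the occupancy probabilities $M_{n,r}$ from the observed counts; the distribution p and the sample size n below are illustrative assumptions.

\begin{verbatim}
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# toy alphabet {0,...,9} with a fixed distribution (illustrative assumption)
p = np.array([0.30, 0.20, 0.15, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02, 0.01])
n = 50

x = rng.choice(len(p), size=n + 1, p=p)    # X_1, ..., X_{n+1}
counts = Counter(int(a) for a in x[:n])    # letter frequencies among X_1, ..., X_n

# L_n: number of previous occurrences of the letter observed at time n+1
L_n = counts[int(x[n])]

# M_{n,r}: probability that the letter at time n+1 has already been seen r times
M = {r: sum(p[a] for a in range(len(p)) if counts[a] == r) for r in range(4)}

print("L_n =", L_n)
print("missing mass M_{n,0} =", M[0])
\end{verbatim}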

1.1. Related work

In the i.i.d. setting, the literature on the behavior of $L_{n}$ , $M_{n,r}$ , and related quantities is vast; see, for instance, the classic textbook [Reference Johnson and Kotz15], the survey [Reference Gnedin, Hansen and Pitman11], or the recent contributions in [Reference Ben-Hamou, Boucheron and Ohannessian2, Reference Decrouez, Grabchak and Paris7]. Applications include fields such as ecology [Reference Chao5, Reference Gandolfi and Sastri9, Reference Good12, Reference Good and Toulmin13], genomics [Reference Mao and Lindsay17], language processing [Reference Chen and Goodman6], authorship attribution [Reference Efron and Thisted8, Reference Thisted and Efron23, Reference Zhang and Huang25], information theory [Reference Ben-Hamou, Boucheron and Gassiat1, Reference Orlitsky, Santhanam and Zhang19], computer science [Reference Zhang24], and machine learning [Reference Bubeck, Ernst and Garivier4, Reference Grabchak and Zhang14].

We now briefly sketch several key results for the case where the random variables $X_1, X_2, \dots$ are i.i.d. with common distribution $P=(p_a)_{a\in A}$ on A. In this case, it is readily shown that

\begin{equation*}\mathrm{P}\{L_n=r\} = {\mathrm{E}}\{ M_{n,r}\}=\binom{n}{r}\sum_{a\in A}p^{1+r}_a(1-p_a)^{n-r}.\end{equation*}

This expression allows for a precise asymptotic analysis. Following [Reference Karlin16], it is understood that the main ingredients for this analysis are given by the counting measure $\boldsymbol{\nu}_{P}$ and the counting function $\nu$ . These are defined, respectively, by

(1) \begin{equation}\boldsymbol{\nu}_{P}(\textrm{d}u)=\sum_{a\in A}\delta_{p_a}(\textrm{d}u)\end{equation}

and

(2) \begin{equation}\nu({\varepsilon})=\boldsymbol{\nu}_{P}([{\varepsilon},1])=\sum_{a\in A}\textbf{1}\{p_{a}\ge{\varepsilon}\}, \qquad 0\le {\varepsilon}\le 1.\end{equation}
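As an illustration (not from the paper), the counting function in (2) can be evaluated directly for any concrete distribution; the truncated power-law weights below are an assumption chosen so that $\nu({\varepsilon})$ grows roughly like a power of $1/{\varepsilon}$.

\begin{verbatim}
import numpy as np

# counting function nu(eps) of (2) for an assumed truncated power-law distribution
k = np.arange(1, 100_001)
p = k ** (-2.0)
p /= p.sum()                   # p_k proportional to k^{-2}

def nu(eps):
    # number of letters with probability at least eps
    return int(np.sum(p >= eps))

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, nu(eps), eps ** (-0.5))   # nu(eps) grows roughly like eps^{-1/2}
\end{verbatim}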

Next, recall that a function $\ell\,{:}\,(0,+\infty)\to {\mathbb R}$ is said to be slowly varying at $+\infty$ if, for any $c>0$ ,

\begin{eqnarray*}\lim_{x\to +\infty}\frac{\ell(cx)}{\ell(x)}=1.\end{eqnarray*}

In this case we write $\ell\in \mathcal{SV}$ . With this notation, if $\nu({\varepsilon})=\boldsymbol{\nu}_{P}([{\varepsilon},1])={\varepsilon}^{-\alpha}\ell(1/{\varepsilon})$ for some $\alpha\in(0,1)$ and some $\ell\in \mathcal{SV}$ , then, for $r\ge0$ ,

(3) \begin{equation}{\mathrm{E}}\{ M_{n,r}\}\sim\frac{\alpha\Gamma(1+r-\alpha)}{r!} n^{-(1-\alpha)}\ell(n).\end{equation}

This result is discussed, in greater detail, in the appendix. Non-asymptotic results are given in [Reference Decrouez, Grabchak and Paris7]. The main result of that paper is as follows.

Lemma 1.1. (Theorem 2.1 of [Reference Decrouez, Grabchak and Paris7].) Let $P=(p_{a})_{a\in A}$ be a probability measure on A with counting function $\nu$ . For all $n\ge 1$ , all $0\le r\le n-1$ , and all $0\le {\varepsilon}\le 1$ ,

\begin{align*}\mathrm{P}\{L_n=r\} = {\mathrm{E}}\{ M_{n,r}\} & = \binom{n}{r}\sum_{a\in A}p^{1+r}_a(1-p_a)^{n-r} \\[3pt] \quad & \le \frac{c(r)\nu({\varepsilon})}{n}+ 2^{1+r}\binom{n}{r}\int_{0}^{{\varepsilon}}\nu\left(\frac u2\right)u^r\left(1-\frac u2\right)^{n-r}{\rm d}u,\end{align*}

where

(4) \begin{equation}c(r)=\bigg\{\begin{array}{l@{\qquad}l}{\rm e}^{-1} &\mbox{ if }\ r=0,\\{\rm e}(1+r)/\sqrt{\pi}&\mbox{ if }\ r\ge 1.\end{array}\end{equation}
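For a concrete sense of how the exact expression, the asymptotic rate in (3), and the bound of Lemma 1.1 compare, the following Python sketch (an illustration only, not part of the original paper) evaluates all three for an assumed truncated power-law distribution with $\alpha=1/2$; the grid-based integral is a crude stand-in for the integral in the bound, and the bound is not expected to be tight in the constant.

\begin{verbatim}
import numpy as np
from math import comb
from scipy.special import gamma

# assumed distribution: p_k proportional to k^{-2}, so nu(eps) ~ Z^{-1/2} eps^{-1/2}
alpha = 0.5
k = np.arange(1, 50_001)
w = k ** (-1.0 / alpha)
Z = w.sum()
p = w / Z

def nu(eps):
    return np.sum(p >= eps)

def c(r):
    # constant c(r) from (4)
    return np.exp(-1.0) if r == 0 else np.e * (1 + r) / np.sqrt(np.pi)

def exact(n, r):
    # E{M_{n,r}} = binom(n, r) * sum_k p_k^{1+r} (1 - p_k)^{n-r}
    return comb(n, r) * np.sum(p ** (1 + r) * (1 - p) ** (n - r))

def asymptotic(n, r):
    # right-hand side of (3) with ell constant equal to Z^{-alpha}
    return alpha * gamma(1 + r - alpha) / gamma(r + 1) * n ** (-(1 - alpha)) * Z ** (-alpha)

def lemma_bound(n, r, eps, grid=2_000):
    # Lemma 1.1 bound; the integral is approximated by a simple Riemann sum
    u = np.linspace(0.0, eps, grid + 1)[1:]
    vals = np.array([nu(ui / 2.0) for ui in u]) * u ** r * (1 - u / 2.0) ** (n - r)
    return c(r) * nu(eps) / n + 2 ** (1 + r) * comb(n, r) * vals.sum() * (eps / grid)

n, r = 5_000, 0
print(exact(n, r), asymptotic(n, r), min(lemma_bound(n, r, e) for e in (0.001, 0.01, 0.1)))
\end{verbatim}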

1.2. Regime-switching models

A natural extension of the i.i.d. case is to a class of regime-switching Markov chains or regime-switching models. In this context the elements in A no longer represent letters, but entire alphabets. Each $a\in A$ represents an alphabet, which we denote by $\{a\}\times{\mathbb N}_+$ , where ${\mathbb N}_+=\{1,2,\dots\}$ . This alphabet has its own distribution $P_a=(p_{a,k})_{k\ge1}$ , and we assume that observations from each alphabet are i.i.d. with distribution $P_a$ . However, we randomly perform transitions between alphabets following a Markov chain with transition operator Q. Formally, we consider a Markov chain $Z=(Z_n)_{n\ge 1}=(X_n,K_n)_{n\ge 1}$ on the product space $\mathcal A\coloneqq\ A\times {\mathbb N}_+$ with transition operator $\mathcal Q$ defined by

(5) \begin{equation}\mathcal Q((a',k'),(a,k))=Q(a',a)p_{a,k}, \qquad a,a'\in A, \quad k,k'\in{\mathbb N}_+.\end{equation}

We refer to $(Z_n)_{n\ge 1}$ as the regime-switching model and to $(X_n)_{n\ge 1}$ as the underlying process.
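The following short Python sketch (an illustration, not part of the paper) simulates the regime-switching model defined by (5) for an assumed two-state regime chain Q and assumed geometric-type letter distributions $P_a$; all concrete numbers are placeholders, and $\mathbb N_+$ is truncated for sampling purposes.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# assumed regime chain on A = {0, 1} and assumed letter distributions P_a
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
K_MAX = 200                        # truncation of N_+ for sampling purposes

def letter_dist(a):
    # geometric-type distribution for regime a (illustrative assumption)
    q = 0.3 if a == 0 else 0.6
    w = (1 - q) ** np.arange(K_MAX) * q
    return w / w.sum()

P = [letter_dist(a) for a in range(2)]

def simulate(n, x0=0):
    # returns Z_1, ..., Z_n with Z_i = (X_i, K_i), as in (5)
    X = np.empty(n, dtype=int)
    K = np.empty(n, dtype=int)
    X[0] = x0
    K[0] = rng.choice(K_MAX, p=P[X[0]]) + 1
    for i in range(1, n):
        X[i] = rng.choice(2, p=Q[X[i - 1]])        # regime transition via Q
        K[i] = rng.choice(K_MAX, p=P[X[i]]) + 1    # letter drawn from P_{X_i}
    return list(zip(X, K))

Z = simulate(1000)
print(Z[:5])
\end{verbatim}

With such a simulator, the occupancy quantities $\mathcal L_n$ and $L_n$ studied in Section 4 can also be estimated by Monte Carlo.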

One important situation is when $|A|<\infty$. In this case we can use many properties of finite Markov chains, even though the process of interest is defined on the countably infinite state space $\mathcal A$. Some of our more detailed results are proved under this additional assumption. For other results, in the interests of generality, we not only allow $|A|=\infty$ but also remove the assumption that the underlying process is a Markov chain. Nevertheless, our motivation comes from the case where the transitions are Markovian and $|A|<\infty$. Such models can be used to describe a variety of situations, such as:

  1. (Classics) A researcher reads documents in an antique library. The documents are written in a variety of languages (e.g. Latin, Greek, Hebrew, etc.). Assume that transitions between documents written in different languages follow a Markov chain. Here, the regime-switching Markov chain $(Z_n)_{n\ge1}$ represents the sequence of ordered pairs comprised of the word that the researcher is currently reading and the language that the current document is written in. In this context, the missing mass represents the probability that the next word that the researcher encounters will be one that this researcher has not previously seen and will thus need to look up.

  2. (Ecology) An ecologist is observing the animals that are found in a certain plot of a forest. However, the forest has several states (e.g. time of day, weather, etc.), with transitions between these following a Markov chain. To understand the differences in the distribution of species found under different states, the ecologist keeps track of both the species of the observed animal and the state of the forest.

  3. (Computer science) A server periodically enters a state where there is a serious hacking attempt. Assume that transitions into and out of this state follow a Markov chain. To understand the effect of a serious hacking attempt on the number of packets that arrive, a researcher keeps track of the number of packets that arrive in time increments of, say, five minutes, along with the state of the server in that time period.

  4. (Economics) An economy can be in one of several states, e.g. growth, recession, inflation, etc. One can model transitions between these states using a Markov chain. To understand the effect of the state of the economy on some economic indicator (e.g. the number of bank failures in a week), an economist keeps track of both the indicator and the state of the economy.

1.3. Organization

The main goal of this paper is to extend the results given in (3) and Lemma 1.1 from the i.i.d. case to the regime-switching model. We begin by giving results for a simple class of Markov chains, which will drive this model. Toward this end we introduce a useful technical result in Section 2, and then in Section 3 we consider the case of an ergodic Markov chain on a finite state space. In Section 4 we formally define the regime-switching model and give extensions of Lemma 1.1. In the interests of generality, most results in this section do not assume that transitions between alphabets are Markovian. However, this assumption is needed for the more detailed results. Then, in Section 5 we extend (3) to the case of the regime-switching model. Proofs are postponed to Section 6. A brief review of basic properties of regularly varying distributions on an alphabet is given in the appendix.

1.4. Notation

Before proceeding, we set up some notation. We write $\textbf{1}\{\cdot\}$ to denote the indicator function of event $\{\cdot\}$ . For a set A, we write $|A|$ to denote the cardinality of A. For real numbers $a,b\in\mathbb R$ , we write $a\vee b$ or $\max\{a,b\}$ to denote the maximum of a and b, and we write $a\wedge b$ or $\min\{a,b\}$ to denote the minimum of a and b. For two sequences g(n) and h(n) we write $g(n)\sim h(n)$ to mean $g(n)/h(n)\to1$ as $n\to\infty$ . We write $\Gamma(x) = \int_0^\infty u^{x-1}{\rm e}^{-u} \mathrm d u$ for $x>0$ to denote the gamma function.

2. Preliminaries

In this section we introduce a technical result that will be useful later. Toward this end, fix a finite or countably infinite set A, a Markov transition operator Q on A, and a probability measure $\mu$ on A. Let $X=(X_n)_{n\ge 1}$ be an A-valued random process defined on some probability space $(\Omega,\mathcal F,\mathrm{P}_\mu)$ such that X is a Q-Markov chain with initial distribution $\mu$ . We write ${\mathrm{E}}_{\mu}$ to denote the expectation under $\mathrm{P}_{\mu}$ . We write $Q^t$ to denote the t-step transition operator of the Markov chain. For all integers $n\ge 1$ and $a\in A$ , we set

\begin{equation*}L_n(a)\coloneqq\sum_{i=1}^{n}\textbf{1}\{X_i=a\} \end{equation*}

to be the local time of Markov chain X in state a, and we set

\begin{equation*}L_n\coloneqq\sum_{i=1}^{n}\textbf{1}\{X_i=X_{n+1}\}=L_n(X_{n+1})\end{equation*}

to denote the number of times that the state visited at time $n+1$ had been visited up to time n.

We now give a result that connects the distribution of $L_n$ with that of the local times of the reversed chain. We assume that the Q-Markov chain $(X_n)_{n\ge1}$ is irreducible, aperiodic, positive recurrent, and has stationary distribution $\pi=(\pi_a)_{a\in A}$ . We denote by $\hat X=(\hat X_n)_{n\ge 1}$ the associated reversed chain, i.e. an A-valued Markov chain with transition operator $\hat Q$ defined by

\begin{equation*}\hat Q(x,y)\coloneqq\frac{\pi(y)Q(y,x)}{\pi(x)}.\end{equation*}

It is easy to check that $\pi$ is also the stationary distribution of $\hat X$ and that the t-step transition operator of the reversed chain is given by

(6) \begin{eqnarray}\hat Q^t(x,y) = \frac{\pi(y)Q^t(y,x)}{\pi(x)}.\end{eqnarray}

We say that the chain X is reversible when $\pi(x)Q(x,y)=\pi(y)Q(y,x)$ for all $x,y\in A$. In this case, $\hat Q=Q$ and the chains X and $\hat X$ have the same distribution so long as they both start at the stationary distribution. We write $\hat L_n(a)$ to denote the local time of the reversed chain at a, i.e.

\begin{equation*}\hat{L}_n(a)\coloneqq\sum_{i=1}^{n}\textbf{1}\{\hat X_i=a\}.\end{equation*}
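As a small numerical illustration (not from the paper), the reversed transition matrix can be formed directly from Q and $\pi$; the 3-state matrix below is an arbitrary assumption.

\begin{verbatim}
import numpy as np

# assumed 3-state transition matrix
Q = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# stationary distribution: normalized left eigenvector of Q for eigenvalue 1
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# hat Q(x, y) = pi(y) Q(y, x) / pi(x)
Q_hat = (pi[None, :] * Q.T) / pi[:, None]

print(np.allclose(Q_hat.sum(axis=1), 1.0))   # hat Q is a transition matrix
print(np.allclose(pi @ Q_hat, pi))           # pi is also stationary for hat Q
print(np.allclose(Q_hat, Q))                 # True only if the chain is reversible
\end{verbatim}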

Lemma 2.1. Let A be a finite or countably infinite set. Suppose $X=(X_n)_{n\ge 1}$ is an irreducible, aperiodic, and positive recurrent Markov chain on A with stationary distribution $\pi$ and reversed chain $\hat X$ . Let $\mu$ and $\eta$ be arbitrary distributions on A. Then, for any positive function f and all integers $n\ge 1$ ,

\begin{equation*}{\mathrm{E}}_{\mu}\left[\frac{\eta(X_{n+1})}{\pi(X_{n+1})}\, f(L_n)\right]={\mathrm{E}}_{\eta}\left[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})}\, f(\hat L_{n+1}(\hat X_1)-1)\right]\!,\end{equation*}

where, on the right-hand side, it is understood that $\eta$ is taken as the initial distribution of the reversed chain, i.e. it is the distribution of $\hat X_1$ .

Remark 2.1. Note, in particular, that taking $\mu=\eta=\pi$ in the above formula, and supposing the chain to be reversible, we get that for any positive f,

\begin{equation*}{\mathrm{E}}_\pi\{ f(L_n)\}={\mathrm{E}}_\pi\{ f(L_{n+1}(X_1)-1)\},\end{equation*}

so that, under $\mathrm{P}_{\pi}$ , $L_n$ has the same distribution as $L_{n+1}(X_1)-1$ .
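The distributional identity in Remark 2.1 is easy to check by simulation. The sketch below (an illustration under the assumption of a symmetric, hence reversible, 3-state chain started from its uniform stationary distribution) compares the empirical laws of $L_n$ and $L_{n+1}(X_1)-1$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

# symmetric transition matrix, so the chain is reversible with uniform pi (assumption)
Q = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
cum = np.cumsum(Q, axis=1)
n, trials = 20, 100_000

# simulate all trial paths X_1, ..., X_{n+1} at once, starting from pi (uniform)
x = np.empty((trials, n + 1), dtype=int)
x[:, 0] = rng.integers(3, size=trials)
for i in range(1, n + 1):
    u = rng.random(trials)
    x[:, i] = (u[:, None] > cum[x[:, i - 1]]).sum(axis=1)

L_n = (x[:, :n] == x[:, [n]]).sum(axis=1)        # L_n
L_rev = (x == x[:, [0]]).sum(axis=1) - 1         # L_{n+1}(X_1) - 1

print(L_n.mean(), L_rev.mean())                  # means agree up to Monte Carlo error
print((L_n <= 5).mean(), (L_rev <= 5).mean())    # so do the distribution functions
\end{verbatim}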

3. Finite Markov chains

In this section we provide a bound on $\mathrm{P}_\mu\{L_n= r\}$ in the context of an ergodic Markov chain on a finite state space. This result is interesting in itself, and it will be important in the following because such models will drive our regime-switching model. Let $X=(X_n)_{n\ge 1}$ be an irreducible and aperiodic Markov chain with finite state space A, transition matrix Q, and stationary distribution $\pi=(\pi_a)_{a\in A}$ . This implies that there exists an integer $t_0\ge1$ such that

(7) \begin{eqnarray}Q^{t_0}(a,b) >0 \quad \mbox{for all } a,b\in A.\end{eqnarray}

From (6), it follows that $\hat Q^{t_0}(a,b) >0$ for all $a,b\in A$ . Let

(8) \begin{eqnarray}\ell= \min_{a,b}Q^{t_0}(a,b), \quad \hat\ell= \min_{a,b}\hat Q^{t_0}(a,b),\quad \mbox{and } \lambda = |A|\min\{\ell,\hat\ell\},\end{eqnarray}

where $|A|$ is the cardinality of A. Note that $0<\lambda\le 1$ , and, for each $a\in A$ ,

(9) \begin{eqnarray}Q^{t_0}(a,\cdot) \ge \lambda u(\cdot) \quad \mbox{and } \hat Q^{t_0}(a,\cdot) \ge \lambda u(\cdot),\end{eqnarray}

where u is the uniform distribution on A. By Theorem 8 of [Reference Roberts and Rosenthal22], this implies that, for every $a\in A$ ,

\begin{equation*}\max_{B\subset A}|Q^n(a,B)-\pi(B)| \le (1-\lambda)^{n/t_0-1}, \qquad n=1,2,\dots.\end{equation*}

This result continues to hold if Q is replaced by $\hat Q$ . In this context, Theorem 2 of [Reference Glynn and Ormoneit10] gives the following concentration inequality for $L_n(a)$ .

Lemma 3.1. If $\lambda>0$ and $t_0$ are such that (9) holds, then, for any $a\in A$ , any $\gamma>0,$ and any initial distribution $\mu,$ we have

\begin{equation*}\mathrm{P}_{\mu}\{L_n(a)-{\mathrm{E}}_{\pi}[L_n(a)]\ge n\gamma\} \vee \mathrm{P}_{\mu}\{L_n(a)-{\mathrm{E}}_{\pi}[L_n(a)]\le -n\gamma\}\le \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda\gamma}{t_0} - \frac{2}{n} \bigg)^2\bigg) \end{equation*}

for $n>\frac{2t_0}{\lambda\gamma}$ .

Clearly, the above holds for both the chain X and the reversed chain $\hat X$ . Similar concentration inequalities can be obtained by applying Corollary 2.10 and Remark 2.11 of [Reference Paulin20]. Combining this with Lemma 2.1 gives the following.

Proposition 3.1. For $n>\frac{2t_0+r\lambda + \lambda(1-\pi_\wedge)}{\lambda\pi_\wedge}$ and any initial distribution $\mu=(\mu_a)_{a\in A}$ , we have

\begin{eqnarray*}\mathrm{P}_\mu\{L_n= r\}\le \mathrm{P}_\mu\{L_n\le r\} \le C \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda\pi_\wedge}{t_0} - \frac{2+(r+1)\lambda/t_0}{n} \bigg)^2\bigg),\end{eqnarray*}

where $t_0$ and $\lambda$ are as above, $\pi_\wedge=\min_{a\in A}\pi_a$, and $C = |A|\wedge\max_{a\in A}(\mu_a/\pi_a)$.

In particular, note that when $\mu=\pi$ the constant $C=1$ . It is straightforward to check that the asymptotic behavior of the upper bound is given by

\begin{equation*}C \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda\pi_\wedge}{t_0} - \frac{2+(r+1)\lambda/t_0}{n} \bigg)^2\bigg) \sim C'\exp\bigg(-n\frac{\lambda^2\pi_\wedge^2}{2t_0^2} \bigg) \qquad \mbox{as } n\to\infty,\end{equation*}

where $C'= C \exp( t_0^{-2}\lambda\pi_\wedge (2t_0+(r+1)\lambda))$ .
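A minimal sketch (not from the paper) of how $t_0$, $\lambda$, and the bound of Proposition 3.1 might be computed numerically for a concrete chain; the matrix Q, the initial distribution, and the values of n and r are illustrative assumptions.

\begin{verbatim}
import numpy as np

# assumed transition matrix and initial distribution
Q = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
mu = np.array([1.0, 0.0, 0.0])

# stationary distribution pi and reversed chain hat Q, as in Section 2
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
Q_hat = (pi[None, :] * Q.T) / pi[:, None]

# t_0 as in (7): smallest t with Q^t positive entrywise
Qt, t0 = Q.copy(), 1
while np.any(Qt <= 0.0):
    Qt = Qt @ Q
    t0 += 1
Qt_hat = np.linalg.matrix_power(Q_hat, t0)

# ell, hat ell, and lambda as in (8)
lam = len(Q) * min(Qt.min(), Qt_hat.min())

def prop31_bound(n, r):
    # upper bound of Proposition 3.1 on P_mu{L_n = r}
    C = min(len(Q), np.max(mu / pi))
    return C * np.exp(-0.5 * n * (lam * pi.min() / t0 - (2 + (r + 1) * lam / t0) / n) ** 2)

print(t0, lam, prop31_bound(n=5_000, r=0))
\end{verbatim}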

Remark 3.1. It may be interesting to note that Proposition 3.1 gives a bound with exponential decay. This holds, in particular, for the special case where $X_1,X_2,\dots$ are i.i.d. random variables. In comparison, Corollary 2.1 of [Reference Decrouez, Grabchak and Paris7] focuses on the i.i.d. case and only gives the bound

\begin{equation*}\mathrm{P}_\mu\{L_n= r\} \le c(r)\frac{|A|}{n}, \qquad 0\le r\le n-1,\end{equation*}

where c(r) is given by (4).

The proof of Proposition 3.1 depends heavily on the assumption of a finite alphabet. While concentration inequalities for the local times of Markov chains in the case of infinite alphabets are well known and can be found in, e.g., [Reference Glynn and Ormoneit10, Reference Paulin20], there does not appear to be a simple way to transform these into bounds on $\mathrm{P}_\mu\{L_n=r\}$ . The issue comes from the fact that we need $\pi_\wedge>0$ , but it is always zero when A is an infinite set. An interesting situation, where we are able to deal with infinite alphabets, is the regime-switching model. This is the focus of the remainder of this paper.

4. The regime-switching model

This section formally introduces the regime-switching model and extends the finite-sample bounds given in Lemma 1.1 to this case. While we are primarily interested in the case where transitions between alphabets follow an ergodic Markov chain on a finite state space, our presentation is given in more generality. Let A be a finite or countably infinite set. For each $a\in A$ , let $P_a=(p_{a,k})_{k\ge 1}$ be a probability distribution on ${\mathbb N}_+$ . The finite-dimensional distributions of any discrete-time stochastic process $(Y_n)_{n\ge1}$ on A can be described by a family of conditional distributions $R=(R_n)_{n\ge1}$ , where $R_1(a) = \mathrm{P}(Y_1=a)$ and, for $n\ge2$ ,

\begin{equation*}R_n(a_n \mid a_1,a_2,\dots, a_{n-1}) = \mathrm{P}(Y_n=a_n \mid Y_1 =a_1,Y_2=a_2,\dots,Y_{n-1}= a_{n-1}).\end{equation*}

In the case where $\mathrm{P}(Y_1 =a_1,Y_2=a_2,\dots,Y_{n-1}= a_{n-1})=0$, we will take $R_n(\,\cdot \mid a_1, a_2,\dots, a_{n-1})$ to be an arbitrary probability measure on A.

We now introduce a process on the state space $\mathcal A\coloneqq\ A\times {\mathbb N}_+$ defined by the family of conditional distributions given by $\mathcal R=(\mathcal R_n)_{n\ge1}$ , where $\mathcal R_1$ satisfies $\mathcal R_1((a,k))=R_1(a)p_{a,k}$ and, for $n\ge2$ ,

\begin{eqnarray*}\mathcal R_n((a_n,k_n) \mid (a_1,k_1),(a_2,k_2),\dots, (a_{n-1},k_{n-1})) = R_n(a_n \mid a_1,a_2,\dots, a_{n-1})p_{a_n,k_n}.\end{eqnarray*}

Now, let $Z=(Z_n)_{n\ge1}$ be an $\mathcal A$ -valued stochastic process governed by $(\mathcal R_n)_{n\ge1}$ , and let $X=(X_n)_{n\ge 1}$ and $K=(K_n)_{n\ge 1}$ denote the first and second coordinate processes of Z, i.e.

\begin{equation*}Z_n=(X_n,K_n),\qquad n\ge 1.\end{equation*}

We will refer to the process X as the underlying process. Note, in particular, that X is A-valued, while K takes values in ${\mathbb N}_+$ . The next result gives a more explicit description of the dynamics of the processes X and K.

Lemma 4.1. In the above context, the following statements hold:

  (1) The finite-dimensional distributions of the process $(X_{n})_{n\ge 1}$ are determined by $(R_n)_{n\ge1}$.

  (2) For all $n\ge 1$ and for all $k\ge 1$, with probability 1,

    \begin{equation*}\mathrm{P}\{K_{n}=k\,\vert\,X_{1},\dots,X_{n}\}=p_{X_{n},k},\end{equation*}

where $p_{X_{n},k}$ is the random variable equal to $p_{a,k}$ on the event $\{X_{n}=a\}$ .

  (3) Conditionally on the variables $X_1,\dots,X_n$, the variables $K_1,\dots,K_n$ are independent. In particular, for all $i=1,\dots,n$ and all $k\ge 1$, with probability 1,

    \begin{equation*}\mathrm{P}\{K_{i}=K_{n+1}\,\vert\,X_{1},\dots,X_{n+1},K_{n+1}\}=p_{X_{i},K_{n+1}},\end{equation*}

where $p_{X_{i},K_{n+1}}$ is the random variable equal to $p_{a,k}$ on the event $\{X_{i}=a,K_{n+1}=k\}$ .

Remark 4.1. We are motivated by the case where $R=(R_n)_{n\ge1}$ represents the conditional distributions of a Markov chain with transition operator Q and initial distribution $\eta$ . In this case, we have $R_1=\eta$ and, for $n\ge2$ ,

\begin{equation*}R_n(a_n \mid a_1,a_2,\dots, a_{n-1}) = R_2(a_n \mid a_{n-1}) =Q(a_{n-1},a_n).\end{equation*}

It follows that, in this case, $\mathcal R_1((a_1,k_1))=\eta(a_1)p_{a_1,k_1}$ and, for $n\ge2$ ,

\begin{equation*}\mathcal R_n((a_n,k_n) \mid (a_1,k_1),(a_2,k_2),\dots, (a_{n-1},k_{n-1})) = Q(a_{n-1},a_{n})p_{a_n,k_n},\end{equation*}

which is the Markov operator denoted by $\mathcal Q$ in (5). In this case, to emphasize the dependence on the initial distribution, we will write $\mathrm{P}_\eta$ for P and ${\mathrm{E}}_\eta$ for ${\mathrm{E}}$. It should be noted that the subscript refers to the initial distribution of the underlying process X and not of Z.

Our next result establishes a link between the quantities

\begin{equation*}\mathcal{L}_n=\sum_{i=1}^{n}\textbf{1}\{Z_i=Z_{n+1}\}{\quad\mbox{and}\quad} L_n=\sum_{i=1}^{n}\textbf{1}\{X_i=X_{n+1}\}.\end{equation*}

Lemma 4.2. For all $n\ge 1$ and all $0\le r\le n$ ,

\begin{equation*}\mathrm{P}\{\mathcal L_n=r\}= {\mathrm{E}} \bigg[\binom{L_n}{r}\sum_{k=1}^{+\infty}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg],\end{equation*}

where we take $\binom{L_n}{r}=0$ when $L_n<r$ .

A slight modification of Lemma 4.2 brings us to the main result of this section, which extends Lemma 1.1 from the i.i.d. case to the regime-switching case. First, we introduce some notation. For all $a\in A$ , we write $\nu(a,\cdot)$ to denote the counting function of $P_{a}=(p_{a,k})_{k\in\mathbb N_+}$ , which is defined, for all $0\le{\varepsilon}\le 1$ , by

(10) \begin{eqnarray}{\nu}(a,{\varepsilon})=\sum_{k\ge 1}\textbf{1}\{p_{a,k}\ge{\varepsilon}\}.\end{eqnarray}

Theorem 4.1. For any $n\ge 1$ and any $0\le r\le n-1$ , we have

(11) \begin{equation}\mathrm{P}\{\mathcal L_n= r\}\le \mathrm{P}\{L_n= r\}\sup_{a\in A}\sum_{k=1}^{+\infty}p^{1+r}_{a,k}+\inf_{0\le{\varepsilon}\le1}\{a^{n,r}({\varepsilon})+b^{n,r}({\varepsilon})\},\end{equation}

where

\begin{align*}a^{n,r}({\varepsilon}) & = c(r)\,{\mathrm{E}}\left[\textbf{1}\{L_n>r\}\frac{\nu(X_{n+1},{\varepsilon})}{L_n}\right]\!, \\[4pt] b^{n,r}({\varepsilon}) & = 2^{1+r}\int_{0}^{{\varepsilon}}u^r\,{\mathrm{E}}\left[\textbf{1}\{L_n>r\}\nu\left(X_{n+1},\frac u2\right)\binom{L_n}{r}\left(1-\frac u2\right)^{L_n-r}\right]{\rm d}u,\end{align*}

and where c(r) is as in (4).

Since the formulation of Theorem 4.1 is quite general, an explicit evaluation of the coefficients $a^{n,r}({\varepsilon})$ and $b^{n,r}({\varepsilon})$ can require cumbersome computations. More tractable formulas can be provided in a number of situations. We give several examples.

Example 4.1. Consider the situation where all distributions $P_a=(p_{a,k})_{k\ge1}$ are equal to the same distribution $P=(p_k)_{k\ge1}$ , and therefore all counting functions $\nu(a,\cdot)$ are equal to the counting function $\nu$ of P. In this scenario, an elementary reordering of the terms in (11) yields that, for any ${\varepsilon}\in[0,1]$ ,

\begin{equation*}\mathrm{P}\{\mathcal L_n= r\}\le \sum_{m=r}^{n} C_{r,m}({\varepsilon})\,\mathrm{P}\{L_n= m\},\end{equation*}

where

\begin{equation*}C_{r,r}({\varepsilon})=C_{r,r}=\sum_{k=1}^{+\infty}p^{1+r}_{k} \end{equation*}

and, for $1+r\le m\le n$ ,

\begin{equation*}C_{r,m}({\varepsilon})=\frac{c(r)\nu({\varepsilon})}{m}+2^{1+r}\binom{m}{r}\int_{0}^{{\varepsilon}}u^r\,\nu\left(\frac u2\right)\left(1-\frac u2\right)^{m-r}{\rm d}u,\end{equation*}

where c(r) is as in (4).

Example 4.2. Another favorable scenario corresponds to the case where all probabilities $P_a=(p_{a,k})_{k\ge1}$ have support contained in $\{1,\dots,M\}$ for some $M<+\infty$ independent of $a\in A$ , i.e.

\begin{equation*}p_{a,k}=0 \qquad \mbox{for } a\in A \mbox{ and } k\ge M+1.\end{equation*}

In this case, taking ${\varepsilon}=0$ on the right-hand side of (11), and noticing that $\nu(a,0)$ corresponds to the size of the support of $P_a$ , yields

\begin{equation*}\mathrm{P}\{\mathcal L_n= r\}\le \sum_{m=r}^{n} C'_{r,m}\,\mathrm{P}\{L_n= m\},\end{equation*}

where

\begin{equation*}C'_{r,r}=\sup_{a\in A}\sum_{k=1}^{M}p^{1+r}_{a,k}{\quad\mbox{and}\quad} C'_{r,m}=\frac{c(r)M}{m} \qquad \mbox{for }1+r\le m\le n\end{equation*}

and where c(r) is as in (4).

We now turn to the important situation where the distribution is regularly varying. In the i.i.d. case, the corresponding result is given in Corollary 2.2 of [Reference Decrouez, Grabchak and Paris7].

Proposition 4.1. Assume that, for some $\alpha\in[0,1]$ and some non-increasing function $\ell\in\mathcal{SV}$ , we have

\begin{equation*}\nu(a,\epsilon)\le \epsilon^{-\alpha}\ell(1/\epsilon)\end{equation*}

for all $a\in A$ and all $\epsilon\in(0,1]$ . In this case,

\begin{eqnarray*}\mathrm{P}\{\mathcal L_n= r\} \le c_1(\alpha,r) {\mathrm{E}}[\textbf{1}\{L_n>r\} L_n^{-(1-\alpha)}\ell(L_n)] + c_2(\alpha,r)\mathrm{P}\{L_n=r\},\end{eqnarray*}

where

\begin{align*}c_1(\alpha,r) &= c(r) + \frac{4^{1+r}}{r!}(1+r)^{1+r-\alpha}\gamma\bigg(1+r-\alpha,\frac{1}{2}\bigg),\\[3pt] c_2(\alpha,r) &= \left\{\begin{array}{l@{\quad}l}1&r=0,\\\min\{1,p_\vee^{r+1} r^{\alpha}\ell(r) + r^{-r}\} &r\ge1,\end{array}\right.\end{align*}

$p_\vee=\sup_{(a,k)\in\mathcal A}p_{a,k}$, and $\gamma(t,x)=\int_0^x u^{t-1}{\rm e}^{-u}\,{\rm d}u$ is the incomplete gamma function.
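The constants $c_1(\alpha,r)$ and $c_2(\alpha,r)$ are easy to evaluate numerically; the sketch below (an illustration, not part of the paper) does so using SciPy's regularized lower incomplete gamma function, taking $\ell\equiv 1$ and an assumed value of $p_\vee$.

\begin{verbatim}
import numpy as np
from scipy.special import gamma, gammainc

def lower_incomplete_gamma(t, x):
    # gamma(t, x) = Gamma(t) * P(t, x), with P the regularized lower incomplete gamma
    return gamma(t) * gammainc(t, x)

def c(r):
    # constant c(r) from (4)
    return np.exp(-1.0) if r == 0 else np.e * (1 + r) / np.sqrt(np.pi)

def c1(alpha, r):
    return c(r) + (4 ** (1 + r) / gamma(r + 1)
                   * (1 + r) ** (1 + r - alpha)
                   * lower_incomplete_gamma(1 + r - alpha, 0.5))

def c2(alpha, r, p_vee, ell=lambda x: 1.0):
    # ell is taken to be constant here (assumption)
    if r == 0:
        return 1.0
    return min(1.0, p_vee ** (r + 1) * r ** alpha * ell(r) + r ** (-r))

for r in range(3):
    print(r, c1(0.5, r), c2(0.5, r, p_vee=0.4))
\end{verbatim}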

Remark 4.2. Note that in the case $\alpha=1$ and $r=0$ , the bound in Proposition 4.1 is trivial since it involves $\gamma\left(0,\frac{1}{2}\right)=+\infty$ . Even in the i.i.d. case, the bounds given in [Reference Decrouez, Grabchak and Paris7] are not able to deal with this case.

Remark 4.3. Note that Theorem 4.1 and Proposition 4.1 are quite general and hold no matter what the underlying process is. However, this generality has a cost. In particular, we still need to know quite a bit about the underlying process. In the case where the underlying process is a finite state space ergodic Markov chain, we can use Proposition 3.1 and related results to get more explicit formulas.

Corollary 4.1. Assume that $|A|<\infty$ and that the underlying process is an aperiodic and irreducible Markov chain with transition operator Q, stationary distribution $\pi=(\pi_a)_{a\in A}$ , and initial distribution $\eta$ . Let $\pi_\wedge=\min_{a\in A}\pi_a$ , let $t_0$ be as in (7), and let $\lambda$ be as in (8). Assume further that, for some $\alpha\in[0,1]$ and some non-increasing function $\ell\in\mathcal{SV}$ , we have

\begin{equation*}\nu(a,\epsilon)\le \epsilon^{-\alpha}\ell(1/\epsilon), \qquad a\in A, \quad \epsilon\in(0,1].\end{equation*}

For any $\epsilon\in(0,\pi_\wedge)$ , if $n>\frac{2t_0+r\lambda + \lambda(1-\pi_\wedge)}{\lambda\pi_\wedge} \vee \frac{2t_0 + \lambda(1-\pi_\wedge)}{\lambda(\pi_\wedge-\epsilon)}$ , then

\begin{eqnarray*}\mathrm{P}_\eta\{\mathcal L_n = r\} \le H(n,\epsilon),\end{eqnarray*}

where

\begin{align*}H(n,\epsilon) = c_1(\alpha,r)(n\epsilon)^{-(1-\alpha)}\ell(n\epsilon) & + c_2(\alpha,r) C \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda\pi_\wedge}{t_0} - \frac{2+(r+1)\lambda/t_0}{n} \bigg)^2\bigg) \\[3pt] & + c_3(\alpha,r)C \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda(\pi_\wedge-\epsilon)}{t_0} - \frac{2+\lambda/t_0}{n} \bigg)^2\bigg).\end{align*}

Here, C is as in Proposition 3.1, $c_1(\alpha,r)$ and $c_2(\alpha,r)$ are as in Proposition 4.1, and

\begin{equation*}c_3(\alpha,r) = c_1(\alpha,r)(r+1)^{-(1-\alpha)}\ell(r+1).\end{equation*}

It may be interesting to note that for any $\epsilon\in(0,\pi_\wedge)$ we have

\begin{eqnarray*}H(n,\epsilon) \sim \epsilon^{-(1-\alpha)}c_1(\alpha,r)n^{-(1-\alpha)}\ell(n) \qquad \mbox{as }n\to\infty.\end{eqnarray*}

5. Asymptotics for the regime-switching model

In this section we extend (3) from the i.i.d. case to the case of the regime-switching model, where the underlying process is an ergodic Markov chain on a finite state space. We first define regular variation of $P=(p_{a,k})_{(a,k)\in A\times\mathbb N_+}$ . For a review of basic facts about regularly varying distributions on $\mathbb N_+$ we refer the reader to Appendix A.

Definition 5.1. We say that $P=(p_{a,k})_{(a,k)\in A\times\mathbb N_+}$ is regularly varying with index $\alpha\in[0,1]$ if there exists an $\ell\in \mathcal{SV}$ and a function $C\,{:}\,A\mapsto[0,\infty)$ , which is not identically zero, such that, for each $a\in A$ ,

\begin{equation*}\lim_{{\varepsilon}\to 0}\frac{\nu(a,{\varepsilon})}{{\varepsilon}^{-\alpha}\ell(1/{\varepsilon})}=C(a),\end{equation*}

where $\nu$ is defined as in (10). In this case we write $P\in RV_\alpha(C,\ell)$ .

Remark 5.1. It is important to note that we allow $C(a)=0$ for some (but not all) $a\in A$ . When $C(a)=0$ , it means that $\nu(a,{\varepsilon})$ either does not approach infinity as ${\varepsilon}\to 0$ , or approaches it but at a rate that is slower than ${\varepsilon}^{-\alpha}\ell(1/{\varepsilon})$ . In particular, if $P\in RV_\alpha(C,\ell)$ and $\nu(a_1,{\varepsilon})={\varepsilon}^{-\alpha_1}\ell_1(1/{\varepsilon})$ for some $a_1\in A$ , $\alpha_1\in[0,\alpha)$ , and $\ell_1\in \mathcal{SV}$ , then $C(a_1)=0$ .

When $\alpha=0$ , we additionally assume that there exists an $\ell_0\in \mathcal{SV}$ and a function $D\,{:}\,A\mapsto[0,\infty)$ , which is not identically zero, such that, for each $a\in A$ ,

(12) \begin{eqnarray}\lim_{{\varepsilon}\to 0}\frac{\sum_{k\ge1}p_{a,k} \textbf{1}\{p_{a,k}\le\varepsilon\}}{\varepsilon \ell_0(1/\varepsilon) }= D(a).\end{eqnarray}

For simplicity of notation, for $x>0$ set

\begin{equation*}h_{\alpha,r}(x) = \left\{\begin{array}{l@{\quad}l}x^{-1}\ell_0(x) & \alpha=0;\\[3pt] \int_x^\infty u^{-1}\ell(u)\mathrm d u & \alpha=1, \ r=0;\\[3pt] x^{-(1-\alpha)}\ell(x) & \mbox{otherwise.}\end{array}\right.\end{equation*}

Propositions A.1 and A.2 imply that if $P\in RV_\alpha(C,\ell)$ then

(13) \begin{eqnarray}\lim_{n\to\infty}\frac{\binom{n}{r}\sum_{k=1}^\infty p_{a,k}^{r+1} (1-p_{a,k})^{n-r}}{h_{\alpha,r}(n)} = F(a,r),\end{eqnarray}

where

(14) \begin{eqnarray}F(a,r) = \left\{ \begin{array}{l@{\quad}l}D(a) & \alpha=0;\\[3pt] C(a) & \alpha=1, \ r=0;\\[3pt] C(a) \frac{\alpha \Gamma(r+1-\alpha)}{r!} & \mbox{otherwise.}\end{array}\right.\end{eqnarray}
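For reference, the functions $h_{\alpha,r}$ and $F(a,r)$ can be coded directly from their definitions; the sketch below is an illustration only, with C, D, and the slowly varying functions supplied as assumed inputs, and with the $\alpha=1$, $r=0$ branch of $h_{\alpha,r}$ evaluated numerically for whatever $\ell$ is supplied (the integral must converge for that choice).

\begin{verbatim}
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

def h(x, alpha, r, ell=lambda u: 1.0, ell0=lambda u: 1.0):
    # h_{alpha,r}(x); for alpha = 1, r = 0 the integral of ell(u)/u must converge
    if alpha == 0:
        return ell0(x) / x
    if alpha == 1 and r == 0:
        return quad(lambda u: ell(u) / u, x, np.inf)[0]
    return x ** (-(1 - alpha)) * ell(x)

def F(a, r, alpha, C, D):
    # limit constants from (14)
    if alpha == 0:
        return D(a)
    if alpha == 1 and r == 0:
        return C(a)
    return C(a) * alpha * gamma(r + 1 - alpha) / gamma(r + 1)

# illustrative two-regime inputs (assumptions, not from the paper)
C = {0: 1.0, 1: 0.5}.get
D = {0: 0.0, 1: 0.0}.get
print(h(1000, 0.5, 0), F(0, 0, 0.5, C, D), F(1, 2, 0.5, C, D))
\end{verbatim}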

Note that since $|A|<\infty$ the convergence in (13) is uniform in a. We now give the main result for this section.

Theorem 5.1. In the context of the regime-switching model, assume that $|A|<\infty$ and that the underlying process is an aperiodic and irreducible Markov chain with stationary distribution $\pi=(\pi_a)_{a\in A}$ and initial distribution $\eta$ . Assume further that $P\in RV_\alpha(C,\ell)$ with $\alpha\in[0,1]$ (when $\alpha=0$ additionally assume that (12) holds), and that $\ell$ (or $\ell_0$ when $\alpha=0$ ) is locally bounded away from 0 and $\infty$ on $[1,\infty)$ . In this case, for all $r\ge 0$ we have

\begin{equation*}\lim_{n\to\infty} \frac{\mathrm{P}_{\eta}\{\mathcal L_n = r\} }{h_{\alpha,r}(n) } = \sum_{a\in A} \pi_a^\alpha F(a,r) \end{equation*}

and

\begin{eqnarray*}\lim_{n\to\infty} \frac{\mathrm{P}_{\eta}\{\mathcal L_n = r\} }{{\mathrm{E}}_\eta [\textbf{1}\{L_n>r\}h_{\alpha,r}(L_n)] } = \frac{\sum_{a\in A} \pi_a^\alpha F(a,r)}{\sum_{a\in A} \pi_a^\alpha},\end{eqnarray*}

where F is given by (14).

This implies that for $\alpha\in(0,1)$ we have, up to a constant, the same asymptotics as for the upper bound in Corollary 4.1. It may be interesting to note that as part of the proof of the theorem we show that, for any $r\ge0$ ,

\begin{equation*}\lim_{n\to\infty} \frac{{\mathrm{E}}_\eta [\textbf{1}\{L_n>r\}h_{\alpha,r}(L_n)] }{h_{\alpha,r}(n)} = \sum_{a\in A} \pi_a^\alpha.\end{equation*}
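To see the first limit of Theorem 5.1 at work, the following Monte Carlo sketch (an illustration, not part of the paper) simulates the regime-switching model for an assumed two-state regime chain and assumed per-regime power-law letter distributions with $\alpha=1/2$, and compares the empirical value of $\mathrm{P}_\eta\{\mathcal L_n = r\}$ with $\sum_{a}\pi_a^\alpha F(a,r)\,h_{\alpha,r}(n)$; agreement is only rough at moderate n.

\begin{verbatim}
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(3)
alpha, r, n, trials, K_MAX = 0.5, 0, 2_000, 2_000, 200_000

# assumed regime chain on A = {0, 1}; its stationary distribution is (0.75, 0.25)
Q = np.array([[0.9, 0.1],
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])

# per-regime power laws p_{a,k} proportional to (k + 5a)^{-1/alpha}, truncated at K_MAX,
# so that nu(a, eps) ~ C(a) eps^{-alpha} with C(a) = S_a^{-alpha}
cdf, C = [], []
for a in range(2):
    w = (np.arange(1, K_MAX + 1) + 5 * a) ** (-1.0 / alpha)
    cdf.append(np.cumsum(w / w.sum()))
    C.append(w.sum() ** (-alpha))

# simulate all trials in parallel, one time step at a time, starting in regime 0
X = np.zeros((trials, n + 1), dtype=np.int32)
K = np.zeros((trials, n + 1), dtype=np.int32)
for t in range(n + 1):
    if t > 0:
        X[:, t] = rng.random(trials) < Q[X[:, t - 1], 1]
    for a in range(2):
        idx = np.where(X[:, t] == a)[0]
        K[idx, t] = np.searchsorted(cdf[a], rng.random(idx.size)) + 1

# Monte Carlo estimate of P{mathcal L_n = r}
matches = ((X[:, :n] == X[:, [n]]) & (K[:, :n] == K[:, [n]])).sum(axis=1)
mc = (matches == r).mean()

# asymptotic value sum_a pi_a^alpha F(a, r) h_{alpha,r}(n) with ell = 1
F = np.array([C[a] * alpha * gamma(r + 1 - alpha) / gamma(r + 1) for a in range(2)])
theory = (pi ** alpha * F).sum() * n ** (-(1 - alpha))

print(mc, theory)
\end{verbatim}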

6. Proofs

6.1. Proofs for Sections 2 and 3

Proof of Lemma 2.1. Let us first prove that, for any distributions $\mu$ and $\eta$ on A and any bounded function $g\,{:}\,A^{n+1}\to {\mathbb R}_+$ ,

(15) \begin{equation}{\mathrm{E}}_{\mu}\left[\frac{\eta(X_{n+1})}{\pi(X_{n+1})}g(X_1,\dots,X_{n+1})\right]={\mathrm{E}}_{\eta}\left[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})} g(\hat X_{n+1},\dots,\hat X_1)\right]\!,\end{equation}

where, on the right-hand side, it is understood that $\eta$ is taken as the initial distribution of the reversed chain. From the definition of $\hat Q$ , we obtain

\begin{align*}&{\mathrm{E}}_{\mu}\left[\frac{\eta(X_{n+1})}{\pi(X_{n+1})}g(X_1,\dots,X_{n+1})\right]\\[3pt]&\quad=\sum_{x_1,\dots,x_{n+1}}\frac{\eta(x_{n+1})}{\pi(x_{n+1})}g(x_1,\dots,x_{n+1})\mathrm{P}_{\mu}(X_1=x_1,\dots,X_{n+1}=x_{n+1})\\[3pt] &\quad=\sum_{x_1,\dots,x_{n+1}}\frac{\eta(x_{n+1})}{\pi(x_{n+1})}g(x_1,\dots,x_{n+1})\mu(x_1)Q(x_1,x_2)\cdots Q(x_n,x_{n+1})\\[3pt] &\quad=\sum_{x_1,\dots,x_{n+1}}\frac{\eta(x_{n+1})}{\pi(x_{n+1})}g(x_1,\dots,x_{n+1})\mu(x_1)\frac{\pi(x_2)}{\pi(x_1)}\hat Q(x_2,x_1)\cdots \frac{\pi(x_{n+1})}{\pi(x_n)}\hat Q(x_{n+1},x_n)\\[3pt] &\quad=\sum_{x_1,\dots,x_{n+1}}\frac{\mu(x_{1})}{\pi(x_{1})}g(x_1,\dots,x_{n+1})\eta(x_{n+1})\hat Q(x_{n+1},x_n)\cdots\hat Q(x_2,x_1)\\[3pt] &\quad=\sum_{x_1,\dots,x_{n+1}}\frac{\mu(x_{1})}{\pi(x_{1})}g(x_1,\dots,x_{n+1})\mathrm{P}_{\eta}(\hat X_{1}=x_{n+1},\dots,\hat X_{n+1}=x_{1})\\[3pt] &\quad={\mathrm{E}}_{\eta}\left[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})} g(\hat X_{n+1},\dots,\hat X_1)\right]\!,\end{align*}

which proves (15). Then, for any positive f,

\begin{align*}{\mathrm{E}}_{\mu}\left[\frac{\eta(X_{n+1})}{\pi(X_{n+1})}f(L_n)\right] &= {\mathrm{E}}_{\mu}\bigg[\frac{\eta(X_{n+1})}{\pi(X_{n+1})}f\bigg(\sum_{i=1}^{n}\textbf{1}\{X_i=X_{n+1}\}\bigg)\bigg]\nonumber\\[4pt]&= {\mathrm{E}}_{\eta}\bigg[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})}f\bigg( \sum_{i=2}^{n+1}\textbf{1}\{\hat X_{i}=\hat X_{1}\}\bigg)\bigg]\\[4pt]&= {\mathrm{E}}_{\eta}\bigg[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})}f\bigg( \sum_{i=1}^{n+1}\textbf{1}\{\hat X_{i}=\hat X_1\}-1\bigg)\bigg]\nonumber\\[4pt]&= {\mathrm{E}}_{\eta}\bigg[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})}f( \hat L_{n+1}(\hat X_1)-1)\bigg],\nonumber\end{align*}

where the second line follows by applying identity (15) with

\begin{equation*}g(x_1,\dots,x_{n+1})=f\bigg( \sum_{i=1}^{n}\textbf{1}\{x_{i}=x_{n+1}\}\bigg).\end{equation*}

This completes the proof.

Proof of Proposition 3.1. Fix $r\ge 0$ and observe that the assumption on n implies that $\pi_\wedge> \frac{r}{n}$ . As a result, since ${\mathrm{E}}_{\pi}L_n(a)=n\pi_a$ , we deduce from Lemma 3.1 that

\begin{align*}\mathrm{P}_\mu\{L_n(a)\le r\} &= \mathrm{P}_\mu\{L_n(a)-n\pi_a\le -n(\pi_a-r/n)\} \\[3pt] &\le \mathrm{P}_\mu\{L_n(a)-n\pi_a\le -n(\pi_\wedge-r/n)\} \\[3pt] & \le \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda\pi_\wedge}{t_0} - \frac{2+r\lambda/t_0}{n} \bigg)^2\bigg)\end{align*}

when $n>2t_0/(\lambda(\pi_\wedge-r/n))$ , which is equivalent to $n>(2t_0+r\lambda)/(\lambda\pi_\wedge)$ . From here, we provide two bounds on $\mathrm{P}_\mu\{L_n\le r\}$ , which, when combined, give the desired result. First, note that

(16) \begin{align}\mathrm{P}_\mu\{L_n\le r\} &= \sum_{a\in A} \mathrm{P}_\mu\{X_{n+1} = a, L_n\le r\} \nonumber\\[3pt] &\le \sum_{a\in A} \mathrm{P}_\mu\{L_n(a)\le r\} \nonumber\\[3pt] & \le |A| \exp\bigg(-\frac{n}{2}\bigg(\frac{\lambda\pi_\wedge}{t_0} - \frac{2+r\lambda/t_0}{n} \bigg)^2\bigg).\end{align}

Next, using Lemma 2.1 with $f(u)=\textbf{1}\{u\le r\}$ , it follows that

\begin{align*}\mathrm{P}_\mu\{L_n\le r\} &= {\mathrm{E}}_{\pi}\bigg[\frac{\mu(\hat X_{n+1})}{\pi(\hat X_{n+1})}\textbf{1}\{\hat L_{n+1}(\hat X_1)\le r+1\}\bigg]\\[3pt] &\le \max_{b\in A}\frac{\mu(b)}{\pi(b)}\mathrm{P}_{\pi}\{\hat L_{n+1}(\hat X_1)\le r+1\}\\[3pt] &= \max_{b\in A}\frac{\mu(b)}{\pi(b)}\sum_{a\in A}\pi(a)\mathrm{P}_{a}\{\hat L_{n+1}(a)\le r+1\},\end{align*}

where $\mathrm{P}_{a}$ is the probability measure that corresponds to the case where the initial distribution is a point mass at a. Hence, once again using Lemma 3.1 and the fact that the stationary distribution of the reversed chain is the same as for the original chain, it follows that

(17) \begin{equation}\mathrm{P}_\mu\{L_n\le r\}\le \max_{b\in A}\frac{\mu_b}{\pi_b} \exp\bigg(-\frac{n+1}{2}\bigg(\frac{\lambda\pi_\wedge}{t_0} - \frac{2+(r+1)\lambda/t_0}{n+1} \bigg)^2\bigg),\end{equation}

provided $n+1>(2t_0+(r+1)\lambda)/(\lambda\pi_\wedge)$ , or equivalently $n>(2t_0+r\lambda + \lambda(1-\pi_\wedge))/(\lambda\pi_\wedge)$ . The desired result follows by combining (16) and (17).

6.2. Proofs for Section 4

For convenience, we sometimes denote $Y_{1\to\,m}=(Y_1,\dots, Y_{m})$ for a given process $(Y_n)_{n\ge 1}$ .

Proof of Lemma 4.1.

  (1) The statement follows easily from the structure of $\mathcal R$. Let $p_1$ and $p_2$ be the functions defined, for $(a,k)\in\mathcal A$, by $p_1(a,k)=a$ and $p_2(a,k)=k$. We have

\begin{equation*}\mathrm{P}(X_1 = a) = \sum_{k\ge1}\mathrm{P}(Z_1=(a,k)) = \sum_{k\ge1}\mathcal R_1((a,k)) = \sum_{k\ge1}R_1(a)p_{a,k} = R_1(a).\end{equation*}

Further, for any $n\ge 1$ and any bounded $f\,{:}\,A\mapsto \mathbb R$ ,

\begin{equation*}{\mathrm{E}}[\,f(X_{n+1}) \mid X_{1\to n}] ={\mathrm{E}}[ {\mathrm{E}}[\,f\circ p_1(Z_{n+1}) \mid Z_{1\to n}] \mid X_{1\to n}].\end{equation*}

From here, the fact that

\begin{align*}{\mathrm{E}}[\,f\circ p_1(Z_{n+1}) \mid Z_{1\to n}] &= \sum_{a\in A}\sum_{k\in\mathbb N_+}\mathcal R_{n+1}((a,k) \mid Z_{1},Z_2,\dots,Z_n)f\circ p_1(a,k)\\[3pt] &= \sum_{a\in A}\sum_{k\ge1} R_{n+1}(a \mid X_{1},X_2,\dots,X_n)p_{a,k}\,f(a) \\[3pt] &= \sum_{a\in A} R_{n+1}(a \mid X_{1},X_2,\dots,X_n)\,f(a)\end{align*}

implies that

\begin{equation*}{\mathrm{E}}[\,f(X_{n+1}) \mid X_{1\to n}] = \sum_{a\in A} R_{n+1}(a \mid X_{1},X_2,\dots,X_n) f(a).\end{equation*}

In particular, taking $f(a) = \textbf{1}{\{a=a'\}}$ gives

\begin{equation*}\mathrm{P}(X_{n+1}=a' \mid X_{1\to n}) = R_{n+1}(a' \mid X_{1},X_2,\dots,X_n),\end{equation*}

which proves the claim.

  (2) For all $n\ge 2$, all $k_{n}\ge 1$, and all $a_1,\dots, a_n\in A$ satisfying $\mathrm{P}\{X_1=a_1,\dots,X_n=a_n\}>0$,

(18) \begin{equation}\mathrm{P}\{K_{n}=k_{n} \mid X_1=a_1,\dots,X_n=a_n\} = \sum_{k_1,\dots,k_{n-1}}\frac{\mathrm{P}\{Z_n=(a_n,k_n),\dots,Z_1=(a_1,k_1)\}}{\mathrm{P}\{X_1=a_1,\dots,X_n=a_n\}} .\end{equation}

Using point (1) it follows that

\begin{equation*}\mathrm{P}\{X_1=a_1,\dots,X_n=a_n\}=R_1(a_1)R_2(a_2 \mid a_1)\cdots R_n(a_{n} \mid a_1,a_2,\dots,a_{n-1}),\end{equation*}

and that

\begin{multline*}\mathrm{P}\{Z_n=(a_n,k_n),\dots,Z_1=(a_1,k_1)\} = \\[3pt] R_1(a_1)R_2(a_2 \mid a_1)\cdots R_n(a_{n} \mid a_1,a_2,\dots,a_{n-1}) p_{a_1,k_1}p_{a_2,k_2}\cdots p_{a_n,k_n}.\end{multline*}

Combining these two identities with (18), we deduce that

\begin{align}\mathrm{P}\{K_{n}=k_{n} \mid X_1=a_1,\dots,X_n=a_n\}&= p_{a_n,k_n}\sum_{k_1,\dots,k_{n-1}} p_{a_1,k_1} p_{a_2,k_2}\dots p_{a_{n-1},k_{n-1}}\nonumber\\&=p_{a_n,k_n},\nonumber\end{align}

where the last identity follows from the fact that $\sum_k p_{a,k}=1$ . The case where $n=1$ is similar.

  (3) For any $k_1,\dots, k_n\in \mathbb N_+$ and any $a_1,\dots, a_n\in A$ satisfying $\mathrm{P}\{X_1=a_1,\dots,X_n=a_n\}>0$,

\begin{multline*}\mathrm{P}\{K_1=k_1,\dots,K_{n}=k_{n} \mid X_1=a_1,\dots,X_n=a_n\} = \\\prod_{i=1}^{n}p_{a_i,k_i} =\prod_{i=1}^{n}\mathrm{P}\{K_i=k_i \mid X_1=a_1,\dots,X_i=a_i\},\end{multline*}

where the first identity follows by arguments similar to those used in the proof of point (2) and the second follows directly from point (2). Finally, the proof that, for $i=1,2,\dots,n$ ,

\begin{equation*}\mathrm{P} \left\{K_{i}=K_{n+1}\,\vert\,X_{1},\dots,X_{n+1},K_{n+1}\right\}=p_{X_{i},K_{n+1}}\end{equation*}

is very similar and is omitted for brevity.

Proof of Lemma 4.2. Fix $n\ge 1$ and $0\le r\le n$ . Since $\{\mathcal L_n=r\}\subset\{L_n\ge r\}$ , we have

\begin{equation*}\mathrm{P}\{\mathcal L_n=r\}= \mathrm{P} \{L_n\ge r,\mathcal L_n=r\}.\end{equation*}

Noticing that the variable $L_n$ is $\sigma(X_{1\to\,n+1})$ -measurable by construction, we obtain

\begin{align}\mathrm{P}\{\mathcal L_n=r\} &=\mathrm{P}\bigg\{L_n\ge r,\,\sum_{i=1}^{n}\textbf{1}\{X_i=X_{n+1},K_i=K_{n+1}\}=r\bigg\} \nonumber\\[3pt] &={\mathrm{E}}\bigg[\textbf{1}\{L_n\ge r\}\mathrm{P}\bigg(\,\sum_{i=1}^{n}\textbf{1}\{X_i=X_{n+1},K_i=K_{n+1}\}=r\,\vert\,K_{n+1},X_{1\to\,n+1}\bigg)\bigg].\nonumber\end{align}

Conditionally on $K_{n+1}$ and $X_{1\to\,n+1}$ the variables $K_{1},\dots,K_{n}$ are, according to point (3) of Lemma 4.1, independent and satisfy

\begin{equation*}\mathrm{P}\{K_{i}=K_{n+1}\,\vert\,X_{1\to\,n+1},K_{n+1}\}=p_{X_{i},K_{n+1}}.\end{equation*}

As a result, conditionally on $K_{n+1}$ and $X_{1\to\,n+1}$ , the variable

\begin{equation*}\sum_{i=1}^{n}\textbf{1}\{X_i=X_{n+1},K_i=K_{n+1}\}\end{equation*}

follows a binomial distribution with parameters $L_n$ and $p_{X_{n+1},K_{n+1}}$ . Hence, we obtain

\begin{align}\mathrm{P}\{\mathcal L_n=r\}&={\mathrm{E}}\left[\textbf{1}\{L_n\ge r\}\binom{L_n}{r}p^{r}_{X_{n+1},K_{n+1}}(1-p_{X_{n+1},K_{n+1}})^{L_n-r}\right]\nonumber\\[4pt] &={\mathrm{E}}\left[\textbf{1}\{L_n\ge r\}\binom{L_n}{r}{\mathrm{E}}\left[p^{r}_{X_{n+1},K_{n+1}}(1-p_{X_{n+1},K_{n+1}})^{L_n-r}\,\vert\,X_{1\to\,n+1}\right]\right]\nonumber\\[4pt] &={\mathrm{E}}\bigg[\textbf{1}\{L_n\ge r\}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg]\nonumber,\end{align}

where the last line follows from point (2) of Lemma 4.1.

Proof of Theorem 4.1. From Lemma 4.2 it follows that

\begin{align*}\mathrm{P}\{\mathcal L_n=r\} & = {\mathrm{E}}\bigg[\textbf{1}\{L_n= r\}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg] \\[3pt] & \quad + {\mathrm{E}}\bigg[\textbf{1}\{L_n> r\}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg] \\[3pt] & \eqqcolon A_1(n) + A_2(n).\end{align*}

Note that

\begin{equation*}A_1(n) = {\mathrm{E}}\bigg[\textbf{1}\{L_n=r\}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}\bigg] \le \mathrm{P}\{L_n= r\}\sup_{a\in A}\sum_{k=1}^{+\infty}p^{1+r}_{a,k}.\end{equation*}

Now, using Lemma 1.1 inside the expectation yields

\begin{align*}A_2(n) & \le {\mathrm{E}}\left[\textbf{1}\{L_n>r\} \inf_{0\le{\varepsilon}\le 1}\{\alpha^{n,r}({\varepsilon})+\beta^{n,r}({\varepsilon})\}\right]\!,\end{align*}

where we have denoted

\begin{align}\alpha^{n,r}({\varepsilon}) & =\frac{c(r)\nu(X_{n+1},{\varepsilon})}{L_n},\nonumber\\[3pt]\beta^{n,r}({\varepsilon}) & = 2^{1+r}\binom{L_n}{r}\int_{0}^{{\varepsilon}}\nu\left(X_{n+1},\frac u2\right)u^r\left(1-\frac u2\right)^{L_n-r}{\rm d}u.\nonumber\end{align}

Finally, observing the fact that

\begin{align*}A_2(n) & \le {\mathrm{E}}\left[ \inf_{0\le{\varepsilon}\le 1}\{\textbf{1}\{L_n>r\}\alpha^{n,r}({\varepsilon})+\textbf{1}\{L_n>r\}\beta^{n,r}({\varepsilon})\}\right]\\[3pt]& \le \inf_{0\le{\varepsilon}\le 1}\left\{{\mathrm{E}}\left[\textbf{1}\{L_n>r\}\alpha^{n,r}({\varepsilon})\right]+{\mathrm{E}}\left[\textbf{1}\{L_n>r\}\beta^{n,r}({\varepsilon})\right]\right\}\\[3pt]& \le \inf_{0\le{\varepsilon}\le 1}\left\{ a^{n,r}({\varepsilon})+b^{n,r}({\varepsilon})\right\} \end{align*}

gives the result.

Proof of Proposition 4.1. By Lemma 4.2, we have

\begin{align*}\mathrm{P}\{\mathcal L_n=r\} &= {\mathrm{E}}\bigg[\textbf{1}\{L_n>r\} \binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg] \\[3pt]& \quad + {\mathrm{E}}\bigg[\textbf{1}\{L_n=r\} \sum_{k\ge1}p^{1+r}_{X_{n+1},k}\bigg]\\[3pt] & \eqqcolon E_1 + E_2.\end{align*}

Corollary 2.2 from [Reference Decrouez, Grabchak and Paris7] implies that

\begin{equation*}E_1 \le c_1(\alpha,r) {\mathrm{E}}[\textbf{1}\{L_n>r\} L_n^{-(1-\alpha)}\ell(L_n)].\end{equation*}

From here, in the case where $r=0$, the result follows from the fact that $\sum_{k\ge1}p_{X_{n+1},k}=1$.

Now, assume that $r\ge1$ . Taking $\epsilon=1/r$ in (2.4) of [Reference Decrouez, Grabchak and Paris7] implies that

\begin{equation*}E_2 \le \left(p_\vee^{r+1} r^{\alpha}\ell(r) + r^{-r}\right) \mathrm{P}\{L_n=r\}.\end{equation*}

On the other hand, since $\sum_{k\ge1}p^{1+r}_{X_{n+1},k}\le \sum_{k\ge1}p_{X_{n+1},k}=1$ , we also have

\begin{equation*}E_2 \le \mathrm{P}\{L_n=r\}.\end{equation*}

This completes the proof.

Proof of Corollary 4.1. Fix $\epsilon\in(0,\pi_\wedge)$ , let

\begin{equation*}A(n) = \{r+1\le L_n < n\epsilon\} \quad \mbox{and} \quad B(n) = \{L_n\ge (n\epsilon) \vee (r+1)\},\end{equation*}

and note that $A(n)\cup B(n)=\{r+1\le L_n\}$ . We can write

\begin{align*} {\mathrm{E}}_{\eta}\left[\textbf{1}\{L_n>r\} L_n^{-(1-\alpha)}\ell(L_n)\right] &= {\mathrm{E}}_{\eta}\left[\textbf{1}_{A(n)} L_n^{-(1-\alpha)}\ell(L_n)\right] + {\mathrm{E}}_{\eta}\left[\textbf{1}_{B(n)} L_n^{-(1-\alpha)}\ell(L_n)\right] \\[3pt] & = E_1+E_2.\end{align*}

Now note that

\begin{eqnarray*}E_1 \le (r+1)^{-(1-\alpha)}\ell(r+1)\mathrm{P}_\eta\{r+1\le L_n < n\epsilon\}\end{eqnarray*}

and

\begin{eqnarray*}E_2 \le (n\epsilon)^{-(1-\alpha)}\ell(n\epsilon) \mathrm{P}_\eta\{L_n\ge n\epsilon\}.\end{eqnarray*}

Combining this with Proposition 4.1 gives

\begin{multline*}\mathrm{P}_{\eta}\{\mathcal L_n = r\} \le \inf_{\epsilon\in(0,\pi_\wedge)}\big\{ c_1(\alpha,r)(n\epsilon)^{-(1-\alpha)}\ell(n\epsilon) \mathrm{P}_\eta\{L_n\ge n\epsilon\} + c_2(\alpha,r)\mathrm{P}_{\eta}\{L_n=r\} \\[3pt] + c_3(\alpha,r)\mathrm{P}_\eta\{r+1\le L_n < n\epsilon\} \big\}.\end{multline*}

From here, the result follows by applying Proposition 3.1.

6.3. Proofs for Section 5

To prove Theorem 5.1, we begin with two technical results.

Lemma 6.1. Let $(X_n)_{n\ge1}$ be an irreducible and aperiodic Markov chain on a finite state space A and with stationary distribution $\pi = (\pi_a)_{a\in A}$ . Let $\pi_\wedge=\min_{a\in A}\pi_a$ and let $L_n= \sum_{k=1}^n \textbf{1}\{X_k=X_{n+1}\}$ .

  1. For any $\beta\in\mathbb R$, any $\epsilon\in[0,\pi_\wedge)$, any $r>0$, and any initial distribution $\eta$, we have

    \begin{eqnarray*}\lim_{n\to\infty} n^{\beta} \mathrm{P}_{\eta}\left\{\frac{L_n}{n}\le\epsilon \right\} =0\end{eqnarray*}
    and
    \begin{equation*}\lim_{n\to\infty}n^{\beta} \mathrm{P}_{\eta}\{L_n=r\}=0.\end{equation*}
  2. If $\alpha\in[0,1]$ and $\ell\in\mathcal{SV}$, then, with probability 1,

    (19) \begin{eqnarray}\lim_{n\to\infty}\left( \frac{L_n^{-(1-\alpha)}\ell(L_n)}{n^{-(1-\alpha)}\ell(n)} - \pi_{X_{n+1}}^{-(1-\alpha)} \right)= 0\end{eqnarray}
    and, for any $r\ge0$ and any initial distribution $\eta$ ,
    \begin{equation*}\lim_{n\to\infty} \frac{{\mathrm{E}}_{\eta}[\textbf{1}\{L_n>r\}L_n^{-(1-\alpha)}\ell(L_n)]}{n^{-(1-\alpha)}\ell(n)} = \sum_{a\in A} \pi_a^\alpha .\end{equation*}

Proof. The first part follows immediately from the exponential bound in Proposition 3.1.

We now turn to the second part. For ease of notation, set $h(x) = x^{-(1-\alpha)}\ell(x)$. Since the Markov chain is irreducible and aperiodic on a finite state space, it is recurrent and hence $\lim_{n\to\infty}L_n=\infty$ with probability 1. Further, it satisfies the strong law of large numbers, which means that for each $a\in A$, if $L_n(a) = \sum_{k=1}^n \textbf{1}\{X_k=a\}$ then $\lim_{n\to\infty} L_n(a)/n=\pi_a$ with probability 1. Since A is a finite set, this convergence is, with probability 1, uniform in a. Let $\Omega_0\subset\Omega$ be a set with $\mathrm{P}_\eta(\Omega_0)=1$ such that, for any $\omega\in\Omega_0$, we have $\lim_{n\to\infty}L_n(\omega)=\infty$ and, for any $\epsilon>0$, there exists an $N^{\prime}_\epsilon(\omega)$ such that if $n\ge N^{\prime}_\epsilon(\omega)$ then

\begin{equation*}\left|\frac{L_n(\omega)}{n} - \pi_{X_{n+1}(\omega)}\right|<\epsilon.\end{equation*}

Now fix $\epsilon>0$ and $\omega\in\Omega_0$ . There exists an $N_\epsilon(\omega)>0$ such that if $n\ge N_\epsilon(\omega)$ then $\left|\frac{L_n(\omega)}{n} - \pi_{X_{n+1}(\omega)}\right|<0.5\pi_\wedge$ and

\begin{equation*}\left|\pi_{X_{n+1}(\omega)}^{-(1-\alpha)} - \left(\frac{L_n(\omega)}{n}\right)^{-(1-\alpha)}\right| < \epsilon/2.\end{equation*}

Further, by the uniform convergence theorem for regularly varying functions, see e.g. Proposition 2.4 of [Reference Resnick21], there is a $T_\epsilon$ such that, for any $x\in(0.5\pi_{\wedge},1]$ and any $t\ge T_\epsilon$ ,

\begin{equation*}\left|\frac{h(xt)}{h(t)} - x^{-(1-\alpha)}\right| < \epsilon/2.\end{equation*}

Since $\frac{L_n}{n}\le 1$ , it follows that, for $n\ge \max\{N_\epsilon(\omega),T_\epsilon\}$ ,

\begin{multline*}\bigg|\frac{h\big(\frac{L_n(\omega)}{n} n\big)}{h(n)} - \pi_{X_{n+1}(\omega)}^{-(1-\alpha)}\bigg|\le \bigg|\frac{h\big(\frac{L_n(\omega)}{n} n\big)}{h(n)} - \bigg(\frac{L_n(\omega)}{n}\bigg)^{-(1-\alpha)}\bigg| \\+ \bigg|\pi_{X_{n+1}(\omega)}^{-(1-\alpha)} - \bigg(\frac{L_n(\omega)}{n}\bigg)^{-(1-\alpha)}\bigg| < \epsilon,\end{multline*}

which proves (19).

We now turn to the last part. Fix $\epsilon\in(0,\pi_\wedge)$ and let

\begin{equation*}A(n) = \{r+1\le L_n < n\epsilon\} \quad \mbox{and} \quad B(n) = \{L_n\ge (n\epsilon) \vee (r+1)\}.\end{equation*}

Note that $A(n)\cup B(n)=\{r+1\le L_n\}$ . We can write

\begin{align*}&\frac{{\mathrm{E}}_{\eta}[\textbf{1}\{L_n>r\}L_n^{-(1-\alpha)}\ell(L_n)]}{n^{-(1-\alpha)}\ell(n)} \\[3pt]&\qquad\qquad = \frac{{\mathrm{E}}_{\eta}[ \textbf{1}_{A(n)}L_n^{-(1-\alpha)}\ell(L_n)]}{n^{-(1-\alpha)}\ell(n)}+ \frac{{\mathrm{E}}_{\eta}[\textbf{1}_{B(n)}L_n^{-(1-\alpha)}\ell(L_n)]}{n^{-(1-\alpha)}\ell(n)} \\[3pt]&\qquad\qquad\eqqcolon E_{A}(n) + E_{B}(n).\end{align*}

Fix $\delta>0$ ; by the Potter bounds (see, e.g., Theorem 1.5.6 of [Reference Bingham, Goldie and Teugels3]), there exists a $K_\delta>0$ such that

\begin{align*}E_A(n) \le K_\delta {\mathrm{E}}_{\eta}\bigg[\textbf{1}_{A(n)} \bigg( \frac{L_n}{n}\bigg)^{-(1-\alpha)-\delta} \bigg]\le K_\delta \mathrm{P}_{\eta}(A(n)) n^{1-\alpha+\delta} \to0,\end{align*}

where the convergence follows by the first part of this lemma. Similarly,

\begin{eqnarray*}\frac{\textbf{1}_{B(n)}L_n^{-(1-\alpha)}\ell(L_n)}{n^{-(1-\alpha)}\ell(n)} &\le& K_\delta \textbf{1}_{B(n)} \left( \frac{L_n}{n}\right)^{-(1-\alpha)-\delta} \le K_\delta \epsilon^{-(1-\alpha)-\delta}.\end{eqnarray*}

Combining this with the fact that $\pi_{X_{n+1}}^{-(1-\alpha)}$ is bounded means that we can use dominated convergence to get

\begin{align*} \lim_{n\to\infty}E_B(n) &= \lim_{n\to\infty}(E_{B} (n) + {\mathrm{E}}_{\eta}[\pi_{X_{n+1}}^{-(1-\alpha)}] - {\mathrm{E}}_{\eta}[\pi_{X_{n+1}}^{-(1-\alpha)}])\\[3pt]&= {\mathrm{E}}_{\eta}\bigg[\lim_{n\to\infty} \bigg(\frac{\textbf{1}_{B(n)}L_n^{-(1-\alpha)}\ell(L_n)}{n^{-(1-\alpha)}\ell(n)} - \pi_{X_{n+1}}^{-(1-\alpha)}\bigg)\bigg] + \lim_{n\to\infty} {\mathrm{E}}_{\eta}[\pi_{X_{n+1}}^{-(1-\alpha)}] \\[3pt] &=\lim_{n\to\infty} {\mathrm{E}}_{\eta}[\pi_{X_{n+1}}^{-(1-\alpha)}] = {\mathrm{E}}_{\pi}[\pi_{X_{1}}^{-(1-\alpha)}] =\sum_{a\in A} \pi_a^\alpha,\end{align*}

where the third equality follows from (19) and the fact that, with probability 1, there exists a (random) N such that $\textbf{1}_{B(n)}=1$ for all $n\ge N$ , and the fourth equality follows by the fact that the distribution of $X_{n}$ converges weakly to $\pi$ , Skorokhod’s representation theorem, and dominated convergence.

Lemma 6.2. Let $|A|<\infty$ and let $P\in RV_\alpha(C,\ell)$ . When $\alpha=0$ assume, in addition, that (12) holds.

  1. Let $(X_n)_{n\ge1}$ be any sequence of A-valued random variables and let $(N_n)_{n\ge1}$ be a sequence of $\mathbb N$-valued random variables such that, with probability 1, $N_n\to\infty$ as $n\to\infty$. Then, with probability 1,

    \begin{equation*}\lim_{n\to\infty}\bigg( \frac{\binom{N_n}{r}\sum_{k=1}^\infty p_{X_{n+1},k}^{r+1} (1-p_{X_{n+1},k})^{N_n-r}}{h_{\alpha,r}(N_n)} - F(X_{n+1}, r)\bigg)=0.\end{equation*}
  2. Let $X=(X_k)_{k\ge1}$ be an irreducible and aperiodic Markov chain with state space A and stationary distribution $\pi=(\pi_a)_{a\in A}$. If $L_n= \sum_{k=1}^n \textbf{1}\{X_k=X_{n+1}\}$, then, with probability 1,

    \begin{equation*}\lim_{n\to\infty}\bigg(\frac{\binom{L_n}{r}\sum_{k=1}^\infty p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(n)}- \pi_{X_{n+1}}^{-(1-\alpha)}F(X_{n+1},r)\bigg) =0.\end{equation*}

Note that in the first part the sequences $(X_{n})$ and $(N_n)$ may be dependent or independent.

Proof. We begin with the first part. Let $\Omega_0\in\mathcal F$ be a set with $\mathrm{P}(\Omega_0)=1$ such that, for any $\omega\in\Omega_0$ , $N_n(\omega)\to\infty$ . Fix $\epsilon>0$ and $\omega\in\Omega_0$ . Since (13) holds uniformly in a, it follows that there is an $M_\epsilon>0$ such that for all $m\ge M_\epsilon$ and all $n\ge1$ we have

\begin{eqnarray*}\left|\frac{\binom{m}{r}\sum_{k=1}^\infty p_{X_{n+1}(\omega),k}^{r+1} (1-p_{X_{n+1}(\omega),k})^{m-r}}{ h_{\alpha,r}(m)}- F(X_{n+1}(\omega),r) \right|<\epsilon.\end{eqnarray*}

Now let $M'_\epsilon(\omega)>0$ be a number such that if $n\ge M'_\epsilon(\omega)$ then $N_n(\omega)\ge M_\epsilon$ . For all such n, the above holds with $N_n(\omega)$ in place of m. From here, the first part follows.

For the second part, we have

\begin{align*}&\frac{\binom{L_n}{r}\sum_{k=1}^\infty p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(n)}- \pi_{X_{n+1}}^{-(1-\alpha)}F(X_{n+1},r) \\[3pt] & \quad = \bigg(\frac{h_{\alpha,r}(L_n)}{h_{\alpha,r}(n)} - \pi_{X_{n+1}}^{-(1-\alpha)}\bigg)\bigg( \frac{\binom{L_n}{r}\sum_{k=1}^\infty p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(L_n)}- F(X_{n+1},r)\bigg) \\[3pt]&\qquad+\ \pi_{X_{n+1}}^{-(1-\alpha)} \bigg( \frac{\binom{L_n}{r}\sum_{k=1}^\infty p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(L_n)}- F(X_{n+1},r)\bigg) \\[3pt]&\qquad +\ F(X_{n+1},r) \left( \frac{h_{\alpha,r}(L_n)}{h_{\alpha,r}(n)} - \pi_{X_{n+1}}^{-(1-\alpha)} \right)\!.\end{align*}

Since the Markov chain X is irreducible on a finite state space, all of its states are recurrent and hence $\lim_{n\to\infty}L_n=\infty$ with probability 1. Thus, by the first part of this lemma, the fact that $\max_{a\in A}F(a,r)<\infty$ , and the fact that $\pi_{X_{n+1}}^{-(1-\alpha)}\le \left(\min_a\pi_a\right)^{-(1-\alpha)}$ , it suffices to show that, with probability 1,

\begin{equation*}\lim_{n\to\infty}\left( \frac{h_{\alpha,r}(L_n)}{h_{\alpha,r}(n)} - \pi_{X_{n+1}}^{-(1-\alpha)} \right)= 0,\end{equation*}

which holds by Lemma 6.1.

Proof of Theorem 5.1. Note that, by Lemma 4.2,

\begin{multline*}\mathrm{P}_\eta\{\mathcal L_n=r\} = {\mathrm{E}}_{\eta}\bigg[\textbf{1}\{L_n>r\}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg] \\+ {\mathrm{E}}_{\eta}\bigg[\textbf{1}\{L_n=r\}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}\bigg] \eqqcolon E_1+E_2.\end{multline*}

We begin with $E_2$ . Since $\sum_{k\ge1}p^{1+r}_{X_{n+1},k}\le \sum_{k\ge1}p_{X_{n+1},k}=1$ , it follows that

\begin{eqnarray*}\frac{E_2}{h_{\alpha,r}(n)}=\frac{{\mathrm{E}}_{\eta}\left[\textbf{1}\{L_n=r\}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}\right]}{h_{\alpha,r}(n)} &\,{\le}\,& \frac{\mathrm{P}_{\eta}\{L_n=r\}}{h_{\alpha,r}(n)} \to0,\end{eqnarray*}

where the convergence follows by Lemma 6.1. We next turn to $E_1$. Note that (13), the fact that $|A|<\infty$, and the fact that $h_{\alpha,r}$ is locally bounded imply that there is a constant $K'>0$, depending only on $r$, such that

\begin{align*}\frac{\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(n)}&\le K'\frac{h_{\alpha,r}(L_n)}{h_{\alpha,r}(n)}\\&\le H_\delta K'\left( \frac{L_n}{n}\right)^{-(1-\alpha)-\delta} \end{align*}

for any $\delta>0$ and some $H_\delta>1$ . Here, the second inequality follows by the Potter bounds; see, e.g., Theorem 1.5.6 of [Reference Bingham, Goldie and Teugels3]. For simplicity, set $K_\delta = K' H_\delta$ . Fix $\epsilon\in(0,\pi_\wedge)$ and let

\begin{equation*}A(n) = \{r+1\le L_n < n\epsilon\} \quad \mbox{and} \quad B(n) = \{L_n \ge (n\epsilon) \vee (r+1)\}.\end{equation*}

Note that $A(n)\cup B(n)=\{r+1\le L_n\}$ . We can write

\begin{multline*}E_1 = {\mathrm{E}}_{\eta}\bigg[\textbf{1}_{A(n)}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg] \\+ {\mathrm{E}}_{\eta}\bigg[\textbf{1}_{B(n)}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}\bigg] \eqqcolon E_{1A} + E_{1B}.\end{multline*}

By Lemma 6.1, we have

\begin{align*}\frac{E_{1A} }{h_{\alpha,r}(n)}& \le K_\delta {\mathrm{E}}_{\eta}\bigg[\textbf{1}_{A(n)} \left( \frac{L_n}{n}\right)^{-(1-\alpha)-\delta} \bigg] \\& \le K_\delta \mathrm{P}_{\eta}(A(n)) n^{1-\alpha+\delta} \to0.\end{align*}

Similarly,

\begin{align*}\frac{\textbf{1}_{B(n)}\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(n)} &\le K_\delta \textbf{1}_{B(n)} \left( \frac{L_n}{n}\right)^{-(1-\alpha)-\delta}\\&\le K_\delta \epsilon^{-(1-\alpha)-\delta}.\end{align*}

Combining this with the fact that $\pi_{X_{n+1}}^{-(1-\alpha)}F(X_{n+1},r)$ is bounded for fixed $r$, we can apply dominated convergence to get

\begin{align*} \lim_{n\to\infty}\frac{E_{1B} }{h_{\alpha,r}(n)} & ={\mathrm{E}}_{\eta}\bigg[\lim_{n\to\infty} \bigg(\textbf{1}_{B(n)} \frac{\binom{L_n}{r}\sum_{k\ge1}p^{1+r}_{X_{n+1},k}(1-p_{X_{n+1},k})^{L_n-r}}{h_{\alpha,r}(n)} \\[4pt] &\quad - \pi_{X_{n+1}}^{-(1-\alpha)} F(X_{n+1},r) \bigg)\bigg] \\& \quad + \lim_{n\to\infty} {\mathrm{E}}_{\eta}[\pi_{X_{n+1}}^{-(1-\alpha)}F(X_{n+1},r)] \\& = \lim_{n\to\infty} {\mathrm{E}}_{\eta}[\pi_{X_{n+1}}^{-(1-\alpha)}F(X_{n+1},r)] = {\mathrm{E}}_{\pi}[\pi_{X_{1}}^{-(1-\alpha)}F(X_{1},r)] = \sum_{a\in A} \pi_a^\alpha F(a,r),\end{align*}

where the second equality follows from Lemma 6.2 and the fact that, with probability 1, there exists a (random) N such that $\textbf{1}_{B(n)}=1$ for all $n\ge N$ . The third equality follows from the fact that the distribution of $X_{n}$ converges weakly to $\pi$ , Skorokhod’s representation theorem, and dominated convergence. This gives the first part of Theorem 5.1. The second part follows from the first and Lemma 6.1.

Appendix A. Regular variation

In this appendix we briefly review several basic facts about regularly varying distributions on $\mathbb N_+$ . First, we recall that for a probability measure $P=(p_{k})_{k\in\mathbb N_+}$ on $\mathbb N_+$ , the counting measure $\boldsymbol{\nu}_P$ is defined by (1) and the counting function $\nu$ is defined by (2).

Definition A.1. A probability distribution $P=(p_{k})_{k\ge 1}$ with counting function $\nu$ is said to be regularly varying, with exponent $\alpha\in[0,1]$ , if

\begin{equation*}\lim_{{\varepsilon}\to 0}\frac{\nu({\varepsilon})}{{\varepsilon}^{-\alpha}\ell(1/{\varepsilon})}=1\end{equation*}

for some $\ell\in\mathcal{SV}$ . In this case, we write $P\in RV_{\alpha}(\ell)$ .
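As a quick numerical illustration of Definition A.1 (a sketch of ours, not part of the original development): for a Zipf-type law $p_k\propto k^{-1/\alpha}$ with $\alpha\in(0,1)$, the counting function satisfies $\nu(\varepsilon)\sim c^{\alpha}\varepsilon^{-\alpha}$, where $c$ is the normalizing constant, so $P\in RV_{\alpha}(\ell)$ with $\ell$ asymptotically constant. The Python sketch below truncates the support at a large $K$ (a computational convenience only) and takes $\nu(\varepsilon)=\#\{k\colon p_k\ge\varepsilon\}$, which agrees with the counting function of (2) up to the convention at the boundary.

```python
import numpy as np

alpha = 0.5                     # regular-variation index (illustration only)
K = 10**6                       # truncation of the countably infinite support (computational assumption)
k = np.arange(1, K + 1)
w = k ** (-1.0 / alpha)
p = w / w.sum()                 # Zipf-type law: p_k = c * k^{-1/alpha}
c = p[0]                        # the normalizing constant c

# Counting function nu(eps) = #{k : p_k >= eps}; for this law nu(eps) ~ c^alpha * eps^{-alpha},
# so the printed ratios should be close to 1, i.e. P is in RV_alpha with ell approximately constant.
for eps in [1e-3, 1e-4, 1e-5]:
    nu = np.sum(p >= eps)
    print(eps, nu / (eps ** -alpha * c ** alpha))
```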

To motivate this definition, we recall the following fact from [Reference Gnedin, Hansen and Pitman11]. For $\alpha\in(0,1)$ , we have $P\in RV_{\alpha}(\ell)$ if and only if $p_k= k^{-1/\alpha} \ell^*(k) $ for some $\ell^*\in\mathcal{SV}$ , which is, in general, different from $\ell$ . When $\alpha=0$ , a sufficient condition for $P\in RV_{\alpha}(\ell)$ is that there exists an $\ell_0\in\mathcal{SV}$ with

(20) \begin{eqnarray}\int_{[0,\varepsilon]} x \boldsymbol{\nu}_P(\mathrm d x) =\sum_{k\ge1}p_k \textbf{1}\{p_k\le\varepsilon\} \sim \varepsilon \ell_0(1/\varepsilon) \qquad \mbox{as }\varepsilon\to0.\end{eqnarray}

In this case, we necessarily have

\begin{equation*}\ell(x) \sim \int_1^x u^{-1}\ell_0(u)\mathrm du \qquad \mbox{as } x\to\infty\end{equation*}

and $\ell_0(x)/\ell(x)\to0$ as $x\to\infty$ ; see Proposition 15 of [Reference Gnedin, Hansen and Pitman11]. We will generally assume that (20) holds in this case.
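To make the sufficient condition (20) concrete, consider (as an illustration of ours) the light-tailed law $p_k\propto {\rm e}^{-\sqrt{k}}$. A direct computation gives $\sum_{k}p_k\textbf{1}\{p_k\le\varepsilon\}\sim 2\varepsilon\log(1/\varepsilon)$ as $\varepsilon\to0$, so (20) holds with $\ell_0(x)=2\log x$ and, correspondingly, $\ell(x)\sim(\log x)^2$. The Python sketch below checks the ratio numerically; the truncation level $K$ is a computational convenience only.

```python
import numpy as np

K = 20_000                       # truncation of the support (the neglected tail is numerically negligible)
k = np.arange(1, K + 1)
w = np.exp(-np.sqrt(k))
p = w / w.sum()                  # p_k proportional to exp(-sqrt(k)): a light-tailed law with alpha = 0

# The "small parts" sum in (20): sum of p_k over {p_k <= eps}. For this law it behaves like
# 2 * eps * log(1 / eps), so the printed ratios approach 1 (slowly, with logarithmic corrections).
for eps in [1e-4, 1e-6, 1e-8]:
    small_parts = p[p <= eps].sum()
    print(eps, small_parts / (2 * eps * np.log(1 / eps)))
```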

Proposition A.1. Let $P=(p_{k})_{k\ge 1}\in RV_\alpha(\ell)$ . If $\alpha\in(0,1)$ then, for all $r\ge 0$ ,

(21) \begin{eqnarray}\lim_{n\to\infty}\frac{\binom{n}{r}\sum_{k=1}^\infty p_{k}^{r+1} (1-p_{k})^{n-r}}{n^{\alpha-1}\ell(n)} = \frac{\alpha \Gamma(r+1-\alpha)}{r!}.\end{eqnarray}

If $\alpha=0$ and (20) holds then, for every $r\ge0$ ,

\begin{eqnarray*}\lim_{n\to\infty}\frac{\binom{n}{r}\sum_{k=1}^\infty p^{r+1}_{k} (1-p_{k})^{n-r}}{n^{-1}\ell_0(n)} = 1.\end{eqnarray*}

If $\alpha=1$ then for every $r\ge1$ the result in (21) holds. If $\alpha=1$ and $r=0$ then

\begin{eqnarray*}\lim_{n\to\infty}\frac{\sum_{k=1}^\infty p_{k} (1-p_{k})^{n}}{\ell_1(n)} = 1,\end{eqnarray*}

where $\ell_1(x)=\int_x^\infty u^{-1}\ell(u)\mathrm du$ for $x>1$ is a function with $\ell_1\in\mathcal{SV}$ and $\ell(x)/\ell_1(x)\to0$ as $x\to\infty$ .

Proof. For $\alpha\in(0,1)$ this is Proposition 7 of [Reference Ohannessian and Dahleh18]. For $\alpha=1$ the result follows by combining Proposition 18 of [Reference Gnedin, Hansen and Pitman11] with Lemma 2 of [Reference Grabchak and Zhang14]. Similarly, for $\alpha=0$ the result follows by combining Proposition 19 of [Reference Gnedin, Hansen and Pitman11] with Lemma 2 of [Reference Grabchak and Zhang14]. The facts about $\ell_1$ are given in Proposition 14 of [Reference Gnedin, Hansen and Pitman11].
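As a numerical check of (21) (again a sketch of ours, not drawn from the cited references), consider the Zipf-type law $p_k\propto k^{-1/\alpha}$ from the illustration after Definition A.1, for which $\ell(n)$ can be replaced by the constant $c^{\alpha}$, with $c$ the normalizing constant. Convergence in (21) is slow, so the printed ratios only approach the limiting constant $\alpha\Gamma(r+1-\alpha)/r!$ gradually; the truncation $K$ is a computational convenience.

```python
import numpy as np
from math import comb, gamma, factorial

alpha = 0.5
K = 10**6                                  # truncation of the support (computational assumption)
k = np.arange(1, K + 1)
w = k ** (-1.0 / alpha)
p = w / w.sum()                            # Zipf-type law p_k = c * k^{-1/alpha}
ell = p[0] ** alpha                        # for this law, ell(n) is asymptotically the constant c^alpha

for r in [0, 1]:
    target = alpha * gamma(r + 1 - alpha) / factorial(r)
    for n in [10**3, 10**4, 10**5]:
        lhs = comb(n, r) * np.sum(p ** (r + 1) * (1 - p) ** (n - r))
        print(f"r={r}, n={n}: ratio={lhs / (n ** (alpha - 1) * ell):.3f}, limit={target:.3f}")
```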

Proposition A.2. Fix $\alpha\in[0,1]$ and $\ell\in\mathcal{SV}$ . When $\alpha\ne0$ assume that

\begin{eqnarray*}\lim_{{\varepsilon}\to 0}\frac{\nu({\varepsilon})}{{\varepsilon}^{-\alpha}\ell(1/{\varepsilon})}=0 ,\end{eqnarray*}

and when $\alpha=0$ assume that

\begin{eqnarray*}\lim_{{\varepsilon}\to 0}\frac{\int_{[0,{\varepsilon}]} x \boldsymbol{\nu}_P(\mathrm d x) }{ \varepsilon \ell(1/{\varepsilon})} =0.\end{eqnarray*}

If $\alpha\in[0,1)$ then, for all $r\ge 0$ ,

(22) \begin{eqnarray}\lim_{n\to\infty}\frac{\binom{n}{r}\sum_{k=1}^\infty p_{k}^{r+1} (1-p_{k})^{n-r}}{n^{\alpha-1}\ell(n)} = 0.\end{eqnarray}

If $\alpha=1$ then for all $r\ge1$ the result in (22) holds. If $\alpha=1$ and $r=0$ then

\begin{eqnarray*}\lim_{n\to\infty}\frac{\sum_{k=1}^\infty p_{k} (1-p_{k})^{n}}{\ell_1(n)} = 0,\end{eqnarray*}

where $\ell_1$ is derived from $\ell$ as in Proposition A.1.

Proof. Let $q=r+1$ , let $\boldsymbol\nu_P^q(\mathrm d x) = x^q \boldsymbol\nu_P(\mathrm d x)$ , let $\Phi_q(n) = \frac{n^q}{q!} \sum_{k=1}^\infty p_k^q {\rm e}^{-np_k}$ , and note that

\begin{equation*}\Phi_q(n) = \frac{n^q}{q!}\int_0^1 {\rm e}^{-nx} \boldsymbol\nu_P^q(\mathrm d x).\end{equation*}

A standard application of Fubini’s theorem gives

\begin{equation*}\boldsymbol\nu_P^q([0,s]) = \int_{[0,s]} x^q \boldsymbol\nu_P(\mathrm dx) = q \int_0^s u^{q-1} \nu(u)\mathrm du - s^q\boldsymbol\nu_P((s,1]).\end{equation*}

Fix $\delta>0$. For $\alpha\ne0$, the assumptions imply that, for all sufficiently small $s$, we have $\nu(s)\le \delta s^{-\alpha}\ell(1/s)$. It follows that, for $\alpha\in(0,1)$, or for $\alpha=1$ and $q\ge2$, we have

\begin{align*}\lim_{s\to0^+} \frac{\boldsymbol\nu_P^q([0,s])}{s^{q-\alpha}\ell(1/s)} &= \lim_{s\to0^+} \frac{q \int_0^s u^{q-1} \nu(u)\mathrm du}{s^{q-\alpha}\ell(1/s)}\le \delta \lim_{s\to0^+} \frac{q \int_0^s u^{q-1-\alpha} \ell(1/u)\mathrm du}{s^{q-\alpha}\ell(1/s)} \\[3pt]&= \delta \lim_{s\to0^+} \frac{q \int_{1/s}^\infty u^{-(q+1-\alpha)} \ell(u)\mathrm du}{s^{q-\alpha}\ell(1/s)} = \frac{q}{q-\alpha}\delta,\end{align*}

where the last equality follows by Karamata’s theorem (Proposition 1.5.10 in [Reference Bingham, Goldie and Teugels3]). Since $\delta>0$ was arbitrary, it follows that

\begin{equation*}\lim_{s\to0^+} \frac{\boldsymbol\nu_P^q([0,s])}{s^{q-\alpha}\ell(1/s)} = 0.\end{equation*}

Similarly, when $\alpha=1$ and $q=1$ we have

\begin{align*}\lim_{s\to0^+} \frac{\boldsymbol{\nu}^1_P([0,s])}{\ell_1(1/s)} &= \lim_{s\to0^+} \frac{ \int_0^s \nu(u)\mathrm du}{\ell_1(1/s)}\le \delta \lim_{s\to0^+} \frac{\int_0^s u^{-1} \ell(1/u)\mathrm du}{\ell_1(1/s)} \\[3pt] &= \delta \lim_{s\to0^+} \frac{ \int_{1/s}^\infty u^{-1} \ell(u)\mathrm du}{\ell_1(1/s)} = \delta \lim_{s\to0^+} \frac{ \ell_1(1/s)}{\ell_1(1/s)} =\delta,\end{align*}

and hence, since $\delta>0$ was arbitrary,

\begin{equation*}\lim_{s\to0^+} \frac{\boldsymbol{\nu}^1_P([0,s])}{\ell_1(1/s)} =0.\end{equation*}

When $\alpha=0$ we have

\begin{equation*}\boldsymbol\nu_P^q([0,s]) = \int_{[0,s]} x^{q-1} \boldsymbol\nu_P^1(\mathrm dx) \le s^{q-1}\boldsymbol\nu_P^1([0,s]),\end{equation*}

and hence

\begin{equation*}\lim_{s\to0^+} \frac{\boldsymbol\nu_P^q([0,s]) }{s^q\ell(1/s)}\le \lim_{s\to0^+} \frac{s^{q-1}\boldsymbol\nu_P^1([0,s]) }{s^q\ell(1/s)} = \lim_{s\to0^+} \frac{\boldsymbol\nu_P^1([0,s]) }{s\ell(1/s)}= 0.\end{equation*}

From here, a version of Karamata’s Tauberian theorem (Theorem 1.7.1 in [Reference Bingham, Goldie and Teugels3]) implies that, for $\alpha\in[0,1)$, or for $\alpha=1$ and $q\ge2$,

\begin{eqnarray*}\lim_{n\to\infty}\frac{\Phi_q(n)}{n^{\alpha}\ell(n)} = \lim_{n\to\infty}\frac{\int_0^1 {\rm e}^{-nx} \boldsymbol\nu_P^q(\mathrm d x)}{q!\, n^{\alpha-q}\ell(n)}=0\end{eqnarray*}

and that the corresponding result holds for the case $\alpha=1$ and $q=1$. From here, since $(n+1)^{\alpha-1}\ell(n+1)\sim n^{\alpha-1}\ell(n)$ and $\ell_1(n+1)\sim\ell_1(n)$, we can use Lemma 2 of [Reference Grabchak and Zhang14] to complete the proof.
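To see Proposition A.2 in action (a numerical sketch of ours), take a geometric law $p_k=(1-q)q^{k-1}$. Its counting function grows only logarithmically in $1/\varepsilon$, so $\nu(\varepsilon)/\varepsilon^{-\alpha}\to0$ for every $\alpha\in(0,1)$, and the proposition applies with $\ell\equiv1$: the normalized sums in (22) tend to zero. The sketch below checks this for $r=0$.

```python
import numpy as np

q = 0.5
K = 200                           # q**K is numerically negligible, so truncating the support is harmless
k = np.arange(1, K + 1)
p = (1 - q) * q ** (k - 1)        # geometric law: nu(eps) ~ log(1/eps)/log(1/q) = o(eps^{-alpha})

alpha = 0.5                       # any alpha in (0, 1) works; we take ell identically equal to 1
for n in [10**2, 10**3, 10**4, 10**5]:
    lhs = np.sum(p * (1 - p) ** n)            # the r = 0 numerator in (22)
    print(n, lhs / n ** (alpha - 1))          # should decrease to 0
```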

Acknowledgements

The work of M. Kelbert and Q. Paris has been funded by the Russian Academic Excellence Project 5-100. The work of M. Grabchak is supported, in part, by the Russian Science Foundation (project number 17-11-01098). The authors would like to thank the two anonymous referees, whose comments led to improvements in the presentation of this paper.

References

[1] Ben-Hamou, A., Boucheron, S. and Gassiat, E. (2016) Pattern coding meets censoring: (Almost) adaptive coding on countable alphabets. arXiv:1608.08367.
[2] Ben-Hamou, A., Boucheron, S. and Ohannessian, M. I. (2017) Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications. Bernoulli 23, 249–287.
[3] Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987) Regular Variation (Encyclopedia of Mathematics and Its Applications). Cambridge University Press.
[4] Bubeck, S., Ernst, D. and Garivier, A. (2013) Optimal discovery with probabilistic expert advice: Finite time analysis and macroscopic optimality. J. Mach. Learn. Res. 14, 601–623.
[5] Chao, A. (1981) On estimating the probability of discovering a new species. Ann. Statist. 9, 1339–1342.
[6] Chen, S. F. and Goodman, J. (1999) An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13, 359–394.
[7] Decrouez, G., Grabchak, M. and Paris, Q. (2018) Finite sample properties of the mean occupancy counts and probabilities. Bernoulli 24, 1910–1941.
[8] Efron, B. and Thisted, R. (1976) Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63, 435–447.
[9] Gandolfi, A. and Sastri, C. C. A. (2004) Nonparametric estimations about species not observed in a random sample. Milan J. Math. 72, 81–105.
[10] Glynn, P. W. and Ormoneit, D. (2002) Hoeffding’s inequality for uniformly ergodic Markov chains. Statist. Prob. Lett. 56, 143–146.
[11] Gnedin, A., Hansen, B. and Pitman, J. (2007) Notes on the occupancy problem with infinitely many boxes: General asymptotics and power laws. Prob. Surv. 4, 146–171.
[12] Good, I. J. (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264.
[13] Good, I. J. and Toulmin, G. H. (1956) The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43, 45–63.
[14] Grabchak, M. and Zhang, Z. (2017) Asymptotic properties of Turing’s formula in relative error. Mach. Learn. 106, 1771–1785.
[15] Johnson, N. L. and Kotz, S. (1977) Urn Models and Their Application. Wiley, New York.
[16] Karlin, S. (1967) Central limit theorems for certain infinite urn schemes. J. Math. Mech. 17, 373–401.
[17] Mao, C. X. and Lindsay, B. G. (2002) A Poisson model for the coverage problem with a genomic application. Biometrika 89, 669–681.
[18] Ohannessian, M. I. and Dahleh, M. A. (2012) Rare probability estimation under regularly varying heavy tails. In Proc. 25th Ann. Conf. on Learning Theory, Vol. 23, pp. 21.1–21.24.
[19] Orlitsky, A., Santhanam, N. P. and Zhang, J. (2004) Universal compression of memoryless sources over unknown alphabets. IEEE Trans. Inf. Theory 50, 1469–1481.
[20] Paulin, D. (2015) Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron. J. Prob. 20, 1–32.
[21] Resnick, S. I. (2007) Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, New York.
[22] Roberts, G. O. and Rosenthal, J. S. (2004) General state space Markov chains and MCMC algorithms. Prob. Surv. 1, 20–71.
[23] Thisted, R. and Efron, B. (1987) Did Shakespeare write a newly discovered poem? Biometrika 74, 445–455.
[24] Zhang, C. H. (2005) Estimation of sums of random variables: Examples and information bounds. Ann. Statist. 33, 2022–2041.
[25] Zhang, Z. and Huang, H. (2007) Turing’s formula revisited. J. Quant. Ling. 14, 222–241.