
Entropy of killed-resurrected stationary Markov chains

Published online by Cambridge University Press:  25 February 2021

Servet Martínez*
Affiliation:
Universidad de Chile
*
*Postal address: Departamento Ingeniería Matemática and Centro Modelamiento Matemático, Universidad de Chile, UMI 2807 CNRS, Casilla 170-3, Correo 3, Santiago, Chile. Email address: smartine@dim.uchile.cl

Abstract

We consider a strictly substochastic matrix or a stochastic matrix with absorbing states. By using quasi-stationary distributions we show that there is an associated canonical Markov chain that is built from the resurrected chain, the absorbing states, and the hitting times, together with a random walk on the absorbing states, which is necessary for achieving time stationarity. Based upon the 2-stringing representation of the resurrected chain, we supply a stationary representation of the killed and the absorbed chains. The entropies of these representations have a clear meaning when one identifies the probability measure of natural factors. The balance between the entropies of these representations and the entropy of the canonical chain serves to check the correctness of the whole construction.

Type
Research Papers
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Our starting point is an irreducible strictly substochastic matrix $P_I$ on a countable set I. It defines a killed Markov chain when adding a cemetery that is an absorbing state. One of our purposes is to explore how we can study the entropy of this chain.

The problem can be posed for a Markov chain that is absorbed in a class of states, which is not necessarily a singleton. It is in this enlarged setting that we study the entropy. In this study we use the following concepts:

  • the quasi-stationary distribution (QSD) of the matrix $P_I$; it exists and is unique when I is finite, and when I is infinite we assume that some QSD exists;

  • the Markov chain defined by resurrecting the absorbed chain with the QSD;

  • the 2-stringing of the resurrected Markov chain.

In Proposition 1 we show that every QSD defines a canonical stationary distribution associated with an absorbed chain. A construction of this associated stationary chain is given in Proposition 2, showing that it can be recovered from the resurrected chain with the QSD and some additional random elements: the killing on the orbits, the transition to the absorbing states, and a walk on the set of absorbing states. In this chain the absorbing states share the same transition probabilities.

The 2-stringing of the resurrected Markov chain is used to supply stationary Markov representations of the killed and the absorbed Markov chains in an appropriate way, to compute their entropies and provide a clear interpretation. This is done in Sections 5.1 and 5.2 and in Propositions 3 and 4. The entropies are interpreted by identifying the probability measure on the fibers of some natural factors. The entropy of the killed chain is the entropy of the resurrected chain plus the entropy of being alive or killed, and in the absorbed case we must add the entropy of the states where the trajectories are absorbed. These additional terms are given by the Abramov–Rokhlin formula on some factors. We note that since the killed and the absorbed trajectories are finite, almost all the orbits of the stationary representations of the killed and the absorbed Markov chains contain all the killed or absorbed trajectories.

Finally, in Proposition 5 the entropy of the associated stationary chain is decomposed into the entropies of the absorbed chain and of the walk on the set of absorbing states. This last element serves to complete the understanding of the stationary representation of the absorbed chain: it gives the return time to I and the weights of the absorbing states that are necessary for stationarity. The main parameter of the whole construction is the reciprocal of the mean length of the walk, which is also the weight of the set of states I and the Perron–Frobenius eigenvalue of $P_I$; see the relations (9), (4) and (1).

Before proceeding further, we shall illustrate the associated Markov chain by considering a simplified model. Towards this end, the states in I will be called observable while the absorbing states will be called reservoirs; a transition from a reservoir to an observable state is called an outbreak. The dynamics within the reservoirs cannot be observed; the process is only seen at the observable states. A first example of this situation is given by disease epidemics that emerge and disappear in time. A disease may emerge at some location due to human contact with a biological reservoir of some microorganism, diffuse in some geographical area, and when finished retreat unseen to some biological reservoir again. In this case the observable states represent the sizes of the healthy and infected populations in geographic areas, while the reservoirs correspond to the biological reservoirs of the microorganism. From our result the associated Markov chain requires a walk over the reservoirs in order to be stationary in time.

As usual, we use the capital letter H for the entropy of a discrete random variable and h for the entropy of a stationary chain.

Although it is not standard usage, we use ‘trajectory’ to refer to a visit of a finite sequence of states, and ‘orbit’ for a bilateral sequence of states, that is, for a point in a bilateral product space.

2. Killed and absorbed chains

Let $P_I=(P(i,j)\,:\, i,j\in I)$ be an irreducible strictly substochastic matrix on a countable set I. As usual, we add a state $\partial\not\in I$ called a cemetery, and the extension of $P_I$ to $I\cup \{\partial\}$ is denoted by P, which satisfies $P(i, \partial)=1-\sum_{j\in I} P(i,j)$ for $i\in I$, and the absorption condition $P(\partial, \partial)=1$. Strict substochasticity is equivalent to $\sum_{i\in I}P(i,\partial)>0$. By irreducibility the states in I are transient. The process defined by $P_I$ is identified with the chain absorbed at a unique cemetery $\partial$.

The existence of a unique cemetery models the killing when this phenomenon can be interpreted similarly for all states, for instance in extinction, where a unique $\partial$ has a clear meaning. But there can be several ways of being killed or hitting a boundary, and this is expressed by the existence of a set of absorbing states which is not necessarily a singleton.

So, we consider a more general situation. Let $P=(P(a,b)\,:\, a,b\in I\cup {\mathcal{E}})$ be a stochastic matrix on the countable set $I\cup {\mathcal{E}}$ whose restriction to I is $P_I=(P(i,j)\,:\,i,j\in I)$, and such that all the states in ${\mathcal{E}}$ are absorbing and are attained from I. This means that $\sum_{i\in I} P(i,\varepsilon)>0$ for every $\varepsilon\in {\mathcal{E}}$. We retrieve the one-point absorption when ${\mathcal{E}}=\{\partial\}$.

Let ${\mathcal{X}}=({\mathcal{X}}_n)$ be a Markov chain with transition matrix P; it will be called the absorbed chain. By $\mathbb{P}_a$ we mean the law of this chain when starting from $a\in I\cup {\mathcal{E}}$, and $\mathbb{E}_a$ denotes the associated expectation. Let $\tau_{\mathcal{E}}=\inf\{n\ge 1\,:\, {\mathcal{X}}_n\in {\mathcal{E}}\}$ be the first hitting time of ${\mathcal{E}}$. If I is finite, the hypotheses made on the chain imply $\mathbb{P}_i(\tau_{\mathcal{E}}<\infty)=1$ for all $i\in I$. When I is infinite we assume that $\mathbb{P}_i(\tau_{\mathcal{E}}<\infty)=1$ for all $i\in I$.

Let ${\mathcal{X}}^{(K)}=({\mathcal{X}}_n\,:\, 0\le n<\tau_{\mathcal{E}})$ and ${\mathcal{X}}^{(A)}=({\mathcal{X}}_n\,:\, 0\le n\le \tau_{\mathcal{E}})$ be, respectively, the killed and the absorbed trajectory, both starting from ${\mathcal{X}}_0$. The first one finishes when it is killed, and the second one is stopped at the state where it is absorbed.
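As a small numerical illustration (not part of the paper), the killed and absorbed trajectories can be simulated for a hypothetical example with $I=\{0,1,2\}$ and ${\mathcal{E}}=\{3,4\}$; the matrix entries below are arbitrary choices made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example (for illustration only): I = {0, 1, 2} and
# absorbing states E = {3, 4}; the entries are arbitrary.
P = np.array([
    [0.4, 0.3, 0.1, 0.1, 0.1],
    [0.2, 0.3, 0.3, 0.15, 0.05],
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.0, 0.0, 0.0, 1.0, 0.0],   # absorbing state 3
    [0.0, 0.0, 0.0, 0.0, 1.0],   # absorbing state 4
])
I_states, E_states = [0, 1, 2], [3, 4]

def run_until_absorbed(start, rng):
    """Return the absorbed trajectory X^(A) = (X_0, ..., X_{tau_E})."""
    traj = [start]
    while traj[-1] not in E_states:
        traj.append(int(rng.choice(5, p=P[traj[-1]])))
    return traj

xA = run_until_absorbed(0, rng)   # absorbed trajectory, ends in E
xK = xA[:-1]                      # killed trajectory: same path, endpoint dropped
assert all(s in I_states for s in xK) and xA[-1] in E_states
```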

2.1. Quasi-stationary distributions

A QSD $\mu=(\mu(i)\,:\, i\in I)$ associated with $P_I$ is a probability measure $\mu$ on I such that $\text{for all } i\in I, \ \mathbb{P}_\mu({\mathcal{X}}_n=i \mid \tau_{\mathcal{E}}>n)=\mu(i)$. By writing this equality for $n=1$, we check that the row vector $\mu^\top$ is a strictly positive left eigenvector of $P_I$ properly normalized (the sum of its components is 1), with eigenvalue $\gamma=\mathbb{P}_\mu(\tau_{\mathcal{E}}>1)\in (0,1)$, that is,

(1) \begin{equation}\mu^\top P_I=\gamma\mu^\top, \hbox{ with }\gamma=\sum_{i,j\in I}\mu(i)P(i,j)=\mathbb{P}_\mu(\tau_{\mathcal{E}}>1).\end{equation}

It follows that $\mathbb{P}_\mu(\tau_{\mathcal{E}}>k)=\gamma^k$ for all $k\ge 0$. So, if $\mu$ is a QSD then the survival time is Geometric($1-\gamma$) distributed; see Lemma 2.2 in [Reference Ferrari, Martínez and Picco8]. In the finite case there is a unique QSD (see [Reference Darroch and Seneta4]); it corresponds to the normalized left Perron–Frobenius eigenvector, and $\gamma$ is the associated eigenvalue. The properties of a QSD depend on the killed trajectory ${\mathcal{X}}^{(K)}=({\mathcal{X}}_n\,:\, 0\le n<\tau_{\mathcal{E}})$. In the infinite case QSDs may or may not exist (because the positive left eigenvectors can be of infinite mass), and when they exist there can be more than one (even a continuum of them). From now on we fix some QSD $\mu$ which, as just discussed, exists and is unique in the finite case, and we assume its existence in the infinite case.
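For illustration (a numerical sketch, not from the paper), the unique QSD of a hypothetical finite substochastic matrix can be computed as the normalized left Perron–Frobenius eigenvector, and relation (1) checked directly; the matrix entries are arbitrary.

```python
import numpy as np

# Hypothetical irreducible strictly substochastic matrix on I = {0, 1, 2}
# (row sums < 1); the entries are arbitrary, chosen only for illustration.
P_I = np.array([[0.4, 0.3, 0.1],
                [0.2, 0.3, 0.3],
                [0.1, 0.2, 0.4]])

# In the finite case the QSD is the normalized left Perron-Frobenius
# eigenvector of P_I, and gamma is the associated eigenvalue.
vals, vecs = np.linalg.eig(P_I.T)
k = np.argmax(vals.real)
gamma = vals[k].real
mu = np.abs(vecs[:, k].real)
mu /= mu.sum()                    # normalize so that sum_i mu(i) = 1

assert 0 < gamma < 1
assert np.allclose(mu @ P_I, gamma * mu)   # relation (1)
# Survival under mu is then Geometric(1 - gamma): P_mu(tau_E > k) = gamma^k.
```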

Let us give some independence properties between the time of killing and the absorption state. In Theorem 2.6 in [Reference Collet, Martínez and San Martín3] the independence relation $\mathbb{P}_\mu({\mathcal{X}}_n=i, \tau_{\mathcal{E}}>n)=\mu(i) \gamma^n$ for all $i\in I$ and $n\ge 0$ was stated. Let us prove that when starting from $\mu$, the pair $({\mathcal{X}}_{{\tau_{\mathcal{E}}}-1},{\mathcal{X}}_{\tau_{\mathcal{E}}})$, consisting of the last visited state before absorption and the absorption state, is independent of the random time $\tau_{\mathcal{E}}$. For $n\ge 1$, $i\in I$, and $\varepsilon\in {\mathcal{E}}$, we have

\begin{align*}\mathbb{P}_\mu({\mathcal{X}}_{{\tau_{\mathcal{E}}}-1}=i, {\mathcal{X}}_{\tau_{\mathcal{E}}}=\varepsilon, \tau_{\mathcal{E}}=n) & =\mathbb{P}_\mu({\mathcal{X}}_{n-1}=i, {\mathcal{X}}_n=\varepsilon, \tau_{\mathcal{E}}=n) \\& = \mathbb{P}_\mu({\mathcal{X}}_n=\varepsilon \mid {\mathcal{X}}_{n-1}=i)\mathbb{P}_\mu({\mathcal{X}}_{n-1}=i,\tau_{\mathcal{E}}>n-1) \\& = P(i,\varepsilon)\, \mu(i) \, \mathbb{P}_\mu(\tau_{\mathcal{E}}>n-1)= P(i,\varepsilon) \, \mu(i) \, \gamma^{n-1}.\end{align*}

Then, the independence relation follows. We can be more precise: we have

\begin{equation*}\mathbb{P}_\mu({\mathcal{X}}_{{\tau_{\mathcal{E}}}-1}=i, {\mathcal{X}}_{\tau_{\mathcal{E}}}=\varepsilon)=P(i,\varepsilon)\, \mu(i) \Bigg(\sum_{l\ge 1}\gamma^{l-1}\Bigg)=P(i,\varepsilon) \, \mu(i) \, (1-\gamma)^{-1}.\end{equation*}

Since $\mathbb{P}_\mu(\tau_{\mathcal{E}}=n)=(1-\gamma)\gamma^{n-1}$, the desired relation holds:

\begin{equation*}\mathbb{P}_\mu({\mathcal{X}}_{{\tau_{\mathcal{E}}}-1}=i, {\mathcal{X}}_{\tau_{\mathcal{E}}}=\varepsilon, \tau_{\mathcal{E}}=n)=\mathbb{P}_\mu({\mathcal{X}}_{{\tau_{\mathcal{E}}}-1}=i, {\mathcal{X}}_{\tau_{\mathcal{E}}}=\varepsilon)\mathbb{P}_\mu(\tau_{\mathcal{E}}=n).\end{equation*}

The above computations also show that the exit law of I when starting from $\mu$ satisfies

(2) \begin{equation} \begin{aligned} \mathbb{P}_\mu({\mathcal{X}}_{\tau_{\mathcal{E}}} & = \varepsilon)=(1-\gamma)^{-1}\Bigg(\sum_{i\in I}\mu(i)P(i,\varepsilon)\Bigg) , \\ \mathbb{P}_\mu({\mathcal{X}}_{\tau_{\mathcal{E}}} & = \varepsilon,\tau_{\mathcal{E}}=n)=\Bigg(\sum_{i\in I}\mu(i)P(i,\varepsilon)\Bigg) \gamma^{n-1}. \end{aligned}\end{equation}

These properties depend on the absorbed trajectory ${\mathcal{X}}^{(A)}=({\mathcal{X}}_n\,:\, 0\le n\le \tau_{\mathcal{E}})$.

3. Associated stationary chain

Let $\rho=(\rho(a)\,:\, a\in I\cup {\mathcal{E}})$ be a probability vector. We define the stochastic matrix $P^\rho=(P^\rho(a,b)\,:\,a,b\in I\cup {\mathcal{E}})$ by

\begin{equation*}P^\rho(i,b)=P(i,b) \hbox{ if } i\in I, b\in I\cup {\mathcal{E}} , \qquad P^\rho(\varepsilon,b)=\rho(b) \hbox{ if } \varepsilon\in {\mathcal{E}},b\in I\cup {\mathcal{E}}.\end{equation*}

So, in $P^\rho$ the $\varepsilon$-row is $P^\rho(\varepsilon,\bullet)=\rho^\top$ for all $\varepsilon\in {\mathcal{E}}$.

Proposition 1. Every QSD $\mu$ of $P_I$ determines a probability distribution $\pi=(\pi(a)\,:\, a\in I\cup {\mathcal{E}})$ given by

(3) \begin{equation}\text{for all } \varepsilon\in {\mathcal{E}}, \ \pi(\varepsilon)=\sum_{i\in I}\mu(i)P(i,\varepsilon), \quad \text{and for all } i\in I, \ \pi(i)=\gamma \mu(i) , \end{equation}

which is a stationary distribution of the matrix $P^\pi=(P^\pi(a,b)\,:\, a,b\in I\cup {\mathcal{E}})$. In a reciprocal way, every distribution ${\widetilde{\pi}}$ that satisfies ${\widetilde{\pi}}^\top={\widetilde{\pi}}^\top P^{{\widetilde{\pi}}}$ is defined by a QSD $\mu$ as in (3). So, if $P_I$ has a unique QSD (as in the finite case) then there is a unique distribution $\pi$ that satisfies $\pi^\top=\pi^\top P^\pi$.

Proof. The QSD $\mu$ satisfies $\mu^\top P_I=\gamma\mu^\top$ with $\gamma\in (0,1)$ and $\sum_{i\in I} \mu(i)=1$. The vector $\pi$ is a probability distribution because, from (3) and (1), $\pi(I)=\sum_{i\in I} \pi(i)$ and $\pi({\mathcal{E}})=\sum_{\varepsilon\in {\mathcal{E}}}\pi(\varepsilon)$ satisfy

(4) \begin{equation}\pi(I)=\sum_{i\in I}\gamma \mu(i)=\gamma \hbox{ and }\pi({\mathcal{E}})=\sum_{i\in I} \mu(i)P(i,{\mathcal{E}})=1-\gamma.\end{equation}

We check that $\pi$ is stationary for $P^\pi$. For $\varepsilon\in {\mathcal{E}}$ and $j\in I$ we have

\begin{eqnarray*}&{}& \big(\pi^\top P^\pi\big)(\varepsilon)=\pi(\varepsilon)\sum_{\delta\in {\mathcal{E}}}\pi(\delta)+\sum_{i\in I} \pi(i) P(i,\varepsilon)=\pi(\varepsilon)(1-\gamma)+\gamma \pi(\varepsilon)=\pi(\varepsilon) , \\ &{}& \big(\pi^\top P^\pi\big)(j)=\pi(j)\sum_{\varepsilon\in {\mathcal{E}}}\pi(\varepsilon)+\sum_{i\in I} \pi(i) P(i,j)=\pi(j)(1-\gamma)+\gamma \pi(j)=\pi(j).\end{eqnarray*}

Then $\pi^\top=\pi^\top P^\pi$ holds.

Now we check that a probability distribution ${\widetilde{\pi}}$ that satisfies ${\widetilde{\pi}}^\top={\widetilde{\pi}}^\top P^{{\widetilde{\pi}}}$ is necessarily defined by a QSD $\mu$ as in (3). For $j\in I$ we have

\begin{equation*}{\widetilde{\pi}}(j)=\big({\widetilde{\pi}}^\top P^{{\widetilde{\pi}}}\big)(j)={\widetilde{\pi}}(j) \sum_{\delta\in {\mathcal{E}}}{\widetilde{\pi}}(\delta)+ \sum_{i\in I} {\widetilde{\pi}}(i) P(i,j),\end{equation*}

and so ${\widetilde{\pi}}(j)\big(1-\sum_{\delta\in {\mathcal{E}}}{\widetilde{\pi}}(\delta)\big)=\sum_{i\in I} {\widetilde{\pi}}(i) P(i,j)$. Then, the restriction ${\widetilde{\pi}}_I=({\widetilde{\pi}}(i)\,:\, i\in I)$ satisfies $\gamma {\widetilde{\pi}}_I^\top= {\widetilde{\pi}}_I^\top P_I$ with $\gamma=1-\sum_{\delta\in {\mathcal{E}}}{\widetilde{\pi}}(\delta)$. So, ${\widetilde{\pi}}_I$ is a strictly positive left eigenvector with finite mass, and its normalization $\mu=\gamma^{-1} {\widetilde{\pi}}_I$ is a QSD. Then, ${\widetilde{\pi}}_I=\gamma \mu=(\pi(i)\,:\, i\in I)$ is given by the second term in (3). On the other hand, we have

\begin{equation*}{\widetilde{\pi}}(\varepsilon)=\big({\widetilde{\pi}}^\top P^{{\widetilde{\pi}}}\big)(\varepsilon)={\widetilde{\pi}}(\varepsilon)\sum_{\delta\in {\mathcal{E}}}{\widetilde{\pi}}(\delta)+\sum_{i\in I} {\widetilde{\pi}}(i) P(i,\varepsilon).\end{equation*}

Then, ${\widetilde{\pi}}(\varepsilon)\big(1-\sum_{\delta\in {\mathcal{E}}}{\widetilde{\pi}}(\delta)\big)=\sum_{i\in I} {\widetilde{\pi}}(i) P(i,\varepsilon)$, so ${\widetilde{\pi}}(\varepsilon)\gamma=\gamma \sum_{i\in I} \mu(i) P(i,\varepsilon)$. This gives the equality ${\widetilde{\pi}}(\varepsilon)=\sum_{i\in I} \mu(i)P(i,\varepsilon)$, so ${\widetilde{\pi}}=\pi$, which finishes the proof.

From the equality $\pi(I)=\gamma$ in (4), we shall use $\pi(I)$ in what follows to refer to the Perron–Frobenius eigenvalue of $P_I$.
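Proposition 1 can be checked numerically on a hypothetical example (the matrices are arbitrary choices for this sketch): build $\pi$ from $\mu$ via (3), assemble $P^\pi$, and verify $\pi^\top=\pi^\top P^\pi$ together with (4).

```python
import numpy as np

# Hypothetical example: I = {0, 1, 2}, E = {3, 4} (arbitrary entries).
P_I = np.array([[0.4, 0.3, 0.1],
                [0.2, 0.3, 0.3],
                [0.1, 0.2, 0.4]])
P_IE = np.array([[0.1, 0.1],
                 [0.15, 0.05],
                 [0.2, 0.1]])     # P(i, eps) for i in I, eps in E

vals, vecs = np.linalg.eig(P_I.T)
k = np.argmax(vals.real)
gamma = vals[k].real
mu = np.abs(vecs[:, k].real)
mu /= mu.sum()

# Relation (3): pi(i) = gamma mu(i) on I, pi(eps) = sum_i mu(i) P(i, eps) on E.
pi = np.concatenate([gamma * mu, mu @ P_IE])

# P^pi keeps the I-rows of P and replaces every eps-row by pi.
P_pi = np.vstack([np.hstack([P_I, P_IE]), np.tile(pi, (2, 1))])

assert np.isclose(pi.sum(), 1.0)        # pi is a probability vector
assert np.isclose(pi[:3].sum(), gamma)  # relation (4): pi(I) = gamma
assert np.allclose(pi @ P_pi, pi)       # stationarity: pi^T = pi^T P^pi
```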

Observe that (2) can be written as

\begin{equation*}\mathbb{P}_\mu({\mathcal{X}}_{\tau_{\mathcal{E}}}=\varepsilon)=\pi(\varepsilon \mid {\mathcal{E}}) \hbox{ and }\mathbb{P}_\mu({\mathcal{X}}_{\tau_{\mathcal{E}}}=\varepsilon,\tau_{\mathcal{E}}=n)=\pi(\varepsilon \mid {\mathcal{E}})(1-\pi(I)) \pi(I)^{n-1}.\end{equation*}

We denote by ${\mathbb{X}}=(X_n)$ the Markov chain evolving with the transition kernel $P^\pi$ and call it the associated stationary chain. By an abuse of notation we shall denote $P^\pi$ by P, and so, from now on, $P(\varepsilon,b)=\pi(b) \hbox{ for all } \varepsilon\in {\mathcal{E}}, b\in I\cup {\mathcal{E}}$. All the concepts developed in the absorbed case depend only on the trajectory ${\mathcal{X}}^{(A)}=({\mathcal{X}}_n\,:\, n\le \tau_{\mathcal{E}})$, which is equally distributed as $(X_n\,:\, n\le \tau_{\mathcal{E}})$ when starting from $X_0={\mathcal{X}}_0\in I$. Hence, there is no confusion if we continue denoting by $\mathbb{P}_a$ the law of the chain ${\mathbb{X}}$ starting from $a\in I\cup {\mathcal{E}}$ and by $\mathbb{E}_a$ its associated expectation.

Since ${\mathbb{X}}=(X_n)$ has transition probability kernel P and stationary distribution $\pi$, its entropy is given by (see Proposition 12.3 in [Reference Denker, Grillenberger and Sigmund5], p. 69)

\begin{equation*}h({\mathbb{X}})=-\sum_{a\in I\cup {\mathcal{E}}} \pi(a)\sum_{b\in I\cup {\mathcal{E}}}P(a,b) \log P(a,b).\end{equation*}

Then,

(5) \begin{eqnarray}\nonumber h({\mathbb{X}})&=&-\sum_{\delta\in {\mathcal{E}}} \pi(\delta)\sum_{a\in I\cup {\mathcal{E}}}\pi(a) \log \pi(a)-\sum_{i\in I} \pi(i) \sum_{b\in I\cup {\mathcal{E}}} P(i,b) \log P(i,b)\\\nonumber&=& -\pi({\mathcal{E}}) \sum_{i\in I} \pi(i) \log \pi(i)-\pi({\mathcal{E}}) \sum_{\delta\in {\mathcal{E}}} \pi(\delta) \log \pi(\delta)\\&{}&\; -\sum_{i,j\in I} \pi(i) P(i,j) \log P(i,j)-\sum_{i\in I,\delta\in {\mathcal{E}}} \pi(i) P(i,\delta) \log P(i,\delta) .\end{eqnarray}

Further, we will compare this entropy to the entropies of some random sequences appearing in the chain.
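The decomposition (5) can be verified numerically on the hypothetical example used above (arbitrary entries): compute $h({\mathbb{X}})$ from the generic formula and from (5), and check that the two values agree.

```python
import numpy as np

# Hypothetical example: I = {0, 1, 2}, E = {3, 4} (arbitrary entries).
P_I = np.array([[0.4, 0.3, 0.1],
                [0.2, 0.3, 0.3],
                [0.1, 0.2, 0.4]])
P_IE = np.array([[0.1, 0.1],
                 [0.15, 0.05],
                 [0.2, 0.1]])
vals, vecs = np.linalg.eig(P_I.T)
k = np.argmax(vals.real)
gamma = vals[k].real
mu = np.abs(vecs[:, k].real); mu /= mu.sum()
pi = np.concatenate([gamma * mu, mu @ P_IE])
P_pi = np.vstack([np.hstack([P_I, P_IE]), np.tile(pi, (2, 1))])

# Generic entropy: h = -sum_a pi(a) sum_b P(a,b) log P(a,b).
h_generic = -sum(pi[a] * (P_pi[a] * np.log(P_pi[a])).sum() for a in range(5))

# Decomposition (5): the E-rows contribute pi(E) times the entropy of pi,
# and the I-rows are split over targets in I and in E.
h_decomp = (-pi[3:].sum() * (pi * np.log(pi)).sum()
            - sum(pi[i] * (P_pi[i] * np.log(P_pi[i])).sum() for i in range(3)))

assert h_generic > 0
assert np.isclose(h_generic, h_decomp)
```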

4. Elements of the associated stationary chain

The object of this section is to show how one can retrieve the chain ${\mathbb{X}}$ from the absorbed trajectories and some walks on the set of absorbing states. To this end, the behavior of the chain ${\mathbb{X}}$ is first decomposed along its visits to I and to ${\mathcal{E}}$ separately.

4.1. Decoupling the stationary chain

Let $\tau_I=\inf\{n\ge 1\,:\, X_n\in I\}$ be the first return time of ${\mathbb{X}}$ to I. Now, consider the stochastic matrix $Q=(Q(i,j)\,:\, i,j\in I)$ given by $Q(i,j)=\mathbb{P}_i(X_{\tau_{\,I}}=j)$. By using that $\mathbb{P}_\varepsilon(X_{\tau_{\,I}}=j)=\pi(j \mid I)=\mu(j)$ for all $\varepsilon\in {\mathcal{E}}, j\in I$, we get

(6) \begin{align}\nonumber Q(i,j) & = \mathbb{E}_i \Big({\bf 1}_{\{X_{\tau_{\,I}}=j, {\tau_I}=1\}}\Big)+\mathbb{E}_i\Big({\bf 1}_{\{X_{\tau_{\,I}}=j,{\tau_I}>1\}}\Big)\\ \nonumber & = P(i,j)+ \mathbb{P}_i({\tau_I}>1)\,\mathbb{P}_i\big(X_{\tau_{\,I}}=j \mid {\tau_I}>1 \big)\\ & = P(i,j)+P(i,{\mathcal{E}})\, \mu(j).\end{align}

Let ${\mathbb{Y}}=(Y_n\,:\, n\in \mathbb{Z})$ be a Markov chain with transition matrix Q. It is straightforward to check that $\mu$ is a stationary measure for ${\mathbb{Y}}$.

Remark 1. For a substochastic matrix $P_I$ the matrix $Q=(Q(i,j)=P(i,j)+P(i,{\mathcal{E}}) \mu(j)\,:\, i,j\in I)$ was defined in [Reference Ferrari, Kesten, Martínez and Picco7] and called the resurrected matrix from $P_I$ with distribution $\mu$. It was a key concept used in [Reference Ferrari, Kesten, Martínez and Picco7] to prove the existence of QSDs for geometrically absorbed Markov chains taking values in an infinite countable set.
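A quick numerical check of (6), on the same hypothetical matrix as before (not from the paper): the resurrected matrix is stochastic and has $\mu$ as a stationary measure.

```python
import numpy as np

# Hypothetical substochastic matrix (arbitrary entries) on I = {0, 1, 2}.
P_I = np.array([[0.4, 0.3, 0.1],
                [0.2, 0.3, 0.3],
                [0.1, 0.2, 0.4]])
vals, vecs = np.linalg.eig(P_I.T)
k = np.argmax(vals.real)
mu = np.abs(vecs[:, k].real); mu /= mu.sum()

P_E = 1.0 - P_I.sum(axis=1)          # killing probabilities P(i, E)

# Resurrected matrix (6): Q(i,j) = P(i,j) + P(i,E) mu(j).
Q = P_I + np.outer(P_E, mu)

assert np.allclose(Q.sum(axis=1), 1.0)   # Q is stochastic
assert np.allclose(mu @ Q, mu)           # mu is stationary for Q
```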

The chain ${\mathbb{Y}}$ can be constructed as follows. Let $\Xi=\{\xi_l\,:\, l\in \mathbb{Z}\}$ be the ordered sequence given by

\begin{equation*}\{\xi_l\,:\, l\in \mathbb{Z}\}=\{n\in \mathbb{Z}\,:\, X_n\in I\} \hbox{ with }\xi_{l-1}<\xi_{l}\, , \; \xi_{-1}<0\le \xi_0.\end{equation*}

Then, $(\xi_{l}-\xi_{l-1}\,:\, l\in \mathbb{Z})$ is a stationary renewal sequence with interarrival times distributed as $\mathbb{P}(\xi_{l}-\xi_{l-1}=\bullet)=\mathbb{P}_\mu(\tau_I=\bullet)$, $l\neq 0$. By definition, $(X_{\xi_l}\,:\, l\in \mathbb{Z})$ is a stationary sequence distributed as ${\mathbb{Y}}=(Y_n\,:\, n\in \mathbb{Z})$; that is, $(X_{\xi_l}\,:\, l\in \mathbb{Z})$ is a copy of ${\mathbb{Y}}$.

The random sequence ${\bf{b}}=(b_l\,:\, l\in \mathbb{Z}, \, l\neq 0)$ defined by $b_l=1$ if $\xi_{l}-\xi_{l-1}=1$ and $b_l=0$ if $\xi_{l}-\xi_{l-1}>1$ is a collection of independent and identically distributed (i.i.d.) Bernoulli random variables, with

\begin{equation*}\mathbb{P}(b_l=1)=\pi(I) \hbox{ and }\mathbb{P}(b_l=0)=\sum_{i\in I} \mu(i) P(i,{\mathcal{E}})=\pi({\mathcal{E}}).\end{equation*}

(Recall that $\pi(I)+\pi({\mathcal{E}})=1$). When $X_0\in I$, we find $\tau_{\mathcal{E}} =\inf\{l\ge 1\,:\, b_l=0\}$.

Remark 2. Every irreducible stochastic matrix Q with stationary distribution $\mu$ can be written as in (6). In fact, let $\chi=(\chi(i)\,:\, i\in I)$ be a non-null vector, $\chi\neq {\vec 0}$, that satisfies

\begin{equation*}\text{for all } i\in I: \quad 0\le \chi(i)<1 \hbox{ and } \chi(i)\le \min\{Q(i,j)\mu(j)^{-1}\,:\, j\in I\}.\end{equation*}

This can be achieved because $\mu$ is strictly positive. Define $P_I=Q-\chi \mu^\top$, so

(7) \begin{equation}\text{for all } i,j\in I: \quad P(i,j)=Q(i,j)-\chi(i) \mu(j).\end{equation}

To avoid the trivial situation we can assume that the vector $\chi$ also satisfies that for every $i\in I$ and for some (or for all) $j\in I$ for which $Q(i,j)>0$, we have $P(i,j)>0$. This allows us to take $\chi$ ensuring that $P_I$ is irreducible. From the construction, $P(i,j)\in [0,1)$ and, since $\chi\neq 0$, we get

\begin{equation*}\sum_{j\in I} P(i,j)=1-\chi(i)\in (0,1], \qquad \sum_{i\in I} \Bigg(1-\sum_{j\in I} P(i,j)\Bigg)=\sum_{i\in I}\chi(i)>0.\end{equation*}

Hence $P_I$ is strictly substochastic and non-trivial, and when adding the cemetery $\partial$ we have $\chi(i)=P(i,\partial)$. So,

\begin{equation*}\mu^\top P_I=\mu^\top\big(Q-\chi\mu^\top\big)=\Bigg(1-\sum_{i\in I} \mu(i)P(i,\partial)\Bigg)\mu^\top ; \end{equation*}

that is, $\mu^\top$ is the Perron–Frobenius left eigenvector of $P_I$ with eigenvalue $\pi(I)=\sum_{i, j\in I} \mu(i) P(i,j)$ (see (1) and (4)). From (7) it follows that $Q(i,j)=P(i,j)+P(i,\partial)\mu(j)$, so (6) is satisfied.
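The reverse construction of Remark 2 can also be sketched numerically (the matrix Q and the factor 1/2 in the choice of $\chi$ are arbitrary assumptions of this illustration).

```python
import numpy as np

# Hypothetical irreducible stochastic matrix Q on I = {0, 1, 2} with
# stationary distribution mu (arbitrary entries, for illustration only).
Q = np.array([[0.5, 0.3, 0.2],
              [0.25, 0.45, 0.3],
              [0.25, 0.25, 0.5]])
vals, vecs = np.linalg.eig(Q.T)
k = np.argmax(vals.real)
mu = np.abs(vecs[:, k].real); mu /= mu.sum()

# Pick chi with 0 <= chi(i) < 1 and chi(i) <= min_j Q(i,j) / mu(j);
# taking half that bound keeps P_I strictly positive, hence irreducible.
chi = 0.5 * np.minimum((Q / mu).min(axis=1), 1.0)

P_I = Q - np.outer(chi, mu)              # relation (7)

assert (P_I > 0).all() and (P_I.sum(axis=1) < 1).all()
assert np.allclose(1.0 - P_I.sum(axis=1), chi)       # chi(i) = P(i, cemetery)
assert np.allclose(mu @ P_I, (1.0 - mu @ chi) * mu)  # mu is the PF left eigenvector
```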

The restriction of $P=P^\pi$ to the absorption states ${\mathcal{E}}$ satisfies

\begin{equation*}\text{for all } \varepsilon,\delta\in {\mathcal{E}}\,:\, \quad P(\varepsilon,\delta)=\pi(\delta),\hbox{ so }\mathbb{P}(X_1= \delta \mid X_0=\varepsilon, X_1\in {\mathcal{E}})=\pi(\delta \mid {\mathcal{E}}).\end{equation*}

The transition law to an absorbing point after being in $X_{t-1}=i\in I$ is given by

\begin{equation*}\text{for all } \delta\in {\mathcal{E}}\,:\, \quad\mathbb{P}(X_{t}=\delta \mid X_{t}\in {\mathcal{E}}, X_{t-1}=i)=P(i,\delta)/P(i,{\mathcal{E}}).\end{equation*}

So, if $X_{-1}\in I$ and $X_0\in {\mathcal{E}}$, the total sojourn time at ${\mathcal{E}}$ is $\tau_I$, and it is distributed as a $\hbox{Geometric}(\pi(I))$. Then, immediately after the entrance to ${\mathcal{E}}$ the chain ${\mathbb{X}}$ performs a walk on ${\mathcal{E}}$ of length $\tau_I-1$ (a quantity that could vanish). To describe it, take a Bernoulli chain (an i.i.d. sequence) ${\mathbb{G}}=(G_n\,:\, n\in \mathbb{Z})$ with probability vector $\pi(\bullet \mid {\mathcal{E}})$. Let us consider the finite random sequence $V=(G_l\,:\, 1\le l<\tau_I)$ (with V empty if $\tau_I=1$), which is distributed as

(8) \begin{align}\mathbb{P}(V=\emptyset) & = \mathbb{P}(\tau_I=1)=\pi(I) ,\\ \mathbb{P}(V=(\delta_1, \ldots, \delta_{k-1})) & = \mathbb{P}(G_1 = \delta_1, \ldots, G_{k-1} = \delta_{k-1},\tau_I = k)\nonumber\\ & = \Bigg(\prod_{l=1}^{k-1}\pi(\delta_l \mid {\mathcal{E}})\Bigg)\pi({\mathcal{E}})^{k-1}\pi(I)=\Bigg(\prod_{l=1}^{k-1}\pi(\delta_l)\Bigg)\pi(I)\nonumber\\ &\quad \hbox{ for } k\ge 2, \ (\delta_1, \ldots, \delta_{k-1})\in {\mathcal{E}}^{k-1}.\nonumber\end{align}

Notice that the last equality also holds when $\tau_I=k=1$ because an empty product satisfies $\prod_{l=1}^{k-1}=1$. We have $(X_t, 1\le t < \tau_I \mid X_{-1}\in I, X_0\in {\mathcal{E}})\sim V$, and V is called a walk on ${\mathcal{E}}$. Note that, conditioned on $\tau_I>1$, $\tau_I-1$ is distributed as $\tau_I$ (the memoryless property of the geometric law). The exit law from ${\mathcal{E}}$ is $\mathbb{P}(X_{\tau_{\,I}}\in \bullet)\sim \mu$. In fact, for all $\delta\in {\mathcal{E}}$,

\begin{align*}\mathbb{P}_\delta(X_{\tau_{\,I}}=i) & = \sum_{\varepsilon\in {\mathcal{E}}}\mathbb{P}_\delta(X_{\tau_{\,I}}=i, X_{\tau_{\,I}-1}=\varepsilon)\\ & = \sum_{\varepsilon\in {\mathcal{E}}} \mathbb{P}_\delta(X_{\tau_{\,I}-1}=\varepsilon)\mathbb{P}_\varepsilon(X_1=i \mid X_1\in I)\\ & = \pi(i \mid I)=\mu(i).\end{align*}

Notice that

(9) \begin{equation}\text{for all } \delta\in {\mathcal{E}}, \ \mathbb{E}_\delta(\tau_I)=\pi(I)^{-1} . \end{equation}
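Relation (9) and the total mass of (8) amount to elementary geometric-series identities, checked here numerically for an arbitrary hypothetical value of $\pi(I)$ (any number in $(0,1)$ would do).

```python
import numpy as np

# Hypothetical value of pi(I), chosen arbitrarily for this check.
pi_I = 0.65
pi_E = 1.0 - pi_I

# Summing (8) over all walks of length k - 1 leaves pi(E)^{k-1} pi(I),
# since the probabilities pi(delta | E) of the visited states sum to 1.
# The total mass is a geometric series, and the mean sojourn gives (9).
ks = np.arange(1, 2000)
mass = (pi_I * pi_E ** (ks - 1)).sum()
mean = (ks * pi_I * pi_E ** (ks - 1)).sum()   # E_delta(tau_I)

assert np.isclose(mass, 1.0)
assert np.isclose(mean, 1.0 / pi_I)           # relation (9)
```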

We consider a sequence of i.i.d. random variables ${\bf{{\mathcal{T}}}}=({\mathcal{T}}_n\,:\, n\in \mathbb{Z})$ which are $\hbox{Geometric}(\pi(I))-1$ distributed, that is, $\mathbb{P}({\mathcal{T}}_n=l)=\pi(I) (1-\pi(I))^{l}$ for $l\ge 0$. The construction of i.i.d. walks on ${\mathcal{E}}$ is made as follows. One takes a non-decreasing sequence of times $(t_n\,:\, n\in \mathbb{Z})$ with $t_{n+1}-t_n={\mathcal{T}}_n$ and such that $t_n\to \infty$ as $n\to \infty$ and $t_n\to -\infty$ as $n\to -\infty$. We define $V^n=(G_{t_n}, \ldots, G_{t_{n+1}-1})=(V^n_1, \ldots, V^n_{{\mathcal{T}}_n})$. So, ${\mathbb{V}}=\left(V^n\,:\, n\in \mathbb{Z}\right)$ is an i.i.d. sequence of walks on ${\mathcal{E}}$. The walk $V^n$ is empty when ${\mathcal{T}}_n=0$.

When ${\mathcal{E}}=\{\partial\}$ is a singleton, we have $\pi(\partial \mid {\mathcal{E}})=1$, ${\mathbb{G}}=(G_n\,:\, n\in \mathbb{Z})$ is the orbit with the unique symbol $G_n=\partial$ for all n, and the random sequence $V=(G_l\,:\, 1\le l<\tau_I)$ has the symbol $\partial$ repeated $\tau_I-1$ times.

4.2. Retrieving the stationary chain

Let ${\mathbb{Y}}=(Y_n\,:\, n\in \mathbb{Z})$ be a stationary Markov chain with transition matrix Q. Our purpose is to construct a copy of ${\mathbb{X}}$ from ${\mathbb{Y}}$ by adding a series of random operations.

Let $\mathbb{P}$ be a probability measure governing the law of ${\mathbb{Y}}$ when it starts from the stationary distribution $\mu$, the laws of the sequences ${\mathbb{G}}$, ${\mathcal{T}}$ (and so ${\mathbb{V}}$), and also the laws of the random elements ${\mathbb{B}}^{I,I}$ and ${\mathbb{D}}^I$ defined below.

Let ${\mathbb{B}}^{I,I}=\big((B^{i,j}_l\,:\, l\in \mathbb{Z});\ i,j\in I \big)$ be an independent array of Bernoulli random variables such that $B^{i,j}_l\sim B^{i,j}$ for $l\in \mathbb{Z}$, where

(10) \begin{equation}\begin{aligned} \mathbb{P}(B^{i,j}=1) & = \theta_{i,j}, \qquad \mathbb{P}(B^{i,j}=0)={\overline{\theta}}_{i,j}=1-\theta_{i,j} , \\ \theta_{i,j} & = \frac{P(i,j)}{P(i,j)+P(i,{\mathcal{E}})\mu(j)}=\frac{P(i,j)}{Q(i,j)}.\end{aligned}\end{equation}

Let $\tau_\partial=\inf\{l\ge 1\,:\, B^{Y_{l-1},Y_l}_l=0\}$. For $k\ge 1$, $i_0, \ldots, i_{k-1}\in I$ we have

\begin{multline*}\mathbb{P}(Y_0=i_0, Y_1=i_1, \ldots, Y_{k-1}=i_{k-1}, \tau_\partial=k) \\ =\mu(i_0)\Bigg(\prod_{l=1}^{k-1} P(i_{l-1},i_l)\Bigg)\Bigg(\sum_{j\in I}P(i_{k-1},{\mathcal{E}})\mu(j)\Bigg)=\mu(i_0)\Bigg(\prod_{l=1}^{k-1} P(i_{l-1},i_l)\Bigg)P(i_{k-1},{\mathcal{E}}).\end{multline*}

Hence, the sequence $Y^{(K)}=(Y_l\,:\, 0\le l<\tau_\partial)$ is distributed as a killed chain ${\mathcal{X}}^{(K)}$ starting from $\mu$.

Now take an independent array ${\mathbb{D}}^I=\left((D^{i}_l\,:\, l\in \mathbb{Z});\, i\in I \right)$ of random variables taking values in ${\mathcal{E}}$ and with law

(11) \begin{equation}\text{for all } \delta \in {\mathcal{E}}\,:\,\quad \mathbb{P}(D^{i}_l=\delta)=P(i,\delta)/P(i,{\mathcal{E}}).\end{equation}

For $k\ge 1$, $i_0, \ldots, i_{k-1}\in I$, $\delta\in {\mathcal{E}}$ we set

\begin{align*}\mathbb{P}(Y_0=i_0, & Y_1=i_1, \ldots, Y_{k-1}=i_{k-1}, D^{i_{k-1}}_k=\delta,\tau_\partial=k)\\[4pt] & =\mu(i_0)\Bigg(\prod_{l=1}^{k-1} P(i_{l-1},i_l)\Bigg)P(i_{k-1},{\mathcal{E}})\big(P(i_{k-1},\delta)/P(i_{k-1},{\mathcal{E}})\big) \\[4pt] & = \mu(i_0)\Bigg(\prod_{l=1}^{k-1} P(i_{l-1},i_l)\Bigg)P(i_{k-1},\delta).\end{align*}

Then, the sequence $Y^{(A)}=(Y_0, \ldots, Y_{\tau_\partial-1},D^{Y_{\tau_{\partial}-1}}_{\tau_\partial})$ is distributed as an absorbed chain ${\mathcal{X}}^{(A)}$ starting from $\mu$.
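The thinning identities behind these two computations can be checked directly on the hypothetical example (arbitrary entries): accepting a Q-transition with probability $\theta_{i,j}$ from (10) recovers $P_I$, rejecting it recovers the killing part, and the law (11) is a proper distribution on ${\mathcal{E}}$.

```python
import numpy as np

# Hypothetical example (arbitrary entries): I = {0, 1, 2}, E = {3, 4}.
P_I = np.array([[0.4, 0.3, 0.1],
                [0.2, 0.3, 0.3],
                [0.1, 0.2, 0.4]])
P_IE = np.array([[0.1, 0.1],
                 [0.15, 0.05],
                 [0.2, 0.1]])
P_E = P_IE.sum(axis=1)
vals, vecs = np.linalg.eig(P_I.T)
k = np.argmax(vals.real)
mu = np.abs(vecs[:, k].real); mu /= mu.sum()

Q = P_I + np.outer(P_E, mu)           # resurrected matrix (6)
theta = P_I / Q                       # acceptance probabilities (10)

# Keeping a Q-transition with probability theta recovers the substochastic
# part P_I; rejecting it recovers the killing part P(i, E) mu(j).
assert ((theta > 0) & (theta < 1)).all()
assert np.allclose(Q * theta, P_I)
assert np.allclose(Q * (1.0 - theta), np.outer(P_E, mu))

# The absorption law (11) is a proper distribution on E for each i.
D_law = P_IE / P_E[:, None]
assert np.allclose(D_law.sum(axis=1), 1.0)
```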

Let us construct a chain ${\mathbb{S}}^{\text{s}}=({\mathbb{S}}^{\text{s}}_t\,:\, t\in \mathbb{Z})$ from ${\mathbb{Y}}$, ${\mathbb{B}}^{I,I}$, ${\mathbb{D}}^I$, ${\mathbb{G}}$, and ${\mathcal{T}}$ (and so also ${\mathbb{V}}$), having the same distribution as ${\mathbb{X}}$. First, define a random sequence ${\mathbb{S}}=({\mathbb{S}}_t\,:\, t\in \mathbb{Z})$ as follows. We set $T_0=0$, ${\mathbb{S}}_0=Y_0$ (so ${\mathbb{S}}_0=Y_0\in I$ is distributed as $\mu$), and:

  • I. In a sequential way on $n\ge 0$ we make the following construction. Assume at step n that $T_n$ has been defined; then, put ${\mathbb{S}}_{T_n}=Y_n$ and go to step $n+1$.

  • Ia. If $B^{Y_n,Y_{n+1}}_{n+1}=1$ put $T_{n+1}=T_n+1$, ${\mathbb{S}}_{T_{n+1}}=Y_{n+1}$, and go to step $n+2$.

  • Ib. If $B^{Y_n,Y_{n+1}}_{n+1}=0$ put $T_{n+1}=T_n+{\mathcal{T}}_n+2$, define ${\mathbb{S}}_{T_{n}+1}=D^{Y_n}_{n+1}$, ${\mathbb{S}}_{T_{n}+1+l}=V^n_l$ for $1\le l<{\mathcal{T}}_n$ (it is empty when ${\mathcal{T}}_n=0$), and ${\mathbb{S}}_{T_{n+1}}=Y_{n+1}$. Then continue with step $n+2$.

  • II. Similarly, in a sequential way on $n<0$ we make the following construction for step n:

  • IIa. If $B^{Y_{n},Y_{n+1}}_{n+1}=1$ put $T_{n}=T_{n+1}-1$, ${\mathbb{S}}_{T_{n}}=Y_n$, and continue with step $n-1$.

  • IIb. If $B^{Y_n,Y_{n+1}}_{n+1}=0$ put $T_{n}=T_{n+1}-({\mathcal{T}}_n+2)$, ${\mathbb{S}}_{T_n+1}=D^{Y_n}_{n+1}$, ${\mathbb{S}}_{T_{n}+1+l}=V^n_l$ for $1\le l<{{\mathcal{T}}_n}$, and ${\mathbb{S}}_{T_n}=Y_{n}$. Then, continue with step $n-1$.

Let ${\mathbb{S}}=({\mathbb{S}}_t\,:\, t\in \mathbb{Z})$ be the random sequence resulting from this construction, and let $\mathbb{T}=(T_n\,:\, n \in \mathbb{Z})$, recalling that $T_0=0$. By an abuse of notation we also denote by $\mathbb{T}=\{T_n\,:\, n\in \mathbb{Z}\}$ the set of these values. By definition, $\mathbb{T}=\{t\in \mathbb{Z}\,:\, {\mathbb{S}}_t\in I\}$ is the set of random points where ${\mathbb{S}}$ is in I. In Proposition 2 we will prove that $({\mathbb{S}}, \mathbb{T})$ is a regenerative process (see [Reference Asmussen2], pp. 169–170), that is, for all $l\ge 0$ the process $({\mathbb{S}}_{\bullet + T_{l}}\,:\, \bullet \ge 0;\ T_{n+1}-T_n,\ n\ge l)$ has the same distribution as $({\mathbb{S}}_{\bullet}\,:\, \bullet \ge 0;\ T_{n+1}-T_n,\ n\ge 0)$ and it is independent of $(T_n\,:\, n \le l)$.

The cycles of this regenerative process, $({\mathbb{S}}_{T_n}, \ldots, {\mathbb{S}}_{T_{n+1}-1})$, $n\in \mathbb{Z}$, are i.i.d., and so all of them have the same distribution as $({\mathbb{S}}_0, \ldots, {\mathbb{S}}_{T_1-1})$. By shifting the process ${\mathbb{S}}$ by a random time chosen uniformly in $\{0, \ldots, T_1-1\}$ and conditionally independent of the rest of the process, we get a stationary process ${\mathbb{S}}^{\text{s}}=({\mathbb{S}}^{\text{s}}_t\,:\, t\in \mathbb{Z})$ (see Theorem 6.4 in [Reference Asmussen2]). So, ${\mathbb{S}}^{\text{s}}_0$ takes values in $I\cup {\mathcal{E}}$ and, from the next proposition, it is distributed as $\pi$ (unlike ${\mathbb{S}}_0$, which takes values in I and is distributed as $\mu$).

Proposition 2. The process $({\mathbb{S}}, \mathbb{T})$ is regenerative and the associated stationary process ${\mathbb{S}}^{\text{s}}$ is equally distributed as ${\mathbb{X}}$.

The proof can be found in the Appendix.

5. Stationary representation of killed and absorbed chains

The stationary Markov chain ${\mathbb{Y}}=(Y_n\,:\, n\in \mathbb{Z})$ with transition matrix Q and stationary distribution $\mu$ has entropy

(12) \begin{equation}h({\mathbb{Y}})=-\sum_{i\in I} \mu(i) \sum_{j\in I} Q(i,j)\log Q(i,j).\end{equation}
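Formula (12) is easy to check numerically. The sketch below uses a small hypothetical kernel (the matrices, the quasi-stationary law $\mu$, and the eigenvalue $0.7$ are illustrative choices, not data from the paper); it builds the resurrected kernel $Q(i,j)=P(i,j)+P(i,{\mathcal{E}})\mu(j)$, checks that $\mu$ is its stationary law, and evaluates (12):

```python
import numpy as np

# Hypothetical example (not from the paper): I = {0,1}, two absorbing states in E.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])   # P(i,j), i,j in I (strictly substochastic)
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])   # P(i,delta), delta in E
mu   = np.array([1/3, 2/3])                 # quasi-stationary law: mu @ P_II = 0.7 * mu

# Resurrected kernel Q(i,j) = P(i,j) + P(i,E) mu(j); mu is Q-stationary.
Q = P_II + P_IE.sum(axis=1, keepdims=True) * mu
assert np.allclose(Q.sum(axis=1), 1.0)      # Q is stochastic
assert np.allclose(mu @ Q, mu)              # mu Q = mu

# Entropy (12): h(Y) = -sum_i mu(i) sum_j Q(i,j) log Q(i,j)
h_Y = -np.sum(mu[:, None] * Q * np.log(Q))
print(round(h_Y, 5))                        # 0.63158
```

For a different substochastic kernel one would recompute $\mu$ and the eigenvalue as the left Perron pair of $P_I$, e.g. with numpy.linalg.eig.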

To get stationary representations of the killed and the absorbed chains we will use the 2-stringing form of ${\mathbb{Y}}$. Let us recall this notion. Consider the stochastic matrix $Q^{[2]}$, with index set $I^2$, given by $Q^{[2]}((i,j),(l,k))=Q(l,k) {\bf 1}(l=j)$. Its stationary distribution satisfies $\nu((i,j))=\mu(i)Q(i,j)$ for $(i,j)\in I^2$. In fact, by using $\sum_{i\in I}\mu(i)Q(i,j)=\mu(j)$ we get

\begin{equation*}\sum_{(i,j)\in I^2}\nu((i,j)) Q^{[2]}((i,j),(l,k))=\sum_{i\in I}\mu(i)Q(i,l)Q(l,k)=\mu(l)Q(l,k)=\nu((l,k)).\end{equation*}

The stationary chain ${\mathbb{Y}}^{[2]}=((Y^1_n,Y^2_n)\,:\,Y^2_{n-1}=Y^1_{n},\ n\in \mathbb{Z})$ evolving with $Q^{[2]}$ is the 2-stringing of ${\mathbb{Y}}$. We write it as ${\mathbb{Y}}^{[2]}=((Y_{n-1},Y_n)\,:\, n\in \mathbb{Z})$. It is well known that it is conjugated to ${\mathbb{Y}}$ by the (1-coordinate) mapping $\Upsilon(((Y_{n-1},Y_n)\,:\, n\in \mathbb{Z}))=(Y_n\,:\, n\in \mathbb{Z})$. (This property was stated in a general form in Lemma 1 in [Reference Keane and Smorodinsky9].) Being conjugated by a mapping means that the mapping is one-to-one, measure preserving, and commutes with the shift on $\mathbb{Z}$. Since $\Upsilon$ is clearly one-to-one and shift commuting, we only check that it is measure preserving. Taking $(i_l\,:\, l=0, \ldots, k)\in I^{k+1}$, we have

\begin{align*}\mathbb{P}((Y_{n-1},Y_n) \in {\mathbb{Y}}^{[2]}\,:\,(Y_{l-1},Y_l) & = (i_{l-1},i_l)\,:\, l=1, \ldots, k) \\ & = \mu(i_0)Q(i_0,i_1)\prod_{l=1}^{k-1} Q(i_l,i_{l+1})=\mathbb{P}(Y_l = i_l \,:\, l = 0, \ldots, k).\end{align*}

The orbits $((Y_{n-1},Y_n)\,:\, n\in \mathbb{Z})\in {\mathbb{Y}}^{[2]}$ can be identified with the orbits $(Y_n\,:\, n\in \mathbb{Z})\in {\mathbb{Y}}$.
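The 2-stringing construction can be checked mechanically. The following sketch (same hypothetical kernel as above, not from the paper) builds $Q^{[2]}$, verifies that $\nu((i,j))=\mu(i)Q(i,j)$ is stationary, and confirms that the entropy rate is unchanged, as it must be under a conjugacy:

```python
import numpy as np

# Hypothetical example (not from the paper): resurrected kernel Q on I = {0,1}.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
Q = P_II + P_IE.sum(axis=1, keepdims=True) * mu
n = len(mu)

# 2-stringing kernel on I^2: Q2((i,j),(l,k)) = Q(l,k) 1(l = j)
Q2 = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            Q2[i * n + j, j * n + k] = Q[j, k]

nu = np.array([mu[i] * Q[i, j] for i in range(n) for j in range(n)])
assert np.allclose(Q2.sum(axis=1), 1.0)
assert np.allclose(nu @ Q2, nu)          # nu((i,j)) = mu(i) Q(i,j) is stationary

# Conjugacy preserves the entropy rate: h of (Q2, nu) equals h(Y) from (12).
h_Y = -np.sum(mu[:, None] * Q * np.log(Q))
L = np.where(Q2 > 0, np.log(np.where(Q2 > 0, Q2, 1.0)), 0.0)  # 0 log 0 = 0
assert np.isclose(-np.sum(nu[:, None] * Q2 * L), h_Y)
```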

5.1. The killed chain

The stationary representation of the killed chain will be a stationary Markov chain on the set of states $I^2\times \{0,1\}=\{(i,j,a)\,:\, (i,j)\in I^2,\ a\in \{0,1\}\}$. Prior to defining the transition matrix we introduce the function

\begin{equation*}\varphi(i,j,a)=(\theta_{i,j}{\bf 1}(a=1)+{\overline{\theta}}_{i,j}{\bf 1}(a=0)),\qquad (i,j,a)\in I^2\times \{0,1\}.\end{equation*}

The transition matrix ${\mathcal{K}}=\left({\mathcal{K}}((i,j,a),(l,k,b))\,:\, (i,j,a), (l,k,b)\in I^2\times \{0,1\}\right)$ is defined by

\begin{equation*}{\mathcal{K}}((i,j,a), (l,k,b)) = \begin{cases}0 & \hbox{if } l \neq j , \\[5pt] Q(l,k)\varphi(l,k,b) = P(l,k){\bf 1}(b = 1)+ P(l,{\mathcal{E}})\mu(k){\bf 1}(b = 0) & \hbox{if } l = j.\end{cases}\end{equation*}

It is straightforward to check that this is a stochastic matrix. We claim that its stationary distribution $\zeta=(\zeta(i,j,a)\,:\, (i,j,a)\in I^2\times \{0,1\})$ is given by

(13) \begin{align}\zeta(i,j,a)& = \mu(i)Q(i,j)\varphi(i,j,a)\nonumber\\ & = \mu(i)P(i,j){\bf 1}(a=1) + \mu(i)P(i,{\mathcal{E}})\mu(j){\bf 1}(a=0).\end{align}

By using $\sum_{i\in I} \mu(i) Q(i,l)=\mu(l)$ we get the desired property,

\begin{align*}\sum_{(i,j,a)\in I^2\times \{0,1\}}\!\!\!\!\zeta(i,j,a) {\mathcal{K}}((i,j,a), (l,k,b)) & =\Bigg(\sum_{i\in I} \mu(i) Q(i,l)\Bigg) Q(l,k)\varphi(l,k,b) \\& = \mu(l)Q(l,k)\varphi(l,k,b)=\zeta(l,k,b),\end{align*}

so the claim follows.
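Both the stochasticity of ${\mathcal{K}}$ and the claim (13) can be verified numerically. In the sketch below (same hypothetical kernel as before, illustrative data only) the states $(i,j,a)\in I^2\times \{0,1\}$ are enumerated explicitly:

```python
import numpy as np

# Hypothetical kernel (illustrative, not from the paper): I = {0,1}, |E| = 2.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
PiE  = P_IE.sum(axis=1)            # P(i, E)
n = len(mu)

def idx(i, j, a):                  # enumerate states (i, j, a) in I^2 x {0,1}
    return (i * n + j) * 2 + a

K    = np.zeros((2 * n * n, 2 * n * n))
zeta = np.zeros(2 * n * n)
for i in range(n):
    for j in range(n):
        zeta[idx(i, j, 1)] = mu[i] * P_II[i, j]        # (13) with a = 1
        zeta[idx(i, j, 0)] = mu[i] * PiE[i] * mu[j]    # (13) with a = 0
        for a in (0, 1):           # the row does not depend on the incoming label
            for k in range(n):
                K[idx(i, j, a), idx(j, k, 1)] = P_II[j, k]       # b = 1
                K[idx(i, j, a), idx(j, k, 0)] = PiE[j] * mu[k]   # b = 0

assert np.allclose(K.sum(axis=1), 1.0)   # K is stochastic
assert np.isclose(zeta.sum(), 1.0)
assert np.allclose(zeta @ K, zeta)       # zeta K = zeta, so the claim holds
```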

The killed Markov chain presented in its stationary form is denoted

\begin{equation*}{\mathbb{Y}}^{({\mathcal{K}})}=((Y_{n-1},Y_n,B_n)\,:\, n\in \mathbb{Z});\end{equation*}

it takes values in $I^2\times \{0,1\}$ and has transition matrix ${\mathcal{K}}$. The component $B_n$ is called the label at n. By hypothesis $P_I$ is irreducible, so ${\mathcal{K}}$ is also an irreducible matrix. Then the Markov shift ${\mathbb{Y}}^{({\mathcal{K}})}$ is ergodic (see Proposition 8.12 in [Reference Denker, Grillenberger and Sigmund5]).

It is straightforward to check that the mapping

(14) \begin{equation}\Upsilon^{({\mathcal{K}})}\,:\, {\mathbb{Y}}^{({\mathcal{K}})}\to {\mathbb{Y}},\ ((Y_{n-1},Y_n,B_n)\,:\, n\in \mathbb{Z})\to (Y_n\,:\, n\in \mathbb{Z}) \end{equation}

is a factor, which means that it is measure preserving and commutes with the shift on $\mathbb{Z}$.

Remark 3. We show that ${\mathbb{Y}}^{({\mathcal{K}})}$ models the killed Markov chain. Let ${\mathcal{N}}=\{n\in \mathbb{Z}\,:\, B_n=0\}$ and write it as ${\mathcal{N}}=\{n_l\,:\, l\in \mathbb{Z}\}$, where $n_l$ is increasing with l, and $n_{-1}<0\le n_0$. Note that $\mathbb{P}(0\in {\mathcal{N}})=\pi({\mathcal{E}})$.

The orbit $((Y_{n-1},Y_n,B_n)\,:\, n\in \mathbb{Z})$ in ${\mathbb{Y}}^{({\mathcal{K}})}$ is denoted in the simpler form $((Y_{n-1},B_n)\,:\, n\in \mathbb{Z})$ and we can divide it into the disjoint connected pieces $(Y,B)^{(K)}_{l}=((Y_{n_l},1), \ldots,$ $(Y_{n_{l+1}-2},1), (Y_{n_{l+1}-1}, 0))$, $l\in \mathbb{Z}$. The component $Y_{n_l}$ is distributed with law $\mu$ for all l, and one can identify $(Y,B)^{(K)}_{l}$ with $Y^{(K)}_{l}=(Y_{n_l}, \ldots, Y_{n_{l+1}-1})$, a piece of the orbit $Y=(Y_n\,:\, n\in \mathbb{Z})$ starting from $\mu$ at $n_l$ and killed at $n_{l+1}-1$. We get that $Y^{(K)}_{l}\sim {\mathcal{X}}^{(K)}$ for all $l \neq -1$ when ${\mathcal{X}}^{(K)}$ starts from $\mu$. In fact, for $s\ge 0$, $i_0, \ldots, i_s\in I$, we have

\begin{multline*}\mathbb{P}\big({\mathcal{X}}^{(K)} = (i_0, \ldots, i_s)\big) = \mu(i_0)\prod_{r=0}^{s-1} P(i_r,i_{r+1})P(i_s,\partial) \\ = \mu(i_0)\Bigg(\prod_{r=0}^{s-1} Q(i_r,i_{r+1})\theta_{i_r,i_{r+1}}\Bigg)\Bigg(\sum_{m\in I}Q(i_s,m){\overline{\theta}}_{i_s,m}\Bigg)=\mathbb{P}\Big(Y_{l}^{(K)}=(i_0, \ldots, i_s)\Big).\end{multline*}

Let $s\ge 0$, $(i_0, \ldots, i_s)\in I^{s+1}$. For almost all the orbits $Y\in {\mathbb{Y}}^{({\mathcal{K}})}$ we have

\begin{equation*}\mathbb{P}(n_0=0, Y_0^{(K)}=(i_0, \ldots, i_s))=\pi({\mathcal{E}})\mu(i_0)\prod_{r=0}^{s-1} P(i_r,i_{r+1}) P(i_s,\partial)>0.\end{equation*}

Since the killed trajectories are finite, the class of killed trajectories is countable. From the ergodic theorem, and since ${\mathbb{Y}}^{({\mathcal{K}})}$ is ergodic, it follows that $\mathbb{P}$-a.e. (almost everywhere) the orbits of ${\mathbb{Y}}^{({\mathcal{K}})}$ contain all the killed trajectories of the chain.

The entropy of the killed chain satisfies

\begin{align*} h\big({\mathbb{Y}}^{({\mathcal{K}})}\big) & = -\!\!\!\sum_{(i,j)\in I^2} \!\!\! \mu(i) Q(i,j) \sum_{k\in I}Q(j,k)\left(\theta_{j,k} \log (Q(j,k)\theta_{j,k})+ {\overline{\theta}}_{j,k}\log (Q(j,k){\overline{\theta}}_{j,k})\right) \\& = - \!\!\sum_{(j,k)\in I^2} \!\!\! \mu(j) Q(j,k) \log Q(j,k)+\sum_{(j,k)\in I^2}\mu(j)Q(j,k) H(B^{j,k}) ,\end{align*}

where $H(B^{j,k})=-\left(\theta_{j,k} \log \theta_{j,k}+{\overline{\theta}}_{j,k} \log {\overline{\theta}}_{j,k}\right)$ is the entropy of the Bernoulli random variable $B^{j,k}$. Hence,

(15) \begin{equation}h\big({\mathbb{Y}}^{({\mathcal{K}})}\big)=h({\mathbb{Y}})+\Delta(B) , \quad \hbox{with }\Delta(B)=\sum_{(j,k)\in I^2}\mu(j)Q(j,k) H(B^{j,k}).\end{equation}

The quantity $\Delta(B)=h({\mathbb{Y}}^{({\mathcal{K}})})-h({\mathbb{Y}})$ is the conditional entropy of ${\mathbb{Y}}^{({\mathcal{K}})}$ given the factor ${\mathbb{Y}}$ (see Lemma 2 and Definition 3 in [Reference Downarowicz and Serafin6]). To be more precise, given an orbit $Y=(Y_n\,:\, n\in \mathbb{Z})$ of ${\mathbb{Y}}$, the fiber given by (14) satisfies $(\Upsilon^{({\mathcal{K}})})^{-1}\{Y\}=\{(B^{Y_{n-1},Y_{n}}_n\,:\, n\in \mathbb{Z})\in\{0,1\}^\mathbb{Z}\}$, and it is distributed as a sequence of independent Bernoulli variables given by (10); we denote it by $(\bf P)_{Y}$. We have

(16) \begin{equation}H_{(\bf P)_{Y}}\big(B_1^{Y_0,Y_{1}}\big)=-\big(\theta_{Y_0,Y_1} \log \theta_{Y_0,Y_1}+ {\overline{\theta}}_{Y_0,Y_1}\log{\overline{\theta}}_{Y_0,Y_1}\big).\end{equation}

Let us summarize the results on the entropy of ${\mathbb{Y}}^{({\mathcal{K}})}$.

Proposition 3. The entropy of the stationary representation ${\mathbb{Y}}^{({\mathcal{K}})}$ of the killed chain satisfies

(17) \begin{align}h\big({\mathbb{Y}}^{({\mathcal{K}})}\big) & = h({\mathbb{Y}})+\Delta(B),\\ \nonumber \Delta(B) & = \int H_{(\bf P)_{Y}}\big(B_1^{Y_0,Y_{1}}\big) \, {\text{d}} \mathbb{P}(Y)=\sum_{(i,j)\in I^2}\mu(i)Q(i,j) H(B^{i,j}) , \end{align}

and

(18) \begin{align} h\big({\mathbb{Y}}^{({\mathcal{K}})}\big) & = -\sum_{i\in I} \mu(i) P(i,{\mathcal{E}})\log P(i,{\mathcal{E}})-(1-\pi(I))\sum_{j\in I} \mu(j)\log \mu(j)\\ \nonumber& \quad -\sum_{i,j\in I} \mu(i) P(i,j)\log P(i,j). \end{align}

Proof. From (15) and (16), and by using the Markov property, we retrieve the Abramov–Rokhlin formula (see [Reference Abramov and Rokhlin1,Reference Downarowicz and Serafin6]),

\begin{equation*}\Delta(B)=h\big({\mathbb{Y}}^{({\mathcal{K}})}\big)-h({\mathbb{Y}})=\int H_{(\bf P)_{Y}}\big(B_1^{Y_0,Y_1}\big) \, {\text{d}}\mathbb{P}(Y)=\sum_{i,j\in I} \mu(i) Q(i,j)H(B^{i,j}).\end{equation*}

This gives (17). The only thing left to prove is (18). By using

\begin{equation*}\sum_{i\in I}\mu(i)P(i,{\mathcal{E}}) = 1\!-\!\pi(I), \quad \sum_{j\in I}P(i,j) = 1\!-\!P(i,{\mathcal{E}}), \quad \sum_{i\in I}\mu(i) P(i,j) = \pi(I) \mu(j) \end{equation*}

and (12), we get

\begin{align*}\Delta(B) & =-\sum_{i,j\in I} \mu(i) \left(P(i,{\mathcal{E}})\mu(j)\log(P(i,{\mathcal{E}})\mu(j))+P(i,j)\log P(i,j)\right) \\& \quad + \sum_{i,j\in I} \mu(i) Q(i,j)\log Q(i,j) \\& = -\sum_{i\in I} \mu(i) P(i,{\mathcal{E}}) \log P(i,{\mathcal{E}})-(1-\pi(I)) \sum_{j\in I} \mu(j)\log \mu(j) \\& \quad -\sum_{i,j\in I} \mu(i) P(i,j)\log P(i,j)-h({\mathbb{Y}}).\end{align*}

This shows (18).
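The identities (17) and (18) lend themselves to a direct numerical check. The sketch below reuses the small hypothetical kernel from before (all numbers illustrative, not from the paper), computing $\Delta(B)$ from the label probabilities $\theta_{i,j}=P(i,j)/Q(i,j)$:

```python
import numpy as np

# Hypothetical kernel (illustrative, not from the paper); pi(I) = 0.7.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
PiE  = P_IE.sum(axis=1)                        # P(i, E)
pi_I = 0.7                                     # Perron eigenvalue of P_II
Q = P_II + PiE[:, None] * mu

h_Y = -np.sum(mu[:, None] * Q * np.log(Q))     # (12)

# Delta(B): mean entropy of the Bernoulli labels, theta(i,j) = P(i,j)/Q(i,j)
theta = P_II / Q
H_B = -(theta * np.log(theta) + (1 - theta) * np.log(1 - theta))
delta_B = np.sum(mu[:, None] * Q * H_B)

h_YK = h_Y + delta_B                           # (15) and (17)

# Closed form (18)
h_YK_closed = (-np.sum(mu * PiE * np.log(PiE))
               - (1 - pi_I) * np.sum(mu * np.log(mu))
               - np.sum(mu[:, None] * P_II * np.log(P_II)))
assert np.isclose(h_YK, h_YK_closed)
print(round(h_YK, 4))                          # 1.2404
```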

Remark 4. From (13) we get that the mean proportion of sites in $\mathbb{Z}$ where ${\mathbb{Y}}^{({\mathcal{K}})}$ makes a transition with label 1 is

\begin{equation*}\sum_{i,j\in I}\zeta(i,j,1) = \sum_{i,j\in I}\mu(i)P(i,j)=\pi(I),\end{equation*}

while the mean proportion of sites where ${\mathbb{Y}}^{({\mathcal{K}})}$ makes a transition with label 0, and so resurrects with distribution $\mu$, is

\begin{equation*}\sum_{i,j\in I}\zeta(i,j,0)=\sum_{i\in I}\mu(i)P(i,{\mathcal{E}})=\pi({\mathcal{E}}).\end{equation*}

5.2. The absorbed chain

Let us construct a stationary representation of the absorbed chain in a similar way as we did for the killed chain. Define ${\mathcal{E}}^*={\mathcal{E}}\cup \{o\}$ with $o\not\in {\mathcal{E}}\cup I$. The absorbed chain will take values on the set of states $I^2\times {\mathcal{E}}^*=\{(i,j,\delta)\,:\, i\in I,\ j\in I,\ \delta\in {\mathcal{E}}^*\}$. The matrix ${\mathcal{A}}=({\mathcal{A}}((i,j,\delta), (l,k,\varepsilon))\,:\,(i,j,\delta), (l,k,\varepsilon)\in I^2\times {\mathcal{E}}^*)$ defined by

\begin{equation*}{\mathcal{A}}((i,j,\delta), (l,k,\varepsilon))=\begin{cases} 0 & \hbox{if } l\neq j , \\[3pt] Q(l,k)\theta_{l,k}=P(l,k) & \hbox{if } l=j,\ \varepsilon=o , \\[3pt] Q(l,k) {\overline{\theta}}_{l,k} P(l,\varepsilon)/P(l,{\mathcal{E}}) = P(l,\varepsilon) \mu(k) & \hbox{if } l = j,\ \varepsilon\in {\mathcal{E}} \end{cases}\end{equation*}

is a stochastic matrix whose stationary distribution $\eta=(\eta(i,j,\delta)\,:\, (i,j,\delta)\in I^2\times {\mathcal{E}}^*)$ is given by

\begin{align*}\eta(i,j,\delta) & = \mu(i)Q(i,j)\left(\theta_{i,j} {\bf 1}(\delta=o)+{\overline{\theta}}_{i,j} P(i,\delta)/P(i,{\mathcal{E}}) {\bf 1}(\delta\in {\mathcal{E}})\right)\\ & = \mu(i) P(i,j){\bf 1}(\delta=o)+\mu(i)P(i,\delta)\mu(j){\bf 1}(\delta\in {\mathcal{E}}).\end{align*}

In fact, since $\sum_{\delta\in {\mathcal{E}}^*}(\theta_{i,l} {\bf 1}(\delta=o)+{\overline{\theta}}_{i,l} P(i,\delta)/P(i,{\mathcal{E}}) {\bf 1}(\delta\in {\mathcal{E}}))=1$, we get the stationarity property

\begin{eqnarray*}&{}&\sum_{(i,j,\delta)\in I^2\times {\mathcal{E}}^*} \eta(i,j,\delta){\mathcal{A}}((i,j,\delta), (l,k,\varepsilon))\\&{}& \, =\Bigg(\sum_{i\in I} \mu(i) Q(i,l)\Bigg)Q(l,k)\big(\theta_{l,k}{\bf 1}(\varepsilon=o)+{\overline{\theta}}_{l,k} P(l,\varepsilon)/P(l,{\mathcal{E}}){\bf 1}(\varepsilon\in {\mathcal{E}})\big)\\&{}& \, =\mu(l) Q(l,k) \big(\theta_{l,k}{\bf 1}(\varepsilon=o)+{\overline{\theta}}_{l,k}P(l,\varepsilon)/P(l,{\mathcal{E}}) {\bf 1}(\varepsilon\in {\mathcal{E}})\big)=\eta(l,k,\varepsilon).\end{eqnarray*}
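As with ${\mathcal{K}}$, the matrix ${\mathcal{A}}$ and its stationary law $\eta$ can be verified mechanically. The sketch below enumerates $I^2\times {\mathcal{E}}^*$ for the same hypothetical kernel (illustrative data only, not from the paper):

```python
import numpy as np

# Hypothetical kernel (illustrative, not from the paper): I = {0,1}, E = {2,3}.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
n, m = len(mu), P_IE.shape[1]      # |I| and |E|; E* = {o} u E has m + 1 symbols

def idx(i, j, d):                  # d = 0 encodes o, d = 1..m encodes E
    return (i * n + j) * (m + 1) + d

S   = n * n * (m + 1)
A   = np.zeros((S, S))
eta = np.zeros(S)
for i in range(n):
    for j in range(n):
        eta[idx(i, j, 0)] = mu[i] * P_II[i, j]                    # delta = o
        for d in range(m):
            eta[idx(i, j, d + 1)] = mu[i] * P_IE[i, d] * mu[j]    # delta in E
        for a in range(m + 1):     # the row does not depend on the incoming label
            for k in range(n):
                A[idx(i, j, a), idx(j, k, 0)] = P_II[j, k]        # epsilon = o
                for d in range(m):
                    A[idx(i, j, a), idx(j, k, d + 1)] = P_IE[j, d] * mu[k]

assert np.allclose(A.sum(axis=1), 1.0)   # A is stochastic
assert np.isclose(eta.sum(), 1.0)
assert np.allclose(eta @ A, eta)         # eta A = eta
```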

We denote by ${\mathbb{Y}}^{({\mathcal{A}})}=((Y_{n-1},Y_n, D^*_n)\,:\, n\in \mathbb{Z})$ the absorbed Markov chain presented in its stationary form, and taking values in $I^2\times {\mathcal{E}}^*$ with transition matrix ${\mathcal{A}}$. Since ${\mathcal{A}}$ is irreducible, the Markov shift ${\mathbb{Y}}^{({\mathcal{A}})}$ is ergodic.

It is straightforward to check that the mapping

(19) \begin{eqnarray}\nonumber&{}&\Upsilon^{({\mathcal{A}})}\,:\, {\mathbb{Y}}^{({\mathcal{A}})}\to {\mathbb{Y}}^{({\mathcal{K}})},\quad ((Y_{n-1},Y_n,D^*_n)\,:\, n\in \mathbb{Z})\to ((Y_{n-1},Y_n,B_n)\,:\, n\in \mathbb{Z})\\&{}& \hbox{ with } B_n= {\bf 1}(D^*_n=o) \end{eqnarray}

is a factor between ${\mathbb{Y}}^{({\mathcal{A}})}$ and ${\mathbb{Y}}^{({\mathcal{K}})}$.

Remark 5. Let us see that the stationary chain ${\mathbb{Y}}^{({\mathcal{A}})}$ models the absorbed Markov chain. First, denote ${\mathcal{N}}^*=\{n\in \mathbb{Z}\,:\, D^*_n\in {\mathcal{E}}\}$ and write it as ${\mathcal{N}}^*=\{n_l\,:\, l\in \mathbb{Z}\}$ with $n_l$ increasing in l and $n_{-1}<0\le n_0$. We have $\mathbb{P}(0\in {\mathcal{N}}^*)=\pi({\mathcal{E}})$. Similarly to Remark 3, an orbit $((Y_{n-1},Y_n,D^*_n)\,:\, n\in \mathbb{Z})$ of ${\mathbb{Y}}^{({\mathcal{A}})}$ is denoted in the form $((Y_{n-1},D^*_n)\,:\, n\in \mathbb{Z})$ and is partitioned into the disjoint connected pieces

\begin{equation*}(Y,D^*)^{(A)}_{l}=((Y_{n_l},o), \ldots, (Y_{n_{l+1}-2},o),(Y_{n_{l+1}-1}, D^*_{n_{l+1}})) \hbox{ with } l\in \mathbb{Z} .\end{equation*}

The component $Y_{n_{l}}$ is distributed with law $\mu$ for all l, and we can identify $(Y,D^*)^{(A)}_l$ with $Y^{(A)}_l=(Y_{n_l}, \ldots, Y_{n_{l+1}-1}, D^*_{n_{l+1}})$ starting from $\mu$. Since the events $\{n\in \mathbb{Z}\,:\, D^*_n\in {\mathcal{E}}\}$ have the same distribution as $\{n\in \mathbb{Z}\,:\, B_n=0\}$ in ${\mathbb{Y}}^{({\mathcal{K}})}$, it can be checked that, for all $l\neq -1$, $Y^{(A)}_{l}\sim {\mathcal{X}}^{(A)}$, where ${\mathcal{X}}^{(A)}$ starts from $\mu$. In fact, for $s\ge 0$, $i_0, \ldots, i_s\in I$, $\varepsilon\in {\mathcal{E}}$, we have

\begin{equation*}\mathbb{P}\big({\mathcal{X}}^{(A)}=(i_0, \ldots, i_s, \varepsilon)\big) =\mu(i_0)\left(\prod_{r=0}^{s-1}P(i_r,i_{r+1})\right)P(i_s,\varepsilon) = \mathbb{P}\big(Y_{l}^{(A)}=(i_0, \ldots, i_s,\varepsilon)\big).\end{equation*}

Let $s\ge 0$, $(i_0, \ldots, i_s)\in I^{s+1}$, $\varepsilon\in {\mathcal{E}}$. For almost all the orbits $Y\in {\mathbb{Y}}^{({\mathcal{A}})}$ we have

\begin{equation*}\mathbb{P}\big(n_0=0, Y_{0}^{(A)}=(i_0, \ldots, i_s,\varepsilon)\big)=\pi({\mathcal{E}})\mu(i_0)\prod_{r=0}^{s-1} P(i_r,i_{r+1}) P(i_s,\varepsilon)>0.\end{equation*}

Since the absorbed trajectories are finite, the class of absorbed trajectories is countable. Then, since ${\mathbb{Y}}^{({\mathcal{A}})}$ is ergodic, we get from the ergodic theorem that, $\mathbb{P}$-a.e., the orbits of ${\mathbb{Y}}^{({\mathcal{A}})}$ contain all the absorbed trajectories of the chain.

The entropy of the absorbed chain satisfies

\begin{align*} h\big({\mathbb{Y}}^{({\mathcal{A}})}\big) & = - \!\!\!\! \sum_{(i,j)\in I^2}\!\!\!\! \mu(i) Q(i,j) \sum_{k\in I} \!Q(j,k) \theta_{j,k} \log (Q(j,k)\theta_{j,k}) \\ & \quad - \!\!\!\! \sum_{(i,j)\in I^2}\!\!\!\! \mu(i) Q(i,j) \!\!\!\sum_{k\in I,\varepsilon\in {\mathcal{E}}}\!\!\!\!Q(j,k) {\overline{\theta}}_{j,k} P(j,\varepsilon)/P(j,{\mathcal{E}}) \log(Q(j,k){\overline{\theta}}_{j,k}P(j,\varepsilon)/P(j,{\mathcal{E}}))\\& = -\sum_{(j,k)\in I^2}\!\! \mu(j) Q(j,k) \log Q(j,k)-\sum_{(j,k)\in I^2}\mu(j)Q(j,k)(\theta_{j,k} \log \theta_{j,k}+{\overline{\theta}}_{j,k} \log {\overline{\theta}}_{j,k})\\& \quad -\sum_{(j,k)\in I^2}\!\! \mu(j) Q(j,k){\overline{\theta}}_{j,k}\sum_{\varepsilon\in {\mathcal{E}}} P(j,\varepsilon)/P(j,{\mathcal{E}}) \log (P(j,\varepsilon)/P(j,{\mathcal{E}})).\end{align*}

Then,

(20) \begin{align}\nonumber h\big({\mathbb{Y}}^{({\mathcal{A}})}\big) & = h\big({\mathbb{Y}}^{({\mathcal{K}})}\big)+\sum_{i\in I}\mu(i) P(i,{\mathcal{E}}) H(D^i),\quad \hbox{where} \\ H(D^i) & = -\sum_{\delta\in {\mathcal{E}}}P(i,\delta)/P(i,{\mathcal{E}}) \log (P(i,\delta)/P(i,{\mathcal{E}}))\end{align}

is the entropy of a random variable in ${\mathcal{E}}$ distributed as the transition probability from $i\in I$ to a state conditioned to be in ${\mathcal{E}}$. Note that the above expression can also be written as

\begin{equation*}h\big({\mathbb{Y}}^{({\mathcal{A}})}\big)=h\big({\mathbb{Y}}^{({\mathcal{K}})}\big)+\sum_{i\in I}\mu(i) (P(i,{\mathcal{E}}) H(D^i)+P(i,I)H(o)),\end{equation*}

$H(o)=0$ being the entropy of a constant.

Define

\begin{equation*}\Delta(D)=h\big({\mathbb{Y}}^{({\mathcal{A}})}\big)-h\big({\mathbb{Y}}^{({\mathcal{K}})}\big)=\sum_{i\in I}\mu(i) P(i,{\mathcal{E}}) H(D^i).\end{equation*}

This is the conditional entropy of ${\mathbb{Y}}^{({\mathcal{A}})}$ given the factor ${\mathbb{Y}}^{({\mathcal{K}})}$. To see this, take an orbit $Y^{({\mathcal{K}})}=((Y_{n-1},Y_{n},B_n)\,:\, n\in \mathbb{Z})$ of ${\mathbb{Y}}^{({\mathcal{K}})}$. The fiber given by (19) satisfies $(\Upsilon^{({\mathcal{A}})})^{-1}\{Y^{({\mathcal{K}})}\}=\{(D^{Y_n, B_n}_n\,:\, n\in \mathbb{Z})\in ({\mathcal{E}}^*)^\mathbb{Z}\}$ with $D^{Y_n, B_n}_n\in {\mathcal{E}}$ when $B_n=0$ and $D^{Y_n, B_n}_n=o$ when $B_n=1$. These variables are independent, distributed as the variable $D^{Y_n}_n$ given in (11) if $B_n=0$ and as the constant variable o if $B_n=1$. This probability measure is denoted by $(\bf P)_{Y^{({\mathcal{K}})}}$. Thus,

(21) \begin{equation}H_{(\bf P)_{Y^{({\mathcal{K}})}}}\big(D^{Y_0,B_0}_0\big)=\begin{cases}& \!\!\!\!\! -\sum_{\delta\in {\mathcal{E}}} P(Y_0,\delta)/P(Y_0,{\mathcal{E}}) \log(P(Y_0,\delta)/P(Y_0,{\mathcal{E}})) \hbox{ if } B_0=0,\\[4pt] & \!\!\!\!\! 0 \hbox{ if } B_0=1.\end{cases}\end{equation}

Proposition 4. The entropy of the stationary representation ${\mathbb{Y}}^{({\mathcal{A}})}$ of the absorbed chain satisfies

\begin{align*} h\big({\mathbb{Y}}^{({\mathcal{A}})}\big) & = h\big({\mathbb{Y}}^{({\mathcal{K}})}\big)+\Delta(D) \quad \hbox{with} \\ \Delta(D) & = \int H_{(\bf P)_{Y^{({\mathcal{K}})}}}\big(D^{Y_0,B_0}_0\big) \, {\text{d}}\mathbb{P}(Y^{({\mathcal{K}})})=\sum_{i\in I}\mu(i) P(i,{\mathcal{E}}) H(D^i) \end{align*}

and

(22) \begin{equation} \begin{aligned} h\big({\mathbb{Y}}^{({\mathcal{A}})}\big) & = -(1-\pi(I))\sum_{j\in I}\mu(j)\log \mu(j)-\sum_{i,j\in I} \mu(i) P(i,j)\log P(i,j) \\& \quad -\sum_{i\in I, \delta\in {\mathcal{E}}}\mu(i) P(i,\delta) \log P(i,\delta).\end{aligned}\end{equation}

Proof. From (20) and (21), and by using the Markov property,

\begin{eqnarray*}\Delta(D)&=&\int H_{(\bf P)_{Y^{({\mathcal{K}})}}}\big(D^{Y_0,B_0}_0\big) \, {\text{d}}\mathbb{P}(Y^{({\mathcal{K}})})\\&=&-\sum_{i\in I}\mu(i)P(i,{\mathcal{E}})\sum_{\delta\in {\mathcal{E}}} P(i,\delta)/P(i,{\mathcal{E}}) \log (P(i,\delta)/P(i,{\mathcal{E}}))\\&=& -\sum_{i\in I,\delta\in {\mathcal{E}}} \mu(i) P(i,\delta) \log P(i,\delta)+\sum_{i\in I} \mu(i) P(i,{\mathcal{E}}) \log P(i,{\mathcal{E}}).\end{eqnarray*}

We then use (18) to get the expression in (22).
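Proposition 4 can also be checked numerically; the sketch below (same hypothetical kernel, illustrative numbers only) compares $h({\mathbb{Y}}^{({\mathcal{K}})})+\Delta(D)$ from (20) with the closed form (22):

```python
import numpy as np

# Hypothetical kernel (illustrative, not from the paper); pi(I) = 0.7.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
PiE  = P_IE.sum(axis=1)                        # P(i, E)
pi_I = 0.7

# h(Y^(K)) from the closed form (18)
h_YK = (-np.sum(mu * PiE * np.log(PiE))
        - (1 - pi_I) * np.sum(mu * np.log(mu))
        - np.sum(mu[:, None] * P_II * np.log(P_II)))

# Delta(D): mean entropy of the absorption site given a kill at i
cond = P_IE / PiE[:, None]                     # P(i, delta) / P(i, E)
H_D = -np.sum(cond * np.log(cond), axis=1)     # H(D^i)
delta_D = np.sum(mu * PiE * H_D)

h_YA = h_YK + delta_D                          # (20)

# Closed form (22)
h_YA_closed = (-(1 - pi_I) * np.sum(mu * np.log(mu))
               - np.sum(mu[:, None] * P_II * np.log(P_II))
               - np.sum(mu[:, None] * P_IE * np.log(P_IE)))
assert np.isclose(h_YA, h_YA_closed)
print(round(h_YA, 4))                          # 1.4313
```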

Remark 6. Let ${\mathcal{X}}^{(A)}$ be a trajectory of an absorbed chain, with initial distribution $\mu$ in I and finishing after it hits ${\mathcal{E}}$. It has length $\tau_{\mathcal{E}}$ and it corresponds to an absorbed trajectory of length $\tau_{\mathcal{E}}-1$ in the process ${\mathbb{Y}}^{({\mathcal{A}})}$ with alphabet $I^2\times {\mathcal{E}}^*$. In fact, if $({\mathcal{X}}_1, \ldots, {\mathcal{X}}_l,\varepsilon)$ with ${\mathcal{X}}_1, \ldots, {\mathcal{X}}_l\in I$, $\varepsilon\in {\mathcal{E}}$, is an absorbed trajectory of length $l+1$, then the associated trajectory in ${\mathbb{Y}}^{({\mathcal{A}})}$ is given by $(({\mathcal{X}}_r,{\mathcal{X}}_{r+1},o), r=1, \ldots, l-1; ({\mathcal{X}}_{l},j^*,\varepsilon))$ of length l. Here, $j^*\in I$ is an element chosen with distribution $\mu$ and it is the starting state of the next absorbed trajectory.

5.3. Entropy balance

The associated stationary chain ${\mathbb{X}}$ with transition kernel $P=P^\pi$ is retrieved from the stationary chain ${\mathbb{Y}}$ with transition kernel Q, a collection of Bernoulli variables ${\mathbb{B}}^{I,I}$ that assign 0 or 1 between the connections of ${\mathbb{Y}}$, a set of Bernoulli variables ${\mathbb{D}}^I$ giving the transition from I to ${\mathcal{E}}$, and a family of walks ${\mathbb{V}}$ whose components are Bernoulli variables $(G_n)$ distributed as $\pi(\bullet \mid {\mathcal{E}})$. The length of these walks is distributed as a Geometric$(\pi(I))$ variable minus 1, and so the walks may be empty.

It is straightforward to prove the following equality, relating $h({\mathbb{X}})$ given by (5) to the entropies of the elements forming the chain ${\mathbb{X}}$.

Proposition 5.

\begin{equation*}h({\mathbb{X}})=\pi(I) h\big({\mathbb{Y}}^{({\mathcal{A}})}\big)+\pi({\mathcal{E}})^2 h({\mathbb{G}})-\pi(I)\pi({\mathcal{E}}) \log \pi(I) -\pi({\mathcal{E}})^2 \log \pi({\mathcal{E}}).\end{equation*}

Below we discuss how this equality arises. We have reduced the elements forming the chain ${\mathbb{X}}$ to only two: the absorbed chains ${\mathbb{Y}}^{({\mathcal{A}})}$ and the walks ${\mathbb{V}}$ with Bernoulli variables $G_n$. From (22), we have

\begin{align*}h\big({\mathbb{Y}}^{({\mathcal{A}})}\big) = & -\pi({\mathcal{E}})\sum_{j\in I}\mu(j)\log \mu(j)- \sum_{i,j\in I} \mu(i) P(i,j)\log P(i,j)\\ & -\sum_{i\in I, \delta\in {\mathcal{E}}}\!\!\!\! \mu(i) P(i,\delta) \log P(i,\delta),\end{align*}

and the Bernoulli sequence ${\mathbb{G}}=(G_n)$ has entropy

\begin{equation*}h({\mathbb{G}})=-\sum_{\delta\in {\mathcal{E}}} \pi(\delta \mid {\mathcal{E}}) \log\pi(\delta \mid {\mathcal{E}})=-\pi({\mathcal{E}})^{-1}\sum_{\delta\in {\mathcal{E}}} \pi(\delta) \log \pi(\delta)+\log \pi({\mathcal{E}}).\end{equation*}

Taking some N, we divide the sequence $(X_1, \ldots, X_N)$ into the set of absorbed chains ${\mathcal{X}}^{(A)}$ and the set of nonempty walks V in ${\mathcal{E}}$. The proportion of elements in I approaches $\pi(I)$ as $N\to \infty$. Therefore, from (9) we obtain that for every time $t\in \mathbb{T}$ there are in mean $\sum_{i\in I} \mu(i) P(i,{\mathcal{E}}) (\pi(I)^{-1}-1)$ points belonging to a walk in ${\mathcal{E}}$. Since the set of points in $\mathbb{T}$ has a weight $\pi(I)$, we obtain that the proportion of points in $\mathbb{Z}$ belonging to a walk in ${\mathcal{E}}$ is $\pi(I)\cdot \sum_{i\in I} \mu(i) P(i,{\mathcal{E}}) (\pi(I)^{-1}-1)=\pi({\mathcal{E}})^2$. Hence, the proportion of sites in $(X_1, \ldots, X_N)$ with symbols in ${\mathbb{G}}$ arising from a walk V in ${\mathcal{E}}$ approaches $\pi({\mathcal{E}})^2$ as $N\to \infty$. We have

(23) \begin{equation}\pi({\mathcal{E}})^2 h({\mathbb{G}})=-\pi({\mathcal{E}}) \sum_{\delta\in {\mathcal{E}}} \pi(\delta) \log \pi(\delta)+\pi({\mathcal{E}})^2\log \pi({\mathcal{E}}).\end{equation}

Let us compute $\pi(I) h\big({\mathbb{Y}}^{({\mathcal{A}})}\big)$. Since $\mu(i)=\pi(i)/\pi(I)$ for $i\in I$, we have

\begin{equation*}-\pi(I)\sum_{j\in I} \mu(j)\log \mu(j)=-\sum_{j\in I} \pi(j)\log \pi(j)+\pi(I)\log \pi(I),\end{equation*}

and so, using (22), we get

\begin{eqnarray*}\pi(I) h\big({\mathbb{Y}}^{({\mathcal{A}})}\big)&=&-\pi({\mathcal{E}})\sum_{j\in I}\! \pi(j)\log \pi(j)+\pi({\mathcal{E}}) \pi(I)\log \pi(I)\\&{}&\;\; -\sum_{i,j\in I}\! \pi(i) P(i,j)\log P(i,j)-\sum_{i\in I, \delta\in {\mathcal{E}}}\!\! \pi(i) P(i,\delta) \log P(i,\delta).\end{eqnarray*}

Then, the equality given in Proposition 5 has been proved:

\begin{equation*}h({\mathbb{X}})-(\pi({\mathcal{E}})^2 h({\mathbb{G}})-\pi({\mathcal{E}})^2 \log \pi({\mathcal{E}}))=\pi(I)h\big({\mathbb{Y}}^{({\mathcal{A}})}\big)-\pi({\mathcal{E}}) \pi(I)\log \pi(I).\end{equation*}

The term $\pi(I)\pi({\mathcal{E}}) \log \pi(I)$ has an origin similar to the last term in (23). In fact, from Remark 4, the resurrection weights $\mu(j)$, $j\in I$, appear with frequency $\pi({\mathcal{E}})$ in the sequence ${\mathbb{Y}}$ because this occurs at the sites where there is a jump to ${\mathcal{E}}$. Since the sequence ${\mathbb{Y}}$ appears with frequency $\pi(I)$, then the term $-\sum_{i\in I}\mu(i)\log \mu(i)$ appears with frequency $\pi(I)\pi({\mathcal{E}})$. Hence, as in (23), we have

\begin{equation*}-\pi(I)\pi({\mathcal{E}})\sum_{i\in I}\mu(i)\log \mu(i)=-\pi({\mathcal{E}})\sum_{i\in I}\pi(i)\log \pi(i)+\pi(I)\pi({\mathcal{E}}) \log \pi(I),\end{equation*}

and since $-\pi({\mathcal{E}})\sum_{i\in I}\pi(i)\log \pi(i)$ is the term present in (5), the extra term given by $\pi(I)\pi({\mathcal{E}}) \log \pi(I)$ appears.
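Both the expansion of $\pi(I)h({\mathbb{Y}}^{({\mathcal{A}})})$ and the change of weights from $\mu$ to $\pi$ above can be confirmed numerically. A sketch with the same hypothetical kernel ($\pi(I)=0.7$, $\pi({\mathcal{E}})=0.3$ are the illustrative Perron data, not from the paper):

```python
import numpy as np

# Hypothetical kernel (illustrative): pi(I) = 0.7 is its Perron eigenvalue.
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
pi_I, pi_E = 0.7, 0.3
pi_onI = pi_I * mu                       # pi(i) = mu(i) pi(I) on I

# pi(I) h(Y^(A)) computed from the closed form (22) ...
h_YA = (-pi_E * np.sum(mu * np.log(mu))
        - np.sum(mu[:, None] * P_II * np.log(P_II))
        - np.sum(mu[:, None] * P_IE * np.log(P_IE)))
lhs = pi_I * h_YA

# ... versus its expanded pi-form used in the balance
rhs = (-pi_E * np.sum(pi_onI * np.log(pi_onI)) + pi_E * pi_I * np.log(pi_I)
       - np.sum(pi_onI[:, None] * P_II * np.log(P_II))
       - np.sum(pi_onI[:, None] * P_IE * np.log(P_IE)))
assert np.isclose(lhs, rhs)

# Change of weights behind the extra log pi(I) term
assert np.isclose(-pi_I * pi_E * np.sum(mu * np.log(mu)),
                  -pi_E * np.sum(pi_onI * np.log(pi_onI)) + pi_I * pi_E * np.log(pi_I))
```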

Remark 7. From Remark 6 the length of an absorbed trajectory in ${\mathbb{Y}}^{({\mathcal{A}})}$ with alphabet $I^2\times {\mathcal{E}}^*$ is the same as the number of elements in I of an absorbed trajectory ${\mathcal{X}}^{(A)}$ starting from $\mu$ and absorbed when hitting ${\mathcal{E}}$ (this is of length $|{\mathcal{X}}^{(A)}|-1$, which counts the visited sites in I, but not the one containing the absorbing state). Since the entropy of a system is the entropy gained per unit of time, the proportion of symbols contributing the entropy $h({\mathbb{Y}}^{({\mathcal{A}})})$ is $\pi(I)$. This explains why the term $\pi(I) h({\mathbb{Y}}^{({\mathcal{A}})})$ appears.

Appendix A. Proof of Proposition 2

Let us first show that the process $({\mathbb{S}}, \mathbb{T})$ is regenerative. Consider a pair of sequences $(a(u)\,:\, u\ge 0)$ and $(b(1), \ldots, b(m))$ taking values in $I\cup {\mathcal{E}}$ and such that $a(0)\in I$. From the construction of ${\mathbb{S}}$, we have

(24) \begin{align}\mathbb{P}({\mathbb{S}}_{T_n+l}=b(l), l=1, \ldots, m \mid {\mathbb{S}}_{T_{n}-u} & = a(u),u\ge 0)\\ & = \mathbb{P}({\mathbb{S}}_{T_n+l}=b(l), l=1, \ldots, m \mid {\mathbb{S}}_{T_n}=a(0)).\nonumber \end{align}

Also, we have that $({\mathbb{S}}_{T_l}\,:\, l\in \mathbb{Z})$ is equally distributed as $(Y_n\,:\, n\in \mathbb{Z})$. Then,

(25) \begin{equation}{\mathbb{S}}_{T_n}\sim \mu \hbox{ for all } n\in \mathbb{Z}.\end{equation}

This proves that $({\mathbb{S}}, \mathbb{T})$ is regenerative. Therefore, by shifting by a random number of sites U chosen uniformly in $\{0, \ldots, T_1-1\}$ we define a stationary process ${\mathbb{S}}^{\text{s}}$ given by ${\mathbb{S}}^{\text{s}}_{t}={\mathbb{S}}_{t+U}$ for $t\in \mathbb{Z}$.

Since the random number U only depends on the length $T_1$, there is regeneration at the random times in $\mathbb{T}=\{t\in \mathbb{Z}\,:\, {\mathbb{S}}^{\text{s}}_t\in I\}$ (see relations (24) and (25)). Hence, for all $t\in \mathbb{Z}$, $b\in I\cup {\mathcal{E}}$, and $(a(u)\,:\, u\le t)$ taking values in $I\cup {\mathcal{E}}$, we have

(26) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=b \mid {\mathbb{S}}^{\text{s}}_{u}=a(u),u\le t)= \mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=b \mid {\mathbb{S}}^{\text{s}}_{u}=a(u) , u=t, \ldots, t-r) ,\end{equation}

where $r\ge 0$ is the smallest nonnegative integer such that $a(t-r)\in I$.

Again using that U only depends on $T_1$, we get that, for all $t\in \mathbb{Z}$ and all $a,b\in I\cup {\mathcal{E}}$,

\begin{equation*}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=b \mid {\mathbb{S}}^{\text{s}}_{t}=a)=\mathbb{P}({\mathbb{S}}_{u+1}=b \mid {\mathbb{S}}_{u}=a) \quad \hbox{for } u\neq -1,0.\end{equation*}

We avoid taking $u=-1$ or $u=0$ because ${\mathbb{S}}_0$ only takes values in I.

Let us compute $\mathbb{P}(t\in \mathbb{T})$. It suffices to calculate $\mathbb{P}(t \not\in \mathbb{T})/\mathbb{P}(t\in \mathbb{T})$. From (9) we get that, for every time $t\in \mathbb{T}$, there are in mean

\begin{equation*}\sum_{i\in I}\mu(i)\sum_{j\in I}Q(i,j) {\overline{\theta}}_{i,j} \pi(I)^{-1}=\sum_{i\in I} \mu(i) P(i,{\mathcal{E}}) \pi(I)^{-1}\end{equation*}

points in $\mathbb{Z}\setminus \mathbb{T}$. Then, from (4) we find

\begin{equation*}\mathbb{P}(t\not\in \mathbb{T})/\mathbb{P}(t\in \mathbb{T})=\sum_{i\in I} \mu(i) P(i,{\mathcal{E}}) \pi(I)^{-1}= \pi({\mathcal{E}})/\pi(I).\end{equation*}

We conclude that

(27) \begin{equation}\mathbb{P}(t\in \mathbb{T})=\pi(I) \hbox{ and } \mathbb{P}(t\not\in \mathbb{T})=\pi({\mathcal{E}}).\end{equation}

Hence, $\mathbb{P}({\mathbb{S}}^{\text{s}}_t\in I)=\pi(I)$ and $\mathbb{P}({\mathbb{S}}^{\text{s}}_t\in {\mathcal{E}})=\pi({\mathcal{E}})$.

Let $i\in I$. We have $\mathbb{P}({\mathbb{S}}^{\text{s}}_t=i \mid {\mathbb{S}}^{\text{s}}_t\in I)=\mathbb{P}({\mathbb{S}}_0=i \mid {\mathbb{S}}_0\in I)=\mu(i)$, and so, using (27), we get

(28) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_t=i)=\mu(i) \mathbb{P}({\mathbb{S}}^{\text{s}}_t\in I)=\mu(i)\pi(I)=\pi(i).\end{equation}

Let $i,j\in I$. From the definition of $\theta_{i,j}$ in (10) we get

(29) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=j \mid {\mathbb{S}}^{\text{s}}_t=i)= \theta_{i,j}Q(i,j)=P(i,j).\end{equation}

We have $\sum_{j\in I}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=j \mid {\mathbb{S}}^{\text{s}}_t=i)=1-P(i,{\mathcal{E}})$, so $\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}\in {\mathcal{E}} \mid {\mathbb{S}}^{\text{s}}_t=i)=P(i,{\mathcal{E}})$. Then, ${\mathbb{S}}^{\text{s}}_t=i$ jumps to ${\mathcal{E}}$ with probability $P(i,{\mathcal{E}})$, and the jump to some particular state $\delta\in {\mathcal{E}}$ is done with probability

(30) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=\delta \mid {\mathbb{S}}^{\text{s}}_t=i)=P(i,{\mathcal{E}})P(i,\delta)/P(i,{\mathcal{E}})=P(i,\delta).\end{equation}

Let $\delta, \varepsilon \in {\mathcal{E}}$. We have $P(\delta, {\mathcal{E}})=\mathbb{P}(V\neq \emptyset)=\pi({\mathcal{E}})$, so

(31) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=\varepsilon \mid {\mathbb{S}}^{\text{s}}_t=\delta)=\pi(\varepsilon \mid {\mathcal{E}})P(\delta,{\mathcal{E}})=\pi(\varepsilon \mid {\mathcal{E}})\pi({\mathcal{E}})=\pi(\varepsilon).\end{equation}

Then, by using previous relations and (3), we get

\begin{align*}\mathbb{P}({\mathbb{S}}^{\text{s}}_t=\delta) & = \sum_{i\in I}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t-1}=i, {\mathbb{S}}^{\text{s}}_t=\delta)+\sum_{\varepsilon\in {\mathcal{E}}}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t-1}=\varepsilon, {\mathbb{S}}^{\text{s}}_t=\delta)\\ &=\sum_{i\in I}\mathbb{P}({\mathbb{S}}^{\text{s}}_t=\delta \mid {\mathbb{S}}^{\text{s}}_{t-1}=i)\pi(i)+\sum_{\varepsilon\in {\mathcal{E}}}\mathbb{P}({\mathbb{S}}^{\text{s}}_t=\delta \mid {\mathbb{S}}^{\text{s}}_{t-1}=\varepsilon)\mathbb{P}({\mathbb{S}}^{\text{s}}_{t-1}=\varepsilon)\\ &=\sum_{i\in I}P(i,\delta)\pi(i)+\pi(\delta)\sum_{\varepsilon\in {\mathcal{E}}}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t-1}=\varepsilon)\\ &=\sum_{i\in I}P(i,\delta)\mu(i)\pi(I)+\pi(\delta)\pi({\mathcal{E}})=\pi(\delta)(\pi(I)+\pi({\mathcal{E}}))=\pi(\delta) .\end{align*}
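The computation above says that $\pi$ is invariant under the canonical transitions: rows on I are $P(i,\cdot)$ and, by (31) and the identification of the jumps out of ${\mathcal{E}}$, each row on ${\mathcal{E}}$ is $\pi(\cdot)$. A numerical sketch (hypothetical kernel, illustrative numbers, not from the paper):

```python
import numpy as np

# Hypothetical kernel (illustrative, not from the paper).
P_II = np.array([[0.3, 0.4], [0.2, 0.5]])
P_IE = np.array([[0.2, 0.1], [0.1, 0.2]])
mu   = np.array([1/3, 2/3])
pi_I = 0.7
n, m = len(mu), P_IE.shape[1]

# pi on I u E: pi(i) = mu(i) pi(I) and pi(delta) = sum_i mu(i) P(i, delta)
pi = np.concatenate([pi_I * mu, mu @ P_IE])
assert np.isclose(pi.sum(), 1.0)

# Canonical kernel: rows i in I are (P(i,j), P(i,delta)); every row delta
# in E is pi(.), i.e. jump to epsilon w.p. pi(epsilon) and to j w.p. pi(j).
P = np.zeros((n + m, n + m))
P[:n, :n] = P_II
P[:n, n:] = P_IE
P[n:, :]  = pi
assert np.allclose(P.sum(axis=1), 1.0)
assert np.allclose(pi @ P, pi)           # pi is stationary, as computed above
```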

Again from the construction of the process ${\mathbb{S}}$, it follows that

(32) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=\varepsilon \mid {\mathbb{S}}^{\text{s}}_t=\delta, {\mathbb{S}}^{\text{s}}_{u}=a(u), u<t)=\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=\varepsilon \mid {\mathbb{S}}^{\text{s}}_t=\delta)=\pi(\varepsilon).\end{equation}

Now, let us compute $\mathbb{P}({\mathbb{S}}^{\text{s}}_t=\varepsilon, {\mathbb{S}}^{\text{s}}_{t+1}=j)$ for $\varepsilon\in {\mathcal{E}}$, $j\in I$. The pair $({\mathbb{S}}^{\text{s}}_t=\varepsilon, {\mathbb{S}}^{\text{s}}_{t+1}=j)$ has its origin in some pair $(Y_s=i, Y_{s+1}=j)$ satisfying $B^{i,j}_{s+1}=0$, for some $i\in I$. Then, by summing over all states $i\in I$ and all pieces of trajectories in ${\mathcal{E}}$ that are built between i and j, and by using (30), (8), and (3), we get

(33) \begin{align}\mathbb{P}({\mathbb{S}}^{\text{s}}_t = \varepsilon, {\mathbb{S}}^{\text{s}}_{t+1} = j) & = \sum_{i\in I} \sum_{l\ge 1} \mathbb{P}({\mathbb{S}}^{\text{s}}_{t-l} = i;\ {\mathbb{S}}^{\text{s}}_{t-u}\in {\mathcal{E}}, 1\le u < l;\ {\mathbb{S}}^{\text{s}}_t = \varepsilon,{\mathbb{S}}^{\text{s}}_{t+1} = j)\nonumber\\ & = \sum_{i\in I} \pi(i) \mu(j) P(i,\varepsilon) \pi(I)+ \sum_{i\in I}\pi(i) \mu(j)\!\!\!\sum_{l\ge 2;\, \delta_1,\ldots,\delta_{l-1}\in {\mathcal{E}}}\!\!\left( P(i,\delta_1)\prod_{k=2}^{l-1}\pi(\delta_k) \right) \pi(\varepsilon)\pi(I)\nonumber\\ & = \pi(\varepsilon)\pi(I)\mu(j)=\pi(\varepsilon)\pi(j).\end{align}

From relations (28)–(33) we get that the bivariate marginals of the stationary chains ${\mathbb{X}}=(X_t)$ and ${\bf{{\mathbb{S}}}^{\text{s}}}=({\mathbb{S}}^{\text{s}}_t)$ are the same. So, we will finish by proving that ${\bf{{\mathbb{S}}}^{\text{s}}}$ satisfies the Markov property. In view of the regeneration property (26), this will be proven once we show that

(34) \begin{equation}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=b \mid {\mathbb{S}}^{\text{s}}_{u}=a(u) , u=t, \ldots, t-r)=\mathbb{P}(X_{t+1}=b \mid X_{t}=a(t)) ,\end{equation}

where $r\ge 0$ satisfies $a(t-r)\in I$ and $a(t-u)\in {\mathcal{E}}$ for $u=1, \ldots, r-1$. This was shown for the case $r=0$ in (29) and (30). On the other hand, (32) proves (34) in the case $b\in {\mathcal{E}}$ and $r>0$. So, the only case left to show is $b\in I$ and $r>0$.

Let $i,j\in I$, $r>0$, $\delta_u\in {\mathcal{E}}$ for $u=0, \ldots, r-1$. Since $\mathbb{P}(X_{t+1}=j \mid X_{t}=\delta_0)=\pi(j)$, to complete the proof of (34), the only relation left to show is

\begin{equation*}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=j \mid {\mathbb{S}}^{\text{s}}_{t-u}=\delta_u, u=0, \ldots, r-1,{\mathbb{S}}^{\text{s}}_{t-r}=i)=\pi(j).\end{equation*}

We have

\begin{align*} \mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1} & =j, {\mathbb{S}}^{\text{s}}_{t-u}=\delta_u, u=0,\ldots,r-1, {\mathbb{S}}^{\text{s}}_{t-r}=i)\\ & = \mu(i)(P(i,j)+P(i,{\mathcal{E}})\mu(j))\theta_{i,j}\frac{P(i,\delta_0)}{P(i,{\mathcal{E}})}\left(\prod_{u=1}^{r-1}\pi(\delta_u)\right)\pi(I)\\ & = \mu(i)P(i,{\mathcal{E}})\mu(j)\frac{P(i,\delta_0)}{P(i,{\mathcal{E}})}\left(\prod_{u=1}^{r-1}\pi(\delta_u)\right)\pi(I)\\ & = \mu(i)P(i,\delta_0)\mu(j)\left(\prod_{u=1}^{r-1}\pi(\delta_u)\right)\pi(I),\end{align*}

and, summing the preceding display over ${\mathbb{S}}^{\text{s}}_{t+1}=j\in I$ together with the analogous expression over ${\mathbb{S}}^{\text{s}}_{t+1}=\varepsilon\in {\mathcal{E}}$,

\begin{align*}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t-r}=i, {\mathbb{S}}^{\text{s}}_{t-u}=\delta_u\,:\, u=0, \ldots, r-1) & = \mu(i)P(i,\delta_0)\left(\prod_{u=1}^{r-1}\pi(\delta_u)\right)(\pi(I)+\pi({\mathcal{E}}))\\ & = \mu(i)P(i,\delta_0)\left(\prod_{u=1}^{r-1}\pi(\delta_u)\right).\end{align*}

Therefore,

\begin{equation*}\mathbb{P}({\mathbb{S}}^{\text{s}}_{t+1}=j \mid {\mathbb{S}}^{\text{s}}_{t-u}=\delta_u, u=0, \ldots, r-1, {\mathbb{S}}^{\text{s}}_{t-r}=i)=\mu(j)\pi(I)=\pi(j).\end{equation*}

Then (34) follows. We have proven that the laws of the stationary chains ${\mathbb{X}}=(X_t)$ and ${\bf{{\mathbb{S}}}^{\text{s}}}=({\mathbb{S}}^{\text{s}}_t)$ are the same.
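The equality of laws just proven can also be illustrated numerically. The sketch below is a hypothetical toy reconstruction, consistent with the identities established in this proof: $I=\{0,1\}$ with one cemetery state, transitions from $i\in I$ governed by the substochastic $P_I$ together with the killing probabilities, and transitions out of the cemetery distributed as $\pi$ (as in (32) and the last display). It checks that $\pi$ is stationary for the resulting kernel and that the bivariate marginal (33) factorizes.

```python
# Hypothetical toy version of the canonical chain X on I ∪ E,
# with I = {0, 1} and a single cemetery state (index 2).
# Consistent with the identities (28)-(34) above:
#   from i in I, move by P_I or be killed with probability P(i, delta);
#   from the cemetery, resample the next state from pi.

P_I = [[0.5, 0.3], [0.2, 0.6]]
mu, gamma = [0.4, 0.6], 0.8           # QSD and eigenvalue of P_I (mu P_I = gamma mu)
kill = [1.0 - sum(row) for row in P_I]  # killing probabilities P(i, delta)

# Stationary law: pi(i) = gamma*mu(i) on I, pi(delta) = sum_i mu(i) P(i, delta).
pi = [gamma * mu[0], gamma * mu[1],
      mu[0] * kill[0] + mu[1] * kill[1]]

# Full 3x3 transition matrix Q of the toy canonical chain.
Q = [[P_I[0][0], P_I[0][1], kill[0]],
     [P_I[1][0], P_I[1][1], kill[1]],
     [pi[0],     pi[1],     pi[2]]]   # from the cemetery: resample from pi

# pi is stationary for Q:
piQ = [sum(pi[a] * Q[a][b] for a in range(3)) for b in range(3)]
print(all(abs(piQ[b] - pi[b]) < 1e-12 for b in range(3)))  # True

# The bivariate marginal (33): P(X_t = delta, X_{t+1} = j) = pi(delta) pi(j), j in I.
print(all(abs(pi[2] * Q[2][j] - pi[2] * pi[j]) < 1e-12 for j in range(2)))  # True
```

Here the factorization of (33) holds by construction of the cemetery row, and the stationarity check confirms that the kernel and the law $\pi$ fit together as the proof requires.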

Acknowledgements

This work was supported by the Basal ANID project AFB170001. The author thanks Dr. Michael Schraudner from CMM, University of Chile, for calling his attention to reference [6]. He is also indebted to an anonymous referee and the Editor for several comments, suggestions, and corrections allowing him to improve the whole presentation of the paper.

References

[1] Abramov, L. and Rokhlin, V. (1962). The entropy of a skew product of measure-preserving transformations. Vestnik Leningrad Univ. 17, 5–13 (in Russian).
[2] Asmussen, S. (2003). Applied Probability and Queues, 2nd edn. New York, Springer.
[3] Collet, P., Martínez, S. and San Martín, J. (2013). Quasi-Stationary Distributions: Markov Chains, Diffusions and Dynamical Systems. Heidelberg, Springer.
[4] Darroch, J. and Seneta, E. (1965). On quasi-stationary distributions in absorbing discrete-time finite Markov chains. J. Appl. Prob. 2, 88–100.
[5] Denker, M., Grillenberger, C. and Sigmund, K. (1976). Ergodic Theory on Compact Spaces. Berlin, Springer.
[6] Downarowicz, T. and Serafin, J. (2002). Fiber entropy and conditional variational principles in compact non-metrizable spaces. Fund. Math. 172, 217–247.
[7] Ferrari, P. A., Kesten, H., Martínez, S. and Picco, P. (1995). Existence of quasi-stationary distributions. A renewal dynamical approach. Ann. Prob. 23, 501–521.
[8] Ferrari, P. A., Martínez, S. and Picco, P. (1992). Existence of non-trivial quasi-stationary distributions in the birth-death chain. Adv. Appl. Prob. 24, 795–813.
[9] Keane, M. and Smorodinsky, M. (1979). Finitary isomorphisms of irreducible Markov shifts. Israel J. Math. 34, 281–286.