
The rescaled Pólya urn: local reinforcement and chi-squared goodness-of-fit test

Published online by Cambridge University Press:  18 October 2022

Giacomo Aletti*
Affiliation:
Università degli Studi di Milano
Irene Crimaldi*
Affiliation:
IMT School for Advanced Studies Lucca
*Postal address: ADAMSS Center, Università degli Studi di Milano, Milan, Italy.
**Postal address: IMT School for Advanced Studies Lucca, Lucca, Italy.

Abstract

Motivated by recent studies of big samples, this work aims to construct a parametric model which is characterized by the following features: (i) a ‘local’ reinforcement, i.e. a reinforcement mechanism mainly based on the last observations, (ii) a random persistent fluctuation of the predictive mean, and (iii) a long-term almost sure convergence of the empirical mean to a deterministic limit, together with a chi-squared goodness-of-fit result for the limit probabilities. This triple purpose is achieved by the introduction of a new variant of the Eggenberger–Pólya urn, which we call the rescaled Pólya urn. We provide a complete asymptotic characterization of this model, pointing out that, for a certain choice of the parameters, it has properties different from the ones typically exhibited by the other urn models in the literature. Therefore, beyond the possible statistical application, this work could be interesting for those who are concerned with stochastic processes with reinforcement.

Original Article

© The Author(s), 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction: framework and motivation

The well-known Pearson chi-squared test of goodness of fit is a statistical test applied to categorical data to establish whether an observed frequency distribution differs from a theoretical probability distribution. In this test the observations are always assumed to be independent and identically distributed (i.i.d.). Under this hypothesis, in a multinomial sample of size N, the chi-squared statistics

(1) \begin{equation}\chi^2 =\sum_{i=1}^k \frac{( O_i - E_i)^2}{E_i} =N \sum_{i=1}^k \frac{( \widehat{p}_i - p_i)^2}{p_i}\end{equation}

(where k is the number of possible values, and $O_{i}$, $E_{i}$, $\widehat{p}_i=O_i/N$, and ${p}_i=E_i/N$ are the observed and expected absolute and relative frequencies, respectively) equals N multiplied by the chi-squared distance between the observed and expected probabilities. Therefore, the goodness-of-fit test based on this statistics is highly sensitive to the sample size N (see, for instance, [8, 36]): the larger N, the more significant even a small value of the chi-squared distance becomes. More precisely, the value of the chi-squared distance has to be compared with the ‘critical’ value $\chi^2_{1-\theta}(k-1)/N$, where $\chi^2_{1-\theta}(k-1)$ denotes the quantile of order $1-\theta$ of the chi-squared distribution $\chi^2(k-1)$ with $k-1$ degrees of freedom. Hence, it is clear that the larger N, the easier the rejection of $H_0$. As a consequence, in the context of ‘big data’ (e.g. [8, 11]), where one often works with correlated, noisy data, suitable generative models and related chi-squared goodness-of-fit tests are needed.
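
To make the dependence on N concrete, here is a minimal numerical sketch (in Python, not part of the original analysis; the frequencies are hypothetical): for a fixed chi-squared distance between observed and expected probabilities, the statistic (1) grows linearly in N and eventually exceeds the critical value $\chi^2_{1-\theta}(k-1)$.

```python
import numpy as np
from scipy.stats import chi2

k = 4
p = np.array([0.25, 0.25, 0.25, 0.25])      # H0 probabilities (hypothetical)
p_hat = np.array([0.27, 0.24, 0.25, 0.24])  # observed relative frequencies (hypothetical)

# chi-squared distance between observed and expected probabilities
dist = np.sum((p_hat - p) ** 2 / p)

theta = 0.05
crit = chi2.ppf(1 - theta, df=k - 1)        # quantile chi^2_{1-theta}(k-1)

for N in (10**2, 10**3, 10**4, 10**5):
    stat = N * dist                          # statistic (1)
    print(N, round(stat, 2), stat > crit)    # rejection occurs as soon as dist > crit / N
```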

Different types of correlation have been taken into account, and different techniques have been developed to control the performance of the goodness-of-fit test based on (1) (see, among others, [8, 14, 26, 29, 43, 44, 46, 52], where some form of correlation is introduced in the sample, and variants of the chi-squared statistics are proposed and analyzed mainly by means of simulations). Our approach differs from the one adopted in the papers cited above. Indeed, our starting point is that a natural way to get a positive correlation between events of the same type is to work with the Dirichlet-multinomial (D-M) distribution: briefly, the parameters of the multinomial distribution are randomized a priori with a Dirichlet distribution, yielding an exchangeable (not independent) sequence. The variance–covariance matrix of the D-M distribution is equal to that of the multinomial (M) distribution, multiplied by a fixed constant greater than 1: precisely, given the k parameters $\textbf{b}_{\textbf{0}} = ({b_{0\,1}}, \ldots,{b_{0\,k}})$ of the D-M distribution and setting $|\textbf{b}_{\textbf{0}}| =\sum_{i=1}^k {b_{0\,i}}$, we have

\begin{align*}Var_{\text{D-M}} (O_i) & = N \frac{b_{0\,i}}{|\textbf{b}_{\textbf{0}}|}\bigg( 1 - \frac{b_{0\,i}}{|\textbf{b}_{\textbf{0}}|} \bigg)\frac{N+|\textbf{b}_{\textbf{0}}| }{1+ |\textbf{b}_{\textbf{0}}|} =Var_{\text{M}} (O_i) \frac{N+|\textbf{b}_{\textbf{0}}| }{1+ |\textbf{b}_{\textbf{0}}|} ,\\[5pt] Cov_{\text{D-M}} (O_i,O_j) & = -N \frac{b_{0\,i}b_{0\,j}}{|\textbf{b}_{\textbf{0}}|^2}\frac{N+|\textbf{b}_{\textbf{0}}|}{1+ |\textbf{b}_{\textbf{0}}|} =Cov_{\text{M}} (O_i,O_j) \frac{N+|\textbf{b}_{\textbf{0}}| }{1+ |\textbf{b}_{\textbf{0}}|} ,\qquad \mbox{for } i\neq j.\end{align*}

Therefore, if we set $|\textbf{b}_{\textbf{0}}| = \frac{1-\rho^2}{\rho^2}$ , we have, for any $i,j \in \{1, \ldots, k\}$ ,

(2) \begin{equation}Cov_{\text{D-M}} (O_i,O_j) =\big(1+ (N-1)\rho^2\big) \, Cov_{\text{M}} (O_i,O_j) ,\end{equation}

where $\rho$ represents a correlation parameter. Roughly speaking, the D-M model adds variance to the multinomial model by taking a mixture or, equivalently, by adding a positive correlation. The property (2) is fundamental for our purpose. In fact, as highlighted in [47], the two conditions (i) $\widehat{p}_i=O_i/N \to p_i$ almost surely for $N\to\infty$ and (ii) $Cov(O_i,O_j)=\lambda Cov_{\text{M}}(O_i,O_j)$ with $\lambda>1$ imply that the statistics $\chi^2$, defined in (1), is asymptotically distributed as $\chi^2(k-1)\lambda$ (see [47, Corollary 2]), so that the critical value for the chi-squared distance becomes $\chi^2_{1-\theta}(k-1)\lambda/N$, where $\lambda$ mitigates the effect of N. As observed above, the D-M model satisfies (ii), but it is well known that it does not meet condition (i). In this paper we present a variant of the D-M model such that condition (i) holds true and the chi-squared statistics (1) is asymptotically distributed as $\chi^2(k-1)\lambda$ with $\lambda>1$.
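
As a quick illustration of (2) (a simulation sketch, not taken from the paper; the parameter values are hypothetical), one can sample Dirichlet-multinomial counts by first drawing the cell probabilities from a Dirichlet distribution and then drawing multinomial counts, and compare the empirical variances with the multinomial ones:

```python
import numpy as np

rng = np.random.default_rng(0)
b0 = np.array([2.0, 3.0, 5.0])               # Dirichlet parameters (hypothetical)
N, n_rep = 50, 100_000

# Dirichlet-multinomial sampling: randomize the multinomial probabilities, then draw counts
probs = rng.dirichlet(b0, size=n_rep)
O = np.array([rng.multinomial(N, p) for p in probs])

p0 = b0 / b0.sum()
var_dm = O.var(axis=0)                       # empirical D-M variances
var_m = N * p0 * (1 - p0)                    # multinomial variances
inflation = (N + b0.sum()) / (1 + b0.sum())  # theoretical factor (N+|b0|)/(1+|b0|)
print(var_dm / var_m)                        # each ratio should be close to `inflation`
print(inflation)
```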

The D-M distribution may be generated by means of the standard Eggenberger–Pólya urn (see [25, 38]), a model that has been widely studied and generalized (some recent variants can be found in [5, 6, 7, 10, 12, 13, 15, 17, 18, 27, 28, 37]). This urn model with k colors works as follows. An urn contains $N_{0\, i}$ balls of color i, for $i=1,\dots,\, k$. At each discrete time, a ball is drawn from the urn and then returned to the urn together with $\alpha>0$ additional balls of the same color. Therefore, if we denote by $N_{n\, i}$ the number of balls of color i in the urn at time n, we have for $n\geq 1$

\begin{equation*}N_{n\, i}=N_{n-1\,i}+\alpha\xi_{n\,i},\end{equation*}

where $\xi_{n\,i}=1$ if the extracted ball at time n is of color i, and $\xi_{n\,i}=0$ otherwise. The parameter $\alpha$ regulates the reinforcement mechanism: the greater $\alpha$ , the greater the dependence of $N_{n\,i}$ on $\sum_{m=1}^n\xi_{m\,i}$ . In addition, it is well known that the conditional expectation of the sequential extractions, i.e. $E[\xi_{n+1\,i} |\, \mbox{`past'}]$ , also known as the predictive mean, converges almost surely to a beta-distributed random variable, forcing the empirical mean $\bar{\xi}_{N\,i} = \sum_{n=1}^N\xi_{n\,i}/N$ to converge almost surely to the same limit.
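
The following minimal simulation sketch (hypothetical parameters, not from the paper) illustrates the standard Eggenberger–Pólya dynamics just described: within each run, the predictive mean for a color and the empirical mean of the draws approach the same random (beta-distributed) limit, which varies from run to run.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.0
N0 = np.array([3.0, 2.0])                    # initial balls of the two colors (hypothetical)

def polya_path(n_steps):
    N = N0.copy()
    draws = np.empty(n_steps)
    for n in range(n_steps):
        psi = N[0] / N.sum()                 # predictive mean for color 1
        x = 1.0 if rng.random() < psi else 0.0
        N[0 if x else 1] += alpha            # reinforce the drawn color
        draws[n] = x
    return draws.mean(), N[0] / N.sum()

# different runs converge to different (beta-distributed) limits
for _ in range(5):
    print(polya_path(20_000))
```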

In this work we exhibit an urn model that preserves the relevant aspects of the models above: a reinforcement mechanism, together with a global almost sure convergence of the empirical mean of the sequential extractions toward a limit. However, unlike the previous models, for a certain choice of the parameters, the predictive mean $E[\xi_{n+1\,i} |\, \mbox{`past'}]$ randomly fluctuates without converging almost surely, asymptotically forming a stationary ergodic process. As a consequence, since the classical martingale approach and standard stochastic approximation require or imply the convergence of $E[\xi_{n+1\,i} |\, \mbox{`past'}]$ (e.g. [1, 9, 37]), in order to prove asymptotic results for the new urn model, we need mathematical methods that are not usual in the urn modeling literature.

Rescaled Pólya urn

We introduce a new variant of the Eggenberger–Pólya urn with k colors, which we call the rescaled Pólya (RP) urn model. In this model, the almost sure limit of the empirical mean of the draws will play the rôle of an intrinsic long-run characteristic of the process, while a local mechanism generates persistent fluctuations. More precisely, the RP urn model is characterized by the introduction of the parameter $\beta$, together with the initial parameters $(b_{0\, i})_{i=1,\dots,\,k}$ and $(B_{0\, i})_{i=1,\dots,\,k}$, in addition to the parameter $\alpha$ of the original model, so that

(3) \begin{equation}\begin{aligned}N_{n\, i}& =b_{0\, i}+B_{n\, i} &&\text{with }\\B_{n\, i}&=\beta B_{n-1\, i}+\alpha\xi_{n\, i},&& n\geq 1.\end{aligned}\end{equation}

Therefore, the urn initially contains $b_{0\,i}+B_{0\,i}$ balls of color i, and the parameter $\beta\geq 0$ , together with $\alpha>0$ , regulates the reinforcement mechanism. More precisely, $N_{n\,i}$ is the sum of three terms:

  • the term $b_{0\,i}$ , which remains constant along time;

  • the term $\beta B_{n-1\,i}$ , which links $N_{n\,i}$ to the ‘configuration’ at time $n-1$ , through the ‘scaling’ parameter $\beta$ that tunes the dependence on this factor;

  • the term $\alpha\xi_{n\,i}$ , which links $N_{n\,i}$ to the outcome of the extraction at time n, through the parameter $\alpha$ that tunes the dependence on this factor.

Note that the case $\beta=1$ corresponds to the standard Eggenberger–Pólya urn with an initial number $N_{0\,i}=b_{0\,i}+B_{0\,i}$ of balls of color i, while, when $\beta\neq 1$, the RP urn does not fall among the variants of the Eggenberger–Pólya urn discussed in [45, Section 3.2], and, as explained in detail in Section 2, it does not belong to the class of reinforced stochastic processes studied in [1, 2, 3, 20, 21, 23, 50].
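
For concreteness, the RP urn dynamics (3) can be simulated directly; the sketch below (in Python, with hypothetical parameter values, written for illustration only) draws a color with probability proportional to $b_{0\,i}+B_{n\,i}$ and then applies the update $B_{n\,i}=\beta B_{n-1\,i}+\alpha\xi_{n\,i}$. With $\beta<1$ the empirical frequencies approach $p_{0\,i}=b_{0\,i}/\sum_j b_{0\,j}$, while the drawing probabilities keep fluctuating.

```python
import numpy as np

rng = np.random.default_rng(2)

def rp_urn(alpha, beta, b0, B0, n_steps):
    """Simulate the RP urn (3): B_n = beta*B_{n-1} + alpha*xi_n, with N_n = b0 + B_n."""
    b0 = np.asarray(b0, dtype=float)
    B = np.asarray(B0, dtype=float).copy()
    k = len(b0)
    counts = np.zeros(k)
    for _ in range(n_steps):
        psi = (b0 + B) / (b0 + B).sum()      # drawing probabilities (predictive mean)
        xi = np.eye(k)[rng.choice(k, p=psi)]
        B = beta * B + alpha * xi            # rescaled reinforcement
        counts += xi
    return counts / n_steps, (b0 + B) / (b0 + B).sum()

# beta < 1: empirical frequencies approach p0 = b0/|b0| = (1/2, 1/2), psi_n keeps fluctuating
print(rp_urn(alpha=1.0, beta=0.975, b0=[1.0, 1.0], B0=[1.0, 1.0], n_steps=20_000))
```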

The quantities $p_{0\,1},\ldots,p_{0\,k}$ defined as

(4) \begin{equation}p_{0\,i}=\frac{b_{0\,i}}{\sum_{j=1}^k b_{0\,j}}\end{equation}

can be seen as an intrinsic probability distribution on the possible values (colors) $\{1,\dots,\,k\}$ , that remains constant over time, and that will be related to the long-term characteristic of the process, while the random variables $(B_{n\,1},\dots,\, B_{n\,k})$ model random fluctuations over time, so that the probability distribution on the set of the k possible values at time n is given by

\begin{equation*}\psi_{n\,i}=\frac{N_{n\, i}}{\sum_{j=1}^k N_{n\, j}}=\frac{b_{0\, i}+B_{n\, i}}{\sum_{j=1}^k b_{0\, j}+\sum_{j=1}^k B_{n\, j}}.\end{equation*}

Assuming for $B_{n\,i}$ the dynamics (3) with $\beta>0$ , the probability $\psi_{n\,i}$ increases with the number of times we have observed the value i (see Equation (13) below), and so the random variables $\xi_{n\,i}$ are generated according to a reinforcement mechanism. But, in particular, when $\beta<1$ , the reinforcement at time n associated to observation $\xi_{m\,i}$ , with $m=1,\dots,\,n$ , increases exponentially with m (we refer again to (13) below), leaving the fluctuations to be driven by the most recent draws. We refer to this feature as ‘local’ reinforcement. The case $\beta=0$ is an extreme case where $\psi_{n\,i}$ depends only on the last draw $\xi_{n\,i}$ (and not on $\xi_{m\,i}$ with $m=1,\dots,\,n-1$ ). Hence, we are mainly interested in the case $\beta\in [0,1)$ , because in this case the RP urn exhibits the following distinctive characteristics:

  • (a) for each i, the process $(\psi_{n\,i})_n$ randomly fluctuates, driven by the most recent observations (‘local’ reinforcement), and does not converge almost surely;

  • (b) for each i, the empirical mean $\bar{\xi}_{N\,i} = \sum_{n=1}^N\xi_{n\,i}/N$, that is, the empirical frequency $O_i/N$, converges almost surely to the deterministic limit $p_i=p_{0\,i}$;

  • (c) the chi-squared statistics (1) is asymptotically distributed as $\chi^2(k-1)\lambda$ with $\lambda>1$.

As stated before, owing to (a), the usual methods adopted in the urn literature do not work for $\beta < 1$ , and so different techniques are needed for the study of the RP urn model.

We have also considered the asymptotic results for $\beta>1$, to complete the study of the RP urn model. In this situation, the process $(\psi_{n\,i})_n$ converges exponentially fast to a random limit, and so even faster than in the classical Eggenberger–Pólya urn. Therefore, in this case, we may apply the usual martingale technique (e.g. [1, 9, 37]). Note that at $\beta=1$ we have a phase transition: for the empirical means, we pass from a deterministic limit (the case $\beta<1$) to a random limit (the case $\beta\geq 1$), and for the predictive means, we pass from fluctuations (the case $\beta<1$) to almost sure convergence (the case $\beta\geq 1$).

In Figure 1 we show the properties (a) and (b) for $\beta = 0$ and $\beta \in (0, 1)$ (Figure 1(A) and Figure 1(B), respectively) compared with the classical behavior of the processes for $\beta = 1$ and $\beta > 1$ (Figure 1(C) and Figure 1(D), respectively).

Figure 1. Simulations of the two processes $(\psi_{n\, 1})_n$ (grey) and $(\bar{\xi}_{n\,1} )_n$ (black), with $n = 1, \ldots, 20000$ , $p_{0\,1} = \frac{1}{2}$ , and for different values of $\alpha$ and $\beta$ : (A) $\alpha = 199$ , $\beta = 0$ ; (B) $\alpha = 1$ , $\beta = 0.975$ ; (C) $\alpha = 1$ , $\beta = 1$ ; (D) $\alpha = 0.5$ , $\beta = 1.0001$ . As shown, when $\beta <1$ , $(\psi_{n\, 1})_n$ exhibits a persistent fluctuation, locally reinforced, and $(\bar{\xi}_{n\,1} )_n$ converges to the deterministic limit $p_{0\,1}$ . When $\beta \geq 1$ , the y-axis is zoomed to show the random fluctuations of both the processes towards the same random limit.

Goodness-of-fit result and its statistical application

Given a sample $(\boldsymbol{\xi_{\textbf{1}}}, \ldots, \boldsymbol{\xi_{\textbf{N}}})$ (where $\boldsymbol{\xi}_{\textbf{n}}$ denotes the random vector with components $\xi_{n\,i}$ , $i=1,\dots,\,k$ ) generated by an RP urn, the statistics

\begin{equation*}O_i = \#\{n \,:\, \xi_{n\,i}=1\}=\sum_{n=1}^N \xi_{n\,i},\, \qquad i = 1, \ldots, k,\end{equation*}

counts the number of times we have observed the value i. When $\beta\in[0,1)$ , the theorem below states the almost sure convergence of the empirical mean $\overline{\xi}_{N\,i}=O_i/N$ toward the probability $p_{0\,i}$ , together with a chi-squared goodness-of-fit result for the long-term probabilities $p_{0\,1},\dots,\, p_{0\,k}$ . More precisely, we prove the following result.

Theorem 1. Assume $p_{0\,i}>0$ for all $i=1,\dots,\,k$ and $\beta \in[0,1)$ . Define the constants $\gamma$ and $\lambda$ as

(5) \begin{equation}\gamma = \beta + (1-\beta) \frac{\alpha}{(1-\beta)\sum_{i=1}^k b_{0\,i} + \alpha}\in (\beta,1)\qquad\text{and}\end{equation}
(6) \begin{equation}\lambda = \frac{(1-\beta)^2}{(\gamma-\beta)^2 + (1-\gamma^2)}\bigg( 1 + 2\frac{\gamma}{1-\gamma}\bigg)>1.\end{equation}

Then $O_i/N\stackrel{a.s.}\longrightarrow p_{0\,i}$ and

\begin{equation*}\sum_{i=1}^{k} \frac{(O_{i} - N p_{0\,i})^2}{Np_{0\,i}}\mathop{\longrightarrow}^{d}_{N\to\infty}W_{*}= \lambda W_{0},\end{equation*}

where $W_{0}$ has distribution $\chi^2(k-1)=\Gamma\!\left(\frac{(k-1)}{2},\frac{1}{2} \right)$ , and consequently $W_{*}$ has distribution $\Gamma\!\left(\frac{k-1}{2},\frac{1}{2\lambda}\right)$ .

From a mathematical point of view, the proof of the above result easily follows from Theorem 2, whose proof occupies the main part of the present work.
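
In practice, Theorem 1 is used as follows: given $\alpha$, $\beta$, and $\textbf{b}_{\textbf{0}}$, one computes $\gamma$ and $\lambda$ from (5) and (6) and compares the chi-squared distance with the inflated critical value $\lambda\chi^2_{1-\theta}(k-1)/N$. A small computational sketch (the numerical parameter values are hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def gamma_lambda(alpha, beta, b0):
    """Constants (5) and (6) of Theorem 1."""
    b0_sum = float(np.sum(b0))
    gamma = beta + (1 - beta) * alpha / ((1 - beta) * b0_sum + alpha)
    lam = (1 - beta) ** 2 / ((gamma - beta) ** 2 + (1 - gamma ** 2)) \
        * (1 + 2 * gamma / (1 - gamma))
    return gamma, lam

gamma, lam = gamma_lambda(alpha=1.0, beta=0.975, b0=[1.0, 1.0])  # hypothetical parameters
k, theta, N = 2, 0.05, 20_000
print(gamma, lam)
# the chi-squared distance is now compared with lam * chi2_{1-theta}(k-1) / N
print(lam * chi2.ppf(1 - theta, df=k - 1) / N)
```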

A possible application we have in mind was inspired by [11, 41] and is the following. We suppose we have a sample $\{\boldsymbol{\xi}_n\,:\,n=1,\dots,\,N\}$, where the observations cannot be assumed i.i.d., but exhibit a cluster structure, with independence between clusters and correlation inside each cluster. This is a common circumstance in many applications (e.g. [16, 34, 53, 55]). More precisely, we consider the situation in which, inside each cluster, the probability that a certain unit chooses the value i is affected by the number of units in the same cluster that have already chosen the value i, hence according to a reinforcement rule. For example, we can imagine that our dataset collects messages from the online social network Twitter: ‘tweets’ referring to different topics can be placed in different clusters. If the topics are distant from each other, we can assume independence between clusters. Inside each cluster, the tweets are temporally ordered and the associated ‘sentiment’ is observed to be driven by a local reinforcement mechanism: the probability of having a tweet with positive sentiment increases with the number of past tweets with positive sentiment, but the reinforcement is mostly driven by the most recent tweets, leading to fluctuations of the predictive means [4]. A different clustering of the tweets can be obtained from different time slots, sufficiently far apart from each other.

Another example is the following. Each cluster corresponds to an agent. The agents act independently of each other (independence between clusters). At each time-step, each agent has to choose among k brands that are related to a loyalty program: the more often the agent selects the same brand, the more loyalty points he/she gains. This fact induces the reinforcement mechanism, and it could make sense that the reinforcement is mostly driven by the most recent actions. Finally, we can have the case where clusters are associated with some products, and at each time-step a customer gives a vote to each product on an online platform. Each cluster collects the votes for the corresponding product. If the products belong to very different categories, we can assume independence between clusters, while, if the customers can see the votes given by the previous customers, we may have a reinforcement mechanism, mainly based on the last observations.

Formally, we suppose that the N units are ordered so that we have the following L clusters of units:

\begin{equation*} C_{\ell}=\left\{\sum_{l=1}^{\ell-1}N_l+1,\dots,\, \sum_{l=1}^\ell N_l\right\}, \qquad \ell=1,\dots,\,L.\end{equation*}

Therefore, the cardinality of each cluster $C_\ell$ is $N_\ell$ . We assume that the units in different clusters are independent; that is,

\begin{equation*}\big[\boldsymbol{\xi}_{1},\dots,\,\boldsymbol{\xi}_{N_1}\big],\,\dots\,,\Big[\boldsymbol{\xi}_{\sum_{l=1}^{\ell-1}N_l+1},\dots,\,\boldsymbol{\xi}_{\sum_{l=1}^{\ell}N_l}\Big],\,\dots\,,\Big[\boldsymbol{\xi}_{\sum_{l=1}^{L-1}N_l+1},\dots,\,\boldsymbol{\xi}_{N}\Big]\end{equation*}

are L independent multidimensional random variables. Moreover, we assume that the observations inside each cluster can be modeled as an RP urn with $\beta\in [0,1)$ . We denote by $ p_{0\,1}(\ell),\dots,\,p_{0\,k}(\ell)$ the intrinsic long-run probabilities for the cluster $C_\ell$ , which we assume strictly positive, and we assume the same parameter $\lambda>1$ for each cluster (but not necessarily the same parameters $\alpha$ and $\beta$ ), so that all of the L random variables

\begin{equation*}Q_\ell=\sum_{i=1}^{k} \frac{\big(O_{i}(\ell) - N_\ell p_{0\,i}(\ell)\big)^2}{N_\ell p_{0\,i}(\ell)}, \quad \text{with }O_i(\ell)= \#\big\{n\in C_\ell \,:\, \xi_{n\,i}=1\big\},\end{equation*}

are asymptotically distributed as $\Gamma\big(\frac{k-1}{2},\frac{1}{2\lambda}\big)$ . Since $Q_1,\dots,\,Q_L$ are independent because they refer to different clusters, when all the cluster sizes $N_\ell$ are large, we can estimate the parameter $\lambda$ by means of the (asymptotic) maximum likelihood and obtain

\begin{equation*}\widehat{\lambda}=\frac{\sum_{\ell=1}^L Q_\ell}{L(k-1)}\stackrel{d}\sim\Gamma\bigg(\frac{L(k-1)}{2}, \frac{L(k-1)}{2\lambda}\bigg).\end{equation*}

Note that $E[\widehat{\lambda}]=\lambda$ ; that is, the estimator is unbiased. Moreover, $\widehat{\lambda}/\lambda$ has asymptotic distribution $\Gamma\!\left(\frac{L(k-1)}{2}, \frac{L(k-1)}{2}\right)$ (that does not depend on $\lambda$ ), and so it can be used to construct asymptotic confidence intervals for $\lambda$ . Moreover, given certain (strictly positive) intrinsic probabilities $p_{0\,1}^*(\ell),\dots,\,p_{0\,k}^*(\ell)$ for each cluster $C_\ell$ , we can use the above procedure with $p_{0\,i}(\ell)=p_{0\,i}^*(\ell)$ for $i=1,\dots,\, k$ and $\ell=1,\dots,\,L$ to obtain an estimate $\widehat{\lambda}^*$ of $\lambda$ , and then use the statistics $Q_\ell$ with $p_{0\,i}(\ell)=p_{0\,i}^*(\ell)$ and the corresponding asymptotic distribution $\Gamma\!\left(\frac{k-1}{2},\frac{1}{2\widehat{\lambda}^*}\right)$ to perform a $\chi^2$ -test with null hypothesis

\begin{equation*}H_0:\quad p_{0\,i}(\ell)=p_{0\,i}^*(\ell)\qquad\forall i=1,\dots,\,k.\end{equation*}
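
A compact implementation sketch of this clustered procedure is given below (illustrative only; the function and variable names are ours, and the data are hypothetical). It computes the statistics $Q_\ell$ for given hypothesised probabilities $p_{0\,i}^*(\ell)$, the estimate $\widehat{\lambda}^*$, and the corresponding asymptotic p-values from the $\Gamma\big(\frac{k-1}{2},\frac{1}{2\widehat{\lambda}^*}\big)$ distribution (parametrized in scipy by shape $(k-1)/2$ and scale $2\widehat{\lambda}^*$).

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def cluster_chi2_test(O, p_star):
    """O: (L, k) observed counts per cluster; p_star: (L, k) hypothesised probabilities.

    Returns the estimate of lambda, the statistics Q_ell, and their asymptotic
    p-values under H0, using the Gamma((k-1)/2, scale = 2*lambda_hat) distribution."""
    O = np.asarray(O, dtype=float)
    p_star = np.asarray(p_star, dtype=float)
    L, k = O.shape
    N_ell = O.sum(axis=1, keepdims=True)
    Q = ((O - N_ell * p_star) ** 2 / (N_ell * p_star)).sum(axis=1)
    lam_hat = Q.sum() / (L * (k - 1))        # asymptotic maximum-likelihood estimate
    pvals = gamma_dist.sf(Q, a=(k - 1) / 2, scale=2 * lam_hat)
    return lam_hat, Q, pvals

# hypothetical data: L = 3 clusters, k = 2 values, testing p* = (1/2, 1/2) in each cluster
O = np.array([[520, 480], [1030, 970], [255, 245]])
p_star = np.full((3, 2), 0.5)
print(cluster_chi2_test(O, p_star))
```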

Regarding the probabilities $p_{0\,i}^*(\ell)$ , some possibilities are as follows:

  • we can take $p_{0\,i}^*(\ell)=1/k$ for all $i=1,\dots,\,k$ if we want to test possible differences in the probabilities for the k different values;

  • we can suppose that we have two different periods of time, and so two samples, say $\big\{\boldsymbol{\xi^{(1)}_{\textbf{n}}}\,:\,n=1,\dots,\,N\big\}$ and $\big\{\boldsymbol{\xi^{(2)}_{\textbf{n}}}\,:\,n=1,\dots,\,N\big\}$ ; take $p_{0\,i}^*(\ell)=\sum_{n\in C_\ell}\xi^{(1)}_{n\,i}/N_\ell$ for all $i=1,\dots,\, k$ ; and perform the test on the second sample in order to check for possible changes in the intrinsic long-run probabilities;

  • we can take one of the clusters as a benchmark, say $\ell^*$ ; set $p_{0\,i}^*(\ell)=\sum_{n\in C_{\ell^*}}\xi_{n\,i}/N_{\ell^*}$ for all $i=1,\dots,\, k$ and $\ell\neq\ell^*$ ; and perform the test for the other $L-1$ clusters in order to check differences with the benchmark cluster $\ell^*$ .

Structure of the paper

Summing up, the rest of the paper is structured as follows. In Section 2 we set up our notation and formally define the RP urn model with parameters $\alpha>0$ and $\beta\geq 0$. In Section 3 we provide a complete characterization of the RP urn for the three cases $\beta=0$, $\beta\in (0, 1)$, and $\beta>1$. (We do not deal with the case $\beta=1$ because, as stated before, it coincides with the standard Eggenberger–Pólya urn, whose properties are well known.) In particular, we show that, for each i, the empirical mean of the $\xi_{n\,i}$ almost surely converges to the intrinsic probability $p_{0\,i}$ when $\beta\in [0,1)$, while it almost surely converges to a random limit when $\beta> 1$. We also obtain the corresponding central limit theorems (CLTs), which, in particular for $\beta\in [0,1)$, are the basis for the proof of Theorem 1. For completeness, we also describe the case $\alpha = 0$, which generates a sequence of independent draws. Section 4 contains the proof of Theorem 1, which makes it possible to construct a chi-squared test for the intrinsic long-run probabilities when the observed sample is assumed to be generated by an RP urn with $\beta\in [0,1)$. Finally, the paper contains an appendix: in Section A.1 we state and prove a general CLT for Markov chains with a compact state space $S\subset\mathbb{R}^k$, under a certain condition which we call the ‘linearity’ condition, and in Section A.2 we explain a fundamental coupling technique used in the proof of the CLT for $\beta\in (0, 1)$.

2. The rescaled Pólya urn model

Throughout the sequel (unless otherwise specified) we suppose given two parameters $\alpha>0$ and $\beta \geq 0$. Given a vector $\textbf{x}=(x_1, \ldots, x_k)^\top\in \mathbb{R}^k$, we set $ |\textbf{x}| =\sum_{i=1}^k |x_i| $ and $ \|\textbf{x}\|^2 = \textbf{x}^\top \textbf{x}=\sum_{i=1}^k |x_i|^2 $. Moreover, we denote by $\textbf{1}$ and $\textbf{0}$ the vectors with all components equal to 1 and to 0, respectively, and by $\{\textbf{e}_{\textbf{1}}, \ldots, \textbf{e}_{\textbf{k}}\}$ the canonical basis of $\mathbb{R}^k$.

To work formally with the RP urn model presented in the introduction, we add here some notation. Throughout the sequel, the expression number of balls is not to be understood literally; all the quantities are real numbers, not necessarily integers. The urn initially contains $b_{0\,i}+B_{0\,i}>0$ distinct balls of color i, with $i=1,\dots,\,k$. We set $\textbf{b}_{\textbf{0}}=(b_{0\,1},\dots,\,b_{0\,k})^{\top}$ and $\textbf{B}_{\textbf{0}}=(B_{0\,1},\dots,\,B_{0\,k})^{\top}$. Throughout the sequel (unless otherwise specified) we assume $|\textbf{b}_{\textbf{0}}|>0$. Consistently with (4), we set $\textbf{p}_{\textbf{0}} =\frac{\textbf{b}_{\textbf{0}}}{|\textbf{b}_{\textbf{0}}|}$. At each discrete time $(n+1)\geq 1$, a ball is drawn at random from the urn, producing the random vector $\boldsymbol{\xi_{\textbf{n}+1}} = (\xi_{n+1\,1}, \ldots, \xi_{n+1\,k})^\top$ defined as

\begin{equation*}\xi_{n+1\,i} =\begin{cases}1 & \text{when the ball extracted at time $n+1$ is of color i,}\\0 & \text{otherwise},\end{cases}\end{equation*}

and the number of balls in the urn is updated as follows:

(7) \begin{equation}{\textbf{N}_{\textbf{n}+\textbf{1}}}=\textbf{b}_{\textbf{0}}+{\textbf{B}_{\textbf{n}+\textbf{1}}}\qquad\text{with}\qquad{\textbf{B}_{\textbf{n}+\textbf{1}}} = \beta \textbf{B}_{\textbf{n}} + \alpha \boldsymbol{\xi_{\textbf{n}+1}}\,,\end{equation}

which gives (since $|\boldsymbol{\xi}_{\textbf{n}+\textbf{1}}|=1$ )

(8) \begin{equation}|{\textbf{B}_{\textbf{n}+\textbf{1}}}|= \beta |\textbf{B}_{\textbf{n}}| + \alpha.\end{equation}

Therefore, setting $r^*_{n} = |{\textbf{N}_{\textbf{n}}}|= |\textbf{b}_{\textbf{0}}|+|\textbf{B}_{\textbf{n}}|$ , we get

(9) \begin{equation}r_{n+1}^*=r_n^*+(\beta-1)|\textbf{B}_{\textbf{n}}|+\alpha.\end{equation}

Moreover, setting $\mathcal{F}_0$ equal to the trivial $\sigma$ -field and $\mathcal{F}_n=\sigma(\boldsymbol{\xi_{\textbf{1}}},\dots,\,\boldsymbol{\xi}_{\textbf{n}})$ for $n\geq 1$ , the conditional probabilities $\boldsymbol{\psi}_{\!\textbf{n}}= (\psi_{n\,1}, \ldots,\psi_{n\,k})^\top$ of the extraction process, also called predictive means, are

(10) \begin{equation}\boldsymbol{\psi}_{\!\textbf{n}}=E[ \boldsymbol{\xi_{\textbf{n}+1}}| \mathcal{F}_n ]=\frac{{\textbf{N}_{\textbf{n}}}}{|{\textbf{N}_{\textbf{n}}}|} =\frac{\textbf{b}_{\textbf{0}}+\textbf{B}_{\textbf{n}}}{r_n^*}\qquad\mbox{for } n\geq 0.\end{equation}

It is obvious that we have $|\boldsymbol{\psi}_{\!\textbf{n}}|=1$ . Finally, for the sequel, we set ${\overline{\boldsymbol{\xi}}_{\textbf{N}}}=\sum_{n=1}^N\boldsymbol{\xi}_{\textbf{n}}/N$ .

We note that, by means of (10), together with (7) and (9), we have

(11) \begin{equation}\boldsymbol{\psi}_{\!\textbf{n}}-\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}=-\frac{(1-\beta)|\textbf{b}_{\textbf{0}}|}{r_{n}^*}\big(\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}-\textbf{p}_{\textbf{0}}\big)+\frac{\alpha}{r_{n}^*}\big(\boldsymbol{\xi}_{\textbf{n}}-\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}\big).\end{equation}

As previously stated, the RP urn for $\beta=1$ coincides with the well-known standard Eggenberger–Pólya urn, and so we will exclude it from the following analyses. When $\beta\neq 1$, because of the first term on the right-hand side of the above relation, the RP urn does not belong to the class of reinforced stochastic processes (RSPs) studied in [1, 2, 3, 20, 21, 23]. Generally speaking, by reinforcement in a stochastic dynamics we mean any mechanism for which the probability that a given event occurs, i.e. the predictive mean, has an increasing dependence on the number of times that the same event occurred in the past. This ‘reinforcement mechanism’, also known as a ‘preferential attachment rule’, ‘rich get richer rule’, or ‘Matthew effect’, is a key feature governing the dynamics of many biological, economic, and social systems (see e.g. [45]). The RSPs are characterized by a ‘strict’ reinforcement mechanism such that, at each time-step, we have a strictly positive increment of the predictive mean associated with the extracted color. As an immediate consequence, the ‘general’ reinforcement mechanism is satisfied; that is, the predictive mean for a given color has an increasing dependence on the number of past extractions of that color. When $\beta\neq 1$, the RP urn model does not satisfy the ‘strict’ reinforcement mechanism, because the first term on the right-hand side of (11) is positive or negative according to the sign of $(1-\beta)$ and of $\big(\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}-\textbf{p}_{\textbf{0}}\big)$. However, when $\alpha,\,\beta>0$, it satisfies the general reinforcement mechanism. Indeed, by (7), (8), (9), and (10), using $\sum_{m=0}^{n-1}x^m=(1-x^{n})/(1-x)$, we have

(12) \begin{equation}r_n^*=|\textbf{b}_{\textbf{0}}|+\frac{\alpha }{1-\beta}+\beta^n\!\left(|\textbf{B}_{\textbf{0}}|- \frac{\alpha }{1-\beta}\right)\end{equation}

and

(13) \begin{equation}\boldsymbol{\psi}_{\!\textbf{n}} =\frac{\textbf{b}_{\textbf{0}}+ \beta^n\textbf{B}_{\textbf{0}} + \alpha\! \sum_{m=1}^n \beta^{n-m} \boldsymbol{\xi_{\textbf{m}}}}{|\textbf{b}_{\textbf{0}}|+\frac{\alpha }{1-\beta}+\beta^n\Big(|\textbf{B}_{\textbf{0}}|- \frac{\alpha }{1-\beta}\Big)}=\frac{\beta^{-n}\textbf{b}_{\textbf{0}}+ \textbf{B}_{\textbf{0}} + \alpha\! \sum_{m=1}^n \beta^{-m} \boldsymbol{\xi_{\textbf{m}}}}{\beta^{-n}\!\left(|\textbf{b}_{\textbf{0}}|+\frac{\alpha }{1-\beta}\right)+|\textbf{B}_{\textbf{0}}|- \frac{\alpha }{1-\beta}}.\end{equation}

In particular, for $\beta>1$, the dependence of $\boldsymbol{\psi}_{\!\textbf{n}}$ on $\boldsymbol{\xi_{\textbf{m}}}$ exponentially decreases with m, because of the factor $\beta^{-m}$. For $\beta<1$ we have the opposite behavior; that is, the dependence of $\boldsymbol{\psi}_{\!\textbf{n}}$ on $\boldsymbol{\xi_{\textbf{m}}}$ exponentially increases with m, because of the factor $\beta^{n-m}$, and so the main contribution is given by the most recent extractions. We refer to this phenomenon as ‘local’ reinforcement. The case $\beta=0$ is an extreme case, for which $\boldsymbol{\psi}_{\!\textbf{n}}$ depends only on the last extraction $\boldsymbol{\xi}_{\textbf{n}}$: at each time-step $n+1\geq 2$ we extract a ball from an urn with $b_{0\,i}+\alpha$ balls of color i, if i is the color extracted at time n, and $b_{0\,j}$ balls for each color $j\neq i$. This particular case corresponds to a version of the so-called ‘memory-1 senile reinforced random walk’ on a star-shaped graph introduced in [33], but the study done in that paper differs from ours. Finally, we observe that Equation (11) recalls the dynamics of an RSP with a ‘forcing input’ (see [1, 20, 50]), but the main difference lies in the fact that, for the RP urn, the sequence $\big(r_n^*\big)$ is such that $r_n^*\to r^*>0$, and so $\sum_n1/r_n^*=+\infty$ and $\sum_n 1/\big(r_n^*\big)^2=+\infty$, when $\beta\in[0,1)$, and such that $\sum_n 1/r_n^*<+\infty$ (and $\sum_n 1/\big(r_n^*\big)^2<+\infty$) when $\beta>1$. These facts lead to different asymptotic behavior for $(\boldsymbol{\psi}_{\!\textbf{n}})$. Specifically, for the RP urn with $\beta\in [0,1)$, the predictive mean $\boldsymbol{\psi}_{\!\textbf{n}}$ randomly fluctuates and does not converge almost surely, while, for the RP urn with $\beta>1$, the sequence $(\boldsymbol{\psi}_{\!\textbf{n}})$ almost surely converges to a random variable $\boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ and $|\boldsymbol{\psi}_{\!\textbf{n}}-\boldsymbol{\psi}_{\!\boldsymbol{\infty}}|=O(\beta^{-n})$. By contrast, for the RSP with a ‘forcing input’, the almost sure convergence of $\boldsymbol{\psi}_{\!\textbf{n}}$ toward the forcing input (which is a constant) holds true, and the corresponding rate of convergence depends on a model parameter $\gamma\in (1/2,1]$ and equals $n^{-\gamma/2}$.
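
The closed form (13) can be checked numerically against the recursive update (7); the short sketch below (hypothetical parameter values, for illustration only) also makes the ‘local’ reinforcement visible, since for $\beta<1$ the draw $\boldsymbol{\xi_{\textbf{m}}}$ enters $\boldsymbol{\psi}_{\!\textbf{n}}$ with weight proportional to $\beta^{n-m}$.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 1.0, 0.9                       # hypothetical parameters, beta in (0, 1)
b0 = np.array([1.0, 2.0])
B0 = np.array([0.5, 0.5])
n = 30

# a fixed sequence of draws xi_1, ..., xi_n (one-hot vectors)
xi = np.eye(2)[rng.integers(0, 2, size=n)]

# recursion (7): B_m = beta * B_{m-1} + alpha * xi_m, and psi_n = (b0 + B_n) / |b0 + B_n|
B = B0.copy()
for m in range(n):
    B = beta * B + alpha * xi[m]
psi_rec = (b0 + B) / (b0 + B).sum()

# closed form (13): the weight beta^(n-m) of xi_m increases with m when beta < 1
num = b0 + beta**n * B0 + alpha * sum(beta**(n - m) * xi[m - 1] for m in range(1, n + 1))
den = b0.sum() + alpha / (1 - beta) + beta**n * (B0.sum() - alpha / (1 - beta))
print(psi_rec, num / den)                    # the two expressions coincide
```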

3. Properties of the rescaled Pólya urn model

We study separately the three cases $\beta=0$ , $\beta\in (0, 1)$ , and $\beta>1$ . In particular, the case $\beta=0$ is placed at the end of this section, because it is elementary.

3.1. The case $\beta \in (0, 1)$

In this case, we have $\lim_n\beta^n=0$ and $\sum_{n\geq 1}\beta^n=\beta/(1-\beta)$ . Therefore, setting $r=\tfrac{\alpha}{1-\beta}$ and $r^*=|\textbf{b}_{\textbf{0}}|+r$ , we have by (8) and (9)

(14) \begin{equation}r_n^*=|\textbf{b}_{\textbf{0}}|+|\textbf{B}_{\textbf{n}}|=|\textbf{b}_{\textbf{0}}|+r + \beta^n ( |\boldsymbol{\textbf{B}_{0}}| - r)\longrightarrow r^*>0,\end{equation}

and so we have that the denominator $r^*_n$ in $\boldsymbol{\psi}_{\!\textbf{n}}$ (see Equation (10)) goes exponentially fast to the limit $r^*$ . Moreover, recalling the definition of the constant $\gamma$ in (5), we have $\beta < \gamma < 1$ (remember that $|\textbf{b}_{\textbf{0}}|>0$ by assumption),

(15) \begin{equation}\gamma-\beta=\frac{\alpha}{r^*}, \qquad\mbox{and}\qquad1-\gamma=\frac{(1-\beta)|\textbf{b}_{\textbf{0}}|}{r^*}.\end{equation}

Therefore, by (14), the terms $\frac{(1-\beta)|\textbf{b}_{\textbf{0}}|}{r_{n}^*}$ and $\frac{\alpha}{r_{n}^*}$ in the dynamics (11) converge exponentially fast to $(1-\gamma)$ and $(\gamma-\beta)$, respectively. Furthermore, as we will see, the fact that the constant $\gamma$ is strictly smaller than 1 will play a central rôle, because it will imply a contraction property of the process $\boldsymbol{\psi}=(\boldsymbol{\psi}_{\!\textbf{n}})_n$ in a suitable metric space, with (sharp) constant $\gamma$. Consequently, it is not surprising that this constant enters naturally into the parameters of the asymptotic distribution, given in the following result.

Theorem 2. We have ${\overline{\boldsymbol{\xi}}_{\textbf{N}}} \mathop{\longrightarrow}\limits^{a.s.}\textbf{p}_{\textbf{0}}$ and

\begin{equation*}{\sqrt{N}} \big({\overline{\boldsymbol{\xi}}_{\textbf{N}}} - \textbf{p}_{\textbf{0}}\big)=\frac{\sum_{n=1}^N\! \big(\boldsymbol{\xi}_{\textbf{n}} - \textbf{p}_{\textbf{0}}\big)}{\sqrt{N}}\mathop{\longrightarrow}^{d}\mathcal{N} \big(\textbf{0},\Sigma^2\big),\end{equation*}

where

\begin{equation*}\Sigma^2 = \lambda\big( \mathrm{diag} (\textbf{p}_{\textbf{0}}) - \textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^T\big),\end{equation*}

with $\lambda$ defined in (6) as a function of $\beta$ and $\gamma$ .

Remark 1. Note that in Theorem 2 we do not assume $b_{0\,i}>0$ for all i, but only $|\textbf{b}_{\textbf{0}}|>0$ (as stated in Section 2). The asymptotic result given above fails when $\textbf{b}_{\textbf{0}} = \textbf{0}$ , and a different behavior is observed. Indeed, from (11), we have

(16) \begin{equation}\boldsymbol{\psi}_{\!\textbf{n}}-\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}=\frac{\alpha}{r_{n}^*}\big(\boldsymbol{\xi}_{\textbf{n}}-\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}\big),\end{equation}

and hence $(\boldsymbol{\psi}_{\!\textbf{n}})$ is a martingale. Therefore the bounded process $(\boldsymbol{\psi}_{\!\textbf{n}})$ converges almost surely (and in mean) to a bounded random variable $\boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ . In addition, since $r_{n}^* \to r=\alpha/(1-\beta)$ , from (16), we obtain that the unique possible limits ${\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}$ are those for which $\boldsymbol{\xi}_{\textbf{n}} = {\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}$ eventually. Hence $\boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ takes values in $\{\textbf{e}_{\textbf{1}},\dots,\,\textbf{e}_{\textbf{k}}\}$ , and since we have $E[{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}] = E[\boldsymbol{\psi_{0}}] =\textbf{B}_{\textbf{0}}/{|\textbf{B}_{\textbf{0}}|}$ , we get $P( {\boldsymbol{\psi}_{\!\boldsymbol{\infty}}} = \textbf{e}_{\textbf{i}} ) =\frac{{B_{0\,i}}}{ |\textbf{B}_{\textbf{0}}|}$ for all $i=1,\dots,\,k$ .

We will split the proof of Theorem 2 into two main steps: first, we will prove that the convergence behavior of ${\overline{\boldsymbol{\xi}}_{\textbf{N}}}$ does not depend on the initial constant $|\textbf{B}_0|$ , and then, without loss of generality, we will assume $|\textbf{B}_0|=r$ and we will give the proof of the theorem under this assumption.

3.1.1. Independence of the asymptotic properties of the empirical mean from $|\textbf{B}_{\textbf{0}}|$

We use a coupling method to prove that the convergence results stated in Theorem 2 are not affected by the value of the initial constant $|\textbf{B}_{\textbf{0}}|$ .

Set $\boldsymbol{\xi^{(1)}_{\textbf{n}}} =\boldsymbol{\xi}_{\textbf{n}} $ and $\boldsymbol{\psi^{(1)}_{\textbf{n}}} =\boldsymbol{\psi}_{\!\textbf{n}} $ , which follows the dynamics (11), together with (10), starting from a certain initial point ${\boldsymbol{\psi}_{\!\textbf{0}}^{(\textbf{1})}}=\boldsymbol{\psi}_{\!\textbf{0}}$ . By (7) and the relations (15), we can write

\begin{equation*}\begin{aligned}{\boldsymbol{\psi}^{(\textbf{1})}_{\textbf{n}+\textbf{1}}}&=\frac{\textbf{b}_{\textbf{0}} + \beta \textbf{B}_{\textbf{n}} + \alpha \boldsymbol{\xi^{(1)}_{\textbf{n}+1}} }{r^*_{n+1}}=\frac{\textbf{b}_{\textbf{0}} + \beta \Big( r^*_n\boldsymbol{\psi^{(1)}_{\textbf{n}}} - \textbf{b}_{\textbf{0}} \Big) +\alpha \boldsymbol{\xi^{(1)}_{\textbf{n}+1}} }{r^*_{n+1}}\\ &=\beta \boldsymbol{\psi^{(1)}_{\textbf{n}}}\frac{r^*_n}{r^*_{n+1}}+\frac{(1-\beta)\textbf{b}_{\textbf{0}}}{r^*_{n+1}}+\frac{\alpha}{r^*_{n+1}}\boldsymbol{\xi^{(1)}_{\textbf{n}+1}}\\ &= \beta \boldsymbol{\psi^{(1)}_{\textbf{n}}} + (\gamma - \beta) \boldsymbol{\xi^{(1)}_{\textbf{n}+1}}+ {\textbf{l}_{\textbf{n}+\textbf{1}}^{(\textbf{1})}}\Big(\boldsymbol{\psi^{(1)}_{\textbf{n}}},\boldsymbol{\xi^{(1)}_{\textbf{n}+1}}\Big)+(1-\gamma)\textbf{p}_{\textbf{0}} ,\end{aligned}\end{equation*}

where

\begin{equation*}{\textbf{l}_{\textbf{n}+\textbf{1}}^{(\textbf{1})}}(\textbf{x},\textbf{y})=\bigg(\frac{r^*}{r^*_{n+1}}-1\bigg) \big[ (1-\gamma)\textbf{p}_{\textbf{0}}+ (\gamma-\beta) \textbf{y} \big]+\bigg(\frac{r^*_n}{r^*_{n+1}}-1\bigg) \beta\textbf{x}.\end{equation*}

Since, by (14), we have $r^*/r_{n+1}^*-1=O\big(\beta^{n+1}\big)$ and $r_{n}^*/r_{n+1}^* -1 =O\big(\beta^{n+1}\big)$ , we get $|{\textbf{l}_{\textbf{n}+\textbf{1}}^{(\textbf{1})}}|=O\big(\beta^{n+1}\big)$ . Now, take $\boldsymbol{\xi^{(2)}}=\big(\boldsymbol{\xi^{(2)}_{\textbf{n}}}\big)_n$ and $\boldsymbol{\psi^{(2)}} =\big(\boldsymbol{\psi^{(2)}_{\textbf{n}}} \big)_n$ following the same dynamics given in (10) and (11), but starting from an initial point with $|\boldsymbol{\textbf{B}_0^{(2)}}|=r$ . Therefore, we have

\begin{equation*}\boldsymbol{\psi^{(2)}_{\textbf{n}+1}}=\beta\boldsymbol{\psi^{(2)}_{\textbf{n}}}+(\gamma - \beta)\boldsymbol{\xi^{(2)}_{\textbf{n}+1}}+(1-\gamma)\textbf{p}_{\textbf{0}}.\end{equation*}

Both dynamics are of the form (A.9) with $a_0 = \beta$ , $a_1 = (\gamma-\beta)$ , $\textbf{c}= (1-\gamma)\textbf{p}_{\textbf{0}}$ , $c_n^{(1)}=\beta^{n}$ , and $c_{n}^{(2)}=0$ , and, by (10), the condition (A.10) holds true. Hence we can apply Theorem 10 so that there exist two stochastic processes ${\widetilde{\boldsymbol\psi}^{(1)}}$ and ${\widetilde{\boldsymbol\psi}^{(2)}}$ , following the dynamics (A.11) (with the same specifications as above), together with (A.12), starting from the same initial points and such that (A.14) holds true; that is,

\begin{equation*}\begin{aligned}E\Big[ \Big|{\widetilde{\boldsymbol\psi}^{(1)}_{\textbf{n}+1}}-{\widetilde{\boldsymbol\psi}^{(2)}_{\textbf{n}+1}} \Big|\Big| {\widetilde{\boldsymbol\psi}^{(2)}_{0}}, {\widetilde{\boldsymbol\psi}^{(1)}_{0}}\Big]& \leq \gamma^{n+1}\big|{\widetilde{\boldsymbol\psi}^{(1)}_{0}}-{\widetilde{\boldsymbol\psi}^{(2)}_{0}}\big| + O\Bigg( \sum_{j=1}^{n+1} \gamma^{n+1-j} \beta^{j}\Bigg)\\ & =O\big((n+2) \max\!( \gamma , \beta )^{n+1}\big)=O\big((n+2)\gamma^{n+1}\big) .\end{aligned}\end{equation*}

Since $\gamma<1$ , if we subtract (A.11) with $\ell=2$ from (A.11) with $\ell=1$ , we obtain that

\begin{equation*}\sum_{n=1}^{+\infty} E \Big[ \Big|{\widetilde{\boldsymbol\xi}^{(1)}_{\textbf{n}}} -{\widetilde{\boldsymbol\xi}^{(2)}_{\textbf{n}}} \Big|\Big] < +\infty ,\end{equation*}

which implies $\sum_{n=0}^{+\infty}\Big|{\widetilde{\boldsymbol\xi}^{(1)}_{\textbf{n}}} -{\widetilde{\boldsymbol\xi}^{(2)}_{\textbf{n}}} \Big| <+\infty$ almost surely. Recall that ${\widetilde{\boldsymbol\xi}^{(1)}_{\textbf{n}}}\in \{0,1\} $ and $ {\widetilde{\boldsymbol\xi}^{(2)}_{\textbf{n}}}\in \{0,1\}$ , and hence

(17) \begin{equation}{\widetilde{\boldsymbol\xi}^{(1)}_{\textbf{n}}} ={\widetilde{\boldsymbol\xi}^{(2)}_{\textbf{n}}}\qquad\text{eventually (i.e. for sufficiently large n)}.\end{equation}

Therefore, if we prove some asymptotic results for $\boldsymbol{\xi^{(2)}}$ , then they hold true also for ${\widetilde{\boldsymbol\xi}^{(2)}}$ (since they have the same joint distribution); then they hold true also for ${\widetilde{\boldsymbol\xi}^{(1)}}$ (since (17)), and finally they hold true also for $\boldsymbol{{\xi}^{(1)}}$ (since they have the same joint distribution). Summing up, without loss of generality, we may prove Theorem 2 under the additional assumption $|\textbf{B}_{\textbf{0}}| = r$ .

3.1.2. The case $|\textbf{B}_{\boldsymbol{0}}|=r$

Thanks to what we have observed in the previous subsection, we here assume that $|\textbf{B}_{\textbf{0}}|=r=\alpha/(1-\beta)$ , which implies $|\textbf{B}_{\textbf{n}}|=r$ and

(18) \begin{equation}r_n^*=r^*=|\textbf{b}_{\textbf{0}}|+r\end{equation}

for any n. Hence, we can simplify (10) as

\begin{equation*}\psi_{n\,i} = P( \xi_{n+1\,i} = 1| \mathcal{F}_n ) =\frac{b_{0\,i} + B_{n\,i}}{|\textbf{b}_{\textbf{0}}|+ r}=\frac{b_{0\,i} + B_{n\,i}}{r^*}.\end{equation*}

The process $\boldsymbol{\psi}=(\boldsymbol{\psi}_{\!\textbf{n}})_{n\geq 0}$ is then a Markov chain with state space

\begin{equation*}S =\left\{\textbf{x} \,:\,x_i\in\Big[\frac{b_{0\,i}}{r^*},\frac{b_{0\,i}+r}{r^*}\Big] , |\textbf{x}|= 1\right\},\end{equation*}

which, endowed with the distance induced by the norm $|\cdot|$ , is a compact metric space.

In the sequel, according to the context, since we work with a Markov chain with state space $S\subset\mathbb{R}^k$ , the notation P will be used for the following:

  • a kernel $P\,:\, S\times \mathcal{B}(S)\to [0,1]$ , where $\mathcal{B}(S)$ is the Borel $\sigma$ -field on S and we will use the notation $P(\textbf{x},\textbf{dy})$ in the integrals;

  • an operator $P\,:\,C(S) \to M(S)$ , where C(S) and M(S) denote the spaces of the continuous and measurable functions on S, respectively, defined as

    \begin{equation*}(Pf)(\textbf{x}) = \int_S f(\textbf{y}) P(\textbf{x},\textbf{dy}).\end{equation*}
    In addition, when f is the identity map, that is, $f(\textbf{y})=\textbf{y}$ , we will write $(Pid)(\textbf{x})$ or $(P\textbf{y})(\textbf{x})$ .

Moreover, we set $P^0f = f$ and $P^nf = P\big(P^{n-1}f\big)$ .

By (11), together with (15) and (18), the process $\boldsymbol{\psi}=(\boldsymbol{\psi}_{\!\textbf{n}})_n$ follows the dynamics

(19) \begin{equation}\boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}}= \beta \boldsymbol{\psi}_{\!\textbf{n}} +\textbf{b}_{\textbf{0}}\frac{(1-\beta)}{r^*} +\boldsymbol{\xi_{\textbf{n}+1}}\frac{\alpha}{r^*}=\beta \boldsymbol{\psi}_{\!\textbf{n}} +(1-\gamma)\textbf{p}_{\textbf{0}}+(\gamma-\beta)\boldsymbol{\xi_{\textbf{n}+1}}.\end{equation}

Therefore, given $\textbf{z} = (z_1, \ldots, z_k)^T$ and setting

\begin{equation*}\textbf{z}_{(i)} = \Big(z_1, \ldots , z_i+\frac{\alpha}{r^*} , \ldots, z_k\Big)^T=\big(z_1, \ldots , z_i+(\gamma-\beta) , \ldots, z_k\big)^T,\end{equation*}

for any $i=1, \ldots, k$ , we get

(20) \begin{equation}(Pf)(\textbf{x}) =E\big[ f\big(\boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}}\big) | \boldsymbol{\psi}_{\!\textbf{n}}=\textbf{x} \big] =\sum_{i=1}^k x_i f\Big( \big(\beta \textbf{x}+\textbf{p}_{\textbf{0}}(1-\gamma) \big)_{(i)}\Big).\end{equation}

In particular, from the above equality, we get

(21) \begin{equation}(Pid)(\textbf{x})-\textbf{p}_{\textbf{0}} =E\big[ \boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}}-\textbf{p}_{\textbf{0}} | \boldsymbol{\psi}_{\!\textbf{n}}=\textbf{x} \big] =\gamma( \textbf{x}-\textbf{p}_{\textbf{0}} ).\end{equation}
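
As a sanity check (an illustrative sketch, not part of the proof; the parameters are hypothetical), the operator (20) can be implemented directly and the identity (21) verified numerically for a state $\textbf{x}\in S$:

```python
import numpy as np

alpha, beta = 1.0, 0.9                       # hypothetical parameters
b0 = np.array([1.0, 2.0, 1.0])
r = alpha / (1 - beta)
r_star = b0.sum() + r
p0 = b0 / b0.sum()
gamma = beta + alpha / r_star                # by (15): gamma - beta = alpha / r^*

def P(f, x):
    """Transition operator (20): average of f over the k reachable next states."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    base = beta * x + (1 - gamma) * p0
    return sum(x[i] * f(base + (gamma - beta) * np.eye(k)[i]) for i in range(k))

# a state of S (the predictive mean when B_0 = r * e_1, a hypothetical choice)
x = (b0 + r * np.eye(3)[0]) / r_star
print(P(lambda y: y, x) - p0)                # relation (21): equals gamma * (x - p0)
print(gamma * (x - p0))
```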

We now show that $\boldsymbol{\psi}$ is an irreducible, aperiodic, compact Markov chain (see Definitions 3 and 4).

Checking that $\boldsymbol{\psi}$ is a compact Markov chain: By Lemma 1, it is sufficient to show that P defined in (20) is weak Feller (Definition 1) and that it is a semi-contractive operator on Lip(S) (Definition 2). From (20), we have immediately that the function Pf is continuous whenever f is continuous and hence P is weak Feller. In order to prove the contractive property, we start by observing that the dynamics (19) of $\boldsymbol{\psi}$ is of the form (A.9) with $a_0=\beta$ , $a_1=(\gamma-\beta)$ , $\textbf{c}=\textbf{p}_{\textbf{0}}(1-\gamma)$ , and $\textbf{l}_{\textbf{n}}\equiv\textbf{0}$ for each n. Moreover, by (10), the condition (A.10) holds true. Then we let $\boldsymbol{\psi^{(1)}}$ and $\boldsymbol{\psi^{(2)}}$ be two stochastic processes following the dynamics (A.9) with the same specifications as above, together with (A.10), and starting, respectively, from the points $\textbf{x}$ and $\textbf{y}$ . Applying Theorem 10, we get two stochastic processes ${\widetilde{\boldsymbol\psi}^{\boldsymbol{(1)}}}$ and ${\widetilde{\boldsymbol\psi}^{\boldsymbol{(2)}}}$ , evolving according to (A.11), together with (A.12), starting from the same initial points $\textbf{x}$ and $\textbf{y}$ and such that

\begin{equation*}E\Big[\Big| {\widetilde{\boldsymbol\psi}_{1}^{(2)}} - {\widetilde{\boldsymbol\psi}_{1}^{(1)}}\Big|\Big]=E\Big[\Big| {\widetilde{\boldsymbol\psi}_{1}^{(2)}} - {\widetilde{\boldsymbol\psi}_{1}^{(1)}} \Big|\Big| {\widetilde{\boldsymbol\psi}_{0}^{(1)}}= \textbf{x} ,\, {\widetilde{\boldsymbol\psi}_{0}^{(2)}}= \textbf{y} \Big]\leq \gamma \big| \textbf{x}-\textbf{y} \big|.\end{equation*}

Therefore, if we take $f \in Lip(S)$ with

\begin{equation*}|f|_{Lip} = \sup_{\textbf{x},\textbf{y}\in S,\,\textbf{x}\neq \textbf{y}}\frac{|f(\textbf{y})-f(\textbf{x})|}{|\textbf{y}-\textbf{x}|},\end{equation*}

we obtain

\begin{align*}| (Pf)(\textbf{y})-(Pf)(\textbf{x}) | &=\Big|E\Big[f\Big({\widetilde{\boldsymbol\psi}_{1}^{(2)}}\Big)\Big|{\widetilde{\boldsymbol\psi}_{0}^{(2)}}=\textbf{y}\Big]-E\Big[f\Big({\widetilde{\boldsymbol\psi}_{1}^{(1)}}\Big)\Big|{\widetilde{\boldsymbol\psi}_{0}^{(1)}}=\textbf{x}\Big]\Big|\\ &=E\Big[\,\Big|f\Big({\widetilde{\boldsymbol\psi}_{1}^{(2)}}\Big) - f\big({\widetilde{\boldsymbol\psi}_{1}^{(1)}}\big)\Big|\,\Big]\\ & \leq|f|_{Lip} E\Big[\Big| {\widetilde{\boldsymbol\psi}_{1}^{(2)}} - {\widetilde{\boldsymbol\psi}_{1}^{(1)}} \Big|\Big ]\leq |f|_{Lip} \ \gamma \ |\textbf{x}-\textbf{y}|\end{align*}

and so

\begin{equation*}|(Pf)|_{Lip} =\sup_{\textbf{x},\textbf{y}\in S,\,\textbf{x}\neq \textbf{y}}\frac{|(Pf)(\textbf{y})-(Pf)(\textbf{x})|}{|\textbf{y}-\textbf{x}|}\leq \gamma |f|_{Lip},\end{equation*}

with $\gamma <1 $ , as desired.

Checking that $\boldsymbol{\psi}$ is irreducible and aperiodic: We prove the irreducibility and aperiodicity condition stated in Definition 4, using Theorem 7. Therefore, let us denote by $\pi$ an invariant probability measure for P. Moreover, let $\boldsymbol{\psi^{(1)}}$ and $\boldsymbol{\psi^{(2)}}$ be two processes that follow the same dynamics (19) as $\boldsymbol{\psi}$, but for the first process, we set the initial distribution equal to $\pi$, while for the second process, we take any other initial distribution $\nu$ on S. Again, as above, since (19) is of the form (A.9) with $a_0=\beta$, $a_1=(\gamma-\beta)$, $\textbf{c}=\textbf{p}_{\textbf{0}}(1-\gamma)$, and $\textbf{l}_{\textbf{n}}\equiv\textbf{0}$ for each n, and, by (10), the condition (A.10) holds true, we can apply Theorem 10 and obtain two stochastic processes $\boldsymbol{\widetilde\psi^{(1)}}$ and ${\widetilde{\boldsymbol\psi}^{\boldsymbol{(2)}}}$, evolving according to (A.11) (with the same specifications as above), together with (A.12), starting from the same initial random variables $\boldsymbol{\psi^{(1)}_0}$ (with distribution $\pi$) and $\boldsymbol{\psi^{(2)}_0}$ (with distribution $\nu$) and such that

\begin{equation*}E\Big[ \Big|{\widetilde{\boldsymbol\psi}^{(1)}_{n+1}}-{\widetilde{\boldsymbol\psi}^{(2)}_{n+1}} \Big|\Big| {\widetilde{\boldsymbol\psi}^{(2)}_{0}}, {\widetilde{\boldsymbol\psi}^{(1)}_{0}}\Big]\leq \gamma^{n+1}\Big|{\widetilde{\boldsymbol\psi}^{(1)}_{0}}-{\widetilde{\boldsymbol\psi}^{(2)}_{0}}\Big|.\end{equation*}

Hence, since $\gamma<1$ , we have $E\Big[\Big|{\widetilde{\boldsymbol\psi}^{(1)}_{\textbf{n}}}-{\widetilde{\boldsymbol\psi}^{(2)}_{\textbf{n}}}\Big|\Big]\longrightarrow 0 $ , and, since the distribution of ${\widetilde{\boldsymbol\psi}^{(1)}_{\textbf{n}}}$ is always $\pi$ (by the definition of an invariant probability measure), we can conclude that $\boldsymbol{\widetilde\psi^{(2)}_{\textbf{n}}}$ , and so $\boldsymbol{\psi^{(2)}_{\textbf{n}}}$ (because they have the same distribution), converge in distribution to $\pi$ .

Proof of the almost sure convergence: We have already proven that the Markov chain $\boldsymbol{\psi}$ admits a unique invariant probability measure $\pi$. Furthermore, from (19) we get

(22) \begin{multline}\boldsymbol{\xi_{\textbf{n}+1}} - \textbf{p}_{\textbf{0}}= \frac{1}{\gamma-\beta}\big[ \boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}} - \beta \boldsymbol{\psi}_{\!\textbf{n}} - \textbf{p}_{\textbf{0}}(1-\gamma) \big] -\textbf{p}_{\textbf{0}} \\= \frac{1}{\gamma-\beta}\big[\big( \boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}} - \textbf{p}_{\textbf{0}} \big) - \beta \big( \boldsymbol{\psi}_{\!\textbf{n}} - \textbf{p}_{\textbf{0}} \big)\big].\end{multline}

Therefore, applying [31, Corollary 5.3 and Corollary 5.12], we obtain

\begin{align*}{\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\textbf{p}_{\textbf{0}} &=\frac{1}{N} \sum_{m=1}^N (\boldsymbol{\xi_{\textbf{m}}} -\textbf{p}_{\textbf{0}})=\frac{1-\beta}{\gamma-\beta}\frac{1}{N} \sum_{m=1}^N\big( \boldsymbol{{\psi}_{\textbf{m}}} - \textbf{p}_{\textbf{0}} \big) + \frac{O(1)}{N}\\ &\mathop{\longrightarrow}\limits^{a.s.}\frac{1-\beta}{\gamma-\beta}E\big[ {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}} - \textbf{p}_{\textbf{0}} \big],\end{align*}

where $\big({\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}}\big)_{n\geq 0}$ is a Markov chain with transition kernel P and initial distribution $\pi$ , and so stationary. From (21), we have

\begin{equation*}E\big[ {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}} - \textbf{p}_{\textbf{0}} \big]=E\big[ {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}+\textbf{1}}} - \textbf{p}_{\textbf{0}} \big]=E\big[E\big[ {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}+\textbf{1}}} - \textbf{p}_{\textbf{0}} \,|\, {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}}\big]\big]=\gamma E\big[ {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}} - \textbf{p}_{\textbf{0}} \big]\end{equation*}

and so, since $\gamma<1$ , we get $E\big[ {\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}} - \textbf{p}_{\textbf{0}} \big]=0$ . This means that $\int_S\textbf{x}\,\pi(d\textbf{x})=\textbf{p}_{\textbf{0}}$ and ${\overline{\boldsymbol{\xi}}_{\textbf{N}}}\stackrel{a.s.}\longrightarrow \textbf{p}_{\textbf{0}}$ .

Proof of the CLT: We apply Theorem 9, taking into account that we have already proven that $\boldsymbol{\psi}$ is an irreducible and aperiodic compact Markov chain. By (22) and the results established above, we have

\begin{equation*}\boldsymbol{\xi_{\textbf{n}+1}} - \textbf{p}_{\textbf{0}}= {\textbf{f}} \big(\boldsymbol{\psi}_{\!\textbf{n}} , \boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}} \big)\quad \text{with}\quad {\textbf{f}} (\textbf{x} , \textbf{y} ) =\frac{1}{\gamma-\beta} \big[( \textbf{y} - \textbf{p}_{\textbf{0}} ) -\beta ( \textbf{x} - \textbf{p}_{\textbf{0}} )\big]\end{equation*}

and $\textbf{p}_{\textbf{0}}=\int_S\textbf{x}\,\pi(d\textbf{x})$ . Hence ${\textbf{f}}$ and P form a linear model as defined in Definition 5 (see also Remark 5). Indeed, we have $A_{1} = -\frac{\beta}{\gamma-\beta} Id$ and $A_{2} =\frac{1}{\gamma-\beta} Id$ . Moreover, by (21), we have $P(id)(\textbf{x})-\textbf{p}_{\textbf{0}}=\gamma( \textbf{x} -\textbf{p}_{\textbf{0}} )$ , which means $A_{P} = \gamma Id$ . Therefore Theorem 9 holds true with

\begin{equation*}D_0= (1-\gamma)^{-1}Id,\quad D_1= -\frac{\gamma(1-\beta)}{(\gamma-\beta)(1-\gamma)}Id,\quad D_2= \frac{(1-\beta)}{(\gamma-\beta)(1-\gamma)}Id,\end{equation*}

and so, after some computations, with

(23) \begin{equation}\begin{aligned}\Sigma^2 =\frac{(1-\beta)^2(1+\gamma)}{(\gamma-\beta)^2(1-\gamma)}\Sigma^2_{\pi} =\frac{(1-\beta)^2}{(\gamma-\beta)^2}\bigg( 1 + 2\frac{\gamma}{1-\gamma}\bigg)\Sigma^2_{\pi} .\end{aligned}\end{equation}

In order to conclude, we take a Markov chain $\big({\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}}\big)_{n\geq 0}$ with transition kernel P and initial distribution $\pi$ and we set

\begin{equation*}\boldsymbol{{\xi}^{(\pi)}_{\textbf{n}+1}}-\textbf{p}_{\textbf{0}}=f\Big({\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}},{\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}+\textbf{1}}}\Big)=A_{2}\Big({\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}+\textbf{1}}}-\textbf{p}_{\textbf{0}} \Big) +A_{1}\Big({\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}}-\textbf{p}_{\textbf{0}} \Big).\end{equation*}

Then we observe that, by (A.4) and (A.5), we have

(24) \begin{align} \nonumber\mathrm{diag} (\textbf{p}_{\textbf{0}}) - \textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^T & =E\Big[\Big({\boldsymbol{\xi}^{(\boldsymbol{\pi})}_{\textbf{1}}}-\textbf{p}_{\textbf{0}} \Big)\Big({\boldsymbol{\xi}^{(\boldsymbol{\pi})}_{\textbf{1}}}-\textbf{p}_{\textbf{0}} \Big)^\top\Big]\\ \nonumber & =A_1 \Sigma^2_{\pi} A^\top_1 + A_2 \Sigma^2_{\pi} A_2^\top+ A_1 \Sigma^2_{\pi} A_P^\top A_2^\top + A_2 A_P \Sigma^2_{\pi} A_1^\top\\ &=\frac{(\gamma-\beta)^2 + \big(1-\gamma^2\big)}{(\gamma-\beta )^2}\Sigma^2_{\pi}.\end{align}

Finally, it is enough to combine (23) and (24).
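
Explicitly, (24) gives $\Sigma^2_{\pi}=\frac{(\gamma-\beta)^2}{(\gamma-\beta)^2+(1-\gamma^2)}\big( \mathrm{diag} (\textbf{p}_{\textbf{0}}) - \textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^T\big)$, and substituting this expression into (23) yields

\begin{equation*}\Sigma^2=\frac{(1-\beta)^2}{(\gamma-\beta)^2 + (1-\gamma^2)}\bigg( 1 + 2\frac{\gamma}{1-\gamma}\bigg)\big( \mathrm{diag} (\textbf{p}_{\textbf{0}}) - \textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^T\big)=\lambda\big( \mathrm{diag} (\textbf{p}_{\textbf{0}}) - \textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^T\big),\end{equation*}

with $\lambda$ defined in (6), which is exactly the covariance matrix stated in Theorem 2.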

3.2. The case $\beta>1$

In this case, $\lim_n \beta^{n}=+\infty$ and $\sum_{n\geq 1}\beta^{-n}= 1/(\beta-1)$ . Moreover, by (12), $r_n^*$ increases exponentially to $+\infty$ . Hence, the following results hold true.

Theorem 3. We have

\begin{equation*}\boldsymbol{\psi}_{\!\textbf{N}}\stackrel{a.s.}\longrightarrow\boldsymbol{\psi}_{\!\boldsymbol{\infty}}=\frac{ \textbf{B}_{\textbf{0}} + \alpha\! \sum_{n=1}^{+\infty} \beta^{-n} \boldsymbol{\xi}_{\textbf{n}}}{|\textbf{B}_{\textbf{0}}| + \frac{\alpha}{\beta-1} }\end{equation*}

and

\begin{equation*}|\boldsymbol{\psi_{\textbf{N}}}-{{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}}|\stackrel{a.s.}=O\big(\beta^{-N}\big).\end{equation*}

Moreover, $\boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ takes values in $\big\{\textbf{x}\in [0,1]^k\,:\,|\textbf{x}|=1\big\}$ , and if $B_{0\,i}>0$ for all $i=1,\dots,\, k$ , then $P\{ \psi_{\infty\, i}\in (0, 1)\}=1$ for each i.

Note that $\boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ is a function of ${\boldsymbol \xi}=(\boldsymbol{\xi}_{\textbf{n}})_{n\geq 1}$ which takes values in $\big(\big\{\textbf{x}\in\{0,1\}^k\,:\, |\textbf{x}|=1\big\}\big)^\infty$ .

Proof. By (13), we have

\begin{equation*}\boldsymbol{\psi}_{\!\textbf{N}} = \frac{\textbf{b}_{\textbf{0}}+\textbf{B}_{\textbf{N}}}{r_N^*}=\frac{\textbf{b}_{\textbf{0}}\beta^{-N} + \textbf{B}_{\textbf{0}} + \alpha\! \sum_{n=1}^N \beta^{-n} \boldsymbol{\xi}_{\textbf{n}}}{|\textbf{b}_{\textbf{0}}|\beta^{-N} +|\textbf{B}_{\textbf{0}}| + \frac{\alpha }{\beta-1}\big( 1 - \beta^{-N}\big) }.\end{equation*}

Hence, the almost sure convergence immediately follows because $\big|\sum_{n\geq 1}\beta^{-n} \boldsymbol{\xi}_{\textbf{n}}\big|\leq \sum_{n\geq 1}\beta^{-n}<+\infty$ . Moreover, after some computations, we have

\begin{equation*}\boldsymbol{\psi}_{\!\textbf{N}}-\boldsymbol{\psi}_{\!\boldsymbol{\infty}}=\frac{-\alpha\Big(|\textbf{B}_{\textbf{0}}|+\frac{\alpha}{\beta-1}\Big)\sum_{n\geq N+1}\beta^{-n}\boldsymbol{\xi}_{\textbf{n}}+\beta^{-N}{\textbf{R}}}{\Big(|\textbf{B}_{\textbf{0}}|+\frac{\alpha}{\beta-1}\Big)\Big(|\textbf{B}_{\textbf{0}}|+\frac{\alpha}{\beta-1}+\beta^{-N}\Big(|\textbf{b}_{\textbf{0}}|-\frac{\alpha}{\beta-1}\Big)\Big)},\end{equation*}

where

\begin{equation*}{\textbf{R}}=\bigg(|\textbf{B}_{\textbf{0}}|+\frac{\alpha}{\beta-1}\bigg)\textbf{b}_{\textbf{0}}-\bigg(|\textbf{b}_{\textbf{0}}|-\frac{\alpha}{\beta-1}\bigg)\Bigg(\textbf{B}_{\textbf{0}}+\alpha\sum_{n=1}^{+\infty}\beta^{-n}\boldsymbol{\xi}_{\textbf{n}}\Bigg).\end{equation*}

Therefore, since $|\sum_{n\geq N+1}\beta^{-n}\boldsymbol{\xi}_{\textbf{n}}|\leq\sum_{n\geq N+1}\beta^{-n}=\beta^{-N}/(\beta-1)$ and $|{\textbf{R}}|$ is bounded by a constant, we obtain that

\begin{equation*}|\boldsymbol{\psi}_{\!\textbf{N}}-\boldsymbol{\psi}_{\!\boldsymbol{\infty}}|\stackrel{a.s.}=O\big(\beta^{-N}\big).\end{equation*}

In order to conclude, it is enough to recall that, by definition, we have $\boldsymbol{\psi}_{\!\textbf{N}}\in [0,1]^k$ with $|\boldsymbol{\psi}_{\!\textbf{N}}|=1$ and observe that, if $B_{0,\, i}>0$ for all i, then we have

\begin{equation*}0<\frac{B_{0\, i}}{|\textbf{B}_{\textbf{0}}| + \frac{\alpha}{\beta-1}}\leq\psi_{\infty\, i}=\frac{B_{0\, i} + \alpha\! \sum_{n=1}^\infty \beta^{-n} \xi_{n\,i}}{|\textbf{B}_{\textbf{0}}| + \frac{\alpha}{\beta-1}}\leq\frac{B_{0\, i} + \frac{\alpha}{\beta-1} }{|\textbf{B}_{\textbf{0}}| + \frac{\alpha}{\beta-1}}<1.\end{equation*}
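As a purely numerical illustration (not part of the proof), the following Python sketch simulates the case $\beta>1$ by means of the closed form for $\boldsymbol{\psi}_{\!\textbf{N}}$ displayed above; all parameter values ( $k=3$ , $\alpha$ , $\beta$ , $\textbf{b}_{\textbf{0}}$ , $\textbf{B}_{\textbf{0}}$ ) are chosen only for this sketch. The printed gaps decrease at the geometric rate $\beta^{-N}$ stated in Theorem 3.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (chosen for this sketch only): k = 3 colours.
alpha, beta = 1.0, 1.5
b0 = np.array([1.0, 2.0, 3.0])            # plays the role of b_0
B0 = np.array([0.5, 0.5, 0.5])            # plays the role of B_0

N_max = 60
psi = (b0 + B0) / (b0.sum() + B0.sum())   # psi_0
S = np.zeros_like(b0)                     # running sum of beta^{-n} xi_n
path = []
for n in range(1, N_max + 1):
    colour = rng.choice(len(b0), p=psi)   # xi_n ~ Categorical(psi_{n-1})
    S[colour] += beta ** (-n)
    num = b0 * beta ** (-n) + B0 + alpha * S
    den = (b0.sum() * beta ** (-n) + B0.sum()
           + alpha * (1.0 - beta ** (-n)) / (beta - 1.0))
    psi = num / den                       # closed form for psi_n used in the proof
    path.append(psi.copy())

psi_inf = path[-1]                        # proxy for psi_infinity; error is O(beta^{-N_max})
for N in (10, 20, 30, 40):
    gap = np.abs(path[N - 1] - psi_inf).sum()
    print(N, gap, beta ** (-N))           # gap and beta^{-N} decay at the same rate
\end{verbatim}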

Theorem 4. We have ${\overline{\boldsymbol{\xi}}_{\textbf{N}}}\stackrel{a.s.}\longrightarrow{{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}}$ , and

\begin{equation*}\sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\boldsymbol{\psi_{\textbf{N}}}\big)\mathop{\longrightarrow}^{s}\mathcal{N}\big(\textbf{0}, \Sigma^2 \big)\quad \text{and}\quad \sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-{{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}}\big)\mathop{\longrightarrow}^{s}\mathcal{N}\big(\textbf{0}, \Sigma^2\big),\end{equation*}

where $\Sigma^2 = \mathrm{diag} ({{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}}) -{{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}} {\boldsymbol{\psi}_{\!\boldsymbol{\infty}}^\top}$ and $\mathop{\longrightarrow}\limits^{s}$ means stable convergence.

Note that if $B_{0\,i}>0$ for all $i=1,\dots,\,k$ , then the entries of $\Sigma^2$ are almost surely nonzero: indeed, $\Sigma^2_{i,i}=\psi_{\infty\,i}(1-\psi_{\infty\,i})\in (0, 1)$ and $\Sigma^2_{i,j}=-\psi_{\infty\,i}\,\psi_{\infty\,j}\in ({-}1, 0)$ for $i\neq j$ .

Stable convergence was introduced in [Reference Rényi48]; for its definition and properties, we refer to [Reference Crimaldi19, Reference Crimaldi, Letta and Pratelli22, Reference Hall and Heyde32].

Proof. The almost sure convergence of ${\overline{\boldsymbol{\xi}}_{\textbf{N}}}$ to $\boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ follows by the usual martingale arguments (see, for instance, [Reference Berti, Crimaldi, Pratelli and Rigo9, Lemma 2]) because $E\big[\boldsymbol{\xi_{\textbf{n}+1}}|\mathcal{F}_n\big]=\boldsymbol{\psi}_{\!\textbf{n}}\to \boldsymbol{\psi}_{\!\boldsymbol{\infty}}$ almost surely and $\sum_{n\geq 1} E\big[\|\boldsymbol{\xi}_{\textbf{n}}\|^2 \big]n^{-2}\leq \sum_{n\geq 1}n^{-2}<+\infty$ .

Regarding the CLTs, we observe that, by means of (11), we can write

(25) \begin{equation}\boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}}-\boldsymbol{\psi}_{\!\textbf{n}} =\frac{\textbf{H}(\boldsymbol{\psi}_{\!\textbf{n}})}{{r^*_{n+1}}}+\frac{\Delta {\textbf{M}_{\textbf{n}+1}} }{{r^*_{n+1}}},\end{equation}

where $\textbf{H}(\textbf{x})=(\beta-1)|\textbf{b}_{\textbf{0}}|(\textbf{x}-\textbf{p}_{\textbf{0}})$ and $\Delta{\textbf{M}_{\textbf{n}+1}}=\alpha\big({\boldsymbol{\xi_{\textbf{n}+1}}}-\boldsymbol{\psi}_{\!\textbf{n}}\big)$ . Therefore, we get

\begin{equation*}\begin{aligned}\sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\boldsymbol{\psi_{\textbf{N}}}\big) &=\frac{1}{\sqrt{N}}\big(N{\overline{\boldsymbol{\xi}}_{\textbf{N}}}-N\boldsymbol{\psi_{\textbf{N}}}\big)=\frac{1}{\sqrt{N}}\sum_{n=1}^N\!\left[\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}+n\big({\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}-{\boldsymbol{\psi}_{\!\textbf{n}}}\big)\right]\\ &=\sum_{n=1}^N \textbf{Y}_{\textbf{N},\textbf{n}}+\textbf{Q}_{\textbf{N}},\end{aligned}\end{equation*}

where

\begin{equation*}\textbf{Y}_{\textbf{N},\textbf{n}}=\frac{\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}}{\sqrt{N}}=\frac{\Delta\textbf{M}_{\textbf{n}}}{\alpha\sqrt{N}}\end{equation*}

and

\begin{equation*}\textbf{Q}_{\textbf{N}}=\frac{1}{\sqrt{N}}\sum_{n=1}^N n\!\left({\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}-{\boldsymbol{\psi}_{\!\textbf{n}}}\right)=-\frac{1}{\sqrt{N}}\sum_{n=1}^N\frac{n}{r^*_{n}}\big( \textbf{H}\big({\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big) +\Delta\textbf{M}_{\textbf{n}}\big).\end{equation*}

Since $\sum_{n\geq 1} n/{r^*_{n}}<+\infty$ and $|\textbf{H}\big({\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)| + |\Delta \textbf{M}_{\textbf{n}}|$ is uniformly bounded by a constant, we have that $\textbf{Q}_{\textbf{N}}$ converges to zero almost surely. Therefore it is enough to prove that $\sum_{n=1}^N\textbf{Y}_{\textbf{N},\textbf{n}}$ converges stably to the desired Gaussian kernel. For this purpose we observe that $E\big[\textbf{Y}_{\textbf{N},\textbf{n}}|\mathcal{F}_{n-1}\big]=\textbf{0}$ , and so, to prove the stable convergence, we have to check the following conditions (see [Reference Crimaldi, Letta and Pratelli22, Corollary 7] or [Reference Crimaldi19, Corollary 5.5.2]):

  1. (c1) $E\!\left[\,\max_{1\leq n\leq N} |\textbf{Y}_{\textbf{N},\textbf{n}} |\,\right]\to 0$ , and

  2. (c2) $\sum_{n=1}^N \textbf{Y}_{\textbf{N},\textbf{n}} {\textbf{Y}_{{\textbf{N},\textbf{n}}}^\top}\mathop{\longrightarrow}\limits^{P}\Sigma^2$ .

Regarding (c1), we observe that

\begin{equation*}\max_{1\leq n\leq N} |\textbf{Y}_{\textbf{N},\textbf{n}} |\leq \frac{1}{\sqrt{N}}\max_{1\leq n\leq N} |\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}| \leq \frac{1}{\sqrt{N}} \to 0.\end{equation*}

In order to conclude, we have to prove the condition (c2), that is,

\begin{equation*}\sum_{n=1}^N \textbf{Y}_{\textbf{N},\textbf{n}} {\textbf{Y}_{\textbf{N},\textbf{n}}^\top}=\frac{1}{N}\sum_{n=1}^N\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)^\top\mathop{\longrightarrow}\limits^{P} \Sigma^2.\end{equation*}

The above convergence holds true even almost surely by standard martingale arguments (see, for instance, [Reference Berti, Crimaldi, Pratelli and Rigo9, Lemma 2]). Indeed, we have $\sum_{n\geq 1}E\Big[\big\|\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big\|^2\Big]/n^2\leq \sum_{n\geq 1}n^{-2}<+\infty$ and

\begin{equation*}E\Big[\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)^\top|\mathcal{F}_{n-1}\Big]=\mathrm{diag} \big({\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big) - {\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}^\top\mathop{\longrightarrow}\limits^{a.s}\Sigma^2.\end{equation*}

The last stable convergence follows from the equality

\begin{equation*}\sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-{{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}}\big)=\sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\boldsymbol{\psi_{\textbf{N}}}\big)+\sqrt{N}\big(\boldsymbol{\psi_{\textbf{N}}}-{{\boldsymbol{\psi}_{\!\boldsymbol{\infty}}}}\big),\end{equation*}

where the last term converges almost surely to zero by Theorem 3.

Remark 2. Equation (25) implies that the bounded stochastic process $\boldsymbol{\psi}= (\boldsymbol{\psi}_{\!\textbf{n}})_n$ is a positive (i.e. nonnegative) almost supermartingale [Reference Robbins and Siegmund49] and also a quasi-martingale [Reference Métivier39], because $\textbf{H}({\boldsymbol{\psi}_{\!\textbf{n}}})$ is uniformly bounded by a constant and $\sum_{n\geq 1} 1/{r^*_{n+1}}<+\infty$ .

3.3. Trivial cases

We here consider the two elementary cases $\beta=0$ and $\alpha=0$ .

3.3.1. The case $\beta=0$

In this case, by (7), (8), and (9), we have for all $i = 1, \ldots, k$

(26) \begin{equation}\psi_{0\,i}=\frac{b_{0\,i}+B_{0\,i}}{|\textbf{b}_{\textbf{0}}|+ |\textbf{B}_{\textbf{0}}| }\qquad\mbox{and}\qquad\psi_{n\,i}= \frac{b_{0\,i}+ \alpha \xi_{n\,i}}{|\textbf{b}_{\textbf{0}}|+ \alpha }\qquad\mbox{for } n\geq 1.\end{equation}

We now focus on $\boldsymbol{\psi}_{\!\textbf{n}}$ for $n\geq 1$ . The process $(\boldsymbol{\psi}_{\!\textbf{n}})_{n\geq 1}$ is a k-dimensional Markov chain with a finite state space $ S=\{\textbf{s}_{\textbf{1}},\ldots, \textbf{s}_{\textbf{k}}\} $ , where

\begin{equation*}\textbf{s}_{\textbf{i}} = \frac{1}{|\textbf{b}_{\textbf{0}}|+ \alpha }\big( b_{0\,1} , \ldots, b_{0\,i}+\alpha , \ldots, b_{0\,k}\big)^\top,\qquad \mbox{for } i=1,\ldots, k\,,\end{equation*}

and transition probability matrix

\begin{equation*}P = \frac{1}{|\textbf{b}_{\textbf{0}}|+ \alpha }\big(\textbf{1}_k \, {\textbf{b}_{\textbf{0}}}^\top + \alpha \mathrm{Id}_k \big)=\frac{|\textbf{b}_{\textbf{0}}|}{|\textbf{b}_{\textbf{0}}|+ \alpha }\Big(\textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top + \tfrac{\alpha}{|\textbf{b}_{\textbf{0}}|} \mathrm{Id}_k \Big)\,,\end{equation*}

which is irreducible and aperiodic. Now, since $\textbf{1}_k \,{\textbf{p}_{\textbf{0}}}^\top $ is idempotent and commutes with the identity, we have

(27) \begin{equation}\begin{aligned}P^n & = \bigg(\frac{|\textbf{b}_{\textbf{0}}|}{|\textbf{b}_{\textbf{0}}|+ \alpha }\bigg)^n\bigg( \bigg(\sum_{j=0}^{n-1} \binom{n}{j}\Big(\tfrac{\alpha}{|\textbf{b}_{\textbf{0}}|}\Big)^j \bigg)\textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top +\Big(\tfrac{\alpha}{|\textbf{b}_{\textbf{0}}|}\Big)^n \mathrm{Id}_k \bigg)\\ & = \bigg(\frac{|\textbf{b}_{\textbf{0}}|}{|\textbf{b}_{\textbf{0}}|+ \alpha }\bigg)^n\bigg(\bigg( \Big(1+\tfrac{\alpha}{|\textbf{b}_{\textbf{0}}|}\Big)^n -\Big(\tfrac{\alpha}{|\textbf{b}_{\textbf{0}}|}\Big)^n\bigg)\textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top +\Big(\tfrac{\alpha}{|\textbf{b}_{\textbf{0}}|}\Big)^n \mathrm{Id}_k \bigg)\\ & = \textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top +\bigg(\frac{\alpha}{|\textbf{b}_{\textbf{0}}|+ \alpha }\bigg)^n\Big(\mathrm{Id}_k- \textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top \Big)\\ & = \textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top +\gamma^n\Big(\mathrm{Id}_k- \textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top \Big)\,,\end{aligned}\end{equation}

where $\gamma$ is the constant given in (5), which becomes equal to $\frac{\alpha}{|\textbf{b}_{\textbf{0}}| + {\alpha}}$ for $\beta=0$ . We note that $\gamma<1$ (since $|\textbf{b}_{\textbf{0}}|>0$ by assumption) and so $P^n \to \textbf{1}_k \, {\textbf{p}_{\textbf{0}}}^\top $ , and the unique invariant probability measure on S is hence $ \boldsymbol{\pi} = \textbf{p}_{\textbf{0}}$ .
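The identity (27) is easy to check numerically. A minimal Python sketch (with values of $\alpha$ and $\textbf{b}_{\textbf{0}}$ chosen only for the check) builds the transition matrix $P$ and compares $P^n$ with the right-hand side of (27):

\begin{verbatim}
import numpy as np

# Illustrative parameters for the beta = 0 case (chosen only for this check).
alpha = 2.0
b0 = np.array([1.0, 2.0, 3.0])
k = len(b0)
p0 = b0 / b0.sum()
gamma = alpha / (b0.sum() + alpha)

E = np.ones((k, 1)) @ p0[None, :]                 # the matrix 1_k p_0^T
P = (E + (alpha / b0.sum()) * np.eye(k)) * (b0.sum() / (b0.sum() + alpha))

for n in (1, 5, 20):
    lhs = np.linalg.matrix_power(P, n)
    rhs = E + gamma ** n * (np.eye(k) - E)        # right-hand side of (27)
    print(n, np.max(np.abs(lhs - rhs)))           # differences of the order of machine precision
\end{verbatim}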

Theorem 5. We have ${\overline{\boldsymbol{\xi}}_{\textbf{N}}}\stackrel{a.s.}\longrightarrow \textbf{p}_{\textbf{0}}$ and

\begin{equation*}\sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\textbf{p}_{\textbf{0}}\big)=\frac{\sum_{n=1}^N\! \big(\boldsymbol{\xi}_{\textbf{n}} - \textbf{p}_{\textbf{0}}\big)}{\sqrt{N}}\mathop{\longrightarrow}^{d}_{N\to\infty}\mathcal{N} \big(0,\Sigma^2\big),\end{equation*}

where

(28) \begin{equation}\Sigma^2= \lambda\Big(\mathrm{diag}(\textbf{p}_{\textbf{0}})- \textbf{p}_{\textbf{0}} {\textbf{p}_{\textbf{0}}}^\top \Big),\end{equation}

with $\lambda$ defined in (6) (taking $\beta=0$ ).

Proof. We observe that, by (26), we have for each $n\geq 0$

\begin{equation*}\{\xi_{n+1\,i}=1\} = \{\boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}}=\textbf{s}_{\textbf{i}}\}.\end{equation*}

Therefore, the strong law of large numbers for Markov chains immediately yields

\begin{equation*}{\overline{\boldsymbol{\xi}}_{\textbf{N}}} =\frac{1}{N} \sum_{n=1}^N \big( {\unicode{x1D7D9}}_{\{\boldsymbol{\psi}_{\!\textbf{n}} = \textbf{s}_{\textbf{1}}\}}, \ldots,{\unicode{x1D7D9}}_{\{\boldsymbol{\psi}_{\!\textbf{n}} = \textbf{s}_{\textbf{k}}\}} \big)^\top\stackrel{a.s.}\longrightarrow \textbf{p}_{\textbf{0}}.\end{equation*}

Take a vector $\textbf{c} = (c_1,\ldots,c_k)^T$ and define $g(\textbf{x}) =\textbf{c}^T (\textbf{x}-\textbf{p}_{\textbf{0}})$ . Recall that $g(\boldsymbol{\xi}_n)=g\big({\unicode{x1D7D9}}_{\{\boldsymbol{\psi}_{\!\textbf{n}} = \textbf{s}_{\textbf{1}}\}}, \ldots,{\unicode{x1D7D9}}_{\{\boldsymbol{\psi}_{\!\textbf{n}} = \textbf{s}_{\textbf{k}}\}} \big)$ and apply the CLT for uniformly ergodic Markov chains (see, for instance, [Reference Meyn and Tweedie40, Theorem 17.0.1]): the sequence

\begin{equation*}\left( \frac{\sum_{n=1}^N g (\boldsymbol{\xi}_{\textbf{n}})}{\sqrt{N}} \right)\end{equation*}

converges in distribution to the Gaussian distribution $\mathcal{N}\big(0,\sigma^2_{\textbf{c}}\big)$ , with

\begin{equation*}\begin{split}\sigma^2_{\textbf{c}} &= \mathrm{Var}\big[g \big(\boldsymbol{{\xi}^{(\pi)}_0}\big) \big] +2 \sum_{n\geq 1}\mathrm{Cov}\big(g \big(\boldsymbol{{\xi}^{(\pi)}_0}\big) ,g \big(\boldsymbol{{\xi}^{(\pi)}_{\textbf{n}}}\big)\big)\\ &= \textbf{c}^T \bigg( \mathrm{Var}\big[\boldsymbol{{\xi}^{(\pi)}_0}\big] +2 \sum_{n\geq 1} \mathrm{Cov}\big(\boldsymbol{{\xi}^{(\pi)}_0} ,\boldsymbol{{\xi}^{(\pi)}_n} \big) \bigg) \textbf{c}\,,\end{split}\end{equation*}

where

\begin{equation*}\boldsymbol{{\xi}^{(\pi)}_{\textbf{n}}}=\bigg({\unicode{x1D7D9}}_{\big\{{\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}} =\textbf{s}_{\textbf{1}}\big\}}, \ldots, {\unicode{x1D7D9}}_{\big\{{\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}} = \textbf{s}_{\textbf{k}}\big\}}\bigg)\end{equation*}

and $\big({\boldsymbol{\psi}^{(\boldsymbol{\pi})}_{\textbf{n}}}\big)_{n\geq 0}$ is a Markov chain with transition matrix P and initial distribution $\boldsymbol{\pi}$ , that is, $\textbf{p}_{\textbf{0}}$ . Now, by definition,

\begin{equation*}P\big(\xi^{(\pi)}_{0\,i}=1\big)=\pi(\{\textbf{s}_{\textbf{i}}\})=p_{0\,i}\qquad\mbox{and}\qquad \xi^{(\pi)}_{0\,i}\,\xi^{(\pi)}_{0\,j}=\delta_{i}^{j}\,\xi^{(\pi)}_{0\,i},\end{equation*}

and hence $\mathrm{Var}\big[\boldsymbol{{\xi}^{(\pi)}_0} \big] = \mathrm{diag}(\textbf{p}_{\textbf{0}}) -\textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^T$ . Moreover, by means of (27),

\begin{equation*}\mathrm{Cov}\big( \boldsymbol{{\xi}^{(\pi)}_0} ,\boldsymbol{{\xi}^{(\pi)}_{\textbf{n}}} \big)=\gamma^n\Big(\mathrm{diag}(\textbf{p}_{\textbf{0}})- \textbf{p}_{\textbf{0}} {\textbf{p}_{\textbf{0}}}^\top \Big)\qquad\mbox{for } n\geq 1.\end{equation*}

Hence, since $\gamma < 1$ , we have $\sum_{n\geq 1}\gamma^n=\gamma/(1-\gamma)$ and so

\begin{align*}\mathrm{Var}\Big[\boldsymbol{{\xi}^{(\pi)}_0} \Big] +2 \sum_{n\geq 1} \mathrm{Cov}\big( \boldsymbol{{\xi}^{(\pi)}_0} ,\boldsymbol{{\xi}^{(\pi)}_n} \big) & =\Big(\mathrm{diag}(\textbf{p}_{\textbf{0}})- \textbf{p}_{\textbf{0}} {\textbf{p}_{\textbf{0}}}^\top \Big)\bigg(1+ \frac{2\gamma}{1-\gamma } \bigg).\end{align*}

By the Cramér–Wold device, the theorem is proved with $\Sigma^2$ given in (28).
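A Monte Carlo illustration of Theorem 5 (not part of the proof) can be obtained as follows: simulate the chain described by (26), compute $\sqrt{N}\big({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\textbf{p}_{\textbf{0}}\big)$ over many independent runs, and compare the empirical covariance matrix with $\lambda\big(\mathrm{diag}(\textbf{p}_{\textbf{0}})-\textbf{p}_{\textbf{0}}\textbf{p}_{\textbf{0}}^\top\big)$ , where, comparing (28) with the last display of the proof, $\lambda=1+2\gamma/(1-\gamma)$ for $\beta=0$ . The Python sketch below uses illustrative parameter values and, for simplicity, starts the chain from $\textbf{p}_{\textbf{0}}$ (e.g. $\textbf{B}_{\textbf{0}}$ proportional to $\textbf{b}_{\textbf{0}}$ ); the asymptotic results do not depend on the initial state.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (beta = 0 case), chosen only for this sketch.
alpha = 2.0
b0 = np.array([1.0, 2.0, 3.0])
k, p0 = len(b0), b0 / b0.sum()
gamma = alpha / (b0.sum() + alpha)
lam = 1.0 + 2.0 * gamma / (1.0 - gamma)   # lambda for beta = 0

def scaled_mean_error(N):
    """One run of xi_1,...,xi_N and the vector sqrt(N)(xi_bar_N - p0)."""
    psi = p0.copy()                       # psi_0 = p0 (e.g. B_0 proportional to b_0)
    counts = np.zeros(k)
    for _ in range(N):
        i = rng.choice(k, p=psi)          # xi_n ~ Categorical(psi_{n-1})
        counts[i] += 1.0
        psi = (b0 + alpha * np.eye(k)[i]) / (b0.sum() + alpha)   # relation (26)
    return np.sqrt(N) * (counts / N - p0)

N, reps = 500, 1000
sample = np.array([scaled_mean_error(N) for _ in range(reps)])
print(np.round(np.cov(sample.T), 3))                          # empirical covariance
print(np.round(lam * (np.diag(p0) - np.outer(p0, p0)), 3))    # asymptotic covariance (28)
\end{verbatim}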

Remark 3. Note that in Theorem 5 we do not assume $b_{0\,i}>0$ for all i, but only $|\textbf{b}_{\textbf{0}}|>0$ (as stated in Section 2). A different behavior is observed when $\textbf{b}_{\textbf{0}} = \textbf{0}$ . In this case, (26) gives $\boldsymbol{\psi}_{\!\textbf{n}} = \boldsymbol{\xi}_{\textbf{n}} $ for $n\geq 1$ . Since $\psi_{n\,i}=P(\xi_{n+1\,i}=1|\mathcal{F}_n)$ , the above equality implies recursively $\boldsymbol{\psi}_{\!\textbf{n}} = \boldsymbol{\xi}_{\textbf{n}} = \boldsymbol{\xi_{\textbf{1}}}$ for each $n\geq 1$ . In other words, the process of extractions $\boldsymbol{\xi}=(\boldsymbol{\xi}_{\textbf{n}})_{n\geq 1}$ is constant, with $P(\xi_{1\,i} = 1)= \psi_{0\,i} = {B_{0\,i}} / |\textbf{B}_{\textbf{0}}|$ .

3.3.2. The case $\alpha=0$

The model introduced above for $\alpha>0$ also makes sense when $\alpha=0$ . For completeness, in this section we discuss this case. Recall that we are assuming $|\textbf{b}_{\textbf{0}}|>0$ and $b_{0\,i}+B_{0\,i}>0$ (see Section 2). For the case $\beta>1$ , we here assume also $|\textbf{B}_{\textbf{0}}|>0$ .

When $\alpha=0$ , the random vectors $\boldsymbol{\xi}_{\textbf{n}}$ are independent with

\begin{equation*}P(\xi_{n+1\, i}=1)=\psi_{n\, i}=\frac{b_{0\,i}+\beta^{n}B_{0\, i}}{|\textbf{b}_{\textbf{0}}|+\beta^{n}|\textbf{B}_{\textbf{0}}|}.\end{equation*}

Therefore, we have

\begin{equation*}\psi_{n\,i}=\frac{b_{0\,i}+B_{0\, i}}{|\textbf{b}_{\textbf{0}}|+|\textbf{B}_{\textbf{0}}|}\end{equation*}

for all n if $\beta=1$ (which corresponds to the classical multinomial model) and

\begin{equation*}\psi_{n\,i}\longrightarrow\begin{cases}\frac{b_{0\,i}}{|\textbf{b}_{\textbf{0}}|}\qquad &\mbox{if } \beta\in [0,1),\\[9pt] \frac{B_{0\,i}}{|\textbf{B}_{\textbf{0}}|}\qquad &\mbox{if } \beta>1.\end{cases}\end{equation*}

Moreover, the following result holds true.

Theorem 6. We have

\begin{equation*}{\overline{\boldsymbol{\xi}}_{\textbf{N}}}\stackrel{a.s.}\longrightarrow \boldsymbol{\overline{\xi}_\infty}=\begin{cases}\frac{\textbf{b}_{\textbf{0}}+\textbf{B}_{\textbf{0}}}{|\textbf{b}_{\textbf{0}}|+|\textbf{B}_{\textbf{0}}|} \quad & \text{if } \beta=1,\\[9pt] \frac{\textbf{b}_{\textbf{0}}}{|\textbf{b}_{\textbf{0}}|}=\textbf{p}_{\textbf{0}} \quad & \text{if } \beta\in [0,1),\\[9pt] \frac{\textbf{B}_{\textbf{0}}}{|\textbf{B}_{\textbf{0}}|} \quad & \text{if } \beta>1.\\\end{cases}\end{equation*}

Moreover, we have

\begin{equation*}\sqrt{N}\left({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\boldsymbol{\overline{\xi}_\infty}\right)\stackrel{s}\longrightarrow \mathcal{N}\big(\textbf{0},\Sigma^2\big),\end{equation*}

where $\Sigma^2 = \mathrm{diag} \big(\boldsymbol{\overline{\xi}_\infty}\big) -\boldsymbol{\overline{\xi}_\infty}\, \boldsymbol{\overline{\xi}_\infty}^\top$ (note that, in each of the three cases above, $\boldsymbol{\psi}_{\!\textbf{n}}\longrightarrow \boldsymbol{\overline{\xi}_\infty}$ ) and $\mathop{\longrightarrow}\limits^{s}$ means stable convergence.

Proof. The almost sure convergence follows from the Borel–Cantelli lemmas (see, for instance, [Reference Williams54, Section 12.15]). Indeed, we have the following:

  • if $\sum_{n\geq 0}\psi_{n,i}<+\infty$ , then $\sum_{n=1}^N\xi_{n\,i}\stackrel{a.s.}\longrightarrow \xi_{\infty\,i}$ , with $P(\xi_{\infty\,i}<+\infty)=1$ ;

  • if $\sum_{n\geq 0}\psi_{n,i}=+\infty$ , then $\sum_{n=1}^N\xi_{n\,i}/\sum_{n=1}^N\psi_{n-1\,i}\stackrel{a.s.}\longrightarrow 1$ .

Hence, the statement of the theorem follows because

  1. (i) if $\beta=1$ , then $\sum_{n\geq 0}\psi_{n\,i}=+\infty$ and

    \begin{equation*}\sum_{n=1}^{N}\psi_{n-1\,i}\sim \frac{b_{0\,i}+B_{0\,i}}{|\textbf{b}_{\textbf{0}}|+|\textbf{B}_{\textbf{0}}|} N;\end{equation*}
  2. (ii) if $\beta\in [0,1)$ and $b_{0\,i}>0$ , then $\sum_{n\geq 0}\psi_{n\,i}=+\infty$ and

    \begin{equation*}\sum_{n=1}^{N}\psi_{n-1\,i}\sim \frac{b_{0\,i}}{|\textbf{b}_{\textbf{0}}|} N=p_{0\,i} N;\end{equation*}
  3. (iii) if $\beta\in [0,1)$ and $b_{0\,i}=0$ , then

    \begin{equation*}\sum_{n\geq 0}\psi_{n\,i}\leq \frac{B_{0\,i}}{|\textbf{b}_{\textbf{0}}|}\sum_{n\geq 0}\beta^n <+\infty\end{equation*}
    and so $\sum_{n\geq 1}\xi_{n\,i}<+\infty$ almost surely; that is, $\xi_{n\,i}=0$ eventually with probability one;
  4. (iv) if $\beta>1$ and $B_{0\,i}> 0$ , then $\sum_{n\geq 0}\psi_{n\,i}=+\infty$ and $\sum_{n=1}^{N}\psi_{n-1\,i}\sim \frac{B_{0\,i}}{|\textbf{B}_{\textbf{0}}|} N$ ;

  5. (v) if $\beta>1$ and $B_{0\,i}=0$ , then

    \begin{equation*}\sum_{n\geq 0}\psi_{n\,i}\leq \frac{b_{0\,i}}{|\textbf{B}_{\textbf{0}}|}\sum_{n\geq 0}\beta^{-n}<+\infty\end{equation*}
    and so $\sum_{n\geq 1}\xi_{n\,i}<+\infty$ almost surely; that is, $\xi_{n\,i}=0$ eventually with probability one.

For the CLT we argue as in the proof of Theorem 4. Indeed, we set $\textbf{Y}_{\textbf{N},\textbf{n}}=\frac{\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}}{\sqrt{N}}$ so that we have

\begin{equation*}\sqrt{N}\left({\overline{\boldsymbol{\xi}}_{\textbf{N}}}-\boldsymbol{\overline{\xi}_\infty}\right)=\sum_{n=1}^N\textbf{Y}_{\textbf{N},\textbf{n}}+\frac{1}{\sqrt{N}}\sum_{n=1}^N\Big(\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}-\boldsymbol{\overline{\xi}_\infty}\Big),\end{equation*}

where the second term converges to zero because

\begin{equation*}\sum_{n=1}^N \big| \boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}-\boldsymbol{\overline{\xi}_\infty}\big|= \begin{cases}0\quad &\mbox{if } \beta=1,\\[1pt] O\!\left(\sum_{n=1}^N \beta^n\right)\quad &\mbox{if } \beta\in [0,1),\\[4pt] O\!\left(\sum_{n=1}^N \beta^{-n}\right)\quad &\mbox{if } \beta>1.\\\end{cases}\end{equation*}

Therefore it is enough to prove that $\sum_{n=1}^N\textbf{Y}_{\textbf{N},\textbf{n}}$ converges stably to the desired Gaussian kernel. To this end we observe that $E[\textbf{Y}_{\textbf{N},\textbf{n}}|\mathcal{F}_{n-1}]=\textbf{0}$ , and so, to prove the stable convergence, we have to check the following conditions (see [Reference Crimaldi, Letta and Pratelli22, Corollary 7] or [Reference Crimaldi19, Corollary 5.5.2]):

  1. (c1) $E\!\left[\,\max_{1\leq n\leq N} |\textbf{Y}_{\textbf{N},\textbf{n}} |\,\right]\to 0$ , and

  2. (c2) $\sum_{n=1}^N \textbf{Y}_{\textbf{N},\textbf{n}} {\textbf{Y}_{\textbf{N},\textbf{n}}^\top}\mathop{\longrightarrow}\limits^{P}\Sigma^2$ .

Regarding (c1), we note that

\begin{equation*}\max_{1\leq n\leq N} |\textbf{Y}_{\textbf{N},\textbf{n}} |\leq \frac{1}{\sqrt{N}}\max_{1\leq n\leq N} |\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}| \leq \frac{1}{\sqrt{N}} \to 0.\end{equation*}

Regarding (c2), we observe that

\begin{equation*}\sum_{n=1}^N \textbf{Y}_{\textbf{N},\textbf{n}} {\textbf{Y}_{\textbf{N},\textbf{n}}^\top}=\frac{1}{N}\sum_{n=1}^N\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)^\top\mathop{\longrightarrow}\limits^{a.s.} \Sigma^2,\end{equation*}

because (see, for instance, [Reference Berti, Crimaldi, Pratelli and Rigo9, Lemma 2]) $\sum_{n\geq 1} E\big[\big\|\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big\|^2\big]/n^2\leq\sum_{n\geq 1} n^{-2}<+\infty$ and

\begin{equation*}E\Big[\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)\big(\boldsymbol{\xi}_{\textbf{n}}-{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big)^\top|\mathcal{F}_{n-1}\Big]=\mathrm{diag} \big({\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}\big) - {\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}{\boldsymbol{\psi}_{\!\textbf{n}-\textbf{1}}}^\top\mathop{\longrightarrow}\limits^{a.s}\Sigma^2.\end{equation*}

4. Proof of the goodness-of-fit result (Theorem 1)

The proof is based on Theorem 2 (for $0<\beta<1$ ) and Theorem 5 (for $\beta = 0$ ), whose proofs are in Subsections 3.1 and 3.3.1, respectively. The almost sure convergence of $O_i/N$ immediately follows since $O_i/N=\overline{\xi}_{N\,i}$ . In order to prove the stated convergence in distribution, we mimic the classical proof for the Pearson chi-squared test based on the Sherman–Morrison formula (see [Reference Sherman and Morrison51]); however, see also [Reference Rao and Scott47, Corollary 2].

Proof. We start by recalling the Sherman–Morrison formula: if A is an invertible square matrix and $ 1 - \textbf{v}^\top A^{-1} \textbf{u} \neq 0 $ , then

\begin{equation*}\big(A - \textbf{u}\textbf{v}^\top\big)^{-1} =A^{-1} + \frac{A^{-1} \textbf{u}\textbf{v}^\top A^{-1}}{1 - \textbf{v}^\top A^{-1} \textbf{u} }.\end{equation*}

Given the observation $\boldsymbol{\xi}_{\textbf{n}}=(\xi_{n\,1},\dots,\,\xi_{n\,k})^{\top}$ , we define the ‘truncated’ vector $\boldsymbol{\xi}^*_{\textbf{n}}= \big(\xi^*_{n\, 1}, \ldots,\xi^*_{n\, k-1}\big)^\top$ , given by the first $k-1$ components of $\boldsymbol{\xi}_{\textbf{n}}$ . Theorem 2 (for $\beta \in (0, 1)$ ) and Theorem 5 (for $\beta = 0$ ) give the CLT for $(\boldsymbol{\xi}_{\textbf{n}})_n$ , which immediately implies

(29) \begin{equation}\sqrt{N}\left({\overline{\boldsymbol{\xi}}^*_{\textbf{N}}}-\textbf{p}^*\right)=\frac{\sum_{n=1}^N\! \big(\boldsymbol{\xi}^*_{\textbf{n}} - \textbf{p}^*\big) }{\sqrt{N}}\mathop{\longrightarrow}^{d}\mathcal{N} \Big(\textbf{0},\Sigma_*^2\Big),\end{equation}

where $\textbf{p}^*$ is given by the first $k-1$ components of $\textbf{p}_{\textbf{0}}$ and

\begin{equation*}\Sigma_*^2 = \lambda \Big( \mathrm{diag} (\textbf{p}^*) - \textbf{p}^*\textbf{p}^{*^T}\Big).\end{equation*}

By assumption $p_{0\,i}>0$ for all $i=1,\dots,\,k$ and so $\mathrm{diag}(\textbf{p}^*)$ is invertible with inverse $\mathrm{diag} (\textbf{p}^*)^{-1}= \mathrm{diag} \Big(\frac{1}{p_{0\,1}}, \ldots, \frac{1}{p_{0\,k-1}} \Big)$ ; since $ \big(\mathrm{diag} (\textbf{p}^*)^{-1} \big) \textbf{p}^* =\textbf{1}\in\mathbb{R}^{k-1}$ , we have

\begin{equation*}1 - \textbf{p}^{*^T} \mathrm{diag} (\textbf{p}^*)^{-1} \textbf{p}^* =1-\sum_{i=1}^{k-1}p_{0\,i}=\sum_{i=1}^k p_{0\,i} - \sum_{i=1}^{k-1} p_{0\,i} = p_{0\,k} >0.\end{equation*}

Therefore we can use the Sherman–Morrison formula with $A =\mathrm{diag} (\textbf{p}^*)$ and $\textbf{u}= \textbf{v} = \textbf{p}^*$ , and we obtain

(30) \begin{equation}\big(\Sigma_*^2\big)^{-1} = \frac{1}{\lambda}\Big( \mathrm{diag} (\textbf{p}^*) - \textbf{p}^*\textbf{p}^{*^T}\Big)^{-1}=\frac{1}{\lambda}\Big( \mathrm{diag} \Big(\tfrac{1}{p_{0\,1}}, \ldots, \tfrac{1}{p_{0\,k-1}} \Big)+ \frac{1}{p_{0\,k}} \textbf{1}\textbf{1}^\top \Big).\end{equation}
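The inverse (30) can be verified directly; a small numerical check in Python (with values of $\textbf{p}_{\textbf{0}}$ and $\lambda$ chosen only for the check):

\begin{verbatim}
import numpy as np

# Illustrative values (chosen only for this check).
p0 = np.array([0.40, 0.35, 0.25])
lam = 1.8

p_star = p0[:-1]                                       # first k-1 components of p_0
Sigma_star = lam * (np.diag(p_star) - np.outer(p_star, p_star))
m = len(p_star)
inv_30 = (np.diag(1.0 / p_star) + np.ones((m, m)) / p0[-1]) / lam   # right-hand side of (30)
print(np.allclose(np.linalg.inv(Sigma_star), inv_30))  # True
\end{verbatim}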

Now, since $\sum_{i=1}^{k} \big(\overline{\xi}_{N\,i} - {p_{0\,i}}\big) = 0$ , we have $\overline{\xi}_{N\,k} - {p_{0\,k}} = -\sum_{i=1}^{k-1}\big(\overline{\xi}_{N\,i} - {p_{0\,i}}\big)$ , and so we get

\begin{equation*}\begin{aligned}\sum_{i=1}^{k} \frac{\big(O_{i} - N{p_{0\,i}}\big)^2}{N{p_{0\,i}}} & =N \sum_{i=1}^{k} \frac{\big(\overline{\xi}_{N\,i} - {p_{0\,i}}\big)^2}{{p_{0\,i}}}=N \Bigg[\sum_{i=1}^{k-1} \frac{\big(\overline{\xi}_{N\,i} - {p_{0\,i}}\big)^2}{{p_{0\,i}}} +\frac{\big(\overline{\xi}_{N\,k} - {p_{0\,k}}\big)^2}{{p_{0\,k}}} \Bigg]\\ & =N \Bigg[ \sum_{i=1}^{k-1} \frac{\big(\overline{\xi}_{N\,i} - {p_{0\,i}}\big)^2}{{p_{0\,i}}} +\frac{\Big(\sum_{i=1}^{k-1} \big(\overline{\xi}_{N\,i} - {p_{0\,i}}\big) \Big)^2}{{p_{0\,k}}}\Bigg]\\ & =N \sum_{i_1,i_2=1}^{k-1} \big(\overline{\xi}_{N\,{i_1}} - {p_{0\,i_1}}\big)\big(\overline{\xi}_{N\,i_2} - {p_{0\,i_2}}\big)\Bigg( \delta_{i_1}^{i_2}\frac{1}{{p_{0\,i_1}}} +\frac{1}{{p_{0\,k}}} \Bigg),\end{aligned}\end{equation*}

where $\delta_{i_1}^{i_2}$ is equal to 1 if $i_1=i_2$ and equal to zero otherwise. Finally, from the above equalities, recalling (29) and (30), we obtain

\begin{equation*}\sum_{i=1}^{k} \frac{\big(O_{i} - N{p_{0\,i}}\big)^2}{N{p_{0\,i}}} =\lambda N \big({\overline{\boldsymbol{\xi}}^*_{\textbf{N}}} - \textbf{p}^*\big)^\top\big(\Sigma_*^2\big)^{-1} \big({\overline{\boldsymbol{\xi}}^*_{\textbf{N}}} - \textbf{p}^*\big)\mathop{\longrightarrow}\limits^{d}\lambda W_0=W_*,\end{equation*}

where $W_0$ is a random variable with distribution $\chi^2(k-1)=\Gamma((k-1)/2,1/2)$ , where $\Gamma(a,b)$ denotes the gamma distribution with density function

\begin{equation*}f(w)=\frac{b^a}{\Gamma(a)}w^{a-1}e^{-bw}.\end{equation*}

As a consequence, $W_*$ has distribution $\Gamma((k-1)/2,1/(2\lambda))$ .
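In practice, the corrected goodness-of-fit test therefore amounts to comparing the classical chi-squared statistic with a quantile of $\Gamma((k-1)/2,1/(2\lambda))$ , i.e. with $\lambda$ times the corresponding quantile of $\chi^2(k-1)$ . The following Python sketch shows this usage; the observed counts and the value of $\lambda$ are purely hypothetical (in applications, $\lambda$ is computed from (6) given the model parameters).

\begin{verbatim}
import numpy as np
from scipy import stats

# Hypothetical data and parameters (illustrative only).
observed = np.array([480, 260, 260])     # observed counts O_i, with N = 1000
p0 = np.array([0.50, 0.25, 0.25])        # null probabilities p_{0,i}
lam = 2.5                                # hypothetical lambda from (6)
theta = 0.05                             # significance level

N, k = observed.sum(), len(observed)
chi2_stat = np.sum((observed - N * p0) ** 2 / (N * p0))

# Under H0 the statistic converges to W_* = lambda * W_0 with W_0 ~ chi^2(k-1),
# so the critical value is lambda times the classical chi-squared quantile.
critical = lam * stats.chi2.ppf(1 - theta, df=k - 1)
# equivalently: stats.gamma.ppf(1 - theta, a=(k - 1) / 2, scale=2 * lam)
print(chi2_stat, critical, chi2_stat > critical)
\end{verbatim}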

Appendix A

A.1. A central limit theorem for a multidimensional compact Markov chain

In this section we prove the general CLT for Markov chains, used for the proof of Theorem 2.

Let (S, d) be a compact metric space and denote by C(S) the space of continuous real functions on S, by Lip(S) the space of Lipschitz continuous real functions on S, and by $Lip(S\times S)$ the space of Lipschitz continuous real functions on $S\times S$ . Moreover, we define $\|f\|_\infty=\sup_{x\in S}|f(x)|$ for each f in C(S) and, for each f in Lip(S),

\begin{equation*}|f|_{Lip} = \sup_{x,y\in S,\, x\neq y}\frac{|f(y)-f(x)|}{d(x,y)}\qquad\mbox{and }\qquad\|f\|_{Lip}=|f|_{Lip}+\|f\|_\infty.\end{equation*}

Let P(x, dy) be a Markovian kernel on S and set $(Pf)(x) = \int_Sf(y)P(x,dy)$ . We now recall some definitions and results concerning Markov chains with values in S.

Definition 1. We say that P is weak Feller if $(Pf)(x)=\int_S f(y)P(x,dy)$ defines a linear operator $P\,:\,C(S)\to C(S)$ . A Markov chain with a weak Feller transition kernel is called a weak Feller Markov chain.

Remark 4. If P is weak Feller, then the sequence $(P^n)_{n\geq 1}$ of operators from C(S) to C(S) is uniformly bounded with respect to $\|\cdot\|_\infty$ : indeed, we simply have

\begin{multline*}\|P^nf\|_\infty=\sup_{x\in S} |P^nf(x)| =\sup_{x\in S} \Big|\int_S f(y)P^n(x,dy)\Big|\\ \leq \sup_{x\in S} \bigg( \int_S \sup_{y\in S} |f(y) | P^n(x,dy) \bigg) =\sup_{y\in S} |f(y)|=\|f\|_\infty .\end{multline*}

Moreover, the existence of at least one invariant probability measure for P is easily shown. In fact, the set of probability measures $\mathcal{P}(S)$ on S, endowed with the topology of weak convergence, is a compact convex set. In addition, the adjoint operator of P, namely

\begin{equation*}P^* \,:\, \mathcal{P}(S) \to \mathcal{P}(S) ,\qquad (P^*\nu)(B) = \int_S \nu(dx) P(x,B) ,\end{equation*}

is continuous on $\mathcal{P}(S)$ (since P is weak Feller). Then, the existence of an invariant probability measure $\pi$ is a consequence of Brouwer’s fixed-point theorem.

Definition 2. We say that P is semi-contractive or a semi-contraction on Lip(S) if it maps Lip(S) into itself and there exists a constant $\gamma<1$ such that

\begin{equation*}|Pf|_{Lip}\leq \gamma |f|_{Lip}\end{equation*}

for each $f\in Lip(S)$ .

We now give the definition of compact Markov chain (see [Reference Norman42, Chapter 3] for a general exposition of the theory of these processes, and [Reference Doeblin and Fortet24] for the beginning of this theory).

Definition 3. We say that P is a Doeblin–Fortet operator if it is weak Feller and a bounded operator from $(Lip(S),\|\cdot\|_{Lip})$ into itself, and there are finite constants $n_0\geq 1$ , $\gamma<1$ , and $R\geq 0$ such that

\begin{equation*}|P^{n_0}f|_{Lip}\leq \gamma |f|_{Lip} + R \|f\|_\infty\end{equation*}

for each $f\in Lip(S)$ . A Markov chain with a Doeblin–Fortet operator on a compact set S is called a compact Markov chain (or process).

Note that the Doeblin–Fortet operator, the weak Feller property, and the semi-contraction may also be defined for a non-compact state space. In general, a compact Markov process is a Doeblin–Fortet process in a compact state space. In our framework, since S is compact, the two concepts coincide and the result below follows immediately.

Lemma 1. If P is weak Feller and a semi-contractive operator on Lip(S), then P is a Doeblin–Fortet operator. In other words, a weak Feller Markov chain such that its transition kernel is semi-contractive on Lip(S) is a compact Markov chain.

Definition 4. We say that P is irreducible and aperiodic if

\begin{equation*}Pf=e^{i\theta}f,\quad \text{with }\theta\in\mathbb{R},\quad f\in Lip(S)\Rightarrow e^{i\theta}=1, \quad \mbox{and}\quad f=\mbox{constant}.\end{equation*}

A Markov chain with an irreducible and aperiodic transition kernel is called an irreducible and aperiodic Markov chain.

Under the hypotheses of the theorem of Ionescu-Tulcea and Marinescu in [Reference Ionescu Tulcea and Marinescu35], the spectral radius of P is 1, the set of eigenvalues of P of modulus 1 has only a finite number of elements, and each relative eigenspace is finite-dimensional. This theorem can always be applied to a compact Markov chain (see [Reference Norman42, Theorem 3.3.1]). More specifically, every compact Markov chain has d disjoint closed sets, called ergodic sets, contained in its compact state space S. These sets are both the support of the base of the ergodic invariant probability measures, and the support of a base of the eigenspaces related to the eigenvalues of modulus 1 (see [Reference Norman42, Theorem 3.4.1]). In addition, each of these ergodic sets may be subdivided into $p_j$ closed disjoint subsets. The number $p_j$ is the period of the jth irreducible component, and the ergodic subdivision gives the support of the eigenfunctions related to the $p_j$ roots of 1 (see [Reference Norman42, Theorem 3.5.1]). Then, as also explained in [Reference Norman42, Section 3.6], there are no other eigenvalues of modulus 1 except 1 (aperiodicity) and no other eigenfunctions except the constant for the eigenvalue equal to 1 (irreducibility) if and only if the compact Markov chain has but one ergodic kernel, and this kernel has period 1. In other words, the following result holds true.

Theorem 7. Let $\psi = (\psi_n)_{n\geq 0}$ be a compact Markov chain and let $\pi$ be an invariant probability measure with respect to its transition kernel. If $\psi = (\psi_n)_{n\geq 0}$ converges in distribution to $\pi$ , regardless of its initial distribution, then $\pi$ is the unique invariant probability measure and $\psi$ is irreducible and aperiodic.

We now note that if P is Doeblin–Fortet, irreducible, and aperiodic, then it satisfies all the conditions given in [Reference Guivarc’h and Hardy30, Définition 0] and [Reference Guivarc’h and Hardy30, Définition 1]. Therefore, it has a unique invariant probability measure $\pi$ , and for any $f\in Lip(S\times S)$ , there exists a unique (up to a constant) function $u_f \in Lip(S)$ such that

\begin{equation*}u_f(x) - Pu_f(x) =\int_S f(x,y)P(x,dy) - \int_S \int_S f(x,y)P(x,dy) \pi(dx).\end{equation*}

By means of this function $u_f$ , it is possible to define the (unique) function $ f^{\prime}(x,y) = f(x,y)+u_f(y)-u_f(x) $ so that we have

\begin{equation*}m(f) = \int_S \int_S f(x,y)P(x,dy) \pi(dx)=\int_S \int_S f^{\prime}(x,y)P(x,dy) \pi(dx) = m(f^{\prime}).\end{equation*}

In addition, we may define the quantity $\sigma^2(f) \geq 0$ as follows (see [Reference Guivarc’h and Hardy30, Equation (6)]):

(A.1) \begin{multline}\sigma^2(f) =\int_S \int_S \big[ f^{\prime}(x,y)-m(f^{\prime}) \big]^2 P(x,dy) \pi(dx)\\=\int_S \int_S \big[ f(x,y)-m(f)+u_f(y)-u_f(x) \big]^2 P(x,dy) \pi(dx).\end{multline}

Finally, we have the following convergence result.

Theorem 8. ([Reference Guivarc’h and Hardy30, Théorème 1 and Théorème 2]) Let $\psi=(\psi_n)_{n\geq 0}$ be an irreducible and aperiodic compact Markov chain and denote by $\pi$ its unique invariant probability measure. Let $f\in Lip(S\times S)$ be such that $m(f)=0$ and $\sigma^2(f)>0$ . Then, setting $S_N(f)=\sum_{n=0}^{N-1}f(\psi_n,\psi_{n+1})$ , we have

\begin{equation*}\frac{S_N(f)}{\sqrt{N}}\mathop{\longrightarrow}^{d}_{N\to\infty}\mathcal{N}\big(0,\sigma^2(f)\big),\end{equation*}

and

\begin{equation*}\sup_{t}\Big| P\big(S_N(f)<t\sqrt{N}\big)-\mathcal{N}\big(0,\sigma^2(f)\big)({-}\infty, t)\Big|=O\big(1/\sqrt{N}\big).\end{equation*}

Now let us specialize our assumptions, taking as S a compact subset of $\mathbb{R}^k$ . Therefore, in the sequel we will use boldface to highlight the fact that we are working with vectors.

Definition 5. (‘Linearity’ condition.) We say that P and ${\textbf{f}}\,:\, S\times S \to \mathbb{R}^d$ form a linear model if ${\textbf{f}}$ is linear (in $\textbf{x}$ and $\textbf{y}$ ) with $m({\textbf{f}})= \textbf{0}$ and the function

\begin{equation*}(P\textbf{y})(\textbf{x})= \int_S\textbf{y}P(\textbf{x},\textbf{dy})\end{equation*}

is linear (in $\textbf{x}$ ).

Remark 5. Denote by $\textbf{p}_{\textbf{0}} = \int_S \textbf{x} \pi(\textbf{dx})$ the mean value under the invariant probability measure $\pi$ of P. If P and ${\textbf{f}}$ form a linear model, then there exist two matrices $A_{1},A_{2} \in\mathbb{R}^{d\times k}$ such that

(A.2) \begin{equation}{\textbf{f}} (\textbf{x},\textbf{y}) = A_{1}(\textbf{x}-\textbf{p}_{\textbf{0}}) + A_{2}(\textbf{y}-\textbf{p}_{\textbf{0}})\end{equation}

and a square matrix $A_{P} \in \mathbb{R}^{k\times k}$ such that

(A.3) \begin{equation}(P(\textbf{y}-\textbf{p}_{\textbf{0}})) (\textbf{x}) = \int_S (\textbf{y}-\textbf{p}_{\textbf{0}}) P(\textbf{x},\textbf{dy}) =A_{P} (\textbf{x}-\textbf{p}_{\textbf{0}}).\end{equation}

Indeed, if $(P\textbf{y})(\textbf{x}) = A_{P}\textbf{x} + \textbf{b}$ , using that $\pi$ is invariant with respect to P, we obtain

\begin{equation*}\textbf{p}_{\textbf{0}} = \int_S \textbf{y} \pi(\textbf{dy}) =\int_S \int_S \textbf{y} P(\textbf{x},\textbf{dy}) \pi(\textbf{dx}) =\int_S [A_{P}\textbf{x} + \textbf{b}]\pi(\textbf{dx}) = A_{P}\textbf{p}_{\textbf{0}} + \textbf{b},\end{equation*}

and hence $(P(\textbf{y}-\textbf{p}_{\textbf{0}})) (\textbf{x}) = A_{P} (\textbf{x}-\textbf{p}_{\textbf{0}})$ . Moreover, if ${\textbf{f}}(\textbf{x},\textbf{y}) = A_{1}\textbf{x}+A_{2}\textbf{y}+\textbf{b}$ , then

\begin{align*}m(A_{1}\textbf{x}+ A_{2}\textbf{y} + \textbf{b})& =\int_S \int_S \big(A_{1}\textbf{x}+A_{2}\textbf{y}+\textbf{b}\big)\, P(\textbf{x},\textbf{dy})\, \pi(\textbf{dx})=A_{1}\int_S \textbf{x}\, \pi(\textbf{dx})+A_{2}\int_S \textbf{y}\, \pi(\textbf{dy}) + \textbf{b}\\& = (A_{1}+ A_{2})\textbf{p}_{\textbf{0}} + \textbf{b}\end{align*}

(the second equality uses $\int_S P(\textbf{x},\textbf{dy})=1$ for the terms in $\textbf{x}$ and $\textbf{b}$ , and the invariance of $\pi$ for the term in $\textbf{y}$ ),

and hence, if $m({\textbf{f}})= \textbf{0}$ , we obtain ${\textbf{f}}=A_{1}(\textbf{x}-\textbf{p}_{\textbf{0}}) + A_{2}(\textbf{y}-\textbf{p}_{\textbf{0}})$ .

Theorem 9. Let $\psi=(\psi_n)_{n\geq 0}$ be an irreducible and aperiodic compact Markov chain, and denote by P its transition kernel and by $\pi$ its unique invariant measure. Assume that P and ${\textbf{f}}$ form a linear model and let $A_1,\,A_2$ , and $A_P$ be defined as in (A.2) and in (A.3). Then, setting ${\textbf{S}_{\textbf{N}}}({\textbf{f}})=\sum_{n=0}^{N-1}{\textbf{f}}(\boldsymbol{\psi}_{\!\textbf{n}},\boldsymbol{\psi}_{\!\textbf{n}+\textbf{1}})$ , we have

\begin{equation*}\frac{\textbf{S}_{\textbf{N}}({\textbf{f}})}{\sqrt{N}}\mathop{\longrightarrow}^{d}_{N\to\infty}\mathcal{N}\big(\textbf{0},\Sigma^2\big),\end{equation*}

where

\begin{equation*}\Sigma^2 = D_{1} \Sigma^2_{\pi} D_{1}^\top+D_{1} \Sigma^2_{\pi} A_{P}^\top D_{2}^\top +D_{2} A_{P} \Sigma^2_{\pi} D_{1}^\top+D_{2} \Sigma^2_{\pi} D_{2}^\top ,\end{equation*}

with

\begin{equation*}\Sigma^2_\pi = \int_S (\textbf{x}-\textbf{p}_{\textbf{0}})(\textbf{x}-\textbf{p}_{\textbf{0}})^\top\pi(\textbf{dx})\end{equation*}

(the variance–covariance matrix under the invariant probability measure $\pi$ ),

\begin{equation*}D_{1} = A_{1}-D_0, \quad \text{and}\quad D_{2} = A_{2}+D_0,\end{equation*}

where $D_0 = (A_{1}+ A_{2} A_{P})(Id - A_{P})^{-1} $ . Moreover, for any $\textbf{c}\in\mathbb{R}^k$ ,

\begin{equation*}\sup_{t}\Big| P\big({\textbf{S}_{\textbf{N}}}\big(\textbf{c}^\top {\textbf{f}}\big)<t\sqrt{N}\big)-\mathcal{N}\big(0,\textbf{c}^\top\Sigma^2\textbf{c}\big)({-}\infty, t)\Big|=O\big(1/\sqrt{N}\big).\end{equation*}

Proof. As a consequence of Definition 2, the spectral radius of $A_{P}$ must be less than one, and hence $Id - A_{P}$ is invertible. Therefore, we may define

\begin{equation*}\textbf{u}_{\textbf{f}}(\textbf{x}) = D_0(\textbf{x}-\textbf{p}_{\textbf{0}}) =(A_{1}+ A_{2} A_{P})(Id - A_{P})^{-1} (\textbf{x}-\textbf{p}_{\textbf{0}}) ,\end{equation*}

so that we have

\begin{align*}\textbf{u}_{\textbf{f}} (\textbf{x}) - (P\textbf{u}_{\textbf{f}})(\textbf{x}) & = (A_{1}+ A_{2} A_{P})(Id - A_{P})^{-1} (\textbf{x}-\textbf{p}_{\textbf{0}})\\ & \qquad -(A_{1}+ A_{2} A_{P})(Id - A_{P})^{-1} A_{P} (\textbf{x}-\textbf{p}_{\textbf{0}})\\ &= (A_{1}+ A_{2} A_{P}) (\textbf{x}-\textbf{p}_{\textbf{0}})\\ &= A_{1} (\textbf{x}-\textbf{p}_{\textbf{0}}) \int_S P(\textbf{x},\textbf{dy})+ A_{2}\int_S (\textbf{y}-\textbf{p}_{\textbf{0}}) P(\textbf{x},\textbf{dy})\\ &= \int_S {\textbf{f}}(\textbf{x},\textbf{y}) P(\textbf{x},\textbf{dy}) - \textbf{0} \\ &= \int_S {\textbf{f}}(\textbf{x},\textbf{y}) P(\textbf{x},\textbf{dy}) -\int_S \int_S {\textbf{f}}(\textbf{x},\textbf{y}) P(\textbf{x},\textbf{dy}) \pi(\textbf{dx}) .\end{align*}

We immediately get that the function $ \textbf{g}(\textbf{x},\textbf{y}) ={\textbf{f}}(\textbf{x},\textbf{y}) +\textbf{u}_{\textbf{f}} (\textbf{y}) -\textbf{u}_{\textbf{f}} (\textbf{x}) $ is linear and it may be written as $ \textbf{g}(\textbf{x},\textbf{y}) =D_{1}(\textbf{x}-\textbf{p}_{\textbf{0}}) + D_{2}(\textbf{y}-\textbf{p}_{\textbf{0}})$ . Taking into account that

\begin{equation*}\int_S \int_S (\textbf{y}-\textbf{p}_{\textbf{0}}) P(\textbf{x},\textbf{dy}) \pi(\textbf{dx}) =\int_S (\textbf{y}-\textbf{p}_{\textbf{0}}) \pi(\textbf{dy}) = \textbf{0},\end{equation*}
(A.4) \begin{equation}\int_S \int_S (\textbf{y}-\textbf{p}_{\textbf{0}})(\textbf{y}-\textbf{p}_{\textbf{0}})^\top P(\textbf{x},\textbf{dy})\pi(\textbf{dx}) =\int_S (\textbf{y}-\textbf{p}_{\textbf{0}})(\textbf{y}-\textbf{p}_{\textbf{0}})^\top \pi(\textbf{dy}) = \Sigma^2_{\pi},\end{equation}

and

(A.5) \begin{equation}\int_S \int_S (\textbf{y}-\textbf{p}_{\textbf{0}})(\textbf{x}-\textbf{p}_{\textbf{0}})^\top P(\textbf{x},\textbf{dy})\pi(\textbf{dx}) =A_{P}\int_S (\textbf{x}-\textbf{p}_{\textbf{0}})(\textbf{x}-\textbf{p}_{\textbf{0}})^\top \pi(\textbf{dx}) =A_{P} \Sigma^2_{\pi},\end{equation}

we can compute the quantity

\begin{align*}\int_S \int_S \textbf{g}(\textbf{x},\textbf{y})\textbf{g}(\textbf{x},\textbf{y})^\top & P(\textbf{x},\textbf{dy}) \pi(\textbf{dx})\\ & = \int_S \int_S \big[ D_{1}(\textbf{x}-\textbf{p}_{\textbf{0}})+ D_{2}(\textbf{y}-\textbf{p}_{\textbf{0}}) \big] \\ & \qquad \qquad \qquad\big[ D_{1}(\textbf{x}-\textbf{p}_{\textbf{0}})+ D_{2}(\textbf{y}-\textbf{p}_{\textbf{0}}) \big]^\top P(\textbf{x},\textbf{dy}) \pi(\textbf{dx})\\ & = D_{1} \Sigma^2_{\pi} D_{1}^\top+D_{1} \Sigma^2_{\pi} A_{P}^\top D_{2}^\top +D_{2} A_{P} \Sigma^2_{\pi} D_{1}^\top+D_{2} \Sigma^2_{\pi} D_{2}^\top\\ &= \Sigma^2.\end{align*}

By the Cramér–Wold device, the theorem is proven with $\Sigma^2$ given above if we prove that, for any $\textbf{c}$ ,

\begin{equation*}\textbf{c}^\top\frac{\textbf{S}_{\textbf{N}}({\textbf{f}})}{\sqrt{N}}=\frac{\textbf{S}_{\textbf{N}}\big(\textbf{c}^\top{\textbf{f}}\big)}{\sqrt{N}}\mathop{\longrightarrow}^{d}_{N\to\infty}\mathcal{N}\big(0,\textbf{c}^\top\Sigma^2\textbf{c}\big).\end{equation*}

Therefore, in order to conclude, it is enough to note that the above convergence is a consequence of Theorem 8 with $f =\textbf{c}^\top{\textbf{f}}$ . Indeed, by definition $f\in Lip(S\times S)$ and the function $u_f\in Lip(S)$ in (A.1) may be chosen as $ u_f = \textbf{c}^\top\textbf{u}_{\textbf{f}}$ , so that $m(f)=0 $ and $ \sigma^2(f)=\textbf{c}^\top\Sigma^2\textbf{c}$ .
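In applications, the covariance matrix of Theorem 9 is easily assembled from $A_1$ , $A_2$ , $A_P$ , and $\Sigma^2_\pi$ . The following Python sketch (illustrative only) does so and, as a sanity check in the scalar setting of Section 3.1, recovers the expression (23):

\begin{verbatim}
import numpy as np

def clt_covariance(A1, A2, AP, Sigma_pi):
    """Asymptotic covariance of S_N(f)/sqrt(N) in Theorem 9 for a linear model."""
    k = AP.shape[0]
    D0 = (A1 + A2 @ AP) @ np.linalg.inv(np.eye(k) - AP)
    D1, D2 = A1 - D0, A2 + D0
    return (D1 @ Sigma_pi @ D1.T + D1 @ Sigma_pi @ AP.T @ D2.T
            + D2 @ AP @ Sigma_pi @ D1.T + D2 @ Sigma_pi @ D2.T)

# Scalar sanity check: A_1 = -beta/(gamma-beta), A_2 = 1/(gamma-beta), A_P = gamma,
# as in Section 3.1; the result must coincide with (23) (here Sigma^2_pi = 1).
beta, gamma = 0.4, 0.7
A1 = np.array([[-beta / (gamma - beta)]])
A2 = np.array([[1.0 / (gamma - beta)]])
AP = np.array([[gamma]])
print(float(clt_covariance(A1, A2, AP, np.eye(1))))
print((1 - beta) ** 2 * (1 + gamma) / ((gamma - beta) ** 2 * (1 - gamma)))
\end{verbatim}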

A.2. Coupling technique

The result proven in this subsection plays a relevant rôle in the proof of Theorem 2. Indeed, it shows that, under suitable assumptions, two stochastic processes can be coupled on the same probability space in such a way that each of them keeps its original joint distribution, while the distance between the coupled trajectories can be controlled.

Set $S^*=\{\textbf{x}\,:\,x_i\geq 0,\; |\textbf{x}|= 1\} $ (i.e. the standard (or probability) simplex in $\mathbb{R}^k$), and recall that $\{\textbf{e}_{\textbf{1}}, \ldots, \textbf{e}_{\textbf{k}}\}$ denotes the canonical basis of $\mathbb{R}^k$. We have the following technical lemma.

Lemma 2. There exist two measurable functions $\textbf{h}^{(\textbf{1})},\textbf{h}^{(\textbf{2})}\,:\,S^*\times S^*\times (0, 1) \to \{\textbf{e}_{\textbf{1}}, \ldots, \textbf{e}_{\textbf{k}}\}$ , such that for any ${\textbf{x} ,\textbf{y} \in S^*}$ ,

(A.6) \begin{equation}\begin{aligned}&\int_{(0, 1)} {\unicode{x1D7D9}}_{\big\{\textbf{h}^{(\textbf{1})} (\textbf{x} , \textbf{y}, u) = \textbf{e}_{\textbf{i}}\big\}} du = x_i,&& \forall i = 1, \ldots, k,\\ &\int_{(0, 1)} {\unicode{x1D7D9}}_{\big\{\textbf{h}^{(\textbf{2})} (\textbf{x} , \textbf{y}, u) = \textbf{e}_{\textbf{i}}\big\}} du = y_i,&& \forall i = 1, \ldots, k,\end{aligned}\end{equation}

and

(A.7) \begin{equation}\int_{(0, 1)} {\unicode{x1D7D9}}_{\big\{\textbf{h}^{(\textbf{1})} (\textbf{x} , \textbf{y}, u) \neq\textbf{h}^{(\textbf{2})} (\textbf{x} , \textbf{y}, u) \big\}} du\leq \frac{|\textbf{x} -\textbf{y}|}{2}.\end{equation}

As a consequence, we have

(A.8) \begin{equation}\int_{(0, 1)} \big| \textbf{h}^{(\textbf{1})} (\textbf{x} , \textbf{y}, u) -\textbf{h}^{(\textbf{2})} (\textbf{x} , \textbf{y}, u) \big| du\leq |\textbf{x} -\textbf{y}|.\end{equation}

Proof. Given $\textbf{x},\textbf{y} \in S^*$ , define ${\underline{\textbf{xy}}}= \textbf{x}\wedge \textbf{y}$ so that $\underline{xy}_{\,i}= \min\!(x_{i},y_i)$ . Set $u_0= |{\underline{\textbf{xy}}}| = \sum_{i=1}^k \min\!(x_{i},y_i)$ , and note that $0 \leq u_0 \leq 1$ . Moreover, for any $i \in \{1,\ldots,k\}$ , set

\begin{equation*}\begin{aligned}A_{{\underline{\textbf{xy}}}\,i} & = \Bigg\{u \,:\, \sum_{j=1}^{i-1} \underline{xy}_{\,j} < u\leq\sum_{j=1}^{i} \underline{xy}_{\,j}\Bigg\},\\ A_{\textbf{x}\,i} & = \Bigg\{u \,:\, u_0+\sum_{j=1}^{i-1} \big(x_j-\underline{xy}_{\,j}\big) < u\leq u_0+ \sum_{j=1}^{i} \big(x_j -\underline{xy}_{\,j}\big) \Bigg\},\\ A_{\textbf{y}\,i} & = \Bigg\{u \,:\, u_0+\sum_{j=1}^{i-1} \big(y_j-\underline{xy}_{\,j}\big) < u\leq u_0+ \sum_{j=1}^{i} \big(y_j -\underline{xy}_{\,j}\big) \Bigg\},\end{aligned}\end{equation*}

and let

\begin{equation*}\begin{aligned} \textbf{h}^{(\textbf{1})} (\textbf{x} , \textbf{y}, u) = \textbf{e}_{\textbf{i}} \qquad \text{if } u \in A_{{\underline{\textbf{xy}}}\,i} \cup A_{\textbf{x}\,i}, \quad \mbox{and } \\ \textbf{h}^{(\textbf{2})} (\textbf{x} , \textbf{y}, u) = \textbf{e}_{\textbf{i}} \qquad \text{if } u \in A_{{\underline{\textbf{xy}}}\,i} \cup A_{\textbf{y}\,i} .\end{aligned} \end{equation*}

Observe that since $1 = u_0 + \sum_{i=1}^{k} \big(x_i-\underline{xy}_{\,i}\big)= u_0 + \sum_{i=1}^{k} \big(y_i-\underline{xy}_{\,i}\big)$ , the equalities above uniquely define $\textbf{h}^{(\textbf{1})},\textbf{h}^{(\textbf{2})}$ on the whole domain. Moreover, since $x_i =\underline{xy}_{\,i} +\big(x_i-\underline{xy}_{\,i}\big)$ and $y_i = \underline{xy}_{\,i}+\big(y_i-\underline{xy}_{\,i}\big)$ , the two conditions collected in Equation (A.6) are satisfied.

To check (A.7), just note that $\textbf{h}^{(\textbf{1})} (\textbf{x} , \textbf{y}, u) $ is equal to $\textbf{h}^{(\textbf{2})} (\textbf{x}, \textbf{y}, u) $ on the set $\cup_i A_{{\underline{\textbf{xy}}}\,i} =(0,u_0)$ , and we have

\begin{equation*}\begin{aligned}2(1 - u_0) & = \sum_{i=1}^k x_i + \sum_{i=1}^k y_i -2 \sum_{i=1}^k\underline{xy}_{\,i}= \sum_{i=1}^k \big( x_i + y_i - 2\min\!(x_i,y_i) \big)\\& =\sum_{i=1}^k \big(\max\!(x_i,y_i) - \min\!(x_i,y_i)\big) = |\textbf{x} -\textbf{y}| .\end{aligned}\end{equation*}

Finally, (A.8) follows immediately from (A.7) since $| \textbf{h}^{(\textbf{1})}-\textbf{h}^{(\textbf{2})} |\leq 2$ .
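The functions $\textbf{h}^{(\textbf{1})}$ and $\textbf{h}^{(\textbf{2})}$ constructed above essentially realize the maximal coupling of the two discrete distributions with weights $\textbf{x}$ and $\textbf{y}$ . A minimal Python sketch (illustrative only) implements the partition of (0, 1) used in the proof and checks (A.6) and (A.7) empirically:

\begin{verbatim}
import numpy as np

def coupled_draw(x, y, u):
    """Return (h1, h2) in {e_1,...,e_k}^2 for x, y in the simplex and u in (0,1),
    following the partition of (0,1) used in the proof of Lemma 2."""
    k = len(x)
    xy = np.minimum(x, y)
    u0 = xy.sum()
    def index(weights, start, t):
        # first i with start + sum_{j<=i} weights_j >= t (intervals closed on the right)
        return int(np.searchsorted(start + np.cumsum(weights), t, side="left"))
    if u <= u0:
        i = j = index(xy, 0.0, u)         # on (0, u0] the two draws coincide
    else:
        i = index(x - xy, u0, u)
        j = index(y - xy, u0, u)
    e = np.eye(k)
    return e[i], e[j]

# Empirical check of (A.6) and (A.7) on two illustrative points of the simplex.
x = np.array([0.5, 0.3, 0.2])
y = np.array([0.2, 0.3, 0.5])
us = (np.arange(100000) + 0.5) / 100000                  # grid approximating du on (0,1)
h = np.array([coupled_draw(x, y, u) for u in us])        # shape (M, 2, k)
print(h[:, 0, :].mean(axis=0))                           # approximately x
print(h[:, 1, :].mean(axis=0))                           # approximately y
print((h[:, 0, :] != h[:, 1, :]).any(axis=1).mean(), np.abs(x - y).sum() / 2)
\end{verbatim}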

Now we are ready to prove the following ‘coupling result’.

Theorem 10. Let $\boldsymbol{{\psi}^{(1)}}= \big(\boldsymbol{{\psi}_{n}^{(1)}}\big)_n$ and $\boldsymbol{{\psi}^{(2)}}=\big(\boldsymbol{{\psi}_{n}^{(2)}}\big)_n$ be two stochastic processes with values in $S^*$ that evolve according to the following dynamics:

(A.9) \begin{equation}\begin{aligned}\boldsymbol{{\psi}_{\textbf{n}+1}^{(\ell)}} & = a_0 \boldsymbol{{\psi}_{\textbf{n}}^{(\ell)}}+ a_1\boldsymbol{\xi_{\textbf{n}+1}^{(\ell)}}+ {\textbf{l}_{\textbf{n}+\textbf{1}}^{(\boldsymbol{\ell})}}\Big(\boldsymbol{{\psi}_{\textbf{n}}^{(\ell)}},\boldsymbol{\xi_{\textbf{n}+1}^{(\ell)}}\Big)+ \textbf{c} ,&& \ell=1,\,2,\end{aligned}\end{equation}

where $a_0,a_1 \geq 0$ , $\textbf{c}\in\mathbb{R}^k$ , $\boldsymbol{\xi_{\textbf{n}+1}^{(\ell)}}$ are random variables taking values in $\{\textbf{e}_{\textbf{1}}, \ldots, \textbf{e}_{\textbf{k}}\}$ and such that

(A.10) \begin{equation}\begin{aligned}P\Big(\boldsymbol{\xi_{\textbf{n}+1}^{(\ell)}} = \textbf{e}_{\textbf{i}} \Big|\boldsymbol{{\psi}_{0}^{(\ell)}},\, \boldsymbol{\xi_{1}^{(\ell)}},\ldots,\boldsymbol{\xi_{\textbf{n}}^{(\ell)}}\Big)& =\\P\Big(\boldsymbol{\xi_{\textbf{n}+1}^{(\ell)}}= \textbf{e}_{\textbf{i}}\Big| \boldsymbol{{\psi}_{\textbf{n}}^{(\ell)}} \Big)& = {{\psi}_{n\,i}^{(\ell)}},\quad \text{for } i= 1, \dots,\,k,\end{aligned}\end{equation}

and ${\textbf{l}_{\textbf{n}+\textbf{1}}^{(\boldsymbol{\ell})}}$ are measurable functions such that $\big|{\textbf{l}^{(\boldsymbol{\ell})}_{\textbf{n}+\textbf{1}}}\big|=O\big(c_{n+1}^{(\ell)}\big)$ . Then there exist two stochastic processes ${\widetilde{\boldsymbol{\psi}}^{(\boldsymbol{\ell})}}=\big({\widetilde{\boldsymbol\psi}_{n}^{(\ell)}}\big)_{n\geq 0}$ , $\ell=1,2$ , evolving according to the dynamics

(A.11) \begin{equation}\begin{aligned}{\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(\ell)}} & = a_0 {\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(\ell)}}+ a_1{\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}}+ {\textbf{l}_{\textbf{n}+\textbf{1}}^{(\boldsymbol{\ell})}}\Big({\widetilde{\boldsymbol\psi}_{n}^{(\ell)}},{\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}}\Big)+ \textbf{c} ,&& \ell=1,\,2,\end{aligned}\end{equation}

with ${\widetilde{\boldsymbol\psi}_0^{(\ell)}}=\boldsymbol{\psi_{0}^{(\ell)}}$ and

(A.12) \begin{equation}\begin{aligned}P\Big({\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}} = \textbf{e}_{\textbf{i}} \Big|\boldsymbol{{\psi}_{0}^{(1)}},\,\boldsymbol{{\psi}_{0}^{(2)}},{\widetilde{\boldsymbol\xi}_{1}^{(1)}},{\widetilde{\boldsymbol\xi}_{1}^{(2)}},\ldots,{\widetilde{\boldsymbol\xi}_{n}^{(1)}}, {\widetilde{\boldsymbol\xi}_{\textbf{n}}^{(2)}}\Big)& =\\P\Big({\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}}= \textbf{e}_{\textbf{i}}\Big| {\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(\ell)}} \Big)& = {\widetilde{\psi}_{n\,i}^{(\ell)}},\quad \text{for } i= 1, \dots,\,k,\end{aligned}\end{equation}

and such that, for any $n\geq 0$ , we have

(A.13) \begin{equation}E\Big[ \Big|{\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(2)}} - {\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(1)}}\Big|\Big|{\widetilde{\boldsymbol\psi}_{\textbf{m}}^{(1)}} , {\widetilde{\boldsymbol\psi}_{\textbf{m}}^{(2)}} , m\leq n\Big]\leq(a_0+a_1)\big|{\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(2)}} - {\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(1)}}\big| + O\big(c_{n+1}^{(1)}\big) + O\big(c_{n+1}^{(2)}\big).\end{equation}

Remark 6. As a consequence, for each $\ell=1,\,2$ , the two stochastic processes ${\widetilde{\boldsymbol{\psi}}^{(\boldsymbol{\ell})}}$ and ${\widetilde{\boldsymbol\xi}^{(\boldsymbol\ell)}}$ have the same joint distributions as $\boldsymbol{\psi}^{(\boldsymbol{\ell})}$ and $\boldsymbol{\xi}^{(\boldsymbol\ell)}$ , respectively. Indeed, ${\widetilde{\boldsymbol\psi}^{(\boldsymbol\ell)}_{\textbf{0}}}={\boldsymbol{\psi}^{(\boldsymbol\ell)}_{\textbf{0}}}$ , and by (A.9), (A.10), (A.11), and (A.12), the conditional distributions of ${\widetilde{\boldsymbol\psi}^{(\boldsymbol\ell)}_{\textbf{n}+\textbf{1}}}$ given $\Big[{\widetilde{\boldsymbol\psi}^{(\boldsymbol\ell)}_{\textbf{0}}},\dots,\,{\widetilde{\boldsymbol\psi}^{(\boldsymbol\ell)}_{\textbf{n}}}\Big]$ and of ${\widetilde{\boldsymbol\xi}^{(\boldsymbol\ell)}_{\textbf{n}+\textbf{1}}}$ given $\Big[{\widetilde{\boldsymbol\psi}^{(\boldsymbol\ell)}_{\textbf{0}}},{\widetilde{\boldsymbol\xi}^{(\boldsymbol\ell)}_{\textbf{1}}},\dots,\, {\widetilde{\boldsymbol\xi}^{(\boldsymbol\ell)}_{\textbf{n}}}\Big]$ are the same as those of ${\boldsymbol{\psi}^{(\boldsymbol\ell)}_{\textbf{n}+\textbf{1}}}$ given $\Big[{\boldsymbol{\psi}^{(\boldsymbol\ell)}_{\textbf{0}}},\dots,\,\boldsymbol{{\psi}^{(\ell)}_n}\Big]$ and of $\boldsymbol{{\xi}^{(\ell)}_{\textbf{n}+1}}$ given $\Big[{\boldsymbol{\psi}^{(\boldsymbol\ell)}_{\textbf{0}}}, \boldsymbol{{\xi}^{(\ell)}_1},\dots,\, \boldsymbol{{\xi}^{(\ell)}_{\textbf{n}}}\Big]$ , respectively.

Moreover, from the inequality (A.13), by recursion, we obtain

(A.14) \begin{equation}\begin{split}E\Big[ \Big|{\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(2)}} - {\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(1)}}\Big|\Big|\boldsymbol{\psi_{0}^{(1)}} , \boldsymbol{\psi_{0}^{(2)}}\Big]& \leq(a_0+a_1)^{n+1}\big|\boldsymbol{\psi_{0}^{(2)}} - \boldsymbol{\psi_{0}^{(1)}}\big|\\& + O\Bigg(\sum_{j=1}^{n+1} \big(a_0+a_1\big)^{n+1-j} \Big(c_{j}^{(1)} + c_{j}^{(2)} \Big)\Bigg).\end{split}\end{equation}

Proof. We set ${\widetilde{\boldsymbol\psi}_{0}^{(\ell)}}= \boldsymbol{{\psi}_{0}^{(\ell)}}$ , for $\ell=1,\,2$ , and we take a sequence $(U_n)_{n\geq 1}$ of i.i.d. (0, 1)-uniform random variables, independent of $\sigma\big(\boldsymbol{{\psi}_{0}^{(1)}},\boldsymbol{{\psi}_{0}^{(2)}} \big)$ . Then we take the two functions $\textbf{h}^{(\textbf{1})},\textbf{h}^{(\textbf{2})}$ of Lemma 2, and for each $\ell$ and any $n\geq 0$ we recursively define

\begin{align*}{\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}} & ={\textbf{h}^{(\boldsymbol{\ell})}}\Big({\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(1)}} ,{\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(2)}} , U_{n+1}\Big),\\ {\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(\ell)}} & = a_0 {\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(\ell)}}+ a_1{\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}}+{\textbf{l}_{\textbf{n}+\textbf{1}}^{(\boldsymbol{\ell})}}\Big({\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(\ell)}}, {\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}} \Big)+\textbf{c}.\end{align*}

Setting $\widetilde{\mathcal{F}}_n =\sigma\Big(\boldsymbol{{\psi}_{0}^{(1)}},\boldsymbol{{\psi}_{0}^{(2)}}, U_1,\ldots, U_n\Big)$ , we have that $U_{n+1}$ is independent of $\widetilde{\mathcal{F}}_n$ and, by definition, ${\widetilde{\boldsymbol\xi}_{\textbf{n}}^{(\ell)}}$ and ${\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(\ell)}}$ are $\widetilde{\mathcal{F}}_n$ -measurable, for any $\ell=1,2$ and $n\geq 0$ . Therefore, using the relation (A.6), we get, for any $\ell=1,\,2$ , $n\geq 0$ , and $i=1,\ldots,k$ ,

\begin{equation*}P\Big({\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(\ell)}}= \textbf{e}_{\textbf{i}}\big| \widetilde{\mathcal{F}}_n \Big) = \int{\unicode{x1D7D9}}_{\big\{{\textbf{h}^{(\boldsymbol{\ell})}}({\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(1)}},\,{\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(2)}},\, u)= \textbf{e}_{\textbf{i}}\big\}} \, du= \widetilde{\psi}_{n\, i}^{(\ell)}.\end{equation*}

This means that (A.11), together with (A.12), holds true. Finally, by the relation (A.8), we have

\begin{equation*}\begin{split}E\Big[ \Big|{\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(2)}} - {\widetilde{\boldsymbol\xi}_{\textbf{n}+1}^{(1)}}\Big|\Big|\widetilde{\mathcal{F}}_n\Big]&=\int_{(0, 1)}\Big|\textbf{h}^{(\textbf{1})} \Big({\widetilde{\boldsymbol\psi}^{(1)}_{\textbf{n}}} ,{\widetilde{\boldsymbol\psi}^{(2)}_{\textbf{n}}} ,\, u\Big)-\textbf{h}^{(\textbf{2})} \Big({\widetilde{\boldsymbol\psi}^{(1)}_{\textbf{n}}} ,{\widetilde{\boldsymbol\psi}^{(2)}_{\textbf{n}}},\, u\Big) \Big|du\\ &\leq \big|{\widetilde{\boldsymbol\psi}^{(1)}_{\textbf{n}}} - {\widetilde{\boldsymbol\psi}^{(2)}_{\textbf{n}}}\big|;\end{split}\end{equation*}

hence, subtracting (A.11) with $\ell=2$ from the same relation with $\ell=1$ , we obtain

\begin{equation*}E\Big[ \Big|{\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(2)}} - {\widetilde{\boldsymbol\psi}_{\textbf{n}+1}^{(1)}}\Big|\Big|\widetilde{\mathcal{F}}_n\Big]\leq a_0\big|{\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(2)}} - {\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(1)}}\big| +a_1\big|{\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(2)}} - {\widetilde{\boldsymbol\psi}_{\textbf{n}}^{(1)}}\big|+ O\big(c_{n+1}^{(1)}\big) + O\big(c_{n+1}^{(2)}\big),\end{equation*}

and so the inequality (A.13) holds true.

Declaration

All the authors developed the theoretical results, performed the numerical simulations, and contributed to the final version of the manuscript.

Acknowledgements

Giacomo Aletti is a member of the Gruppo Nazionale per il Calcolo Scientifico, of the Istituto Nazionale di Alta Matematica (Italy). Irene Crimaldi is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni, of the Istituto Nazionale di Alta Matematica.

Funding information

Irene Crimaldi is partially supported by the Programma di Attività Integrata (PAI), under the project ‘TOol for Fighting FakEs’ (TOFFE) funded by the IMT School for Advanced Studies Lucca.

Competing interests

The authors declare that no competing interests arose during the preparation or publication of this article.


Figure 1. Simulations of the two processes $(\psi_{n\, 1})_n$ (grey) and $(\bar{\xi}_{n\,1} )_n$ (black), with $n = 1, \ldots, 20000$, $p_{0\,1} = \frac{1}{2}$, and for different values of $\alpha$ and $\beta$: (A) $\alpha = 199$, $\beta = 0$; (B) $\alpha = 1$, $\beta = 0.975$; (C) $\alpha = 1$, $\beta = 1$; (D) $\alpha = 0.5$, $\beta = 1.0001$. As shown, when $\beta <1$, $(\psi_{n\, 1})_n$ exhibits a persistent, locally reinforced fluctuation, while $(\bar{\xi}_{n\,1} )_n$ converges to the deterministic limit $p_{0\,1}$. When $\beta \geq 1$, the y-axis is zoomed in to show the random fluctuations of both processes towards the same random limit.
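
For readers who wish to reproduce qualitative pictures of this kind, the following sketch (in Python) simulates a two-colour urn under one possible reading of the rescaled dynamics, in which the 'random' component of the composition is multiplied by $\beta$ at every step and increased by $\alpha$ on the drawn colour, while the drawing probabilities are proportional to the total composition. This update rule, the initial composition, and the total weight of the fixed component are assumptions made here only for illustration; the model actually studied is the one defined earlier in the paper.

\begin{verbatim}
import numpy as np

# Hypothetical sketch of a two-colour "rescaled" urn: the random component
# B_n is multiplied by beta at each step and reinforced by alpha on the
# drawn colour, while the fixed component b_0 is never modified.  The
# update rule and all values not taken from the caption are assumptions.
def simulate(alpha, beta, n=20_000, p01=0.5, b0_tot=2.0, seed=0):
    rng = np.random.default_rng(seed)
    b0 = np.array([p01, 1.0 - p01]) * b0_tot   # fixed component, p_{0 1} = 1/2
    B = np.zeros(2)                            # random (reinforced) component
    psi1, xbar1 = np.empty(n), np.empty(n)
    count1 = 0
    for t in range(n):
        psi = (b0 + B) / (b0 + B).sum()        # predictive probabilities
        colour1 = rng.random() < psi[0]        # draw colour 1 w.p. psi[0]
        count1 += colour1
        B = beta * B                           # rescale the old reinforcement
        B[0 if colour1 else 1] += alpha        # reinforce the drawn colour
        psi1[t] = psi[0]                       # predictive mean of colour 1
        xbar1[t] = count1 / (t + 1)            # empirical mean of colour 1
    return psi1, xbar1

# The four parameter pairs of Figure 1:
for alpha, beta in [(199, 0.0), (1, 0.975), (1, 1.0), (0.5, 1.0001)]:
    psi1, xbar1 = simulate(alpha, beta)
    print(alpha, beta, float(psi1[-1]), float(xbar1[-1]))
\end{verbatim}

Under these assumptions, $\beta=0$ makes the predictive probability depend almost entirely on the last draw (local reinforcement) while the empirical frequency of colour 1 stabilizes near $p_{0\,1}$, whereas for $\beta\geq 1$ the reinforcement is no longer discounted and the two sequences are expected to approach a common random limit, in line with the behaviour described in the caption.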