
Mean and variance of balanced Pólya urns

Published online by Cambridge University Press:  03 December 2020

Svante Janson*
Affiliation:
Uppsala University
*Postal address: Department of Mathematics, Uppsala University, PO Box 480, SE-751 06 Uppsala, Sweden. Email: svante.janson@math.uu.se

Abstract

It is well-known that in a small Pólya urn, i.e., an urn where the second largest real part of an eigenvalue is at most half the largest eigenvalue, the distribution of the numbers of balls of different colours in the urn is asymptotically normal under weak additional conditions. We consider the balanced case, and then give asymptotics of the mean and the covariance matrix, showing that after appropriate normalization, the mean and covariance matrix converge to the mean and covariance matrix of the limiting normal distribution.

Type
Original Article
Copyright
© Applied Probability Trust 2020

1. Introduction

A (generalized) Pólya urn contains balls of different colours. A ball is drawn at random from the urn, and is replaced by a set of balls that depends on the colour of the drawn ball. (Moreover, the replacement set may be random, with a distribution depending on the drawn colour). This is repeated an infinite number of times, and we are interested in the asymptotic composition of the urn. For details, and the assumptions used in the present paper, see Section 2; for the history of Pólya urns, see e.g. [Reference Mahmoud24].

It is well-known, and proved under various conditions in a number of papers by a variety of authors (see e.g. [Reference Janson19, Theorems 3.22–3.24]), that the asymptotic behaviour depends on the eigenvalues of the intensity matrix of the urn, defined in (2.5) below, and in particular on the two largest (in real part) eigenvalues, $\lambda_1$ and $\lambda_2$ . If $\textrm{Re}\ \lambda_2\leqslant\frac12\lambda_1$ (a small urn), then, under some assumptions (including some version of irreducibility), the number of balls of a given colour is asymptotically normal, while if $\textrm{Re}\ \lambda_2>\frac12\lambda_1$ (a large urn), then this is not true: there are (again under some assumptions, and after suitable normalization) limits in distribution, but the limiting distributions have no simple description and are (typically, at least) not normal; furthermore, there may be oscillations so that suitable subsequences converge in distribution but the full sequence does not. Another difference is that for a small urn, the limit is independent of the initial state, and therefore independent of what happens in any fixed finite set of draws (i.e., the limit theorem is mixing; see [Reference Aldous and Eagleson1, Proposition 2]), while for a large urn, on the contrary, there is an almost sure (a.s.) limit result and thus the limit is essentially determined by what happens early in the process.

For large urns, assuming that the urn is balanced (see Section 2), Pouyanne [Reference Pouyanne29] proved a limit theorem which shows such an a.s. result and also shows convergence in $L^p$ for any p, and thus convergence of all moments.

For small urns, however, less has been known about moment convergence in general. Balanced deterministic urns with two colours were considered already by Bernstein [Reference Bernstein9, Reference Bernstein10], who both showed the asymptotic normality in the small urn case and gave results on mean and variance; Savkevitch [Reference Savkevitch30] also considered this urn and studied the mean and variance and, moreover, the third and fourth moments. Bagchi and Pal [Reference Bagchi and Pal5] (independently, but 45 years later) gave another proof of asymptotic normality for balanced deterministic small urns with two colours, by using the method of moments and thus proving moment convergence as part of the proof. Bai and Hu [Reference Bai and Hu6, Reference Bai and Hu7], who consider an arbitrary number of colours and allow random replacements (under somewhat different conditions than ours, allowing also time-dependent replacements), show asymptotic results for the mean and (co)variance as part of their proofs of asymptotic normality for small urns, using the same decomposition of the covariance matrix as in the present paper; however, these results are hidden inside the proof and are not stated explicitly. More recently, Janson and Pouyanne [Reference Janson and Pouyanne21] proved explicit results on asymptotics of mean and variance, and also higher central moments of arbitrary order, for irreducible small urns; the method there combined the known result on asymptotic normality in this case with moment estimates by the method in [Reference Pouyanne29], leading to uniform integrability.

Note that asymptotics of mean and variance often are of interest in applications, complementing results on convergence in distribution. (In some applications, results on mean and variance have been proved separately by other methods.) Moreover, although not surprising, it is satisfying to know that in a case where a central limit theorem holds, the mean and variance also converge as suggested by this central limit theorem. In particular, loosely speaking, the variance of the number of balls of a given colour is asymptotic to the asymptotic variance. Furthermore, for this random number, the standard normalization by mean and standard deviation yields convergence to a standard normal distribution.

The main purpose of the present paper is to give explicit asymptotics for the first and second moments for a balanced small urn by an elementary direct method. (Our assumptions are somewhat weaker than in [Reference Janson and Pouyanne21]; we do not assume that a central limit theorem holds, and also some non-normal cases are included; see Remark 3.2.) We also include a simple result on non-degeneracy of the limit (Theorem 3.5). Precise statements are given in Section 3. Some results (e.g. Theorem 3.1 and the lemmas in Sections 5 and 6) apply also to large urns.

Our method is closely related to the one used by e.g. [Reference Bai and Hu6, Reference Bai and Hu7]; it is also related to the method of [Reference Pouyanne29] and [Reference Janson and Pouyanne21], but substantially simpler. The main idea (which has been used in various forms earlier, in particular by [Reference Bai and Hu6, Reference Bai and Hu7]) is that the drawing of a ball and the subsequent addition of a set of balls, at time k, say, influences the composition of the urn at a later time n not only directly by the added balls, but also indirectly since the added balls change the probabilities for later draws. By including the expectation of these later indirect effects, we find the real effect at time n of the draw at time k, and we may write the composition at time n as the sum, for $k\leqslant n$ , of these contributions; see (4.11). The contributions for different k are orthogonal, and thus the variance can be found by summing the variances of these contributions.

Another purpose of the present paper is to demonstrate in detail this elementary method and how it can be used to obtain important results. We believe that although the method is closely related to earlier proofs in e.g. [Reference Bai and Hu6; Reference Bai and Hu7; Reference Hu and Rosenberger17], making the decomposition (4.11) explicit illuminates both the proofs and the general behaviour of Pólya urns, in particular the difference between large and small urns. See the comments in Section 8.

Section 2 gives definitions and introduces the notation. Section 3 contains the statements of the main results, which are proved in Sections 4–6. Section 7 presents some applications, and Section 8 contains some further comments on where the variance comes from, i.e., which draws are most important, and the difference between small and large urns. The appendices give some further, more technical, results.

Remark 1.1. We consider in the present paper only the mean and (co)variance. As stated above, similar results on convergence of higher moments for balanced small urns are given in [Reference Janson and Pouyanne21] (under somewhat more restrictive assumptions than in the present paper), by another method, based on showing uniform integrability. (An anonymous referee of the first version of the present paper has suggested another, simpler, way to show uniform integrability; this can be used to simplify the proofs in [Reference Janson and Pouyanne21].)

It is possible that the method in the present paper can be extended to handle higher moments too, but we do not see any immediate extension. On the other hand, for the first and second moments, the present method seems simpler, and perhaps also more informative, than the one in [Reference Janson and Pouyanne21].

Problem 1.1. In the present paper, we consider only balanced urns. We leave it as a challenging open problem to prove (or disprove?) similar results for non-balanced urns.

2. Pólya urns

2.1. Definition and assumptions

A (generalized) Pólya urn process is defined as follows. (See e.g. Mahmoud [Reference Mahmoud24], Johnson and Kotz [Reference Johnson and Kotz23], Janson [Reference Janson19], Flajolet, Gabarró and Pekari [Reference Flajolet, Gabarró and Pekari14] and Pouyanne [Reference Pouyanne29] for the history and further references, as well as some different methods used to study such urns.) There are balls of q colours (types) $1,\dots, q$ , where $2\leqslant q<\infty$ . The composition of the urn at time n is given by the vector $X_n=(X_{n1},\dots,X_{nq})\in [0,\infty)^q$ , where $X_{ni} $ is the number of balls of colour i. The urn starts with a given vector $X_0$ , and evolves according to a discrete-time Markov process. Each colour i has an activity (or weight) $a_i\geqslant0$ , and a (generally random) replacement vector $\xi_i=(\xi_{i1},\dots, \xi_{iq})$ . At each time $ n+1\geqslant 1 $ , the urn is updated by drawing one ball at random from the urn, with the probability of any ball proportional to its activity. Thus, the drawn ball has colour i with probability

(2.1) \begin{equation} \frac{a_iX_{ni}}{\sum_{j}a_jX_{nj}}.\end{equation}

If the drawn ball has type i, it is replaced together with $\Delta X_{nj}$ balls of type j, $j=1,\dots,q$ , where the random vector $ \Delta X_{n}=(\Delta X_{n1},\dots, \Delta X_{nq}) $ has the same distribution as $ \xi_i$ and is independent of everything else that has happened so far. Thus, the urn is updated to $X_{n+1}=X_n+\Delta X_n$ .
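To make the dynamics concrete, here is a minimal simulation sketch of one such draw/replacement step (a sketch only; `urn_step` and `sample_xi` are illustrative names, not from the paper):

```python
import numpy as np

def urn_step(X, a, sample_xi, rng):
    """One draw/replacement step of a generalized Polya urn.

    X         -- current composition vector (length q; real entries allowed)
    a         -- activities a_i >= 0
    sample_xi -- sample_xi(i, rng) returns a replacement vector with the law of xi_i
    """
    p = a * X / (a @ X)              # drawing probabilities (2.1)
    i = rng.choice(len(X), p=p)      # colour of the drawn ball
    return X + sample_xi(i, rng)     # X_{n+1} = X_n + Delta X_n

# Example: the original Polya urn (all activities 1, xi_i = e_i, so b = 1).
rng = np.random.default_rng(0)
X = np.array([1.0, 1.0])
for _ in range(1000):
    X = urn_step(X, np.ones(2), lambda i, r: np.eye(2)[i], rng)
```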

In many applications, the numbers $X_{nj}$ and $\xi_{ij}$ are integers, but that is not necessary; it has been noted several times that the Pólya urn process is well-defined also for real $X_{ni}$ and $\xi_{ij}$ , with probabilities for the different replacements still given by (2.1); see e.g. [Reference Bai and Hu6], [Reference Bai, Hu and Rosenberger8, p. 126], [Reference Janson19, Remark 4.2], [Reference Janson20, Remark 1.11], and [Reference Pouyanne29], and the earlier [Reference Jiřina22] for the related case of branching processes; the ‘number of balls’ $X_{ni}$ may thus be any non-negative real number. (This can be interpreted as the amount (mass) of colour i in the urn, rather than the number of discrete balls.) The replacements $\xi_{ij}$ are thus random real numbers. We allow them to be negative, meaning that balls may be subtracted from the urn. However, we always assume that $X_0$ and the random vectors $\xi_i$ are such that, for every $n\geqslant0$ , a.s.

(2.2) \begin{equation}\text{each } X_{ni}\geqslant0 \quad \text{and}\quad \sum_i a_i X_{ni}>0,\end{equation}

so that (2.1) really gives meaningful probabilities (and the process does not stop due to lack of balls to be removed). An urn with such initial conditions and replacement rules is called tenable.

Remark 2.1. A sufficient condition for tenability, which often is assumed in other papers (sometimes with simple modifications), is that all $\xi_{ij}$ and $X_{0i}$ are integers with $\xi_{ij}\geqslant0$ for $j\neq i$ and $\xi_{ii}\geqslant -1$ (this means that we may remove the drawn ball but no other ball), and furthermore, for example, $\sum_ja_j\xi_{ij}\geqslant0$ a.s. (meaning that the total activity never decreases); then the urn is tenable for any $X_0$ with non-zero activity. This is satisfied in most applications we know of, but not all; see Remark 7.1 for a different example. We shall not assume this condition in the present paper, unless explicitly stated.

Remark 2.2. In all applications that we know of, each $\xi_i$ is a discrete random vector, i.e. it takes only a countable (usually a finite) number of different values. This is not necessary, however; the results below hold also if, e.g., some $\xi_{ij}$ is continuous.

We assume, for simplicity, that the initial composition $X_0$ is deterministic.

Remark 2.3. The results are easily extended to the case of random $X_0$ by conditioning on $X_0$ , but that may require some extra conditions or minor modifications in some of the statements, which we leave to the reader.

The Pólya urn is balanced if

(2.3) \begin{equation} \sum_j a_j\xi_{ij} = b>0\end{equation}

(a.s.) for some constant b and every i. In other words, the added activity after each draw is fixed (non-random and not depending on the colour of the drawn ball). This implies that the denominator in (2.1) (which is the total activity in the urn) is deterministic for each n; see (4.9). This is a significant simplification, and is assumed in many papers on Pólya urns. (One exception is [Reference Janson19], which is based on embedding in a continuous-time branching process and stopping at a suitable stopping time, following [Reference Athreya and Karlin3]; this method does not seem to easily give information on moments and is not used in the present paper.)
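For deterministic replacement vectors, the balance condition (2.3) is a one-line check; a small sketch (illustrative, for deterministic $\xi_i$ only):

```python
def balance(a, xis, tol=1e-12):
    """Return b if a . xi_i equals the same constant b > 0 for every i (2.3),
    else None; with balance b the total activity is w_n = w_0 + n*b, cf. (4.9)."""
    bs = [float(np.dot(a, xi)) for xi in xis]
    return bs[0] if max(bs) - min(bs) < tol and bs[0] > 0 else None
```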

Remark 2.4. We exclude the case $b=0$ , which is quite different; a typical example is a Markov chain, regarded as an urn always containing a single ball.

We shall assume that the urn is tenable and balanced; this is sometimes repeated for emphasis.

We also assume (2.9) below; as discussed in Remark 2.6 and Appendix A, this is a very weak assumption needed to exclude some trivial cases allowed by our definition of tenable; by Lemma A.1 it is sufficient to assume that every colour in the specification actually may occur in the urn, which always can be achieved by eliminating any redundant colours.

Finally, in order to obtain moment results, we assume that the replacements have second moments:

(2.4) \begin{equation} \mathbb{E} \xi_{ij}^2<\infty,\qquad i,j=1,\dots,q.\end{equation}

It follows that every $X_n$ has second moments, so the covariance matrix $\operatorname{Var}(X_n)$ is finite for each n.

Remark 2.5. In the tenable and balanced case, the assumption (2.4) is almost redundant. First, although there might be negative values of $\xi_{ij}$ , we assume that the urn is tenable. Hence, given any instance $(x_1,\dots,x_q)$ of the urn that may occur with positive probability as some $X_n$ , we have $\xi_{ij}\geqslant -x_j$ a.s. for every i and j such that $a_ix_i>0$ . In particular, if every $a_i>0$ , and every colour may appear in the urn, then each $\xi_{ij}$ is bounded below. Furthermore, still assuming $a_i>0$ for each i, this and (2.3) imply that each $\xi_{ij}$ is also bounded above; hence $\xi_{ij}$ is bounded and has moments of any order.

2.2. Notation

We regard all vectors as column vectors. We use standard notation for (real or complex) vectors and matrices (of sizes q and $q\times q$ , respectively), in particular $^{\prime}$ for transpose, ${}^*$ for Hermitian conjugate, and $\cdot$ for the standard scalar product; thus $u\cdot v=u'v$ for any vectors $u,v\in\mathbb R^q$ . We let $\lVert\,\cdot\,\rVert$ denote the standard Euclidean norm for vectors, and the operator norm (or any other convenient norm) for matrices.

Let $a\coloneqq (a_1,\dots,a_q)'$ be the vector of activities. Thus, the balance condition (2.3) can be written $a\cdot\xi_i=b$ .

The intensity matrix of the Pólya urn is the $ q\times q $ matrix

(2.5) \begin{equation}A\coloneqq (a_j\,\mathbb{E}\xi_{ji})_{i,j=1}^{q}.\end{equation}

(Note that, for convenience and following [Reference Janson19], we have defined A so that the element $(A)_{ij}$ is a measure of the intensity of adding balls of colour i coming from drawn balls of colour j; the transpose matrix $A^{\prime}$ is often used in other papers.) The intensity matrix A with its eigenvalues and eigenvectors has a central role in asymptotic results.

Let $\sigma(A)$ (the spectrum of A) be the set of eigenvalues of A.

We shall use the Jordan decomposition of the matrix A in the following form. There exists a decomposition of the complex space $\mathbb C^q$ as a direct sum $\bigoplus_\lambda E_\lambda$ of generalized eigenspaces $E_\lambda$ , such that $A-\lambda I$ is a nilpotent operator on $E_\lambda$ ; here $\lambda$ ranges over the set $\sigma(A)$ of eigenvalues of A. (I is the identity matrix of appropriate size.) In other words, there exist projections $P_\lambda$ , $\lambda\in\sigma(A)$ , that commute with A and satisfy

(2.6) \begin{align}\sum_{\lambda\in\sigma(A)}P_\lambda=I,\end{align}
(2.7) \begin{align}AP_\lambda=P_\lambda A=\lambda P_\lambda+N_\lambda,\end{align}

where $N_\lambda=P_\lambda N_\lambda=N_\lambda P_\lambda$ is nilpotent. Moreover, $P_\lambda P_\mu=0$ when $\lambda\neq\mu$ . We let $\nu_\lambda\geqslant0$ be the integer such that $N_\lambda^{\nu_\lambda}\neq0$ but $N_\lambda^{\nu_\lambda+1}=0$ . (Equivalently, in the Jordan normal form of A, the largest Jordan block with $\lambda$ on the diagonal has size $\nu_\lambda+1$ .) Hence $\nu_\lambda=0$ if and only if $N_\lambda=0$ , and this happens for all $\lambda$ if and only if A is diagonalizable, i.e. if and only if A has a complete set of q linearly independent eigenvectors. (In the sequel, $\lambda$ will always denote an eigenvalue. We may for completeness define $P_\lambda=N_\lambda=0$ for every $\lambda\notin\sigma(A)$ .)

The eigenvalues of A are denoted $\lambda_1,\dots,\lambda_q$ (repeated according to their algebraic multiplicities); we assume that they are ordered with decreasing real parts, $\textrm{Re}\ \lambda_1\geqslant\textrm{Re}\ \lambda_2\geqslant\dots$ , and furthermore, when the real parts are equal, in order of decreasing $\nu_j\coloneqq \nu_{\lambda_j}$ . In particular, if $\lambda_1>\textrm{Re}\ \lambda_2$ , then $\nu_j\leqslant\nu_2$ for every eigenvalue $\lambda_j$ with $\textrm{Re}\ \lambda_j=\textrm{Re}\ \lambda_2$ .

Recall that the urn is called small if $\textrm{Re}\ \lambda_2\leqslant\frac12\lambda_1$ and large if $\textrm{Re}\ \lambda_2>\frac12\lambda_1$ ; the urn is strictly small if $\textrm{Re}\ \lambda_2<\frac12\lambda_1$ .
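In code, the intensity matrix (2.5) and this classification can be expressed as follows (a sketch; `Exi[j]` stands for the vector $\mathbb{E}\xi_j$ , and the tolerance is an illustrative choice):

```python
def intensity_matrix(a, Exi):
    """(2.5): A[i, j] = a_j * E(xi_{j i}); note the transpose convention."""
    q = len(a)
    return np.array([[a[j] * Exi[j][i] for j in range(q)] for i in range(q)])

def classify(A, tol=1e-9):
    """Small/large classification from the two largest real parts of eigenvalues."""
    re = np.sort(np.linalg.eigvals(A).real)[::-1]
    l1, l2 = re[0], re[1]
    if l2 < l1 / 2 - tol:
        return "strictly small"
    if abs(l2 - l1 / 2) <= tol:
        return "small (boundary case Re lambda_2 = lambda_1 / 2)"
    return "large"
```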

In the balanced case, by (2.5) and (2.3),

(2.8) \begin{equation} a'A=\Bigl({\sum_{i=1}^q a_i(A)_{ij}}\Bigr)_j=\Bigl({\sum_{i=1}^q a_ia_j\mathbb{E} \xi_{ji}}\Bigr)_j=\bigl(a_j\mathbb{E} (a\cdot \xi_{j})\bigr)_j=ba';\end{equation}

i.e., $a^{\prime}$ is a left eigenvector of A with eigenvalue b. Thus $b\in\sigma(A)$ . We shall assume that, moreover, b is the largest eigenvalue, i.e.,

(2.9) \begin{equation} \lambda_1=b.\end{equation}

Remark 2.6. In fact, (2.9) is a very weak assumption. For example, if each $\xi_{ij}\geqslant0$ , then A is a matrix with non-negative elements, and since the eigenvector $a^{\prime}$ is non-negative, (2.9) is a consequence of the Perron–Frobenius theorem. The same holds (by considering $A+cI$ for a suitable $c>0$ ) under the assumption in Remark 2.1. Under our, more general, definition of tenability, there are counterexamples (see Example A.1), but we show in Appendix A that they fail (2.9) only in a trivial way, and that we may assume (2.9) without real loss of generality. (Of course, the proof of Lemma A.1, which uses Lemma 5.4, does not use the assumption (2.9).)

We shall in our theorems furthermore assume that $\textrm{Re}\ \lambda_2<\lambda_1$ (and often more), and thus that $\lambda_1=b$ is a simple eigenvalue. There are thus corresponding left and right eigenvectors $u_1^{\prime}$ and $v_1$ that are unique up to normalization. By (2.8), we may choose $u_1=a$ . Furthermore, we let $v_1$ be normalized by

(2.10) \begin{align}u_1\cdot v_1=a\cdot v_1=1.\end{align}

Then the projection $P_{\lambda_1}$ is given by

(2.11) \begin{equation}P_{\lambda_1}=v_1u_1^{\prime}.\end{equation}

Consequently, in the balanced case, for any vector $v\in\mathbb R^q$ ,

(2.12) \begin{equation} P_{\lambda_1}v=v_1u_1^{\prime}v=v_1a'v=(a\cdot v)v_1.\end{equation}
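Numerically, $v_1$ and the projection (2.11) can be obtained from any eigensolver; a sketch (it assumes $\lambda_1$ is real and simple, as in (2.9) together with $\textrm{Re}\ \lambda_2<\lambda_1$ ):

```python
def top_projection(A, a):
    """v1 and P_{lambda_1} = v1 u1' with u1 = a, normalized as in (2.10)."""
    ev, V = np.linalg.eig(A)
    k = int(np.argmax(ev.real))        # lambda_1, assumed simple and real
    v1 = V[:, k].real
    v1 = v1 / float(a @ v1)            # (2.10): a . v1 = 1
    return v1, np.outer(v1, a)         # (2.11)
```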

Remark 2.7. The dominant eigenvalue $\lambda_1$ is simple, and $\textrm{Re}\ \lambda_2<\lambda_1$ if, for example, the matrix A is irreducible, but not in general. A simple counterexample is the original Pólya urn (see Markov [Reference Markov26], Eggenberger and Pólya [Reference Eggenberger and Pólya13], and Pólya [Reference Pólya28]), where each ball is replaced together with b balls of the same colour (and every $a_i=1$ ); then $A=bI$ and $\lambda_1=\dots=\lambda_q=b$ . As is well-known, the asymptotic behaviour is quite different in this case; in particular, $X_n/n$ converges in distribution to a non-degenerate distribution and not to a constant; see e.g. [Reference Pólya28] and [Reference Johnson and Kotz23].

Define also

(2.13) \begin{equation} \widehat P\coloneqq \sum_{\lambda\neq\lambda_1} P_\lambda = I-P_{\lambda_1}.\end{equation}

Furthermore, define the symmetric matrix

(2.14) \begin{equation} B\coloneqq \sum_{i=1}^q a_i v_{1i}\mathbb{E}\bigl(\xi_i\xi_i^{\prime}\bigr),\end{equation}

and, if the urn is strictly small, noting that $\widehat P$ commutes with $e^{sA}\coloneqq \sum_{k=0}^\infty (sA)^k/k!$ , define

(2.15) \begin{equation} \Sigma_I\coloneqq \int_0^\infty \widehat P e^{sA} B e^{sA'} \widehat P' e^{-\lambda_1 s} {\mathrm{d}} s.\end{equation}

This integral converges absolutely when the urn is strictly small, as can be seen from the proof of Theorem 3.2, or directly because $\lVert \widehat P e^{sA}\rVert=O\bigl({s^{\nu_2}e^{\textrm{Re}\ \lambda_2 s}}\bigr)$ for $s\geqslant1$ , as is easily seen from Lemma 5.1. (The integral is matrix-valued; the space of $q\times q$ matrices is a finite-dimensional space and the integral can be interpreted componentwise.) See also Appendix B.
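For concrete urns, $\Sigma_I$ can be evaluated by numerical quadrature; a sketch reusing `top_projection` above (the truncation point `upper` is an illustrative choice, adequate once $\textrm{Re}\ \lambda_2<\lambda_1/2$ makes the integrand decay exponentially; for numerical stability it is best to first normalize $b=1$ as in Section 5):

```python
from scipy.integrate import quad_vec
from scipy.linalg import expm

def sigma_I(A, B, a, l1=1.0, upper=40.0):
    """Numerical Sigma_I from (2.15), truncating the integral at `upper`."""
    v1, P1 = top_projection(A, a)
    Phat = np.eye(A.shape[0]) - P1     # (2.13)
    def integrand(s):
        E = Phat @ expm(s * A)         # \widehat P e^{sA}
        return E @ B @ E.T * np.exp(-l1 * s)
    return quad_vec(integrand, 0.0, upper)[0]
```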

Unspecified limits are as ${n\to\infty}$ . As usual, $a_n=O(b_n)$ means that $a_n/b_n$ is bounded; here $a_n$ may be vectors or matrices and $b_n$ may be complex numbers; we do not insist that $b_n$ be positive.

$\lceil x\rceil$ is the smallest integer greater than or equal to x.

3. Main results

Our main results on asymptotics of mean and variance are the following. Proofs are given in Section 6. As mentioned in the introduction, results of this type exist, mainly implicitly, in earlier work. In particular, under similar (but not identical) assumptions, Theorem 3.1 is implicit in Bai and Hu [Reference Bai and Hu7, Theorem 3.2], and its proof, and explicit in Pouyanne [Reference Pouyanne29, Proposition 7.1]; Theorems 3.2 and 3.3 are implicit in Bai and Hu [Reference Bai and Hu6, Reference Bai and Hu7].

Theorem 3.1. If the Pólya urn is tenable and balanced, and $\textrm{Re}\ \lambda_2<\lambda_1$ , then for $n\geqslant2$ ,

(3.1) \begin{equation} \begin{split}\mathbb{E} X_n &= (n\lambda_1 + a\cdot X_0)v_1 + O\bigl({n^{\textrm{Re}\ \lambda_2/\lambda_1}\log^{\nu_2} n}\bigr)\\[3pt]&= n \lambda_1 v_1 + O\bigl({n^{\textrm{Re}\ \lambda_2/\lambda_1}\log^{\nu_2} n+1}\bigr) \\[3pt]&= n \lambda_1 v_1 + o(n). \end{split}\end{equation}

In particular, if the urn is strictly small, i.e. $\textrm{Re}\ \lambda_2<\frac12\lambda_1$ , then

(3.2) \begin{equation} \mathbb{E} X_n = n\lambda_1 v_1 + o\bigl({n^{1/2}}\bigr).\end{equation}

Theorem 3.2. If the Pólya urn is tenable, balanced, and strictly small, i.e. $\textrm{Re}\ \lambda_2<\frac12\lambda_1$ , then

(3.3) \begin{equation} n^{-1} \operatorname{Var}(X_n) \to\Sigma\coloneqq \lambda_1\Sigma_I.\end{equation}

Theorem 3.3. If the Pólya urn is tenable, balanced, and small but not strictly small, i.e. $\textrm{Re}\ \lambda_2=\frac12\lambda_1$ , then

\begin{equation*} (n\log^{2\nu_2+1}n)^{-1} \operatorname{Var}(X_n) \to \frac{\lambda_1^{-2\nu_2}}{(2\nu_2+1)(\nu_2!)^2}\sum_{\textrm{Re}\ \lambda=\frac12\lambda_1}N_\lambda^{\nu_2}P_\lambda {B} P_\lambda^*(N_\lambda^*)^{\nu_2}.\end{equation*}

Remark 3.1. Under some additional assumptions (irreducibility of A, at least if we ignore colours with activity 0, and, for example, the condition in Remark 2.1), [Reference Janson19, Theorems 3.22–3.23 and Lemma 5.4] shows that if the urn is small, then $X_n$ is asymptotically normal, with the asymptotic covariance matrix equal to the limit in Theorem 3.2 ( $\textrm{Re}\ \lambda_2<\frac12\lambda_1$ ) or Theorem 3.3 ( $\textrm{Re}\ \lambda_2=\frac12\lambda_1$ ). For example, in the strictly small case, $n^{-1/2}(X_n-n\lambda_1v_1)\overset{\mathrm{d}}{\longrightarrow} N(0,\Sigma)$ . Hence (under these hypotheses), Theorems 3.1–3.3 can be summarized by saying that the mean and (co)variances converge as expected in these central limit theorems.

We also obtain the following version of the law of large numbers for Pólya urns. Convergence a.s. has been shown before under various assumptions, including the unbalanced case as well as time-inhomogeneous generalizations (see [Reference Athreya and Ney4, Section V.9.3], [Reference Aldous, Flannery and Palacios2], [Reference Bai, Hu and Rosenberger8], [Reference Hu and Zhang18, Theorem 2.2], [Reference Janson19, Theorem 3.21], and [Reference Bai and Hu7, Theorem 2.2]), and is included here for completeness and because our conditions are somewhat more general. The $L^2$ result is in [Reference Pouyanne29, Remark 7.1(2)].

Theorem 3.4. If the Pólya urn is tenable and balanced, and $\textrm{Re}\ \lambda_2<\lambda_1$ , then as ${n\to\infty}$ , $X_n/n\to \lambda_1v_1$ a.s. and in $L^2$ .

The asymptotic covariance matrix $\Sigma$ in (3.3) is always singular, since, by (2.15), $\Sigma=\widehat P\Sigma\widehat P'$ and thus $u'\Sigma u=u'\widehat P\Sigma\widehat P'u=0$ when $\widehat P'u=0$ , which happens when $P_{\lambda_1}^{\prime} u=u$ , i.e., when u is a multiple of the left eigenvector $u_1=a$ . In the balanced case, this is easy to see: $a\cdot X_n$ is deterministic and thus $\operatorname{Var}(a\cdot X_n)=0$ ; hence $a'\Sigma a=0$ , since for any vector u, by (3.3),

(3.4) \begin{equation}n^{-1} \operatorname{Var}(u\cdot X_n)=n^{-1} u'\operatorname{Var}(X_n) u \to u'\Sigma u.\end{equation}

With an extra assumption, this is the only case when the asymptotic variance $u'\Sigma u$ vanishes (cf. [Reference Janson19, Remark 3.19]). Let $\widetilde A$ be the submatrix of A obtained by deleting all rows and columns corresponding to colours with activity $a_i=0$ .

Theorem 3.5. Suppose that the Pólya urn is tenable, balanced, and strictly small, i.e. $\textrm{Re}\ \lambda_2<\frac12\lambda_1$ , and, furthermore, that $\widetilde A$ is irreducible. If $u\in \mathbb R^q$ , then $u'\Sigma u =0$ if and only if for every $n\geqslant0$ , $\operatorname{Var}(u\cdot X_n)=0$ , i.e., $u\cdot X_n$ is deterministic.

Remark 3.2. If $\widetilde A$ is reducible, then, on the contrary, $\Sigma$ is typically more singular. As an extreme example, consider a ‘triangular’ urn with two colours, activities $a_i=1$ , and deterministic replacements $\xi_1=(1,0)$ , $\xi_2=(1-\lambda,\lambda)$ for a real $\lambda\in(0,1)$ (starting with one ball of each colour, say). Then $A=\left(\begin{smallmatrix}1&1-\lambda\\0&\lambda\end{smallmatrix}\right)$ . The eigenvalues are 1 and $\lambda$ , so the urn is strictly small if $\lambda<\frac12$ . However, $v_1=(1,0)$ , so (2.14) yields $B=\xi_1\xi_1^{\prime}=v_1v_1^{\prime}$ ; hence $\widehat P B =0$ by (B.4) (or a direct calculation), and thus $\Sigma=\Sigma_I=0$ . Theorems 3.2 and 3.3 are still valid, but say only that the limit is 0. In fact, in this example, the proper normalization is $n^\lambda$ : it follows from [Reference Janson20, Theorem 1.3(v)] that $n^{-\lambda}X_{n2}=n^{-\lambda}(n+2-X_{n1})\overset{\mathrm{d}}{\longrightarrow} W$ for some non-degenerate (and non-normal) random variable W. Moreover, calculations similar to those in Section 6 show that $\mathbb{E} X_{n2}\sim c_1 n^\lambda$ and $\operatorname{Var} X_{n2}\sim c_2 n^{2\lambda}$ for some $c_1,c_2>0$ , as shown earlier in [Reference Pouyanne29, Example 7.2(2)].
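A quick Monte Carlo sketch consistent with the $n^\lambda$ scaling in this example (illustrative only; it checks the stability of the normalized mean and variance across n, not the constants $c_1,c_2$ ):

```python
def triangular_urn(n, lam, rng):
    """The urn of Remark 3.2: xi_1 = (1, 0), xi_2 = (1 - lam, lam), activities 1."""
    X = np.array([1.0, 1.0])
    xis = (np.array([1.0, 0.0]), np.array([1.0 - lam, lam]))
    for _ in range(n):
        X = X + xis[rng.choice(2, p=X / X.sum())]
    return X[1]

rng = np.random.default_rng(1)
lam = 0.4
for n in (1000, 4000, 16000):
    s = np.array([triangular_urn(n, lam, rng) for _ in range(100)])
    print(n, s.mean() / n**lam, s.var() / n**(2 * lam))  # both roughly stable in n
```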

Remark 3.3. It is easily seen that $\widetilde A$ is irreducible if and only if $v_{1i}>0$ for every i with $a_i>0$ .

4. Proofs, first steps

Let $I_n$ be the colour of the nth drawn ball, let

(4.1) \begin{equation} \Delta X_n\coloneqq X_{n+1}-X_n,\end{equation}

and let

(4.2) \begin{equation} w_n\coloneqq a\cdot X_n,\end{equation}

the total weight (activity) of the urn. Furthermore, let $\mathcal F_n$ be the $\sigma$ -field generated by $X_1,\dots,X_n$ . Then, by the definition of the urn,

(4.3) \begin{equation} \mathbb{P}\bigl({I_{n+1}=j\mid \mathcal F_n}\bigr) = \frac{a_jX_{nj}}{w_n},\end{equation}

and, consequently, recalling (2.5),

(4.4) \begin{equation} \begin{split}\mathbb{E}\bigl({\Delta X_n\mid \mathcal F_n}\bigr)&=\sum_{j=1}^q \mathbb{P}\bigl({I_{n+1}=j\mid \mathcal F_n}\bigr)\mathbb{E}\xi_j =\frac{1}{w_n}\sum_{j=1}^q a_j X_{nj}\mathbb{E}\xi_j \\&=\frac{1}{w_n}\Bigl({\sum_{j=1}^q (A)_{ij} X_{nj}}\Bigr)_i=\frac{1}{w_n} A X_{n}. \end{split}\end{equation}

Define

(4.5) \begin{equation} Y_n\coloneqq \Delta X_{n-1} - \mathbb{E}\bigl({\Delta X_{n-1}\mid \mathcal F_{n-1}}\bigr).\end{equation}

Then $Y_n$ is $\mathcal F_n$ -measurable, and obviously,

(4.6) \begin{equation}\mathbb{E}\bigl({Y_n\mid\mathcal F_{n-1}}\bigr)=0\end{equation}

and, by (4.1), (4.5) and (4.4),

(4.7) \begin{equation} X_{n+1}=X_n+Y_{n+1}+w_n^{-1} A X_n=\bigl({I+w_n^{-1} A}\bigr)X_n+Y_{n+1}.\end{equation}

Consequently, by induction, for any $n\geqslant0$ ,

(4.8) \begin{equation} X_n=\prod_{k=0}^{n-1}\bigl({I+w_k^{-1} A}\bigr)X_0 +\sum_{\ell=1}^{n}\prod_{k=\ell}^{n-1}\bigl({I+w_k^{-1} A}\bigr)Y_\ell,\end{equation}

where (as below) an empty matrix product is interpreted as I.

We now use the assumption that the urn is balanced, so $a\cdot\Delta X_n=b$ , and thus by (4.1)–(4.2), $w_n$ is deterministic with

(4.9) \begin{equation} w_n=w_0+nb,\end{equation}

where the initial weight $w_0=a\cdot X_0$ . We define the matrix products

(4.10) \begin{equation} F_{i,j}\coloneqq \prod_{i\leqslant k<j}\bigl({I+w_k^{-1} A}\bigr),\qquad 0\leqslant i\leqslant j,\end{equation}

and write (4.8) as

(4.11) \begin{equation} X_n=F_{0,n} X_0+\sum_{\ell=1}^{n}F_{\ell,n} Y_{\ell}.\end{equation}

As noted in the introduction, we can regard the term $F_{\ell,n}Y_\ell$ as the real effect on $X_n$ of the $\ell$ th draw, including the expected later indirect effects.

Taking the expectation, since $\mathbb{E} Y_\ell=0$ by (4.6) and the $F_{i,j}$ and $X_0$ are non-random, we find

(4.12) \begin{equation}\mathbb{E} X_n=F_{0,n} X_0.\end{equation}

Hence, (4.11) can also be written

(4.13) \begin{equation} X_n-\mathbb{E} X_n =\sum_{\ell=1}^{n}F_{\ell,n} Y_\ell.\end{equation}

Consequently, the covariance matrix can be computed as

(4.14) \begin{align} \operatorname{Var}(X_n)&\coloneqq \mathbb{E}\bigl({(X_n-\mathbb{E} X_n)(X_n-\mathbb{E} X_n)'}\bigr)\nonumber\\&=\mathbb{E}\sum_{i=1}^{n}\sum_{j=1}^{n} \bigl({F_{i,n} Y_i}\bigr)\bigl({F_{j,n} Y_j}\bigr)'\\&=\sum_{i=1}^{n}\sum_{j=1}^{n} F_{i,n} \mathbb{E}\bigl({Y_i Y_j^{\prime}}\bigr) F_{j,n}'. \nonumber\end{align}

However, if $i>j$ , then $\mathbb{E}\bigl({Y_i\mid \mathcal F_j}\bigr)=0$ by (4.6), and since $Y_j$ is $\mathcal F_{j}$ -measurable, we have

(4.15) \begin{equation} \mathbb{E}\bigl({Y_iY_j^{\prime}}\bigr) =\mathbb{E}\bigl({\mathbb{E} (Y_i\mid \mathcal F_j)Y_j^{\prime}}\bigr)=0.\end{equation}

Taking the transpose we see that $\mathbb{E}\bigl({Y_iY_j'}\bigr)=0$ also when $i<j$ . Hence, all non-diagonal terms vanish in (4.14), and we find

(4.16) \begin{equation} \begin{split}\operatorname{Var}(X_n)=\sum_{i=1}^{n} F_{i,n} \mathbb{E}\bigl({Y_i Y_i'}\bigr) F_{i,n}^{\prime}. \end{split}\end{equation}

The formulas (4.12) and (4.16) form the basis of our proofs, and it remains mainly to analyse the matrix products $F_{i,j}$ .
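The products (4.10) are straightforward to compute for a concrete urn, which makes (4.12) directly usable; a sketch (the factors commute, being polynomials in A, so the multiplication order is immaterial):

```python
def F(i, j, A, w0, b=1.0):
    """(4.10): product over i <= k < j of (I + A / w_k), with w_k = w0 + k*b by (4.9)."""
    M = np.eye(A.shape[0])
    for k in range(i, j):
        M = (np.eye(A.shape[0]) + A / (w0 + k * b)) @ M
    return M

# (4.12): the exact mean composition, for any concrete A and X0:
# EXn = F(0, n, A, w0=float(a @ X0), b=b) @ X0
```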

Remark 4.1. The formula (4.8) holds for general Pólya urns, also when they are not balanced. However, in the general case, the total weights $w_k$ are random, and they are dependent on each other and on the $Y_\ell$ , and it seems difficult to draw any useful consequences from (4.8); certainly the arguments above fail because the $F_{i,j}$ would be random.

Remark 4.2. As remarked by a referee, since $P_{\lambda_1}Y_\ell=0$ by Lemma 6.1, we may also write (4.13) as

(4.17) \begin{equation} X_n-\mathbb{E} X_n =\sum_{\ell=1}^{n} F_{\ell,n}\widehat P Y_\ell=\sum_{\ell=1}^{n} \widehat F_{\ell,n} Y_\ell,\end{equation}

where $\widehat F_{i,j}\coloneqq \widehat P F_{i,j}=F_{i,j}\widehat P=\prod_{i\leqslant k<j}(I+w_k^{-1}\widehat A)-P_{\lambda_1}$ with $\widehat A\coloneqq \widehat P A$ . This could be used instead of (4.13) to make another version of the proofs below; the two versions are very similar and essentially equivalent. (See [Reference Hu and Zhang18] for a version essentially of this type.) The form (4.17) has the advantage that we have eliminated the (large) deterministic part corresponding to $P_{\lambda_1}$ ; for example, assuming $b=1$ , for $1\leqslant\ell\leqslant n$ we obtain $\lVert\widehat F_{\ell,n}\rVert=O\bigl({(n/\ell)^{\textrm{Re}\ \lambda_2}(1+\log(n/\ell))^{\nu_2}}\bigr)$ ; see Lemma 5.5. Nevertheless, we prefer to use $F_{i,j}$ in the proofs below.

5. Estimates of matrix functions

In this section we derive some estimates of $F_{i,n}$ ; these are used in the next section together with (4.16) to obtain the variance asymptotics. The estimates of $F_{i,n}$ are obtained by standard matrix calculus, including a Jordan decomposition of A. Similar estimates have been used in several related papers, e.g. [Reference Hu and Zhang18], [Reference Bai and Hu7], [Reference Zhang, Hu and Cheung31], and [Reference Janson19]. For completeness we nevertheless give detailed proofs.

For notational convenience, we make from now on the simplifying assumption $b=1$ . (For emphasis and clarity, we repeat this assumption in some statements; it will always be in force, whether stated or not.) This is no loss of generality: we can divide all activities by b and let the new activities be $\hat a\coloneqq a/b$ ; this defines the same random evolution of the urn, and we have $\hat a\cdot \xi_{i}=b/b=1$ for every i, so the modified urn is also balanced, with balance $\hat b=1$ . Furthermore, the intensity matrix A in (2.5) is divided by b, so all eigenvalues $\lambda_i$ are divided by b, but their ratios remain the same; the projections $P_{\lambda}$ remain the same while the nilpotent parts $N_\lambda$ are divided by b, and in both cases the indices are shifted; also, with the normalization (2.10), $u_1=a$ is divided by b while $v_1$ is multiplied by b. It is now easy to check that $\lambda_1v_1$ , B, and $\lambda_1\Sigma_I$ are invariant, and thus the theorems all follow from the special case $b=1$ . By the assumption (2.9) (see Remark 2.6 and Appendix A), we thus have $\lambda_1=1$ .

Note that (4.9) now becomes

(5.1) \begin{equation}w_n=n+w_0.\end{equation}

Note also that (4.10) can be written $F_{i,j}=f_{i,j}(A)$ , where $0\leqslant i\leqslant j$ and $f_{i,j}$ is the polynomial

(5.2) \begin{equation} \begin{split} f_{i,j}(z)&\coloneqq \prod_{i\leqslant k<j}\bigl({1+w_k^{-1} z}\bigr)=\prod_{i\leqslant k<j}\frac{w_k+ z}{w_k}=\prod_{i\leqslant k<j}\frac{k+w_0+ z}{k+w_0}\\[3pt]&= \frac{\Gamma(j+w_0+z)/\Gamma(i+w_0+z)} {\Gamma(j+w_0)/\Gamma(i+w_0)}= \frac{\Gamma(j+w_0+z)}{\Gamma(j+w_0)} \cdot \frac{\Gamma(i+w_0)}{\Gamma(i+w_0+z)} . \end{split}\end{equation}
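The product form and the gamma-function form in (5.2) can be cross-checked numerically, also for complex z; a small sketch (`scipy.special.loggamma` accepts complex arguments):

```python
from scipy.special import loggamma

def f_prod(i, j, z, w0):
    k = np.arange(i, j)
    return np.prod((k + w0 + z) / (k + w0))        # product form in (5.2)

def f_gamma(i, j, z, w0):
    return np.exp(loggamma(j + w0 + z) - loggamma(j + w0)
                  + loggamma(i + w0) - loggamma(i + w0 + z))  # last expression

print(f_prod(3, 50, 0.5 + 0.25j, 2.0))   # these two outputs agree
print(f_gamma(3, 50, 0.5 + 0.25j, 2.0))
```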

Recall that by the functional calculus in spectral theory (see e.g. [Reference Dunford and Schwartz12, Chapter VII.1–3]), we can define f(A) not only for polynomials f(z) but for any function f(z) that is analytic in a neighbourhood of the spectrum $\sigma(A)$ . Furthermore, if K is a compact set that contains $\sigma(A)$ in its interior (for example a sufficiently large disc), then there exists a constant C (depending on A and K) such that for every f analytic in a neighbourhood of K,

(5.3) \begin{equation} \lVert f(A)\rVert\leqslant C\sup_{z\in K} |f(z)|.\end{equation}

We shall use the functional calculus mainly for polynomials and the entire functions $z\mapsto t^z=e^{(\!\log t)z}$ for fixed $t>0$ ; in these cases, f(A) can be defined by a Taylor series expansion as was done before (2.15). Note also that the general theory applies to operators in a Banach space; we only need the simpler finite-dimensional case discussed in [Reference Dunford and Schwartz12, Chapter VII.1].

We shall use the following formula for f(A), where $f^{(m)}$ denotes the mth derivative of f. (The formula can be seen as a Taylor expansion; see the proof.)

Lemma 5.1. For any entire function $f(\lambda)$ , and any $\lambda\in\sigma(A)$ ,

(5.4) \begin{equation}f(A)P_\lambda = \sum_{m=0}^{\nu_\lambda}\frac{1}{m!}\, f^{(m)}(\lambda)N_\lambda^mP_\lambda.\end{equation}

Proof. This is a standard formula in the finite-dimensional case (see [Reference Dunford and Schwartz12, Theorem VII.1.8]), but we give for completeness a simple (and perhaps informative) proof when f is a polynomial (which is the only case that we use, and which furthermore implies the general case by [Reference Dunford and Schwartz12, Theorem VII.1.5(d)]). We then have the Taylor expansion $f(\lambda+z)=\sum_{m=0}^\infty \frac{1}{m!}\, f^{(m)}(\lambda)z^m$ , which can be seen as an algebraic identity for polynomials in z (the sum is really finite since $f^{(m)}=0$ for large m), and thus

(5.5) \begin{equation} f(A)P_\lambda= f(\lambda I+N_\lambda)P_\lambda=\sum_{m=0}^\infty \frac{1}{m!}\, f^{(m)}(\lambda)N_\lambda^m P_\lambda,\end{equation}

where $N_\lambda^m=0$ when $m>\nu_\lambda$ .
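As a sanity check of (5.4), take a single $2\times2$ Jordan block, where $P_\lambda=I$ and $\nu_\lambda=1$ (an illustrative numerical check, not part of the proof):

```python
lam = 0.5
A2 = np.array([[lam, 1.0], [0.0, lam]])          # one Jordan block: nu_lambda = 1
N = A2 - lam * np.eye(2)                         # nilpotent part
f = np.polynomial.Polynomial([1.0, 2.0, 3.0])    # f(z) = 1 + 2z + 3z^2
fA = sum(c * np.linalg.matrix_power(A2, m) for m, c in enumerate(f.coef))
rhs = f(lam) * np.eye(2) + f.deriv()(lam) * N    # (5.4) with P_lambda = I
assert np.allclose(fA, rhs)
```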

Our strategy is to first show estimates for the polynomials $f_{i,j}(z)$ in (5.2) and then use these together with (5.3) and (5.4) to show the estimates for $F_{i,j}=f_{i,j}(A)$ that we need.

Lemma 5.2.

(i) For every fixed i, as $j\to\infty$,

    (5.6) \begin{equation} f_{i,j}(z)=j^z \frac{\Gamma(i+w_0)}{\Gamma(i+w_0+z)}\bigl({1+o(1)}\bigr),\end{equation}
    uniformly for z in any fixed compact set in the complex plane.
(ii) As $i,j\to\infty$ with $i\leqslant j$,

    (5.7) \begin{equation} f_{i,j}(z)=j^z i^{-z}\bigl({1+o(1)}\bigr),\end{equation}
    uniformly for z in any fixed compact set in the complex plane.

Proof. Both parts follow from (5.2) and the fact that

(5.8) \begin{equation}\frac{\Gamma(x+z)}{\Gamma(x)}=x^z\bigl(1+o(1)\bigr),\end{equation}

uniformly for z in a compact set, as $x\to\infty$ (with x real, say), which is an easy and well-known consequence of Stirling’s formula; see [Reference Olver, Lozier, Boisvert and Clark27, 5.11.12]. (Note that $\Gamma(i+w_0)/\Gamma(i+w_0+z)$ is an entire function for any $i\geqslant0$ , since $w_0>0$ . $\Gamma(j+w_0+z)/\Gamma(j+w_0)$ has poles, but for z in a fixed compact set, this function is analytic when j is large enough.)

For the derivatives $f_{i,j}^{(m)}(z)$ there are corresponding estimates.

Lemma 5.3. Let $m\geqslant0$ .

(i) For every fixed $i\geqslant0$, as ${j\to\infty}$,

    (5.9) \begin{equation} f^{(m)}_{i,j}(z)= {j}^z(\!\log j)^m \frac{\Gamma(i+w_0)}{\Gamma(i+w_0+z)}+o\bigl(\,{{j}^z\log^m j}\bigr),\end{equation}
    uniformly for z in any fixed compact set in the complex plane.
(ii) As $i,j\to\infty$ with $i\leqslant j$,

    (5.10) \begin{equation} f^{(m)}_{i,j}(z)=\Bigl(\frac{\,j\,}{i}\Bigr)^z\Bigl({\log\frac{\,j\,}{i}}\Bigr)^m+ o\left({\Bigl(\frac{\,j\,}{i}\Bigr)^z\Bigl({1+\log\frac{\,j\,}{i}}\Bigr)^m}\right)\!,\end{equation}
    uniformly for z in any fixed compact set in the complex plane.

Proof. (i): Let $g_j(z)= j^{-z}f_{i,j}(z)$ . Then, by (5.6),

(5.11) \begin{equation}g_j(z)= \frac{\Gamma(i+w_0)}{\Gamma(i+w_0+z)}\bigl({1+o(1)}\bigr)=O(1)\qquad\text{as\quad \ensuremath{{j\to\infty}}},\end{equation}

uniformly in each compact set, and thus by Cauchy’s estimates, for any $\ell\geqslant1$ ,

(5.12) \begin{equation}g_j^{(\ell)}(z)=O(1)\qquad\text{as\quad \ensuremath{{j\to\infty}}},\end{equation}

uniformly in each compact set. By Leibniz’s rule,

(5.13) \begin{equation} \begin{split}f_{i,j}^{(m)}(z)&=\frac{\mathrm{d}^m}{\mathrm{d} z^m}\bigl({j^z g_j(z)}\bigr)=\sum_{\ell=0}^m\binom{m}{\ell}\frac{\mathrm{d}^\ell}{\mathrm{d} z^\ell}j^z \cdot g_j^{(m-\ell)}(z)\\[3pt]&=\sum_{\ell=0}^m\binom{m}{\ell}(\!\log j)^\ell j^zg_j^{(m-\ell)}(z), \end{split}\end{equation}

and (5.9) follows by (5.11)–(5.12).

(ii): Similarly, for $1\leqslant i\leqslant j$ , let $h_{i,j}(z)= (i/j)^{z}f_{i,j}(z)$ . Then, by (5.7),

(5.14) \begin{equation}h_{i,j}(z)=1+o(1)\qquad\text{as $i,j\to\infty$},\end{equation}

uniformly in each compact set, and thus by Cauchy’s estimates, for any $\ell\geqslant1$ ,

(5.15) \begin{equation}h_{i,j}^{(\ell)}(z)=\frac{\mathrm{d}^\ell}{\mathrm{d} z^\ell}\bigl({h_{i,j}(z)-1}\bigr)=o(1)\qquad\text{as $i,j\to\infty$},\end{equation}

uniformly in each compact set. By Leibniz’s rule,

(5.16) \begin{equation} \begin{split}f_{i,j}^{(m)}(z)&=\frac{\mathrm{d}^m}{\mathrm{d} z^m}\bigl({({\kern1.3pt}j/i)^z h_{i,j}(z)}\bigr)=\sum_{\ell=0}^m\binom{m}{\ell}\frac{\mathrm{d}^\ell}{\mathrm{d} z^\ell}({\kern1.3pt}j/i)^z \cdot h_{i,j}^{(m-\ell)}(z)\\[3pt]&=\sum_{\ell=0}^m\binom{m}{\ell}\bigl({\log ({\kern1.3pt}j/i)}\bigr)^\ell ({\kern1.3pt}j/i)^zh_{i,j}^{(m-\ell)}(z), \end{split}\end{equation}

and (5.10) follows by (5.14)–(5.15).

We now apply these estimates to $F_{i,j}$ , noting that by Lemma 5.1,

(5.17) \begin{equation} \begin{split}F_{i,j}P_\lambda=f_{i,j}(A)P_\lambda =\sum_{m=0}^{\nu_\lambda}\frac{1}{m!}\, f_{i,j}^{(m)}(\lambda)N_\lambda^mP_\lambda. \end{split}\end{equation}

Lemma 5.4. If $b=1$ , then, for $n\geqslant2$ and $\lambda\in\sigma(A)$ ,

(5.18) \begin{equation} {F_{0,n} P_\lambda}=n^{\lambda}\log^{\nu_\lambda}n \frac{\Gamma(w_0)}{\nu_\lambda!\,\Gamma(w_0+\lambda)}N_\lambda^{\nu_\lambda}P_\lambda+ o\bigl({n^{\textrm{Re}\ \lambda}\log^{\nu_\lambda}n}\bigr). \end{equation}

Proof. By (5.17) and (5.9),

(5.19) \begin{equation} \begin{split}F_{0,n}P_\lambda&=\sum_{m=0}^{\nu_\lambda}\frac{1}{m!}\, f_{0,n}^{(m)}(\lambda)N_\lambda^mP_\lambda=\frac{1}{\nu_{\lambda}!}\, f_{0,n}^{(\nu_\lambda)}(\lambda)N_\lambda^{\nu_\lambda} P_\lambda+\sum_{m=0}^{\nu_\lambda-1} O\bigl({n^\lambda \log^m n}\bigr), \end{split}\end{equation}

which yields (5.18) by another application of (5.9).

Lemma 5.5. If $b=1$ , then, for $1\leqslant i\leqslant j$ and $\lambda\in\sigma(A)$ ,

(5.20) \begin{equation} F_{i,j} P_\lambda = O\bigl({({\kern1.3pt}j/i)^{\textrm{Re}\ \lambda}(1+\log({\kern1.3pt}j/i))^{\nu_\lambda}}\bigr). \end{equation}

More precisely, for any $\nu\geqslant\nu_\lambda$ , as $i,j\to\infty$ with $i\leqslant j$ ,

(5.21) \begin{align}F_{i,j} P_\lambda &= \frac{1}{\nu!}\Bigl(\frac{\,j\,}{i}\Bigr)^\lambda\log^{\nu}\Bigl(\frac{\,j\,}{i}\Bigr) {N_\lambda^{\nu}P_\lambda }+ o\Bigl({\Bigl(\frac{\,j\,}{i}\Bigr)^{\textrm{Re}\ \lambda}\log^{\nu}\Bigl(\frac{\,j\,}{i}\Bigr)}\Bigr)\nonumber\\[3pt] &\quad +O\Bigl({\Bigl(\frac{\,j\,}{i}\Bigr)^{\textrm{Re}\ \lambda}\Bigl({1+\log^{\nu-1}\Bigl(\frac{\,j\,}{i}\Bigr)}\Bigr)}\Bigr).\end{align}

Proof. This is similar to the proof of Lemma 5.4. First, (5.20) follows directly from (5.17) and (5.10).

For (5.21), note that the summation in (5.17) may be extended to $m\leqslant\nu$ , since $N_\lambda^m=0$ when $\nu_\lambda<m\leqslant\nu$ . Then use (5.10) for each term; the term $m=\nu$ gives the main term in (5.21), while the terms $m<\nu$ are absorbed in the final error term.

Lemma 5.6. If $\textrm{Re}\ \lambda_2<\lambda_1=b=1$ , then for $0\leqslant i\leqslant j$ ,

(5.22) \begin{equation} \begin{split}F_{i,j}P_{\lambda_1}=f_{i,j}(\lambda_1)P_{\lambda_1} =\frac{j+w_0}{i+w_0}P_{\lambda_1}. \end{split}\end{equation}

Proof. Since $\lambda_1$ is here assumed to be a simple eigenvalue, $\nu_{\lambda_1}=0$ . (Alternatively, see Lemma A.1.) Hence, (5.17) yields $F_{i,j}P_{\lambda_1}=f_{i,j}(\lambda_1)P_{\lambda_1}$ . Furthermore, (5.2) yields

(5.23) \begin{equation} f_{i,j}(\lambda_1)=f_{i,j}(1)=\frac{j+w_0}{i+w_0},\end{equation}

and (5.22) follows.

Lemma 5.7. For any fixed $x\in(0,1]$ , as ${n\to\infty}$ ,

(5.24) \begin{equation} F_{\lceil xn\rceil,n} \to x^{-A}.\end{equation}

Proof. Let K be a compact set containing $\sigma(A)$ in its interior. As ${n\to\infty}$ , by (5.7),

(5.25) \begin{equation} f_{\lceil xn\rceil,n}(z)=\Bigl(\frac{n}{\lceil xn\rceil}\Bigr)^z \bigl({1+o(1)}\bigr)=x^{-z}\bigl(1+o(1)\bigr)=x^{-z}+o(1),\end{equation}

uniformly for $z\in K$ . Consequently, $f_{\lceil xn\rceil,n}(z)-x^{-z}\to0$ uniformly on K, and thus $F_{\lceil xn\rceil,n}-x^{-A}\to0$ by (5.3).
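Lemma 5.7 is easy to check numerically, since $x^{-A}=e^{-(\log x)A}$ ; a sketch reusing `F` from the sketch after (4.16) (the matrix A below is an illustrative intensity matrix with $b=1$ ):

```python
from scipy.linalg import expm

A = np.array([[2/3, 1/3], [1/3, 2/3]])       # a concrete intensity matrix, b = 1
x, n = 0.25, 4000
lhs = F(int(np.ceil(x * n)), n, A, w0=1.0)   # F_{ceil(xn), n}
rhs = expm(-np.log(x) * A)                   # x^{-A}
print(np.linalg.norm(lhs - rhs))             # -> 0 as n grows
```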

Lemma 5.8. There exist $i_0$ and C such that if $i_0\leqslant i\leqslant j\leqslant 2i$ , then $\lVert F_{i,j}^{-1}\rVert\leqslant C$ .

Proof. Again let K be a compact set containing $\sigma(A)$ in its interior. By (5.7), we may choose $i_0$ such that if $i_0\leqslant i\leqslant j$ , then $|f_{i,j}(z)|\geqslant \frac12 |(j/i)^z|$ on K. If furthermore $i\leqslant j\leqslant 2i$ , this implies $|f_{i,j}(z)|\geqslant c$ on K, for some $c>0$ , and thus $|f_{i,j}^{-1}(z)|\leqslant c^{-1}$ on K. The result follows by (5.3). (The condition $j\leqslant 2i$ is not needed when $\sigma(A)\subset\{\textrm{Re}\, z>0\}$ , since then we may assume $\textrm{Re}\, z\geqslant0$ for $z\in K$ .)

6. Completions of the proofs

Proof of Theorem 3.1. By (4.12) and (2.6),

(6.1) \begin{equation} \mathbb{E} X_n = \sum_{\lambda\in\sigma(A)} F_{0,n} P_\lambda X_0. \end{equation}

For each eigenvalue $\lambda\neq\lambda_1$ , Lemma 5.4 shows that

(6.2) \begin{equation} F_{0,n} P_\lambda X_0=O\bigl({n^{\textrm{Re}\ \lambda}\log^{\nu_\lambda}n}\bigr)= O\bigl({n^{\textrm{Re}\ \lambda_2}\log^{\nu_2}n}\bigr).\end{equation}

Furthermore, by (2.12),

(6.3) \begin{equation} P_{\lambda_1}X_0=(a\cdot X_0)v_1 = w_0v_1,\end{equation}

and it follows from (5.22) that

(6.4) \begin{equation} \begin{split} F_{0,n}P_{\lambda_1}X_0 = \frac{n+w_0}{w_0}P_{\lambda_1}X_0 = \frac{n+w_0}{w_0} w_0 v_1=(n+w_0) v_1. \end{split}\end{equation}

The result (3.1) follows (when $\lambda_1=1$ ) from (6.1), (6.2), and (6.4).

Lemma 6.1. For every n, $ P_{\lambda_1}Y_n=0$ .

Proof. Since the urn is balanced, $a\cdot \Delta X_n=b$ is non-random, and thus, by (4.5),

(6.5) \begin{equation}a\cdot Y_n\coloneqq a\cdot\Delta X_{n-1} - \mathbb{E}\bigl({a\cdot\Delta X_{n-1}\mid \mathcal F_{n-1}}\bigr)=b-b=0.\end{equation}

The result follows by (2.12).

Using (2.6), we can rewrite (4.16) as

(6.6) \begin{equation} \begin{split}\operatorname{Var}(X_n)=\sum_{\lambda}\sum_{\mu} \sum_{i=1}^{n} F_{i,n}P_\lambda \mathbb{E}\bigl({Y_i Y_i'}\bigr)P_\mu^{\prime} F_{i,n}^{\prime}. \end{split}\end{equation}

For convenience, we define

(6.7) \begin{equation}T_{i,n,\lambda,\mu}\coloneqq F_{i,n}P_\lambda \mathbb{E}\bigl({Y_i Y_i^{\prime}}\bigr)P_\mu^{\prime} F_{i,n}^{\prime}.\end{equation}

Note that Lemma 6.1 implies $P_{\lambda_1}\mathbb{E}(Y_iY_i^{\prime})=\mathbb{E}(P_{\lambda_1}Y_iY_i^{\prime})=0$ and thus also, by taking the transpose, $\mathbb{E}(Y_iY_i^{\prime})P_{\lambda_1}^{\prime}=0$ . Hence $T_{i,n,\lambda,\mu}=0$ when $\lambda=\lambda_1$ or $\mu=\lambda_1$ , so these terms can be dropped and (6.6) can be written

(6.8) \begin{equation}\operatorname{Var}(X_n)=\sum_{\lambda\neq\lambda_1}\sum_{\mu\neq\lambda_1} \sum_{i=1}^{n} T_{i,n,\lambda,\mu}.\end{equation}

We begin with a simple estimate of this sum. The same estimates are given in [Reference Bai and Hu7, Theorem 2.2] under similar conditions.

Lemma 6.2. If $\lambda_1=1$ , then, for $n\geqslant2$ ,

(6.9) \begin{equation} \operatorname{Var} X_n=\begin{cases} O\bigl({n}\bigr),& \textrm{Re}\ \lambda_2<\frac12,\\[3pt] O\bigl({n\log^{2\nu_2+1}n }\bigr),& \textrm{Re}\ \lambda_2=\frac12,\\[3pt] O\bigl({n^{2\textrm{Re}\ \lambda_2}\log^{2\nu_2}n }\bigr),& \textrm{Re}\ \lambda_2>\frac12.\end{cases} \end{equation}

In particular, if $\textrm{Re}\ \lambda_2<\lambda_1=1$ , then

(6.10) \begin{equation}\operatorname{Var}(X_n) = o\bigl({n^2}\bigr). \end{equation}

Proof. It follows from (2.4) that $\mathbb{E} (Y_nY_n^{\prime})=O(1)$ . By combining this and Lemma 5.5, we see that if $\lambda$ and $\mu$ are two eigenvalues, then, for $1\leqslant i\leqslant n$ ,

(6.11) \begin{equation}T_{i,n,\lambda,\mu}= F_{i,n}P_\lambda \mathbb{E}\bigl({Y_i Y_i^{\prime}}\bigr)(F_{i,n}P_\mu)'=O\bigl({(n/i)^{\textrm{Re}\ \lambda+\textrm{Re}\mu}(1+\log (n/i))^{\nu_\lambda+\nu_\mu}}\bigr).\end{equation}

If $\textrm{Re}\ \lambda+\textrm{Re}\mu\geqslant1$ , we note that this implies

(6.12) \begin{equation}T_{i,n,\lambda,\mu} = O\bigl({(n/i)^{\textrm{Re}\ \lambda+\textrm{Re}\mu}\log^{\nu_\lambda+\nu_\mu}n}\bigr),\end{equation}

while if $\textrm{Re}\ \lambda+\textrm{Re}\mu<1$ , we choose $\alpha$ with $\textrm{Re}\ \lambda+\textrm{Re}\mu<\alpha<1$ and note that (6.11) implies

(6.13) \begin{equation}T_{i,n,\lambda,\mu} =O \bigl({(n/i)^{\alpha}}\bigr).\end{equation}

By summing over i we obtain from (6.12) and (6.13) that

(6.14) \begin{equation} \begin{split}\sum_{i=1}^n T_{i,n,\lambda,\mu} =\begin{cases} O\bigl({n}\bigr),& \textrm{Re}\ \lambda+\textrm{Re}\mu<1,\\[3pt] O\bigl({n\log^{\nu_\lambda+\nu_\mu+1}n }\bigr),& \textrm{Re}\ \lambda+\textrm{Re}\mu=1,\\[3pt] O\bigl({n^{\textrm{Re}\ \lambda+\textrm{Re}\mu}\log^{\nu_\lambda+\nu_\mu}n }\bigr),& \textrm{Re}\ \lambda+\textrm{Re}\mu>1.\end{cases} \end{split}\end{equation}

The result (6.9) follows from (6.8) by summing (6.14) over the finitely many $\lambda,\mu\in\sigma(A)\setminus\ensuremath{\{\lambda_1\}}$ and noting that our estimates are largest for $\lambda=\mu=\lambda_2$ . The simpler estimate (6.10) is an immediate consequence.

Lemma 6.3. If $\textrm{Re}\ \lambda_2<\lambda_1=1$ , then, as ${n\to\infty}$ ,

(6.15) \begin{equation}\mathbb{E}\bigl({Y_nY_n^{\prime}}\bigr)\to B-v_1v_1^{\prime}.\end{equation}

Hence, for any eigenvalue $\lambda\neq\lambda_1$ ,

(6.16) \begin{equation}P_\lambda\mathbb{E}\bigl({Y_nY_n^{\prime}}\bigr)\to P_\lambda B.\end{equation}

Proof. By (4.5) and (4.4), $Y_{n+1}=\Delta X_n-w_n^{-1} AX_n$ , with $\mathbb{E} (Y_{n+1}\mid\mathcal F_n)=0$ by (4.6). Hence,

(6.17) \begin{equation} \begin{split} \mathbb{E}\bigl({Y_{n+1}Y_{n+1}^{\prime}\mid \mathcal F_n}\bigr)= \mathbb{E}\bigl({\Delta X_n(\Delta X_n)'\mid \mathcal F_n}\bigr) -w_n^{-2} A X_n(AX_n)' \end{split}\end{equation}

and thus

(6.18) \begin{equation} \begin{split} \mathbb{E}\bigl({Y_{n+1}Y_{n+1}^{\prime}}\bigr)= \mathbb{E}\bigl({\Delta X_n(\Delta X_n)'}\bigr) -w_n^{-2} A \mathbb{E}\bigl({X_nX_n^{\prime}}\bigr)A'. \end{split}\end{equation}

By the definition of the urn and (4.3),

\begin{equation*} \begin{split}\mathbb{E}\bigl({\Delta X_n(\Delta X_n)'\mid \mathcal F_n}\bigr)&=\sum_{j=1}^q \mathbb{P}\bigl({I_{n+1}=j\mid \mathcal F_n}\bigr)\mathbb{E}\bigl({\xi_j\xi_j^{\prime}}\bigr)=\sum_{j=1}^q \frac{a_j X_{nj}}{w_n}\mathbb{E}\bigl({\xi_j\xi_j^{\prime}}\bigr), \end{split}\end{equation*}

and thus, using (5.1) and Theorem 3.1, and recalling (2.14), as ${n\to\infty}$ ,

(6.19) \begin{equation} \begin{split}\mathbb{E}\bigl({\Delta X_n(\Delta X_n)'}\bigr)=\sum_{j=1}^q \frac{a_j \mathbb{E} X_{nj}}{n+w_0}\mathbb{E}\bigl({\xi_j\xi_j^{\prime}}\bigr)\to\sum_{j=1}^q a_j v_{1j}\mathbb{E}\bigl({\xi_j\xi_j^{\prime}}\bigr)= B. \end{split}\end{equation}

Furthermore, by (6.10) and Theorem 3.1 again,

(6.20) \begin{equation}n^{-2} \mathbb{E}\bigl({X_nX_n^{\prime}}\bigr)=n^{-2} \operatorname{Var}(X_n)+n^{-2}(\mathbb{E} X_n)(\mathbb{E} X_n)'\to 0+v_1v_1^{\prime}.\end{equation}

Consequently, by (6.18), (6.19), and (6.20), and recalling that $w_n/n\to1$ by (5.1) and $Av_1=\lambda_1v_1=v_1$ ,

(6.21) \begin{equation}\mathbb{E}\bigl({Y_{n+1}Y_{n+1}^{\prime}}\bigr)\to B-Av_1v_1^{\prime}A^{\prime}= B-v_1v_1^{\prime}.\end{equation}

This proves (6.15), and (6.16) follows from noting that $P_\lambda v_1=P_\lambda P_{\lambda_1}v_1=0$ when $\lambda\neq\lambda_1$ .

Proof of Theorem 3.2. Let $\lambda,\mu\in\sigma(A)\setminus\ensuremath{\{\lambda_1\}}$ , and note that, by our assumption, $\textrm{Re}\ \lambda,\textrm{Re}\mu\leqslant\textrm{Re}\ \lambda_2<\frac12\lambda_1=\frac12$ . Write the inner sum in (6.8) as an integral:

(6.22) \begin{equation}\frac{1}n \sum_{i=1}^nT_{i,n,\lambda,\mu} =\int_0^1 T_{\lceil xn\rceil,n,\lambda,\mu} {\mathrm{d}} x.\end{equation}

For each fixed $x\in(0,1]$ , by Lemmas 5.7 and 6.3,

(6.23) \begin{equation} \begin{split}T_{\lceil xn\rceil,n,\lambda,\mu}&= F_{\lceil{xn}\rceil,n} P_\lambda \mathbb{E}\bigl({Y_{\lceil xn\rceil}Y^{\prime}_{\lceil xn\rceil}}\bigr)P_\mu^{\prime}F^{\prime}_{\lceil{xn}\rceil,n}\\&\to x^{-A} P_{\lambda} B P_\mu^{\prime} x^{-A'}. \end{split}\end{equation}

Furthermore, choose some $\alpha\in[0,1)$ such that $\textrm{Re}\ \lambda_2<\frac12\alpha$ . Then (6.13) applies and yields, for some $C<\infty$ ,

(6.24) \begin{equation}\begin{split} \lVert T_{\lceil xn\rceil,n,\lambda,\mu}\rVert\leqslant C (n/\lceil xn\rceil)^{\alpha}\leqslant C x^{-\alpha},\end{split}\end{equation}

which is integrable on (0,1]. Thus, Lebesgue’s theorem on dominated convergence applies to (6.22) and yields, by (6.23) and the change of variables $x=e^{-s}$ ,

\begin{equation*}\frac{1}n \sum_{i=1}^nT_{i,n,\lambda,\mu}\to\int_0^1 x^{-A} P_{\lambda} B P_\mu^{\prime} x^{-A'} {\mathrm{d}} x=\int_0^\infty e^{sA} P_{\lambda} B P_\mu^{\prime} e^{sA'} e^{-s}{\mathrm{d}} s.\end{equation*}

Hence, (6.8) and the definition (2.15) yield

\begin{equation*}\frac{1}n\operatorname{Var} X_n=\frac{1}n \sum_{\lambda\neq\lambda_1}\sum_{\mu\neq\lambda_1}\sum_{i=1}^nT_{i,n,\lambda,\mu}\to\int_0^\infty e^{sA} \widehat P B \widehat P' e^{sA'} e^{-s}{\mathrm{d}} s=\Sigma_I,\end{equation*}

showing (3.3).
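As an illustrative end-to-end check of Theorem 3.2, one can compare the Monte Carlo covariance with $\Sigma$ for a concrete urn (a sketch reusing the helpers from the earlier sketches; the urn below is a Friedman-type urn with $b=\lambda_1=3$ and $\lambda_2=1<\frac12\lambda_1$ , hence strictly small; we normalize $b=1$ before the quadrature, under which $\lambda_1\Sigma_I$ is invariant as noted in Section 5):

```python
a = np.array([1.0, 1.0])
xis = [np.array([2.0, 1.0]), np.array([1.0, 2.0])]   # deterministic replacements
A = intensity_matrix(a, xis)                          # here E xi_j = xi_j
v1, _ = top_projection(A, a)
B = sum(a[i] * v1[i] * np.outer(xis[i], xis[i]) for i in range(2))   # (2.14)
Sigma = sigma_I(A / 3.0, B, a / 3.0, l1=1.0)  # equals lambda_1 * Sigma_I (3.3)

rng = np.random.default_rng(2)
def run(n):
    X = np.array([1.0, 1.0])
    for _ in range(n):
        X = X + xis[rng.choice(2, p=a * X / (a @ X))]
    return X

n, reps = 4000, 300
XX = np.array([run(n) for _ in range(reps)])
print(np.cov(XX, rowvar=False) / n)           # roughly equals Sigma
print(Sigma)
```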

Proof of Theorem 3.3. As in the proof of Theorem 3.2, we use (6.8) and consider the sum $\sum_{i=1}^nT_{i,n,\lambda,\mu}$ for two eigenvalues $\lambda,\mu\in\sigma(A)\setminus\ensuremath{\{\lambda_1\}}$ . By assumption, $\textrm{Re}\ \lambda+\textrm{Re}\mu\leqslant2\textrm{Re}\ \lambda_2=1$ , and if $\textrm{Re}\ \lambda+\textrm{Re}\mu<1$ , then $\sum_{i=1}^nT_{i,n,\lambda,\mu}=O(n)$ by (6.14). Hence we only have to consider the case $\textrm{Re}\ \lambda+\textrm{Re}\mu=1$ , i.e., $\textrm{Re}\ \lambda=\textrm{Re}\mu=\frac12=\textrm{Re}\ \lambda_2$ . In particular, $\nu_\lambda,\nu_\mu\leqslant\nu_2$ .

In this case, as in (6.22), we transform the sum into an integral, but this time in a somewhat different way. Using the change of variables $x=n^y=e^{y\log n}$ , we have

(6.25) \begin{equation} \begin{split}\sum_{i=1}^nT_{i,n,\lambda,\mu} &=T_{1,n,\lambda,\mu} + \int_1^n T_{{\lceil x \rceil},n,\lambda,\mu}{\mathrm{d}} x\\&=T_{1,n,\lambda,\mu} + \int_0^1 T_{\lceil n^y\rceil,n,\lambda,\mu} n^y \log n\,{\mathrm{d}} y. \end{split}\end{equation}

Hence, since $T_{1,n,\lambda,\mu}=O\bigl({n\log^{2\nu_2}n}\bigr)$ by (6.12),

(6.26) \begin{equation} \begin{split}\bigl({n\log^{2\nu_2+1}n}\bigr)^{-1}\sum_{i=1}^nT_{i,n,\lambda,\mu}&=o(1) + \int_0^1 n^{y-1} (\!\log n)^{-2\nu_2}T_{\lceil n^y\rceil,n,\lambda,\mu}{\mathrm{d}} y. \end{split}\end{equation}

Fix $y\in(0,1)$ . Then, by (5.21),

(6.27) \begin{equation} \begin{split}F_{\lceil{n^y}\rceil,n} P_\lambda &= \frac{1}{\nu_2!}\left(\frac{n}{\lceil n^y\rceil}\right)^\lambda\log^{\nu_2}\left(\frac{n}{\lceil n^y\rceil}\right) \Bigl({N_\lambda^{\nu_2}P_\lambda +o(1)}\Bigr)\\&= \frac{1}{\nu_2!}n^{(1-y)\lambda}\bigl({(1-y)\log n}\bigr)^{\nu_2}\bigl({N_\lambda^{\nu_2}P_\lambda +o(1)}\bigr), \end{split}\end{equation}

and similarly for $\mu$ .

Recall the assumption $\textrm{Re}\ \lambda+\textrm{Re}\mu=1$ , and let $\tau\coloneqq \textrm{Im}\lambda+\textrm{Im}\mu$ , so $\lambda+\mu=1+\mathrm{i}\tau$ . Then, by (6.7), (6.27), and (6.16),

(6.28) \begin{align}&n^{y-1} (\!\log n)^{-2\nu_2}T_{\lceil n^y\rceil,n,\lambda,\mu}\nonumber\\ &\quad = \frac{1}{(\nu_2!)^2}n^{\mathrm{i}(1-y)\tau}(1-y)^{2\nu_2}N_\lambda^{\nu_2}P_\lambda B \bigl({N_\mu^{\nu_2} P_\mu}\bigr)'+o(1). \end{align}

Moreover, by (6.12), uniformly for $y\in(0,1]$ and $n\geqslant2$ ,

(6.29) \begin{equation} \begin{split} n^{y-1} (\!\log n)^{-2\nu_2}T_{\lceil n^y\rceil,n,\lambda,\mu}= O\bigl({(n/\lceil n^y\rceil) n^{y-1}}\bigr) = O(1). \end{split}\end{equation}

Hence the error term o(1) in (6.28) is also uniformly bounded, and we can apply dominated convergence to the integral of it, yielding

(6.30) \begin{align}&\int_0^1 n^{y-1} (\!\log n)^{-2\nu_2}T_{\lceil n^y\rceil,n,\lambda,\mu}{\mathrm{d}} y \nonumber\\ &\quad = \frac{1}{(\nu_2!)^2}\int_0^1 n^{\mathrm{i}(1-y)\tau}(1-y)^{2\nu_2} {\mathrm{d}} y\cdot N_\lambda^{\nu_2}P_\lambda {B} \bigl({N_\mu^{\nu_2} P_\mu}\bigr)'+o(1).\end{align}

In the case $\tau=0$ , i.e., $\mu=\overline \lambda$ , the integral on the right-hand side of (6.30) is $\int_0^1 (1-y)^{2\nu_2}{\mathrm{d}} y=(2\nu_2+1)^{-1}$ . Furthermore, in this case, $P_\mu=P_{\overline \lambda}=\overline{P_\lambda}$ and thus $P_\mu^{\prime}=P_\lambda^*$ , and similarly $N_\mu^{\prime}=N_\lambda^*$ . Hence, (6.30) yields

(6.31) \begin{equation}\int_0^1 n^{y-1} (\!\log n)^{-2\nu_2}T_{\lceil{n^y}\rceil,n,\lambda,\overline \lambda}{\mathrm{d}} y= \frac{1}{(2\nu_2+1)(\nu_2!)^2}N_\lambda^{\nu_2}P_\lambda {B} P_\lambda^*(N_\lambda^*)^{\nu_2}+o(1).\end{equation}

On the other hand, if $\tau\neq0$ , then, with $u=1-y$ ,

(6.32) \begin{equation} \int_0^1 n^{\mathrm{i}(1-y)\tau}(1-y)^{2\nu_2} {\mathrm{d}} y= \int_0^1 e^{\mathrm{i} (\tau\log n)u} u^{2\nu_2} {\mathrm{d}} u\to0\end{equation}

as ${n\to\infty}$ (so that $|\tau\log n|\to\infty$ ), by an integration by parts (or by the Riemann–Lebesgue lemma). Hence, when $\mu\neq\overline \lambda$ , (6.30) yields

(6.33) \begin{equation} \int_0^1 n^{y-1} (\!\log n)^{-2\nu_2}T_{\lceil n^y\rceil,n,\lambda,\mu}{\mathrm{d}} y=o(1).\end{equation}
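For completeness, a sketch of the integration-by-parts estimate behind (6.32): writing $T\coloneqq \tau\log n$ ,

\begin{equation*}\int_0^1 e^{\mathrm{i} Tu} u^{2\nu_2} {\mathrm{d}} u=\Bigl[\frac{e^{\mathrm{i} Tu}}{\mathrm{i} T}u^{2\nu_2}\Bigr]_0^1-\frac{2\nu_2}{\mathrm{i} T}\int_0^1 e^{\mathrm{i} Tu} u^{2\nu_2-1} {\mathrm{d}} u=O\bigl({|T|^{-1}}\bigr),\end{equation*}

since the boundary term and the remaining integral are both $O(|T|^{-1})$ .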

We saw in the beginning of the proof that we can ignore the terms in (6.8) with $\textrm{Re}\ \lambda<\frac12$ or $\textrm{Re}\mu<\frac12$ , and by (6.26) and (6.33), we can also ignore the case $\textrm{Re}\ \lambda=\textrm{Re}\mu=\frac12$ but $\mu\neq\overline \lambda$ . Hence only the case $\mu=\overline \lambda$ with $\textrm{Re}\ \lambda=\frac12$ remains in (6.8), and the result follows by (6.26) and (6.31).

Proof of Theorem 3.4. By (6.10),

\begin{equation*}\mathbb{E}\lVert{X_n/n-\mathbb{E} X_n/n}\rVert^2=n^{-2}\mathbb{E}\lVert{X_n-\mathbb{E} X_n}\rVert^2=\sum_{i=1}^q n^{-2}\operatorname{Var}(X_{ni})\to0,\end{equation*}

and $\mathbb{E} X_n/n\to v_1$ by Theorem 3.1. Hence, $\mathbb{E}\lVert{X_n/n-v_1}\rVert^2\to0$ , which is the claimed convergence in $L^2$ .

Moreover, if we fix $\varepsilon\in(0,\frac12)$ such that $\textrm{Re}\ \lambda_2<1-\varepsilon$ , then the same argument shows, using (6.9) and (6.1)–(6.4), that, more precisely,

(6.34) \begin{equation} \mathbb{E}\lVert{X_n-(n+w_0)v_1}\rVert^2 = O\bigl({n^{2-2\varepsilon}}\bigr).\end{equation}

Let $N\geqslant1$ . By (4.11) and the definition (4.10), for any $n\leqslant N$ ,

(6.35) \begin{equation} F_{n,N}X_n = F_{0,N} X_0+\sum_{\ell=1}^{n}F_{\ell,N} Y_{\ell}.\end{equation}

Moreover, by (4.6), $Y_n$ is a martingale difference sequence, and thus so is $F_{n,N}Y_n$ , for $n\leqslant N$ . Hence, (6.35) shows that $F_{n,N}X_n$ , $n\leqslant N$ , is a martingale, and thus

(6.36) \begin{equation} F_{n,N}X_n=\mathbb{E}\bigl({X_N\mid\mathcal F_n}\bigr),\qquad n\leqslant N.\end{equation}
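Indeed, since the matrices $F_{\ell,N}$ are deterministic and (assuming, as in (4.10), that $F_{N,N}$ is the identity) (6.35) with $n=N$ expresses $X_N$ itself, conditioning on $\mathcal F_n$ gives

\begin{equation*}\mathbb{E}\bigl({X_N\mid\mathcal F_n}\bigr)=F_{0,N}X_0+\sum_{\ell=1}^{n}F_{\ell,N}Y_\ell+\sum_{\ell=n+1}^{N}F_{\ell,N}\,\mathbb{E}\bigl({Y_\ell\mid\mathcal F_n}\bigr)=F_{n,N}X_n,\end{equation*}

since $\mathbb{E}\bigl({Y_\ell\mid\mathcal F_n}\bigr)=0$ for $\ell>n$ by the martingale difference property.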

By Lemma 5.6,

\[F_{n,N} v_1=F_{n,N} P_{\lambda_1} v_1=\frac{N+w_0}{n+w_0}v_1,\]

and thus (6.36) implies

(6.37) \begin{equation} F_{n,N}\bigl({X_n-(n+w_0)v_1}\bigr)=\mathbb{E}\bigl({X_N-(N+w_0)v_1\mid\mathcal F_n}\bigr),\qquad n\leqslant N.\end{equation}

Hence, by Doob’s inequality (applied to each coordinate) and (6.34),

(6.38) \begin{equation}\mathbb{E}\sup_{n\leqslant N}\lVert{ F_{n,N}\bigl({X_n-(n+w_0)v_1}\bigr)}\rVert^2\leqslant 4\mathbb{E}\lVert{X_N-(N+w_0)v_1}\rVert^2=O\bigl({N^{2-2\varepsilon}}\bigr).\end{equation}

It follows, using Lemma 5.8, that if $N\geqslant 2i_0$ , then

(6.39) \begin{equation}\mathbb{E}\sup_{N/2\leqslant n\leqslant N}\|\bigl({X_n-(n+w_0)v_1}\bigr)/n\|^2=O\bigl({N^{-2\varepsilon}}\bigr).\end{equation}

This holds trivially for smaller N as well, since each $X_n\in L^2$ and thus the left-hand side of (6.39) is finite for each N. Consequently, taking $N=2^k$ and summing,

(6.40) \begin{equation} \mathbb{E}\sum_{k=1}^\infty \sup_{2^{k-1}\leqslant n\leqslant 2^k}\|\bigl({X_n-(n+w_0)v_1}\bigr)/n\|^2 <\infty.\end{equation}
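Here the finiteness is simply a geometric series: by (6.39) with $N=2^k$ , the kth summand has expectation $O\bigl({2^{-2\varepsilon k}}\bigr)$ , and

\begin{equation*}\sum_{k=1}^\infty 2^{-2\varepsilon k}<\infty.\end{equation*}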

Consequently, $\|\bigl({X_n-(n+w_0)v_1}\bigr)/n\|\to0$ a.s., and thus $X_n/n\overset{\mathrm{a.s.}}{\longrightarrow} v_1$ .

Proof of Theorem 3.5. If $\operatorname{Var}(u\cdot X_n)=0$ for every n, then $u'\Sigma u=0$ by (3.4).

For the converse, assume that $u'\Sigma u=0$ . Then, by (3.3) and (2.15),

(6.41) \begin{equation}0=u'\Sigma_I u=\int_0^\infty u'\widehat P e^{sA} B e^{sA'} \widehat P'u\, e^{-\lambda_1 s} {\mathrm{d}} s.\end{equation}

The integrand is a continuous function of $s\geqslant0$ , and non-negative since B is non-negative definite by (2.14). Hence, the integrand vanishes for every $s\geqslant0$ . In particular, taking $s=0$ we obtain, using (2.14) again,

(6.42) \begin{equation} \begin{split}0 &= u'\widehat P B \widehat P' u= \sum_{i=1}^q a_i v_{1i} u'\widehat P \mathbb{E}(\xi_i\xi_i^{\prime})\widehat P' u\\&= \sum_{i=1}^q a_i v_{1i} \mathbb{E}\bigl({u'\widehat P \xi_i(u'\widehat P\xi_i)'}\bigr)= \sum_{i=1}^q a_i v_{1i} \mathbb{E}\bigl({u'\widehat P \xi_i}\bigr)^2, \end{split}\end{equation}

noting that $u'\widehat P \xi_i$ is a scalar. Each term is non-negative, and thus each term is 0. If i is such that $a_i>0$ , then it follows from the assumption that $\widetilde A$ is irreducible that $v_{1i}>0$ , and hence (6.42) yields $\mathbb{E}(u'\widehat P \xi_i)^2=0$ and thus $u'\widehat P\xi_i=0$ a.s. Furthermore, since the urn is balanced, by (2.12),

(6.43) \begin{equation} P_{\lambda_1}\xi_i = (a\cdot\xi_i)v_1 = b v_1.\end{equation}

Hence, for every i with $a_i>0$ ,

(6.44) \begin{equation} \begin{split} u\cdot\xi_i = u\cdot \bigl({\widehat P+P_{\lambda_1}}\bigr)\xi_i=0+u\cdot (bv_1)=bu\cdot v_1. \end{split}\end{equation}

This is independent of i, and thus, for every n, a.s.,

(6.45) \begin{equation} u\cdot \Delta X_n=bu\cdot v_1.\end{equation}

Consequently, a.s.,

(6.46) \begin{equation}u\cdot X_n=u\cdot X_0+nbu\cdot v_1\end{equation}

and thus $u\cdot X_n$ is deterministic.
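A simple illustration: for any balanced urn, the balance relation $a\cdot\xi_i=b$ a.s. (cf. (B.1)) shows that $u=a$ is always such a degenerate direction, since then

\begin{equation*}a\cdot \Delta X_n=b \quad\text{a.s.}\qquad\text{and thus}\qquad a\cdot X_n=a\cdot X_0+nb,\end{equation*}

which is deterministic; accordingly, $a'\Sigma a=0$ .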

7. Examples

Pólya urns have long been used in various applications, for example to study fringe structures in random trees; see e.g. [Reference Bagchi and Pal5], [Reference Aldous, Flannery and Palacios2], and [Reference Devroye11]. Some recent examples are given in [Reference Holmgren and Janson15], where, in particular, the number of two-protected nodes in a random m-ary search tree is studied for $m=2$ and $m=3$ using suitable Pólya urns with 5 and 19 types, respectively, and it is shown that if this number is denoted by $Y_n$ ( $m=2$ ) or $Z_n$ ( $m=3$ ) for a search tree with n keys, then

(7.1) \begin{align}\dfrac{Y_n-\frac{11}{30}n}{\sqrt{n}}&\overset{\mathrm{d}}{\longrightarrow}N\left({0,\frac{29}{225}}\right),\end{align}
(7.2) \begin{align}\frac{Z_n-\frac{57}{700}n}{\sqrt{n}}&\overset{\mathrm{d}}{\longrightarrow}N\left({0,\frac{1692302314867}{43692253605000}}\right).\end{align}

(The binary case (7.1) was shown earlier by Mahmoud and Ward [Reference Mahmoud and Ward25] using other methods.) The urns are strictly small; in both cases $\lambda_1=1$ and $\lambda_2=0$ , with $\nu_2=0$ , and Theorems 3.1 and 3.2 yield, using the calculations in [Reference Holmgren and Janson15] (see Remark 3.1),

(7.3) \begin{align}\mathbb{E} Z_n&=\frac{57}{700}\,n+ O(1),\end{align}
(7.4) \begin{align}\operatorname{Var} Z_n&=\frac{1692302314867}{43692253605000}\,n + o(n),\end{align}

together with corresponding results for $Y_n$ . (The results for $Y_n$ were previously shown in Mahmoud and Ward [Reference Mahmoud and Ward25], where exact formulas for the mean and variance of $Y_n$ were given.)

Furthermore, [Reference Holmgren and Janson15] also studies the numbers of leaves and one-protected nodes in a random m-ary search tree using a similar but simpler urn. (For $m=2$ this was done already by Devroye [Reference Devroye11].) For $2\leqslant m\leqslant 26$ , this is a strictly small urn, and again the results in Section 3 yield asymptotics of mean and variance.

See [Reference Holmgren, Janson and Šileikis16] for further similar examples.
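The asymptotics of mean and variance are easy to check by simulation. The following Python sketch is only an illustration (it is not the urn from [Reference Holmgren and Janson15]): it simulates the balanced two-colour urn with activities $(1,1)$ and deterministic replacements $\xi_1=(0,1)$ , $\xi_2=(1,0)$ , for which $b=\lambda_1=1$ , $\lambda_2=-1$ (a strictly small urn), $v_1=(1/2,1/2)$ , and (2.15) gives $\operatorname{Var} X_{n,1}/n\to1/12$ .

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_draws, x0=(1.0, 1.0)):
    """Two-colour urn, all activities 1: drawing colour 1 adds a ball of
    colour 2 (xi_1 = (0,1)); drawing colour 2 adds one of colour 1."""
    x = np.array(x0)
    for _ in range(n_draws):
        if rng.random() < x[0] / x.sum():  # draw colour 1
            x[1] += 1
        else:                              # draw colour 2
            x[0] += 1
    return x

n, reps = 2_000, 4_000
samples = np.array([simulate(n) for _ in range(reps)])
print("mean/n       :", samples.mean(axis=0) / n)  # approx (0.5, 0.5) = v_1
print("Var(X_n,1)/n :", samples[:, 0].var() / n)   # approx 1/12 = 0.0833...
```

The limiting variance $1/12$ here follows from (2.15): $\widehat P B\widehat P'$ has entries $\pm1/4$ and $e^{sA}\widehat P=e^{-s}\widehat P$ , so the integral contributes a factor $\int_0^\infty e^{-3s}{\mathrm{d}} s=1/3$ .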

Remark 7.1. As mentioned above, the urn used to show (7.1) has five types, corresponding to five different small trees. Drawing a ball corresponds to adding a node to a (randomly chosen) gap in the corresponding tree; this may cause the tree to break up into several smaller trees. The five types have 4, 3, 2, 1, and 0 gaps, respectively, and these numbers are their activities. Moreover, for type 2, the gaps are not equivalent, which makes the replacement for this type random. (We have $\xi_2=(1,-1,0,0,0)$ with probability $1/3$ and $\xi_2=(0,0,0,1,0)$ with probability $2/3$ ; see [Reference Holmgren and Janson15].)

A different, essentially equivalent, approach is to instead consider as types the different gaps in the different trees; this yields five new types that we denote by 1, 2A, 2B, 3, 4. The transition from the old urn to the new is a simple linear transformation: each old ball of type 1 is replaced by four new balls of type 1, which we write as $1\to4\cdot 1$ , and similarly $ 2\to \mathrm{2A}+2\cdot\mathrm{2B}$ , $3\to2\cdot 3$ , $4\to4$ , while balls of type 5 (which has activity 0) are ignored. This yields a new Pólya urn, where all types have activity 1. In the new urn, all replacements are deterministic, which sometimes is an advantage, but on the other hand, replacements now may involve subtractions. For example, in the original urn, $\xi_1=(-1,1,1,0,0)$ , meaning that if we draw a ball of type 1, it is discarded and replaced by one of type 2 and one of type 3. In the new urn, this translates to $\xi_1=(-4,1,2,2,0)$ , meaning that we remove the drawn ball together with three others of type 1, and then add $\mathrm{2A}+2\cdot\mathrm{2B}+2\cdot3$ . Even worse, $\xi_2=(4,-1,-2,0,0)$ , meaning that if we draw a ball of type 2A, we remove it together with two balls of type 2B, and add four balls of type 1. Nevertheless, by the construction, the urn is obviously tenable in the sense of the present paper. This urn, with the gaps as types, is thus an example of a tenable urn with subtractions occurring naturally in an application.
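Spelling out the translation for $\xi_1$ : a tree of type 1 contributes $(4,0,0,0,0)$ gaps, a tree of type 2 contributes $(0,1,2,0,0)$ , and one of type 3 contributes $(0,0,0,2,0)$ , so the old replacement $\xi_1=(-1,1,1,0,0)$ becomes

\begin{equation*}-1\cdot(4,0,0,0,0)+1\cdot(0,1,2,0,0)+1\cdot(0,0,0,2,0)=(-4,1,2,2,0),\end{equation*}

in agreement with the vector above; note that the coordinate sum is $1=b$ , so the new urn (all activities being 1) is again balanced.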

The Pólya urn for the ternary search tree with 19 types in [Reference Holmgren and Janson15] can similarly be translated into an urn (with 29 types) using gaps as types, again with deterministic replacements, but sometimes subtractions.

See also [Reference Holmgren and Janson15], where the transition to the corresponding urn with gaps as types was carried out for the simpler urn used to study leaves; in that case there are no subtractions.

8. Further comments

The decomposition (4.11) and its consequence (4.16) explain some of the differences between the small and large urns described in the introduction. Suppose again for convenience that $\lambda_1=1$ . Then the term $F_{\ell,n}Y_\ell$ in (4.11), which is the (direct and indirect) contribution from the $\ell$ th draw, has a variance roughly (ignoring logarithmic factors when $\nu_2>0$ ) of the order $(n/\ell)^{2\textrm{Re}\ \lambda_2}$ ; see Lemma 6.2 and its proof. For a large urn, this decreases rapidly with $\ell$ , and $\sum_\ell \ell^{-2\textrm{Re}\ \lambda_2}$ converges; thus the variance is dominated by the contribution from the first draws. This strong long-term dependency leads to the a.s. limit results, and to the dependency of the limit on the initial state $X_0$ .

On the other hand, for a strictly small urn, the sum of the variances is of the order n, but each term is o(n) and is negligible, which explains why the first draws, and the initial state, do not affect the limit distribution. In fact, for a component $X_{n,i}$ with asymptotic variance $(\Sigma)_{ii}>0$ , we see that for any $\varepsilon>0$ , all but a fraction $\varepsilon$ of $\operatorname{Var} X_{n,i}$ is explained by the draws with numbers in $[\delta n, n]$ , for some $\delta=\delta(\varepsilon)>0$ . The long-term dependency is thus weak in this case.

The remaining case, a small urn with $\textrm{Re}\ \lambda_2=1/2$ , is similar to the strictly small case, but the long-term dependency is somewhat stronger. If for simplicity we assume $\nu_2=0$ , then the contribution of the $\ell$ th draw to $\operatorname{Var}(X_n)$ is of the order $n/\ell$ , giving a total variance of order $n\log n$ . Again, the first draws and the initial state do not affect the limit distribution, but in order to explain all but a fraction $\varepsilon$ of the variance, we have to use the draws in $[n^\delta, n]$ , for some small $\delta>0$ . (Cf. the functional limit theorem [Reference Janson19, Theorem 3.31] with different time scales for strictly small and non-strictly small urns.)
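Schematically, with $\lambda_1=1$ , $r\coloneqq \textrm{Re}\ \lambda_2$ , and the logarithmic factors arising when $\nu_2>0$ ignored, the three regimes correspond to

\begin{equation*}\sum_{\ell=1}^n\Bigl({\frac{n}{\ell}}\Bigr)^{2r}\asymp\begin{cases}n, & r<\frac12,\\n\log n, & r=\frac12,\\n^{2r}, & r>\frac12,\end{cases}\end{equation*}

where in the last case the sum is dominated by its first terms, and in the first case by its last terms.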

Cf. also [Reference Janson19, Remark 4.3], where a similar argument is made using the corresponding continuous-time branching process.

Appendix A. The largest eigenvalue

We have seen in (2.8) that for a balanced urn, b is an eigenvalue of A, with a non-negative left eigenvector a.

In typical applications, b is the largest eigenvalue $\lambda_1$ . Before proceeding, let us note that this is not always true.

Example A.1. As a counterexample, consider an urn with three colours, all with activity 1, and the (deterministic) replacements $\xi_1=(1,2,0)$ , $\xi_2=(2,1,0)$ , $\xi_3=(-1,0,4)$ . The urn is balanced, with $b=3$ , and if we start with $X_0=(1,0,0)$ , the urn is tenable. Nevertheless, the largest eigenvalue $\lambda_1=4>b$ . Of course, the reason is that the urn will never contain any ball of colour 3, so this ought to be treated as an urn with just colours 1 and 2. (If there is any ball of colour 3, the urn is not tenable.)
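This is easy to verify numerically; the following small check (using that the jth column of A is $a_j\mathbb{E}\xi_j$ , as in (2.5)) is a minimal sketch.

```python
import numpy as np

# Replacement vectors of Example A.1; all activities a_j = 1.
xi = np.array([[ 1, 2, 0],   # xi_1
               [ 2, 1, 0],   # xi_2
               [-1, 0, 4]])  # xi_3
A = xi.T                     # jth column of A is a_j * E[xi_j] = xi_j
print(sorted(np.linalg.eigvals(A).real))  # [-1.0, 3.0, 4.0]: lambda_1 = 4 > b = 3
```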

Example A.1 is obviously a silly counterexample, but it shows that we need some extra assumption to exclude such trivialities. We have the following result, which shows that if we only use colours that actually can occur, then $\lambda_1=b$ holds (or, at least, may be assumed) and $\nu_1=0$ .

Lemma A.1. If the Pólya urn is tenable and balanced, and moreover every colour has a non-zero probability of ever appearing in the urn, then $\textrm{Re}\ \lambda\leqslant b$ for every $\lambda\in\sigma(A)$ , and, furthermore, if $\textrm{Re}\ \lambda=b$ then $\nu_\lambda=0$ . We may thus assume $\lambda_1=b$ .

Proof. As in Section 5, we may and shall assume that $b=1$ .

Suppose that $\lambda\in\sigma(A)$ . By (4.12) and Lemma 5.4,

(A.1) \begin{equation} \begin{split}P_\lambda \mathbb{E} X_n&= P_\lambda F_{0,n} X_0= F_{0,n}P_\lambda X_0\\[3pt]&=n^{\lambda}\log^{\nu_\lambda}n \frac{\Gamma(w_0)}{\nu_\lambda!\,\Gamma(w_0+\lambda)}N_\lambda^{\nu_\lambda}P_\lambda X_0+ o\bigl({n^{\textrm{Re}\ \lambda}\log^{\nu_\lambda}n}\bigr). \end{split}\end{equation}

On the other hand, by our assumption (2.4), $\mathbb{E}\|\xi_i\|<\infty$ for each i, and thus $\mathbb{E}\|\Delta X_n\|\leqslant \max_i\mathbb{E}\|\xi_i\|<\infty$ and therefore $\|\mathbb{E} X_n\|\leqslant\mathbb{E}\|X_n\|=O(n)$ . Hence,

(A.2) \begin{equation}P_\lambda \mathbb{E} X_n=O(n).\end{equation}

Suppose now that either $\textrm{Re}\ \lambda>1$ , or $\textrm{Re}\ \lambda=1$ and $\nu_\lambda>0$ . Then (A.1) and (A.2) yield the desired contradiction unless

(A.3) \begin{equation} N_\lambda^{\nu_\lambda}P_\lambda X_0=0.\end{equation}

Moreover, if we run the urn for k steps and regard $X_k$ as a new starting position (conditioning on $X_k$ ), then the resulting urn is a.s. tenable; hence the argument just given shows that

(A.4) \begin{equation} N_\lambda^{\nu_\lambda}P_\lambda X_k=0\end{equation}

a.s. for every $k\geqslant0$ . Hence, also $ N_\lambda^{\nu_\lambda}P_\lambda \Delta X_k=0$ a.s. If j is any colour with $a_j>0$ , then, by assumption, there exists a k such that $\mathbb{P}(X_{k,j}>0)>0$ , and thus with positive probability $I_{k+1}=j$ and then $\Delta X_k$ is a copy of $\xi_j$ . Consequently, if $a_j>0$ then

(A.5) \begin{equation} N_\lambda^{\nu_\lambda}P_\lambda \xi_j=0\end{equation}

a.s., and thus

(A.6) \begin{equation} N_\lambda^{\nu_\lambda}P_\lambda \mathbb{E}\xi_j=0.\end{equation}

In other words, for every j,

(A.7) \begin{equation} N_\lambda^{\nu_\lambda}P_\lambda \bigl({a_j \mathbb{E}\xi_j}\bigr)=0.\end{equation}

However, by (2.5), the jth column of A is $a_j\mathbb{E}\xi_j$ . Consequently, (A.7) is equivalent to $N_\lambda^{\nu_\lambda}P_\lambda A=0$ . Since $P_\lambda A =\lambda P_\lambda+N_\lambda$ by (2.7), and $N_\lambda^{\nu_\lambda+1}=0$ , this yields

(A.8) \begin{equation}0= N_\lambda^{\nu_\lambda}P_\lambda A=\lambda N_\lambda^{\nu_\lambda}P_\lambda +N_\lambda^{\nu_\lambda+1}=\lambda N_\lambda^{\nu_\lambda}\end{equation}

and thus $\lambda=0$ , which contradicts the assumption on $\lambda$ .

Consequently, $\textrm{Re}\ \lambda\leqslant1=b$ for every $\lambda\in\sigma(A)$ , and since $b\in\sigma(A)$ , $\textrm{Re}\ \lambda_1=\max_{\lambda\in\sigma(A)}\textrm{Re}\ \lambda=b$ . We have not ruled out the possibility that there are other eigenvalues $\lambda$ with $\textrm{Re}\ \lambda=b$ (see Remark A.2 below), but even if this should happen, we have shown that they all have $\nu_\lambda=0$ , so we are free to choose $\lambda_1=b$ .

Remark A.1. As noted in Remark 2.2, the eigenvalue $\lambda_1=b$ may be multiple. Lemma A.1 shows that $\nu_1\coloneqq \nu_{\lambda_1}=0$ also in this case.

Remark A.2. Lemma A.1 is not completely satisfactory since it does not rule out the possibility that besides b, there is also some complex eigenvalue $\lambda=b+it$ with $t\neq0$ . We do not believe that this is possible, but we do not know a proof for a general tenable urn under our assumptions.

Appendix B. A note on (2.15)

In the balanced case, by (2.12) and (2.3), a.s.,

(B.1) \begin{equation} P_{\lambda_1}\xi_i=(a\cdot\xi_i)v_1=bv_1=\lambda_1v_1,\end{equation}

and thus, by (2.14) and (2.5),

(B.2) \begin{equation} \begin{split}P_{\lambda_1} B &=\sum_{i=1}^q a_i v_{1i}\mathbb{E}\bigl({P_{\lambda_1}\xi_i\xi_i^{\prime}}\bigr)=\sum_{i=1}^q a_i v_{1i}\lambda_1v_1\mathbb{E}\bigl({\xi_i^{\prime}}\bigr)=\lambda_1v_1\Bigl({\sum_{i=1}^q a_i v_{1i}\mathbb{E}\xi_{ij}}\Bigr)_j^{\prime}\\&=\lambda_1v_1\Bigl({\sum_{i=1}^q (A)_{ji} v_{1i}}\Bigr)^{\prime}_j=\lambda_1v_1\bigl({A v_{1}}\bigr)'=\lambda_1^2v_1v_{1}^{\prime}. \end{split}\end{equation}

Since B is symmetric, also $BP_{\lambda_1}^{\prime}=(P_{\lambda_1} B)'=\lambda_1^2v_1v_1^{\prime}$ , and thus

(B.3) \begin{equation} P_{\lambda_1} B P_{\lambda_1}^{\prime} = P_{\lambda_1} B = B P_{\lambda_1}^{\prime} = \lambda_1^2 v_1v_1^{\prime}\end{equation}

and, as a simple consequence, still in the balanced case,

(B.4) \begin{equation} \widehat P B \widehat P' = \widehat P B = B \widehat P' = B-\lambda_1^2 v_1v_1^{\prime}.\end{equation}

Cf. Lemma 6.3 (where $\lambda_1=1$ ).
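As a numerical sanity check of (B.3) and (B.4), the following sketch uses an illustrative balanced two-colour urn (not taken from the text) with activities $a=(1,1)$ and deterministic replacements $\xi_1=(2,1)$ , $\xi_2=(1,2)$ , so that $b=\lambda_1=3$ , $v_1=(1/2,1/2)$ (normalized by $a\cdot v_1=1$ ), and $P_{\lambda_1}=v_1a'$ .

```python
import numpy as np

a = np.array([1.0, 1.0])                 # activities
xi = np.array([[2.0, 1.0],               # xi_1 (deterministic)
               [1.0, 2.0]])              # xi_2 (deterministic)
lam1, v1 = 3.0, np.array([0.5, 0.5])     # b = lambda_1 = 3, a . v1 = 1
P1 = np.outer(v1, a)                     # P_{lambda_1} = v1 a'
Phat = np.eye(2) - P1                    # complementary projection
B = sum(a[i] * v1[i] * np.outer(xi[i], xi[i]) for i in range(2))  # (2.14)

target = lam1**2 * np.outer(v1, v1)
print(np.allclose(P1 @ B, target))       # True, cf. (B.3)
print(np.allclose(B @ P1.T, target))     # True, cf. (B.3)
print(np.allclose(Phat @ B, B - target)) # True, cf. (B.4)
```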

Hence, in the balanced case, we can omit either $\widehat P$ or $\widehat P'$ in (2.15). (This was noted empirically by Axel Heimbürger and Cecilia Holmgren, in personal communication.) However, by (B.4), we cannot omit both, nor even move both outside the integral, because $e^{sA}v_1=e^{\lambda_1s}v_1$ and thus

(B.5) \begin{equation}\int_0^\infty e^{sA}v_1v_1^{\prime}e^{sA'}e^{-\lambda_1s}{\mathrm{d}} s=\int_0^\infty e^{\lambda_1s}v_1v_1^{\prime}{\mathrm{d}} s,\end{equation}

which diverges.

Acknowledgements

I thank the anonymous referees for insightful and helpful comments. This work was supported by the Knut and Alice Wallenberg Foundation.

References

Aldous, D. J. and Eagleson, G. K. (1978). On mixing and stability of limit theorems. Ann. Prob. 6, 325–331.
Aldous, D. J., Flannery, B. and Palacios, J. L. (1988). Two applications of urn processes—the fringe analysis of search trees and the simulation of quasi-stationary distributions of Markov chains. Prob. Eng. Inf. Sci. 2, 293–307.
Athreya, K. B. and Karlin, S. (1968). Embedding of urn schemes into continuous time Markov branching processes and related limit theorems. Ann. Math. Statist. 39, 1801–1817.
Athreya, K. B. and Ney, P. E. (1972). Branching Processes. Springer, Berlin.
Bagchi, A. and Pal, A. K. (1985). Asymptotic normality in the generalized Pólya–Eggenberger urn model, with an application to computer data structures. SIAM J. Algebraic Discrete Meth. 6, 394–405.
Bai, Z.-D. and Hu, F. (1999). Asymptotic theorems for urn models with nonhomogeneous generating matrices. Stoch. Process. Appl. 80, 87–101.
Bai, Z.-D. and Hu, F. (2005). Asymptotics in randomized urn models. Ann. Appl. Prob. 15, 914–940.
Bai, Z.-D., Hu, F. and Rosenberger, W. F. (2002). Asymptotic properties of adaptive designs for clinical trials with delayed response. Ann. Statist. 30, 122–139.
Bernstein, S. N. (1940). Nouvelles applications des grandeurs aléatoires presqu'indépendantes. Izv. Akad. Nauk SSSR Ser. Mat. 4, 137–150 (in Russian).
Bernstein, S. N. (1940). Sur un problème du schéma des urnes à composition variable. C. R. Acad. Sci. URSS 28, 5–7.
Devroye, L. (1991). Limit laws for local counters in random binary search trees. Random Structures Algorithms 2, 303–315.
Dunford, N. and Schwartz, J. T. (1958). Linear Operators. Part I: General Theory. Interscience Publishers, Inc., New York.
Eggenberger, F. and Pólya, G. (1923). Über die Statistik verketteter Vorgänge. Z. Angew. Math. Mech. 3, 279–289.
Flajolet, P., Gabarró, J. and Pekari, H. (2005). Analytic urns. Ann. Prob. 33, 1200–1233.
Holmgren, C. and Janson, S. (2015). Asymptotic distribution of two-protected nodes in ternary search trees. Electron. J. Prob. 20, 1–20.
Holmgren, C., Janson, S. and Šileikis, M. (2017). Multivariate normal limit laws for the numbers of fringe subtrees in m-ary search trees and preferential attachment trees. Electron. J. Combin. 24, Paper 2.51, 49 pp.
Hu, F. and Rosenberger, W. F. (2006). The Theory of Response-Adaptive Randomization in Clinical Trials. Wiley-Interscience, Hoboken, NJ.
Hu, F. and Zhang, L.-X. (2004). Asymptotic normality of urn models for clinical trials with delayed response. Bernoulli 10, 447–463.
Janson, S. (2004). Functional limit theorems for multitype branching processes and generalized Pólya urns. Stoch. Process. Appl. 110, 177–245.
Janson, S. (2005). Limit theorems for triangular urn schemes. Prob. Theory Relat. Fields 134, 417–452.
Janson, S. and Pouyanne, N. (2018). Moment convergence of balanced Pólya processes. Electron. J. Prob. 23, paper no. 34, 13 pp.
Jiřina, M. (1958). Stochastic branching processes with continuous state space. Czechoslovak Math. J. 8, 292–313.
Johnson, N. L. and Kotz, S. (1977). Urn Models and Their Application. Wiley, New York.
Mahmoud, H. M. (2009). Pólya Urn Models. CRC Press, Boca Raton, FL.
Mahmoud, H. M. and Ward, M. D. (2012). Asymptotic distribution of two-protected nodes in random binary search trees. Appl. Math. Lett. 25, 2218–2222.
Markov, A. A. (1917). Sur quelques formules limites du calcul des probabilités. Bull. Acad. Imp. Sci. 11, 177–186 (in Russian).
Olver, F. W. J., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (eds) (2010). NIST Handbook of Mathematical Functions. Cambridge University Press. Also available as NIST Digital Library of Mathematical Functions, http://dlmf.nist.gov.
Pólya, G. (1930). Sur quelques points de la théorie des probabilités. Ann. Inst. H. Poincaré 1, 117–161.
Pouyanne, N. (2008). An algebraic approach to Pólya processes. Ann. Inst. H. Poincaré Prob. Statist. 44, 293–323.
Savkevitch, V. (1940). Sur le schéma des urnes à composition variable. C. R. Acad. Sci. URSS 28, 8–12.
Zhang, L.-X., Hu, F. and Cheung, S. H. (2006). Asymptotic theorems of sequential estimation-adjusted urn models. Ann. Appl. Prob. 16, 340–369.