1. Introduction
In [19], the following urn model was introduced. Balls in an urn are labeled with elements of a finite group G. To update the urn, two balls are drawn (independently and with replacement) and their labels recorded. A new ball with the product of the two labels is then added to the urn. When the initial configuration of the urn contains generators of the group, it is proved that the composition of the urn converges to the uniform distribution on the group (an alternative proof is given in [1]). A natural extension of the model is to allow labels with values from an infinite group. Specifically, this paper considers the same dynamics for an urn containing integers with the addition operation, and calls such a process a $\mathbb{Z}$-urn.
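For concreteness, the update rule can be summarized in a few lines of code. The following minimal simulation sketch (ours, not part of the original papers; the function name and seed are illustrative) reproduces the kind of experiment shown in Figure 1 below.

```python
import random

def simulate_z_urn(initial, steps, seed=None):
    """Simulate the Z-urn: at each step, draw two balls independently
    (with replacement) and add a new ball labeled with the sum."""
    rng = random.Random(seed)
    urn = list(initial)
    for _ in range(steps):
        urn.append(rng.choice(urn) + rng.choice(urn))
    return urn

# Start from the additive basis {-1, 1} and add 5000 balls, as in Figure 1.
labels = simulate_z_urn([-1, 1], steps=5000, seed=0)
print(sum(labels) / len(labels))  # empirical mean; it varies across runs
```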
The behavior of the urn should depend on the initial configuration of balls. A natural first question is the behavior of the urn started from an additive basis for $\mathbb{Z}$, e.g. $\{-1, 1 \}$. Figure 1 shows the results of two different simulations of this model. An interesting phenomenon is observed with the starting configuration $\{-1, 1 \}$: the values in the urn become either almost all positive or almost all negative. Furthermore, the curve of the histogram appears exponential, reflected on either side of the vertical axis. That is, if $\mu_n$ is the empirical measure defined by the labels of the n balls in the urn, then $\mu_n$ converges to some scaled exponential. Another observation from simulations is that the mean of the distribution $\mu_n$ varies between trials, i.e. there is no deterministic limit.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210622133118498-0423:S002190022000090X:S002190022000090X_fig1.png?pub-status=live)
Figure 1: Histograms of two different trials of the urn model with initial configuration $\{-1, 1 \}$, after 5000 balls have been added. Each histogram has 30 bins. The bin width of the left histogram is 100, with mean 356, and the width of the right histogram is 200, with mean
$-737$.
The appropriately rescaled mean of the empirical measure defined by the urn process converges almost surely to some random variable, whose distribution depends on the initial configuration of the urn. The distribution of a random draw from the urn, suitably rescaled, then converges to an exponential scaled by this random variable. This is our main result, Theorem 1, described in the following section.
1.1. Model and results
Let $\tau_0$ be the number of balls initially in the urn. For convenience, we begin the time index at
$\tau_0 + 1$, so that at time
$n > \tau_0$ there are exactly n balls in the urn (and the number of additions is
$n - \tau_0$). Let
$X_n \in \mathbb{Z}^d$ denote the label of the nth ball in the urn. When
$d\,{>}\,1$, write
$X_n(i)$ for the ith coordinate. Thus, the urn at time n is given by the set
$\textbf{U}_n = \textbf{U}_0 \cup \{X_{\tau_0 + 1}, \dots, X_n \}$.
The letter $Z_n$ is used to denote a random draw from the urn, that is
$Z_n \sim \mu_n$, where
$\mu_n$ is the empirical distribution
$$\mu_n = \frac{1}{n} \sum_{i = 1}^{n} \delta_{X_i}.$$
Then $X_{n+1}$ is generated by
$X_{n+1} = Z_n^1 + Z_n^2$, where
$Z_n^1 \stackrel{\mathrm{d}}{=} Z_n^2 \stackrel{\mathrm{d}}{=} Z_n$ are two independent draws from the urn at step n. For convenience, assign an arbitrary indexing to the initial balls
$x \in \textbf{U}_0$, writing them
$X_1, X_2, \dots, X_{\tau_0}$. Let
$S_n = \sum_{i = 1}^n X_i$ be the sum of the labels of balls in the urn at time n. When studying
$Z_n$ and
$X_n$ we focus on the quenched values. That is, there is an implicit underlying urn process, and expectations are calculated conditional on the appropriate step of the urn. For instance, the mean of
$Z_n$ is
$\operatorname{{\mathbb E}}[Z_n \mid \mu_n] = \frac{S_n}{n}$.
Theorem 1. Let $\{\mu_n \}_{n \ge 0}$ be a sequence of empirical measures defined by the
$\mathbb{Z}$-urn with any initial configuration. Suppose
$Z_n \sim \mu_n$ is a random draw from the urn. Let
$A = \lim_{n \to \infty} \frac{\operatorname{{\mathbb E}}[Z_n \mid \mu_n]}{n+1}$. Then A exists almost surely,
$\operatorname{\mathbb{P}}(A \neq 0) > 0$, and
$$\frac{Z_n}{n} \xrightarrow{\mathrm{d}} Z \qquad \text{as } n \to \infty,$$
where $Z \sim A \cdot \mathrm{Exp}(1)$.
In particular, this result implies that if the urn is started with $\{-1, 1 \}$, the limiting distribution defined by the urn will be supported either on
$(0, \infty)$ or
$(-\infty, 0)$, depending on the sign of A. As remarked before, Theorem 1 is a statement about the quenched version of the problem. This means a realization of the urn process
$\{\mu_n \}$ is fixed and the random variable
$Z_n$ is sampled from the urn process at stage n. In contrast, let
$Z_n^*$ denote the annealed model. That is,
$Z_n^*$ is the random variable obtained by first generating a realization of the urn
$\mu_n$, and then sampling from
$\mu_n$. In other words,
$Z_n^*$ is the mixture of
$Z_n$ over all realizations of
$\mu_n$. The limiting distribution of
$\frac{Z_n^*}{n}$ is then the mixture over A of exponential distributions scaled by A. At present there is no explicit form for the distribution of A, though we can observe how its expected value depends on the initial configuration of the urn.
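In symbols (our restatement of the claim above): for $E \sim \mathrm{Exp}(1)$ independent of A,
$$\lim_{n \to \infty} \operatorname{\mathbb{P}}\bigg(\frac{Z_n^*}{n} \le z\bigg) = \operatorname{{\mathbb E}}\big[\operatorname{\mathbb{P}}(A \cdot E \le z \mid A)\big].$$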
All results are the same for urns with labels taking values in the real numbers. The motivation for introducing the process for integer-valued labels comes from (a) the original desire to extend the model of [19] to a finitely generated group, and (b) the wish to somewhat simplify the possible starting configurations that can be investigated. In addition, the methods extend easily to urns with vector labels. The exponential limit law for $Z_n$, a draw from the urn, follows from a limit law for
$X_n$, the nth ball added to the urn. The key is that the limit of an appropriately scaled version of
$X_n$ satisfies the distributional identity
$$X \stackrel{\mathrm{d}}{=} U_1 \cdot X^{1} + U_2 \cdot X^{2}, \tag{1}$$
where $U_1, U_2$ are independent and identically distributed (i.i.d.) Uniform([0, 1]) random variables and
$X^1, X^2$ are i.i.d. copies of X. The unique random variable that satisfies this distributional identity is a Gamma with shape 2 and mean
$\operatorname{{\mathbb E}}[X]$. If
$X \in \mathbb{R}^d$ is a vector, then it is interesting to observe that solutions to (1) are of the form
$G \cdot \operatorname{{\mathbb E}}[X]$, where G is a one-dimensional Gamma random variable. That is, each scaled coordinate of
$X_n$ marginally converges to a Gamma distribution, but as a joint process every coordinate converges to a multiple of the same Gamma random variable. The precise statement for urns with vector labels is given in the following theorem.
Theorem 2. Let $\{\mu_n^d \}_{n \ge 0}$ be a sequence of empirical measures defined by the addition urn model on
$\mathbb{Z}^d$. Suppose
$\textbf{X}_n \in \mathbb{Z}^d$ is the nth ball added to the urn. Let
$\textbf{A} = \lim_{n \to \infty} \frac{\operatorname{{\mathbb E}}[\textbf{X}_{n+1} \mid \mu_n]}{2(n+1)}$. Then
$\textbf{A} = (A(1), \dots, A(d)) \in \mathbb{R}^d$ exists almost surely,
$\operatorname{\mathbb{P}}(A(i) \neq 0) > 0$ for all
$i = 1, \dots, d$, and
$$\frac{\textbf{X}_n}{n} \xrightarrow{\mathrm{d}} \textbf{X} \qquad \text{as } n \to \infty,$$
where $\textbf{X} \sim G \cdot \textbf{A}$, and
$G \sim \text{Gamma}(2, 1)$.
Remark 1. A simple generalization of the model would be to draw k balls (with replacement) and add a new ball whose label is the sum of the k drawn labels. For any integer $k \ge 2$, the same techniques used in this paper prove that the nth label added
$X_n$ has limiting distribution X satisfying the identity
$X \stackrel{\mathrm{d}}{=} \sum_{i = 1}^k U_i \cdot X^i$, where
$U_i$ are i.i.d. Uniform([0, 1]) and
$X^i$ are i.i.d. copies of X, for
$1 \le i \le k$. Note that when
$k > 2$, a Gamma distribution does not satisfy this distributional identity.
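To see why, note that any solution of this k-fold identity must have characteristic function satisfying $\varphi(t) = \big(\int_0^1 \varphi(ut) \, \mathrm{d}u\big)^k$. The following short calculation (ours, included for illustration) checks a $\text{Gamma}(3, \theta)$ candidate for $k = 3$:
$$\varphi(t) = (1 - \mathrm{i} \theta t)^{-3}, \qquad \int_0^1 \varphi(u t) \, \mathrm{d}u = \Big(1 - \frac{\mathrm{i} \theta t}{2}\Big) (1 - \mathrm{i} \theta t)^{-2},$$
so that $\big(\int_0^1 \varphi(ut) \, \mathrm{d}u\big)^3 = \big(1 - \frac{\mathrm{i} \theta t}{2}\big)^3 (1 - \mathrm{i} \theta t)^{-6} \neq \varphi(t)$.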
1.2. Related work
Random processes defined using urns arise in many different settings. For a plethora of example models and applications, see [6] and [16]. ‘Pólya-type’ urn models refer to an urn containing different types of balls with some sort of replacement scheme. That is, balls are drawn randomly from the urn and replaced with some number and type of balls that depends on the types drawn [13].
Urn models are well studied for a finite number of colors (i.e. types of balls); an extended scheme allowing a continuum of colors was proposed in the classic paper by Blackwell and MacQueen [4]. More work in this setting has been done in the past few years, and a general infinite-color model was introduced in [21]. In [2], a model of urns containing a countably infinite number of colors is proposed; when the replacement schemes are balanced and associated with bounded-increment random walks on the color space (e.g. $\mathbb{Z}^d$), central and local limit theorems for the random color of the nth selected ball are proved. This has been extended to almost sure convergence, under assumptions on the replacement scheme [5]. In all of these works, the urns are updated after drawing a single ball. The replacement rule is a function of a single variable of the color space.
Much less work has been done for models in which multiple balls are drawn at each stage. Several specific cases have been analyzed, for example in Chapter 10 of [13]. The first general results for two-color urns were contained in [9, 10]. The results were extended to arbitrary (but finite) d-color urns, with an additional assumption removed, in [11]. The method used in [11] is stochastic approximation, which is useful for a broad class of urn models. Though stochastic approximation processes can be defined for infinite-dimensional random processes, the method is challenging to apply to the $\mathbb{Z}$-urn because new types are introduced as the process progresses. To the best of the author’s knowledge, the $\mathbb{Z}$-urn is the first model to be analyzed that involves both multiple drawings and infinitely many types of balls.
Motivation for such an urn model comes from philosophy: a reinforcement learning process for a signaling game which models the evolution of a simple language [20]. The reinforcement learning can be viewed as an interacting Pólya-type urn system, and each step involves multiple drawings. In [12], the reinforcement procedure is extended to include the feature of invention, in which new strategies are introduced as time progresses, so that the urn model contains infinitely many types of balls. Mathematical analysis of the convergence of this process remains incomplete.
1.3. Outline
In Section 2, a martingale is used to prove that the rescaled mean converges. Though the rescaled mean is not bounded, its second moment is, so Doob’s $L_2$ martingale convergence theorem applies. In addition, recursive arguments prove that all moments of the limiting random variable exist. In Section 3, the contraction method is outlined and a recursive distributional equation is used to prove Theorems 1 and 2. The final section contains conclusions, future questions, and acknowledgments.
2. A martingale for the mean
In this section, a scaled version of the mean of the empirical distribution determined by the urn is studied. Specifically, recall that $S_n = \sum_{i = 1}^n X_i$ is the sum of the labels of the first n balls in the urn, and let $A_n := \frac{S_n}{n(n+1)} \in \mathbb{R}^d$.
Lemma 1. The process $\{A_n \}_{n \ge \tau_0}$ is a martingale and
$ 0 < \liminf_{n \ge \tau_0} \operatorname{{\mathbb E}}[A_n(i)^2] \le \limsup_{n \ge \tau_0} \operatorname{{\mathbb E}}[A_n(i)^2] < \infty $ for
$1 \le i \le d$. Hence,
$A_n$ converges almost surely and in
$L_2(\mathbb{R}^d)$.
Proof. Conditional on the current configuration of the urn, the expectation of the $(n + 1)$th ball added is
$\operatorname{{\mathbb E}}[X_{n+1} \mid \mu_n] = 2 \cdot \operatorname{{\mathbb E}}[Z_n \mid \mu_n] = \frac{2 S_n}{n}$,
$n \ge \tau_0$. Then,
$$\operatorname{{\mathbb E}}[A_{n+1} \mid \mu_n] = \frac{S_n + \operatorname{{\mathbb E}}[X_{n+1} \mid \mu_n]}{(n+1)(n+2)} = \frac{S_n \big(1 + \frac{2}{n}\big)}{(n+1)(n+2)} = \frac{S_n}{n(n+1)} = A_n,$$
and thus $\{ A_n \}_{n \ge \tau_0}$ is a martingale. Now, the annealed expectation of the sum
$S_n$ is
$$\operatorname{{\mathbb E}}[S_n] = n(n+1) \operatorname{{\mathbb E}}[A_n] = n(n+1) A_{\tau_0} = \frac{n(n+1)}{\tau_0(\tau_0+1)} S_{\tau_0}$$
for $n > \tau_0$. Note that, for a fixed n, each coordinate
$S_n(i)$ is almost surely bounded. However, the largest possible label in the urn grows exponentially in n. To prove convergence of the martingale
$A_n$, we can prove that each coordinate
$A_n(i)$ is
$L_2$ bounded. For the following, omit writing the index so that each quantity is one dimensional. Define
$R_n := S_n^2 = \sum_{i, j = 1}^n X_i X_j$, and also define $Q_n := \sum_{i = 1}^n X_i^2$, for the purpose of writing
$$\operatorname{{\mathbb E}}\big[X_{n+1}^2 \mid \mu_n\big] = \operatorname{{\mathbb E}}\big[(Z_n^1 + Z_n^2)^2 \mid \mu_n\big] = \frac{2 Q_n}{n} + \frac{2 R_n}{n^2}.$$
Using this, calculate the conditional expectation
$$\operatorname{{\mathbb E}}[R_{n+1} \mid \mu_n] = R_n + 2 S_n \operatorname{{\mathbb E}}[X_{n+1} \mid \mu_n] + \operatorname{{\mathbb E}}\big[X_{n+1}^2 \mid \mu_n\big] = \bigg(1 + \frac{4}{n} + \frac{2}{n^2}\bigg) R_n + \frac{2 Q_n}{n}.$$
Similarly, we can compute
$$\operatorname{{\mathbb E}}[Q_{n+1} \mid \mu_n] = Q_n + \operatorname{{\mathbb E}}\big[X_{n+1}^2 \mid \mu_n\big] = \bigg(1 + \frac{2}{n}\bigg) Q_n + \frac{2 R_n}{n^2}.$$
After taking expectations, for $n > \tau_0$ we get the system of recursions
$$\operatorname{{\mathbb E}}[R_{n+1}] = \bigg(1 + \frac{4}{n} + \frac{2}{n^2}\bigg) \operatorname{{\mathbb E}}[R_n] + \frac{2}{n} \operatorname{{\mathbb E}}[Q_n], \qquad \operatorname{{\mathbb E}}[Q_{n+1}] = \bigg(1 + \frac{2}{n}\bigg) \operatorname{{\mathbb E}}[Q_n] + \frac{2}{n^2} \operatorname{{\mathbb E}}[R_n].$$
This recursion can be used to prove that $\operatorname{{\mathbb E}}[R_n] \le C n^4$ and
$\operatorname{{\mathbb E}}[Q_n] \le 2 Cn^3$, for some constant C. Indeed, assume for induction the statement is true for n. Then,
$$\operatorname{{\mathbb E}}[R_{n+1}] \le \bigg(1 + \frac{4}{n} + \frac{2}{n^2}\bigg) C n^4 + \frac{2}{n} \cdot 2 C n^3 = C \big(n^4 + 4 n^3 + 6 n^2\big) \le C (n+1)^4,$$
and also
$$\operatorname{{\mathbb E}}[Q_{n+1}] \le \bigg(1 + \frac{2}{n}\bigg) 2 C n^3 + \frac{2}{n^2} \cdot C n^4 = 2 C \big(n^3 + 3 n^2\big) \le 2 C (n+1)^3.$$
The induction can be initialized with $C = \max\{R_{\tau_0}, Q_{\tau_0}\}$.
Note that $A_n^2 = R_n/\big(n^2(n + 1)^2\big)$, thus $\limsup_{n \to \infty} \operatorname{{\mathbb E}}[A_n^2] \le C$, and so by the
$L_2$ martingale convergence theorem (e.g. [22, p. 111])
$A_n$ converges almost surely and in
$L_2$. To prove that
$\inf_{n \ge \tau_0} \operatorname{{\mathbb E}}[A_n^2] > 0$, observe that
$$\operatorname{{\mathbb E}}\big[A_{n+1}^2\big] = \frac{\operatorname{{\mathbb E}}[R_{n+1}]}{(n+1)^2 (n+2)^2} \ge \bigg(1 + \frac{4}{n} + \frac{2}{n^2}\bigg) \frac{n^2 (n+1)^2}{(n+1)^2 (n+2)^2} \operatorname{{\mathbb E}}\big[A_n^2\big] = \bigg(1 - \frac{2}{n^2 + 4n + 4}\bigg) \operatorname{{\mathbb E}}\big[A_n^2\big].$$
Since $\sum_{k = 0}^\infty 2/(k^2 + 4k + 4) < \infty$, the infinite product converges to a positive value. (If $R_{\tau_0} = 0$ but $Q_{\tau_0} > 0$, the recursion above gives $\operatorname{{\mathbb E}}[R_{n+1}] \ge \frac{2}{n} \operatorname{{\mathbb E}}[Q_n] > 0$, so the product argument applies from some $n_0 > \tau_0$ onward.)
The previous result states that the limit $\lim_{n \to \infty} A_n \eqqcolon\ A$ exists almost surely for each realization of the urn process, and furthermore since
$\operatorname{{\mathbb E}}[A^2] > 0$ the limit is non-trivial. Note that this result holds for any initial configuration of the urn. The exact distribution of the random variable A is undetermined.
3. Recursive distributional equations
The contraction method is a tool in the probabilistic analysis of algorithms for which a recursive distributional equation exists. It was developed in [18] to analyze the number of comparisons required by the Quicksort sorting algorithm. A more general development was presented in [17]. Further limit theorems, along with numerous applications to recursive algorithms and random trees, were discussed in [15]. The contraction method has been used to prove convergence in distribution of general Pólya-type urns with a finite number of colors [7], using recursive distributions for the number of balls of each color, started from some initial configuration.
The idea of the contraction method is to find a fixed-point equation from the recursive distributional equations. Just as in a deterministic recursion, one expects the sequence to converge to the fixed point of the transformation. However, in the setting of random variables care must be taken in choosing a metric in which the recursive equation is a contraction. The correct metric is introduced in Section 3.1, which also contains the necessary distributional identities. The main contraction method proof is in Section 3.2, and it is applied in Section 3.3 with the correct scaling for the $\mathbb{Z}$-urn.
3.1. Setup
Let $\mathcal{M}^{\mathbb{R}^d}$ denote the space of all probability distributions on
$\mathbb{R}^d$ with the Borel
$\sigma$-field. For a vector
$\textbf{x} \in \mathbb{R}^d$, let
$\|\textbf{x} \|$ denote the usual Euclidean norm. Let
$\mu \in \mathbb{R}^d$, and define the following subspaces:
$$\mathcal{M}^{\mathbb{R}^d}_p := \bigg\{ \nu \in \mathcal{M}^{\mathbb{R}^d} : \int \|\textbf{x}\|^p \, \nu(\mathrm{d}\textbf{x}) < \infty \bigg\}, \qquad \mathcal{M}^{\mathbb{R}^d}_p(\mu) := \bigg\{ \nu \in \mathcal{M}^{\mathbb{R}^d}_p : \int \textbf{x} \, \nu(\mathrm{d}\textbf{x}) = \mu \bigg\}.$$
For $1 \le p < \infty$ and two probability distributions
$\nu, \rho \in \mathcal{M}^{\mathbb{R}^d}_p$, the minimal
$L_p$ (or Wasserstein
$L_p$) metric,
$\ell_p$, is defined as
$$\ell_p(\nu, \rho) := \inf \big\{ \|V - W\|_p : V \sim \nu, \; W \sim \rho \big\}, \tag{2}$$
where $\|V - W \|_p := ( \operatorname{{\mathbb E}}[\|V - W \|^p] )^{1/p}$ is the usual
$L_p$-distance. For random variables X, Y, we will write
$\ell_p(X, Y)$ to mean
$\ell_p(\mathcal{L}(X), \mathcal{L}(Y))$. The infimum in (2) is attained for all $\nu, \rho \in \mathcal{M}^{\mathbb{R}^d}_p$, and the random variables
$V \sim \nu$,
$W \sim \rho$ which achieve the infimum are called the optimal coupling. For proofs of this fact and the following lemma, see [3, Section 8].
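In one dimension the optimal coupling is the quantile coupling, so the $\ell_2$ distance between two empirical measures with equally many atoms can be computed by sorting. A small numerical sketch (ours; the function name is illustrative, and the closed form $\ell_2^2 = (m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2$ for one-dimensional Gaussians serves as a sanity check):

```python
import numpy as np

def empirical_l2(xs, ys):
    """l_2 distance between two empirical measures with equally many atoms;
    in one dimension the optimal coupling pairs order statistics."""
    return np.sqrt(np.mean((np.sort(xs) - np.sort(ys)) ** 2))

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(0.0, 1.0, size=n)  # N(0, 1)
y = rng.normal(1.0, 2.0, size=n)  # N(1, 4)
# Closed form for 1-d Gaussians: sqrt((0 - 1)^2 + (1 - 2)^2) = sqrt(2) ~ 1.414
print(empirical_l2(x, y))
```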
Lemma 2. Suppose $\{X_n \}$ is a sequence of random variables in
$\mathcal{M}^{\mathbb{R}^d}_p$ and
$X \in \mathcal{M}^{\mathbb{R}^d}_p$. Then
$\ell_p(X_n, X) \to 0$ if and only if
$X_n \to X$ weakly and
$\operatorname{{\mathbb E}}[\|X_n\|^p] \to \operatorname{{\mathbb E}}[\|X \|^p]$.
The contraction method gives a distributional identity that the limiting random variable must satisfy. For the purpose of characterizing the limiting random variable, note the following distributional identity satisfied by exponential distributions. This can easily be calculated, or see [8].
Lemma 3. Let Y be a random variable satisfying $Y \ge 0$ with probability 1 and
$Y \stackrel{\mathrm{d}}{=} U \cdot (Y^{1} + Y^{2})$, where U is uniformly distributed on [0, 1] and
$Y^{1} \stackrel{\mathrm{d}}{=} Y^2 \stackrel{\mathrm{d}}{=} Y$, and the three random variables
$U, Y^1, Y^2$ are independent. Then Y has an exponential distribution.
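For convenience (this verification is ours; the original text defers to [8]), the exponential direction is a one-line characteristic-function computation: if $Y \sim \mathrm{Exp}(\lambda)$, so that $\varphi_Y(t) = (1 - \mathrm{i} t/\lambda)^{-1}$, then
$$\operatorname{{\mathbb E}}\big[\mathrm{e}^{\mathrm{i} t U (Y^{1} + Y^{2})}\big] = \int_0^1 \varphi_Y(u t)^2 \, \mathrm{d}u = \int_0^1 \frac{\mathrm{d}u}{(1 - \mathrm{i} u t/\lambda)^2} = \frac{1}{1 - \mathrm{i} t/\lambda} = \varphi_Y(t).$$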
In addition, we need the following characterization of a random vector in which each coordinate is a multiple of the same one-dimensional gamma random variable.
Lemma 4. Let $\textbf{X} \in \mathbb{R}^d$ be a random vector satisfying
$\textbf{X}(i) \ge 0$ with probability 1 for all
$1 \le i \le d$ and
$\textbf{X} \stackrel{\mathrm{d}}{=} U_1 \cdot \textbf{X}^1 + U_2 \cdot \textbf{X}^2$, where
$U_1, U_2$ are Uniform([0, 1]) random variables and
$\textbf{X}^1 \stackrel{\mathrm{d}}{=} \textbf{X}^2 \stackrel{\mathrm{d}}{=} \textbf{X}$, and
$U_1$,
$U_2$,
$\textbf{X}^1$, and
$\textbf{X}^2$ are independent. Suppose
$\textbf{m} = \frac{1}{2} \operatorname{{\mathbb E}}[\textbf{X}] \in \mathbb{R}^d$. Then $\textbf{X} \sim G \cdot \textbf{m}$, where $G \sim \text{Gamma}(2, 1)$.
Proof. To determine the distribution of $\textbf{X}$, we calculate the joint characteristic function. For $\textbf{t} = (t_1, \dots, t_d) \in \mathbb{R}^d$, write
$$\varphi_{\textbf{X}}(\textbf{t}) := \operatorname{{\mathbb E}}\big[\mathrm{e}^{\mathrm{i} \langle \textbf{t}, \textbf{X} \rangle}\big].$$
By the distributional identity and independence, the function $\varphi_{\textbf{X}}$ satisfies
$$\varphi_{\textbf{X}}(\textbf{t}) = \operatorname{{\mathbb E}}\big[\varphi_{\textbf{X}}(U_1 \textbf{t})\big] \operatorname{{\mathbb E}}\big[\varphi_{\textbf{X}}(U_2 \textbf{t})\big] = \bigg( \int_0^1 \varphi_{\textbf{X}}(u \textbf{t}) \, \mathrm{d}u \bigg)^{2}.$$
The unique solution to this integral equation with the prescribed mean is
$$\varphi_{\textbf{X}}(\textbf{t}) = \big(1 - \mathrm{i} \langle \textbf{t}, \textbf{m} \rangle\big)^{-2}. \tag{3}$$
Equation (3) is exactly the characteristic function of the random vector $G \cdot \textbf{m}$, where G is a
$\text{Gamma}(2, 1)$ random variable.
3.2. Contraction method result
To write down the defining recursive distributional equation for the $\mathbb{Z}$-urn, recall that the time index is started at
$\tau_0 + 1$. If the initial configuration is
$\textbf{U}_0 = \{x_1, \dots, x_{\tau_0} \}$, define
$X_1 = x_1$,
$X_2 = x_2$, …,
$X_{\tau_0} = x_{\tau_0}$. The newly added ball
$X_n$ is the sum of two randomly chosen balls that were added in the past. This gives the distributional identity
$$X_n \stackrel{\mathrm{d}}{=} X^{1}_{I_n^1} + X^{2}_{I_n^2}, \tag{4}$$
where, for $i = 1, 2$,
$\{X_n^i \}$ are i.i.d. copies of
$\{X_n \}$ and
$I_n^i$ are indices drawn uniformly from
$\{1, \dots, n - 1 \}$. To prove convergence, it is necessary to scale
$X_n$ appropriately. Define
$\tilde{X}_n = \frac{X_n}{n}$. Then (4) becomes
$$\tilde{X}_n \stackrel{\mathrm{d}}{=} B_{I_n^1} \cdot \tilde{X}^{1}_{I_n^1} + B_{I_n^2} \cdot \tilde{X}^{2}_{I_n^2}, \tag{5}$$
where $B_{I_n^i} = \frac{I_n^i}{n}$. The correct fixed-point equation from this recursion arises from the fact that
$\frac{I_n^i}{n}$ converges to a
$\text{Uniform}([0, 1])$ random variable in the
$\ell_2$ metric. This can be proven by constructing the variables as follows. Let U be
$\text{Uniform}([0, 1])$ and define
$$I_n := \lceil (n-1) U \rceil.$$
Then $I_n \sim \text{Uniform}(\{1, \dots, n - 1\})$. Clearly,
$\frac{I_n}{n} \to U$ weakly, and since the second moment converges to
$1/3$, the convergence is in
$\ell_2$ by Lemma 2. The following is a general result about random variables satisfying the distributional recursion (5). The result is a special case of Theorem 4.1 and Corollary 4.2 of [14], though since the proof is short it is included for completeness.
Theorem 3. Let $\{\tilde{X}_n \}_{n \ge 1}$ be a sequence with
$\tilde{X}_n \in \mathcal{M}_2^{\mathbb{R}^d}$ and
$\lim_{n \to \infty} \operatorname{{\mathbb E}}[\tilde{X}_n] = \mu$. Let
$\{B_n \}_{n \ge 1}$ be a sequence with
$B_{n} \in \mathcal{M}_2^\mathbb{R}$ and
$\operatorname{{\mathbb E}}[B_n] \to 1/2$, and if
$I_n \sim {\rm Uniform}(\{1, \dots, n-1 \})$, then the sequence
$\{B_{I_n} \}_{n \ge 1}$ converges in
$\ell_2$ to a
${\rm Uniform}([0, 1])$ variable.
For $i = 1, 2$, let
$I_n^i$ be independent
${\rm Uniform}(\{1, \dots, n-1 \})$ random variables,
$\{B_n^i \}$ independent copies of
$\{B_n \}$, and
$\{\tilde{X}_n^i \}$ independent copies of
$\{\tilde{X}_n \}$. Suppose that, for all
$n \ge \tau_0$,
$\tilde{X}_n \stackrel{\mathrm{d}}{=} B_{I_n^1}^1 \cdot \tilde{X}_{I_n^1}^1 + B_{I_n^2}^2 \cdot \tilde{X}_{I_n^2}^2$. Then
$\tilde{X}_n$ converges in distribution to a random variable
$X \in \mathcal{M}_2^{\mathbb{R}^d}(\mu)$ which satisfies
$$X \stackrel{\mathrm{d}}{=} U_1 \cdot X^{1} + U_2 \cdot X^{2}, \tag{6}$$
where $U_1, U_2$ are
${\rm Uniform}([0, 1])$ random variables,
$X \stackrel{\mathrm{d}}{=} X^1 \stackrel{\mathrm{d}}{=} X^2$, and
$U_1$,
$U_2$,
$X^1$, and
$X^2$ are independent.
Proof. The result is proven for $d = 1$, the multidimensional result being analogous. Let X be a random variable satisfying (6). Let
$(B_{I_n^1}, U_1)$ and
$(B_{I_n^2}, U_2)$ be the optimal couplings, such that
$\ell_2^2(B_{I_n^i}, U_i) = \operatorname{{\mathbb E}}[ (B_{I_n^i} - U_i)^2] \to 0$. Define
$d_n = \ell_2^2(\tilde{X}_n, X)$. Then
$$d_n \le \operatorname{{\mathbb E}}\Big[\big(B^1_{I_n^1} \tilde{X}^{1}_{I_n^1} + B^2_{I_n^2} \tilde{X}^{2}_{I_n^2} - U_1 X^1 - U_2 X^2\big)^2\Big] = 2 \operatorname{{\mathbb E}}\Big[\big(B^1_{I_n^1} \tilde{X}^{1}_{I_n^1} - U_1 X^1\big)^2\Big] + 2 \Big(\operatorname{{\mathbb E}}\big[B^1_{I_n^1} \tilde{X}^{1}_{I_n^1} - U_1 X^1\big]\Big)^{2}. \tag{7}$$
Observe that the assumptions imply, for $I_n \sim \text{Uniform}(\{1, \dots, n - 1 \})$, that
$$\operatorname{{\mathbb E}}\big[B_{I_n} \tilde{X}_{I_n}\big] = \frac{1}{n-1} \sum_{k=1}^{n-1} \operatorname{{\mathbb E}}[B_k] \, \operatorname{{\mathbb E}}\big[\tilde{X}_k\big] \longrightarrow \frac{\mu}{2} = \operatorname{{\mathbb E}}\big[U_1 X^1\big].$$
This, along with the independence of the variables involved, gives that the second term in (7) converges to 0 as $n \to \infty$. Now, expanding the first term, and disregarding superscripts,
$$\operatorname{{\mathbb E}}\big[\big(B_{I_n} \tilde{X}_{I_n} - U X\big)^2\big] = \operatorname{{\mathbb E}}\big[B_{I_n}^2 \big(\tilde{X}_{I_n} - X\big)^2\big] + \operatorname{{\mathbb E}}\big[X^2 (B_{I_n} - U)^2\big] + 2 \operatorname{{\mathbb E}}\big[B_{I_n} \big(\tilde{X}_{I_n} - X\big) X (B_{I_n} - U)\big]. \tag{8}$$
Note that $\operatorname{{\mathbb E}} \left[ X^2 (B_{I_n} - U)^2 \right] = \operatorname{{\mathbb E}}[X^2] \cdot \operatorname{{\mathbb E}} \left[ (B_{I_n} - U)^2 \right] \to 0$ by construction. Also, since
$\operatorname{{\mathbb E}}[B_{I_n}(B_{I_n} - U)] \to 0$, the final term in (8) converges to 0 as
$n \to \infty$. Thus, the final term to analyze is
$$\operatorname{{\mathbb E}}\big[B_{I_n}^2 \big(\tilde{X}_{I_n} - X\big)^2\big] \le \frac{1}{n-1} \sum_{k=1}^{n-1} \operatorname{{\mathbb E}}\big[B_k^2\big] \, d_k.$$
This implies that
$$\limsup_{n \to \infty} d_n \le 2 \limsup_{n \to \infty} \frac{1}{n-1} \sum_{k=1}^{n-1} \operatorname{{\mathbb E}}\big[B_k^2\big] \, d_k \le 2 \operatorname{{\mathbb E}}[U^2] \limsup_{n \to \infty} d_n = \frac{2}{3} \limsup_{n \to \infty} d_n.$$
This gives a contradiction unless $\limsup_{n \to \infty} d_n = 0$. Since
$d_n \ge 0$, this proves that
$\lim_{n \to \infty} d_n = 0$, which is the desired convergence.
Remark 2. Theorem 3 can also be proven by applying [17, Theorem 4.1] with the Zolotarev metric $\zeta_2$. For
$\nu, \rho \in \mathcal{M}^\mathbb{R}$, the Zolotarev distance
$\zeta_s$,
$s \ge 0$, is defined as
$\zeta_s(X, Y) := \zeta_s(\mathcal{L}(X), \mathcal{L}(Y)) := \sup_{f \in \mathcal{F}_s} | \operatorname{{\mathbb E}}[ f(X) - f(Y)] |$, where
$s = m + \alpha$ with
$0 < \alpha \le 1$,
$m \in \mathbb{N}_0$, and
$\mathcal{F}_s := \{f \in C^m(\mathbb{R}, \mathbb{R}) : |f^{(m)}(x) - f^{(m)}(y) | \le |x - y|^\alpha \}$. The use of Zolotarev metrics in the contraction method was developed in [17] because in some settings the recursive distributional equations do not give a strict contraction in
$\ell_p$. The most important such setting is the fixed-point equations that occur for normal distributions, which can be handled by
$\zeta_s$ for
$s > 2$.
3.3. Limit laws
Now, Theorem 3 can be applied to prove the result about the $\mathbb{Z}$-urn. The one technicality is that Theorem 3 requires the means of the random variables in question to converge. If $X_n$ were simply taken to be the nth ball added to the urn, this would not hold. Thus, it is necessary to rescale by the mean of
$X_n$, and use the result from Section 2 that the mean converges. It is important to remember the quenched setting; all variables are conditional on the same underlying urn process.
Proof of Theorems 1 and 2. We again restrict to a one-dimensional process. Let $\{ \mu_n \}_{n \ge \tau_0}$ be a realization of the
$\mathbb{Z}$-urn process. For
$n > \tau_0$, let
$X_n$ denote the nth ball added to the urn, conditional on the urn
$\mu_{n-1}$. Recall that
$\operatorname{{\mathbb E}}[X_n \mid \mu_{n-1}] = 2 n \cdot A_{n-1}$, where
$A_n := \frac{S_n}{n(n+1)}$. Thus, define $\tilde{X}_n := \frac{X_n}{n}$; then $\lim_{n \to \infty} \operatorname{{\mathbb E}}[\tilde{X}_n \mid \mu_{n-1}] = 2 \lim_{n \to \infty} A_n = 2 A$ exists by Lemma 1. The recursive distributional equation satisfied by
$\tilde{X}_n$ is
$\tilde{X}_n \stackrel{\mathrm{d}}{=} \frac{I_n^1}{n} \cdot \tilde{X}_{I_n^1}^1 + \frac{I_n^2}{n} \cdot \tilde{X}_{I_n^2}^2$ for
$I_n^i$,
$i = 1, 2$, i.i.d.
$\text{Uniform}(\{1, \dots, n -1 \})$. As remarked before Theorem 3, the random variables
$\frac{I_n^i}{n}$ converge in
$\ell_2$ to the
$\text{Uniform}([0, 1])$ distribution. Thus, taking
$B_{I_n}^i = I_n^i/n$, Theorem 3 can then be applied to get that
$\tilde{X}_n$ converges in distribution to some X with
$\operatorname{{\mathbb E}}[X] = 2 \cdot A$ which satisfies (6). Finally, Lemma 4 says that the distribution of
$X/A$ must be Gamma with shape 2. Now, recalling that
$Z_n$ denotes a draw from the urn
$\mu_n$, we have the distributional identity
$Z_n \stackrel{\mathrm{d}}{=} X_{I_n}$, where
$I_n$ is a uniformly chosen index from
$\{1, \dots, n \}$. Again scaling
$\tilde{Z}_n = \frac{Z_n}{n}$, we get
$\tilde{Z}_n \stackrel{\mathrm{d}}{=} \frac{I_n}{n} \cdot \tilde{X}_{I_n}$. Since
$\tilde{X}_{I_n}/A$ converges in
$\ell_2$ to
$X \sim \text{Gamma}(2, 1)$, this proves that
$\tilde{Z}_n$ converges in distribution to
$U \cdot X$, where
$U \sim \text{Uniform}([0, 1])$. By Lemma 3, it follows that the limit point has exponential distribution.
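As a numerical illustration of Theorem 1 (a sketch of ours, not from the original text): simulate one realization of the urn, estimate A by the martingale value $A_n = S_n/(n(n+1))$, and compare the rescaled empirical labels with the Exp(1) distribution.

```python
import random
import numpy as np

def simulate_z_urn(initial, steps, seed=None):
    rng = random.Random(seed)
    urn = list(initial)
    for _ in range(steps):
        urn.append(rng.choice(urn) + rng.choice(urn))
    return urn

urn = np.array(simulate_z_urn([-1, 1], steps=20_000, seed=1), dtype=float)
n = len(urn)
A_n = urn.sum() / (n * (n + 1))  # martingale estimate of the limit A
scaled = urn / (n * A_n)         # empirical labels of Z_n / n, rescaled by A_n
# For Exp(1): median = ln 2 ~ 0.693 and P(value > 1) = 1/e ~ 0.368.
print("median:", np.median(scaled))
print("tail P(> 1):", np.mean(scaled > 1.0))
print("mass with the 'wrong' sign:", np.mean(scaled < 0.0))  # vanishes as n grows
```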
4. Conclusion and future questions
This paper has introduced a new urn model, extending the product urn model of [19] for finite groups to the integers or real numbers. Among the vast collection of urn models, the $\mathbb{Z}$-urn is interesting because it combines two features which are challenging to study: drawing more than one ball, and an infinite number of types of balls. It is proven that the rescaled empirical distribution defined by the urn converges to a multiple of an exponential distribution. The multiple, determined by the mean of the limiting urn, is a random variable; the main remaining question is the distribution of this random variable.
Recall the notation $S_n = \sum_{i = 1}^n X_i$ for the sum of the labels in the urn at step n, and
$A_n = \frac{S_n}{n(n+1)}$. The process
$\{A_n \}$ is a martingale and converges almost surely to some limit A. The distribution of A can be explored using simulations. From Section 2, the mean of A is
$$\operatorname{{\mathbb E}}[A] = A_{\tau_0} = \frac{S_{\tau_0}}{\tau_0 (\tau_0 + 1)}.$$
That is, A depends on both the sum of the initial labels in the urn and the number $\tau_0$ of initial balls. Figure 2(a) shows a histogram of A for an urn started with a ball labeled $-1$ and a ball labeled 1. The distribution is of course centered at 0, but it is not normal. Figure 2(b) exhibits the results for A for urns started with two balls labeled 1. The mean in this case is
$2/(2 \cdot 3) = 1/3$. A
$\chi^2$ test for the null hypothesis that a Gamma distribution fits the data resulted in a p-value
$<0.001$, indicating that Gamma is not a good fit. It would be interesting to explore this distribution in future work, though even proving it is continuous seems to be a challenge.
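The experiment behind Figure 2 can be reproduced along the following lines (a sketch of ours; fewer realizations than the 5000 used for the figure, for speed):

```python
import random
import numpy as np

def final_A(initial, steps, seed):
    """Run one urn realization and return A_n = S_n / (n (n + 1))."""
    rng = random.Random(seed)
    urn = list(initial)
    for _ in range(steps):
        urn.append(rng.choice(urn) + rng.choice(urn))
    n = len(urn)
    return sum(urn) / (n * (n + 1))

# Initial configuration {1, 1}: E[A] = S_{tau_0} / (tau_0 (tau_0 + 1)) = 1/3.
samples = np.array([final_A([1, 1], steps=5000, seed=s) for s in range(1000)])
print("sample mean of A (should be near 1/3):", samples.mean())
```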
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210622133118498-0423:S002190022000090X:S002190022000090X_fig2.png?pub-status=live)
Figure 2: Results for A from simulations of the $\mathbb{Z}$-urn model from two different initial configurations. An urn realization was run for 5000 rounds and the value of
$A_n$ was recorded. A total of 5000 realizations of each urn process were run to display the empirical distribution of A.
Another interesting problem would be to study the model for other infinite groups, for example $\mathbb{R}$ with the multiplication operation, or the Heisenberg group. The exponential limit law is very specific to the addition operation, so the limiting behavior for other groups and group operations could be intriguing.
Acknowledgements
The author thanks Persi Diaconis for many discussions, Brian Skyrms for introducing the motivation to study urn models, and David Siegmund, Svante Janson, and Robin Pemantle for helpful comments. She also thanks an anonymous referee for careful reading and suggestions for interesting generalizations. The author is supported by a National Defense Science & Engineering Graduate Fellowship.