1. Introduction
In this paper we derive upper bounds for the Kolmogorov and Wasserstein distances between a mixture of normal distributions and a normal distribution with properly chosen parameter values. Here, a random variable X is said to have a mixture of normal distributions if there exists a $\sigma$ -algebra $\mathscr{G}$ such that the conditional distribution of X given $\mathscr{G}$ is normal. Also, for comparison and completeness, lower bounds for both distances are derived.
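For instance, if V and $S>0$ are $\mathscr{G}$-measurable random variables and $Z\sim\text{N}(0,1)$ is independent of $\mathscr{G}$, then $X = V + SZ$ satisfies
\begin{equation*}
\mathscr{L}\,(X\,|\,\mathscr{G}) = \text{N}(V,S^{2}),
\end{equation*}
so X has a mixture of normal distributions, with $\mathbb{E}(X|\mathscr{G}) = V$ and $\mathbb{V}(X|\mathscr{G}) = S^{2}$.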
To see why this is of interest, suppose that a random sequence $\{X_n;n=0,1,\ldots\}$ converges in distribution to a normal random variable Z. If $\mathscr{L}\,(Z)$ is used instead of $\mathscr{L}\,(X_n)$ for the (approximate) computation of the expectation $\mathbb{E}(h(X_n))$, where $h\,:\,\mathbb{R}\to\mathbb{R}$ is a measurable function, an approximation error $\mathbb{E}(h(X_n))-\mathbb{E}(h(Z))$ is incurred, about which the limit theorem per se gives no information. In order to control this error, it is natural to use a metric on the space of probability measures on $(\mathbb{R},\mathscr{R})$, and try to bound the distance between $\mathscr{L}\,(X_n)$ and $\mathscr{L}\,(Z)$. A common choice is the Kolmogorov distance, which is defined for any two random variables X and Z with probability distributions $\mu_1$ and $\mu_2$ by
\begin{equation*}
d_{K}(\mu_1,\mu_2) = \sup_{z\in\mathbb{R}}\bigl|\mathbb{P}(X\leq z)-\mathbb{P}(Z\leq z)\bigr|.
\end{equation*}
Another possibility is the Wasserstein distance, defined by
\begin{equation*}
d_{W}(\mu_1,\mu_2) = \sup_{h\in\mathcal{H}_1}\bigl|\mathbb{E}(h(X))-\mathbb{E}(h(Z))\bigr|,
\end{equation*}
where $\mathcal{H}_1$ is the class of Lipschitz functions with Lipschitz constant bounded by 1.
In Section 2, we derive bounds for both distances between the probability distribution of a random variable X which has a mixture of normal distributions, and a normally distributed random variable (Theorems 2.1 and 2.2). The bounds depend only on the first two moments of the first two conditional moments given the ‘mixing’ $\sigma$ -algebra. The main tool used is Stein’s method, a powerful technique introduced in [Reference Stein20]. At the core of this method is a functional equation called the Stein equation,
\begin{equation*}
f^{\prime}(x) - xf(x) = I_{(-\infty,z]}(x) - \Phi(z), \qquad x,z\in\mathbb{R},
\end{equation*}
where $\Phi$ is the cumulative distribution function of the $\text{N}(0,1)$ distribution. By taking expectations with respect to $\mathscr{L}\,(X)$ on both sides, and using analytical properties of the solution function f, bounds can be obtained for the Kolmogorov distance between $\mathscr{L}\,(X)$ and $\text{N}(0,1)$ . While this is easiest if X is a sum of locally dependent random variables, the use of couplings and other special devices has made it possible to handle many other situations. There are also extensions of the method which allow for other approximating distributions to be used, such as Poisson and compound Poisson distributions and multivariate normal distributions. Since its introduction, the number of applications of the method has grown very large. For more details and many examples, see Barbour and Chen [Reference Barbour and Chen3, Reference Barbour and Chen4], and the references therein.
In the second part of the paper we apply the obtained results to branching Ornstein–Uhlenbeck processes. A one-dimensional Ornstein–Uhlenbeck (OU) process is a stochastic process that follows a linear stochastic differential equation of the form
where $\alpha,\sigma_a >0$ , and $\{W(t);\ t\geq 0\}$ is a standard Wiener process. In the subfield of evolutionary biology called phylogenetic comparative methods, processes like (1.1) are used for modelling the evolution of phenotypic traits, such as body size, at the between-species level, in the following way: an Ornstein–Uhlenbeck process evolves on top of a possibly random phylogenetic tree, by which we mean a (random) directed acyclic graph with weights on edges that correspond to edge length, and nodes corresponding to the branching events in the tree; see Figure 1. In the Yule–Ornstein–Uhlenbeck (YOU) model, which we consider here, each speciation (i.e. branching) point is binary, and the edge lengths are independent exponentially distributed random variables. This so-called pure-birth tree is stopped just before the nth speciation event; i.e., it has n leaves (or tips). Without loss of generality we fix the birth rate to 1. Varying the birth rate will only have the effect of rescaling time and will not add anything substantial to our results.
In the YOU model, along each edge (i.e. branch) the process describing the phenotypic trait behaves as defined by (1.1). Then, at a speciation point the process splits into as many copies as there are descendant branches. At the start of each descendant branch the process starts with the value at which the ancestral branch ended (the starting value is the same for all descendant branches). From that point onward, on each descendant lineage the processes behave independently.
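As an illustration of this construction (not used in any of the proofs below), the following minimal Python sketch simulates the n tip values of the YOU model; the mean-reverting parameterisation $\mathrm{d}X(t) = -\alpha(X(t)-\theta)\,\mathrm{d}t + \sigma_a\,\mathrm{d}W(t)$, the value of $\theta$, and the root state are assumptions made only for this illustration.

```python
# Minimal simulation sketch of the YOU model (illustration only; not part of the
# paper's derivations).  Assumed OU dynamics along each branch:
#   dX(t) = -alpha * (X(t) - theta) dt + sigma_a dW(t),
# with birth rate 1 for the Yule tree, as in the text.
import numpy as np

rng = np.random.default_rng(2024)

def you_tip_values(n, alpha, sigma_a, theta=0.0, x0=0.0):
    """Trait values at the n tips of a Yule tree stopped just before the n-th speciation."""
    values = [x0]                          # traits of the currently extant lineages
    for m in range(1, n + 1):              # the epoch with m lineages lasts an Exp(m) time
        t = rng.exponential(1.0 / m)
        decay = np.exp(-alpha * t)
        sd = sigma_a * np.sqrt((1.0 - decay**2) / (2.0 * alpha))
        values = [theta + (v - theta) * decay + rng.normal(0.0, sd) for v in values]
        if m < n:                          # m-th speciation: a uniformly chosen lineage splits
            values.append(values[rng.integers(m)])
    return np.array(values)

x_bar = you_tip_values(n=500, alpha=1.0, sigma_a=1.0).mean()   # sample average of the tips
print(x_bar)
```

Averaging many independent replicates of the printed value, for different choices of $\alpha$, gives a quick numerical illustration of the limit results discussed in Section 3.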
The YOU model can be further extended by allowing for jumps; see Bokma [Reference Bokma11]. A particular type of jump that can serve as a starting point for mathematical analysis is when a jump takes place just after a speciation event, independently on each descendant lineage, with a probability p that may depend on the speciation event; see Section 4 for more details.
In the context of evolutionary biology, the observed phenotypic data are the values of the process at the tips, $\{X_{i}\}_{i=1}^{n}$ . Of particular interest are central limit theorems for the sample average, $\overline{X}_n$ , or more generally for functionals of the observed data (see e.g. Ren et al. [Reference Ren, Song and Zhang18], Adamczak and Miłoś [Reference Adamczak and Miłoś1], Bartoszek and Sagitov [Reference Bartoszek and Sagitov10], Ané et al. [Reference Ané, Ho and Roch2], Bartoszek [Reference Bartoszek8], and a multitude of other works). If the drift of the OU process is fast enough, then one can show convergence in distribution for $\overline{X}_n$ to a normal limit. However, if the drift is slow, then the dependencies induced by common ancestry persist and statements about the limit are more involved. The above was shown for the YOU model in [Reference Bartoszek and Sagitov10], while the YOU model with normally distributed jumps was considered in [Reference Bartoszek8]. In the slow drift regime one can show $L^{2}$ convergence (see e.g. [Reference Adamczak and Miłoś1], [Reference Bartoszek8], [Reference Bartoszek and Sagitov10]). However, so far there is no complete characterization of the limit in this case.
In Sections 3 and 4 of the present paper, we extend the central limit theorems for $\overline{X}_n$ by giving bounds for the Kolmogorov and Wasserstein distances between the distribution of $\overline{X}_n$ and properly chosen normal distributions (Theorems 3.1, 3.2, 4.1, and 4.2), which converge weakly to the limiting normal distributions of [Reference Bartoszek and Sagitov10] and [Reference Bartoszek8] as $n\to\infty$ . The key observation is that conditional on the tree (and the locations of jumps), $\overline{X}_n$ is a linear combination of normally distributed random variables, which makes it possible to apply Theorems 2.1 and 2.2. One needs to compute the first two moments of the conditional expectation and variance of $\overline{X}_n$ , which requires a careful analysis of the random quantities involved, e.g., the heights in the tree and speciation events along lineages, but a considerable part of this work was done in [Reference Bartoszek and Sagitov10] and [Reference Bartoszek8] and can be reused here.
Lastly, in the appendix, for the sake of comparison and completeness, we state and prove lower bounds for both distances between the probability distribution of a random variable X which has a mixture of normal distributions, and a normally distributed random variable. The proof is based on ideas in Barbour and Hall [Reference Barbour and Hall5].
2. Normal approximation for mixtures of normal distributions
A metric $d(\cdot,\cdot)$ on the space of probability measures on a measurable space $(\Omega,\mathscr{F})$ is called an integral probability metric (see Müller [Reference Müller15]) if
\begin{equation*}
d(\mu_1,\mu_2) = \sup_{h\in\mathcal{H}}\biggl|\int_{\Omega}h\,\mathrm{d}\mu_1 - \int_{\Omega}h\,\mathrm{d}\mu_2\biggr|,
\end{equation*}
where $\mathcal{H}$ is a class of measurable functions $h\,:\,\Omega\to\mathbb{R}$ called the generating class. Our interest is in two integral probability metrics on the space of probability measures on $(\mathbb{R},\mathscr{R})$ : the Kolmogorov distance $d_K$ , for which $\mathcal{H}$ is the set of indicator functions of half-lines, $\mathcal{H}_0 = \{I_{(-\infty,z]}({\cdot});\ z\in\mathbb{R}\}$ , and the Wasserstein distance $d_W$ , for which $\mathcal{H}$ is the set $\mathcal{H}_1$ of Lipschitz functions with Lipschitz constant bounded by 1. It is well known that for sequences of probability measures on $(\mathbb{R},\mathscr{R})$ , convergence in either distance implies the usual weak convergence; see Section 4 in [Reference Müller15].
Also, the Kolmogorov distance is scale-invariant (and location-invariant), in the sense that
\begin{equation*}
d_{K}\bigl(\mathscr{L}\,(\sigma X+\mu),\mathscr{L}\,(\sigma Y+\mu)\bigr) = d_{K}\bigl(\mathscr{L}\,(X),\mathscr{L}\,(Y)\bigr), \qquad \mu\in\mathbb{R},\ \sigma>0,
\end{equation*}
for any pair of random variables X and Y. This follows from (2.1) and the fact that
\begin{equation*}
I_{(-\infty,z]}(\sigma x+\mu) = I_{(-\infty,(z-\mu)/\sigma]}(x), \qquad x,z\in\mathbb{R},
\end{equation*}
so that composition with the map $x\mapsto\sigma x+\mu$ leaves the generating class $\mathcal{H}_0$ unchanged.
The Wasserstein distance is not scale-invariant, but has the property
\begin{equation*}
d_{W}\bigl(\mathscr{L}\,(\sigma X+\mu),\mathscr{L}\,(\sigma Y+\mu)\bigr) = \sigma\, d_{W}\bigl(\mathscr{L}\,(X),\mathscr{L}\,(Y)\bigr), \qquad \mu\in\mathbb{R},\ \sigma>0,
\end{equation*}
which follows from (2.1) and the fact that for each $\mu\in\mathbb{R}$, $\sigma>0$, the mapping $\xi\,:\,\mathcal{H}_1\to\mathcal{H}_1$ defined by
\begin{equation*}
\xi(h)(x) = \frac{1}{\sigma}\,h(\sigma x+\mu), \qquad x\in\mathbb{R},
\end{equation*}
is a bijection.
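Written out, with $\xi$ as just defined, the scaling argument reads
\begin{equation*}
d_{W}\bigl(\mathscr{L}\,(\sigma X+\mu),\mathscr{L}\,(\sigma Y+\mu)\bigr)
= \sup_{h\in\mathcal{H}_1}\bigl|\mathbb{E}(h(\sigma X+\mu))-\mathbb{E}(h(\sigma Y+\mu))\bigr|
= \sigma\sup_{h\in\mathcal{H}_1}\bigl|\mathbb{E}(\xi(h)(X))-\mathbb{E}(\xi(h)(Y))\bigr|
= \sigma\, d_{W}\bigl(\mathscr{L}\,(X),\mathscr{L}\,(Y)\bigr),
\end{equation*}
the last equality holding because $\xi$ ranges over all of $\mathcal{H}_1$.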
Our main results are contained in Theorem 2.1 (Kolmogorov distance) and Theorem 2.2 (Wasserstein distance).
Theorem 2.1. Let X be a real-valued random variable such that $\mathbb{E}(X^2)<\infty$ , and let $\mathscr{G}$ be a $\sigma$ -algebra such that the regular conditional distribution of X given $\mathscr{G}$ is normal. Then
Proof. The following identity, called the Stein identity for the N(0,1) distribution, was originally derived in [Reference Stein20] (for more information, see Chen and Shao [Reference Chen, Shao, Barbour and Chen12] and the references therein): if Z is any real-valued random variable, then $Z\sim\text{N}(0,1)$ if and only if
\begin{equation*}
\mathbb{E}(f^{\prime}(Z)) = \mathbb{E}(Zf(Z)) \qquad\text{for all } f\in\mathcal{C}_{bd},
\end{equation*}
where $\mathcal{C}_{bd}$ is the set of continuous, piecewise continuously differentiable functions $f\,:\,\mathbb{R}\to\mathbb{R}$ such that $\mathbb{E}(|f^{\prime}(Z_{0,1})|)<\infty$ if $Z_{0,1}\sim\text{N}(0,1)$ .
Using (2.4), we shall first derive a similar Stein identity for the $\text{N}(\mu,\sigma^2)$ distribution, where $\mu\in\mathbb{R}$ and $\sigma\in(0,\infty)$: if W is any real-valued random variable, then $W\sim\text{N}(\mu,\sigma^2)$ if and only if
\begin{equation*}
\sigma^{2}\,\mathbb{E}(g^{\prime}(W)) = \mathbb{E}\bigl((W-\mu)g(W)\bigr) \qquad\text{for all } g\in\mathcal{C}_{bd}^{\mu,\sigma},
\end{equation*}
where $\mathcal{C}_{bd}^{\mu,\sigma}$ is the set of continuous, piecewise continuously differentiable functions $g\,:\,\mathbb{R}\to\mathbb{R}$ such that $\mathbb{E}(|g^{\prime}(Z_{\mu,\sigma})|)<\infty$ if $Z_{\mu,\sigma}\sim\text{N}(\mu,\sigma^2)$. To prove (2.5), we define the random variable Z by $Z=\frac{1}{\sigma}(W-\mu)$, and note that $Z\sim\text{N}(0,1)$ if and only if $W\sim\text{N}(\mu,\sigma^2)$. We also define the mapping $T\,:\,\mathcal{C}_{bd}^{\mu,\sigma}\to\mathcal{C}_{bd}$ by $Tg(x) = \sigma g(\sigma x + \mu)$. T is easily seen to be a bijection with inverse
\begin{equation*}
T^{-1}f(y) = \frac{1}{\sigma}\,f\Bigl(\frac{y-\mu}{\sigma}\Bigr), \qquad y\in\mathbb{R}.
\end{equation*}
This yields
which in combination with (2.4) gives (2.5).
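Spelled out, with $W=\sigma Z+\mu$ and $f=Tg$, the change of variables gives
\begin{equation*}
\mathbb{E}(f^{\prime}(Z)) = \sigma^{2}\,\mathbb{E}\bigl(g^{\prime}(\sigma Z+\mu)\bigr) = \sigma^{2}\,\mathbb{E}(g^{\prime}(W)),
\qquad
\mathbb{E}(Zf(Z)) = \mathbb{E}\bigl(\sigma Z\,g(\sigma Z+\mu)\bigr) = \mathbb{E}\bigl((W-\mu)g(W)\bigr),
\end{equation*}
so that $\mathbb{E}(f^{\prime}(Z))=\mathbb{E}(Zf(Z))$ for all $f\in\mathcal{C}_{bd}$ holds if and only if $\sigma^{2}\mathbb{E}(g^{\prime}(W))=\mathbb{E}((W-\mu)g(W))$ for all $g\in\mathcal{C}_{bd}^{\mu,\sigma}$.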
We next consider the following functional equation, which we propose to call the Stein equation for the $\text{N}(\mu,\sigma^2)$ distribution. It arises in a natural way from (2.5):
where $z\in\mathbb{R}$. For each fixed $z\in\mathbb{R}$, it is clear that a function $g\in\mathcal{C}_{bd}^{\mu,\sigma}$ satisfies (2.6) if and only if the function $f = Tg\in\mathcal{C}_{bd}$ (defined above) satisfies the functional equation
\begin{equation*}
f^{\prime}(x) - xf(x) = I_{(-\infty,z]}(x) - \Phi(z), \qquad x\in\mathbb{R},
\end{equation*}
which is the classical Stein equation for the N(0,1) distribution. We obtain from Section 2.1 in [Reference Chen, Shao, Barbour and Chen12] that (2.7) has the solution $f = f_z$ , where
It is also shown in Section 2.2 in [Reference Chen, Shao, Barbour and Chen12] that $f_z$ is bounded, continuous, and continuously differentiable except at $x=z$ . Moreover, $f_z$ satisfies
Therefore, the function $g_z = T^{-1}f_z$ , explicitly given by
is a solution to (2.6). This function $g_z$ is bounded, continuous, and continuously differentiable except at $y=\sigma z+\mu$ , and satisfies
For the remainder of the proof, we define for convenience $\mathcal{C}_{bbd}$ as the set of bounded, continuous, piecewise continuously differentiable functions $g\,:\,\mathbb{R}\to\mathbb{R}$ with bounded derivative. By definition, $\mathcal{C}_{bbd}\subset\mathcal{C}_{bd}^{\mu,\sigma}$ for each $\mu\in\mathbb{R}$ , $\sigma\in(0,\infty)$ , and by (2.8), $g_z\in\mathcal{C}_{bbd}$ for each $z\in\mathbb{R}$ . Recalling that the random variable X has a conditionally normal distribution given $\mathscr{G}$ , we obtain from (2.5) that
Taking expectations and rewriting, this gives
From the definition of Kolmogorov distance and (2.2), it follows that for any $\mu\in\mathbb{R}$ and $\sigma\in(0,\infty)$ ,
If we choose $\mu =\mathbb{E}(X)$ and $\sigma^2 = \mathbb{E}\bigl(\mathbb{V}(X|\mathscr{G})\bigr)$ , we get
using (2.8) and Hölder’s inequality. For the second term on the right-hand side of (2.10), we will use a coupling similar to the one used in the proof of Theorem 1.C in Barbour et al. [Reference Barbour, Holst and Janson6]; the latter theorem deals with Poisson approximations for mixtures of Poisson distributions. First, letting the random variable $Y\sim\text{N}(\mu,\sigma^2)$ be independent of $\mathscr{G}$ , we can write
where $A=\{\sigma^2 \leq\mathbb{V}(X|\mathscr{G})\}$ . For each $\omega\in A$ , we construct a probability space with two independent random variables $Y_1\sim\text{N}(0,\sigma^2)$ and $Y_2\sim\text{N}(0,\mathbb{V}(X|\mathscr{G})-\sigma^2)$ , so that $\mathbb{E}(X|\mathscr{G}) + Y_1+Y_2\sim\text{N}(\mathbb{E}(X|\mathscr{G}),\mathbb{V}(X|\mathscr{G}))$ , and $\mu + Y_1\sim\text{N}(\mu,\sigma^2)$ . Using this coupling, and the fact that $\lVert g^{\prime}_z\rVert = \sup_{x\in\mathbb{R}}|g^{\prime}_z(x)| \leq \frac{1}{\sigma^2}$ , we obtain
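In essence, the coupling enters through the Lipschitz property of $g_z$: on the event A, conditionally on $\mathscr{G}$,
\begin{equation*}
\bigl|\mathbb{E}\bigl(g_z(\mathbb{E}(X|\mathscr{G})+Y_1+Y_2)-g_z(\mu+Y_1)\,\big|\,\mathscr{G}\bigr)\bigr|
\leq \lVert g^{\prime}_z\rVert\Bigl(|\mathbb{E}(X|\mathscr{G})-\mu| + \mathbb{E}(|Y_2|\,|\,\mathscr{G})\Bigr)
\leq \frac{1}{\sigma^{2}}\biggl(|\mathbb{E}(X|\mathscr{G})-\mu| + \sqrt{\frac{2}{\pi}\bigl(\mathbb{V}(X|\mathscr{G})-\sigma^{2}\bigr)}\biggr),
\end{equation*}
since $\mathbb{E}|Z_{0,v}| = \sqrt{2v/\pi}$ when $Z_{0,v}\sim\text{N}(0,v)$.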
Similarly, for each $\omega\in A^c$ , we construct a probability space with two independent random variables $\widehat Y_1\sim\text{N}(0,\mathbb{V}(X|\mathscr{G}))$ and $\widehat Y_2\sim\text{N}(0,\sigma^2-\mathbb{V}(X|\mathscr{G}))$ , so that $\mathbb{E}(X|\mathscr{G}) + \widehat Y_1\sim\text{N}(\mathbb{E}(X|\mathscr{G}),\mathbb{V}(X|\mathscr{G}))$ , and $\mu + \widehat Y_1 + \widehat Y_2\sim\text{N}(\mu,\sigma^2)$ . This gives, after some calculations,
Combining these two bounds, for the second term on the right-hand side of (2.10) we get
Remark 2.1. In the case when $\mathbb{E}(X|\mathscr{G})\equiv m$ and $\mathbb{V}(X|\mathscr{G})\equiv \tau^2$ for deterministic constants $m\in\mathbb{R}$ and $\tau>0$ , meaning that $X\sim\text{N}(m,\tau^2)$ independently of $\mathscr{G}$ , we obtain from (2.10) and (2.8) that
Turning to Theorem 2.2, we define $\mathcal{H}_2$ as the set of all real-valued absolutely continuous functions on $(\mathbb{R},\mathscr{R})$, by which we mean all functions $h\,:\,\mathbb{R}\to\mathbb{R}$ such that h has a derivative almost everywhere, $h^{\prime}$ is Lebesgue integrable on every compact interval, and
It is well known that any Lipschitz continuous function $h\,:\,\mathbb{R}\to\mathbb{R}$ is absolutely continuous, and that $|h^{\prime}(x)|\leq K$, where K is the Lipschitz constant, for all $x\in\mathbb{R}$ where $h^{\prime}(x)$ is defined. Moreover, as stated above, the Wasserstein distance on the space of probability measures on $(\mathbb{R},\mathscr{R})$ is defined by
\begin{equation*}
d_{W}(\mu_1,\mu_2) = \sup_{h\in\mathcal{H}_1}\biggl|\int_{\mathbb{R}}h\,\mathrm{d}\mu_1 - \int_{\mathbb{R}}h\,\mathrm{d}\mu_2\biggr|,
\end{equation*}
where $\mathcal{H}_1$ is the set of all Lipschitz continuous functions with Lipschitz constant bounded by 1.
Theorem 2.2. Let X be a real-valued random variable such that $\mathbb{E}(X^2)<\infty$ , and let $\mathscr{G}$ be a $\sigma$ -algebra such that the regular conditional distribution of X given $\mathscr{G}$ is normal. Then
Proof. The first part of the proof is the same as for Theorem 2.1. However, as a Stein equation for the $\text{N}(\mu,\sigma^2)$ distribution, instead of (2.6) we use
where $h\in\mathcal{H}_1$ , and $Z_{0,1}\sim\text{N}(0,1)$ . For each $h\in\mathcal{H}_1$ , it is clear that a function $g\in\mathcal{C}_{bd}^{\mu,\sigma}$ satisfies (2.11) if and only if the function $f = Tg\in\mathcal{C}_{bd}$ (defined in the proof of Theorem 2.1) satisfies the functional equation
It is shown in [Reference Chen, Shao, Barbour and Chen12] that (2.12) has the solution $f = f_h$ , where
Moreover, for each $h\in\mathcal{H}_1$ , $f_h$ is bounded, has an absolutely continuous derivative, and satisfies
where $\lVert\cdot\rVert$ denotes the (essential) supremum. Therefore, the function $g_h = T^{-1}f_h$ , explicitly given by
is a solution to (2.11) which is bounded, has an absolutely continuous derivative, and satisfies
As in the proof of Theorem 2.1, we define $\mathcal{C}_{bbd}$ as the set of bounded, continuous, piecewise continuously differentiable functions $g\,:\,\mathbb{R}\to\mathbb{R}$ with bounded derivative. By (2.13), $g_h\in\mathcal{C}_{bbd}$ for each $h\in\mathcal{H}_1$. As before, we obtain
By definition, the Wasserstein distance can be expressed as follows:
where, using (2.11) and (2.14),
If we choose $\mu =\mathbb{E}(X)$ and $\sigma^2 = \mathbb{E}\bigl(\mathbb{V}(X|\mathscr{G})\bigr)$ , the second term on the right-hand side of (2.15) can be handled in the same way as in the proof of Theorem 2.1, yielding the bound
For the first term on the right-hand side of (2.15), letting the random variable $Y\sim\text{N}(\mu,\sigma^2)$ be independent of $\mathscr{G}$ , we can write
where $A=\{\sigma^2 \leq\mathbb{V}(X|\mathscr{G})\}$. We can now use exactly the same coupling as for the second term on the right-hand side of (2.15), together with the fact that $\lVert g^{\prime\prime}_h\rVert \leq \frac{2}{\sigma^3}$, to obtain, after some calculations,
Remark 2.2. In the case when $\mathbb{E}(X|\mathscr{G})\equiv m$ and $\mathbb{V}(X|\mathscr{G})\equiv \tau^2$ for deterministic constants $m\in\mathbb{R}$ and $\tau>0$ , from (2.15) and (2.13) we obtain
Finally, we point out that it is possible to derive lower bounds for the Kolmogorov and Wasserstein distances under the same assumptions as in Theorems 2.1 and 2.2. Using ideas introduced in [Reference Barbour and Hall5] (see also Chapter 3 in [Reference Barbour, Holst and Janson6]), we state and derive lower bounds in the appendix (Theorem A.1; the bounds for the two distances are identical apart from a constant factor). It can be seen from Theorem A.1 that under mild conditions on the asymptotics of the higher order moments $\mathbb{E}((\mu-\mathbb{E}(X|\mathscr{G}))^4)$ and $\mathbb{E}(|\mathbb{V}(X|\mathscr{G})-\sigma^2|(\mu-\mathbb{E}(X|\mathscr{G}))^2)$ , the upper bounds in Theorems 2.1 and 2.2 leave little room for improvement. In particular, the term
cannot be replaced by another that converges faster to 0. However, the lower bound would allow for
to be replaced by
(times some constant) in the first term, should this turn out to be possible.
3. The Yule–Ornstein–Uhlenbeck model
In order to apply the results in Section 2 to the YOU model, we first need to condition on an appropriate $\sigma$ -algebra, and then obtain formulæ, along with their asymptotic behaviours, for the means and variances of the conditional means and variances. Since the OU process is Gaussian, conditionally on the phylogeny the values of the traits at the n leaves will have an n-dimensional Gaussian distribution. Hence, the natural $\sigma$ -algebra to condition on is the $\sigma$ -algebra generated by the pure-birth tree. For a tree with n leaves, we denote this $\sigma$ -algebra by $\mathcal{Y}_{n}$ . Moreover, we use the following notation: $\Gamma({\cdot})$ is the gamma function, $H_n = 1 + \frac{1}{2} + \ldots + \frac{1}{n}$ , and
Theorem 3.1. Consider the YOU model with $\alpha \ge 1/2$ . Let $\overline{X}_{n}$ be the average value of the traits at the n leaves, let
and let
Let also $\mu_n = \mathbb{E}(\overline{Y}_{n})$ and $\sigma_n^2 = \mathbb{E}(\mathbb{V}(\overline{Y}_{n}|\mathcal{Y}_{n}))$.
-
(i) If $\alpha=\frac{1}{2}$ , then
\begin{equation*}d_{K}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right)= \text{O}(\ln^{-1} n)\end{equation*}as $n\to\infty$ , where $\mu_n = \delta b_{n,1/2}$ and\begin{equation*}\sigma_n^2 = \frac{1}{n} + \left(1-\frac{1}{n} \right)\left(\frac{2}{n-1}(H_n-1)-\frac{1}{n-1}\right) - b_{n,1}.\end{equation*}Moreover, $(\frac{n}{\ln n})^{1/2}\,\mu_n\to 0$ and $\frac{n}{\ln n}\,\sigma_n^2\to 2$ as $n\to\infty$ , so $(\frac{n}{\ln n})^{1/2}\,\overline{Y}_{n}\ \xrightarrow{\ d\ }\ \text{N}(0,2)$ as $n\to\infty$ . -
(ii) If $\alpha>\frac{1}{2}$ , then
\begin{equation*}d_{K}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right) = \begin{cases}\text{O}\big(n^{-2\alpha+1}\big),&\text{$\frac{1}{2}<\alpha<\frac{3}{4}$,}\\[4pt] \text{O}\big(\frac{\ln^{1/2}n}{n^{1/2}}\big),&\text{$\alpha=\frac{3}{4}$,}\\[4pt] \text{O}(n^{-1/2}),&\text{$\alpha>\frac{3}{4}$,}\end{cases}\end{equation*}as $n\to\infty$ , where $\mu_n = \delta b_{n,\alpha}$ , and\begin{equation*}\sigma_n^2 = \frac{1}{n} + \left(1-\frac{1}{n}\right)\left(\frac{2 - (n+1)(2\alpha + 1)b_{n,2\alpha}}{(n-1)(2\alpha-1)}\right) - b_{n,2\alpha}.\end{equation*}Moreover, $n^{1/2}\,\mu_n\to 0$ and $n\sigma_n^2\to\frac{2\alpha+1}{2\alpha-1}$ as $n\to\infty$ , so $n^{1/2}\,\;\overline{Y}_{n}\ \xrightarrow{\ d\ }\ \text{N}(0,\frac{2\alpha+1}{2\alpha-1})$ as $n\to\infty$ .
Theorem 3.2. Consider the YOU model with $\alpha \ge 1/2$ , with the same notation as in Theorem 3.1.
-
(i) If $\alpha=\frac{1}{2}$ , then
\begin{equation*}d_{W}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right)= \text{O}(\ln^{-1} n)\end{equation*}as $n\to\infty$ . -
(ii) If $\alpha>\frac{1}{2}$ , then $d_{W}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right) = \left\{\begin{array}{l@{\quad}l}\text{O}\big(n^{-2\alpha+1}\big),& \frac{1}{2}\lt \alpha \lt \frac{3}{4},\\[4pt] \text{O}\big(\frac{\ln^{1/4}n}{n^{1/2}}\big),& \alpha = \frac{3}{4},\\[4pt] \text{O}\big(n^{-\min(\alpha-1/4,3/4)}\big),& \alpha > \frac{3}{4},\end{array}\right.$ as $n\to\infty$ .
Proof of Theorems 3.1–3.2. As explained above, the phylogeny is modelled by a pure-birth tree, in which each speciation point is binary, and the edge lengths are independent exponentially distributed random variables with the same rate parameter, called the birth rate. Without loss of generality we take 1 as the birth rate. Then the time between the kth and $(k+1)$ th speciation event, denoted by $T_{k+1}$ , is exponentially distributed with rate $(k+1)$ , as the minimum of $(k+1)$ independent rate-1 exponentially distributed random variables; see Figure 2.
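Explicitly, since each of the $(k+1)$ lineages present after the kth speciation event independently waits an Exp(1) time before speciating,
\begin{equation*}
\mathbb{P}(T_{k+1}>t) = \bigl(e^{-t}\bigr)^{k+1} = e^{-(k+1)t}, \qquad t\geq 0,
\end{equation*}
so that indeed $T_{k+1}\sim\text{Exp}(k+1)$, with $\mathbb{E}(T_{k+1}) = \frac{1}{k+1}$.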
There are two key random components to consider: the height of the tree $(U_{n})$ and the time from the present backwards to the coalescence of a random pair (out of $\binom{n}{2}$ possible pairs) of tip species $(\tau^{(n)})$ . These random variables are illustrated in Figure 2, but see also Figure A.8 in [Reference Bartoszek7] and Figures 1 and 5 in [Reference Bartoszek8].
In order to study the properties of the OU (and, in the next section, OU+jumps) process evolving on a tree, we need expressions for the Laplace transforms of the above random objects that contribute to the mean and variance of the average of the tip values, $\overline{X}_{n}$ . In [Reference Bartoszek and Sagitov10] the following formulæ, including the asymptotic behaviour as $n\to\infty$ , are derived (see Lemmata 3 and 4 in [Reference Bartoszek and Sagitov10]):
The variance of the conditional expectation is derived in Lemma 5.1 in [Reference Bartoszek8] and Lemma 11 in [Reference Bartoszek and Sagitov10]:
We furthermore have
(Lemma 8 in [Reference Bartoszek and Sagitov10]) and
(Lemma 4 in [Reference Bartoszek and Sagitov10]). It remains to consider $\mathbb{V}(\mathbb{V}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ . Using (3.5), we obtain
We consider the $\alpha \ge 1/2$ regime. As normality of the limiting distribution was not shown for $\alpha <1/2$ in [Reference Bartoszek and Sagitov10] (and should not be expected; see Remark 3.1 below), there will be no gain from presenting long formulæ for that case. Using (3.1), (3.2), and (3.3) (see also Lemmata 3 and 4 in [Reference Bartoszek and Sagitov10] and Lemma 5.1 in [Reference Bartoszek8]), and, when considering $\mathbb{V}\left( {\mathbb{E}\left( { {e^{-2\alpha \tau^{(n)}}} \vert \mathcal{Y}_{n}} \right)} \right)$ , using the approximation for large n
coming from
we obtain the following asymptotic behaviour as $n\to\infty$ :
where $\zeta_{r}$ is the Riemann zeta function,
Denote now the leading constant of $\mathbb{E}(\mathbb{V}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ by $C^{EV}_{a,b}$ , that of $\mathbb{V}(\mathbb{E}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ by $C^{VE}$ , and that of $\mathbb{V}(\mathbb{V}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ by $C^{VV}_{a,b}$ , where a and b are the endpoints of the interval to which $\alpha$ belongs. If $a=b$ , then we just write $C^{VV}_{a}$ . In our notation we drop the dependence of the constant on $\alpha$ and X(0), treating it as implied. For $\alpha=\frac{1}{2}$ , Theorem 2.1 gives
where $\mu_n$ and $\sigma_n^2$ , as well as their asymptotic behaviour as $n\to\infty$ , can be obtained from (3.1), (3.2), (3.5), and (3.6). It follows immediately from (2.2) and Remark 2.1 that $(\frac{n}{\ln n})^{1/2}\,\overline{Y}_{n}\ \xrightarrow{\ d\ }\ \text{N}(0,2)$ as $n\to\infty$ . Analogously, for $\alpha>\frac{1}{2}$ , Theorem 2.1 gives
We obtain $\mu_n$ and $\sigma_n^2$ , their asymptotic behaviour as $n\to\infty$ , and the fact that $n^{1/2}\,\overline{Y}_{n}\ \xrightarrow{\ d\ }\ \text{N}(0,\frac{2\alpha+1}{2\alpha-1})$ as $n\to\infty$ , just as in the previous case.
For the Wasserstein distance, the first term on the right-hand side of (3.10) should be replaced by
and the first term on the right-hand side of (3.11) should be replaced by
We illustrate the bounds from (3.10) and (3.11) and those for the YOU model with jumps (the YOUj model) in Figure 3.
Remark 3.1. The theorems presented in this section do not give information about the case $\alpha < 1/2$ . However, one can strongly suspect that the limit will not be normal in this case. By considering higher moments of the limiting distribution, Remark $3.14$ of [Reference Adamczak and Miłoś1] showed that when the YOU model is stopped at a fixed time (the number of tips being random) for $\alpha < 1/2$ , the limit is not normal. Unfortunately, when stopping just before the nth speciation event, the approach in [Reference Bartoszek and Sagitov10] does not allow for easy derivation of the higher moments in order to reach the same conclusion as in [Reference Adamczak and Miłoś1].
4. The Yule–Ornstein–Uhlenbeck model with jumps
The new feature of the YOUj model, as compared to the YOU model, is that a normally distributed jump with mean 0 may or may not take place in the trait value immediately after a speciation event. The jumps occur independently of one another and of the OU process, but the probability of a jump, and the variance of the jump, may depend on the number of the speciation event: with speciation event number $i=1,\ldots,n$ , we associate a jump probability $p_{i}$ and jump variance $\sigma_{c,i}^{2}$ . If the jump probabilities and variances are constant, we write $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ .
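To make the jump mechanism concrete, the simulation sketch of Section 1 can be modified at each speciation step roughly as follows (illustration only; the helper name speciate_with_jumps and the per-event parameter arrays p and sigma_c2 are hypothetical):

```python
# Sketch of the YOUj speciation step (illustration only).  After the i-th speciation,
# each daughter lineage independently receives a N(0, sigma_c2[i]) jump with
# probability p[i]; p and sigma_c2 are hypothetical arrays of per-event parameters.
import numpy as np

def speciate_with_jumps(values, i, p, sigma_c2, rng):
    parent = rng.integers(len(values))          # a uniformly chosen lineage splits
    daughters = []
    for _ in range(2):                          # two descendant lineages
        x = values[parent]                      # each starts at the ancestral value
        if rng.random() < p[i]:                 # jump just after the speciation event
            x += rng.normal(0.0, np.sqrt(sigma_c2[i]))
        daughters.append(x)
    values[parent] = daughters[0]
    values.append(daughters[1])
    return values
```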
The key problem is that one needs to keep careful track of the jumps that take place at speciation events and of how the ‘mean-reversion’ of the OU process part causes their effect to be smoothed out along a lineage. We keep the notation defined in Section 3, except that we now denote by $\mathcal{Y}_{n}$ the $\sigma$ -algebra that contains information on the whole Yule tree and on the jump locations, i.e. the speciation events after which jumps have taken place. We now introduce the concept of convergence with density 1.
Definition 4.1. A subset $E \subset \mathbb{N}$ of positive integers is said to have density 0 (see e.g. Petersen [Reference Petersen16]) if
\begin{equation*}
\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}I_{E}(k) = 0,
\end{equation*}
where $I_{E}({\cdot})$ is the indicator function of the set E.
Definition 4.2. A sequence $a_{n}$ converges to 0 with density 1 if there exists a subset $E\subset \mathbb{N}$ of density 0 such that
\begin{equation*}
\lim_{\substack{n\to\infty \\ n\notin E}} a_{n} = 0.
\end{equation*}
Theorem 4.1. Consider the YOUj model with $\alpha \ge 1/2$ . Let $\overline{X}_{n}$ be the average value of the traits at the n leaves, let
and let
Let also $\mu_n = \mathbb{E}(\overline{Y}_{n})$ and $\sigma_n^2=\mathbb{E}\left( {\mathbb{V}\left( { {\overline{Y}_{n}} \vert \mathcal{Y}_{n}} \right)} \right)$ .
-
(i) If $\alpha=1/2$ , and $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ , then
\begin{equation*}d_{K}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right) = \text{O}\big(\ln^{-\frac{1}{2}} n\big)\end{equation*}as $n\to\infty$ , where $\mu_n = \delta b_{n,\alpha}$ . Moreover, $(\frac{n}{\ln n})^{1/2}\,\mu_n\to 0$ and\begin{equation*}\frac{n}{\ln n}\sigma_n^2\to 2+\frac{4p}{\sigma_a^{2}}\sigma_{c}^{2}\end{equation*}as $n\to\infty$ , so\begin{equation*}\bigg(\frac{n}{\ln n}\bigg)^{1/2}\,\overline{Y}_{n}\ \xrightarrow{\ d\ }\ \text{N}\bigg(0,2+\frac{4p}{\sigma_a^{2}}\sigma_{c}^{2}\bigg)\end{equation*}as $n\to\infty$ . -
(ii) If $\alpha>1/2$ , and $(p_i,\sigma_{c,i}^{2})\equiv (1,\sigma_{c}^{2})$ , then the asymptotics as $n\to\infty$ for
\begin{equation*}d_{K}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right)\end{equation*}is the same as in Theorem 3.1(ii), and $\mu_n = \delta b_{n,\alpha}$ . Moreover, $n^{1/2}\,\mu_n\to 0$ and\begin{equation*}n\sigma_n^2\to\frac{2\alpha+1}{2\alpha-1}\bigg(1 + \frac{2p}{\sigma_a^{2}}\sigma_{c}^{2}\bigg)\end{equation*}as $n\to\infty$ , so\begin{equation*}n^{1/2}\,\overline{Y}_{n}\ \xrightarrow{\ d\ }\ \text{N}\bigg(0,\frac{2\alpha+1}{2\alpha-1}\bigg(1 + \frac{2p}{\sigma_a^{2}}\sigma_{c}^{2}\bigg)\bigg)\end{equation*}as $n\to\infty$ . -
(iii) If $\alpha>1/2$ , and the sequence $p_{n}\sigma_{c,n}^{4}$ is bounded and converges to 0 with density 1, then
\begin{equation*}d_{K}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right) \to 0\end{equation*}as $n\to\infty$ .
Theorem 4.2. Consider the YOUj model with $\alpha \ge 1/2$ , with the same notation as in Theorem 4.1.
-
(i) If $\alpha=1/2$ , and $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ , then
\begin{equation*}d_{W}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right) = \text{O}(\ln^{-3/4} n)\end{equation*}as $n\to\infty$ . -
(ii) If $\alpha>1/2$ , and $(p_i,\sigma_{c,i}^{2})\equiv (1,\sigma_{c}^{2})$ , then the asymptotics as $n\to\infty$ for
\begin{equation*}d_{W}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right)\end{equation*}is the same as in Theorem 3.2(ii). -
(iii) If $\alpha>1/2$ , and the sequence $p_{n}\sigma_{c,n}^{4}$ is bounded and converges to 0 with density 1, then
\begin{equation*}d_{W}\left(\mathcal{L}\left(\frac{\overline{Y}_{n}-\mu_n}{\sigma_n}\right),\textrm{N}(0,1)\right) \to 0\end{equation*}as $n\to\infty$ .
Proof of Theorems 4.1 and 4.2. In addition to the random quantities defined in Section 3, we have to consider two more random components of the tree, the speciation events on a random lineage (out of n possible ones), and the speciation events common to (i.e. on the path from the origin of the tree to the most recent common ancestor of) a random pair of tip species (out of $\binom{n}{2}$ possible pairs). We define $\mathbf{1}_{i}$ as a binary random variable indicating that the tree’s ith speciation event is present on our randomly chosen lineage, $\tilde{\mathbf{1}}_{i}$ as a binary random variable indicating that the tree’s ith speciation event is present on the path from the root to the most recent common ancestor of our randomly sampled pair of tips, $Z_{i}$ as a binary random variable indicating that a jump took place just after the tree’s ith speciation event on our randomly chosen lineage, and $\tilde{Z}_{i}$ as a binary random variable indicating that a jump took place just after the tree’s ith speciation event on the path from the root to the most recent common ancestor of our randomly sampled pair of tips. For illustration of these random variables see Figure 2.
Furthermore, we define the two following sequences of random variables:
The quantities $\phi^{\ast}_{i}$ and $\phi_{i}$ capture how the effect of each (potential) jump is attenuated before the end of the randomly selected lineage is reached. The first quantifies the effect that jumps have on a randomly selected tip species, while the second quantifies the effect that jumps have on the covariance between a random pair of tip species. Intuitively speaking, under the OU process a random event occurring at a time distance t from the point of interest is discounted by a factor of $e^{-\alpha t}$, so the contribution of its variance is discounted by $e^{-2\alpha t}$.
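By the linearity of the OU dynamics in (1.1), this discounting is exact: a jump J with mean 0 and variance $\sigma_{c}^{2}$ occurring a time t before the tip under consideration contributes $e^{-\alpha t}J$ to the trait value at that tip, so that
\begin{equation*}
\mathbb{V}\bigl(e^{-\alpha t}J\bigr) = e^{-2\alpha t}\sigma_{c}^{2}.
\end{equation*}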
Recall that with each speciation event, $i=1,\ldots,n$ , we associate the jump probability $p_{i}$ and jump variance $\sigma_{c,i}^{2}$ , and that the jumps are normally distributed with mean 0. In the case when $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ , we have the following (the $\alpha \ge 1/2$ regime in the proof of Theorem $3.2$ in [Reference Bartoszek7]):
In the case when $p_{n}\sigma_{c,n}^{4}\to 0$ with density 1 as $n\to\infty$ , we have the following, by Corollaries $5.4$ and $5.7$ in [Reference Bartoszek8]: as $n\to\infty$ , for $\alpha=1/2$ ,
and for $\alpha>1/2$ ,
For the conditional mean and variance of $\overline{Y}_{n}$ , the following formulæ are provided in [Reference Bartoszek8], Lemma 6.1:
In the case when $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ , using (3.6), (4.1), and (4.2) we obtain
and as in (3.7), we get
It remains to consider $\mathbb{V}(\mathbb{V}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ . We will use Cauchy–Schwarz to obtain an upper bound
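The elementary estimate behind such a step is the bound, valid for any square-integrable random variables A and B,
\begin{equation*}
\mathbb{V}(A+B) = \mathbb{V}(A)+\mathbb{V}(B)+2\,\mathrm{Cov}(A,B) \leq \Bigl(\sqrt{\mathbb{V}(A)}+\sqrt{\mathbb{V}(B)}\Bigr)^{2},
\end{equation*}
together with its extension to finitely many summands.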
As before, we first consider the case when $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ . We look at $\mathbb{V}\Big( {\sum\limits_{i=1}^{n-1}\phi^{\ast}_{i}} \Big)$ and consider in more detail the elements I, II, and III in the proof of Lemma 5.3 in [Reference Bartoszek8], to obtain
In the same fashion, for $\mathbb{V}\Big( {\sum\limits_{i=1}^{n-1}\phi_{i}} \Big)$ we consider in more detail the element III in the proof of Lemma 5.5 in [Reference Bartoszek8], and using (3.8) we obtain
The other elements I, II, IV, and V for $\alpha \ge 1/2$ converge faster to 0; hence they do not contribute to the leading asymptotic behaviour. Using (3.1), (3.4), (4.8), and (4.9), we obtain the bound
Just as in Section 3, we denote the leading constant of $\mathbb{E}(\mathbb{V}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ by $C^{EV}_{a,b}$ , that of $\mathbb{V}(\mathbb{E}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ by $C^{VE}$ , and that of $\mathbb{V}(\mathbb{V}(\overline{Y}_{n} \vert \mathcal{Y}_{n}))$ by $C^{VV}_{a,b}$ , where a and b are the endpoints of the interval to which $\alpha$ belongs. If $a=b$ , then we just write $C^{VV}_{a}$ . If $\alpha=1/2$ and $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ , Theorem 2.1 gives
where $\mu_n$ and $\sigma_n^2$ , as well as their asymptotic behaviour as $n\to\infty$ , can be obtained from (3.1), (4.5), and (4.6). Just as in Section 3, it follows that
as $n\to\infty$ . For the Wasserstein distance, the first term on the right-hand side of (4.11) should be replaced by
If $\alpha>1/2$ and $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ , where $p<1$ , Theorem 2.1 gives
The bound does not converge to 0 as $n\to\infty$ . The same is true for the Wasserstein distance, where the first term on the right-hand side of (4.13) should be replaced by
However, if $p=1$ , the leading term in (4.7) vanishes, which implies the convergence to 0 in Theorem 4.1(ii). In order to obtain the rate of convergence, we need to look at lower-order terms. They turn out to be the same as for
since in the $\alpha\ge 1/2$ regime all the other terms converge to 0 just as fast (Parts I, IV, and V of Lemma 5.5 in [Reference Bartoszek8]) or faster (cf. Lemmata 5.3 and 5.5 in [Reference Bartoszek8]). Using the convergence rates presented in (3.4), (3.7), and (4.6), we obtain Theorem 4.1(ii) and Theorem 4.2(ii).
Finally, if the sequence $p_{n}\sigma_{c,n}^{4}$ is bounded and converges to 0 with density 1, then by (4.4) we obtain
which implies that $n^{2}\mathbb{V}\left( {\mathbb{V}\left( { {\overline{Y}_{n}} \vert \mathcal{Y}_{n}} \right)} \right)\to 0$ as $n\to\infty$ , by (4.7). This in turn entails convergence of both distances to 0 as $n\to\infty$ , but without any information on the rate. This proves Theorem 4.1(iii) and Theorem 4.2(iii).
Remark 4.1. In the original arXiv preprint (identifier arXiv:1602.05189) for [Reference Bartoszek8], it was stated that convergence to normality in the $\alpha \ge 1/2$ regime will only take place if $\sigma_{c,n}^{4}p_{n} \to 0$ with density 1 and is bounded. However, in (4.11) above we can see that in the critical case, $\alpha=1/2$ , convergence to normality will hold even if $(p_i,\sigma_{c,i}^{2})\equiv (p,\sigma_{c}^{2})$ .
Remark 4.2. The condition $p_{n}\sigma_{c,n}^{4} \to 0$ with density 1 in Theorem 4.1 can be slightly relaxed. Essentially the same results (with possibly different bounds) will hold if $(1-p_{n})p_{n}\sigma_{c,n}^{4} \to 0$ with density 1 with additional assumptions on the jump effects on a randomly chosen lineage and for a random pair of sampled lineages (see Theorem $4.6$ in [Reference Bartoszek8]). However, introducing this here would require a significant amount of additional heavy notation, for no gain in the actual application of Stein’s method to the YOUj model.
Appendix
Theorem A.1. Let X be a real-valued random variable such that $\mathbb{E}(X^2)<\infty$ , and let $\mathscr{G}$ be a $\sigma$ -algebra such that the regular conditional distribution of X given $\mathscr{G}$ is normal. Define $\mu=\mathbb{E}(X)$ , $\sigma^2 = \mathbb{E}(\mathbb{V}(X|\mathscr{G}))$ , and
If the asymptotic behaviour of X is such that $\sigma^{-2}\mathbb{E}\bigl((\mu-\mathbb{E}(X|\mathscr{G}))^4\bigr)$ and $\sigma^{-2}\mathbb{E}\bigl((\sigma^2-\mathbb{V}(X|\mathscr{G}))_+(\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)$ converge to 0 faster than $\mathbb{V}(\mathbb{E}(X|\mathscr{G}))\Bigl[ = \mathbb{E}\bigl((\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)\Bigr]$ , and $\sigma^{-2}\mathbb{E}\bigl(|\sigma^2-\mathbb{V}(X|\mathscr{G})|(\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)$ converges to 0 faster than $\mathbb{E}(\kappa(\mathbb{V}(X|\mathscr{G})))$ , then
where
-
(i) either $d=d_K$ and $C = \int_{-\infty}^\infty|2x^3-5x|e^{-x^2/2}dx$ , or $d=d_W$ and $C = \max_{x\in\mathbb{R}}|2x^3-5x|e^{-x^2/2}$ ;
-
(ii) $|T_1(X)| \asymp \mathbb{V}(\mathbb{E}(X|\mathscr{G}))$ and $|T_2(X)| \sim \mathbb{E}\bigl(\kappa(\mathbb{V}(X|\mathscr{G}))\bigr)$ .
Moreover, $\mathbb{E}\bigl(\kappa(\mathbb{V}(X|\mathscr{G}))\bigr) \leq \frac{27}{8}\sigma^{-2}\mathbb{E}((\sigma^2-\mathbb{V}(X|\mathscr{G}))^2)$ .
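For reference, both constants in (i) admit closed forms: using the antiderivatives $\int xe^{-x^{2}/2}\,dx = -e^{-x^{2}/2}$ and $\int x^{3}e^{-x^{2}/2}\,dx = -(x^{2}+2)e^{-x^{2}/2}$, the symmetry of the integrand, and the sign changes of $2x^{3}-5x$ at $x=0,\pm\sqrt{5/2}$, one finds
\begin{equation*}
\int_{-\infty}^{\infty}\bigl|2x^{3}-5x\bigr|e^{-x^{2}/2}\,dx = 2 + 16\,e^{-5/4}\approx 6.58,
\qquad
\max_{x\in\mathbb{R}}\bigl|2x^{3}-5x\bigr|e^{-x^{2}/2} = 2\sqrt{2}\,e^{-1/4}\approx 2.20,
\end{equation*}
the maximum being attained at $x=\pm 1/\sqrt{2}$.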
Proof. Inspired by the approach of Sections 3.2–3.3 in [Reference Barbour, Holst and Janson6], we define the function $g\,:\,\mathbb{R}\to\mathbb{R}$ as follows:
It is easily seen that g is bounded and has a bounded and continuous derivative. Define $h\,:\,\mathbb{R}\to\mathbb{R}$ by $h(x) = \sigma^2g^{\prime}(\sigma x + \mu) - \sigma xg(\sigma x + \mu)$ for each $x\in\mathbb{R}$ . This gives
where $Z\sim\text{N}(\mu,\sigma^2)$ . By the Stein identity (2.5), the second term on the right-hand side in (A.1) is 0, and using Fubini’s theorem, the right-hand side can be rewritten as
implying that
We therefore get the following lower bound for the Kolmogorov distance:
From the definition and (A.1), we get a very similar lower bound for the Wasserstein distance:
We next observe that
and
which in turn implies that
and
for each $x\in\mathbb{R}$ . From this we get
and
It remains to find a lower bound for the numerator in (A.2). Using (2.9), we first write
After some straightforward computations, we get
and multiplying (A.3) by $\mu-\mathbb{E}(X|\mathscr{G})$ , we obtain
an expression which is nonpositive. Furthermore, by convexity,
where the right-hand side is the tangent line at $x=\sigma^2$ . It follows that if $\mathbb{V}(X|\mathscr{G})\geq \sigma^2$ , then
(compare the proof of Lemma 3.2.1 in [Reference Barbour, Holst and Janson6]), while if $\mathbb{V}(X|\mathscr{G})\leq \sigma^2$ , then
Multiplying by $\mu-\mathbb{E}(X|\mathscr{G})$ and taking expectations in the last two sets of inequalities, we get
From this it follows that if the asymptotic behaviour of X is such that $\sigma^{-2}\mathbb{E}\bigl((\mu-\mathbb{E}(X|\mathscr{G}))^4\bigr)$ and $\sigma^{-2}\mathbb{E}\bigl((\sigma^2-\mathbb{V}(X|\mathscr{G}))_+(\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)$ converge to 0 faster than $\mathbb{E}\bigl((\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)$ , it holds that $\bigl|\mathbb{E}\bigl((\mu-\mathbb{E}(X|\mathscr{G}))g(X)\bigr)\bigr| \asymp \mathbb{E}\bigl((\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)$ .
Similarly, after some computations, we obtain
and subtracting with $\mathbb{E}(g^{\prime}(Z)) = \frac{1}{2^{3/2}}$ leads to
Multiplying by $\sigma^2-\mathbb{V}(X|\mathscr{G})$ and using the function $\kappa$ defined in Theorem A.1, we get
We observe that $\kappa(\sigma^2) = 0$ , and that
so $\kappa^{\prime}(x)<0$ for $x\in(0,\sigma^2)$ , $\kappa^{\prime}(\sigma^2)=0$ , and $\kappa^{\prime}(x)>0$ for $x>\sigma^2$ . Moreover, by (A.4),
and $\kappa^{\prime}(x)\to\frac{1}{2^{3/2}}$ as $x\to\infty$ . Next,
and
implying that $\kappa^{\prime\prime}(x)>0$ for $x\in(0,9\sigma^2)$ , $\kappa^{\prime\prime}(9\sigma^2)=0$ , and $\kappa^{\prime\prime}(x)<0$ for $x>9\sigma^2$ . Moreover, $\kappa^{\prime\prime}(x)$ is strictly decreasing for $x\in[0,13\sigma^2)$ . This means that for $\delta>0$ small enough, for any $x_0\in(9\sigma^2-\delta,9\sigma^2)$ it holds that $\kappa^{\prime\prime}(x_0)>0$ and $\kappa^{\prime}(x_0)>\frac{1}{2^{3/2}}$ . It therefore holds that $\kappa(x) \geq \frac{1}{2}\kappa^{\prime\prime}(x_0)(\sigma^2-x)^2$ for $x\in[0,x_0]$ , and
for $x\geq x_0$ . It also follows from the preceding that
for $x\geq 0$ .
Using now the fact, observed in Section 3.2 in [Reference Barbour, Holst and Janson6], that $1\geq (1-2u)e^{-u} \geq 1-3u$ for all $u\geq 0$ , we obtain
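(For completeness, the quoted inequality is elementary: if $0\leq u\leq\frac{1}{2}$, then $e^{-u}\geq 1-u$ gives $(1-2u)e^{-u}\geq(1-2u)(1-u)=1-3u+2u^{2}\geq 1-3u$, while if $u\geq\frac{1}{2}$, then $1-2u\leq 0$ and $e^{-u}\leq 1$ give $(1-2u)e^{-u}\geq 1-2u\geq 1-3u$; the upper bound follows from $e^{-u}\leq 1$ when $1-2u\geq 0$, and from $(1-2u)e^{-u}\leq 0$ otherwise.)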
and, furthermore,
Also,
Taking expectations in (A.6) and using the last three sets of inequalities, we get
From this it follows that if the asymptotic behaviour of X is such that $\sigma^{-2}\mathbb{E}\bigl(|\sigma^2-\mathbb{V}(X|\mathscr{G})|(\mu-\mathbb{E}(X|\mathscr{G}))^2\bigr)$ converges to 0 faster than $\mathbb{E}(\kappa(\mathbb{V}(X|\mathscr{G})))$ (note also that $\kappa(\mathbb{V}(X|\mathscr{G})) \leq |\sigma^2-\mathbb{V}(X|\mathscr{G})|$), it holds that
Acknowledgements
We wish to thank an anonymous referee for a number of insightful comments. K. B. is supported by Grant No. 2017-04951 from the Swedish Research Council (Vetenskapsrådet).