1. Introduction
In this article, we study random geometric graphs on the d-dimensional Poincaré ball, a canonical model for negatively curved spaces in geometry [Reference Cannon, Floyd, Kenyon, Parry and Levy15, Reference Ratcliffe41]. Random geometric graphs were first studied in the 1960s, beginning with [Reference Gilbert24], as a model of radio communications. Since then, they have been a thriving topic of research in probability, statistical physics, and wireless networks (see [Reference Meester and Roy35, Reference Penrose38, Reference Yukich42, Reference Baccelli and Blaszczyszyn3, Reference Baccelli and Blaszczyszyn4, Reference Haenggi26]). In its simplest form, the random geometric graph can be constructed by taking a set of independent and identically distributed (i.i.d.) points ${\mathcal{X}}_n = \{X_1,\ldots,X_n\}$ on a metric space as its vertex set, and placing an edge between any two distinct points within a distance $r_n$ . Most of the studies on such graphs assume that the underlying metric space is Euclidean or a compact and convex Euclidean subset. However, various applications, especially those in topological data analysis, necessitate more general metric spaces. For example, the extension to a compact manifold without boundary has recently been investigated in [Reference Bobrowski and Mukherjee8, Reference Bobrowski and Oliveira9, Reference De Kergorlay, Tillmann and Vipond18]. A crude one-line summary of these studies is that the behaviour of the graph on a ‘nice’ d-dimensional manifold is similar to that on a d-dimensional Euclidean space, though the proofs are technically more challenging and boundary effects need to be accounted for carefully.
Going beyond the class of nice d-dimensional manifolds with positive curvature and the Euclidean space with zero curvature, we focus on ‘negatively’ curved spaces in this paper. Such an investigation on hyperbolic spaces was initiated in [Reference Krioukov30]. Among many equivalent models for the hyperbolic space, we focus our attention on the widely used d-dimensional Poincaré ball. Apart from mathematical curiosity, a good reason for investigating hyperbolic geometric graphs lies in their properties such as sparsity, power-law degree distribution, small-world phenomena, and clustering. Such properties are often observed in complex networks. For more details, see the introductions in [Reference Krioukov30, Reference Gugelmann, Panagiotou and Peter25, Reference Fountoulakis20], where the graph is often referred to as the disk model or the KPKVB model after the authors of [Reference Krioukov30]. However, we always call it the hyperbolic random geometric graph. In a broader sense, our work is an addition to the developing literature on random structures of hyperbolic spaces (see [Reference Brooks and Makover12, Reference Benjamini and Schramm6, Reference Benjamini5, Reference Lyons and Peres34, Reference Lalley31, Reference Petri and Thaele40, Reference Cunningham, Zuev and Krioukov17, Reference Fountoulakis, van der Hoorn, Müller and Schepers21]).
The rest of the article is organized as follows. Sections 1.1 and 1.2 provide simulations on hyperbolic geometric graphs, and a preview of our results and their implications. In Section 2, we introduce our setup in detail and state all our results. We compare our results with those in the existing literature in Section 2.4, and Section 2.5 sketches results for analogous models in the Euclidean case. This is followed by the proofs in Section 3, where we introduce basic lemmas on the hyperbolic metric and measure and also the abstract normal approximation bound or second-order Poincaré inequality from [Reference Last, Peccati and Schulte32] needed for our central limit theorem (CLT).
1.1. Simulations on hyperbolic geometric graphs
We begin by introducing the Poincaré ball and the hyperbolic random geometric graph. Though there are other models of hyperbolic spaces, they are all isometric to the Poincaré ball (see [Reference Cannon, Floyd, Kenyon, Parry and Levy15, Section 7]); therefore, our results are essentially independent of the model selection. The d-dimensional Poincaré ball, denoted by
\begin{align*}
B_d^{(\zeta)} \,{:\!=}\, \bigl\{ x = (x_1,\dots,x_d) \in {\mathbb R}^d \,{:}\, |x| < 1 \bigr\}, \tag{1}
\end{align*}
with negative curvature $-\zeta^2$, $\zeta > 0$, is a d-dimensional open unit ball equipped with the Riemannian metric
\begin{align*}
ds^2 = \frac{4}{\zeta^2}\, \frac{dx_1^2 + \cdots + dx_d^2}{\bigl(1 - |x|^2\bigr)^2}, \tag{2}
\end{align*}
where $|\cdot|$ denotes the Euclidean norm. Then, the hyperbolic distance between the origin (which coincides with the Euclidean origin) and $x\in B_d^{(\zeta)}$ is given by $d_\zeta(0,x) = \zeta^{-1} \log \big( (1+|x|)/(1-|x|) \big)$, where $d_\zeta$ is the hyperbolic distance induced by (2). Though $B_d^{(\zeta)}$ is topologically the same as any open Euclidean ball, what matters to us is the metric itself. In the notation of (2), the Euclidean metric can be specified as $ds^2 = dx_1^2 + \cdots + dx_d^2$, and so, on compact subsets of the unit ball, the two metric elements differ only by a bounded conformal factor. This ensures that various geometric ingredients in the Euclidean case can be carried over to the Poincaré ball when restricted to a compact subset of the unit ball. However, as $|x| \uparrow 1$ in (2), one expects the geometry of the Poincaré ball to differ significantly from that of the Euclidean ball; see Figure 1 for hyperbolic lines and circles on the Poincaré disk with $d=2$. The reader may refer to [Reference Ratcliffe41] for a rigorous discussion of the basic features of hyperbolic spaces and geometry.
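As an illustrative aside (not part of the original text), the closed-form distance $d_\zeta(0,x)$ can be checked numerically against the metric (2): the hyperbolic length of the Euclidean segment from 0 to x is the integral of the radial conformal factor $2/\bigl(\zeta(1-s^2)\bigr)$. A minimal sketch in Python, with hypothetical helper names:

```python
import math

def poincare_dist_origin(r_eucl, zeta):
    """Closed-form hyperbolic distance d_zeta(0, x) for |x| = r_eucl < 1."""
    return math.log((1.0 + r_eucl) / (1.0 - r_eucl)) / zeta

def poincare_dist_numeric(r_eucl, zeta, steps=100_000):
    """Length of the Euclidean segment [0, x] under the metric (2):
    integrate the radial line element 2 / (zeta * (1 - s^2)) ds
    from 0 to |x| with the midpoint rule."""
    h = r_eucl / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += 2.0 / (zeta * (1.0 - s * s))
    return total * h

zeta = 1.0
for r in (0.1, 0.5, 0.9):
    assert abs(poincare_dist_origin(r, zeta) - poincare_dist_numeric(r, zeta)) < 1e-6
```

The agreement for $|x|$ close to 1 illustrates how rapidly hyperbolic distances blow up near the boundary of the unit ball.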
Let B(0, R) denote the hyperbolic ball of radius R centred at the origin. In addition to the curvature parameter $-\zeta^2$ in (2), our model involves a second curvature parameter $-\alpha^2$ of another d-dimensional Poincaré ball $B_d^{(\alpha)}$. We first choose a sequence of radii $R_n \to \infty$ as $n \to \infty$ and select $N_n \stackrel{d}{=}$ Poisson(n) i.i.d. ‘uniform’ points $X_1,\ldots,X_{N_n}$ in $B(0,R_n) \subset B_d^{(\alpha)}$. Here, ‘uniform’ is meant with respect to the hyperbolic volume measure with curvature $-\alpha^2$. We then represent $X_i$ in hyperbolic polar coordinates as $(r_i,\Theta_i)$, where $r_i$ is the hyperbolic distance from the origin with respect to the curvature $-\alpha^2$, and $\Theta_i \in [0,\pi]^{d-2} \times [0,2\pi)$ is a random vector specifying angles. Next, we regard $X_i = (r_i,\Theta_i)$ as a point of $B_d^{(\zeta)}$ by keeping its polar coordinates fixed, and connect any two points $X_i,X_j$ if $0 < d_{\zeta}(X_i,X_j) \leq R_n$. In other words, we sample points uniformly in a growing ball in $B_d^{(\alpha)}$ with respect to the hyperbolic volume measure with curvature $-\alpha^2$, and form a geometric graph on $B_d^{(\zeta)}$ with respect to the hyperbolic metric with curvature $-\zeta^2$. We denote the resulting graph by $HG_n(R_n ;\ \alpha,\zeta)$, which involves four parameters: the dimension d, the two curvature parameters $\alpha$ and $\zeta$, and the radius regime $R_n$. As will be clarified in our later analyses, choosing two distinct curvature parameters yields a larger parametric family of random graphs with a richer set of features. We refer the reader to Remark 1 for a discussion of how our results depend on the individual parameters, not just their ratio, unlike many earlier results. After Definition 1, we briefly remark on the relationship of this model to the random geometric graph on a stationary Poisson process.
Figure 2 presents five simulation results with $d = 2$, $n = 1000$, $\zeta = 1$, $R_n = 2\log 1000 \approx 13.82$, and different choices of $\alpha$. It is useful to keep in mind that for a small $\alpha$, the space is closer to the Euclidean space, in the sense that more points are scattered near the centre, whereas for a large $\alpha$, the points near the boundary dominate the asymptotics. Furthermore, by the geometry of a hyperbolic space, the points near the centre easily connect to many other points, and so the presence of such points changes the connectivity structure. Additionally, we may notice a phase transition in the connectivity at $\alpha = 1$, due to the appearance of points closer to the centre. Some of the above observations will be crystallized later as rigorous theorems, while many more still await exploration in future works. We now preview some of our results and place them in the context of the growing literature on hyperbolic random geometric graphs.
1.2. Tree-like hierarchical structures: a preview of our results and some implications
One of the main characteristics of hyperbolic random geometric graphs is the presence of tree-like hierarchical structures: the vertices on $B_d^{(\zeta)}$ are organized into large groups consisting of smaller subgroups, which in turn consist of still smaller subgroups (see [Reference Krioukov30]). This is reflected in our simulations, which illustrate that, compared to Euclidean spaces, the non-amenability of a negatively curved space allows many more trees to be embedded in the space.
The study of the hyperbolic random geometric graph started with [Reference Krioukov30]. Since then, various features of the resulting graph have been investigated intensively by many authors. Indeed, there are a number of publications on the subject, in areas including the structure of large components [Reference Bode, Fountoulakis and Müller10, Reference Fountoulakis and Müller22, Reference Kiwi and Mitsche29], the probability of connectivity [Reference Bode, Fountoulakis and Müller11], clique-counts [Reference Bläsius, Friedrich and Krohmer7], the degree sequence and clustering [Reference Candellero and Fountoulakis14, Reference Gugelmann, Panagiotou and Peter25, Reference Fountoulakis, van der Hoorn, Müller and Schepers21], the bootstrap percolation process [Reference Candellero and Fountoulakis13], the typical distances and diameter of components [Reference Abdullah, Bode and Fountoulakis1, Reference Kiwi and Mitsche27, Reference Müller and Staps36], and the spectral properties of the graph [Reference Kiwi and Mitsche28]. More recently, [Reference Fountoulakis and Yukich23] has proved expectation and variance asymptotics as well as CLTs for the number of isolated and extreme points for the hyperbolic random geometric graph.
The earlier studies cited above, however, treat only a special case, namely,
\begin{align*}
R_n = 2 \log (n/\nu) \tag{3}
\end{align*}
for some $\nu\in (0,\infty)$ , or similar variants. In contrast, the current study assumes a set of conditions more general than (3). For example, we allow for a general dimension d, and we do not put any restrictive conditions on $R_n$ as in (3). Under this general setup, the primary objective of this paper is to uncover the spatial distribution of the tree-like structure via sub-tree counts, i.e., the number of copies of a given tree in $HG_n(R_n ;\ \alpha, \zeta)$ . In graph-theoretic language, we count the number of injective graph homomorphisms from a fixed sub-tree to $HG_n(R_n ;\ \alpha, \zeta)$ . Our main results in Theorems 1, 2, and 3 establish expectation asymptotics, variance asymptotics, and the normal approximation bounds for the sub-tree counts. All of our asymptotic results crucially depend on a degree sequence of the tree under consideration as well as the two curvature parameters $-\zeta^2$ and $-\alpha^2$ .
More specifically, the present paper studies a more general sub-tree count, denoted by $S_n^{(\gamma)}$ , $\gamma\in (0,1]$ , which counts sub-trees only on the space $B(0,R_n)\setminus \mathring{B}(0, (1-\gamma)R_n)$ . In other words, $S_n^{(\gamma)}$ counts sub-trees in an annulus with inner radius $(1-\gamma)R_n$ and outer radius $R_n$ . In particular, $S_n\,{:\!=}\,S_n^{(1)}$ denotes the sub-tree counts for the ‘entire’ space $B(0,R_n)$ . Our results will reveal that if $\alpha$ is sufficiently large, i.e., the space $B_d^{(\alpha)}$ is sufficiently hyperbolic (for example, $\alpha=1.2$ in Figure 2), then the asymptotics of $S_n^{(\gamma)}$ does not depend on $\gamma$ . In other words, the spatial distribution of sub-trees is mostly determined by those near the boundary of $B(0,R_n)$ . In this case, $S_n$ is well approximated by $S_n^{(\gamma)}$ for any $\gamma \in (0,1)$ . On the other hand, if $\alpha$ is small, as in Figure 2 with $\alpha=0.8$ , the distribution of sub-trees will be affected by those near the centre of the space, and $S_n^{(\gamma)}$ no longer approximates $S_n$ . To relate the above phenomena to the context of previous studies, Corollary 2 restates our results in the special case when $R_n = 2(\zeta (d-1))^{-1} \log (n/\nu)$ . If $d = 2$ , this clearly reduces to (3). Interested readers wishing to understand this special case can proceed directly to Section 2.4. In particular, Table 1 compares a part of our results with those of earlier works on connectivity, percolation, and clique counts. If we take the tree to be a single edge, then $S_n$ denotes the number of edges. Our results restricted to this case are consistent with the power-law behaviour for degree distributions. Our results also indicate a similar power-law behaviour for more general sub-tree counts, revealing the dependence of the exponents on the degree sequence of the tree under consideration and the parameters $\alpha,\zeta$ . This is also explained in detail in Section 2.4.
Let us now add a few words on our proofs. The expectation and variance asymptotics for sub-tree counts involve the Mecke formula for Poisson point processes and various estimates for the measure and metric on the Poincaré ball. By virtue of the tree assumption, the relative angles between points exhibit sufficient independence (see Lemma 2). For the CLT, we use the abstract normal approximation result of [Reference Last, Peccati and Schulte32], based on the Malliavin–Stein method. To apply it, we derive detailed bounds on the first-order (add-one cost) and second-order difference operators of the sub-tree counts. In many applications to Euclidean stochastic geometric functionals, these difference operators are bounded by a constant, but in our setup they admit no bound uniform in n. Nevertheless, we manage to apply [Reference Last, Peccati and Schulte32] by carefully bounding the difference operators, so that their growth is dominated by our variance lower bounds (see Remark 3(iii)).
2. Our setup and results
2.1. The Poincaré ball
Our underlying metric space is the d-dimensional Poincaré ball $B_d^{(\zeta)}$ in (1), which is equipped with the Riemannian metric (2), where $-\zeta^2$ represents the negative (Gaussian) curvature with $\zeta > 0$. This is a standard model in the literature, but earlier studies (e.g. [Reference Bode, Fountoulakis and Müller10, Reference Fountoulakis and Müller22, Reference Kiwi and Mitsche29, Reference Bode, Fountoulakis and Müller11, Reference Bläsius, Friedrich and Krohmer7, Reference Candellero and Fountoulakis14, Reference Gugelmann, Panagiotou and Peter25, Reference Candellero and Fountoulakis13, Reference Kiwi and Mitsche27, Reference Müller and Staps36, Reference Kiwi and Mitsche28, Reference Fountoulakis and Yukich23, Reference Abdullah, Bode and Fountoulakis1, Reference Fountoulakis19, Reference Fountoulakis20]) treated only the special case $d=2$, whereas we allow for higher dimensions as well. We mention some of the basic properties of this metric space, and some more will be stated in Section 3.1. For more details on the Poincaré ball, we refer the reader to [Reference Ratcliffe41, Reference Anderson2], while [Reference Cannon, Floyd, Kenyon, Parry and Levy15] is a good resource for a quick reading. In what follows, we often represent the point $x \in B_d^{(\zeta)}$ in terms of ‘hyperbolic’ polar coordinates: for $x \in B_d^{(\zeta)}$, we write $x = (r,\theta_1,\dots, \theta_{d-1})$, where $r\geq 0$ is the radial part of x, defined by
\begin{align*}
r \,{:\!=}\, \frac{1}{\zeta} \log \Bigl( \frac{1+|x|}{1-|x|} \Bigr), \tag{4}
\end{align*}
and $(\theta_1, \dots, \theta_{d-1}) \in C_d \,{:\!=}\, [0,\pi]^{d-2} \times [0,2\pi)$ is the angular part of x. Let $d_\zeta$ denote the hyperbolic distance induced by (2); then it satisfies $d_\zeta(0,x)= r$ .
Using the hyperbolic polar coordinates $(r,\theta_1,\dots,\theta_{d-1})$, the metric (2) can be rewritten as
\begin{align*}
ds^2 = dr^2 + \zeta^{-2} \sinh^2 (\zeta r) \bigl( d\theta_1^2 + \sin^2 \theta_1 \, d\theta_2^2 + \cdots + \sin^2 \theta_1 \cdots \sin^2 \theta_{d-2} \, d\theta_{d-1}^2 \bigr),
\end{align*}
from which we obtain the volume element,
\begin{align*}
dV = \zeta^{-(d-1)} \sinh^{d-1} (\zeta r) \, \sin^{d-2} \theta_1 \, \sin^{d-3} \theta_2 \cdots \sin \theta_{d-2} \, dr \, d\theta_1 \cdots d\theta_{d-1}.
\end{align*}
We now generate ‘quasi-uniform’ random points on a sequence of growing compact subsets of the Poincaré ball. First, we choose a deterministic sequence $R_n$ , $n\geq1$ , which grows to infinity as $n\to\infty$ . We assume that the angular part of the random points is uniformly chosen from the density,
where we have set $\kappa_m= \int_0^{\pi} \sin^m \theta\, d\theta$, and trivially, $\kappa_0 = \pi$. We use the symbol $\pi$ to denote the angular density as well as the famed constant, but the context will make clear which is meant. Given another curvature parameter $\alpha > 0$, we assume that the density of the radial part is
\begin{align*}
\rho_{n,\alpha} (r) \,{:\!=}\, \frac{\sinh^{d-1} (\alpha r)}{\int_0^{R_n} \sinh^{d-1} (\alpha s)\, ds}, \qquad 0 \leq r \leq R_n. \tag{6}
\end{align*}
This density is described as the ratio of the surface area of B(0, r) to the volume of $B(0,R_n)$, where B(0, r) and $B(0,R_n)$ are both defined on $B_d^{(\alpha)}$. So the density (6) can be regarded as a uniform density (for the radial part) on $B_d^{(\alpha)}$. Combining (5) and (6), we can generate uniform random points on $B_d^{(\alpha)}$. Since their radial coordinates are bounded by $R_n$, we can also consider them as points in $B(0,R_n) \subset B_d^{(\alpha)}$. By using the same hyperbolic polar coordinates, i.e., the same radial and angular coordinates, we now represent these points in $B_d^{(\zeta)}$ as well, and construct the hyperbolic geometric graph in $B_d^{(\zeta)}$. Specifically, we connect any two distinct points $X_i$, $X_j$ if $0< d_\zeta(X_i,X_j)\le R_n$. Obviously, (6) is no longer a uniform density (for the radial part) on $B_d^{(\zeta)}$ unless $\zeta = \alpha$; for this reason, the points are called ‘quasi-uniform’.
Though the probability density in (6) looks a little complicated, we shall mostly resort to the following useful approximation. Set $T\,{:\!=}\, R_n-d_\zeta(0,X)$, where X is a random point with density $\rho_{n,\alpha} \otimes \pi$. Denote by $\bar \rho_{n,\alpha} (t)$ the density of T; i.e.,
\begin{align*}
\bar \rho_{n,\alpha} (t) \,{:\!=}\, \rho_{n,\alpha} (R_n - t), \qquad 0 \leq t \leq R_n. \tag{7}
\end{align*}
In the sequel, we often denote a random point X with density $\rho_{n,\alpha} \otimes \pi$ by its hyperbolic polar coordinates $X=(T, \Theta)$ , where $\Theta$ is the angular part, and the radial part is described by T rather than $d_\zeta(0,X)$ . The approximation result below was established for $d=2$ in [Reference Candellero and Fountoulakis14]. The proof is in Section 3.
Lemma 1.

(i) As $n\to\infty$, we have
\begin{align*}\bar \rho_{n,\alpha} (t) \leq \bigl( 1+o(1) \bigr) \alpha (d-1) e^{-\alpha (d-1)t}\end{align*}
uniformly for $0 \leq t < R_n$.

(ii) For every $0 < \lambda <1$, we have, as $n\to \infty$,
\begin{align*}\bar \rho_{n,\alpha} (t) =\bigl( 1+o(1) \bigr) \alpha (d-1) e^{-\alpha (d-1)t}\end{align*}
uniformly for $0 \leq t \leq \lambda R_n$.
More concretely, this lemma claims that if t is restricted to $[0,\lambda R_n]$ for some fixed $\lambda\in(0,1)$ , then $\bar \rho_{n,\alpha} (t)$ is asymptotically and uniformly equal to $\alpha (d-1) e^{-\alpha (d-1)t}$ . If there is no such restriction on the range of t, we can only bound $\bar \rho_{n,\alpha} (t)$ , up to constant factors, by the same quantity. For the exact asymptotics of $\bar \rho_{n,\alpha} (t)$ , we apply the result in Case (ii), while Case (i) is used for bounding $\bar \rho_{n,\alpha} (t)$ .
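As a hedged illustration of Lemma 1 (not part of the proof), one can sample T directly in the planar case $d=2$, where the radial density (6) is proportional to $\sinh(\alpha r)$, the CDF is $\bigl(\cosh(\alpha r)-1\bigr)/\bigl(\cosh(\alpha R_n)-1\bigr)$, and the inverse CDF is available in closed form. The sample mean of T should then be close to $1/(\alpha(d-1)) = 1/\alpha$; the function name below is ours:

```python
import math, random

def sample_T(alpha, R, n, rng):
    """Sample T = R - r, where r has density sinh(alpha*r)/(cosh(alpha*R) - 1)
    on [0, R] (the d = 2 case of the radial density), via inverse-CDF sampling:
    F(r) = (cosh(alpha*r) - 1) / (cosh(alpha*R) - 1)."""
    c = math.cosh(alpha * R) - 1.0
    return [R - math.acosh(1.0 + rng.random() * c) / alpha for _ in range(n)]

rng = random.Random(42)
alpha, R = 1.0, 20.0
ts = sample_T(alpha, R, 200_000, rng)
mean_T = sum(ts) / len(ts)
# Lemma 1 suggests T is approximately Exp(alpha*(d-1)) = Exp(1) for d = 2,
# so the sample mean should be close to 1/alpha.
assert abs(mean_T - 1.0 / alpha) < 0.02
```

The boundary effect for $t$ near $R_n$ is exponentially small here, which is why the exponential approximation is accurate for the mean.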
Now we define our main object of interest, the hyperbolic random geometric graph. The first ingredient is the Poisson point process on $B_d^{(\zeta)}$ . For every $n \geq1$ , let $(X_i, \, i \geq 1)$ be a sequence of i.i.d. random points on $B_d^{(\zeta)}$ with common density $\rho_{n,\alpha} \otimes \pi$ . Letting $N_n$ be a Poisson random variable with mean n, independent of $(X_i)$ , one can construct the Poisson point process $\mathcal{P}_n = \{ X_1, X_2, \dots, X_{N_n}\}$ whose intensity measure is $n (\rho_{n,\alpha} \otimes \pi)$ .
Definition 1. (Hyperbolic random geometric graph) Let $R_n \to \infty$ as $n \to \infty$ . The hyperbolic random geometric graph $HG_n(R_n ;\ \alpha, \zeta)$ is a simple, undirected graph whose vertex set is the Poisson point process $\mathcal{P}_n$ with intensity measure $n (\rho_{n,\alpha}\otimes \pi)$ , and whose edge set is $\{ (X_i,X_j) : X_i,X_j \in \mathcal{P}_n, 0 < d_\zeta(X_i,X_j) \leq R_n \}$ .
Finally, we remark that even if $\alpha = \zeta$ , the process $\mathcal{P}_n$ is not the restriction of a stationary Poisson process to $B_d^{(\alpha)}$ unless $|B(0,R_n)| = n$ . A random geometric graph on a stationary Poisson process is a different model, which is actually beyond the scope of this article.
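To make Definition 1 concrete, here is a minimal simulation sketch for $d=2$ (our own illustration, with hypothetical function names). It draws a Poisson(n) number of points with the quasi-uniform radial law, and evaluates pairwise distances from the polar coordinates via the standard hyperbolic law of cosines, $\cosh(\zeta d_\zeta(x,y)) = \cosh(\zeta r_1)\cosh(\zeta r_2) - \sinh(\zeta r_1)\sinh(\zeta r_2)\cos(\theta_1-\theta_2)$:

```python
import math, random

def hyperbolic_dist(p, q, zeta):
    """Distance in B_2^(zeta) between points in hyperbolic polar coordinates
    (r, theta), via the hyperbolic law of cosines."""
    (r1, t1), (r2, t2) = p, q
    arg = (math.cosh(zeta * r1) * math.cosh(zeta * r2)
           - math.sinh(zeta * r1) * math.sinh(zeta * r2) * math.cos(t1 - t2))
    return math.acosh(max(arg, 1.0)) / zeta  # guard against rounding below 1

def sample_hg(n, R, alpha, zeta, rng):
    """Sketch of HG_n(R; alpha, zeta) for d = 2: Poisson(n) many points with
    radial CDF (cosh(alpha*r)-1)/(cosh(alpha*R)-1) on [0, R] and uniform
    angles; edges whenever 0 < d_zeta(X_i, X_j) <= R."""
    # Poisson(n) point count via CDF inversion (adequate for moderate n).
    N, p, u = 0, math.exp(-n), rng.random()
    cum = p
    while u > cum:
        N += 1
        p *= n / N
        cum += p
    c = math.cosh(alpha * R) - 1.0
    pts = [(math.acosh(1.0 + rng.random() * c) / alpha,
            rng.uniform(0.0, 2.0 * math.pi)) for _ in range(N)]
    edges = [(i, j) for i in range(N) for j in range(i + 1, N)
             if 0.0 < hyperbolic_dist(pts[i], pts[j], zeta) <= R]
    return pts, edges

rng = random.Random(0)
pts, edges = sample_hg(50, 2 * math.log(50), 1.0, 1.0, rng)
assert all(0 <= i < j < len(pts) for i, j in edges)
```

Varying `alpha` with `zeta` fixed reproduces the qualitative effect seen in Figure 2: smaller `alpha` pushes more points toward the centre, which then act as hubs.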
2.2. Expectation and variance asymptotics for sub-tree counts
We introduce the notion of a graph homomorphism to define our sub-tree counts. Suppose H is a simple graph on $[k]\,{:\!=}\,\{ 1,\dots,k \}$ with edge set E(H). Given another simple graph $G = (V(G),E(G))$, a graph homomorphism from H to G is a function $f \,{:}\, [k] \to V(G)$ such that if $(i,j) \in E(H)$, then $(f(i),f(j)) \in E(G)$, i.e., the adjacency relation is preserved. We denote by ${\mathcal{C}} (H,G)$ the number of injective graph homomorphisms from H to G, i.e., the number of injective functions f as above. Informally, ${\mathcal{C}} (H,G)$ counts the number of copies of H in G. This can be represented easily as follows:
\begin{align*}
{\mathcal{C}} (H,G) = \sum^{\neq}_{v_1,\ldots,v_k \in V(G)} \, \prod_{(i,j) \in E(H)} \textbf{1} \{ (v_i,v_j) \in E(G) \}, \tag{8}
\end{align*}
where $\sum^{\neq}$ denotes the sum over distinct k-tuples $v_1,\ldots,v_k$ and $\textbf{1} \{ \cdot \}$ is an indicator function. Notice that the subgraphs counted by ${\mathcal{C}}(H,G)$ are not necessarily induced subgraphs in G isomorphic to H. By (8), it is easy to derive a monotonicity property: if $H_1,H_2$ are simple graphs on [k] such that $E(H_1) \subset E(H_2)$ , then ${\mathcal{C}}(H_2,G) \leq {\mathcal{C}}(H_1,G)$ .
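For small graphs, ${\mathcal{C}}(H,G)$ and the monotonicity property can be verified by brute-force enumeration over injective maps; the sketch below (our illustration, with 0-based vertex labels) costs $O(|V(G)|^k)$ and is meant only for toy examples:

```python
from itertools import permutations

def injective_hom_count(k, edges_H, vertices_G, edges_G):
    """Number of injective maps f: {0,...,k-1} -> V(G) such that every edge
    (i, j) of H is sent to an edge (f(i), f(j)) of G; edges are unordered."""
    EG = {frozenset(e) for e in edges_G}
    count = 0
    for image in permutations(vertices_G, k):
        if all(frozenset((image[i], image[j])) in EG for i, j in edges_H):
            count += 1
    return count

# G: path on 3 vertices 0 - 1 - 2.
VG, EG = [0, 1, 2], [(0, 1), (1, 2)]
# H2: path on 3 labels with edges (0,1),(1,2); H1: same labels, one edge.
c_path = injective_hom_count(3, [(0, 1), (1, 2)], VG, EG)  # -> 2
c_edge = injective_hom_count(3, [(0, 1)], VG, EG)          # -> 4
assert (c_path, c_edge) == (2, 4)
# Monotonicity: E(H1) ⊂ E(H2) implies C(H2, G) <= C(H1, G).
assert c_path <= c_edge
```

Note that the count is over ordered (injective) tuples, so a single undirected copy of H is counted once per automorphism of H.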
Let us return to our setup in Definition 1. If $N_n \geq k$, we denote the collection of k-tuples of distinct elements of $\mathcal{P}_n$ by
\begin{align*}
\mathcal P_{n,\neq}^k \,{:\!=}\, \bigl\{ (X_{i_1},\ldots,X_{i_k}) \,{:}\, 1 \leq i_1,\ldots,i_k \leq N_n, \ i_j \neq i_l \ \text{for } j \neq l \bigr\}. \tag{9}
\end{align*}
Set $\mathcal P_{n,\neq}^k = \emptyset$ if $N_n < k$. Define the annulus $D_\gamma(R_n) \,{:\!=}\, B(0,R_n){\setminus}\mathring{B}(0, (1-\gamma)R_n)$ for $0 < \gamma \leq 1$, where $\mathring{A}$ denotes the interior of a set A. Constructing the hyperbolic random geometric graph on $\mathcal{P}_n \cap D_\gamma(R_n)$ as in Definition 1, we denote it by $HG_n^{(\gamma)}(R_n ;\ \alpha, \zeta)$. For $k\ge 2$, let $\Gamma_k$ be a tree with nodes [k] and edge set E. Our interest lies in the sub-tree counts $S_n^{(\gamma)} \,{:\!=}\, {\mathcal{C}}\bigl(\Gamma_k,HG^{(\gamma)}_n(R_n;\ \alpha, \zeta)\bigr)$ for $\gamma \in (0,1]$. Recalling that $T_i = R_n-d_\zeta(0,X_i)$ and using (8), one can represent $S_n^{(\gamma)}$ as
\begin{align*}
S_n^{(\gamma)} = \sum_{(X_{i_1},\ldots,X_{i_k}) \in \mathcal P_{n,\neq}^k} \, \prod_{(j,l) \in E} \textbf{1} \bigl\{ 0 < d_\zeta(X_{i_j},X_{i_l}) \leq R_n \bigr\} \, \prod_{j=1}^{k} \textbf{1} \bigl\{ T_{i_j} \leq \gamma R_n \bigr\}. \tag{10}
\end{align*}
In particular, we write $S_n \,{:\!=}\, S_n^{(1)}$ . Then $S_n = {\mathcal{C}}\bigl(\Gamma_k,HG_n(R_n;\ \alpha, \zeta)\bigr)$ . Our first result gives the asymptotic growth rate of $\mathbb{E}(S_n^{(\gamma)})$ for $\gamma \in (0,1]$ .
Theorem 1. Let $\Gamma_k$ be a tree on k vertices ( $k \geq 2$ ) with degree sequence $d_1,\ldots,d_k$ , and let $S_n^{(\gamma)}$ be the sub-tree counts in (10). For $ \gamma \in (0,1) \setminus \{\frac{1}{2}\}$ , we have
where
For $\gamma = 1/2$ , we have
Furthermore, let $d_{(1)} \leq d_{(2)} \leq \ldots \leq d_{(k)}$ be the degree sequence of $\Gamma_k$ arranged in ascending order. If $2\alpha /\zeta > d_{(k)}$ , then, for all $\gamma \in (0,1]$ , we have
Remark 1. Many of the previous studies cited in Section 1.2 set either $\alpha$ or $\zeta$ to be 1, because in their setup, the asymptotics of the hyperbolic random geometric graph is completely determined by the ratio $\alpha /\zeta$. The crucial assumption in these studies is
\begin{align*}
R_n = 2 \log (n/\nu) \tag{14}
\end{align*}
for some positive constant $\nu$ . In the current study, however, we do not put any stringent assumptions on $R_n$ as in (14). Consequently, as seen in (11)–(13), $\zeta$ contributes, as an individual parameter, to the asymptotics of sub-tree counts via the term $e^{-\zeta (d-1) (k-1) R_n/2}$ .
Theorem 1 indicates that the asymptotics of $\mathbb{E} \big( S_n^{(\gamma)} \big)$ are crucially determined by the underlying curvatures $-\zeta^2$ and $-\alpha^2$ . To make the implication of the theorem more transparent, we fix $\zeta$ and think of $\alpha$ as a parameter. If $\alpha >0$ is sufficiently large, i.e., the space $B_d^{(\alpha)}$ is sufficiently hyperbolic, sub-trees are asymptotically dominated by the contributions near the boundary of $B(0,R_n)$ . More specifically, if $2\alpha /\zeta > d_{(k)}$ , then $a_n^{(\gamma)} (d_i)$ , $i=1,\dots,k$ , all converge to a positive constant, and thus the growth rate of $\mathbb{E} \big( S_n^{(\gamma)} \big)$ does not depend on $\gamma$ , implying that the spatial distribution of sub-trees is completely determined by those near the boundary of $B(0,R_n)$ . In fact, for each $\gamma \in (0,1)$ , the growth rate of $\mathbb{E} \big( S_n^{(\gamma)} \big)$ coincides with that of $\mathbb{E}( S_n )$ .
On the other hand, if $\alpha$ becomes smaller, i.e., the space $B_d^{(\alpha)}$ becomes flatter, then the spatial distribution of sub-trees begins to be affected by those scattered away from the boundary of $B(0,R_n)$ . For example, if $d_{(k-1)} < 2\alpha /\zeta < d_{(k)} $ , we see that as $n \to \infty$ ,
while $a_n^{(\gamma)} (d_{(i)})$ , $i=1,\dots,k-1$ , all tend to a positive constant. In this case, the growth rate of $\mathbb{E}\big( S_n^{(\gamma)} \big)$ is no longer independent of $\gamma$ , and the growth rate becomes faster as $\gamma \nearrow 1$ , i.e., as the inner radius of the corresponding annulus shrinks. Moreover, if $\alpha$ becomes even smaller, so that $d_{(k-2)}< 2\alpha/\zeta < d_{(k-1)}$ , then $a_n^{(\gamma)} (d_{(k-1)})$ also asymptotically contributes to the growth rate of $\mathbb{E} \big( S_n^{(\gamma)} \big)$ . Finally, if $0 < 2\alpha /\zeta < d_{(1)}$ , then all of the terms $a_n^{(\gamma)} (d_i)$ contribute to the growth rate of $\mathbb{E} \big( S_n^{(\gamma)} \big)$ .
The corollary below shows that if the underlying tree $\Gamma_k$ is a k-star, so that $d_{(k)} = k-1$ and $d_{(1)} = \cdots = d_{(k-1)} = 1$, then even more can be said about the asymptotics of $\log \mathbb{E}( S_n)$.
Corollary 1. Let $\Gamma_k$ be a k-star with $k\ge2$. Moreover, assume that $R_n$ satisfies
\begin{align*}
\frac{R_n}{\log n} \to c \in [0,\infty], \qquad n\to\infty
\end{align*}
(note that $c=0$ or $c=\infty$ is possible). Let $a \vee b = \max \{a,b \}$ for $a,b\in {\mathbb R}$ .
(i) If $2\alpha /\zeta > k-1$, then
\begin{align}
\mathbb{E} ( S_n) \sim \bigg( \frac{2^{d-1}}{(d-1)\kappa_{d-2}}\bigg)^{k-1} & \alpha^k \, \bigl( \alpha -\zeta(k-1)/2\bigr)^{-1} \nonumber\\[3pt]
&\times \bigl( \alpha - \zeta/2 \bigr)^{-(k-1)} n^k e^{- \zeta (d-1)(k-1)R_n/2}, \ \ \ n\to\infty. \tag{15}
\end{align}

(ii) If $1 < 2\alpha/\zeta \leq k-1$, then
\begin{align*}\frac{\log \mathbb{E}(S_n )}{R_n \vee \log n} \to k(c\vee 1)^{-1} - \alpha(d-1 ) \big(1\vee c^{-1}\big)^{-1}, \ \ \ n\to\infty.\end{align*}

(iii) If $0 < 2\alpha /\zeta \leq 1$, then
\begin{align*}\frac{\log \mathbb{E}(S_n )}{R_n\vee\log n} \to k(c\vee 1)^{-1}-(d-1)\bigl( \alpha k- \zeta(k-1)/2 \bigr) \big(1\vee c^{-1}\big)^{-1}, \ \ \ n\to\infty.\end{align*}
Our ultimate goal is to establish the CLT for the sub-tree counts $S_n^{(\gamma)}$ for $0 <\gamma \leq 1$ . Before describing it, however, it is important to investigate the variance asymptotics. The theorem below provides an asymptotic lower bound for $\mathbb{V}ar(S_n^{(\gamma)})$ up to a constant factor. Similarly to Theorem 1, if $\alpha /\zeta > d_{(k)}$ , the lower bound of $\mathbb{V}ar(S_n^{(\gamma)})$ is independent of $\gamma$ , whereas it depends on $\gamma$ when $\alpha /\zeta \leq d_{(k)}$ . Furthermore, if $\alpha /\zeta > d_{(k)}$ , we can obtain the exact growth rate of $\mathbb{V}ar(S_n)$ . For two sequences $(a_n)$ and $(b_n)$ , we write $a_n=\Omega(b_n)$ if there exists a positive constant C such that $a_n/b_n\ge C$ for all $n\ge1$ . Furthermore, $C^*$ denotes a generic positive constant, which may vary between lines and does not depend on n.
Theorem 2. Let $\Gamma_k$ be a tree on k vertices ( $k \geq 2$ ) with degree sequence $d_1,\ldots,d_k$ , and let $S_n^{(\gamma)}$ be the sub-tree counts in (10). For $0 < \gamma <1$ ,
Suppose further that $\alpha /\zeta > d_{(k)}$ ; then, for all $\gamma \in (0,1]$ , we have
The main point of (17) is that the growth rate of $\mathbb{V}ar(S_n)$ is determined by how rapidly $R_n$ grows to infinity. To see this in more detail, we consider a special case for which $R_n = c \log n$ for some $c \in (0,\infty)$ . Then we can observe the following phase transitions:
Moreover, Theorems 1 and 2 imply that, in the case of $ \alpha/\zeta < d_{(k)} < 2\alpha /\zeta$ , $\mathbb{V}ar(S_n^{(\gamma)})$ grows super-linearly, while the expectation exhibits only linear growth. A similar phenomenon has been observed for isolated points in [Reference Fountoulakis and Yukich23, Theorem 1.1]. This is indicative of a power-law behaviour in the sub-tree counts.
2.3. Central limit theorem for sub-tree counts
Having derived variance bounds, we proceed to prove the CLT for $S_n^{(\gamma)}$. Before stating the normal approximation result, we need to define the two metrics to be used: the Wasserstein distance $d_W$ and the Kolmogorov distance $d_K$. Let $Y_1,Y_2$ be two random variables, and let Lip(1) be the set of Lipschitz functions $h \,{:}\, {\mathbb R} \to {\mathbb R}$ with Lipschitz constant at most 1. Then we define the two metrics as follows:
\begin{align*}
d_W(Y_1,Y_2) &\,{:\!=}\, \sup_{h \in \text{Lip}(1)} \bigl| \mathbb{E} \bigl( h(Y_1) \bigr) - \mathbb{E} \bigl( h(Y_2) \bigr) \bigr|, \\
d_K(Y_1,Y_2) &\,{:\!=}\, \sup_{t \in {\mathbb R}} \bigl| \mathbb{P} ( Y_1 \leq t ) - \mathbb{P} ( Y_2 \leq t ) \bigr|.
\end{align*}
Although we have defined $d_W, d_K$ as distances between two random variables, they are actually distances between two probability distributions. Let N denote the standard normal random variable, and let $\Rightarrow$ denote weak convergence in ${\mathbb R}$ . In our proof, we derive more detailed, albeit complicated, bounds that indicate the scope for improvement.
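As an illustrative aside (not from the original), both metrics are easy to compute for distributions supported on finitely many reals: $d_K$ is the sup-distance between the CDFs, and by Kantorovich–Rubinstein duality, $d_W$ equals $\int_{\mathbb R} |F_1(t)-F_2(t)|\,dt$ for real-valued random variables. A sketch with hypothetical helper names:

```python
def cdf_on_grid(pmf, grid):
    """CDF values of a discrete distribution {point: prob} on a sorted grid."""
    total, out, pts, idx = 0.0, [], sorted(pmf), 0
    for t in grid:
        while idx < len(pts) and pts[idx] <= t:
            total += pmf[pts[idx]]
            idx += 1
        out.append(total)
    return out

def d_K_and_d_W(pmf1, pmf2):
    """Kolmogorov distance sup_t |F1 - F2| and Wasserstein distance
    int |F1 - F2| dt between two discrete distributions on the reals."""
    grid = sorted(set(pmf1) | set(pmf2))
    F1, F2 = cdf_on_grid(pmf1, grid), cdf_on_grid(pmf2, grid)
    dK = max(abs(a - b) for a, b in zip(F1, F2))
    # |F1 - F2| is constant between consecutive support points.
    dW = sum(abs(F1[i] - F2[i]) * (grid[i + 1] - grid[i])
             for i in range(len(grid) - 1))
    return dK, dW

# Point mass at 0 versus point mass at 2: d_K = 1, d_W = 2.
assert d_K_and_d_W({0.0: 1.0}, {2.0: 1.0}) == (1.0, 2.0)
```

The example shows how the two metrics differ: $d_K$ saturates at 1, while $d_W$ also records how far mass must be transported.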
Theorem 3. Let $\Gamma_k$ be a tree on k vertices ($k \geq 2$) with degree sequence $d_1,\ldots,d_k$, and let $S_n^{(\gamma)}$ be the sub-tree counts in (10). Assume further that $R_n$ satisfies
\begin{align*}
n e^{-\zeta (d-1) R_n/2} \to c \in (0,\infty], \qquad n\to\infty. \tag{18}
\end{align*}
For every $0 < a < 1/2$ , there exists $0 < \gamma_0 < 1/2$ such that for all $0 < \gamma < \gamma_0$ , we have
Furthermore, if $\alpha /\zeta > d_{(k)}$ , then
Here, we claim that for a small enough $\gamma$, the CLT holds for $S_n^{(\gamma)}$, and if $\alpha / \zeta > d_{(k)}$, the CLT holds for $S_n$ as well. Heuristically, a larger $\alpha / \zeta$ means that the space $B_d^{(\alpha)}$ is more hyperbolic relative to $B_d^{(\zeta)}$ and hence places more points near the boundary. In that case, $S_n$ is well approximated by $S_n^{(\gamma)}$. Note that the CLT for $S_n^{(\gamma)}$ can hold even when the variance grows super-linearly in n. It is still unclear whether or not the CLT holds for $S_n$ under super-linear growth of the variance. In [Reference Fountoulakis and Yukich23, Theorem 1.3], it is shown that the CLT for ‘isolated points’ cannot hold under super-linear growth of the variance. Finally, Remark 4 at the end of Section 3.3.3 gives a more precise quantification of the bounds in (19).
Remark 2. If the constant c in (18) is finite and $2\alpha /\zeta>d_{(k)}$ , we see from (13) that $\mathbb{E}(S_n)$ is asymptotically equal to n, up to constant factors. This implies that $S_n$ behaves similarly to the subgraph counts in the thermodynamic regime of the Euclidean setting (more detailed arguments on this point are given in Section 2.5). If $c=\infty$ in (18), $R_n$ grows more slowly, so that the spatial distribution of $\Gamma_k$ becomes denser. For the sparse phase asymptotics, the first author’s ongoing work studies the case in which $R_n$ diverges faster than (18), so that $\mathbb{E}( S_n )$ is asymptotically equal to a positive constant. In this case, the tree-like structure rarely occurs, and $S_n$ will be approximated by a Poisson random variable.
2.4. Special case and comparison with previous studies
The objective of this section is to restate our results in the special case
\begin{align*}
R_n = \frac{2}{\zeta (d-1)} \log (n/\nu) \tag{21}
\end{align*}
for some positive constant $\nu \in (0,\infty)$. By doing so, we can make an easy comparison with previous studies. As mentioned before, most earlier works (e.g. [Reference Bode, Fountoulakis and Müller10, Reference Fountoulakis and Müller22, Reference Kiwi and Mitsche29, Reference Bode, Fountoulakis and Müller11, Reference Bläsius, Friedrich and Krohmer7, Reference Candellero and Fountoulakis14, Reference Gugelmann, Panagiotou and Peter25, Reference Candellero and Fountoulakis13, Reference Kiwi and Mitsche27, Reference Müller and Staps36, Reference Kiwi and Mitsche28, Reference Fountoulakis and Yukich23, Reference Abdullah, Bode and Fountoulakis1, Reference Fountoulakis19, Reference Fountoulakis20]) are concerned only with the case
\begin{align*}
R_n = 2 \log (n/\nu) \tag{22}
\end{align*}
or its variants; hence, (21) serves as a higher-dimensional version of (22). Assuming (21) and $2\alpha / \zeta \notin {\mathbb N}$ , it is easy to see that as $n \to \infty$ ,
where $(a)_+ = a$ if $a>0$ and $(a)_+=0$ otherwise.
Corollary 2. Let $R_n = 2[\zeta (d-1)]^{-1} \log (n/\nu)$ , let $\Gamma_k$ be a tree on k vertices ( $k\geq 2$ ) with degree sequence $d_1,\dots,d_k$ , and let $S_n^{(\gamma)}$ be a sub-tree count as in (10). Suppose that $2\alpha /\zeta$ is not an integer. For $\gamma \in (0,1) \setminus \{\frac{1}{2}\}$ , we have, as $n\to\infty$ ,
and for $\gamma = 1/2$ , we have
Furthermore, as for the variance asymptotics, we have for $\gamma \in (0,1)$
If $2\alpha /\zeta > d_{(k)}$ , we have for $\gamma \in (0,1]$
If $\alpha / \zeta > d_{(k)}$ , we have for $\gamma \in (0,1]$
To avoid repetition, we do not state the corresponding CLT. However, it holds under the assumptions of Corollary 2. Note that (24) is a partial generalization of [Reference Candellero and Fountoulakis14, Claim 5.2] to higher dimensions.
Now we compare the results in Corollary 2 with those in the existing literature. If we choose $k =2$ in (25), then $S_n$ represents the number of edges with $d_{(2)}=1$, and $S_n/(2n)$ is, up to a constant factor, the empirical average degree. Then (25) asserts that $\mathbb{E}(S_n/2n) \to C^*$ if $\alpha /\zeta > 1/2$. The convergence of the expected typical degree to a constant indicates that this regime is, in nature, the thermodynamic regime. Furthermore, when $1/2 < \alpha / \zeta < 1$, the expected average degree converges to a constant, but its variance diverges. This is consistent with the power-law behaviour for degree distributions with exponent $2\alpha /\zeta +1$, which was predicted in [Reference Krioukov30]. Such power-law behaviour for degree distributions was proven in [Reference Gugelmann, Panagiotou and Peter25, Reference Fountoulakis19] for $d = 2$. For $2\alpha /\zeta \leq 1$, the expected average degree grows to infinity, which is again consistent with the conjecture that the degree distribution has a power-law behaviour with exponent 2 (see [Reference Krioukov30]).
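As a side note (our own back-of-the-envelope check, not part of the original argument), the moment thresholds for a power-law degree distribution with exponent $\tau = 2\alpha/\zeta + 1$, i.e. $\mathbb{P}(\mathrm{deg} = m) \asymp m^{-\tau}$, line up exactly with the phase boundaries above:

```latex
\begin{align*}
\sum_{m\ge 1} m\, m^{-\tau} < \infty
  &\iff \tau > 2 \iff \alpha/\zeta > 1/2
  && \text{(finite mean degree)},\\
\sum_{m\ge 1} m^2\, m^{-\tau} < \infty
  &\iff \tau > 3 \iff \alpha/\zeta > 1
  && \text{(finite degree variance)}.
\end{align*}
```

Thus a finite mean with infinite variance corresponds precisely to $1/2 < \alpha/\zeta < 1$.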
In Table 1, we summarize some of the results in the existing literature and those in this paper in the case $d = 2$ , $\zeta = 1$ , $R_n = 2 \log (n / \nu)$ , $\nu \in (0,\infty)$ . As seen in Table 1, our results demonstrate various phase transitions. For example, for $1 < \alpha < d_{(k)}/2$ , the expected sub-tree counts grow super-linearly in n, whereas the expected number of k-cliques is linear, and there is no ‘giant component’. This justifies our observation in Section 1.2 that hyperbolic random geometric graphs contain many tree-like hierarchical structures, as compared to their Euclidean counterparts.
2.5. Comparison to subgraph counts of Euclidean random geometric graphs
In this section, we sketch asymptotic results for subgraph counts of analogous random geometric graph models in the Euclidean space. We claim that the behaviour of the subgraph counts of the Euclidean analogues is significantly different from that of the counts in the previous sections. For example, in contrast to Theorem 1, the growth rates for the subgraph counts in the Euclidean case do not depend on the degree sequence of the subgraph under consideration.
For simplicity, we restrict ourselves to the case $\alpha = \zeta = 1$ and $R_n = 2(d-1)^{-1}\log(n/\nu)$ for some $\nu > 0$ as in Section 2.4. Then, by Corollary 2, we have that $\mathbb{E}(E_n) = \Theta(n)$, where $E_n$ denotes the number of edges in $HG_n(R_n;\ 1,1)$. We now consider two possible Euclidean analogues in the setting of Corollary 2. One of them is defined by following the construction of $HG_n(R_n;\ 1,1)$, and the other is defined to ensure that the expected degree is of the same order as in $HG_n(R_n;\ 1,1)$. It is debatable which is the right analogue, and hence we make comparisons to both. See Figure 3 for simulations of these Euclidean graphs.
The two analogues are constructed as follows:
1. Consider Poisson(n) points distributed uniformly in a sequence of growing Euclidean balls of radius $r_n\to \infty$ , and connect any two points within a distance of $r_n$ (this is the dense regime). We call this graph $EG_{1,n}$ . It has been defined by mimicking the construction of $HG_n(R_n ;\ 1,1)$ .
2. Consider Poisson(n) points distributed uniformly in a sequence of Euclidean balls of radius $r_n$ , and connect points within a distance $s_n$ such that the expected number of edges grows linearly in n (this is the thermodynamic regime). We call this graph $EG_{2,n}$ . The expected degree of this graph is of the same order as $HG_n(R_n ;\ 1,1)$ .
We now give a sketch of the calculations for the subgraph counts in the above two models, referring the reader to [Reference Penrose38, Chapter 3] for details. Denote by ${\mathcal{X}}_n$ the collection of Poisson(n) points distributed uniformly in $B_{r_n}(0) \subset {\mathbb R}^d$, where $B_{r_n}(0)$ is the d-dimensional ball of radius $r_n$ centred at the origin. Then $EG_n(r_n,s_n)$ denotes the graph with vertex set ${\mathcal{X}}_n$ and edges between $X_i,X_j \in {\mathcal{X}}_n$ such that $|X_i - X_j| \leq s_n$, where $|\cdot|$ is the Euclidean metric. In this notation, $EG_{1,n} = EG_n(r_n,r_n)$ and $EG_{2,n} = EG_n(r_n,s_n)$, where $s_n$ is chosen such that the expected number of edges grows linearly in n. Let $\Gamma$ be a connected graph on k vertices, and let $J_{i,n}(\Gamma)$ denote the number of copies of $\Gamma$ in $EG_{i,n}$, $i=1,2$. By the Mecke formula, we have that
Since the order of $\mathbb{E}(J_{1,n}(\Gamma))$ is the same as that of the complete subgraph on ${\mathcal{X}}_n$ , we call it the dense regime. We observe that the choice of $r_n$ and the degree sequence of $\Gamma$ do not contribute to the growth rate of $\mathbb{E}(J_{1,n}(\Gamma))$ . This shows a striking contrast to the asymptotics for hyperbolic random geometric graphs in Corollary 2.
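For reference, the version of the multivariate Mecke formula used in these computations can be stated as follows (a standard fact; see [Reference Last and Penrose33]). For a Poisson process $\mathcal{P}$ with intensity measure $\lambda$ on a space ${\textbf{X}}$ and a deterministic measurable function $f \geq 0$,

```latex
\mathbb{E}\Biggl[ \sum_{(x_1,\ldots,x_k) \in \mathcal{P}^k_{\neq}} f(x_1,\ldots,x_k) \Biggr]
  = \int_{{\textbf{X}}^k} f(x_1,\ldots,x_k)\, \lambda^{\otimes k}\bigl(d(x_1,\ldots,x_k)\bigr),
```

where $\mathcal{P}^k_{\neq}$ denotes the set of $k$-tuples of distinct points of $\mathcal{P}$. Taking $f$ to be the indicator that the $k$ points form a copy of $\Gamma$ yields the expectations considered here.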
As in the above case, assuming $s_n = o(r_n)$ , we can also derive that $\mathbb{E}(J_{2,n}(\Gamma)) = \Theta\bigl(n^k (\frac{s_n}{r_n})^{d(k-1)}\bigr)$ . In the special case of $k=2$ and $s_n= n^{-1/d}r_n$ , this implies that the expected number of edges is asymptotically equal to n, up to a constant factor. The resulting regime is called the thermodynamic regime (see [Reference Penrose38, Chapter 3]), since the expected average degree (or empirical count of neighbours) is asymptotically constant. In contrast to the hyperbolic random geometric graph, the above calculation indicates that the asymptotics of $\mathbb{E}(J_{2,n}(\Gamma))$ is again independent of the degree sequence of $\Gamma$ and the choice of $r_n$ . Finally, we wish to emphasize that the broad message remains unchanged even for the second-order results; see [Reference Penrose and Yukich39, Reference Bobrowski and Mukherjee8] for more detailed computations.
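As a quick sanity check on the stated rate (our own arithmetic, using only the displayed $\Theta$-estimate), specializing to $k=2$ and $s_n = n^{-1/d} r_n$ gives

```latex
\mathbb{E}\bigl(J_{2,n}(\Gamma)\bigr)
  = \Theta\Bigl( n^{2} \bigl(s_n/r_n\bigr)^{d(2-1)} \Bigr)
  = \Theta\bigl( n^{2} \cdot n^{-1} \bigr)
  = \Theta(n),
```

so the expected number of edges is indeed linear in $n$, as required in the thermodynamic regime.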
3. Proofs
We first prove the basic lemma on an approximation of the hyperbolic probability density. Subsequently, Section 3.1 presents a few more lemmas concerned with hyperbolic distance. Utilizing these lemmas, we prove expectation and variance results for the sub-tree counts in Section 3.2. Finally, Section 3.3 establishes the required CLT. Throughout this section, $C^*$ denotes a generic positive constant, which may vary between lines and does not depend on n.
Proof of Lemma 1. We see that
Applying the binomial expansion
we find that $\int_0^{R_n} e^{\alpha (d-1)s}ds$ is asymptotically the leading term in the denominator in (26). Using the obvious inequality $ \bigl( 1-e^{-2\alpha (R_n-t)} \bigr)^{d-1} \leq 1$ , we complete the proof of (i).
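For the reader's convenience, and assuming the denominator of (26) is (a constant multiple of) $\int_0^{R_n} \sinh^{d-1}(\alpha s)\,ds$, the binomial expansion referred to above reads

```latex
\sinh^{d-1}(\alpha s)
  = \frac{e^{\alpha(d-1)s}}{2^{d-1}} \bigl( 1-e^{-2\alpha s} \bigr)^{d-1}
  = \frac{e^{\alpha(d-1)s}}{2^{d-1}} \sum_{j=0}^{d-1} \binom{d-1}{j} (-1)^j e^{-2\alpha j s},
```

and upon integration over $[0,R_n]$, every term with $j \geq 1$ is of smaller order than the $j=0$ term, which is $\int_0^{R_n} e^{\alpha(d-1)s}\,ds$ up to the constant $2^{-(d-1)}$.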
For the proof of (ii), we need to handle the numerator of (26) as well. By another application of the binomial expansion, we have that
uniformly for $0 \leq t \leq \lambda R_n$ , where $0 < \lambda <1$ .
3.1. Lemmas on hyperbolic distance
We first present a crucial lemma that explains why we consider (not necessarily induced) sub-trees, which are counted as injective homomorphisms in the hyperbolic random geometric graph. In particular, the conditional independence proven in Lemma 2 below does not hold for more general subgraph counts. The asymptotics of general subgraph counts is highly non-trivial, not only for ‘induced’ sub-tree counts but even for seemingly simple triangle counts. For the latter case, [Reference Candellero and Fountoulakis14] introduced the notion of a global clustering coefficient, defined as three times the ratio of the number of triangles to the number of paths of length two.
Lemma 2. Let $\Gamma_k$ be a tree on k vertices with edge set E. Let $X_1,\dots, X_k$ be i.i.d. random points with common density $\rho_{n,\alpha} \otimes \pi$ . Define $T_i=R_n-d_\zeta(0,X_i)$ , $i=1,\dots,k$ , and write ${\textbf{T}} = (T_1,\dots,T_k)$ . Then it holds that
Proof. Let $\Theta_i$ be the angular part of $X_i$ , $i=1,\dots,k$ . For the proof, it suffices to show that
Let us denote by $\Theta_{12}$ the relative angle between $\Theta_1$ and $\Theta_2$ . Then, because of the uniformity of the angular density of $\Theta_i$ in (5), we can derive that the density $\pi_{rel}$ of $\Theta_{12}$ , which is also the same as the conditional density of $\Theta_{12} | \Theta_1$ , is given by
Note that $d_\zeta(X_1,X_2)$ depends only on $T_1,T_2,\Theta_{12}$ , and so we obtain that
Suppose that $\Gamma_k$ is of depth 1 rooted at vertex 1. In this case, since $X_2,\dots,X_k$ are i.i.d.,
Suppose, for induction, that (27) holds for any tree $\Gamma_k$ rooted at vertex 1 with depth $M \geq 1$ . Assume subsequently that $\Gamma_k$ is rooted at vertex 1 with depth $M+1$ . Let $2,\dots,m$ be the vertices connected to 1, and $S_2,\dots,S_m$ the corresponding trees rooted at $2,\dots,m$ . Let $E(S_\ell)$ be the edge set of $S_\ell$ . Then, from the disjointness of the trees and independence of the points $X_i$ , we have
Now, an application of conditional expectations for each $\ell=2,\dots,m$ gives us the desired result as follows:
where the induction hypothesis is used for the second equality, and (29) for the third equality.
We now mention some lemmas that help us to approximate the Poincaré metric. Given $u_1,u_2 \in B(0,R_n)$ , let $\theta_{12} \in [0,\pi]$ be the relative angle between the two vectors $\overrightarrow {Ou_1}$ and $\overrightarrow {Ou_2}$ , where O denotes the origin of $B_d^{(\zeta)}$ . We also define $t_i = R_n-d_\zeta(0,u_i)$ , $i=1,2$ .
Lemma 3. Set $\hat \theta_{12} = \bigl(e^{-2 \zeta(R_n-t_1)}+ e^{-2 \zeta(R_n-t_2)} \bigr)^{1/2}$ . If $\hat \theta_{12} / \theta_{12}$ vanishes as $n\to\infty$ , then
uniformly for all $u_1,u_2$ with $t_1+ t_2 \leq R_n-\omega_n$, where $\omega_n=\log \log R_n$.
Proof. Fix a great circle of $B(0,R_n)$ spanned by $\overrightarrow{Ou_1}$ and $\overrightarrow{Ou_2}$ . Then the hyperbolic law of cosines yields
Since this great circle is a two-dimensional subspace of $B(0,R_n)$ , the rest of the argument is exactly the same as that for Lemma 2.3 in [Reference Fountoulakis19].
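For completeness, in a hyperbolic space of constant curvature $-\zeta^2$, the law of cosines invoked above takes the following standard form (writing $r_i = d_\zeta(0,u_i) = R_n - t_i$ for the radial distances):

```latex
\cosh\bigl( \zeta\, d_\zeta(u_1,u_2) \bigr)
  = \cosh(\zeta r_1)\cosh(\zeta r_2)
    - \sinh(\zeta r_1)\sinh(\zeta r_2)\cos\theta_{12}.
```

For $\zeta = 1$ this reduces to the familiar law of cosines on the standard hyperbolic plane.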
Lemma 4. In terms of the hyperbolic polar coordinates, let $X_1 = (t_1,\Theta_1)$ , $X_2 = (t_2,\Theta_2)$ , where $\Theta_1,\Theta_2$ are i.i.d. random vectors on $C_d$ with density $\pi$ , and $t_1,t_2$ are deterministic, representing the hyperbolic distance from the boundary. With the setup in Lemma 3,
uniformly on $\bigl\{(t_1,t_2)\,{:}\, t_1+t_2 \leq R_n-\omega_n \bigr\}$ , where $\kappa_{d-2} = \int_0^\pi \sin^{d-2} \theta\, d\theta$ and $\omega_n=\log \log R_n$ .
Proof. We again denote the relative angle between $\Theta_1$ and $\Theta_2$ by $\Theta_{12}$ . Recall that $\Theta_{12} \stackrel{d}{=} \Theta_{12} | \Theta_1$ . From the hyperbolic law of cosines, we know that the distance between $X_1$ and $X_2$ is determined by $t_1,t_2$ and their relative angle $\Theta_{12}$ . Since $t_1,t_2$ are fixed, by (28) we can write
where we write $u_i=(t_i,\theta_i)$ , $i=1,2$ , and $\theta_{12}$ is the relative angle between $u_1$ and $u_2$ . Now, we shall approximate the above integral. Let $A_{12} = e^{\zeta (R_n-t_1-t_2)/2}$ . Claim 2.5 in [Reference Fountoulakis19] proves that $A_{12}^{-1} / (\omega_n \hat \theta_{12}) \to \infty$ as $n\to\infty$ . By virtue of Lemma 3, along with $A_{12}^{-1} \to 0$ as $n\to\infty$ , we have, on the set $\{ (t_1,t_2): t_1+t_2 \leq R_n-\omega_n \}$ ,
Therefore,
as required.
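As a reminder (a standard fact about uniform directions, consistent with the normalizing constant $\kappa_{d-2}$ in the statement of Lemma 4), the relative angle of two independent directions distributed uniformly on the sphere has density

```latex
\pi_{rel}(\theta) = \frac{\sin^{d-2}\theta}{\kappa_{d-2}},
  \qquad \theta \in [0,\pi],
  \qquad \kappa_{d-2} = \int_0^\pi \sin^{d-2}\theta \, d\theta.
```

In particular, for $d=2$ the relative angle is uniform on $[0,\pi]$.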
3.2. Proofs of the expectation and variance results
In what follows, we calculate the moments of $S_n^{(\gamma)}$ . We first introduce some notation to save space. For $u_i \in B(0,R_n)$ , $i=1,\dots,k$ , we write
and
where $t_i=R_n-d_\zeta(0,u_i)$. Note that $g_{n,\gamma}$ represents the product of multiple indicator functions on the right-hand side of (10), while $h_{n,\gamma}$ is a similar indicator for $S_n - S_n^{(\gamma)}$.
Proof of Theorem 1. Let $X_1,\ldots,X_k$ be i.i.d. random points with density $\rho_{n,\alpha} \otimes \pi$ , and set $T_i = R_n - d_\zeta(0,X_i)$ as before. By an application of the multivariate Mecke formula for Poisson processes (see, e.g., [Reference Last and Penrose33, Chapter 4]),
where $\omega_n=\log \log R_n$ . From now on, the argument aims to show that $A_n$ coincides asymptotically with the right-hand side of (11). It follows from the conditioning on the radial distances of $X_1,\dots,X_k$ from the boundary and Lemma 2 that
where ${\textbf{T}}=(T_1,\dots,T_k)$ . Setting ${\textbf{t}} = (t_1,\dots,t_k)$ , we may write
where $\bar \rho_{n,\alpha} ({\textbf{t}})$ denotes the product of densities in (7):
Applying Lemma 4 on the set $\bigl\{ t_i+t_j \leq R_n -\omega_n, \ (i,j)\in E \bigr\}$ , we have, as $n\to\infty$ ,
Furthermore, it follows from Lemma 1(ii) that
uniformly for $t_i \leq \gamma R_n$ , $i=1,\dots,k$ . Substituting these results into (33), we obtain
We now show that $B_n = o(A_n)$ as $n\to\infty$ , unless $\gamma =1/2$ . We check only the case in which there is an edge joining $X_1$ and $X_2$ such that $T_1+T_2 > R_n-\omega_n$ , while all the other edges satisfy $T_i+T_j \leq R_n-\omega_n$ . However, the argument below can be applied in an obvious way even when multiple edges satisfy $T_i + T_j >R_n-\omega_n$ . Specifically, we shall verify
Proceeding in the same way as in the computation for $A_n$ , while applying the obvious bound
we see that
By comparing this upper bound with the right-hand side of (11), we see that it suffices to show that
If $0<\gamma<1/2$ , then $D_n$ is identically 0; hence, we may restrict ourselves to the case $1/2 < \gamma <1$ . Without loss of generality, we may assume that $d_1 \geq d_2$ and rewrite $D_n$ as
If $0 < 2 \alpha/\zeta< d_2-1$ , then
On the other hand, let $2 \alpha /\zeta> d_2-1$ . Then
The same result can also be obtained in the boundary case $2\alpha /\zeta= d_2-1$ , and thus we have proven (36). Finally, if $\gamma=1/2$ , the above calculations imply
and thus (12) follows.
We now proceed to proving (13). For $0<\gamma<1$ , define $U_n^{(\gamma)} = S_n - S_n^{(\gamma)}$ . Suppose $2 \alpha/\zeta > d_{(k)}$ . Appealing to (11), we obtain
because, for every $i=1,\dots,k$ , $a_n^{(\gamma)} (d_i)$ converges to $(d-1)^{-1}\bigl( \alpha - \zeta d_i/2 \bigr)^{-1}$ . Therefore, to complete our proof, it suffices to verify that
Recalling the notation (31), we apply the Mecke formula to obtain that
We can calculate $A_n^\prime$ in almost the same manner as $A_n$ ; the only difference is that when handling $\bar \rho_{n,\alpha} ({\textbf{t}})$ , we apply the inequality in Lemma 1(i) instead of Lemma 1(ii). We then obtain
Here, the second equality follows from the assumption that $d_i - 2\alpha /\zeta < 0$ for all $i=1,\dots,k$ . Furthermore, we can show that $B_n^{\prime} = o \bigl( n^k e^{-\zeta(d-1)(k-1)R_n/2} \bigr)$ ; the proof of this is similar to that of the corresponding result for the derivation of (35) and (36), so we omit it. Thus, we have (37) as needed to complete the proof of the theorem.
Proof of Corollary 1. Define $S_n^{(\gamma)}$ as the sub-tree counts relating to a k-star with $d_{(k)} = k-1$ , $d_{(k-1)} = \cdots = d_{(1)} = 1$ :
We also define $U_n^{(\gamma)} = S_n - S_n^{(\gamma)}$ . First, (15) is a direct consequence of (13), so we shall prove only (ii) and (iii).
For the proof of (ii), we start by deriving a suitable upper bound for $\mathbb{E} \big( U_n^{(\gamma)}\big)$ . By the Mecke formula and Lemma 1(i),
Taking the logarithm on both sides, we get
On the other hand, it follows from Theorem 1 that
which implies that, as $n\to\infty$ ,
Moreover, if $\limsup_{n\to\infty} \mathbb{E}\left(U_n^{(\gamma)}\right) / \mathbb{E}\left(S_n^{(\gamma)}\right) < \infty$ ,
and if $\limsup_{n\to\infty} \mathbb{E}\left(U_n^{(\gamma)}\right) / \mathbb{E}\left(S_n^{(\gamma)}\right) = \infty$ ,
Therefore, using the obvious inequalities
and letting $\gamma \nearrow 1$ , we obtain
The proof of (iii) is very similar to that of (ii), so we omit it.
Proof of Theorem 2. We start by writing
where $g_{n,\gamma}$ is defined at (30). For $\ell = 0$ , the Mecke formula yields
Let $\Gamma^{(i,j)}_{2k-1}$ be a tree on $[2k-1]$ (meaning $|E| = 2k-2$ ) formed by taking two copies of $\Gamma_k$ and identifying the vertex of degree $d_i$ in one copy with the vertex of degree $d_j$ in the other copy. In other words, the degree sequence of $\Gamma^{(i,j)}_{2k-1}$ is $d_i + d_j, d_i, d_j$ and a pair of $d_\ell$ ’s for $\ell \in [k] \setminus \{i,j\}$ ; see Figure 4(a)–(b).
We first note that $I_1 = \sum_{i,j=1}^k {\mathcal{C}}\Big(\Gamma^{(i,j)}_{2k-1},HG^{(\gamma)}_n(R_n;\ \alpha,\zeta)\Big)$ , and so, from the identity for $\mathbb{E}(I_0)$ above, we have that
where $i^\prime = j^\prime = (k)$ , that is, $d_{i^\prime} = d_{j^\prime} = d_{(k)}$ . Therefore, applying Theorem 1 to (38), we derive that
Note that owing to (32) and (34), the assumption $\gamma \neq 1/2$ is not required for the lower bound. Similarly, by using the bound $I_k \geq {\mathcal{C}}\bigl(\Gamma_k,HG^{(\gamma)}_n(R_n;\ \alpha,\zeta)\bigr)$ and Theorem 1, we get
Finally, combining (39) and (40) establishes (16).
We now proceed to showing (17). Assume that $\alpha /\zeta > d_{(k)}$ , and for $0<\gamma <1$ , write $\mathbb{V}ar(S_n^{(\gamma)}) = \sum_{\ell = 1}^k \mathbb{E}( I_\ell )$ as in (38). To derive exact asymptotics for $\mathbb{V}ar(S_n^{(\gamma)})$ , we need to derive exact asymptotics for $\mathbb{E}( I_1)$ and $\mathbb{E}( I_k)$ and also show that $\mathbb{E}(I_{\ell}) = o\bigl(\mathbb{E}( I_1) \vee \mathbb{E}( I_k)\bigr)$ for $2 \leq \ell \leq k-1$ .
From the representation of $I_1$ before (38) and Theorem 1, we have that
However, because of the constraint $\alpha /\zeta > d_{(k)} $ , we see that
tends to a positive constant for all $i,j = 1,\dots,k$ . Thus, we conclude that
Similarly, we can verify that $\mathbb{E}(I_k) \sim C^* n^{k} e^{-\zeta (d-1) (k-1)R_n/2} \ \ \text{as } n\to\infty$ .
Next we investigate the rate of $\mathbb{E}(I_\ell)$ for $\ell =2,\ldots,k-1$ . Similarly to $I_1$ , we can express $I_\ell$ as a sum of ${\mathcal{C}}\bigl(H,HG^{(\gamma)}_n(R_n;\ \alpha,\zeta)\bigr)$ over subgraphs H on $[2k-\ell]$ formed by identifying $\ell$ vertices on two copies of $\Gamma_k$ . Choosing a spanning tree $\Gamma_H$ of H, and using the monotonicity of ${\mathcal{C}}$ , we get ${\mathcal{C}}\bigl(H,HG^{(\gamma)}_n(R_n;\ \alpha,\zeta)\bigr) \leq {\mathcal{C}}\bigl(\Gamma_H,HG^{(\gamma)}_n(R_n;\ \alpha,\zeta)\bigr)$ ; see Figure 4(c)–(d). By the choice of H and $\Gamma_H$ , the vertex degrees are necessarily smaller than $\alpha / \zeta$ . Thus, from Theorem 1, we get that for every $\Gamma_H$ ,
Now, we derive that
Note that for every $n\geq1$, $g(m) = n^m e^{-\zeta(d-1)(m-1)R_n/2} = n \bigl( n e^{-\zeta(d-1)R_n/2} \bigr)^{m-1}$ is geometric, and hence monotonic, in $m \in {\mathbb N}_+$, so either $\mathbb{E}( I_1 )$ or $\mathbb{E} ( I_k )$ determines the actual growth rate of $\mathbb{V}ar(S_n^{(\gamma)})$. Thus, we have that $\mathbb{E}(I_\ell) = o\bigl(\mathbb{E}( I_1) \vee \mathbb{E}( I_k)\bigr)$ for all $2 \leq \ell \leq k-1$, and we can conclude that
Next, let $U_n^{(\gamma)} = S_n - S_n^{(\gamma)}$ for $0 < \gamma <1$ . We can finish the proof provided that
As in the proof for $\mathbb{V}ar(S_n^{(\gamma)})$ , we write
Repeating the same argument as in the proof of (37) in Theorem 1, we obtain that for all $\ell=1,\dots,k$ ,
Thus, (41) follows. Moreover, by the Cauchy–Schwarz inequality,
from which we have
3.3. Proof of the central limit theorem
The proof relies on the normal approximation bound derived from the Malliavin–Stein method in [Reference Last, Peccati and Schulte32] (see Theorem 4 below). The Malliavin–Stein method has emerged as a crucial tool for the approximation of Gaussian, Poisson, and Rademacher functionals. An important aspect of the method is to express the terms in Stein’s equation in terms of the Malliavin operators and derive suitable estimates for the latter. The bound in [Reference Last, Peccati and Schulte32] reduces estimates involving the Malliavin operators to those expressed in terms of more tractable difference operators. For a more detailed introduction to this subject, see [Reference Last and Penrose33, Reference Peccati and Reitzner37]. In our proof, we use only the first- and second-order difference operators. Our sub-tree counts are U-statistics, and one may express these difference operators in terms of $g_{n,\gamma}$ as defined in (30). The bulk of our work (see Lemmas 5–7) will be in deducing good estimates for these difference operators. By substituting the obtained estimates and the already derived variance estimates into the second-order Poincaré inequality, we obtain the required normal approximation.
This section is organized as follows. In Section 3.3.1 we introduce the normal approximation bound, as well as simplifications for our case. Next, in Section 3.3.2 we derive necessary estimates for difference operators, which we use in Section 3.3.3 to complete the proof of the CLT.
3.3.1. Malliavin–Stein bound for Poisson functionals
Let $\mathcal{P}$ be a Poisson point process on a finite measure space X with intensity measure $\lambda(\cdot)$. For a functional F of Radon counting measures (i.e., locally finite collections of points), define the Malliavin difference operators as follows. For $x \in {\textbf{X}}$, the first-order difference operator is $D_xF \,{:\!=}\, F(\mathcal{P} \cup \{x\}) - F(\mathcal{P})$. The higher-order difference operators are defined inductively as $D^\ell_{x_1,\ldots,x_\ell} F\,{:\!=}\, D_{x_\ell}(D^{\ell-1}_{x_1,\ldots,x_{\ell-1}}F)$. We require only the first- and second-order difference operators. The latter is easily seen to be
for $x,y \in {\textbf{X}}$ . We say that $F \in dom \, D$ if
The normal approximation bounds in [Reference Last, Peccati and Schulte32, Theorems 1.1 and 1.2] are given in a compact form but involve the second moments of the product of second-order difference operators. A simplified version of [Reference Last, Peccati and Schulte32, Theorems 1.1 and 1.2] is given in [Reference Last, Peccati and Schulte32, Proposition 1.4], but it does not suffice for our purposes. Therefore, the bounds we shall use below are those obtained by combining [Reference Last, Peccati and Schulte32, Theorems 1.1 and 1.2] and [Reference Last, Peccati and Schulte32, Theorem 6.1].
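For concreteness, unwinding the inductive definition at $\ell = 2$ gives the explicit (and symmetric) form of the second-order difference operator:

```latex
D^2_{x,y}F
  = F\bigl(\mathcal{P} \cup \{x,y\}\bigr)
    - F\bigl(\mathcal{P} \cup \{x\}\bigr)
    - F\bigl(\mathcal{P} \cup \{y\}\bigr)
    + F(\mathcal{P}),
```

for distinct $x, y \in {\textbf{X}}$; indeed, $D_y(D_xF) = \bigl[F(\mathcal{P}\cup\{x,y\}) - F(\mathcal{P}\cup\{y\})\bigr] - \bigl[F(\mathcal{P}\cup\{x\}) - F(\mathcal{P})\bigr]$.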
Theorem 4. ([Reference Last, Peccati and Schulte32, Theorems 1.1, 1.2, and 6.1]) Let $F \in dom \, D$ and let N be a standard normal random variable. Define
where these suprema are essential suprema with respect to $\lambda$ and $\lambda^2$, respectively. Then
where $W_1,\dots,W_6$ are defined as follows:
For a self-contained proof, we state [Reference Last, Peccati and Schulte32, Theorems 1.1 and 1.2] in the appendix, explaining how the above theorem can be deduced using the proof strategy of [Reference Last, Peccati and Schulte32, Theorem 6.1].
In order to apply Theorem 4 to our setup, we take
where x is represented by its hyperbolic polar coordinates $(t,\theta)$ .
In what follows, we write $g = g_{n,\gamma}$ (see (30)). Observe that $S_n^{(\gamma)}$ is a U-statistic and its Malliavin derivatives have a neat form as follows: for two distinct points $x, y \in B_d^{(\zeta)}$ ,
where $\ell$ and the $\ell_i$ denote the positions of the corresponding coordinates. In particular, (42) represents the number of injective homomorphisms from $\Gamma_k$ to $HG_n^{(\gamma)}(R_n;\ \alpha, \zeta)$ , with one of its vertices fixed at x and the remaining $k-1$ points taken from $\mathcal{P}_n$ . Similarly, (43) represents the number of injective homomorphisms, but here, two of the vertices are fixed at x and y, respectively.
From (42) and (43), the constants $c_1$, $c_2$ in Theorem 4 become
For notational convenience later on, we also define
where x is again represented by its hyperbolic polar coordinates $(t,\theta)$ . With this notation and the preliminary facts now available, we can rephrase $W_i$ , $i=1,\dots,6$ , in Theorem 4 as follows:
where we identify the terms $x_i$ with their hyperbolic polar coordinates $(t_i, \theta_i)$ .
3.3.2. Some auxiliary lemmas for the proof of the central limit theorem
Having already derived variance bounds in Theorem 2, we now deduce necessary estimates for the $W_i$ terms in (44). More specifically, Lemmas 5 and 6 below are used to estimate $c_{1,n}$ , $c_{2,n}$ , and $c_{3,n}$ , while Lemma 7 calculates the rate of the remaining integral terms. These estimates are then substituted into the $W_i$ terms in (44). Recall that $C^*$ is a positive generic constant, which is independent of n.
Lemma 5. Let the assumptions of Theorem 3 hold, and let $\gamma \in (0,1/2)$. Set $\rho_n \,{:\!=}\, ne^{-\zeta(d-1) R_n/2}$. For $p \geq 1$, define $\mathcal{G}_p^{(1)}$ and $\mathcal{G}_p^{(2)}$ to be the sets of all trees on $\{0\} \cup [p]$ and on $\{-1,0\} \cup [p]$, respectively. For a tree $T \in \mathcal{G}_p^{(1)}$, let $d^{\prime}_0, d^{\prime}_1,\ldots, d^{\prime}_p$ be the degree sequence, and similarly, for a tree $T \in \mathcal{G}_p^{(2)}$, let $d^{\prime}_{-1},d^{\prime}_0,\ldots,d^{\prime}_p$ be the degree sequence. Then we have
where
Lemma 6. In the notation of Lemma 5, for $0 < \gamma < 1/2$ ,
where
Lemma 7. For $0 < \gamma < 1/2$ and $0 < a < 1$ , we have
Remark 3.
(i) One can make the growth rate of the $c_{i,n}^{\prime}$ as slow as one likes by choosing $\gamma$ small enough. To make this more precise, assume, for simplicity, that $2\alpha /\zeta$ is not an integer. Then, as $n\to\infty$,
\begin{align*}c_{1,n}^\prime \sim \max_{\substack{p = k-1,\ldots,5(k-1), \\ T \in \mathcal{G}_p^{(1)}}} \prod_{j=1}^p\, \Bigl| \, (d-1) (\alpha-\zeta d_j^{\prime}/2) \, \Bigr|^{-1} e^{\zeta (d-1)\bigl[ d_0^\prime + \sum_{i=1}^p \bigl( d_i^\prime -2\alpha /\zeta \bigr)_+ \bigr] \gamma R_n /2}.\end{align*}
Note also that $\rho_n\to c \in (0,\infty]$ implies $R_n = O (\log n)$. Therefore, for any $\epsilon >0$, there exists $\gamma_0>0$ such that for all $0<\gamma<\gamma_0$, we have $c_{1,n}^{\prime} = o(n^\epsilon)$. The same claim holds for $c_{2,n}^{\prime}$ and $c_{3,n}^{\prime}$ as well.
(ii) The separation of each $c_{i,n}$ into $c^{\prime}_{i,n}$ and $\rho_n$ terms is because $c^{\prime}_{i,n}$ depends on $\gamma$ , whereas $\rho_n$ does not.
(iii) In many applications to Euclidean stochastic geometric functionals (see [Reference Last, Peccati and Schulte32, Section 7]), $c_{1,n}$ and $c_{2,n}$ are shown to be bounded in n, whereas in our case, $c_{1,n}$ and $c_{2,n}$ are unbounded in n. This is why it is challenging to obtain optimal Berry–Esseen bounds in our normal approximation result.
Proof of Lemma 5. Fix $x \in B_d^{(\zeta)}$ . For $p=k-1,\dots,5(k-1)$ , let $\Sigma_{5(k-1),p}$ denote the set of all surjective maps from $[5(k-1)]$ to [p]. From (42), we have that
It is possible that under some surjections $\sigma$ , the coordinates in $\bigl(\sigma((i-1)(k-1)+1),\ldots,\sigma(i(k-1))\bigr)$ may repeat for some i, but in such cases, $g = 0$ by definition; thus, $\Sigma_{5(k-1), p}$ in the last expression can be replaced with
Now, let us fix $\ell = 1$ , without loss of generality, and $p \in \{k-1,\ldots,5(k-1)\}$ , $\sigma \in \Sigma_{5(k-1),p}^*$ ; then we shall bound
Let $G_\sigma$ be a simple graph on $\{0\}\cup [p]$ whose edge set $E_\sigma$ is defined as follows: for every $i=1,\dots,5$, we set $\bigl(\sigma((i-1)(k-1)+j_1),\sigma((i-1)(k-1)+j_2) \bigr) \in E_{\sigma}$ if $(j_1+1,j_2+1)$ is an edge in $\Gamma_k$. Similarly, we set $\bigl(0,\sigma((i-1)(k-1)+j)\bigr) \in E_{\sigma}$ if $(1,j+1)$ is an edge in $\Gamma_k$. Then, setting $X_0 = x$, we have that $G_{\sigma}$ is the graph counted by the summand in (47), i.e.,
Now, the surjectivity of $\sigma$ implies that $G_\sigma$ is connected, and thus we can always find a spanning tree $G_{\sigma}^{\prime}$ of $G_\sigma$ on $\{ 0\}\cup [p]$ . Let $d_0^{\prime}, \dots, d_p^{\prime}$ be the degree sequence of $G_\sigma^{\prime}$ , and let $E^{\prime}_\sigma$ be the edge set of $G_\sigma^{\prime}$ . By the definitions of $A_{n,p,\sigma},G_{\sigma}, G^{\prime}_{\sigma}$ , together with the monotonicity of ${\mathcal{C}}$ and the above identity, we have that
where $t_0=R_n-d_\zeta(0,x)$ is deterministic, and the Mecke formula is applied at the last equality. Note that $|E^{\prime}_\sigma|=p$ .
Proceeding as in the derivation of (34) and noting that $T_i+T_j\leq R_n-\omega_n$ with $\omega_n=\log \log R_n$ always holds as we are taking $\gamma<1/2$ , we derive that
Now, let us take the maximum of $A_{n,p,\sigma}$ over all $p\in \{ k-1, \dots, 5(k-1) \}$ , $\sigma \in \Sigma_{5(k-1), p}^*$ , and the corresponding spanning tree $G_{\sigma}^\prime$ of a degree sequence $d_0^{\prime}, d_1^{\prime}, \dots, d_p^{\prime}$ . Equivalently, we take the maximum of $A_{n,p,\sigma}$ over all $p\in \{ k-1,\dots, 5(k-1) \}$ and $T\in \mathcal G_p^{(1)}$ , to conclude that
If $\rho_n \to \infty$ , then clearly $\rho_n^p = O(\rho_n^{5(k-1)})$ for all $k-1 \leq p \leq 5(k-1)$ , and hence the bound (45) holds trivially. Otherwise, $\rho_n \to c \in (0,\infty)$ , but we can still easily get (45).
Now we shall derive the bound for $c_{2,n}$ in (46). For $p \in \{ k-2,\dots,5(k-2) \}$ , let
Then, arguing as in the derivation of the bounds for $c_{1,n}$ , we find that the task of bounding $\mathbb{E}\bigl[ \bigl|D_{x,y}S_n^{(\gamma)}\bigr|^5\bigr]$ is again reduced to that of bounding
for all $p \in \{k-2,\ldots,5(k-2)\}$ and $\sigma \in \Sigma_{5(k-2),p}^*$ .
Setting $X_{-1} = x$ and $X_0 = y$, we can define a simple graph $G_\sigma$ on $\{ -1,0 \} \cup [p]$ as in (48) such that
Let $G_\sigma^{\prime}$ be a spanning tree of $G_\sigma$ , which exists as $G_\sigma$ is connected by the surjectivity of $\sigma$ . Let $d^\prime_{-1},d^\prime_0,\ldots,d^\prime_{p}$ be the respective degrees of vertices $\{-1,0\} \cup [p]$ in $G^{\prime}_{\sigma}$ , and let $E^{\prime}_\sigma$ be its edge set. Note that $|E^{\prime}_\sigma|=p+1$ .
Then, setting $t_{-1} = R_n-d_\zeta(0,x)$ and $t_0=R_n-d_\zeta(0,y)$ (which are deterministic), we again derive that
We basically proceed again as in the derivation of (34) by conditioning on the $T_i$ , but here, we need to account for one extra complication, i.e., whether $(\!-1,0) \in E^{\prime}_{\sigma}$ or not. If $(\!-1,0) \in E^{\prime}_\sigma$ , then whether $X_{-1}=x$ and $X_0=y$ are connected is purely deterministic and the randomness arises only in the remaining p edges. In that case, the rightmost term in (52) is bounded by
On the other hand, if $(\!-1,0)\notin E^{\prime}_\sigma$ , the randomness arises in all $p+1$ edges of $E^{\prime}_\sigma$ . Then the rightmost term in (52) is bounded by
Combining (53) and (54), we conclude that
Now, proceeding as in the derivation of (45) (see below (51)), we get (46).
Proof of Lemma 6. From the same reasoning as in Lemma 5, it suffices to bound
for every $p \in \{ k-1,\dots,3(k-1) \}$ and $\sigma \in \Sigma_{3(k-1),p}^*$ .
Let $G_\sigma$ be the same simple graph on $\{0\} \cup [p]$ as that constructed in the proof of Lemma 5(i) (see (48)), for which x is identified as a vertex ‘0’. Once again, let $G^{\prime}_{\sigma}$ be a spanning tree of $G_{\sigma}$ , with $d_0^{\prime},\dots, d_p^{\prime}$ the degree sequence and $E^{\prime}_\sigma$ the edge set of $G_\sigma^{\prime}$ . It follows from the monotonicity of ${\mathcal{C}}$ and the Mecke formula that
For further calculations, we note that $X_0$ in (55) is random, whereas $X_0=x$ in (49) was purely deterministic. Taking this difference into account and using Theorem 1 with $k=p+1$, we have
Finally, taking the maximum over $p$ and $\sigma$ completes the proof.
Proof of Lemma 7. We first need the following fact. For $0 < \gamma <1/2$ , let $HG({\mathcal{X}})$ be a hyperbolic geometric graph on a point set ${\mathcal{X}} \subset D_{\gamma}(R_n)$ , connecting any two points within a distance $R_n$ . Suppose that $y_1,y_2 \in {\mathcal{X}}$ are connected by a path of length $\ell \geq 1$ in $HG({\mathcal{X}})$ , and their hyperbolic distances from the boundary, given by $t_i = R_n-d_\zeta(0,y_i)$ , $i=1,2$ , satisfy $t_1,t_2 \leq \gamma R_n$ . For the relative angle $\theta_{12}$ between $y_1$ and $y_2$ , we claim that
uniformly for $t_1,t_2 \leq \gamma R_n$ .
The proof of (56) proceeds by induction on $\ell$. For $\ell=1$, set $\hat \theta_{12} = \bigl( e^{-2\zeta (R_n-t_1)} + e^{-2\zeta (R_n-t_2)} \bigr)^{1/2}$ as in Lemma 3. Since $t_1 + t_2 \leq 2\gamma R_n < R_n-\omega_n$, we get
as in the proof of Lemma 4.
If $\theta_{12} \leq \hat \theta_{12}$ , then $\theta_{12} \leq \bigl(1+o(1)\bigr)e^{-\zeta (1-2\gamma) R_n/2}$ by (57) with $t_1, t_2 \le \gamma R_n$ , and so (56) holds for $\ell=1$ . If $\theta_{12}\gg \hat \theta_{12}$ , Lemma 3 yields the following: uniformly for $t_i \leq \gamma R_n$ , $i=1,2$ , we have that
Equivalently, we get that $\sin \big(\theta_{12}/2 \big) \le \big( 1+o(1) \big) e^{-\zeta (1-2\gamma) R_n/2}$ . As the right-hand side tends to 0 as $n\to\infty$ , we have, uniformly for $t_i \leq \gamma R_n$ , $i=1,2$ ,
Hence, in either case, the claim for $\ell = 1$ follows.
Now, suppose that the claim holds for $\ell-1$ . Then, if $y_1,y_2$ have a path of length $\ell$ , there exists a $y_0$ such that $y_1$ and $y_0$ have a path of length $\ell-1$ , and $y_0$ and $y_2$ have a path of length 1. Denoting the corresponding relative angles by $\theta_{10}$ and $\theta_{02}$ , we see that $\theta_{12} \leq \theta_{10} + \theta_{02}$ ; hence, the proof of (56) can be completed by the induction hypothesis.
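In display form (assuming, consistently with the base case above, that the bound (56) reads $\theta_{12} \leq \bigl(1+o(1)\bigr)\,\ell\, e^{-\zeta (1-2\gamma) R_n/2}$ for a path of length $\ell$), the induction step is the chain
\[
\theta_{12} \;\leq\; \theta_{10} + \theta_{02}
\;\leq\; \bigl(1+o(1)\bigr)(\ell-1)\, e^{-\zeta (1-2\gamma) R_n/2} + \bigl(1+o(1)\bigr)\, e^{-\zeta (1-2\gamma) R_n/2}
\;=\; \bigl(1+o(1)\bigr)\,\ell\, e^{-\zeta (1-2\gamma) R_n/2},
\]
where the first term is bounded by the induction hypothesis applied to the path from $y_1$ to $y_0$, and the second by the case $\ell=1$ applied to the edge from $y_0$ to $y_2$.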
Fix $x_1,x_2,x_3$ such that $t_i \leq \gamma R_n$ for $i =1,2,3$. From (43), $D^2_{x_1,x_3}S_n^{(\gamma)} \neq 0$ implies that there is a path of length at most $\mathrm{diam}(\Gamma_k)$, the graph diameter of $\Gamma_k$, from $x_1$ to $x_3$ in the hyperbolic random geometric graph on $\bigl( \mathcal{P}_n \cap D_{\gamma}(R_n) \bigr) \cup \{x_1,x_3\}$ with radius of connectivity $R_n$. Thus, from (56) and $\mathrm{diam}(\Gamma_k) \leq k$, we have that
Therefore,
where $\Theta_{j3}$ denotes the relative angle between $X_j$ and $X_3$ , $j=1,2$ , and $T_i=R_n-d_\zeta(0,X_i)$ for $i=1,2,3$ .
Now, the probability in the last term equals
Using the density (28) of a relative angle, it is easy to see that
uniformly for $t_i \leq \gamma R_n$ , $i=1,2,3$ . It now follows from Lemma 1(ii) that (58) is asymptotically equal to
This proves the first statement of the lemma. Exactly the same argument yields the second statement.
3.3.3. Proof of the central limit theorem in Theorem 3
We now put together all the bounds and prove our main CLT.
Proof of Theorem 3. From (16), we have
Substituting the variance lower bound above and the bounds in Lemmas 5–7 into (44), and using the definition of $\rho_n$, we obtain
From Theorem 4, we know that
By the definition of $a_n^{(\gamma)}$ , one can see that
Note also that (18) ensures $R_n\le C^* \log n$ . Substituting all of these bounds back into $W_1, W_2$ , and $W_3$ above, we find that
for some positive constants $c_i$ , $i=1,2,3$ . Hence, for any $a < 1/2$ , we can choose $\gamma_0$ so small that $W_1 + W_2 + W_3 = O(n^{-a})$ as $n \to \infty$ , for all $0 < \gamma < \gamma_0$ . This proves the Wasserstein bound in (19).
To prove the Kolmogorov bound in (19), again using the bounds in Theorem 4, along with Lemmas 5, 6, and 7 and the variance lower bound above, we derive using (44) that
From Theorem 4, we have
Substituting the bounds for $c_{1,n}^{\prime}$, $c_{2,n}^{\prime}$, and $c_{3,n}^{\prime}$ into $W_4, W_5$, and $W_6$, together with $R_n\le C^* \log n$, one can see that
for some $c_i>0$ , $i=4,5,6$ . Now, for any $a < 1/2$ , we can choose $\gamma_0$ small enough so that the Kolmogorov bound in (19) holds for all $0< \gamma < \gamma_0$ .
In order to show (20), let us assume $\alpha / \zeta > d_{(k)}$ . First, choose $0<\gamma<1/2$ such that
Setting $U_n^{(\gamma)} = S_n - S_n^{(\gamma)}$ , we write
From (17), we have that $\mathbb{V}ar\bigl(S_n^{(\gamma)}\bigr) \sim \mathbb{V}ar(S_n)$ as $n \to \infty$. Since the CLT holds for $S_n^{(\gamma)}$, the first term converges in distribution to $N$ as $n \to \infty$. Now, from (41), we know that $\mathbb{V}ar\bigl(U_n^{(\gamma)}\bigr) / \mathbb{V}ar(S_n) \to 0$ as $n \to \infty$; hence, by Chebyshev’s inequality, the second term converges to 0 in probability. Thus, applying Slutsky’s theorem, we obtain the CLT for $S_n$, as required.
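Schematically, writing $\bar{S} = S - \mathbb{E}[S]$ for centering and assuming the CLT in (20) is stated with normalization $\sqrt{\mathbb{V}ar(S_n)}$, the decomposition behind this argument is
\[
\frac{\bar{S}_n}{\sqrt{\mathbb{V}ar(S_n)}}
\;=\;
\underbrace{\frac{\bar{S}_n^{(\gamma)}}{\sqrt{\mathbb{V}ar\bigl(S_n^{(\gamma)}\bigr)}}}_{\Rightarrow\, N}
\cdot
\underbrace{\sqrt{\frac{\mathbb{V}ar\bigl(S_n^{(\gamma)}\bigr)}{\mathbb{V}ar(S_n)}}}_{\to\, 1 \text{ by } (17)}
\;+\;
\underbrace{\frac{\bar{U}_n^{(\gamma)}}{\sqrt{\mathbb{V}ar(S_n)}}}_{\overset{\mathbb{P}}{\to}\, 0 \text{ by } (41)},
\]
so that Slutsky's theorem applies to the product and the sum.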
Remark 4. If $2\alpha / \zeta > 5d_{(k)}$ , one can deduce more precise estimates for $W_i$ , $i=1,\dots,6$ . In fact, by the choice of the $d_i$ , we have that each $a_n^{(\gamma)}(d^{\prime}_i)$ converges to a constant, and so we derive that $c^{\prime}_{i,n} \leq C^* (n/\rho_n)^{5(k-1)\gamma}$ for $i = 1,2$ . Thus we have that
In particular, the above inequalities imply that $W_i \to 0$ for $i = 1,\ldots,6$, whenever $\gamma < 1/(8(k-1))$.
Appendix A
In this appendix, we state the main normal approximation result in [Reference Last, Peccati and Schulte32] and explain how Theorem 4 can be deduced from it. Recall the notation from Section 3.3.1.
Theorem 5. ([Reference Last, Peccati and Schulte32, Theorems 1.1 and 1.2]) Let F be as in Theorem 4. Then
where $\gamma_i(F)$ , $i = 1,\ldots,6,$ are given by
Proof of Theorem 4. We aim to deduce the bounds for $\gamma_i(F)$ above via the proof strategy for [Reference Last, Peccati and Schulte32, Theorem 6.1]. Using the definitions of $c_1,c_2$ , as well as Hölder’s inequality, we have that
Combining these bounds with further applications of Hölder’s inequality gives that $\gamma_1(F) \leq W_1$ and $\gamma_2(F) \leq W_2$; note also that $\gamma_3(F)=W_3$. This completes the proof of the Wasserstein bound in Theorem 4.
Subsequently, the bounds in (59) also yield that $\gamma_5(F) \leq W_5$ and $\gamma_6(F) \leq W_6$ . Using the equation at the end of the proof of [Reference Last, Peccati and Schulte32, Theorem 6.1], we obtain that
where
Using the trivial inequality $\mathbb{P}(D_xF \neq 0) \leq 1$, we get that $\gamma_4(F) \le W_4$, which establishes the Kolmogorov bound in Theorem 4.
Acknowledgements
The authors are very grateful for the detailed and useful comments from two anonymous referees, which led to a number of improvements to the paper. The work benefited from both authors’ visit to the Technion, Israel, and the authors thank their host Robert Adler for his hospitality. D. Y. also thanks the Department of Statistics at Purdue University for hosting him, and Subhojoy Gupta for helpful discussions on hyperbolic geometry.
Funding information
T. O.’s research is partially supported by the NSF grant DMS-1811428. D. Y.’s research was supported in part by a DST-INSPIRE Faculty Fellowship and a CPDA grant from the Indian Statistical Institute.
Competing interests
The authors declare that no competing interests arose during the preparation or publication of this article.