1. Introduction
There is a growing literature on problems in physics, mathematics, computer science, and operations research that are set up as processes, random or not, on large sparse graphs. The range of problems being studied is wide, and includes problems related to the classification, sorting, and ranking of large networks, as well as the analysis of Markov chains and interacting particle systems on graphs. Popular among the types of graphs used for these purposes are the locally tree-like random graph models such as the configuration model and the inhomogeneous random graph family (which includes the classical Erdős–Rényi model). These random graph models are quite versatile in the types of graphs they can mimic, and have important mathematical properties that make their analysis tractable.
In particular, the mathematical tractability of locally tree-like random graphs comes from the fact that their local neighborhoods resemble trees. This property makes it easy to transfer questions about the process of interest on a graph to the often easier analysis of the process on the limiting tree. Mathematically, this transfer is enabled by the notion of local weak convergence [Reference Aldous and Lyons1, Reference Aldous and Steele2, Reference Benjamini and Schramm3, Reference Garavaglia, van der Hofstad and Litvak15]. However, as is the case for many problems involving usual weak convergence of random variables, it is often desirable to construct the original set of random variables and their corresponding weak limits on the same probability space; in other words, to have a coupling. In addition, many problems studying processes on graphs require that we keep track of additional vertex attributes not usually included in the local weak limits, attributes that may not be discrete. The recent work in [Reference Fraiman, Lin and Olvera-Cravioto14] gives several examples of Markov chains and systems of equations on directed graphs whose analysis relies on the kind of couplings presented here, including the study of the personalized PageRank distribution [Reference Chen, Litvak and Olvera-Cravioto9, Reference Garavaglia, van der Hofstad and Litvak15, Reference Olvera-Cravioto20], among others. Further applications include the analysis of interacting diffusions [Reference Lacker, Ramanan and Wu16, Reference Lacker, Ramanan and Wu17], where we may wish to allow the vertex attributes to influence the dynamics of the processes being studied.
The results in this paper were designed to solve these two problems simultaneously by providing a general-purpose coupling between the exploration of the neighborhood of a uniformly chosen vertex in a locally tree-like graph and its local weak limit, including general vertex attributes that may indirectly influence the construction of the graph.
The main results focus only on the two families of random graph models that are known to converge, in the local weak sense, to a marked Galton–Watson process. It is worth mentioning that other locally tree-like graphs like the preferential attachment models do not fall into this category, since their local weak limits are continuous-time branching processes. In particular, we focus on random graphs constructed according to either a configuration model or any of the inhomogeneous random graph models with rank-1 kernels (see Sections 1.1 and 1.2 for the precise definitions). Our results include both undirected and directed graphs, and are given under minimal moment conditions in order to include scale-free graphs. In particular, under our assumptions, it is possible for the offspring distribution in the limiting marked Galton–Watson process to have infinite mean, and in the directed case for the limiting joint distribution of the in-degree and out-degree of a vertex to have infinite covariance.
Before describing the two families of random graph models for which our coupling theorems hold, we introduce some definitions to be used throughout the paper. We use $G(V_n, E_n)$ to denote a graph, or multigraph, having vertices $V_n = \{1, 2, \dots, n\}$ and edges in the set $E_n$. A directed edge from vertex i to vertex j is denoted by (i, j). For multigraphs, we also need to keep track of the multiplicity of each edge or self-loop, so we use l(i) to denote the number of self-loops of vertex i, and e(i, j) to denote the number of edges from vertex i to vertex j. If the graph is undirected, we simply ignore the direction. In the undirected case, we use $D_i$ to denote the degree of vertex i, which corresponds to the number of neighbors of vertex i. In the directed case, we use $D_i^-$ to denote the in-degree of vertex i and $D_i^+$ to denote its out-degree; the in-degree counts the number of inbound neighbors, and the out-degree the number of outbound ones. All our results are given in terms of the large-graph limit, which corresponds to taking a sequence of graphs $\{ G(V_n, E_n) \,:\, n \geq 1 \}$ and letting $|V_n| = n \to \infty$, where $|A|$ denotes the cardinality of set A. Both the configuration model and the family of inhomogeneous random graphs are meant to model large static graphs, since there may be no relation between $G(V_n, E_n)$ and $G(V_{m}, E_m)$ for $n \geq m$. Strong couplings for evolving graphs such as the preferential attachment models are a topic for future work.
1.1. Configuration model
The configuration model [Reference Bollobás4, Reference van der Hofstad21] produces graphs from any prescribed (graphical) degree sequence. In the undirected version of this model, each vertex is assigned a number of stubs or half-edges equal to its target degree. Then, these half-edges are randomly paired to create edges in the graph.
For an undirected configuration model (CM), we assume that each vertex $i \in V_n$ is assigned an attribute vector $\mathbf{a}_i = (D_i, \mathbf{b}_i)$, where $D_i \in \mathbb{N}$ is its degree, and $\mathbf{b}_i$ encodes additional information about vertex i that does not directly affect the construction of the graph but may depend on $D_i$. The attributes $\{\mathbf{b}_i\}$ are assumed to take values in a separable metric space $\mathcal{S}^{\prime}$. For the sequence $\{D_i\,:\, 1 \leq i \leq n\}$ to define the degree sequence of an undirected graph, we must have that $L_n \,:\!=\, \sum_{i=1}^n D_i$ is even. Note that this may require us to consider a double sequence $\{ {\bf a}_i^{(n)}\,:\, i \geq 1, n \geq 1\}$ rather than a unique sequence, i.e. one where $\mathbf{a}_i^{(n)} \neq \mathbf{a}_i^{(m)}$ for $n \neq m$. In applications it is often convenient to allow the vertex attributes to be random themselves; [Reference Fraiman, Lin and Olvera-Cravioto14] studies various stochastic recursions on graphs that include vertex attributes such as the ones we envision here.
Assuming that $L_n$ is even, enumerate all the stubs and pick one stub to pair; suppose that stub belongs to vertex i. Next, choose one of the remaining $L_n-1$ stubs uniformly at random, and if the stub belongs to vertex j, draw an edge between vertices i and j. Then, pick another stub to pair. In general, a stub being paired chooses uniformly at random from the set of unpaired stubs, then identifies the vertex to which the chosen stub belongs, and creates an edge between its vertex and the one to which the chosen stub belongs.
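The sequential pairing just described is distributionally equivalent to a uniform perfect matching of the stubs, which can be produced by a single shuffle. A minimal sketch (the function name and interface are illustrative, not from the paper):

```python
import random

def configuration_model(degrees, seed=None):
    """Pair half-edges (stubs) uniformly at random to build an undirected
    multigraph with the prescribed degree sequence.

    Returns a list of edges (i, j); self-loops and multiple edges may occur.
    """
    rng = random.Random(seed)
    # One stub per unit of degree, labeled by the vertex it belongs to.
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "L_n = sum of degrees must be even"
    rng.shuffle(stubs)  # a uniform shuffle induces a uniform pairing
    # Pair consecutive stubs: (stubs[0], stubs[1]), (stubs[2], stubs[3]), ...
    return [(stubs[2 * k], stubs[2 * k + 1]) for k in range(len(stubs) // 2)]
```

Whatever the pairing, each vertex i appears in exactly `degrees[i]` edge endpoints, so the target degree sequence is realized by construction.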
The directed version of the configuration model (DCM) is such that each vertex $i \in V_n$ is assigned an attribute of the form $\mathbf{a}_i = (D_i^-, D_i^+, \mathbf{b}_i) \in \mathbb{N}^2 \times \mathcal{S}^{\prime}$. Similarly to the undirected case, $D_i^-$ and $D_i^+$ denote the in-degree and the out-degree, respectively, of vertex i, and $\mathbf{b}_i$ is allowed to depend on $(D_i^-, D_i^+)$. The condition needed to ensure we can draw a graph is now $L_n \,:\!=\, \sum_{i=1}^n D_i^+ = \sum_{i=1}^n D_i^-$, which again may require us to consider a double sequence $\big\{ {\bf a}_i^{(n)}\,:\, i \geq 1, n \geq 1\big\}$.
As for the CM, we give to each vertex i a number $D_i^-$ of inbound stubs, and a number $D_i^+$ of outbound stubs. To construct the graph, we start by choosing an inbound (outbound) stub, say belonging to vertex i, and choose uniformly at random one of the $L_n$ outbound (inbound) stubs. If the chosen stub belongs to vertex j, draw an edge from j to i (from i to j); then pick another inbound (outbound) stub to pair. In general, when pairing an inbound (outbound) stub, we pick uniformly at random from all the remaining unpaired outbound (inbound) stubs. If the stub being paired belongs to vertex i, and the one to which the chosen stub belongs to is j, we draw a directed edge from j to i (from i to j).
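The directed pairing is likewise equivalent to a uniform matching of outbound stubs to inbound stubs. Again a hedged sketch with illustrative names:

```python
import random

def directed_configuration_model(in_degrees, out_degrees, seed=None):
    """Pair outbound and inbound stubs uniformly at random; returns a list
    of directed edges (j, i), meaning an edge from vertex j to vertex i."""
    rng = random.Random(seed)
    in_stubs = [i for i, d in enumerate(in_degrees) for _ in range(d)]
    out_stubs = [i for i, d in enumerate(out_degrees) for _ in range(d)]
    assert len(in_stubs) == len(out_stubs), "need sum D^- == sum D^+ (= L_n)"
    rng.shuffle(out_stubs)  # uniform matching of out-stubs to in-stubs
    return list(zip(out_stubs, in_stubs))
```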
We emphasize that both the CM and the DCM are in general multigraphs; that is, they can have self-loops and multiple edges (in the same direction) between a given pair of vertices. However, provided the pairing process does not create self-loops or multiple edges, the resulting graph is uniformly chosen among all graphs having the prescribed degree sequence. It is well known that when the empirical degree distribution converges weakly and its second moment converges to that of the limit, the pairing process results in a simple graph with a probability that remains bounded away from zero even as the graph grows [Reference Chen and Olvera-Cravioto10, Reference van der Hofstad21].
We use $\mathscr{F}_n = \sigma( {\bf a}_i\,:\, 1 \leq i \leq n)$ to denote the sigma algebra generated by the attribute sequence, which does not include the edge structure of the graph. To simplify the notation, we use $\mathbb{P}_n({\cdot}) = \mathbb{P}(\cdot \mid \mathscr{F}_n)$ and $\mathbb{E}_n[{\cdot}] = \mathbb{E}[\, \cdot \mid \mathscr{F}_n]$ to denote the conditional probability and conditional expectation, respectively, given $\mathscr{F}_n$.
1.2. Inhomogeneous random graphs
The second class of random graph models we consider is the family of inhomogeneous random graphs (digraphs), in which the presence of an edge is determined by the toss of a coin, independently of any other edge. This family includes the classical Erdős–Rényi graph [Reference Erdős and Rényi13], but also several generalizations that allow the edge probabilities to depend on the two vertices being connected, e.g. the Chung–Lu model [Reference Chung and Lu11], the Norros–Reittu model (or Poissonian random graph) [Reference Norros and Reittu19], and the generalized random graph [Reference Britton, Deijfen and Martin-Löf7], to name a few. Unlike the Erdős–Rényi model, these generalizations are capable of producing graphs with inhomogeneous degree sequences, and can mimic almost any degree distribution whose support is $\mathbb{N}$ (or $\mathbb{N}^2$ in the directed case). This paper focuses only on inhomogeneous random graphs (digraphs) having rank-1 kernels (see [Reference Bollobás, Janson and Riordan6]), which excludes models such as the stochastic block model.
Collectively, this family of models has a long history in the random graph literature, and their connectivity properties, phase transitions, and degree distributions are well known. Rather than attempting to name all the existing references where these models have appeared, we refer the interested reader to the books [Reference Bollobás5, Reference Durrett12, Reference van der Hofstad21, Reference van der Hofstad22], where many of their properties have been compiled.
To define an undirected inhomogeneous random graph (IR), assign to each vertex $i \in V_n$ an attribute ${\bf a}_i = (W_i, {\bf b}_i) \in \mathbb{R}_+ \times \mathcal{S}^{\prime}$. The $W_i$ will be used to determine how likely vertex i is to have neighbors, while the ${\bf b}_i$ can be used to include vertex characteristics that are not needed for the construction of the graph but that are allowed to depend on $W_i$. If convenient, one can consider using a double sequence $\big\{ \mathbf{a}_i^{(n)}\,:\, i \geq 1, n \geq 1\big\}$ as with the configuration model, but this is not as important since the sequence $\mathscr{W}_n \,:\!=\, \{W_i\,:\, 1 \leq i \leq n\}$ does not need to satisfy any additional conditions in order for us to draw the graph. As with the CM (DCM), the vertex attributes are allowed to be random.
We use the notation $\mathscr{F}_n = \sigma( {\bf a}_i\,:\, 1 \leq i \leq n)$, as for the configuration model, to denote the sigma algebra generated by the vertex attributes, as well as the notation for the corresponding conditional probability, $\mathbb{P}_n({\cdot}) = \mathbb{P}(\cdot \mid \mathscr{F}_n)$, and expectation, $\mathbb{E}_n[{\cdot}] = \mathbb{E}[\, \cdot \mid \mathscr{F}_n]$.
For the IR, the edge probabilities are given by

$p_{ij}^{(n)} = \left( \dfrac{W_i W_j}{\theta n} \big( 1 + \varphi_n(W_i, W_j) \big) \right) \wedge 1, \qquad i \neq j,$

where $\varphi_n(W_i, W_j) = \varphi(n, W_i, W_j,\mathscr{W}_n) > -1$ almost surely (a.s.) is a function that may depend on the entire sequence $\mathscr{W}_n$, on the types of the vertices $\{i,j\}$, or exclusively on n, and $0<\theta < \infty$ satisfies $\frac{1}{n} \sum_{i=1}^n W_i \stackrel{\textrm{P}}{\longrightarrow} \theta$, $n \to \infty$. Here, and in the following, $x \wedge y = \min\{x,y\}$ and $x \vee y = \max\{x, y\}$. Since the graph is to be simple by construction, $p_{ii}^{(n)} \equiv 0$ for all $i \in V_n$.
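For concreteness, taking $\varphi_n \equiv 0$ yields the Chung–Lu-type choice $p_{ij}^{(n)} = (W_i W_j/(\theta n)) \wedge 1$. A toy sampler under this special case (identifiers are illustrative):

```python
import random

def rank_one_ir(weights, theta, seed=None):
    """Sample an undirected inhomogeneous random graph in which each pair
    {i, j}, i < j, is connected independently with probability
    p_ij = min(W_i * W_j / (theta * n), 1)   (rank-1 kernel, phi_n = 0)."""
    rng = random.Random(seed)
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # simple graph: no self-loops, no repeats
            p = min(weights[i] * weights[j] / (theta * n), 1.0)
            if rng.random() < p:
                edges.append((i, j))
    return edges
```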
For the directed version, which we refer to as an inhomogeneous random digraph (IRD), the vertex attributes take the form $\mathbf{a}_i = (W_i^-, W_i^+, \mathbf{b}_i) \in \mathbb{R}_+^2 \times \mathcal{S}^{\prime}$. The parameter $W_i^-$ controls the in-degree of vertex i, and $W_i^+$ its out-degree. If we write $\mathbf{W}_i = (W_i^-, W_i^+)$, the edge probabilities in the IRD are given by

$p_{ij}^{(n)} = \left( \dfrac{W_i^+ W_j^-}{\theta n} \big( 1 + \varphi_n(\mathbf{W}_i, \mathbf{W}_j) \big) \right) \wedge 1, \qquad i \neq j,$

where $\varphi_n(\mathbf{W}_i, \mathbf{W}_j) = \varphi(n, \mathbf{W}_i, \mathbf{W}_j,\mathscr{W}_n) > -1$ a.s. is a function that may depend on the entire sequence $\mathscr{W}_n\,:\!=\, \{ \mathbf{W}_i\,:\, 1 \leq i \leq n\}$, on the types of the vertices $\{i,j\}$, or exclusively on n, and $0<\theta < \infty$ satisfies $\frac{1}{n} \sum_{i=1}^n (W_i^- + W_i^+) \stackrel{\textrm{P}}{\longrightarrow} \theta$, $n \to \infty$. Since the graphs are again simple by construction, we have $p_{ii}^{(n)} \equiv 0$ for all $i \in V_n$.
2. Main result for undirected graphs
For an undirected graph constructed according to one of the two models (CM or IR), our main result shows that there exists a coupling between the breadth-first exploration of the component of a uniformly chosen vertex and that of the root node of a marked Galton–Watson process. Before we can state the theorem, we need to introduce some notation on the graph and describe the Galton–Watson process that describes its local weak limit.
Each vertex i in an undirected graph (multigraph) $G(V_n, E_n)$ is given a vertex attribute of the form $\mathbf{a}_i = (D_i, \mathbf{b}_i) \in \mathbb{N} \times \mathcal{S}^{\prime}$ in the CM case, and $\mathbf{a}_i = (W_i, \mathbf{b}_i) \in \mathbb{R}_+ \times \mathcal{S}^{\prime}$ in the IR case.
In addition, define for each vertex i its full mark ${\bf X}_i = (D_i, {\bf a}_i)$, where $D_i$ is the degree of vertex i. We point out that the definition of $\mathbf{X}_i$ is redundant when the graph is a CM; however, this is not so if the graph is an IR. In both cases the vertex attributes are measurable with respect to $\mathscr{F}_n$, while the full marks are not if the graph is an IR.
The main assumption needed for the coupling to hold is given in terms of the empirical measure for the vertex attributes, i.e.

$\nu_n({\cdot}) \,:\!=\, \dfrac{1}{n} \sum_{i=1}^n \mathbf{1}\big( \mathbf{a}_i \in \cdot\, \big). \qquad (2.1)$
In order to state the assumption, recall that the state space for the vertex attributes, $\mathcal{S}^{\prime}$, is assumed to be a separable metric space under metric $\rho^{\prime}$. Now define the metric $\rho({\bf x}, {\bf y}) = |x_1 - y_1| + |x_2 - y_2| + \rho^{\prime}({\bf x}_3, {\bf y}_3)$, ${\bf x} = (x_1, x_2, {\bf x}_3)$, ${\bf y} = (y_1, y_2, {\bf y}_3)$, on the space $\mathcal{S} \,:\!=\, \mathbb{N} \times \mathbb{R} \times \mathcal{S}^{\prime}$, which makes $\mathcal{S}$ a separable metric space as well. Using $\rho$, and for any probability measures $\nu_n$, $\mu_n$ on the conditional probability space $(\mathcal{S}, \mathscr{F}_n, \mathbb{P}_n)$, define the Wasserstein metric of order one: $W_1(\nu_n, \mu_n) = \inf\!\big\{ \mathbb{E}_n\big[ \rho({\hat{\mathbf{Y}}}, \boldsymbol{{Y}}) \big] \,:\, \text{law}({\hat{\mathbf{Y}}} \mid \mathscr{F}_n) = \nu_n$, $\text{law}(\boldsymbol{{Y}} \mid \mathscr{F}_n) = \mu_n \big\}$.
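For intuition only: on the real line, and for two empirical measures with the same number of atoms, the infimum in the definition of $W_1$ is attained by matching order statistics. A toy illustration (this uses $|x-y|$ on $\mathbb{R}$, not the product metric $\rho$ above):

```python
def empirical_w1(xs, ys):
    """Wasserstein-1 distance between two empirical measures on the real
    line with the same number of atoms: the optimal coupling matches
    order statistics, so W1 = average |x_(i) - y_(i)|."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

Note that $W_1$ vanishes whenever the two samples induce the same empirical measure, regardless of the order in which the atoms are listed.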
Assumption 2.1. (Undirected.) Let $\nu_n$ be defined according to (2.1), and suppose there exists a non-random probability measure $\nu$ such that $ W_1(\nu_n, \nu) \stackrel{\mathbb{P}}{\longrightarrow} 0$, $n \to \infty$. In addition, assume that the following conditions hold:
A. In the CM, let $(\mathscr{D}, \boldsymbol{{B}} )$ be distributed according to $\nu$, and suppose there exists a non-random ${\bf b}_0 \in \mathcal{S}^{\prime}$ such that $\mathbb{E}[\mathscr{D} + \rho^{\prime}(\boldsymbol{{B}}, {\bf b}_0) ] < \infty$.
B. In the IR, let $(W, \boldsymbol{{B}})$ be distributed according to $\nu$, and suppose the following hold:
(i) $\mathcal{E}_n = \frac{1}{n} \sum_{1 \leq i \neq j\leq n} \big|p_{ij}^{(n)} - (r_{ij}^{(n)} \wedge 1) \big| \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, where $r_{ij}^{(n)} = W_i W_j/(\theta n)$.
(ii) There exists a non-random ${\bf b}_0 \in \mathcal{S}^{\prime}$ such that $\mathbb{E}[W + \rho^{\prime}(\boldsymbol{{B}}, {\bf b}_0) ] < \infty$.
Now that we have stated the assumptions for our theorem, we need to describe the local neighborhood of a vertex in the graph $G(V_n, E_n)$. To do this, let $I \in V_n$ denote a uniformly chosen vertex in $G(V_n, E_n)$; vertices are identified with their labels in $\{1, 2, \dots, n\}$. Define $A_0 = \{ I \}$, and let $A_k$ denote the set of vertices at hop distance k from I. Now write $\mathcal{G}_I^{(k)}$ to be the subgraph of $G(V_n, E_n)$ consisting of the vertices in $\bigcup_{r=0}^k A_r$ along with their (multiple) edges and self-loops. We also use the notation $\mathcal{G}_I^{(k)}(\mathbf{a})$ to refer to the graph $\mathcal{G}_I^{(k)}$ including all the attributes of its vertices.
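The breadth-first construction of the sets $A_0, A_1, \dots, A_k$ can be sketched as follows (an adjacency-list representation is assumed; names are illustrative):

```python
def depth_k_neighborhood(adj, root, k):
    """Breadth-first exploration of the depth-k neighborhood of `root`.
    `adj` maps each vertex to a list of its neighbors. Returns the list
    [A_0, A_1, ..., A_k] of sets of vertices at each hop distance."""
    layers = [{root}]   # A_0 = {root}
    seen = {root}
    for _ in range(k):
        nxt = set()
        for v in layers[-1]:
            for w in adj.get(v, []):
                if w not in seen:   # first time w is reached fixes its layer
                    seen.add(w)
                    nxt.add(w)
        layers.append(nxt)
    return layers
```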
Definition 2.1. We say that two simple graphs $G(V, E)$ and $G^{\prime}(V^{\prime}, E^{\prime})$ are isomorphic if there exists a bijection $\sigma \,:\, V \to V^{\prime}$ such that edge $(i,j) \in E$ if and only if edge $(\sigma(i), \sigma(j)) \in E^{\prime}$. We say that two multigraphs $G(V, E)$ and $G^{\prime}(V^{\prime}, E^{\prime})$ are isomorphic if there exists a bijection $\sigma\,:\, V \to V^{\prime}$ such that $l(i) = l(\sigma(i))$ and $e(i,j) = e(\sigma(i), \sigma(j))$ for all $i \in V$ and all $(i,j) \in E$, where $l(i)$ is the number of self-loops of vertex i, and $e(i,j)$ is the number of edges from vertex i to vertex j. In both cases, we write $G \simeq G^{\prime}$.
To describe the limit of $\mathcal{G}_I^{(k)}$ as $n \to \infty$, we construct a delayed marked Galton–Watson process, denoted $\mathcal{T}(\boldsymbol{{A}})$, using the measure $\nu$ in Assumption 2.1. Here, ‘delayed’ refers to the fact that the root will, in general, have a different distribution than all other nodes in the tree.
To start, let $\mathcal{U} \,:\!=\, \bigcup_{k=0}^\infty \mathbb{N}_+^k$ denote the set of labels for nodes in a tree, with the convention that $\mathbb{N}_+^0 \,:\!=\, \{ \emptyset \}$ contains the root. For a label $\mathbf{i} = (i_1, \dots, i_k)$ we write $|\mathbf{i}| = k$ to denote its length, and use $(\mathbf{i}, j) = (i_1, \dots, i_k, j)$ to denote the index concatenation operation.
The tree $\mathcal{T}$ is constructed as follows. Let $\{ (\mathcal{N}_\mathbf{i}, \boldsymbol{{A}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U} \}$ denote a sequence of independent vectors in $\mathcal{S}$, with $\{ (\mathcal{N}_\mathbf{i}, \boldsymbol{{A}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}, \mathbf{i} \neq \emptyset \}$ independent and identically distributed (i.i.d.). For any $\mathbf{i} \in \mathcal{U}$, $\mathcal{N}_\mathbf{i}$ denotes the number of offspring of node $\mathbf{i}$, and $\boldsymbol{{A}}_\mathbf{i}$ denotes its attribute (mark). As with the graph, we use the notation $\mathcal{T}$ to denote the tree without its attributes. Let $\mathcal{A}_0 = \{ \emptyset \}$ and recursively define $\mathcal{A}_k = \{(\mathbf{i}, j)\,:\, \mathbf{i} \in \mathcal{A}_{k-1}$, $1 \leq j \leq \mathcal{N}_\mathbf{i} \}$, $k \geq 1$, to be the kth generation of $\mathcal{T}$. To match the notation on the graph, we write $\boldsymbol{{X}}_\emptyset = (\mathcal{N}_\emptyset, \boldsymbol{{A}}_\emptyset)$ and $\boldsymbol{{X}}_\mathbf{i} = (\mathcal{N}_\mathbf{i}+1, \boldsymbol{{A}}_\mathbf{i})$, $\mathbf{i} \neq \emptyset$. The marked tree is then given by $\mathcal{T}(\boldsymbol{{A}}) = \{ \boldsymbol{{X}}_\mathbf{i}\,:\, \mathbf{i} \in \mathcal{T} \}$; note that the marks include the number of offspring of each node, from which the edges in the tree can be deduced. We denote by $\mathcal{T}^{(k)}$ ($ \mathcal{T}^{(k)}(\boldsymbol{{A}})$) the restriction of $\mathcal{T}$ ($\mathcal{T}(\boldsymbol{{A}})$) to its first k generations.
It only remains to identify the distribution of $\boldsymbol{{X}}_\mathbf{i}$, for both $\mathbf{i} = \emptyset$ and $\mathbf{i} \neq \emptyset$, in terms of the probability measure $\nu$ in Assumption 2.1. For a CM, let $\boldsymbol{{A}} = (\mathscr{D}, \boldsymbol{{B}})$ be distributed according to $\nu$; then, $\mathbb{P}(\boldsymbol{{X}}_\emptyset \in \cdot\, ) = \mathbb{P}( (\mathscr{D}, \boldsymbol{{A}}) \in \cdot)$, and $\mathbb{P}(\boldsymbol{{X}}_\mathbf{i} \in \cdot) = ({1}/{\mathbb{E}[\mathscr{D}]}) \mathbb{E} [ \mathscr{D}\mathbf{1}( (\mathscr{D}, \boldsymbol{{A}}) \in \cdot) ]$, $\mathbf{i} \neq \emptyset$. For an IR, let $\boldsymbol{{A}} = (W, \boldsymbol{{B}})$ be distributed according to $\nu$; then, $\mathbb{P}(\boldsymbol{{X}}_\emptyset \in \cdot) = \mathbb{P}( (D, \boldsymbol{{A}}) \in \cdot)$, and $\mathbb{P}(\boldsymbol{{X}}_\mathbf{i} \in \cdot) = ({1}/{\mathbb{E}[W]}) \mathbb{E}[ W \mathbf{1}( (D+1, \boldsymbol{{A}}) \in \cdot) ]$, $\mathbf{i} \neq \emptyset$, where D is a mixed Poisson random variable with mean W. Note that the distribution of $\boldsymbol{{X}}_\mathbf{i}$ for $\mathbf{i} \neq \emptyset$ corresponds to a size-biased version of the distribution of $\boldsymbol{{X}}_\emptyset$ with respect to its first coordinate.
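To simulate the limiting tree for a CM with a discrete degree distribution, one samples the root's degree from $\nu$ and every other node's degree from its size-biased version, each non-root node producing one fewer offspring than its degree (one edge leads back to its parent). A sketch under these assumptions (interface ours):

```python
import random

def size_biased(pmf, rng):
    """Sample from the size-biased version of an integer pmf:
    P(d) proportional to d * pmf[d]."""
    total = sum(d * p for d, p in pmf.items())
    u = rng.random() * total
    acc = 0.0
    for d, p in sorted(pmf.items()):
        acc += d * p
        if u <= acc:
            return d
    return max(pmf)

def gw_generation_sizes(pmf, k, seed=None):
    """Generation sizes of a delayed Galton-Watson tree: the root has
    D ~ pmf children; every other node has D* - 1 children, where D* is
    size-biased (the CM local limit described above)."""
    rng = random.Random(seed)
    sizes = [1]          # generation 0 is the root
    current, root = 1, True
    for _ in range(k):
        nxt = 0
        for _ in range(current):
            if root:
                nxt += rng.choices(list(pmf), weights=list(pmf.values()))[0]
            else:
                nxt += size_biased(pmf, rng) - 1
        root = False
        sizes.append(nxt)
        current = nxt
    return sizes
```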
We are now ready to state the main coupling theorem for undirected graphs.
Theorem 2.1. Suppose $G(V_n, E_n)$ is either a CM or an IR satisfying Assumption 2.1. Then, for any fixed k and $\mathcal{G}^{(k)}_I({\bf a})$ the depth-k neighborhood of a uniformly chosen vertex $I \in V_n$, there exists a marked Galton–Watson tree $\mathcal{T}^{(k)} (\boldsymbol{{A}})$ restricted to its first k generations whose root corresponds to vertex I and which is such that $\mathbb{P}_n\big(\mathcal{G}_I^{(k)} \not\simeq \mathcal{T}^{(k)} \big) \xrightarrow{\textrm{P}} 0$, $n \to \infty$; if we let $\sigma({\bf i}) \in V_n$ denote the vertex in the graph corresponding to node ${\bf i} \in \mathcal{T}^{(k)}$, and define for any $\varepsilon > 0$ the event $C_{I}^{(k,\varepsilon)} = \big\{ \bigcap_{{\bf i} \in \mathcal{T}^{(k)}} \{ \rho(\mathbf{X}_{\sigma({\bf i})}, \boldsymbol{{X}}_{\bf i} ) \leq \varepsilon \} , \, \mathcal{G}_I^{(k)} \simeq \mathcal{T}^{(k)} \big\}$, then $\mathbb{E}_n\big[\rho(\mathbf{X}_I, \boldsymbol{{X}}_\emptyset) \big] \xrightarrow{\textrm{P}} 0$ and $\mathbb{P}_n \big( C_I^{(k,\varepsilon)} \big) \xrightarrow{\textrm{P}} 1$, $n \to \infty$. Moreover, for any fixed $m, k \geq 1$, $\varepsilon > 0$, and $\{I_j\,:\, 1 \leq j \leq m\}$ i.i.d. random variables uniformly chosen in $V_n$, there exist i.i.d. copies of $\mathcal{T}^{(k)}(\boldsymbol{{A}})$, denoted $\big\{ \mathcal{T}_{\emptyset(I_j)}^{(k)}(\boldsymbol{{A}})\,:\, 1 \leq j \leq m\big\}$, whose roots correspond to the vertices $\{I_j\,:\, 1 \leq j \leq m\}$ in $G(V_n, E_n)$, such that $\sum_{j=1}^m \mathbb{E}_n\big[ \rho(\mathbf{X}_{I_j}, \boldsymbol{{X}}_{\emptyset(I_j)}) \big] \xrightarrow{\textrm{P}} 0$ and $\mathbb{P}_n\big( \bigcap_{j=1}^m C_{I_j}^{(k,\varepsilon)} \big) \xrightarrow{\textrm{P}} 1$, $n \to \infty$.
Remark 2.1. Theorem 2.1 implies that the graph $G(V_n, E_n)$ converges in the local weak sense in probability, as introduced in [Reference Aldous and Lyons1, Reference Aldous and Steele2, Reference Benjamini and Schramm3] for undirected graphs and later extended in [Reference Garavaglia, van der Hofstad and Litvak15] to marked undirected graphs. The statement involving more than one exploration is related to the notion of propagation of chaos in the interacting particles literature (see, for example, [Reference Chaintron and Diez8, Definition 4.1]), and can be of independent interest in that context.
3. Main result for directed graphs
In the directed case, our main result will allow us to couple the breadth-first exploration of either the in-component or the out-component of a uniformly chosen vertex. Since the two cases are clearly symmetric, we state our results only for the in-component.
As with the undirected graph, each vertex i in the graph $G(V_n, E_n)$ has an attribute of the form $\mathbf{a}_i = (D_i^-, D_i^+, \mathbf{b}_i) \in \mathbb{N}^2 \times \mathcal{S}^{\prime}$ in the DCM case, and $\mathbf{a}_i = (W_i^-, W_i^+, \mathbf{b}_i) \in \mathbb{R}_+^2 \times \mathcal{S}^{\prime}$ in the IRD case.
The full mark of vertex i is now given by $\mathbf{X}_i = (D_i^-, D_i^+, \mathbf{a}_i)$, where $D_i^-$ and $D_i^+$ are the in-degree and out-degree, respectively, of vertex i.
With some abuse of notation, we again use $\nu_n$, as defined in (2.1), to denote the empirical measure for the vertex attributes. However, the state space for the full marks is now $\mathcal{S} \,:\!=\, \mathbb{N}^2 \times \mathbb{R}^2 \times \mathcal{S}^{\prime}$, equipped with the metric $\rho(\mathbf{x}, \mathbf{y}) = |x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3| + |x_4 - y_4| + \rho^{\prime}(\mathbf{x}_5, \mathbf{y}_5)$ for $\mathbf{x} = (x_1, x_2, x_3, x_4, \mathbf{x}_5)$ and $\mathbf{y} = (y_1, y_2, y_3, y_4, \mathbf{y}_5)$. The Wasserstein metric $W_1$ defined on the conditional probability space $(\mathcal{S}, \mathscr{F}_n, \mathbb{P}_n)$ remains the same after the adjustments made to $\mathcal{S}$ and $\rho$.
Assumption 3.1. (Directed.) Let $\nu_n$ be defined according to (2.1), and suppose there exists a non-random probability measure $\nu$ such that $ W_1(\nu_n, \nu) \stackrel{\textrm{P}}{\longrightarrow} 0$, $n \to \infty$. In addition, assume that the following conditions hold:
A. In the DCM, let $(\mathscr{D}^-, \mathscr{D}^+, \boldsymbol{{B}} )$ be distributed according to $\nu$, and suppose there exists a non-random ${\bf b}_0 \in \mathcal{S}^{\prime}$ such that $\mathbb{E}[\mathscr{D}^- + \mathscr{D}^+ + \rho^{\prime}(\boldsymbol{{B}}, {\bf b}_0) ] < \infty$.
B. In the IRD, let $(W^-, W^+, \boldsymbol{{B}})$ be distributed according to $\nu$, and suppose the following hold:
(i) $\mathcal{E}_n = \frac{1}{n} \sum_{1 \leq i \neq j\leq n} \big|p_{ij}^{(n)} - (r_{ij}^{(n)} \wedge 1) \big| \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, where $r_{ij}^{(n)} = W_i^+ W_j^-/(\theta n)$.
(ii) There exists a non-random ${\bf b}_0 \in \mathcal{S}^{\prime}$ such that $\mathbb{E}[W^- + W^+ + \rho^{\prime}(\boldsymbol{{B}}, {\bf b}_0) ] < \infty$.
Since we will state our result for the exploration of the in-component of a uniformly chosen vertex, the structure of the coupled tree will be determined by the vertices that we encounter during a breadth-first exploration. This exploration starts with a uniformly chosen vertex $I \in V_n$, which is used to create the set $A_0 = \{ I\}$. It then follows all the inbound edges of I to discover all the vertices at inbound distance one from I, which become the set $A_1$. In general, to identify the vertices in the set $A_k$, we explore all the inbound edges of vertices in $A_{k-1}$. As we perform the exploration, we also discover the out-degrees of the vertices we have encountered; however, we do not follow any outbound edges. We then define $\mathcal{G}_I^{(k)}$ to be the subgraph of $G(V_n, E_n)$ whose vertex set is $\bigcup_{r=0}^k A_r$ and whose edges are those that are encountered during the breadth-first exploration we described. The notation $\mathcal{G}_I^{(k)}(\mathbf{a})$ is used to refer to the graph $\mathcal{G}_I^{(k)}$ including the values of the full marks $\{ \mathbf{X}_i\}$ for all of its vertices.
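The in-component exploration amounts to breadth-first search on the reversed adjacency structure; outbound edges are never followed. A self-contained sketch (names are illustrative):

```python
def in_component_layers(edges, root, k):
    """Depth-k breadth-first exploration of the in-component of `root`.
    `edges` is a list of directed pairs (u, v), meaning an edge from u to v;
    only inbound edges of discovered vertices are followed."""
    # Reverse adjacency: inbound neighbors of each vertex.
    in_adj = {}
    for u, v in edges:
        in_adj.setdefault(v, []).append(u)
    layers, seen = [{root}], {root}   # A_0 = {root}
    for _ in range(k):
        nxt = set()
        for v in layers[-1]:
            for u in in_adj.get(v, []):
                if u not in seen:
                    seen.add(u)
                    nxt.add(u)
        layers.append(nxt)
    return layers
```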
In the directed case, the limit of $\mathcal{G}_I^{(k)}$ is again a delayed marked Galton–Watson process, with the convention that all its edges are pointing towards the root. We denote the tree $\mathcal{T}(\boldsymbol{{A}})$ as before; however, it will be constructed using a sequence of independent vectors of the form $\{ (\mathcal{N}_\mathbf{i}, \mathcal{D}_\mathbf{i}, \boldsymbol{{A}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}\}$, with $\{ (\mathcal{N}_\mathbf{i}, \mathcal{D}_\mathbf{i}, \boldsymbol{{A}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}, \mathbf{i} \neq \emptyset\}$ i.i.d. In other words, the full marks now take the form $\boldsymbol{{X}}_\mathbf{i} = (\mathcal{N}_\mathbf{i}, \mathcal{D}_\mathbf{i}, \boldsymbol{{A}}_\mathbf{i})$, $\mathbf{i} \in \mathcal{U}$. The construction of the tree $\mathcal{T}$ is done as in the undirected case using the $\{\mathcal{N}_\mathbf{i}\,:\, \mathbf{i} \in \mathcal{U} \}$, and the marked tree is given by $\mathcal{T}(\boldsymbol{{A}}) = \{ \boldsymbol{{X}}_\mathbf{i} \,:\, \mathbf{i} \in \mathcal{T} \}$. The notation $\mathcal{T}^{(k)}$ ($\mathcal{T}^{(k)}(\boldsymbol{{A}})$) again refers to the restriction of $\mathcal{T}$ ($\mathcal{T}(\boldsymbol{{A}})$) to its first k generations.
The distributions of the full marks $\mathbf{X}_\mathbf{i}$ for both $\mathbf{i} = \emptyset$ and $\mathbf{i} \neq \emptyset$ are also different than in the undirected case. For a DCM, let $\boldsymbol{{A}} = (\mathscr{D}^-, \mathscr{D}^+, \boldsymbol{{B}})$ be distributed according to $\nu$; then, $\mathbb{P}(\boldsymbol{{X}}_\emptyset \in \cdot) = \mathbb{P}( (\mathscr{D}^-, \mathscr{D}^+, \boldsymbol{{A}}) \in \cdot)$, and $\mathbb{P}(\boldsymbol{{X}}_\mathbf{i} \in \cdot) = ({1}/{\mathbb{E}[\mathscr{D}^+]}) \mathbb{E}[ \mathscr{D}^+ \mathbf{1}((\mathscr{D}^-, \mathscr{D}^+, \boldsymbol{{A}}) \in \cdot) ]$, $\mathbf{i} \neq \emptyset$. For an IRD, let $\boldsymbol{{A}} = (W^-, W^+, \boldsymbol{{B}})$ be distributed according to $\nu$; then, $\mathbb{P}(\boldsymbol{{X}}_\emptyset \in \cdot) = \mathbb{P}( (D^-, D^+, \boldsymbol{{A}}) \in \cdot)$, and $\mathbb{P}(\boldsymbol{{X}}_\mathbf{i} \in \cdot) = ({1}/{\mathbb{E}[W^+]}) \mathbb{E}[ W^+ \mathbf{1}((D^-, D^++1, \boldsymbol{{A}}) \in \cdot) ]$, $\mathbf{i} \neq \emptyset$, where $D^-$ and $D^+$ are conditionally independent (given $(W^-, W^+)$) Poisson random variables with means $c W^-$ and $(1-c)W^+$, respectively, and $c = \mathbb{E}[W^+]/\mathbb{E}[W^- + W^+]$. Note that in this case, the distribution of $\boldsymbol{{X}}_\mathbf{i}$ for $\mathbf{i} \neq \emptyset$ corresponds to a size-biased version of the distribution of $\boldsymbol{{X}}_\emptyset$ with respect to its second coordinate.
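A sketch of how the limiting root degrees could be sampled in the IRD case, given a vertex's weights and the means $\mathbb{E}[W^-]$, $\mathbb{E}[W^+]$ (the Poisson sampler is the textbook product-of-uniforms method, included only to keep the sketch self-contained; it is adequate for small means):

```python
import math
import random

def poisson(lam, rng):
    """Product-of-uniforms Poisson sampler (fine for small lam)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def sample_root_degrees(w_minus, w_plus, mean_w_minus, mean_w_plus, rng):
    """Given a vertex's weights (W^-, W^+), sample the limiting
    (D^-, D^+): conditionally independent Poisson variables with means
    c*W^- and (1-c)*W^+, where c = E[W^+] / E[W^- + W^+]."""
    c = mean_w_plus / (mean_w_minus + mean_w_plus)
    return poisson(c * w_minus, rng), poisson((1 - c) * w_plus, rng)
```

For non-root nodes, the weights $(W^-, W^+)$ would instead be drawn from the size-biased distribution (with respect to $W^+$) before sampling the degrees, and the extra outbound edge to the parent added.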
The following is our main coupling theorem for directed graphs.
Theorem 3.1. Suppose $G(V_n, E_n)$ is either a DCM or an IRD satisfying Assumption 3.1. Then, for any fixed k and $\mathcal{G}^{(k)}_I({\bf a})$ the depth-k neighborhood of a uniformly chosen vertex $I \in V_n$, there exists a marked Galton–Watson tree $\mathcal{T}^{(k)} (\boldsymbol{{A}})$ restricted to its first k generations, whose root corresponds to vertex I, and such that $\mathbb{P}_n\big( \mathcal{G}_I^{(k)} \not\simeq \mathcal{T}^{(k)} \big) \xrightarrow{\textrm{P}} 0$, $n \to \infty$, and if we let $\sigma({\bf i}) \in V_n$ denote the vertex in the graph corresponding to node ${\bf i} \in \mathcal{T}^{(k)}$, and define for any $\varepsilon > 0$ the event $C_{I}^{(k,\varepsilon)} = \big\{ \bigcap_{{\bf i} \in \mathcal{T}^{(k)}} \{ \rho(\mathbf{X}_{\sigma({\bf i})}, \boldsymbol{{X}}_{\bf i} ) \leq \varepsilon \} , \, \mathcal{G}_I^{(k)} \simeq \mathcal{T}^{(k)} \big \}$, then $\mathbb{E}_n[ \rho(\mathbf{X}_I, \boldsymbol{{X}}_\emptyset) ] \xrightarrow{\textrm{P}} 0$ and $\mathbb{P}_n \big( C_I^{(k,\varepsilon)} \big) \xrightarrow{\textrm{P}} 1$, $n \to \infty$. Moreover, for any fixed $m, k \geq 1$, $\varepsilon > 0$, and $\{I_j\,:\, 1 \leq j \leq m\}$ i.i.d. random variables uniformly chosen in $V_n$, there exist i.i.d. copies of $\mathcal{T}^{(k)}(\boldsymbol{{A}})$, denoted $\{ \mathcal{T}_{\emptyset(I_j)}^{(k)}(\boldsymbol{{A}})\,:\, 1 \leq j \leq m\}$, whose roots correspond to the vertices $\{I_j\,:\, 1 \leq j \leq m\}$ in $G(V_n, E_n)$, such that $\sum_{j=1}^m \mathbb{E}_n[ \rho(\mathbf{X}_{I_j}, \boldsymbol{{X}}_{\emptyset(I_j)}) ] \xrightarrow{\textrm{P}} 0$ and $\mathbb{P}_n\big( \bigcap_{j=1}^m C_{I_j}^{(k,\varepsilon)} \big) \xrightarrow{\textrm{P}} 1$, $n \to \infty$.
The remainder of the paper contains the proofs of Theorems 2.1 and 3.1.
4. Proofs
The proofs of Theorems 2.1 and 3.1 are based on an intermediate coupling between the breadth-first exploration of the graph $G(V_n, E_n)$ and a delayed marked Galton–Watson process whose offspring distribution and marks still depend on the filtration $\mathscr{F}_n$. This intermediate step consists of coupling $\mathcal{G}^{(k)}_I$ with a marked tree denoted $\hat T^{(k)}({\hat{\mathbf{A}}})$. Interestingly, this coupling will be perfect, in the sense that the vertex/node marks in each of the two graphs will also be identical to each other. The proofs of Theorems 2.1 and 3.1 will be complete once we show that $\hat T^{(k)}({\hat{\mathbf{A}}})$ can be coupled with the limiting $\mathcal{T}^{(k)}(\boldsymbol{{A}})$.
To organize the exposition, we will separate the undirected case from the directed one. Most of the proofs for the directed case have been established in prior work by the author, and therefore will be omitted; the precise references and the missing details are given in Section 4.2. Once the intermediate coupling theorems are proved, the coupling between the two trees can be done similarly for the undirected and directed cases (on the trees, the direction of the edges is irrelevant).
4.1. Discrete coupling for undirected graphs
As mentioned above, the main difference between the intermediate tree and the limiting one lies in the distribution of the marks. As before, we start with the construction of the possibly infinite tree $\hat T$, which is done with the conditionally independent (given $\mathscr{F}_n$) sequence of random vectors in $\mathcal{S}$, $\{ (\hat N_\mathbf{i}, {\hat{\mathbf{A}}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}\}$, with $\{ (\hat N_\mathbf{i}, {\hat{\mathbf{A}}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}, \, \mathbf{i} \neq \emptyset \}$ conditionally i.i.d. Let $\hat A_0 = \{ \emptyset\}$, and recursively define $\hat A_k = \{ (\mathbf{i}, j)\,:\, \mathbf{i} \in \hat A_{k-1}, 1 \leq j \leq \hat N_\mathbf{i} \}$, $k \geq 1$. Next, define the full marks according to ${\hat{\mathbf{X}}}_\emptyset = (\hat N_\emptyset, {\hat{\mathbf{A}}}_\emptyset)$ and ${\hat{\mathbf{X}}}_\mathbf{i} = (\hat N_\mathbf{i}+1, {\hat{\mathbf{A}}}_\mathbf{i})$, $\mathbf{i} \neq \emptyset$, and let $\hat T({\hat{\mathbf{A}}}) = \{ {\hat{\mathbf{X}}}_\mathbf{i}\,:\, \mathbf{i} \in \hat T \}$.
For a CM, the distribution of the full marks is given by
For the IR model, first let $\{b_n\}$ be a sequence such that $b_n \xrightarrow{\textrm{P}} \infty$ and $b_n/\sqrt{n} \xrightarrow{\textrm{P}} 0$ as $n\to\infty$, and use it to define $\bar W_i = W_i \wedge b_n$ and $\Lambda_n = \sum_{i=1}^n \bar W_i$. The marks on the coupled marked Galton–Watson process are given by
where conditionally on ${\bf a}_i$, $D_i$ is a Poisson random variable with mean $\Lambda_n \bar W_i/(\theta n)$.
We will also need to extend our definition of an isomorphism for marked graphs.
Definition 4.1. A graph $G(V,E)$ is called a vertex-weighted graph if each of its vertices has a mark (weight) assigned to it. We say that two vertex-weighted simple graphs $G(V, E)$ and $G^{\prime}(V^{\prime}, E^{\prime})$ are isomorphic if there exists a bijection $\sigma\,:\, V \to V^{\prime}$ such that edge $(i,j) \in E$ if and only if edge $(\sigma(i), \sigma(j)) \in E^{\prime}$, and in addition, the marks of i and $\sigma(i)$ are the same. For vertex-weighted multigraphs, we say that $G(V,E)$ and $G^{\prime}(V^{\prime}, E^{\prime})$ are isomorphic if there exists a bijection $\sigma\,:\, V \to V^{\prime}$ such that $l(i) = l(\sigma(i))$ and $e(i,j) = e(\sigma(i), \sigma(j))$ for all $i \in V$ and all $(i,j) \in E$, where $l(i)$ is the number of self-loops of vertex i and $e(i,j)$ is the number of edges from vertex i to vertex j, and in addition, the marks of i and $\sigma(i)$ are the same. In both cases, we write $G \simeq G^{\prime}$.
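For very small simple graphs, the isomorphism notion of Definition 4.1 can be checked by brute force over all bijections. The sketch below (function name and data layout are ours; it handles only simple graphs, not the multigraph case) makes explicit that the bijection must preserve both the edge set and the marks:

```python
from itertools import permutations

def isomorphic(edges1, marks1, edges2, marks2):
    """Brute-force test of vertex-weighted simple-graph isomorphism in the
    sense of Definition 4.1: a mark-preserving bijection sigma that maps
    edges to edges. Only practical for very small graphs."""
    n = len(marks1)
    if n != len(marks2) or len(edges1) != len(edges2):
        return False
    E2 = {frozenset(e) for e in edges2}
    for sigma in permutations(range(n)):
        if all(marks1[i] == marks2[sigma[i]] for i in range(n)) and \
           {frozenset((sigma[a], sigma[b])) for a, b in edges1} == E2:
            return True
    return False

# A path on three vertices: any isomorphism must send the degree-2 vertex
# to the degree-2 vertex, so the mark at the center must also match.
same = isomorphic([(0, 1), (1, 2)], [5, 7, 5], [(1, 0), (1, 2)], [5, 7, 5])
diff = isomorphic([(0, 1), (1, 2)], [5, 7, 5], [(0, 1), (1, 2)], [7, 5, 5])
```

Here `same` holds while `diff` fails: the two paths in the second call are isomorphic as unmarked graphs but not as vertex-weighted graphs.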
The intermediate coupling theorem is given below.
Theorem 4.1. Suppose $G(V_n, E_n)$ is either a CM or an IR satisfying Assumption 2.1. Then, for $\mathcal{G}^{(k)}_I({\bf a})$ the depth-k neighborhood of a uniformly chosen vertex $I \in V_n$, there exists a marked Galton–Watson tree $\hat T^{(k)} ({\hat{\mathbf{A}}})$ restricted to its first k generations whose root corresponds to vertex I and such that, for any fixed $k \geq 1$, $\mathbb{P}_n\big( \mathcal{G}_I^{(k)}(\mathbf{a}) \not\simeq \hat T^{(k)}({\hat{\mathbf{A}}}) \big) \xrightarrow{\textrm{P}} 0$, $n \to \infty$.
The proof of Theorem 4.1 is given separately for the two models being considered, the CM and the IR.
4.1.1. Coupling for the configuration model.
To explore the neighborhood of depth k of vertex $I \in G(V_n, E_n)$ we start by labeling the set of $L_n$ stubs in such a way that stubs $\{1, \dots, D_1\}$ belong to vertex 1, stubs $\{ D_1+1, \dots, D_1 + D_2\}$ belong to vertex 2, and, in general, stubs $\{ D_1 + \dots + D_{m-1} + 1, \dots, D_1 + \dots + D_m\}$ belong to vertex m.
For any $k \geq 0$, define the sets:
These sets will be constructed as we explore the graph.
To do a breadth-first exploration of $G(V_n, E_n)$, we start by selecting vertex I uniformly at random. Next, let $J_0$ denote the set of stubs belonging to vertex I and set $A_0 = \{ I \}$. For $k \geq 1$, step k in the exploration will identify all the stubs belonging to nodes in $A_k$:
Step k, $k \geq 1$:
(a) Initialize the sets $A_k = J_k = \varnothing$.
(b) For each vertex $i \in A_{k-1}$:
(i) For each of the unpaired stubs of vertex i:
(1) Pick an unpaired stub of vertex i and sample uniformly at random a stub from the $L_n$ available. If the chosen stub is the stub currently being paired, or if it has already been paired, sample again until an unpaired stub is drawn.
(2) If the chosen stub belongs to vertex j, draw an edge between vertices i and j using the chosen stub. If vertex j had not yet been discovered, add it to $A_k$ and add all of its unpaired stubs to $J_k$.
The exploration terminates at step k if $J_k = \varnothing$, at which point the component of I will have been fully explored.
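The stub-by-stub exploration above can be sketched in code. The following is a minimal illustration (function name, data layout, and the omission of the coupled tree are our simplifications; only the graph-side steps (a)–(b) are shown):

```python
import bisect
import random

def explore_cm_depth_k(degrees, root, k, rng):
    """Breadth-first, stub-by-stub exploration of a configuration model:
    a sketch of steps (a)-(b) above, including the acceptance-rejection
    resampling of step (b)(i)(1)."""
    cum, s = [], 0
    for d in degrees:
        s += d
        cum.append(s)
    L = s                                    # L_n, total number of stubs
    owner = lambda stub: bisect.bisect_right(cum, stub)
    paired = set()
    frontier, seen, edges = [root], {root}, []
    for _ in range(k):
        nxt = []
        for v in frontier:
            lo = cum[v] - degrees[v]
            for stub in range(lo, cum[v]):
                if stub in paired:
                    continue
                # resample until an unpaired stub other than the current
                # one comes up (acceptance-rejection)
                while True:
                    other = rng.randrange(L)
                    if other != stub and other not in paired:
                        break
                paired.update((stub, other))
                j = owner(other)             # vertex owning the chosen stub
                edges.append((v, j))         # step (b)(i)(2): draw the edge
                if j not in seen:
                    seen.add(j)
                    nxt.append(j)
        frontier = nxt
    return edges, seen
```

With two vertices of degree one the single edge is forced, and a lone vertex of degree two necessarily produces a self-loop, both consistent with the multigraph conventions of Definition 4.1.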
To couple the construction of $\hat T$, initialize $\hat A_0 = \{ \emptyset\}$, identify $\emptyset$ with vertex I in $G(V_n, E_n)$, and set $\hat N_\emptyset = D_I$, ${\hat{\mathbf{A}}}_\emptyset = {\bf a}_I$. For $k \geq 1$, step k in the construction will identify all the nodes in $\hat A_k$ by adding nodes in agreement with the exploration of the graph. Each node that is added to the tree will have a number of stubs equal to one less than the total number of stubs of the corresponding vertex (the one being used to create the edge), regardless of whether some of those stubs may already have been paired.
Step k, $k \geq 1$:
(a) Initialize the set $\hat A_k = \varnothing$.
(b) For each node ${\bf i} = (i_1, \dots, i_{k-1}) \in \hat A_{k-1}$:
(i) For each $1 \leq r \leq \hat N_{\bf i}$:
(1) Pick a stub uniformly at random from the $L_n$ available.
(2) If the chosen stub belongs to vertex j, then add node $({\bf i},r)$ to $\hat A_k$ and set $\hat N_{({\bf i}, r)} = D_j - 1$, ${\hat{\mathbf{A}}}_{({\bf i},r)} = {\bf a}_j$.
This process will end in step k if $\hat N_{\bf i} = 0$ for all ${\bf i} \in \hat A_k$, or it may continue indefinitely.
Note that the coupling relies on using the same uniform random numbers in step (b)(i)(1) for the two constructions. Specifically, there is one uniform for each of the $L_n$ stubs which is used in step (b)(i)(1) of the tree construction, and in the first pick in the acceptance–rejection step (b)(i)(1) of the graph exploration; in case of a rejection, independent uniforms are used until a stub is accepted.
Definition 4.2. We say that the coupling breaks in generation $\tau = k$ if:
the first time we have to resample a stub in step (b)(i)(1) occurs while exploring a stub belonging to a vertex in $A_{k-1}$; or
given that the above has not happened, a stub belonging to a vertex in $A_{k-1}$ is paired with a stub belonging to a previously encountered vertex (this vertex could be in either $A_{k-1}$ or the current set $A_k$).
Note that the exploration of the component of depth k of vertex I in $G(V_n, E_n)$ and the construction of the first k generations of the tree $\hat T$ will be identical provided $\tau > k$, meaning $\mathcal{G}_I^{(k)}(\mathbf{a}) \simeq \hat T^{(k)}({\hat{\mathbf{A}}})$ in the sense of Definition 4.1. This particular construction is standard in the analysis of the configuration model, with small variations depending on whether we need to keep track of completed generations or simply the number of stubs that have been paired. We refer the reader to [Reference van der Hofstad22, Chapter 4] for the stub-by-stub version and a full history of the model. Earlier versions of the coupling imposed finite second moment conditions that more recent proofs can omit (see, e.g., [Reference van der Hofstad, Hooghiemstra and Van Mieghem23]).
Proof of Theorem 4.1 ($m=1$) for the CM. From the observation made above, it suffices to show that the exploration of the k-neighborhood of vertex I does not require us to resample any stub in step (b)(i)(1), nor does it sample a stub belonging to a vertex that had already been discovered. To compute the probability of successfully completing k generations in $\hat T$ before the coupling breaks, write $\mathbb{P}_n \big( \mathcal{G}_I^{(k)}(\mathbf{a}) \not\simeq \hat T^{(k)}({\hat{\mathbf{A}}}) \big) \leq \mathbb{P}_n(\tau \leq k)$.
The coupling breaks the first time we draw a stub belonging to a vertex that has already been explored: either a stub already paired, or one that is unpaired but already attached to the graph. The number of paired stubs when exploring a vertex in $A_{r-1}$ is less than or equal to $2 \sum_{j=1}^{r} |A_j| + |J_{r}|$, which corresponds to two stubs for each of the vertices at distance at most r from I plus the unpaired stubs belonging to nodes in $J_{r}$. Note that up to the moment that the coupling breaks, we have $|A_j| = |\hat A_j|$ for all $0 \leq j \leq r$, and $|J_r| = |\hat A_{r+1}|$, so the probability that we break the coupling while exploring a vertex in $A_{r-1}$ is less than or equal to $P_{r} \,:\!=\, ({2}/{L_n}) \sum_{j=1}^{r+1} |\hat A_j| \leq {2 |\hat V_{r+1}|}/{L_n}$, $r \geq 1$.
It follows that, for any $a_n > 0$,
where $\text{Bin}(n,p)$ represents a binomial random variable with parameters (n, p). Hence, we have $\mathbb{P}_n \big( \mathcal{G}_I^{(k)}(\mathbf{a}) \not\simeq \hat T^{(k)}({\hat{\mathbf{A}}}) \big) \leq \mathbb{P}_n(\tau \leq k) \leq ({2k a_n^2}/{L_n}) + \mathbb{P}_n ( |\hat V_{k+1}| > a_n)$.
To analyze the last probability we use the first part of Theorem 4.3 to obtain that, for any fixed $k \geq 1$, there exists a tree $\mathcal{T}^{(k)}$ of depth k, whose distribution does not depend on $\mathscr{F}_n$, such that $\mathbb{P}_n \big( \hat T^{(k)} \not\simeq \mathcal{T}^{(k)} \big) \xrightarrow{\textrm{P}} 0$ as $n \to \infty$. Let $|\mathcal{A}_k|$ denote the size of the kth generation of that tree, define $|\mathcal{V}_{k+1}| = \sum_{j=0}^{k+1} |\mathcal{A}_j|$, and note that $\mathbb{P}_n \big( \mathcal{G}_I^{(k)}(\mathbf{a}) \not\simeq \hat T^{(k)}({\hat{\mathbf{A}}}) \big) \leq ({2k a_n^2}/{L_n}) + \mathbb{P} ( |\mathcal{V}_{k+1}| > a_n) + \mathbb{P}_n \big( \hat T^{(k)} \not\simeq \mathcal{T}^{(k)} \big)$. Choosing $a_n \xrightarrow{\textrm{P}} \infty$ so that $a_n^2/n \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, and observing that $|\mathcal{V}_{k+1}| < \infty$ a.s., completes the proof.
4.1.2. Coupling for the inhomogeneous random graph.
We couple the exploration of the component of vertex $I \in G(V_n, E_n)$ with a marked multi-type Galton–Watson process with n types, one for each vertex in $G(V_n, E_n)$. A node of type $i \in \{1, 2, \dots, n\}$ in the tree will have a Poisson number of offspring of type j with mean $q_{ij}^{(n)} = {\bar W_i \bar W_j}/({\theta n})$, $ 1\leq j \leq n$.
Similarly to the CM, define:
We again do a breadth-first exploration of $G(V_n, E_n)$ starting from a uniformly chosen vertex I. To start, let $\{ U_{ij}\,:\, i,j \geq 1\}$ be a sequence of i.i.d. Uniform[0, 1] random variables, independent of $\mathscr{F}_n$. We will use this sequence of i.i.d. uniforms to realize the Bernoulli random variables that determine the presence/absence of edges in $G(V_n, E_n)$. Set $A_0 = \{ I \}$ and initialize the set $J = \varnothing$; the set J keeps track of the vertices that have been fully explored (all of their potential edges realized), and coincides with $V_{k-1}$ at the end of step k.
Step k, $k\geq 1$:
(a) Initialize the set $A_k = \varnothing$.
(b) For each vertex $i \in A_{k-1}$:
(i) Let $X_{ij} = 1(U_{ij} > 1- p_{ij}^{(n)})$ for each $j \in \{1, 2, \dots, n\} \setminus J$.
(ii) If $X_{ij} = 1$, draw an edge between vertices i and j and add vertex j to $A_k$.
(iii) Add vertex i to set J.
The exploration terminates at the end of step k if $A_k = \varnothing$, at which point the component of I will have been fully explored.
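The Bernoulli-edge exploration can also be sketched in a few lines. The sketch below (function name and data layout are ours) realizes one shared uniform $U_{ij}$ per vertex pair, exactly the quantity later reused in the coupled tree construction:

```python
import random

def explore_ir_depth_k(p, root, k, rng):
    """Breadth-first exploration of an inhomogeneous random graph, a sketch
    of steps (a)-(iii) above: each potential edge {i,j} is realized once via
    a shared uniform, X_ij = 1(U_ij > 1 - p_ij)."""
    n = len(p)
    U = {}                        # lazily realized uniforms, one per pair
    J = set()                     # fully explored vertices (the set J)
    frontier, seen = [root], {root}
    for _ in range(k):
        nxt = []
        for i in frontier:
            for j in range(n):
                if j == i or j in J:
                    continue
                key = (min(i, j), max(i, j))
                u = U.setdefault(key, rng.random())
                if u > 1 - p[i][j] and j not in seen:  # X_ij = 1, j new
                    seen.add(j)
                    nxt.append(j)
            J.add(i)
        frontier = nxt
    return seen

# Deterministic example: a path 0-1-2 whose two edges have probability 1.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Caching the uniforms in `U` is what allows the tree construction to consume the very same randomness in step (b)(i)(1) below.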
To couple the construction of $\hat T$, initialize $\hat A_0 = \{ \emptyset\}$ and identify $\emptyset$ with vertex I in $G(V_n, E_n)$ as before; let $\hat B_0 = \{ I \}$. To construct the tree, we sample for a node of type i a Poisson number of offspring of type j for each $j \in \{1, \dots, n\}$. To do this, let $G({\cdot};\, \lambda)$ be the cumulative distribution function of a Poisson random variable with mean $\lambda$, and let $G^{-1}(u;\, \lambda) = \inf\{ x \in \mathbb{R}\,:\, G(x;\, \lambda) \geq u\}$ denote its pseudoinverse. In order to keep the tree coupled with the exploration of the graph we use the same sequence of i.i.d. uniform random variables used to sample the edges in the graph. Initialize the set $\hat J = \varnothing$, which will keep track of the types that have appeared and whose offspring have been sampled. The precise construction is given below:
Step k, $k \geq 1$:
(a) Initialize the sets $\hat A_k = \hat B_k = \varnothing$.
(b) For each node ${\bf i} = (i_1, \dots, i_{k-1}) \in \hat A_{k-1}$:
(i) If ${\bf i}$ has type $t \notin \hat J$:
(1) For each type $j \in \{1, \dots, n\} \setminus \hat J$, let $Z_{tj} = G^{-1}(U_{tj};\, q_{tj}^{(n)})$. If $Z_{tj} \geq 1$, create $Z_{tj}$ children of type j for node ${\bf i}$, each with node attribute equal to $\mathbf{a}_j$, and add j to the set $\hat B_k$.
(2) For each type $j \in \hat J$, sample $Z_{tj}^* \sim \textrm{Poisson}(q_{tj}^{(n)})$, independently of the sequence $\{U_{ij}\,:\, i,j \geq 1\}$ and any other random variables. If $Z_{tj}^* \geq 1$, create $Z_{tj}^*$ children of type j for node ${\bf i}$, each with attribute equal to $\mathbf{a}_j$.
(3) Randomly shuffle all the children created in steps (b)(i)(1) and (b)(i)(2) and give them labels of the form $({\bf i}, j)$, then add the labeled nodes to set $\hat A_k$. The node attributes will be denoted ${\hat{\mathbf{A}}}_{(\mathbf{i},j)} = \mathbf{a}_j$. (The shuffling prevents a node's label from providing information about its type.)
(4) Add type t to set $\hat J$.
(ii) If ${\bf i}$ has type $t\in \hat J$:
(1) For each type $j \in \{1, \dots, n\}$, sample $Z_{tj}^* \sim \textrm{Poisson}(q_{tj}^{(n)})$, independently of the sequence $\{U_{ij}\,:\, i,j \geq 1\}$ and any other random variables, and create $Z_{tj}^*$ children of type j for node ${\bf i}$, each with attribute equal to $\mathbf{a}_j$.
(2) Randomly shuffle all the children created in step (b)(ii)(1) and give them labels of the form $({\bf i},j)$, attributes ${\hat{\mathbf{A}}}_{(\mathbf{i},j)} = \mathbf{a}_j$, and add the labeled nodes to set $\hat A_k$.
This construction may continue indefinitely, or may terminate at the end of step k if $\hat A_k = \varnothing$.
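The mechanism that keeps the two constructions aligned is that the same uniform produces both the Bernoulli edge indicator $X_{tj} = \mathbf{1}(U_{tj} > 1 - p_{tj}^{(n)})$ and the Poisson count $Z_{tj} = G^{-1}(U_{tj};\, q_{tj}^{(n)})$; for small, comparable $p$ and $q$ the two coincide except on an event of probability of order $p^2$, which is what Lemma 4.1 below quantifies. A minimal numerical sketch (toy values of $p = q$ are ours):

```python
import math
import random

def poisson_icdf(u, lam):
    """Generalized inverse G^{-1}(u; lam) of the Poisson(lam) cdf:
    the smallest m with G(m; lam) >= u."""
    m, pmf, cdf = 0, math.exp(-lam), math.exp(-lam)
    while cdf < u:
        m += 1
        pmf *= lam / m
        cdf += pmf
    return m

# Share ONE uniform between the Bernoulli indicator and the Poisson count,
# as in steps (b)(i)(1) of the two constructions.
rng = random.Random(1)
p = q = 0.01                 # toy stand-in for W_i W_j / (theta n)
N = 200_000
mismatches = 0
for _ in range(N):
    u = rng.random()
    x = 1 if u > 1 - p else 0        # edge indicator X_tj
    z = poisson_icdf(u, q)           # offspring count Z_tj
    mismatches += (x != z)
rate = mismatches / N                # empirically of order p**2
```

The mismatch region is the union of $(1-p, \textrm{e}^{-q}]$ (edge present, no child) and $(G(1;\,q), 1]$ (two or more children), each of probability roughly $q^2/2$.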
We point out that this coupling is not standard, since in most of the literature on IR models the coupling is done between the binomial distribution (the degree of a vertex) and its coupled Poisson limit (see [Reference Bollobás, Janson and Riordan6] or [Reference van der Hofstad22, Chapter 3], where the proof uses the method of moments). Moreover, most of the existing couplings consider first a multi-type Galton–Watson process with finitely many types, and then use a monotonicity argument to obtain the general case. The coupling described above avoids the need for this second step since the number of types grows as $n \to \infty$, and since it is based on coupling the individual Bernoulli random variables (edges) with their Poisson counterparts, it allows us to keep track of the vertex marks with no additional effort.
Definition 4.3. We say that the coupling breaks in generation $\tau = k$ if, for some node in $\hat A_{k-1}$, one of the following occurs:
in step (b)(i)(1) we have $Z_{tj} \neq X_{tj}$ for some $j \in \{1, \dots, n\} \setminus \hat J $;
in step (b)(i)(1) we have $Z_{tj} \geq 1$ for some $j \in (\hat B_{k-1} \cup \hat B_k) \setminus \hat J$, in which case a cycle or self-loop is created; or,
in step (b)(i)(2) we have $Z_{tj}^* \geq 1$ for some $j \in \hat J$.
We start by proving the following preliminary result. Throughout this section, let $\Delta_n \,:\!=\, \int_0^1 \big| F_n^{-1}(u) - F^{-1}(u) \big|\, \textrm{d} u \leq W_1(\nu_n, \nu)$, where $F_n(x) = \frac{1}{n} \sum_{j=1}^n \mathbf{1}(W_j \leq x)$ and $F(x) = \mathbb{P}(W \leq x)$. We also use the notation $X_n = O_{\textrm{P}}( x_n)$ as $n \to \infty$ to mean that there exists a random variable $Y_n$ such that $|X_n |\leq_{\text{st}} Y_n$ and $Y_n/x_n \xrightarrow{\textrm{P}} K$ for some finite constant K.
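The quantity $\Delta_n$ is the $L^1$ distance between quantile functions, which is attained by the monotone (quantile) coupling. For two empirical laws with the same number of atoms this reduces to matching sorted samples, which the following minimal illustration (function name and toy data ours) makes explicit:

```python
def w1_empirical(xs, ys):
    """W1 distance between two empirical laws with equally many atoms: the
    monotone (quantile) coupling matches order statistics, so the distance
    is the average absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Matched sorted pairs (1,2), (2,2), (3,4) give (1 + 0 + 1)/3.
d = w1_empirical([1, 2, 3], [4, 2, 2])
```

In the same spirit, $\Delta_n$ compares the empirical quantile function $F_n^{-1}$ with the limiting one $F^{-1}$.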
Lemma 4.1. For any $1 \leq i \leq n$ we have $\mathbb{P}_n\big( \max_{1 \leq j \leq n, j \neq i} |X_{ji} - Z_{ji}| \geq 1 \big) \leq \min\big\{ 1, \mathbf{1}(W_i > b_n) + \mathcal{P}_n(i) + \bar W_i \eta_n \big\}$, where $\mathcal{P}_n(i) = \sum_{1 \leq j \leq n, j \neq i} \big| p_{ji}^{(n)} - (r_{ji}^{(n)} \wedge 1) \big|$, $\eta_n = ( \Delta_n + g(b_n) + b_n^2/n + b_n^2 \Delta_n/(\theta n))/\theta$, and $g(x) = \mathbb{E}[(W - x)^+]$.
Proof. Let $R_{ij} = 1(U_{ij} > 1- r_{ij}^{(n)})$, with $r_{ij}^{(n)} = W_i W_j/(\theta n)$. The union bound gives
Now note that
The first probability can be computed to be $\mathbb{P}_n( |X_{ij} - Z_{ij}| \geq 1) = \big| p_{ij}^{(n)} - \big(r_{ij}^{(n)} \wedge 1\big)\big|$.
To analyze each of the probabilities involving $R_{ij}$ and $Z_{ij}$, note that
Now use the inequalities $\textrm{e}^{-x} \geq 1-x$, $\textrm{e}^{-x} - 1 + x \leq x^2/2$, and $\textrm{e}^x - 1 - x \leq x^2 \textrm{e}^x/2$ for $x \geq 0$, to obtain that
It follows that
To further bound the second term, note that if we let $W^{(n)}$ denote a random variable distributed according to $F_{n}$ and W a random variable distributed according to F, then $\frac{1}{n} \sum_{j=1}^n (W_j - b_n)^+ = \mathbb{E}_n[ (W^{(n)} - b_n)^+ ] \leq \mathbb{E}_n[ | W^{(n)} - W| + (W - b_n)^+ ] = \Delta_n + g(b_n)$. And for the last term,
We conclude that for $\eta_n$ as defined in the statement of the lemma,
which in turn yields $\mathbb{P}_n\big( \max_{1 \leq j \leq n, j \neq i} |X_{ij} - Z_{ij}| \geq 1 \big) \leq \min\big\{ 1, \mathbf{1}(W_i > b_n) + \mathcal{P}_n(i) + \eta_n \bar W_i \big\}$.
Proof of Theorem 4.1 ($m=1$) for the IR. We start by defining the following events:
Next, note that $\mathbb{P}_{n} (\tau \leq k) \leq \mathbb{P}_{n} (\tau \leq k, M_k ) + \mathbb{P}_{n} (M_k^\textrm{c}) \leq \sum_{r=1}^k \frac{1}{n} \sum_{i=1}^n \mathbb{P}_{n,i} (\tau =r, M_r ) + \mathbb{P}_{n} (M_k^\textrm{c})$, where the last probability can be bounded using the first part of Theorem 4.3 as at the end of the proof of Theorem 4.1 for the CM. Specifically, $\mathbb{P}_{n} (M_k^\textrm{c}) \leq \mathbb{P}\!\left( |\mathcal{V}_{k+1}| > s_n \right) + \mathbb{P}_n\big( \hat T^{(k)} \not\simeq \mathcal{T}^{(k)} \big)$, where $|\mathcal{V}_{k+1}| = \sum_{j=0}^{k+1} |\mathcal{A}_j| < \infty$ a.s. and the distribution of $\mathcal{T}^{(k)}$ does not depend on $\mathscr{F}_n$.
Now note that, for any $r \geq 1$, $\mathbb{P}_{n,i}(\tau = r, M_{r} ) = \mathbb{P}_{n,i}( M_{r} \cap \bigcap_{m=1}^{r-1} H_{m} \cap H_r^\textrm{c} )$, with the convention that $\bigcap_{m=1}^0 H_m = \Omega$. Let $\mathcal{F}_t$ denote the sigma-algebra that contains the history of the exploration process in the graph as well as that of its coupled tree, up to the end of step t of the graph exploration process. It follows that we can write $\mathbb{P}_{n,i}(\tau = r, M_{r}) = \mathbb{E}_{n,i}\big[ \mathbf{1}\big( M_{r-1} \cap \bigcap_{m=1}^{r-1} H_{m} \big) \mathbb{P}_n( M_r \cap H_r^\textrm{c} \mid \mathcal{F}_{r-1}) \big]$. To analyze the conditional probability inside the expectation above, note that, conditionally on $\mathcal{F}_{r-1}$, the set $A_{r-1}$ is known, and recall that the set $J = V_{r-2}$ at the beginning of step r (assuming $r \geq 2$, otherwise $J = \varnothing$). Therefore, by the union bound and the independence among the edges, we have:
Now use the independence of the edges from the rest of the exploration process and Lemma 4.1 to obtain that
Next, condition further on the exploration up to the moment we are about to explore the neighbors of i, and use the independence of the edges from the rest of the exploration process to obtain that
where in the last inequality we used $1 - \textrm{e}^{-x} \leq x$ for $x \geq 0$ and $|\mathbb{B}_i| \leq |\hat V_r| \leq s_n$.
It follows that
To analyze this remaining expectation we note that on the event $ \bigcap_{m=1}^{r-1} H_{m} $ the coupling has not been broken yet, and therefore we can replace $A_{r-1}$ with its tree counterpart $\hat A_{r-1}$. Also, note that by [Reference Lee and Olvera-Cravioto18, Lemma 3.4], the types of the nodes in each of the sets $\hat A_k$ are independent of the type of their parents. We will then identify the nodes in $\hat A_{r-1}$ as $\big\{Y_1, \dots, Y_{|\hat A_{r-1}|}\big\}$, where, for any $t \geq 1$, $\mathbb{P}_n(Y_{t} = j) = {{\bar W}_j}/{\Lambda_n}$, $j = 1, 2, \dots, n$.
It follows that
To compute the last expectation, let $(W^{(n)}, W)$ be constructed according to an optimal coupling of $F_n$ and F. Let $\bar W^{(n)} = W^{(n)} \wedge b_n$. Then, for any $c_n \geq 1$,
as $n \to \infty$, and since $\eta_n = O_{\textrm{P}}( \Delta_n + g(b_n) + b_n^2/n )$, we conclude that $\mathbb{P}_{n,i}(\tau = r, M_r ) = O_{\textrm{P}}( s_n ( g(c_n) + c_n \mathcal{E}_n + c_n \Delta_n + c_n g(b_n) + c_n b_n^2/n ) )$ as $n \to \infty$. It now follows from the beginning of the proof that $\mathbb{P}_n( \tau \leq k ) \leq O_{\textrm{P}}( k s_n ( g(c_n) + c_n \mathcal{E}_n + c_n \Delta_n + c_n g(b_n) + c_n b_n^2/n ) ) + \mathbb{P}( |\mathcal{V}_{k+1}| > s_n ) + \mathbb{P}_n\big( \hat T^{(k)} \not\simeq \mathcal{T}^{(k)} \big)$ as $n \to \infty$. Since $\lim_{x \to \infty} g(x) = 0$, choosing $c_n = ( \mathcal{E}_n + \Delta_n + g(b_n) + b_n^2/n )^{-1/2}$ and $s_n = (g(c_n) + c_n^{-1/2})^{-1/2}$ proves the theorem.
4.2. Discrete coupling for directed graphs
The equivalent of Theorem 4.1 ($m=1$) for directed graphs has already been proved, under conditions equivalent to those in Assumption 3.1, in [Reference Olvera-Cravioto20, Theorem 6.3] for the DCM, and in [Reference Lee and Olvera-Cravioto18, Theorem 3.7] for the IRD. Hence, we only need to describe the distribution of the intermediate tree and state the coupling theorem. The descriptions of the couplings follow, with some adjustments, those from Sections 4.1.1 and 4.1.2. However, the precise descriptions in the directed case can be found in [Reference Chen, Livtak and Olvera-Cravioto9, Section 5.2] for the DCM and [Reference Lee and Olvera-Cravioto18, Section 3.2.2] for the IRD.
In the directed case, the intermediate tree $\hat T$ is constructed using a sequence of conditionally independent (given $\mathscr{F}_n$) random vectors $\{ (\hat N_\mathbf{i}, \hat D_\mathbf{i}, {\hat{\mathbf{A}}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}\}$ in $\mathcal{S}$, with $\{ (\hat N_\mathbf{i}, \hat D_\mathbf{i}, {\hat{\mathbf{A}}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}, \mathbf{i} \neq \emptyset \}$ conditionally i.i.d. The tree $\hat T$ is constructed as in the undirected case using the $\{\hat N_\mathbf{i}\}$, with all edges pointing towards the root, and the full marks take the form ${\hat{\mathbf{X}}}_\mathbf{i} = (\hat N_\mathbf{i}, \hat D_\mathbf{i}, {\hat{\mathbf{A}}}_\mathbf{i})$, $\mathbf{i} \in \mathcal{U}$. The marked tree is given by $\hat T({\hat{\mathbf{A}}}) = \{ {\hat{\mathbf{X}}}_\mathbf{i}\,:\, \mathbf{i} \in \hat T \}$.
We now specify the distribution of the full marks, which in the case of a DCM is given by
For the IRD model, first let $\{a_n\}$ and $\{b_n\}$ be sequences such that $a_n \wedge b_n \xrightarrow{\textrm{P}} \infty$ and $a_n b_n/n \xrightarrow{\textrm{P}} 0$ as $n\to\infty$, and use them to define $\bar W_i^- = W_i^- \wedge a_n$ and $\bar W_i^+ = W_i^+ \wedge b_n$, with $\Lambda_n^- = \sum_{i=1}^n \bar W_i^-$ and $\Lambda_n^+ = \sum_{i=1}^n \bar W_i^+$. The marks on the coupled marked Galton–Watson process are given by
where, conditionally on ${\bf a}_i$, $D_i^-$ and $D_i^+$ are independent Poisson random variables with means $\Lambda_n^+ \bar W_i^-/(\theta n)$ and $\Lambda_n^- \bar W_i^+/(\theta n)$, respectively.
The intermediate coupling theorem for directed graphs is given below, and it is a direct consequence of [Reference Olvera-Cravioto20, Theorem 6.3] and [Reference Lee and Olvera-Cravioto18, Theorem 3.7].
Theorem 4.2. Suppose $G(V_n, E_n)$ is either a DCM or an IRD satisfying Assumption 3.1. Then, for $\mathcal{G}^{(k)}_I({\bf a})$ the depth-k neighborhood of a uniformly chosen vertex $I \in V_n$, there exists a marked Galton–Watson tree $\hat T^{(k)} ({\hat{\mathbf{A}}})$, restricted to its first k generations, whose root corresponds to vertex I and is such that, for any fixed $k \geq 1$, $\mathbb{P}_n\big( \mathcal{G}_I^{(k)}(\mathbf{a}) \not\simeq \hat T^{(k)}({\hat{\mathbf{A}}}) \big) \xrightarrow{\textrm{P}} 0$, $n \to \infty$.
4.3. Coupling between two trees
In view of Theorems 4.1 and 4.2, the proofs of the main theorems, Theorems 2.1 and 3.1 ($m = 1$), will be complete once we establish that with high probability the intermediate tree $\hat T^{(k)}$ is isomorphic to the limiting tree $\mathcal{T}^{(k)}$, and that the node marks in the two trees are within $\varepsilon$ distance of each other.
Note that there is no need to consider the undirected and directed cases separately, since they only differ in the sample space for the full marks, ${\hat{\mathbf{X}}}_\mathbf{i}$ / $\boldsymbol{{X}}_\mathbf{i}$, which take values in $\mathcal{S} = \mathbb{N} \times \mathbb{R} \times \mathcal{S}^{\prime}$ in the undirected case and $\mathcal{S} = \mathbb{N} \times \mathbb{N} \times \mathbb{R} \times \mathbb{R} \times \mathcal{S}^{\prime}$ in the directed one. For the directed case, all edges in the trees point towards the root.
The coupling theorem between the two trees is the following. The proofs of the main theorems, Theorems 2.1 and 3.1 ($m=1$), follow directly from combining Theorems 4.1 and 4.3 in the undirected case, and Theorems 4.2 and 4.3 in the directed one.
Theorem 4.3. Under Assumption 2.1 or 3.1, as appropriate, there exists a coupling of $\hat T^{(k)}({\hat{\mathbf{A}}})$ and $\mathcal{T}^{(k)}(\boldsymbol{{A}})$ such that $\mathbb{P}_n \big( \hat T^{(k)} \not\simeq \mathcal{T}^{(k)} \big) \xrightarrow{\textrm{P}} 0$, $n \to \infty$, and such that, for any $\varepsilon > 0$, $\mathbb{E}_n\big[ \rho({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) \big] \xrightarrow{\textrm{P}} 0$ and $\mathbb{P}_n \big( \bigcap_{{\bf i} \in \mathcal{T}^{(k)}} \{ \rho({\hat{\mathbf{X}}}_{\bf i}, \boldsymbol{{X}}_{\bf i}) \leq \varepsilon \} , \, \hat T^{(k)} \simeq \mathcal{T}^{(k)} \big) \xrightarrow{\textrm{P}} 1$ as $n \to \infty$.
Before proving Theorem 4.3, we need to prove a couple of technical lemmas. The first establishes the existence of couplings for the node attributes, whose distributions are given by $\nu_n({\cdot}) = \mathbb{P}_n\big( {\hat{\mathbf{A}}} \in \cdot \,\big) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}(\mathbf{a}_i \in \cdot)$ and $\nu({\cdot}) = \mathbb{P}(\boldsymbol{{A}} \in \cdot)$, and their size-biased versions. Recall that in the undirected case the node attributes are of the form $\mathbf{a}_i = (D_i, \mathbf{b}_i)$ in the CM and $\mathbf{a}_i = (W_i, \mathbf{b}_i)$ in the IR, while in the directed case they take the form $\mathbf{a}_i = (D_i^-, D_i^+, \mathbf{b}_i)$ in the DCM and $\mathbf{a}_i = (W_i^-, W_i^+, \mathbf{b}_i)$ in the IRD. In the undirected case, the size-bias is done with respect to the first coordinate, while in the directed case with respect to the second one. Specifically, the size-biased attributes in the undirected case take the form
and
while in the directed case they take the form
and
For the undirected case, let $\rho^{\prime\prime}$ be the metric on $\mathcal{S}^{\prime\prime} = [0,\infty) \times \mathcal{S}^{\prime}$ given by $\rho^{\prime\prime}(\mathbf{x}, \mathbf{y}) = |x_1 - y_1| + \rho^{\prime}(\mathbf{x}_2, \mathbf{y}_2)$, $\mathbf{x} = (x_1, \mathbf{x}_2)$, $\mathbf{y} = (y_1, \mathbf{y}_2)$, and for the directed case let $\rho^{\prime\prime}$ be the metric on $\mathcal{S}^{\prime\prime} = [0,\infty) \times [0,\infty) \times \mathcal{S}^{\prime}$ given by $\rho^{\prime\prime}(\mathbf{x}, \mathbf{y}) = |x_1 - y_1| + |x_2-y_2| + \rho^{\prime}(\mathbf{x}_3, \mathbf{y}_3)$, $\mathbf{x} = (x_1, x_2, \mathbf{x}_3)$, $\mathbf{y} = (y_1,y_2, \mathbf{y}_3)$.
Lemma 4.2. Under Assumption 2.1 or 3.1, as appropriate, there exist couplings $\big({\hat{\mathbf{A}}}, \boldsymbol{{A}}\big)$ and $\big({\hat{\mathbf{A}}}_b, \boldsymbol{{A}}_b\big)$ constructed on the same probability space $(\mathcal{S}^{\prime\prime}, \mathscr{F}_n, \mathbb{P}_n)$ such that $\mathbb{E}_n\big[ \rho^{\prime\prime}( {\hat{\mathbf{A}}}, \boldsymbol{{A}}) \big] \xrightarrow{\textrm{P}} 0$, $\rho^{\prime\prime}\big( {\hat{\mathbf{A}}}, \boldsymbol{{A}}\big) \xrightarrow{\textrm{P}} 0$, and $\rho^{\prime\prime}\big( {\hat{\mathbf{A}}}_b, \boldsymbol{{A}}_b\big) \xrightarrow{\textrm{P}} 0$ as $n \to \infty$.
Proof. Assumptions 2.1 and 3.1 state that $W_1(\nu_n, \nu) \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, and by the properties of the Wasserstein metric (see [Reference Villani24, Theorem 4.1]), there exists an optimal coupling $\big({\hat{\mathbf{A}}}, \boldsymbol{{A}}\big)$ such that $\mathbb{E}_n\big[ \rho^{\prime\prime}( {\hat{\mathbf{A}}}, \boldsymbol{{A}}) \big] = W_1(\nu_n, \nu) \xrightarrow{\textrm{P}} 0$ and $\rho^{\prime\prime}\big( {\hat{\mathbf{A}}}, \boldsymbol{{A}}\big) \xrightarrow{\textrm{P}} 0$ as $n \to \infty$.
For the biased versions, note that it suffices to prove the lemma for the undirected case, since a simple rearrangement of terms such as $\mathbf{a}^{\prime}_i = (D^{\prime}_i, \mathbf{b}^{\prime}_i) \,:\!=\, (D_i^+, D_i^-, \mathbf{b}_i)$ or $\mathbf{a}^{\prime}_i = (W^{\prime}_i, \mathbf{b}^{\prime}_i) \,:\!=\, (W_i^+, W_i^-, \mathbf{b}_i)$ reduces the directed case to the undirected one. Throughout the remainder of the proof, we write ${\hat{\mathbf{A}}} = (\hat Y, {\hat{\mathbf{B}}})$ and $\boldsymbol{{A}} = (Y, \boldsymbol{{B}})$ to avoid having to separate the CM and IR cases.
Next, note that we only need to show that ${\hat{\mathbf{A}}}_b \Rightarrow \boldsymbol{{A}}_b$ as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution, since then we can take the almost sure representation to obtain that $\rho^{\prime\prime}\big({\hat{\mathbf{A}}}_b, \boldsymbol{{A}}_b\big) \xrightarrow{\textrm{P}} 0$. To this end, let $f\,:\, \mathcal{S}^{\prime\prime} \to \mathbb{R}$ be a bounded and continuous function, and let $\big({\hat{\mathbf{A}}}, \boldsymbol{{A}}\big)$ be the one from the beginning of the proof. Let $\tilde Y = \hat Y$ if the graph is a CM or $\tilde Y = \hat Y \wedge b_n$ if it is an IR. Then,
Since $W_1(\nu_n, \nu) \xrightarrow{\textrm{P}} 0$ implies that $\mathbb{E}_n[ |\hat Y - Y| ] \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, we have $ \mathbb{E}_n\big[ \big|\tilde Y - Y\big| \big] \leq \mathbb{E}_n\big[ \big|\hat Y - Y \big| \big] + \mathbb{E} [ Y 1(Y > b_n) ] \xrightarrow{\textrm{P}} 0$ and $\mathbb{E}_n [\tilde Y] \xrightarrow{\textrm{P}} \mathbb{E}[Y]$ as $n \to \infty$. And, by the dominated convergence theorem, $\lim_{n \to \infty} \mathbb{E}\big[ \mathbb{E}_n \big[ Y \big|f({\hat{\mathbf{A}}}) - f(\boldsymbol{{A}})\big| \big] \big] = \mathbb{E}\big[ \lim_{n \to \infty} Y \big|f({\hat{\mathbf{A}}}) - f(\boldsymbol{{A}})\big| \big] = 0$. Hence, $ \mathbb{E}_n \big[ Y \big|f({\hat{\mathbf{A}}}) - f(\boldsymbol{{A}})\big| \big] \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, and ${\hat{\mathbf{A}}}_b \Rightarrow \boldsymbol{{A}}_b$ as required.
The second technical lemma relates the convergence of the attributes to that of the full marks.
Lemma 4.3. Suppose Assumption 2.1 or 3.1 holds, as appropriate, and let $({\hat{\mathbf{A}}}, \boldsymbol{{A}})$ and $({\hat{\mathbf{A}}}_b, \boldsymbol{{A}}_b)$ be the couplings in Lemma 4.2. Then, there exist couplings $({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset)$ and $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1)$, constructed on the same probability space as $({\hat{\mathbf{A}}}, \boldsymbol{{A}})$ and $({\hat{\mathbf{A}}}_b, \boldsymbol{{A}}_b)$, such that $\mathbb{E}_n\big[ \rho( {\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) \big] \xrightarrow{\textrm{P}} 0$, $\rho\big( {\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset\big) \xrightarrow{\textrm{P}} 0$, and $\rho\big( {\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1\big) \xrightarrow{\textrm{P}} 0 $ as $n \to \infty$.
Proof. For the two undirected models, CM and IR, write:
For the two directed models, DCM and IRD, write:
To obtain the statement of the lemma for the CM, simply set $({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) = (\hat Y, {\hat{\mathbf{A}}}, Y, \boldsymbol{{A}})$ and $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1) = (\hat Y_b, {\hat{\mathbf{A}}}_b, Y_b, \boldsymbol{{A}}_b)$. Similarly, for the DCM set $({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) = (\hat Y^-, \hat Y^+, {\hat{\mathbf{A}}}, Y^-, Y^+, \boldsymbol{{A}})$ and $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1) = (\hat Y_b^-, \hat Y_b^+, {\hat{\mathbf{A}}}_b, Y_b^-, Y_b^+, \boldsymbol{{A}}_b)$.
For the IR, construct $(\hat S, S) = ( \Lambda_n (\hat Y \wedge b_n)/(\theta n), \, Y )$. Note that our assumptions imply that $\mathbb{E}_n \big[ |\hat S - S| \big] \xrightarrow{\textrm{P}} 0$ as $n \to \infty$. Now let $U\sim \textrm{Uniform}[0, 1]$ be independent of $(\hat S, S)$, and take $({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset ) = \big( G^{-1}(U;\, \hat S), {\hat{\mathbf{A}}}, G^{-1}(U;\, Y), \boldsymbol{{A}} \big)$, where $G^{-1}(u;\, \lambda) = \sum_{m=0}^\infty m 1(G(m;\, \lambda) \leq u < G(m+1;\, \lambda))$ is the generalized inverse of the Poisson distribution function with mean $\lambda$. Note that since $G(m;\, \lambda)$ is decreasing in $\lambda$ for all $m \geq 0$, we have $\text{Poi}(\lambda) \geq_{\text{st}} \text{Poi}(\mu)$ whenever $\lambda \geq \mu$, where $\geq_{\text{st}}$ denotes the usual stochastic order and $\text{Poi}(\alpha)$ denotes a Poisson random variable with mean $\alpha$. It follows that $\mathbb{E}\big[\big| G^{-1}(U;\, \lambda)- G^{-1}(U;\, \mu) \big|\big] = |\lambda-\mu|$, which in turn implies that $\mathbb{E}_n\big[ \rho( {\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) \big] = \mathbb{E}_n\big[ |\hat S - S| + \rho^{\prime\prime}({\hat{\mathbf{A}}}, \boldsymbol{{A}}) \big] \xrightarrow{\textrm{P}} 0$ as $n \to \infty$.
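The quantile coupling used in this step can be checked numerically. The following sketch (function names are ours, not from the paper) implements the Poisson quantile under the standard convention $G^{-1}(u;\lambda) = \inf\{m \geq 0\,:\, G(m;\lambda) \geq u\}$ with $G(m;\lambda) = \mathbb{P}(\text{Poi}(\lambda) \leq m)$, which may differ from the paper's indexing only in its tie-breaking, and verifies by Monte Carlo that $\mathbb{E}|G^{-1}(U;\lambda) - G^{-1}(U;\mu)| \approx |\lambda - \mu|$ under a common uniform $U$:

```python
import math
import random

def poisson_quantile(u, lam):
    """Smallest m with P(Poi(lam) <= m) >= u, computed by walking the CDF."""
    m, p = 0, math.exp(-lam)   # p holds the current pmf term
    cdf = p
    while cdf < u:
        m += 1
        p *= lam / m
        cdf += p
    return m

def mean_abs_gap(lam, mu, n=100_000, seed=42):
    """Monte Carlo estimate of E|G^{-1}(U; lam) - G^{-1}(U; mu)| under the
    common-uniform (quantile) coupling; monotonicity in lam makes the
    difference have constant sign, so this should be close to |lam - mu|."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        u = rng.random()
        total += abs(poisson_quantile(u, lam) - poisson_quantile(u, mu))
    return total / n

print(mean_abs_gap(3.0, 2.0))  # close to |3 - 2| = 1
```

The constant sign of the difference (stochastic monotonicity in $\lambda$) is exactly what turns the expected absolute gap into the difference of the means.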
For the size-biased versions, set $( \hat S_b, S_b) = ( \Lambda_n (\hat Y_b \wedge b_n)/(\theta n), \, Y_b )$, note that Lemma 4.2 gives $| \hat S_b - S_b| \xrightarrow{\textrm{P}} 0$ as $n \to \infty$, and let $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1 ) = \big( G^{-1}(U;\, \hat S_b)+1, {\hat{\mathbf{A}}}_b, G^{-1}(U;\, Y_b)+1, \boldsymbol{{A}}_b \big)$. Now use the continuity in $\lambda$ of $G^{-1}(u;\, \lambda)$ to obtain that $\rho( {\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1 ) = \big| G^{-1}(U;\, \hat S_b) - G^{-1}(U;\, Y_b) \big| + \rho^{\prime\prime}( {\hat{\mathbf{A}}}_b, \boldsymbol{{A}}_b) \xrightarrow{\textrm{P}} 0$, $n \to \infty$.
The same steps also give the result for the IRD by setting
where $c = \mathbb{E}[W^+]/\mathbb{E}[W^- + W^+]$, and setting
for some $U, U^{\prime}$ i.i.d. $\textrm{Uniform}[0, 1]$ and independent of $\mathscr{F}_n$; this completes the proof.
Finally, we can give the proof of Theorem 4.3.
Proof of Theorem 4.3. By Lemma 4.3 there exist couplings $({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset)$ and $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1)$ such that $\mathbb{E}_n\big[ \rho( {\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) \big] \xrightarrow{\textrm{P}} 0$ and $\rho({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1) \xrightarrow{\textrm{P}} 0$ as $n \to \infty$. Now let $\{ ({\hat{\mathbf{X}}}_\mathbf{i}, \boldsymbol{{X}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}, \mathbf{i} \neq \emptyset\}$ be i.i.d. copies of $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1)$, independent of $({\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset)$. Recall that $\hat N_\mathbf{i}$ ($\mathcal{N}_\mathbf{i}$) can be determined from the first coordinate of ${\hat{\mathbf{X}}}_\mathbf{i}$ ($\boldsymbol{{X}}_\mathbf{i}$).
We will now use the sequence $\{ ({\hat{\mathbf{X}}}_\mathbf{i}, \boldsymbol{{X}}_\mathbf{i})\,:\, \mathbf{i} \in \mathcal{U}\}$ to construct both $\hat T({\hat{\mathbf{A}}})$ and $\mathcal{T}(\boldsymbol{{A}})$ by determining their nodes according to the recursions $\hat A_k = \{ (\mathbf{i}, j)\,:\, \mathbf{i} \in \hat A_{k-1}, 1 \leq j \leq \hat N_\mathbf{i} \}$ and $\mathcal{A}_k = \{ (\mathbf{i},j)\,:\, \mathbf{i} \in \mathcal{A}_{k-1}, 1 \leq j \leq \mathcal{N}_\mathbf{i} \}$ for $k \geq 1$. Without loss of generality, assume that $0 < \varepsilon < 1$.
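The generation-by-generation recursion $\hat A_k = \{ (\mathbf{i}, j)\,:\, \mathbf{i} \in \hat A_{k-1}, 1 \leq j \leq \hat N_\mathbf{i} \}$ can be sketched in code. The following toy version (all names are ours) grows the first $k$ generations of a Galton–Watson tree with Ulam–Harris labels, using Poisson offspring counts as a stand-in for the offspring part of the full marks:

```python
import random

def grow_generations(k, offspring_mean=1.5, seed=7):
    """Grow generations A_0, ..., A_k of a Galton-Watson tree using
    Ulam-Harris labels: a node is a tuple i, its children are i + (j,)."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication method; adequate for small lam.
        L, p, n = pow(2.718281828459045, -lam), 1.0, 0
        while True:
            p *= rng.random()
            if p <= L:
                return n
            n += 1

    generations = [[()]]                    # A_0 = {root}, the empty tuple
    N = {(): poisson(offspring_mean)}       # offspring counts (the N_i)
    for _ in range(k):
        nxt = []
        for i in generations[-1]:
            for j in range(1, N[i] + 1):
                child = i + (j,)
                N[child] = poisson(offspring_mean)
                nxt.append(child)
        generations.append(nxt)
    return generations, N

gens, N = grow_generations(3)
# Every generation-r node extends some generation-(r-1) node by one index.
assert all(c[:-1] in set(gens[r - 1]) for r in range(1, 4) for c in gens[r])
```

Running the same recursion with $\hat N_\mathbf{i}$ and $\mathcal{N}_\mathbf{i}$ read off the coupled marks is precisely what builds $\hat T({\hat{\mathbf{A}}})$ and $\mathcal{T}(\boldsymbol{{A}})$ on one probability space.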
Now define the stopping time $\kappa(\varepsilon) = \inf\big\{ k \geq 0\,:\, \rho({\hat{\mathbf{X}}}_\mathbf{i}, \boldsymbol{{X}}_\mathbf{i}) > \varepsilon \text{ for some } \mathbf{i} \in \hat A_k \big\}$. Note that since $\hat N_\mathbf{i}$ and $\mathcal{N}_\mathbf{i}$ are integer-valued, $\rho ({\hat{\mathbf{X}}}_\mathbf{i}, \boldsymbol{{X}}_\mathbf{i}) < 1$ implies that $\hat N_\mathbf{i} = \mathcal{N}_\mathbf{i}$. It follows that, for any $x_n \geq 1$,
where $\mathcal{V}_k = \bigcup_{r=0}^k \mathcal{A}_r$. To compute the last probabilities, note that $\mathbb{P}_n(\kappa(\varepsilon) = 0) \leq \varepsilon^{-1} \mathbb{E}_n\big[ \rho( {\hat{\mathbf{X}}}_\emptyset, \boldsymbol{{X}}_\emptyset) \big]$, and, for $r \geq 1$,
where in the third step we used the independence of $({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1)$ from $\mathcal{A}_r$. It follows that, if we choose $x_n = \mathbb{P}_n \big( \rho({\hat{\mathbf{X}}}_1, \boldsymbol{{X}}_1) > \varepsilon \big)^{-1/2} \xrightarrow{\textrm{P}} \infty$, then
as $n \to \infty$. This completes the proof.
The last proof in the paper relates to the case $m \geq 2$ for both Theorems 2.1 and 3.1. Since the proof for the directed case follows exactly the same steps as for the undirected one, we include here only the undirected case.
Proof of Theorem 2.1 $(m \geq 2)$. Start by sampling $\{I_j\,:\, 1 \leq j \leq m\}$ independently and uniformly in $V_n$. Without loss of generality we can assume that $I_1, I_2, \dots, I_m$ are all distinct. Next, note that the couplings for $\mathcal{G}_i^{(k)}(\mathbf{a})$ with their corresponding intermediate trees can be done simultaneously for all $i \in V_n$, so let $\hat T^{(k)}_{\emptyset(i)}({\hat{\mathbf{A}}})$ be the coupled tree for $\mathcal{G}_i^{(k)}(\mathbf{a})$. Note that the $\big\{ \hat T^{(k)}_{\emptyset(I_j)}({\hat{\mathbf{A}}})\,:\, 1 \leq j \leq m\big\}$ are not independent of each other, but by Theorem 2.1 for the $m = 1$ case, they satisfy $\sum_{j=1}^m \mathbb{E}_n\big[ \rho(\mathbf{X}_{I_j}, {\hat{\mathbf{X}}}_{\emptyset(I_j)}) \big] \xrightarrow{\textrm{P}} 0$ and $\mathbb{P}_n\big( \bigcup_{j=1}^m \big\{ \mathcal{G}_{I_j}^{(k)} (\mathbf{a}) \not\simeq \hat T^{(k)}_{\emptyset(I_j)} ({\hat{\mathbf{A}}}) \big\} \big) \xrightarrow{\textrm{P}} 0$ as $n \to \infty$. We now explain how to construct a set of i.i.d. copies of $\hat T_{\emptyset(I_1)}^{(k)} ({\hat{\mathbf{A}}})$, denoted $\big\{ \tilde T_{\emptyset(I_j)}^{(k)}({\tilde{\mathbf{A}}}) \,:\, 1 \leq j \leq m\big\}$, satisfying
To start, let $\tilde T_{\emptyset(I_1)}^{(k)}({\tilde{\mathbf{A}}}) = \hat T_{\emptyset(I_1)}^{(k)}({\hat{\mathbf{A}}})$. Next, note that the trees $\big\{ \hat T_{\emptyset(I_j)}^{(k)}({\hat{\mathbf{A}}})\,:\, 1 \leq j \leq m \big\}$ are each (delayed) marked Galton–Watson processes whose roots have distribution $\mu_n^*({\cdot}) = \mathbb{P}_n\big( {\hat{\mathbf{X}}}_\emptyset \in \cdot \,\big)$ and all other nodes have distribution $\mu_n({\cdot}) = \mathbb{P}_n\big( {\hat{\mathbf{X}}}_1 \in \cdot \,\big)$. Since each full mark ${\hat{\mathbf{X}}}_\mathbf{i}$ contains the vector ${\hat{\mathbf{A}}}_\mathbf{i}$, we can see which vertex attributes have been sampled. Note that the possible vertex attributes are $\{\mathbf{a}_1, \dots, \mathbf{a}_n\}$; they are such that $p_i^* = \mathbb{P}_n( {\hat{\mathbf{A}}}_\emptyset = \mathbf{a}_i) = 1/n$, and $p_i = \mathbb{P}_n({\hat{\mathbf{A}}}_1 = \mathbf{a}_i)$ is either equal to $D_i/L_n$ in the CM or to $\bar W_i/\Lambda_n$ in the IR. We need to keep track of the labels $\{1, 2, \dots, n\}$ of the vertex attributes $\{\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n\}$. To do this, let $S$ be the set of labels sampled in the construction of $\tilde T_{\emptyset(I_1)}^{(k)}({\tilde{\mathbf{A}}})$, and then construct each of $\tilde T_{\emptyset(I_j)}^{(k)}({\tilde{\mathbf{A}}})$, $2 \leq j \leq m$, in a breadth-first fashion according to the following rule:
• If a node in $\hat T_{\emptyset(I_j)}^{(k)}({\hat{\mathbf{A}}})$ has a vertex attribute whose label is not in the set $S$, copy the node onto $\tilde T_{\emptyset(I_j)}^{(k)}({\tilde{\mathbf{A}}})$ and add the newly observed label to the set $S$.
• Otherwise, attach an independent copy of a marked Galton–Watson tree having full mark distribution $\mu_n$ where the repeated node would be.
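The relabeling rule above can be sketched as follows. In this toy version (all names are ours) a tree is reduced to its sequence of attribute labels in breadth-first order, and a repeated label is replaced by a freshly sampled one, standing in for the attachment of an independent subtree:

```python
import random

def decouple_labels(trees, n, seed=0):
    """Toy version of the breadth-first rule: labels not yet in S are kept
    (and added to S); repeated labels are replaced by a fresh random label,
    mimicking the attachment of an independent marked Galton-Watson tree."""
    rng = random.Random(seed)
    S = set()
    out = []
    for tree in trees:              # tree = list of labels in BFS order
        new_tree = []
        for label in tree:
            if label not in S:
                S.add(label)
                new_tree.append(label)
            else:
                # Stand-in for an independently resampled subtree root.
                new_tree.append(rng.randrange(n))
        out.append(new_tree)
    return out

# The first tree seeds S and is kept intact; in the second tree only the
# repeated label 2 triggers a replacement.
out = decouple_labels([[1, 2, 3], [2, 4, 5]], 1000)
```

The point of the construction is that collisions with $S$ are rare when $n$ is large, which is exactly what the binomial bound in the next step quantifies.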
As long as we do not sample any vertex attributes with labels in the set $S$, we have $\tilde T_{\emptyset(I_j)}^{(k)}({\tilde{\mathbf{A}}}) \simeq \hat T_{\emptyset(I_j)}^{(k)}({\hat{\mathbf{A}}})$. Hence, for any $x_n> 0$ and $M_n$ equal to either $L_n$ for a CM or $\Lambda_n$ for an IR, we have
For the first probability, note that $\sum_{j=1}^m\big| \hat T_{\emptyset(I_j)}^{(k)} \big| \leq mx_n$ and the probability of sampling a label from $S$ is at most $m x_n/M_n$, so (4.2) is bounded by $\mathbb{P}_n( \text{Bin}(m x_n, mx_n/M_n) \geq 1) \leq {m^2 x_n^2 }/{M_n}$, where $\text{Bin}(n,p)$ is a binomial random variable with parameters (n, p). To analyze (4.3), let $Y_\mathbf{i} = p_i M_n$ if ${\hat{\mathbf{A}}}_\mathbf{i} = \mathbf{a}_i$, and note that $\sum_{i\in S} p_i = \sum_{j=1}^m \sum_{\mathbf{i} \in \hat T_{\emptyset(I_j)}^{(k)}} Y_\mathbf{i}/M_n$, so by the union bound we obtain that (4.3) is bounded from above by
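The elementary bound $\mathbb{P}( \text{Bin}(n, p) \geq 1) = 1 - (1-p)^n \leq np$ used here follows from the union bound over the $n$ trials; a quick numerical check (function name is ours):

```python
def p_bin_at_least_one(n, p):
    """Exact P(Bin(n, p) >= 1) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n

# Union bound used in the proof: P(Bin(n, p) >= 1) <= n * p.
for n, p in [(10, 0.01), (1000, 0.0005), (5, 0.3)]:
    assert p_bin_at_least_one(n, p) <= n * p
```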
Now use Theorem 4.3 to obtain that, for any $\varepsilon > 0$,
as $n \to \infty$, where $\mathcal{Y}_\mathbf{i}$ is equal to the second component of $\boldsymbol{{X}}_\mathbf{i}$. Since $|\mathcal{T}^{(k)}| < \infty$ almost surely and does not depend on $\mathscr{F}_n$, and $1/M_n = O_\mathbb{P}(1/n)$ as $n \to \infty$, choosing $x_n = n^{1/2}/\log n$ proves (4.1).
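To spell out why this choice of $x_n$ suffices: since $1/M_n = O_\mathbb{P}(1/n)$ and $x_n = n^{1/2}/\log n$,

```latex
\frac{m^2 x_n^2}{M_n}
  = \frac{m^2\, n/(\log n)^2}{M_n}
  = O_{\mathbb{P}}\!\left( \frac{m^2}{(\log n)^2} \right)
  \xrightarrow{\textrm{P}} 0, \qquad n \to \infty,
```

so the bound obtained for (4.2) vanishes as well.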
Finally, use Theorem 4.3 applied to each of the $\big\{ \tilde T_{\emptyset(I_j)}^{(k)}({\tilde{\mathbf{A}}})\,:\, 1 \leq j \leq m\big\}$ to obtain that there exists an i.i.d. set $\big\{ \mathcal{T}^{(k)}_{\emptyset(I_j)}(\boldsymbol{{A}})\,:\, 1 \leq j \leq m \big\}$ having the same distribution as $\mathcal{T}^{(k)}(\boldsymbol{{A}})$ such that, for any $\varepsilon \in (0,1)$, $\sum_{j=1}^m \mathbb{E}_n\big[ \rho({\tilde{\mathbf{X}}}_{\emptyset(I_j)}, \boldsymbol{{X}}_{\emptyset(I_j)}) \big] \xrightarrow{\textrm{P}} 0$ and
as $n \to \infty$. This completes the proof.
Acknowledgements
I would like to thank Sayan Banerjee for suggesting the connection between strong couplings and the notion of propagation of chaos.
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare during the preparation or publication of this article.