1. Introduction
In social network analysis, reciprocal edges characterize communication between two users. For instance, on Facebook, one user leaves messages on another user’s wall page, and a response from the target user then creates a reciprocal edge. Reciprocity, which is classically defined as the proportion of reciprocal edges (cf. [Reference Newman, Forrest and Balthrop12, Reference Wasserman and Faust21]), is one important network metric to measure interactions among individual users. The directed network constructed from Facebook wall posts [Reference Viswanath, Mislove, Cha and Gummadi17] is one example of social networks with a large proportion of reciprocal edges. The study on eight different types of networks in [Reference Jiang, Zhang and Towsley7] shows that online social networks (e.g. [Reference Cha, Mislove and Gummadi3, Reference Java, Song, Finin and Tseng6, Reference Magno9, Reference Mislove10, Reference Viswanath, Mislove, Cha and Gummadi17]) tend to have a higher proportion of reciprocal edges, compared to other types of networks such as biological networks, communication networks, software call graphs, and P2P networks.
Another widely observed feature of directed social networks is the scale-free property, where both in- and out-degree distributions have Pareto-like tails. The directed preferential attachment (PA) network model is appealing (cf. [Reference Bollobás, Borgs, Chayes and Riordan2, Reference Krapivsky and Redner8]), since theoretically the directed PA mechanism generates a network with the scale-free property because nodes with large degrees are likely to attract more edges than those with small degrees (cf. [Reference Resnick and Samorodnitsky14, Reference Samorodnitsky15, Reference Wan, Wang, Davis and Resnick18, Reference Wang and Resnick20]). However, the asymptotic behavior of the proportion of reciprocal edges in a directed PA model has not yet been explored in the literature. In this paper, we derive asymptotic results about (1) the number of reciprocal edges between two fixed nodes; (2) the proportion of reciprocal edges in a directed PA network, provided that the network has a large number of edges; and (3) the first time a reciprocal edge appears between two distinct fixed nodes.
Our theoretical results suggest that for certain choices of model parameters, especially when the proportion of edges added between two existing nodes is small, the proportion of reciprocal edges in the entire graph is close to 0, even though the total number of reciprocal edges between two fixed nodes may be of order $O(n^a)$ , $a\in (0,1)$ . Such behavior flags potential problems for fitting a directed PA model in practice. When fitting a directed PA model to a real network with high reciprocity using existing methods developed in [Reference Wan, Wang, Davis and Resnick18, Reference Wan, Wang, Davis and Resnick19], there is no guarantee that the calibrated model will also have a high proportion of reciprocal edges. Such a discrepancy indicates a poor fit of the PA model, since the fitted model fails to capture the important feature of high reciprocity. In these cases, variants of the directed PA model need to be considered. For instance, [Reference Cheng, Romero, Meeder and Kleinberg4] provides several different ways to predict the reciprocal edges between two given nodes, and one may incorporate those features to construct a refined network model that is both scale-free and of high reciprocity.
The rest of the paper is organized as follows. Section 1.1 provides the notation and definitions necessary to specify a growing sequence of graphs that evolve according to PA. The definitions use model parameters that control the growth of power law sequences. In Section 2, we give the power law growth of the in- and out-degree sequences. This leads to the main results of the paper, in Section 3:
-
1. For fixed nodes i, j, the number of reciprocal edges between i and j evolves as a power law.
-
2. For specified subsets of the parameters, the proportion of reciprocal edges in the graph goes to 0, showing that many real data sets with significant reciprocity do not follow this model.
-
3. For fixed nodes i, j we provide information about the first time a reciprocal pair of edges forms between i and j.
Section 4 contains a short discussion of theoretical results and future research directions, and the appendix gives some lemmas and proofs.
1.1. Model setup
Here is the specification of the classical directed PA model. Initialize the model with the graph G(0), which consists of one node, labeled node 1, and a self-loop. After n steps in the construction, $G(n)=(V(n), E(n))$ is the graph with node set V(n) with $V(0) = \{1\}$ and $|V(0)| = {1}$ and set of directed edges E(n) such that an ordered pair $(i,j)\in E(n)$ , $i,j\in V(n)$ , represents a directed edge $i\mapsto j$ . When $n=0$ , we have $E(0) = \{(1,1)\}$ . For later use, self-loops are not counted for reciprocity.
Let $\bigl(D^{\text{in}}_v(n), D^{\text{out}}_v(n)\bigr)$ be the in- and out-degrees of node $v \in V(n)$ with the convention that if $v \notin V(n)$ , $D^{\text{in}}_v(n) = 0 =D^{\text{out}}_v(n)$ . From G(n) to $G(n+1)$ , one of three scenarios happens:
-
1. With probability $\alpha$ , we add a new edge (v, w), where $w\in V(n)$ , and $v\notin V(n)$ is a new node. The existing node w is chosen with probability
\[\frac{D^{\text{in}}_w(n)+\delta_{\text{in}}}{\sum_{w\in V(n)} \big(D^{\text{in}}_w(n)+\delta_{\text{in}}\big)}= \frac{D^{\text{in}}_w(n)+\delta_{\text{in}}}{n+1+\delta_{\text{in}} |V(n)|}.\] -
2. With probability $\beta$ , a new edge (v, w) is added between two existing nodes $v,w\in V(n)$ , where the starting node v and the ending node w are chosen independently with probability
\begin{align*}&\frac{D^{\text{in}}_w(n)+\delta_{\text{in}}}{\sum_{w\in V(n)} \big(D^{\text{in}}_w(n)+\delta_{\text{in}}\big)}\frac{D^{\text{out}}_v(n)+\delta_{\text{out}}}{\sum_{v\in V(n)} \big(D^{\text{out}}_v(n)+\delta_{\text{out}}\big)} \\[4pt] &\qquad=\frac{D^{\text{in}}_w(n)+\delta_{\text{in}}}{n+1+\delta_{\text{in}} |V(n)|}\frac{D^{\text{out}}_v(n)+\delta_{\text{out}}}{n+1+\delta_{\text{out}} |V(n)|}.\end{align*}For brevity of notation, we set(1.1) \begin{equation}A_{wv}(n) \,:\!=\, \frac{D^{\text{in}}_w(n)+\delta_{\text{in}}}{n+1+\delta_{\text{in}} |V(n)|}\frac{D^{\text{out}}_v(n)+\delta_{\text{out}}}{n+1+\delta_{\text{out}}|V(n)|};\end{equation}then the attachment probability in the $\beta$ -scheme is $\beta A_{wv}(n)$ . -
3. With probability $\gamma$ , we add a new edge (w, v), where $w\in V(n)$ , and $v\notin V(n)$ is a new node. The existing node w is chosen with probability
\[\frac{D^{\text{out}}_w(n)+\delta_{\text{out}}}{\sum_{w\in V(n)} \big(D^{\text{out}}_w(n)+\delta_{\text{out}}\big)}=\frac{D^{\text{out}}_w(n)+\delta_{\text{out}}}{n+1+\delta_{\text{out}} |V(n)|}.\]
We assume $\alpha+\beta+\gamma=1$ , $\beta\in [0,1)$ , and $\delta_{\text{in}},\delta_{\text{out}}>0$ . Owing to the $\alpha$ - and $\gamma$ -schemes, $|V(n)|-1$ follows a binomial distribution with size n and success probability $\alpha+\gamma=1-\beta$ , so that $|V(n)|\stackrel{\text{a.s.}}{\longrightarrow}\infty$ as $n\to\infty$ . For $v\ge 1$ , we define $S_v$ to be the time when node v is created, i.e.
Since we use the convention that $D^{\text{in}}_v(n) = 0$ and $D^{\text{out}}_v(n) = 0$ if $S_v>n$ , we have by (1.1) that $A_{vw}(n)\in [0,1]$ for all n.
Let $\mathcal{F}_n$ be the $\sigma$ -field generated by observing the network up to the creation of the nth new edge. Suppose $\tau$ is a stopping time with respect to the filtration $\big(\mathcal{F}_n\big)_{n\ge 0}$ ; then
By (1.2), we see that $S_v$ is a stopping time with respect to $\big(\mathcal{F}_n\big)_{n\ge 0}$ . For $n\ge k\ge 0$ , we have
so $S_v+k$ , $k\ge 0$ , is a stopping time with respect to $\big(\mathcal{F}_n\big)_{n\ge 0}$ . Note that for $v>n-k$ , $\{S_v=n-k\}=\emptyset\in \mathcal{F}_{n-k}\subset \mathcal{F}_n$ . In what follows, we write $\mathbb{E}^{\mathcal{F}_n}(\cdot) \,:\!=\, \mathbb{E}\big({\cdot}|\mathcal{F}_n\big)$ .
2. Growth of the degree sequences $\boldsymbol{D}^{\textrm{in}}_{\boldsymbol{v}}(n)$ , $\boldsymbol{D}^{\textrm{out}}_{\boldsymbol{v}}(n)$
The parameters controlling the behavior of the model are $\alpha,\beta,\gamma,\delta_{\text{in}}, \delta_{\text{out}}$ , and we now define two new parameters which will serve as power-law exponents:
From (2.1), $0<c_1<\alpha+\beta $ , $0<c_2<\beta+\gamma$ , and $0<c_1+c_2 < 1+\beta$ . In fact, we can reparametrize the PA model using $(\alpha,\beta,\gamma, c_1, c_2)$ , which leads to an estimation method for model parameters alternative to unavailable maximum likelihood techniques; see [Reference Wan, Wang, Davis and Resnick19] for details.
The following proposition summarizes the power-law growth of $\big(D^{\text{in}}_v(n), D^{\text{out}}_v(n)\big)$ , which is controlled by the parameters $c_1,c_2$ . From these growth rates, we will derive the limiting behavior of the number of reciprocal edges between two fixed nodes in Section 3.1.
Proposition 2.1. For $v\ge 1$ , there are random variables $\xi^{{in}}_v$ , $\xi^{{out}}_v$ satisfying $\mathbb{P}\big(\xi^{{in}}_v\in (0,\infty)\big)=1=\mathbb{P}\big(\xi^{{out}}_v\in (0,\infty)\big)$ , such that
The proof of Proposition 2.1 is given in Appendix A.3, after the statement and proof of two lemmas.
3. Reciprocity in the preferential attachment network
In this section, we focus on the asymptotic behavior of the number of reciprocal edges between two fixed, distinct nodes i and j as well as that of the proportion of reciprocal edges in the entire graph. We also consider the first time a reciprocal edge forms between two distinct nodes.
To assess goodness of fit of the directed PA model to a particular dataset, it is useful to evaluate statistics to see whether the empirical values match those from the fitted model. The statistic we focus on here is reciprocity. If there is a significant discrepancy between the reciprocity measure for the fitted theoretical PA model and that of the actual network data, then we conclude that variants of the classical PA model should be considered.
Given a directed graph $G=(V,E)$ , let $L_{(i,j)}= L_{(i,j)}(G)$ be the number of directed edges (i, j) in the graph G, for $i\neq j$ . Then define the reciprocity coefficient, $R=R(G)$ , as
Note that a node pair can be counted more than once but self-loops are never counted. For example, consider the graph given in Figure 1, where there are $|E|=6$ edges and node set $V=\{1,2,3\}$ . We distinguish multiple edges between two nodes by different colors, and if a pair of reciprocal edges are observed, we label the pair with the same color. The graph in Figure 1 contains a pair of blue edges and a pair of red edges, thus giving $R_6 = 4/6 = 0.667$ . In R, we can easily compute the reciprocity coefficient by applying the dyad_census() function in the igraph package to the graph object, and its mut value outputs the total number of unordered node pairs $\{i,j\}$ with reciprocal connections (i, j) and (j, i), allowing multiplicity.
Consider a sequence of graphs $\left\{G(n) = \bigl(V(n), E(n)\bigr),n\geq 1\right\}$ constructed following the PA rule with parameters $\big(\alpha,\beta,\gamma,\delta_{\text{in}},\delta_{\text{out}}\big)$ as outlined in Section 1.1. For two nodes $i\neq j\in V(n)$ , write $L_{(i,j)}(n) = L_{(i,j)}(G(n))$ , and write the reciprocity coefficient for the PA network G(n) as $R_n^{\text{pa}}\,:\!=\, R(G(n))$ . We emphasize that the definition in (3.1) excludes self-loops when counting reciprocal edges. In Sections 3.1 and 3.2, we study the asymptotic behavior of $L_{(i,j)}(n)$ for fixed $i\neq j$ , and $R_n^{\text{pa}}$ in a PA network G(n), respectively.
3.1. Reciprocal edges between two fixed nodes
In a directed PA network, the total number of reciprocal edges between two fixed nodes i, j is equal to
We first study the limiting behavior of the number of edges between two fixed nodes $i\neq j$ , $L_{(i,j)}(n)$ , when n is large. The asymptotics of $L_{i\leftrightarrow j}(n)$ then follow from a continuous mapping argument, and this leads in Section 3.2 to a study of the asymptotic behavior of $R_n^{\text{PA}}$ . The asymptotic behavior of $L_{(i,j)}(n)$ also assists in the study, in Section 3.3, of the behavior of the first time a reciprocal pair is formed between i and j.
3.1.1. Convergence of $L_{(i,j)}(n)$
Theorem 3.1 gives the main asymptotic results, but we start by presenting a lemma on $A_{ij}(n)$ which is useful for the proof of Theorem 3.1.
Lemma 3.1. Recall the definition of $A_{ij}(n)$ in (1.1) and the notation in Proposition 2.1. For fixed $1\le i<j$ ,
This further gives the following:
-
1. When $c_1+c_2>1$ , there exist constants $C_{ij}>0$ such that
(3.3) \begin{align}\frac{c_1+c_2-1}{n^{c_1+c_2-1}}\sum_{k=S_j}^{S_j+n} A_{ij}(k) &\longrightarrow C_{ij}\xi_j^{{in}} \xi_i^{{out}}, \qquad {a.s.\ and\ in\ L_1}.\end{align} -
2. When $c_1+c_2=1$ , there exist constants $C^{\prime}_{ij}>0$ such that
(3.4) \begin{align}\frac{1}{\log n}\sum_{k=S_j}^{S_j+n} A_{ij}(k) &\longrightarrow C^{\prime}_{ij} \xi_j^{{in}} \xi_i^{{out}},\qquad {a.s.\ and\ in\ L_1}. \end{align} -
3. When $c_1+c_2<1$ , there exist constants $C^{\prime\prime}_{ij}>0$ such that
(3.5) \begin{align}\frac{1}{n^{c_1+c_2-1}}\sum_{k=n}^{\infty} A_{ij}(k) &\longrightarrow C^{\prime\prime}_{ij} \xi_j^{{in}} \xi_i^{{out}}, \qquad {a.s.\ and\ in\ L_1},\end{align}which further implies $\sum_{k\ge 1} \mathbb{E}(A_{ij}(k)) <\infty$ and $\sum_{k\ge 1} A_{ij}(k) <\infty$ a.s.
In addition, we have similar convergence results for $A_{ji}(n)$ by replacing $A_{ij}(n)$ , $\xi^{{in}}_j$ , and $\xi^{{out}}_i$ with $A_{ji}(n)$ , $\xi^{{in}}_i$ , and $\xi^{{out}}_j$ , respectively.
Proof. By Lemma 2.1 as well as the fact that $|V(n)|/n\stackrel{\text{a.s.}}{\longrightarrow} 1-\beta$ , we have
Note that once we show
we have by [Reference Durrett5, Theorem 4.6.2] that $\big\{A_{ij}(n)/n^{c_1+c_2-2}\,:\, n\ge 1\big\}$ is uniformly integrable, which gives the $L_1$ -convergence in (3.2).
To prove (3.6), we now use the Cauchy–Schwarz inequality to obtain
Then by Lemma A.2 we have
With (3.2) established, the results in (3.3)–(3.5) follow directly from Karamata’s theorem (cf. [Reference Resnick13, Theorem 2.1]).
The results for $A_{ji}(n)$ follow from similar reasoning.
We now give the asymptotic behavior of $L_{(i,j)}(n)$ in a directed PA model.
Theorem 3.1. Consider two fixed nodes $1\le i< j$ . We have the following:
-
(i) If $c_1+c_2> 1$ , then there exists some random variable $\xi_{ij}$ , satisfying $\mathbb{P}\big(\xi_{ij}\in (0,\infty)\big) = 1 $ , such that as $n\to\infty$ ,
(3.7) \begin{equation} \frac{L_{(i,j)}(n)}{n^{c_1+c_2-1}}\stackrel{{a.s.}}{\longrightarrow} \xi_{ij}. \end{equation}So for large n, the number of (i, j) edges is of order $n^{c_1+c_2-1}$ . -
(ii) If $c_1+c_2= 1$ , then there exists some random variable $\zeta_{ij}$ , satisfying $\mathbb{P}\big(\zeta_{ij}\in (0,\infty)\big) = 1 $ , such that
(3.8) \begin{equation} \frac{L_{(i,j)}(n)}{\log n}\stackrel{{a.s.}}{\longrightarrow} \zeta_{ij}. \end{equation}So for large n, the number of (i, j) edges is of order $\log n$ . -
(iii) If $c_1+c_2<1$ , then for any $i<j$ , there is a last time for an (i, j) edge to form. Furthermore, as $n\to\infty$ ,
(3.9) \begin{equation} L_{(i,j)}(n)\uparrow L_{(i,j)}(\infty)<\infty, \qquad {a.s.}, \end{equation}and $L_{(i,j)}(n)-L_{(i,j)}(\infty) = 0$ a.s. for n large.
In addition, we obtain similar convergence results for $L_{(\,j,i)}(n)$ by replacing $L_{(i,j)}(n)$ , $L_{(i,j)}(\infty)$ , $\xi_{ij}$ , and $\zeta_{ij}$ with $L_{(\,j,i)}(n)$ , $L_{(\,j,i)}(\infty)$ , $\xi_{ji}$ , and $\zeta_{ji}$ , respectively.
Proof. Set
i.e. $\Delta_k(i,j) = 1$ if a directed edge (i, j) is created from $G(k-1)$ to G(k). For $1\le i<j$ , notice that
For $n\ge 0$ ,
and
When $c_1+c_2\ge 1$ , (3.3) and (3.4) suggest that $\sum_{k=0}^{\infty} A_{ji}\big(S_j+k\big) = \infty$ a.s.; then we apply [Reference Durrett5, Theorem 4.5.5] to get
Also, by a similar argument as in (3.3), we have that when $c_1+c_2>1$ , there exists some constant $\widetilde{C}_{ij}>0$ such that
Then combining (3.13) with (3.14) gives
Analogous reasoning is also applicable to the case $c_1+c_2=1$ , where the scaling function $n^{c_1+c_2-1}$ is replaced with $\log n$ , according to (3.4).
Next, consider the case $c_1+c_2<1$ . By the corollary in [Reference Neveu11, Chapter IV.6, p. 151], we have a.s.
Using a similar argument as in Lemma 3.1(3), we have $\sum_{k\ge S_j} A_{ji}(k)<\infty$ a.s., thus giving a.s.
Hence, with probability 1, there is a finite number of (i, j) edges that can be formed, and there exists a last time for an (i, j) edge to form.
Note that by the definition of $L_{i\leftrightarrow j}(n)$ , applying continuous mapping gives the asymptotic results for $L_{i\leftrightarrow j}(n)$ , which also depends on the value of $c_1+c_2$ :
-
(1) If $c_1+c_2> 1$ , then there exists some random variable $\overline{\xi}_{ij}$ , satisfying $\mathbb{P}\big(\overline{\xi}_{ij}\in (0,\infty)\big) = 1 $ , such that as $n\to\infty$ ,
\[\frac{L_{i\leftrightarrow j}(n)}{n^{c_1+c_2-1}}\stackrel{\text{a.s.}}{\longrightarrow} \overline{\xi}_{ij}.\]So for large n, the number of reciprocal edges between i and j is of order $n^{c_1+c_2-1}$ . -
(2) If $c_1+c_2= 1$ , then there exists some random variable $\overline{\zeta}_{ij}$ , satisfying $\mathbb{P}\big(\overline{\zeta}_{ij}\in (0,\infty)\big) = 1 $ , such that
\[\frac{L_{i\leftrightarrow j}(n)}{\log n}\stackrel{\text{a.s.}}{\longrightarrow} \overline{\zeta}_{ij}.\]So for large n, the number of reciprocal edges between i and j is of order $\log n$ . -
(3) If $c_1+c_2<1$ , then a.s.
\[L_{i\leftrightarrow j}(n)\uparrow L_{i\leftrightarrow j}(\infty)<\infty.\]
Now consider a special case with $\gamma=0$ and $\delta_{\text{in}} = \delta_{\text{out}} = \delta>0$ ; then
From Theorem 3.1, we see that when the probability of generating a new node in a PA network at each step is too high, the node set grows strongly and it is difficult to form reciprocal edges. Then the number of reciprocal edges between two nodes is finite a.s. If $\beta = 0$ , then for $\delta_{\text{in}}, \delta_{\text{out}}>0$ , we always have $c_1+c_2<1$ , indicating that the number of edges between two fixed nodes is finite a.s. when no edge is added between two existing nodes.
3.2. Reciprocity in the entire graph
Here we consider the proportion of reciprocal edges in the entire PA network, $R_n^{\text{pa}} $ , and the next theorem specifies the asymptotic behavior of $R_n$ for $0< c_1+c_2< 5/3$ .
Theorem 3.2. Suppose $R_n^{{pa}}$ is as defined in (3.1). Then for $0< c_1+c_2< 5/3$ , we have
When $c_1+c_2<5/3$ , the reciprocity coefficient $R_n^{\text{pa}} $ is likely to be small, provided that the number of edges in the PA network is large. In particular, when $c_1+c_2\le 1$ , i.e. in the second and third cases in Theorem 3.1, we have $R_n^{\text{pa}}\stackrel{p}{\longrightarrow} 0$ .
Proof. It suffices to show that $\mathbb{E}\big(R_n^{\text{pa}} \big)\to 0$ for $0< c_1+c_2< 5/3$ , as $n\to \infty$ . Recall that
so that $\Delta_k(i,j)\boldsymbol{1}_{\{(\,j,i)\in E(k-1)\}}$ indicates the event that from $G(k-1)$ to G(k), an edge (i, j) is created when (j, i) already exists in $G(k-1)$ . By the definition of $R_n^{\text{pa}} $ , we have
For $Q_1(n)$ , we have
Since $S_j\ge j-1$ for $j\ge 2$ , (3.16) implies
By the Cauchy–Schwarz inequality, we have
When $c_1+c_2>1$ , Lemma A.2 and (A.11) together imply that there exist constants $M_1$ , $M_2$ , $M_3>0$ such that for $i<j\le k\le n$ ,
and that for $j\le l\le k-1$ ,
Therefore, when $c_1+c_2>1$ , we have
and since $k^{c_1+c_2-1}/\big(i^{c_1}j^{c_2}\big)\ge j^{c_1-1}/i^{c_1}$ for $k\ge j$ , there exists some constant $M>0$ such that
Then (3.17) leads to
If $c_1/2+c_2>1$ , $c_1+c_2/2>1$ , and $c_1+c_2<5/3$ , then
If $c_1/2+c_2< 1$ , $c_1+c_2/2< 1$ , then
If $c_1/2+c_2<1$ , $c_1+c_2/2>1$ , and $c_1+c_2<5/3$ , then
as $c_1+c_2/2<1+1/2=3/2$ . Similarly, $\mathbb{E}(Q_1(n)) \to 0$ , when $c_1/2+c_2>1$ , $c_1+c_2/2<1$ , and $c_1+c_2<5/3$ . The proof machinery also applies to the case where either $c_1/2+c_2=1$ or $c_1+c_2/2=1$ , and $c_1+c_2<5/3$ , which gives the conclusion that for $1<c_1+c_2<5/3$ , $\mathbb{E}(Q_1(n)) \to 0$ . Following the same reasoning, we have $\mathbb{E}(Q_2(n)) \to 0$ , for $1<c_1+c_2<5/3$ , thus implying $R_n^{\text{pa}} \stackrel{p}{\longrightarrow} 0$ for $1<c_1+c_2<5/3$ .
When $c_1+c_2=1$ , we revise the bound in (3.19) to get the following: for some constant $\widetilde{M}>0$ ,
Then we have
Meanwhile, for some constant $\widetilde{M}^{\prime}>0$ , we have
Hence, we have $R_n^{\text{pa}} \stackrel{p}{\longrightarrow}0$ when $c_1+c_2=1$ .
When $c_1+c_2<1$ , the bound in (3.18) implies that there exists some constant $\bar M>0$ such that
which gives
Similar reasoning also gives $\mathbb{E}(Q_2(n))\to 0$ when $c_1+c_2<1$ , thus giving $\mathbb{E}\big(R_n^{\text{pa}} \big)\to 0$ and completing the proof of the theorem.
Remark 3.1. (i) From the definition of $c_1$ and $c_2$ in (2.1), if $\beta\le 2/3$ , then $c_1+c_2<1+\beta\le 5/3$ . Theorem 3.2 suggests that if the proportion of edges added between two existing nodes is less than $2/3$ and n is sufficiently large, the corresponding PA network will have $R_n^{\text{pa}} $ close to 0.
(ii) Note also that
Hence, when $c_1+2c_2<1$ , applying the bound in (3.20) gives the following: for fixed i, there exists some constant $\tilde{M}>0$ such that
This indicates that when $c_1+2c_2<1$ , a fixed node i can form a reciprocal pair of edges only with finitely many nodes.
3.2.1. Simulation for $c_1+c_2\ge 5/3$
Theorem 3.2 does not explain the asymptotic behavior of $R_n^{\text{pa}} $ for $c_1+c_2\in [5/3, 1+\beta)$ , provided that $\beta\in (2/3,1)$ . For comparison, we choose three sets of parameters,
such that the values of $c_1+c_2$ are equal to $1.393<5/3$ , $1.667=5/3$ , and $1.727>5/3$ , respectively. For each $\boldsymbol{\theta}_i$ , $i=1,2,3$ , we simulate 1000 replications of the directed PA network with $10^5$ edges, and compute the value of $R_n^{\text{pa}} $ for each replication.
The numerical results are summarized as box plots in Figure 2. For each box plot, we use a dark red dot to mark the corresponding averaged empirical $R_n^{\text{pa}} $ . Under $\boldsymbol{\theta}_1$ , all 1000 empirical $R_n^{\text{pa}} $ values are close to 0, with a maximum of 0.090 and a minimum of 0.021. The empirical $R_n^{\text{pa}} $ values under $\boldsymbol{\theta}_2$ and $\boldsymbol{\theta}_3$ are more variable, but both have higher mean than in the $\boldsymbol{\theta}_1$ case. This simulation experiment confirms that the asymptotic behavior of $R_n^{\text{pa}} $ in a directed PA model depends on the value of $c_1+c_2$ . Meanwhile, when $c_1+c_2\ge 5/3$ , the value of $R_n^{\text{pa}} $ may not necessarily concentrate around a specific value, but may vary over a certain range.
3.3. The first time when a reciprocal pair forms
Theorem 3.1 gives results about the first time when a reciprocal pair forms between nodes $i\neq j$ . The $c_1+c_2\ge 1$ scenario is discussed in Corollary 3.1, while the $c_1+c_2<1$ case is analyzed in Proposition 3.1.
Corollary 3.1. Let $N_0^{i\leftrightarrow j}$ be the first time when a reciprocal pair of edges $i\leftrightarrow j$ is formed between nodes $i\neq j$ ; i.e.
with the convention that $\inf\emptyset = \infty$ . If $1\le i<j$ are fixed, and $c_1,c_2$ are as given in (2.1) and satisfy $c_1+c_2\ge1$ , then $N_0^{i\leftrightarrow j}<\infty$ a.s.
Proof. We will show that for $c_1+c_2\ge 1$ , $\lim_{n\to\infty}\mathbb{P}\Big(N_0^{i\leftrightarrow j}{>} S_j+n\Big)=\mathbb{P}\Big(N_0^{i\leftrightarrow j}=\infty\Big)=0$ . Note that when $c_1+c_2\ge 1$ , (3.15) indicates that $L_{(i,j)}\big(S_j+n\big)\stackrel{\text{a.s.}}{\longrightarrow} \infty$ , and similarly, $L_{(\,j,i)}\big(S_j+n\big)\stackrel{\text{a.s.}}{\longrightarrow} \infty$ . Therefore,
as $n\to\infty$ .
When $c_1+c_2<1$ , if we have $\mathbb{E}\big(L_{(i,j)}(\infty)\big)<1$ , then
which implies $\mathbb{P}\big(L_{(i,j)}(\infty) =0\big)>0$ , and $\mathbb{P}\Big(N_0^{i\leftrightarrow j} =\infty\Big)>0$ . Note that
and by (A.11), there exist some constants ${K}_1, {K}_2>0$ such that for $n\ge i\ge 1$ ,
Then applying the Cauchy–Schwarz inequality gives that for $n\ge j>i\ge 1$ ,
Therefore, we have
which gives
Since $c_1-1<0$ , for j sufficiently large we have $\mathbb{E}\big(L_{(i,j)}(\infty)\big)<1$ , thus giving $\mathbb{P}\Big(N_0^{i\leftrightarrow j} =\infty\Big)$ $>0$ . In other words, when $c_1+c_2<1$ , it is possible to have zero pairs of reciprocal edges between two fixed nodes i, j, if j is created at a late stage of network evolution.
Equation (3.22) also suggests that for arbitrarily chosen i, j, $\mathbb{E}\big(L_{(i,j)}(\infty)\big)<1$ if $\beta$ is small enough. Hence, if few edges are created between existing nodes, it is possible for two fixed nodes never to form a reciprocal pair of edges. In particular, when $\beta = 0$ , there is no reciprocal pair of edges, i.e. $\mathbb{P}\Big(N_0^{i\leftrightarrow j} =\infty\Big)=1$ , for all fixed i, j.
The following proposition assumes $c_1+c_2<1$ and gives the asymptotic behavior of $\sup_{j\ge n\epsilon} \mathbb{P}\Big(N_0^{i\leftrightarrow j}\le n\Big)$ .
Proposition 3.1. If $c_1+c_2<1$ , then for fixed $i\ge 1$ , $\epsilon>0$ , as $n\to\infty$ ,
Proof. Applying the union bound to $\mathbb{P}\Big(N_0^{i\leftrightarrow j}\le S_{j}+n\Big)$ gives
By (3.5), we see that for $\epsilon>0$ ,
as $n\to\infty$ , which gives
4. Discussion
Suppose that we are given a scale-free network with a large proportion of reciprocal edges, e.g. Facebook wall posts [Reference Viswanath, Mislove, Cha and Gummadi17], Twitter [Reference Java, Song, Finin and Tseng6], Google $+$ [Reference Magno9], or Flickr [Reference Cha, Mislove and Gummadi3, Reference Mislove10]. In fitting a directed PA model to such a dataset using inference methods developed in [Reference Wan, Wang, Davis and Resnick18, Reference Wan, Wang, Davis and Resnick19], there is no guarantee that the calibrated model also has a large $R^{\text{pa}}_n$ . In fact, estimated values of $\hat{c}_1$ and $\hat{c}_2$ do not necessarily satisfy $\hat{c}_1+\hat{c}_2\ge 5/3$ . If we have $\hat{c}_1+\hat{c}_2<5/3$ in the calibrated model, then by Theorem 3.2, the corresponding $R_n^{\text{pa}}$ is close to 0, which differs from the feature of high reciprocity in the given dataset. This flags modeling error and suggests the consideration of alternative models or variants of the classical PA network. For instance, once a directed edge (i, j) is created following the PA rule, we may add a reciprocal edge (j, i) with probability $\rho\in (0,1)$ . The study in [Reference Cheng, Romero, Meeder and Kleinberg4] also provides other features that can be employed to predict reciprocal edges; we will defer the analysis of these variants of directed PA models to future research.
Appendix A. Lemmas and proofs needed in Section 2
In this section, we state and prove two lemmas needed for the proof of Proposition 2.1, which is also given.
A.1. Statement and proof of Lemma A.1
Lemma A.1. For some integer $p\ge 1$ , $k\ge 0$ , and $v\ge 1$ , we have the following:
-
(i) For $p=1$ ,
(A.1) \begin{align}\mathbb{E}^{\mathcal{F}_{S_v+k}}&\Big(D^{{in}}_v(S_v+k+1)+\delta_{{in}}\Big)\nonumber\\ &=\Big(D^{{in}}_v(S_v+k)+\delta_{{in}}\Big)\left(1+\frac{\alpha+\beta}{S_v+k+1+\delta_{{in}}|V(S_v+k)|}\right).\end{align} -
(ii) For an integer $p\ge 2$ ,
(A.2) \begin{align}&\mathbb{E}^{\mathcal{F}_{S_v+k}}\left(\left(D^{{in}}_v(S_v+k+1)+\delta_{{in}}\right)^p\right)\nonumber\\ &=\Big(D^{{in}}_v(S_v+k)+\delta_{{in}}\Big)^p\left(1+p\frac{\alpha+\beta}{S_v+k+1+\delta_{{in}}|V(S_v+k)|}\right) \nonumber\\ &\qquad+ \frac{\alpha+\beta}{S_v+k+1+\delta_{{in}}|V(S_v+k)|}\sum_{r=2}^p\left(\begin{array}{c} p\\ r \end{array}\right) \Big(D^{{in}}_v(S_v+k)+\delta_{{in}}\Big)^{p-r+1}.\end{align}
Proof. Note that the right-hand sides of (A.1) and (A.2) are both $\mathcal{F}_{S_v+k}$ -measurable. We prove the results for $p\ge 2$ , and the case $p=1$ follows by a similar argument. Let $F\in \mathcal{F}_{S_v+k}$ ; then
and since $D^{\text{in}}_v(l+1) = D^{\text{in}}_v(l) + \boldsymbol{1}_{\{\text{Node} \,v\, \text{is chosen at step}\, l+1}\} \,=\!:\, D^{\text{in}}_v(l) + \Delta_v(l+1)$ , this equals
Since $F\cap \{S_v+k=l\}\in \mathcal{F}_l$ , the quantity in (A.3) is equal to
Note also that $\mathbb{E}^{\mathcal{F}_l}\big(\Delta_v(l+1)\big)=(\alpha+\beta)\Big(D^{\text{in}}_v(l)+\delta_{\text{in}}\Big)/\big(l+1+\delta_{\text{in}} |V(l)|\big)$ , so we have
A.2. Statement and proof of Lemma A.2
Next, we study properties of $\mathbb{E}\left[\big(D^{\text{in}}_v(S_v+k)\big)^p\right]$ and $\mathbb{E}\left[\big(D^{\text{out}}_v(S_v+k)\big)^p\right]$ , $k\ge 1$ , which are needed for deriving the theorems in Section 3 as well as for the proof of Proposition 2.1.
Lemma A.2. For $v\in V(n)$ and $p\ge 1$ , we have
Proof. For $p=1$ , we see from (A.1) that
since $S_v\ge 0$ and $|V(S_v+k)|\ge |V(k)|$ , this is
and as $\alpha+\beta\le 1$ , this is
Since $\big|V(k)\big|-1$ follows a binomial distribution with size $k\ge 1$ and success probability $1-\beta$ , by applying the Chernoff bound we obtain
and rewrite the term $H^{(2)}_v(k)$ in (A.4) as
Since $D^{\text{in}}_v(S_v+k)\le k+1$ and $\Bigl||V(k)|-(1-\beta)k\Bigr|\le k+1$ , the foregoing term is bounded by
Combining the bound in (A.6) with (A.4) gives
Recursively applying the inequality above k times, we have
Here, $\mathbb{E}\big(D^{\text{in}}_v(S_v)+\delta_{\text{in}}\big)=\alpha\delta_{\text{in}} +\gamma(1+\delta_{\text{in}})$ , depending on whether the $\alpha$ - or the $\gamma$ -scenario occurs. Note that there exists a constant $M>0$ such that
and it follows from (A.7) that
For $p\ge 2$ , suppose
holds for some constants, $A_r$ , $r=1,\ldots, p-1$ . Let $A_0 = \max\{A_r\,:\, r=1,\ldots, p-1\}$ ; then by (A.2), we have
We rewrite the $C^{(1)}_v(k)$ term in (A.9) to get
Similarly to the Chernoff bound in (A.5), we have, for $p\ge 2$ ,
Therefore, analogously to the calculation in (A.6), we have
and since $\left(D^{\text{in}}_v(S_v+k)+\delta_{\text{in}}\right)^p\le \big(k+1+\delta_{\text{in}}\big)^p$ , this is
Hence,
Note also that
for all $k\ge 1$ , $p\ge 2$ . We conclude from (A.9) that
Following the recursive step as for the $p=1$ case gives
Note that for $p\ge 1$ ,
since $D^{\text{in}}_v(n)$ is monotone in n. Then we have, for $v\ge 1$ ,
Applying a similar argument to the out-degrees completes the proof.
Remark A.1. In the proof of Lemma A.2, if we revise the inequality $|V(S_v+k)|\ge |V(k)|$ to $|V(S_v+k)|\ge |V(v+k-1)|$ , then we have for $p\ge 1$
These results are used in the proof of Theorem 3.2, in Section 3.2.
A.3. Proof of Proposition 2.1
We prove only the results for $D^{\text{in}}_v(n)$ ; those for $D^{\text{out}}_v(n)$ follow from a similar argument. First, by Lemma A.1, we see that for $n\ge 1$ ,
is a nonnegative $\left(\mathcal{F}_{S_v+n}\right)_{n\ge 0}$ -martingale, which by the martingale convergence theorem converges to some limit $L_v$ a.s. as $n\to\infty$ . It remains to analyze the denominator and to verify that $\mathbb{P}(L_v\in (0,\infty))=1$ . We do this by applying some proof machinery similar to that of [Reference Van der Hofstad16, Lemma 8.17].
By Markov’s inequality, we see that for $\epsilon>0$ , and $\max\{-1,-\delta_{\text{in}}\}<m<0$ ,
By (A.13), it suffices to show
Similarly to [Reference Van der Hofstad16, Equation (8.7.23)], there exists some constant $C_m$ such that
Hence, once we show
the inequality in (A.13) implies $\mathbb{P}(L_v\in (0,\infty))=1$ .
We prove (A.14) by showing that
is an $\left(\mathcal{F}_{S_v+n}\right)_{n\ge 0}$ -martingale. Note that
which confirms that $M^{(m)}_n$ is an $\left(\mathcal{F}_{S_v+n}\right)_{n\ge 0}$ -martingale. Also,
Since $m<0$ and $S_v\ge v-1\ge 0$ , we have
This further implies
thus completing the proof of (A.14).
Next, we consider the convergence of $X^{(v)}_n$ by noting that
Since $\log(1+x)\le x$ , for all $x\ge 0$ , we have $\text{I}_v(n+1)-\text{I}_v(n)\le 0$ for all n; i.e. $\text{I}_v(n)$ is decreasing in n. Note also that $\left|\log(1+x)-x \right|\le x^2/2$ for all $x\ge 0$ , so we have
which implies $\text{I}_v(\infty)<\infty$ a.s., and $\text{I}_v(n)\stackrel{\text{a.s.}}{\longrightarrow} \text{I}_v(\infty)$ as $n\to\infty$ .
By [Reference Athreya and Ney1, Theorem 3.9.4], we see that there exists a finite random variable Z such that
then
Since $\sum_{k=1}^n 1/k-\log n\to \tilde{c}$ , where $\tilde{c}$ is Euler’s constant, for $v=1$ we have
and for $v\ge 2$ ,
Hence, as $n\to\infty$ ,
Combining (A.15) with the convergence of the martingale in (A.12), we have
then
Since $\mathbb{P}(L_v\in (0,\infty))=1$ and $S_v+1\ge v\ge 1$ , we also have $\mathbb{P}(\xi^{\text{in}}_v\in (0,\infty))=1$ .
Acknowledgements
We thank the two anonymous referees for their valuable comments on the manuscript.
Funding information
There are no funding bodies to thank in relation to this creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process for this article.