1. INTRODUCTION
The present analysis is motivated by content distribution in peer-to-peer (P2P) networks (see, e.g., [Reference Van Mieghem6, Chap. 13]) such as Napster, Bit Torrent, Tribler, and so forth. Content is often either fully replicated at a peer or is split into chunks and stored over m peers. A major task of a member of the peer group lies in selecting the best peer among those m peers that possess the desired content. This searching process for peers is not always optimized and several variants exist depending on which criterion is optimized. For example, one might choose that peer that is most nearby in the number of hops or in the delay or latency. The first problem is analogous to the anycast problem in IPv6, treated in [Reference Van Mieghem5, Chap. 18]. Here, we confine ourselves to the second problem: Given a network of N nodes over which m peers are randomly distributed, what is the delay from a certain peer to the nearest (in delay) peer? The presented analysis shows how that delay varies as the number m of peers changes and it might set bounds on the minimum number of peers needed to still offer an acceptable content distribution service.
In a P2P network, peers are joining and leaving regularly, which motivates us to consider such a network to first order as an Erdös–Rényi random graph. The shortest path between two nodes is the path for which the sum of the link weights is minimal. The shortest path tree (SPT) rooted at an arbitrary node to m uniformly chosen peers is the union of the shortest paths from that node to all m different peers in the graph. Thus, the computation of a shortest path requires the knowledge of link weights, such as a monetary cost, the number of hops, a delay, a distance, and so forth. In most communication networks, link weights are not precisely known. Here and as motivated in [Reference Van Mieghem5, Sect. 16.1], we assign independent and identically distributed (i.i.d.) exponentially distributed weights with unit mean to links.
In [Reference van der Hofstad, Hooghiemstra and Van Mieghem4,Reference Van Mieghem5], we have rephrased the shortest path problem between two arbitrary nodes in the complete graph K N with i.i.d. exponential link weights as a Markov discovery process, which starts the path searching process at the source. The discovery process is a continuous-time Markov chain with N states. Each state n represents the n already discovered nodes (including the source node). If at some stage in the Markov discovery process n nodes are discovered, then the next node is reached with rate λn,n+1 = n(N − n), which is the transition rate in the continuous-time Markov chain. Since the discovering of nodes at each stage only increases n, the Markov discovery process is a pure birth process with birth rate n(N − n). We call τn the interattachement time between the inclusion of the nth and (n + 1)st node to the SPT for n = 1, … , N − 1. The interattachement time τn is exponentially distributed with parameter n(N − n) as follows from the theory of Markov processes. By the memoryless property of the exponential distribution and the symmetry in the complete graph K N, the new node is added uniformly to an already discovered node. Hence, the resulting SPT to all nodes (i.e., m = N − 1) is exactly a uniform recursive tree (URT). A URT of size N is a random tree rooted at some source node and where, at each stage, a new node is attached uniformly to one of the existing nodes until the total number of nodes is equal to N. As proven in [Reference van der Hofstad, Hooghiemstra and Van Mieghem4] for large N, the URT is also asymptotically the shortest path in the class of Erdös–Rényi random graph with i.i.d. exponential link weights.
The article is outlined as follows. Section 2 derives the exact probability generating function of the weight toward the first encountered peer. The asymptotics for large N (and constant peer group size m) is computed in Section 3 and compared with simulations for finite graph size N. Mathematical derivations are given in the appendices.
2. PROBABILITY GENERATING FUNCTION
The weight W N;m of the shortest path from an arbitrary node to the nearest (in weight) peer in a peer group of size m isFootnote 1

where T N(m) is the number of steps in the continuous Markov discovery process until one peer of the group of m peers is reached, or T N(m) is the hitting time to the peer group of size m. Since the random variables 1{T N(m) ≥ k} and τk (for a fixed k) are independent, the average weight directly follows as
![$$E\left[W_{N\semicolon m}\right]=\sum_{k=1}^{N-1}\hbox{Pr}\left[T_{N}\left(m\right)\ge k\right]E\left[\tau_{k}\right].$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU1.gif?pub-status=live)
The event {T N(m) ≥ k} implies that the kth discovered (attached) node in the URT is the first peer out of the m peers that is reached. Hence, the k − 1 previously discovered nodes do not belong to the peer group. Since the m peers as well as the k − 1 nodes are uniformly chosen out of the N − 1 nodes, we have that

such that, with E[τk] = 1/k(N − k),

Invoking the identity (C.1) of Appendix C yields

After partial fraction expansion, the q-sum is

and we arrive at

where ψ(x) is the digamma function [Reference Abramowitz and Stegun1, Sect. 6.3]. For large N, we have [Reference Abramowitz and Stegun1, Sect. 6.3.18]

Since the sequence of random variables 1{T N(m)≥1}, 1T N(m)≥2}, … , 1{T N(m)≥N−1} is obviously not independent, a computation of the probability generating function (pgf) ϕW N;m(z) = E[e −zW N;m] different from straightforwardly using (1) is needed. We define v k = ∑n=1kτn as the weight of the shortest path to the kth attached (discovered) node. Since the laws of the Markov discovery process and of the URT are independent [Reference van der Hofstad, Hooghiemstra and Van Mieghem4], we obtain

where Y m(k) denotes the event that the kth attached node is the first encountered peer among the m peers in the URT. Now, there are ways to distribute the m peers (different from the source node) over the N − 1 remaining nodes. If the kth node is the first of the m discovered peers, it implies that the remaining m − 1 peers need to be discovered later. There are
possible ways, from which

Thus, we obtainFootnote 2

After partial integration of a single-sided Laplace transform ϕX(z) = ∫0∞f X(t)e −ztdt and assuming that the derivative f′X(t) exists, the relation f X(0) = limz→∞zϕX(z) is found. Applied to (3), this yields

3. ASYMPTOTIC pgf AND pdf
Although the inverse Laplace transform of (3) can be computed, the resulting series for the probability density function (pdf) is hardly appealing. As shown below, an asymptotic analysis leads to an elegant result that, in addition, seems applicable to a relatively small network size N. Following a similar procedure as in [Reference Van Mieghem5, pp. 518–520], we write

and define . Then

and, substituted in (3), yields

For large N and |z| < N, we have(3) that such that

We now introduce the scaling z = Nx, where |x| < 1 since |z| < N:

For large N and fixed m and x, the asymptotic order of the sum

scales as

This result (7) is derived in Appendix A. Substitution of (7) into (5), leads, for large N, fixed m and |x| < 1, to

or
![$$\eqalignno{\lim_{N\rightarrow\infty}N^{x}\varphi_{W_{N\semicolon m}}\left(Nx\right) =\lim_{N\rightarrow\infty} E\left[e^{-\lpar NW_{N\semicolon m}-\ln N\, \rpar x}\right]\cr ={\pi\, x \over \sin\pi\, x}{\Gamma\lpar x+m\rpar \over \Gamma\lpar m\rpar }.}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqn8.gif?pub-status=live)
Finally, the inverse Laplace transform, computed in Appendix B, is

where Γ(−m,e −t) is the incomplete Gamma function (C.3). If , then we have
![$$\lim_{N\rightarrow\infty}\hbox{Pr}\left[NW_{N}-\ln N\le t\right]=1-e^{-t} e^{e^{-t}}\vint_{e^{-t}}^{\infty}{e^{-u} \over u}\, du.$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqn10.gif?pub-status=live)
The pdf follows after derivation of (9) with respect to t as

4.1. The Fermi–Dirac Distribution
For large m and to first order, we have , such that
![$$\lim_{N\rightarrow\infty} E\left[e^{-\lpar NW_{N\semicolon m}-\ln {N / m}\rpar x}\right]={\pi x \over {\rm sin}\, \pi x}.$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU14.gif?pub-status=live)
In view of the average (2) for large N, we can write NW N;m − ln(N/m) = N(W N;m − E[W N;m]). After inverse Laplace transform with 0 < c < 1,
![$$\lim_{N\rightarrow\infty}\hbox{Pr}\left[NW_{N\semicolon m}-\ln \textstyle {\displaystyle {N \over m}}\le t\right]={1 \over 2\pi i}\vint_{c-i\infty}^{c+i\infty}{\pi e^{tx} \over {\rm sin}\, \pi x}\, dx.$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU15.gif?pub-status=live)
For t > 0 (t < 0), the contour can be closed over the negative (positive) Re(x)-plane, resulting, for any t, in the Fermi–Dirac distribution function
![$$\lim_{N\rightarrow\infty}\hbox{Pr}\left[NW_{N\semicolon m}-\ln \textstyle {\displaystyle {N \over m}}\le t\right]={1 \over 1+e^{-t}}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqn12.gif?pub-status=live)
whose symmetric probability density function is

Figure 1 plots the scaled, asymptotic pdf (11) for various values of m. The Fermi–Dirac distribution function frequently appears in statistical physics (see, e.g., [Reference Ashcroft and Mermin2]). It is of interest to observe that the deep tails of f NW N;m−ln N(t) for large |t| decrease as O(e −|t|). This is different than the decrease of either a Gumbel or Gaussian, which implies that larger variations around the mean can occur.

Figure 1. The asymptotic and scaled pdf (11) for various values of m = 1, 2, 3, 4, 5, 10, 15, 20. The pdf of the Gumbel and the Fermi–Dirac distributions are also shown.
If we rescale (9) by changing t = y − ln m, then
![$$\lim_{N\rightarrow\infty} \hbox{Pr}\left[NW_{N\semicolon m}-\ln \textstyle {\displaystyle {N \over m}}\le y\right]=e^{-my}m^{m+1}e^{me^{-y}}\vint_{me^{-y}}^{\infty}{e^{-u} \over u^{m+1}}\, du$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU17.gif?pub-status=live)
whose dependence on m and tendency to the Fermi–Dirac distribution (12) is shown in Figure 2. From a practical point of view, if m > 5, the much simpler Fermi–Dirac distribution is sufficiently accurate. The observation that a peer group of m ≈ 5 is already close to the asymptotic regime leads us to conclude that relatively small peer group sizes suffice to offer a good service quality (e.g., small latency) and that increasing the peer group size only logarithmically (i.e., marginally) improves the quality of service of weight-related metrics.

Figure 2. The tendency toward the Fermi–Dirac distribution (in bold) for increasing m = 1, 2, 3, 4, 5, 10, 15, 20.
Rewriting (12) as
![$$\hbox{Pr}\left[W_{N\semicolon m}\le y\right]\simeq{1 \over 1+{N \over m}e^{-Ny}}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqn13.gif?pub-status=live)
and after derivation, we obtain

from which

The condition that W N;m > 0 is ignored in the asymptotic analysis, such that the largest error is made for f W N;m(0). From (4), it follows that the accuracy of the asymptotic analysis is always better than a factor . Hence, the smaller the fraction
of peers in the network, the higher the accuracy of the asymptotic analysis. Figure 3 shows, for finite values of N, simulation of the scaled pdf f NW N;m−ln N(t) and the asymptotic result (8) for m = 1, and Figure 4 plots the comparison for m = 5. Even when ignoring the simulation error bars, both Figures 3 and 4 indicate that the asymptotic analysis is already accurate for the relatively small network size N. The accuracy factor
also agrees and explains why the asymptotic formula (8) is less accurate for higher m.

Figure 3. Comparison of the analysic result (11) with simulations for various N and m = 1.

Figure 4. Comparison of the analysic result (11) with simulations for various N and m = 5.
ACKNOWLEDGMENT
This work has been partially supported by European Union NoE CONTENT (FP6-IST-038423) (www.ist-content.eu).
APPENDIX A
Evaluation of the Sum S
We introduce the Beta function [Reference Abramowitz and Stegun1, 6.2.1],

which is valid for Re(z) > 0 and Re(w) > 0, in (6) such that for Re(x) > 0,

Introducing Euler's Gamma integral yields

Using (C.5),

gives

Let

Partial integration yields

and we find that

We compute the u-integral as

Substituting this series results in

This second series (A.1) for S is more suited than (6) for deriving the asymptotic behavior of S for large N (and fixed m and x). Invoking [Reference Abramowitz and Stegun1, 6.1.47] yields

The remaining series can be rewritten as a hypergeometric series:

where the latter follows from Gauss' formula (C.2). The next sum is similarly computed with

as

Higher-order terms in O(N −j) with j > 1 are computed similarly and converge. Combining all contributions yields

After applying the reflection formula for the Gamma function [Reference Abramowitz and Stegun1, 6.1.17], , we find (7).
APPENDIX B
Inverse Laplace Transform of (8)
It is a little easier to compute the distribution function instead of the pdf. The inverse Laplace transform is
![$$\lim_{N\rightarrow \infty}\hbox{Pr}\left[NW_{N\semicolon m}-\ln N\le t\right]= {1 \over 2\pi i\Gamma\lpar m\rpar }\vint_{c-i\infty}^{c+i\infty}{\pi \over \sin\pi x} \Gamma\lpar x+m\rpar e^{xt}\, dx,$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU34.gif?pub-status=live)
where 0 < c < 1. Since

for any t, the contour can be closed over the negative Re(x) < 0 plane, where we encounter simple poles of at x = −k and of Γ(x + m) at x = −m, −m − 1, … . Hence, the integrand has simple poles at x = −1, − 2, … , −m + 1 and double poles at x = −m, −m − 1, … . After closing the contour over the negative Re(x)-plane and applying Cauchy's residue theorem, we obtain
![$$\eqalign{\lim_{N\rightarrow\infty}\hbox{Pr}\left[NW_{N\semicolon m}-\ln N\le t\right]=\sum_{k=0}^{m-1}\lim_{x\rightarrow-k}{\pi \, \lpar x+k\rpar \over \sin\pi x}{\Gamma \lpar x+m\rpar \over \Gamma\lpar m\rpar }e \, ^{xt}\cr \quad + \sum_{k=m}^{\infty}\lim_{x\rightarrow-k}\left[{d \over dx}{\pi \, \lpar x+k\rpar ^{2} \over \sin\pi x}{\Gamma\lpar x+m\rpar \over \Gamma\lpar m\rpar }e \, ^{xt}\right].}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU36.gif?pub-status=live)
The first sum equals

The second sum first requires the computation of the derivative:

Using the reflection formula of the Gamma function,

the derivative becomes
![$$\eqalign{{d \over dx}{\lpar x+k\rpar ^{2} \over \sin\pi x}\Gamma\lpar x+m\rpar e^{xt} = {\lpar x+k\rpar ^{2} \over \sin^{2}\!\! \pi x}{\pi \, \lpar \!\!-\!\!1\rpar ^{m} \over \Gamma\lpar 1\!\!-\!\!x\!-\!m\rpar }e^{xt}\left[{2 \over \lpar x+k\rpar }-\pi\cot\pi x+\psi \, \lpar x+m\rpar +t\right].}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU40.gif?pub-status=live)
The reflection formula for the digamma function [Reference Abramowitz and Stegun1, 6.3.7] shows that

such that the sum between brackets is

Since

we have
![$$\eqalign{\lim_{x\rightarrow-k}\left[{d \over dx}{\pi \, \lpar x+k\rpar ^{2} \over {\rm sin}\, \pi x} {\Gamma\lpar x+m\rpar \over \Gamma\lpar m\rpar }e^{xt}\right] ={\lpar\!-\!\!1\rpar^{m} \over \left(k\!-\!m\right)!\Gamma\lpar m\rpar }e^{-kt}\left[\psi \, \lpar 1\!+k-m\rpar +t\right].}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU44.gif?pub-status=live)
The second sum is
![$$\eqalign{\sum_{k=m}^{\infty}\lim_{x\rightarrow-k}\left[{d \over dx}{\pi \, \lpar x+k\rpar^{2} \over {\rm sin}\, \pi x} {\Gamma\lpar x+m\rpar \over \Gamma\lpar m\rpar }e^{xt}\right]\cr ={\lpar\! -\!\!1\rpar ^{m} \over \Gamma\lpar m\rpar }\sum_{k=m}^{\infty}{e^{-kt} \over \left(k-m\right)!}\left[\psi \, \lpar 1\!+k-m\rpar +t\right]\cr ={\lpar \!-\!\!1\rpar ^{m}e^{-mt} \over \Gamma\lpar m\rpar }\sum_{k=0}^{\infty}{e^{-kt} \over k!}\left[\psi \, \lpar 1\!+k\rpar +t\right]\cr ={\lpar\! -\!\!1\rpar ^{m}e^{-mt} \over \Gamma\lpar m\rpar }\left\{te^{e^{-t}} + \sum_{k=0}^{\infty}{e^{-kt} \over k!}\psi \, \lpar 1+k\rpar \right\}.}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU45.gif?pub-status=live)
For integer values of the diagamma function, we know [Reference Abramowitz and Stegun1, 6.3.2] that , and, thus,

Using (C.7) gives

The series expansion of the exponential integral is [Reference Abramowitz and Stegun1, 5.1.11]

and we recognize that

Hence,
![$$\sum_{k=m}^{\infty}\lim_{x\rightarrow-k}\left[{d \over dx}{\pi \, \lpar x+k\rpar ^{2} \over {\rm sin}\, \pi x} {\Gamma\lpar x+m\rpar \over \Gamma\lpar m\rpar }e^{xt}\right]= {\lpar \!-\!\!1\rpar ^{m}e^{-mt} \over \Gamma\lpar m\rpar }e^{e^{-t}}E_{1}\left(e^{-t}\right).$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU50.gif?pub-status=live)
Combining all contributions yields
![$$\eqalign{\lim_{N\rightarrow\infty}\hbox{Pr}\left[NW_{N\semicolon m}-\ln N\le t\right]=\sum_{k=0}^{m-1}\lpar \!\!-\!\!1\rpar ^{k}{\lpar m-\!1\!-k\rpar ! \over \lpar m-\!1\rpar !}e^{-kt} + {\lpar \!\!-\!\!1\rpar ^{m}e^{-mt} \over \lpar m-\!1\rpar !}e^{e^{-t}}\vint_{e^{-t}}^{\infty}{e^{-u} \over u}\, du.}$$](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S026996480800003X_eqnU51.gif?pub-status=live)
APPENDIX C
Identities
This section lists identities that we apply. For any complex a,b and positive integers k and n, there holds that

A famous result of Gauss on the hypergeometric series is [Reference Abramowitz and Stegun1, 5.1.20]

where c≠ 0, −1, −2, … and Re(c − a − b) > 0.
The incomplete Gamma function is defined [Reference Abramowitz and Stegun1, 6.5.3] as

Clearly, Γ(1, x) = e −x. Partial integration yields

from which the recursion follows:

Iterating this recursion a few times reveals that

If a = 1, then

If a = −n, then, after applying the reflection formula of the Gamma function, we obtain

where Γ(0, x) = E 1(x) is the exponential integral.
A result derived from Ramanyuan's work [Reference Abramowitz and Stegun3, p. 46] is

where f(x) = ∑k=0∞a kx k for |x| < R. Applied to f(x) = e x, this yields
