1. INTRODUCTION
Various sorts of extremal path length in binary search trees have been studied because of their importance as interpretations of some analyses of algorithm (mostly in the areas of searching and sorting). For example, the height of the binary search tree is considered in various sources for its role as a global measure of worst-case search in a random tree (see Robson [17], Mahmoud and Pittel [14], Pittel [15], Devroye [2,3], Drmota [6,7], and Reed [16]). At the other end of the spectrum, the length of the shortest root-to-leaf path is considered as a measure of optimism for the best search time (see Pittel [14]). Many of these results are surveyed in sorting textbooks such as Knuth [10] and Mahmoud [12].
The above-mentioned extremal path lengths have a common thread: They all are the “raw” depth of some extremal leaf in the tree. We are concerned in this investigation with “weighted” extremal path lengths, where nodes on the path have types of contribution to the path length other than a mere count of their incoming edge, such as contributing their own value. The path lengths involved have other interpretations as quantities underlying certain algorithms.
Some algorithms may go down a path from the root of a binary search tree of size n to the node ranked j, collecting the sum of the values encountered. We investigate the distribution of such paths in some extreme cases. Let Wj(n) be the value of the path length associated with traversing the tree from its root to the node labeled j while aggregating the values on the path. We will see that W1(n), when appropriately scaled, has Dickman's infinitely divisible distribution, a result that parallels in some way the Dickman distribution associated with finding the smallest item via the one-sided Quicksort (the so-called Quickselect algorithm; see Mahmoud, Modarres, and Smythe [13] and Hwang and Tsai [8]). By contrast, Wn(n), when appropriately centered and scaled, converges in distribution to a normal variate. The exercise demonstrates that there is a variety of different distributions associated with Wj(n) for different values of j.
2. SCOPE
A binary tree is a hierarchical structure of nodes each having no children, one left child, one right child, or two children (one left and one right). The nodes of such a tree can be labeled from some ordered set, say the natural numbers. The tree can further be endowed with a search property (to support fast searching of the items (also called keys) stored in it), which imposes the restriction on the labeling scheme that the label of any node is larger than the labels in its left subtree and no greater than any label in its right subtree. For definitions and combinatorial properties, see Mahmoud [11], and for applications in sorting, see Knuth [10] or Mahmoud [12].
A binary search tree is constructed from the permutation (π1,…, πn) of {1,2,…, n} by the following algorithm. The first element of the permutation is inserted in an empty tree; a root node is allocated for it. A subsequent element πj (with j ≥ 2) is directed to the left subtree if πj < π1, otherwise it is directed to the right subtree. In whichever subtree πj goes, it is subjected to the same insertion algorithm recursively, until it is inserted in an empty subtree, in which case, a node is allocated for it and linked appropriately as a left (right) child if its rank is less than (at least as much as) the value of the last node on the path. Figure 1 illustrates the tree constructed from the random permutation (5,8,7,3,9,1,6,2,4).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170409003157-13395-mediumThumb-S026996480707009Xfig001g.jpg?pub-status=live)
A binary search tree.
Several models of randomness are in common use on binary trees. The uniform model in which all trees are equally likely has been proposed for applications in formal languages, compilers, computer algebra, and so forth (see Kemp [9]). However, for the searching and sorting algorithms alluded to, the random permutation model is considered to be more appropriate. In this model of randomness, we assume that the tree is built from permutations of {1,…, n}, where a uniform probability model is imposed on the permutations instead of the trees. When all n! permutations are equally likely or random, binary search trees are not equally likely. Several permutations give rise to the same tree, favoring shorter and well-balanced trees rather than scrawny and tall shapes, which is a desirable property in searching and sorting algorithms (see Mahmoud [11]). The term random tree (and occasionally just the tree) will refer to a binary search tree built from a random permutation. The random permutation model is not really restrictive, as it covers a rather wide variety of instances, such as when the input is a sample drawn from any continuous probability distribution, and the construction algorithm is concerned only with the ranks of the keys, not their actual values.
We study the weighted path length leading to the different nodes. For instance, in the tree of Figure 1, W1(9) = 5 + 3 + 1 = 9, W2(9) = 5 + 3 + 1 + 2 = 11,…,W9(9) = 5 + 8 + 9 = 22.
The article is organized as follows. In Section 3 we study the weighted path length leading from the root to the minimal label in the tree and show that it has Dickman's distribution (after suitable scaling). The weighted path length leading to the maximal label n is investigated separately in Section 4, where we explore a useful reflection principle. It is shown in Section 4 that the weighted path length to the maximal label converges in distribution to the normal random variate (after appropriate centering and scaling). Section 5 gives some concluding remarks where a brief derivation of the average is given for the rest of the cases 1 < i < n.
3. WEIGHTED PATH TO THE MINIMAL LABEL
Let Ln be the number of items that appear in the left subtree; thus, Ln + 1 is the value of the root. For n ≥ 1, the stochastic recurrence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm001.gif?pub-status=live)
represents the weighted path length from the root to the node labeled 1 (i.e., the sum of the collection of values on the leftmost path in the tree). Let φn(t) be the characteristic function of W1(n). By conditioning the stochastic recurrence, we obtain
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm002.gif?pub-status=live)
valid for all n ≥ 1. This telescoping sum is amenable to the differencing scheme—subtract a version of the last recurrence with n − 1 from one with n to obtain
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm003.gif?pub-status=live)
which can be unwound by direct iteration to give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm004.gif?pub-status=live)
By differentiating this latter form once and twice (then evaluating at t = 0), we obtain the mean and the second moment.
Proposition 1: Let W1(n) be the weighted path length from the root to the least ranked label in a binary search tree built from a random permutation. We then have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm005.gif?pub-status=live)
Guided by the rate of growth of the variance, we next proceed to argue the infinite divisibility of n−1W1(n). We take the natural logarithm of the characteristic function and write it in asymptotic form (as t → 0):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm006.gif?pub-status=live)
Since the rate of growth of the standard deviation is n, one expects W1(n)/n to converge to a limit in distribution. At the level of characteristic function, this means changing the scale from t to t/n. Let v = t/n. This entails (for fixed v)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm007.gif?pub-status=live)
The O term converges to zero and the remaining sum approaches
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm008.gif?pub-status=live)
We thus have the convergence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm009.gif?pub-status=live)
The characteristic function
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm010.gif?pub-status=live)
is that of Dickman's infinitely divisible random variable X in Kolmogorov's canonical form (see Billingsley [1, p.372]); that is,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm011.gif?pub-status=live)
We have arrived at the main result of this section.
Theorem 1: Let W1(n) be the weighted path length from the root to the least ranked label in a binary search tree built from a random permutation. Then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm012.gif?pub-status=live)
where X is Dickman's random variable.
Remark: The limiting random variable for n−1W1(n) bears some similarity to the limiting random variable for n−1Cn[1], the normalized number of comparisons required by Quickselect to find the least item in a random input (of size n) with ranks following the random permutation model. It was shown by Mahmoud et al. [13] that n−1Cn[1] converges in distribution to 2 + X. Thus, asymptotically, the distribution of n−1Cn[1] behaves like that of 1 + n−1W1(n).
4. WEIGHTED PATH TO THE MAXIMAL LABEL
Let us introduce a reflection operation, which might generally be useful in this type of problem. In a binary search tree Tn of size n, exchange the right and left children of every node, starting at the root and progressing recursively toward the leaves to obtain the reflected tree Tn′. This reflection concerns only the shape of the tree, not the labels. One can think of this operation as if a two-sided mirror has been placed on a vertical axis passing through the root; then one sees the right subtree of Tn′ as the reflection in the left side of the mirror of the left subtree of Tn, and the left subtree of Tn′ as the reflection in the right side of the mirror of the right subtree of Tn. To maintain the binary search property in Tn′, we reinsert the numbers 1,…, n in a manner consistent with the search property. For example, the reflected tree of that in Figure 1 is shown in Figure 2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xfig002g.gif?pub-status=live)
The reflection of the tree of Figure 1.
Note that, by the symmetry of binary search trees, Tn′ has the same probability as Tn; that is, there are as many permutations of {1,2,…, n} producing Tn as those producing Tn′. Observe that a key K in Tn corresponds to the value n + 1 − K in the reflection. Let the length of the rightmost path in Tn be Qn and suppose that the chain of values appearing on it from the root to the rightmost node (containing n) is Y1,Y2,…,YQn+1. Observe that the rightmost path in Tn becomes a leftmost one (of the same length) in Tn′ and suppose that the corresponding labels in the reflection are Y1′,Y2′,…,YQn+1′. This connection suggests that we can use the distribution of the path to the minimal value, which was established in Section 3, for the rightmost path as follows. We have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xfrm001.gif?pub-status=live)
We can now quickly develop a Gaussian law for Wn(n) from known results about Qn. The latter variable is known to be asymptotically normal, satisfying
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm013.gif?pub-status=live)
see Devroye [4]. This indicates that centering and scaling relation (1) with asymptotic mean and standard deviation of nQn will yield a limit distribution. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm014.gif?pub-status=live)
According to Theorem 1, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm015.gif?pub-status=live)
indicating that the main contribution in Wn(n) comes from the length of the right-most path. Also, according to the limit law of Qn,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm016.gif?pub-status=live)
, and, of course,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm017.gif?pub-status=live)
. Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm018.gif?pub-status=live)
5. CONCLUSION
The useful notation j [rtri ] k will stand for the event that the label j is encountered on the path to k, thus contributing its value to the weighted path length Wk(n). In view of this notation, the weighted path length of the node k is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xfrm002.gif?pub-status=live)
The indicators involved are dependent, and it is not easy to determine limit distributions from this representation. Even computations such as the variance are daunting. However, of course, the representation being a sum, the average is not a major obstacle. We develop this average to use as a benchmark for what falls between the two extremes.
Lemma 1:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm019.gif?pub-status=live)
Proof: Suppose j < k. A property of binary search trees is that when j [rtri ] k, all of the numbers in Akj = {j,j + 1,…, k} appear after j (See Devroye and Neininger [5] for a discussion via the theory of records.) It suffices to count Bn, the number of permutations favorable to the event j [rtri ] k. If j appears at position p in a favorable permutation, there must be at least k − j positions past p to receive the numbers in Akj − {j}. Thus, n − p ≥ k − j. To complete the construction of a favorable permutation, choose any of these j − k positions for the elements of Akj − {j} and permute its j − k numbers over these positions in (k − j)! ways. Now permute the remaining n − (k − j + 1) elements in an unrestricted way over the remaining n − (k − j + 1) positions. Therefore,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm020.gif?pub-status=live)
The argument for j > k is symmetrical, with j and k exchanging roles. █
It follows from Lemma 1 and the representation (2) that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm021.gif?pub-status=live)
The substitution m = k − j + 1 in the first sum together with a symmetrical one in the second sum gives the result in a simple form:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm022.gif?pub-status=live)
where Hr is the rth harmonic number
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm023.gif?pub-status=live)
.
The form for E[Wk(n)] is for the entire spectrum of nodes. For low-indexed nodes k = o(n), we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm024.gif?pub-status=live)
whereas for nodes with high index k = n − o(n),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm025.gif?pub-status=live)
however, if k ∼ αn, for 0 < α < 1, the asymptotic approximation is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm026.gif?pub-status=live)
which indicates that the majority of the medium range indexes lean toward the behavior of the weight of the path length to the maximal node, rather than the linear order of magnitude associated with the minimal node. This could ultimately be reflected in the average distribution across all of the nodes. For instance, if we select a node randomly in the tree, its index Kn will be uniformly distributed on the set {1,…, n}; consequently its weighted path length has the average
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170224030638074-0545:S026996480707009X:S026996480707009Xffm027.gif?pub-status=live)
with an order of magnitude just like that of the weight of the path length to the maximal node.