ON CLIMBING TRIES

Costas Christophi; Hosam Mahmoud

doi:10.1017/S0269964808000089


Published online by Cambridge University Press: 18 December 2007


Costas Christophi: Affiliation:
Cyprus International Institute for the Environment and Public Health in Association with Harvard School of Public Health, 1105, Nicosia, Cyprus; Biostatistics Center, The George Washington University, Rockville, MD 20852. E-mail: cchristophi@cyprusinstitute.org
Hosam Mahmoud: Affiliation:
Department of Statistics, The George Washington University, Washington, DC 20052. E-mail: hosam@gwu.edu


Abstract

To sample a typical key in a “trie,” an appropriate climbing might consider generating random edges in the same manner as the data are generated. In the absence of the probability generating the keys, an uninformed random choice among the children still provides an alternative. We are also interested in extremal sampling, achieved by following a leftmost (or a rightmost) path. Each of these climbing strategies always generates a key, but one that might not necessarily be in the database. We investigate the altitude of the position at which climbing is terminated. Analytical techniques, including poissonization and the Mellin transform, are used for the accurate calculation of moments. In all strategies, the mean is always logarithmic. For typical and uninformed climbing, the variance is bounded in unbiased tries but grows logarithmically in biased tries. Consequently, in the biased case, one can find appropriate centering and scaling to produce a limit distribution for these two climbing strategies; the limit is normal. For extremal climbing, the variance is always bounded for both biased and unbiased cases, and no nontrivial limit exists under any scaling.

Type: Research Article
Information: Probability in the Engineering and Informational Sciences, Volume 22, Issue 1, January 2008, pp. 133–149

DOI: https://doi.org/10.1017/S0269964808000089
Copyright: Copyright © Cambridge University Press 2008

1. INTRODUCTION

The random climbing of trees has been a subject that authors revisit from time to time. It was considered in Moon [Reference Moon10] and in Meir and Moon [Reference Meir and Moon11, Reference Meir and Moon12]. The subject has been revisited recently by Panholzer [Reference Panholzer13], who considered several classes of random trees, including simply generated families and Pólya trees. In these investigations, a class of trees is considered, and a type of random walk on it is exercised. Starting at the root, certain nodes are accessed, and at each node, an edge emanating from it is chosen at random (all edges coming out of a node being equally likely). The process is perpetuated until it is no longer possible to proceed. When the process is stopped, the path inscribed in the tree by climbing reaches a leaf.

We would like to consider the climbing of a class of random digital trees called the “trie.” However, we think the “model of randomness” should be changed from the usual uniform choice of edges to a climbing model that conforms with the manner in which these tries are randomly generated. The trie emerges from keys that are taken from a data generator that emits bits of data, with 1's having probability p and 0's having probability q = 1 − p. So, at each node of the trie, a simple Bernoulli random variable will govern the direction of the next turn. The process might not necessarily end on a leaf, as it might terminate at a null node, but it always generates a key (not necessarily in the trie).

The general interpretation of this climbing is that a typical key is being “sampled” from the database. Hence, we call this strategy typical climbing. In the absence of knowledge of the key generating probability, we consider an alternative strategy called uninformed climbing, in which we follow the right and left nexus with equal probability. Motivated by these sampling schemes, we also consider the case of sampling extremal data. In all of the cases, we develop asymptotic distributions for the length of the total distance climbed. One might be able to get exact expressions, too. For example, by purely combinatorial arguments, we compute the exact distribution of the climbing pathlength in extremal sampling to show that such exact expressions are possible in principle.

The plan of the article is as follows. For economy of notation, variables are reused in sections that are self-contained. For example, in Section 4, S _n, with moment generating function φ_n (t), will be the number of nodes on the path of typical climbing. These two symbols will be reused for the number of nodes on the path of uninformed and extremal climbing, and the corresponding moment generating functions, in Sections 5 and 6. Section 2 gives an overview of the trie structure and other terminology. In Section 3 the methodology is outlined. Analysis of typical climbing is pursued in Section 4, where in the biased case, we find a Gaussian limit for an appropriately normalized version of the climbing pathlength. Technicalities for asymptotically accurate mean and variance calculation are discussed in Subsection 4.1. Analysis of uninformed climbing is pursued in Section 5, where in the biased case, we also find a Gaussian limit. By contrast, we demonstrate in Section 6 that the extremal climbing pathlength does not possess a nontrivial limit under any scaling. We devote Subsection 6.1 to a combinatorial derivation of the exact distribution of the pathlength of extremal climbing.

2. TRIES

The trie is a data structure suitable for digital data (bits, hexadecimal strings, words, DNA strands, etc.), which are prevalent in science and technology. The trie was invented independently by De La Briandais [Reference De La Briandais2] and Fredkin [Reference Fredkin5] for information retrieval. Tries have numerous applications as a data structure for computer files, telecommunication signals, DNA, and so forth because of the digital nature of these data. Tries also provide a model for the analysis of several important algorithms, such as Radix Exchange Sort (see Knuth [Reference Knuth8]) and Extendible Hashing (see Fagin, Nievergelt, Pippenger, and Strong [Reference Fagin, Nievergelt, Pippenger and Strong3]).

A binary trie is a digital tree consisting of internal nodes, each having one or two children, and leaves that hold data (keys). The trie grows from n keys according to a construction algorithm. If n = 0, the insertion algorithm terminates. If n = 1, a leaf is allocated for the key given. If n ≥ 2, an internal node is allocated as a root of the tree; keys starting with zero go to the left subtree, and keys starting with 1 go to the right. The construction proceeds recursively in the subtrees, but at level ℓ, the (ℓ + 1)st bit of the key is used for branching. When the algorithm terminates, each key is in a leaf by itself, and the root-to-leaf paths correspond to minimal prefixes sufficient to distinguish the keys. Figure 1 illustrates a trie with five keys:

$\eqalign{X_1 &= 00111 \ldots\comma\; \cr X_2 &= 11011 \ldots\comma\; \cr X_3 &= 00011 \ldots\comma\; \cr X_4 &= 01010 \ldots\comma\; \cr X_5 &= 11111 \ldots.}$

Figure 1. A trie with five keys.
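The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the names `Leaf`, `Internal`, `build_trie`, and `leaf_depths` are hypothetical, and the keys are truncated to their first five bits, which is assumed to be enough to distinguish them.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    key: str  # the single key stored in this leaf


@dataclass
class Internal:
    left: object   # subtree for keys whose next bit is 0 (None if empty)
    right: object  # subtree for keys whose next bit is 1 (None if empty)


def build_trie(keys, level=0):
    """Build a binary trie from distinct bit strings of equal length."""
    if not keys:
        return None              # n = 0: nothing is allocated (null node)
    if len(keys) == 1:
        return Leaf(keys[0])     # n = 1: a leaf holds the key
    # n >= 2: branch on the (level + 1)st bit of each key
    zeros = [k for k in keys if k[level] == "0"]
    ones = [k for k in keys if k[level] == "1"]
    return Internal(build_trie(zeros, level + 1), build_trie(ones, level + 1))


def leaf_depths(node, depth=0):
    """Map each key to the depth of its leaf (the root is at depth 0)."""
    if node is None:
        return {}
    if isinstance(node, Leaf):
        return {node.key: depth}
    depths = leaf_depths(node.left, depth + 1)
    depths.update(leaf_depths(node.right, depth + 1))
    return depths


# The five keys of Figure 1, truncated to five bits.
KEYS = ["00111", "11011", "00011", "01010", "11111"]
trie = build_trie(KEYS)
depths = leaf_depths(trie)
```

In this sketch the key X₄ = 01010… ends up at depth 2, the other four keys at depth 3, and the left child of the root's right child is a null node, in agreement with the examples used later in the text.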

For ease of exposition, we will assume our data to be in binary representation of numbers in [0, 1]. We can always insert a binary point to the left of a binary string to turn it into a number from this range. The binary case lays out the methodology for any size alphabet. The case of a larger alphabet can be handled similarly; it just involves more details.

Suppose we have n ≥ 0 keys, each given as an (infinite) string representing its expansion into binary bits. We assume the Bernoulli(p) model of randomness, according to which the bits within a key are independent with probability p of a bit being 1 and probability q of it being 0 (with p + q = 1), and the keys themselves are independent. The data entropy in this model is

$h_p=- \lpar\, p\hbox{ ln}\,p + q\hbox{ ln}\,q\rpar .$

The ideal unbiased Bernoulli model is equivalent to sampling from a uniform distribution, a realistic assumption under hashing schemes, where the primary goal is to achieve uniformity.

3. METHODOLOGY

Two main tools in the ensuing analysis are the Mellin transform and poissonization–depoissonization. These methods are now standard and we will not produce the details in any great length, but refer the reader to standard sources on such material.

The Mellin transform of a function f(x) is

$\vint_0^\infty f\lpar x\rpar x^{s - 1} \, dx$

and will be denoted by f *(s). The Mellin transform usually exists in vertical strips, in the s complex plane, of the form

$a\lt \Re \, s\lt b$

for real numbers a < b. We will denote this strip by 〈a, b〉. The function f(x) can be recovered from its transform by a line integral

$f\lpar x\rpar ={1 \over 2\pi i} \vint_{c - i\infty}^{c+i \infty} f^{\ast} \lpar s\rpar x^{ - s} \, ds$

for any c ∈ (a, b). Usually, such an integral is computed asymptotically (as x → ∞) by shifting the line of integration an arbitrary distance to the right of the existence strip and compensating for the shift by the residues of the poles between the two lines of integration. There often is a small residual error of the form O(x ^−M) for an arbitrarily large positive number M. For a survey of the uses of the Mellin transform in the analysis of algorithms, see Flajolet, Gourdon, and Dumas [Reference Flajolet, Gourdon and Dumas4].
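As a small numerical sanity check of the definition, the following sketch approximates the Mellin transform of f(x) = e^−x, which is the gamma function Γ(s) for ℜs > 0. The helper name `mellin_numeric`, the truncation point, and the step count are illustrative assumptions; the midpoint rule is used only because it needs nothing beyond the standard library.

```python
import math


def mellin_numeric(f, s, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f(x) * x**(s - 1)
    over (0, upper); the tail beyond `upper` is assumed negligible."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += f(x) * x ** (s - 1.0)
    return total * h


# Mellin transform of exp(-x) at the real point s = 2.5, versus Gamma(2.5).
approx = mellin_numeric(lambda x: math.exp(-x), 2.5)
exact = math.gamma(2.5)
```

The two values should agree to several decimal places; a real argument s > 1 is chosen here so that the integrand is bounded near the origin.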

Certain complicated types of functional equation appear in the formulation of recurrence for pathlength under all climbing strategies. These types of recurrence equation are not easy to solve. However, poissonized versions are amenable to asymptotic analysis via the Mellin transform. In this context, poissonization means considering an analogous problem, but with a Poisson random number of keys instead of fixed n. The number of keys is taken to be a Poisson random variable with parameter z. The required asymptotic results for the fixed population are then extracted from the poissonized model by depoissonization, which usually means using the same results for the poissonized model after replacing z with n. This operation is justified by checking some regularity conditions, but it also introduces an asymptotically negligible error. We consider this as a standard program and will not give details, but refer the reader to the original work of Jacquet and Szpankowski [Reference Jacquet and Szpankowski7] or its presentation in textbook style in Szpankowski [Reference Szpankowski15].

4. TYPICAL CLIMBING

In typical sampling, we climb a trie by following an algorithm that emulates the natural frequency of bits. We start at the root and access nodes. At each node accessed, we generate an independent Bernoulli(p) random variable. If this variable yields zero, we follow the left edge if it exists (otherwise, the climbing is stopped), and if the value generated is 1, we follow the right edge if it exists (otherwise, the climbing is stopped).

Let S _n be the number of nodes on the path inscribed in the trie by the typical climbing. For example, given the trie of Figure 1, typical climbing might produce the key X ₄ in two steps with conditional probability pq, in which case S ₅ = 3 (counting the node containing X ₄). It might also reach the only null node in the trie (left of the root's right child) with probability pq, in which case S ₅ = 2. If the null node is reached, we take our typical sample to be 0.100000 … .
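The climbing procedure and the example above can be sketched without building the trie explicitly: the node reached after a sequence of moves corresponds to the set of keys sharing the chosen prefix (two or more keys for an internal node, one key for a leaf, none for a null node). The function names below are hypothetical, and the keys are again truncated to five bits.

```python
import random

# The five keys of Figure 1, truncated to five bits (an assumption that
# suffices because three bits already distinguish them).
KEYS = ["00111", "11011", "00011", "01010", "11111"]


def climb(keys, next_bit):
    """Return the number of nodes on the climbing path.

    next_bit(level) supplies the move ('0' = left, '1' = right) made at
    the node of depth `level`.
    """
    current = list(keys)   # keys stored below the current position
    level = 0
    nodes = 0
    while current:         # an empty list means a null node was reached
        nodes += 1
        if len(current) == 1:   # a single key: we are at a leaf
            break
        bit = next_bit(level)
        current = [k for k in current if k[level] == bit]
        level += 1
    return nodes


def fixed_path(bits):
    """Deterministic move sequence, useful for checking the examples."""
    return lambda level: bits[level]


def typical_climb(keys, p, rng):
    """Typical climbing: each move is an independent Bernoulli(p) bit."""
    return climb(keys, lambda level: "1" if rng.random() < p else "0")


s_key_x4 = climb(KEYS, fixed_path("01"))    # left, then right: reaches X4
s_null = climb(KEYS, fixed_path("10"))      # right, then left: null node
s_random = typical_climb(KEYS, 0.3, random.Random(1))
```

With the moves left-right the sketch counts three nodes, and with right-left it counts two, matching the values S₅ = 3 and S₅ = 2 in the example.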

Note that S _n can be linked to the depth D _n of a randomly chosen key, after an additional key is added to the initial n keys. If we are inserting the (n + 1)st key, this will follow the path of S _n. If the climbing terminates at an empty node, S _n and D _n+1 are the same, but if the climbing terminates at a key, we need to insert a number of additional nodes. The number of additional nodes is geometrically distributed because we have to break the tie—there will be a number (zero or more) of bits in the new key agreeing with bits in the key colliding with it past the point of collision: Each agreement occurs with probability p ² + q ², but sooner or later, one disagreement (with probability 1 − p ² − q ²) will break the tie. This geometric random variable is independent of the structure of the trie; it is distributed like the depth of two keys in a trie of size 2, but the total amount of modification to S _n to become D _n+1 is dependent on S _n. More precisely, if X ^(r) is the prefix of length r of a digital key X, then

$D_{n+1}=S_n+{\tilde D}_2 {\bf 1}_{\lcub \cup_{j=1}^n \lcub X_{n+1}^{\lpar S_n\rpar } = X_j^{\lpar S_n\rpar }\rcub \rcub } = S_n + {\tilde D}_2 \sum\limits_{j=1}^n {\bf 1}_{\lcub X_{n+1}^{\lpar S_n\rpar } = X_j^{\lpar S_n\rpar }\rcub }\comma\;$

where 1_E is the indicator function of an event E and ${\tilde D}_2$ is an independent copy of D ₂. The dependence in the sum introduces complications, but the analytic approach that follows is transparent and systematic enough to cover cases that cannot be linked easily to the depth, such as the case of climbing without the knowledge of p, where we generate right and left moves with equal probability.

Let φ_n(t) be the moment generating function of S _n. Let L _n and R _n be respectively the number of keys in the left and right subtrees (so L _n + R _n = n). In view of the Bernoulli model, L _n $\mathop=\limits^{\cal L}$ Binomial (n, q). The subtrees themselves are random tries on their respective order, which follows from the independence structure assumed in the data. The variable S _n satisfies a basic recurrence:

$S_n \vert L_n = \left\{\matrix{1+S_{L_n} \quad\hbox{with probability } q \cr \cr 1+{\tilde S}_{R_n}\quad \hbox{with probability }p.}\right.$

Here and in the sequel, a tilded random variable stands for a random variable distributed like the untilded version and is conditionally independent of it. We have the conditional expectation

${\bf E} \lsqb e^{S_n t}\vert L_n\rsqb = e^{\lpar 1+S_{L_n}\rpar t} q+e^{\lpar 1+{\tilde S}_{R_n}\rpar t} p.$

By a standard double expectation, we get

${\bf E}\lsqb e^{S_n t}\rsqb ={\bf E}\lsqb e^{\lpar 1+S_{L_n}\rpar t} q\rsqb +{\bf E}\lsqb e^{\lpar 1+{\tilde S}_{R_n}\rpar t} p\rsqb .$

By the binomial distribution of L _n, we get by conditioning

$\eqalign{\phi_n \lpar t\rpar & = {\bf E}\lsqb e^{S_n t}\rsqb \cr & = \sum\limits_{\ell=0}^n {\bf E} \lsqb e^{\lpar 1+S_\ell\rpar t}\rsqb q \left(\matrix{n \cr\ell} \right)q^\ell p^{n - \ell} + \sum\limits_{\ell=0}^n {\bf E} \lsqb e^{\lpar 1+S_{n - \ell}\rpar t}\rsqb p \left(\matrix{n \cr\ell}\right)q^\ell p^{n - \ell}.}$
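Taking expectations in the basic recurrence (rather than moment generating functions) gives a linear recurrence for E[S _n] that can be solved numerically for moderate n. The sketch below is an illustration under the boundary values S ₀ = 0 and S ₁ = 1; the function name `exact_mean_typical` is hypothetical. Note that E[S _n] appears on both sides (when all n keys fall in the subtree that is followed), so those two terms are moved to the left-hand side.

```python
import math


def exact_mean_typical(n_max, p):
    """Return the list [E[S_0], ..., E[S_n_max]] for typical climbing
    under the Bernoulli(p) model, using
        E[S_n] = 1 + sum_l P(L_n = l) * (q * E[S_l] + p * E[S_{n-l}]),
    valid for n >= 2, with L_n ~ Binomial(n, q).
    Intended for moderate n_max only (binomial weights are formed naively).
    """
    q = 1.0 - p
    mean = [0.0] * (n_max + 1)
    if n_max >= 1:
        mean[1] = 1.0
    for n in range(2, n_max + 1):
        acc = 1.0
        for l in range(n + 1):      # l keys go to the left subtree
            w = math.comb(n, l) * q**l * p ** (n - l)
            if l < n:               # left move, excluding the E[S_n] term
                acc += w * q * mean[l]
            if l > 0:               # right move, excluding the E[S_n] term
                acc += w * p * mean[n - l]
        # E[S_n] occurs with weight q^(n+1) (l = n, left move) and
        # p^(n+1) (l = 0, right move); solve for it.
        mean[n] = acc / (1.0 - q ** (n + 1) - p ** (n + 1))
    return mean


means = exact_mean_typical(50, 0.5)
```

For p = 1/2 the first values are E[S ₂] = 2 and E[S ₃] = 17/7, which can be confirmed by hand from the recurrence.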

Toward poissonization we construct the supergenerating function $A\lpar z\comma\ t\rpar = \sum\nolimits_{n=0}^\infty \phi_n \lpar t\rpar z^n / n!$. First, we multiply both sides of the latter equality by z ⁿ/n! and sum over n ≥ 2 (the range in which the recurrence holds) to get

$\eqalign{\sum\limits_{n=2}^\infty {\phi_n \lpar t\rpar \over n!} z^n = qe^t \sum\limits_{n=2}^\infty \sum\limits_{\ell=0}^n z^n \phi_\ell \lpar t\rpar {1 \over \ell !\, \lpar n - \ell\rpar !} q^\ell p^{n - \ell} \cr \quad + pe^t \sum\limits_{n=2}^\infty \sum\limits_{\ell=0}^n z^n \phi_{n - \ell} \lpar t\rpar {1 \over \ell !\, \lpar n - \ell\rpar !} q^\ell p^{n - \ell}.}$

The sums in this last equation are then extended to start from n = 0, after introducing the necessary adjustments for n = 0 and n = 1. Subsequently,

$\eqalign{A\lpar z\comma\ t\rpar &= qe^t e^{pz} A\lpar qz\comma \ t\rpar +pe^t e^{qz} A\lpar pz\comma\ t\rpar \cr & + 1 - e^t + ze^t - \lpar\, p^2+q^2\rpar e^{2t} z - 2pqe^t z.}$

Direct poissonization of A(z, t) runs into a problem in the existence of the Mellin transform. The difficulty is that when we multiply both sides of the equation by e ^−z, the right-hand side has the loose term (1 − e ^t) e ^−z. Formally, its Mellin transform is (1 − e ^t)Γ(s) and its domain of existence is ℜs > 0. The remaining terms, however, impose domains of existence in a strip with a negative real part. Hence, there will be no common domain of intersection.

To circumvent the difficulty, we deal with a shifted supermoment generating function. As we will see shortly, in this way the right-hand side has a difference of exponential terms, for which a domain of existence is consistent with the rest of the expression. Set

$B\lpar z\comma \ t\rpar =e^{ - z} \lpar A\lpar z\comma\; t\rpar - 1\rpar .$

The function B(z, t) has the interpretation:

$B\lpar z\comma\; t\rpar = {\bf E}\lsqb e^{S_{N\lpar z\rpar } t}\rsqb - e^{ - z}\comma$

where N(z) is a Poisson random variable with parameter z; that is, B(z, t) is the poissonized moment generating function of the climbing pathlength, modified by an exponentially negligible error, from which we obtain the functional equation

$\eqalign{B\lpar z\comma \ t\rpar & = e^t \lpar\, pe^{ - pz} A\lpar\, pz\comma \ t\rpar + qe^{ - qz} A\lpar qz\comma \ t\rpar +\lpar\, p^2+q^2\rpar ze^{ - z} \lpar 1 - e^t\rpar - e^{ - z}\rpar \cr & = e^t \lpar\, pB\lpar\, pz\comma \ t\rpar +qB\lpar qz\comma \ t\rpar +p\lpar e^{ - pz} - e^{ - z}\rpar +q\lpar e^{ - qz} - e^{ - z}\rpar\cr & \quad +\lpar\, p^2+q^2\rpar ze^{ - z} \lpar 1 - e^t\rpar \rpar .}$

The Mellin transform of this function is

(1)

$B^{\ast}\lpar s\comma \ t\rpar ={e^t \Gamma \lpar s\rpar \lpar\, p^{ - s+1}+q^{ - s+1} - 1+\lpar\, p^2+q^2\rpar \lpar 1 - e^t\rpar s\rpar \over 1 - e^t \lpar\, p^{ - s+1}+q^{ - s+1}\rpar }\comma\;$

where we treated a term like e ^−pz − e ^−z as e ^−pz − 1 − (e ^−z − 1), with each shifted exponential function having a Mellin transform representation in terms of the Cauchy–Saalschütz gamma function in the domain 〈−1, 0〉; that is, such a term has the Mellin transform (p ^−s − 1)Γ(s) in that domain. Now, the Mellin transform B ^*(s, t) exists in the domain

$- 1\lt \Re s \lt s_0 \lpar t\rpar \comma$

where s ₀ (t) is the only real solution to the equation

(2)

$p^{ - s+1}+q^{ - s+1}=e^{ - t}.$

Observe that s ₀ (t) is a continuous function of t, with value 0 at t = 0. We thus can find a neighborhood around t = 0 for which s ₀(t) is arbitrarily close to zero. We will keep |t| small enough for the entire strip 〈s ₀ (t), −s ₀ (t)〉 to be contained in 〈−1/4, 1/4〉.
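The root s ₀(t) of (2) has no closed form in general, but it is easy to compute numerically because the left-hand side of (2) is increasing in s. The sketch below uses plain bisection; the function name `s0`, the bracketing interval [−1, 1], and the tolerance are illustrative assumptions that are adequate for small |t|.

```python
import math


def s0(t, p, tol=1e-13):
    """Real root s of p**(1 - s) + q**(1 - s) = exp(-t), by bisection.

    Assumes exp(-t) lies between p^2 + q^2 (value at s = -1) and
    2 (value at s = 1), which holds for small |t|.
    """
    q = 1.0 - p
    target = math.exp(-t)
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p ** (1.0 - mid) + q ** (1.0 - mid) < target:
            lo = mid    # left-hand side too small: the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


p = 0.3
q = 1.0 - p
h_p = -(p * math.log(p) + q * math.log(q))   # entropy of the source

root_at_zero = s0(0.0, p)
step = 1e-3
# Central finite differences approximating the first two derivatives at t = 0.
first_derivative = (s0(step, p) - s0(-step, p)) / (2 * step)
second_derivative = (s0(step, p) - 2 * root_at_zero + s0(-step, p)) / step**2
```

The finite differences can be compared with the values −1/h_p and −pq(ln p − ln q)²/h_p³ for the first and second derivatives of s ₀ at t = 0, which are used in the proof of Theorem 2.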

4.1. The Mean Climbing Pathlength in Typical Climbing

We will need a few technicalities for the proof, and we discuss them first. The inverse Mellin transform involved in the mean requires a computation of residues at the roots of the characteristic equation

(3)

$1 - p^{ - s+1} - q^{ - s+1}=0.$

The roots of this equation have been studied. The following special case is known (e.g., see Szpankowski [Reference Szpankowski15], who attributed the result to Jacquet [Reference Jacquet6] and Schachinger [Reference Schachinger14]). We present the result as written in Drmota, Reznik, Savari, and Szpankowski [private communication].

Lemma 1

Let p < q. There are countably infinitely many simple solutions (characteristic roots) of 1 − p ^−s+1 − q ^−s+1 = 0. The roots satisfy the following:

(i) s ₀ = 0 is always a root.
(ii) If b is a root, then 0 ≤ ℜb ≤ ρ, where ρ is the unique real positive solution of 1 − p ^−s+1 + q ^−s+1 = 0. Moreover, for every integer k, there exists a unique root s _k with imaginary part (2k − 1)π/|ln p| < ℑs _k ≤ (2k + 1)π/|ln p|. Consequently, s _k, for k = 0, ±1, ±2, … , are all the roots.
(iii) If ln p/ln q = m/r (where gcd (m, r) = 1 for positive integers m and r ), there are m − 1 roots, s ₁, s ₂, … , s _m−1, with real part greater than zero. The rest of the roots are in the form
$s_{k}=s_{k\, {\rm mod}\, m} + {2 \pi i\lpar k-k \ \hbox{mod}\ m\rpar \over \ln p} \quad \hbox{for } k\ge m \hbox{ and } k\lt 0.$
(iv) If ln p/ln q is irrational, then ℜs _k > 0 for all k ≠ 0.

A symmetrical statement applies when p > q, but in this case, ρ is defined as the positive root of 1 + p ^−s+1 − q ^−s+1 = 0.

Theorem 1

Let S _n be the number of nodes on the path of typical climbing of a trie on n keys from the Bernoulli(p) model. Then

$\eqalign{{\bf E}\lsqb S_n\rsqb &= {{\ln n } \over {h_p }}+{1 \over {h_p }}\lpar \gamma - 1 - \ln p+2pq - \ln q\rpar \cr &- {1 \over {2\, h_p^2 }}\lpar\, p\ln ^2 \!p+2\ln p\ln q+q\ln^2 q\rpar +\eta_1 \lpar \ln n \rpar \cr &+ o\lpar 1 \rpar \comma\; \cr{\bf Var} \lsqb S_n\rsqb &= {pq\lpar \ln p - \ln q\rpar ^2 \over h_p^3} \ln n +o\lpar \ln n\rpar \comma }$

where η₁(·) is the function given by the Fourier expansion

$\eta_1 \lpar u\rpar =\left\{\matrix{- \displaystyle{1 \over h_p} \sum\limits_{k=- \infty \atop k \ne 0}^\infty \lpar 1+\lpar\, p^2+q^2\rpar s_{mk}\rpar \Gamma \lpar s_{mk}\rpar e^{ - s_{mk} u} \hfill & \hbox{if } {\ln p \over \ln q} = {m \over r} \hbox{ is rational with } \gcd \lpar m\comma \ r\rpar = 1\comma \hfill \cr 0 \hfill & \hbox{otherwise} \hfill}\right.$

(with s _mk a nonzero solution of p ^−s+1 + q ^−s+1 = 1 with real part zero). In either case, η₁ is uniformly bounded by a small number.¹ The o(ln n) term in the variance might also have small bounded oscillations.
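For quick numerical experiments, the non-oscillating part of the mean and the leading coefficient of the variance in Theorem 1 can be coded directly. The sketch below omits η₁ and the o(1) term, so it is only an approximation of E[S _n]; the function names and the hard-coded value of Euler's constant γ are the only additions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, denoted gamma in the text


def entropy(p):
    """Entropy h_p = -(p ln p + q ln q) of the Bernoulli(p) source, 0 < p < 1."""
    q = 1.0 - p
    return -(p * math.log(p) + q * math.log(q))


def asymptotic_mean_typical(n, p):
    """Mean of Theorem 1 without the oscillating term eta_1 and the o(1) term."""
    q = 1.0 - p
    h = entropy(p)
    lp, lq = math.log(p), math.log(q)
    constant = (EULER_GAMMA - 1.0 - lp + 2.0 * p * q - lq) / h
    constant -= (p * lp**2 + 2.0 * lp * lq + q * lq**2) / (2.0 * h * h)
    return math.log(n) / h + constant


def variance_coefficient_typical(p):
    """Coefficient of ln n in the variance of Theorem 1."""
    q = 1.0 - p
    return p * q * (math.log(p) - math.log(q)) ** 2 / entropy(p) ** 3


mean_unbiased = asymptotic_mean_typical(1024, 0.5)
coeff_unbiased = variance_coefficient_typical(0.5)
coeff_biased = variance_coefficient_typical(0.3)
```

In the unbiased case the expression reduces to log₂ n + (γ − 1/2)/ln 2 + 1/2 and the coefficient of ln n in the variance vanishes, consistent with the bounded variance noted after the proof.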

Proof

In order to calculate the mean, we take the first derivative of (1) with respect to t and evaluate for t = 0, yielding

${\partial \over {\partial t}}B^{\ast} \lpar s\comma \ t\rpar \vert_{t=0}=- {{\Gamma \lpar s\rpar \lpar 1+\lpar\, p^2+q^2\rpar s\rpar } \over {1 - q^{-s+1}-p^{-s+1} }}.$

This is the Mellin transform of the expected poissonized pathlength E[S _N(z)] and exists in 〈−1, 0〉 (the strip of existence of B ^*(s, t) at t = 0, where s ₀(0) = 0).

The roots of the characteristic equation (3) determine the asymptotics of the mean. According to Lemma 1, in the case when ln p/ln q is rational, there are roots aligned and equispaced on the vertical axis of the s complex plane, and the rest of the characteristic roots fall in the right half of the s complex plane, whereas in the case where ln p/ln q is irrational, all of the roots fall in the right half of the s complex plane except for s ₀ = 0.

The inverse Mellin transform is

${\bf E} \lsqb S_{N\lpar z\rpar }\rsqb =O\lpar z^{ - M}\rpar + \sum\limits_{k=- \infty}^\infty \, \mathop {\rm Res} \limits_{s=s_{k}} \left[{z^{ - s} \Gamma \lpar s\rpar \lpar 1+\lpar\, p^2+q^2\rpar s\rpar \over 1 - q^{-s+1}-p^{-s+1}} \right]\comma$

where M > ρ ≥ a is an arbitrarily large positive number.

The main contribution comes from s ₀ = 0, as it is the only double pole; the rest are simple. We obtain

$\eqalign{{\bf E}\lsqb S_{N\lpar z\rpar }\rsqb & = {{\ln z} \over {h_p }}+{1 \over {h_p }}\lpar \gamma - 1 - \ln p+2pq - \ln q\rpar \cr & \quad - {1 \over {2 h_p^2 }}\lpar\, p\ln^2 p+2\ln p\ln q+q\ln^2 q\rpar +\eta_1 \lpar \ln z\rpar +o\lpar 1\rpar. }$

By standard depoissonization we arrive at the same expression for E[S _n], and only the error term is modified by the depoissonization error of O(n ⁻¹ ln n), which comes on top of the Mellin inversion error of o(1).

In order to calculate the second moment, we take the second derivative of (1) with respect to t and evaluate at t = 0. We have

$\eqalign{{{\partial^2 } \over {\partial t^2 }}B^ {\ast} \lpar s\comma \ t\rpar \vert_{t=0} & = - {{\Gamma \lpar s\rpar } \over {\lpar 1 - q^{-s+1}-p^{-s+1}\rpar ^2 }}\lsqb 1+3\lpar\, p^2+q^2\rpar s \cr & \quad - \lpar 1 - \lpar\, p^2+q^2\rpar s\rpar \lpar\! - q^{-s+1}-p^{-s+1}\rpar \rsqb .}$

This is the Mellin transform of the expected value of S ²_N(z). In the inverse Mellin transform, the main contribution comes from the pole at s ₀ = 0. After depoissonization, we get

$\eqalign{{\bf E}\lsqb S_n^2\rsqb & \sim {1 \over {h_p^2 }} \ln^2 n+{1 \over {h_p^3}} \lpar \lpar 1-p^2\rpar \ln^2 q - \lpar\, p^2 - 2p\rpar \ln^2 p - 2pq\ln p\ln q \cr & \quad - \lpar 4p^3 - 8p^2+6p+2\gamma q - 2\rpar \ln q - \lpar 4p^2 q+2\gamma p - 2p\rpar \ln p\rpar \ln n.}$

The variance follows from the first two moments, after straightforward algebraic simplification.■

Curiously, the variance in the unbiased case is $O\lpar 1\rpar$ (in this case, all of the poles lie on the vertical axis of the s complex plane). In the biased case ( $p \ne q$ ), we have growth in the variance with the number of keys, which admits the existence of an asymptotic distribution for the typical climbing pathlength (after an appropriate normalization).

Theorem 2

Let S _n be the number of nodes on the path of typical climbing of a trie on n keys from a biased Bernoulli(p) model. Then

${{{S_n - \left({ 1 \over h_p}\right) \ln n} \over \sqrt {\ln n}}} \mathop {\to} \limits^D {\cal N} \left( 0\comma\; {pq \over h_p^3} \lpar \ln p - \ln q\rpar ^2 \right).$

Proof

We take |t| small enough so that 〈s ₀(t), −s ₀(t)〉⊆〈−(1/4), (1/4)〉. The inverse Mellin transform of (1) yields

$B\lpar z\comma \ t\rpar = - \sum\limits_{k=- \infty}^\infty \mathop {\rm Res}\limits_{s=s_{k} \lpar t\rpar } \left[{B^{\ast} \lpar s\comma\; t\rpar z^{ - s}} \right]+ O\lpar z^{ - M}\rpar$

for any fixed M > ρ. Hence,

$\eqalign{{\bf E}\lsqb e^{S_{N\lpar z\rpar } t}\rsqb & = - {{\Gamma \lpar s_0 \lpar t\rpar \rpar \lpar e^{ - t} - 1+\lpar\, p^2+q^2\rpar \lpar 1 - e^t\rpar s_0 \lpar t\rpar \rpar z^{ - s_0 \lpar t\rpar } } \over {p^{ - s_0 \lpar t\rpar +1} \ln p+q^{ - s_0 \lpar t\rpar +1} \ln q}} \cr & \quad - {1 \over {p^{ - s_0 \lpar t\rpar +1} \ln p+q^{ - s_0 \lpar t\rpar +1} \ln q}} \cr & \quad \times \sum\limits_{k=- \infty \atop k \ne 0}^\infty \, \lpar \Gamma \lpar s_{k} \lpar t\rpar \rpar \lpar e^{ - t} - 1+\lpar\, p^2+q^2\rpar \lpar 1 - e^t\rpar s_{k} \lpar t\rpar \rpar z^{ - s_{k} \lpar t\rpar }\rpar +O\lpar z^{ - M}\rpar .}$

We isolated the role of s ₀(t) because, as we will see shortly, it provides the dominant asymptotics when t is in a neighborhood of zero (where the gamma function also becomes very large), contrasting the finite limit of Γ(s _k(t)), as t → 0, for each k ≠ 0. Depoissonization gives

${\bf E}\lsqb e^{S_n t}\rsqb \sim - {{\Gamma \lpar s_0 \lpar t\rpar \rpar \lpar e^{ - t} - 1+\lpar\, p^2+q^2\rpar \lpar 1 - e^t\rpar s_0 \lpar t\rpar \rpar n^{ - s_0 \lpar t\rpar } } \over {p^{ - s_0 \lpar t\rpar +1} \ln p+q^{ - s_0 \lpar t\rpar +1} \ln q}}.$

The essential root s ₀(t) is a continuous infinitely differentiable function of t. For t → 0, s ₀(t) has the expansion

$s_0 \lpar t\rpar =s_0 \lpar 0\rpar +s^{\prime}_0 \lpar 0\rpar t+s_0^{\prime\prime} \lpar 0\rpar {{t^2 } \over 2}+O\lpar t^3\rpar .$

It is clear from (2) that s ₀(0) = 0. Also, s′₀(0) = − ${1\over h_{p}}$ and s″₀(0) = − $pq \over h_{p}^{3}$ (ln p − ln q)², as can be seen from the derivatives of (2). Further, we use the local expansions 1 − e ^x = −x + O(x ²) and Γ(x) = ${1\over x}$ − γ + O(x), near x = 0. After substituting t by $\upsilon / \sqrt{\ln n}$, for fixed υ, we obtain

$\eqalign{{\bf E}\lsqb e^{S_n \lpar{\upsilon / \sqrt {\ln n}}\rpar} \rsqb & \sim {{n^{ - \left( s^{\prime}_0 \lpar 0\rpar \left({\upsilon / \sqrt {\ln n}}\right)+ s^{\prime\prime}_0 \lpar 0\rpar \lpar \upsilon^2 / \lpar 2\ln n\rpar \rpar +O\lpar 1 / \ln^{3/2} n\rpar \right) } } \over {s^{\prime}_0 \lpar 0\rpar \left(\,p^{ - s_0 \left({\upsilon / \sqrt {\ln n}}\right) +1} \ln p+q^{ - s_0 \left( {\upsilon / \sqrt {\ln n}}\right) +1} \ln q\right) }} \cr & \sim {{e^{\ln n \left( \upsilon / \lpar h_p \sqrt {\ln n}\rpar - s^{\prime\prime}_0 \lpar 0\rpar \upsilon^2 / \lpar 2\ln n\rpar \right) } } \over {s^{\prime}_0 \lpar 0\rpar \lpar\, p\ln p+q\ln q\rpar }}.}$

Therefore,

${\bf E}\left[ e^{\lpar S_n - \lpar \ln n\rpar / h_p \rpar \left({\upsilon / \sqrt {\ln n}}\right)}\right] \to e^{ - \lpar s^{\prime\prime}_0 \lpar 0\rpar / 2\rpar \upsilon^2}\comma$

with the right-hand side being the moment generating function of a normal random variate with mean 0 and variance −s″₀(0).■

Note that Theorem 2 and its proof can stand alone without the need for the development of the mean and variance of Theorem 1. However, the mean and variance given by the shortcut in Theorem 2 are only the leading terms in the full expansion provided by the more elaborate residue calculation of Theorem 1. One would not even detect the oscillations in the mean and variance with the method used in Theorem 2.
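The normalization in Theorem 2 can be explored by simulation. The sketch below does not build a trie; it samples S _n directly from the basic recurrence (a Binomial(m, q) split of the m keys at the current node, followed by a Bernoulli(p) move). The function names, the seed, and the sample sizes are illustrative assumptions, and for moderate n the lower-order terms of the mean are still visible in the normalized values.

```python
import math
import random


def simulate_typical(n, p, rng):
    """Sample S_n for typical climbing by iterating the basic recurrence."""
    q = 1.0 - p
    m = n          # number of keys below the current position
    nodes = 0
    while m > 0:   # m = 0 corresponds to a null node
        nodes += 1
        if m == 1:             # a single key: the path ends at a leaf
            break
        # Number of keys whose next bit is 0: Binomial(m, q).
        left = sum(1 for _ in range(m) if rng.random() < q)
        go_right = rng.random() < p   # the climber's Bernoulli(p) move
        m = (m - left) if go_right else left
    return nodes


def normalized_sample(n, p, size, seed=0):
    """Values of (S_n - ln n / h_p) / sqrt(ln n), as in Theorem 2."""
    rng = random.Random(seed)
    q = 1.0 - p
    h = -(p * math.log(p) + q * math.log(q))
    scale = math.sqrt(math.log(n))
    return [(simulate_typical(n, p, rng) - math.log(n) / h) / scale
            for _ in range(size)]


sample = normalized_sample(200, 0.3, 500)
```

A histogram of `sample` for increasing n gives a rough visual impression of the Gaussian limit; convergence is slow because the scaling is only of order √(ln n).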

5. CLIMBING WITH THE LACK OF KNOWLEDGE OF p

If one is uninformed about p, one might be inclined to plead ignorance and simply generate moves in the random walk to the right and left subtrees with equal probability, hoping that this will average good and bad cases, achieving a sampling strategy that is not too much worse than typical climbing.

The result presented next indicates that uninformed climbing is faster on average. Of course, the two strategies coincide when p = q = 1/2, but uninformed climbing requires less time than typical climbing as p gets away from 1/2, and the uninformed strategy speeds up considerably near the extremal values p = 0 and p = 1. However, the improved performance in the uninformed search comes at the expense of the quality of sampling, as less probable keys are given more weight than their actual probability.
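The comparison can be made concrete by looking at the coefficients of ln n in the two means: 1/h_p for typical climbing (Theorem 1) and 2/ln(1/(pq)) for uninformed climbing (Theorem 3, stated below). The following sketch simply evaluates both; the function names are hypothetical.

```python
import math


def typical_rate(p):
    """Coefficient of ln n in the mean of typical climbing: 1 / h_p."""
    q = 1.0 - p
    return 1.0 / -(p * math.log(p) + q * math.log(q))


def uninformed_rate(p):
    """Coefficient of ln n in the mean of uninformed climbing:
    2 log_{1/(pq)} n = (2 / ln(1/(pq))) ln n."""
    q = 1.0 - p
    return 2.0 / math.log(1.0 / (p * q))


# Leading coefficients on a grid of bias values.
grid = [k / 20 for k in range(1, 20)]
rates = [(p, typical_rate(p), uninformed_rate(p)) for p in grid]
```

The two coefficients agree at p = 1/2, and the uninformed one is the smaller of the two elsewhere, since 2h_p ≤ ln(1/(pq)) is equivalent to (q − p)(ln p − ln q) ≤ 0.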

The techniques are much the same as in typical climbing. We will only set up the problem, show the salient intermediate steps, and state the analogous results without proof.

Let S _n be the number of nodes on the path inscribed in the trie by the uninformed climbing and let φ_n(t) be its moment generating function. For example, given the trie of Figure 1, uninformed climbing might produce in two steps the key X ₄, with S ₅ = 3, or the null node (corresponding to a key value 0.100000…), in which case S ₅ = 2. In either case, the key is generated with probability 1/4.

The length S _n satisfies a basic recurrence:

$S_n \vert L_n = \left\{\matrix{1+S_{L_n} \quad \hbox{with probability } {1 \over 2} \cr 1 + {\tilde S}_{R_n} \quad \hbox{with probability } {1 \over 2}.}\right.$

By a standard double expectation, we get

${\bf E} \lsqb { e}^{S_n t}\rsqb = {1 \over 2} {\bf E} \lsqb e^{\lpar 1+S_{L_n }\rpar t}\rsqb + {1 \over 2} {\bf E}\lsqb e^{\lpar 1+{\tilde S}_{R_n}\rpar t}\rsqb .$

Toward poissonization, we reintroduce the supergenerating function A(z, t) = ∑_n=0^∞(φ_n(t)/ n!)z ⁿ, and after manipulation similar to that in the case of typical climbing (mutatis mutandis, of course), we reach

$A\lpar z\comma \ t\rpar - 1 = {1 \over 2}e^t \lpar e^{pz} A\lpar qz\comma \ t\rpar +e^{qz} A\lpar\, pz\comma\ t\rpar - 2+z - ze^t\rpar .$

As earlier, we reintroduce

$B\lpar z\comma \ t\rpar =e^{ - z} \lpar A\lpar z\comma\ t\rpar - 1\rpar$

to work with a shifted function possessing a Mellin transform. We first obtain the functional equation

$B\lpar z\comma \ t\rpar ={1 \over 2}e^t \lpar B\lpar\, pz\comma\ t\rpar +B\lpar qz\comma \ t\rpar +ze^{ - z} \lpar 1 - e^t\rpar +e^{ - pz}+e^{ - qz} - 2e^{ - z}\rpar \comma$

the Mellin transform of which is

$B^{\ast} \lpar s\comma\ t\rpar ={{e^t \Gamma \lpar s\rpar \lpar\, p^{ - s}+q^{ - s} - 2+\lpar 1 - e^t\rpar s\rpar } \over {2 - e^t \lpar\, p^{ - s}+q^{ - s}\rpar }}\comma$

existing in the strip 〈−1, s ₀(t)〉, and s ₀(t) being the only real root of the equation

$p^{ - s_0 \lpar t\rpar }+q^{ - s_0 \lpar t\rpar }=2e^{ - t}.$

We take |t| small enough so that 〈s ₀(t), −s ₀(t)〉 ⊆ 〈−1/4, 1/4〉. After all of the manipulation and the residue calculation, we reach the result for this random walk.

Theorem 3

Let S _n be the number of nodes on the path of uninformed climbing of a trie on n keys from the Bernoulli(p) model. Then

$\eqalign {&\quad{\bf E}\lsqb S_n\rsqb = 2\log_{{{1 / {pq}}}} n+{{\ln^2 p+\lpar 1 - 2\gamma\rpar \ln\lpar\, pq\rpar + \ln^2 q} \over { \ln^2 \lpar\, pq\rpar }}+\eta_2 \lpar \ln n \rpar +o ({1}) \comma\; \cr & {\bf Var}\lsqb S_n\rsqb \sim {{2\lpar \ln p - \ln q\rpar ^2 } \over { \ln^3 {{1 \over {pq}}}}}\ln n\comma }$

where η₂(·) is a function given by the Fourier expansion

$\eta_2 \lpar u\rpar =\left\{\matrix{- {1 \over h_p} \sum\limits_{k=- \infty \atop k \ne 0}^\infty \Gamma \lpar s_{mk}\rpar \lpar 2+s_{mk}\rpar e^{ - s_{mk} u} \quad \hbox{if } {\ln p \over \ln q} = {m \over r} \hbox{ is rational with } \gcd \lpar m\comma\ r\rpar = 1\comma \cr 0\hfill \hbox{otherwise}\hfill}\right.$

(with $s_{mk}$ a nonzero solution of $p^{-s} + q^{-s} = 2$ with real part equal to zero), which is bounded by a very small number. The lower-order term in the variance can also have small bounded oscillations. Moreover, in the biased case,

${S_n - 2\log_{1 / pq} n \over \sqrt {\ln n}} \mathop {\to} \limits^D {\cal N} \left(0\comma\; {2 \lpar \ln p - \ln q\rpar ^2 \over \ln ^3 { 1 \over pq} }\right).$
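As a numerical sanity check on the mean in Theorem 3, one can iterate the conditional recurrence for uninformed climbing exactly. The sketch below assumes the boundary values $S_0 = 0$ and $S_1 = 1$ (consistent with the functional equation for $A(z, t)$ above) and that $L_n$ is Binomial($n$, $q$); the function names are hypothetical. The oscillating term $\eta_2$ is ignored, which the theorem describes as very small.

```python
import math

EULER_GAMMA = 0.5772156649015329


def uninformed_mean(n_max: int, p: float) -> list[float]:
    """Exact E[S_n], n = 0..n_max, from S_n | L_n = 1 + S_{L_n} or 1 + S_{R_n}, each w.p. 1/2."""
    q = 1.0 - p
    m = [0.0] * (n_max + 1)
    if n_max >= 1:
        m[1] = 1.0  # assumed boundary value: a single key is one node
    for n in range(2, n_max + 1):
        # Binomial(n, q) probabilities for the size L_n of the left subtree.
        b = [math.comb(n, l) * q**l * p ** (n - l) for l in range(n + 1)]
        # Splits that put keys on both sides (each side is climbed w.p. 1/2).
        rhs = 1.0 + 0.5 * sum(b[l] * (m[l] + m[n - l]) for l in range(1, n))
        # Splits l = 0 and l = n leave all n keys on one side; solve for m[n].
        m[n] = rhs / (1.0 - 0.5 * (b[0] + b[n]))
    return m


def theorem3_mean(n: int, p: float) -> float:
    """Leading terms of E[S_n] in Theorem 3, without the oscillating term."""
    q = 1.0 - p
    lpq = math.log(p * q)
    const = (math.log(p) ** 2 + (1.0 - 2.0 * EULER_GAMMA) * lpq + math.log(q) ** 2) / lpq**2
    return -2.0 * math.log(n) / lpq + const


if __name__ == "__main__":
    means = uninformed_mean(200, 0.5)
    for n in (10, 50, 200):
        print(n, means[n], theorem3_mean(n, 0.5))
```

The recurrence costs $O(n^2)$ operations, so it is only meant for moderate $n$; in the biased case the $o(1)$ term may decay slowly, and the agreement can be correspondingly slower.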

6. EXTREMAL SAMPLING

To develop a sense for the extremes of the data present in the trie, a sampler might adopt the extremal strategy of following a leftmost (for smallest) or a rightmost (for largest) path. Of course, the two strategies are symmetric with respect to the roles of $p$ and $q$, and we only analyze one of them.

Let us reintroduce $S_n$ as the number of nodes on the leftmost path and let $\phi_n(t)$ be its moment generating function. For instance, given the trie of Figure 1, the extremal leftmost climbing samples the key $X_3$, and $S_5 = 4$. If the leftmost path reaches a null node, we augment the corresponding prefix of zeros with a 1 to construct a representative sample of the smallest data.

The problem can be thought of in terms of the longest run of consecutive zeros. This case is connected to the maximum and second largest of independent and identically distributed geometric variables. Let $Z_i$ be the number of initial zeros in the key $X_i$ and let $Z_{(i)}$ be the $i$th order statistic of $Z_1, \ldots, Z_n$, ranked from the largest, so that $Z_{(1)}$ is the maximum and $Z_{(2)}$ is the second largest. Then the $Z_i$ are identically distributed geometric variables and

$S_{n}= \left\{\matrix{Z_{\lpar 2\rpar }+1\quad \hbox{if } Z_{\lpar 1\rpar } = Z_{\lpar 2\rpar }\comma \cr Z_{\lpar 2\rpar } + 2\hfill \hbox{otherwise}.\hfill}\right.$

One might be able to obtain a result for $S_n$ from this representation. However, order statistics of independent and identically distributed discrete random variables are somewhat intricate because of the possible ties. We will proceed with our systematic analytic method.
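The order-statistic representation can be checked empirically. The sketch below is illustrative only: it assumes each bit of a key is 0 with probability $q$, that $n \ge 2$, and that $S_n$ counts the internal nodes on the leftmost path plus the terminating leaf if there is one (a null node is not counted). The helper names are hypothetical. Since the leftmost path depends on the keys only through their numbers of leading zeros, no trie needs to be built.

```python
import random


def leading_zeros(q: float, rng: random.Random) -> int:
    """Number of initial zeros of a key whose bits are 0 with probability q."""
    z = 0
    while rng.random() < q:
        z += 1
    return z


def leftmost_nodes(zs: list[int]) -> int:
    """Nodes on the leftmost path: keep going left while at least two keys share the zero prefix."""
    nodes, depth, alive = 0, 0, len(zs)
    while alive >= 2:
        nodes += 1  # internal node at the current depth
        depth += 1
        alive = sum(1 for z in zs if z >= depth)
    # One surviving key sits in a leaf (counted); no survivor means a null node.
    return nodes + (1 if alive == 1 else 0)


def via_order_statistics(zs: list[int]) -> int:
    """S_n computed from the two largest values Z_(1) >= Z_(2)."""
    top, second = sorted(zs, reverse=True)[:2]
    return second + 1 if top == second else second + 2


if __name__ == "__main__":
    rng = random.Random(7)
    zs = [leading_zeros(0.6, rng) for _ in range(10)]
    print(zs, leftmost_nodes(zs), via_order_statistics(zs))
```

The two functions should agree on every input with at least two keys; ties for the maximum are exactly the case in which the path ends at a null node.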

The basic conditional recurrence is

$S_{n} \vert L_{n}=1+S_{L_{n}}\comma$

for n ≥ 2, giving

${\bf E}\lsqb e^{S_{n} t} \vert L_{n} \rsqb =e^{\lpar 1+S_{L_{n}}\rpar t}.$

Hence,

${\bf E}\lsqb e^{S_{n} t} \rsqb =\sum\limits_{\ell=0}^n \, {\bf E}\left[{e^{\lpar 1+S_\ell\rpar t} } \right]\left({\matrix{n \cr\ell}} \right)q^\ell p^{n - \ell } .$

Toward poissonization, we reintroduce the generating function $A(z, t) = \sum_{n=0}^{\infty} \phi_n(t) z^n / n!$, where $\phi_n(t) = {\bf E}[e^{S_n t}]$. By steps similar to previous derivations in the other two strategies, we can easily establish the relation

$A\lpar z\comma \ t\rpar - 1 - ze^{t}=e^{t} e^{pz} A\lpar qz\comma \ t\rpar - e^{t} \lpar 1+pz+qze^{t}\rpar .$

As we did in Section 4 for typical sampling, we do not poissonize $A(z, t)$ directly, but we poissonize the shifted version $A(z, t) - 1$, for the same technical reason: to ensure that the Mellin transform exists. The routine is much the same and we omit its details. One obtains the Mellin transform

(4)

$B^ {\ast} \lpar s\comma\ t\rpar ={{e^{t} \Gamma \lpar s\rpar \lpar q^{ - s} - 1+sq\lpar 1 - e^{t}\rpar \rpar } \over {1 - q^{ - s} e^{t} }}.$

Theorem 4

Let $S_n$ be the number of nodes on the path of extremal climbing of a trie on $n$ keys from the Bernoulli($p$) model. Then

$\eqalign{&\quad{\bf E}\lsqb S_{n} \rsqb = \log_{1/q} n+{{2q+\ln q - 2\gamma } \over {2\ln q}}+\eta_3 \lpar \ln n\rpar +o\lpar 1\rpar \comma\; \cr &{\bf Var}\lsqb S_{n} \rsqb = {1 \over {12}}+{{\pi ^2 } \over {6\ln ^2 q}}+{{2q} \over {\ln q}} - {{q^2 } \over {\ln ^2 q}}+o\lpar 1\rpar \comma\; \cr}$

where η₃(·) is the function given by the Fourier expansion

$\eta_3 \lpar u\rpar =\left\{\matrix{{1 \over \ln q} \sum\limits_{{k=- \infty} \atop {k \ne 0}}^{\infty} \Gamma \lpar s_{k}\rpar \lpar 1+s_{k} q\rpar e^{ - s_{k} u} \quad \hbox{if } {\ln p \over \ln q} \hbox{ is rational}\comma \cr 0\hfill \hbox{otherwise}\hfill}\right.$

(with $s_k = 2\pi i k/\ln q$), which is bounded by a very small number. The $o(1)$ term in the variance might also have small bounded oscillations. Furthermore, $S_n - \lfloor \log_{1/q} n \rfloor$ does not have a nontrivial limit in distribution under any scaling.

Proof

The mean and variance are computed by the same poissonization–depoissonization routine, aided by the Mellin transform and residue calculation as was done for typical and uninformed climbing.
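For the mean, the residue calculation can be sketched as follows, assuming (as in the earlier strategies) that $B(z, t) = e^{-z}(A(z, t) - 1)$, so that $\partial B/\partial t$ at $t = 0$ is the poissonized mean, and that the fundamental strip lies to the left of the pole at $s = 0$. Differentiating (4) with respect to $t$ at $t = 0$ gives

$M(s) := {\partial \over \partial t} B^{\ast}(s, t)\Big|_{t=0} = {\Gamma(s)(1 + qs) \over q^{-s} - 1}.$

Near $s = 0$ we have $\Gamma(s) = 1/s - \gamma + O(s)$ and $q^{-s} - 1 = -s \ln q \, (1 - {1 \over 2} s \ln q + O(s^2))$, so

$M(s) z^{-s} = -{1 \over s^2 \ln q}(1 - \gamma s)(1 + qs)(1 - s \ln z)(1 + {1 \over 2} s \ln q) + O(1),$

whose residue at the double pole $s = 0$ is $-(q - \gamma + {1 \over 2}\ln q - \ln z)/\ln q$. The contribution to the poissonized mean is minus this residue,

$\log_{1/q} z + {2q + \ln q - 2\gamma \over 2 \ln q},$

which agrees with the first two terms of the mean in Theorem 4. The remaining poles, at $s_k = 2\pi i k/\ln q$ with $k \ne 0$, are simple and contribute the terms of $\eta_3$.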

We restrict $|t| < 1/\ln(1/q)$. The distribution is found from the inverse of the Mellin transform [4]. The poles of the transform are the roots of the equation

$q^{ - s} e^{t}=1\semicolon$

that is, they are at the points $s_k(t) = (t + 2\pi i k)/\ln q$, for $k = 0, \pm 1, \pm 2, \ldots$. So,

$B\lpar z\comma\ t\rpar =- \sum\limits_{k=- \infty }^{\infty} \mathop {{\rm Res}}\limits_{s=s_{k} \lpar t\rpar } \left[{B^{ \ast} \lpar s\comma\ t\rpar z^{ - s} } \right]+O\lpar z^{ - M}\rpar \comma$

for any fixed M > −1/ln q. Hence,

$\eqalign{{\bf E}\lsqb e^{S_{N\lpar z\rpar } t} \rsqb & = - {1 \over {\ln q}}\sum\limits_{k=- \infty}^{\infty} \Gamma \lpar s_{k} \lpar t\rpar \rpar \cr &\quad \times\lpar 1 - q^{s_{k} \lpar t\rpar }+q^{s_{k} \lpar t\rpar +1} \lpar 1 - e^{t}\rpar s_{k} \lpar t\rpar \rpar z^{ - s_{k} \lpar t\rpar } +O\lpar z^{ - M}\rpar .}$

To put the result in a concise form, it helps to define the fractional part function

$\lcub x\rcub =x - \lfloor x\rfloor.$

Depoissonization gives the same expression with n replacing z and an adjustment in the error term. We then have

$\eqalign{{\bf E}\lsqb e^{\lpar S_{n} - \lfloor\log_{{1 / q}} n\rfloor\rpar t}\rsqb & \sim {1 \over {\ln {1 \over q}}}\sum\limits_{k=- \infty }^{\infty} \, \Gamma \left( {{t+2\pi ik} \over {\ln q}}\right) \cr & \quad \times \left(1 - q^{\lpar t+2\pi ik\rpar / \ln q}+q^{\lpar t+2\pi ik\rpar / \ln q+1} \lpar 1 - e^{t}\rpar {{t+2\pi ik} \over {\ln q}}\right) \cr &\quad \times n^{ - 2\pi ik / \ln q} e^{\lcub\! \log_{1 / q} n\rcub t}.}$

We can write this as

${\bf E}\lsqb e^{\lpar S_{n} - \lfloor\log_{{1 / q}} n\rfloor\rpar t} \rsqb \sim \lpar g\lpar t\rpar +h_{n} \lpar t\rpar \rpar e^{\lcub\! \log_{1 / q} n\rcub t}\comma$

with $g(t)$ being equal to the zeroth term in the sum and $h_n(t)$ collecting all of the remaining terms. It is clear that no increasing scale of $t$ will give a nontrivial limit.

It is well known that the function $\{\log_{1/q} n\}$ is dense in the interval $[0, 1)$; see, for example, Kuipers and Niederreiter [9]. For any fixed $t$ in the range $|t| < 1/\ln(1/q)$, the function $h_n(t)$ provides additional oscillations around $g(t)$, and, of course, the small error term can be made smaller than any arbitrary fixed number. Hence, no limit distribution exists. For a more detailed account of how such arguments work, see Christophi and Mahmoud [1]. ■

6.1. The Exact Distribution

Some of the exact distributions within the scope of this research might be amenable to direct combinatorial methods. We illustrate this for extremal climbing, to show that it can be done in principle.

Theorem 5

Let $S_n$ be the number of nodes on the path of leftmost climbing of a trie on $n \ge 2$ keys from the Bernoulli($p$) model. Then, for $k \ge 2$,

$\eqalign{{\bf Prob}\lpar S_{n}=k\rpar =nq^{k - 1} \lpar q\lpar 1 - q^{k - 1}\rpar ^{n - 1} - \lpar 1 - q^{k - 2}\rpar ^{n - 1}\rpar +\lpar 1 - q^{k}\rpar ^{n} - \lpar 1 - q^{k - 1}\rpar ^{n}}$

and ${\bf Prob}(S_n = 0) = 0$ and ${\bf Prob}(S_n = 1) = p^n$.
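The exact distribution is straightforward to evaluate numerically, which gives a check against the asymptotics of Theorem 4. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the support is truncated at a cutoff `k_max` chosen so that the neglected tail is negligible, and the oscillating terms of Theorem 4 are ignored.

```python
import math

EULER_GAMMA = 0.5772156649015329


def prob_leftmost(n: int, k: int, q: float) -> float:
    """Prob(S_n = k) from Theorem 5, for n >= 2 keys and 0 < q < 1."""
    if k <= 0:
        return 0.0
    if k == 1:
        return (1.0 - q) ** n  # all keys start with 1
    return (
        n * q ** (k - 1) * (q * (1.0 - q ** (k - 1)) ** (n - 1) - (1.0 - q ** (k - 2)) ** (n - 1))
        + (1.0 - q**k) ** n
        - (1.0 - q ** (k - 1)) ** n
    )


def theorem4_mean(n: int, q: float) -> float:
    """Leading terms of E[S_n] in Theorem 4, without the oscillating term."""
    lq = math.log(q)
    return -math.log(n) / lq + (2.0 * q + lq - 2.0 * EULER_GAMMA) / (2.0 * lq)


def theorem4_var(q: float) -> float:
    """Limiting variance in Theorem 4, without the o(1) term."""
    lq = math.log(q)
    return 1.0 / 12.0 + math.pi**2 / (6.0 * lq**2) + 2.0 * q / lq - q**2 / lq**2


def moments(n: int, q: float, k_max: int = 80) -> tuple[float, float, float]:
    """Total mass, mean, and variance of the truncated exact distribution."""
    probs = [prob_leftmost(n, k, q) for k in range(1, k_max + 1)]
    total = sum(probs)
    mean = sum(k * pk for k, pk in enumerate(probs, start=1))
    second = sum(k * k * pk for k, pk in enumerate(probs, start=1))
    return total, mean, second - mean**2


if __name__ == "__main__":
    n, q = 1000, 0.5
    total, mean, var = moments(n, q)
    print(total, mean, theorem4_mean(n, q), var, theorem4_var(q))
```

For values of $q$ close to 1 the distribution spreads over many more levels, so `k_max` would need to be increased accordingly.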

Proof

The boundary cases ${\bf Prob}(S_n = k)$, for $k = 1, 2$, are trivial. We will develop the result in terms of the number of edges $S'_n = S_n - 1$. Let $k \ge 2$. We dissect the event $\{S'_n = k\}$ into two disjoint subsets. One of the two subsets, $A_1$, corresponds to the case where the tree goes down the left path $k$ edges and then turns right, with all of the keys having a string of $k$ zeros as a prefix continuing with 1 at position $k + 1$ (there must be at least two such keys). This construction leaves a null node dangling at the leftmost position in the tree. This event can occur by having $r$ keys, $r = 2, \ldots, n$, in the subtree, the root of which is a sibling of the leftmost null node; the probability for any specific set of $r$ keys to have this particular key structure is $(q^k p)^r$. The rest of the $n - r$ keys are not allowed to have a prefix of $k$ zeros, otherwise they would disturb the pattern. The probability for these other keys not to have the forbidden prefix is $(1 - q^k)^{n-r}$. The $r$ keys can be chosen in $\left({\matrix{n \cr r}}\right)$ ways. Hence,

${\bf Prob}\lpar A_{1}\rpar =\sum\limits_{r=2}^{n} \, \left({\matrix{n \cr r}}\right)\lpar\, pq^{k}\rpar ^{r} \lpar 1-q^{k}\rpar ^{n-r}.$

The second event, $A_2$, corresponds to the case where there is exactly one key at the end of a leftmost path with $k$ internal vertices on it. By combinatorial arguments similar to those for $A_1$, we see that

${\bf Prob}\lpar A_{2}\rpar =\sum\limits_{r=1}^{n - 1} \lpar r+1\rpar \left({\matrix{n \cr {r+1}}}\right)\lpar\, pq^{k - 1}\rpar ^{r} q^{k} \lpar 1-q^{k - 1}\rpar ^{n-r - 1}.$

Now,

$\eqalign{{\bf Prob}\lpar S^{\prime}_{n}=k\rpar &= {\bf Prob}\lpar A_{1} \cup A_{2}\rpar \cr&= \sum\limits_{r=2}^{n} \left({\matrix{n \cr r \cr}} \right)\lpar\, pq^{k}\rpar ^{r} \lpar 1-q^{k}\rpar ^{n-r} \cr & \quad+\sum\limits_{r=1}^{n - 1} \, \lpar r+1\rpar \left({\matrix{ n \cr {r+1} \cr } } \right)\lpar\, pq^{k - 1}\rpar ^{r} q^{k} \lpar 1-q^{k - 1}\rpar ^{n-r - 1} . \cr}$

The sums can be reduced via the binomial theorem.■
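The reduction can be made explicit. A short worked version uses only the identities $pq^{k} + 1 - q^{k} = 1 - q^{k+1}$ and $(r+1)\left({\matrix{n \cr r+1}}\right) = n\left({\matrix{n-1 \cr r}}\right)$, completing each binomial sum and subtracting the missing terms:

$\eqalign{{\bf Prob}(A_1) &= (1 - q^{k+1})^{n} - (1 - q^{k})^{n} - npq^{k}(1 - q^{k})^{n-1}, \cr {\bf Prob}(A_2) &= nq^{k}\left((1 - q^{k})^{n-1} - (1 - q^{k-1})^{n-1}\right).}$

Adding the two and using $1 - p = q$ gives

${\bf Prob}(S'_n = k) = nq^{k}\left(q(1 - q^{k})^{n-1} - (1 - q^{k-1})^{n-1}\right) + (1 - q^{k+1})^{n} - (1 - q^{k})^{n},$

and replacing $k$ by $k - 1$ (since $S_n = S'_n + 1$) yields the formula stated in Theorem 5.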

Acknowledgment

The second author wishes to thank the Institute of Statistical Mathematics, Tokyo, for supporting this research.

Footnotes

1. As an instance, when (ln p)/(ln q) = (2/3), η₁(ln n) is bounded uniformly in n by 0.752 × 10⁻¹⁴.

References

1. Christophi, C. & Mahmoud, H. (2005). The oscillatory distribution of distances in random tries. Annals of Applied Probability 15: 1536–1564.

2. De La Briandais, R. (1959). File searching using variable length keys. In Proceedings of the Western Joint Computer Conference, AFIPS, San Francisco, pp. 295–298.

3. Fagin, R., Nievergelt, J., Pippenger, N., & Strong, H. (1979). Extendible hashing: A fast access method for dynamic files. ACM Transactions on Database Systems 4: 315–344.

4. Flajolet, P., Gourdon, X., & Dumas, P. (1995). Mellin transform and asymptotic harmonic sums. Theoretical Computer Science 144: 3–58.

5. Fredkin, E. (1960). Trie memory. Communications of the ACM 3: 490–499.

6. Jacquet, P. (1989). Contribution de l'Analyse d'Algorithmes à l'Evaluation de Protocoles de Communication. Thèse, Université de Paris Sud-Orsay, Paris, France.

7. Jacquet, P. & Szpankowski, W. (1998). Analytical depoissonization and its applications. Theoretical Computer Science 201: 1–62.

8. Knuth, D. (1998). The art of computer programming, Vol. 3: Sorting and searching, 2nd ed. Reading, MA: Addison-Wesley.

9. Kuipers, L. & Niederreiter, H. (1974). Uniform distribution of sequences. New York: Wiley.

10. Moon, J. (1970). Climbing random trees. Aequationes Mathematicae 5: 68–74.

11. Meir, A. & Moon, J. (1975). Climbing certain types of rooted trees I. In Proceedings of the Fifth British Combinatorial Conference, pp. 461–469.

12. Meir, A. & Moon, J. (1978). Climbing certain types of rooted trees II. Acta Mathematica Academiae Scientiarum Hungaricae 31: 43–54.

13. Panholzer, A. (2005). The climbing depth of random trees. Random Structures and Algorithms 26: 84–109.

14. Schachinger, W. (1993). Beiträge zur Analyse von Datenstrukturen zur Digitalen Suche. Dissertation, Technische Universität Wien, Vienna.

15. Szpankowski, W. (2001). Average case analysis of algorithms on sequences. New York: Wiley.

Figure 1. A trie with five keys.
