1. INTRODUCTION
The random climbing of trees has been a subject that authors revisit from time to time. It was considered in Moon [Reference Moon10] and in Meir and Moon [Reference Meir and Moon11, Reference Meir and Moon12]. The subject has been revisited recently by Panholzer [Reference Panholzer13], who considered several classes of random trees, including simply generated families and Pólya trees. In these investigations, a class of trees is considered, and a type of random walk on it is exercised. Starting at the root, certain nodes are accessed, and at each node, a randomly selected edge emanating from it is chosen at random (all edges coming out of a node being equally likely). The process is perpetuated until it is no longer possible to proceed. When the process is stopped, the path inscribed in the tree by climbing reaches a leaf.
We would like to consider the climbing of a class of random digital trees called the “trie.” However, we think the “model of randomness” should be changed from the usual uniform choice of edges to a climbing model that conforms with the manner in which these tries are randomly generated. The trie emerges from keys that are taken from a data generator that emits bits of data, with 1's having probability p and 0's having probability q = 1 − p. So, at each node of the trie, a simple Bernoulli random variable will govern the direction of the next turn. The process might not necessarily end on a leaf, as it might terminate at a null node, but it always generates a key (not necessarily in the trie).
The general interpretation of this climbing is that a typical key is being “sampled” from the database. Hence, we call this strategy typical climbing. In the absence of knowledge of the key generating probability, we consider an alternative strategy called uninformed climbing, in which we follow the right and left nexus with equal probability. Motivated by these sampling schemes, we also consider the case of sampling extremal data. In all of the cases, we develop asymptotic distributions for the length of the total distance climbed. One might be able to get exact expressions, too. For example, by purely combinatorial arguments, we compute the exact distribution of the climbing pathlength in extremal sampling to show that such exact expressions are possible in principle.
The plan of the article is as follows. For economy of notation, variables are reused in sections that are self-contained. For example, in Section 4, S n, with moment generating function φn (t), will be the number of nodes on the path of typical climbing. These two symbols will be reused for the number of nodes on the path of uninformed and extremal climbing, and the corresponding moment generating functions, in Sections 5 and 6. Section 2 gives an overview of the trie structure and other terminology. In Section 3 the methodology is outlined. Analysis of typical climbing is pursued in Section 4, where in the biased case, we find a Gaussian limit for an appropriately normalized version of the climbing pathlength. Technicalities for asymptotically accurate mean and variance calculation are discussed in Subsection 4.1. Analysis of uninformed climbing is pursued in Section 5, where in the biased case, we also find a Gaussian limit. By contrast, we demonstrate in Section 6 that the extremal climbing pathlength does not possess a nontrivial limit under any scaling. We devote Subsection 6.1 to a combinatorial derivation of the exact distribution of the pathlength of extremal climbing.
2. TRIES
The trie is a data structure suitable for digital data (bits, hexadecimal strings, words, DNA strands, etc.), which are prevalent in science and technology. The trie was invented independently by De La Briandais [Reference De La Briandais2] and Fredkin [Reference Fredkin5] for information retrieval. Tries have numerous applications as a data structure for computer files, telecommunication signals, DNA, and so forth because of the digital nature of these data. Tries also provide a model for the analysis of several important algorithms, such as Radix Exchange Sort (see Knuth [Reference Knuth8]) and Extendible Hashing (see Fagin, Nievergelt, Pippenger, and Strong [Reference Fagin, Nievergelt, Pippenger and Strong3]).
A binary trie is a digital tree consisting of internal nodes, each having one or two children, and leaves that hold data (keys). The trie grows from n keys according to a construction algorithm. If n = 0, the insertion algorithm terminates. If n = 1, a leaf is allocated for the key given. If n ≥ 2, an internal node is allocated as a root of the tree; keys starting with zero go to the left subtree, and keys starting with 1 go to the right. The construction proceeds recursively in the subtrees, but at level ℓ, the (ℓ + 1)st bit of the key is used for branching. When the algorithm terminates, each key is in a leaf by itself, and the root-to-leaf paths correspond to minimal prefixes sufficient to distinguish the keys. Figure 1 illustrates a trie with five keys:


Figure 1. A trie with five keys.
For ease of exposition, we will assume our data to be in binary representation of numbers in [0, Reference Christophi and Mahmoud1]. We can always insert a binary point to the left of a binary string to turn it into a number from this range. The binary case lays out the methodology for any size alphabet. The case of a larger alphabet can be handled similarly; it just involves more details.
Suppose we have n ≥ 0 keys, each given as an (infinite) string representing its expansion into binary bits. We assume the Bernoulli(p) model of randomness, according to which the bits within a key are independent with probability p of a bit being 1 and probability q of it being 0 (with p + q = 1), and the keys themselves are independent. The data entropy in this model is

The ideal unbiased Bernoulli model is equivalent to sampling from a uniform distribution, a realistic assumption under hashing schemes, where the primary goal is to achieve uniformity.
3. METHODOLOGY
Two main tools in the ensuing analysis are the Mellin transform and poissonization–depoissonization. These methods are now standard and we will not produce the details in any great length, but refer the reader to standard sources on such material.
The Mellin transform of a function f(x) is

and will be denoted by f *(s). The Mellin transform usually exists in vertical strips, in the s complex plane, of the form

for real numbers a < b. We will denote this strip by 〈a, b〉. The function f(x) can be recovered from its transform by a line integral

for any c ∈ (a, b). Usually, such an integral is computed asymptotically (as x → ∞) by shifting the line of integration an arbitrary distance to the right of the existence strip and compensating for the shift by the residues of the poles between the two lines of integration. There often is a small residual error of the form O(x −M) for an arbitrary large positive number M. For a survey of the uses of the Mellin transform in the analysis of algorithms, see Flajolet, Gourdon, and Dumas [Reference Flajolet, Gourdon and Dumas4].
Certain complicated types of functional equation appear in the formulation of recurrence for pathlength under all climbing strategies. These types of recurrence equation are not easy to solve. However, poissonized versions are amenable to asymptotic analysis via the Mellin transform. In this context, poissonization means considering an analogous problem, but with a Poisson random number of keys instead of fixed n. The number of keys is taken to be a Poisson random variable with parameter z. The required asymptotic results for the fixed population are then extracted from the poissonized model by depoissonization, which usually means using the same results for the poissonized model after replacing z with n. This operation is justified by checking some regularity conditions, but it also introduces an asymptotically negligible error. We consider this as a standard program and will not give details, but refer the reader to the original work of Jacquet and Szpankowski [Reference Jacquet and Szpankowski7] or its presentation in textbook style in Szpankowski [Reference Szpankowski15].
4. TYPICAL CLIMBING
In typical sampling, we climb a trie by following an algorithm that emulates the natural frequency of bits. We start at the root and access nodes. At each node accessed, we generate an independent Bernoulli(p) random variable. If this variable yields zero, we follow the left edge if it exists (otherwise, the climbing is stopped), and if the value generated is 1, we follow the right edge if it exists (otherwise, the climbing is stopped).
Let S n be the number of nodes on the path inscribed in the trie by the typical climbing. For example, given the trie of Figure 1, typical climbing might produce the key X 4 in two steps with conditional probability pq, in which case S 5 = 3 (counting the node containing X 4). It might also reach the only null node in the trie (left of the root's right child) with probability pq, in which case S 5 = 2. If the null node is reached, we take our typical sample to be 0.100000 … .
Note that S n can be linked to the depth D n of a randomly chosen key, after an additional key is added to the initial n keys. If we are inserting the (n + 1)st key, this will follow the path of S n. If the climbing terminates at an empty node, S n and D n+1 are the same, but if the climbing terminates at a key, we need to insert a number of additional nodes. The number of additional nodes is geometrically distributed because we have to break the tie—there will be a number (zero or more) of bits in the new key agreeing with bits in the key colliding with it past the point of collision: Each agreement occurs with probability p 2 + q 2, but sooner or later, one disagreement (with probability 1 − p 2 − q 2) will break the tie. This geometric random variable is independent of the structure of the trie; it is distributed like the depth of two keys in a trie of size 2, but the total amount of modification to S n to become D n+1 is dependent on S n. More precisely, if X (r) is the prefix of length r of a digital key X, then

where 1E is the indicator function of an event E and D˜2 is an independent copy of D 2. The dependence in the sum introduces complications, but the analytic approach that follows is transparent and systematic enough to cover cases that cannot be linked easily to the depth, such as the case of climbing without the knowledge of p, where we generate right and left moves with equal probability.
Let φn(t) be the moment generating function of S n. Let L n and R n be respectively the number of keys in the left and right subtrees (so L n + R n = n). In view of the Bernoulli model, L n Binomial (n, q). The subtrees themselves are random tries on their respective order, which follows from the independence structure assumed in the data. The variable S n satisfies a basic recurrence:

Here and in the sequel, a tilded random variable stands for a random variable distributed like the untilded version and is conditionally independent of it. We have the conditional expectation

By a standard double expectation, we get

By the binomial distribution of L n, we get by conditioning

Toward poissonization we construct the supergenerating function A(z, t) = ∑n=0∞(φn(t)/n!)z n. First, we multiply both sides of the latter equality by z n and sum over all possible values of n to get

The sums in this last equation are then extended to start from n = 0, after introducing the necessary adjustments for n = 0 and n = 1. Subsequently,

Direct poissonization of A(z, t) runs into a problem in the existence of the Mellin transform. The difficulty is that when we multiply both sides of the equation by e −z, the right-hand side has the loose term (1 − e t) e −z. Formally, its Mellin transform is (1 − e t)Γ(s) and its domain of existence is ℜs > 0. The remaining terms, however, impose domains of existence in a strip with a negative real part. Hence, there will be no common domain of intersection.
To circumvent the difficulty, we deal with a shifted supermoment generating function. As we will see shortly, in this way the right-hand side has a difference of exponential terms, for which a domain of existence is consistent with the rest of the expression. Set

The function B(z, t) has the interpretation:

where N(z) is a Poisson random variable with parameter z; that is, B(z, t) is the poissonized moment generating function of the climbing pathlength, modified by an exponentially negligible error, from which we obtain the functional equation

The Mellin transform of this function is

where we treated a term like e −pz − e −z as e −pz − 1 − (e −z − 1), with each shifted exponential function having a Mellin transform representation in terms of the Cauchy–Saalschütz gamma function in the domain 〈−1, 0〉; that is, such a term has the Mellin transform (p −s − 1)Γ(s) in that domain. Now, the Mellin transform B *(s, t) exists in the domain

where s 0 (t) is the only real solution to the equation

Observe that s 0 (t) is a continuous function of t, with value 0 at t = 0. We thus can find a neighborhood around t = 0 for which s 0(t) is arbitrarily close to zero. We will keep |t| small enough for the entire strip 〈s 0 (t), −s 0 (t)〉 to be contained in 〈−1/4, 1/4〉.
4.1. The Mean Climbing Pathlength in Typical Climbing
We will need a few technicalities for the proof, and we discuss them first. The inverse Mellin transform involved in the mean requires a computation of residues at the roots of the characteristic equation

The roots of this equation have been studied. The following special case is known (e.g., see Szpankowski [Reference Szpankowski15], who attributed the result to Jacquet [Reference Jacquet6] and Schachinger [Reference Schachinger14]). We present the result as written in Drmota, Reznik, Savari, and Szpankowski [private communication].
Lemma 1
Let p < q. There are countably infinitely many simple solutions (characteristic roots) of 1 − p −s+1 − q −s+1 = 0. The roots satisfy the following:
(i) s 0 = 0 is always a root.
(ii) If b is a root, then 0 ≤ ℜb ≤ ρ, where ρ is the unique real positive solution of 1 − p −s+1 + q −s+1 = 0. Moreover, for every integer k, there exists a unique root s kwith imaginary part (2k − 1)π/|ln p| < 𝔧 s k ≤ (2k + 1)π/|ln p|. Consequently s k, for k = 0, ±1, ±2, … , are all the roots.
(iii) If ln p/ln q = m/r (where gcd (m, r) = 1 for positive integers m and r ), there are m − 1 roots, s 1, s 2, … , s m−1, with real part greater than zero. The rest of the roots are in the form
(iv) If ln p/ln q is irrational, then ℜs k > 0 for all k ≠ 0.
A symmetrical statement applies when p > q, but in this case, ρ is defined as the positive root of 1 + p −s+1 − q −s+1 = 0.
Theorem 1
Let S nbe the number of nodes on the path of typical climbing of a trie on n keys from the Bernoulli(p) model. Then

where η1(·) is the function given by the Fourier expansion

(with s mka nonzero solution of p −s+1 + q −s+1 = 1 with real part zero). In either case, η1is uniformly bounded by a small number.Footnote 1The o(ln n) term in the variance might also have small bounded oscillations.
Proof
In order to calculate the mean, we take the first derivative of (1) with respect to t and evaluate for t = 0, yielding

This is the Mellin transform of the expected poissonized pathlength S N(z) and exists in 〈0, 1〉.
The roots of the characteristic equation (Reference Fagin, Nievergelt, Pippenger and Strong3) determine the asymptotics of the mean. According to Lemma 1, in the case when ln p/ln q is rational, there are roots aligned and equispaced on the vertical axis of the s complex plane, and the rest of the characteristic roots fall in the right half of the s complex plane, whereas in the case where ln p/ln q is irrational, all of the roots fall in the right half of the s complex plane except for s 0 = 0.
The inverse Mellin transform is
![{\bf E} \lsqb S_{N\lpar z\rpar }\rsqb =O\lpar z^{ - M}\rpar + \sum\limits_{k=- \infty}^\infty \, \mathop {\rm Res} \limits_{s=s_{k}} \left[{z^{ - s} \Gamma \lpar s\rpar \lpar 1+\lpar\, p^2+q^2\rpar s\rpar \over 1 - q^{-s+1}-p^{-s+1}} \right]\comma](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S0269964808000089_eqnU21.gif?pub-status=live)
where M > ρ ≥ a is any arbitrary large positive number.
The main contribution comes from s 0 = 0, as it is the only double pole; the rest are simple. We obtain

By standard depoissonization we arrive at the same expression for E[S n], and only the error term is modified by the depoissonization error of O(n −1 ln n), which comes on top of the Mellin inversion error of o(1).
In order to calculate the second moment, we take the second derivative of (Reference Christophi and Mahmoud1) with respect to t and evaluate at t = 0. We have

This is the Mellin transform of the expected value of S 2N(z). In the inverse Mellin transform, the main contribution comes from s k = 0. After depoissonization, we get

The variance follows from the first two moments, after straightforward algebraic simplification.■
Curiously, the variance in the unbiased case is (in this case, all of the poles lie on the vertical axis of the s complex plane). In the biased case (
), we have growth in the variance with the number of keys, which admits the existence of an asymptotic distribution for the typical climbing pathlength (after an appropriate normalization).
Theorem 2
Let S nbe the number of nodes on the path of typical climbing of a trie on n keys from a biased Bernoulli(p) model. Then

Proof
We take |t| small enough so that 〈s 0(t), −s 0(t)〉⊆〈−(1/4), (1/4)〉. The inverse Mellin transform of (Reference Christophi and Mahmoud1) yields
![B\lpar z\comma \ t\rpar = - \sum\limits_{k=- \infty}^\infty \mathop {\rm Res}\limits_{s=s_{k} \lpar t\rpar } \left[{B^{\ast} \lpar s\comma\; t\rpar z^{ - s}} \right]+ O\lpar z^{ - M}\rpar](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S0269964808000089_eqnU26.gif?pub-status=live)
for any fixed M > ρ. Hence,

We isolated the role of s 0(t) because, as we will see shortly, it provides the dominant asymptotics when t is in a neighborhood of zero (where the gamma function also becomes very large), contrasting the finite limit of Γ(s k(t)), as t → 0, for each k ≠ 0. Depoissonization gives

The essential root s 0(t) is a continuous infinitely differentiable function of t. For t → 0, s 0(t) has the expansion

It is clear from (Reference De La Briandais2) that s 0(0) = 0. Also, s′0(0) = − and s″0(0) = −
(ln p − ln q)2, as can be seen from the derivatives of (Reference De La Briandais2). Further, we use the local expansions 1 − e x = −x + O(x 2) and Γ(x) =
+ γ+ O(x), near x = 0. After substituting t by υ/ln n, for fixed υ, we obtain

Therefore,
![{\bf E}\left[ e^{\lpar S_n - {\lpar \lpar\ln\ n\rpar / h_p}\rpar \rpar\ \left({\upsilon / \sqrt {\ln \ n}}\right)}\right] \to e^{ - \lpar{s^{\prime\prime}\lpar 0\rpar / 2}\rpar\upsilon^2}\comma](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S0269964808000089_eqnU31.gif?pub-status=live)
with the right-hand side being the moment generating function of a normal random variate with mean 0 and variance −s″0(0).■
Note that Theorem 2 and its proof can stand alone without the need for the development of the mean and variance of Theorem 1. However, the mean and variance given by the shortcut in Theorem 2 are only the leading terms in the full expansion provided by the more elaborate residue calculation of Theorem 1. One would not even detect the oscillations in the mean and variance with the method used in Theorem 2.
5. CLIMBING WITH THE LACK OF KNOWLEDGE OF p
If one is uninformed about p, one might be inclined to plead ignorance and simply generate moves in the random walk to the right and left subtrees with equal probability, hoping that this will average good and bad cases, achieving a sampling strategy that is not too much worse than typical climbing.
The result presented next indicates that the average speed of climbing is improved in uninformed climbing on average. Of course, the two strategies coincide when p = q = 1/2, but uninformed climbing requires less time than typical climbing as p gets away from 1/2, and the uninformed strategy speeds up considerably near the extremal values p = 0 and p = 1. However, the improved performance in the uninformed search comes at the expense of the quality of sampling, as less probable keys are given more weight than their actual probability.
The techniques are much the same as in typical climbing. We will only set up the problem, show the salient intermediate steps, and state the analogous results without proof.
Let S n be the number of nodes on the path inscribed in the trie by the uninformed climbing and let φn(t) be its moment generating function. For example, given the trie of Figure 1, uninformed climbing might produce in two steps the key X 4, with S 5 = 3, or the null node (corresponding to a key value 0.100000…), in which case, S n = 2. In either case, the key is generated with probability 1/4.
The length S n satisfies a basic recurrence:

By a standard double expectation, we get

Toward poissonization, we reintroduce the supergenerating function A(z, t) = ∑n=0∞(φn(t)/ n!)z n, and after manipulation similar to that in the case of typical climbing (mutatis mutandis, of course), we reach

As earlier, we reintroduce

to work with a shifted function possessing a Mellin transform. We first obtain the functional equation

the Mellin transform of which is

existing in the strip 〈−1, s 0(t)〉, and s 0(t) being the only real root of the equation

We take |t| small enough so that 〈s 0(t), −s 0(t)〉 ⊆ 〈−1/4, 1/4〉. After all of the manipulation and the residue calculation, we reach the result for this random walk.
Theorem 3
Let S nbe the number of nodes on the path of uninformed climbing of a trie on n keys from the Bernoulli(p) model. Then

where η2(·) is a function given by the Fourier expansion

(with s mka nonzero solution of p −s + q −s = 2 with real part equal to zero), which is bounded by a very small number. The lower-order term in the variance can also have small bounded oscillations. Moreover, in the biased case,

6. EXTREMAL SAMPLING
To develop a sense for the extremes of the data present in the trie, a sampler might take after the extremal strategy of following a leftmost (for smallest) or a rightmost (for largest) path. Of course, the two strategies are symmetric with respect to the roles of p and q, and we only analyze one of them.
Let us reintroduce S n as the number of nodes on the leftmost path and let φn(t) be its moment generating function. For instance, given the trie of Figure 1, the extremal leftmost climbing samples the key X 3, and S 5 = 4. If the leftmost path reaches a null node, we augment the corresponding prefix of zeros with a 1 to construct a representative sample of the smallest data.
The problem can be thought of in terms of the longest run of consecutive zeros. This case is connected to the maximum and second largest of independent and identically distributed geometric variables—let Z i be the number of initial zeros in the key X i and let Z (i) be the ith order statistics of Z 1, … , Z n. Then Z i are identically distributed geometric variables and

One might be able to obtain a result for S n from this representation. However, order statistics of independent and identically distributed discrete random variables are somewhat intricate because of the possible ties. We will proceed with our systematic analytic method.
The basic conditional recurrence is

for n ≥ 2, giving

Hence,
![{\bf E}\lsqb e^{S_{n} t} \rsqb =\sum\limits_{\ell=0}^n \, {\bf E}\left[{e^{\lpar 1+S_\ell\rpar t} } \right]\left({\matrix{n \cr\ell}} \right)q^\ell p^{n - \ell } .](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S0269964808000089_eqnU45.gif?pub-status=live)
Toward poissonization, we reintroduce the generating function A(z, t) = ∑n=0∞z n(φn(t)/n!), where φn(t) = E[e S nt]. By steps similar to previous derivations in the other two strategies, we can easily establish the relation

As we did in Section 4 for typical sampling, we do not poissonize A(z, t) directly, but we poissonize the shifted version A(z, t) − 1, for the same technical reason to overcome existential problems of the Mellin transform. The routine is pretty much the same and we omit its details. One obtains the Mellin transform

Theorem 4
Let S nbe the number of nodes on the path of extremal climbing of a trie on n keys from the Bernoulli(p) model. Then

where η3(·) is the function given by the Fourier expansion

(with s k = 2πik/ln q), which is bounded by a very small number. The o(Reference Christophi and Mahmoud1) term in the variance might also have small bounded oscillations. Furthermore, S n − ⌊log1/qn⌋ does not have a nontrivial limit in distribution under any scaling.
Proof
The mean and variance are computed by the same poissonization–depoissonization routine, aided by the Mellin transform and residue calculation as was done for typical and uninformed climbing.
We restrict |t| < 1/ln1/q. The distribution is found from the inverse of the Mellin transform (Reference Flajolet, Gourdon and Dumas4). The poles of the transform are the roots of the equation

that is, they are at the points s k(t) = (1/ln q) (t + 2πik), for k = 0, ±1, ±2, … . So,
![B\lpar z\comma\ t\rpar =- \sum\limits_{k=- \infty }^{\infty} \mathop {{\rm Res}}\limits_{s=s_{k} \lpar t\rpar } \left[{B^{ \ast} \lpar s\comma\ t\rpar z^{ - s} } \right]+O\lpar z^{ - M}\rpar \comma](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151022131819267-0421:S0269964808000089_eqnU50.gif?pub-status=live)
for any fixed M > −1/ln q. Hence,

It helps put the result in a concise form to define the function

Depoissonization gives the same expression with n replacing z and an adjustment in the error term. We then have

We can write this as

with g(t) being equal to the zeroth term in the sum and h n(t) collecting all of the remaining terms. It is clear that no increasing scale of t will give a nontrivial limit.
It is well known that the function {log1/qn} is dense in the interval [0, Reference Christophi and Mahmoud1); see, for example, Kuipers and Niederreiter [Reference Kuipers and Niederreiter9]. For any fixed t in the range |t| < 1/ln(1/q), the function h n(t) provides additional oscillations around g(t), and, of course, the small error term can be made smaller than any arbitrary fixed number. Hence, no limit distribution exists. For a more detailed account of how such arguments work, see Christophi and Mahmoud [Reference Christophi and Mahmoud1].■
6.1. The Exact Distribution
Some of the exact distributions within the scope of this research might be amenable to direct combinatorial methods. We illustrate this for extremal climbing, to show that it can be done in principal.
Theorem 5
Let S nbe the number of nodes on the path of leftmost climbing of a trie on n ≥ 2 keys from the Bernoulli(p) model. Then, for k ≥ 2,

and Prob(S n = 0) = 0 and Prob(S n = 1) = p n.
Proof
The boundary cases Prob(S n = k), for k = 1, 2, are trivial. We will develop the result in terms of the number of edges S′n = S n − 1. Let k ≥ 2. We dissect the event {S′n = k} into two disjoint subsets. One of the two subsets, A 1, corresponds to the case where the tree goes down the left path k edges and then turns right, with all of the keys having a string of k zeros as a prefix continuing with 1 at position k + 1 (there must be at least two such keys). This construction leaves a null node dangling at the leftmost position in the tree. This event can occur by having r keys, r = 2, … , n, in the subtree, the root of which is a sibling of the leftmost null node; the probability for any specific r to have this particular key structure is (q kp)r. The rest of the n − r keys are not allowed to have a prefix of k zeros, otherwise they would disturb the pattern. The probability for these other keys not to have the forbidden prefix is (1 − q k)n−r. The r keys can be chosen in ways. Hence,

The second event, A 2, corresponds to the case where there is exactly one key at the end of a leftmost path with k internal vertices on it. By combinatorial arguments similar to that for A 1, we see that

Now,

The sums can be reduced via the binomial theorem.■
Acknowledgment
The second author wishes to thank the Institute of Statistical Mathematics, Tokyo, for supporting this research.