We consider the transformation T that takes a distribution F into the distribution of the length of the interval covering a fixed point in the stationary renewal process corresponding to F. This transformation has been referred to as size-biasing, length-biasing, the renewal length transformation, and the stationary lifetime operator. We review and develop properties of this transformation and apply it to diverse areas.
The famous "waiting time paradox" [22, p.12] arises because the length of the interval covering a fixed point in a stationary renewal process tends to be longer than a typical interarrival time. If the interarrival time distribution has the probability density function (pdf) f, then the length of the interval covering a fixed point has pdf xf(x)/EX. This renewal length distribution dominates F in the monotone likelihood ratio order and is thus stochastically larger.
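As an illustrative aside (not part of the original development), the paradox can be checked by simulation. The following sketch assumes exponential(1) interarrival times and a fixed inspection point well inside the process; the interval covering the point then has mean length near 2 = EX²/EX, twice the mean interarrival time.

# Monte Carlo illustration of the waiting time (inspection) paradox.
# Assumptions: exponential(1) interarrival times; inspection point t0 deep
# inside the process, so edge effects are negligible.
import numpy as np

rng = np.random.default_rng(0)
t0, n_arrivals, n_reps = 500.0, 2000, 2000
covering_lengths = []
for _ in range(n_reps):
    gaps = rng.exponential(1.0, size=n_arrivals)
    times = np.cumsum(gaps)
    i = np.searchsorted(times, t0)      # index of first renewal epoch at or after t0
    covering_lengths.append(gaps[i])    # length of the interval covering t0
print("mean interarrival time :", 1.0)
print("mean covering interval :", np.mean(covering_lengths))  # approximately 2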
In this article we consider the transformation, T, which takes a distribution on [0,∞), with finite positive mean, into its renewal length distribution,
We will refer to T as the size-biasing transformation. Other terminology that has been used is length biasing [14, p.61; 42, p.70], stationary lifetime operator [1], and renewal length transformation [12]. Size biasing appears to be the most widely used term for this transformation.
In Section 2, a few simple properties of T are reviewed and developed, and in the remainder of the article, they are applied to a number of application areas. These areas were chosen because size-biasing methodology does not appear to have previously been explicitly employed. In some cases, new results are derived (Sections 3, 4, 7, 9, 12, and 13), whereas in others the transformation merely provides perspective for more profound ideas (Sections 5, 8, and 11).
I will not give examples in areas in which size biasing has previously been applied. These include branching and population processes [2,3,20,21,23,30,31], Stein's error bound methodology [16,25,35,36], and size-biased permutations [18,24,34].
If F is the cumulative distribution function (cdf) of a distribution on [0,∞), with F(0) < 1, finite mean μ, and density f with respect to a σ-finite measure λ, then T(F) is defined to be the distribution with pdf
Note that the Laplace transform of T(F) [1] is given by
and that [42, p.264]
for all R with EF|XR(X)| < ∞. The distribution T(F) is the unique distribution satisfying (2.2), as can be seen by setting R equal to indicator functions.
Recall that the stationary renewal distribution corresponding to F is the distribution with pdf
We will use the notation, G(F), or sometimes G(X), for the stationary renewal distribution corresponding to X ∼ F. Lemma 2.1 derives two useful relationships between T(F) and G(F). Relationship (2.4) appears in [14, p.62].
Lemma 2.1: Let X ∼ F be a random variable on [0,∞) with F(0) < 1 and finite mean μ. Let T(X) denote a random variable with distribution T(F), and let G(X) be a random variable with distribution G(F). Then
where U is uniformly distributed on (0,1) and independent of T(X). Furthermore,
where G(X) is independent of X.
Proof: Let (W1,W2) be defined to have pdf (relative to ℓ × λ, where ℓ is Lebesgue measure)
We find that
and thus that W1 ∼ G(F). Also,
and thus W2 ∼ T(F). Next,
thus the conditional distribution of W1|W2, given W2 = w2, is uniform(0,1) for all w2. Thus, we have constructed W1 ∼ G(F) and W2 ∼ T(F) with W1 /W2 ∼ U, independently of W2. It follows that the marginal distribution of G(X) is that of the product of two independent variables: one distributed as T(X), and the other, uniform(0,1). Thus, (2.4) holds.
Next,
Thus, the conditional distribution of W2 given W1 = w1 coincides with that of X given X > w1. Therefore, (2.5) holds. █
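A quick numerical illustration of (2.4) (a sketch, not part of the proof): for X ~ exponential(1) we have T(X) ~ Gamma(2,1) and G(X) ~ exponential(1), so the product U·T(X), with U ~ uniform(0,1) independent of T(X), should reproduce exponential(1) quantiles.

# Numeric check of Lemma 2.1, relation (2.4), for X ~ exponential(1):
# G(X) =d U * T(X), with U ~ uniform(0,1) independent of T(X) ~ Gamma(2,1).
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
u = rng.uniform(size=n)
t = rng.gamma(2.0, 1.0, size=n)
g = u * t
# Compare a few quantiles of U*T(X) with exponential(1) quantiles.
for q in (0.25, 0.5, 0.9):
    print(q, np.quantile(g, q), -np.log(1 - q))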
By integrating the joint pdf (2.6) over {(w1,w2),w2 > min(w1,t)}, we find that
where G is the survival function of the stationary renewal distribution. Defining
and noting that
it follows that
We now discuss methods for finding T(F). Lemma 2.2 is found in [42, p.262].
Lemma 2.2: Let
have pdf f(x), with respect to a σ-finite measure w on Rn, and let g : R+n → R+ satisfy
Define Y to be a random vector with pdf, g(x) f (x)/μg. Then
Proof:
for all K with E|K(X)g(X)| < ∞. In particular,
so by (2.2),
.
The fact that (2.12) is more general than required for the conclusion will be helpful in later sections. █
We now give some examples of computation of T(F).
Example 1: Products. If X1,…,Xn are independent with
. Then by Lemma 2.2,
where T(X1),…,T(Xn) are independent.
It follows from (2.4) and (2.13) that if X and Y are independent, then
Example 2: [26, p.9]:
Sums: If
, it follows from Lemma 2.2 that
where
We can construct
in the following manner. First, let I be a random variable with Pr(I = j) = μj /∑ μi, j = 1,…,n. Given I = j, define Yj to be distributed as T(Xj). Finally, given I = j and Yj = yj, construct the conditional distribution of Y−j = {Yk, k ≠ j} to be that of X−j given Xj = yj. Under this construction, Y has pdf (2.15) and ∑ Yi is distributed as
where I is as described earlier, given I = j, T(XI) is distributed as T(Xj), and given I = j and T(Xj) = yj, S−I is distributed as ∑k≠j Xk given Xj = yj.
In the case of independence, it follows that given I = j, S−I is distributed as the unconditional distribution of S−j, independently of T(Xj).
In the independent and identically distributed (i.i.d.) case
where T(X) and Sn−1 are independent.
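As a hedged numerical check of the i.i.d.-sum relation T(Sn) =d T(X) + Sn−1 (a sketch only, for one specific case): take X ~ exponential(1) and n = 5, so S5 ~ Gamma(5,1), T(S5) ~ Gamma(6,1), and T(X) + S4 ~ Gamma(2,1) + Gamma(4,1), which is again Gamma(6,1).

# Numeric check of the i.i.d.-sum relation T(S_n) =d T(X) + S_{n-1}
# for X ~ exponential(1) and n = 5.
import numpy as np

rng = np.random.default_rng(3)
n = 10**6
lhs = rng.gamma(6.0, 1.0, size=n)                                # T(S_5) ~ Gamma(6,1)
rhs = rng.gamma(2.0, 1.0, size=n) + rng.gamma(4.0, 1.0, size=n)  # T(X) + S_4
print(np.mean(lhs), np.mean(rhs))   # both near 6
print(np.var(lhs), np.var(rhs))     # both near 6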
Mixtures: If F is a mixture,
then
where μ(α) is the mean corresponding to Fα and μ = ∫μ(α) dP(α) is assumed finite.
Random Sums: If
where {Xi} is i.i.d., N is independent of {Xi}, and EY = ENEX < ∞, then
where ST(N)−1 and T(X) are independent. The result follows from (2.17) and the discussion of mixtures.
Restrictions: If X ∼ F, let XA denote a random variable distributed as X, given X ∈ A, where F(A) > 0. Then
The proof is straightforward.
Let X ∼ F and assume that F(0) = 0. Let T(X) ∼ T(F). Define
the total variation distance between F and T(F). There are several equivalent expressions for D. First,
The scaled L1 norm, E|X − μ|/μ, is a measure of variability, which when small suggests T(F) ≈ F. Also note that
by (2.4) and (2.7). Finally,
Thus, D equals the probability that X is smaller than an independent uniform(0,μ) random variable.
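These equivalent expressions for D can be checked in a concrete case (an illustrative sketch, assuming X ~ exponential(1), so that T(F) is Gamma(2,1) and μ = 1); both the total variation distance and Pr(X < U) then equal 1/e.

# Numeric check, for X ~ exponential(1), that the total variation distance D between
# F and T(F) equals Pr(X < U) with U ~ uniform(0, mu); here both are 1/e.
import numpy as np
from scipy.integrate import quad

f  = lambda x: np.exp(-x)          # pdf of F
tf = lambda x: x * np.exp(-x)      # pdf of T(F), i.e., x f(x)/mu with mu = 1
tv = 0.5 * quad(lambda x: abs(f(x) - tf(x)), 0, np.inf)[0]

rng = np.random.default_rng(4)
x, u = rng.exponential(1.0, 10**6), rng.uniform(0.0, 1.0, 10**6)
print(tv, np.mean(x < u), 1 / np.e)   # all approximately 0.3679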
Once again, let F be a distribution on (0,∞) with F(0) = 0 and finite mean. Define V(F) to be the distribution of 1/T(X), where T(X) ∼ T(F). Note that
Also,
It follows from (2.2) and (2.24) that
and
Thus, V2 = I (the identity operator) and V = V−1, as operators on probability distributions on (0,∞), with finite means. Thus, we consider (F,V(F)) to be an inverse pair of distributions, conceding that the concept requires greater development and insight.
Note that under total variation distance D,
thus, by (2.20),
Thus, X and V have the same variability as measured by scaled L1 norm. Similarly, by (2.21),
where GV is the stationary renewal distribution corresponding to V. From (2.22),
where U is uniform(0,1) and independent of V and X.
When we examine scaled L2 variability, there is a duality present in which scaled L2 variability for F is equivalent to a scale-invariant harmonic mean measure of variability for V(F). As V(V(F)) = F, this relationship is reversible. Specifically,
and
as can be easily verified.
From the definition of T, it follows that T(F) = F if and only if F is concentrated at a single point, μ > 0. The question as to when V(F) = F is far more interesting. In view of (2.23), μ = 1 is a necessary condition (for V(F) = F). However, a point mass at 1 is only one of many solutions. For example, choose A > 1, r ≤ (A + 1)−1, and set Pr(X = A−1) = rA, Pr(X = 1) = 1 − r(A + 1), and Pr(X = A) = r. Then we can easily verify that V(X) = X.
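The three-point fixed point can be verified directly; the following sketch uses the arbitrary admissible choices A = 2 and r = 0.25 and computes the law of V(X) = 1/T(X) exactly.

# Numeric check that the three-point distribution in the text is a fixed point of V:
# Pr(X = 1/A) = rA, Pr(X = 1) = 1 - r(A+1), Pr(X = A) = r, with A > 1, r <= 1/(A+1).
# (Small verification sketch; A = 2, r = 0.25 are arbitrary admissible choices.)
A, r = 2.0, 0.25
support = [1.0 / A, 1.0, A]
probs   = [r * A, 1.0 - r * (A + 1.0), r]
mu = sum(x * p for x, p in zip(support, probs))          # equals 1
tprobs = [x * p / mu for x, p in zip(support, probs)]    # pmf of T(X)
# V(X) is the law of 1/T(X): invert the support and carry the probabilities along.
v_support = [1.0 / x for x in reversed(support)]
v_probs   = list(reversed(tprobs))
print(mu)                               # 1.0
print(list(zip(support, probs)))        # law of X
print(list(zip(v_support, v_probs)))    # law of V(X); identical to the law of X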
A method for generating a large class of fixed points for V is to choose X ∼ F,V(X) ∼ V(F), with X and V(X) independent, and define W to be a random variable distributed as
Then
Thus,
Choose X ∼ F, where F is absolutely continuous with pdf f and finite mean μ. From (2.32), we compute
Thus, (2.35) gives a large class of distributions satisfying V(W) = W.
Following are a few choices of f with corresponding choices of fW, computed from (2.35):
Call
the class of distributions satisfying V(F) = F. One property of
is that if
, with X1 and X2 independent, then the distribution of X1 X2 also belongs to
. This holds since
Another property of
is closure under mixtures. If
for all α and X ∼ ∫Fα dPα, then it follows from EXα ≡ 1 and (2.18) that
I pose the problem of finding a general form for all distributions in
.
For F a distribution on [0,∞) with F(0) < 1,μ(p) = EXp < ∞, and density f, define Tp(F) to be the distribution with density
We allow −∞ < p < ∞ in (2.36).
Lemma 2.3 derives a subadditivity property for Tp,p > 0, and a superadditivity property for Tp,p < 0.
Lemma 2.3: Let X1,…,Xn be independent, with EXip < ∞, i = 1,…,n. Let Tp(X1),…,Tp(Xn) denote independent random variables, with Tp(Xi) distributed according to the Tp-transform of the distribution of Xi. Then
Proof: We will prove (i) (p > 0); the other case follows by obvious modification. We only need to consider n = 2; the case for general n then follows by induction.
Arguing similarly as in Lemma 2.2, we can show that
where
The symbol ∝ denotes proportionality. Now
so that
a quantity decreasing in y1 for p > 0. Thus, (2.38) implies that Y1 is smaller than Tp(X1) in the sense of monotone likelihood ratio ordering and, thus, under stochastic ordering. Therefore, we have
Next,
so that
which is decreasing in y2. Thus,
The result now follows from (2.37)–(2.40). █
In this section we derive a property of the renewal length transformation and use it to derive new inequalities for NBUE (new better than used in expectation) distributions.
If F is an absolutely continuous distribution on (0,∞), then the record value process corresponding to F is a nonhomogeneous Poisson process, with intensity function equal to the failure rate, h, of F. For {Xi} i.i.d. with distribution F, the first record value, S1, equals X1; the second record value, S2, is defined by
where J2 = min{i : Xi > X1}. More generally,
where Jm = min{i : Xi > XJm−1}.
For the second record value, S2,
where N(t) = #{Si ∈ [0,t]}, H(t) = −ln(F(t)) = EN(t). See [38] for a discussion of record values.
Define
Note that K is increasing and
Let X* ∼ G(X). Then
where T(X) is the renewal length transformation. Of course, (3.3) is a direct consequence of (2.5).
Recall [6, p.159] that F is defined to be NBUE if EX < ∞, and
for all t such that F(t) > 0. Thus, F is NBUE if and only if
From (3.2)–(3.4) and the fact that K is increasing, Lemma 3.1 follows.
Lemma 3.1: If F is an absolutely continuous NBUE distribution, then
.
Employing Lemma 3.1, we obtain an inequality for NBUE distributions. Some consequences of this inequality are discussed in Section 3.2.
Theorem 3.2: If X ∼ F is NBUE, then
Proof: First consider the case of absolutely continuous F. From Lemma 3.1 and equations (2.7) and (3.1),
For {t : F(t) > 0}, divide both sides by F(t) to obtain (recalling (2.9))
Thus,
This proves (3.5) for F absolutely continuous. For the general case, we use a smoothing argument. If X ∼ F is NBUE but not absolutely continuous, let U(ε) be uniformly distributed on (0,ε) and independent of X and define
Since the NBUE class is closed under convolutions [6, p.187], X(ε) has an absolutely continuous NBUE distribution.
Applying (3.7) to X(ε), letting ε → 0, and noting that Pr(X(ε) > t) → Pr(X ≥ t),μ(ε) → μ, and E(X(ε)|X(ε) > t) → E(X|X ≥ t), (3.5) follows. █
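As an illustrative check of the bound (not part of the proof), and reading it, per the discussion below, as Pr(X ≥ t) ≤ min(e·e−t/μ, 1): the following sketch evaluates both sides for the NBUE (indeed IFR) distribution X ~ uniform(0,2), which has μ = 1.

# Numeric illustration of the NBUE bound of Theorem 3.2, read here as
# Pr(X >= t) <= min(e * exp(-t/mu), 1), for X ~ uniform(0, 2), mu = 1.
# (Sketch; the constants e and 1/mu are taken from the discussion in Section 3.2.)
import numpy as np

mu = 1.0
t = np.linspace(0.0, 2.0, 9)
survival = np.clip(1.0 - t / 2.0, 0.0, 1.0)        # Pr(X >= t) for uniform(0,2)
bound = np.minimum(np.e * np.exp(-t / mu), 1.0)
for ti, s, b in zip(t, survival, bound):
    print(f"t={ti:4.2f}  Pr(X>=t)={s:5.3f}  bound={b:5.3f}")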
1. The upper bound, min(e−((t/μ)−1),1) (Theorem 3.2), is the best possible global bound of the form min(be−ct,1), even within the considerably more restrictive subclass of increasing failure rate (IFR) distributions. The exponential distribution with mean μ shows that a bound in this class must have c ≤ μ−1. The point mass at {μ} has Pr(X ≥ μ) = 1, ensuring that if c = μ−1, then b must be at least equal to e.
2. If, in addition (to NBUE), F is absolutely continuous with failure rate h satisfying
then L(t) ≥ t + a−1. Applying (3.5) gives
A consequence of (3.9) is that
Thus, if a is close in ratio to μ−1, then F is approximately exponential. This result complements a result of Daley [15], who obtained a bound on distance to exponentiality based on the first two moments of an NBUE distribution.
3. The bound (3.5) improves upon Markov's inequality based on knowledge of μ. This is not surprising, as Markov's inequality does not make any further assumptions about F. For F NBUE,
4. Let {Xt,t ≥ 0} be an ergodic Markov chain and let A be a subset of the state space. Denote by Tj(A) the first passage time from j ∈ Ac to A. Then if
Tj(A) is easily seen to be NBUE. Note that Tj(A) does not necessarily satisfy any stronger aging property (such as NBU or IFR average (IFRA)). Thus, if ETj(A) is known, then (3.5) provides a simple upper bound for Pr(Tj(A) ≥ t).
Let F be a distribution on [0,∞) with F(0) < 1 and μn = EF Xn < ∞, for all n = 1,2,…. Define G1 to be the stationary renewal distribution corresponding to F, and recursively define Gn+1 to be the stationary renewal distribution corresponding to Gn,n = 1,2,…. Next, define
Since F exponential implies G exponential, the exponential distribution is a fixed point of the transformation, and we would anticipate that, under general conditions, Zn would converge to an exponential distribution with mean 1. Harkness and Shantaram [27] obtained sufficient conditions for convergence. Below, we obtain the same sufficient conditions by a different argument, which leads to an error bound as well.
Lemma 4.1 shows that a distribution of the form Gn(X) is a mixture of distributions of the form mn(t), where mn(t) is distributed as the minimum of n i.i.d. uniform(0,t) random variables. For n large, the distribution of mn(t) is approximately exponential and, thus, Gn(X) is approximately a mixture of exponentials. A mixture of exponentials, in turn, is approximately a pure exponential if the mixing distribution has a small coefficient of variation. These ideas are quantified in Corollary 4.2.
Lemma 4.1:
, where Tn(X) and mn are independent, mn is distributed as the minimum of n i.i.d. uniform(0,1) random variables, and Tn(X) has distribution given by (2.36), with p = n.
Proof: By (2.4), the result is true for n = 1. Assume that it is true for n. By (2.4),
The pdf of G(mn) is given by
which is the pdf of mn+1. Thus, from (4.1),
and the induction argument is complete. █
Corollary 4.2: Let
be exponentially distributed with mean 1. Then
Thus, if lim(μn+1²/μnμn+2) = 1, then Zn /EZn converges in distribution to
.
Proof: By Lemma 4.1, we represent
By the triangle inequality,
As mn is IFR, it follows from Brown [10, p.13] that
It is straightforward to show that if Z is independent of X and Y, then
thus, from (4.3),
Next,
, is a mixture of exponential distributions and thus is decreasing failure rate (DFR). It follows from Brown [8, p.422] that
The result now follows from (4.3)–(4.5). █
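The representation in Lemma 4.1 lends itself to a quick simulation (a sketch only, for one specific case): for X ~ uniform(0,1), Tn(X) ~ Beta(n+1,1) (pdf proportional to x^n on (0,1)) and mn ~ Beta(1,n), and for large n the normalized variable Zn should be close to exponential(1).

# Monte Carlo illustration of Lemma 4.1 / Corollary 4.2 for X ~ uniform(0,1):
# G_n(X) =d T_n(X) * m_n with T_n(X) ~ Beta(n+1, 1) and m_n ~ Beta(1, n);
# Z_n = G_n / E[G_n] should be approximately exponential(1) for large n.
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 10**6
tn = rng.beta(n + 1, 1, size=reps)      # T_n(X) for uniform(0,1) inputs
mn = rng.beta(1, n, size=reps)          # minimum of n i.i.d. uniform(0,1) variables
gn = tn * mn
zn = gn / np.mean(gn)
for q in (0.25, 0.5, 0.9):
    print(q, np.quantile(zn, q), -np.log(1 - q))   # compare with exponential(1) quantiles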
If X is Poisson distributed with parameter λ, then
This follows since
From (2.2) and (5.1),
Next,
Thus, based on a single observation, X, we can unbiasedly estimate Cov(X, K(X)). It follows that if X1,…,Xn is a random sample, with λ unknown, then
is the minimum variance unbiased estimator of Cov(X,K(X)). Of course,
is binomially distributed with parameters (k,1/n).
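Before turning to the simultaneous estimation problem, the basic Poisson identity obtained from (2.2) and (5.1), read here as E[X K(X)] = λ E[K(X + 1)], can be checked numerically (an illustrative sketch; the choices K(x) = x² and λ = 3 are arbitrary).

# Numeric check of the Poisson size-biasing identity E[X K(X)] = lambda * E[K(X+1)],
# for X ~ Poisson(lambda) with lambda = 3 and K(x) = x^2.
import numpy as np

rng = np.random.default_rng(5)
lam, n = 3.0, 10**6
x = rng.poisson(lam, size=n)
K = lambda z: z**2
print(np.mean(x * K(x)))          # E[X K(X)]
print(lam * np.mean(K(x + 1)))    # lambda * E[K(X+1)]; agrees up to Monte Carlo error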
Next, suppose that X1,…,Xn are independent, with Xi Poisson distributed with parameter λi; the λi's are not assumed equal. Consider the problem of the simultaneous estimation of
, based on the data X1,…,Xn under expected total mean square error loss
We can take an arbitrary estimator of λ, and put it into the form
Then
The savings in risk in employing
over that of X is given by
The extraordinarily insightful approach initiated by Stein [40] is to unbiasedly estimate the savings in risk (5.6) and then, by inspired choice of g, demonstrate that the unbiased estimator is nonnegative with probability 1 and positive with positive probability. Its expectation will then be positive for all λ, and X + g will dominate X, proving X inadmissible. Stein's approach has worked successfully for several families of distributions [29,40,41]. Its elegant application to the Poisson case is due to Peng [32]. From our point of view, the key identities for the Poisson case are related to properties of the size-biasing transformation.
Recall (5.3). This identity is easily generalized, via (2.12), to
where δi is a vector of n components with δi(i) = 1,δi(j) = 0,j ≠ i.
From (5.6) and (5.7),
We now have an unbiased estimator of the savings in risk. For n ≥ 3, Peng discovered a choice of g for which the quantity in brackets is nonnegative with probability 1 and positive with positive probability. Thus, for n ≥ 3, X is inadmissible under total mean square error loss. Furthermore, for some λ, the savings in risk can be substantial. The surprising aspect, which goes back to Stein's famous 1955 paper [39], is that for each parameter λi, Peng's estimator uses {Xj,j ≠ i} in computing
, as well as Xi. As {Xj,j ≠ i} is independent of Xi and has a distribution that does not depend on λi, it is counterintuitive that it should be relevant at all. This issue has received a great deal of deep thought. See, for example, Efron and Morris [19].
Let X1,…,Xn,… be an i.i.d. sequence and define Mn = max(X1,…,Xn),n = 1,2,…. For a Borel set A with Pr(X ∈ A) > 0, define
We are interested in properties of the sequence {cn}, which we will refer to as a max sequence. Since
we see that {cn} is supermultiplicative. Another property, derived in Brown [11, p.855], is that
It follows from (6.3) that
and
Setting c0 = 1, it follows from (6.4) that
Thus, if ci /ci−1 converges, its limit must equal c.
Using a property of the renewal length transformation (Lemma 6.1), we prove that
It follows from (6.6) and (6.7) that
One consequence of (6.8) is that if c(A) < 1, then {cn} is ultimately decreasing. More specifically, define
Then n ≥ N implies that cn < cn−1. A sufficient condition for c < 1 is that for some t, F(t) > 0 and F(A ∩ (t,∞)) = 0. We now derive (6.7).
Lemma 6.1: If c1 = P(X ∈ A) > 0, then (6.7) holds.
Proof: First consider the case in which X is uniformly distributed on (0,1). Let Mn = max(X1,…,Xn). Then
Thus, T(Mn) = Mn+1, and from (2.19),
Since ET(X) ≥ EX for any random variable X, it follows from (6.10) that
Next,
The result for X uniform(0,1) thus follows from (6.11) and (6.12).
For the general case, define
Then X ∼ r(U), where U ∼ uniform(0,1), and Mn = max(X1,…,Xn) ∼ r(U(n)), where U(n) is the maximum of n i.i.d. uniform(0,1) random variables. Consequently,
and the general result follows from the uniform case. █
Remark: We note that the sequence, {cn+1 /cn} need not be monotone. For example, if X ∼ Uniform(0,1) and A = (0.4,0.6) ∪ (0.99,1), then c3 /c2 > c4 /c3 < c5 /c4.
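The non-monotonicity in the remark can be confirmed by direct computation (a small sketch): for this A, cn = Pr(Mn ∈ A) = (0.6^n − 0.4^n) + (1 − 0.99^n).

# Numeric check of the remark: for X ~ uniform(0,1) and A = (0.4, 0.6) U (0.99, 1),
# c_n = (0.6^n - 0.4^n) + (1 - 0.99^n), and the ratio c_{n+1}/c_n is not monotone:
# c_3/c_2 > c_4/c_3 < c_5/c_4.  (Direct computation, no simulation.)
c = lambda n: (0.6**n - 0.4**n) + (1.0 - 0.99**n)
ratios = [c(n + 1) / c(n) for n in range(2, 5)]
print(ratios)   # approximately [0.826, 0.789, 0.813]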
Remark: I pose the following question. What are necessary and sufficient conditions for a sequence, {cn}, to be a max sequence?
In this section we study the problem of lower bounding the survival function of an IFR distribution, in terms of its first two moments. The problem was suggested to me by Persi Diaconis (personal communication). It arises in the study of ergodicity of birth and death processes (see, e.g., Diaconis and Fill [17]).
Barlow and Marshall [5, p.1262] derived the best lower bound for an IFR distribution, given the first two moments. This bound is not an explicit function of the first two moments, but is the solution of equations depending on these moments, which are solved numerically. We derive an explicit, simple, but nonoptimal lower bound.
Recall that an IFR distribution on (0,∞) is absolutely continuous, except perhaps for an atom at the right-hand endpoint of its support. For simplicity, we will assume that there is no such atom. The IFR property requires that the failure rate, h(t) = f (t)/F(t), be nondecreasing and, equivalently, that the integrated hazard, H(t), be convex.
A simple method for lower bounding F(t) is to use Jensen's inequality,
so that if EY is known, and EH(Y) can be upper bounded (say by b), then by convexity,
Similarly, if EY < EZ are known and H(EY) ≤ b and H(EZ) ≤ c with b < c are known, then we can upper bound H for t ∈ [EY,EZ], using convexity, and thus lower bound F(t).
We will use G(X),X, and T(X) as our choices of random variables. The means are of course known. The mean of H(X) equals one, and we will show in Lemma 7.1 that
The use of the three distributions along with (7.1), (7.2), and convexity of H lead to
and then for F,
Barlow and Proschan [6, p.112] derived lower bounds for F(t) using the theory of convex ordering. Their bounds, based on the first two moments, give
The new bound (7.3) offers improvement over (7.5) in that it has a larger domain
and gives smaller upper bounds for H(t) in their common domain. The (7.5)-related bounds, using higher moments, still remain useful.
In the above-mentioned birth and death process application, (7.3) can be used to lower bound the separation distance between a birth and death process at time t (starting in state 0) and the steady-state distribution. This separation distance is a convolution of exponential distributions and is thus IFR.
We now derive (7.1) and (7.2). They hold for general distributions; however, (7.3) does not.
Lemma 7.1: Let F be an absolutely continuous distribution with EF X2 < ∞. Let X* ∼ G(F) and T ∼ T(F). Then
Proof: Recall the record value process discussed in Section 3. Now
Furthermore,
where m(t) = E(X − t|X > t). A well-known and useful result [7] is that Em2(X) = σ2. It thus follows from (7.6) and (7.7) that
Next, we observe that
Thus, (7.1) holds. █
The fact that
follows from (2.5), as H(T) − H(X*) is the total hazard accumulated between X* and the next event, T.
The size-biasing transformation provides an interesting perspective for simple versus simple hypothesis testing. Suppose that X is a random variable, vector, or function, with density f0 under H0, and density f1 under H1, both with respect to the σ-finite measure μ. Assume that f0 and f1 are mutually absolutely continuous. Define
the likelihood ratio statistic. Define Y to be a random variable distributed as the distribution of λ(X) under f0. Then T(Y) is the distribution of λ(X) under f1, and V(Y) is the distribution of f0(X)/f1(X) = λ−1 under f1. Thus, λ under f0 and λ−1 under f1 are inverse distributional pairs, as defined in Section 2.4. These assertions hold since EY = 1 and
Thus, by (2.2) and (8.2), T(Y) is the distribution of λ under f1.
Define F to be the survival function of Y and thus of λ(X) under f0. Then
the type 1 error of the Neyman–Pearson test that rejects H0 for λ > c and otherwise accepts H0. Furthermore, by (2.7),
the power of the above test where G is the stationary renewal distribution corresponding to F. It follows from (8.3) that
Thus, the distribution of λ under H1 is determined from its distribution under H0 by (8.4).
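The fact that the law of λ under f1 is the size-biased version of its law under f0 can be illustrated numerically (a sketch; the densities below are not from the original text): take f0 = N(0,1) and f1 = N(1,1), so λ(x) = exp(x − 1/2), and check E_{f1}[R(λ)] = E_{f0}[λ R(λ)], which is (2.2) with E_{f0}λ = 1.

# Monte Carlo check that the law of lambda(X) under f1 is the size-biased version of its
# law under f0, for f0 = N(0,1), f1 = N(1,1), lambda(x) = exp(x - 1/2).
import numpy as np

rng = np.random.default_rng(7)
n = 10**6
lam0 = np.exp(rng.normal(0.0, 1.0, n) - 0.5)   # lambda(X) with X ~ f0
lam1 = np.exp(rng.normal(1.0, 1.0, n) - 0.5)   # lambda(X) with X ~ f1
R = lambda z: 1.0 / (1.0 + z)                  # an arbitrary bounded test function
print(np.mean(R(lam1)))            # E_{f1}[R(lambda)]
print(np.mean(lam0 * R(lam0)))     # E_{f0}[lambda R(lambda)]; agrees up to Monte Carlo error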
Define
which we call the excess at c. We see from (8.5) that the excess is an absolutely continuous convex function. This is true, despite the fact that the distribution of λ might be purely discrete.
If the distribution of λ under H0 belongs to
(see Section 2.5), then an interesting symmetry property holds. The distribution of λ−1 under H1 will coincide with that of λ under H0. If statistician I rejects H0 for λ > c and accepts H0 otherwise and statistician II rejects H1 for λ−1 > c and otherwise accepts H1, then the two statisticians will have (each according to his own perspective) the same type 1 and type 2 errors.
A function g on [0,∞) is defined to be submultiplicative if
Submultiplicative functions arise in the study of Banach algebras, semigroups, and other areas of functional analysis (see, for example, several discussions in Hille and Phillips [28]). Phillips [33] derived the bounds (9.1) and (9.2). Let g be a nonnegative, integrable, submultiplicative function on [0,∞) with
Then
Suppose that we restrict g further by requiring that g(t) = F(t), the survival function of a distribution on [0,∞). The submultiplicative property is then equivalent to the NBU condition
for all s ≥ 0 and t ≥ 0. Daley (personal communication) suggested exploring the connection between NBU distributions and the theory of submultiplicative functions. That suggestion is the subject of this section.
It turns out that (9.1) provides a neat upper bound,
the square of the Markov inequality bound F(t) ≤ (μ/t). It does not appear to be known in the NBU literature. On the other hand, (9.2) is not very good for NBU distributions, as our bound, (3.5), for the larger NBUE class improves both the exponent (from 2/μe to 1/μ) and the multiplying constant (from e2 to e). Of course, Phillips was concerned with general submultiplicative functions, whereas we work with survival functions.
Inequality (9.3) does not extend to NBUE distributions. For example, let X = 1 or 3, each with probability ½. Then F is NBUE, but not NBU. Moreover,
Combining (3.5) and (9.3), it follows that for F NBU,
The bound (9.3) is better than (3.5) for t < cμ, with c ≈ 3.51286; however, the situation reverses for t > cμ.
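The crossover constant can be recovered numerically (a sketch, reading (9.3) as (μ/t)² and (3.5) as e·e−t/μ): with u = t/μ, the bounds coincide where u = 1 + 2 ln u, whose root above 1 is approximately 3.51286.

# Numeric check of the crossover point between (mu/t)^2 and e*exp(-t/mu):
# solve u = 1 + 2 ln u for u > 1 by bisection.
import math

g = lambda u: u - 1.0 - 2.0 * math.log(u)
lo, hi = 2.0, 5.0            # g(2) < 0 < g(5), and g is increasing on this interval
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
print(0.5 * (lo + hi))       # approximately 3.51286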
We now study the problem of improving (9.4). In Corollary 9.2, we show that for t > μ,
where
Our approach is to use the Section 3 idea that
where X is independent of Y1 and Y2. In Section 3 we chose Y1 ∼ G(X) and Y2 ∼ S2. For our current application, we employ Y1 ∼ T(X) and Y2 ∼ S2, the second record value. Now for F absolutely continuous,
where S3 is the third record value. Note that
For Y2 = T, define T2 to be a random variable distributed as X|X > T, where X and T are independent. We find
where L(t) is defined in (2.8) and h is the failure rate function of X.
We require a lower bound for FT2(t), which is derived in Lemma 9.1.
Lemma 9.1: Let F be an absolutely continuous NBU distribution on (0,∞). Then
Proof: Defining
, we find that
as can be verified by differentiation. Now
where we used the superadditivity property of H (H(t) ≥ H(x) + H(t − x)), which is equivalent to the submultiplicativity of F. Thus, from (9.12),
Thus, (9.11)–(9.13) prove (9.10). █
Using Lemma 9.1,
, (9.8) and (9.9), we now derive (9.5). A probabilistic proof of Phillips' bound (9.1), for the case of survival functions, is also provided.
Corollary 9.2: If F is NBU with mean μ, then for t ≥ μ,
where B(t) is given by (9.6). Moreover,
where
is the unique solution in (1,∞) of the equation
Proof: Let Ut be uniformly distributed on [0,t]. By the submultiplicativity of F,
Now, F(Ut) and F(t − Ut) are identically distributed and negatively correlated, as F(Ut) is a decreasing function of Ut while F(t − Ut) is an increasing function of Ut. Thus,
To prove the result for F(t−) = Pr(X ≥ t), we use the same smoothing argument that we employed in the proof of Theorem 3.2.
Now assume that F is absolutely continuous. By (9.8), (9.9),
, and Lemma 9.1,
For {t|F(t) > 0}, divide both sides of (9.20) by F(t) and rearrange terms, obtaining
For t > μ, the inequality in (9.21) is satisfied if and only if H(t) ≥ B(t), given by (9.6), as B(t) is the unique positive root of
and s(x) < 0 for 0 ≤ x < B(t), whereas s(x) ≥ 0 for x ≥ B(t). For the general case (F not necessarily absolutely continuous) we use the smoothing argument of Theorem 3.2. Thus, (9.14) has been proved. As for (9.15), we note that
from which the desired conclusion follows. The value for
in (9.16) and (9.17) was found numerically. █
Remark: If X is NBU, then each interarrival time, Sn − Sn−1, for the record value process is stochastically smaller than X and thus stochastically smaller than the corresponding interarrival time for the renewal process corresponding to X. It follows that
where EZ(t) is the expected forward recurrence time at t. It thus follows that
Comparing (9.22) to (3.5), for distributions that deteriorate with age, we would usually expect that for most t,
so that (9.22) offers potential improvement over (3.5).
Lemma 2.3 discussed a subadditivity property for Tp, p > 0, and a corresponding superadditivity property for Tp, p < 0. It follows that if g is an increasing function, then
with a reverse inequality for p < 0. In (10.1), X1,…,Xn are independent, as are Tp(X1),…,Tp(Xn).
We will only discuss the choice g(x) = x. Define
It follows from (10.1) that
For the choice p = −1, (10.3) implies
equivalently,
where
.
Defining k(a1,…,an) = (∑ ai−1)−1, (10.4) is equivalent to
Inequality (10.6) suggests applying Jensen's inequality for concave functions, under the assumption that k is concave as a function from R+n to R+. To prove (10.4) inductively, we only need to consider the case n = 2. By examination of the Hessian corresponding to k(x,y), we confirm that k is concave. Therefore, (10.4) and (10.5) hold, without independence. I have not seen this inequality in the literature.
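Reading (10.6) as the Jensen-type inequality E[k(X1,…,Xn)] ≤ k(EX1,…,EXn) for the concave function k above, a Monte Carlo check is straightforward (a sketch; the particular dependent joint law below is arbitrary and not from the original text).

# Monte Carlo check of E[k(X1,...,Xn)] <= k(EX1,...,EXn) with k(a) = (sum a_i^{-1})^{-1},
# for three dependent positive random variables.
import numpy as np

rng = np.random.default_rng(8)
n = 10**6
z = rng.gamma(2.0, 1.0, size=n)                 # common factor, inducing dependence
x1 = z + rng.gamma(3.0, 1.0, n)
x2 = z + rng.gamma(1.0, 2.0, n)
x3 = z * rng.uniform(0.5, 1.5, n)
k = lambda *a: 1.0 / sum(1.0 / ai for ai in a)
print(np.mean(k(x1, x2, x3)))                    # E[k(X1, X2, X3)]
print(k(np.mean(x1), np.mean(x2), np.mean(x3)))  # k(EX1, EX2, EX3); at least as large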
The choice p = −2 leads to
Unlike (10.4), I do not know how to verify (10.7) directly.
Thus, I am suggesting that (10.1), with freedom of choice for g and p, could be a useful mechanism for generating interesting inequalities.
In this section we discuss Chernoff's theorem [4, p.6; 13, p.49]. The renewal length transformation is already implicit in the existing proofs, so we merely make it a bit more explicit here. We will not endeavor to find the weakest possible conditions. First we derive a preliminary lemma.
Lemma 11.1: Let W be a positive random variable, with T = T(W) its renewal length transformation. Then the following hold:
(i) Pr(W ∈ A) ≤ (EW)Pr(T ∈ A)e−E(log T |T∈A)
(ii) If, in addition, E log T = 0, then
Proof:
(i)
where we used convexity of −ln x as well as (2.2).
(ii) If E log T = 0, then
Next, suppose that {Xi} is i.i.d. with EX < 0, Pr(X > 0) > 0, and M(t) = EetX < ∞ for all t. The condition EX < 0 implies M′(0) < 0; Pr(X > 0) > 0 ensures that M(t) → ∞ as t → ∞. As M is strictly convex, decreasing at t = 0 and eventually increasing, it follows that there exists a unique τ > 0 with M′(τ) = 0. At t = τ, M has a global minimum, ρ = M(τ) < M(0) = 1.
For applying Lemma 11.1, set
so that
Since W = ΠeτXi, it follows from Lemma 2.2 and (2.13) that
, where each Yi has pdf
From (11.2), EY = M′(τ) = 0 and Var Y = ρ−1M′′(τ). It follows that (recall T = T(W))
and
Next, set A = [1,∞),αn = Pr(Xn ≥ 0) (where
), and βn = Pr(Yn ≥ 0). By Lemma 11.1(ii),
It follows from (11.3) that
By the central limit theorem, βn → ½; thus,
Finally, Markov's inequality yields
This ensures that
We thus conclude from (11.4) and (11.5) that
the conclusion of Chernoff's theorem.
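Chernoff's conclusion, (1/n) log Pr(Sn ≥ 0) → log ρ, is easy to illustrate numerically (a sketch, for one specific walk not taken from the original text): let X = +1 with probability p and −1 with probability 1 − p, p = 0.3, so that EX < 0, Pr(X > 0) > 0, and ρ = 2√(p(1 − p)).

# Numeric illustration of Chernoff's theorem for X = +1 w.p. p, -1 w.p. 1-p, p = 0.3:
# (1/n) log Pr(S_n >= 0) -> log rho with rho = 2*sqrt(p(1-p)).  (Exact binomial sums.)
import math

p = 0.3
rho = 2.0 * math.sqrt(p * (1.0 - p))
def log_prob_sn_nonneg(n):
    # S_n >= 0 if and only if the number of +1 steps is at least n/2
    terms = [math.comb(n, k) * p**k * (1.0 - p)**(n - k)
             for k in range(math.ceil(n / 2), n + 1)]
    return math.log(sum(terms))
for n in (50, 200, 800):
    print(n, log_prob_sn_nonneg(n) / n, math.log(rho))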
Consider a renewal reward process [37, pp.132,133]. This is generated by an i.i.d. sequence {Xi,i ≥ 1}, with X ∼ F and F(0) = 0. Corresponding to each Xi there is a reward Ri; Ri is allowed to depend on Xi and other random quantities, but the pairs {(Xi,Ri),i ≥ 1} are assumed i.i.d. Define
, and
If EX and ER are both finite, then
almost surely and in expectation [37, p.133]. The quantity ER/EX is referred to as the long-run average reward rate.
Next, consider the stationary version of the renewal reward process. It can be constructed by starting a renewal interval at −Y, with Y ∼ G, the stationary renewal distribution corresponding to F; if Y = y, then the length of the interval covering {0} is constructed to be distributed as X|X > y. By (2.5), the resulting unconditional distribution of the length of the interval covering {0} is that of T(F). Finally, given T = t, the reward
for this interval covering {0} is constructed to have the conditional distribution of R given X = t, so that
Now, by (2.2) and (12.1),
Thus, the observed reward rate over the interval covering a fixed point in the stationary process is an unbiased estimator of the long-term average reward rate.
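This unbiasedness is easy to see in a toy case (an illustrative sketch under specific modelling choices not in the original text): take X ~ exponential(1) and reward R = X², so ER/EX = 2; the covering interval then has length T ~ Gamma(2,1) and reward T².

# Monte Carlo check that the reward rate over the covering interval is unbiased for ER/EX,
# for X ~ exponential(1) and R = X^2 (so ER/EX = 2).
import numpy as np

rng = np.random.default_rng(9)
t = rng.gamma(2.0, 1.0, size=10**6)   # T ~ T(F) for F = exponential(1)
r_star = t**2                         # reward on the covering interval, given its length
print(np.mean(r_star / t))            # approximately 2 = ER/EX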
Next, suppose that the experimenter observes the interval covering {0}, as well as the next n − 1 ordinary intervals. For this person, the observed reward rate is given by
In Lemma 12.1 we show that the observed reward rate, (12.3), is once again an unbiased estimator of the long-term average reward rate (ER/EX).
Lemma 12.1: The expected value of the random quantity in (12.3) equals ER/EX.
Proof: Let Y = (T,X2,…,Xn), where T ∼ T(X) and Xi ∼ F,i = 2,…,n, and T and X2,…,Xn are independent. The pdf of Y is given by
It follows from (12.4) that
where X = (X1,…,Xn) are i.i.d. random variables distributed as F and g satisfies E|X1 g(X)| < ∞.
Next, suppose that g is symmetric. Then
From (12.5) and (12.6) we have
for all symmetric g with E|X1 g(X)| < ∞.
Next, choose
It follows from (12.7) that
Remark: It might have occurred to the reader to prove Lemma 12.1 by choosing the interarrival time distribution to be distributed as
. Then
(2.17), and
, so that ER(Sn)/ESn = ER/EX, and the result would then follow from (12.2). The problem with this argument is that one would need to prove that (12.1) implies
which is not obvious to me.
For X a random variable on [0,∞) with Laplace transform Ł, we can interpret Ł to be the survival function of
, where
are independent and
is exponential with mean 1:
Similarly, the Laplace transform of T(X) is the survival function of
, where
and is independent of
:
We see from (13.1) and (13.2) that as a survival function, ŁX is the stationary renewal distribution corresponding to ŁT. This can also be seen from (2.14) and (2.25) as
It is shown in Brown [8, p.422] that if
is the stationary renewal distribution corresponding to F, then
Applying (13.3) and (13.4) to Y = V and recalling (2.23) and (2.30), we obtain
where
Thus, if EX−1 ≈ (EX)−1, then the Laplace transform of X is close to that of a one point distribution at {μ} and is also close to the Laplace transform of T(X).
Next, consider a completely monotone function g on (0,∞), with g(0) < ∞. Employing Bernstein's theorem as in Brown [9, p.1241], it follows from (13.5), and (13.6) that
and
Thus, we can interpret c2(X) as a scale-invariant measure of variability. When c2(X) is small, X is close to a point distribution at μ. This proximity is reflected in the behavior of expectations of completely monotone functions of X.
Note that if g is completely monotone, then g is maximized at zero and decreases rapidly as t increases. The mean of g(X) lies in (0,g(0)]. The quantity c2(X) ∈ [0,1) provides an upper bound to the fraction of g(0) that Eg(X) can realize. It is appropriate to base a bound on a parameter that measures the relative concentration of X near zero (where g is maximized), rather than on the tail behavior, which is largely irrelevant. Thus, a harmonic mean-based bound, rather than a variance-based bound, makes good sense here. Similarly,
thus, the Laplace transform of V(X), considered as a survival function, is the stationary renewal distribution corresponding to the Laplace transform of X−1. The analogs of (13.8) and (13.9) are
where
Thus, the dual measures of variability discussed in Section 2.4 enter into the bounds for expectations of completely monotone functions of X and V(X).
I am indebted to Daryl Daley for suggesting the Section 9 problem and to Persi Diaconis for suggesting the problem discussed in Section 7. I thank David Aldous, Albert Marshall, and two referees for bringing relevant references to my attention. I am grateful to the National Security Agency for their support of this research.