
EXPLOITING THE WAITING TIME PARADOX: APPLICATIONS OF THE SIZE-BIASING TRANSFORMATION

Published online by Cambridge University Press:  06 March 2006

Mark Brown
Affiliation:
Department of Mathematics, The City College, CUNY, New York, NY, E-mail: CYBERGARF@aol.com

Abstract

We consider the transformation T that takes a distribution F into the distribution of the length of the interval covering a fixed point in the stationary renewal process corresponding to F. This transformation has been referred to as size-biasing, length-biasing, the renewal length transformation, and the stationary lifetime operator. We review and develop properties of this transformation and apply it to diverse areas.

Type
Research Article
Copyright
© 2006 Cambridge University Press

1. INTRODUCTION

The famous "waiting time paradox" [22, p.12] arises because the length of the interval covering a fixed point in a stationary renewal process tends to be longer than a typical interarrival time. If the interarrival time distribution has the probability density function (pdf) f, then the length of the interval covering a fixed point has pdf xf(x)/EX. This renewal length distribution is larger than F in the monotone likelihood ratio order and is thus stochastically larger.

In this article we consider the transformation, T, which takes a distribution F on [0,∞), with finite positive mean μ, into its renewal length distribution: the distribution with pdf xf(x)/μ.
We will refer to T as the size-biasing transformation. Other terminology that has been used is length biasing [14, p.61; 42, p.70], stationary lifetime operator [1], and renewal length transformation [12]. Size biasing appears to be the most widely used term for this transformation.

In Section 2, a few simple properties of T are reviewed and developed, and in the remainder of the article, they are applied to a number of application areas. These areas were chosen because size-biasing methodology does not appear to have been explicitly employed in them before. In some cases, new results are derived (Sections 3, 4, 7, 9, 12, and 13), whereas in others the transformation merely provides perspective for more profound ideas (Sections 5, 8, and 11).

I will not give examples in areas in which size biasing has previously been applied. These include branching and population processes [2,3,20,21,23,30,31], Stein's error bound methodology [16,25,35,36], and size-biased permutations [18,24,34].

2. PROPERTIES OF T(F)

2.1. Basic Properties

If F is the cumulative distribution function (cdf) of a distribution on [0,∞), with F(0) < 1, finite mean μ, and density f with respect to a σ-finite measure λ, then T(F) is defined to be the distribution with pdf

xf(x)/μ.
Note that the Laplace transform of T(F) [1] is given by

Ee^{−sT(X)} = −Ł′F(s)/μ, where ŁF is the Laplace transform of F,
and that [42, p.264]

E[R(T(X))] = EF[XR(X)]/μ    (2.2)

for all R with EF|XR(X)| < ∞. The distribution T(F) is the unique distribution satisfying (2.2), as can be seen by setting R equal to indicator functions.
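As a quick numerical illustration of (2.2), in the size-bias form written above (which is a reconstruction of the lost display), the following minimal Python sketch compares the two sides of the identity for an exponential F and R(x) = exp(−x); the sample size and the use of weighted resampling to draw from T(F) are illustrative choices, not part of the original text.

import numpy as np

# Monte Carlo sketch of the size-biasing identity (2.2):
# E[R(T(X))] = E_F[X R(X)] / mu.  Here F is exponential with mean 2 and
# R(x) = exp(-x); both choices are purely illustrative.
rng = np.random.default_rng(0)
n = 200_000
x = rng.exponential(scale=2.0, size=n)       # X ~ F, with mu = EX = 2
mu = x.mean()
# Draw (approximately) from T(F) by resampling x with weights proportional to x.
t = rng.choice(x, size=n, p=x / x.sum())
R = lambda v: np.exp(-v)
print((x * R(x)).mean() / mu)                # E_F[X R(X)] / mu
print(R(t).mean())                           # E[R(T(X))]; should nearly agree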

Recall that the stationary renewal distribution corresponding to F is the distribution with pdf (1 − F(x))/μ.
We will use the notation G(F), or sometimes G(X), for the stationary renewal distribution corresponding to X ∼ F. Lemma 2.1 derives two useful relationships between T(F) and G(F). Relationship (2.4) appears in [14, p.62].

Lemma 2.1: Let X ∼ F be a random variable on [0,∞) with F(0) < 1 and finite mean μ. Let T(X) denote a random variable with distribution T(F), and let G(X) be a random variable with distribution G(F). Then

where U is uniformly distributed on (0,1) and independent of T(X). Furthermore,

where G(X) is independent of X.

Proof: Let (W1,W2) be defined to have pdf (relative to ℓ × λ, where ℓ is Lebesgue measure)

We find that

and thus that W1 ∼ G(F). Also,

and thus W2 ∼ T(F). Next,

thus the conditional distribution of W1/W2, given W2 = w2, is uniform(0,1) for all w2. Thus, we have constructed W1 ∼ G(F) and W2 ∼ T(F) with W1/W2 ∼ U, independently of W2. It follows that the marginal distribution of G(X) is that of the product of two independent variables: one distributed as T(X) and the other uniform(0,1). Thus, (2.4) holds.

Next,

Thus, the conditional distribution of W2 given W1 = w1 coincides with that of X given X > w1. Therefore, (2.5) holds. █
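The representation (2.4) is easy to check numerically in a case where all three laws are explicit. For X exponential with mean 1, T(X) is Gamma(2,1) and G(X) is again exponential with mean 1, so U·T(X) should reproduce an exponential sample; the following Python sketch (with parameters chosen only for illustration) compares quantiles.

import numpy as np

# Sketch check of (2.4): G(X) has the law of U * T(X), with U ~ uniform(0,1)
# independent of T(X).  For X exponential with mean 1, T(X) is Gamma(2,1) and
# G(X) is exponential(1), so U * Gamma(2,1) should match exponential(1) samples.
rng = np.random.default_rng(1)
n = 200_000
t = rng.gamma(shape=2.0, scale=1.0, size=n)   # T(X) for X ~ Exp(1)
u = rng.uniform(0.0, 1.0, size=n)
g = u * t                                      # proposed G(X)
e = rng.exponential(scale=1.0, size=n)         # exact G(X) for exponential X
qs = [0.25, 0.5, 0.75, 0.9]
print(np.quantile(g, qs))
print(np.quantile(e, qs))                      # the two quantile rows should be close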

By integrating the joint pdf (2.6) over {(w1,w2) : w2 > min(w1,t)}, we find that

where G is the survival function of the stationary renewal distribution. Defining

and noting that

it follows that

2.2. Identifying T(F)

We now discuss methods for finding T(F). Lemma 2.2 is found in [42, p.262].

Lemma 2.2: Let X be a random vector with pdf f(x), with respect to a σ-finite measure w on Rⁿ, and let g: R₊ⁿ → R₊ satisfy 0 < μg = Eg(X) < ∞. Define Y to be a random vector with pdf g(x)f(x)/μg. Then g(Y) ∼ T(g(X)).
Proof:

for all K with E|K(X)g(X)| < ∞. In particular,

so by (2.2),

.

The fact that (2.12) is more general than required for the conclusion will be helpful in later sections. █

We now give some examples of computation of T(F).

Example 1: Products. If X1,…,Xn are independent with 0 < EXi < ∞, i = 1,…,n, then by Lemma 2.2,

T(X1X2 ··· Xn) ∼ T(X1)T(X2) ··· T(Xn),    (2.13)

where T(X1),…,T(Xn) are independent.

It follows from (2.4) and (2.13) that if X and Y are independent, then

Example 2: [26, p.9]:

Sums: If

, it follows from Lemma 2.2 that

where

We can construct

in the following manner. First, let I be a random variable with Pr(I = j) = μj/Σμi, j = 1,…,n. Given I = j, define Yj to be distributed as T(Xj). Finally, given I = j and Yj = yj, construct the conditional distribution of Y(j) = {Yk, k ≠ j} to be that of X(j) = {Xk, k ≠ j} given Xj = yj. Under this construction, Y has pdf (2.15) and ΣYi is distributed as

where I is as described earlier; given I = j, T(XI) is distributed as T(Xj), and given I = j and T(Xj) = yj, SI is distributed as Σk≠j Xk given Xj = yj.

In the case of independence, it follows that given I = j, SI is distributed as the unconditional distribution of Sj, independently of T(Xj).

In the independent and identically distributed (i.i.d.) case,

T(Sn) ∼ T(X) + Sn−1,    (2.17)

where T(X) and Sn−1 are independent.
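For exponential interarrival times this i.i.d. sum relation can be checked directly, since every law involved is a gamma distribution: Sn is Gamma(n,1), its size-biased version is Gamma(n+1,1), T(X) is Gamma(2,1), and Sn−1 is Gamma(n−1,1). The Python sketch below (with an arbitrary n) compares quantiles of the two sides; it assumes the reading T(Sn) ∼ T(X) + Sn−1 given above.

import numpy as np

# Sketch of (2.17) in the i.i.d. exponential case: the size-biased Gamma(n,1) is
# Gamma(n+1,1), so Gamma(2,1) + Gamma(n-1,1) should match Gamma(n+1,1).
rng = np.random.default_rng(2)
n, m = 5, 200_000
lhs = rng.gamma(shape=n + 1, scale=1.0, size=m)                 # T(S_n)
rhs = rng.gamma(shape=2.0, scale=1.0, size=m) + \
      rng.gamma(shape=n - 1, scale=1.0, size=m)                 # T(X) + S_{n-1}
print(np.quantile(lhs, [0.25, 0.5, 0.75, 0.95]))
print(np.quantile(rhs, [0.25, 0.5, 0.75, 0.95]))                # should nearly agree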

Mixtures: If F is a mixture,

then

where μ(α) is the mean corresponding to Fα and μ = ∫μ(α) dP(α) is assumed finite.

Random Sums: If

where {Xi} is i.i.d., N is independent of {Xi}, and EY = ENEX < ∞, then

where ST(N)−1 and T(X) are independent. The result follows from (2.17) and the discussion of mixtures.

Restrictions: If X ∼ F, let XA denote a random variable distributed as X, given X ∈ A, where F(A) > 0. Then

The proof is straightforward.

2.3. Distance Between F and T(F)

Let X ∼ F and assume that F(0) = 0. Let T(X) ∼ T(F). Define

the total variation distance between F and T(F). There are several equivalent expressions for D. First,

D = E|X − μ|/(2μ).
The scaled L1 norm, E|X − μ|/μ, is a measure of variability, which when small suggests T(F) ≈ F. Also note that

by (2.4) and (2.7). Finally,

Thus, D equals the probability that X is smaller than an independent uniform(0,μ) random variable.
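A small simulation, assuming the scaled-L1 reading D = E|X − μ|/(2μ), confirms the agreement between that expression and the probabilistic one just stated; the gamma example is arbitrary.

import numpy as np

# Sketch check that two expressions for D agree: the scaled L1 form
# E|X - mu| / (2 mu) and the probabilistic form Pr(X < mu * U), U ~ uniform(0,1).
rng = np.random.default_rng(3)
n = 400_000
x = rng.gamma(shape=2.0, scale=1.0, size=n)    # X ~ Gamma(2,1), mu = 2
mu = x.mean()
u = rng.uniform(0.0, 1.0, size=n)
print(np.abs(x - mu).mean() / (2 * mu))        # E|X - mu| / (2 mu)
print((x < mu * u).mean())                     # Pr(X < mu U); should be close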

2.4. Distributional Inverse

Once again, let F be a distribution on (0,∞) with F(0) = 0 and finite mean. Define V(F) to be the distribution of 1/T(X), where T(X) ∼ T(F). Note that

Also,

It follows from (2.2) and (2.24) that

and

Thus, V² = I (the identity operator) and V = V⁻¹, as operators on probability distributions on (0,∞) with finite means. We therefore consider (F,V(F)) to be an inverse pair of distributions, conceding that the concept requires greater development and insight.

Note that under total variation distance D,

thus, by (2.20),

Thus, X and V have the same variability as measured by scaled L1 norm. Similarly, by (2.21),

where GV is the stationary renewal distribution corresponding to V. From (2.22),

where U is uniform(0,1) and independent of V and X.

When we examine scaled L2 variability, there is a duality present in which scaled L2 variability for F is equivalent to a scale-invariant harmonic mean measure of variability for V(F). As V(V(F)) = F, this relationship is reversible. Specifically,

and

as can be easily verified.

2.5. Fixed Points of V

From the definition of T, it follows that T(F) = F if and only if F is concentrated at a single point, μ > 0. The question as to when V(F) = F is far more interesting. In view of (2.23), μ = 1 is a necessary condition (for V(F) = F). However, a point mass at 1 is only one of many solutions. For example, choose A > 1, r ≤ (A + 1)⁻¹, and set Pr(X = A⁻¹) = rA, Pr(X = 1) = 1 − r(A + 1), and Pr(X = A) = r. Then we can easily verify that V(X) = X.
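The three-point example can be verified with a few lines of arithmetic; the following sketch works through it for one admissible pair (A, r) (any A > 1 and r ≤ (A + 1)⁻¹ would do).

import numpy as np

# Check the three-point fixed point of V: Pr(X = 1/A) = rA, Pr(X = 1) = 1 - r(A+1),
# Pr(X = A) = r.  V(F) is the law of 1/T(X), where T(X) has pmf proportional to
# x * Pr(X = x).
A, r = 3.0, 0.2                                  # any A > 1, r <= 1/(A+1)
support = np.array([1.0 / A, 1.0, A])
p = np.array([r * A, 1.0 - r * (A + 1.0), r])
mu = (support * p).sum()                         # equals 1 for this family
p_T = support * p / mu                           # pmf of T(X) on the same support
# pmf of V(X) = 1/T(X): the point 1/A gets the T-mass at A, and so on;
# reversing works because the support 1/A, 1, A is symmetric under x -> 1/x.
p_V = p_T[::-1]
print(p)    # original pmf
print(p_V)  # pmf of V(X); identical, so V(F) = F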

A method for generating a large class of fixed points for V is to choose X ∼ F and V(X) ∼ V(F), with X and V(X) independent, and define W to be a random variable distributed as

Then

Thus,

Choose X ∼ F, with F absolutely continuous with pdf f and finite mean μ. From (2.32), we compute

Thus, (2.35) gives a large class of distributions satisfying V(W) = W.

Following are a few choices of f with corresponding choices of fW, computed from (2.35):

Consider the class of distributions satisfying V(F) = F. One property of this class is that if X1 and X2 are independent, each with distribution in the class, then the distribution of X1X2 also belongs to the class. This holds since, by (2.13), T(X1X2) ∼ T(X1)T(X2), so that 1/T(X1X2) ∼ V(X1)V(X2) ∼ X1X2. Another property of the class is closure under mixtures. If Fα belongs to the class for all α and X ∼ ∫Fα dP(α), then it follows from EXα ≡ 1 and (2.18) that V(X) ∼ X. I pose the problem of finding a general form for all distributions in the class.

2.6. A Subadditivity Property

For F a distribution on [0,∞) with F(0) < 1, μ(p) = EX^p < ∞, and density f, define Tp(F) to be the distribution with density

x^p f(x)/μ(p).    (2.36)
We allow −∞ < p < ∞ in (2.36).

Lemma 2.3 derives a subadditivity property for Tp,p > 0, and a superadditivity property for Tp,p < 0.

Lemma 2.3: Let X1,…,Xn be independent, with EXi^p < ∞, i = 1,…,n. Let Tp(X1),…,Tp(Xn) denote independent random variables, with Tp(Xi) distributed as Tp applied to the distribution of Xi. Then

Proof: We will prove (i) (p > 0); the other case follows by obvious modification. We only need to consider n = 2; the case for general n then follows by induction.

Arguing as in the proof of Lemma 2.2, we can show that

where

The symbol ∝ denotes proportionality. Now

so that

a quantity decreasing in y1 for p > 0. Thus, (2.38) implies that Y1 is smaller than Tp(X1) in the sense of monotone likelihood ratio ordering and, thus, under stochastic ordering. Therefore, we have

Next,

so that

which is decreasing in y2. Thus,

The result now follows from (2.37)–(2.40). █

3. INEQUALITIES FOR NBUE DISTRIBUTIONS

In this section we derive a property of the renewal length transformation and use it to derive new inequalities for NBUE (new better than used in expectation) distributions.

If F is an absolutely continuous distribution on (0,∞), then the record value process corresponding to F is a nonhomogeneous Poisson process, with intensity function equal to the failure rate, h, of F. For {Xi} i.i.d. with distribution F, the first record value, S1, equals X1; the second record value, S2, is defined by S2 = XJ2, where J2 = min{i : Xi > X1}. More generally, Sm = XJm, where Jm = min{i : Xi > XJm−1}.

For the second record value, S2,

where N(t) = #{i : Si ∈ [0,t]} and H(t) = −ln(1 − F(t)) = EN(t). See [38] for a discussion of record values.

Define

Note that K is increasing and

Let X* ∼ G(X). Then

where T(X) is the renewal length transformation. Of course, (3.3) is a direct consequence of (2.5).

Recall [6, p.159] that F is defined to be NBUE if EX < ∞ and

E(X − t | X > t) ≤ EX

for all t such that Pr(X > t) > 0. Thus, F is NBUE if and only if

From (3.2)–(3.4) and the fact that K is increasing, Lemma 3.1 follows.

Lemma 3.1: If F is an absolutely continuous NBUE distribution, then

.

Employing Lemma 3.1, we obtain an inequality for NBUE distributions. Some consequences of this inequality are discussed in Section 3.2.

Theorem 3.2: If X ∼ F is NBUE, then

Pr(X ≥ t) ≤ min(exp{−(t/μ − 1)}, 1),  t > 0.    (3.5)
Proof: First consider the case of absolutely continuous F. From Lemma 3.1 and equations (2.7) and (3.1),

For {t : F(t) > 0}, divide both sides by F(t) to obtain (recalling (2.9))

Thus,

This proves (3.5) for F absolutely continuous. For the general case, we use a smoothing argument. If X ∼ F is NBUE but not absolutely continuous, let U(ε) be uniformly distributed on (0,ε) and independent of X and define

Since the NBUE class is closed under convolutions [6, p.187], X(ε) has an absolutely continuous NBUE distribution.

Applying (3.7) to X(ε), letting ε → 0, and noting that Pr(X(ε) > t) → Pr(X ≥ t), μ(ε) → μ, and E(X(ε)|X(ε) > t) → E(X|X ≥ t), (3.5) follows. █
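As a numerical sanity check on (3.5), read as Pr(X ≥ t) ≤ min(exp{−(t/μ − 1)}, 1), the sketch below evaluates both sides for a Gamma(2,1) random variable, which is IFR and hence NBUE; the test points are arbitrary.

import numpy as np

# Check of the NBUE bound for X ~ Gamma(2,1): mu = 2 and the exact survival
# function is exp(-t) * (1 + t).
mu = 2.0
for t in [0.5, 1.0, 2.0, 4.0, 8.0]:
    exact = np.exp(-t) * (1.0 + t)
    bound = min(np.exp(1.0 - t / mu), 1.0)
    print(t, exact, bound, exact <= bound)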

Comments and Additions

1. The upper bound, min(exp{−(t/μ − 1)}, 1) (Theorem 3.2), is the best possible global bound of the form min(b exp{−ct}, 1), even within the considerably more restrictive subclass of increasing failure rate (IFR) distributions. The exponential distribution with mean μ shows that a bound in this class must have c ≤ μ⁻¹. The point mass at {μ} has Pr(X ≥ μ) = 1, ensuring that if c = μ⁻¹, then b must be at least equal to e.

2. If, in addition (to NBUE), F is absolutely continuous with failure rate h satisfying h(t) ≤ a for all t, then L(t) ≥ t + 1/a. Applying (3.5) gives

A consequence of (3.9) is that

Thus, if a is close in ratio to μ⁻¹, then F is approximately exponential. This result complements a result of Daley [15], who obtained a bound on distance to exponentiality based on the first two moments of an NBUE distribution.

3. The bound (3.5) improves upon Markov's inequality based on knowledge of μ. This is not surprising, as Markov's inequality does not make any further assumptions about F. For F NBUE,

4. Let {Xt, t ≥ 0} be an ergodic Markov chain and let A be a subset of the state space. Denote by Tj(A) the first passage time from j ∈ Aᶜ to A. Then if

Tj(A) is easily seen to be NBUE. Note that Tj(A) does not necessarily satisfy any stronger aging property (such as NBU or IFR average (IFRA)). Thus, if ETj(A) is known, then (3.5) provides a simple upper bound for Pr(Tj(A) ≥ t).

4. ITERATING THE STATIONARY RENEWAL TRANSFORMATION

Let F be a distribution on [0,∞) with F(0) < 1 and μn = EF X^n < ∞ for all n = 1,2,…. Define G1 to be the stationary renewal distribution corresponding to F, and recursively define Gn+1 to be the stationary renewal distribution corresponding to Gn, n = 1,2,…. Next, define

Since F exponential implies G exponential, the exponential distribution is a fixed point of the transformation, and we would anticipate that, under general conditions, Zn would converge to an exponential distribution with mean 1. Harkness and Shataram [27] obtained sufficient conditions for convergence. Below, we obtain the same sufficient conditions by a different argument, which leads to an error bound as well.

Lemma 4.1 shows that a distribution of the form Gn(X) is a mixture of distributions of the form mn(t), where mn(t) is distributed as the minimum of n i.i.d. uniform(0,t) random variables. For n large, the distribution of mn(t) is approximately exponential and, thus, Gn(X) is approximately a mixture of exponentials. A mixture of exponentials, in turn, is approximately a pure exponential if the mixing distribution has a small coefficient of variation. These ideas are quantified in Corollary 4.2.

Lemma 4.1: Gn(X) ∼ Tn(X)mn, where Tn(X) and mn are independent, mn is distributed as the minimum of n i.i.d. uniform(0,1) random variables, and Tn(X) has distribution given by (2.36), with p = n.

Proof: By (2.4), the result is true for n = 1. Assume that it is true for n. By (2.4),

The pdf of G(mn) is given by

which is the pdf of mn+1. Thus, from (4.1),

and the induction argument is complete. █
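Lemma 4.1 can be checked numerically in the exponential case, where Gn(X) remains exponential with mean 1 for every n and Tn(X) is Gamma(n+1,1). The Python sketch below (n = 3 is arbitrary) compares quantiles of Tn(X)·mn with an exponential sample.

import numpy as np

# Sketch check of Lemma 4.1 for exponential F: Gamma(n+1,1) times the minimum of
# n i.i.d. uniform(0,1) variables should again be exponential with mean 1.
rng = np.random.default_rng(9)
n, m = 3, 300_000
t_n = rng.gamma(shape=n + 1.0, scale=1.0, size=m)          # T_n(X)
m_n = rng.uniform(0.0, 1.0, size=(m, n)).min(axis=1)       # minimum of n uniforms
qs = [0.25, 0.5, 0.75, 0.9]
print(np.quantile(t_n * m_n, qs))
print(np.quantile(rng.exponential(1.0, size=m), qs))       # should be close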

Corollary 4.2: Let

be exponentially distributed with mean 1. Then

Thus, if lim (μn+1)²/(μnμn+2) = 1, then Zn/EZn converges in distribution to an exponential random variable with mean 1.

Proof: By Lemma 4.1, we represent

By the triangle inequality,

As mn is IFR, it follows from Brown [10, p.13] that

It is straightforward to show that if Z is independent of X and Y, then

thus, from (4.3),

Next,

, is a mixture of exponential distributions and thus is decreasing failure rate (DFR). It follows from Brown [8, p.422] that

The result now follows from (4.3)–(4.5). █

5. POISSON ESTIMATION

5.1. Single Parameter Case

If X is Poisson distributed with parameter λ, then

T(X) ∼ X + 1.    (5.1)
This follows since

From (2.2) and (5.1),

Next,

Thus, based on a single observation, X, we can unbiasedly estimate Cov(X, K(X)). It follows that if X1,…,Xn is a random sample, with λ unknown, then

is the minimum variance unbiased estimator of Cov(X,K(X)). Of course,

is binomially distributed with parameters (k,1/n).
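The Poisson computation behind (5.1)–(5.3), read as T(X) ∼ X + 1 and hence E[XK(X)] = λE[K(X + 1)], is easy to confirm by simulation; the test function K below is an arbitrary choice.

import numpy as np

# Sketch of the Poisson size-biasing identity: for X ~ Poisson(lam),
# E[X K(X)] = lam * E[K(X + 1)] for any K with E|X K(X)| finite.
rng = np.random.default_rng(4)
lam, n = 3.0, 500_000
x = rng.poisson(lam=lam, size=n)
K = lambda k: np.sqrt(k + 1.0)                  # arbitrary test function
print((x * K(x)).mean())                        # E[X K(X)]
print(lam * K(x + 1).mean())                    # lam * E[K(X + 1)]; should agree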

5.2. Multiparameter Case

Next, suppose that X1,…,Xn are independent, with Xi Poisson distributed with parameter λi; the λi's are not assumed equal. Consider the problem of the simultaneous estimation of

, based on the data X1,…,Xn under expected total mean square error loss

We can take an arbitrary estimator of λ, and put it into the form

Then

The savings in risk in employing

over that of X is given by

The extraordinarily insightful approach initiated by Stein [40] is to unbiasedly estimate the savings in risk (5.6) and then, by inspired choice of g, demonstrate that the unbiased estimator is nonnegative with probability 1 and positive with positive probability. Its expectation will then be positive for all λ, and X + g will dominate X, proving X inadmissible. Stein's approach has worked successfully for several families of distributions [29,40,41]. Its elegant application to the Poisson case is due to Peng [32]. From our point of view, the key identities for the Poisson case are related to properties of the size-biasing transformation.

Recall (5.3). This identity is easily generalized, via (2.12), to

where δi is a vector of n components with δi(i) = 1 and δi(j) = 0 for j ≠ i.

From (5.6) and (5.7),

We now have an unbiased estimator of the savings in risk. For n ≥ 3, Peng discovered a choice of g for which the quantity in brackets is nonnegative with probability 1 and positive with positive probability. Thus, for n ≥ 3, X is inadmissible under total mean square error loss. Furthermore, for some λ, the savings in risk can be substantial. The surprising aspect, which goes back to Stein's famous 1955 paper [39], is that for each parameter λi, Peng's estimator uses {Xj, j ≠ i} in computing

, as well as Xi. As {Xj, j ≠ i} is independent of Xi and has a distribution that does not depend on λi, it is counterintuitive that it should be relevant at all. This issue has received a great deal of deep thought. See, for example, Efron and Morris [19].

6. EXTREME VALUES

Let X1,…,Xn,… be an i.i.d. sequence and define Mn = max(X1,…,Xn), n = 1,2,…. For a Borel set A with Pr(X ∈ A) > 0, define

cn = Pr(Mn ∈ A),  n = 1,2,….
We are interested in properties of the sequence {cn}, which we will refer to as a max sequence. Since

we see that {cn} is supermultiplicative. Another property, derived in Brown [11, p.855], is that

It follows from (6.3) that

and

Setting c0 = 1, it follows from (6.4) that

Thus, if ci /ci−1 converges, its limit must equal c.

Using a property of the renewal length transformation (Lemma 6.1), we prove that

It follows from (6.6) and (6.7) that

One consequence of (6.8) is that if c(A) < 1, then {cn} is ultimately decreasing. More specifically, define

Then n ≥ N implies that cn < cn−1. A sufficient condition for c < 1 is that for some t, Pr(X > t) > 0 and F(A ∩ (t,∞)) = 0. We now derive (6.7).

Lemma 6.1: If c1 = Pr(X ∈ A) > 0, then (6.7) holds.

Proof: First consider the case in which X is uniformly distributed on (0,1). Let Mn = max(X1,…,Xn). Then

Thus, T(Mn) = Mn+1, and from (2.19),

Since ET(X) ≥ EX for any random variable X, it follows from (6.10) that

Next,

The result for X uniform(0,1) thus follows from (6.11) and (6.12).

For the general case, define

Then X ∼ r(U), where U ∼ uniform(0,1), and Mn = max(X1,…,Xn) ∼ r(U(n)), where U(n) is the maximum of n i.i.d. uniform(0,1) random variables. Consequently,

and the general result follows from the uniform case. █

Remark: We note that the sequence {cn+1/cn} need not be monotone. For example, if X ∼ uniform(0,1) and A = (0.4,0.6) ∪ (0.99,1), then c3/c2 > c4/c3 < c5/c4.
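The nonmonotonicity in this example can be confirmed exactly, since cn = Pr(Mn ∈ A) = (0.6ⁿ − 0.4ⁿ) + (1 − 0.99ⁿ) for uniform(0,1) observations; a few lines of Python suffice.

# Exact max-sequence ratios for X ~ uniform(0,1), A = (0.4, 0.6) union (0.99, 1).
c = lambda n: (0.6 ** n - 0.4 ** n) + (1.0 - 0.99 ** n)
ratios = [c(n + 1) / c(n) for n in range(2, 5)]
print(ratios)   # c3/c2, c4/c3, c5/c4: approx 0.826, 0.789, 0.813 -- not monotone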

Remark: I pose the following question. What are necessary and sufficient conditions for a sequence, {cn}, to be a max sequence?

7. LOWER BOUNDS FOR IFR SURVIVAL

In this section we study the problem of lower bounding the survival function of an IFR distribution, in terms of its first two moments. The problem was suggested to me by Persi Diaconis (personal communication). It arises in the study of ergodicity of birth and death processes (see, e.g., Diaconis and Fill [17]).

Barlow and Marshall [5, p.1262] derived the best lower bound for an IFR distribution, given the first two moments. This bound is not an explicit function of the first two moments, but is the solution of equations depending on these moments, which are solved numerically. We derive an explicit, simple, but nonoptimal lower bound.

Recall that an IFR distribution on (0,∞) is absolutely continuous, except perhaps for an atom at the right-hand endpoint of its support. For simplicity, we will assume that there is no such atom. The IFR property requires that the failure rate, h(t) = f(t)/(1 − F(t)), be nondecreasing and, equivalently, that the integrated hazard, H(t), be convex.

A simple method for lower bounding the survival function is to use Jensen's inequality, H(EY) ≤ EH(Y), so that if EY is known and EH(Y) can be upper bounded (say by b), then by convexity of H (with H(0) = 0),

Pr(X > t) ≥ exp{−bt/EY},  0 ≤ t ≤ EY.
Similarly, if EY < EZ are known and H(EY) ≤ b and H(EZ) ≤ c with b < c are known, then we can upper bound H for t ∈ [EY,EZ], using convexity, and thus lower bound F(t).
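To illustrate the Jensen-plus-convexity mechanism in its simplest form, take Y = X: since H(X) is exponential with mean 1, EH(X) = 1, so H(μ) ≤ 1 and, by convexity with H(0) = 0, H(t) ≤ t/μ on [0, μ], giving the classical IFR bound Pr(X > t) ≥ exp(−t/μ) there. The sketch below checks this for a Weibull with shape 2 (an arbitrary IFR example); the bounds derived later in this section, which use G(X) and T(X) as well, refine this.

import numpy as np

# Classical IFR illustration: for a Weibull with shape 2, H(t) = t^2 and the
# survival function is exp(-t^2); the Jensen/convexity bound is exp(-t/mu) on [0, mu].
mu = np.sqrt(np.pi) / 2.0                       # mean of the shape-2 Weibull
for t in np.linspace(0.1, mu, 5):
    exact = np.exp(-t ** 2)                     # exact survival function
    bound = np.exp(-t / mu)
    print(round(float(t), 3), exact, bound, exact >= bound)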

We will use G(X),X, and T(X) as our choices of random variables. The means are of course known. The mean of H(X) equals one, and we will show in Lemma 7.1 that

The use of the three distributions, along with (7.1), (7.2), and the convexity of H, leads to

and then for F,

Barlow and Proschan [6, p.112] derived lower bounds for F(t) using the theory of convex ordering. Their bounds, based on the first two moments, give

The new bound (7.3) offers improvement over (7.5) in that it has a larger domain

and gives smaller upper bounds for H(t) in their common domain. The (7.5)-related bounds, using higher moments, still remain useful.

In the above-mentioned birth and death process application, (7.3) can be used to lower bound the separation distance between a birth and death process at time t (starting in state 0) and the steady-state distribution. This separation distance is a convolution of exponential distributions and is thus IFR.

We now derive (7.1) and (7.2). They hold for general distributions; however, (7.3) does not.

Lemma 7.1: Let F be an absolutely continuous distribution with EF X² < ∞. Let X* ∼ G(F) and T ∼ T(F). Then

Proof: Recall the record value process discussed in Section 3. Now

Furthermore,

where m(t) = E(X − t|X > t). A well-known and useful result [7] is that Em²(X) = σ². It thus follows from (7.6) and (7.7) that

Next, we observe that

Thus, (7.1) holds. █

The fact that

follows from (2.5), as H(T) − H(X*) is the total hazard accumulated between X* and the next event, T.

8. HYPOTHESIS TESTING

The size-biasing transformation provides an interesting perspective for simple versus simple hypothesis testing. Suppose that X is a random variable, vector, or function, with density f0 under H0, and density f1 under H1, both with respect to the σ-finite measure μ. Assume that f0 and f1 are mutually absolutely continuous. Define

the likelihood ratio statistic. Define Y to be a random variable distributed as the distribution of λ(X) under f0. Then T(Y) is the distribution of λ(X) under f1, and V(Y) is the distribution of f0(X)/f1(X) = λ⁻¹ under f1. Thus, λ under f0 and λ⁻¹ under f1 are inverse distributional pairs, as defined in Section 2.4. These assertions hold since EY = 1 and

Thus, by (2.2) and (8.2), T(Y) is the distribution of λ under f1.
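The claim that the law of λ(X) under f1 is the size-biased version of its law under f0 can be illustrated with a normal location shift, f0 = N(0,1) and f1 = N(1,1), for which λ(x) = exp(x − 1/2); the following sketch (an illustration, not part of the original argument) compares a weighted resample of λ under f0 with a direct sample of λ under f1.

import numpy as np

# Under f0 = N(0,1) and f1 = N(1,1), lambda(x) = exp(x - 1/2).  Size-biasing the
# law of lambda under f0 should reproduce the law of lambda under f1.
rng = np.random.default_rng(5)
n = 300_000
lam0 = np.exp(rng.normal(0.0, 1.0, size=n) - 0.5)      # lambda(X) under H0 (mean 1)
lam1 = np.exp(rng.normal(1.0, 1.0, size=n) - 0.5)      # lambda(X) under H1
sb = rng.choice(lam0, size=n, p=lam0 / lam0.sum())     # size-biased resample of lam0
qs = [0.25, 0.5, 0.75, 0.9]
print(np.quantile(sb, qs))
print(np.quantile(lam1, qs))                           # should be close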

Define F to be the survival function of Y and thus of λ(X) under f0. Then

the type 1 error of the Neyman–Pearson test that rejects H0 for λ > c and otherwise accepts H0. Furthermore, by (2.7),

the power of the above test where G is the stationary renewal distribution corresponding to F. It follows from (8.3) that

Thus, the distribution of λ under H1 is determined from its distribution under H0 by (8.4).

Define

which we call the excess at c. We see from (8.5) that the excess is an absolutely continuous convex function. This is true, despite the fact that the distribution of λ might be purely discrete.

If the distribution of λ under H0 belongs to the class of fixed points of V described in Section 2.5, then an interesting symmetry property holds. The distribution of λ⁻¹ under H1 will coincide with that of λ under H0. If statistician I rejects H0 for λ > c and accepts H0 otherwise and statistician II rejects H1 for λ⁻¹ > c and otherwise accepts H1, then the two statisticians will have (each according to his own perspective) the same type 1 and type 2 errors.

9. SUBMULTIPLICATIVE FUNCTIONS AND NBU DISTRIBUTIONS

A function g on [0,∞) is defined to be submultiplicative if

g(s + t) ≤ g(s)g(t)  for all s, t ≥ 0.
Submultiplicative functions arise in the study of Banach algebras, semigroups, and other areas of functional analysis (see, for example, several discussions in Hille and Phillips [28]). Phillips [33] derived the bounds (9.1) and (9.2). Let g be a nonnegative, integrable, submultiplicative function on [0,∞) with

Then

Suppose that we restrict g further by requiring that g(t) = F̄(t), the survival function of a distribution on [0,∞). The submultiplicative property is then equivalent to the NBU condition

F̄(s + t) ≤ F̄(s)F̄(t)

for all s ≥ 0 and t ≥ 0. Daley (personal communication) suggested exploring the connection between NBU distributions and the theory of submultiplicative functions. That suggestion is the subject of this section.

It turns out that (9.1) provides a neat upper bound,

F̄(t) ≤ (μ/t)²,    (9.3)

the square of the Markov inequality bound F̄(t) ≤ μ/t. It does not appear to be known in the NBU literature. On the other hand, (9.2) is not very good for NBU distributions, as our bound, (3.5), for the larger NBUE class improves both the exponent (from 2/μe to 1/μ) and the multiplying constant (from e² to e). Of course, Phillips was concerned with general submultiplicative functions, whereas we work with survival functions.

Inequality (9.3) does not extend to NBUE distributions. For example, let X = 1 or 3, each with probability ½. Then F is NBUE, but not NBU. Moreover, Pr(X ≥ 3) = ½ > (μ/3)² = 4/9, so (9.3) fails at t = 3.
Combining (3.5) and (9.3), it follows that for F NBU,

F̄(t) ≤ min((μ/t)², exp{−(t/μ − 1)}, 1).    (9.4)

The bound (9.3) is better than (3.5) for t < cμ, with c ≈ 3.51286; however, the situation reverses for t > cμ.
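The crossover constant can be recovered numerically: with μ = 1, the two bounds coincide where (1/t)² = exp{−(t − 1)}, that is, where t − 1 = 2 ln t on (1,∞). A few lines of Python locate the root.

import numpy as np

# Locate the crossover c where (mu/t)^2 equals exp{-(t/mu - 1)}; with mu = 1 this
# is the root of t - 1 = 2 ln t on (1, infinity).
f = lambda t: (t - 1.0) - 2.0 * np.log(t)
lo, hi = 2.0, 5.0
for _ in range(60):                      # simple bisection
    mid = 0.5 * (lo + hi)
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
print(0.5 * (lo + hi))                   # approx 3.51286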

We now study the problem of improving (9.4). In Corollary 9.2, we show that for t > μ,

where

Our approach is to use the Section 3 idea that

where X is independent of Y1 and Y2. In Section 3 we chose Y1 ∼ G(X) and Y2 ∼ S2. For our current application, we employ Y1 ∼ T(X) and Y2 ∼ S2, the second record value. Now for F absolutely continuous,

where S3 is the third record value. Note that

For Y2 = T, define T2 to be a random variable distributed as X|X > T, where X and T are independent. We find

where L(t) is defined in (2.8) and h is the failure rate function of X.

We require a lower bound for the survival function of T2, which is derived in Lemma 9.1.

Lemma 9.1: Let F be an absolutely continuous NBU distribution on (0,∞). Then

Proof: Defining

, we find that

as can be verified by differentiation. Now

where we used the superadditivity property of H (H(t) ≥ H(x) + H(t − x)), which is equivalent to the submultiplicativity of F̄. Thus, from (9.12),

Thus, (9.11)–(9.13) prove (9.10). █

Using Lemma 9.1,

, (9.8) and (9.9), we now derive (9.5). A probabilistic proof of Phillips' bound (9.1), for the case of survival functions, is also provided.

Corollary 9.2: If F is NBU with mean μ, then for t ≥ μ,

where B(t) is given by (9.6). Moreover,

where

is the unique solution in (1,∞) of the equation

Proof: Let Ut be uniformly distributed on [0,t]. By the submultiplicativity of F̄,

Now, F̄(Ut) and F̄(t − Ut) are identically distributed and negatively correlated, as F̄(Ut) is decreasing in Ut while F̄(t − Ut) is increasing in Ut. Thus,

To prove the result for F̄(t) = Pr(X ≥ t), we use the same smoothing argument that we employed in the proof of Theorem 3.2.

Now assume that F is absolutely continuous. By (9.8), (9.9),

, and Lemma 9.1,

For {t|F(t) > 0}, divide both sides of (9.20) by F(t) and rearrange terms, obtaining

For t > μ, the inequality in (9.21) is satisfied if and only if H(t) ≥ B(t), given by (9.6), as B(t) is the unique positive root of

and s(x) < 0 for 0 ≤ x < B(t), whereas s(x) ≥ 0 for x ≥ B(t). For the general case (F not necessarily absolutely continuous) we use the smoothing argument of Theorem 3.2. Thus, (9.14) has been proved. As for (9.15), we note that

from which the desired conclusion follows. The value for

in (9.16) and (9.17) was found numerically. █

Remark: If X is NBU, then each interarrival time, Sn − Sn−1, for the record value process is stochastically smaller than X and thus stochastically smaller than the corresponding interarrival time for the renewal process corresponding to X. It follows that

where EZ(t) is the expected forward recurrence time at t. It thus follows that

Comparing (9.22) to (3.5), for distributions that deteriorate with age, we would usually expect that for most t,

so that (9.22) offers potential improvement over (3.5).

10. RATIO MOMENT INEQUALITIES

Lemma 2.3 discussed a subadditivity property for Tp, p > 0, and a corresponding superadditivity property for p < 0. It follows that if g is an increasing function, then

with a reverse inequality for p < 0. In (10.1), X1,…,Xn are independent, as are Tp(X1),…,Tp(Xn).

We will only discuss the choice g(x) = x. Define

It follows from (10.1) that

For the choice p = −1, (10.3) implies

equivalently,

where

.

Defining k(a1,…,an) = (Σ ai⁻¹)⁻¹, (10.4) is equivalent to

Jensen's inequality for concave functions is suggested by (10.6), with the assumption that k is concave as a function from R₊ⁿ to R₊. To prove (10.4) inductively, we only need to consider the case n = 2. By examination of the Hessian corresponding to k(x,y), we confirm that k is concave. Therefore, (10.4) and (10.5) hold, without the independence assumption. I have not seen this inequality in the literature.
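A simulation makes the inequality concrete; here it is read as E[(Σ 1/Xi)⁻¹] ≤ (Σ 1/EXi)⁻¹, which is the Jensen consequence of the concavity of k. The gamma inputs below are arbitrary, and independence is used only for convenience of sampling; it is not needed for the inequality.

import numpy as np

# Sketch check of the harmonic-mean inequality E[(sum 1/X_i)^(-1)] <= (sum 1/EX_i)^(-1).
rng = np.random.default_rng(6)
n, m = 3, 400_000
x = rng.gamma(shape=2.0, scale=np.array([1.0, 2.0, 3.0]), size=(m, n))  # positive X_i
lhs = (1.0 / (1.0 / x).sum(axis=1)).mean()          # E[(sum 1/X_i)^(-1)]
rhs = 1.0 / (1.0 / x.mean(axis=0)).sum()            # (sum 1/EX_i)^(-1)
print(lhs, rhs, lhs <= rhs)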

The choice p = −2 leads to

Unlike (10.4), I do not know how to verify (10.7) directly.

Thus, I am suggesting that (10.1), with freedom of choice for g and p, could be a useful mechanism for generating interesting inequalities.

11. LARGE DEVIATIONS

In this section we discuss Chernoff's theorem [4, p.6; 13, p.49]. The renewal length transformation is already implicit in the existing proofs, so we merely make it a bit more explicit here. We will not endeavor to find the weakest possible conditions. First we derive a preliminary lemma.

Lemma 11.1: Let W be a positive random variable, with T = T(W) its renewal length transformation. Then the following hold:

(i) Pr(W ∈ A) ≥ (EW)Pr(T ∈ A)exp{−E(log T | T ∈ A)}

(ii) If, in addition, E log T = 0, then

Proof:

(i)

where we used convexity of −ln x as well as (2.2).

(ii) If E log T = 0, then

Next, suppose that {Xi} is i.i.d. with EX < 0, Pr(X > 0) > 0, and M(t) = Ee^{tX} < ∞ for all t. The condition EX < 0 implies M′(0) < 0; Pr(X > 0) > 0 ensures that M(t) → ∞ as t → ∞. As M is strictly convex, decreasing at t = 0 and eventually increasing, it follows that there exists a unique τ > 0 with M′(τ) = 0. At t = τ, M has a global minimum, ρ = M(τ) < M(0) = 1.

For applying Lemma 11.1, set

so that

Since W = Πe^{τXi}, it follows from Lemma 2.2 and (2.13) that T(W) ∼ Πe^{τYi}, where each Yi has pdf

From (11.2), EY = M′(τ) = 0 and Var Y = ρ⁻¹M″(τ). It follows that (recall T = T(W))

and

Next, set A = [1,∞), αn = Pr(X̄n ≥ 0) (where X̄n = n⁻¹(X1 + ··· + Xn) and Ȳn = n⁻¹(Y1 + ··· + Yn)), and βn = Pr(Ȳn ≥ 0). By Lemma 11.1(ii),

It follows from (11.3) that

By the central limit theorem, βn → ½; thus,

Finally, Markov's inequality yields

This ensures that

We thus conclude from (11.4) and (11.5) that

lim n⁻¹ log αn = log ρ,

the conclusion of Chernoff's theorem.
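A concrete check of the limit: for X = Z − 1/2 with Z standard normal, M(t) = exp(−t/2 + t²/2), τ = 1/2, and ρ = exp(−1/8), while Pr(Sn ≥ 0) can be computed exactly from the normal distribution of Sn. The short Python computation below shows n⁻¹ log αn approaching log ρ = −1/8.

import math

# Exact evaluation of (1/n) log Pr(S_n >= 0) for S_n ~ N(-n/2, n), compared with
# log rho = -1/8 from Chernoff's theorem.
log_rho = -1.0 / 8.0
for n in [10, 100, 1000, 5000]:
    p = 0.5 * math.erfc((0.5 * n) / math.sqrt(2.0 * n))   # Pr(N(-n/2, n) >= 0)
    print(n, math.log(p) / n, log_rho)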

12. RENEWAL REWARD PROCESSES

Consider a renewal reward process [37, pp. 132–133]. This is generated by an i.i.d. sequence {Xi, i ≥ 1}, with X ∼ F and F(0) = 0. Corresponding to each Xi there is a reward Ri; Ri is allowed to depend on Xi and other random quantities, but the pairs {(Xi,Ri), i ≥ 1} are assumed i.i.d. Define

, and

If EX and ER are both finite, then

almost surely and in expectation [37, p.133]. The quantity ER/EX is referred to as the long-run average reward rate.

Next, consider the stationary version of the renewal reward process. It can be constructed by starting a renewal interval at −Y, with Y ∼ G, the stationary renewal distribution corresponding to F; if Y = y, then the length of the interval covering {0} is constructed to be distributed as X|X > y. By (2.5), the resulting unconditional distribution of the length of the interval covering {0} is that of T(F). Finally, given T = t, the reward

for this interval covering {0} is constructed to have the conditional distribution of R given X = t, so that

Now, by (2.2) and (12.1),

Thus, the observed reward rate over the interval covering a fixed point in the stationary process is an unbiased estimator of the long-term average reward rate.
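The unbiasedness of the covering-interval reward rate can be illustrated by simulation. The sketch below uses exponential interarrival times with reward Ri = Xi², so ER/EX = 2, and approximates the stationary covering interval by looking far into an ordinary renewal process; the horizon and replication counts are arbitrary.

import numpy as np

# Simulation sketch: the reward rate over the interval covering a distant time
# point estimates ER/EX (= 2 here) without bias, once the process is near stationarity.
rng = np.random.default_rng(7)
reps, horizon = 20_000, 200.0
rates = []
for _ in range(reps):
    x = rng.exponential(1.0, size=400)       # enough intervals to pass the horizon
    s = np.cumsum(x)
    i = np.searchsorted(s, horizon)          # index of the interval covering the horizon
    t_cov, r_cov = x[i], x[i] ** 2           # its length and its reward
    rates.append(r_cov / t_cov)
print(np.mean(rates))                         # should be close to ER/EX = 2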

Next, suppose that the experimenter observes the interval covering {0}, as well as the next n − 1 ordinary intervals. For this person, the observed reward rate is given by

In Lemma 12.1 we show that the observed reward rate, (12.3), is once again an unbiased estimator of the long-term average reward rate (ER/EX).

Lemma 12.1: The expected value of the random quantity in (12.3) equals ER/EX.

Proof: Let Y = (T,X2,…,Xn), where T ∼ T(X), Xi ∼ F, i = 2,…,n, and T and X2,…,Xn are independent. The pdf of Y is given by

It follows from (12.4) that

where X = (X1,…,Xn) are i.i.d. random variables distributed as F and g satisfies E|X1 g(X)| < ∞.

Next, suppose that g is symmetric. Then

From (12.5) and (12.6) we have

for all symmetric g with E|X1 g(X)| < ∞.

Next, choose

It follows from (12.7) that

Remark: It might have occurred to the reader to prove Lemma 12.1 by choosing the interarrival time distribution to be distributed as

. Then

(2.17), and

, so that ER(Sn)/ESn = ER/EX, and the result would then follow from (12.2). The problem with this argument is that one would need to prove that (12.1) implies

which is not obvious to me.

13. LAPLACE TRANSFORMS

For X a random variable on [0,∞) with Laplace transform Ł, we can interpret Ł to be the survival function of ε/X, where X and ε are independent and ε is exponential with mean 1:

ŁX(s) = Ee^{−sX} = Pr(ε > sX) = Pr(ε/X > s).    (13.1)

Similarly, the Laplace transform of T(X) is the survival function of ε/T(X), where T(X) ∼ T(F) and is independent of ε:

ŁT(s) = Pr(ε/T(X) > s).    (13.2)
We see from (13.1) and (13.2) that as a survival function, ŁX is the stationary renewal distribution corresponding to ŁT. This can also be seen from (2.14) and (2.25) as

It is shown in Brown [8, p.422] that if

is the stationary renewal distribution corresponding to F, then

Applying (13.3) and (13.4) to Y = V and recalling (2.23) and (2.30), we obtain

where

Thus, if EX⁻¹ ≈ (EX)⁻¹, then the Laplace transform of X is close to that of a one-point distribution at {μ} and is also close to the Laplace transform of T(X).
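The survival-function reading of the Laplace transform at the start of this section is easy to verify by simulation: ŁX(s) = Ee^{−sX} = Pr(ε/X > s) with ε an independent unit exponential. The sketch below checks this for a Gamma(2,1) random variable, for which ŁX(s) = (1 + s)⁻².

import numpy as np

# Check that the Laplace transform of X is the survival function of eps / X,
# with eps ~ Exp(1) independent of X.  For X ~ Gamma(2,1), L_X(s) = (1 + s)^(-2).
rng = np.random.default_rng(8)
n = 400_000
x = rng.gamma(shape=2.0, scale=1.0, size=n)
eps = rng.exponential(1.0, size=n)
for s in [0.5, 1.0, 2.0]:
    print(s, (eps / x > s).mean(), (1.0 + s) ** -2.0)   # empirical vs exact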

Next, consider a completely monotone function g on (0,∞), with g(0) < ∞. Employing Bernstein's theorem as in Brown [9, p.1241], it follows from (13.5) and (13.6) that

and

Thus, we can interpret c2(X) as a scale-invariant measure of variability. When c2(X) is small, X is close to a point distribution at μ. This proximity is reflected in the behavior of expectations of completely monotone functions of X.

Note that if g is completely monotone, then g is maximized at zero and decreases rapidly as t increases. The mean of g(X) lies in (0,g(0)]. The quantity c2(X) ∈ [0,1) provides an upper bound on the fraction of g(0) that Eg(X) can realize. It is appropriate to base a bound on a parameter that measures the relative concentration of X near zero (where g is maximized), rather than on tail behavior, which is largely irrelevant. Thus, a harmonic mean-based bound, rather than a variance-based bound, makes good sense here. Similarly,

thus, the Laplace transform of V(X), considered as a survival function, is the stationary renewal distribution corresponding to the Laplace transform of X⁻¹. The analogs of (13.8) and (13.9) are

where

Thus, the dual measures of variability discussed in Section 2.4 enter into the bounds for expectations of completely monotone functions of X and V(X).

Acknowledgments

I am indebted to Daryl Daley for suggesting the Section 9 problem and to Persi Diaconis for suggesting the problem discussed in Section 7. I thank David Aldous, Albert Marshall, and two referees for bringing relevant references to my attention. I am grateful to the National Security Agency for their support of this research.

REFERENCES

Abate, J. & Whitt, W. (1996). An operational calculus for probability distributions via Laplace transforms. Advances in Applied Probability 28: 75–113.
Aldous, D. & Pitman, J. (1998). Tree-valued Markov chains derived from Galton–Watson processes. Annales de l'Institut Henri Poincaré: Probabilités et Statistiques 34: 637–686.
Athreya, K.B. (2000). Change of measures for Markov chains and the L log L theorem for branching processes. Bernoulli 6: 323–338.
Bahadur, R.R. (1971). Some limit theorems in statistics. Philadelphia: SIAM.
Barlow, R.E. & Marshall, A.W. (1964). Bounds for distributions with monotone hazard rate, II. Annals of Mathematical Statistics 35: 1258–1274.
Barlow, R.E. & Proschan, F. (1975). Statistical theory of reliability and life testing: Probability models. New York: Holt, Rinehart and Winston.
Bremaud, P. (1981). Point processes and queues: Martingale dynamics. New York: Springer-Verlag.
Brown, M. (1983). Approximating IMRL distributions by exponential distributions with applications to first passage times. Annals of Probability 11: 419–427.
Brown, M. (1985). A measure of variability based on the harmonic mean and its use in approximations. Annals of Statistics 13: 1239–1243.
Brown, M. (1987). Inequalities for distributions with increasing failure rate. In A.E. Gelfand (ed.), Contributions to the theory and applications of statistics: A volume in honor of Herbert Solomon. New York: Academic Press, pp. 3–17.
Brown, M. (1999). Interlacing eigenvalues in time reversible Markov chains. Mathematics of Operations Research 24: 847–864.
Brown, M. (2001). Exploiting the waiting time paradox: Applications of the renewal length transformation. City College, CUNY, Technical Report, New York.
Chernoff, H. (1972). Sequential analysis and optimal design. Philadelphia: SIAM.
Cox, D.R. & Lewis, P.A.W. (1966). The statistical analysis of series of events. New York: Wiley.
Daley, D.J. (1988). Tight bounds on the exponential approximation of some aging distributions. Annals of Probability 16: 414–423.
Dembo, A. & Rinott, Y. (1996). Some examples of normal approximations by Stein's method. In Random Discrete Structures, IMA Vol. Math. 76. New York: Springer-Verlag.
Diaconis, P. & Fill, J.A. (1990). Strong stationary times via a new form of duality. Annals of Probability 18: 1483–1522.
Donnelly, P. (1991). The heaps process, libraries and size-biased permutations. Journal of Applied Probability 28: 321–335.
Efron, B. & Morris, C. (1973). Combining possibly related estimation problems (with discussion). Journal of the Royal Statistical Society B 35: 379–421.
Ethier, S.N. (1990). The distribution of the frequencies of age-ordered alleles in a diffusion model. Advances in Applied Probability 22: 519–532.
Ethier, S.N. (1992). Equivalence of two descriptions of the age of alleles. Journal of Applied Probability 29: 185–189.
Feller, W. (1971). An introduction to probability theory and its applications, Vol. II, 2nd ed. New York: Wiley.
Geiger, J. (1996). Size-biased and conditioned random splitting trees. Stochastic Processes and their Applications 2: 187–207.
Gnedin, A.V. (1998). On convergence and extensions of size-biased permutations. Journal of Applied Probability 35: 642–650.
Goldstein, L. & Reinert, G. (1997). Stein's method and the zero bias transformation with application to simple random sampling. Annals of Applied Probability 4: 935–952.
Goldstein, L. & Rinott, Y. (1996). Multivariate normal approximations by Stein's method and size bias couplings. Journal of Applied Probability 33: 1–17.
Harkness, W.L. & Shataram, R. (1969). Convergence of a sequence of transformations of distribution functions. Pacific Journal of Mathematics 31: 403–415.
Hille, E. & Phillips, R.S. (1957). Functional analysis and semi-groups. American Mathematical Society Colloquium Publications, Vol. XXXI. Providence, RI: American Mathematical Society.
Hudson, H.M. (1978). A natural identity for exponential families with applications in multiparameter estimation. Annals of Statistics 6: 473–484.
Lyons, R., Pemantle, R., & Peres, Y. (1995). Conceptual proofs of L log L criteria for mean behavior of branching processes. Annals of Probability 23: 1125–1138.
Olofsson, P. (1998). The L log L condition for general branching processes. Journal of Applied Probability 3: 537–544.
Peng, J.C. (1975). Simultaneous estimation of the parameters of independent Poisson distributions. Stanford University Technical Report 78, Stanford, CA.
Phillips, R.S. (1954). An inversion formula for Laplace transforms and semi-groups of linear operators. Annals of Mathematics 59: 325–356.
Pitman, J. (1996). Random discrete distributions invariant under size-biased permutations. Advances in Applied Probability 28: 525–539.
Reinert, G. (1998). Stein's method and application to empirical measure. Sociedad Matematica Mexicana 14: 65–120.
Rinott, Y. & Rotar, V. (2000). Normal approximations by Stein's method. Decisions in Economics and Finance 23: 15–29.
Ross, S.M. (1996). Stochastic processes, 2nd ed. New York: Wiley.
Shorrock, R.W. (1972). A limit theorem for inter-record times. Journal of Applied Probability 9: 219–223.
Stein, C. (1955). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, pp. 197–209.
Stein, C. (1973). Estimation of the mean of a multivariate normal distribution. Stanford University Technical Report 48, Stanford, CA.
Stein, C. (1974). Estimation of the mean of a multivariate normal distribution: I. Estimation of the means. Stanford University Technical Report 63, Stanford, CA.
Thorisson, H. (2000). Coupling, stationarity and regeneration. New York: Springer-Verlag.