We consider the transformation T that takes a distribution F into the distribution of the length of the interval covering a fixed point in the stationary renewal process corresponding to F. This transformation has been referred to as size-biasing, length-biasing, the renewal length transformation, and the stationary lifetime operator. We review and develop properties of this transformation and apply it to diverse areas.
The famous "waiting time paradox" [22, p.12] arises because the length of the interval covering a fixed point in a stationary renewal process tends to be longer than a typical interarrival time. If the interarrival time distribution has the probability density function (pdf) f, then the length of the interval covering a fixed point has pdf xf(x)/EX. This renewal length distribution dominates F in the monotone likelihood ratio order and is thus stochastically larger.
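As an illustrative aside (not part of the original development), the paradox can be checked by simulation. The following sketch assumes exponential(1) interarrival times and a fixed inspection point well inside the process; the interval covering the point then has mean length near 2 = EX²/EX, twice the mean interarrival time.

# Monte Carlo illustration of the waiting time (inspection) paradox.
# Assumptions: exponential(1) interarrival times; inspection point t0 deep
# inside the process, so edge effects are negligible.
import numpy as np

rng = np.random.default_rng(0)
t0, n_arrivals, n_reps = 500.0, 2000, 2000
covering_lengths = []
for _ in range(n_reps):
    gaps = rng.exponential(1.0, size=n_arrivals)
    times = np.cumsum(gaps)
    i = np.searchsorted(times, t0)      # index of first renewal epoch at or after t0
    covering_lengths.append(gaps[i])    # length of the interval covering t0
print("mean interarrival time :", 1.0)
print("mean covering interval :", np.mean(covering_lengths))  # approximately 2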
In this article we consider the transformation, T, which takes a distribution on [0,∞), with finite positive mean, into its renewal length distribution,
We will refer to T as the size-biasing transformation. Other terminology that has been used is length biasing [14, p.61; 42, p.70], stationary lifetime operator [1], and renewal length transformation [12]. Size biasing appears to be the most widely used term for this transformation.
In Section 2, a few simple properties of T are reviewed and developed, and in the remainder of the article, they are applied to a number of application areas. These areas were chosen because size-biasing methodology does not appear to have previously been explicitly employed. In some cases, new results are derived (Sections 3, 4, 7, 9, 12, and 13), whereas in others the transformation merely provides perspective for more profound ideas (Sections 5, 8, and 11).
I will not give examples in areas in which size biasing has previously been applied. These include branching and population processes [2,3,20,21,23,30,31], Stein's error bound methodology [16,25,35,36], and size-biased permutations [18,24,34].
If F is the cumulative distribution function (cdf) of a distribution on [0,∞), with F(0) < 1, finite mean μ, and density f with respect to a σ-finite measure λ, then T(F) is defined to be the distribution with pdf
Note that the Laplace transform of T(F) [1] is given by
and that [42, p.264]
for all R with EF|XR(X)| < ∞. The distribution T(F) is the unique distribution satisfying (2.2), as can be seen by setting R equal to indicator functions.
Recall that the stationary renewal distribution corresponding to F is the distribution with pdf
We will use the notation, G(F), or sometimes G(X), for the stationary renewal distribution corresponding to X ∼ F. Lemma 2.1 derives two useful relationships between T(F) and G(F). Relationship (2.4) appears in [14, p.62].
Lemma 2.1: Let X ∼ F be a random variable on [0,∞) with F(0) < 1 and finite mean μ. Let T(X) denote a random variable with distribution T(F), and let G(X) be a random variable with distribution G(F). Then
where U is uniformly distributed on (0,1) and independent of T(X). Furthermore,
where G(X) is independent of X.
Proof: Let (W1,W2) be defined to have pdf (relative to ℓ × λ, where ℓ is Lebesgue measure)
We find that
and thus that W1 ∼ G(F). Also,
and thus W2 ∼ T(F). Next,
thus the conditional distribution of W1|W2, given W2 = w2, is uniform(0,1) for all w2. Thus, we have constructed W1 ∼ G(F) and W2 ∼ T(F) with W1 /W2 ∼ U, independently of W2. It follows that the marginal distribution of G(X) is that of the product of two independent variables: one distributed as T(X), and the other, uniform(0,1). Thus, (2.4) holds.
Next,
Thus, the conditional distribution of W2 given W1 = w1 coincides with that of X given X > w1. Therefore, (2.5) holds. █
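A quick numerical illustration of (2.4) (a sketch, not part of the proof): for X ~ exponential(1) we have T(X) ~ Gamma(2,1) and G(X) ~ exponential(1), so the product U·T(X), with U ~ uniform(0,1) independent of T(X), should reproduce exponential(1) quantiles.

# Numeric check of Lemma 2.1, relation (2.4), for X ~ exponential(1):
# G(X) =d U * T(X), with U ~ uniform(0,1) independent of T(X) ~ Gamma(2,1).
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
u = rng.uniform(size=n)
t = rng.gamma(2.0, 1.0, size=n)
g = u * t
# Compare a few quantiles of U*T(X) with exponential(1) quantiles.
for q in (0.25, 0.5, 0.9):
    print(q, np.quantile(g, q), -np.log(1 - q))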
By integrating the joint pdf (2.6) over {(w1,w2),w2 > min(w1,t)}, we find that
where G is the survival function of the stationary renewal distribution. Defining
and noting that
it follows that
We now discuss methods for finding T(F). Lemma 2.2 is found in [42, p.262].
Lemma 2.2: Let
have pdf f(x), with respect to a σ-finite measure w on Rn, and let g : R+n → R+ satisfy
Define Y to be a random vector with pdf, g(x) f (x)/μg. Then
Proof:
for all K with E|K(X)g(X)| < ∞. In particular,
so by (2.2),
.
The fact that (2.12) is more general than required for the conclusion will be helpful in later sections. █
We now give some examples of computation of T(F).
Example 1: Products. If X1,…,Xn are independent with
. Then by Lemma 2.2,
where T(X1),…,T(Xn) are independent.
It follows from (2.4) and (2.13) that if X and Y are independent, then
Example 2: [26, p.9]:
Sums: If
, it follows from Lemma 2.2 that
where
We can construct
in the following manner. First, let I be a random variable with Pr(I = j) = μj /∑ μi, j = 1,…,n. Given I = j, define Yj to be distributed as T(Xj). Finally, given I = j and Yj = yj, construct the conditional distribution of Y−j = {Yk, k ≠ j} to be that of X−j given Xj = yj. Under this construction, Y has pdf (2.15) and ∑ Yi is distributed as
where I is as described earlier, given I = j, T(XI) is distributed as T(Xj), and given I = j and T(Xj) = yj, S−I is distributed as ∑k≠j Xk given Xj = yj.
In the case of independence, it follows that given I = j, S−I is distributed as the unconditional distribution of S−j, independently of T(Xj).
In the independent and identically distributed (i.i.d.) case
where T(X) and Sn−1 are independent.
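As a hedged numerical check of the i.i.d.-sum relation T(Sn) =d T(X) + Sn−1 (a sketch only, for one specific case): take X ~ exponential(1) and n = 5, so S5 ~ Gamma(5,1), T(S5) ~ Gamma(6,1), and T(X) + S4 ~ Gamma(2,1) + Gamma(4,1), which is again Gamma(6,1).

# Numeric check of the i.i.d.-sum relation T(S_n) =d T(X) + S_{n-1}
# for X ~ exponential(1) and n = 5.
import numpy as np

rng = np.random.default_rng(3)
n = 10**6
lhs = rng.gamma(6.0, 1.0, size=n)                                # T(S_5) ~ Gamma(6,1)
rhs = rng.gamma(2.0, 1.0, size=n) + rng.gamma(4.0, 1.0, size=n)  # T(X) + S_4
print(np.mean(lhs), np.mean(rhs))   # both near 6
print(np.var(lhs), np.var(rhs))     # both near 6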
Mixtures: If F is a mixture,
then
where μ(α) is the mean corresponding to Fα and μ = ∫μ(α) dP(α) is assumed finite.
Random Sums: If
where {Xi} is i.i.d., N is independent of {Xi}, and EY = ENEX < ∞, then
where ST(N)−1 and T(X) are independent. The result follows from (2.17) and the discussion of mixtures.
Restrictions: If X ∼ F, let XA denote a random variable distributed as X, given X ∈ A, where F(A) > 0. Then
The proof is straightforward.
Let X ∼ F and assume that F(0) = 0. Let T(X) ∼ T(F). Define
the total variation distance between F and T(F). There are several equivalent expressions for D. First,
The scaled L1 norm, E|X − μ|/μ, is a measure of variability, which when small suggests T(F) ≈ F. Also note that
by (2.4) and (2.7). Finally,
Thus, D equals the probability that X is smaller than an independent uniform(0,μ) random variable.
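These equivalent expressions for D can be checked in a concrete case (an illustrative sketch, assuming X ~ exponential(1), so that T(F) is Gamma(2,1) and μ = 1); both the total variation distance and Pr(X < U) then equal 1/e.

# Numeric check, for X ~ exponential(1), that the total variation distance D between
# F and T(F) equals Pr(X < U) with U ~ uniform(0, mu); here both are 1/e.
import numpy as np
from scipy.integrate import quad

f  = lambda x: np.exp(-x)          # pdf of F
tf = lambda x: x * np.exp(-x)      # pdf of T(F), i.e., x f(x)/mu with mu = 1
tv = 0.5 * quad(lambda x: abs(f(x) - tf(x)), 0, np.inf)[0]

rng = np.random.default_rng(4)
x, u = rng.exponential(1.0, 10**6), rng.uniform(0.0, 1.0, 10**6)
print(tv, np.mean(x < u), 1 / np.e)   # all approximately 0.3679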
Once again, let F be a distribution on (0,∞) with F(0) = 0 and finite mean. Define V(F) to be the distribution of 1/T(X), where T(X) ∼ T(F). Note that
Also,
It follows from (2.2) and (2.24) that
and
Thus, V2 = I (the identity operator) and V = V−1, as operators on probability distributions on (0,∞), with finite means. Thus, we consider (F,V(F)) to be an inverse pair of distributions, conceding that the concept requires greater development and insight.
Note that under total variation distance D,
thus, by (2.20),
Thus, X and V have the same variability as measured by scaled L1 norm. Similarly, by (2.21),
where GV is the stationary renewal distribution corresponding to V. From (2.22),
where U is uniform(0,1) and independent of V and X.
When we examine scaled L2 variability, there is a duality present in which scaled L2 variability for F is equivalent to a scale-invariant harmonic mean measure of variability for V(F). As V(V(F)) = F, this relationship is reversible. Specifically,
and
as can be easily verified.
From the definition of T, it follows that T(F) = F if and only if F is concentrated at a single point, μ > 0. The question as to when V(F) = F is far more interesting. In view of (2.23), μ = 1 is a necessary condition (for V(F) = F). However, a point mass at 1 is only one of many solutions. For example, choose A > 1, r ≤ (A + 1)−1, and set Pr(X = A−1) = rA, Pr(X = 1) = 1 − r(A + 1), and Pr(X = A) = r. Then we can easily verify that V(X) = X.
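The three-point fixed point can be verified directly; the following sketch uses the arbitrary admissible choices A = 2 and r = 0.25 and computes the law of V(X) = 1/T(X) exactly.

# Numeric check that the three-point distribution in the text is a fixed point of V:
# Pr(X = 1/A) = rA, Pr(X = 1) = 1 - r(A+1), Pr(X = A) = r, with A > 1, r <= 1/(A+1).
# (Small verification sketch; A = 2, r = 0.25 are arbitrary admissible choices.)
A, r = 2.0, 0.25
support = [1.0 / A, 1.0, A]
probs   = [r * A, 1.0 - r * (A + 1.0), r]
mu = sum(x * p for x, p in zip(support, probs))          # equals 1
tprobs = [x * p / mu for x, p in zip(support, probs)]    # pmf of T(X)
# V(X) is the law of 1/T(X): invert the support and carry the probabilities along.
v_support = [1.0 / x for x in reversed(support)]
v_probs   = list(reversed(tprobs))
print(mu)                               # 1.0
print(list(zip(support, probs)))        # law of X
print(list(zip(v_support, v_probs)))    # law of V(X); identical to the law of X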
A method for generating a large class of fixed points for V is to choose X ∼ F,V(X) ∼ V(F), with X and V(X) independent, and define W to be a random variable distributed as
Then
Thus,
Choose X ∼ F, where F is absolutely continuous with pdf f and finite mean μ. From (2.32), we compute
Thus, (2.35) gives a large class of distributions satisfying V(W) = W.
Following are a few choices of f with corresponding choices of fW, computed from (2.35):
Call
the class of distributions satisfying V(F) = F. One property of
is that if
, with X1 and X2 independent, then the distribution of X1 X2 also belongs to
. This holds since
Another property of
is closure under mixtures. If
for all α and X ∼ ∫Fα dPα, then it follows from EXα ≡ 1 and (2.18) that
I pose the problem of finding a general form for all distributions in
.
For F a distribution on [0,∞) with F(0) < 1,μ(p) = EXp < ∞, and density f, define Tp(F) to be the distribution with density
We allow −∞ < p < ∞ in (2.36).
Lemma 2.3 derives a subadditivity property for Tp,p > 0, and a superadditivity property for Tp,p < 0.
Lemma 2.3: Let X1,…,Xn be independent, with EXip < ∞, i = 1,…,n. Let Tp(X1),…,Tp(Xn) denote independent random variables, with Tp(Xi) distributed according to the Tp-transform of the distribution of Xi. Then
Proof: We will prove (i) (p > 0); the other case follows by obvious modification. We only need to consider n = 2; the case for general n then follows by induction.
Arguing similarly as in Lemma 2.2, we can show that
where
The symbol ∝ denotes proportionality. Now
so that
a quantity decreasing in y1 for p > 0. Thus, (2.38) implies that Y1 is smaller than Tp(X1) in the sense of monotone likelihood ratio ordering and, thus, under stochastic ordering. Therefore, we have
Next,
so that
which is decreasing in y2. Thus,
The result now follows from (2.37)–(2.40). █
In this section we derive a property of the renewal length transformation and use it to derive new inequalities for NBUE (new better than used in expectation) distributions.
If F is an absolutely continuous distribution on (0,∞), then the record value process corresponding to F is a nonhomogeneous Poisson process, with intensity function equal to the failure rate, h, of F. For {Xi} i.i.d. with distribution F, the first record value, S1, equals X1; the second record value, S2, is defined by
where J2 = min{i : Xi > X1}. More generally,
where Jm = min{i : Xi > XJm−1}.
For the second record value, S2,
where N(t) = #{Si ∈ [0,t]}, H(t) = −ln(F(t)) = EN(t). See [38] for a discussion of record values.
Define
Note that K is increasing and
Let X* ∼ G(X). Then
where T(X) is the renewal length transformation. Of course, (3.3) is a direct consequence of (2.5).
Recall [6, p.159] that F is defined to be NBUE if EX < ∞, and
for all t such that F(t) > 0. Thus, F is NBUE if and only if
From (3.2)–(3.4) and the fact that K is increasing, Lemma 3.1 follows.
Lemma 3.1: If F is an absolutely continuous NBUE distribution, then
.
Employing Lemma 3.1, we obtain an inequality for NBUE distributions. Some consequences of this inequality are discussed in Section 3.2.
Theorem 3.2: If X ∼ F is NBUE, then
Proof: First consider the case of absolutely continuous F. From Lemma 3.1 and equations (2.7) and (3.1),
For {t : F(t) > 0}, divide both sides by F(t) to obtain (recalling (2.9))
Thus,
This proves (3.5) for F absolutely continuous. For the general case, we use a smoothing argument. If X ∼ F is NBUE but not absolutely continuous, let U(ε) be uniformly distributed on (0,ε) and independent of X and define
Since the NBUE class is closed under convolutions [6, p.187], X(ε) has an absolutely continuous NBUE distribution.
Applying (3.7) to X(ε), letting ε → 0, and noting that Pr(X(ε) > t) → Pr(X ≥ t),μ(ε) → μ, and E(X(ε)|X(ε) > t) → E(X|X ≥ t), (3.5) follows. █
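As an illustrative check of the bound (not part of the proof), and reading it, per the discussion below, as Pr(X ≥ t) ≤ min(e·e−t/μ, 1): the following sketch evaluates both sides for the NBUE (indeed IFR) distribution X ~ uniform(0,2), which has μ = 1.

# Numeric illustration of the NBUE bound of Theorem 3.2, read here as
# Pr(X >= t) <= min(e * exp(-t/mu), 1), for X ~ uniform(0, 2), mu = 1.
# (Sketch; the constants e and 1/mu are taken from the discussion in Section 3.2.)
import numpy as np

mu = 1.0
t = np.linspace(0.0, 2.0, 9)
survival = np.clip(1.0 - t / 2.0, 0.0, 1.0)        # Pr(X >= t) for uniform(0,2)
bound = np.minimum(np.e * np.exp(-t / mu), 1.0)
for ti, s, b in zip(t, survival, bound):
    print(f"t={ti:4.2f}  Pr(X>=t)={s:5.3f}  bound={b:5.3f}")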
1. The upper bound, min(e−((t/μ)−1),1) (Theorem 3.2), is the best possible global bound of the form min(be−ct,1), even within the considerably more restrictive subclass of increasing failure rate (IFR) distributions. The exponential distribution with mean μ shows that a bound in this class must have c ≤ μ−1. The point mass at {μ} has Pr(X ≥ μ) = 1, ensuring that if c = μ−1, then b must be at least equal to e.
2. If, in addition (to NBUE), F is absolutely continuous with failure rate h satisfying
then L(t) ≥ t + a−1. Applying (3.5) gives
A consequence of (3.9) is that
Thus, if a is close in ratio to μ−1, then F is approximately exponential. This result complements a result of Daley [15], who obtained a bound on distance to exponentiality based on the first two moments of an NBUE distribution.
3. The bound (3.5) improves upon Markov's inequality based on knowledge of μ. This is not surprising, as Markov's inequality does not make any further assumptions about F. For F NBUE,
4. Let {Xt,t ≥ 0} be an ergodic Markov chain and let A be a subset of the state space. Denote by Tj(A) the first passage time from j ∈ Ac to A. Then if
Tj(A) is easily seen to be NBUE. Note that Tj(A) does not necessarily satisfy any stronger aging property (such as NBU or IFR average (IFRA)). Thus, if ETj(A) is known, then (3.5) provides a simple upper bound for Pr(Tj(A) ≥ t).
Let F be a distribution on [0,∞) with F(0) < 1 and μn = EF Xn < ∞, for all n = 1,2,…. Define G1 to be the stationary renewal distribution corresponding to F, and recursively define Gn+1 to be the stationary renewal distribution corresponding to Gn,n = 1,2,…. Next, define
Since F exponential implies G exponential, the exponential distribution is a fixed point of the transformation, and we would anticipate that, under general conditions, Zn would converge to an exponential distribution with mean 1. Harkness and Shantaram [27] obtained sufficient conditions for convergence. Below, we obtain the same sufficient conditions by a different argument, which leads to an error bound as well.
Lemma 4.1 shows that a distribution of the form Gn(X) is a mixture of distributions of the form mn(t), where mn(t) is distributed as the minimum of n i.i.d. uniform(0,t) random variables. For n large, the distribution of mn(t) is approximately exponential and, thus, Gn(X) is approximately a mixture of exponentials. A mixture of exponentials, in turn, is approximately a pure exponential if the mixing distribution has a small coefficient of variation. These ideas are quantified in Corollary 4.2.
Lemma 4.1:
, where Tn(X) and mn are independent, mn is distributed as the minimum of n i.i.d. uniform(0,1) random variables, and Tn(X) has distribution given by (2.36), with p = n.
Proof: By (2.4), the result is true for n = 1. Assume that it is true for n. By (2.4),
The pdf of G(mn) is given by
which is the pdf of mn+1. Thus, from (4.1),
and the induction argument is complete. █
Corollary 4.2: Let
be exponentially distributed with mean 1. Then
Thus, if lim(μn+1²/μnμn+2) = 1, then Zn /EZn converges in distribution to
.
Proof: By Lemma 4.1, we represent
By the triangle inequality,
As mn is IFR, it follows from Brown [10, p.13] that
It is straightforward to show that if Z is independent of X and Y, then
thus, from (4.3),
Next,
, is a mixture of exponential distributions and thus is decreasing failure rate (DFR). It follows from Brown [8, p.422] that
The result now follows from (4.3)–(4.5). █
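The representation in Lemma 4.1 lends itself to a quick simulation (a sketch only, for one specific case): for X ~ uniform(0,1), Tn(X) ~ Beta(n+1,1) (pdf proportional to x^n on (0,1)) and mn ~ Beta(1,n), and for large n the normalized variable Zn should be close to exponential(1).

# Monte Carlo illustration of Lemma 4.1 / Corollary 4.2 for X ~ uniform(0,1):
# G_n(X) =d T_n(X) * m_n with T_n(X) ~ Beta(n+1, 1) and m_n ~ Beta(1, n);
# Z_n = G_n / E[G_n] should be approximately exponential(1) for large n.
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 10**6
tn = rng.beta(n + 1, 1, size=reps)      # T_n(X) for uniform(0,1) inputs
mn = rng.beta(1, n, size=reps)          # minimum of n i.i.d. uniform(0,1) variables
gn = tn * mn
zn = gn / np.mean(gn)
for q in (0.25, 0.5, 0.9):
    print(q, np.quantile(zn, q), -np.log(1 - q))   # compare with exponential(1) quantiles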
If X is Poisson distributed with parameter λ, then
This follows since
From (2.2) and (5.1),
Next,
Thus, based on a single observation, X, we can unbiasedly estimate Cov(X, K(X)). It follows that if X1,…,Xn is a random sample, with λ unknown, then
is the minimum variance unbiased estimator of Cov(X,K(X)). Of course,
is binomially distributed with parameters (k,1/n).
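Before turning to the simultaneous estimation problem, the basic Poisson identity obtained from (2.2) and (5.1), read here as E[X K(X)] = λ E[K(X + 1)], can be checked numerically (an illustrative sketch; the choices K(x) = x² and λ = 3 are arbitrary).

# Numeric check of the Poisson size-biasing identity E[X K(X)] = lambda * E[K(X+1)],
# for X ~ Poisson(lambda) with lambda = 3 and K(x) = x^2.
import numpy as np

rng = np.random.default_rng(5)
lam, n = 3.0, 10**6
x = rng.poisson(lam, size=n)
K = lambda z: z**2
print(np.mean(x * K(x)))          # E[X K(X)]
print(lam * np.mean(K(x + 1)))    # lambda * E[K(X+1)]; agrees up to Monte Carlo error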
Next, suppose that X1,…,Xn are independent, with Xi Poisson distributed with parameter λi; the λi's are not assumed equal. Consider the problem of the simultaneous estimation of
, based on the data X1,…,Xn under expected total mean square error loss
We can take an arbitrary estimator of λ, and put it into the form
Then
The savings in risk in employing
over that of X is given by
The extraordinarily insightful approach initiated by Stein [40] is to unbiasedly estimate the savings in risk (5.6) and then, by inspired choice of g, demonstrate that the unbiased estimator is nonnegative with probability 1 and positive with positive probability. Its expectation will then be positive for all λ, and X + g will dominate X, proving X inadmissible. Stein's approach has worked successfully for several families of distributions [29,40,41]. Its elegant application to the Poisson case is due to Peng [32]. From our point of view, the key identities for the Poisson case are related to properties of the size-biasing transformation.
Recall (5.3). This identity is easily generalized, via (2.12), to
where δi is a vector of n components with δi(i) = 1,δi(j) = 0,j ≠ i.
From (5.6) and (5.7),
We now have an unbiased estimator of the savings in risk. For n ≥ 3, Peng discovered a choice of g for which the quantity in brackets is nonnegative with probability 1 and positive with positive probability. Thus, for n ≥ 3, X is inadmissible under total mean square error loss. Furthermore, for some λ, the savings in risk can be substantial. The surprising aspect, which goes back to Stein's famous 1955 paper [39], is that for each parameter λi, Peng's estimator uses {Xj,j ≠ i} in computing
, as well as Xi. As {Xj,j ≠ i} is independent of Xi and has a distribution that does not depend on λi, it is counterintuitive that it should be relevant at all. This issue has received a great deal of deep thought. See, for example, Efron and Morris [19].
Let X1,…,Xn,… be an i.i.d. sequence and define Mn = max(X1,…,Xn),n = 1,2,…. For a Borel set A with Pr(X ∈ A) > 0, define
We are interested in properties of the sequence {cn}, which we will refer to as a max sequence. Since
we see that {cn} is supermultiplicative. Another property, derived in Brown [11, p.855], is that
It follows from (6.3) that
and
Setting c0 = 1, it follows from (6.4) that
Thus, if ci /ci−1 converges, its limit must equal c.
Using a property of the renewal length transformation (Lemma 6.1), we prove that
It follows from (6.6) and (6.7) that
One consequence of (6.8) is that if c(A) < 1, then {cn} is ultimately decreasing. More specifically, define
Then n ≥ N implies that cn < cn−1. A sufficient condition for c < 1 is that for some t, F(t) > 0 and F(A ∩ (t,∞)) = 0. We now derive (6.7).
Lemma 6.1: If c1 = P(X ∈ A) > 0, then (6.7) holds.
Proof: First consider the case in which X is uniformly distributed on (0,1). Let Mn = max(X1,…,Xn). Then
Thus, T(Mn) = Mn+1, and from (2.19),
Since ET(X) ≥ EX for any random variable X, it follows from (6.10) that
Next,
The result for X uniform(0,1) thus follows from (6.11) and (6.12).
For the general case, define
Then X ∼ r(U), where U ∼ uniform(0,1), and Mn = max(X1,…,Xn) ∼ r(U(n)), where U(n) is the maximum of n i.i.d. uniform(0,1) random variables. Consequently,
and the general result follows from the uniform case. █
Remark: We note that the sequence, {cn+1 /cn} need not be monotone. For example, if X ∼ Uniform(0,1) and A = (0.4,0.6) ∪ (0.99,1), then c3 /c2 > c4 /c3 < c5 /c4.
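The non-monotonicity in the remark can be confirmed by direct computation (a small sketch): for this A, cn = Pr(Mn ∈ A) = (0.6^n − 0.4^n) + (1 − 0.99^n).

# Numeric check of the remark: for X ~ uniform(0,1) and A = (0.4, 0.6) U (0.99, 1),
# c_n = (0.6^n - 0.4^n) + (1 - 0.99^n), and the ratio c_{n+1}/c_n is not monotone:
# c_3/c_2 > c_4/c_3 < c_5/c_4.  (Direct computation, no simulation.)
c = lambda n: (0.6**n - 0.4**n) + (1.0 - 0.99**n)
ratios = [c(n + 1) / c(n) for n in range(2, 5)]
print(ratios)   # approximately [0.826, 0.789, 0.813]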
Remark: I pose the following question. What are necessary and sufficient conditions for a sequence, {cn}, to be a max sequence?
In this section we study the problem of lower bounding the survival function of an IFR distribution, in terms of its first two moments. The problem was suggested to me by Persi Diaconis (personal communication). It arises in the study of ergodicity of birth and death processes (see, e.g., Diaconis and Fill [17]).
Barlow and Marshall [5, p.1262] derived the best lower bound for an IFR distribution, given the first two moments. This bound is not an explicit function of the first two moments, but is the solution of equations depending on these moments, which are solved numerically. We derive an explicit, simple, but nonoptimal lower bound.
Recall that an IFR distribution on (0,∞) is absolutely continuous, except perhaps for an atom at the right-hand endpoint of its support. For simplicity, we will assume that there is no such atom. The IFR property requires that the failure rate, h(t) = f (t)/F(t), be nondecreasing and, equivalently, that the integrated hazard, H(t), be convex.
A simple method for lower bounding F(t) is to use Jensen's inequality,
so that if EY is known, and EH(Y) can be upper bounded (say by b), then by convexity,
Similarly, if EY < EZ are known and H(EY) ≤ b and H(EZ) ≤ c with b < c are known, then we can upper bound H for t ∈ [EY,EZ], using convexity, and thus lower bound F(t).
We will use G(X),X, and T(X) as our choices of random variables. The means are of course known. The mean of H(X) equals one, and we will show in Lemma 7.1 that
The use of the three distributions along with (7.1), (7.2), and convexity of H lead to
and then for F,
Barlow and Proschan [6, p.112] derived lower bounds for F(t) using the theory of convex ordering. Their bounds, based on the first two moments, give
The new bound (7.3) offers improvement over (7.5) in that it has a larger domain
and gives smaller upper bounds for H(t) in their common domain. The (7.5)-related bounds, using higher moments, still remain useful.
In the above-mentioned birth and death process application, (7.3) can be used to lower bound the separation distance between a birth and death process at time t (starting in state 0) and the steady-state distribution. This separation distance is a convolution of exponential distributions and is thus IFR.
We now derive (7.1) and (7.2). They hold for general distributions; however, (7.3) does not.
Lemma 7.1: Let F be an absolutely continuous distribution with EF X2 < ∞. Let X* ∼ G(F) and T ∼ T(F). Then
Proof: Recall the record value process discussed in Section 3. Now
Furthermore,
where m(t) = E(X − t|X > t). A well-known and useful result [7] is that Em2(X) = σ2. It thus follows from (7.6) and (7.7) that
Next, we observe that
Thus, (7.1) holds. █
The fact that
follows from (2.5), as H(T) − H(X*) is the total hazard accumulated between X* and the next event, T.
The size-biasing transformation provides an interesting perspective for simple versus simple hypothesis testing. Suppose that X is a random variable, vector, or function, with density f0 under H0, and density f1 under H1, both with respect to the σ-finite measure μ. Assume that f0 and f1 are mutually absolutely continuous. Define
the likelihood ratio statistic. Define Y to be a random variable distributed as the distribution of λ(X) under f0. Then T(Y) is the distribution of λ(X) under f1, and V(Y) is the distribution of f0(X)/f1(X) = λ−1 under f1. Thus, λ under f0 and λ−1 under f1 are inverse distributional pairs, as defined in Section 2.4. These assertions hold since EY = 1 and
Thus, by (2.2) and (8.2), T(Y) is the distribution of λ under f1.
Define F to be the survival function of Y and thus of λ(X) under f0. Then
the type 1 error of the Neyman–Pearson test that rejects H0 for λ > c and otherwise accepts H0. Furthermore, by (2.7),
the power of the above test where G is the stationary renewal distribution corresponding to F. It follows from (8.3) that
Thus, the distribution of λ under H1 is determined from its distribution under H0 by (8.4).
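The fact that the law of λ under f1 is the size-biased version of its law under f0 can be illustrated numerically (a sketch; the densities below are not from the original text): take f0 = N(0,1) and f1 = N(1,1), so λ(x) = exp(x − 1/2), and check E_{f1}[R(λ)] = E_{f0}[λ R(λ)], which is (2.2) with E_{f0}λ = 1.

# Monte Carlo check that the law of lambda(X) under f1 is the size-biased version of its
# law under f0, for f0 = N(0,1), f1 = N(1,1), lambda(x) = exp(x - 1/2).
import numpy as np

rng = np.random.default_rng(7)
n = 10**6
lam0 = np.exp(rng.normal(0.0, 1.0, n) - 0.5)   # lambda(X) with X ~ f0
lam1 = np.exp(rng.normal(1.0, 1.0, n) - 0.5)   # lambda(X) with X ~ f1
R = lambda z: 1.0 / (1.0 + z)                  # an arbitrary bounded test function
print(np.mean(R(lam1)))            # E_{f1}[R(lambda)]
print(np.mean(lam0 * R(lam0)))     # E_{f0}[lambda R(lambda)]; agrees up to Monte Carlo error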
Define
which we call the excess at c. We see from (8.5) that the excess is an absolutely continuous convex function. This is true, despite the fact that the distribution of λ might be purely discrete.
If the distribution of λ under H0 belongs to
(see Section 2.5), then an interesting symmetry property holds. The distribution of λ−1 under H1 will coincide with that of λ under H0. If statistician I rejects H0 for λ > c and accepts H0 otherwise and statistician II rejects H1 for λ−1 > c and otherwise accepts H1, then the two statisticians will have (each according to his own perspective) the same type 1 and type 2 errors.
A function g on [0,∞) is defined to be submultiplicative if
Submultiplicative functions arise in the study of Banach algebras, semigroups, and other areas of functional analysis (see, for example, several discussions in Hille and Phillips [28]). Phillips [33] derived the bounds (9.1) and (9.2). Let g be a nonnegative, integrable, submultiplicative function on [0,∞) with
Then
Suppose that we restrict g further by requiring that g(t) = F(t), the survival function of a distribution on [0,∞). The submultiplicative property is then equivalent to the NBU condition
for all s ≥ 0 and t ≥ 0. Daley (personal communication) suggested exploring the connection between NBU distributions and the theory of submultiplicative functions. That suggestion is the subject of this section.
It turns out that (9.1) provides a neat upper bound,
the square of the Markov inequality bound F(t) ≤ (μ/t). It does not appear to be known in the NBU literature. On the other hand, (9.2) is not very good for NBU distributions, as our bound, (3.5), for the larger NBUE class improves both the exponent (from 2/μe to 1/μ) and the multiplying constant (from e2 to e). Of course, Phillips was concerned with general submultiplicative functions, whereas we work with survival functions.
Inequality (9.3) does not extend to NBUE distributions. For example, let X = 1 or 3, each with probability ½. Then F is NBUE, but not NBU. Moreover,
Combining (3.5) and (9.3), it follows that for F NBU,
The bound (9.3) is better than (3.5) for t < cμ, with c ≈ 3.51286; however, the situation reverses for t > cμ.
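The crossover constant can be recovered numerically (a sketch, reading (9.3) as (μ/t)² and (3.5) as e·e−t/μ): with u = t/μ, the bounds coincide where u = 1 + 2 ln u, whose root above 1 is approximately 3.51286.

# Numeric check of the crossover point between (mu/t)^2 and e*exp(-t/mu):
# solve u = 1 + 2 ln u for u > 1 by bisection.
import math

g = lambda u: u - 1.0 - 2.0 * math.log(u)
lo, hi = 2.0, 5.0            # g(2) < 0 < g(5), and g is increasing on this interval
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
print(0.5 * (lo + hi))       # approximately 3.51286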
We now study the problem of improving (9.4). In Corollary 9.2, we show that for t > μ,
where
Our approach is to use the Section 3 idea that
where X is independent of Y1 and Y2. In Section 3 we chose Y1 ∼ G(X) and Y2 ∼ S2. For our current application, we employ Y1 ∼ T(X) and Y2 ∼ S2, the second record value. Now for F absolutely continuous,
where S3 is the third record value. Note that
For Y2 = T, define T2 to be a random variable distributed as X|X > T, where X and T are independent. We find
where L(t) is defined in (2.8) and h is the failure rate function of X.
We require a lower bound for FT2(t), which is derived in Lemma 9.1.
Lemma 9.1: Let F be an absolutely continuous NBU distribution on (0,∞). Then
Proof: Defining
, we find that
as can be verified by differentiation. Now
where we used the superadditivity property of H (H(t) ≥ H(x) + H(t − x)), which is equivalent to the submultiplicativity of F. Thus, from (9.12),
Thus, (9.11)–(9.13) prove (9.10). █
Using Lemma 9.1,
, (9.8) and (9.9), we now derive (9.5). A probabilistic proof of Phillips' bound (9.1), for the case of survival functions, is also provided.
Corollary 9.2: If F is NBU with mean μ, then for t ≥ μ,
where B(t) is given by (9.6). Moreover,
where
is the unique solution in (1,∞) of the equation
Proof: Let Ut be uniformly distributed on [0,t]. By the submultiplicativity of F,
Now, F(Ut) and F(t − Ut) are identically distributed and negatively correlated, as F(Ut) is a decreasing function of Ut while F(t − Ut) is an increasing function of Ut. Thus,
To prove the result for F(t−) = Pr(X ≥ t), we use the same smoothing argument that we employed in the proof of Theorem 3.2.
Now assume that F is absolutely continuous. By (9.8), (9.9),
, and Lemma 9.1,
For {t|F(t) > 0}, divide both sides of (9.20) by F(t) and rearrange terms, obtaining
For t > μ, the inequality in (9.21) is satisfied if and only if H(t) ≥ B(t), given by (9.6), as B(t) is the unique positive root of
and s(x) < 0 for 0 ≤ x < B(t), whereas s(x) ≥ 0 for x ≥ B(t). For the general case (F not necessarily absolutely continuous) we use the smoothing argument of Theorem 3.2. Thus, (9.14) has been proved. As for (9.15), we note that
from which the desired conclusion follows. The value for
in (9.16) and (9.17) was found numerically. █
Remark: If X is NBU, then each interarrival time, Sn − Sn−1, for the record value process is stochastically smaller than X and thus stochastically smaller than the corresponding interarrival time for the renewal process corresponding to X. It follows that
where EZ(t) is the expected forward recurrence time at t. It thus follows that
Comparing (9.22) to (3.5), for distributions that deteriorate with age, we would usually expect that for most t,
so that (9.22) offers potential improvement over (3.5).
Lemma 2.3 discussed a subadditivity property for Tp, p > 0, and a corresponding superadditivity property for Tp, p < 0. It follows that if g is an increasing function, then
with a reverse inequality for p < 0. In (10.1), X1,…,Xn are independent, as are Tp(X1),…,Tp(Xn).
We will only discuss the choice g(x) = x. Define
It follows from (10.1) that
For the choice p = −1, (10.3) implies
equivalently,
where
.
Defining k(a1,…,an) = (∑ ai−1)−1, (10.4) is equivalent to
Inequality (10.6) suggests applying Jensen's inequality for concave functions, under the assumption that k is concave as a function from R+n to R+. To prove (10.4) inductively, we only need to consider the case n = 2. By examination of the Hessian corresponding to k(x,y), we confirm that k is concave. Therefore, (10.4) and (10.5) hold, without independence. I have not seen this inequality in the literature.
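Reading (10.6) as the Jensen-type inequality E[k(X1,…,Xn)] ≤ k(EX1,…,EXn) for the concave function k above, a Monte Carlo check is straightforward (a sketch; the particular dependent joint law below is arbitrary and not from the original text).

# Monte Carlo check of E[k(X1,...,Xn)] <= k(EX1,...,EXn) with k(a) = (sum a_i^{-1})^{-1},
# for three dependent positive random variables.
import numpy as np

rng = np.random.default_rng(8)
n = 10**6
z = rng.gamma(2.0, 1.0, size=n)                 # common factor, inducing dependence
x1 = z + rng.gamma(3.0, 1.0, n)
x2 = z + rng.gamma(1.0, 2.0, n)
x3 = z * rng.uniform(0.5, 1.5, n)
k = lambda *a: 1.0 / sum(1.0 / ai for ai in a)
print(np.mean(k(x1, x2, x3)))                    # E[k(X1, X2, X3)]
print(k(np.mean(x1), np.mean(x2), np.mean(x3)))  # k(EX1, EX2, EX3); at least as large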
The choice p = −2 leads to
Unlike (10.4), I do not know how to verify (10.7) directly.
Thus, I am suggesting that (10.1), with freedom of choice for g and p, could be a useful mechanism for generating interesting inequalities.
In this section we discuss Chernoff's theorem [4, p.6; 13, p.49]. The renewal length transformation is already implicit in the existing proofs, so we merely make it a bit more explicit here. We will not endeavor to find the weakest possible conditions. First we derive a preliminary lemma.
Lemma 11.1: Let W be a positive random variable, with T = T(W) its renewal length transformation. Then the following hold:
(i) Pr(W ∈ A) ≤ (EW)Pr(T ∈ A)e−E(log T |T∈A)
(ii) If, in addition, E log T = 0, then
Proof:
(i)
where we used convexity of −ln x as well as (2.2).
(ii) If E log T = 0, then
Next, suppose that {Xi} is i.i.d. with EX < 0, Pr(X > 0) > 0, and M(t) = EetX < ∞ for all t. The condition EX < 0 implies M′(0) < 0; Pr(X > 0) > 0 ensures that M(t) → ∞ as t → ∞. As M is strictly convex, decreasing at t = 0 and eventually increasing, it follows that there exists a unique τ > 0 with M′(τ) = 0. At t = τ, M has a global minimum, ρ = M(τ) < M(0) = 1.
For applying Lemma 11.1, set
so that
Since W = ΠeτXi, it follows from Lemma 2.2 and (2.13) that
, where each Yi has pdf
From (11.2), EY = M′(τ) = 0 and Var Y = ρ−1M′′(τ). It follows that (recall T = T(W))
and
Next, set A = [1,∞),αn = Pr(Xn ≥ 0) (where
), and βn = Pr(Yn ≥ 0). By Lemma 11.1(ii),
It follows from (11.3) that
By the central limit theorem, βn → ½; thus,
Finally, Markov's inequality yields
This ensures that
We thus conclude from (11.4) and (11.5) that
the conclusion of Chernoff's theorem.
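Chernoff's conclusion, (1/n) log Pr(Sn ≥ 0) → log ρ, is easy to illustrate numerically (a sketch, for one specific walk not taken from the original text): let X = +1 with probability p and −1 with probability 1 − p, p = 0.3, so that EX < 0, Pr(X > 0) > 0, and ρ = 2√(p(1 − p)).

# Numeric illustration of Chernoff's theorem for X = +1 w.p. p, -1 w.p. 1-p, p = 0.3:
# (1/n) log Pr(S_n >= 0) -> log rho with rho = 2*sqrt(p(1-p)).  (Exact binomial sums.)
import math

p = 0.3
rho = 2.0 * math.sqrt(p * (1.0 - p))
def log_prob_sn_nonneg(n):
    # S_n >= 0 if and only if the number of +1 steps is at least n/2
    terms = [math.comb(n, k) * p**k * (1.0 - p)**(n - k)
             for k in range(math.ceil(n / 2), n + 1)]
    return math.log(sum(terms))
for n in (50, 200, 800):
    print(n, log_prob_sn_nonneg(n) / n, math.log(rho))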
Consider a renewal reward process [37, pp.132,133]. This is generated by an i.i.d. sequence {Xi,i ≥ 1}, with X ∼ F and F(0) = 0. Corresponding to each Xi there is a reward Ri; Ri is allowed to depend on Xi and other random quantities, but the pairs {(Xi,Ri),i ≥ 1} are assumed i.i.d. Define
, and
If EX and ER are both finite, then
almost surely and in expectation [37, p.133]. The quantity ER/EX is referred to as the long-run average reward rate.
Next, consider the stationary version of the renewal reward process. It can be constructed by starting a renewal interval at −Y, with Y ∼ G, the stationary renewal distribution corresponding to F; if Y = y, then the length of the interval covering {0} is constructed to be distributed as X|X > y. By (2.5), the resulting unconditional distribution of the length of the interval covering {0} is that of T(F). Finally, given T = t, the reward
for this interval covering {0} is constructed to have the conditional distribution of R given X = t, so that
Now, by (2.2) and (12.1),
Thus, the observed reward rate over the interval covering a fixed point in the stationary process is an unbiased estimator of the long-term average reward rate.
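This unbiasedness is easy to see in a toy case (an illustrative sketch under specific modelling choices not in the original text): take X ~ exponential(1) and reward R = X², so ER/EX = 2; the covering interval then has length T ~ Gamma(2,1) and reward T².

# Monte Carlo check that the reward rate over the covering interval is unbiased for ER/EX,
# for X ~ exponential(1) and R = X^2 (so ER/EX = 2).
import numpy as np

rng = np.random.default_rng(9)
t = rng.gamma(2.0, 1.0, size=10**6)   # T ~ T(F) for F = exponential(1)
r_star = t**2                         # reward on the covering interval, given its length
print(np.mean(r_star / t))            # approximately 2 = ER/EX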
Next, suppose that the experimenter observes the interval covering {0}, as well as the next n − 1 ordinary intervals. For this person, the observed reward rate is given by
In Lemma 12.1 we show that the observed reward rate, (12.3), is once again an unbiased estimator of the long-term average reward rate (ER/EX).
Lemma 12.1: The expected value of the random quantity in (12.3) equals ER/EX.
Proof: Let Y = (T,X2,…,Xn), where T ∼ T(X) and Xi ∼ F,i = 2,…,n, and T and X2,…,Xn are independent. The pdf of Y is given by
It follows from (12.4) that
where X = (X1,…,Xn) are i.i.d. random variables distributed as F and g satisfies E|X1 g(X)| < ∞.
Next, suppose that g is symmetric. Then
From (12.5) and (12.6) we have
for all symmetric g with E|X1 g(X)| < ∞.
Next, choose
It follows from (12.7) that
Remark: It might have occurred to the reader to prove Lemma 12.1 by choosing the interarrival time distribution to be distributed as
. Then
(2.17), and
, so that ER(Sn)/ESn = ER/EX, and the result would then follow from (12.2). The problem with this argument is that one would need to prove that (12.1) implies
which is not obvious to me.
For X a random variable on [0,∞) with Laplace transform Ł, we can interpret Ł to be the survival function of
, where
are independent and
is exponential with mean 1:
Similarly, the Laplace transform of T(X) is the survival function of
, where
and is independent of
:
We see from (13.1) and (13.2) that as a survival function, ŁX is the stationary renewal distribution corresponding to ŁT. This can also be seen from (2.14) and (2.25) as
It is shown in Brown [8, p.422] that if
is the stationary renewal distribution corresponding to F, then
Applying (13.3) and (13.4) to Y = V and recalling (2.23) and (2.30), we obtain
where
Thus, if EX−1 ≈ (EX)−1, then the Laplace transform of X is close to that of a one point distribution at {μ} and is also close to the Laplace transform of T(X).
Next, consider a completely monotone function g on (0,∞), with g(0) < ∞. Employing Bernstein's theorem as in Brown [9, p.1241], it follows from (13.5), and (13.6) that
and
Thus, we can interpret c2(X) as a scale-invariant measure of variability. When c2(X) is small, X is close to a point distribution at μ. This proximity is reflected in the behavior of expectations of completely monotone functions of X.
Note that if g is completely monotone, then g is maximized at zero and decreases rapidly as t increases. The mean of g(X) lies in (0,g(0)]. The quantity c2(X) ∈ [0,1) provides an upper bound to the fraction of g(0) that Eg(X) can realize. It is appropriate to base a bound on a parameter that measures the relative concentration of X near zero (where g is maximized), rather than on the tail behavior, which is largely irrelevant. Thus, a harmonic mean-based bound, rather than a variance-based bound, makes good sense here. Similarly,
thus, the Laplace transform of V(X), considered as a survival function, is the stationary renewal distribution corresponding to the Laplace transform of X−1. The analogs of (13.8) and (13.9) are
where
Thus, the dual measures of variability discussed in Section 2.4 enter into the bounds for expectations of completely monotone functions of X and V(X).
I am indebted to Daryl Daley for suggesting the Section 9 problem and to Persi Diaconis for suggesting the problem discussed in Section 7. I thank David Aldous, Albert Marshall, and two referees for bringing relevant references to my attention. I am grateful to the National Security Agency for their support of this research.