1. Introduction
Phase-type (PH) distributions were introduced in 1975 by Neuts [Reference Neuts3] for the study of queueing systems. Since then, PH distributions have been a subject of research, and have found applications in many areas of applied probability. For example, a succession of papers by O’Cinneide ([Reference O’cinneide5]–[Reference O’cinneide8]) revealed some fundamental properties of PH distributions. In [Reference Aldous and Shepp1] it was shown that the squared coefficient of variation (SCV) of a PH distribution with a PH representation of size m is greater than or equal to 1/m. In [Reference Telek, Latouche and G. Taylor11] the minimal SCV of discrete PH distributions was found. In [Reference He, Zhang, Vera and Latouche2] stochastic comparison was utilized in the study of PH distributions, which also led to moment bounds of PH distributions. These results not only deepen our understanding of PH distributions and PH representations, but also facilitate the applications of PH distributions significantly.
The steepest increase property of PH distributions was first proposed and partially proved by O’Cinneide [Reference O’cinneide8], with a complete proof given by Yao [Reference Yao12]. We think that it is a very interesting property of PH distributions, which has received little attention in the literature. In this paper we apply the property to find moment bounds of PH distributions, which demonstrates the usefulness of the property.
PH distributions with finite support were introduced and investigated in [Reference Ramaswami and Viswanath9]. This generalization extends the applications of PH distributions significantly.
In this paper we find a number of stochastic and moment bounds for PH distributions with finite and infinite support. Many of the moment bounds depend only on the size of PH representations and the eigenvalue with the largest real part of PH generators. A highlight of this paper is that the SCV of a bounded PH distribution with a PH representation of size m is greater than or equal to 1/(m(m + 2)). The moment bounds of PH distributions reveal fundamental properties of such probability distributions, e.g. they indicate which type of general distributions can be closely approximated. The results are useful in, but not limited to,
• finding moment bounds of performance measures for stochastic models, and
• selecting PH representations in the parameter estimation of PH distributions.
The rest of the paper is organized as follows. In Section 2 we introduce PH distributions with infinite support together with the steepest increase property and with some of their moment bounds. In Section 3 we introduce PH distributions with finite support and study their moment bounds. Section 4 concludes the paper.
2. PH distributions with infinite support
In this section we first define PH distributions and their corresponding PH representations. Then we review the so-called ‘steepest increase lemma’ (see [Reference Yao12]) on the density function of (ordinary) PH distributions, since this turned out to be useful to derive the moment bounds. We further extend the ‘steepest increase lemma’ and derive new moment bounds for ordinary PH distributions.
A nonnegative random variable 𝒴 has a PH distribution if it is the absorption time in a finite-state, continuous-time Markov chain (see [Reference Neuts4]). We assume that 𝒴 has a PH representation (α, A) of size m, where α is the initial distribution of the underlying continuous-time Markov chain and A contains the transition rates among the transient states of the underlying continuous-time Markov chain (referred to as a PH generator or sub-intensity matrix). That is, α is a nonnegative probability vector and A has nonnegative off-diagonal and negative diagonal elements such that every row sum is nonpositive, A1 ≤ 0 (elementwise), where 1 is a column vector of 1s with the appropriate size. Let us denote the density function of 𝒴 by f(t) = αe^{At}(−A1) and the cumulative distribution function (CDF) by F(t) = 1 − αe^{At}1 (both for t ≥ 0). To avoid the trivial case 𝒴 ≡ 0, we assume that 0 < α1 ≤ 1. We also note that, by [Reference O’cinneide6], f(t) > 0 for all t > 0.
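As a minimal illustration of these formulas (not from the original paper), the following sketch evaluates f(t) = αe^{At}(−A1) for a hypothetical two-phase hyperexponential representation, approximating the matrix exponential by its Taylor series, and cross-checks it against the closed-form mixture density:

```python
import math

# A PH distribution with representation (alpha, A):
#   density f(t) = alpha . e^{At} . (-A 1),  CDF F(t) = 1 - alpha . e^{At} . 1.
# Hypothetical 2-phase example: a hyperexponential (mixture of exponentials),
# whose matrix exponential is diagonal and known in closed form.

def mat_exp(M, terms=80):
    """e^M for a small square matrix M via its Taylor series."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]                               # M^k / k!
    for k in range(1, terms):
        term = [[sum(term[i][l] * M[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def ph_density(alpha, A, t):
    n = len(A)
    E = mat_exp([[A[i][j] * t for j in range(n)] for i in range(n)])
    exit_rates = [-sum(row) for row in A]          # the vector -A.1
    return sum(alpha[i] * E[i][j] * exit_rates[j]
               for i in range(n) for j in range(n))

# Start in phase 1 w.p. 0.4 (rate 1) or phase 2 w.p. 0.6 (rate 3).
alpha = [0.4, 0.6]
A = [[-1.0, 0.0], [0.0, -3.0]]
t = 1.5
closed_form = 0.4 * math.exp(-t) + 0.6 * 3 * math.exp(-3 * t)
assert abs(ph_density(alpha, A, t) - closed_form) < 1e-9
```

The diagonal example keeps the check transparent; the generic `ph_density` works for any small PH generator.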
The steepest increase conjecture was first published and partially proven by O’Cinneide [Reference O’cinneide8]. Its complete proof was given by Yao [Reference Yao12]. The steepest increase lemma states that f′(t)/f(t) ≤ (m − 1)/t for t > 0 or, equivalently, that f(t)/t^{m−1} is decreasing in t for t > 0. Next, we present and prove a ‘sharp’ form of the steepest increase lemma.
Lemma 1
For a PH distribution with representation (α, A) of size m and with density function f(t), we have
tf′(t) ≤ (m − 1 − λt)f(t), t > 0, (1)
where λ is the absolute value of the eigenvalue of A with the largest real part (which is real and negative). Here −λ is also referred to as the dominant eigenvalue of the matrix A. In (1) the equality holds when 𝒴 is Erlang(m, λ)-distributed (i.e. 𝒴 is the sum of m independent exponential random variables with the same parameter λ).
Proof. The inequality just before Equation (12) of [Reference O’cinneide8], combined with the proof in [Reference Yao12], leads to ((m − 1 − λ)I − A)e^A ≥ 0 (elementwise) for a PH generator A of size m with dominant eigenvalue −λ, where I is the identity matrix. For a PH generator A, At is also a PH generator for t > 0, and the eigenvalue with the largest real part of At is −λt for t > 0. Setting A =: At in the inequality, we obtain, for t > 0,
((m − 1 − λt)I − At)e^{At} ≥ 0,
which leads to
tAe^{At} ≤ (m − 1 − λt)e^{At}. (2)
Pre-multiplying and post-multiplying both sides of (2) by α and −A1, respectively, we obtain tf′(t) ≤ (m − 1 − λt)f(t), which proves the lemma.
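As a sanity check (not part of the proof), the inequality tf′(t) ≤ (m − 1 − λt)f(t) can be verified numerically for a hypothetical two-phase hyperexponential, for which m = 2 and the dominant eigenvalue is −1:

```python
import math

# Spot-check of the steepest increase bound t*f'(t) <= (m-1-lam*t)*f(t)
# for a hyperexponential example: 0.4*Exp(1) + 0.6*Exp(3) mixture,
# so m = 2 phases and dominant eigenvalue -lam = -1.
def f(t):
    return 0.4 * math.exp(-t) + 0.6 * 3 * math.exp(-3 * t)

def f_prime(t):
    return -0.4 * math.exp(-t) - 0.6 * 9 * math.exp(-3 * t)

m, lam = 2, 1.0
for t in [x / 10 for x in range(1, 101)]:   # grid over t in (0, 10]
    assert t * f_prime(t) <= (m - 1 - lam * t) * f(t) + 1e-12
```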
As in Lemma 1, −λ stands for the dominant eigenvalue of A in the sequel.
Equation (1) can be written in several alternative forms. For t ≥ 0,
tf′(t) + λtf(t) ≤ (m − 1)f(t), (3)
and, utilizing the fact that λ > 0, we also have, for t > 0,
tf′(t) ≤ (m − 1)f(t). (4)
Lemma 2
For 𝒴 with PH generator A of size m and dominant eigenvalue −λ, we have, for 0 < t₁ < t₂,
f(t₂)/f(t₁) ≤ f^{(m,λ)}(t₂)/f^{(m,λ)}(t₁),
where f^{(m,λ)}(t) = λ^m t^{m−1}e^{−λt}/(m − 1)! is the density function of the Erlang random variable with parameters (m, λ).
Proof. For t > 0, (1) can be written as
f′(t)/f(t) ≤ (m − 1)/t − λ. (5)
Integrating both sides of (5) from t₁ > 0 to t₂ > t₁ yields
ln f(t₂) − ln f(t₁) ≤ (m − 1)(ln t₂ − ln t₁) − λ(t₂ − t₁), (6)
where f(t) > 0 for all t > 0 ensures that the integration can be done properly. Taking the exponent of both sides of (6), we obtain
f(t₂)/f(t₁) ≤ (t₂/t₁)^{m−1}e^{−λ(t₂−t₁)} = f^{(m,λ)}(t₂)/f^{(m,λ)}(t₁),
which leads to the desired result.
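The monotonicity of f(t)/f^{(m,λ)}(t) implied by Lemma 2 can be spot-checked numerically; here f is a hypothetical two-phase hyperexponential (m = 2, λ = 1) and f^{(2,1)}(t) = te^{−t} is the Erlang(2, 1) density:

```python
import math

# Check that f(t)/f_erlang(t) is decreasing in t for a hyperexponential f
# (rates 1 and 3, weights 0.4 and 0.6) against the Erlang(2, 1) density.
def f(t):
    return 0.4 * math.exp(-t) + 1.8 * math.exp(-3 * t)

def f_erlang(t):
    return t * math.exp(-t)    # lambda^2 * t * e^{-lambda t} with lambda = 1

ratios = [f(t) / f_erlang(t) for t in [x / 10 for x in range(1, 101)]]
assert all(r1 >= r2 for r1, r2 in zip(ratios, ratios[1:]))
```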
Lemma 2 implies that f(t)/f^{(m,λ)}(t) is decreasing in t, which is a generalization of the monotonicity of the function f(t)/t^{m−1}. Furthermore, Lemma 2 leads to the following stochastic comparison result between 𝒴 and the Erlang random variable 𝒳^{(m,λ)} with parameters (m, λ). A random variable Y is stochastically smaller than or equal to a random variable X, denoted as Y ≤_d X, if F_Y(t) ≥ F_X(t) holds for all real t (see [Reference Stoyan10]).
Corollary 2.1
The PH-distributed random variable 𝒴 (of size m and with dominant eigenvalue −λ) is stochastically smaller than or equal to 𝒳^{(m,λ)}. Consequently, we have, for n ≥ 1,
𝔼[𝒴^n] ≤ 𝔼[(𝒳^{(m,λ)})^n] = (m + n − 1)!/((m − 1)!λ^n).
Proof. Since both f (t) and f (m,λ)(t) are density functions on [0, ∞), there must be at least one intersection in (0, ∞). If t* is an intersection (i.e. f (t*) = f (m,λ)(t*)), by Lemma 2, we must have f (t) ≤ f (m,λ) (t) for t > t* and f (t) ≥ f (m,λ)(t) for t < t*. Thus, there are only three possible cases:
• f (t) and f (m,λ)(t) are identical;
• f (t) and f (m,λ)(t) have exactly two intersections, t = 0 and t = t*;
• f (t) and f (m,λ)(t) have exactly one intersection, t = t*.
Then we must have f(t) ≥ f^{(m,λ)}(t) for 0 < t ≤ t* and f(t) ≤ f^{(m,λ)}(t) for t ≥ t* (> 0). The former leads to F(t) ≥ F_{𝒳^{(m,λ)}}(t) for 0 < t ≤ t*, where F_{𝒳^{(m,λ)}}(t) is the CDF of 𝒳^{(m,λ)}, and the latter to 1 − F(t) ≤ 1 − F_{𝒳^{(m,λ)}}(t) for t > t*. Consequently, we obtain F(t) ≥ F_{𝒳^{(m,λ)}}(t) for t > 0, which leads to the first result. All the moment bounds can be obtained from 𝒴 ≤_d 𝒳^{(m,λ)} directly.
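A numerical spot-check of the stochastic ordering in Corollary 2.1 on a hypothetical example: a two-phase hyperexponential 𝒴 (dominant eigenvalue −1) against Erlang(2, 1), whose CDFs are both available in closed form:

```python
import math

# Corollary 2.1 check: the hyperexponential Y (rates 1, 3; weights 0.4, 0.6)
# is stochastically dominated by Erlang(2, 1), i.e. F_Y(t) >= F_Erlang(t).
def cdf_y(t):
    return 1 - 0.4 * math.exp(-t) - 0.6 * math.exp(-3 * t)

def cdf_erlang_2_1(t):
    return 1 - math.exp(-t) * (1 + t)

for t in [x / 10 for x in range(1, 201)]:   # grid over t in (0, 20]
    assert cdf_y(t) >= cdf_erlang_2_1(t)
```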
A random variable X is smaller in the mean residual life than a random variable Y, denoted as X ≤_c Y, if 𝔼[max{0, X − t}] ≤ 𝔼[max{0, Y − t}] holds for all real t (see [Reference Stoyan10]). By [Reference O’cinneide7], it is known that the Erlang distribution with parameters (m, m/𝔼[𝒴]), which has the same mean as 𝒴, is smaller in the mean residual life than 𝒴. Then the moments of 𝒴 are bounded from above and below as follows: for n ≥ 1,
((m + n − 1)!/((m − 1)!m^n))𝔼[𝒴]^n ≤ 𝔼[𝒴^n] ≤ (m + n − 1)!/((m − 1)!λ^n).
A by-product of the above inequalities is that 𝔼[𝒴] = m/λ if and only if 𝒴 has an Erlang distribution with parameters (m, λ); this can also be obtained from inequality (3) directly.
Based on Lemma 1, the next lemma bounds the (n + 1)th moment in terms of the nth moment, which also gives an alternative proof of the moment bounds in Corollary 2.1.
Lemma 3
For n = 0, 1, …, the (n + 1)th moment of 𝒴 (of size m and with dominant eigenvalue −λ) is bounded by
𝔼[𝒴^{n+1}] ≤ ((m + n)/λ)𝔼[𝒴^n], (7)
and the equality holds when 𝒴 = 𝒳^{(m,λ)}.
Proof. Multiplying both sides of (3) by t^n and integrating from 0 to ∞ gives the following identities for the left-hand side (LHS) and the right-hand side (RHS):
LHS = ∫₀^∞ (t^{n+1}f′(t) + λt^{n+1}f(t)) dt = −(n + 1)𝔼[𝒴^n] + λ𝔼[𝒴^{n+1}], RHS = (m − 1)𝔼[𝒴^n],
where integration by parts is used for the first term of the LHS. Thus
λ𝔼[𝒴^{n+1}] ≤ (m + n)𝔼[𝒴^n].
When 𝒴 is Erlang(m, λ) distributed, the equality in (7) comes from the fact that Lemma 1 gives equality for an Erlang distribution for all t > 0.
Applying Lemma 3 for n = 0 and n = 1 enables us to derive the following upper bounds on the mean 𝔼[𝒴] and the squared coefficient of variation SCV_𝒴 = 𝔼[𝒴²]/𝔼[𝒴]² − 1:
𝔼[𝒴] ≤ m/λ, (8)
SCV_𝒴 ≤ (m + 1)/(λ𝔼[𝒴]) − 1. (9)
Interestingly, (9) gives an upper bound for SCV_𝒴, while the lower bound for SCV_𝒴 is much more widely known as SCV_𝒴 ≥ 1/m. Hence, we have (see Figure 1)
1/m ≤ SCV_𝒴 ≤ (m + 1)/(λ𝔼[𝒴]) − 1.
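The following numerical sketch (using a hypothetical two-phase hyperexponential with rates 1 and 3 and initial probabilities 0.4 and 0.6, so m = 2 and λ = 1) illustrates the mean bound 𝔼[𝒴] ≤ m/λ and the classical lower bound SCV_𝒴 ≥ 1/m:

```python
import math

# Moment bounds for the hyperexponential example (m = 2, lambda = 1):
# E[Y] <= m/lambda and SCV_Y >= 1/m.
EY = 0.4 * 1.0 + 0.6 / 3.0            # mixture mean: 0.4/1 + 0.6/3 = 0.6
EY2 = 0.4 * 2.0 + 0.6 * (2.0 / 9.0)   # E[Exp(mu)^2] = 2/mu^2 for each phase
scv = EY2 / EY ** 2 - 1

m, lam = 2, 1.0
assert EY <= m / lam                   # mean bound E[Y] <= m/lambda
assert scv >= 1 / m                    # Aldous--Shepp lower bound
```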
3. PH distributions with finite support
PH distributions with finite support were introduced in [Reference Ramaswami and Viswanath9], where three classes of finite support distributions were considered (matrix exponential densities from the lower bound to the upper bound, from the upper bound to the lower bound, and a convex combination of the two). Instead, we define $\mathcal{Z}$ to have distribution b + (𝒴 | 𝒴 < T). The support of $\mathcal{Z}$ is thus [b, B) with b < B and B = b + T. Recall that 𝒴 is the ordinary (or infinite support) PH distribution with density function f(t) = αe^{At}(−A)1. Then the density function of $\mathcal{Z}$ is given by
$f_{\mathcal{Z}}(t) = f(t-b)/F(T)$
for t ∈ [b, B), and $f_{\mathcal{Z}}(t)=0$ for t ∉ [b, B).
We denote this class of finite PH distributions by FTPH1 (with some further similar classes FTPH2 and FTPH3 in mind: FTPH2 is defined as $\mathcal{Z}_2=B-(\mathcal{Y}\mid \mathcal{Y}<T)$ and FTPH3 as the convex combination of $\mathcal{Z}$ and $\mathcal{Z}_2$; these are subject to future investigations). In this paper we primarily focus on the moments of FTPH1 distributions.
Although FTPH1 is obtained by just a simple truncation of an ordinary PH distribution, it has some very interesting members. A truncated exponential distribution with a very small intensity parameter leads to a uniform distribution (Figure 2(a)), since
lim_{λ→0} λe^{−λt}/(1 − e^{−λT}) = 1/T, t ∈ (0, T).
Similarly, a truncated Erlang-N distribution with very small intensity gives
lim_{λ→0} f^{(N,λ)}(t)/F_{𝒳^{(N,λ)}}(T) = Nt^{N−1}/T^N, t ∈ (0, T),
yielding linear (N = 2), quadratic (N = 3), and cubic (N = 4) distributions (see Figures 2(b) and 2(c)).
The importance of these special FTPH1 members is that density functions of such shapes are notoriously difficult to capture by ordinary PH distributions: ordinary PH distributions with a limited number of phases approximate them poorly. In practical computations the λ → 0 limit also causes difficulties, but small positive values of λ remain easy to compute with and give reasonably good approximations. These approximation issues and the parameter estimation of PH distributions with finite support will be addressed in a separate paper.
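The uniform limit is easy to verify numerically; even λ = 10⁻⁴ is already within about 10⁻⁴ of the uniform density 1/T (a sketch with hypothetical parameter values):

```python
import math

# As lambda -> 0, the truncated exponential density
# lambda*exp(-lambda*t) / (1 - exp(-lambda*T)) approaches the uniform
# density 1/T on (0, T); a small positive lambda already gets close.
T, lam = 2.0, 1e-4
for t in [0.1, 0.5, 1.0, 1.9]:
    dens = lam * math.exp(-lam * t) / (1 - math.exp(-lam * T))
    assert abs(dens - 1 / T) < 1e-3
```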
A general formula for computing the moments for PH distributions with finite support can be obtained as follows.
Theorem 1
The nth moment of an FTPH1 random variable $\mathcal{Z}$ with parameters (α, A) over the interval [b, B) is given by
$\mathbb{E}[\mathcal{Z}^n]=\frac{1}{F(T)}\sum_{i=0}^n \binom{n}{i}\, b^{n-i}\int_0^T t^i f(t)\,{\text{d}}t$,
where T = B − b.
Proof. The theorem is proved by routine calculations.
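The routine calculation behind Theorem 1 is a binomial expansion of (b + 𝒲)^n over the truncated moments E_i(T) = ∫₀^T t^i f(t) dt. The following sketch (with a hypothetical hyperexponential f and numerical quadrature) cross-checks the expansion against direct integration:

```python
import math

# Cross-check of the moment formula for Z = b + (Y | Y < T): the binomial
# expansion over truncated moments E_i(T) must agree with direct numerical
# integration of (b + t)^n against the truncated, renormalized density.
def f(t):                     # hypothetical 2-phase hyperexponential density
    return 0.4 * math.exp(-t) + 1.8 * math.exp(-3 * t)

b, T, steps = 0.5, 1.0, 20000
h = T / steps
E = [sum(((i + 0.5) * h) ** k * f((i + 0.5) * h) * h for i in range(steps))
     for k in range(4)]       # E_0(T), ..., E_3(T) by the midpoint rule

def moment_binomial(n):       # E[Z^n] = sum_i C(n,i) b^(n-i) E_i(T)/E_0(T)
    return sum(math.comb(n, i) * b ** (n - i) * E[i] / E[0]
               for i in range(n + 1))

def moment_direct(n):         # integrate (b+t)^n f(t)/E_0(T) over (0, T)
    return sum((b + (i + 0.5) * h) ** n * f((i + 0.5) * h) * h
               for i in range(steps)) / E[0]

for n in range(1, 4):
    assert abs(moment_binomial(n) - moment_direct(n)) < 1e-9
```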
3.1. Moment bounds for the case with b = 0
In this subsection we assume that b = 0; this is extended to b > 0 in the next subsection. A random variable in FTPH1 is denoted as 𝒲 = 𝒴 | 𝒴 < T. We derive and prove the lower and upper bounds for the moments of 𝒲. For i = 0, 1, …, we introduce the notation $E_i(T)=\mathbb{E}[{\mathbb{I}_{\{\mathcal{Y}< T\}}}{\mathcal{Y}^i}]=\int_{t=0}^{T} t^i f(t) {\text {d}}t$, where 𝕀_{a} denotes the indicator of the event a. Then 𝔼[𝒲^i] can be written as, for i = 0, 1, …,
𝔼[𝒲^i] = E_i(T)/E_0(T).
The next lemma is similar to Lemma 3 (for ordinary PH distributions), but it does not depend on the dominant eigenvalue −λ.
Lemma 4
The moments of an FTPH1 random variable 𝒲 with support on (0, T) satisfy, for n = 1, 2, …,
𝔼[𝒲^n] ≤ ((m + n − 1)T/(m + n))𝔼[𝒲^{n−1}]. (10)
Proof. Multiplying both sides of (4) by t^{n−1}(T − t) and integrating from 0 to T gives the following identity for the left-hand side:
∫₀^T t^n(T − t)f′(t) dt = [t^n(T − t)f(t)]₀^T − ∫₀^T (nTt^{n−1} − (n + 1)t^n)f(t) dt = (n + 1)E_n(T) − nTE_{n−1}(T).
We use Stieltjes integration by parts in the first step, and
[t^n(T − t)f(t)]₀^T = 0
in the second step. For the right-hand side, we have
(m − 1)∫₀^T t^{n−1}(T − t)f(t) dt = (m − 1)(TE_{n−1}(T) − E_n(T)),
from which
(m + n)E_n(T) ≤ (m + n − 1)TE_{n−1}(T), (11)
which leads to the desired result after dividing both sides by E_0(T).
In the following corollary the upper bound for the nth moment is provided, independent from the lower-order moments.
Corollary 1
The nth moment of an FTPH1 random variable 𝒲 with support on (0, T) is bounded by
𝔼[𝒲^n] ≤ mT^n/(m + n), (12)
and the inequality is strict except in the limit when 𝒴 is Erlang(m, λ) distributed and λ tends to 0. In particular, we have 𝔼[𝒲] ≤ mT/(m + 1), which indicates that, for a fixed size m, no PH distribution with finite support can have a mean close to the upper bound T = B.
Proof. Recursively applying (10) for moments 1, …, n gives the upper bound. The statement on the equality comes from the fact that the right-hand side of (12) gives equality only when (11) gives equality for moments 1, …, n, and this occurs only when 𝒴 is Erlang(m, λ) distributed and λ tends to 0, because (4) gives equality only for an Erlang(m, λ) distribution when λ tends to 0.
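A numerical spot-check of this recursion: repeated application of Lemma 4 gives 𝔼[𝒲^n] ≤ mT^n/(m + n). Below, the truncated moments of a hypothetical two-phase hyperexponential (m = 2) are computed by midpoint-rule quadrature and compared against the bound:

```python
import math

# Check of the bound E[W^n] <= m*T^n/(m+n) for the truncated
# hyperexponential W = (Y | Y < T), Y with rates 1, 3 and weights 0.4, 0.6.
def f(t):
    return 0.4 * math.exp(-t) + 1.8 * math.exp(-3 * t)

T, m, steps = 1.0, 2, 20000

def trunc_moment(n):          # E[W^n] = E_n(T)/E_0(T), midpoint rule
    h = T / steps
    En = sum(((i + 0.5) * h) ** n * f((i + 0.5) * h) * h for i in range(steps))
    E0 = sum(f((i + 0.5) * h) * h for i in range(steps))
    return En / E0

for n in range(1, 5):
    assert trunc_moment(n) <= m * T ** n / (m + n)
```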
Having derived these moment bounds, we now consider how to reach the extreme values, and the FTPH1 structure which realizes the moment bounds. According to the next lemma, Erlang distributions play an important role in this respect.
Lemma 5
The upper bound in Lemma 4 is attained in the limit when 𝒲 equals 0 with probability p = 1 − (m + n − 1)μ_{n−1}/(mT^{n−1}) and is truncated Erlang(m, λ) distributed with probability 1 − p such that λ tends to 0, where μ_{n−1} = 𝔼[𝒲^{n−1}].
Proof. To prove the statement, we show that, compared to the extreme distribution of Lemma 5, any valid change in the probability p of the mass at 0 decreases the nth moment. First we note that increasing p is not a valid change, because to maintain 𝔼[𝒲^{n−1}] the (n − 1)th moment of the strictly positive part would need to be increased above mT^{n−1}/(m + n − 1), which is not possible according to Corollary 1.
Let us now try to decrease rather than increase p. Consider a distribution whose mass at 0 has probability p̂ = p − Δμ_{n−1}, where Δ is a small positive number. In this case, the (n − 1)th moment of the strictly positive part, $\mu_{n-1}^+$, is
$\mu_{n-1}^+ = \mu_{n-1}/(1-\hat{p}) = mT^{n-1}/(m+n-1+\Delta mT^{n-1})$.
When the (n − 1)th moment of the strictly positive part, 𝒲⁺, is $\mu_{n-1}^+$, its nth moment is bounded by (10), and using that, we can write
𝔼[𝒲^n] = (1 − p̂)𝔼[(𝒲⁺)^n] < (1 − p̂)((m + n − 1)T/(m + n))μ⁺_{n−1} = ((m + n − 1)T/(m + n))μ_{n−1},
where the inequality is strict because the distribution of 𝒲⁺ is different from a truncated Erlang(m, λ) with λ tending to 0, since its (n − 1)th moment is less than mT^{n−1}/(m + n − 1).
While the moment bounds derived in Lemma 4 are independent of the dominant eigenvalue, the following results provide moment bounds as a function of λ.
Lemma 6
For n = 1, 2, …, the (n + 1)th moment of 𝒲 is bounded by
(1/λ)((m + n + λT)𝔼[𝒲^n] − (m + n − 1)T𝔼[𝒲^{n−1}]) ≤ 𝔼[𝒲^{n+1}] ≤ (1/λ)((m + n)𝔼[𝒲^n] − T^{n+1}f(T)/F(T)).
Proof. Multiplying both sides of (3) by (T − t)t^{n−1} and integrating from 0 to T gives
(n + 1)E_n(T) − nTE_{n−1}(T) + λ(TE_n(T) − E_{n+1}(T)) ≤ (m − 1)(TE_{n−1}(T) − E_n(T)),
from which it follows that
λE_{n+1}(T) ≥ (m + n + λT)E_n(T) − (m + n − 1)TE_{n−1}(T).
On the other hand, multiplying both sides of (3) by t^n and integrating from 0 to T gives
T^{n+1}f(T) − (n + 1)E_n(T) + λE_{n+1}(T) ≤ (m − 1)E_n(T),
from which we obtain
λE_{n+1}(T) ≤ (m + n)E_n(T) − T^{n+1}f(T),
which leads to the desired results after dividing both sides by E_0(T) = F(T).
Lemma 6 gives a tight moment bound, for which the upper and the lower limits coincide if 𝒴 is Erlang distributed. To get rid of the density function at the boundary, a loose version of Lemma 6 is
𝔼[𝒲^{n+1}] < ((m + n)/λ)𝔼[𝒲^n],
where the strict inequality indicates the loose boundary.
Now we look at the lower-order moments by focusing on moment bounds for the mean and SCV for FTPH1 distributions with b = 0.
Corollary 2
The SCV of 𝒲, SCV_𝒲 = 𝔼[𝒲²]/𝔼[𝒲]² − 1, is bounded by
((m + 1 + λT)𝔼[𝒲] − mT)/(λ𝔼[𝒲]²) − 1 ≤ SCV_𝒲 ≤ (m + 1)/(λ𝔼[𝒲]) − 1.
Proof. For n = 1, Lemma 6 gives
((m + 1 + λT)𝔼[𝒲] − mT)/λ ≤ 𝔼[𝒲²] ≤ ((m + 1)𝔼[𝒲] − T²f(T)/F(T))/λ,
whose right-hand side can be upper bounded by (m + 1)𝔼[𝒲]/λ, from which the corollary follows by dividing with (𝔼[𝒲])² and subtracting 1.
We note that the difference between the upper and the lower limits of SCV_𝒲 in Corollary 2 is
T(m − λ𝔼[𝒲])/(λ𝔼[𝒲]²),
for which, according to (8) and the definition of 𝒲, we have
λ𝔼[𝒲] ≤ λ𝔼[𝒴] ≤ m.
That is, the difference is nonnegative, and when λ tends to 0, the upper bound converges to ∞. In this case, the λ-independent upper limit mT/(m + 1) from Corollary 1 can be applied to the mean. Combining the results, we obtain 𝔼[𝒲] ≤ min{m/λ, mT/(m + 1)}.
Our final result of this subsection gives a lower bound of SCV𝒲 in terms of m only, which generalizes the result of Aldous and Shepp [Reference Aldous and Shepp1] mentioned in the introduction.
Theorem 2
The SCV of the FTPH1 random variable 𝒲 with support on (0, B) is bounded by SCV𝒲 ≥ 1/(m(m + 2)).
Proof. Recall that T = B − b = B. Define
g(t) = ((m − 1)f(t) − tf′(t))/(mF(T) − Tf(T)), 0 ≤ t < T.
By Lemma 1, (m − 1 − λt)f(t) − tf′(t) ≥ 0 for t > 0, which leads to (m − 1)f(t) − tf′(t) ≥ λtf(t) > 0 for t > 0. By integrating from 0 to T, we obtain
∫₀^T ((m − 1)f(t) − tf′(t)) dt = mF(T) − Tf(T) > 0. (13)
Consequently, g(t) is a density function of a random variable, to be called Y_T, with support [0, T). Note that $\int_0^T t^n{\text {d}}F(t) = \mathbb{E}[{\mathcal{Y}^n{\mathbb{I}_{\{\mathcal{Y} < T\}}}}] = E_n(T)$ for n = 0, 1, 2, …. By routine calculations, we obtain
𝔼[Y_T] = ((m + 1)E_1(T) − T²f(T))/(mF(T) − Tf(T)), 𝔼[Y_T²] = ((m + 2)E_2(T) − T³f(T))/(mF(T) − Tf(T)).
It is well known that $\mathbb{E}\left[{Y_T^2}\right] /(\mathbb{E}\left[{Y_T}\right])^2 \geq 1$. Using the above expressions, we obtain
((m + 2)E_2(T) − T³f(T))(mF(T) − Tf(T)) ≥ ((m + 1)E_1(T) − T²f(T))².
Recall that 𝒲 = 𝒴 | 𝒴 < T. We also note that 𝔼[𝒲²] = 𝔼[𝒴²𝕀_{𝒴<T}]/F(T) = E_2(T)/F(T) and 𝔼[𝒲] = 𝔼[𝒴𝕀_{𝒴<T}]/F(T) = E_1(T)/F(T). Dividing both sides of the above inequality by (F(T))² and writing c = Tf(T)/F(T), so that 0 ≤ c < m by (13), the above equation leads to
((m + 2)𝔼[𝒲²] − cT²)(m − c) ≥ ((m + 1)𝔼[𝒲] − cT)²,
that is,
𝔼[𝒲²]/𝔼[𝒲]² ≥ ((m + 1)²/(m(m + 2)))Θ(T), where Θ(T) = m(((m + 1)𝔼[𝒲] − cT)² + cT²(m − c))/((m + 1)²𝔼[𝒲]²(m − c)).
We want to show that Θ(T) ≥ 1 for all T > 0. Since mF(T) > Tf(T) according to (13), Θ(T) ≥ 1 is equivalent to
m((m + 1)𝔼[𝒲] − cT)² + mcT²(m − c) ≥ (m + 1)²𝔼[𝒲]²(m − c),
which, after expanding both sides and dividing by c (the case c = 0 being trivial), is equivalent to
m²T² + (m + 1)²𝔼[𝒲]² ≥ 2m(m + 1)T𝔼[𝒲].
The last equation holds by applying the well-known inequality a² + b² ≥ 2ab for any real numbers a and b, here with a = mT and b = (m + 1)𝔼[𝒲]. Thus, we have shown that Θ(T) ≥ 1 for all T > 0. Consequently, we have shown that
m(m + 2)𝔼[𝒲²] ≥ (m + 1)²𝔼[𝒲]²,
which is equivalent to SCV_𝒲 ≥ 1/(m(m + 2)).
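A quick numerical spot-check of the bound SCV_𝒲 ≥ 1/(m(m + 2)), using a hypothetical two-phase hyperexponential for 𝒴 truncated at several values of T (so m = 2 and the bound is 1/8); the truncated moments are computed by midpoint-rule integration:

```python
import math

# Theorem 2 check: SCV of W = (Y | Y < T) is at least 1/(m(m+2)) = 1/8
# for the 2-phase hyperexponential Y (rates 1, 3; weights 0.4, 0.6).
def f(t):
    return 0.4 * math.exp(-t) + 1.8 * math.exp(-3 * t)

def trunc_moment(n, T, steps=20000):
    h = T / steps
    num = sum(((i + 0.5) * h) ** n * f((i + 0.5) * h) * h for i in range(steps))
    den = sum(f((i + 0.5) * h) * h for i in range(steps))
    return num / den

m = 2
for T in [0.5, 1.0, 5.0]:
    scv = trunc_moment(2, T) / trunc_moment(1, T) ** 2 - 1
    assert scv >= 1 / (m * (m + 2))
```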
By Corollary 1, the lower bound of SCV_𝒲 is strict for all PH distributions with finite support. The following lemma and corollary show how the lower bound of SCV_𝒲 can be attained approximately by bounded Erlang distributions. For t ≥ 0, denote by
F_m(t) = 1 − Σ_{k=0}^{m−1} e^{−λt}(λt)^k/k! and f_m(t) = λ^m t^{m−1}e^{−λt}/(m − 1)!
the distribution function and density function of an Erlang random variable 𝒳^{(m,λ)}, respectively. By routine calculations, we obtain
𝔼[𝕀_{𝒳^{(m,λ)}<T}𝒳^{(m,λ)}] = (m/λ)F_{m+1}(T), 𝔼[𝕀_{𝒳^{(m,λ)}<T}(𝒳^{(m,λ)})²] = (m(m + 1)/λ²)F_{m+2}(T).
Lemma 7
For all T > 0, the following bounds apply for the distribution functions of Erlang random variables:
(m + 1)/(m + 2) ≤ F_m(T)F_{m+2}(T)/(F_{m+1}(T))² ≤ 1.
In addition, we have limT→0 F m(T)F m+2(T)/(F m+1(T))2 = (m + 1)/(m + 2) and limT→∞ F m(T)F m+2(T)/(F m+1(T))2 = 1.
Proof. For convenience, we denote λT as t in this proof. First, we prove the upper bound. Since $\text {e}^{\lambda t} = \sum_{k=0}^\infty (\lambda t)^k/k!$, we need to show that
which can be reduced to
Next, we compare the coefficients of t^k on both sides. For k = 2m + 2, the left-hand side is 1/(m!(m + 2)!) and the right-hand side is 1/((m + 1)!)². From
m!(m + 2)! = (m + 2)·m!(m + 1)! > (m + 1)·m!(m + 1)! = ((m + 1)!)²,
it follows that the left-hand side is smaller than the right-hand side. For k ≥ 2m + 3, we have, for k = j + m,
it follows that the left-hand side is smaller than the right-hand side. For k ≥ 2m + 3, we have, for k = j + m,
which is equivalent to m + 1 ≤ j, and holds since j = k − m ≥ m + 3. Consequently, we have shown the upper bound.
The proof of the lower bound is similar but tedious. The lower-bound expression can be rewritten as $(m+1)F^2_{m+1}(t/\lambda)\leq (m+2)F_m(t/\lambda)F_{m+2}(t/\lambda)$, which can be rewritten explicitly as
which leads to
To prove the above inequality, we compare the coefficients of t n on both sides. For n = 2m, we have
which holds. For n ≥ 2m + 1, we need to prove that
Separating the first and last terms of the summation and applying k ! = k(k − 1)!, we obtain
which leads to
For any i ∈ {m + 1, … , n − m − 1}, we have i ≤ n − m and m + 1 ≤ n − i, from which we can write
Considering that n − m − 1 − (m + 1) + 1 = n − 2m − 1 terms are summed on the right-hand side of (14), each of which is greater than or equal to (m + 2)/((n − m)! (m + 1)!), inequality (14) as well as the lower bound of Lemma 7 are proved. This completes the proof of the lemma.
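The ratio bounds of Lemma 7, (m + 1)/(m + 2) ≤ F_m(T)F_{m+2}(T)/(F_{m+1}(T))² ≤ 1, can be spot-checked numerically with the Erlang CDF written as a finite sum (a sketch, not part of the proof):

```python
import math

# Check of Lemma 7's ratio bounds
#   (m+1)/(m+2) <= F_m(T) F_{m+2}(T) / F_{m+1}(T)^2 <= 1
# using F_m(T) = 1 - sum_{k<m} e^{-lam*T} (lam*T)^k / k!.
def erlang_cdf(m, lam, T):
    x = lam * T
    return 1 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(m))

lam = 1.0
for m in [1, 2, 5]:
    for T in [0.1, 1.0, 10.0, 50.0]:
        ratio = (erlang_cdf(m, lam, T) * erlang_cdf(m + 2, lam, T)
                 / erlang_cdf(m + 1, lam, T) ** 2)
        assert (m + 1) / (m + 2) - 1e-12 <= ratio <= 1 + 1e-12
```

Small T approaches the lower limit (m + 1)/(m + 2) and large T approaches 1, matching the limits stated in the lemma.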
Immediate consequences of Lemma 7 are a lower bound and an upper bound of the SCV for bounded 𝒳(m,λ).
Corollary 3
Assume that 𝒳 has an Erlang distribution with parameters (m, λ). For all T > 0, we have
1/(m(m + 2)) ≤ SCV_{𝒳|𝒳<T} ≤ 1/m.
3.2. The b > 0 case
Let $\mathcal{Z}=b+\mathcal{W}=b+(\mathcal{Y}\mid \mathcal{Y} < T)$. Then, for $\mathbb{E}\left[{\mathcal{Z}^n}\right]$, we have
$\mathbb{E}[\mathcal{Z}^n]=\sum_{i=0}^n \binom{n}{i}\, b^{n-i} E_i(T)/E_0(T)$,
where E_i(T) is defined as before. That is, for n = 1, 2, we have
𝔼[𝒵] = b + E_1(T)/E_0(T), (15)
𝔼[𝒵²] = b² + 2bE_1(T)/E_0(T) + E_2(T)/E_0(T).
Corollary 4
The nth moment of an FTPH1 random variable $\mathcal{Z}$ with support on (b, B) is bounded by
$\mathbb{E}[\mathcal{Z}^n] \le \sum_{i=0}^n \binom{n}{i}\, b^{n-i}\, \frac{mT^i}{m+i}$.
For lower-order moments, according to (15), the mean of $\mathcal{Z}$ is bounded by
b ≤ 𝔼[𝒵] ≤ b + mT/(m + 1),
where both moment bounds are tight: the lower boundary is reached when 𝔼[𝒴] tends to 0, and the upper boundary is reached when 𝒴 is Erlang(m, λ) distributed and λ tends to 0.
For the SCV, we have the following corollary.
Corollary 5
The ${\rm SCV}_\mathcal{Z}$ is bounded by the following λ-independent and λ-dependent moment bounds:
SCV_𝒵 ≤ ((m + 1)T𝔼[𝒲]/(m + 2) − 𝔼[𝒲]²)/(b + 𝔼[𝒲])², SCV_𝒵 ≤ ((m + 1)𝔼[𝒲]/λ − 𝔼[𝒲]²)/(b + 𝔼[𝒲])².
Proof. From Lemmas 4 and 6 we have 𝔼[𝒲²] ≤ (m + 1)T𝔼[𝒲]/(m + 2) and
𝔼[𝒲²] ≤ (m + 1)𝔼[𝒲]/λ, (16)
respectively. Noting that SCV_𝒵 = (𝔼[𝒲²] − 𝔼[𝒲]²)/(b + 𝔼[𝒲])², subtracting 𝔼[𝒲]² from both bounds and then dividing by (b + 𝔼[𝒲])² gives the corollary.
Different from the b = 0 case, ${\rm SCV}_{\mathcal{Z}}$ can get arbitrarily close to 0 in the b > 0 case.
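A small numerical illustration of this point: since 𝒵 = b + 𝒲 is a deterministic shift, the variance is unchanged while the mean grows with b, so SCV_𝒵 = Var(𝒲)/(b + 𝔼[𝒲])² can be made arbitrarily small (𝒲 below is a truncated hypothetical hyperexponential used for illustration):

```python
import math

# Z = b + W shifts the mean but not the variance, so
# SCV_Z = Var(W)/(b + E[W])^2 decreases toward 0 as b grows,
# unlike the b = 0 case where SCV_W >= 1/(m(m+2)).
def f(t):
    return 0.4 * math.exp(-t) + 1.8 * math.exp(-3 * t)

T, steps = 1.0, 20000
h = T / steps
E0 = sum(f((i + 0.5) * h) * h for i in range(steps))
EW = sum(((i + 0.5) * h) * f((i + 0.5) * h) * h for i in range(steps)) / E0
EW2 = sum(((i + 0.5) * h) ** 2 * f((i + 0.5) * h) * h for i in range(steps)) / E0
var = EW2 - EW ** 2

prev = float("inf")
for b in [0.0, 1.0, 10.0, 100.0]:
    scv_z = var / (b + EW) ** 2
    assert scv_z < prev          # SCV_Z strictly decreases as b grows
    prev = scv_z
assert scv_z < 1e-3              # far below the b = 0 lower bound 1/8
```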
4. Discussion and conclusion
In this paper we presented new moment bounds on PH distributions with infinite and finite supports by using the steepest increase property. For PH distributions with infinite support and a PH representation (α, A) of size m, denoted as 𝒴, we have
• shown that any PH distribution is stochastically smaller than or equal to an Erlang distribution 𝒳(m,λ) with λ the absolute value of the dominant eigenvalue of A; and
• obtained upper bounds of moments in terms of m and λ (e.g. 𝔼[𝒴] ≤ m/λ).
For PH distributions with finite support (for the set FTPH1), denoted as 𝒲 = 𝒴 | 𝒴 < T, we have
• obtained upper bounds of moments in terms of m and T;
• obtained lower and upper bounds of moments depending on λ;
• shown that 𝔼[𝒲]≤ min{mT/(m + 1), m/λ}; and
• shown that SCV𝒲 ≥ 1/(m(m + 2)).
For the finite support case, we focused on the distribution set FTPH1. Results for the set FTPH2 can be obtained similarly. The set FTPH3 is a convex mixture of FTPH1 and FTPH2. Moment bounds can also be obtained as a convex mixture of the moment bounds obtained for FTPH1 and FTPH2, but this is outside the scope of the current work.
Acknowledgements
This work was supported by the Hungarian research project OTKA K123914, an NSERC Discovery Grant (Canada), and the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. The authors would like to thank two anonymous referees and an Associate Editor for their exceptionally careful reviews of the paper.