
Inhomogeneous phase-type distributions and heavy tails

Published online by Cambridge University Press:  11 December 2019

Hansjörg Albrecher*
Affiliation:
Université de Lausanne
Mogens Bladt*
Affiliation:
University of Copenhagen
*Postal address: Department of Actuarial Science, Université de Lausanne, Quartier UNIL-Chamberonne, Bâtiment Extranef, 1015 Lausanne, Switzerland.
**Postal address: Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark.

Abstract

We extend the construction principle of phase-type (PH) distributions to allow for inhomogeneous transition rates and show that this naturally leads to direct probabilistic descriptions of certain transformations of PH distributions. In particular, the resulting matrix distributions enable the carrying over of fitting properties of PH distributions to distributions with heavy tails, providing a general modelling framework for heavy-tail phenomena. We also illustrate the versatility and parsimony of the proposed approach in modelling a real-world heavy-tailed fire insurance dataset.

Type
Research Papers
Copyright
© Applied Probability Trust 2019 

1. Introduction

A phase-type (PH) distribution is the distribution of the time until absorption in a Markov jump process with finitely many states, of which one is absorbing and the remainder transient. PH distributions have a long history in applied probability (see [6] and references therein), especially in areas such as insurance risk and queueing theory, where under PH assumptions exact solutions and explicit formulae can often be derived even in rather complex models (see, e.g., [3, 4]). Although the class of phase-type distributions is dense in the class of distributions on the positive real line (in the sense of weak convergence), by construction PH tails are always light (they decay exponentially), which is a major concern for applications where heavy tails are present (such as large claims in insurance).

In [7] the class of PH distributions was extended to allow for infinite-dimensional matrices, which led to a class of distributions with phase-type-like properties and a genuinely heavy tail, and the estimation of such distributions was considered in [8].

In this paper we take another approach to defining a tractable and dense class of heavy-tailed distributions. The idea is to transform PH distributions into heavy-tailed ones, but rather than transforming the PH random variable directly, we transform the time scales of each state of the underlying Markov process, leading to time-dependent jump intensities. This time scaling allows us to carry over some of the computational tools and advantages of the PH class beyond it, and leads to a probabilistic description of classes of heavy-tailed distributions akin to the one for PH distributions. Compared to the approach in [7], the present approach has the advantage of preserving finite dimensionality, and offers flexibility and transparency in the choice of tail type. A distinctive disadvantage is that renewal theory and, consequently, ladder height techniques break down due to the inhomogeneity of time. However, we show that for certain transformations calculations are still explicit, and the flexibility of PH distributions can in this way be more efficiently transferred to heavy-tailed distributions than with previous approaches. Moreover, the resulting new class of matrix distributions is again dense in the class of all distributions on the positive half-line, which paves the way for using it in model fitting.

From a modelling perspective, the approach proposed in this paper has some attractive features. Let us recall that a particular strength of the class of PH distributions is that it extends the favourable properties of an exponential random variable (probabilistic ones such as the lack of memory, as well as analytic ones such as the simplicity of calculations due to the elementary and paramount role of the exponential function in real and complex analysis) to cases where the exponential random variable itself is not a good model. It does so by concatenating these properties in a Markov jump process framework, leading to matrix expressions, but still maintaining those favourable properties. The denseness of the PH class then in principle allows the approximation of any random variable and its role in the respective model with these techniques. On the other hand, Pareto random variables with their heavy tails can be motivated in various ways for particular modelling applications. One may, for instance, interpret them in terms of scaling phenomena in nature (as in considerations of Zipf’s law). One can also simply view them as model candidates with power tails, which decay more slowly than exponential ones. Indeed, in various applications such tail behaviour can be observed in data. Alternatively, one may also just see them as (possibly rescaled and shifted) exponentials of exponential random variables, which inherit some attractive mathematical properties, like a scaled version of the lack of memory (the latter being one reason for the particular attractiveness of Pareto random variables as modelling tools in certain applications, like reinsurance). At the same time, a pure Pareto random variable is often not a good fit to data (particularly when the entire range is considered). One way to deal with this in practice is to use spliced models (also sometimes called composite models), where one type of model is used for smaller values and a Pareto model is used for the tail (see, e.g., [16–18]).

A number of known distributions can be obtained as the distribution of a transformed exponential random variable. This is, for example, the case for the Pareto, the Weibull, and the generalized extreme value (GEV) distributions. The obvious idea is to propose a matrix version of these distributions by transforming a phase-type distributed random variable with the same transformations. We show that the resulting distributions have forms and properties similar to those of their original counterparts, and that they may be seen as inhomogeneous phase-type distributions. This provides a unifying framework for seeing transformed phase-type distributed random variables as absorption times.

Transformed phase-type random variables were, for example, treated in [1], where the logarithmic transformation is used; this results in a Pareto type of tail behaviour. For other transformations we may obtain tail behaviours of other types, e.g. Weibull or GEV. The class of transformed distributions of a certain type is again dense in the class of distributions on the positive reals. In fact, we will provide different sub-classes generated by mixtures of transformed Erlang distributions which are dense and can be written in an explicit and simple form. In principle a classical phase-type distribution may approximate any heavy-tailed positive distribution arbitrarily well, but since it is light-tailed the approximating distribution will always have a distinctly different tail, which can become an important issue when applied to situations where the tail behaviour matters (e.g. ruin probabilities). If one instead fits a transformed phase-type distribution to data with the ‘correct’ tail, then not only will one be able to capture the proper tail behaviour, but it will also lead to more parsimonious models (with fewer components) than fitting with the traditional PH class. On the other hand, the transformed phase-type distributions may have computational advantages (and possibly methodological advantages in terms of interpretability) over other competitors such as random variables with more general regularly varying tails.

From a statistical point of view one can apply standard techniques from phase-type fitting to the transformed random variables. This is, for instance, the case for the log-transformed phase-type distributions, where one simply fits a phase-type distribution to the logarithm of the data points. In [1], a log-phase-type distribution was fitted to Danish fire insurance data. We provide an example using Dutch insurance data, where the main purpose is a comparison of our approach with a previously obtained model in [2] which was based on splicing Erlang and Pareto distributions. Since the splicing components are in some sense implicit in the log-phase-type setup, it is interesting to see whether we can retrieve parts of that model (and in a more automated way).

While our estimation method is based on maximum likelihood, which treats each data point with the same weight, it may be interesting in future research to extend the respective statistical techniques to incorporate extreme value analysis considerations for the fitting procedure of this type of model, with more weight being given to larger data points.

The paper is organized as follows. In Section 2 we derive various theoretical properties of inhomogeneous PH distributions. On the basis of this representation, Section 3 gives various properties of the class of matrix-Pareto distributions and provides some details for several special cases. Subsequently, Section 4 considers other transformations of PH distributions and studies some properties of the resulting distribution classes. In Section 5 we then discuss in more detail the modelling dimension of matrix-Pareto distributions and illustrate its use for a set of Dutch fire insurance data that was recently studied with other statistical techniques in [2]. Finally, Section 6 concludes the paper.

2. Inhomogeneous phase-type distributions

Consider a time-inhomogeneous Markov jump process [12] $\{ X_t \}_{t\geq 0}$ on the finite state-space $E=\{1,2,\ldots,p,p+1\}$ , where states $1,2,\ldots,p$ are transient and $p+1$ is absorbing. Thus, the intensity matrix of $\{ X_t \}_{t\geq 0}$ is of the form

\begin{equation} \boldsymbol{\Lambda}(t) = \begin{pmatrix} \textbf{T}(t) & \textbf{t}(t) \\ \boldsymbol{0} & 0 \end{pmatrix}\!. \end{equation}

Here, for any time $t\ge 0$ , $\textbf{t}(t)=-\textbf{T}(t)\textbf{e}$ , where $\textbf{e}$ denotes the column vector $\textbf{e}=(1,1,\ldots,1)^\top$ . Assume that ${\mathrm{P}} (X_0=p+1)=0$ and define $\boldsymbol{\pi}=(\pi_1,\ldots,\pi_p)$ , where $\pi_i={\mathrm{P}} (X_0=i)$ .

Definition 2.1. Let

\begin{equation*} \tau = \inf \{ t\geq 0 \,:\, X_t=p+1 \} \end{equation*}

denote the time until absorption of $\{ X_t\}$ . The distribution of $\tau$ is then said to be an inhomogeneous phase-type distribution with representation ${(\boldsymbol{\pi},\textbf{T}(t))}$ , and we write $\tau \sim \mbox{IPH}(\boldsymbol{\pi},\textbf{T}(t))$ .

The transition matrix $\textbf{P}(s,t)=\{ p_{ij}(s,t) \}$ , where

\begin{equation*} p_{ij}(s,t)={\mathrm{P}} (X_t=j \mid X_s=i) , \end{equation*}

is related to the intensity matrix in terms of the product integral (see [11, 19])

\begin{equation*} \textbf{P}(s,t)= \prod_{s}^t (\textbf{I}+\boldsymbol{\Lambda}(u)\,{\rm d} u) , \end{equation*}

which is defined by

\begin{equation*} \prod_{s}^t (\textbf{I}+\boldsymbol{\Lambda}(u)\,{\rm d} u) = \textbf{I} + \sum _ { k = 1} ^ { \infty } \int _ { s } ^ { t } \int _ { s } ^ { u _ { k } } \cdots \int _ { s } ^ { u _ { 2} } \boldsymbol{\Lambda} ( u _ { 1 } ) \cdots \boldsymbol{\Lambda} ( u _ { k} ) \,{\rm d} u _ { 1} \cdots \,{\rm d} u _ { k } . \end{equation*}

It is then straightforward to see that

(1) \begin{equation} \textbf{P}(s,t) = \begin{pmatrix} \displaystyle \prod_{s}^t (\textbf{I}+\textbf{T}(u)\,{\rm d} u) & \textbf{e}-\displaystyle \prod_{s}^t (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e} \\ \boldsymbol{0} & 1 \end{pmatrix}, \label{eqn1} \end{equation}

where $\textbf{e}=(1,1,\ldots,1)^{\top}$ .

Theorem 2.2. Assume that $\tau \sim {\rm IPH}(\boldsymbol{\pi},\textbf{T}(t))$ . Then the density f and the distribution function F of $\tau$ are given by

\begin{align*} f(x)&= \boldsymbol{\pi}\prod_{0}^x (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{t}(x) , \\ F(x)&= 1 - \boldsymbol{\pi}\prod_{0}^x (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e} . \end{align*}

Proof. Since ${(\boldsymbol{\pi},0) \textbf{P}(0,t)}$ is the distribution of $X_t$ , we get from (1) that

\begin{equation*} {\mathrm{P}} (X_t=j) = \boldsymbol{\pi}\prod_{0}^t (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e}_j .\end{equation*}

The density f for $\tau$ is now readily derived as

\begin{align*} f(x)\,{\rm d} x &= {\mathrm{P}} (\tau\in (x,x+{\rm d}x]) \\ &= \sum_{j=1}^p \boldsymbol{\pi}\prod_{0}^x (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e}_j t_j(x)\,{\rm d} x \\ &= \boldsymbol{\pi}\prod_{0}^x (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{t}(x)\,{\rm d} x , \end{align*}

so that

\begin{equation*} f(x) = \boldsymbol{\pi}\prod_{0}^x (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{t}(x) . \label{eq:density} \end{equation*}

The distribution function F for $\tau$ is obtained by

\begin{align*} 1-F(x) &= {\mathrm{P}} (\tau >x) \\ &={\mathrm{P}} (X_x \in \{ 1,2,\ldots,p\} ) \\ &=\sum_{j=1}^p {\mathrm{P}} (X_x=j) \\ &= \boldsymbol{\pi}\prod_{0}^x (\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e} . \tag*{$\square$} \end{align*}

By an entirely similar argument we obtain the following.

Theorem 2.3. If $\tau \sim {\rm IPH}(\boldsymbol{\pi},\textbf{T}(t))$ then

\begin{equation} {\mathrm{P}} (\tau >s+t \mid \tau>s) = \frac{\boldsymbol{\pi} \prod_{0}^s(\textbf{I}+\textbf{T}(u)\,{\rm d} u)}{\boldsymbol{\pi} \prod_{0}^s(\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e}}\prod_s^{s+t}(\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e} \end{equation}

so that

\begin{equation*} \tau-s \mid \{\tau>s \} \sim {\rm IPH}( \boldsymbol{\alpha}, \textbf{S}(\!\cdot\!) ) ,\end{equation*}

where $\textbf{S}(u)=\textbf{T}(s+u)$ and

\begin{equation*} \boldsymbol{\alpha}= \frac{\boldsymbol{\pi} \prod_{0}^s(\textbf{I}+\textbf{T}(u)\,{\rm d} u)}{\boldsymbol{\pi} \prod_{0}^s(\textbf{I}+\textbf{T}(u)\,{\rm d} u)\textbf{e}} .\end{equation*}

Hence the overshoot over a certain level is again inhomogeneous phase-type distributed. Methods for numerical evaluation of transition probabilities given by the product integral are treated in, e.g., [13] or [14]. We shall, however, concentrate on the cases where the matrices $\textbf{T}(u)$ commute, which will simplify expressions and numerical methods considerably.

Corollary 2.4. If $\tau \sim {\rm IPH}(\boldsymbol{\pi},\textbf{T}(t))$ and $\textbf{T}(t_1)$ and $\textbf{T}(t_2)$ commute for all $t_1,t_2\geq 0$ , then the density f and the distribution function F of $\tau$ are given by

\begin{align*} f(x) &= \boldsymbol{\pi}\exp \bigg( \int_0^x \textbf{T}(u)\,{\rm d} u \bigg)\textbf{t}(x) , \\ F(x) &= 1 - \boldsymbol{\pi}\exp \bigg( \int_0^x \textbf{T}(u)\,{\rm d} u \bigg)\textbf{e} . \end{align*}

Proof. If $\textbf{T}(t_1)$ and $\textbf{T}(t_2)$ commute for all $t_1,t_2\geq 0$ , then

\begin{align*} \int_{s}^{t}\int_{s}^{u_{k}}\cdots\int_{s}^{u_{2}} \textbf{T}(u_{k})\cdots\textbf{T}(u_{1})\,{\rm d} u_{1}\cdots{\rm d} u_{k} &= \frac{1}{k!}\int_{s}^{t}\int_{s}^{t}\cdots\int_{s}^{t} \textbf{T}(u_{k})\cdots\textbf{T}(u_{1})\,{\rm d} u_{1}\cdots{\rm d} u_{k} \\ &= \frac{1}{k!}\bigg(\int_s^t \textbf{T}(u)\,{\rm d} u\bigg)^{k}, \end{align*}

so

\begin{equation*} \prod_{s}^t (\textbf{I}+\textbf{T}(u)\,{\rm d} u) = \exp \bigg( \int_s^t \textbf{T}(u)\,{\rm d} u \bigg), \end{equation*}

from which the result follows.

An important special case where all matrices $\textbf{T}(\!\cdot\!)$ commute is the class of sub-intensity matrices of the form $\textbf{T}(t) = \lambda (t) \textbf{T}$ , which corresponds to the case where all the transition rates of the Markov chain are scaled in the same way. More generally, block-diagonal matrices of the form

\begin{equation*} \begin{pmatrix} \textbf{S}(u) & \boldsymbol{0} \\ \boldsymbol{0} & \textbf{T}(u) \end{pmatrix} \end{equation*}

commute whenever $\{ \textbf{S}(u) \}$ commute and $\{ \textbf{T}(u) \}$ commute. This includes the case where one of the two families of matrices is constant, e.g. $\textbf{S}(u)=\textbf{S}$ for all $u\geq 0$ . In the latter case, the resulting distribution will be a mixture of an inhomogeneous and a homogeneous phase-type distribution.

Definition 2.5. If $\textbf{T}(t) = \lambda (t) \textbf{T}$ , where $\lambda (t)$ is some known non-negative real function and $\textbf{T}$ is a sub-intensity matrix, then we shall write $\tau \sim \mbox{IPH}(\boldsymbol{\pi},\textbf{T},\lambda )$ instead of $\mbox{IPH}(\boldsymbol{\pi},\textbf{T}(t))$ .

Corollary 2.6. If $\tau \sim {\rm IPH}(\boldsymbol{\pi},\textbf{T},\lambda )$ , then the density f and the distribution function F of $\tau$ are given by

\begin{align*} f(x) &= \lambda (x) \boldsymbol{\pi}\exp \bigg( \int_0^x \lambda (t)\,{\rm d} t\, \textbf{T} \bigg)\textbf{t}, \\ F(x) &= 1 - \boldsymbol{\pi}\exp \bigg( \int_0^x \lambda (t)\,{\rm d} t\, \textbf{T} \bigg)\textbf{e}. \end{align*}

Proof. Follows immediately from

\begin{align*} \textbf{t}(t) & = -\textbf{T}(t)\textbf{e} = -\lambda (t) \textbf{T}\textbf{e} = \lambda (t) \textbf{t} . \tag*{$\square$} \end{align*}
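For the commuting case $\textbf{T}(t)=\lambda(t)\textbf{T}$ , these formulas are straightforward to evaluate numerically. The following Python sketch is our own illustration of Corollary 2.6 (the function and parameter names, and the example values, are assumptions, not from the paper):

```python
# A minimal numerical sketch of Corollary 2.6 (illustrative assumptions).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

def iph_density_cdf(x, pi_vec, T, lam):
    """Density f(x) and cdf F(x) of IPH(pi, T, lambda) via
    f(x) = lambda(x) pi exp(L(x) T) t and F(x) = 1 - pi exp(L(x) T) e,
    where L(x) = int_0^x lambda(t) dt."""
    e = np.ones(T.shape[0])
    t_vec = -T @ e                       # exit rate vector t = -T e
    L, _ = quad(lam, 0.0, x)             # L(x) = int_0^x lambda(t) dt
    E = expm(L * T)
    return lam(x) * pi_vec @ E @ t_vec, 1.0 - pi_vec @ E @ e

# lambda(t) = 1/(1+t) corresponds to g(x) = e^x - 1 (the matrix-Pareto case
# of Section 3).
pi_vec = np.array([1.0, 0.0])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
print(iph_density_cdf(1.5, pi_vec, T, lambda t: 1.0 / (1.0 + t)))
```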

Remark 2.7. The commutativity condition of the matrices $\textbf{T}(t)$ may be slightly relaxed by assuming only quasi-commutativity of the family $\{ \textbf{T}(t)\}$ , which is characterized by the property

\begin{equation*} \frac{{\rm d}}{{\rm d}t} \bigg( \int_s^t \textbf{T}(u)\,{\rm d} u \bigg)^k = k \bigg( \int_s^t \textbf{T}(u)\,{\rm d} u \bigg)^{k-1} \textbf{T}(t) , \end{equation*}

see [9, Section 1.4]. In this case the above expressions for the density and the distribution function remain valid.

If $\tau \sim \mbox{IPH}(\boldsymbol{\pi},\textbf{T},\lambda )$ and $\lambda (t) \equiv 1$ , then $\tau$ has a conventional phase-type distribution (see [6]), in which case we write $\tau\sim \mbox{PH}(\boldsymbol{\pi},\textbf{T})$ . Now consider $X\sim \mbox{PH}(\boldsymbol{\pi},\textbf{T})$ , $g\,:\,\mathbb{R}_+\rightarrow \mathbb{R}_+$ an increasing and differentiable function, and $Y:=g(X)$ . Then

\begin{align*} \bar{F}_Y(y)&=1-F_Y(y) = {\mathrm{P}} (g(X)> y ) \\ &={\mathrm{P}} (X> g^{-1}(y)) \\ &=\boldsymbol{\pi}{\rm e}^{\textbf{T}g^{-1}(y)}\textbf{e}, \end{align*}

and consequently, using that $\textbf{t}=-\textbf{T}\textbf{e}$ ,

\begin{equation*} f_Y(y) = \frac{{\rm d}}{{\rm d}y}F_Y(y) = \boldsymbol{\pi}{\rm e}^{\textbf{T}g^{-1}(y)}\textbf{t} \frac{1}{g^\prime (g^{-1}(y))} .\label{eq:dens} \end{equation*}

This suggests that a random variable $\tau\sim \mbox{IPH}(\boldsymbol{\pi},\textbf{T},\lambda )$ can be obtained as a transformation of a phase-type distributed random variable $X\sim \mbox{PH}(\boldsymbol{\pi},\textbf{T})$ as follows.

Theorem 2.8. Let $\tau \sim {\rm IPH}(\boldsymbol{\pi},\textbf{T},\lambda )$ and let g be defined in terms of its inverse function through

\begin{equation*} g^{-1} (x) = \int_0^x \lambda (t)\,{\rm d} t . \end{equation*}

Then

\begin{equation*} \tau \stackrel{{\rm d}}{=} g(X) , \end{equation*}

where $X\sim {\rm PH}(\boldsymbol{\pi},\textbf{T})$ .

Proof. First, we notice that g is well defined since $\lambda (t)>0$ for all t, and correspondingly $x\rightarrow \int_0^x \lambda (t)\,{\rm d} t$ is strictly increasing. Then consider $Y=g(X)$ . We have that

\begin{align*} {\mathrm{P}} (Y>y)&= {\mathrm{P}} ( X> g^{-1}(\kern1pty) ) \\ &= {\mathrm{P}} \bigg( X> \int_0^y \lambda (t)\,{\rm d} t \bigg) \\ &= \boldsymbol{\pi} \exp \bigg( \int_0^y \lambda (t)\,{\rm d} t \, \textbf{T} \bigg) \textbf{e} . \end{align*}

Thus $Y\sim \tau$ .
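Theorem 2.8 also yields a simple simulation recipe: draw $X\sim {\rm PH}(\boldsymbol{\pi},\textbf{T})$ by running the underlying jump chain and return g(X). The following Python sketch is our own illustration (all parameter values are assumptions) and checks the resulting empirical survival function against Corollary 2.6:

```python
# A sketch of Theorem 2.8: simulate tau ~ IPH(pi, T, lambda) as g(X).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def sample_ph(pi_vec, T):
    """One draw of the absorption time of the homogeneous chain (pi, T)."""
    p = T.shape[0]
    exit_rates = -T @ np.ones(p)                       # t = -T e
    state = rng.choice(p, p=pi_vec)
    time = 0.0
    while True:
        rate = -T[state, state]
        time += rng.exponential(1.0 / rate)
        # next state: off-diagonal rates plus absorption (index p)
        probs = np.append(np.clip(T[state], 0.0, None), exit_rates[state]) / rate
        nxt = rng.choice(p + 1, p=probs / probs.sum())
        if nxt == p:
            return time
        state = nxt

# g(x) = e^x - 1, i.e. lambda(t) = 1/(1+t): matrix-Pareto draws.
pi_vec = np.array([1.0, 0.0])
T = np.array([[-2.0, 1.0], [0.0, -3.0]])
draws = np.array([np.expm1(sample_ph(pi_vec, T)) for _ in range(100_000)])

# Empirical vs. exact survival at x = 1: P(tau > 1) = pi exp(log(2) T) e.
exact = pi_vec @ expm(np.log(2.0) * T) @ np.ones(2)
print((draws > 1.0).mean(), exact)
```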

If h is an analytic function and $\textbf{A}$ is a matrix, we define

\begin{equation*} h(\textbf{A}) = \frac{1}{2\pi {\rm i}} \oint_\gamma h(z)(z\textbf{I}-\textbf{A})^{-1}\,{\rm d} z , \end{equation*}

where $\gamma$ is a simple path enclosing the eigenvalues of $\textbf{A}$ . We refer to Section 3.4 of [6] for further details.

Theorem 2.9. Let $\tau \sim {\rm IPH}(\boldsymbol{\pi},\textbf{T},\lambda )$ , with g defined in terms of its inverse function through

\begin{equation*} g^{-1} (x) = \int_0^x \lambda (t)\,{\rm d} t . \end{equation*}

Assume that the Laplace transform for g, $L_g(s)$ , exists for all $s>\max_i {\rm Re}(\lambda_i)$ , where $\lambda_i$ denote the eigenvalues for $\textbf{T}$ . Then the mean of $\tau$ is given by

(2) \begin{equation} {\mathrm{E}} (\tau ) = \boldsymbol{\pi} L_g({-}\textbf{T})\textbf{t} . \label{eqn2} \end{equation}

More generally, if $L_{g^\alpha}$ exists in the same domain as above, where $\alpha>0$ , then

(3) \begin{equation} {\mathrm{E}} (\tau^\alpha ) = \boldsymbol{\pi} L_{g^\alpha}({-}\textbf{T})\textbf{t} , \label{eqn3} \end{equation}

where $L_{g^\alpha}$ denotes the Laplace transform of $g^\alpha$ .

Proof. Laplace transforms are analytic functions in the domain where they exist, so

\begin{equation*} {\mathrm{E}} (\tau)={\mathrm{E}} g(X) = \int_0^\infty g(x)\boldsymbol{\pi}{\rm e}^{\textbf{T}x}\textbf{t} \,{\rm d} x = \boldsymbol{\pi}L_g ({-}\textbf{T})\textbf{t} . \end{equation*}

Similarly, if $L_{g^\alpha}$ exists, then it is analytic and

\begin{align*} {\mathrm{E}} (\tau^\alpha)={\mathrm{E}} g^\alpha (X) = \boldsymbol{\pi}L_{g^\alpha} ({-}\textbf{T})\textbf{t} . \tag*{$\square$} \end{align*}

We notice the following trivial but important result.

Theorem 2.10. Suppose that

\begin{equation*} \tau_i \sim g(X_i) \end{equation*}

where $X_1,\ldots,X_N$ are independent and $X_i\sim {\rm PH}(\boldsymbol{\pi}_i,\textbf{T}_i)$ . Let X be distributed as the mixture of $X_1,\ldots,X_N$ , with density

\begin{equation*} f_X(x)=\sum_{i=1}^N \alpha_i\ f_{X_i}(x), \end{equation*}

where $\alpha_1+ \cdots +\alpha_N=1$ . Then the mixture of $\tau_1,\ldots,\tau_N$ with probabilities $\alpha_1,\ldots,\alpha_N$ is distributed as g(X).

3. Matrix-Pareto distributions

We start by proving the following result.

Theorem 3.1. Let $Z=\exp (X) -1$ where $X\sim {\rm PH}(\boldsymbol{\pi},\textbf{T})$ . Let $F_Z$ , $\bar{F}_Z$ , and $f_Z$ denote the distribution function, survival function, and density of Z, respectively. Then

\begin{align*} \bar{F}_Z(z) &= 1-F_Z(z)=\boldsymbol{\pi}(z+1)^{\textbf{T}}\textbf{e} , \\ f_Z(z) &= \boldsymbol{\pi}(z+1)^{\textbf{T}-\textbf{I}}\textbf{t} . \end{align*}

If the real part of the eigenvalue for $\textbf{T}$ with maximum real part is smaller than $-\alpha$ , then

\begin{equation*} {\mathrm{E}} ((Z+1)^\alpha) = \boldsymbol{\pi}({-}\alpha\textbf{I}-\textbf{T} )^{-1}\textbf{t} <\infty. \end{equation*}

The Laplace transform of Z is given by

\begin{equation} L_Z(s) = {\rm e}^{s}\boldsymbol{\pi}s^{-\textbf{T}}\Gamma (\textbf{T},s)\textbf{t} , \ \ \qquad s>0, \end{equation}

where

\begin{equation*} \Gamma (\alpha,s) = \int_s^\infty t^{\alpha-1}{\rm e}^{-t}\,{\rm d} t \end{equation*}

is the upper incomplete $\Gamma$ function.

Proof. Assuming that $z>0$ , the function $x\rightarrow x\log(1+z)$ is analytic in the whole complex plane, and hence so is its exponential $x\rightarrow (1+z)^x$ . Therefore ${(1+z)^{\textbf{T}}}$ can be written in terms of the Cauchy integral formula

\begin{equation} (1+z)^{\textbf{T}}=\frac{1}{2\pi{\rm i}}\oint_\gamma (1+z)^y (\kern1pty\textbf{I}-\textbf{T})^{-1}\,{\rm d} y , \end{equation}

where $\gamma$ is a simple path in $\mathbb{C}$ enclosing the eigenvalues of $\textbf{T}$ . The results then follow from Theorem 2.8 with $\lambda (y)={1}/(1+y)$ (which indeed results in $g(x)={\rm e}^x-1$ ) and Corollary 2.6.

The mean is straightforward, as the mean of ${\rm e}^{X}$ is simply the moment-generating function of X evaluated at 1. The $\alpha$ -moment of $Z+1$ is obtained by noting that the Laplace transform of $x\rightarrow{\rm e}^{\alpha x}$ is $s\rightarrow 1/(s-\alpha )$ and using (3). Concerning the Laplace transform,

\begin{align} L_Z(s) &= \int_0^{\infty}{\rm e}^{-sz} f_Z(z)\,{\rm d} z \\ &= \boldsymbol{\pi}\int_0^{\infty}(1+z)^{\textbf{T}-\textbf{I}}{\rm e}^{-sz}\,{\rm d} z \, \textbf{t} \\ &= {\rm e}^{s}\boldsymbol{\pi}s^{-\textbf{T}}\Gamma (\textbf{T},s)\textbf{t} , \end{align}

since $\alpha\rightarrow \Gamma (\alpha ,s)$ is an entire function for $s\neq 0$ (see [15, Section 8.2]).

Remark 3.2. In [Reference Ahn, Kim and Ramaswami1], similar results are proved for the distribution of $\exp (X)$ (not subtracting 1) apart from the Laplace transform. Their results are equivalent, though the presentation differs in notation and in our use of functional calculus in the proof.

Remark 3.3. If the real part of the eigenvalue for $\textbf{T}$ with maximum real part is smaller than $-1$ , then the mean of $Z={\rm e}^{X}-1$ exists and is given by

\begin{equation*} {\mathrm{E}} (Z)= \boldsymbol{\pi}({-}\textbf{I}-\textbf{T} )^{-1}\textbf{t}-1 .\end{equation*}
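As a quick numerical sanity check of this remark (with an assumed toy sub-intensity matrix whose eigenvalues lie to the left of $-1$ ):

```python
# Sanity check of Remark 3.3; the eigenvalues of T (-3 and -2) lie to the
# left of -1, so E(Z) exists. Toy parameters assumed.
import numpy as np

pi_vec = np.array([1.0, 0.0])
T = np.array([[-3.0, 1.0],
              [0.0, -2.0]])
t_vec = -T @ np.ones(2)

mean_Z = pi_vec @ np.linalg.solve(-np.eye(2) - T, t_vec) - 1.0

# Monte Carlo: for this chain, X is Exp(3) plus, with probability 1/3,
# an independent Exp(2) sojourn.
rng = np.random.default_rng(0)
n = 200_000
x = rng.exponential(1/3, n) + (rng.random(n) < 1/3) * rng.exponential(1/2, n)
print(mean_Z, np.expm1(x).mean())   # both should be close to 1
```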

Remark 3.4. The matrix $\Gamma (\textbf{T},s)$ is given by

\begin{equation*} \Gamma (\textbf{T},s) = \frac{1}{2\pi{\rm i}} \oint_\gamma \Gamma (z,s)(z\textbf{I}-\textbf{T})^{-1}\,{\rm d} z , \end{equation*}

where $\gamma$ is a simple path enclosing the eigenvalues of $\textbf{T}$ . If all the eigenvalues are simple (i.e. have multiplicity one), then $ \Gamma (\textbf{T},s)$ can be readily calculated by the residue theorem, as the terms only involve $\Gamma (\lambda_i,s)$ divided by polynomials. On the other hand, if an eigenvalue $\lambda$ has multiplicity $m>1$ , then also the $(m-1)$ th-order derivative of a polynomial times $\Gamma (z,s)$ with respect to z will have to be evaluated at $\lambda$ , which in turn involves derivatives of $z\rightarrow \Gamma (z,s)$ of orders $1,\ldots,m-1$ .

While the distribution of $\exp (X)$ can naturally be called a log-phase-type distribution, we shall refer to distributions of $\exp (X)-1$ as matrix-Pareto distributions, since the distribution of Z from Theorem 3.1 may be seen as a Pareto distribution with a matrix parameter. A previous example conforming to this idea relates to the class of matrix-exponential distributions (containing the phase-type distributions), which bears its name from the fact that its elements can be seen as exponential distributions with a matrix parameter.

To gain generality, and in particular in order to draw on the analogy with the generalized Pareto distribution $G_{\xi,\beta}(x)=1-(1+\xi x/\beta)^{-1/\xi}$ for $\xi>0$ (cf. [10]), we introduce the following additional scaling in the transformation (for reasons that become clear in Theorem 3.6, it is useful to express the scaling in units of $\mu={\mathrm{E}} (X)$ ):

Definition 3.5. (Matrix-Pareto distribution.) Let $X\sim \mbox{PH}(\boldsymbol{\pi},\textbf{T})$ and define

\begin{equation*} Y = \frac{\beta ({\rm e}^{X}-1)}{\mu } , \end{equation*}

where $\beta>0$ . If we parametrize with

\begin{equation*} \mu = {\mathrm{E}} (X) = \boldsymbol{\pi}({-}\textbf{T})^{-1}\textbf{e} ,\end{equation*}

we say that the distribution of Y follows a matrix-Pareto distribution, and write

\begin{equation*} Y \sim \mbox{M-Pareto}(\beta,\boldsymbol{\pi},\textbf{T}) .\end{equation*}

If $ Y \sim \mbox{M-Pareto}(\beta,\boldsymbol{\pi},\textbf{T})$ , then

\begin{equation*} Y \sim \frac{\beta}{\mu} Z , \end{equation*}

where $Z={\rm e}^X-1$ . Then, from Theorem 3.1 we get that

\begin{equation*} {\mathrm{P}} (Y>y) = \boldsymbol{\pi} \bigg( 1+ \mu \frac{y}{\beta} \bigg)^{\textbf{T}}\textbf{e} \end{equation*}

and

\begin{equation*} f_Y(y)=\frac{\mu}{\beta}\boldsymbol{\pi}\bigg(1+\mu \frac{y}{\beta}\bigg)^{\textbf{T}-\textbf{I}}\textbf{t} .\end{equation*}

Moreover, by a simple transformation of the formula in Theorem 3.1, we get that Y has Laplace transform

\begin{equation*} L_Y(s)= {\rm e}^{\beta s/\mu}\boldsymbol{\pi}\bigg( \frac{\mu}{s\beta} \bigg)^{\textbf{T}}\Gamma (\textbf{T},s\beta/\mu)\textbf{t} . \end{equation*}

Next, consider ${\mathrm{P}} (Y>x+y \mid Y>x)$ . From Theorem 2.3 we then get that

\begin{align} {\mathrm{P}} (Y>x+y \mid Y>x) &= \frac{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} }{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} \textbf{e}}\cdot \bigg[ \frac{ 1+\mu \frac{x}{\beta}+ \mu \frac{y}{\beta}}{1+\mu \frac{x}{\beta}} \bigg]^{\textbf{T}}\textbf{e} \\ &= \frac{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} }{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} \textbf{e}}\cdot \bigg[ 1+\frac{\mu }{\beta+\mu x} y \bigg]^{\textbf{T}}\textbf{e}, \end{align}

from which we see that conditionally on $Y>x$ , Y has a matrix-Pareto distribution given by

\begin{equation*} \mbox{M-Pareto}\left( \beta +\mu x,\frac{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} }{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} \textbf{e}},\textbf{T} \right)\!. \end{equation*}

We collect the previous results in the following theorem.

Theorem 3.6. Let $Y\sim {\rm M-Pareto}(\beta,\boldsymbol{\pi},\textbf{T})$ . Denote by $\bar{F}(y)=\bar{F}_{\beta,\boldsymbol{\pi},\textbf{T}}(y)={\mathrm{P}} (Y>y)$ the survival function of Y. Then:

(a) $\bar{F}_{\beta,\boldsymbol{\pi},\textbf{T}}(y)= \boldsymbol{\pi} \big( 1+ \mu \frac{y}{\beta}\big)^{\textbf{T}}\textbf{e}$ .

(b) $f_Y(y)=\frac{\mu}{\beta}\boldsymbol{\pi}\big(1+\mu \frac{y}{\beta}\big)^{\textbf{T}-\textbf{I}}\textbf{t}$ .

(c) ${\mathrm{P}} (Y>x+y \mid Y>x) = \bar{F}_{\beta +\mu x, \tilde{\boldsymbol{\pi}},\textbf{T}}(y)$ , where

\begin{equation*} \tilde{\boldsymbol{\pi}}= \frac{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} }{\boldsymbol{\pi} \big(1 +\mu \frac{x}{\beta} \big)^{\textbf{T}} \textbf{e}} .\end{equation*}

(d) $L_Y(s)= {\rm e}^{\beta s/\mu}\boldsymbol{\pi}\big( \frac{\mu}{s\beta} \big)^{\textbf{T}}\Gamma (\textbf{T},s\beta/\mu)\textbf{t}$ .

(e) If the real part of the eigenvalue for $\textbf{T}$ with maximum real part is less than $-\alpha$ , then

\begin{equation*}{\mathrm{E}} \bigg(\bigg(\frac{\mu}{\beta}Y+1\bigg)^\alpha\bigg) = \boldsymbol{\pi}({-}\alpha\textbf{I}-\textbf{T} )^{-1}\textbf{t} <\infty.\end{equation*}
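Property (c) can be checked numerically. The following sketch (with assumed toy parameters) compares the conditional survival probability with the survival function of the updated matrix-Pareto distribution:

```python
# Sketch checking Theorem 3.6(c) numerically (toy parameters assumed):
# conditionally on Y > x, the overshoot is again matrix-Pareto with scale
# beta + mu*x and updated initial vector pi_tilde.
import numpy as np
from scipy.linalg import expm

pi_vec = np.array([0.5, 0.5])
T = np.array([[-2.0, 1.0], [0.0, -3.0]])
e = np.ones(2)
mu = pi_vec @ np.linalg.solve(-T, e)          # mu = E(X) = pi (-T)^{-1} e

def surv(y, beta, init):
    """Survival function init (1 + mu*y/beta)^T e of M-Pareto(beta, init, T)."""
    return init @ expm(np.log(1.0 + mu * y / beta) * T) @ e

beta, x, y = 1.0, 2.0, 3.0
lhs = surv(x + y, beta, pi_vec) / surv(x, beta, pi_vec)

M = expm(np.log(1.0 + mu * x / beta) * T)     # (1 + mu*x/beta)^T
pi_tilde = pi_vec @ M / (pi_vec @ M @ e)
rhs = surv(y, beta + mu * x, pi_tilde)
print(lhs, rhs)                               # the two values should agree
```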

Remark 3.7. Note that these results can indeed be seen as generalizations of results from Theorem 3.4.13 of [10], which states such properties for the generalized Pareto distribution (which for $\xi>0$ refers to the scalar case $p=1$ in our setting). In particular, property (c) above (cf. also [1, Eq. (7)]) is a considerable extension of [10, Eq. (3.53)], which can be quite useful when using models based on matrix-Pareto distributions. For instance, one reason for the popularity of Pareto distributions among heavy-tailed distributions in reinsurance modelling is the simple pricing formulas for XL reinsurance contracts based on [10, Eq. (3.53)], where x will typically refer to the retention level. The present extension shows that these properties can be carried over to the much more general class of matrix-Pareto distributions.

Example 3.8. Consider the distribution of

\begin{equation*} Y={\rm e}^X-1, \end{equation*}

where X has an Erlang distribution $\mbox{Er}_n(\lambda)$ , i.e. $X\sim X_1+ \cdots +X_n$ for independent and identically distributed (i.i.d.) $X_i\sim {\rm Exp}(\lambda)$ . Then $Y\sim \mbox{M-Pareto}(\beta,\boldsymbol{\pi},\textbf{T})$ with $\beta=\mu={\mathrm{E}} (X)$ (which is a particular type of log-Gamma distribution). A phase-type representation of the $\mbox{Er}_n(\lambda)$ distribution is $\boldsymbol{\pi}=(1,0,\ldots,0)$ and

\begin{equation*} \textbf{T} = \begin{pmatrix} -\lambda & \lambda & 0 & \ldots & 0 & 0 \\ 0 & -\lambda & \lambda & \ldots & 0 & 0 \\ 0 & 0 & -\lambda & \ldots & 0 & 0 \\ \vdots & \vdots &\vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & -\lambda & \lambda \\ 0 & 0 & 0 & \ldots & 0 & -\lambda \end{pmatrix} ,\end{equation*}

where the vector is n-dimensional and $\textbf{T}$ is an $n\times n$ matrix. Now,

\begin{equation*} (s\textbf{I}-\textbf{T})^{-1} = \begin{pmatrix} \frac{1}{s+\lambda} & \frac{\lambda}{(s+\lambda)^2} & \frac{\lambda^2}{(s+\lambda)^3} & \ldots & \frac{\lambda^{n-1}}{(s+\lambda)^n} \\ 0 & \frac{1}{s+\lambda} & \frac{\lambda}{(s+\lambda)^2} & \ldots & \frac{\lambda^{n-2}}{(s+\lambda)^{n-1}} \\ 0 & 0 & \frac{1}{s+\lambda} & \ldots & \frac{\lambda^{n-3}}{(s+\lambda)^{n-2}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \frac{1}{s+\lambda} \end{pmatrix}, \end{equation*}

so

\begin{equation*} (1+x)^{\textbf{T}} = \frac{1}{2\pi {\rm i}} \oint_\gamma (1+x)^s (s\textbf{I}-\textbf{T})^{-1}\,{\rm d} s \end{equation*}

is a matrix with (i,j)th element, $j\geq i$ , equal to

\begin{equation*} \frac{1}{2\pi{\rm i}}\oint_\gamma \frac{\lambda^{j-i}(1+x)^s}{(s+\lambda)^{\:j-i+1}}\,{\rm d} s . \end{equation*}

Here, $\gamma$ is a simple path about the multiple eigenvalue $-\lambda$ . For $j<i$ the elements are clearly zero. Defining

\begin{equation*} \phi (s) = (s+\lambda)^{\:j-i+1}\frac{\lambda^{j-i}(1+x)^s}{(s+\lambda)^{\:j-i+1}} = \lambda^{j-i}(1+x)^{s} , \end{equation*}

we get, by the residue theorem,

\begin{equation*} \frac{1}{2\pi{\rm i}}\oint_\gamma \frac{\lambda^{j-i}(1+x)^s}{(s+\lambda)^{\:j-i+1}}\,{\rm d} s = \frac{\phi^{(\:j-i)}({-}\lambda)}{(\:j-i)!} , \end{equation*}

where $\phi^{(k)}(s)$ denotes the kth-order derivative of $\phi$ . But

\begin{equation*} \frac{{\rm d}^k}{{\rm d}s^k}(1+x)^s=(1+x)^s (\log(1+x))^k , \end{equation*}

so we conclude that

\begin{equation*} \frac{1}{2\pi{\rm i}}\oint_\gamma \frac{\lambda^{j-i}(1+x)^s}{(s+\lambda)^{\:j-i+1}}\,{\rm d} s = (1+x)^{-\lambda} \frac{\lambda^{j-i}}{(\:j-i)!}(\log(1+x))^{\:j-i}. \end{equation*}

Then the tail of Y is seen to be the sum of the terms in the first row of ${(1+y)^{\textbf{T}}}$ ,

\begin{align*} {\mathrm{P}} (Y>y) & = \boldsymbol{\pi}(1+y)^{\textbf{T}}\textbf{e} \\ & = (1+y)^{-\lambda}\sum_{j=0}^{n-1}\frac{\lambda^j}{j!}(\log(1+y))^{\:j} . \end{align*}

The density is given by

(4) \begin{align} f_Y(y)&= \boldsymbol{\pi}(1+y)^{\textbf{T}-\textbf{I}}\textbf{t} \nonumber \\ &= (1+y)^{-1}\boldsymbol{\pi}(1+y)^{\textbf{T}}\textbf{t} \nonumber\\ &= \frac{\lambda^n}{(n-1)!}(1+y)^{-\lambda-1}[\log(1+y)]^{n-1}, \label{eqn4}\end{align}

since $\textbf{t}=(0,0,\ldots,0,\lambda)^{\top}$ , so the density is essentially given by $\lambda$ times the (1,n)th element of ${(1+y)^{\textbf{T}}}$ .
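The agreement between the general matrix expression and the closed form (4) can be verified numerically; the following sketch uses assumed values of n, $\lambda$ , and y:

```python
# Numerical check of Example 3.8 (an illustrative sketch): the matrix
# expression pi (1+y)^{T-I} t agrees with the closed-form density (4).
import numpy as np
from scipy.linalg import expm
from math import factorial, log

n, lam, y = 4, 1.5, 2.0
pi_vec = np.zeros(n); pi_vec[0] = 1.0
T = -lam * np.eye(n) + lam * np.eye(n, k=1)   # Erlang sub-intensity matrix
t_vec = -T @ np.ones(n)                        # t = (0, ..., 0, lam)

# (1+y)^(T - I) = expm((T - I) log(1+y))
matrix_form = pi_vec @ expm((T - np.eye(n)) * log(1 + y)) @ t_vec
closed_form = lam**n / factorial(n - 1) * (1 + y)**(-lam - 1) * log(1 + y)**(n - 1)
print(matrix_form, closed_form)               # should coincide
```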

An immediate consequence of Example 3.8 is that if X is a mixture of Erlang distributions $\mbox{Er}_{n_i}(\lambda_i)$ , $i=1,\ldots,N$ , i.e.

\begin{equation*} f_X(x)=\sum_{i=1}^N \alpha_i\ f_{X_i}(x) = \sum_{i=1}^N \alpha_i \frac{x^{n_i-1}}{(n_i-1)!}\lambda_i^{n_i} {\rm e}^{-\lambda_i x} , \end{equation*}

where $\alpha_1+ \cdots +\alpha_N=1$ , then Y is a mixture of distributions of the form (4), i.e.

\begin{equation*} f_Y(y)=\sum_{i=1}^N \alpha_i\frac{\lambda_i^{n_i}}{(n_i-1)!}(1+y)^{-\lambda_i-1}[\log(1+y)]^{n_i-1}, \end{equation*}

cf. also Theorem 2.10. Mixtures of Erlang distributions constitute the smallest and simplest sub-class of phase-type distributions which is dense in the class of distributions on the positive real line, and they have often been employed in applications (see, e.g., [20]). Since $f\,:\, x\rightarrow \exp (x)-1$ and $f^{-1}\,:\, x\rightarrow \log (x+1) $ are continuous functions, by the continuous mapping theorem it is clear that the class of mixtures of log-Erlang distributions is also dense in the class of distributions on the positive half-line (this denseness was also already noted in [1, Theorem 1] for general log-PH distributions). Due to its potential for modelling purposes, we formulate the above result as a theorem.

Theorem 3.9. The class of matrix-Pareto distributions generated by mixtures of Erlang distributions,

\begin{equation*} f_X(x)= \sum_{i=1}^N \alpha_i\frac{x^{n_i-1}}{(n_i-1)!}\lambda_i^{n_i} {\rm e}^{-\lambda_i x}, \end{equation*}

contains densities of the form

\begin{equation*} f_Y(y)=\sum_{i=1}^N \alpha_i\frac{\lambda_i^{n_i}}{(n_i-1)!}(1+y)^{-\lambda_i-1}[\log(1+y)]^{n_i-1}. \end{equation*}

The class is dense (in the sense of weak convergence) in the class of distributions on the positive real line.

This means that any distribution may be approximated arbitrarily closely by either a mixture of Erlang distributions or a mixture of matrix-Pareto distributions generated by Erlangs. Whether one chooses one class over the other when approximating a distribution in practice should depend on the tail of the distribution to be approximated. If the tail is of Pareto or log-Gamma type, then one will obtain a much better approximation using the matrix-Pareto class; for mixtures of Erlangs, the number of phases needed would be large, and the resulting approximation would still not capture the tail behaviour well. We illustrate this important point in Section 5 by fitting a general matrix-Pareto distribution to a heavy-tailed dataset, and it will turn out that an excellent fit is in fact obtained by a matrix-Pareto distribution which is close to being generated by a mixture based on Erlangs.

Example 3.10. Consider the distribution of

\begin{equation*} Y={\rm e}^X-1, \end{equation*}

where X has a generalized Erlang distribution $\mbox{Er}_n(\lambda_1,\ldots,\lambda_n)$ , i.e. $X\sim X_1+ \cdots +X_n$ for independent $X_i\sim \mbox{Exp} (\lambda_i)$ , $i=1,\ldots,n$ . Assume for simplicity that all $\lambda_i$ are different. Then a calculation similar to the above reveals that the density for Y is given by

\begin{equation*} f_Y(y)=\bigg(\prod_{j=1}^{n}\lambda_j\bigg) \sum_{i=1}^n \frac{(1+y)^{-\lambda_i-1}}{\prod_{j\neq i}(\lambda_j-\lambda_i)} . \end{equation*}

Example 3.11. While the derivation of Theorem 3.6(c) basically relies on a probabilistic argument using transition probabilities, most distributional properties in our construction rely only on the matrix-exponential forms of the phase-type distributions. To emphasize this point, we consider here a distribution which can be written in a matrix-exponential form but which is not phase-type. For such distributions the matrix-exponentials are no longer transition matrices. Distributions whose density can be written in terms of a matrix-exponential, including the phase-type distributions, are known as matrix-exponential distributions (see [6] for further details) and consist exactly of the distributions with rational Laplace transform. Concretely,

\begin{equation*} f(x) = \frac{101}{100} {\rm e}^{-x}(1-\cos (10x)) \end{equation*}

is an example of a density that is not that of a phase-type random variable but has rational Laplace transform

\begin{equation*} L_f(s)= \frac{101}{s^3+3s^2+103s+101} \end{equation*}

and can be written in matrix-exponential form as

\begin{equation*} f(x)=(101,0,0) \exp \left( \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -101 & -103 & -3 \end{pmatrix} x \right) \begin{pmatrix} 0 \\ 0\\ 1 \end{pmatrix} . \end{equation*}

Let $\boldsymbol{\pi}=(101,0,0)$ , $\textbf{t}=(0,0,1)^\top$ , and let $\textbf{T}$ denote the matrix inside the exponential. Define the resolvent by

\begin{equation*} \textbf{R}(s)=(s\textbf{I}-\textbf{T})^{-1} \end{equation*}

and let

\begin{equation*} Y={\rm e}^{X}-1 . \end{equation*}

Then the density for Y is given by

\begin{equation} f_Y(y)=\boldsymbol{\pi}(1+y)^{\textbf{T}-\textbf{I}}\textbf{t} , \end{equation}

and since only the first element of $\boldsymbol{\pi}$ and the last element of $\textbf{t}$ are different from zero, only the (1,3) element of $\textbf{R}(s)$ is of importance. Since

\begin{equation*} \textbf{R}_{13}(s) = \frac{1}{(s+1)(s+1+10{\rm i})(s+1-10{\rm i})}, \end{equation*}

the density for Y is readily calculated (using the residue theorem) to be

\begin{equation*} f_Y(y)=\frac{101}{100}(1+y)^{-2} -\frac{101}{200}(1+y)^{-2-10{\rm i}}-\frac{101}{200}(1+y)^{-2+10{\rm i}}, \end{equation*}

which simplifies to

\begin{equation*} f_Y(y)=\frac{101}{100}(1+y)^{-2}(1- \cos (10\log(1+y) ) ). \end{equation*}
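The matrix form of the density and the closed form obtained by the residue theorem can again be compared numerically (a sketch, with an arbitrarily chosen evaluation point):

```python
# Sketch verifying Example 3.11 numerically: the matrix form of the density
# of Y agrees with the closed form obtained via the residue theorem.
import numpy as np
from scipy.linalg import expm

pi_vec = np.array([101.0, 0.0, 0.0])
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-101.0, -103.0, -3.0]])
t_vec = np.array([0.0, 0.0, 1.0])

y = 0.7
matrix_form = pi_vec @ expm((T - np.eye(3)) * np.log(1 + y)) @ t_vec
closed_form = 1.01 * (1 + y)**-2 * (1 - np.cos(10 * np.log(1 + y)))
print(matrix_form, closed_form)   # should coincide
```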

In Figure 1, both f and $f_Y$ are plotted. As might be expected, one observes that the shape of the distribution is somewhat preserved while the tail is stretched out.

Figure 1: The density for a matrix-exponentially distributed random variable X (dashed line) and the corresponding density for $Y=\exp(X)-1$ (solid line).

4. Analogous generalizations of other classes of distributions

We now list a number of distributions obtained through a transformation of a PH random variable other than the exponential function. All these transformations, in contrast to the one leading to the M-Pareto distribution, are parameter dependent, and as a consequence fitting procedures to data are less straightforward.

4.1. Matrix-Weibull distributions

Let $X\sim \mbox{PH}_p(\boldsymbol{\pi},\textbf{T})$ . Inspired by the construction of a Weibull random variable as a power of an exponentially distributed random variable, one can define

\begin{equation*} Y = X^{1/\beta} \end{equation*}

for some $\beta >0$ . The distribution function of Y is immediately seen to be

\begin{equation*} F_Y(y) = 1- \boldsymbol{\pi}{\rm e}^{\textbf{T}y^\beta}\textbf{e} \end{equation*}

with corresponding density

\begin{equation*} f_Y(y)=\boldsymbol{\pi}{\rm e}^{\textbf{T}y^\beta}\textbf{t} \beta y^{\beta-1} . \end{equation*}

Y can be called a matrix-Weibull distribution, since the scale parameter of the usual Weibull distribution with cumulative distribution function $F_{\rm Wei}(y)=1-\exp({-}cy^{\beta})$ is now replaced by a matrix. Alternatively, one may refer to Y as a power-PH distribution. Its mean is given by

\begin{align*} {\mathrm{E}} (Y) & = \int_0^\infty y\boldsymbol{\pi}{\rm e}^{\textbf{T}y^\beta}\textbf{t} \beta y^{\beta-1}\,{\rm d} y \\ &= \boldsymbol{\pi} \int_0^\infty x^{1/\beta} {\rm e}^{\textbf{T}x}\,{\rm d} x\, \textbf{t} \\ &= \Gamma (1+1/\beta)\boldsymbol{\pi} ({-}\textbf{T})^{-1/\beta-1} \textbf{t}, \end{align*}

where the last line follows from Theorem 3.4.4 of [6]. More generally, the $\theta$ th moment ( $\theta>0$ ) can be deduced similarly to be

\begin{equation*} {\mathrm{E}} (Y^\theta) = \Gamma (1+\theta/\beta)\boldsymbol{\pi}({-}\textbf{T})^{-\theta/\beta-1}\textbf{t} . \end{equation*}

The moment-generating function is found by expansion:

\begin{align*} {\mathrm{E}} ({\rm e}^{\theta Y}) &= \int_0^\infty {\rm e}^{\theta y}\boldsymbol{\pi}{\rm e}^{\textbf{T}y^\beta}\textbf{t}\beta y^{\beta-1}\,{\rm d} y \\ &= \boldsymbol{\pi} \sum_{n=0}^\infty \frac{\theta^n}{n!}\int_0^\infty x^{n/\beta}{\rm e}^{\textbf{T}x}\,{\rm d} x \, \textbf{t} \\ &= \sum_{n=0}^\infty \frac{\theta^n}{n!}\Gamma (1+n/\beta )\boldsymbol{\pi}({-}\textbf{T})^{-n/\beta-1} \textbf{t} , \end{align*}

which is valid for $\beta>1$ .
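The moment formula lends itself to direct numerical evaluation via fractional matrix powers; a sketch with assumed toy parameters:

```python
# Sketch of the matrix-Weibull moment formula (toy parameters assumed):
# E(Y^theta) = Gamma(1 + theta/beta) pi (-T)^{-theta/beta - 1} t.
import numpy as np
from scipy.linalg import fractional_matrix_power
from scipy.special import gamma

pi_vec = np.array([1.0, 0.0])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
t_vec = -T @ np.ones(2)

beta, theta = 1.7, 2.0
M = fractional_matrix_power(-T, -theta / beta - 1.0)
moment = gamma(1.0 + theta / beta) * pi_vec @ M @ t_vec
print(np.real(moment))   # tiny imaginary round-off may appear
```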

If ${(\boldsymbol{\pi},\textbf{T})}$ is the representation of an $\mbox{Er}_n(\lambda )$ -distribution (see Example 3.8), then

\begin{equation*} f_Y(y)= \beta\frac{\lambda^{n}}{(n-1)!}y^{n\beta-1} {\rm e}^{-\lambda y^\beta} . \end{equation*}

Therefore, as for the matrix-Pareto distribution, a dense class of distributions with a Weibull tail can then be defined through mixtures of the form

\begin{equation*} f(y)= \beta \sum_{i=1}^N \alpha_i \frac{\lambda_i^{n_i}}{(n_i-1)!}y^{n_i\beta-1} {\rm e}^{-\lambda_i y^\beta} . \end{equation*}

It is evident that this class will be a very appropriate choice for approximating distributions which have tails (suspected to be) close to a Weibull.

4.2. Exponential-PH distribution

Definition 4.1. Let $X\sim \mbox{PH}_p(\boldsymbol{\pi},\textbf{T})$ . Then define an exponential-PH distribution with parameters $\mu$ , $\sigma$ , $\boldsymbol{\pi}$ , and $\textbf{T}$ as the distribution of the random variable

\begin{equation*} Y = \mu - \sigma \log (X) . \end{equation*}

The distribution function of Y is given by

(5) \begin{align} F_Y(y)&={\mathrm{P}} \left(\log(X)\geq -\frac{y-\mu}{\sigma}\right) \nonumber\\ &={\mathrm{P}} \left(X\geq \exp \left({-}\frac{y-\mu}{\sigma} \right)\right)\nonumber \\ &=\boldsymbol{\pi}\exp (\textbf{T} {\rm e}^{-(y-\mu)/\sigma} )\textbf{e} \label{eqn5}\end{align}

with support on ${({-}\infty,\infty)}$ . The density is obtained through differentiation as

\begin{equation} f_Y(y)=\frac{1}{\sigma} {\rm e}^{-\frac{y-\mu}{\sigma}}\boldsymbol{\pi}{\rm e}^{\textbf{T}{\rm e}^{-\frac{y-\mu}{\sigma}}}\textbf{t} =\frac{1}{\sigma} \boldsymbol{\pi} \exp \left({-}\frac{y-\mu}{\sigma}\textbf{I} + \textbf{T}{\rm e}^{-\frac{y-\mu}{\sigma}} \right)\textbf{t} . \end{equation}

One can interpret (5) as a matrix extension of the usual Gumbel distribution with cumulative distribution function $F_{\rm Gu}(y)=\exp({-}\exp({-}(y-\mu)/\sigma))$ , but since here the matrix $\textbf{T}$ replaces the constant $-1$ rather than a parameter, it is less obvious to call it a matrix-Gumbel distribution, and one may prefer the name exponential-PH distribution. Concerning the mean, we proceed as follows. Since

\begin{equation*} {\mathrm{E}} (\log (X))= -\gamma -\boldsymbol{\pi}\log ({-}\textbf{T})\textbf{e} \end{equation*}

(use (2) or see, e.g., [6, p. 180]) we get that

\begin{equation*} {\mathrm{E}} (Y) = \mu + \sigma \gamma + \sigma \boldsymbol{\pi}\log ({-}\textbf{T})\textbf{e} , \end{equation*}

where $\gamma$ is Euler’s constant. In particular, if $X\sim \mbox{Exp} (1)$ then $\textbf{T}=-1$ , the term $\log ({-}\textbf{T})=0$ , and ${\mathrm{E}} (Y)= \mu + \gamma \sigma$ .
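A sketch of this mean formula in terms of the matrix logarithm (toy parameters assumed):

```python
# Sketch of the exponential-PH mean formula (toy parameters assumed):
# E(Y) = mu + sigma*gamma + sigma * pi log(-T) e, gamma Euler's constant.
import numpy as np
from scipy.linalg import logm

pi_vec = np.array([1.0, 0.0])
T = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
e = np.ones(2)

mu, sigma = 0.0, 1.0
mean_Y = mu + sigma * np.euler_gamma + sigma * np.real(pi_vec @ logm(-T) @ e)
print(mean_Y)
```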

Concerning the Laplace transform for Y,

\begin{equation*} L_Y(s)={\rm e}^{-s\mu} \Gamma (1+s\sigma) \boldsymbol{\pi}({-}\textbf{T})^{-s\sigma}\textbf{e}, \end{equation*}

which follows from

\begin{align*} L_Y(s)&= {\mathrm{E}} \big( {\rm e}^{-s(\mu-\sigma \log (X))} \big) \\ &= {\rm e}^{-\mu s}{\mathrm{E}} (X^{s\sigma}) \\ &= {\rm e}^{-\mu s} \Gamma (1+s\sigma)\boldsymbol{\pi}({-}\textbf{T})^{-s\sigma}\textbf{e} , \end{align*}

where the last step follows from Theorem 3.4.6 of [6] (p. 175). For the special (scalar) case of $X\sim \mbox{Exp}(1)$ , we recover the known formula ${\rm e}^{-\mu s}\Gamma (1+\sigma s)$ of the Laplace transform of the Gumbel distribution.

If ${(\boldsymbol{\pi},\textbf{T})}$ is the representation of an $\mbox{Er}_n(\lambda )$ -distribution (see Example 3.8), then

\begin{equation*} f_Y(y)=\frac{1}{\sigma} {\rm e}^{-\frac{y-\mu}{\sigma}} \frac{\lambda^{n}}{(n-1)!}{\rm e}^{-(n-1)\frac{y-\mu}{\sigma}} \exp \big({-}\lambda {\rm e}^{-\frac{y-\mu}{\sigma}} \big), \end{equation*}

so a dense class of distributions with a Gumbel-type tail can be defined in terms of distributions of the form

\begin{equation*} f(y)=\frac{1}{\sigma} {\rm e}^{-\frac{y-\mu}{\sigma}}\sum_{i=1}^N \alpha_i \frac{\lambda_i^{n_i}}{(n_i-1)!}{\rm e}^{-(n_i-1)\frac{y-\mu}{\sigma}} \exp \big({-}\lambda_i {\rm e}^{-\frac{y-\mu}{\sigma}} \big) . \end{equation*}

4.3. Shifted-power-PH distribution

More generally, inspired by the generalized extreme value distribution $\mbox{GEV}(\mu,\sigma,\xi)$ with cumulative distribution function

(6) \begin{equation} F(y) =\left\{\!\! \begin{array}{ll} \exp ({-}( 1 + \xi \frac{y-\mu}{\sigma} )^{-1/\xi} ), & \xi \neq 0, \\ \exp ({-}\exp ({-}\frac{y-\mu}{\sigma} ) ), & \xi = 0, \end{array} \right. \label{eqn6} \end{equation}

we see that for $X\sim \mbox{Exp}(1)$ one has

\begin{align*} F(y) &= {\mathrm{P}} \bigg( X>\left( 1 + \xi \frac{y-\mu}{\sigma} \right)^{-1/\xi} \bigg) \\ &= {\mathrm{P}} \left( \frac{X^{-\xi}-1}{\xi} \leq \frac{y-\mu}{\sigma} \right) \\ &= {\mathrm{P}} \left( \mu+ \sigma \frac{X^{-\xi}-1}{\xi} \leq y \right)\!. \end{align*}

That is, for $\xi\neq 0$ one can construct $Y\sim \mbox{GEV}(\mu,\sigma,\xi)$ from an exponentially distributed random variable $X\sim \mbox{Exp} (1)$ by letting

(7) \begin{equation} Y=\mu+ \sigma \frac{X^{-\xi}-1}{\xi} . \label{eqn7} \end{equation}

Taking the limit $\xi\rightarrow 0$ then leads back to

\begin{equation*} Y=\mu-\sigma \log(X), \end{equation*}

which is the Gumbel case treated in Section 4.2, so we focus on $\xi\neq 0$ in the following.

Now letting $X\sim \mbox{PH}(\boldsymbol{\pi},\textbf{T})$ replace the exponentially distributed random variable above, we apply the transformation (7) for $\xi\neq 0$ . The cumulative distribution function for Y is then readily obtained as

(8) \begin{equation} F_Y(y) = \boldsymbol{\pi}\exp\bigg( \textbf{T}\bigg( 1 + \xi \frac{y-\mu}{\sigma} \bigg)^{-1/\xi} \bigg)\textbf{e} , \label{eqn8} \end{equation}

and the density is obtained through differentiation:

\begin{align*} f_Y(y) &= \boldsymbol{\pi}\exp\bigg( \textbf{T}\bigg( 1 + \xi \frac{y-\mu}{\sigma} \bigg)^{-1/\xi} \bigg)\textbf{T}\textbf{e} \bigg({-}\frac{1}{\xi} \bigg)\bigg( 1 + \xi \frac{y-\mu}{\sigma} \bigg)^{-1/\xi-1}\frac{\xi}{\sigma} \\ &= \frac{1}{\sigma}\boldsymbol{\pi}\exp\bigg( \textbf{T}\bigg( 1 + \xi \frac{y-\mu}{\sigma} \bigg)^{-1/\xi} \bigg)\textbf{t}\bigg( 1 + \xi \frac{y-\mu}{\sigma} \bigg)^{-(1+\xi)/\xi} \\ &= \frac{1}{\sigma}z(y)^{\xi+1}\boldsymbol{\pi}\exp( \textbf{T}z(y) )\textbf{t}, \end{align*}

where

\begin{equation*} z(y)=\bigg( 1 + \xi \frac{y-\mu}{\sigma} \bigg)^{-1/\xi} . \end{equation*}

Again, the constant $-1$ in (6) is replaced by the matrix $\textbf{T}$ in (8), so that one could view the latter as a certain matrix-GEV distribution, but the name shifted-power-PH distribution may be considered more appropriate. The mean of Y is

\begin{align*} {\mathrm{E}} (Y)&= \mu + \frac{\sigma}{\xi}( {\mathrm{E}} (X^{-\xi}-1) ) \\ &= \mu + \frac{\sigma}{\xi} ( \Gamma (1-\xi) \boldsymbol{\pi}({-}\textbf{T})^\xi \textbf{e} -1 ), \end{align*}

which follows from Theorem 3.4.6 of [6] whenever $\xi<1$ . Higher-order integer moments can be calculated recursively by expanding the power of Y in terms of X and using Theorem 3.4.6 of [6] again.

If ${(\boldsymbol{\pi},\textbf{T})}$ is the representation of an $\mbox{Er}_n(\lambda )$ -distribution, then

\begin{equation*} f_Y(y) = \frac{1}{\sigma}\frac{\lambda^{n}}{(n-1)!}z(y)^{\xi+n} \exp ({-}\lambda z(y) ) \end{equation*}

so a dense class of distributions with GEV-type tails can be defined in terms of mixtures

\begin{equation*} f(y)= \frac{1}{\sigma}\sum_{i=1}^N \alpha_i \frac{\lambda_i^{n_i}}{(n_i-1)!}z(y)^{\xi+n_i} \exp ({-}\lambda_i z(y) ). \end{equation*}

5. Modelling with a matrix-Pareto distribution

As mentioned earlier, the denseness of PH distributions in the class of all distributions on the positive half-line makes them an attractive modelling tool. However, traditionally one disadvantage of this class is that, owing to its exponentially decaying tails, it may require many states to provide a reasonable fit to distributions with a heavy tail (even on finite time intervals). From the denseness of matrix-Pareto distributions in the class of all distributions on the positive half-line (Theorem 3.9 or [1, Theorem 1]), it becomes clear that if one reckons that a heavy-tailed distribution describes a given set of data points well, then a fitting procedure with matrix-Pareto distributions may lead to a much more parsimonious (and natural) model (quantitatively expressed by a low number of states and a sparse intensity matrix $\textbf{T}$ ) of the underlying distribution, yet keeping the advantageous properties of phase-type distributions. In [1], a Danish fire insurance dataset was investigated in this direction. Here, in the context of the present paper, we would like to illustrate the potential of the proposed inhomogeneous PH approach for a set of 1282 Dutch fire insurance claims from the period 2000–2014 (all in excess of EUR 1 million), which were already studied in detail and modelled by other means in [2]. Indeed, after testing various models, in [2, p. 107] a splicing model with a mixed Erlang component for the bulk, a Pareto distribution for the range to the right of that, and another Pareto distribution for the tail was identified to be an appropriate choice for that dataset. Apart from seeking an adequate fit by a matrix-Pareto distribution, we shall pay special attention to the aforementioned (possibly mixed) Erlang structure and compare it to the splicing method.

In order to fit the totality of the data by one single distribution, we propose a matrix-Pareto distribution (log-phase-type), which also contains the Pareto tail as a special case. Thus we must fit a phase-type distribution to the logarithm of the data. Since the logarithm of the data has support contained in $[8.5,20]$ , we decide to subtract $8.5$ from all log data, making it possible to reduce the number of phases significantly. Hence, the model we consider is

\begin{equation*} y_1,y_2,\ldots,y_N \sim \mbox{PH}(\boldsymbol{\pi},\textbf{T}), \end{equation*}

where $y_i=\log(x_i)-8.5$ ( $i=1,\ldots,N=1282$ ) are the log-transformed (and shifted) versions of the i.i.d. data points $x_1,\ldots,x_N$ . The implied model for the original generic data X is then

\begin{equation} X= {\rm e}^{8.5}{\rm e}^{Y}, \ \ \quad Y\sim \mbox{PH}(\boldsymbol{\pi},\textbf{T}) , \end{equation}

i.e. a (scaled) log-phase-type distribution.

Utilizing an expectation maximization (EM) algorithm (cf. [5]), we use 20 phases to obtain an adequate fit to the data, and the result is presented in Figure 2. The choice of 20 phases is taken so as to allow a sufficiently flexible phase-type fit to the data, the concrete magnitude being supported by trial experiments. Though the starting point of the EM algorithm was a general phase-type distribution, the resulting estimate reduces to a much simpler structure, a phenomenon which is commonly observed in phase-type fitting. Concretely, the structure of the fitted intensity matrix is as follows:

\begin{align}\footnotesize \left(\arraycolsep=1.4pt \begin{array}{rrrrrrrrr|rr|rrrrrrrr|r} -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -a & a & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -a & a_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -a_2 & a_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\\hline 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 & 0 & 0 & 0 & 0 & 0 &0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 &0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -b & b_1 \\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -c \end{array} \right)\!, \end{align}

Figure 2: Fitted phase-type distribution to the log data (left) and QQ-plot of the original data against the fitted log-phase-type distribution (right).

where $a=3.322\,438$ , $a_1=3.292\,191$ , $ a_2=3.322\,884$ , $a_3=3.292\,191$ , $b=3.288\,628$ , $b_1 = 2.063\,919$ , and $c = 48.214\,21$ . The estimator for the initial distribution is

\begin{equation*} \boldsymbol{\pi} = (0.999\,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.000\,78 ) . \end{equation*}

Figure 2: Fitted phase-type distribution to log data (left) and QQ-plot of the original data vs. the fitted log-phase-type distribution (right).

Hence the distribution can be described by eight parameters. It contains one very fast state (the last one), entered directly with the small probability $0.000\,78$, which accounts for the possibility of very small outcomes (these can be interpreted as stemming from data points near the left-truncation point $y_i=0$). Furthermore, there are two blocks, each a convolution of nine exponential distributions (i.e. an Er$_9$ distribution), with intensities $a$ and $b$, respectively. Exits to the absorbing state are possible from states 9, 10, 19, and 20.
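For completeness, the E-step of the algorithm in [Reference Asmussen, Nerman and Olsson5] requires, for each observation $y$, the convolution integral $\int_0^y {\rm e}^{\textbf{T}(y-u)}\,\boldsymbol{t}\boldsymbol{\pi}\,{\rm e}^{\textbf{T}u}\,{\rm d}u$, which can be obtained from one block matrix exponential (Van Loan's formula). The following Python sketch of one EM iteration is our own illustration of this scheme, not the implementation used for the fit above.

```python
import numpy as np
from scipy.linalg import expm

def em_step(pi, T, data):
    """One EM iteration for PH(pi, T) in the spirit of
    Asmussen, Nerman and Olsson (1996); sketch only."""
    p = len(pi)
    t = -T @ np.ones(p)                      # exit rates to the absorbing state
    B, Z = np.zeros(p), np.zeros(p)          # E[# starts in i], E[time spent in i]
    N, Nex = np.zeros((p, p)), np.zeros(p)   # E[# jumps i->j], E[# exits from i]
    for y in data:
        eTy = expm(T * y)
        a, b = pi @ eTy, eTy @ t
        f = pi @ b                           # density at y
        # Van Loan: the upper-right block of expm([[T, t*pi], [0, T]] y)
        # equals  int_0^y e^{T(y-u)} t pi e^{Tu} du
        big = np.block([[T, np.outer(t, pi)],
                        [np.zeros((p, p)), T]])
        D = expm(big * y)[:p, p:]
        B += pi * b / f
        Z += np.diag(D) / f
        N += T * D.T / f                     # entry (i,j): T_ij * D_ji / f
        Nex += a * t / f
    np.fill_diagonal(N, 0.0)
    # M-step: closed-form updates from the expected sufficient statistics
    pi_new = B / len(data)
    T_new = N / Z[:, None]
    np.fill_diagonal(T_new, -(T_new.sum(axis=1) + Nex / Z))
    return pi_new, T_new
```

Iterating this step from a randomly initialized $(\boldsymbol{\pi},\textbf{T})$ until the log-likelihood $\sum_k \log(\boldsymbol{\pi}\,{\rm e}^{\textbf{T}y_k}\,\boldsymbol{t})$ stabilizes reproduces the general scheme; sparse structures such as the one in the fitted matrix above then typically emerge by themselves.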

The fit of the phase-type distribution (see Figure 2) looks adequate in the main body and in most of the tail (QQ-plot), which is quite remarkable when compared to the much more complex model suggested in [Reference Albrecher, Beirlant and Teugels2, p. 107]. The seven most extreme tail points start to deviate from the fit, most pronouncedly for the last three points. This is to be expected, since maximum likelihood treats all points of the distribution equally, as opposed to extreme value methods focusing on the tail fit. In any case, the identified matrix-Pareto distribution provides an excellent and parsimonious fit to the data over quite a large range.
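A QQ-plot like the one in Figure 2 can be reproduced by sampling from the fitted model. A minimal sketch (our own) simulates the absorption time of the underlying Markov jump process and applies the exponential transform:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_log_ph(pi, T, size, shift=8.5):
    """Draw from X = exp(shift + Y), Y ~ PH(pi, T), by simulating
    the underlying Markov jump process until absorption."""
    p = len(pi)
    t = -T @ np.ones(p)
    out = np.empty(size)
    for k in range(size):
        y, state = 0.0, rng.choice(p, p=pi)
        while True:
            rate = -T[state, state]
            y += rng.exponential(1.0 / rate)
            # next state j with prob T[state, j]/rate; absorption with t[state]/rate
            probs = np.append(T[state] / rate, t[state] / rate)
            probs[state] = 0.0
            nxt = rng.choice(p + 1, p=probs)
            if nxt == p:                      # absorbed
                break
            state = nxt
        out[k] = np.exp(shift + y)
    return out
```

Comparing the empirical quantiles of such a sample (or, more directly, the model quantiles obtained by numerically inverting the fitted distribution function) against the ordered data yields the QQ-plot.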

From the concrete numerical values, one can also see that a simple matrix-Pareto distribution with underlying Er$_{19}$ distribution (with rate around 3.3, and hence altogether only two parameters!) might be a (surprisingly) reasonable description of the data as well. To pursue this point further, we fitted an $\mbox{Er}_{19}(\lambda)$ distribution (i.e. a convolution of 19 i.i.d. exponential distributions with intensity $\lambda$) to the data, resulting in an estimated intensity of $\hat{\lambda}=3.340\,752$. The Erlang distribution indeed also fits the data well as a first approximation (the dashed blue line in Figure 2), though it is clear that the full phase-type distribution is a more suitable model for small values. The log-likelihoods of the full phase-type and Erlang fits are $-1250.092$ and $-1366.735$, respectively.
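For the Erlang fit, no EM is needed: with the shape fixed at $k=19$, the likelihood is maximized in closed form at $\hat{\lambda}=k/\bar{y}$, the shape divided by the sample mean of the $y_i$. A small sketch with stand-in data (the actual $y_i$ are not reproduced here):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
k = 19
# Stand-in data; in the application, y would be the 1282 shifted log claims
y = rng.gamma(shape=k, scale=1 / 3.3, size=1282)

lam_hat = k / y.mean()   # closed-form MLE for the rate, shape held fixed
loglik = gamma.logpdf(y, a=k, scale=1 / lam_hat).sum()
print(lam_hat, loglik)
```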

In conclusion, one may argue that we have obtained a parsimonious model that is still somewhat similar to the one in [Reference Albrecher, Beirlant and Teugels2], but arrived at through a quite different rationale and without the need to decide on the location of splicing points. In that respect, it has the advantage of being rather easily implemented in practice.

6. Conclusion

In this paper we proposed a simple way to carry over the advantages of PH distributions as a modelling tool to distributions with a different tail behaviour. This is achieved by introducing time inhomogeneity in the Markov jump process underlying the construction of the PH distribution. When the time scaling is the same for all states of the Markov process, this leads to a number of new simple families of distributions (including matrix-Pareto distributions, (shifted) power-PH distributions, and exponential-PH distributions) that all inherit the denseness in the class of distributions on the positive real line, but can lead to much more parsimonious descriptions than a direct PH fit when the tail of the distribution is in fact a corresponding transform of a PH tail. In view of its relevance in applications, we focused on heavy tails and in particular established various properties of matrix-Pareto distributions. Finally, we studied the modelling performance of the latter on a real-world dataset from fire insurance, with rather promising results.

There are several possible directions for future research. One is to merge light- and heavy-tail fitting more directly by time scaling only some of the states of the Markov process, and possibly to design an EM algorithm that automatically decides, on the basis of the dataset, how many components of each type are needed for a good description of the data. It should also be feasible to extend the approach proposed in this paper to complement splicing models under censoring; see, e.g., [Reference Reynkens, Verbelen, Beirlant and Antonio17]. Finally, adapting extreme value techniques to the dense distribution classes proposed in this paper may provide interesting alternatives for parsimonious modelling, with emphasis on the appropriateness of the fit in the tail but still good performance for smaller values.

Acknowledgement

H.A. acknowledges financial support from the Swiss National Science Foundation Project 200021_168993.

References

Ahn, S., Kim, J. H. and Ramaswami, V. (2012). A new class of models for heavy tailed distributions in finance and insurance risk. Insurance Math. Econom. 51, 43–52.
Albrecher, H., Beirlant, J. and Teugels, J. L. (2017). Reinsurance: Actuarial and Statistical Aspects. John Wiley, Chichester.
Asmussen, S. (2003). Applied Probability and Queues, 2nd edn. Springer, New York.
Asmussen, S. and Albrecher, H. (2010). Ruin Probabilities, 2nd edn. Advanced Series on Statistical Science & Applied Probability, 14. World Scientific, Hackensack, NJ.
Asmussen, S., Nerman, O. and Olsson, M. (1996). Fitting phase-type distributions via the EM algorithm. Scand. J. Statist. 23, 419–441.
Bladt, M. and Nielsen, B. F. (2017). Matrix-Exponential Distributions in Applied Probability. Springer, New York.
Bladt, M., Nielsen, B. F. and Samorodnitsky, G. (2015). Calculation of ruin probabilities for a dense class of heavy tailed distributions. Scand. Actuarial J. 2015, 573–591.
Bladt, M. and Rojas-Nandayapa, L. (2018). Fitting phase-type scale mixtures to heavy-tailed data and distributions. Extremes 21, 285–313.
Breuer, L. (2012). From Markov Jump Processes to Spatial Queues. Springer, New York.
Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
Gill, R. D. and Johansen, S. (1990). A survey of product-integration with a view toward application in survival analysis. Ann. Statist. 18, 1501–1555.
Goodman, G. S. and Johansen, S. (1973). Kolmogorov's differential equations for non-stationary, countable-state Markov processes with uniformly continuous transition probabilities. Math. Proc. Camb. Phil. Soc. 73, 119–138.
Helton, J. and Stuckwisch, S. (1976). Numerical approximation of product integrals. J. Math. Anal. Appl. 56, 410–437.
Møller, C. M. M. (1992). Numerical evaluation of Markov transition probabilities based on the discretized product integral. Scand. Actuarial J. 1992, 76–87.
Olver, F. W., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (2010). NIST Handbook of Mathematical Functions, 1st edn. Cambridge University Press, New York.
Pigeon, M. and Denuit, M. (2011). Composite lognormal-Pareto model with random threshold. Scand. Actuarial J. 2011, 177–192.
Reynkens, T., Verbelen, R., Beirlant, J. and Antonio, K. (2017). Modelling censored losses using splicing: A global fit strategy with mixed Erlang and extreme value distributions. Insurance Math. Econom. 77, 65–77.
Scollnik, D. P. (2007). On composite lognormal-Pareto models. Scand. Actuarial J. 2007, 20–33.
Slavík, A. (2007). Product Integration. Its History and Applications. Matfyzpress, Prague.
Willmot, G. E. and Woo, J.-K. (2007). On the class of Erlang mixtures with risk theoretic applications. N. Amer. Actuarial J. 11, 99–115.