1 Bayesian over-dispersed Poisson model
1.1 Introduction
A number of papers have appeared in the recent literature looking at stochastic models related to the Bornhuetter & Ferguson (BF, 1972) method of claims reserving, see for example Alai et al. (2009), Mack (2008), Saluz et al. (2011) and Verrall (2004). The basic philosophy underlying these papers, and the BF method, is that there is external knowledge about the ultimate losses that is not contained in the runoff triangles of data. In statistical methodology, the usual way to incorporate such external knowledge is to use Bayesian methods. This paper examines the use of Bayesian methods for over-dispersed Poisson (ODP) models. The Bayesian ODP model treated in this paper was briefly covered in England & Verrall (2002); the present paper provides a much more detailed analysis and examines the use of different prior distributions and posterior estimators. We provide analytical results, where possible, which allow for intuitive interpretations. Where it is not possible to derive analytical results, we use Markov chain Monte Carlo (MCMC) methods to obtain numerical results.
Although this paper is related to the BF method, in that the underlying philosophy is similar, the results are not the same as the results from applying the conventional BF method. The reason for this is that the BF method uses the same runoff pattern as the chain ladder (CL) technique, whereas the application of Bayesian prior distributions to the rows of the claims development triangle naturally affects the posterior distributions of the parameters for the columns (i.e. the runoff parameters) of the claims development triangle – even if non-informative prior distributions are used for the latter. It is possible to construct a Bayesian BF model where the runoff pattern is exactly the same as in the CL technique, see Verrall (2004), but in that case it is not clear that this is a statistically optimal estimator. Thus, the purpose of this paper is to examine Bayesian models which incorporate prior knowledge about ultimate losses (as the BF method does), but it is not the purpose to reproduce the results from the BF method exactly, as in Alai et al. (2009).
The model assumptions are set out in full in Sections 1.2, 2.1 and 3.1, but the basic idea is to use an ODP model for the incremental claims with cross-classified means μiγj, where μi is the row parameter in accident year i (related to the exposure of accident year i) and γj is the column parameter for development period j (related to the runoff pattern), and to apply prior distributions to these parameters. We will assume that there is no prior knowledge about the runoff parameters, and we use non-informative prior distributions for γj. By assuming informative prior distributions for the μi's we can incorporate external knowledge about the ultimate losses. We investigate a number of different formulations of these informative prior distributions, and examine the properties of the resulting posterior estimators. We also compare our results with the traditional BF method.
An important observation will be that although we choose non-informative prior distributions for the parameters, their shapes may have a significant influence on the resulting claims reserves.
Organization of the paper. In the remainder of this section we define the general Bayesian ODP Model and we discuss prediction in a Bayesian framework. In Sections 2 and 3 we then specify two different types of prior distributions (the uniform prior model with log link function and the gamma prior model). Parameter estimates, e.g. for γj, are denoted differently in the two models; in particular, estimates in the gamma prior model carry a superscript "*" to distinguish them from those in the uniform prior model with log link function (see Section 3.1). In Section 4 we discuss parameter estimation via simulation methods and in Section 5 a numerical example is provided. All the statements are proved in the appendix.
1.2 Model assumptions
The model assumptions are similar to those in the Bayesian claims reserving models presented in England & Verrall (2002, 2006), Verrall (2004) and Wüthrich & Merz (2008), Section 4.4. We assume that the parameters are modelled through prior distributions and, conditional on these parameters, the incremental claims Xi,j have independent ODP distributions for accident years i ∈ {0,…,I} and development years j ∈ {0,…,I}. The final development year is given by I and the observations at time I are given in the (upper) runoff triangle
Our goal is to predict the future claims in the lower triangle .
Model 1.1 (Bayesian ODP model)
• μ0,…,μI, γ0,…,γI, ϕ are independent positive random variables with joint density u(·).
• Conditionally, given these parameters, the incremental claims Xi,j are independent random variables with
The parameter μi plays the role of the row parameter (related to the exposure of accident year i, see (1.2)), the γj's describe column parameters (related to an incremental claims development (runoff) pattern that is not necessarily normalized, see (1.2)) and ϕ describes the dispersion parameter. We obtain the following first two conditional moments
and the conditional total ultimate claim of accident year i is given by
We analyze the Bayesian ODP Model 1.1 for different types of prior distributions for Θ and different types of parameter estimates for Θ (see (1.3)–(1.4) below).
1.3 Bayesian predictors
Assume the Bayesian ODP Model 1.1 to hold. Using Bayes’ Theorem we find the posterior density of Θ, given the observations , by
where the proportionality sign ∝ means up to normalization w.r.t. the random vector Θ. In Bayesian theory there are two commonly used predictors, the minimum mean square error (MMSE) predictor and the maximum a posteriori (MAP) predictor for Θ, given . These are given by
The MMSE predictor minimizes the conditional prediction variance (see also (2.7) below) and the MAP predictor is the maximum likelihood estimator (MLE) for the posterior density, i.e. its mode. The MAP predictor has the advantage that it can often be calculated analytically. On the other hand, it carries a bias term, relative to the posterior density, that can, in general, only be calculated numerically, for example using the MCMC methodology. This is discussed in the rest of this paper.
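To make the distinction concrete, the following small numerical sketch (our own illustration, not part of the model above) evaluates both predictors on a grid for a skewed stand-in posterior density: the MMSE predictor is the posterior mean and the MAP predictor is the posterior mode, and their difference is the bias term just mentioned. The choice of a gamma-shaped density and all numbers are purely illustrative.

```python
import numpy as np

# A skewed stand-in "posterior" density for a single parameter theta:
# Gamma(shape=3, rate=2). Purely illustrative, not the ODP posterior itself.
shape, rate = 3.0, 2.0
theta = np.linspace(1e-6, 15.0, 200_001)
dtheta = theta[1] - theta[0]

log_post = (shape - 1.0) * np.log(theta) - rate * theta   # unnormalized log density
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta                               # numerical normalization

theta_mmse = (theta * post).sum() * dtheta   # posterior mean = MMSE predictor
theta_map = theta[np.argmax(post)]           # posterior mode = MAP predictor

print(f"MMSE (posterior mean): {theta_mmse:.4f}")   # close to shape/rate     = 1.5
print(f"MAP  (posterior mode): {theta_map:.4f}")    # close to (shape-1)/rate = 1.0
print(f"bias MMSE - MAP      : {theta_mmse - theta_map:.4f}")
```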
2 Uniform prior distributions and the chain ladder method
In this section we start with uniform priors and log links for the parameters μi and γj. Such a model has already been studied in England & Verrall (2006), Section 7.1. The crucial consequence of the uniform priors assumption is that if we make them non-informative we obtain the classical CL estimate from the MAP predictors. In this spirit, this model is another example that replicates the CL reserves (see also Subsection 2.3).
2.1 The (non-informative) uniform priors model with log link
We define the parameters on the log scale: αi = log μi and βj = log γj.
Model 2.1 In addition to Model 1.1 we assume that α0,…,αI are uniformly distributed on (−m, m) for m > 0, that β0,…,βI are uniformly distributed on (−b, b) for b > 0, and that ϕ > 0 is constant.
Remark. It might be more appropriate to use different uniform priors for each parameter, e.g. αi uniformly distributed on (−mi, mi). However, if we choose non-informative priors for the αi and the βj (i.e. we will let m→∞ and b→∞), then the specific prior differences between the αi's and between the βj's are not relevant.
With (1.1) we obtain
which illustrates the role of the log link function, see also England & Verrall (2002), Section 2.3. That is, with the log link function we derive the generalized linear model form.
The posterior density under Model 2.1 is given by
If we assume that m and b are sufficiently large (we comment on this below) then the MAP predictors for αi and βj can be found by maximizing the posterior log-likelihood function log u(θ|·) analytically, see Section 2.3. This provides MAP estimators for αi and βj, respectively, that correspond to the solution of the following system of equations, see also e.g. (2.16)–(2.17) in Wüthrich & Merz (2008),
Remarks 2.2
• Ci,j = Xi,0 + ⋯ + Xi,j is called the cumulative claim of accident year i up to development year j. The (total) ultimate claim of accident year i is denoted by Ci,I and the outstanding loss liabilities at time I for accident year i are given by
(2.3) under the assumption that Xi,j denotes claims payments. The final goal is to predict these outstanding loss liabilities Ri and to determine the prediction uncertainty.
• The solution to (2.1)–(2.2) is not unique, i.e. whenever the αi's and βj's solve the system (2.1)–(2.2), then for any constant K the shifted parameters αi + K and βj − K also solve the system (2.1)–(2.2). The requirement that m and b are sufficiently large now means that there exists at least one such constant for which the solution of (2.1)–(2.2) lies within the support of the priors. We fix such a constant K and then denote the resulting MAP predictor by
The MAP predictor for the outstanding loss liabilities in (2.3) is then defined by
We see that K cancels in this product and hence the specific choice of K is not important as long as at least one such K exists.
• The MAP optimization problem (2.1)–(2.2) can be solved analytically. This is discussed in Section 2.3, below.
• For the priors of αi we can either use informative priors (i.e. m < ∞) or non-informative priors (i.e. m→∞). However, since we have only one parameter, namely m, we always have prior expected value E[αi] = 0 and variance Var(αi) = m²/3. Because we would like to have more flexibility in these parameter choices (if we have prior knowledge on αi), we consider different priors in Section 3, which then leads to a Bayesian BF model. For the BF method we refer to Bornhuetter & Ferguson (1972).
• Note that the MAP predictors do not depend on the explicit choices of m, b and ϕ, once m and b are sufficiently large. On the other hand the MMSE predictors will depend on these parameter choices.
The MMSE predictor for Ri in (2.3) is given by
Due to the posterior dependence between αi and βj, given , this cannot be further decoupled and calculated in closed form, see also Verrall (Reference Verrall2004). Therefore, the MMSE predictor can only be calculated numerically.
We analyze the right-hand side of (2.4) in more detail. Performing the change of variables μi = exp{αi} and γj = exp{βj} we obtain
with posterior density
Maximizing the right-hand side of (2.6) provides the MAP estimators of μi and γj under this parametrization.
Remarks 2.3
• Basically the same remarks about the uniqueness of the MAP estimators apply as in Remarks 2.2: (i) they are only unique up to multiplication (and division, respectively) by a positive constant; (ii) we choose m > 0 and b > 0 so large that the mode of the density (2.6) lies within the support of the priors.
• The MAP optimization problem (2.6) can be solved analytically. This is discussed in Section 3.2, below.
• The MAP predictor for the outstanding loss liabilities Ri in (2.3) is then defined by
This now leads to a slightly unpleasant observation. Note that the MMSE predictor in (2.5) does not depend on the parametrization. This is not true for the MAP predictor! The MAP estimators in the log-link parametrization solve the system of equations (2.1)–(2.2), whereas the MAP estimators in the (μ, γ) parametrization solve the system of equations (3.11)–(3.12), below. Because these two systems of equations differ, the resulting MAP predictors of the outstanding loss liabilities differ as well.
This property is well known in Bayesian statistics, see for example Smith (1998). It gives us a first indication that the MAP predictor is not always suitable in a Bayesian context.
2.2 Prediction uncertainty
We measure the prediction uncertainty in terms of the conditional mean square error of prediction (MSEP) which, for a predictor of Ri that is measurable w.r.t. the observations, is given by (see also Section 3.1 in Wüthrich & Merz (2008))
From (2.7) we see that the MMSE predictor minimizes the conditional MSEP. The conditional MSEP for the MAP predictor is given by
The MMSE predictor and the conditional MSEP can, in general, only be determined numerically, using e.g. the MCMC methodology.
First conclusions. In many situations the MAP predictor has the advantage over the MMSE predictor that it can be calculated analytically. The MMSE predictor on the other hand has the advantage that it minimizes the prediction uncertainty if we use the conditional MSEP as uncertainty measure. The MAP predictor picks up a positive bias term, see (2.8). This bias term however needs to be interpreted carefully: it is always measured w.r.t. the posterior density.
2.3 Link to the chain ladder algorithm
The remarkable property of the MAP predictor in Model 2.1 with non-informative priors and log link is that it is equal to the CL reserves. That is, the non-informative priors Bayesian Model 2.1 with MAP predictors is another stochastic model that leads to the CL reserves: since our system (2.1)–(2.2) of equations is exactly the same as the one for the ODP model, see (2.16)–(2.17) in Wüthrich & Merz (2008), we have
In the literature this was, for example, proved by Mack (1991). Therefore, we define for j = 0,…,I – 1 the CL factor estimators
Corollary 2.18 and Remarks 2.19 in Wüthrich & Merz (2008) then imply that (for the appropriate normalizing constant K)
That is, we can explicitly calculate the MAP predictors. Moreover, this gives another stochastic model that allows for the calculation of the conditional MSEP given in (2.8). Unlike in Mack's (1993) distribution-free CL model and in the ODP model (see England & Verrall (2002), Section 7.2) we do not need any approximations here for the estimation of the MSEP, but we calculate the exact conditional MSEP value (2.8) numerically in this Bayesian inference model (using the MCMC methodology).
In this spirit the parameter uncertainty in Θ is part of the model, see (2.8). Moreover, because we have all key figures in terms of the full posterior distributions, we can calculate any risk measure of interest (not only the conditional MSEP).
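This equivalence can be checked numerically on a small triangle. The sketch below is our own illustration (the toy triangle and all variable names are invented): it solves the marginal-sum equations corresponding to (2.1)–(2.2) by alternating updates of the row and column parameters over the observed cells, and compares the implied reserves with those of the standard CL algorithm based on the factor estimates (2.10). The two routes yield the same reserves.

```python
import numpy as np

# Toy incremental triangle (I = 3); np.nan marks the unobserved lower triangle.
X = np.array([
    [100., 60., 30., 10.],
    [110., 65., 33., np.nan],
    [120., 70., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
I = X.shape[0] - 1
obs = ~np.isnan(X)

# Route 1: solve the marginal-sum equations behind (2.1)-(2.2) by alternating
# updates of the row parameters mu_i and the column parameters gamma_j.
mu = np.nansum(X, axis=1)
gamma = np.ones(I + 1)
for _ in range(200):
    for i in range(I + 1):
        j_obs = obs[i]
        mu[i] = np.nansum(X[i, j_obs]) / gamma[j_obs].sum()
    for j in range(I + 1):
        i_obs = obs[:, j]
        gamma[j] = np.nansum(X[i_obs, j]) / mu[i_obs].sum()

fitted = np.outer(mu, gamma)
reserves_ms = np.array([fitted[i, I - i + 1:].sum() for i in range(I + 1)])

# Route 2: classical chain ladder on the cumulative claims C_{i,j}.
C = np.where(obs, np.nancumsum(np.where(obs, X, 0.0), axis=1), np.nan)
f = np.array([
    np.nansum(C[:I - j, j + 1]) / np.nansum(C[:I - j, j])   # CL factor estimators (2.10)
    for j in range(I)
])
latest = np.array([C[i, I - i] for i in range(I + 1)])
ultimates = np.array([latest[i] * np.prod(f[I - i:]) for i in range(I + 1)])
reserves_cl = ultimates - latest

print("marginal-sum reserves:", np.round(reserves_ms, 2))
print("chain-ladder reserves:", np.round(reserves_cl, 2))   # agrees with Route 1
```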
3 Gamma prior distributions
In Section 2 we have used uniform priors with log links in order to obtain the CL reserves. In this section gamma prior distributions (with the identity link) are used, especially for the modelling of the row parameters μi. This allows us to incorporate prior expert knowledge about the model parameters and we obtain claims reserves in a similar spirit to the BF method. However, in our model, we still have the freedom to determine how much credibility weight we give to the prior knowledge. A similar Bayesian ODP model with gamma priors has, for example, already been studied in Section 7.11 of England & Verrall (2002) and Example 4.51 of Wüthrich & Merz (2008).
3.1 Informative priors for the row parameters
Model 3.1 In addition to Model 1.1 we assume that the μi are Γ-distributed with mean mi > 0 and shape parameter ai > 0, the γj are Γ-distributed with mean cj > 0 and shape parameter b > 0, and ϕ > 0 is constant.
In contrast to Model 2.1, we now extend the prior model for μi to a two-parameter distribution. Our aim is to keep the mean mi fixed and study the sensitivity in the shape parameter ai. The priors for γj will be chosen to be non-informative (i.e. b→0).
The posterior density (likelihood function) in Model 3.1 is given by
The MAP predictors using non-informative priors for γj (i.e. b→0) are then found by solving
with adjusted incremental claims
and adjusted cumulative claims .
Therefore, the MAP predictors for a non-informative claims development pattern γj will be a function of the parameters ϕ, m = (m0,…,mI) and a = (a0,…,aI).
Lemma 3.2 We assume Model 3.1 is fulfilled, and we assume that for all i = 0,…,I and for all j = 0,…,I. The solution to (3.2)–(3.3) satisfies μi > 0 and γj > 0 for all ai ≥ 0 and i,j = 0,…,I.
We first state a CL type result. Note that in the following lemma we do a CL argument on the rows instead of on the columns; and for a = 0 we obtain the CL method on rows. Its proof is similar to the classical CL result, see e.g. Section 2.4 in Wüthrich & Merz (2008).
Lemma 3.3 In Model 3.1 equations (3.2)–(3.3) imply for j = 0,…,I−1
The statement of Lemma 3.3 can also be written in incremental form, i.e. for i = 1,…,I
This implies the following theorem:
Theorem 3.4 In Model 3.1 equations (3.2)–(3.3) imply for i = 1,…,I
and
Theorem 3.4 is now the key to obtaining the MLEs, which coincide with the solutions of equations (3.2)–(3.3). Note that the right-hand side of (3.5) only depends on μ0,…,μi−1. Therefore, once we know the initial value for μ0, the remaining estimators for μ1,…,μI are calculated iteratively by (3.5). This is discussed in more detail below.
Solution to (3.2)–(3.3) for a ∈ (0,∞). We apply Theorem 3.4. Choose an initial value μ > 0 for μ0, then using (3.5) we define iteratively for i = 1,…,I
where we have defined
Note that the vector of exposure parameters is now a function of one single parameter μ > 0. The MAP predictors for (3.2)–(3.3) are then found by using the normalizing condition (3.6), that is, choose μ > 0 such that
We denote the resulting MAP predictors for 0 ≤ i, j ≤ I by
where the corresponding estimator for γj is obtained from (3.3).
• Predictors in the gamma priors Model 3.1 are denoted by a superscript "*", whereas predictors in the uniform priors Model 2.1 carry no such superscript; their notation depends on the parametrization, see (2.5).
• Note that the MAP predictors can now easily be found by a simple (one-dimensional) root searching algorithm (everything depends on one single parameter μ > 0, see (3.7)). This is slightly more involved than the closed form solution (2.10) in the uniform prior case, but it is a lot simpler than multi-dimensional generalized linear model (GLM) claims reserving problems, where one uses either the Newton-Raphson algorithm or Fisher's scoring method to find the roots, see for example Chapter 6 in Wüthrich & Merz (2008).
• For the special case of constant prior means mi ≡ m and constant shape parameters ai ≡ a we obtain a closed form solution. Equation (3.5) implies for constant mi and ai
The normalization condition (3.6) then provides
Hence, from this we can explicitly calculate the MAP predictor
and the iteration then provides the remaining MAP predictors.
• We can now study the MAP predictors as a function of the degree of information a contained in the prior estimates mi; in particular, we obtain a smoothed claims development pattern, where the degree of smoothing depends on a.
The MAP predictor for the outstanding loss liabilities of accident year i > 0 is then given by
3.2 Non-informative prior case
For the non-informative prior case we let ai→0 for all i. The posterior density (likelihood function) in Model 3.1 for non-informative priors is then asymptotically given by
There are two important observations: (i) The non-informative prior case in Model 3.1 has exactly the same posterior density as the non-informative prior case in Model 2.1 "under the change of variables", see (3.10) and (2.6) for m,b→∞. Therefore, the predictive posterior distributions of the outstanding loss liabilities Ri in these two non-informative priors models will coincide, as will their MAP predictors. (ii) Note that in the case (3.10) the last terms on the left-hand side and the right-hand side of (3.2) disappear. Therefore, we are left with
Similar to the solution of (2.1)–(2.2) we find the following solutions to (3.11)–(3.12)
for any positive constant e^K and CL factors for the transformed observations
Therefore, we obtain a CL model for the incremental claims. However, in this case we can find a "natural" normalizing constant e^K. Theorem 3.4 implies
Proposition 3.6 In Model 3.1 equations (3.2)–(3.3) imply
Therefore, Proposition 3.6 provides a natural scaling constant if we let the degree of information a converge to 0.
3.3 Strong prior case
For the strong prior case we let ai→∞ for all i and obtain from (3.2)–(3.3)
Therefore, in this limit the exposure parameters are given by the prior means mi, and
In this case we can explicitly calculate the posterior distributions of the γj, given the observations. These posterior distributions are independent with
This immediately implies that
and the bias of the MAP predictor of γj is given by
Therefore, in the strong prior case we obtain closed form posterior distributions which allow for an analytical analysis of the model, both for the MAP predictor
and the MMSE predictor
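To make this closed-form regime explicit, the following small sketch evaluates the posterior of a single γj in the strong prior case. It relies on our reading of the limiting model (exposures fixed at μi = mi, Xi,j/ϕ conditionally Poisson with mean miγj/ϕ, and the non-informative limit b→0 treated as an improper γj^(−1) prior), so the shape/rate bookkeeping is an assumption; the payments and prior ultimates are hypothetical numbers, and ϕ is set to the plug-in value used later in the example section.

```python
import numpy as np

# Strong prior case a_i -> infinity: exposures fixed at the prior means m_i.
# Assumption (our reading of Model 3.1): X_{i,j}/phi | gamma ~ Poisson(m_i * gamma_j / phi),
# with the non-informative limit b -> 0 taken as an improper gamma_j^{-1} prior.
phi = 14_714.0                       # dispersion (plug-in value of the example section)
m = np.array([2.2e6, 2.3e6, 2.4e6])  # hypothetical prior ultimates of the rows observed in column j
x = np.array([1.1e6, 1.2e6, 1.3e6])  # hypothetical incremental payments X_{i,j} in column j

# Posterior of gamma_j is then Gamma(shape = sum_i x_{i,j}/phi, rate = sum_i m_i/phi):
shape = x.sum() / phi
rate = m.sum() / phi

gamma_mmse = shape / rate            # posterior mean (MMSE predictor)
gamma_map = (shape - 1.0) / rate     # posterior mode (MAP predictor), valid for shape > 1
bias = gamma_map - gamma_mmse        # equals -phi / sum_i m_i

print(f"MMSE estimate of gamma_j: {gamma_mmse:.6f}")
print(f"MAP  estimate of gamma_j: {gamma_map:.6f}")
print(f"bias MAP - MMSE         : {bias:.6f}  (= -phi/sum m_i = {-phi / m.sum():.6f})")
```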
3.4 Link to the Bornhuetter & Ferguson (1972) method
The BF method, as applied in practice, uses as claims development pattern γj the one implied by the CL factor estimates given in (2.10). Therefore, the classical BF predictor for the outstanding loss liabilities is given by
where the underlying parameter estimates solve (2.1)–(2.2). This exactly corresponds to the BF predictor studied in Alai et al. (2009).
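As an illustration of this classical construction, the sketch below (hypothetical CL factors and prior ultimates, our own variable names) turns the CL factor estimates into the implied cumulative development pattern and then applies the BF formula: prior ultimate times the expected still-to-come fraction.

```python
import numpy as np

# Chain-ladder factors estimated from the data (hypothetical values for illustration).
f = np.array([1.5, 1.2, 1.1])                 # f_0, ..., f_{I-1}, here I = 3
I = len(f)
m = np.array([2.0e6, 2.1e6, 2.2e6, 2.3e6])    # prior (expert) ultimates m_0, ..., m_I

# CL-implied cumulative development pattern: beta_j = expected fraction of the
# ultimate claim paid up to development year j, with beta_I = 1.
beta = np.array([1.0 / np.prod(f[j:]) for j in range(I)] + [1.0])

# Classical BF reserve for accident year i: prior ultimate times the expected
# still-to-come fraction (1 - beta_{I-i}).
reserves_bf = np.array([m[i] * (1.0 - beta[I - i]) for i in range(I + 1)])

print("BF pattern beta:", np.round(beta, 4))
print("BF reserves    :", np.round(reserves_bf, 0))
```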
Mack (2008) provides a different BF predictor where he uses a different method for the estimation of the claims development pattern γj. We include a comparison of the results with two versions of the Mack (2008) method. In the first case we define the raw pattern, see formula (3) in Mack (2008),
This pattern is not normalized, i.e. does not add up to 1. Therefore, we can also study a second development pattern defined by
We then define, similar to Mack (2008),
These BF predictors can now be compared to the CL predictor, as well as to the MAP predictors for ai ∈ [0,∞] and the corresponding MMSE predictors. In this spirit, the Bayesian predictors can be viewed as BF predictors where ai determines the degree of information contained in the prior value mi. These predictions and estimators are compared in Section 5.
Moreover, the resulting claims development patterns can be viewed as smoothed patterns, where the prior information mi enters according to its degree of information ai.
4 Bias, prediction uncertainty and MCMC
4.1 Gibbs sampler
In general, Models 2.1 and 3.1 do not allow for analytical calculations of the posterior distributions. In most cases the posterior distribution of the parameters can only be determined up to the normalizing constant. This is then the ideal situation to apply MCMC simulation methods which provide empirical posterior distributions. These empirical posterior distributions then allow for the calculation of claims reserves, cash flows and any desirable risk measure. For an introduction to MCMC methods we refer to Gilks et al. (1996), Asmussen & Glynn (2007) and Spiegelhalter et al. (1995, 2002). We mention that in the recent actuarial literature MCMC methods have become rather popular, see e.g. Scollnik (2001) and the literature therein, England & Verrall (2002, 2006) and Section 4.4 in Wüthrich & Merz (2008).
Here, we use the Gibbs sampler, see Gilks et al. (1996), page 12. The Gibbs sampler is a simplified version of the single-component Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970). Our aim is to sample from the posterior density u(θ|·) given in (3.1). This posterior density has the special property that
Thus, we can sample directly from these conditional posterior densities, namely the conditional density of μ given γ (and the observations) and the conditional density of γ given μ (and the observations). The Gibbs sampler then goes as follows:
1. Initialize the parameter vector.
2. For t ≥ 1 do
(a) generate a new sample of μ from its conditional posterior density, given the current value of γ;
(b) generate a new sample of γ from its conditional posterior density, given this new value of μ;
(c) collect these sampled values as the new parameter vector.
Then, this algorithm provides a Markov chain whose stationary limit distribution is given by the posterior u(θ|·), see Gilks et al. (1996) and Asmussen & Glynn (2007).
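A compact sketch of such a Gibbs sampler is given below. It assumes the Poisson–gamma conjugacy suggested by Model 3.1 (with Xi,j/ϕ conditionally Poisson with mean μiγj/ϕ, and the Γ prior with mean mi and shape ai parametrized with rate ai/mi), so that the full conditionals of the μi and the γj are again gamma distributions. The triangle, prior settings and run length are hypothetical; this is an illustration, not the implementation behind the paper's tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical incremental triangle (np.nan = unobserved) and prior settings.
X = np.array([
    [100., 60., 30., 10.],
    [110., 65., 33., np.nan],
    [120., 70., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
I = X.shape[0] - 1
obs = ~np.isnan(X)
phi = 5.0                                                        # dispersion, assumed known
m_prior, a_prior = np.full(I + 1, 210.0), np.full(I + 1, 4.0)    # gamma priors for mu_i
c_prior, b_prior = np.full(I + 1, 0.25), 1e-4                    # (nearly) non-informative for gamma_j

def gibbs(n_iter=20_000, burn_in=5_000):
    mu = m_prior.copy()
    gamma = np.full(I + 1, 0.25)
    draws = []
    for t in range(n_iter):
        # full conditional of mu_i: Gamma(a_i + sum_j x_ij/phi, rate = a_i/m_i + sum_j gamma_j/phi)
        for i in range(I + 1):
            shape = a_prior[i] + np.nansum(X[i, obs[i]]) / phi
            rate = a_prior[i] / m_prior[i] + gamma[obs[i]].sum() / phi
            mu[i] = rng.gamma(shape, 1.0 / rate)
        # full conditional of gamma_j: Gamma(b + sum_i x_ij/phi, rate = b/c_j + sum_i mu_i/phi)
        for j in range(I + 1):
            shape = b_prior + np.nansum(X[obs[:, j], j]) / phi
            rate = b_prior / c_prior[j] + mu[obs[:, j]].sum() / phi
            gamma[j] = rng.gamma(shape, 1.0 / rate)
        if t >= burn_in:
            draws.append((mu.copy(), gamma.copy()))
    return draws

draws = gibbs()
mu_mmse = np.mean([d[0] for d in draws], axis=0)      # sample means approximate the MMSE predictors
gamma_mmse = np.mean([d[1] for d in draws], axis=0)
print("MMSE mu   :", np.round(mu_mmse, 2))
print("MMSE gamma:", np.round(gamma_mmse, 4))
```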
4.2 Empirical distribution from Gibbs sampling
Using the Gibbs sampler we obtain (after burn-in T) an empirical distribution from the sample
which is an estimator for the posterior distribution u(·|). Therefore, we estimate the MMSE predictor by the sample mean
To indicate that this is the sample mean we use two hats in the notation. The conditional MSEP of the MMSE predictor is estimated similarly. Note that
Therefore, we get the estimator
Remarks 4.1
• We would like to emphasize that, using the Gibbs sampler, we estimate not only the conditional MSEP: the Gibbs sampler provides an approximation to the full posterior distribution u(·|) and one can calculate any desirable risk measure.
• The empirical sample allows for the simulation of the payments Xi,j: for any t > T we may sample for i + j > I
This provides the simulated cash flows. The sampled outstanding loss liabilities Ri are then obtained by
The sample then provides the empirical posterior distribution of Ri, given , see also Figure 2. Moreover, it also allows for the direct estimation of (4.1), simply by calculating the sample variance of the simulated values.
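The simulation of the cash flows and the estimation of the MMSE predictor and of (4.1) can be sketched as follows. In practice the posterior parameter draws would come from the Gibbs sampler of Section 4.1; here they are replaced by hypothetical stand-in draws so that the snippet is self-contained, and the ODP cells are represented, as is common for this model, by ϕ times a Poisson draw with mean μiγj/ϕ.

```python
import numpy as np

rng = np.random.default_rng(1)
I, phi = 3, 5.0

# Stand-in for posterior parameter draws (in practice these come from the Gibbs
# sampler of Section 4.1); faked here with fixed gamma distributions so the
# snippet runs on its own.
n_draws = 10_000
mu_draws = rng.gamma(200.0, 1.0, size=(n_draws, I + 1))        # hypothetical mu_i draws
gamma_draws = rng.gamma(50.0, 0.005, size=(n_draws, I + 1))    # hypothetical gamma_j draws

future = [(i, j) for i in range(I + 1) for j in range(I + 1) if i + j > I]

reserve_samples = np.empty(n_draws)
for t in range(n_draws):
    # ODP cells represented as phi * Poisson(mu_i * gamma_j / phi), see Model 1.1
    cells = [phi * rng.poisson(mu_draws[t, i] * gamma_draws[t, j] / phi) for (i, j) in future]
    reserve_samples[t] = np.sum(cells)

mmse_total = reserve_samples.mean()          # sample mean: estimate of the MMSE predictor of R
msep_total = reserve_samples.var(ddof=1)     # sample variance: estimate of (4.1)
print(f"total reserve (MMSE estimate): {mmse_total:,.1f}")
print(f"sqrt(conditional MSEP)       : {np.sqrt(msep_total):,.1f}")
print("95% interval:", np.round(np.quantile(reserve_samples, [0.025, 0.975]), 1))
```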
5 Example
5.1 Univariate example
Before we start with a real data example (in the next subsection) we illustrate the behaviour of the MAP and the MMSE predictors in a univariate example. This example highlights the importance of the choice of the prior distribution and the link function, and their implications.
Assume that, conditionally given Λ, the random variables X1,…,Xn, Xn+1 are i.i.d. Poisson distributed with parameter Λ. We assume that X1,…,Xn are observed and we would like to make Bayesian inference for Λ and Xn+1.
We now make different choices for the prior distribution of Λ:
Case 1. Γ = log(Λ) has a non-informative uniform prior distribution. In that case the posterior distribution of Γ, given X = (X1,…,Xn), is given by
This implies
In this case the MAP and the MMSE predictors coincide.
Case 2. We make the same assumptions as in Case 1 but we do a change of variable in (5.1). Setting Λ = e^Γ provides the posterior density
This implies
That is, the MAP predictor differs from the one obtained in Case 1. This shows that the MAP predictors are not invariant under re-parametrization and therefore are often not appropriate. This is well-known in Bayesian theory, see for example Smith (1998).
Case 3. Λ has a non-informative gamma prior distribution. In that case the posterior distribution of Λ, given X, has exactly the same form as in Case 2 and therefore we obtain the same inference picture as in Case 2.
Case 4. Λ has the non-informative Jeffreys prior distribution. In that case the posterior distribution of Λ, given X = (X1,…,Xn), is given by
Jeffreys non-informative priors are often used because they have invariance properties under parameter transformations. This implies
In this paper we do not further investigate Jeffreys priors.
Conclusion. The MMSE predictor always has minimal posterior variance and is invariant under re-parametrization. Therefore the optimal Bayesian predictor for Xn+1, given the observations, is always given by
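The difference between Cases 1–3 can be checked numerically on toy data. In the sketch below (our own grid-based computation, illustrative observations), Case 1 maximizes the posterior on the log scale Γ and maps the mode back, while Cases 2/3 take the mode directly on the Λ scale; the MMSE predictor of Λ is the same under both parametrizations, whereas the two MAP values differ.

```python
import numpy as np

x = np.array([3, 5, 4, 6, 2])      # toy Poisson observations
n, s = len(x), x.sum()

# Case 1: uniform prior on Gamma = log(Lambda); work on a grid for Gamma.
g = np.linspace(-2.0, 4.0, 400_001)
dg = g[1] - g[0]
log_post_g = s * g - n * np.exp(g)                 # log posterior of Gamma, see (5.1)
post_g = np.exp(log_post_g - log_post_g.max())
post_g /= post_g.sum() * dg
lam_map_case1 = np.exp(g[np.argmax(post_g)])       # mode on the log scale, mapped back to Lambda
lam_mmse = (np.exp(g) * post_g).sum() * dg         # E[Lambda | X], parametrization-free

# Cases 2/3: same model, but the mode is taken directly on the Lambda scale.
lam = np.linspace(1e-6, 15.0, 400_001)
log_post_l = (s - 1.0) * np.log(lam) - n * lam     # change-of-variables density of Lambda
lam_map_case2 = lam[np.argmax(log_post_l)]

print(f"sample mean               : {s / n:.4f}")
print(f"MMSE predictor of Lambda  : {lam_mmse:.4f}")       # equals the sample mean
print(f"MAP, Case 1 (log scale)   : {lam_map_case1:.4f}")  # equals the sample mean
print(f"MAP, Case 2 (Lambda scale): {lam_map_case2:.4f}")  # equals (sum x - 1)/n, smaller
```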
5.2 Real data example
We revisit the BF example given in Tables 2.2–2.4 of Wüthrich & Merz (2008) (this is the example also considered in the BF analysis of Alai et al., 2009), see Table 1. We analyze this data set for non-informative uniform priors according to Model 2.1 and for gamma priors according to Model 3.1. In order to compare our results to those in Wüthrich & Merz (2008) and Alai et al. (2009) we choose a fixed plug-in estimate ϕ = 14,714.
5.2.1 Non-informative priors and the CL method
In this subsection we study Model 2.1 with non-informative uniform priors and log link as well as Model 3.1 with non-informative gamma priors. The Gibbs sampler allows us to numerically calculate the MMSE predictors
for Model 2.1 (with m,b→∞) and Model 3.1 (with ai→0 and b→0). In Model 2.1 the posterior density is then given by (2.5)–(2.6) with m,b→∞. In Model 3.1 the posterior density is then given by (3.10).
Note that for these two non-informative prior cases the posterior densities coincide, see (2.6) and (3.10). Therefore, we only need to run one Gibbs simulation to solve both of these two cases numerically.
We have used the Gibbs sampler and we have run 1,000,000 simulations after a burn-in of T = 100,000 iterations. This provided the empirical posterior distribution of the parameters, from which the MMSE predictors and their empirical uncertainty were obtained, see Section 4.2. For the estimation of the prediction uncertainty of the MAP predictor we have used formula (2.8). The results are presented in Table 2.
Observations 5.1
• We observe that the predictors of the outstanding loss liabilities are all rather similar in these non-informative prior situations. The MAP predictor coincides with the CL reserves and it is also in line with the MMSE predictors obtained by Gibbs sampling. Only the MAP predictor in the non-informative gamma priors Model 3.1 gives a prediction that deviates from the others. This prediction seems too low and moreover, as mentioned in Remarks 2.3, the MAP predictor is not invariant under re-parametrization. Therefore its use is questionable.
• Note that although the MAP predictors for Models 2.1 and 3.1 (with non-informative priors) are different, the distributions of the reserves are identical since they are from the same Gibbs simulation. This highlights the danger of focusing solely on the MAP predictors, and not on the distribution.
• Prediction uncertainties in terms of the conditional MSEP: We compare our Bayesian calculations to the frequentist's estimates found in the literature: (i) ODP (constant scale) analytical approximation using asymptotic normality of MLEs, see Section 7.2 in England & Verrall (2002), (ii) distribution-free CL method, see Mack (1993):
We observe that our Bayesian models provide a prediction uncertainty in the range of 430,000. This is very similar to the estimate of England & Verrall (2002) in the asymptotic normality approximation. Mack's (1993) model is a rather different model, therefore we include Mack's (1993) results only for comparison purposes.
• The Bayesian models now have the advantage that they provide the full posterior parameter distributions. Therefore, we can calculate the predictive distribution of the outstanding loss liabilities (not only the claims reserves and the conditional MSEP). This is further outlined below.
5.2.2 Informative gamma priors
We turn to Model 3.1 (gamma priors) with informative priors, that is, we implement prior knowledge about the exposure parameters μi. We choose the degree of information a to be constant over all accident years, i.e. ai ≡ a. Then the MAP predictors in Model 3.1 are given by
These are calculated by the root searching algorithm given in (3.7) for a∈(0,∞), the cases a = 0 and a = ∞ can be solved explicitly. Figure 1 gives the MAP predictors for different degrees of information a∈[0,∞]. We see that in our case the claims reserves are an increasing function in the degree of information a. This comes from the fact that the prior estimates mi are rather conservative (this will be further highlighted below).
Next, we determine the MMSE predictors and the prediction uncertainties in the gamma priors model. To this end we again apply the Gibbs sampler. After a burn-in of T = 100,000 iterations we again simulate 1,000,000 samples. The results are provided in Table 3.
Observations 5.2
• The first observation is that in the Gibbs sampler we obtain long-range dependencies for later development periods. This comes from the fact that we have large variances (for non-informative priors) and only a few observations. Therefore, we need many simulations for the convergence of the empirical mean.
• Similar to the non-informative gamma prior case we see substantial posterior bias terms in the MAP predictors. This comes from the fact that the dispersion ϕ is fairly large compared to the incremental payments Xi,j in later development periods. For a = ∞, for example, this results in the bias term derived in Section 3.3, which explains the posterior bias terms. This again indicates that the MAP predictors should not be used.
• We see that the MMSE predictors are increasing in the degree of information a. This comes from the fact that the prior means mi were chosen rather conservatively, and the more weight we give to these conservative prior means the more the MMSE predictors increase.
• The conditional MSEPs are decreasing in the degree of information a. This is rather obvious because the less uncertainty we have in the prior distributions the less prediction uncertainty we obtain. We see that the conditional prediction uncertainty decreases from 430,160 to 395,012.
As mentioned above, from the Gibbs sampler we obtain the full posterior distribution of the outstanding loss liabilities, conditional on the observations, see (4.3). We consider in Model 3.1 the case a = 100. The histogram of the total reserves from 100,000 simulations of the outstanding loss liabilities R is given in Figure 2.
This empirical distribution now allows for the estimation of any risk measure, not only the conditional MSEP. Moreover, we can also plot confidence intervals, for example, in Figure 3 we show the confidence intervals per accident year i. As expected, we observe that the uncertainty in old accident years is rather low, because they are already well developed, whereas for younger accident years we obtain bigger ranges.
The Gibbs sampler not only provides the conditional distribution of the outstanding loss liabilities Ri, given the observations, but also the conditional distribution of the cash flows Xi,j. From these cash flows we can determine how the uncertainty evolves over time (over the development years). In Figure 4 we show the development of the uncertainty over time for the youngest accident year. We see that the payment for the first development year is given (it is contained in the observations). This is why there is no uncertainty at time 1. After this first development year we obtain the corresponding confidence intervals. Figure 5 describes the same development uncertainty but for the second youngest accident year. Of course, we observe a smaller uncertainty, because we have more observations compared to the case in Figure 4.
5.2.3 Strong gamma prior case and the BF method
Finally, we compare the gamma priors Model 3.1 with strong priors for μi to the classical BF predictor. In the literature the classical BF predictor is given by (3.16). We also compare the classical BF predictor to the BF predictors obtained from Mack (2008):
For the calculation of the prediction uncertainty of the BF predictor there are different methods in the literature: the conditional MSEP of the classical BF predictor is calculated according to Alai et al. (2009), and the conditional MSEPs of the BF-Mack predictors are calculated according to Mack (2008). The results are presented in Table 4.
Observations 5.3
• The different BF predictors are rather diverse. This comes from the fact that the prior values mi are too high, which has the rather unpleasant effect that we do not obtain reliable estimates for the claims development pattern γj. The raw pattern does not add up to 1. If we normalize this raw pattern, we get predictors that are too high. Also the non-normalized ones seem to be too high because of the large values of mi.
• Two of these predictors coincide because they use the same parameter estimates. However, the underlying reasoning is slightly different, which can be seen in the prediction uncertainty: for the MMSE predictor there is no uncertainty in mi (because we assume perfect information a = ∞), whereas in the BF version we also add uncertainty to mi.
• The gamma priors Model 3.1 is consistent in the sense that it also uses the prior knowledge on μi to estimate the claims development pattern γj (whereas the other BF methods are not). In this spirit our Bayes model should be preferred. Moreover, we also have the flexibility to attach credibility weights in terms of a to this prior knowledge which then results in Table 3.
6 Conclusions
The Bayesian ODP claims reserving models with uniform priors and log link (Model 2.1) and with gamma priors (Model 3.1) give mathematically consistent ways to estimate claims reserves in the Bornhuetter & Ferguson (1972) spirit:
• they use prior knowledge mi for the expected ultimate claim;
• they combine the prior knowledge mi with an estimated claims development pattern to obtain the reserves;
• this claims development pattern is estimated using a credibility weighted average between the observations and the prior knowledge mi according to the degree of information a contained in the prior knowledge. Complete prior knowledge (a = ∞) leads to a BF model similar to Mack (Reference Mack2008), no prior knowledge (a = 0) leads to the CL case, and for a∈(0,∞) we can model any intermediate case.
The advantage of such full Bayesian models is that they allow for a complete analysis and for the calculation of any risk measure, whereas the frequentist's approaches (Alai et al. (2009) and Mack (2008)) need additional approximations for the determination of the conditional MSEP, and are unable to provide additional information such as predictive distributions of cash flows.
Limitations and outlook for further research. This paper only considers the ODP model with constant scale factor ϕ, and the BF model in the context of the CL model without a tail factor. In many cases the choice of a constant scale parameter ϕ should be checked; often the data suggest a dispersion ϕj that depends on the development period j. Furthermore, it should be checked whether the conditional independence assumption between the Xi,j's is appropriate and whether one should include tail factors beyond the latest development period.
A Proofs
Proof of Lemma 3.2. This is an immediate consequence of the assumptions and (3.2)–(3.3).□
Lemma A.1In Model 3.1 equations (3.2)–(3.3) imply for j = 0,…,I
Proof of Lemma A.1. We first prove the two statements (an empty sum is set equal to 0)
If we sum (3.2) over i = 0,…,I and (3.3) over j = 0,…,I we obtain
This immediately implies statement (A.2). We now turn to (A.1). The proof is similar to the proof of Lemma 2.17 in Wüthrich & Merz (2008) and goes by induction.
We start with j = 0: using (3.3) in the second step we have
This proves the claim for j = 0.
Induction step j→j + 1. We assume that the claim holds true for j ≤ I−1, then we prove the claim for j + 1:
To the first term on the right-hand side we apply the induction assumption
and to the second and third term (3.2) and (3.3), respectively. This implies
This proves (A.1). If we now combine (A.1) and (A.2) we obtain
This proves the claim.□
Proof of Lemma 3.3. Choose j ≤ I–1 then we have from Lemma A.1 and equation (3.2)
If we divide the equality in Lemma A.1 by (A.3) we obtain the claim.□
Proof of Theorem 3.4. We solve (3.4) for μi. In a first step we obtain for i = 1,…,I
Moreover, we have
Therefore, if we divide by the bracket on the left-hand side we obtain
But then the first claim easily follows. The second claim was already proved in (A.2).□
Proof of Proposition 3.6. The proof follows from the normalization condition (3.6) and (3.13) similar to the derivation (3.8).□