
Automated Graduation using Bayesian Trans-dimensional Models

Published online by Cambridge University Press:  12 July 2011


Abstract

This paper presents a new method of graduation which uses parametric formulae together with Bayesian reversible jump Markov chain Monte Carlo methods. The aim is to provide a method which can be applied to a wide range of data and which requires little adjustment or modification. The method also does not require one particular parametric formula to be selected: instead, the graduated values are a weighted average of the values from a range of formulae. In this way, the new method can be seen as an automatic graduation method which we believe can be applied in many cases without any adjustments and provide satisfactory graduated values. An advantage of the Bayesian approach is that, unlike standard methods of graduation, it allows for model uncertainty.

Type
Papers
Copyright
Copyright © Institute and Faculty of Actuaries 2011

1. Introduction

Many methods have been proposed for graduating mortality data in order to provide smoother values which can be used in practice. Parametric models, using maximum likelihood or weighted least squares estimation, are very useful when there is sufficient data. Forfar et al. (1988) provides a comprehensive introduction to the use of parametric models for graduation, with a particular emphasis on standard tables. Forfar et al. (1988) define a family of models which is sufficiently broad to provide satisfactory results in many cases, which they called “Gompertz-Makeham formulae”. When classical estimation methods are used for Gompertz-Makeham (GM) formulae, it is necessary to fit a wide range of formulae and search through these to find one particular curve which provides a satisfactory graduation. The idea of this paper is to use GM formulae, but to replace the classical estimation with a Bayesian method which does not require the process of searching through a range of candidate models to identify the best one to use. Instead, the Bayesian method calculates the posterior probability for each model and produces graduated values based on these. In effect, the graduated values are a weighted average of the values from each GM formula, where the weights are the posterior probabilities for each GM formula. If there is one GM formula which is clearly the “best” model to use, then its posterior probability should be close to 1, and the graduated values will be close to those from that formula. While this can happen in certain circumstances, it is more likely that there is some doubt about which model is the best one (as can be seen from some of the examples in Forfar et al., 1988). In this case, the new method in this paper has some advantages since it does not require a single formula to be chosen.

We believe that this new method may have a further advantage over the classical estimation methods, since it is more flexible as well as automatic. It is more flexible since it can use a combination of GM formulae, and we believe that this flexibility means that it is possible that the method could be applied to a wider range of data than the straightforward GM formulae.

There have been a number of applications of Bayesian methods to the estimation of mortality rates. These include, for example, Kimeldorf & Jones (1967), Broffit (1988) and Carlin (1992), which used conventional Bayesian analysis. Markov chain Monte Carlo (MCMC) methods have been applied to graduation by Scollnik (2001) and Neves & Migon (2007). Scollnik (2001) provides an excellent introduction to the use of MCMC methods, with an example of the application to graduation. Neves & Migon (2007) apply hierarchical dynamic models (Gamerman & Migon, 1993) to graduation. Also, Czado et al. (2005) apply Bayesian estimation to Poisson log-bilinear regression for mortality forecasting, using MCMC methods.

In this paper, we use reversible jump Markov chain Monte Carlo methods, which are an extension of the MCMC methodology applied to graduation in the actuarial literature. The reversible jump algorithms allow us to consider cases where the dimension of the parameter vector is unknown: it is not known, a priori, how many parameters are appropriate for a particular regression. We use the generic reversible jump implementation in the package winBUGS (Lunn et al., 2000).

Bayesian methods have been transformed by the use of Markov chain Monte Carlo (MCMC) methods: see, for example, Gilks et al. (1996). These methods have enabled statisticians to apply complex Bayesian models to a very wide range of applications. The monograph by Congdon (2006) provides many wide-ranging examples. As mentioned above, Scollnik (2001) provides an excellent introduction with actuarial examples, and we would also recommend Johansen et al. (2010) for details of the algorithms themselves. An important extension is the use of reversible jump MCMC (RJMCMC) methods (Green, 1995), which allow the analysis of trans-dimensional models. The key idea is to extend the range of models so that the number of variables is also unknown. In the context of parametric models for graduation, we can therefore apply a set of models and allow the Bayesian estimation process to indicate (through the posterior distributions) which are the most appropriate for the data. This is all part of the model, and it is not necessary to make subjective decisions about how many parameters to use for the graduation. In fact, the model can be used so that the graduated values are weighted averages of values from a number of different GM models, with the weights chosen according to the posterior probability for each model. In this way, it is possible to add some flexibility to the family of GM models, which may enable them to be used when conventional estimation fails: in other words, when the parametric models would otherwise be abandoned in favour of a non-parametric approach (for example). The approach we use is implemented within winBUGS, using the RJMCMC procedures outlined in Lunn et al. (2009).

While parametric graduation has proved to be very successful in a number of contexts, there are many other areas where it has not been found to be suitable. In general, this is when there is not enough data, or where the underlying pattern of mortality rates is such that no parametric curve can be found which proves satisfactory. In the latter case, the problems are usually caused by particular features such as the accident hump or the rapid changes in mortality rates during infancy, which are difficult to capture with a parsimonious model. There have been some suggestions for parametric models to cater for features such as this, including the model for the full age range by Heligman & Pollard (1980), and it is possible that a trans-dimensional Bayesian approach could be used for this class of models as well. However, the focus of this paper is on the family of GM formulae proposed by Forfar et al. (1988), and we believe that the Bayesian approach will extend the number of cases where they prove suitable, as well as providing a less subjective method for applying them. We recognise that there are some features which they are not able to capture, and therefore this paper looks at the graduation of mortality rates over adult years, from approximately age 18 upwards. This may include the accident hump, but some care will need to be taken to ensure that the fitted rates are suitable.

Alternative approaches that can be used when parametric modelling is not suitable include non-parametric graduation such as Whittaker graduation (Whittaker, 1923), for which Verrall (1993) proposed a Bayesian model, building on the work of Taylor (1990). These methods have the advantage that they can be more flexible and adapt better to local features of the data. However, they also suffer from some disadvantages and it cannot be claimed that they provide a panacea for all graduation problems. In many ways, we believe that the use of the trans-dimensional approach in conjunction with parametric models provides an ideal combination of the straightforwardness of a mathematical formula together with the flexibility which is often required in practice.

The paper is set out as follows. In Section 2, the notation and methodology of the graduation methods are outlined. Section 3 contains an introduction to the Bayesian methods we use, and Section 4 describes how these can be applied to graduation. Section 5 contains two examples of the application of the new approach to CMI data in Forfar et al. (1988), and Section 6 provides some concluding comments.

2. Parametric graduation and Gompertz-Makeham models

In this section, the notation used is defined and the general class of parametric models is set out. These models were first defined by Forfar et al. (1988). We assume that data are available for a set of (not necessarily consecutive) ages. We denote the age by x, and the set of ages for which data are available by $R_I$, where I denotes all of the observed data which are available. For the sake of notational simplicity, we assume that age is defined as age nearest birthday, although all of the methods are trivially adapted to other definitions. It is then assumed that the observed data, I, consist of the number of deaths, $d_x$, and the central exposure, $E_x^C$, for $x \in R_I$. These data are to be used to estimate the force of mortality, $\mu_x$, over a range of ages which may be larger than $R_I$ (for example, estimated values will be produced at any missing ages, and may also be required outside the range of $R_I$). In this paper, we will assume that the data are generated by a set of independent lives, and will therefore exclude the possibility of duplicate policies, or the case where the data are based on amounts of insurance or annuity. The likelihood can therefore be written as

\[ L_I \propto \prod_{x \in R_I} \mu_x^{d_x}\, e^{-E_x^C \mu_x} \]

(see, for example, Macdonald, 1996), which is equivalent to the use of a Poisson likelihood function. The force of mortality can be estimated by maximum likelihood, with a parametric model for $\mu_x$ inserted when carrying out parametric graduation. Many parametric models have been suggested for $\mu_x$, of which two of the earliest and simplest are the Gompertz model (Gompertz, 1825) and the Makeham model (Makeham, 1859). The Gompertz model is $\mu_x = Bc^x$, and the Makeham model is $\mu_x = A + Bc^x$. While these models are usually too simple to provide satisfactory graduations, it is well known that they do capture some essential properties of the progression of mortality rates over much of the range of life. The Gompertz formula models the aging effect, which is the dominant effect over (approximately) ages 50 to 90. This is so fundamental to the modelling of mortality rates that it is usually used as the base model in some sense, even when non-parametric models are employed. The Makeham model contains this aging effect, but includes a constant, A, which measures a non-age-dependent background mortality rate which is particularly important below the age of 50. Various extensions to these two models have been suggested and used, for example by the Continuous Mortality Investigation (CMI) in the UK in the construction of mortality tables for use in the insurance industry. As a part of this process, Forfar et al. (1988) suggest a general modelling framework which encompasses the Gompertz and Makeham models but allows a much wider range of models to be fitted. Forfar et al. (1988) noticed that many of the parametric models which had been suggested for mortality could be expressed in a unified way and extended to a wider range of possible models. The advantage of this is that it provides a range of models which can be searched through in order to find a reasonable graduation. The general model is called a Gompertz-Makeham (GM) formula, because it starts from these basic mortality models. The GM formula of order (r,s) is

(2.1)
\[ GM^{r,s}(x) = \sum_{i=1}^{r} \alpha_i x^{i-1} + \exp\left( \sum_{i=r+1}^{r+s} \alpha_i x^{i-r-1} \right) \]

with the convention that the sums disappear when r = 0 or s = 0. With this notation, the Gompertz model is a GM(0,2) and the Makeham model is a GM(1,2). The general strategy is to investigate a large range of values of r and s in the GM formula in order to find a reasonable graduation. It should be noted that a model with s < 2 will never be appropriate in the context of the graduation of mortality data. Hence, like Forfar et al. (1988), we only consider models with s ≥ 2.
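As a concrete illustration of (2.1), the following sketch evaluates a GM formula for given parameters. It is written in Python purely for illustration (the paper's own implementation is in winBUGS), and the function name and the numerical parameter values are our own assumptions.

```python
import numpy as np

def gm(x, alpha, r, s):
    """Evaluate the Gompertz-Makeham formula GM^{r,s}(x) of equation (2.1).

    alpha is the parameter vector (alpha_1, ..., alpha_{r+s}); the polynomial
    part uses the first r parameters and the exponentiated polynomial part
    uses the remaining s parameters.  Either sum is absent when r = 0 or s = 0.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    poly = sum(alpha[i] * x**i for i in range(r)) if r > 0 else 0.0
    expo = np.exp(sum(alpha[r + j] * x**j for j in range(s))) if s > 0 else 0.0
    return poly + expo

# The Gompertz model GM(0,2): mu_x = exp(a + b x) = B c^x,
# and the Makeham model GM(1,2): mu_x = A + exp(a + b x).
ages = np.arange(50, 91)
gompertz = gm(ages, [-9.0, 0.09], r=0, s=2)        # illustrative parameter values
makeham = gm(ages, [0.001, -9.0, 0.09], r=1, s=2)  # illustrative parameter values
```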

In order to assess whether a graduation is “reasonable”, some criteria are needed. There are a number of tests of the fit and smoothness of a graduation, but the initial sifting through possible models can be carried out using a likelihood ratio test. Twice the change in the log-likelihood has (approximately) a $\chi^2_{\nu}$ distribution, where the number of degrees of freedom, ν, is the change in the number of parameters (usually 1). The usual approach is to start with a simple model (the Gompertz model) and add parameters one at a time, examining the likelihood ratio test statistic to see whether it is justifiable to add that parameter. Forfar et al. (1988) contains a number of detailed examples of this approach which are very useful in illustrating the overall method. Each of these examples relates to one of the CMI investigations for the period 1979 to 1982, and we use the data from Section 15 (widows of life office pensioners) and Section 16 (male life office pensioners) of Forfar et al. (1988) in the examples in Section 5 of this paper.
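As an illustration of this sifting step, the sketch below (our own, not part of the paper) computes the Poisson log-likelihood implied by the likelihood of Section 2 and applies the likelihood ratio test for adding parameters; the two fitted log-likelihood values are assumed to come from a separate fitting step.

```python
import numpy as np
from scipy.stats import chi2

def poisson_loglik(deaths, exposure, mu):
    """Poisson log-likelihood for graduation data, up to an additive constant:
    the sum over ages of d_x * log(mu_x) - E_x^C * mu_x."""
    deaths = np.asarray(deaths, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(deaths * np.log(mu) - exposure * mu))

def lr_test(loglik_smaller, loglik_larger, df=1, level=0.05):
    """Likelihood ratio test for adding df parameters to a nested GM model:
    twice the change in log-likelihood is referred to a chi-squared
    distribution with df degrees of freedom."""
    statistic = 2.0 * (loglik_larger - loglik_smaller)
    return statistic, statistic > chi2.ppf(1.0 - level, df)
```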

Before setting out the alternative Bayesian estimation method, it is necessary to consider some detailed computational aspects of the GM models. It can be seen that the GM formula, (2.1), contains powers of x, which may become very large: for example, $x^4$ = 100,000,000 at age 100. The effect of this is to make the corresponding parameter extremely small, which can cause computational issues. To avoid this, it is usual to use a transformation of the age, instead of the age itself. This transformation is chosen in order to ensure that it stays in the range [−1, 1], and this can be achieved by using $\frac{x-u}{v}$ instead of x, where $u = \frac{x_{\min} + x_{\max}}{2}$, $v = \frac{x_{\max} - x_{\min}}{2}$ and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values, respectively, of $x \in R_I$. Finally, the GM formulae are defined in terms of Chebycheff polynomials of the first kind, $C_n(x)$, rather than $\{1, x, x^2, x^3, \ldots\}$, where $C_0(x) = 1$, $C_1(x) = x$ and $C_{n+1}(x) = 2xC_n(x) - C_{n-1}(x)$ for n ≥ 1. The reason for using these Chebycheff polynomials is again computational efficiency, since they form an orthogonal basis: for further details of this, see Forfar et al. (1988). Thus, the exact form of the GM formula which we use is

(2.2)
\[ GM^{r,s}(x) = \sum_{i=1}^{r} \alpha_i C_{i-1}\!\left( \frac{x-u}{v} \right) + \exp\left( \sum_{i=r+1}^{r+s} \alpha_i C_{i-r-1}\!\left( \frac{x-u}{v} \right) \right) \]

For the rest of this paper, $GM^{r,s}(x)$ refers to the form in (2.2) rather than (2.1). This parametric formula forms the basis for all of the models that we use, and the form of the model depends on which of the parameters $\alpha_i$ are non-zero. One difference between the approach of Forfar et al. (1988) and the approach taken in this paper is that we do not insist that the models are nested, since we are not using likelihood ratio tests to search through models. This means that we include models where $\alpha_i$ may be 0 for some i < r, even though $\alpha_r$ is non-zero.
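The age transformation and the Chebycheff recursion described above are straightforward to reproduce; the sketch below is an illustration with our own function names, not the authors' winBUGS code, and constructs the transformed ages and the basis values used in (2.2).

```python
import numpy as np

def transform_age(x, x_min, x_max):
    """Map age x into [-1, 1] via (x - u)/v with u = (x_min + x_max)/2
    and v = (x_max - x_min)/2, as in Section 2."""
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (x_max - x_min)
    return (np.asarray(x, dtype=float) - u) / v

def chebycheff_basis(z, n):
    """Return [C_0(z), ..., C_n(z)] using C_0 = 1, C_1 = z and
    the recursion C_{k+1}(z) = 2 z C_k(z) - C_{k-1}(z)."""
    z = np.asarray(z, dtype=float)
    basis = [np.ones_like(z), z]
    for _ in range(1, n):
        basis.append(2.0 * z * basis[-1] - basis[-2])
    return basis[: n + 1]

# Example: transformed ages and the first three polynomials for ages 17 to 107.
z = transform_age(np.arange(17, 108), x_min=17, x_max=107)
C0, C1, C2 = chebycheff_basis(z, 2)
```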

The specification of the model is completed by the distributional assumptions, which, as stated above, are equivalent to the assumption that

\[ d_x \sim \text{independent Poisson with mean } E_x^C \mu_x \]

where $\mu_x = GM^{r,s}(x)$.

The Bayesian approach uses models in the form of (2.2), assumes that they are all equally likely (a priori) and estimates the posterior probability for each of them given the data. This entails assessing a total of $2^{r+s}$ models of varying dimension and calculating the posterior probability for each of these. This is done using MCMC methods, and since the number of parameters is not the same for each model, it also entails using reversible jump methods, as described in Section 3.
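To make the distributional assumption concrete, the following sketch simulates death counts from the Poisson model for a hypothetical exposure schedule and a hypothetical Gompertz force of mortality; all numerical values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

ages = np.arange(60, 91)
exposure = np.full(ages.shape, 1000.0)      # hypothetical central exposures E_x^C
mu = np.exp(-10.5 + 0.10 * ages)            # a hypothetical Gompertz force of mortality

# d_x ~ independent Poisson with mean E_x^C * mu_x
deaths = rng.poisson(exposure * mu)

# Crude rates d_x / E_x^C fluctuate around the underlying mu_x; this sampling
# noise is exactly what graduation is intended to smooth out.
crude = deaths / exposure
```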

3. Trans-dimensional models and Markov chain Monte Carlo methods

In this section, we give a very brief overview of the Bayesian techniques which are used in the new graduation method. There are many books and papers on this methodology, including the books by Congdon (2006) and Gelman et al. (1995). We do not provide the detailed algorithms, but Johansen et al. (2010) provide an excellent introduction together with many more technical details than are appropriate here. The application of the methods uses the software winBUGS, and the web page for the BUGS project contains links to many on-line resources (http://www.mrcbsu.cam.ac.uk/bugs).

At the heart of Bayesian modelling is Bayes’ theorem, where all of the parameters are assumed to be unknown random variables. Thus, the distribution of the observed data, I, is denoted by $f(I\,|\,\theta, M)$ and depends on the unknown parameters θ for a specific model M. It is assumed that M belongs to a class of models, $S_M$. The model and model parameters are assigned prior distributions, $f(M)$ and $f(\theta\,|\,M)$, and the posterior distribution is given by $f(\theta, M\,|\,I) \propto f(I\,|\,\theta, M)\, f(\theta\,|\,M)\, f(M)$. It can be seen from this result that parameter uncertainty is included through the prior distribution of the parameters (conditional on the model); and also model uncertainty is included through the prior distribution for M. It is the inclusion of the prior distribution for M which is the new feature of this paper, and which requires the use of the methods set out in Section 3.1. Note that it is assumed that a GM formula is appropriate for the data, although the values of r and s are not known. Thus, the model uncertainty included in this paper is within the family of GM models.

For graduation purposes, we require the posterior distribution of the mortality rates, μx, given the data I. A more limited aim would be to choose a model first, and then derive the posterior distribution of μx, conditional on the model M and the data I:

(3.1)
\[ f(\mu_x \,|\, M, I) = \int f(\mu_x \,|\, M, \theta)\, f(\theta \,|\, M, I)\, d\theta . \]

Note that this is a standard Bayesian analysis, which can be used to estimate the parameters for a particular model. The more complete problem is not to condition on M, which then enables us to include inference about the models in $S_M$. This is addressed in Section 3.1, and this will give the posterior probability of each model M, given the data, I. In this way, model uncertainty (within $S_M$) is summarised in these posterior probabilities. It is possible to take into account this type of model uncertainty when producing graduated mortality rates in two different ways. Either we can choose the most likely model (a posteriori), $M_{\max}$, and base the graduation on this, or we can estimate μx using a weighted average of all models, using the posterior probabilities for each model as the weights. In other words, the choice is between

(3.2)
\[ f(\mu_x \,|\, M_{\max}, I) \]
(3.3)
\[ \text{and}\quad \sum_{M \in S_M} f(\mu_x \,|\, M, I)\, P(M \,|\, I) . \]

We believe that (3.3) is preferable, since it is usually the case that there is not one particular model which clearly has the highest posterior probability. It is sometimes the case that one model does indeed dominate, and we could then use (3.2). However, it is also the case that this model would then dominate the sum in (3.3) and hence the graduated mortality rates. If it is desired that the graduated rates should follow precisely a parametric curve, then (3.2) should be used, and it will be necessary to go through a similar method of model choice as for classical estimation methods (as in Forfar et al., 1988). However, we believe that the added flexibility of leaving all of the models in the estimation, albeit with possibly very small posterior probabilities in (3.3), is very useful in the context of graduation. Also, it is often the case that there are a number of models whose posterior probabilities are quite similar, and it may be difficult to decide which model is the best one to use when using (3.2). This is certainly true in Forfar et al. (1988), much of which is devoted to deciding which single model should be used to produce the graduated values. For example, Section 16.2 considers the “Choice of Order of Formula” for the male life office pensioners’ data. A total of 15 models are considered, of which the $GM^{1,3}(x)$ and $GM^{1,5}(x)$ are identified as the best candidates, based on a battery of tests and consideration of the shape and smoothness of the graduated values. We replace this selection process with the Bayesian fitting procedure described in the following section.
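The practical difference between (3.2) and (3.3) is simply whether a single model or a posterior-probability-weighted mixture is used. The sketch below illustrates how graduated rates would be combined under (3.3), assuming that per-model graduated rates and posterior model probabilities are already available; it is an illustration of the weighting only, not of the authors' winBUGS implementation.

```python
import numpy as np

def model_averaged_rates(rates_by_model, posterior_probs):
    """Combine graduated rates as in (3.3): a weighted average over models,
    with weights equal to the posterior model probabilities P(M | I).

    rates_by_model: dict mapping a model label to an array of graduated mu_x
    posterior_probs: dict mapping the same labels to P(M | I), summing to 1
    """
    labels = list(rates_by_model)
    weights = np.array([posterior_probs[m] for m in labels])
    rates = np.vstack([rates_by_model[m] for m in labels])
    return weights @ rates

# Under (3.2) one would instead take rates_by_model[best_label], where
# best_label is the model with the largest posterior probability.
```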

3.1 Reversible Jump MCMC

In this section, we extend (3.1) so that the model uncertainty is also included. Thus, the posterior distribution of $\mu_x \,|\, I$, taking into account model uncertainty as well as parameter uncertainty, can be written as

\[ f(\mu_x \,|\, I) = \int f(\mu_x \,|\, M, \theta)\, f(\theta, M \,|\, I)\, d(M, \theta) \]

and in some cases this distribution may be obtained exactly in a straightforward way. However, in most cases it is not possible to obtain the posterior distribution in closed form, for example when the model is unknown and the parameter vector is high dimensional, or complex. In these cases, simulation methods can be highly effective and the recent advances in Bayesian methodology use simulation based on Markov chains: the so-called Markov chain Monte Carlo methods.

In MCMC methods, a Markov chain $\{(M^{(b)}, \theta^{(b)})\}_{b=1}^{\infty}$ is generated whose equilibrium distribution is the required posterior distribution, $f(\theta, M \,|\, I)$. The distribution for any required quantity can then be approximated by a Monte Carlo average. In this case, an estimate of the mortality rate can be obtained as

(3.4)
\[ f(\mu_x \,|\, I) \approx \frac{1}{N} \sum_{a=1}^{N} f\!\left( \mu_x \,\big|\, M^{(B+ta)}, \theta^{(B+ta)} \right) \]

where B is the “burn-in” (a number of iterations of the Markov chain before it converges to the equilibrium distribution) and t is a thinning variable (which is often chosen as 1). The MCMC methodology provides a general framework for generating the Markov chain, and there are a number of different algorithms that can be used, such as Gibbs sampling (Geman & Geman, 1984, and Gelfand & Smith, 1990) and the Metropolis-Hastings algorithm (Metropolis et al., 1953, and Hastings, 1970). For more details of these algorithms, see Johansen et al. (2010). The basic idea of MCMC methods is to simulate a sequence of values in such a way that they converge to the required posterior distribution. This is then extended to allow jumps between different models by the use of reversible jump MCMC methods. The term “reversible jump” refers to a technical property of the sampling procedure that ensures that it converges to the required posterior distribution. Given the current state, $(M^{(b)}, \theta^{(b)})$, a subsequent state $(M, \theta)$ is drawn from some proposal distribution π and is either accepted or rejected, so that the next state is $(M^{(b+1)}, \theta^{(b+1)})$, where

\[ \left( M^{(b+1)}, \theta^{(b+1)} \right) = \begin{cases} (M, \theta) & \text{if } (M, \theta) \text{ is accepted} \\ \left( M^{(b)}, \theta^{(b)} \right) & \text{if } (M, \theta) \text{ is rejected} \end{cases} \]

For variable dimension models, the sampling procedure has to be designed quite carefully in order to ensure convergence. This involves an extension to the Metropolis-Hastings algorithm, which leads to a sampling procedure known as the reversible jump algorithm, and which was proposed by Green (1995).
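For a fixed model, the accept/reject step above is the standard Metropolis-Hastings update. The following sketch shows a generic random-walk Metropolis sampler for a fixed-dimension parameter vector, including the burn-in and thinning of (3.4); it is illustrative only and is not the reversible jump algorithm itself, nor the winBUGS implementation used in this paper.

```python
import numpy as np

def metropolis_hastings(log_posterior, theta0, n_iter, step=0.1,
                        burn_in=0, thin=1, seed=0):
    """Random-walk Metropolis sampler for a fixed-dimension parameter vector.

    log_posterior: function returning log f(theta | I) up to a constant.
    Returns the retained draws theta^{(B + t a)}, a = 1, ..., N, as in (3.4).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    current = log_posterior(theta)
    draws = []
    for b in range(1, n_iter + 1):
        proposal = theta + step * rng.standard_normal(theta.shape)
        proposed = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); otherwise keep the
        # current state, exactly as in the displayed accept/reject rule.
        if np.log(rng.uniform()) < proposed - current:
            theta, current = proposal, proposed
        if b > burn_in and (b - burn_in) % thin == 0:
            draws.append(theta.copy())
    return np.array(draws)
```

The reversible jump algorithm extends this update with proposals that also change the model M, using a dimension-matching transformation of the parameters so that the chain still converges to the joint posterior over models and parameters.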

3.2 Trans-dimensional models in BUGS

Bayesian models which allow for model uncertainty, where the number of parameters is one of the unknown quantities, are often referred to as “trans-dimensional” models. It is possible to construct computer programmes separately from first principles for each application, but winBUGS is freely available and has been designed to be “flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods”. Hence, the applications in this paper make use of winBUGS, together with the RJMCMC add-ons which are also available from the BUGS project web site. There is also a useful User Manual available (“winBUGS Jump Interface: User Manual”). This add-on allows us to apply the type of models described above, in which the structure of the model itself is unknown. There are two main classes of models that can be used within winBUGS, one of which will be used in this paper (see Lunn et al., 2009, for more details). This class is described in this section, in general terms, with the application to graduation specified in Section 4. The distinctive feature of a trans-dimensional model is that the number of parameters included in the model is not fixed a priori. Instead, we denote the number of parameters included by k, where k can take any value from 0 up to a maximum of Q (and the value of Q is always clear from the context). The prior distribution is specified so that all models are equally likely (a priori): each parameter is independently either included or excluded with probability $\frac{1}{2}$, so that a binomial distribution with parameters Q and $\frac{1}{2}$ is the appropriate prior for k.
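In other words, the prior amounts to an independent inclusion indicator for each optional parameter, equal to 1 with probability ½: every one of the 2^Q possible models then has prior probability 2^{-Q}, and the number of included parameters k is Binomial(Q, ½) a priori. A minimal illustrative sketch (our own, for exposition only):

```python
import numpy as np

rng = np.random.default_rng(seed=2)

Q = 4                                  # number of optional parameters in one component
gamma = rng.integers(0, 2, size=Q)     # inclusion indicators, each Bernoulli(1/2)
k = gamma.sum()                        # k ~ Binomial(Q, 1/2) a priori
prior_model_prob = 0.5 ** Q            # every model equally likely: 2^{-Q}
```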

We note that, for a $GM^{r,s}(x)$ formula, as given by equation (2.2), we will use two of these models: one for $\sum_{i=1}^{r} \alpha_i x^{i-1}$ and the other for $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i x^{i-r-1} \right)$. However, these are estimated together and details of the winBUGS code are given in the Appendix. The reason for using two separate trans-dimensional models is that the distributional properties of the parameters are likely to be different in each model. Within each trans-dimensional model, it will be assumed that the parameters have the same variance. This is credible within each separate component, $\sum_{i=1}^{r} \alpha_i x^{i-1}$ and $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i x^{i-r-1} \right)$, but not for the complete set of parameters. Thus, since there are two choices to make when fitting a GM model, we will use two of these trans-dimensional models in the application to graduation. These will then be combined in the calculation of the mean, as specified in equation (2.2), and the estimation performed in a single procedure. In this way, the RJMCMC methods will allow us to consider graduations where the number of parameters in each of these terms is unknown.

The posterior distribution for k is a part of the output, giving an indication of how many parameters should be included. More importantly, the output also gives information on which parameters these are. Note that it is possible that the Bayesian model will indicate that any set of parameters can be included – there is no restriction on them being consecutive parameters. This is in contrast with the conventional use of GM models (as implied by equation (2.2)), where the choice of r and s implies that all parameters from α1 to $\alpha_r$ (inclusive) and from $\alpha_{r+1}$ to $\alpha_{r+s}$ (inclusive) are included.

4. Trans-dimensional models for graduation

In this section, we specify a trans-dimensional modelling framework which we believe is suitable for many graduations. In particular, this framework should be suitable for graduations of mortality rates over adult ages, although it may not capture all of the features that may be present at young ages. Since a GM model has two components, $\sum_{i=1}^{r} \alpha_i C_{i-1}\left( \frac{x-u}{v} \right)$ and $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i C_{i-r-1}\left( \frac{x-u}{v} \right) \right)$, each of these will be modelled by a separate trans-dimensional model. Before specifying these in detail, we first consider the range of GM models which should be included in the overall framework. As was noted in Section 2, it is not appropriate to consider any models where s < 2, since the Gompertz term will always be needed. Also, the Makeham term is often needed (r = 1), and it may also be necessary to consider higher values of r in order to capture the progression of mortality rates at younger ages. The terms in $\sum_{i=1}^{r} \alpha_i C_{i-1}\left( \frac{x-u}{v} \right)$ have to be handled more carefully, since they can cause the model to produce negative values for the mortality rates. However, it is unlikely that values of r higher than 3 will be needed, since the higher terms in $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i C_{i-r-1}\left( \frac{x-u}{v} \right) \right)$ usually capture the shape of the mortality curve satisfactorily. For these reasons, the most complicated model we include is the $GM^{3,6}(x)$. The trans-dimensional modelling approach will allow each parameter to be included or excluded, and we believe that this provides a sufficiently flexible framework for most graduations. The trans-dimensional models will be specified in terms of the maximal model:

(4.1)
\[ GM^{3,6}(x) = \sum_{i=1}^{3} \alpha_i C_{i-1}\!\left( \frac{x-u}{v} \right) + \exp\left( \sum_{i=4}^{9} \alpha_i C_{i-4}\!\left( \frac{x-u}{v} \right) \right) . \]

The parameter vector is (α1, α2, …, α9) and, as explained above, we only consider models with s ≥ 2. This means that α4 and α5 are always included in the model. Because of the way that the trans-dimensional models have been implemented in winBUGS by Lunn et al. (2009), it is necessary also to always include α1. This means that the “base” model – the model with the lowest number of parameters – that is considered is the Makeham model. It is possible that the simple Gompertz model might be suitable, as is illustrated in Section 5.1, and in this case the estimated value for α1 is negligibly small. We do not think that this is a particularly serious issue, since, for the majority of cases, α1 will be needed and, in other cases, its posterior distribution will have a mean of 0 if it is not playing a significant role. This leaves 6 other parameters, (α2, α3) and (α6, α7, α8, α9), which can be included or excluded, making a total of 64 different models in the trans-dimensional framework. Each of the two sets of parameters, (α2, α3) and (α6, α7, α8, α9), will be modelled using a separate trans-dimensional model and these are specified below. The prior distribution for the remaining parameter, α4, is specified separately.
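The count of 64 arises because each of (α2, α3) and (α6, α7, α8, α9) can independently be included or excluded, giving 2^2 × 2^4 = 64 combinations, while α1, α4 and α5 are always present. A small illustrative enumeration (our own, for exposition only):

```python
from itertools import product

optional_poly = ["alpha2", "alpha3"]                      # first trans-dimensional component
optional_exp = ["alpha6", "alpha7", "alpha8", "alpha9"]   # second trans-dimensional component
always_in = ["alpha1", "alpha4", "alpha5"]

models = []
for poly_flags in product([0, 1], repeat=len(optional_poly)):
    for exp_flags in product([0, 1], repeat=len(optional_exp)):
        included = (always_in
                    + [p for p, f in zip(optional_poly, poly_flags) if f]
                    + [e for e, f in zip(optional_exp, exp_flags) if f])
        models.append(included)

assert len(models) == 64   # the trans-dimensional framework of Section 4
```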

Lunn et al. (2009) set up the trans-dimensional models in winBUGS by defining a new variable which is related to the parameter vector. For completeness, and in order that the winBUGS code can be interpreted, we do the same here for the trans-dimensional models which are to be used for graduation. However, we would emphasise that the important point is that the parameters (α2, α3) and (α6, α7, α8, α9) which appear in the mean can be included in or excluded from the model according to the evidence from the data. Thus, we define vectors $\psi^{(1)}$ and $\psi^{(2)}$ as follows

(4.2)
\[ \psi^{(1)} = \begin{pmatrix} \psi_1^{(1)} \\ \psi_2^{(1)} \\ \psi_3^{(1)} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} \]
(4.3)
\[ \text{and}\quad \psi^{(2)} = \begin{pmatrix} \psi_1^{(2)} \\ \psi_2^{(2)} \\ \psi_3^{(2)} \\ \psi_4^{(2)} \\ \psi_5^{(2)} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_5 \\ \alpha_6 \\ \alpha_7 \\ \alpha_8 \\ \alpha_9 \end{pmatrix} \]

The parameter vectors $\psi^{(1)}$ and $\psi^{(2)}$ can be seen in the winBUGS code, and it is straightforward to recover the original parameters from the elements:

\[ \alpha_1 = \psi_1^{(1)}, \quad \alpha_2 = \psi_2^{(1)} - \psi_1^{(1)}, \quad \alpha_3 = \psi_3^{(1)} - \psi_1^{(1)} \]
\[ \text{and}\quad \alpha_5 = \psi_1^{(2)}, \quad \alpha_i = \psi_{i-4}^{(2)} - \psi_1^{(2)} \quad (i = 6, 7, 8, 9). \]
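The change of variables in (4.2) and (4.3) is a simple linear map with an equally simple inverse; the sketch below is our own bookkeeping check of the recovery formulas above (with made-up parameter values), not the winBUGS code in the Appendix.

```python
import numpy as np

# Design matrices of (4.2) and (4.3): the first row carries the always-included
# parameter, and each later row adds one optional parameter to it.
A1 = np.array([[1, 0, 0],
               [1, 1, 0],
               [1, 0, 1]], dtype=float)
A2 = np.array([[1, 0, 0, 0, 0],
               [1, 1, 0, 0, 0],
               [1, 0, 1, 0, 0],
               [1, 0, 0, 1, 0],
               [1, 0, 0, 0, 1]], dtype=float)

alpha123 = np.array([0.01, 0.002, 0.003])          # hypothetical (alpha_1, alpha_2, alpha_3)
alpha56789 = np.array([4.0, -0.5, 0.2, 0.0, 0.1])  # hypothetical (alpha_5, ..., alpha_9)

psi1 = A1 @ alpha123
psi2 = A2 @ alpha56789

# Recovery as in the displayed equations above.
recovered_alpha1 = psi1[0]
recovered_alpha2 = psi1[1] - psi1[0]
recovered_alpha3 = psi1[2] - psi1[0]
recovered_alpha5 = psi2[0]
recovered_alpha6to9 = psi2[1:] - psi2[0]
```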

As mentioned in Section 3.2, the prior distribution for the number of parameters included is chosen such that all of the models are equally likely (a priori). In other words, $P(M^{(1)}) = 2^{-2}$ for the first trans-dimensional component, and $P(M^{(2)}) = 2^{-4}$ for the second. The prior distributions of the optional parameters, conditional on $M^{(j)}$ (j = 1, 2), are set by default such that they are independently normally distributed. It is possible to specify the model in winBUGS such that all parameters have the same prior mean and variance, or such that the first parameter has a different mean and variance. For the first trans-dimensional model, (4.2), we give all of the parameters the same mean and variance, but for the second, (4.3), we give the first parameter a different prior mean and variance in order to accommodate the second Gompertz parameter. We have found that the most efficient way to proceed is first to fit a simple Gompertz model to the data, and use the maximum likelihood estimates of the parameters as the prior means.

In summary, the prior distributions are specified as follows (with all of the parameters being assumed to be independent, a priori).

\[ (\alpha_1, \alpha_2, \alpha_3) \sim \text{independent normal with mean 0 and variance } \sigma_1^2 \]
\[ \alpha_4 \sim N(a_1, 10{,}000), \quad \alpha_5 \sim N(a_2, 10{,}000), \]

where a1 and a2 are the maximum likelihood estimates of the parameters in the simple Gompertz model, and a large variance of 10,000 is used so that the prior distributions are relatively vague;

\[ (\alpha_6, \alpha_7, \alpha_8, \alpha_9) \sim \text{independent normal with mean 0 and variance } \sigma_2^2 \]
\[ \sigma_1^{-2}, \ \sigma_2^{-2} \sim \text{independent } \Gamma(0.001, 0.001). \]

Finally, it is necessary to place some basic restrictions on the values of the parameters (α1, α2, α3), in order to ensure that the values of the mortality rates, μx, are all positive. Clearly, negative values are not practically justifiable, and they may cause the programme to crash when it tries to calculate ln(μx) in the log-likelihood. Thus, we place restrictions to ensure that $\sum_{i=1}^{3} \alpha_i C_{i-1}\left( \frac{x-u}{v} \right) > 0$ at low ages, noting that $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i C_{i-r-1}\left( \frac{x-u}{v} \right) \right)$ will be close to zero at low ages. Firstly, we ensure that α1 > 0, as a basic requirement of a sensible GM model. Secondly, we note that the second derivative of $\sum_{i=1}^{3} \alpha_i C_{i-1}\left( \frac{x-u}{v} \right)$ should be positive, to reflect the expected convex shape of this contribution to the mortality rates at low ages. Hence, the second restriction is α3 > 0. Finally, the value of α2 is restricted so that $\sum_{i=1}^{3} \alpha_i C_{i-1}\left( \frac{x-u}{v} \right) > 0$ when $\frac{x-u}{v} = -1$. This means that α1 − α2 + α3 > 0, and hence the final restriction is α2 < α1 + α3. In the MCMC algorithms in winBUGS, it is straightforward to ensure that these restrictions are not violated by simply replacing the sampled value when it is not satisfactory. Thus, for example, negative values of α1 and α3 are replaced by 0. Again, this situation is not quite ideal, and the compromise we have suggested is necessary because the implementation in winBUGS only allows normal prior distributions to be used. We do not believe that the effect of this is significant, since any negative values are unlikely to be much different from 0.
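The adjustment described above can be mimicked outside winBUGS as a simple clamping of unacceptable draws; the sketch below is an illustrative stand-alone version of it (the prior standard deviation and the loose treatment of the strict inequality on α2 are our own simplifications).

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def constrain_poly_part(alpha1, alpha2, alpha3):
    """Enforce the restrictions of Section 4 on the polynomial part:
    alpha1 > 0, alpha3 > 0 and alpha2 < alpha1 + alpha3, which together
    keep sum_i alpha_i C_{i-1}((x-u)/v) positive at the lowest age."""
    alpha1 = max(alpha1, 0.0)                 # negative alpha1 replaced by 0
    alpha3 = max(alpha3, 0.0)                 # negative alpha3 replaced by 0
    alpha2 = min(alpha2, alpha1 + alpha3)     # clamp alpha2 at its upper bound
    return alpha1, alpha2, alpha3

sigma1 = 0.01                                 # hypothetical prior standard deviation
draw = rng.normal(0.0, sigma1, size=3)        # (alpha_1, alpha_2, alpha_3) from the prior
alpha1, alpha2, alpha3 = constrain_poly_part(*draw)
```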

5. Examples

We illustrate the application of the automatic graduation method using two sets of data, which are taken from Forfar et al. (1988). It should be emphasised that the same programme is used for each data set, and the differences in the results are entirely due to the differing natures of the data themselves. It will be seen that this graduation method deals satisfactorily with the data in each case, without the need for any intervention from the graduator, and it is for this reason that we call this an “automatic” graduation method.

In all cases, we have used an initial burn-in of 50,000 iterations (these are values which are discarded), and have found that the models have then converged. In general, we would expect that 50,000 burn-in iterations would be sufficient for convergence, but it is always recommended that this is checked (see, for example, Johansen et al., 2010, for tools to monitor convergence). After this, we ran a further 50,000 iterations and used these for the results. Thus, in equation (3.4), B = 50,000, N = 50,000 and t = 1.

5.1 Example 1

The data for the first example are taken from Table 15.5 of Forfar et al. (1988), and consist of data from the CMI relating to the numbers of deaths for insured pensioners’ widows over the calendar years 1979–82, grouped by age nearest birthday. Forfar et al. (1988) conclude that a satisfactory graduation for μx could be provided by the simple Gompertz model, $GM^{0,2}(x)$. Note that Forfar et al. (1988) use u = 70 and v = 50, whereas we use $u = \frac{x_{\min} + x_{\max}}{2} = 62.5$ and $v = \frac{x_{\max} - x_{\min}}{2} = 45.5$. Hence, the parameter estimates cannot be compared directly, although we could implement a simple conversion to obtain corresponding values. Since Forfar et al. (1988) conclude that a Gompertz model provides a satisfactory graduation, this example provides a test of whether the new graduation method is able to come to a similar conclusion: in effect, we would expect the Bayesian model to tell us that none of the optional parameters is required. As was explained in Section 4, we always start from the Makeham model, and we would therefore also expect that the posterior mean of α1 should be close to 0. We would also expect α2 and α3 not to be needed in the model. However, since we use a separate trans-dimensional model for this part of the GM formula, the Bayesian model treats them separately and indicates that they should be included in the model. This result is not a surprise, since the parameter estimates themselves are extremely small: in effect, the trans-dimensional model assumes that all of the parameters are very small and, relative to this, that they are all of the same magnitude and should therefore be included. The estimates of these parameters are shown in Table 1.

Table 1 Estimates of the parameters in the first part of the GM formula for the data from Example 1 of Forfar et al. (1988)

It can be seen that these parameter estimates are so small that they will have no substantive effect on the graduated values whether or not they are included in the model.

The second trans-dimensional model indicates that none of the optional parameters in $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i C_{i-r-1}\left( \frac{x-u}{v} \right) \right)$ should be included, leaving just α4 and α5 in the model. The conclusion from this is that the basic Gompertz model, $\mu_x = \exp\left( \alpha_4 + \alpha_5 \left( \frac{x-u}{v} \right) \right)$, is indeed most likely to provide a suitable graduation for these data. As noted in Section 3, we could either base inferences about the mortality rates on the most likely model or we can use a weighted average of all models, with the most likely models getting the most weight: see (3.2) and (3.3). In this paper, we use (3.3) and base the estimates of the mortality rates on the means of their posterior distributions. Table 2 shows the estimates of the parameters in $\exp\left( \sum_{i=r+1}^{r+s} \alpha_i C_{i-r-1}\left( \frac{x-u}{v} \right) \right)$.

Table 2 Estimates of the parameters in the second part of the GM formula for the data from Example 1 of Forfar et al. (1988)

For comparison purposes, the estimates of α4 and α5 using u = 70 and v = 50 would be −3.55139 and 4.26291, compared with −3.553013 and 4.316579 in Forfar et al. (1988). Alternatively, the maximum likelihood estimates of the parameters of the Gompertz model using u = 62.5 and v = 45.5 are −4.2005 and 3.9281, which can be compared with the estimates of α4 and α5 in Table 2.
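For reference, the conversion between the two age scalings is elementary for the Gompertz part: writing the exponent fitted with (u₁, v₁) = (62.5, 45.5) in terms of the transformed age based on (u₂, v₂) = (70, 50) gives

\[ \alpha_4 + \alpha_5 \left( \frac{x - u_1}{v_1} \right) = \left( \alpha_4 + \alpha_5\, \frac{u_2 - u_1}{v_1} \right) + \left( \alpha_5\, \frac{v_2}{v_1} \right) \left( \frac{x - u_2}{v_2} \right) , \]

so that the converted parameters are α4 + (7.5/45.5)α5 and (50/45.5)α5; this is the simple conversion referred to above.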

Figure 1 shows the results of the graduation (plotted on the log scale), together with the Gompertz curve fitted by Forfar et al. (1988). The graduated rates for the Bayesian model are obtained by averaging over the models using the posterior probabilities in (3.3), which explains why they do not follow exactly a straight line. We believe that this is the best way to proceed in general, and it can be seen that the new automatic graduation method has produced graduated values which are very close to those which were deemed to be suitable in Forfar et al. (1988).

Figure 1 Crude mortality rates, together with graduated rates from Forfar et al. (1988) (solid line) and from the Bayesian model (dashed line) using the posterior weights to average over all models, for the data from Example 1

In order to test the fit of the graduation, the same tests can be deployed as in Forfar et al. (1988). The number of parameters is not completely determined in the Bayesian method, although it would be reasonable to assume that it is close to 2, since the only parameters indicated as needing to be included are α4 and α5. The χ² goodness-of-fit test statistic is 37.22. This compares favourably with the value in Forfar et al. (1988), 38.29, but this is probably simply due to the fact that the Bayesian model mixes in (with very low weights) some models with more parameters and therefore achieves a slightly better fit. All the other tests are satisfactory, and we do not repeat them here (the complete test results for Example 2 are shown below).

In conclusion, this example has shown that the Bayesian model has produced graduated values which are very close to those of the Gompertz model, without the need for any input or model choice.

5.2 Example 2

This example considers a case which is not as straightforward as Example 1, and uses the CMI data from Table 16.5 of Forfar et al. (1988). These data come from the mortality experience of male insured pensioners over the calendar years 1967–70, and Forfar et al. (1988) conclude, after considering a number of different possible models, that a GM(1,3) model is the most suitable for graduating these data. For comparison purposes, the fitted GM(1,3) model in Forfar et al. (1988) was

\[ \mu_x = 0.00557291 + \exp\left( -5.4677 + 6.007755\left( \frac{x - 63.5}{44.5} \right) - 1.3219\left( 2\left( \frac{x - 63.5}{44.5} \right)^{\!2} - 1 \right) \right) . \]

For this example, the Bayesian model suggests that a different model is more appropriate, and Table 3 shows the parameter estimates.

Table 3 Estimates of the parameters for the Bayesian model, for the data from Example 2 of Forfar et al. (1988)

The posterior probabilities for the models in the first part of the GM formula are shown in Table 4. The 0's and 1's in the first column refer to whether α2 and α3 should be included (α1 is always included, as explained in Section 4).

Table 4 Posterior probabilities for the set of possible models for the first part of the GM formula

Table 4 shows that, although the model structure 00 (with just α1) is the most likely model (concurring with the choice of the GM(1,3) in Forfar et al., 1988), there are also reasonable posterior probabilities for the other models. Table 5 shows the marginal probabilities that each parameter should be included. When the fitted mortality rates are calculated using the weighted average of these models, the effect of these probabilities will be seen at early ages.

Table 5 Marginal posterior probabilities for the parameters in the first part of the GM formula

Similarly, Tables 6 and 7 show the corresponding posterior and marginal probabilities for the second part of the GM formula.

Table 6 Posterior probabilities for the set of possible models for the second part of the GM formula

Table 7 Marginal posterior probabilities for the parameters in the second part of the GM formula

The model fitted by Forfar et al. (1988), a GM(1,3), corresponds to the model structure 1000, with just α6 being included. It is interesting to note that the Bayesian model disagrees with this, concluding that the parameter which definitely needs to be included is α7 (with reasonably large posterior marginal probabilities for α8 and α6). Since the GM models of Forfar et al. (1988) only allow nested models, the only model structures that they would consider in this framework are 1000, 1100, 1110 and 1111. Again, it is interesting to note that the two most likely models from the Bayesian model are not in this set. Of course, any judgment over which approach gives the better graduation will depend on the fitted mortality rates. For this example, in contrast to Example 1, there are some significant differences between the results, in terms of the parameters and the models, from the conventional GM approach and those from the Bayesian approach. However, it can be seen from Figure 2 that the graduated values are remarkably similar, except at extreme ages. The χ² goodness-of-fit test statistic value for the graduated values from the Bayesian method is, as in Example 1, smaller than that of the GM(1,3): 53.03 compared with 54.72. It can be argued that the Bayesian model is mixing in graduations with more than 4 parameters, and that its goodness-of-fit should be better because of this feature.

Figure 2 Crude mortality rates, together with graduated rates from Forfar et al. (1988) (solid line) and from the Bayesian model (dashed line), for the data from Example 2

It is difficult to make the kind of judgments in terms of the number of parameters to use that are applied by Forfar et al. (1988). Their argument is that, if a graduation has too many parameters, then its properties would be unsatisfactory in terms of its shape and “sheaf” (how wide the confidence intervals are around the graduated values). Figure 3 shows the graduated mortality rates, together with the 95% confidence bands from the posterior distribution for the Bayesian method. As can be seen, this graduation is certainly satisfactory in terms of its sheaf. A comparison with Figure 16.2 of Forfar et al. (1988) shows that this sheaf is similarly tight, although some of the characteristics are different. In particular, the Bayesian model is less confident about the mortality rates at high ages: this is probably due to the fact that the Bayesian model includes some model uncertainty. It should be noted that the approach used by Forfar et al. (1988) does not make any allowance for model uncertainty: a model is chosen and it is then assumed that this is the correct model (with 100% certainty) when producing graduated mortality rates. We would argue that the Bayesian model is more realistic in the sense that it acknowledges that the model may not be correct.

Figure 3 Crude and graduated mortality rates, together with the 95% sheaf for the results of the Bayesian model

All of the usual tests of the graduation can be applied. For example, we can compare the results of the tests for the Bayesian graduation with those in Table 16.3 of Forfar et al. (1988). Although these tests may not provide a comprehensive assessment of the goodness-of-fit of a graduation, it is useful to compare the performance of the Bayesian methods using the same tests as Forfar et al. (1988). The tests used are:

  • Comparison of total actual and expected deaths. These should be close to each other, and can be compared by looking at either the difference or the ratio.

  • Signs Test. It is expected that there should be the same number of + and − signs, and this can be tested using a binomial reference distribution.

  • Runs Test. There should not be too few runs of the same sign of residuals, since this would indicate a pattern in the residuals where the fit was not good. The Runs Test is a standard test for this.

  • The Kolmogorov-Smirnov test can be used to compare two distributions, and is used here in order to compare the actual and graduated distributions of deaths.

  • Autocorrelations. The residuals should not show any systematic patterns, which should have been captured in the graduation. Examinations of the autocorrelations can be used to assess one aspect of this.

Comparison of total actual deaths (A) and total expected (E):

Signs Test:

Runs test:

The results of the Kolmogorov-Smirnov test are the same for both, with a maximum deviation of 0.0019. Figure 4 shows the auto-correlations for the residuals from the Bayesian model, together with the 95% confidence limits. It can be seen from this Figure that none of the autocorrelations are significant.
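As an illustration of how the simpler checks in the list above can be computed from the actual and expected deaths at each age, the sketch below implements the comparison of totals, the signs test and the runs test; it is our own illustration using standard binomial and normal approximations, and is not the exact procedure of Forfar et al. (1988).

```python
import numpy as np
from scipy.stats import binomtest, norm

def graduation_checks(actual, expected):
    """Illustrative versions of three of the listed tests, applied to the
    actual deaths d_x and the expected deaths E_x^C * graduated mu_x."""
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)

    # Comparison of totals: difference and ratio of actual to expected deaths.
    totals = (actual.sum() - expected.sum(), actual.sum() / expected.sum())

    # Signs test: the number of positive deviations should look Binomial(n, 1/2).
    signs = np.sign(actual - expected)
    n_nonzero = int(np.sum(signs != 0))
    signs_p = binomtest(int(np.sum(signs > 0)), n_nonzero, 0.5).pvalue

    # Runs test (normal approximation): too few runs of like-signed deviations
    # indicates a systematic departure of the graduation from the data.
    s = signs[signs != 0]
    runs = 1 + int(np.sum(s[1:] != s[:-1]))
    n1, n2 = int(np.sum(s > 0)), int(np.sum(s < 0))
    mean_runs = 1 + 2 * n1 * n2 / (n1 + n2)
    var_runs = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    runs_p = norm.cdf((runs - mean_runs) / np.sqrt(var_runs))  # one-sided: too few runs

    return totals, signs_p, runs_p
```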

Figure 4 Autocorrelations of the residuals from the Bayesian model for the data from Example 2

Overall, the conclusions from these detailed tests are that the graduated values from the Bayesian model provide a satisfactory fit to the data.

6. Conclusions

This paper has proposed a new method for graduating mortality data, which we believe to be suitable for data over adult ages. The method has the great advantage that it is relatively automatic: in most cases, the results can simply be taken as they stand without any further adjustment. We believe that this is the major advantage of this new method over the more conventional process of searching through a set of possible models and trying to choose one which is appropriate. In many cases, the estimation of the parameters using maximum likelihood can be problematic because of the non-linear nature of the mean. For the Bayesian model, we have not encountered any such problems, although there is the question of the restrictions on the parameters in Section 4. The complicated process described in Forfar et al. (1988), where a large range of potential models have to be fitted and assessed separately, is replaced in the Bayesian method with an automatic procedure that requires little intervention from the user. Clearly, every graduation should be checked, but we believe that this new automatic method will produce a set of graduated rates which are acceptable for a large range of applications. Finally, we would emphasise again that the Bayesian model includes some model error in the calculation of the confidence intervals. We believe that this feature is more realistic than the conventional method, where the confidence intervals are calculated assuming that the model chosen is the correct model.

We have implemented the procedure in winBUGS, which has the advantage that it is possible for anyone to use the code for their own data, and to experiment with any changes that they might like to consider. A restriction of using winBUGS is that it is necessary to use the RJMCMC as it is implemented in the software, as pointed out at various points in this paper. A further development would be to implement the method using code created specifically for this purpose in a programme such as Visual Basic.

Appendix: WinBUGS code

This Appendix contains the WinBUGS code for the data in Example 2. This can be easily adapted for other data.

References

Broffit, J.D. (1988). Increasing and increasing convex Bayesian graduation. Transactions of the Society of Actuaries, 40, 115–148.
Carlin, B.P. (1992). A simple Monte Carlo approach to Bayesian graduation. Transactions of the Society of Actuaries, 44, 55–76.
Congdon, P. (2006). Bayesian Statistical Modelling. John Wiley.
Czado, C., Delwarde, A., Denuit, M. (2005). Bayesian Poisson log-bilinear mortality projections. Insurance: Mathematics and Economics, 36(3), 260–284.
Forfar, D.O., McCutcheon, J.J., Wilkie, A.D. (1988). On Graduation by Mathematical Formula. Journal of the Institute of Actuaries, 115, 1–149.
Gamerman, D., Migon, H.S. (1993). Bayesian dynamic hierarchical models. Journal of the Royal Statistical Society Series B, 55(3), 629–642.
Gelfand, A.E., Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B. (1995). Bayesian Data Analysis. Chapman and Hall, London.
Geman, S., Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall, London.
Gompertz, B. (1825). On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies. In a letter to Francis Baily. Philosophical Transactions of the Royal Society, 115, 513–583.
Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.
Heligman, L., Pollard, J.H. (1980). The Age Pattern of Mortality. Journal of the Institute of Actuaries, 107, 49–80.
Johansen, A.M., Evers, L., Whiteley, N. (2010). Monte Carlo Methods. Lecture Notes, Department of Mathematics, University of Bristol.
Kimeldorf, G.S., Jones, D.A. (1967). Bayesian graduation. Transactions of the Society of Actuaries, 19, 66–112.
Lunn, D.J., Best, N., Whittaker, J.C. (2009). Generic reversible jump MCMC using graphical models. Statistics and Computing, 19, 395–408.
Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D. (2000). WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.
Macdonald, A.S. (1996). An Actuarial Survey of Statistical Models for Decrement and Transition Data. I: Multiple State, Binomial and Poisson Models. British Actuarial Journal, 2, 129–155.
Makeham, W. (1859). On the Law of Mortality and the Construction of Annuity Tables. Journal of the Institute of Actuaries, 8, 301–310.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087–1092.
Neves, C.R., Migon, H.S. (2007). Bayesian graduation of mortality rates: an application to reserve evaluation. Insurance: Mathematics and Economics, 40, 424–434.
Scollnik, D.P.M. (2001). Actuarial Modeling with MCMC and BUGS. North American Actuarial Journal, 5(2), 96–124.
Taylor, G. (1990). A Bayesian Interpretation of Whittaker-Henderson Graduation. Paper presented to the Risk Theory seminar, Mathematisches Forschungsinstitut, Oberwolfach, Federal Republic of Germany.
Verrall, R.J. (1993). A State Space Formulation of Whittaker-Henderson Graduation, with Extensions. Insurance: Mathematics and Economics, 13, 7–14.
Whittaker, E.T. (1923). On a new method of graduation. Proceedings of the Edinburgh Mathematical Society, 41, 63–75.