Executive summary
In this paper we examine two aspects of extreme events: their calculation and their communication. In relation to calculation, we consider two types of extreme event.
The first is the extent to which extreme events in two or more variables occur together. This is important when considering the structure and parameterisation of financial models. There are a number of approaches that can be used, all of which focus on the tails of joint distributions. As such, they are referred to as measures of tail association. Tail correlation – essentially, the correlation between variables in the tail of a joint distribution – can be used to calculate the strength of the relationship between two or more variables in a joint tail. Even here, there are several different measures of correlation that can be used. However, the direction of correlation says nothing about the importance of the observations in the joint tail. A measure that does this better is tail dependence. This can be thought of as the number of observations in the joint tail of a distribution as a proportion of the maximum number of observations that could appear. This means that its values range from zero (no tail dependence) to one (full tail dependence).
The traditional measure of tail dependence is the coefficient of tail dependence. This is the number of observations in the joint tail of a distribution as a proportion of the maximum number of observations that could appear, as the quantile defining the joint tail tends to zero (for the coefficient of lower tail dependence) or one (for the corresponding upper measure). However, a major drawback with this measure is that it fails to distinguish between multivariate normal distributions – or normal copulas – with different correlation coefficients. An alternative measure which avoids this issue is the coefficient of finite tail dependence, which evaluates the measure using a finite tail size – so, for example, looking at the measure for the lowest ten per cent of observations from two distributions.
All of these measures can be evaluated for more than two distributions – in other words, they can be used to describe the extent to which three or more variables behave in extreme circumstances.
The second type of extreme event arises from combinations of losses from a series of risks that together result in total losses exceeding a particular level. This is measured using ruin lines or, in higher dimensions, planes and hyperplanes. Whilst only the order of observations matters for the coefficient of tail dependence, the values of the observations are important for measures of extreme loss, since extreme loss is measured relative to a monetary amount rather than percentiles. We consider two measures of this risk. The first is the probability of ruin. This looks at the likelihood that a combination of risks will result in an institution becoming insolvent by some measure. The second measure is the economic cost of ruin. This evaluates the average loss beyond some critical point. The probability of ruin and economic cost of ruin are analogous to the value at risk (VaR) and conditional or tail value at risk (CVaR or TVaR), the difference being that the measures we consider look at loss beyond a particular level whilst the VaR-based measures look at the loss beyond a particular probability.
We also consider how loss might be defined in this context. The first approach we use is to look at current levels of exposure to various risks, and to consider how much risk an institution is exposed to for groups of two, three or more of these risks. The second approach is to consider the risks to which an organisation might be exposed and to look at the optimal combination of risks according to one of these measures. It is important to recognise any non-linearities between risks. Whilst there will be no interaction between some risks – for example, the risks of investing in UK and US equities – for others there will be significant interaction. A good example is the way in which interest rate and mortality risk affect one another in an annuity portfolio.
The communication of extreme events goes beyond just looking at numbers. This is because the number of combinations of risks can be so large that a list of statistics does not make interpretation easy. We therefore consider a range of graphical techniques, some for considering the extent of tail dependence and some for looking at the risk of extreme loss. When looking at tail dependence, the charts we show display two factors: the extent of tail dependence, and the importance of the risk combination, based on the size of the exposure to each of the risks considered. For the risk of extreme loss, we consider charts that show the optimal combinations of risk, and the risk-reward trade-offs that use measures of extreme risk in place of more traditional metrics such as volatility.
1. Introduction
1.1. Themes
In this paper we consider the calculation and communication of tail association and the risk of extreme loss. This means considering a number of issues separately. Calculation is dealt with first, for measures both of tail association and of extreme loss. Communication of the results – in particular, for a large number of risk combinations – is then dealt with separately.
Tail association
There are several motivations for this paper. Tail association is relevant because it considers the extent to which extreme values for two or more variables are likely to occur together, something ignored if only the correlation – linear or rank – between data series is used. This is important when considering the structure and parameterisation of financial models, since focussing on the jointly extreme observations can help ensure that the structure of a model is sound. However, there are a number of ways in which tail association can be measured, and some approaches are better – and easier to apply – than others.
The risk of joint loss
A separate theme within the area of calculation is the risk of joint loss. This is the risk that the total loss from two or more individual sources exceeds some critical level. When measuring risk for an investor, joint loss is generally more important than co-movement. There are a number of reasons why the risk of joint loss is not simply a subset of the risk of extreme co-movement. Whilst jointly extreme observations can cause a large loss, large losses are not exclusively caused by jointly extreme observations. In particular, a loss can occur with an extreme value from one variable and an average value from another, a scenario ignored if the likelihood of jointly extreme variables is the sole consideration. Joint loss also depends on the marginal distributions of the risks. Whilst marginal distributions affect measures of co-movement based on the linear correlation coefficient, this is not true of measures of co-movement in general. Finally, the risk of joint loss depends on the exposure to each of the risks, or how much of each risk is being taken. This can be thought of as an aspect of the marginal distribution, to the extent that the level of a risk taken scales the marginal distribution. We therefore also look at measures of risk that examine the likelihood and severity of total losses arising from two or more sources.
1.2. Copulas
The level of risk faced in respect of two or more risks depends on the way in which these risks interact and, in some cases, on the way in which the risks behave individually. These two factors can be described independently.
The individual behaviour of risks is determined by the marginal distribution of those risks, whereas the way in which they interact is given by the copula between them. Both can be described in a number of ways. A common distinction is between empirical and parametric distributions and copulas. Empirical distributions and copulas describe exactly the shape of a distribution or the relationship between two or more distributions without trying to define a mathematical form behind the distribution or relationship. Parametric distributions and copulas, on the other hand, provide mathematical functions that define the shape of the marginal distribution or relationships between variables.
For copulas, there are more choices. If a parametric copula is used, then there are two aspects of the copula that must be considered in the context of the relationships between variables. The first is the mathematical form of the copula, whilst the second is the parameter or parameters used. Together, these describe not just the shape of the relationship between variables, but also the overall strength of that relationship.
However, the overall strength of the relationship between variables is not the same as the relationship for variables in extreme circumstances. In particular, the shape of the relationship will determine whether the overall relationship is weaker or stronger in the joint tails of the distribution.
Many of these measures concentrate on the relationship between two variables. However, it can be as instructive to consider co-movement among three, four or even more variables. For example, if two pairs of asset classes, with one asset class being common to each pair, each have a strong degree of co-movement for extreme observations then this is worth noting; however, if the strength of the relationship persists when co-movement between the three asset classes is considered then this is even more important. In particular, it suggests that a common factor is impacting all three asset classes in the same way rather than two factors affecting the two pairs.
The relationship between variables in the tails, whilst a useful way of measuring jointly extreme events, does not necessarily reflect the total level of risk faced by an organisation. For example, an organisation will suffer severe losses if two lines of business suffer large losses, but also if profits in one line are wiped out by catastrophic losses in another. This situation can be modelled using the concept of ruin lines. However, it is important to note that here the marginal distributions are as important as the copulas between the variables.
The relationship between such risks can also be considered in more than two dimensions – concentrations of risk between three, four or more lines of business, and their joint ability to cause large losses, could be even more serious than the concentrations between two.
However, measuring risk is only part of the issue. The result of this sort of analysis can be a large amount of quantitative information, and it is important that this information can be communicated clearly. Therefore, ways of communicating the tail dependency structure are also discussed. Most of the focus here is on communicating not just the strength of the tail relationship, but also the importance to an investor. The reason for this is that two risks, for example arising from investment in asset classes, might be closely linked, but if the investor's exposure to those risks is minimal then the impact of that risk will also be limited.
Before investigating these issues in more detail, it is worth describing briefly the nature of copulas, given their importance in much of the analysis that follows. It is also worth looking at measures of correlation that describe the overall strength of relationships between two variables.
2. Copulas
As mentioned above, a copula describes the way in which two distributions are linked. If two or more sets of observations are being considered, this means that a copula does not describe the way in which the values of these observations are linked, but rather the relationship between the order of the observations. Importantly, the copula between two or more sets of observations is independent of the observations' marginal distribution functions. Consider two random variables, X and Y, with the following joint distribution function:

$$F(x, y) = \Pr(X \le x \text{ and } Y \le y)$$
Each of these variables will also have a marginal distribution function, $F_X(x) = \Pr(X \le x)$ and $F_Y(y) = \Pr(Y \le y)$. If u and v are defined as $u = F_X(x)$ and $v = F_Y(y)$, then F(x,y) can also be written as:

$$F(x, y) = C(u, v)$$
where the function C(u,v) is termed the copula. In other words, the joint distribution function can be described as a function of the individual distribution functions. Sklar (1959) shows that if $F_X(x)$ and $F_Y(y)$ are continuous, the function C(u,v) is unique.
Copulas can also be defined for more than two variables. For example, if $u_n = F_{X_n}(x_n)$ where n = 1, 2, …, N, then the joint distribution function of the random variables $X_1, X_2, \ldots, X_N$ can be written as:

$$F(x_1, x_2, \ldots, x_N) = C(u_1, u_2, \ldots, u_N)$$
where the copula $C(u_1, u_2, \ldots, u_N)$ is unique if the variables are continuous.
Various types of copula are described in Appendix 1, but it is worth describing here the approach used to create an empirical copula, since these can be used to calculate some of the measures of tail association we consider later. Empirical copulas describe exactly the relationship between two or more samples without trying to define a mathematical form behind the relationship. To calculate an empirical copula, it is necessary to use the ranks of variables to give measures that are always greater than zero and less than one. For T pairs of observations, this could mean (for example) having values from 1/(1 + T) to T/(1 + T), or from 1/2T to (2T − 1)/2T. In the first case, this means defining the joint distribution function as:

$$\hat{F}(x, y) = \frac{1}{1 + T}\sum_{t = 1}^{T} I(X_t \le x \text{ and } Y_t \le y) \qquad (4)$$
where x and y are the values of some observations $X_s$ and $Y_s$ respectively, and $I(X_t \le x \text{ and } Y_t \le y)$ is an indicator function which is equal to one if the conditions in the parentheses are met and zero otherwise. The joint distribution function calculated using the second case can be defined as:

$$\hat{F}(x, y) = \frac{1}{T}\sum_{t = 1}^{T} I(X_t \le x \text{ and } Y_t \le y) - \frac{1}{2T} \qquad (5)$$
To extend this to higher dimensions, the summations described above should be based on $X_{n,t}$ being less than or equal to $x_n$ for all n. These are joint distribution functions, but because they depend only on the order of the observations they can be regarded as copulas.
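As an illustration, the following R sketch (R being the package used elsewhere in this paper; the function name is our own) evaluates the first form of the empirical joint distribution function at a point (u, v) on the unit square:

```r
# Empirical copula of two samples, evaluated at a point (u, v). Ranks
# are scaled to 1/(T+1), ..., T/(T+1), matching the first case above;
# a minimal sketch rather than a production implementation.
emp_copula <- function(x, y, u, v) {
  T <- length(x)
  rx <- rank(x) / (T + 1)  # scaled ranks of x
  ry <- rank(y) / (T + 1)  # scaled ranks of y
  sum(rx <= u & ry <= v) / (T + 1)
}

# For near-comonotonic data the value approaches min(u, v)
set.seed(1)
x <- rnorm(1000)
y <- x + rnorm(1000, sd = 0.3)
emp_copula(x, y, 0.1, 0.1)
```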
3. Broad Measures of Association
3.1. Overview
Broad measures of association summarise the strength of relationship between variables into a single numerical value. This means that a great deal of information about the nature of the relationship can be lost; however, such measures are useful indicators of the overall level of co-movement. This is important, as it indicates the measures that might usefully be adapted to describe tail association. Measures of association are usually calculated using only two variables – that is, they describe the broad relationship between pairs of variables. However, some measures have been constructed to quantify the strength of relationship between three or more variables. This can be helpful in summarising the broad relationship between a larger number of variables. In particular, it can indicate whether a group of variables is excessively affected by a single underlying factor.
There are in fact a number of ways of calculating such measures and the main ones are outlined below. However, before doing this it is worth considering what constitutes a “good” measure of association.
3.2. Concordance
There is no definitive description of what makes one measure of association better than another; however, there are a number of features that it might be desirable for a measure of association to have. One set of criteria is collectively referred to as concordance. Concordance can be thought of as the extent to which the orders of two or more sets of variables are related to each other. Scarsini (1984) proposes a number of features that a measure of association, $M_{X,Y}$, should have to be regarded as a measure of concordance between two variables X and Y. These are:
• completeness of domain: $M_{X,Y}$ must be defined for all values of X and Y, with X and Y being continuous – this is important, as it ensures that a measure of association can be calculated for any value of X and Y, and will not produce an error;
• symmetry: $M_{X,Y} = M_{Y,X}$, or in other words switching X and Y should not affect the value of the measure – this is a “common sense” requirement;
• coherence: if C(u, v) ≥ C(m, n) then $M_{X,Y} \ge M_{S,T}$, where m, n, u and v are equal to $F_S(s)$, $F_T(t)$, $F_X(x)$ and $F_Y(y)$ respectively, or in other words if the joint cumulative probability is higher, then the measure of association should also be higher – this is to ensure that if the cumulative probability for one pair of observations is higher than for another, this is reflected in the measure of association for that pair of observations;
• unit range: $-1 \le M_{X,Y} \le 1$, and the extreme values in this range should be feasible – this is simply for convenience, in that it is easier to interpret a measure that is scaled such that the extremes are −1 and 1. In particular, it makes it easier to compare measures of association for different sets of data;
• independence: if X and Y are independent, then $M_{X,Y} = 0$ – this ensures that independence between any two sets of data produces the same result for a given measure of association, and a value of zero is the most intuitively sensible result in such a situation;
• consistency: if X = −Z, then $M_{X,Y} = -M_{Z,Y}$, so reversing the signs of one variable should simply reverse the sign of the measure – this ensures that the relative strength of relationships is consistent between sets of variables whether the direction of this relationship is positive or negative; and
• convergence: if $X_1, X_2, \ldots, X_T$ and $Y_1, Y_2, \ldots, Y_T$ are sequences of T observations with the joint distribution function $F^T(x, y)$ and the copula $C^T(u, v)$, and if $C^T(u, v)$ tends to C(u,v) as T tends to infinity, then $M^T_{X,Y}$ must tend to $M_{X,Y}$ – this ensures that a measure calculated from discrete data tends to the measure calculated from a continuous distribution as the number of discrete observations increases.
Together, this list of features also implies that:
• if g(X) and h(Y) are monotonic transformations of X and Y, it is also true that $M_{g(X),h(Y)} = M_{X,Y}$ – in other words, only the order of observations matters; and
• if X and Y are co-monotonic (when X is higher then Y is always higher), then $M_{X,Y} = 1$; if they are counter-monotonic (when X is higher then Y is always lower), then $M_{X,Y} = -1$ – this describes the natural limits defined above.
It is more difficult to come up with corresponding criteria for measures looking at more than two dimensions, but Úbeda Flores (2005), Dolati & Úbeda-Flores (2006), Behboodian et al. (2007) and Taylor (2007) all propose multivariate extensions of the above list.
3.3. Pearson's rho
As hinted at above, there are a range of measures of association, not all of which can be defined as measures of concordance as described in section 3.2. The first of these is Pearson's product-moment correlation coefficient, also known as Pearson's rho (ρ) and the linear correlation coefficient. This is the most commonly used measure of correlation, and it is defined as:

$$\rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} \qquad (6)$$
The linear correlation coefficient can take any value between minus one and one, with ρ = −1 indicating perfect negative correlation, ρ = 1 indicating perfect positive correlation and ρ = 0 indicating that the variables are uncorrelated.
The sample version of Pearson's rho, $\hat{\rho}$, can be calculated by applying (6) to variances and covariances calculated from a sample of data. The linear correlation coefficient is, therefore, easy to calculate either from a sample of data or when given population measures. It is also the natural measure of dependence in jointly elliptical distributions such as the normal and t-distributions, where the level of dependence is essentially defined by the linear correlation coefficient. However, it has a number of limitations.
First, the linear correlation coefficient between two variables does not even exist if one variable has infinite variance, as is the case with some fat-tailed distributions. Furthermore, a linear correlation of zero does not necessarily imply that two variables are independent. And whilst the linear correlation coefficient does not change under linear transformations of the underlying variables, non-linear monotonic transformations – such as taking the logarithm of one variable – will change it. This means that, under Scarsini's axioms, it is not a measure of concordance. The issue underlying most of these problems is that the linear correlation coefficient depends on the marginal distributions of the variables, not just their copula. Indeed, the linear correlation coefficient only truly describes the relationship between variables if they are jointly elliptical.
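These limitations can be seen in a short R experiment (our own illustrative example):

```r
# Pearson's rho changes under a non-linear monotonic transformation,
# whilst a rank-based measure such as Spearman's rho does not.
set.seed(1)
x <- rnorm(10000)
y <- 0.7 * x + sqrt(1 - 0.7^2) * rnorm(10000)  # bivariate normal, rho = 0.7

cor(x, y)                            # close to 0.7
cor(x, exp(y))                       # changed by the transformation
cor(x, y, method = "spearman")       # unchanged...
cor(x, exp(y), method = "spearman")  # ...under the same transformation
```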
3.4. Spearman's rho
A way of dealing with the issues caused by correlation depending on absolute values is to use measures of rank correlation instead. These depend solely on the position – or rank – of the observations and not on their values. Two of the most commonly used rank correlation coefficients are Spearman's rho ($\rho_S$) and Kendall's tau ($\tau$), both of which conform with all of Scarsini's axioms.
The standard version of Spearman's rho is a bivariate measure. It is defined as the linear correlation coefficient of the ranks of the observations. If $R_{X,t}$ and $R_{Y,t}$ are the ranks of the tth observations of X and Y respectively where t = 1, 2, …, T, then it can be shown that the sample version of Spearman's rho, $\hat{\rho}_S$, can be calculated as:

$$\hat{\rho}_S = 1 - \frac{6\sum_{t = 1}^{T}(R_{X,t} - R_{Y,t})^2}{T(T^2 - 1)}$$
The population version is given in terms of copulas. The covariance of u and v is $\int_0^1\!\!\int_0^1 C(u, v)\,du\,dv - \frac{1}{4}$ and, since the variance of the uniform distribution is one-twelfth and the linear correlation coefficient is the covariance divided by the product of the standard deviations of the two variables, the following definition can be derived:

$$\rho_S = 12\int_0^1\!\!\int_0^1 C(u, v)\,du\,dv - 3 \qquad (8)$$
As well as being defined as the linear correlation coefficient of the ranks, Spearman's rho is also closely linked to the linear correlation coefficient for some distributions. Hult & Lindskog (2002) show that for the normal distribution (but not other elliptical distributions) the two measures are related as follows:

$$\rho = 2\sin\left(\frac{\pi\rho_S}{6}\right)$$
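This relationship can be verified numerically; a minimal R check, assuming a correlation of 0.7:

```r
# Check that rho = 2 * sin(pi * rho_S / 6) for the bivariate normal.
set.seed(1)
rho <- 0.7
x <- rnorm(100000)
y <- rho * x + sqrt(1 - rho^2) * rnorm(100000)

rho_s <- cor(x, y, method = "spearman")  # sample Spearman's rho
2 * sin(pi * rho_s / 6)                  # should be close to 0.7
```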
There also exist a number of multivariate versions of Spearman's rho, all of which reduce to (8) for two dimensions. These give a single value that measures the strength of the relationship between any number of variables. For example, Wolff (1980) proposes:

$$\rho_S = \frac{N + 1}{2^N - (N + 1)}\left(2^N \int_{[0,1]^N} C(\mathbf{u})\,d\mathbf{u} - 1\right) \qquad (10)$$
whilst Ruymgaart & van Zuijlen (1978) propose:

$$\rho_S = \frac{N + 1}{2^N - (N + 1)}\left(2^N \int_{[0,1]^N} \left(\prod_{n = 1}^{N} u_n\right) dC(\mathbf{u}) - 1\right) \qquad (11)$$
In each case N is the number of dimensions, $\mathbf{u}$ represents $u_1, u_2, \ldots, u_N$, a vector of length N, and the term $[0,1]^N$ means that N integrals (or a single N-dimensional integral) must be evaluated from the range zero to one. Unfortunately, it is not so straightforward to derive sample versions of these two population formulas, meaning that any measure based on simulated or observed data needs to be evaluated numerically – in other words, by simulating a large number of observations, calculating the shape of the copula distribution function and evaluating one of the formulas above.
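One possible numerical scheme in R is sketched below: the integral of the copula over the unit hypercube is approximated by averaging the empirical copula at independently drawn uniform points. The function name and the details of the scheme are our own illustration, using the normalising constant from the first formula above:

```r
# Numerical evaluation of a multivariate Spearman's rho from pseudo-
# observations (a T x N matrix of scaled ranks). The integral of the
# copula over [0,1]^N is estimated by Monte Carlo.
multi_spearman <- function(pseudo_obs, n_points = 5000) {
  N <- ncol(pseudo_obs)
  pts <- matrix(runif(n_points * N), ncol = N)  # uniform points in [0,1]^N
  # empirical copula at each point: the proportion of observations
  # at or below the point in every dimension
  C_vals <- apply(pts, 1, function(p) mean(apply(t(pseudo_obs) <= p, 2, all)))
  ((N + 1) / (2^N - (N + 1))) * (2^N * mean(C_vals) - 1)
}

# Three variables driven by a common factor
set.seed(1)
f <- rnorm(1000)
z <- cbind(f + rnorm(1000), f + rnorm(1000), f + rnorm(1000))
u <- apply(z, 2, rank) / (nrow(z) + 1)  # pseudo-observations
multi_spearman(u)
```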
3.5. Kendall's tau
Kendall's tau can be calculated in two dimensions from the number of concordant and discordant pairs in a set of observations. If $(X_1, Y_1)$ and $(X_2, Y_2)$ are two pairs of observations, then the pairs are concordant if the signs of $X_1 - X_2$ and $Y_1 - Y_2$ are the same (that is, both positive or both negative); otherwise the two pairs are discordant. The formula for this sample version of Kendall's tau is:

$$\hat{\tau} = \frac{T_c - T_d}{T_c + T_d}$$
where $T_c$ and $T_d$ are the numbers of concordant and discordant pairs respectively, with $T_c + T_d = T$. As with Spearman's rho, the population version of Kendall's tau is expressed in terms of copulas. In two dimensions, the expression is:

$$\tau = 4\int_0^1\!\!\int_0^1 C(u, v)\,c(u, v)\,du\,dv - 1 \qquad (13)$$
where c(u, v) is the copula density function, whose relationship with the copula is analogous to that of a probability density function to a probability distribution function. This means that such an expression is difficult to evaluate using an empirical copula, since such functions “jump” each time an observation is added, making it difficult to determine the density. The expression in (13) represents the probability of concordance less the probability of discordance. Hult & Lindskog (2002) also show that, for an elliptical copula such as the Gaussian or t:

$$\tau = \frac{2}{\pi}\arcsin(\rho)$$
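Again, this relationship is easy to check by simulation in R (illustrative values; the sample Kendall's tau computation is quadratic in the sample size, so a moderate sample is used):

```r
# Check that tau = (2 / pi) * arcsin(rho) for a Gaussian copula.
set.seed(1)
rho <- 0.5
x <- rnorm(5000)
y <- rho * x + sqrt(1 - rho^2) * rnorm(5000)

cor(x, y, method = "kendall")  # sample Kendall's tau
(2 / pi) * asin(rho)           # theoretical value: 1/3
```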
A multivariate version of Kendall's tau is proposed by Joe (1990):

$$\tau = \frac{1}{2^{N - 1} - 1}\left(2^N \int_{[0,1]^N} C(\mathbf{u})\,dC(\mathbf{u}) - 1\right)$$
When N = 2 this reduces to (13). As with the multivariate versions of Spearman's rho, there is no obvious expression for a multivariate sample statistic, suggesting that a numerical approach must be used.
4. Parametric and Non-Parametric Approaches
At this point, it is worth making an important distinction between parametric and non-parametric approaches in relation to measures of association, a distinction which is equally valid when considering measures of extreme co-movement.
The parametric approach is used if the measure of association is defined by the parameters of the joint distribution or copula. For example, if two variables are assumed to have a joint normal distribution with a linear correlation of 0.7, then no simulations are needed to determine that the correlation between the observations – given enough simulations – will be 0.7. Similarly, for a Gumbel copula with a parameter value of 4, Kendall's tau will be 0.75 and no further calculations are needed.
However, if the relationship between two variables is more complex – for example, if both are outputs from some econometric model – then the only way to measure the correlation is to carry out simulations and to determine the observed level of correlation. In this case, it will often be impossible to determine the “true” underlying level of association. This is the non-parametric approach, and it means that the statistic must be calculated using either a formula for the sample version of the statistic or, particularly for multivariate statistics, using a numerical approximation to evaluate an integral. Numerical and other approaches are also used to calculate some of the statistics of extreme co-movement described below.
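The distinction can be illustrated in R. A sketch, assuming the copula package is available: the parametric route reads Kendall's tau directly from the Gumbel parameter, whilst the non-parametric route estimates it from simulated observations.

```r
# Parametric versus non-parametric measures of association for a
# Gumbel copula with parameter 4 (population Kendall's tau of 0.75).
library(copula)

gc <- gumbelCopula(param = 4, dim = 2)
tau(gc)                                  # parametric: exactly 0.75

set.seed(1)
u <- rCopula(5000, gc)                   # simulated observations
cor(u[, 1], u[, 2], method = "kendall")  # non-parametric: close to 0.75
```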
5. Extreme Co-Movement
5.1. Overview
The above measures of correlation give an indication of the overall level of association between two variables. However, the way in which the strength of association varies across the range of the variables is determined by the type of copula joining these variables. This means that it is possible for two pairs of variables to have the same overall level of association (as measured by some correlation coefficient), but for the correlation in the tails to be very different. This is shown in Figure 1. The parameters for each of these Archimedean copulas are consistent with a Kendall's tau of 0.75, implying that the broad level of association is the same for each copula. However, the association for particular ranges clearly differs greatly. For this reason, it is important to consider measures of extreme co-movement, of which there are several.
5.2. Conditional Correlation Coefficients
One commonly-used approach to investigating the strength of the relationship in the tails of a distribution is to use conditional correlation. This calculates some correlation coefficient for a subset of the full data. Because it includes a finite portion of the distribution rather than relying on calculation at the limit, it also mitigates the problems that arise when copulas are evaluated along routes other than the main diagonal.
Graphical representations of conditioning are shown in Figure 2, with the shaded regions indicating which observations are included. The first of these conditions on Y ∈ K, that is, it includes only those observations for which Y takes values in K. The second looks only at observations where Y < k. Finally, the diagram on the right includes only those observations where both X < k and Y < k.
One potential complication with conditional correlations is that the boundary at which the correlations are calculated is subjective, although it should be straightforward to agree a limited number of “common” boundary levels, in the same way that widely-used limits are used for tests of statistical significance.
5.2.1. Linear Correlation Coefficient
A number of measures have been calculated. Some of the expressions are quite involved, but they all aim to create mathematical representations of the principles of conditional correlation coefficients described above. Malevergne & Sornette (2006) start by proposing a conditional correlation coefficient based on the linear correlation coefficient. This involves using K, some subset of the values of Y:

$$\rho(K) = \frac{\mathrm{cov}(X, Y \mid Y \in K)}{\sqrt{\mathrm{var}(X \mid Y \in K)\,\mathrm{var}(Y \mid Y \in K)}}$$
If X and Y have a joint normal distribution with a linear correlation coefficient ρ, then Boyer et al. (1997) show that:

$$\rho(K) = \frac{\rho}{\sqrt{\rho^2 + (1 - \rho^2)\dfrac{\mathrm{var}(Y)}{\mathrm{var}(Y \mid Y \in K)}}}$$
Of more interest is the case where K is not just a subset of Y, but is equivalent to the values of Y above or below some level, k. For example, if the condition Y ∈ K is equivalent to Y > k and if for simplicity var(Y) = 1, then the upper conditional correlation coefficient $\rho_U(k)$ is given by:
Because the normal distribution is symmetrical, a similar formula can be defined for Y < k. Malevergne & Sornette (2006) show that when k is large the following is a good approximation for $\rho_U(k)$:
Note that since k is on the denominator of the right hand side of (19), the conditional correlation coefficient tends to zero for very large values of k.
For the t distribution, Malevergne and Sornette show that the conditional correlation coefficient $\rho(K)$ is given by:
Looking at the more interesting case where Y > k, the exact expression for the coefficient $\rho_U(k)$ is more involved for the t distribution. However, Malevergne & Sornette (2006) show that if k is large, the following is a good approximation for a bivariate t distribution with γ degrees of freedom:
Note that k does not appear in the right hand side of this equation – in other words, as k tends to infinity, the conditional correlation coefficient tends to the constant on the right hand side of (21).
If X and Y are defined as extremes of the dataset, for example X > k and Y > k, then the result is the (upper) tail linear correlation coefficient. For large k, Malevergne & Sornette (2006) show that for the bivariate normal distribution the following is true:
If this expression is taken to the limit, then the result is clearly zero. Since the relationship is only exact at the limit, this means that the expression gives only approximate results for finite values of k.
As noted earlier, one issue with the linear correlation coefficient is that it is only an accurate measure of the level of association between two variables if they are jointly elliptical. This remains true for the tail linear correlation coefficient. As such, this measure does not comply with Scarsini's axioms of concordance. Of broader interest are tail rank correlation coefficients, which do.
5.2.2. Spearman's Rank Correlation Coefficient
Malevergne & Sornette (2006) note that the conditional equivalent of Spearman's rho can be calculated in the same way as the full-sample version, by replacing the observations in the calculation of the conditional linear correlation coefficient with their ranks.
Schmid & Schmidt (2007) show that the integral form of Spearman's rho in (10) can be adapted to give a conditional version of Spearman's rho, the lower tail version for 0 < k ≤ 1 being defined in N dimensions as:
Restricting the number of dimensions to two gives the following expression:
It is also possible to define this tail correlation in terms of conditional copulas. Charpentier (2003) defines the lower conditional copula, $C^*(u, v)$, as:
where:
and the values at which the functions are evaluated are such that $0 \le u \le u^*$ and $0 \le v \le v^*$. The upper conditional copula, $\bar{C}^*(u, v)$, is defined in terms of survival copulas as:
where:
Charpentier then shows that the bivariate lower tail version of Spearman's rho is given by:
where u* = v* = k. This approach can be extended to higher dimensions using (10) or (11).
5.2.3. Kendall's Rank Correlation Coefficient
Venter (2002) defines a bivariate cumulative tau, which is the lower tail version of Kendall's tau, in terms of integrals, as:
This suggests that a multivariate lower tail version of Kendall's tau could be calculated as:
In all of the cases involving integrals, these can be replaced with an empirical copula and evaluated numerically, although the difficulties of evaluating copula densities empirically have already been noted. However, a simpler approach to calculating the tail equivalent of Spearman's rho or Kendall's tau is simply to calculate the statistic using only the observations in the tail. This reflects the fact that the statistics described above are population rather than sample measures.
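A minimal R sketch of this simpler approach, with a subjectively chosen boundary of k = 0.1 (the function name is ours):

```r
# Sample conditional Spearman's rho, using only observations for which
# both variables fall at or below their k-th quantiles.
tail_spearman <- function(x, y, k = 0.1) {
  in_tail <- x <= quantile(x, k) & y <= quantile(y, k)
  if (sum(in_tail) < 3) return(NA)  # too few joint-tail observations
  cor(x[in_tail], y[in_tail], method = "spearman")
}

set.seed(1)
x <- rnorm(10000)
y <- 0.7 * x + sqrt(1 - 0.7^2) * rnorm(10000)
tail_spearman(x, y, k = 0.1)
```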
5.3. Adjusting Tail Correlations for the Proportion of Observations in the Tails
As mentioned above, conditional correlation coefficients report on the strength of relationships in the tails. However, they do not necessarily report on the importance of such relationships. For example, strong positive relationships in the lower left- and upper right-hand corners are of little importance if a far greater proportion of the distribution is found in the upper left- and lower right-hand corners. One way of characterising the extent to which this occurs is “arachnitude”. This measure, proposed by Shaw et al. (2010), seeks to measure the extent to which extreme observations are found off the diagonal line marking the main underlying dependency. For example, if a Q-Q (quantile-quantile) plot is drawn of a set of observations for two variables with a strong positive correlation, then most observations would be expected close to a diagonal line running from the bottom left to the top right of the chart. In particular, extreme observations would be expected in the bottom left and top right corners. However, in some cases a significant proportion of observations might also be seen in the top left and bottom right corners. This could indicate that an extreme value for one variable was likely to result in an extreme value for the other variable, but that the direction of this dependence was variable. Such relationships are seen in t-distributions with low degrees of freedom. One way to measure the degree to which this is true is to calculate the arachnitude between two variables. If, as described earlier, Spearman's rho can be calculated as the linear correlation coefficient between $R_X$ and $R_Y$, then the arachnitude can be calculated as the linear correlation between $(2R_X - 1)^2$ and $(2R_Y - 1)^2$.
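Arachnitude is straightforward to compute; the R sketch below assumes ranks scaled to lie strictly between zero and one, so that 2R − 1 lies between minus one and one:

```r
# Arachnitude: the linear correlation between (2R_X - 1)^2 and
# (2R_Y - 1)^2, with ranks scaled to (0, 1).
arachnitude <- function(x, y) {
  T <- length(x)
  rx <- rank(x) / (T + 1)
  ry <- rank(y) / (T + 1)
  cor((2 * rx - 1)^2, (2 * ry - 1)^2)
}

# A t copula with few degrees of freedom: uncorrelated variables whose
# extremes nonetheless occur together, in either direction.
set.seed(1)
w <- sqrt(3 / rchisq(5000, df = 3))  # common chi-squared mixing variable
x <- w * rnorm(5000)
y <- w * rnorm(5000)
cor(x, y, method = "spearman")  # near zero
arachnitude(x, y)               # clearly positive
```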
This is similar in concept to the measure described by Blomqvist (1950), known as Blomqvist's beta, $\beta_N$. This is defined as the number of observations in the top left and bottom right quadrants ($N_1$ and $N_3$) less those in the top right and bottom left quadrants ($N_2$ and $N_4$), taken as a proportion of all observations:

$$\beta_N = \frac{N_1 + N_3 - N_2 - N_4}{N_1 + N_2 + N_3 + N_4}$$
This issue is important because it implies that even if the relationship between two variables measured in the bottom left hand corner is strong, this is not necessarily a high risk if only a low proportion of the distribution is found in this corner relative to the top left and bottom right corners.
This suggests that measures of tail correlation should be modified to take this factor into account. One way of doing this would be to weight the measure of tail association calculated in the bottom left corner by the proportion C/(A + B + C), where A, B and C are the volumes of the distribution covering the areas shown in Figure 3. These areas are the parts of the distribution bounded by the quantiles of interest. For example, if the conditional correlation coefficient is being calculated for both X and Y < k, then the square in the bottom left corner of Figure 3 is bounded by the same limits. The two triangles in the top left and bottom right together have the same area as this square.
An analogous measure is needed for upper tail association, in case losses are being treated as positive rather than negative values. This is important for asymmetric copulas such as the Gumbel or Clayton forms. Both lower and upper measures of these weighted correlation measures conform with Scarsini's axioms.
5.4. Coefficient of Tail Dependence
Weighting a coefficient of tail correlation gives one way of allowing for the proportion of a joint distribution present in the tails. However, a more direct approach is to simply measure such a proportion. This is, in fact, a more relevant measure: the direction of a tail relationship is far less important than the likelihood of joint observations being present in the tail.
One of the most fundamental measures of such a probability is the coefficient of tail dependence, λ. This is more commonly related to either the upper (right) or lower (left) tail, such that extreme positive and extreme negative observations are measured separately. The two measures used are the coefficient of upper tail dependence, $\lambda_U$, and the coefficient of lower tail dependence, $\lambda_L$. These are most commonly defined as:

$$\lambda_U = \lim_{k \to 1^-} \frac{\bar{C}(k, k)}{1 - k}$$
where 0 ≤ k ≤ 1, and:

$$\lambda_L = \lim_{k \to 0^+} \frac{C(k, k)}{k}$$
as described by Sibuya (1960) and others. The limit in the definition of $\lambda_L$ means zero is approached from above, whilst the limit in the definition of $\lambda_U$ means one is approached from below. The copula $\bar{C}(\ldots)$ in the definition of $\lambda_U$ is a survival copula, measuring the joint probability of survival beyond the quantiles in parentheses. In both cases, if the result is between zero and one then tail dependence exists, whilst if the result is zero there is no tail dependence.
It is helpful to consider the rationale behind this statistic. The copula C(k, k) gives the proportion of observations in the bottom left corner of the unit square with upper bounds of F(x) = k and F(y) = k. If two variables are perfectly correlated, then the proportion of observations falling within these bounds will be k. This is the maximum value of the copula. Therefore, dividing the copula by k gives a level of tail association that falls between zero and one. Taking k to the limit of zero gives the level of association at the limit.
The coefficient of tail dependence, and related statistics, do not conform with Scarsini's axioms – but they are not meant to. Those axioms are intended to give information on the strength of relationships which can be “positive”, “neutral” and “negative”; however, tail dependence (as contrasted with tail association) is measured on a scale ranging from “completely absent” to “completely present”, meaning that the scale should run from zero to one, rather than minus one to plus one. Values between zero and one signify varying levels of dependence.
Earlier, it was highlighted that there are two approaches that can be used to evaluate measures of association: parametric and non-parametric. The same is true of coefficients of upper and lower tail dependence. If the copula is specified parametrically, the coefficients can often be calculated analytically. Alternatively, if only samples are available – either from historical data or from stochastic simulation – then the coefficients can be estimated non-parametrically.
A number of results have been derived for the coefficients of tail dependence for various copulas. Embrechts et al. (2002) show that for the t copula the coefficients of lower and upper tail dependence (which are identical here) are equal to:

$$\lambda_L = \lambda_U = 2t_{\gamma + 1}\left(-\sqrt{\frac{(\gamma + 1)(1 - \rho)}{1 + \rho}}\right)$$
where $t_\gamma$ is the cumulative distribution function for the t-distribution with γ degrees of freedom. Hult & Lindskog (2002) generalise this result for all elliptical copulas. This expression also implies that for the Gaussian copula – which is obtained as γ tends to infinity – the coefficients of upper and lower tail dependence are zero. Unfortunately, this means that the coefficients of tail dependence cannot be used to determine the risk of extreme co-movement for finite values if variables are assumed to have a joint normal distribution. More accurately, it is impossible to discriminate between the levels of risk for observed combinations of variables with high and low correlations. In these cases risks may still exist, even if the coefficients of tail dependence are zero.
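The formula above is simple to evaluate in R using the t-distribution function; a sketch with illustrative parameter values:

```r
# Coefficient of tail dependence for a t copula with correlation rho
# and gamma degrees of freedom.
t_copula_lambda <- function(rho, gamma) {
  2 * pt(-sqrt((gamma + 1) * (1 - rho) / (1 + rho)), df = gamma + 1)
}

t_copula_lambda(rho = 0.5, gamma = 4)    # material tail dependence
t_copula_lambda(rho = 0.5, gamma = 100)  # near zero: close to Gaussian
```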
McNeil et al. (2005) also give the coefficients of lower and upper tail dependence for a range of Archimedean copulas. These are given with the results for the Gaussian and t copulas in Table 1.
Source: McNeil et al. (2005); the forms of these copulas are also given in McNeil et al. (2005).
Schmid & Schmidt (2007) use the expression in (29) to define the following alternative measure of (lower) tail dependence:
For two dimensions, this simplifies to:
This alternative measure of tail dependence has the feature that it considers the asymptotic direction of the relationship in the tail, not just the weight of observations. However, in terms of extreme risk it is this weight which is more important.
5.5. Estimating the Coefficient of Tail Dependence
It is impossible to calculate exactly the coefficients of tail dependence using non-parametric approaches, since each is by definition a limiting measure. However, it is possible to estimate the coefficients of tail dependence. Fischer & Dörflinger (2006) propose six non-parametric estimators of $\lambda_U$, denoted $\hat{\lambda}_U^{[i]}$, based on empirical copulas C(k, k), such as those calculated from observations or simulated values using (4) or (5). They first order the T bivariate observations $(X_1, Y_1)$ to $(X_T, Y_T)$ such that:
The various measures are then defined as follows:
Fischer & Dörflinger (2006) state that using a value of $t \approx \sqrt{T}$ seems to be appropriate in these two cases. The third estimate, $\hat{\lambda}_U^{[3]}$, is the ordinary least squares estimator in the following expression:
where s = 1, …, t, whilst $\hat{\lambda}_U^{[4]}$ corresponds to the α which minimises:
where $\hat{C}(u, v) = \alpha\min(u, v) + (1 - \alpha)uv$. The fifth estimate, $\hat{\lambda}_U^{[5]}$, results from a least-squares estimation of the equation:
where again s = 1,… ,t, whilst:
Fischer & Dörflinger (2006) also show that $\hat{\lambda}_U^{[5]} \approx \hat{\lambda}_U^{[6]}$.
For $\hat{\lambda}_U^{[3]}$ to $\hat{\lambda}_U^{[6]}$ it is often worth adjusting the range of data covered such that s = j, …, t where j > 1, rather than 1, …, t. This is because the copula values for the highest and lowest quantiles are calculated using so few observations that they can be volatile. This can be seen in Figure 4, in relation to the coefficient of lower tail dependence.
Consider, for example, $\hat{\lambda}_L^{[3]}$, the lower tail equivalent of $\hat{\lambda}_U^{[3]}$. Values of C(k, k) can be calculated using (5) based on daily UK and US equity returns for the ten years to 7 December 2010. After excluding the first ten values of k, where the low number of observations gives unreliable results, we have fitted a line to the next 90 observations. This gives a result of $\hat{\lambda}_L^{[3]} = 0.27$. Figure 4 shows that it is straightforward to adapt the techniques described above to estimate the coefficient of lower tail dependence.
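A sketch of this regression-based approach in R; the data, quantile grid and exclusion rule below are illustrative assumptions rather than those used for Figure 4. Since Ĉ(k, k) is approximately $\lambda_L \times k$ for small k, the estimate is the slope of a line through the origin:

```r
# Regression-based estimate of the coefficient of lower tail dependence:
# fit a line through the origin to empirical copula values C(k, k) over
# a range of small quantiles k, excluding the smallest quantiles where
# few observations are available.
lambda_L_ols <- function(x, y, grid = (11:100) / 1000) {
  T <- length(x)
  rx <- rank(x) / (T + 1)
  ry <- rank(y) / (T + 1)
  C_kk <- sapply(grid, function(k) sum(rx <= k & ry <= k) / (T + 1))
  unname(coef(lm(C_kk ~ 0 + grid)))  # slope through the origin
}
```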
Frahm et al. (2005) also propose an additional non-parametric estimator for the coefficient of upper tail dependence:
where $U_s$ and $V_s$ are random observations sampled from the copula C(…). This final measure requires the assumption that the copula approximates an extreme value copula, discussed later.
Another approach to calculating coefficients of tail dependence uses tail copulas. Schmidt & Stadtmüller (2006) start by defining a lower tail copula as:
and an upper tail copula as:
The lower tail dependence coefficient can then be defined as:
and the upper tail dependence coefficient as:
Both coefficients can be estimated by evaluating the tail copulas $\Lambda_L$ and $\Lambda_U$ empirically. For example, if $R_{X,t}^{(k)}$ and $R_{Y,t}^{(k)}$ are the ranks of the tth observations of X and Y respectively, where t = 1, 2, …, k, then one empirical estimator of $\Lambda_U(x, y)$ is:
where a, an integer between one and k, is chosen by the user, and I is an indicator function which is equal to one if the conditions in the parentheses are true and zero otherwise. This then leads to the following estimator:
Frahm et al. (2005) note a number of pitfalls with both the parametric and non-parametric approaches to calculating coefficients of tail dependence. They note that the use of parametric margins instead of empirical margins can lead to model uncertainty and, ultimately, an incorrect interpretation of the dependence structure. They state that failing to test or even ignoring the quality of the marginal fit can cause the underlying dependence structure to be dramatically misinterpreted. The issue is even more serious in the tails, and Frahm et al. (2005) note that fitting a parametric model to tail dependence functions is not robust, since models can be affected by a small number of unusual or incorrect data values.
In relation to parametric calculations, Frahm et al. (2005) note that it is difficult to conclude whether two variables are tail dependent or not from a finite number of observations. They point out that one can always specify thin-tailed distributions which produce sample observations suggesting heavy tails even for large samples, and conversely that one may create samples which seem to be tail independent but are realisations of a tail dependent distribution. They conclude that non-parametric estimation of a tail dependence function is very inefficient due to the volume of data required.
Klüppelberg et al. (2008) seek to solve these issues by using a semi-parametric approach. They do this by fitting an elliptical copula only to the tails of the joint distribution. They calculate Kendall's tau from the data and, using the relationship between Kendall's tau and the linear correlation coefficient for elliptical copulas given in section 3.5, derive the linear correlation coefficient for the copula. The shape of the joint tail is determined by a shape parameter, which is most recognisable as the number of degrees of freedom for a t copula. This allows the coefficients of upper and lower tail dependence to be determined.
Although all of the analysis so far has considered the coefficient of tail dependence as a bivariate measure, there is no theoretical reason why it cannot be extended into higher dimensions. For an N-dimensional copula, the maximum proportion of observations in an N-dimensional hypercube starting at the origin with sides of length k is still k, so the following formula can be derived:

$$\lambda_L = \lim_{k \to 0^+} \frac{C(k, k, \ldots, k)}{k}$$
and:

$$\lambda_U = \lim_{k \to 0^+} \frac{\bar{C}(\bar{k})}{k}$$
where $\bar{k}$ represents $(1 - k, 1 - k, \ldots, 1 - k)$.
However, one practical issue is that any problems of estimation are magnified as the number of dimensions increases – the so-called “curse of dimensionality”. Simply stated, this means that as the number of dimensions or variables increases, the proportion of observations in the joint tail of the distribution falls exponentially, meaning that there are fewer and fewer observations available for parameterisation.
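The effect is easy to quantify: for independent variables, the proportion of observations in the joint lower k-tail is k to the power N, as the short R calculation below shows for illustrative values.

```r
# Expected number of observations in the joint lower 10% tail of N
# independent variables, from a sample of 10,000.
T <- 10000
k <- 0.1
for (N in 1:6) cat(N, "dimensions:", T * k^N, "expected observations\n")
```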
5.6. Extreme Value Copulas
An extension of the semi-parametric approach used by Küppelberg et al. (Reference Küppelberg, Kuhn and Peng2008) to calculate the coefficient of tail dependence is to use extreme value theory (EVT) to model the shape of the tail for extreme values. EVT requires an assumption that one is so far into the tail of a distribution that simplifying assumptions can be made about its shape. In particular, it allows the use of distribution functions that can be evaluated exactly rather than numerically – in other words, the functions have a simple closed form rather than being given in terms of the integral of a density function that can be evaluated only approximately. Some information on extreme value theory is given in Appendix 2.
Juri & Wüthrich (2003) use the Generalised Pareto Distribution to fit extreme value copulas, using the results to allow them to determine coefficients of tail dependence. Pickands (1981) also uses extreme value theory to create a bivariate Fréchet-type GEV distribution by linking two marginal Fréchet-type GEV distributions using a function known as Pickands’ dependence function, A(W). This has the following form:
Frahm et al. (2005) show that this function can be used to derive the coefficient of upper tail dependence:

$$\lambda_U = 2\left(1 - A\left(\tfrac{1}{2}\right)\right)$$
Khoudraji (1995) describes Kendall's tau and Spearman's rho in terms of integrals based on Pickands’ dependence function, but there are a number of non-parametric approaches to estimating the function itself, including versions by Pickands (1981), Deheuvels (1991) and Capéraà et al. (1997).
5.7. Stable Tail Dependence Function
An alternative measure of tail dependence is given by the stable tail dependence function, which combines the coefficient of tail dependence with the values of the marginal distribution functions to give a value that can be calculated over a wider range of values. In particular, if the joint tail of two variables follows a power law distribution, for example if they are linked by a t copula, then Embrechts et al. (2002) show that a limiting function known as the stable tail dependence function exists. Haug et al. (2009) define this measure for the upper tail only. If this measure is $l_U(u, v)$, then it can be defined as:
where $\hat{\lambda}_U(u, v)$ is the estimate of $\lambda_U$ calculated at the quantiles u and v. Haug et al. (2009) note that for an elliptical distribution $\lambda_U = \lambda_L$. If $l_U(u, v) \approx l_L(1 - u, 1 - v)$, then this implies that:
If the joint tail of two variables follows a power law distribution, then both $l_L(u, v)$ and $l_U(u, v)$ should remain stable in the tails over a range of values. As a result, an average value of $l_L(u, v)$ or $l_U(u, v)$ over a range of values of u and v in the tail could be used as a measure of tail dependence.
5.8. Malevergne-Sornette Coefficient of Tail Dependence
As Embrechts et al. (2003) note, the Gaussian copula – and, by extension, the bivariate Gaussian distribution – has no upper or lower tail dependence for any correlation less than one. This is unhelpful. Whilst there may be no relationship between two variables with such a copula at the limit, there will still clearly be a significant degree of co-movement in extreme scenarios if ρ is high. However, $\lambda_L$ will be the same whether ρ = 0.1 or ρ = 0.9: it will be zero. Some measure that distinguished between these different values of ρ would be preferable.
Malevergne & Sornette (2006) introduce an alternative measure of upper tail dependence, $\bar{\lambda}_U$, which they define as:

$$\bar{\lambda}_U = \lim_{k \to 1^-} \frac{2\ln(1 - k)}{\ln \bar{C}(k, k)} - 1$$
This measure is based on a statistic, η, defined by Coles et al. (1999). In particular, $\bar{\lambda} = 2\eta - 1$. This expression also suggests that an analogue for lower tail dependence can be defined as:

$$\bar{\lambda}_L = \lim_{k \to 0^+} \frac{2\ln k}{\ln C(k, k)} - 1$$
Malevergne & Sornette (2006) show that if $\lambda_U = 0$, then $\bar{\lambda}_U$ will take a value between minus one and one. This appears to be a useful advance; however, they also show that if $\lambda_U > 0$ – as is the case for t copulas – then $\bar{\lambda}_U = 1$. This means that whilst their measure is useful for discriminating between levels of tail dependence for Gaussian copulas (where they show that $\bar{\lambda}_U = \rho$), it cannot do the same for t copulas, which will all have the same level of tail dependence under the Malevergne & Sornette (2006) measure. In both cases, this is unlike the “traditional” coefficient of tail dependence.
5.9. Coefficient of Finite Tail Dependence
Even ignoring the lack of discrimination between Gaussian copulas, the coefficient of tail dependence is not without problems. The issues with both parametric and non-parametric estimation have been discussed. However, as a limiting measure it actually concentrates on infinitely extreme co-movement. This is unhelpful when finite co-movement can cause enough problems.
The stable tail dependence function helps, but is only valid for power law distributions – that is, those with fat tails. This means once more that the t copula is included, but the Gaussian copula is excluded.
One potential solution is to use a finite alternative. The coefficient of tail dependence is a measure calculated at the limit. However, it is possible to calculate an identical measure for values of k between zero and one. Such a statistic – which we term the coefficient of finite tail dependence – has a lower version:

$$\lambda_L(k) = \frac{C(k, k)}{k}$$
and an upper version:

$$\lambda_U(k) = \frac{\bar{C}(1 - k, 1 - k)}{k}$$
It can be applied over any number of dimensions for the lower version:

$$\lambda_L(k) = \frac{C(k, k, \ldots, k)}{k}$$
and the upper version:

$$\lambda_U(k) = \frac{\bar{C}(\bar{k})}{k}$$
Such a measure gives a non-zero coefficient for variables linked by both Gaussian and t copulas, allowing discrimination between a wider range of dependencies. The values for selected parameterisations of Gaussian and t copulas and for various values of k are given in Appendix 3. These were calculated using the statistical package R, and the code used is given in Appendix 4.
It is simple to evaluate the coefficients of finite tail dependence for Archimedean copulas. For example, consider the bivariate Gumbel copula discussed earlier with the single parameter α equal to 2. For k = 0.05 the coefficient of finite lower tail dependence can be calculated simply as:

$$\frac{C(0.05, 0.05)}{0.05} = \frac{\exp\left(-\left[(-\ln 0.05)^2 + (-\ln 0.05)^2\right]^{1/2}\right)}{0.05} \approx \frac{0.0145}{0.05} \approx 0.29$$
It is also easy to evaluate this measure from simulated data – for the coefficient of finite lower tail dependence, simply calculate the empirical copula using the kth quantile of each marginal distribution and divide the result by k.
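For example, the following R sketch estimates the coefficient of finite lower tail dependence at k = 0.05 for simulated data from a Gaussian copula (for which the limiting coefficient would be zero):

```r
# Coefficient of finite lower tail dependence from simulated data:
# the empirical copula at (k, k) divided by k.
set.seed(1)
rho <- 0.7
x <- rnorm(100000)
y <- rho * x + sqrt(1 - rho^2) * rnorm(100000)

k <- 0.05
u <- rank(x) / (length(x) + 1)
v <- rank(y) / (length(y) + 1)
mean(u <= k & v <= k) / k  # non-zero, unlike the limiting coefficient
```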
5.10. Conclusion
Measures of correlation can be adapted to give the association only in the joint tails of distributions. Rank correlations give more robust results, since they do not rely on the marginal distributions having any particular characteristics. However, these statistics ignore the weight of observations in the tail, looking only at the directions of relationships. It is possible to allow for this, but it makes more sense to look directly at the weight of the joint distribution in the tail – and this means looking at tail dependence.
Traditional measures of tail dependence are evaluated at the limit. This gives them some unfortunate properties, in particular an inability to discriminate between certain copulas. The answer is to use the coefficient of finite tail dependence, which is evaluated at a finite quantile, k. As with other measures that are not calculated at the limit, the choice of k is subjective; however, if finding a statistic that gives a sensible comparison requires the use of professional judgement, then this is a price worth paying. As such, this is the measure we recommend for measuring the extent to which pairs or groups of variables are linked at the extremes.
6. Extreme Losses
6.1. Overview
So far, the analysis has concentrated on the likelihood of jointly extreme observations from two or more variables. The measures discussed give useful insights into the concentrations of risk that can occur between different variables. However, in many cases financial institutions are more concerned when the combined impact of two or more variables causes losses to exceed a particular level. This could be when all variables show moderately bad losses, or it could be when the losses from several variables are small but the losses from one variable are very bad. This suggests that for financial firms it is also important to consider the risk of extreme losses arising from two or more variables, as well as the degree to which extreme co-movement exists.
One way of assessing the risk of extreme loss is through the concept of ruin lines, discussed below. However, this concept raises the question of what is meant by loss, in particular whether it relates to an existing set of exposures or a potential future trade-off. The definition of loss is covered in sub-section 6.3.
6.2. Ruin Lines
A starting point for this type of measure is the ruin line. This is the line covering the combinations of results from various lines of business or risk types that result in a loss beyond a certain level. Consider, for example, a profit defined as P, where a larger negative value of P means that the profit is smaller or the loss is greater. If there are two lines of business whose profits are X and Y, then the ruin line is defined as:
$$$X + Y = P$$$
If X + Y < P, where P is negative, then the loss is regarded as excessive. This too can be extended to higher dimensions, giving a ruin plane or even a ruin hyperplane defined for N profits, $$$X_1, X_2, \ldots, X_N$$$, as:
$$$\sum_{n=1}^{N} X_n = P$$$
This loss can be defined as an absolute amount or as some proportion; it is not, though, defined in quantile terms. The reason for this is that the total loss is a function of the individual loss amounts, not the percentile in which those losses lie. This means that when considering the distribution of losses beyond the ruin line, the marginal distributions for the variables involved matter as much as the copula between those variables. As a result, ruin line analysis is most easily carried out in terms of the observations themselves (x, y) rather than their distribution functions (F(x), F(y) or u, v). This does not mean that distributions should (or should not) be fitted to the observations; it only means that it is the value of observations that matters rather than their order, and it is the order that is essentially given by a distribution function. However, it is interesting to consider the shape of the distribution of losses as defined by a ruin line in terms of a distribution function.
Consider, for example, the previously mentioned profits X and Y and assume that they are correlated and normally distributed with means μX and μY, standard deviations σX and σY, and a linear correlation coefficient of ρ. If the density of the losses is shown by contour lines, the area of excess loss can be shown as the shaded area in Figure 5. If this loss is expressed in terms of distribution functions, as shown by the shaded area in Figure 6, it can be seen that the area covered has a very different shape to that measured by tail correlations. This does not mean that ruin lines are “better” than tail correlations, but it does mean that they measure something different: the risk of extreme loss, more commonly known as the probability of ruin, rather than the likelihood of extreme co-movement. The probability of ruin can be regarded as the converse of the value at risk, VaR. Whilst the VaR essentially gives the level beyond which loss occurs with a particular probability, the probability of ruin gives the probability that losses exceed a certain level.
As mentioned earlier, the shape of the area of loss given in terms of copulas will change depending not only on the copula but also on the marginal distributions. This means that it makes more sense to evaluate the risk of loss directly from observations, or from the joint distribution if this is known.
If two variables follow a bivariate normal distribution then it is straightforward to evaluate the probability that losses will fall below the ruin line. This can be done by taking advantage of the fact that the sum of jointly normally distributed variables is itself normally distributed. For example, for the profits X and Y with means μX and μY, standard deviations σX and σY, and a linear correlation coefficient of ρ, the sum of X and Y has a mean of μZ = μX + μY and a standard deviation of $$$\sigma_Z = \sqrt{\sigma_X^2 + \sigma_Y^2 + 2\rho\sigma_X\sigma_Y}$$$. This means that if the ruin line is defined as P = X + Y, then the probability of ruin is given by:
$$$\Pr(X + Y < P) = \Phi\left(\frac{P - \mu_Z}{\sigma_Z}\right)$$$
where Φ(…) is the standard normal distribution function.
The same formula can be used for combinations of a greater number of risks. If $$$\rho_{X_m,X_n}$$$ is defined as the correlation between Xm and Xn whilst $$$\mu_{X_n}$$$ and $$$\sigma_{X_n}$$$ are the mean and standard deviation of risk Xn, with m, n = 1, 2, … , N, then μZ and σZ can be calculated as follows:
$$$\mu_Z = \sum_{n=1}^{N} \mu_{X_n} \quad {\rm and} \quad \sigma_Z = \sqrt{\sum_{m=1}^{N}\sum_{n=1}^{N} \rho_{X_m,X_n}\sigma_{X_m}\sigma_{X_n}}$$$
where $$$\rho_{X_n,X_n} = 1$$$.
An alternative approach is to use an affine transformation. This involves transforming the distribution from one where the variables have standard deviations σX and σY and a linear correlation coefficient of ρ to one where the standard deviations are one and the linear correlation coefficient is zero. The transformation that accomplishes this changes the means of the two variables to $$$\mu_X^*$$$ and $$$\mu_Y^*$$$, but a new ruin line can also be derived using the same transformation.
Once an elliptical distribution has been turned into a spherical one, it can be rotated about its centre and its centre shifted to the origin, as shown in Figure 7. The proportion of observations to the left of the transformed ruin line, which is the probability of ruin, is then:
$$$\Pr({\rm ruin}) = \Phi(-z)$$$
In other words, if you know z, the distance from the centre of the transformed distribution to the ruin line, then the proportion of observations below and to the left of the ruin line can be evaluated as $$$\Phi(-z)$$$. Exactly the same approach can be used in more dimensions – if the variables of interest have losses $$$X_1, X_2, \ldots, X_N$$$ then rotating the hyperplane relative to the hypersphere gives the same result, with z now the distance from the centre to the hyperplane.
To find z, two sets of items are needed:
• the means of the transformed distribution; and
• the parameters of the transformed ruin line (or plane, or hyperplane).
These are found by multiplying the original parameters by the same matrix that transforms the original distribution into one with unit variance and zero correlation: the inverse of the square root of the covariance matrix, $$$C^{-1/2}$$$. The square root of the matrix is found by Cholesky decomposition, and the resulting triangular matrix is then inverted. Post-multiplying the ruin line and a vector containing the original means of the variables by this matrix gives the transformed results. If the transformed N-dimensional ruin line is defined as $$$\alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_N x_N = 0$$$ and the transformed means are $$$\mu_{X_1}^*, \mu_{X_2}^*, \ldots, \mu_{X_N}^*$$$, then z can be calculated as:
$$$z = \frac{\alpha_0 + \alpha_1\mu_{X_1}^* + \alpha_2\mu_{X_2}^* + \cdots + \alpha_N\mu_{X_N}^*}{\sqrt{\alpha_1^2 + \alpha_2^2 + \cdots + \alpha_N^2}}$$$
The main reason for describing this more complicated approach is that it can be used not only with jointly normal variables but also with the multivariate t distribution. In this case, the matrix used is the matrix of co-spread parameters rather than the covariance matrix. The distinction between co-spread and covariance is subtle. It is most easily appreciated in relation to a single variable in terms of the difference between the spread parameter and the variance. The variance is a measure calculated from the distribution which expresses the volatility of a series; the spread, on the other hand, is the parameter used in the distribution that determines this volatility. For the normal distribution, spread and variance are one and the same. However, in the univariate t distribution, for example, the spread parameter is $$$\sigma^2$$$ whilst the variance is $$$\sigma^2 v/(v - 2)$$$, where v is the number of degrees of freedom. Co-spread and covariance are related by analogy, so if C is the co-spread matrix, the covariance matrix can be denoted [v/(v − 2)]C.
For non-elliptical joint distributions, including those where the marginal distributions and copulas are chosen separately, the proportion of observations to the left of and below the ruin line can be calculated by carrying out a large number of simulations and counting the proportion of observations for which $$$\sum_{n=1}^{N} X_n < P$$$. If a subscript t is added where t = 1, 2, … , T, denoting the time of each observation, then the probability of ruin calculated from simulations can be described as:
$$$\Pr({\rm ruin}) = \frac{1}{T}\sum_{t=1}^{T} I\left(\sum_{n=1}^{N} X_{n,t} < P\right)$$$
where I(…) is an indicator variable taking a value of 1 if the expression inside the parentheses is true and zero otherwise.
A measure closely related to the probability of ruin is the economic cost of ruin. This gives the average value of losses given that the aggregate loss is below and to the left of the ruin line. This is helpful as it gives an indication not only of the likelihood of extreme loss, but also of its severity. If the probability of ruin is regarded as the converse of VaR, then the economic cost of ruin can be regarded as the converse of the tail value at risk, TVaR, the average loss beyond the VaR. As with the probability of ruin, the economic cost of ruin can be applied to actual risk exposures or used to help determine risk exposures. It can also be used for any number of risks in combination, from just pairs of risks to all risks faced.
If multivariate normal variables are combined to give a single normal variable with a mean of μZ and a standard deviation of σZ, then the economic cost of ruin is given by:
$$$E[Z \mid Z < P] = \mu_Z - \sigma_Z\frac{\phi\left((P - \mu_Z)/\sigma_Z\right)}{\Phi\left((P - \mu_Z)/\sigma_Z\right)}$$$
where φ(…) is the standard normal density function. This expression essentially takes the density of the loss distribution at the point of loss as a proportion of the probability of loss; this is then stretched by the standard deviation and shifted to the mean of the loss distribution. Affine transformations cannot be used to give simple results for the economic cost of ruin; however, the formula for calculation from simulated data is straightforward:
$$$\frac{\sum_{t=1}^{T}\left(\sum_{n=1}^{N} X_{n,t}\right)I\left(\sum_{n=1}^{N} X_{n,t} < P\right)}{\sum_{t=1}^{T} I\left(\sum_{n=1}^{N} X_{n,t} < P\right)}$$$
It is worth noting that in practice the ruin line will not always be straight. In particular, if there is an interaction between two risks, for example interest rate and longevity risk, the line will be curved. This is because the impact of an increase in longevity will be greater the lower the interest rate. This principle clearly extends to more risks and higher dimensions.
6.3. Definition of Loss
One issue that has not yet been discussed is the definition of loss. For measures of tail association no definition is needed, since it is only the order of observations that is of interest; however, for extreme loss this definition is crucial.
There are two broad approaches that can be used, the choice depending on the use to which the measure is being put. The first approach is used if the desire is to assess the current level of exposure to risks. In this case, the ruin line parameter P is defined as the critical level of total loss. This can be the maximum loss acceptable from two, three or more – even all – sources. It can also be defined in absolute terms or as a change in value.
The next stage is to define $$$X_1, X_2, \ldots, X_N$$$, where N is the number of risks being considered concurrently. Each of these can be regarded as the loss arising from a particular risk. The loss from market risk is clearly given by the distribution of market returns. However, a little thought is needed for the loss from items such as mortality risk. For example, consider the losses from a term assurance portfolio. If the risks under consideration were interest rate risk and mortality risk, simulations could be carried out and the results applied to the portfolio allowing for stochastic variation in interest and mortality rates, but with all other variables changing deterministically. The probability or economic cost of ruin from these two factors could then be assessed by counting the proportion of observations below the ruin line.
The second approach is used if the desire is to determine the appropriate allocation between different risks. In this case, some notional threshold loss from risk combinations must be set, say £10 per £100 invested for two risks. In other words, for a total investment of £100, somehow spread between two risks, a total loss from the two risks of £10 would be regarded as severe. The distributions of losses for different combinations of risk exposures – in terms of ways of using the £100 of investment – can then be determined. For example, if the threshold loss from two risks is £10 per £100 invested and the two risks being considered are investment in assets 1 and 2, then the probability of ruin (for example) can be calculated for combinations of £$$$W_1$$$ and £$$$W_2$$$ = £(100 − $$$W_1$$$). In other words, as $$$W_1$$$ increases from £0 to £100, we can see how the probability of ruin – with ruin being defined as total losses greater than £10 – changes. This information can also be combined with other information on the combination of risks, such as the expected return. As such, the trade-off between risk and return for different values of $$$W_1$$$ can be determined. This means that, for example, the expected return can be maximised for a given probability of ruin. This can, of course, be extended to higher dimensions, and the resulting statistic can be used in the calculation of efficient frontiers. Here, as well, it is important to recognise any non-linearities arising from interaction between the different risks.
7. Communicating Tail Association and the Risk of Extreme Loss
The communication of the results of the calculations discussed earlier is an important consideration. Whilst calculation is helpful, the results are useful only if they can be understood easily by stakeholders. The considerations for communication differ for tail association and extreme loss, so each is dealt with separately.
7.1. Tail Association
When communicating tail association, or any measure of association between variables, there are a number of factors that should be communicated, including:
• The level of association between two variables;
• The extent to which an asset (or liability) provides a match for a liability (or asset); and
• The importance of variables measured.
The level of association between the two variables has already been discussed. However, the level of association between those variables that are assets and those that are liabilities is also important. A high level of association between an asset and a liability is less of a concern than – and may be preferable to – a high level of association between two assets or between two liabilities. Similarly, a high level of tail association between two assets, each of which is held to match liabilities, is less of a concern than the same level of association between two “return-producing” assets. However, the definition of a “matching asset” is not straightforward – corporate bonds might be held as a matching asset, but a large increase in spreads or an unexpectedly high level of defaults will, to say the least, reduce the effectiveness of such a match. Credit risk can, of course, be considered separately, but the issue here is a binary one – are the bonds matching or not? This suggests that an element of subjectivity is needed when interpreting the results.
The final item is the importance of the variables being measured. If each variable is an asset constituting less than 1% of the entire portfolio, then a high level of correlation is less likely to be of concern than if each constitutes 10%. Consider a portfolio of assets where the correlation between all pairs of assets is identical. As long as the correlation is less than one, the risk of extreme loss from all constituents of a portfolio of 100 equally-weighted assets will be less than that for a portfolio of 10.
For the examples that follow we consider a stylised final salary pension scheme with the following characteristics:
• assets are initially equal to liabilities;
• liabilities are initially half nominal and half inflation-linked;
• nominal liabilities are assumed to perform in line with the portfolio of assets in Table 2;
• inflation-linked liabilities are assumed to perform in line with the portfolio of assets in Table 3; and
• assets are assumed to be invested in line with Table 4.
We propose several ways of displaying the data. The first is a scatter plot, as shown in Figure 8. This is a simplified chart considering only market risk for the assets, and nominal and real interest rate risks for the liabilities. On the vertical axis, the level of association is given – we suggest (for returns) the coefficient of finite tail dependence with k = 0.1. The horizontal axis gives the “importance” of the pair considered. This is measured as the sum of the natural logarithms of the asset weights or asset amounts (the difference being only one of scale). This is equivalent to the logarithm of the product of the weights or amounts. The use of both weights means that only relationships between two important risks are regarded as important; taking logarithms ensures that the chart is legible.
In this plot, three shades are used:
• the black dots represent the level and importance of risk between two assets, at least one of which is a return-producing asset (that is, not a matching asset) or between two liabilities;
• the white dots represent the level and importance of risk between an asset and a liability; and
• the grey dots represent the level and importance of risk between two assets both of which are matching assets.
The black dots in the top right corner of such a plot are the most important – these are the ones for which the level of tail risk is highest and also for which the importance of the risk is greatest. Grey and white dots in this area should also be considered, however, since they signify a significant reliance on matching.
The second approach, shown in Figure 9, is a bar chart. The vertical axis is the same as for Figure 8, but the horizontal axis gives the pairs in descending order of finite tail dependence – the number of the pair is shown, which corresponds to the list of pairs given alongside the chart. The level of importance – defined as the sum of the log asset (or liability) weights – is overlaid as a line chart, marked on the second axis. The bars are colour coded in the same way as the points on the scatter chart.
The chart shows the top ten risks by coefficient of tail dependence with all combinations included. However, it is also possible to show a subset of risks, for example all those where one or both items are return-producing assets. It is also possible to show only the most important risks by level of importance instead.
Both of these charts can be used to display risks calculated in higher dimensions – in other words, with the results for groups of three, four or even more risks. However, the colour scheme requires modification. The simplest approach is to consider only the relationships between groups of return-producing assets (that is, not matching assets) or between groups of liabilities (the black series). This covers all the risk combinations that actually matter. A high level of tail association between return-producing assets implies a concentration of risk, as does the same between groups of liabilities. On the other hand, if anything has a high level of tail association with a matching asset, this is a source of comfort rather than concern.
It is also worth recognising that if the number of risks is large, the number of combinations could grow to be unwieldy as the number of risks considered together increases, since the number of groups is given by a binomial coefficient. For example, with 10 risks there are 10!/(2!8!) = 45 pairs of risks to consider, but 10!/(3!7!) = 120 groups of three risks and 10!/(4!6!) = 210 groups of four risks. For more combinations from more risks, the range of scenarios grows considerably.
Another approach is to use two axes of a scatter plot to denote the exposure to each of a pair of risks, whilst using the size, shape and/or colour of the point to convey information about the coefficient of finite tail dependence between the two risks. For example, the size of the point – as measured by its diameter – can be used to represent the value of the coefficient, with a large, white circle representing a coefficient of one and a black dot representing a coefficient of zero, as shown in Figure 10. In other words, the coefficient of tail dependence is reflected in both the diameter of the circles (with a larger circle implying a higher coefficient) and the colour (with a lighter colour implying a higher coefficient, as shown in the scale at the side of the diagram). This means that two aspects of the points – the colour and the size – are being used to convey the same information. The reason for this is to make it easier to distinguish between pairs of risks where more than one pair reflects the same exposure to risks. If only the colour were used, then the point for one pair of risks could completely obscure another; if only the size were used, then having all points the same colour would make it difficult to distinguish between points. For this particular “balloon plot” or “bubble plot”, exposures to assets are shown as positive and exposures to liabilities as negative on the two axes. For example, point 3 on the chart shows that the coefficient of tail dependence between Europe ex UK and UK equities is relatively low, since the point has a smaller diameter and is darker in colour. The amount allocated to one asset is 30% of the portfolio, and to the other is 10%. In contrast, the coefficient of tail dependence between real and nominal liabilities, shown by point 6 – the exposure to each being 50% – is higher, as evidenced by the larger diameter and lighter colour. The risk combinations of interest should therefore be large bubbles in either the top right or bottom left of the chart.
One potential problem with this approach is that, even using both colour and size to display the strength of the relationship, the points for two pairs of risks with similar exposures may overlap. One way of dealing with this is instead to group risks into ranges and use a sunflower plot, with each “petal” representing a pair of risks and the colour of the petal denoting the range in which the coefficient of finite tail dependence lies. Because the petals do not overlap, there is no need to use petal length in addition to colour to denote the strength of the relationship between two variables. Such a plot is shown in Figure 11.
Of course, if colours are used to help display the level of association, they cannot also easily be used to convey information about the extent to which an asset class is held to match a particular risk.
7.2. Extreme Loss
Approaches similar to those used for tail association can also be used to highlight the risks of extreme loss. When the range of risks currently faced is being considered, the probability and economic cost of ruin implicitly allow for the level of exposure to that risk. This means that the bar chart in Figure 9 can be used without the need for a line describing separately the size of the position.
When considering possible risk combinations, Figure 12 shows how pairs of risks could be described; a surface can be used to show the combination of three risks, as shown in Figure 13, but it is difficult to see how higher dimensions could be visualised. However, the main use of the possible risk combinations is in the context of efficient frontiers. The traditional efficient frontier used in mean-variance analysis links mean-variance efficient portfolios. These are portfolios for which no higher expected return is available for a given volatility (or, conversely, no lower volatility is available for a given level of expected return). However, whilst the traditional frontier uses volatility as its measure of risk, there is no reason that some other measure could not be used. This means that a statistic such as the probability of ruin or the economic cost of ruin could serve as the risk measure instead.
8. Conclusion
This paper has two broad themes: calculation and communication. Within the theme of calculation, there are two aspects of extreme events that can be usefully examined.
The first is the extent to which extreme events in two or more variables occur together. This can be gauged by using measures of tail association. Higher levels of tail association are useful for highlighting the extent to which there are concentrations of risk.
The most popular measure of tail association is the coefficient of tail dependence, which is measured at the limit. However, for some copulas – notably the Gaussian copula – the coefficient of tail dependence is zero for all linear correlation coefficients less than one. Conditional correlations are also used. These look at the correlation between variables over only a small range of values, usually in the tail. However, whilst these give the direction of the association, they do not reflect the importance of the observations in the tail in the context of the copula as a whole. We therefore propose another measure, the coefficient of finite tail dependence. This is similar to the coefficient of tail dependence, but is evaluated at a finite level rather than at the limit.
The second aspect of extreme events that is of interest is the extent to which combined losses from a series of risks result in losses beyond a certain point. This can be measured using ruin lines or, in higher dimensions, planes and hyperplanes. Two measures that are of interest here are the probability of ruin and the economic cost of ruin. Both can be calculated easily for multivariate Gaussian distributions, and the probability of ruin can be calculated relatively easily for the multivariate t distribution. As part of the discussion of the risk of extreme loss it is important to consider the way in which loss is defined, in particular whether it is in relation to the current portfolio of risks or some proposed change.
In relation to the second theme of communication, there are a number of ways of displaying the information on extreme events. The most important aspects of the events to communicate are the likelihood of extreme loss and the importance of that loss in terms of the value of the risks by some measure. This can result in difficulties given the volume of information that must be communicated. The issue is further complicated by the fact that some assets match liabilities – meaning that links between them are of less concern – whilst others do not. A range of approaches are shown that seek to deal with these issues.
Acknowledgement
The authors gratefully acknowledge the funding for this research from the Institute and Faculty of Actuaries.
Appendix 1 – Types of Copulas
Explicit Copulas
There are a number of ways of classifying different types of copula. For parametric copulas, a common distinction is made between explicit and implicit copulas. Explicit copulas take as their starting point u and v. The Archimedean family of copulas is a good example of this type. They use a generator function to transform each univariate cumulative probability, which by definition will be between zero and one, into a number between zero and infinity. The “generated” numbers for each variable are summed and the result is passed through the inverse of the generator function (or pseudo-inverse if certain conditions are not met), which transforms any number from zero to infinity into a number between zero and one. Thus the univariate cumulative probabilities are combined into a joint cumulative probability.
Consider, for example, the Gumbel copula. This has a generator function, ψ, equal to $$$(-\ln u)^\alpha$$$, where α is a parameter that characterises the strength of the association. If u is 0.05, v is 0.01 and α is 2, then the joint distribution function using a Gumbel copula is:
$$$C(u,v) = \exp\left(-\left[(-\ln u)^\alpha + (-\ln v)^\alpha\right]^{1/\alpha}\right) = \exp\left(-\sqrt{(\ln 0.05)^2 + (\ln 0.01)^2}\right) \approx 0.0041$$$
As mentioned earlier, this can be extended to more than two variables.
The choice of Archimedean copula determines the shape of the relationship between variables. However, the values of the small number of parameters – often only one – used in each Archimedean copula determine the strength of the relationship between the variables. As such, these are often closely linked to one or more measures of correlation – for example, Kendall's rank correlation coefficient, τ (discussed below), is equal to 1−(1/α) for the Gumbel copula. This means that for the above example, τ = 0.5.
Fundamental Copulas
An important subset of explicit copulas is the group of fundamental copulas. These describe three fundamental relationships between variables:
• independence, with the independence copula where C(u, v) = uv;
• identicality, with the minimum or co-monotonicity copula where C(u, v) = min(u, v); and
• mutual exclusivity, with the maximum or counter-monotonicity copula, where C(u, v) = max(u + v − 1, 0).
The final two copulas are particularly interesting as they describe the upper and lower limits for all copulas, known as the Fréchet–Hoeffding bounds. As such, other copulas often tend to these limits as their parameters take extreme values.
Implicit Copulas
Implicit copulas are copulas derived from multivariate distributions such as the multivariate normal (Gaussian) or t-distributions. In these cases, the distribution functions u and v are used to calculate the values that would have been seen had the marginal distribution been, say, a standard univariate normal distribution. For example, if u is 0.05, v is 0.01 and ρ is 0.75, then the joint distribution function using a Gaussian copula is:
$$$C(u,v) = \Phi_{0.75}\left(\Phi^{-1}(0.05), \Phi^{-1}(0.01)\right)$$$
where $$$\Phi_{0.75}(\ldots)$$$ is the joint distribution function for a pair of standard normal variables with a correlation of 0.75. These copulas can also be extended to higher dimensions. In the case of the Gaussian copula, above, this means that the relationship between the variables is defined by a correlation matrix.
Appendix 2 – Extreme Value Theory
The Generalised Extreme Value Distribution
There are two main approaches used in extreme value theory. The first uses the Generalised Extreme Value (GEV) distribution, which describes the distribution of the largest observations in blocks of observations, or the distribution of the number of observations in each block exceeding some value. Its distribution function is given by:
$$$H_\xi(x) = \exp\left(-(1 + \xi x)^{-1/\xi}\right)$$$
with $$$H_0(x) = \exp\left(-e^{-x}\right)$$$ in the limiting case ξ = 0,
where 1 + ξx > 0 and ξ is a parameter determining the family to which the limit belongs. In particular, if ξ > 0 the result is a Fréchet-type distribution; if ξ = 0, the result is a Gumbel-type distribution; and if ξ < 0, the result is a Weibull-type distribution. The Fréchet-type distribution has fat tails, like the t-distribution; the tails of the Gumbel-type distribution are exponential, like the normal distribution; and the Weibull-type distribution has tails so narrow that the distribution has a finite right endpoint. This approach – fitting the largest observations from blocks of data – is known as the block maxima approach.
The Generalised Pareto Distribution
A second approach is the threshold exceedances approach, which attempts to fit all observations in the tail of a distribution to a single model, the Generalised Pareto Distribution (GPD). This is used as a conditional distribution – describing losses given that they exceed some threshold – and has the following distribution function:
$$$G_{\xi,\beta}(x) = 1 - \left(1 + \frac{\xi x}{\beta}\right)^{-1/\xi}$$$
with $$$G_{0,\beta}(x) = 1 - e^{-x/\beta}$$$ in the limiting case ξ = 0, where β > 0 is a scale parameter.
Appendix 4 – R Code for the Calculation of Coefficients of Finite Tail Dependence
The code below generates 10,000,000 random observations from a t-copula with v degrees of freedom.
Suppose we have a copula with 4 variables, W, X, Y and Z. This is denoted in the code below by dim = 4. Each pair of variables (6 pairs in the case of 4 variables) has a correlation coefficient of r(W,X), r(W,Y), …, r(Y,Z). Then the R vector param describes the relationship between the marginals as: param <- c(r(W,X), r(W,Y), r(W,Z), r(X,Y), r(X,Z), r(Y,Z)). The function c simply combines the individual elements into a vector. The argument dispstr = "un" means simply that the correlation coefficients are presented as an “unstructured” list rather than as a correlation matrix.
In the code below we have used a single value of r in order to limit the number of tables produced; however, separate values can instead be used.
library(copula)   # package providing tCopula, normalCopula and rCopula
r <- 0.9
v <- 50
param <- c(r, r, r, r, r, r)
t.Cop <- tCopula(param, dim = 4, dispstr = "un", df = v, df.fixed = FALSE)
x <- rCopula(10000000, t.Cop)   # rCopula(n, copula) in current versions of the package
Exactly the same principles are used to generate 10,000,000 random observations from a Gaussian copula with 4 variables. No degrees of freedom parameter is needed here.
r <- 0.8
param <- c(r, r, r, r, r, r)
NormalCopula <- normalCopula(param, dim = 4, dispstr = "un")
x <- rCopula(10000000, NormalCopula)
In each case, the matrix x contains 10,000,000 observations, each of which is in the form of four co-ordinates. As such, each observation can be described as a co-ordinate in a four-dimensional hypercube. This information is then used to calculate the proportion of observations that are found in the extreme corner of the hypercube between the origin and the co-ordinates (u,u,u,u). This proportion can be defined as C(u,u,u,u). If this is divided by u – the maximum value that C(u,u,u,u) could take – then the result is the coefficient of finite tail dependence.
u <- 0.005
# Proportion of observations in the joint lower corner: the empirical C(u,u,u,u)
p3k <- mean(rowSums(x < u) == ncol(x))
# Dividing by u, its maximum possible value, gives the coefficient of finite tail dependence
p3 <- p3k / u
p3