Published online by Cambridge University Press: 01 October 2004
We consider models for financial data based on Lévy processes, including hyperbolic, normal inverse Gaussian, and Carr, Geman, Madan, and Yor (CGMY) processes. They are given by their Lévy triplet (μ(θ), σ², e^{θx}g(x)ν(dx)), where μ denotes the drift, σ² the diffusion coefficient, and e^{θx}g(x)ν(dx) the Lévy measure, and the unknown parameter θ models the skewness of the process. We provide local asymptotic normality results and construct efficient estimators for the skewness parameter θ, taking into account different discrete sampling schemes.

I thank Prof. Dr. L. Rüschendorf for his steady encouragement, the referees for helpful comments, and the German National Scholarship Foundation for financial support.
Lévy processes, i.e., processes with stationary and independent increments, have become popular for modeling financial data during the last decade. However, the earliest attempt to model stock price behavior by a Lévy process, namely Brownian motion, goes back to Bachelier (1900) in his Ph.D. thesis. More recently the focus has shifted to Lévy processes with jumps. Hyperbolic Lévy motions (cf. Eberlein and Keller, 1995; Keller, 1997), generalized hyperbolic Lévy motions (cf. Prause, 1999; Raible, 2000), normal inverse Gaussian processes (cf. Barndorff-Nielsen, 1998; Rydberg, 1997), stable processes (cf. Rachev and Mittnik, 2000), variance gamma processes (cf. Madan and Seneta, 1990), and CGMY processes, also called truncated Lévy flights (cf. Carr, Geman, Madan, and Yor, 2002), yield good models for log-return processes of prices and exchange rates. These are models of the form log St = Xt, where St is the price process. Models based on these processes are less restrictive than the traditional ones; they allow jumps, include both finite and infinite activity, and cover both bounded and unbounded variation. Furthermore, the empirical facts of excess kurtosis, skewness, and fat tails can be modeled more realistically.
One parameter that is especially important for modeling is the skewness parameter. The skewness of a distribution is modeled by multiplying with an exponential term e^{θx}. For θ > 0 the resulting distribution is right/positively skewed, depending on the size of θ; i.e., the bigger θ is, the more weight is put on larger x. This parameter is important in finance because, according to the empirical data, the distribution of the log-return prices is mostly skewed (cf. Prause, 1999). Carr et al. (2002) perform a detailed analysis of skewness, finding that statistical data (i.e., the time series of stock returns) are significantly skewed either right or left, whereas risk-neutral data (i.e., data derived from option prices) are consistently left skewed. The skewness parameter is of course not the same as the financial term skewness, the appropriately normed third centered moment of a distribution. It measures the same effect, namely, the deviation from a symmetric distribution, but especially for fitting data we need the skewness parameter itself.
The main problem in estimating parameters of a Lévy process Xt is that in general the process is only specified through the Lévy–Khinchin formula, or in other words its characteristic function,

E[exp(iuXt)] = exp( t ( iuμ − u²σ²/2 + ∫ (e^{iux} − 1 − iu h(x)) g(x,θ)ν(dx) ) ),

where μ denotes the drift, σ² the diffusion coefficient, and g the density w.r.t. ν of the Lévy measure, satisfying ∫(1 ∧ x²)g(x,θ)ν(dx) < ∞, and h(x) is some truncation function, which behaves like x in a neighborhood of zero and ensures integrability in the characteristic function. Common examples are h(x) = x 1_{|x|≤1}(x) and h(x) = x/(x² + 1). However, the density of the process itself is unknown and cannot be calculated explicitly, as is the case for most stable, hyperbolic, generalized hyperbolic, and CGMY processes. Hence we have to find conditions on the Lévy triplet (μ(θ), σ²(θ), g(x,θ)ν(dx)) that allow us to construct efficient estimators explicitly.
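As a concrete illustration of this representation, the following sketch (our own example; all parameter values are arbitrary) numerically integrates the characteristic exponent of a gamma process, whose Lévy density αe^{−βx}/x is of finite variation so that the truncation term can be dropped (h = 0), and compares it with the known closed form −α log(1 − iu/β):

```python
import cmath
import math

def gamma_levy_exponent(u, alpha, beta, n_grid=200_000, x_max=60.0):
    """Midpoint-rule approximation of the Levy-Khinchin exponent
    psi(u) = int_0^inf (e^{iux} - 1) * alpha * e^{-beta*x} / x dx
    for the gamma process (finite variation, truncation h = 0)."""
    dx = x_max / n_grid
    total = 0j
    for k in range(n_grid):
        x = (k + 0.5) * dx
        total += (cmath.exp(1j * u * x) - 1.0) * alpha * math.exp(-beta * x) / x
    return total * dx

def gamma_exponent_closed_form(u, alpha, beta):
    """Same exponent in closed form: the log-characteristic function of the
    gamma process at t = 1 is -alpha * log(1 - iu/beta) (a Frullani integral)."""
    return -alpha * cmath.log(1.0 - 1j * u / beta)

alpha, beta, u = 2.0, 3.0, 1.5
psi_num = gamma_levy_exponent(u, alpha, beta)
psi_cf = gamma_exponent_closed_form(u, alpha, beta)
err = abs(psi_num - psi_cf)
```

The quadrature and the closed form agree to roughly four decimal places with the grid above, which is all the example is meant to show.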
We look at the special case where our unknown parameter is the skewness, g(x,θ) = eθxg(x), and focus on the concept of asymptotic statistics. We show that under very mild regularity conditions we obtain local asymptotic normality for the skewness parameter, giving us the maximal rate of convergence of a sequence of regular estimators and the minimal asymptotic variance, which turns out to be fully explicit, only involving the quantities of the Lévy triplet. Furthermore, we can then construct efficient estimators and show the relation to martingale estimating functions.
For all financial applications (e.g., fitting of models and also pricing derivatives and quantifying risk) it is important to have good estimators of the underlying parameters when the data are given at discrete time points, XΔ,X2Δ,…,XnΔ, because continuous data are hardly available in practice or not economical to observe.
We face two different sampling schemes: either we let the distance between the observations Δ be fixed and the number of observations n tend to infinity, or we let nΔ → ∞ as Δ → 0 and n → ∞. The first sampling scheme seems to be of more practical interest, because the distance between the observations can be large. The second one is an approximation to the continuously observed model. The third possible discrete sampling scheme, where nΔ = const. < ∞, n → ∞, and Δ → 0, which would be the classical framework of high-frequency data, is not possible in our setting. Heuristically, this can easily be seen by looking at the continuously observed model. In the continuously observed model we obtain local asymptotic normality for the skewness parameter (cf. Akritas and Johnson, 1981) when the observation time T tends to infinity. Hence in the discretely observed model we also need nΔ → ∞, because T may be identified with nΔ. This has some important implications. We cannot infer the skewness parameter from high-frequency data over a fixed period of time. On the other hand, it has the advantage that it makes sense to infer the skewness parameter even in the presence of other unknown parameters, such as diffusion or scale, because those possess a faster rate of convergence.
The outline of the paper is the following. In Section 2 we will review the concepts of local asymptotic normality and measure changes in Lévy processes and the result for continuously observed models. In Section 3 we will prove local asymptotic normality, and in Section 4 we construct efficient estimators and view them in the context of martingale estimating functions. Section 5 concludes.
The theory of local asymptotic statistics and the related efficiency results were established by Le Cam (1960) and Hájek (1972) and extended by Jeganathan (1981, 1983) and others.
This concept provides answers to important questions in estimation theory, e.g., how to characterize optimal estimators. Having local asymptotic normality we can specify the maximal rate of convergence of a sequence of estimators and the minimal asymptotic variance. Furthermore, it not only allows us to decide if a given sequence of estimators is efficient but also allows us to construct efficient estimators from suboptimal estimators by a one-step improvement, described, e.g., in Le Cam and Yang (1990).
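To make the one-step improvement concrete, here is a minimal sketch (our own illustration; the gamma model, parameter values, and the crude preliminary estimator are all assumptions for the example). For the exponential family obtained by tilting, the per-increment score is x − Eθ[XΔ] and the Fisher information is Varθ(XΔ), so a single Newton step from any √n-consistent starting point already yields an efficient estimator:

```python
import random
import statistics

random.seed(42)

# Increments of a skewed gamma Levy process observed at spacing DELTA:
# X_{i*DELTA} - X_{(i-1)*DELTA} ~ Gamma(shape=ALPHA*DELTA, rate=BETA - THETA_TRUE).
ALPHA, BETA, DELTA, THETA_TRUE, N = 2.0, 3.0, 1.0, 0.8, 4000
incs = [random.gammavariate(ALPHA * DELTA, 1.0 / (BETA - THETA_TRUE))
        for _ in range(N)]

def score(theta, xs):
    """Sum of per-increment scores x - E_theta[X_Delta] of the tilted family."""
    mean_theta = ALPHA * DELTA / (BETA - theta)
    return sum(x - mean_theta for x in xs)

def fisher(theta, n):
    """Fisher information of n increments: n * Var_theta(X_Delta)."""
    return n * ALPHA * DELTA / (BETA - theta) ** 2

# Crude but consistent preliminary estimator, built from half of the sample only.
theta_prelim = BETA - ALPHA * DELTA / statistics.mean(incs[: N // 2])

# One-step (Newton) improvement using the full sample.
theta_onestep = theta_prelim + score(theta_prelim, incs) / fisher(theta_prelim, N)
```

The one-step estimator uses the full sample and the information matrix at the preliminary value, which is exactly the construction referred to above.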
Let us recall the definition of local asymptotic normality (LAN). Let pn(X0,…,Xn;θ) be the joint density of (X0,…,Xn) under θ ∈ Θ and ln(θ + δn h, θ) the log-likelihood ratio around θ, i.e.,

ln(θ + δn h, θ) = log ( pn(X0,…,Xn; θ + δn h) / pn(X0,…,Xn; θ) ),

where δn > 0 and h denotes the local parameter.
DEFINITION (LAN). Let Θ be an open subset of ℝ^k and (En), En = (Xn, An, {Pθn : θ ∈ Θ}), a sequence of experiments. For fixed θ ∈ Θ, (En) is called LAN in θ, if

(i) There exist δn = δn(θ) ↓ 0, random vectors Zn, and matrices Γn, symmetric and strictly positive definite, such that for all h ∈ ℝ^k

ln(θ + δn h, θ) = h^T Zn − (1/2) h^T Γn h + o_{Pθ}(1).

(ii) There exists a finite, positive semidefinite, nonrandom matrix Γ, such that

Zn → N(0,Γ) in distribution and Γn → Γ in probability

as n → ∞.

The LAN property implies that a sequence of regular estimators cannot converge to the true parameter value θ0 at a rate faster than δn^{−1} and that the asymptotic variance of a δn^{−1}-consistent estimator is bounded from below by Γ^{−1}.
The LAN property is often established by proving L²-differentiability (differentiability in quadratic mean), which implies the appropriate expansion of the log-likelihood function involving only first derivatives. However, in our case of the skewness parameter it turns out to be easier to take a different route and look at the Taylor expansion of the log-likelihood function up to second order, because second derivatives exist. For the sampling scheme with fixed distance of observations Δ we are in the framework of independent and identically distributed (i.i.d.) random variables, which is well studied (cf. Witting, 1985; Janssen, 1992; van der Vaart, 1998). For the other sampling scheme we are in the framework of triangular arrays, because the densities depend on n through Δ = Δn, and we have to establish an appropriate central limit theorem (CLT) and law of large numbers (LLN) for the first and second terms of the Taylor expansion of the log-likelihood function around the true parameter θ0. For more details see the proof of Theorem 2 in Section 3.
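The quadratic structure that LAN asserts can be seen numerically. The following sketch (our own gamma-process example; all values are illustrative) compares the exact log-likelihood ratio ln(θ0 + δn h, θ0) for n i.i.d. increments with the LAN approximation h·Zn − h²Γ(θ0)/2, where Zn is the normalized score and Γ(θ0) the per-increment Fisher information:

```python
import math
import random

random.seed(1)

ALPHA, BETA, THETA0, DELTA, N = 2.0, 3.0, 0.8, 1.0, 4000
A = ALPHA * DELTA        # shape of one increment
B = BETA - THETA0        # rate of one increment under theta0
incs = [random.gammavariate(A, 1.0 / B) for _ in range(N)]

def log_lik(theta, xs):
    """Exact log-likelihood of gamma increments with rate BETA - theta."""
    b = BETA - theta
    return sum(A * math.log(b) - math.lgamma(A)
               + (A - 1.0) * math.log(x) - b * x for x in xs)

h = 1.0
delta_n = 1.0 / math.sqrt(N)   # local rate for fixed DELTA
llr_exact = log_lik(THETA0 + delta_n * h, incs) - log_lik(THETA0, incs)

Z_n = delta_n * sum(x - A / B for x in incs)   # normalized score at theta0
GAMMA = A / B ** 2                             # Fisher information per increment
llr_lan = h * Z_n - 0.5 * h ** 2 * GAMMA
gap = abs(llr_exact - llr_lan)
```

For this exponential family the remainder of the expansion is deterministic and of order n^{−1/2}, so the gap is already below 10^{−2} at n = 4000.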
Our aim is to prove LAN for discretely observed Lévy processes. For continuously observed Lévy processes, however, some results are known. Akritas and Johnson (1981) consider general purely discontinuous Lévy processes. We will state their result for the special case of the skewness parameter. We recall this result because the continuously observed model constitutes a natural benchmark for the model with discrete observations. In particular, the sampling scheme with nΔ → ∞ as Δ → 0 and n → ∞ compares well with the continuous model, as nΔ corresponds to t.
However, the continuous model only provides an optimality bound when the underlying measures are absolutely continuous. This is quite a strong restriction on the possible variation of the parameters, as we can see in the following theorem. Skorokhod (1957) derived this result first; for a detailed account see, e.g., Shiryaev (1999).
THEOREM 1. Let Xt be a Lévy process with triplet (μ, σ², g(x)ν(dx)) under some probability measure P. Then the following two conditions are equivalent.

(1) There is a probability measure Q, locally equivalent to P, such that Xt is a Q-Lévy process with triplet (μ̃, σ̃², g̃(x)ν(dx)).

(2) All of the following four conditions hold.

• g̃(x)ν(dx) = k(x)g(x)ν(dx) for some Borel function k: ℝ → (0,∞).

• μ̃ = μ + ∫ h(x)(k(x) − 1)g(x)ν(dx) + σβ for some β ∈ ℝ.

• σ̃ = σ.

• ∫ (1 − √(k(x)))² g(x)ν(dx) < ∞.
This theorem implies that we cannot have the LAN property in the continuous model when we aim to estimate the diffusion coefficient, or a scale factor in front of an infinite Lévy measure, because the underlying measures cannot be absolutely continuous. However, for the skewness parameter, i.e., k(x) = exp{θx}, we can have an absolutely continuous change of measures and LAN for the continuously observed model with

δT = T^{−1/2} and Γ(θ) = ∫ x² e^{θx} g(x)ν(dx),

where T is the observed time that tends to infinity and σ = 0 (cf. Akritas and Johnson, 1981).
Let us now assume that we are given a Lévy process with a skewed Lévy measure e^{θx}g(x)ν(dx) and that we aim to estimate the skewness parameter θ. First of all we are interested in how the multiplicative term in the Lévy measure changes the distribution of the underlying Lévy process. We obtain the following lemma. Though we shall only look at Lévy processes with unbounded support, the same calculations hold for processes with bounded support, especially for subordinators.
LEMMA 1. Denote by pt(x) the density w.r.t. m of the Lévy process given by the triplet (μ, σ², g(x)ν(dx)) and by pt(x,θ) the density corresponding to the process with the skewed Lévy measure e^{θx}g(x)ν(dx). Furthermore, assume that for all θ ∈ U(θ0) ⊂ Θ, where U(θ0) is a neighborhood of the true parameter θ0,

∫_{|x|>1} e^{θx} g(x)ν(dx) < ∞.    (1)

Then we obtain

pt(x,θ) = e^{θx} pt(x) / ∫ e^{θy} pt(y) m(dy),    (2)

and the corresponding drift is μ(θ) = μ + ∫ h(x)(e^{θx} − 1)g(x)ν(dx).
Remark. (1) For processes with finite variation, i.e., ∫(|x| ∧ 1)g(x)ν(dx) < ∞, e.g., all compound Poisson processes and subordinators, we may take h(x) = 0 and obtain μ(θ) = μ.
(2) We consider densities pt w.r.t. some measure m. However, the case of most practical interest, except for compound Poisson processes, is when m equals the Lebesgue measure, as in our examples. Even though the density pt might not be known explicitly, conditions on the Lévy measure may ensure its existence; e.g., as was shown in Tucker (1962), infiniteness of the Lévy measure together with a Lebesgue density of the Lévy measure already ensures the existence of a Lebesgue density pt. However, the relation between the existence of a density pt, the existence of a density of the corresponding Lévy measure, and its mass near zero is much more complex. A detailed outline is given in Sato (1999).
Proof. Under assumption (1) the denominator in (2) is well defined. The characteristic function corresponding to pt is

φt(u) = exp( t ( iuμ − u²σ²/2 + ∫ (e^{iux} − 1 − iu h(x)) g(x)ν(dx) ) ).

Hence we can calculate the characteristic function of the skewed distribution,

∫ e^{iux} pt(x,θ) m(dx) = φt(u − iθ) / φt(−iθ),

which yields the desired result. █
Of course this result is not new. The principle of multiplying with an exponential factor is well established in different branches of stochastics but goes under different names. In statistics the family of distributions or processes derived by varying the skewness parameter is called a natural exponential family (cf. Janssen, 1992; Küchler and Sørensen, 1997). In finance the distribution obtained by adding the skewness is called the Esscher transform (cf. Shiryaev, 1999). This concept also yields an easy way of explicitly calculating equivalent martingale measures.
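The tilting identity behind the Esscher transform can be verified directly in a special case where everything is explicit. As a minimal sketch (the gamma distribution is our choice for illustration), e^{θx}p(x)/E[e^{θX}] coincides with a gamma density whose rate is shifted from β to β − θ:

```python
import math

def gamma_pdf(x, shape, rate):
    """Density of the Gamma(shape, rate) distribution at x > 0."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)

shape, beta, theta = 1.5, 3.0, 0.7
mgf = (beta / (beta - theta)) ** shape   # E[e^{theta X}] for X ~ Gamma(shape, beta)

# Tilted density e^{theta x} p(x) / E[e^{theta X}] versus Gamma(shape, beta - theta).
max_err = max(
    abs(math.exp(theta * x) * gamma_pdf(x, shape, beta) / mgf
        - gamma_pdf(x, shape, beta - theta))
    for x in (0.1, 0.5, 1.0, 2.0, 5.0)
)
```

Here the identity is exact, so the discrepancy is only floating-point noise; for distributions without a closed-form tilt the same normalization defines the skewed family.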
Using (2) we can prove LAN for the skewness parameter. Though we start with a general Lévy process with unknown density, we obtain a fully explicit result for the minimal asymptotic variance, depending only on the quantities of the Lévy triplet, for both sampling schemes. Furthermore, our minimal asymptotic variance for the sampling scheme with Δ → 0 turns out to be the same as in the continuously observed model.
THEOREM 2. Assume condition (1) for θ ∈ U(θ0) ⊂ Θ, a neighborhood of θ0; then we obtain LAN for both sampling schemes with

δn = (nΔ)^{−1/2} and Γ(θ) = σ² + ∫ x² e^{θx} g(x)ν(dx).
Proof. Because we have i.i.d. increments as a result of the structure of Lévy processes, we can use the results for i.i.d. random variables. First, we have to calculate the derivatives. We may interchange integration and differentiation if θ is in the interior of Θ and obtain from (2)

∂/∂θ log pt(x,θ) = x − EθXt and ∂²/∂θ² log pt(x,θ) = −Varθ(Xt).
(i) For the sampling scheme with fixed distance of observations Δ, we are in the framework of i.i.d. random variables with densities; hence conditions under which LAN in θ0 holds are well known (see, e.g., Witting, 1985, p. 179; Le Cam and Yang, 1990, Ch. 6.3). The conditions are that the density is positive m-a.s. and continuously differentiable w.r.t. the unknown parameter θ in a neighborhood U(θ0). Then we obtain LAN with maximal rate of convergence

δn = n^{−1/2}

and

Γ(θ) = Eθ[(∂/∂θ log pΔ(XΔ,θ))²],

which is the inverse of the minimal asymptotic variance, provided Γ(θ) is finite for all θ ∈ U(θ0) and continuous in θ0. This result applies, as for θ ∈ U(θ0), pΔ(x,θ) is continuously differentiable in θ and pΔ(x,θ) > 0 [m]. We obtain LAN with

δn = (nΔ)^{−1/2}

and

Γ(θ) = σ² + ∫ x² e^{θx} g(x)ν(dx),

because Γ(θ) is obviously finite for θ ∈ U(θ0) and continuous in θ0. For the last equation we use the well-known moment representation by the derivatives of the characteristic function,

Varθ(XΔ) = Δ( σ² + ∫ x² e^{θx} g(x)ν(dx) ).
(ii) For the sampling scheme with Δ → 0 we need a few more conditions to ensure the LAN property; they result basically from the conditions needed for the CLT and LLN for triangular schemes (see Woerner, 2001; Gnedenko and Kolmogorov, 1968). The conditions are as follows. Assume that there exists a neighborhood U(θ0) of θ0 such that the density pt(x,θ) w.r.t. m of the Lévy process is twice continuously differentiable w.r.t. the unknown parameter θ and regular (i.e., differentiation w.r.t. θ and integration w.r.t. m may be interchanged) for all θ ∈ U(θ0). Denote by θn = θn(x) a measurable function from ℝ to U(θ0). Furthermore, assume that for all ε > 0, as n → ∞ and Δ → 0 such that nΔ → ∞, the corresponding Lindeberg condition on the first derivative and the LLN conditions on the second derivative of the log-likelihood hold. Then we obtain LAN with

δn = (nΔ)^{−1/2} and Γ(θ) = σ² + ∫ x² e^{θx} g(x)ν(dx)

for the sampling scheme with Δ → 0 as n → ∞.
Finding dominating functions to ensure these conditions is straightforward, because of the simple structure (θ enters only through the exponential term and Δ only as a polynomial factor), by applying the moment representation to the integrals w.r.t. pΔ(x,θ)m(dx). We look at the details of the first condition; the others can be checked analogously.
as n → ∞, Δ → 0, and nΔ → ∞. █
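The explicit form of Γ(θ) in Theorem 2 is easy to check numerically once a concrete Lévy density is fixed. This sketch (our own example; the gamma process has σ = 0 and g(x)ν(dx) = αx^{−1}e^{−βx}1_{x>0}dx) compares a simple quadrature of ∫x²e^{θx}g(x)ν(dx) with the closed form α/(β − θ)²:

```python
import math

def fisher_info_numeric(alpha, beta, theta, n_grid=100_000, x_max=50.0):
    """Midpoint-rule approximation of
    Gamma(theta) = int_0^inf x^2 * e^{theta*x} * alpha * e^{-beta*x} / x dx,
    whose integrand simplifies to alpha * x * e^{-(beta-theta)*x}."""
    dx = x_max / n_grid
    total = 0.0
    for k in range(n_grid):
        x = (k + 0.5) * dx
        total += alpha * x * math.exp(-(beta - theta) * x)
    return total * dx

alpha, beta, theta = 2.0, 3.0, 0.5
gamma_num = fisher_info_numeric(alpha, beta, theta)
gamma_cf = alpha / (beta - theta) ** 2   # closed form: alpha/(beta - theta)^2
```

The agreement confirms that the minimal asymptotic variance depends only on the quantities of the Lévy triplet, as the theorem states.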
Example 1 (Normal inverse Gaussian process). The normal inverse Gaussian process is characterized by the Lévy triplet

(μ, 0, (δα/π) e^{βx} K1(α|x|) |x|^{−1} dx),

where K1 denotes the modified Bessel function of the third kind with index 1, δ > 0, 0 ≤ |β| < α, and γ = √(α² − β²). Because as |x| → 0

(δα/π) e^{βx} K1(α|x|) |x|^{−1} ~ (δ/π) x^{−2},

the process is of unbounded variation and possesses infinitely many jumps. Hence it also has a density w.r.t. the Lebesgue measure,

pt(x) = (αδt/π) exp( δtγ + β(x − μt) ) K1( α√((δt)² + (x − μt)²) ) / √((δt)² + (x − μt)²).

As pointed out in Barndorff-Nielsen (1998), this process is used both for modeling turbulence, in particular when the Reynolds number is high, and in finance. This is due to some special properties of the normal inverse Gaussian process, such as possible asymmetry modeled by the skewness parameter β, unbounded variation, and semiheavy tails; namely, as |x| → ∞,

pt(x) ~ const |x|^{−3/2} exp( −α|x| + βx ).

The parameter β may be estimated according to Theorem 2. Here assumption (1) is satisfied. Hence for the sampling scheme with Δ fixed and n → ∞ we have

δn = (nΔ)^{−1/2}

and

Γ(β) = δα² (α² − β²)^{−3/2},

which is indeed the same result as using the density and the results for i.i.d. random variables.
Example 2 (Gamma process). The gamma process is characterized by the Lévy triplet

(0, 0, α x^{−1} e^{−βx} 1_{x>0} dx)

with α, β > 0, taking h(x) = 0. Hence the process is a subordinator and only possesses nonnegative jumps. The density w.r.t. the Lebesgue measure can be calculated, and we can see that the name reflects the fact that the increments are gamma distributed,

pt(x) = β^{αt} x^{αt−1} e^{−βx} / Γ(αt), x > 0.

Analogously to Example 1, β can be estimated, and we obtain for the sampling scheme with Δ fixed and n → ∞

δn = (nΔ)^{−1/2}

and

Γ(β) = α/β²,

which is again the same result as using the density and the i.i.d. results.
Example 3 (Hyperbolic Lévy motion). The hyperbolic Lévy motion, which was introduced by Barndorff-Nielsen (1977) for modeling mass-size distributions of aeolian sand deposits, has also been applied to other areas of interest, e.g., turbulence data (cf. Barndorff-Nielsen, 1996) and financial data (cf. Eberlein and Keller, 1995; Keller, 1997; Rydberg, 1997; Prause, 1999; Raible, 2000).
The hyperbolic Lévy motion may be characterized by a Lévy triplet whose Lévy density involves J1, the Bessel function of the first kind with index 1, and Y1, the Bessel function of the second kind with index 1. Furthermore, we have α, δ > 0 and 0 ≤ |β| < α. Keller (1997) established that the density of the Lévy measure behaves like x^{−2} at the origin; hence the process is of unbounded variation. Though the process possesses a density, we cannot calculate it analytically. Only the distribution for t = 1 can be written down explicitly; it is the hyperbolic distribution

p1(x) = ( √(α² − β²) / (2αδ K1(δ√(α² − β²))) ) exp( −α√(δ² + x²) + βx ),

where K1 denotes the modified Bessel function of the third kind with index 1. We can estimate β analogously to Example 1 and obtain, for the sampling scheme with Δ fixed and n → ∞,

δn = (nΔ)^{−1/2}

and

Γ(β) = ∫ x² e^{βx} g(x)ν(dx),

the variance of the hyperbolic distribution, whose behavior can be controlled using Keller (1997).
Example 4 (CGMY process). The CGMY process, named after Carr, Geman, Madan, and Yor, in the physics literature also called truncated stable process or truncated Lévy flight, is given by the Lévy triplet

(μ, 0, C e^{−G|x|} |x|^{−1−Y} 1_{x<0} dx + C e^{−Mx} x^{−1−Y} 1_{x>0} dx),

where C > 0, G ≥ 0, M ≥ 0, and Y < 2. This class of processes is a flexible model for index dynamics and also for the dynamics of individual stocks, because by varying the parameters it allows all features of finite and infinite activity, bounded and unbounded variation, and also skewness to be modeled directly by the characteristic function (cf. Carr et al., 2002). The variance gamma process (cf. Madan and Seneta, 1990) is a special case of the CGMY process. In general the density of the process is not known explicitly, but as in the previous examples we can infer the parameters M or G in the special one-sided case, when either G = ∞ or M = ∞, from the knowledge of the Lévy measure alone. When G = ∞ we obtain the triplet (0, 0, C e^{−Mx} x^{−1−Y} 1_{x>0} dx), and for the sampling scheme with Δ fixed and n → ∞

δn = (nΔ)^{−1/2}

and

Γ(M) = C Γ(2 − Y) M^{Y−2}.
With this explicit result for the minimal asymptotic variance we can now try to find a sequence of estimators that is efficient. Because the continuous model is a benchmark for the discretely observed model, we look at the continuous likelihood function to get an idea of what discrete estimating functions may look like. In the continuous model the likelihood function is given by

exp(θYt) / ∫ e^{θx} pt(x) m(dx),

where Yt denotes the process under the measure pt(x)m(dx). Changing to the process Xt under the measure e^{θx}pt(x)m(dx)/∫e^{θy}pt(y)m(dy) and discretizing the time, i.e., t = nΔ, leads to the log-likelihood function

l(θ) = θ XnΔ − log ∫ e^{θx} pnΔ(x) m(dx),

with the corresponding score equation

XnΔ − nΔ EθX1 = 0,    (3)

which indeed provides an appropriate estimating function.
We see that only the last observation matters for the estimation procedure; in other words, the last observation contains all necessary information. This is, however, not surprising, because from the theory of exponential families it is well known that Yt or XnΔ, respectively, is sufficient for {Ptθ, θ ∈ Θ}.
THEOREM 3 (Optimal estimators). Let θ̂n denote the solution of the estimating equation (3). Then θ̂n is a sequence of efficient estimators; i.e.,

(nΔ)^{1/2} (θ̂n − θ0) → N(0, Γ(θ0)^{−1})

in distribution, with Γ(θ0) = σ² + ∫ x² e^{θ0 x} g(x)ν(dx), (i) for the sampling scheme with Δ fixed as n → ∞ and (ii) for the sampling scheme with Δ → 0, n → ∞, and nΔ → ∞.
Proof.
(i) Using X0 = 0 and nΔEθX1 = EθXnΔ, we can rewrite (3) as

Σ_{i=1}^n (Yi − EθY1) = 0, where Yi = XiΔ − X(i−1)Δ.    (4)

This yields asymptotic normality by inserting

(nΔ)^{−1/2} Σ_{i=1}^n (Yi − Eθ0Y1) → N(0, Γ(θ0))

as n → ∞ by the CLT and

∂/∂θ EθY1 = Varθ(Y1) = ΔΓ(θ)

in the expansion

0 = (nΔ)^{−1/2} Σ_{i=1}^n (Yi − Eθ̂nY1) = (nΔ)^{−1/2} Σ_{i=1}^n (Yi − Eθ0Y1) − (nΔ)^{1/2} (θ̂n − θ0) Γ(θn*),

where θn* lies between θ0 and θ̂n.
(ii) This is analogous, using the CLT for triangular schemes as in Theorem 2 for the convergence in distribution and calculating the explicit form of Γ(θ) as in (i). █
With Theorem 3, equation (3) now provides an easily computable estimating function that indeed leads to efficient estimators. It is especially simple because it only involves the observations and the first moment. Depending on the form of the first moment, the equation can sometimes even be solved analytically.
For the gamma process equation (3) is

XnΔ − nΔ α/(β − θ) = 0.

Hence we obtain

θ̂n = β − nΔα/XnΔ

for the sequence of efficient estimators as n → ∞ for the sampling scheme with fixed Δ.
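For concreteness, the following sketch (our own simulation; seed and parameter values are arbitrary, and α and β are treated as known) generates a skewed gamma process at n discrete time points and solves equation (3) in its explicit form θ̂n = β − nΔα/XnΔ:

```python
import random

random.seed(7)

ALPHA, BETA, THETA_TRUE, DELTA, N = 2.0, 3.0, 0.8, 1.0, 5000

# X_{n*DELTA} of a skewed gamma process, simulated as a sum of n i.i.d.
# Gamma(ALPHA*DELTA, rate=BETA - THETA_TRUE) increments.
x_n_delta = sum(random.gammavariate(ALPHA * DELTA, 1.0 / (BETA - THETA_TRUE))
                for _ in range(N))

# Estimating equation (3): X_{n*DELTA} = n * DELTA * E_theta[X_1]
# with E_theta[X_1] = ALPHA / (BETA - theta), solved explicitly for theta.
theta_hat = BETA - ALPHA * N * DELTA / x_n_delta
```

With nΔ = 5000 the estimator lands within a few percent of the true value θ = 0.8; by Theorem 2 its asymptotic variance is Γ(θ)^{−1} = (β − θ)²/α.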
Equation (4) can also be viewed as the simplest form of a martingale estimating function, having the general form

Gn(θ) = Σ_{i=1}^n f(Yi, θ) with Eθ f(Y1, θ) = 0,

where Yi = XiΔ − X(i−1)Δ are i.i.d. distributed according to pΔ(·,θ). We can now show that our sequence of estimators is optimal not only in the sense of local asymptotic statistics but also in the sense of Godambe and Heyde (1987), which is the classical optimality concept for martingale estimating functions.
Let us first give a short review on the concept of martingale estimating functions and the definition of optimality by Godambe and Heyde (1987).
The basic problem is that we want to draw inference for discretely observed stochastic processes when the likelihood function is unknown. Because in general the maximum likelihood estimator performs quite well, the idea is to approximate the unknown score function to obtain an approximate maximum likelihood estimator. When using an approximation, the problem may occur that the estimating function does not have mean zero and hence eventually yields biased estimates, especially when the distance of observations Δ is bounded away from zero. A solution is to approximate the score function by a zero-mean martingale w.r.t. the filtration generated by the observations. This implies that we obtain consistent and asymptotically normal estimators.
The optimality concept of Godambe and Heyde (1987) and Heyde (1988) formalizes the heuristics that the optimal element in a given class of martingale estimating functions is the one whose L²-distance to the true score function is minimal, or whose distance to the score function is minimal in the partial order of nonnegative definite matrices.
DEFINITION (OA-optimality). Let Hn be a class of zero-mean, square-integrable martingale estimating functions Gn = Gn(θ). Denote by Ḡn the compensator of the derivative process Ġn = ∂Gn/∂θ and by ⟨G,G^T⟩ the quadratic characteristic of Gn. Then Gn* ∈ Hn is called OA-optimal in Hn if and only if

(Ḡn*)^T ⟨G*,(G*)^T⟩^{−1} Ḡn* − Ḡn^T ⟨G,G^T⟩^{−1} Ḡn

is a.s. nonnegative definite for all Gn ∈ Hn, θ ∈ Θ, and n ∈ ℕ.
As a result of the special structure of i.i.d. increments, all considerations simplify greatly for our model compared to general stochastic processes. By straightforward calculations we obtain f*(x) = x for the optimal G*. Hence we also have optimality in the sense of Godambe and Heyde (1987) for the estimators in Theorem 3.
We derived local asymptotic normality for the skewness parameter of Lévy processes that are observed at discrete time points only. This provides an efficiency criterion in terms of the maximal rate of convergence and the minimal asymptotic variance for a sequence of estimators. Furthermore, we obtained easily computable estimating functions that lead to efficient estimators both in the sense of asymptotic statistics and in the sense of Godambe and Heyde (1987) for martingale estimating functions. Hence our results enable us to optimally infer the skewness parameter from discrete observations for the popular Lévy process models, such as generalized hyperbolic, normal inverse Gaussian, and CGMY, even when other unknown parameters are involved.