1. INTRODUCTION
With the increasing popularity of generalized autoregressive conditional heteroskedasticity (GARCH) modeling, there is also increased interest in general, even nonparametric, models and in moving away from the particular specification of the classical GARCH models, as introduced by Engle (1982) and Bollerslev (1986). In this paper, we assume that (εt) belongs to the general class of GARCH(1,1) processes, defined by
$$\varepsilon_t = h^{-1}(h_t)\,\eta_t, \qquad h_t = \omega(\eta_{t-1}) + a(\eta_{t-1})\,h_{t-1}, \tag{1}$$
where the sequence (ηt) is independent and identically distributed (i.i.d.). For statistical purposes the assumption that ηt has zero mean and unit variance is often required, but we do not need this assumption in this paper. The following assumptions are made on the functions ω, a, and h.
(i) $\omega$ is such that its restrictions to the positive and negative real half-lines are either constant and strictly positive, or continuous and, respectively, strictly increasing and strictly decreasing;
(ii) $a$ is such that its restrictions to the positive and negative real half-lines are continuous and, respectively, strictly increasing and strictly decreasing;
(iii) $h$ is one-to-one, onto, and increasing.
The standard GARCH(1,1) model is obtained for $\omega(x) = \omega$, $h(x) = x^2$, and $a(x) = \alpha x^2 + \beta$ with $\alpha \ge 0$, $\beta \ge 0$. This specification has been found adequate for a number of financial series and is arguably the most popular volatility model. When $h$ has a form inspired by the Box–Cox transformation, and for some particular specifications of the functions $\omega$ and $a$, we get the augmented GARCH introduced by Duan (1997). For $h(x) = x^{\delta}$ we get the class of GARCH(1,1) models defined by He and Teräsvirta (1999), which includes a variety of other first-order specifications.
These include the absolute value GARCH model (Taylor, 1986; Schwert, 1989) for $\omega(x) = \omega$, $h(x) = x$, and $a(x) = \alpha|x| + \beta$; the threshold GARCH model of Zakoïan (1994) for $\omega(x) = \omega$, $h(x) = x$, and $a(x) = \alpha_- \max(0,-x) + \alpha_+ \max(0,x) + \beta$; the Glosten, Jagannathan, and Runkle (1993) model for $\omega(x) = \omega$, $h(x) = x^2$, and $a(x) = \alpha_-\{\max(0,-x)\}^2 + \alpha_+\{\max(0,x)\}^2 + \beta$; the asymmetric power GARCH model of Ding et al. (1993) for $\omega(x) = \omega$, $h(x) = x^{\delta}$, and $a(x) = \alpha(|x| - \gamma x)^{\delta} + \beta$; a moving average GARCH process, inspired by the moving average conditionally heteroskedastic (MACH) model of Yang and Bewley (1995), for $h(x) = x^2$, $a(x) = a$, and, for instance, $\omega(x) = \omega_1 + \omega_2 x^2$; and the sign-switching autoregressive conditional heteroskedasticity (ARCH) model of Fornari and Mele (1997) for $h(x) = x^2$, $\omega(x) = \omega_1 + \omega_2\,\mathrm{sign}(x)$, and $a(x) = \alpha x^2 + \beta$. See also Bühlmann and McNeil (2000) and Yang and Tschernig (2006) for recent references on nonparametric GARCH(1,1) modeling.
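The way the three functions ω, a, and h select a specification can be illustrated with a minimal simulation sketch. It assumes the recursion h_t = ω(η_{t−1}) + a(η_{t−1})h_{t−1} with ε_t = h^{-1}(h_t)η_t; the function name `simulate_garch11`, the starting value `h0`, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_garch11(n, omega, a, h_inv, eta, h0=1.0):
    """Simulate eps_t = h^{-1}(h_t) * eta_t with
    h_t = omega(eta_{t-1}) + a(eta_{t-1}) * h_{t-1}.
    `eta` must hold n + 1 innovations; `h0` is an arbitrary start value."""
    h = np.empty(n)
    eps = np.empty(n)
    h_prev, eta_prev = h0, eta[0]
    for t in range(n):
        h[t] = omega(eta_prev) + a(eta_prev) * h_prev
        eps[t] = h_inv(h[t]) * eta[t + 1]
        h_prev, eta_prev = h[t], eta[t + 1]
    return eps, h

def omega_std(x):
    return 0.1                                   # constant omega (illustrative value)

def a_std(x):
    return 0.1 * x**2 + 0.8                      # standard GARCH(1,1): alpha x^2 + beta

def a_gjr(x):
    # GJR-type asymmetry: different weights on negative and positive shocks
    return 0.15 * max(0.0, -x)**2 + 0.05 * max(0.0, x)**2 + 0.8

rng = np.random.default_rng(0)
n = 1000
eta = rng.standard_normal(n + 1)

# h(x) = x^2 in both cases, so h^{-1} is the square root
eps_std, h_std = simulate_garch11(n, omega_std, a_std, np.sqrt, eta)
eps_gjr, h_gjr = simulate_garch11(n, omega_std, a_gjr, np.sqrt, eta)
```

Only the callables change between specifications; the recursion itself is shared, which is the point of the general formulation.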
Nelson (1990) and Bougerol and Picard (1992) showed in the standard GARCH case that if Eηt = 0 and Var(ηt) = 1,
$$E\log a(\eta_t) < 0 \tag{2}$$
is a necessary and sufficient condition for the existence of a unique strictly stationary and nonanticipative solution to model (1). A nonanticipative solution is a process (εt) such that εt is a measurable function of the variables ηt−s, s ≥ 0. The extension to model (1) will be given subsequently. For statistical inference, however, strict stationarity is not a sufficient assumption, and it can be crucial to know when the stationary solution possesses mixing properties. Knowing that these properties hold may make it possible or easier to establish other properties such as central limit theorems (CLTs).
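For the standard GARCH(1,1) case, the expectation E log(αη_t² + β) appearing in the strict stationarity condition can be approximated by simulation. The sketch below assumes Gaussian innovations; the function name `log_moment`, the sample size, and the parameter values are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def log_moment(alpha, beta, n=200_000, seed=0):
    """Monte Carlo estimate of E log(alpha * eta^2 + beta), eta ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    return np.mean(np.log(alpha * eta**2 + beta))

# ARCH(1) with alpha = 3: no finite second moment, since alpha + beta > 1
gamma_arch = log_moment(3.0, 0.0)

# Integrated GARCH(1,1): alpha + beta = 1
gamma_igarch = log_moment(0.15, 0.85)

print(gamma_arch, gamma_igarch)
```

Both estimates should come out negative, which illustrates that strict stationarity can hold even though the second-order condition α + β < 1 fails.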
Mixing properties of classes of models including GARCH-type processes have been investigated by Ango Nze (1992, 1998), Lu (1996), Carrasco and Chen (2002), Rahbek, Hansen, and Dennis (2002), Lee and Shin (2004), Hwang and Kim (2004), and Meitz and Saikkonen (2004), among others. Unfortunately, when applied to standard GARCH processes, their results require moment assumptions that are much stronger than the strict stationarity assumption. Typically the condition α + β < 1 is imposed for the standard GARCH(1,1), which amounts to restricting the class of strictly stationary solutions to those admitting a second-order moment. To our knowledge the most significant contribution, specifically devoted to the standard GARCH(p,q), is the dissertation by Boussama (1998), which establishes strong mixing under conditions we will further discuss. However, the proof relies on heavy geometric algebra based upon the Mokkadem (1990) result for polynomial autoregressive processes. See also Kristensen (2006).
Our main contribution is to show that under (2), the β-mixing of the strictly stationary solution holds without any additional restriction on the function a(·). In particular we do not make any moment assumptions on the process (εt). We provide simple sufficient conditions on the process ηt under which the strictly stationary solution to model (1) is β-mixing with exponential decay. We do not impose a continuous distribution for ηt, contrary to the preceding references dealing with mixing. This may have interest for financial applications because prices, and hence returns, are not observed continuously but are multiples of a monetary unit called the tick. A continuous distribution for the i.i.d. process would typically imply a continuous distribution for εt. On the other hand, dealing with the mixing properties of discrete-valued time-series models is in general a difficult task. For these reasons, and for the sake of generality, we will allow for both a discrete and a continuous part in the distribution of ηt. We rely on the results displayed in the book by Meyn and Tweedie (1996) so that the proof can be easily followed.
The fact that requirements for the existence of second-order moments can be ignored is particularly important for statistical inference in GARCH(1,1) models. Indeed, recent references establish asymptotic normality of the maximum likelihood estimator essentially under the assumption (2) using martingale theory; see, e.g., Lee and Hansen (1994), Lumsdaine (1996), Berkes, Horváth, and Kokoszka (2003), and Francq and Zakoïan (2004). See also Jensen and Rahbek (2004) for an extension to the nonstationary case.
In the next section we give, for the reader's convenience, the Markov chain results we need. Section 3 is devoted to strict stationarity. In Section 4 we establish geometric ergodicity of the strictly stationary solution. Two statistical applications are proposed in Section 5.
2. BASIC MARKOV CHAIN THEORY
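This section is drawn from the papers by Tjøstheim (1990) and Basrak, Davis, and Mikosch (2002) and the book by Meyn and Tweedie (1996). All the random variables considered in this paper are defined on some probability space $(\Omega, \mathcal{A}, P)$. Let $\{X_t, t \ge 0\}$ be a homogeneous Markov chain on $(E, \mathcal{E})$, where $\mathcal{E}$ is the Borel σ-field on $E$. We denote the probability of moving from $x$ to the set $B \in \mathcal{E}$ in $t$ steps by
$$P^t(x, B) = P(X_t \in B \mid X_0 = x).$$
The Markov chain $(X_t)$ is φ-irreducible if, for some nontrivial measure φ on $(E, \mathcal{E})$,
$$\forall B \in \mathcal{E}, \quad \varphi(B) > 0 \;\Longrightarrow\; \forall x \in E,\ \exists t > 0,\ P^t(x, B) > 0.$$
If $(X_t)$ is φ-irreducible, there exists a maximal irreducibility measure $M$ (see Meyn and Tweedie, 1996, Prop. 4.2.2), and we set $\mathcal{E}^+ = \{B \in \mathcal{E} : M(B) > 0\}$. We call the chain positive recurrent if
$$\limsup_{t \to \infty} P^t(x, B) > 0, \qquad \forall x \in E,\ \forall B \in \mathcal{E}^+.$$
For a φ-irreducible Markov chain, positive recurrence is equivalent (see Meyn and Tweedie, 1996, Thm. 18.2.2) to the existence of a (unique) invariant probability measure, that is, a probability π such that
$$\pi(B) = \int P(x, B)\,\pi(dx), \qquad \forall B \in \mathcal{E}.$$
Let $\|\cdot\|$ denote the total variation norm. The Markov chain $(X_t)$ is said to be geometrically ergodic if there exists a $\rho \in (0,1)$ such that
$$\forall x \in E, \qquad \rho^{-t}\,\|P^t(x, \cdot) - \pi\| \to 0 \quad \text{as } t \to \infty. \tag{3}$$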
Recall that for a stationary process, the strong (α-) mixing coefficients are defined by
where the first supremum is taken over the set of measurable functions f and g such that | f | ≤ 1, |g| ≤ 1, and the second supremum is taken over the sets A ∈ σ(Xs,s ≤ 0) and B ∈ σ(Xs,s ≥ k), whereas the β-mixing coefficients are defined by
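where in the last equality the sup is taken over all pairs of partitions $\{A_1,\dots,A_I\}$ and $\{B_1,\dots,B_J\}$ of Ω such that $A_i \in \sigma(X_s, s \le 0)$ for each $i$ and $B_j \in \sigma(X_s, s \ge k)$ for each $j$. The process is called α-mixing (resp. β-mixing) if $\lim_{k\to\infty} \alpha_X(k) = 0$ (resp. $\lim_{k\to\infty} \beta_X(k) = 0$). We have $\alpha_X(k) \le \beta_X(k)$, so that β-mixing implies α-mixing. If $Y = (Y_t)$ is a process such that $Y_t = f(X_t,\dots,X_{t-r})$ for some measurable function $f$ and some integer $r \ge 0$, then $\sigma(Y_t, t \le s) \subset \sigma(X_t, t \le s)$ and $\sigma(Y_t, t \ge s) \subset \sigma(X_{t-r}, t \ge s)$. Thus
$$\alpha_Y(k) \le \alpha_X(k-r) \quad \text{and} \quad \beta_Y(k) \le \beta_X(k-r), \qquad k \ge r. \tag{8}$$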
Note that for a stationary Markov process we have $\alpha_X(k) = \sup_{f,g} |\mathrm{Cov}(f(X_0), g(X_k))|$, where $f$ and $g$ are as in the previous definition (see Bradley, 1986). One consequence of geometric ergodicity is that the Markov chain $(X_t)$ is β-mixing, and hence strongly mixing, with geometric rate. Indeed, Davydov (1973) showed that for an ergodic Markov chain $(X_t)$ with invariant probability measure π,
$$\beta_X(k) = \int \|P^k(x, \cdot) - \pi\|\,\pi(dx).$$
Thus $\beta_X(k) = O(\rho^k)$ if (3) holds.
To state the following criterion for the geometric ergodicity of a Markov chain, we need the idea of a Feller chain. We call $(X_t)$ a Feller Markov chain (or weak Feller chain) if the function
$$x \mapsto E\{g(X_t) \mid X_{t-1} = x\}$$
is continuous for every bounded and continuous function $g$ on $E$.
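THEOREM 1 (Feigin and Tweedie, 1985, Thm. 1). Assume that
(i) $(X_t)$ is a Feller Markov chain;
(ii) $(X_t)$ is φ-irreducible for some measure φ on $E$ equipped with its Borel σ-field;
(iii) there exists a compact set $C \subset E$ such that $\varphi(C) > 0$ and a nonnegative continuous function (test function) $V : E \to \mathbb{R}$ such that
$$V(x) \ge 1, \qquad \forall x \in C, \tag{9}$$
and for some $c > 0$
$$E\{V(X_t) \mid X_{t-1} = x\} \le (1 - c)\,V(x), \qquad \forall x \notin C. \tag{10}$$
Then $(X_t)$ is geometrically ergodic.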
3. STRICT STATIONARITY
The assumption that the sequence $(\eta_t)$ is i.i.d. with $E\log^+\{a(\eta_t)\} < \infty$, where $\log^+ x = \max(\log x, 0)$ for $x \ge 0$, is maintained throughout. Our first result can be deduced from results established by Bougerol and Picard (1992) for generalized autoregressive vector equations, but we prefer to give a simple self-contained proof of this theorem.
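THEOREM 2. If (2) holds, then the series
$$h_t = \omega(\eta_{t-1}) + \sum_{n=1}^{\infty} a(\eta_{t-1}) \cdots a(\eta_{t-n})\,\omega(\eta_{t-n-1})$$
converges almost surely (a.s.), and the process $(\varepsilon_t)$, defined by $\varepsilon_t = h^{-1}(h_t)\eta_t$, is a strictly stationary solution of (1). This solution is unique, nonanticipative, and ergodic.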
If (2) does not hold and P[ηt = 0] ≠ 1, there exists no strictly stationary solution to model (1).
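The series in Theorem 2 can be explored numerically through its partial sums. The sketch below is only an illustration under assumed inputs: the helper name `h_truncated`, the standard GARCH(1,1) choice of ω and a, and the parameter values are hypothetical. It evaluates the sum truncated at N terms from past innovations and checks the one-step recursion that the partial sums satisfy.

```python
import numpy as np

def h_truncated(eta_past, N, omega, a):
    """Partial sum omega(eta_{t-1}) + sum_{n=1}^{N} a(eta_{t-1})...a(eta_{t-n}) omega(eta_{t-n-1}).
    eta_past[k] holds eta_{t-1-k}; at least N + 1 entries are required."""
    total = omega(eta_past[0])
    prod = 1.0
    for n in range(1, N + 1):
        prod *= a(eta_past[n - 1])            # a(eta_{t-1}) ... a(eta_{t-n})
        total += prod * omega(eta_past[n])    # times omega(eta_{t-n-1})
    return total

def omega(x):
    return 0.1                                # illustrative constant omega

def a(x):
    return 0.1 * x**2 + 0.8                   # illustrative standard GARCH(1,1) choice

rng = np.random.default_rng(2)
eta_past = rng.standard_normal(300)

h_50 = h_truncated(eta_past, 50, omega, a)
h_200 = h_truncated(eta_past, 200, omega, a)

# One-step recursion between partial sums: h_t(N) = omega + a * h_{t-1}(N - 1)
lhs = h_truncated(eta_past, 50, omega, a)
rhs = omega(eta_past[0]) + a(eta_past[0]) * h_truncated(eta_past[1:], 49, omega, a)
```

Because the summands are nonnegative, the partial sums are nondecreasing in N, so `h_200` is at least `h_50`; when E log a(η_t) < 0 the products decay geometrically and the two values are typically close.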
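Proof. First note that $\gamma := E\log a(\eta_t)$ exists in $[-\infty, +\infty)$. Now let
$$h_t(N) = \omega(\eta_{t-1}) + \sum_{n=1}^{N} a(\eta_{t-1}) \cdots a(\eta_{t-n})\,\omega(\eta_{t-n-1}), \qquad h_t = \lim_{N \to \infty} h_t(N), \tag{12}$$
the limit being well defined in $[0, +\infty]$ in view of the positivity of the summands. Because $h_t(N) = \omega(\eta_{t-1}) + a(\eta_{t-1})h_{t-1}(N-1)$ for all $N$, we have, letting $N$ go to infinity, $h_t = \omega(\eta_{t-1}) + a(\eta_{t-1})h_{t-1}$. It remains to show that $h_t$ is a.s. finite. Let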
for some constant τ > 0 and
. Let ht*(N) be obtained by replacing ω by ω* in ht(N) and denote by ht* its a.s. limit. We have
as n → ∞, by the strong law of large numbers applied to the i.i.d. sequence (log{a(ηt)}). It follows from the Cauchy rule that for any t, the sequence {ht*(N),N ≥ 1} converges a.s. in
. Because ht ≤ ht* we thus have ht < ∞ a.s. As a function of an i.i.d. sequence, the limit ht is thus strictly stationary and ergodic, in which case so is εt.
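To prove uniqueness, let $(\tilde h_t)$ be another strictly stationary solution process of (1). Suppose $\tilde h_t \ne h_t$ for some $t$. Iterating the second equation in (1) we have $h_t - \tilde h_t = a(\eta_{t-1}) \cdots a(\eta_{t-n})\,(h_{t-n} - \tilde h_{t-n})$. From the strong law of large numbers and (2), we have $a(\eta_{t-1}) \cdots a(\eta_{t-n}) \to 0$ with probability 1 as $n \to \infty$. Thus $|h_{t-n} - \tilde h_{t-n}| \to \infty$, which entails $h_{t-n} \to \infty$ or $\tilde h_{t-n} \to \infty$ with nonzero probability. This is not possible because the sequences $(h_t)$ and $(\tilde h_t)$ are stationary. Therefore $h_t = \tilde h_t$ for any $t$, a.s.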
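To prove the necessary part, suppose there exists a strictly stationary solution $(h_t)$ of (1). We have for $n > 0$,
$$h_0 \ge \omega(\eta_{-1}) + \sum_{k=1}^{n} a(\eta_{-1}) \cdots a(\eta_{-k})\,\omega(\eta_{-k-1}),$$
from which we deduce that $a(\eta_{-1}) \cdots a(\eta_{-n})\,\omega(\eta_{-n-1})$ converges to zero, a.s., when $n \to \infty$, or, equivalently, that
$$\sum_{i=1}^{n} \log a(\eta_{-i}) + \log \omega(\eta_{-n-1}) \to -\infty \quad \text{a.s. as } n \to \infty. \tag{14}$$
First suppose $E\log\{a(\eta_t)\} > 0$. Then by the strong law of large numbers, $\sum_{i=1}^{n} \log a(\eta_{-i}) \to +\infty$ a.s., and it is necessary for (14) to hold that $\log \omega(\eta_{-n-1}) \to -\infty$ a.s. This convergence implies $\omega(\eta_0) = 0$ a.s., which is precluded because $\eta_t$ is not identically equal to zero. Now suppose $E\log\{a(\eta_t)\} = 0$. By the Chung–Fuchs theorem (see, e.g., Chow and Teicher, 1997) we have $\limsup_{n \to \infty} \sum_{i=1}^{n} \log a(\eta_{-i}) = +\infty$ with probability 1 and, using the elementary Lemma 1, which follows, the convergence (14) entails $\log \omega(\eta_{-n-1}) \to -\infty$ in probability. Thus, we are led to a contradiction as in the previous case. Hence the assumption that a strictly stationary solution exists when $E\log\{a(\eta_t)\} \ge 0$ entails a contradiction. █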
LEMMA 1. If $(X_n)$ and $(Y_n)$ are two independent sequences of random variables such that $X_n + Y_n \to -\infty$ and $X_n \nrightarrow -\infty$ in probability, then $Y_n \to -\infty$ in probability.
Remark 1. It can be seen from the proof that a solution $(h_t)$, as given by (12), always exists in $[0, +\infty]$ but that when (2) does not hold and $P[\eta_t = 0] \ne 1$, this solution satisfies $h_t = +\infty$ a.s. See Klüppelberg, Lindner, and Maller (2004) for more detailed results in the standard GARCH(1,1) case.
4. GEOMETRIC ERGODICITY
To prove geometric ergodicity we require additional assumptions on the i.i.d. process (ηt), essentially to ensure that the transition kernel has a Lebesgue component.
Assumption A. The distribution
of the variable ηt is a mixture of an absolutely continuous component with respect to the Lebesgue measure λ on
and Dirac masses at some points
. With standard notation we then have
where $f$ is a density of the continuous component. Let $\eta_0^+ = \inf\{\eta \mid \eta > 0, f(\eta) > 0\}$ and $\eta_0^- = \sup\{\eta \mid \eta < 0, f(\eta) > 0\}$, when these sets are nonempty, and assume that
for some $\tau > 0$. By convention $(\eta_0^- - \tau, \eta_0^-) = \emptyset$ (resp. $(\eta_0^+, \eta_0^+ + \tau) = \emptyset$) when $\eta_0^-$ (resp. $\eta_0^+$) is not defined. Finally, $E\{\omega(\eta_t)^r\} < \infty$ and $E\{a(\eta_t)^r\} < \infty$ for some $r > 0$.
Remark 2. The standard case where the distribution of $\eta_t$ is absolutely continuous with respect to the Lebesgue measure is obtained by taking $p = 0$. Note however that the case $p = 1$, that is, when the law of $\eta_t$ has no continuous component, is excluded. In such a case, criteria based on topological properties of the chain fail to prove ergodic properties (see a similar example in Meyn and Tweedie, 1996, p. 127). This does not mean that the process is not geometrically ergodic in those situations: for example, in the standard GARCH(1,1) case, if $\eta_t^2 = 1$ with probability 1, then the strictly stationary solution process is an independent white noise, which is obviously geometrically ergodic.
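The white noise example at the end of Remark 2 can be spelled out as a short worked computation. It is a sketch for the standard GARCH(1,1) specification with $\eta_t^2 = 1$ a.s., in which case $a(\eta_t) = \alpha + \beta$ is deterministic and condition (2) reduces to $\alpha + \beta < 1$.

```latex
\begin{align*}
h_t &= \omega + (\alpha\eta_{t-1}^2 + \beta)\,h_{t-1} = \omega + (\alpha + \beta)\,h_{t-1}, \\
E\log a(\eta_t) &= \log(\alpha + \beta) < 0 \iff \alpha + \beta < 1, \\
h_t &= \frac{\omega}{1 - \alpha - \beta} \quad \text{(the stationary solution)}, \\
\varepsilon_t &= \sqrt{\frac{\omega}{1 - \alpha - \beta}}\;\eta_t .
\end{align*}
```

The stationary $(\varepsilon_t)$ is therefore a constant multiple of the i.i.d. sequence $(\eta_t)$, which is why it is an independent white noise.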
The main result of this paper is as follows.
THEOREM 3. Under Assumption A and if the strict stationarity condition (2) holds, then the strictly stationary and nonanticipative solution (εt) of the GARCH(1,1) model (1) is β-mixing with exponential decay.
Remark 3. The proof of this theorem relies on showing that (2) entails the geometric ergodicity of $(h_t)$. Moreover, geometric ergodicity implies strict stationarity. Under Assumption A, which entails $P[\eta_t = 0] \ne 1$, condition (2) is therefore necessary and sufficient for the existence of a geometrically ergodic solution $(h_t)$ and also for the existence of a strictly stationary and geometrically β-mixing solution $(\varepsilon_t)$.
Remark 4. To our knowledge, existing results on mixing conditions for nonstandard GARCH(1,1) processes (see references in the introduction) are demanding in terms of moment assumptions. For instance, in Carrasco and Chen (2002), the mixing properties are obtained for various GARCH(1,1) models under moment conditions on the process (εt) (see their Table 1). By contrast, we find that the strictly stationary solution is β-mixing without any moment restriction.
Remark 5. When applied to standard GARCH(1,1) models, this theorem is also more general than those already established. In Boussama (1998), the geometric ergodicity of standard GARCH models is proved under the assumption that ηt has an absolutely continuous distribution with respect to the Lebesgue measure (i.e., p = 0 in our framework), with a positive density in a neighborhood of zero. In this case Assumption A holds with η−0 = η+0 = 0. Note however that our assumption allows us to handle more general cases, where the density is null on a neighborhood of zero or where the distribution of ηt does not admit a density with respect to the Lebesgue measure.
Before proving the theorem, we start by establishing geometric ergodicity of (ht).
LEMMA 2. Under the assumptions of Theorem 3, the strictly stationary and nonanticipative solution (ht) of model (1) is geometrically ergodic.
Proof. By the second equation in model (1), $(h_t)$ is obviously a homogeneous Markov chain on $\mathbb{R}^+$. The proof consists in checking the three conditions of Theorem 1.
Step (i): Feller property. For any bounded and continuous function $g$ on $\mathbb{R}^+$, the function
$$x \mapsto E\{g(h_t) \mid h_{t-1} = x\} = E\left[g\{\omega(\eta_{t-1}) + a(\eta_{t-1})\,x\}\right]$$
is continuous in $x$ over $\mathbb{R}^+$, by the Lebesgue dominated convergence theorem, which shows that the Markov chain $(h_t)$ is Feller.
Step (ii): Irreducibility. Let τ′ ∈ (0,τ) be small enough so that the set Dτ′ =: (η−0 − τ′,η−0) ∪ (η+0, η+0 + τ′) does not contain any mass μi.
(a) First assume that
. Note that {a(x) = 0} ⊂ {x = 0} in view of the assumptions made on the function a. Set H(x,y) = ω(y) + a(y)ω(x) and remark that H is strictly increasing in |y| for fixed x. Let
and denote by λI the restriction of the Lebesgue measure to the (nonempty) set I. For
, we have
We have, for any Borel set B,
Note that, conditional on the event {η1 ∈ Dτ′}, the variable H(0,η1) admits a density, fH, that is positive over I. Thus
which proves that the Markov chain (ht) is λI-irreducible.
(b) Now suppose
. For ease of presentation we shall assume that
The case where the distribution of η is absolutely continuous with respect to λ, that is, p = 0 in (15), can be handled by a straightforward adaptation of what follows.
We have, in view of (15),
where
By convention take $a(\eta_0^-) = 1$ if $\{\eta \mid \eta < 0, f(\eta) > 0\} = \emptyset$ and $a(\eta_0^+) = 1$ if $\{\eta \mid \eta > 0, f(\eta) > 0\} = \emptyset$. Inequality (17) follows from the fact that $a$ is monotonic over the positive and negative real half-lines. It follows that, under (2),
Hence there exist some integers n0, m0, and ni, for i = 1,…, N, such that
By continuity of a, it is not restrictive to assume that τ′ is small enough so that
Now let
. We have, for all t > 0,
The inequalities (18) and (19) will allow us to control the products of this sum, provided that we constrain the i.i.d. process to visit some states with appropriate frequencies. To this aim we introduce, for K = 1,2,…, the event
where
. When η+0 or η−0 is not defined, the corresponding terms can be withdrawn from the definition of AK. Thus we have
Denote
, for k = 0,…, K − 1. The conditional distribution given AK of the vector
has a density with respect to the Lebesgue measure on
.
Now we wish to show that
First suppose that the function ω is constant over
. For any integer $\ell$ and any vector $u = (u_1,\dots,u_\ell)$ denote $u^{\leftarrow} = (u_1,\dots,u_{\ell-1})$ and $u^+ = u_\ell$. By convention let
for n < k. Let
. We have, given AK,
where
is a constant and S(YK←) is an a.s. positive random variable. In view of (16) it follows that given AK the mapping
is a C1 diffeomorphism between open sets of
. Indeed, the determinant of the Jacobian matrix of this mapping is given by
S(YK←) which is a.s. positive. Therefore the distribution of ZK conditional on AK has a density with respect to the Lebesgue measure on
. Consequently (22) holds.
Now suppose that ω is nonconstant over
. Let
and let
. We have
where
is a constant and
is a random variable. The conclusion follows from the same argument as before, noting that the mapping
is a C1 diffeomorphism between open sets of
. The case where ω is nonconstant over
can be handled similarly.
To determine the support IK of the conditional distribution of hKn, first note that, for t = Kn, the last term on the right-hand side of (20) writes, conditional on AK,
and therefore belongs to the set [ρKx,ρ1Kx], in view of the assumptions made on the function a. Products of the form a(ηKn−1) … a(ηKn−kn), for k = 1,…,K − 1, can be handled similarly. To deal with the other products in (20) we introduce the notation
If AK holds true, using the assumptions (i) and (ii) on the functions a and ω, we have
and thus, for K sufficiently large, in view of (18) and (19),
Indeed, I is the closure of the limit of the sets IK when K tends to infinity. Because the lower and upper bounds of IK are reached, by the intermediate values theorem and in view of (22), we conclude that
Denoting by λI the restriction of the Lebesgue measure to the set I, it follows that λI is an irreducibility measure because, for any set
,
by (21) and (23).
Step (iii): First note that the assumptions E log a(ηt) < 0 and E {a(ηt)r} < ∞ for r > 0 imply the existence of a number s ∈ (0,1) such that
(see Nelson, 1990; Berkes, Horváth, and Kokoszka, 2003, Lem. 2.3).
Let the test function be defined by $V(x) = 1 + x^s$, let $0 < c < 1 - \rho_2$, and let the compact set
where $\omega_s = E\{\omega(\eta_{t-1})^s\}$ and $s$ is chosen small enough so that $\omega_s < \infty$. We have, for $x \notin C$, using the elementary inequality $(a + b)^s \le a^s + b^s$ for $a, b \ge 0$ and $s \in [0,1]$,
which proves (10). Moreover (9) holds true.
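The drift computation behind (10) can be sketched as follows. This is a reconstruction under stated assumptions, not the paper's own display: it takes $\rho_2$ to denote $E\{a(\eta_t)^s\} < 1$ and uses only $V(x) = 1 + x^s$, the definition of $\omega_s$, and the elementary inequality quoted above.

```latex
\begin{align*}
E\{V(h_t) \mid h_{t-1} = x\}
  &= 1 + E\{\omega(\eta_{t-1}) + a(\eta_{t-1})\,x\}^{s} \\
  &\le 1 + \omega_s + \rho_2\,x^{s} \\
  &\le (1 - c)\,(1 + x^{s})
  \quad \text{whenever} \quad x^{s} \ge \frac{\omega_s + c}{1 - c - \rho_2}.
\end{align*}
```

Under these assumptions, the drift inequality holds outside any compact set containing all $x$ with $x^s < (\omega_s + c)/(1 - c - \rho_2)$, and this threshold grows without bound as $c$ approaches $1 - \rho_2$.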
It remains to check that φ(C) > 0 where φ = λI is the irreducibility measure obtained previously. Given the shape of the intervals I and C, it is clear that
If $c$ is chosen close enough to $1 - \rho_2$ the latter inequality will be verified. For such a $c$, the compact set $C$ meets the assumptions of Theorem 1. It follows that the Markov chain $(h_t)$ is geometrically ergodic. █
Proof of Theorem 3. We will show that the process (εt) inherits the mixing property established for (ht). We first show that the process Yt = (ht,ηt)′ has the mixing property. It is clear that (Yt) is a Markov chain on
endowed with its Borel σ-field. Moreover (Yt) is strictly stationary as a measurable function of ηt,ηt−1,…. By independence between ht and ηt we can denote by
the stationary distribution of Yt, where
is that of
that of ηt. Denote by
the transition probabilities of the Markov chain (Yt). We have, for
,
Therefore, because
is a probability measure,
Because the right-hand-side term converges to 0 at an exponential rate, by the geometric ergodicity of $(h_t)$, we can deduce that $(Y_t)$ is geometrically ergodic and thus geometrically β-mixing. Because $\varepsilon_t = h^{-1}(h_t)\eta_t$ is a measurable function of $Y_t$, we can conclude in view of (8) that the process $(\varepsilon_t)$ is geometrically β-mixing. █
Remark 6. The theorem could be straightforwardly extended to the case where N = ∞, provided that a(μi) < 1 for a finite number, say, N0, of indexes i. Indeed, in this case the inequality (17) continues to hold with N replaced by N0.
Remark 7. In Pham (1986), irreducibility is established for a class of models that is very similar to our model for (ht). However we cannot use Pham's results because he assumes a continuous distribution for the i.i.d. process with a positive density in a neighborhood of 0, and more importantly, he requires ω(0) = 0 (with our notations), which does not hold, in particular, for the standard GARCH model.
Examples
1. Consider the standard ARCH(1) model and assume that $\eta_t$ has a mass at zero, with $E\eta_t^2 = 1$. Then (2) is met because
$$E\log a(\eta_t) = E\log(\alpha\eta_t^2) = -\infty.$$
It follows that geometric ergodicity holds, under Assumption A, without any restriction on the parameter α.
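Example 1 can be checked numerically with a small sketch. The innovation law used here, a standard normal rescaled to unit variance and mixed with a point mass at zero of weight `p0`, is an illustrative assumption, as are the names `p0` and `mean_log_a`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
p0 = 0.05                                      # hypothetical probability of the mass at zero

z = rng.standard_normal(n)
is_zero = rng.random(n) < p0
# Rescale the continuous part so that E eta^2 = (1 - p0) * 1 / (1 - p0) = 1
eta = np.where(is_zero, 0.0, z / np.sqrt(1.0 - p0))

def mean_log_a(alpha):
    """Sample mean of log(alpha * eta^2) for the ARCH(1) choice a(x) = alpha x^2."""
    with np.errstate(divide="ignore"):         # log(0) = -inf is intended here
        return np.mean(np.log(alpha * eta**2))

small_alpha = mean_log_a(0.5)
huge_alpha = mean_log_a(1e6)
```

Both sample means equal minus infinity as soon as a single zero is drawn, whatever the value of α, which mirrors the absence of any restriction on α in the example.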
2. Consider the threshold ARCH(1) model, introduced by Zakoïan (1994), where ω(x) = ω > 0, h(x) = x, and a(x) = max(0,−x)α− + max(0,x)α+, with α− > 0, α+ > 0. Let
. We have
By Remark 3, under Assumption A, a necessary and sufficient condition for the existence of a stationary and geometrically ergodic solution is
5. STATISTICAL APPLICATIONS
The time-series literature abounds in statistical results requiring mixing assumptions. Consequently, Theorem 3 has numerous direct applications. We only give two of them. The first application concerns the asymptotic distribution of sample autocorrelations and is directly inspired from Romano and Thombs (1996). The second application shows that the standard Dickey–Fuller unit-root tests remain asymptotically valid when the error term follows the GARCH(1,1) model that we consider in this paper.
5.1. Behavior of the Sample Autocorrelations
For a time series ε1,…εn the identification stage of a model of the form (1) may involve the use of many statistics. Traditional estimators of the population autocorrelations of the squares are given by
and
stands for the mean-corrected observations of a time series (Xt,1 ≤ t ≤ n). Such statistics are often used to get an insight into the fourth-order structure of the process (εt). For the model of Ding, Granger, and Engle (1993), based on a Box–Cox power transformation of the conditional standard deviation process, the squares can be replaced by powers δ of the |εt|. For nonparametric GARCH models, such as model (1), general transformations of the data lead to statistics of the form
for some measurable function g. The asymptotic distributions of such statistics are easily deduced from Theorem 3.
COROLLARY 1. Assume that $E|g(\varepsilon_t)|^{4+\nu} < \infty$ for some $\nu > 0$. Then, under the assumptions of Theorem 3, and for any fixed $h \ge 0$, the vectors
where $\gamma_g(\ell) = \mathrm{Cov}\{g(\varepsilon_t), g(\varepsilon_{t-\ell})\}$ and $\rho_g(\ell) = \gamma_g(\ell)/\gamma_g(0)$, are asymptotically normally distributed.
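Sample versions of $\gamma_g(\ell)$ and $\rho_g(\ell)$ can be computed along the following lines. This is a minimal sketch: the function name `sample_autocorr` and the 1/n normalization of the autocovariances are assumptions, since the exact estimators are not reproduced here.

```python
import numpy as np

def sample_autocorr(x, g, max_lag):
    """Sample autocorrelations of g(x_t) at lags 0..max_lag,
    using mean-corrected values and a 1/n normalisation (an assumption)."""
    y = g(np.asarray(x, dtype=float))
    y = y - y.mean()
    n = len(y)
    gamma = np.array([np.dot(y[lag:], y[:n - lag]) / n
                      for lag in range(max_lag + 1)])
    return gamma / gamma[0]

# Usage on an arbitrary series: autocorrelations of squares and of absolute values
rng = np.random.default_rng(3)
x = rng.standard_normal(500)
rho_sq = sample_autocorr(x, np.square, 5)
rho_abs = sample_autocorr(x, np.abs, 5)

# Small deterministic check with g equal to the identity
rho_alt = sample_autocorr([1.0, -1.0, 1.0, -1.0], lambda v: v, 1)
```

Passing `g` as a callable keeps the same routine usable for squares, absolute powers, or any other measurable transformation considered in the corollary.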
Proof. The proof is similar to that of Theorems 3.1 and 3.2 in Romano and Thombs (1996). Theorem 3 shows that (Yt) =: (g(εt) − Eg(εt)) is geometrically β- (and α-) mixing. Let
. The Cramér–Wold device and the CLT for strongly mixing processes, given in Ibragimov (1962) and Herrndorf (1984), show that
is asymptotically normally distributed with covariance matrix Σ given by
The absolute convergence of the last sum follows from standard covariance inequalities for mixing processes. To show the asymptotic normality of the vector involving the
, it remains to show that
. This can be proved by the arguments given in the proof of Proposition 7.3.7 of Brockwell and Davis (1991). The vector of the sample autocorrelations, because it is a differentiable function of the sample autocovariances vector, is also asymptotically normally distributed. █
Remark 8. For g(x) = x, the result can be deduced from the Lindeberg CLT for martingale differences. However, for autocovariances of general transformations of εt, it may be difficult, if not impossible, to rely on asymptotic theorems for martingales. In such cases, mixing results offer an alternative.
5.2. Unit-Root Tests for Autoregressive (AR) Models with GARCH Errors
Many financial series, such as (logarithms of) stock-market indices, are suspected to behave roughly like random walks with conditionally heteroskedastic increments. For such series, one could consider a model of the form
$$X_t = \phi X_{t-1} + \varepsilon_t, \tag{24}$$
where $(\varepsilon_t)$ belongs to the general class of GARCH(1,1) models (1). Given the consequences of the random walk hypothesis, especially in terms of persistence of the economic shocks, it is important to consider tests for the unit-root hypothesis $H_0 : \phi = 1$ against the stationarity assumption $H_1 : \phi \in (-1,1)$. Let
$$\hat\phi_n = \frac{\sum_{t=1}^{n} X_t X_{t-1}}{\sum_{t=1}^{n} X_{t-1}^2}$$
be the least squares estimator (LSE) of φ and let $\{W(t), t \in [0,1]\}$ be a standard Brownian motion. The following corollary demonstrates the asymptotic validity of the standard Dickey–Fuller tests, in our heteroskedastic framework.
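COROLLARY 2. Suppose that $(X_t)$ satisfies (24) where $(\varepsilon_t)$ is a general GARCH(1,1) process satisfying the assumptions of Theorem 3. Then if $E|\varepsilon_t|^{2+\nu} < \infty$ for some $\nu > 0$,
$$n(\hat\phi_n - 1) \Rightarrow \frac{\frac{1}{2}\{W^2(1) - 1\}}{\int_0^1 W^2(t)\,dt} \quad \text{under } H_0, \tag{25}$$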
and if $E|\varepsilon_t|^{4+\nu} < \infty$ for some $\nu > 0$,
Proof. Note that
. In view of Theorem 3, the weak convergence (25) is deduced from Phillips (1987), and (26) can be deduced, for instance, from Francq and Zakoïan (1998). █
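The unit-root case of Corollary 2 can be explored by simulation. The sketch below is illustrative only: it assumes standard GARCH(1,1) errors with Gaussian innovations and hypothetical parameter values, an initial value $X_0 = 0$, and a regression without intercept; the function names are not from the paper.

```python
import numpy as np

def garch_errors(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Standard GARCH(1,1) errors with Gaussian innovations (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + 1)
    h = omega / (1.0 - alpha - beta)           # start at the unconditional variance
    eps = np.empty(n)
    for t in range(n):
        h = omega + (alpha * eta[t]**2 + beta) * h
        eps[t] = np.sqrt(h) * eta[t + 1]
    return eps

def df_coefficient_stat(x):
    """n * (phi_hat - 1), with phi_hat the no-intercept LSE and X_0 = 0."""
    x_lag = np.concatenate(([0.0], x[:-1]))
    phi_hat = np.dot(x, x_lag) / np.dot(x_lag, x_lag)
    return len(x) * (phi_hat - 1.0)

# Random walks driven by GARCH errors, one replication per seed
stats = np.array([
    df_coefficient_stat(np.cumsum(garch_errors(500, seed=s)))
    for s in range(200)
])
share_negative = np.mean(stats < 0)
```

Comparing the empirical quantiles of `stats` with tabulated Dickey–Fuller critical values gives an informal check of the corollary; the statistic is negative in most replications, as expected from the skewness of the limit law.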
Remark 9. For the standard GARCH(1,1) errors, Ling, Li, and McAleer (2003) derived the asymptotic distribution in (25) under the second moment condition, namely, α + β < 1.
Remark 10. Consider model (1) with $h_t = \sigma_t^2$. Assume that $\eta_t$ has a symmetric distribution, that ω and $a$ are even functions, and that the following moments exist, for $i = 1, 2$,
with a2 < 1. Then tedious computations show that the asymptotic variance of the LSE is, under H1,
It is interesting to note that in the unit-root case the asymptotic distribution of the LSE is the same with i.i.d. or GARCH errors, whereas it depends on the GARCH parameters in the stationary case. Thus the Dickey–Fuller test statistic can still be used when the errors satisfy a model of the form (1). A similar finding was obtained by Rahbek et al. (2002) in a multivariate framework. They showed that the trace test for the cointegration rank remains valid when the standard i.i.d. Gaussian errors are replaced by ARCH-type innovations, with appropriate moment conditions.
Remark 11. Corollaries 1 and 2 are just given for illustrative purposes and can be straightforwardly extended. In particular, Corollary 2 could include more general models with augmented variables and/or an intercept and/or a deterministic trend, as in Phillips and Perron (1988). Similar convergences could also be stated for t-statistics.