Published online by Cambridge University Press: 01 December 2004
In this paper we discuss weak dependence and mixing properties of some popular models. We also develop some of their econometric applications. Autoregressive models, autoregressive conditional heteroskedasticity (ARCH) models, and bilinear models are widely used in econometrics. More generally, stationary Markov modeling is often used. Bernoulli shifts also generate many useful stationary sequences, such as autoregressive moving average (ARMA) or ARCH(∞) processes. For Volterra processes, mixing properties obtain given additional regularity assumptions on the distribution of the innovations.
We recall associated probability limit theorems and investigate the nonparametric estimation of those sequences.
We first thank the editor for the huge amount of additional editorial work provided for this review paper. The efficiency of the numerous referees was especially useful. The error pointed out in Hall and Horowitz (1996) was the origin of the present paper, and we thank the referees for asking for a more detailed treatment of a correct proof for this paper in Section 2.3. Also we thank Marc Henry and Rafal Wojakowski for a very careful rereading of the paper. An anonymous referee has been particularly helpful in the process of revision of the paper. The authors thank him for his numerous suggestions of improvement, including important results on negatively associated sequences and a thorough update in standard English.
Mixing is now systematically used in time series where martingale assumptions and results cannot be directly employed. Mixing has proved particularly useful in cases where nonlinearities appear, such as autoregressive conditional heteroskedasticity (ARCH) modeling in econometrics. This success relies on powerful limit theorems proved under mixing conditions (see, among others, Doukhan, 1994; Rio, 2000; Doukhan, 2002). These limit results serve as basic tools for computation of the significance level and power of statistical tests. Mixing assumptions can be used in more general frameworks involving fading memory (asymptotic independence between functions of the past and the future of the process), such as near epoch dependence (NED) of a mixing process.
We recall here the mixing properties of some models used in econometrics. Simultaneously, we present a different approach to limit theorems when mixing does not hold (which really may occur, as shown in Andrews, 1984, example (4.16)). For the sake of simplicity, our exposition mainly focuses on one-dimensional time series.
The paper is organized as follows. To provide econometric motivation, Section 2 presents several situations in which the various weak dependence conditions arise. After some generic examples, we consider specific problems, including unit root problems, parametric problems, sieve bootstrap, and semiparametric estimation problems, in Sections 2.1, 2.2, 2.4, and 2.5. Section 2.3 considers generalized method of moments (GMM) estimation in which the Doukhan and Louhichi (1999) weak dependence condition allows one to provide a complete proof of the results in Hall and Horowitz (1996). Indeed, the latter authors improperly claim a mixing property of their models to prove their consistency results. Finally, Section 2.6 considers nonparametric estimation problems.
Section 3 makes precise the mathematical framework of weak dependence needed in the previous section. It describes some classical concepts of fading memory (mixing conditions, the association property) and also the new weak dependence conditions introduced by Doukhan and Louhichi (1999). After this, Section 4 provides numerous classes of models commonly used in econometrics and finance and focuses on their weak dependence properties. Section 5 recalls some probabilistic limit theorems available in those cases. Extensions of Donsker's functional central limit theorem (FCLT) and the FCLT for the cumulative distribution function are discussed. Section 6 is devoted to functional estimation. Consistency and CLTs are discussed here. Proofs are given in Appendix A, and Appendix B recalls the main probabilistic tools.
Finally, we remark that the limit theorems of Section 5 and the asymptotic results for functional estimation in Section 6 are provided for very large classes of models (Sections 3 and 4). Hence, more general time series formulations such as those in Section 4 allow us to extend directly the classical results of Section 2.
Time series analysis is a major part of econometrics. Here we provide several examples of interest in which it is essential to consider dependent structures instead of simple independence. In some situations, classical tools of weak dependence such as mixing are useless. For instance, when bootstrap techniques are used, no mixing conditions can be expected. Consider the following example concerning bootstrap: let a stationary autoregressive sequence be generated by an independent and identically distributed (i.i.d.) sequence
:
Standard nonparametric estimation techniques provide an estimate of the autoregression function r. Let
be a convenient estimator of r. Given data (X1,…,Xn) from the sequence (2.1), another autoregressive process is defined by
The innovations
are drawn according to the empirical measure of the estimated residuals,
. No mixing assumption can be expected for the previous model (2.2): see (4.16) in Section 4. However, a new concept of fading memory can still be applied. Bickel and Bühlmann (1999) set up such a new weak dependence condition to build critical bootstrap values for a linearity test in linear models. Doukhan and Louhichi (1999) have extended it to fit models such as positively dependent sequences, Markov chains (with or without topological assumptions), and Bernoulli shifts (see Definition 4.1). The Bernoulli shifts are defined in Assumption 1 of Hall and Horowitz (1996) and are used throughout that paper. The previously mentioned weak dependence conditions yield standard results concerning convergence in distribution with an
-normalization.
Another application of these results concerns linearity tests in time series analysis. Rios (1996) considers a stationary functional autoregressive model (2.1) where r = L + C is the decomposition of the autoregression function into a sum of linear (L) and nonlinear (C) components. Local linearity of r is then tested via the null hypothesis
where the weight function w has compact support.
Still another problem of interest is to test the independence of the innovations
in a regression model
This can be performed using the Durbin–Watson statistic. The latter can be written as a continuous functional of the Donsker line of the sequence
.
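As a small illustration, the Durbin–Watson statistic can be computed from a sequence of regression residuals as sketched below. The function name `durbin_watson` and the sample residuals are hypothetical, and the residuals are assumed to have been obtained beforehand from a fitted regression.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(x * x for x in e)
    if denom == 0.0:
        raise ValueError("residuals are identically zero")
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / denom


if __name__ == "__main__":
    # Alternating residuals: strong negative serial correlation.
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
    # Constant residuals: perfect positive serial correlation.
    print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```

Values close to 2 are consistent with serially uncorrelated innovations; values near 0 or 4 point to positive or negative first-order autocorrelation, respectively.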
Consider a stationary autoregressive sequence
generated by an i.i.d. sequence
,
A classical problem is to test whether there is a unit root (i.e., a = 1). In the specific context of aggregate time series, the assumption of white noise innovations seems to be rather strong. Phillips (1987) develops unit root tests for mixing and heterogeneously distributed innovations. The ordinary least squares estimate
is shown to be a continuous functional of the Donsker line of the sequence
. As an application of the FCLT, Phillips shows that a unit root test can be based on the fact that under the null hypothesis H0 : a = 1,
where W denotes the standard Brownian motion and
. The author works with stationary strong mixing sequences. The same result can be obtained in the weak dependence setting detailed in Section 4; the conditions under which Donsker-type FCLTs hold are reported in Section 5.1. This example, as the author suggests, can be generalized to error sequences
that allow for heteroskedasticity. See also Mills (1999) for a discussion of the Dickey–Fuller unit root test in autoregressive models when errors fluctuate about a nonzero mean.
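The quantities involved in the unit root test can be sketched numerically as follows. This is only an illustration under stated assumptions: the model is taken to be the AR(1) recursion X_n = a X_{n−1} + ξ_n suggested by the hypothesis a = 1, the innovations are i.i.d. standard normal, and the function names are hypothetical. The sketch computes the OLS estimate and the normalized statistic n(â − 1); it does not compute critical values of the limit law.

```python
import random


def simulate_ar1(a, n, seed=0):
    """Simulate X_t = a * X_{t-1} + xi_t with X_0 = 0 and
    i.i.d. N(0, 1) innovations (an assumption of this sketch)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ols_ar1(x):
    """OLS estimate of a in X_t = a * X_{t-1} + xi_t (no intercept)."""
    num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def unit_root_statistic(x):
    """Normalized statistic n * (a_hat - 1), where n is the number
    of (X_{t-1}, X_t) pairs."""
    n = len(x) - 1
    return n * (ols_ar1(x) - 1.0)


if __name__ == "__main__":
    walk = simulate_ar1(1.0, 2000, seed=1)      # unit root case
    stationary = simulate_ar1(0.5, 2000, seed=1)
    print(ols_ar1(walk), unit_root_statistic(walk))
    print(ols_ar1(stationary), unit_root_statistic(stationary))
```

Under the null hypothesis the statistic stays bounded in probability, whereas for |a| < 1 it diverges to −∞ at rate n, which is what makes a test based on it possible.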
GMM estimation procedures involve an estimate
, which is a solution of the arg-min problem
, where
Here
is a finite-dimensional parameter set, and g(·,·) is a given function such that
, where θ0 is the true parameter point. In the time series context, the positive semidefinite matrix Ω is often replaced (see Hall and Horowitz, 1996, equation (3.2)) by an asymptotically optimal weight matrix estimate
and κ is such that
. The statistic to test
, where
(the square root of a symmetric positive matrix is uniquely defined).
A bootstrap procedure allows one to estimate the limit distribution of an estimate.
We describe a block-bootstrap procedure that is adapted to the time series
. Let b = b(n) and l = l(n) denote the number and the length of the blocks. Then bl = n and the block j is
(see Künsch, 1989). In this construction, a suitable form of
if the process Xk = H(ξk,ξk−1,…) is a Bernoulli shift as defined in Section 4.3. Here
are obtained through filtering and estimation procedures in the simple case of a linear process ($H(z_0,z_1,\ldots) = \sum_k a_k z_k$; see Section 2.4); in the general setting, one needs to develop additional estimation procedures. To describe the asymptotic properties of such processes one needs to know the asymptotic behavior of Bernoulli shifts.
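The block construction with b blocks of length l and bl = n can be sketched as follows. This is a minimal illustration, not the exact scheme of the paper: it assumes non-overlapping blocks drawn with replacement, and the function name `block_bootstrap` is hypothetical. Overlapping-block variants in the spirit of Künsch (1989) would instead draw from all n − l + 1 windows of length l.

```python
import random


def block_bootstrap(x, block_length, seed=None):
    """Resample a series by drawing b = n / l non-overlapping blocks
    of length l with replacement and concatenating them."""
    n = len(x)
    l = block_length
    if l <= 0 or n % l != 0:
        raise ValueError("block_length must be positive and divide len(x), "
                         "so that b * l = n")
    b = n // l
    # Block j collects the observations x[j*l], ..., x[(j+1)*l - 1].
    blocks = [x[j * l:(j + 1) * l] for j in range(b)]
    rng = random.Random(seed)
    sample = []
    for _ in range(b):
        sample.extend(rng.choice(blocks))
    return sample


if __name__ == "__main__":
    series = list(range(12))
    print(block_bootstrap(series, block_length=3, seed=0))
```

Resampling whole blocks keeps the dependence structure within each block intact, which is the reason for preferring it to an i.i.d. resampling of single observations.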
A simple local conditional bootstrap is investigated by Ango Nze, Bühlmann, and Doukhan (2002). In that paper, it is shown that asymptotic properties can be obtained using the same weak dependence techniques. The following central limit theorem (CLT) holds under standard mixing assumptions:
where the diagonal matrix Σn has d entries. GMM techniques naturally involve an unknown covariance matrix. To estimate such limiting distributions it is natural to use bootstrap techniques.
Let
denote a block-bootstrap sample and let
. The expectation is taken with respect to the bootstrap distribution. The GMM estimate
solves the arg-min problem
if the matrix Ω is known.
Hall and Horowitz (1996) make the erroneous statement that such Bernoulli shifts are strong mixing. However, the procedure used by Hall and Horowitz makes the bootstrap work. The weak dependence condition as defined in Doukhan and Louhichi (1999) allows us to rigorously prove the consistency of the Hall and Horowitz procedures. More precisely, if Xn = h(εn,εn−1,…) for some i.i.d. sequence
, their Assumption 1 is
1The function h takes its values in Lx × 1 space equipped with a norm ∥·∥. The assumption is missing the symbol of mathematical expectation
, as in Andrews (2002).
This condition holds for linear processes, and it is claimed to imply geometric strong mixing by the authors. Simple example (4.16) proves that this does not hold in general. This condition, however, does imply weak dependence in Doukhan and Louhichi (1999). Consequently, a tail inequality for sums of functions of the sequence ξn = f (Xn,θ) can be derived. It is the main tool to prove the validity of the bootstrap in this dependent setting.
The preceding procedure can be used for testing the null hypothesis H0 : θ = θ0 against the bilateral alternative. The studentized statistic Tn(θ) described in (2.4) and the critical value Qα then satisfy the relation that, under H0,
Hall and Horowitz show that the bootstrap studentized statistic
and the previously mentioned statistic Tn(θ) have close laws, in the sense that
for a relevant integer 2a, with a ≥ 1 + ξ, and the range of ξ ∈ [0,1] is formulated according to the dependence assumptions prescribed. This relation comes from an Edgeworth expansion. It yields an improved acceptance rule for the test of H0:
Andrews (2002) points out that a direct computation of the bootstrap critical value Qα* is a hard problem and that the common estimating procedure, which is based upon B bootstrapped, independent copies (from the law of large numbers, it follows that
) is also difficult to implement. Indeed, the computations involve the minimization of B nonlinear functionals. A numerical improvement is brought to bear in Andrews's paper. A bootstrap estimator
is computed by applying the Newton–Raphson algorithm. The initial value is
, and k iterations are made (k ≥ 3). A bootstrap studentized statistic Tn,k* is now available, for which the computation of the critical bootstrap values Qα*(B) is much easier, because the problem is linear. The method is claimed to be as accurate as the one discussed by Hall and Horowitz. In fact, the author states a similar result to (2.6) with respect to
. The assumptions are those of Hall and Horowitz. Therefore, the previously mentioned Assumption 1 must also be read in the context of the comments we have already made about Hall and Horowitz's paper. For the sake of completeness we present a corrected version of Lemma 1 in Hall and Horowitz (1996), which is proved in Section B.4 of Appendix B.
LEMMA 2.1. Let (ξn) be a stationary ψ1-weakly dependent (see Definition 3.4) sequence with
such that
for some a > 0, as r ↑ ∞, and
. Then
2Here θr is a dependence coefficient, and it is not related to a statistical parameter denoted θ and estimated by
.
satisfies
Following precisely the same steps as in Hall and Horowitz (1996), we thus prove, by only replacing their Lemma 1 by Lemma 2.1, that bootstrapping critical values for GMM estimators are valid.
Now Theorems 1–3 in Hall and Horowitz (1996) are rigorously proved. A paper on this topic by the authors is in preparation to provide sharper results; for instance, the exponent 33 in the previous lemma is unnatural, and it can be changed.
Bickel and Bühlmann (1999) tackle the problem of the “sieve bootstrap” for a one-sided linear process
where (ξn) is a sequence of i.i.d. random variables (r.v.s) with
and the density function fξ, and where
. Under the assumption that the function
has no root in the closed unit circle, the process (2.7) admits an AR(∞) representation
with
. Equation (2.8) is fitted with an autoregressive process of finite order p(n) (p(n)/n → 0, p(n) → ∞). Using estimated residuals, the resampling (i.i.d.) innovation process
is constructed by smoothing the empirical process based on those residuals by a kernel density estimate of the density fξ. Finally, the smoothed sieve bootstrap sample
is defined by resampling the AR(p(n)) process from innovations
:
The purpose of the paper was to carry over a weak dependence property (here strong mixing) of the initial sequence
to the sieve processes
(a classic and a smoothed version were examined in the paper). The goal is unrealistic for the classic bootstrap sample because the distribution of the bootstrapped innovations is discrete. Proving a mixing property for the smoothed sieve bootstrap sample eludes the efforts of the authors. In the latter case, it nevertheless appears that limit theorems can be proved by another method. It consists of using the following property:
with
and for smooth functions g1,g2 belonging to the classes
(see equation (3.1) in Bickel and Bühlmann for the definition of the class
and some examples). The new dependence coefficient ν is less than the strongly mixing coefficient. Bickel and Bühlmann (1999) cannot prove that the sieve sequence (Xn*) is strongly mixing. A weak dependence condition is now defined by the ν coefficient. Bickel and Bühlmann prove that it is satisfied by both this sequence and a smooth version of the resampled innovations. For instance, Bickel and Bühlmann prove that if the sequence
satisfies some regularity conditions ensuring that
(recall that νk ≤ αk), then the sieve bootstrap process
satisfies a ν-mixing condition with a polynomial rate
for relevant classes
and a positive constant L. See Theorem 3.2 in Bickel and Bühlmann (1999, p. 422) for more details.
We follow the presentation in Robinson (1989). He considers an economic variable observable at time n that is an R × 1 vector of r.v.s
. We observe Wn at time n = 1 − P,2 − P,…,T where P is nonnegative and T large. Hypotheses of economic interest often involve a subset Xn = B(Wn′,…,Wn−P′) of the array (Wn′,…,Wn−P′)′; for this B is a J × (PR) matrix formed from the PR-rowed identity matrix IPR by omitting PR − J rows (which means that in B, PR − J elements of Wn,…,Wn−P are deleted). Thus, in B, elements of Wn,Wn−1,…,Wn−P that are not in Xn are deleted, and Xn can have elements in common with Xn+P−1,…,Xn+1,Xn−1,…,Xn−P. Let Xn = (Yn′,Zn′)′, where Yn and Zn are K × 1 and L × 1 vectors (K + L = J). The problem of interest is to test the hypothesis
against the alternative
. This null hypothesis is written in the form
for
for some M > 0 and some function H(z,z) defined as
for some convenient function G and where
for any Borel set A of
. Here f(j)(z) denotes the vector of j-partial derivatives of f.
An example of this framework is given by Xn = (Yn′,Zn′)′, where Yn = (tn,sn′)′ and Zn = vn. The regression model
is of common use in econometrics. Here sn,tn,vn are, respectively, scalars, p × 1, and q × 1; they are observable random sequences whereas the innovation process (un) is centered and unobservable, so that
; we denote
. In the case of a weakly dependent and stationary innovation process, Robinson (1989) considers the hypothesis H0 : β = 0. In this case, the hypothesis can be written as before, and Robinson calculates β = τ where K = p + 1, L = 1, M = 0, and G(x1,x2) = (t1 − t2)s1φ(v1) for some function
(usually φ ≡ 1). Robinson considers the statistics
constructed from the n-sample (X1,…,Xn). Here,
is a U-statistic and
is the natural estimator of the covariance matrix of
. One such estimate is
. Tapered versions might be preferred (see Robinson, 1989, formula (2.21)); here
with di,j = G(Xi,Xj)k(Zi − Zj)/h, where k(z) = h−L(k(z),h−1k(1)(z)′,…,h−Mk(M)(z)′). Under β-mixing assumptions, Robinson proves that these estimates are
-consistent and satisfy a CLT. Under a natural β-mixing condition, Robinson proves in fact that the statistic
has asymptotically a χ2-distribution if
where b > μ/(μ − 2) under the moment assumption
.
The β-mixing assumption allows us to compare the joint distribution of the initial sequence with that of a sequence of r.v.s with independent blocks. This reconstruction is due to Berbee's coupling lemma and is available whatever the size of the blocks. Yoshihara (1976) derives a covariance inequality that is suited to U-statistics. A way to get rid of β-mixing conditions is to consider an independent realization
of the trajectory X1,…,Xn. Now a simpler estimator of τ is given by
The asymptotic behavior of this expression is easy to derive under alternative weak dependence conditions by using our results because
is the numerator of a Nadaraya–Watson kernel for the regression estimation problem
in the special case of the previous example. In fact this trick avoids the corresponding coupling construction for U-statistics. For another application of semiparametric problems, see, for example, Chen and Fan (1999).
For a stationary process
with Zt = (Xt,Yt), an important quantity is the regression function
. Various methods to fit such a function have been developed. Nadaraya–Watson kernel estimates are very popular; see, for instance, Rosenblatt (1991), Prakasa Rao (1983), and Robinson (1983). Let K be a Lipschitz kernel function with compact support that integrates to 1.
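A Nadaraya–Watson estimate can be sketched as follows. This is an illustration under assumptions: the regression function is taken to be r(x) = E(Y_t | X_t = x), the kernel is the triangular kernel (one possible Lipschitz kernel with compact support that integrates to 1), and the function names and bandwidth are hypothetical.

```python
import math


def triangular_kernel(u):
    """Lipschitz kernel with compact support [-1, 1], integrating to 1."""
    return max(0.0, 1.0 - abs(u))


def nadaraya_watson(xs, ys, x0, h, kernel=triangular_kernel):
    """Kernel regression estimate of E(Y | X = x0) with bandwidth h > 0.

    Returns NaN when no observation falls within the kernel support."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    weights = [kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        return math.nan
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 2.0, 4.0, 6.0]
    print(nadaraya_watson(xs, ys, x0=1.5, h=1.0))
```

The estimate is a locally weighted average of the responses, so the bandwidth h controls the usual trade-off between bias (large h) and variance (small h).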
Among other problems, one may wish to estimate the volatility of financial time series, $v(x) = \mathrm{Var}(X_t \mid X_{t-1} = x)$. The question enters the general framework because $v(x) = v_2(x) - v_1^2(x)$, where $v_p(x) = E(X_t^p \mid X_{t-1} = x)$ for p = 1, 2.
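The decomposition of the volatility into two regression problems can be sketched as follows. The sketch assumes that v_p(x) denotes the conditional moment E(X_t^p | X_{t−1} = x) for p = 1, 2, and it estimates each moment by a kernel-weighted average with a triangular kernel; the function names and the toy series are hypothetical.

```python
def triangular_kernel(u):
    """Lipschitz kernel with compact support [-1, 1], integrating to 1."""
    return max(0.0, 1.0 - abs(u))


def conditional_moment(series, x0, h, power):
    """Kernel estimate of v_p(x0) = E(X_t^p | X_{t-1} = x0)."""
    num = 0.0
    den = 0.0
    for t in range(1, len(series)):
        w = triangular_kernel((x0 - series[t - 1]) / h)
        num += w * series[t] ** power
        den += w
    if den == 0.0:
        raise ValueError("no lagged observation within bandwidth of x0")
    return num / den


def conditional_variance(series, x0, h):
    """Plug-in volatility estimate v(x0) = v_2(x0) - v_1(x0)^2."""
    v1 = conditional_moment(series, x0, h, 1)
    v2 = conditional_moment(series, x0, h, 2)
    return v2 - v1 ** 2


if __name__ == "__main__":
    # Toy series: after a 0 the next value is +1 or -1 alternately.
    toy = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
    print(conditional_variance(toy, x0=0.0, h=0.5))
```

Because both moments use the same weights, the plug-in estimate is a weighted empirical variance and is therefore nonnegative.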
Another important problem of econometric interest is to estimate the marginal density f of a stationary sample. Density kernel estimators built on K are available. Density and regression functions of derivatives can also be estimated by using analogous procedures.
Finally, conditional quantiles are linked to the conditional distribution
. More precisely, we denote by q(t|x) = inf{y; F(y|x) > t} the generalized (right-continuous, with left limits) inverse of the monotone function y ↦ F(y|x). Consistent estimators of the conditional regression
, where
, provide information on the previous conditional quantiles.
Various generalizations of independence have been introduced to answer the econometric questions discussed in Section 2. The martingale setting was the first extension of the independence framework (Hall and Heyde, 1980); weakening martingale conditions yields mixingales. Martingale conditions are written in terms of conditional expectations, and they seem to be quite restrictive in econometric practice. NED is a more flexible tool for modeling fading memory. The ergodic theorem is the first limit theorem proved for dependent sequences. Another point of view is given by the mixing properties of stationary sequences in the sense of ergodic theory: uniform versions of such properties are the forthcoming mixing properties. Those conditions are also based on independence properties of the underlying generated σ-algebras. They are also difficult to check (see Doukhan, 1994).
Our aim is to promote the weak dependence properties, which will be seen to be much easier to prove.
Let
be a probability space and let
be two sub σ-algebras of
. Various measures of dependence between
have been introduced; among them we recall
These coefficients are, respectively, the strong mixing coefficient
of Rosenblatt (1956), the absolute regularity coefficient
of Wolkonski and Rozanov (1959, 1961), the maximal correlation coefficient
of Kolmogorov and Rozanov (1960), and the uniform mixing coefficient
of Ibragimov (1962).
Let
be a discrete time stationary process. We denote XA = {Xt; t ∈ A} the A-marginal of X with
. Finally, σ(Z) will denote the σ-algebra generated by an r.v. Z.
For any coefficient previously defined, say, c(·,·), we shall call the process X a c-mixing process if $\lim_{k\to\infty} c_{X,k} = 0$, where $c_{X,k} = c\big(\sigma(X_{]-\infty,0]}),\, \sigma(X_{[k,+\infty[})\big)$. The following relations hold:
and no reverse implication holds in general. See Doukhan (1994) for more information.
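For reference, the four coefficients are usually defined in the literature as follows. The symbols $\mathcal{U}$ and $\mathcal{V}$ for the two sub σ-algebras are an assumption of this summary, and the normalizations may differ slightly from those used by the authors.

```latex
\begin{align*}
\alpha(\mathcal{U},\mathcal{V}) &= \sup\bigl\{\,|P(U\cap V)-P(U)P(V)| \;:\; U\in\mathcal{U},\ V\in\mathcal{V}\bigr\},\\
\beta(\mathcal{U},\mathcal{V}) &= \tfrac12\,\sup\Bigl\{\,\sum_{i}\sum_{j}|P(U_i\cap V_j)-P(U_i)P(V_j)|\Bigr\},\\
\rho(\mathcal{U},\mathcal{V}) &= \sup\bigl\{\,|\mathrm{Corr}(X,Y)| \;:\; X\in L^2(\mathcal{U}),\ Y\in L^2(\mathcal{V})\bigr\},\\
\phi(\mathcal{U},\mathcal{V}) &= \sup\bigl\{\,|P(V\mid U)-P(V)| \;:\; U\in\mathcal{U},\ P(U)>0,\ V\in\mathcal{V}\bigr\},
\end{align*}
% In the definition of beta, the supremum runs over all finite partitions
% (U_i) of the sample space in \mathcal{U} and (V_j) in \mathcal{V}.
% With these normalizations the classical comparisons read
\[
2\alpha \le \beta \le \phi, \qquad 4\alpha \le \rho \le 2\sqrt{\phi}.
\]
```

With these inequalities, φ-mixing implies both β-mixing and ρ-mixing, and each of the latter two implies α-mixing.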
Let
be a real-valued process. We let
.
DEFINITION 3.1 (McLeish, 1975; Andrews, 1988). Let p ≥ 1 and let
be an increasing sequence of σ-algebras. The sequence
is called an
-mixingale if there exist nonnegative sequences
such that ψ(n) → 0 as n → ∞ and for all integers
,
This property of fading memory is easier to handle than the martingale condition. A more general concept is the NED on a mixing process. Its definition can be found in the work by Billingsley (1968), who considered functions of φ-mixing processes.
DEFINITION 3.2 (Pötscher and Prucha, 1991a, 1991b). Let p ≥ 1. We consider a c-mixing process
. For any integers i ≤ j, set
. The sequence
is called an
-NED process on the c-mixing process
if there exist nonnegative sequences
such that ψ(n) → 0 as n → ∞ and for all integers
,
This approach is developed in detail in Pötscher and Prucha (1991). Functions of MA(∞) processes can be handled using the NED concept. For instance, limit theorems can be deduced for sums of such functions of MA(∞) processes. These definitions express the fact that a k-period projection (ahead in the first definition, both ahead and backward in the second) converges to the unconditional mean. They are known to be satisfied by a wide class of models. For example, martingale differences can be described as
-mixingale sequences, and linear processes with martingale difference innovations also.
The notion of association was introduced independently by Esary, Proschan, and Walkup (1967) and Fortuin, Kasteleyn, and Ginibre (1971).
The motivations of those authors were radically different because the first group of authors was working in reliability theory and the others in statistical mechanics. The condition of the second group of authors is known as the FKG inequality.
DEFINITION 3.3. The sequence
is associated, if for all coordinatewise increasing real-valued functions h and k,
for all finite disjoint subsets A and B of
.
This extends the positive correlation assumption to model the notion that two stochastic processes have a tendency to evolve in a similar way.
This definition is deeper than simple positive correlatedness. Besides the evident fact that it does not assume that the variances exist, one can easily construct orthogonal (hence nonnegatively correlated) sequences that do not have the association property. An important difference from the preceding conditions is that, for an associated sequence, uncorrelatedness implies independence (Newman, 1984). Let, for instance, (ξk,ηk) be independent and i.i.d.
sequences. Then the sequence
defined by Xk = ξk(ηk − ηk−1) is neither correlated nor independent, and hence it is not an associated sequence. Heredity of association only holds under monotonic transformations. This unpleasant restriction will disappear under the assumption of weak dependence.
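The counterexample can be checked by simulation. The sketch below assumes that both sequences are i.i.d. standard normal (the distribution is an assumption of this illustration) and uses hypothetical function names. It compares the lag-one sample correlation of X_k, which should be close to zero, with that of X_k², which should be clearly positive because consecutive terms share η_k.

```python
import random


def simulate(n, seed=0):
    """X_k = xi_k * (eta_k - eta_{k-1}) with two independent i.i.d.
    N(0, 1) sequences (the normal law is an assumption of this sketch)."""
    rng = random.Random(seed)
    eta_prev = rng.gauss(0.0, 1.0)
    xs = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        xi = rng.gauss(0.0, 1.0)
        xs.append(xi * (eta - eta_prev))
        eta_prev = eta
    return xs


def lag1_correlation(v):
    """Sample correlation between v[t] and v[t+1]."""
    n = len(v)
    mean = sum(v) / n
    var = sum((a - mean) ** 2 for a in v) / n
    cov = sum((v[t] - mean) * (v[t + 1] - mean) for t in range(n - 1)) / (n - 1)
    return cov / var


if __name__ == "__main__":
    xs = simulate(200_000, seed=1)
    print("lag-1 correlation of X:  ", lag1_correlation(xs))
    print("lag-1 correlation of X^2:", lag1_correlation([x * x for x in xs]))
```

A nonzero correlation between the squares rules out independence, so the sequence is uncorrelated without being independent and cannot be associated.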
The preceding property of associated sequences was a guideline for the forthcoming definition of weak dependence. It contains the idea that weakly correlated associated sequences are also “weakly dependent.” The very explicit inequality (B.2) proves that this idea makes sense.
On the opposite side, negatively associated sequences of r.v.s are defined by a similar relation as the aforementioned covariance inequality, except for the sign of this inequality. Shao (2000) provides a lucid summary of this type of association. Then he points out a crucial property of domination by comparable independent sequences. This property breaks the seemingly parallel definitions of positively and negatively associated sequences. We shall develop this idea further.
Here we shall make more explicit the asymptotic independence between “past” and “future.” Roughly speaking, for convenient functions h and k, we shall assume that
is small when the distance between the “past” and the “future” is sufficiently large. Define by
the union of the sets
of numerical bounded measurable functions on some euclidean space
and ∥.∥∞ the corresponding uniform norm. We define the Lipschitz modulus of a function
by
if x = (z1,…,zu). Consider the class
DEFINITION 3.4 (Doukhan and Louhichi, 1999). A sequence
is called
-weak dependent if there exist a sequence
decreasing to zero at infinity and a function ψ with arguments
such that for any u-tuple (i1,…,iu) and any v-tuple (j1,…,jv) with i1 ≤ … ≤ iu < iu + r ≤ j1 ≤ … ≤ jv, one has
if the functions h and k are defined, respectively, on
and on
.
Notice that the sequence θ depends both on the class
and on the function ψ. The function ψ can in fact depend on all its arguments, contrary to the case of bounded mixing sequences. This definition is hereditary through images by convenient functions.
The examples of interest to follow involve the functions $\psi_1(h,k,u,v) = u\,\mathrm{Lip}(h) + v\,\mathrm{Lip}(k)$, $\psi_1'(h,k,u,v) = v\,\mathrm{Lip}(k)$, $\psi_2(h,k,u,v) = uv\,\mathrm{Lip}(h)\,\mathrm{Lip}(k)$, and $\psi_2'(h,k,u,v) = v\,\mathrm{Lip}(h)\,\mathrm{Lip}(k)$. For example, proving that an MA(∞) process $X_n = \sum_{k\ge 0} a_k\xi_{n-k}$ based on an i.i.d. sequence such that
and $\sum_k |a_k| < \infty$ is $\psi_1'$-weakly dependent with
is based on the decomposition $X_n = \tilde{X}_n + \bar{X}_n$ with $\tilde{X}_n = \sum_{k<r} a_k\xi_{n-k}$. In this case, assuming for simplicity that v = 1 and $j_1 = n$, we have
3Thanks to an anonymous referee, we prove that NED implies our weak dependence through the following inequalities. For simplicity, write h = h(Xi1,…,Xiu), k = k(Xj1,…,Xjv), then the Cauchy–Schwarz inequality gives
and the last expression can be bounded using the
-mixingale property of
-NED sequences. Clearly, this implication is not an equivalence between both notions. It is an open question whether or not these notions are equivalent.
Let
be a real-valued i.i.d. sequence and let M be some function. We consider vector-valued models driven by the equation
To justify the title of this section, note that the vector-valued sequence
, where $X_n^{(p)} = (X_{n-1},\ldots,X_{n-p})$, is Markovian. Proposition 7.6 of Kallenberg (1997) shows that any Markov process has such a representation.
Under “reasonable” assumptions (described subsequently) such models can be rewritten as ergodic Markov chains (see Meyn and Tweedie, 1993; Tjøstheim, 1990). Thus, the stationarity assumption is reachable. An interesting class is given by
where
are two mutually independent i.i.d. sequences and the function S satisfies S(x1,…,xp) ≥ s > 0 for some
, and any real numbers x1,…,xp and the functions R and S essentially satisfy contraction assumptions (for developments, see Doukhan, 1994; Ango Nze, 1995, 1998; Duflo, 1990).
For instance, ARMA(p,q) processes
have such a Markov representation in the case when the roots of the polynomial
lie outside the unit disk. Indeed,
, where
is a Markov process. See Mokkadem (1990).
A further example is that of bilinear models4
Further technical details on this topic are provided by Granger and Andersen (1978), among other references.
Examples of such models are also doubly stochastic autoregressive processes: Xn = ηn Xn−1 + ξn.
Econometricians have introduced generalized ARCH-GARCH processes:
to model conditional variances (interpreted as, e.g., an asset volatility in finance theory) that change over time (for further references, see Bollerslev, 1986). These models are known to satisfy the NED property of Definition 3.2.
Note that functional autoregressive models correspond to constant functions s (see Bollerslev, 1986).
Moreover, threshold models are those for which r is linear on a partition of the space into polygonal regions. For example, Petruccelli and Woolford (1984) study threshold autoregressive models such as
where $x^+ = \max(x,0)$ and $x^- = \min(x,0)$. This model is ergodic if a < 1, b < 1, and ab < 1 and has geometric rates of convergence in total variation to the stationary limit if the centered sequence (ξn) has a finite exponential moment and its distribution has a density with respect to Lebesgue measure. If, for instance, (a,b) = (½,−2), the function $r(x) = ax^+ + bx^-$ is relevant, but it is not a contraction.
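The case (a,b) = (½,−2) can be explored with a short simulation. The sketch assumes the recursion X_n = r(X_{n−1}) + ξ_n with r(x) = ax⁺ + bx⁻, and it takes the innovations uniform on [−1,1], which is one centered law with a density and finite exponential moments; the function names are hypothetical.

```python
import random


def r(x, a, b):
    """Threshold autoregression function r(x) = a * x^+ + b * x^-,
    with x^+ = max(x, 0) and x^- = min(x, 0)."""
    return a * max(x, 0.0) + b * min(x, 0.0)


def in_ergodic_region(a, b):
    """Ergodicity condition quoted in the text: a < 1, b < 1, ab < 1."""
    return a < 1 and b < 1 and a * b < 1


def simulate_tar(a, b, n, seed=0):
    """X_t = r(X_{t-1}) + xi_t with X_0 = 0 and xi_t uniform on [-1, 1]
    (the innovation law is an assumption of this sketch)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(r(x[-1], a, b) + rng.uniform(-1.0, 1.0))
    return x


if __name__ == "__main__":
    a, b = 0.5, -2.0
    print("ergodic region:", in_ergodic_region(a, b))
    # r is not a contraction: its slope on the negative half-line is |b| = 2.
    print("|r(-1) - r(0)| =", abs(r(-1.0, a, b) - r(0.0, a, b)))
    path = simulate_tar(a, b, 10_000)
    print("range of the path:", min(path), max(path))
```

With these bounded innovations the path started at 0 stays in [−1,3]: a negative value is at least −1 and is sent above zero by the slope −2, after which the slope ½ pulls the chain back, which is how stability can hold without a global contraction.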
ARCH or GARCH models are those with nonconstant functions s, such as square roots of nonnegative polynomials with degree 2, namely,
with |a| + |c| < 1. Vector-valued versions of such models can also be described. They include GARCH models. He and Teräsvirta's paper (1999) looks at the existence of marginal moments and conditions for stationarity of GARCH models. The following example of a Markovian nonmixing sequence is given in Andrews (1984) and Rosenblatt (1985). This is the (Markov) AR(1)-process with binomial innovations
:
This is also the Bernoulli shift Xn = H(ξn,ξn−1,…) with
. Full definitions of Bernoulli shifts will be given in Section 4.3. This model has the uniform distribution on the interval [0,1] as its stationary distribution, but it satisfies no mixing condition. Indeed, the innovations $\xi_j$ ($j \le n$) are the digits of the dyadic expansion of $X_n$; hence, $X_k$ is a deterministic function of $X_n$ for $k \le n$. An extension of this model to innovations taking p different values is immediate; for this, one can use the base-p expansion. The process $X_n = 0.\xi_n\xi_{n-1}\ldots$ is the solution of the recurrence equation $X_n = (1/p)(X_{n-1} + \xi_n)$ if the innovations are uniform on {0,1,…,p − 1}.
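The fact that the past is a deterministic function of the present can be verified exactly with rational arithmetic. The sketch below uses the recurrence X_n = (1/p)(X_{n−1} + ξ_n) with digits in {0,…,p − 1}; the function names are hypothetical and the starting value is arbitrary in [0,1).

```python
import math
from fractions import Fraction


def step(x_prev, xi, p=2):
    """One step of the recurrence X_n = (X_{n-1} + xi_n) / p."""
    return (x_prev + xi) / p


def invert(x, p=2):
    """Recover (X_{n-1}, xi_n) from X_n alone: xi_n is the first base-p
    digit of X_n and X_{n-1} is the remaining fractional part."""
    scaled = p * x
    xi = math.floor(scaled)
    return scaled - xi, xi


def forward_path(x0, innovations, p=2):
    """Iterate the recurrence from x0 along the given innovations."""
    xs = [x0]
    for xi in innovations:
        xs.append(step(xs[-1], xi, p))
    return xs


if __name__ == "__main__":
    innovations = [1, 0, 1, 1, 0, 1]
    xs = forward_path(Fraction(1, 3), innovations)
    x = xs[-1]
    recovered = []
    for _ in innovations:
        x, xi = invert(x)
        recovered.append(xi)
    # The innovations come back in reverse order, most recent first.
    print(recovered, x)
```

Since the whole past can be read off from X_n, events in the distant past remain measurable with respect to the present, which is the mechanism that prevents any mixing coefficient from tending to zero.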
Lipschitzian models (see Duflo, 1990) are multivariate Markov models, defined recursively through Xn = M(Xn−1,ξn) and the assumptions that
for some 0 ≤ a < 1 and S ≥ 1. Here, the
-space where the process lives is equipped with some (not necessarily euclidean) norm ∥·∥. Duflo (1990) introduces the concept of stability of such Markov chains. She proves their geometric stability. That is, denoting Φn(X0) to be precisely the initial state X0, there exists some c ∈ [0,1[ such that for any
In the particular case where X0's distribution is the stationary probability measure,
Using those results, Doukhan and Louhichi (1999) deduce
-weak dependence. In fact, under the assumptions that follow, one has
. Here neither stationarity nor any further regularity assumption on the sequence of innovations is required. Such contraction properties are also used by Pötscher and Prucha (1991). More general AR(p) nonlinear models, Xn = M(Xn−1,…,Xn−p;ξn), have the same properties, if, for example,
and, for some constants aj ≥ 0,1 ≤ j ≤ p with
,
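The contraction mechanism behind these Lipschitzian models can be illustrated with a coupling experiment. The sketch below is an assumption-laden example, not the construction of Duflo (1990): it uses the hypothetical map M(x, ξ) = a·tanh(x) + ξ, which is a-Lipschitz in x, runs two chains from different initial states with the same innovations, and checks that their distance is bounded by a^n times the initial distance.

```python
import math
import random


def step(x, xi, a=0.5):
    """One update X_n = M(X_{n-1}, xi_n); M is a-Lipschitz in x because |tanh'| <= 1."""
    return a * math.tanh(x) + xi


def coupled_gap(x0, y0, n_steps, a=0.5, seed=0):
    """Run two chains from x0 and y0 driven by the same innovations.

    Returns the list of distances |X_n - Y_n| for n = 0, ..., n_steps.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    gaps = [abs(x - y)]
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # Gaussian innovations, an illustrative choice
        x, y = step(x, xi, a), step(y, xi, a)
        gaps.append(abs(x - y))
    return gaps


gaps = coupled_gap(x0=-5.0, y0=5.0, n_steps=30)
bounds = [gaps[0] * 0.5 ** n for n in range(len(gaps))]
print(all(g <= b + 1e-12 for g, b in zip(gaps, bounds)))
```

The geometric forgetting of the initial state is what drives the geometric stability and the weak dependence bounds; no regularity of the innovation distribution enters the argument.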
The more recent papers by Diaconis and Freedman (1999) and Jarner and Tweedie (2001) provide a wide range of examples in this spirit. Alsmeyer and Fuh (2001) give conditions for arithmetic decay of the weak dependence coefficient sequence. Both papers study iterated random sequences Mn = F(εn,Mn−1) for independent sequences (εn) and some F, measurable and Lipschitz in the second variable. The process (Mn) takes values in a complete separable metric space (E,d) and forms a Markov chain. Under the assumption of existence of a unique invariant distribution π, both papers prove, using different methods, that
if for some x0 ∈ E and p > 0
The distance
is the Prohorov metric associated with d.
Mixing properties of the models with a Markovian representation, Xn = M(Xn−1,…,Xn−p,ξn), are described in Meyn and Tweedie (1993). The preceding models are ergodic under suitable assumptions on ξ0's distribution.
Assume that
and assume the existence of an almost surely nonvanishing density f for ξ0's law. Then, under contraction assumptions on the function M, one can prove that, under the invariant initial distribution,
. If M(Xn−1,…,Xn−p,ξn) = R(Xn−1,…,Xn−p) + ξn, then the second relation holds if, for example,
and R is continuous. (It is enough that R be continuous outside a null set; e.g., a piecewise continuous function R is admissible, as in the previously mentioned threshold model of Petruccelli and Woolford, 1984. For more details, see Doukhan, 1994.) In fact, Davydov (1973) has proved that Harris recurrent Markov chains are ergodic and β-mixing when stationary; moreover, denoting by μ the stationary distribution of the Markov chain Xn, by P the transition probability kernel, and by ∥·∥TV the norm in total variation, one has that
Returning to the more specific models introduced before, Ango Nze (1995) proves that (4.18) implies the β-mixing property under the assumption that ξ0's distribution has a density with respect to Lebesgue measure. The mixing coefficients decrease at a geometric rate. If, moreover, p = 1 in the preceding (functional AR(1)) model, he proves (see also Doukhan, 1994) that, under the previous assumptions on the white noise (ξn),
The expression
for some constants c, A > 0, and any real number x. The function R is continuous. A more general result is obtained in Ango Nze (1998) if
. Veretennikov (1999) improves on the previous hyperbolic mixing decay assumptions. Under a local Doeblin condition (implied by the preceding absolute continuity assumptions on ξ0's distribution), he proves that
if b < S/2 − 1 where S > 4 satisfies
. The existence of the stationary distribution is proved under the relaxed condition S ≥ 2. Improved results are provided in Fort and Moulines (2002); they are clarified in the work by Jarner and Tweedie (2001), where constants are explicitly given.
The expression
for some constants B < e−b and A > 0 and any real number x. The function R is continuous.
If the innovations have a finite exponential moment,
, Mokkadem (1990) proves that the assumption |R(x)| ≤ |x| − ε, for some ε > 0 and |x| large enough, ensures an analogous result: the mixing sequence (βn) decays at an exponential rate.
The expression
if R(x) is a bounded function, continuous outside a null set, and ξ0's law is not orthogonal to Lebesgue measure; moreover, the stationarity is no longer required. Unfortunately, this drastic boundedness condition excludes, for example, the linear autoregressive processes.
The preceding results provide (upper) bounds for the mixing coefficients. It is a much harder problem to derive both upper and lower bounds of the mixing sequences from the assumptions about the model; for results about some types of Markov sequences, see Davydov (1973). Meyn and Tweedie (1993) also provide necessary and sufficient conditions for geometric ergodicity of threshold autoregressive linear processes (the function R is piecewise linear; see also Cline and Pu, 1998). Doubly stochastic AR models are geometrically ergodic if ξ's distribution has an absolutely continuous component and
(see Tjøstheim, 1986).
For the other models, we refer to Pham (1986), Doukhan (1994), and Ango Nze (1998).
Associated sequences with finite variances are
-weak dependent with θr = supi≥r Cov(X0,Xi) (see Doukhan and Louhichi, 1999). Note that broad classes of examples of associated processes result from the fact that any independent sequence is associated and that monotonicity preserves association (for this, see Newman, 1984).
The case of Gaussian sequences is analogous. One may also consider combinations of sums of Gaussian and associated sequences, or Bernoulli shifts driven by stationary, associated, instead of i.i.d. sequences.
Linear processes with nonnegative coefficients are associated, and so are functional autoregressive processes with nondecreasing regression functions. Note that for associated or Gaussian sequences, the function ψ2′ replaces ψ2 if θr = supi≥r|Cov(X0,Xi)| is replaced by θr = ∑i≥r|Cov(X0,Xi)|.
Giraitis, Kokoszka, and Leipus (2000) consider ARCH(∞)-models (4.19) with nonnegative coefficients and nonnegative inputs. In that case the models are also associated.
DEFINITION 4.1. Let
be a sequence of i.i.d. real-valued r.v.s and the function
be measurable. The sequence
is called a Bernoulli shift if it is defined by
.
We refer the reader to Ornstein and Weiss (1990), where such models are motivated through deep ergodic theoretic arguments.
One-sided shifts are defined as Xn = H(ξn,ξn−1,ξn−2,…,ξ0,ξ−1,ξ−2,…), that is,
. The model described in equation (4.16) is an example of such a shift:
. The previous model is a simple example of a weakly dependent but possibly nonmixing sequence.
A general situation where sequences are one sided is the following Markov stationary setting. Consider a Markov process driven by the updating equation Xt = M(Xt−1,ξt), for some i.i.d. sequence
; then the function H, if it exists, is defined implicitly by the relation H(x) = M(H(x′),x0), where x = (x0,x1,x2,…) and x′ = (x1,x2,x3,…). To consider more general Markov sequences one may also refer to the previous section devoted to Markov processes. To prove such Bernoulli shift representations, Mokkadem (1990) and Meyn and Tweedie (1993) use the tools of control theory.
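The implicit relation H(x) = M(H(x′),x0) can be verified on the simplest case. The sketch below assumes a linear AR(1) update M(x, ξ) = A·x + ξ with a hypothetical coefficient A = 0.6, for which H(x) = Σk A^k xk, and checks the relation on a truncated input sequence. The truncation depths are chosen so that both sides involve exactly the same terms.

```python
import random

A = 0.6  # hypothetical autoregression coefficient, |A| < 1


def M(x_prev, xi):
    """Updating map of the chain X_t = M(X_{t-1}, xi_t)."""
    return A * x_prev + xi


def H(x, depth):
    """Truncated Bernoulli shift H(x) = sum over k < depth of A**k * x[k]."""
    return sum(A ** k * x[k] for k in range(depth))


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(201)]  # x = (x0, x1, x2, ...)
x_shifted = x[1:]                              # x' = (x1, x2, x3, ...)

lhs = H(x, depth=201)
rhs = M(H(x_shifted, depth=200), x[0])
print(abs(lhs - rhs) < 1e-9)
```

For nonlinear M no closed form of H is available in general, but the same fixed-point relation can be iterated backward from a distant starting value when M is a contraction.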
We now specialize the analysis to chaotic expansions associated with the discrete chaos generated by the sequence
. Let
; we write in a condensed formulation
, where H(k)(x) denotes the kth-order chaos contribution; H(0)(x) = a0(0) is only a centering constant and, for k > 0,
or in short, in vector notation,
.
Processes associated with a finite number of chaotic terms (i.e., H(k) = 0 if k > k0) are also called Volterra processes. The first example of such a Volterra process is clearly the class of linear processes that includes autoregressive moving average (ARMA) processes: it corresponds to the consideration of just a term in the first chaos (i.e., k = 1 in the previous representation); it is widely used in the field of statistics (see, e.g., Rosenblatt, 1985). A simple and general condition for
-convergence of such series is, still in a condensed notation,
.
The simple bilinear process, Xt = (a + bξt−1)Xt−1 + ξt, is stationary if
(see, e.g., Tong, 1981). It is a Bernoulli shift with
, for
.
More general affine models are considered in Mokkadem (1990).
ARCH(∞)-models (see Giraitis et al., 2000) are given by a sequence (bj)j≥1 and an i.i.d. sequence of r.v.s (ξj)j≥0 through the recursive relation
Such models have a stationary representation with the chaotic expansion
under the simple assumption
.
Finite moving averages Xn = H(ξn,ξn−1,…,ξn−m) are trivially m-dependent.
The Bernoulli shift Xn = H(ξn,ξn−1,…) (with
) is not mixing; this is again example (4.16) of a Markovian, nonmixing,
Footnote 5. Its stationary representation is $X_n = \sum_{k \ge 0} 2^{-(k+1)} \xi_{n-k}$. Here ξn−k is the (k + 1)th digit in the binary expansion of the uniformly chosen number Xn = 0.ξnξn−1… ∈ [0,1]. This proves that X0 is a deterministic function of Xn for every n ≥ 0, which is the main argument to derive that such models are not mixing. The same arguments apply to the model described before of an autoregressive process with innovations taking p distinct values.
Hence one cannot expect a sufficient condition for mixing in such weak shifts.
Contrary to mixing conditions, it can be proved that even two-sided sequences can be
-weak dependent. For instance, for infinite moving averages
. Note also, for completeness, that (NED) conditions can also be deduced for such two-sided models. More generally, we can state the following definition.
DEFINITION 4.2. For any integer k > 0, we denote by δr any number such that
Such sequences
are related to the modulus of uniform continuity of H; that is, if for positive constants
, the inequality
holds for any sequences
, and if the sequence
has a finite moment of order b, then one can choose
.
PROPOSITION 4.1 (Doukhan and Louhichi, 1999). Bernoulli shifts are
-weak dependent with $\theta_r = 2\delta_{r/2}$ and ψ(h,k,u,v) = 4(u∥k∥∞ Lip(h) + v∥h∥∞ Lip(k)). If, moreover, the Bernoulli shift is one sided, then it is
-weak dependent with θr = δr and ψ(h,k,u,v) = 2v∥h∥∞ Lip(k).
We turn back to Volterra expansions. A suitable bound for δr corresponds here to the stationarity condition
The one-sided example of a simple bilinear process, Xt = (a + bξt−1)Xt−1 + ξt, with convergent chaotic representation for
satisfies $\delta_r = \theta_r = c^r(r + 1)/(1 - c)$; it has a geometric rate of decay under a stationarity condition set out by Tong (1981). The stochastic volatility model
is another example yielding a one-sided chaotic decomposition. The sequence (ηj) is assumed to consist of independent r.v.s and to be independent of the centered reduced sequence (ξn). The chaotic representation converges if
.
Using suitable assumptions on the law of ξ0, the one-sided linear processes
satisfy β-mixing conditions. This requires the absolute continuity of ξ0's density. If
, and if, moreover, for some
, then
, for some C > 0 (Pham and Tran, 1985). See Doukhan
Footnote 6. If
, then, under the preceding regularity and moment conditions, we have $\beta_n \sim n^{-b}$, where b = ((a − 2)δ − 1)/(1 + δ) and a > 2 + 1/δ. Therefore,
holds if a > 3 + 2/δ. If, for instance, δ = 1, this becomes a > 5; on the other hand, when δ = ∞, this becomes a > 3.
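The arithmetic in this footnote can be checked directly. The sketch below reads the omitted condition as summability of the sequence (βn), which is an assumption here; for a hyperbolic decay n^(−b) this amounts to b > 1. Solving ((a − 2)δ − 1)/(1 + δ) > 1 gives (a − 2)δ > 2 + δ, that is, a > 3 + 2/δ. The function names are illustrative.

```python
import math


def beta_decay_exponent(a, delta):
    """Exponent b in beta_n ~ n**(-b), with b = ((a - 2) * delta - 1) / (1 + delta).

    For delta = infinity the limiting value a - 2 is returned.
    """
    if math.isinf(delta):
        return a - 2.0
    return ((a - 2.0) * delta - 1.0) / (1.0 + delta)


def summability_threshold(delta):
    """Value of a above which b > 1, namely 3 + 2 / delta."""
    return 3.0 + (0.0 if math.isinf(delta) else 2.0 / delta)


def is_summable(a, delta):
    """True when the hyperbolic rate n**(-b) is summable, i.e. when b > 1."""
    return beta_decay_exponent(a, delta) > 1.0


for delta in (1.0, 2.0, math.inf):
    print(delta, summability_threshold(delta))
```

At the threshold itself the exponent equals 1 exactly, so the series diverges there and the inequality on a has to be strict.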
The aim of this section is to present the state of the art of the limit theorems for stationary sequences.
Consider a stationary sequence
. We assume that this sequence is integrable and centered at expectation
Denote by [x] the integer part of a real number x ([x] ≤ x < [x] + 1). The Donsker line (Dn(t))t∈[0,1] is defined for any sample with positive size n as the following continuous time process:
We consider the following convergence result in the space
of continuous functions on the unit interval when the sample size n grows to infinity.
THEOREM 5.1. The following functional convergence holds in the space
under any of the weak dependence conditions formulated subsequently:
Here,
(the series is assumed to be convergent).
Recall that here (Wt)t∈[0,1] is the standard Brownian motion; that is, W denotes the centered Gaussian real-valued process with covariance function
To avoid triviality we shall also assume that σ ≠ 0.
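For readers who want to experiment with Theorem 5.1, the Donsker line can be computed as follows. This is a minimal sketch that assumes the usual polygonal form, Dn(t) = n^(−1/2)(S[nt] + (nt − [nt])X[nt]+1), where Sk is the kth partial sum; the displayed definition is not reproduced here, so the formula and the function name should be read as assumptions.

```python
import math


def donsker_line(xs, t):
    """Polygonal partial-sum process D_n(t) for a sample xs and t in [0, 1].

    Assumed form: (S_[nt] + (nt - [nt]) * X_{[nt]+1}) / sqrt(n).
    """
    n = len(xs)
    k = min(int(n * t), n)             # integer part [nt]
    partial_sum = sum(xs[:k])          # S_[nt]
    fraction = n * t - k               # nt - [nt]
    next_term = xs[k] if k < n else 0.0
    return (partial_sum + fraction * next_term) / math.sqrt(n)


sample = [1.0, -1.0, 2.0, 0.5]
print([donsker_line(sample, j / 8) for j in range(9)])
```

The linear interpolation between the points k/n is what makes the path continuous, so that the convergence can be stated in the space of continuous functions on the unit interval.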
The preceding FCLT is known to hold in the cases that follow.
Recall that the quantile function Q of the distribution of X0 is also the càdlàg (right continuous with left limits) inverse of the tail function
and that α−1 is the càdlàg inverse of the monotone function t → α[t]. The condition
implies the FCLT in Theorem 5.1 and also the convergence of the series in the definition of σ2 (for more details, see Rio, 2000). Let δ > 0. Assume that the moment of order 2 + δ of X0 is finite. Condition (DMR) is equivalent to
The previous FCLT result was obtained by Davydov (1973) under the slightly stronger assumption
. In other words, both series converge for the same hyperbolic mixing decays $\alpha_n \sim n^{-a}$ with a > 1 + 2/δ. Note that no gain seems to be obtained here when one considers β-mixing sequences.
The condition
with
implies the FCLT, as proven by Shao (1988). It is well known that the preceding conditions ensure nice behavior of second-order moments.
The condition
implies the functional convergence (see Newman and Wright, 1981). Clearly, this condition is also necessary to ensure that Theorem 5.1 holds.
Notice that the property “orthogonality implies independence” makes this condition credible for this very special case of an associated sequence.
The FCLT is phrased for a strictly stationary negatively associated sequence in identical terms. In fact, the tightness of
is shown using an exponential inequality that results from a comparison theorem on moment inequalities proved by Shao (2000).
We consider two-sided linear sequences. If the coefficients in the linear process
satisfy
, then
is the decorrelation rate of the sequence. The FCLT holds under the condition D > 1 (see Giraitis and Surgailis, 1986).
Giraitis and Surgailis (1986) prove the result for the partial sums of Φ(Xk) if Φ is some polynomial function with Appell rank m (see definition that follows). Recall that Appell polynomials are defined through the relation
Like Hermite polynomials, they satisfy a recursive relation: for all
,
Setting
, one obtains, for instance,
Let F denote the distribution function of X0. An analytic function Φ such that
has a uniquely defined Appell expansion:
Now, if the distribution function F is infinitely differentiable, then, setting f = F′ for the density function, one obtains
. A straightforward integration by parts yields
This means that the system of functions (Ak,Qk)k≥0 is biorthogonal.
It is suitable to define the Appell rank of Φ as the smallest integer m such that cm ≠ 0. Appell rank is thus uniquely defined at least for polynomials. The system of Appell polynomials is not orthogonal (except for the special case of Hermite polynomials, which are associated with Gaussian distributions). Hence existence and uniqueness of such expansions follow from additional conditions such as analyticity. We refer to Giraitis and Surgailis (1986) for details. Giraitis and Surgailis (1986) assume the existence of moments of any order and
. The functional convergence is ensured by the Chentsov tightness criterion, given in Appendix B. The reason is that the method of moments is used to prove the CLT.
Concerning one-sided sequences, Ho and Hsing (1997) obtain an analogous CLT for more general nonlinear functionals of a one-sided linear sequence. The idea is to approximate such nonlinear functions of a one-sided linear sequence by m-dependent moving averages that are easily shown to satisfy an
-CLT. The main assumptions are
and the following regularity condition. The regularity is twofold. For any subset J of the set
of nonnegative integers, define
.
A C1-condition on the functions
is required if Card(J) is large enough. Hence, even a very weak regularity condition on the marginal innovations such as
for a small δ > 0 implies the regularity of the distribution of YJ, hence the regularity needed on ΦJ's. Now Burkholder's inequality for martingales with
still yields the Chentsov tightness criterion.
Such linear sequences are also
-weakly dependent sequences. Therefore, one may refer to Section 4.3.
The same finite-dimensional CLT as for linear processes holds for instantaneous functions of Gaussian stationary sequences when one replaces m by the Hermite rank of an arbitrary function Φ such that
(see Breuer and Major, 1983).
However, Donsker's Theorem 5.1 requires an additional tightness condition. Chambers and Slud (1989) introduce such a condition in terms of the coefficients of Φ's Hermite expansion. Assume that
, where
. This means that an exponential decay of the coefficients is needed to obtain Donsker's theorem.
Chambers and Slud provide a result for general stationary processes that are built on a Gaussian chaos. Such functionals may fail to be instantaneous functions Φ(Yn). They can be written as general Bernoulli shifts of
:
The authors also consider instantaneous functionals of Gaussian sequences satisfying CLT but not the Donsker theorem.
Recall, however, that a smooth lower bound assumption on the spectral density of the process yields ρ-mixing. Hence Theorem 5.1 still holds under slight additional conditions on the Gaussian process. The assumption concerning the function Φ is unchanged:
.
Assume a
-weak dependence condition with
, for the stationary sequence
. Suppose also that for some
.
Then if the function ψ associated with weak dependence is ψ1 (respectively, ψ2), Doukhan and Louhichi (1999) prove the FCLT if
So, without any regularity condition on innovations, Theorem 5.1 holds for a bounded Lipschitz function of a linear process if
when D > 3. The latter does not need to be one sided, whereas Ho and Hsing (1997) need this assumption. Moreover, the functions Φ available are more general than those considered by Giraitis and Surgailis (1986).
Finally, a faster hyperbolic decay of coefficients in place of the boundedness assumption for Φ together with the finiteness of the fourth-order moment for the innovations yields the FCLT.
Define the class
of real-valued functions
Assuming
-weak dependence, we can even improve on the preceding condition by the following uniform bound on the set of integers i ≤ j ≤ k ≤ l and j + r ≤ k
In this section, we consider conditions in terms of conditional expectations with respect to an adapted filtration. We first recall that Theorem 5.1 holds for martingales with stationary square integrable increments such that
(see Billingsley, 1968). More generally, let
be a process adapted to the filtration
is
-measurable for any
. The following result is proved by Dedecker and Rio (2000). No ergodicity assumption is required. Let
denote the right shift operator (i.e.,
). Denote by
the tail σ-algebra of T-invariant Borel sets of
.
THEOREM 5.2. Assume that
for any
and
is a convergent series in
. Denote by
the partial sums. Then the sequence
converges in
to some r.v. η and, conditionally on the tail σ-algebra
, the process
converges to the Brownian motion ηWt.
Remark. This result provides an FCLT with a limit process that is not Gaussian in general. If the sequence is ergodic, a standard Donsker theorem holds. Indeed, the ergodicity assumption implies that the r.v. η is almost surely constant. Hall and Heyde (1980) give this theorem under a more restrictive
assumption: both series
converge in
.
Theorem 5.1 under the condition (DMR) stated in Section 5.1.1 can also be derived as a corollary of Theorem 5.2.
The following corollary can be deduced from Theorem 5.2. Consider a stationary Markov sequence
with stationary distribution μ and transition operator P. Let Xn = g(ξn) be a centered nonlinear functional of
. Then the FCLT holds under the condition of convergence of the series
.
Let us consider a stationary sequence
. We assume without loss of generality that the marginal distribution of this sequence is uniform on [0,1]. The cumulative distribution of the empirical process, En, of the sequence
at time n is defined as (1/n)En(x), where
We consider the following convergence result in the Skorokhod space
when the sample size n converges to infinity:
Here
is the dependent analogue of a Brownian bridge; that is, B denotes the centered Gaussian process with covariance function given by
Note that for independent sequences with a marginal cumulative distribution function F, this just means that B(x) = B(F(x)) for some standard Brownian bridge B; this justifies the name generalized Brownian bridge.
THEOREM 5.3. The following functional convergence holds in the Skorokhod space of real-valued càdlàg functions on the real line,
, under the weak dependence conditions detailed in the next sections:
The preceding covariance function can be rewritten as
For the i.i.d. case, it is equivalent to Γ(x,y) = Γiid(x,y) = F(x) ∧ F(y) − F(x)F(y); as the minimum of two regular functions, the term F(x) ∧ F(y) is intrinsically singular on the diagonal x = y. This is no longer the case for the other terms
. If, for instance, the second-order marginals (X0,Xk) have a continuous joint density, then Tk(x,y) is a C2-function. It is well known that the regularity properties of a Gaussian process are determined by those of its covariance function. Hence, the distortion of the Brownian bridge due to this series is not very important. Either finite-dimensional or empirical functional convergence (EFCLT) is known to hold in the following cases.
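The singularity of the i.i.d. part of the covariance on the diagonal is easy to see numerically. The sketch below evaluates Γiid(x,y) = F(x) ∧ F(y) − F(x)F(y) for a uniform marginal on [0,1] (the default `cdf` argument is an illustrative choice) and compares the one-sided slopes in x at the point x = y = 1/2.

```python
def gamma_iid(x, y, cdf=lambda u: min(max(u, 0.0), 1.0)):
    """Covariance of the Brownian bridge part: F(x) ^ F(y) - F(x) * F(y)."""
    fx, fy = cdf(x), cdf(y)
    return min(fx, fy) - fx * fy


def one_sided_slopes(y, step=1e-3):
    """Left and right difference quotients of x -> gamma_iid(x, y) at x = y."""
    center = gamma_iid(y, y)
    left = (center - gamma_iid(y - step, y)) / step
    right = (gamma_iid(y + step, y) - center) / step
    return left, right


left_slope, right_slope = one_sided_slopes(0.5)
print(left_slope, right_slope)
```

For the uniform marginal the left slope at y is 1 − y and the right slope is −y, so the kernel has a kink of size 1 across the diagonal whatever the value of y. The cross terms of a dependent sequence with smooth joint densities add no further kink.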
The condition
implies finite-dimensional convergence. The EFCLT holds if, for some a > 1,
This result, proved by Rio (2000), improves on the previous conditions
formerly given by Shao and Yu (1996) and on the condition a > 3 from Yoshihara (1973). This condition is close to the previous summability, necessary to ensure finite-dimensional convergence.
The condition
implies finite-dimensional convergence. Doukhan, Massart, and Rio (1995) obtain the EFCLT Theorem 5.3 when
, for some a > 2. Here tightness obtains with an additional loss term of order (log2 n). Finally, Rio (2000) obtains the simple (and optimal) sufficient condition for EFCLT
In a previous paper Arcones and Yu (1994) have proved CLTs for empirical processes indexed by so-called V.C. subgraph classes of functions, not necessarily bounded (for more details, see van der Vaart and Wellner, 1996, p. 141). This context contrasts with the common conditions in terms of bracketing numbers. The EFCLT obtains for uniformly bounded classes in the pth mean under the β-mixing condition that
The condition
implies finite-dimensional convergence (see Peligrad, 1987). Shao and Yu (1996) obtain the EFCLT under the same condition.
Let
be a standard Gaussian stationary process:
. Consider a function Φ with Hermite rank m such that
. Csörgő and Mielniczuk (1996) prove the EFCLT for the subordinated field Xn = Φ(Yn) under the natural condition
Hence even if the indicator functions do not satisfy the additional tightness assumption in Chambers and Slud (1989), the tightness of the empirical process follows from the diagram formula. In this case the Kolmogorov–Smirnov statistics are convergent even if the Donsker theorem does not hold for finite-dimensional distributions of the empirical cumulative process.
We assume that the marginal distribution of X0 is uniform on an interval [0,1]. Then the condition
implies finite-dimensional convergence. Louhichi (2000) obtains the EFCLT Theorem 5.3 under the condition that, for some a > 4,
Her result improves on the condition of Shao and Yu (1996):
.
The EFCLT Theorem 5.3 holds if, for some 0 < γ ≤ 1 and S,C,Δ > 0, with SΔ > 2γ, we have
If the innovations have moments of any order, the existence of some Δ > 0 such that the preceding inequality holds implies the result.
On the other hand, a lower order moment assumption for the innovation allows higher regularity properties. If, for example, γ = 1 and S = 2, the inversion formula shows the existence of a
-integrable density. If now γ = 1 and S = 4, the density must be
-integrable.
For γ = 1, this result recovers the proof of the EFCLT in Doukhan and Surgailis (1998) when the covariance series is absolutely convergent. The latter paper considers the case S = 4; however, Burkholder's inequality for martingales yields the general result in a straightforward way. Giraitis and Surgailis (1994) gave a hint of the available results in a long memory dependence framework.
The sequence
is assumed to satisfy a weak dependence condition we now present:
where
(in this case a weak dependence condition holds for a class of functions
working only with the values u = 1 or 2).
PROPOSITION 5.1. Let
be a stationary sequence such that (5.20) holds with
for some ν > 0. Then the sequence of processes
is tight in the Skorokhod space
.
In the same way, stationary mixing sequences satisfy the conditions of Proposition 5.1 if
. This condition is slightly sharper than Yoshihara's condition,
for some ν > 0. Yet, it is slightly sharper than the corresponding result in Shao and Yu (see Theorem 2.2 therein), and the result of Rio (2000) improves on both of them.
THEOREM 5.4. Suppose that
is
-weak dependent. If either
, then the empirical functional convergence holds.
Remark. The use of the space
allows one to work with each of the classes of models in the previous section (association and Gaussian sequences enter the first case, whereas the second one corresponds to Bernoulli shifts). This yields new results for Bernoulli shifts and apparently for Markov sequences. Note that Rio's condition for α-mixing sequences improves this result. Moreover, Yu (1993) proves the same result for associated sequences with an exponent loss term 1.5.
We consider a stationary process
with Zt = (Xt,Yt) where
. The quantity of interest is the regression function
. Let K be some kernel function integrating to 1, Lipschitzian with compact support. The kernel estimators are defined by
Here
is a sequence of positive real numbers. We always assume that hn → 0, nhn → ∞ as n → ∞.
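As an illustration, the kernel estimators can be coded in a few lines. The sketch below assumes the usual forms in dimension one: a density estimate f̂(x) = (nh)^(−1) Σ K((x − Xt)/h), a numerator estimate ĝ(x) = (nh)^(−1) Σ Yt K((x − Xt)/h), and the ratio r̂ = ĝ/f̂. The Epanechnikov kernel is one admissible choice (Lipschitz, compactly supported, integrating to 1); the function names and the toy data are assumptions.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_estimates(xs, ys, x, h, kernel=epanechnikov):
    """Return (f_hat, g_hat, r_hat) at the point x for bandwidth h.

    f_hat estimates the marginal density of X, g_hat estimates g = f * r,
    and r_hat = g_hat / f_hat estimates the regression function r.
    """
    n = len(xs)
    weights = [kernel((x - xt) / h) for xt in xs]
    f_hat = sum(weights) / (n * h)
    g_hat = sum(w * yt for w, yt in zip(weights, ys)) / (n * h)
    r_hat = g_hat / f_hat if f_hat > 0 else float("nan")
    return f_hat, g_hat, r_hat


# Toy deterministic design: X on a regular grid, Y = 2X + 1 without noise.
grid = [i / 100 for i in range(101)]
responses = [2.0 * x + 1.0 for x in grid]
f_hat, g_hat, r_hat = kernel_estimates(grid, responses, x=0.5, h=0.1)
print(f_hat, g_hat, r_hat)
```

On this symmetric design the ratio estimate at x = 0.5 reproduces the linear regression value, since the kernel weights on either side of the point cancel the slope term. With dependent data the estimator is unchanged; only the conditions on the bandwidth and on the dependence coefficients differ.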
DEFINITION 6.1. Let ρ = a + b with
. Define the set of ρ-regular functions
by
Here,
is the set of a-times continuously differentiable functions.
Assuming
, one can choose a kernel function K of order ρ (not necessarily nonnegative) such that the bias bh satisfies
uniformly on any compact subset of
(see Rosenblatt, 1991). If, moreover, ρ is an integer with b = 1, a = ρ − 1, then with an appropriately chosen kernel K of order ρ, $b_h(x) \sim (g^{(\rho)}(x)/\rho!)\, h^{\rho} \int s^{\rho} K(s)\, ds$, uniformly on any compact interval.
In view of the asymptotic analysis we assume that the marginal density f (.) of X0 exists and is continuous. Moreover, f (x) > 0 for any point x of interest, and the regression function
exists and is continuous. Finally, for some
exists and is continuous. We set g = fr with obvious shorthand notation. Moreover, we impose one of the following moment conditions. Either
or
We consider first the properties of
. We also consider the following conditionally centered equivalent of g2 appearing in the asymptotic variance of the estimator
,
Assume that the densities of the pairs
, exist and are uniformly bounded: supk>0∥ f(k)∥∞ < ∞. Moreover, uniformly over all
, the functions
are continuous. Under these assumptions, the functions g(k) = f(k) r(k) are locally bounded.
THEOREM 6.1. Suppose that the stationary sequence
satisfies the conditions (6.23) and (6.24) with p = 2. Suppose that nδh → ∞ for some δ ∈]0,1[. Then
under any of the weak dependence conditions formulated subsequently.
To consider asymptotics for the ratio estimator
we use a method, already used by Collomb (1984), that consists of studying higher order asymptotics. It is the topic of the next section.
THEOREM 6.2. Suppose that the stationary sequence
satisfies the conditions (6.23) and (6.24) with p = 2. Consider a positive kernel K. Let
for some ρ ∈]0,2], and nh1+2ρ → 0. Then, for all x belonging to any compact subset of
,
under any of the weak dependence conditions formulated subsequently.
Theorems 6.1 and 6.2 hold if hn → 0, nhn /log(n) → ∞, and if
for some a > max{6δ,2 + 2δ}. The proof is based on a Bernstein grouping argument. Besides, Robinson (1983) proves the CLT result in Theorem 5.2 under condition (5.5) for a > 2S/(S − 2) without the assumption of positivity of the kernel K.
The estimators in Theorem 6.1 obey the CLT under the mixing assumption
and if the bandwidth condition hn → 0, nhn /log2(n) → ∞ is fulfilled (see Peligrad, 1996).
The latter CLT Theorem 6.2 holds if, for some
and if the bandwidth condition hn → 0, nhn /log2(n) → ∞, holds. The proof is based on a triangular CLT in Peligrad (1996) combined with moment inequalities B.1 from Shao (1995).
Assuming that the sequence
is
-weak dependent with
, for j = 1 or j = 2, Ango Nze et al. (2002) prove that, uniformly in x belonging to any compact subset of
,
The exponential moment assumption can be relaxed. Suppose that the stationary sequence
satisfies conditions (6.22) and (6.24) with p = 2, S > 2. The former results then hold if the sequence
is
-weak dependent with
, for j = 1 or j = 2.
The CLT convergence Theorem 6.1 holds, under the conditions (6.23) and (6.24) with p = 2 if the stationary sequence
is
-weak dependent with
and
for j = 1 or j = 2. These results extend the results of Doukhan and Louhichi (1999), valid for the case of the density function
, to the estimate
under weak dependence with either ψ1 or ψ2. Indeed, the first right-hand-side term is obtained by Bernstein's blocking technique described in Appendix B. The second right-hand-side term results from the application of the Lindeberg method (see Rio, 2000).
The CLT convergence Theorem 6.2 relies on the expansion
where
with
Using the Rosenthal inequalities described in Appendix B and the aforementioned CLT, we obtain the CLT convergence Theorem 6.2 for the regression function, under conditions (6.23) and (6.24) with p = 2, if the stationary sequence
is
-weak dependent with
,
for j = 1 or j = 2.
The results stated in Theorem 6.1 and Theorem 6.2 also hold for finite-dimensional convergence. The components are asymptotically jointly independent, much in the same way as for i.i.d. sequences.
Moreover, Rios (1996) proves that the local linearity test can be handled in the strong mixing case. The function r is assumed to be ρ continuous. Then the plug-in estimator
converges to T = ∫r2(x)w(x) dx if
and the bandwidth condition hn ∈ [n−1/10,n−1/(2ρ−4)].
THEOREM 6.3. Let
be a stationary sequence satisfying the conditions (6.23) and (6.24) with p = 2. Then under the conditions formulated in the sections that follow,
Remark. Under condition (ii) of Theorem 6.3, but assuming only the weaker condition about the bandwidth sequence
we obtain
Liebscher (1996) proves the uniform almost sure convergence at the optimal rate (εn = 1) if
, with a > 4 + 3/ρ.
Peligrad (1991) states a uniform almost sure convergence result with εn = log(n) if
, with r > (ρ + 1)/2ρ.
For the sake of simplicity, we only consider the geometrically dependent case.
THEOREM 6.4. Let
be a stationary sequence satisfying the conditions (6.23) and (6.24) with p = 2 and either
-weak dependent with θr ≤ ar for some 0 < a < 1.
This paper presented the new weak dependence condition of Doukhan and Louhichi (1999). It is related to some of the most popular conditions used by econometricians to transcribe the notion of fading memory. The new dependence condition has the advantage that it allows consideration of a broader class of models. This natural weak dependence condition also fits well with the more general (stationary) models used in econometrics. As we have illustrated, most applications of interest can be set out under this weaker dependence condition. Moreover, our framework is a very natural one for bootstrapping techniques. We have also provided several useful limit theorems.
Proof of Proposition 4.1. Let A = {i1,…,iu} and B = {j1,…,jv} be two finite subsets of
with i1 ≤ … ≤ iu < iu + r ≤ j1 ≤ … ≤ jv. Set
and, for a given integer
. Then for any functions
with obvious notations
If p ≤ r/2, the r.v.s
are measurable with respect to the independent σ-fields σ(ξi : i1 − p < i < iu + p) and σ(ξj : j1 − p < j < jv + p). Therefore, the last covariance term is null. Besides, set
Then,
in the one-sided case. The two-sided case is handled similarly. █
Sketch of the Proof of Theorem 5.1 in the α-Mixing Case. With the notations of Rosenthal inequality in Lemma B.10 in Appendix B,
converges in distribution to the normal distribution
, where σ2 = lim n−1 Var(Sn) is assumed to be positive. The tightness of the process (Dn(t)) is derived according to Lemma B.1. Because the sequence (Sn2/n) is uniformly integrable, there exists a convex function G increasing faster than x at infinity, such that
blocking argument, it follows that (see details in Rio, 2000, Proposition 2)
Here
. The tightness condition obtains as soon as nαn → 0. █
Sketch of the Proof of Theorem 5.1 in the ρ-Mixing Case. The proof follows the same lines as in the α-mixing case. Details are developed in the book by Lin and Lu (1996) (see Section 4.1 therein). █
Sketch of the Proof of Theorem 5.1 in the Weak Dependent Case. Lemma B.12 and a maximal inequality by Moricz, Serfling, and Stout (1982) yield
as soon as for any increasing sequence of integers 0 ≤ i < j < k ≤ l
Moreover, this entails that σ^2 = lim n^{-1} Var(S_n) > 0, so that the finite-dimensional convergence is obtained. The tightness of the process is a consequence of (A.3). The first part of (A.3) follows from the covariance bound
. The latter bound follows from
. █
Proof of Proposition 5.1. Using the Rosenthal inequality in Lemma B.12, we get
The conclusion follows from Lemma 9.4 in Shao and Yu (1996). █
Proof of Theorem 5.4. The result follows from Lemma B.13.
Proof of Theorem 6.1. We proceed as in Rio (2000) and more specifically as in Coulon-Prieur and Doukhan (2000) for density estimation.
Consider a sequence
of i.i.d. r.v.s with standard normal distribution independent of
. Here
denotes the estimator truncated at level M(n) by a Lipschitz continuous function. Define
where
. Let φ denote a three times differentiable function with bounded derivatives up to order 3 and consider the following r.v.s:
We are interested in establishing that
We consider either a
- or a
-weakly dependent sequence
. We follow the common abuse of notation in which the same letter C denotes different constants.
Formula (A.6) is easily proved using the exponential moment assumption (6.23).
To prove formula (A.7), we apply the so-called Lindeberg–Rio method (see Rio, 2000). Clearly,
Because
we obtain
Moreover,
It then follows that
We can now bound the preceding five terms.
In the
case,
For a
-weak dependent sequence, again using (A.9)–(A.15), we need θ_r = O(r^{-a}) with
If the sequence
-weak dependent with θr = O(r−a) for some a > 3, then by (A.9)–(A.15) the right-hand side term of (A.8) tends to zero as n−1.
The CLT for
is now proved. The second assertion is a consequence of the first one, where Yt is replaced by Yt − r(x). █
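For orientation, a kernel regression estimate of r(x) of the Nadaraya–Watson type can be sketched as follows. This is a minimal illustration only: the Epanechnikov kernel, the bandwidth argument `h`, and the function names are assumptions made for the example and are not taken from the paper, and the truncation at level M(n) used in the proof is omitted.

```python
import math
from typing import Sequence


def epanechnikov(t: float) -> float:
    """Compactly supported kernel that vanishes for |t| > 1 (illustrative choice)."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def kernel_regression(x: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Nadaraya-Watson estimate of r(x) = E(Y | X = x) with bandwidth h > 0.

    Returns NaN when no observation falls inside the kernel window.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    # Kernel weight of each observation relative to the evaluation point x.
    weights = [epanechnikov((x - xi) / h) for xi in xs]
    denom = sum(weights)
    if denom == 0.0:
        return math.nan
    # Weighted average of the responses.
    return sum(w * yi for w, yi in zip(weights, ys)) / denom
```

With two observations placed symmetrically around x, the weights coincide and the estimate is the plain average of the two responses.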
Proof of Theorem 6.3. We keep the usual notation and denote by C a universal constant (whose value may change from one place to another). Assume that
. Then
and, by the Cauchy–Schwarz inequality,
We can now reduce computations to the case of a density estimator, as in Doukhan and Louhichi (1999). Assume that the interval [−M,M] is covered by Lν intervals with diameter 1/ν (here ν = ν(n) depends on n; Ij denotes the jth interval and xj its center). Assume that hν → ∞ as n → ∞ and that the compactly supported kernel K vanishes if t > R0. Liebscher (1996) exhibits another kernel-type density estimate
based on an even, continuous kernel, decreasing on [0,∞[, constant on [0,2R0], taking the value 0 at t = 3R0. Then, he proves that
Therefore, for any λ > 0,
Exponential inequality (DPL), which can be found in Section B.4.1 of Appendix B, completes the proof of assertion (i). █
Remark. Zhao and Fang (1985) prove almost sure convergence, uniform on compact sets, of the kernel regression estimator for strongly mixing stationary processes under the same condition as in Theorem 6.3. Let us consider a strongly mixing process that satisfies conditions (6.23) and (6.24) with p = 2. Assume that
for some δ ∈ ]0,1[. If the moment condition (6.22) holds with S > 4 + 2/ρ, and if the mixing rate is α_r ≤ r^{-a} for some a > 4 such that
then, almost surely,
To obtain functional convergence in distribution it is usual to make use of some chaining argument to prove tightness of the sequence (Yn(t))n≥1, where
, in the space
. Chaining techniques can be found in Pollard (1981). For the sake of completeness, we recall the following tightness result deduced from the Arzelà–Ascoli theorem.
LEMMA B.1 (Billingsley, 1968). The sequence of processes (Yn(t))t∈[0,1] is a tight sequence in the space
for n = 1,2,… if for each ε,η > 0, there exist a δ > 0 and an integer n0 such that for all t ∈ [0,1]
To conclude the section we recall the standard Kolmogorov–Chentsov tightness criterion, adapted to moment inequalities.
LEMMA B.2 (Billingsley, 1968). A sequence of continuous processes (Zn(t))t∈[0,1] is tight in the space
for n = 1,2,… if there exist constants C > 0, p > 0, q > 1, such that for any s,t ∈ [0,1]:
We now recall a tightness criterion in the Skorohod space D([0,1]) of right-continuous functions with left limits on the interval. Note that we can restrict ourselves to the case of marginal uniform distributions.
LEMMA B.3 (Billingsley, 1968). A sequence of càdlàg (right continuous with left limits) processes (Zn(x))x∈[0,1] is tight in D([0,1]) if
To conclude the section, we recall a chaining lemma, a proof of which can be found in Shao and Yu (1996); this result can be used to prove tightness of the empirical process when the fourth-order moment involved (p = 4) is bounded. The absolutely regular case is an exception for which a more general technique is used by Rio (2000).
LEMMA B.4 (Shao and Yu, 1996). Let Xn be a stationary sequence with marginal cumulative distribution function F. Then the empirical Brownian bridge
is a tight sequence in the Skorohod space
if there exist constants C > 0, p > 2, q > 1, and u > 0, 0 ≤ v ≤ 1, with u + v > 1 such that for any
:
A fundamental covariance inequality due to Rio (2000) extends the previously known ones. First, recall the following definition.
DEFINITION 9.1. The quantile function QX of the real-valued r.v. X is the càdlàg inverse of the tail function of |X|,
We are now in a position to state the fundamental covariance inequality.
LEMMA B.5 (Rio, 2000). Let u, v be two real-valued r.v.s with finite variance. Then, setting α = α(σ(u),σ(v)), we have
Because
, we find by taking expectations that
Hence a simple use of the Fubini theorem yields, for p > 0, the identity
. Thus, using Hölder's inequality, we deduce
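The chain of displays in this passage can be written out as follows. This is a sketch under the standard definitions found in Rio (2000), which are assumed here rather than quoted from the paper: α is taken with the normalization α = sup |P(A ∩ B) − P(A)P(B)|, and the exponents p, q, r with 1/p + 1/q + 1/r = 1 are introduced only for this illustration.

```latex
% Quantile function of |X| (generalized inverse of the tail function):
Q_X(s) = \inf\{\, t \ge 0 : \mathbb{P}(|X| > t) \le s \,\}, \qquad s \in [0,1].

% Covariance inequality for \alpha = \alpha(\sigma(u), \sigma(v)):
|\operatorname{Cov}(u,v)| \le 2 \int_0^{2\alpha} Q_u(s)\, Q_v(s)\, ds .

% If U is uniform on [0,1], then Q_X(U) has the same law as |X|, so for p > 0
\int_0^1 Q_X^p(s)\, ds = \mathbb{E}|X|^p .

% H\"older's inequality on [0, 2\alpha] with 1/p + 1/q + 1/r = 1 then gives
|\operatorname{Cov}(u,v)|
  \le 2 \Big(\int_0^{2\alpha} ds\Big)^{1/r}
        \Big(\int_0^1 Q_u^p\Big)^{1/p} \Big(\int_0^1 Q_v^q\Big)^{1/q}
  = 2\,(2\alpha)^{1/r}\, \|u\|_p\, \|v\|_q .
```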
Similar covariance inequalities are available for other mixing type sequences. Namely, we can state the following lemmas.
LEMMA B.6 (Bradley and Bryc, 1985). Let u, v be two real-valued r.v.s. Set ρ = ρ(σ(u),σ(v)). Let p > 0 and q > 0 be such that 1/p + 1/q = 1. Then
LEMMA B.7 (Ibragimov, 1962). Let u, v be two real-valued r.v.s. Set φ = φ(σ(u),σ(v)). Let p > 0 and q > 0 be such that 1/p + 1/q = 1. Then
Coupling results exist in the β-mixing and α-mixing cases. The first one is due to Berbee (1979), and the second one is due to Rio (1994). The reader is referred to Rio (2000) for further details.
LEMMA B.8 (Berbee, 1979). Let u, v be two r.v.s defined on the probability space
and taking their values in Polish spaces
. Then, enlarging the probability space if necessary, it is possible to define an r.v. v* with the same distribution as v and such that u and v* are independent r.v.s and
We shall use a similar device for real-valued strong mixing sequences.
LEMMA B.9 (Rio, 2000). Let
be a σ-field of
and let v be a real-valued r.v. with values (a.s.) in [a,b]. Suppose, furthermore, that there exists an r.v. U with uniform distribution on [0,1] independent of
. Then there exists an r.v. v* independent of
with the same distribution as
-measurable and such that
First applications of Rosenthal-type inequalities can be found in Billingsley (1968). They concern the Kolmogorov–Smirnov functional CLT for the empirical cumulative distribution of a φ-mixing sequence. A nice and general presentation of applications of Rosenthal inequalities to functional CLT is provided in Andrews and Pollard (1994). Maximal Rosenthal inequalities are available either in the α-mixing or in the ρ-mixing case.
For simplicity we consider stationary sequences
, such that for some real number r ≥ 2,
and we set
Similarly, α−1 denotes the inverse of the decreasing function s → α[s] (where
denotes the integral part of
defined by
LEMMA B.10 (Rio, 1994). Let (Xn) be a strong mixing sequence and for r ≥ 2 set
Then there exists a constant Cr only depending on r such that
For example, an explicit computation gives
This implies the maximal Rosenthal inequality
.
Assume now that
for some positive δ. Then the integral
can be bounded using Hölder's inequality and the relation
:
The integral on the right-hand side of the previous inequality can be written as a sum. An Abel transformation, together with the simple inequality (n + 1)^p − n^p ≤ p(n + 1)^{p−1}, which follows for p > 1 from the mean value theorem, then yields
If
, the moment Mr,α is finite.
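The elementary inequality invoked in the Abel transformation step can be checked directly. The derivation below is a short worked verification; the intermediate point ξ is notation introduced here.

```latex
% Apply the mean value theorem to f(x) = x^p on [n, n+1], with p > 1 and n \ge 0:
(n+1)^p - n^p = f'(\xi)\,\big((n+1) - n\big) = p\,\xi^{\,p-1}
  \quad \text{for some } \xi \in (n, n+1).

% Since p - 1 > 0, the map x \mapsto x^{p-1} is nondecreasing on [0,\infty), hence
p\,\xi^{\,p-1} \le p\,(n+1)^{p-1},
  \qquad \text{so} \qquad
(n+1)^p - n^p \le p\,(n+1)^{p-1}.
```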
LEMMA B.11 (Shao, 1995). Let (Xn) be a ρ-mixing sequence and set for r ≥ 2 and some constant Kr,ρ depending only on r and ρ:
Then there exists a constant Cr,ρ depending only on r and ρ such that
This follows from Bradley and Bryc's lemma and from several important lemmas in Peligrad (1987). The work of Peligrad (1987) yields maximal bounds for the moments of sums of n-samples of order 2 and 4 that only involve the sums
.
The following inequality is essential for studying associated r.v.s.
THEOREM B.1 (Newman, 1984). For a pair of measurable numeric functions (f,g) defined on
, we write f << g if both functions g + f and g − f are nondecreasing with respect to each argument. Now let X be any associated random vector with range in A. Then
This theorem follows simply from several applications of the definition to the coordinatewise nondecreasing functions gi − fi and gi + fi. By an easy application of the preceding inequalities one can check that
for
-valued associated random vectors
functions f and g with bounded partial derivatives. For this, it suffices to note that f << f1 if one defines
and uses Theorem B.1.
Denote by
the real part of the complex number z. Theorem B.1 can be extended to complex-valued functions, up to a factor 2 on the left-hand side of inequality (B.2). Indeed, we can now set f << g if for any real number ω the mapping
is nondecreasing with respect to each argument. Also, for any real numbers t1,…,tk,
If now the r.v.s Xi have a density, bounded uniformly with respect to the index i, then
This relation provides a connection between
-weak dependences (the class
will be defined in (B.11) of Section B.4) in the context of associated sequences.
The negatively associated r.v.s are much simpler to handle. The key result is a comparison theorem established by Shao (2000). For any convex function f on
,
for a given sequence (Xi)1≤i≤n of negatively associated r.v.s and for any sequence (Xi*)1≤i≤n of independent r.v.s such that
for each i = 1,…,n. To avoid triviality, we assume that the preceding right-hand-side term exists.
Let
be a sequence of r.v.s with
. In this section, we give moment bounds for
, when
. For positive integers r, define coefficients of weak dependence as nondecreasing sequences (Cr,q)q≥2 such that
for 1 ≤ t1 ≤ … ≤ tq ≤ n and for integers 1 ≤ m < q, tm+1 − tm = r. Doukhan and Louhichi (1999) provide explicit bounds Cr,q to construct inequalities for partial sums Sn. Two kinds of bounds are considered, either
or
where QX still denotes the quantile function of X and c,γ ≥ 0 denote real numbers. In the examples, bound (B.5) holds for bounded sequences such that ∥Xn∥∞ ≤ M. For instance,
-weak dependence yields the bounds
where
. As in Lemma 1 of Doukhan and Louhichi (1999), we see that under
-weak dependence with ψ(h,k,u,v) = c(u,v) × Lip(h)Lip(k), a bound is
Bound (B.6) holds for more general r.v.s, as can be shown using moment or tail assumptions. A first consequence of the previous definitions is the following Marcinkiewicz–Zygmund inequality.
Let
be a sequence of r.v.s with
satisfying the condition
Then, there exists a constant B > 0 independent of n for which
The following lemma gives moment inequalities with q ∈ {2,4}. It was essentially proved in Billingsley (1968, Lemmas 3 and 4, p. 172).
LEMMA B.12 (Doukhan and Louhichi, 1999). If
is a sequence of r.v.s with
, then
The following theorems deal with higher order moments.
THEOREM B.2 (Doukhan and Louhichi, 1999). Let q ≥ 2 be some integer. Assume that dependence coefficients Cr,p of the sequence (Xn) satisfy
for all integers 0 < p ≤ q and for some positive constants M, γ, C. Then, for any integer n ≥ 2,
Theorem B.2 is adapted to work with bounded sequences. Define the class
by
To consider the unbounded case, we shall consider
-weak dependence where ψ(h,k,u,v) = c(u,v).
THEOREM B.3 (Doukhan and Louhichi, 1999). Let
be a
-weak dependent sequence with
and set Cq = (maxu+v≤q c(u,v)) ∨ 2. Then
In the special case of strongly mixing and stationary sequences, this is Theorem 1 in Rio (1994). The explicit form of the constants compensates for the fact that we are restricted to even integers.
Remark. Exponential inequalities can be obtained using Theorem B.2. Define
We first suppose that for all integers q ≥ 2 and n
where β is some constant and An is a sequence independent of q. Then the following Doukhan–Portal type exponential inequality is available:
One can take here the constants
. Let (Xn) be a sequence of r.v.s with
. Inequality (DPL) holds when C_{r,q} = C σ^2 M^{q−2} e^{γq} e^{−br} for suitable constants C,σ,γ,b > 0, if ∥Xn∥∞ ≤ M and ∥Xn∥2 ≤ σ, for any integer n ≥ 0. In this case, A_n = nσ^2. For example, inequality (DPL) holds under
-weak dependence if
for some δ ≥ 0.
The use of combinatorics in the preceding inequalities makes them relatively weak. For instance, Bernstein's inequality, valid for independent sequences, allows one to replace the term
in the preceding inequality by x^2 under the same assumption nσ^2 ≥ 1; in the mixing cases similar inequalities are also obtained using coupling arguments that are not available here.
Shao (2000) utilizes his main comparison result (expression (B.4)) to show that most of the inequalities for independent r.v.s remain true for negatively associated r.v.s, even with respect to the constants. For instance, a Rosenthal inequality is proved. Recalling that Sn* = max{|S1|,…,|Sn|}, we have that for any real number p > 2,
An exponential inequality is also derived for negatively associated r.v.s. It is as sharp as Bernstein's inequality for independent r.v.s.
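The maximal partial sum Sn* = max{|S1|,…,|Sn|} that appears in these maximal inequalities is easy to compute on a sample, for instance when checking a bound by simulation. The helper below is a minimal sketch; the function name is an illustrative assumption.

```python
from itertools import accumulate
from typing import Iterable


def max_abs_partial_sum(xs: Iterable[float]) -> float:
    """Return S_n^* = max(|S_1|, ..., |S_n|) for the partial sums S_k = x_1 + ... + x_k.

    An empty sample yields 0.0 by convention (an assumption of this sketch).
    """
    # accumulate() yields the running partial sums S_1, S_2, ..., S_n.
    return max((abs(s) for s in accumulate(xs)), default=0.0)
```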
The results of Shao (2000) can also be combined with those in the papers by Ibragimov and Sharakhmentov (2001) and de la Peña, Ibragimov, and Sharakhmentov (2003) to obtain sharp versions of Rosenthal inequalities. These inequalities concern general moments of partial sums of negatively associated sequences of nonnegative and symmetric r.v.s and also mean zero r.v.s in the case of even power p. A symmetrization argument gives improvements of formula (B.12) in the context of mean zero r.v.s and arbitrary power p.
Proof of Lemma 2.1. As an application of the preceding results, let us turn to the proof of a correct version of Lemma 1 in Hall and Horowitz (1996). Set
and let
. Then
Following Withers (1981) and Newman (1984), the CLT holds if σ^2 = lim_{n→∞} (1/n) Var(X_1 + … + X_n) > 0 (this limit is assumed to exist), and the sequence S_n^2/n is uniformly integrable.
Denote by
the class of complex exponential functions f such that f(x1,…,xp) = exp(i t_f (x1 + … + xp)) for some
. The function ψ is defined for this class by ψ_exp(f,g,p,q) = pq(1 + t_f t_g). Let
be the dependence coefficient associated with the preceding function class.
The CLT holds for a
-weak dependent sequence with
. The inclusion
implies that the CLT holds under a
-weak dependence condition, with ψ(h,k,u,v) = min(u,v)Lip(h)Lip(k) + 1. Notice that the preceding condition
holds for associated r.v.s where Σ_n Cov(X_0,X_n) < ∞.
PROPOSITION B.1 (Doukhan and Louhichi, 1999). The CLT holds under
-weak dependence if ψ(h,k,u,v) = min{u,v}μ(Lip(h),Lip(k)) for some locally bounded real function μ on
. It also holds if, for some d > 0, ψ(h,k,u,v) = (u + v)^d × μ(Lip(h),Lip(k)) and, for some
.
The proof is based on a more general lemma (see, e.g., Ibragimov, 1962; Ibragimov and Linnik, 1971; and Withers, 1981) setting Yk = φ(Xk). Let
be a stationary sequence. The idea is to split
into Bernstein blocks.
LEMMA B.13 (Ibragimov, 1962). Let (Yk) be a stationary sequence of centered r.v.s. Let p = p(n), q = q(n), k = [n/(p + q)] be integer-valued functions satisfying
We define Bernstein blocks as
Then we set
Denote σ_n^2 = Var S_n and let g, h be either x → cos x or x → sin x. If
then the sequence S_n/σ_n converges in distribution to a Gaussian
distribution.
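The Bernstein blocking device used in Lemma B.13 can be illustrated as follows. This is a sketch of the usual construction, assumed here because the defining displays are not reproduced: the sample is cut into k = [n/(p + q)] consecutive groups, each made of a large block of length p followed by a small block of length q, and whatever is left over forms a remainder. The function name and the returned triple are illustrative choices.

```python
from typing import List, Sequence, Tuple


def bernstein_blocks(y: Sequence[float], p: int, q: int) -> Tuple[List[float], List[float], float]:
    """Split y into k = len(y) // (p + q) large blocks (length p) and small blocks (length q).

    Returns the list of large-block sums, the list of small-block sums,
    and the sum of the leftover terms after the k-th group.
    """
    if p <= 0 or q < 0:
        raise ValueError("need p > 0 and q >= 0")
    n = len(y)
    k = n // (p + q)
    big: List[float] = []
    small: List[float] = []
    for j in range(k):
        start = j * (p + q)
        # Large block: p consecutive terms.
        big.append(sum(y[start:start + p]))
        # Small block: the next q terms, which separate consecutive large blocks.
        small.append(sum(y[start + p:start + p + q]))
    remainder = sum(y[k * (p + q):])
    return big, small, remainder
```

By construction the total sum is recovered as the sum of the large-block sums, the small-block sums, and the remainder. In the limit argument the small blocks and the remainder are expected to be negligible, while the large blocks behave almost independently.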