WEAK DEPENDENCE: MODELS AND APPLICATIONS TO ECONOMETRICS

Patrick Ango Nze; Paul Doukhan

doi:10.1017/S0266466604206016


Published online by Cambridge University Press: 01 December 2004



Patrick Ango Nze: Affiliation:
Université Lille 3
Paul Doukhan: Affiliation:
Université de Cergy Pontoise

Article contents

Abstract
1. INTRODUCTION
2. ECONOMETRICS AND DEPENDENCE
3. WEAK DEPENDENCE CONDITIONS
4. MODELS: DEPENDENCE PROPERTIES
5. LIMIT THEOREMS
6. FUNCTIONAL ESTIMATION
7. CONCLUSION
APPENDIX A: PROOFS
APPENDIX B: TECHNICAL LEMMAS
References


Abstract

In this paper we discuss weak dependence and mixing properties of some popular models. We also develop some of their econometric applications. Autoregressive models, autoregressive conditional heteroskedasticity (ARCH) models, and bilinear models are widely used in econometrics. More generally, stationary Markov modeling is often used. Bernoulli shifts also generate many useful stationary sequences, such as autoregressive moving average (ARMA) or ARCH(∞) processes. For Volterra processes, mixing properties obtain given additional regularity assumptions on the distribution of the innovations.

We recall associated probability limit theorems and investigate the nonparametric estimation of those sequences.

We first thank the editor for the huge amount of additional editorial work provided for this review paper. The work of the numerous referees was especially useful. The error we point out in Hall and Horowitz (1996) was the origin of the present paper, and we thank the referees for asking for a more detailed treatment of a correct proof in Section 2.3. We also thank Marc Henry and Rafal Wojakowski for a very careful rereading of the paper. An anonymous referee has been particularly helpful in the process of revision of the paper. The authors thank him for his numerous suggestions for improvement, including important results on negatively associated sequences and a thorough update in standard English.

Type: Research Article
Information: Econometric Theory, Volume 20, Issue 6, December 2004, pp. 995–1045

DOI: https://doi.org/10.1017/S0266466604206016
Copyright: © 2004 Cambridge University Press

1. INTRODUCTION

Mixing is now systematically used in time series where martingale assumptions and results cannot be directly employed. Mixing has proved particularly useful in cases where nonlinearities appear, such as autoregressive conditional heteroskedasticity (ARCH) modeling in econometrics. This success relies on powerful limit theorems proved under mixing conditions (see, among others, Doukhan, 1994; Rio, 2000; Doukhan, 2002). These limit results serve as basic tools for computation of the significance level and power of statistical tests. Mixing assumptions can be used in more general frameworks involving fading memory (asymptotic independence between functions of the past and the future of the process), such as near epoch dependence (NED) of a mixing process.

We recall here the mixing properties of some models used in econometrics. At the same time, we present a different approach to limit theorems when mixing does not hold (which may indeed occur, as shown by Andrews, 1984; see example (4.16)). For the sake of simplicity, our exposition mainly focuses on one-dimensional time series.

The paper is organized as follows. To provide econometric motivation, Section 2 describes several situations where the various weak dependence conditions arise; after some generic examples, we consider specific problems, including unit root problems, parametric problems, the sieve bootstrap, and semiparametric estimation problems in Sections 2.1, 2.2, 2.4, and 2.5. Section 2.3 considers generalized method of moments (GMM) estimation, in which the Doukhan and Louhichi (1999) weak dependence condition allows one to provide a complete proof of the results in Hall and Horowitz (1996). Indeed, the latter authors improperly claim a mixing property for their models to prove their consistency results. Finally, Section 2.6 considers nonparametric estimation problems.

Section 3 makes precise the mathematical framework of weak dependence needed in the previous section. It describes some classical concepts of fading memory (mixing conditions, the association property) and also the new weak dependence conditions introduced by Doukhan and Louhichi (1999). After this, Section 4 provides numerous classes of models commonly used in econometrics and finance and focuses on their weak dependence properties. Section 5 recalls some probabilistic limit theorems available in those cases. Extensions of Donsker's functional central limit theorem (FCLT) and the FCLT for the cumulative distribution function are discussed. Section 6 is devoted to functional estimation. Consistency and CLTs are discussed here. Proofs are given in Appendix A, and Appendix B recalls the main probabilistic tools.

Finally, we remark that the limit theorems of Section 5 and the asymptotic results for functional estimation in Section 6 hold for very large classes of models (Sections 3 and 4). Hence, the more general time series formulations of Section 4 allow us to extend directly the classical results of Section 2.

2. ECONOMETRICS AND DEPENDENCE

Time series analysis is a major part of econometrics. Here we provide several examples of interest in which it is essential to consider dependent structures instead of simple independence. In some situations, classical weak dependence tools such as mixing are useless. For instance, when bootstrap techniques are used, no mixing conditions can be expected. Consider the following example concerning the bootstrap: let a stationary autoregressive sequence be generated by an independent and identically distributed (i.i.d.) sequence $(\xi_t)_{t\in\mathbb{Z}}$,

$$X_t = r(X_{t-1}) + \xi_t. \qquad (2.1)$$

Standard nonparametric estimation techniques provide an estimate of the autoregression function r. Let $\hat r_n$ be a convenient estimator of r. Given data (X₁,…,X_n) from the sequence (2.1), another autoregressive process is defined by

$$X_t^* = \hat r_n(X_{t-1}^*) + \xi_t^*. \qquad (2.2)$$

The innovations $(\xi_t^*)$ are drawn according to the empirical measure of the estimated residuals, $\hat\xi_t = X_t - \hat r_n(X_{t-1})$, 2 ≤ t ≤ n. No mixing assumption can be expected for model (2.2): see (4.16) in Section 4. However, a new concept of fading memory can still be applied. Bickel and Bühlmann (1999) set up such a new weak dependence condition to build critical bootstrap values for a linearity test in linear models. Doukhan and Louhichi (1999) have extended it to cover models such as positively dependent sequences, Markov chains (with or without topological assumptions), and Bernoulli shifts (see Definition 4.1). The Bernoulli shifts are defined in Assumption 1 of Hall and Horowitz (1996) and are used throughout that paper. These weak dependence conditions yield standard results concerning convergence in distribution with a √n-normalization.
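The resampling scheme (2.2) can be sketched in Python. This is a minimal illustration under our own assumptions, not the procedure of any cited paper: $\hat r_n$ is taken to be a Nadaraya–Watson smoother with an Epanechnikov kernel and a hand-picked bandwidth h, the residuals are centered before resampling, the bootstrap chain starts at X₁, and the data are simulated from a hypothetical autoregression function r(x) = 0.5 sin x with Gaussian innovations.

```python
import numpy as np

def r_hat(y, lagged, current, h):
    """Nadaraya-Watson estimate of r(y) = E(X_t | X_{t-1} = y), Epanechnikov kernel."""
    u = (y - lagged) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    s = w.sum()
    # Fall back to the sample mean when no lagged value lies within h of y.
    return float(w @ current / s) if s > 0 else float(current.mean())

def residual_bootstrap_path(x, h, rng):
    """One bootstrap path X*_t = r_hat(X*_{t-1}) + xi*_t, mimicking (2.2)."""
    lagged, current = x[:-1], x[1:]
    fitted = np.array([r_hat(y, lagged, current, h) for y in lagged])
    resid = current - fitted
    resid = resid - resid.mean()                  # centred estimated residuals
    xi_star = rng.choice(resid, size=len(x))      # i.i.d. draws from their empirical law
    x_star = np.empty(len(x))
    x_star[0] = x[0]                              # start the bootstrap chain at X_1
    for t in range(1, len(x)):
        x_star[t] = r_hat(x_star[t - 1], lagged, current, h) + xi_star[t]
    return x_star, resid

# Hypothetical data: X_t = 0.5 sin(X_{t-1}) + N(0, 1) noise.
rng = np.random.default_rng(0)
n = 300
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * np.sin(x[t - 1]) + rng.normal()

x_star, resid = residual_bootstrap_path(x, h=0.5, rng=rng)
```

The resampled innovations take only finitely many values, so their distribution is discrete; this is the feature that rules out the usual mixing arguments for (2.2).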

Another application of these results concerns linearity tests in time series analysis. Rios (1996) considers a stationary functional autoregressive model (2.1) where r = L + C is the decomposition of the autoregression function into a sum of linear (L) and nonlinear (C) components. Local linearity of r is then tested via the null hypothesis

where the weight function w has compact support.

Still another problem of interest is to test the independence of the innovations

in a regression model

This can be performed using the Durbin–Watson statistic. The latter can be written as a continuous functional of the Donsker line of the sequence

2.1. Unit Root Tests

Consider an autoregressive sequence

$$X_t = aX_{t-1} + u_t, \qquad t \ge 1, \quad X_0 = 0,$$

generated by an i.i.d. sequence $(u_t)$.

A classical problem is to test whether there is a unit root (i.e., a = 1). In the specific context of aggregate time series, the assumption of white noise innovations seems rather strong. Phillips (1987) develops unit root tests for mixing and heterogeneously distributed innovations. The ordinary least squares estimate

$$\hat a_n = \frac{\sum_{t=1}^{n} X_t X_{t-1}}{\sum_{t=1}^{n} X_{t-1}^2}$$

is shown to be a continuous functional of the Donsker line (the normalized partial-sum process) of the sequence $(u_t)$. As an application of the FCLT, Phillips shows that a unit root test can be based on the fact that under the null hypothesis H₀ : a = 1,

$$n(\hat a_n - 1) \Rightarrow \frac{W(1)^2 - \sigma_u^2/\sigma^2}{2\int_0^1 W(s)^2\,ds},$$

where W denotes the standard Brownian motion, $\sigma^2 = \lim_{n\to\infty} n^{-1}E(u_1+\cdots+u_n)^2$, and $\sigma_u^2 = \lim_{n\to\infty} n^{-1}\sum_{t=1}^{n} E u_t^2$. Phillips works with strongly mixing innovations; the same result can be obtained in the weakly dependent context detailed in Section 4, and the conditions under which Donsker's theorem (the FCLT) holds are described in Section 5.1. This example, as the author suggests, can be generalized to error sequences that allow for heteroskedasticity. See also Mills (1999) for a discussion of the Dickey–Fuller unit root test in autoregressive models when errors fluctuate about a nonzero mean.
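To illustrate how this limit is used in practice, the sketch below computes n(â_n − 1) and approximates the lower 5% critical value of its null distribution by simulating random walks. Gaussian i.i.d. innovations, X₀ = 0, no intercept, n = 500, and 2000 replications are illustrative assumptions; under them σ_u² = σ², so the limit reduces to (W(1)² − 1)/(2∫W²).

```python
import numpy as np

def unit_root_stat(x):
    """n (a_hat - 1), with a_hat the OLS slope of X_t on X_{t-1} (no intercept)."""
    x = np.asarray(x, dtype=float)
    a_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    return (len(x) - 1) * (a_hat - 1.0)

def simulated_critical_value(n=500, reps=2000, level=0.05, seed=0):
    """Lower-tail critical value of n(a_hat - 1) under H0: a = 1, i.i.d. N(0,1) innovations."""
    rng = np.random.default_rng(seed)
    walks = np.cumsum(rng.normal(size=(reps, n)), axis=1)
    walks = np.hstack([np.zeros((reps, 1)), walks])        # X_0 = 0
    num = np.sum(walks[:, 1:] * walks[:, :-1], axis=1)
    den = np.sum(walks[:, :-1] ** 2, axis=1)
    stats = n * (num / den - 1.0)
    return float(np.quantile(stats, level))

rng = np.random.default_rng(1)
u = rng.normal(size=500)
random_walk = np.concatenate([[0.0], np.cumsum(u)])    # a = 1 (null hypothesis)
ar_half = np.zeros(501)
for t in range(1, 501):
    ar_half[t] = 0.5 * ar_half[t - 1] + u[t - 1]        # a = 0.5 (stationary alternative)

crit = simulated_critical_value()
print(crit, unit_root_stat(random_walk), unit_root_stat(ar_half))
```

H₀ is rejected when the statistic falls below `crit`; under the stationary alternative the statistic is of order −n(1 − a), far into the rejection region.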

2.2. Parametric Problems

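GMM estimation procedures involve an estimate $\hat\theta_n$, which is a solution of the arg-min problem $\hat\theta_n = \arg\min_{\theta\in\Theta} J_n(\theta)$, where

$$J_n(\theta) = G_n(\theta)'\,\Omega\,G_n(\theta), \qquad G_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} g(X_i,\theta).$$

Here Θ is a finite-dimensional parameter set, and g(·,·) is a given function such that $E\,g(X_i,\theta_0) = 0$, where θ₀ is the true parameter point. In the time series context, the positive semidefinite matrix Ω is often replaced (see Hall and Horowitz, 1996, equation (3.2)) by an asymptotically optimal weight matrix estimate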

and κ is such that

. The statistic to test

, where

(the square root of a symmetric positive matrix is uniquely defined).

2.2.1. Block Bootstrap.

A bootstrap procedure allows one to estimate the limit distribution of an estimate.

We describe a block-bootstrap procedure adapted to the time series $(X_1,\dots,X_n)$. Let b = b(n) and l = l(n) denote the number and the length of the blocks. Then bl = n and block j is $B_j = (X_{(j-1)l+1},\dots,X_{jl})$, 1 ≤ j ≤ b (see Künsch, 1989). In this construction, a suitable form of

if the process X_k = H(ξ_k,ξ_k−1,…) is a Bernoulli shift as defined in Section 4.3. Here

are obtained through filtering and estimation procedures in the simple case of a linear process (H(z₀,z₁,…) = ∑_k a_k z_k; see Section 2.4); in the general setting, one needs to develop additional estimation procedures. To describe the asymptotic properties of such processes, one needs to know the asymptotic behavior of Bernoulli shifts.
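The block construction above can be sketched as follows: split the series into b non-overlapping blocks of length l and draw b blocks uniformly with replacement. This is a generic illustration that assumes l divides n; it does not reproduce the filtering step for Bernoulli shifts, and the function names are ours.

```python
import numpy as np

def nonoverlapping_blocks(x, l):
    """Return the b = n / l blocks B_j = (X_{(j-1)l+1}, ..., X_{jl}) as rows."""
    x = np.asarray(x)
    b, rem = divmod(len(x), l)
    if rem:
        raise ValueError("the block length l must divide n (bl = n)")
    return x.reshape(b, l)

def block_bootstrap_sample(x, l, rng):
    """Draw b blocks uniformly with replacement and concatenate them."""
    blocks = nonoverlapping_blocks(x, l)
    idx = rng.integers(0, blocks.shape[0], size=blocks.shape[0])
    return blocks[idx].reshape(-1), idx

rng = np.random.default_rng(0)
x = np.arange(12.0)                                     # toy series, n = 12
sample, idx = block_bootstrap_sample(x, l=3, rng=rng)   # b = 4 blocks of length 3
```

Within each block the original dependence is preserved and it is broken only at block boundaries, which is why l = l(n) is usually allowed to grow with n.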

2.2.2. Conditional Bootstrap.

A simple local conditional bootstrap is investigated by Ango Nze, Bühlmann, and Doukhan (2002). In that paper, it is shown that asymptotic properties can be obtained using the same weak dependence techniques. The following central limit theorem (CLT) holds under standard mixing assumptions:

where the diagonal matrix Σ_n has d entries. GMM techniques naturally involve an unknown covariance matrix. To estimate such limiting distributions it is natural to use bootstrap techniques.

2.3. Bootstrapping Critical Values for GMM Estimators

Let

denote a block-bootstrap sample and let

. The expectation is taken with respect to the bootstrap distribution. The GMM estimate

solves the arg-min problem

if the matrix Ω is known.

Hall and Horowitz (1996) erroneously state that such Bernoulli shifts are strongly mixing. Nevertheless, their bootstrap procedure does work. The weak dependence condition defined in Doukhan and Louhichi (1999) allows us to rigorously prove the consistency of the Hall and Horowitz procedures. More precisely, if X_n = h(ε_n, ε_{n−1}, …) for some i.i.d. sequence

, their Assumption 1 is

The function h takes its values in an L_x × 1 space equipped with a norm ∥·∥. As stated, the assumption omits the mathematical expectation symbol; it should be included, as in Andrews (2002).

This condition holds for linear processes, and the authors claim that it implies geometric strong mixing. The simple example (4.16) shows that this is false in general. The condition does, however, imply weak dependence in the sense of Doukhan and Louhichi (1999). Consequently, a tail inequality for sums of functions of the sequence ξ_n = f(X_n,θ) can be derived; it is the main tool for proving the validity of the bootstrap in this dependent setting.

The preceding procedure can be used for testing the null hypothesis H₀ : θ = θ₀ against the two-sided alternative. The studentized statistic T_n(θ) described in (2.4) and the critical value Q_α then satisfy the relation that, under H₀,

Hall and Horowitz show that the bootstrap studentized statistic

and the previously mentioned statistic T_n(θ) have close laws, in the sense that

for a suitable integer 2a, with a ≥ 1 + ξ, where the admissible range of ξ ∈ [0,1] depends on the dependence assumptions imposed. This relation follows from an Edgeworth expansion. It yields an improved acceptance rule for the test of H₀:

Andrews (2002) points out that a direct computation of the bootstrap critical value Q_α* is a hard problem and that the usual estimation procedure, based on B independent bootstrap replications (by the law of large numbers, it follows that

) is also difficult to implement. Indeed, the computations involve the minimization of B nonlinear functionals. Andrews proposes a numerical improvement: a bootstrap estimator

is computed by applying the Newton–Raphson algorithm. The initial value is

, and k iterations are made (k ≥ 3). A bootstrap studentized statistic $T_{n,k}^*$ is then available, for which the computation of the bootstrap critical values Q_α*(B) is much easier, because each iteration only requires solving a linear problem. The method is claimed to be as accurate as that of Hall and Horowitz; indeed, Andrews states a result similar to (2.6) with respect to

. The assumptions are those of Hall and Horowitz. Therefore, the previously mentioned Assumption 1 must also be read in light of our comments on Hall and Horowitz's paper. For the sake of completeness, we present a corrected version of Lemma 1 in Hall and Horowitz (1996), which is proved in Section B.4 of Appendix B.
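The k-step idea can be sketched on a toy problem. The sketch assumes a hypothetical just-identified moment g(x, θ) = x − e^θ with weight Ω = 1, so that J_n(θ) = (x̄ − e^θ)² has the closed-form minimizer log x̄. On one bootstrap resample, k = 3 Newton–Raphson steps started at the original estimate are compared with the exact minimizer. For brevity the resample is drawn i.i.d.; with dependent data one would use the block bootstrap of Section 2.2.1 instead.

```python
import numpy as np

def objective_derivatives(theta, x):
    """Gradient and Hessian of J(theta) = m(theta)^2, m(theta) = mean(x) - exp(theta).

    Hypothetical moment g(x, theta) = x - exp(theta) with weight matrix Omega = 1.
    """
    m = x.mean() - np.exp(theta)
    dm = -np.exp(theta)            # m'(theta)
    d2m = -np.exp(theta)           # m''(theta)
    grad = 2.0 * m * dm
    hess = 2.0 * (dm * dm + m * d2m)
    return grad, hess

def k_step_estimate(x, theta_init, k=3):
    """k Newton-Raphson steps on the bootstrap objective, started at theta_init."""
    theta = theta_init
    for _ in range(k):
        grad, hess = objective_derivatives(theta, x)
        theta = theta - grad / hess
    return theta

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=5000)    # hypothetical data with E X = 2
theta_hat = np.log(x.mean())                 # exact minimizer on the original sample
x_star = rng.choice(x, size=x.size)          # one bootstrap resample
theta_k = k_step_estimate(x_star, theta_hat, k=3)
theta_exact = np.log(x_star.mean())          # full minimization, for comparison
print(theta_k - theta_exact)
```

Because Newton–Raphson converges quadratically near the minimizer and the original estimate is already close to the bootstrap minimizer, a small fixed k suffices here; each step only solves a linear equation.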

LEMMA 2.1. Let (ξ_n) be a stationary ψ₁-weakly dependent (see Definition 3.4) sequence with

such that

for some a > 0, as r ↑ ∞, and

. Then

Here θ_r is a dependence coefficient, and it is not related to a statistical parameter denoted θ and estimated by

satisfies

Following precisely the same steps as in Hall and Horowitz (1996) and only replacing their Lemma 1 by Lemma 2.1, we thus prove that bootstrap critical values for GMM estimators are valid.

Theorems 1–3 in Hall and Horowitz (1996) are now rigorously proved. A paper on this topic by the authors is in preparation to provide sharper results; for instance, the exponent 33 in the previous lemma is unnatural and can be improved.

2.4. Sieve Bootstrap

Bickel and Bühlmann (1999) tackle the problem of the “sieve bootstrap” for a one-sided linear process

where (ξ_n) is a sequence of i.i.d. random variables (r.v.s) with

and the density function f_ξ, and where

. Under the assumption that the function

has no root in the closed unit disk, the process (2.7) admits an AR(∞) representation

with

. Equation (2.8) is fitted with an autoregressive process of finite order p(n) (p(n)/n → 0, p(n) → ∞). Using estimated residuals, the resampling (i.i.d.) innovation process

is constructed by smoothing the empirical process based on those residuals by a kernel density estimate of the density f_ξ. Finally, the smoothed sieve bootstrap sample

is defined by resampling the AR(p(n)) process from innovations
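The sieve construction can be sketched with an ordinary least squares AR(p) fit. The sketch assumes a fixed order p (in the theory p = p(n) → ∞ with p(n)/n → 0), centered residuals, a burn-in period, and, for the smoothed version, a Gaussian kernel with a hypothetical bandwidth `smooth_bw`: adding smooth_bw · N(0,1) noise to a resampled residual amounts to drawing from a Gaussian kernel density estimate of the residuals.

```python
import numpy as np

def fit_ar_ols(x, p):
    """Least-squares AR(p) fit to the centred series; returns coefficients and centred residuals."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Row for time t (t = p, ..., n-1) holds the lags (X_{t-1}, ..., X_{t-p}).
    design = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
    target = x[p:]
    phi, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ phi
    return phi, resid - resid.mean()

def sieve_bootstrap(x, p, rng, smooth_bw=None, burn=200):
    """One sieve bootstrap path of length len(x) driven by resampled residuals."""
    phi, resid = fit_ar_ols(x, p)
    n = len(x)
    e_star = rng.choice(resid, size=n + burn)
    if smooth_bw is not None:
        # Smoothed version: residual + h * N(0, 1) is a draw from a Gaussian kernel density estimate.
        e_star = e_star + smooth_bw * rng.normal(size=n + burn)
    x_star = np.zeros(n + burn)
    for t in range(p, n + burn):
        x_star[t] = phi @ x_star[t - p:t][::-1] + e_star[t]
    return x_star[burn:] + np.mean(x), phi

rng = np.random.default_rng(5)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()       # hypothetical AR(1) data

x_star, phi = sieve_bootstrap(x, p=4, rng=rng)                   # classic sieve
x_smooth, _ = sieve_bootstrap(x, p=4, rng=rng, smooth_bw=0.2)    # smoothed sieve
```

Only the smoothed version has innovations with a density, which is the case singled out in the discussion that follows.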

The purpose of the paper was to carry over a weak dependence property (here strong mixing) of the initial sequence

to the sieve processes

(a classic and a smoothed version were examined in the paper). This goal is unrealistic for the classic bootstrap sample because the distribution of the bootstrapped innovations is discrete. For the smoothed sieve bootstrap sample, a proof of the mixing property remained beyond the authors' reach. In that case, limit theorems can nevertheless be proved by another method, which relies on the following property:

with

and for smooth functions g₁,g₂ belonging to the classes

(see equation (3.1) in Bickel and Bühlmann for the definition of the class

and some examples). The new dependence coefficient ν is bounded by the strong mixing coefficient. Since Bickel and Bühlmann (1999) cannot prove that the sieve sequence (X_n*) is strongly mixing, they define a weak dependence condition through the ν coefficient and prove that it is satisfied both by this sequence and by a smoothed version of the resampled innovations. For instance, they prove that if the sequence

satisfies some regularity conditions ensuring that

(recall that ν_k ≤ α_k), then the sieve bootstrap process

satisfies a ν-mixing condition with a polynomial rate

for relevant classes

and a positive constant L. See Theorem 3.2 in Bickel and Bühlmann (1999, p. 422) for more details.

2.5. A Semiparametric Estimation Problem

We follow the presentation in Robinson (1989). He considers an economic variable observable at time n that is an R × 1 vector of r.v.s

. We observe W_n at times n = 1 − P, 2 − P, …, T, where P is nonnegative and T is large. Hypotheses of economic interest often involve a subvector X_n = B(W_n′,…,W_{n−P}′)′ of the array (W_n′,…,W_{n−P}′)′; here B is a J × (PR) matrix formed from the PR-rowed identity matrix I_{PR} by omitting PR − J rows, so that the elements of W_n, W_{n−1}, …, W_{n−P} that are not in X_n are deleted. Consequently, X_n can have elements in common with X_{n+P−1}, …, X_{n+1}, X_{n−1}, …, X_{n−P}. Let X_n = (Y_n′,Z_n′)′, where Y_n and Z_n are K × 1 and L × 1 vectors (K + L = J). The problem of interest is to test the hypothesis

against the alternative

. This null hypothesis is written in the form

for

for some M > 0 and some function H(z,z) defined as

for some convenient function G and where

for any Borel set A of

. Here $f^{(j)}(z)$ denotes the vector of partial derivatives of order j of f.

An example of this framework is given by X_n = (Y_n′,Z_n′)′, where Y_n = (t_n,s_n′)′ and Z_n = v_n. The regression model

is of common use in econometrics. Here t_n, s_n, and v_n are, respectively, a scalar, a p × 1 vector, and a q × 1 vector; they are observable random sequences, whereas the innovation process (u_n) is centered and unobservable, so that

; we denote

. In the case of a weakly dependent and stationary innovation process, Robinson (1989) considers the hypothesis H₀ : β = 0. In this case, the hypothesis can be written as before, and Robinson calculates β = τ where K = p + 1, L = 1, M = 0, and G(x₁,x₂) = (t₁ − t₂)s₁φ(v₁) for some function

(usually φ ≡ 1). Robinson considers the statistics

constructed from the n-sample (X₁,…,X_n). Here,

is a U-statistic and

is the natural estimator of the covariance matrix of

. One such estimate is

. Tapered versions might be preferred (see Robinson, 1989, formula (2.21)); here

with $d_{i,j} = G(X_i,X_j)\,k(Z_i - Z_j)/h$, where $k(z) = h^{-L}\big(k(z), h^{-1}k^{(1)}(z)', \dots, h^{-M}k^{(M)}(z)'\big)$. Under β-mixing assumptions, Robinson proves that these estimates are √n-consistent and satisfy a CLT. More precisely, under a natural β-mixing condition, he proves that the statistic

has asymptotically a χ²-distribution if

where b > μ/(μ − 2) under the moment assumption

The β-mixing assumption allows one to compare the joint distribution of the initial sequence with that of a sequence of r.v.s with independent blocks, however large the blocks may be; this construction relies on Berbee's coupling lemma. Yoshihara (1976) derives a covariance inequality suited to U-statistics. A way to dispense with β-mixing conditions is to consider an independent realization

of the trajectory X₁,…,X_n. Now a simpler estimator of τ is given by

The asymptotic behavior of this expression is easy to derive under alternative weak dependence conditions by using our results because

is the numerator of a Nadaraya–Watson kernel estimator for the regression estimation problem

in the special case of the previous example. In fact, this trick avoids the corresponding coupling construction for U-statistics. For another application to semiparametric problems, see, for example, Chen and Fan (1999).

2.6. Nonparametric Problems

For a stationary process $(Z_t)_{t\in\mathbb{Z}}$ with Z_t = (X_t,Y_t), an important quantity is the regression function $r(x) = E(Y_t \mid X_t = x)$. Various methods to fit such a function have been developed. Nadaraya–Watson kernel estimates are very popular; see, for instance, Rosenblatt (1991), Prakasa Rao (1983), and Robinson (1983). Let K be a Lipschitz kernel function with compact support that integrates to 1.
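A Nadaraya–Watson estimate built on such a kernel can be sketched as follows. The Epanechnikov kernel, the bandwidth h = 0.3, and the simulated data are illustrative assumptions: the data come from a hypothetical stationary Gaussian AR(1) with coefficient 0.5, and the pair (X_{t−1}, X_t) plays the role of Z_t, so that the true regression function is r(x) = 0.5x.

```python
import numpy as np

def epanechnikov(u):
    """Lipschitz kernel supported on [-1, 1] that integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u * u), 0.0)

def nadaraya_watson(x_grid, X, Y, h):
    """r_hat(x) = sum_t K((x - X_t)/h) Y_t / sum_t K((x - X_t)/h) on a grid of x values.

    Returns NaN at grid points with no X_t within distance h.
    """
    w = epanechnikov((np.asarray(x_grid, dtype=float)[:, None] - X[None, :]) / h)
    num = w @ Y
    den = w.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den

# Hypothetical stationary AR(1) data; regress X_t on X_{t-1}, so r(x) = 0.5 x.
rng = np.random.default_rng(6)
n = 20_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
r_est = nadaraya_watson([0.0, 1.0], x[:-1], x[1:], h=0.3)
print(r_est)
```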

Among other problems, one may wish to estimate the volatility of financial time series, v(x) = Var(X_t | X_{t−1} = x). This question enters the general framework because v(x) = v₂(x) − v₁²(x), where $v_j(x) = E(X_t^j \mid X_{t-1} = x)$ for j = 1, 2.

Another important problem of econometric interest is to estimate the marginal density f of a stationary sample. Density kernel estimators built on K are available. Derivatives of density and regression functions can also be estimated by analogous procedures.
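The volatility and density estimators described above can be sketched with the same kernel. The sketch assumes an Epanechnikov kernel, a bandwidth h = 0.3, and data from a hypothetical ARCH(1) model X_t = (0.5 + 0.3X²_{t−1})^{1/2} ε_t with Gaussian ε_t, for which v(x) = 0.5 + 0.3x². The density estimate uses a subsample only to keep the evaluation grid small, and its total mass is checked with a Riemann sum.

```python
import numpy as np

def epanechnikov(u):
    """Lipschitz kernel supported on [-1, 1] that integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u * u), 0.0)

def kernel_density(x_grid, X, h):
    """f_hat(x) = (n h)^{-1} sum_t K((x - X_t) / h)."""
    u = (np.asarray(x_grid, dtype=float)[:, None] - X[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (len(X) * h)

def conditional_variance(x_grid, X, h):
    """v_hat(x) = v2_hat(x) - v1_hat(x)**2, with v_j(x) = E(X_t^j | X_{t-1} = x)."""
    lag, cur = X[:-1], X[1:]
    w = epanechnikov((np.asarray(x_grid, dtype=float)[:, None] - lag[None, :]) / h)
    den = w.sum(axis=1)
    v1 = (w @ cur) / den
    v2 = (w @ cur**2) / den
    return v2 - v1**2

# Hypothetical ARCH(1) data: X_t = sqrt(0.5 + 0.3 X_{t-1}^2) * N(0, 1), so v(x) = 0.5 + 0.3 x^2.
rng = np.random.default_rng(7)
n = 20_000
X = np.zeros(n)
for t in range(1, n):
    X[t] = np.sqrt(0.5 + 0.3 * X[t - 1] ** 2) * rng.normal()

v_est = conditional_variance([0.0, 1.0], X, h=0.3)     # targets 0.5 and 0.8
grid = np.linspace(-8.0, 8.0, 801)
f_est = kernel_density(grid, X[:5000], h=0.3)           # subsample keeps the grid small
mass = f_est.sum() * (grid[1] - grid[0])                # Riemann sum of f_hat
```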

Finally, conditional quantiles are linked to the conditional distribution function $F(y \mid x) = P(Y_t \le y \mid X_t = x)$. More precisely, we denote by q(t|x) = inf{y; F(y|x) > t} the generalized (right-continuous, with left limits) inverse of the monotone function y ↦ F(y|x). Consistent estimators of the conditional regression

, where

, provide information on the previous conditional quantiles.
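The link between conditional distribution estimates and conditional quantiles can be sketched by inverting a kernel-weighted empirical conditional distribution function, following the definition q(t|x) = inf{y; F(y|x) > t}. The Epanechnikov kernel, the bandwidth, and the hypothetical Gaussian AR(1) data (for which X_t given X_{t−1} = x is N(0.5x, 1)) are illustrative assumptions.

```python
import numpy as np

def epanechnikov(u):
    """Lipschitz kernel supported on [-1, 1] that integrates to 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u * u), 0.0)

def conditional_quantile(t, x, X, Y, h):
    """q_hat(t|x) = inf{y : F_hat(y|x) > t} with kernel weights on X.

    F_hat(y|x) = sum_i K((x - X_i)/h) 1{Y_i <= y} / sum_i K((x - X_i)/h).
    """
    w = epanechnikov((x - X) / h)
    total = w.sum()
    if total == 0:
        raise ValueError("no observation within one bandwidth of x")
    order = np.argsort(Y)
    cdf = np.cumsum(w[order]) / total          # F_hat evaluated at the sorted Y values
    k = np.searchsorted(cdf, t, side="right")  # first index with F_hat > t
    return float(Y[order][min(k, len(Y) - 1)])

# Hypothetical Gaussian AR(1): given X_{t-1} = x, X_t ~ N(0.5 x, 1).
rng = np.random.default_rng(8)
n = 20_000
z = np.zeros(n)
for s in range(1, n):
    z[s] = 0.5 * z[s - 1] + rng.normal()
X, Y = z[:-1], z[1:]
q_median_at_1 = conditional_quantile(0.5, 1.0, X, Y, h=0.3)   # true value 0.5
q90_at_0 = conditional_quantile(0.9, 0.0, X, Y, h=0.3)        # true value about 1.2816
```

Using `side="right"` in the search returns the first sorted response at which the estimated step function strictly exceeds t, matching the right-continuous inverse used in the text.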

3. WEAK DEPENDENCE CONDITIONS

Various generalizations of independence have been introduced to answer the econometric questions discussed in Section 2. The martingale setting was the first extension of the independence framework (Hall and Heyde, 1980); weakening martingale conditions yields mixingales. Martingale conditions are written in terms of conditional expectations, and they seem to be quite restrictive in econometric practice. NED is a more flexible tool for modeling fading memory. The ergodic theorem is the first limit theorem proved for dependent sequences. Another point of view is given by the mixing properties of stationary sequences in the sense of ergodic theory; uniform versions of such properties are the mixing conditions considered below. Those conditions are expressed through independence properties of the σ-algebras generated by the past and the future of the process, and they are difficult to check (see Doukhan, 1994).

Our aim is to promote the weak dependence properties, which will be seen to be much easier to prove.

3.1. Mixing

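Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{U}$ and $\mathcal{V}$ be two sub σ-algebras of $\mathcal{A}$. Various measures of dependence between $\mathcal{U}$ and $\mathcal{V}$ have been introduced; among them we recall

$$\alpha(\mathcal{U},\mathcal{V}) = \sup_{U\in\mathcal{U},\,V\in\mathcal{V}} |P(U\cap V) - P(U)P(V)|,$$

$$\beta(\mathcal{U},\mathcal{V}) = E\,\sup_{V\in\mathcal{V}} |P(V\mid\mathcal{U}) - P(V)|,$$

$$\rho(\mathcal{U},\mathcal{V}) = \sup_{f\in L^2(\mathcal{U}),\,g\in L^2(\mathcal{V})} |\mathrm{Corr}(f,g)|,$$

$$\phi(\mathcal{U},\mathcal{V}) = \sup_{U\in\mathcal{U},\,P(U)>0,\;V\in\mathcal{V}} |P(V\mid U) - P(V)|.$$

These coefficients are, respectively, the strong mixing coefficient α of Rosenblatt (1956), the absolute regularity coefficient β of Wolkonski and Rozanov (1959, 1961), the maximal correlation coefficient ρ of Kolmogorov and Rozanov (1960), and the uniform mixing coefficient φ of Ibragimov (1962).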

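Let $X = (X_t)_{t\in\mathbb{Z}}$ be a discrete time stationary process. We denote by $X_A = \{X_t;\ t\in A\}$ the A-marginal of X, for $A\subset\mathbb{Z}$. Finally, σ(Z) will denote the σ-algebra generated by an r.v. Z.

For any coefficient c(·,·) defined previously, we shall call the process X a c-mixing process if $\lim_{k\to\infty} c_{X,k} = 0$, where $c_{X,k} = c\big(\sigma(X_{]-\infty,0]}),\,\sigma(X_{[k,+\infty[})\big)$. The following relations hold:

$$\phi\text{-mixing} \Rightarrow \beta\text{-mixing} \Rightarrow \alpha\text{-mixing}, \qquad \phi\text{-mixing} \Rightarrow \rho\text{-mixing} \Rightarrow \alpha\text{-mixing},$$

and no reverse implication holds in general. See Doukhan (1994) for more information.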

3.2. Mixingales and Near Epoch Dependence

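Let $(X_n)_{n\in\mathbb{Z}}$ be a real-valued process. We let $\|X\|_p = (E|X|^p)^{1/p}$.

DEFINITION 3.1 (McLeish, 1975; Andrews, 1988). Let p ≥ 1 and let $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ be an increasing sequence of σ-algebras. The sequence $(X_n, \mathcal{F}_n)_{n\in\mathbb{Z}}$ is called an $L^p$-mixingale if there exist nonnegative sequences $(c_n)_{n\in\mathbb{Z}}$ and $(\psi(k))_{k\ge0}$ such that ψ(k) → 0 as k → ∞ and for all integers $n\in\mathbb{Z}$ and $k \ge 0$,

$$\|E(X_n\mid\mathcal{F}_{n-k})\|_p \le c_n\,\psi(k), \qquad \|X_n - E(X_n\mid\mathcal{F}_{n+k})\|_p \le c_n\,\psi(k+1).$$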

This property of fading memory is easier to handle than the martingale condition. A more general concept is the NED on a mixing process. Its definition can be found in the work by Billingsley (1968), who considered functions of φ-mixing processes.

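DEFINITION 3.2 (Pötscher and Prucha, 1991a, 1991b). Let p ≥ 1. We consider a c-mixing process $(V_n)_{n\in\mathbb{Z}}$. For any integers i ≤ j, set $\mathcal{F}_i^j = \sigma(V_i,\dots,V_j)$. The sequence $(X_n)_{n\in\mathbb{Z}}$ is called an $L^p$-NED process on the c-mixing process $(V_n)_{n\in\mathbb{Z}}$ if there exist nonnegative sequences $(d_n)_{n\in\mathbb{Z}}$ and $(\psi(k))_{k\ge0}$ such that ψ(k) → 0 as k → ∞ and for all integers $n\in\mathbb{Z}$ and $k \ge 0$,

$$\|X_n - E(X_n\mid\mathcal{F}_{n-k}^{n+k})\|_p \le d_n\,\psi(k).$$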

This approach is developed in detail in Pötscher and Prucha (1991). Functions of MA(∞) processes can be handled using the NED concept; for instance, limit theorems can be deduced for sums of such functions of MA(∞) processes. These definitions express fading memory: for a mixingale, the projection of X_n on the information available k periods earlier converges to the unconditional mean, whereas an NED process is well approximated by its projection on the σ-algebra generated by the mixing process between times n − k and n + k. Both properties are satisfied by a wide class of models. For example, martingale differences are mixingales, and so are linear processes with martingale difference innovations.

3.3. Association

The notion of association was introduced independently by Esary, Proschan, and Walkup (1967) and Fortuin, Kasteleyn, and Ginibre (1971).

The motivations of those authors were radically different: the first group was working in reliability theory and the second in statistical mechanics. The condition of the second group is known as the FKG inequality.

DEFINITION 3.3. The sequence $(X_t)_{t\in\mathbb{Z}}$ is associated if, for all coordinatewise increasing real-valued functions h and k,

$$\mathrm{Cov}\big(h(X_A),\,k(X_B)\big) \ge 0$$

for all finite disjoint subsets A and B of $\mathbb{Z}$.

This extends the positive correlation assumption to model the notion that two stochastic processes have a tendency to evolve in a similar way.

This definition is stronger than mere positive correlation. Besides the evident fact that it does not require the existence of variances, one can easily construct orthogonal (hence nonnegatively correlated) sequences that do not have the association property. An important difference between the two conditions is that an uncorrelated associated sequence is independent (Newman, 1984). Let, for instance, (ξ_k) and (η_k) be two independent i.i.d. sequences, with (ξ_k) centered. Then the sequence $(X_k)_{k\in\mathbb{Z}}$ defined by X_k = ξ_k(η_k − η_{k−1}) is uncorrelated but not independent, and hence it is not an associated sequence. Association is inherited only under monotone transformations. This unpleasant restriction will disappear under the assumption of weak dependence.
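The counterexample can be checked by simulation. The sketch assumes, for concreteness, that ξ_k takes the values ±1 with probability 1/2 and that η_k is standard Gaussian. Then X_k² = (η_k − η_{k−1})², and since (η_k − η_{k−1}, η_{k+1} − η_k) is a Gaussian pair with correlation −1/2, the lag-one correlation of the squares equals (−1/2)² = 1/4, whereas the X_k themselves are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
xi = rng.choice([-1.0, 1.0], size=n)    # centred i.i.d. signs: E xi_k = 0
eta = rng.normal(size=n + 1)             # i.i.d., independent of (xi_k)
X = xi * (eta[1:] - eta[:-1])            # X_k = xi_k (eta_k - eta_{k-1})

def lag1_corr(z):
    """Sample correlation between z_k and z_{k+1}."""
    return float(np.corrcoef(z[:-1], z[1:])[0, 1])

corr_X = lag1_corr(X)          # near 0: the X_k are uncorrelated
corr_X2 = lag1_corr(X ** 2)    # near 0.25: the X_k are not independent
```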

The preceding property of associated sequences was a guideline for the forthcoming definition of weak dependence. It contains the idea that weakly correlated associated sequences are also “weakly dependent.” The very explicit inequality (B.2) proves that this idea makes sense.

Negatively associated sequences of r.v.s, on the other hand, are defined by the same covariance inequality with the reverse sign. Shao (2000) provides a lucid summary of this type of association and points out a crucial property: domination by comparable independent sequences. This property breaks the apparent symmetry between the definitions of positively and negatively associated sequences. We shall develop this idea further.

3.4. Weak Dependence

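Here we shall make more explicit the asymptotic independence between “past” and “future.” Roughly speaking, for convenient functions h and k, we shall assume that

$$\mathrm{Cov}\big(h(\text{“past”}),\,k(\text{“future”})\big)$$

is small when the distance between the “past” and the “future” is sufficiently large. Define by $\mathbb{L}^\infty$ the union of the sets $\mathbb{L}^\infty(\mathbb{R}^u)$, u ≥ 1, of numerical bounded measurable functions on the euclidean spaces $\mathbb{R}^u$, and ∥·∥_∞ the corresponding uniform norm. We define the Lipschitz modulus of a function $h:\mathbb{R}^u\to\mathbb{R}$ as

$$\mathrm{Lip}(h) = \sup_{x\ne y}\frac{|h(x)-h(y)|}{|x_1-y_1|+\cdots+|x_u-y_u|}$$

if $x = (x_1,\dots,x_u)$ and $y = (y_1,\dots,y_u)$. Consider the class

$$\mathcal{L} = \{h\in\mathbb{L}^\infty;\ \mathrm{Lip}(h) < \infty,\ \|h\|_\infty \le 1\}.$$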

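DEFINITION 3.4 (Doukhan and Louhichi, 1999). A sequence $(X_n)_{n\in\mathbb{Z}}$ is called $(\theta,\mathcal{L},\psi)$-weakly dependent if there exist a sequence $\theta = (\theta_r)_{r\in\mathbb{N}}$ decreasing to zero at infinity and a function ψ with arguments $(h,k,u,v)\in\mathcal{L}\times\mathcal{L}\times\mathbb{N}^*\times\mathbb{N}^*$ such that for any u-tuple (i₁,…,i_u) and any v-tuple (j₁,…,j_v) with i₁ ≤ … ≤ i_u < i_u + r ≤ j₁ ≤ … ≤ j_v, one has

$$\big|\mathrm{Cov}\big(h(X_{i_1},\dots,X_{i_u}),\,k(X_{j_1},\dots,X_{j_v})\big)\big| \le \psi(h,k,u,v)\,\theta_r$$

if the functions h and k are defined, respectively, on $\mathbb{R}^u$ and on $\mathbb{R}^v$.

Notice that the sequence θ depends both on the class $\mathcal{L}$ and on the function ψ. The function ψ can in fact depend on all its arguments, contrary to the case of bounded mixing sequences. This property is inherited by images of the sequence under suitable functions.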

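The examples of interest to follow involve the functions ψ₁(h,k,u,v) = u Lip(h) + v Lip(k), ψ₁′(h,k,u,v) = v Lip(k), ψ₂(h,k,u,v) = uv Lip(h)Lip(k), and ψ₂′(h,k,u,v) = v Lip(h)Lip(k). For example, the proof that an MA(∞) process $X_n = \sum_{j\ge0} a_j\xi_{n-j}$, based on an i.i.d. sequence such that $E|\xi_0| < \infty$ and $\sum_j |a_j| < \infty$, is ψ₁′-weakly dependent with $\theta_r = 2E|\xi_0|\sum_{j\ge r}|a_j|$ relies on the decomposition $X_n = \tilde X_n + \bar X_n$ with $\tilde X_n = \sum_{j<r} a_j\xi_{n-j}$. In this case, assuming for simplicity that v = 1 and j₁ = n, and noting that $\tilde X_n$ is independent of $(X_{i_1},\dots,X_{i_u})$ because $i_u \le n - r$, we have (using $\|h\|_\infty \le 1$)

$$\big|\mathrm{Cov}\big(h(X_{i_1},\dots,X_{i_u}),k(X_n)\big)\big| = \big|\mathrm{Cov}\big(h(X_{i_1},\dots,X_{i_u}),k(X_n)-k(\tilde X_n)\big)\big| \le 2\,\mathrm{Lip}(k)\,E|\bar X_n| \le 2\,\mathrm{Lip}(k)\,E|\xi_0|\sum_{j\ge r}|a_j|.$$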

Thanks to an anonymous referee, we can prove that NED implies our weak dependence through the following inequalities. For simplicity, write h = h(X_{i_1},…,X_{i_u}) and k = k(X_{j_1},…,X_{j_v}); then the Cauchy–Schwarz inequality gives

and the last expression can be bounded using the

-mixingale property of

-NED sequences. This argument only yields an implication; whether or not the two notions are equivalent is an open question.

4. MODELS: DEPENDENCE PROPERTIES

4.1. Markovian Models

4.1.1. Models with a Markovian Representation.

Let

be a real-valued i.i.d. sequence and let M be some function. We consider vector-valued models driven by the equation

To justify the title of this section, note that the vector-valued sequence

, where $X_n^{(p)} = (X_{n-1},\dots,X_{n-p})$, is Markovian. Proposition 7.6 of Kallenberg (1997) shows that any Markov process has such a representation.

Under “reasonable” assumptions (described subsequently) such models can be rewritten as ergodic Markov chains (see Meyn and Tweedie, 1993; Tjøstheim, 1990). Thus, stationarity can be achieved. An interesting class is given by

where

are two mutually independent i.i.d. sequences and the function S satisfies S(x₁,…,x_p) ≥ s > 0 for some

, and any real numbers x₁,…,x_p and the functions R and S essentially satisfy contraction assumptions (for developments, see Doukhan, 1994; Ango Nze, 1995, 1998; Duflo, 1990).

For instance, ARMA(p,q) processes

have such a Markov representation in the case when the roots of the polynomial

lie outside the unit disk. Indeed,

, where

is a Markov process. See Mokkadem (1990).

A further example is that of bilinear models, which are popular in econometrics.⁴

⁴Further technical details on this topic are provided by Granger and Andersen (1978), among other references.

Doubly stochastic autoregressive processes, X_n = η_n X_{n−1} + ξ_n, are further examples of such models.

Econometricians have introduced generalized ARCH-GARCH processes:

to model conditional variances (interpreted, e.g., as asset volatility in finance theory) that change over time (for further references, see Bollerslev, 1986). These models are known to satisfy the NED property of Definition 3.2.

Note that functional autoregressive models correspond to constant functions s (see Bollerslev, 1986).

Moreover, threshold models are those for which r is linear on each region of a partition of the space into polygonal regions. For example, Petruccelli and Woolford (1984) study threshold autoregressive models such as

$$X_n = aX_{n-1}^+ + bX_{n-1}^- + \xi_n,$$

where x⁺ = max(x,0) and x⁻ = min(x,0). This model is ergodic if a < 1, b < 1, and ab < 1, and it has geometric rates of convergence in total variation to the stationary limit if the centered sequence (ξ_n) has a finite exponential moment and its distribution has a density with respect to Lebesgue measure. If, for instance, (a,b) = (½,−2), the function r(x) = ax⁺ + bx⁻ satisfies these conditions, but it is not a contraction.
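The example (a,b) = (½,−2) can be explored numerically. The sketch assumes standard Gaussian innovations (which are centered, have exponential moments, and have a density) and starts the chain far from the origin. It checks the ergodicity conditions, shows that r has Lipschitz ratio 2 on [−1,0], and confirms that the simulated path nevertheless returns to and stays in a bounded region.

```python
import numpy as np

def r(x, a=0.5, b=-2.0):
    """Threshold autoregression function r(x) = a x^+ + b x^-, x^+ = max(x,0), x^- = min(x,0)."""
    return a * max(x, 0.0) + b * min(x, 0.0)

def simulate_tar(n, a=0.5, b=-2.0, x0=0.0, seed=0):
    """Simulate X_t = r(X_{t-1}) + xi_t with i.i.d. N(0,1) innovations (an assumption)."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(size=n)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = r(x[t - 1], a, b) + xi[t]
    return x

a, b = 0.5, -2.0
ergodic = a < 1 and b < 1 and a * b < 1                    # Petruccelli-Woolford conditions
lipschitz_ratio = abs(r(-1.0) - r(0.0)) / abs(-1.0 - 0.0)  # = 2 > 1: r is not a contraction
path = simulate_tar(50_000, a, b, x0=-1e6)                 # start far out on the negative side
print(ergodic, lipschitz_ratio, np.abs(path[1000:]).max())
```

A negative value is doubled in absolute value and sent to the positive half-line, where it is then halved at each step; the two regimes compensate, which is what the condition ab < 1 captures even though r itself expands distances.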

ARCH or GARCH models are those with nonconstant functions s, such as square roots of nonnegative polynomials with degree 2, namely,

with |a| + |c| < 1. Vector-valued versions of such models can also be described. They include GARCH models. He and Teräsvirta's paper (1999) looks at the existence of marginal moments and conditions for stationarity of GARCH models. The following example of a Markovian nonmixing sequence is given in Andrews (1984) and Rosenblatt (1985). This is the (Markov) AR(1)-process with binomial innovations

This is also the Bernoulli shift X_n = H(ξ_n,ξ_n−1,…) with

. Full definitions of Bernoulli shifts will be given in Section 4.3. This model has a uniform stationary distribution on the interval [0,1], but it satisfies no mixing condition. Indeed, the innovations ξ_j (j ≤ n) are the digits of the dyadic expansion of X_n; hence, X_k is a deterministic function of X_n for k ≤ n. The extension of this model to innovations taking p different values is immediate: one uses the expansion in base p. The process X_n = 0.ξ_nξ_n−1… (written in base p) is the solution of the recurrence equation X_n = (1/p)(X_n−1 + ξ_n) if the innovations are uniform on {0,1,…,p − 1}.
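To make the nonmixing mechanism concrete, here is a minimal sketch (ours, not from the paper) of the recursion X_n = (1/p)(X_n−1 + ξ_n). It uses exact rational arithmetic (Python's fractions.Fraction), so the inversion ξ_n = ⌊pX_n⌋, X_n−1 = pX_n − ξ_n is exact. The inversion works because X_n−1 ∈ [0,1) while ξ_n is an integer. The function names are hypothetical.

```python
from fractions import Fraction
import random


def simulate(x0: Fraction, innovations: list[int], p: int = 2) -> list[Fraction]:
    """Run X_n = (X_{n-1} + xi_n) / p exactly; innovations take values in {0, ..., p - 1}."""
    path = [x0]
    for xi in innovations:
        path.append((path[-1] + xi) / p)
    return path


def recover_past(x_n: Fraction, steps: int, p: int = 2) -> tuple[list[int], list[Fraction]]:
    """Invert the recursion using only X_n: xi_n = floor(p X_n) and X_{n-1} = p X_n - xi_n."""
    digits, past, x = [], [x_n], x_n
    for _ in range(steps):
        xi = int(p * x)  # floor, since p * x >= 0
        digits.append(xi)
        x = p * x - xi
        past.append(x)
    return digits, past


if __name__ == "__main__":
    rng = random.Random(0)
    xis = [rng.randrange(2) for _ in range(20)]
    path = simulate(Fraction(1, 3), xis)
    digits, past = recover_past(path[-1], 20)
    # The whole past (innovations and states) is a deterministic function of X_20.
    print(digits == xis[::-1], past == path[::-1])
```

The starting value must lie in [0,1); then every X_n stays in [0,1) and the recovery never fails.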

4.1.2. Weak Dependence Properties.

Lipschitzian models (see Duflo, 1990) are multivariate Markov models, defined recursively through X_n = M(X_n−1,ξ_n) and the assumptions that

for some 0 ≤ a < 1 and S ≥ 1. Here, the

-space where the process lives is equipped with some (not necessarily Euclidean) norm ∥·∥. Duflo (1990) introduces the concept of stability of such Markov chains and proves their geometric stability. That is, denoting by Φ_n(X₀) the value at time n of the chain started from the initial state X₀, there exists some c ∈ [0,1[ such that for any

In the particular case where X₀'s distribution is the stationary probability measure,

Using those results, Doukhan and Louhichi (1999) deduce

-weak dependence. In fact, under the assumptions that follow, one has

. Here neither stationarity nor any further regularity assumption on the sequence of innovations is required. Such contraction properties are also used by Pötscher and Prucha (1991). More general AR(p) nonlinear models, X_n = M(X_n−1,…,X_n−p;ξ_n), have the same properties, if, for example,

and, for some constants a_j ≥ 0, 1 ≤ j ≤ p, with

The more recent papers by Diaconis and Freedman (1999) and Jarner and Tweedie (2001) provide a wide range of examples in this spirit. Alsmeyer and Fuh (2001) give conditions for arithmetic decay of the weak dependence coefficient sequence. Both papers study iterated random sequences M_n = F(ε_n,M_n−1) for independent sequences (ε_n) and some function F that is measurable and Lipschitz in the second variable. The process (M_n) takes values in a complete separable metric space (E,d) and forms a Markov chain. Assuming the existence of a unique invariant distribution π, both papers prove, using different methods, that

if for some x₀ ∈ E and p > 0

The distance

is the Prohorov metric associated with d.
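The geometric stability behind these Lipschitzian results can be seen through a synchronous coupling: run the chain from two initial states with the same innovations. The sketch below uses the illustrative model M(x, ξ) = a·sin(x) + ξ with a = 1/2. This model is our choice, not one from the paper. For it, |M(x,ξ) − M(y,ξ)| ≤ a|x − y| holds for every ξ, so the gap after n steps is at most aⁿ|x − y|. Duflo's assumptions are stated in mean rather than pathwise; this pathwise case is only the simplest instance.

```python
import math
import random


def coupled_gaps(x0: float, y0: float, a: float = 0.5, n: int = 50, seed: int = 1) -> list[float]:
    """Distances |X_k(x0) - X_k(y0)|, k = 0..n, for X_k = a*sin(X_{k-1}) + xi_k with the SAME innovations."""
    rng = random.Random(seed)
    x, y, gaps = x0, y0, [abs(x0 - y0)]
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        x, y = a * math.sin(x) + xi, a * math.sin(y) + xi
        gaps.append(abs(x - y))
    return gaps


if __name__ == "__main__":
    gaps = coupled_gaps(10.0, -10.0)
    # Each gap is bounded by a**k * |x0 - y0|, up to floating-point rounding.
    print(all(g <= 0.5 ** k * 20.0 + 1e-12 for k, g in enumerate(gaps)))
```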

4.1.3. Mixing Properties.

Mixing properties of the models with a Markovian representation, X_n = M(X_n−1,…,X_n−p,ξ_n), are described in Meyn and Tweedie (1993). The preceding models are ergodic under suitable assumptions on ξ₀'s distribution.

Assume that

and assume the existence of an almost surely nonvanishing density f for ξ₀'s law. Then, under contraction assumptions on the function M, one can prove that, under the invariant initial distribution,

. If M(X_n−1,…,X_n−p,ξ_n) = R(X_n−1,…,X_n−p) + ξ_n, then the second relation holds if, for example,

and R is continuous. (It is enough that R be continuous outside a null set; e.g., a piecewise continuous function R is relevant, as in the previously mentioned threshold model of Petruccelli and Woolford, 1984. For more details, see Doukhan, 1994.) In fact, Davydov (1973) has proved that Harris recurrent Markov chains are ergodic and β-mixing when stationary. Moreover, denoting by μ the stationary distribution of the Markov chain X_n, by P the transition probability kernel, and by ∥·∥_TV the norm in total variation, one has that

Returning to the more specific models introduced before, Ango Nze (1995) proves that (4.18) implies the β-mixing property under the assumption that ξ₀'s distribution has a density with respect to Lebesgue measure. The mixing coefficients decrease at a geometric rate. If, moreover, p = 1 in the preceding (functional AR(1)) model, he proves (see also Doukhan, 1994) that, under the previous assumptions on the white noise (ξ_n),

The expression

for some constants c, A > 0, and any real number x. The function R is continuous. A more general result is obtained in Ango Nze (1998) if

. Veretennikov (1999) improves on the previous hyperbolic mixing decay assumptions. Under a local Doëblin condition (implied by the preceding absolute continuity assumptions on ξ₀'s distribution), he proves that

if b < S/2 − 1 where S > 4 satisfies

. The existence of the stationary distribution is proved under the relaxed condition S ≥ 2. Improved results are provided in Fort and Moulines (2002); they are clarified in the work by Jarner and Tweedie (2001), where constants are explicitly given.

The expression

for some constants B < e^−b and A > 0 and any real number x. The function R is continuous.

If the innovations have a finite exponential moment,

, Mokkadem (1990) proves that the assumption |R(x)| ≤ |x| − ε, for some ε > 0 and all |x| large enough, ensures an analogous result: the mixing sequence (β_n) decays at an exponential rate.

The expression

if R(x) is a bounded function, continuous outside a null set, and ξ₀'s law is not orthogonal to Lebesgue measure; moreover, stationarity is no longer required. Unfortunately, this drastic boundedness condition excludes, for example, linear autoregressive processes.

The preceding results provide (upper) bounds for the mixing coefficients. It is a much harder problem to derive both upper and lower bounds of the mixing sequences from the assumptions about the model; for results about some types of Markov sequences, see Davydov (1973). Meyn and Tweedie (1993) also provide necessary and sufficient conditions for geometric ergodicity of threshold autoregressive linear processes (the function R is piecewise linear; see also Cline and Pu, 1998). Doubly stochastic AR models are geometrically ergodic if ξ's distribution has an absolutely continuous component and

(see Tjøstheim, 1986).

For the other models, we refer to Pham (1986), Doukhan (1994), and Ango Nze (1998).

4.2. Associated Sequences

Associated sequences with finite variances are

-weak dependent with θ_r = sup_i≥r Cov(X₀,X_i) (see Doukhan and Louhichi, 1999). Note that broad classes of examples of associated processes result from the fact that any independent sequence is associated and that monotonicity preserves association (for this, see Newman, 1984).

The case of Gaussian sequences is analogous. One may also consider sums of Gaussian and associated sequences, or Bernoulli shifts driven by stationary associated sequences instead of i.i.d. ones.

Linear processes with nonnegative coefficients are associated, and so are functional autoregressive processes with nondecreasing regression functions. Note that for associated or Gaussian sequences, the function ψ₂′ replaces ψ₂ if θ_r = sup_i≥r|Cov(X₀,X_i)| is replaced by θ_r = ∑_i≥r|Cov(X₀,X_i)|.
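For an associated linear process, θ_r is governed by covariances alone. Take the illustrative choice (ours) a_j = φ^j with 0 < φ < 1 and unit-variance innovations. Then Cov(X₀,X_i) = φ^i/(1 − φ²). The sketch computes both versions of θ_r from truncated coefficients: the supremum of |Cov(X₀,X_i)| over i ≥ r (ψ₂) and the sum (ψ₂′). It then compares them with the closed forms. The function names are hypothetical.

```python
import math


def autocov(coeffs: list[float], i: int, var: float = 1.0) -> float:
    """Cov(X_0, X_i), i >= 0, for X_t = sum_j coeffs[j] * xi_{t-j} with i.i.d. xi of variance var."""
    return var * math.fsum(coeffs[j] * coeffs[j + i] for j in range(len(coeffs) - i))


def theta_sup(coeffs: list[float], r: int, var: float = 1.0) -> float:
    """theta_r = sup_{i >= r} |Cov(X_0, X_i)| (the psi_2 version)."""
    return max(abs(autocov(coeffs, i, var)) for i in range(r, len(coeffs)))


def theta_sum(coeffs: list[float], r: int, var: float = 1.0) -> float:
    """theta_r = sum_{i >= r} |Cov(X_0, X_i)| (the psi_2' version)."""
    return math.fsum(abs(autocov(coeffs, i, var)) for i in range(r, len(coeffs)))


if __name__ == "__main__":
    phi = 0.6
    coeffs = [phi ** j for j in range(400)]  # nonnegative coefficients: the process is associated
    # Closed forms for a_j = phi**j: phi**r / (1 - phi**2) and phi**r / ((1 - phi) * (1 - phi**2)).
    print(theta_sup(coeffs, 3), phi ** 3 / (1 - phi ** 2))
    print(theta_sum(coeffs, 3), phi ** 3 / ((1 - phi) * (1 - phi ** 2)))
```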

Giraitis, Kokoszka, and Leipus (2000) consider ARCH(∞)-models (4.19) with nonnegative coefficients and nonnegative inputs. In that case the models are also associated.

4.3. Bernoulli Shifts

DEFINITION 4.1. Let

be a sequence of i.i.d. real-valued r.v.s and let the function

be measurable. The sequence

is called a Bernoulli shift if it is defined by

We refer the reader to Ornstein and Weiss (1990), where such models are motivated through deep ergodic theoretic arguments.

One-sided shifts are defined as X_n = H(ξ_n,ξ_n−1,ξ_n−2,…,ξ₀,ξ₋₁,ξ₋₂,…), that is,

. The model described in equation (4.16) is an example of such a shift:

. The previous model is a simple example of a weakly dependent but possibly nonmixing sequence.

4.3.1. Markov Sequences.

A general situation where sequences are one sided is the following Markov stationary setting. Consider a Markov process driven by the updating equation X_t = M(X_t−1,ξ_t), for some i.i.d. sequence

; then the function H, if it exists, is defined implicitly by the relation H(x) = M(H(x′),x₀), where x = (x₀,x₁,x₂,…) and x′ = (x₁,x₂,x₃,…). For more general Markov sequences, one may also refer to the previous section devoted to Markov processes. To prove such Bernoulli shift representations, Mokkadem (1990) and Meyn and Tweedie (1993) use the tools of control theory.
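For the linear AR(1) update M(y, u) = ay + u with |a| < 1 (our illustrative choice), the implicit relation has the explicit solution H(x) = ∑_k a^k x_k. The sketch checks H(x) = M(H(x′), x₀) in exact arithmetic. It works on a finite window, treating innovations older than the window as 0; with that truncation the identity holds exactly. Both helper names are hypothetical.

```python
from fractions import Fraction


def H(x: list[Fraction], a: Fraction) -> Fraction:
    """Truncated one-sided shift H(x) = sum_k a**k * x_k, with x_0 the most recent innovation."""
    return sum((a ** k * xk for k, xk in enumerate(x)), Fraction(0))


def M(y: Fraction, u: Fraction, a: Fraction) -> Fraction:
    """AR(1) updating function M(y, u) = a*y + u."""
    return a * y + u


if __name__ == "__main__":
    a = Fraction(1, 2)
    x = [Fraction(v) for v in (3, -1, 4, 1, -5, 9)]
    print(H(x, a) == M(H(x[1:], a), x[0], a))  # the implicit relation H(x) = M(H(x'), x_0)
```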

4.3.2. Chaotic Representations.

We now specialize the analysis to chaotic expansions associated with the discrete chaos generated by the sequence

. Let

; we write in a condensed formulation

, where H^(k)(x) denotes the kth-order chaos contribution; H⁽⁰⁾(x) = a₀⁽⁰⁾ is only a centering constant, and for k > 0,

or in short, in vector notation,

Processes associated with a finite number of chaotic terms (i.e., H^(k) = 0 if k > k₀) are also called Volterra processes. The first example of such a Volterra process is the class of linear processes, which includes autoregressive moving average (ARMA) processes. It corresponds to keeping only the term in the first chaos (i.e., k = 1 in the previous representation), and it is widely used in statistics (see, e.g., Rosenblatt, 1985). A simple and general condition for

-convergence of such series is, still in a condensed notation,

The simple bilinear process, X_t = (a + bξ_t−1)X_t−1 + ξ_t, is stationary if

(see, e.g., Tong, 1981). It is a Bernoulli shift with

, for

More general affine models are considered in Mokkadem (1990).
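Unrolling the bilinear recursion shows its Bernoulli-shift form explicitly: X_t = ∑_j≥0 [∏_{i=1..j}(a + bξ_t−i)] ξ_t−j. The sketch below checks numerically that this expansion agrees with the recursion. It works on a finite window started from X_−1 = 0, which is our truncation choice. It does not check the stationarity condition, whose exact form is the one given in Tong (1981). The function names are hypothetical.

```python
import math
import random


def bilinear_recursion(xi: list[float], a: float, b: float) -> float:
    """X_T from X_t = (a + b*xi_{t-1}) X_{t-1} + xi_t, started with X_{-1} = 0 (so X_0 = xi_0)."""
    x = xi[0]
    for t in range(1, len(xi)):
        x = (a + b * xi[t - 1]) * x + xi[t]
    return x


def bilinear_shift(xi: list[float], a: float, b: float) -> float:
    """Same X_T via the shift: sum_{j=0..T} [prod_{i=1..j} (a + b*xi_{T-i})] * xi_{T-j}."""
    T = len(xi) - 1
    total, prod = 0.0, 1.0
    for j in range(T + 1):
        total += prod * xi[T - j]
        if j < T:
            prod *= a + b * xi[T - j - 1]
    return total


if __name__ == "__main__":
    rng = random.Random(2)
    xi = [rng.gauss(0.0, 1.0) for _ in range(30)]
    x_rec, x_shift = bilinear_recursion(xi, 0.4, 0.3), bilinear_shift(xi, 0.4, 0.3)
    print(math.isclose(x_rec, x_shift, rel_tol=1e-9, abs_tol=1e-12))
```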

ARCH(∞)-models (see Giraitis et al., 2000) are given by a sequence (b_j)_j≥1 and an i.i.d. sequence of r.v.s (ξ_j)_j≥0 through the recursive relation

Such models have a stationary representation with the chaotic expansion

under the simple assumption

4.3.3. Mixing Properties.

Finite moving averages X_n = H(ξ_n,ξ_n−1,…,ξ_n−m) are trivially m-dependent.

The Bernoulli shift X_n = H(ξ_n,ξ_n−1,…), with H(x) = ∑_k≥0 2^−(k+1) x_k and i.i.d. innovations uniform on {0,1}, is not mixing; this is again example (4.16) of a Markovian, nonmixing sequence.⁵ The difference between two independent processes of this type provides an example of a nonassociated and nonmixing process.

⁵ Its stationary representation is X_n = ∑_k≥0 2^−(k+1) ξ_n−k. Here ξ_n−k is the (k + 1)th digit in the binary expansion of the uniformly distributed number X_n = 0.ξ_nξ_n−1… ∈ [0,1]. This proves that X₀ is a deterministic function of X_n for n ≥ 0, which is the main argument showing that such models are not mixing. The same argument applies to the autoregressive process described earlier with innovations taking p distinct values.

Hence one cannot expect general sufficient conditions for mixing of such Bernoulli shifts.

4.3.4. (θ,ℒ,ψ)-Weak Dependence.

Contrary to mixing conditions, it can be proved that even two-sided sequences can be

-weak dependent. For instance, for infinite moving averages

. Note also, for completeness, that (NED) conditions can also be deduced for such two-sided models. More generally, we can state the following definition.

DEFINITION 4.2. For any integer r > 0, we denote by δ_r any number such that

Such sequences

are related to the modulus of uniform continuity of H; that is, if for positive constants

, the inequality

holds for any sequences

, and if the sequence

has a finite moment of order b, then one can choose

PROPOSITION 4.1 (Doukhan and Louhichi, 1999). Bernoulli shifts are

-weak dependent with θ_r = 2δ_{r/2} and ψ(h,k,u,v) = 4(u∥k∥_∞ Lip(h) + v∥h∥_∞ Lip(k)). If, moreover, the Bernoulli shift is one sided, then it is

-weak dependent with θ_r = δ_r and ψ(h,k,u,v) = 2v Lip(k)∥h∥_∞.
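Consider a moving average X_n = ∑_j a_j ξ_n−j approximated by its terms with |j| < r. The triangle inequality gives the admissible choice δ_r = E|ξ₀| ∑_|j|≥r |a_j|, which is our illustrative choice here. The sketch turns it into the coefficients θ_r of Proposition 4.1. Rounding the subscript r/2 down to an integer is our assumption. With a_j = φ^|j|, the two-sided tail is 2φ^r/(1 − φ) for r ≥ 1.

```python
import math


def delta(coeffs: dict[int, float], r: int, mean_abs_xi: float = 1.0) -> float:
    """E|xi_0| * sum_{|j| >= r} |a_j|: bounds E|X_0 - X_0^(r)| when X_0^(r) keeps only |j| < r."""
    return mean_abs_xi * math.fsum(abs(c) for j, c in coeffs.items() if abs(j) >= r)


def theta_two_sided(coeffs: dict[int, float], r: int, mean_abs_xi: float = 1.0) -> float:
    """Two-sided case of Proposition 4.1: theta_r = 2 * delta_{r/2}, with r/2 rounded down."""
    return 2.0 * delta(coeffs, r // 2, mean_abs_xi)


def theta_one_sided(coeffs: dict[int, float], r: int, mean_abs_xi: float = 1.0) -> float:
    """One-sided case of Proposition 4.1: theta_r = delta_r."""
    return delta(coeffs, r, mean_abs_xi)


if __name__ == "__main__":
    phi = 0.5
    two_sided = {j: phi ** abs(j) for j in range(-200, 201)}
    one_sided = {j: phi ** j for j in range(0, 201)}
    print(delta(two_sided, 4), theta_two_sided(two_sided, 8), theta_one_sided(one_sided, 4))
```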

We turn back to Volterra expansions. A suitable bound for δ_r corresponds here to the stationarity condition

The one-sided example of a simple bilinear process, X_t = (a + bξ_t−1)X_t−1 + ξ_t, with convergent chaotic representation for

satisfies δ_r = θ_r = c^r(r + 1)/(1 − c); it has a geometric rate of decay under a stationarity condition set out by Tong (1981). The stochastic volatility model

is another example yielding a one-sided chaotic decomposition. The sequence (η_j) is assumed to consist of independent r.v.s and to be independent of the centered reduced sequence (ξ_n). The chaotic representation converges if

4.3.5. Linear Processes.

Using suitable assumptions on the law of ξ₀, the one-sided linear processes

satisfy β-mixing conditions. This requires the distribution of ξ₀ to be absolutely continuous. If

, and if, moreover, for some

, then

, for some C > 0 (Pham and Tran, 1985). See Doukhan (1994) for a bibliography;⁶ nevertheless, the study of two-sided linear sequences herein is not complete.

⁶ If the coefficients of the linear process decay like k^−a, then, under the preceding regularity and moment conditions, we have β_n ∼ n^−b, where b = ((a − 2)δ − 1)/(1 + δ) and a > 2 + 1/δ. Therefore, ∑_n β_n < ∞ holds if a > 3 + 2/δ (equivalently, b > 1). If, for instance, δ = 1, this becomes a > 5; on the other hand, when δ = ∞, this becomes a > 3.
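The thresholds in footnote 6 follow from the exponent b = ((a − 2)δ − 1)/(1 + δ). The condition a > 3 + 2/δ is exactly b > 1, under which n^−b is summable, and a = 2 + 1/δ is where b vanishes. A small exact-arithmetic check (ours; the function names are hypothetical) reads as follows.

```python
from fractions import Fraction


def beta_exponent(a: Fraction, delta: Fraction) -> Fraction:
    """b = ((a - 2) * delta - 1) / (1 + delta); positive exactly when a > 2 + 1/delta."""
    return ((a - 2) * delta - 1) / (1 + delta)


def summability_threshold(delta: Fraction) -> Fraction:
    """sum_n n**(-b) < inf iff b > 1 iff a > 3 + 2/delta."""
    return 3 + Fraction(2) / delta


if __name__ == "__main__":
    for d in (Fraction(1), Fraction(2), Fraction(1, 2)):
        a_star = summability_threshold(d)
        # At the threshold the exponent equals exactly 1; beyond it, b > 1.
        print(d, a_star, beta_exponent(a_star, d), beta_exponent(a_star + 1, d) > 1)
```

As δ grows, the threshold 3 + 2/δ decreases toward 3, which is the δ = ∞ case quoted in the footnote.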

5. LIMIT THEOREMS

The aim of this section is to present the state of the art on limit theorems for stationary sequences.

5.1. The Donsker Line

Consider a stationary sequence

. We assume that this sequence is integrable and centered at expectation

Denote by [x] the integer part of a real number x ([x] ≤ x < [x] + 1). The Donsker line (D_n(t))_t∈[0,1] is defined for any sample with positive size n as the following continuous time process:

We consider the following convergence result in the space

of continuous functions on the unit interval when the sample size n grows to infinity.
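A minimal sketch of the Donsker line follows, assuming the standard polygonal interpolation D_n(t) = n^−1/2(S_[nt] + (nt − [nt])X_[nt]+1) with S_k = X₁ + ⋯ + X_k. The paper's exact display may differ in notation. This D_n is continuous and piecewise linear, with D_n(k/n) = S_k/√n, so it lives in the space of continuous functions used in Theorem 5.1. The function name is hypothetical.

```python
import math


def donsker_line(xs: list[float], t: float) -> float:
    """Polygonal partial-sum process D_n(t) = n^{-1/2} (S_[nt] + (nt - [nt]) X_{[nt]+1}), t in [0, 1].

    xs = (X_1, ..., X_n) is a centered sample.
    """
    n = len(xs)
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    k = math.floor(n * t)
    s = math.fsum(xs[:k])
    if k < n:
        s += (n * t - k) * xs[k]  # xs[k] is X_{k+1} in 1-based notation
    return s / math.sqrt(n)


if __name__ == "__main__":
    xs = [1.0, -2.0, 3.0, 0.5]
    print([donsker_line(xs, t) for t in (0.0, 0.5, 0.625, 1.0)])
```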

THEOREM 5.1. The following functional convergence holds in the space

under any of the weak dependence conditions formulated subsequently:

Here,

(the series is assumed to be convergent).

Recall that here (W_t)_t∈[0,1] is the standard Brownian motion; that is, W denotes the centered Gaussian real-valued process with covariance function

To avoid triviality we shall also assume that σ ≠ 0.
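As a concrete instance of σ², assume (our example) the usual long-run variance σ² = ∑_k∈ℤ Cov(X₀,X_k) for a stationary AR(1) process X_t = φX_t−1 + ξ_t with |φ| < 1 and Var(ξ) = s². Here Cov(X₀,X_k) = s²φ^|k|/(1 − φ²), and summing gives σ² = s²/(1 − φ)². The sketch compares a truncated sum with this closed form.

```python
import math


def ar1_autocov(k: int, phi: float, s2: float = 1.0) -> float:
    """Cov(X_0, X_k) for the stationary AR(1) X_t = phi*X_{t-1} + xi_t with Var(xi) = s2, |phi| < 1."""
    return s2 * phi ** abs(k) / (1.0 - phi ** 2)


def long_run_variance(phi: float, s2: float = 1.0, lags: int = 500) -> float:
    """Truncation of sigma^2 = sum_{k in Z} Cov(X_0, X_k)."""
    return math.fsum(ar1_autocov(k, phi, s2) for k in range(-lags, lags + 1))


if __name__ == "__main__":
    for phi in (0.5, -0.5, 0.9):
        # Closed form for the AR(1): sigma^2 = s2 / (1 - phi)**2.
        print(phi, long_run_variance(phi), 1.0 / (1.0 - phi) ** 2)
```

Note that σ ≠ 0 here for every |φ| < 1, so the nondegeneracy assumption is automatic in this example.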

The preceding FCLT is known to hold in the cases that follow.

5.1.1. Strong Mixing Case (α-Mixing).

Recall that the quantile function Q of the distribution of X₀ is also the càdlàg (right continuous with left limits) inverse of the tail function

and that α⁻¹ is the càdlàg inverse of the monotone function t → α_[t]. The condition

implies the FCLT in Theorem 5.1 and also the convergence of the series in the definition of σ² (for more details, see Rio, 2000). Let δ > 0, and assume that the moment of order 2 + δ of X₀ is finite. Condition (DMR) is then equivalent to

The previous FCLT result was obtained by Davydov (1973) under the slightly stronger assumption

. In other words, both series converge for the same hyperbolic mixing decays α_n ∼ n^−a for a > 1 + 2/δ. Note that no gain seems to be obtained here when one considers β-mixing sequences.
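The claim that both series converge for the same hyperbolic decays reduces to p-series arithmetic. Assume the two conditions take their usual forms: ∑_n n^{2/δ}α_n < ∞ for (DMR) under a moment of order 2 + δ, and ∑_n α_n^{δ/(2+δ)} < ∞ for Davydov (1973). Plugging in α_n = n^−a then gives the two thresholds below, and they coincide at 1 + 2/δ. This is a sketch under that assumption; the function names are hypothetical.

```python
from fractions import Fraction


def rio_threshold(delta: Fraction) -> Fraction:
    """alpha_n ~ n**(-a): sum_n n**(2/delta) * alpha_n < inf iff a - 2/delta > 1."""
    return 1 + Fraction(2) / delta


def davydov_threshold(delta: Fraction) -> Fraction:
    """alpha_n ~ n**(-a): sum_n alpha_n**(delta/(2+delta)) < inf iff a*delta/(2+delta) > 1."""
    return (2 + delta) / delta


if __name__ == "__main__":
    for d in (Fraction(1, 2), Fraction(1), Fraction(2), Fraction(7, 3)):
        print(d, rio_threshold(d), davydov_threshold(d))  # both equal 1 + 2/delta
```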

5.1.2. ρ-Mixing Case.

The condition

with

implies the FCLT, as proven by Shao (1988). It is well known that the preceding conditions ensure nice behavior of second-order moments.

5.1.3. Associated Case.

The condition

implies the functional convergence (see Newman and Wright, 1981). Clearly, this condition is also necessary to ensure that Theorem 5.1 holds.

Notice that the property “orthogonality implies independence” makes this condition natural for this very special case of an associated sequence.

The FCLT holds in identical terms for strictly stationary negatively associated sequences. In fact, the tightness of

is shown using an exponential inequality that results from a comparison theorem on moment inequalities proved by Shao (2000).

5.1.4. Nonlinear Functions of Linear Processes.

We consider two-sided linear sequences. If the coefficients in the linear process

satisfy

, then

is the decorrelation rate of the sequence. The FCLT holds under the condition D > 1 (see Giraitis and Surgailis, 1986).

Giraitis and Surgailis (1986) prove the result for the partial sums of Φ(X_k) if Φ is some polynomial function with Appell rank m (see the definition that follows). Recall that Appell polynomials are defined through the relation

Like Hermite polynomials, they satisfy a recursive relation: for all

Setting

, one obtains, for instance,

Let F denote the distribution function of X₀. An analytic function Φ such that

has a uniquely defined Appell expansion:

Now, if the distribution function F is infinitely differentiable, then, setting f = F′ for the density function, one obtains

. A straightforward integration by parts yields

This means that the system of functions (A_k,Q_k)_k≥0 is biorthogonal.
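Assume the usual generating-function definition ∑_k A_k(x)z^k/k! = e^{zx}/E e^{zX₀}. Differentiating in x gives A_k′ = kA_k−1, and taking x = X₀ and expectations gives E A_k(X₀) = 0 for k ≥ 1. Together with A₀ = 1, these determine the polynomials from the moments of X₀. The sketch (ours; the function name is hypothetical) builds them in exact arithmetic. For Gaussian moments it reproduces the Hermite polynomials, consistent with their role as the orthogonal special case.

```python
from fractions import Fraction


def appell_polys(moments: list, kmax: int) -> list[list[Fraction]]:
    """Appell polynomials A_0..A_kmax of a law with moments[j] = E X_0**j (moments[0] = 1).

    Uses A_0 = 1, A_k' = k A_{k-1} and E A_k(X_0) = 0 for k >= 1.
    A polynomial is a list p with p[i] the coefficient of x**i.
    """
    polys = [[Fraction(1)]]
    for k in range(1, kmax + 1):
        prev = polys[-1]
        # Antiderivative of k * A_{k-1}, with zero constant term for now.
        p = [Fraction(0)] + [Fraction(k) * c / (i + 1) for i, c in enumerate(prev)]
        # Fix the constant so that E A_k(X_0) = sum_i p[i] * E X_0**i = 0.
        p[0] = -sum(c * Fraction(moments[i]) for i, c in enumerate(p))
        polys.append(p)
    return polys


if __name__ == "__main__":
    gaussian = [1, 0, 1, 0, 3]  # moments of N(0, 1) up to order 4
    print(appell_polys(gaussian, 4)[4])  # coefficients of x**4 - 6x**2 + 3, i.e. Hermite He_4
```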

It is natural to define the Appell rank of Φ as the smallest integer m such that c_m ≠ 0. The Appell rank is thus uniquely defined, at least for polynomials. The system of Appell polynomials is not orthogonal (except in the special case of Hermite polynomials, which are associated with Gaussian distributions). Hence the existence and uniqueness of such expansions require additional conditions such as analyticity. We refer to Giraitis and Surgailis (1986) for details. Giraitis and Surgailis (1986) assume the existence of moments of any order and

. The functional convergence follows from the Chentsov tightness criterion, given in Appendix B. Moments of all orders are needed because the method of moments is used to prove the CLT.

Concerning one-sided sequences, Ho and Hsing (1997) obtain an analogous CLT for more general nonlinear functionals of a one-sided linear sequence. The idea is to approximate such nonlinear functions of a one-sided linear sequence by m-dependent moving averages that are easily shown to satisfy an

-CLT. The main assumptions are

and the following regularity condition. The regularity is twofold. For any subset J of the set

of nonnegative integers, define

A C¹-condition on the functions

is required if Card(J) is large enough. Hence, even a very weak regularity condition on the marginal innovations such as

for a small δ > 0 implies the regularity of the distribution of Y^J, hence the regularity needed for the functions Φ_J. Now Burkholder's inequality for martingales with

still yields the Chentsov tightness criterion.

Such linear sequences are also

-weakly dependent sequences. Therefore, one may refer to Section 4.3.

5.1.5. Gaussian Processes.

The same finite-dimensional CLT as for linear processes holds for instantaneous functions of Gaussian stationary sequences when one replaces m by the Hermite rank of an arbitrary function Φ such that

(see Breuer and Major, 1983).
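When Φ is a polynomial, its Hermite rank can be computed exactly from the Gaussian moments E Y^{2j} = (2j − 1)!!. The sketch is our illustration and restricts Φ to polynomials given by integer coefficient lists. It uses the probabilists' Hermite polynomials, He_k+1 = xHe_k − kHe_k−1, and returns the smallest k ≥ 1 with E[Φ(Y)He_k(Y)] ≠ 0. The function names are hypothetical.

```python
def gaussian_moment(j: int) -> int:
    """E Y**j for Y ~ N(0, 1): 0 for odd j, (j - 1)!! for even j."""
    if j % 2:
        return 0
    out = 1
    for i in range(j - 1, 0, -2):
        out *= i
    return out


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def hermite(k: int) -> list[int]:
    """Coefficients of He_k, from He_0 = 1, He_1 = x, He_{n+1} = x He_n - n He_{n-1}."""
    prev, cur = [1], [0, 1]
    if k == 0:
        return prev
    for n in range(1, k):
        nxt = [0] + cur  # x * He_n
        for i, c in enumerate(prev):
            nxt[i] -= n * c
        prev, cur = cur, nxt
    return cur


def hermite_rank(phi: list[int], kmax: int = 20):
    """Smallest k >= 1 with E[phi(Y) He_k(Y)] != 0, or None if there is none up to kmax."""
    for k in range(1, kmax + 1):
        prod = poly_mul(phi, hermite(k))
        if sum(c * gaussian_moment(i) for i, c in enumerate(prod)) != 0:
            return k
    return None


if __name__ == "__main__":
    # x**2 - 1, x**3, and He_3 = x**3 - 3x
    print(hermite_rank([-1, 0, 1]), hermite_rank([0, 0, 0, 1]), hermite_rank([0, -3, 0, 1]))
```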

However, Donsker's Theorem 5.1 requires an additional tightness condition. Chambers and Slud (1989) introduce such a condition in terms of the coefficients of Φ's Hermite expansion. Assume that

, where

. This means that an exponential decay of the coefficients is needed to obtain Donsker's theorem.

Chambers and Slud provide a result for general stationary processes that are built on a Gaussian chaos. Such functionals may fail to be instantaneous functions Φ(Y_n). They can be written as general Bernoulli shifts of

The authors also consider instantaneous functionals of Gaussian sequences that satisfy the CLT but not Donsker's theorem.

Recall, however, that a smooth lower bound assumption on the spectral density of the process yields ρ-mixing. Hence Theorem 5.1 still holds under slight additional conditions on the Gaussian process. The assumption concerning the function Φ is unchanged:

5.1.6. (θ,ℒ,ψ)-Weak Dependence.

Assume a

-weak dependence condition with

, for the stationary sequence

. Suppose also that for some

Then if the function ψ associated with weak dependence is ψ₁ (respectively, ψ₂), Doukhan and Louhichi (1999) prove the FCLT if

So, without any regularity condition on innovations, Theorem 5.1 holds for a bounded Lipschitz function of a linear process if

when D > 3. Here the linear process need not be one sided, whereas Ho and Hsing (1997) need this assumption. Moreover, the functions Φ covered are more general than those considered by Giraitis and Surgailis (1986).

Finally, the FCLT still holds without the boundedness assumption on Φ if the coefficients decay at a faster hyperbolic rate and the innovations have a finite fourth-order moment.

Define the class

of real-valued functions

Assuming

-weak dependence, we can even improve on the preceding condition by the following uniform bound on the set of integers i ≤ j ≤ k ≤ l and j + r ≤ k

5.1.7. Martingales and Generalizations.

In this section, we consider conditions in terms of conditional expectations with respect to an adapted filtration. We first recall that Theorem 5.1 holds for martingales with stationary square integrable increments such that

(see Billingsley, 1968). More generally, let

be a process adapted to the filtration

-measurable for any

. The following result is proved by Dedecker and Rio (2000); no ergodicity assumption is required. Let

denote the right shift operator (i.e.,

). Denote by

the tail σ-algebra of T-invariant Borel sets of

THEOREM 5.2. Assume that

for any

and

is a convergent series in

. Denote by

the partial sums. Then the sequence

converges in

to some r.v. η and, conditionally on the tail σ-algebra

, the process

converges to the Brownian motion ηW_t.

Remark. This result provides an FCLT with a limit process that is not Gaussian in general. If the sequence is ergodic, the standard Donsker theorem holds. Indeed, ergodicity implies that the r.v. η is almost surely constant. Hall and Heyde (1980) give this theorem under a more restrictive

assumption: both series

converge in

Theorem 5.1 under the condition (DMR) stated in Section 5.1.1 can also be derived as a corollary of Theorem 5.2.

The following corollary can be deduced from Theorem 5.2. Consider a stationary Markov sequence

with stationary distribution μ and transition operator P. Let X_n = g(ξ_n) be a centered (possibly nonlinear) functional of

. Then the FCLT holds under the condition of convergence of the series

5.2. The Empirical Cumulative Distribution

Let us consider a stationary sequence

. We assume without loss of generality that the marginal distribution of this sequence is uniform on [0,1]. The empirical cumulative distribution function of the sequence

at time n is defined as (1/n)E_n(x), where

We consider the following convergence result in the Skorokhod space

when the sample size n tends to infinity:

Here

is the dependent analogue of a Brownian bridge; that is, B denotes the centered Gaussian process with covariance function given by

Note that for independent sequences with a marginal cumulative distribution function F, this just means that B(x) = B₀(F(x)) for some standard Brownian bridge B₀; this justifies the name generalized Brownian bridge.
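A minimal sketch of the objects entering Theorem 5.3 follows. It assumes the usual normalization n^−1/2(E_n(x) − nF(x)) for the empirical process and uniform marginals as in the text. The function gamma_iid is the i.i.d. covariance F(x) ∧ F(y) − F(x)F(y) for F uniform on [0,1]. All names are hypothetical.

```python
import math


def empirical_count(sample: list[float], x: float) -> int:
    """E_n(x) = #{k <= n : X_k <= x}, so that E_n(x) / n is the empirical cdf."""
    return sum(1 for v in sample if v <= x)


def uniform_cdf(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def empirical_process(sample: list[float], x: float, cdf=uniform_cdf) -> float:
    """Centered and normalized empirical process n^{-1/2} (E_n(x) - n F(x))."""
    n = len(sample)
    return (empirical_count(sample, x) - n * cdf(x)) / math.sqrt(n)


def gamma_iid(x: float, y: float) -> float:
    """i.i.d. covariance min(F(x), F(y)) - F(x) F(y) for the uniform cdf on [0, 1]."""
    fx, fy = uniform_cdf(x), uniform_cdf(y)
    return min(fx, fy) - fx * fy


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.35, 0.9]
    print(empirical_count(sample, 0.4), empirical_process(sample, 0.5), gamma_iid(0.3, 0.6))
```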

THEOREM 5.3. The following functional convergence holds in the Skorokhod space of real-valued càdlàg functions on the real line,

, under the weak dependence conditions detailed in the next sections:

The preceding covariance function can be rewritten as

In the i.i.d. case, it reduces to Γ(x,y) = Γ_iid(x,y) = F(x) ∧ F(y) − F(x)F(y); as the minimum of two regular functions, this term is intrinsically singular on the diagonal x = y. This is no longer the case for the other terms

. If, for instance, the second-order marginals (X₀,X_k) have a continuous joint density, then T_k(x,y) is a C²-function. It is well known that the regularity properties of a Gaussian process are determined by those of its covariance function. Hence, the distortion of the Brownian bridge due to this series is not very important. Either finite-dimensional or empirical functional convergence (EFCLT) is known to hold in the following cases.

5.2.1. Strong Mixing Case (α-Mixing).

The condition

implies finite-dimensional convergence. The EFCLT holds if, for some a > 1,

This result, proved by Rio (2000), improves on the previous conditions

formerly given by Shao and Yu (1996) and on the condition a > 3 from Yoshihara (1973). This condition is close to the previous summability, necessary to ensure finite-dimensional convergence.

5.2.2. Absolute Regularity Condition (β-Mixing).

The condition

implies finite-dimensional convergence. Doukhan, Massart, and Rio (1995) obtain the EFCLT Theorem 5.3 when

, for some a > 2. Here tightness obtains with an additional loss term of order (log² n). Finally, Rio (2000) obtains the simple (and optimal) sufficient condition for EFCLT

In an earlier paper, Arcones and Yu (1994) proved CLTs for empirical processes indexed by so-called VC subgraph classes of functions, not necessarily bounded (for more details, see van der Vaart and Wellner, 1996, p. 141). This setting contrasts with the usual conditions stated in terms of bracketing numbers. The EFCLT holds for uniformly bounded classes in the pth mean under the β-mixing condition that

5.2.3. ρ-Mixing Case.

The condition

implies finite-dimensional convergence (see Peligrad, 1987). Shao and Yu (1996) obtain the EFCLT under the same condition.

5.2.4. Gaussian Subordinated Case.

Let

be a standard Gaussian stationary process:

. Consider a function Φ with Hermite rank m such that

. Csörgő and Mielniczuk (1996) prove the EFCLT for the subordinated field X_n = Φ(Y_n) under the natural condition

Hence even if the indicator functions do not satisfy the additional tightness assumption in Chambers and Slud (1989), the tightness of the empirical process follows from the diagram formula. In this case the Kolmogorov–Smirnov statistics are convergent even if the Donsker theorem does not hold for finite-dimensional distributions of the empirical cumulative process.

5.2.5. Associated Case.

We assume that the marginal distribution of X₀ is uniform on [0,1]. Then the condition

implies finite-dimensional convergence. Louhichi (2000) obtains the EFCLT Theorem 5.3 under the condition that, for some a > 4,

Her result improves on the condition of Shao and Yu (1996):

5.2.6. One-Sided Linear Processes.

The EFCLT Theorem 5.3 holds if, for some 0 < γ ≤ 1 and S,C,Δ > 0, with SΔ > 2γ, we have

If the innovations have moments of any order, the existence of some Δ > 0 such that the preceding inequality holds implies the result.

On the other hand, a lower-order moment assumption on the innovations requires stronger regularity of their distribution, since SΔ > 2γ then forces a larger Δ. If, for example, γ = 1 and S = 2, the inversion formula shows the existence of a

-integrable density. If now γ = 1 and S = 4, the density must be

-integrable.

For γ = 1, this result recovers the EFCLT of Doukhan and Surgailis (1998) when the covariance series is absolutely convergent. The latter paper considers the case S = 4; however, Burkholder's inequality for martingales yields the general result in a straightforward way. Giraitis and Surgailis (1994) outline the available results in a long-memory framework.

5.2.7. (θ,ℒ,ψ)-Weak Dependence.

The sequence

is assumed to satisfy a weak dependence condition we now present:

where

(in this case a weak dependence condition holds for a class of functions

working only with the values u = 1 or 2).

PROPOSITION 5.1. Let

be a stationary sequence such that (5.20) holds with

for some ν > 0. Then the sequence of processes

is tight in the Skorokhod space

In the same way, stationary mixing sequences satisfy the conditions of Proposition 5.1 if

. This condition is slightly sharper than Yoshihara's condition,

for some ν > 0. It is also slightly sharper than the corresponding result in Shao and Yu (see Theorem 2.2 therein), and the result of Rio (2000) improves on both of them.

THEOREM 5.4. Suppose that

-weak dependent. If either

, then the empirical functional convergence holds.

Remark. The use of the space

allows one to work with each of the classes of models in the previous section (association and Gaussian sequences enter the first case, whereas the second one corresponds to Bernoulli shifts). This yields new results for Bernoulli shifts and apparently for Markov sequences. Note that Rio's condition for α-mixing sequences improves this result. Moreover, Yu (1993) proves the same result for associated sequences with an exponent loss term 1.5.

6. FUNCTIONAL ESTIMATION

We consider a stationary process

with Z_t = (X_t,Y_t) where

. The quantity of interest is the regression function

. Let K be a Lipschitz kernel function with compact support that integrates to 1. The kernel estimators are defined by

Here

is a sequence of positive real numbers. We always assume that h_n → 0, nh_n → ∞ as n → ∞.
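The estimators above are of the usual Nadaraya–Watson type. A minimal sketch follows, assuming f̂(x) = (nh)⁻¹∑K((x − X_i)/h), ĝ(x) = (nh)⁻¹∑Y_iK((x − X_i)/h), and r̂ = ĝ/f̂. The Epanechnikov kernel serves as an illustrative Lipschitz kernel with compact support. The function names are hypothetical.

```python
import math


def epanechnikov(u: float) -> float:
    """Lipschitz kernel supported on [-1, 1] that integrates to 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_estimates(xs, ys, x, h, kernel=epanechnikov):
    """Return (f_hat(x), g_hat(x), r_hat(x)) with f_hat = (n h)^{-1} sum_i K((x - X_i)/h),
    g_hat = (n h)^{-1} sum_i Y_i K((x - X_i)/h) and r_hat = g_hat / f_hat."""
    n = len(xs)
    w = [kernel((x - xi) / h) for xi in xs]
    f_hat = math.fsum(w) / (n * h)
    g_hat = math.fsum(wi * yi for wi, yi in zip(w, ys)) / (n * h)
    r_hat = g_hat / f_hat if f_hat > 0 else math.nan  # undefined if no X_i lies within h of x
    return f_hat, g_hat, r_hat


if __name__ == "__main__":
    f_hat, g_hat, r_hat = kernel_estimates([0.0, 0.5, 1.0], [1.0, 2.0, 3.0], x=0.5, h=0.6)
    print(f_hat, g_hat, r_hat)  # symmetric weights around x = 0.5 give r_hat = 2 up to rounding
```

Returning f̂ and ĝ alongside r̂ mirrors the text, where the density and the numerator are studied first and the ratio afterwards.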

DEFINITION 6.1. Let ρ = a + b with

. Define the set of ρ-regular functions

Here,

is the set of a-times continuously differentiable functions.

Assuming

, one can choose a kernel function K of order ρ (not necessarily nonnegative) such that the bias b_h satisfies

uniformly on any compact subset of

(see Rosenblatt, 1991). If, moreover, ρ is an integer (b = 1, so that ρ = a + 1), then with an appropriately chosen kernel K of order ρ, b_h(x) ∼ (g^(ρ)(x)/ρ!)h^ρ∫ s^ρK(s) ds, uniformly on any compact interval.
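The bias expansion can be checked on a case where it is exact. Assume (our reading) b_h(x) = ∫K(s)g(x − hs) ds − g(x), with the Epanechnikov kernel: it is nonnegative, of order 2, and has ∫s²K(s) ds = 1/5. For g(x) = x², the expansion (g″(x)/2!)h²∫s²K(s) ds equals h²/5 exactly. The sketch approximates the integral with a midpoint rule; the function names are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Nonnegative kernel of order 2 on [-1, 1]: integral 1, first moment 0, second moment 1/5."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothing_bias(g, x: float, h: float, kernel=epanechnikov, grid: int = 20000) -> float:
    """Midpoint-rule value of b_h(x) = int K(s) g(x - h s) ds - g(x) for a kernel supported in [-1, 1]."""
    step = 2.0 / grid
    total = 0.0
    for i in range(grid):
        s = -1.0 + (i + 0.5) * step
        total += kernel(s) * g(x - h * s)
    return total * step - g(x)


if __name__ == "__main__":
    h = 0.1
    # For g(x) = x**2, (g''(x)/2!) h**2 int s**2 K(s) ds = h**2 / 5 exactly.
    print(smoothing_bias(lambda t: t * t, 0.3, h), h ** 2 / 5)
```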

In view of the asymptotic analysis, we assume that the marginal density f(·) of X₀ exists and is continuous. Moreover, f(x) > 0 for any point x of interest, and the regression function

exists and is continuous. Finally, for some

exists and is continuous. We set g = fr with obvious shorthand notation. Moreover, we impose one of the following moment conditions. Either

6.1. Second-Order Properties

We consider first the properties of

. We also consider the following conditionally centered equivalent of g₂ appearing in the asymptotic variance of the estimator

Assume that the densities of the pairs

, exist and are uniformly bounded: sup_k>0∥ f_(k)∥_∞ < ∞. Moreover, uniformly over all

, the functions

are continuous. Under these assumptions, the functions g_(k) = f_(k) r_(k) are locally bounded.

THEOREM 6.1. Suppose that the stationary sequence

satisfies the conditions (6.23) and (6.24) with p = 2. Suppose that n^δh → ∞ for some δ ∈ ]0,1[. Then

under any of the weak dependence conditions formulated subsequently.

To consider asymptotics for the ratio estimator

we use a method, already used by Collomb (1984), that consists of studying higher order asymptotics. It is the topic of the next section.

THEOREM 6.2. Suppose that the stationary sequence

satisfies the conditions (6.23) and (6.24) with p = 2. Consider a positive kernel K. Let

for some ρ ∈ ]0,2], and nh^{1+2ρ} → 0. Then, for all x belonging to any compact subset of

under any of the weak dependence conditions formulated subsequently.

6.1.1. Strong Mixing Case (α-Mixing).

Theorems 6.1 and 6.2 hold if h_n → 0, nh_n/log(n) → ∞, and if

for some a > max{6δ,2 + 2δ}. The proof is based on a Bernstein grouping argument. Moreover, Robinson (1983) proves the CLT result in Theorem 6.2 under condition (5.5) for a > 2S/(S − 2), without assuming that the kernel K is positive.

6.1.2. ρ-Mixing Case.

The estimators in Theorem 6.1 obey the CLT under the mixing assumption

and if the bandwidth condition h_n → 0, nh_n/log²(n) → ∞ is fulfilled (see Peligrad, 1996).

The CLT of Theorem 6.2 holds if, for some

and if the bandwidth condition h_n → 0, nh_n/log²(n) → ∞, holds. The proof is based on a CLT for triangular arrays in Peligrad (1996) combined with the moment inequalities B.1 from Shao (1995).

6.1.3. (θ,ℒ,ψ)-Weak Dependence.

Assuming that the sequence

-weak dependent with

, for j = 1 or j = 2, Ango Nze et al. (2002) prove that, uniformly in x belonging to any compact subset of

The exponential moment assumption can be relaxed. Suppose that the stationary sequence

satisfies conditions (6.22) and (6.24) with p = 2, S > 2. The former results then hold if the sequence

-weak dependent with

, for j = 1 or j = 2.

The CLT of Theorem 6.1 holds under conditions (6.23) and (6.24) with p = 2 if the stationary sequence

-weak dependent with

and

for j = 1 or j = 2. These results extend the results of Doukhan and Louhichi (1999), valid for the case of the density function

, to the estimate

under weak dependence with either ψ₁ or ψ₂. Indeed, the first right-hand-side term is obtained by Bernstein's blocking technique described in Appendix B. The second right-hand-side term results from the application of the Lindeberg method (see Rio, 2000).

The CLT of Theorem 6.2 relies on the expansion

where

with

Using the Rosenthal inequalities described in Appendix B and the aforementioned CLT, we obtain the CLT of Theorem 6.2 for the regression function, under conditions (6.23) and (6.24) with p = 2, if the stationary sequence

-weak dependent with

for j = 1 or j = 2.

The results stated in Theorems 6.1 and 6.2 also extend to finite-dimensional convergence: the components are asymptotically jointly independent, much as for i.i.d. sequences.

Moreover, Rios (1996) shows that the local linearity test can also be handled in the strong mixing case. Assume that the function r is ρ-continuous. Then the plug-in estimator

converges to T = ∫r²(x)w(x) dx if

and if the bandwidth satisfies h_n ∈ [n^{−1/10}, n^{−1/(2ρ−4)}].
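A plug-in estimator of T = ∫r²(x)w(x) dx substitutes a kernel estimate r_n for r and integrates numerically. The sketch below is our own illustration, not Rios's construction: the weight w is taken (hypothetically) to be the indicator of [−1, 1], r_n is a Nadaraya–Watson estimate with an Epanechnikov kernel, and the integral is approximated by the trapezoidal rule on a grid. The data come from an AR(1) model for which r(x) = 0.5x, so that T = 1/6.

```python
import random

def epanechnikov(u):
    """Illustrative kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def nadaraya_watson(x, xs, ys, h):
    """Kernel regression estimate r_n(x); returns 0.0 if all kernel weights vanish."""
    w = [epanechnikov((x - xt) / h) for xt in xs]
    s = sum(w)
    return sum(wi * yi for wi, yi in zip(w, ys)) / s if s > 0 else 0.0

def plug_in_T(xs, ys, h, lo=-1.0, hi=1.0, m=201):
    """Trapezoidal approximation of T_n = int r_n(x)^2 w(x) dx with w = 1 on [lo, hi]."""
    step = (hi - lo) / (m - 1)
    vals = [nadaraya_watson(lo + i * step, xs, ys, h) ** 2 for i in range(m)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

rng = random.Random(5)
x, path = 0.0, []
for _ in range(2201):                 # AR(1) with phi = 0.5; the first 200 values are discarded
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    path.append(x)
path = path[200:]
xs, ys = path[:-1], path[1:]          # r(x) = 0.5 x, so T = int_{-1}^{1} 0.25 x^2 dx = 1/6
T_n = plug_in_T(xs, ys, h=len(xs) ** (-1 / 5))
print(T_n)
```

Squaring r_n adds a positive variance term to T_n, so a small upward bias relative to 1/6 is to be expected at moderate sample sizes.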

6.2. Almost Sure Convergence Properties

THEOREM 6.3. Let

be a stationary sequence satisfying conditions (6.23) and (6.24) with p = 2. Then, under the conditions formulated in the subsections that follow,

Remark. Under condition (ii) of Theorem 6.3, but assuming only the weaker bandwidth condition

we obtain

6.2.1. Strong Mixing Case (α-Mixing).

Liebscher (1996) proves uniform almost sure convergence at the optimal rate (ε_n = 1) if

, with a > 4 + 3/ρ.

6.2.2. ρ-Mixing Case.

Peligrad (1991) states a uniform almost sure convergence result with ε_n = log(n) if

, with r > (ρ + 1)/(2ρ).

6.2.3. (θ,ℒ,ψ₁)-Weak Dependence Case.

For the sake of simplicity, we only consider the geometrically dependent case.

THEOREM 6.4. Let

be a stationary sequence satisfying the conditions (6.23) and (6.24) with p = 2 and either

-weak dependent with θ_r ≤ a^r for some 0 < a < 1.

7. CONCLUSION

This paper presented the new weak dependence condition of Doukhan and Louhichi (1999), which is related to some of the most popular conditions used by econometricians to formalize the notion of fading memory. The advantage of this condition is that it covers a broader class of models, and it fits well with the more general (stationary) models used in econometrics. As we have illustrated, most applications of interest can be set out under this weaker dependence condition. Moreover, our framework is a very natural one for bootstrapping techniques. We have also provided several useful limit theorems.

APPENDIX A: PROOFS

A.1. Proofs for Section 4: Models.

Proof of Proposition 4.1. Let A = {i₁,…,i_u} and B = {j₁,…,j_v} be two finite subsets of

with i₁ ≤ … ≤ i_u < i_u + r ≤ j₁ ≤ … ≤ j_v. Set

and, for a given integer

. Then for any functions

with obvious notations

If p ≤ r/2, the r.v.s

are measurable with respect to the independent σ-fields σ(ξ_i : i₁ − p < i < i_u + p) and σ(ξ_j : j₁ − p < j < j_v + p). Therefore, the last covariance term vanishes. Moreover, set

Then,

in the one-sided case. The two-sided case is handled similarly. █
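The proof approximates the Bernoulli shift by a sequence that depends on finitely many innovations. The Python sketch below illustrates this for a one-sided linear shift X_t = Σ_{j≥0} a_j ξ_{t−j}, a special case we assume only for illustration: the truncation X_t^{(p)} = Σ_{j<p} a_j ξ_{t−j} uses ξ_{t−p+1}, …, ξ_t only, so truncated variables p or more steps apart are independent, and E(X_t − X_t^{(p)})² = Σ_{j≥p} a_j² when the ξ_j are i.i.d. with unit variance. (The proof uses windows extending p steps on both sides of each index block, which is why it requires p ≤ r/2.) The weights a_j = 0.5^j and all function names are hypothetical.

```python
import random

def truncated_support(t, p):
    """Innovation indices used by X_t^{(p)} = sum_{j<p} a_j xi_{t-j}."""
    return set(range(t - p + 1, t + 1))

def tail_variance(a, p, sigma2=1.0):
    """E(X_t - X_t^{(p)})^2 = sigma^2 * sum_{j>=p} a_j^2 for i.i.d. centered innovations."""
    return sigma2 * sum(aj * aj for aj in a[p:])

def simulate_pair(a, p, n, seed=0):
    """Return (X_t, X_t^{(p)}) for t < n, both built from the same N(0, 1) innovations."""
    rng = random.Random(seed)
    q = len(a)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n + q)]
    full = [sum(a[j] * xi[t + q - 1 - j] for j in range(q)) for t in range(n)]
    trunc = [sum(a[j] * xi[t + q - 1 - j] for j in range(p)) for t in range(n)]
    return full, trunc

a = [0.5 ** j for j in range(60)]     # hypothetical weights; the tail beyond j = 59 is ignored
p = 5
full, trunc = simulate_pair(a, p, n=2000)
mse = sum((f - g) ** 2 for f, g in zip(full, trunc)) / len(full)
print(mse, tail_variance(a, p))       # empirical vs. theoretical truncation error
```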

A.2. Proofs for Section 5: FCLT.

Sketch of the Proof of Theorem 5.1 in the α-Mixing Case. With the notation of the Rosenthal inequality in Lemma B.10 of Appendix B,

converges in distribution to the normal distribution

, where σ² = lim n⁻¹ Var(S_n) is assumed to be positive. Tightness of the process (D_n(t)) is derived from Lemma B.1. Because the sequence (S_n²/n) is uniformly integrable, there exists a convex function G with G(x)/x → ∞ as x → ∞ such that

By a blocking argument, it then follows that (see details in Rio, 2000, Proposition 2)

Here

. The tightness condition obtains as soon as nα_n → 0. █
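The process D_n whose tightness is discussed here is, in the usual convention that we assume, the partial-sum process D_n(t) = S_{[nt]}/(σ√n). For a centered AR(1) sequence X_t = φX_{t−1} + e_t with unit-variance innovations, σ² = lim n⁻¹ Var(S_n) = 1/(1 − φ)². The sketch below evaluates the step version of D_n on a grid of t values; the AR(1) example and the function name are assumptions for illustration.

```python
import math
import random

def partial_sum_process(xs, sigma, grid):
    """Step partial-sum process D_n(t) = S_[nt] / (sigma * sqrt(n)) evaluated on grid."""
    n = len(xs)
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)   # prefix[k] = S_k = X_1 + ... + X_k
    return [prefix[math.floor(n * t)] / (sigma * math.sqrt(n)) for t in grid]

phi, n = 0.5, 10_000
rng = random.Random(1)
x, xs = 0.0, []
for _ in range(n):
    x = phi * x + rng.gauss(0.0, 1.0)
    xs.append(x)
sigma = 1.0 / (1.0 - phi)             # sigma^2 = 1 / (1 - phi)^2 for unit-variance innovations
grid = [k / 10 for k in range(11)]
D = partial_sum_process(xs, sigma, grid)
print([round(v, 3) for v in D])
```

By the FCLT, the path t ↦ D_n(t) should behave approximately like a standard Brownian motion for large n.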

Sketch of the Proof of Theorem 5.1 in the ρ-Mixing Case. The proof follows the same lines as in the α-mixing case. Details are developed in the book by Lin and Lu (1996) (see Section 4.1 therein). █

Sketch of the Proof of Theorem 5.1 in the Weakly Dependent Case. Lemma B.12 and a maximal inequality of Moricz, Serfling, and Stout (1982) yield

as soon as for any increasing sequence of integers 0 ≤ i < j < k ≤ l

Moreover, this entails that σ² = lim n⁻¹ Var(S_n) > 0, so that finite-dimensional convergence follows. Tightness of the process is a consequence of (A.3). The first part of (A.3) follows from the covariance bound

. The latter bound follows from

. █

A.3. Proofs for Section 5: Empirical Process.

Proof of Proposition 5.1. Using the Rosenthal inequality in Lemma B.12, we get

The conclusion follows from Lemma 9.4 in Shao and Yu (1996). █

Proof of Theorem 5.4. The result follows from Lemma B.13. █

A.4. Proofs for Section 6: Functional Estimation.

A.4.1. Second-order Properties.

Proof of Theorem 6.1. We proceed as in Rio (2000) and more specifically as in Coulon-Prieur and Doukhan (2000) for density estimation.

Consider a sequence

of i.i.d. r.v.s with standard normal distribution independent of

. Here

denotes the estimator truncated at level M(n) by a Lipschitz continuous function. Define

where

. Let φ denote a three-times differentiable function with bounded derivatives up to order 3, and consider the following r.v.s:

We are interested in establishing that

We consider either a

- or a

-weakly dependent sequence

. We follow the usual abuse of notation whereby the same letter C denotes different constants.

Formula (A.6) is easily proved using the exponential moment assumption (6.23).

To prove formula (A.7), we apply the so-called Lindeberg–Rio method (see Rio, 2000). Clearly,

Because

we obtain

Moreover,

It then follows that

We can now bound the preceding five terms.

In the

case,

For a

-weak dependent sequence, again using (A.9)–(A.15), we need θ_r = O(r^−a) with

If the sequence

-weak dependent with θ_r = O(r^−a) for some a > 3, then by (A.9)–(A.15) the right-hand side of (A.8) tends to zero at the rate n⁻¹.

The CLT for

is now proved. The second assertion follows from the first one with Y_t replaced by Y_t − r(x). █
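The Lindeberg method replaces the summands one at a time by independent Gaussian variables and telescopes the resulting differences: with W_k = X_1 + … + X_{k−1} + Z_{k+1} + … + Z_n, one has φ(X_1 + … + X_n) − φ(Z_1 + … + Z_n) = Σ_k [φ(W_k + X_k) − φ(W_k + Z_k)], and each term is then controlled by a third-order Taylor expansion; under dependence, Rio's version adds covariance terms of the kind bounded above. The Python sketch below checks only the telescoping identity, pathwise, with independent summands. The choice φ = sin (smooth, with bounded derivatives), the uniform summands, and the function name are hypothetical.

```python
import math
import random

def lindeberg_terms(xs, zs, phi):
    """Terms phi(W_k + X_k) - phi(W_k + Z_k), with W_k = X_1+...+X_{k-1} + Z_{k+1}+...+Z_n."""
    terms = []
    for k in range(len(xs)):
        w = sum(xs[:k]) + sum(zs[k + 1:])
        terms.append(phi(w + xs[k]) - phi(w + zs[k]))
    return terms

rng = random.Random(2)
n = 50
xs = [rng.uniform(-1.0, 1.0) / math.sqrt(n) for _ in range(n)]               # variance 1/(3n) each
zs = [rng.gauss(0.0, math.sqrt(1.0 / 3.0)) / math.sqrt(n) for _ in range(n)]  # Gaussian, same variance
terms = lindeberg_terms(xs, zs, math.sin)
total = math.sin(sum(xs)) - math.sin(sum(zs))
print(sum(terms), total)              # the telescoping sum reproduces the total difference
```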

A.4.2. Almost Sure Convergence.

Proof of Theorem 6.3. We keep the usual notation and denote by C a universal constant whose value may change from one line to the next. Assume that

. Then

and, by the Cauchy–Schwarz inequality,

We can now reduce computations to the case of a density estimator, as in Doukhan and Louhichi (1999). Assume that the interval [−M,M] is covered by L_ν intervals of diameter 1/ν, where ν = ν(n) depends on n; denote by I_j the jth interval and by x_j its center. Assume that hν → ∞ as n → ∞, and that the compactly supported kernel K vanishes for |t| > R₀. Liebscher (1996) introduces another kernel-type density estimate

based on an even, continuous kernel that is nonincreasing on [0,∞), constant on [0,2R₀], and takes the value 0 at t = 3R₀. He then proves that

Therefore, for any λ > 0,

The exponential inequality (DPL) of Section B.4.1 in Appendix B completes the proof of assertion (i). █
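The covering step replaces a supremum over [−M, M] by a maximum over the interval centers x_j. For a function g that is Lipschitz with constant L_g, every point of [−M, M] lies within 1/(2ν) of some center, so sup_{[−M,M]}|g| ≤ max_j |g(x_j)| + L_g/(2ν). For a kernel density estimate with a Lipschitz kernel, L_g is of order h⁻². The Python sketch below checks this elementary bound for a fixed function; the test function g and the values of M and ν are hypothetical.

```python
import math

def covering_centers(M, nu):
    """Centers x_j of L = ceil(2 M nu) equal intervals of length <= 1/nu covering [-M, M]."""
    L = math.ceil(2 * M * nu)
    width = 2 * M / L
    return [-M + (j + 0.5) * width for j in range(L)]

def sup_bound(g, lip, M, nu):
    """max_j |g(x_j)| + lip * (half interval length), an upper bound for sup_{[-M, M]} |g|."""
    centers = covering_centers(M, nu)
    half_width = M / len(centers)
    return max(abs(g(c)) for c in centers) + lip * half_width

def g(x):
    """Hypothetical test function, Lipschitz with constant 3."""
    return math.sin(3.0 * x)

M, nu = 2.0, 10
fine_sup = max(abs(g(-M + 2 * M * i / 100_000)) for i in range(100_001))
print(fine_sup, sup_bound(g, 3.0, M, nu))
```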

Remark. Zhao and Fang (1985) prove almost sure convergence, uniform on compact sets, of the kernel regression estimator for strongly mixing stationary processes under the same condition as in Theorem 6.3. Consider a strongly mixing process that satisfies conditions (6.23) and (6.24) with p = 2, and assume that

for some δ ∈ (0,1). If the moment condition (6.22) holds with S > 4 + 2/ρ and if the mixing rate satisfies α_r ≤ r^−a for some a > 4 such that

then, almost surely,

APPENDIX B: TECHNICAL LEMMAS

B.1. Sufficient Conditions for Tightness.

B.1.1. Kolmogorov–Chentsov Criterion.

To obtain functional convergence in distribution, one usually relies on a chaining argument to prove tightness of the sequence (Y_n(t))_{n≥1}, where

, in the space

. Chaining techniques can be found in Pollard (1981). For the sake of completeness, we recall the following tightness result, deduced from the Arzelà–Ascoli theorem.

LEMMA B.1 (Billingsley, 1968). The sequence of processes (Y_n(t))_{t∈[0,1]} is a tight sequence in the space

for n = 1,2,… if for each ε,η > 0 there exist δ > 0 and an integer n₀ such that for all t ∈ [0,1]

To conclude the section we recall the standard Kolmogorov–Chentsov tightness criterion, adapted to moment inequalities.

LEMMA B.2 (Billingsley, 1968). A sequence of continuous processes (Z_n(t))_{t∈[0,1]} is tight in the space

for n = 1,2,… if there exist constants C > 0, p > 0, q > 1, such that for any s,t ∈ [0,1]:
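In its standard moment form, which we assume is the condition intended here (Billingsley's criterion also requires the sequence (Z_n(0)) to be tight), the requirement reads

```latex
\mathbb{E}\,\bigl|Z_n(t) - Z_n(s)\bigr|^{p} \;\le\; C\,|t - s|^{q},
\qquad s, t \in [0,1],\ n \ge 1 .
```

The restriction q > 1 is what makes the dyadic chaining series behind the criterion converge.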

B.1.2. Tightness of the Empirical Process.

We now recall a tightness criterion in the Skorohod space D([0,1]) of right-continuous functions with left limits on [0,1]. Note that we can restrict ourselves to the case of uniform marginal distributions.

LEMMA B.3 (Billingsley, 1968). A sequence of càdlàg (right-continuous with left limits) processes (Z_n(x))_{x∈[0,1]} is tight in D([0,1]) if

To conclude the section, we recall a chaining lemma, a proof of which can be found in Shao and Yu (1996); it can be used to prove tightness of the empirical process when the fourth-order moment involved (p = 4) is bounded. The absolutely regular case is an exception, for which Rio (2000) uses a more general technique.
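With uniform marginals, to which one may reduce as noted above, the empirical Brownian bridge is B_n(x) = √n(F_n(x) − x), where F_n is the empirical distribution function. The Python sketch below evaluates it on a grid for a dependent sequence with uniform marginals. The construction U_t = Φ(Y_t) from a stationary Gaussian AR(1) sequence Y_t with unit variance is our assumption, made only to obtain uniform marginals, and the function names are hypothetical.

```python
import bisect
import math
import random

def std_normal_cdf(y):
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def uniform_marginal_sequence(n, phi=0.6, seed=3):
    """U_t = Phi(Y_t) for a stationary Gaussian AR(1) Y_t with unit variance."""
    rng = random.Random(seed)
    y = rng.gauss(0.0, 1.0)                       # start from the stationary N(0, 1) law
    out = []
    for _ in range(n):
        y = phi * y + math.sqrt(1.0 - phi * phi) * rng.gauss(0.0, 1.0)
        out.append(std_normal_cdf(y))            # each U_t is Uniform(0, 1)
    return out

def empirical_bridge(us, grid):
    """B_n(x) = sqrt(n) * (F_n(x) - x), with F_n the empirical distribution function of us."""
    s = sorted(us)
    n = len(s)
    return [math.sqrt(n) * (bisect.bisect_right(s, x) / n - x) for x in grid]

us = uniform_marginal_sequence(5000)
grid = [k / 20 for k in range(21)]
bridge = empirical_bridge(us, grid)
print(max(abs(b) for b in bridge))
```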

LEMMA B.4 (Shao and Yu, 1996). Let X_n be a stationary sequence with marginal cumulative distribution function F. Then the empirical Brownian bridge

is a tight sequence in the Skorohod space

if there exist constants C > 0, p > 2, q > 1, and u > 0, 0 ≤ v ≤ 1, with u + v > 1 such that for any

B.2. Tools under Mixing.

B.2.1. Covariance Inequalities.

A fundamental covariance inequality due to Rio (2000) extends the previously known ones. First, recall the following definition.

DEFINITION 9.1. The quantile function Q_X of the real-valued r.v. X is the càdlàg inverse of the tail function t ↦ P(|X| > t) of |X|: for u ∈ [0,1], Q_X(u) = inf{t ≥ 0 : P(|X| > t) ≤ u}.
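For an empirical distribution, this quantile function is a step function built from the order statistics of |X|, and the standard identity E|X|^p = ∫_0^1 Q_X(u)^p du (used below via Fubini's theorem) then holds exactly, because Q_X(U) has the law of |X| when U is uniform on [0,1]. The Python sketch below checks this numerically; the sample, the exponent p = 3, and the function name are illustrative choices.

```python
import math
import random

def quantile_abs(sample):
    """Empirical Q(u) = inf{t >= 0 : P_n(|X| > t) <= u}, the cadlag inverse of the tail of |X|."""
    d = sorted((abs(x) for x in sample), reverse=True)   # |X| order statistics, largest first
    n = len(d)

    def Q(u):
        k = math.floor(n * u)          # Q equals the (k+1)-th largest value on [k/n, (k+1)/n)
        return d[k] if k < n else 0.0

    return Q

rng = random.Random(4)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
Q = quantile_abs(sample)
p = 3.0
m = 10 * len(sample)                  # midpoint rule with 10 nodes on each step of Q
integral = sum(Q((i + 0.5) / m) ** p for i in range(m)) / m
moment = sum(abs(x) ** p for x in sample) / len(sample)
print(integral, moment)               # equal up to rounding
```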

We are now in a position to state the fundamental covariance inequality.

LEMMA B.5 (Rio, 2000). Let u, v be two real-valued r.v.s with finite variance. Then, setting α = α(σ(u),σ(v)), we have

Because

, we find by taking expectations that

Hence a simple use of the Fubini theorem yields, for p > 0, the identity

. Thus, using Hölder's inequality, we deduce

Similar covariance inequalities are available for other types of mixing sequences, as the following lemmas show.

LEMMA B.6 (Bradley and Bryc, 1985). Let u, v be two real-valued r.v.s. Set ρ = ρ(σ(u),σ(v)). Let p > 0 and q > 0 be such that 1/p + 1/q = 1. Then

LEMMA B.7 (Ibragimov, 1962). Let u, v be two real-valued r.v.s. Set φ = φ(σ(u),σ(v)). Let p > 0 and q > 0 be such that 1/p + 1/q = 1. Then

B.2.2. Reconstruction Lemmas.

Such results exist in the β-mixing and α-mixing cases. The first (β-mixing) is due to Berbee (1979), and the second (α-mixing) is due to Rio (1994). The reader is referred to Rio (2000) for further details.

LEMMA B.8 (Berbee, 1979). Let u, v be two r.v.s defined on the probability space

and taking their values in Polish spaces

. Then, enlarging the probability space if necessary, it is possible to define an r.v. v* with the same distribution as v such that u and v* are independent and

We shall use a similar device for real-valued strongly mixing sequences.

LEMMA B.9 (Rio, 2000). Let

be a σ-field of

and let v be a real-valued r.v. with values (a.s.) in [a,b]. Suppose, furthermore, that there exists an r.v. U with uniform distribution on [0,1] independent of

. Then there exists an r.v. v* independent of

with the same distribution as

-measurable and such that

B.2.3. Rosenthal Inequalities.

The first applications of Rosenthal-type inequalities can be found in Billingsley (1968); they concern the Kolmogorov–Smirnov functional CLT for the empirical cumulative distribution function of a φ-mixing sequence. A nice and general presentation of applications of Rosenthal inequalities to functional CLTs is provided in Andrews and Pollard (1994). Maximal Rosenthal inequalities are available in both the α-mixing and the ρ-mixing cases.

For simplicity we consider stationary sequences

, such that for some real number r ≥ 2,

and we set

Similarly, α⁻¹ denotes the inverse of the decreasing function s → α_[s] (where [s] denotes the integral part of s), defined by

LEMMA B.10 (Rio, 1994). Let (X_n) be a strongly mixing sequence and, for r ≥ 2, set

Then there exists a constant C_r only depending on r such that

For example, an explicit computation gives

This implies the maximal Rosenthal inequality

Assume now that

for some positive δ. Then the integral

can be bounded using Hölder's inequality and the relation

The integral on the right-hand side of the previous inequality can be written as a sum, yielding, after the use of the Abel transformation and of the simple inequality (n + 1)^p − n^p ≤ p(n + 1)^{p−1}, which follows for p > 1 from the mean value theorem,

, the moment M_{r,α} is finite.
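The Python sketch below illustrates the inverse function and the Abel transformation used above, under the convention (assumed here, following Rio, 2000) α⁻¹(u) = Σ_{k≥0} 1{u < α_k}, the number of indices k with α_k > u. If, as we also assume, M_{r,α} = ∫_0^1 (α⁻¹(u))^{r−1} Q^r(u) du, then for variables bounded by M one has M_{r,α} ≤ M^r ∫_0^1 (α⁻¹(u))^{r−1} du, and for a nonincreasing sequence with α_0 ≤ 1 and α_k → 0, summation by parts gives ∫_0^1 (α⁻¹(u))^{r−1} du = Σ_{k≥0} ((k + 1)^{r−1} − k^{r−1}) α_k. The geometric rates α_k = 0.5^k are hypothetical; for them and r = 4 the sum equals 26.

```python
import bisect

def make_alpha_inverse(alpha):
    """Return u -> #{k >= 0 : alpha_k > u} for a nonincreasing sequence alpha."""
    negated = [-a_k for a_k in alpha]          # nondecreasing, so bisect applies
    return lambda u: bisect.bisect_left(negated, -u)

def abel_form(alpha, r):
    """Summation-by-parts expression sum_k ((k+1)^(r-1) - k^(r-1)) * alpha_k."""
    return sum(((k + 1) ** (r - 1) - k ** (r - 1)) * a_k for k, a_k in enumerate(alpha))

def integral_form(alpha, r, m=200_000):
    """Midpoint-rule approximation of int_0^1 (alpha^{-1}(u))^(r-1) du."""
    inv = make_alpha_inverse(alpha)
    return sum(inv((i + 0.5) / m) ** (r - 1) for i in range(m)) / m

alpha = [0.5 ** k for k in range(60)]          # hypothetical geometric mixing rates, alpha_0 = 1
r = 4
print(abel_form(alpha, r), integral_form(alpha, r))   # both close to 26
```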

LEMMA B.11 (Shao, 1995). Let (X_n) be a ρ-mixing sequence and set, for r ≥ 2 and some constant K_{r,ρ} depending only on r and ρ:

Then there exists a constant C_{r,ρ} depending only on r and ρ such that

This follows from Bradley and Bryc's lemma and from several important lemmas in Peligrad (1987). The work of Peligrad (1987) yields maximal bounds for the moments of sums of n-samples of order 2 and 4 that only involve the sums

B.3. Tools under Association.

The following inequality is essential for studying associated r.v.s.

THEOREM B.1 (Newman, 1984). For a pair of measurable numeric functions (f,g) defined on

, we write f << g if both functions g + f and g − f are nondecreasing with respect to each argument. Now let X be any associated random vector with range in A. Then

This theorem follows simply from several applications of the definition to the coordinatewise nondecreasing functions g_i − f_i and g_i + f_i. By an easy application of the preceding inequalities one can check that

for

-valued associated random vectors

functions f and g with bounded partial derivatives. For this, it suffices to note that f << f₁ if one defines

and uses Theorem B.1.
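As an illustration of the covariance bound for associated vectors, in the standard form |Cov(f(X), g(Y))| ≤ Σ_{i,j} Lip_i(f) Lip_j(g) Cov(X_i, Y_j) (which we assume is the bound referred to above), consider a standard bivariate Gaussian pair (X, Y) with correlation ρ ∈ [0, 1]; such a pair is associated. For f = g = sin the bound equals ρ, while the identity sin x sin y = [cos(x − y) − cos(x + y)]/2 and E cos Z = e^{−Var(Z)/2} for centered Gaussian Z give Cov(sin X, sin Y) = e^{−1} sinh ρ. The Python sketch below compares the two and checks the exact value by simulation; the example is ours, not the paper's.

```python
import math
import random

def exact_cov_sin(rho):
    """Cov(sin X, sin Y) for standard normal X, Y with correlation rho (equals e^{-1} sinh(rho))."""
    # E sin X = 0; sin x sin y = (cos(x - y) - cos(x + y)) / 2 and E cos Z = exp(-Var Z / 2)
    # for centered Gaussian Z, with Var(X - Y) = 2 - 2 rho and Var(X + Y) = 2 + 2 rho.
    return 0.5 * (math.exp(-(1.0 - rho)) - math.exp(-(1.0 + rho)))

def newman_bound(rho, lip_f=1.0, lip_g=1.0):
    """Covariance bound Lip(f) * Lip(g) * Cov(X, Y) for an associated pair (X, Y)."""
    return lip_f * lip_g * rho

def monte_carlo_cov_sin(rho, n=100_000, seed=6):
    """Sample covariance of (sin X, sin Y) with Y = rho X + sqrt(1 - rho^2) Z."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((math.sin(x), math.sin(y)))
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    return sum((a - mx) * (b - my) for a, b in pairs) / n

for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(rho, exact_cov_sin(rho), newman_bound(rho))
```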

Denote by

the real part of the complex number z. Theorem B.1 can be extended to complex-valued functions, up to a factor 2 on the left-hand side of inequality (B.2). Indeed, we can now set f << g if for any real number ω the mapping

is nondecreasing with respect to each argument. Also, for any real numbers t₁,…,t_k,

If now the r.v.s X_i have a density, bounded uniformly with respect to the index i, then

This relation provides a connection between

-weak dependences (the class

will be defined in (B.11) of Section B.4) in the context of associated sequences.

Negatively associated r.v.s are much simpler to handle. The key result is a comparison theorem established by Shao (2000). For any convex function f on

for a given sequence (X_i)_1≤i≤n of negatively associated r.v.s and for any sequence (X_i*)_1≤i≤n of independent r.v.s such that

for each i = 1,…,n. To avoid triviality, we assume that the preceding right-hand-side term exists.

B.4. Tools under Weak Dependence.

B.4.1. Moment and Exponential Inequalities.

Let

be a sequence of r.v.s with

. In this section, we give moment bounds for

, when

. For positive integers r, define coefficients of weak dependence as nondecreasing sequences (C_{r,q})_{q≥2} such that

for 1 ≤ t₁ ≤ … ≤ t_q ≤ n and for integers 1 ≤ m < q, t_{m+1} − t_m = r. Doukhan and Louhichi (1999) provide explicit bounds C_{r,q} to construct inequalities for partial sums S_n. Two kinds of bounds are considered, either

where Q_X still denotes X's quantile function and c,γ ≥ 0 denote real numbers. In the examples, bound (B.5) holds for bounded sequences such that ∥X_n∥_∞ ≤ M. For instance,

-weak dependence yields the bounds

where

. As in Lemma 1 of Doukhan and Louhichi (1999), we see that under

-weak dependence with ψ(h,k,u,v) = c(u,v) × Lip(h)Lip(k), a bound is

Bound (B.6) holds for more general r.v.s, as can be shown using moment or tail assumptions. A first consequence of the previous definitions is the following Marcinkiewicz–Zygmund inequality.

Let

be a sequence of r.v.s with

satisfying the condition

Then, there exists a constant B > 0 independent of n for which

The following lemma gives moment inequalities with q ∈ {2,4}. It was essentially proved in Billingsley (1968, Lemmas 3 and 4, p. 172).

LEMMA B.12 (Doukhan and Louhichi, 1999). If

is a sequence of r.v.s with

, then

The following theorems deal with higher order moments.

THEOREM B.2 (Doukhan and Louhichi, 1999). Let q ≥ 2 be some integer. Assume that the dependence coefficients C_{r,p} of the sequence (X_n) satisfy

for all integers 0 < p ≤ q and for some positive constants M, γ, C. Then, for any integer n ≥ 2,

Theorem B.2 is suited to bounded sequences. Define the class

To consider the unbounded case, we shall consider

-weak dependence where ψ(h,k,u,v) = c(u,v).

THEOREM B.3 (Doukhan and Louhichi, 1999). Let

be a

-weak dependent sequence with

and set C_q = (max_{u+v≤q} c(u,v)) ∨ 2. Then

In the special case of strongly mixing and stationary sequences, this is Theorem 1 in Rio (1994). The explicit form of the constants compensates for the fact that we are restricted to even integers.

Remark. Exponential inequalities can be obtained using Theorem B.2. Define

We first suppose that for all integers q ≥ 2 and n

where β is some constant and A_n is a sequence independent of q. Then the following Doukhan–Portal type exponential inequality is available:

One can take here the constants

. Let (X_n) be a sequence of r.v.s with

. Inequality (DPL) holds when C_{r,q} = Cσ²M^{q−2}e^{γq}e^{−br} for suitable constants C,σ,γ,b > 0, if ∥X_n∥_∞ ≤ M and ∥X_n∥₂ ≤ σ for any integer n ≥ 0. In this case, A_n = nσ². For example, inequality (DPL) holds under

-weak dependence if

for some δ ≥ 0.

The use of combinatorics in the preceding inequalities makes them relatively weak. For instance, Bernstein's inequality, valid for independent sequences, allows one to replace the term

in the preceding inequality by x² under the same assumption nσ² ≥ 1; in the mixing cases similar inequalities are also obtained using coupling arguments that are not available here.
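For comparison, the classical Bernstein inequality for independent centered r.v.s with |X_i| ≤ M and Var X_i ≤ σ² reads P(|S_n| ≥ x) ≤ 2 exp(−x²/(2(nσ² + Mx/3))). The Python sketch below checks it against the exact tail of a Rademacher sum (M = σ = 1), computed from the binomial distribution. It concerns the independent case only and does not reproduce the (DPL) inequality; the function names are hypothetical.

```python
import math

def bernstein_bound(x, n, sigma2=1.0, M=1.0):
    """Bernstein: P(|S_n| >= x) <= 2 exp(-x^2 / (2 (n sigma^2 + M x / 3))) for independent,
    centered summands with |X_i| <= M and Var X_i <= sigma^2."""
    return 2.0 * math.exp(-x * x / (2.0 * (n * sigma2 + M * x / 3.0)))

def rademacher_tail(x, n):
    """Exact P(|S_n| >= x) for a sum of n independent +/-1 signs: S_n = 2B - n, B ~ Bin(n, 1/2)."""
    favourable = sum(math.comb(n, b) for b in range(n + 1) if abs(2 * b - n) >= x)
    return favourable / 2 ** n

n = 100
for x in (10, 20, 30, 40):
    print(x, rademacher_tail(x, n), bernstein_bound(x, n))
```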

Shao (2000) uses his main comparison result (expression (B.4)) to show that most of the inequalities for independent r.v.s remain true for negatively associated r.v.s, even with respect to the constants. For instance, a Rosenthal inequality is proved: recalling that S_n* = max{|S₁|,…,|S_n|}, we have, for any real number p > 2,

An exponential inequality is also derived for negatively associated r.v.s; it is as sharp as Bernstein's inequality for independent r.v.s.

The results of Shao (2000) can also be combined with those in the papers by Ibragimov and Sharakhmentov (2001) and de la Peña, Ibragimov, and Sharakhmentov (2003) to obtain sharp versions of Rosenthal inequalities. These inequalities concern general moments of partial sums of negatively associated sequences of nonnegative and symmetric r.v.s and also mean zero r.v.s in the case of even power p. A symmetrization argument gives improvements of formula (B.12) in the context of mean zero r.v.s and arbitrary power p.

Proof of Lemma 2.1. As an application of the preceding results, we now prove a corrected version of Lemma 1 in Hall and Horowitz (1996). Set

and let

. Then

B.4.2. Central Limit Theorem.

Following Withers (1981) and Newman (1984), the CLT holds if σ² = lim_{n→∞} (1/n) Var(X₁ + … + X_n) > 0 (this limit is assumed to exist) and the sequence S_n²/n is uniformly integrable.

Define by

the class of complex exponential functions f such that f(x₁,…,x_p) = exp(it_f(x₁ + … + x_p)) for some

. The function ψ is defined for this class by ψ_exp(f,g,p,q) = pq(1 + t_f t_g). Let

be the dependence coefficient associated with the preceding function class.

The CLT holds for a

-weak dependent sequence with

. The inclusion

implies that the CLT holds under a

-weak dependence condition, with ψ(h,k,u,v) = min(u,v)Lip(h)Lip(k) + 1. Notice that the preceding condition

holds for associated r.v.s such that Σ_n Cov(X₀,X_n) < ∞.

PROPOSITION B.1 (Doukhan and Louhichi, 1999). The CLT holds under

-weak dependence if ψ(h,k,u,v) = min{u,v}μ(Lip(h),Lip(k)) for some locally bounded real function μ on

. It also holds if, for some d > 0, ψ(h,k,u,v) = (u + v)^d × μ(Lip(h),Lip(k)) and, for some

The proof is based on a more general lemma (see, e.g., Ibragimov, 1962; Ibragimov and Linnik, 1971; and Withers, 1981) setting Y_k = φ(X_k). Let

be a stationary sequence. The idea is to split

into Bernstein's blocks.

LEMMA B.13 (Ibragimov, 1962). Let (Y_k) be a stationary sequence of centered r.v.s. Let p = p(n), q = q(n), k = [n/(p + q)] be integer-valued functions satisfying

We define Bernstein blocks as

Then we set

Denote σ_n² = Var S_n and let g, h be either x → cos x or x → sin x. If

then the sequence S_n /σ_n converges in distribution to a Gaussian

distribution.
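Bernstein's blocks alternate big blocks of length p and small blocks of length q; the small blocks separate the big ones by q time units, and p and q are typically chosen with q/p → 0 and p/n → 0. In the usual convention, which we assume here, the jth big block is {(j−1)(p+q)+1, …, (j−1)(p+q)+p} and the jth small block is {(j−1)(p+q)+p+1, …, j(p+q)}, for j = 1, …, k with k = [n/(p+q)], and the indices after k(p+q) form a remainder. The index-only Python sketch below builds these blocks; the helper name is hypothetical.

```python
def bernstein_blocks(n, p, q):
    """Split {1, ..., n} into k = n // (p + q) big/small block pairs plus a remainder."""
    k = n // (p + q)
    big, small = [], []
    for j in range(k):
        start = j * (p + q) + 1
        big.append(list(range(start, start + p)))            # big block of length p
        small.append(list(range(start + p, start + p + q)))  # separating small block of length q
    remainder = list(range(k * (p + q) + 1, n + 1))
    return big, small, remainder

big, small, rest = bernstein_blocks(n=103, p=8, q=2)
print(len(big), big[0], small[0], rest)
```

Sums over the big blocks are nearly independent because consecutive big blocks are q time units apart, which is what the characteristic-function condition of Lemma B.13 quantifies.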

References

Alsmeyer, G. & C.D. Fuh (2001) Limit theorems for iterated random functions by regenerative methods. Stochastic Processes and Their Applications 96, 123–142.

Andrews, D.W.K. (1984) Non strong mixing autoregressive processes. Journal of Applied Probability 21, 930–934.

Andrews, D.W.K. (1988) Laws of large numbers for dependent non-identically distributed random variables. Econometric Theory 4, 458–467.

Andrews, D.W.K. (2002) Higher-order improvements of a computationally attractive k-step bootstrap for extremum estimators. Econometrica 70, 119–162.

Andrews, D.W.K. & D. Pollard (1994) An introduction to functional central limit theorems for dependent stochastic processes. International Statistical Review 62, 119–132.

Ango Nze, P. (1995) Critères d'ergodicité de modèles markoviens: Estimation non paramétrique sous des hypothèses de dépendance. Ph.D. Dissertation, Université Dauphine, Paris.

Ango Nze, P. (1998) Critères d'ergodicité géométrique ou arithmétique de modèles linéaires perturbés à représentation markovienne. Comptes Rendus de l'Académie des Sciences, Serie 1, Paris t. 326, 371–376.

Ango Nze, P., P. Bühlmann, & P. Doukhan (2002) Weak dependence beyond mixing and asymptotics for nonparametric regression. Annals of Statistics 30, 397–430.

Arcones, M.A. & B. Yu (1994) Central limit theorems for empirical and u-processes of stationary sequences. Journal of Theoretical Probability 7, 47–71.

Berbee, H.C.P. (1979) Random Walks with Stationary Increments and Renewal Theory. Mathematical centre tracts, Amsterdam.

Bickel, P. & P. Bühlmann (1999) A new mixing notion and functional central limit theorems for a sieve bootstrap in time series. Bernoulli 5, 413–446.

Billingsley, P. (1968) Convergence of Probability Measures. Wiley.

Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

Bradley, R.C. & W. Bryc (1985) Multilinear forms and measures of dependence between random variables. Journal of Multivariate Analysis 16, 335–367.

Breuer, P. & P. Major (1983) Central limit theorems for non-linear functionals of Gaussian random fields. Journal of Multivariate Analysis 13, 425–441.

Chambers, D. & D. Slud (1989) Central limit theorems for non-linear functionals of stationary Gaussian processes. Probability Theory and Related Fields 80, 323–346.

Chen, X. & Y. Fan (1999) Consistent hypotheses testing in semiparametric and nonparametric models for econometric time series. Journal of Econometrics 91, 373–401.

Cline, D. & H. Pu (1998) Verifying irreducibility and continuity of a nonlinear time series. Statistics and Probability Letters 40(2), 139–148.

Collomb, G. (1984) Propriétés de convergence presque complète du prédicteur à noyau. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 66, 441–460.

Coulon-Prieur, C. & P. Doukhan (2000) A triangular CLT for weakly dependent sequences. Statistics and Probability Letters 47, 61–68.

Csörgő, M. & J. Mielniczuk (1996) The empirical process of a short-range dependent stationary sequence under Gaussian subordination. Probability Theory and Related Fields 104, 15–25.

Davydov, Y. (1973) Mixing conditions for Markov chains. Theory of Probability and Its Applications 18, 313–328.

Dedecker, J. & E. Rio (2000) On the functional central limit theorem for stationary processes. Annales de l'institut Henri Poincaré, série B 36, 1–34.

Diaconis, P. & D. Freedman (1999) Iterated random functions. SIAM Review 41, 45–76.

Doukhan, P. (1994) Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer Verlag.

Doukhan, P. (2002) Models, inequalities and limit theorems for stationary sequences. In P. Doukhan, G. Oppenheim, & M. Taqqu (eds.), Long Range Dependence, Theory and Applications. Birkhäuser.

Doukhan, P., P. Massart, & E. Rio (1995) Invariance principles for absolutely regular empirical processes. Annales de l'institut Henri Poincaré, série B 31, 393–427.

Doukhan, P. & S. Louhichi (1999) A new weak dependence condition and application to moment inequalities. Stochastic Processes and Their Applications 84, 313–343.

Doukhan, P. & D. Surgailis (1998) Functional central limit theorem for the empirical process of short memory linear processes. Comptes Rendus de l'Académie des Sciences, Paris 326, 87–92.

Duflo, M. (1990) Méthodes récursives aléatoires. Masson. (English edition, Springer, 1996).

Esary, J., F. Proschan, & D. Walkup (1967) Association of random variables with applications. Annals of Mathematical Statistics 38, 1466–1476.

Fort, G. & E. Moulines (2002) Computable bounds for polynomial ergodicity. Stochastic Processes and Their Applications. Forthcoming.

Fortuin, C., P. Kasteleyn, & J. Ginibre (1971) Correlation inequalities on some ordered sets. Communications in Mathematical Physics 22, 89–103.

Giraitis, L., P. Kokoszka, & R. Leipus (2000) Stationary ARCH models: Dependence structure and CLT. Econometric Theory 16, 3–22.

Giraitis, L. & D. Surgailis (1986) Multivariate Appell polynomials and the central limit theorem. In E. Eberlein & M. Taqqu (eds.), Dependence in Probability and Statistics, pp. 27–51. Birkhäuser.

Giraitis, L. & D. Surgailis (1994) A Central Limit Theorem for the Empirical Process of a Long Memory Linear Sequence. Preprint.

Granger, C.W.J. & A.P. Andersen (1978) An Introduction to Bilinear Time Series Models. Vandenhoeck and Ruprecht.

Hall, P. & C.C. Heyde (1980) Martingale Limit Theory and Its Applications. Academic Press.

Hall, P. & J. Horowitz (1996) Bootstrap critical values for tests based on generalized method of moment estimators. Econometrica 64, 891–916.

He, C. & T. Teräsvirta (1999) Properties of moments of a family of GARCH models. Journal of Econometrics 92, 173–192.

Ho, H.-C. & T. Hsing (1997) Limit theorems for functionals of moving averages. Annals of Probability 25, 1636–1669.

Ibragimov, I.A. (1962) Some limit theorems for stationary processes. Theory of Probability and Its Applications 7, 349–382.

Ibragimov, I.A. & Y.V. Linnik (1971) Independent and stationary sequences of random variables. Wolters-Noordhoff.

Ibragimov, R. & S. Sharakhmentov (2001) The exact constant in the Rosenthal inequality for random variables with mean zero. Theory of Probability and Its Applications 46, 127–132.

Jarner, S.F. & G.O. Roberts (2002) Polynomial convergence rates of Markov chains. Annals of Applied Probability 12, 224–247.

Jarner, S.F. & R.L. Tweedie (2001) Locally contracting iterated functions and stability of Markov chains. Journal of Applied Probability 38, 494–507.

Kallenberg, O. (1997) Foundations of Modern Probability. Springer-Verlag.

Kolmogorov, A.N. & Y.A. Rozanov (1960) On the strong mixing conditions for stationary Gaussian sequences. Theory of Probability and Its Applications 5, 204–207.

Künsch, H.R. (1989) The jackknife and bootstrap for general stationary observations. Annals of Statistics 17, 1217–1241.

Liebscher, E. (1996) Strong convergence of sums of α-mixing random variables with applications to density estimation. Stochastic Processes and Their Applications 65, 69–98.

Lin, Z.Y. & C.R. Lu (1996) Limit Theory for Mixing Dependent Random Variables. Kluwer Academic Publishers.

Louhichi, S. (2000) Weak convergence for empirical processes of associated sequences. Annales de l'Institut Henri Poincaré, série B 36, 547–567.

McLeish, D.L. (1975) A maximal inequality and dependent strong laws. Annals of Probability 3, 829–839.

Meyn, S.P. & R.L. Tweedie (1993) Markov chains and stochastic stability. Communications in Control and Engineering.

Mills, T.C. (1999) The Econometric Modelling of Financial Time Series. Cambridge University Press.

Mokkadem, A. (1990) Propriétés de mélange des modèles autorégressifs polynômiaux. Annales de l'Institut Henri Poincaré, série B 26, 219–260.

Moricz, F.A., R.J. Serfling, & W.F. Stout (1982) Moment and probability bounds with quasi-superadditive structure for the maximum partial sum. Annals of Probability 10, 1032–1040.

Newman, C.M. (1984) Asymptotic independence and limit theorems for positively and negatively dependent random variables. In Y.L. Tong (ed.), Inequalities in Statistics and Probability, IMS Hayward, CA Lecture Notes-Monograph Series 5, pp. 127–140.

Newman, C.M. & A.L. Wright (1981) An invariance principle for certain dependent sequences. Annals of Probability 9, 671–675.

Ornstein, D.S. & G. Weiss (1990) Statistical properties of chaotic systems. Bulletin of the American Mathematical Society 24, 11–116.

Peligrad, M. (1987) The convergence of moments in ρ-mixing sequences of random variables. Proceedings of the American Mathematical Society 101, 142–148.

Peligrad, M. (1991) Properties of uniform consistency of the kernel estimators of density and of regression functions under dependence assumptions. Stochastics and Stochastic Reports 40, 147–168.

Peligrad, M. (1996) On the asymptotic normality of weak dependent random variables. Journal of Theoretical Probability 9, 703–715.

de la Peña, V.H., R. Ibragimov, & S. Sharakhmentov (2003) On extremal distributions and sharp L_p-bounds for sums of multilinear forms. Annals of Probability 31, 630–675.Google Scholar

Petruccelli, J. & S.W. Woolford (1984) A threshold AR(1) model. Journal of Applied Probability 21, 270–286.Google Scholar

Pham, T.D. (1986) The mixing property of bilinear and generalized random coefficient autoregressive models. Stochastic Processes and Their Applications 23, 291–300.Google Scholar

Pham, T.D. & L.T. Tran (1985) Some mixing properties of time series models. Stochastic Processes and Their Applications 19, 297–303.Google Scholar

Phillips, P.C.B. (1987) Time series regression with a unit root. Econometrica 55, 277–301.Google Scholar

Pollard, D. (1981) Limit theorems for empirical processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 181–195.Google Scholar

Pötscher, B.M. & I.R. Prucha (1991a) Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part I: Consistency and approximation concepts. Econometric Reviews 10, 125–216.Google Scholar

Pötscher, B.M. & I.R. Prucha (1991b) Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part II: Asymptotic normality. Econometric Reviews 10, 253–326.Google Scholar

Prakasa, R. (1983) Nonparametric Functional Estimation. Academic Press.

Rio, E. (1994) Inégalités de moments pour les suites stationnaires et fortement mélangeantes. Comptes Rendus de l'Académie des Sciences, Paris 318, 355–360.Google Scholar

Rio, E. (2000) Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques et applications 31.Google Scholar

Rios, R. (1996) Utilisation des techniques non paramétriques et semi paramétriques en statistique des données dépendantes. Ph.D. Dissertation, Université Paris sud, Paris.

Robinson, P.M. (1983) Nonparametric estimators for time series. Journal of Time Series Analysis 4, 185–207.

Robinson, P.M. (1989) Hypothesis testing in semiparametric and nonparametric models for econometric time series. Review of Economic Studies 56, 511–534.

Rosenblatt, M. (1956) A central limit theorem and a strong mixing condition. Proceedings of the National Academy of Sciences, U.S.A. 42, 43–47.

Rosenblatt, M. (1985) Stationary Sequences and Random Fields. Birkhäuser.

Rosenblatt, M. (1991) Stochastic curve estimation. NSF-CBMS Regional Conference Series in Probability and Statistics 3.

Shao, Q.M. (1988) Remark on the invariance principle for ρ-mixing sequences of random variables. Chinese Annals of Mathematics 9A, 409–412.

Shao, Q.M. (1995) Maximal inequalities for partial sums of ρ-mixing sequences. Annals of Probability 23, 948–965.

Shao, Q.M. (2000) A comparison theorem on moment inequalities between negatively associated and independent random variables. Journal of Theoretical Probability 13, 343–356.

Shao, Q.M. & H. Yu (1996) Weak convergence for weighted empirical processes of dependent sequences. Annals of Probability 24, 2052–2078.

Tjøstheim, D. (1986) Some doubly stochastic time series models. Journal of Time Series Analysis 7, 51–72.

Tjøstheim, D. (1990) Non-linear time series and Markov chains. Advances in Applied Probability 22, 587–611.

Tong, H. (1981) A note on a Markov bilinear stochastic process in discrete time. Journal of Time Series Analysis 2 (4), 279–284.

van der Vaart, A.W. & J.A. Wellner (1996) Weak Convergence and Empirical Processes. Springer-Verlag.

Veretennikov, A.Y. (1999) On polynomial mixing and convergence rate for stochastic difference and differential equations. Theory of Probability and Its Applications 44, 361–374.

Withers, C.S. (1981) Central limit theorem for dependent variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 509–534.

Wolkonski, V.A. & Y.A. Rozanov (1959) Some limit theorems for random functions, Part I. Theory of Probability and Its Applications 4, 178–197.

Wolkonski, V.A. & Y.A. Rozanov (1961) Some limit theorems for random functions, Part II. Theory of Probability and Its Applications 6, 186–198.

Yoshihara, K. (1973) Billingsley's theorems on empirical processes of strong mixing sequences. Yokohama Mathematical Journal 23, 1–7.

Yoshihara, K. (1976) Limiting behaviour of U-statistics for stationary, absolutely regular processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 35, 237–252.

Yu, H. (1993) A Glivenko-Cantelli lemma and weak convergence for empirical processes of associated sequences. Probability Theory and Related Fields 95, 357–370.

Zhao, L. & Z. Fang (1985) Strong convergence of kernel estimates of nonparametric regression functions. Chinese Annals of Mathematics 6B, 147–155.


