Published online by Cambridge University Press: 12 December 2005
We consider the unit root testing problem with errors being nonlinear transforms of linear processes. When the linear processes are long-range dependent, the asymptotic distributions in the unit root testing problem are shown to be functionals of Hermite processes. Functional limit theorems for nonlinear transforms of linear processes are established. The obtained results differ sharply from the classical cases where asymptotic distributions are functionals of Brownian motions.
The author thanks the referee and Professor B. Hansen for their valuable suggestions. The work is supported in part by NSF grant DMS-04478704.
The unit root testing problem has been extensively studied in the econometrics literature. In this paper we consider the following model:
$$y_t = \rho y_{t-1} + u_t, \qquad t = 1, 2, \ldots, \tag{1}$$
and
$$u_t = K(x_t), \qquad x_t = \sum_{i=0}^{\infty} a_i \varepsilon_{t-i}, \tag{2}$$
where $\varepsilon_t$, $t \in \mathbb{Z}$, are independent and identically distributed (i.i.d.) random variables with mean zero and finite variance, the real coefficients $(a_i)_{i=0}^{\infty}$ are square summable, and $K$ is a measurable function such that $E[K(x_t)] = 0$ and $E[K^2(x_t)] < \infty$. Here $(x_t)$ can be autoregressive moving average (ARMA) or fractional autoregressive integrated moving average (ARIMA) processes and $K$ can be a nonlinear function. See Granger and Joyeux (1980) and Hosking (1981) for an introduction to fractionally integrated processes. Dittmann and Granger (2002) recently discussed nonlinear transforms of fractionally integrated processes and found some very interesting properties. We are interested in testing the hypothesis $H_0: \rho = 1$ versus the alternative $H_A: \rho \ne 1$. Given the observations $y_0, \ldots, y_n$, the least squares slope estimate has the form
$$\hat\rho_n = \frac{\sum_{t=1}^{n} y_t y_{t-1}}{\sum_{t=1}^{n} y_{t-1}^2}. \tag{3}$$
In the classical unit root testing problem, the $u_t$ are often assumed to be i.i.d.; see, for example, Dickey and Fuller (1979, 1981). In this case, it can be shown that under $H_0$
$$n(\hat\rho_n - 1) \Rightarrow \frac{B(1)^2 - 1}{2\int_0^1 B(t)^2\,dt}, \tag{4}$$
where $\{B(t), 0 \le t \le 1\}$ is a standard Brownian motion and $\Rightarrow$ denotes convergence in distribution. The i.i.d. assumption is very restrictive in practice. Various generalizations to dependent random variables have been extensively pursued for processes with special dependence structures. For a partial list, see Phillips (1987) for strong mixing processes, Phillips and Xiao (1998) and Wang, Lin, and Gulati (2002) for linear processes in which $K(x) = x$, and Chan and Terrin (1995) for Gaussian processes. The limiting distributions can be similarly expressed as functionals of Brownian motions. In an important paper, Sowell (1990) considers the unit root testing problem for long-memory processes. In particular, Sowell considers the special case of (1) with the identity function $K(x) = x$ and the fractionally integrated series $u_t \sim I(d)$, where $-\tfrac12 < d < \tfrac12$. Namely,
$$(1 - L)^d u_t = \varepsilon_t, \tag{5}$$
where $L$ denotes the lag operator, $L u_t = u_{t-1}$.
If $0 < d < \tfrac12$, then the covariance function of $u_t$ satisfies $\gamma_u(k) = E(u_0 u_k) \sim c\,k^{2d-1}$ for some constant $c > 0$, which is not summable because $2d - 1 > -1$. This property is usually referred to as long-range dependence or long memory. In this case, the limiting distribution is strikingly different from (4). Sowell (1990) proves that
$$n(\hat\rho_n - 1) \Rightarrow \frac{B_H(1)^2}{2\int_0^1 B_H(t)^2\,dt}, \tag{6}$$
where $B_H$ is the fractional Brownian motion with Hurst index $H = d + \tfrac12$. Namely, $B_H$ is a mean zero Gaussian process with covariance function
$$E[B_H(s)B_H(t)] = \tfrac12\left(|s|^{2H} + |t|^{2H} - |s - t|^{2H}\right) \tag{7}$$
(cf. Mandelbrot and Van Ness, 1969). See also Wang, Lin, and Gulati (2003) for some recent developments.
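As a minimal illustration of the fractional Brownian motion mentioned above, the following Python sketch evaluates the standard normalized covariance ½(|s|^{2H} + |t|^{2H} − |s − t|^{2H}) and draws one path on a grid by Cholesky factorization. The function names `fbm_cov` and `fbm_path`, the grid, and the use of Cholesky sampling are our own illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fbm_cov(s, t, H):
    """Covariance of normalized fractional Brownian motion with Hurst index H."""
    return 0.5 * (np.abs(s) ** (2 * H) + np.abs(t) ** (2 * H)
                  - np.abs(s - t) ** (2 * H))

def fbm_path(n, H, seed=0):
    """Draw one fBm path on the grid 1/n, 2/n, ..., 1 (illustrative only)."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, n + 1) / n
    # Covariance matrix on the grid, built by broadcasting.
    cov = fbm_cov(grid[:, None], grid[None, :], H)
    chol = np.linalg.cholesky(cov)
    return grid, chol @ rng.standard_normal(n)
```

With H = ½ the covariance reduces to min(s, t), the Brownian case; H = d + ½ with 0 < d < ½ gives the long-memory case discussed in the text.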
Long-memory processes have received considerable attention in the econometrics literature. It would be impossible to compile a complete list. See Baillie (1996) for an excellent survey and Robinson (2003), Doukhan, Oppenheim, and Taqqu (2003), and Caporale and Gil-Alana (2004), among others, for some recent contributions. In this paper we shall generalize previous results on unit root problems in two directions, namely, by allowing general nonlinear transforms K and general forms of linear processes that include ARMA and fractional ARIMA processes as special cases. It turns out that, because of nonlinearity, the asymptotic problem becomes considerably more challenging when K assumes a general form than that in special cases such as K(w) = w. On the other hand, the limiting distributions have more interesting structures that appear rather atypical in the sense that they may no longer be functionals of Brownian motions. Instead, under suitable conditions, the asymptotic distributions are functionals of Hermite processes. Dittmann and Granger (2002) discuss nonlinear transforms of fractionally integrated processes and show that the dependence structure of the transformed sequence ut = K(xt) may be significantly different from that of the input sequence xt.
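To make the setting concrete, here is a small simulation sketch of the model: a truncated fractionally integrated linear process x_t, a nonlinear transform u_t = K(x_t), the recursion y_t = ρ y_{t−1} + u_t, and the least squares slope Σ y_t y_{t−1} / Σ y_{t−1}². The truncation length `m`, the standard normal innovations, and the helper names (`frac_coeffs`, `centered_square`, `simulate_unit_root`, `ls_slope`) are assumptions made for illustration only.

```python
import numpy as np

def frac_coeffs(d, m):
    """First m moving-average coefficients of (1 - L)^(-d)."""
    a = np.empty(m)
    a[0] = 1.0
    for i in range(1, m):
        a[i] = a[i - 1] * (i - 1 + d) / i
    return a

def centered_square(d, m):
    """K(x) = x^2 - Var(x_t) for the truncated process with unit-variance noise."""
    var_x = float(np.sum(frac_coeffs(d, m) ** 2))
    return lambda x: x ** 2 - var_x

def simulate_unit_root(n, d, K, m=500, rho=1.0, seed=0):
    """Simulate y_0, ..., y_n with y_0 = 0 and errors u_t = K(x_t)."""
    rng = np.random.default_rng(seed)
    a = frac_coeffs(d, m)
    eps = rng.standard_normal(n + m - 1)
    # Each x_t is a weighted sum of the m most recent innovations.
    x = np.convolve(eps, a, mode="valid")
    u = K(x)
    y = np.zeros(n + 1)
    for t in range(1, n + 1):
        y[t] = rho * y[t - 1] + u[t - 1]
    return y

def ls_slope(y):
    """Least squares slope of y_t on y_{t-1}."""
    return float(np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1]))
```

For example, `ls_slope(simulate_unit_root(1000, 0.3, centered_square(0.3, 500)))` gives one draw of the estimate under the null ρ = 1 with a quadratic transform of a long-memory input.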
The paper is structured as follows. Section 2 presents functional limit theorems for the partial sum process Sn = u1 + ··· + un. The latter problem has a rich literature, and it plays an important role in unit root problems. Our asymptotic results go beyond existing ones by allowing non-Gaussian processes and general functionals K; see the discussion in Section 2. Applications to the unit root testing problem (1) are made in Section 3. Proofs are given in the Appendix.
Functional limit theorems are powerful tools for deriving asymptotic distributions of $\hat\rho_n$ (Phillips, 1987). In this section we shall present a functional limit theory for $S_n = S_n(K) = u_1 + \cdots + u_n$. For $t \ge 0$ let $S_t = S_{\lfloor t\rfloor} + (t - \lfloor t\rfloor)u_{\lfloor t\rfloor + 1}$, where $\lfloor t\rfloor$ is the integer part of $t$. Then $S_t$ is a continuous function in $t \ge 0$. Let $C[0,1]$ be the collection of all continuous functions defined on $[0,1]$ and define the metric $d(f,g) = \sup_{0\le t\le 1}|f(t) - g(t)|$ for $f, g \in C[0,1]$. For $\xi_n, \xi \in C[0,1]$, denote by $\xi_n(t) \Rightarrow \xi(t)$ the weak convergence of $\xi_n$ to $\xi$ in the space $C[0,1]$ under the metric $d$. See Billingsley (1968) for an extensive treatment of the weak convergence theory in $C[0,1]$. Under mild conditions, the limiting distributions of $\{S_{nu}, 0 \le u \le 1\}$ under proper scaling are shown to be either Hermite processes or Brownian motions, depending on whether the process is long- or short-memory and on a quantity (the power rank) related to $K$.
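The linearly interpolated partial sum S_t defined above can be written as a few lines of Python. This is only a sketch; the function name `interpolated_partial_sum` and the convention that the list `u` holds u_1, ..., u_n are our own.

```python
import math

def interpolated_partial_sum(u, t):
    """S_t = S_floor(t) + (t - floor(t)) * u_{floor(t)+1}, where u = [u_1, ..., u_n]."""
    n = len(u)
    if not 0 <= t <= n:
        raise ValueError("t must lie in [0, n]")
    k = math.floor(t)
    s_k = sum(u[:k])  # S_k = u_1 + ... + u_k, with S_0 = 0
    if k == n:
        return float(s_k)
    return float(s_k + (t - k) * u[k])  # u[k] is u_{k+1} in 1-based notation
```

Evaluating the function at t = n·s for s in [0, 1] gives the continuous path whose scaled version is the object of the functional limit theorems.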
Let $\|\xi\| = [E(\xi^2)]^{1/2}$ be the $L^2$ norm of the random variable $\xi$. Define the shift process
and the truncated processes
. Then for n ≥ 1,
is independent of
. Now define functions
Write $\kappa_r$ for the $r$th derivative $K_\infty^{(r)}(0)$ if it exists. We say that $K$ has power rank $p$ if $\kappa_p \ne 0$ and $\kappa_r = 0$ for $1 \le r \le p - 1$ (Ho and Hsing, 1997). In the case of Gaussian processes, $p$ is the Hermite rank (Taqqu, 1975, 1979; Dittmann and Granger, 2002). For a function $g$ let $g(w;\lambda) = \sup_{|y|\le\lambda}|g(w + y)|$ be the local maximal function. Let
(p ≥ 0) be the collection of functions f with pth-order partial derivatives. To state our main results, we need the following condition.
Condition 1. Let
for some
for all large n. Assume that for some λ > 0,
Condition 1 is actually quite mild; see Remark 1 in the Appendix and Example 1 in Section 3 for more discussion. This condition and relations (12) and (14), which follow, impose certain smoothness requirements on $K_{n-1}$. They are easily verifiable.
Let
be a standard two-sided Brownian motion; let the simplex
. For ½ < β < ½ + 1/(2r), define the Hermite process (cf. Surgailis, 1982; Avram and Taqqu, 1987)
When $r = 1$, $Z_{r,\beta}(t)$ is the fractional Brownian motion with Hurst index $H = 3/2 - \beta$; $Z_{r,\beta}(1)$ is called the multiple Wiener–Ito integral. Throughout the paper the notation $\ell(n)$ denotes a slowly varying function, namely, $\lim_{n\to\infty}\ell(\lambda n)/\ell(n) = 1$ for all $\lambda > 0$ (Bingham, Goldie, and Teugels, 1987).
THEOREM 1. Assume that Condition 1 holds with $q = 4$ and that $K$ has power rank $p \ge 1$. Let $a_n = n^{-\beta}\ell(n)$ with $\tfrac12 < \beta < 1$, $n \ge 1$. (i) If $p(2\beta - 1) < 1$, then
in the space $C[0,1]$, where $\sigma_{n,p} = n^{1 - p(\beta - 1/2)}\ell^p(n)$. (ii) If $p(2\beta - 1) > 1$ or $p(2\beta - 1) = 1$ and
, then
in the space C[0,1] for some σ < ∞.
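The dichotomy in Theorem 1 depends only on the power rank p and the decay exponent β. The following sketch classifies the regime and returns the exponent of n in the norming sequence, ignoring slowly varying factors. The function name `theorem1_regime` is hypothetical, and the √n norming in case (ii) is our assumption based on the Brownian limit; the boundary case p(2β − 1) = 1 needs the additional condition stated in the theorem and is only flagged here.

```python
def theorem1_regime(p, beta):
    """Classify (p, beta) according to the cases of Theorem 1.

    Returns (regime, exponent), where the norming sequence behaves like
    n**exponent up to a slowly varying factor.
    """
    if p < 1:
        raise ValueError("power rank p must be at least 1")
    if not 0.5 < beta < 1:
        raise ValueError("beta must lie in (1/2, 1)")
    x = p * (2 * beta - 1)
    if x < 1:
        # Case (i): Hermite process limit, sigma_{n,p} ~ n^{1 - p(beta - 1/2)}.
        return "noncentral", 1 - p * (beta - 0.5)
    if x > 1:
        # Case (ii): Brownian limit, assumed sqrt(n) norming.
        return "central", 0.5
    # Boundary case: covered by (ii) only under the extra condition in the theorem.
    return "boundary", 0.5
```

For instance, a quadratic-type transform with p = 2 falls in the noncentral regime for β < ¾ and in the central regime for β > ¾.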
THEOREM 2. Assume
and
Then
in the space C[0,1] for some σ < ∞. A sufficient condition for (12) is
Theorems 1 and 2 improve previous results in several aspects. We shall compare our results with earlier ones that are based on strong mixing processes and near-epoch dependence (NED). The concept of strong mixing is proposed by Rosenblatt (1956); see the review by Bradley (1986) for various strong mixing conditions. Gallant and White (1988) apply NED to characterize weak dependence.
Functional central limit theorems for strongly mixing processes have been widely discussed in the literature; see Peligrad (1986) for an excellent survey. However, for linear processes very restrictive conditions on the decay rate of $a_n$ are needed to ensure the strong mixing property; see Withers (1981), Andrews (1984), Pham and Tran (1985), Gorodetskii (1977), and Doukhan (1994) for more discussion about mixing properties of linear processes. Withers (1981) shows that, if $E(\varepsilon_i^2) < \infty$, then under certain regularity conditions on the density function of $\varepsilon_i$ and the inequality
the process xt is strong mixing. Here
. In the case that $a_n = n^{-\delta}$, $n \ge 1$, the preceding inequality requires $\delta > 2$ and the strong mixing coefficients
(Withers, 1981). For strong mixing processes, the celebrated central limit theorem by Ibragimov and Linnik (1971) asserts that
is asymptotically normal if
hold for some $\gamma > 0$. See Theorem 18.5.3 in Ibragimov and Linnik (1971). Therefore, even under the stronger moment condition $E[|K(x_t)|^3] < \infty$ (namely, $\gamma = 1$), one needs to impose
, to ensure the asymptotic normality of
. In comparison, our natural summability condition
only needs δ > 1.
We now compare our Theorems 1 and 2 with limit theorems for linear processes based on NED. De Jong and Davidson (2000) recently developed new conditions for functional limit theorems for near-epoch dependent sequences; see Theorem 3.1 therein. In particular, we consider the weak convergence of
for two special cases: (i) $K(x) = x$ and (ii) $K(x) = x^2 - E(x_t^2)$.
Generally speaking, limit theorems for transforms of linear processes based on NED require stronger conditions on the decay rate of (an) toward 0, especially when the function K is nonlinear. In comparison, our results impose minimal conditions on an. A key condition in Theorem 3.1 of De Jong and Davidson (2000) is that
is $L^2$-NED of size $-\tfrac12$ on $\varepsilon_t$; namely, there exists an $\eta > 0$ such that
See Definition 1 and Assumption 1 in De Jong and Davidson (2000) for more details. Under (i), namely, K(x) = x, the NED condition (15) requires that
. The latter relation implies
in view of
It is easily seen that the summability condition
does not imply the NED condition $A_{m+1}^{1/2} = O(m^{-1/2-\eta})$. To see this, let $a_n = 1/(n\log^2 n)$, $n \ge 2$. Then the NED condition is violated, and our summability condition of Theorem 2 is weaker. On the other hand, De Jong and Davidson's result has the advantage that it can be applied to nonstationary processes.
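The counterexample can be checked by a short calculation. The sketch below assumes that $A_m$ denotes the tail sum $\sum_{i \ge m} a_i^2$ and that the summability condition in question is $\sum_i |a_i| < \infty$; both are our reading of the notation.

```latex
% Summability of a_n = 1/(n log^2 n), by the integral test:
\sum_{n \ge 2} \frac{1}{n \log^2 n} < \infty
\quad\text{since}\quad
\int_2^\infty \frac{dx}{x \log^2 x} = \frac{1}{\log 2} < \infty .

% Tail of the squared coefficients:
A_{m+1} = \sum_{i \ge m+1} \frac{1}{i^2 \log^4 i} \sim \frac{1}{m \log^4 m},
\qquad\text{so}\qquad
A_{m+1}^{1/2} \sim \frac{m^{-1/2}}{\log^2 m} .

% Failure of the NED rate for every eta > 0:
m^{1/2+\eta} A_{m+1}^{1/2} \sim \frac{m^{\eta}}{\log^2 m} \to \infty
\quad (m \to \infty).
```

Hence no $\eta > 0$ gives $A_{m+1}^{1/2} = O(m^{-1/2-\eta})$, although the coefficients are absolutely summable.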
Consider case (ii). In this case $K$ is an Appell polynomial. Let $E(\varepsilon_i^4) < \infty$, $E(\varepsilon_i^2) = 1$, and $a_n = n^{-\delta}$, $n \ge 1$, with some $\delta > \tfrac12$. Then the NED condition (15) requires $\delta > 1$. To this end, let
. Then $E(x_m^2 \mid \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{2m}) = E(z_m^2) + y_m^2$ and $x_m^2 - E(x_m^2 \mid \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{2m}) = z_m^2 - E(z_m^2) + 2y_m z_m$. Elementary calculations show that there is a $c > 0$ such that $\|z_m^2 - E(z_m^2) + 2y_m z_m\| \sim c\,m^{1/2-\delta}$ as $m \to \infty$, which by (15) implies $\delta \ge 1 + \eta > 1$. However, by (ii) of Theorem 1, the functional limit theorem (11) holds under the weaker condition $\delta > \tfrac34$ because $K(x) = x^2 - E(x_t^2)$ has power rank 2 in view of $K_\infty(w) = E(w + x_t)^2 - E(x_t^2) = w^2$, $K_\infty'(0) = 0$, and $K_\infty''(0) \ne 0$. The condition $\delta > \tfrac34$ is much weaker, and it allows some long-range dependent sequences. It is optimal in the sense that the limiting distribution is the Rosenblatt process if $\tfrac12 < \delta < \tfrac34$, as asserted by (i) of Theorem 1 or Avram and Taqqu (1987). Dittmann and Granger (2002) point out the similar phenomenon that the square of a Gaussian I(d) process shows less dependence than the input process.
There is a substantial history regarding the asymptotic distributions of Sn = Sn(K). Our results extend earlier ones in several aspects. Central and noncentral limit theorems for Sn(K) have been established for stationary Gaussian processes (xt) by Sun (1963), Taqqu (1975, 1979), and Breuer and Major (1983), among others. In particular, Taqqu (1975) established (10) for functionals of Gaussian processes. For non-Gaussian processes, the functional convergence (10) has been established for K with special forms. For example, Davydov (1970) considers K(x) = x, and Surgailis (1982) assumes that K is analytic and ε0 has moments of all order. Appell polynomials are discussed in Avram and Taqqu (1987) and Giraitis and Surgailis (1986). Surgailis (2000) considers the finite-dimensional convergence of (10) and assumes that either K is a polynomial or (xt) is associated with some Gaussian processes. It is a difficult problem to derive limit theorems for Sn if K is not analytic and the linear process (xt) is non-Gaussian. The difficulty is partly due to the fact that, in the non-Gaussian case, the associated Appell polynomials are no longer orthogonal. In the Gaussian case, they are Hermite polynomials and, hence, orthogonal (Taqqu, 1975, 1979; Granger and Newbold, 1976; Giraitis and Surgailis, 1986). Ho and Hsing (1997) made a breakthrough and proved that Sn /σn,p ⇒ κp Zp,β(1), a marginal version of (10). See the latter paper for further references. The functional convergence is needed in unit root problems. Our Theorem 1 shows functional convergence for non-Gaussian processes under mild conditions on K. For other contributions see Wu (2002, 2003b), where noninstantaneous transforms and infinite variance linear processes are discussed.
The functional limit theorems 1 and 2 easily lead to the asymptotic distributions of the unit root statistic
, which are functionals of Hermite processes or Brownian motions as asserted by Theorems 3 and 4, respectively. We omit the proofs of the latter theorems because they routinely follow from the argument in Phillips (1987). Unfortunately, we know very little about the analytical properties of the limiting distribution in (iii) of Theorem 3. In comparison, Dickey and Fuller (1979) show that (4) has a nice representation. For statistical testing, quantiles of the limiting distribution in (iii) of Theorem 3 can be obtained by extensive simulations.
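As a rough sketch of how such quantiles might be simulated, the code below treats the classical Brownian case, using the Dickey–Fuller functional ½(B(1)² − 1)/∫₀¹ B(t)² dt, which we take to be the limit referred to as (4). Brownian paths are approximated by scaled Gaussian random walks. The replication count, grid size, and function name `df_limit_draws` are illustrative assumptions; the Hermite-process case of Theorem 3 would require a simulator for Z_{p,β} in place of the random walk.

```python
import numpy as np

def df_limit_draws(reps=2000, steps=500, seed=0):
    """Monte Carlo draws approximating (B(1)^2 - 1) / (2 * integral of B^2)."""
    rng = np.random.default_rng(seed)
    incr = rng.standard_normal((reps, steps)) / np.sqrt(steps)
    w = np.cumsum(incr, axis=1)  # W(1/steps), ..., W(1) for each replication
    # Left endpoints W(0), ..., W(1 - 1/steps) for the Riemann sum of W^2.
    w_left = np.hstack([np.zeros((reps, 1)), w[:, :-1]])
    numerator = 0.5 * (w[:, -1] ** 2 - 1.0)
    denominator = np.mean(w_left ** 2, axis=1)
    return numerator / denominator
```

Empirical quantiles then follow from `np.quantile(df_limit_draws(), [0.01, 0.05, 0.10])`; more replications and a finer grid reduce the simulation error.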
THEOREM 3. Under assumption (i) of Theorem 1, we have as n → ∞ that
THEOREM 4. Under the assumptions of Theorem 2 or (ii) of Theorem 1, we have
If κ1 ≠ 0, then the power rank of K is 1. Hence the asymptotic distribution asserted by Theorem 3 with p = 1 is the same as Sowell's result if ½ < β < 1. In this case, the asymptotic distributions are expressed as functionals of fractional Brownian motions. An interesting phenomenon happens when p ≥ 2, as shown by Example 1.
Let K(w) = |w| − E|xt|. Assume that the density function f of xt is symmetric, namely, f (x) = f (−x). Then
Assume that
. Then the power rank of K is 2 because
in view of the symmetry of $K$. If $\tfrac12 < \beta < \tfrac34$, then Theorem 3 and (i) of Theorem 1 are applicable. In this case, the limiting distribution is a functional of the Rosenblatt process (Taqqu, 1975). On the other hand, if $\tfrac34 < \beta < 1$, then as in the classical cases, the limiting distributions are functionals of Brownian motions (Theorem 4).
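The claim that K(w) = |w| − E|x_t| has power rank 2 can be illustrated in the special case where x_t is standard Gaussian, which is an assumption made here only for the sake of a closed form. Then K_∞(w) = E|w + x_t| − E|x_t| = √(2/π)(e^{−w²/2} − 1) + w·erf(w/√2), so that K_∞′(0) = 0 and K_∞″(0) = √(2/π) = 2φ(0), where φ is the standard normal density. The sketch checks this by central differences; the function names are hypothetical.

```python
import math

def k_inf_abs(w):
    """K_infinity(w) = E|w + X| - E|X| for X ~ N(0, 1) (folded normal mean)."""
    c = math.sqrt(2.0 / math.pi)
    return c * math.exp(-0.5 * w * w) + w * math.erf(w / math.sqrt(2.0)) - c

def central_derivatives(f, h=1e-3):
    """First and second central-difference derivatives of f at 0."""
    d1 = (f(h) - f(-h)) / (2 * h)
    d2 = (f(h) - 2 * f(0.0) + f(-h)) / (h * h)
    return d1, d2
```

Calling `central_derivatives(k_inf_abs)` gives a first derivative that vanishes by symmetry and a second derivative near 0.798, so κ₁ = 0 and κ₂ ≠ 0 in this Gaussian special case.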
We shall apply the central limit theory for Markov chains to prove our main results. Let
be a stationary and ergodic Markov chain; let
and define the projection operator
By the Markovian property,
if k ≤ i.
LEMMA 1. Assume that the Markov chain
is stationary and ergodic and the function g satisfies E [g(ξ1)] = 0 and E [g2(ξ1)] < ∞. Let
. Further assume that
Then
, where
.
Lemma 1 is adapted from Hannan (1979), and it provides a useful tool for functional limit theorems for stationary processes. We shall apply it to g(ξi) = ui = K(xi), where the shift process
is clearly a Markov chain. In such cases we are able to obtain bounds for
(cf. Theorem 5, which follows), and thus condition (A.1) is verifiable.
For j ≥ 2 let
. Define
where
LEMMA 2. Condition 1 implies that
and
Remark 1. If α = 0, then (A.3) is valid by the smoothing property of conditional expectations. For larger α, differentiation under the expectation sign is required. A simple recursion yields that
almost surely for all i ≥ 0. Roughly speaking, (A.4) is a first-order Taylor's expansion.
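Proof of Lemma 2. Assume without loss of generality that $\lambda = 1$. For $\alpha \le p$ let $\delta = a_{n-1}\varepsilon_1$, $R_n(\alpha) = K_{n-1}^{(\alpha)}(x_{n,1}) - K_{n-1}^{(\alpha)}(x_{n,0})$, and $T_n(\alpha) = R_n(\alpha) - K_{n-1}^{(1+\alpha)}(x_{n,0})\delta$. By Taylor's expansion, $|T_n(\alpha)\mathbf{1}_{|\delta|\le 1}| \le \tfrac12|\delta|^{q/2}K_{n-1}^{(\alpha+2)}(x_{n,0};1)$ because $\delta^2\mathbf{1}_{|\delta|\le 1} \le |\delta|^{q/2}$. On the other hand,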
Hence
by (8) and the independence between
, and
. For (A.3), it suffices to show that
holds for α ≤ p. We shall use an induction argument. The case in which α = 0 trivially follows. By letting w → 0 in the identity
we have (A.6) with α = 1 by the first term of (8) and the Lebesgue dominated convergence theorem. The general case α ≥ 2 follows recursively. Observe that for α < p, by (A.6),
and similarly
. Relation (A.4) follows from
, as does
. Finally,
entails (A.5) via the stationarity of xn. █
THEOREM 5 (Reduction principle). Suppose that Condition 1 holds with p ≥ 1. Then
Proof. Let
and define
. Then
. Observe that for i ≤ −1,
which has the same distribution as
via stationarity of
. Now we claim by backward induction that for all p ≥ α ≥ 0,
When α = p, (A.9) follows from (A.5) because
. Suppose (A.9) holds for α = m, where 1 ≤ m ≤ p, and consider the case α = m − 1. So for i ≤ −1,
by the induction hypothesis and (A.8). By Lemma 2,
because
. By (A.10), (A.11), and the orthogonality of
,
Thus the induction is finished, and (A.7) follows by setting α = 0 and i = −1. █
COROLLARY 1. Suppose that Condition 1 holds with p ≥ 1. Then
In particular, let
with ½ < β < 1 and q = 4. Then (i)
if (p + 1)(2β − 1) > 1; (ii)
if (p + 1)(2β − 1) < 1; (iii)
if (p + 1)(2β − 1) = 1.
Proof. Observe that
, which is zero if j ≤ i. Hence
by the orthogonality of
. So (A.12) follows from (A.7). If
, then (i)–(iii) follow from Lemma 5 in Wu (2003a) as easy applications of Karamata's theorem. █
Remark 2. Corollary 1 goes beyond the important results by Ho and Hsing (1997) in several aspects. In particular, the imposed condition (8) is weaker and $a_n$ is allowed to have forms other than $n^{-\beta}\ell(n)$. Moreover, if
, then Corollary 1 gives a sharper bound. Ho and Hsing obtain the bound $\max(n, n^{2-(p+1)(2\beta-1)+\zeta})$ for any $\zeta > 0$. At a technical level, our induction argument appears much simpler, and it can be easily generalized to multiple linear processes. In addition, (A.12) does not require the finiteness of the fourth moment of $\varepsilon_1$, whereas $E(\varepsilon_1^8) < \infty$ is needed in Ho and Hsing.
Proof of Theorem 1. (i) We generically say that (10) is a noncentral limit theorem because the asymptotic distribution is non-Gaussian if $p \ge 2$ and because the norming sequence $\sigma_{n,p}$ grows faster than $\sqrt{n}$, the norming sequence used in the classical central limit theorem for i.i.d. random variables with finite variances. To prove (10), let
, where $U_{n,r}$ is defined in (A.2). Because $p(2\beta - 1) < 1$, by Lemma 5 in Surgailis (1982) or Theorem 2 in Avram and Taqqu (1987), $\{W_{nt}/\sigma_{n,p}, 0 \le t \le 1\} \Rightarrow \{Z_{p,\beta}(t), 0 \le t \le 1\}$ in $C[0,1]$. Note that $S_m(L^{(p)}) = S_m(K) - \kappa_p W_m$. Recall Corollary 1 for the bound of $\Xi_{n,p}$. By considering the three cases $(p + 1)(2\beta - 1) < 1$, $(p + 1)(2\beta - 1) > 1$, and $(p + 1)(2\beta - 1) = 1$ separately, it is easily seen that $\max_{k \le n}\Xi_{k,p} = o(\sigma_{n,p}^2)$ because $p(2\beta - 1) < 1$. Thus the finite-dimensional convergence of $S_{nt}/\sigma_{n,p}$ to $K_\infty^{(p)}(0)Z_{p,\beta}(t)$ holds. It remains to verify the tightness. By Theorem 12.3 in Billingsley (1968), we need to show that there exist $C < \infty$ and $\gamma > 1$ such that
holds for all $n \ge 1$ and $1 \le k \le n$. Because $\Xi_{n,p} = o(\sigma_{n,p}^2)$, $\|S_m\| \sim |\kappa_p|\sigma_{m,p}$ as $m \to \infty$. Let
. Then by elementary properties of slowly varying functions,
which entails (A.13) and completes the proof of part (i).
(ii) In this case it is interesting to observe that the limiting distribution in (11) is Brownian motion even though (xt) is long-range dependent. To prove (11), by Lemma 1, it suffices to show that
because
. The former easily follows from Theorem 5 under the proposed conditions of (an). As to the latter, observe that
. Hence
is also summable over n. █
Proof of Theorem 2. Let $(\varepsilon_n')$ be an i.i.d. copy of $(\varepsilon_n)$ and $x_{n,1}' = x_{n,1} + a_{n-1}(\varepsilon_1' - \varepsilon_1)$. Then
and (13) follows from Lemma 1 in view of
That (14) entails (12) follows from the argument in the proof of Lemma 2. █