1. INTRODUCTION
Engle and Granger (1987) proposed applying the Dickey–Fuller (DF) test to residuals from a static level regression to test the null hypothesis of no cointegration. A thorough asymptotic treatment of this test was provided by Phillips and Ouliaris (1990). Both papers assume observed time series that are integrated of order one, I(1), and regression errors that are I(1) under the null hypothesis. The test is designed against the alternative of integer cointegration, i.e., that the errors are I(0). Similarly, we assume time series that are integrated of the same order d and suggest a residual-based test for the null hypothesis of no cointegration; i.e., the error term is I(d) also. The order of integration d is allowed to be noninteger. The residual-based test introduced here is constructed against the alternative of fractional cointegration; i.e., under H1 the residuals are I(d − b), b > 0.
In most economic applications it is assumed that d = 1. Even so, under cointegration the deviations need not be I(0); they may instead be fractionally integrated of order 1 − b ≠ 0. Empirical studies using a fractional cointegration framework typically treat small systems of only two variables, for which a residual-based single-equation approach seems appropriate. Our residual-based test does not rely on the DF statistic because Krämer and Marmol (2004) showed that its power is poor in the case of fractional cointegration. We instead adopt the Lagrange multiplier (LM) test pioneered by Robinson (1991) and further studied and extended by Robinson (1994), Agiakloglou and Newbold (1994), and Tanaka (1999). Breitung and Hassler (2002) proposed a further variant that is completely regression based and therefore particularly simple to compute. When applied to observed time series, all these variants have a limiting normal distribution under the null hypothesis.
A number of recent papers have also considered testing for fractional cointegration. A partial list includes Breitung and Hassler (2002), Davidson (2002), Robinson and Yajima (2002), Velasco (2003), Marmol and Velasco (2004), Nielsen (2004), and Hassler, Marmol, and Velasco (2006). Here, we provide two contributions. First, it is shown that the LM test applied naively, i.e., without modifications, to residuals from spurious regressions does not have a Gaussian limiting distribution. Second, we suggest simply using residuals of the regression of the differences instead of those from the level regression. It is shown that the corresponding residual-based LM statistic has a standard normal asymptotic distribution under the null hypothesis of no cointegration.
The rest of the paper is organized as follows. Section 2 presents our assumptions and the test statistic. In Section 3, it is shown that the LM test applied to regression residuals without differencing results in a nonnormal limiting distribution depending on d. Section 4 establishes asymptotic normality of the test applied to residuals from a regression in differences. Finite-sample properties of this fractional cointegration test are investigated by means of Monte Carlo experiments in Section 5. Section 6 concludes. Proofs are given in the Appendix.
2. PRELIMINARIES
We now lay the foundations for the paper by presenting the assumptions and reviewing the LM statistic.
2.1. Assumptions
Let the scalar y1,t and the m-dimensional vector y2,t be generated by a nonstationary I(d) process, where d is taken to be known. We assume the regression model
$$ y_{1,t} = y_{2,t}'\beta + z_t, \qquad t = 1,\ldots,T, \tag{1} $$
where the error zt is I(d − b) with b ≥ 0. If b > 0 and β ≠ 0, then y1,t and y2,t are fractionally cointegrated. Note that the observed series and also the errors may be fractionally integrated, although in many economic applications d = 1 will hold. The hypotheses to be tested are
$$ H_0\colon b = 0 \quad \text{against} \quad H_1\colon b > 0. $$
The assumptions on the stochastic processes are the following, where L denotes the usual lag operator and Δ^d = (1 − L)^d.
Assumption 1. Let y1,t and y2,t from (1) be generated by fractionally integrated processes of order d with initial values y1,t = 0 and y2,t = 0 for t ≤ 0.
For convenience both components will be summarized in one vector of length m + 1, yt′ = (y1,t,y2,t′). The differences Δdyt are denoted more economically as vt. For the asymptotic analysis we will maintain the following assumption under the null hypothesis.
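Assumption 2. The vector vt′ = (v1,t,v2,t′) = Δ^d yt′ is generated by a stationary VAR(p) process,
$$ v_t = A_1 v_{t-1} + \cdots + A_p v_{t-p} + \varepsilon_t, $$
where εt is independent and identically distributed (i.i.d.) with mean zero and positive definite covariance matrix
$$ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{21}' \\ \sigma_{21} & \Sigma_{22} \end{pmatrix}. $$
Furthermore, we assume that the fourth moments of the components of vt exist.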
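As an illustration of the filter Δ^d combined with the zero initial values of Assumption 1, the following Python sketch expands (1 − L)^d into its binomial coefficients and truncates the expansion at the start of the sample. The function name `frac_diff` and the use of NumPy are assumptions made for this illustration only; they are not part of the original analysis.

```python
import numpy as np

def frac_diff(x, d):
    """Apply (1 - L)^d to x, treating x_t = 0 for t <= 0 (truncated filter)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Coefficients of (1 - L)^d = sum_j pi_j L^j via the recursion
    # pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j.
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    # Truncated convolution: only observations t >= 1 enter the sum.
    return np.convolve(x, pi)[:n]

# d = 1 gives ordinary first differences with a zero pre-sample value.
print(frac_diff([1.0, 3.0, 6.0], 1.0))
```

A negative order integrates instead of differencing, so `frac_diff(e, -d)` turns a white noise series `e` into an I(d) series with zero starting values.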
2.2. LM Test
If zt from (1) were observable (i.e., if β were known), an LM test against the alternative of fractional cointegration would build on the differences ζt = Δ^d zt, which are I(−b). Breitung and Hassler (2002) show that the LM principle gives rise to the following test equation estimated by ordinary least squares (OLS):
$$ \zeta_t = \phi\,\zeta_{t-1}^{*} + e_t, \tag{2} $$
where
$$ \zeta_{t-1}^{*} = \sum_{j=1}^{t-1} j^{-1}\zeta_{t-j}. $$
A t-type statistic (see footnote 1) is
$$ t_\phi(\zeta) = \frac{\sum_t \zeta_t\,\zeta_{t-1}^{*}}{\hat\sigma_\zeta \sqrt{\sum_t (\zeta_{t-1}^{*})^2}} \xrightarrow{d} N(0,1), \qquad \hat\sigma_\zeta^2 = T^{-1}\sum_t \zeta_t^2. \tag{4} $$
[Footnote 1: In (4), the variance Var(ζt) is estimated under the null hypothesis (φ = 0), and hence tφ(ζ) does not exactly equal the usual t-statistic. Breitung and Hassler (2002) consider the usual studentization, with the variance estimated from the OLS residuals of (2) instead of $\hat\sigma_\zeta^2$. In large samples the difference is negligible, whereas in small samples we collected experimental evidence that the present version of the test statistic is superior in terms of size.]
Here $\xrightarrow{d}$ denotes convergence in distribution. The cointegration test against b > 0 can be performed as a one-sided test rejecting H0 for significantly negative values of tφ(ζ).
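The statistic is straightforward to compute. The following Python sketch is one possible reading of the test regression and of the t-type statistic with the variance estimated under the null. The helper names `harmonic_lag_sum` and `lm_stat` are hypothetical, and the summation range t = 2,…,T is an assumption of this illustration.

```python
import numpy as np

def harmonic_lag_sum(z):
    """Return z*_{t-1} = sum_{j=1}^{t-1} z_{t-j} / j for t = 2, ..., T."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    w = 1.0 / np.arange(1, n)            # weights 1, 1/2, ..., 1/(T-1)
    # Entry k of the convolution is sum_j w[j] * z[k - j], i.e. z*_{t-1} for t = k + 2.
    return np.convolve(z, w)[: n - 1]

def lm_stat(zeta):
    """t-type LM statistic; the variance is estimated under the null (phi = 0)."""
    zeta = np.asarray(zeta, dtype=float)
    zs = harmonic_lag_sum(zeta)          # aligned with zeta[1:]
    sigma2 = np.mean(zeta ** 2)
    return np.sum(zeta[1:] * zs) / np.sqrt(sigma2 * np.sum(zs ** 2))

# One-sided test: reject H0 at the 5% level if the statistic is below -1.645.
rng = np.random.default_rng(0)
e = rng.standard_normal(501)
print(lm_stat(e[1:]))       # white noise: the null holds
print(lm_stat(np.diff(e)))  # overdifferenced series: strongly negative
```

The second call uses an overdifferenced series, which mimics the alternative b > 0 and therefore produces a large negative value.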
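If ζt is a stationary AR(p) process under H0,
$$ \zeta_t = a_1\zeta_{t-1} + \cdots + a_p\zeta_{t-p} + u_t, \tag{5} $$
then Breitung and Hassler (2002) adopt from Agiakloglou and Newbold (1994) the proposal to work with prewhitened residuals,
$$ \hat u_t = \zeta_t - \hat a_1\zeta_{t-1} - \cdots - \hat a_p\zeta_{t-p}, $$
with
$$ \hat u_{t-1}^{*} = \sum_{j=1}^{t-p-1} j^{-1}\hat u_{t-j}. $$
Accounting for the residual effect, the test regression becomes
$$ \hat u_t = \phi\,\hat u_{t-1}^{*} + \theta_1\zeta_{t-1} + \cdots + \theta_p\zeta_{t-p} + e_t. \tag{6} $$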
With Xt−1′ := (ζt−1,…, ζt−p) and due to
, the corresponding t-type statistic simplifies to
Again,
is the t-statistic for φ = 0 from (6) with the usual residual variance estimator relying on
replaced by the estimator under the null
. The limiting null distribution remains standard normal.
PROPOSITION 1. Let ζt = Δ^d zt be a stationary AR(p) process with ut from (5) being i.i.d. Then, as T → ∞, the statistic from (7) converges in distribution to N(0,1).
Agiakloglou and Newbold (1994) and Breitung and Hassler (2002) proposed regression (6) without asymptotic justification. The limiting normality of (7) under H0 is therefore proved in the Appendix.
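The prewhitened variant can be sketched as follows. This is only one reading of the procedure: an AR(p) model is fitted to ζt by OLS, the harmonic lag sum is formed from the AR residuals, and the lags of ζt are partialled out of that sum before the t-type ratio is formed (Frisch-Waugh), with the variance estimated under the null. The function names and the exact sample ranges are assumptions made for illustration.

```python
import numpy as np

def harmonic_lag_sum(z):
    """z*_{t-1} = sum_j z_{t-j} / j, returned for the 2nd to last observation."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    return np.convolve(z, 1.0 / np.arange(1, n))[: n - 1]

def lm_stat_ar(zeta, p):
    """LM-type statistic with AR(p) prewhitening (p >= 1); a sketch only."""
    zeta = np.asarray(zeta, dtype=float)
    T = len(zeta)
    # Rows t = p+1, ..., T of X hold (zeta_{t-1}, ..., zeta_{t-p}).
    X = np.column_stack([zeta[p - k: T - k] for k in range(1, p + 1)])
    y = zeta[p:]
    a_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ a_hat                        # prewhitened residuals
    us = harmonic_lag_sum(u)                 # aligned with u[1:]
    u1, X1 = u[1:], X[1:]
    # Partial the lagged zetas out of the harmonic sum.
    us_perp = us - X1 @ np.linalg.lstsq(X1, us, rcond=None)[0]
    sigma2 = np.mean(u ** 2)                 # variance under the null
    return (us_perp @ u1) / np.sqrt(sigma2 * (us_perp @ us_perp))

# Example: a stationary AR(1) series, for which the null hypothesis holds.
rng = np.random.default_rng(1)
e = rng.standard_normal(600)
zeta = np.empty(600)
zeta[0] = e[0]
for t in range(1, 600):
    zeta[t] = 0.5 * zeta[t - 1] + e[t]
print(lm_stat_ar(zeta, p=1))
```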
3. ESTIMATED PARAMETERS
In practice, the coefficient vector β from (1) is usually unknown, and therefore the error zt is not observable. The traditional residual-based cointegration test suggested by Engle and Granger (1987) uses residuals of the cointegration regression instead of the unobservable errors. We now investigate the asymptotic properties of a test based on the OLS residuals $\hat z_t = y_{1,t} - y_{2,t}'\hat\beta$ from a cointegration regression in levels instead of the true errors zt. The test statistic tφ(ζ) from (4) with ζt and ζt−1* replaced by their residual-based analogues
$$ \hat\zeta_t = \Delta^d \hat z_t, \qquad \hat\zeta_{t-1}^{*} = \sum_{j=1}^{t-1} j^{-1}\hat\zeta_{t-j}, \tag{8} $$
does not result in an N(0,1) distribution under the null hypothesis of no cointegration. This statement will be made more precise now.
Let us assume d > 0.5 and b = 0 in (1), which corresponds to the spurious regression setup of Marmol (1998) (see footnote 2).
[Footnote 2: A similar (bivariate) setup was considered by Cappuccio and Lubian (1997), but with different definitions of the nonstationary I(d) processes; see Assumption 1.]
where B1,d(r) and B2,d(r) are possibly correlated fractional Brownian motions (“type II” in the terminology of Marinucci and Robinson, 1999) of dimension 1 and m, respectively. Based on (9) we prove in the Appendix the following result.
PROPOSITION 2. Let the null hypothesis of no cointegration (b = 0), and Assumptions 1 and 2 hold true with d > 0.5. Without loss of generality set β = 0. It is further assumed that the differences are i.i.d., Δdyt = εt. Denote by
the statistic that results from replacing ζt and ζt−1* in (4) by the residual-based analogues from (8). Then it holds that
where NT, DT, AT, and BT are defined in the Appendix. Further, as T → ∞,
and AT does not converge to zero in probability. Here $\xrightarrow{p}$ denotes convergence in probability.
Remark A. Given the asymptotic normality of NT and because BT has a nondegenerate limiting distribution, we observe that the residual-based statistic does not converge to a normally distributed random variable. Even under the more restrictive assumption that error and regressors are uncorrelated (σ21 = 0 in Assumption 2), the asymptotic N(0,1) distribution does not carry over to the case where the test statistic is simply computed from residuals of a cointegrating regression. This is due to the fact that the OLS estimator $\hat\beta$ does not converge to β = 0 in the case of spurious regressions. If the limit βd from (9) were zero, then a limiting N(0,1) distribution would arise, as can be seen from the proof of Proposition 2. Moreover, it is worth noting that the limit of BT depends not only on the number of regressors m but also on the order d of fractional integration. Hence, percentiles are difficult to tabulate.
Remark B. The deviation from the normal distribution of the LM test applied to residuals without correction is not restricted to the variant discussed in Breitung and Hassler (2002). It can be demonstrated that, e.g., the time domain variant by Robinson (1994) suffers from the same shortcoming when applied to regression residuals. We omit details.
For practical purposes Proposition 2 implies that standard normal inference is not valid when the LM test is applied without modifications to OLS residuals obtained from a level regression to test for the null of no cointegration. According to the Monte Carlo evidence in Section 5, the LM test applied naively to level residuals is oversized; i.e., it indicates cointegration too often.
4. CORRECTING FOR RESIDUAL EFFECT
Under the null hypothesis of no cointegration the vector β from (1) is not identified. We may still define the following regression model in differences:
where the error xt is defined as
When vt = εt, then xt is i.i.d. with variance $\sigma^2 = \sigma_{11} - \sigma_{21}'\Sigma_{22}^{-1}\sigma_{21}$, and xt is uncorrelated with v2,t by construction. Hence, assuming Δ^d yt = εt in Assumption 2, the regressors in (10) are exogenous. More generally, however, xt is serially correlated. The two cases will be treated separately.
4.1. The i.i.d. Case
According to the LM principle parameters are estimated under the null hypothesis only. This suggests modifying the residual LM test such that it is applied to the differenced regression model (10) estimated by OLS,
The residuals are
with xt from (11). As before we define $\hat x_{t-1}^{*} = \sum_{j=1}^{t-1} j^{-1}\hat x_{t-j}$. The LM statistic is computed from the residuals $\hat x_t$; i.e., $\hat x_t$ is regressed on $\hat x_{t-1}^{*}$ analogously to (2) to compute the test statistic
Limiting normality is established in the Appendix.
PROPOSITION 3. Let the null hypothesis of no cointegration (b = 0) hold true. Then, as T → ∞, under Assumptions 1 and 2 for the i.i.d. case, the test statistic from (14) converges in distribution to N(0,1).
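In computational terms, the test in differences for the i.i.d. case consists of three steps: fractionally difference all series, regress the differenced y1,t on the differenced y2,t by OLS, and apply the LM statistic to the residuals. The Python sketch below follows these steps. The function names are hypothetical, d is treated as known, and no deterministic terms are included.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def lm_stat(zeta):
    """t-type LM statistic with the variance estimated under the null."""
    zeta = np.asarray(zeta, dtype=float)
    n = len(zeta)
    zs = np.convolve(zeta, 1.0 / np.arange(1, n))[: n - 1]
    return np.sum(zeta[1:] * zs) / np.sqrt(np.mean(zeta ** 2) * np.sum(zs ** 2))

def residual_lm_test(y1, Y2, d):
    """LM statistic computed from residuals of the regression in differences."""
    y1 = np.asarray(y1, dtype=float)
    Y2 = np.asarray(Y2, dtype=float).reshape(len(y1), -1)
    v1 = frac_diff(y1, d)
    V2 = np.column_stack([frac_diff(col, d) for col in Y2.T])
    beta_hat = np.linalg.lstsq(V2, v1, rcond=None)[0]
    x_hat = v1 - V2 @ beta_hat
    return lm_stat(x_hat)

# Two independent random walks (d = 1): no cointegration.
rng = np.random.default_rng(2)
y1 = np.cumsum(rng.standard_normal(300))
y2 = np.cumsum(rng.standard_normal(300))
stat = residual_lm_test(y1, y2, d=1.0)
print(stat, "reject H0" if stat < -1.645 else "do not reject H0")
```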
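Remark C. An equivalent procedure can be performed as follows. Denote again by $\hat z_t$ the OLS residuals of the level regression (1). First, difference the level residuals to obtain $\Delta^d \hat z_t$. Second, project $\Delta^d \hat z_t$ on the differenced regressor $v_{2,t} = \Delta^d y_{2,t}$ and keep the residuals of this projection. Note that $\Delta^d \hat z_t = v_{1,t} - v_{2,t}'\hat\beta$ differs from v1,t only by a linear combination of v2,t, so that the projection residuals coincide with the residuals of the regression in differences. Hence, the modification of working with projection residuals instead of using $\Delta^d \hat z_t$ naively results in a test statistic identical to the one from (14).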
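The equivalence stated in Remark C is easy to confirm numerically. The sketch below, with hypothetical function names and simulated data, computes the residuals both ways and also returns the naively differenced level residuals for comparison.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def ols_resid(y, X):
    """Residuals of an OLS regression of y on X (no intercept)."""
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def remark_c_residuals(y1, Y2, d):
    v1 = frac_diff(y1, d)
    V2 = np.column_stack([frac_diff(col, d) for col in Y2.T])
    x_hat = ols_resid(v1, V2)                 # regression in differences
    dz = frac_diff(ols_resid(y1, Y2), d)      # differenced level residuals
    projected = ols_resid(dz, V2)             # projected on differenced regressors
    return x_hat, projected, dz

rng = np.random.default_rng(3)
d = 0.8
y1 = frac_diff(rng.standard_normal(200), -d)
Y2 = np.column_stack([frac_diff(rng.standard_normal(200), -d) for _ in range(2)])
x_hat, projected, dz = remark_c_residuals(y1, Y2, d)
print(np.max(np.abs(x_hat - projected)))   # numerically zero
print(np.max(np.abs(x_hat - dz)))          # generally not zero
```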
Remark D. In practice, y1,t and y2,t from (1) may evolve around different levels, y1,t = c + y2,t′ β + zt. In differences one obtains instead of (10) Δdy1,t = Δdc + v2,t′ β + xt. Given the assumption of zero starting values (in Assumption 1) the expansion of the fractional differences is in fact time dependent:
If d = 1, the constant is removed in differences. If d ≠ 1, we propose including the fractional difference of the constant, i.e., Δ^d applied to a series of ones with zero starting values, as an additional regressor accounting for the constant (cf. the treatment of deterministic components in Robinson, 1994). Hence, regression (12) is modified in the following way:
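In a sketch, the modification amounts to adding the fractionally differenced unit step to the regressors of the regression in differences. The Python code below is a hedged illustration: the names `constant_regressor` and `diff_residuals_with_constant` are hypothetical, and only y1,t is given a nonzero level in the example.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def constant_regressor(T, d):
    """Delta^d applied to a series of ones: h_t = sum_{j=0}^{t-1} pi_j(d)."""
    return frac_diff(np.ones(T), d)

def diff_residuals_with_constant(y1, Y2, d):
    """Residuals of the regression in differences, augmented by h_t."""
    y1 = np.asarray(y1, dtype=float)
    Y2 = np.asarray(Y2, dtype=float).reshape(len(y1), -1)
    v1 = frac_diff(y1, d)
    V2 = np.column_stack([frac_diff(col, d) for col in Y2.T])
    R = np.column_stack([constant_regressor(len(y1), d), V2])
    return v1 - R @ np.linalg.lstsq(R, v1, rcond=None)[0]

print(constant_regressor(5, 1.0))   # a single impulse at t = 1
print(constant_regressor(5, 0.8))   # decays slowly when d < 1
```

Because Δ^d is linear, shifting y1,t by a constant c changes the differenced series by c times this regressor, so the residuals of the augmented regression do not depend on c.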
Remark E. If the fractional difference parameter is unknown, the test may be performed by replacing the unknown parameter d by some consistent estimator $\hat d$. However, estimation of d may affect the limiting distribution of the test. To see this consider the (normalized) numerator of the test statistic,
with
where
is the residual from an OLS regression of
(see (10)) or
. Given the Taylor expansion of ln(1 − L) we obtain for Δ^d = (1 − L)^d:
This yields
with
. A first-order Taylor approximation around d hence renders for
:
Under the null hypothesis ΨT(d) converges to a normal distribution, and
has a nonnegligible effect.
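The mechanism behind Remark E can be illustrated numerically. Since ln(1 − L) = −(L + L²/2 + L³/3 + …), differentiating (1 − L)^d with respect to d gives ln(1 − L)(1 − L)^d, so the derivative of a fractionally differenced series is minus its own harmonic lag sum. The Python sketch below checks this identity against a finite-difference derivative. The function names are hypothetical and the check is an illustration, not a replacement for the expansion in the text.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def d_frac_diff_dd(y, d):
    """Analytic derivative of Delta^d y_t with respect to d."""
    v = frac_diff(y, d)
    n = len(v)
    out = np.zeros(n)                       # the first element does not depend on d
    out[1:] = -np.convolve(v, 1.0 / np.arange(1, n))[: n - 1]
    return out

rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(60))
h = 1e-5
numeric = (frac_diff(y, 0.8 + h) - frac_diff(y, 0.8 - h)) / (2 * h)
print(np.max(np.abs(numeric - d_frac_diff_dd(y, 0.8))))   # close to zero
```

An error in the estimate of d therefore shifts the differenced series by a term proportional to the very regressor used in the LM test, which is why the estimation effect does not vanish.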
4.2. Serial Correlation
When p ≥ 1 in Assumption 2, matters get more complicated. To investigate the dependence structure, define
Without loss of generality we may set p = 1 in Assumption 2 and write for convenience A1 = A,
The transformed VAR(1) model becomes
with
where Im denotes the identity matrix. The equations for v2,t and v1,t turn into
where α1 = a11 − σ21′Σ22−1σ21, β1 = a12 − A22′Σ22−1σ21, and
The regression equation corresponding to (16) becomes for general p
where the error ηt is i.i.d. Notwithstanding the $\sqrt{T}$-consistency of the estimators, it is not sufficient to simply replace the residuals in (14) by the OLS residuals $\hat\eta_t$ of (17). Because lagged regressors and lagged values of ηt may be correlated, one has to correct for that to obtain a limiting distribution free of nuisance parameters. Analogously to (6) we consider with
the regression
Limiting normality when testing for φ = 0 is established in the Appendix.
PROPOSITION 4. Let the null hypothesis of no cointegration (b = 0) hold true. Then, as T → ∞, under Assumptions 1 and 2 the t-statistic testing for φ = 0 in (18) converges in distribution to N(0,1).
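The two-step procedure for serially correlated differences can be sketched as follows. The sketch rests on an assumed reading of (17) and (18): in the first step v1,t is regressed on the current v2,t and on p lags of v1,t and v2,t; in the second step the residuals are regressed on their harmonic lag sum and on the same regressors, and the t-type ratio uses the variance estimated under the null. The function name and the exact sample ranges are assumptions for illustration.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def lm_stat_serial(y1, Y2, d, p=1):
    """Residual-based LM-type statistic with a VAR(p) correction (sketch)."""
    y1 = np.asarray(y1, dtype=float)
    Y2 = np.asarray(Y2, dtype=float).reshape(len(y1), -1)
    v1 = frac_diff(y1, d)
    V2 = np.column_stack([frac_diff(col, d) for col in Y2.T])
    T = len(v1)
    # Step 1: regress v1_t on w_t = (v2_t, v1_{t-1..t-p}, v2_{t-1..t-p}).
    cols = [V2[p:]]
    for k in range(1, p + 1):
        cols.append(v1[p - k: T - k, None])
        cols.append(V2[p - k: T - k])
    W = np.hstack(cols)
    y = v1[p:]
    eta = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    # Step 2: harmonic lag sum of the residuals, with w_t partialled out.
    n = len(eta)
    es = np.convolve(eta, 1.0 / np.arange(1, n))[: n - 1]   # aligned with eta[1:]
    e1, W1 = eta[1:], W[1:]
    es_perp = es - W1 @ np.linalg.lstsq(W1, es, rcond=None)[0]
    sigma2 = np.mean(eta ** 2)                              # variance under the null
    return (es_perp @ e1) / np.sqrt(sigma2 * (es_perp @ es_perp))

# Example: d = 1 with VAR(1) differences and no cointegration.
rng = np.random.default_rng(5)
eps = rng.standard_normal((400, 2))
v = np.zeros((400, 2))
v[0] = eps[0]
for t in range(1, 400):
    v[t] = 0.5 * v[t - 1] + eps[t]
y = np.cumsum(v, axis=0)
print(lm_stat_serial(y[:, 0], y[:, 1], d=1.0, p=1))
```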
Remark F. Again, to compute the t-type statistic
, the error variance may be estimated under the null hypothesis
or just as well from empirical residuals,
. Based on experimental evidence we stick to
.
5. MONTE CARLO EVIDENCE
To study the size properties of the LM test applied to the residuals, we generate the data according to the bivariate model (m = 1)
where in the first set of experiments Δ^d y2,t and Δ^(d−b) zt are generated as Gaussian white noise processes with
For ρ = 0, the regressor y2,t is strongly exogenous, whereas for ρ ≠ 0 the regressor is correlated with the error. For all Monte Carlo experiments, 5,000 replications of the model are used.
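A data-generating process of this kind can be simulated along the following lines. The sketch assumes unit innovation variances, a slope of one in the bivariate relation, and zero starting values; these choices, like the function name `simulate_pair`, are assumptions of the illustration and are not taken from the experimental design.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values; d < 0 integrates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def simulate_pair(T, d, b, rho, rng, beta=1.0):
    """Draw (y1, y2) with y2 ~ I(d), z ~ I(d - b), innovation correlation rho."""
    e2 = rng.standard_normal(T)
    e1 = rho * e2 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(T)
    y2 = frac_diff(e2, -d)            # Delta^d y2 = e2
    z = frac_diff(e1, -(d - b))       # Delta^(d-b) z = e1
    return beta * y2 + z, y2

rng = np.random.default_rng(4)
y1, y2 = simulate_pair(T=2000, d=1.0, b=0.3, rho=0.5, rng=rng)
# Recover the innovations and check their sample correlation.
e1_back = frac_diff(y1 - y2, 0.7)
e2_back = frac_diff(y2, 1.0)
print(np.corrcoef(e1_back, e2_back)[0, 1])
```

Setting b = 0 generates data under the null hypothesis, whereas b > 0 generates fractionally cointegrated series.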
As stated in Proposition 2, the test against fractional alternatives applied to the residuals of the cointegration regression does not yield an asymptotically normally distributed test statistic. Table 1 reports the actual sizes of the test statistic, where the 5% critical value −1.645 is applied. It turns out that when the test is computed from the cointegration regression in levels, the actual size is roughly twice as large as the nominal one for T = 200, irrespective of d. For T = 500 the size bias is slightly smaller. Furthermore, it turns out that the size properties remain unchanged if the regressors are correlated with the disturbance term.
Table 1. Residual test based on levels: Empirical sizes
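A small-scale version of such a size experiment can be sketched in Python. The code below compares rejection frequencies of the LM statistic computed from differenced level residuals with those computed from residuals of the regression in differences, using independent I(d) series (ρ = 0) and no deterministic terms. The function names, the default number of replications, and the seed are arbitrary choices for illustration; the sketch is not meant to reproduce the reported figures.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d x_t with zero pre-sample values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(x, pi)[:n]

def lm_stat(zeta):
    """t-type LM statistic with the variance estimated under the null."""
    n = len(zeta)
    zs = np.convolve(zeta, 1.0 / np.arange(1, n))[: n - 1]
    return np.sum(zeta[1:] * zs) / np.sqrt(np.mean(zeta ** 2) * np.sum(zs ** 2))

def empirical_sizes(T=200, d=1.0, reps=500, seed=0, crit=-1.645):
    """Return rejection frequencies (levels-based, differences-based) under H0."""
    rng = np.random.default_rng(seed)
    rej_lev = 0
    rej_dif = 0
    for _ in range(reps):
        y1 = frac_diff(rng.standard_normal(T), -d)
        y2 = frac_diff(rng.standard_normal(T), -d)
        # Naive variant: level regression, then difference the residuals.
        z_hat = y1 - y2 * (y2 @ y1) / (y2 @ y2)
        rej_lev += int(lm_stat(frac_diff(z_hat, d)) < crit)
        # Proposed variant: regression in differences.
        v1, v2 = frac_diff(y1, d), frac_diff(y2, d)
        x_hat = v1 - v2 * (v2 @ v1) / (v2 @ v2)
        rej_dif += int(lm_stat(x_hat) < crit)
    return rej_lev / reps, rej_dif / reps

print(empirical_sizes(T=100, d=1.0, reps=200, seed=0))
```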
As shown in Table 2 (b = 0), the test statistic computed from the differenced equation exhibits no substantial size bias, thus illustrating Proposition 3. For the reported results the error zt is generated independently of the regressor; i.e., ρ = 0. Similar results obtained for a process with ρ ≠ 0 are available upon request. Table 2 also presents the empirical power of the residual-based LM-type test statistic for various values of b. As the results do not depend on the value d (which is assumed to be known) we let d = 1. In this case the residual Dickey–Fuller (rDF) test can also be applied although it is not designed against fractional alternatives. Furthermore, the size and power of the trace statistic suggested by Breitung and Hassler (2002) are reported. The trace statistic is based on the canonical correlation between the vectors $u_t = [(1-L)^d y_{1,t}, (1-L)^d y_{2,t}]'$ and $u_{t-1}^{*} = \sum_{j=1}^{t-1} j^{-1} u_{t-j}$. This test statistic can be seen as a fractional analogue to the LR test statistic suggested by Johansen (1988). The respective results for this trace test are presented in the column “System” of Table 2.
Table 2. Size and power of alternative cointegration tests
From the simulation results it turns out that the new test statistic is clearly more powerful than the rDF test. This is not surprising as we test against fractional alternatives. The residual test is also more powerful than the multivariate trace test (system). This may be due to the fact that the residual test is one-sided, whereas the trace statistic is designed as a two-sided test against b ≠ 0.
Next, the small-sample properties of the correction for serially correlated differences suggested in Proposition 4 are considered. The errors are generated under H0 by a VAR model given by
with α being a multiplicative scalar and where
To correct for the serial correlation of the errors, we choose p = 1 in the test equation (18). The empirical sizes of this test procedure are reported in Table 3. It turns out that the correction for autocorrelation works quite well and yields actual sizes close to the nominal size of 0.05 for sample sizes larger than T = 200.
Table 3. Size under serially correlated errors
So far we have assumed that the integration parameter d of the regressor is known. In practice, however, it may be the case that d is unknown and must be replaced by some estimate. To investigate the effect of an estimated parameter on the properties of the test, we first estimate the parameter d by using the log-periodogram regression based on M ≤ T/2 frequencies around the origin (cf. Geweke and Porter-Hudak, 1983). The efficiency of this estimator depends on the range of frequencies M used in the Geweke and Porter-Hudak (GPH) log-periodogram regression. Because the data are generated by a pure fractionally integrated process with white noise errors, a large value of M may be chosen to reduce the variance of the estimator.
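For concreteness, a log-periodogram regression of the Geweke and Porter-Hudak type can be sketched as follows: the log periodogram at the first M Fourier frequencies is regressed on −ln(4 sin²(λ/2)), and the slope estimates d. The function name `gph_estimate` is hypothetical. For a nonstationary series one would typically apply the estimator to first differences and add one to the result; that convention is an assumption here, since the text does not spell out this detail.

```python
import numpy as np

def gph_estimate(x, M):
    """Log-periodogram estimate of d from the first M Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    M = int(M)
    lam = 2.0 * np.pi * np.arange(1, M + 1) / T
    dft = np.fft.fft(x - x.mean())[1: M + 1]
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * T)
    regressor = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
    centered = regressor - regressor.mean()
    # OLS slope of log periodogram on the centered regressor.
    return float(centered @ np.log(periodogram) / (centered @ centered))

# Example with d = 1: estimate from first differences and add one.
rng = np.random.default_rng(8)
y = np.cumsum(rng.standard_normal(2001))
print(1.0 + gph_estimate(np.diff(y), M=1000))
```

A small M uses only frequencies near the origin and gives a noisier estimate, whereas a large M reduces the variance when the short-run dynamics are white noise, as in the experiments described here.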
The resulting estimate $\hat d$ of the parameter d is used to compute the fractional difference filter $\Delta^{\hat d}$. Table 4 presents the rejection frequencies of a test with estimated parameter $\hat d$ for various values of M and T = 200. The results suggest that the test may possess a severe size bias if M is small. Even if M = T/2 = 100 the test still overrejects the null hypothesis substantially.
Table 4. Sizes for estimated d (GPH)
6. SUMMARY
We propose a test against the alternative that nonstationary time series integrated of order d are fractionally cointegrated, i.e., that a linear combination is integrated of order d − b with b > 0. Neither d nor d − b has to be an integer. The null hypothesis of no cointegration is b = 0. We discuss the use of the asymptotically normal LM test pioneered by Robinson (1991) in the version by Breitung and Hassler (2002).
It is shown that the LM test applied to residuals from a static regression in levels does not result in limiting normality. The distribution instead depends on d and hence is cumbersome to tabulate. Therefore, we propose working with residuals obtained from a first-step regression of the differenced variables instead of a level cointegration regression. Assuming a VAR model for the differences, it is suggested that the test regression be augmented in a second step with lagged endogenous and exogenous variables to retain limiting normality. Monte Carlo experiments assess the validity of a normal approximation in finite samples.
A drawback of our procedure is that the test requires knowledge of d to difference the variables. If the difference parameter is unknown and not set to d = 1 a priori, then it will typically be replaced by some estimator. The estimation of d, however, may seriously distort the outcome of the test. Future research should therefore try to incorporate the effect of estimating d into the test procedure.
APPENDIX
We first establish a lemma and then prove the propositions. A word on notation before we begin. Let xt be a scalar variable and
. For compact notation we define
and similarly,
Analogously, we define the matrices
LEMMA A. For xt being i.i.d. with mean zero and variance $\sigma^2$, it holds that
as T → ∞.
Proof. (i) is obvious. (ii) is implied by Robinson (1991, Thm. 2.1) because
(cf. Robinson, 1991, p. 82). To establish (iii), define
Note that xt−1** is a stationary and ergodic process (Hannan, 1970, p. 203) with
Next, consider
with
Hence, we have
It follows by White (2001, Prop. 3.52) that
converges to zero almost surely. Similarly,
can be shown to tend to zero because $E(\rho_t x_{t-1}^{**}) = E(\rho_t^2)$. With $(x_{t-1}^{*})^2 = (x_{t-1}^{**} - \rho_t)^2$ this implies (iii), which completes the proof. █
Proof of Proposition 1. Assume a stationary AR(p) model,
where the coefficients bi of the moving average representation are defined implicitly. The proof builds on Tanaka (1999). Note that
In terms of autocorrelations instead of autocovariances, Tanaka (1999, Thm. 3.3) shows that
where ΣX /σ2 is the information matrix for (a1,…, ap)′ (note that in Tanaka (1999, Thm. 3.3) there is a misprint with κk; here we give the proper definition used in the proof by Tanaka (1999, p. 580)). With the prewhitened residuals from the autoregression,
, it follows that
. Therefore, by (A.1),
Moreover, for
, it holds that
because Xt−2* is asymptotically stationary. Finally, for
,
As in the proof of Lemma A it can be shown that the difference between working with ut−1* and with ut−1** is negligible. Thus we obtain
Therefore,
as required to complete the proof given the test statistic in (7). █
Proof of Proposition 2. Define
with the limiting
distribution of tφ(ζ) established by Breitung and Hassler (2002) (cf. Lemma A). Using
we obtain after some manipulation
with
Let us first consider AT. As in Breitung and Hassler (2002, Lem. A.1), $T^{-1/2}V_2'V_2^{*}$, $T^{-1/2}V_2^{*\prime}v_1$, and $T^{-1/2}V_2'v_1^{*}$ have limiting normal distributions. Because
does not converge to zero, AT cannot converge to zero. Note that the asymptotic distribution of AT does not follow as long as joint convergence of all terms involved is not established.
With BT the situation is different. All moment matrices of the type v′v/T, V′V/T, and V′v/T have well-defined constant probability limits, which can be shown analogously to Lemma A. Therefore, joint convergence in distribution with
applies, and the asymptotic distribution follows from (9) and (A.2) by the continuous mapping theorem, which completes the proof. █
Proof of Proposition 3. With
it holds that
where $r_T := (V_2'V_2)^{-1}V_2'x$. By Assumption 2 and under vt = εt it holds that
and $T^{-1}V_2^{*\prime}x \to 0$ and $T^{-1}V_2'x^{*} \to 0$. Lemma A hence yields
This completes the proof. █
Proof of Proposition 4. Now let $v_1' = (v_{1,p+2},\ldots,v_{1,T})$ and
Using this matrix notation, equation (16) may be written for general p:
where
Furthermore, define $M_w = I - W(W'W)^{-1}W'$, and the residual vector
from (17). With
the OLS estimator
from (18) is written as
With wt′ being a row of W and
this yields
where
is the OLS estimator of (17). We hence obtain
Let
be the σ-algebra generated by {ηt−1*,wt,wt−1,…}. Using the martingale difference property
it follows for $\xi_{T,t} = T^{-1/2}\eta_t\eta_{t-1}^{*}$ that
The existence of fourth moments and the Lyapunov condition,
imply the Lindeberg condition:
Similar results hold for the term $\xi_{T,t}^{*} = T^{-1/2}\eta_t w_t$. Hence,
where
This implies
and
where
is a consistent estimator of σ. We may choose
, or just as well
, because all estimators in (18) converge to zero under H0. █