Published online by Cambridge University Press: 31 March 2005
This paper proposes a test of the rank of a submatrix of β, where β is a cointegrating matrix. In addition, the corresponding submatrix of β⊥, an orthogonal complement to β, is investigated. We construct the test statistic from the eigenvalues of a quadratic form of the submatrix. We show that the test statistic has a limiting chi-square distribution when data are nontrending, whereas for trending data we must resort to a conservative test or to another testing procedure that requires a pretest of the structure of the matrix. Finite sample simulations show that, although the simulation settings are limited, the proposed test works well for nontrending data, whereas the test must be used with care for trending data because it may become too conservative in some cases.

I owe special thanks to two anonymous referees, the co-editor, Pierre Perron, and Taku Yamamoto. All errors are my responsibility. This research was supported by the Ministry of Education, Culture, Sports, Science and Technology under grants-in-aid 13730023 and 14203003.
A vector autoregressive (VAR) process has often been used to model a multivariate economic time series and, following the seminal work of Engle and Granger (1987), a cointegrating relation has been incorporated into the VAR model. A typical n-dimensional VAR model of order m is

xt = A1 xt−1 + ··· + Am xt−m + d + εt,  (1)

for t = 1,…,T, where {εt} is independently and identically distributed (i.i.d.) with mean zero and a positive definite covariance matrix Σ, and det(In − A1 z − ··· − Am z^m) has all roots outside the unit circle or equal to 1. The model (1) can be written in the error correction (EC) format

Δxt = αβ′xt−1 + Γ1Δxt−1 + ··· + Γm−1Δxt−m+1 + d + εt,  (2)
where α and β are n × r matrices with rank r, Δ = 1 − L, and L denotes the lag operator. We assume 0 < r < n, so that there are r cointegrating relations. The exact condition for the existence of cointegration is given by Johansen (1991, 1992). We also assume that the cointegrating rank r is known or has been estimated by some testing procedure, such as the likelihood ratio (LR) test proposed by Johansen (1988, 1991) or the Lagrange multiplier (LM) test of Lütkepohl and Saikkonen (2000) and Saikkonen and Lütkepohl (2000). Other testing procedures for the cointegrating rank are reviewed by Hubrich, Lütkepohl, and Saikkonen (2001) and the papers cited therein.
In this paper, we investigate tests of the rank of β1, a submatrix of β, and of the rank of β⊥,1, a submatrix of β⊥, where β = [β1′,β2′]′ and β⊥ = [β⊥,1′,β⊥,2′]′, with β⊥ being an orthogonal complement to β. In practical analysis, we sometimes encounter cases where we need to know the rank of β1 and/or β⊥,1. For example, the cointegrating matrix is sometimes normalized as β* = β(a′β)−1, as proposed by Johansen (1988, 1991) and Paruolo (1997), where a is an n × r matrix with full column rank and the prototype normalization is represented by a = [Ir,0]′. However, there is no guarantee that a′β is of full rank. In such a situation, we would like to know whether the first r rows of β have full rank. The second example is the Granger noncausality test. As shown in Toda and Phillips (1993), when there is a cointegrating relationship, in general the Wald statistic of the Granger noncausality test from the last n3 variables of xt to the first n1 variables has a nonstandard limiting distribution, depending on nuisance parameters. However, if either the last n3 rows of β or the first n1 rows of α have full row rank, the Wald statistic is asymptotically χ² distributed. Thus, the testing procedure in this paper is useful for checking the rank of the submatrix of β, whereas existing testing procedures may be available for the test of the rank of the submatrix of α. The third example is the test of long-run Granger noncausality proposed by Yamamoto and Kurozumi (2001, 2003). In the usual sense, Granger causality is concerned with the one-period-ahead forecast. This concept can be extended to predictability at the h-period-ahead horizon, and long-run Granger causality is defined when the forecast horizon h goes to infinity. See, for example, Bruneau and Jondeau (1999) and Dufour and Renault (1998).
Yamamoto and Kurozumi (2003) proposed the test for long-run block noncausality, in which it is shown that the ranks of the submatrices of β and β⊥ play an important role in constructing the test statistic. See Yamamoto and Kurozumi (2003) for more details.
Tests of the rank of a matrix have been investigated in the literature, and recent econometric developments can be seen in works by Camba-Mendez, Kapetanios, Smith, and Weale (2003), Cragg and Donald (1996, 1997), and Robin and Smith (2000), among others. Although these papers proposed tests of the rank of a matrix, they assumed that the estimator of the matrix is T1/2 consistent and has a limiting normal distribution with a nonstochastic variance matrix. However, the estimator of the cointegrating matrix is T (or T3/2) consistent and has an asymptotic nonstandard distribution. As a result, we cannot apply existing testing procedures to the cointegrating matrix.
The paper is organized as follows. In Section 2, we propose tests of the rank of β1 and β⊥,1 for nontrending data. We will show that the two test statistics proposed have asymptotically a χ2 distribution and a distribution of the maximum eigenvalue of the product of normal random matrices. Section 3 considers the case of trending data. In this case, the test statistics do not necessarily converge to a χ2 distribution and a distribution of the maximum eigenvalue. To overcome this situation, we propose two testing procedures. Section 4 investigates the finite sample properties of the tests. Section 5 concludes the paper.
In regard to notation, we use vec(A) to stack the rows of a matrix A into a column vector, [x] to denote the largest integer ≤ x, and ā = a(a′a)−1 for a full column rank matrix a. The symbols →p, →d, and ⇒ signify convergence in probability, convergence in distribution, and weak convergence of the associated probability measures, respectively. We denote the rank of A by rk(A) and the column space of A by sp(A). We write integrals such as ∫₀¹X(s) dY′(s) simply as ∫X dY′ to achieve notational economy, and all integrals are from 0 to 1 except where otherwise noted.
In this section we consider a test of rank for nontrending data with d = 0. The model considered in this section is

Δxt = αβ′xt−1 + Γ1Δxt−1 + ··· + Γm−1Δxt−m+1 + εt.  (3)
We estimate the model (3) by the maximum likelihood (ML) method assuming that {εt} is Gaussian, although the asymptotic properties are preserved under more general assumptions. We denote ML estimators with a caret; for example, the ML estimator of β is denoted by β̂. Using the result that

T^{−1/2} Σ_{t=1}^{[Tr]} εt ⇒ W(r)  for 0 ≤ r ≤ 1

by the functional central limit theorem, where W(·) is an n-dimensional Brownian motion with a variance matrix Σ, Johansen (1988, 1995) showed that, with β̂ suitably normalized,

T(β̂ − β) ⇒ β̄⊥ (∫G0G0′ ds)^{−1} ∫G0 dV′,  (4)

where G0(s) = β⊥′CW(s) and V(s) = (α′Σ−1α)−1α′Σ−1W(s), with C = β⊥(α⊥′Γβ⊥)−1α⊥′ and Γ = In − Γ1 − ··· − Γm−1, and G0(·) and V(·) are independent, so that the limiting distribution is mixed Gaussian. He also showed that α̂, Σ̂, and Γ̂i are consistent estimators of α, Σ, and Γi, respectively.
Let us partition β as β′ = [β1′,β2′], where β1 and β2 are n1 × r and (n − n1) × r matrices, respectively (0 < n1 < n). Similarly, we partition β⊥′ = [β⊥,1′,β⊥,2′] conformably. Note that β1′β⊥,1 does not necessarily equal zero, whereas β′β⊥ = β1′β⊥,1 + β2′β⊥,2 must be zero. Our interest lies in finding the rank of β1, and thus we consider the following testing problem:

H0: rk(β1) = f  vs.  H1: rk(β1) > f.  (5)
Note that the rank of β1 is at most p ≡ min(n1,r).
To test the rank of β1, we follow the same strategy as Robin and Smith (2000), who test the rank of a matrix and investigate its quadratic form. In our situation, we construct a quadratic form of β1. The advantage of considering a quadratic form is that the eigenvalues are nonnegative real values, even if those of β1 are complex values. Then, the null hypothesis H0 becomes equivalent to the existence of f positive real and n1 − f zero eigenvalues.
Let Ψ and Φ be r × r and n1 × n1 possibly stochastic matrices that are symmetric and positive definite almost surely (a.s.). Because they are full rank matrices (a.s.), the rank of β1 is equal to the rank of Φ−1β1Ψβ1′ (a.s.). Therefore, the test of the rank of β1 is equivalent to that of Φ−1β1Ψβ1′, and we consider the rank of the latter matrix. Note that, although this strategy is basically the same as that of Robin and Smith (2000), we cannot directly use their result because they assume that the estimated matrix is asymptotically normally distributed with a convergence rate T^{1/2}, whereas β̂ is T-consistent with a mixed Gaussian limiting distribution.
For the test of the rank of β1, we define Ψ = α′Σ−1α and Φ as in (6). These Ψ and Φ are chosen so that the limiting distribution of the test statistic does not depend on nuisance parameters. Other choices of Φ may be possible because, as shown in the Appendix, the test statistic asymptotically does not depend on β1(β′β)−1β1′, which appears when (6) is expanded. For example, we can use a constant multiple of (β′β)−1 in the second term of (6). However, as indicated in the Appendix, Φ has to be invariant to the normalization of β. We use the definition (6) simply because it seems the simplest among the possible choices.
Let λ1 ≥ λ2 ≥ ··· ≥ λn1 be the ordered eigenvalues of Φ−1β1Ψβ1′, which are the solutions of the determinant equation

|λΦ − β1Ψβ1′| = 0.  (7)

Then, under H0, λ1 ≥ ··· ≥ λf > 0 and λf+1 = ··· = λn1 = 0 (a.s.).
We construct a sample analogue of (7) using the ML estimators and investigate the limiting distributions of the eigenvalues. The sample analogue of (7) is given by

|λΦ̂ − β̂1Ψ̂β̂1′| = 0,  (8)

where β̂1 is the first n1 rows of β̂, Ψ̂ = α̂′Σ̂−1α̂, and Φ̂ is constructed as in (9) and (10), with R1t being the regression residual of xt−1 on Δxt−1,…,Δxt−m+1. We denote the ordered eigenvalues of (8) as λ̂1 ≥ ··· ≥ λ̂n1. Note that when n1 > r, the smallest n1 − r eigenvalues are obviously equal to 0, that is, λ̂r+1 = ··· = λ̂n1 = 0. We can easily see from the expressions (6) and (9) that Φ and Φ̂ are positive definite (a.s.), whereas the expression (10) is simpler and may be used to construct Φ̂ in practice.
To test the rank of β1, we consider the trace-type test statistic formed from the scaled sum of the smallest estimated eigenvalues,

T² Σ_{i=f+1}^{n1} λ̂i = T² Σ_{i=f+1}^{p} λ̂i,

which rejects the null hypothesis when it takes large values. The second equality is established because p = min(n1,r) and λ̂r+1 = ··· = λ̂n1 = 0 when n1 > r.
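As an illustration, the statistic above can be computed by solving a generalized symmetric eigenproblem numerically. This is a minimal sketch, not the paper's code: the estimates `beta1_hat`, `Psi_hat`, and `Phi_hat` are assumed to be given (their construction follows (6), (9), and (10)), and the function name is ours.

```python
import numpy as np
from scipy.linalg import eigh

def rank_test_statistic(beta1_hat, Psi_hat, Phi_hat, T, f):
    """Trace-type statistic for H0: rk(beta1) = f.

    Solves the sample analogue |lambda*Phi_hat - beta1_hat Psi_hat beta1_hat'| = 0
    as a generalized symmetric eigenproblem and sums the n1 - f smallest
    eigenvalues, scaled by T**2 (the eigenvalues are O(T^-2) under H0).
    """
    A = beta1_hat @ Psi_hat @ beta1_hat.T          # n1 x n1, positive semidefinite
    lam = np.sort(eigh(A, Phi_hat, eigvals_only=True))[::-1]  # descending order
    return T**2 * lam[f:].sum()

# toy check: beta1 of rank 1, so the statistic for f = 1 is numerically zero
beta1 = np.array([[1.0, 0.0], [2.0, 0.0]])
stat = rank_test_statistic(beta1, np.eye(2), np.eye(2), T=100, f=1)
```

In practice Φ̂ would be a stochastic weighting matrix rather than the identity used in this toy check; the generalized-eigenvalue formulation handles any positive definite Φ̂.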
We can also consider the null hypothesis of rk(β1) = f against the alternative of rk(β1) = f + 1. In this case, the test statistic is the maximum-eigenvalue analogue T²λ̂f+1. To describe its limiting distribution, we define λ*max,j,k as the maximum eigenvalue of X*X*′, that is, the largest root λ of

|λI − X*X*′| = 0,  (11)

where X*′ is a j × k matrix with vec(X*′) ∼ N(0,Ijk). The critical points of this distribution are given in Table 1 for the case where j ≥ k. They are calculated by simulations with 1,000,000 replications. Because the nonzero eigenvalues of X*X*′ are the same as those of X*′X*, we can refer to the percentage points of λ*max,k,j when j < k.
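Percentage points of this kind can be reproduced (approximately) by direct simulation. The sketch below uses far fewer replications than the paper's 1,000,000, and the function name is ours; it exploits the fact that the largest eigenvalue of X*X*′ is the squared largest singular value of X*′.

```python
import numpy as np

def lambda_max_quantiles(j, k, reps=200_000, qs=(0.90, 0.95, 0.99), seed=0):
    """Simulate quantiles of lambda*_{max,j,k}: the largest eigenvalue of
    X*X*' where X*' is j x k with vec(X*') ~ N(0, I_{jk}).

    The largest eigenvalue equals the largest singular value of X*' squared,
    which avoids forming X*X*' explicitly.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((reps, j, k))              # each slice plays X*'
    smax = np.linalg.svd(X, compute_uv=False)[:, 0]    # largest singular values
    return np.quantile(smax**2, qs)

# sanity check: for j = k = 1 the distribution is chi-square with 1 d.f.,
# whose 95% point is about 3.84
q = lambda_max_quantiles(1, 1, qs=(0.95,))
```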
THEOREM 1. Let Φ̂ be given by (10). If f < p, under H0, the trace-type statistic converges in distribution to χ²(n1−f)(r−f), and the maximum-eigenvalue statistic converges in distribution to λ*max,n1−f,r−f.
Remark 1. Because the determinant equation (8) converges to (7) in distribution, the estimated ordered eigenvalues of (8) also converge in distribution to those of (7). Then, under the alternative, λ̂f+1 converges to a positive value (a.s.), so that the test statistic goes to infinity. Therefore, the tests are consistent.
Remark 2. Although the test statistics are constructed using the estimator of β⊥,1, we do not have to assume that it is of full rank. We can see that the rank of β⊥,1 is at least n1 − f under H0, noting that the column space of β⊥,1 must contain n1 − f bases that are orthogonal to sp(β1) because [β1,β⊥,1] has full row rank n1. Because β1′ β⊥,1 is not necessarily equal to zero, it is possible for sp(β⊥,1) to contain some of the bases that span sp(β1), so that the rank of β⊥,1 may be greater than n1 − f. It is shown in the Appendix that the limiting distributions of the test statistics depend not on the rank of β⊥,1 but on the number of the bases orthogonal to sp(β1), n1 − f, unless f = n1. When f = n1, all the eigenvalues are asymptotically greater than zero (a.s.), and then the test statistics will diverge. This case is excluded from the theorem (f is assumed to be less than p = min(n1,r)). In other words, our tests cannot be applied for the null hypothesis of full rank. If we need to check whether β1 is of full rank or not, we may test for the null of f = n1 − 1, and if we rejected the null hypothesis, we would conclude that it is a full row rank matrix.
Remark 3. Because the hypothesis about the rank of β1 can be regarded as a restriction on the cointegrating matrix β, we may consider using the LR test as proposed by, for example, Johansen (1991, 1995) and Johansen and Juselius (1990, 1992). In fact, when f = 0 the null hypothesis is equivalent to β1 = 0, and this hypothesis can be expressed as a linear restriction on β such as β = Hφ, where H = [0,In−n1]′ and φ is an (n − n1) × r unknown parameter. Then, the LR test is applicable to the test of f = 0. However, for 0 < f < p, the null hypothesis is expressed as β1 = β11 β12′ where β11 and β12 are n1 × f matrices with full column rank f. Then, we have to estimate the model with this restriction. Although the LR test might be applicable to the nonlinear hypothesis, it seems tedious to estimate the model with this nonlinear restriction, whereas our test uses only the ML estimator without the restriction. It is beyond our scope to investigate the applicability of the LR test to our case, and we do not discuss this in detail.
We may represent the null hypothesis as proposed by Boswijk (1996) and apply the LR test. According to his paper, the null hypothesis of rk(β1) = f is expressed as β = (Hoφ,ψ) where Ho = [0,In2]′ and (φ,ψ) ∈ Rn2×(r−f) × Rn×f. As pointed out by Boswijk (1996, p. 156), the LR test for this hypothesis has an asymptotic χ2 distribution only when “no linear combination of ψ lies in the column space of” Ho. Because there is no guarantee of this condition, we do not consider his method in this paper.
Next, we consider a test of the rank of the submatrix of β⊥. The testing problem is

H0⊥: rk(β⊥,1) = g  vs.  H1⊥: rk(β⊥,1) > g.

For the same reason as in the test of β1, we investigate the rank of Φ⊥−1β⊥,1Ψ⊥β⊥,1′, where Ψ⊥ and Φ⊥ are (n − r) × (n − r) and n1 × n1 full rank matrices (a.s.). Similar to (7), we consider the following determinant equation:

|λΦ⊥ − β⊥,1Ψ⊥β⊥,1′| = 0,  (12)

where Ψ⊥ and Φ⊥ are defined analogously to Ψ and Φ, and the sample analogue of (12) is given by

|λΦ̂⊥ − β̂⊥,1Ψ̂⊥β̂⊥,1′| = 0,  (13)

where Ψ̂⊥ and Φ̂⊥ are given by (14). Let λ⊥,1 ≥ ··· ≥ λ⊥,n1 and λ̂⊥,1 ≥ ··· ≥ λ̂⊥,n1 be the ordered eigenvalues of (12) and (13), respectively, and we construct the trace-type and maximum-eigenvalue test statistics in the same way as before, with q = min(n1,n − r).
THEOREM 2. Let Φ̂⊥ be given by (14). If g < q, under H0⊥, the trace-type statistic converges in distribution to χ²(n1−g)(n−r−g), and the maximum-eigenvalue statistic converges in distribution to λ*max,n1−g,n−r−g.
Note that the consistency of the tests is shown in the same way as in Remark 1. We also note that we cannot test the null of rk(β⊥,1) = q, for a reason similar to that given in Remark 2.
Given the preceding two theorems, we can test the rank of β1 and of β⊥,1. In addition, we may use a procedure to decide the rank of the submatrix, just as the cointegrating rank is selected sequentially using the test of the cointegrating rank. For example, to decide the rank of β1, we first test the null of f = 0. If the null hypothesis is accepted, the rank of β1 is decided to be zero. Otherwise, we next test the hypothesis of f = 1. We continue testing sequentially until the null hypothesis is accepted. When the null of f = p − 1 is rejected, we conclude that β1 has full rank. The rank of β⊥,1 can be decided by the same procedure.
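The sequential procedure just described can be sketched as follows; `test_stat` and `crit_val` stand for any of the statistics and critical values above (the names are ours).

```python
def select_rank(test_stat, crit_val, p):
    """Sequentially test H0: rank = f for f = 0, 1, ..., p - 1.

    test_stat(f) -> statistic for the null of rank f
    crit_val(f)  -> critical value for that null

    Returns the first f whose null is accepted; if every null up to
    f = p - 1 is rejected, the matrix is judged to have full rank p.
    """
    for f in range(p):
        if test_stat(f) <= crit_val(f):
            return f
    return p

# toy run: the null f = 0 is rejected (10 > 5), the null f = 1 accepted (2 <= 5)
rank = select_rank(lambda f: [10.0, 2.0][f], lambda f: 5.0, p=2)
```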
In the previous section, we considered the model with d = 0 for nontrending data. However, in practice, we sometimes consider the model (2) with d ≠ 0 but with the level of the data having no linear trend. In this case, the constant term can be expressed as d = αρ0, where ρ0 is an r × 1 coefficient vector, so that the model (2) becomes

Δxt = αβ+′x+t−1 + Γ1Δxt−1 + ··· + Γm−1Δxt−m+1 + εt,  (15)

where β+ = [β′,ρ0]′ and x+t−1 = [xt−1′,1]′. The ML estimator of β+ can be obtained by the reduced rank regression of Δxt on x+t−1 corrected for Δxt−1,…,Δxt−m+1, and the estimator of the cointegrating matrix is the first n rows of β̂+.
To test the rank of the submatrix of β for the model (15), we use Φ̂ defined by (16), which involves two auxiliary matrices of dimensions (n − r + 1) × (n − r) and (n + 1) × (n − r + 1), with R1t+ being the regression residual of x+t−1 on Δxt−1,…,Δxt−m+1.
THEOREM 3. Consider the model (15) and let Φ̂ be given by (16). If f < p, under H0, the same limiting results as in Theorem 1 hold.

THEOREM 4. Consider the model (15) and let Φ̂⊥ be given by (14). If g < q, under H0⊥, the same limiting results as in Theorem 2 hold.
In practical analysis, we will obtain β̂ by the reduced rank regression, and we have to calculate β̂⊥ from β̂. If d = 0, β̂⊥ can be easily obtained as explained in Johansen (1995, p. 95). When d = αρ0, one method to calculate β̂⊥ is as follows. First we calculate the orthogonal projection matrix M = In − β̂(β̂′β̂)−1β̂′. Then, by the singular value decomposition, M is expressed as Ml Mλ Mr′, where Ml and Mr are n × (n − r) orthogonal matrices and Mλ is an (n − r) × (n − r) diagonal matrix with positive diagonal elements. Because sp(M) = sp(Ml) and these spaces are orthogonal to sp(β̂), we can use Ml as β̂⊥.
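The projection-plus-SVD construction of β̂⊥ can be sketched generically as follows (the function name is ours):

```python
import numpy as np

def orth_complement(beta_hat):
    """Orthogonal complement of sp(beta_hat) via the method in the text:
    M = I_n - beta(beta'beta)^{-1}beta' projects onto the orthogonal
    complement; its SVD M = Ml Mlam Mr' has exactly n - r positive
    singular values, and the corresponding left singular vectors span
    the orthogonal complement of sp(beta_hat).
    """
    n, r = beta_hat.shape
    M = np.eye(n) - beta_hat @ np.linalg.solve(beta_hat.T @ beta_hat, beta_hat.T)
    U, s, Vt = np.linalg.svd(M)
    return U[:, : n - r]          # n x (n - r), orthonormal columns

beta_hat = np.array([[1.0], [1.0], [0.0]])   # n = 3, r = 1
B_perp = orth_complement(beta_hat)           # 3 x 2, orthogonal to beta_hat
```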
When data are trending, xt can be expressed as the sum of the stochastic trend, the deterministic trend, and the I(0) component such that

xt = C Σ_{i=1}^{t} εi + τt + C1(L)(d + εt) + x0*,  (17)

where C = β⊥(α⊥′Γβ⊥)−1α⊥′ as defined in Section 2.1, τ = Cd, C1(L) = (C(L) − C(1))/(1 − L) with C(L) being the lag polynomial when Δxt is represented as the vector moving-average process Δxt = C(L)(d + εt), and x0* is a stochastic component such that β′x0* = 0. See Johansen (1991, 1995) for more details. In this case, β⊥ is decomposed into τ, the coefficient of the linear trend in (17), and γ, an n × (n − r − 1) matrix that is orthogonal to τ. We partition γ and τ into [γ1′,γ2′]′ and [τ1′,τ2′]′ in the same way as β. As shown in Chapter 13.2 of Johansen (1995), β̂ can be expressed as in (18), in terms of random quantities U1 and U2, where G(r) = [G1′(r),G2′(r)]′ with G1(r) = G0(r) − ∫G0 ds, G0(r) = γ′CW(r), and G2(r) = r − ½. We denote Ω = ∫GG′ ds and partition it into 2 × 2 blocks conformably with [U1′,U2′]′. We express the (i,j) block element of (∫GG′ ds)−1 as Ωij for i,j = 1 and 2. In this section, we need the estimator of Ω11, denoted Ω̂11; in its construction, S11 is defined in the same way as in the previous section, with R1t being the regression residual of xt−1 on a constant and Δxt−1,…,Δxt−m+1. Convergence of Ω̂11 is proved in Lemma 2(iii) in the Appendix, whereas the consistency of the other ML estimators, such as α̂ and Σ̂, is shown by Johansen (1991, 1995).
In the following discussion, we will show that the limiting distribution of the test statistic depends on whether the rank of [β1,γ1] is n1 − 1 or n1, or equivalently, whether τ2 = 0 or not. We will propose two testing procedures to cope with this problem.

Let us consider the testing problem (5). Under the null hypothesis, we can find f linearly independent column vectors in β1, and we define β1* as an n1 × f matrix whose columns consist of those f vectors. We also define an n1 × (n1 − f) matrix δ* as an orthogonal complement to β1*, so that δ*′β1* = 0. We show that the direction of δ* is important in deciding the convergence rate of δ*′β̂1 and that it also affects the limiting property of the test statistic.
Let us consider the case where r < n − 1. Because β̂1 is the first n1 rows of β̂, it is expressed from (18) as (19). Suppose that an n1 × 1 vector τ1* exists that is orthogonal to γ1 (τ1*′γ1 = 0) and belongs to the column space of δ*. Here, note that, because the n × n matrix [β,γ,τ] is of full rank, the first n1 rows of this matrix, [β1,γ1,τ1], must be of full row rank, which implies that a′[β1,γ1,τ1] ≠ 0 for any nonzero vector a. Then, because τ1* is orthogonal to both β1 and γ1 by assumption, we have τ1*′[β1,γ1,τ1] = [0,0,τ1*′τ1] ≠ 0, so that τ1*′τ1 ≠ 0. This implies the convergence rate in (20), whereas a different rate applies for an n1 × (n − r − 1) matrix δ0* whose columns span the orthogonal complement to τ1* in sp(δ*). On the other hand, if there exists no vector in sp(δ*) that is orthogonal to γ1, we have (21). Therefore, the convergence rate of δ*′β̂1 depends on whether a vector τ1* orthogonal to γ1 exists in sp(δ*).

The existence of τ1* indicates that the column space of [β1,γ1] does not include τ1* because τ1*′β1 = 0 and τ1*′γ1 = 0. We also note that the rank of [β1,γ1] must be n1 − 1 or n1 because [β1,γ1,τ1] has full rank n1. Then, from another point of view, we can say that the rank of [β1,γ1] is n1 − 1 if a vector τ1* exists, whereas the nonexistence of τ1* is equivalent to rk([β1,γ1]) = n1. Thus, we have to consider the asymptotic properties separately in the two cases where the rank of [β1,γ1] is n1 and n1 − 1 when r < n − 1.
For further investigation, let us consider the case where the rank of [β1,γ1] equals n1 − 1. In this case, this matrix can be expressed as [Θ11,0] by some nonsingular transformation from the right-hand side, where Θ11 is an n1 × (n1 − 1) matrix with rank n1 − 1. Then, using the same nonsingular transformation, [β,γ] takes the form displayed in (22). Let τ1* be the orthogonal complement to the column space of Θ11. Then, because τ1*′Θ11 = 0, and using the expression (22), we can see that the n × 1 vector [τ1*′,0]′ is orthogonal to [β,γ]. Therefore, in this case, the trend parameter τ, which is orthogonal to β and γ, is a constant multiple of [τ1*′,0]′. In other words, when rk([β1,γ1]) = n1 − 1, τ2 must be equal to zero. Note that, because this τ1* is orthogonal to sp(β1) and sp(γ1), it is essentially the same as the τ1* defined earlier.

On the other hand, when τ2 = 0, τ is expressed as [τ1′,0]′, and then τ1′[β1,γ1] equals zero because τ′[β,γ] = 0. This implies that the n1 × (n − 1) matrix [β1,γ1] does not have full row rank. Then, we have the following proposition.
PROPOSITION 1. The rank of [β1,γ1] is n1 − 1 if and only if τ2 = 0.
When r = n − 1, there is no γ, and in this case, rk(β1) must be n1 − 1 or n1. Then, under the null hypothesis of rk(β1) = n1 − 1, δ* becomes an n1 × 1 vector, and we have the rate given in (23). In this case, the test statistics should be multiplied by T, that is, T Σ_{i=f+1}^{p} λ̂i and Tλ̂f+1 are the appropriate test statistics.
In the following theorem, the test statistics are constructed from the eigenvalues of (8) using the same Ψ̂ as in the previous section and with Φ̂ given by either (24) or (25).
THEOREM 5. When r < n − 1:

(i.a) Let Φ̂ be given by (24). If rk([β1,γ1]) = n1 and f < p, under H0, the statistics converge in distribution to χ²(n1−f)(r−f) and λ*max,n1−f,r−f, respectively.

(i.b) Let Φ̂ be given by (25). If rk([β1,γ1]) = n1 and f < p, under H0, the statistics converge in distribution to random variables that are bounded above by χ²(n1−f)(r−f) and λ*max,n1−f,r−f, respectively.

(ii) Let Φ̂ be given by (25). If rk([β1,γ1]) = n1 − 1 and f < p, under H0, the statistics converge in distribution to χ²(n1−f−1)(r−f) and λ*max,n1−f−1,r−f, respectively.

When r = n − 1:

(iii) Let Φ̂ be given by (25). Under the null hypothesis of f = n1 − 1, the statistics, multiplied by T as explained above, converge in distribution to the corresponding limiting random variables.
Remark 4. In the case of (i.b), the statistic converges in distribution to χ²(n1−f)(r−f) if and only if δ*′τ1 = 0, which is equivalent to the case where τ1 ∈ sp(β1*) = sp(β1). See the proof in the Appendix. In general, the test using (25) is conservative if rk([β1,γ1]) = n1.
From Theorem 5, if we knew the rank of [β1,γ1] when r < n − 1, we could construct a test statistic that converges to a χ² distribution by appropriately using (24) or (25). However, such information is not available in practice. Notice that if rk([β1,γ1]) = n1 − 1, Φ̂ given by (24) may violate the condition that it is a full rank matrix, and in that case, the test statistic converges not to the χ² distribution given by Theorem 5(ii) but to a random variable that depends on a nuisance parameter. Hence the test using (24) is not desirable in practice. On the other hand, if we use Φ̂ given by (25), we can test the hypothesis by referring to a χ² distribution irrespective of the rank of [β1,γ1], although the test may be conservative and the degrees of freedom may change depending on the rank of [β1,γ1]. Then, noting that the critical value of χ²(n1−f)(r−f) in Theorem 5(i) is greater than that of χ²(n1−f−1)(r−f) in (ii), we propose to test the null of rk(β1) = f as follows.

1. We construct the test statistic using (25).

2. If the statistic is greater than the critical value of χ²(n1−f)(r−f), we reject the null hypothesis.

3. If the statistic is less than the critical value of χ²(n1−f−1)(r−f), we accept the null hypothesis.

The maximum-eigenvalue statistic is used in the same manner. In this procedure, we may encounter the case where the test statistic is greater than the critical value of χ²(n1−f−1)(r−f) but less than that of χ²(n1−f)(r−f), where c(n1−f−1)(r−f) and c(n1−f)(r−f) denote the corresponding critical values. To cope with such a case, the following corollary is useful.
COROLLARY 1. Let Φ̂ be given by (25). Suppose that r < n − 1 and the rank of β1 is f (< p).

(i) If rk([β1,γ1]) = n1, the statistic converges in distribution to a random variable that is bounded above by λ*min,r−f,n1−f, the smallest nonzero eigenvalue of (11) with j = r − f and k = n1 − f.

(ii) If rk([β1,γ1]) = n1 − 1, the statistic converges in probability to zero.

The percentage points of λ*min,r−f,n1−f are tabulated in Table 1.
Using the preceding corollary, we can cope with the situation where the test statistic falls between the two critical values. If the statistic is less than some percentage (10, 5, or 1%) point of λ*min,r−f,n1−f, we reject the hypothesis of rk([β1,γ1]) = n1. In that case, c(n1−f−1)(r−f) is the appropriate critical value, so that the null of rk(β1) = f is rejected. On the other hand, if the statistic is greater than the critical point of λ*min,r−f,n1−f, we accept the hypothesis of rk([β1,γ1]) = n1, so that the rank of β1 is decided to be f. We call this testing procedure TEST1.
The other strategy is to use the result of Proposition 1. From Johansen (1995), the estimator of τ, suitably scaled, converges in distribution to a normal random vector with mean zero and variance matrix given by CΣC′. Although a Wald-type test may not be applicable to the test of τ2 = 0 because the variance matrix might be degenerate, we can test whether each element of τ2 is zero or not by the t-test statistic. We call the following testing procedure TEST2.
1. We test each element of τ2.
2. If some of the elements of τ2 are significant, we use Theorem 5(i.a).
3. If none of the elements of τ2 are significant, we use Theorem 5(ii).
Next, we investigate a test of the rank of β⊥,1. When data are trending, β⊥,1 can be decomposed into [γ1,τ1], where γ1 and τ1 are the first n1 rows of γ and τ, respectively. Then, testing the rank of β⊥,1 is equivalent to testing the rank of [γ1,τ1], and therefore we construct a test statistic from [γ̂1,τ̂1]. Note that [γ̂1,τ̂1] is the first n1 rows of [γ̂,τ̂] and is not necessarily numerically equal to β̂⊥,1, although they span the same column space.

Let us consider the same determinant equation as (13) with Ψ̂⊥ and Φ̂⊥ replaced by (26) and (27). We construct the test statistics in the same way as in the previous section. As in Theorem 5, we have to distinguish the two cases r < n − 1 and r = n − 1. When r = n − 1, the rank of β⊥,1 (= τ1) must be 0 or 1, and in this case, we consider the null hypothesis of g = 0.
THEOREM 6. Let Ψ̂⊥ and Φ̂⊥ be given by (26) and (27). When r < n − 1 and g < q, under H0⊥, the statistics converge in distribution to random variables that are bounded above by χ²(n1−g)(n−g−r) and λ*max,n1−g,n−g−r, respectively. When r = n − 1, under the null hypothesis of g = 0, the statistics, scaled by T, converge in distribution to the corresponding limiting random variables.
In this section, we investigate the finite sample properties of the tests proposed in the previous sections. We consider a four-dimensional EC model as the data generating process (DGP), where {εt} ∼ i.i.d. N(0,I4), under several settings of the parameters. Here DGP1(1o), 2(2o), and 3(3o) correspond to the cases where the cointegrating rank is 1, 2, and 3, respectively. We set the (2,1) element of β as c1, which takes the values 0, 0.005, 0.01, 0.025, 0.05, 0.075, and 0.1, and we consider the test of the rank of the first two rows of β. The case of c1 = 0 corresponds to the null hypothesis, under which the rank of β1 is 0, 1, and 1 for DGP1, 2, and 3, whereas it is 1, 2, and 2 when c1 ≠ 0, which corresponds to the alternative. For the case of nontrending data, we set d = 0 for the zero-mean process, whereas d is defined as αρ0 for the case of d ≠ 0, where ρ0 is set to 1, [1,1]′, and [1,1,1]′ for DGP1(1o), 2(2o), and 3(3o), respectively. On the other hand, for the case of trending data, d is set to d1 or d2; the former corresponds to the case where [β1,γ1] is of full rank (τ2 ≠ 0), whereas the rank of [β1,γ1] is n1 − 1 (τ2 = 0) when d = d2.
Similarly, we set the (2,1) element of β⊥ as c2 and consider the test of the rank of the first two rows of β⊥. In this case, c2 = 0 implies that the rank of β⊥,1 is 1, 1, and 0 for DGP1o, 2o, and 3o, respectively, whereas it is 2, 2, and 1 under the alternative of c2 ≠ 0.
We set x0 = 0 and discard the first 100 observations in all experiments. The number of replications is 5,000, and the level of significance is set to 0.05. We report only the results for the trace-type test statistics because the performance of the maximum-eigenvalue statistics is almost the same.
Table 2 shows the simulation results of the test of rk(β1). When the cointegrating rank is 1, the empirical size is greater than the nominal size of 0.05 for T = 100 when data are nontrending (d = 0 or d = αρ0), whereas it becomes closer to 0.05 for T = 200. When data are trending, τ takes different values for d = d1 and d = d2. Similar to the case of nontrending data, the testing procedure TEST2 tends to over-reject the null of c1 = 0 for T = 100, whereas the testing procedure TEST1 is slightly conservative. Under the alternative of c1 ≠ 0, the power increases rapidly around c1 = 0.025 for nontrending data and for trending data with TEST2, whereas TEST1 is less powerful. This is because TEST1 is a conservative test. When data are trending, both TEST1 and TEST2 are more powerful for the model with rk([β1,γ1]) = n1 (d = d1) than for the model with rk([β1,γ1]) = n1 − 1 (d = d2).
When the cointegrating rank is 2, the relative performance is preserved for the cases of d = 0 and d = αρ0. For trending data, τ again takes different values for d = d1 and d = d2. Note that the maximum-eigenvalue statistic is numerically equal to the trace-type statistic here because the determinant equation (11) with j = k = 1 has only one eigenvalue. Then, we can see that the statistics converge in distribution to χ²1 under H0 when rk([β1,γ1]) = n1 = 2, whereas they converge in probability to zero when rk([β1,γ1]) = n1 − 1 = 1. Accordingly, the testing procedure TEST1 accepts the null hypothesis when the statistic is less than the critical point of χ²1. On the other hand, the asymptotic size of TEST1 becomes 0 when rk([β1,γ1]) = n1 − 1 (d = d2) because the statistic converges in probability to zero in that case. Reflecting this fact, TEST1 is too conservative for d = d2, and it is not powerful when the alternative is close to the null. TEST2 also seems to have no power when rk([β1,γ1]) = n1 − 1 = 1 (d = d2). This is because τ2 is very close to zero; for example, even when c1 = 0.1, the third and fourth elements of τ are very small (the latter is 3/430).
When the cointegrating rank is 3, we can see that the first two variables of xt are cointegrated, whereas the last two variables are stationary. Note that we cannot generate the process such that the rank of β1 is 1 while all the variables are nonstationary. Because we want to investigate the property of the test under the null hypothesis, we allowed several variables to be stationary.
In this case, the power properties are improved in all cases compared with the cases where r = 1 and 2. For trending data, τ takes different values for d = d1 and d = d2. Note that in this case the last two rows of the impact matrix C become zero because the corresponding variables are stationary, so that τ2, the last two rows of Cd, becomes zero irrespective of the value of d. We also note that the result of Theorem 5(iii) applies because r = n − 1 = 3. That is, we do not have to use the conservative test or the pretest as in the cases where r < n − 1. This is why both the size and power properties are improved for trending data compared with the cases where r < 3.
Table 3 reports the results of the test of rk(β⊥,1). From the table, the test tends to overly reject the null hypothesis for several cases when T = 100, whereas the size becomes reasonable when T = 200, except for the case where r = 3 and d = d1. In that case, the test becomes conservative as investigated in Theorem 6. As to the power, we can see that the more complicated the deterministic term becomes, the less powerful is the test.
In this paper, we proposed tests of the rank of a submatrix of the cointegrating matrix and of its orthogonal complement. We can test the hypothesis straightforwardly when data are nontrending, whereas for trending data, we have to examine whether [β1,γ1] is of full rank or we have to use the conservative test. The simulation results show that the test of rk(β1) must be used with care when data are trending and f = n1 − 1, because the test might become too conservative to reject the null hypothesis.
Throughout the Appendix, we reuse the notation H for different matrices when no confusion arises.
Proof of Theorem 1. First, note that we can replace
in (8), where
is the first n1 rows of
, because
. The latter relation is established because
is obtained by the nonsingular transformation of the columns of
does not depend on the normalization of
. We also define
whose columns span the orthogonal complement to
), so that
span the same column space. This implies that
can be obtained by the nonsingular transformation of the columns of
. Then, we can also replace
.
Under the null hypothesis, rk(β1) = f, so there exists an n1 × f matrix β1* with rank f such that sp(β1) = sp(β1*). We denote by δ* the orthogonal complement to β1*; that is, δ* is an n1 × (n1 − f) matrix with rank (n1 − f) such that δ*′β1* = 0.
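The orthogonal complement δ* used here can be computed numerically from a singular value decomposition; the following sketch (with hypothetical numbers for n1 and f) is one standard way to do so:

```python
import numpy as np

def orth_complement(B):
    """Basis of the orthogonal complement of sp(B).

    For an n1 x f matrix B with rank f, returns an n1 x (n1 - f)
    matrix D of full column rank with D.T @ B = 0 -- the role
    played by delta* in the proof.
    """
    n1, f = B.shape
    # left singular vectors beyond the first f span the complement
    U, _, _ = np.linalg.svd(B, full_matrices=True)
    return U[:, f:]

# hypothetical example with n1 = 3, f = 2
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
D = orth_complement(B)
```

Stacking [B, D] then gives a nonsingular n1 × n1 matrix, which is the property exploited repeatedly in the proofs below.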
LEMMA 1.
Proof.
(ii) As shown in Chapter 13.2 of Johansen (1995),
can be expressed as
for nontrending data, where TUT converges in distribution to (∫G0 G0′ ds)−1∫G0 dV′. Because
is the first n1 rows of
, we have
, so that
(iii) holds because
from (A.1).
Now, let us consider the determinant equation (8). Note that (8) is equivalent to
where H = [β1*,Tδ*] is an n × n nonsingular matrix. Using Lemma 1, we have
To investigate the asymptotic behavior of
, we consider
with the same expression as (9). Note that
because
by Lemma 1. Then,
is asymptotically equivalent to
Then, the equation (A.2) is asymptotically equal to
Therefore, the eigenvalues
converge in probability to zero and are of order T−2.
Here, notice that, in the same way as Johansen (1988, p. 246), we can find an r × (r − f) matrix J with rank (r − f) such that
with J′(β1′ β1*) = 0 and J′Ψ−1J = Ir−f , implying that J′(α′Σ−1α)−1J = Ir−f because Ψ = α′Σ−1α. Then, because |β1*′β1Ψβ1′ β1*| ≠ 0, (A.4) becomes
The variance matrix of X0′J conditioned on G0(·) is given by
Noting that sp(β⊥,1) must contain δ* because [β1,β⊥,1] has full row rank n1 and sp(β1) does not contain δ*, we can see that δ*′β⊥,1 has full row rank n1 − f irrespective of the rank of β⊥,1, which is greater than n1 − f, as explained in Remark 2. As a result, the conditional variance matrix of X0′J is nonsingular (a.s.). Then, multiplying both sides of (A.6) by the square root of the left-hand side of (A.7), the determinant equation becomes (11) with j = n1 − f and k = r − f, and then
converges in distribution to the solution of (11). This proves Theorem 1. █
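The matrix J appearing in this proof, satisfying J′(β1′β1*) = 0 and J′Ψ−1J = Ir−f, can be constructed explicitly: take a null-space basis and whiten it with respect to Ψ−1. A numerical sketch with hypothetical values of r, f, and Ψ:

```python
import numpy as np

def whitened_null_basis(M, Psi):
    """Construct J with J.T @ M = 0 and J.T @ inv(Psi) @ J = I.

    M is r x f with rank f and Psi is r x r symmetric positive
    definite, mirroring M = beta1' beta1* and Psi = alpha' Sigma^-1 alpha
    in the proof; J is r x (r - f).
    """
    r, f = M.shape
    # basis N of the orthogonal complement of sp(M)
    U, _, _ = np.linalg.svd(M, full_matrices=True)
    N = U[:, f:]
    # rescale so that J' Psi^{-1} J = I, via the inverse symmetric
    # square root of S = N' Psi^{-1} N
    S = N.T @ np.linalg.inv(Psi) @ N
    w, V = np.linalg.eigh(S)
    S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return N @ S_inv_sqrt

# hypothetical numbers: r = 3, f = 1
M = np.array([[1.0], [2.0], [-1.0]])
Psi = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.5]])
J = whitened_null_basis(M, Psi)
```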
Proof of Theorem 2. The outline of the proof is the same as that of Theorem 1, and thus we omit the details.
Under the null hypothesis, an n1 × g matrix β⊥,1* exists such that sp(β⊥,1*) = sp(β⊥,1) and rk(β⊥,1*) = g, and we denote the orthogonal complement to β⊥,1* by η*. Consider the following determinant equation:
where H = [β⊥,1*,Tη*]. As in the previous proof, we replace the hat (ˆ) estimators with their tilde (˜) counterparts. Because
is the first n1 rows of
, we obtain, using Lemma 1(iii),
Then, similar to the previous proof, we can show that
converges in distribution to a solution of (11) with j = n1 − g and k = n − r − g. This proves Theorem 2. █
Proof of Theorems 3 and 4. Let
. In exactly the same way as in the proof of Lemma 13.2 in Johansen (1995), we can show that
where G0+ = [G0′,1]′. Then, because
is the first n rows of
, we have
whose conditional variance is given by L′(∫G0+G0+′ ds)−1L [otimes ] (α′Σ−1α)−1. Because
as expressed in Johansen (1995, p. 179), we have
We also have
, which is proved in the same way as Lemma 1(iv), where
with
replaced by
. Then, the theorems are proved similarly to Theorems 1 and 2. █
Proof of Theorem 5. For the case where r < n − 1, we give the following lemma.
LEMMA 2.
Proof.
From Lemma 10.3 in Johansen (1995), T−1[γ,T−1/2τ]′S11 [γ,T−1/2τ] converges in distribution to Ω, whereas β′S11 β converges in probability to a positive definite matrix Σβ, and [γ,T−1/2τ]′S11 β = Op(1). Then,
In addition, we can see that
because
. Using this result, we have
From (A.10) and (A.11),
converges in distribution to Ω11. █
(i.a) Proved in the same way as Theorem 1.
(i.b) In this case, the determinant equation becomes asymptotically equivalent to
Note that, in general, for a given symmetric and positive definite matrix A and a vector b,
and then
for any nonzero vector c. By substituting δ*′γ1(γ′γ)−1Ω11(γ′γ)−1γ1′δ* and
for A and b, we obtain, for a given G(·),
where X* is an (r − f) × (n1 − f) matrix with vec(X*) ∼ N(0,I(r−f)(n1−f)). Equality holds if and only if δ*′τ1 = 0.
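The matrix identity invoked for (A + bb′)−1 above is presumably the Sherman–Morrison formula, which also delivers the inequality c′(A + bb′)−1c ≤ c′A−1c with equality exactly when b′A−1c = 0; a quick numerical check (with arbitrary random inputs, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)      # symmetric positive definite
b = rng.standard_normal(n)
c = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
# Sherman-Morrison:
# (A + b b')^{-1} = A^{-1} - A^{-1} b b' A^{-1} / (1 + b' A^{-1} b)
sm = Ainv - np.outer(Ainv @ b, b @ Ainv) / (1.0 + b @ Ainv @ b)

lhs = c @ sm @ c                 # c'(A + bb')^{-1} c
rhs = c @ Ainv @ c               # c' A^{-1} c
```

Since the correction term subtracted from c′A−1c is (c′A−1b)²/(1 + b′A−1b) ≥ 0, the quadratic form can only shrink, mirroring the equality condition δ*′τ1 = 0 stated above.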
(ii) Let us consider the determinant equation (A.2) with H = [β1*,Tδ0*,Tτ1*]. Using Lemma 2 and by some algebra, the determinant equation is shown to be asymptotically equivalent to
This determinant equation implies that there are f nonzero eigenvalues, p − f − 1 eigenvalues of order T−2, and one eigenvalue of order smaller than T−2. Then, we can see that
We can also show that
is of order T3 if we choose H = [β1*,Tδ0*,T3/2τ1*].
For the case where r = n − 1, the limiting distribution is derived similarly using (23). █
Proof of Corollary 1.
(i) Note that, in general, for a given positive definite matrix A, a vector b, and a matrix D,
where we used the relation (A.12). By Theorem 9 of Magnus and Neudecker (1988, p. 208), the (p − f)th eigenvalue of D′A−1D is greater than that of D′(A + bb′)−1D. Then, by substituting δ*′γ1(γ′γ)−1Ω11(γ′γ)−1γ1′δ*,
, and X′J for A, b, and D, the limiting distribution of
is shown to be bounded above by λmin,r−f,n1−f* because D′A−1D = X*X*′ in this case. Note that
if and only if δ*′τ1 = 0.
(ii) is proved in Theorem 5(ii). █
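The eigenvalue comparison used in part (i), that every eigenvalue of D′(A + bb′)−1D is dominated by the corresponding eigenvalue of D′A−1D, is easy to verify numerically (random inputs below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 3
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)      # symmetric positive definite
b = rng.standard_normal(n)
D = rng.standard_normal((n, k))

# A^{-1} - (A + bb')^{-1} is positive semidefinite, so the sorted
# eigenvalues of the two quadratic forms are ordered elementwise.
ev_small = np.sort(np.linalg.eigvalsh(
    D.T @ np.linalg.inv(A + np.outer(b, b)) @ D))
ev_big = np.sort(np.linalg.eigvalsh(D.T @ np.linalg.inv(A) @ D))
```

This elementwise ordering is what bounds the limiting distribution in Corollary 1(i) from above by λmin,r−f,n1−f*.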
Proof of Theorem 6. Let us define β⊥,1* and η* as in the proof of Theorem 2.
LEMMA 3.
(i)
, say, where
(ii)
.
Proof.
(i) Because η*′γ1 = 0 and η*′τ1 = 0, we have, using (A.14),
(ii) First, note that, because
is invariant to each normalization of
, we can express
.
From the expression (A.1), we can see that
We also have, from the definition of τ,
Because the left-hand side is zero from the orthogonality between γ and τ, the first n − r − 1 rows of (α⊥′Γβ⊥)−1α⊥′ μ are zero. Then, because each estimator is consistent, we have
Combining (A.15) and (A.16), we obtain
. █
Similar to the proof of Theorem 2, we consider the same determinant equation as (A.8). Using Lemma 3, we have
where S1 = [In−r−1,0], and then, using
, (A.8) is expressed as
for large values of T, where an (n − r) × (n − r − g) matrix J satisfies
. Noting that the conditional variance of Y′S1 J is given by
the test statistic
conditioned on G(·) converges in distribution to
where vec(Y*) ∼ N(0,I(n1−g)(n−r−g)) and J = [J1′,J2′]′. Because
the limiting distribution (A.18) is bounded above by