
STRONG CONSISTENCY RESULTS FOR LEAST SQUARES ESTIMATORS IN GENERAL VECTOR AUTOREGRESSIONS WITH DETERMINISTIC TERMS

Published online by Cambridge University Press:  22 April 2005

Bent Nielsen
Affiliation:
University of Oxford

Abstract

A vector autoregression with deterministic terms and with no restrictions on its characteristic roots is considered. Strong consistency results for the least squares statistics are presented. This extends earlier results in which deterministic terms were not considered. In addition, the convergence rates are improved compared with earlier results. Comments from S. Johansen are gratefully acknowledged.

Type
Research Article
Copyright
© 2005 Cambridge University Press

1. INTRODUCTION

The strong consistency of least squares estimators in a vector autoregression with deterministic terms is studied. Autoregressions generally have three types of asymptotic behavior in that they may be stationary, a random walk type process, or an explosive process. For the econometric analysis of nonexplosive time series it usually suffices to use weak consistency and weak convergence arguments as in the work by Phillips (1991) and Johansen (1995). When a time series has explosive features it is mathematically more natural to use strong consistency arguments exploiting that explosive processes tend to follow persistent trajectories with probability one.
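As a purely numerical illustration of this last point (an invented example, not part of the paper's argument: the parameter value, sample size, and seed below are arbitrary), an explosive first-order autoregression follows a persistent trajectory and its least squares estimator converges at an exponential rate:

```python
import random

# Invented example: explosive AR(1), x_t = rho * x_{t-1} + eps_t with rho = 2.
# The trajectory is dominated by the persistent explosive component, so the
# least squares estimator of rho converges extremely quickly.
random.seed(42)
rho, T = 2.0, 60
x = [1.0]
for _ in range(T):
    x.append(rho * x[-1] + random.gauss(0.0, 1.0))

# Least squares estimator: sum x_t x_{t-1} / sum x_{t-1}^2.
num = sum(x[t] * x[t - 1] for t in range(1, T + 1))
den = sum(x[t - 1] ** 2 for t in range(1, T + 1))
rho_hat = num / den
```

With these settings the estimation error is far below rounding-display precision, reflecting the almost sure convergence exploited in the paper.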

The first results showing consistency for explosive first-order autoregressions were due to Rubin (1950) and Anderson (1959), with some generalizations by, for instance, Fuller, Hasza, and Goebel (1981) and Jeganathan (1988). A general strong consistency result for vector autoregressions was given by Lai and Wei (1985), and this is generalized here to a situation with deterministic terms as seen in econometric models. The techniques employed are to a large extent based on methods presented by Lai and Wei (1982, 1983a, 1983b, 1985) and Wei (1992).

The paper is organized so that Section 2 presents the model and an overview of the main results. The proofs follow in Sections 3–10 and are outlined in Section 2.

The following notation is used throughout the paper. For a matrix α let α⊗2 = αα′, whereas α ⊗ β is the Kronecker product and equals, for example, (α11β, α12β) if α ∈ R1×2. Further, diag(α1,…,αn) is a block diagonal matrix with diagonal blocks αj. When α is symmetric then λmin(α) and λmax(α) are the smallest and the largest eigenvalue, respectively. The choice of norm is the spectral norm ∥α∥ = [λmax(α⊗2)]1/2, implying that ∥α−1∥ = [λmin(α⊗2)]−1/2. Whereas

is a conditional expectation, the notation (Yt|Zt) denotes the residual of the least squares regression of Yt on Zt. The abbreviation a.s. is used for properties holding almost surely.
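The norm definitions above can be checked numerically; the 2 × 2 matrix below is an arbitrary invented example:

```python
import numpy as np

# Invented example checking the notation: alpha^(x)2 = alpha alpha',
# ||alpha|| = [lambda_max(alpha alpha')]^(1/2) is the spectral norm, and for
# invertible alpha, ||alpha^(-1)|| = [lambda_min(alpha alpha')]^(-1/2).
alpha = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
outer = alpha @ alpha.T                  # alpha alpha'
eigs = np.linalg.eigvalsh(outer)         # real eigenvalues of a symmetric matrix
spec_norm = np.sqrt(eigs.max())
inv_norm = 1.0 / np.sqrt(eigs.min())
```

Both quantities agree with the spectral norms of α and α−1 computed directly.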

2. THE AUTOREGRESSIVE MODEL AND MAIN RESULTS

The model in this paper is for a p-dimensional time series, X1−k,…,X0,…,XT, satisfying a kth-order vector autoregressive equation

Xt = A1Xt−1 + ⋯ + AkXt−k + μDt−1 + εt,

where Dt−1 is a deterministic term and εt an innovation term.

The innovations are required to satisfy the local Marcinkiewicz–Zygmund conditions for convergence of explosive processes introduced by Lai and Wei (1983a). These are that (εt) is a martingale difference sequence with respect to an increasing sequence of σ-fields

with the properties that some conditional moments of higher order are bounded and that the conditional variance has positive definite limit points.

Each of the Assumptions 2.1 and 2.2 excludes the possibility that the innovations could be autoregressive conditional heteroskedastic (ARCH) as proposed by Engle (1982). Therefore these conditions would probably be perceived as too strong for nonexplosive situations, but for general autoregressions they are convenient.

The deterministic term Dt is a vector of terms such as a constant, a linear trend, or periodic functions such as seasonal dummies. Inspired by Johansen (2000) the deterministic terms are required to satisfy the difference equation

Dt = DDt−1,

where D has characteristic roots on the complex unit circle. For example, D = diag(1, −1) will generate a constant and a dummy for a biannual frequency. The deterministic term Dt is assumed to have linearly independent coordinates, which is formalized as follows.
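One concrete generator, an assumed instance consistent with the biannual example (the choice D = diag(1, −1) and initial value D0 = (1, 1)′ are this illustration's assumptions):

```python
# Assumed instance of the difference equation D_t = D D_{t-1}: with
# D = diag(1, -1), whose two roots both lie on the unit circle, and
# D_0 = (1, 1)', the first coordinate stays constant while the second
# alternates sign each period, i.e. a dummy at the biannual frequency.
D = [[1, 0],
     [0, -1]]
d = [1, 1]                               # D_0
path = [tuple(d)]
for _ in range(6):
    d = [sum(D[i][j] * d[j] for j in range(2)) for i in range(2)]
    path.append(tuple(d))
```

The iterates are (1, 1), (1, −1), (1, 1), …, i.e. a constant paired with an alternating dummy.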

Assumption 2.3. |eigen(D)| = 1 and rank(D1,…,Ddim D) = dim D.

With this type of deterministic term the time series can be written conveniently in companion form. The stacked process Xt−1 = (Xt−1′,…,Xt−k′)′ satisfies

Xt = BXt−1 + μDt−1 + eX,t,
when defining matrices B and μ and a process eX,t as

whereas St = (Xt′,Dt′)′, which combines Xt with the deterministic process Dt, satisfies

St = SSt−1 + eS,t,
where S and eS,t are defined as

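The companion-form bookkeeping can be sketched numerically. The sketch below assumes the standard stacking of B, μ, S, and eS,t (that stacking, and the specific matrices A1, A2, μ, and D, are assumptions of this illustration):

```python
import numpy as np

# Assumed standard companion stacking for a bivariate VAR(2) with a constant:
# X_t = A1 X_{t-1} + A2 X_{t-2} + mu D_{t-1} + eps_t with D_t = D D_{t-1}.
p, k = 2, 2
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.1, 0.0], [0.2, 0.1]])
mu = np.array([[1.0], [0.5]])            # coefficient on D_{t-1}
D = np.array([[1.0]])                    # D_t = D D_{t-1}: a constant

B = np.block([[A1, A2], [np.eye(p), np.zeros((p, p))]])
MU = np.vstack([mu, np.zeros((p, 1))])
S = np.block([[B, MU], [np.zeros((1, p * k)), D]])

# One-step check: propagating S_{t-1} = (X_{t-1}', X_{t-2}', D_{t-1}')' by S
# and adding e_{S,t} must reproduce the VAR step for X_t.
X1 = np.array([[1.0], [2.0]])            # X_{t-1}
X2 = np.array([[0.5], [0.0]])            # X_{t-2}
d0 = np.array([[1.0]])                   # D_{t-1}
eps = np.array([[0.3], [-0.1]])
X_new = A1 @ X1 + A2 @ X2 + mu @ d0 + eps
S_prev = np.vstack([X1, X2, d0])
e_S = np.vstack([eps, np.zeros((p, 1)), np.zeros((1, 1))])
S_new = S @ S_prev + e_S
```

The top block of S_new is the new observation, the middle block shifts the lags down, and the bottom block updates the deterministic term.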
The main object of interest is the least squares estimator for the parameters A1,…,Ak,μ, which has the form

(Â1,…,Âk,μ̂) = (∑t=1T XtSt−1′)(∑t=1T St−1⊗2)−1,

where St−1⊗2 is the outer product St−1St−1′. The partial estimator for the dynamic parameters A1,…,Ak can correspondingly be written in terms of the residuals (Xt|Dt) from regressing the companion vector Xt on the deterministic terms Dt as

whereas the least squares variance estimator satisfies

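The relation between the joint and the partial estimator is the partitioned-regression (Frisch–Waugh) identity, which can be checked on simulated data (the data-generating values below are invented for the illustration):

```python
import numpy as np

# Invented univariate AR(1)-with-constant example: the dynamic coefficient
# from the joint regression of X_t on (X_{t-1}, 1) equals the coefficient
# from regressing the residuals (X_t | 1) on the residuals (X_{t-1} | 1).
rng = np.random.default_rng(7)
T = 500
d = np.ones((T, 1))                      # constant deterministic term
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = 0.8 * x[t] + 1.0 + rng.normal()

Y = x[1:].reshape(-1, 1)
Z = np.hstack([x[:-1].reshape(-1, 1), d])
joint = np.linalg.lstsq(Z, Y, rcond=None)[0]   # (slope, intercept)

def resid(a, b):
    # least squares residual (a | b)
    return a - b @ np.linalg.lstsq(b, a, rcond=None)[0]

partial = np.linalg.lstsq(resid(Z[:, :1], d), resid(Y, d), rcond=None)[0]
```

The two slope estimates agree to machine precision, which is exactly the identity behind writing the partial estimator in terms of (Xt|Dt).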
The first main result gives a bound for a studentized version of the joint estimator.

THEOREM 2.4. Suppose Assumptions 2.1–2.3 are satisfied. Then

The remainder term can be decomposed as a sum of the following terms:

Special cases have been proved by Pötscher (1989, Lemma A.1) for dim D = 0 and ξ = 0 and by Nielsen (2001a), who proves a univariate version holding in probability.

The proof of Theorem 2.4 relies on a separation of the stochastic and deterministic components using that

can be rewritten as

by partitioned inversion. For the least squares estimator itself a more complete understanding of the interaction between the deterministic components and unit root processes appearing in the denominator matrix is needed. Such results are not available as yet, and the following consistency results are therefore only partial although they do represent some improvement over previous results and include a complete description of the pure stationary and purely explosive cases where |eigen(B)| ≠ 1.
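Partitioned inversion itself is a purely algebraic identity; a numeric check on an invented positive definite matrix:

```python
import numpy as np

# Invented example: for symmetric positive definite M = [[M11, M12],
# [M21, M22]], the (1,1) block of M^(-1) is the inverse of the Schur
# complement M11 - M12 M22^(-1) M21 -- the moment matrix of the residuals
# after partialling out the second block of variables.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
M = A @ A.T + 5 * np.eye(5)              # symmetric positive definite
M11, M12 = M[:3, :3], M[:3, 3:]
M21, M22 = M[3:, :3], M[3:, 3:]
schur = M11 - M12 @ np.linalg.inv(M22) @ M21
top_left = np.linalg.inv(M)[:3, :3]
```

This is the mechanism that separates the stochastic block from the deterministic block in the denominator matrix.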

THEOREM 2.5. Suppose Assumptions 2.1–2.3 are satisfied. Then

If B and D have no common eigenvalues then

The issue of consistency of the least squares estimator was first discussed for a univariate, explosive, Gaussian first-order autoregression, with dim X = 1, dim D = 0, by Rubin (1950) and Anderson (1959). Lai and Wei (1985, Theorem 4) studied strong consistency for the special case without deterministic terms, so dim D = 0, and gave a weaker result with ξ = 0. A related generalization has previously been presented by Duflo, Senoussi, and Touati (1991, Theorem 1) in the case where the explosive roots have multiplicity one, whereas their Theorem 2 appears incorrect in suggesting that the least squares estimator for B is otherwise inconsistent.

A direct consequence of Theorem 2.4 concerning the studentized least squares estimator is that the least squares variance estimator is consistent.

COROLLARY 2.6. Suppose Assumptions 2.1–2.3 are satisfied. Then

Although Assumptions 2.1 and 2.2 suffice to ensure that the sequence

is relatively compact with positive definite limit points as argued by Lai and Wei (1985), some further structure is needed to get convergence to an interpretable matrix. In light of Assumptions 2.1 and 2.2 it is convenient to apply the following sufficient condition used by Chan and Wei (1988).

Assumption 2.7.

where Ω is positive definite.

This gives rise to the following convergence result.

THEOREM 2.8. Suppose Assumptions 2.1 and 2.7 are satisfied. Then

Corollary 2.6 and Theorem 2.8 lead to an immediate result for the least squares variance estimator.

COROLLARY 2.9. Suppose Assumptions 2.1, 2.3, and 2.7 are satisfied. Then

There is potential for many other econometric applications of Theorem 2.4. An example is lag length determination, where it is conceptually important to establish the lag length before determining the location of the characteristic roots; see Pötscher (1989), Wei (1992), and Nielsen (2001b). Other examples are unit root testing (Nielsen, 2001a) and cointegration analysis (Nielsen, 2000), where the asymptotic inference results can also be used without knowledge of the characteristic roots.

The remainder of the paper gives the proofs of these results. To a large extent the proofs follow those of Lai and Wei (1983b, 1985) but with many modifications because of the deterministic term. The proofs are outlined as follows. In Section 3 the process Xt is decomposed into autoregressive components Ut,Vt,Wt with characteristic roots outside, on, and inside the unit circle, respectively. The orders of magnitude of the deterministic process, Dt, and the process itself, Xt, are discussed in Sections 4 and 5, respectively. The next sections are concerned with the order of magnitude of the denominator matrix MT = ∑t=1T St−1⊗2. As a first step the sample correlation of Ut and Dt is considered in Section 6. The order of magnitude of the largest and the smallest eigenvalue of MT is then discussed in Sections 7 and 8, respectively. The next step is to discuss sample correlations of all possible combinations of the processes Ut,Vt,Wt,Dt in Section 9, and finally the main results are proved in Section 10.

3. DECOMPOSITIONS OF THE PROCESS

The process Xt is decomposed in two ways to facilitate the subsequent asymptotic analysis. The first decomposition concerns the stochastic part of the process whereas the second decomposition disentangles deterministic and stochastic parts of the process.

The first decomposition separates the eigenvalues of the companion matrix B for the stochastic part of the process. Following Herstein (1975, p. 308) there exists a regular, real matrix M that transforms B into a real block diagonal matrix with blocks U, V, and W having eigenvalues with absolute value less than one, equal to one, and larger than one, respectively. That is,

M−1BM = diag(U,V,W).
For the purpose of proving the results of Section 2 it can be assumed without loss of generality that B = diag(U,V,W) is block diagonal and St = (Ut′,Vt′,Wt′,Dt′)′.

The second decomposition seeks to separate the stochastic and deterministic components and is based on two arguments. The processes Ut,Wt,Dt are first separated by a similarity transformation using that the matrices U, W, and D have no common eigenvalues, whereas the processes Vt,Dt have to be discussed in more detail because the matrices V and D may in general have common eigenvalues.

The processes Ut,Wt are linear functions of the deterministic process Dt, and they are first shown to satisfy the relationships

with

. The argument is the same in both cases. Taking Ut as an example consider the companion matrix for the vector (Ut′,Dt′) and apply a similarity transformation of the form

The result (3.1) then follows by arguing that

can be chosen so

. Because the matrices U and D have no common eigenvalues a unique solution

can be found according to Gantmacher (2000, p. 225).
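The cited uniqueness result can be illustrated numerically: when U and D share no eigenvalues, a Sylvester-type equation UΦ − ΦD = C has a unique solution. The matrices below are invented, and the vectorized solve is one standard way to compute Φ:

```python
import numpy as np

# Invented example of the Gantmacher-type fact: U (eigenvalues inside the
# unit circle) and D (eigenvalues on the unit circle) share no eigenvalues,
# so U*Phi - Phi*D = C has a unique solution.  Solve by vectorization:
# (I kron U - D' kron I) vec(Phi) = vec(C), with column-stacking vec.
U = np.array([[0.5, 0.2], [0.0, 0.3]])
D = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0], [2.0, -1.0]])

K = np.kron(np.eye(2), U) - np.kron(D.T, np.eye(2))
phi_vec = np.linalg.solve(K, C.flatten(order="F"))
Phi = phi_vec.reshape(2, 2, order="F")
```

The coefficient matrix K is invertible precisely because its eigenvalues are the differences λi(U) − λj(D), none of which vanish here.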

In the special case where B has no eigenvalues in common with D the same argument can be made for the entire process Xt as for the Ut,Wt processes. That is,

When it comes to the general situation where V and D are allowed to have common eigenvalues it is convenient first to discuss the special case where V and D have their eigenvalues at one and are both Jordan matrices

with (Λ,E) = (1,1). In that situation Vt will be shown to satisfy

with

and where

is of the form (3.4) with

. To see this write the process (Vt,Dt) in companion form as

which has the solution

The result (3.5) is therefore a consequence of the following two properties:

for some matrices

. The second property is proved as follows. Because V and D are both Jordan blocks of the form (3.4) with eigenvalues at one a real matrix M exists so that

is a block diagonal matrix with diagonal elements that are Jordan blocks of the form (3.4) with (Λ,E) = (1,1). The left-hand side of (3.7) can then be written as a sum:

for some matrices μj,Dj,0 and Jordan blocks Dj. Writing the vector Dj,0 as T(0,1)′ where T is an upper triangular band matrix and noting that upper triangular band matrices commute then μjDjtDj,0 = μjDjtT(0,1)′ = μjTDjt(0,1)′. This in turn can be written as

, ensuring the desired result (3.7) with

.
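The commutation step can be checked numerically: upper triangular band (Toeplitz) matrices, constant along each diagonal, commute with one another, since each is a polynomial in the nilpotent shift matrix. The coefficient choices below are invented:

```python
import numpy as np

# Invented example: build two upper triangular band (Toeplitz) matrices and
# verify that they commute, the property used to move T past D_j^t above.
def ut_band(coefs):
    n = len(coefs)
    M = np.zeros((n, n))
    for k, c in enumerate(coefs):
        M += c * np.diag(np.ones(n - k), k)   # constant c on k-th superdiagonal
    return M

P = ut_band([1.0, 2.0, -0.5])
Q = ut_band([3.0, 0.5, 1.0])
```

Both P and Q are polynomials in the same shift matrix N (with N³ = 0), so PQ = QP.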

In general V and D can have eigenvalues anywhere on the unit circle. Suppose these occur at l distinct complex pairs exp(iθj) and exp(−iθj) for 0 ≤ θj ≤ π, which of course reduce to a single eigenvalue of 1 or −1 if θj equals 0 or π. Following Herstein (1975, p. 308) and applying Assumption 2.3 to D, there exist regular, real similarity transformations MV and MD that block-diagonalize V and D as

where the subblocks Vj,m and Dj are real Jordan matrices of the form (3.4) and where (Λ,E) is one of the pairs

Using the same argument as before it therefore holds in general that Vt has representation (3.5) where

has subcomponents

of dimension dim Vj and dim Vj,m, respectively, and

has subcomponents

of dimension

.

Combining the results (3.1), (3.2), and (3.5) with the notation

shows that the process without loss of generality can be represented as

It is convenient also to introduce the dimensions

and also the maxima

4. LIMITING RESULTS FOR THE DETERMINISTIC COMPONENT

In the following discussion the order of magnitude of the deterministic process Dt and the denominator matrix

will be described.

The main result is formulated using normalization matrices

THEOREM 4.1. Suppose Assumption 2.3 is satisfied. Then it holds that

Two lemmas concerning the order of

are needed.

LEMMA 4.2. Let Dj,1,0 denote the last element of the initial vector Dj,0. Then

where f is the polynomial vector f(n + 1,u) = [un/(n!),…,u0/(0!)]′.

Proof of Lemma 4.2. The process Dj,t satisfies Dj,t = DjtDj,0 where

and b(·,·) is the binomial coefficient. The desired result then follows by noting that

uniformly in t whereas ∥Λn∥ ≤ ∥Λ∥n = 1. █
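The ingredients of Lemma 4.2 can be illustrated for a Jordan block with unit eigenvalue (the dimension and power t below are invented): the entries of the t-th power are binomial coefficients, and b(t,n)/tⁿ → 1/n!, which is the source of the polynomial vector f.

```python
import numpy as np
from math import comb, factorial

# Invented example: for the Jordan block J = I + N with unit eigenvalue,
# J^t = sum_k b(t,k) N^k, where b is the binomial coefficient and N the
# nilpotent shift; and b(t,n)/t^n tends to 1/n!.
dim, t = 4, 200
J = np.eye(dim) + np.diag(np.ones(dim - 1), 1)
Jt_direct = np.linalg.matrix_power(J, t)
Jt_binom = np.zeros((dim, dim))
for k in range(dim):
    Jt_binom += comb(t, k) * np.diag(np.ones(dim - k), k)
```

Already at t = 200 the ratio b(t,3)/t³ is within about 1.5% of 1/3!, the limiting entry of f.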

LEMMA 4.3.

Proof of Lemma 4.3. (i) By Lemma 4.2 the cross-product satisfies

Trigonometric identities show that ΛjtDj,1,0 Dj,1,0′(Λjt)′ equals EjDj,1,02/dim Λj + Rj where Rj = 0 for dim Λj = 1 and Rj = cos(2θt)A + sin(2θt)B for some constant matrices A,B when dim Λj = 2. When dim Λj = 1 the desired result follows immediately, whereas when dim Λj = 2 it follows from the result

for any constant a; see Gradshteyn and Ryzhik (1965, 2.633.2).

(ii) Note that the vector f (n,u) can be expressed as a nonsingular linear transformation of the first n Legendre polynomials, p(n,u), say, which have the property that

giving the positive definiteness.

(iii) Use the same type of arguments as in (i), noting that ΛjtDj,1,0 Dm,1,0′(Λmt)′ equals cos(2θt)A + sin(2θt)B for some constant matrices A,B. █
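The positive definiteness argument in (ii) can be checked numerically (the dimension n below is invented): the Gram matrix of f(n,u) over u ∈ [0,1] is a factorially scaled Hilbert-type matrix, and it is positive definite because the monomials, equivalently the Legendre polynomials spanning them, are linearly independent.

```python
import numpy as np
from math import factorial

# Invented example: Gram matrix of f(n,u) = [u^{n-1}/(n-1)!, ..., u^0/0!]'
# over u in [0,1]; entry (i,j) is the exact integral
# int_0^1 u^i u^j du / (i! j!) = 1/((i + j + 1) i! j!).
n = 5
G = np.array([[1.0 / ((i + j + 1) * factorial(i) * factorial(j))
               for j in range(n)] for i in range(n)])
min_eig = np.linalg.eigvalsh(G).min()
```

The smallest eigenvalue is tiny, as for Hilbert matrices, but strictly positive.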

Proof of Theorem 4.1.

(i)The result follows from Lemma 4.2 by stacking the processes Nδj Dj,t and using the triangle inequality.

(ii) Lemma 4.3 implies that

converges to a block diagonal matrix with positive definite diagonal elements.

(iii) The desired result follows from (i) and (ii) and replacing each Dt with ND Dt. █

5. THE ORDER OF MAGNITUDE OF THE PROCESS

In the following discussion the order of magnitude of the process Xt is investigated. This is a generalization of Lai and Wei (1985, Theorem 1) where the case without deterministic components is considered. Subsequently a convergence result is given for the explosive component Wt.

To describe the order of magnitude of Xt let λj denote the distinct eigenvalues of B whereas mj is the multiplicity of λj. Define the multiplicity of the largest eigenvalue as

The following result then holds.

THEOREM 5.1. Suppose Assumptions 2.1 and 2.3 are satisfied. Then, for ξ < γ/(2 + γ),

Proof of Theorem 5.1. By (3.9) it holds that

. Lai and Wei (1985, Theorem 1) show the results for the purely stochastic component

, whereas the order of the deterministic component

follows from Lemma 4.2. █

When studying the process

Lai and Wei (1985) use the following generalization of the Marcinkiewicz–Zygmund theorem.

THEOREM 5.2 (Lai and Wei, 1983a, Corollaries 3 and 4). Suppose Assumptions 2.1 and 2.2 are satisfied. Then for any sequence of matrices At the series

converges a.s. if and only if the series

converges. If this holds, and At ≠ 0 for infinitely many t, then

for any variable Y that is

-measurable for some t.

This result yields a more precise statement about the order of magnitude of the explosive component.

COROLLARY 5.3. Suppose Assumptions 2.1–2.3 are satisfied. Then

Proof of Corollary 5.3.

6. CORRELATION BETWEEN STATIONARY AND DETERMINISTIC COMPONENTS

One major difference between the results presented here and the work of Lai and Wei (1985) is that deterministic terms are included in the model. Before turning to the question of how big the denominator matrix can be in Section 7 it is convenient to consider the asymptotic order of magnitude of correlations between the zero mean process with roots smaller than one,

, and the deterministic component, Dt.

As a first step toward discussing the sample correlation of

, results of Lai and Wei concerning the matrices

are stated. The results give conditions for relative compactness of sequences of such matrices. Recalling that the relative compactness of a sequence is the property that the limit points fall in a compact set, this enables a discussion of the order of magnitude of the sequence under weak assumptions. In particular, a condition is given ensuring that the limit points are bounded away from zero.

THEOREM 6.1 (Lai and Wei, 1985, Theorem 2, equation (3.7), Example 3). Suppose Assumption 2.1 is satisfied. Then, with probability one, the matrix sequences

are relatively compact with the same limit points.

If in addition Assumption 2.2 is satisfied, the limit points are positive definite.

Because eU,t is a linear combination of εt the sequence

is therefore relatively compact. In addition the following results can be shown.

THEOREM 6.2 (Lai and Wei, 1985, Theorem 2, Example 3). Suppose Assumptions 2.1 and 2.2 are satisfied. Then it holds with probability one that

and also it holds that

Before turning to the sample correlation of

it is useful to cite the following univariate result by Wei (1985).

LEMMA 6.3 (Wei, 1985, Lemma 2). Suppose Assumption 2.1 is satisfied. Let (xt) be a sequence of random variables adapted to

with

. Assume

for some η > 0. Then

The result for the sample correlation of

can now be stated and proved.

THEOREM 6.4. Suppose Assumptions 2.1–2.3 are satisfied. Then, for all η > 0,

Proof of Theorem 6.4. Theorem 6.2 shows that

, so it suffices to show that

.

The main contribution arises from the sum

for an α satisfying 0 < α < 1. With this in mind and using

the object of interest can be written as

The first two terms in (6.1) are o(Tη). To see this bound their norm by

and use Theorems 4.1 and 5.1 and that ∥UTα∥ decreases exponentially.

The third term in (6.1) is o(Tη). To see this use Dt = DsDts and the normalization ND given in (4.1) to rewrite it as

The norm of this expression is bounded by

Because the sum

converges it suffices to show that the last two components can be approximated uniformly by a variable that is o(Tη).

The sum

is approximately equal to

, which converges to a positive definite matrix; see Theorem 4.1. The norm of the approximation error,

, is bounded by

, because of Theorem 4.1.

In a similar way

is approximately

, which is not dependent on s. Considering each element of this matrix and applying Lemma 6.3 shows that this is o(TηND−1). The approximation error can be bounded by

Using Theorems 4.1 and 5.1 this is seen to be o(Tα−ξ/2), which is o(1) for sufficiently small α. █

Some immediate consequences of these results are the following examples.

Example 6.5

Suppose Assumptions 2.1 and 2.7 are satisfied. Then Theorems 6.1 and 6.2 imply

and in particular

for some matrix ΩU, so

Example 6.6

Suppose Assumptions 2.1–2.3 are satisfied. Then Theorems 6.2 and 6.4 and equation (3.1) imply that the sequence of matrices

is relatively compact with positive definite limit points. Moreover, this series converges almost surely if

is convergent. According to Example 6.5 this is for instance the case if, in addition, Assumption 2.7 is satisfied.

7. THE LARGEST EIGENVALUE OF THE DENOMINATOR MATRIX

The order of magnitude of the largest eigenvalue of the denominator matrix

can now be described. This is followed by a convergence result for the purely explosive case and a bound for the rate of convergence of sums of powers of

.

First, the largest eigenvalue of MT is considered in the following generalization of Lai and Wei (1985, Corollary 1).

THEOREM 7.1. Suppose Assumptions 2.1–2.3 are satisfied. Then

Proof of Theorem 7.1. If max|eigen(B)| < 1 then

by (3.1). Lai and Wei (1985, Corollary 1) show

and the result then follows from Theorems 4.1 and 6.4. If max|eigen(B)| ≥ 1 the result follows directly from Theorem 5.1. █

For the explosive part of the process the following generalization of Lai and Wei (1985, Corollary 2) can be established.

COROLLARY 7.2. Suppose Assumptions 2.1–2.3 are satisfied and min|eigen(B)| > 1 so

and recall the definition of W in Corollary 5.3. Then

where FW is positive definite a.s.; hence

Proof of Corollary 7.2. Let RT denote the difference between the matrices

. The decomposition (3.2) shows

which vanishes for large T because of Theorem 4.1(i) and Corollary 5.3(i). The desired result is then a direct consequence of Lai and Wei (1983b, Theorem 2). █

Whereas Theorem 7.1 gives a bound for the sum of squares of the process, the following result gives a bound for sums of higher order powers of the stationary component.

THEOREM 7.3. Suppose Assumption 2.1 is satisfied. Then, for all η > 0 and ζ < γ,

Proof of Theorem 7.3. For notational convenience define

. Using Hölder's inequality it follows that

Summation over t then gives the following bound:

Changing summation index in the double sum, this can be bounded further by

The first two sums converge, whereas the third term can be decomposed as

The latter term is of order O(T) = o(T1+η) by Assumption 2.1. The first term is a martingale. Normalized by T1+η it converges to zero a.s. on the set where

see Hall and Heyde (1980, Theorem 2.18). Minkowski's inequality shows that this sum is finite if the sum

is finite. Assumption 2.1 ensures that this is the case. █

8. THE SMALLEST EIGENVALUE OF THE DENOMINATOR MATRIX

Three results are given concerning the order of the inverse of the denominator matrix,

, of the least square estimator in the nonexplosive case. Using the techniques of Chan and Wei (1988) it can be proved that T−1MT is bounded from below in a weak convergence sense. The first result goes some way toward an almost sure version of this result in showing that the partial denominator matrix

is bounded from below, whereas the second result shows that the joint matrix T−1MT is bounded from below in the special case where B and D have no common eigenvalues. In combination these results can be used to establish the third result, concerning the order of maxTα≤t≤T St′MT−1St, without actually establishing the order of MT−1, and this will suffice to prove the main theorems.

The first result concerns the partial denominator matrix

and is related to Lai and Wei (1985, Theorem 3).

THEOREM 8.1. Suppose Assumptions 2.1–2.3 are satisfied and |eigen(B)| ≤ 1. Then

To prove Theorem 8.1 the following lemma is needed. This lemma ensures that Lai and Wei (1982, Lemma 1) concerning the order of magnitude of normalized least squares estimators can be used.

LEMMA 8.2. Suppose Assumptions 2.1–2.3 are satisfied. Then

Proof of Lemma 8.2. It suffices to show that u′MTu > 0 for all u ∈ Rdim S with u ≠ 0 and some T. Because

it is equivalent that (S0,…,ST−1)R spans Rdim S for some invertible matrix R.

The decomposition (3.9) shows that

where

. The Cayley–Hamilton theorem (see Herstein, 1975, p. 334) implies that if

with d0 = 1 is the characteristic polynomial of

then

and in particular

, say. Define

and partition R as a (2 × 2)-block matrix so the upper right block is a dim

-dimensional square matrix. The preceding properties then show that (S1,…,ST)R is an upper triangular (2 × 2)-block matrix. The lower right block is

, which spans Rdim D by Assumption 2.3. It is left to prove that the upper left block

spans Rdim X. The process zt is a linear combination of the process

that satisfies a first-order autoregression without deterministic terms. The desired result then follows from Lai and Wei (1985, Theorem 3) using Assumptions 2.1 and 2.2. █

Proof of Theorem 8.1. Let m = dim X. Using the model equation (2.3) and that Dt−1−j = D−1−jDt the process Xt can be rewritten as

where eX,0 = X0, eX,−t = −μD−t−1, and X−t = 0 for t > 0. It follows that

It is now argued that the lower bound in (8.1) satisfies

The norm of the difference between the left-hand side and the first term on the right is bounded by

The first sum is finite, so when normalized by the denominator term it is seen to be O(1) as a result of Lemma 8.2. The normalized second term is O[(log T)1/2] as a result of Lai and Wei (1982, Lemma 1), which can be used because of Lemma 8.2.

Using Lai and Wei (1982, Lemma 1) once again in combination with Theorem 6.2 shows that, for j < k,

This in turn implies that

The proof is completed by combining (8.2), (8.3), and Theorem 6.2. █

When B and D have no common eigenvalues Theorem 8.1 can be extended.

THEOREM 8.3. Suppose Assumptions 2.1–2.3 are satisfied and B and D have no common eigenvalues. Then lim infT→∞ λmin(T−1MT) > 0 a.s.

Proof of Theorem 8.3. Because of the representation

given in (3.3) it suffices to show the result for sums of squares of

If

is the characteristic polynomial of B the Cayley–Hamilton theorem (see Herstein, 1975, p. 334) implies

, and hence

Because B and D have no common eigenvalues then

. It follows that

using Theorems 4.1, 6.2, and 6.4 and Lai and Wei (1985, equation (3.19)). The argument is finished as in the proof of Lai and Wei (1985, Theorem 3). █
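The Cayley–Hamilton step used in both proofs above admits a simple numeric check (the random 4 × 4 matrix below is invented): the characteristic polynomial ∑j dj z^(m−j), with d0 = 1, annihilates the matrix itself.

```python
import numpy as np

# Invented example: verify that the characteristic polynomial of an m x m
# matrix B annihilates B, i.e. sum_j d_j B^{m-j} = 0 with d_0 = 1.  This is
# what allows lagged values of the process to be recombined in the proofs.
rng = np.random.default_rng(3)
m = 4
B = rng.normal(size=(m, m))
d = np.poly(B)                           # characteristic coefficients, d[0] = 1
annihilated = sum(c * np.linalg.matrix_power(B, m - j)
                  for j, c in enumerate(d))
```

Up to rounding error the resulting matrix is zero.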

The final and more technical result addresses the order of St′MT−1St. Lai and Wei (1985, Lemma 4) show that maxt≤T St′MT−1St vanishes when |eigen(B)| ≤ 1 and dim D = 0. For the subsequent analysis it suffices to take a maximum over just Tα ≤ t ≤ T for some 0 < α < 1, requiring that 0 < |eigen(B)| but allowing dim D > 0.

THEOREM 8.4. Suppose Assumptions 2.1–2.3 are satisfied and that 0 < |eigen(B)| ≤ 1. Then

for all α,ζ so 0 < α < 1 and ζ < min[γ/(2 + γ),½].

Theorem 8.4 will be proved in a few steps following Lai and Wei (1983b). The first step is to strengthen their Lemma 3.

LEMMA 8.5. Let (at) be a sequence of nonnegative numbers satisfying the following conditions.

Then aT = o(T−ρ) for all ρ < min(1,κ/2).

Proof of Lemma 8.5. Condition (i) implies that for every 0 < ρ < 1 it holds that minT−Tρ≤t≤T at ≥ aT − 2CTρ−κ for all large T. In particular, choosing ρ to satisfy 0 < ρ < min(1,κ/2), it is seen that

Combining this with (ii) it follows that

. Because δ can be chosen arbitrarily small this proves the desired result. █

The second step is to generalize Lai and Wei (1983b, Lemma 6ii).

LEMMA 8.6. Suppose Assumptions 2.1–2.3 are satisfied and max|eigen(B)| ≤ 1. Define T0 as in Lemma 8.2. Then

Proof of Lemma 8.6. The proof is the same as that of Lai and Wei (1983b, Lemma 6ii) using the generalizations of their Theorem 3 and Lemma 6i presented previously in Theorem 7.1 and Lemma 8.2. █

The third step is to generalize Lai and Wei (1983b, Lemma 7).

LEMMA 8.7. Suppose Assumptions 2.1–2.3 are satisfied and that 0 < |eigen(B)| ≤ 1. Then

Proof of Lemma 8.7. (i) Using Lai and Wei (1983b, Lemma 5i) in the same way as in the proof of Lai and Wei (1983b, Lemma 7i) it holds that

where

This expression can be rewritten using the identities

for ι = (Idim X,0). The desired result follows by noting that

according to Lai and Wei (1982, Lemma 1), which can be used because of Lemma 8.2, whereas the term

by Theorem 8.1.

(ii) Noting that ST = SST−1 + eS,T it holds that

The norm of the first term is less than

by (i). This is in turn bounded by

because of Lemma 8.6(i). The second term equals

according to the identities (8.4) and is seen to be o(T−ξ/2) by Theorems 5.1 and 8.1. █

Theorem 8.4 can now be proved.

Proof of Theorem 8.4. Lemmas 8.6(ii) and 8.7(ii) show that the conditions of Lemma 8.5 are satisfied for the sequence St′Mt−1St with κ = ζ/2, and therefore tζ/4St′Mt+1−1St = o(1) for large t. For t > T0 then Mt−1 ≥ Mt+1−1, so that St′MT−1St ≤ St′Mt+1−1St for T > t. Thus for all ε > 0 and almost every outcome a T1 exists so that St′MT−1St ≤ St′Mt+1−1St < ε for all t,T with T > t ≥ T1. This in turn implies that for all ε > 0 and almost every outcome a T1 exists so that maxTα≤t<T St′MT−1St < ε for all T > T1, as desired. █

9. SAMPLE CORRELATIONS

It has already been established in Section 6 that the sample correlation of

and Dt vanishes asymptotically. In the following discussion the remaining sample correlations of pairs of the processes

are studied. A first result concerns the sample correlation of Wt and Ut,Dt.

THEOREM 9.1. Suppose Assumptions 2.1–2.3 are satisfied. Then

The bound for the sample correlation of

should be viewed in light of the results of Anderson (1959). He found that

is convergent when the innovations εt are independent and identically distributed, but is in general divergent. The stated result combined with Theorem 6.1 shows that the order of YT is at most o[T(1−ξ)/2] for martingale difference sequence innovations.

Proof of Theorem 9.1. The norms of the two expressions are bounded by

where m is either of

The last two terms of (9.1) are convergent according to Corollaries 5.3 and 7.2. It holds that mD = O(T−1) by Theorem 4.1, whereas mU = o(T−ξ) because Theorem 5.1 shows

, whereas the denominator term is O(T−1) by Example 6.6. █

For the sample correlation between Wt and St = (Vt′,Dt′)′ a different type of proof is needed using the results of Section 8. This is because the order of the smallest eigenvalue of

is unknown.

THEOREM 9.2. Suppose Assumptions 2.1–2.3 are satisfied. Then, for all ζ < min[γ/(2 + γ),½], it holds that

Proof of Theorem 9.2. For convenience define

Because

is convergent according to Corollary 7.2 it suffices to show that

. Follow Anderson (1959, Theorem 2.2) in writing

Because

it follows, for any 0 < α < 1, that

The first two terms in (9.3) vanish exponentially fast. Their norm is less than

where ∥WTα−T∥ vanishes exponentially, MT−1 = O(1) by Lemma 8.2, and the remaining terms are of polynomial order according to Theorem 5.1 and Corollary 5.3.

The final term in (9.3) is o(T−ζ/8). Its norm is less than

where the first two components converge (see Corollary 5.3) and the last component is o(T−ζ/8) by Theorem 8.4. █

Remark 9.3. The bottleneck in the proof of Theorem 9.2 is the order of magnitude of maxTα<t≤T St′MT−1St. By extending the weak convergence results of Chan and Wei (1988) it can be proved that this term is

when |eigen(B)| ≤ 1, implying that the sample correlation between Wt and (Vt′,Dt′) is

.

Wei (1992, Theorem A.1) considers the sample correlation between Ut and Vt in the univariate case dim X = 1 when Dt is absent and εt is a martingale difference sequence satisfying Assumptions 2.1 and 2.7. That result can be generalized and strengthened by a proof resembling that of Theorem 6.4.

THEOREM 9.4. Suppose Assumptions 2.1–2.3 are satisfied. Then, for all ξ < γ/(2 + γ),

Proof of Theorem 9.4. Define St, S, eS,t, and MT as in (9.2). Theorem 6.2 shows that

, so it suffices to show that

. Inspired by the proof of Theorem 6.4 and Wei (1992, Theorem A.1) write

for some 0 < α < 1 so that

The first term in (9.4) is o[T(1−ξ)/2]. The norm of the sum

is bounded by

, which is of the desired order when α is chosen small enough and using Theorem 5.1, whereas MT−1/2 is bounded as a result of the positive definiteness of MT stated in Lemma 8.2.

The second term in (9.4) is o(1). By the Cauchy–Schwarz inequality its norm is bounded by

where ∥UTα∥ vanishes exponentially and the other terms are O[(T log T)1/2] as a result of Theorem 6.2 and Lemma 8.6(ii).

The third term of (9.4) is o[T(1−ξ)/2]. Its norm is bounded by

according to the triangle inequality. Hölder's inequality implies

for 2 < p < 2 + γ and p−1 + q−1 = 1. Because q/2 < 1 and Tα > T0 for large T, then (St′MT−1St)q/2 ≤ 1 + St′MT−1St ≤ 1 + St′Mt−1St, so the last term is o(Tη) for all η > 0 according to Lemma 8.6(ii). Because of Theorem 7.3 then

uniformly in s. Overall the sum in t is therefore o[T(1−ξ)/2] uniformly in s. The desired result then follows because

converges. █

Tables 1 and 2 give an overview of the sample correlation results of Theorems 6.4, 9.1, 9.2, and 9.4. All pairs of

have been considered except for Vt,Dt, which has nonnegligible sample correlation when V and D have common characteristic roots. The tables use the fact that the marginal sample correlation, C(x,y), of processes xt,yt relates to the joint correlations by

according to the formula for partitioned inversion, and also by

Table 1. Order of pairwise sample correlations, with η > 0 and ξ < γ/(2 + γ)

Table 2. Order of pairwise sample correlations, with η > 0 and ξ < γ/(2 + γ)
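The passage between joint and marginal sample correlations can be illustrated in the simplest three-variable case. The sketch below uses hypothetical data and the closed-form partial correlation, which in this case is algebraically equivalent to the partitioned-inverse formula; it checks the result against the correlation of regression residuals.

```python
import math

# Hypothetical data for three scalar series x, y, d.
x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0]
y = [2.0, 1.0, 2.5, 1.5, 3.0, 2.0, 4.0, 3.0]
d = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

def corr(a, b):
    """Centered sample correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    ca = [u - ma for u in a]
    cb = [v - mb for v in b]
    return sum(u * v for u, v in zip(ca, cb)) / math.sqrt(
        sum(u * u for u in ca) * sum(v * v for v in cb))

rxy, rxd, ryd = corr(x, y), corr(x, d), corr(y, d)
# Partial correlation of x and y given d: the 3-variable closed form
# equivalent to partitioned inversion of the joint correlation matrix.
r_partial = (rxy - rxd * ryd) / math.sqrt((1 - rxd**2) * (1 - ryd**2))

def resid(z):
    """Residual of centered z after regression on centered d."""
    n = len(z)
    mz, md = sum(z) / n, sum(d) / n
    cz = [u - mz for u in z]
    cd = [v - md for v in d]
    beta = sum(u * v for u, v in zip(cz, cd)) / sum(v * v for v in cd)
    return [u - beta * v for u, v in zip(cz, cd)]

rx, ry = resid(x), resid(y)
r_check = sum(u * v for u, v in zip(rx, ry)) / math.sqrt(
    sum(u * u for u in rx) * sum(v * v for v in ry))
print(r_partial, r_check)
```

The two computations agree exactly (up to rounding), which is the identity exploited when compiling the tables from pairwise results.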

As a consequence of the results summarized in Table 2, the condition |eigen(B)| ≤ 1 can be eliminated in Theorem 8.1 concerning the lower bound for

.

COROLLARY 9.5. Suppose Assumptions 2.1–2.3 are satisfied. Then

Proof of Corollary 9.5. Let Rt = (Ut′,Vt′)′. Using a similarity transformation M as described in Section 3 and the results in Table 2 shows that

equals

a.s. Apply Theorem 8.1 to the upper left block and Corollary 7.2 and Theorem 9.1 to the lower left block. █

10. PROOFS OF MAIN RESULTS

The proofs of the main results in Theorems 2.4, 2.5, and 2.8 now follow. The first of these results concerns the studentized least squares estimator.

Proof of Theorem 2.4. The process St is a linear combination of Rt = (Ut′,Vt′,Dt′)′ and Wt. As a consequence of Theorems 9.1 and 9.4 the sample correlation of Rt and Wt vanishes asymptotically; see also Table 1. The vector of interest therefore equals

so the nonexplosive and explosive components can be considered separately.

For the explosive component note that the norm of

is bounded by

where the first two terms are convergent because of Corollaries 7.2 and 5.3. The order of the last term is given in Theorem 5.1.

For the nonexplosive part with max|eigen(B)| = 1, Lai and Wei (1982, Lemma 1) together with Theorem 7.1 shows the desired result.

For max|eigen(B)| < 1, Lemma 6.3 combined with Theorems 4.1, 6.2, and 6.4 shows the result. █

By combining Theorem 2.4 with results for the denominator matrix established in Sections 7 and 8, the strong consistency result for the least squares estimator can now be proved.

Proof of Theorem 2.5. Consider first the partial estimator. Transforming Xt into (Rt′,Wt′)′ with Rt = (Ut′,Vt′)′ using a similarity transformation M as described in Section 3 shows that

equals

The sample correlation between (Rt−1|Dt) and (Wt−1|Dt) vanishes asymptotically, so it suffices to prove the result for the two special cases where max|eigen(B)| ≤ 1, so that dim W = 0, and where min|eigen(B)| > 1, so that dim R = 0. In the first case the desired order follows from Theorems 2.4 and 8.1, whereas in the second case the statistic vanishes exponentially fast as a result of Theorem 2.4 and Corollary 7.2.

The second result for the full estimator when B and D have no common eigenvalues follows from Theorems 2.4 and 8.3. █
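The strong consistency just proved can be illustrated numerically. The sketch below is not part of the formal argument: it simulates a hypothetical univariate AR(1) with an arbitrarily chosen explosive root 1.2 and shows the least squares estimate settling down on a single simulated path.

```python
import random

# Simulate one path of X_t = b X_{t-1} + eps_t with explosive b = 1.2
# (a hypothetical choice) and track the least squares estimate of b.
random.seed(1)
b, T = 1.2, 300
X = [0.0]
for _ in range(1, T):
    X.append(b * X[-1] + random.gauss(0, 1))

def ls_estimate(n):
    """Least squares slope: sum X_t X_{t-1} / sum X_{t-1}^2."""
    num = sum(X[t] * X[t - 1] for t in range(1, n))
    den = sum(X[t - 1] ** 2 for t in range(1, n))
    return num / den

for n in (50, 150, 300):
    print(n, abs(ls_estimate(n) - b))
```

In the explosive case the estimation error typically decays at an exponential rate along the path, much faster than the polynomial rates of the stationary and unit root cases.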

Proof of Theorem 2.8. Assumption 2.7 shows that mt = a′(εtεt′ − Ω)b for arbitrary dim X-vectors a and b. Hall and Heyde (1980, Theorem 2.18) show that if 1 ≤ p ≤ 2 then

on the set

. This set has probability one if p ≤ 1 + γ/2 and p(ζ − 1) < −1 according to Assumption 2.1. These restrictions are satisfied when ζ < min[γ/(2 + γ),½]. █
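The martingale strong law invoked here can be illustrated numerically. The sketch below uses hypothetical Gaussian errors, for which the moment condition holds for arbitrarily large γ, so that any ζ < ½ is admissible; it checks that the normalized sum of centered squared errors is small.

```python
import random

# Hypothetical illustration of the strong law behind Theorem 2.8:
# for m_t = eps_t^2 - 1 (a martingale difference sequence when eps_t
# is i.i.d. standard normal), T^{-(1-zeta)} * sum_t m_t should be
# small for zeta below the theorem's bound; zeta = 0.4 < 1/2.
random.seed(2)
T = 100000
s = 0.0
for _ in range(T):
    e = random.gauss(0, 1)
    s += e * e - 1.0

zeta = 0.4
print(abs(s) / T ** (1 - zeta))
```

The printed ratio is of order one or smaller, whereas the raw sum has typical magnitude of order T1/2, in line with the rate claimed in the theorem.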

REFERENCES

Anderson, T.W. (1959) On asymptotic distributions of estimates of parameters of stochastic difference equations. Annals of Mathematical Statistics 30, 676–687.
Chan, N.H. & C.Z. Wei (1988) Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics 16, 367–401.
Duflo, M., R. Senoussi, & R. Touati (1991) Propriétés asymptotiques presque sûre de l'estimateur des moindres carrés d'un modèle autorégressif vectoriel. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 27, 1–25.
Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflation. Econometrica 50, 987–1008.
Fuller, W.A., D.P. Hasza, & J.J. Goebel (1981) Estimation of the parameters of stochastic difference equations. Annals of Statistics 9, 531–543.
Gantmacher, F.R. (2000) The Theory of Matrices, vol. 1. Reprint. Chelsea Publishing.
Gradshteyn, I.S. & I.M. Ryzhik (1965) Table of Integrals, Series and Products. Academic Press.
Hall, P. & C.C. Heyde (1980) Martingale Limit Theory and Its Applications. Academic Press.
Herstein, I.N. (1975) Topics in Algebra, 2nd ed. Wiley.
Jeganathan, P. (1988) On the strong approximation of the distributions of estimators in linear stochastic models, parts I and II: Stationary and explosive AR models. Annals of Statistics 16, 1283–1314.
Johansen, S. (1995) Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.
Johansen, S. (2000) A Bartlett correction factor for tests on the cointegrating relations. Econometric Theory 16, 740–778.
Lai, T.L. & C.Z. Wei (1982) Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Annals of Statistics 10, 154–166.
Lai, T.L. & C.Z. Wei (1983a) A note on martingale difference sequences satisfying the local Marcinkiewicz-Zygmund condition. Bulletin of the Institute of Mathematics, Academia Sinica 11, 1–13.
Lai, T.L. & C.Z. Wei (1983b) Asymptotic properties of general autoregressive models and strong consistency of least-squares estimates of their parameters. Journal of Multivariate Analysis 13, 1–23.
Lai, T.L. & C.Z. Wei (1985) Asymptotic properties of multivariate weighted sums with applications to stochastic regression in linear dynamic systems. In P.R. Krishnaiah (ed.), Multivariate Analysis, vol. VI, pp. 375–393. Elsevier Science Publishers.
Nielsen, B. (2000) The Asymptotic Distribution of Likelihood Ratio Test Statistics for Cointegration in Unstable Vector Autoregressive Processes. Discussion paper, Nuffield College, Oxford.
Nielsen, B. (2001a) The asymptotic distribution of unit root tests of unstable autoregressive processes. Econometrica 69, 211–219.
Nielsen, B. (2001b) Weak Consistency of Criterions for Order Determination in a General Vector Autoregression. Discussion paper, Nuffield College, Oxford.
Phillips, P.C.B. (1991) Optimal inference in cointegrated systems. Econometrica 59, 283–306.
Pötscher, B.M. (1989) Model selection under nonstationarity: Autoregressive models and stochastic linear regression models. Annals of Statistics 17, 1257–1274.
Rubin, H. (1950) Consistency of maximum-likelihood estimates in the explosive case. In T.C. Koopmans (ed.), Statistical Inference in Dynamic Economic Models, pp. 356–364. Wiley.
Wei, C.Z. (1985) Asymptotic properties of least-squares estimates in stochastic regression models. Annals of Statistics 13, 1498–1508.
Wei, C.Z. (1992) On predictive least squares principles. Annals of Statistics 20, 1–42.