
THE ASYMPTOTIC DISTRIBUTION OF THE COINTEGRATION RANK ESTIMATOR UNDER THE AKAIKE INFORMATION CRITERION

Published online by Cambridge University Press:  01 August 2004

George Kapetanios
Affiliation:
Queen Mary University of London

Abstract

We derive the asymptotic distribution of the estimate of the cointegration rank of a multivariate model when Akaike's information criterion is used. It is shown that the use of this criterion is ill-advised given that the estimate is severely upward biased even asymptotically.

I thank the editor, Professor Peter Phillips, and three anonymous referees for comments and suggestions that improved this paper significantly. All remaining errors are my own.

Type
MISCELLANEA
Copyright
© 2004 Cambridge University Press

1. INTRODUCTION

The determination of the cointegration rank of a multivariate cointegrated system has attracted considerable attention in the econometric literature for the past 15 years. The most widely used procedures for determining cointegration rank are those proposed by Johansen (1988). Alternative testing procedures have been suggested by, among others, Phillips and Ouliaris (1988), Stock and Watson (1988), Snell (1999), and Bierens (1997).

In this paper we start by reviewing the formal justification for applying model selection criteria to the choice of the cointegration rank. We note that the standard necessary and sufficient conditions for a criterion to be weakly consistent in lag order selection extend to the determination of the cointegration rank. The main result of the paper is the derivation of the asymptotic distribution of the cointegration rank estimate when the inconsistent Akaike information criterion (AIC) is used. Unlike in stationary models, where AIC approximates the Kullback–Leibler distance between the estimated model and the data generation process, there is no compelling theoretical reason for its use in rank selection in nonstationary cointegrated models. It is shown that the use of this criterion is ill-advised given that the estimate is severely upward biased even asymptotically. These results point toward the use of other criteria such as the Bayesian information criterion (BIC) and the posterior information criterion (PIC).

2. WEAK CONSISTENCY OF INFORMATION CRITERIA

We assume that the multivariate system may be written as an m-dimensional VAR(k) process given by

$$y_t = \mu + \Phi_1 y_{t-1} + \cdots + \Phi_k y_{t-k} + \varepsilon_t, \qquad t = 1,\dots,T, \tag{1}$$

where the error term εt is a zero-mean independent and identically distributed (i.i.d.) vector with finite positive definite covariance matrix. This VAR(k) process will be referred to as cointegrated of rank r if Π = I − Φ1 − ··· − Φk has rank r. In this case the matrix Π may be decomposed as Π = αβ′, where α and β are matrices of dimension m × r. The error correction representation¹ of the system is given by

$$\Delta y_t = \mu - \Pi y_{t-1} + \sum_{i=1}^{k-1}\Psi_i\,\Delta y_{t-i} + \varepsilon_t,$$

where Ψi, i = 1,…,k − 1, are functions of Φi, i = 1,…,k.

¹ We assume that the error correction representation exists by imposing extra conditions such as, e.g., condition (iii) of Chao and Phillips (1999).
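The mapping from the VAR coefficients to the error correction form can be made concrete. The sketch below assumes the sign convention Δy_t = μ − Πy_{t−1} + Σ_{i=1}^{k−1} Ψ_i Δy_{t−i} + ε_t with Π = I − Φ1 − ··· − Φk, under which Ψ_i = −(Φ_{i+1} + ··· + Φ_k); the function name `var_to_vecm` is hypothetical and only illustrates the algebra.

```python
import numpy as np

def var_to_vecm(phis):
    """Map VAR(k) matrices [Phi_1, ..., Phi_k] to (Pi, [Psi_1, ..., Psi_{k-1}]) for
    Delta y_t = mu - Pi y_{t-1} + sum_i Psi_i Delta y_{t-i} + eps_t (assumed convention)."""
    m = phis[0].shape[0]
    pi = np.eye(m) - sum(phis)                       # Pi = I - Phi_1 - ... - Phi_k
    # Psi_i = -(Phi_{i+1} + ... + Phi_k); phis[i] holds Phi_{i+1} (0-based list)
    psis = [-sum(phis[i:]) for i in range(1, len(phis))]
    return pi, psis

# Bivariate VAR(1) with Phi_1 = I - alpha beta', so that Pi = alpha beta' has rank one
alpha = np.array([[0.5], [0.0]])
beta = np.array([[1.0], [-1.0]])
pi1, _ = var_to_vecm([np.eye(2) - alpha @ beta.T])
print(np.linalg.matrix_rank(pi1))  # 1
```

In this example the single cointegrating vector β = (1, −1)′ gives a reduced-rank Π, which is exactly the situation in which the rank r must be chosen.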

The general form of the loss function minimized by information criteria is given by

$$IC(k) = -l_T(\hat\theta) + c_T(s),$$

where lT(θ) is the log likelihood of the model, s is the number of free parameters, and cT(s) is a penalty term promoting model parsimony that depends on s and the sample size. For three common information criteria the penalty terms are as follows: s (Akaike's information criterion) (Akaike, 1973), (s/2)ln(T) (Bayesian information criterion) (Schwarz, 1978), and s ln(ln(T)) (Hannan–Quinn information criterion [HQ]) (Hannan and Quinn, 1979). The model specification chosen is that for which IC(k) is minimized. In lag order selection it is well known that the estimated lag order for stationary and unit root nonstationary vector autoregressive (VAR) models will be weakly consistent iff

$$c_T(k) \xrightarrow{p} \infty \quad \text{and} \quad c_T(k)/T \xrightarrow{p} 0$$

as T → ∞ and cT(k) is bounded in k, where k is the lag order. For a proof of the latter (unit root nonstationary) case, for deterministic penalty terms, see Paulsen (1984). Clearly, whereas BIC and HQ are weakly consistent for lag order selection, AIC is not. This is a well-known result for AIC (see, e.g., Shibata, 1976). Note that we choose a general expression for the penalty term to accommodate other, less widely used criteria such as the generalized information criterion (GIC) (see Konishi and Kitagawa, 1996) and the PIC (see Phillips, 1996; Phillips and Ploberger, 1994; Phillips and Ploberger, 1996). Whereas AIC, BIC, and HQ have deterministic penalty terms, GIC and PIC have stochastic penalty terms (hence the notation concerning the conditions on the asymptotic behavior of the generic penalty term). These results have been shown by various authors to extend to more general model selection frameworks (see, e.g., Sin and White, 1996; Kapetanios, 2001).
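As an illustration of the selection rule, the sketch below evaluates the loss −lT + cT(s) for the three deterministic penalties listed above and returns the minimizing candidate. The log-likelihood values in the usage example are made up for illustration (with s = 4k, the count for a bivariate VAR(k) without intercept), and the function names are hypothetical.

```python
import math

def penalty(criterion: str, s: int, T: int) -> float:
    """Penalty c_T(s) for AIC, BIC and HQ as listed in the text."""
    if criterion == "AIC":
        return float(s)
    if criterion == "BIC":
        return 0.5 * s * math.log(T)
    if criterion == "HQ":
        return s * math.log(math.log(T))
    raise ValueError(f"unknown criterion: {criterion}")

def select_model(loglik: dict, n_params: dict, T: int, criterion: str):
    """Return the candidate minimizing IC = -l_T + c_T(s), plus all IC values."""
    ic = {key: -loglik[key] + penalty(criterion, n_params[key], T) for key in loglik}
    return min(ic, key=ic.get), ic

# Hypothetical maximized log likelihoods for lag orders k = 1, 2, 3
loglik = {1: -500.0, 2: -495.0, 3: -494.0}
n_params = {1: 4, 2: 8, 3: 12}
for crit in ("AIC", "BIC", "HQ"):
    print(crit, select_model(loglik, n_params, T=200, criterion=crit)[0])
```

With these illustrative numbers AIC selects k = 2 while BIC and HQ select k = 1: the AIC penalty does not grow with T, so moderate likelihood gains are enough to justify extra parameters.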

Aznar and Salvador (2000) show that the standard conditions on the penalty terms for weak consistency in lag selection extend to the cointegration framework. In particular, they show that the cointegration rank and the lag order may be jointly weakly consistently estimated iff cT(k) → ∞ and cT(k)/T → 0 as T → ∞. This result holds only for information criteria whose penalty terms are deterministic; criteria with stochastic penalty terms, such as the GIC, are therefore not covered. We also note that the asymptotic properties of PIC have been discussed in Chao and Phillips (1999), where weak consistency of PIC in jointly estimating cointegration rank and VAR lag order is established. Chao and Phillips (1999) is the first paper to give a consistency result for estimating cointegration rank via an information criterion.

3. THE ASYMPTOTIC DISTRIBUTION OF THE RANK ESTIMATE USING AKAIKE'S INFORMATION CRITERION

Following Pesaran and Pesaran (1997), we specify the number of free parameters for a multivariate cointegrated model with no intercept or time trend to be s = m²(k0 − 1) + 2mr − r², where k0 is the true lag order of the system. The penalty terms for AIC, BIC, and HQ are then given, respectively, by s, (s/2)ln(T), and s ln(ln(T)), or equivalently by

$$c_T(r) = C_T\left[m^2(k_0-1) + 2mr - r^2\right],$$

where

$$C_T = 1 \ (\text{AIC}), \qquad C_T = \tfrac{1}{2}\ln T \ (\text{BIC}), \qquad C_T = \ln(\ln T) \ (\text{HQ}).$$
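The rank-dependent part of the penalty can be tabulated directly. A minimal sketch, assuming the parameter count s = m²(k0 − 1) + 2mr − r² quoted above (no intercept or trend); the helper names are hypothetical. Since s(r + 1) − s(r) = 2m − 2r − 1, the extra penalty for each additional cointegrating vector shrinks as r grows, and under AIC it does not grow with T at all.

```python
import math

def n_free_params(m: int, k0: int, r: int) -> int:
    """s = m^2 (k0 - 1) + 2 m r - r^2 for a cointegrated VAR(k0) with rank r."""
    if not 0 <= r <= m:
        raise ValueError("rank must satisfy 0 <= r <= m")
    return m * m * (k0 - 1) + 2 * m * r - r * r

def rank_penalties(m: int, k0: int, T: int, criterion: str = "AIC") -> list[float]:
    """Penalty c_T(r) = C_T * s(r) for each rank r = 0, ..., m."""
    scale = {"AIC": 1.0, "BIC": 0.5 * math.log(T), "HQ": math.log(math.log(T))}[criterion]
    return [scale * n_free_params(m, k0, r) for r in range(m + 1)]

# m = 4 variables, VAR(2), AIC penalty by rank
print(rank_penalties(4, 2, 200))  # [16.0, 23.0, 28.0, 31.0, 32.0]
```

At full rank r = m the count equals m²k0, the number of coefficients of an unrestricted VAR(k0), which is a quick sanity check on the formula.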

It is clear that AIC is not consistent in rank determination, because its penalty does not diverge as T → ∞. Nevertheless, it is also clear that the probability of picking a rank lower than the true rank goes to zero asymptotically, as we show in the Appendix. The following theorem provides the means for determining the asymptotic probabilities that AIC will pick a higher rank than the true one.

THEOREM. Consider the VAR model of (1) with μ = 0 and known k0. The asymptotic distribution of the rank estimate r̂ obtained through AIC is given by

$$\lim_{T\to\infty} P(\hat r = r) = \begin{cases} 0, & 0 \le r < r_0, \\ p_r, & r_0 \le r \le m, \end{cases}$$

where the pr are given by expression (A.7) in the Appendix and r0 is the true rank of the model.

Note that we assume a known true lag order in deriving the asymptotic distribution under AIC. Unlike in the result of Aznar and Salvador (2000) on the joint determination of lag order and cointegration rank, allowing for an unknown lag order here would change the asymptotic distribution of the AIC rank estimate, since AIC itself selects the lag order inconsistently.

The asymptotic distribution of the estimate of the cointegration rank depends only on d = m − r0. Tables 1 and 2 report this distribution, obtained through simulation, for the case of no deterministic terms and the case of an unrestricted constant, respectively. Brownian motion is simulated using a random walk of 1,000 observations, and 5,000 replications are used.
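The simulation design described above can be sketched as follows for the case without deterministic terms. This is an illustrative reconstruction, not the author's code. It approximates the limiting eigenvalues λ1 > ··· > λd of |ρ∫WW′du − ∫W dW′ ∫dW W′| = 0 (W a d-dimensional standard Brownian motion) by a Gaussian random walk, and it assumes that, in the limit, AIC(r0 + r′) − AIC(r0) behaves like r′(2d − r′) − (λ1 + ··· + λr′)/2, so that the selected excess rank r′ = r̂ − r0 minimizes 2r′(2d − r′) − (λ1 + ··· + λr′). Function names are hypothetical, the output is a Monte Carlo estimate that will vary with the seed, and the unrestricted-constant case (which needs demeaned Brownian motion) is not covered.

```python
import numpy as np

def limit_eigenvalues(d: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Approximate the ordered roots of |rho * int W W' du - int W dW' int dW W'| = 0
    for a d-dimensional standard Brownian motion on [0, 1], using a random walk of n steps."""
    dW = rng.standard_normal((n, d)) / np.sqrt(n)   # increments over steps of length 1/n
    W = np.cumsum(dW, axis=0)
    W_lag = np.vstack([np.zeros((1, d)), W[:-1]])   # W at the start of each step (Ito sums)
    C = W_lag.T @ W_lag / n                         # approximates int W W' du
    B = W_lag.T @ dW                                # approximates int W dW'
    L = np.linalg.cholesky(C)
    M = np.linalg.solve(L, B)                       # L^{-1} B
    # Roots of |rho C - B B'| = 0 equal the eigenvalues of L^{-1} B B' L^{-T} = M M'.
    return np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]

def aic_excess_rank(lam: np.ndarray) -> int:
    """Limiting AIC choice of r' = r_hat - r0: argmin of 2 r'(2d - r') - sum_{i<=r'} lam_i."""
    d = len(lam)
    r = np.arange(d + 1)
    crit = 2 * r * (2 * d - r) - np.concatenate([[0.0], np.cumsum(lam)])
    return int(np.argmin(crit))

def asymptotic_rank_distribution(d: int, n: int = 1000, reps: int = 5000,
                                 seed: int = 0) -> np.ndarray:
    """Monte Carlo estimate of P(r_hat = r0 + r'), r' = 0, ..., d (d >= 1, no deterministic terms)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(d + 1)
    for _ in range(reps):
        counts[aic_excess_rank(limit_eigenvalues(d, n, rng))] += 1
    return counts / reps

probs = asymptotic_rank_distribution(d=3, reps=500)
print(probs.round(3))   # estimated probabilities for r_hat = r0, r0 + 1, r0 + 2, r0 + 3
```

The penalty term 2r′(2d − r′) stays bounded while the eigenvalue sums are O(1) random variables, which is why AIC keeps a non-vanishing probability of overshooting the true rank.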

TABLE 1. Asymptotic distribution of cointegration rank estimate under AIC when the model contains no deterministic terms

TABLE 2. Asymptotic distribution of cointegration rank estimate under AIC when the model contains an unrestricted constant

In the standard case of lag order selection, the asymptotic probability of AIC picking a lag order larger than the true one is quite small and declines rapidly for higher lag orders. The results for the cointegration rank estimate do not follow this pattern. The probabilities of overestimation are quite large and depend crucially on the nature of the deterministic terms included in the model. In most cases the true rank is not even the mode of the asymptotic distribution of the estimate. As m − r0 rises, the problem is further accentuated. In the extreme case considered in the tables, when m − r0 = 9 and a constant is included in the model, the probability of picking the right rank is just 1.9%.

The motivation behind the derivation of AIC is not consistency in the selection of the true model but optimality in terms of the goodness of the selected model as measured by the Kullback–Leibler information metric.²

Therefore, our result does not necessarily imply that the criterion is in general “bad” at selecting cointegration rank, because such a judgment would have to be related to a particular modeling purpose. Nevertheless, the optimality properties of AIC are established for stationary models, and there is currently no compelling theoretical reason motivating the use of AIC for rank determination in cointegration models. Furthermore, there is some evidence in favor of methods that are parsimonious in cointegration rank selection, such as BIC and PIC. Error correction models have generally been shown to have an advantage in forecasting, particularly over long horizons. However, Christoffersen and Diebold (1998) cast doubt on the notion that error correction models are better forecasting tools even at long horizons, at least with respect to the standard root mean square forecast error criterion. They also argue that, although unit roots are estimated consistently, modeling nonstationary series in (log) levels is likely to produce forecasts that are suboptimal in finite samples relative to a procedure that imposes unit roots, a phenomenon exacerbated by small-sample estimation bias. Developing this argument, they suggest that for cointegrated series it is better to overestimate rather than underestimate the number of common trends, or, in other words, to underestimate rather than overestimate the cointegrating rank.

² Note also the efficiency property of AIC in selecting the model order for linear models, discussed by Shibata (1980).

4. CONCLUSION

In this paper the asymptotic distribution of the rank estimator of a cointegrated model using AIC has been derived. The results are rather critical of the Akaike criterion in this context and point toward the use of other criteria such as BIC and PIC. The AIC estimate severely overestimates the rank. The overestimation is accentuated by the presence of deterministic terms in the model and by the magnitude of the difference between the true rank and the dimension of the model.

APPENDIX

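The proof of the theorem requires some of the results derived by Johansen (1988). To simplify matters we assume that the VAR process has zero mean and that no constant is included in the estimation. Extension to models with deterministic terms is straightforward. We denote the loss function used by AIC by

$$\mathrm{AIC}(r) = -l_T(r) + m^2(k_0-1) + 2mr - r^2,$$

where lT(r) is the maximized log likelihood of the model with cointegration rank r. Let $R_{0t}$ and $R_{1t}$ denote the residuals from regressing $\Delta y_t$ and $y_{t-1}$, respectively, on $\Delta y_{t-1},\dots,\Delta y_{t-k_0+1}$, and let

$$S_{ij} = T^{-1}\sum_{t=1}^{T} R_{it}R_{jt}', \qquad i,j = 0,1.$$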

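Finally, let G be the lower triangular Cholesky-type factor of $S_{11}^{-1}$, that is, $G'G = S_{11}^{-1}$, so that $G S_{11} G' = I_m$, and let $\hat\lambda_1 > \cdots > \hat\lambda_m$ be the eigenvalues of $G S_{10} S_{00}^{-1} S_{01} G'$. Then by, say, Proposition 11.1 of Lütkepohl (1991), the difference in the maximized log likelihood of the cointegrated VAR(k0) between cointegration ranks $r_2$ and $r_1$, with $r_2 > r_1$, is equal to

$$l_T(r_2) - l_T(r_1) = -\frac{T}{2}\sum_{i=r_1+1}^{r_2}\ln(1-\hat\lambda_i).$$

For $0 \le r_1 < r_0$, the eigenvalues $\hat\lambda_{r_1+1},\dots,\hat\lambda_{r_0}$ converge in probability to positive limits while the difference in the penalty terms is bounded, so it is clear that

$$\mathrm{AIC}(r_1) - \mathrm{AIC}(r_0) = -\frac{T}{2}\sum_{i=r_1+1}^{r_0}\ln(1-\hat\lambda_i) + O(1) \xrightarrow{p} \infty,$$

showing that Akaike's criterion will not pick a rank lower than the true one asymptotically in probability.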
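For a given data set, the eigenvalues and the AIC comparison across ranks can be computed as in the sketch below. It assumes no deterministic terms and a known lag order k0, concentrates out the lagged differences by least squares (the usual reduced-rank regression step), takes G as the inverse of the lower triangular Cholesky factor of S11 (so that G S11 G′ = I), and drops the terms of AIC(r) that do not depend on r. Function names and the simulated example are hypothetical.

```python
import numpy as np

def johansen_eigenvalues(y: np.ndarray, k0: int):
    """Eigenvalues (descending) of G S10 S00^{-1} S01 G' and effective sample size T.
    y is an (N, m) array of levels; no deterministic terms; k0 >= 1 is the known lag order."""
    dy = np.diff(y, axis=0)                       # dy[t-1] = Delta y_t
    T = dy.shape[0] - (k0 - 1)                    # usable observations
    Z0 = dy[k0 - 1:]                              # Delta y_t
    Z1 = y[k0 - 1:-1]                             # y_{t-1}
    if k0 > 1:
        # Lagged differences Delta y_{t-1}, ..., Delta y_{t-k0+1}, concentrated out by OLS
        Z2 = np.hstack([dy[k0 - 1 - i: dy.shape[0] - i] for i in range(1, k0)])
        R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
        R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]
    else:
        R0, R1 = Z0, Z1
    S00, S01, S11 = R0.T @ R0 / T, R0.T @ R1 / T, R1.T @ R1 / T
    G = np.linalg.inv(np.linalg.cholesky(S11))    # lower triangular, G S11 G' = I
    A = G @ S01.T @ np.linalg.solve(S00, S01) @ G.T
    lam = np.sort(np.linalg.eigvalsh((A + A.T) / 2))[::-1]  # symmetrize against rounding
    return lam, T

def aic_rank(lam: np.ndarray, T: int) -> int:
    """argmin over r of (T/2) sum_{i<=r} ln(1 - lam_i) + 2 m r - r^2 (constants dropped)."""
    m = len(lam)
    r = np.arange(m + 1)
    aic = 0.5 * T * np.concatenate([[0.0], np.cumsum(np.log1p(-lam))]) + 2 * m * r - r ** 2
    return int(np.argmin(aic))

# Hypothetical bivariate system with one cointegrating vector beta = (1, -1)'
rng = np.random.default_rng(42)
alpha, beta = np.array([0.5, 0.0]), np.array([1.0, -1.0])
y = np.zeros((501, 2))
for t in range(1, 501):
    y[t] = y[t - 1] - alpha * (beta @ y[t - 1]) + rng.standard_normal(2)
lam, T = johansen_eigenvalues(y, k0=1)
print(lam, aic_rank(lam, T))
```

The eigenvalues are squared canonical correlations, so they lie in [0, 1) and the logarithms are well defined.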

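For r > r0 we first note that

$$\hat\lambda_i \xrightarrow{p} 0, \qquad i = r_0+1,\dots,m,$$

by Lemma 4 of Johansen (1988). By a simple expansion we then have that

$$-T\ln(1-\hat\lambda_i) = T\hat\lambda_i + T\,O_p(\hat\lambda_i^2), \qquad i = r_0+1,\dots,m.$$

But by Lemma 6 of Johansen (1988) we have that $T\hat\lambda_{r_0+1},\dots,T\hat\lambda_m$ converge in distribution to the ordered eigenvalues of the equation

$$\left|\rho\int_0^1 W W'\,du - \int_0^1 W\,dW' \int_0^1 dW\,W'\right| = 0,$$

denoted by $\lambda_1,\dots,\lambda_{m-r_0}$, where W is an (m − r0)-dimensional standard Brownian motion.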

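We now concentrate on deriving the probabilities for r ≥ r0. For r ≥ r0 we have that

$$P(\hat r = r) = P\big(\mathrm{AIC}(r) \le \mathrm{AIC}(u),\ 0 \le u \le m\big)$$

is asymptotically equivalent to P(AIC(r) ≤ AIC(u), r0 ≤ u ≤ m). The asymptotic equivalence follows from the fact that the criterion will not pick a rank lower than r0 asymptotically, although it may do so in finite samples. Disregarding terms that are constant with respect to r, the log likelihood is given by

$$l_T(r) = -\frac{T}{2}\sum_{i=1}^{r}\ln(1-\hat\lambda_i).$$

Then,

$$P\big(\mathrm{AIC}(r) \le \mathrm{AIC}(u),\ r_0 \le u \le m\big) = P\Big(\frac{T}{2}\sum_{i=1}^{r}\ln(1-\hat\lambda_i) + 2mr - r^2 \le \frac{T}{2}\sum_{i=1}^{u}\ln(1-\hat\lambda_i) + 2mu - u^2,\ r_0 \le u \le m\Big). \tag{A.1}$$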

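But $T\hat\lambda_i^2 = (T\hat\lambda_i)\,\hat\lambda_i = o_p(1)$ for i > r0, so that $-T\ln(1-\hat\lambda_i) = T\hat\lambda_i + o_p(1)$. Therefore, for any ε > 0 there exists a positive integer M such that for all T larger than M the difference between the probability on the right-hand side of (A.1) and

$$P\Big(-\frac{1}{2}\sum_{i=r_0+1}^{r}T\hat\lambda_i + 2mr - r^2 \le -\frac{1}{2}\sum_{i=r_0+1}^{u}T\hat\lambda_i + 2mu - u^2,\ r_0 \le u \le m\Big) \tag{A.2}$$

is less than ε. (The terms with i ≤ r0 appear on both sides of the inequality in (A.1) and cancel, because r, u ≥ r0.) But the probability in (A.2) may be written as

$$P\Big(\frac{1}{2}\sum_{i=u+1}^{r}T\hat\lambda_i \ge 2m(r-u) - (r^2-u^2),\ r_0 \le u < r;\ \ \frac{1}{2}\sum_{j=r+1}^{u}T\hat\lambda_j \le 2m(u-r) - (u^2-r^2),\ r < u \le m\Big) \tag{A.3}$$

or

$$P\Big(\sum_{i=u+1}^{r}T\hat\lambda_i \ge 2(r-u)(2m-r-u),\ r_0 \le u < r;\ \ \sum_{j=r+1}^{u}T\hat\lambda_j \le 2(u-r)(2m-r-u),\ r < u \le m\Big). \tag{A.4}$$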

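By a change of indices and the weak convergence of

$$\big(T\hat\lambda_{r_0+1},\dots,T\hat\lambda_m\big) \Rightarrow \big(\lambda_1,\dots,\lambda_{m-r_0}\big),$$

we get that, asymptotically, the preceding probability is equivalent to

$$P\Big(\sum_{i'=u'+1}^{r'}\lambda_{i'} \ge 2(r'-u')(2m-2r_0-r'-u'),\ 0 \le u' < r';\ \ \sum_{j'=r'+1}^{u'}\lambda_{j'} \le 2(u'-r')(2m-2r_0-r'-u'),\ r' < u' \le m-r_0\Big), \tag{A.5}$$

where i′ = i − r0, j′ = j − r0, r′ = r − r0, and u′ = u − r0. Regrouping terms gives

$$P\Big(\sum_{i=u'+1}^{r'}\lambda_i \ge 2(r'-u')(2d-r'-u'),\ 0 \le u' < r';\ \ \sum_{j=r'+1}^{u'}\lambda_j \le 2(u'-r')(2d-r'-u'),\ r' < u' \le d\Big), \tag{A.6}$$

where d = m − r0. Define

$$S_{lq} = \sum_{i=l+1}^{q}\lambda_i - 2(q-l)(2d-l-q), \qquad 0 \le l < q \le d.$$

Then, the probability in (A.6) may be expressed as

$$p_r = P\big(S_{u'r'} \ge 0,\ 0 \le u' < r';\ \ S_{r'u'} \le 0,\ r' < u' \le d\big), \qquad r' = r - r_0. \tag{A.7}$$

The joint probability distribution of the Slq may easily be obtained by simulation using the standard results of Johansen (1988). █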
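The event in (A.7) can be checked mechanically for a given vector of limiting eigenvalues. The sketch below assumes S_lq = λ_{l+1} + ··· + λ_q − 2(q − l)(2d − l − q) and tests the conditions for each r′. Because S_{u′r′} equals f(u′) − f(r′) for u′ < r′, and S_{r′u′} equals f(r′) − f(u′) for u′ > r′, where f(r′) = 2r′(2d − r′) − (λ1 + ··· + λr′), exactly one r′ satisfies the event whenever no two values of f coincide (a probability-zero event), namely the minimizer of f. Function names are hypothetical, and the inputs in the usage line are arbitrary numbers, not draws from the limiting distribution.

```python
import numpy as np

def s_matrix(lam) -> np.ndarray:
    """S[l, q] = sum_{i=l+1}^{q} lam_i - 2 (q - l)(2d - l - q) for 0 <= l < q <= d (NaN elsewhere)."""
    lam = np.asarray(lam, dtype=float)
    d = len(lam)
    cum = np.concatenate([[0.0], np.cumsum(lam)])   # cum[q] = lam_1 + ... + lam_q
    S = np.full((d + 1, d + 1), np.nan)
    for l in range(d + 1):
        for q in range(l + 1, d + 1):
            S[l, q] = cum[q] - cum[l] - 2 * (q - l) * (2 * d - l - q)
    return S

def satisfies_a7(lam, r_prime: int) -> bool:
    """True if S[u', r'] >= 0 for all u' < r' and S[r', u'] <= 0 for all u' > r'."""
    S = s_matrix(lam)
    d = S.shape[0] - 1
    below = all(S[u, r_prime] >= 0 for u in range(r_prime))
    above = all(S[r_prime, u] <= 0 for u in range(r_prime + 1, d + 1))
    return below and above

def excess_rank_from_a7(lam) -> int:
    """Return the unique r' in 0..d satisfying the event in (A.7), assuming no ties."""
    hits = [rp for rp in range(len(lam) + 1) if satisfies_a7(lam, rp)]
    if len(hits) != 1:
        raise ValueError("ties in the criterion: the event in (A.7) is not unique")
    return hits[0]

print(excess_rank_from_a7([100.0, 0.1]))  # 1
```

Applied to simulated draws of the limiting eigenvalues, the relative frequency with which each r′ is returned estimates p_{r0 + r′}.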

REFERENCES

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. In B.N. Petrov and F. Csaki (eds.), 2nd International Symposium on Information Theory, pp. 267–281. Akademiai Kiado.
Aznar, A. & M. Salvador (2000) Selecting the rank of the cointegration space and the form of the intercept using an information criterion. Econometric Theory 18, 926–947.
Bierens, H. (1997) Nonparametric cointegration analysis. Journal of Econometrics 77, 379–404.
Chao, J. & P.C.B. Phillips (1999) Model selection in partially nonstationary vector autoregressive processes with reduced rank structure. Journal of Econometrics 91, 227–272.
Christoffersen, P.F. & F.X. Diebold (1998) Cointegration and long horizon forecasting. Journal of Business and Economic Statistics 16, 450–458.
Hannan, E.J. & B.G. Quinn (1979) The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B 41, 190–195.
Johansen, S. (1988) Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231–254.
Kapetanios, G. (2001) Model selection in threshold models. Journal of Time Series Analysis 22, 733–754.
Konishi, S. & G. Kitagawa (1996) Generalised information criteria in model selection. Biometrika 83, 875–890.
Lütkepohl, H. (1991) Introduction to Multiple Time Series Analysis. Springer.
Paulsen, J. (1984) Order determination of multivariate autoregressive time series with unit roots. Journal of Time Series Analysis 5, 115–127.
Pesaran, M.H. & B. Pesaran (1997) Microfit 4.0. Oxford University Press.
Phillips, P.C.B. (1996) Econometric model determination. Econometrica 64, 763–812.
Phillips, P.C.B. & S. Ouliaris (1988) Testing for cointegration using principal components methods. Journal of Economic Dynamics and Control 12, 205–230.
Phillips, P.C.B. & W. Ploberger (1994) Posterior odds testing for a unit root with data-based model selection. Econometric Theory 10, 774–808.
Phillips, P.C.B. & W. Ploberger (1996) An asymptotic theory of Bayesian inference for time series. Econometrica 64, 381–412.
Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics 6, 461–464.
Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63, 117–126.
Shibata, R. (1980) Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Annals of Statistics 8, 147–164.
Sin, C.Y. & H. White (1996) Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics 71, 207–225.
Snell, A. (1999) Testing for r versus r − 1 cointegrating vectors. Journal of Econometrics 88, 151–191.
Stock, J. & M. Watson (1988) Testing for common trends. Journal of the American Statistical Association 83, 1097–1107.