BREAK DATE ESTIMATION FOR VAR PROCESSES WITH LEVEL SHIFT WITH AN APPLICATION TO COINTEGRATION TESTING

Pentti Saikkonen; Helmut Lütkepohl; Carsten Trenkler

doi:10.1017/S0266466606060026


Published online by Cambridge University Press: 12 December 2005


Pentti Saikkonen: University of Helsinki
Helmut Lütkepohl: European University Institute, Florence and Humboldt University Berlin
Carsten Trenkler: Humboldt University Berlin


Abstract

In testing for the cointegrating rank of a vector autoregressive process it is important to take into account level shifts that have occurred in the sample period. Therefore the properties of estimators of the time period where a shift has taken place are investigated. The possible structural break is modeled as a simple shift in the level of the process. Two alternative estimators for the break date are considered, and their asymptotic properties are derived under various assumptions regarding the size of the shift. In particular, properties of the shift date estimators are obtained under the assumption of an increasing or decreasing size of the shift when the sample size grows. These results are used to explore the implications for testing the cointegrating rank of the process. A previously proposed likelihood ratio type test for the cointegrating rank and a modified version are considered, and their asymptotic properties are derived. It is shown that their asymptotic null distributions are unaffected by the level shift under the assumptions made for the shift size. The performance of the shift date estimators and the cointegrating rank tests in small samples is investigated by simulations.

We thank two referees for helpful comments, and we are grateful to the Deutsche Forschungsgemeinschaft, SFB 373, and the European Commission under the Training and Mobility of Researchers Programme (contract ERBFMRXCT980213) for financial support. The first author also acknowledges financial support by the Yrjö Jahnsson Foundation, the Academy of Finland, and the Alexander von Humboldt Foundation under a Humboldt research award. Part of this research was done while he was visiting the Humboldt University in Berlin, and part of the research was carried out while he and the third author were visiting the European University Institute, Florence.
An extended version of this paper is available as an EUI discussion paper under the title “Break Date Estimation and Cointegration Testing in VAR Processes with Level Shift,” ECO 2004/21.

Type: Research Article
Information: Econometric Theory, Volume 22, Issue 1, February 2006, pp. 15–68

DOI: https://doi.org/10.1017/S0266466606060026
Copyright: © 2006 Cambridge University Press

1. INTRODUCTION

From the unit root and cointegration testing literature it is well known that structural shifts in the time series of interest have a major impact on inference procedures. In particular, they affect the small-sample and asymptotic properties of unit root and cointegrating rank tests (see, e.g., Perron, 1989, for unit root testing; Lütkepohl, Saikkonen, and Trenkler, 2004, for cointegrating rank testing). In the latter article it is assumed that a level shift has occurred in a system of time series variables at an unknown time. Lütkepohl et al. propose to estimate the shift date in a first step and then apply a cointegrating rank test as follows. First the parameters of the deterministic part of the data generation process (DGP) are estimated by a feasible generalized least squares (GLS) procedure. Using these estimators, the original series is adjusted for deterministic terms including the structural shift, and a cointegrating rank test of the Johansen likelihood ratio (LR) type is applied to the adjusted series. They provide conditions under which the asymptotic null distribution of the cointegrating rank test in this procedure is unaffected by the level shift. They also show, however, that in small samples the way the break date is estimated may have an impact on the actual properties of the cointegrating rank test. In addition, the size of the level shift is important for the small-sample properties of the break date estimators and the tests.

Therefore, in this study we extend the results of Lütkepohl et al. (2004) in several directions. First, we consider an additional break date estimator. Second, we derive asymptotic properties of two break date estimators accounting explicitly for the size of the level shift. More precisely, we make the size of the level shift dependent on the sample size and provide asymptotic results for both increasing and decreasing shift sizes when the sample size goes to infinity. These results provide interesting new insights into the properties of the estimators and explain simulation results of Lütkepohl et al. that are difficult to understand if a fixed shift size is considered. Under our assumptions the null distribution of the cointegrating rank tests is still unaffected by the shift or the shift size, just as in the case of a fixed shift size. We also modify the cointegrating rank tests considered by Lütkepohl et al. In their approach, all parameters associated with the deterministic part of the model are estimated by the GLS procedure, although the level parameters are not fully identified. In this paper we propose to estimate the identified parameters only and modify the cointegrating rank tests accordingly. Finally, we perform a more detailed and more insightful investigation of the small-sample properties of the break date estimators and the resulting cointegrating rank tests by extending the simulation design of Lütkepohl et al.

Estimating the break date in a system of I(1) variables has also been considered by Bai, Lumsdaine, and Stock (1998). These authors consider the asymptotic distribution of a pseudo maximum likelihood (ML) estimator of the break date. Although we use a similar estimator, we do not derive the asymptotic distribution of the estimators but focus on rates of convergence. Our results are important for investigating the properties of inference procedures such as cointegrating rank tests that are based on a vector autoregressive (VAR) model with estimated break date. Although Bai et al. (1998) also discuss shift sizes that depend on the sample size, our results go beyond their analysis because we consider increasing in addition to decreasing shift sizes.

The study is structured as follows. In Section 2, the modeling framework of Lütkepohl et al. (2004) is summarized because that will be the basis for our investigation. Section 3 is devoted to a discussion of the break date estimators and their asymptotic properties. The properties of cointegrating rank tests based on a model with estimated break date are considered in Section 4, and small-sample simulation results of the break date estimators and the cointegrating rank tests are presented in Section 5. In Section 6, a summary and conclusions are given. The proofs of several theorems stated in the main body of the paper are given in the Appendix.

The following general notation will be used. The differencing and lag operators are denoted by Δ and L, respectively. The symbol I(d) denotes an integrated process of order d, that is, the stochastic part of the process is stationary or asymptotically stationary after differencing d times whereas it is still nonstationary after differencing just d − 1 times. Convergence in distribution is signified by $\xrightarrow{d}$, and i.i.d. stands for independently, identically distributed. As usual, O_p(·) and o_p(·) denote boundedness in probability and convergence to zero in probability, respectively. Moreover, ∥·∥ denotes the Euclidean norm. The trace, determinant, and rank of the matrix A are denoted by tr(A), det(A), and rk(A), respectively. If A is an (n × m) matrix of full column rank (n > m), we denote an orthogonal complement by A_⊥. The zero matrix is the orthogonal complement of a nonsingular square matrix, and an identity matrix of suitable dimension is the orthogonal complement of a zero matrix. An (n × n) identity matrix is denoted by I_n. For matrices A₁,…,A_s, diag[A₁ : ··· : A_s] is the block-diagonal matrix with A₁,…,A_s on the diagonal. LS, RR, and VECM are used to abbreviate least squares, reduced rank, and vector error correction model, respectively. As usual, a sum is defined to be zero if the lower bound of the summation index exceeds the upper bound.
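Because orthogonal complements recur throughout the paper, a numerical version may help. The sketch below assumes NumPy and SciPy; the helper name `orth_complement` is hypothetical and not from the paper. Since A_⊥ is unique only up to right multiplication by a nonsingular matrix, the orthonormal basis returned by `scipy.linalg.null_space` is just one valid choice, and an empty (n × 0) array is used to represent the complement of a nonsingular square matrix.

```python
import numpy as np
from scipy.linalg import null_space


def orth_complement(a):
    """Orthogonal complement A_perp of an (n x m) matrix A of full column rank.

    Returns an (n x (n - m)) matrix with orthonormal columns and A_perp' A = 0,
    so that [A : A_perp] is nonsingular. Following the conventions in the text,
    a nonsingular square A gets an empty (n x 0) complement and a zero matrix
    gets the identity matrix.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2:
        raise ValueError("A must be a 2-D array")
    n = a.shape[0]
    if not np.any(a):          # zero matrix (or no columns): complement is I_n
        return np.eye(n)
    return null_space(a.T)     # basis of the null space of A', i.e. of span(A)^perp
```

For example, `orth_complement(np.array([[1.0], [-1.0], [0.0]]))` returns a (3 × 2) matrix whose columns are orthogonal to (1, −1, 0)′.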

2. THE DATA GENERATION PROCESS

We use the general setup of Lütkepohl et al. (2004). Hence, y_t = (y_1t,…,y_nt)′ (t = 1,…,T) is assumed to be generated by a process with constant, linear trend, and level shift terms,

$$y_t = \mu_0 + \mu_1 t + \delta d_{t\tau} + x_t, \qquad t = 1,\dots,T. \tag{2.1}$$

Here μ_i (i = 0,1) and δ are unknown (n × 1) parameter vectors and d_tτ is a shift dummy variable representing a shift in period τ so that

$$d_{t\tau} = \begin{cases} 0, & t < \tau, \\ 1, & t \ge \tau. \end{cases}$$

We make the following assumption for the shift date τ.

Assumption 1. Let λ̲, λ, and λ̄ be fixed real numbers such that 0 < λ̲ ≤ λ ≤ λ̄ < 1. The shift date τ satisfies

$$\tau = \tau_T = [T\lambda],$$

where [·] denotes the integer part of the argument.

In other words, the shift is assumed to occur at a fixed fraction of the sample length. The shift date may not be at the very beginning or at the very end of the sample, although λ̲ and λ̄ may be arbitrarily close to zero and one, respectively. The condition has also been employed by Bai et al. (1998) in models containing I(1) variables. It is obviously not very restrictive.

The term μ₁ t may be dropped from (2.1) if μ₁ = 0 is known to hold and, thus, the DGP does not have a deterministic linear trend. The necessary adjustments in the following analysis are straightforward, and we will comment on this situation as we go along. Also, seasonal dummies may be added without major changes to our arguments. They are not included in our basic model to avoid more complex notation.

The process x_t is assumed to be at most I(1) and to have a VAR(p) representation. More precisely, we make the following assumption.

Assumption 2. The process x_t is at most I(1) with cointegrating rank r and

$$x_t = A_1 x_{t-1} + \cdots + A_p x_{t-p} + \varepsilon_t, \qquad t = 1, 2, \dots,$$

where the A_j are (n × n) coefficient matrices. The initial values x_t, t ≤ 0, are assumed to be such that the cointegration relations and Δx_t are stationary. The ε_t are i.i.d.(0,Ω) with positive definite covariance matrix Ω and existing moments of order b > 4.

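Under Assumption 2, the process x_t has the VECM form

$$\Delta x_t = \Pi x_{t-1} + \sum_{j=1}^{p-1}\Gamma_j \Delta x_{t-j} + \varepsilon_t, \qquad t = 1, 2, \dots, \tag{2.5}$$

where Π = −(I_n − A₁ − ··· − A_p) and Γ_j = −(A_{j+1} + ··· + A_p) (j = 1,…,p − 1) are (n × n) matrices. Because the cointegrating rank is r, the matrix Π can be written as Π = αβ′, where α and β are (n × r) matrices of full column rank. As is well known, β′x_t and Δx_t are then zero mean I(0) processes. Defining

$$\Psi = I_n - \sum_{j=1}^{p-1}\Gamma_j \qquad\text{and}\qquad C = \beta_\perp(\alpha_\perp'\Psi\beta_\perp)^{-1}\alpha_\perp',$$

we have

$$x_t = C\sum_{j=1}^{t}\varepsilon_j + \xi_t, \qquad t = 1, 2, \dots,$$

where ξ_t is an I(0) process. These properties follow from Granger's representation theorem. Further details including a precise expression of ξ_t are given in Johansen (1995, pp. 49–52).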
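The mapping from the VAR coefficients of Assumption 2 to the VECM quantities Π, Γ_j, Ψ, and C can be sketched numerically as follows. This is a minimal NumPy/SciPy sketch with hypothetical function names; it computes C with orthonormal complements, which is harmless because C = β_⊥(α_⊥′Ψβ_⊥)⁻¹α_⊥′ does not change when β_⊥ or α_⊥ is multiplied from the right by a nonsingular matrix.

```python
import numpy as np
from scipy.linalg import null_space


def vecm_from_var(a_list):
    """Map VAR(p) coefficients [A_1, ..., A_p] to (Pi, [Gamma_1..Gamma_{p-1}], Psi)."""
    p = len(a_list)
    n = a_list[0].shape[0]
    pi = -(np.eye(n) - sum(a_list))                  # Pi = -(I_n - A_1 - ... - A_p)
    gammas = [-sum(a_list[j:]) for j in range(1, p)] # Gamma_j = -(A_{j+1} + ... + A_p)
    psi = np.eye(n) - sum(gammas, np.zeros((n, n)))  # Psi = I_n - sum_j Gamma_j
    return pi, gammas, psi


def granger_c(alpha, beta, psi):
    """C = beta_perp (alpha_perp' Psi beta_perp)^{-1} alpha_perp'."""
    a_perp = null_space(alpha.T)
    b_perp = null_space(beta.T)
    return b_perp @ np.linalg.solve(a_perp.T @ psi @ b_perp, a_perp.T)
```

By construction β′C = 0 and Cα = 0, so C annihilates the error correction term, which is what makes the random walk part CΣε_j live in the nonstationary directions.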

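Multiplying (2.1) by A(L) = I_n − A₁L − ··· − A_pL^p = I_nΔ − ΠL − Γ₁ΔL − ··· − Γ_{p−1}ΔL^{p−1} yields

$$\Delta y_t = \nu + \alpha\big(\beta' y_{t-1} - \phi(t-1) - \theta d_{t-1,\tau}\big) + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + \sum_{j=0}^{p-1}\gamma_j^*\Delta d_{t-j,\tau} + \varepsilon_t, \tag{2.7}$$

where ν = −Πμ₀ + Ψμ₁, φ = β′μ₁, θ = β′δ, γ₀* = δ, and γ_j* = −Γ_jδ for j = 1,…,p − 1. The quantity Δd_{t−j,τ} is an impulse dummy with value one in period t = τ + j and zero elsewhere.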
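The dummy variables in (2.7) can be generated directly from their definitions. This NumPy sketch uses hypothetical helper names and treats presample values of d_tτ as zero (our assumption, which holds automatically for τ ≥ 1); it confirms that Δd_{t−j,τ} is an impulse in period τ + j.

```python
import numpy as np


def shift_dummy(T, tau):
    """d_{t,tau} for t = 1..T: zero before period tau and one from tau on."""
    t = np.arange(1, T + 1)
    return (t >= tau).astype(float)


def lagged_impulse(T, tau, j):
    """Delta d_{t-j,tau} for t = 1..T; equals one only in period t = tau + j."""
    t = np.arange(1, T + 1)

    def d(s):
        # shift dummy at (possibly presample) periods s; zero for s < tau
        return (s >= tau).astype(float)

    return d(t - j) - d(t - j - 1)
```

For T = 10 and τ = 4, `lagged_impulse(10, 4, 2)` is nonzero only in period t = 6.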

For given values of the VAR order p and the shift date τ, Johansen type cointegration tests can be performed in our model framework. In the next section we will discuss two different estimators of the break date in detail, and then we will consider cointegration tests based on a model with estimated break date in Section 4.

3. SHIFT DATE ESTIMATION

In the following discussion we consider two different estimators of the shift date τ. The first one is based on estimating an unrestricted VAR model in which the cointegrating rank and the restrictions for the parameters related to the impulse dummies are not taken into account. The latter restrictions are accounted for by the second estimator. At the end of this section we briefly mention a third possible estimator and some of its properties. For all procedures we assume that the VAR order p is given or has been chosen by some statistical procedure in a previous step. For the time being it is assumed to be known.

3.1. Estimator Based on Unrestricted Model

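As discussed previously, our first estimator of τ is based on the model

$$\Delta y_t = \nu_0 + \nu_1 t + \Pi y_{t-1} + \delta_1 d_{t\tau} + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + \sum_{j=0}^{p-1}\gamma_j\Delta d_{t-j,\tau} + \varepsilon_t, \qquad t = p+1,\dots,T, \tag{3.1}$$

which is obtained from (2.7) by imposing no rank restriction on Π and rearranging terms. Here ν₀ = ν + Πμ₁, ν₁ = −Πμ₁, δ₁ = −Πδ, γ₀ = δ − δ₁, γ_j = γ_j* (j = 1,…,p − 1), and T is the sample size. The shift date is estimated as

$$\hat\tau = \arg\min_{\tau\in\mathcal{T}} \det\Big(\sum_{t=p+1}^{T}\hat\varepsilon_{t\tau}\hat\varepsilon_{t\tau}'\Big), \tag{3.2}$$

where the ε̂_tτ are LS residuals from (3.1) and 𝒯 = [Tλ̲, Tλ̄] is the set of all shift dates considered. Notice that 𝒯 cannot include all sample periods if Assumption 1 is made. Moreover, there may be nonsample information regarding the possible shift dates that makes it desirable to limit the search to a specific part of the sample period.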

Instead of using the determinant of the residual covariance matrix as a criterion function for estimating the break date, one could consider other criteria such as the trace. We have chosen the determinant because it is in line with the Gaussian ML setup (for unknown cointegration rank), which can be viewed as the motivation for the LS estimator of the other parameters. Note, however, that we do not assume y_t to be Gaussian.
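The estimator (3.2) amounts to a grid search: for each candidate τ, model (3.1) is fitted by LS and the log-determinant of the residual cross-product matrix is recorded (taking logs leaves the minimizer unchanged). The following is a minimal NumPy sketch, assuming y is a (T × n) array with y₁ in its first row, τ ≥ 2, and p ≥ 1; all function names are hypothetical. Setting `impulses=False` drops the impulse dummies, which gives the variant discussed in Section 3.3.

```python
import numpy as np


def design_31(y, tau, p, impulses=True):
    """Left- and right-hand sides of model (3.1) for t = p+1, ..., T.

    Regressors: intercept, trend t, y_{t-1}, shift dummy d_{t,tau},
    Delta y_{t-j} (j = 1..p-1) and, if impulses is True, the impulse
    dummies Delta d_{t-j,tau} (j = 0..p-1).
    """
    T, n = y.shape
    t = np.arange(1, T + 1)
    d = (t >= tau).astype(float)
    dy = np.vstack([np.zeros((1, n)), np.diff(y, axis=0)])  # row k holds Delta y_{k+1}
    dd = np.concatenate([[0.0], np.diff(d)])               # row k holds Delta d_{k+1,tau} (tau >= 2)
    rows = np.arange(p, T)                                  # rows for periods p+1..T
    cols = [np.ones(rows.size), t[rows].astype(float), *y[rows - 1].T, d[rows]]
    for j in range(1, p):
        cols.extend(dy[rows - j].T)
    if impulses:
        cols.extend(dd[rows - j] for j in range(p))
    return dy[rows], np.column_stack(cols)


def log_det_criterion(y, tau, p, impulses=True):
    """log det of the LS residual cross-product matrix from (3.1)."""
    lhs, x = design_31(y, tau, p, impulses)
    coef, *_ = np.linalg.lstsq(x, lhs, rcond=None)
    resid = lhs - x @ coef
    return np.linalg.slogdet(resid.T @ resid)[1]


def estimate_break_date(y, p, candidates, impulses=True):
    """Grid search over candidate shift dates (the set in (3.2))."""
    crit = [log_det_criterion(y, tau, p, impulses) for tau in candidates]
    return candidates[int(np.argmin(crit))]
```

With a large simulated shift, `estimate_break_date(y, 1, list(range(20, 181)))` should typically recover the true date; with an overspecified order p and γ_{p−1} = 0 it may instead pick an earlier date, in line with the discussion of Theorem 3.1.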

We assume that the size of the shift depends on the sample size and may increase or decrease when the sample size gets larger. More precisely, we make the following assumption for the parameter δ.

Assumption 3. For some fixed (n × 1) vector δ_*, δ = δ_T = T^aδ_*, a ≤ ½.

Thus, we allow for a decreasing, constant, or increasing shift size with growing sample size, depending on a being smaller than, equal to, or greater than zero, respectively. In most cases there will be no need to use the subscript T, and so the notation δ will usually be used instead of δ_T. The same convention applies to parameters depending on δ (e.g., δ₁) and their estimators. As mentioned earlier, break date estimation when the shift size decreases with increasing sample size has also been discussed by other authors (Bai et al., 1998). For our purposes a lower bound for a is not needed because for a small shift size the break has no effect on the cointegration tests that will be considered later, even though the break date may be more difficult to estimate in that situation. An increasing shift size is treated here for completeness, and it turns out that it provides interesting insights into the actual behavior of our shift date estimators, as will be seen in the simulations in Section 5. Moreover, letting the shift size increase with the sample size may provide information on problems related to large shifts. In particular, it is of interest to check whether large shifts may affect the asymptotic distribution of the cointegrating rank tests discussed in Section 4. The upper bound a = ½ for the rate of increase of the shift size is imposed for technical reasons because it is needed in our proofs. From a practical point of view such a bound should not be a problem because there may not be a need to estimate the shift date by formal statistical methods if the shift size is very large. We can now present asymptotic properties of our estimator τ̂ that generalize results presented in Lütkepohl et al. (2004).

THEOREM 3.1. Suppose Assumptions 1–3 hold.

For δ₁ ≠ 0 and a = 0, Lütkepohl et al. (2004) have shown that τ̂ − τ = O_p(1), which is obviously a special case of our theorem. In fact, Theorem 3.1(i) shows that when the size of the break is sufficiently large, that is, a > 1/b, or a > 0 and δ₁ ≠ 0, the break date can be estimated accurately. More precisely, asymptotically the break date can then be located at the true break date or just a few time points before the true break date. Estimating the break date larger than the true one cannot occur in large samples. However, consistent estimation of the break date is not possible without an additional assumption for the parameters related to the impulse dummies in model (3.1). The required assumption γ_{p−1} ≠ 0 can be seen as an identification condition for the break date. Indeed, if γ_{p−1} = 0 and γ_{p−2} ≠ 0, Theorem 3.1(i) only tells us that asymptotically the break date estimator will take a value that is either the true break date or the preceding time point. The intuition for this is that one of the p − 1 impulse dummies in (3.1) can be used to allow for such an incorrect estimation of the break date. In this case, even if we choose a break date one smaller than the true one, we can still obtain a correct model specification with white noise errors. A similar situation occurs when more than one of the parameters γ_j at the largest lags are zero. Notice also that γ_j = 0 for all j = 0,…,p − 1 can only occur if δ₁ ≠ 0 because δ ≠ 0 and γ₀ = δ − δ₁.

The preceding discussion implies that an overspecification of the VAR order will always make the break date estimator τ̂ inconsistent, because the true Γ_{p−1} is then zero and hence γ_{p−1} = −Γ_{p−1}δ = 0. This observation explains some of the small-sample results of Lütkepohl et al. (2004). These authors fitted VAR(3) models to VAR(1) DGPs and found that τ̂ often underestimated the true break date. In principle the same phenomenon can also occur in other situations where γ_{p−1} = 0. However, because γ₀ is always nonzero when δ ≠ 0 (and p ≥ 1), reasoning similar to that used previously explains why the break date will asymptotically not be estimated larger than the true one.

The second part of Theorem 3.1 deals with the asymptotic behavior of the estimator τ̂ when the size of the break is “small.” In this case we need to assume that δ₁ ≠ 0, that is, that there is actually a level shift in model (3.1) and not just some exceptional observations that can be handled with impulse dummies. This assumption is not needed in the first part of the theorem where the size of the break is “large” (a > 1/b) because then even the impulse dummies can be used to estimate the break date accurately. However, even though consistent estimation of the break date is not possible in the case of Theorem 3.1(ii), consistent estimation of the sample fraction λ is still possible provided the size of the break is not “too small.” The result obtained in this context is weaker than its previous counterparts in Bai (1994), which require only a > −½ instead of a > η − ½ (see, e.g., Proposition 3 of Bai, 1994). Complications caused by the presence of impulse dummies in model (3.1) are the reason for our weaker result. In any case, our assumption a > η − ½ is equivalent to −2a/(1 − 2η) < 1, which is clearly not very restrictive because |τ̂ − τ| cannot be larger than T and is hence necessarily O_p(T).
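The link between the rate condition and consistency for λ can be spelled out in one line. The derivation below assumes that Theorem 3.1(ii) gives τ̂ − τ = O_p(T^{−2a/(1−2η)}) with η < ¼ (the form implied by the equivalence just stated and paralleling Theorem 3.2(ii)), and uses τ = [Tλ] from Assumption 1, so that |τ − Tλ| < 1.

```latex
\frac{\hat\tau}{T} - \lambda
  = \frac{\hat\tau - \tau}{T} + \frac{[T\lambda] - T\lambda}{T}
  = O_p\!\left(T^{-\frac{2a}{1-2\eta} - 1}\right) + O\!\left(T^{-1}\right),
\qquad\text{so}\qquad
\frac{\hat\tau}{T} \xrightarrow{p} \lambda
\quad\text{whenever}\quad
-\frac{2a}{1-2\eta} < 1
\;\Longleftrightarrow\;
a > \eta - \tfrac{1}{2},
```

where the last equivalence multiplies through by 1 − 2η > 0.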

As mentioned in the introduction, Bai et al. (1998) considered the asymptotic distribution of the break date and found that the resulting interval estimator for the break date depends on the dimension of the system under consideration. Such dependence on the dimension of the model is not obtained with our approach, which provides orders of convergence only.

3.2. Constrained Estimation of τ

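We shall now consider the constrained estimation of the break date in which the restrictions between the autoregressive parameters and coefficients related to the dummies are taken into account. Instead of (3.1) it is now convenient to start with the specification

$$\Delta y_t = \nu_0 + \nu_1 t + \Pi y_{t-1} + \delta_1 d_{t-1,\tau} + \sum_{j=1}^{p-1}\Gamma_j\Delta y_{t-j} + \sum_{j=0}^{p-1}\gamma_j^*\Delta d_{t-j,\tau} + \varepsilon_t, \tag{3.3}$$

where δ₁ = −Πδ, as before, and the γ_j* are as in (2.7). Thus, we can write (3.3) as

$$\Delta y_t = \nu_0 + \nu_1 t + \Pi\big(y_{t-1} - \delta d_{t-1,\tau}\big) + \sum_{j=1}^{p-1}\Gamma_j\big(\Delta y_{t-j} - \delta\,\Delta d_{t-j,\tau}\big) + \delta\,\Delta d_{t\tau} + \varepsilon_t. \tag{3.4}$$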

Unlike in the unrestricted model (3.1), the impulse dummies do not appear separately anymore in the representation (3.4) but are included in the term that also involves the shift dummy. Thus, only a single parameter vector δ is associated with all the dummy variables. Consequently, the break date can be estimated more precisely, as we will see in the next theorem.

For any given value of the break date τ the parameters ν₀, ν₁, δ, Π, and Γ₁,…,Γ_{p−1} can be estimated from (3.4) by nonlinear LS. The estimator of the break date is then obtained by minimizing an analog of (3.2) with ε̂_tτ replaced by residuals from this nonlinear LS estimation. The following theorem presents asymptotic properties of this break date estimator, denoted by τ̃.

THEOREM 3.2. Let Assumptions 1–3 hold and suppose that δ ≠ 0.

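(i) If a > 0 and δ₁ ≠ 0, or a > 1/b, then

$$\lim_{T\to\infty} P(\tilde\tau = \tau) = 1.$$

(ii) If a ≤ 0 and δ₁ ≠ 0, then

$$\tilde\tau - \tau = O_p\big(T^{-2a/(1-2\eta)}\big),$$

where 1/b < η < ¼.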

The first part of the theorem shows that taking the restrictions into account is beneficial. Unlike in Theorem 3.1(i) consistency now obtains without any additional assumptions about coefficients. The second part of the theorem, which deals with the case of a “small” break size, is similar to its previous counterpart, however.

As a final remark on our two break date estimators we mention that, if the DGP is known to have no deterministic linear trend (μ₁ = 0), the corresponding terms in (3.1), (3.3), and (3.4) may be dropped without changing the convergence rates of our break date estimators.
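A sketch of τ̃ requires one assumption about the nonlinear LS step. Model (3.4) is a VAR in levels with intercept and trend for z_t = y_t − δd_tτ, so for fixed δ the remaining parameters can be concentrated out by LS; the sketch then minimizes the log-determinant criterion over δ numerically with SciPy's Nelder-Mead. This concentrated route, the crude start value y_τ − y_{τ−1}, and the function names are our own choices and are not necessarily the authors' algorithm.

```python
import numpy as np
from scipy.optimize import minimize


def var_levels_logdet(z, p):
    """log det of LS residual cross-products for
    Delta z_t = nu_0 + nu_1 t + Pi z_{t-1} + sum_j Gamma_j Delta z_{t-j} + e_t."""
    T, n = z.shape
    dz = np.vstack([np.zeros((1, n)), np.diff(z, axis=0)])  # row k holds Delta z_{k+1}
    rows = np.arange(p, T)                                   # periods p+1..T
    cols = [np.ones(rows.size), rows + 1.0, *z[rows - 1].T]
    for j in range(1, p):
        cols.extend(dz[rows - j].T)
    x = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(x, dz[rows], rcond=None)
    resid = dz[rows] - x @ coef
    return np.linalg.slogdet(resid.T @ resid)[1]


def constrained_criterion(y, tau, p):
    """Criterion for (3.4) at a given tau, minimized over delta numerically."""
    T = y.shape[0]
    d = (np.arange(1, T + 1) >= tau).astype(float)[:, None]
    start = y[tau - 1] - y[tau - 2]          # crude start: jump y_tau - y_{tau-1}
    res = minimize(lambda delta: var_levels_logdet(y - d * delta, p), start,
                   method="Nelder-Mead")
    return res.fun


def estimate_break_date_constrained(y, p, candidates):
    crit = [constrained_criterion(y, tau, p) for tau in candidates]
    return candidates[int(np.argmin(crit))]
```

Because a single δ enters all dummy terms, this criterion cannot trade an earlier shift date against a free impulse coefficient, which is the intuition behind Theorem 3.2(i).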

3.3. Ignoring Dummies in Estimating τ

Lütkepohl et al. (2004) also considered estimating the break date based on the VAR model (3.1) without including the impulse dummies. Thus the resulting break date estimator, say, τ̆, is actually based on a misspecified model. In the present model framework, where the shift size depends on the sample size, it can in fact be shown that the estimator τ̆ works well, provided δ₁ ≠ 0. More precisely, for a > 0, lim_{T→∞} P(τ̆ = τ) = 1, and for a ≤ 0, τ̆ − τ = O_p(T^{−2a/(1−2η)}), where η > 0 (for details see Saikkonen, Lütkepohl, and Trenkler, 2004). Thus, although τ̆ is based on a misspecified model, its convergence rate is as good as that of the other two estimators, provided δ₁ ≠ 0. Clearly, δ₁ = −αβ′δ = 0 may hold even if δ ≠ 0. In fact, δ₁ = 0 always holds if the cointegrating rank is zero. If δ₁ = 0, there is co-breaking, and the process β′y_t has no break. For such processes, τ̆ can find the shift date only by chance, whereas τ̂ and τ̃ can still find the true break date with some likelihood in large samples, if the shift size is large. Thus, using only the estimator τ̆ may be problematic, unless the case δ₁ = 0 can be ruled out. In the next section we consider the consequences of using a model with estimated break date for testing the cointegrating rank of a system of time series variables.
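Whether co-breaking occurs can be read off δ₁ = −αβ′δ directly. The small NumPy sketch below uses hypothetical helper names and a bivariate example with Π = diag(−0.5, 0): a shift in the stationary component gives δ₁ ≠ 0, whereas a shift only in the random walk component is co-breaking, so a dummy-free estimator such as τ̆ has no shift term to detect.

```python
import numpy as np


def delta1(alpha, beta, delta):
    """Shift-dummy coefficient in (3.1): delta_1 = -Pi delta = -alpha beta' delta."""
    return -alpha @ (beta.T @ delta)


def is_cobreaking(alpha, beta, delta, atol=1e-12):
    """True if beta' delta = 0, i.e. beta' y_t has no break (alpha of full column rank)."""
    return bool(np.allclose(beta.T @ delta, 0.0, atol=atol))
```

For α = (−0.5, 0)′ and β = (1, 0)′, a shift δ = (2, 0)′ gives δ₁ = (1, 0)′, while δ = (0, 3)′ gives δ₁ = 0.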

4. TESTING THE COINTEGRATING RANK

For given VAR order p and some estimator of the shift date, the cointegrating rank of the DGP can be tested as discussed by Lütkepohl et al. (2004). In the following discussion it is assumed that the break date estimator is either τ̂ or τ̃. The objective is to test a pair of hypotheses

$$H_0(r_0): \operatorname{rk}(\Pi) = r_0 \qquad\text{versus}\qquad H_1(r_0): \operatorname{rk}(\Pi) > r_0.$$

Lütkepohl et al. propose using the tests suggested by Saikkonen and Lütkepohl (2000a). In their procedure, first-stage estimators for the parameters of the error process x_t, that is, for α, β, Γ_j (j = 1,…,p − 1), and Ω are determined by RR regression applied to (2.7). Using these estimators, Lütkepohl et al. apply a feasible GLS procedure to (2.1) to estimate all the parameters of the deterministic part. The observations are then adjusted for deterministic terms, and LR type cointegration tests can be formed in the usual way by solving the related generalized eigenvalue problem based on the adjusted series (for details see Johansen, 1995, Thm. 6.3). The resulting test statistic will be denoted by LR^GLS(r₀) in the following discussion.

The levels parameter μ₀ is not identified in the direction of β_⊥ in our model setup, and its estimator is partly determined by the initial values in the procedure underlying the LR^GLS test. In fact, the dependence of the LR^GLS test on initial values was sometimes found to be relevant in preliminary simulations. A detailed theoretical analysis of the impact of initial values on related unit root tests is provided by Müller and Elliott (2003). Given the dependence of the LR^GLS tests on initial values, one may hope to improve the performance of the tests by avoiding the estimation of μ₀. Therefore we shall also consider another approach in which only the parameters μ₁ and δ in the deterministic part are estimated. The effect of the level parameter will be taken into account when the test is performed.

We present the estimation procedure of the parameters μ₁ and δ for a given VAR order p, cointegrating rank r, and break date τ. First consider the estimation of the parameter μ₁. Recall the identity ν = −Πμ₀ + Ψμ₁, which can be written as

$$\nu = -\Pi\mu_0 + \Psi\beta(\beta'\beta)^{-1}\beta'\mu_1 + \Psi\beta_\perp(\beta_\perp'\beta_\perp)^{-1}\beta_\perp'\mu_1$$

or, more briefly,

$$\nu = -\Pi\mu_0 + \Psi_\beta\phi + \Psi_{\beta_\perp}\phi_*,$$

where φ = β′μ₁, φ_* = β_⊥′μ₁, Ψ_β = Ψβ(β′β)⁻¹, and Ψ_{β_⊥} = Ψβ_⊥(β_⊥′β_⊥)⁻¹. Because α_⊥′Π = α_⊥′αβ′ = 0, a multiplication of this identity from the left by α_⊥′ yields α_⊥′(ν − Ψ_βφ) = α_⊥′Ψ_{β_⊥}φ_*. The matrix α_⊥′Ψ_{β_⊥} is nonsingular, and its inverse is (α_⊥′Ψ_{β_⊥})⁻¹ = β_⊥′β_⊥(α_⊥′Ψβ_⊥)⁻¹. Thus, we can solve for φ_* as follows: φ_* = β_⊥′C(ν − Ψ_βφ), where C = β_⊥(α_⊥′Ψβ_⊥)⁻¹α_⊥′ as before. Thus, if Ĉ and Ψ̂_β are sample analogs of C and Ψ_β, respectively, based on the RR estimation of (2.7), an estimator of φ_* is given by

$$\hat\phi_* = \hat\beta_\perp'\hat C(\hat\nu - \hat\Psi_\beta\hat\phi).$$

Here ν̂, φ̂, and β̂_⊥ are also based on the RR estimation of (2.7). Using the estimators φ̂ and φ̂_* together, we can form an estimator for μ₁ as

$$\hat\mu_1 = \hat\beta(\hat\beta'\hat\beta)^{-1}\hat\phi + \hat\beta_\perp(\hat\beta_\perp'\hat\beta_\perp)^{-1}\hat\phi_*.$$
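The algebra leading to φ_* can be checked numerically. The sketch below (NumPy/SciPy, hypothetical function name) plugs in population values of ν, φ, α, β, and Γ_j instead of RR estimates, solves φ_* = β_⊥′C(ν − Ψ_βφ), and reassembles μ₁ from the exact decomposition μ₁ = β(β′β)⁻¹φ + β_⊥(β_⊥′β_⊥)⁻¹φ_*. Feeding it estimated quantities instead would mimic the estimator of μ₁; that use is an assumption of this illustration, not code from the paper.

```python
import numpy as np
from scipy.linalg import null_space


def recover_mu1(nu, phi, alpha, beta, gammas):
    """Solve nu = -Pi mu_0 + Psi mu_1 for mu_1, given phi = beta' mu_1.

    The term -Pi mu_0 drops out after premultiplying by alpha_perp'.
    """
    n = alpha.shape[0]
    psi = np.eye(n) - sum(gammas, np.zeros((n, n)))           # Psi = I_n - sum_j Gamma_j
    a_perp, b_perp = null_space(alpha.T), null_space(beta.T)
    c = b_perp @ np.linalg.solve(a_perp.T @ psi @ b_perp, a_perp.T)
    psi_beta = psi @ beta @ np.linalg.inv(beta.T @ beta)      # Psi_beta
    phi_star = b_perp.T @ c @ (nu - psi_beta @ phi)           # phi_* = beta_perp' mu_1
    return (beta @ np.linalg.solve(beta.T @ beta, phi)
            + b_perp @ np.linalg.solve(b_perp.T @ b_perp, phi_star))
```

Note that μ₀ itself is never recovered, which matches the motivation for the LR^PAR approach: only the identified parameters are estimated.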

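The parameter δ can be estimated in a similar way. From the definitions we find that

$$\begin{bmatrix}\gamma_0^* \\ \gamma_1^* \\ \vdots \\ \gamma_{p-1}^*\end{bmatrix} = \begin{bmatrix} I_n \\ -\Gamma_1 \\ \vdots \\ -\Gamma_{p-1}\end{bmatrix}\delta.$$

Multiplying this equation from the left by the matrix [α_⊥′ : ··· : α_⊥′] yields

$$\alpha_\perp'\sum_{j=0}^{p-1}\gamma_j^* = \alpha_\perp'\Psi\delta = \alpha_\perp'\big(\Psi_\beta\theta + \Psi_{\beta_\perp}\theta_*\big),$$

where θ_* = β_⊥′δ and θ = β′δ as in (2.7). From the foregoing equation we can solve for θ_* in the same way as for φ_*. The result is θ_* = β_⊥′C(Σ_{j=0}^{p−1}γ_j* − Ψ_βθ), from which we form an estimator for θ_* as

$$\hat\theta_* = \hat\beta_\perp'\hat C\Big(\sum_{j=0}^{p-1}\hat\gamma_j^* - \hat\Psi_\beta\hat\theta\Big).$$

Here γ̂_j* (j = 0,…,p − 1) and θ̂ are again based on the RR estimation of (2.7). Thus, an estimator of δ is obtained as

$$\hat\delta = \hat\beta(\hat\beta'\hat\beta)^{-1}\hat\theta + \hat\beta_\perp(\hat\beta_\perp'\hat\beta_\perp)^{-1}\hat\theta_*.$$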
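The same check works for δ. Premultiplying the stacked coefficients γ₀*,…,γ_{p−1}* by [α_⊥′ : ··· : α_⊥′] simply sums them, and Σ_j γ_j* = Ψδ. The NumPy/SciPy sketch below (hypothetical name, population values in place of RR estimates) recovers δ from the γ_j* and θ = β′δ.

```python
import numpy as np
from scipy.linalg import null_space


def recover_delta(gamma_star, theta, alpha, beta, gammas):
    """Recover delta from gamma_star = [gamma_0^*, ..., gamma_{p-1}^*] and theta = beta' delta."""
    n = alpha.shape[0]
    psi = np.eye(n) - sum(gammas, np.zeros((n, n)))
    a_perp, b_perp = null_space(alpha.T), null_space(beta.T)
    c = b_perp @ np.linalg.solve(a_perp.T @ psi @ b_perp, a_perp.T)
    psi_beta = psi @ beta @ np.linalg.inv(beta.T @ beta)
    g_sum = np.sum(gamma_star, axis=0)       # effect of [alpha_perp' : ... : alpha_perp'] before alpha_perp'
    theta_star = b_perp.T @ c @ (g_sum - psi_beta @ theta)
    return (beta @ np.linalg.solve(beta.T @ beta, theta)
            + b_perp @ np.linalg.solve(b_perp.T @ b_perp, theta_star))
```

In practice the inputs would be the RR estimates γ̂_j*, θ̂, α̂, β̂, and Γ̂_j, which is where the estimation error enters.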

The test will be based on the series

$$\hat x_t = y_t - \hat\mu_1 t - \hat\delta d_{t\tau}, \qquad t = 1,\dots,T,$$

which are adjusted for the deterministic trend and the shift term. Thus, apart from estimation errors, we have x̂_t ≈ μ₀ + x_t. This suggests that we can base a test on this approximation or on the auxiliary model

$$\Delta\hat x_t = \Pi_+\begin{bmatrix}\hat x_{t-1}\\ 1\end{bmatrix} + \sum_{j=1}^{p-1}\Gamma_j\Delta\hat x_{t-j} + e_t, \qquad t = p+1,\dots,T,$$

where Π₊ = [Π : −Πμ₀] is defined by adding an extra column to the matrix Π in (2.5). This auxiliary model can be treated as a true model, and an LR test statistic for a specified cointegrating rank can be formed by solving the related generalized eigenvalue problem, as before. We will denote the LR statistic for the null hypothesis rk(Π) = r₀ by LR^PAR(r₀) in the following discussion because only a partial set of parameters associated with the deterministic part is estimated in the first step. Its limiting distribution differs from that of LR^GLS(r₀) and also from the one given in Theorem 6.3 of Johansen (1995) for the corresponding LR test statistic. We have the following result for the case where the shift occurs in the cointegrating relations (δ₁ ≠ 0) and the shift size increases with the sample size. A proof is given in the Appendix.
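Once the adjusted series x̂_t is available, the LR^PAR statistic follows from the usual reduced rank regression. The NumPy/SciPy sketch below (hypothetical names) partials out the lagged differences, solves det(λS₁₁ − S₁₀S₀₀⁻¹S₀₁) = 0 with `scipy.linalg.eigh`, keeps the n largest of the n + 1 roots, and scales by the effective sample size; these details are our assumptions where the text leaves them open. The first-step estimation of μ₁ and δ is not included.

```python
import numpy as np
from scipy.linalg import eigh


def lr_par_statistic(xhat, p, r0):
    """Trace-type LR statistic for H0: rk(Pi) = r0 in the auxiliary model
    Delta xhat_t = Pi_+ [xhat_{t-1}', 1]' + sum_j Gamma_j Delta xhat_{t-j} + e_t."""
    T, n = xhat.shape
    if not 0 <= r0 < n:
        raise ValueError("need 0 <= r0 < n")
    dx = np.vstack([np.zeros((1, n)), np.diff(xhat, axis=0)])   # row k: Delta xhat_{k+1}
    rows = np.arange(p, T)                                      # periods p+1..T
    z0 = dx[rows]                                               # Delta xhat_t
    z1 = np.column_stack([xhat[rows - 1], np.ones(rows.size)])  # [xhat_{t-1}', 1]'
    if p > 1:                                                   # partial out lagged differences
        z2 = np.column_stack([dx[rows - j] for j in range(1, p)])
        z0 = z0 - z2 @ np.linalg.lstsq(z2, z0, rcond=None)[0]
        z1 = z1 - z2 @ np.linalg.lstsq(z2, z1, rcond=None)[0]
    N = rows.size
    s00, s01, s11 = z0.T @ z0 / N, z0.T @ z1 / N, z1.T @ z1 / N
    a = s01.T @ np.linalg.solve(s00, s01)
    lam = eigh((a + a.T) / 2, s11, eigvals_only=True)           # generalized eigenvalues
    lam = np.sort(lam)[::-1][:n]                                # n largest of the n + 1 roots
    return -N * np.sum(np.log(1.0 - lam[r0:]))
```

Because the squared canonical correlations lie in [0, 1), the statistic is nonnegative and decreases as r₀ increases.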

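THEOREM 4.1. Suppose Assumptions 1–3 hold. If δ₁ ≠ 0, 0 < a < ½, and H₀(r₀) is true, then

$$LR^{GLS}(r_0) \xrightarrow{d} \operatorname{tr}\left\{\left(\int_0^1 B_*\,dB_*'\right)'\left(\int_0^1 B_*B_*'\,ds\right)^{-1}\int_0^1 B_*\,dB_*'\right\}$$

and

$$LR^{PAR}(r_0) \xrightarrow{d} \operatorname{tr}\left\{\left(\int_0^1 B_+\,dB_*'\right)'\left(\int_0^1 B_+B_+'\,ds\right)^{-1}\int_0^1 B_+\,dB_*'\right\},$$

where B(s) is an (n − r₀)-dimensional standard Brownian motion, B_*(s) = B(s) − sB(1) is an (n − r₀)-dimensional Brownian bridge, B₊(s) = [B_*(s)′,1]′, and dB_*(s) = dB(s) − dsB(1), that is, ∫₀¹B₊dB_*′ abbreviates ∫₀¹B₊(s)dB(s)′ − ∫₀¹B₊(s)ds B(1)′, for example.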

Several remarks are worth making regarding this theorem. First, a similar result for their break date estimators and cointegrating rank test was obtained by Lütkepohl et al. (2004) under more restrictive assumptions regarding the break size. The limiting distribution of LR^GLS(r₀) in Theorem 4.1 is the same as its earlier counterpart in Lütkepohl et al., whereas the limiting distribution of LR^PAR(r₀) differs in that the process B₊(s) appears in place of the Brownian bridge B_*(s). The reason is of course that an intercept term is included in the auxiliary model on which LR^PAR(r₀) is based. On the other hand, the limiting distribution of LR^PAR(r₀) is formally similar to its counterpart in Theorem 6.3 of Johansen (1995), where a standard Brownian motion appears in place of the Brownian bridge in our Theorem 4.1. Notice that the term ∫₀¹B₊dB_*′ consists of two components. The first one is ∫₀¹B_*dB_*′, and the second one is ∫₀¹dB_*(s)′ = B_*(1)′ − B_*(0)′ = 0.

Second, in the case without trend in the model, that is, if μ₁ = 0 is known a priori, the processes B₊(s) and B_*(s) in the limiting distributions in Theorem 4.1 can be replaced by [B(s)′,1]′ and B(s), respectively. Then the limiting distribution of the test statistic LR^PAR(r₀) is the same as the limiting distribution of the corresponding LR test statistic in Theorem 6.3 of Johansen (1995). This result can be proved by making appropriate modifications to the proof of Theorem 4.1 in the Appendix. Moreover, in this case the limiting distribution of LR^GLS(r₀) is the same as that of an LR test based on a model without any deterministic terms.

Third, from the proof of Theorem 4.1 it is apparent that the same limiting distributions are obtained if the shift date is assumed known or if it is known that there is no shift in the process. In the latter case δ = 0 and only μ₀ and μ₁ are estimated in the first step leading to LR^GLS(r₀), whereas only μ₁ is estimated in the first step of the LR^PAR(r₀) procedure. Thus, in our framework, including a shift dummy in the model and estimating its coefficients and the shift date as described in the foregoing discussion has no effect on the limiting distributions of the cointegration tests. The same result was obtained by Lütkepohl et al. (2004) for LR^GLS(r₀) in a more limited model framework. It may be worth emphasizing that such a result will not be obtained if instead of our estimation procedures for the deterministic parameters, the Johansen (1995) ML approach is applied to a model with estimated shift date (see also Johansen, Mosconi, and Nielsen, 2000, for a discussion of the case when the break date is known).

Extensions of our results in different directions are conceivable. In particular, limiting results as in Theorem 4.1 can also be obtained under other assumptions for the shift size. For example, if δ₁ ≠ 0, the theorem holds more generally for a < ½. In particular, it holds for a = 0, where the shift size does not depend on the sample size, and for a < 0, where the shift size decreases with increasing sample size. In fact, a = ½ is the only case where a different result for the limiting distributions of the cointegration tests may be obtained. To get the same distributions as in Theorem 4.1, we then need the additional assumption that the break date estimator is consistent. This condition is satisfied for τ̃ but requires further assumptions for τ̂ (see Theorem 3.1(i)). Proofs for other assumptions regarding the shift are not given here because they require a separate treatment of different cases, which complicates the presentation. For details see, however, the discussion paper version of this article (Saikkonen et al., 2004). We have treated the case where the shift actually occurs in the cointegrating relations and the shift size may be large because in this case our theory can help to explain some simulation results of Lütkepohl et al. (2004), as we will see in Section 5.

It also seems likely that our results can be extended by including more than one shift dummy or other dummy variables in model (2.1). In fact, an additional impulse dummy and seasonal dummies were considered by Saikkonen and Lütkepohl (2000a). The result in Theorem 4.1 remains valid with additional dummies if the corresponding shift dates are known and the parameters of the additional deterministic terms are estimated in a similar way as μ₁ or δ. If the dates of further shifts are unknown, it may be more difficult to construct suitable shift date estimators. This issue may be an interesting project for future research.

An extension of our framework to the case where a break occurs not only in the levels of the series but also in the trend slopes may be desirable for applied work. However, such an extension is not straightforward, and the limiting distribution of the cointegrating rank tests is likely to be affected by the break date in this case.

To apply the cointegrating rank tests we need critical values for the second limiting distribution in Theorem 4.1. The limiting distribution of LR^GLS(r₀) is the same as in Lütkepohl et al. (2004), and critical values are available in Lütkepohl and Saikkonen (2000, Table 1). The second limiting distribution in Theorem 4.1 is simulated numerically by approximating the standard Brownian motions with T-step random walks of dimension n − r₀, as in Johansen (1995, Sect. 15.1). The percentiles in Table 1 are based on sample lengths of T = 1,000 using independent standard normal variates for the error terms and 100,000 replications of the simulation experiment. The computations are done with GAUSS V5.
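The simulation of the second limiting distribution described above can be sketched as follows. We assume here that the limit has the trace form tr{(∫B₊dB_*′)′(∫B₊B₊′ds)⁻¹∫B₊dB_*′}, approximate B by a Gaussian random walk on a grid, and replace the integrals by left-endpoint sums. This NumPy sketch (hypothetical function name) is not the GAUSS code behind Table 1, and with few replications its percentiles should not be expected to reproduce the tabulated values.

```python
import numpy as np


def simulate_lr_par_limit(m, steps=1000, reps=1000, seed=0):
    """Monte Carlo draws approximating
    tr{(int B+ dB*')' (int B+ B+' ds)^{-1} (int B+ dB*')},
    with B* an m-dimensional Brownian bridge (m = n - r0) and B+ = [B*', 1]'."""
    rng = np.random.default_rng(seed)
    grid = np.arange(steps + 1) / steps                          # s = 0, 1/steps, ..., 1
    draws = np.empty(reps)
    for i in range(reps):
        e = rng.standard_normal((steps, m)) / np.sqrt(steps)     # random-walk increments
        b = np.vstack([np.zeros((1, m)), np.cumsum(e, axis=0)])  # B(s) on the grid
        b_star = b - grid[:, None] * b[-1]                       # B*(s) = B(s) - s B(1)
        db_star = e - b[-1] / steps                              # dB* = dB - ds B(1)
        b_plus = np.hstack([b_star, np.ones((steps + 1, 1))])[:-1]  # left endpoints
        cross = b_plus.T @ db_star                               # int B+ dB*'
        outer = b_plus.T @ b_plus / steps                        # int B+ B+' ds
        draws[i] = np.trace(cross.T @ np.linalg.solve(outer, cross))
    return draws
```

Percentiles are then obtained with, e.g., `np.percentile(simulate_lr_par_limit(2, reps=10_000), [90, 95, 99])`. The last row of `cross` sums the bridge increments and is therefore zero up to rounding.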

Table 1. Percentiles of the limiting distribution of LR^PAR(r₀)

In the next section we will discuss small-sample properties of the break date estimators and cointegration tests.

5. MONTE CARLO SIMULATIONS

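A Monte Carlo experiment was performed to compare our break date estimators and to explore the finite-sample properties of the corresponding cointegration test procedures. The simulations are based on the following x_t process from Toda (1994), which has also been used by a number of other authors for investigating the properties of cointegrating rank tests (see, e.g., Hubrich, Lütkepohl, and Saikkonen, 2001):

$$x_t = \begin{bmatrix} \psi & 0 \\ 0 & I_{n-r} \end{bmatrix} x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid } N\left(0, \begin{bmatrix} I_r & \Theta \\ \Theta' & I_{n-r} \end{bmatrix}\right), \tag{5.1}$$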

where ψ = diag(ψ₁,…,ψ_r) and Θ are (r × r) and (r × (n − r)) matrices, respectively. As shown by Toda, this type of process is useful for investigating the properties of LR tests for the cointegrating rank because other cointegrated VAR(1) processes of interest can be obtained from (5.1) by linear transformations that leave such tests invariant. Obviously, if |ψ_i| < 1 (i = 1,…,r) we have r stationary series, and, thus, the cointegrating rank is equal to r. Moreover, Θ describes the contemporaneous error term correlation between the stationary and nonstationary components. We have used three- and four-dimensional processes for simulations and report some of the results in more detail here. For given VAR order p and break date τ, the test results are invariant to the parameter values of the constant and trend because we allow for a linear trend in our tests. Therefore we use μ_i = 0 (i = 0,1) as parameter values throughout without loss of generality. In other words, the true intercept and trend terms are zero, but our procedures still allow for them, as if this information were unknown to the analyst. Hence, y_t = δd_tτ + x_t, and we have performed simulations with different δ vectors. Rewriting x_t in VECM form (2.5) shows that Π = −(I_n − A₁) = diag(ψ − I_r : 0) and, thus, δ₁ = −Πδ can only be nonzero if level shifts occur in stationary components of the DGP.
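A minimal sketch of this simulation design follows. It assumes the standard Toda form x_t = diag(ψ, I_{n−r}) x_{t−1} + ε_t with ε_t ~ iid N(0, [[I_r, Θ], [Θ′, I_{n−r}]]), zero initial values, a burn-in of 50 discarded observations, and zero intercept and trend, so that y_t = δd_tτ + x_t. The convention d_tτ = 1 for t ≥ τ (0 otherwise) is also an assumption. The function name `simulate_toda_with_shift` is hypothetical, and this is not the authors' code.

```python
import numpy as np


def simulate_toda_with_shift(T, psi, theta, delta, tau, burn_in=50, seed=None):
    """Simulate y_t = delta * d_{t,tau} + x_t for t = 1, ..., T, where
        x_t = diag(psi, I_{n-r}) x_{t-1} + eps_t,
        eps_t ~ iid N(0, [[I_r, Theta], [Theta', I_{n-r}]]),
    x_0 = 0, and the first `burn_in` draws are discarded.
    Assumes r = len(psi) >= 1; d_{t,tau} = 1 for t >= tau and 0 otherwise.
    """
    rng = np.random.default_rng(seed)
    psi = np.atleast_1d(np.asarray(psi, dtype=float))
    theta = np.atleast_2d(np.asarray(theta, dtype=float))   # (r, n - r)
    delta = np.asarray(delta, dtype=float)
    r, n = psi.size, delta.size
    a1 = np.diag(np.concatenate([psi, np.ones(n - r)]))      # VAR(1) coefficient
    cov = np.eye(n)
    cov[:r, r:] = theta
    cov[r:, :r] = theta.T
    eps = rng.multivariate_normal(np.zeros(n), cov, size=T + burn_in)
    x = np.zeros((T + burn_in + 1, n))                       # row 0 holds x_0 = 0
    for t in range(1, T + burn_in + 1):
        x[t] = a1 @ x[t - 1] + eps[t - 1]
    x = x[burn_in + 1:]                                      # keep the last T values
    d = (np.arange(1, T + 1) >= tau).astype(float)           # shift dummy d_{t,tau}
    return x + d[:, None] * delta[None, :]
```

For example, `simulate_toda_with_shift(100, [0.9], [[0.4, 0.8]], [3.0, 0.0, 0.0], tau=50)` mimics the three-dimensional base design with δ₍₁₎ = 3. The r = 0 design discussed later corresponds to `psi=[1.0]` and `theta=[[0.0, 0.0]]`.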

Samples are simulated by starting with initial values of zero and discarding the first 50 observations. We consider a sample size of T = 100, and the number of replications is 1,000. Thus, the standard error of an estimator of a true rejection probability P is

$$s_P = \sqrt{P(1-P)/1000};$$

for example, s_0.05 = 0.007. Moreover, we use different VAR orders p, although the true order is p = 1, to explore the impact of this quantity on the estimation and testing results. In all simulations the search procedures are applied to all possible break points τ from the fifth up to and including the 96th observation. This choice corresponds to the situation where no prior knowledge on the break date is available, so a search is performed over the full sample period except for some observations at the beginning and at the end. Recall that our theoretical results exclude the possibility of a break at the very beginning or at the very end of the sample. Leaving out 4% of the observations at both ends is to some extent arbitrary. Because we consider VAR orders up to p = 3, a break closer than three periods to the end of the sample results in one or more impulse dummies in (3.1) being zero throughout the sample period and therefore cannot be handled in our setup. We decided to stop the search close to the end of the feasible period and treat the beginning and the end of the sample symmetrically in this respect. In practice, some prior knowledge on the break date may be available that can be used to narrow the period where a search is necessary. In that case it may be easier for an estimator to find the true break date, and, hence, the results for the break date estimation and cointegration testing may improve relative to those obtained in our simulations.
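The grid search over τ = 5,…,96 can be sketched for the unrestricted estimator. The sketch assumes, in line with the appendix, that this estimator minimizes det(Σ_t ε̂_tτ ε̂_tτ′) over τ. Here ε̂_tτ are LS residuals from a VECM with intercept, trend, y_{t−1}, the lagged shift dummy d_{t−1,τ}, lagged differences Δy_{t−j} (j = 1,…,p − 1), and impulse dummies Δd_{t−j,τ} (j = 0,…,p − 1). This regressor layout of (3.1) is our reconstruction and should be checked against Section 3. The function name is hypothetical, and the restricted estimator based on (3.4) is not covered.

```python
import numpy as np


def unrestricted_break_search(y, p, candidates):
    """Return the candidate tau minimizing det(sum_t e_t e_t'), where e_t are
    OLS residuals of the assumed unrestricted VECM
        dy_t = nu0 + nu1 t + Pi y_{t-1} + delta1 d_{t-1,tau}
               + sum_{j=1}^{p-1} Gamma_j dy_{t-j}
               + sum_{j=0}^{p-1} gamma_j dd_{t-j,tau} + e_t,
    estimated over t = p+1, ..., T. Dates are 1-based: y[0] holds y_1.
    """
    y = np.asarray(y, dtype=float)
    T, n = y.shape
    t = np.arange(p + 1, T + 1)              # effective sample dates
    dy = np.diff(y, axis=0)                  # dy[k] = y_{k+2} - y_{k+1}
    lhs = dy[t - 2]                          # dy_t
    cols = [np.ones(t.size), t.astype(float)]
    cols += list(y[t - 2].T)                 # y_{t-1}
    for j in range(1, p):
        cols += list(dy[t - 2 - j].T)        # dy_{t-j}
    base = np.column_stack(cols)
    best_tau, best_det = None, np.inf
    for tau in candidates:
        shift = (t - 1 >= tau).astype(float)                          # d_{t-1,tau}
        impulses = [(t - j == tau).astype(float) for j in range(p)]   # dd_{t-j,tau}
        X = np.column_stack([base, shift] + impulses)
        coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
        resid = lhs - X @ coef
        det = np.linalg.det(resid.T @ resid)
        if det < best_det:
            best_tau, best_det = tau, det
    return best_tau
```

With T = 100, the search range described above corresponds to `candidates=range(5, 97)`.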

To compute the restricted estimator based on (3.4), we use a nonlinear LS estimation method, applying the Gauss–Newton algorithm to minimize the sum of squared residuals corresponding to (3.4). The iterations of the algorithm stop if the change in

from iteration i to i + 1 is less than (T − p)⁻ⁿ, where

are the residual vectors from the nonlinear estimation of (3.4). Thus, with T = 100 the precision is about 10⁻⁶ for a three-dimensional process (e.g., (T − p)⁻ⁿ = 99⁻³ ≈ 1.03 × 10⁻⁶ for p = 1). In addition, the maximum number of iterations is set to 25. For a subset of our simulation experiments we have also worked with smaller values of the stopping criterion and higher maximum numbers of iterations but did not obtain different results.
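The iteration scheme can be illustrated with a generic Gauss–Newton loop that uses the tolerance (T − p)⁻ⁿ and at most 25 iterations. The quantity monitored by the authors' stopping rule is not reproduced here, so this sketch monitors the change in the sum of squared residuals instead. It uses a forward-difference Jacobian. The function name and the toy exponential model used to exercise it are hypothetical and unrelated to (3.4).

```python
import numpy as np


def gauss_newton(resid_fn, theta0, tol, max_iter=25, h=1e-7):
    """Minimize S(theta) = ||resid_fn(theta)||^2 by Gauss-Newton steps.

    Stops when |S_{i+1} - S_i| < tol or after max_iter iterations.
    The Jacobian of the residual vector is approximated by forward differences.
    """
    theta = np.asarray(theta0, dtype=float)
    e = resid_fn(theta)
    ssr = e @ e
    for it in range(1, max_iter + 1):
        jac = np.empty((e.size, theta.size))
        for k in range(theta.size):
            step = np.zeros_like(theta)
            step[k] = h
            jac[:, k] = (resid_fn(theta + step) - e) / h
        # Gauss-Newton step: least-squares solution of J d = -e
        d, *_ = np.linalg.lstsq(jac, -e, rcond=None)
        theta = theta + d
        e = resid_fn(theta)
        ssr_new = e @ e
        converged = abs(ssr_new - ssr) < tol
        ssr = ssr_new
        if converged:
            break
    return theta, ssr, it


# Tolerance from the text: (T - p)^(-n); T = 100, p = 1, n = 3 gives about 1.03e-6.
T, p, n = 100, 1, 3
tol = (T - p) ** (-n)
```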

The interpretation of the simulation results is done in two steps. First, we analyze the ability of the shift date estimators to locate the true break point. Second, we discuss the small-sample properties of the corresponding cointegration tests based on these estimators. This discussion includes a comparison of the LR^PAR and LR^GLS test procedures.

As a basis for the comparison of the shift date estimators we start with a three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), and τ = 50. Afterward, we comment briefly on the importance of the value of τ, the innovation correlation, and the results of a four-dimensional DGP with two cointegration relations without presenting detailed results. The latter DGP has been considered to study the properties of the procedures in the case of more complicated processes. Finally, we examine situations where δ₁ = −Πδ = 0 and, hence, according to Theorems 3.1(i) and 3.2(i), consistent estimation requires larger shift sizes.

The break date estimates for our three-dimensional base DGP with r = 1 and a VAR order p = 1 are reported in Table 2 and Figure 1a. We consider a shift δ = (δ₍₁₎,δ₍₂₎,δ₍₃₎)′ with δ₍₁₎ ranging from 1 to 10 and δ₍₂₎ = δ₍₃₎ = 0. Hence, the shift occurs in the first component of the DGP, which is stationary according to (5.1). Thus, as discussed before, we have δ₁ = −Πδ = −αβ′δ ≠ 0 in (3.1) and, hence, θ = β′δ ≠ 0 in (2.7).

Table 2. Break date estimates for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), VAR order p = 1, true break point τ = 50, sample size T = 100, δ₍₂₎ = δ₍₃₎ = 0

Figure 1. Relative frequency of true break point estimates or of estimates in the interval [τ − 2; τ + 2] for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), sample size T = 100, true break point τ = 50, δ₍₂₎ = δ₍₃₎ = 0.

The two estimators perform very similarly, although the restricted estimator is more successful in finding the correct break date for small shift magnitudes. Only if δ₍₁₎ = 3 or δ₍₁₎ = 5 does the unrestricted estimator perform slightly better. For large values of δ₍₁₎ both estimators perform identically. In fact, the cases δ₍₁₎ = 3 and δ₍₁₎ = 5 are among the few exceptions in all our simulation experiments where the unrestricted estimator outperforms the restricted one. These observations also hold if the estimators are evaluated by the frequency of estimates in the small band [τ − 2; τ + 2] instead of the single true value τ; these frequencies are also shown in Figure 1. Obviously, the frequency of finding the true τ increases for larger shift magnitudes. This result is not surprising given the asymptotic properties of the estimators and the fact that δ₁ ≠ 0 in the present situation. Because T is fixed, changing δ₍₁₎ from 1 to 10 may be interpreted as changing a or δ_* in Assumption 3 accordingly.

Next, we have fitted a VAR(3) model although the true DGP has only order p = 1; the results are given in Table 3 and Figure 1b. In this case j₀ = 0 in Theorem 3.1(i) because γ₁ = γ₂ = 0. Thus, under the conditions of that theorem, the unrestricted estimator is only guaranteed to fall asymptotically into the range of dates stated there, rather than to coincide with τ. In line with this result, the unrestricted estimator does not find the true break point τ = 50 even if δ₍₁₎ is large. In fact, we observe in Table 3 that in about two-thirds of the replications the break date is located too early. However, the estimates converge to the stated range for

, in line with our asymptotic results. Interestingly, focusing on the band [τ − 2; τ + 2],

is slightly more successful than

for δ₍₁₎ = 2 and 3 (see Figure 1b).

Table 3. Break date estimates for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), VAR order p = 3, true break point τ = 50, sample size T = 100, δ₍₂₎ = δ₍₃₎ = 0

To analyze possible effects of the location of τ we also studied break points τ = 10, 25, 75, and 90 using the same three-dimensional DGP as before. We found that it is only slightly more difficult to detect the more extreme break points if small shift magnitudes are considered. In the case of large shift magnitudes the location of the break date becomes even less important for the estimation results. These observations were made for both break date estimators and both VAR orders p = 1 and 3.

Next, we studied the effect of the error term correlation between the stationary and nonstationary components by considering a three-dimensional DGP as before but with Θ = (0,0) and comparing the outcomes with the previous findings. The absence of instantaneous error term correlation made it more difficult for both estimators to locate the true break point regardless of the lag order. This outcome can be explained by the fact that we considered a shift in only one of the three components, so that the weaker link between the components when Θ = (0,0) complicated the break date search. In this situation, the restricted estimator was always the more successful procedure, and its advantage was usually even more pronounced than in the case Θ = (0.4, 0.8).

Our results for the more complicated four-dimensional DGP with cointegrating rank r = 2 (ψ₁ = ψ₂ = 0.7) and Θ = ([0.4 : 0.4]′ : [0.4 : 0.4]′) clearly indicated that the break date estimators perform worse than for the three-dimensional DGP when the level shifts are small. We chose a shift vector of the form δ = (δ₍₁₎,δ₍₂₎,δ₍₃₎,δ₍₄₎)′ with δ₍₁₎ ranging again from 1 to 10 and δ₍₂₎ = δ₍₃₎ = δ₍₄₎ = 0. By adding a further dimension to the process, the importance of the break in only one component is weakened, which may explain the lower number of correct break date estimates. The relative rankings of the two estimation procedures did not change, however.

Finally, we examined two DGPs for which δ₁ = −Πδ = 0. For this situation, Theorems 3.1 and 3.2 state that, compared to the case δ₁ ≠ 0, “larger” shift magnitudes are needed to ensure that both estimators estimate the break date consistently. First, we considered a three-dimensional process as in the base case but with δ₍₃₎ ranging from 1 to 10 and δ₍₁₎ = δ₍₂₎ = 0. Because the shift occurs in the third component, which is nonstationary, the level shift is orthogonal to the cointegration space in line with our DGP design (5.1). Thus, we simulated a case of co-breaking. Second, we used a three-dimensional process with ψ₁ = 1 so that the cointegrating rank is r = 0. In the case r = 0 all components of the DGP are nonstationary and Θ vanishes, so no error term correlation is present.
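For DGP (5.1) the matrix Π = diag(ψ − I_r : 0) is diagonal, so the claim that δ₁ = −Πδ vanishes under co-breaking and when r = 0 can be checked elementwise. The helper name below is hypothetical; it is a small sketch of that arithmetic, not part of the authors' procedures.

```python
import numpy as np


def shift_coefficient(psi, delta):
    """delta_1 = -Pi delta for DGP (5.1), where Pi = diag(psi - I_r : 0)
    is diagonal, so Pi delta is an elementwise product."""
    psi = np.asarray(psi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    pi_diag = np.concatenate([psi - 1.0, np.zeros(delta.size - psi.size)])
    return -pi_diag * delta
```

With ψ₁ = 0.9 and δ = (5, 0, 0)′ this gives δ₁ = (0.5, 0, 0)′. A shift in the third (nonstationary) component, or ψ₁ = 1 (r = 0), gives δ₁ = 0.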

The results for the shift date estimators are depicted in Figure 2. Clearly, it was more difficult for both procedures to locate τ in this situation. In particular, the case of no cointegration is much more difficult to deal with than co-breaking (compare Figures 2c and 2d with Figures 2a and 2b). The poor performance of the shift date estimation procedures for small shift sizes relative to DGPs with δ₁ ≠ 0 is in accordance with the finding in Section 3 that precise estimation requires large shift magnitudes if δ₁ = 0.

Figure 2. Relative frequency of true break point estimates or of estimates in the interval [τ − 2; τ + 2] for three-dimensional DGPs with Θ = (0.4, 0.8) (a and b), Θ = (0, 0) (c and d), sample size T = 100, true break point τ = 50.

To sum up, the constrained estimator is usually at least as good as the unrestricted estimator and often superior in finding the true break date. The small-sample results for the unrestricted estimator are in line with our asymptotic derivations, which say that this procedure may estimate the break date too early when the VAR order is overspecified.

So far we have only analyzed the small-sample properties of the break date estimators in terms of their ability to locate the true shift date. If one is primarily interested in the cointegrating rank of the system, the focus should be on the small-sample properties of the cointegration tests based on these different estimators. Our main conclusion is that the tests' small-sample size and power differ very little even in those cases where the break date estimators perform differently. Therefore, we only discuss the outcomes for our three-dimensional base DGP for which we have δ₁ = −Πδ ≠ 0, the case treated in Theorem 4.1. The results are given in Tables 4 and 5. To be precise, we present the rejection frequencies for the null hypothesis H₀ : r = r₀ when the LR^PAR and LR^GLS tests are applied to a process with known or estimated shift date. The rejection frequencies for the case r₀ = 1 should give an indication of the tests' sizes in small samples. Therefore we use the term size in the following discussion when we refer to this case.

Table 4. Relative rejection frequencies of cointegration rank tests for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), VAR order p = 1, true break point τ = 50, sample size T = 100, nominal significance level 0.05, δ₍₂₎ = δ₍₃₎ = 0

Table 5. Relative rejection frequencies of cointegration rank tests for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), VAR order p = 3, true break point τ = 50, sample size T = 100, nominal significance level 0.05, δ₍₂₎ = δ₍₃₎ = 0

In Table 4 we see for fitted VAR(1) models that in particular the sizes of LR^PAR are clearly higher than the nominal 5% level in cases of small shift magnitudes, for which we obtain many incorrect break date locations. For increasing shift magnitudes the sizes approach the values for a known shift date in line with the greater success of the estimators to locate τ. The small-sample powers (not adjusted for the variations in the sizes) are roughly constant if

is used whereas an increase in power for small values of δ₍₁₎ can be observed for

The impact of the shift magnitude on the small-sample properties of the cointegration tests is similar if VAR(3) models are fitted and the restricted estimator is used (see Table 5). Note, however, that the tests' small-sample power is clearly lower than in the VAR(1) case even if the true shift date is known. Regarding the unrestricted estimator, we observe some differences if VAR(3) models are applied (see Table 5). Here, the sizes of LR^GLS and the powers of both tests fall below the values for a known shift date when δ₍₁₎ is large. Evidently, the incorrect break date locations have an important effect on the small-sample properties when the shift magnitude is large.

Three important observations can be made with respect to the relative small-sample properties of the two cointegration tests LR^PAR and LR^GLS. First, the new test LR^PAR rejects somewhat too often if the null hypothesis is true (r = 1), even if the shift date is known. These higher rejection frequencies were also found for other DGPs. Thus, it may be worth exploring small-sample corrections for the new tests in future work. Second, for increasing shift magnitudes the relative performance of LR^PAR and LR^GLS based on the estimator

is in general similar. Third, if the unrestricted estimator is used, the drop in small-sample power for large shift magnitudes when the VAR order is overspecified is more pronounced for LR^GLS than for LR^PAR. Thus, our new test proposal appears to be less affected by incorrect break date estimates.

The overall conclusion from our simulations is that the restricted estimator is generally preferable to the unrestricted one. Taking account of the nonlinear restrictions is beneficial both for locating the shift date and for testing the cointegrating rank. In fact, if the restricted estimator is used, estimating the shift date does not worsen the small-sample properties of the cointegration tests much relative to the case of a known break point. Given the size distortions of LR^PAR even if the shift date is known, size correcting procedures may be worth exploring for this test in the future.

6. CONCLUSIONS

We have analyzed the asymptotic properties of two estimators for the shift date in a cointegrated VAR process with level shift. The shift is modeled by a simple shift dummy variable. The first estimator is based on an unrestricted VAR model, and the second one is obtained by taking into account the relation between the parameters of the stochastic and deterministic parts of the model. Asymptotic properties of both estimators are given under the assumption that the shift may depend on the sample size. Both a growing and a declining shift size when the sample size tends to infinity are considered. These results extend previous results of Lütkepohl et al. (2004), who also consider the first estimator assuming a fixed shift size. Our results shed new light on previously unexplained small-sample phenomena. We have also considered the implications of using models with estimated instead of true shift dates in testing for the cointegrating rank, and we have proposed new variants of cointegrating rank tests. These tests differ from those considered by Lütkepohl et al. in that they avoid estimating the nonidentified part of the levels parameter and proceed otherwise in a similar manner. More precisely, the trend and shift parameters are estimated in a first step, and then rank tests of the LR type are applied to the adjusted series. The asymptotic distributions of the tests are derived.

In addition to providing asymptotic results, we have also investigated the small-sample properties of the procedures using a Monte Carlo simulation experiment. It is found that the estimator that takes the restrictions into account is the more successful of the two in locating the true shift date. Moreover, a superior break date estimator tends to improve the small-sample properties of subsequent cointegration tests. Generally, it pays to account for a shift in testing for the cointegrating rank of a system of variables when such a shift is actually present.

A comparison of the tests considered by Lütkepohl et al. (2004) and the new tests of the present paper shows, however, that the latter tend to reject a true null hypothesis more often than the Lütkepohl et al. tests. Generally the new tests tend to reject true null hypotheses too often, and hence in future research it may be of interest to develop small-sample corrections to ensure a test size close to the nominal level.

Another direction for extending our results may be to develop a joint procedure for determining the break date and the cointegrating rank. Our two-step procedure, where the break date is estimated in the first stage and then tests for the cointegrating rank are performed, is easy to use in practice and therefore has some appeal in applied work. An alternative approach may be to determine the break date and cointegrating rank jointly, for instance by minimizing a model selection criterion. We leave such extensions for future research.

APPENDIX: Proofs

Some parts of the proofs are similar to those of the corresponding results stated in Lütkepohl et al. (2004) under more restrictive conditions. Because these authors provide brief sketches of the proofs only, we also present more detailed and more complete versions of the similar parts here.

The following notational conventions are used in addition to the notation defined earlier. Right-hand side and left-hand side will be abbreviated by r.h.s. and l.h.s., respectively. The smallest and largest eigenvalues of a matrix are denoted by λ_min(·) and λ_max(·), respectively. The complement of a set B is signified by B^c. The dependence of quantities on the sample size T is not indicated. The symbol ⇒ signifies weak convergence in a product space of D([λ̲, λ̄]) or D([0,1]). The former is relevant for random functions depending on the parameter λ, whereas the latter is used when the weak limit is a Brownian motion. Unless otherwise stated, all limits assume that T → ∞. When obtaining weak convergences in a product space of D([λ̲, λ̄]) we frequently make use of results given in Appendix A.1 of Gregory and Hansen (1996). It is straightforward to check that these results are applicable despite the differences in assumptions.

In the proofs we assume the model and conditions described in Sections 2 and 3, where the parameters μ₀, μ₁, δ_* ∈ Rⁿ and the true α, β, Π, and Γ_j (j = 1,…,p − 1) satisfy the restrictions that ensure that the observed variables are at most I(1), whereas these restrictions are not imposed in the estimation.

The true DGP is one specific process from our model class. It is occasionally helpful to be more explicit about its particular parameter values. In these cases they will be indicated with a subscript o (e.g., μ_0o, μ_1o, τ_o, etc.). For the break date we assume for convenience that

. We begin by proving Theorem 3.1.

A.1. Proof of Theorem 3.1.

Instead of the series y_t it will be convenient to use the mean adjusted series

Solving the preceding equation for y_t and inserting the result into (3.1) yields

Here

Note that the true values of ν₀⁽⁰⁾ and ν₁⁽⁰⁾ are zero.

It will also be convenient to use the transformation Πx_{t−1} = α⁽⁰⁾u_{t−1}⁽⁰⁾ + ρ⁽⁰⁾v_{t−1}⁽⁰⁾, where u_{t−1}⁽⁰⁾ = β_o′x_{t−1}, v_{t−1}⁽⁰⁾ = β_{o⊥}′x_{t−1}, α⁽⁰⁾ = αβ′β_o(β_o′β_o)⁻¹, and ρ⁽⁰⁾ = αβ′β_{o⊥}(β_{o⊥}′β_{o⊥})⁻¹. This follows from the decomposition x_{t−1} = β_o(β_o′β_o)⁻¹u_{t−1}⁽⁰⁾ + β_{o⊥}(β_{o⊥}′β_{o⊥})⁻¹v_{t−1}⁽⁰⁾. Clearly, the true values of α⁽⁰⁾ and ρ⁽⁰⁾ are α_o and zero, respectively. With this transformation the preceding error correction form can be expressed as
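The transformation can be checked numerically. The sketch draws random α, β, β_o, computes β_{o⊥} from an SVD, and verifies that Πx = α⁽⁰⁾(β_o′x) + ρ⁽⁰⁾(β_{o⊥}′x). It also checks that α⁽⁰⁾ = α and ρ⁽⁰⁾ = 0 when β = β_o. The helper names are hypothetical; this is an illustration, not part of the proof.

```python
import numpy as np


def orth_complement(b):
    """Columns spanning the orthogonal complement of span(b); b has full column rank."""
    u, _, _ = np.linalg.svd(b, full_matrices=True)
    return u[:, b.shape[1]:]


def decompose_pi(alpha, beta, beta_o):
    """Return (alpha0, rho0, beta_o_perp) such that
    alpha beta' x = alpha0 (beta_o' x) + rho0 (beta_o_perp' x) for every x."""
    bo_perp = orth_complement(beta_o)
    ab = alpha @ beta.T                                   # Pi = alpha beta'
    alpha0 = ab @ beta_o @ np.linalg.inv(beta_o.T @ beta_o)
    rho0 = ab @ bo_perp @ np.linalg.inv(bo_perp.T @ bo_perp)
    return alpha0, rho0, bo_perp
```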

Denote q_tτ = [d_tτ : d_tτ′]′ and

With this notation (A.2) becomes

where Φ = [ν₀⁽⁰⁾ : Tν₁⁽⁰⁾ : T^{1/2}ρ⁽⁰⁾ : α⁽⁰⁾ : Γ₁ : ··· : Γ_{p−1}], Ξ = [δ₁ : γ], and Ξ⁽⁰⁾ = [δ₁⁽⁰⁾ : γ⁽⁰⁾].

Let Θ = [Φ : Ξ] contain the freely varying parameters in (A.3) or (A.2) (Ξ⁽⁰⁾ is not a freely varying parameter because it is determined by α⁽⁰⁾, ρ⁽⁰⁾, and Γ₁,…,Γ_{p−1}). Set

Then

is −2 times the (conditional) Gaussian log-likelihood function of the parameters in (A.3). Minimizing this function yields Gaussian ML estimators of the parameters Θ, τ, and Ω. It is not difficult to see that the resulting estimators of Θ and τ can alternatively be obtained by minimizing the concentrated counterpart of l_T(Θ,τ,Ω), that is,

The definition of ε_tτ(Θ) (and the fact that Ξ⁽⁰⁾ is not a freely varying parameter) makes it clear that the value of τ that minimizes the function l_T^(c)(Θ,τ) is identical to the break date estimator defined by (3.2). Thus, (asymptotic) properties of that estimator can be studied by using the Gaussian ML estimator of τ discussed previously. Before turning to this issue we note that the preceding discussion also makes clear that a minimizer of l_T(Θ,τ,Ω) exists (for every T larger than some constant).
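The concentration step can be illustrated numerically. The sketch assumes that, up to additive constants, l_T has the usual Gaussian form T log det Ω + Σ_t ε_tτ(Θ)′Ω⁻¹ε_tτ(Θ); the display itself is not reproduced here, and the effective sample size may differ from T. For fixed residuals the minimizing Ω is Ω̂ = T⁻¹Σ_t ε_tτε_tτ′, and the minimum equals T log det Ω̂ + nT. Hence τ enters only through det Ω̂. The function names are hypothetical.

```python
import numpy as np


def neg2_loglik(resid, omega):
    """-2 log-likelihood with constants dropped, for iid N(0, omega) errors:
    T log det(omega) + sum_t e_t' omega^{-1} e_t."""
    T = resid.shape[0]
    _, logdet = np.linalg.slogdet(omega)
    quad = np.einsum("ti,ij,tj->", resid, np.linalg.inv(omega), resid)
    return T * logdet + quad


def concentrated(resid):
    """Value at the minimizer omega_hat = resid' resid / T: T log det(omega_hat) + T n."""
    T, n = resid.shape
    _, logdet = np.linalg.slogdet(resid.T @ resid / T)
    return T * logdet + T * n
```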

The proof of Theorem 3.1 consists of several steps. In the first one we consider a subset of the parameter space of (Θ,Ω) defined by

and

Note that here M does not depend on T although Φ and δ₁⁽⁰⁾ do. We now prove the following lemma.

LEMMA A.1. Let B₁ = B₁(M, ω̲, ω̄) be the part of the parameter space of (Θ,τ,Ω) in which conditions (A.4) and (A.5) hold. Then there exist choices of M, ω̲, and ω̄ such that

with probability approaching one.

Proof. First note that

where the latter equality is justified by the weak law of large numbers.

Next, because [Tλ̲] ≤ τ, τ_o ≤ [Tλ̄], we find from the definitions that

Hence,

where Φ⁽⁰⁾ = Φ + [δ₁ − δ₁⁽⁰⁾ : 0]. Here

where ε_* > 0 is a suitable real number and the inequality holds with probability approaching one. This fact can be justified in the same way as Lemma A.4 of Saikkonen (2001). A similar result is also obtained by changing the range of summation on the l.h.s. of (A.8) to t = [Tλ̄] + p,…,T. Given these two eigenvalue conditions, arguments entirely similar to those in Saikkonen (2001, pp. 320–321) show that, with suitable choices of M, ω̲, and ω̄, the r.h.s. of (A.7) can be made arbitrarily large whenever (Θ,τ,Ω) ∉ B₁(M, ω̲, ω̄). The assertion of the lemma follows from this and (A.6). █

Lemma A.1 implies that a minimizer of l_T(Θ,τ,Ω) will asymptotically satisfy inequality restrictions of the form (A.4) and (A.5). In what follows, the set B₁ is always assumed to be defined in such a way that the conclusion of Lemma A.1 holds. We shall now proceed in the same way as in Saikkonen (2001) and express the function l_T(Θ,τ,Ω) as a sum of two components. To this end, define

Then w_t⁽⁰⁾ = [w_1t⁽⁰⁾′ : w_2t⁽⁰⁾′]′, and we also partition the parameter matrix Φ conformably as Φ = [Φ₁ : Φ₂], where Φ₁ = [ν₀⁽⁰⁾ : Tν₁⁽⁰⁾ : T^{1/2}ρ⁽⁰⁾] and Φ₂ = [α⁽⁰⁾ : Γ₁ : ··· : Γ_{p−1}]. With these definitions,

where ε_1tτ(Θ) = −Φ₁ w_1t⁽⁰⁾ − Ξq_tτ + Ξ⁽⁰⁾q_{tτ_o} and ε_2t(Φ₂) = Δx_t − Φ₂ w_2t⁽⁰⁾. Clearly, ε_{1tτ_o}(Θ_o) = 0 and

where

For l_2T(Φ₂,Ω) we have the following result.

LEMMA A.2.

where the infimum is over unrestricted values of Φ₂ and Ω > 0.

Proof. Because we can treat Δx_t as a zero mean stationary process and because l_2T(Φ₂,Ω) can be interpreted as −2 times the logarithm of the Gaussian likelihood function associated with the regression model Δx_t = Φ₂ w_2t⁽⁰⁾ + ε_t, the stated result follows from standard regression theory (cf. Saikkonen, 2001, p. 321). █

Next consider the function l_1T(Θ,τ,Ω). Our treatment will be divided into several steps in which the time index t is suitably restricted. This means considering the function l_1T(Θ,τ,Ω) with the sample size T replaced by appropriate quantities smaller than T. Most of the subsequent results will explicitly be formulated for τ ≤ τ_o and only briefly discussed in the case τ ≥ τ_o (cf. Bai et al., 1998).

In the following results about the function l_1T(Θ,τ,Ω), c₁,c₂,… denote positive constants and a_1T,a_2T,… are nonnegative random variables that depend on the sample size but not on the parameters Θ, τ, or Ω. First we prove the following lemma.

LEMMA A.3. There exists a constant c₁ > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o and (Θ,τ,Ω) ∈ B₁,

where a_1T ≥ 0 and a_1T = O_p(1).

Proof. For t ≤ τ − 1, ε_1tτ(Θ) = −Φ₁ w_1t⁽⁰⁾ and, consequently,

For L₁ we have

For (Θ,τ,Ω) ∈ B₁, the first eigenvalue in the last expression is bounded away from zero. That the same holds with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o for the second eigenvalue can be seen by using an analog of (A.3) of Gregory and Hansen (1996, p. 118). Thus, we have shown that L₁ ≥ c₁∥Φ₁∥², c₁ > 0, with probability approaching one.

It remains to show that L₂ ≥ −a_1T∥T^{1/2}Φ₁∥ with a_1T having the properties stated in the lemma. To demonstrate this, notice that

Here we have used the definition of ε_2t(Φ₂), the Cauchy–Schwarz inequality, and the norm inequality. By an analog of (A.4) of Gregory and Hansen (1996, p. 118), the norm in the middle of the last expression is of order O_p(1) uniformly in [Tλ̲] ≤ τ ≤ τ_o and for any fixed value of Φ₂. Thus, because the parameters Φ₂ and Ω belong to bounded sets when (Θ,τ,Ω) ∈ B₁, it can similarly be shown that the last expression as a whole has an upper bound a_1T∥T^{1/2}Φ₁∥ with a_1T as required. This completes the proof. █

Our next result deals with the contribution of l_{1,τ_o−1}(Θ,τ,Ω) − l_1,τ−1(Θ,τ,Ω) to l_1T(Θ,τ,Ω). Here the relevant expression of ε_1tτ(Θ) is

where Ψ₁ = Φ₁ + [δ₁ : 0].

LEMMA A.4. Let ε be any real number with the property 0 < ε < λ_o − λ̲. Then, for λ̲ ≤ λ ≤ λ_o − ε there exists a constant c₂ > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ [T(λ_o − ε)] and (Θ,τ,Ω) ∈ B₁,

where a_iT ≥ 0 (i = 2,3), a_2T = O_p(1), and a_3T = o_p(T^η) with 1/b < η < ¼.

Proof. By the definitions,

First consider L₃ and, for simplicity, denote Ψ̄₁ = [Ψ₁ : γ] and z_1tτ⁽⁰⁾ = [w_1t⁽⁰⁾′ : d_tτ′]′. Then

where D_1T = diag[T^{−1/2}I_{n−r+2} : I_p].

Next note that

uniformly in [Tλ̲] ≤ τ < τ_o. Because w_1t⁽⁰⁾ = [1 : t/T : T^{−1/2}v_{t−1}⁽⁰⁾′]′, this is obvious for the first and second components of w_1t⁽⁰⁾. For the third component the same is true because T^{−1/2} max_{1≤t≤τ_o} ∥v_{t−1}⁽⁰⁾∥ ≤ T^{−1/2} max_{1≤t≤T} ∥β_{o⊥}′x_{t−1}∥ = O_p(1), where the equality follows from the fact that T^{−1/2}β_{o⊥}′x_{[Ts]} obeys an invariance principle. Thus, we can conclude that

uniformly in [Tλ̲] ≤ τ < τ_o − p.
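The O_p(1) bound for T^{−1/2} max_t ∥β_{o⊥}′x_{t−1}∥ used above rests on the invariance principle. A small simulation, offered only as an illustration and not as part of the proof, shows the scalar analogue. For a Gaussian random walk S_t, T^{−1/2} max_{t≤T} |S_t| does not grow with T, and its mean approaches E sup_{0≤s≤1} |W(s)| = √(π/2) ≈ 1.25 from below. The function name is hypothetical.

```python
import numpy as np


def scaled_max_draws(T, reps=2000, seed=0):
    """Draws of T^{-1/2} max_{1<=t<=T} |S_t| for a Gaussian random walk S_t."""
    rng = np.random.default_rng(seed)
    walks = np.cumsum(rng.standard_normal((reps, T)), axis=1)  # one walk per row
    return np.abs(walks).max(axis=1) / np.sqrt(T)
```

Comparing `scaled_max_draws(T).mean()` for T = 100, 1,000, and 2,500 should show values that stabilize rather than grow.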

The next step is to observe that

where M₁₁(λ) is the weak limit of

(cf. (A.3) of Gregory and Hansen, 1996, p. 118). It is straightforward to check that the difference M₁₁(λ_o) − M₁₁(λ) is positive definite and its smallest eigenvalue is bounded from below by a positive constant when λ̲ ≤ λ ≤ λ_o − ε.

The preceding discussion implies that, with probability approaching one, the smallest eigenvalue of the matrix on the l.h.s. of (A.10) is bounded away from zero uniformly in [Tλ̲] ≤ τ ≤ [T(λ_o − ε)]. Thus, with probability approaching one and in the required uniform sense,

where c₂ > 0 is a (small) constant. This implies that it only remains to show that L₄ ≥ −a_2T∥T^{1/2}Ψ₁∥ − a_3T∥γ∥ with a_2T and a_3T as stated in the lemma.

To show the previously mentioned inequality about L₄, conclude from the definitions that

Arguments similar to those already used in the proof of Lemma A.3 show that

where a_2T = O_p(1) in the required uniform sense.

Regarding L₄₂, one similarly obtains

where a_3T = o_p(T^η), 1/b < η < ¼, in the required uniform sense. The latter inequality follows if the last norm in the preceding expression can be replaced by o_p(T^η). To justify this, recall that Δx_t and w_2t⁽⁰⁾ are stationary processes with finite moments of order b > 4 and that Φ₂ can be assumed to belong to a bounded set. Thus, it suffices to show that max_{1≤t≤T}∥Δx_t∥ = o_p(T^η), and similarly with Δx_t replaced by w_2t⁽⁰⁾. This, however, can be done by using an argument entirely similar to that in (A.14) of Saikkonen and Lütkepohl (2002). The inequalities obtained for |L₄₁| and |L₄₂| show that L₄ has the required lower bound, and the proof is complete. █

Our next result describes the contribution of l_{1,τ_o+p−1}(Θ,τ,Ω) − l_{1,τ_o−1}(Θ,τ,Ω) to l_1T(Θ,τ,Ω). We introduce the notation

In the following lemma the relevant values of ε_1tτ(Θ) can then be written as

where Ψ₂ = Φ₁ + [δ₁ − δ₁⁽⁰⁾ : 0]. Note that here the first term in the definition of ζ_tτ⁽⁰⁾ vanishes, but the general definition is convenient in later derivations. Now we can formulate the following lemma.

LEMMA A.5. There exists a constant c₃ > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o and (Θ,τ,Ω) ∈ B₁,

where a_iT ≥ 0 and a_iT = O_p(1) (i = 4,5).

Proof. By the definitions,

Assuming (Θ,τ,Ω) ∈ B₁ we find that

Because we can here assume that Ψ₂ is bounded (see (A.5)), an application of the triangle inequality and the Cauchy–Schwarz inequality shows that the absolute value of the third term in the last expression can be bounded from above by

Here the latter square root is of order O_p(1) (see the argument leading to (A.10)). Hence, we can conclude that

where c₃ = ω̄⁻¹ > 0 and a_41T = O_p(1) in the required uniform sense.

Now consider L₆. Arguments similar to those used in previous derivations combined with the present definition of ε_1tτ(Θ) yield

It is easy to see that the first term on the r.h.s. can be used to define the term a_5T in the lemma. The arguments needed are similar to those used to obtain (A.11), and they can also be applied to the second term so that we can write

where also a_42T = O_p(1) in the required uniform sense. The result of the lemma now follows from (A.11) and (A.12) by defining a_4T = a_41T + a_42T. █

The next lemma is concerned with the contribution of l_1T(Θ,τ,Ω) − l_{1,τ_o+p−1}(Θ,τ,Ω) to l_1T(Θ,τ,Ω). Here ε_1tτ(Θ) is given by

LEMMA A.6. There exists a constant c₄ > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o and (Θ,τ,Ω) ∈ B₁,

where a_6T ≥ 0 and a_6T = O_p(1).

Proof. The proof is similar to that of Lemma A.3. █

Our next lemma is used as an alternative to Lemma A.4 in some of the subsequent derivations. The formulation of this lemma makes use of the notation ζ_tτ⁽⁰⁾ employed in Lemma A.5.

LEMMA A.7. There exists a constant c₅ > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o − 1 and (Θ,τ,Ω) ∈ B₁,

where 1/b < η < ¼, a_iT ≥ 0, and a_iT = O_p(1) (i = 7,8,9).

Proof. By the definitions,

Recall that Ψ₁ = Φ₁ + [δ₁ : 0] and Ψ₂ = Φ₁ + [δ₁ − δ₁⁽⁰⁾ : 0]. For t = τ,…,τ_o − 1, we thus have ε_1tτ(Θ) = −Ψ₁ w_1t⁽⁰⁾ − γd_tτ = −Ψ₂ w_1t⁽⁰⁾ − ζ_tτ⁽⁰⁾. Hence,

Assume that (Θ,τ,Ω) ∈ B₁. An application of the Cauchy–Schwarz inequality, the norm inequality, and the triangle inequality yields

Because max_{1≤t≤T}∥w_1t⁽⁰⁾∥ = O_p(1) (see the arguments leading to (A.10)), the second square root in the last expression is of order O_p(1) uniformly in [Tλ̲] ≤ τ < τ_o. Hence,

where a_8T = O_p(1) in the required uniform sense.

Next note that L₇₁ ≥ 0 and λ_min(Ω⁻¹) ≥ ω̄⁻¹ for (Θ,τ,Ω) ∈ B₁. Consequently,

Now consider L₈, for which we have

Arguments similar to those used for L₇₃ show that

where a_9T = O_p(1) in the required uniform sense. The latter inequality is obtained because, for (Θ,τ,Ω) ∈ B₁, the last norm in the second expression can be replaced by O_p(1) by an analog of (A.4) of Gregory and Hansen (1996, p. 118).

As for L₈₂, assume first that τ < τ_o − p and use the Cauchy–Schwarz inequality to conclude that

Here the second inequality is based on the definitions and the triangle inequality, whereas the third one also makes use of the Cauchy–Schwarz inequality and the norm inequality.

In the last expression

uniformly in [Tλ̲] ≤ τ < τ_o − p and (Θ,τ,Ω) ∈ B₁. Here the latter result follows from the Hájek–Rényi inequality given in Proposition 1 of Bai (1994). The former can be obtained by an argument similar to that used to prove (A.14) of Saikkonen and Lütkepohl (2002).

Combining the preceding results on L₈₂ shows that

where a_71T = O_p((τ_o − τ)^η) in the required uniform sense and the equality follows from definitions. Because for any real numbers a ≥ 0 and b ≥ 0 we have

, it follows that

In the proof of this result it was assumed that τ < τ_o − p, but it also holds for τ_o − p ≤ τ < τ_o. In that case arguments similar to those used for L₇₃ give

and (A.16) holds with a_71T = O_p(1). The result of the lemma is obtained from the definitions of L₇ and L₈ in conjunction with (A.13)–(A.16) by defining

and a_9T as done in (A.13) and (A.15), respectively. █

In the proof of the next lemma and also in subsequent proofs, frequent use will be made of the elementary inequality

which holds for a₀, a₁ ≥ 0 and a₂ > 0.
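A typical use of an inequality of this type is the lower bound −(1/4c₅)[·]² applied after (A.22), which comes from completing the square. The display below is a sketch of that step only and does not reconstruct (A.17) itself. The symbols c, A, and x are hypothetical placeholders for c₅, the bracketed coefficient, and the relevant norm.

```latex
c\,x^{2} - A\,x
  \;=\; c\Bigl(x - \frac{A}{2c}\Bigr)^{2} - \frac{A^{2}}{4c}
  \;\ge\; -\frac{A^{2}}{4c},
  \qquad c > 0,\ A \ge 0,\ x \in \mathbb{R}.
```

Equality holds at x = A/(2c). Because the bound does not involve x, it holds uniformly over the parameter set on which the quadratic form is evaluated.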

LEMMA A.8. Let ε > 0 and B₂ = {(Θ,τ,Ω) : ∥T^{1/2−η}Φ₁∥² + ∥T^{1/2−η}Ψ₂∥² ≤ ε²}, where 1/b < η < ¼ is the same as in Lemma A.7. Then,

with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o.

Proof. By the definitions and Lemma A.2,

Thus, it suffices to show that, for some ε_* > 0,

with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o.

From Lemma A.1 it follows that we only need to prove (A.19) with the set B₂^c replaced by B₁ ∩ B₂^c. Let 0 < ε₁ ≤ λ_o − λ̲ and define the sets

According to what was said previously, it suffices to establish (A.19) separately with B₂^c replaced by B₂₁ and B₂₂. Here we are free to choose the value of ε₁. Whatever our choice, Lemma A.4 can be applied on the set B₂₁, on which we shall first concentrate.

From Lemmas A.4 and A.5 and (A.17) we first find that, uniformly in B₂₁,

Combining these inequalities with those obtained from Lemmas A.3 and A.6 shows that, uniformly in B₂₁,

Denote

. Then the preceding inequality implies that, uniformly in B₂₁,

For simplicity, denote φ_T² = ∥T^{1/2−η}Φ₁∥² + ∥T^{1/2−η}Ψ₂∥² and note that the sum of the two norms in the last expression is at most

. Thus, uniformly in B₂₁,

Because φ_T > ε on B₂₁ and a_T* = O_p(1) uniformly in B₂₁, this shows that (A.19) holds with B₂^c replaced by B₂₁.

Now consider proving (A.19) with B₂^c replaced by B₂₂. Here we can use Lemmas A.3, A.5, A.6, and A.7 to conclude that, with probability approaching one and uniformly in B₂₂,

Here it is understood that a_9T and the last two terms on the r.h.s. are deleted if τ = τ_o because then Lemma A.7 becomes redundant. By (A.17) the sum of the fifth, sixth, and seventh terms on the r.h.s. is of order o_p(1) uniformly in B₂₂, and the sum of the last two terms can be bounded from below by −(1/4c₅)[a_7T((τ_o − τ)/T)^η + a_8T((τ_o − τ)/T)^{1/2}∥T^{1/2−η}Ψ₂∥]². Thus, expanding the square and inserting the result on the r.h.s. of the preceding inequality yields, uniformly in B₂₂,

where

Note that here a_6T,…,a_9T are of order O_p(1) uniformly in B₂₂ and that, on B₂₂, (τ_o − τ)/T ≤ 2ε₁, say. Because we are free to choose the value of ε₁, we can choose it so small that the following two conditions hold with probability approaching one and uniformly in B₂₂: (i) c_4T(τ) ≥ c₄/2 and (ii) a_10T(τ) and a_11T(τ) become smaller than any preassigned positive number. Taking these facts into account and comparing the inequality (A.23) with (A.20) shows that only two points make the previous proof based on inequality (A.20) directly inapplicable in the present context. Instead of the terms T^{−η}a_6T = o_p(1) and o_p(1), we have in (A.23) the terms a_10T(τ) and a_11T(τ) + o_p(1), respectively, which are not of order o_p(1) but can only be bounded by an arbitrarily small positive number independent of the parameters. However, this is sufficient for applying essentially the same proof as before. Indeed, we can conclude that, uniformly in B₂₂, an analog of (A.21) holds except that in the last expression T^η is replaced by a fixed positive number that can be assumed as large as we wish and o_p(1) is replaced by a fixed negative number that, in absolute value, can be assumed as small as we wish. In particular, we can assume that T^η and o_p(1) in (A.21) are replaced by M/ε and −ε/M, respectively, where M can be chosen arbitrarily large. This shows that we can make the r.h.s. of the present version of (A.21) larger than some ε_* > 0 with probability approaching one. Thus, there is a choice of ε₁ such that (A.19) holds with B₂^c replaced by B₂₁ and B₂₂. This completes the proof. █

The next lemma is similar to Lemma A.8 except that it deals with the short-run parameter Φ₂.

LEMMA A.9. Let ε > 0 and B₃ = {(Θ,τ,Ω) : ∥T^{1/2−η}(Φ₂ − Φ_2o)∥ ≤ ε}, where 1/b < η < ¼ is the same as in Lemma A.7. Then,

with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o.

Proof. By Lemma A.1 it suffices to prove the result with B₃^c replaced by B₁ ∩ B₃^c. First consider the break dates [Tλ̲] ≤ τ ≤ [T(λ_o − ε₁)] and note that the derivation of the inequality in (A.21) is valid for these break dates and for all (Θ,τ,Ω) ∈ B₁ ∩ B₃^c. It is also valid for every ε₁ > 0. Thus, an application of (A.17) shows that in this part of the parameter space T^{−2η}l_1T(Θ,τ,Ω) ≥ o_p(1) holds uniformly. Next note that the inequality (A.23) is valid for [T(λ_o − ε₁)] < τ ≤ τ_o and for all (Θ,τ,Ω) ∈ B₁ ∩ B₃^c. Moreover, as the discussion after that inequality reveals, we can, with a suitable (small) choice of ε₁, use (A.17) to obtain an analog of (A.21) from which we conclude that, with probability approaching one and uniformly in the considered part of the parameter space, T^{−2η}l_1T(Θ,τ,Ω) ≥ −ε₂, where ε₂ > 0 can be chosen arbitrarily small. From the preceding discussion and the first equality in (A.18) it thus follows that we need to show that, for some ε_* > 0,

with probability approaching one. Arguments needed to show this are similar to those used in previous proofs and also very similar to those used to prove the consistency of the LS estimators of the parameters Φ₂ and Ω in the standard regression model Δx_t = Φ₂ w_t⁽⁰⁾ + ε_t. Details are straightforward and are omitted. █
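The consistency argument invoked here is that of least squares in a linear regression. The following Python sketch simulates Δx_t = Φ₂w_t + ε_t and computes the LS estimator of Φ₂ together with the residual covariance estimator of Ω. It uses hypothetical parameter values and simplifies the regressors to i.i.d. Gaussian w_t. In the paper, w_t⁽⁰⁾ instead collects lagged differences and cointegration relations.

```python
import numpy as np


def ls_estimates(dx, w):
    """LS estimates in dx_t = Phi2 w_t + eps_t.

    dx: (T, n) responses; w: (T, k) regressors.
    Returns Phi2_hat with shape (n, k) and Omega_hat with shape (n, n).
    """
    T = dx.shape[0]
    # lstsq minimizes ||w B - dx||, so B = Phi2_hat' has shape (k, n)
    B, *_ = np.linalg.lstsq(w, dx, rcond=None)
    resid = dx - w @ B
    omega_hat = resid.T @ resid / T  # ML-type residual covariance
    return B.T, omega_hat


def simulate_regression(T, rng):
    """Hypothetical DGP with n = 2 equations and k = 3 regressors."""
    phi2 = np.array([[0.5, -0.2, 0.1],
                     [0.0, 0.3, -0.4]])
    omega = np.array([[1.0, 0.3],
                      [0.3, 0.5]])
    w = rng.standard_normal((T, 3))
    eps = rng.multivariate_normal(np.zeros(2), omega, size=T)
    dx = w @ phi2.T + eps
    return dx, w, phi2, omega


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for T in (100, 1_000, 10_000):
        dx, w, phi2, omega = simulate_regression(T, rng)
        phi2_hat, omega_hat = ls_estimates(dx, w)
        # maximal absolute estimation errors for Phi2 and Omega
        print(T, np.abs(phi2_hat - phi2).max(), np.abs(omega_hat - omega).max())
```

The printed errors should shrink roughly at the rate T^{−1/2} as the sample size grows.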

The next lemma again makes use of the notation ζ_tτ⁽⁰⁾ introduced for Lemma A.5.

LEMMA A.10. Let

, where τ < τ_o and 1/b < η < ¼ is the same as in Lemma A.7. Then, there exists a real number M₀ > 0 such that, for all M ≥ M₀,

with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o − 1. If the quantity (τ_o − τ)^{−2η} in the definition of the set B₄ is replaced by T^{−2η}, the same conclusion holds.

Proof. From (A.18) it follows that it suffices to show that there exists a real number M₀ > 0 such that, for all M ≥ M₀ and any M₁ > 0,

with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τ_o − 1. From Lemmas A.1, A.8, and A.9 it further follows that here the set B₄^c can be replaced by B₁ ∩ B₂ ∩ B₃ ∩ B₄^c. From (A.19) it can be seen that the value of ε in the definition of B₂ can be chosen arbitrarily small.

We wish to apply Lemmas A.3, A.5, A.6, and A.7 to obtain a lower bound for l_1T(Θ,τ,Ω). This lower bound can be obtained by multiplying both sides of the inequality (A.22) by T^{2η}. By (A.17) the contribution of the first four terms to the r.h.s. of the resulting inequality can be replaced by O_p(1). This is also the case for the seventh term. Hence, we can write

This holds uniformly in B₁ ∩ B₂ ∩ B₃ ∩ B₄^c and [Tλ̲] ≤ τ ≤ τ_o − 1. In this part of the parameter space we also have

and a_4T ≤ a_4T(τ_o − τ)^η (see Lemma A.8). Denote c* = min(c₃,c₅), a_T* = max(a_4T, a_7T + εa_8T), and for simplicity,

. From the lower bound obtained for l_1T(Θ,τ,Ω) previously we can then further obtain

Again, this holds uniformly in B₁ ∩ B₂ ∩ B₃ ∩ B₄^c and [Tλ̲] ≤ τ ≤ τ_o − 1. Now, on B₄^c, ξ_τ > M(τ_o − τ)^η, so that, for all M large enough and with probability approaching one, we can make the r.h.s. of (A.25) larger than any preassigned number M₁ > 0. Thus, we have established (A.24) and thereby the first assertion of the lemma. The second assertion follows readily from (A.25) and the discussion thereafter. █

Before proceeding to new proofs we discuss how Lemmas A.3–A.10 are formulated when τ ≥ τ_o.

The counterpart of Lemma A.3 is concerned with the time points t = p + 1,…,τ_o − 1 and break dates τ_o ≤ τ ≤ [Tλ̄] but is otherwise similar to Lemma A.3.

The next time points of interest are now t = τ_o,…,τ_o + p − 1 so that we need to consider a counterpart of Lemma A.5. Here we write

where Ψ₂ = Φ₁ + [δ₁ − δ₁⁽⁰⁾ : 0] as before and ζ_tτ = (d_tτ − d_{tτ_o})δ₁ + γd_tτ − γ⁽⁰⁾d_{tτ_o}. In other words, in place of ζ_tτ⁽⁰⁾ we now use an analogous variable defined by using the parameter δ₁ instead of δ₁⁽⁰⁾. However, replacing ζ_tτ⁽⁰⁾ in Lemma A.5 by ζ_tτ is clearly possible, as can be seen from the given proof.

Instead of the time points t = τ_o + p,…,τ − 1 it is next reasonable to consider the time points t = τ_o + p,…,τ + p − 1. Then the number of time points is the same as in Lemmas A.4 and A.7. Changes in parameters have to be made, though. Now

where Ψ₁⁽⁰⁾ = Φ₁ − [δ₁⁽⁰⁾ : 0]. Thus, we now have the matrix Ψ₁⁽⁰⁾ in place of Ψ₁ used in Lemma A.4, and, as before, the former is defined by using δ₁⁽⁰⁾ instead of δ₁ in Ψ₁. The parameter γ used in Lemma A.4 is also changed by adding δ₁ to its columns. With these replacements the counterpart of Lemma A.4 applies with [T(λ_o + ε)] ≤ τ ≤ [Tλ̄].

Next consider the counterpart of Lemma A.7, which is also concerned with the time points t = τ_o + p,…,τ + p − 1. Here the preceding expression of ε_1tτ(Θ) is modified to the form

where Ψ₂ is as defined in the proof of Lemma A.7. In the counterpart of Lemma A.7 we then have ζ_tτ in place of ζ_tτ⁽⁰⁾ and τ_o + 1 ≤ τ ≤ [Tλ̄]. The proof can again be obtained basically by following the previous proof.

The counterpart of Lemma A.6 is straightforward. The relevant time points are t = τ,…,T, and the obtained lower bound is as before except for the obvious change in the values of τ, which become τ_o ≤ τ ≤ [Tλ̄]. The proof is similar to the proof of Lemma A.3.

It is not difficult to check that the modified versions of Lemmas A.3–A.7 can be used to show that the results of Lemmas A.8 and A.9 also apply for τ_o ≤ τ ≤ [Tλ̄]. Regarding Lemma A.10, when τ_o + 1 ≤ τ ≤ [Tλ̄], the set B₄ is defined as

but otherwise the same result obtains.

Now we can turn to our next lemma, which is central in studying asymptotic properties of the break date estimator. Recall that δ_1o = −Π_oδ_o = −α_o β_o′δ_o, where δ_o = T^aδ_*. Thus, because α_o has full column rank, δ_1o ≠ 0 if and only if β_o′δ_o ≠ 0. Note also that we shall use the convention that the infimum over an empty set is ∞.

LEMMA A.11. Let M > 0. Assume that δ_1o ≠ 0 and define B₅ = {(Θ,τ,Ω) : (|τ_o − τ| − p)∥δ_1o∥^{2/(1−2η)} ≤ M}, where 1/b < η < ¼ is the same as in Lemma A.7 or its counterpart when τ > τ_o. Then there exists a real number M₀ > 0 such that, for all M ≥ M₀,

with probability approaching one. If δ_1o = 0 the same result holds with the set B₅ replaced by

Proof. Assume first that τ < τ_o − p and δ_1o ≠ 0. From Lemmas A.1, A.8, and A.9 it follows that we can replace the set B₅^c by B₁ ∩ B₂ ∩ B₃ ∩ B₅^c.

By the definitions, δ₁⁽⁰⁾ = −Πδ_o = −α⁽⁰⁾β_o′δ_o − ρ⁽⁰⁾β_o⊥′δ_o, where β_o′δ_o ≠ 0. On B₃, ∥α⁽⁰⁾ − α_o∥ ≤ εT^{η−1/2} and, on B₂, ∥ρ⁽⁰⁾∥ ≤ εT^{η−1} (see Lemmas A.8 and A.9). Thus, because δ_1o = −α_o β_o′δ_o and δ_o = T^aδ_*,

for some positive and finite constant c. Hence, because ζ_tτ⁽⁰⁾ = (d_tτ − d_{tτ_o})δ₁⁽⁰⁾ = δ₁⁽⁰⁾ for t = τ + p,…,τ_o − 1, we have on B₁ ∩ B₂ ∩ B₃ ∩ B₅^c,

Here the fourth relation makes use of the triangle inequality. For all T and M large enough the last expression can be made larger than the real number M₀ in Lemma A.10. Thus, the stated result follows from Lemma A.10.

Now consider the case τ > τ_o + p but maintain the assumption δ_1o ≠ 0. Then, using the counterparts of Lemmas A.8 and A.9 we can proceed in the same way as in the case τ < τ_o − p until the relations (A.26), which start now as

Thus, in place of δ₁⁽⁰⁾ we now have δ₁. However, from the counterpart of Lemma A.8 we find that, on B₂, ∥δ₁ − δ₁⁽⁰⁾∥ ≤ εT^{η−1/2}, and a straightforward modification of the arguments in the latter part of (A.26), combined with the present version of Lemma A.10, gives the desired result.

Next assume that δ_1o = 0 and τ ≤ τ_o. In this case we use the inequality

where

, that is, the summation on the r.h.s. is over the values of t for which Δd_{tτ_o} ≠ 0 or Δd_tτ ≠ 0. Clearly the number of such time points is at most 2p.

From the definitions it follows that

Notice that here Γ_joδ_o = −γ_jo (j = 1,…,p − 1) and, because now δ_1o = 0, δ_o = γ_0o. Thus, the sum of the last three terms equals γd_tτ − γ_o d_{tτ_o}, and we wish to show that the contribution of the first three terms to the r.h.s. of (A.27) can be ignored. To this end, note that now δ₁⁽⁰⁾ = −ρ⁽⁰⁾β_o⊥′δ_o so that, on B₂, ∥δ₁⁽⁰⁾∥ ≤ cεT^{η+a−1} for some 0 ≤ c < ∞ (see Lemma A.8). Furthermore, on B₃, ∥(Γ_j − Γ_jo)δ_o∥ ≤ ∥Γ_j − Γ_jo∥∥δ_o∥ ≤ εT^{η+a−1/2}∥δ_*∥ (j = 1,…,p − 1) (see Lemma A.9). Using these facts and the triangle inequality we find that

On the r.h.s. the summation can be extended to all t = p + 1,…,T. This means that on B₁ ∩ B₂ ∩ B₃ ∩ B₅₀^c the last expression becomes larger than the real number M₀ in Lemma A.10 for all T and M large enough. Thus, the stated result follows from the latter part of Lemma A.10.

Finally, assume that δ_1o = 0 and τ > τ_o. In place of (A.27) we then have a similar inequality with t = τ_o,…,τ + p − 1 and ζ_tτ⁽⁰⁾ replaced by ζ_tτ. However, using the fact that ∥δ₁ − δ₁⁽⁰⁾∥ ≤ εT^{η−1/2} on B₂, it is straightforward to show that the proof can be reduced to a form entirely similar to that in the case τ ≤ τ_o. This completes the proof of the lemma. █

Now we can prove Theorem 3.1. As discussed earlier, the estimator

can also be obtained by minimizing −2 times the Gaussian log-likelihood function l_T(Θ,τ,Ω). First consider the case a > 0 and δ_1o ≠ 0. By Lemma A.11 we can then concentrate on the break dates τ_o − p ≤ τ ≤ τ_o + p. First consider the case τ_o − p ≤ τ ≤ τ_o. If γ_jo = 0 for all j = 0,…,p − 1, Lemma A.11 shows that, asymptotically,

, as required. Next suppose that γ_j₀,o ≠ 0 and consider the break dates τ_o − p ≤ τ ≤ τ_o − p + j₀. For any of these break dates we have

Suppose first that j₀ > 0. Then, because γ_j₀⁽⁰⁾ − γ_j₀,o = −(Γ_j₀ − Γ_j₀,o)δ_o, we have for (Θ,τ,Ω) ∈ B₃,

Because γ_j₀,o = −T^aΓ_j₀,oδ_* ≠ 0, the last quantity tends to infinity as T → ∞. Hence, we can conclude from Lemmas A.9 and A.10 that asymptotically the function l_T(Θ,τ,Ω) is not minimized for τ ≤ τ_o − p + j₀. Now consider the case j₀ = 0. From the definitions it follows that γ₀⁽⁰⁾ − γ_0o = δ_1o − δ₁⁽⁰⁾, where ∥δ_1o − δ₁⁽⁰⁾∥ ≤ cT^{η+a−1/2}ε on B₂ ∩ B₃ (see the beginning of the proof of Lemma A.11). Hence, because γ_0o = T^a(δ_* + α_o β_o′δ_*) ≠ 0, the proof given in the case j₀ > 0 applies with obvious changes and shows that asymptotically

cannot occur.

To complete the proof of the first assertion, consider the case τ_o + 1 ≤ τ ≤ τ_o + p. By the definitions we then have ζ_{τ_oτ} = −δ₁ − γ₀⁽⁰⁾ = −δ_o + (δ₁⁽⁰⁾ − δ₁), where δ_o ≠ 0 and ∥δ₁⁽⁰⁾ − δ₁∥ ≤ εT^{η−1/2} for (Θ,τ,Ω) ∈ B₂ ∩ B₃ (see Lemma A.8 and the definition of Ψ₂ given before Lemma A.5). In the same way as in the preceding case we can thus conclude from Lemmas A.8–A.10 that asymptotically

cannot occur. This completes the proof of the first assertion in the case a > 0 and δ_1o ≠ 0.

Next assume that a > η > 1/b and δ_1o = 0. Then, if τ ≤ τ_o − p + j₀ and j₀ > 0,

Because the last quantity tends to infinity as T → ∞ it follows from the latter part of Lemma A.11 that asymptotically

cannot occur. If j₀ = 0, we have γ_0o = δ_o − δ_1o = δ_o ≠ 0, and (A.28) holds with Γ_j₀,oδ_* replaced by δ_*. Hence the same conclusion also obtains for j₀ = 0.

If τ > τ_o, the l.h.s. of (A.28) can be bounded from below by T^{−2η}∥γ_0o∥² = T^{−2η}∥δ_o∥² = T^{2a−2η}∥δ_*∥², and the situation is similar to the case j₀ = 0.

Finally, the second part of the theorem follows directly from the first part of Lemma A.11. This completes the proof of Theorem 3.1.
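The proof above treats the break date estimator as a minimizer of −2 times a Gaussian log-likelihood. Concentrating this objective with respect to Ω leaves, up to a constant, T log det Ω̂(τ), where Ω̂(τ) is the LS residual covariance for a given τ. The following Python sketch implements this grid search in a deliberately simplified setting: a stationary bivariate VAR(1) around a level shift, with an intercept, a step dummy d_tτ, and an impulse dummy Δd_tτ in the regression. It is not the paper's VECM (2.7) or its exact estimator. The function names, the DGP, and all parameter values are hypothetical.

```python
import numpy as np


def step_dummy(T, tau):
    """d_{t,tau} = 1 for t >= tau and 0 otherwise (t = 0, ..., T-1)."""
    return (np.arange(T) >= tau).astype(float)


def concentrated_objective(x, tau):
    """(T-1) * log det of the LS residual covariance for a VAR(1) in levels
    with intercept, step dummy d_{t,tau} and impulse dummy Delta d_{t,tau}."""
    T = x.shape[0]
    d = step_dummy(T, tau)
    y = x[1:]                          # x_t for t = 1, ..., T-1
    X = np.column_stack([
        np.ones(T - 1),                # intercept
        x[:-1],                        # x_{t-1}
        d[1:],                         # step dummy d_{t,tau}
        np.diff(d),                    # impulse dummy, equal to 1 only at t = tau
    ])
    B, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ B
    omega = resid.T @ resid / (T - 1)
    _, logdet = np.linalg.slogdet(omega)
    return (T - 1) * logdet


def estimate_break(x, lo, hi):
    """Grid search over tau = lo, ..., hi; returns the minimizing tau."""
    taus = np.arange(lo, hi + 1)
    values = [concentrated_objective(x, int(tau)) for tau in taus]
    return int(taus[int(np.argmin(values))])


def simulate_level_shift(T, tau_o, delta, rng):
    """x_t = mu + delta * d_{t,tau_o} + z_t with a stationary VAR(1) z_t."""
    A = np.diag([0.5, 0.3])
    mu = np.array([1.0, -1.0])
    e = rng.standard_normal((T, 2))
    z = np.zeros((T, 2))
    for t in range(1, T):
        z[t] = A @ z[t - 1] + e[t]
    return mu + np.outer(step_dummy(T, tau_o), delta) + z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = simulate_level_shift(300, 150, np.array([8.0, -8.0]), rng)
    print(estimate_break(x, 45, 255))
```

In this DGP the regression is correctly specified at τ = τ_o, because x_t = (I − A)μ + Ax_{t−1} + (I − A)δd_t + AδΔd_t + e_t. With a large shift the minimizer therefore typically lands at, or very close to, the true date.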

A.2. Proof of Theorem 3.2.

The break date estimator

can also be obtained by minimizing the objective function l_T(Θ,τ,Ω) over the relevant restricted part of the parameter space. Compared to the previous unrestricted estimation, the parameters δ₁ and γ in (A.2) are no longer freely varying but are (smooth) functions of the parameters δ, ρ⁽⁰⁾, α⁽⁰⁾, and Γ₁,…,Γ_{p−1}. Specifically, δ₁ = −Πδ = −α⁽⁰⁾β_o′δ − ρ⁽⁰⁾β_o⊥′δ, γ₀ = δ − δ₁, and γ_j = −Γ_jδ (j = 1,…,p − 1). Unlike in the unconstrained case, it is not obvious that these restricted estimators exist, so their existence will be justified first. After that the proof follows straightforwardly from the results used to prove Theorem 3.1.
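The restrictions just listed can be written as a small function. The following Python sketch maps (δ, α⁽⁰⁾, ρ⁽⁰⁾, Γ₁,…,Γ_{p−1}) into (δ₁, γ₀, γ₁,…,γ_{p−1}) for given β_o and β_o⊥. The function name and the numerical values, with n = 2, r = 1, and p = 2, are hypothetical. The example also illustrates the earlier remark that, when Π = α⁽⁰⁾β_o′ (that is, ρ⁽⁰⁾ = 0), δ₁ vanishes exactly when β_o′δ = 0.

```python
import numpy as np


def restricted_shift_params(delta, alpha, rho, beta, beta_perp, gammas):
    """Map (delta, alpha, rho, Gamma_1, ..., Gamma_{p-1}) to the restricted
    dummy coefficients (delta1, gamma_0, [gamma_1, ..., gamma_{p-1}]).

    delta: (n,), alpha and beta: (n, r), rho and beta_perp: (n, n - r),
    gammas: list of (n, n) short-run matrices Gamma_j.
    """
    Pi = alpha @ beta.T + rho @ beta_perp.T    # Pi = alpha beta' + rho beta_perp'
    delta1 = -Pi @ delta                        # delta_1 = -Pi delta
    gamma0 = delta - delta1                     # gamma_0 = delta - delta_1
    gamma_j = [-G @ delta for G in gammas]      # gamma_j = -Gamma_j delta
    return delta1, gamma0, gamma_j


if __name__ == "__main__":
    alpha = np.array([[-0.5], [0.2]])
    beta = np.array([[1.0], [-1.0]])
    beta_perp = np.array([[1.0], [1.0]])
    rho = np.zeros((2, 1))
    gammas = [0.1 * np.eye(2)]
    # beta' delta = 2 != 0, so delta1 != 0
    print(restricted_shift_params(np.array([2.0, 0.0]), alpha, rho, beta, beta_perp, gammas))
    # beta' delta = 0 and rho = 0, so delta1 = 0 and gamma_0 = delta
    print(restricted_shift_params(np.array([2.0, 2.0]), alpha, rho, beta, beta_perp, gammas))
```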

Define

Using y_t^(τ) in place of x_t we can obtain an analog of (A.2) in which d_{tτ_o} and d_{tτ_o} are replaced by d_tτ and d_tτ, respectively, and u_{t−1}⁽⁰⁾ and v_{t−1}⁽⁰⁾ are replaced by analogs defined in terms of y_t^(τ) instead of x_t. In other words, in place of u_{t−1}⁽⁰⁾ and v_{t−1}⁽⁰⁾ we use u_{t−1}^(τ) = β_o′y_{t−1}^(τ) and v_{t−1}^(τ) = β_o⊥′y_{t−1}^(τ), respectively. In place of (A.3) we then have

where w_t^(τ) is an obvious modification of w_t⁽⁰⁾.

Clearly, we can express the vector ε_tτ(Θ) as

and use this expression in the previous definition of l_T(Θ,τ,Ω). To demonstrate the existence of a minimizer of the objective function l_T(Θ,τ,Ω), it is also convenient to use the reparameterization Θ → Θ⁽⁰⁾ = [Φ : Ξ − Ξ⁽⁰⁾]. Thus, if for simplicity we denote z_t^(τ) = [Δy_t^(τ)′ : w_t^(τ)′ : q_tτ′]′ and R(Θ⁽⁰⁾) = [I_n : −Φ : Ξ − Ξ⁽⁰⁾], we can write the relevant objective function as

Note that in the present context the parameter Θ has the same meaning as before except that it is treated as a (smooth) function of the parameters ν₀⁽⁰⁾, ν₁⁽⁰⁾, δ, ρ⁽⁰⁾, α⁽⁰⁾, and Γ₁,…,Γ_p−1. Because the parameter Ξ⁽⁰⁾ is also a (smooth) function of (some of) these parameters the same is true for the parameter Θ⁽⁰⁾. All these parameter restrictions are taken into account when the minimization of the objective function l_T(Θ⁽⁰⁾,τ,Ω) is considered. Notice that, because the objective function is expressed as a function of the “reduced form” parameter Θ⁽⁰⁾, the role of the parameter restrictions is to define the permissible space of Θ⁽⁰⁾. A similar idea, of course, applies to the previous parameterization of the objective function, that is, to l_T(Θ,τ,Ω) (cf. Saikkonen, 2001, and the references therein for a similar approach).

A useful consequence of the fact that we can still interpret the objective function l_T(Θ,τ,Ω) as a function of the “reduced form” parameter Θ and only restrict its permissible space is that the results obtained to prove Theorem 3.1 can be applied straightforwardly here as well. In particular, we wish to apply Lemma A.11 to conclude that, when the existence of a minimizer of the objective function l_T(Θ,τ,Ω) is studied in the present setup, values of the break date parameter τ can be restricted as implied by this lemma. Of course, this conclusion also holds when the objective function is parameterized as l_T(Θ⁽⁰⁾,τ,Ω).

To justify the application of Lemma A.11, we first discuss how Lemmas A.1–A.10 have to be modified to match the present setup. Notice that the existence of a minimizer of the objective function l_T(Θ,τ,Ω) is not needed to prove Lemmas A.1–A.11 and the same is also true for their modified versions to be discussed subsequently.

First note that Lemma A.2 is still used in its previous form and, because it is concerned with unrestricted values of Φ₂ and Ω, it obviously applies in the present context. Lemma A.1 is simply modified by replacing B₁ by the intersection of the restricted parameter space of (Θ,τ,Ω) and values for which the inequality constraints in (A.4) and (A.5) hold. This restricted version of the parameter space B₁ is then used to replace B₁ in Lemmas A.3–A.7. It is straightforward to check that the previous proofs of these lemmas apply in essence despite the differences in parameter spaces.

Next consider Lemmas A.8–A.10, where, in addition to B₁, also the parameter spaces B₂, B₃, and B₄ are redefined to allow for the employed restrictions. Again, it is not difficult to check that the previous proofs carry over. It is also easy to see that the modifications needed for Lemmas A.3–A.10 can be done in the case τ ≥ τ_o.

Because analogs of Lemmas A.1–A.10 hold in the present context, it is further straightforward to show that the result of Lemma A.11 also holds with the parameter space B₅ redefined to account for the employed restrictions. Thus, we can conclude that when searching for a minimizer of the objective function l_T(Θ⁽⁰⁾,τ,Ω), the value of the break date parameter τ can be restricted as implied by Lemma A.11. Specifically, if δ_1o ≠ 0, Lemma A.11 directly shows that τ_o − p ≤ τ ≤ τ_o + p can be assumed. If δ_1o = 0 and a > η > 1/b, we can even assume τ_o − p + 1 ≤ τ ≤ τ_o + p − 1, as the argument used to prove the corresponding case of Theorem 3.1(i) readily shows.

We shall now show that the function l_T(Θ⁽⁰⁾,τ,Ω), and hence l_T(Θ,τ,Ω), has a minimizer with probability approaching one. In what follows, references to Lemmas A.1–A.11 are understood to refer to the present restricted setup. We first show the following intermediate result, where the matrix D_T = diag[T^{−1/2}I : I_p] is used. Its dimension equals the dimension of the vector z_t^(τ).

LEMMA A.12. There exists an ε_* > 0 such that

with probability approaching one and uniformly in τ, when the value of the break date parameter τ can be restricted as implied by Lemma A.11.

Proof. The values of τ can be restricted depending on the value of a and whether δ_1o = 0 or not. Different cases will therefore be discussed separately.

Case (i). a > 0 and δ_1o ≠ 0 or a > η > 1/b.

From Lemma A.11 we can then conclude that, if a minimizer of l_T(Θ⁽⁰⁾,τ,Ω) exists, in large samples it must be such that the corresponding τ is in the interval [τ_o − p,τ_o + p]. If δ_1o ≠ 0 this follows directly from the first part of Lemma A.11. If δ_1o = 0 (and a > η > 1/b) the same conclusion can be drawn from the second part of the lemma by the argument used in the proof of Theorem 3.1 to obtain (A.28).

To justify (A.31), assume first that a < ½. Then the moment matrix in (A.31) behaves asymptotically in the same way as in the proof of Theorem 3.1 in that the vectors Δy_t^(τ) and w_t^(τ) in the definition of z_t^(τ) can be replaced by analogs defined in terms of x_t. This follows by observing that, when |τ_o − τ| ≤ p, the latter term on the r.h.s. of (A.29) satisfies

When a < ½ the last quantity converges to zero, and the desired conclusion is readily obtained.

If a = ½, the latter term on the r.h.s. of (A.29) has an impact, but (A.31) still obtains. To see this, suppose first that δ_1o ≠ 0. Then, as |τ − τ_o| ≤ p, the latter term on the r.h.s. of (A.29) behaves like an impulse dummy. Because now δ_o = T^{1/2}δ_*, this term affects the asymptotic behavior of the moment matrix in (A.31), but, as can be readily seen, it only affects the diagonal and off-diagonal elements related to u_{t−1}^(τ) and Δy_{t−j}^(τ) (j = 0,…,p − 1). Moreover, the impact is such that asymptotically the moment matrix in (A.31) only differs from that obtained in the previous case by an additive positive semidefinite matrix. Thus, from this fact and the result of the previous case one again obtains (A.31).

Next assume that δ_1o = 0 and a = ½. Here the situation is similar to the preceding case but simpler, because now u_{t−1}^(τ) = β_o′x_{t−1} = u_{t−1}⁽⁰⁾. Thus, we again get (A.31), and we have justified (A.31) in the case of the first part of the theorem. It remains to consider the second part, for which the following assumption is made.

Case (ii). a ≤ 0 and δ_1o ≠ 0.

If a = 0, it follows from the first part of Lemma A.11 that we can assume |τ − τ_o| to be bounded, and arguments similar to those in the case 0 < a < ½ and δ_1o ≠ 0 show (A.31). If a < 0, we cannot restrict the values of τ. However, from (A.32) it can be seen that the vectors Δy_t^(τ) and w_t^(τ) in the definition of z_t^(τ) can be replaced by analogs defined in terms of x_t. Arguments similar to those used in the proof of Theorem 3.1 then show that (A.31) also holds in the present case. (In particular, analogs of (A.3) and (A.4) of Gregory and Hansen, 1996, and (A.14) of Saikkonen and Lütkepohl, 2002, can be used to handle sums of cross products between [Δx_t′ : w_t⁽⁰⁾′]′ and q_tτ.) █
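In the case a = ½ of the preceding proof, an additive positive semidefinite term leaves (A.31) intact. The following Python sketch checks this numerically, assuming that (A.31) is a lower bound ε_* on the smallest eigenvalue of the normalized moment matrix. The display of (A.31) is not reproduced here. The underlying fact is Weyl's inequality: λ_min(A + B) ≥ λ_min(A) + λ_min(B) ≥ λ_min(A) whenever B is positive semidefinite. The random matrices are hypothetical stand-ins. A plays the role of the moment matrix from the case a < ½, and B that of a rank-one term such as an impulse-dummy contribution.

```python
import numpy as np


def min_eig(m):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(m)[0]


def psd_perturbation_check(k, n_obs, rng):
    """Return (lambda_min(A), lambda_min(A + B)) for a random moment matrix A
    and a random rank-one positive semidefinite perturbation B."""
    G = rng.standard_normal((n_obs, k))
    A = G.T @ G / n_obs              # sample moment matrix, positive definite
    v = rng.standard_normal((k, 1))
    B = v @ v.T                      # rank-one PSD term
    return min_eig(A), min_eig(A + B)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    for _ in range(5):
        before, after = psd_perturbation_check(4, 50, rng)
        print(f"{before:.4f} <= {after:.4f}")
```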

We have now shown that when searching for a minimizer of the function l_T(Θ⁽⁰⁾,τ,Ω) we can in both parts of Theorem 3.2 restrict the values of the break date τ in such a way that (A.31) holds with probability approaching one and uniformly in τ.

Using Lemma A.12 we can analyze the function l_T(Θ⁽⁰⁾,τ,Ω) in the same way as in the proof of Proposition 3.1 of Saikkonen (2001, pp. 320–321) and conclude that it suffices to search for a minimizer of l_T(Θ⁽⁰⁾,τ,Ω) in that part of the parameter space where, in addition to the restrictions on τ, we also have 0 < ω̲ ≤ λ_min(Ω) ≤ λ_max(Ω) ≤ ω̄ < ∞ and ∥Θ⁽⁰⁾∥ ≤ M < ∞.

We shall demonstrate that the parameter space defined by all these restrictions is compact. To this end, note first that the restrictions imposed on Θ⁽⁰⁾ are of the form h(Θ⁽⁰⁾) = 0, where h(·) is a continuous function. Thus, because the unrestricted parameter space of Θ⁽⁰⁾ is the whole Euclidean space, the restricted space is closed, and its intersection with the set of parameter values satisfying 0 < ω̲ ≤ λ_min(Ω) ≤ λ_max(Ω) ≤ ω̄ < ∞ and ∥Θ⁽⁰⁾∥ ≤ M < ∞ is closed and bounded and hence compact. Because a continuous function attains its minimum on a compact set, the continuity of the function l_T(·,τ,·) ensures that, for every relevant value of τ, a minimizer exists with probability approaching one; as τ takes only finitely many values, the same holds for the joint minimization over (Θ⁽⁰⁾,τ,Ω). This proves the (asymptotic) existence of the nonlinear LS estimators of Θ⁽⁰⁾, τ, and Ω and hence also that of Θ.

To prove part (i) of Theorem 3.2, first consider the case δ_1o ≠ 0 and assume that τ ≤ τ_o − 1. As noted previously, we can also assume that τ_o − p ≤ τ. Using the definitions we can express the vector ζ_tτ⁽⁰⁾ as

Taking the assumed restrictions into account we can write this further as

Here we have also made use of the facts that δ₁ = −Πδ and δ₁⁽⁰⁾ = −Πδ_o with Π = α⁽⁰⁾β_o′ + ρ⁽⁰⁾β_o⊥′.

To show that asymptotically the function l_T(Θ,τ,Ω) cannot be minimized for τ_o − p ≤ τ ≤ τ_o − 1, we consider two cases separately. In the first case it is assumed that ∥δ∥ ≥ T^aε_*, where ε_* > 0 is arbitrary. The second case assumes that ∥δ∥ < T^aε_*.

Now consider parameter values for which τ_o − p ≤ τ ≤ τ_o − 1 and ∥δ∥ ≥ T^aε_* hold for some ε_* > 0. By Lemma A.8 we can also assume that ∥δ₁ − δ₁⁽⁰⁾∥ ≤ εT^{η−1/2}. Using this, (A.33), and the previously mentioned parameter restrictions, we find that

Because the last quantity tends to infinity with T, it follows from Lemma A.10 that asymptotically

cannot occur.

For parameter values τ_o − p ≤ τ ≤ τ_o − 1 and ∥δ∥ < T^aε_* we can also use (A.33) and Lemma A.10. First note that, by Lemma A.8, the norm of the first term on the r.h.s. of (A.33) can be bounded by εT^{η−1/2}. Next, from Lemmas A.8 and A.9 it follows that the term in front of δ in the second term on the r.h.s. of (A.33) can be assumed bounded, and so the norm of the whole term can be bounded by a quantity of the form c₁ε_*T^a, where 0 < c₁ < ∞. Similar arguments can also be used to show that, at least for t = τ_o, the norm of the third term on the r.h.s. of (A.33) can be bounded from below by a quantity of the form c₂∥δ_*∥T^a, where 0 < c₂ < ∞ and ∥δ_*∥ ≠ 0. Thus, because ε_* can be chosen arbitrarily small, the asymptotic behavior of

is dominated by the third term on the r.h.s. of (A.33), and the preceding discussion implies that this sum tends to infinity with T. From this and Lemma A.10 we can conclude that asymptotically

cannot occur.

Thus, we have shown that, when δ_1o ≠ 0, asymptotically

cannot occur. A similar argument with ζ_tτ⁽⁰⁾ replaced by ζ_tτ and with Lemma A.10 replaced by its corresponding counterpart shows that asymptotically

cannot occur either.

Now suppose that δ_1o = 0 and consider the break dates τ_o − p ≤ τ ≤ τ_o − 1. Instead of (A.33) we use a slightly different representation of ζ_tτ⁽⁰⁾ given by

This representation can be obtained from the definitions (cf. the similar representation used in the proof of Lemma A.11). As with the case δ_1o ≠ 0, our treatment will be divided into two separate cases.

In the first one the parameter δ is restricted as ∥δ∥ ≥ T^aε_*, where ε_* > 0 is arbitrary and a > η > 1/b. From the preceding representation of ζ_tτ⁽⁰⁾ it then follows that

Here the last inequality makes use of the fact that ∥δ₁ − δ₁⁽⁰⁾∥ ≤ εT^{η−1/2} can be assumed by Lemma A.8. Because the last quantity tends to infinity with T, it follows from the latter result of Lemma A.10 that asymptotically

cannot occur.

When ∥δ∥ < T^aε_* (a > η > 1/b) is assumed, (A.34) and Lemma A.10 give the desired result much in the same way as in the case δ_1o ≠ 0, where (A.33) was used instead of (A.34). First note that the norm of the first four terms on the r.h.s. of (A.34) can be bounded by a quantity of the form εcT^{η+a−1/2}, where 0 < c < ∞. This follows from Lemma A.8 and arguments used to prove Lemma A.11 for δ_1o = 0. Next, in the same way as in the case δ_1o ≠ 0 one can show that the term in front of δ in the fifth term on the r.h.s. of (A.34) can be assumed bounded and, hence, the norm of the whole term can be bounded by a quantity of the form c₁ε_*T^a, where 0 < c₁ < ∞. By similar arguments we finally find that, at least for t = τ_o, the norm of the last term on the r.h.s. of (A.34) can be bounded from below by a quantity of the form c₂∥δ_*∥T^a, where 0 < c₂ < ∞ and ∥δ_*∥ ≠ 0. Thus, because ε_* can be chosen arbitrarily small, the asymptotic behavior of

is dominated by the last term on the r.h.s. of (A.34), and it follows from the latter result of Lemma A.10 that asymptotically

cannot occur.

Thus, we have shown that, when δ_1o = 0, we asymptotically cannot have

. Again a similar proof with ζ_tτ⁽⁰⁾ replaced by ζ_tτ and Lemma A.10 replaced by its corresponding counterpart shows that asymptotically

cannot occur either. This completes the proof of part (i) of the theorem in the case δ_1o = 0. Part (ii) is a consequence of the (asymptotic) existence of

and Lemma A.11. Hence, the proof of Theorem 3.2 is complete.

A.3. Proof of Theorem 4.1.

For simplicity we will denote the break date estimator by

. This estimator can be either of the two estimators considered in Section 3 unless explicit distinctions are made. From the assumptions δ₁ ≠ 0 and 0 < a < ½ and Theorems 3.1 and 3.2 it follows that asymptotically

can be assumed. This fact will be used in several arguments of the proof without explicit notice.

Properties of RR Estimators. We shall first show that the RR estimators of the parameters based on equation (2.7) with the unknown break date τ_o replaced by the estimator

satisfy appropriate consistency properties. This replacement changes the VECM (2.7) to

where

Write

where

. Using the transformation

we can transform the preceding VECM to the form

where ν⁽⁰⁾ = ν + αβ′μ_0o − Ψμ_1o, φ⁽⁰⁾ = φ − β′μ_1o, θ⁽⁰⁾ = θ − β′δ_o, γ₀*⁽⁰⁾ = δ − δ_o, and γ_j*⁽⁰⁾ = γ_j* + Γ_jδ_o (j = 1,…,p − 1). Note that the true values of these parameters are zero. RR estimators of the parameters in (A.38) are obtained by transforming the RR estimators based on (A.35) in the same way as the corresponding parameters (e.g.,

). Asymptotic properties of these transformed estimators are derived subsequently. We denote by

normalized versions of the estimators

, respectively, such that

LEMMA A.13. Under the conditions of Theorem 4.1,

Proof. We first note that the result of Lemma A.13 also holds when the break date is assumed known. A formal proof of this can be obtained by following the proof of Lemma 2.1 of Saikkonen and Lütkepohl (2000a) and observing that the omission of some impulse dummies from the model considered by Saikkonen and Lütkepohl is of no significance and that the same is true for the dependence of the parameter δ on the sample size. The latter fact is clear because the results of Lemma A.13 are formulated by using the transformed model (A.38) in which the true values of the deterministic parameters are zero.

Because the result of Lemma A.13 holds when the break date is assumed known, it also holds when the break date can be consistently estimated, that is, when

. Indeed, then the analysis can be restricted to that part of the sample space where

holds and the probability of this can be made arbitrarily close to unity for all T large enough. This proves the results of the lemma for the constrained estimator

If j₀ = p − 1 in Theorem 3.1(i), the preceding argument also applies to the unconstrained estimator

. For other values of j₀ further arguments are needed. By Theorem 3.1(i), it suffices to consider any value of the break date such that τ_o − p + 1 + j₀ ≤ τ ≤ τ_o. For simplicity, consider the case j₀ = p − 2 and τ = τ_o − 1. It is easy to see that, even though the break date is misspecified by one period, we can still regard (2.7) as a correctly specified model if we merely redefine the parameters γ₀*, …, γ_{p−1}* as γ₀* = αβ′δ, γ₁* = δ, and γ_j* = −Γ_{j−1}δ, j = 2, …, p − 1. By assumption we then have γ_{p−1}* ≠ 0, whereas Γ_{p−1}δ = 0. With these new definitions the error term of model (2.7) is still ε_t, and the analysis given for a known break date can be used. Because the other parameters of the model are not affected by the redefinition of the parameters γ_j* (j = 0, …, p − 1), the consistency results obtained are the same as when the true break date is known. The same argument clearly extends to other values of j₀. This completes the proof. █
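The redefinition works because, with τ = τ_o − 1, the shift dummy at the wrong date differs from the correct one only by an impulse dummy. In addition, the impulse dummy at τ_o is the one-period lag of the impulse dummy at τ_o − 1. The sketch below checks these two indicator identities numerically. It assumes the convention d_tτ = 0 for t < τ and d_tτ = 1 for t ≥ τ, and the helper names are hypothetical. The check illustrates the argument in the text; it does not replace it.

```python
def shift_dummy(t: int, tau: int) -> int:
    """Level-shift dummy d_{t,tau}: 0 for t < tau, 1 for t >= tau (assumed convention)."""
    return 1 if t >= tau else 0


def impulse_dummy(t: int, tau: int) -> int:
    """Impulse dummy Delta d_{t,tau} = d_{t,tau} - d_{t-1,tau}; equals 1 only at t = tau."""
    return shift_dummy(t, tau) - shift_dummy(t - 1, tau)


def misdated_by_one_identities_hold(tau_o: int, T: int) -> bool:
    """Check, for t = 1, ..., T, the two identities
    d_{t,tau_o-1}     = d_{t,tau_o} + Delta d_{t,tau_o-1}
    Delta d_{t,tau_o} = Delta d_{t-1,tau_o-1}
    """
    tau = tau_o - 1  # break date misspecified by one period
    for t in range(1, T + 1):
        if shift_dummy(t, tau) != shift_dummy(t, tau_o) + impulse_dummy(t, tau):
            return False
        if impulse_dummy(t, tau_o) != impulse_dummy(t - 1, tau):
            return False
    return True
```

Because the discrepancy consists only of impulse dummies, it can be absorbed into the coefficients γ_j*. The remaining parameters and the error term ε_t are left unchanged.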

Properties of the new estimators of the deterministic parameters. We shall now consider asymptotic properties of the estimators

by assuming that the break date τ in (2.7) is replaced by one of the estimators

LEMMA A.14. Under the conditions of Theorem 4.1, the estimators

have the following properties:

Proof. We start with the results (A.39) and (A.40). Recall the definitions ν = −αβ′μ₀ + Ψμ₁, ν⁽⁰⁾ = ν + αβ′μ_0o − Ψμ_1o, and φ⁽⁰⁾ = φ − β′μ_1o, which imply that

Here the latter equality is obtained by arguments similar to those used to define the estimator

. These arguments further show that β_⊥′(μ₁ − μ_1o) = β_⊥′C(ν⁽⁰⁾ − Ψ_βφ⁽⁰⁾), and the same relation applies to estimators. Thus, we have

Here and in what follows, the subscript 0 is omitted from the estimators of α and β to simplify the notation. By Lemma A.13, one obtains from the previous equality

Note that the estimator

can be viewed as the LS estimator of the parameter ν⁽⁰⁾ in the auxiliary regression model obtained by replacing

in (A.38) by its observed analog

. This implies that

can be obtained by LS from the auxiliary regression model

where

, Λ is a conformable coefficient matrix, and the error has the representation

. By the definition of C and Lemma A.13,

. Using this fact, Lemma A.13, and the assumptions, it is straightforward to show that the asymptotic properties of the LS estimator of the parameter Λ in the auxiliary regression model (A.43) can be obtained by assuming that the error equals

. The same arguments and the definition of

(see (A.36)) further show that the error can be assumed to be

or even β_⊥′Cε_t. Because it is also straightforward to show that the estimation of the intercept term in (A.43) is asymptotically independent of the estimation of the other regression coefficients, we can conclude that

This and a standard central limit theorem yield

To obtain (A.40) we need to show that

on the l.h.s. can be replaced by β_⊥. To see this, write

By the consistency of the estimator

and the result just obtained, the latter term on the r.h.s. is of order o_p(T^{−1/2}), and the same is true for the former because

by Lemma A.13. From this last result one can obtain (A.39) because

can be replaced by β using an argument similar to that used in (A.44).

Now consider the estimator

. From its derivation we get the identity

. By the definitions, this is equivalent to

Because the same relation applies to estimators, arguments similar to those used to define the estimator

yield

Lemma A.13 implies that the r.h.s. of this equality is of order O_p(1). Moreover,

. Thus, (A.41) and (A.42) follow because in these results

can be replaced by β and β_⊥, respectively, by using an argument similar to that in (A.44). This completes the proof of Lemma A.14. █

Proof of the limiting distribution of LR^PAR. The structure of our proof of the limiting distribution of LR^PAR(r₀) is similar to that of Theorem 11.1 of Johansen (1995). Therefore, we only outline the arguments here.

First note that the limiting distribution of the test statistic LR^PAR(r₀) can be derived by assuming that the true value of the parameter μ₀ is zero. Thus, we can write equation (4.1) as

Using this representation, the assumption a < ½, and the asymptotic properties of the estimators

obtained in Lemma A.14, we can now mimic the proof given in Johansen (1995, pp. 158–160) and see that all the quantities that therein converge in probability to constants will here converge in probability to the same constants. However, quantities that in Johansen (1995, pp. 158–160) converge weakly to functionals of a Brownian motion will here converge weakly to different functionals of a Brownian motion. Here these weak limits are determined by the weak limit of

. We have

where W(s) is an (n − r₀)-dimensional Brownian motion with covariance matrix Ω. Hence the limit is a linear transformation of the Brownian bridge W₊(s) = W(s) − sW(1). The error term in the equality is to be understood with respect to the Skorohod topology.

To justify (A.46), first consider the equality. Because

by (A.42) of Lemma A.14, it is clear that the contribution of the third and fourth terms on the r.h.s. of (A.45) to

is asymptotically negligible. The same argument also applies to the fifth term on the r.h.s. of (A.45) because a < ½. As for the weak convergence in (A.46), it can be justified by a standard functional central limit theorem and (A.40) of Lemma A.14 by observing that the limit of the second expression is determined by the process ε_t (see Johansen, 1995, eqn. (B.24), and the proof of (A.40)).
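The weak limit in (A.46) is a linear transformation of a Brownian bridge. The sketch below gives a rough numerical illustration of the two ingredients: the functional central limit theorem and the bridge construction W(s) − sW(1). It builds a discretized scalar Brownian bridge from scaled partial sums of i.i.d. N(0, 1) errors. Scalar errors with unit variance (so that Ω = 1) are a simplifying assumption, and `bridge_path` is a hypothetical helper.

```python
import math
import random
from typing import List


def bridge_path(T: int, seed: int = 0) -> List[float]:
    """Discretized scalar Brownian bridge on the grid s = k/T, k = 0, ..., T.

    W_T(s) = T^{-1/2} * sum_{t <= sT} e_t with i.i.d. N(0, 1) errors e_t
    approximates a standard Brownian motion (functional CLT). The returned
    path is the bridge W_T(s) - s * W_T(1).
    """
    rng = random.Random(seed)
    w = [0.0]
    for _ in range(T):
        w.append(w[-1] + rng.gauss(0.0, 1.0) / math.sqrt(T))
    w_end = w[-1]  # W_T(1)
    return [w[k] - (k / T) * w_end for k in range(T + 1)]
```

The bridge is pinned to zero at s = 0 and s = 1. For Gaussian partial sums its variance on the grid equals s(1 − s) exactly, so across replications the value at s = ½ should have sample variance close to 0.25, up to Monte Carlo error.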

The preceding discussion implies that the limiting distribution of the test statistic LR^PAR(r₀) can be derived by ignoring the last three terms on the r.h.s. of (A.45). This means that, as in Saikkonen and Lütkepohl (2000a), we have reduced the problem to the no-break case studied by Saikkonen and Lütkepohl (2000b). From Lemma A.14 and the proof of Theorem 3 of Saikkonen and Lütkepohl (2000b), it can be seen that, when μ_0o = 0 is assumed, the trace test statistic in that theorem is asymptotically equivalent to a similar test statistic based on an analog of (4.2) defined by replacing

. It is straightforward to show that the use of

instead of

changes the limiting distribution of the test statistic as stated in the theorem. In other words, because the vector

is obtained from

by augmenting it with unity, the same augmentation is applied to one of the two Brownian bridges in the limiting distribution obtained in Theorem 3 of Saikkonen and Lütkepohl (2000b). The technical details, which are similar to those of the corresponding two cases in Johansen (1995, Sect. 11.2), are straightforward and are omitted.
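Both LR^PAR(r₀) and LR^GLS(r₀) are trace-type statistics. Assume that the statistic takes the usual Johansen trace form −T Σ_{j=r₀+1}^{n} log(1 − λ_j), where λ_1 ≥ … ≥ λ_n are the eigenvalues of the reduced rank problem for the adjusted series (see Section 4 for the exact definition). Under that assumption, the final computation reduces to the arithmetic below. The eigenvalues are taken as given, because computing them requires the adjusted data, which is not shown here. `trace_statistic` is a hypothetical helper.

```python
import math
from typing import Sequence


def trace_statistic(eigenvalues: Sequence[float], T: int, r0: int) -> float:
    """Trace-type LR statistic -T * sum_{j=r0+1}^{n} log(1 - lambda_j).

    `eigenvalues` holds the n eigenvalues of the reduced rank problem (any
    order, each in [0, 1)). They are sorted so that lambda_1 >= ... >= lambda_n,
    and the n - r0 smallest enter the sum.
    """
    lam = sorted(eigenvalues, reverse=True)
    n = len(lam)
    if not 0 <= r0 < n:
        raise ValueError("r0 must satisfy 0 <= r0 < n")
    if any(not 0.0 <= x < 1.0 for x in lam):
        raise ValueError("eigenvalues must lie in [0, 1)")
    # log1p(-x) computes log(1 - x) accurately for small x
    return -T * sum(math.log1p(-x) for x in lam[r0:])
```

For instance, take the hypothetical eigenvalues (0.3, 0.1, 0.05) with T = 100 and r₀ = 1. The statistic is then −100[log(0.9) + log(0.95)] ≈ 15.67. It would be compared with the percentiles of the limiting distribution tabulated for LR^PAR(r₀).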

Asymptotic properties of the GLS estimators of the deterministic parameters. Because the break date estimator is asymptotically between τ_o − p and τ_o, it is straightforward to follow the proof of Theorem 2.1 of Saikkonen and Lütkepohl (2000a) (case a₁ < 1) and obtain the asymptotic properties of the GLS estimators of the parameters μ₀, μ₁, and δ. Denoting these GLS estimators by

, it can be demonstrated that (A.39)–(A.42) of Lemma A.14 hold for

except that in (A.42) O_p(T^a) replaces O_p(1). For

it follows that

Limiting distribution of LR^GLS. The test statistic LR^GLS(r₀) is defined as LR^PAR(r₀) except that now

. Instead of (A.45) we therefore have

where the estimators on the r.h.s. satisfy the rates of convergence obtained previously. It is straightforward to check that, under the conditions of Theorem 4.1, the rates of convergence obtained for

are sufficient for the fifth term on the r.h.s. of (A.47) to have no effect on the asymptotic properties of the second sample moments on which the test statistic is based, and the same is true for the last term, which is as in (A.45). Thus, the problem reduces to that of a known break date studied by Saikkonen and Lütkepohl (2000a), and the limiting distribution of the test statistic is obtained from Theorem 3.1 of that paper. Here it suffices to note the following points. First, the dependence of the break size on the sample size has no effect because the needed arguments only involve the difference

. Second, it is not difficult to check that the rate of convergence

suffices instead of

, which could be used in Saikkonen and Lütkepohl (2000a), and the same is true for

. Thus, we have demonstrated that, under the conditions of Theorem 4.1, the test statistic has the same limiting distribution as in Saikkonen and Lütkepohl (2000a).

References

Bai, J. (1994) Least squares estimation of a shift in linear processes. Journal of Time Series Analysis 15, 453–472.

Bai, J., R.L. Lumsdaine, & J.H. Stock (1998) Testing for and dating common breaks in multivariate time series. Review of Economic Studies 65, 395–432.

Gregory, A.W. & B.E. Hansen (1996) Residual-based tests for cointegration in models with regime shifts. Journal of Econometrics 70, 99–126.

Hubrich, K., H. Lütkepohl, & P. Saikkonen (2001) A review of systems cointegration tests. Econometric Reviews 20, 247–318.

Johansen, S. (1995) Likelihood Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.

Johansen, S., R. Mosconi, & B. Nielsen (2000) Cointegration analysis in the presence of structural breaks in the deterministic trend. Econometrics Journal 3, 216–249.

Lütkepohl, H. & P. Saikkonen (2000) Testing for the cointegrating rank of a VAR process with a time trend. Journal of Econometrics 95, 177–198.

Lütkepohl, H., P. Saikkonen, & C. Trenkler (2004) Testing for the cointegrating rank of a VAR process with level shift at unknown time. Econometrica 72, 647–662.

Müller, U.K. & G. Elliott (2003) Tests for unit roots and the initial condition. Econometrica 71, 1269–1286.

Perron, P. (1989) The great crash, the oil price shock and the unit root hypothesis. Econometrica 57, 1361–1401.

Saikkonen, P. (2001) Consistent estimation in cointegrated vector autoregressive models with nonlinear time trends in cointegrating relations. Econometric Theory 17, 296–326.

Saikkonen, P. & H. Lütkepohl (2000a) Testing for the cointegrating rank of a VAR process with structural shifts. Journal of Business & Economic Statistics 18, 451–464.

Saikkonen, P. & H. Lütkepohl (2000b) Trend adjustment prior to testing for the cointegrating rank of a vector autoregressive process. Journal of Time Series Analysis 21, 435–456.

Saikkonen, P. & H. Lütkepohl (2002) Testing for a unit root in a time series with a level shift at unknown time. Econometric Theory 18, 313–348.

Saikkonen, P., H. Lütkepohl, & C. Trenkler (2004) Break Date Estimation and Cointegration Testing in VAR Processes with Level Shift. European University Institute, Florence, Discussion paper ECO 2004/21.

Toda, H.Y. (1994) Finite sample properties of likelihood ratio tests for cointegrating ranks when linear trends are present. Review of Economics and Statistics 76, 66–79.

Percentiles of limiting distribution of LR^PAR(r₀)

Break date estimates for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), VAR order p = 1, true break point τ = 50, sample size T = 100, δ(2) = δ(3) = 0

Break date estimates for three-dimensional DGP with r = 1 (ψ₁ = 0.9), Θ = (0.4, 0.8), VAR order p = 3, true break point τ = 50, sample size T = 100, δ(2) = δ(3) = 0
