Published online by Cambridge University Press: 12 December 2005
In testing for the cointegrating rank of a vector autoregressive process it is important to take into account level shifts that have occurred in the sample period. Therefore the properties of estimators of the time period where a shift has taken place are investigated. The possible structural break is modeled as a simple shift in the level of the process. Two alternative estimators for the break date are considered, and their asymptotic properties are derived under various assumptions regarding the size of the shift. In particular, properties of the shift date estimators are obtained under the assumption of an increasing or decreasing size of the shift when the sample size grows. These results are used to explore the implications for testing the cointegrating rank of the process. A previously proposed likelihood ratio type test for the cointegrating rank and a modified version are considered, and their asymptotic properties are derived. It is shown that their asymptotic null distributions are unaffected by the level shift under the assumptions made for the shift size. The performance of the shift date estimators and the cointegrating rank tests in small samples is investigated by simulations.

We thank two referees for helpful comments, and we are grateful to the Deutsche Forschungsgemeinschaft, SFB 373, and the European Commission under the Training and Mobility of Researchers Programme (contract ERBFMRXCT980213) for financial support. The first author also acknowledges financial support by the Yrjö Jahnsson Foundation, the Academy of Finland, and the Alexander von Humboldt Foundation under a Humboldt research award. Part of this research was done while he was visiting the Humboldt University in Berlin, and part of the research was carried out while he and the third author were visiting the European University Institute, Florence.
An extended version of this paper is available as an EUI discussion paper under the title “Break Date Estimation and Cointegration Testing in VAR Processes with Level Shift,” ECO 2004/21.
From the unit root and cointegration testing literature it is well known that structural shifts in the time series of interest have a major impact on inference procedures. In particular, they affect the small-sample and asymptotic properties of unit root and cointegrating rank tests (see, e.g., Perron, 1989, for unit root testing; Lütkepohl, Saikkonen, and Trenkler, 2004, for cointegrating rank testing). In the latter article it is assumed that a level shift has occurred in a system of time series variables at an unknown time. Lütkepohl et al. propose to estimate the shift date in a first step and then apply a cointegrating rank test as follows. First the parameters of the deterministic part of the data generation process (DGP) are estimated by a feasible generalized least squares (GLS) procedure. Using these estimators, the original series is adjusted for deterministic terms including the structural shift, and a cointegrating rank test of the Johansen likelihood ratio (LR) type is applied to the adjusted series. They provide conditions under which the asymptotic null distribution of the cointegrating rank test in this procedure is unaffected by the level shift. They also show, however, that in small samples the way the break date is estimated may have an impact on the actual properties of the cointegrating rank test. In addition, the size of the level shift is important for the small-sample properties of the break date estimators and the tests.
Therefore, in this study we extend the results of Lütkepohl et al. (2004) in several directions. First, we consider another possible break date estimator. Second, we derive asymptotic properties of two break date estimators accounting explicitly for the size of the level shift. More precisely, we make the size of the level shift dependent on the sample size and provide asymptotic results for both increasing and decreasing shift sizes when the sample size goes to infinity. These results provide interesting new insights into the properties of the estimators and explain simulation results of Lütkepohl et al. that are difficult to understand if a fixed shift size is considered. Under our assumptions the null distribution of the cointegrating rank tests is still unaffected by the shift or the shift size, just as in the case of a fixed shift size. We also modify the cointegrating rank tests considered by Lütkepohl et al. In their approach all parameters associated with the deterministic part of the model are estimated by the GLS procedure, although the level parameters are not fully identified. In this paper we propose to estimate the identified parameters only and modify the cointegrating rank tests accordingly. Finally, we perform a more detailed and more insightful investigation of the small-sample properties of the break date estimators and the resulting cointegrating rank tests by extending the simulation design of Lütkepohl et al.
Estimating the break date in a system of I(1) variables has also been considered by Bai, Lumsdaine, and Stock (1998). These authors consider the asymptotic distribution of a pseudo maximum likelihood (ML) estimator of the break date. Although we use a similar estimator, we do not derive the asymptotic distribution of the estimators but focus on rates of convergence. Our results are important for investigating the properties of inference procedures such as cointegrating rank tests that are based on a vector autoregressive (VAR) model with estimated break date. Although Bai et al. (1998) also discuss shift sizes that depend on the sample size, our results go beyond their analysis because we consider increasing in addition to decreasing shift sizes.
The study is structured as follows. In Section 2, the modeling framework of Lütkepohl et al. (2004) is summarized because that will be the basis for our investigation. Section 3 is devoted to a discussion of the break date estimators and their asymptotic properties. The properties of cointegrating rank tests based on a model with estimated break date are considered in Section 4, and small-sample simulation results of the break date estimators and the cointegrating rank tests are presented in Section 5. In Section 6, a summary and conclusions are given. The proofs of several theorems stated in the main body of the paper are given in the Appendix.
The following general notation will be used. The differencing and lag operators are denoted by Δ and L, respectively. The symbol I(d) denotes an integrated process of order d, that is, the stochastic part of the process is stationary or asymptotically stationary after differencing d times whereas it is still nonstationary after differencing just d − 1 times. Convergence in distribution is signified by $\xrightarrow{d}$, and i.i.d. stands for independently, identically distributed. The symbols for boundedness and convergence in probability are as usual Op(·) and op(·), respectively. Moreover, ∥·∥ denotes the Euclidean norm. The trace, determinant, and rank of the matrix A are denoted by tr(A), det(A), and rk(A), respectively. If A is an (n × m) matrix of full column rank (n > m), we denote an orthogonal complement by A⊥. The zero matrix is the orthogonal complement of a nonsingular square matrix, and an identity matrix of suitable dimension is the orthogonal complement of a zero matrix. An (n × n) identity matrix is denoted by In. For matrices A1,…,As, diag[A1 : ··· : As] is the block-diagonal matrix with A1,…,As on the diagonal. LS, RR, and VECM are used to abbreviate least squares, reduced rank, and vector error correction model, respectively. As usual, a sum is defined to be zero if the lower bound of the summation index exceeds the upper bound.
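We use the general setup of Lütkepohl et al. (2004). Hence, yt = (y1t,…,ynt)′ (t = 1,…,T) is assumed to be generated by a process with constant, linear trend, and level shift terms,
$$y_t = \mu_0 + \mu_1 t + \delta d_{t\tau} + x_t, \qquad t = 1, \ldots, T. \tag{2.1}$$
Here μi (i = 0,1) and δ are unknown (n × 1) parameter vectors and dtτ is a shift dummy variable representing a shift in period τ so that
$$d_{t\tau} = \begin{cases} 0, & t < \tau, \\ 1, & t \ge \tau. \end{cases}$$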
We make the following assumption for the shift date τ.
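Assumption 1. Let $\underline{\lambda}$, $\lambda$, and $\bar{\lambda}$ be fixed real numbers such that $0 < \underline{\lambda} \le \lambda \le \bar{\lambda} < 1$. The shift date τ satisfies
$$\tau = \tau_T = [\lambda T],$$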
where [·] denotes the integer part of the argument.
In other words, the shift is assumed to occur at a fixed fraction of the sample length. The shift date may not be at the very beginning or at the very end of the sample, although $\underline{\lambda}$ and $\bar{\lambda}$ may be arbitrarily close to zero and one, respectively. The condition has also been employed by Bai et al. (1998) in models containing I(1) variables. It is obviously not very restrictive.
The term μ1 t may be dropped from (2.1) if μ1 = 0 is known to hold and, thus, the DGP does not have a deterministic linear trend. The necessary adjustments in the following analysis are straightforward, and we will comment on this situation as we go along. Also, seasonal dummies may be added without major changes to our arguments. They are not included in our basic model to avoid more complex notation.
The process xt is assumed to be at most I(1) and to have a VAR(p) representation. More precisely, we make the following assumption.
Assumption 2. The process xt is at most I(1) with cointegrating rank r and
$$x_t = A_1 x_{t-1} + \cdots + A_p x_{t-p} + \varepsilon_t, \qquad t = 1, 2, \ldots,$$
where the Aj are (n × n) coefficient matrices. The initial values xt, t ≤ 0, are assumed to be such that the cointegration relations and Δxt are stationary. The εt are i.i.d.(0,Ω) with positive definite covariance matrix Ω and existing moments of order b > 4.
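The data generation setup of (2.1) and Assumption 2 can be mimicked numerically. The following NumPy sketch is only an illustration under our own assumptions: Gaussian errors with Ω = In, zero start-up values with a discarded burn-in (instead of the stationarity condition on the initial values), and the hypothetical helper names `shift_dummy` and `simulate_y`.

```python
import numpy as np


def shift_dummy(T, tau):
    """Shift dummy d_{t,tau} for t = 1, ..., T: 0 before tau, 1 from tau on."""
    t = np.arange(1, T + 1)
    return (t >= tau).astype(float)


def simulate_y(A, mu0, mu1, delta, lam, T, rng, burn=50):
    """Simulate y_t = mu0 + mu1*t + delta*d_{t,tau} + x_t, t = 1, ..., T.

    A is a list [A_1, ..., A_p] of (n x n) VAR coefficient matrices and
    tau = [lam*T] as in Assumption 1. Errors are N(0, I_n) (an assumption),
    x_t starts at zero, and the first `burn` draws are discarded.
    """
    p, n = len(A), A[0].shape[0]
    x = np.zeros((T + burn + p, n))
    for s in range(p, T + burn + p):
        x[s] = sum(A[j] @ x[s - 1 - j] for j in range(p)) + rng.standard_normal(n)
    x = x[-T:]                                   # keep the last T observations
    tau = int(np.floor(lam * T))                 # integer part [lam*T]
    t = np.arange(1, T + 1)[:, None]
    y = mu0 + mu1 * t + np.outer(shift_dummy(T, tau), delta) + x
    return y, tau


rng = np.random.default_rng(0)
y, tau = simulate_y([np.diag([0.5, 1.0])], np.zeros(2), np.zeros(2),
                    np.array([3.0, 0.0]), lam=0.5, T=100, rng=rng)
```

Here A1 = diag(0.5, 1) gives one stationary and one random-walk component, so the cointegrating rank is one.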
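Under Assumption 2, the process xt has the VECM form
$$\Delta x_t = \Pi x_{t-1} + \Gamma_1 \Delta x_{t-1} + \cdots + \Gamma_{p-1}\Delta x_{t-p+1} + \varepsilon_t, \tag{2.5}$$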
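where Π = −(In − A1 − ··· − Ap) and Γj = −(Aj+1 + ··· + Ap) (j = 1,…,p − 1) are (n × n) matrices. Because the cointegrating rank is r, the matrix Π can be written as Π = αβ′, where α and β are (n × r) matrices of full column rank. As is well known, β′xt and Δxt are then zero mean I(0) processes. Defining $\Psi = I_n - \Gamma_1 - \cdots - \Gamma_{p-1}$ and $C = \beta_\perp(\alpha_\perp'\Psi\beta_\perp)^{-1}\alpha_\perp'$, we have
$$x_t = C\sum_{j=1}^{t}\varepsilon_j + \xi_t, \qquad t = 1, 2, \ldots,$$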
where ξt is an I(0) process. These properties follow from Granger's representation theorem. Further details including a precise expression of ξt are given in Johansen (1995, pp. 49–52).
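The matrix C in the Granger representation can be computed directly from α, β, and the Γj. The NumPy sketch below obtains orthogonal complements from a full SVD (any choice of basis gives the same C) and checks the properties β′C = 0, Cα = 0, and CΨβ⊥ = β⊥. The function names and the randomly drawn parameters are our own illustrative choices.

```python
import numpy as np


def orth_complement(M):
    """Orthonormal basis of the orthogonal complement of span(M), M (n x m) of full column rank."""
    U, _, _ = np.linalg.svd(M, full_matrices=True)
    return U[:, M.shape[1]:]


def granger_C(alpha, beta, Gammas):
    """C = beta_perp (alpha_perp' Psi beta_perp)^{-1} alpha_perp' with Psi = I_n - sum_j Gamma_j."""
    n = alpha.shape[0]
    Psi = np.eye(n) - sum(Gammas, np.zeros((n, n)))
    a_perp, b_perp = orth_complement(alpha), orth_complement(beta)
    C = b_perp @ np.linalg.inv(a_perp.T @ Psi @ b_perp) @ a_perp.T
    return C, Psi


rng = np.random.default_rng(1)
alpha, beta = rng.standard_normal((3, 1)), rng.standard_normal((3, 1))
C, Psi = granger_C(alpha, beta, [0.2 * rng.standard_normal((3, 3))])
b_perp = orth_complement(beta)
print(np.allclose(beta.T @ C, 0), np.allclose(C @ alpha, 0),
      np.allclose(C @ Psi @ b_perp, b_perp))
```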
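Multiplying (2.1) by A(L) = In − A1 L − ··· − Ap Lp = In Δ − ΠL − Γ1 ΔL − ··· − Γp−1 ΔLp−1 yields
$$\Delta y_t = \nu + \alpha\big(\beta' y_{t-1} - \phi(t-1) - \theta d_{t-1,\tau}\big) + \sum_{j=1}^{p-1}\Gamma_j \Delta y_{t-j} + \sum_{j=0}^{p-1}\gamma_j^* \Delta d_{t-j,\tau} + \varepsilon_t, \qquad t = p+1, \ldots, T, \tag{2.7}$$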
where ν = −Πμ0 + Ψμ1, φ = β′μ1, θ = β′δ, γ0* = δ, and γj* = −Γjδ for j = 1,…,p − 1. The quantity Δdt−j,τ is an impulse dummy with value one in period t = τ + j and zero elsewhere.
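The algebra leading to (2.7) can be checked numerically: simulate xt from the VECM (2.5) with p = 2, add the deterministic terms of (2.1), and confirm that the residuals implied by (2.7) reproduce the errors εt exactly. The parameter values and the start-up rule (x0 = 0, Δx0 = 0) in this NumPy sketch are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, tau = 2, 30, 12
alpha, beta = np.array([[-0.3], [0.1]]), np.array([[1.0], [-1.0]])
Pi, G1 = alpha @ beta.T, 0.2 * np.eye(n)           # p = 2: one lagged difference
mu0, mu1, delta = np.array([1.0, 2.0]), np.array([0.1, -0.05]), np.array([3.0, -1.0])

t = np.arange(T + 1)                               # dates t = 0, 1, ..., T
d = (t >= tau).astype(float)                       # shift dummy d_{t,tau}
eps = rng.standard_normal((T + 1, n))
x, dx_prev = np.zeros((T + 1, n)), np.zeros(n)     # start-up: x_0 = 0, Delta x_0 = 0
for s in range(1, T + 1):
    dx = Pi @ x[s - 1] + G1 @ dx_prev + eps[s]     # VECM (2.5)
    x[s], dx_prev = x[s - 1] + dx, dx
y = mu0 + np.outer(t, mu1) + np.outer(d, delta) + x    # model (2.1)

# Parameters of (2.7)
Psi = np.eye(n) - G1
nu = -Pi @ mu0 + Psi @ mu1
phi, theta = beta.T @ mu1, beta.T @ delta
gam = [delta, -G1 @ delta]                         # gamma_0^*, gamma_1^*
dd = np.diff(d, prepend=0.0)                       # impulse dummy Delta d_{t,tau}
dy = np.diff(y, axis=0, prepend=y[:1])             # Delta y_t (row 0 unused)

resid = np.array([
    dy[s] - nu
    - alpha @ (beta.T @ y[s - 1] - phi * (s - 1) - theta * d[s - 1])
    - G1 @ dy[s - 1] - gam[0] * dd[s] - gam[1] * dd[s - 1]
    for s in range(2, T + 1)])                     # t = p+1, ..., T
max_err = np.max(np.abs(resid - eps[2:]))
print(max_err)                                     # of the order of machine precision
```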
For given values of the VAR order p and the shift date τ, Johansen type cointegration tests can be performed in our model framework. In the next section we will discuss two different estimators of the break date in detail, and then we will consider cointegration tests based on a model with estimated break date in Section 4.
In the following discussion we consider two different estimators of the shift date τ. The first one is based on estimating an unrestricted VAR model in which the cointegrating rank and the restrictions for the parameters related to the impulse dummies are not taken into account. The latter restrictions are accounted for by the second estimator. At the end of this section we briefly mention a third possible estimator and some of its properties. For all procedures we assume that the VAR order p is given or has been chosen by some statistical procedure in a previous step. For the time being it is assumed to be known.
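As discussed previously, our first estimator of τ is based on the model
$$\Delta y_t = \nu_0 + \nu_1 t + \delta_1 d_{t\tau} + \Pi y_{t-1} + \sum_{j=1}^{p-1}\Gamma_j \Delta y_{t-j} + \sum_{j=0}^{p-1}\gamma_j \Delta d_{t-j,\tau} + \varepsilon_t, \qquad t = p+1, \ldots, T, \tag{3.1}$$
which is obtained from (2.7) by imposing no rank restriction on Π and rearranging terms. Here ν0 = ν + Πμ1, ν1 = −Πμ1, δ1 = −Πδ, γ0 = δ − δ1, γj = γj* (j = 1,…,p − 1), and T is the sample size. The shift date is estimated as
$$\hat{\tau} = \arg\min_{\tau \in \mathcal{T}} \det\Big(\sum_{t=p+1}^{T} \hat{\varepsilon}_{t\tau}\hat{\varepsilon}_{t\tau}'\Big), \tag{3.2}$$
where the $\hat{\varepsilon}_{t\tau}$ are LS residuals from (3.1) and $\mathcal{T}$ is the set of all shift dates considered. Notice that $\mathcal{T}$ cannot include all sample periods if Assumption 1 is made. Moreover, there may be nonsample information regarding the possible shift dates that makes it desirable to limit the search to a specific part of the sample period.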
Instead of using the determinant of the residual covariance matrix as a criterion function for estimating the break date, one could consider other criteria such as the trace. We have chosen the determinant because it is in line with the Gaussian ML setup (for unknown cointegration rank), which can be viewed as the motivation for the LS estimator of the other parameters. Note, however, that we do not assume yt to be Gaussian.
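A minimal sketch of the unrestricted estimator (3.2), assuming NumPy: for each candidate date, model (3.1) is fitted by LS (equation-by-equation LS coincides with multivariate LS here because all equations share the same regressors), and the log-determinant of the residual cross-product matrix is compared across dates. The function names are hypothetical, and dates are 1-based as in the text.

```python
import numpy as np


def design_31(y, tau, p):
    """Regressors X and regressand Y of model (3.1) for t = p+1, ..., T (dates 1-based)."""
    T, n = y.shape
    t = np.arange(1, T + 1)
    d = (t >= tau).astype(float)                  # shift dummy d_{t,tau}
    dd = np.diff(d, prepend=0.0)                  # impulse dummy Delta d_{t,tau}
    dy = np.diff(y, axis=0, prepend=y[:1])        # Delta y_t (row 0 is unused)
    idx = np.arange(p, T)                         # 0-based rows of t = p+1, ..., T
    cols = [np.ones(len(idx)), t[idx], d[idx], y[idx - 1]]
    cols += [dy[idx - j] for j in range(1, p)]    # Delta y_{t-j}, j = 1, ..., p-1
    cols += [dd[idx - j] for j in range(p)]       # Delta d_{t-j,tau}, j = 0, ..., p-1
    return np.column_stack(cols), dy[idx]


def estimate_tau_unrestricted(y, p, taus):
    """Break date minimizing log det of the LS residual cross-product matrix of (3.1)."""
    best_tau, best_val = None, np.inf
    for tau in taus:
        X, Y = design_31(y, tau, p)
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        E = Y - X @ coef
        _, logdet = np.linalg.slogdet(E.T @ E)
        if logdet < best_val:
            best_tau, best_val = tau, logdet
    return best_tau
```

For example, for a (T × n) data array `y`, `estimate_tau_unrestricted(y, 1, range(5, 97))` searches over the dates 5, …, 96. Scaling the cross-product matrix to a covariance matrix would not change the minimizer.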
We assume that the size of the shift depends on the sample size and may increase or decrease when the sample size gets larger. More precisely, we make the following assumption for the parameter δ.
Assumption 3. For some fixed (n × 1) vector $\delta^*$, $\delta = \delta_T = T^{a}\delta^*$, $a \le 1/2$.
Thus, we allow for a decreasing, constant, or increasing shift size with growing sample size, depending on a being smaller, equal to, or greater than zero, respectively. In most cases there will be no need to use the subscript T, and so the notation δ will usually be used instead of δT. The same convention applies to parameters depending on δ (e.g., δ1) and their estimators. As mentioned earlier, break date estimation when the shift size decreases with increasing sample size has also been discussed by other authors (Bai et al., 1998). For our purposes a lower bound for a is not needed because for a small shift size the break has no effect on the cointegration tests that will be considered later, even though the break date may be more difficult to estimate in that situation. An increasing shift size is treated here for completeness, and it turns out that it provides interesting insights into the actual behavior of our shift date estimators, as will be seen in the simulations in Section 5. Moreover, letting the shift size increase with the sample size may provide information on problems related to large shifts. In particular, it is of interest to check whether large shifts may affect the asymptotic distribution of the cointegrating rank tests discussed in Section 4. The upper bound a = ½ for the rate of increase of the shift size is chosen for technical reasons because we need this bound in our proofs. From a practical point of view such a bound should not be a problem because there may not be a need to estimate the shift date by formal statistical methods if the shift size is very large. We can now present asymptotic properties of our estimator $\hat{\tau}$ that generalize results presented in Lütkepohl et al. (2004).
THEOREM 3.1. Suppose Assumptions 1–3 hold.
For δ1 ≠ 0 and a = 0, Lütkepohl et al. (2004) have shown that $\hat{\tau} - \tau = O_p(1)$, which is obviously a special case of our theorem. In fact, Theorem 3.1(i) shows that when the size of the break is sufficiently large, that is, a > 1/b, or a > 0 and δ1 ≠ 0, the break date can be estimated accurately. More precisely, asymptotically the break date can then be located at the true break date or just a few time points before it; in large samples the break date is not overestimated. However, consistent estimation of the break date is not possible without an additional assumption for the parameters related to the impulse dummies in model (3.1). The required assumption γp−1 ≠ 0 can be seen as an identification condition for the break date. Indeed, if γp−1 = 0 and γp−2 ≠ 0, Theorem 3.1(i) only tells us that asymptotically the break date estimator will take a value that is either the true break date or the preceding time point. The intuition for this is that one of the p − 1 impulse dummies in (3.1) can be used to allow for such an incorrect estimation of the break date. In this case, even if the break date is set one period earlier than the true one, the model is still correctly specified with white noise errors. A similar situation occurs when more than one of the parameters γi at the largest lags are zero. Notice also that γj = 0 for all j = 0,…,p − 1 can only occur if δ1 ≠ 0 because δ ≠ 0 and γ0 = δ − δ1.
The preceding discussion implies that an overspecification of the VAR order will always make the break date estimator $\hat{\tau}$ inconsistent. This observation explains some of the small-sample results of Lütkepohl et al. (2004). These authors fitted VAR(3) models to VAR(1) DGPs and found that $\hat{\tau}$ often underestimated the true break date. In principle the same phenomenon can occur also in other situations where γp−1 = 0. However, because γ0 is always nonzero when δ ≠ 0 (and p ≥ 1) reasoning similar to that used previously explains why the break date will asymptotically not be estimated larger than the true one.
The second part of Theorem 3.1 deals with the asymptotic behavior of the estimator $\hat{\tau}$ when the size of the break is “small.” In this case we need to assume that δ1 ≠ 0, that is, that there is actually a level shift in model (3.1) and not just some exceptional observations that can be handled with impulse dummies. This assumption is not needed in the first part of the theorem where the size of the break is “large” (a > 1/b) because then even the impulse dummies can be used to estimate the break date accurately. However, even though consistent estimation of the break date is not possible in the case of Theorem 3.1(ii), consistent estimation of the sample fraction λ is still possible provided the size of the break is not “too small.” The result obtained in this context is weaker than its previous counterparts in Bai (1994), which, instead of a > η − ½, only require a > −½ (see, e.g., Proposition 3 of Bai, 1994). Complications caused by the presence of impulse dummies in model (3.1) are the reason for our weaker result. In any case, our assumption a > η − ½ is equivalent to −2a/(1 − 2η) < 1, which is clearly not very restrictive because $|\hat{\tau} - \tau|$ cannot be larger than T and is hence necessarily Op(T).
As mentioned in the introduction, Bai et al. (1998) considered the asymptotic distribution of the break date and found that the resulting interval estimator for the break date depends on the dimension of the system under consideration. Such dependence on the dimension of the model is not obtained with our approach, which provides orders of convergence only.
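We shall now consider the constrained estimation of the break date in which the restrictions between the autoregressive parameters and coefficients related to the dummies are taken into account. Instead of (3.1) it is now convenient to start with the specification
$$\Delta y_t = \nu_0 + \nu_1 t + \Pi y_{t-1} + \delta_1 d_{t-1,\tau} + \sum_{j=1}^{p-1}\Gamma_j \Delta y_{t-j} + \sum_{j=0}^{p-1}\gamma_j^* \Delta d_{t-j,\tau} + \varepsilon_t, \tag{3.3}$$
where δ1 = −Πδ, as before, and the γj* are as in (2.7). Thus, we can write (3.3) as
$$\Delta y_t = \nu_0 + \nu_1 t + \Pi\big(y_{t-1} - \delta d_{t-1,\tau}\big) + \sum_{j=1}^{p-1}\Gamma_j\big(\Delta y_{t-j} - \delta \Delta d_{t-j,\tau}\big) + \delta \Delta d_{t\tau} + \varepsilon_t. \tag{3.4}$$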
Unlike in the unrestricted model (3.1), the impulse dummies do not appear separately anymore in the representation (3.4) but are included in the term that also involves the shift dummy. Thus, only a single parameter vector δ is associated with all the dummy variables. Consequently, the break date can be estimated more precisely, as we will see in the next theorem.
For any given value of the break date τ the parameters ν0, ν1, δ, Π, and Γ1,…,Γp−1 can be estimated from (3.4) by nonlinear LS. The estimator of the break date is then obtained by minimizing an analog of (3.2) with $\hat{\varepsilon}_{t\tau}$ replaced by residuals from this nonlinear LS estimation. The following theorem presents asymptotic properties of this break date estimator, denoted by $\tilde{\tau}$.
THEOREM 3.2. Let Assumptions 1–3 hold and suppose that δ ≠ 0.
(i) If a > 0 and δ1 ≠ 0, or a > 1/b, then $\lim_{T\to\infty} P(\tilde{\tau} = \tau) = 1$.
(ii) If a ≤ 0 and δ1 ≠ 0, then $\tilde{\tau} - \tau = O_p\big(T^{-2a/(1-2\eta)}\big)$, where 1/b < η < ¼.
The first part of the theorem shows that taking the restrictions into account is beneficial. Unlike in Theorem 3.1(i) consistency now obtains without any additional assumptions about coefficients. The second part of the theorem, which deals with the case of a “small” break size, is similar to its previous counterpart, however.
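The nonlinear LS step for model (3.4) can be sketched without a general-purpose optimizer: for fixed δ, (3.4) is a linear VECM in zt = yt − δdtτ, and for fixed (ν0, ν1, Π, Γj) the residuals are linear in δ. The NumPy sketch below alternates between these two LS problems, so the sum of squared residuals cannot increase. This alternating scheme is our substitute for the Gauss–Newton algorithm used in the simulations, chosen only because it needs nothing beyond NumPy; like any such scheme it may stop at a local rather than a global minimum. Function names are hypothetical.

```python
import numpy as np


def fit_34(y, tau, p, n_iter=200, tol=1e-10):
    """Alternating LS for model (3.4) at a fixed break date tau (1-based).

    Returns the residuals for t = p+1, ..., T, the estimate of delta, and the
    sequence of sums of squared residuals (non-increasing by construction).
    """
    T, n = y.shape
    t = np.arange(1, T + 1)
    d = (t >= tau).astype(float)
    dd = np.diff(d, prepend=0.0)
    dy = np.diff(y, axis=0, prepend=y[:1])
    idx = np.arange(p, T)
    delta, ssr = np.zeros(n), []
    for _ in range(n_iter):
        # Step 1: for fixed delta, (3.4) is a linear VECM in z_t = y_t - delta d_{t,tau}.
        z = y - np.outer(d, delta)
        dz = np.diff(z, axis=0, prepend=z[:1])
        X = np.column_stack([np.ones(len(idx)), t[idx], z[idx - 1]]
                            + [dz[idx - j] for j in range(1, p)])
        B, *_ = np.linalg.lstsq(X, dz[idx], rcond=None)
        Pi = B[2:2 + n].T
        Gam = [B[2 + n * j:2 + n * (j + 1)].T for j in range(1, p)]
        # Step 2: for fixed (nu0, nu1, Pi, Gamma_j) the residual is w_t - M_t delta.
        w = dy[idx] - B[0] - np.outer(t[idx], B[1]) - y[idx - 1] @ Pi.T
        M = dd[idx][:, None, None] * np.eye(n) - d[idx - 1][:, None, None] * Pi
        for j, G in enumerate(Gam, start=1):
            w = w - dy[idx - j] @ G.T
            M = M - dd[idx - j][:, None, None] * G
        lhs = np.einsum('tij,tik->jk', M, M)      # sum_t M_t' M_t
        rhs = np.einsum('tij,ti->j', M, w)        # sum_t M_t' w_t
        delta = np.linalg.solve(lhs, rhs)
        E = w - np.einsum('tij,j->ti', M, delta)
        ssr.append(float(np.sum(E ** 2)))
        if len(ssr) > 1 and ssr[-2] - ssr[-1] <= tol * ssr[-1]:
            break
    return E, delta, ssr


def estimate_tau_restricted(y, p, taus):
    """Break date minimizing log det of the residual cross-product matrix from (3.4)."""
    crit = {}
    for tau in taus:
        E = fit_34(y, tau, p)[0]
        crit[tau] = np.linalg.slogdet(E.T @ E)[1]
    return min(crit, key=crit.get)
```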
As a final remark on our two break date estimators we mention that, if the DGP is known to have no deterministic linear trend (μ1 = 0), the corresponding terms in (3.1), (3.3), and (3.4) may be dropped without changing the convergence rates of our break date estimators.
Lütkepohl et al. (2004) also considered estimating the break date based on the VAR model (3.1) without including the impulse dummies. Thus the resulting break date estimator, say, $\check{\tau}$, is actually based on a misspecified model. In the present model framework, where the shift size depends on the sample size, it can in fact be shown that the estimator $\check{\tau}$ works well, provided δ1 ≠ 0. More precisely, for
, and for
, where η > 0 (for details see Saikkonen, Lütkepohl, and Trenkler, 2004). Thus, although $\check{\tau}$ is based on a misspecified model, its convergence rate is as good as that of the other two estimators, provided δ1 ≠ 0. Clearly, δ1 = −αβ′δ = 0 may hold even if δ ≠ 0. In fact, δ1 = 0 always holds if the cointegrating rank is zero. If δ1 = 0, there is co-breaking, and the process β′yt has no break. For such processes, $\check{\tau}$ can find the shift date only by chance, whereas the other two estimators, $\hat{\tau}$ and $\tilde{\tau}$, can still find the true break date with some likelihood in large samples, if the shift size is large. Thus, using only the estimator $\check{\tau}$ may be problematic, unless the case δ1 = 0 can be ruled out. In the next section we consider the consequences of using a model with estimated break date for testing the cointegrating rank of a system of time series variables.
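Co-breaking can be illustrated with a three-dimensional example in which the loading vector α and the cointegrating vector β = (1, −1, 0)′ are our own illustrative choices. A shift with β′δ = 0 gives δ1 = 0, whereas a shift in the first component alone does not.

```python
import numpy as np


def delta1(alpha, beta, delta):
    """delta_1 = -alpha beta' delta; for alpha of full column rank it is zero iff beta'delta = 0."""
    return -alpha @ (beta.T @ delta)


alpha = np.array([[-0.5], [0.0], [0.0]])     # illustrative loading vector
beta = np.array([[1.0], [-1.0], [0.0]])      # illustrative cointegrating vector
cobreak = delta1(alpha, beta, np.array([1.0, 1.0, 2.0]))   # beta'delta = 0: co-breaking
shift1 = delta1(alpha, beta, np.array([1.0, 0.0, 0.0]))    # beta'delta = 1: delta_1 != 0
```

With cointegrating rank zero, Π = 0 and δ1 vanishes for every δ, which is the case mentioned in the text.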
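For given VAR order p and some estimator of the shift date, the cointegrating rank of the DGP can be tested as discussed by Lütkepohl et al. (2004). In the following discussion it is assumed that the break date estimator is either $\hat{\tau}$ or $\tilde{\tau}$. The objective is to test a pair of hypotheses
$$H_0(r_0)\colon \operatorname{rk}(\Pi) = r_0 \quad \text{versus} \quad H_1(r_0)\colon \operatorname{rk}(\Pi) > r_0.$$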
Lütkepohl et al. propose using the tests suggested by Saikkonen and Lütkepohl (2000a). In their procedure, first-stage estimators for the parameters of the error process xt, that is, for α, β, Γj (j = 1,…,p − 1), and Ω are determined by RR regression applied to (2.7). Using these estimators, Lütkepohl et al. apply a feasible GLS procedure to (2.1) to estimate all the parameters of the deterministic part. The observations are then adjusted for deterministic terms, and LR type cointegration tests can be formed in the usual way by solving the related generalized eigenvalue problem based on the adjusted series (for details see Johansen, 1995, Thm. 6.3). The resulting test statistic will be denoted by LRGLS(r0) in the following discussion.
The levels parameter μ0 is not identified in the direction of β⊥ in our model setup, and its estimator is partly determined by the initial values in the procedure underlying the LRGLS test. In fact, the dependence of the LRGLS test on initial values was sometimes found to be relevant in preliminary simulations. A detailed theoretical analysis of the impact of initial values on related unit root tests is provided by Müller and Elliott (2003). Given the dependence of the LRGLS tests on initial values, one may hope to improve the performance of the tests by avoiding the estimation of μ0. Therefore we shall also consider another approach in which only the parameters μ1 and δ in the deterministic part are estimated. The effect of the level parameter will be taken into account when the test is performed.
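The key algebraic step in the procedure described next is that β⊥′μ1 and β⊥′δ can be recovered from ν and the γj* without knowledge of μ0, because α⊥′Π = 0. The NumPy sketch below checks this with randomly drawn population parameters (our own choice, with p = 2); replacing them by RR estimates of (2.7) gives the plug-in estimators of μ1 and δ. The helper names are hypothetical.

```python
import numpy as np


def orth_complement(M):
    """Orthonormal basis of the orthogonal complement of span(M)."""
    U, _, _ = np.linalg.svd(M, full_matrices=True)
    return U[:, M.shape[1]:]


rng = np.random.default_rng(2)
n, r = 3, 1
alpha, beta = rng.standard_normal((n, r)), rng.standard_normal((n, r))
Gammas = [0.3 * rng.standard_normal((n, n))]              # p = 2
Pi, Psi = alpha @ beta.T, np.eye(n) - sum(Gammas)
a_perp, b_perp = orth_complement(alpha), orth_complement(beta)
C = b_perp @ np.linalg.inv(a_perp.T @ Psi @ b_perp) @ a_perp.T
Psi_b = Psi @ beta @ np.linalg.inv(beta.T @ beta)


def recombine(b_part, perp_part):
    """Rebuild v from beta'v and beta_perp'v."""
    return (beta @ np.linalg.solve(beta.T @ beta, b_part)
            + b_perp @ np.linalg.solve(b_perp.T @ b_perp, perp_part))


mu0, mu1, delta = (rng.standard_normal(n) for _ in range(3))
nu = -Pi @ mu0 + Psi @ mu1
phi = beta.T @ mu1
phi_star = b_perp.T @ C @ (nu - Psi_b @ phi)             # mu0 drops out
mu1_rec = recombine(phi, phi_star)

gammas = [delta] + [-G @ delta for G in Gammas]          # gamma_0^*, ..., gamma_{p-1}^*
left = np.hstack([a_perp.T] * len(gammas))              # [alpha_perp' : ... : alpha_perp']
stacked = np.concatenate(gammas)
theta = beta.T @ delta
theta_star = b_perp.T @ C @ (sum(gammas) - Psi_b @ theta)
delta_rec = recombine(theta, theta_star)
print(np.allclose(mu1_rec, mu1), np.allclose(delta_rec, delta))
```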
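We present the estimation procedure of the parameters μ1 and δ for a given VAR order p, cointegration rank r, and break date τ. First consider the estimation of the parameter μ1. Recall the identity ν = −Πμ0 + Ψμ1, which can be written as
$$\nu = -\Pi\mu_0 + \Psi\beta(\beta'\beta)^{-1}\beta'\mu_1 + \Psi\beta_\perp(\beta_\perp'\beta_\perp)^{-1}\beta_\perp'\mu_1$$
or, more briefly,
$$\nu = -\Pi\mu_0 + \Psi_\beta\phi + \Psi_{\beta_\perp}\phi^*,$$
where $\phi = \beta'\mu_1$, $\phi^* = \beta_\perp'\mu_1$, $\Psi_\beta = \Psi\beta(\beta'\beta)^{-1}$, and $\Psi_{\beta_\perp} = \Psi\beta_\perp(\beta_\perp'\beta_\perp)^{-1}$. Because $\alpha_\perp'\Pi = \alpha_\perp'\alpha\beta' = 0$, a multiplication of this identity from the left by $\alpha_\perp'$ yields $\alpha_\perp'(\nu - \Psi_\beta\phi) = \alpha_\perp'\Psi_{\beta_\perp}\phi^*$. The matrix $\alpha_\perp'\Psi_{\beta_\perp}$ is nonsingular, and its inverse is $(\alpha_\perp'\Psi_{\beta_\perp})^{-1} = \beta_\perp'\beta_\perp(\alpha_\perp'\Psi\beta_\perp)^{-1}$. Thus, we can solve for φ* as follows: $\phi^* = \beta_\perp' C(\nu - \Psi_\beta\phi)$, where $C = \beta_\perp(\alpha_\perp'\Psi\beta_\perp)^{-1}\alpha_\perp'$ as before. Thus, if $\hat{C}$ and $\hat{\Psi}_\beta$ are sample analogs of C and $\Psi_\beta$, respectively, based on the RR estimation of (2.7), an estimator of φ* is given by
$$\hat{\phi}^* = \hat{\beta}_\perp'\hat{C}(\hat{\nu} - \hat{\Psi}_\beta\hat{\phi}).$$
Here $\hat{\nu}$ and $\hat{\phi}$ are also based on the RR estimation of (2.7). Using the estimators $\hat{\phi}$ and $\hat{\phi}^*$ together we can form an estimator for μ1 as
$$\hat{\mu}_1 = \hat{\beta}(\hat{\beta}'\hat{\beta})^{-1}\hat{\phi} + \hat{\beta}_\perp(\hat{\beta}_\perp'\hat{\beta}_\perp)^{-1}\hat{\phi}^*.$$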
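The parameter δ can be estimated in a similar way. From the definitions we find that
$$\begin{bmatrix} \gamma_0^* \\ \gamma_1^* \\ \vdots \\ \gamma_{p-1}^* \end{bmatrix} = \begin{bmatrix} I_n \\ -\Gamma_1 \\ \vdots \\ -\Gamma_{p-1} \end{bmatrix}\delta.$$
Multiplying this equation from the left by the matrix [α⊥′ : ··· : α⊥′] yields
$$\alpha_\perp'\sum_{j=0}^{p-1}\gamma_j^* = \alpha_\perp'\Psi\delta = \alpha_\perp'\Psi_\beta\theta + \alpha_\perp'\Psi_{\beta_\perp}\theta^*,$$
where θ* = β⊥′δ and θ = β′δ as in (2.7). From the foregoing equation we can solve for θ* in the same way as for φ*. The result is $\theta^* = \beta_\perp' C\big(\sum_{j=0}^{p-1}\gamma_j^* - \Psi_\beta\theta\big)$, from which we form an estimator for θ* as
$$\hat{\theta}^* = \hat{\beta}_\perp'\hat{C}\Big(\sum_{j=0}^{p-1}\hat{\gamma}_j^* - \hat{\Psi}_\beta\hat{\theta}\Big).$$
Here $\hat{\gamma}_j^*$ and $\hat{\theta}$ are again based on the RR estimation of (2.7). Thus, an estimator of δ is obtained as
$$\hat{\delta} = \hat{\beta}(\hat{\beta}'\hat{\beta})^{-1}\hat{\theta} + \hat{\beta}_\perp(\hat{\beta}_\perp'\hat{\beta}_\perp)^{-1}\hat{\theta}^*.$$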
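The test will be based on the series
$$\hat{x}_t = y_t - \hat{\mu}_1 t - \hat{\delta} d_{t\tau},$$
which are adjusted for the deterministic trend and the shift term. Thus, apart from estimation errors we have $\hat{x}_t \approx \mu_0 + x_t$. This suggests that we can base a test on this approximation or on the auxiliary model
$$\Delta\hat{x}_t = \Pi^+\begin{bmatrix}\hat{x}_{t-1} \\ 1\end{bmatrix} + \sum_{j=1}^{p-1}\Gamma_j\Delta\hat{x}_{t-j} + e_t,$$
where $\Pi^+ = [\Pi : -\Pi\mu_0]$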
is defined by adding an extra column to the matrix Π in (2.5). This auxiliary model can be treated as a true model, and a LR test statistic for a specified cointegrating rank can be formed by solving the related generalized eigenvalue problem, as before. We will denote the LR statistic for the null hypothesis rk(Π) = r0 by LRPAR(r0) in the following discussion because only a partial set of parameters associated with the deterministic part is estimated in the first step. Its limiting distribution differs from that of LRGLS(r0) and also from the one given in Theorem 6.3 of Johansen (1995) for the corresponding LR test statistic. We have the following result for the case where the shift occurs in the cointegrating relations (δ1 ≠ 0) and the shift size increases with the sample size. A proof is also given in the Appendix.
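THEOREM 4.1. Suppose Assumptions 1–3 hold. If δ1 ≠ 0, 0 < a < ½, and H0(r0) is true,
$$\mathrm{LR}_{\mathrm{GLS}}(r_0) \xrightarrow{d} \operatorname{tr}\left\{\int_0^1 dB^*(s)\,B^*(s)'\left(\int_0^1 B^*(s)B^*(s)'\,ds\right)^{-1}\int_0^1 B^*(s)\,dB^*(s)'\right\}$$
and
$$\mathrm{LR}_{\mathrm{PAR}}(r_0) \xrightarrow{d} \operatorname{tr}\left\{\int_0^1 dB^*(s)\,B^+(s)'\left(\int_0^1 B^+(s)B^+(s)'\,ds\right)^{-1}\int_0^1 B^+(s)\,dB^*(s)'\right\},$$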
where B*(s) = B(s) − sB(1) is an (n − r0)-dimensional Brownian bridge, B+(s) = [B*(s)′,1]′, and dB*(s) = dB(s) − dsB(1), that is, $\int_0^1 B^+(s)\,dB^*(s)'$ abbreviates $\int_0^1 B^+(s)\,dB(s)' - \int_0^1 B^+(s)\,ds\,B(1)'$, for example.
Several remarks are worth making regarding this theorem. First, a similar result for their break date estimators and cointegrating rank test was obtained by Lütkepohl et al. (2004) under more restrictive assumptions regarding the break size. The limiting distribution of LRGLS(r0) in Theorem 4.1 is the same as its earlier counterpart in Lütkepohl et al., whereas the limiting distribution of LRPAR(r0) differs in that the process B+(s) appears in place of the Brownian bridge B*(s). The reason is of course that an intercept term is included in the auxiliary model on which LRPAR(r0) is based. On the other hand, the limiting distribution of LRPAR(r0) is formally similar to its counterpart in Theorem 6.3 of Johansen (1995), where a standard Brownian motion appears in place of the Brownian bridge in our Theorem 4.1. Notice that the term
consists of two components. The first one is
and the second one is
Second, in the case without trend in the model, that is, μ1 = 0 a priori and hence
, the processes B+(s) and B*(s) in the limiting distributions in Theorem 4.1 can be replaced by [B(s)′,1]′ and B(s), respectively. Then the limiting distribution of the test statistic LRPAR(r0) is the same as the limiting distribution of the corresponding LR test statistic in Theorem 6.3 of Johansen (1995). This result can be proved by making appropriate modifications to the proof of Theorem 4.1 in the Appendix. Moreover, in this case the limiting distribution of LRGLS(r0) is the same as that of an LR test based on a model without any deterministic terms.
Third, from the proof of Theorem 4.1 it is apparent that the same limiting distributions are obtained if the shift date is assumed known or if it is known that there is no shift in the process. In the latter case δ = 0 and only μ0 and μ1 are estimated in the first step leading to LRGLS(r0), whereas only μ1 is estimated in the first step of the LRPAR(r0) procedure. Thus, in our framework, including a shift dummy in the model and estimating its coefficients and the shift date as described in the foregoing discussion has no effect on the limiting distributions of the cointegration tests. The same result was obtained by Lütkepohl et al. (2004) for LRGLS(r0) in a more limited model framework. It may be worth emphasizing that such a result will not be obtained if instead of our estimation procedures for the deterministic parameters, the Johansen (1995) ML approach is applied to a model with estimated shift date (see also Johansen, Mosconi, and Nielsen, 2000, for a discussion of the case when the break date is known).
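To make the LRPAR(r0) computation concrete, here is a minimal NumPy sketch of the reduced rank step for the auxiliary model with the extra intercept column: residuals from regressions on the lagged differences give the moment matrices S00, S01, and S11, and the generalized eigenvalue problem is reduced to a symmetric one through a Cholesky factor of S11. The effective sample size T − p and the function name `lr_par` are our assumptions; the adjusted series is taken as given.

```python
import numpy as np


def lr_par(xhat, p):
    """LR statistics LR(r0), r0 = 0, ..., n-1, for the auxiliary model
    Delta xhat_t = Pi^+ [xhat_{t-1}', 1]' + sum_j Gamma_j Delta xhat_{t-j} + e_t."""
    T, n = xhat.shape
    dx = np.diff(xhat, axis=0)                    # dx[k] = xhat[k+1] - xhat[k]
    rows = np.arange(p, T)                        # 0-based dates with p presample values
    Z0 = dx[rows - 1]                             # Delta xhat_t
    Z1 = np.column_stack([xhat[rows - 1], np.ones(len(rows))])   # [xhat_{t-1}', 1]'
    R0, R1 = Z0, Z1
    if p > 1:                                     # partial out the lagged differences
        Z2 = np.column_stack([dx[rows - 1 - j] for j in range(1, p)])
        R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
        R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]
    m = len(rows)
    S00, S01, S11 = R0.T @ R0 / m, R0.T @ R1 / m, R1.T @ R1 / m
    Linv = np.linalg.inv(np.linalg.cholesky(S11))
    A = Linv @ S01.T @ np.linalg.solve(S00, S01) @ Linv.T
    lam = np.sort(np.linalg.eigvalsh((A + A.T) / 2))[::-1][:n]   # the (n+1)th eigenvalue is 0
    return [-m * float(np.sum(np.log(1.0 - lam[r0:]))) for r0 in range(n)]
```

Because Π+ has at most rank n, the (n + 1)-dimensional eigenvalue problem has one zero root, which is why only the n largest eigenvalues enter the statistic.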
Extensions of our results in different directions are conceivable. In particular, limiting results as in Theorem 4.1 can also be obtained under other assumptions for the shift size. For example, if δ1 ≠ 0, the theorem holds more generally for a < ½. In particular, it holds for a = 0, where the shift size does not depend on the sample size, and for a < 0, where the shift size decreases with increasing sample size. In fact, a = ½ is the only case where a different result for the limiting distributions of the cointegration tests may be obtained. To get the same distributions as in Theorem 4.1, we then need the additional assumption that the break date estimator is consistent. This condition is satisfied for $\tilde{\tau}$ but requires further assumptions for $\hat{\tau}$ (see Theorem 3.1(i)). Proofs for other assumptions regarding the shift are not given here because they require a separate treatment of different cases, which complicates the presentation. For details see, however, the discussion paper version of this article (Saikkonen et al., 2004). We have treated the case where the shift actually occurs in the cointegrating relations and the shift size may be large because in this case our theory can help to explain some simulation results of Lütkepohl et al. (2004), as we will see in Section 5.
It also seems likely that our results can be extended by including more than one shift dummy or other dummy variables in model (2.1). In fact, an additional impulse dummy and seasonal dummies were considered by Saikkonen and Lütkepohl (2000a). The result in Theorem 4.1 remains valid with additional dummies if the corresponding shift dates are known and the parameters of the additional deterministic terms are estimated in a similar way as μ1 or δ. If the dates of further shifts are unknown, it may be more difficult to construct suitable shift date estimators. This issue may be an interesting project for future research.
An extension of our framework to the case where a break occurs not only in the levels of the series but also in the trend slopes may be desirable for applied work. However, such an extension is not straightforward, and the limiting distribution of the cointegrating rank tests is likely to be affected by the break date in this case.
To apply the cointegrating rank tests we need critical values for the second limiting distribution in Theorem 4.1. The limiting distribution of LRGLS(r0) is the same as in Lütkepohl et al. (2004), and critical values are available in Lütkepohl and Saikkonen (2000, Table 1). The second limiting distribution in Theorem 4.1 is simulated numerically by approximating the standard Brownian motions with T-step random walks of dimension n − r0, as in Johansen (1995, Sect. 15.1). The percentiles in Table 1 are based on sample lengths of T = 1,000 using independent standard normal variates for the error terms and 100,000 replications of the simulation experiment. The computations are done with GAUSS V5.
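The random-walk approximation of the limiting distribution can be sketched as follows. The code assumes that the second limit in Theorem 4.1 has the Johansen-type trace form tr{∫dB*B+′(∫B+B+′ds)⁻¹∫B+dB*′}, which is what the remarks after the theorem suggest, and it discretizes the Brownian bridge with left-endpoint sums. It only illustrates the method and is not meant to reproduce Table 1, which uses T = 1,000 and 100,000 replications; the function name is hypothetical.

```python
import numpy as np


def simulate_limit(m, T=1000, reps=2000, seed=0, intercept=True):
    """Draws from an assumed trace functional of an m-dimensional Brownian bridge.

    B and the bridge B*(s) = B(s) - s B(1) are approximated by T-step random walks;
    with intercept=True the integrand is B^+ = [B*', 1]', otherwise B*.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(reps)
    for k in range(reps):
        e = rng.standard_normal((T, m))
        dB = (e - e.mean(axis=0)) / np.sqrt(T)              # increments of the bridge B*
        Bstar = np.vstack([np.zeros(m), np.cumsum(dB, axis=0)[:-1]])   # left endpoints
        F = np.column_stack([Bstar, np.ones(T)]) if intercept else Bstar
        FdB = F.T @ dB                                      # approximates int F dB*'
        FF = F.T @ F / T                                    # approximates int F F' ds
        draws[k] = np.trace(FdB.T @ np.linalg.solve(FF, FdB))
    return draws


draws = simulate_limit(m=2, T=200, reps=500)
print(np.quantile(draws, [0.90, 0.95, 0.99]))              # rough percentiles for n - r0 = 2
```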
In the next section we will discuss small-sample properties of the break date estimators and cointegration tests.
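A Monte Carlo experiment was performed to compare our break date estimators and to explore the finite-sample properties of the corresponding cointegration test procedures. The simulations are based on the following xt process from Toda (1994), which was also used by a number of other authors for investigating the properties of cointegrating rank tests (see, e.g., Hubrich, Lütkepohl, and Saikkonen, 2001):
$$x_t = \begin{bmatrix} \psi & 0 \\ 0 & I_{n-r} \end{bmatrix} x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } N\left(0, \begin{bmatrix} I_r & \Theta \\ \Theta' & I_{n-r} \end{bmatrix}\right), \tag{5.1}$$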
where ψ = diag(ψ1,…,ψr) and Θ are (r × r) and (r × (n − r)) matrices, respectively. As shown by Toda, this type of process is useful for investigating the properties of LR tests for the cointegrating rank because other cointegrated VAR(1) processes of interest can be obtained from (5.1) by linear transformations that leave such tests invariant. Obviously, if |ψi| < 1 (i = 1,…,r) we have r stationary series, and, thus, the cointegrating rank is equal to r. The matrix Θ describes the contemporaneous error term correlation between the stationary and nonstationary components. We have used three- and four-dimensional processes for simulations and report some of the results in more detail here. For given VAR order p and break date τ, the test results are invariant to the parameter values of the constant and trend because we allow for a linear trend in our tests. Therefore we use μi = 0 (i = 0,1) as parameter values throughout without loss of generality. In other words, the intercept and trend terms are actually zero although we take such terms into account, and thereby we pretend that this information is unknown to the analyst. Hence, yt = δdtτ + xt, and we have performed simulations with different δ vectors. Rewriting xt in VECM form (2.5) shows that Π = −(In − A1) = diag(ψ − Ir : 0) and, thus, δ1 = −Πδ can only be nonzero if level shifts occur in stationary components of the DGP.
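The DGP (5.1) with a level shift can be generated as follows. Gaussian errors, the parameter values, and the function name `toda_dgp` are our illustrative assumptions; the zero start values and the 50 discarded start-up observations follow the description of the simulation design. The function also returns Π = A1 − In, so that δ1 = −Πδ can be inspected directly.

```python
import numpy as np


def toda_dgp(psi, Theta, T, delta, tau, rng, burn=50):
    """y_t = delta*d_{t,tau} + x_t with x_t from (5.1); returns y (T x n) and Pi = A_1 - I_n."""
    psi = np.asarray(psi, dtype=float)
    r, n = len(psi), len(psi) + Theta.shape[1]
    A1 = np.diag(np.concatenate([psi, np.ones(n - r)]))
    Omega = np.block([[np.eye(r), Theta], [Theta.T, np.eye(n - r)]])
    chol = np.linalg.cholesky(Omega)              # requires Omega to be positive definite
    x, draws = np.zeros(n), []
    for s in range(T + burn):
        x = A1 @ x + chol @ rng.standard_normal(n)
        if s >= burn:                             # discard the start-up observations
            draws.append(x)
    d = (np.arange(1, T + 1) >= tau).astype(float)
    return np.array(draws) + np.outer(d, delta), A1 - np.eye(n)


rng = np.random.default_rng(0)
Theta = np.array([[0.4, 0.4]])
y, Pi = toda_dgp([0.5], Theta, T=100, delta=np.array([1.0, 0.0, 0.0]), tau=50, rng=rng)
d1_stationary = -Pi @ np.array([1.0, 0.0, 0.0])      # shift in the stationary component
d1_nonstationary = -Pi @ np.array([0.0, 1.0, 1.0])   # shift in nonstationary components only
```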
Samples are simulated by starting with initial values of zero and discarding the first 50 observations. We have considered a sample size of T = 100. The number of replications is 1,000. Thus, the standard error of an estimator of a true rejection probability P is sP = √(P(1 − P)/1,000); for example, s0.05 = 0.007. Moreover, we use different VAR orders p, although the true order is p = 1, to explore the impact of this quantity on the estimation and testing results. In all simulations the search procedures are applied to all possible break points τ from the fifth up to and including the 96th observation. This choice corresponds to the situation where no prior knowledge about the break date is available. Therefore a search is performed over the full sample period except for some observations at the beginning and at the end. Recall that our theoretical results exclude the possibility of a break at the very beginning or at the very end of the sample. Leaving out 4% of the observations at both ends is to some extent arbitrary. Because we consider VAR orders up to p = 3, a break closer than three periods to the end of the sample results in one or more impulse dummies in (3.1) being zero throughout the sample period and therefore cannot be handled in our setup. We decided to stop the search close to the end of the feasible period and to treat the beginning and the end of the sample symmetrically in this respect. In practice, some prior knowledge about the break date may be available that can be used to narrow the period where a search is necessary. In that case it may be easier for an estimator to find the true break date, and, hence, the results for the break date estimation and cointegration testing may improve relative to those obtained in our simulations.
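A full-sample search over τ = 5,…,96 can be illustrated for an unrestricted least-squares approach. This sketch assumes that, for p = 1, the estimator minimizes the determinant of the residual covariance matrix from LS estimation of a VECM with intercept, linear trend, yt−1, a shift dummy dtτ, and an impulse dummy Δdtτ. That regressor set is derived from the model yt = μ0 + μ1t + δdtτ + xt and is not a transcription of (3.1). The function name is hypothetical, and the restricted (nonlinear LS) estimator is not shown.

```python
import numpy as np


def break_date_grid_search(y, candidates):
    """Hypothetical unrestricted break-date search for VAR order p = 1.

    For each candidate tau, regress Delta y_t (t = 2..T) by LS on
    [1, t, y_{t-1}, d_{t,tau}, Delta d_{t,tau}] and return the tau that
    minimises log det of the residual covariance matrix."""
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    t_idx = np.arange(2, T + 1, dtype=float)    # time index of Delta y_t
    dy = np.diff(y, axis=0)                     # Delta y_t, t = 2..T
    ylag = y[:-1]                               # y_{t-1}
    best_tau, best_val = None, np.inf
    for tau in candidates:
        step = (t_idx >= tau).astype(float)     # shift dummy d_{t,tau}
        impulse = (t_idx == tau).astype(float)  # impulse dummy Delta d_{t,tau}
        X = np.column_stack([np.ones_like(t_idx), t_idx, ylag, step, impulse])
        coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
        resid = dy - X @ coef
        sign, logdet = np.linalg.slogdet(resid.T @ resid / len(dy))
        if sign > 0 and logdet < best_val:
            best_tau, best_val = tau, logdet
    return best_tau


# Search grid used in the simulations: observations 5, ..., 96 for T = 100.
grid = range(5, 97)
```

Using `slogdet` instead of `det` avoids underflow or overflow of the determinant; both criteria select the same τ.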
To compute the estimator
we use a nonlinear LS estimation method by applying the Gauss–Newton algorithm to minimize the sum of squared residuals corresponding to (3.4). The iterations of the algorithm stop if the change in
from iteration i to i + 1 is less than (T − p)⁻ⁿ, where
are the residual vectors from the nonlinear estimation of (3.4). Thus, the precision is about 10⁻⁶ for a three-dimensional process (for T = 100 and p = 1, (T − p)⁻³ = 99⁻³ ≈ 1.03 × 10⁻⁶). In addition, the maximum number of iterations is set to 25. We have also worked with smaller values of our stopping criterion and higher maximum numbers of iterations for a subset of our simulation experiments but did not obtain different results.
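The iteration and stopping rule can be sketched for a toy problem. The quantity monitored in the stopping rule above is not reproduced in this section, so this sketch monitors the sum of squared residuals instead. It applies Gauss–Newton to a hypothetical single-equation regression with a product restriction, yt = a xt + (ab) zt + et, which only echoes the kind of nonlinear restriction in (3.4). The tolerance (T − p)⁻ⁿ and the cap of 25 iterations follow the text.

```python
import numpy as np


def gauss_newton(residual_fn, jacobian_fn, theta0, tol, max_iter=25):
    """Minimise the sum of squared residuals by Gauss-Newton.

    Stops when the objective changes by less than `tol` between two
    successive iterations, or after `max_iter` iterations."""
    theta = np.asarray(theta0, dtype=float)
    r = residual_fn(theta)
    obj = float(r @ r)
    it = 0
    for it in range(1, max_iter + 1):
        J = jacobian_fn(theta)                        # d residual / d theta
        step, *_ = np.linalg.lstsq(J, r, rcond=None)  # solves J @ step ~= r
        theta = theta - step                          # linearised LS update
        r = residual_fn(theta)
        new_obj = float(r @ r)
        done = abs(obj - new_obj) < tol
        obj = new_obj
        if done:
            break
    return theta, obj, it


def make_problem(x, z, y):
    """Hypothetical toy model y = a*x + (a*b)*z + e with theta = (a, b)."""
    def residual(theta):
        a, b = theta
        return y - a * x - a * b * z

    def jacobian(theta):
        a, b = theta
        return np.column_stack([-(x + b * z), -a * z])

    return residual, jacobian


T, p, n = 100, 1, 3
tol = (T - p) ** (-n)  # (T - p)^(-n) = 99^(-3), about 1.03e-6
```

Solving for the step with `lstsq` rather than forming (J′J)⁻¹ explicitly is numerically safer when the Jacobian is ill-conditioned.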
The interpretation of the simulation results is done in two steps. First, we analyze the ability of the shift date estimators to locate the true break point. Second, we discuss the small-sample properties of the corresponding cointegration tests based on these estimators. This discussion includes a comparison of the LRPAR and LRGLS test procedures.
As a basis for the comparison of the shift date estimators we start with a three-dimensional DGP with r = 1 (ψ1 = 0.9), Θ = (0.4, 0.8), and τ = 50. Afterward, we comment briefly on the importance of the value of τ, the innovation correlation, and the results of a four-dimensional DGP with two cointegration relations without presenting detailed results. The latter DGP has been considered to study the properties of the procedures in the case of more complicated processes. Finally, we examine situations where δ1 = −Πδ = 0 and, hence, according to Theorems 3.1(i) and 3.2(i), consistent estimation requires larger shift sizes.
The break date estimates for our three-dimensional basis DGP with r = 1 and a VAR order p = 1 are reported in Table 2 and Figure 1a. We consider a shift δ = (δ(1),δ(2),δ(3))′ with δ(1) ranging from 1 to 10 and δ(2) = δ(3) = 0. Hence, the shift occurs in the first component of the DGP, which is stationary according to (5.1). Thus, as discussed before, we have δ1 = −Πδ = −αβ′δ ≠ 0 in (3.1) and, hence, θ = β′δ ≠ 0 in (2.7).
The performance of the two estimators is very similar, although the restricted estimator, which accounts for the nonlinear parameter restrictions, is more successful than the unrestricted estimator in finding the correct break date for small shift magnitudes. Only for δ(1) = 3 and δ(1) = 5 does the unrestricted estimator perform slightly better. For large values of δ(1) both estimators perform identically. In fact, the cases δ(1) = 3 and δ(1) = 5 are among the few exceptions in all our simulation experiments where the unrestricted estimator outperforms the restricted one. These observations also hold if the small band [τ − 2; τ + 2], rather than only the single true value of τ, is used to evaluate the break date estimators; Figure 1 also shows, for each estimator, the frequency of break date estimates in the interval [τ − 2; τ + 2]. Obviously, the frequency of finding the true τ increases for larger shift magnitudes. This result is not surprising given the asymptotic properties of the estimators and the fact that δ1 ≠ 0 in the present situation. Because T is fixed, changing δ(1) from 1 to 10 may be interpreted as changing a or δ* in Assumption 3 accordingly.
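The two evaluation criteria used here, hitting τ exactly and landing in the band [τ − 2; τ + 2], can be computed from a vector of simulated break date estimates as in the following minimal sketch; the function name is hypothetical.

```python
import numpy as np


def hit_frequencies(estimates, tau, band=2):
    """Share of break-date estimates equal to tau, and share falling in
    the band [tau - band, tau + band]."""
    est = np.asarray(estimates)
    exact = float(np.mean(est == tau))
    in_band = float(np.mean(np.abs(est - tau) <= band))
    return exact, in_band


# Illustration with made-up estimates (not simulation results):
print(hit_frequencies([50, 49, 52, 53, 50, 30], tau=50))
```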
Next, we have fitted a VAR(3) model although the true DGP has only an order p = 1, and we give the results in Table 3 and Figure 1b. In this case j0 = 0 in Theorem 3.1(i) because γ1 = γ2 = 0. Thus, under the conditions of that theorem,
. In line with this result, the unrestricted estimator does not find the true break point τ = 50 even if δ(1) is large. In fact, we observe in Table 3 that in about two-thirds of the replications the break date is located too early. However, the estimates converge to the stated range for
, in line with our asymptotic results. Interestingly, focusing on the band [τ − 2; τ + 2],
is slightly more successful than
for δ(1) = 2 and 3 (see Figure 1b).
To analyze possible effects of the location of τ we also studied break points τ = 10, 25, 75, and 90 using the same three-dimensional DGP as before. We found that it is only slightly more difficult to detect the more extreme break points if small shift magnitudes are considered. In the case of large shift magnitudes the location of the break date becomes even less important for the estimation results. These observations were made for both break date estimators and both VAR orders p = 1 and 3.
Next, we studied the effect of the error term correlation between the stationary and nonstationary components by considering a three-dimensional DGP as before but with Θ = (0,0) and comparing the outcomes with the previous findings. The absence of instantaneous error term correlation made it more difficult for both estimators to locate the true break point regardless of the lag order. This outcome may be explained by the fact that we considered a shift in only one of the three components, so that the weaker link between the components owing to Θ = (0,0) complicated the break date search. In this situation, the restricted estimator was always the more successful procedure, and its advantage was usually even more pronounced than in the case of Θ = (0.4, 0.8).
Our results for the more complicated four-dimensional DGP with cointegrating rank r = 2 (ψ1 = ψ2 = 0.7) and Θ = ([0.4 : 0.4]′ : [0.4 : 0.4]′) clearly indicated that, for small level shifts, the break date estimators perform worse than for the three-dimensional DGP. We chose a shift vector of the form δ = (δ(1),δ(2),δ(3),δ(4))′ with δ(1) ranging again from 1 to 10 and δ(2) = δ(3) = δ(4) = 0. By adding a further dimension to the process the importance of the break in only one component is weakened, which may explain the lower number of correct break date estimates. The relative rankings of the two estimation procedures did not change, however.
Finally, we examined two DGPs for which δ1 = −Πδ = 0. For this situation, Theorems 3.1 and 3.2 state that, compared to the case δ1 ≠ 0, “larger” shift magnitudes are needed to ensure that the break date can be estimated consistently.
First, we considered a three-dimensional process as in the base case but with δ(3) ranging from 1 to 10 and δ(1) = δ(2) = 0. Because the shift occurs in the third component, which is nonstationary, the level shift is orthogonal to the cointegration space in line with our DGP design (5.1). Thus, we simulated a case of co-breaking. Second, we used a three-dimensional process with ψ1 = 1 so that the cointegrating rank is r = 0. In the case of r = 0, all components of the DGP are nonstationary, and no error term correlation is present because Θ vanishes.
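Under the design matrix Π = diag(ψ − Ir : 0) stated earlier, the term δ1 = −Πδ can be computed directly, which makes the distinction between the base case, co-breaking, and the case r = 0 concrete. This is a minimal sketch with a hypothetical function name.

```python
import numpy as np


def shift_term(psi, delta):
    """delta_1 = -Pi delta for Pi = diag(psi - I_r : 0).

    The first r = len(psi) components are the stationary ones; the
    remaining components are random walks."""
    psi = np.asarray(psi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n, r = delta.size, psi.size
    Pi = np.zeros((n, n))
    Pi[:r, :r] = np.diag(psi - 1.0)
    return 0.0 - Pi @ delta  # "0.0 -" avoids printing negative zeros


print(shift_term([0.9], [10.0, 0.0, 0.0]))  # shift in the stationary component
print(shift_term([0.9], [0.0, 0.0, 10.0]))  # co-breaking: shift in a random walk
print(shift_term([1.0], [10.0, 0.0, 0.0]))  # psi_1 = 1, i.e. r = 0
```

Only the first case gives a nonzero δ1; in the other two δ1 = 0, which is the situation in which larger shifts are needed for consistent break date estimation.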
The results for the shift date estimators are depicted in Figure 2. Clearly, it was more difficult for both procedures to locate τ in this situation. In particular, the case of no cointegration is much more difficult to deal with than co-breaking (compare Figures 2c and 2d with Figures 2a and 2b). The poor performance of the shift date estimation procedures for small shift sizes relative to DGPs with δ1 ≠ 0 is in accordance with the finding in Section 3 that precise estimation requires large shift magnitudes if δ1 = 0.
To sum up, the constrained estimator is usually at least as good as the unrestricted estimator and is often superior in finding the true break date. The small-sample results for the unrestricted estimator are in line with our asymptotic derivations, which say that this procedure may estimate the break date too early when the VAR order is overspecified.
So far we have only analyzed the small-sample properties of the break date estimators in terms of their ability to locate the true shift date. If one is primarily interested in the cointegrating rank of the system, the focus should be on the small-sample properties of the cointegration tests based on these different estimators. Our main conclusion is that the tests' small-sample size and power differ very little even in those cases where the break date estimators perform differently. Therefore, we only discuss the outcomes for our three-dimensional base DGP, for which we have δ1 = −Πδ ≠ 0, the case treated in Theorem 4.1. The results are given in Tables 4 and 5. To be precise, we present the rejection frequencies for the null hypothesis H0 : r = r0 when the LRPAR and LRGLS tests are applied to a process with known or estimated shift date. The rejection frequencies for the case r0 = 1 should give an indication of the tests' sizes in small samples. Therefore we use the term size in the following discussion when we refer to this case.
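When reading the size estimates in Tables 4 and 5, the Monte Carlo standard error sP = √(P(1 − P)/N) with N = 1,000 replications gives a rough yardstick. The sketch below reproduces s0.05 = 0.007 and adds an approximate two-standard-error band around the nominal 5% level; the band and the 1.96 multiplier are a normal-approximation rule of thumb, not part of the original study, and the function names are hypothetical.

```python
import math


def mc_standard_error(p, replications):
    """Standard error of a rejection frequency estimated from `replications`
    independent Monte Carlo draws when the true rejection probability is p."""
    return math.sqrt(p * (1.0 - p) / replications)


def size_band(nominal=0.05, replications=1_000, z=1.96):
    """Approximate band around the nominal level (normal approximation)."""
    se = mc_standard_error(nominal, replications)
    return nominal - z * se, nominal + z * se


print(round(mc_standard_error(0.05, 1_000), 3))
print(size_band())
```

Under this rule of thumb, estimated sizes outside roughly [0.036, 0.064] point to a genuine deviation from the 5% level rather than simulation noise.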
In Table 4 we see for fitted VAR(1) models that in particular the sizes of LRPAR are clearly higher than the nominal 5% level in cases of small shift magnitudes, for which we obtain many incorrect break date locations. For increasing shift magnitudes the sizes approach the values for a known shift date in line with the greater success of the estimators to locate τ. The small-sample powers (not adjusted for the variations in the sizes) are roughly constant if
is used whereas an increase in power for small values of δ(1) can be observed for
.
The impact of the shift magnitude on the small-sample properties of the cointegration tests is similar if VAR(3) models are fitted and
is used (see Table 5). Note, however, that the tests' small-sample power is clearly lower than in the VAR(1) case even if the true shift date is known. Regarding
we observe some differences if VAR(3) models are applied (compare Table 5). Here, the sizes of LRGLS and powers of both tests fall below the values for a known shift date when δ(1) is large. Obviously, the effect of the wrong locations on the small-sample properties becomes important if the shift magnitude is large.
Three important observations can be made with respect to the relative small-sample properties of the two cointegration tests LRPAR and LRGLS. First, the new test LRPAR rejects somewhat too often if the null hypothesis is true (r = 1), even if the shift date is known. These higher rejection frequencies were also found for other DGPs. Thus, it may be worth exploring small-sample corrections for the new tests in future work. Second, for increasing shift magnitudes the relative performance of LRPAR and LRGLS based on the estimator
is in general similar. Third, using
the drop in small-sample power in the case of large shift magnitudes when the VAR order is overspecified is more pronounced for LRGLS than for LRPAR. Thus, our new test proposal appears to be less affected by the incorrect break date estimates.
The overall conclusion from our simulations is that the restricted estimator is generally preferable to the unrestricted estimator. Taking account of the nonlinear restrictions is beneficial both for locating the shift date and for testing the cointegrating rank. In fact, estimating the shift date does not worsen the small-sample properties of the cointegration tests much relative to the case of a known break point if the restricted estimator is used. Given the size distortions of LRPAR even if the shift date is known, size-correcting procedures may be worth exploring for this test in the future.
We have analyzed the asymptotic properties of two estimators for the shift date in a cointegrated VAR process with level shift. The shift is modeled by a simple shift dummy variable. The first estimator is based on an unrestricted VAR model, and the second one is obtained by taking into account the relation between the parameters of the stochastic and deterministic parts of the model. Asymptotic properties of both estimators are given under the assumption that the shift may depend on the sample size. Both a growing and a declining shift size when the sample size tends to infinity are considered. These results extend previous results of Lütkepohl et al. (2004), who also consider the first estimator assuming a fixed shift size. Our results shed new light on previously unexplained small-sample phenomena. We have also considered the implications of using models with estimated instead of true shift dates in testing for the cointegrating rank, and we have proposed new variants of cointegrating rank tests. These tests differ from those considered by Lütkepohl et al. in that they avoid estimating the nonidentified part of the levels parameter, but otherwise they proceed in a similar manner. More precisely, the trend and shift parameters are estimated in a first step, and then rank tests of the LR type are applied to the adjusted series. The asymptotic distributions of the tests are derived.
In addition to providing asymptotic results, we have also investigated the small-sample properties of the procedures using a Monte Carlo simulation experiment. It is found that the estimator that takes the restrictions into account is the most successful one in locating the true shift date. Moreover, a superior break date estimator tends to improve the small-sample properties of subsequent cointegration tests. Generally it pays to account for a shift in testing for the cointegrating rank of a system of variables when such a shift is actually present.
A comparison of the tests considered by Lütkepohl et al. (2004) and the new tests of the present paper shows, however, that the latter tend to reject a true null hypothesis more often than the Lütkepohl et al. tests. Generally the new tests tend to reject true null hypotheses too often, and hence in future research it may be of interest to develop small-sample corrections to ensure a test size close to the nominal level.
Another direction for extending our results may be to develop a joint procedure for determining the break date and the cointegrating rank. Our two-step procedure, where the break date is estimated in the first stage and then tests for the cointegrating rank are performed, is easy to use in practice and therefore has some appeal in applied work. An alternative approach may be to determine the break date and cointegrating rank jointly, for instance by minimizing a model selection criterion. We leave such extensions for future research.
Some parts of the proofs are similar to those of the corresponding results stated in Lütkepohl et al. (2004) under more restrictive conditions. Because these authors provide brief sketches of the proofs only, we also present more detailed and more complete versions of the similar parts here.
The following notational conventions are used in addition to the notation defined earlier. Right-hand side and left-hand side will be abbreviated by r.h.s. and l.h.s., respectively. The smallest and largest eigenvalues of a matrix are denoted by λmin(·) and λmax(·), respectively. The complement of a set B is signified by Bc. The dependence of quantities on the sample size T is not indicated. The symbol ⇒ signifies weak convergence in a product space of D([λ̲,λ̄]) or D([0,1]). The former is relevant for random functions depending on the parameter λ, whereas the latter is used when the weak limit is a Brownian motion. Unless otherwise stated, all limits assume that T → ∞. When obtaining weak convergences in a product space of D([λ̲,λ̄]) we frequently make use of results given in Appendix A.1 of Gregory and Hansen (1996). It is straightforward to check that these results are applicable despite the differences in assumptions.
In the proofs we assume the model and conditions described in Sections 2 and 3, where the parameters μ0, μ1, δ* ∈ Rn and the true α, β, Π, and Γj (j = 1,…,p − 1) satisfy the restrictions that ensure that the observed variables are at most I(1), whereas these restrictions are not imposed in the estimation.
The true DGP is one specific process from our model class. It is occasionally helpful to be more explicit about its particular parameter values. In these cases they will be indicated with a subscript o (e.g., μ0o, μ1o, τo, etc.). For the break date we assume for convenience that
. We begin by proving Theorem 3.1.
Instead of the series yt it will be convenient to use the mean adjusted series
Solving the preceding equation for yt and inserting the result into (3.1) yields
Here
Note that the true values of ν0(0) and ν1(0) are zero.
It will also be convenient to use the transformation Πxt−1 = α(0)ut−1(0) + ρ(0)vt−1(0), where ut−1(0) = βo′xt−1, vt−1(0) = βo⊥′xt−1, α(0) = αβ′βo(βo′ βo)−1, and ρ(0) = αβ′βo⊥(βo⊥′ βo⊥)−1. Clearly, the true values of α(0) and ρ(0) are αo and zero, respectively. With this transformation the preceding error correction form can be expressed as
Denote qtτ = [dtτ : dtτ′]′ and
With this notation (A.2) becomes
where Φ = [ν0(0) : Tν1(0) : T1/2ρ(0) : α(0) : Γ1 : ··· : Γp−1], Ξ = [δ1 : γ], and Ξ(0) = [δ1(0) : γ(0)].
Let Θ = [Φ : Ξ] contain the freely varying parameters in (A.3) or (A.2) (Ξ(0) is not a freely varying parameter because it is determined by α(0), ρ(0), and Γ1,…,Γp−1). Set
Then
is −2 times the (conditional) Gaussian log-likelihood function of the parameters in (A.3). Minimizing this function yields Gaussian ML estimators of the parameters Θ, τ, and Ω. It is not difficult to see that the resulting estimators of Θ and τ can alternatively be obtained by minimizing the concentrated counterpart of lT(Θ,τ,Ω), that is,
The definition of εtτ(Θ) (and the fact that Ξ(0) is not a freely varying parameter) makes it clear that the value of τ that minimizes the function lT(c)(Θ,τ) is identical to
defined by (3.2). Thus, (asymptotic) properties of
can be studied by using the Gaussian ML estimator of τ discussed previously. Before turning to this issue we note that the preceding discussion also makes clear that a minimizer of lT(Θ,τ,Ω) exists (for every T larger than some constant).
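The step from lT(Θ,τ,Ω) to its concentrated counterpart uses the standard fact that, for fixed residuals ε1,…,εT, the expression Σt [log det Ω + εt′Ω⁻¹εt] is minimized at Ω̂ = T⁻¹Σt εtεt′, where it equals T log det Ω̂ + Tn. The sketch below checks this numerically. It assumes this textbook form of −2 times the Gaussian log-likelihood (dropping the constant Tn log 2π) and uses hypothetical function names.

```python
import numpy as np


def neg2_loglik(E, Omega):
    """-2 x Gaussian log-likelihood of residuals E (T x n), without the
    constant T*n*log(2*pi)."""
    T = E.shape[0]
    _, logdet = np.linalg.slogdet(Omega)
    quad = np.trace(np.linalg.solve(Omega, E.T @ E))  # sum_t e_t' Omega^{-1} e_t
    return T * logdet + quad


def concentrated_neg2_loglik(E):
    """Value after minimising over Omega, attained at Omega_hat = E'E / T."""
    T, n = E.shape
    Omega_hat = E.T @ E / T
    _, logdet = np.linalg.slogdet(Omega_hat)
    return T * logdet + T * n, Omega_hat


rng = np.random.default_rng(4)
E = rng.standard_normal((200, 3))
value, Omega_hat = concentrated_neg2_loglik(E)
```

Because the concentrated value depends on the residuals only through log det Ω̂, minimizing it over τ amounts to minimizing the determinant of the residual covariance matrix.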
The proof of Theorem 3.1 consists of several steps. In the first one we consider a subset of the parameter space of (Θ,Ω) defined by
and
Note that here M does not depend on T although Φ and δ1(0) do. We now prove the following lemma.
LEMMA A.1. Let B1 = B1(M,ω̲,ω̄) be the part of the parameter space of (Θ,τ,Ω) in which conditions (A.4) and (A.5) hold. Then there exist choices of M, ω̲, and ω̄ such that
with probability approaching one.
Proof. First note that
where the latter equality is justified by the weak law of large numbers.
Next, because [Tλ̲] ≤ τ, τo ≤ [Tλ̄], we find from the definitions that
Hence,
where Φ(0) = Φ + [δ1 − δ1(0) : 0]. Here
where ε* > 0 is a suitable real number and the inequality holds with probability approaching one. This fact can be justified in the same way as Lemma A.4 of Saikkonen (2001). A similar result is also obtained by changing the range of summation on the l.h.s. of (A.8) to t = [Tλ̄] + p,…,T. When these two eigenvalue conditions are assumed, arguments entirely similar to those in Saikkonen (2001, pp. 320–321) show that, with suitable choices of M, ω̲, and ω̄, the r.h.s. of (A.7) can be made arbitrarily large whenever (Θ,τ,Ω) ∉ B1(M,ω̲,ω̄). The assertion of the lemma follows from this and (A.6). █
Lemma A.1 implies that a minimizer of lT(Θ,τ,Ω) will asymptotically satisfy inequality restrictions of the form (A.4) and (A.5). In what follows, the set B1 is always assumed to be defined in such a way that the conclusion of Lemma A.1 holds. We shall now proceed in the same way as in Saikkonen (2001) and express the function lT(Θ,τ,Ω) as a sum of two components. To this end, define
Then wt(0) = [w1t(0)′ : w2t(0)′]′, and we also partition the parameter matrix Φ conformably as Φ = [Φ1 : Φ2] where Φ1 = [ν0(0) : Tν1(0) : T1/2ρ(0)] and Φ2 = [α(0) : Γ1 : ··· : Γp−1]. With these definitions,
where ε1tτ(Θ) = −Φ1 w1t(0) − Ξqtτ + Ξ(0)qtτo and ε2t(Φ2) = Δxt − Φ2 w2t(0). Clearly, ε1tτo(Θo) = 0 and
where
For l2T(Φ2,Ω) we have the following result.
LEMMA A.2.
where the infimum is over unrestricted values of Φ2 and Ω > 0.
Proof. Because we can treat Δxt as a zero mean stationary process and because l2T(Φ2,Ω) can be interpreted as −2 times the logarithm of the Gaussian likelihood function associated with the regression model Δxt = Φ2 w2t(0) + εt, the stated result follows from standard regression theory (cf. Saikkonen, 2001, p. 321). █
Next consider the function l1T(Θ,τ,Ω). Our treatment will be divided into several steps in which the time index t is suitably restricted. This means considering the function l1T(Θ,τ,Ω) with the sample size T replaced by appropriate quantities smaller than T. Most of the subsequent results will explicitly be formulated for τ ≤ τo and only briefly discussed in the case τ ≥ τo (cf. Bai et al., 1998).
In the following results about the function l1T(Θ,τ,Ω), c1,c2,… denote positive constants and a1T,a2T,… are nonnegative random variables that depend on the sample size but not on the parameters Θ, τ, or Ω. First we prove the following lemma.
LEMMA A.3. There exists a constant c1 > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo and (Θ,τ,Ω) ∈ B1,
where a1T ≥ 0 and a1T = Op(1).
Proof. For t ≤ τ − 1, ε1tτ(Θ) = −Φ1 w1t(0) and, consequently,
For L1 we have
For (Θ,τ,Ω) ∈ B1, the first eigenvalue in the last expression is bounded away from zero. That the same holds with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo for the second eigenvalue can be seen by using an analog of (A.3) of Gregory and Hansen (1996, p. 118). Thus, we have shown that L1 ≥ c1∥Φ1∥², c1 > 0, with probability approaching one.
It remains to show that L2 ≥ −a1T∥T1/2Φ1∥ with a1T having the properties stated in the lemma. To demonstrate this, notice that
Here we have used the definition of ε2t(Φ2), the Cauchy–Schwarz inequality, and the norm inequality. By an analog of (A.4) of Gregory and Hansen (1996, p. 118), the norm in the middle of the last expression is of order Op(1) uniformly in [Tλ̲] ≤ τ ≤ τo and for any fixed value of Φ2. Thus, because the parameters Φ2 and Ω belong to bounded sets when (Θ,τ,Ω) ∈ B1, it can similarly be shown that the last expression as a whole has an upper bound a1T∥T1/2Φ1∥ with a1T as required. This completes the proof. █
Our next result deals with the contribution of l1,τo−1(Θ,τ,Ω) − l1,τ−1(Θ,τ,Ω) to l1T(Θ,τ,Ω). Here the relevant expression of ε1tτ(Θ) is
where Ψ1 = Φ1 + [δ1 : 0].
LEMMA A.4. Let ε be any real number with the property 0 < ε < λo − λ̲. Then, for λ̲ ≤ λ ≤ λo − ε, there exists a constant c2 > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ [T(λo − ε)] and (Θ,τ,Ω) ∈ B1,
where aiT ≥ 0 (i = 2,3), a2T = Op(1), and a3T = op(Tη) with 1/b < η < ¼.
Proof. By the definitions,
First consider L3 and for simplicity denote Ψ̄1 = [Ψ1 : γ] and z1tτ(0) = [w1t(0)′ : dtτ′]′. Then
where D1T = diag[T−1/2In−r+2 : Ip].
Next note that
uniformly in [Tλ̲] ≤ τ < τo. Because w1t(0) = [1 : t/T : T−1/2vt−1(0)′]′, this is obvious for the first and second components of w1t(0). For the third component the same is true because T−1/2max1≤t≤τo∥vt−1(0)∥ ≤ T−1/2max1≤t≤T∥βo⊥′xt−1∥ = Op(1), where the equality follows from the fact that T−1/2βo⊥′x[Ts] obeys an invariance principle. Thus, we can conclude that
uniformly in [Tλ̲] ≤ τ < τo − p.
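The invariance-principle step above, T−1/2max1≤t≤T∥βo⊥′xt−1∥ = Op(1), can be illustrated numerically: the distribution of T−1/2 max|St| for a random walk St settles down as T grows (it converges to that of sup|W(s)| on [0,1]) instead of drifting off. This sketch assumes a scalar Gaussian random walk as a stand-in for βo⊥′xt; the function name is hypothetical.

```python
import numpy as np


def scaled_max_walk(T, reps, seed=0):
    """Draws of T^{-1/2} * max_{1<=t<=T} |S_t| for a Gaussian random walk S_t."""
    rng = np.random.default_rng(seed)
    S = np.cumsum(rng.standard_normal((reps, T)), axis=1)
    return np.abs(S).max(axis=1) / np.sqrt(T)


# Median and 95% quantile stay roughly stable across sample sizes.
for T in (100, 500, 2_000):
    draws = scaled_max_walk(T, reps=2_000, seed=T)
    print(T, np.round(np.quantile(draws, [0.5, 0.95]), 2))
```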
The next step is to observe that
where M11(λ) is the weak limit of
(cf. (A.3) of Gregory and Hansen, 1996, p. 118). It is straightforward to check that the difference M11(λo) − M11(λ) is positive definite and that its smallest eigenvalue is bounded from below by a positive constant when λ̲ ≤ λ ≤ λo − ε.
The preceding discussion implies that, with probability approaching one, the smallest eigenvalue of the matrix on the l.h.s. of (A.10) is bounded away from zero uniformly in [Tλ̲] ≤ τ ≤ [T(λo − ε)]. Thus, with probability approaching one and in the required uniform sense,
where c2 > 0 is a (small) constant. This implies that it only remains to show that L4 ≥ −a2T∥T1/2Ψ1∥ − a3T∥γ∥ with a2T and a3T as stated in the lemma.
To show the previously mentioned inequality about L4, conclude from the definitions that
Arguments similar to those already used in the proof of Lemma A.3 show that
where a2T = Op(1) in the required uniform sense.
Regarding L42, one similarly obtains
where a3T = op(Tη), 1/b < η < ¼, in the required uniform sense. The latter inequality follows if the last norm in the preceding expression can be replaced by op(Tη). To justify this, recall that Δxt and w2t(0) are stationary processes with finite moments of order b > 4 and that Φ2 can be assumed to belong to a bounded set. Thus, it suffices to show that max1≤t≤T∥Δxt∥ = op(Tη) and similarly with Δxt replaced by w2t(0). This, however, can be done by using an argument entirely similar to that in (A.14) of Saikkonen and Lütkepohl (2002). The inequalities obtained for |L41| and |L42| show that L4 has the required lower bound, and the proof is complete. █
Our next result describes the contribution of l1,τo+p−1(Θ,τ,Ω) − l1,τo−1(Θ,τ,Ω) to l1T(Θ,τ,Ω). We introduce the notation
In the following lemma the relevant values of ε1tτ(Θ) can then be written as
where Ψ2 = Φ1 + [δ1 − δ1(0) : 0]. Note that here the first term in the definition of ζtτ(0) vanishes, but the general definition is convenient in later derivations. Now we can formulate the following lemma.
LEMMA A.5. There exists a constant c3 > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo and (Θ,τ,Ω) ∈ B1,
where aiT ≥ 0 and aiT = Op(1) (i = 4,5).
Proof. By the definitions,
Assuming (Θ,τ,Ω) ∈ B1 we find that
Because we can here assume that Ψ2 is bounded (see (A.5)), an application of the triangle inequality and the Cauchy–Schwarz inequality shows that the absolute value of the third term in the last expression can be bounded from above by
Here the latter square root is of order Op(1) (see the argument leading to (A.10)). Hence, we can conclude that
where c3 = ω̄−1 > 0 and a41T = Op(1) in the required uniform sense.
Now consider L6. Arguments similar to those used in previous derivations combined with the present definition of ε1tτ(Θ) yield
It is easy to see that the first term on the r.h.s. can be used to define the term a5T in the lemma. The arguments needed are similar to those used to obtain (A.11), and they can also be applied to the second term so that we can write
where also a42T = Op(1) in the required uniform sense. The result of the lemma now follows from (A.11) and (A.12) by defining a4T = a41T + a42T. █
The next lemma is concerned with the contribution of l1T(Θ,τ,Ω) − l1,τo+p−1(Θ,τ,Ω) to l1T(Θ,τ,Ω). Here ε1tτ(Θ) is given by
LEMMA A.6. There exists a constant c4 > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo and (Θ,τ,Ω) ∈ B1,
where a6T ≥ 0 and a6T = Op(1).
Proof. The proof is similar to that of Lemma A.3. █
Our next lemma is used as an alternative to Lemma A.4 in some of the subsequent derivations. The formulation of this lemma makes use of the notation ζtτ(0) employed in Lemma A.5.
LEMMA A.7. There exists a constant c5 > 0 such that, with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo − 1 and (Θ,τ,Ω) ∈ B1,
where 1/b < η < ¼, aiT ≥ 0, and aiT = Op(1) (i = 7,8,9).
Proof. By the definitions,
Recall that Ψ1 = Φ1 + [δ1 : 0] and Ψ2 = Φ1 + [δ1 − δ1(0) : 0]. For t = τ,…,τo − 1, we thus have ε1tτ(Θ) = −Ψ1 w1t(0) − γdtτ = −Ψ2 w1t(0) − ζtτ(0). Hence,
Assume that (Θ,τ,Ω) ∈ B1. An application of the Cauchy–Schwarz inequality, the norm inequality, and the triangle inequality yields
Because max1≤t≤T∥w1t(0)∥ = Op(1) (see the arguments leading to (A.10)), the second square root in the last expression is of order Op(1) uniformly in [Tλ̲] ≤ τ < τo. Hence,
where a8T = Op(1) in the required uniform sense.
Next note that L71 ≥ 0 and λmin(Ω−1) ≥ ω̄−1 for (Θ,τ,Ω) ∈ B1. Consequently,
Now consider L8, for which we have
Arguments similar to those used for L73 show that
where a9T = Op(1) in the required uniform sense. The latter inequality is obtained because, for (Θ,τ,Ω) ∈ B1, the last norm in the second expression can be replaced by Op(1) by an analog of (A.4) of Gregory and Hansen (1996, p. 118).
As for L82, assume first that τ < τo − p and use the Cauchy–Schwarz inequality to conclude that
Here the second inequality is based on the definitions and the triangle inequality, whereas the third one also makes use of the Cauchy–Schwarz inequality and the norm inequality.
In the last expression
uniformly in [Tλ̲] ≤ τ < τo − p and (Θ,τ,Ω) ∈ B1. Here the latter result can be concluded from the Hájek–Rényi inequality given in Proposition 1 of Bai (1994). The former can be obtained by an argument similar to that used to prove (A.14) of Saikkonen and Lütkepohl (2002).
Combining the preceding discussion of L82 shows that
where a71T = Op((τo − τ)η) in the required uniform sense and the equality follows from definitions. Because for any real numbers a ≥ 0 and b ≥ 0 we have
, it follows that
In the proof of this result it was assumed that τ < τo − p, but it also holds for τo − p ≤ τ < τo. In that case arguments similar to those used for L73 give
and (A.16) holds with a71T = Op(1). The result of the lemma is obtained from the definitions of L7 and L8 in conjunction with (A.13)–(A.16) by defining
and a9T as done in (A.13) and (A.15), respectively. █
In the proof of the next lemma and also in subsequent proofs, frequent use will be made of the elementary inequality
which holds for a0,a1 ≥ 0, and a2 > 0.
LEMMA A.8. Let ε > 0 and B2 = {(Θ,τ,Ω) : ∥T1/2−ηΦ1∥² + ∥T1/2−ηΨ2∥² ≤ ε²}, where 1/b < η < ¼ is the same as in Lemma A.7. Then,
with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo.
Proof. By the definitions and Lemma A.2,
Thus, it suffices to show that, for some ε* > 0,
with probability approaching one and uniformly in [Tλ̲] ≤ τ ≤ τo.
From Lemma A.1 it follows that we only need to prove (A.19) with the set B2c replaced by B1 ∩ B2c. Let 0 < ε1 ≤ λo − λ̲ and define the sets
According to what was said previously, it suffices to establish (A.19) separately with B2c replaced by B21 and B22. Here we are free to choose the value of ε1. Whatever our choice, Lemma A.4 can be applied on the set B21, on which we shall first concentrate.
From Lemmas A.4 and A.5 and (A.17) we first find that, uniformly in B21,
Combining these inequalities with those obtained from Lemmas A.3 and A.6 shows that, uniformly in B21,
Denote
. Then the preceding inequality implies that, uniformly in B21,
For simplicity, denote φT² = ∥T1/2−ηΦ1∥² + ∥T1/2−ηΨ2∥² and note that the sum of the two norms in the last expression is at most
. Thus, uniformly in B21,
Because φT > ε on B21 and aT* = Op(1) uniformly in B21, this shows that (A.19) holds with B2c replaced by B21.
Now consider proving (A.19) with B2c replaced by B22. Here we can use Lemmas A.3, A.5, A.6, and A.7 to conclude that, with probability approaching one and uniformly in B22,
Here it is understood that a9T and the last two terms on the r.h.s. are deleted if τ = τo because then Lemma A.7 becomes redundant. By (A.17) the sum of the fifth, sixth, and seventh terms on the r.h.s. is of order op(1) uniformly in B22, and the sum of the last two terms can be bounded from below by −(1/4c5)[a7T((τo − τ)/T)η + a8T((τo − τ)/T)1/2∥T1/2−ηΨ2∥]2. Thus, expanding the square and inserting the result on the r.h.s. of the preceding inequality yields, uniformly in B22,
where
Note that here a6T,…,a9T are of order Op(1) uniformly in B22 and that, on B22, (τo − τ)/T ≤ 2ε1, say. Because we are free to choose the value of ε1, we can choose it so small that the following two conditions hold with probability approaching one and uniformly in B22: (i) c4T(τ) ≥ c4/2 and (ii) a10T(τ) and a11T(τ) are smaller than any preassigned positive number. Comparing inequality (A.23) with (A.20) in light of these facts shows that only two features prevent the previous proof, based on (A.20), from applying directly here: instead of the terms T−ηa6T = op(1) and op(1), (A.23) contains a10T(τ) and a11T(τ) + op(1), respectively, which are not of order op(1) but can only be bounded by an arbitrarily small positive number independent of the parameters. This, however, suffices for essentially the same proof as before. Indeed, uniformly in B22, an analog of (A.21) holds except that, in the last expression, Tη is replaced by a fixed positive number that can be taken as large as we wish and op(1) is replaced by a fixed negative number that can be taken as small in absolute value as we wish. In particular, Tη and op(1) in (A.21) can be replaced by M/ε and −ε/M, respectively, where M can be chosen arbitrarily large. Hence the r.h.s. of the present version of (A.21) exceeds some ε* > 0 with probability approaching one. Thus, there is a choice of ε1 such that (A.19) holds with B2c replaced by B21 and by B22. This completes the proof. █
The next lemma is similar to Lemma A.8 except that it deals with the short-run parameter Φ2.
LEMMA A.9. Let ε > 0 and $B_3 = \{(\Theta,\tau,\Omega) : \|T^{1/2-\eta}(\Phi_2 - \Phi_{2o})\| \le \varepsilon\}$, where 1/b < η < ¼ is the same as in Lemma A.7. Then,
with probability approaching one and uniformly in [Tλ] ≤ τ ≤ τo.
Proof. By Lemma A.1 it suffices to prove the result with B3c replaced by B1 ∩ B3c. First consider the break dates [Tλ] ≤ τ ≤ [T(λo − ε1)] and note that the derivation of the inequality in (A.21) is valid for these break dates and for all (Θ,τ,Ω) ∈ B1 ∩ B3c. It is also valid for every ε1 > 0. Thus, an application of (A.17) shows that in this part of the parameter space T−2ηl1T(Θ,τ,Ω) ≥ op(1) holds uniformly. Next note that the inequality (A.23) is valid for [T(λo − ε1)] < τ ≤ τo and for all (Θ,τ,Ω) ∈ B1 ∩ B3c. Moreover, as the discussion after that inequality reveals, with a suitable (small) choice of ε1 we can use (A.17) to obtain an analog of (A.21), from which we conclude that, with probability approaching one and uniformly in the considered part of the parameter space, T−2ηl1T(Θ,τ,Ω) ≥ −ε2, where ε2 > 0 can be chosen arbitrarily small. From the preceding discussion and the first equality in (A.18) it thus follows that we need to show that, for some ε* > 0,
with probability approaching one. Arguments needed to show this are similar to those used in previous proofs and also very similar to those used to prove the consistency of the LS estimators of the parameters Φ2 and Ω in the standard regression model Δxt = Φ2 wt(0) + εt. Details are straightforward and are omitted. █
The next lemma again makes use of the notation ζtτ(0) introduced for Lemma A.5.
LEMMA A.10. Let
, where τ < τo and 1/b < η < ¼ is the same as in Lemma A.7. Then, there exists a real number M0 > 0 such that, for all M ≥ M0,
with probability approaching one and uniformly in [Tλ] ≤ τ ≤ τo − 1. If the quantity (τo − τ)−2η in the definition of the set B4 is replaced by T−2η the same conclusion holds.
Proof. From (A.18) it follows that it suffices to show that there exists a real number M0 > 0 such that, for all M ≥ M0 and any M1 > 0,
with probability approaching one and uniformly in [Tλ] ≤ τ ≤ τo − 1. From Lemmas A.1, A.8, and A.9 it further follows that here the set B4c can be replaced by B1 ∩ B2 ∩ B3 ∩ B4c. From (A.19) it can be seen that the value of ε in the definition of B2 can be chosen arbitrarily small.
We wish to apply Lemmas A.3, A.5, A.6, and A.7 to obtain a lower bound for l1T(Θ,τ,Ω). This lower bound can be obtained by multiplying both sides of the inequality (A.22) by T2η. By (A.17) the contribution of the first four terms to the r.h.s. of the resulting inequality can be replaced by Op(1). This is also the case for the seventh term. Hence, we can write
This holds uniformly in B1 ∩ B2 ∩ B3 ∩ B4c and [Tλ] ≤ τ ≤ τo − 1. In this part of the parameter space we also have
and a4T ≤ a4T(τo − τ)η (see Lemma A.8). Denote c* = min(c3,c5), aT* = max(a4T, a7T + εa8T), and for simplicity,
. From the lower bound obtained for l1T(Θ,τ,Ω) previously we can then further obtain
Again, this holds uniformly in B1 ∩ B2 ∩ B3 ∩ B4c and [Tλ] ≤ τ ≤ τo − 1. Now, on B4c, ξτ > M(τo − τ)η so that, for all M large enough and with probability approaching one, we can make the r.h.s. of (A.25) larger than any preassigned number M1 > 0. Thus, we have established (A.24) and thereby the first assertion of the lemma. The second assertion is obvious by (A.25) and the discussion thereafter. █
Before proceeding with further proofs, we discuss how Lemmas A.3–A.10 must be modified when τ ≥ τo.
The counterpart of Lemma A.3 is concerned with the time points t = p + 1,…,τo − 1 and break dates τo ≤ τ ≤ [Tλ] but is otherwise similar to Lemma A.3.
The next time points of interest are now t = τo,…,τo + p − 1 so that we need to consider a counterpart of Lemma A.5. Here we write
where Ψ2 = Φ1 + [δ1 − δ1(0) : 0] as before and ζtτ = (dtτ − dtτo)δ1 + γdtτ − γ(0)dtτo. In other words, in place of ζtτ(0) we now use an analogous variable defined by using the parameter δ1 instead of δ1(0). However, replacing ζtτ(0) in Lemma A.5 by ζtτ is clearly possible, as can be seen from the given proof.
Instead of the time points t = τo + p,…,τ − 1, it is now convenient to consider the time points t = τo + p,…,τ + p − 1, so that the number of time points is the same as in Lemmas A.4 and A.7. The parameters must be changed, however. Now
where Ψ1(0) = Φ1 − [δ1(0) : 0]. Thus, we now have the matrix Ψ1(0) in place of Ψ1 used in Lemma A.4, and, as before, the former is defined by using δ1(0) instead of δ1 in Ψ1. The parameter γ used in Lemma A.4 is also changed by adding δ1 to its columns. With these replacements the counterpart of Lemma A.4 applies with [T(λo + ε)] ≤ τ ≤ [Tλ].
Next consider the counterpart of Lemma A.7, which is also concerned with the time points t = τo + p,…,τ + p − 1. Here the preceding expression of ε1tτ(Θ) is modified to the form
where Ψ2 is as defined in the proof of Lemma A.7. In the counterpart of Lemma A.7 we then have ζtτ in place of ζtτ(0) and τo + 1 ≤ τ ≤ [Tλ]. The proof can again be obtained basically by following the previous proof.
The counterpart of Lemma A.6 is straightforward. The relevant time points are t = τ + p,…,T, and the obtained lower bound is as before except for the obvious change in the values of τ, which become τo ≤ τ ≤ [Tλ]. The proof is similar to the proof of Lemma A.3.
It is not difficult to check that the modified versions of Lemmas A.3–A.7 can be used to show that the results of Lemmas A.8 and A.9 also apply for τo ≤ τ ≤ [Tλ]. Regarding Lemma A.10, when τo + 1 ≤ τ ≤ [Tλ], the set B4 is defined as
but otherwise the same result obtains.
Now we can turn to our next lemma, which is central to the study of the asymptotic properties of the break date estimator. Recall that δ1o = −Πoδo = −αo βo′δo, where δo = Taδ*. Because αo has full column rank, δ1o ≠ 0 if and only if βo′δo ≠ 0. Note also that we use the convention that the infimum over an empty set is ∞.
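The equivalence δ1o ≠ 0 ⇔ βo′δo ≠ 0 rests on αo having full column rank. A small NumPy sketch makes this concrete. The dimensions, the random draws and the helper name `delta1` are hypothetical. A shift δ lying in the span of βo⊥ gives δ1 = −αβ′δ = 0, whereas a generic shift does not.

```python
import numpy as np

def delta1(alpha, beta, delta):
    """Level-shift term delta_1 = -alpha beta' delta (hypothetical helper)."""
    return -alpha @ (beta.T @ delta)

rng = np.random.default_rng(0)
n, r = 4, 2
alpha = rng.standard_normal((n, r))   # full column rank with probability one
beta = rng.standard_normal((n, r))

# Orthogonal complement of beta: the last n - r left singular vectors of beta.
u, _, _ = np.linalg.svd(beta)
beta_perp = u[:, r:]

delta_in_perp = beta_perp @ np.array([1.0, -2.0])   # beta' delta = 0 by construction
delta_generic = rng.standard_normal(n)               # beta' delta != 0 almost surely

print(np.linalg.norm(delta1(alpha, beta, delta_in_perp)))   # numerically zero
print(np.linalg.norm(delta1(alpha, beta, delta_generic)))   # strictly positive
```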
LEMMA A.11. Let M > 0. Assume that δ1o ≠ 0 and define $B_5 = \{(\Theta,\tau,\Omega) : (|\tau_o - \tau| - p)\,\|\delta_{1o}\|^{2/(1-2\eta)} \le M\}$, where 1/b < η < ¼ is the same as in Lemma A.7 or its counterpart when τ > τo. Then there exists a real number M0 > 0 such that, for all M ≥ M0,
with probability approaching one. If δ1o = 0 the same result holds with the set B5 replaced by
.
Proof. Assume first that τ < τo − p and δ1o ≠ 0. From Lemmas A.1, A.8, and A.9 it follows that we can replace the set B5c by B1 ∩ B2 ∩ B3 ∩ B5c.
By the definitions, δ1(0) = −Πδo = −α(0)βo′δo − ρ(0)βo⊥′δo, where βo′δo ≠ 0. On B3, ∥α(0) − αo∥ ≤ εTη−1/2 and, on B2, ∥ρ(0)∥ ≤ εTη−1 (see Lemmas A.8 and A.9). Thus, because δ1o = −αo βo′δo and δo = Taδ*,
for some positive and finite constant c. Hence, because ζtτ(0) = (dtτ − dtτo)δ1(0) = δ1(0) for t = τ + p,…,τo − 1, we have on B1 ∩ B2 ∩ B3 ∩ B5c,
Here the fourth relation makes use of the triangle inequality. For all T and M large enough the last expression can be made larger than the real number M0 in Lemma A.10. Thus, the stated result follows from Lemma A.10.
Now consider the case τ > τo + p but maintain the assumption δ1o ≠ 0. Then, using the counterparts of Lemmas A.8 and A.9 we can proceed in the same way as in the case τ < τo − p until the relations (A.26), which start now as
Thus, in place of δ1(0) we now have δ1. However, from the counterpart of Lemma A.8 we find that, on B2, ∥δ1 − δ1(0)∥ ≤ εTη−1/2, and a straightforward modification of the arguments in the latter part of (A.26), combined with the present version of Lemma A.10, gives the desired result.
Next assume that δ1o = 0 and τ ≤ τo. In this case we use the inequality
where
, that is, the summation on the r.h.s. is over the values of t for which Δdtτo ≠ 0 or Δdtτ ≠ 0. Clearly the number of such time points is at most 2p.
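The count of at most 2p time points can be checked directly. The sketch rests on our reading of the garbled condition, which is an assumption. The relevant indicators are taken to be the lagged impulse dummies Δd_{t−j,τ} and Δd_{t−j,τo} for j = 0,…,p − 1, with Δd_{s,τ} = 1 if s = τ and 0 otherwise. The function name is hypothetical.

```python
def nonzero_dummy_times(T, tau, tau_o, p):
    """Times t = 1..T at which some lagged impulse dummy Delta d_{t-j, tau} or
    Delta d_{t-j, tau_o} (j = 0..p-1) is nonzero, under the reading stated above."""
    times = set()
    for j in range(p):
        for brk in (tau, tau_o):
            t = brk + j              # Delta d_{t-j, brk} != 0 exactly when t - j == brk
            if 1 <= t <= T:
                times.add(t)
    return sorted(times)

print(nonzero_dummy_times(100, 40, 60, 3))   # two disjoint blocks of p dates
print(nonzero_dummy_times(100, 40, 42, 3))   # overlapping blocks: fewer than 2p dates
```

The union of two blocks of p consecutive dates has at most 2p elements, with equality when the blocks do not overlap.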
From the definitions it follows that
Notice that here Γjoδo = −γjo (j = 1,…,p − 1) and, because now δ1o = 0, δo = γ0o. Thus, the sum of the last three terms equals γdtτ − γo dtτo, and we wish to show that the contribution of the first three terms to the r.h.s. of (A.27) can be ignored. To this end, note that now δ1(0) = −ρ(0)βo⊥′δo so that, on B2, ∥δ1(0)∥ ≤ cεTη+a−1 for some 0 ≤ c < ∞ (see Lemma A.8). Furthermore, on B3, ∥(Γj − Γjo)δo∥ ≤ ∥Γj − Γjo∥∥δo∥ ≤ εTη+a−1/2∥δ*∥ (j = 1,…,p − 1) (see Lemma A.9). Using these facts and the triangle inequality we find that
On the r.h.s. the summation can be extended to all t = p + 1,…,T. This means that on B1 ∩ B2 ∩ B3 ∩ B50c the last expression becomes larger than the real number M0 in Lemma A.10 for all T and M large enough. Thus, the stated result follows from the latter part of Lemma A.10.
Finally, assume that δ1o = 0 and τ > τo. In place of (A.27) we then have a similar inequality with t = τo,…,τ + p − 1 and ζtτ(0) replaced by ζtτ. However, using the fact that ∥δ1 − δ1(0)∥ ≤ εTη−1/2 on B2, it is straightforward to show that the proof can be reduced to a form entirely similar to that in the case τ ≤ τo. This completes the proof of the lemma. █
Now we can prove Theorem 3.1. As discussed earlier, the estimator
can also be obtained by minimizing −2 times the Gaussian log-likelihood function lT(Θ,τ,Ω). First consider the case a > 0 and δ1o ≠ 0. By Lemma A.11 we can then concentrate on the break dates τo − p ≤ τ ≤ τo + p. We begin with the break dates τo − p ≤ τ ≤ τo. If γjo = 0 for all j = 0,…,p − 1, Lemma A.11 shows that, asymptotically,
, as required. Next suppose that γj0,o ≠ 0 and consider the break dates τo − p ≤ τ ≤ τo − p + j0. For any of these break dates we have
Suppose first that j0 > 0. Then, because γj0(0) − γj0,o = −(Γj0 − Γj0,o)δo, we have for (Θ,τ,Ω) ∈ B3,
Because γj0,o = −TaΓj0,oδ* ≠ 0, the last quantity tends to infinity as T → ∞. Hence, we can conclude from Lemmas A.9 and A.10 that asymptotically the function lT(Θ,τ,Ω) is not minimized for τ ≤ τo − p + j0. Now consider the case j0 = 0. From the definitions it follows that γ0(0) − γ0o = δ1o − δ1(0), where ∥δ1o − δ1(0)∥ ≤ cTη+a−1/2ε on B2 ∩ B3 (see the beginning of the proof of Lemma A.11). Hence, because γ0o = Ta(δ* + αo βo′δ*) ≠ 0, the proof given in the case j0 > 0 applies with obvious changes and shows that asymptotically
cannot occur.
To complete the proof of the first assertion, consider the case τo + 1 ≤ τ ≤ τo + p. By the definitions we then have ζτoτ = −δ1 − γ0(0) = −δo + (δ1(0) − δ1), where δo ≠ 0 and ∥δ1(0) − δ1∥ ≤ εTη−1/2 for (Θ,τ,Ω) ∈ B2 ∩ B3 (see Lemma A.8 and the definition of Ψ2 given before Lemma A.5). In the same way as in the preceding case we can thus conclude from Lemmas A.8–A.10 that asymptotically
cannot occur. This completes the proof of the first assertion in the case a > 0 and δ1o ≠ 0.
Next assume that a > η > 1/b and δ1o = 0. Then, if τ ≤ τo − p + j0 and j0 > 0,
Because the last quantity tends to infinity as T → ∞ it follows from the latter part of Lemma A.11 that asymptotically
cannot occur. If j0 = 0, we have γ0o = δo − δ1o = δo ≠ 0, and (A.28) holds with Γj0,oδ* replaced by δ*. Hence the same conclusion also obtains for j0 = 0.
If τ > τo the l.h.s. of (A.28) can be bounded from below by T−2η∥γ0o∥2 = T−2η∥δo∥2 = T2a−2η∥δ*∥2, and the situation is similar to the case j0 = 0.
Finally, the second part of the theorem follows directly from the first part of Lemma A.11. This completes the proof of Theorem 3.1.
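The estimator in Theorem 3.1 minimizes −2 times the Gaussian log-likelihood over τ jointly with the other parameters. The sketch below is a toy illustration only. It uses a univariate series with i.i.d. Gaussian errors and a single level shift, with no VAR dynamics or cointegration. The function names, the trimmed search range and the simulated values are hypothetical. In this toy model, concentrating out the intercept, the shift and the error variance turns −2 log-likelihood into T log(RSS(τ)/T) plus a constant. The grid search therefore simply minimizes the residual sum of squares RSS(τ).

```python
import numpy as np

def step_dummy(T, tau):
    """Step dummy d_{t tau}: 1 for t >= tau and 0 otherwise, t = 1, ..., T."""
    t = np.arange(1, T + 1)
    return (t >= tau).astype(float)

def estimate_break_date(x, tau_min, tau_max):
    """Toy break date estimator for x_t = mu + delta * d_{t tau} + e_t.

    For each candidate tau, regress x on [1, d_{t tau}] by least squares and keep
    the tau with the smallest residual sum of squares (RSS)."""
    T = len(x)
    best_tau, best_rss = None, np.inf
    for tau in range(tau_min, tau_max + 1):
        X = np.column_stack([np.ones(T), step_dummy(T, tau)])
        coef, *_ = np.linalg.lstsq(X, x, rcond=None)
        rss = float(np.sum((x - X @ coef) ** 2))
        if rss < best_rss:
            best_tau, best_rss = tau, rss
    return best_tau

# Simulated example: a level shift of size 3 at tau_o = 120 in a sample of T = 200.
rng = np.random.default_rng(1)
T, tau_o = 200, 120
x = 1.0 + 3.0 * step_dummy(T, tau_o) + rng.standard_normal(T)
tau_hat = estimate_break_date(x, 20, 180)
```

The search is restricted away from the sample ends, loosely mirroring the restriction of τ to a trimmed interval in the text. With a shift that is large relative to the noise, `tau_hat` should typically land at or very near the true date.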
The break date estimator
can also be obtained by minimizing the objective function lT(Θ,τ,Ω) over the relevant restricted part of the parameter space. Compared with the previous unrestricted estimation, the parameters δ1 and γ in (A.2) are no longer freely varying but are (smooth) functions of the parameters δ, ρ(0), α(0), and Γ1,…,Γp−1. Specifically, δ1 = −Πδ = −α(0)βo′δ − ρ(0)βo⊥′δ, γ0 = δ − δ1, and γj = −Γjδ (j = 1,…,p − 1). Unlike in the unconstrained case, it is not obvious that these restricted estimators exist, so their existence is established first. After that, the proof follows straightforwardly from the results used to prove Theorem 3.1.
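The restricted parameterization is a plain function of the free parameters, using Π = α(0)βo′ + ρ(0)βo⊥′ as stated later in the proof of Theorem 3.2. The NumPy sketch below maps δ, α(0), ρ(0) and Γ1,…,Γp−1 to δ1 and γ0,…,γp−1. The function name and the random example values are hypothetical. βo⊥ is computed from an SVD, which is only one of several valid choices of orthogonal complement.

```python
import numpy as np

def restricted_deterministic_params(delta, alpha0, rho0, beta_o, beta_o_perp, Gammas):
    """Map (delta, alpha(0), rho(0), Gamma_1..Gamma_{p-1}) to (delta_1, [gamma_0..gamma_{p-1}])."""
    Pi = alpha0 @ beta_o.T + rho0 @ beta_o_perp.T   # Pi = alpha(0) beta_o' + rho(0) beta_o_perp'
    delta1 = -Pi @ delta                             # delta_1 = -Pi delta
    gammas = [delta - delta1]                        # gamma_0 = delta - delta_1
    gammas += [-G @ delta for G in Gammas]           # gamma_j = -Gamma_j delta, j = 1..p-1
    return delta1, gammas

# Illustrative dimensions: n = 3 variables, cointegrating rank r = 1, lag order p = 3.
rng = np.random.default_rng(0)
n, r, p = 3, 1, 3
alpha0 = rng.standard_normal((n, r))
rho0 = rng.standard_normal((n, n - r))
beta_o = rng.standard_normal((n, r))
beta_o_perp = np.linalg.svd(beta_o)[0][:, r:]       # orthogonal complement of beta_o
Gammas = [rng.standard_normal((n, n)) for _ in range(p - 1)]
delta = rng.standard_normal(n)

delta1, gammas = restricted_deterministic_params(delta, alpha0, rho0, beta_o, beta_o_perp, Gammas)
```

Treating (δ1, γ) as outputs of this map, rather than as free arguments, is what "restricting the permissible space" of the reduced-form parameter amounts to.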
Define
Using yt(τ) in place of xt we can obtain an analog of (A.2) in which dtτo and dtτo are replaced by dtτ and dtτ, respectively, and ut−1(0) and vt−1(0) are replaced by analogs defined in terms of yt(τ) instead of xt. In other words, in place of ut−1(0) and vt−1(0) we use ut−1(τ) = βo′yt−1(τ) and vt−1(τ) = βo⊥′yt−1(τ), respectively. In place of (A.3) we then have
where wt(τ) is an obvious modification of wt(0).
Clearly, we can express the vector εtτ(Θ) as
and use this expression in the previous definition of lT(Θ,τ,Ω). To demonstrate the existence of a minimizer of the objective function lT(Θ,τ,Ω), it is also convenient to use the reparameterization Θ → Θ(0) = [Φ : Ξ − Ξ(0)]. Thus, if for simplicity we denote zt(τ) = [Δyt(τ)′ : wt(τ)′ : qtτ′]′ and R(Θ(0)) = [In : − Φ : Ξ − Ξ(0)], we can write the relevant objective function as
Note that in the present context the parameter Θ has the same meaning as before except that it is treated as a (smooth) function of the parameters ν0(0), ν1(0), δ, ρ(0), α(0), and Γ1,…,Γp−1. Because the parameter Ξ(0) is also a (smooth) function of (some of) these parameters the same is true for the parameter Θ(0). All these parameter restrictions are taken into account when the minimization of the objective function lT(Θ(0),τ,Ω) is considered. Notice that, because the objective function is expressed as a function of the “reduced form” parameter Θ(0), the role of the parameter restrictions is to define the permissible space of Θ(0). A similar idea, of course, applies to the previous parameterization of the objective function, that is, to lT(Θ,τ,Ω) (cf. Saikkonen, 2001, and the references therein for a similar approach).
A useful consequence of the fact that we can still interpret the objective function lT(Θ,τ,Ω) as a function of the “reduced form” parameter Θ, and only restrict its permissible space, is that the results obtained to prove Theorem 3.1 carry over directly to the present setting. In particular, we wish to apply Lemma A.11 to conclude that, when the existence of a minimizer of the objective function lT(Θ,τ,Ω) is studied in the present setup, the values of the break date parameter τ can be restricted as implied by this lemma. Of course, this conclusion also holds when the objective function is parameterized as lT(Θ(0),τ,Ω).
To justify the application of Lemma A.11, we first discuss how Lemmas A.1–A.10 have to be modified to match the present setup. Notice that the existence of a minimizer of the objective function lT(Θ,τ,Ω) is not needed to prove Lemmas A.1–A.11 and the same is also true for their modified versions to be discussed subsequently.
First note that Lemma A.2 is still used in its previous form and, because it is concerned with unrestricted values of Φ2 and Ω, it obviously applies in the present context. Lemma A.1 is simply modified by replacing B1 by the intersection of the restricted parameter space of (Θ,τ,Ω) and values for which the inequality constraints in (A.4) and (A.5) hold. This restricted version of the parameter space B1 is then used to replace B1 in Lemmas A.3–A.7. It is straightforward to check that the previous proofs of these lemmas apply in essence despite the differences in parameter spaces.
Next consider Lemmas A.8–A.10, where, in addition to B1, also the parameter spaces B2, B3, and B4 are redefined to allow for the employed restrictions. Again, it is not difficult to check that the previous proofs carry over. It is also easy to see that the modifications needed for Lemmas A.3–A.10 can be done in the case τ ≥ τo.
Because analogs of Lemmas A.1–A.10 hold in the present context, it is further straightforward to show that the result of Lemma A.11 also holds with the parameter space B5 redefined to account for the employed restrictions. Thus, we can conclude that, when searching for a minimizer of the objective function lT(Θ(0),τ,Ω), the value of the break date parameter τ can be restricted as implied by Lemma A.11. Specifically, if δ1o ≠ 0, Lemma A.11 directly shows that τo − p ≤ τ ≤ τo + p can be assumed. If δ1o = 0 and a > η > 1/b, we can even assume τo − p + 1 ≤ τ ≤ τo + p − 1, as the argument used to prove the corresponding case of Theorem 3.1(i) readily shows.
We shall now show that the function lT(Θ(0),τ,Ω) and hence lT(Θ,τ,Ω) have a minimizer with probability approaching one. In what follows, reference to Lemmas A.1–A.11 will be understood to mean the present restricted setup. We first show the following intermediate result, where the matrix DT = diag[T−1/2I : Ip] is used. Its dimension equals the dimension of the vector zt(τ).
LEMMA A.12. There exists an ε* > 0 such that
with probability approaching one and uniformly in τ, when the value of the break date parameter τ can be restricted as implied by Lemma A.11.
Proof. The values of τ can be restricted depending on the value of a and whether δ1o = 0 or not. Different cases will therefore be discussed separately.
Case (i). Either a > 0 and δ1o ≠ 0, or δ1o = 0 and a > η > 1/b.
From Lemma A.11 we can then conclude that, if a minimizer of lT(Θ(0),τ,Ω) exists, in large samples it must be such that the corresponding τ is in the interval [τo − p,τo + p]. If δ1o ≠ 0 this follows directly from the first part of Lemma A.11. If δ1o = 0 (and a > η > 1/b) the same conclusion can be drawn from the second part of the lemma by the argument used in the proof of Theorem 3.1 to obtain (A.28).
To justify (A.31), assume first that a < ½. Then the moment matrix in (A.31) behaves asymptotically in the same way as in the proof of Theorem 3.1 in that the vectors Δyt(τ) and wt(τ) in the definition of zt(τ) can be replaced by analogs defined in terms of xt. This follows by observing that, when |τo − τ| ≤ p, the latter term on the r.h.s. of (A.29) satisfies
When a < ½ the last quantity converges to zero, and the desired conclusion is readily obtained.
If a = ½ the latter term on the r.h.s. of (A.29) has an impact, but (A.31) still obtains. To see this, suppose first that δ1o ≠ 0. Then, as |τ − τo| ≤ p, the latter term on the r.h.s. of (A.29) behaves like an impulse dummy. Because now δo = T1/2δ* this term affects the asymptotic behavior of the moment matrix in (A.31), but, as can be readily seen, it only affects the diagonal and off-diagonal elements related to ut−1(τ) and Δyt−j(τ) (j = 0,…,p − 1). Moreover, the impact is such that asymptotically the moment matrix in (A.31) only differs from that obtained in the previous case by an additive positive semidefinite matrix. Thus, from this fact and the result of the previous case one again obtains (A.31).
Next assume that δ1o = 0 and a = ½. Here the situation is similar to the preceding case except for being simpler because now ut−1(τ) = βo′xt−1 = ut−1(0). Thus, we again get (A.31), and, thus, we have justified (A.31) in the case of the first part of the theorem. It remains to consider the second part, for which the following assumption is made.
Case (ii). a ≤ 0 and δ1o ≠ 0.
If a = 0 it follows from the first part of Lemma A.11 that we can assume |τ − τo| to be bounded, and arguments similar to those in the case 0 < a < ½ and δ1o ≠ 0 show (A.31). If a < 0 we cannot restrict the values of τ. However, from (A.32) it can be seen that the vectors Δyt(τ) and wt(τ) in the definition of zt(τ) can be replaced by analogs defined in terms of xt. Arguments similar to those used in the proof of Theorem 3.1 then show that (A.31) also holds in the present case. (In particular, analogs of (A.3) and (A.4) of Gregory and Hansen, 1996, and (A.14) of Saikkonen and Lütkepohl, 2002, can be used to handle sums of cross products between [Δxt′ : wt(0)′]′ and qtτ.) █
We have now shown that when searching for a minimizer of the function lT(Θ(0),τ,Ω) we can in both parts of Theorem 3.2 restrict the values of the break date τ in such a way that (A.31) holds with probability approaching one and uniformly in τ.
Using Lemma A.12, we can analyze the function lT(Θ(0),τ,Ω) in the same way as in the proof of Proposition 3.1 of Saikkonen (2001, pp. 320–321) and conclude that it suffices to search for a minimizer of lT(Θ(0),τ,Ω) in that part of the parameter space where, in addition to the restrictions on τ, we also have $0 < \underline{\omega} \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le \overline{\omega} < \infty$ and $\|\Theta^{(0)}\| \le M < \infty$.
We shall demonstrate that the parameter space defined by all these restrictions is compact. To this end, note first that the restrictions imposed on Θ(0) are of the form h(Θ(0)) = 0, where h(·) is a continuous function. Because the unrestricted parameter space of Θ(0) is the whole Euclidean space, the restricted space is the preimage of the closed set {0} under the continuous map h and is therefore closed. Its intersection with the set of parameter values satisfying $0 < \underline{\omega} \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le \overline{\omega} < \infty$ and $\|\Theta^{(0)}\| \le M < \infty$ is closed and bounded and hence compact. The continuity of the function lT(·,τ,·) therefore ensures that, for every relevant value of τ, a minimizer exists with probability approaching one. Because τ takes only finitely many values, the same holds for the joint minimization over τ. This proves the (asymptotic) existence of the nonlinear LS estimators of Θ(0), τ, and Ω and hence also that of Θ.
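The compactness argument can be spelled out step by step as follows. The symbols 𝓡 and 𝓚 are our own labels, and Ω is taken to range over symmetric matrices, so that λmin and λmax are continuous functions of Ω.

```latex
\begin{aligned}
&\mathcal{R} = \{\Theta^{(0)} : h(\Theta^{(0)}) = 0\} = h^{-1}(\{0\}) \text{ is closed, since } h \text{ is continuous and } \{0\} \text{ is closed;}\\
&\mathcal{K} = \{(\Theta^{(0)},\Omega) : \Theta^{(0)} \in \mathcal{R},\ \|\Theta^{(0)}\| \le M,\
  \underline{\omega} \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le \overline{\omega}\}
  \text{ is closed and bounded;}\\
&\text{Heine--Borel: } \mathcal{K} \text{ is compact; Weierstrass: } l_T(\cdot,\tau,\cdot)
  \text{ attains its minimum on } \mathcal{K} \text{ for each } \tau;\\
&\tau \text{ takes finitely many integer values} \;\Longrightarrow\;
  \min_{\tau}\,\min_{\mathcal{K}} l_T(\Theta^{(0)},\tau,\Omega) \text{ is attained.}
\end{aligned}
```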
To prove part (i) of Theorem 3.2, first consider the case δ1o ≠ 0 and assume that τ ≤ τo − 1. As noted previously, we can also assume that τo − p ≤ τ. Using the definitions we can express the vector ζtτ(0) as
Taking the assumed restrictions into account we can write this further as
Here we have also made use of the facts that δ1 = −Πδ and δ1(0) = −Πδo with Π = α(0)βo′ + ρ(0)βo⊥′.
To show that asymptotically the function lT(Θ,τ,Ω) cannot be minimized for τo − p ≤ τ ≤ τo − 1, we consider two cases separately. In the first case it is assumed that ∥δ∥ ≥ Taε*, where ε* > 0 is arbitrary. The second case then assumes that ∥δ∥ < Taε*.
Now consider parameter values for which τo − p ≤ τ ≤ τo − 1 and ∥δ∥ ≥ Taε* hold for some ε* > 0. By Lemma A.8 we can also assume that ∥δ1 − δ1(0)∥ ≤ εTη−1/2. Using this, (A.33), and the previously mentioned parameter restrictions, we find that
Because the last quantity tends to infinity with T, it follows from Lemma A.10 that asymptotically
cannot occur.
For parameter values τo − p ≤ τ ≤ τo − 1 and ∥δ∥ < Taε* we can also use (A.33) and Lemma A.10. First note that, by Lemma A.8, the norm of the first term on the r.h.s. of (A.33) can be bounded by εTη−1/2. Next, from Lemmas A.8 and A.9 it follows that the term in front of δ in the second term on the r.h.s. of (A.33) can be assumed bounded, and so the norm of the whole term can be bounded by a quantity of the form c1ε*Ta, where 0 < c1 < ∞. Similar arguments can also be used to show that, at least for t = τo, the norm of the third term on the r.h.s. of (A.33) can be bounded from below by a quantity of the form c2∥δ*∥Ta, where 0 < c2 < ∞ and ∥δ*∥ ≠ 0. Thus, because ε* can be chosen arbitrarily small, the asymptotic behavior of
is dominated by the third term on the r.h.s. of (A.33), and the preceding discussion implies that this sum tends to infinity with T. From this and Lemma A.10 we can conclude that asymptotically
cannot occur.
Thus, we have shown that, when δ1o ≠ 0, asymptotically
cannot occur. A similar argument with ζtτ(0) replaced by ζtτ and with Lemma A.10 replaced by its corresponding counterpart shows that asymptotically
cannot occur either.
Now suppose that δ1o = 0 and consider the break dates τo − p ≤ τ ≤ τo − 1. Instead of (A.33) we use a slightly different representation of ζtτ(0) given by
This representation can be obtained from the definitions (cf. the similar representation used in the proof of Lemma A.11). As with the case δ1o ≠ 0, our treatment will be divided into two separate cases.
In the first case the parameter δ is restricted as ∥δ∥ ≥ Taε*, where ε* > 0 is arbitrary and a > η > 1/b. From the preceding representation of ζtτ(0) it then follows that
Here the last inequality makes use of the fact that ∥δ1 − δ1(0)∥ ≤ εTη−1/2 can be assumed by Lemma A.8. Because the last quantity tends to infinity with T, it follows from the latter result of Lemma A.10 that asymptotically
cannot occur.
When ∥δ∥ < Taε* (a > η > 1/b) is assumed, (A.34) and Lemma A.10 give the desired result much in the same way as in the case δ1o ≠ 0, where (A.33) was used instead of (A.34). First note that the norm of the first four terms on the r.h.s. of (A.34) can be bounded by a quantity of the form εcTη+a−1/2, where 0 < c < ∞. This follows from Lemma A.8 and the arguments used to prove Lemma A.11 for δ1o = 0. Next, in the same way as in the case δ1o ≠ 0, one can show that the term in front of δ in the fifth term on the r.h.s. of (A.34) can be assumed bounded and, hence, the norm of the whole term can be bounded by a quantity of the form c1ε*Ta, where 0 < c1 < ∞. By similar arguments we finally find that, at least for t = τo, the norm of the last term on the r.h.s. of (A.34) can be bounded from below by a quantity of the form c2∥δ*∥Ta, where 0 < c2 < ∞ and ∥δ*∥ ≠ 0. Thus, because ε* can be chosen arbitrarily small, the asymptotic behavior of
is dominated by the last term on the r.h.s. of (A.34), and it follows from the latter result of Lemma A.10 that asymptotically
cannot occur.
Thus, we have shown that, when δ1o = 0, we asymptotically cannot have
. Again a similar proof with ζtτ(0) replaced by ζtτ and Lemma A.10 replaced by its corresponding counterpart shows that asymptotically
cannot occur either. This completes the proof of part (i) of the theorem in the case δ1o = 0. Part (ii) is a consequence of the (asymptotic) existence of
and Lemma A.11. Hence, the proof of Theorem 3.2 is complete.
For simplicity we will denote the break date estimator by
. This estimator can be either of the two estimators considered in Section 3 unless explicit distinctions are made. From the assumptions δ1o ≠ 0 and 0 < a < ½ and Theorems 3.1 and 3.2 it follows that asymptotically
can be assumed. This fact will be used in several arguments of the proof without explicit notice.
Properties of RR Estimators. We shall first show that the RR estimators of the parameters based on equation (2.7) with the unknown break date τo replaced by the estimator
satisfy appropriate consistency properties. This replacement changes the VECM (2.7) to
where
Write
where
. Using the transformation
we can transform the preceding VECM to the form
where ν(0) = ν + αβ′μ0o − Ψμ1o, φ(0) = φ − β′μ1o, θ(0) = θ − β′δo, γ0*(0) = δ − δo, and γj*(0) = γj* + Γjδo (j = 1,…,p − 1). Note that the true values of these parameters are zero. RR estimators of the parameters in (A.38) are obtained by transforming the RR estimators based on (A.35) in the same way as the corresponding parameters (e.g.,
). Asymptotic properties of these transformed estimators are derived subsequently. We denote by
normalized versions of the estimators
, respectively, such that
.
LEMMA A.13. Under the conditions of Theorem 4.1,
,
.
Proof. We first note that the result of Lemma A.13 also holds when the break date is assumed known. A formal proof of this can be obtained by following the proof of Lemma 2.1 of Saikkonen and Lütkepohl (2000a) and observing that the omission of some impulse dummies from the model considered by Saikkonen and Lütkepohl is of no significance and that the same is true for the dependence of the parameter δ on the sample size. The latter fact is clear because the results of Lemma A.13 are formulated by using the transformed model (A.38) in which the true values of the deterministic parameters are zero.
Because the result of Lemma A.13 holds when the break date is assumed known it also holds when the break date can be consistently estimated, that is, when
. Indeed, then the analysis can be restricted to that part of the sample space where
holds and the probability of this can be made arbitrarily close to unity for all T large enough. This proves the results of the lemma for the constrained estimator
.
If j0 = p − 1 in Theorem 3.1(i) the preceding argument also applies to the unconstrained estimator
. For other values of j0 further arguments are needed. By Theorem 3.1(i) it suffices to consider any value of the break date such that τo − p + 1 + j0 ≤ τ ≤ τo. For simplicity, consider the case j0 = p − 2 and τ = τo − 1. It is easy to see that even though the break date is misspecified by one we can still consider (2.7) a correctly specified model if we only redefine the parameters γ0*,…,γp−1* as γ0* = αβ′δ, γ1* = δ, and γj* = −Γj−1δ, j = 2,…,p − 1. By assumption we then have γp−1* ≠ 0 whereas Γp−1δ = 0. With these new definitions the error term of model (2.7) is still εt, and the analysis given in the case of a known break date can be used. Because the other parameters of the model are not affected by the redefinition of the parameters γj* (j = 0,…,p − 1) the obtained consistency results will be the same as in the case where the true break date is known. The same argument can clearly be extended to other values of j0. This completes the proof. █
Properties of the new estimators of the deterministic parameters. We shall now consider asymptotic properties of the estimators
by assuming that the break date τ in (2.7) is replaced by one of the estimators
.
LEMMA A.14. Under the conditions of Theorem 4.1, the estimators
have the following properties:
Proof. We start with the results (A.39) and (A.40). Recall the definitions ν = −αβ′μ0 + Ψμ1, ν(0) = ν + αβ′μ0o − Ψμ1o, and φ(0) = φ − β′μ1o, which imply that
Here the latter equality is obtained by arguments similar to those used to define the estimator
. These arguments further show that β⊥′(μ1 − μ1o) = β⊥′C(ν(0) − Ψβφ(0)), and the same relation applies to estimators. Thus, we have
Here and in what follows the subscript 0 is omitted from the estimators of α and β to simplify the notation. By Lemma A.13, one obtains from the previous equality
Note that the estimator
can be viewed as the LS estimator of the parameter ν(0) in the auxiliary regression model obtained by replacing
in (A.38) by its observed analog
. This implies that
can be obtained by LS from the auxiliary regression model
where
, Λ is a conformable coefficient matrix, and the error has the representation
. By the definition of C and Lemma A.13,
. Using this fact, Lemma A.13, and the assumptions, it is straightforward to show that the asymptotic properties of the LS estimator of the parameter Λ in the auxiliary regression model (A.43) can be obtained by assuming that the error equals
. The same arguments and the definition of
(see (A.36)) further show that the error can be assumed to be
or even β⊥′Cεt. Because it is also straightforward to show that the estimation of the intercept term in (A.43) is asymptotically independent of the estimation of the other regression coefficients we can conclude that
This and a standard central limit theorem yield
To obtain (A.40) we need to show that
on the l.h.s. can be replaced by β⊥. To see this, write
By the consistency of the estimator
and the result just obtained, the latter term on the r.h.s. is of order op(T^{−1/2}), and the same is true for the former because
by Lemma A.13. From this last result one can obtain (A.39) because
can be replaced by β using an argument similar to that used in (A.44).
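The step leading to (A.40) combines two facts: the LS intercept estimator is asymptotically independent of the estimators of the other coefficients, and a central limit theorem applies to it. The toy Monte Carlo sketch below illustrates both facts under assumptions that are not the paper's: a scalar regression y_t = ν + λx_t + e_t with a mean-zero stationary AR(1) regressor x_t and i.i.d. N(0, 1) errors. In this setting √T(ν̂ − ν) has asymptotic variance equal to the error variance, and its asymptotic covariance with √T(λ̂ − λ) is zero because the regressor has mean zero. The function name `intercept_monte_carlo` and all parameter values are illustrative.

```python
import numpy as np


def intercept_monte_carlo(T=300, reps=1000, nu=0.5, lam=0.8, phi=0.5, burn=50, seed=0):
    """Return draws of sqrt(T)*(nu_hat - nu) and sqrt(T)*(lam_hat - lam) from
    LS fits of y_t = nu + lam*x_t + e_t (toy model, assumed for illustration)."""
    rng = np.random.default_rng(seed)
    z_nu = np.empty(reps)
    z_lam = np.empty(reps)
    for r in range(reps):
        # Mean-zero stationary AR(1) regressor, with a burn-in to forget x_0.
        u = rng.standard_normal(T + burn)
        x = np.empty(T + burn)
        x[0] = u[0]
        for t in range(1, T + burn):
            x[t] = phi * x[t - 1] + u[t]
        x = x[burn:]
        y = nu + lam * x + rng.standard_normal(T)
        Z = np.column_stack([np.ones(T), x])          # intercept + regressor
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        z_nu[r] = np.sqrt(T) * (coef[0] - nu)
        z_lam[r] = np.sqrt(T) * (coef[1] - lam)
    return z_nu, z_lam


z_nu, z_lam = intercept_monte_carlo()
print(np.var(z_nu))                 # should be close to the error variance 1
print(np.cov(z_nu, z_lam)[0, 1])    # should be close to 0 (asymptotic independence)
```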
Now consider the estimator
. From its derivation we get the identity
. By the definitions, this is equivalent to
Because the same relation applies to estimators, arguments similar to those used to define the estimator
yield
Lemma A.13 implies that the r.h.s. of this equality is of order Op(1). Moreover,
. Thus, (A.41) and (A.42) follow because in these results
can be replaced by β and β⊥, respectively, by using an argument similar to that in (A.44). This completes the proof of Lemma A.14. █
Proof of the limiting distribution of LRPAR. The structure of our proof of the limiting distribution of LRPAR(r0) is similar to that of the proof of Theorem 11.1 in Johansen (1995). We therefore only outline the arguments.
First note that the limiting distribution of the test statistic LRPAR(r0) can be derived by assuming that the true value of the parameter μ0 is zero. Thus, we can write equation (4.1) as
Using this representation, the assumption a < ½, and the asymptotic properties of the estimators
obtained in Lemma A.14, we can now mimic the proof given in Johansen (1995, pp. 158–160) and see that all quantities that converge in probability to constants there also converge in probability to the same constants here. However, quantities that converge weakly to functionals of a Brownian motion in Johansen (1995, pp. 158–160) here converge weakly to different functionals of a Brownian motion, determined by the weak limit of
. We have
where W(s) is an (n − r0)-dimensional Brownian motion with covariance matrix Ω, so that the limit is a linear transformation of the Brownian bridge W+(s) = W(s) − sW(1). The error term in the equality is understood to hold in the Skorohod topology.
To justify (A.46), first consider the equality. Because
by (A.42) of Lemma A.14, it is clear that the contribution of the third and fourth terms on the r.h.s. of (A.45) to
is asymptotically negligible. The same argument applies to the fifth term on the r.h.s. of (A.45) because a < ½. The weak convergence in (A.46) can be justified by a standard functional central limit theorem and (A.40) of Lemma A.14, observing that the limit of the second expression is determined by the process εt (see Johansen, 1995, eqn. (B.24), and the proof of (A.40)).
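The Brownian bridge W+(s) = W(s) − sW(1) is what remains of W(s) when a mean is removed before cumulating. The sketch below (helper names hypothetical) checks the finite-sample identity behind this: for errors ε_1,…,ε_T with partial-sum process W_T(s) = T^{−1/2} Σ_{t≤sT} ε_t, the scaled partial sums of the demeaned errors equal W_T(k/T) − (k/T)W_T(1) exactly at k = 1,…,T. It also checks by simulation that the variance at s = ½ is close to s(1 − s) = ¼ when Ω = I, an assumption made only for the illustration. This is a generic illustration of the bridge, not the specific process in (A.46).

```python
import numpy as np


def bridge_from_demeaned(eps):
    """T^{-1/2} * cumulative sums of mean-corrected errors; eps has shape (..., T, m)."""
    T = eps.shape[-2]
    demeaned = eps - eps.mean(axis=-2, keepdims=True)
    return np.cumsum(demeaned, axis=-2) / np.sqrt(T)


def bridge_from_walk(eps):
    """W_T(s) - s * W_T(1) evaluated at s = k/T, k = 1..T; eps has shape (..., T, m)."""
    T = eps.shape[-2]
    W = np.cumsum(eps, axis=-2) / np.sqrt(T)      # partial-sum process W_T(k/T)
    s = np.arange(1, T + 1)[:, None] / T          # grid k/T, shape (T, 1)
    return W - s * W[..., -1:, :]


rng = np.random.default_rng(1)
eps = rng.standard_normal((200, 2))               # T = 200, m = n - r0 = 2 (illustrative)
same = np.allclose(bridge_from_demeaned(eps), bridge_from_walk(eps))
pinned = np.allclose(bridge_from_walk(eps)[-1], 0.0)   # bridge is pinned: W+(1) = 0

# Variance of the bridge at s = 1/2 across replications (Omega = I assumed).
B = bridge_from_demeaned(rng.standard_normal((4000, 200, 1)))
var_half = B[:, 99, 0].var()                      # k = 100, s = 100/200
print(same, pinned, var_half)
```

For i.i.d. unit-variance errors the variance at k/T = s is exactly s(1 − s) in finite samples, so the simulated value should differ from ¼ only by Monte Carlo error.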
The preceding discussion implies that the limiting distribution of the test statistic LRPAR(r0) can be derived by ignoring the last three terms on the r.h.s. of (A.45). Thus, as in Saikkonen and Lütkepohl (2000a), we have reduced the problem to the no-break case studied by Saikkonen and Lütkepohl (2000b). From Lemma A.14 and the proof of Theorem 3 of Saikkonen and Lütkepohl (2000b), it can be seen that, when μ0o = 0 is assumed, the trace test statistic in that theorem is asymptotically equivalent to a similar test statistic based on an analog of (4.2) defined by replacing
. It is straightforward to show that the use of
instead of
changes the limiting distribution of the test statistic as stated in the theorem. In other words, because the vector
is obtained from
by augmenting with unity, the same augmentation results in one of the two Brownian bridges in the limiting distribution obtained in Theorem 3 of Saikkonen and Lütkepohl (2000b). Technical details, which are similar to the corresponding two cases in Johansen (1995, Sect. 11.2), are straightforward and will be omitted.
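The statistics LRPAR(r0) and LRGLS(r0) are of the Johansen trace type; their exact definitions, (4.1) and (4.2), are not reproduced in this excerpt. As a hedged sketch of the generic computation, assuming the statistic is computed from a series from which the estimated deterministic terms have already been subtracted (so no deterministic regressors remain), the hypothetical function `trace_statistic` below forms the reduced-rank regression moment matrices of a VAR(p) in error-correction form and returns −T Σ_{i>r0} log(1 − λ̂_i). The simulated data are illustrative; critical values must come from the limiting distributions derived in the paper, not from this code.

```python
import numpy as np


def trace_statistic(x, p, r0):
    """Johansen-type trace statistic -T * sum_{i > r0} log(1 - lambda_i) for a
    VAR(p) in error-correction form with no deterministic terms.
    x: array of shape (N, n), assumed already adjusted for deterministic terms."""
    dx = np.diff(x, axis=0)                     # Δx_t
    n_dx = dx.shape[0]
    T = n_dx - (p - 1)                          # effective sample size
    Z0 = dx[p - 1:]                             # Δx_t
    Z1 = x[p - 1:-1]                            # x_{t-1}
    if p > 1:
        # Short-run regressors Δx_{t-1}, ..., Δx_{t-p+1} are partialled out.
        Z2 = np.column_stack([dx[p - 1 - j:n_dx - j] for j in range(1, p)])
        R0 = Z0 - Z2 @ np.linalg.lstsq(Z2, Z0, rcond=None)[0]
        R1 = Z1 - Z2 @ np.linalg.lstsq(Z2, Z1, rcond=None)[0]
    else:
        R0, R1 = Z0, Z1
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01 via a symmetric (Cholesky) transformation.
    L_inv = np.linalg.inv(np.linalg.cholesky(S11))
    M = L_inv @ (S01.T @ np.linalg.solve(S00, S01)) @ L_inv.T
    lam = np.sort(np.linalg.eigvalsh(M))[::-1]  # descending order
    lam = np.clip(lam, 0.0, 1.0 - 1e-12)        # guard against rounding outside [0, 1)
    return -T * np.sum(np.log1p(-lam[r0:]))


rng = np.random.default_rng(3)
N = 500
x1 = np.cumsum(rng.standard_normal(N))          # common stochastic trend
x2 = x1 + rng.standard_normal(N)                # x2 - x1 is stationary: rank 1
x = np.column_stack([x1, x2])
stat0 = trace_statistic(x, p=2, r0=0)           # H0: cointegrating rank 0
stat1 = trace_statistic(x, p=2, r0=1)           # H0: cointegrating rank 1
print(stat0, stat1)
```

The Cholesky route keeps the eigenproblem symmetric, so `eigvalsh` returns real eigenvalues, which are the squared sample canonical correlations between the two residual sets.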
Asymptotic properties of the GLS estimators of the deterministic parameters. Because the break date estimator asymptotically lies between τo − p and τo, it is straightforward to follow the proof of Theorem 2.1 of Saikkonen and Lütkepohl (2000a) (case a1 < 1) and obtain asymptotic properties of the GLS estimators of the parameters μ0, μ1, and δ. Denoting these GLS estimators by
, it can be demonstrated that (A.39)–(A.42) of Lemma A.14 hold for
except that in (A.42) Op(T^a) replaces Op(1). For
it follows that
.
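The GLS argument uses only the fact that the estimated break date is asymptotically within p periods of τo. The two break date estimators studied in the paper are defined earlier and are not reproduced in this excerpt; the sketch below is a deliberately simplified univariate analog, not either of those estimators. Assuming y_t = μ0 + δ·1(t ≥ τo) + x_t with x_t a Gaussian random walk, the level shift becomes an impulse in Δy_t, and the hypothetical function `estimate_break_date` picks the τ minimizing the residual sum of squares of Δy_t regressed on a constant and the impulse dummy 1(t = τ). With a shift that is large relative to the innovation standard deviation, the estimate typically coincides with τo.

```python
import numpy as np


def estimate_break_date(y, candidates):
    """Toy LS break date estimator: regress Δy_t on a constant and 1(t == tau)
    for each candidate tau and return the tau with the smallest SSR.
    y[t] is observed at t = 0..T-1; Δy_t = y[t] - y[t-1] for t = 1..T-1."""
    dy = np.diff(y)
    t = np.arange(1, len(y))                    # time index of each Δy_t
    best_tau, best_ssr = None, np.inf
    for tau in candidates:
        X = np.column_stack([np.ones_like(dy), (t == tau).astype(float)])
        coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
        ssr = np.sum((dy - X @ coef) ** 2)
        if ssr < best_ssr:
            best_tau, best_ssr = tau, ssr
    return best_tau


rng = np.random.default_rng(2)
T, tau_o, delta = 200, 120, 10.0               # illustrative values
t = np.arange(T)
y = 1.0 + delta * (t >= tau_o) + np.cumsum(rng.standard_normal(T))
tau_hat = estimate_break_date(y, candidates=range(10, T - 10))
print(tau_hat)   # with a shift this large, typically equal to tau_o
```

Shrinking `delta` toward the scale of the innovations makes the estimate noticeably less reliable, which is why the size of the shift relative to the sample size matters in the asymptotic results.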
Limiting distribution of LRGLS. The test statistic LRGLS(r0) is defined as LRPAR(r0) except that now
. Instead of (A.45) we therefore have
where the estimators on the r.h.s. satisfy the rates of convergence obtained previously. It is straightforward to check that, under the conditions of Theorem 4.1, the rates of convergence obtained for
are sufficient for the fifth term on the r.h.s. of (A.47) to have no effect on the asymptotic properties of the second sample moments on which the test statistic is based, and the same is true for the last term, which is as in (A.45). Thus, the problem reduces to that of a known break date studied by Saikkonen and Lütkepohl (2000a), and the limiting distribution of the test statistic is obtained from Theorem 3.1 of that paper. Here it suffices to note the following points. First, the dependence of the break size on the sample size has no effect because the needed arguments only involve the difference
. Second, it is not difficult to check that the rate of convergence
suffices instead of
, which could be used in Saikkonen and Lütkepohl (2000a), and the same is true for
. Thus, we have demonstrated that, under the conditions of Theorem 4.1, the test statistic has the same limiting distribution as in Saikkonen and Lütkepohl (2000a).