Published online by Cambridge University Press: 08 June 2004
We examine properties of residual-based tests for the null of no cointegration for dynamic panels in which both the short-run dynamics and the long-run slope coefficients are permitted to be heterogeneous across individual members of the panel. The tests also allow for individual heterogeneous fixed effects and trend terms, and we consider both pooled within dimension tests and group mean between dimension tests. We derive limiting distributions for these and show that they are normal and free of nuisance parameters. We also provide Monte Carlo evidence to demonstrate their small sample size and power performance, and we illustrate their use in testing purchasing power parity for the post–Bretton Woods period.

I thank Rich Clarida, Bob Cumby, Mahmoud El-Gamal, Heejoon Kang, Chiwha Kao, Andy Levin, Klaus Neusser, Masao Ogaki, David Papell, Pierre Perron, Abdel Senhadji, Jean-Pierre Urbain, Alan Taylor, and three anonymous referees for helpful comments on various earlier versions of this paper. The paper has also benefited from presentations at the 1994 North American Econometric Society Summer Meetings in Quebec City, the 1994 European Econometric Society Summer Meetings in Maastricht, and workshop seminars at the Board of Governors of the Federal Reserve, INSEE-CREST Paris, IUPUI, Ohio State, Purdue, Queens University Belfast, Rice University–University of Houston, and Southern Methodist University. Finally, I thank the following students who provided assistance in the earlier stages of the project: Younghan Kim, Rasmus Ruffer, and Lining Wan.
The use of cointegration techniques to test for the presence of long-run relationships among integrated variables has enjoyed growing popularity in the empirical literature. Unfortunately, a common dilemma for practitioners has been the inherently low power of many of these tests when applied to time series of the length typically available for the postwar period. Research by Shiller and Perron (1985), Perron (1989, 1991), and, more recently, Pierce and Snell (1995) has generally confirmed that it is the span of the data, rather than the frequency, that matters for the power of these tests. On the other hand, expanding the time horizon to include prewar data risks introducing unwanted changes in regime for the data relationships. In light of these data limitations, it is natural to ask whether a practical alternative might be to bring additional data to bear on a particular cointegration hypothesis by drawing on similar cross-sectional data in lieu of additional time periods.
For many important hypotheses to which cointegration methods have been applied, data are in fact commonly available on a time series basis for multiple countries, for example, and practitioners could stand to benefit significantly if there existed a straightforward manner in which to perform cointegration tests for pooled time series panels. Many areas of research come to mind, such as the growth and convergence literature and the purchasing power parity (PPP) literature, for which it is natural to think about long-run properties of data that are expected to hold for groups of countries. Alternatively, time series panels are also increasingly available for industry level data and stock market data. For applications where the cross-sectional dimension grows reasonably large, existing systems methods such as the Johansen (1988, 1991) procedure are likely to become infeasible, and panel methods may be more appropriate.
On the other hand, pooling time series has traditionally involved a substantial degree of sacrifice in terms of the permissible heterogeneity of the individual time series.

See, for example, Holtz-Eakin, Newey, and Rosen (1988) on the dynamic homogeneity restrictions typically required for the implementation of panel vector autoregression (VAR) techniques.
Initial work on nonstationary panels was done in the context of testing for unit roots. For example, Quah (1994) derives asymptotically normal distributions for standard unit root tests in panels for which the time series and cross-sectional dimensions grow large at the same rate. Levin, Lin, and Chu (2002) extend this work to the case in which both dimensions grow large independently and derive asymptotic distributions for panel unit root tests that allow for heterogeneous intercepts and trends across individual members. Im, Pesaran, and Shin (2003) develop a panel unit root test based on a group mean approach. Since the original versions of this paper, many other works have extended the literature on nonstationary panels, and we refer readers to recent surveys by Banerjee (1999), Phillips and Moon (2000), and Baltagi and Kao (2000).
In earlier working paper versions of this work, Pedroni (1995, 1997a, 2001a), we examined the properties of spurious regressions and residual-based tests for the null of no cointegration for both homogeneous and heterogeneous panels and studied special conditions under which tests for the null of no cointegration with homogeneous slope coefficients are asymptotically equivalent to raw panel unit root tests. In the interest of space, in this version we focus on the most general of these results, namely, the tests for the null of no cointegration for panels with heterogeneous dynamics and heterogeneous slope coefficients. In particular, we study both between dimension and within dimension residual-based test statistics. Each of these tests is able to accommodate individual specific short-run dynamics, individual specific fixed effects and deterministic trends, as well as individual specific slope coefficients. We derive limiting distributions for these under the null and show that each is standard normal and free of nuisance parameters. We also provide Monte Carlo evidence to document and compare their small sample performance and illustrate their use in testing weak PPP for a panel of post–Bretton Woods exchange rate data.
The remainder of the paper is organized as follows. Section 2 presents the underlying theory and asymptotic results for each of the test statistics. Section 3 studies the small sample properties of these tests under a variety of different scenarios for the error processes, and Section 4 demonstrates a brief empirical application of the tests to the hypothesis of PPP for a panel of post–Bretton Woods exchange rate data. Section 5 ends with a few concluding remarks. The derivations for each of the results in Section 2 are collected in the Appendix.
In its most general form, we will consider the following type of regression:

$$y_{it} = \alpha_i + \delta_i t + \beta_i X_{it} + e_{it} \qquad (1)$$
for a time series panel of observables yit and Xit for members i = 1,…,N over time periods t = 1,…,T, where Xit is an m-dimensional column vector for each member i and βi is an m-dimensional row vector for each member i. The variables yit and Xit are assumed to be integrated of order one, denoted I(1), for each member i of the panel, and under the null of no cointegration the residual eit will also be I(1). The parameters αi and δi allow for the possibility of member specific fixed effects and deterministic trends, respectively. The slope coefficients βi are also permitted to vary by individual, so that in general the cointegrating vectors may be heterogeneous across members of the panel.
For regressions of the form given in (1), we will be interested in studying the properties of tests for the null hypothesis H0: “all of the individuals of the panel are not cointegrated.” For the alternative hypothesis, it is worth noting that if the underlying data generating process (DGP) is assumed to require that all individuals of the panel be either uniformly cointegrated or uniformly not cointegrated, then the natural interpretation for the alternative hypothesis is simply H1: “all of the individuals are cointegrated.” On the other hand, if the underlying DGP is assumed to permit individual members of the panel to differ in whether or not they are cointegrated, then the natural interpretation for the alternative hypothesis should be H1: “a significant portion of the individuals are cointegrated.”
In earlier versions of this work we also studied the properties of tests for the null of no cointegration in panels for which the slope coefficients βi are constrained to be homogeneous across all individuals. For such panels in which the estimated slope coefficients are constrained to be homogeneous, we showed that under the special case of strict exogeneity for the regressors, the distribution for residual-based tests of the null of no cointegration is asymptotically equivalent to the distribution for raw panel unit root tests despite the fact that the residuals are estimated.
For the case in which the regressors are endogenous, the asymptotic equivalence result no longer holds for panels with homogeneous slope coefficients, and it is necessary to adjust for the asymptotic bias induced by the estimated regressor effect. Kao (1999) has since examined the properties of such a test for the null of no cointegration, one that adjusts for the bias due to the estimated regressor effect under endogeneity, for the special case in which both the slope estimates and the short-run dynamics are constrained to be homogeneous across members of the panel. The difficulty with this approach arises when we interpret the resulting tests in the case where the true DGP has slope coefficients that are not common across individual members of the panel. Specifically, if we impose a common slope coefficient even though the true slopes are heterogeneous, then the estimated residuals for any member whose slope differs from the pooled (average) long-run slope will be nonstationary, even if that member is in truth cointegrated. In many situations the true slope coefficients are likely to vary across individuals of the panel, and the implications of constraining the coefficients to be common are unlikely to be acceptable for tests of the null of no cointegration.

For this reason, we present here a set of residual-based test statistics for the null of no cointegration that do not pool the slope coefficients of the regression and thus do not constrain the estimated slope coefficients to be the same across members of the panel. These statistics will be applicable as tests for the null of no cointegration in the general case in which the regressors are fully endogenous and the slope coefficients are permitted to vary across individual members of the panel.
Since both the dynamics and the cointegrating vector itself are permitted to vary across individual members of the panel, one can think of the test as effectively pooling only the information regarding the possible existence of the cointegrating relationship as indicated by the stationarity properties of the estimated residuals.
To study the distributional properties of such tests, we will describe the DGP in terms of the partitioned vector zit′ ≡ (yit,Xit′) such that the true process zit is generated as zit = zit−1 + ξit, for ξit′ ≡ (ξity,ξitX′). We then assume that for each member i the following condition holds with regard to the time series dimension.
Assumption 1.1 (Invariance Principle). The process ξit′ ≡ (ξity,ξitX′) satisfies

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]}\xi_{it} \Rightarrow B_i(\Omega_i)$$

for each member i as T → ∞, where ⇒ signifies weak convergence and Bi(Ωi) is vector Brownian motion with asymptotic covariance Ωi such that the m × m lower diagonal block Ω22i > 0 and where the Bi(Ωi) are taken to be defined on the same probability space for all i.
Assumption 1.1 states that the standard functional central limit theorem is assumed to hold individually for each member series as T grows large. The conditions on the error process required for this convergence are relatively weak and include the class of all stationary autoregressive moving average (ARMA) processes.
See standard references, for example, Phillips (1986, 1987), Phillips and Durlauf (1986), and Phillips and Solo (1992), for further discussion of the conditions under which Assumption 1.1 holds more generally.
The asymptotic covariance matrix Ωi is given by

$$\Omega_i \equiv \lim_{T\to\infty} E\left[T^{-1}\Big(\sum_{t=1}^{T}\xi_{it}\Big)\Big(\sum_{t=1}^{T}\xi_{it}'\Big)\right] = \begin{pmatrix}\Omega_{11i} & \Omega_{21i}'\\ \Omega_{21i} & \Omega_{22i}\end{pmatrix}$$

and can be decomposed as $\Omega_i \equiv \Omega_i^o + \Gamma_i + \Gamma_i'$, where $\Omega_i^o$ and Γi represent the contemporaneous and dynamic covariances, respectively, of ξit for a given member i. The matrix is partitioned to conform with the dimensions of the vector ξit′ ≡ (ξity,ξitX′) so that the Ω22i element is an m × m matrix. The off-diagonal terms Ω21i capture the feedback between the regressors and the dependent variable, and in keeping with the cointegration literature we do not require that the regressors Xit be exogenous. Because Ωi is permitted to vary across individual members of the panel, all dynamics that are absorbed in the asymptotic covariance matrix are permitted to be heterogeneous. Finally, by requiring that Ω22i > 0, we rule out cases where the regressors are themselves cointegrated with one another when there are multiple regressors.
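To make the decomposition Ωi ≡ Ωi^o + Γi + Γi′ concrete, the following sketch assumes, purely for illustration, a bivariate first-order vector moving average ξit = ηit + θηit−1 with E[ηitηit′] = Σ; the helper functions and the numerical values of θ and Σ are hypothetical. For this process the contemporaneous covariance is Σ + θΣθ′, the only nonzero dynamic covariance is at lag one, and the total should match the closed form (I + θ)Σ(I + θ)′. The sum Γi + Γi′ does not depend on which transposition convention is adopted for Γi.

```python
# Hypothetical helpers for 2x2 matrices stored as nested lists.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def add(*mats):
    return [[sum(m[i][j] for m in mats) for j in range(2)] for i in range(2)]


def long_run_cov_vma1(theta, sigma):
    """Omega = Omega^o + Gamma + Gamma' for xi_t = eta_t + theta eta_{t-1}."""
    # Contemporaneous covariance E[xi_t xi_t'] = Sigma + theta Sigma theta'.
    omega_o = add(sigma, matmul(matmul(theta, sigma), transpose(theta)))
    # Lag-one covariance E[xi_t xi_{t+1}'] = Sigma theta'; all higher lags vanish.
    gamma = matmul(sigma, transpose(theta))
    return add(omega_o, gamma, transpose(gamma))


def long_run_cov_closed_form(theta, sigma):
    """Closed form (I + theta) Sigma (I + theta)'."""
    i_plus_theta = add([[1.0, 0.0], [0.0, 1.0]], theta)
    return matmul(matmul(i_plus_theta, sigma), transpose(i_plus_theta))


theta = [[0.3, 0.0], [0.4, 0.0]]  # illustrative MA coefficients only
sigma = [[1.0, 0.0], [0.0, 1.0]]  # illustrative innovation covariance
omega = long_run_cov_vma1(theta, sigma)
```

With these values both routes give approximately [[1.69, 0.52], [0.52, 1.16]].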
In addition to the conditions for the invariance principle with regard to the time series dimension, we will also assume the following condition in keeping with a basic panel data approach.
Assumption 1.2 (Cross-Sectional Independence). The individual processes are assumed to be independent and identically distributed (i.i.d.) cross-sectionally, so that E[ξitξjs′] = 0 for all s, t and i ≠ j. More generally, the asymptotic long-run variance matrix for a panel of size N × T is block diagonal and positive definite, with the ith diagonal block given by the asymptotic covariance for member i, so that it takes the form diag(Ω1,…,ΩN). The process ξit is taken to be generated by a linear process ξit = Ci(L)ηit, where Ωi = Ci(1)Ci(1)′ and where the white noise innovations ηit and the random coefficients Ci(L) are independent of one another and each i.i.d. over both the i and t dimensions.
This condition will allow us to apply standard central limit theorems in the cross-sectional dimension in the presence of heterogeneous errors in a relatively straightforward fashion. As in Phillips and Moon (1999), we take the coefficients Ci(L) as being drawn from a distribution that is i.i.d. over the i dimension and independent from the ηit innovations. In the empirical illustration of Section 4 we also discuss some possibilities for dealing with the case in which such independence is violated in practice. Finally, note that the condition that Ωi > 0 ensures that there is no cointegrating relationship between yit and the Xit, as will be the case under the null hypothesis that we consider throughout this study.
Together, Assumptions 1.1 and 1.2 will provide us with the basic conditions for investigating the asymptotic properties of the various statistics as the dimensions T and N grow large. The first assumption will allow us to make use of standard asymptotic convergence results over the time series dimension for each of the individual members. In particular, we will make use of the fact that the following convergence results, developed in Phillips and Durlauf (1986) and Park and Phillips (1988), must also hold for each of the individual members i = 1,…,N as T grows large, so that

$$T^{-2}\sum_{t=1}^{T} z_{it-1} z_{it-1}' \Rightarrow L_i'\int_0^1 Z_i(r)Z_i(r)'\,dr\,L_i \qquad (2)$$

$$T^{-1}\sum_{t=1}^{T} z_{it-1}\,\Delta z_{it}' \Rightarrow L_i'\int_0^1 Z_i(r)\,dZ_i(r)'\,L_i + \Gamma_i \qquad (3)$$

where Zi(r) ≡ (Vi(r),Wi(r)′)′ is vector Brownian motion, such that Vi(r) and Wi(r) are independent standard Wiener processes for all i, where Wi(r) is itself an m × 1 dimensioned vector, Γi is as previously defined, and Li is a lower triangular decomposition of Ωi, $\Omega_i = L_i' L_i$, such that

$$L_{11i} = \left(\Omega_{11i} - \Omega_{21i}'\Omega_{22i}^{-1}\Omega_{21i}\right)^{1/2}, \quad L_{12i} = 0, \quad L_{21i} = \Omega_{22i}^{-1/2}\Omega_{21i}, \quad L_{22i} = \Omega_{22i}^{1/2}.$$

The convergence results in equations (2) and (3) hold under standard assumptions regarding the initialization of $z_{i0}$, and for convenience we will take these to be common across the panel such that $z_{i0} = 0$ for all i.
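For the single-regressor case (m = 1), the lower triangular decomposition can be computed directly. The sketch below assumes the convention Ωi = Li′Li with L12i = 0, under which L11i² = Ω11i − Ω21i²/Ω22i is the conditional long-run variance of the dependent variable given the regressor; the function names and the example matrix are illustrative assumptions.

```python
import math


def lower_triangular_decomposition(omega):
    """For m = 1, return lower triangular L with Omega = L' L (assumed convention)."""
    o11, o21, o22 = omega[0][0], omega[1][0], omega[1][1]
    l11 = math.sqrt(o11 - o21 ** 2 / o22)  # square root of the conditional long-run variance
    l21 = o21 / math.sqrt(o22)
    l22 = math.sqrt(o22)
    return [[l11, 0.0], [l21, l22]]


def lt_times_l(l):
    """Compute L' L for a 2x2 matrix: (L'L)_ij = sum_k L_ki L_kj."""
    return [[sum(l[k][i] * l[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


omega = [[1.69, 0.52], [0.52, 1.16]]  # illustrative long-run covariance matrix
L = lower_triangular_decomposition(omega)
```

Multiplying L′L recovers Ωi, and the squared (1,1) element isolates the part of the long-run variance of yit not explained by the regressor, which is the quantity that governs the heterogeneity of the panel statistics.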
The second key assumption, Assumption 1.2, will then allow us to apply simple averaging arguments over the cross-sectional sums of the corresponding Brownian motion functionals that are used to construct the panel statistics. In particular, we will make use here of sequential limit arguments to investigate the properties of the panel statistics as the time series and cross-sectional dimensions grow large. Specifically, this means that in computing the limiting properties for the panel statistics, we will first take the limit as T grows large, followed sequentially by the limit as N grows large. Thus, for a typical double sum statistic involved in constructing the panel statistics, written here with a generic summand $q_{it}$, we write

$$\sum_{i=1}^{N}\sum_{t=1}^{T} q_{it} = \sum_{i=1}^{N} S_{iT}, \qquad S_{iT} \equiv \sum_{t=1}^{T} q_{it}.$$

Let Ri be the limit of the standardized sum of SiT as T → ∞. For the sequential limit, we first compute Ri as T → ∞ and then compute the limit of the standardized sum of $R_i$
as N → ∞. The sequential limit is a convenient method for computing the limit distribution for a double index process. However, it is not the most general method. Phillips and Moon (1999) formalize the notion of sequential limits for nonstationary double index processes and also compare it to the more general method of joint limits. In contrast to the sequential limit, the joint limit allows both indexes to pass to infinity concurrently, rather than in sequence. As Phillips and Moon (1999) point out, this implies that the joint limit distribution characterizes the limit distribution for any monotonic expansion rate of T relative to N. Phillips and Moon (1999) also provide a specific set of conditions that are required for sequential convergence to imply joint convergence. Following Phillips and Moon, we will denote sequential limits as (T,N → ∞)seq.
In our context, the sequential limit substantially simplifies the derivation of the limit distribution for two important reasons. One reason is that it allows us to control the effect of nuisance parameters associated with the serial correlation properties of the data in the first step as T → ∞ by virtue of the standard multivariate invariance principle. This substantially simplifies the computation of the limit as N → ∞ for the panel statistics in the second step because it implies that we can typically characterize the heterogeneity of the standardized sum of random variables
$R_i$ in terms of a single nuisance parameter associated with the conditional long-run variance of the differenced data, $L_{11i}^2$. Another reason the sequential limit simplifies the derivation is that applying the limit as T → ∞ in the first step allows one to focus only on the first-order terms of the limit in the time series dimension, since the higher-order terms are eliminated prior to averaging over the N dimension. This second feature is particularly convenient for the purposes of computing the limit for the panel. However, this feature can also be deceptive in its simplicity, because it hides the need, often present for the more general joint limit, to control the relative expansion rate of the two dimensions. In practical terms, the relative expansion rate can also be an important indicator of the small sample properties of the statistics for different dimensions of N and T. In Section 3, we illustrate this with a series of Monte Carlo experiments that examine the size properties of the statistics as N and T grow large along various diagonal paths characterized by different monotonic rates of expansion of T relative to N.
In particular, we consider two classes of statistics. The first class of statistics is based on pooling the residuals of the regression along the within dimension of the panel, whereas the second class of statistics is based on pooling the residuals of the regression along the between dimension of the panel. The basic approach in both cases is to first estimate the hypothesized cointegrating relationship separately for each member of the panel and then pool the resulting residuals when constructing the panel tests for the null of no cointegration. Specifically, in the first step, one can estimate the proposed cointegrating regression for each individual member of the panel in the form of (1), including idiosyncratic intercepts or trends as the particular model warrants, to obtain the corresponding residuals $\hat e_{it}$. In the second step, the way in which the estimated residuals are pooled differs among the various statistics, which are defined as follows.
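The first step can be sketched in plain Python as below, assuming each member's data arrive as a pair of equal-length lists (y, x) with a single regressor. The three cases mirror the deterministic specifications of regression (1): no deterministic terms, member-specific intercepts, and member-specific intercepts plus trends. Because each member is estimated separately, the slope estimates are free to differ across members. The function names and the small Gaussian-elimination solver are hypothetical helpers, not part of the original procedure.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def member_residuals(y, x, case):
    """OLS residuals of y_t on x_t for one member.

    case 1: no deterministic terms; case 2: intercept; case 3: intercept and trend.
    """
    T = len(y)
    rows = []
    for t in range(T):
        r = [x[t]]
        if case >= 2:
            r.append(1.0)             # member-specific intercept alpha_i
        if case == 3:
            r.append(float(t + 1))    # member-specific trend delta_i * t, t = 1..T
        rows.append(r)
    k = len(rows[0])
    xtx = [[sum(rows[t][i] * rows[t][j] for t in range(T)) for j in range(k)] for i in range(k)]
    xty = [sum(rows[t][i] * y[t] for t in range(T)) for i in range(k)]
    coef = solve(xtx, xty)
    return [y[t] - sum(c * v for c, v in zip(coef, rows[t])) for t in range(T)]


def panel_residuals(panel, case):
    """First step: estimate each member separately, so slopes are heterogeneous."""
    return [member_residuals(y, x, case) for y, x in panel]
```

The returned list of residual series is what the second step pools, either within or between dimensions.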
DEFINITION 1 (Panel and Group Mean Cointegration Statistics for Heterogeneous Panels). Let
where
is estimated from a model based on the regression in (1). Then we can define the following test statistics for the null of no cointegration in heterogeneous panels:
where
for some choice of lag window
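where $\hat L_{11i}^2 = \hat\Omega_{11i} - \hat\Omega_{21i}'\hat\Omega_{22i}^{-1}\hat\Omega_{21i}$, such that $\hat\Omega_i$ is a consistent estimator of Ωi.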
The first three statistics are based on pooling the data across the within dimension of the panel, which implies that the test statistics are constructed by summing the numerator and denominator terms separately for the analogous conventional time series statistics.
In earlier versions of this study, we presented these panel statistics in a form in which each of the component statistics of the numerator and denominator was weighted by the member specific long-run conditional variances L11i², whereas here we present versions of the statistics that are not weighted by L11i². The distinction between weighted and unweighted statistics is a fairly common occurrence in panels, and the limit distribution is the same for both types. However, in Monte Carlo simulations, we found that the unweighted statistics consistently outperformed the weighted statistics in terms of the small sample size properties. Consequently, we present here only the unweighted statistics. We are thankful to an anonymous referee for suggesting the unweighted form of the statistics.
The “panel-rho” statistic is analogous to the semiparametric “rho” statistic studied in Phillips and Perron (1988) and Phillips and Ouliaris (1990) for the conventional time series case, and it can be constructed by taking the ratio of the sum of the numerators to the sum of the denominators of the analogous conventional time series “rho” statistic across the individual members of the panel. Likewise, the “panel-t” statistic and the “panel variance ratio” statistic are analogous, respectively, to the semiparametric t-statistic and the long-run variance ratio statistic, each of which was also studied in Phillips and Ouliaris (1990) for the conventional time series case.
The next two statistics are constructed by pooling the data along the between dimension of the panel. In practice, this means first computing the ratio corresponding to the conventional time series statistic for each member and then computing the standardized sum of these ratios over the N dimension of the panel. Consequently, these statistics in effect compute the group mean of the individual conventional time series statistics. We have presented here two group mean statistics, the group-rho and group-t statistics, which are analogous to the rho-statistic and t-statistic studied in Phillips and Ouliaris (1990) for the conventional time series case. In principle, it is also possible to construct a group mean variance ratio statistic analogous to the one presented for the pooled panel cointegration statistics in Definition 1. We also experimented with such a statistic and found it to be dominated by the other two in terms of small sample size properties. Consequently, we present here only the semiparametric group-rho and group-t statistics.
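The difference between the two pooling schemes can be illustrated with a small sketch, assuming that for each member i the numerator and denominator of some conventional time series statistic (for instance, a rho-type statistic) have already been computed. The normalizations by N and T that enter the actual statistics are omitted, and the function names are hypothetical.

```python
def within_dimension_ratio(numerators, denominators):
    """Panel (within) pooling: sum the numerators and the denominators separately."""
    return sum(numerators) / sum(denominators)


def between_dimension_sum(numerators, denominators):
    """Group mean (between) pooling: form each member's ratio first, then sum."""
    return sum(n / d for n, d in zip(numerators, denominators))


# Two hypothetical members with numerators 1 and 3 and denominators 2 and 4.
panel_value = within_dimension_ratio([1.0, 3.0], [2.0, 4.0])  # 4 / 6
group_value = between_dimension_sum([1.0, 3.0], [2.0, 4.0])   # 0.5 + 0.75
```

Within-dimension pooling effectively weights each member's ratio by its share of the total denominator, whereas between-dimension pooling gives every member's ratio equal weight before averaging.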
In the interest of space and simplicity, we have only presented here the forms of the statistics that correspond to the nonparametric treatment of the nuisance parameters. However, it should be apparent that the nuisance parameters can also be treated parametrically for both the panel and group mean statistics, in which case the same limit distributions presented in the following proposition still apply. We refer readers to earlier versions of this work for a discussion of the parametric treatment of these in the form of panel and group mean augmented Dickey–Fuller (ADF) statistics.
For example, the parametric analogue to the ZtNT-statistic would take the form of the standard ADF correction. For more details on the parametric ADF version of the panel and group mean statistics, we refer readers to an earlier version of the paper, Pedroni (1997a), which examines the small sample properties of the parametric ADF version of the statistics, and to Pedroni (1999), which discusses in more detail the construction of the ADF.
in Section 3 in conjunction with the small sample properties of the statistics.
In the proposition that follows, we present the limiting distributions for these test statistics under the null. In particular, for the following proposition, we posit
, respectively, to be finite means and covariances of the appropriate vector Brownian motion functional. As the following proposition indicates, when the statistics are standardized by the appropriate values for N and T, then the asymptotic distributions will depend only on known parameters given by
.
PROPOSITION 1 (Asymptotic Distributions of Residual-Based Tests for the Null of No Cointegration in Heterogeneous Panels). Let Θ,Ψ signify the mean and covariance for the vector Brownian motion functional
, where
$\Psi_{(j)}$, $j = 1,2,3$, refers to the $j \times j$ upper submatrix of $\Psi$. Similarly, let
signify the mean and variance for the vector Brownian motion functional
. Then under the null of no cointegration the asymptotic distributions of the statistics presented in Definition 1 are given by
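as $(T,N \to \infty)_{seq}$, where the values for $\phi_{(j)}$ are given as $\phi_{(1)}' = -\Theta_1^{-2}$, $\phi_{(2)}' = \left(-\Theta_2\Theta_1^{-2},\ \Theta_1^{-1}\right)$, and $\phi_{(3)}' = \left(-\tfrac{1}{2}\Theta_2\Theta_1^{-3/2}(1+\Theta_3)^{-1/2},\ \Theta_1^{-1/2}(1+\Theta_3)^{-1/2},\ -\tfrac{1}{2}\Theta_2\Theta_1^{-1/2}(1+\Theta_3)^{-3/2}\right)$.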
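Differentiating confirms that each φ(j) is the gradient, evaluated at Θ, of a simple function of the moments: 1/Θ1 for j = 1, Θ2/Θ1 for j = 2, and Θ2/(Θ1(1 + Θ3))^{1/2} for j = 3. This suggests, although it is our reading rather than a statement in the text, a delta-method argument in which the asymptotic variance takes the form φ(j)′Ψ(j)φ(j). The sketch below checks the stated gradients numerically; the functions f1 to f3 are inferred from φ(j), and the values of Θ and Ψ are hypothetical placeholders, not the Table 1 moments.

```python
import math


# Functions of the moment vector inferred from phi_(j) (an assumption, not from the text).
def f1(th):  # variance-ratio type: 1 / Theta1
    return 1.0 / th[0]


def f2(th):  # rho type: Theta2 / Theta1
    return th[1] / th[0]


def f3(th):  # t type: Theta2 / sqrt(Theta1 (1 + Theta3))
    return th[1] / math.sqrt(th[0] * (1.0 + th[2]))


def phi(j, th):
    """Closed-form phi_(j) as stated in Proposition 1 (th must have three entries)."""
    t1, t2, t3 = th
    if j == 1:
        return [-t1 ** -2]
    if j == 2:
        return [-t2 * t1 ** -2, t1 ** -1]
    return [-0.5 * t2 * t1 ** -1.5 * (1 + t3) ** -0.5,
            t1 ** -0.5 * (1 + t3) ** -0.5,
            -0.5 * t2 * t1 ** -0.5 * (1 + t3) ** -1.5]


def numerical_gradient(f, th, j, h=1e-6):
    """Central-difference gradient of f with respect to the first j moments."""
    grad = []
    for k in range(j):
        up, dn = list(th), list(th)
        up[k] += h
        dn[k] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def delta_method_variance(grad, psi):
    """phi' Psi_(j) phi using the upper j x j block of Psi."""
    j = len(grad)
    return sum(grad[a] * psi[a][b] * grad[b] for a in range(j) for b in range(j))


theta = [0.5, -0.2, 0.3]                                          # placeholder means
psi = [[0.3, 0.05, 0.01], [0.05, 0.4, 0.02], [0.01, 0.02, 0.2]]   # placeholder covariances
```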
These results are fairly general and give the nuisance parameter free asymptotic distributions simply in terms of the corresponding moments of the underlying Brownian motion functionals, which can be computed by Monte Carlo simulation, much as is done for the conventional single equation tests for the null of no cointegration. Note that we require only finite second moments here, provided that we apply sequential limit arguments: letting T → ∞ first produces sums of i.i.d. random variables, characterized as Brownian motion functionals, to which standard Lindeberg–Lévy central limit arguments can be applied for large N.
The result applies in general for any of the models associated with regression (1) and for any number of regressors when the slope coefficients are estimated separately for each member of the panel. On the other hand, the specific values for the moments
depend on the particular form of the model, such as whether heterogeneous intercepts or trends have been included in the estimation, and on the number of integrated regressors, m. Accordingly, Table 1 gives large finite sample moments for the leading bivariate cases of interest based on the simulated Brownian motion functionals of Proposition 1 so that we can evaluate the corresponding formulas under these conditions.
Let
signify the means and covariances for the vector Brownian motion functionals defined in Proposition 1. Then the approximations shown in Table 1 are obtained on the basis of Monte Carlo simulations for 100,000 draws from pairs of independent random walks with T = 1,000, N = 1, where cases 1, 2, and 3 refer, respectively, to the same functionals constructed from standard Wiener processes, demeaned Wiener processes, and demeaned and detrended Wiener processes. We then use these simulations to approximate the asymptotic distributions for the panel cointegration statistics as N grows large on the basis of Proposition 1. The results are summarized in the following corollary.
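The Monte Carlo approach behind Table 1 can be sketched as follows for case 1 (no deterministic terms). The exact vector functional of Proposition 1 is not reproduced here; as stand-ins, the sketch records two components typical of rho-type statistics, T⁻²Σê²it−1 and T⁻¹Σêit−1Δêit, computed from the residuals of regressing one random walk on an independent one. The draw count and T are kept small so that the example runs quickly, whereas the text reports 100,000 draws with T = 1,000, so the numbers it produces are not the Table 1 moments. The function name is hypothetical.

```python
import random


def simulate_functional_moments(draws, T, seed=0):
    """Case 1 sketch: sample means and covariance of (Y1, Y2) over independent draws.

    Y1 = T^-2 sum e_{t-1}^2 and Y2 = T^-1 sum e_{t-1} (e_t - e_{t-1}), where e are
    residuals from regressing one random walk on an independent one (no intercept).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        y, x = [], []
        ys = xs = 0.0
        for _ in range(T):
            ys += rng.gauss(0.0, 1.0)
            xs += rng.gauss(0.0, 1.0)
            y.append(ys)
            x.append(xs)
        beta = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)
        e = [a - beta * b for a, b in zip(y, x)]
        y1 = sum(v * v for v in e[:-1]) / T ** 2
        y2 = sum(e[t - 1] * (e[t] - e[t - 1]) for t in range(1, T)) / T
        samples.append((y1, y2))
    n = len(samples)
    means = [sum(s[k] for s in samples) / n for k in range(2)]
    cov = [[sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1)
            for b in range(2)] for a in range(2)]
    return means, cov


means, cov = simulate_functional_moments(draws=200, T=100, seed=1)
```

Cases 2 and 3 would follow the same pattern after adding an intercept, or an intercept and trend, to the first-step regression.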
COROLLARY 1 (Empirical Distributions). Let
so that based on Proposition 1,
for each of the k = 1,…,5 statistics of χ. Based on the empirical moments given in Table 1 for large T, the following approximations obtain as N → ∞ under the null of no cointegration:
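where cases 1, 2, and 3 refer, respectively, to statistics constructed from estimated residuals $\hat e_{it} = y_{it} - \hat\beta_i x_{it}$ from the standard case, $\hat e_{it} = y_{it} - \hat\alpha_i - \hat\beta_i x_{it}$ from the case with estimated fixed effects, and $\hat e_{it} = y_{it} - \hat\alpha_i - \hat\delta_i t - \hat\beta_i x_{it}$ from the case with estimated fixed effects and estimated trends.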
The usage for these statistics is the same as for the single series case. For the panel-v statistic, large positive values indicate rejection of the null, whereas for the panel-rho and panel-t statistics, as well as the group-rho and group-t statistics, large negative values indicate rejection. The computed moments in Table 1 and the corresponding distributions in Corollary 1 are for the leading case in which a single regressor is included. For the moments and empirical distributions corresponding to cases with additional regressors, we refer readers to Pedroni (1999), which reports these for cases ranging from m = 2 through m = 7 regressors. The bias correction terms given by μ in the corollary are required to ensure that the distribution does not diverge as the N dimension grows large. The need for these stems from the fact that the functionals of the underlying Wiener processes have nonzero means, which must be accommodated when averaging over the N dimension to ensure convergence. Comparing the distributions for the panel-rho and panel-t statistics to those applicable for raw panel unit root tests reported in Levin et al. (2002), we see that using estimated residuals affects not only the asymptotic variance but also the rate at which the mean of the unadjusted pooled statistics diverges asymptotically. In these cases, ignoring the consequences of the estimated regressors problem for the asymptotic bias in panels would cause the raw panel unit root statistic to diverge when applied to estimated residuals.
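Applying the corollary in practice amounts to standardizing each computed statistic and comparing it with the appropriate tail of the standard normal distribution. The sketch below assumes the standardized form (χNT,k − μk√N)/√vk; the values of μ and v passed in are placeholders that would have to be replaced by the Corollary 1 values for the relevant case and number of regressors, and the statistic labels are hypothetical.

```python
import math
from statistics import NormalDist


def standardize(chi, mu, v, n):
    """Assumed form (chi - mu * sqrt(N)) / sqrt(v), approximately N(0, 1) under the null."""
    return (chi - mu * math.sqrt(n)) / math.sqrt(v)


def p_value(z, statistic):
    """One-sided p-value: right tail for panel-v, left tail for the rho and t statistics."""
    cdf = NormalDist().cdf(z)
    return 1.0 - cdf if statistic == "panel-v" else cdf


# Placeholder inputs only: chi = 10, mu = 2, v = 4, N = 25 gives z = 0.
z = standardize(chi=10.0, mu=2.0, v=4.0, n=25)
```

The tail choice encodes the usage rule above: a large positive panel-v value, or a large negative value of any of the other four statistics, is evidence against the null of no cointegration.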
In this section we study some of the small sample properties of the statistics for variously dimensioned panels. We also study the empirical properties of the statistics as the sample dimensions grow large at different relative expansion rates. In particular, to study the small sample size properties we will employ the following DGP under the null hypothesis.
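Data Generating Process 1.1. Let zit = (yit,xit)′, t = 1,…,T, i = 1,…,N be generated by

$$z_{it} = z_{it-1} + \xi_{it}, \qquad \xi_{it} = \eta_{it} + \theta_i\,\eta_{it-1}, \qquad \theta_i = \begin{pmatrix}\theta_{11i} & \theta_{12i}\\ \theta_{21i} & \theta_{22i}\end{pmatrix}, \qquad E[\eta_{it}\eta_{it}'] = I,$$

where xit is a scalar series so that m = 1.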
This DGP is particularly convenient because it allows us to easily model and observe the consequences of heterogeneity in the dynamics for Δzit = ξit under the null hypothesis in terms of the long-run covariance matrix Ωi for ξit. Specifically, by drawing ξit from a vector moving average process and setting θ12i = θ22i = 0 while allowing θ11i and θ21i to vary across the individual members of the panel, we obtain a fairly simple mapping between these coefficients and the long-run covariance matrix Ωi, while still permitting substantial heterogeneity in the key features of the dynamics. In particular, since Ωi = (I + θi)E[ηitηit′](I + θi)′ with E[ηitηit′] = I, when θ12i = θ22i = 0 we can characterize the conditional long-run variance of the spurious regression by the simple ratio $L_{11i}^2 = (1+\theta_{21i}^2)^{-1}(1+\theta_{11i})^2$. Thus, by varying θ11i and θ21i we control Ωi and $L_{11i}^2$. For example, in the special case in which the DGP is i.i.d., so that θ11i and θ21i are both zero, Ωi = I and $L_{11i}^2 = 1$. The two extremes occur as either θ11i or θ21i approaches its upper bound: when θ21i is at its minimum value of zero and θ11i is at its maximum value of 0.5, $L_{11i}^2 = 2.25$, whereas when θ11i is at its minimum value of zero and θ21i is at its maximum value of 0.5, $L_{11i}^2 = 0.80$.
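The DGP and the closed form for L11i² can be checked with a short sketch. It assumes Gaussian innovations ηit with identity covariance and, for the panel, draws θ11i and θ21i uniformly from [0, 0.5]; the uniform draw, the Gaussian choice, and the function names are assumptions for illustration. The second function recomputes L11i² from Ωi = (I + θi)(I + θi)′ to confirm the ratio given above.

```python
import random


def l11_squared(theta11, theta21):
    """Closed form (1 + theta21^2)^-1 (1 + theta11)^2 when theta12 = theta22 = 0."""
    return (1.0 + theta11) ** 2 / (1.0 + theta21 ** 2)


def l11_squared_from_omega(theta11, theta21):
    """Same quantity via Omega = (I + theta)(I + theta)' with E[eta eta'] = I."""
    a, c = 1.0 + theta11, theta21  # I + theta = [[a, 0], [c, 1]]
    o11, o21, o22 = a * a, c * a, c * c + 1.0
    return o11 - o21 ** 2 / o22


def simulate_member(T, theta11, theta21, rng):
    """z_t = z_{t-1} + xi_t, xi_t = eta_t + theta eta_{t-1}, eta ~ N(0, I) (assumed Gaussian)."""
    y = x = 0.0
    prev = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # eta_0
    ys, xs = [], []
    for _ in range(T):
        eta = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        y += eta[0] + theta11 * prev[0]  # theta12 = 0
        x += eta[1] + theta21 * prev[0]  # theta22 = 0
        ys.append(y)
        xs.append(x)
        prev = eta
    return ys, xs


def simulate_panel(N, T, seed=0):
    """Heterogeneous dynamics: theta11_i and theta21_i drawn per member (uniform is assumed)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        th11, th21 = rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)
        panel.append(simulate_member(T, th11, th21, rng))
    return panel
```

Because each member draws its own θ11i and θ21i, the members' conditional long-run variances range between 0.80 and 2.25, which is the heterogeneity the size experiments are designed to probe.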
To implement each of the test statistics described in Section 2, we must obtain the various nuisance parameter estimates
as presented in Definition 1. Thus, to obtain these, we first estimate
by ordinary least squares (OLS) separately for each member of the panel, and then we estimate
using the Newey–West kernel estimator, which allows us to construct
.
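The two-step nuisance parameter estimation for one member can be sketched as follows, assuming the conventional Phillips–Ouliaris-style construction: an OLS autoregression of the residuals on their own lag (estimating ρi rather than imposing ρi = 1), followed by a Newey–West estimate with Bartlett weights 1 − s/(k + 1) for a lag window k. The 1/T normalization, the relation σ̂i² = ŝi² + 2λ̂i, and all names are assumptions, since the exact formulas of Definition 1 are not reproduced here.

```python
def ar1_ols(e):
    """OLS of e_t on e_{t-1} without intercept; returns rho_hat and the residuals mu_hat."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    rho = num / den
    return rho, [e[t] - rho * e[t - 1] for t in range(1, len(e))]


def newey_west_lambda(u, k):
    """(1/T) sum_{s=1}^{k} (1 - s/(k+1)) sum_{t=s+1}^{T} u_t u_{t-s} (Bartlett weights)."""
    T = len(u)
    total = 0.0
    for s in range(1, k + 1):
        w = 1.0 - s / (k + 1.0)
        total += w * sum(u[t] * u[t - s] for t in range(s, T))
    return total / T


def nuisance_parameters(e, k):
    """Return (rho_hat, lambda_hat, s2_hat, sigma2_hat) for one member's residuals."""
    rho, u = ar1_ols(e)
    lam = newey_west_lambda(u, k)
    s2 = sum(v * v for v in u) / len(u)  # contemporaneous variance estimate
    return rho, lam, s2, s2 + 2.0 * lam  # long-run variance estimate
```

Applied member by member to the first-step residuals, these quantities feed the semiparametric corrections of the rho- and t-type statistics.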
Note that it is also possible to construct these estimates by imposing the null value ρi = 1 and estimating the nuisance parameters directly from
. However, Phillips and Ouliaris (1990) recommend against this, and we follow the same recommendation here by first estimating
in order to estimate the nuisance parameters.
[7] The same issue discussed in the previous note applies here: we could also impose a unit root on the zit data and estimate the nuisance parameters directly from ξit = Δzit. Instead, however, we have followed the recommendation in Phillips and Ouliaris (1990) by first estimating
.
In our analysis of the small sample properties of the statistics, we are particularly interested in comparing the behavior of the statistics for different dimensions of the panel. We are also interested in the empirical consequences for the statistics as the cross-section and time series dimensions grow large at different relative rates. To illustrate these features, we focus here on a few key empirical properties, which we present graphically. For more extensive Monte Carlo results in tabular form for the other statistics, including the ADF versions of the t-statistics, we refer the reader to an earlier working paper version of this study, Pedroni (1997a).[8]
[8] The DGP used in Pedroni (1997a) was based on various parameterizations of the one proposed by Haug (1996) for the conventional time series case. We have simplified the DGP in this version, which makes it easier to characterize the long-run variance of $\xi_{it}$ in terms of the DGP.
In the first two figures, we study the empirical sizes of the three different constructions of the rho-statistics as the dimensions of the panel vary. In Figure 1, we depict the empirical sizes of the nominal 5% tests as the T dimension grows large for a fixed value of N. Specifically, in this case we set N = 20 and allowed T to vary in increments of 10, with 10,000 independent draws from DGP 1.1.[9]
[9] In all figures, the curves representing the behavior of each of the statistics have been smoothed slightly by means of a moving average of neighboring points.
In the next set of figures, we study the empirical size properties as both the N and T dimensions grow large at various relative rates of expansion. These experiments are particularly interesting because they reveal how the statistics behave under different rates of expansion, which the sequential limit analysis used to obtain the limit distributions in the previous section does not address. Each of the following figures reports the empirical sizes of the nominal 5% test for a single statistic under different relative rates of expansion: the panel-rho statistic in Figure 3 and the group-rho statistic in Figure 4. The figures thus show what happens to the empirical size of each statistic as the two dimensions grow large at relative rates ranging from $N = T^{1/2}$ to $N = T^{5/6}$. Specifically, in Figures 3 and 4 the horizontal axis reports the value of T, and the curves report the empirical sizes of the statistic when $N = T^{1/2}$, $N = T^{2/3}$, $N = T^{3/4}$, and $N = T^{5/6}$, respectively. In each case, we increased T in increments of 10, set N to the value implied by the corresponding expansion rate rounded to the nearest integer, and constructed the statistics from 10,000 independent draws from DGP 1.1.
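The expansion paths can be generated mechanically: for each T, N is $T^a$ rounded to the nearest integer. This sketch assumes a grid of T = 10, 20, ..., 100; the text gives only the increment of 10, not the starting or ending values of T.

```python
from fractions import Fraction

# Exponents a in N = T^a for the expansion paths of Figures 3 and 4
EXPONENTS = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6)]


def expansion_path(t_values, a):
    """(T, N) pairs with N = T^a rounded to the nearest integer."""
    return [(t, round(t ** float(a))) for t in t_values]


# Assumed grid: T = 10, 20, ..., 100 (the text specifies only the increment of 10)
t_grid = range(10, 101, 10)
for a in EXPONENTS:
    print(f"N = T^{a}:", expansion_path(t_grid, a))
```

At T = 100 the four paths give N = 10, 22, 32 and 46, respectively.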
From Figure 3 we can see that among these expansion rates, convergence appears to occur most quickly for the panel-rho statistic when $N = T^{3/4}$ and most slowly when $N = T^{1/2}$. In each case, convergence is from above. Figure 4 depicts the same experiments for the group-rho statistic. Interestingly, the group-rho statistic appears to exhibit a hump-shaped pattern for each of these rates of expansion: the size first rises before falling, and it appears to peak at around T = 100. Again, among these expansion rates, convergence appears to be slowest for $N = T^{1/2}$. However, for the group-rho statistic, the case $N = T^{5/6}$ appears to do best and also exhibits the least pronounced hump. In comparison to the panel-rho statistic, the empirical sizes of the group-rho statistic appear to be much closer to nominal size for very short panels along any of the expansion paths. For $N = T^{5/6}$, the empirical size is at its lowest, around 4.8%, when T = 60 and peaks at around 5.8% when T = 100.
We also experimented with rates of expansion in which the power of T is equal to or greater than 1.0. In these cases the N dimension grows at least as fast as the T dimension, and, as anticipated, the empirical sizes do not converge to nominal size along these expansion paths. This result is consistent with the fact that more general joint convergence results often require the condition $N/T \to 0$ to eliminate bias terms that otherwise explode when $T/N \to 0$. It is also interesting to note, however, that in these cases both statistics remain undersized relative to nominal size, and their empirical sizes eventually go to zero as the sample dimensions go to infinity, more quickly the larger the exponent $a$ in $N = T^a$. The fact that the statistics become undersized in these cases is reassuring, because it tells us that in practice the tests simply tend to become overly conservative in finite samples in which the N dimension exceeds the T dimension. Finally, as a separate issue, for more extreme cases in which large negative moving average components are present, modifications of the form studied by Ng and Perron (1997) for the conventional single-equation cointegration context may also help to further reduce small sample size distortions in the panel context.
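Along the paths $N = T^a$, the ratio condition can be read off directly: $N/T \to 0$ holds only for $a < 1$, which covers all four paths in Figures 3 and 4 ($a \le 5/6$) but none of the additional experiments described here.

```latex
N = T^{a} \;\Longrightarrow\; \frac{N}{T} = T^{a-1}
\;\xrightarrow[T \to \infty]{}\;
\begin{cases}
0, & a < 1,\\
1, & a = 1,\\
\infty, & a > 1.
\end{cases}
```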
Next, we study the power properties of the statistics against various alternative hypotheses. To simulate data under the alternative, we use the following DGP.
Data Generating Process 1.2. Let yit, xit, t = 1,…,T, i = 1,…,N, be generated by
where xit is a scalar series so that m = 1 and where we vary the value for φ across experiments.
Note that we have imposed the alternative hypothesis in DGP 1.2 by ensuring that the residuals eit are stationary. Furthermore, in this case, rather than using a moving average process for the errors, we use an autoregressive process, because the power of the tests is primarily sensitive to the autoregressive coefficient φ of the residuals eit. In conventional time series tests, small sample power tends to be weak against alternatives that imply near unit root behavior for the residuals, and we are interested in the extent to which the small sample power of the panel tests improves against such near unit root alternatives. Consequently, we examine the empirical power of the 5% tests against near unit root alternatives for the residuals, ranging from φ = 0.9 to φ = 0.99.
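Since the equations of DGP 1.2 did not survive extraction, the following is only a hypothetical sketch of one panel member under a cointegrated alternative: $x_t$ is assumed to be a Gaussian random walk, $e_t = \phi e_{t-1} + \varepsilon_t$ is a stationary AR(1), and $y_t = \alpha + \beta x_t + e_t$, with $\alpha$, $\beta$ and unit innovation variances chosen for illustration. The half-life $\ln(0.5)/\ln(\phi)$ is included as one conventional way to see how persistent these near unit root alternatives are.

```python
import math
import random


def simulate_member(T, phi, alpha=0.0, beta=1.0, seed=None):
    """Hypothetical single-member alternative: y_t = alpha + beta * x_t + e_t.

    x_t is a Gaussian random walk and e_t = phi * e_{t-1} + eps_t is a stationary
    AR(1) when |phi| < 1, so y_t and x_t are cointegrated.
    """
    rng = random.Random(seed)
    x, e, y = [], [], []
    x_t, e_t = 0.0, 0.0
    for _ in range(T):
        x_t += rng.gauss(0.0, 1.0)             # random-walk regressor
        e_t = phi * e_t + rng.gauss(0.0, 1.0)  # AR(1) cointegrating residual
        x.append(x_t)
        e.append(e_t)
        y.append(alpha + beta * x_t + e_t)
    return y, x, e


def half_life(phi):
    """Periods for an AR(1) deviation to halve: ln(0.5) / ln(phi), for 0 < phi < 1."""
    return math.log(0.5) / math.log(phi)


for phi in (0.9, 0.95, 0.99):
    print(phi, round(half_life(phi), 1))
```

The loop prints half-lives of roughly 6.6, 13.5 and 69.0 periods for φ = 0.9, 0.95 and 0.99; with monthly data the last value is close to six years.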
Again, we are interested in studying this for different combinations of the panel dimensions, N and T. As a general rule, we find that the power rises most rapidly as the N dimension increases. Accordingly, in the first set of figures we depict the power of each test as the T dimension increases for a given value of N when the autoregressive parameter for the regression residuals is 0.9 or 0.95. When we later consider the extreme case in which this parameter is 0.99, we depict both the case in which T increases for fixed N and the case in which N increases for fixed T. In Figure 5 we depict the raw power of the nominal 5% tests for the null of no cointegration against the alternative hypothesis that the members of the panel are cointegrated, when the AR(1) coefficient for the regression residuals is φ = 0.9. Specifically, for Figure 5 we set N = 20 and allowed T to vary in increments of 5, with 10,000 independent draws from DGP 1.2 with φ = 0.9. The results show that in this case the empirical power of all of the tests rises rapidly as T increases: the group-rho test is the slowest, reaching 100% power by T = 70, and all of the other tests achieve near 100% power by around T = 50. In Figure 6, we examine the empirical power for the case in which the alternative is closer to the null, with φ = 0.95. In this case, we allowed T to increase in increments of 10, with N fixed at 20. Each statistic shows the same relative pattern as in Figure 5, except that larger values of T are now required to achieve a given level of power. The panel-v test achieves 100% power the quickest, at around T = 90. The group-rho test is again the slowest to gain power as the T dimension increases, but it achieves nearly 100% power by T = 130.
When comparing the raw power of the statistics for very small values of T, however, we should keep in mind that, according to Figures 1 and 2, the group-rho statistic is also the most conservative in terms of empirical size, so the difference in power is likely to be less extreme for size-adjusted power. Of course, as the dimensions of the panel increase and the size distortion decreases, this consideration matters less.
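The distinction between raw and size-adjusted power can be made concrete. Raw power uses the nominal one-sided normal cutoff (1.645), while size-adjusted power replaces it with the empirical 5% quantile of the statistic's own simulated null distribution. The sketch assumes the usual convention for these statistics, in which the panel-v statistic rejects for large positive values and the others for large negative values; the inputs are lists of simulated statistics, and the function names are hypothetical.

```python
NOMINAL_5PCT = 1.645  # one-sided 5% critical value of N(0, 1)


def empirical_critical_value(null_stats, alpha=0.05, right_tail=False):
    """Cutoff that rejects (about) a fraction alpha of the simulated null draws."""
    s = sorted(null_stats)
    k = int(round(alpha * len(s)))
    return s[len(s) - k - 1] if right_tail else s[k]


def rejection_rate(stats, crit, right_tail=False):
    """Share of draws falling in the rejection region beyond crit."""
    hits = sum(1 for z in stats if (z > crit if right_tail else z < crit))
    return hits / len(stats)


def raw_and_size_adjusted_power(null_stats, alt_stats, right_tail=False):
    """Raw power at the nominal 5% cutoff versus power at the empirical 5% null cutoff."""
    nominal = NOMINAL_5PCT if right_tail else -NOMINAL_5PCT
    raw = rejection_rate(alt_stats, nominal, right_tail)
    crit = empirical_critical_value(null_stats, 0.05, right_tail)
    return raw, rejection_rate(alt_stats, crit, right_tail)
```

For an undersized left-tail statistic, the empirical cutoff lies closer to zero than −1.645, so its size-adjusted power exceeds its raw power; this is the sense in which the raw comparison penalizes the conservative group-rho statistic.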
Finally, we consider the power properties of the statistics for the extreme case in which the alternative is very close to the null and the regression residuals exhibit near unit root behavior with φ = 0.99. In such cases, conventional time series tests for the null of no cointegration have very little power even in relatively large samples. Figure 7 shows that even in panels, when N = 20 we still require a fairly long time dimension before the tests achieve high power. The panel-v test reaches 100% power most quickly, at around T = 350, and the panel-rho test is the next quickest, at around T = 500. However, Figure 8 illustrates how it is possible to achieve near 100% empirical power even in the extreme case φ = 0.99 by increasing the N dimension in lieu of the T dimension. Specifically, for Figure 8 we set T = 250 and varied N in increments of 1. In this case, the panel-v statistic reaches nearly 100% power by N = 45, the two other panel statistics exceed 90% power by N = 100, and the two group statistics exceed 90% power by N = 120. These results are potentially very promising for empirical research. For monthly data, T = 250 corresponds to little more than 20 years, so the results imply that with such a span it may be possible to distinguish even the most extreme cases from the null of no cointegration when the data are pooled across members of panels with these dimensions.
Taken together, the Monte Carlo results from this section can also help in choosing among the various statistics presented in the previous section. For example, in very small panels, if the group-rho statistic rejects the null of no cointegration, one can be relatively confident of the conclusion, because it is slightly undersized and empirically the most conservative of the tests. On the other hand, if the panel is fairly large, so that size distortion is less of an issue, then the panel-v statistic tends to have the best power relative to the other statistics and can be most useful when the alternative is potentially very close to the null. The other statistics tend to lie somewhere between these two extremes, with minor comparative advantages over different ranges of the sample size. Finally, note that the simulations here have been conducted for the case in which heterogeneous intercepts are estimated. An important avenue for further research will be to consider the small sample properties of the test statistics in the presence of member-specific heterogeneous trends, which are likely to affect both size and power.
The PPP hypothesis has long been a popular initial testing ground for new nonstationary time series techniques, and in keeping with this tradition we illustrate here a fairly simple application of the statistics proposed in this paper to a version of the hypothesis known as weak long-run PPP. This version of the PPP hypothesis holds that nominal exchange rates and aggregate price ratios move together over long periods, but that in practice the movements may not be directly proportional, leading to cointegrating slopes different from 1.0. For example, factors such as international transportation costs, measurement errors, differences in price indices, and differential productivity shocks have been used to explain why, under the weak version of PPP, the cointegrating slope may differ from unity.[10]
[10] See earlier versions of this study, Pedroni (1995, 1997a), for a more detailed discussion of the PPP application in this section. In separate work (Pedroni, 1996, 2000), a panel fully modified ordinary least squares (FMOLS) method for testing hypotheses about cointegrating vectors in such panels is developed; it is subsequently applied in Pedroni (2001b) to test the strong version of PPP for a similar data set, where the strong version is strongly rejected.
where sit is the log nominal bilateral U.S. dollar exchange rate at time t for country i and pit is the log price level differential between country i and the United States at time t, and a rejection of the null of no cointegration in this equation is taken as evidence in favor of the weak PPP hypothesis.
Table 2 reports both the conventional individual country results for a test of the null of no cointegration and the results of the panel and group mean statistics for the same null. We employ both monthly and annual IFS data on nominal exchange rates and CPI deflators for the post–Bretton Woods period from June 1973 to December 1994 for between 20 and 25 countries, depending on the availability and reliability of the data. Results for annual data, T = 20, and monthly data, T = 246, are reported side by side for each statistic, with the results for the monthly data in italics. For the semiparametric tests we have used the Newey and West (1994) recommendation for truncating the lag length of the kernel bandwidth. For comparison, we also report the individual, panel, and group mean parametric ADF statistics, using a standard step-down procedure for the lag length that starts from K = 12 for the monthly data and K = 2 for the annual data. Because this results in a different truncation for each country, we report the selected lag lengths in the last column. For the panel and group mean statistics we report results both for the raw data and for data that have been demeaned with respect to common time effects to accommodate some forms of cross-sectional dependency, so that in place of $s_{it}, p_{it}$ we use $s_{it} - \bar{s}_t,\ p_{it} - \bar{p}_t$, where $\bar{s}_t = N^{-1}\sum_{i=1}^{N} s_{it}$ and $\bar{p}_t = N^{-1}\sum_{i=1}^{N} p_{it}$.
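Removing common time effects amounts to subtracting the cross-sectional average at each date, applied separately to $s_{it}$ and $p_{it}$. This sketch assumes a balanced panel stored as a list of member series; the function name is hypothetical.

```python
def remove_time_means(panel):
    """Subtract the cross-sectional mean at each date: z_it - N^{-1} sum_j z_jt.

    panel[i][t] is member i at time t; a balanced panel is assumed.
    """
    N, T = len(panel), len(panel[0])
    time_means = [sum(panel[i][t] for i in range(N)) / N for t in range(T)]
    return [[panel[i][t] - time_means[t] for t in range(T)] for i in range(N)]
```

Adding the same shock $c_t$ to every member leaves the output unchanged, which is the sense in which disturbances shared by all members are removed; idiosyncratic disturbances pass through untouched.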
Several results are worth noting. First, the point estimates of the slopes and intercepts appear to vary greatly across countries. Second, as expected, the number of rejections based on the individual country tests is relatively low, so that on this basis alone the evidence does not appear to favor even weak PPP. By comparison, for the annual data the panel-rho statistic and the two ADF statistics reject the null for the standard case, whereas all but the group-rho statistic reject the null when the time means are subtracted. For the monthly data, each of the panel statistics rejects the null for the standard case, and the group statistics also reject the null when the time means are subtracted. Thus, in contrast to the individual time series tests, both the panel and group statistics appear to provide fairly strong support for the hypothesis that weak PPP holds for at least a significant portion of countries in the post–Bretton Woods period.
An important caveat is that not all forms of cross-sectional dependency are necessarily accommodated by simple common time effects. This approach assumes that the disturbances for each member of the panel can be decomposed into common disturbances that are shared among all members of the panel and independent idiosyncratic disturbances that are specific to each member.[11]
[11] We should note that the estimation of common time effects potentially further complicates the analysis of limiting distributions, because the number of parameters to be estimated for the time effects grows with the time dimension, T. We report these estimates for our empirical illustration simply for the sake of comparison.
We have studied in this paper properties of residual-based tests for the null of no cointegration for panels in which the estimated slope coefficients are permitted to vary across individual members of the panel. These statistics allow for heterogeneous fixed effects and deterministic trends and also for heterogeneous short-run dynamics. The sequential limiting distributions under the null are shown to be normal and free of nuisance parameters. We have also studied the small sample behavior of the proposed statistics under a variety of scenarios in a series of Monte Carlo experiments, and we have shown how these statistics can be applied to the PPP hypothesis. Finally, we note that the study is intended as an initial investigation into the properties of such statistics and that, in so doing, it raises many important additional issues, both practical and technical, that we hope will be of interest for future research on the theory and application of nonstationary panel data techniques.
Proof of Proposition 1. Let
where now
. By expanding $R_{1iT}, R_{2iT}$ in terms of the convergence results given in (2) and (3), it can be shown that as T → ∞
where Q is defined in terms of V,W as in the statement of the proposition. (See the appendix in Pedroni (1997a) for more details regarding this calculation.)
Similarly, we can evaluate
as follows. First, note that
. Under the null,
is $O_p(T^{-1})$, so the second term vanishes asymptotically as T → ∞ and does not affect the sequential limit distribution. For notational convenience, we drop these terms and write
as
where
Because $\xi_{it}$ is strictly stationary, we know that
Using the convergence results from (2) and (3) and $\Omega_i = L_i' L_i$, some algebra shows that as T → ∞,
Thus, letting
gives
as T → ∞.
Now, let RiT = (R1iT,R2iT,R3iT)′ and note that the first three statistics of the proposition can be written as different combinations of the standardized sums of these elements over the N dimension. Specifically,
where the vectors $R_{iT}$, $i = 1,\dots,N$, are i.i.d. over the i dimension. Next, define the mean of $L_{11i}^2$ over the i dimension to be $E[L_{11i}^2] = \pi$. It should be apparent that
as (T,N → ∞)seq because
as T → ∞ and
as N → ∞ in the second stage limit. Next, to determine the limiting distribution of each of the panel statistics as (T,N → ∞)seq, we use the delta method, which provides the limiting distribution for continuously differentiable transformations of i.i.d. vector sequences. Toward this end, we first expand each of the statistics as follows:
where the elements of Θ correspond to the means of the vector functional
as defined in the proposition. Thus, because we take the coefficients generating ξit = Ci(L)ηit to be i.i.d. over the i dimension and independent of the innovations, we have E [RiT] = π(Θ1,Θ2,1 + Θ3)′ as T → ∞ for any i.
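The second-stage averaging over i can be illustrated with the DGP 1.1 mapping for $L_{11i}^2$. The distribution of the coefficients is not stated in this section, so this sketch assumes, purely for illustration, that $\theta_{11i}$ and $\theta_{21i}$ are independent U(0, 0.5) draws (bounds matching the extremes quoted for DGP 1.1). Under that assumption $\pi = E[L_{11i}^2]$ has a closed form, and the cross-sectional average approaches it as N grows, as a law of large numbers implies.

```python
import math
import random


def l11_sq(theta11, theta21):
    """L11i^2 = (1 + theta21^2)^(-1) (1 + theta11)^2 (DGP 1.1 with theta12 = theta22 = 0)."""
    return (1.0 + theta11) ** 2 / (1.0 + theta21 ** 2)


def pi_closed_form(upper=0.5):
    """E[L11i^2] if theta11i, theta21i are independent U(0, upper) draws (assumption)."""
    e_num = ((1.0 + upper) ** 3 - 1.0) / (3.0 * upper)  # E[(1 + theta11)^2]
    e_inv = math.atan(upper) / upper                    # E[1 / (1 + theta21^2)]
    return e_num * e_inv


def cross_section_mean(N, upper=0.5, seed=0):
    """Average of L11i^2 over N simulated panel members."""
    rng = random.Random(seed)
    total = sum(l11_sq(rng.uniform(0.0, upper), rng.uniform(0.0, upper))
                for _ in range(N))
    return total / N


for N in (10, 100, 10_000):
    print(N, round(cross_section_mean(N), 4), round(pi_closed_form(), 4))
```

Under the assumed uniform draws the closed form is about 1.468.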
Now, for the next stage of the sequential limit, as N → ∞, the summations in parentheses converge to the means of the respective random variables by virtue of a law of large numbers. This leaves the expressions involving each of the standardized square-bracketed terms as a continuously differentiable transformation of a sum of i.i.d. random variables. In general, for a continuously differentiable transformation ZN of an i.i.d. vector sequence Xi, with vector mean
and covariance Σ, the delta method tells us that
as N → ∞, where the jth element of the vector α is given by the partial derivative
. Thus, in terms of our notation for the moments of RiT, we set
for each of the statistics, which produces the limiting distributions stated in the proposition as (T,N → ∞)seq.
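The displayed formulas in this step did not survive extraction. For reference, the standard form of the delta method being invoked is shown below; $\bar X_N$, $\mu$ and $g$ are our notation and may differ from the paper's.

```latex
\begin{aligned}
&\bar X_N = \frac{1}{N}\sum_{i=1}^{N} X_i, \quad E[X_i] = \mu, \quad \operatorname{Var}(X_i) = \Sigma,\\
&\sqrt{N}\,\bigl(g(\bar X_N) - g(\mu)\bigr) \Rightarrow N\bigl(0,\ \alpha'\Sigma\alpha\bigr)
  \quad \text{as } N \to \infty, \qquad \alpha_j = \frac{\partial g(\mu)}{\partial \mu_j}.
\end{aligned}
```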
The cases with demeaned or demeaned and detrended data can be obtained in similar fashion by defining
and the elements of RiT conformably in terms of the demeaned or demeaned and detrended data. In this case, the elements of the vector
become defined analogously in terms of demeaned Brownian motion, V*,W*, or demeaned and detrended Brownian motion, V**,W**, and the derivation in terms of the corresponding moments proceeds accordingly.
To establish the limiting distribution for the two group mean statistics, let
Using similar notation it should be apparent then that
as T → ∞. Next, expand each of the statistics as
which converge to
, respectively, as N → ∞ by the same type of arguments. █
Proof of Corollary 1. Expanding the terms for the variances in Proposition 1 gives
where $\zeta = \Theta_1^{-1}(1+\Theta_3)^{-1}\psi_{22} + \tfrac14\Theta_2^2\Theta_1^{-3}(1+\Theta_3)^{-1}\psi_{11} + \tfrac14\Theta_2^2\Theta_1^{-1}(1+\Theta_3)^{-3}\psi_{33} - \Theta_2\Theta_1^{-2}(1+\Theta_3)^{-1}\psi_{12} - \Theta_2\Theta_1^{-1}(1+\Theta_3)^{-2}\psi_{23} + \tfrac12\Theta_2^2\Theta_1^{-2}(1+\Theta_3)^{-2}\psi_{13}$. Substituting the empirical moments, computed for large T with N = 1, into these expressions gives the reported approximations for the asymptotic distributions as N → ∞. The results for the group mean statistics follow immediately upon substituting these empirical moments into the expressions of Proposition 1. █
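As a consistency check, ζ coincides with the delta-method variance $\alpha'\psi\alpha$ of $g(\Theta_1, \Theta_2, 1+\Theta_3) = \Theta_2\,\Theta_1^{-1/2}(1+\Theta_3)^{-1/2}$ when $\psi$ is read as the symmetric covariance matrix of the three moment components (a reading assumed here from the index pattern). This sketch compares the two at arbitrary illustrative values with $\Theta_1 > 0$ and $1 + \Theta_3 > 0$; it does not use the paper's simulated moments, and the function names are hypothetical.

```python
def zeta_formula(th1, th2, th3, psi):
    """zeta as written in the proof of Corollary 1; psi is a symmetric 3x3 list (0-indexed)."""
    c = 1.0 + th3
    return (th1 ** -1 * c ** -1 * psi[1][1]
            + 0.25 * th2 ** 2 * th1 ** -3 * c ** -1 * psi[0][0]
            + 0.25 * th2 ** 2 * th1 ** -1 * c ** -3 * psi[2][2]
            - th2 * th1 ** -2 * c ** -1 * psi[0][1]
            - th2 * th1 ** -1 * c ** -2 * psi[1][2]
            + 0.5 * th2 ** 2 * th1 ** -2 * c ** -2 * psi[0][2])


def delta_method_variance(th1, th2, th3, psi):
    """alpha' psi alpha for g(a, b, c) = b * a^(-1/2) * c^(-1/2) at (th1, th2, 1 + th3)."""
    a, b, c = th1, th2, 1.0 + th3
    grad = [-0.5 * b * a ** -1.5 * c ** -0.5,  # dg/da
            a ** -0.5 * c ** -0.5,             # dg/db
            -0.5 * b * a ** -0.5 * c ** -1.5]  # dg/dc (dc/dth3 = 1)
    return sum(grad[j] * psi[j][k] * grad[k] for j in range(3) for k in range(3))


psi = [[0.5, 0.1, 0.05], [0.1, 0.8, -0.2], [0.05, -0.2, 0.6]]  # illustrative only
print(zeta_formula(0.4, -0.3, 0.2, psi), delta_method_variance(0.4, -0.3, 0.2, psi))
```

The two printed numbers should agree up to floating-point rounding, which confirms that ζ has the α′ψα form of the delta method.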