Published online by Cambridge University Press: 25 October 2005
This paper analyzes the robustness of the estimate of the effect of a positive productivity shock on hours to the presence of a possible unit root in hours. Estimates in levels and in first differences lead to opposite conclusions. We rely on an agnostic procedure in which the researcher does not have to choose between a specification in levels or in first differences. We find that a positive productivity shock has a negative impact effect on hours, but the effect is much shorter lived than previously found and disappears after two quarters. The effect becomes positive at business-cycle frequencies, although it is not significant.
According to real-business-cycle models, hours worked should rise after a positive permanent shock to technology. However, the empirical validity of this theoretical implication has been questioned in the recent literature. For example, Gali (1999) identifies technology shocks as the only shocks that have an effect on labor productivity in the long run, and estimates a persistent decline of hours in response to a positive technology shock. As Gali (1999) points out, this result is more consistent with the predictions of a New Keynesian model than those of standard real-business-cycle models. Other papers have reached similar conclusions [see, e.g., Shea (1999), and Francis and Ramey (2001)], which spurred a line of research aimed at developing general equilibrium models that can account for this empirical finding [see, e.g., Francis et al. (2003), Uhlig (2003), and Gali and Rabanal (2004)].
In a recent paper, Christiano et al. (2003) challenge these empirical results. Using the same identifying assumption as Gali (1999), Christiano et al. (2003) find evidence that a positive technology shock drives hours worked up, not down. It seems that the estimated effects of technology shocks crucially depend on whether the empirical analysis is specified in levels or in differences. In fact, Gali (1999), Shea (1999), and Francis and Ramey (2001) specify hours in first differences and report that hours worked fall after a positive technology shock. On the other hand, Christiano et al. (2003) use hours in levels and report that hours worked increase. In the words of Christiano et al. (2003, p. 2) “the difference must be due to different maintained assumptions. As it turns out, a key culprit is how we treat hours worked.”
Whether hours worked is a stationary or an exactly integrated process is therefore a key assumption in the current debate on the effects of technology shocks on business cycles. In practice, however, it is difficult to choose between specifications in levels or in first differences on the basis of unit root tests because of their low power. Pesavento and Rossi (2003) show that, in the presence of a root close to unity, impulse response function estimates and confidence bands that rely on unit root pretests have poor small-sample properties (in terms of median unbiasedness and coverage rates). Impulse responses based on VAR's estimated in levels or first differences have poor coverage properties as well, unless the true data generating process is not persistent (in which case, levels are appropriate) or it has an exact unit root (in which case, first differences are appropriate).
We provide empirical evidence based on an agnostic empirical estimation procedure proposed by Pesavento and Rossi (2003). The estimation is agnostic in that it does not impose either a unit root or stationarity. These authors show that their method is robust to the presence of highly persistent processes and is thus appropriate if the researcher aims at analyzing the long-run effect of technology shocks on hours worked without making assumptions on the order of integration of the series. We find that a positive productivity shock has a negative impact effect on hours worked, but this effect disappears more quickly than in Francis and Ramey (2001) (after only two quarters) and quickly becomes positive.
Let the data generating process (hereafter DGP) be
$$(I - \Phi L)\, y_t = u_t, \tag{1}$$
$$u_t = \Theta(L)\, \varepsilon_t, \tag{2}$$
where $y_t = (f_t, n_t)'$ is a (2 × 1) vector of variables, where $n_t$ is the log of per-capita hours worked in the business sector and $f_t$ is average labor productivity; $u_t$ is a (2×1) stationary and ergodic moving-average sequence, $\varepsilon_t$ is a martingale difference sequence with covariance matrix $\Sigma$, $I$ is the (2×2) identity matrix, and $\Theta(L)$ is invertible.
Note that (1) and (2) are simply another way of writing a VAR, in terms of the roots rather than in the usual linear expression with lagged endogenous variables. This representation is convenient for our purposes because it distinguishes the long-run dynamics, captured by $\Phi$, from the short-run dynamics, described by $\Theta(L)$. In fact, to allow a unit root in $f_t$ and high persistence in $n_t$, we let
$$\Phi = \begin{pmatrix} 1 & 0 \\ 0 & \rho \end{pmatrix},$$
where $\rho$ is close to 1 in a sense made precise below.
The objects of interest are the structural shocks, denoted by $\eta_t$, which are related to the VAR's residuals $\varepsilon_t$ by the following relationship:
$$\varepsilon_t = A_0\, \eta_t.$$
We let $\eta_t = (\eta_t^{z}, \eta_t^{m})'$, where $\eta_t^{z}$ and $\eta_t^{m}$ denote, respectively, the sequence of technology and nontechnology shocks. Following Gali (1999), we identify the technology innovation as the only shock that can have a permanent effect on productivity. This long-run identification imposes a lower triangular structure on $\Theta(1)A_0$ that allows the identification of the technology shock.
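To make the long-run restriction concrete, the following is a minimal sketch of how a lower triangular long-run impact matrix pins down the matrix linking residuals to structural shocks. It assumes, beyond what the text states, that the structural shocks are normalized to have unit variance, so that the residual covariance equals `a0` times its transpose. The helper names and the numerical values of `theta1` (the long-run moving-average matrix) and `sigma` (the residual covariance) are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inverse(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def cholesky(s):
    """Lower triangular Cholesky factor of a symmetric positive definite 2x2 matrix."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def long_run_a0(theta1, sigma):
    """Return a0 such that theta1 * a0 is lower triangular and a0 * a0' = sigma.

    Assumption: structural shocks have identity covariance.
    """
    long_run_cov = matmul(matmul(theta1, sigma), transpose(theta1))
    long_run_impact = cholesky(long_run_cov)
    return matmul(inverse(theta1), long_run_impact)

# Hypothetical inputs, with productivity ordered first and hours second.
theta1 = [[1.2, 0.3], [0.4, 0.9]]
sigma = [[1.0, 0.2], [0.2, 0.5]]

a0 = long_run_a0(theta1, sigma)
long_run = matmul(theta1, a0)          # lower triangular by construction
implied_sigma = matmul(a0, transpose(a0))
```

With productivity ordered first, the zero in the upper right corner of `long_run` says that the second (nontechnology) shock has no permanent effect on productivity.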
Let us first provide some intuition about how our “agnostic” method works by discussing what our method would deliver at long horizons. As in Pesavento and Rossi (2003), we use a local-to-unity asymptotic theory to improve the asymptotic approximation to highly persistent processes in small samples. That is, we model the largest root associated with hours, $\rho$, as local-to-unity:
$$\rho = 1 + \frac{c}{T}, \tag{4}$$
where $c$ is a fixed constant and $T$ is the sample size. To obtain better asymptotic approximations to IRF's in small samples, we also assume that the lead time of the impulse response function, $h$, is a fixed fraction of the sample size,
$$h = [\delta T]. \tag{5}$$
Note that, because of assumption (5), the method works very well at horizons (h) that are large relative to the available sample size, which is what we refer to as “long horizons.”
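Considering the two assumptions (4) and (5) together, we have
$$\rho^{h} = \left(1 + \frac{c}{T}\right)^{[\delta T]} \to e^{c\delta} \quad \text{as } T \to \infty.$$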
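As a quick numerical check of how the two assumptions combine, the sketch below assumes the usual local-to-unity form, in which the root equals 1 + c/T and the horizon is the integer part of δT, and compares the root raised to the power h with its limit exp(cδ). The values of `c` and `delta` are illustrative choices, not estimates from this paper.

```python
import math

def rho_power(c, delta, T):
    """Exact value of rho**h with rho = 1 + c/T and h = floor(delta * T)."""
    rho = 1.0 + c / T
    h = math.floor(delta * T)
    return rho ** h

def limit_value(c, delta):
    """Large-sample limit exp(c * delta) of rho**h."""
    return math.exp(c * delta)

c, delta = -3.0, 0.25  # illustrative values only
for T in (50, 200, 1000, 100000):
    # The gap between the two columns shrinks as T grows.
    print(T, rho_power(c, delta, T), limit_value(c, delta))
```

The approximation is what allows the persistence of the response at horizon h to be summarized by the single parameter c.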
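Pesavento and Rossi (2003) show that the IRF of the effect of a technology shock, $\eta_t^{z}$, on $n_t$ can be approximated by
$$\frac{\partial n_{t+h}}{\partial \eta_t^{z}} \approx e^{c\delta}\; i_2'\, \Theta(1) A_0\, i_1, \tag{6}$$
where $i_s$ denotes the $s$th column of the $m \times m$ identity matrix. This provides a simple, closed-form formula for the IRF's at long horizons as a monotone increasing function of $c$. This formula can easily be used to construct confidence intervals for the IRF at long horizons.1
Simply use (6) to obtain the confidence interval as follows: $\left[e^{c_L\delta}\; i_2'\, \hat{\Theta}(1)\hat{A}_0\, i_1,\; e^{c_U\delta}\; i_2'\, \hat{\Theta}(1)\hat{A}_0\, i_1\right]$, where “hats” denote estimated values.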
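The interval obtained from (6) can be sketched in a few lines. The sketch assumes that the long-horizon response is the product of exp(cδ) and a scalar long-run coefficient, called `theta_hat` here (a hypothetical name), and that a confidence interval (`c_low`, `c_up`) for c is already available, for example from inverting a unit root test. All numbers are made up for illustration.

```python
import math

def long_horizon_ci(c_low, c_up, delta, theta_hat):
    """Interval for exp(c * delta) * theta_hat as c ranges over [c_low, c_up].

    exp(c * delta) is monotone in c, so the extremes are attained at the
    endpoints; min/max keeps the bounds ordered if theta_hat is negative.
    """
    ends = (math.exp(c_low * delta) * theta_hat,
            math.exp(c_up * delta) * theta_hat)
    return min(ends), max(ends)

# Illustrative inputs: interval for c, horizon fraction, long-run coefficient.
lo, hi = long_horizon_ci(c_low=-10.0, c_up=2.0, delta=0.1, theta_hat=0.5)
print(lo, hi)
```

Because the estimated coefficient is treated as fixed, this interval reflects uncertainty about c only.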
To construct IRF's that are valid at short horizons as well, which is what we do in this paper, the method is implemented in practice as follows:
The quasi differences are obtained by taking the residuals of a VAR(1). In our empirical application,
As pointed out by a referee, since the estimated value of ρ (0.986) is very close to 1, quasi differencing gives very similar results to first differencing at short horizons.
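The referee's point can be illustrated with a small sketch. It assumes that the quasi difference of a single series is the series minus the root times its own lag, which is a simplification of the VAR(1) residuals described above. The short demeaned series is made up.

```python
def quasi_difference(x, rho):
    """Return x[t] - rho * x[t-1] for t = 1, ..., len(x) - 1."""
    return [x[t] - rho * x[t - 1] for t in range(1, len(x))]

def first_difference(x):
    """First differencing is quasi differencing with rho equal to 1."""
    return quasi_difference(x, 1.0)

# Made-up, demeaned log hours (illustration only).
hours = [0.020, 0.015, -0.010, -0.005, 0.012]

qd = quasi_difference(hours, 0.986)
fd = first_difference(hours)

# The two filters differ by (1 - rho) * x[t-1], i.e. 1.4% of the lagged level.
gap = [q - f for q, f in zip(qd, fd)]
print(gap)
```

With a root of 0.986 the gap is small relative to the lagged level, which is why the two filters behave similarly over a few quarters.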
The last two steps are equivalent (by monotonicity) to the following procedure. For a given horizon $h=[\delta T]$, for each point on a grid within the confidence interval for $i_2'\, \Theta(1) A_0\, i_1$, construct two new sequences by multiplying each of the points in the confidence intervals by $e^{c_L\delta}$ and $e^{c_U\delta}$, respectively; call these sequences $\{x_j^{L}\}$ and $\{x_j^{U}\}$. The overall confidence interval for the IRF of hours to a productivity shock at horizon $h$ is then obtained as the minimum over the first sequence and the maximum over the second sequence: $(\min_j x_j^{L},\; \max_j x_j^{U})$. By the Bonferroni inequality, the confidence interval should have a coverage of at least 90% at each horizon $h$. Because exponential functions are always positive, this procedure gives the same result as the procedure described in the main text. Intuitively, relative to simply using (6) with a consistent estimate of $\Theta(1)$ as described in a previous note, step (ii) adds information on the sampling variability of the short-run parameters, $\Theta(L)$, thus improving the performance of the method at short horizons.
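The grid procedure just described can be sketched as follows. The sketch follows the text literally: it scales a grid over the interval for the long-run coefficient by the two exponential factors, then takes the minimum of the first sequence and the maximum of the second. The function and argument names are hypothetical, the inputs are made up, and the 90% figure presumes that each input interval has 95% coverage.

```python
import math

def bonferroni_band(theta_low, theta_up, c_low, c_up, delta, n_grid=50):
    """Combine an interval for the long-run coefficient with one for c."""
    step = (theta_up - theta_low) / (n_grid - 1)
    grid = [theta_low + step * j for j in range(n_grid)]
    # First sequence: grid points scaled by the factor at the lower bound of c.
    seq_low = [math.exp(c_low * delta) * th for th in grid]
    # Second sequence: grid points scaled by the factor at the upper bound of c.
    seq_up = [math.exp(c_up * delta) * th for th in grid]
    return min(seq_low), max(seq_up)

# Illustrative inputs only.
band = bonferroni_band(theta_low=0.2, theta_up=0.6,
                       c_low=-10.0, c_up=2.0, delta=0.1)
print(band)
```

Compared with an interval that holds the long-run coefficient fixed at its point estimate, the band widens to reflect sampling variability in that coefficient as well.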
Pesavento and Rossi (2003) investigate a variety of methods, all of which have good coverage. These methods build on the inversion of the following test statistics: ADF as in Stock (1991), Elliott et al. (1996), Elliott and Jansson (2003), Elliott and Stock (2001), and Elliott et al. (2005). Although we report results based on ADF only, our results are qualitatively robust to the use of the other methods mentioned above.
In the empirical section, we also report results by using Wright's (2000) method. The latter method is implemented by steps (i)–(iv) above, but replacing step (ii) with the following:
(ii′): $\Theta(L)$ is reestimated conditional on every value of $c$ within a grid over $(c_L, c_U)$, not only at the extremes, as we do. According to Pesavento and Rossi (2003), asymptotically the estimate of $\Theta(L)$ is consistent anyway, and we gain in computational simplicity and smaller confidence bands.
In our empirical section, we also report IRF's obtained from standard VAR's using nt in levels as well as first differences. To estimate the confidence bands in both VAR's, we simulate the IRF distribution under a normality assumption with 1,000 Monte Carlo replications.
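The simulation step can be illustrated with a deliberately simplified sketch. It uses a univariate AR(1) instead of the bivariate VAR estimated here: the autoregressive coefficient is drawn from a normal distribution centered at its estimate, the implied impulse response is computed for each draw, and pointwise percentiles form the band. The point estimate, standard error, and 90% level are hypothetical inputs.

```python
import random

def ar1_irf(phi, horizons):
    """Response of an AR(1) to a unit shock at horizons 0, ..., horizons."""
    return [phi ** h for h in range(horizons + 1)]

def percentile(sorted_values, q):
    """Nearest-rank style percentile of an already sorted list."""
    idx = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[idx]

def monte_carlo_bands(phi_hat, se_phi, horizons=12, n_draws=1000,
                      level=0.90, seed=0):
    """Pointwise bands from normal draws of the AR coefficient."""
    rng = random.Random(seed)
    draws = [ar1_irf(rng.gauss(phi_hat, se_phi), horizons)
             for _ in range(n_draws)]
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for h in range(horizons + 1):
        column = sorted(d[h] for d in draws)
        lower.append(percentile(column, tail))
        upper.append(percentile(column, 1.0 - tail))
    return lower, upper

# Hypothetical estimate and standard error (not taken from the paper).
lower, upper = monte_carlo_bands(phi_hat=0.9, se_phi=0.03)
```

In the bivariate case the same logic applies, with joint draws of all VAR coefficients and the identification step repeated for each draw.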
We use the same data as in Christiano et al. (2003), where per-capita hours are measured as the natural logarithm of hours worked in the business sector divided by a measure of the population. Productivity is measured as the natural logarithm of output per hour in the business sector. Data are quarterly observations from 1948:1 to 2001:4 and are ultimately taken from the DRI Economics Database.5
The mnemonics for business labor productivity, business hours, and the civilian population over the age of 16 are, respectively: LBOUT, LBMN, and P16. We thank Christiano et al. (2003) for the data.
The IRF's are multiplied by 100 so a value of 0.10 corresponds to a response of 0.10%. Following the cited literature, we include a constant, but not a time trend. We focus on a bivariate VAR with hours worked and the productivity measure. As in Francis and Ramey (2001) and in Christiano et al. (2003), we do not expect our results to change if we include additional variables. We use four lags (chosen by the BIC criterion) in order to compare our results directly to those of Francis and Ramey (2001) and Christiano et al. (2003). Results are robust to different lags (e.g., 1 to 6) if we use quasi differences to estimate the short-run dynamics.
Table 1 shows that, indeed, hours are a persistent process. The table provides both results on unit root tests on hours and empirical evidence on the magnitude of the persistence by using various methods to construct confidence intervals for the largest root. The methods are the Stock (1991) median unbiased method and those of Hansen (1995), Elliott et al. (1996), and Elliott and Jansson (2003). The Stock (1991) method is implemented as follows: First, we calculate the ADF test statistic for the time-series process of hours with four lags; then, by using the “inversion” Table A1 in Stock (1991), pp. 455–456, we recover the confidence interval for the largest root. Confidence intervals for the other methods are obtained in a similar fashion, although in the latter cases the inversion table may depend on nuisance parameters and thus needs to be calculated by the researcher for the specific database.
According to the Stock (1991) method, the largest root is between 0.93 and 1.01, with a median estimate equal to 0.98. With such a persistent process, it is not surprising that none of the tests is able to reject a unit root at the 5% level (note that the CADF test rejects at 10%).
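To relate these bounds on the root to the local-to-unity parameter, the sketch below converts each value of the root into the implied c, assuming the parameterization in which the root equals 1 + c/T. It takes T to be the raw number of quarters between 1948:1 and 2001:4; the effective sample after allowing for lags is slightly smaller, so the figures are indicative only.

```python
def quarters(start_year, start_q, end_year, end_q):
    """Number of quarterly observations in an inclusive date range."""
    return (end_year - start_year) * 4 + (end_q - start_q) + 1

def rho_to_c(rho, T):
    """Local-to-unity parameter implied by rho = 1 + c / T."""
    return T * (rho - 1.0)

T = quarters(1948, 1, 2001, 4)  # 216 quarters

for label, rho in (("lower", 0.93), ("median", 0.98), ("upper", 1.01)):
    # Roughly -15.1, -4.3 and 2.2, respectively.
    print(label, rho, round(rho_to_c(rho, T), 2))
```

The interval for c therefore includes zero (an exact unit root) as well as values that are clearly negative, which is the situation the agnostic method is designed for.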
Given that unit root tests do not strongly support the presence of a unit root, it may not be desirable to take a stand on whether the process has a unit root or not. Kilian and Chang (2000) and Pesavento and Rossi (2003) show that in the presence of large roots the coverage rates of confidence intervals for impulse response functions constructed from VAR's in first differences or levels can be very poor in finite samples. The intuition is that a model that imposes a root equal to 1 when one of the variables is not I(1) is misspecified. On the other hand, in small samples, a model in levels underestimates the largest root and the persistence of shocks. These apparently small mistakes and biases become extremely important at medium to long horizons, where the difference between stationary and nonstationary processes becomes more and more pronounced. As a result, confidence bands from VAR's in levels and first differences have a very small, almost zero, probability of containing the true impulse response function. Unit root pretests do not solve the problem because the actual coverage of impulse response bands obtained after a pretest can be quite different from the nominal one (due to the low power of unit root tests against persistent alternatives). Furthermore, even if the tests reject a unit root, asymptotic approximations that rely on highly persistent regressors are expected to provide better approximations in small samples. Thus, we use the Pesavento and Rossi (2003) agnostic method, which does not require the researcher to choose between the two specifications, to estimate median unbiased impulse response functions and their confidence bands. By using the local-to-unity parameterization, we model the persistence of the process as a function of the location parameter c (see the preceding section for details), which measures how close to unity the largest root of the process is.
Figure 3 reports results for the agnostic method. It shows a negative and very short-lived impact effect, which is very much in accordance with the findings of Francis and Ramey (2001). The negative effect lasts only two quarters, less than in Francis and Ramey (2001), and it is significant on impact. At business-cycle frequencies, the median point estimate of the impulse responses is positive, although not significantly different from zero. The confidence bands show that the effect is very likely to be positive at long horizons and at business-cycle frequencies (between six quarters and eight years). Comparing our median unbiased estimate of the response with that of VAR's in differences, we find some evidence that the medium and long horizon effect is more positive and slightly larger in magnitude. On the other hand, the effect that we estimate is also more persistent than that obtained from VAR's in levels. Finally, for comparison, Figure 4 reports results obtained by using methods such as Wright (2000).7
The method originally proposed by Wright (2000) is univariate. We apply a method that is, in spirit, very similar to his, but it is extended to a multivariate VAR with one large root.
In unreported simulations, we found that the results for the Hansen's (1995) test are robust to whether the CADF test is estimated with both leads and lags (as in Figure 5) or with lags only. We also found that the results are robust to the use of other methods to construct confidence intervals for a unit root, such as the Elliott et al. (1996) PT test and the Elliott and Jansson (2003) test.
Our results are also similar to those obtained by using the Anderson and Rubin (1949) robust confidence intervals, which were also reported by Vigfusson (2004, pp. 11–12). In fact, Vigfusson (2004) finds that, in a VAR estimated in levels, the impact response of hours to a 1-standard-deviation shock can be negative [the confidence interval is (−0.05, 0.11) percent] and becomes more positive at business-cycle frequencies [the confidence interval is (0.05, 0.27) percent after six quarters]. In the present paper (see Figure 3), the agnostic estimation shows an impact effect that is negative, but the upper bound of the confidence interval is very close to zero; in addition, after five to six quarters, the confidence interval shifts more toward positive values, which is very much in line with what Vigfusson (2004) finds.
This paper analyzes the robustness of the estimate of the effect of a positive productivity shock on hours worked to the presence of a possible unit root in hours. Whereas the literature focuses on the cases in which hours are estimated either in levels or in first differences (a sort of “atheist” view), we rely on an agnostic procedure in which the researcher does not have to choose between the two specifications. We find that a positive productivity shock has a negative impact on hours, as in Francis and Ramey (2001), but the effect is much shorter lived than previously found, and disappears after only two quarters. The effect then becomes positive at business-cycle frequencies, as in Christiano et al. (2003), although it is not significantly different from zero.
Our empirical evidence extends the results of Christiano et al. (2003) in an important way. In their framework, the level specification implies that the first difference specification is misspecified, whereas the difference specification implies that the level specification is correctly specified. The latter follows from the fact that the level VAR allows for a unit root. Although this is true at very short horizons, it need not hold at horizons that are large relative to the sample size, where the possibly downward-biased estimate of the root becomes important. The importance of these biases depends on the economic problem at hand and on the particular parameters that the researcher faces. Our results show that neglecting this effect may lead to very different economic results in measuring the effects of productivity shocks.
Possible alternative estimation methods include Bayesian methods, as described by Sims and Uhlig (1991). We do not attempt to pursue this approach in the present paper, but a thorough investigation of the performance of Bayesian methods in constructing confidence bands for impulse responses is provided by Kilian and Chang (2000).
We thank Robert Vigfusson and Neville Francis for providing codes to replicate their results and for comments, Robert Chirinko and seminar participants at Emory, the 2004 Duke Conference on Forecasting, and the 2004 Summer Meetings of the Econometric Society for comments. We also thank Pedro Duarte and Viktor Todorov for research assistance in the early stages of the project. All mistakes are ours.