
A NEW ASYMPTOTIC THEORY FOR HETEROSKEDASTICITY-AUTOCORRELATION ROBUST TESTS

Published online by Cambridge University Press:  23 September 2005

Nicholas M. Kiefer (Cornell University)
Timothy J. Vogelsang (Cornell University)

Abstract

A new first-order asymptotic theory for heteroskedasticity-autocorrelation (HAC) robust tests based on nonparametric covariance matrix estimators is developed. The bandwidth of the covariance matrix estimator is modeled as a fixed proportion of the sample size. This leads to a distribution theory for HAC robust tests that explicitly captures the choice of bandwidth and kernel. This contrasts with the traditional asymptotics (where the bandwidth increases more slowly than the sample size) where the asymptotic distributions of HAC robust tests do not depend on the bandwidth or kernel. Finite-sample simulations show that the new approach is more accurate than the traditional asymptotics. The impact of bandwidth and kernel choice on size and power of t-tests is analyzed. Smaller bandwidths lead to tests with higher power but greater size distortions, and larger bandwidths lead to tests with lower power but smaller size distortions. Size distortions across bandwidths increase as the serial correlation in the data becomes stronger. Overall, the results clearly indicate that for bandwidth and kernel choice there is a trade-off between size distortions and power. Finite-sample performance using the new asymptotics is comparable to the bootstrap, which suggests that the asymptotic theory in this paper could be useful in understanding the theoretical properties of the bootstrap when applied to HAC robust tests.

We thank an editor and a referee for constructive comments on a previous version of the paper. Helpful comments provided by Cliff Hurvich, Andy Levin, Jeff Simonoff, and seminar participants at NYU (Statistics), U. Texas Austin, Yale, U. Montreal, UCSD, UC Riverside, UC Berkeley, U. of Pittsburgh, SUNY Albany, U. Aarhus, Brown U., the NBER/NSF Time Series Conference, and the 2003 Winter Meetings of the Econometric Society are gratefully acknowledged. We gratefully acknowledge financial support from the National Science Foundation through grant SES-0095211. We thank the Center for Analytic Economics at Cornell University.

Research Article. © 2005 Cambridge University Press.

1. INTRODUCTION

We provide a new and improved approach to the asymptotics of hypothesis testing in time series models with "arbitrary," i.e., unspecified, serial correlation and heteroskedasticity. Our results are general enough to apply to stationary models estimated by generalized method of moments (GMM). Heteroskedasticity and autocorrelation consistent (HAC) estimation and testing in these models involves calculating an estimate of the spectral density at zero frequency of the estimating equations or moment conditions defining the estimator. We focus on the class of nonparametric spectral density estimators[1] that were originally proposed and analyzed in the time series statistics literature. See Priestley (1981) for the standard textbook treatment. Important contributions to the development of these estimators for covariance matrix estimation in econometrics include White (1984), Newey and West (1987), Gallant (1987), Gallant and White (1988), Andrews (1991), Andrews and Monahan (1992), Hansen (1992), Robinson (1998), de Jong and Davidson (2000), and Jansson (2002).

[1] An alternative to the nonparametric approach has been advocated by den Haan and Levin (1997, 1998). Following Berk (1974) and others in the time series statistics literature, they propose estimating the zero frequency spectral density parametrically using vector autoregression (VAR) models. They show that this parametric approach can achieve essentially the same generality as the nonparametric approach if the VAR lag length increases with the sample size at a suitable rate.

We stress at the outset that we are not proposing new estimators or statistics; rather we focus on improving the asymptotic distribution theory for existing techniques. Our results provide a framework that can be used to assess the impact of the choice of HAC estimator on the resulting test statistic.

Conventional asymptotic theory for HAC estimators is well established and has proved useful in providing practical formulas for estimating asymptotic variances. The ingenious “trick” is the assumption that the variance estimator depends on a fraction of sample autocovariances, with the number of sample autocovariances going to infinity, but the fraction going to zero as the sample size grows. Under this condition it has been shown that well-known HAC estimators of the asymptotic variance are consistent. Then, the asymptotic distribution of estimated coefficients can essentially be derived assuming the variance is known. That is, sampling bias and variance of the variance estimator do not appear in the first-order asymptotic distribution theory of test statistics regarding parameters of interest. Although this is an extremely productive simplifying assumption that leads to standard asymptotic distribution theory for tests, the accuracy of the resulting asymptotic theory is often less than satisfactory. In particular there is a tendency for HAC robust tests to overreject (sometimes substantially) under the null hypothesis in finite samples; see Andrews (1991), Andrews and Monahan (1992), and the July 1996 special issue of Journal of Business & Economic Statistics for evidence.

There are two main sources of finite-sample distortions. The first source is inaccuracy in the central limit theorem approximation to the sampling distribution of the parameter estimates. This becomes a serious problem for data that have strong or persistent serial correlation. The second source is the bias and sampling variability of the HAC estimate of the asymptotic variance. This second source is the focus of this paper. Simply appealing to a consistency result for the asymptotic variance estimator, as is done under the standard approach, does not capture these important small-sample properties.

The assumption that the fraction of the sample autocovariances used in calculating the HAC variance estimator goes to zero as the sample size goes to infinity is a clever technical assumption that substantially simplifies asymptotic calculations. However, in practice there is a given sample size, and some fraction of sample autocovariances is used to estimate the asymptotic variance. Even if a practitioner chooses the fraction based on a rule such that the fraction goes to zero as the sample size grows, it does not change the fact that a positive fraction is being used for a particular data set. The implications of this simple observation have been eloquently summarized by Neave (1970, p. 70) in the context of spectral density estimation:

When proving results on the asymptotic behavior of estimates of the spectrum of a stationary time series, it is invariably assumed that as the sample size T tends to infinity, so does the truncation point M, but at a slower rate, so that M/T tends to zero. This is a convenient assumption mathematically in that, in particular, it ensures consistency of the estimates, but it is unrealistic when such results are used as approximations to the finite case where the value of M/T cannot be zero.

Based on this observation, Neave (1970) derived an asymptotic approximation for the sampling variance of spectral density estimates under the assumption that M/T is a constant and showed that his approximation was more accurate than the standard approximation.

In this paper, we effectively generalize the approach of Neave (1970) for zero frequency nonparametric spectral density estimators (HAC estimators). We derive the entire asymptotic distribution (rather than just the variance) of these estimators under the assumption that M = bT where b ∈ (0,1] is a constant. We label asymptotics obtained under this nesting of the bandwidth “fixed-b asymptotics.” In contrast, under the standard asymptotics b goes to zero as T increases. Therefore, we refer to the standard asymptotics as “small-b asymptotics.” We show that under fixed-b asymptotics, the HAC robust variance estimators converge to a limiting random matrix that is proportional to the unknown asymptotic variance and has a limiting distribution that depends on the kernel and b. Under the fixed-b asymptotics, HAC robust test statistics computed in the usual way are shown to have limiting distributions that are pivotal but depend on the kernel and b. This contrasts with small-b asymptotics where the effects of the kernel and bandwidth are not captured.

Fixed-b asymptotics is a natural generalization of ideas explored by Kiefer, Vogelsang, and Bunzel (2000), Bunzel, Kiefer, and Vogelsang (2001), Kiefer and Vogelsang (2002a), Kiefer and Vogelsang (2002b), and Vogelsang (2003). Those papers analyzed HAC robust tests for the case where the HAC bandwidth is set equal to the sample size, i.e., b = 1. By considering values of b < 1 in this paper, we follow the traditional approach where the bandwidth controls the downweighting on higher order sample autocovariances. An alternative has been proposed in recent work by Phillips, Sun, and Jin (2003) and Phillips, Sun, and Jin (2005) where downweighting is achieved by exponentiating kernels that use bandwidth equal to the sample size. They show that if the exponent increases with the sample size at a suitable rate, then consistent and asymptotically normal HAC variance estimators can be obtained. In an analysis analogous to fixed-b asymptotics, they also develop an asymptotic theory under the assumption that the exponent is fixed as the sample size increases. Fixed-exponent asymptotics captures the choice of kernel and exponent in much the same way that fixed-b asymptotics captures the choice of kernel and bandwidth.

Although the fixed-b assumption is a better reflection of actual practice, that alone does not justify the new asymptotic theory. In fact, our asymptotic theory leads to two important innovations for HAC robust testing.

First, because the fixed-b asymptotics explicitly captures the choice of bandwidth and kernel, a more accurate first-order asymptotic approximation is obtained for HAC robust tests. Finite-sample simulations reported here and in the working paper (Kiefer and Vogelsang, 2005) show that in many situations the fixed-b asymptotics provides a more accurate approximation than the standard small-b asymptotics. There is also theoretical evidence by Jansson (2004) showing that fixed-b asymptotics is more accurate than small-b asymptotics in Gaussian location models for the special case of the Bartlett kernel with b = 1. Jansson (2004) proves that fixed-b asymptotics delivers an error in rejection probability that is O(T−1 log(T)). This contrasts with small-b asymptotics where the error in rejection probability is no smaller than O(T−1/2) (see Velasco and Robinson, 2001). Again focusing on Gaussian location models, recent work by Phillips, Sun, and Jin (2004) has shown that the error in rejection probability is O(T−1) for exponentiated kernels under fixed-exponent asymptotics. Phillips et al. (2004) conjecture that this result likely extends to traditional kernels under fixed-b asymptotics. If this conjecture is true, then fixed-b asymptotics has an error in rejection probability of smaller order than the standard normal approximation (small-b).

Second, fixed-b asymptotic theory permits a local asymptotic power analysis for HAC robust tests that depends on the kernel and bandwidth. We can theoretically analyze how the choices of kernel and bandwidth affect the power of HAC robust tests. Such an analysis is not possible under the standard small-b asymptotics because local asymptotic power does not depend on the choice of kernel or bandwidth. Because of this fact, the existing HAC robust testing literature has focused instead on minimizing the asymptotic truncated mean square error (MSE) of the asymptotic variance estimators when choosing the kernel and bandwidth. For the analysis of HAC robust tests, this is not a completely satisfying situation, as noted by Andrews (1991, p. 828).[2]

[2] Additional discussion of this point is given by Cushing and McGarvey (1999, p. 80).

An obvious alternative to asymptotic approximations is the bootstrap. Recently it has been shown by Hall and Horowitz (1996), Götze and Künsch (1996), Inoue and Shintani (2004), and others that higher order refinements to the small-b asymptotics are feasible when bootstrapping the distribution of HAC robust tests using blocking. Using finite-sample simulations we compare fixed-b asymptotics with the block bootstrap. Our results suggest some interesting properties of the bootstrap and indicate that fixed-b asymptotics may be a useful analytical tool for understanding variation in bootstrap performance across bandwidths. One result is that the bootstrap without blocking performs almost identically to the fixed-b asymptotics even when the data are dependent (in contrast the small-b asymptotics performs relatively poorly). When blocking is used, the bootstrap can perform slightly better or slightly worse than fixed-b asymptotics depending on the choice of block length. It may be the case that the block bootstrap delivers an asymptotic refinement over the fixed-b first-order asymptotics if the block length is chosen in a suitable way. A higher order fixed-b asymptotic theory is required before such a result can be formally established.

The remainder of the paper is organized as follows. Section 2 lays out the GMM framework and reviews standard results. Section 3 reports some small-sample simulation results that illustrate the inaccuracies that can occur when using the small-b asymptotic approximation. Section 4 introduces the new fixed-b asymptotic theory. Section 5 analyzes the performance of the new asymptotic theory in terms of size distortions and local asymptotic power. The impact of the choice of bandwidth and kernel is analyzed, and comparisons are made with the traditional small-b asymptotics and the block bootstrap. Section 6 gives concluding comments. Proofs are provided in an Appendix.

The following notation is used throughout the paper. The symbol ⇒ denotes weak convergence, Bj(r) denotes a j vector of standard Brownian motions (Wiener processes) defined on r ∈ [0,1], B̃j(r) ≡ Bj(r) − rBj(1) denotes a j vector of standard Brownian bridges, and [rT] denotes the integer part of rT for r ∈ [0,1].

2. INFERENCE IN GMM MODELS: THE STANDARD APPROACH

We present our results in the GMM framework, noting that this covers estimating equation methods (Heyde, 1997). Since the influential work of Hansen (1982), GMM has been widely used in virtually every field of economics. Heteroskedasticity or autocorrelation of unknown form is often an important specification issue, especially in macroeconomics and financial applications. Typically the form of the correlation structure is not of direct interest (if it is, it should be modeled directly). What is desired is an inference procedure that is robust to the form of the heteroskedasticity and serial correlation. HAC covariance matrix estimators were developed for exactly this setting.

Consider the p × 1 vector of parameters θ ∈ Θ ⊂ Rp. Let θ0 denote the true value of θ and assume θ0 is an interior point of Θ. Let vt denote a vector of observed data and assume that q moment conditions hold that can be written as

E[f(vt, θ0)] = 0,  t = 1,2,…,T, (1)

where f(·) is a q × 1 vector of functions with q ≥ p and rank(E[∂f(vt, θ0)/∂θ′]) = p. The expectation is taken over the endogenous variables in vt and may be conditional on exogenous elements of vt. There is no need in what follows to make this conditioning explicit in the notation. Define

gt(θ) = T−1 ∑_{j=1}^{t} f(vj, θ),

where gT(θ) is the sample analog to (1). The GMM estimator is defined as

θ̂T = arg min_{θ∈Θ} gT(θ)′WT gT(θ), (2)

where WT is a q × q positive definite weighting matrix. Alternatively, θ̂T can also be defined as an estimating equations estimator, the solution to the p first-order conditions associated with (2),

GT(θ̂T)′WT gT(θ̂T) = 0, (3)

where Gt(θ) = T−1 ∑_{j=1}^{t} ∂f(vj, θ)/∂θ′. Of course, when the model is exactly identified and q = p, an exact solution to gT(θ̂T) = 0 is attainable and the weighting matrix WT is irrelevant. Application of the mean-value theorem implies that

gt(θ̂T) = gt(θ0) + Ḡt(θ̂T, θ0, λT)(θ̂T − θ0), (4)

where Ḡt(θ̂T, θ0, λT) denotes the q × p matrix whose ith row is the corresponding row of Gt(θ̄T(i)), where θ̄T(i) = λi,T θ̂T + (1 − λi,T)θ0 for some 0 ≤ λi,T ≤ 1 and λT is the q × 1 vector with ith element λi,T.

To focus on the new asymptotic theory for tests, we avoid listing primitive assumptions and make rather high-level assumptions on the GMM estimator θ̂T. Lists of sufficient conditions for these to hold can be found in Hansen (1982) and Newey and McFadden (1994). Our assumptions are as follows:

Assumption 1. p lim θ̂T = θ0.

Assumption 2. T1/2g[rT]0) ⇒ ΛBq(r), where Λ is a matrix square root of Ω, i.e., ΛΛ′ = Ω = ∑_{j=−∞}^{∞} Γj with Γj = E[f(vt, θ0)f(vt−j, θ0)′].

Assumption 3. p lim G[rT](θ̂T) = rG0 and p lim G[rT](θ̄T(i)) = rG0 uniformly in r ∈ [0,1], where G0 = E[∂f(vt, θ0)/∂θ′].

Assumption 4. WT is positive semidefinite, p lim WT = W where W is a matrix of constants, and G0′W G0 is positive definite.

These assumptions hold for many models in economics, and with the exception of Assumption 2 they are fairly standard. Assumption 2 requires that a functional central limit theorem hold for T1/2g[rT]0). This is stronger than the central limit theorem for T1/2gT0) that is required for asymptotic normality of θ̂T. However, consistent estimation of the asymptotic variance of θ̂T requires an estimate of Ω. Conditions for consistent estimation of Ω are typically stronger than Assumption 2 and often imply Assumption 2. For example, Andrews (1991) requires that f(vt0) is a mean zero fourth-order stationary process that is α-mixing. Phillips and Durlauf (1986) show that Assumption 2 holds under the weaker assumption that f(vt0) is a mean zero, 2 + δ order stationary process (for some δ > 0) that is α-mixing. Thus our assumptions are slightly weaker than those usually given for asymptotic testing in HAC-estimated GMM models.

Under our assumptions, θ̂T is asymptotically normally distributed, as recorded in the following lemma, which is proved in the Appendix.

LEMMA 1. Under Assumptions 1–4, as T → ∞,

T1/2(θ̂T − θ0) ⇒ −(G0′W G0)−1G0′WΛBq(1) ∼ N(0, V),

where Λ*Λ*′ = G0′WΛΛ′W G0 and V = (G0′W G0)−1Λ*Λ*′(G0′W G0)−1.

Under the standard approach, a consistent estimator of V is required for inference. Let Ω̂ denote an estimator of Ω. Then V can be estimated by

V̂ = [GT(θ̂T)′WT GT(θ̂T)]−1GT(θ̂T)′WT Ω̂ WT GT(θ̂T)[GT(θ̂T)′WT GT(θ̂T)]−1. (5)

The HAC literature builds on the spectral density estimation literature to suggest feasible estimators of Ω and to find conditions under which such estimators are consistent. The widely used class of nonparametric estimators of Ω takes the form

Ω̂ = Γ̂0 + ∑_{j=1}^{T−1} k(j/M)(Γ̂j + Γ̂j′), (6)

with

Γ̂j = T−1 ∑_{t=j+1}^{T} f(vt, θ̂T)f(vt−j, θ̂T)′,

where k(x) is a kernel function k : R → R satisfying k(x) = k(−x), k(0) = 1, |k(x)| ≤ 1, with k(x) continuous at x = 0 and at all but a finite number of other points. Often k(x) = 0 for |x| > 1, so M "trims" the sample autocovariances and acts as a truncation lag. Some popular kernel functions do not truncate, and M is often called a bandwidth parameter in those cases. For kernels that truncate, the cutoff at |x| = 1 is arbitrary and is essentially a normalization. For kernels that do not truncate, a normalization must be made because the weights generated by the kernel k(x) and bandwidth M are the same as those generated by kernel k(ax) with bandwidth aM. Therefore, there is an interaction between bandwidth and kernel choice. We focus on kernels that yield a positive definite Ω̂ for the obvious practical reasons, although many of our theoretical results hold without this restriction.
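To make the estimator concrete, the following sketch computes Ω̂ as in (6) for the Bartlett and QS kernels. This is our illustration rather than code from the paper; the function names and layout are ours.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1, 0 otherwise."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0

def qs(x):
    """Quadratic spectral kernel (Andrews, 1991); k(0) = 1 by continuity."""
    if x == 0.0:
        return 1.0
    z = 6.0 * np.pi * x / 5.0
    return 3.0 / z**2 * (np.sin(z) / z - np.cos(z))

def hac(f, M, kernel=bartlett):
    """Nonparametric estimator of Omega as in equation (6), from a T x q
    matrix whose rows are the estimated moment conditions f(v_t, theta_hat)."""
    f = np.asarray(f, dtype=float)
    T, q = f.shape
    omega = f.T @ f / T                      # Gamma_0
    for j in range(1, T):
        w = kernel(j / M)
        if w == 0.0:
            continue
        gamma_j = f[j:].T @ f[:-j] / T       # Gamma_j
        omega += w * (gamma_j + gamma_j.T)   # k(j/M)(Gamma_j + Gamma_j')
    return omega
```

For a truncating kernel such as the Bartlett, only lags j < M receive weight; for the QS kernel every lag contributes, which is why M is called a bandwidth rather than a truncation lag in that case.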

Standard asymptotic analysis proceeds under the assumption that M → ∞ and M/T → 0 as T → ∞, in which case Ω̂ is a consistent estimator of Ω. Writing M = bT, this requires that b → 0 as T → ∞, and hence the label small-b asymptotics. Because in practical settings b is strictly positive, the assumption that b shrinks to zero has little to do with econometric practice; rather it is an ingenious technical assumption allowing a tractable asymptotic approximation to the distribution of θ̂T to be calculated. The difficulty in practice is that any choice of M for a given sample size, T, can be made consistent with the preceding rate requirement. Although the rate requirement can be refined if one is interested in minimizing the MSE of Ω̂ (e.g., M must increase at rate T1/3 for the Bartlett kernel), these refinements do not deliver specific choices for M. This fact has long been recognized in the spectral density and HAC literatures, and data dependent methods for choosing M have been proposed. See Andrews (1991) and Newey and West (1994). These papers suggest choosing M to minimize the truncated MSE of Ω̂. However, because the MSE of Ω̂ depends on the serial correlation structure of f(vt, θ0), the practitioner must estimate the serial correlation structure of f(vt, θ0) either nonparametrically or with an "approximate" parametric model. Although data dependent methods are a significant improvement over the basic case for empirical implementation, the practitioner is still faced with either a choice of approximating parametric model or the choice of bandwidth in a preliminary nonparametric estimation problem. Even these bandwidth rules do not yield unique bandwidth choices in practice because, e.g., if a bandwidth MA satisfies the optimality criterion of Andrews (1991), then the bandwidth MA + d, where d is a finite constant, is also optimal. See den Haan and Levin (1997) for details and additional practical challenges.
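For reference, the sketch below implements the AR(1) plug-in version of the Andrews (1991) bandwidth rule for the Bartlett kernel in the scalar case. The constant 1.1447 and the form of α(1) are from Andrews (1991); the use of a simple OLS AR(1) fit and the function name are our illustrative choices.

```python
import numpy as np

def andrews_bartlett_bandwidth(f):
    """AR(1) plug-in bandwidth for the Bartlett kernel (Andrews, 1991):
    M = 1.1447 * (alpha1 * T)^(1/3), alpha1 = 4 rho^2 / ((1-rho)^2 (1+rho)^2),
    with rho estimated by OLS from an AR(1) fit to the scalar series f_t."""
    f = np.asarray(f, dtype=float)
    T = f.shape[0]
    rho = f[1:] @ f[:-1] / (f[:-1] @ f[:-1])      # OLS AR(1) coefficient
    alpha1 = 4.0 * rho**2 / ((1.0 - rho)**2 * (1.0 + rho)**2)
    return 1.1447 * (alpha1 * T) ** (1.0 / 3.0)
```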

As the previous discussion makes clear, the standard approach addresses the choice of bandwidth by determining the rate by which M must grow to deliver a consistent variance estimator so that the usual standard normal approximation can be justified. In contrast, for a given data set and sample size, we take the choice of M as given and provide an asymptotic theory that reflects to some extent how that choice of M affects the sampling distribution of the HAC robust test.

3. MOTIVATION: FINITE-SAMPLE PERFORMANCE OF STANDARD ASYMPTOTICS

Although the poor finite-sample performance of HAC robust tests in the presence of strong serial correlation is well documented, it will be useful for later comparisons to illustrate some of these problems briefly in a simple environment. Consider the most basic univariate time series model with ARMA(1,1) errors,

yt = μ + ut,  ut = ρut−1 + εt + φεt−1,  t = 1,2,…,T. (7)

Define the HAC robust t-statistic for μ as

t = μ̂/√(Ω̂/T),

where μ̂ = T−1 ∑_{t=1}^{T} yt and Ω̂ is computed using the residuals ût = yt − μ̂. We set μ = 0 and generated data according to (7) for a wide variety of parameter values and computed empirical rejection probabilities of the t-statistic for testing the null hypothesis that μ ≤ 0 against the alternative that μ > 0. Note that because the test statistic has a symmetric distribution, the performance of a two-sided version of the test is qualitatively similar. We report results for the sample size T = 50, and 5,000 replications were used in all cases. To illustrate how well the standard normal approximation works as the bandwidth varies in this finite sample, we computed rejection probabilities for the t-statistic implemented using M = 1,2,3,…,49,50. We set the asymptotic significance level to 0.05 and used the usual standard normal critical value of 1.645 for all values of M. To conserve on space, we report results for six error configurations: independent and identically distributed (i.i.d.) errors (ρ = φ = 0), AR(1) errors with ρ = 0.7, 0.9, MA(1) errors with φ = −0.5, 0.5, and ARMA(1,1) errors with ρ = 0.8, φ = −0.4. We also give results where Ω̂ is implemented with AR(1) prewhitening. We report results for the popular Bartlett and quadratic spectral (QS) kernels. Results for other kernels are similar.
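The Monte Carlo design just described is easy to replicate. The sketch below is ours, not the authors' code; it assumes standard normal innovations, no prewhitening, and the Bartlett kernel, and computes empirical null rejection probabilities across all bandwidths M = 1,…,T.

```python
import numpy as np

rng = np.random.default_rng(0)
T, reps, rho, phi = 50, 5000, 0.7, 0.0   # AR(1) errors with rho = 0.7

def bartlett_omega(u, M):
    """Bartlett-kernel estimate of Omega (equation (6)) from residuals u."""
    T = len(u)
    omega = u @ u / T
    for j in range(1, T):
        w = 1.0 - j / M
        if w <= 0.0:
            break
        omega += 2.0 * w * (u[j:] @ u[:-j]) / T
    return omega

bandwidths = np.arange(1, T + 1)
rejections = np.zeros(T)
for _ in range(reps):
    e = rng.standard_normal(T + 1)
    u = np.empty(T)
    prev = 0.0                           # u_0 = 0: no burn-in, a small simplification
    for t in range(T):
        prev = rho * prev + e[t + 1] + phi * e[t]
        u[t] = prev
    muhat = u.mean()                     # y_t = mu + u_t with mu = 0 under the null
    resid = u - muhat
    for i, M in enumerate(bandwidths):
        tstat = np.sqrt(T) * muhat / np.sqrt(bartlett_omega(resid, M))
        rejections[i] += tstat > 1.645   # 5% standard normal critical value

print(np.round(rejections / reps, 3))    # rejection rate for each M = 1, ..., 50
```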

The results are depicted in Figures 1 and 2. In each figure, the line with the label, N(0,1), plots rejection probabilities when the critical value 1.645 is used and there is no prewhitening. In the case of prewhitening, the label is N(0,1), PW. The figures also depict plots of rejection probabilities using the fixed-b asymptotic critical values, but for now we only focus on the results using the standard small-b asymptotics.

Figure 1. Empirical null rejection probabilities: Bartlett kernel, T = 50, 5% nominal level.

Figure 2. Empirical null rejection probabilities: QS kernel, T = 50, 5% nominal level.

Consider first the case of i.i.d. errors. When M is small rejection probabilities are only slightly above 0.05 as one might expect for i.i.d. data. However, as M increases, rejection probabilities steadily rise. When M = T, rejection probabilities are nearly 0.2 for the Bartlett kernel and exceed 0.25 for the QS kernel. Somewhat different patterns occur for AR(1) errors. When M is small, there are nontrivial overrejection problems. When prewhitening is not used, the rejections fall as M increases but then rise again as M increases further. When prewhitening is used, rejection probabilities essentially increase as M increases. Prewhitening reduces the extent of overrejection but does not remove it. The patterns for MA(1) errors are similar to the i.i.d. case, and the patterns for ARMA(1,1) errors are combinations of the AR(1) and MA(1) cases.

These plots clearly illustrate a fundamental finite-sample property of HAC robust tests: the choice of M in a given sample matters and can greatly affect the extent of size distortion depending on the serial correlation structure of the data. And, when M is not very small, the choice of kernel also matters for the overrejection problem. Given that the sampling distribution of the test depends on the bandwidth and kernel, it would seem to be a minimal requirement that the asymptotic approximation reflect this dependence. Whereas the standard small-b asymptotics is too crude for this purpose, our results in the next section show that the fixed-b asymptotics naturally captures the dependence on the bandwidth and kernel and does so in a simple and elegant manner.[3]

[3] An alternative approach to the fixed-b asymptotics is to consider higher order asymptotic approximations in the small-b framework. For example, one could take the Edgeworth expansions from the recent work by Velasco and Robinson (2001) and obtain a second-order asymptotic approximation for HAC robust tests that depends on the bandwidth and kernel. Although this is a potentially fruitful approach, it is complicated by the need to estimate the second derivative of the spectral density.

Before moving on, it is useful to discuss in some detail the reason that rejection probabilities have the nonmonotonic pattern for AR(1) errors. When M is small, Ω̂ is biased but has relatively small variance. Because the bias is downward, that leads to overrejection. As M initially increases, bias falls and variance rises. The bias effect is more important for overrejection (see Simonoff, 1993), and the extent of overrejection decreases. According to the conventional, but wrong, wisdom, the story would be that as M increases further, bias continues to fall but variance increases so much that overrejections become worse.[4] This is not what is happening. In fact, as M increases further, bias eventually increases and variance falls. It is the increase in bias that leads to the large overrejections when M is large. The reason that bias increases and variance shrinks as M gets large is easy to explain. When M is large, high weights are placed on high-order sample autocovariances. In the extreme case where full weight is placed on all of the sample autocovariances, it is well known that Ω̂ = 0, and this occurs because Ω̂ is computed using residuals that have a sample average of zero. Obviously, Ω̂ = 0 is an estimator with large bias and zero variance. Thus, as M increases, Ω̂ is pushed closer to the full weight case.

[4] This common misperception is an unfortunate result of folklore in the econometrics literature that states that "as M increases, bias in Ω̂ decreases but variance increases." This statement is quite misleading although the source is easy to pinpoint. A careful reader of Priestley (1981) will repeatedly see the phrase "as M increases, bias decreases but variance increases." This phrase is completely correct if, as in Priestley (1981), one is discussing the properties of spectral density estimators at nonzero frequencies or, in the case of known mean zero data, at the zero frequency. However, this phrase does not apply to zero frequency estimators computed using demeaned data, as is the case in GMM models. This fact is well known and has been nicely illustrated by Figure 2 in Ng and Perron (1996), where plots of the exact bias and variance of Ω̂ are given for AR(1) processes.
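The full-weight case is easy to verify numerically: with k(j/M) = 1 at every lag, (6) collapses to T−1(∑t ût)(∑t ût)′, which is identically zero for demeaned residuals. A quick sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(200)
uhat = u - u.mean()                      # demeaned residuals sum to zero

# Full weight on every sample autocovariance: Gamma_0 + sum_j (Gamma_j + Gamma_j')
T = len(uhat)
omega = uhat @ uhat / T
for j in range(1, T):
    omega += 2.0 * (uhat[j:] @ uhat[:-j]) / T

# Equals (sum of residuals)^2 / T = 0 up to floating-point rounding
print(omega, (uhat.sum() ** 2) / T)      # both ~1e-15
```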

4. A NEW ASYMPTOTIC THEORY

4.1. Distribution of Ω̂ in the Fixed-b Asymptotic Framework

We now develop a distribution theory for Ω̂ in the fixed-b asymptotic framework.[5]

[5] The results in this section are a fixed-b asymptotic analysis of nonparametric zero frequency spectral density estimators. Fixed-b asymptotic results for spectral density estimators at nonzero frequencies have been obtained by Hashimzade and Vogelsang (2003).

We proceed under the asymptotic nesting that M = bT, where b ∈ (0,1] is fixed. The limiting distribution of Ω̂ in the fixed-b asymptotic framework can be written in terms of Qi(b), an i × i random matrix that takes one of three forms depending on the second derivative of the kernel. The following definition gives the forms of Qi(b).

DEFINITION 1. Let the i × i random matrix Qi(b) be defined as follows. Case (i): if k(x) is twice continuously differentiable everywhere,

Qi(b) = −(1/b2) ∫_0^1 ∫_0^1 k′′((r − s)/b) B̃i(r)B̃i(s)′ dr ds.

Case (ii): if k(x) is continuous, k(x) = 0 for |x| ≥ 1, and k(x) is twice continuously differentiable everywhere except for |x| = 1,

Qi(b) = −(1/b2) ∫∫_{|r−s|<b} k′′((r − s)/b) B̃i(r)B̃i(s)′ dr ds + (k′(1)/b) ∫_0^{1−b} [B̃i(r + b)B̃i(r)′ + B̃i(r)B̃i(r + b)′] dr,

where k′(1) = limh→0[(k(1) − k(1 − h))/h], i.e., k′(1) is the derivative of k(x) from the left at x = 1. Case (iii): if k(x) is the Bartlett kernel,

Qi(b) = (2/b) ∫_0^1 B̃i(r)B̃i(r)′ dr − (1/b) ∫_0^{1−b} [B̃i(r + b)B̃i(r)′ + B̃i(r)B̃i(r + b)′] dr.
The moments of Qi(1) have been derived by Phillips et al. (2003) for the case of positive definite kernels. Hashimzade, Kiefer, and Vogelsang (2003) have generalized those results to Qi(b) while also relaxing the positive definiteness requirement. The mean of Qi(b) is proportional to Ii, the i × i identity matrix, and the variance of vec(Qi(b)) is proportional to I + κii, where I is the i2 × i2 identity matrix and κii is the standard commutation matrix;[6] the constants of proportionality are integrals over [0,1] of functions of the kernel and b. Using these moment results, Hashimzade et al. (2003) prove that limb→0 E(Qi(b)) = Ii and limb→0 var(vec(Qi(b))) = 0. As an illustrative example, consider the Bartlett kernel, where direct calculation from Case (iii) of Definition 1 gives E(Q1(b)) = 1 − b + b2/3, which approaches 1 as b → 0.

[6] The standard notation for the commutation matrix is usually Kii (see Magnus and Neudecker, 1999, p. 46). We are using alternative notation because Kij is used for a different matrix in our proofs.
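As a numerical check on the Bartlett mean just reported, Q1(b) can be simulated directly from Case (iii) of Definition 1 by discretizing the Brownian bridge. The sketch below is ours; the step size and number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, b = 1000, 20000, 0.5
m = int(b * n)

draws = np.empty(reps)
for i in range(reps):
    B = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # Brownian motion on [0,1]
    r = np.arange(1, n + 1) / n
    Bt = B - r * B[-1]                                   # Brownian bridge
    term1 = (2.0 / b) * np.mean(Bt**2)                   # (2/b) * integral of Bt(r)^2
    term2 = (2.0 / b) * np.sum(Bt[m:] * Bt[:-m]) / n     # (2/b) * integral of Bt(r+b)Bt(r)
    draws[i] = term1 - term2

print(draws.mean(), 1 - b + b**2 / 3)    # both approximately 0.583 for b = 0.5
```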

We first consider the asymptotic distribution of Ω̂ for the case of exactly identified models.

THEOREM 1 (Exactly identified models). Suppose that q = p. Let M = bT, where b ∈ (0,1] is fixed, and let Qp(b) be given by Definition 1 for i = p. Then, under Assumptions 1–4, as T → ∞,

Ω̂ ⇒ ΛQp(b)Λ′.

Several useful observations can be made regarding this theorem. Under fixed-b asymptotics, Ω̂ converges to a matrix of random variables (rather than constants) that is proportional to Ω through Λ and Λ′. This contrasts with the small-b asymptotic approximation where Ω̂ is approximated by the constant Ω. Because p lim Qp(b) = Ip as b → 0, it follows that p limb→0 ΛQp(b)Λ′ = ΛΛ′ = Ω. Thus, the fixed-b asymptotics coincides with the standard small-b asymptotics as b goes to zero. The advantage of the fixed-b asymptotic result is that the limit of Ω̂ depends on the kernel through k′′(x) and k′(1) and on the bandwidth through b but is otherwise free of nuisance parameters. Therefore, it is possible to obtain a first-order asymptotic distribution theory that explicitly captures the choice of kernel and bandwidth. Under fixed-b asymptotics, any choice of bandwidth leads to asymptotically pivotal tests of hypotheses regarding θ0 when using Ω̂ to construct standard errors (details are given subsequently). Note that Theorem 1 generalizes results obtained by Vogelsang (2003), where the focus was b = 1.
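The content of Theorem 1 is easy to see in simulation: as T grows with b fixed, the distribution of Ω̂ does not collapse to Ω. In the sketch below (ours, scalar i.i.d. data so that Ω = 1 and Qp(b) = Q1(b)), the mean of the Bartlett Ω̂ stays near E(Q1(b)) = 1 − b + b2/3 and its spread does not shrink:

```python
import numpy as np

rng = np.random.default_rng(6)
b, reps = 0.5, 2000

def bartlett_omega(u, M):
    T = len(u)
    omega = u @ u / T
    for j in range(1, int(M)):
        omega += 2.0 * (1.0 - j / M) * (u[j:] @ u[:-j]) / T
    return omega

for T in (100, 400, 1600):
    draws = np.empty(reps)
    for i in range(reps):
        y = rng.standard_normal(T)           # Omega = 1 for i.i.d. N(0,1) data
        draws[i] = bartlett_omega(y - y.mean(), b * T)
    # Mean stays near 1 - b + b^2/3 ~ 0.583; the standard deviation does not vanish
    print(T, round(draws.mean(), 3), round(draws.std(), 3))
```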

When q > p and the model is overidentified, the limiting expression for Ω̂ is more complicated, and asymptotic proportionality to Ω no longer holds. This was established for the special case of b = 1 by Vogelsang (2003). This does not imply, however, that valid testing is impossible when using Ω̂ in overidentified models, because the required asymptotic proportionality does hold for GT(θ̂T)′WT Ω̂ WT GT(θ̂T), the middle term in V̂. The following theorem provides the relevant result.

THEOREM 2 (Overidentified models). Suppose that q > p. Let M = bT, where b ∈ (0,1] is fixed, and let Qp(b) be given by Definition 1 for i = p. Define Λ* to be a p × p matrix square root of G0′WΛΛ′W G0, i.e., Λ*Λ*′ = G0′WΛΛ′W G0. Under Assumptions 1–4, as T → ∞,

GT(θ̂T)′WT Ω̂ WT GT(θ̂T) ⇒ Λ*Qp(b)Λ*′.

This theorem shows that GT(θ̂T)′WT Ω̂ WT GT(θ̂T) is asymptotically proportional to Λ*Λ*′ and otherwise depends only on the random matrix Qp(b). It directly follows that V̂ is asymptotically proportional to V, and asymptotically pivotal tests can be obtained.

4.2. Inference

We now examine the limiting null distributions of tests regarding θ0 under fixed-b asymptotics. Consider the hypotheses

H0: r(θ0) = 0  against  H1: r(θ0) ≠ 0,

where r(θ) is an m × 1 vector (m ≤ p) of continuously differentiable functions with first derivative matrix R(θ) = ∂r(θ)/∂θ′. Applying the delta method to Lemma 1 we obtain

T1/2[r(θ̂T) − r(θ0)] ⇒ N(0, VR), (8)

where VR = R(θ0)VR(θ0)′. Using (8) one can construct the standard HAC robust Wald test of the null hypothesis or a t-test in the case of m = 1. To remain consistent with earlier work, we consider the F-test version of the Wald statistic defined as

F = T r(θ̂T)′[R(θ̂T)V̂R(θ̂T)′]−1r(θ̂T)/m.

When m = 1 the usual t-statistic can be computed as

t = T1/2r(θ̂T)/√(R(θ̂T)V̂R(θ̂T)′).

Often, the significance of individual parameters is of interest, which leads to t-statistics of the form

t = θ̂i/√(V̂ii/T),

where V̂ii is the ith diagonal element of the V̂ matrix. To avoid any confusion, note that these statistics are being computed in exactly the same way as under the standard approach. Only the asymptotic approximation to the sampling distribution is different.
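For concreteness, here is a sketch (ours; all names illustrative) of the individual-coefficient t-statistics in a linear regression, where f(vt, θ̂T) = xtût and the model is exactly identified, so V̂ in (5) reduces to the familiar sandwich form:

```python
import numpy as np

def bartlett_omega_matrix(F, M):
    """Bartlett HAC estimate of Omega from the T x p matrix F with rows x_t * uhat_t."""
    T = F.shape[0]
    omega = F.T @ F / T
    for j in range(1, int(M)):
        gj = F[j:].T @ F[:-j] / T
        omega += (1.0 - j / M) * (gj + gj.T)
    return omega

def hac_t_stats(y, X, b):
    """t-statistics for H0: theta_i = 0 with Bartlett HAC standard errors, M = b*T."""
    T, p = X.shape
    theta = np.linalg.solve(X.T @ X, X.T @ y)           # OLS
    F = X * (y - X @ theta)[:, None]                    # moment conditions f(v_t, theta_hat)
    Qinv = np.linalg.inv(X.T @ X / T)
    V = Qinv @ bartlett_omega_matrix(F, b * T) @ Qinv   # exactly identified sandwich (5)
    return theta / np.sqrt(np.diag(V) / T)
```

The resulting t-statistics are compared with fixed-b critical values from Section 4.3 rather than with 1.645.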

Note that some kernels, including the Tukey–Hanning, allow negative variance estimates. In this case some convention must be adopted in calculating the denominator of the test statistics. Equally arbitrary conventions include reflection of negative values through the origin or setting negatives to a small positive value. Although our results apply to kernels that are not positive definite, we see no merit in using a kernel allowing negative estimated variances absent a compelling argument in a specific case. Nevertheless, we have experimented with the Tukey–Hanning and trapezoid kernels, and results not reported here do not support their consideration over a kernel guaranteeing positive variance estimates.

The following theorem provides the asymptotic null distributions of F and t.

THEOREM 3. Let b ∈ (0,1] be a constant and suppose M = bT. Let Qm(b) be given by Definition 1 for i = m. Then, under Assumptions 1–4 and H0, as T → ∞,

F ⇒ Bm(1)′Qm(b)−1Bm(1)/m  and, for m = 1,  t ⇒ B1(1)/√(Q1(b)).

Theorem 3 shows that under fixed-b asymptotics, asymptotically pivotal tests are obtained and the asymptotic distributions reflect the choices of kernel and bandwidth. This contrasts with asymptotic results under the standard approach, where F would have a limiting χm2/m distribution and t a limiting N(0,1) distribution regardless of the choice of M and k(x). As discussed in Section 4.1, as b → 0, p lim Qm(b) = Im and the fixed-b asymptotics reduces to the standard small-b asymptotics. Therefore, if a traditional bandwidth rule is used in conjunction with the fixed-b asymptotics, in large samples the two asymptotic theories will coincide. However, because the value of b is strictly greater than zero in practice, it is natural to expect fixed-b asymptotics to deliver a more accurate approximation. The simulation results reported in Section 5 and in the working paper (Kiefer and Vogelsang, 2005) indicate that this is true in some Gaussian models. Finite-sample results reported by Ravikumar, Ray, and Savin (2004) indicate that the fixed-b asymptotic approximation can substantially reduce size distortions in tests of joint hypotheses, especially when the number of hypotheses being tested is large.

A theoretical comparison of the accuracy of the fixed-b asymptotics with the small-b asymptotics is not currently available because existing methods in higher order asymptotic expansions, such as Edgeworth expansions, do not directly apply to the fixed-b asymptotic nesting given the nonstandard nature of the distribution theory. Obtaining such theoretical results appears difficult, although for the Gaussian location model and the special case of the Bartlett kernel with b = 1, Jansson (2004) has shown that the error in rejection probability of F is O(T−1 log(T)). Phillips et al. (2004) conjecture that this result can be strengthened to O(T−1) and can be generalized to include b < 1 and other kernels besides the Bartlett kernel.

4.3. Asymptotic Critical Values

The limiting distributions given by Theorem 3 are nonstandard. Analytical forms of the densities are not available with the exception of t for the case of the Bartlett kernel with b = 1 (see Abadir and Paruolo, 2002; Kiefer and Vogelsang, 2002b). However, because the limiting distributions are simple functions of standard Brownian motions, critical values are easily obtained using simulations. In the working paper (Kiefer and Vogelsang, 2005) we provide critical values for the t-statistic for a selection of popular kernels (see the formula appendix for formulas for the kernels). Additional critical values for the F-test will be made available in a follow-up paper.

To make the use of the fixed-b asymptotics easy for practitioners, we provide critical value functions for the t-statistic using the cubic equation

cv(b) = a0 + a1b + a2b2 + a3b3.

For a selection of well-known kernels, we computed the cv(b) function for the percentage points 90%, 95%, 97.5%, and 99%. Critical values for the left tail follow by symmetry around zero. The ai coefficients are given in Table 1. They were obtained as follows. For each kernel and the grid b = 0.02, 0.04, …, 0.98, 1.0, critical values were calculated via simulation methods using 50,000 replications. Normalized partial sums of 1,000 i.i.d. N(0,1) random deviates were used to approximate the standard Brownian motions in the respective distributions given by Theorem 3. For each percentage point, the simulated critical values were used to fit the cv(b) function by ordinary least squares (OLS). The intercept a0 was constrained to equal the standard normal critical value so that cv(0) coincides with the standard asymptotics. Table 1 also reports the R2 from the regressions, and the fits are excellent in all cases (R2 ranging from 0.9825 to 0.9996).

Table 1. Fixed-b asymptotic critical value function coefficients for t: cv(b) = a0 + a1b + a2b2 + a3b3.
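In place of Table 1 (whose coefficients are not reproduced here), fixed-b critical values can be simulated directly. The sketch below (ours) uses the fact that in a Gaussian location model with i.i.d. errors, the finite-sample t-statistic with M = bT and large T is approximately a draw from the Theorem 3 limit; the grid sizes are arbitrary and smaller than those used by the authors.

```python
import numpy as np

rng = np.random.default_rng(3)
T, reps, b = 500, 10000, 0.5          # the paper uses finer grids and 50,000 replications
M = b * T

def bartlett_omega(u, M):
    T = len(u)
    omega = u @ u / T
    for j in range(1, int(M)):
        omega += 2.0 * (1.0 - j / M) * (u[j:] @ u[:-j]) / T
    return omega

tstats = np.empty(reps)
for i in range(reps):
    y = rng.standard_normal(T)        # partial sums of i.i.d. N(0,1) approximate B(r)
    tstats[i] = np.sqrt(T) * y.mean() / np.sqrt(bartlett_omega(y - y.mean(), M))

print(np.quantile(tstats, 0.95))      # fixed-b 95% critical value, Bartlett, b = 0.5
```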

5. CHOICE OF KERNEL AND BANDWIDTH: PERFORMANCE

In this section we analyze effects of the choice of kernel and bandwidth on the performance of HAC robust tests. We focus on accuracy of the asymptotic approximation under the null and on local asymptotic power. As far as we know, our analysis is the first to explore theoretically the effects of kernel and bandwidth choice on the power of HAC robust tests.

5.1. Accuracy of the Asymptotic Approximation under the Null and Comparison with the Bootstrap

The way to evaluate the accuracy of an asymptotic approximation to a null distribution, or indeed any approximation, is to compare the approximate distribution to the exact distribution. Sometimes this can be done analytically; more commonly the comparison can be made by simulation. We argued earlier that our approximation to the distribution of HAC robust tests was likely to be better than the usual approximation because ours takes into account the randomness in the estimated variance. However, as noted, that argument is unconvincing in the absence of evidence on the approximation's performance. We provide results for two popular positive definite kernels: Bartlett and QS. Results for the Parzen, Bohman, and Daniell kernels are similar and are not reported here. The working paper (Kiefer and Vogelsang, 2005) contains additional finite-sample simulation results that are similar to what is reported here.

The simulations were based on the simple location model (7) with the same design as described previously. Figures 1 and 2 provide plots of empirical rejection probabilities using the fixed-b asymptotics. Compared to the standard small-b asymptotics, the results are striking. In nearly all cases, the tendency to overreject is substantially reduced. The exceptions are for small values of M, where the fixed-b asymptotics shows only small improvements. But this is to be expected given that the two asymptotic theories coincide as b goes to 0. In the case of i.i.d. errors, the fixed-b asymptotics is remarkably accurate. In the case of moving average (MA) errors, fixed-b asymptotics gives tests with rejection probabilities close to 0.05. In the case of autoregressive (AR) errors, overrejections decrease as M increases. Intuitively, the fixed-b asymptotics is capturing the variance of Ω̂ and the downward bias in Ω̂ that can be substantial when M is large. As M increases, the tendency to overreject falls. The QS kernel delivers tests with smaller size distortions than the Bartlett kernel, especially when M is large.[7]

[7] A recent paper by Müller (2004) provides a novel way of theoretically quantifying the size robustness properties of HAC robust tests as a function of the kernel and b. The robustness measure proposed by Müller (2004) closely matches the patterns seen in the finite-sample simulations.

The results in Figures 1 and 2 strongly suggest that the use of fixed-b asymptotic critical values greatly improves the performance of HAC robust t-tests.

Note, however, that as serial correlation in the errors becomes stronger, the tendency to overreject increases even when fixed-b critical values are used. The reason is that the accuracy of the functional central limit theorem approximation begins to erode. Therefore, there may be potential for the bootstrap to provide further refinements in accuracy. The fixed-b asymptotics suggests that the bootstrap could work because the HAC robust tests are asymptotically pivotal for any kernel and bandwidth. Ideally, one would like to establish theoretically whether or not the bootstrap provides an asymptotic refinement. Such a theoretical exercise is well beyond the scope of this paper because higher order expansions for fixed-b asymptotics have not been developed. Obtaining such expansions is an important topic that deserves attention.

To show the potential of the bootstrap for HAC robust tests, we expanded our simulation study to include the block bootstrap. We implemented the block bootstrap resampling scheme following Götze and Künsch (1996) as follows. The original data, y1, y2, …, yT, are divided into T − l + 1 overlapping blocks of length l. For simplicity, we report results for the block lengths l = 1 and l = 4. The T/l blocks are drawn randomly with replacement to form a bootstrap sample labeled y1*, y2*, …, yT*. The bootstrap t-statistic is computed as

t* = (μ̂* − μ̂)/√(Ω̂*/T),

where μ̂* is the sample mean of the bootstrap sample and Ω̂* is computed using formula (6) with ût* = yt* − μ̂*. We use the label "naive" because the bootstrap t-statistic is computed using the same formula for Ω̂ as was used for the original data.[8]

[8] We considered implementing the recentered and rescaled version of the bootstrap as proposed by Götze and Künsch (1996). Because Götze and Künsch (1996) recommend using the block length, l, equal to the HAC bandwidth, M, it is difficult to implement this bootstrap for the full range of M values given in Figures 1 and 2. The problem is that for large values of M, l is so large that the resampling is based on a very small number of blocks and it breaks down. We leave a comparison of the naive bootstrap and the Götze and Künsch (1996) bootstrap to future research.
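A sketch of the naive block bootstrap as we read the description above (ours; the Bartlett kernel, centering at the original sample mean, and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def bartlett_omega(u, M):
    T = len(u)
    omega = u @ u / T
    for j in range(1, int(M)):
        omega += 2.0 * (1.0 - j / M) * (u[j:] @ u[:-j]) / T
    return omega

def naive_bootstrap_cv(y, M, l, B=999, level=0.95):
    """Upper critical value for the HAC t-statistic from the naive block bootstrap:
    overlapping blocks of length l drawn with replacement, and the same Bartlett
    HAC formula applied to the bootstrap sample as to the original data."""
    T = len(y)
    muhat = y.mean()
    blocks = np.lib.stride_tricks.sliding_window_view(y, l)   # T - l + 1 blocks
    tstar = np.empty(B)
    for i in range(B):
        idx = rng.integers(0, blocks.shape[0], size=T // l)   # assumes l divides T
        ystar = blocks[idx].ravel()
        mustar = ystar.mean()
        tstar[i] = np.sqrt(T) * (mustar - muhat) / np.sqrt(
            bartlett_omega(ystar - mustar, M))
    return np.quantile(tstar, level)
```

The null is then rejected when the t-statistic from the original sample exceeds naive_bootstrap_cv(y, M, l).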

Empirical rejection probabilities using the naive bootstrap critical values are reported in Figures 1 and 2 and are labeled NBoot. There are some compelling patterns in the figures. Contrary to theoretical results by Davison and Hall (1993) and Götze and Künsch (1996) suggesting otherwise, the naive bootstrap performs quite well; it dominates the standard normal approximation and has rejection probabilities that are very similar to the fixed-b asymptotics. This is true even for the i.i.d. (l = 1) bootstrap when the data are not i.i.d. In addition, the fact that the naive bootstrap works well when using the Bartlett kernel contrasts with the conventional wisdom that bootstrapped Bartlett kernel tests are no more accurate than the standard normal approximation.

It is quite striking how closely the naive bootstrap with l = 1 (the i.i.d. bootstrap) follows the fixed-b approximation regardless of the error structure. Recent work by Goncalves and Vogelsang (2004) has shown that this pattern is systematic. Goncalves and Vogelsang (2004) prove that the naive bootstrap has the same first-order asymptotic distribution as the t-statistic under fixed-b asymptotics, including the case when l = 1. The simulations also suggest that the naive bootstrap could be more accurate than fixed-b asymptotics with suitable choice of the block length. Focusing on the cases of AR(1) errors with l = 4, notice that the naive bootstrap often has rejection probabilities closer to 0.05 than fixed-b. Although the naive bootstrap may provide a refinement over the fixed-b asymptotics, such a refinement has not been formally established. Higher order properties of the naive bootstrap under fixed-b asymptotics are a very interesting, but potentially difficult, area where more work is needed.

5.2. Local Asymptotic Power

Whereas the existing HAC literature has almost exclusively focused on the MSE of the HAC estimator to guide the choice of kernel and bandwidth, a more relevant metric is to focus on the power of the resulting tests. In this section we compare power of HAC robust t-tests using a local asymptotic power analysis within the fixed-b asymptotic framework. Our analysis permits comparison of power across bandwidths and across kernels. Such a comparison is not possible using the traditional first-order small-b asymptotics because local asymptotic power is the same for all bandwidths and kernels.

For clarity, we restrict attention to linear regression models. Given the results in Theorem 1, the derivations in this section are very simple extensions of results given by Kiefer and Vogelsang (2002b). Therefore, details are kept to a minimum. Consider the regression model

yt = xt′θ0 + ut,  t = 1,2,…,T, (9)

with θ0 and xt being p × 1 vectors. In terms of the general model we have f(vt, θ0) = xt(yt − xt′θ0). Without loss of generality, we focus on θi0, one element of θ0, and consider the null and local alternative hypotheses

H0: θi = θi0  against  H1: θi = θi0 + cT−1/2,

where c ≥ 0 is a constant. If the regression model satisfies Assumptions 1–4, then we can use the results of Theorem 1 and results from Kiefer and Vogelsang (2002b) to easily establish that under the local alternative, H1, as T → ∞,

t ⇒ (δ + B1(1))/√(Q1(b)), (10)

where δ = c/√(Vii).

Asymptotic power curves can be computed for given bandwidths and kernels by simulating the asymptotic distribution of t based on (10) for a range of values for δ and computing rejection probabilities with respect to the relevant null critical value. Using the same simulation methods as for the asymptotic critical values, local asymptotic power was computed for δ = 0,0.2,0.4,…,4.8,5.0 using 5% asymptotic null critical values.
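Local asymptotic power curves based on (10) can be simulated in the same way as the critical values. The sketch below (ours) does this for the Bartlett kernel, drawing Q1(b) from Case (iii) of Definition 1 with a discretized Brownian path; the grid sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, b = 1000, 10000, 0.5
m = int(b * n)

def draw_t(delta):
    """One draw from (delta + B1(1))/sqrt(Q1(b)) for the Bartlett kernel."""
    B = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # Brownian motion
    r = np.arange(1, n + 1) / n
    Bt = B - r * B[-1]                                   # Brownian bridge
    q = (2.0 / b) * (np.mean(Bt**2) - np.sum(Bt[m:] * Bt[:-m]) / n)
    return (delta + B[-1]) / np.sqrt(q)

null_draws = np.array([draw_t(0.0) for _ in range(reps)])
cv = np.quantile(null_draws, 0.95)                       # 5% one-sided null critical value

for delta in np.arange(0.0, 5.2, 0.4):
    power = np.mean([draw_t(delta) > cv for _ in range(reps)])
    print(f"delta = {delta:.1f}: power = {power:.3f}")
```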

The power results are reported in two ways. Figures 3, 4, 5, 6, 7, 8, 9, and 10 plot power across the kernels for a given value of b. Figures 11, 12, 13, 14, and 15 plot power across values of b for a given kernel. Figures 3, 4, 5, 6, 7, 8, 9, and 10 show that for small bandwidths, power is essentially the same across kernels. As b increases, it becomes clear that the Bartlett kernel has the highest power, whereas the QS and Daniell kernels have the lowest power. If power is the criterion used to choose a test, then the Bartlett kernel is the best choice within this set of five kernels. If we compare the Bartlett and QS kernels, we see that the power ranking of these kernels is the reverse of their ranking based on accuracy of the asymptotic approximation under the null.

Figure 3. Local asymptotic power of t, b = 0.02.

Figure 4. Local asymptotic power of t, b = 0.06.

Figure 5. Local asymptotic power of t, b = 0.1.

Figure 6. Local asymptotic power of t, b = 0.3.

Figure 7. Local asymptotic power of t, b = 0.5.

Figure 8. Local asymptotic power of t, b = 0.7.

Figure 9. Local asymptotic power of t, b = 0.9.

Figure 10. Local asymptotic power of t, b = 1.0.

Figure 11. Local asymptotic power of t, Bartlett kernel.

Figure 12. Local asymptotic power of t, Parzen kernel.

Figure 13. Local asymptotic power of t, Bohman kernel.

Figure 14. Local asymptotic power of t, Daniell kernel.

Figure 15. Local asymptotic power of t, QS kernel.

Figures 11, 12, 13, 14, and 15 show how the choice of bandwidth affects power. Regardless of the kernel, power is highest for small bandwidths and lowest for large bandwidths; power is decreasing in b. These figures also show that power of the Bartlett kernel is least sensitive to b, whereas power of the QS and Daniell kernels is the most sensitive to b. Again, the power ranking across values of b is the opposite of the ranking based on accuracy of the asymptotic approximation under the null.

5.3. Size-Power Trade-Off

The finite-sample simulations and local asymptotic power calculations clearly indicate a trade-off between size distortions and power with regard to the choice of kernel and bandwidth when using fixed-b critical values. Smaller bandwidths lead to tests with higher power but at the cost of greater size distortions, whereas larger bandwidths lead to tests with smaller size distortions but lower power. Similar trade-offs occur across kernels.[9] Balancing the size-power trade-off in some systematic way could lead to the development of useful bandwidth rules that deliver tests with desirable properties.[10]

[9] Simulations reported by Phillips et al. (2003) and Phillips et al. (2005) found a similar trade-off between size distortion and power with respect to the exponent of exponentiated kernels.

[10] In the context of trend function inference analyzed using fixed-b asymptotics, Bunzel and Vogelsang (2005) propose a data dependent bandwidth rule that maximizes integrated power. They focus only on power because the class of tests they consider is very robust in terms of size distortions. Their approach does not naturally apply to the models in this paper because their size correction technique does not easily apply to models with stochastic regressors.

Indeed, for the exponentiated Bartlett kernel, Phillips et al. (2004) propose a data dependent rule for the choice of exponent that minimizes a loss function that is a weighted sum of type I and type II errors. They develop higher order expansions within the fixed-exponent asymptotic framework and use these expansions to analytically quantify their loss function. The authors conjecture that their results can be extended to traditional kernels via fixed-b asymptotics. Thus it seems promising that fixed-b asymptotics can be used to develop data dependent bandwidth rules that balance size distortion and power.

6. CONCLUSIONS

We have provided a new approach to the asymptotic theory of HAC robust testing. We consider tests based on the popular nonparametric kernel estimates of the standard errors. We are not proposing new tests; rather, we propose a new asymptotic theory for these well-known tests. Our results are general enough to apply to stationary models estimated by GMM. In our approach, b, the ratio of bandwidth to sample size, is held constant when deriving the asymptotic behavior of the relevant covariance matrix estimator (i.e., zero frequency spectral density estimator). Thus we label our asymptotic framework "fixed-b" asymptotics. In the standard asymptotics, b is sent to zero, so the standard theory can be viewed as a "small-b" asymptotic framework. Fixed-b asymptotics addresses two well-known problems with the standard approach. First, as has been well documented in the literature, the standard asymptotic approximation of the sampling behavior of tests is often poor. Second, the kernel and bandwidth choice do not appear in the approximate distribution, leaving the standard theory silent on the choice of kernel and bandwidth with respect to properties of the tests. Our theory leads to approximate distributions that explicitly depend on the kernel and bandwidth. The new approximation performs much better and gives insight into the choice of kernel and bandwidth with respect to test behavior. Once the higher order results of Phillips et al. (2004) have been extended to the fixed-b framework, data dependent bandwidth rules that balance size distortion and power can be developed. In addition, fixed-b asymptotics should be useful in explaining the performance of the naive bootstrap when applied to HAC robust tests.

The new approximations should be used for HAC robust test statistics for any choice of kernel and bandwidth. The fixed-b approximation is an unambiguous improvement over the standard normal approximation in most cases considered. We show that size distortions are reduced when large bandwidths are used but so is asymptotic power. Generally there is a trade-off in bandwidth and kernel choice between size (the accuracy of the approximation) and power. Among a group of popular kernels, the QS kernel leads to the least size distortion, whereas the Bartlett kernel leads to tests with highest power (and generally acceptable size distortion when large bandwidths are used).

APPENDIX: Proofs

We first define some relevant functions and derive preliminary results before proving the lemma and theorems. Define the functions

k*(x) = k(x/b),  Kij = k*((i − j)/T),

Δ2Kij = (Kij − Ki,j+1) − (Ki+1,j − Ki+1,j+1),

DT*(r) = T2[k*(r + 1/T) − 2k*(r) + k*(r − 1/T)].

Notice that

Δ2Kij = −T−2DT*((i − j)/T).

Because k*(r) is an even function around r = 0, DT*(−r) = DT*(r). If k*′′(r) exists, then limT→∞ DT*(r) = k*′′(r) by the definition of the second derivative. If k*′′(r) is continuous, then DT*(r) converges to k*′′(r) uniformly in r. Define the stochastic process

XT(r) = T1/2g[rT]0)

for 1/T ≤ r ≤ 1 and XT(r) = 0 for 0 ≤ r < 1/T (i.e., set g00) = 0). It directly follows from Assumptions 2–4 that

XT(r) ⇒ ΛBq(r). (A.1)

Proof of Lemma 1. Setting t = T, multiplying both sides of (4) by GT(θ̂T)′WT, and using the first-order condition (3) gives

GT(θ̂T)′WT gT0) + GT(θ̂T)′WT ḠT(θ̂T, θ0, λT)(θ̂T − θ0) = 0. (A.2)

Solving (A.2) for θ̂T − θ0 and scaling by T1/2 gives

T1/2(θ̂T − θ0) = −[GT(θ̂T)′WT ḠT(θ̂T, θ0, λT)]−1GT(θ̂T)′WT T1/2gT0). (A.3)

Because p lim GT(θ̂T) = G0, p lim ḠT(θ̂T, θ0, λT) = G0, and p lim WT = W by Assumptions 3 and 4, it follows from (A.1) that

T1/2(θ̂T − θ0) ⇒ −(G0′W G0)−1G0′WΛBq(1) ∼ N(0, V). █

We now prove Theorem 2. The proof of Theorem 1 follows using similar arguments and is omitted.

Proof of Theorem 2. Define the random process

X̂T(r) = T1/2g[rT](θ̂T)

for 1/T ≤ r ≤ 1 and X̂T(r) = 0 for 0 ≤ r < 1/T. Plugging in for g[rT](θ̂T) using (4) and (A.3) gives

X̂T(r) = XT(r) − Ḡ[rT](θ̂T, θ0, λT)[GT(θ̂T)′WT ḠT(θ̂T, θ0, λT)]−1GT(θ̂T)′WT XT(1). (A.4)

It directly follows from Assumptions 3 and 4 and (A.1) that

GT(θ̂T)′WT X̂T(r) ⇒ G0′WΛ[Bq(r) − rBq(1)] = G0′WΛB̃q(r), (A.5)

and G0′WΛB̃q(r) has the same distribution as Λ*B̃p(r), with Λ* as defined in the statement of the theorem. Straightforward algebra gives

GT(θ̂T)′WT Ω̂ WT GT(θ̂T) = T−1 ∑_{i=1}^{T} ∑_{j=1}^{T} Kij [GT(θ̂T)′WT f(vi, θ̂T)][GT(θ̂T)′WT f(vj, θ̂T)]′.

To simplify notation, define Ĥi = GT(θ̂T)′WT X̂T(i/T) and Cij = Δ2Kij Ĥi Ĥj′. Using algebraic arguments similar to those used by Kiefer and Vogelsang (2002b) (summation by parts applied twice) and the fact that

ĤT = GT(θ̂T)′WT T1/2gT(θ̂T) = 0,

which follows from (3), it directly follows that

GT(θ̂T)′WT Ω̂ WT GT(θ̂T) = ∑_{i=1}^{T} ∑_{j=1}^{T} Cij. (A.6)

The rest of the proof is divided into three cases.

Case 1: k(x) is twice continuously differentiable. Using (A.6) it follows that

GT(θ̂T)′WT Ω̂ WT GT(θ̂T) = −T−2 ∑_{i=1}^{T} ∑_{j=1}^{T} DT*((i − j)/T) Ĥi Ĥj′ ⇒ −∫_0^1 ∫_0^1 k*′′(r − s) Λ*B̃p(r)[Λ*B̃p(s)]′ dr ds = Λ*Qp(b)Λ*′,

using (A.5), the uniform convergence of DT*(r) to k*′′(r), and the continuous mapping theorem. The final expression is obtained using k*′′(x) = (1/b2)k′′(x/b).

Case 2: k(x) is continuous, k(x) = 0 for |x| ≥ 1, and k(x) is twice continuously differentiable everywhere except for |x| = 1. Let 1(·) denote the indicator function. Noting that Δ2Kij = 0 for |i − j| > [bT] + 1, Δ2Kij = −k([bT]/bT) = −k*([bT]/T) for |i − j| = [bT] + 1, and Δ2Kij = [k([bT]/bT) − k(([bT]/bT) − (1/bT))] + k([bT]/bT) = [k*([bT]/T) − k*(([bT]/T) − (1/T))] + k*([bT]/T) for |i − j| = [bT], break up the double sum in (A.6) to obtain

Using the fact that CT,i = Ci,T = 0, it follows that

By definition

in which case we can write

Therefore it follows that

where the second two terms of (A.8) are op(1) because

Note that the op(1) terms become identically zero when bT is an integer. Using (A.9) the right-hand side of (A.7) simplifies to

Using this expression we can write

Let k_*′(b) denote the first derivative of k*(x) from the left at x = b. By definition

Therefore, by the continuous mapping theorem

Case 3: k(x) is the Bartlett kernel. It is easy to calculate that Δ2Kij = 2/bT for |ij| = 0, Δ2Kij = −(1/bT) + 1 − ([bT]/bT) for |ij| = [bT], Δ2Kij = −(1 − ([bT]/bT)) for |ij| = [bT] + 1, and Δ2Kij = 0 otherwise. Using similar algebraic calculations as were done in Case 2, it is straightforward to show that

Proof of Theorem 3. We only give the proof for F, as the proof for t follows using similar arguments. Applying the delta method to the result in Lemma 1 and using the fact that Bq(1) is a vector of independent standard normal random variables gives

T1/2r(θ̂T) ⇒ Λ**Bm(1), (A.11)

where Λ** is the matrix square root of R(θ0)(G0′W G0)−1Λ*Λ*′(G0′W G0)−1R(θ0)′ = VR. Using the results in Theorem 2, it directly follows that

R(θ̂T)V̂R(θ̂T)′ ⇒ Λ**Qm(b)Λ**′, (A.12)

where we use the fact that R(θ0)(G0′W G0)−1Λ*B̃p(r) has the same distribution as Λ**B̃m(r). Using (A.11) and (A.12) it directly follows that

F = T r(θ̂T)′[R(θ̂T)V̂R(θ̂T)′]−1r(θ̂T)/m ⇒ [Λ**Bm(1)]′[Λ**Qm(b)Λ**′]−1Λ**Bm(1)/m = Bm(1)′Qm(b)−1Bm(1)/m,

which completes the proof. █

References

REFERENCES

Abadir, K.M. & P. Paruolo (2002) Simple robust testing of regression hypotheses: A comment. Econometrica 70, 2097–2099.
Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–854.
Andrews, D.W.K. & J.C. Monahan (1992) An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, 953–966.
Berk, K.N. (1974) Consistent autoregressive spectral estimates. Annals of Statistics 2, 489–502.
Bunzel, H., N.M. Kiefer, & T.J. Vogelsang (2001) Simple robust testing of hypotheses in non-linear models. Journal of the American Statistical Association 96, 1088–1098.
Bunzel, H. & T.J. Vogelsang (2005) Powerful trend function tests that are robust to strong serial correlation with an application to the Prebisch-Singer hypothesis. Journal of Business & Economic Statistics, forthcoming.
Cushing, M.J. & M.G. McGarvey (1999) Covariance matrix estimation. In L. Matyas (ed.), Generalized Method of Moments Estimation, pp. 63–95. Cambridge University Press.
Davison, A.C. & P. Hall (1993) On Studentizing and blocking methods for implementing the bootstrap with dependent data. Australian Journal of Statistics 35, 215–224.
de Jong, R.M. & J. Davidson (2000) Consistency of kernel estimators of heteroskedastic and autocorrelated covariance matrices. Econometrica 68, 407–424.
den Haan, W.J. & A. Levin (1997) A practitioner's guide to robust covariance matrix estimation. In G. Maddala and C. Rao (eds.), Handbook of Statistics: Robust Inference, vol. 15, pp. 291–341. Elsevier.
den Haan, W.J. & A. Levin (1998) Vector Autoregressive Covariance Matrix Estimation. Working paper, International Finance Division, FED Board of Governors.
Gallant, A. (1987) Nonlinear Statistical Models. Wiley.
Gallant, A. & H. White (1988) A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Blackwell.
Goncalves, S. & T.J. Vogelsang (2004) Block Bootstrap Puzzles in HAC Robust Testing: The Sophistication of the Naive Bootstrap. Working paper, Department of Economics, Cornell University.
Götze, F. & H.R. Künsch (1996) Second-order correctness of the blockwise bootstrap for stationary observations. Annals of Statistics 24, 1914–1933.
Hall, P. & J.L. Horowitz (1996) Bootstrap critical values for tests based on generalized method of moments estimators. Econometrica 64, 891–916.
Hansen, B.E. (1992) Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967–972.
Hansen, L.P. (1982) Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Hashimzade, N., N.M. Kiefer, & T.J. Vogelsang (2003) Moments of HAC Robust Covariance Matrix Estimators under Fixed-b Asymptotics. Working paper, Department of Economics, Cornell University.
Hashimzade, N. & T.J. Vogelsang (2003) A New Asymptotic Approximation for the Sampling Behavior of Spectral Density Estimators. Working paper, Department of Economics, Cornell University.
Heyde, C. (1997) Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation. Springer.
Inoue, A. & M. Shintani (2004) Bootstrapping GMM estimators for time series. Journal of Econometrics, forthcoming.
Jansson, M. (2002) Consistent covariance estimation for linear processes. Econometric Theory 18, 1449–1459.
Jansson, M. (2004) The error rejection probability of simple autocorrelation robust tests. Econometrica 72, 937–946.
Kiefer, N.M. & T.J. Vogelsang (2002a) Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation. Econometrica 70, 2093–2095.
Kiefer, N.M. & T.J. Vogelsang (2002b) Heteroskedasticity-autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory 18, 1350–1366.
Kiefer, N.M. & T.J. Vogelsang (2005) A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Working paper 05-08, Center for Analytic Economics, Cornell University.
Kiefer, N.M., T.J. Vogelsang, & H. Bunzel (2000) Simple robust testing of regression hypotheses. Econometrica 68, 695–714.
Magnus, J.R. & H. Neudecker (1999) Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley.
Müller, U.K. (2004) A Theory of Robust Long-Run Variance Estimation. Mimeo, Department of Economics, Princeton University.
Neave, H.R. (1970) An improved formula for the asymptotic variance of spectrum estimates. Annals of Mathematical Statistics 41, 70–77.
Newey, W.K. & D.L. McFadden (1994) Large sample estimation and hypothesis testing. In R. Engle and D.L. McFadden (eds.), Handbook of Econometrics, vol. 4, pp. 2113–2247. Elsevier.
Newey, W.K. & K.D. West (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Newey, W.K. & K.D. West (1994) Automatic lag selection in covariance estimation. Review of Economic Studies 61, 631–654.
Ng, S. & P. Perron (1996) The exact error in estimating the spectral density at the origin. Journal of Time Series Analysis 17, 379–408.
Phillips, P.C.B. & S.N. Durlauf (1986) Multiple regression with integrated processes. Review of Economic Studies 53, 473–496.
Phillips, P.C.B., Y. Sun, & S. Jin (2003) Consistent HAC Estimation and Robust Regression Testing Using Sharp Origin Kernels with No Truncation. Working paper, Department of Economics, Yale University.
Phillips, P.C.B., Y. Sun, & S. Jin (2004) Improved HAR Inference Using Power Kernels without Truncation. Working paper, Department of Economics, Yale University.
Phillips, P.C.B., Y. Sun, & S. Jin (2005) Spectral density estimation and robust hypothesis testing using steep origin kernels without truncation. International Economic Review, forthcoming.
Priestley, M.B. (1981) Spectral Analysis and Time Series, vol. 1. Academic Press.
Ravikumar, B., S. Ray, & N.E. Savin (2004) Robust Wald Tests and the Curse of Dimensionality. Working paper, Department of Economics, University of Iowa.
Robinson, P. (1998) Inference without smoothing in the presence of nonparametric autocorrelation. Econometrica 66, 1163–1182.
Simonoff, J. (1993) The relative importance of bias and variability in the estimation of the variance of a statistic. Statistician 42, 3–7.
Velasco, C. & P.M. Robinson (2001) Edgeworth expansions for spectral density estimates and studentized sample mean. Econometric Theory 17, 497–539.
Vogelsang, T.J. (2003) Testing in GMM models without truncation. In T. Fomby & R. Carter Hill (eds.), Advances in Econometrics: Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later, vol. 17, pp. 199–233. Elsevier.
White, H. (1984) Asymptotic Theory for Econometricians. Academic Press.
Figure 0. Empirical null rejection probabilities: Bartlett kernel, T = 50, 5% nominal level.
Figure 1. Empirical null rejection probabilities: QS kernel, T = 50, 5% nominal level.
Figure 2. Fixed-b asymptotic critical value function coefficients for t: $cv(b) = a_0 + a_1 b + a_2 b^2 + a_3 b^3$ (a usage sketch follows this list).
Figure 3. Local asymptotic power of t, b = 0.02.
Figure 4. Local asymptotic power of t, b = 0.06.
Figure 5. Local asymptotic power of t, b = 0.1.
Figure 6. Local asymptotic power of t, b = 0.3.
Figure 7. Local asymptotic power of t, b = 0.5.
Figure 8. Local asymptotic power of t, b = 0.7.
Figure 9. Local asymptotic power of t, b = 0.9.
Figure 10. Local asymptotic power of t, b = 1.0.
Figure 11. Local asymptotic power of t, Bartlett kernel.
Figure 12. Local asymptotic power of t, Parzen kernel.
Figure 13. Local asymptotic power of t, Bohman kernel.
Figure 14. Local asymptotic power of t, Daniell kernel.
Figure 15. Local asymptotic power of t, QS kernel.
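The cubic in the caption of Figure 2 delivers fixed-b critical values as a function of the bandwidth ratio b. A minimal usage sketch follows; the coefficients below are placeholders for illustration only, and the fitted values must be taken from the figure itself.

```python
def fixed_b_cv(b, a0, a1, a2, a3):
    """Fixed-b critical value cv(b) = a0 + a1*b + a2*b^2 + a3*b^3."""
    return a0 + a1 * b + a2 * b**2 + a3 * b**3

# PLACEHOLDER coefficients, not the paper's fitted values. As b -> 0 the
# fixed-b critical value approaches the standard normal value, so a0 is set
# near 1.96 for a two-sided 5% t-test purely for illustration.
a0, a1, a2, a3 = 1.96, 0.0, 0.0, 0.0

t_stat, b = 2.10, 0.1            # hypothetical t statistic and bandwidth ratio
reject = abs(t_stat) > fixed_b_cv(b, a0, a1, a2, a3)
print(reject)
```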