
A DATA-DRIVEN NONPARAMETRIC SPECIFICATION TEST FOR DYNAMIC REGRESSION MODELS

Published online by Cambridge University Press:  23 May 2006

Alain Guay
Affiliation:
Université du Québec à Montréal
Emmanuel Guerre
Affiliation:
LSTA, Université Paris 6

Abstract

The paper introduces a new nonparametric specification test for dynamic regression models. The test combines chi-square statistics based on Fourier series regression. A data-driven choice of the regression order, which uses a penalty proportional to the square root of the number of Fourier coefficients, is proposed. The benefits of the new test are (1) the selection procedure produces explicit chi-square critical values that give a finite-sample size close to the nominal size; (2) the test is adaptive rate-optimal and detects local alternatives converging to the null with a rate that can be made arbitrarily close to the parametric rate. Simulation experiments illustrate the practical relevance of the new test.

The first author acknowledges financial support from the Fonds Québécois de la Recherche sur la Société et la Culture (FQRSC). The second author acknowledges financial support from LSTA.

Type
Research Article
Copyright
© 2006 Cambridge University Press

1. INTRODUCTION

Starting with Bierens (1984) and Robinson (1989), nonparametric specification testing for dependent data has received much attention in the econometric literature. The range of potential applications includes nonlinearity tests and time series model building as reviewed in Tjøstheim (1994) and Fan and Yao (2003), specification of a continuous-time diffusion model for interest rates (Aït-Sahalia, 1996), specification of the Phillips curve (Hamilton, 2001), rational expectations models and conditional portfolio efficiency (Chen and Fan, 1999; Robinson, 1989), and tests of the Black and Scholes formula (Aït-Sahalia, Bickel, and Stocker, 2001) among others.

An important branch of this literature has considered a nonparametric approach that uses a smoothing parameter, such as a bandwidth or the order of a series expansion. This has raised two important issues, the detection properties and the size accuracy. The former can be addressed with efficiency considerations, as pioneered in Ingster (1992, 1993); see also Guerre and Lavergne (2002). This framework calibrates tests to detect alternatives, within a given smoothness class, that approach the null at the fastest possible rate. However, the proposed smoothing parameters depend upon the chosen smoothness class, which is too restrictive for practical applications because the choice of a smoothness class is often arbitrary. Regarding the size issue, the statistics considered in the literature are often quadratic, but the critical values are computed from a normal approximation that may be inaccurate; see Hong and White (1995) for nonparametric series and Tjøstheim (1994) for kernel methods. Recent work for independent and identically distributed (i.i.d.) observations, such as Fan, Zhang, and Zhang (2001), suggests that more sophisticated approximations should be used instead of the normal. Härdle and Mammen (1993) and Gozalo (1997), among others, have proposed bootstrapped critical values as a solution. This may be difficult when the parametric model under consideration is specified in continuous time and is therefore costly to simulate or to bootstrap. Bootstrapping is also a burden when the dynamic specification includes covariates that are not strongly exogenous and need to be simulated.

An important step for the detection issue was the development of the adaptive framework. Under this approach, the smoothness class containing the alternative is considered unknown. Adaptive tests combine several statistics, each designed for a specific class, to build a test; see Hart (1997) for a review of earlier work in this direction. Spokoiny (1996) has developed an efficiency theory for the adaptive case. Various papers considered adaptive rate-optimal tests using the maximum of the statistics, including Fan (1996), Fan and Huang (2001), Horowitz and Spokoiny (2001), and Spokoiny (1996, 2001). More specifically, Horowitz and Spokoiny (2001) have proposed an adaptive rate-optimal kernel-based specification test for a general parametric regression model that has generated various extensions. Baraud, Huet, and Laurent (2003) consider some nonasymptotic refinements of the maximum approach for specification of a linear model. Poo, Sperlich, and Vieu (2004) are interested in a semiparametric null hypothesis, whereas Gayraud and Pouet (2005) consider a nonparametric null. Gao and King (2001, 2004) and Fan and Yao (2003) have proposed extending the scope of applications to dependent data.

However, the maximum approach produces statistics with an unstable asymptotic null behavior, so that achieving an accurate size remains a difficult issue. Fan (1996) found that the null limit distribution of his test gives a poor approximation in finite samples. Horowitz and Spokoiny (2001) did not derive a null limit distribution and used simulated critical values. On the other hand, Guerre and Lavergne (2005) built on a data-driven selection procedure that, under the null, selects a prescribed statistic with high probability. Compared to the maximum approach, this considerably reduces the complexity of the null behavior of the resulting test statistic, whose asymptotic distribution is the standard normal limit of the prescribed statistic. But the statistics of Guerre and Lavergne (2005) have a complicated quadratic structure, and so these authors used bootstrapped critical values to achieve a level close to the nominal size. Hence, as mentioned earlier, such an approach may not be suitable for a dynamic model.

In this paper, a suitable modification of the Guerre and Lavergne (2005) test is proposed to derive an adaptive rate-optimal specification test with an accurate size in a dynamic setting. The null hypothesis considered is the specification of the conditional mean for a time series with heteroskedastic innovations. Nonparametric series methods are used to compute chi-square statistics of various orders, which, when the number of degrees of freedom is low, admit an accurate chi-square approximation under the null. A selection criterion, using a low penalty term proportional to the square root of the number of coefficients, chooses a test statistic. Hence the rejection region of the test can use accurate chi-square critical values. The rest of the paper is organized as follows. Section 2 presents our test and the adaptive framework on a nontechnical level. Section 3 groups our main assumptions and our main results. After studying the null behavior of the test, adaptive rate-optimality is introduced, and the test is shown to be efficient. Detection of local alternatives, approaching the null with a rate close to the parametric one, is also considered. Section 4 illustrates the size and detection properties of the test with a simulation experiment, and Section 5 concludes the paper. The proofs are grouped in Section 6 and two Appendixes.

2. HEURISTICS OF THE DATA-DRIVEN TEST

Consider an autoregressive model with exogenous variables Zt,

Y_t = μ(X_t) + ε_t, t = 1,…,T,

where the covariate vector X_t in ℝ^d collects lagged values of Y_t and the exogenous variables Z_t, with

E[ε_t|F_t] = 0,

where F_t is the past Borel field generated by X_1,…,X_t. Given T observations (Y_1,X_1),…,(Y_T,X_T), we want to test that μ(·) belongs to some parametric family {m(·;θ); θ ∈ Θ}, that is, the correct specification hypothesis

H0: μ(·) = m(·;θ_0) for some θ_0 in Θ.

The proposed procedure builds on the estimated residuals ε̂_t = Y_t − m(X_t;θ̂), where θ̂ is a consistent estimator of θ under H0, such as, for instance, the nonlinear least squares estimator

θ̂ = arg min_{θ∈Θ} Σ_{t=1}^T [Y_t − m(X_t;θ)]².
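As a concrete illustration, here is a minimal sketch of this estimation step in Python, assuming a user-supplied parametric mean function m(x, theta); the function names and the AR(1)-type usage line are illustrative and not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def nlls_residuals(y, x, m, theta0):
    """Nonlinear least squares fit of the parametric mean m(x, theta):
    returns theta_hat minimizing sum_t (y_t - m(x_t, theta))^2 and the
    estimated residuals eps_hat_t = y_t - m(x_t, theta_hat)."""
    objective = lambda theta: np.sum((y - m(x, theta)) ** 2)
    theta_hat = minimize(objective, theta0, method="Nelder-Mead").x
    return theta_hat, y - m(x, theta_hat)

# Example with a linear AR(1) mean m(x, theta) = theta * x:
# theta_hat, eps_hat = nlls_residuals(y[1:], y[:-1], lambda x, th: th[0] * x, np.zeros(1))
```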

By Y_t = μ(X_t) + ε_t, the residuals decompose as ε̂_t = ε_t + Δ̂(X_t), where Δ̂(·) = μ(·) − m(·;θ̂) indicates potential misspecification, which asymptotically vanishes under the null but not under the alternative. Our test combines nonparametric series statistics constructed by projecting the residuals to detect the presence of a significant Δ̂(·) over a compact Λ = [−λ,λ]^d. More specifically, we focus on multivariate Fourier series regression.1

1. Using other series approximation methods, such as polynomial functions or wavelets, is possible but leads to a more involved theoretical study. Indeed, the Fourier system is uniformly bounded, sup_k sup_{x∈Λ}|ψ_k(x)| < ∞, a condition that simplifies algebraic manipulations under the mixing dependence conditions. Another appeal of Fourier methods is that using wavelets may limit the scope of applications to alternatives with a maximal smoothness given by the choice of the wavelet basis; see the wavelet tests considered in Spokoiny (1996) and Theorem 2.4 therein.

For k = (k_1,…,k_d) in ℤ^d, define the kth trigonometric function over Λ as the tensor product

ψ_k(x) = ∏_{j=1}^d φ_{k_j}(x_j), with φ_0(u) = (2λ)^{−1/2}, φ_j(u) = λ^{−1/2} cos(πju/λ) for j > 0, and φ_j(u) = λ^{−1/2} sin(πju/λ) for j < 0, (2.1)

so that {ψ_k(·), k ∈ ℤ^d} is an L2(dx)-orthonormal system, that is, ∫_Λ ψ_k(x)ψ_{k′}(x) dx = 1 if k = k′ and 0 otherwise. Let |k| = max_{1≤j≤d}|k_j| be the degree of ψ_k(·). The series estimation of Δ̂(·) over Λ builds on trigonometric multivariate polynomial functions Σ_{|k|≤K} b_k ψ_k(·) of degree K, with a number c_K of coefficients b_k proportional to K^d. To account for heteroskedasticity, assume that an estimator σ̂(·) of σ(·) is given and consider the generalized least squares estimator

b̂_K = (Ψ_K′Ω̂^{−1}Ψ_K)^{−1}Ψ_K′Ω̂^{−1}ε̂, ε̂ = [ε̂_1,…,ε̂_T]′,

where Ω̂ is the diagonal matrix with entries σ̂²(X_t), and Ψ_K is the T × c_K matrix [ψ_k(X_t), 1 ≤ t ≤ T, |k| ≤ K]. Suppose first that Δ̂(·) is a trigonometric polynomial function of order K. A standard procedure to test the significance of its Fourier coefficients would use the chi-square statistic

R̂_K = ε̂′Ω̂^{−1}Ψ_K(Ψ_K′Ω̂^{−1}Ψ_K)^{−1}Ψ_K′Ω̂^{−1}ε̂, (2.2)

leading to rejection of H0 when R̂_K is large. However, assuming that Δ̂(·) has a finite series expansion of known order K is too simplistic for practical applications. More generally, an arbitrary choice of K may affect the power, and a better understanding of the impact of K is important to build a proper specification test.
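The statistic is easy to compute; the sketch below does so for d = 1 under our reading of (2.1) and (2.2). The basis normalization and the handling of observations outside Λ (basis functions set to 0 off Λ) are illustrative assumptions:

```python
import numpy as np

def fourier_basis(x, K, lam):
    """Trigonometric system of degree K on [-lam, lam] for d = 1:
    constant plus cos/sin pairs, c_K = 2K + 1 orthonormal columns in L2(dx)."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * lam))]
    for k in range(1, K + 1):
        cols.append(np.cos(np.pi * k * x / lam) / np.sqrt(lam))
        cols.append(np.sin(np.pi * k * x / lam) / np.sqrt(lam))
    return np.column_stack(cols)

def chi_square_stat(resid, x, K, lam, sigma2_hat):
    """R_K = eps' O^-1 Psi (Psi' O^-1 Psi)^-1 Psi' O^-1 eps as in (2.2),
    with O = diag(sigma2_hat) and observations restricted to Lambda."""
    inside = np.abs(x) <= lam
    Psi = fourier_basis(x[inside], K, lam)
    w = 1.0 / sigma2_hat[inside]            # diagonal of Omega^{-1}
    A = Psi.T @ (w[:, None] * Psi)          # Psi' Omega^-1 Psi
    b = Psi.T @ (w * resid[inside])         # Psi' Omega^-1 eps_hat
    return float(b @ np.linalg.solve(A, b))
```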

Set P̂_K = Ω̂^{−1/2}Ψ_K(Ψ_K′Ω̂^{−1}Ψ_K)^{−1}Ψ_K′Ω̂^{−1/2}, the matrix of the orthogonal projection onto the columns of Ω̂^{−1/2}Ψ_K, and write ε̂ = Δ̂ + ε with Δ̂ = [Δ̂(X_1),…,Δ̂(X_T)]′ and ε = [ε_1,…,ε_T]′, so that R̂_K = ‖P̂_K Ω̂^{−1/2}ε̂‖² decomposes into three terms

R̂_K = R̂_{1K} + 2R̂_{2K} + R̂_{3K}, (2.3)

with

R̂_{1K} = ‖P̂_K Ω̂^{−1/2}Δ̂‖², R̂_{2K} = Δ̂′Ω̂^{−1/2}P̂_K Ω̂^{−1/2}ε, R̂_{3K} = ‖P̂_K Ω̂^{−1/2}ε‖².

The term R̂_{1K} is crucial regarding detection of potential misspecification. It is the squared norm of the orthogonal projection of Ω̂^{−1/2}Δ̂ on the columns of Ω̂^{−1/2}Ψ_K, which increases with K up to ‖Ω̂^{−1/2}Δ̂‖², achieved for c_K ≥ T. Hence R̂_{1K} can be viewed as a downward-biased estimation of the empirical measure of misspecification ‖Ω̂^{−1/2}Δ̂‖² = Σ_{t=1}^T Δ̂²(X_t)/σ̂²(X_t), that is,

R̂_{1K} = ‖Ω̂^{−1/2}Δ̂‖² + bias_μ(K),

where bias_μ(K) ≤ 0 depends upon the unknown μ(·) and decreases in absolute value with K. The other important term in the decomposition (2.3) of the statistic R̂_K is R̂_{3K}, a pure noise term. It can be expected that R̂_{3K} is asymptotically a chi-square variable with c_K degrees of freedom, with mean c_K and variance 2c_K, so that R̂_{3K} = c_K + O_P(c_K^{1/2})

. Neglecting R̂_{2K},2

2. Assume that H0 is μ(·) = 0 and that σ(·) is known, so that ε̂_t = Y_t and the choice σ̂(·) = σ(·) is possible. In the case of Gaussian i.i.d. ε_t independent of the X_t's, R̂_{2K} would be an N(0,R̂_{1K}) variable conditionally on the X_t's, which can be neglected with respect to R̂_{1K} when this variable diverges. Note also that the distribution of R̂_{3K} coincides with its chi-square approximation for such ε_t.

and substituting in (2.3) gives a bias-variance type decomposition for R̂_K − c_K:

R̂_K − c_K = ‖Ω̂^{−1/2}Δ̂‖² + bias_μ(K) + O_P(c_K^{1/2}). (2.4)3

3. Note that R̂_K − c_K is a better misspecification indicator than R̂_K, which is affected by an additional systematic bias term c_K. Guerre and Lavergne (2005) proposed a different bias correction that makes asymptotic inference less accurate in finite samples, so that the bootstrap is used.

Looking for the best estimator R̂_K − c_K of the misspecification indicator suggests that an ideal choice of K should achieve the minimum of |bias_μ(K)| + c_K^{1/2}. However, this is infeasible in practice, at least because bias_μ(·) depends upon the unknown μ(·). Alternative feasible choices of K include the Akaike information criterion (AIC) and Bayesian information criterion (BIC) as reviewed in Hart (1997). These selection procedures consider a K achieving the maximum of

R̂_K − γc_K,

where γ is a penalty parameter. According to (2.4), this amounts to achieving the minimum of |bias_μ(K)| + (γ − 1)c_K. Therefore these selection procedures asymptotically balance |bias_μ(K)| with (γ − 1)c_K in place of the ideal order c_K^{1/2} in (2.4). This suggests using instead a lower penalty term of the form c_K + γc_K^{1/2}, which penalizes the square root of the number of coefficients, c_K^{1/2}, in place of c_K. More specifically, let K_T be a set of admissible degrees K larger than or equal to K_min. Our data-driven choice of K is

K̂ = arg max_{K∈K_T} {R̂_K − R̂_{K_min} − [(c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})]}. (2.5)

The introduction of the K_min quantities in the penalized criterion reflects a preference for low degrees, as justified now from considerations on the null behavior of the retained statistic R̂_{K̂}.
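A direct transcription of the selection rule (2.5), as reconstructed above, could look as follows; R and c are mappings from each admissible degree to R̂_K and c_K (illustrative names):

```python
import numpy as np

def select_degree(R, c, K_set, gamma_T):
    """Data-driven choice (2.5): maximize the penalized contrast
    R_K - R_Kmin - [(c_K - c_Kmin) + gamma_T (c_K^0.5 - c_Kmin^0.5)].
    The contrast is 0 at K_min, so K_hat = K_min unless a larger K wins strictly."""
    K_min = min(K_set)
    def contrast(K):
        return (R[K] - R[K_min] - (c[K] - c[K_min])
                - gamma_T * (np.sqrt(c[K]) - np.sqrt(c[K_min])))
    return max(K_set, key=lambda K: (contrast(K), -K))  # smaller K wins ties
```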

As seen from Fan (1996) or Horowitz and Spokoiny (2001), finding an accurate approximation for the null distribution of a statistic that combines the R̂_K's, as the maximum statistic max_{K∈K_T}(R̂_K − c_K)/(2c_K)^{1/2} does, is difficult. A first distinctive feature is that the selection procedure (2.5) is flexible enough to limit the contribution of the statistics with high K by taking γ_T large enough. Indeed, a limit case is γ_T = +∞, which gives K̂ = K_min. This continues to hold asymptotically provided γ_T diverges fast enough, as shown in Theorem 1 in Section 3. Moreover, as detailed now, an accurate approximation of the distribution of R̂_{K_min} is a standard chi-square. Because Δ̂(·) asymptotically vanishes under H0, (2.3) shows that the null distribution of R̂_K is approximately that of R̂_{3K} and then, neglecting the effect of the variance estimation, of

R_{3K} = ε′Ω^{−1}Ψ_K(Ψ_K′Ω^{−1}Ψ_K)^{−1}Ψ_K′Ω^{−1}ε,

where Ω^{1/2} = Diag[σ(X_1),…,σ(X_T)]. In the i.i.d. case and according to the Berry–Esseen bound in Hart (1997, Thm. 7.2), the distribution of the c_K-dimensional vector (Ψ_K′Ω^{−1}Ψ_K)^{−1/2}Ψ_K′Ω^{−1}ε has a normal approximation up to an error a(c_K)/T^{1/2}, where a(c_K) diverges with c_K. Therefore, the distribution of the chi-square statistic R_{3K} should be close to a chi-square with c_K degrees of freedom up to an error a(c_K)/T^{1/2}, which is smaller for moderate K.4

This continues to hold in the dependent setup, where the bound (B.9) in Appendix B gives a more complicated error term, which is K^{2d}/T^{1/2} at best. A normal approximation would be affected by a bigger K^{2d}/T^{1/2} + K^{−d/2} error term.

Hence the test uses a chi-square critical value z_α = z_{α,T} with

P(χ(c_{K_min}) ≥ z_α) = α,

where χ(c) is a chi-square with c degrees of freedom, and rejects H0 if5

R̂_{K̂} ≥ z_α. (2.6)
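The rejection rule (2.6) then reduces to a chi-square quantile lookup; this sketch reuses select_degree from the sketch above:

```python
from scipy.stats import chi2

def data_driven_test(R, c, K_set, gamma_T, alpha=0.05):
    """Reject H0 when R_{K_hat} exceeds the chi-square critical value z_alpha
    computed with c_{K_min} degrees of freedom, as in (2.6)."""
    K_hat = select_degree(R, c, K_set, gamma_T)
    z_alpha = chi2.ppf(1.0 - alpha, df=c[min(K_set)])
    return R[K_hat] >= z_alpha, K_hat
```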

A second distinctive feature of the selection procedure (2.5) is the standardization with c_{K_min} in the critical region {R̂_{K̂} ≥ z_α}; see (2.6). Because z_α = c_{K_min} + O(c_{K_min}^{1/2}) asymptotically, an alternative α-level critical region would use c_{K̂} in place of c_{K_min}. But such a choice would asymptotically reduce power because c_{K̂} ≥ c_{K_min}. This also contrasts with a maximum procedure that would use the test statistic max_{K∈K_T}(R̂_K − c_K)/(2c_K)^{1/2}, with an effective normalization larger than c_{K_min}. The simulation experiments of Guerre and Lavergne (2005) revealed that such a construction of the critical region (2.6) gives a test that improves on its adaptive rate-optimal competitors.

Consider now the power issue. The data-driven choice (2.5) of K combines the detection properties of each of the R̂_K's. Indeed, because c_K ≥ c_{K_min} for any K in K_T, we have

R̂_{K̂} ≥ R̂_K − (c_K − c_{K_min}) − γ_T(c_K^{1/2} − c_{K_min}^{1/2}) for all K in K_T. (2.7)

This gives the power lower bound

P(R̂_{K̂} ≥ z_α) ≥ max_{K∈K_T} P(R̂_K ≥ z_α + c_K − c_{K_min} + γ_T(c_K^{1/2} − c_{K_min}^{1/2})), (2.8)

which holds in particular for an optimal K that balances the bias with the penalty term. Taking K = K_min gives that

P(R̂_{K̂} ≥ z_α) ≥ P(R̂_{K_min} ≥ z_α), (2.9)

a power bound that shows that the test (2.6) improves on the one using the single statistic R̂_{K_min}. As seen from (2.4) and (2.8), consistency holds as soon as there is a degree K in K_T such that the misspecification measure ‖Ω̂^{−1/2}Δ̂‖² + bias_μ(K) is asymptotically larger than the sum c_K + γ_T c_K^{1/2} of the degrees of freedom and the penalty. Hence increasing γ_T too much should give a less powerful test. The form of the low penalty term in (2.5) is crucial to show adaptive rate-optimality; see Theorem 2 in Section 3. Theorem 3 in Section 3 shows that the test detects Pitman local alternatives with a rate arbitrarily close to the rate T^{−1/2}.

3. MAIN RESULTS

3.1. Main Assumptions

Consider T observations (Y_t,X_t) with Y_t = μ(X_t) + ε_t, and where μ(·) can depend upon T, in which case (Y_t,X_t) forms a triangular array (Y_{tT},X_{tT}). Let F_t^− and F_t^+ denote the Borel fields generated by X_{1T},…,X_{tT} and by X_{tT},X_{t+1,T},…, respectively. The α-mixing coefficients of {X_{tT}} are

α(n) = sup_t sup_{A∈F_t^−, B∈F_{t+n}^+} |P(A ∩ B) − P(A)P(B)|.

The next assumptions deal with the ε_t's, the mixing coefficients, and the parametric mean.

Assumption E. Let F̄_t be the Borel field generated by (X_1,ε_0),…,(X_t,ε_{t−1}). The variables {ε_t} are martingale differences with E[ε_t|F̄_t] = 0. The standard deviation function, σ(·) = Var^{1/2}[ε_t|X_t = ·], is continuous and bounded away from 0 on ℝ^d.

Assumption X. The process {(X_t,ε_t); t ≥ 1} on ℝ^d × ℝ is stationary, with the following conditions holding.

(i) α(n) ≤ An^{−1−a} for some constants A, a > 0.

(ii) The variable X_t has a density f(·) with respect to the Lebesgue measure on ℝ^d. The density f(·) is bounded away from 0 and infinity over Λ.

Assumption M. The parameter set Θ is a subset of ℝ^p, and the following conditions hold.

(i) The regression function m(x;θ) is twice continuously differentiable with respect to θ. The gradient m^{(1)}(x;θ) and Hessian matrix m^{(2)}(x;θ) are bounded over Λ × Θ.

(ii) For any sequence of regression functions μ_T(·) with sup_T sup_{x∈Λ}|μ_T(x)| < ∞, there exists a sequence of parameters θ_T in Θ such that θ̂ − θ_T = O_P(T^{−1/2}), with θ_T = θ if μ_T(·) = m(·;θ) for some θ in Θ.

Assumption E ensures that the sums Σ_{t≤T} ψ_k(X_t)ε_t/σ(X_t) are martingales that are asymptotically normal under Assumption X(i). The polynomial mixing rate of X(i) is a minimal rate to achieve T^{1/2}-consistency in the weak law of large numbers for the empirical mean T^{−1}Ψ_K′Ω^{−1}Ψ_K. Under Assumption X(ii), the limit of T^{−1}Ψ_K′Ω^{−1}Ψ_K has an inverse. Mixing conditions for Markovian (Y_t,X_t) as in Assumption X(i) can be derived using a drift condition; see Fan and Yao (2003, Thm. 2.4) and the references therein. When μ_T(·) = μ(·) is fixed, the sequence θ_T in Assumption M(ii) is the pseudo–true value

θ_T = arg min_{θ∈Θ} E[μ(X_t) − m(X_t;θ)]²,

which is uniquely defined under identification of the parametric regression model; see Domowitz and White (1982). Assumption M(i) then ensures that Δ̂(·) = μ_T(·) − m(·;θ̂) is close to Δ(·) = μ_T(·) − m(·;θ_T) over Λ up to an O_P(T^{−1/2}) term.

Let us now turn to the construction of the test. The first assumption specifies a set of admissible degrees K_T in the spirit of the dyadic bandwidth set of Horowitz and Spokoiny (2001).

Assumption K. Let a be as in Assumption X. Set K_max = 2^{J_max} = O(T^{C_1/d}) for some C_1 in (0, (3/4)[(1 + a)/(5 + 3a)]), and K_min = 2^{J_min} → ∞ with K_min^d = O(ln^{C_2} T) for some C_2 > 0, where J_min ≤ J_max are integer numbers. The set of admissible degrees K_T is dyadic, that is,

K_T = {2^j; J_min ≤ j ≤ J_max}. (3.1)
Note that (3.1) and the polynomial divergence rate of K_max imply that the cardinality #K_T = J_max − J_min + 1 is of exact order ln T. Such a restriction is helpful to show that K̂ = K_min asymptotically under the null but also has some practical justifications. Indeed, achieving a small P(K̂ ≠ K_min) is an important condition to get an accurate size. Because the penalized criterion in (2.5) vanishes if and only if K = K_min, (2.5) yields that K̂ ≠ K_min if and only if one of these penalized statistics is strictly positive for a K ≠ K_min, or equivalently

{K̂ ≠ K_min} = ∪_{K∈K_T, K≠K_min} {R̂_K − R̂_{K_min} > (c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})}. (3.2)

Hence

P(K̂ ≠ K_min) ≤ (#K_T) max_{K∈K_T, K≠K_min} P(R̂_K − R̂_{K_min} > (c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})),

so that this bound increases with the cardinality of K_T and decreases with the penalty sequence γ_T. Therefore, using a parsimonious K_T can improve the size accuracy of the test. On the other hand, a dyadic K_T as in Assumption K contains sequences with any arbitrary order between ln^{C_2} T and T^{C_1}, which is sufficient for adaptive rate-optimality. The constant C_1 of Assumption K must be smaller than (3/4)[(1 + a)/(5 + 3a)] < 1/4, where a comes from Assumption X(i), α(n) = O(n^{−1−a}). This gives a K_max of order T^{1/(4d)} at best, whereas, in the i.i.d. setup, Hong and White (1995) allowed for a better order T^{1/(3d)} when using a single series statistic on which to base the test.

Let us now turn to variance estimation. The next condition allows us to approximate T^{−1}Ψ_K′Ω̂^{−1}Ψ_K with T^{−1}Ψ_K′Ω^{−1}Ψ_K for degrees K depending on the sample size T, as in Assumption K.

Assumption V. Let σ̂(·) = σ̂(·;(Y_1,X_1),…,(Y_T,X_T)) be an estimator of σ(·). Then, for the considered sequence of regression models, σ̂(·) is uniformly consistent over Λ and, for some integer ℓ > d/2 and all (ℓ_1,…,ℓ_d) with ℓ_1 + ··· + ℓ_d ≤ ℓ, the corresponding partial derivatives of σ̂²(·) − σ²(·) are O_P(v_T) uniformly over Λ, where v_T = o(K_max^{−3d/2}/ln T) and lim inf_{T→∞} T^{1/2}v_T > 0.

Assumption V requires consistency of σ̂(·) under the null and the alternative. Convergence of the partial derivatives with the rate v_T requires that μ_T(·) and σ(·) satisfy a minimal smoothness condition. As seen from Guerre and Lavergne (2002), consistency is not necessary under the alternative but can be useful to get a powerful test. Under homoskedasticity, a simple choice of σ̂ is a constant difference-based estimator, in which case Assumption V holds with a best possible v_T = T^{−1/2}, so that K_max = o(T^{1/(3d)}/ln^{2/(3d)} T). The heteroskedastic case requires nonparametric variance estimation, such as kernel, sieve, or series methods; see, among others, Guerre and Lavergne (2002, 2005) and Horowitz and Spokoiny (2001). The rate v_T is then the consistency rate for the ℓth partial derivatives, which restricts the divergence rate of K_max.

3.2. Asymptotic Behavior under the Null

As discussed following (3.2) and (2.9), a fast divergence rate for γ_T is useful to achieve an accurate size under the null but may negatively affect the power properties of the test. Therefore, an important issue is to find a minimal divergence rate for γ_T ensuring that the test is asymptotically of level α, or equivalently that P(K̂ ≠ K_min) asymptotically vanishes under H0. The Bonferroni inequality gives, in (3.2),

P(K̂ ≠ K_min) ≤ Σ_{K∈K_T, K≠K_min} P(R̂_K − R̂_{K_min} > (c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})), (3.3)

and showing that the last sum asymptotically vanishes for small γ_T necessitates precise uniform bounds for these probabilities, so that simple Chebyshev-type inequalities may not be sufficient. Better Gaussian-type bounds in the spirit of the Mill's ratio inequality

P(N(0,1) ≥ z) ≤ exp(−z²/2)/(z(2π)^{1/2}), z > 0,

are derived in Lemma A.3 in Appendix A. Because the exact order of #K_T is ln T, the next theorem ensures that the asymptotic size of the test is α provided that the penalty sequence γ_T diverges faster than (ln ln T)^{1/2}.

THEOREM 1. Consider that the null hypothesis H0 is true and assume that Assumptions E, K, M, V, and X hold. Then, if γ_T diverges with

lim inf_{T→∞} γ_T/(2 ln ln T)^{1/2} > 1, (3.4)

we have lim_{T→∞} P(K̂ = K_min) = 1, and the test (2.6) is asymptotically of level α.

The minimal divergence rate (ln ln T)^{1/2} ensuring that the test is asymptotically of level α is surprisingly low compared to the penalty term of order ln T used in the BIC criterion. Such an improvement comes from the Gaussian-type bounds used for the tails of the standardized differences R̂_K − R̂_{K_min}. Indeed, this gives, up to remainder terms, a bound of order (#K_T)exp(−γ_T²/2) in (3.3), which asymptotically vanishes provided that (3.4) holds. On the other hand, such a low rate is in line with previous findings for rate-optimal adaptive testing. Indeed, (3.2) shows that suitable γ_T should resemble the critical values of a maximum test such as that of Fan (1996), who found critical values with a typical rate of (2 ln ln T)^{1/2}. This suggests that our minimal rate condition (3.4) cannot be improved.

Another condition for Theorem 1 to hold is that K_min diverges with the sample size; see Assumption K. This is used to neglect the parametric estimation error θ̂ − θ_T in the chi-square approximation of the distribution of R̂_{K_min}. Accounting for such an effect would allow us to consider a fixed K_min; see, for example, Hart (1997, Sect. 8.3.1).

3.3. Detection of Small Alternatives

As discussed following equation (2.9), the detection properties of the test depend upon the bias term from (2.4). Establishing formal adaptive rate-optimality of the test necessitates bounding this bias. The current mathematical approach to do so makes use of some smoothness restrictions. We consider here Hölder smoothness classes C_s(L) that we introduce now. Define the departure from the null as

Δ_{μ,T}(·) = μ_T(·) − m(·;θ_T),

with a θ_T as in Assumption M. We restrict ourselves to departures Δ(·) with a restriction to Λ that admits a (2λ)-periodic extension. Consider first the case s ∈ (0,1], for which

C_s(L) = {Δ(·); |Δ(x) − Δ(y)| ≤ L‖x − y‖^s for all x, y in Λ}.

For real s > 0, let ⌊s⌋ be the lower integer part of s, that is, the unique integer number satisfying ⌊s⌋ < s ≤ ⌊s⌋ + 1, so that s − ⌊s⌋ is in (0,1] with s − ⌊s⌋ = s for s ∈ (0,1]. For any s > 0, the smoothness class C_s(L) is defined as the set of functions Δ(·) with partial derivatives of order up to ⌊s⌋ whose ⌊s⌋th-order partial derivatives all lie in C_{s−⌊s⌋}(L). Hence the smoothness class C_s(L) is defined for all s > 0 and L > 0. Lemma 1 in Section 6 gives, for the bias term of (2.4), the following bound:

|bias_μ(K)| ≤ C₅TL²K^{−2s} (3.5)

for any Δ_{μ,T}(·) in C_s(L) and any K. This gives, for small alternatives, which are the harder to detect, a drift of order T‖Δ_{μ,T}‖₂² − C₅TL²K^{−2s} for the statistic R̂_K − c_K of (2.4).

Our minimax adaptive framework evaluates tests uniformly over alternatives at distance ρ from the null, that is, in

H1(ρ) = {μ_T(·); Δ_{μ,T}(·) ∈ C_s(L) and ‖Δ_{μ,T}‖₂ ≥ ρ},

with unknown smoothness index (L,s). Such alternatives allow for a general shape of Δ_{μ,T}(·), with narrow peaks and valleys that may depend upon T; see Horowitz and Spokoiny (2001). As pointed out in Guerre and Lavergne (2005), uniform consistency over H1(ρ_T) is equivalent to consistency against any sequence μ_T(·) in H1(ρ_T), as considered here. A crucial issue is the choice of a suitable asymptotically vanishing rate ρ_T. Indeed, some of the alternatives of H1(ρ_T) will not be detected by any test if ρ_T goes to 0 at too rapid a rate. On the other hand, detection can become straightforward if ρ_T remains far from the null. Hence a good candidate ρ_T to evaluate a test is a frontier rate that separates these two extreme situations. In the adaptive approach, such a rate depends upon the unknown smoothness index s, and Spokoiny (1996) has shown that the optimal adaptive rate is

ρ_T = ((ln ln T)^{1/2}/T)^{2s/(4s+d)},6

6. Spokoiny (1996) studied the continuous time white noise model (CTWN) dY(t) = f(t)dt + T^{−1/2}dW(t), t ∈ [0,1], where {W(t)}_{t∈[0,1]} is a standard Brownian motion. Although this model is mainly of theoretical interest, results established for the CTWN model extend to more common models through model equivalence; see Brown and Low (1996).

which is slower than the parametric rate T^{−1/2}. Guerre and Lavergne (2002) derived an optimal rate for a known smoothness index s that improves on ρ_T by the (ln ln T)^{1/2} factor, so that the price to pay for rate adaptation is moderate. As is well known, the rate ρ_T decreases faster than the nonparametric estimation rate T^{−s/(2s+d)}. The adaptive rate-optimality of our test is stated in the next result.

THEOREM 2. Consider a sequence of alternatives μ_T(·) with Δ_{μ,T}(·) in C_s(L) and ‖Δ_{μ,T}‖₂ ≥ C₃ρ_T, where s ≥ d(2/C₁ − 1)/4, L > 0, C₃ > 0, and ρ_T = ((ln ln T)^{1/2}/T)^{2s/(4s+d)}. Assume that Assumptions E, K, M, and V hold. Then, if γ_T is of exact order (ln ln T)^{1/2} and provided C₃ is taken large enough, the test is consistent, that is, lim_{T→∞} P(R̂_{K̂} ≥ z_α) = 1.

The proof of Theorem 2 builds on the lower power bound (2.8) and on the bias-variance decomposition (2.4). In view of the bias order (3.5) for small alternatives, an optimal choice of K in (2.8) is such that the order of the penalty term γ_T K^{d/2} is proportional to TK^{−2s}, that is,

K* = [(T/γ_T)^{2/(4s+d)}], (3.6)

where [·] is the integer part. Such a K* detects alternatives within the bias order divided by the sample size, K*^{−s} ∝ (γ_T/T)^{2s/(4s+d)}, which coincides with the optimal adaptive order ρ_T provided γ_T has the smallest possible order (ln ln T)^{1/2} compatible with Theorem 1. Note that, under Assumption K, K* is in K_T provided s ≥ d(2/C₁ − 1)/4, which implies that s > 7d/4.

Because adaptation means detection over various smoothness classes C_s(L), it is crucial that the test combine several statistics, as seen from the optimal K* in (3.6), which depends on the smoothness index s. Therefore, tests that use a single statistic R̂_K generally fail to be adaptive rate-optimal. A more specific property of the test (2.6) is detection of small local alternatives.

THEOREM 3. Consider a sequence of local alternatives μ_T(·) satisfying

μ_T(·) = m(·;θ_T) + r_T Δ_{0T}(·), where the Δ_{0T}(·) are in C₁(L) with ‖Δ_{0T}‖₂ bounded away from 0.

Then, under Assumptions E, K, M, V, and X, the test is consistent provided Tr_T² diverges with lim_{T→∞} K_min^{d/2}/(Tr_T²) = 0.

Because K_min can diverge very slowly, the rate r_T can be arbitrarily close to the parametric detection rate 1/T^{1/2}. This slightly improves on the results of Horowitz and Spokoiny (2001), who achieved a rate (ln ln T)^{1/2}/T^{1/2}. A key argument here is that the local alternatives of Theorem 3 are asymptotically very smooth, because the departure from the null r_T Δ_{0T}(·) is in C₁(Lr_T), with a Lipschitz constant Lr_T that goes to 0. Hence these alternatives differ from the general ones in Theorem 2, and they are typically detected by trigonometric series with a low degree such as K_min, so that (2.9) yields consistency of the test. On the other hand, using the single statistic R̂_{K_min} would give a test that is not consistent against the alternatives of Theorem 2, so that combining several statistics as in our procedure is crucial to achieve these opposite kinds of detection properties.

4. SIMULATION EXPERIMENTS

In this section we study the size and the power properties of the proposed procedure when testing for a null of linearity in the context of a Markov process of order 1. The resulting test is compared with the one developed by Hamilton (2001) to detect nonlinearity. First, to examine the size properties, we use the AR(1)

Y_t = ρY_{t−1} + ε_t.

Three distributions are considered for the error term: standard normal, standardized Student t with five degrees of freedom, and a centered and standardized exponential. To examine the sensitivity of the tests to temporal dependence, we consider various values of the autoregressive parameter ρ, namely, ρ = 0, 0.25, 0.50, 0.75. To implement our test, we choose the interval for projecting the covariate Y_{t−1} onto the trigonometric expansion (Λ in Section 2) as [−λ,λ] with λ equal to 2 standard errors of Y_t under the null. This covers approximately 95% of the observations. The set K_T is equal to {1,2,4,8,16}. The asymptotic critical value is given by z_{0.05} = χ_{0.05}(1), where χ_{0.05}(1) is the critical value at 5% of a chi-square with one degree of freedom. We study the small-sample properties of the test for various values of the penalty parameter γ_T. We fix γ_T equal to c(ln ln T)^{1/2}, where we set c = 2, 3, 5. The parameters are estimated by ordinary least squares (OLS). The sample size is set to 200, and the number of simulations is equal to 10,000.
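A Monte Carlo sketch of this size experiment, reusing select_degree from Section 2. Reading the admissible set as numbers of Fourier coefficients κ (so that κ_min = 1 matches the χ²(1) critical value) and setting γ_T = c(ln ln T)^{1/2} are our assumptions, not the authors' code:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def fourier_cols(x, kappa, lam):
    """First kappa functions of the reindexed trigonometric system on [-lam, lam]."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * lam))]
    k = 1
    while len(cols) < kappa:
        cols.append(np.cos(np.pi * k * x / lam) / np.sqrt(lam))
        cols.append(np.sin(np.pi * k * x / lam) / np.sqrt(lam))
        k += 1
    return np.column_stack(cols)[:, :kappa]

def stat_kappa(resid, x, kappa, lam, s2):
    """Chi-square statistic with kappa coefficients and homoskedastic weight s2."""
    Psi = fourier_cols(x, kappa, lam)
    b = Psi.T @ resid
    return float(b @ np.linalg.solve(Psi.T @ Psi, b)) / s2

def size_experiment(rho=0.50, T=200, n_sim=1000, alpha=0.05, c=3.0):
    kappas = [1, 2, 4, 8, 16]
    gamma_T = c * np.sqrt(np.log(np.log(T)))
    z_alpha = chi2.ppf(1.0 - alpha, df=1)      # chi-square(1) critical value
    rejections = 0
    for _ in range(n_sim):
        e = rng.standard_normal(T + 50)
        y = np.zeros(T + 50)
        for t in range(1, T + 50):             # 50 burn-in draws for stationarity
            y[t] = rho * y[t - 1] + e[t]
        y = y[50:]
        x, yy = y[:-1], y[1:]
        theta = (x @ yy) / (x @ x)             # OLS slope under the null
        resid = yy - theta * x
        lam = 2.0 * y.std()                    # Lambda covering ~95% of the Y_t's
        keep = np.abs(x) <= lam
        R = {k: stat_kappa(resid[keep], x[keep], k, lam, resid.var())
             for k in kappas}
        c_k = {k: k for k in kappas}
        k_hat = select_degree(R, c_k, kappas, gamma_T)
        rejections += R[k_hat] >= z_alpha
    return rejections / n_sim
```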

The simulation results for the size, which are presented in Table 1, are encouraging. For c = 2 the test slightly overrejects in all cases. However, for c = 3,5, the size is accurate whatever the distribution, persistence, and number of observations considered. The Lagrange multiplier (LM) test developed by Hamilton (2001) shares these good size properties.

Size properties (5%) of our test and Hamilton test (LM) (200 observations)

To study the effect on power of the penalty sequence γ_T, two alternative specifications of the linear autoregressive process are examined. The first specification is a threshold autoregressive model defined as

Y_t = ρ₁Y_{t−1}1(Y_{t−1} > 0) + ρ₂Y_{t−1}1(Y_{t−1} ≤ 0) + ε_t,

where ε_t is i.i.d. N(0,1).7

7. Only results for the normal distribution are reported here because the results for the two other distributions are very similar; those results can be obtained upon request.

This representation contains two regimes delimited by a threshold equal to zero. When Y_{t−1} is greater than zero, the dynamic dependence is controlled by the parameter ρ₁. When it is below zero, the dynamics depend on the parameter ρ₂. Under the null of linearity, ρ₁ = ρ₂. The distance from the null is a function of the absolute value of the difference between ρ₁ and ρ₂. To see this, we can rewrite the threshold autoregressive model as follows:

Y_t = ρ₁Y_{t−1} + δY_{t−1}1(Y_{t−1} ≤ 0) + ε_t, with δ = ρ₂ − ρ₁.

Thus, under the null, μ(X_t) = ρ₁Y_{t−1}, whereas the nonlinear alternative is μ(X_t) = ρ₁Y_{t−1} + δY_{t−1}1(Y_{t−1} ≤ 0). To examine the sensitivity of the tests to temporal dependence, we consider various types of dependence for the process Y_t. We run the following experiments: (1) ρ₁ = 0 and ρ₂ = 0.25, 0.50, 0.75; (2) ρ₁ = 0.25 and ρ₂ = 0.50, 0.75, −0.50; (3) ρ₁ = 0.50 and ρ₂ = 0.25, 0, −0.25; and (4) ρ₁ = 0.75 and ρ₂ = 0.50, 0.25, 0. The values of ρ₂ under the alternative are chosen such that the absolute value of the parameter δ that governs the distance from the null is equal to 0.25, 0.50, and 0.75, respectively. Table 2 reports the power results. Our test is more powerful than Hamilton's in all cases. Our power gains increase with the degree of temporal dependence and the distance of the alternative from the null. The difference in the rejection rates can be as high as 38%.
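Under the same illustrative assumptions as the harness above, this first power experiment only changes the data-generating process:

```python
import numpy as np

def simulate_tar(rho1, rho2, T, rng, burn=50):
    """Draw from the threshold AR model:
    Y_t = rho1*Y_{t-1}*1(Y_{t-1} > 0) + rho2*Y_{t-1}*1(Y_{t-1} <= 0) + eps_t."""
    y = np.zeros(T + burn)
    e = rng.standard_normal(T + burn)
    for t in range(1, T + burn):
        y[t] = (rho1 if y[t - 1] > 0 else rho2) * y[t - 1] + e[t]
    return y[burn:]
```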

Power properties (5%) of our test and Hamilton test (LM): First experiment (200 observations)

The second experiment corresponds to an alternative for which the data-driven optimal test is specially designed. The alternative models have the following form:

Y_t = ρY_{t−1} + f(Y_{t−1}) + ε_t, (4.1)

where f(·) is a peak function whose width is governed by a parameter τ, and ε_t is i.i.d. N(0,1). Figure 1 shows the function f(·) for τ = 1, 0.50, and 0.25, ρ = 0.50, and values of Y_t between −10 and 10. The function f(·) is symmetric around zero and more concentrated for smaller values of τ. The function is bounded between zero and one, with f(0) = 1 and lim_{x→±∞} f(x) = 0. We can easily show that the alternative (4.1) respects the drift condition of Fan and Yao (2003, Thm. 2.4) for geometric ergodicity. This alternative is then compatible with the assumptions in this paper.

Alternative model (ρ = 0.50). Dashed line, τ = 0.25; thick line, τ = 0.50; and solid line, τ = 1.

We examine the sensitivity of the tests to the narrowness of the peak and to temporal dependence. We consider the parameter values τ = 1, 0.50, 0.25 and ρ = 0.25, 0.50, 0.75. Table 3 shows the results of the experiment. For τ = 1, Hamilton's test rejects at a rate close to the nominal size. For 200 observations, our test rejects at a rate of 17% for ρ = 0.25 and 56% for ρ = 0.75. For τ = 0.50, our test also clearly dominates the test proposed by Hamilton in all cases. For a narrow peak (τ = 0.25), the rejection rates of both tests are quite similar. The better performance of the Hamilton test for this alternative compared to the one with a wider peak is probably due to the specification of the variance-covariance function of the random field underlying the test statistic. See Hamilton (2001) for further details on the construction of this test.

Power properties (5%) of our test and Hamilton test (LM): Second experiment (200 observations)

5. CONCLUDING REMARKS

This paper proposes a new adaptive rate-optimal specification test for time series. As in the maximum approach of Fan (1996) or Horowitz and Spokoiny (2001), the test combines several statistics to achieve adaptive rate-optimality. More specifically, the test builds on series regression chi-square statistics with increasing orders. A data-driven selection procedure, in the spirit of Guerre and Lavergne (2005), uses a penalty term proportional to the square root of the number of Fourier coefficients to choose the test statistic. Under the null, the retained statistic is, with high probability, a statistic with a distribution close to a chi-square. Therefore, standard chi-square critical values can be used, allowing for better control of the size of the test. This contrasts with the maximum approach, where using a null limit distribution performs poorly, as noted in Fan (1996), or is out of reach, as in Horowitz and Spokoiny (2001). Hence, the maximum approach necessitates the use of simulated critical values, limiting the scope of applications to time series models that can easily be simulated. A simulation experiment confirms the good level properties of the proposed test, which shows interesting power improvements compared to a simpler test using a single statistic, such as that of Hamilton (2001). The test is also adaptive rate-optimal and detects local alternatives approaching the null at a faster rate than in Horowitz and Spokoiny (2001). The simulation experiment shows that the choice of the penalty term has a moderate impact on the power. This illustrates the appeal of our approach, which builds on the fact that the combination mechanism inherent to adaptive testing can also be designed to achieve a level close to the nominal size.

Although our results are stated for Fourier series methods, our approach also applies to wavelets or polynomial series regression. As noted in Guerre and Lavergne (2005), the series construction of the test statistic can easily be modified to cope with additive alternatives that are not affected by the curse of dimensionality. Obtaining an accurate size in the case of kernel or local polynomial methods is theoretically feasible. The scope of applications of the new data-driven selection procedure can also be extended, as discussed in Hart (1997) for earlier adaptive procedures or as in Tjøstheim (1994) and Fan and Yao (2003) for the time series context, to many other specification hypotheses of econometric interest.

6. PROOFS OF MAIN RESULTS

The proofs are organized as follows. Important intermediate results and proofs of the main statements are given in this section. Proofs of auxiliary results are gathered in Appendixes A and B. We now introduce some notation and conventions. All functions can be set to 0 outside Λ without loss of generality. We set ε = [ε_1,…,ε_T]′ and Δ_T = [Δ_T(X_1),…,Δ_T(X_T)]′. The symbol a_T ≍ b_T means that the two sequences a_T, b_T with the same sign are such that c|a_T| ≤ |b_T| ≤ C|a_T| for some 0 < c ≤ C < ∞ and all T ≥ 1. Constants are denoted by the generic letter C and vary from expression to expression.

For notational convenience, we reindex the trigonometric functions (2.1) as ψ_1(·), ψ_2(·), … and set c_K = κ. We assume that the new ordering is such that Ψ_K = [ψ_1,…,ψ_κ] and use the notation Ψ_κ for Ψ_K. Here ψ_k = [ψ_k(X_1),…,ψ_k(X_T)]′ is a column vector of ℝ^T. Therefore Ψ_κ is a T × κ matrix and κ ≍ K^d. With a little abuse of notation, K_T denotes both the set of admissible K and the set of admissible κ, with κ between κ_min ≍ 2^{J_min d} and κ_max ≍ 2^{J_max d}. The statistic R̂_κ corresponds to R̂_K. The variance estimation rate in Assumption V is such that v_T = o(κ_max^{−3/2}/ln T).

Let ∥·∥ be the Euclidean norm of ℝ^T or ℝ^κ, that is, ∥u∥² = Σ_t u_t². If m = [m(X_1),…,m(X_T)]′, where m(·) maps Λ into ℝ, then ∥m∥² = Σ_{t=1}^T m²(X_t). Under Assumption E, E[∥Ω^{−1/2}ε∥²] = T. For a κ × κ matrix Σ = [Σ_{kℓ}]_{1≤k,ℓ≤κ}, ∥Σ∥ is the spectral radius of Σ. Recall that ∥Σu∥ ≤ ∥Σ∥∥u∥ and |u_1′Σu_2| ≤ ∥Σ∥∥u_1∥∥u_2∥. It follows that the entries of Σu are bounded by κ^{1/2}∥Σ∥ max_{1≤k≤κ}|u_k|. If Σ is a symmetric matrix, ∥Σ∥ = sup_{∥u∥=1}|u′Σu| is the largest eigenvalue in absolute value of Σ. Because P̂_κ is the orthogonal projection on the space spanned by the columns of Ω̂^{−1/2}Ψ_κ, we have ∥P̂_κu∥ ≤ ∥u∥ for all u. In what follows, we bound variances of sums using the Wolkonski–Rozanov inequality (see Fan and Yao, 2003, Prop. 2.5(ii)), which states that

|Cov(g_1(U), g_2(V))| ≤ 4 sup|g_1| sup|g_2| α(σ(U),σ(V))

for any real-valued bounded g_1(·) and g_2(·), where α(σ(U),σ(V)) denotes the mixing coefficient between the sigma fields generated by U and V.

6.1. Estimation Errors

We consider first the parametric and variance estimation errors induced by θ̂ and σ̂(·), respectively. For Δ_T(·) = μ_T(·) − m(·;θ_T), set U = Δ_T + ε and let Ω^{1/2} be the T × T diagonal matrix with entries σ(X_t). Set

R̂_κ⁰ = U′Ω^{−1/2}P_κΩ^{−1/2}U, (6.2)

where P_κ = Ω^{−1/2}Ψ_κ(Ψ_κ′Ω^{−1}Ψ_κ)^{−1}Ψ_κ′Ω^{−1/2} is the orthogonal projection onto the columns of Ω^{−1/2}Ψ_κ.

PROPOSITION 1. Consider a departure from the null such that lim sup_{T→∞} sup_{x∈Λ}|Δ_T(x)| < ∞. Under Assumptions E, M, V, and X, and if κ_min → ∞, κ_max = O(T^{1/3}/ln² T), we have R̂_κ = R̂_κ⁰ + o_P(κ^{1/2}) uniformly over κ in K_T.

Proof of Proposition 1. See Appendix A.

6.2. Proof of Theorem 1

The next proposition is the key tool to establish Theorem 1.

PROPOSITION 2. Assume that H0 holds, that is, Δ_T(·) = 0. Then, under Assumptions E, K, M, V, and X, the following statements hold.

(i) Let χ(κ) be a chi-square variable with κ degrees of freedom. Then, for any κ = κ_T in K_T, the distribution of R̂_κ is approximated by that of χ(κ), uniformly over ℝ.

(ii) Assume that (3.4) holds, that is, that for some ε > 0, γ_T ≥ (1 + ε)(2 ln ln T)^{1/2} for T large enough. Then P(K̂ ≠ K_min) = o(1).

Proof of Proposition 2. See Appendix A.

Proof of Theorem 1. Equation (3.2) and Proposition 2(ii) yield that P(K̂ ≠ K_min) = o(1) under H0. Then the definition of z_α in (2.6) and Proposition 2(i) yield that P(R̂_{K̂} ≥ z_α) = P(χ(c_{K_min}) ≥ z_α) + o(1) = α + o(1). █

6.3. Proof of Theorems 2 and 3

The next lemma is crucial for the consistency properties of the test and is used for the term R̂_{1κ} in (2.3).

LEMMA 1. Consider a departure from the null such that lim sup_{T→∞} sup_{x∈Λ}|Δ_T(x)| < ∞. Assume that Assumptions E, V, and X hold and that κ = κ_T diverges with κ = o(T^{1/3}/ln² T). Then there exists a constant C₅ > 0, depending upon s, L, and Λ, such that for any κ in K_T and any Δ(·) from Λ to ℝ in C_s(L), the bounds (6.3) and (6.4) below hold.

Proof of Lemma 1. See Appendix A.

Proof of Theorem 2. Let s ≥ d(2/C₁ − 1)/4 and L be some unknown smoothness indexes. Let K* be as in (3.6), so that K* corresponds to a κ* in the new indexation. Observe that this κ* is such that κ* diverges with κ* = o(κ_max), because the exact order of γ_T is ln^{1/2} ln T, s > 0, and κ_min is smaller than a power of ln T.

Consider now a sequence of alternatives μ_T(·) in H1(C₃ρ_T) with C₃ρ_T > 2C₅κ*^{−s/d}, where C₅ is from Lemma 1. This gives that ‖Δ_{μ,T}‖₂ − C₅κ*^{−s/d} ≥ ‖Δ_{μ,T}‖₂/2 and that T‖Δ_{μ,T}‖₂² diverges. Hence Lemma 1 gives the divergence in (6.5). Observe also that Proposition 2(i) shows that z_α = κ_min + O(κ_min^{1/2}). Hence, (6.5), applying Proposition 1 for K_T = {κ*} (so that κ_max = κ_min = κ_T), and substituting yield that the rejection event of (2.8) at K = K* has probability tending to 1, provided C₃ is large enough. The lower power bound (2.8) then shows that Theorem 2 is proved. █

Proof of Theorem 3. Because the proof of Theorem 3 is similar to the proof of Theorem 2, up to the fact that detection is achieved through κ_min, we just give the main steps. Expression (2.7) yields that R̂_{K̂} ≥ R̂_{K_min}, so that it is sufficient to show that R̂_{K_min} − z_α diverges to +∞ in probability. Building on Propositions 1 and 2(i) and Lemma 1 as for Theorem 2 now gives, because κ_min ≍ K_min^d → ∞, the desired divergence, provided Tr_T² diverges with lim_{T→∞} K_min^{d/2}/(Tr_T²) = 0 as assumed in Theorem 3. █

APPENDIX A: Proofs of Propositions 1 and 2 and Lemma 1

A.1. Preliminary Lemmas.

We begin with the estimation errors R̂_κ − R̂_κ⁰ (see (6.2)) and preliminary bounds. Define the quantities in (A.1), which are used to study the difference R̂_κ⁰ − R̂_{κ_min}⁰ in the proof of Proposition 2(ii). The next lemmas hold for general orthonormal systems {ψ_k(·), k ≥ 1} of L²(Λ,dx) with sup_k sup_{x∈Λ}|ψ_k(x)| < ∞. Recall that v_T is the rate of Assumption V, with v_T = o(κ_max^{−3/2}/ln T).

LEMMA A.1. Let the estimation errors be as in (6.2) and the quantities of (A.1) be given. Then, under Assumptions E, V, and X, the bounds (i)–(iii) used in the proofs below hold.

LEMMA A.2. Let m_T(·) and μ_T(·) from ℝ^d to ℝ be some functions with support Λ. Then, under Assumptions E, V, and X, and if κ_max = o(T^{1/3}/ln² T), the bounds (A.2)–(A.5) hold. The functions m_T(·) and μ_T(·) may depend upon (X_1,ε_1),…,(X_T,ε_T) in (A.2) but not in (A.5).

Proofs of Lemmas A.1 and A.2. See Appendix B.

The next lemma is used for Proposition 2. It is stated for general maps φ_k(·) from ℝ^d to ℝ, k ≥ 1. Consider the row vector Φ_κ(X_t) = [φ_1(X_t),…,φ_κ(X_t)] and the matrix Φ_κ = [Φ_κ(X_1)′,…,Φ_κ(X_T)′]′. Define V_κ = E[ε_t²Φ_κ(X_t)′Φ_κ(X_t)]. We make the following assumption.

Assumption B. The matrices V_κ have an inverse with sup_κ∥V_κ^{−1}∥ < ∞, and the functions φ_k(·) are such that sup_{k≤κ} sup_{x∈Λ}|φ_k(x)| ≤ φ for some finite bound φ.

Define

Q_T = ‖T^{−1/2} Σ_{t=1}^T V_κ^{−1/2}Φ_κ′(X_t)ε_t‖².

We now study the tail probability of Q_T.

LEMMA A.3. Let Q_T = Q_{κT} be as before. Then, under Assumptions E, X(i), and B, and if κ = κ_T = o(T^{(3/4)[(1+a)/(5+3a)]}), the following statements hold.

(i) Let χ(κ) be a chi-square variable with κ degrees of freedom. Then the distribution of Q_T is approximated by that of χ(κ) up to the error term established in (B.9).

(ii) Consider ε > 0. Then there exists a constant C_ε, which does not depend upon κ and γ, such that, for any γ > ε and κ, a Gaussian-type tail bound holds for the probability that Q_T exceeds κ + γκ^{1/2}.

Proof of Lemma A.3. See Appendix B.

A.2. Proof of Propositions 1 and 2.

Proof of Proposition 1. For brevity of notation, the proof is given for p = dim θ = 1. Define

This gives

Under Assumption M,

, which gives

, so that

. Consider now A_κ. Under Assumption M, the Taylor formula gives a second-order expansion with a θ_{tT}* between θ̂ and θ_T, and where m_1 and m_2 are column vectors with bounded entries given by the first- and second-order derivatives. Because U = Δ_T + ε, this gives

The Cauchy–Schwarz inequality gives |A| ≤ ∥e(θ)∥∥ΔT∥ with

, so that

because

by the Markov inequality and Assumption E. Because

and under Assumption M, applying (A.5) for A and the Cauchy–Schwarz inequality for A gives

Substituting in the expression of R̂_κ − R̂_κ⁰ gives

But

so that substituting (A.4) in the preceding equation and using (A.6) gives the desired result. █

Proof of Proposition 2. Define

Under the null, Proposition 1 yields

Hence Proposition 2(i) follows from taking κ = κmin in Lemma A.3(i) and (A.7). Consider now Proposition 2(ii). Let ε be as in (3.4), so that

for T large enough. Therefore (A.7) yields that Proposition 2(ii) is a consequence of

To prove (A.8), we first rewrite R̂_κ⁰ − R̂_{κ_min}⁰ as a suitable quadratic form. For k, κ > κ_min, let

be as in (A.1) and consider the row vectors

,

for some regular κ × κ matrix βκ. Elementary algebra gives

Hence

We now verify that the quadratic form above obeys the conditions of Lemma A.3. Lemma A.1(i) yields that the corresponding matrices V_κ have uniformly bounded inverses, so that Assumption B holds taking φ = O(κ_min^{1/2}) = O(ln^{C₂/2} T). Recall that κ − κ_min ≍ 2^{jd} − 2^{J_min d} by the definition (3.1) of K_T. Hence Lemma A.3(ii) yields, for (A.8),

A.3. Proof of Lemma 1.

In this proof, we apply Lemmas A.1 and A.2 for K_T = {κ}, which is such that κ = κ_min = κ_max = o(T^{1/3}/ln² T). The Jackson theorem (see Timan, 1994, eqn. (8), p. 278) yields that there is a trigonometric polynomial function Π(·) = Π_{Δ_T}(·) with degree ≍ κ^{1/d} such that

Because

is bounded away from 0 over Λ in probability, (A.9) implies that

Note that

. Let Π = [Π(X1),…, Π(XT)]′, which is such that

because

is in the space spanned by the columns of

. Hence the triangle inequality and (A.9) give

In the expression (A.9) of Π(·), write β = [β1,…, βκ]′, so that the definitions of

in (6.2) and Lemma A.1(ii) give

Substituting shows that (6.3) is proved. Equation (6.4) follows from (A.5) and Assumption V, which gives

APPENDIX B: Proof of Lemmas A.1–A.3

B.1. Proof of Lemma A.1.

We begin with Lemma A.1(i). Note that ∥Σ_κ∥ is the largest eigenvalue of the symmetric matrix Σ_κ and ∥Σ_κ^{−1}∥ is the inverse of the smallest eigenvalue of Σ_κ. Hence

Because f (·) and σ(·) are bounded away from 0 and infinity over Λ by Assumptions E and X(ii), and because

is an orthonormal system of L2(Λ,dx), we have uniformly in κ

This gives

, and we now prove that

. Let Ψ_{κ_min,κ}(X_t) = [ψ_{κ_min+1}(X_t),…,ψ_κ(X_t)] and note that

It then follows that

where A ≽ B means that A − B is a symmetric nonnegative matrix. This gives that

because the upper bound is a diagonal block submatrix of Σκ. Observe that

is also a diagonal block of Σκ−1 by the partitioned inverse formula, so that

. This gives

. To show that

, note that

is the L²(Λ, f(x)dx/σ²(x))-orthogonal projection of ψ_k(·) on ψ_1(·),…,ψ_{κ_min}(·). The Pythagorean inequality gives, uniformly in k ≥ 1,

Therefore, the Cauchy–Schwarz inequality gives for all x and κ ≥ 1,

Consider now Lemma A.1(ii) and (iii). Define

Assumptions E and X(i) and (6.1) give

and then, by the Cauchy–Schwarz inequality

and then

, and we now bound

. We have, uniformly in k ≤ κmax,

Because κ_max ≍ K_max^d, Assumption V and κ_max²/T = o(1) yield

Therefore the smallest eigenvalue of

is bounded away from 0 and these matrices have an inverse for 1 ≤ κ ≤ κmax with a probability tending to 1. The order of

comes from the series expansion

which ends the proof of Lemma A.1(i) and (iii) because sup_κ∥Σ_κ^{−1}∥ < ∞. █

B.2. Proof of Lemma A.2.

Let us recall some results from empirical process theory useful to establish some preliminary bounds. Consider the class of functions from Λ to ℝ whose partial derivatives up to order ℓ are bounded by M_T, with ℓ as in Assumption V. Under Assumption V, there is an M_T ≍ v_T such that

Then, to establish Lemma A.2, we can view σ̂²(·) − σ²(·) as a member of such a class. Consider now a sequence of functions from Λ to ℝ and define the associated empirical process as

Modifications of bounds (8.3), (8.7), and (8.9) in Rio (2000) to account for multiplication by mT(·) and ψk(·) with supx∈Λk(x)| = 1 show that

Define

so that

. The Chebyshev inequality, (B.2), and Lemma A.1(ii) give

Observe also that the martingale structure of the εt's, Assumption E, and (6.1) yield that

It follows that

Note that (A.2) is due to the Cauchy–Schwarz inequality and the preceding bounds. Expression (A.3) follows from (B.3) and (B.6). We now prove (A.4). We have

By (B.3), (B.5), (B.6), Lemma A.1(i), and Assumption V, we have

the other remainder terms being negligible. This gives (A.4).

We now turn to (A.5). Let π_κ(·) = π_{κ,T}(·) be a trigonometric polynomial function in Π_κ with sup_{x∈Λ}|m_T(x) − π_κ(x)| ≤ 2 inf_{π(·)∈Π_κ} sup_{x∈Λ}|m_T(x) − π(x)|. Because the vector [π_κ(X_1),…,π_κ(X_T)]′ is a linear combination of the columns of Ψ_κ for all κ ≥ κ_min, it follows that

. This gives

with

Consider first the leading term

of (B.7). Because sup_{x∈Λ}|π_{κ_min}(x)| is uniformly bounded, taking ψ_1(·) = 1 gives, in (B.2),

The definition of πκmin(·) yields, under Assumption E,

This gives, for the leading term of (B.7),

For the first item of (B.8), note that Assumption E gives that

; see (6.2). Because orthogonal projection decreases the mean squared norm, this gives, for the first term in (B.8),

so that

For the second term in (B.8), observe that

Therefore Lemma A.1, (B.3), and (B.6) yield

For the last item of (B.8), (B.3), (B.4), (B.6), and Lemma A.1 give that

Substituting in (B.8) and (B.7) yields

B.3. Proof of Lemma A.3.

Abbreviate V_κ^{−1/2}Φ_κ′(X_t)ε_t into η_t. Consider a sequence of i.i.d. N(0,Id_κ) Gaussian vectors independent of the η_t's, where Id_κ is the identity matrix of dimension κ × κ. Let F be a three times differentiable real function and define the corresponding expectations compared in (B.9). The proof of Lemma A.3 is divided into three steps. The main step aims to establish that, for such an F and some C > 0 independent of κ and T,

Step 1. Proof of (B.9). We build on arguments used in the proof of the Lindeberg central limit theorem as given in Billingsley (1968, Thm. 7.2); see Horowitz and Spokoiny (2001, Lem. 10) for a similar approach in the context of adaptive testing. The proof consists of successive changes of the η_t into their Gaussian counterparts, as seen from (B.10), which follows. However, an important difference is due to the use of nonparametric series methods and dependence. Define

This gives

Define, for

. A third-order Taylor expansion of

with integral remainder yields

with

Let

be the sigma field generated by

and note that StT(0) and QtT(0) are

-measurable. Because

are centered given

, we have

Substituting the Taylor expansion in (B.10) yields

and we now bound each of these two sums.

We begin by establishing a preliminary inequality. Let n_1 and n_2 be two positive real numbers with 2 ≤ n_1 + n_2 ≤ 8. Then for any t, t′ and z ∈ [0,1],

We give a proof for the first bound, the other bound being similarly established. The Hölder inequality implies that

Because

is an N(0,σ2 Idκ), it is easily seen that

, and we now bound

. We have, by convexity, the Burkholder inequality (see Chow and Teicher, 1988, p. 396, noticing that the sum under study is a sum of martingale differences), and the Minkowski inequality

This gives

and then (B.14).

We now return to (B.13). The expression (B.11) of the third derivative of

and (B.14) yield

To study (B.12), let Φ̄_κ(X_t) = V_κ^{−1/2}Φ_κ′(X_t) = [φ̄_1(X_t),…,φ̄_κ(X_t)]′, S_tT = S_tT(0) = [S_{1tT},…,S_{κtT}]′, and Q_tT = Q_tT(0). The definitions of

show that

. Therefore because QtT and StT are

measurable, conditioning with respect to

yields, using the expression of the second-order derivative of

given in (B.11),

Let n be an integer and define

The variables

depend upon

, which are n + 1 time periods away from the φ̄_k²(X_t)'s. Because

, the Wolkonski–Rozanov inequality yields

by first integrating out with respect to the

, which are independent of the η_t's, and using (B.14). Note that

and

. This together with the definition of

and (B.14) gives

Therefore, (B.16), (B.17), and these inequalities give

Summing over t gives, in (B.12),

under Assumption X(i). An optimal choice of the order of n in (B.18) is T^{2/(5+3a)}, which gives the upper bound

. Therefore (B.18) and (B.12), (B.15), and (B.13) yield that (B.9) is proved.

Step 2. Proof of Lemma A.3(i). Now choose a three times continuously differentiable

with

if z > 0. This gives, for any

,

and then, by (B.9),

Note that the chi-square variable χ(κ), once standardized, has a continuous density and converges in distribution to a standard normal as κ goes to infinity. Therefore taking ε small enough gives Lemma A.3(i).

Step 3. Proof of Lemma A.3(ii). The proof is done by bounding the Gaussian term in (B.20). Observe that the Gaussian counterpart of Q_T has the same distribution as Σ_{k=1}^κ ζ_k², where the ζ_k's are i.i.d. N(0,1) random variables. As established in the proof of Theorem 7.2 of Billingsley (1968), changing the η_t into standard N(0,1) variables, there is a constant C_ε with

Then (B.19) and (B.20) show

Applying the Mill's ratio inequality (see Shorack and Wellner, 1986, p. 850) to the resulting Gaussian tail shows that Lemma A.3(ii) is proved. █

REFERENCES

Aït-Sahalia, Y. (1996) Testing continuous-time models of the spot interest rate. Review of Financial Studies 9, 385–426.
Aït-Sahalia, Y., P.J. Bickel, & T.M. Stocker (2001) Goodness-of-fit tests for kernel regression with an application to option-implied volatility. Journal of Econometrics 105, 363–412.
Baraud, Y., S. Huet, & B. Laurent (2003) Adaptive tests of linear hypotheses by model selection. Annals of Statistics 31, 225–251.
Bierens, H.J. (1984) Model specification testing of time series regressions. Journal of Econometrics 26, 323–353.
Billingsley, P. (1968) Convergence of Probability Measures. Wiley.
Brown, L.D. & M.G. Low (1996) Asymptotic equivalence of nonparametric regression and white noise. Annals of Statistics 24, 2384–2398.
Chen, X. & Y. Fan (1999) Consistent hypothesis testing in semiparametric and nonparametric models for econometric time series. Journal of Econometrics 91, 373–401.
Chow, Y.S. & H. Teicher (1988) Probability Theory: Independence, Interchangeability, Martingales. Springer.
Domowitz, I. & H. White (1982) Misspecified models with dependent observations. Journal of Econometrics 20, 35–58.
Fan, J. (1996) Test of significance based on wavelet thresholding and Neyman's truncation. Journal of the American Statistical Association 91, 674–688.
Fan, J. & L.S. Huang (2001) Goodness-of-fit tests for parametric regression models. Journal of the American Statistical Association 96, 640–652.
Fan, J. & Q. Yao (2003) Nonlinear Time Series: Nonparametric and Parametric Methods. Springer.
Fan, J., C. Zhang, & J. Zhang (2001) Generalized likelihood ratio statistics and Wilks phenomenon. Annals of Statistics 29, 153–193.
Gao, J. & M. King (2001) Estimation and Model Specification Testing in Nonparametric and Semiparametric Regression. Working paper, School of Mathematics and Statistics, University of Western Australia.
Gao, J. & M. King (2004) Adaptive testing in continuous-time diffusion models. Econometric Theory 20, 844–883.
Gayraud, G. & C. Pouet (2005) Adaptive minimax testing in the discrete regression scheme. Probability Theory and Related Fields 133, 531–558.
Gozalo, P.L. (1997) Nonparametric bootstrap analysis with applications to demographic effects in demand functions. Journal of Econometrics 81, 357–393.
Guerre, E. & P. Lavergne (2002) Optimal minimax rates for nonparametric specification testing in regression models. Econometric Theory 18, 1139–1171.
Guerre, E. & P. Lavergne (2005) Data-driven rate-optimal specification testing in regression models. Annals of Statistics 33, 840–870.
Hamilton, J.D. (2001) A parametric approach to flexible nonlinear inference. Econometrica 69, 537–573.
Härdle, W. & E. Mammen (1993) Comparing nonparametric versus parametric regression fits. Annals of Statistics 21, 1926–1947.
Hart, J.D. (1997) Nonparametric Smoothing and Lack-of-Fit Tests. Springer.
Hong, Y. & H. White (1995) Consistent specification testing via nonparametric series regression. Econometrica 63, 1133–1159.
Horowitz, J. & V.G. Spokoiny (2001) An adaptive, rate-optimal test of a parametric mean regression model against a nonparametric alternative. Econometrica 69, 599–631.
Ingster, Y.I. (1992, 1993) Asymptotically minimax hypothesis testing for nonparametric alternatives, parts I, II, and III. Mathematical Methods of Statistics 2, 85–114, 171–189, and 249–268.
Poo, J.R., S. Sperlich, & P. Vieu (2004) An Adaptive Specification Test for Semiparametric Models. Manuscript, Universidad de Zaragoza.
Rio, E. (2000) Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques et Applications 31. Springer.
Robinson, P.M. (1989) Hypothesis testing in semiparametric and nonparametric models for econometric time series. Review of Economic Studies 56, 511–534.
Shorack, G.R. & J.A. Wellner (1986) Empirical Processes with Applications to Statistics. Wiley.
Spokoiny, V.G. (1996) Adaptive hypothesis testing using wavelets. Annals of Statistics 24, 2477–2498.
Spokoiny, V.G. (2001) Data-driven testing the fit of linear models. Mathematical Methods of Statistics 10, 465–497.
Timan, A.F. (1994) Theory of Approximation of Functions of a Real Variable. Dover.
Tjøstheim, D. (1994) Non-linear time series: A selective review. Scandinavian Journal of Statistics 21, 97–130.