
A DATA-DRIVEN NONPARAMETRIC SPECIFICATION TEST FOR DYNAMIC REGRESSION MODELS

Published online by Cambridge University Press:  23 May 2006

Alain Guay
Affiliation:
Université du Québec à Montréal
Emmanuel Guerre
Affiliation:
LSTA, Université Paris 6

Abstract

The paper introduces a new nonparametric specification test for dynamic regression models. The test combines chi-square statistics based on Fourier series regression. A data-driven choice of the regression order, which uses a penalty proportional to the square root of the number of Fourier coefficients, is proposed. The benefits of the new test are (1) the selection procedure produces explicit chi-square critical values that give a finite-sample size close to the nominal size; (2) the test is adaptive rate-optimal and detects local alternatives converging to the null with a rate that can be made arbitrarily close to the parametric rate. Simulation experiments illustrate the practical relevance of the new test.

The first author acknowledges financial support from the Fonds Québécois de la Recherche sur la Société et la Culture (FQRSC). The second author acknowledges financial support from LSTA.

Type
Research Article
Copyright
© 2006 Cambridge University Press

1. INTRODUCTION

Starting with Bierens (1984) and Robinson (1989), nonparametric specification testing for dependent data has received much attention in the econometric literature. The range of potential applications includes nonlinearity tests and time series model building as reviewed in Tjøstheim (1994) and Fan and Yao (2003), specification of a continuous-time diffusion model for interest rates (Aït-Sahalia, 1996), specification of the Phillips curve (Hamilton, 2001), rational expectations models and conditional portfolio efficiency (Chen and Fan, 1999; Robinson, 1989), and tests of the Black and Scholes formula (Aït-Sahalia, Bickel, and Stocker, 2001) among others.

An important branch of this literature has considered a nonparametric approach that uses a smoothing parameter, such as a bandwidth or the order of a series expansion. This has raised two important issues, the detection properties and the size accuracy. The former can be addressed with efficiency considerations, as pioneered in Ingster (1992, 1993); see also Guerre and Lavergne (2002). This framework calibrates tests to detect alternatives, within a given smoothness class, that approach the null at the fastest possible rate. However, the proposed smoothing parameters depend upon the chosen smoothness class, which is too restrictive for practical applications because the choice of a smoothness class is often arbitrary. Regarding the size issue, the statistics considered in the literature are often quadratic, but the critical values are computed from a normal approximation that may be inaccurate; see Hong and White (1995) for nonparametric series and Tjøstheim (1994) for kernel methods. Recent work for independent and identically distributed (i.i.d.) observations, such as Fan, Zhang, and Zhang (2001), suggests that more sophisticated approximations should be used instead of the normal. Härdle and Mammen (1993) and Gozalo (1997), among others, have proposed bootstrapped critical values as a solution. This may be difficult when the parametric model under consideration is specified in continuous time and is therefore costly to simulate or to bootstrap. Bootstrapping is also a burden when the dynamic specification includes covariates that are not strongly exogenous and need to be simulated.

An important step for the detection issue was the development of the adaptive framework. Under this approach, the smoothness class containing the alternative is considered unknown. Adaptive tests combine several statistics, each designed for a specific class, to build a test; see Hart (1997) for a review of earlier work in this direction. Spokoiny (1996) has developed an efficiency theory for the adaptive case. Various papers considered adaptive rate-optimal tests using the maximum of the statistics, including Fan (1996), Fan and Huang (2001), Horowitz and Spokoiny (2001), and Spokoiny (1996, 2001). More specifically, Horowitz and Spokoiny (2001) have proposed an adaptive rate-optimal kernel-based specification test for a general parametric regression model that has generated various extensions. Baraud, Huet, and Laurent (2003) consider some nonasymptotic refinements of the maximum approach for specification of a linear model. Poo, Sperlich, and Vieu (2004) are interested in a semiparametric null hypothesis, whereas Gayraud and Pouet (2005) consider a nonparametric null. Gao and King (2001, 2004) and Fan and Yao (2003) have proposed extending the scope of applications to dependent data.

However, the maximum approach produces statistics with an unstable asymptotic null behavior, so that achieving an accurate size remains a difficult issue. Fan (1996) found that the null limit distribution of his test gives a poor approximation in finite samples. Horowitz and Spokoiny (2001) did not derive a null limit distribution and used simulated critical values. On the other hand, Guerre and Lavergne (2005) built on a data-driven selection procedure that, under the null, selects a prescribed statistic with high probability. Compared to the maximum approach, this considerably reduces the complexity of the null behavior of the resulting test statistic, whose asymptotic distribution is the standard normal limit of the prescribed statistic. But the statistics of Guerre and Lavergne (2005) have a complicated quadratic structure, and so these authors used bootstrapped critical values to achieve a level close to the nominal size. Hence, as mentioned earlier, such an approach may not be suitable for a dynamic model.

In this paper, a suitable modification of the Guerre and Lavergne (2005) test is proposed to derive an adaptive rate-optimal specification test with an accurate size in a dynamic setting. The null hypothesis considered is the specification of the conditional mean for a time series with heteroskedastic innovations. Nonparametric series methods are used to compute chi-square statistics of various orders, which, when the number of degrees of freedom is low, admit an accurate chi-square approximation under the null. A selection criterion, using a low penalty term proportional to the square root of the number of coefficients, chooses a test statistic. Hence the rejection region of the test can use accurate chi-square critical values. The rest of the paper is organized as follows. Section 2 presents our test and the adaptive framework on a nontechnical level. Section 3 groups our main assumptions and our main results. After studying the null behavior of the test, adaptive rate-optimality is introduced, and the test is shown to be efficient. Detection of local alternatives, approaching the null with a rate close to the parametric one, is also considered. Section 4 illustrates the size and detection properties of the test with a simulation experiment, and Section 5 concludes the paper. The proofs are grouped in Section 6 and two Appendixes.

2. HEURISTICS OF THE DATA-DRIVEN TEST

Consider an autoregressive model with exogenous variables Zt,

Y_t = μ(X_t) + ε_t, t = 1,…,T,

where the covariate vector X_t in ℝ^d collects lagged values of Y_t and the exogenous variables Z_t, with

E[ε_t|F_t] = 0,

where F_t is the past Borel field generated by X_1,…,X_t. Given T observations (Y_1,X_1),…,(Y_T,X_T), we want to test that μ(·) belongs to some parametric family {m(·;θ); θ ∈ Θ}, that is, the correct specification hypothesis

H0: μ(·) = m(·;θ_0) for some θ_0 in Θ.

The proposed procedure builds on the estimated residuals ε̂_t = Y_t − m(X_t;θ̂), where θ̂ is a consistent estimator of θ under H0, such as, for instance, the nonlinear least squares estimator

θ̂ = arg min_{θ∈Θ} Σ_{t=1}^T [Y_t − m(X_t;θ)]².
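As a concrete illustration, here is a minimal sketch of this estimation step in Python, assuming a user-supplied parametric mean function m(x, theta); the function names and the AR(1)-type usage line are illustrative and not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def nlls_residuals(y, x, m, theta0):
    """Nonlinear least squares fit of the parametric mean m(x, theta):
    returns theta_hat minimizing sum_t (y_t - m(x_t, theta))^2 and the
    estimated residuals eps_hat_t = y_t - m(x_t, theta_hat)."""
    objective = lambda theta: np.sum((y - m(x, theta)) ** 2)
    theta_hat = minimize(objective, theta0, method="Nelder-Mead").x
    return theta_hat, y - m(x, theta_hat)

# Example with a linear AR(1) mean m(x, theta) = theta * x:
# theta_hat, eps_hat = nlls_residuals(y[1:], y[:-1], lambda x, th: th[0] * x, np.zeros(1))
```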

By Y_t = μ(X_t) + ε_t, the residuals decompose as ε̂_t = ε_t + Δ̂(X_t), where Δ̂(·) = μ(·) − m(·;θ̂) indicates potential misspecification, which asymptotically vanishes under the null but not under the alternative. Our test combines nonparametric series statistics constructed by projecting the residuals to detect the presence of a significant Δ̂(·) over a compact Λ = [−λ,λ]^d. More specifically, we focus on multivariate Fourier series regression.1

1. Using other series approximation methods, such as polynomial functions or wavelets, is possible but leads to a more involved theoretical study. Indeed, the Fourier system is uniformly bounded, sup_k sup_{x∈Λ}|ψ_k(x)| < ∞, a condition that simplifies algebraic manipulations under the mixing dependence conditions. Another appeal of Fourier methods is that using wavelets may limit the scope of applications to alternatives with a maximal smoothness given by the choice of the wavelet basis; see the wavelet tests considered in Spokoiny (1996) and Theorem 2.4 therein.

For k = (k_1,…,k_d) in ℤ^d, define the kth trigonometric function over Λ as the tensor product

ψ_k(x) = ∏_{j=1}^d φ_{k_j}(x_j), with φ_0(u) = (2λ)^{−1/2}, φ_j(u) = λ^{−1/2} cos(πju/λ) for j > 0, and φ_j(u) = λ^{−1/2} sin(πju/λ) for j < 0, (2.1)

so that {ψ_k(·), k ∈ ℤ^d} is an L2(dx)-orthonormal system, that is, ∫_Λ ψ_k(x)ψ_{k′}(x) dx = 1 if k = k′ and 0 otherwise. Let |k| = max_{1≤j≤d}|k_j| be the degree of ψ_k(·). The series estimation of Δ̂(·) over Λ builds on trigonometric multivariate polynomial functions Σ_{|k|≤K} b_k ψ_k(·) of degree K, with a number c_K of coefficients b_k proportional to K^d. To account for heteroskedasticity, assume that an estimator σ̂(·) of σ(·) is given and consider the generalized least squares estimator

b̂_K = (Ψ_K′Ω̂^{−1}Ψ_K)^{−1}Ψ_K′Ω̂^{−1}ε̂, ε̂ = [ε̂_1,…,ε̂_T]′,

where Ω̂ is the diagonal matrix with entries σ̂²(X_t), and Ψ_K is the T × c_K matrix [ψ_k(X_t), 1 ≤ t ≤ T, |k| ≤ K]. Suppose first that Δ̂(·) is a trigonometric polynomial function of order K. A standard procedure to test the significance of its Fourier coefficients would use the chi-square statistic

R̂_K = ε̂′Ω̂^{−1}Ψ_K(Ψ_K′Ω̂^{−1}Ψ_K)^{−1}Ψ_K′Ω̂^{−1}ε̂, (2.2)

leading to rejection of H0 when R̂_K is large. However, assuming that Δ̂(·) has a finite series expansion of known order K is too simplistic for practical applications. More generally, an arbitrary choice of K may affect the power, and a better understanding of the impact of K is important to build a proper specification test.
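The statistic is easy to compute; the sketch below does so for d = 1 under our reading of (2.1) and (2.2). The basis normalization and the handling of observations outside Λ (basis functions set to 0 off Λ) are illustrative assumptions:

```python
import numpy as np

def fourier_basis(x, K, lam):
    """Trigonometric system of degree K on [-lam, lam] for d = 1:
    constant plus cos/sin pairs, c_K = 2K + 1 orthonormal columns in L2(dx)."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * lam))]
    for k in range(1, K + 1):
        cols.append(np.cos(np.pi * k * x / lam) / np.sqrt(lam))
        cols.append(np.sin(np.pi * k * x / lam) / np.sqrt(lam))
    return np.column_stack(cols)

def chi_square_stat(resid, x, K, lam, sigma2_hat):
    """R_K = eps' O^-1 Psi (Psi' O^-1 Psi)^-1 Psi' O^-1 eps as in (2.2),
    with O = diag(sigma2_hat) and observations restricted to Lambda."""
    inside = np.abs(x) <= lam
    Psi = fourier_basis(x[inside], K, lam)
    w = 1.0 / sigma2_hat[inside]            # diagonal of Omega^{-1}
    A = Psi.T @ (w[:, None] * Psi)          # Psi' Omega^-1 Psi
    b = Psi.T @ (w * resid[inside])         # Psi' Omega^-1 eps_hat
    return float(b @ np.linalg.solve(A, b))
```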

Set P̂_K = Ω̂^{−1/2}Ψ_K(Ψ_K′Ω̂^{−1}Ψ_K)^{−1}Ψ_K′Ω̂^{−1/2}, the matrix of the orthogonal projection onto the columns of Ω̂^{−1/2}Ψ_K, and write ε̂ = Δ̂ + ε with Δ̂ = [Δ̂(X_1),…,Δ̂(X_T)]′ and ε = [ε_1,…,ε_T]′, so that R̂_K = ‖P̂_K Ω̂^{−1/2}ε̂‖² decomposes into three terms

R̂_K = R̂_{1K} + 2R̂_{2K} + R̂_{3K}, (2.3)

with

R̂_{1K} = ‖P̂_K Ω̂^{−1/2}Δ̂‖², R̂_{2K} = Δ̂′Ω̂^{−1/2}P̂_K Ω̂^{−1/2}ε, R̂_{3K} = ‖P̂_K Ω̂^{−1/2}ε‖².

The term R̂_{1K} is crucial regarding detection of potential misspecification. It is the squared norm of the orthogonal projection of Ω̂^{−1/2}Δ̂ on the columns of Ω̂^{−1/2}Ψ_K, which increases with K up to ‖Ω̂^{−1/2}Δ̂‖², achieved for c_K ≥ T. Hence R̂_{1K} can be viewed as a downward-biased estimation of the empirical measure of misspecification ‖Ω̂^{−1/2}Δ̂‖² = Σ_{t=1}^T Δ̂²(X_t)/σ̂²(X_t), that is,

R̂_{1K} = ‖Ω̂^{−1/2}Δ̂‖² + bias_μ(K),

where bias_μ(K) ≤ 0 depends upon the unknown μ(·) and decreases in absolute value with K. The other important term in the decomposition (2.3) of the statistic R̂_K is R̂_{3K}, a pure noise term. It can be expected that R̂_{3K} is asymptotically a chi-square variable with c_K degrees of freedom, with mean c_K and variance 2c_K, so that R̂_{3K} = c_K + O_P(c_K^{1/2})

. Neglecting R̂_{2K},2

2. Assume that H0 is μ(·) = 0 and that σ(·) is known, so that ε̂_t = Y_t and the choice σ̂(·) = σ(·) is possible. In the case of Gaussian i.i.d. ε_t independent of the X_t's, R̂_{2K} would be an N(0,R̂_{1K}) variable conditionally on the X_t's, which can be neglected with respect to R̂_{1K} when this variable diverges. Note also that the distribution of R̂_{3K} coincides with its chi-square approximation for such ε_t.

and substituting in (2.3) gives a bias-variance type decomposition for R̂_K − c_K:

R̂_K − c_K = ‖Ω̂^{−1/2}Δ̂‖² + bias_μ(K) + O_P(c_K^{1/2}). (2.4)3

3. Note that R̂_K − c_K is a better misspecification indicator than R̂_K, which is affected by an additional systematic bias term c_K. Guerre and Lavergne (2005) proposed a different bias correction that makes asymptotic inference less accurate in finite samples, so that the bootstrap is used.

Looking for the best estimator R̂_K − c_K of the misspecification indicator suggests that an ideal choice of K should achieve the minimum of |bias_μ(K)| + c_K^{1/2}. However, this is infeasible in practice, at least because bias_μ(·) depends upon the unknown μ(·). Alternative feasible choices of K include the Akaike information criterion (AIC) and Bayesian information criterion (BIC) as reviewed in Hart (1997). These selection procedures consider a K achieving the maximum of

R̂_K − γc_K,

where γ is a penalty parameter. According to (2.4), this amounts to achieving the minimum of |bias_μ(K)| + (γ − 1)c_K. Therefore these selection procedures asymptotically balance |bias_μ(K)| with (γ − 1)c_K in place of the ideal order c_K^{1/2} in (2.4). This suggests using instead a lower penalty term of the form c_K + γc_K^{1/2}, which penalizes the square root of the number of coefficients, c_K^{1/2}, in place of c_K. More specifically, let K_T be a set of admissible degrees K larger than or equal to K_min. Our data-driven choice of K is

K̂ = arg max_{K∈K_T} {R̂_K − R̂_{K_min} − [(c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})]}. (2.5)

The introduction of the K_min quantities in the penalized criterion reflects a preference for low degrees, as justified now from considerations on the null behavior of the retained statistic R̂_{K̂}.
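A direct transcription of the selection rule (2.5), as reconstructed above, could look as follows; R and c are mappings from each admissible degree to R̂_K and c_K (illustrative names):

```python
import numpy as np

def select_degree(R, c, K_set, gamma_T):
    """Data-driven choice (2.5): maximize the penalized contrast
    R_K - R_Kmin - [(c_K - c_Kmin) + gamma_T (c_K^0.5 - c_Kmin^0.5)].
    The contrast is 0 at K_min, so K_hat = K_min unless a larger K wins strictly."""
    K_min = min(K_set)
    def contrast(K):
        return (R[K] - R[K_min] - (c[K] - c[K_min])
                - gamma_T * (np.sqrt(c[K]) - np.sqrt(c[K_min])))
    return max(K_set, key=lambda K: (contrast(K), -K))  # smaller K wins ties
```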

As seen from Fan (1996) or Horowitz and Spokoiny (2001), finding an accurate approximation for the null distribution of a statistic that combines the R̂_K's, as the maximum statistic max_{K∈K_T}(R̂_K − c_K)/(2c_K)^{1/2} does, is difficult. A first distinctive feature is that the selection procedure (2.5) is flexible enough to limit the contribution of the statistics with high K by taking γ_T large enough. Indeed, a limit case is γ_T = +∞, which gives K̂ = K_min. This continues to hold asymptotically provided γ_T diverges fast enough, as shown in Theorem 1 in Section 3. Moreover, as detailed now, an accurate approximation of the distribution of R̂_{K_min} is a standard chi-square. Because Δ̂(·) asymptotically vanishes under H0, (2.3) shows that the null distribution of R̂_K is approximately that of R̂_{3K} and then, neglecting the effect of the variance estimation, of

R_{3K} = ε′Ω^{−1}Ψ_K(Ψ_K′Ω^{−1}Ψ_K)^{−1}Ψ_K′Ω^{−1}ε,

where Ω^{1/2} = Diag[σ(X_1),…,σ(X_T)]. In the i.i.d. case and according to the Berry–Esseen bound in Hart (1997, Thm. 7.2), the distribution of the c_K-dimensional vector (Ψ_K′Ω^{−1}Ψ_K)^{−1/2}Ψ_K′Ω^{−1}ε has a normal approximation up to an error a(c_K)/T^{1/2}, where a(c_K) diverges with c_K. Therefore, the distribution of the chi-square statistic R_{3K} should be close to a chi-square with c_K degrees of freedom up to an error a(c_K)/T^{1/2}, which is smaller for moderate K.4

This continues to hold in the dependent setup, where the bound (B.9) in Appendix B gives a more complicated error term, which is K^{2d}/T^{1/2} at best. A normal approximation would be affected by a bigger K^{2d}/T^{1/2} + K^{−d/2} error term.

Hence the test uses a chi-square critical value z_α = z_{α,T} with

P(χ(c_{K_min}) ≥ z_α) = α,

where χ(c) is a chi-square with c degrees of freedom, and rejects H0 if5

R̂_{K̂} ≥ z_α. (2.6)
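The rejection rule (2.6) then reduces to a chi-square quantile lookup; this sketch reuses select_degree from the sketch above:

```python
from scipy.stats import chi2

def data_driven_test(R, c, K_set, gamma_T, alpha=0.05):
    """Reject H0 when R_{K_hat} exceeds the chi-square critical value z_alpha
    computed with c_{K_min} degrees of freedom, as in (2.6)."""
    K_hat = select_degree(R, c, K_set, gamma_T)
    z_alpha = chi2.ppf(1.0 - alpha, df=c[min(K_set)])
    return R[K_hat] >= z_alpha, K_hat
```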

A second distinctive feature of the selection procedure (2.5) is the standardization with c_{K_min} in the critical region {R̂_{K̂} ≥ z_α}; see (2.6). Because z_α = c_{K_min} + O(c_{K_min}^{1/2}) asymptotically, an alternative α-level critical region would use c_{K̂} in place of c_{K_min}. But such a choice would asymptotically reduce power because c_{K̂} ≥ c_{K_min}. This also contrasts with a maximum procedure that would use the test statistic max_{K∈K_T}(R̂_K − c_K)/(2c_K)^{1/2}, with an effective normalization larger than c_{K_min}. The simulation experiments of Guerre and Lavergne (2005) revealed that such a construction of the critical region (2.6) gives a test that improves on its adaptive rate-optimal competitors.

Consider now the power issue. The data-driven choice (2.5) of K combines the detection properties of each of the R̂_K's. Indeed, because c_K ≥ c_{K_min} for any K in K_T, we have

R̂_{K̂} ≥ R̂_K − (c_K − c_{K_min}) − γ_T(c_K^{1/2} − c_{K_min}^{1/2}) for all K in K_T. (2.7)

This gives the power lower bound

P(R̂_{K̂} ≥ z_α) ≥ max_{K∈K_T} P(R̂_K ≥ z_α + c_K − c_{K_min} + γ_T(c_K^{1/2} − c_{K_min}^{1/2})), (2.8)

which holds in particular for an optimal K that balances the bias with the penalty term. Taking K = K_min gives that

P(R̂_{K̂} ≥ z_α) ≥ P(R̂_{K_min} ≥ z_α), (2.9)

a power bound that shows that the test (2.6) improves on the one using the single statistic R̂_{K_min}. As seen from (2.4) and (2.8), consistency holds as soon as there is a degree K in K_T such that the misspecification measure ‖Ω̂^{−1/2}Δ̂‖² + bias_μ(K) is asymptotically larger than the sum c_K + γ_T c_K^{1/2} of the degrees of freedom and the penalty. Hence increasing γ_T too much should give a less powerful test. The form of the low penalty term in (2.5) is crucial to show adaptive rate-optimality; see Theorem 2 in Section 3. Theorem 3 in Section 3 shows that the test detects Pitman local alternatives with a rate arbitrarily close to the rate T^{−1/2}.

3. MAIN RESULTS

3.1. Main Assumptions

Consider T observations (Y_t,X_t) with Y_t = μ(X_t) + ε_t, and where μ(·) can depend upon T, in which case (Y_t,X_t) forms a triangular array (Y_{tT},X_{tT}). Let F_t^− and F_t^+ denote the Borel fields generated by X_{1T},…,X_{tT} and by X_{tT},X_{t+1,T},…, respectively. The α-mixing coefficients of {X_{tT}} are

α(n) = sup_t sup_{A∈F_t^−, B∈F_{t+n}^+} |P(A ∩ B) − P(A)P(B)|.

The next assumptions deal with the ε_t's, the mixing coefficients, and the parametric mean.

Assumption E. Let F̄_t be the Borel field generated by (X_1,ε_0),…,(X_t,ε_{t−1}). The variables {ε_t} are martingale differences with E[ε_t|F̄_t] = 0. The standard deviation function, σ(·) = Var^{1/2}[ε_t|X_t = ·], is continuous and bounded away from 0 on ℝ^d.

Assumption X. The process {(X_t,ε_t); t ≥ 1} on ℝ^d × ℝ is stationary, with the following conditions holding.

(i) α(n) ≤ An^{−1−a} for some constants A, a > 0.

(ii) The variable X_t has a density f(·) with respect to the Lebesgue measure on ℝ^d. The density f(·) is bounded away from 0 and infinity over Λ.

Assumption M. The parameter set Θ is a subset of ℝ^p, and the following conditions hold.

(i) The regression function m(x;θ) is twice continuously differentiable with respect to θ. The gradient m^{(1)}(x;θ) and Hessian matrix m^{(2)}(x;θ) are bounded over Λ × Θ.

(ii) For any sequence of regression functions μ_T(·) with sup_T sup_{x∈Λ}|μ_T(x)| < ∞, there exists a sequence of parameters θ_T in Θ such that θ̂ − θ_T = O_P(T^{−1/2}), with θ_T = θ if μ_T(·) = m(·;θ) for some θ in Θ.

Assumption E ensures that the sums Σ_{t≤T} ψ_k(X_t)ε_t/σ(X_t) are martingales that are asymptotically normal under Assumption X(i). The polynomial mixing rate of X(i) is a minimal rate to achieve T^{1/2}-consistency in the weak law of large numbers for the empirical mean T^{−1}Ψ_K′Ω^{−1}Ψ_K. Under Assumption X(ii), the limit of T^{−1}Ψ_K′Ω^{−1}Ψ_K has an inverse. Mixing conditions for Markovian (Y_t,X_t) as in Assumption X(i) can be derived using a drift condition; see Fan and Yao (2003, Thm. 2.4) and the references therein. When μ_T(·) = μ(·) is fixed, the sequence θ_T in Assumption M(ii) is the pseudo–true value

θ_T = arg min_{θ∈Θ} E[μ(X_t) − m(X_t;θ)]²,

which is uniquely defined under identification of the parametric regression model; see Domowitz and White (1982). Assumption M(i) then ensures that Δ̂(·) = μ_T(·) − m(·;θ̂) is close to Δ(·) = μ_T(·) − m(·;θ_T) over Λ up to an O_P(T^{−1/2}) term.

Let us now turn to the construction of the test. The first assumption specifies a set of admissible degrees K_T in the spirit of the dyadic bandwidth set of Horowitz and Spokoiny (2001).

Assumption K. Let a be as in Assumption X. Set K_max = 2^{J_max} = O(T^{C_1/d}) for some C_1 in (0, (3/4)[(1 + a)/(5 + 3a)]), and K_min = 2^{J_min} → ∞ with K_min^d = O(ln^{C_2} T) for some C_2 > 0, where J_min ≤ J_max are integer numbers. The set of admissible degrees K_T is dyadic, that is,

K_T = {2^j; J_min ≤ j ≤ J_max}. (3.1)
Note that (3.1) and the polynomial divergence rate of K_max imply that the cardinality #K_T = J_max − J_min + 1 is of exact order ln T. Such a restriction is helpful to show that K̂ = K_min asymptotically under the null but also has some practical justifications. Indeed, achieving a small P(K̂ ≠ K_min) is an important condition to get an accurate size. Because the penalized criterion in (2.5) vanishes if and only if K = K_min, (2.5) yields that K̂ ≠ K_min if and only if one of these penalized statistics is strictly positive for a K ≠ K_min, or equivalently

{K̂ ≠ K_min} = ∪_{K∈K_T, K≠K_min} {R̂_K − R̂_{K_min} > (c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})}. (3.2)

Hence

P(K̂ ≠ K_min) ≤ (#K_T) max_{K∈K_T, K≠K_min} P(R̂_K − R̂_{K_min} > (c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})),

so that this bound increases with the cardinality of K_T and decreases with the penalty sequence γ_T. Therefore, using a parsimonious K_T can improve the size accuracy of the test. On the other hand, a dyadic K_T as in Assumption K contains sequences with any arbitrary order between ln^{C_2} T and T^{C_1}, which is sufficient for adaptive rate-optimality. The constant C_1 of Assumption K must be smaller than (3/4)[(1 + a)/(5 + 3a)] < 1/4, where a comes from Assumption X(i), α(n) = O(n^{−1−a}). This gives a K_max of order T^{1/(4d)} at best, whereas, in the i.i.d. setup, Hong and White (1995) allowed for a better order T^{1/(3d)} when using a single series statistic on which to base the test.

Let us now turn to variance estimation. The next condition allows us to approximate T^{−1}Ψ_K′Ω̂^{−1}Ψ_K with T^{−1}Ψ_K′Ω^{−1}Ψ_K for degrees K depending on the sample size T, as in Assumption K.

Assumption V. Let σ̂(·) = σ̂(·;(Y_1,X_1),…,(Y_T,X_T)) be an estimator of σ(·). Then, for the considered sequence of regression models, σ̂(·) is uniformly consistent over Λ and, for some integer ℓ > d/2 and all (ℓ_1,…,ℓ_d) with ℓ_1 + ··· + ℓ_d ≤ ℓ, the corresponding partial derivatives of σ̂²(·) − σ²(·) are O_P(v_T) uniformly over Λ, where v_T = o(K_max^{−3d/2}/ln T) and lim inf_{T→∞} T^{1/2}v_T > 0.

Assumption V requires consistency of σ̂(·) under the null and the alternative. Convergence of the partial derivatives with the rate v_T requires that μ_T(·) and σ(·) satisfy a minimal smoothness condition. As seen from Guerre and Lavergne (2002), consistency is not necessary under the alternative but can be useful to get a powerful test. Under homoskedasticity, a simple choice of σ̂ is a constant difference-based estimator, in which case Assumption V holds with a best possible v_T = T^{−1/2}, so that K_max = o(T^{1/(3d)}/ln^{2/(3d)} T). The heteroskedastic case requires nonparametric variance estimation, such as kernel, sieve, or series methods; see, among others, Guerre and Lavergne (2002, 2005) and Horowitz and Spokoiny (2001). The rate v_T is then the consistency rate for the ℓth partial derivatives, which restricts the divergence rate of K_max.

3.2. Asymptotic Behavior under the Null

As discussed following (3.2) and (2.9), a fast divergence rate for γ_T is useful to achieve an accurate size under the null but may negatively affect the power properties of the test. Therefore, an important issue is to find a minimal divergence rate for γ_T ensuring that the test is asymptotically of level α, or equivalently that P(K̂ ≠ K_min) asymptotically vanishes under H0. The Bonferroni inequality gives, in (3.2),

P(K̂ ≠ K_min) ≤ Σ_{K∈K_T, K≠K_min} P(R̂_K − R̂_{K_min} > (c_K − c_{K_min}) + γ_T(c_K^{1/2} − c_{K_min}^{1/2})), (3.3)

and showing that the last sum asymptotically vanishes for small γ_T necessitates precise uniform bounds for these probabilities, so that simple Chebyshev-type inequalities may not be sufficient. Better Gaussian-type bounds in the spirit of the Mill's ratio inequality

P(N(0,1) ≥ z) ≤ exp(−z²/2)/(z(2π)^{1/2}), z > 0,

are derived in Lemma A.3 in Appendix A. Because the exact order of #K_T is ln T, the next theorem ensures that the asymptotic size of the test is α provided that the penalty sequence γ_T diverges faster than (ln ln T)^{1/2}.

THEOREM 1. Consider that the null hypothesis H0 is true and assume that Assumptions E, K, M, V, and X hold. Then, if γ_T diverges with

lim inf_{T→∞} γ_T/(2 ln ln T)^{1/2} > 1, (3.4)

we have lim_{T→∞} P(K̂ = K_min) = 1, and the test (2.6) is asymptotically of level α.

The minimal divergence rate (ln ln T)^{1/2} ensuring that the test is asymptotically of level α is surprisingly low compared to the penalty term of order ln T used in the BIC criterion. Such an improvement comes from the Gaussian-type bounds used for the tails of the standardized differences R̂_K − R̂_{K_min}. Indeed, this gives, up to remainder terms, a bound of order (#K_T)exp(−γ_T²/2) in (3.3), which asymptotically vanishes provided that (3.4) holds. On the other hand, such a low rate is in line with previous findings for rate-optimal adaptive testing. Indeed, (3.2) shows that suitable γ_T should resemble the critical values of a maximum test such as that of Fan (1996), who found critical values with a typical rate of (2 ln ln T)^{1/2}. This suggests that our minimal rate condition (3.4) cannot be improved.

Another condition for Theorem 1 to hold is that K_min diverges with the sample size; see Assumption K. This is used to neglect the parametric estimation error θ̂ − θ_T in the chi-square approximation of the distribution of R̂_{K_min}. Accounting for such an effect would allow us to consider a fixed K_min; see, for example, Hart (1997, Sect. 8.3.1).

3.3. Detection of Small Alternatives

As discussed following equation (2.9), the detection properties of the test depend upon the bias term from (2.4). Establishing formal adaptive rate-optimality of the test necessitates bounding this bias. The current mathematical approach to do so makes use of some smoothness restrictions. We consider here Hölder smoothness classes C_s(L) that we introduce now. Define the departure from the null as

Δ_{μ,T}(·) = μ_T(·) − m(·;θ_T),

with a θ_T as in Assumption M. We restrict ourselves to departures Δ(·) with a restriction to Λ that admits a (2λ)-periodic extension. Consider first the case s ∈ (0,1], for which

C_s(L) = {Δ(·); |Δ(x) − Δ(y)| ≤ L‖x − y‖^s for all x, y in Λ}.

For real s > 0, let ⌊s⌋ be the lower integer part of s, that is, the unique integer number satisfying ⌊s⌋ < s ≤ ⌊s⌋ + 1, so that s − ⌊s⌋ is in (0,1] with s − ⌊s⌋ = s for s ∈ (0,1]. For any s > 0, the smoothness class C_s(L) is defined as the set of functions Δ(·) with partial derivatives of order up to ⌊s⌋ whose ⌊s⌋th-order partial derivatives all lie in C_{s−⌊s⌋}(L). Hence the smoothness class C_s(L) is defined for all s > 0 and L > 0. Lemma 1 in Section 6 gives, for the bias term of (2.4), the following bound:

|bias_μ(K)| ≤ C₅TL²K^{−2s} (3.5)

for any Δ_{μ,T}(·) in C_s(L) and any K. This gives, for small alternatives, which are the harder to detect, a drift of order T‖Δ_{μ,T}‖₂² − C₅TL²K^{−2s} for the statistic R̂_K − c_K of (2.4).

Our minimax adaptive framework evaluates tests uniformly over alternatives at distance ρ from the null, that is, in

H1(ρ) = {μ_T(·); Δ_{μ,T}(·) ∈ C_s(L) and ‖Δ_{μ,T}‖₂ ≥ ρ},

with unknown smoothness index (L,s). Such alternatives allow for a general shape of Δ_{μ,T}(·), with narrow peaks and valleys that may depend upon T; see Horowitz and Spokoiny (2001). As pointed out in Guerre and Lavergne (2005), uniform consistency over H1(ρ_T) is equivalent to consistency against any sequence μ_T(·) in H1(ρ_T), as considered here. A crucial issue is the choice of a suitable asymptotically vanishing rate ρ_T. Indeed, some of the alternatives of H1(ρ_T) will not be detected by any test if ρ_T goes to 0 at too rapid a rate. On the other hand, detection can become straightforward if ρ_T remains far from the null. Hence a good candidate ρ_T to evaluate a test is a frontier rate that separates these two extreme situations. In the adaptive approach, such a rate depends upon the unknown smoothness index s, and Spokoiny (1996) has shown that the optimal adaptive rate is

ρ_T = ((ln ln T)^{1/2}/T)^{2s/(4s+d)},6

6. Spokoiny (1996) studied the continuous time white noise model (CTWN) dY(t) = f(t)dt + T^{−1/2}dW(t), t ∈ [0,1], where {W(t)}_{t∈[0,1]} is a standard Brownian motion. Although this model is mainly of theoretical interest, results established for the CTWN model extend to more common models through model equivalence; see Brown and Low (1996).

which is slower than the parametric rate T^{−1/2}. Guerre and Lavergne (2002) derived an optimal rate for a known smoothness index s that improves on ρ_T by the (ln ln T)^{1/2} factor, so that the price to pay for rate adaptation is moderate. As is well known, the rate ρ_T decreases faster than the nonparametric estimation rate T^{−s/(2s+d)}. The adaptive rate-optimality of our test is stated in the next result.

THEOREM 2. Consider a sequence of alternatives μ_T(·) with Δ_{μ,T}(·) in C_s(L) and ‖Δ_{μ,T}‖₂ ≥ C₃ρ_T, where s ≥ d(2/C₁ − 1)/4, L > 0, C₃ > 0, and ρ_T = ((ln ln T)^{1/2}/T)^{2s/(4s+d)}. Assume that Assumptions E, K, M, and V hold. Then, if γ_T is of exact order (ln ln T)^{1/2} and provided C₃ is taken large enough, the test is consistent, that is, lim_{T→∞} P(R̂_{K̂} ≥ z_α) = 1.

The proof of Theorem 2 builds on the lower power bound (2.8) and on the bias-variance decomposition (2.4). In view of the bias order (3.5) for small alternatives, an optimal choice of K in (2.8) is such that the order of the penalty term γ_T K^{d/2} is proportional to TK^{−2s}, that is,

K* = [(T/γ_T)^{2/(4s+d)}], (3.6)

where [·] is the integer part. Such a K* detects alternatives within the bias order divided by the sample size, K*^{−s} ∝ (γ_T/T)^{2s/(4s+d)}, which coincides with the optimal adaptive order ρ_T provided γ_T has the smallest possible order (ln ln T)^{1/2} compatible with Theorem 1. Note that, under Assumption K, K* is in K_T provided s ≥ d(2/C₁ − 1)/4, which implies that s > 7d/4.

Because adaptation means detection over various smoothness classes C_s(L), it is crucial that the test combine several statistics, as seen from the optimal K* in (3.6), which depends on the smoothness index s. Therefore, tests that use a single statistic R̂_K generally fail to be adaptive rate-optimal. A more specific property of the test (2.6) is detection of small local alternatives.

THEOREM 3. Consider a sequence of local alternatives μ_T(·) satisfying

μ_T(·) = m(·;θ_T) + r_T Δ_{0T}(·), where the Δ_{0T}(·) are in C₁(L) with ‖Δ_{0T}‖₂ bounded away from 0.

Then, under Assumptions E, K, M, V, and X, the test is consistent provided Tr_T² diverges with lim_{T→∞} K_min^{d/2}/(Tr_T²) = 0.

Because K_min can diverge very slowly, the rate r_T can be arbitrarily close to the parametric detection rate 1/T^{1/2}. This slightly improves on the results of Horowitz and Spokoiny (2001), who achieved a rate (ln ln T)^{1/2}/T^{1/2}. A key argument here is that the local alternatives of Theorem 3 are asymptotically very smooth, because the departure from the null r_T Δ_{0T}(·) is in C₁(Lr_T), with a Lipschitz constant Lr_T that goes to 0. Hence these alternatives differ from the general ones in Theorem 2, and they are typically detected by trigonometric series with a low degree such as K_min, so that (2.9) yields consistency of the test. On the other hand, using the single statistic R̂_{K_min} would give a test that is not consistent against the alternatives of Theorem 2, so that combining several statistics as in our procedure is crucial to achieve these opposite kinds of detection properties.

4. SIMULATION EXPERIMENTS

In this section we study the size and the power properties of the proposed procedure when testing for a null of linearity in the context of a Markov process of order 1. The resulting test is compared with the one developed by Hamilton (2001) to detect nonlinearity. First, to examine the size properties, we use the AR(1)

Y_t = ρY_{t−1} + ε_t.

Three distributions are considered for the error term: standard normal, standardized Student t with five degrees of freedom, and a centered and standardized exponential. To examine the sensitivity of the tests to temporal dependence, we consider various values of the autoregressive parameter ρ, namely, ρ = 0, 0.25, 0.50, 0.75. To implement our test, we choose the interval for projecting the covariate Y_{t−1} onto the trigonometric expansion (Λ in Section 2) as [−λ,λ] with λ equal to 2 standard errors of Y_t under the null. This covers approximately 95% of the observations. The set K_T is equal to {1,2,4,8,16}. The asymptotic critical value is given by z_{0.05} = χ_{0.05}(1), where χ_{0.05}(1) is the critical value at 5% of a chi-square with one degree of freedom. We study the small-sample properties of the test for various values of the penalty parameter γ_T. We fix γ_T equal to c(ln ln T)^{1/2}, where we set c = 2, 3, 5. The parameters are estimated by ordinary least squares (OLS). The sample size is set to 200, and the number of simulations is equal to 10,000.
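A Monte Carlo sketch of this size experiment, reusing select_degree from Section 2. Reading the admissible set as numbers of Fourier coefficients κ (so that κ_min = 1 matches the χ²(1) critical value) and setting γ_T = c(ln ln T)^{1/2} are our assumptions, not the authors' code:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def fourier_cols(x, kappa, lam):
    """First kappa functions of the reindexed trigonometric system on [-lam, lam]."""
    cols = [np.full_like(x, 1.0 / np.sqrt(2.0 * lam))]
    k = 1
    while len(cols) < kappa:
        cols.append(np.cos(np.pi * k * x / lam) / np.sqrt(lam))
        cols.append(np.sin(np.pi * k * x / lam) / np.sqrt(lam))
        k += 1
    return np.column_stack(cols)[:, :kappa]

def stat_kappa(resid, x, kappa, lam, s2):
    """Chi-square statistic with kappa coefficients and homoskedastic weight s2."""
    Psi = fourier_cols(x, kappa, lam)
    b = Psi.T @ resid
    return float(b @ np.linalg.solve(Psi.T @ Psi, b)) / s2

def size_experiment(rho=0.50, T=200, n_sim=1000, alpha=0.05, c=3.0):
    kappas = [1, 2, 4, 8, 16]
    gamma_T = c * np.sqrt(np.log(np.log(T)))
    z_alpha = chi2.ppf(1.0 - alpha, df=1)      # chi-square(1) critical value
    rejections = 0
    for _ in range(n_sim):
        e = rng.standard_normal(T + 50)
        y = np.zeros(T + 50)
        for t in range(1, T + 50):             # 50 burn-in draws for stationarity
            y[t] = rho * y[t - 1] + e[t]
        y = y[50:]
        x, yy = y[:-1], y[1:]
        theta = (x @ yy) / (x @ x)             # OLS slope under the null
        resid = yy - theta * x
        lam = 2.0 * y.std()                    # Lambda covering ~95% of the Y_t's
        keep = np.abs(x) <= lam
        R = {k: stat_kappa(resid[keep], x[keep], k, lam, resid.var())
             for k in kappas}
        c_k = {k: k for k in kappas}
        k_hat = select_degree(R, c_k, kappas, gamma_T)
        rejections += R[k_hat] >= z_alpha
    return rejections / n_sim
```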

The simulation results for the size, which are presented in Table 1, are encouraging. For c = 2 the test slightly overrejects in all cases. However, for c = 3,5, the size is accurate whatever the distribution, persistence, and number of observations considered. The Lagrange multiplier (LM) test developed by Hamilton (2001) shares these good size properties.

Size properties (5%) of our test and Hamilton test (LM) (200 observations)

To study the effect on power of the penalty sequence γ_T, two alternative specifications of the linear autoregressive process are examined. The first specification is a threshold autoregressive model defined as

Y_t = ρ₁Y_{t−1}1(Y_{t−1} > 0) + ρ₂Y_{t−1}1(Y_{t−1} ≤ 0) + ε_t,

where ε_t is i.i.d. N(0,1).7

7. Only results for the normal distribution are reported here because the results for the two other distributions are very similar; those results can be obtained upon request.

This representation contains two regimes delimited by a threshold equal to zero. When Y_{t−1} is greater than zero, the dynamic dependence is controlled by the parameter ρ₁. When it is below zero, the dynamics depend on the parameter ρ₂. Under the null of linearity, ρ₁ = ρ₂. The distance from the null is a function of the absolute value of the difference between ρ₁ and ρ₂. To see this, we can rewrite the threshold autoregressive model as follows:

Y_t = ρ₁Y_{t−1} + δY_{t−1}1(Y_{t−1} ≤ 0) + ε_t, with δ = ρ₂ − ρ₁.

Thus, under the null, μ(X_t) = ρ₁Y_{t−1}, whereas the nonlinear alternative is μ(X_t) = ρ₁Y_{t−1} + δY_{t−1}1(Y_{t−1} ≤ 0). To examine the sensitivity of the tests to temporal dependence, we consider various types of dependence for the process Y_t. We run the following experiments: (1) ρ₁ = 0 and ρ₂ = 0.25, 0.50, 0.75; (2) ρ₁ = 0.25 and ρ₂ = 0.50, 0.75, −0.50; (3) ρ₁ = 0.50 and ρ₂ = 0.25, 0, −0.25; and (4) ρ₁ = 0.75 and ρ₂ = 0.50, 0.25, 0. The values of ρ₂ under the alternative are chosen such that the absolute value of the parameter δ that governs the distance from the null is equal to 0.25, 0.50, and 0.75, respectively. Table 2 reports the power results. Our test is more powerful than Hamilton's in all cases. Our power gains increase with the degree of temporal dependence and the distance of the alternative from the null. The difference in the rejection rates can be as high as 38%.
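Under the same illustrative assumptions as the harness above, this first power experiment only changes the data-generating process:

```python
import numpy as np

def simulate_tar(rho1, rho2, T, rng, burn=50):
    """Draw from the threshold AR model:
    Y_t = rho1*Y_{t-1}*1(Y_{t-1} > 0) + rho2*Y_{t-1}*1(Y_{t-1} <= 0) + eps_t."""
    y = np.zeros(T + burn)
    e = rng.standard_normal(T + burn)
    for t in range(1, T + burn):
        y[t] = (rho1 if y[t - 1] > 0 else rho2) * y[t - 1] + e[t]
    return y[burn:]
```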

Power properties (5%) of our test and Hamilton test (LM): First experiment (200 observations)

The second experiment corresponds to an alternative for which the data-driven optimal test is specially designed. The alternative models have the following form:

Y_t = ρY_{t−1} + f(Y_{t−1}) + ε_t, (4.1)

where f(·) is a peak function whose width is governed by a parameter τ, and ε_t is i.i.d. N(0,1). Figure 1 shows the function f(·) for τ = 1, 0.50, and 0.25, ρ = 0.50, and values of Y_t between −10 and 10. The function f(·) is symmetric around zero and more concentrated for smaller values of τ. The function is bounded between zero and one, with f(0) = 1 and lim_{x→±∞} f(x) = 0. We can easily show that the alternative (4.1) respects the drift condition of Fan and Yao (2003, Thm. 2.4) for geometric ergodicity. This alternative is then compatible with the assumptions in this paper.

Alternative model (ρ = 0.50). Dashed line, τ = 0.25; thick line, τ = 0.50; and solid line, τ = 1.

We examine the sensitivity of the tests to the narrowness of the peak and to temporal dependence. We consider the parameter values τ = 1, 0.50, 0.25 and ρ = 0.25, 0.50, 0.75. Table 3 shows the results of the experiment. For τ = 1, Hamilton's test rejects at a rate close to the nominal size. For 200 observations, our test rejects at a rate of 17% for ρ = 0.25 and 56% for ρ = 0.75. For τ = 0.50, our test also clearly dominates the test proposed by Hamilton in all cases. For a narrow peak (τ = 0.25), the rejection rates of both tests are quite similar. The better performance of the Hamilton test for this alternative compared to the one with a wider peak is probably due to the specification of the variance-covariance function of the random field underlying the test statistic. See Hamilton (2001) for further details on the construction of this test.

Power properties (5%) of our test and Hamilton test (LM): Second experiment (200 observations)

5. CONCLUDING REMARKS

This paper proposes a new adaptive rate-optimal specification test for time series. As in the maximum approach of Fan (1996) or Horowitz and Spokoiny (2001), the test combines several statistics to achieve adaptive rate-optimality. More specifically, the test builds on series regression chi-square statistics with increasing orders. A data-driven selection procedure, in the spirit of Guerre and Lavergne (2005), uses a penalty term proportional to the square root of the number of Fourier coefficients to choose the test statistic. Under the null, the retained statistic is, with high probability, a statistic with a distribution close to a chi-square. Therefore, standard chi-square critical values can be used, allowing for better control of the size of the test. This contrasts with the maximum approach, where using a null limit distribution performs poorly, as noted in Fan (1996), or is out of reach, as in Horowitz and Spokoiny (2001). Hence, the maximum approach necessitates the use of simulated critical values, limiting the scope of applications to time series models that can easily be simulated. A simulation experiment confirms the good level properties of the proposed test, which shows interesting power improvements compared to a simpler test using a single statistic, such as that of Hamilton (2001). The test is also adaptive rate-optimal and detects local alternatives approaching the null at a faster rate than in Horowitz and Spokoiny (2001). The simulation experiment shows that the choice of the penalty term has a moderate impact on the power. This illustrates the appeal of our approach, which builds on the fact that the combination mechanism inherent to adaptive testing can also be designed to achieve a level close to the nominal size.

Although our results are stated for Fourier series methods, our approach also applies to wavelets or polynomial series regression. As noted in Guerre and Lavergne (2005), the series construction of the test statistic can easily be modified to cope with additive alternatives that are not affected by the curse of dimensionality. Obtaining an accurate size in the case of kernel or local polynomial methods is theoretically feasible. The scope of applications of the new data-driven selection procedure can also be extended, as discussed in Hart (1997) for earlier adaptive procedures or as in Tjøstheim (1994) and Fan and Yao (2003) for the time series context, to many other specification hypotheses of econometric interest.

6. PROOFS OF MAIN RESULTS

The proofs are organized as follows. Important intermediate results and proofs of the main statements are given in this section. Proofs of auxiliary results are gathered in Appendixes A and B. We now introduce some notation and conventions. All functions can be set to 0 outside Λ without loss of generality. We set ε = [ε_1,…,ε_T]′ and Δ_T = [Δ_T(X_1),…,Δ_T(X_T)]′. The symbol a_T ≍ b_T means that the two sequences a_T, b_T with the same sign are such that c|a_T| ≤ |b_T| ≤ C|a_T| for some 0 < c ≤ C < ∞ and all T ≥ 1. Constants are denoted by the generic letter C and vary from expression to expression.

For notational convenience, we reindex the trigonometric functions (2.1) as ψ_1(·), ψ_2(·), … and set c_K = κ. We assume that the new ordering is such that Ψ_K = [ψ_1,…,ψ_κ] and use the notation Ψ_κ for Ψ_K. Here ψ_k = [ψ_k(X_1),…,ψ_k(X_T)]′ is a column vector of ℝ^T. Therefore Ψ_κ is a T × κ matrix and κ ≍ K^d. With a little abuse of notation, K_T denotes both the set of admissible K and the set of admissible κ, with κ between κ_min ≍ 2^{J_min d} and κ_max ≍ 2^{J_max d}. The statistic R̂_κ corresponds to R̂_K. The variance estimation rate in Assumption V is such that v_T = o(κ_max^{−3/2}/ln T).

Let ∥·∥ be the Euclidean norm of ℝ^T or ℝ^κ, that is, ∥u∥² = Σ_t u_t². If m = [m(X_1),…,m(X_T)]′, where m(·) maps Λ into ℝ, then ∥m∥² = Σ_{t=1}^T m²(X_t). Under Assumption E, E[∥Ω^{−1/2}ε∥²] = T. For a κ × κ matrix Σ = [Σ_{kℓ}]_{1≤k,ℓ≤κ}, ∥Σ∥ is the spectral radius of Σ. Recall that ∥Σu∥ ≤ ∥Σ∥∥u∥ and |u_1′Σu_2| ≤ ∥Σ∥∥u_1∥∥u_2∥. It follows that the entries of Σu are bounded by κ^{1/2}∥Σ∥ max_{1≤k≤κ}|u_k|. If Σ is a symmetric matrix, ∥Σ∥ = sup_{∥u∥=1}|u′Σu| is the largest eigenvalue in absolute value of Σ. Because P̂_κ is the orthogonal projection on the space spanned by the columns of Ω̂^{−1/2}Ψ_κ, we have ∥P̂_κu∥ ≤ ∥u∥ for all u. In what follows, we bound variances of sums using the Wolkonski–Rozanov inequality (see Fan and Yao, 2003, Prop. 2.5(ii)), which states that

|Cov(g_1(U), g_2(V))| ≤ 4 sup|g_1| sup|g_2| α(σ(U),σ(V))

for any real-valued bounded g_1(·) and g_2(·), where α(σ(U),σ(V)) denotes the mixing coefficient between the sigma fields generated by U and V.

6.1. Estimation Errors

We consider first the parametric and variance estimation errors induced by θ̂ and σ̂(·), respectively. For Δ_T(·) = μ_T(·) − m(·;θ_T), set U = Δ_T + ε and let Ω^{1/2} be the T × T diagonal matrix with entries σ(X_t). Set

R̂_κ⁰ = U′Ω^{−1/2}P_κΩ^{−1/2}U, (6.2)

where P_κ = Ω^{−1/2}Ψ_κ(Ψ_κ′Ω^{−1}Ψ_κ)^{−1}Ψ_κ′Ω^{−1/2} is the orthogonal projection onto the columns of Ω^{−1/2}Ψ_κ.

PROPOSITION 1. Consider a departure from the null such that lim sup_{T→∞} sup_{x∈Λ}|Δ_T(x)| < ∞. Under Assumptions E, M, V, and X, and if κ_min → ∞, κ_max = O(T^{1/3}/ln² T), we have R̂_κ = R̂_κ⁰ + o_P(κ^{1/2}) uniformly over κ in K_T.

Proof of Proposition 1. See Appendix A.

6.2. Proof of Theorem 1

The next proposition is the key tool to establish Theorem 1.

PROPOSITION 2. Assume that H0 holds, that is, Δ_T(·) = 0. Then, under Assumptions E, K, M, V, and X, the following statements hold.

(i) Let χ(κ) be a chi-square variable with κ degrees of freedom. Then, for any κ = κ_T in K_T, the distribution of R̂_κ is approximated by that of χ(κ), uniformly over ℝ.

(ii) Assume that (3.4) holds, that is, that for some ε > 0, γ_T ≥ (1 + ε)(2 ln ln T)^{1/2} for T large enough. Then P(K̂ ≠ K_min) = o(1).

Proof of Proposition 2. See Appendix A.

Proof of Theorem 1. Equation (3.2) and Proposition 2(ii) yield that P(K̂ ≠ K_min) = o(1) under H0. Then the definition of z_α in (2.6) and Proposition 2(i) yield that P(R̂_{K̂} ≥ z_α) = P(χ(c_{K_min}) ≥ z_α) + o(1) = α + o(1). █

6.3. Proof of Theorems 2 and 3

The next lemma is crucial for the consistency properties of the test and is used for the term R̂_{1κ} in (2.3).

LEMMA 1. Consider a departure from the null such that lim sup_{T→∞} sup_{x∈Λ}|Δ_T(x)| < ∞. Assume that Assumptions E, V, and X hold and that κ = κ_T diverges with κ = o(T^{1/3}/ln² T). Then there exists a constant C₅ > 0, depending upon s, L, and Λ, such that for any κ in K_T and any Δ(·) from Λ to ℝ in C_s(L), the bounds (6.3) and (6.4) below hold.

Proof of Lemma 1. See Appendix A.

Proof of Theorem 2. Let s ≥ d(2/C₁ − 1)/4 and L be some unknown smoothness indexes. Let K* be as in (3.6), so that K* corresponds to a κ* in the new indexation. Observe that this κ* is such that κ* diverges with κ* = o(κ_max), because the exact order of γ_T is ln^{1/2} ln T, s > 0, and κ_min is smaller than a power of ln T.

Consider now a sequence of alternatives μ_T(·) in H1(C₃ρ_T) with C₃ρ_T > 2C₅κ*^{−s/d}, where C₅ is from Lemma 1. This gives that ‖Δ_{μ,T}‖₂ − C₅κ*^{−s/d} ≥ ‖Δ_{μ,T}‖₂/2 and that T‖Δ_{μ,T}‖₂² diverges. Hence Lemma 1 gives the divergence in (6.5). Observe also that Proposition 2(i) shows that z_α = κ_min + O(κ_min^{1/2}). Hence, (6.5), applying Proposition 1 for K_T = {κ*} (so that κ_max = κ_min = κ_T), and substituting yield that the rejection event of (2.8) at K = K* has probability tending to 1, provided C₃ is large enough. The lower power bound (2.8) then shows that Theorem 2 is proved. █

Proof of Theorem 3. Because the proof of Theorem 3 is similar to the proof of Theorem 2, up to the fact that detection is achieved through κ_min, we just give the main steps. Expression (2.7) yields that R̂_{K̂} ≥ R̂_{K_min}, so that it is sufficient to show that R̂_{K_min} − z_α diverges to +∞ in probability. Building on Propositions 1 and 2(i) and Lemma 1 as for Theorem 2 now gives, because κ_min ≍ K_min^d → ∞, the desired divergence, provided Tr_T² diverges with lim_{T→∞} K_min^{d/2}/(Tr_T²) = 0 as assumed in Theorem 3. █

APPENDIX A: Proofs of Propositions 1 and 2 and Lemma 1

A.1. Preliminary Lemmas.

We begin with the estimation errors R̂_κ − R̂_κ⁰ (see (6.2)) and preliminary bounds. Define the quantities in (A.1), which are used to study the difference R̂_κ⁰ − R̂_{κ_min}⁰ in the proof of Proposition 2(ii). The next lemmas hold for general orthonormal systems {ψ_k(·), k ≥ 1} of L²(Λ,dx) with sup_k sup_{x∈Λ}|ψ_k(x)| < ∞. Recall that v_T is the rate of Assumption V, with v_T = o(κ_max^{−3/2}/ln T).

LEMMA A.1. Let the estimation errors be as in (6.2) and the quantities of (A.1) be given. Then, under Assumptions E, V, and X, the bounds (i)–(iii) used in the proofs below hold.

LEMMA A.2. Let m_T(·) and μ_T(·) from ℝ^d to ℝ be some functions with support Λ. Then, under Assumptions E, V, and X, and if κ_max = o(T^{1/3}/ln² T), the bounds (A.2)–(A.5) hold. The functions m_T(·) and μ_T(·) may depend upon (X_1,ε_1),…,(X_T,ε_T) in (A.2) but not in (A.5).

Proofs of Lemmas A.1 and A.2. See Appendix B.

The next lemma is used for Proposition 2. It is stated for general maps φ_k(·) from ℝ^d to ℝ, k ≥ 1. Consider the row vector Φ_κ(X_t) = [φ_1(X_t),…,φ_κ(X_t)] and the matrix Φ_κ = [Φ_κ(X_1)′,…,Φ_κ(X_T)′]′. Define V_κ = E[ε_t²Φ_κ(X_t)′Φ_κ(X_t)]. We make the following assumption.

Assumption B. The matrices V_κ have an inverse with sup_κ∥V_κ^{−1}∥ < ∞, and the functions φ_k(·) are such that sup_{k≤κ} sup_{x∈Λ}|φ_k(x)| ≤ φ for some finite bound φ.

Define

Q_T = ‖T^{−1/2} Σ_{t=1}^T V_κ^{−1/2}Φ_κ′(X_t)ε_t‖².

We now study the tail probability of Q_T.

LEMMA A.3. Let Q_T = Q_{κT} be as before. Then, under Assumptions E, X(i), and B, and if κ = κ_T = o(T^{(3/4)[(1+a)/(5+3a)]}), the following statements hold.

(i) Let χ(κ) be a chi-square variable with κ degrees of freedom. Then the distribution of Q_T is approximated by that of χ(κ) up to the error term established in (B.9).

(ii) Consider ε > 0. Then there exists a constant C_ε, which does not depend upon κ and γ, such that, for any γ > ε and κ, a Gaussian-type tail bound holds for the probability that Q_T exceeds κ + γκ^{1/2}.

Proof of Lemma A.3. See Appendix B.

A.2. Proof of Propositions 1 and 2.

Proof of Proposition 1. For brevity of notation, the proof is given for p = dim θ = 1. Define

This gives

Under Assumption M,

, which gives

, so that

. Consider now A_κ. Under Assumption M, the Taylor formula gives a second-order expansion with a θ_{tT}* between θ̂ and θ_T, and where m_1 and m_2 are column vectors with bounded entries given by the first- and second-order derivatives. Because U = Δ_T + ε, this gives

The Cauchy–Schwarz inequality gives |A| ≤ ∥e(θ)∥∥ΔT∥ with

, so that

because

by the Markov inequality and Assumption E. Because

and under Assumption M, applying (A.5) for A and the Cauchy–Schwarz inequality for A gives

Substituting in the expression of R̂_κ − R̂_κ⁰ gives

But

so that substituting (A.4) in the preceding equation and using (A.6) gives the desired result. █

Proof of Proposition 2. Define

Under the null, Proposition 1 yields

Hence Proposition 2(i) follows from taking κ = κmin in Lemma A.3(i) and (A.7). Consider now Proposition 2(ii). Let ε be as in (3.4), so that

for T large enough. Therefore (A.7) yields that Proposition 2(ii) is a consequence of

To prove (A.8), we first rewrite R̂_κ⁰ − R̂_{κ_min}⁰ as a suitable quadratic form. For k, κ > κ_min, let

be as in (A.1) and consider the row vectors

,

for some regular κ × κ matrix βκ. Elementary algebra gives

Hence

We now verify that the quadratic form above obeys the conditions of Lemma A.3. Lemma A.1(i) yields that the corresponding matrices V_κ have uniformly bounded inverses, so that Assumption B holds taking φ = O(κ_min^{1/2}) = O(ln^{C₂/2} T). Recall that κ − κ_min ≍ 2^{jd} − 2^{J_min d} by the definition (3.1) of K_T. Hence Lemma A.3(ii) yields, for (A.8),

A.3. Proof of Lemma 1.

In this proof, we apply Lemmas A.1 and A.2 for K_T = {κ}, which is such that κ = κ_min = κ_max = o(T^{1/3}/ln² T). The Jackson theorem (see Timan, 1994, eqn. (8), p. 278) yields that there is a trigonometric polynomial function Π(·) = Π_{Δ_T}(·) with degree ≍ κ^{1/d} such that

Because

is bounded away from 0 over Λ in probability, (A.9) implies that

Note that

. Let Π = [Π(X1),…, Π(XT)]′, which is such that

because

is in the space spanned by the columns of

. Hence the triangle inequality and (A.9) give

In the expression (A.9) of Π(·), write β = [β1,…, βκ]′, so that the definitions of

in (6.2) and Lemma A.1(ii) give

Substituting shows that (6.3) is proved. Equation (6.4) follows from (A.5) and Assumption V, which gives

APPENDIX B: Proof of Lemmas A.1–A.3

B.1. Proof of Lemma A.1.

We begin with Lemma A.1(i). Note that ∥Σ_κ∥ is the largest eigenvalue of the symmetric matrix Σ_κ and ∥Σ_κ^{−1}∥ is the inverse of the smallest eigenvalue of Σ_κ. Hence

Because f (·) and σ(·) are bounded away from 0 and infinity over Λ by Assumptions E and X(ii), and because

is an orthonormal system of L2(Λ,dx), we have uniformly in κ

This gives

, and we now prove that

. Let Ψ_{κ_min,κ}(X_t) = [ψ_{κ_min+1}(X_t),…,ψ_κ(X_t)] and note that

It then follows that

where A ≽ B means that A − B is a symmetric nonnegative matrix. This gives that

because the upper bound is a diagonal block submatrix of Σκ. Observe that

is also a diagonal block of Σκ−1 by the partitioned inverse formula, so that

. This gives

. To show that

, note that

is the L²(Λ, f(x)dx/σ²(x))-orthogonal projection of ψ_k(·) on ψ_1(·),…,ψ_{κ_min}(·). The Pythagorean inequality gives, uniformly in k ≥ 1,

Therefore, the Cauchy–Schwarz inequality gives for all x and κ ≥ 1,

Consider now Lemma A.1(ii) and (iii). Define

Assumptions E and X(i) and (6.1) give

and then, by the Cauchy–Schwarz inequality

and then

, and we now bound

. We have, uniformly in k ≤ κmax,

Because κ_max ≍ K_max^d, Assumption V and κ_max²/T = o(1) yield

Therefore the smallest eigenvalue of

is bounded away from 0 and these matrices have an inverse for 1 ≤ κ ≤ κmax with a probability tending to 1. The order of

comes from the series expansion

which ends the proof of Lemma A.1(i) and (iii) because sup_κ∥Σ_κ^{−1}∥ < ∞. █

B.2. Proof of Lemma A.2.

Let us recall some results from empirical process theory useful to establish some preliminary bounds. Consider the class of functions from Λ to ℝ whose partial derivatives up to order ℓ are bounded by M_T, with ℓ as in Assumption V. Under Assumption V, there is an M_T ≍ v_T such that

Then, to establish Lemma A.2, we can view σ̂²(·) − σ²(·) as a member of such a class. Consider now a sequence of functions from Λ to ℝ and define the associated empirical process as

Modifications of bounds (8.3), (8.7), and (8.9) in Rio (2000) to account for multiplication by mT(·) and ψk(·) with supx∈Λk(x)| = 1 show that

Define

so that

. The Chebyshev inequality, (B.2), and Lemma A.1(ii) give

Observe also that the martingale structure of the εt's, Assumption E, and (6.1) yield that

It follows that

Note that (A.2) is due to the Cauchy–Schwarz inequality and the preceding bounds. Expression (A.3) follows from (B.3) and (B.6). We now prove (A.4). We have

By (B.3), (B.5), (B.6), Lemma A.1(i), and Assumption V, we have

the other remainder terms being negligible. This gives (A.4).

We now turn to (A.5). Let π_κ(·) = π_{κ,T}(·) be a trigonometric polynomial function in Π_κ with sup_{x∈Λ}|m_T(x) − π_κ(x)| ≤ 2 inf_{π(·)∈Π_κ} sup_{x∈Λ}|m_T(x) − π(x)|. Because the vector [π_κ(X_1),…,π_κ(X_T)]′ is a linear combination of the columns of Ψ_κ for all κ ≥ κ_min, it follows that

. This gives

with

Consider first the leading term

of (B.7). Because sup_{x∈Λ}|π_{κ_min}(x)| is uniformly bounded, taking ψ_1(·) = 1 gives, in (B.2),

The definition of πκmin(·) yields, under Assumption E,

This gives, for the leading term of (B.7),

For the first item of (B.8), note that Assumption E gives that

; see (6.2). Because orthogonal projection decreases the mean squared norm, this gives, for the first term in (B.8),

so that

For the second term in (B.8), observe that

Therefore Lemma A.1, (B.3), and (B.6) yield

For the last item of (B.8), (B.3), (B.4), (B.6), and Lemma A.1 give that

Substituting in (B.8) and (B.7) yields

B.3. Proof of Lemma A.3.

Abbreviate V_κ^{−1/2}Φ_κ′(X_t)ε_t into η_t. Consider a sequence of i.i.d. N(0,Id_κ) Gaussian vectors independent of the η_t's, where Id_κ is the identity matrix of dimension κ × κ. Let F be a three times differentiable real function and define the corresponding expectations compared in (B.9). The proof of Lemma A.3 is divided into three steps. The main step aims to establish that, for such an F and some C > 0 independent of κ and T,

Step 1. Proof of (B.9). We build on arguments used in the proof of the Lindeberg central limit theorem as given in Billingsley (1968, Thm. 7.2); see Horowitz and Spokoiny (2001, Lem. 10) for a similar approach in the context of adaptive testing. The proof consists of successive changes of the η_t into their Gaussian counterparts, as seen from (B.10), which follows. However, an important difference is due to the use of nonparametric series methods and dependence. Define

This gives

Define, for

. A third-order Taylor expansion of

with integral remainder yields

with

Let

be the sigma field generated by

and note that StT(0) and QtT(0) are

-measurable. Because

are centered given

, we have

Substituting the Taylor expansion in (B.10) yields

and we now bound each of these two sums.

We begin by establishing a preliminary inequality. Let n_1 and n_2 be two positive real numbers with 2 ≤ n_1 + n_2 ≤ 8. Then for any t, t′ and z ∈ [0,1],

We give a proof for the first bound, the other bound being similarly established. The Hölder inequality implies that

Because

is an N(0,σ2 Idκ), it is easily seen that

, and we now bound

. We have, by convexity, the Burkholder inequality (see Chow and Teicher, 1988, p. 396, noticing that the sum under study is a sum of martingale differences), and the Minkowski inequality

This gives

and then (B.14).

We now return to (B.13). The expression (B.11) of the third derivative of

and (B.14) yield

To study (B.12), let Φ̄_κ(X_t) = V_κ^{−1/2}Φ_κ′(X_t) = [φ̄_1(X_t),…,φ̄_κ(X_t)]′, S_tT = S_tT(0) = [S_{1tT},…,S_{κtT}]′, and Q_tT = Q_tT(0). The definitions of

show that

. Therefore because QtT and StT are

measurable, conditioning with respect to

yields, using the expression of the second-order derivative of

given in (B.11),

Let n be an integer and define

The variables

depend upon

, which are n + 1 time periods away from the φ̄_k²(X_t)'s. Because

, the Wolkonski–Rozanov inequality yields

by first integrating out with respect to the

, which are independent of the η_t's, and using (B.14). Note that

and

. This together with the definition of

and (B.14) gives

Therefore, (B.16), (B.17), and these inequalities give

Summing over t gives, in (B.12),

under Assumption X(i). An optimal choice of the order of n in (B.18) is T^{2/(5+3a)}, which gives the upper bound

. Therefore (B.18) and (B.12), (B.15), and (B.13) yield that (B.9) is proved.

Step 2. Proof of Lemma A.3(i). Now choose a three times continuously differentiable

with

if z > 0. This gives, for any

,

and then, by (B.9),

Note that the chi-square variable χ(κ), once standardized, has a continuous density and converges in distribution to a standard normal as κ goes to infinity. Therefore taking ε small enough gives Lemma A.3(i).

Step 3. Proof of Lemma A.3(ii). The proof is done by bounding the Gaussian term in (B.20). Observe that the Gaussian counterpart of Q_T has the same distribution as Σ_{k=1}^κ ζ_k², where the ζ_k's are i.i.d. N(0,1) random variables. As established in the proof of Theorem 7.2 of Billingsley (1968), changing the η_t into standard N(0,1) variables, there is a constant C_ε with

Then (B.19) and (B.20) show

Applying the Mill's ratio inequality (see Shorack and Wellner, 1986, p. 850) to the resulting Gaussian tail shows that Lemma A.3(ii) is proved. █

REFERENCES

Aït-Sahalia, Y. (1996) Testing continuous-time models of the spot interest rate. Review of Financial Studies 9, 385–426.
Aït-Sahalia, Y., P.J. Bickel, & T.M. Stocker (2001) Goodness-of-fit tests for kernel regression with an application to option-implied volatility. Journal of Econometrics 105, 363–412.
Baraud, Y., S. Huet, & B. Laurent (2003) Adaptive tests of linear hypotheses by model selection. Annals of Statistics 31, 225–251.
Bierens, H.J. (1984) Model specification testing of time series regressions. Journal of Econometrics 26, 323–353.
Billingsley, P. (1968) Convergence of Probability Measures. Wiley.
Brown, L.D. & M.G. Low (1996) Asymptotic equivalence of nonparametric regression and white noise. Annals of Statistics 24, 2384–2398.
Chen, X. & Y. Fan (1999) Consistent hypothesis testing in semiparametric and nonparametric models for econometric time series. Journal of Econometrics 91, 373–401.
Chow, Y.S. & H. Teicher (1988) Probability Theory: Independence, Interchangeability, Martingales. Springer.
Domowitz, I. & H. White (1982) Misspecified models with dependent observations. Journal of Econometrics 20, 35–58.
Fan, J. (1996) Test of significance based on wavelet thresholding and Neyman's truncation. Journal of the American Statistical Association 91, 674–688.
Fan, J. & L.S. Huang (2001) Goodness-of-fit tests for parametric regression models. Journal of the American Statistical Association 96, 640–652.
Fan, J. & Q. Yao (2003) Nonlinear Time Series: Nonparametric and Parametric Methods. Springer.
Fan, J., C. Zhang, & J. Zhang (2001) Generalized likelihood ratio statistics and Wilks phenomenon. Annals of Statistics 29, 153–193.
Gao, J. & M. King (2001) Estimation and Model Specification Testing in Nonparametric and Semiparametric Regression. Working paper, School of Mathematics and Statistics, University of Western Australia.
Gao, J. & M. King (2004) Adaptive testing in continuous-time diffusion models. Econometric Theory 20, 844–883.
Gayraud, G. & C. Pouet (2005) Adaptive minimax testing in the discrete regression scheme. Probability Theory and Related Fields 133, 531–558.
Gozalo, P.L. (1997) Nonparametric bootstrap analysis with applications to demographic effects in demand functions. Journal of Econometrics 81, 357–393.
Guerre, E. & P. Lavergne (2002) Optimal minimax rates for nonparametric specification testing in regression models. Econometric Theory 18, 1139–1171.
Guerre, E. & P. Lavergne (2005) Data-driven rate-optimal specification testing in regression models. Annals of Statistics 33, 840–870.
Hamilton, J.D. (2001) A parametric approach to flexible nonlinear inference. Econometrica 69, 537–573.
Härdle, W. & E. Mammen (1993) Comparing nonparametric versus parametric regression fits. Annals of Statistics 21, 1926–1947.
Hart, J.D. (1997) Nonparametric Smoothing and Lack-of-Fit Tests. Springer.
Hong, Y. & H. White (1995) Consistent specification testing via nonparametric series regression. Econometrica 63, 1133–1159.
Horowitz, J. & V.G. Spokoiny (2001) An adaptive, rate-optimal test of a parametric mean regression model against a nonparametric alternative. Econometrica 69, 599–631.
Ingster, Y.I. (1992, 1993) Asymptotically minimax hypothesis testing for nonparametric alternatives, parts I, II, and III. Mathematical Methods of Statistics 2, 85–114, 171–189, and 249–268.
Poo, J.R., S. Sperlich, & P. Vieu (2004) An Adaptive Specification Test for Semiparametric Models. Manuscript, Universidad de Zaragoza.
Rio, E. (2000) Théorie asymptotique des processus aléatoires faiblement dépendants. Mathématiques et Applications 31. Springer.
Robinson, P.M. (1989) Hypothesis testing in semiparametric and nonparametric models for econometric time series. Review of Economic Studies 56, 511–534.
Shorack, G.R. & J.A. Wellner (1986) Empirical Processes with Applications to Statistics. Wiley.
Spokoiny, V.G. (1996) Adaptive hypothesis testing using wavelets. Annals of Statistics 24, 2477–2498.
Spokoiny, V.G. (2001) Data-driven testing the fit of linear models. Mathematical Methods of Statistics 10, 465–497.
Timan, A.F. (1994) Theory of Approximation of Functions of a Real Variable. Dover.
Tjøstheim, D. (1994) Non-linear time series: A selective review. Scandinavian Journal of Statistics 21, 97–130.