1. INTRODUCTION
Starting with Bierens (1984) and Robinson (1989), nonparametric specification testing for dependent data has received much attention in the econometric literature. The range of potential applications includes nonlinearity tests and time series model building as reviewed in Tjøstheim (1994) and Fan and Yao (2003), specification of a continuous-time diffusion model for interest rates (Aït-Sahalia, 1996), specification of the Phillips curve (Hamilton, 2001), rational expectations models and conditional portfolio efficiency (Chen and Fan, 1999; Robinson, 1989), and tests of the Black and Scholes formula (Aït-Sahalia, Bickel, and Stoker, 2001) among others.
An important branch of this literature has considered a nonparametric approach that uses a smoothing parameter, such as a bandwidth or the order of a series expansion. This raises two important issues: detection properties and size accuracy. The former can be addressed with efficiency considerations, as pioneered in Ingster (1992, 1993); see also Guerre and Lavergne (2002). This framework leads to calibrating tests so that they detect alternatives, in a given smoothness class, that approach the null at the fastest possible rate. However, the proposed smoothing parameters depend upon the chosen smoothness class, which is too restrictive for practical applications because the choice of a smoothness class is often arbitrary. Regarding size, the statistics considered in the literature are often quadratic, but their critical values are computed from a normal approximation that may be inaccurate; see Hong and White (1995) for nonparametric series and Tjøstheim (1994) for kernel methods. Recent work for independent and identically distributed (i.i.d.) observations, such as Fan, Zhang, and Zhang (2001), suggests that more sophisticated approximations should be used instead of the normal. Härdle and Mammen (1993) and Gozalo (1997), among others, have proposed bootstrapped critical values as a solution. This may be difficult when the parametric model under consideration is specified in continuous time and is therefore costly to simulate or to bootstrap. Bootstrapping is also a burden when the dynamic specification includes covariates that are not strongly exogenous and must therefore be simulated.
An important step for the detection issue was the development of the adaptive framework, in which the smoothness class containing the alternative is considered unknown. Adaptive tests combine several statistics, each designed for a specific smoothness class, into a single test; see Hart (1997) for a review of earlier work in this direction. Spokoiny (1996) developed an efficiency theory for the adaptive case. Various papers have considered adaptive rate-optimal tests based on the maximum of the statistics, including Fan (1996), Fan and Huang (2001), Horowitz and Spokoiny (2001), and Spokoiny (1996, 2001). More specifically, Horowitz and Spokoiny (2001) proposed an adaptive rate-optimal kernel-based specification test for a general parametric regression model, which has generated various extensions. Baraud, Huet, and Laurent (2003) consider some nonasymptotic refinements of the maximum approach for the specification of a linear model. Poo, Sperlich, and Vieu (2004) are interested in a semiparametric null hypothesis, whereas Gayraud and Pouet (2005) consider a nonparametric null. Gao and King (2001, 2004) and Fan and Yao (2003) have proposed extending the scope of applications to dependent data.
However, the maximum approach produces statistics with an unstable asymptotic null behavior, so that achieving an accurate size remains difficult. Fan (1996) found that the null limit distribution of his test gives a poor approximation in finite samples. Horowitz and Spokoiny (2001) did not derive a null limit distribution and used simulated critical values. On the other hand, Guerre and Lavergne (2005) built on a data-driven selection procedure that, under the null, selects a prescribed statistic with high probability. Compared to the maximum approach, this considerably reduces the complexity of the null behavior of the resulting test statistic, whose asymptotic distribution is standard normal, being that of the prescribed statistic. But the statistics of Guerre and Lavergne (2005) have a complicated quadratic structure, so these authors used bootstrapped critical values to achieve a level close to the nominal size. Hence, as mentioned earlier, such an approach may not be suitable for a dynamic model.
In this paper, a suitable modification of the Guerre and Lavergne (2005) test is proposed to derive an adaptive rate-optimal specification test with an accurate size in a dynamic setting. The null hypothesis considered is the specification of the conditional mean of a time series with heteroskedastic innovations. Nonparametric series methods are used to compute chi-square statistics of various orders which, when their degrees of freedom are low, are accurately approximated by a chi-square distribution under the null. A selection criterion, using a low penalty term proportional to the square root of the number of coefficients, chooses a test statistic. Hence the rejection region of the test can use accurate chi-square critical values. The rest of the paper is organized as follows. Section 2 presents our test and the adaptive framework on a nontechnical level. Section 3 gathers our main assumptions and results. After studying the null behavior of the test, adaptive rate-optimality is introduced, and the test is shown to be efficient. Detection of local alternatives approaching the null at a rate close to the parametric one is also considered. Section 4 illustrates the size and detection properties of the test with a simulation experiment, and Section 5 concludes the paper. The proofs are gathered in Section 6 and two appendixes.
2. HEURISTICS OF THE DATA-DRIVEN TEST
Consider an autoregressive model with exogenous variables Zt,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm001.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm002.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm003.gif?pub-status=live)
is the past Borel field generated by X1,…, Xt. Given T observations (Y1,X1),…,(YT,XT), we want to test that μ(·) belongs to some parametric family
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm004.gif?pub-status=live)
, that is, the correct specification hypothesis
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm005.gif?pub-status=live)
The proposed procedure builds on the estimated residuals
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm006.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm007.gif?pub-status=live)
is a consistent estimator of θ under H0, for instance the nonlinear least squares estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm008.gif?pub-status=live)
Because Yt = μ(Xt) + εt, the residuals decompose as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm009.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm010.gif?pub-status=live)
indicates potential misspecification; this term asymptotically vanishes under the null but not under the alternative. Our test combines nonparametric series statistics, constructed by projecting the residuals, to detect the presence of a significant
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm011.gif?pub-status=live)
over a compact set $\Lambda = [-\lambda,\lambda]^d$. More specifically, we focus on multivariate Fourier series regression.
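As a minimal sketch of the residual step, assume a hypothetical scalar null model $m(x;\theta) = \theta_0 + \theta_1 x$ with $X_t = Y_{t-1}$. Because this model is linear in θ, the nonlinear least squares estimator reduces to ordinary least squares. The data are simulated from an illustrative nonlinear mean, $0.5x + 0.8\sin(2x)$, chosen here and not taken from the paper, so the residuals carry a misspecification component on top of the innovations. The function names are hypothetical.

```python
import math
import random


def simulate_nonlinear_ar1(T: int, seed: int = 0) -> list:
    """Simulate Y_t = 0.5*Y_{t-1} + 0.8*sin(2*Y_{t-1}) + eps_t (illustrative DGP)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(T):
        prev = y[-1]
        y.append(0.5 * prev + 0.8 * math.sin(2.0 * prev) + rng.gauss(0.0, 1.0))
    return y


def fit_linear_null(y: list) -> tuple:
    """Least squares fit of the hypothetical null m(x; theta) = theta0 + theta1 * x,
    with regressor X_t = Y_{t-1}. Returns (theta0, theta1, X, residuals)."""
    x, target = y[:-1], y[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(target) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, target))
    sxx = sum((a - mean_x) ** 2 for a in x)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    # Estimated residuals: eps_hat_t = Y_t - m(X_t; theta_hat)
    residuals = [b - (theta0 + theta1 * a) for a, b in zip(x, target)]
    return theta0, theta1, x, residuals


theta0, theta1, X, resid = fit_linear_null(simulate_nonlinear_ar1(500))
```

By construction the least squares residuals are orthogonal to the constant and to the regressor, so any remaining structure in them, such as the neglected sine term, must be picked up by richer functions of $X_t$, which is the role of the series expansion.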
Footnote 1: Using other series approximation methods, such as polynomial functions or wavelets, is possible but leads to a more involved theoretical study. Indeed, the Fourier system satisfies
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm012.gif?pub-status=live)
, a condition that simplifies algebraic manipulations under mixing conditions. Another advantage of Fourier methods is that wavelets may limit the scope of applications to alternatives whose smoothness does not exceed a maximal level set by the choice of the wavelet basis; see the wavelet tests considered in Spokoiny (1996) and Theorem 2.4 therein.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm013.gif?pub-status=live)
, define the kth trigonometric function over Λ as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-35485-mediumThumb-S0266466606060282frm001.jpg?pub-status=live)
so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm014.gif?pub-status=live)
is an $L^2(dx)$-orthonormal system, that is, $\int_\Lambda \psi_k(x)\psi_{k'}(x)\,dx = 1$ if $k = k'$ and 0 otherwise. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm015.gif?pub-status=live)
be the degree of ψk(·). The series estimation of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm016.gif?pub-status=live)
over Λ builds on multivariate trigonometric polynomials $\sum_{|k|\le K} b_k\psi_k(\cdot)$ of degree K, whose number $c_K$ of coefficients $b_k$ is proportional to $K^d$. To account for heteroskedasticity, assume that an estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm017.gif?pub-status=live)
of σ(·) is given and consider the generalized least squares estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm018.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm019.gif?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm020.gif?pub-status=live)
is the diagonal matrix with entries
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm021.gif?pub-status=live)
, and $\Psi_K$ is the $T \times c_K$ matrix $[\psi_k(X_t),\ 1 \le t \le T,\ |k| \le K]$. Suppose that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm022.gif?pub-status=live)
is a trigonometric polynomial function of order K. A standard procedure to test the significance of Fourier coefficients would use the chi-square statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm002.gif?pub-status=live)
leading to rejection of H0 when
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm023.gif?pub-status=live)
is large. However, assuming that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm024.gif?pub-status=live)
has a finite series expansion of known order K is too simplistic for practical applications. More generally, an arbitrary choice of K may affect the power, and a better understanding of the impact of K is important to build a proper specification test. Set
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm025.gif?pub-status=live)
so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm026.gif?pub-status=live)
decomposes into three terms
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm027.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-81444-mediumThumb-S0266466606060282frm003.jpg?pub-status=live)
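The following standard-library sketch illustrates, for d = 1, the basis and the chi-square statistic introduced above. It assumes a common univariate trigonometric system on Λ = [−λ, λ] (the constant $1/\sqrt{2\lambda}$, $\cos(\pi k x/\lambda)/\sqrt{\lambda}$ for k > 0 and $\sin(\pi |k| x/\lambda)/\sqrt{\lambda}$ for k < 0), which may differ in indexing from the displayed definition, and it computes the statistic in the standard Wald form $\hat\varepsilon' W \Psi_K (\Psi_K' W \Psi_K)^{-1} \Psi_K' W \hat\varepsilon$ with $W = \mathrm{diag}(1/\hat\sigma^2(X_t))$, that is, the squared norm of a weighted projection of the residuals on the columns of $\Psi_K$. The helper names are hypothetical, and all regressors are assumed to lie in Λ.

```python
import math
import random


def psi_1d(k: int, x: float, lam: float) -> float:
    """Orthonormal trigonometric function on [-lam, lam] (assumed indexing: k > 0 cosine, k < 0 sine)."""
    if k == 0:
        return 1.0 / math.sqrt(2.0 * lam)
    if k > 0:
        return math.cos(math.pi * k * x / lam) / math.sqrt(lam)
    return math.sin(math.pi * (-k) * x / lam) / math.sqrt(lam)


def gram_1d(K: int, lam: float, n: int = 64) -> list:
    """Midpoint-rule Gram matrix of psi_{-K}, ..., psi_K; exact up to rounding when n > 2K."""
    h = 2.0 * lam / n
    grid = [-lam + (i + 0.5) * h for i in range(n)]
    ks = range(-K, K + 1)
    return [[h * sum(psi_1d(k, x, lam) * psi_1d(kp, x, lam) for x in grid) for kp in ks] for k in ks]


def solve(A: list, b: list) -> list:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def chi_square_stat(resid: list, x: list, sigma_hat, K: int, lam: float) -> float:
    """e'W Psi (Psi'W Psi)^{-1} Psi'W e with W = diag(1 / sigma_hat(x_t)^2); d = 1, so c_K = 2K + 1."""
    ks = range(-K, K + 1)
    w = [1.0 / sigma_hat(xt) ** 2 for xt in x]
    psi = [[psi_1d(k, xt, lam) for k in ks] for xt in x]
    m, T = len(ks), len(x)
    A = [[sum(w[t] * psi[t][i] * psi[t][j] for t in range(T)) for j in range(m)] for i in range(m)]
    b = [sum(w[t] * psi[t][i] * resid[t] for t in range(T)) for i in range(m)]
    return sum(bi * zi for bi, zi in zip(b, solve(A, b)))


# Usage: residuals with a neglected sin(3x) component, homoskedastic weights.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
e = [math.sin(3.0 * xt) + rng.gauss(0.0, 0.5) for xt in x]
stats = [chi_square_stat(e, x, lambda v: 1.0, K, lam=1.2) for K in (1, 2, 4)]
```

Because the column spaces of $\Psi_K$ are nested in K, the computed statistic can only grow with K and never exceeds the weighted residual sum of squares, in line with the projection interpretation of the statistic.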
The term
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm028.gif?pub-status=live)
is crucial for detecting potential misspecification. It is the squared norm of the orthogonal projection of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm029.gif?pub-status=live)
on the columns of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm030.gif?pub-status=live)
, which increases with K up to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm031.gif?pub-status=live)
, achieved for cK ≥ T. Hence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm032.gif?pub-status=live)
can be viewed as a downward-biased estimator of the empirical measure of misspecification
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm033.gif?pub-status=live)
, that is,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm034.gif?pub-status=live)
where $\mathrm{bias}_\mu(K) \le 0$ depends upon the unknown μ(·) and decreases in absolute value as K increases. The other important term in the decomposition (2.3) of the statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm035.gif?pub-status=live)
is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm036.gif?pub-status=live)
, a pure noise term. It can be expected that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm037.gif?pub-status=live)
is asymptotically a chi-square variable with $c_K$ degrees of freedom, hence with mean $c_K$ and variance $2c_K$, so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm038.gif?pub-status=live)
. Neglecting
Footnote 2: Assume that H0 is μ(·) = 0 and that σ(·) is known, so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm039.gif?pub-status=live)
and the choice
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm040.gif?pub-status=live)
is possible. In the case of Gaussian i.i.d. εt independent of the Xt's,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm041.gif?pub-status=live)
would be an
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm042.gif?pub-status=live)
, which can be neglected with respect to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm043.gif?pub-status=live)
when this variable diverges. Note also that the distribution of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm044.gif?pub-status=live)
coincides with its chi-square approximation for such
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm045.gif?pub-status=live)
.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm046.gif?pub-status=live)
and substituting into (2.3) gives a bias-variance type decomposition for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm047.gif?pub-status=live)
Footnote 3: Note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm048.gif?pub-status=live)
is a better misspecification indicator than
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm049.gif?pub-status=live)
, which is affected by an additional systematic bias term $c_K$. Guerre and Lavergne (2005) proposed a different bias correction that makes asymptotic inference less accurate in finite samples, which is why they use the bootstrap.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm004.gif?pub-status=live)
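A quick Monte Carlo check of the centering argument, under the Gaussian i.i.d. setting with known σ(·) discussed above, where the pure noise term is exactly chi-square with $c_K$ degrees of freedom: its draws average about $c_K$ with variance about $2c_K$, so subtracting $c_K$ removes the systematic bias while the remaining fluctuations are of order $c_K^{1/2}$. The chi-square draws are simulated directly as sums of squared standard normals; this is an illustration under that assumption, and the function name is hypothetical.

```python
import math
import random
import statistics


def noise_term_draws(c_K: int, n_rep: int, seed: int = 0) -> list:
    """Draws of a chi-square variable with c_K degrees of freedom (sum of c_K squared N(0, 1))."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(c_K)) for _ in range(n_rep)]


c_K = 17
draws = noise_term_draws(c_K, 20000)
centered = [r - c_K for r in draws]                           # bias-corrected noise term
mean_centered = statistics.fmean(centered)                    # close to 0
sd_ratio = statistics.pstdev(centered) / math.sqrt(2 * c_K)   # close to 1
```

The ratio close to one is the numerical counterpart of the $O_P(c_K^{1/2})$ fluctuation that the ideal choice of K should balance against the bias.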
Looking for the best estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm050.gif?pub-status=live)
of the misspecification indicator suggests that an ideal choice of K should achieve the minimum of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm051.gif?pub-status=live)
. However, this is infeasible in practice, if only because biasμ(·) depends upon the unknown μ(·). Feasible alternatives for choosing K include the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), as reviewed in Hart (1997). These selection procedures choose a K that maximizes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm052.gif?pub-status=live)
where γ is a penalty parameter. According to (2.4), this amounts to minimizing
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm053.gif?pub-status=live)
. Therefore these selection procedures asymptotically balance $|\mathrm{bias}_\mu(K)|$ with $(\gamma - 1)c_K$ instead of the ideal order $c_K^{1/2}$ in (2.4). This suggests using instead a lower penalty term of the form $c_K + \gamma c_K^{1/2}$, in which the penalty parameter multiplies the square root $c_K^{1/2}$ of the number of coefficients rather than $c_K$. More specifically, let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm054.gif?pub-status=live)
be a set of admissible degrees K greater than or equal to Kmin. Our data-driven choice of K is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-08435-mediumThumb-S0266466606060282frm005.jpg?pub-status=live)
The Kmin terms in the penalized criterion reflect a preference for low degrees, which is justified now by considerations on the null behavior of the selected
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm055.gif?pub-status=live)
.
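To make the contrast concrete, the sketch below compares an AIC/BIC-type rule, which maximizes $R_K - \gamma c_K$, with a low-penalty rule. Since display (2.5) is not reproduced here, the low-penalty criterion is an assumed form, $R_K - c_K - \gamma(c_K - c_{K_{\min}})^{1/2}$, chosen to match the heuristics above (centering by $c_K$, a penalty on the square-root scale, and selection of $K_{\min}$ when γ is very large). The statistic values are hypothetical, with $c_K = 2K + 1$ as for d = 1, and the function names are illustrative.

```python
import math


def select_linear_penalty(R: dict, c: dict, gamma: float) -> int:
    """AIC/BIC-type choice: K maximizing R_K - gamma * c_K."""
    return max(R, key=lambda K: R[K] - gamma * c[K])


def select_low_penalty(R: dict, c: dict, gamma: float, K_min: int) -> int:
    """Assumed low-penalty choice: K maximizing R_K - c_K - gamma * sqrt(c_K - c_Kmin)."""
    return max(R, key=lambda K: R[K] - c[K] - gamma * math.sqrt(c[K] - c[K_min]))


# Hypothetical statistics on a dyadic grid; the signal only shows up from K = 8 on.
degrees = [2, 4, 8, 16]
c = {K: 2 * K + 1 for K in degrees}
R = {2: 6.0, 4: 10.0, 8: 60.0, 16: 75.0}

bic_choice = select_linear_penalty(R, c, gamma=math.log(500))  # BIC-type penalty, T = 500
low_choice = select_low_penalty(R, c, gamma=2.0, K_min=2)
large_gamma_choice = select_low_penalty(R, c, gamma=1e6, K_min=2)
```

With these numbers the BIC-type penalty keeps K = 2 and misses the high-frequency signal, the low-penalty rule selects K = 8, and a very large γ falls back on $K_{\min}$.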
As seen from Fan (1996) or Horowitz and Spokoiny (2001), finding an accurate approximation for the null distribution of a statistic that combines the
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm056.gif?pub-status=live)
's, such as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm057.gif?pub-status=live)
is difficult. A first distinctive feature of the selection procedure (2.5) is that it is flexible enough to limit the contribution of the high-K statistics by taking γT large enough. Indeed, the limit case γT = +∞ gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm058.gif?pub-status=live)
. This continues to hold asymptotically provided γT diverges fast enough, as shown in Theorem 1 in Section 3. Moreover, as detailed now, to an accurate approximation the distribution of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm059.gif?pub-status=live)
is a standard chi-square. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm060.gif?pub-status=live)
asymptotically vanishes under H0, (2.3) shows that the null distribution of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm061.gif?pub-status=live)
is approximately that of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm062.gif?pub-status=live)
and then, neglecting the effect of the variance estimation, of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm063.gif?pub-status=live)
where $\Omega^{1/2} = \mathrm{Diag}[\sigma(X_1),\ldots,\sigma(X_T)]$. In the i.i.d. case, according to the Berry–Esseen bound in Hart (1997, Thm. 7.2), the distribution of the vector
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm064.gif?pub-status=live)
has a normal approximation up to an error $a(c_K)/T^{1/2}$, where $a(c_K)$ diverges with $c_K$. Therefore, the distribution of the statistic $R_{3K}$ should be close to a chi-square with $c_K$ degrees of freedom up to an error $a(c_K)/T^{1/2}$, which is small for moderate K.⁴
Footnote 4: This continues to hold in the dependent setup, where the bound (B.9) in Appendix B gives a more complicated error term, which is at best $K^{2d}/T^{1/2}$. A normal approximation would incur the larger error term $K^{2d}/T^{1/2} + K^{-d/2}$.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm065.gif?pub-status=live)
where χ(c) denotes a chi-square variable with c degrees of freedom, and rejects H0 if⁵
Footnote 5: A second distinctive feature of the selection procedure (2.5) is the standardization with $c_{K_{\min}}$ in the critical region
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm066.gif?pub-status=live)
; see (2.6). Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm067.gif?pub-status=live)
asymptotically, an alternative α-level critical region would use
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm068.gif?pub-status=live)
in place of cKmin. But such a choice would asymptotically reduce power because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm069.gif?pub-status=live)
. This also contrasts with a maximum procedure that would use the test statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm070.gif?pub-status=live)
with a
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm071.gif?pub-status=live)
larger than cKmin. The simulation experiments of Guerre and Lavergne (2005) revealed that such a construction of the critical region (2.6) gives a test that improves on its adaptive rate-optimal competitors.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm006.gif?pub-status=live)
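Because the critical values come from a chi-square distribution with few degrees of freedom, they are cheap to compute exactly. The minimal standard-library sketch below evaluates the chi-square CDF through the regularized incomplete gamma function (series and continued-fraction evaluations in the style of Numerical Recipes), inverts it by bisection, and compares the result with the normal approximation $c + z_{\alpha}(2c)^{1/2}$. The function names are hypothetical, and the exact form of the rejection rule (2.6) is not reproduced here.

```python
import math
from statistics import NormalDist


def regularized_gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion, convergent and fast for x < a + 1.
        term = total = 1.0 / a
        n = 0
        while abs(term) > 1e-16 * abs(total):
            n += 1
            term *= x / (a + n)
            total += term
        return total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x) = 1 - P(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def chi2_upper_quantile(c: int, alpha: float) -> float:
    """q such that P(chi2(c) > q) = alpha, found by bisection on the CDF P(c/2, q/2)."""
    target = 1.0 - alpha
    lo, hi = 0.0, max(1.0, 2.0 * c)
    while regularized_gamma_p(c / 2.0, hi / 2.0) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if regularized_gamma_p(c / 2.0, mid / 2.0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_approx_quantile(c: int, alpha: float) -> float:
    """Normal approximation c + z_alpha * sqrt(2c) to the same quantile."""
    return c + NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(2.0 * c)


comparison = {c: (chi2_upper_quantile(c, 0.05), normal_approx_quantile(c, 0.05)) for c in (1, 5, 9, 17)}
```

For small c the normal approximation understates the chi-square quantile (about 15.98 versus 16.92 at c = 9 and α = 0.05), which would make a test based on it over-reject.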
Consider now the power issue. The data-driven choice (2.5) of K combines the detection properties of each of the
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm072.gif?pub-status=live)
's. Indeed, because cK ≥ cKmin for any K in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm073.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-43381-mediumThumb-S0266466606060282frm007.jpg?pub-status=live)
This gives the power lower bound
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-09925-mediumThumb-S0266466606060282frm008.jpg?pub-status=live)
which holds in particular for an optimal K that balances the bias with the penalty term. Taking K = Kmin gives that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm009.gif?pub-status=live)
a power bound showing that the test (2.6) improves on the test based on the single statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm074.gif?pub-status=live)
. As seen from (2.4) and (2.8), consistency holds as soon as there is a degree K in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm075.gif?pub-status=live)
such that the misspecification measure
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm076.gif?pub-status=live)
is asymptotically larger than the sum of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm077.gif?pub-status=live)
. Hence increasing γT too much should give a less powerful test. The form of the low penalty term in (2.5) is crucial for establishing adaptive rate-optimality; see Theorem 2 in Section 3. Theorem 3 in Section 3 shows that the test detects Pitman local alternatives that approach the null at a rate arbitrarily close to $T^{-1/2}$.
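The chain of inequalities behind the power lower bound can be written out under an assumed form of the selection criterion, $L_T(K) = R_K - c_K - \gamma_T (c_K - c_{K_{\min}})^{1/2}$, which follows the heuristics of this section but is not a reproduction of display (2.5); here $\mathcal{K}$ denotes the admissible set and $t$ a generic critical value.

```latex
\begin{align*}
R_{\tilde K} - c_{K_{\min}}
  &\ge R_{\tilde K} - c_{\tilde K} - \gamma_T\,(c_{\tilde K} - c_{K_{\min}})^{1/2}
     && \text{since } c_{\tilde K} \ge c_{K_{\min}} \text{ and } \gamma_T \ge 0,\\
  &= L_T(\tilde K) \;\ge\; L_T(K) \quad \text{for every } K \in \mathcal{K}
     && \text{since } \tilde K \text{ maximizes } L_T,\\
\Pr\bigl(R_{\tilde K} - c_{K_{\min}} \ge t\bigr)
  &\ge \max_{K \in \mathcal{K}} \Pr\bigl(L_T(K) \ge t\bigr)
   \;\ge\; \Pr\bigl(R_{K_{\min}} - c_{K_{\min}} \ge t\bigr).
\end{align*}
```

The last step uses $L_T(K_{\min}) = R_{K_{\min}} - c_{K_{\min}}$: with a common critical value, the data-driven statistic is at least as powerful as the single statistic with $K = K_{\min}$, and it inherits the detection ability of every other admissible K through $L_T(K)$.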
3. MAIN RESULTS
3.1. Main Assumptions
Consider T observations (Yt,Xt) with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm078.gif?pub-status=live)
, where μ(·) may depend upon T, in which case (Yt,Xt) forms a triangular array $(Y_{tT}, X_{tT})$. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm079.gif?pub-status=live)
denote the Borel fields generated by X1,ε1,…, Xt,εt and by Xt,εt,Xt+1,εt+1,…, respectively. The α-mixing coefficients of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm080.gif?pub-status=live)
are
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm081.gif?pub-status=live)
The next assumptions deal with the εt's, the mixing coefficients, and the parametric mean.
Assumption E. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm082.gif?pub-status=live)
be the Borel field generated by (X1,ε0),…,(Xt,εt−1). The variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm083.gif?pub-status=live)
form a martingale difference sequence with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm084.gif?pub-status=live)
. The conditional standard deviation function σ(·), defined by σ²(·) = Var[εt | Xt = ·], is continuous and bounded away from 0 on
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm085.gif?pub-status=live)
.
Assumption X. The process
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm086.gif?pub-status=live)
on
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm087.gif?pub-status=live)
is stationary, with the following conditions holding.
(i) $\alpha(n) \le A n^{-1-a}$ for some constants $A, a > 0$.
(ii) The variable Xt has a density f (·) with respect to the Lebesgue measure on
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm088.gif?pub-status=live)
. The density f (·) is bounded away from 0 and infinity.
Assumption M. The parameter set Θ is a subset of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm089.gif?pub-status=live)
, and the following conditions hold.
(i) The regression function m(x;θ) is twice continuously differentiable with respect to θ. The gradient $m^{(1)}(x;\theta)$ and Hessian matrix $m^{(2)}(x;\theta)$ are bounded over Λ × Θ.
(ii) For any sequence of regression functions μT(·) with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm090.gif?pub-status=live)
, there exists a sequence of parameters θT in Θ such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm091.gif?pub-status=live)
, with θT = θ if μT(·) = m(·;θ) for some θ in Θ.
Assumption E ensures that the sums
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm092.gif?pub-status=live)
are martingales that are asymptotically normal under Assumption X(i). The polynomial mixing rate of X(i) is a minimal rate to achieve $T^{1/2}$-consistency in the weak law of large numbers for the empirical mean $T^{-1}\Psi_K'\Omega^{-1}\Psi_K$. Under Assumption X(ii), the limit of $T^{-1}\Psi_K'\Omega^{-1}\Psi_K$ is invertible. Mixing conditions for a Markovian process (Yt,Xt), as in Assumption X(i), can be derived from a drift condition; see Fan and Yao (2003, Thm. 2.4) and the references therein. When
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm093.gif?pub-status=live)
, the sequence θT in Assumption M(ii) is the pseudo-true value
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm094.gif?pub-status=live)
, which is uniquely defined under identification of the parametric regression model; see Domowitz and White (1982). Assumption M(i) then ensures that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm095.gif?pub-status=live)
is close to Δ(·) = μT(·) − m(·;θT) over Λ up to an
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm096.gif?pub-status=live)
term.
Let us now turn to the construction of the test. The first assumption specifies a set of admissible degrees
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm097.gif?pub-status=live)
in the spirit of the dyadic bandwidth set of Horowitz and Spokoiny (2001).
Assumption K. Let a be as in Assumption X. Set $K_{\max} = 2^{J_{\max}} = O(T^{C_1/d})$ for some $C_1 \in \left(0, \tfrac{3}{4}\cdot\tfrac{1+a}{5+3a}\right)$, and $K_{\min} = 2^{J_{\min}} \to \infty$ with $K_{\min}^d = O(\ln^{C_2} T)$ for some $C_2 > 0$, where $J_{\min} \le J_{\max}$ are integers. The set of admissible degrees
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm098.gif?pub-status=live)
is dyadic, that is,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm010.gif?pub-status=live)
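The objects in Assumption K can be made concrete with a short Python sketch. It is illustrative only: the helper names are hypothetical, and so is the rule used to pick J_min and J_max from T, C1, C2, and d. It builds the dyadic set {2^j : J_min ≤ j ≤ J_max}, evaluates the upper bound (3/4)(1 + a)/(5 + 3a) on C1 (which increases with a toward 1/4), and shows that the number of admissible degrees divided by ln T stays bounded as T grows, consistent with that number being of order ln T.

```python
import math


def dyadic_degree_set(j_min: int, j_max: int) -> list[int]:
    """Return the dyadic set {2**j : j_min <= j <= j_max} of admissible degrees K."""
    if j_min > j_max:
        raise ValueError("j_min must not exceed j_max")
    return [2 ** j for j in range(j_min, j_max + 1)]


def c1_upper_bound(a: float) -> float:
    """Upper bound (3/4) * (1 + a) / (5 + 3a) on C1; it increases toward 1/4 as a grows."""
    return 0.75 * (1.0 + a) / (5.0 + 3.0 * a)


def cardinality_over_log(T: float, d: int = 1, C1: float = 0.24, C2: float = 1.0) -> float:
    """Number of admissible degrees divided by ln T, under a hypothetical rule.

    Hypothetical rule: J_max is the largest integer with 2**J_max <= T**(C1/d),
    and J_min is the smallest integer with (2**J_min)**d >= (ln T)**C2.
    """
    j_max = math.floor(C1 / d * math.log2(T))
    j_min = math.ceil(C2 / d * math.log2(math.log(T)))
    return (j_max - j_min + 1) / math.log(T)


print(dyadic_degree_set(0, 4))  # → [1, 2, 4, 8, 16]
print([round(c1_upper_bound(a), 3) for a in (0.5, 2.0, 10.0)])
print([round(cardinality_over_log(10.0 ** k), 3) for k in (20, 40, 80)])
```

With C1 fixed at 0.24, just below 1/4, the last ratios increase slowly toward C1/(d ln 2) ≈ 0.35, so the cardinality grows like ln T; the first line reproduces the set used in Section 4.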
Note that (3.1) and the polynomial divergence rate of Kmax imply that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm099.gif?pub-status=live)
is of exact order ln T. Such a restriction is helpful to show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm100.gif?pub-status=live)
asymptotically under the null, and it also has a practical justification. Indeed, a small
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm101.gif?pub-status=live)
is important for achieving an accurate size. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm102.gif?pub-status=live)
vanishes if and only if K = Kmin, (2.5) yields that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm103.gif?pub-status=live)
if and only if one of these penalized statistics is strictly positive for some K ≠ Kmin, or equivalently
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm104.gif?pub-status=live)
Hence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm011.gif?pub-status=live)
so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm105.gif?pub-status=live)
increases with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm106.gif?pub-status=live)
and decreases with the penalty sequence γT. Therefore, using a parsimonious
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm107.gif?pub-status=live)
can improve the size accuracy of the test. On the other hand, a dyadic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm108.gif?pub-status=live)
as in Assumption K contains sequences of any order between $\ln^{C_2} T$ and $T^{C_1}$, which is sufficient for adaptive rate-optimality. The constant C1 of Assumption K must be smaller than
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm109.gif?pub-status=live)
where a is the exponent in Assumption X(i), $\alpha(n) = O(n^{-1-a})$. This gives a $K_{\max}$ of order $T^{1/(4d)}$ at best, whereas, in the i.i.d. setup, Hong and White (1995) allowed for a larger order $T^{1/(3d)}$ when basing the test on a single series statistic.
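The claim that the probability of selecting a degree other than Kmin increases with the number of admissible degrees and decreases with γT can be illustrated by a stylized Monte Carlo. The sketch rests on assumptions that are not taken from the paper: the statistics R_K are nested sums of squared independent N(0,1) draws, so that R_K − R_Kmin is chi-square with κ_K − κ_Kmin degrees of freedom; κ_K = K; and the penalized criterion is taken to be (R_K − R_Kmin) − (κ_K − κ_Kmin) − γ(2(κ_K − κ_Kmin))^{1/2}, which vanishes at K = Kmin as the text requires. The function names are hypothetical.

```python
import numpy as np


def null_draws(kappa_max: int, n_rep: int = 20_000, seed: int = 0) -> np.ndarray:
    """Squared i.i.d. N(0, 1) draws; each row is one replication of the nested statistics."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_rep, kappa_max)) ** 2


def prob_select_larger(z2: np.ndarray, kappas, gamma: float) -> float:
    """Monte Carlo estimate of P(K_tilde != K_min) in the stylized null model.

    kappas: increasing numbers of coefficients, the first one belonging to K_min.
    R_K is the sum of the first kappa_K squared draws, so the statistics are nested.
    """
    kappas = np.asarray(kappas)
    cums = np.cumsum(z2, axis=1)             # cums[:, k - 1] = sum of the first k squares
    R = cums[:, kappas - 1]                  # one column per admissible K
    dk = kappas[1:] - kappas[0]              # kappa_K - kappa_Kmin for K != K_min
    crit = (R[:, 1:] - R[:, [0]]) - dk - gamma * np.sqrt(2.0 * dk)
    return float(np.mean((crit > 0).any(axis=1)))


z2 = null_draws(16)
for gamma in (0.5, 1.0, 2.0, 3.0):
    print(gamma, prob_select_larger(z2, [1, 2, 4, 8, 16], gamma))
```

Because the same draws are reused, the estimated probability can only fall as γ rises, and enlarging the set of degrees can only raise it, mirroring the trade-off described above.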
Let us now turn to variance estimation. The next condition allows us to approximate
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm110.gif?pub-status=live)
with $T^{-1}\Psi_K'\Omega^{-1}\Psi_K$ for degrees K depending on the sample size T, as in Assumption K.
Assumption V. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm111.gif?pub-status=live)
. Then, for the considered sequence of regression models
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm112.gif?pub-status=live)
and, for some integer $\ell > d/2$ and all $(\ell_1,\ldots,\ell_d)$ with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm113.gif?pub-status=live)
, where $v_T = o(K_{\max}^{-3d/2}/\ln T)$ and $\liminf_{T\to\infty} T^{1/2} v_T > 0$.
Assumption V requires consistency of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm114.gif?pub-status=live)
under the null and the alternative. Convergence of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm115.gif?pub-status=live)
with the rate vT requires that μT(·) and σ(·) satisfy a minimal smoothness condition. As shown in Guerre and Lavergne (2002), consistency is not necessary under the alternative, but it helps to obtain a powerful test. Under homoskedasticity, a simple choice of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm116.gif?pub-status=live)
is a constant difference-based estimator, in which case Assumption V holds with the best possible rate $v_T = T^{-1/2}$, so that $K_{\max} = o(T^{1/(3d)} \ln^{-2/(3d)} T)$. The heteroskedastic case requires nonparametric variance estimation, such as kernel, sieve, or series estimators; see, among others, Guerre and Lavergne (2002, 2005) and Horowitz and Spokoiny (2001). The rate vT is then the consistency rate for the $\ell$th partial derivatives, which restricts the divergence rate of Kmax.
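For a scalar covariate (d = 1), one classical difference-based estimator of a constant variance sorts the observations by the covariate and averages squared first differences of the responses: neighbouring regression values nearly cancel, so each squared difference estimates twice the error variance. The sketch assumes this first-difference form; the text does not say which difference-based estimator is used, and the function name and simulated design are illustrative.

```python
import numpy as np


def first_difference_variance(x, y) -> float:
    """Difference-based estimate of a constant error variance for a scalar covariate.

    Sort by x and return sum((y_(t+1) - y_(t))**2) / (2 (T - 1)); the regression
    function nearly cancels between neighbours, leaving twice the error variance.
    """
    order = np.argsort(np.asarray(x))
    dy = np.diff(np.asarray(y, dtype=float)[order])
    return float(np.sum(dy ** 2) / (2 * len(dy)))


rng = np.random.default_rng(1)
T, sigma = 20_000, 0.5
x = rng.uniform(-1.0, 1.0, T)
y = np.sin(np.pi * x) + sigma * rng.standard_normal(T)
print(first_difference_variance(x, y))  # close to sigma**2 = 0.25
```

Such an estimator converges at the parametric rate, which is why the homoskedastic case allows the best possible vT.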
3.2. Asymptotic Behavior under the Null
As discussed following (3.2) and (2.9), a fast divergence rate for γT is useful to achieve an accurate size under the null but may reduce the power of the test. Therefore, an important issue is to find a minimal divergence rate for γT ensuring that the test is asymptotically of level α or equivalently that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm117.gif?pub-status=live)
asymptotically vanishes under H0. The Bonferroni inequality gives, in (3.2),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm012.gif?pub-status=live)
and showing that the last sum asymptotically vanishes for small γT requires precise uniform bounds on these probabilities, so that simple Chebyshev-type inequalities may not suffice. Sharper Gaussian-type bounds in the spirit of the Mills' ratio inequality
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm118.gif?pub-status=live)
are derived in Lemma A.3 in Appendix A. Because the exact order of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm119.gif?pub-status=live)
is ln T, the next theorem ensures that the asymptotic size of the test is α provided that the penalty sequence γT diverges faster than $(\ln\ln T)^{1/2}$.
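The Gaussian tail bounds behind this argument are the classical Mills' ratio inequalities x φ(x)/(1 + x²) ≤ P(Z > x) ≤ φ(x)/x for Z ~ N(0,1) and x > 0. The sketch below only checks these standard inequalities numerically; it does not reproduce Lemma A.3, whose bounds concern the statistics of the paper. The helper names are illustrative.

```python
import math


def normal_tail(x: float) -> float:
    """P(Z > x) for Z ~ N(0, 1), computed with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mills_bounds(x: float) -> tuple[float, float]:
    """Lower and upper Mills' ratio bounds x/(1+x^2) phi(x) and phi(x)/x, for x > 0."""
    phi = normal_pdf(x)
    return x / (1.0 + x * x) * phi, phi / x


for x in (1.0, 2.0, 4.0):
    lo, hi = mills_bounds(x)
    print(x, lo, normal_tail(x), hi)
```

The two bounds differ by the factor (1 + x²)/x², so they pin down the tail sharply for large x, which is what makes a slowly diverging penalty sufficient.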
THEOREM 1. Suppose that the null hypothesis H0 is true and that Assumptions E, K, M, V, and X hold. Then, if γT diverges with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm013.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm120.gif?pub-status=live)
, and the test (2.6) is asymptotically of level α.
The minimal divergence rate $(\ln\ln T)^{1/2}$ ensuring that the test is asymptotically of level α is surprisingly low compared to the penalty term of order ln T used in the BIC. This improvement comes from the Gaussian-type bounds used for the tails of the standardized
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm121.gif?pub-status=live)
. Indeed, this gives, up to remainder terms, a bound
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm122.gif?pub-status=live)
in (3.3), which asymptotically vanishes provided that (3.4) holds. Moreover, such a low rate is in line with previous findings for rate-optimal adaptive testing. Indeed, (3.2) shows that a suitable γT should resemble the critical values of a maximum test such as that of Fan (1996), who found critical values with a typical rate of $(2\ln\ln T)^{1/2}$. This suggests that our minimal rate condition (3.4) cannot be improved.
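The contrast between the penalty orders, and the role of Gaussian tails in the Bonferroni sum, can be seen numerically. This is a heuristic sketch, not the bound (3.3) itself: it treats each of the statistics as having an exact standard normal tail, takes their number to be c ln T, and uses a penalty γ = (2(1 + η) ln ln T)^{1/2}; the function names and the constants c and η are assumptions. Under these simplifications the Mills' ratio bound makes the Bonferroni sum of order (ln T)^{−η}, so it vanishes although γ grows only like (ln ln T)^{1/2}, far more slowly than a BIC-type ln T penalty.

```python
import math


def penalty_orders(T: float) -> dict[str, float]:
    """Values of three candidate penalty orders at sample size T."""
    lnT = math.log(T)
    return {
        "ln T (BIC-type)": lnT,
        "(ln ln T)^(1/2)": math.sqrt(math.log(lnT)),
        "(2 ln ln T)^(1/2)": math.sqrt(2.0 * math.log(lnT)),
    }


def union_bound(T: float, eta: float = 0.5, c: float = 1.0) -> float:
    """Stylized Bonferroni bound M * phi(gamma) / gamma with Gaussian tails.

    Hypothetical inputs: M = c ln T statistics and gamma = (2 (1 + eta) ln ln T)**0.5.
    The bound equals c (ln T)**(-eta) / (gamma * sqrt(2 pi)), which vanishes when eta > 0.
    """
    lnT = math.log(T)
    M = c * lnT
    gamma = math.sqrt(2.0 * (1.0 + eta) * math.log(lnT))
    return M * math.exp(-0.5 * gamma ** 2) / (gamma * math.sqrt(2.0 * math.pi))


for T in (1e2, 1e4, 1e8, 1e16):
    orders = {k: round(v, 2) for k, v in penalty_orders(T).items()}
    print(f"T={T:.0e}", orders, round(union_bound(T), 4))
```

With η = 0 the same bound no longer vanishes, which is the heuristic reason why a penalty near (2 ln ln T)^{1/2} sits at the boundary.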
Another condition for Theorem 1 to hold is that Kmin diverges with the sample size; see Assumption K. This allows us to neglect the parametric estimation error
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm123.gif?pub-status=live)
in the chi-square approximation of the distribution of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm124.gif?pub-status=live)
. Accounting for such an effect would allow us to consider a fixed Kmin; see, for example, Hart (1997, Sect. 8.3.1).
3.3. Detection of Small Alternatives
As discussed following equation (2.9), the detection properties of the test depend upon a bias term from (2.4). Establishing formal adaptive rate-optimality of the test requires bounding this bias, and current mathematical approaches to do so rely on smoothness restrictions. We consider here Hölder smoothness classes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm125.gif?pub-status=live)
that we introduce now. Define the departure from the null as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm126.gif?pub-status=live)
with θT as in Assumption M. We restrict ourselves to departures Δ(·) whose restriction to Λ admits a (2λ)-periodic extension. Consider first the case s ∈ (0,1], for which
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm127.gif?pub-status=live)
For real s > 0, let $\lfloor s \rfloor$ be the lower integer part of s, that is, the unique integer satisfying $\lfloor s \rfloor < s \le \lfloor s \rfloor + 1$, so that $s - \lfloor s \rfloor$ is in (0,1], with $s - \lfloor s \rfloor = s$ for s ∈ (0,1]. For any s > 0, the smoothness class
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm128.gif?pub-status=live)
is defined as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm129.gif?pub-status=live)
Hence the smoothness class
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm130.gif?pub-status=live)
is defined for all s > 0 and L > 0. Lemma 1 in Section 6 gives, for the bias term of (2.4), the following bound:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm131.gif?pub-status=live)
for any Δμ,T(·) in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm132.gif?pub-status=live)
and any K. This gives, for small alternatives, which are the hardest to detect,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm014.gif?pub-status=live)
Our minimax adaptive framework evaluates tests uniformly over alternatives at distance ρ from the null, that is, in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm133.gif?pub-status=live)
with unknown smoothness index (L,s). Such alternatives allow for a general shape of Δμ,T(·), with narrow peaks and valleys that may depend on T; see Horowitz and Spokoiny (2001). As pointed out in Guerre and Lavergne (2005), uniform consistency over
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm134.gif?pub-status=live)
is equivalent to consistency against any sequence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm135.gif?pub-status=live)
in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm136.gif?pub-status=live)
as considered here. A crucial issue is the choice of a suitable asymptotically vanishing rate
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm137.gif?pub-status=live)
. Indeed, some of the alternatives of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm138.gif?pub-status=live)
will not be detected by any test if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm139.gif?pub-status=live)
goes to 0 at too rapid a rate. On the other hand, detection can become straightforward if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm140.gif?pub-status=live)
remains far from the null. Hence a good candidate
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm141.gif?pub-status=live)
to evaluate a test is a frontier rate that separates these two extreme situations. In the adaptive approach, such a rate depends upon the unknown smoothness index s, and Spokoiny (1996)⁶ has shown that the optimal adaptive rate is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm143.gif?pub-status=live)
which is slower than the parametric rate $T^{-1/2}$. Guerre and Lavergne (2002) derived an optimal rate for a known smoothness index s that improves on ρT only by removing the $(\ln\ln T)^{1/2}$ factor, so that the price to pay for rate adaptation is moderate. As is well known, the rate ρT decreases faster than the nonparametric estimation rate $T^{-s/(2s+d)}$. The adaptive rate-optimality of our test is stated in the next result.
⁶ Spokoiny (1996) studied the continuous time white noise (CTWN) model
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm142.gif?pub-status=live)
, where $\{W(t)\}_{t\in[0,1]}$ is a standard Brownian motion. Although this model is mainly of theoretical interest, results established for the CTWN model extend to more common models through model equivalence; see Brown and Low (1996).
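The comparison of rates can be checked numerically. The sketch takes ρT = ((ln ln T)^{1/2}/T)^{2s/(4s+d)}, the order attributed to ρT in the discussion of K* after Theorem 2, and compares it with the known-s rate T^{−2s/(4s+d)} and the estimation rate T^{−s/(2s+d)}. The function names are illustrative.

```python
import math


def adaptive_rate(T: float, s: float, d: int) -> float:
    """rho_T = ((ln ln T)**0.5 / T)**(2s / (4s + d))."""
    return (math.sqrt(math.log(math.log(T))) / T) ** (2.0 * s / (4.0 * s + d))


def known_s_rate(T: float, s: float, d: int) -> float:
    """Detection rate for a known smoothness index: T**(-2s / (4s + d))."""
    return T ** (-2.0 * s / (4.0 * s + d))


def estimation_rate(T: float, s: float, d: int) -> float:
    """Nonparametric estimation rate T**(-s / (2s + d))."""
    return T ** (-s / (2.0 * s + d))


for T in (1e3, 1e6, 1e9):
    print(T, adaptive_rate(T, 2.0, 1), known_s_rate(T, 2.0, 1), estimation_rate(T, 2.0, 1))
```

Since s/(2s + d) < 2s/(4s + d) < 1/2, the detection rate lies between the estimation rate and the parametric rate, and the ratio of the adaptive to the known-s rate is exactly (ln ln T)^{s/(4s+d)}, the moderate price of adaptation.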
THEOREM 2. Consider a sequence of alternatives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm144.gif?pub-status=live)
with $s \ge d(2/C_1 - 1)/4$, L > 0, C3 > 0, and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm145.gif?pub-status=live)
. Assume that Assumptions E, K, M, and V hold. Then, if γT is of exact order $(\ln\ln T)^{1/2}$ and provided C3 is taken large enough, the test is consistent, that is,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm146.gif?pub-status=live)
.
The proof of Theorem 2 builds on the lower power bound (2.8) and on the bias-variance decomposition (2.4). In view of the bias order (3.5) for small alternatives, an optimal choice of K in (2.8) makes the order of the penalty term $\gamma_T K^{d/2}$ proportional to $T K^{-2s}$, that is,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm015.gif?pub-status=live)
where [·] is the integer part. With this choice, K* detects alternatives whose distance from the null is of order $K^{*-s} \propto (\gamma_T/T)^{2s/(4s+d)}$, which coincides with the optimal adaptive order ρT provided γT has the smallest possible order $(\ln\ln T)^{1/2}$ compatible with Theorem 1. Note that, under Assumption K, K* is in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm147.gif?pub-status=live)
provided $s \ge d(2/C_1 - 1)/4$, which implies that $s > 7d/4$ because $C_1 < 1/4$.
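The choice of K* can be verified directly: the unrounded degree K = (T/γ)^{2/(4s+d)} equates γK^{d/2} with TK^{−2s}, and then K^{−s} = (γ/T)^{2s/(4s+d)}. The sketch below checks this balance and evaluates the smoothness threshold d(2/C1 − 1)/4, a sufficient condition (ignoring the factor γT ≥ 1) for K* to stay below Kmax = T^{C1/d}. Function names and parameter values are illustrative.

```python
import math


def k_star(T: float, gamma: float, s: float, d: int) -> int:
    """K* = [(T / gamma)**(2 / (4s + d))], with [.] the integer part."""
    return int((T / gamma) ** (2.0 / (4.0 * s + d)))


def smoothness_threshold(C1: float, d: int) -> float:
    """d (2 / C1 - 1) / 4: smoothness above which T**(2/(4s+d)) <= K_max = T**(C1/d)."""
    return d * (2.0 / C1 - 1.0) / 4.0


T, s, d = 1e6, 2.0, 1
gamma = math.sqrt(math.log(math.log(T)))       # penalty of exact order (ln ln T)**0.5
K = (T / gamma) ** (2.0 / (4.0 * s + d))       # unrounded balancing degree
print(k_star(T, gamma, s, d), gamma * K ** (d / 2), T * K ** (-2 * s))
print(smoothness_threshold(0.24, 1))           # exceeds 7/4 because C1 < 1/4
```

The two printed penalty and bias terms coincide, which is exactly the balance that defines K*.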
Because adaptation means detection over various smoothness classes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm148.gif?pub-status=live)
, it is crucial that the test combine several statistics, as seen from the optimal K* in (3.6), which depends on the smoothness index s. Therefore, tests that use a single statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm149.gif?pub-status=live)
generally fail to be rate-optimal adaptive. A more specific property of the test (2.6) is detection of small local alternatives.
THEOREM 3. Consider a sequence of local alternatives μT(·) satisfying
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-31997-mediumThumb-S0266466606060282ffm150.jpg?pub-status=live)
Then, under Assumptions E, K, M, V, and X, the test is consistent provided
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm151.gif?pub-status=live)
.
Because Kmin can diverge very slowly, the rate rT can be arbitrarily close to the parametric detection rate $T^{-1/2}$. This slightly improves on the results of Horowitz and Spokoiny (2001), who achieved a rate $(\ln\ln T)^{1/2}/T^{1/2}$. The key argument is that the local alternatives of Theorem 3 are asymptotically very smooth, because the departure from the null $r_T \Delta_{0T}(\cdot)$ is in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm152.gif?pub-status=live)
, with a Lipschitz constant $L r_T$ that goes to 0. Hence these alternatives differ from the general ones of Theorem 2 and are typically detected by trigonometric series of low degree such as Kmin, so that (2.9) yields consistency of the test. On the other hand, using the single statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm153.gif?pub-status=live)
would give a test that is not consistent against the alternatives of Theorem 2, so that combining several statistics as in our procedure is crucial to achieve these opposite kinds of detection properties.
4. SIMULATION EXPERIMENTS
In this section we study the size and the power properties of the proposed procedure when testing for a null of linearity in the context of a Markov process of order 1. The resulting test is compared with the one developed by Hamilton (2001) to detect nonlinearity. First, to examine the size properties, we use the AR(1)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm154.gif?pub-status=live)
Three distributions are considered for the error term: standard normal, standardized Student t with five degrees of freedom, and a centered and standardized exponential. To examine the sensitivity of the tests to temporal dependence, we consider various values of the autoregressive parameter ρ, namely, ρ = 0, 0.25, 0.50, 0.75. To implement our test, we set the interval Λ of Section 2, onto which the covariate Yt−1 is projected for the trigonometric expansion, equal to plus or minus two standard deviations of Yt under the null. This interval contains approximately 95% of the observations. The set
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm155.gif?pub-status=live)
is equal to {1,2,4,8,16}. The asymptotic critical value is given by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm156.gif?pub-status=live)
, where χ0.05(1) is the 5% critical value of a chi-square distribution with one degree of freedom. We study the small-sample properties of the test for various values of the penalty parameter γT, setting γT equal to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm157.gif?pub-status=live)
with c = 2, 3, 5. The parameters are estimated by ordinary least squares (OLS). The sample size is set to 200, and the number of simulations is 10,000.
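The data-generating side of the size experiment can be sketched as follows. This is not the authors' code: the helper names are hypothetical, the null AR(1) is assumed to have no intercept, and Λ is taken as plus or minus two stationary standard deviations of Yt, that is, 2/(1 − ρ²)^{1/2} with unit error variance. The sketch draws the three error laws with unit variance, fits ρ by OLS, and reports the share of Yt−1 falling in Λ, which is about 95% for Gaussian errors.

```python
import numpy as np


def standardized_errors(kind: str, size: int, rng: np.random.Generator) -> np.ndarray:
    """Zero-mean, unit-variance errors for the three designs of the size experiment."""
    if kind == "normal":
        return rng.standard_normal(size)
    if kind == "student5":
        # Student t with 5 degrees of freedom has variance 5/3: rescale to unit variance.
        return rng.standard_t(5, size) / np.sqrt(5.0 / 3.0)
    if kind == "exponential":
        # Exp(1) has mean 1 and variance 1: centering gives mean 0 and variance 1.
        return rng.exponential(1.0, size) - 1.0
    raise ValueError(f"unknown error distribution: {kind}")


def simulate_ar1(rho: float, T: int, kind: str, rng: np.random.Generator, burn: int = 200) -> np.ndarray:
    """Y_t = rho * Y_{t-1} + eps_t (no intercept), discarding a burn-in period."""
    eps = standardized_errors(kind, T + burn, rng)
    y = np.empty(T + burn)
    y[0] = eps[0]
    for t in range(1, T + burn):
        y[t] = rho * y[t - 1] + eps[t]
    return y[burn:]


def ols_ar1(y: np.ndarray) -> float:
    """OLS slope of Y_t on Y_{t-1} without intercept."""
    lag, cur = y[:-1], y[1:]
    return float(lag @ cur / (lag @ lag))


def share_in_lambda(y: np.ndarray, rho: float) -> float:
    """Share of Y_{t-1} inside +/- 2 stationary standard deviations under the null."""
    sd = 1.0 / np.sqrt(1.0 - rho ** 2)   # unit error variance
    return float(np.mean(np.abs(y[:-1]) <= 2.0 * sd))


rng = np.random.default_rng(0)
for kind in ("normal", "student5", "exponential"):
    y = simulate_ar1(0.5, 200, kind, rng)
    print(kind, round(ols_ar1(y), 3), round(share_in_lambda(y, 0.5), 3))
```

Repeating this 10,000 times per design and applying the test to each sample would give the rejection frequencies; the penalty formula itself is not reproduced here.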
The simulation results for the size, which are presented in Table 1, are encouraging. For c = 2 the test slightly overrejects in all cases. However, for c = 3,5, the size is accurate whatever the distribution, persistence, and number of observations considered. The Lagrange multiplier (LM) test developed by Hamilton (2001) shares these good size properties.
Table 1. Size properties (5%) of our test and the Hamilton (LM) test (200 observations)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-31850-mediumThumb-S0266466606060282tbl001.jpg?pub-status=live)
To study the effect on power of the penalty sequence γT, two alternative specifications of the linear autoregressive process are examined. The first specification is a threshold autoregressive model defined as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm158.gif?pub-status=live)
where εt is i.i.d. N(0,1).⁷
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm159.gif?pub-status=live)
Thus, under the null, $\mu(X_t) = \rho_1 Y_{t-1}$, whereas the nonlinear alternative is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm160.gif?pub-status=live)
⁷ Only the results for the normal distribution are reported here because the results for the two other distributions are very similar. These results are available upon request.
To examine the sensitivity of the tests to temporal dependence, we consider various types of dependence for the process Yt. We run the following experiments: (1) ρ1 = 0 and ρ2 = 0.25, 0.50, 0.75; (2) ρ1 = 0.25 and ρ2 = 0.50, 0.75, −0.50; (3) ρ1 = 0.50 and ρ2 = 0.25, 0, −0.25; and (4) ρ1 = 0.75 and ρ2 = 0.50, 0.25, 0. The values of ρ2 under the alternative are chosen such that the parameter δ that governs the distance from the null is equal to 0.25, 0.50, and 0.75, respectively. Table 2 reports the power results. Our test is more powerful than Hamilton's in all cases. Our power gains increase with the degree of temporal dependence and with the distance of the alternative from the null. The difference in rejection rates can be as high as 38%.
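A quick arithmetic check of this design, assuming for the check that the distance parameter is δ = |ρ2 − ρ1|: each group of experiments then yields δ = 0.25, 0.50, 0.75 in order. The dictionary and function names are illustrative.

```python
# Designs of the first power experiment: rho1 -> the three values of rho2.
designs = {
    0.00: (0.25, 0.50, 0.75),
    0.25: (0.50, 0.75, -0.50),
    0.50: (0.25, 0.00, -0.25),
    0.75: (0.50, 0.25, 0.00),
}


def distances(rho1: float, rho2_values) -> list[float]:
    """Distance from the null, assumed here to be delta = |rho2 - rho1|."""
    return [abs(rho2 - rho1) for rho2 in rho2_values]


for rho1, rho2_values in designs.items():
    print(rho1, distances(rho1, rho2_values))  # distances are [0.25, 0.5, 0.75] for every rho1
```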
Table 2. Power properties (5%) of our test and the Hamilton (LM) test: first experiment (200 observations)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-83247-mediumThumb-S0266466606060282tbl002.jpg?pub-status=live)
The second experiment corresponds to an alternative for which the data-driven optimal test is specifically designed. The alternative models have the following form:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm016.gif?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm161.gif?pub-status=live)
, and εt is i.i.d. N(0,1). Figure 1 shows the function f(·) for τ = 1, 0.50, and 0.25, with ρ = 0.50 and values of Yt between −10 and 10. The function f(·) is symmetric around zero and more concentrated for smaller values of τ. It is bounded between zero and one, with f(0) = 1 and $\lim_{x\to\pm\infty} f(x) = 0$. It can easily be shown that the alternative (4.1) satisfies the drift condition of Fan and Yao (2003, Thm. 2.4) for geometric ergodicity, so this alternative is compatible with the assumptions in this paper.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-03944-mediumThumb-S0266466606060282fig001g.jpg?pub-status=live)
Figure 1. Alternative model (ρ = 0.50). Dashed line, τ = 0.25; thick line, τ = 0.50; solid line, τ = 1.
We examine the sensitivity of the tests to the narrowness of the peak and to temporal dependence, using the parameter values τ = 0.25, 0.50, 1 and ρ = 0.25, 0.50, 0.75. Table 3 shows the results of the experiment. For τ = 1, the rejection rate of Hamilton's test is close to the nominal size, whereas, for 200 observations, our test rejects at a rate of 17% for ρ = 0.25 and 56% for ρ = 0.75. For τ = 0.50, our test also clearly dominates the test proposed by Hamilton in all cases. For a narrow peak (τ = 0.25), the rejection rates of both tests are quite similar. The better performance of the Hamilton test for this alternative, compared with the one with a wider peak, is probably due to the specification of the variance-covariance function of the random field underlying the test statistic. See Hamilton (2001) for further details on the construction of this test.
Table 3. Power properties (5%) of our test and the Hamilton (LM) test: second experiment (200 observations)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-25429-mediumThumb-S0266466606060282tbl003.jpg?pub-status=live)
5. CONCLUDING REMARKS
This paper proposes a new adaptive rate-optimal specification test for time series. As in the maximum approach of Fan (1996) or Horowitz and Spokoiny (2001), the test combines several statistics to achieve adaptive rate-optimality. More specifically, the test builds on series regression chi-square statistics with increasing orders. A data-driven selection procedure, in the spirit of Guerre and Lavergne (2005), uses a penalty term proportional to the square root of the number of Fourier coefficients to choose the test statistic. Under the null, the retained statistic is, with high probability, a statistic whose distribution is close to a chi-square. Therefore, standard chi-square critical values can be used, allowing for better control of the size of the test. This contrasts with the maximum approach, where using a null limit distribution performs poorly, as noted in Fan (1996), or is out of reach, as in Horowitz and Spokoiny (2001). Hence, the maximum approach requires simulated critical values, which limits its scope to time series models that can be easily simulated. A simulation experiment confirms the good level properties of the proposed test, which shows interesting power improvements compared to a simpler test using a single statistic such as that of Hamilton (2001). We also show that the test is adaptive rate-optimal and detects local alternatives approaching the null at a faster rate than in Horowitz and Spokoiny (2001). The simulation experiment shows that the choice of the penalty term has a moderate impact on the power. This illustrates the appeal of our approach, which builds on the fact that the combination mechanism inherent to adaptive testing can also be designed to achieve a level close to the nominal size.
Although our results are stated for Fourier series methods, our approach also applies to wavelets or polynomial series regression. As noted in Guerre and Lavergne (2005), the series construction of the test statistic can easily be modified to cope with additive alternatives, which are not affected by the curse of dimensionality. Obtaining an accurate size in the case of kernel or local polynomial methods is theoretically feasible. The scope of applications of the new data-driven selection procedure can also be extended as discussed in Hart (1997) for earlier adaptive procedures or as in Tjøstheim (1994) and Fan and Yao (2003) in the time series context, in addition to many other specification hypotheses of econometric interest.
6. PROOFS OF MAIN RESULTS
The proofs are organized as follows. Important intermediate results and proofs of the main statements are given in this section. Proofs of auxiliary results are gathered in Appendixes A and B. We now introduce some notation and conventions. Without loss of generality, all functions can be set to 0 outside Λ. We set
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm162.gif?pub-status=live)
. The symbol $a_T \asymp b_T$ means that the two sequences aT and bT have the same sign and satisfy $c|a_T| \le |b_T| \le C|a_T|$ for some $0 < c \le C < \infty$ and all T ≥ 1. Constants are denoted by the generic letter C and vary from expression to expression.
For notational convenience, we reindex the trigonometric functions (2.1) as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm163.gif?pub-status=live)
and set cK = κ. We assume that the new ordering is such that ΨK = [ψ1,…, ψκ], and we use the notation Ψκ for ΨK. Here
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm164.gif?pub-status=live)
, is a column vector with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm165.gif?pub-status=live)
. Therefore Ψκ is a T × κ matrix and κ ≍ K^d. With a slight abuse of notation,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm166.gif?pub-status=live)
T denotes both the set of admissible K and the corresponding set of κ, with κ between κmin ≍ 2^{Jmin d} and κmax ≍ 2^{Jmax d}. The term
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm167.gif?pub-status=live)
corresponds to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm168.gif?pub-status=live)
. The variance estimation rate in Assumption V is such that vT = o(κmax^{−3/2}/ln T).
Let ∥·∥ be the euclidean norm of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm169.gif?pub-status=live)
, that is, if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm170.gif?pub-status=live)
. If m = [m(X1),…, m(XT)]′ where m(·) maps
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm171.gif?pub-status=live)
. Under Assumption E,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm172.gif?pub-status=live)
. For a κ × κ matrix Σ = [Σ_{kℓ}]_{1≤k,ℓ≤κ}, ∥Σ∥ is the spectral norm
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm173.gif?pub-status=live)
. Recall that ∥Σu∥ ≤ ∥Σ∥∥u∥ and |u1′Σu2| ≤ ∥Σ∥∥u1∥∥u2∥. It follows that the entries of Σu are bounded by κ^{1/2}∥Σ∥ max_{1≤k≤κ}|uk|. If Σ is a symmetric matrix, ∥Σ∥ = sup_{∥u∥=1}|u′Σu| is the largest eigenvalue of Σ in absolute value. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm174.gif?pub-status=live)
is the orthogonal projection on the space spanned by the columns of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm175.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm176.gif?pub-status=live)
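The matrix facts recalled above, including that an orthogonal projection is symmetric, idempotent and of norm one, can be checked numerically. The sketch below is only an illustration: it uses a random Gaussian T × κ matrix as a stand-in for the regressors (an assumption; the paper uses the trigonometric functions (2.1)), and `np.linalg.norm(A, 2)`, which returns the largest singular value, that is, the operator norm induced by the Euclidean norm.

```python
import numpy as np

rng = np.random.default_rng(0)
T, kappa = 200, 6

# Spectral-norm facts for a kappa x kappa matrix.
Sigma = rng.standard_normal((kappa, kappa))
u, u1, u2 = rng.standard_normal((3, kappa))
norm_S = np.linalg.norm(Sigma, 2)  # largest singular value
assert np.linalg.norm(Sigma @ u) <= norm_S * np.linalg.norm(u) + 1e-12
assert abs(u1 @ Sigma @ u2) <= norm_S * np.linalg.norm(u1) * np.linalg.norm(u2) + 1e-12
# Entries of Sigma u are bounded by kappa^{1/2} ||Sigma|| max_k |u_k|.
assert np.max(np.abs(Sigma @ u)) <= np.sqrt(kappa) * norm_S * np.max(np.abs(u)) + 1e-12

# For a symmetric matrix, the norm is the largest eigenvalue in absolute value.
S_sym = Sigma + Sigma.T
assert np.isclose(np.linalg.norm(S_sym, 2), np.max(np.abs(np.linalg.eigvalsh(S_sym))))

# Orthogonal projection on the column space of a hypothetical T x kappa design.
Psi = rng.standard_normal((T, kappa))
P = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)
assert np.allclose(P, P.T) and np.allclose(P @ P, P)  # symmetric and idempotent
assert np.isclose(np.trace(P), kappa)                  # rank kappa
assert np.isclose(np.linalg.norm(P, 2), 1.0)           # a projection has norm 1
```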
In what follows, we bound variances of sums using the Wolkonski–Rozanov inequality (see Fan and Yao, 2003, Prop. 2.5(ii)), which states that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm177.gif?pub-status=live)
for any real-valued bounded g1(·) and g2(·). This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-85668-mediumThumb-S0266466606060282frm017.jpg?pub-status=live)
6.1. Estimation Errors
We consider first the parametric and variance estimation errors induced by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm178.gif?pub-status=live)
, respectively. For ΔT(·) = μT(·) − m(·;θT), set U = ΔT + ε and let Ω^{1/2} be the T × T diagonal matrix with diagonal entries σ(Xt). Set
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-90752-mediumThumb-S0266466606060282frm018.jpg?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-07987-mediumThumb-S0266466606060282ffm179.jpg?pub-status=live)
PROPOSITION 1. Consider a departure from the null such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm180.gif?pub-status=live)
. Under Assumptions E, M, V, and X, and if κmin → ∞ and κmax = O(T^{1/3}/ln² T), we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-56102-mediumThumb-S0266466606060282ffm181.jpg?pub-status=live)
Proof of Proposition 1. See Appendix A.
6.2. Proof of Theorem 1
The next proposition is the key tool to establish Theorem 1.
PROPOSITION 2. Assume that H0 holds, that is, ΔT(·) = 0. Then, under Assumptions E, K, M, V, and X, the following hold.
(i) Let χ(κ) be a chi-square variable with κ degrees of freedom. Then, for any κ = κT in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm182.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-89762-mediumThumb-S0266466606060282ffm183.jpg?pub-status=live)
(ii) Assume that (3.4) holds, that is, that for some ε > 0,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm184.gif?pub-status=live)
. Then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm185.gif?pub-status=live)
Proof of Proposition 2. See Appendix A.
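Proposition 2(i) states that, under the null, the statistic of order κ behaves like a χ²(κ) variable. In an idealized setting this is exact: if ε has i.i.d. N(0, σ²) entries and P is the orthogonal projection on the columns of a fixed T × κ design, then ε′Pε/σ² ~ χ²(κ). The Monte Carlo sketch below checks the first two moments, κ and 2κ. The Gaussian errors, the fixed cosine design and the known σ are simplifying assumptions for illustration; the paper's setting allows dependent, heteroskedastic data and an estimated variance.

```python
import numpy as np


def projection_quadratic_forms(T=300, kappa=10, sigma=1.5, n_rep=4000, seed=1):
    """Simulate eps' P eps / sigma^2 for a fixed design and i.i.d. Gaussian errors."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=T)
    # Hypothetical design: cosine regressors on [0, 1], not the paper's basis (2.1).
    Psi = np.column_stack([np.cos(np.pi * k * x) for k in range(kappa)])
    P = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)  # projection on the columns of Psi
    eps = sigma * rng.standard_normal((n_rep, T))  # one error vector per row
    # Row-wise quadratic forms eps_r' P eps_r, scaled by the known variance.
    return np.sum((eps @ P) * eps, axis=1) / sigma**2


q = projection_quadratic_forms()
print(round(q.mean(), 2), round(q.var(), 2))  # close to kappa = 10 and 2 * kappa = 20
```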
Proof of Theorem 1. Equation (3.2) and Proposition 2(ii) yield
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm186.gif?pub-status=live)
Then the definition of zα in (2.6) and Proposition 2(i) yield
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-54021-mediumThumb-S0266466606060282ffm187.jpg?pub-status=live)
6.3. Proof of Theorems 2 and 3
The next lemma is crucial for the consistency properties of the test; it is used to handle the term
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm188.gif?pub-status=live)
in (2.3).
LEMMA 1. Consider a departure from the null such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm189.gif?pub-status=live)
. Assume that Assumptions E, V, and X hold and that κ = κT diverges with κ = o(T^{1/3}/ln² T).
Then there exists a constant C5 > 0, depending upon s, L, and Λ, such that for any
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm190.gif?pub-status=live)
, any Δ(·) from Λ to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm191.gif?pub-status=live)
in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm192.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm019.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm020.gif?pub-status=live)
Proof of Lemma 1. See Appendix A.
Proof of Theorem 2. Let s ≤ d(2/C1 − 1) and L be some unknown smoothness indices. Let K* be as in (3.6), so that K* corresponds to some κ* in the new indexing. Observe that this κ* satisfies
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm021.gif?pub-status=live)
because γT is of exact order (ln ln T)^{1/2}, s > 0, and κmin is bounded by a power of ln T.
Consider now a sequence of alternatives μT(·) in H1(C3ρT) with C3ρT > 2C5κ*^{−s/d}, where C5 is from Lemma 1. This gives that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm193.gif?pub-status=live)
and that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm194.gif?pub-status=live)
diverges. Hence Lemma 1 gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-91229-mediumThumb-S0266466606060282ffm195.jpg?pub-status=live)
Observe also that Proposition 2(i) shows that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-31460-mediumThumb-S0266466606060282ffm196.jpg?pub-status=live)
Hence, combining (6.5) with Proposition 1 applied with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm197.gif?pub-status=live)
(so that κmax = κmin = κT) and substituting yields
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-31681-mediumThumb-S0266466606060282ffm198.jpg?pub-status=live)
provided C3 is large enough. Theorem 2 then follows from the lower power bound (2.8). █
Proof of Theorem 3. Because the proof of Theorem 3 is similar to that of Theorem 2, except that detection is achieved through κmin, we only give the main steps. Expression (2.7) yields that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm199.gif?pub-status=live)
, so that it is sufficient to show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm200.gif?pub-status=live)
diverges to +∞ in probability. Arguing as for Theorem 2, Propositions 1 and 2(i) and Lemma 1 now give, because κmin ≍ Kmin^d → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-11890-mediumThumb-S0266466606060282ffm201.jpg?pub-status=live)
provided T rT² diverges with lim_{T→∞} Kmin^{d/2}/(T rT²) = 0, as assumed in Theorem 3. █
APPENDIX A: Proofs of Propositions 1 and 2 and Lemma 1
A.1. Preliminary Lemmas.
We begin with the estimation errors
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm202.gif?pub-status=live)
(see (6.2)) and preliminary bounds. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm022.gif?pub-status=live)
which are used to study the difference
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm203.gif?pub-status=live)
in the proof of Proposition 2(ii). The next lemmas hold for general orthonormal systems
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm204.gif?pub-status=live)
of L2(Λ,dx) with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm205.gif?pub-status=live)
. Recall that vT is such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm206.gif?pub-status=live)
with vT = o(κmax^{−3/2}/ln T); see Assumption V.
LEMMA A.1. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm207.gif?pub-status=live)
be as in (6.2) and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm208.gif?pub-status=live)
as in (A.1). Then, under Assumptions E, V, and X,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-61911-mediumThumb-S0266466606060282ffm209.jpg?pub-status=live)
LEMMA A.2. Let mT(·) and μT(·) from
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm210.gif?pub-status=live)
be some functions with support Λ. Then, under Assumptions E, V, and X and if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm212.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm023.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm024.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-67702-mediumThumb-S0266466606060282frm025.jpg?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-42863-mediumThumb-S0266466606060282frm026.jpg?pub-status=live)
The functions mT(·) and μT(·) may depend upon (X1,ε1),…,(XT,εT) in (A.2) but not in (A.5).
Proofs of Lemmas A.1 and A.2. See Appendix B.
The next lemma is used for Proposition 2. It is stated for general maps φk(·) from
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm213.gif?pub-status=live)
, k ≥ 1. Consider the row vector Φκ(Xt) = [φ1(Xt),…, φκ(Xt)] and the T × κ matrix Φκ = [Φκ(X1)′, …, Φκ(XT)′]′. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm214.gif?pub-status=live)
We make the following assumption.
Assumption B. The matrices Vκ have an inverse with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm215.gif?pub-status=live)
, and the functions φk(·) are such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm216.gif?pub-status=live)
.
Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm217.gif?pub-status=live)
We now study the tail probability of QT.
LEMMA A.3. Let QT = QκT be as before. Then, under Assumptions E, X(i), and B, and if κ = κT = o(T^{(3/4)[(1+a)/(5+3a)]}), the following hold.
(i) Let χ(κ) be a chi-square variable with κ degrees of freedom. Then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm218.gif?pub-status=live)
(ii) Consider ε > 0. Then there exists a constant Cε, which does not depend upon κ and γ, such that for any γ > ε and κ,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm219.gif?pub-status=live)
Proof of Lemma A.3. See Appendix B.
A.2. Proof of Propositions 1 and 2.
Proof of Proposition 1. To simplify notation, the proof is given for p = dim θ = 1. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm220.gif?pub-status=live)
This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm221.gif?pub-status=live)
Under Assumption M,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm211.gif?pub-status=live)
, which gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm222.gif?pub-status=live)
, so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm223.gif?pub-status=live)
. Consider now Aκ. Under Assumption M, the Taylor formula gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm224.gif?pub-status=live)
with a θtT* between
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm226.gif?pub-status=live)
and where m1 and m2 are
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm227.gif?pub-status=live)
column vectors with bounded entries given by the first- and second-order derivatives. Because U = ΔT + ε, this gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-21541-mediumThumb-S0266466606060282ffm228.jpg?pub-status=live)
The Cauchy–Schwarz inequality gives |A1κ| ≤ ∥e(θ)∥∥ΔT∥ with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm229.gif?pub-status=live)
, so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm230.gif?pub-status=live)
because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm231.gif?pub-status=live)
by the Markov inequality and Assumption E. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm232.gif?pub-status=live)
and under Assumption M, applying (A.5) for A2κ and the Cauchy–Schwarz inequality for A3κ give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm233.gif?pub-status=live)
Substituting in the expression of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm234.gif?pub-status=live)
gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm027.gif?pub-status=live)
But
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm235.gif?pub-status=live)
so that substituting (A.4) into the preceding equation and using (A.6) gives the desired result. █
Proof of Proposition 2. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm236.gif?pub-status=live)
Under the null, Proposition 1 yields
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm028.gif?pub-status=live)
Hence Proposition 2(i) follows from Lemma A.3(i) with κ = κmin and from (A.7). Consider now Proposition 2(ii). Let ε be as in (3.4), so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm237.gif?pub-status=live)
for T large enough. Therefore (A.7) yields that Proposition 2(ii) is a consequence of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm029.gif?pub-status=live)
To prove (A.8), we first rewrite Rκ0 − Rκmin0 as a suitable quadratic form. For k,κ > κmin, let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm238.gif?pub-status=live)
be as in (A.1) and consider the row vectors
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm239.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm240.gif?pub-status=live)
for some regular κ × κ matrix βκ. Elementary algebra gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm241.gif?pub-status=live)
Hence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm242.gif?pub-status=live)
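A standard fact behind rewriting a difference of nested series statistics as a single quadratic form is that, when the regressors of order κmin are the first κmin columns of the order-κ design, Pκ − Pκmin is itself an orthogonal projection, of rank κ − κmin. The short check below illustrates this with a hypothetical cosine design; it does not reproduce the specific matrices βκ used in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
T, kappa_min, kappa = 250, 4, 12
x = rng.uniform(0.0, 1.0, size=T)
# Hypothetical nested design: the first kappa_min columns are shared.
Psi = np.column_stack([np.cos(np.pi * k * x) for k in range(kappa)])


def proj(A):
    """Orthogonal projection on the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)


D = proj(Psi) - proj(Psi[:, :kappa_min])
assert np.allclose(D, D.T) and np.allclose(D @ D, D)  # D is an orthogonal projection
assert np.isclose(np.trace(D), kappa - kappa_min)     # of rank kappa - kappa_min
# Hence, for i.i.d. N(0, sigma^2) errors, eps' D eps / sigma^2 ~ chi-square(kappa - kappa_min).
```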
We now verify that the quadratic form
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm243.gif?pub-status=live)
obeys the conditions of Lemma A.3. Lemma A.1(i) yields that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm244.gif?pub-status=live)
, so that Assumption B holds with φ∞ = O(κmin^{1/2}) = O(ln^{C2 d/2} T). Recall that κ − κmin ≍ 2^{jd} − 2^{Jmin d} by the definition (3.1) of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm245.gif?pub-status=live)
. Hence Lemma A.3(ii) yields, for (A.8),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-44718-mediumThumb-S0266466606060282ffm246.jpg?pub-status=live)
A.3. Proof of Lemma 1.
In this proof, we apply Lemmas A.1 and A.2 for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm247.gif?pub-status=live)
, which is such that κ = κmin = κmax = o(T^{1/3}/ln² T). The Jackson theorem (see Timan, 1994, eqn. (8), p. 278) yields that there is a trigonometric polynomial function Π(·) = ΠΔT,κ(·) with degree ≍ κ^{1/d} such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm030.gif?pub-status=live)
Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm248.gif?pub-status=live)
is bounded away from 0 over Λ in probability, (A.9) implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm249.gif?pub-status=live)
Note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm250.gif?pub-status=live)
. Let Π = [Π(X1),…, Π(XT)]′, which is such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm251.gif?pub-status=live)
because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm252.gif?pub-status=live)
is in the space spanned by the columns of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm253.gif?pub-status=live)
. Hence the triangle inequality and (A.9) give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-19265-mediumThumb-S0266466606060282ffm254.jpg?pub-status=live)
In the expression (A.9) of Π(·), write β = [β1,…, βκ]′, so that the definitions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm255.gif?pub-status=live)
in (6.2) and Lemma A.1(ii) give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-52447-mediumThumb-S0266466606060282ffm256.jpg?pub-status=live)
Substituting proves (6.3). Equation (6.4) follows from (A.5) and Assumption V, which gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm257.gif?pub-status=live)
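The Jackson theorem used in (A.9) converts smoothness into an approximation rate: the smoother the function, the faster the error of trigonometric approximation of degree ≍ κ^{1/d} decays. The sketch below illustrates this for d = 1 with a hypothetical Lipschitz target f(x) = |x − 1/2| on [0, 1], whose periodic extension is continuous. It uses a discrete least-squares fit on a fine grid rather than the best uniform approximation of the theorem, so it only illustrates the qualitative decay.

```python
import numpy as np


def trig_design(x, degree):
    """Columns 1, cos(2 pi j x), sin(2 pi j x) for j = 1..degree."""
    cols = [np.ones_like(x)]
    for j in range(1, degree + 1):
        cols += [np.cos(2 * np.pi * j * x), np.sin(2 * np.pi * j * x)]
    return np.column_stack(cols)


def approx_errors(f, degrees, n_grid=4000):
    """Sup-norm error on a grid of the least-squares trigonometric fit of each degree."""
    x = (np.arange(n_grid) + 0.5) / n_grid
    y = f(x)
    errors = []
    for degree in degrees:
        B = trig_design(x, degree)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        errors.append(np.max(np.abs(y - B @ coef)))
    return errors


degrees = [1, 3, 7, 15, 31]
errs = approx_errors(lambda x: np.abs(x - 0.5), degrees)
for deg, err in zip(degrees, errs):
    print(f"degree {deg:2d}: max error {err:.2e}")
```

For this Lipschitz target the error roughly halves each time the degree doubles, in line with a rate of order degree^{−1}.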
APPENDIX B: Proof of Lemmas A.1–A.3
B.1. Proof of Lemma A.1.
We begin with Lemma A.1(i),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm258.gif?pub-status=live)
. Recall that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm259.gif?pub-status=live)
is the largest eigenvalue of the symmetric matrix Σκ and that ∥Σκ^{−1}∥ is the inverse of the smallest eigenvalue of Σκ. Hence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm260.gif?pub-status=live)
Because f(·) and σ(·) are bounded away from 0 and infinity over Λ by Assumptions E and X(ii), and because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm261.gif?pub-status=live)
is an orthonormal system of L2(Λ,dx), we have uniformly in κ
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm262.gif?pub-status=live)
This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm263.gif?pub-status=live)
, and we now prove that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm264.gif?pub-status=live)
. Let Ψκminκ(Xt) = [ψκmin+1(Xt),…, ψκ(Xt)] and note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-98787-mediumThumb-S0266466606060282ffm265.jpg?pub-status=live)
It then follows that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm266.gif?pub-status=live)
where A ≼ B means that B − A is a symmetric nonnegative definite matrix. This gives that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm267.gif?pub-status=live)
because the upper bound is a diagonal block submatrix of Σκ. Observe that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm268.gif?pub-status=live)
is also a diagonal block of Σκ^{−1} by the partitioned inverse formula, so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm269.gif?pub-status=live)
. This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm270.gif?pub-status=live)
. To show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm271.gif?pub-status=live)
, note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm272.gif?pub-status=live)
is the L2(Λ, f(x)dx/σ²(x))-orthogonal projection of ψk(·) onto the span of ψ1(·),…, ψκmin(·). The Pythagorean theorem gives, uniformly in k ≥ 1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm273.gif?pub-status=live)
Therefore, the Cauchy–Schwarz inequality gives for all x and κ ≥ 1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-69539-mediumThumb-S0266466606060282ffm274.jpg?pub-status=live)
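The linear-algebra steps in this proof of Lemma A.1(i) can be checked numerically. Assuming, consistently with the weighted projection in L2(Λ, f(x)dx/σ²(x)) used above, that Σκ is the Gram matrix ∫Ψκ(x)′Ψκ(x) w(x) dx with w = f/σ², orthonormality gives u′Σκu = ∫(Σk ukψk)² w, so all eigenvalues lie in [inf w, sup w] whatever κ is. The same matrix then illustrates the Schur-complement bound in the order ≼ and the partitioned inverse formula. The weight w, the cosine basis (for d = 1) and the split point are hypothetical choices for illustration.

```python
import numpy as np

# Part 1: eigenvalues of a weighted Gram matrix of an orthonormal system (d = 1).
n_grid = 20000
x = (np.arange(n_grid) + 0.5) / n_grid  # midpoint rule on [0, 1]
dx = 1.0 / n_grid
# Hypothetical weight w(x) = f(x) / sigma^2(x), bounded away from 0 and infinity.
w = (1.0 + 0.5 * np.sin(2 * np.pi * x)) / (0.8 + 0.4 * x) ** 2


def cosine_basis(x, kappa):
    """Hypothetical orthonormal system of L2([0,1], dx): 1, sqrt(2) cos(pi k x)."""
    cols = [np.ones_like(x)] + [np.sqrt(2) * np.cos(np.pi * k * x) for k in range(1, kappa)]
    return np.column_stack(cols)


for kappa in (5, 20, 60):
    Psi = cosine_basis(x, kappa)
    Sigma = (Psi * w[:, None]).T @ Psi * dx  # quadrature for the weighted Gram matrix
    eig = np.linalg.eigvalsh(Sigma)
    # Eigenvalues stay in [min w, max w] for every kappa.
    assert w.min() - 1e-8 <= eig.min() and eig.max() <= w.max() + 1e-8

# Part 2: Schur complement and partitioned inverse, using the last Sigma (kappa = 60).
kappa_min = 5
S11, S12 = Sigma[:kappa_min, :kappa_min], Sigma[:kappa_min, kappa_min:]
S21, S22 = Sigma[kappa_min:, :kappa_min], Sigma[kappa_min:, kappa_min:]
schur = S22 - S21 @ np.linalg.solve(S11, S12)

# Loewner order: S22 - schur = S21 S11^{-1} S12 is nonnegative definite.
assert np.linalg.eigvalsh(S22 - schur).min() >= -1e-10
# Partitioned inverse: the lower-right block of Sigma^{-1} equals schur^{-1}.
Sigma_inv = np.linalg.inv(Sigma)
assert np.allclose(Sigma_inv[kappa_min:, kappa_min:], np.linalg.inv(schur))
```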
Consider now Lemma A.1(ii) and (iii). Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm275.gif?pub-status=live)
Assumptions E and X(i) and (6.1) give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm276.gif?pub-status=live)
and then, by the Cauchy–Schwarz inequality
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-71784-mediumThumb-S0266466606060282frm031.jpg?pub-status=live)
and then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm277.gif?pub-status=live)
, and we now bound
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm278.gif?pub-status=live)
. We have, uniformly in k ≤ κmax,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-02915-mediumThumb-S0266466606060282ffm279.jpg?pub-status=live)
Because κmax ≍ Kmax^d, Assumption V and κmax²/T = o(1) yield
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm280.gif?pub-status=live)
Therefore the smallest eigenvalue of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm281.gif?pub-status=live)
is bounded away from 0, and these matrices are invertible for all 1 ≤ κ ≤ κmax with probability tending to 1. The order of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm282.gif?pub-status=live)
comes from the series expansion
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm283.gif?pub-status=live)
which ends the proof of Lemma A.1(ii) and (iii) because sup_κ ∥Σκ^{−1}∥ < ∞. █
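The series expansion invoked above is, in its standard form, the Neumann series (Σ + E)^{−1} = Σ_{j≥0}(−Σ^{−1}E)^jΣ^{−1}, valid when r = ∥Σ^{−1}∥∥E∥ < 1; summing the geometric series gives ∥(Σ + E)^{−1} − Σ^{−1}∥ ≤ ∥Σ^{−1}∥²∥E∥/(1 − r). We assume, without being able to confirm it from the displayed equation, that this is the expansion meant. The sketch checks the series and the bound for a hypothetical well-conditioned Σ and a small symmetric perturbation E standing in for the estimation error.

```python
import numpy as np

rng = np.random.default_rng(4)
kappa = 8
A = rng.standard_normal((kappa, kappa))
Sigma = A @ A.T / kappa + np.eye(kappa)  # symmetric, eigenvalues >= 1
E = 0.01 * rng.standard_normal((kappa, kappa))
E = (E + E.T) / 2                        # hypothetical symmetric estimation error

Sigma_inv = np.linalg.inv(Sigma)
r = np.linalg.norm(Sigma_inv, 2) * np.linalg.norm(E, 2)
assert r < 1  # condition for the Neumann series to converge

# Truncated Neumann series sum_{j=0}^{20} (-Sigma^{-1} E)^j Sigma^{-1}.
M = -Sigma_inv @ E
term, series = Sigma_inv.copy(), Sigma_inv.copy()
for _ in range(20):
    term = M @ term
    series += term
exact = np.linalg.inv(Sigma + E)
assert np.allclose(series, exact)

# Geometric-series bound on the perturbation of the inverse.
bound = np.linalg.norm(Sigma_inv, 2) ** 2 * np.linalg.norm(E, 2) / (1 - r)
assert np.linalg.norm(exact - Sigma_inv, 2) <= bound
```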
B.2. Proof of Lemma A.2.
We first recall some empirical process results that are useful for establishing preliminary bounds. Consider the class of functions
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm284.gif?pub-status=live)
from Λ to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm285.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm286.gif?pub-status=live)
with ℓ as in Assumption V. Under Assumption V, there is an MT ≍ vT such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm287.gif?pub-status=live)
Then, to establish Lemma A.2, we can view
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm288.gif?pub-status=live)
as a member of a
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm289.gif?pub-status=live)
. Consider now a sequence of functions from Λ to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm290.gif?pub-status=live)
and define the empirical process
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm291.gif?pub-status=live)
as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-01787-mediumThumb-S0266466606060282ffm292.jpg?pub-status=live)
Modifications of bounds (8.3), (8.7), and (8.9) in Rio (2000) to account for multiplication by mT(·) and ψk(·) with supx∈Λ|ψk(x)| = 1 show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm032.gif?pub-status=live)
Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm293.gif?pub-status=live)
so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm294.gif?pub-status=live)
. The Chebyshev inequality, (B.2), and Lemma A.1(ii) give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-86443-mediumThumb-S0266466606060282frm033.jpg?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-16137-mediumThumb-S0266466606060282frm034.jpg?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm035.gif?pub-status=live)
Observe also that the martingale structure of the εt's, Assumption E, and (6.1) yield that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-60175-mediumThumb-S0266466606060282ffm295.jpg?pub-status=live)
It follows that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm036.gif?pub-status=live)
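The martingale argument invoked here can be spelled out for a scalar sum. Assume, for illustration only, a filtration (F_t) with ε_t F_t-measurable, E(ε_t | F_{t−1}) = 0, and square-integrable weights a_t that are F_{t−1}-measurable (a_t and F_t are generic notation introduced here, not the paper's). For t < s the product a_t ε_t a_s is F_{s−1}-measurable, so conditioning on F_{s−1} removes every cross term:

$$
\mathbb{E}\Big[\Big(\sum_{t=1}^{T} a_t\varepsilon_t\Big)^2\Big]
= \sum_{t=1}^{T}\mathbb{E}\big[a_t^2\varepsilon_t^2\big]
+ 2\sum_{1\le t<s\le T}\mathbb{E}\big[a_t\varepsilon_t\,a_s\,\mathbb{E}(\varepsilon_s\mid\mathcal{F}_{s-1})\big]
= \sum_{t=1}^{T}\mathbb{E}\big[a_t^2\varepsilon_t^2\big].
$$

Only the diagonal terms contribute, which is why second-moment bounds on such sums need no mixing argument.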
Note that (A.2) follows from the Cauchy–Schwarz inequality and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm296.gif?pub-status=live)
. Expression (A.3) follows from (B.3) and (B.6). We now prove (A.4). We have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-09662-mediumThumb-S0266466606060282ffm297.jpg?pub-status=live)
By (B.3), (B.5), (B.6), Lemma A.1(i), Assumption V,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm298.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-81182-mediumThumb-S0266466606060282ffm299.jpg?pub-status=live)
the other remainder terms being negligible. This gives (A.4).
We now turn to (A.5). Let πκ(·) = πκ,T(·) be a trigonometric polynomial function of Πκ with supx∈Λ|mT(x) − πκ(x)| ≤ 2 infπ(·)∈Πκ supx∈Λ|mT(x) − π(x)|. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm300.gif?pub-status=live)
is a linear combination of the columns of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm301.gif?pub-status=live)
for all κ ≥ κmin, it follows that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm302.gif?pub-status=live)
. This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm037.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-97264-mediumThumb-S0266466606060282frm038.jpg?pub-status=live)
Consider first the leading term
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm303.gif?pub-status=live)
of (B.7). Because supκmin supx∈Λ|πκmin(x)| < ∞, taking ψ1(·) = 1 in (B.2) gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm304.gif?pub-status=live)
The definition of πκmin(·) yields, under Assumption E,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-16980-mediumThumb-S0266466606060282ffm305.jpg?pub-status=live)
This gives, for the leading term of (B.7),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-65071-mediumThumb-S0266466606060282ffm306.jpg?pub-status=live)
For the first term of (B.8), note that Assumption E gives that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm307.gif?pub-status=live)
; see (6.2). Because orthogonal projection does not increase the mean squared norm, this gives, for the first term in (B.8),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-44318-mediumThumb-S0266466606060282ffm308.jpg?pub-status=live)
so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm309.gif?pub-status=live)
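The two projection facts used for (B.8) can be checked numerically: projecting onto the span of a series basis never increases the squared norm, and a vector that is already a linear combination of the columns of the smallest basis is reproduced exactly by projection onto any larger nested basis. The sketch below assumes a hypothetical trigonometric basis (1, cos 2πjx, sin 2πjx, …) evaluated at simulated design points in [0, 1] with simulated errors; `trig_design` and `project` are illustrative names, not the paper's code.

```python
import numpy as np


def trig_design(x, kappa):
    """Hypothetical design matrix: the first kappa functions of
    1, cos(2*pi*x), sin(2*pi*x), cos(4*pi*x), sin(4*pi*x), ... at x."""
    cols = [np.ones_like(x)]
    j = 1
    while len(cols) < kappa:
        cols.append(np.cos(2 * np.pi * j * x))
        if len(cols) < kappa:
            cols.append(np.sin(2 * np.pi * j * x))
        j += 1
    return np.column_stack(cols)


def project(design, y):
    """Orthogonal (least-squares) projection of y onto the column span of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


rng = np.random.default_rng(0)
T = 200
x = rng.uniform(0.0, 1.0, size=T)
eps = rng.standard_normal(T)

for kappa in (3, 7, 15):
    P_eps = project(trig_design(x, kappa), eps)
    # Projection never increases the squared norm: ||P eps||^2 <= ||eps||^2.
    print(kappa, float(P_eps @ P_eps) <= float(eps @ eps))

# A combination of the kappa_min = 3 columns lies in every larger nested span,
# so projecting it onto the kappa = 15 basis returns it unchanged.
v = trig_design(x, 3) @ np.array([0.5, -1.0, 2.0])
print(np.allclose(project(trig_design(x, 15), v), v))
```

The bases are nested by construction (the first three columns coincide for every κ ≥ 3), which is the property that lets one fixed approximation serve all κ ≥ κmin.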
For the second term in (B.8), observe that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-29390-mediumThumb-S0266466606060282ffm310.jpg?pub-status=live)
Therefore Lemma A.1, (B.3), and (B.6) yield
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-00540-mediumThumb-S0266466606060282ffm311.jpg?pub-status=live)
For the last term of (B.8), (B.3), (B.4), (B.6), and Lemma A.1 give that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-00405-mediumThumb-S0266466606060282ffm312.jpg?pub-status=live)
Substituting in (B.8) and (B.7) yields
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-56350-mediumThumb-S0266466606060282ffm313.jpg?pub-status=live)
B.3. Proof of Lemma A.3.
Abbreviate Vκ−1/2Φκ′(Xt)εt as ηt. Consider a sequence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm314.gif?pub-status=live)
variables independent of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm315.gif?pub-status=live)
, where Idκ is the identity matrix of dimension κ × κ. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm316.gif?pub-status=live)
be a three times differentiable real function. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm317.gif?pub-status=live)
. The proof of Lemma A.3 is divided into three steps. The main step aims to establish that for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm318.gif?pub-status=live)
and some C > 0 independent of κ and T,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm039.gif?pub-status=live)
Step 1. Proof of (B.9). We build on arguments used in the proof of the Lindeberg central limit theorem as given in Billingsley (1968, Thm. 7.2); see Horowitz and Spokoiny (2001, Lem. 10) for a similar approach in the context of adaptive testing. The approach consists of successively replacing the ηt by their Gaussian counterparts
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm319.gif?pub-status=live)
, as seen from (B.10) below. An important difference, however, arises from the use of nonparametric series methods and from dependence. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm320.gif?pub-status=live)
This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-65853-mediumThumb-S0266466606060282frm040.jpg?pub-status=live)
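The replacement argument rests on a telescoping identity. Writing Wt for the hybrid sum that uses the original η1, …, ηt and Gaussian counterparts for the remaining terms, f(WT) − f(W0) = Σt [f(Wt) − f(Wt−1)], and consecutive hybrids differ in a single summand, which is what makes a term-by-term Taylor expansion possible. The sketch below is one possible indexing under stated assumptions: κ-vectors stored as lists, centered uniform ηt scaled by 1/√T as a stand-in for the non-Gaussian terms, and a hypothetical smooth f of the squared norm; `hybrid_sums`, `replacement_steps`, and `f` are illustrative names, while the paper's own definitions are the displays above.

```python
import math
import random


def hybrid_sums(eta, eta_star):
    """W_t = eta_1 + ... + eta_t + eta*_{t+1} + ... + eta*_T for t = 0, ..., T.

    eta and eta_star are lists of T vectors (lists of kappa floats)."""
    T, kappa = len(eta), len(eta[0])
    sums = []
    for t in range(T + 1):
        w = [0.0] * kappa
        for s in range(T):
            term = eta[s] if s < t else eta_star[s]  # original terms first
            for k in range(kappa):
                w[k] += term[k]
        sums.append(w)
    return sums


def replacement_steps(f, eta, eta_star):
    """One-swap differences f(W_t) - f(W_{t-1}), t = 1, ..., T."""
    W = hybrid_sums(eta, eta_star)
    return [f(W[t]) - f(W[t - 1]) for t in range(1, len(W))]


def f(w):
    """Hypothetical smooth test function of the squared norm."""
    return math.atan(sum(wk * wk for wk in w))


rng = random.Random(1)
T, kappa = 50, 4
scale = 1.0 / math.sqrt(T)
# Uniform(-sqrt 3, sqrt 3) has variance 1, like its Gaussian counterpart.
eta = [[rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)) * scale for _ in range(kappa)]
       for _ in range(T)]
eta_star = [[rng.gauss(0.0, 1.0) * scale for _ in range(kappa)] for _ in range(T)]

W = hybrid_sums(eta, eta_star)
steps = replacement_steps(f, eta, eta_star)
print(f(W[-1]) - f(W[0]), sum(steps))  # equal up to rounding
```

Each step changes one summand only, so a bound on a single step, summed over t, bounds the whole gap between the original and Gaussian statistics.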
Define, for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm321.gif?pub-status=live)
. A third-order Taylor expansion of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm322.gif?pub-status=live)
with integral remainder yields
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm323.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-13162-mediumThumb-S0266466606060282frm041.jpg?pub-status=live)
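For a three times differentiable f, the third-order expansion with integral remainder reads f(x + h) = f(x) + f′(x)h + f″(x)h²/2 + h³∫₀¹ (1 − z)²/2 · f‴(x + zh) dz, so the remainder is at most |h|³ sup|f‴|/6 in absolute value. A minimal numerical check, assuming f = sin (or exp) and a composite Simpson rule for the integral; both are illustrative choices and not part of the paper.

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3


def integral_remainder(d3, x, h, n=200):
    """h^3 * integral over z in [0, 1] of (1 - z)^2 / 2 * f'''(x + z h)."""
    return h ** 3 * simpson(lambda z: 0.5 * (1.0 - z) ** 2 * d3(x + z * h), 0.0, 1.0, n)


def taylor3(f, d1, d2, d3, x, h):
    """Third-order Taylor expansion of f(x + h) with integral remainder."""
    return f(x) + d1(x) * h + 0.5 * d2(x) * h ** 2 + integral_remainder(d3, x, h)


x, h = 0.3, 0.7
approx = taylor3(math.sin, math.cos, lambda u: -math.sin(u), lambda u: -math.cos(u), x, h)
print(approx, math.sin(x + h))  # agree to quadrature accuracy
```

Because ∫₀¹ (1 − z)²/2 dz = 1/6, the remainder bound only needs a uniform bound on f‴, which is how the third-derivative terms are controlled in the proof.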
Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm324.gif?pub-status=live)
be the sigma field generated by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm325.gif?pub-status=live)
and note that StT(0) and QtT(0) are
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm326.gif?pub-status=live)
-measurable. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm367.gif?pub-status=live)
are centered given
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm327.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm328.gif?pub-status=live)
Substituting the Taylor expansion in (B.10) yields
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm042.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm043.gif?pub-status=live)
and we now bound each of these two sums.
We begin by establishing a preliminary inequality. Let n1 and n2 be two positive real numbers with 2 ≤ n1 + n2 ≤ 8. Then for any t, t′ and z ∈ [0,1],
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm044.gif?pub-status=live)
We give a proof for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm329.gif?pub-status=live)
, the other bound being similarly established. The Hölder inequality implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-57943-mediumThumb-S0266466606060282ffm330.jpg?pub-status=live)
Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm331.gif?pub-status=live)
is N(0, σ² Idκ), it is easily seen that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm332.gif?pub-status=live)
, and we now bound
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm333.gif?pub-status=live)
. We have, by convexity, the Burkholder inequality (see Chow and Teicher, 1988, p. 396, noticing that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm334.gif?pub-status=live)
is a sum of martingale differences), and the Minkowski inequality,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-95046-mediumThumb-S0266466606060282ffm335.jpg?pub-status=live)
This gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm336.gif?pub-status=live)
and then (B.14).
We now return to (B.13). The expression (B.11) for the third derivative of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm337.gif?pub-status=live)
and (B.14) yield
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-97997-mediumThumb-S0266466606060282frm045.jpg?pub-status=live)
To study (B.12), let Φκ(Xt) = Vκ−1/2Φκ′(Xt) = [φ1(Xt),…, φκ(Xt)]′, StT = StT(0) = [S1tT,…, SκtT]′, QtT = QtT(0). The definitions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm338.gif?pub-status=live)
show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm339.gif?pub-status=live)
. Therefore, because QtT and StT are
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm340.gif?pub-status=live)
-measurable, conditioning with respect to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm341.gif?pub-status=live)
yields, using the expression for the second-order derivative of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm342.gif?pub-status=live)
given in (B.11),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-21145-mediumThumb-S0266466606060282ffm343.jpg?pub-status=live)
Let n be an integer and define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm344.gif?pub-status=live)
The variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm345.gif?pub-status=live)
depend upon
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm346.gif?pub-status=live)
, which are n + 1 time periods away from the φk²(Xt)'s. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm347.gif?pub-status=live)
, the Wolkonski–Rozanov inequality yields
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm046.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-13932-mediumThumb-S0266466606060282frm047.jpg?pub-status=live)
by first integrating out with respect to the
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm348.gif?pub-status=live)
, which are independent of the ηt's, and using (B.14). Note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm349.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm350.gif?pub-status=live)
. This together with the definition of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm351.gif?pub-status=live)
and (B.14) gives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-95526-mediumThumb-S0266466606060282ffm352.jpg?pub-status=live)
Therefore, (B.16), (B.17), and these inequalities give
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-84615-mediumThumb-S0266466606060282ffm353.jpg?pub-status=live)
Summing over t gives, in (B.12),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-75824-mediumThumb-S0266466606060282frm048.jpg?pub-status=live)
under Assumption M(i). An optimal choice of the order of n in (B.18) is T^{2/(5+3a)}, which gives the upper bound
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm354.gif?pub-status=live)
. Therefore (B.12), (B.13), (B.15), and (B.18) prove (B.9).
Step 2. Proof of Lemma A.3(i). Now choose a three times continuously differentiable
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm355.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm356.gif?pub-status=live)
if z > 0. This gives, for any
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm357.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282frm049.gif?pub-status=live)
and then, by (B.9),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170407163718-12206-mediumThumb-S0266466606060282frm050.jpg?pub-status=live)
Note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm358.gif?pub-status=live)
is a
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm359.gif?pub-status=live)
that has a continuous density and converges in distribution to a standard normal as κ goes to infinity. Therefore, taking ε small enough gives Lemma A.3(i).
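Step 2 needs a three times continuously differentiable function that is constant outside a bounded interval; the authors' specific function is the one displayed above. As a hypothetical example of such a function (not necessarily the paper's choice), the degree-7 polynomial g(z) = 35z⁴ − 84z⁵ + 70z⁶ − 20z⁷ on [0, 1], extended by 0 below and 1 above, has g′(z) = 140z³(1 − z)³, so g′, g″, and g‴ vanish at both ends and g is C³. Then h(x) = 1 − g((x − y)/ε) satisfies 1{x ≤ y} ≤ h(x) ≤ 1{x ≤ y + ε}, the kind of sandwich that turns a bound on expectations of smooth functions into a bound on distribution functions. The names `smoothstep3` and `smooth_indicator` are illustrative.

```python
def smoothstep3(z):
    """Hypothetical C^3 step: 0 for z <= 0, 1 for z >= 1.

    On [0, 1] it equals 35z^4 - 84z^5 + 70z^6 - 20z^7, whose derivative
    140 z^3 (1 - z)^3 makes the first three derivatives vanish at 0 and 1.
    """
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return z ** 4 * (35.0 - 84.0 * z + 70.0 * z ** 2 - 20.0 * z ** 3)


def smooth_indicator(x, y, width):
    """C^3 approximation h of 1{x <= y}: 1{x <= y} <= h(x) <= 1{x <= y + width}."""
    return 1.0 - smoothstep3((x - y) / width)


# Taking expectations in the sandwich gives
# P(S <= y) <= E[h(S)] <= P(S <= y + width) for any random variable S.
for x in (0.25, 0.32, 0.35, 0.38, 0.45):
    print(x, round(smooth_indicator(x, y=0.3, width=0.1), 4))
```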
Step 3. Proof of Lemma A.3(ii). The proof proceeds by bounding
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm360.gif?pub-status=live)
in (B.20). Observe that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm361.gif?pub-status=live)
has the same distribution as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm362.gif?pub-status=live)
where the ζk's are i.i.d. N(0,1) random variables. As established in the proof of Theorem 7.2 of Billingsley (1968) and changing the
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm363.gif?pub-status=live)
into standard N(0,1) variables, there is a constant Cε with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm364.gif?pub-status=live)
Then (B.19) and (B.20) show
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm365.gif?pub-status=live)
Applying the Mills ratio inequality (see Shorack and Wellner, 1986, p. 850) to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060282:S0266466606060282ffm366.gif?pub-status=live)
shows that Lemma A.3(ii) is proved. █
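The Mills ratio inequality used in the last step states that, for x > 0, x/(1 + x²)·φ(x) ≤ 1 − Φ(x) ≤ φ(x)/x, where φ and Φ are the standard normal density and distribution function. A minimal numerical check using only the standard library; the helper names are illustrative.

```python
import math


def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_sf(x):
    """Upper tail 1 - Phi(x), computed via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_bounds(x):
    """Lower and upper Mills ratio bounds on 1 - Phi(x), valid for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    phi = normal_pdf(x)
    return x / (1.0 + x * x) * phi, phi / x


for x in (0.5, 1.0, 2.0, 3.0):
    lo, hi = mills_bounds(x)
    print(x, lo, normal_sf(x), hi)
```

The two bounds differ by the factor (1 + x²)/x², so they become sharp in the far tail, which is the regime relevant for small test levels.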