We propose an optimal test procedure for testing the marginal density functions of a class of nonlinear diffusion processes. The proposed test is not only optimal but also avoids undersmoothing. An adaptive test is constructed, and its asymptotic properties are investigated. To establish these properties, we derive some general moment inequalities and asymptotic distribution results for strictly stationary processes under the α-mixing condition. These results are applicable to other estimation and testing problems for strictly stationary α-mixing processes. An example of implementation is given to demonstrate that the proposed model specification procedure is applicable to economic and financial model specification and can be implemented in practice. To ensure applicability and ease of implementation, we propose a computer-intensive simulation scheme for choosing a suitable bandwidth for the kernel estimation and a simulated critical value for the proposed adaptive test. Our finite sample studies support both the proposed theory and the simulation procedure.

The authors thank the co-editor and three anonymous referees for their constructive comments and suggestions. The first author also thanks Song Xi Chen for some constructive suggestions, in particular the suggestion to use the local linear form instead of the Nadaraya–Watson kernel form in equation (2.6), and Yongmiao Hong for sending a working paper. The authors acknowledge comments from seminar participants at the International Chinese Statistical Association Meeting in Hong Kong in July 2001, the Western Australian Branch Meeting of the Statistical Society of Australia in September 2001, the University of Western Australia, and Monash University. Thanks also go to the Australian Research Council for its financial support.
Continuous-time diffusion processes arise in many applications in econometrics, but perhaps nowhere do they play as large a role as in finance. Following the pathbreaking work of Black and Scholes (1973), the use of continuous-time diffusion processes has become a common feature of many applications, especially asset pricing models. This is probably due to two reasons. The first is that continuous-time diffusion processes are able to mimic some important macroeconomic and financial phenomena (see Sundaresan, 2001). The second is that various parametric diffusion processes have proved convenient for modeling financial data. In both theory and practice, however, one needs to determine whether a parametric diffusion process is appropriate for a given set of financial data, that is, whether it is appropriate to use a diffusion process with both the drift and the volatility assumed to be parametric. To assess whether the use of parametric diffusion processes is appropriate for a given set of financial data, empirical researchers have recently shown a preference for nonparametric alternatives. Aït-Sahalia (1996a, 1996b) was among the first to pioneer the nonparametric approach. Other related studies include Jiang and Knight (1997), Stanton (1997), Chapman and Pearson (2000), Gao and King (2001), Hong and Li (2004), and Fan and Zhang (2003). Aït-Sahalia (1996a) considers testing the marginal density functions of a class of diffusion processes under the β-mixing condition. Pritsker (1998) conducts a finite sample simulation of the nonparametric kernel test proposed in Aït-Sahalia (1996a). The principal result of Pritsker (1998) is that the test rejects true models much too often when asymptotic critical values are used. This suggests that the use of an asymptotic critical value may not be suitable for analyzing the finite sample power of a test. In addition, the use of an estimation-based bandwidth in the nonparametric kernel test may also contribute to the poor finite sample performance of the test, because an estimation-based optimal bandwidth need not imply that the corresponding test is optimal. These two aspects have motivated us to establish a simulation procedure for choosing both an appropriate critical value and a bandwidth that is suitable for testing purposes, in order to improve the test proposed in Aït-Sahalia (1996a).
Recently, Horowitz and Spokoiny (2001) have developed a new test of a parametric model of a conditional mean function against a nonparametric alternative. The test adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric model converges to zero at the fastest possible rate. This rate is slower than T^{−1/2}, where T is the number of observations. To the best of our knowledge, the problem of extending the approach of Horowitz and Spokoiny (2001) to construct an adaptive and optimal test for marginal density functions has not been considered. This paper therefore proposes an adaptive test for testing marginal density functions. The proposed test has an optimal-rate property: in theory, it is consistent against some local alternatives approaching the null at an optimal rate, as stated in Section 3. In practice, we demonstrate in Section 4 how to apply the test using a simulated example. Our studies show that the proposed test has some advantages over the test proposed in Aït-Sahalia (1996a).
The rest of the paper is organized as follows. Section 2 discusses the testing of the marginal density. An adaptive test procedure is proposed in Section 3. Section 4 provides an example of implementation. Section 5 concludes the paper with some remarks on extensions. Mathematical assumptions and proofs are relegated to Appendixes A–C.
Consider a continuous-time diffusion process of the form
where μ(·) and σ(·) > 0 are, respectively, the univariate drift and volatility functions of the process indexed by θ and Bt is standard Brownian motion.
Let {rt} satisfy model (2.1) and f (·,θ) be a parametric form of the marginal density function of {rt}. Within the diffusion process, f (·,θ) is completely determined by the corresponding drift μ(·,θ) and the diffusion σ(·,θ) (see Aït-Sahalia, 1996a, expression (6)) given by
where {rt} is distributed on D = (xmin,xmax) with −∞ ≤ xmin < xmax ≤ ∞, both the lower bound x0 and ξ(θ) can be chosen to ensure that f (x,θ) is a probability density, and θ is an unknown parameter vector. Let Θ denote a parameter space in Rq and θ0 ∈ Θ denote the true value of θ.
Let f (x) be a nonparametric form of the density function. The null and alternative hypotheses are
where 0 ≤ CT ≤ 1, lim_{T→∞} CT = 0, and ΔT(x) is a continuous function satisfying ∫ΔT(x) dx = 0 and f(x) ≥ 0 under H1. Theoretically, this requires that under H1 the alternative function is still a probability density. In practice, the form of ΔT(x) needs to be constructed. A simple and natural choice is ΔT(x) = f1(x,θ1) − f(x,θ1), where f1(x,θ) is another specified density function and θ1 ∈ Θ. For example, f(x,θ) may be the marginal density of {rt} satisfying the CIR model proposed in Cox, Ingersoll, and Ross (1985), and f1(x,θ) the marginal density of {rt} satisfying the AG model proposed in Ahn and Gao (1999).
For this case, the hypothesis structure (2.3) can be written as H0 : f(x) = f(x,θ0) versus H1 : f(x) = f(x,θ1) + CT[f1(x,θ1) − f(x,θ1)]. This is equivalent to testing H0 : f(x) = f(x,θ0) against H1 : f(x) = (1 − CT)f(x,θ1) + CT f1(x,θ1). This basically requires us to test whether {rt} is sampled from f(x,θ0), or from f(x,θ1) with probability 1 − CT and from f1(x,θ1) with probability CT. Obviously, such a structure of the null hypothesis versus a sequence of local alternatives naturally extends the usual structure of the null hypothesis against a global alternative of the form
For the diffusion process, we observe the process at dates {tΔ : t = 0,1,2,…}, where Δ > 0 is generally small but fixed. Let Xt = r_{(t−1)Δ} for t ≥ 1 throughout this section. Let k(·) be a kernel function, h a bandwidth, and f̂(x) = (Th)^{−1} Σ_{t=1}^{T} k((x − Xt)/h) the standard kernel density estimator of f(x). Intuitively, it is natural to compare f̂(·) with its parametric counterpart f(·,θ) directly.
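For illustration, a minimal sketch of the standard kernel density estimator just described is given below (the function name and interface are ours, and the Gaussian kernel matches the choice adopted later in Section 4):

```python
import numpy as np

def kernel_density(x_grid, X, h):
    """Standard kernel density estimator: f_hat(x) = (Th)^{-1} sum_t k((x - X_t)/h)."""
    X = np.asarray(X, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    T = X.size
    u = (x_grid[:, None] - X[None, :]) / h           # scaled distances, shape (len(x_grid), T)
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # standard normal kernel
    return k.sum(axis=1) / (T * h)
```

For example, kernel_density(np.linspace(0.01, 0.2, 200), X, h) evaluates the estimator on a grid of interest-rate values.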
In a seminal paper, Aït-Sahalia (1996a) uses a test statistic of the form
where
.
It then follows from (13) of Aït-Sahalia (1996a) that as T → ∞
under the β-mixing and some other conditions, where
in which R(k) = ∫k2(u) du < ∞ and k(j)(0) denotes the j-times convolution product of k(·) given by
The preceding test statistic is based on
, which measures directly the difference between
. It can be shown that under H0,
This implies that it has the same order as the mean square error of
if h is chosen to be O(T^{−1/5}). Thus, to obtain an asymptotically normal distribution with zero mean, h has to satisfy lim_{T→∞} Th^{4.5} = 0, as required in Assumption A5 of Aït-Sahalia (1996a). This amounts to undersmoothing.
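To make the undersmoothing point concrete (an elementary illustration of ours, not part of the original derivation): with the estimation-optimal rate h = cT^{−1/5},

```latex
\[
  Th^{4.5} = c^{4.5}\,T^{1/10} \to \infty ,
  \qquad\text{whereas}\qquad
  Th^{5} = c^{5} < \infty .
\]
```

Hence the condition lim_{T→∞} Th^{4.5} = 0 rules out h ∝ T^{−1/5} and forces a smaller bandwidth, whereas the weaker condition lim sup_{T→∞} Th^{5} < ∞ used below retains the estimation-optimal rate.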
To reduce the bias and avoid undersmoothing, we propose a nonparametric estimator,
, of f (x,θ) of the form
where
is a consistent estimator of θ, wt(x) = wt(x,h) = (1/T)kh(x − Xt) × [(s2(x) − s1(x)(x − Xt))/(s2(x)s0(x) − s1²(x))], and sr(x) = (1/T) Σ_{t=1}^{T} kh(x − Xt)(x − Xt)^r for r = 0,1,2, where kh(·) = h^{−1}k(·/h).
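A minimal sketch of the local linear weights defined above, under the standard convention kh(u) = h^{−1}k(u/h) and with the Gaussian kernel (the helper name is ours):

```python
import numpy as np

def local_linear_weights(x, X, h):
    """Weights w_t(x) = T^{-1} k_h(x - X_t) [s_2 - s_1 (x - X_t)] / [s_2 s_0 - s_1^2]."""
    X = np.asarray(X, dtype=float)
    T = X.size
    d = x - X                                                     # x - X_t
    kh = np.exp(-0.5 * (d / h)**2) / (np.sqrt(2.0 * np.pi) * h)   # k_h(x - X_t)
    s0, s1, s2 = ((kh * d**r).mean() for r in (0, 1, 2))          # s_r(x), r = 0, 1, 2
    w = kh * (s2 - s1 * d) / (s2 * s0 - s1**2) / T
    return w                                                      # by construction, w.sum() equals 1 up to rounding
```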
We also define
If the estimator of θ is √T-consistent, then we have
It follows from Fan and Gijbels (1996) that
provided that the first three derivatives of f(x) exist, where ξ1 and ξ2 lie between x − h and x + h, ck and dk are constants depending on functionals of k(·), and σk² = ∫x²k(x) dx.
This implies that as T → ∞
As can be seen from (2.7), the use of the difference
can avoid undersmoothing. In other words, we can still assume lim sup_{T→∞} Th^{5} < ∞, under which the estimation-optimal rate h ∝ T^{−1/5} remains admissible.
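For orientation, the leading term of the kernel-smoothing expansion discussed above is the textbook bias formula, recorded here in generic form (a summary of ours; the exact remainder involving ξ1, ξ2, ck, and dk is as in the display above):

```latex
\[
  E\{\hat f(x)\} - f(x) \;=\; \tfrac{1}{2}\,h^{2}\,\sigma_k^{2}\,f''(x) + o(h^{2}),
  \qquad \sigma_k^{2} = \int x^{2}k(x)\,dx .
\]
```

Because the smoothed parametric estimator in (2.6) is built from the same local linear weights, its expansion contains the same O(h²) term; heuristically, this is why the difference used in the test can avoid undersmoothing.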
Let us now establish our test statistic. We first have a look at the following distance function:
This naturally suggests estimating D(f,θ) by
We then propose using a test statistic of the form
We now state the main results of this section. Their proofs are relegated to Appendix A.
THEOREM 2.1. (i) Suppose that Assumptions A.1–A.5 in Appendix A hold. Then under H0 in (2.3) we have
where
.
(ii) Assume that the conditions of (i) hold. In addition, assume that there is a random data-driven
such that
. Then under H0 in (2.3) we have
THEOREM 2.2. (i) Suppose that Assumptions A.1–A.5 in Appendix A hold. Then under H0 in (2.3) we have
where
are as defined in (2.4).
(ii) Assume that the conditions of (i) hold. In addition, assume that there is a random data-driven
such that
. Then under H0 in (2.3) we have
Remark 2.1. (i) Similar to
of (2.4), one may replace
by
(ii) As can be seen from Theorem 2.2(i), we need to estimate both the asymptotic mean and variance of
involved in practice. It is possible to avoid estimating this kind of unknown quantity by introducing a weight function into
. In both theory and practice, however, the asymptotic power of the test may depend on the choice of such a weight function. We therefore follow a suggestion made by two of the referees and use the natural form
to construct an adaptive test in Section 3.
(iii) Theorem 2.2(i) establishes an asymptotic normality test statistic. Theorem 2.2(ii) shows that the asymptotic normality remains unchanged when h is replaced with the random data-driven
, which is known as the plug-in method. Fan and Gijbels (1996, pp. 152–154) have shown that the plug-in method has some advantages in applications. Whether the proposed test statistic
is optimal has not been discussed. A modified form of the test statistic is shown to be optimal, and the detailed discussion is given in Section 3.
Section 2 establishes the asymptotic normality of the test statistic for testing the marginal densities. The test statistic has nontrivial power only if CT converges more slowly than T−1/2. To improve the asymptotic power properties of the test, we consider extending the approach of Horowitz and Spokoiny (2001) for testing nonparametric regression functions. It is assumed that a marginal density function g belongs to a class of s-times (s ≥ 2) differentiable density functions on R1, such as a Hölder, Sobolev, or Besov class,
, which is separated from the null hypothesis by some distance CT that converges to zero as T → ∞. The objective of this section is to find the fastest rate at which CT can approach zero while permitting consistent testing uniformly over
. This rate is called the optimal rate of testing. A test is consistent uniformly over
if
Thus, the optimal rate of testing is the fastest rate at which CT can approach zero while maintaining (3.1).
As can be seen in Section 2, the proposed test statistic depends on the bandwidth. This section then suggests using
where HT = {h = hmax a^k : h ≥ hmin, k = 0,1,2,…}, in which 0 < hmin < hmax and 0 < a < 1. Let JT denote the number of elements of HT. In this case, JT ≤ log_{1/a}(hmax/hmin). Detailed conditions on hmin and hmax will be given in Assumption B.3 in Appendix B.
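A sketch of constructing the geometric bandwidth grid HT just defined (the numerical values in the commented example are illustrative placeholders, not the paper's choices):

```python
import numpy as np

def bandwidth_grid(h_max, h_min, a):
    """H_T = {h_max * a**k : k = 0, 1, 2, ...} truncated at h >= h_min, with 0 < a < 1."""
    assert 0 < h_min < h_max and 0 < a < 1
    grid = []
    h = h_max
    while h >= h_min:
        grid.append(h)
        h *= a
    return np.array(grid)

# Illustrative example only:
# T = 2755
# H_T = bandwidth_grid(h_max=1.0 / np.log(np.log(T)), h_min=T**(-0.9), a=0.8)
# len(H_T) grows like log(h_max / h_min) / log(1 / a), matching the bound on J_T above.
```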
We discuss how to obtain a critical value for L*. The exact α-level critical value, lα* (0 < α < 1), is the 1 − α quantile of the exact finite sample distribution of L*. Because θ0 is unknown, lα* cannot be evaluated in practice. We therefore suggest choosing a simulated α-level critical value, lα, by using the following simulation procedure.
1. For the simulation, we either use resamples of the sampled data Xt or generate the data Xt from the marginal density f(x,θ0) or the corresponding transition density with an initial value of θ0 under H0.
2. The true value θ0 is estimated based on the simulated {Xt}, and the resulting estimate is denoted by
.
3. We choose HT as specified following (3.2) with hmin and hmax satisfying Assumption B.3 in Appendix B and then compute L* of (3.2) using the simulated {Xt} and
.
4. Repeat the preceding steps M times to produce M versions of L*, denoted Lm* for m = 1,2,…,M. The simulated critical value lα is then the 1 − α empirical quantile (i.e., the 100(1 − α)th percentile) of the M values Lm*, m = 1,2,…,M, of L*.
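A schematic of steps 1–4 above is given next; here simulate_null, estimate_theta, and adaptive_statistic are placeholder routines standing in for the operations described in the text, not functions defined in the paper.

```python
import numpy as np

def simulated_critical_value(T, alpha, M, bandwidths,
                             simulate_null, estimate_theta, adaptive_statistic):
    """Monte Carlo approximation of the alpha-level critical value of L*.

    simulate_null(T)                         -> sample X_1,...,X_T generated under H0 (step 1)
    estimate_theta(X)                        -> parameter estimate from the simulated sample (step 2)
    adaptive_statistic(X, theta, bandwidths) -> value of L* maximized over H_T (step 3)
    """
    stats = np.empty(M)
    for m in range(M):                           # step 4: repeat M times
        X = simulate_null(T)
        theta_hat = estimate_theta(X)
        stats[m] = adaptive_statistic(X, theta_hat, bandwidths)
    return np.quantile(stats, 1.0 - alpha)       # empirical (1 - alpha) quantile
```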
We now state the following result, and its proof is relegated to Appendix B.
THEOREM 3.1. Assume that Assumptions A.1, A.3, and A.4 in Appendix A and B.1–B.3 listed in Appendix B hold. Then under H0, lim_{T→∞} P(L* > lα) = α.
The main result on the behavior of the test statistic L* under H0 is that lα is an asymptotically correct α-level critical value under any model in the parametric family {f(·,θ) : θ ∈ Θ}.
We now show that L* is consistent against a fixed alternative model. Assume that model (1.1) holds. Let the parameter set Θ be an open subset of Rq. Let
satisfy Assumption B.1 in Appendix B. For convenience, let
Measure the distance between
by the normalized l2 distance
where ∥·∥ denotes the Euclidean norm. If H0 is false, then
for all sufficiently large T and some Cρ > 0. A consistent test will reject a false H0 with probability approaching one as T → ∞.
The following theorem establishes a consistency result, and its proof is relegated to Appendix B.
THEOREM 3.2. Assume that the conditions of Theorem 3.1 hold. In addition, if there is a Cρ > 0 such that
holds then
In this section, we consider the consistency of L* under local alternatives of the form
with
for some constant C0 > 0 and θ1 ∈ Θ.
Let
We now have that
To ensure that the rate of convergence of fT to the parametric model F(θ1) is the same as the rate of convergence of CT to zero, in view of (3.4), we need to assume that ΔT(x) is a continuous function that is normalized so that
for some δ > 0. When ΔT(·) does not depend on T, condition (3.5) can be replaced by E[Δ²(X1)] > 0, which holds automatically whenever Δ(·) is not identically zero on the support of f.
We now state the following consistency result, and its proof is relegated to Appendix B.
THEOREM 3.3. Assume that Assumptions A.1, A.3, and A.4 in Appendix A and B.1–B.3 with hmax = cmax(log log T)−1 for some constant cmax > 0 in Appendix B hold. Let
be a √T-consistent estimator of θ. Let fT satisfy (3.3) with
for some constant C > 0. In addition, let condition (3.5) hold. Then
The result shows that the power of the adaptive, rate-optimal test approaches one as T → ∞ for any function ΔT(·) and sequence {CT} that satisfy the conditions of Theorem 3.3.
This section establishes that L* is consistent uniformly over alternatives in a Hölder smoothness class whose distance from the parametric model approaches zero at the fastest possible rate. It can be shown that we can extend the results to Sobolev and Besov classes under more technical conditions.
Before specifying our smoothness classes, we introduce the following notation. Define the Hölder norm
where Sf = {x ∈ R1 : f (x) > 0}.
The smoothness classes that we consider consist of functions f ∈ S(H,s) ≡ {f : ∥ f ∥H,s ≤ cH} for some (unknown) s ≥ 2 and cH < ∞.
For some s ≥ 2 and all sufficiently large Cf < ∞, define
where
is as defined in Section 3.2.
We now state the following consistency result, and its proof is relegated to Appendix B.
THEOREM 3.4. Assume that Assumptions A.1, A.3, and A.4 in Appendix A and B.1–B.3 in Appendix B hold. Then for 0 < α < 1 and BH,T as defined in (3.7)
Remark 3.1. Theorems 3.1–3.4 show that we have established some consistency results for the proposed test given in (3.2). These consistency results correspond to Theorems 1–4 of Horowitz and Spokoiny (2001) for a fixed design regression. In our setting, the observations are stationary and α-mixing time series. In addition, the optimum version L* is asymptotically consistent, as established in Theorem 3.2. This is one of the advantages of our test over existing ones, such as the natural competitor proposed in Aït-Sahalia (1996a). In Section 4, we show that our test also outperforms this natural competitor in the finite sample case.
This section illustrates the proposed adaptive test by the following example. As the bootstrap simulation procedure for selecting both the bandwidth and the simulated critical values is extremely computationally demanding, especially for large samples, we consider only the CIR model proposed by Cox et al. (1985) and show how to implement the adaptive test statistic L* of (3.2) in practice using a simulated example. The main reason for choosing this model is not only that both the marginal and transition density functions have closed forms but also that the model has been studied extensively in the literature. See, for example, Aït-Sahalia (1999) and Hong and Li (2004).
We consider using the CIR model given by
where κ > 0, β > 0, and σ > 0 are unknown parameters and Bt is standard Brownian motion. It can be shown that {rt} is distributed on R+ = (0,∞) if 2κβ/σ2 ≥ 1. Furthermore, it follows from Lemma 3.1 of Masry and Tjøstheim (1995) that the process {rt} satisfies Assumption A.1(i). Alternatively, one may apply Assumption A.3′ of Aït-Sahalia (1996b, p. 552) to verify that {rt} is strictly stationary and α-mixing.
As a result of (2.2), the marginal density function of {rt} satisfying model (4.1) is
where θ = (β,κ,σ), ν = 2κβ/σ2 − 1, and Γ(·) is the usual gamma function. Let θ0 be the true value of θ.
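It is a well-known fact that the stationary distribution of the CIR process (4.1) is the gamma distribution with shape parameter 2κβ/σ² = ν + 1 and rate 2κ/σ². A small sketch evaluating this density is given below (the function name is ours):

```python
import numpy as np
from scipy.special import gammaln

def cir_marginal_density(x, kappa, beta, sigma):
    """Stationary (gamma) density of the CIR process: shape 2*kappa*beta/sigma**2, rate 2*kappa/sigma**2."""
    shape = 2.0 * kappa * beta / sigma**2      # equals nu + 1 in the notation above
    rate = 2.0 * kappa / sigma**2
    x = np.asarray(x, dtype=float)
    log_f = shape * np.log(rate) - gammaln(shape) + (shape - 1.0) * np.log(x) - rate * x
    return np.exp(log_f)
```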
To construct a sequence of local alternatives, we also consider using a marginal density of the form
where ν1 = 2κ/σ2. It is known that f1(x,θ) is the marginal density of {rt} satisfying the AG model proposed in Ahn and Gao (1999)
with parameter values κ > 0, β > 0, and σ > 0. The necessary and sufficient conditions for stationarity and for the unattainability of 0 and ∞ in finite expected time are κ > 0 and β > 0 (see Ahn and Gao, 1999). To show that {rt} is strictly stationary and α-mixing, as explained in Appendix A of Ahn and Gao (1999, pp. 755–756), one needs only to verify Assumption A.3′ of Aït-Sahalia (1996b, p. 552). It is easy to see that such an assumption holds for the marginal density, drift, and diffusion functions given in (4.3) and (4.4).
The corresponding structure of the test problem (2.3) for this example can be constructed as
where
in which θ1 ∈ Θ. The reason for choosing such a ΔT(·) as the local shift function is to ensure that the models under H1 fluctuate closely around those under H0. The choice of (4.5) and (4.6) ensures that (3.7) holds with s = 2. This implies that the adaptive test is consistent against this sequence of local alternatives at an optimal rate. Note that Assumptions B.1 and B.2 hold.
In the following simulation, we consider using a class of alternatives of the form
where θ1 ∈ Θ and 0 < ψ < 1 is defined as the truncation parameter to be chosen.
To compute the nonparametric estimators involved, we choose the normal kernel function k(x) = (2π)^{−1/2} exp(−x²/2) throughout the simulation. Observe that Assumptions A.1–A.4 hold. For the CIR and AG models, we simulate the data from their marginal and transition density functions, which all have closed forms.
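Because the transition density of the CIR model is a scaled noncentral chi-square distribution (a standard property of the process), an exact simulation of a discretely sampled path can be sketched as follows (the function name and interface are ours):

```python
import numpy as np

def simulate_cir_path(T, delta, kappa, beta, sigma, r0, rng=None):
    """Exact simulation of a CIR path at sampling interval delta via the
    noncentral chi-square transition distribution."""
    rng = np.random.default_rng() if rng is None else rng
    c = sigma**2 * (1.0 - np.exp(-kappa * delta)) / (4.0 * kappa)
    df = 4.0 * kappa * beta / sigma**2                  # degrees of freedom
    r = np.empty(T)
    r[0] = r0
    for t in range(1, T):
        nc = r[t - 1] * np.exp(-kappa * delta) / c      # noncentrality parameter
        r[t] = c * rng.noncentral_chisquare(df, nc)
    return r
```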
In the detailed simulation, we simulate the data from (4.2) for the CIR model, (4.3) for the AG model, and then (4.7) under H1. Using the simulated data, we compute
in which R(k) =
are used after the choice of (4.8) and HT is as defined following (3.2) with
. Note that Assumption B.3 holds.
To compare L* with
in (2.4), we construct a test statistic of the form
where h* is chosen by using the following procedure.
• We simulate Xt with probability 1 − ψ from the CIR model and with probability ψ from the AG model with an initial value of θ1 under H1.
• Use the simulated data {Xt : t = 1,2,…,T} to estimate θ1.
• Compute the resulting function of h given by
• Repeat the preceding steps Q = 1,000 times and produce Q versions of
denoted by
for m = 1,2,…,Q. Use the Q functions of h,
for m = 1,2,…,Q, to construct their empirical bootstrap distribution function, that is,
where I(U ≤ u) is the usual indicator function.
• For a given asymptotic critical value ecvα at the level α (e.g., ecv0.05 = 1.645 at the 5% level), we then calculate the following power function:
• Find approximately at which h value the power function ψ(h) is maximized. Denote the maximizer by h*.
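A schematic of the bullet-point procedure above for selecting h* is given next; simulate_mixture, estimate_theta, and statistic_for_h are placeholder routines for the steps described in the text, not functions defined in the paper.

```python
import numpy as np

def power_optimal_bandwidth(T, psi, h_grid, Q, ecv_alpha,
                            simulate_mixture, estimate_theta, statistic_for_h):
    """Pick h* maximizing the simulated rejection frequency of the h-indexed statistic.

    simulate_mixture(T, psi)     -> draw X_t from the CIR model w.p. 1 - psi, AG model w.p. psi
    estimate_theta(X)            -> re-estimate the parameter vector from the simulated data
    statistic_for_h(X, theta, h) -> value of the h-indexed test statistic
    """
    rejections = np.zeros(len(h_grid))
    for _ in range(Q):                                   # Q bootstrap replications
        X = simulate_mixture(T, psi)
        theta_hat = estimate_theta(X)
        stats = np.array([statistic_for_h(X, theta_hat, h) for h in h_grid])
        rejections += (stats > ecv_alpha)                # empirical power at each h
    power = rejections / Q
    return h_grid[int(np.argmax(power))], power          # h* and the power curve over h_grid
```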
We then consider using the same choice of the parameter values as in (17) of Pritsker (1998). This means that the baseline model is model (4.1) with
. In this example, the same parameter values were also used as θ1 in computing the power of the tests L* and L0*. The truncation parameter was chosen as ψ = 0 under H0,
whereas under H1 the truncation parameter was chosen as
. Three different sample sizes, T = 1,000, 2,755, and 5,500, were then considered. The corresponding simulated critical values, lα and l0α, of L* and L0* at the α level were found by using the simulation scheme proposed in Section 3.1. The sizes of the tests were then computed based on the data simulated under H0, and the power values of the tests were calculated based on the data generated under H1
. In implementing the simulation procedure, we used M = 1,000 in the simulation scheme proposed in Section 3.1. The number of simulations used in producing Table 1 was also 1,000. Both the size and the power of L* and L0* are given in Table 1.
Remark 4.1. (i) As can be seen from Table 1, the power values of both L* and L0* look reasonable when
, or about 3%. This may show that both L* and L0* are practically applicable to moderate sample sizes, because the difference between the null hypothesis and its alternative was deliberately made small. We also computed the power of the tests for the case where
or 5%. Our small sample results showed that the power of L* was already 100% even when T = 1,000. In general, it is true that the power increases as ψ increases for each case. Observe that L* is slightly more powerful than L0*, although h* involved in
has been chosen based on the assessment of its power. We observe that the sizes of the two tests are also close to either 5% in the first half of Table 1 or 1% in the second half of Table 1.
(ii) We also examined the dependence of the power on the choice of the initial parameter values. Our experience suggests that the power of the tests depends mainly on the choice of the truncation parameter ψ. This is both understandable and expected, because the test statistics ultimately depend only on the estimation and reestimation procedure for the parameter vector rather than on the initial parameter values themselves. This is probably why artificial values, or parameter values estimated from a set of real data, are used as initial values for starting a simulation procedure. For example, Hong and Li (2004) use the parameter values estimated from the U.S. interest rate series for their simulation procedure.
(iii) Compared with existing results (see Pritsker, 1998), both the size and power of L0* have been significantly improved. This is probably because (a) the choice of h involved in
is based on the assessment of the power of
rather than using an estimation-based optimal value and (b) to avoid using the asymptotic distribution of
and then an asymptotic critical value of 1.645 at the 5% level or 2.33 at the 1% level, we have used the bootstrap-based simulated critical value, l0α, at the level α. We also computed both the power and size values for the case where h was chosen by using a cross-validation criterion; the resulting sizes and power values were similar to those obtained by Pritsker (1998), although L* always performed better than L0*. This further demonstrates that the asymptotic distribution of either test statistic can provide only a rough guide to its behavior. In practice, we strongly suggest using the proposed bootstrap simulation procedure for choosing a simulated critical value rather than an asymptotic critical value.
In this paper, we have considered testing the general continuous-time diffusion model (1.1) under the α-mixing condition. The results for continuous-time models under the α-mixing condition complement some existing results under the β-mixing condition; see, for example, Aït-Sahalia (1996a). Moreover, an adaptive and optimal test procedure has been established. This extension corresponds to Horowitz and Spokoiny (2001) for fixed design nonparametric regression and to Chen, Gao, and Li (2001) for a nonparametric time series regression model. To deal with the α-mixing condition, we have established some novel results for moment inequalities (see Lemma C.2) and limit theorems (see Lemma A.1) for degenerate U-statistics of strongly dependent processes. Both Lemmas A.1 and C.2 are applicable to other estimation and testing problems for diffusion processes under the α-mixing condition (for more about various mixing conditions, see Doukhan, 1995). In addition, we have demonstrated how to implement the proposed test procedure in practice using a simulated example.
The results given in this paper can be extended in a number of directions. First, it is possible to consider testing for both the marginal and transition density functions simultaneously, because the transition density can capture the full dynamics of a diffusion process and, in particular, can distinguish the diffusion processes that have the same marginal density but different transition densities. Second, the results of this paper for the short-range dependent continuous-time case can be extended to the long-range dependent continuous-time case. Third, one probably can relax the strict stationarity and the mixing condition, as the recent work by Aït-Sahalia (1999) and Karlsen and Tjøstheim (2001) indicates that it is possible to do such work without the stationarity and the mixing condition. This part is particularly important for two reasons: (i) for the long-range dependent case one needs to avoid assuming both the long-range dependence and the mixing condition, as they contradict each other; and (ii) some important models are nonstationary. These are some issues left for future research.
This Appendix lists the necessary assumptions for the establishment and the proof of the main results given in Section 2.
A.1. Assumptions. Let the parameter set Θ be an open subset of Rq. Let
. Define ∇θ f(x,θ) = ∂f(x,θ)/∂θ, ∇θ² f(x,θ) = ∂²f(x,θ)/∂θ ∂θ′, and ∇θ³ f(x,θ) = ∂³f(x,θ)/∂θ ∂θ′ ∂θ″ whenever these derivatives exist. For any q × q matrix D, define
where
.
Assumption A.1. (i) Assume that the process {rt} is strictly stationary and α-mixing with the mixing coefficient α(t) = Cα α^t defined by sup{|P(A ∩ B) − P(A)P(B)| : A ∈ Ω_1^s, B ∈ Ω_{s+t}^∞} ≤ α(t) for all s,t ≥ 1, where 0 < Cα < ∞ and 0 < α < 1 are constants and Ω_i^j denotes the σ-field generated by {rt : i ≤ t ≤ j}.
(ii) Assume that the univariate kernel function k(·) is nonnegative, symmetric, and four-times differentiable on R1 = (−∞,∞). In addition,
.
Assumption A.2. (i) The parameter space Θ ⊂ Rq is compact. In a neighborhood of the true parameter θ0, f(x,θ) is twice continuously differentiable in θ; E[(∂f(x,θ)/∂θ)(∂f(x,θ)/∂θ)^τ] is of full rank. In addition, assume that G(x) is a positive and integrable function with E[G(Xt)] < ∞ uniformly in t ≥ 1 such that supθ∈Θ |f(Xt,θ)|² ≤ G(Xt) and supθ∈Θ ∥∇θ^j f(Xt,θ)∥² ≤ G(Xt) for j = 1,2,3, where for
.
(ii) Assume that
is a √T-consistent estimator of θ0.
Assumption A.3. For every θ ∈ Θ:
(i) The drift and the diffusion functions are three times continuously differentiable in x ∈ R+ = (0,∞), and σ > 0 on R+.
(ii) The integral of
converges at both boundaries of D, where v is fixed in D.
(iii) The integral of
diverges at both boundaries of D.
Assumption A.4. (i) Assume that the first three derivatives of f (x) are continuous on D and that f (x) > cf > 0 on the interior of D for some cf > 0. In addition, both f (x) and f2(x) are integrable on D.
(ii) The initial random variable r0 is distributed as f (x).
(iii) The true drift and diffusion functions satisfy Assumption A.3.
Assumption A.5. The bandwidth parameter h satisfies that
Remark A.1. Assumptions A.1–A.4 are quite natural in this kind of problem. Assumptions A.2–A.4 correspond to Assumptions A0, A1, and A3 of Aït-Sahalia (1996a). Assumption A.1 is the exception. Assumption A.1(i) assumes the α-mixing condition, which is weaker than the β-mixing condition. Assumption A.1(ii) is quite general, allowing the use of the standard normal kernel. Assumption A.5 ensures that the theoretically optimum value h_optimal = CT^{−1/5} can be included. This is important, because there may be cases in which h_optimal is also optimal for testing purposes.
A.2. Technical Lemmas.
The following lemmas are necessary for the proof of the main results stated in Section 2.
LEMMA A.1. Let ξt be an r-dimensional strictly stationary and strong mixing (α-mixing) stochastic process. Let φ(·,·) be a symmetric Borel function defined on Rr × Rr. Assume that for any fixed x ∈ Rr, E [φ(ξ1,x)] = 0 and E [φ(ξi,ξj)|Ω0j−1] = 0 for any i < j, where Ωij denotes the σ-field generated by {ξs : i ≤ s ≤ j}. Let
. For some small constant 0 < δ < 1, let
where the maximization over P in the equation for MT4 is taken over the four probability measures P(ξ1,ξi,ξj,ξk), P(ξ1)P(ξi,ξj,ξk), P(ξ1)P(ξi1)P(ξi2,ξi3), and P(ξ1)P(ξi)P(ξj)P(ξk), where (i1,i2,i3) is the permutation of (i,j,k) in ascending order;
Assume that all the MT's are finite. Let
If limT→∞(max{MT,NT}/σT2) = 0, then
Remark A.2. Lemma A.1 establishes central limit theorems for degenerate U-statistics of strongly dependent processes. It should be pointed out that the conclusion of Lemma A.1 remains true when the martingale assumption that E [φ(ξi,ξj)|Ω0j−1] = 0 for any i < j is removed. Such a martingale assumption is used only for a direct application of an existing central limit theorem (CLT) for martingales. Without such a condition, one needs only to decompose
and then apply the martingale CLT to
. Using the condition that E [φ(ξ1,x)] = 0 for each given x, one can show that the terms involving E [φ(ξi,ξj)|Ω0j−1] are negligible (see Roussas and Ioannides, 1987, Theorem 5.5). Thus, as assumed in Lemma 3.2 of Hjellvik, Yao, and Tjøstheim (1996) and Theorem 2.1 of Fan and Li (1998), the condition that E [φ(ξ1,x)] = 0 for each given x is the key assumption.
Proof. Let
To prove Lemma A.1, it suffices to show that as T → ∞
and
By Lemma C.1 (with η1 = φik, η2 = φjk, l = 2, pi = 2(1 + δ), and Q = 1/(1 + δ)),
Therefore,
because
.
Observe that
Let ηijk = 1/3(φikφjk + φijφkj + φjiφki) and ηij = 1/3∫φikφjk dP(ξk).
Then by Lemma C.2(i) in Appendix C,
Let Cφ = ∫φ122φ342 dP(ξ1) dP(ξ2) dP(ξ3) dP(ξ4), where P(ξ) denotes the probability measure of ξ.
Using Lemma C.1 repeatedly, we have that for different i,j,k,l
where Δ(i,j,k,l) is the minimum increment in the sequence that is the permutation of i,j,k,l in ascending order.
Similar to (A.5), one can have for all different i,j,k,l
Therefore,
It now follows from (A.3)–(A.5) that for any ε > 0
Thus, (A.1) holds.
Note that for 2 ≤ k ≤ T,
It is easy to see that
Similar to (A.5), one can have for any (i,j) ≠ (s,t),
where Δ(·) is as defined in (A.5).
Consequently, the first two terms on the right-hand side of (A.7) are of order O(T3MT41/(1+δ)), because
.
Thus, (A.2) follows from
This finishes the proof. █
Before stating the following lemmas, we define the following notation.
using
where
in which A is the T × T matrix with {ast} as its (s,t) element.
We assume without loss of generality throughout the rest of this paper that
LEMMA A.2. Under the conditions of Theorem 2.1, we have as T → ∞
Proof. We now prove (A.9). It follows from Assumptions A.2 and A.3 that as T → ∞
This completes the proof of the first part of (A.9). For the proof of the second part of (A.9), let
Then
We first look at the main component of σT2. We now have
Using Assumptions A.1–A.4, we have as T → ∞
where L(x) = ∫k(x + y)k(y) dy is as defined in (2.5).
Therefore, as T → ∞
where k(4)(·) is as defined in (2.5).
Similarly, one can show that as T → ∞
We now deal with the remainder term of var[N0T(h)] . By Lemma C.1 (with η1 = φik, η2 = φjk, l = 2, pi = 2(1 + δ), and Q = 1/(1 + δ)),
where MT1 is as defined in Lemma A.1.
Therefore, using the fact that
,
whose proof is similar to that of (A.17), which follows.
Equations (A.10)–(A.12) imply
This finishes the proof of the second part of (A.9). █
LEMMA A.3. Under the conditions of Theorem 2.1, we have as T → ∞
and
where C1 is a constant and
are as defined in Section 2.
Proof. We now give only the proof of (A.14) in some detail, as the proofs of (A.13) and (A.14) are similar and quite standard and the details follow similarly from some existing results. See, for example, Fan and Gijbels (1996).
In view of the definition of wt(x) and the second equation of (A.13), to prove (A.14), it suffices to show that as T → ∞
using a Taylor expansion of f(x) − f(x − vh). This finishes the proof of (A.14). █
A.3. Proof of Theorem 2.1.
Proof of Theorem 2.1(i). To prove Theorem 2.1(i), in view of Remark A.2 and Lemma A.3, it suffices to show that
To apply Lemma A.1, let ξt = Xt and φ(ξs,ξt) = φst defined previously. Let MT and NT be defined as in Lemma A.1. We now verify only the following condition listed in Lemma A.1:
for MT1, MT21, MT3, MT51, MT52, and MT6, where σh² = hσ0². The others follow similarly.
For the MT part, one justifies only
The others follow similarly.
Let ψst = (1/Th)∫k((x − Xs)/h)k((x − Xt)/h)p(x) dx. We now have
where L(·) is as defined previously.
For any given 1 < ζ < 2 and T sufficiently large, we obtain
using Assumption A.1(ii), where f (x,y,z) denotes the joint density of (Xi,Xj,Xk) and Cp is a constant.
Thus, as T → ∞
Hence, (A.17) shows that (A.15) holds for the first part of MT1. The proof for the second part of MT1 follows in a similar way. Similarly, we have that as T → ∞
using Assumption A.1(ii).
This implies that as T → ∞
Thus, (A.18) now shows that (A.15) holds for MT3. It follows from the structure of {ψij} that (A.15) holds automatically for MT51, MT52, and MT6, because E [φst] = 0 for s ≠ t.
We now prove that (A.15) holds for MT21. For some 0 < δ < 1 and 1 ≤ i < j < k ≤ T, let MT21 = E[|ψik ψjk|^{2(1+δ)}]. Similar to (A.16) and (A.17), we obtain that as T → ∞
using the fact that limT→∞ Th = ∞.
This completes the proof of (A.15) for MT21, and thus (A.15) holds for the first part of {φst}. Similarly, one can show that (A.15) holds for the other parts of {φst}. Thus, we have shown that under H0
The proof of Theorem 2.1(i) is therefore finished. █
Proof of Theorem 2.1(ii). Note that as T → ∞
using the continuity of
in h. This completes the proof of Theorem 2.1(ii). █
Proof of Theorem 2.2. The proof follows from Theorem 2.1 and the following standard result:
This Appendix lists the necessary assumptions for the establishment and the proof of the main results given in Section 3.
B.1. Assumptions.
Assumption B.1. The parameter set Θ is an open subset of Rq for some q ≥ 1. The parametric family
satisfies the following conditions.
holds with probability one (almost surely).
Assumption B.2. (i) Let H0 be true. Then θ0 ∈ Θ and
for any ε > 0 and all sufficiently large CL.
(ii) Let H0 be false. Then there is a θ* ∈ Θ such that
for any ε > 0 and all sufficiently large CL.
Assumption B.3. (i) Assume that the set HT has the structure of (3.2) with cmin T^{−γ} = hmin < hmax = cmax(log log T)^{−1}, where γ, cmin, and cmax are constants satisfying 0 < γ < 1 and 0 < cmin, cmax < ∞.
(ii) Assume that ΔT(x) is continuous in x ∈ D and satisfies
for all T ≥ 1.
Remark B.1. Assumptions B.1(i) and B.1(ii) are quite standard in this kind of problem. See Assumptions 1(i) and (ii) of Horowitz and Spokoiny (2001). Assumption B.1(iii) is required to ensure that the marginal density function is identifiable. A similar condition is used in Assumption 1(iii) of Horowitz and Spokoiny (2001). It can be shown that Assumption B.1(iii) holds when f(x,θ) belongs to classes of simple linear and certain nonlinear functions in θ. The identifiability assumption is imposed to exclude the case where f(x,θ) is flat as a function of θ over a certain range of θ and some value of x, because such a function may be neither identifiable nor a probability density. Assumption B.2 is needed to ensure that the true version of θ under either hypothesis can be estimated by a √T-consistent estimator. Assumption B.3(i) imposes some conditions on both hmin and hmax. The theoretical condition on hmin is quite general. In practice, we would suggest using
to include the estimation-based optimal bandwidth h_optimal = CT^{−1/(2s+1)}, because the estimation-based optimal value may also be optimal for testing purposes in some cases. The restriction on hmax is required only for the proof of Theorem 3.3. It should be noted that hmax is not necessarily the bandwidth at which the power of the resulting test is maximized. As explained at the beginning of Section 2, both the existence and the reasonableness of Assumption B.3(ii) can be justified. Unlike the regression setting discussed in Horowitz and Spokoiny (2001), we need to assume
to ensure that the alternative is also a probability density. As the main results in Section 2 are only concerned with the null hypothesis, we do not need to assume such a rigorous condition for the main results.
This paper considers using only a set of discrete bandwidths for constructing the adaptive test. It is believed that corresponding versions of Theorems 3.1–3.4 can be established for the case where HT is an interval of continuous bandwidth values. As HT is always chosen as a set of discrete bandwidths in practice, such an extension from a set of discrete bandwidths to an interval of continuous bandwidth values is probably mainly of theoretical and technical interest. As such an extension also involves considerably more tedious technical detail, we do not discuss this issue further in this paper.
B.2. Technical Lemmas. Before stating the necessary lemmas for the proof of the results given in Section 3, we introduce the following notation.
LEMMA B.1. Suppose that the conditions of Theorem 2.1 hold.
(i) For every δ > 0
in probability, where C > 0 is a constant.
(ii) For each θ ∈ Θ and sufficiently large T
Proof. (i) It follows from the definition of QT(θ) that
To prove Lemma B.1(i), one first needs to show that
in probability for some constant C > 0.
Using the conditions of Lemma B.1, we now have
in probability.
In view of (B.2), to prove Lemma B.1(i), it suffices to show that
in probability.
A Taylor series expansion of f(Xt,θ) − f(Xt,θ0) and an application of Assumption B.1(i) imply (B.3). This finishes the proof of Lemma B.1(i).
(ii) Let λmin(A) and λmax(A) denote the smallest and largest eigenvalues of A, respectively. In view of
to prove Lemma B.1(ii), it suffices to show that for n large enough
for some C > 0. Similar to the proof of Lemma A.2 of Gao, Tong, and Wolff (2002), one can easily finish the proof of (B.5). █
Without loss of generality, we consider the case of q = 1 in the following lemmas and their proofs. Define
LEMMA B.2. Under the conditions of Theorem 3.1, we have for any given θ ∈ Θ and i = 1,2
Proof. It suffices to show that for any large constant C0 > 0
where
Similar to the proof of (A.1), one can show that as T → ∞
for some function C(θ).
Using Lemmas C.1 and C.2 in Appendix C and the fact that E [εt(x)] = 0 for x ∈ D, one can show that as T → ∞
Thus, equations (B.7)–(B.9) complete the proof. █
LEMMA B.3. Under the conditions of Theorem 3.1, we have as T → ∞
Proof. Similar to (B.7), we have for large constant C0 > 0
Similar to (B.8), we can have as T → ∞
Analogous to (B.9), one can show that as T → ∞
Thus, equations (B.11)–(B.13) complete the proof of (B.10). █
LEMMA B.4. Under the conditions of Theorem 3.1, we have for each u > 0,
under
.
Proof. We now prove (B.14). Using a Taylor series expansion of f(Xt,θ) − f(Xt,θ0) and Assumption B.1, we have for θ′ between θ and θ0
Hence, (B.4), (B.10), (B.15), and Assumption B.1(i) imply
The proof of (B.14) follows from (B.15) and (B.16). █
LEMMA B.5. Suppose that the conditions of Theorem 3.1 hold. Then for every u > 0, some h ∈ HT, and as T → ∞
under
.
Proof. In view of the definition of Qn(θ), to prove (B.17), it suffices to show that as T → ∞
where qT = E [QT(θ*)] .
Note that
where θ′ lies between θ and θ*.
In view of (B.6), (B.10), (B.18), and Assumptions B.1(i) and B.2(ii), to prove (B.17), it suffices to show that for any δ > 0,
as T → ∞.
Similar to (B.8) and (B.9), one can show that as T → ∞
Thus, equations (B.19) and (B.20) imply that as T → ∞
using qT = CTh(1 + o(1)) given in the proof of Lemma B.1(ii), where C is a constant independent of T. Lemma B.5 therefore follows from (B.21). █
Recall the notation introduced in (A.9). We assume without loss of generality that k(4)(0) = 1 in Lemma A.2. Define
LEMMA B.6. Suppose that the conditions of Theorem 3.1 hold. Then as T → ∞
uniformly over h ∈ HT.
Proof. The proof of (B.23) follows from (2.7) and (2.8) immediately. █
LEMMA B.7. Suppose that the conditions of Theorem 3.1 hold. Then maxh∈HT L0(h) and maxh∈HT LT(h) have identical asymptotic distributions under
.
Proof. Note that QT(θ0) = 0 under
and that Lemmas A.3 and B.1–B.5 imply as T → ∞
Therefore, equations (B.21), (B.22), and (B.24) complete the proof of Lemma B.7. █
LEMMA B.8. Suppose that the conditions of Theorem 3.1 hold. Then for any x ≥ 0, h ∈ HT, and all sufficiently large T
Proof. It follows from the beginning of the proof of Theorem 2.1(i) that for any small δ > 0 there exists a large integer T0 ≥ 1 such that for T ≥ T0
where
.
This implies for any T ≥ T0 and x ≥ 0
using
.
The proof follows by letting
for any x ≥ 0. █
For 0 < α < 1, define
to be the 1 − α quantile of maxh∈HT L0(h).
LEMMA B.9. Suppose that the conditions of Theorem 3.1 hold. Then for large enough T
Proof. The proof is trivial.
LEMMA B.10. Suppose that the conditions of Theorem 3.1 hold. Suppose that
for some h ∈ HT, where
Then
Proof. To prove Lemma B.10, in view of Lemmas B.6 and B.7, it suffices to show that
which holds if
for some h ∈ HT. For any h ∈ HT, using (B.21) and then (B.17) we have
On the other hand, condition (B.25) implies that as T → ∞
Observe that
Thus, it follows from (B.26) that as T → ∞
because L0(h) is asymptotically normal and therefore bounded in probability and
.
Because of (B.27), as T → ∞
This finishes the proof. █
B.3. Proofs of Theorems 3.1–3.4.
Proof of Theorem 3.1. The proof follows from Lemmas B.6 and B.7.
Proof of Theorem 3.2. This proof is similar to that of Theorem 3.3, which follows, using Lemma B.1(ii). Alternatively, one can follow the corresponding proof of Theorem 2 of Horowitz and Spokoiny (2001) by using Lemma B.1(ii) and the condition that
to verify (B.25). █
Proof of Theorem 3.3. Condition (3.5) ensures that the rate of convergence of fT to the parametric model F(θ1) is the same as the rate of convergence of CT to zero. In particular, when (3.5) holds,
In view of Lemma B.10, to complete the proof of Theorem 3.3, it suffices to verify (B.25). This verification follows from Lemma B.1(ii) and (B.28). █
Proof of Theorem 3.4. In our proof, we mainly use Lemma B.1(ii) and the condition of Theorem 3.4 that
to verify (B.25). █
The following two technical lemmas have already been used in the proofs of Lemma A.1 and Theorem 2.1. The two lemmas are of general interest in themselves and can be used for other nonparametric estimation and testing problems associated with the α-mixing condition.
LEMMA C.1. Suppose that M_m^n are the σ-fields generated by a stationary α-mixing process ξi with the mixing coefficient α(i). For some positive integer m, let ηi ∈ M_{si}^{ti}, where s1 < t1 < s2 < t2 < ··· < tm, and suppose ti − si > τ for all i. Assume further that
for some pi > 1 for which
Then
Proof. See Roussas and Ioannides (1987).
LEMMA C.2. (i) Let ψ(·,·,·) be a symmetric Borel function defined on Rr × Rr × Rr. Let the process ξi be defined as in Lemma A.1. Assume that for any fixed x,y ∈ Rr, E[ψ(ξ1,x,y)] = 0. Then
where 0 < δ < 1 is a small constant, C > 0 is a constant independent of T and the function ψ, M = max{M1,M2,M3}, and
(ii) Let φ(·,·) be a symmetric Borel function defined on Rr × Rr. Let the process ξi be defined as in Lemma A.1. Assume that for any fixed x ∈ Rr, E [φ(ξ1,x)] = 0. Then
where δ > 0 is a constant, C > 0 is a constant independent of T and the function φ, and
Proof. As the proof of (ii) is similar to that of (i), one proves only (i). Let i1,…,i6 be distinct integers and 1 ≤ ij ≤ T, let 1 ≤ k1 < ··· < k6 ≤ T be the permutation of i1,…,i6 in ascending order, and let dc be the cth largest difference among kj+1 − kj, j = 1,…,5. Let
By Lemma C.1 (with η1 = ψ(ξi1,ξi2,ξi3), η2 = ψ(ξi4,ξi5,ξi6), l = 2, pi = 2(1 + δ) and Q = 1/(1 + δ)),
Thus,
Similarly,
Analogously, it can be shown in a similar way that
On the other hand, if {k6 − k5,k2 − k1} = {d4,d5}, by using Lemma C.1 three times we have the inequality
Hence,
It follows from (C.3)–(C.7) that
Similar to (C.8), one can show that
Finally, it is easy to see that
The conclusion of Lemma C.2(i) follows immediately from (C.8)–(C.11). █