
ADAPTIVE TESTING IN CONTINUOUS-TIME DIFFUSION MODELS

Published online by Cambridge University Press:  01 October 2004

Jiti Gao
Affiliation:
The University of Western Australia
Maxwell King
Affiliation:
Monash University

Abstract

We propose an optimal test procedure for testing the marginal density functions of a class of nonlinear diffusion processes. The proposed test is optimal and also avoids undersmoothing. An adaptive test is constructed, and its asymptotic properties are investigated. To establish these properties, we derive some general results on moment inequalities and asymptotic distributions for strictly stationary processes under the α-mixing condition. These results also apply to other estimation and testing problems for strictly stationary α-mixing processes. An example of implementation demonstrates that the proposed procedure is applicable to economic and financial model specification and can be implemented in practice. To support its application and implementation, we propose a computer-intensive simulation scheme for choosing a suitable bandwidth for the kernel estimation, together with a simulated critical value for the proposed adaptive test. Our finite sample studies support both the proposed theory and the simulation procedure.

The authors thank the co-editor and three anonymous referees for their constructive comments and suggestions. The first author also thanks Song Xi Chen for some constructive suggestions, in particular the suggestion to use the local linear form instead of the Nadaraya–Watson kernel form in equation (2.6), and Yongmiao Hong for sending a working paper. The authors acknowledge comments from seminar participants at the International Chinese Statistical Association Meeting in Hong Kong in July 2001, the Western Australian Branch Meeting of the Statistical Society of Australia in September 2001, the University of Western Australia, and Monash University. Thanks also go to the Australian Research Council for its financial support.

Type
Research Article
Copyright
© 2004 Cambridge University Press

1. INTRODUCTION AND MOTIVATION

Continuous-time diffusion processes arise in many applications in econometrics, but perhaps nowhere do they play as large a role as in finance. Following the pathbreaking work of Black and Scholes (1973), continuous-time diffusion processes have become a common feature of many applications, especially asset pricing models. There are probably two reasons for this. First, continuous-time diffusion processes are able to mimic some important macroeconomic and financial phenomena (see Sundaresan, 2001). Second, various parametric diffusion processes have already been used successfully to model financial data. In both theory and practice, however, one needs to determine whether a parametric diffusion process, that is, one with both the drift and the volatility specified parametrically, is appropriate for a given set of financial data. To assess whether such a parametric specification is appropriate, empirical researchers have recently shown a preference for nonparametric alternatives. Aït-Sahalia (1996a, 1996b) was among the first to develop the nonparametric approach. Other related studies include Jiang and Knight (1997), Stanton (1997), Chapman and Pearson (2000), Gao and King (2001), Hong and Li (2004), and Fan and Zhang (2003). Aït-Sahalia (1996a) considers testing the marginal density functions of a class of diffusion processes under the β-mixing condition. Pritsker (1998) conducts a finite sample simulation study of the nonparametric kernel test proposed in Aït-Sahalia (1996a). The principal result of Pritsker (1998) is that the test rejects true models much too often when asymptotic critical values are used. This suggests that asymptotic critical values may not be suitable for finite sample inference with this test.
In addition, the use of an estimation-based bandwidth in the nonparametric kernel test may also contribute to the poor finite sample performance of the test, because a bandwidth that is optimal for estimation need not be optimal for testing. These two considerations motivate us to develop a simulation procedure for choosing both an appropriate critical value and a test-optimal bandwidth, so as to improve the test proposed in Aït-Sahalia (1996a).

Recently, Horowitz and Spokoiny (2001) developed a new test of a parametric model of a conditional mean function against a nonparametric alternative. The test adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric model converges to zero at the fastest possible rate. This rate is slower than $T^{-1/2}$, where T is the number of observations. To the best of our knowledge, extending the approach of Horowitz and Spokoiny (2001) to construct an adaptive and optimal test for marginal density functions has not been considered. This paper therefore proposes an adaptive test for marginal density functions. The proposed test has an optimal-rate property: in theory, it is consistent against local alternatives that approach the null at the optimal rate, as stated in Section 3. In practice, we demonstrate how to apply the test in Section 4 using a simulated example. Our studies show that the proposed test has some advantages over the test proposed in Aït-Sahalia (1996a).

The rest of the paper is organized as follows. Section 2 discusses the testing of the marginal density. An adaptive test procedure is proposed in Section 3. Section 4 provides an example of implementation. Section 5 concludes the paper with some remarks on extensions. Mathematical assumptions and proofs are relegated to Appendixes A–C.

2. TESTING MARGINAL DENSITY FUNCTIONS

Consider a continuous-time diffusion process of the form

$$dr_t = \mu(r_t,\theta)\,dt + \sigma(r_t,\theta)\,dB_t, \qquad (2.1)$$

where μ(·) and σ(·) > 0 are, respectively, the univariate drift and volatility functions of the process, both indexed by θ, and $B_t$ is standard Brownian motion.

Let $\{r_t\}$ satisfy model (2.1) and let f(·,θ) be a parametric form of the marginal density function of $\{r_t\}$. Within the diffusion process, f(·,θ) is completely determined by the drift μ(·,θ) and the diffusion σ(·,θ) (see Aït-Sahalia, 1996a, expression (6)) through

$$f(x,\theta) = \frac{\xi(\theta)}{\sigma^2(x,\theta)}\exp\left\{\int_{x_0}^{x}\frac{2\mu(u,\theta)}{\sigma^2(u,\theta)}\,du\right\}, \qquad (2.2)$$

where $\{r_t\}$ is distributed on $D = (x_{\min}, x_{\max})$ with $-\infty \le x_{\min} < x_{\max} \le \infty$, both the lower bound $x_0$ and ξ(θ) can be chosen to ensure that f(x,θ) is a probability density, and θ is an unknown parameter vector. Let Θ denote a parameter space in $\mathbb{R}^q$ and θ0 ∈ Θ denote the true value of θ.
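To make (2.2) concrete, the Python sketch below evaluates f(x,θ) = ξ(θ)σ⁻²(x,θ)exp{∫ from x_0 to x of 2μ(u,θ)/σ²(u,θ) du} on a grid, taking x_0 to be the first grid point and recovering ξ(θ) by numerical normalization. It is a minimal illustration, not the authors' implementation. As an assumed test case that does not appear in this paper, we use the Ornstein–Uhlenbeck drift μ(x) = κ(β − x) with constant volatility, whose marginal density is normal with mean β and variance σ²/(2κ); the parameter values are illustrative.

```python
import math

def density_from_drift_diffusion(mu, sigma, grid):
    """Evaluate (2.2) on an increasing grid and normalize it numerically.

    The integral of 2*mu/sigma**2 starts at grid[0] (playing the role of x_0);
    the constant xi(theta) is recovered by making the trapezoid mass equal one.
    """
    g = [2.0 * mu(x) / sigma(x) ** 2 for x in grid]
    log_f = [-2.0 * math.log(sigma(grid[0]))]
    cumulative = 0.0
    for i in range(1, len(grid)):
        # trapezoid step for the integral from x_0 to grid[i]
        cumulative += 0.5 * (g[i] + g[i - 1]) * (grid[i] - grid[i - 1])
        log_f.append(cumulative - 2.0 * math.log(sigma(grid[i])))
    top = max(log_f)  # subtract the maximum before exponentiating (overflow guard)
    f = [math.exp(v - top) for v in log_f]
    mass = sum(0.5 * (f[i] + f[i - 1]) * (grid[i] - grid[i - 1])
               for i in range(1, len(grid)))
    return [v / mass for v in f]

# Illustrative Ornstein-Uhlenbeck parameters (assumed, not from the paper).
kappa, beta, vol = 0.5, 0.06, 0.02
sd = vol / math.sqrt(2.0 * kappa)          # stationary standard deviation
n = 4000
grid = [beta - 10 * sd + 20 * sd * i / n for i in range(n + 1)]
f_num = density_from_drift_diffusion(lambda x: kappa * (beta - x),
                                     lambda x: vol, grid)

def normal_pdf(x):
    return math.exp(-0.5 * ((x - beta) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

max_abs_err = max(abs(a - normal_pdf(x)) for a, x in zip(f_num, grid))
print(max_abs_err)
```

For specifications without a closed-form marginal density the same routine applies, but the grid should be refined near any boundary of D at which 2μ/σ² is singular.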

Let f (x) be a nonparametric form of the density function. The null and alternative hypotheses are

where $0 \le C_T \le 1$, $\lim_{T\to\infty} C_T = 0$, and $\Delta_T(x)$ is a continuous function satisfying $\int \Delta_T(x)\,dx = 0$ and f(x) ≥ 0 under $H_1$. Theoretically, this requires that under $H_1$ the alternative function is still a probability density. In practice, the form of $\Delta_T(x)$ needs to be constructed. A simple and natural choice is $\Delta_T(x) = f_1(x,\theta_1) - f(x,\theta_1)$, where $f_1(x,\theta)$ is another specified density function and $\theta_1 \in \Theta$. For example, f(x,θ) may be the marginal density of $\{r_t\}$ under the CIR model proposed in Cox, Ingersoll, and Ross (1985), and $f_1(x,\theta)$ the marginal density of $\{r_t\}$ under the AG model proposed in Ahn and Gao (1999).
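The requirement that the alternative remain a density can be checked numerically. The Python sketch below assumes the alternative takes the form f(x,θ_1) + C_TΔ_T(x), which is consistent with the sampling interpretation given in this section, and uses two normal densities as toy stand-ins for f(·,θ_1) and f_1(·,θ_1) (they are not the CIR and AG densities). With Δ_T = f_1 − f, the alternative equals the mixture (1 − C_T)f + C_T f_1, so it integrates to one and is nonnegative for every 0 ≤ C_T ≤ 1.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Toy stand-ins (assumed): f(x, theta_1) and f_1(x, theta_1).
f = lambda x: normal_pdf(x, 0.0, 1.0)
f1 = lambda x: normal_pdf(x, 0.5, 1.5)
delta = lambda x: f1(x) - f(x)        # Delta_T(x) = f_1(x, theta_1) - f(x, theta_1)

def alternative(x, c_t):
    """f(x, theta_1) + C_T * Delta_T(x), i.e. the mixture (1 - C_T) f + C_T f_1."""
    return f(x) + c_t * delta(x)

step = 0.01
grid = [-15.0 + step * i for i in range(3001)]

def integrate(g):
    # trapezoid rule on the fixed grid
    vals = [g(x) for x in grid]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

c_t = 0.03
int_delta = integrate(delta)                         # close to 0
int_alt = integrate(lambda x: alternative(x, c_t))   # close to 1
min_alt = min(alternative(x, c_t) for x in grid)     # nonnegative
print(int_delta, int_alt, min_alt >= 0.0)
```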

For this case, the hypothesis structure (2.3) can be written as

This is equivalent to

This basically requires us to test whether $\{r_t\}$ is sampled from $f(x,\theta_0)$, or from $f(x,\theta_1)$ with probability $1 - C_T$ and from $f_1(x,\theta_1)$ with probability $C_T$. Such a structure of the null hypothesis versus a sequence of local alternatives naturally extends the usual structure of the null hypothesis against a global alternative of the form

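For the diffusion process, we observe the process at dates {tΔ : t = 0,1,…}, where Δ > 0 is generally small but fixed. Let $X_t = r_{(t-1)\Delta}$ for t ≥ 1 throughout this section. Let k(·) be a kernel function, let $k_h(\cdot) = h^{-1}k(\cdot/h)$ for a bandwidth h > 0, and let

$$\hat f(x) = \frac{1}{T}\sum_{t=1}^{T} k_h(x - X_t)$$

be the standard kernel density estimator of f(x). Intuitively, it is natural to compare $\hat f(x)$ and $f(x,\hat\theta)$ directly, where $\hat\theta$ is an estimator of θ.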

In a seminal paper, Aït-Sahalia (1996a) uses a test statistic of the form

where

.

It then follows from (13) of Aït-Sahalia (1996a) that as T → ∞

under the β-mixing and some other conditions, where

in which $R(k) = \int k^2(u)\,du < \infty$ and $k^{(j)}(0)$ denotes the j-times convolution product of k(·) given by

The preceding test statistic is based on $\hat f(x) - f(x,\hat\theta)$, which directly measures the difference between $\hat f(x)$ and $f(x,\hat\theta)$. It can be shown that under $H_0$,

This implies that it has the same order as the mean square error of

if h is chosen to be $O(T^{-1/5})$. Thus, to obtain an asymptotically normal distribution with zero mean, h has to satisfy $\lim_{T\to\infty} T h^{4.5} = 0$, as required in Assumption A5 of Aït-Sahalia (1996a). This implies undersmoothing.
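A short calculation, added here for illustration, shows why this condition excludes the bandwidth rate that is optimal for estimation. Writing $h = cT^{-\gamma}$ with c > 0 and γ > 0,

$$T h^{4.5} = c^{4.5}\,T^{1-4.5\gamma} \longrightarrow 0 \quad\Longleftrightarrow\quad \gamma > \tfrac{2}{9}.$$

For γ = 1/5 we get $Th^{4.5} = c^{4.5}T^{1/10} \to \infty$, so h must be of smaller order than $T^{-2/9}$, and hence of smaller order than $T^{-1/5}$; this is the undersmoothing referred to above. By contrast, $h = cT^{-1/5}$ gives $Th^5 = c^5$, which stays bounded.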

To reduce the bias and avoid undersmoothing, we propose a nonparametric estimator,

, of f (x,θ) of the form

where

is a consistent estimator of θ, $w_t(x) = w_t(x,h) = \frac{1}{T}k_h(x - X_t)\,\frac{s_2(x) - s_1(x)(x - X_t)}{s_2(x)s_0(x) - s_1^2(x)}$, and

for r = 0,1,2.
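The weights $w_t(x)$ can be computed directly from $s_0$, $s_1$, and $s_2$. The Python sketch below assumes a Gaussian kernel, the scaling $k_h(u) = k(u/h)/h$, and $s_r(x) = T^{-1}\sum_t k_h(x - X_t)(x - X_t)^r$ (our reading of the omitted display); the data are simulated i.i.d. normals for illustration only. It checks two identities that follow from the definition: the weights sum to one, and $\sum_t w_t(x)(x - X_t) = 0$, so a weighted sum of any linear function of $X_t$ reproduces its value at x. Under this scaling, $s_0(x)$ is the standard kernel density estimator of f(x).

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def local_linear_weights(x, data, h, kernel=gaussian_kernel):
    """w_t(x) = T^{-1} k_h(x - X_t) [s_2 - s_1 (x - X_t)] / [s_2 s_0 - s_1^2]."""
    T = len(data)
    kh = [kernel((x - xt) / h) / h for xt in data]          # k_h(x - X_t)
    s0, s1, s2 = (sum(k * (x - xt) ** r for k, xt in zip(kh, data)) / T
                  for r in range(3))                        # s_r(x), r = 0, 1, 2
    denom = s2 * s0 - s1 ** 2
    return [k * (s2 - s1 * (x - xt)) / (T * denom) for k, xt in zip(kh, data)]

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(500)]   # illustrative sample
x, h = 0.3, 0.4
w = local_linear_weights(x, data, h)
print(sum(w))                                          # approximately 1
print(sum(wt * (x - xt) for wt, xt in zip(w, data)))   # approximately 0
```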

We also define

If $\hat\theta$ is a $\sqrt{T}$-consistent estimator of θ, then we have

It follows from Fan and Gijbels (1996) that

provided that the first three derivatives of f(x) exist, where ξ1 and ξ2 lie between x and h and x, $c_k$ and $d_k$ are constants depending on functionals of k(·), and $\sigma_k^2 = \int x^2 k(x)\,dx$.

This implies that as T → ∞

As can be seen from (2.7), the use of the difference

can avoid undersmoothing. In other words, we may still assume $\limsup_{T\to\infty} T h^5 < \infty$.

We now construct our test statistic. Consider first the following distance function:

This naturally suggests estimating D(f,θ) by

We then propose using a test statistic of the form

We now state the main results of this section. Their proofs are relegated to Appendix A.

THEOREM 2.1. (i) Suppose that Assumptions A.1–A.5 in Appendix A hold. Then under

in (2.3) we have

where

.

(ii) Assume that the conditions of (i) hold. In addition, assume that there is a random data-driven

such that

. Then under

in (2.3) we have

THEOREM 2.2. (i) Suppose that Assumptions A.1–A.5 in Appendix A hold. Then under

in (2.3) we have

where

are as defined in (2.4).

(ii) Assume that the conditions of (i) hold. In addition, assume that there is a random data-driven

such that

. Then under

in (2.3) we have

Remark 2.1. (i) Similar to

of (2.4), one may replace

by

(ii) As can be seen from Theorem 2.2(i), we need to estimate both the asymptotic mean and variance of

involved in practice. It is possible to avoid estimating this kind of unknown quantity by introducing a weight function into

. In both theory and practice, however, the asymptotic power of the test may depend on the choice of such a weight function. We therefore follow a suggestion made by two of the referees and use the natural form

to construct an adaptive test in Section 3.

(iii) Theorem 2.2(i) establishes the asymptotic normality of the test statistic. Theorem 2.2(ii) shows that the asymptotic normality remains unchanged when h is replaced with the random data-driven

, which is known as the plug-in method. Fan and Gijbels (1996, pp. 152–154) have shown that the plug-in method has some advantages in applications. Whether the proposed test statistic

is optimal has not been discussed. A modified form of the test statistic is shown to be optimal, and the detailed discussion is given in Section 3.

3. AN ADAPTIVE TEST PROCEDURE

Section 2 establishes the asymptotic normality of the test statistic for testing the marginal densities. The test statistic has nontrivial power only if $C_T$ converges to zero more slowly than $T^{-1/2}$. To improve the asymptotic power properties of the test, we extend the approach of Horowitz and Spokoiny (2001) for testing nonparametric regression functions. We assume that a marginal density function g belongs to a class of s-times (s ≥ 2) differentiable density functions on $\mathbb{R}^1$, such as a Hölder, Sobolev, or Besov class,

, which is separated from the null hypothesis by some distance CT that converges to zero as T → ∞. The objective of this section is to find the fastest rate at which CT can approach zero while permitting consistent testing uniformly over

. This rate is called the optimal rate of testing. A test is consistent uniformly over

if

Thus, the optimal rate of testing is the fastest rate at which CT can approach zero while maintaining (3.1).

3.1. Asymptotic Behavior of the Test Statistic under the Null Hypothesis

As can be seen in Section 2, the proposed test statistic depends on the bandwidth. This section then suggests using

where $H_T = \{h = h_{\max}a^k : h \ge h_{\min},\ k = 0,1,2,\ldots\}$, in which $0 < h_{\min} < h_{\max}$ and 0 < a < 1. Let $J_T$ denote the number of elements of $H_T$. In this case, $J_T \le 1 + \log_{1/a}(h_{\max}/h_{\min})$. Detailed conditions on $h_{\min}$ and $h_{\max}$ are given in Assumption B.3 in Appendix B.
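A minimal Python sketch of the geometric bandwidth grid $H_T$ follows; the values of $h_{\max}$, $h_{\min}$, and a are illustrative assumptions, not the choices made in Section 4. Because k starts at 0, the grid contains $\lfloor\log_{1/a}(h_{\max}/h_{\min})\rfloor + 1$ elements, which the check below compares with the bound on $J_T$.

```python
import math

def bandwidth_grid(h_max, h_min, a):
    """H_T = {h_max * a**k : h_max * a**k >= h_min, k = 0, 1, 2, ...}."""
    if not (0 < h_min < h_max and 0 < a < 1):
        raise ValueError("need 0 < h_min < h_max and 0 < a < 1")
    grid, k = [], 0
    while h_max * a ** k >= h_min:
        grid.append(h_max * a ** k)
        k += 1
    return grid

h_max, h_min, a = 0.5, 0.01, 0.7          # illustrative values only
H_T = bandwidth_grid(h_max, h_min, a)
J_T = len(H_T)
bound = 1 + math.log(h_max / h_min) / math.log(1 / a)
print(J_T)                                # 11
```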

Simulation Scheme.

We now discuss how to obtain a critical value for L*. The exact α-level critical value, $l_\alpha^*$ (0 < α < 1), is the 1 − α quantile of the exact finite sample distribution of L*. Because θ0 is unknown, $l_\alpha^*$ cannot be evaluated in practice. We therefore suggest choosing a simulated α-level critical value, $l_\alpha$, by the following simulation procedure.

1. For the simulation, we either use resamples of the sampled data $X_t$ or generate the data $X_t$ from the marginal density $f(x,\theta_0)$ or the corresponding transition density with an initial value of θ0 under $H_0$.

2. The true value θ0 is estimated based on the simulated {Xt}, and the resulting estimate is denoted by

.

3. We choose HT as specified following (3.2) with hmin and hmax satisfying Assumption B.3 in Appendix B and then compute L* of (3.2) using the simulated {Xt} and

.

4. Repeat the preceding steps M times to produce M versions of L*, denoted $L_m^*$ for m = 1,2,…,M. The simulated critical value $l_\alpha$ is then the 100(1 − α)th percentile of these M values.
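The four steps above amount to a parametric bootstrap of the null distribution of L*. Because the formula for L* is not reproduced here, the Python sketch below keeps the statistic abstract: `simulate_null`, `estimate`, and `statistic` are hypothetical user-supplied callables standing in for steps 1 to 3, and an estimate from the observed data replaces the unknown θ0 in step 1. The toy model (i.i.d. N(θ, 1) data with a variance-based statistic that is approximately standard normal under the null) is an assumption chosen only so that the answer is easy to check. The critical value is taken as the ⌈(1 − α)M⌉-th order statistic, one common convention for the empirical 1 − α quantile.

```python
import math
import random
import statistics

def simulated_critical_value(simulate_null, estimate, statistic,
                             theta_hat, M=1000, alpha=0.05, rng=None):
    """Empirical (1 - alpha) quantile of M statistics simulated under H0."""
    rng = rng or random.Random()
    values = []
    for _ in range(M):
        data = simulate_null(theta_hat, rng)        # step 1: simulate under H0
        theta_star = estimate(data)                 # step 2: re-estimate theta
        values.append(statistic(data, theta_star))  # step 3: recompute the statistic
    values.sort()                                   # step 4: empirical quantile
    index = math.ceil((1.0 - alpha) * M) - 1
    return values[index]

# Toy null model (assumed for illustration): X_t i.i.d. N(theta, 1), T = 200.
T = 200
simulate_null = lambda theta, rng: [rng.gauss(theta, 1.0) for _ in range(T)]
estimate = statistics.fmean

def statistic(data, theta):
    # standardized deviation of the sample variance from its null value 1
    s2 = sum((x - theta) ** 2 for x in data) / len(data)
    return math.sqrt(len(data) / 2.0) * (s2 - 1.0)

rng = random.Random(2004)
observed = simulate_null(0.0, rng)
l_alpha = simulated_critical_value(simulate_null, estimate, statistic,
                                   theta_hat=estimate(observed), M=2000,
                                   alpha=0.05, rng=rng)
print(round(l_alpha, 2))   # near the N(0, 1) value 1.645 in this toy case
```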

We now state the following result, and its proof is relegated to Appendix B.

THEOREM 3.1. Assume that Assumptions A.1, A.3, and A.4 in Appendix A and B.1–B.3 listed in Appendix B hold. Then under

The main result on the behavior of the test statistic L* under

is that lα is an asymptotically correct α-level critical value under any model in

.

3.2. Consistency against a Fixed Alternative

We now show that L* is consistent against a fixed alternative model. Assume that model (2.1) holds. Let the parameter set Θ be an open subset of $\mathbb{R}^q$. Let

satisfy Assumption B.1 in Appendix B. For convenience, let

Measure the distance between

by the normalized l2 distance

where ∥·∥ denotes the Euclidean norm. If $H_0$ is false, then

for all sufficiently large T and some $C_\rho > 0$. A consistent test will reject a false $H_0$ with probability approaching one as T → ∞.

The following theorem establishes a consistency result, and its proof is relegated to Appendix B.

THEOREM 3.2. Assume that the conditions of Theorem 3.1 hold. In addition, if there is a Cρ > 0 such that

holds then

3.3. Consistency against a Sequence of Local Alternatives

In this section, we consider the consistency of L* under local alternatives of the form

with

for some constant C0 > 0 and θ1 ∈ Θ.

Let

We now have that

To ensure that the rate of convergence of $f_T$ to the parametric model $f(\cdot,\theta_1)$ is the same as the rate of convergence of $C_T$ to zero, in view of (3.4), we need to assume that $\Delta_T(x)$ is a continuous function normalized so that

for some δ > 0. When $\Delta_T(\cdot)$ does not depend on T, condition (3.5) can be replaced by $E[\Delta^2(X_1)] > 0$, which holds automatically when Δ(·) ≠ 0.

We now state the following consistency result, and its proof is relegated to Appendix B.

THEOREM 3.3. Assume that Assumptions A.1, A.3, and A.4 in Appendix A and Assumptions B.1–B.3 in Appendix B hold, with $h_{\max} = c_{\max}(\log\log T)^{-1}$ for some constant $c_{\max} > 0$. Let $\hat\theta$ be a $\sqrt{T}$-consistent estimator of θ. Let $f_T$ satisfy (3.3) with

for some constant C > 0. In addition, let condition (3.5) hold. Then

The result shows that the power of the adaptive, rate-optimal test approaches one as T → ∞ for any function ΔT(·) and sequence {CT} that satisfy the conditions of Theorem 3.3.

3.4. Consistency against a Sequence of Smooth Alternatives

This section establishes that L* is consistent uniformly over alternatives in a Hölder smoothness class whose distance from the parametric model approaches zero at the fastest possible rate. The results can be extended to Sobolev and Besov classes under additional technical conditions.

Before specifying our smoothness classes, we introduce the following notation. Define the Hölder norm

where $S_f = \{x \in \mathbb{R}^1 : f(x) > 0\}$.

The smoothness classes that we consider consist of functions $f \in S(H,s) \equiv \{f : \|f\|_{H,s} \le c_H\}$ for some (unknown) s ≥ 2 and $c_H < \infty$.

For some s ≥ 2 and all sufficiently large Cf < ∞, define

where

is as defined in Section 3.2.

We now state the following consistency result, and its proof is relegated to Appendix B.

THEOREM 3.4. Assume that Assumptions A.1, A.3, and A.4 in Appendix A and B.1–B.3 in Appendix B hold. Then for 0 < α < 1 and $B_{H,T}$ as defined in (3.7)

Remark 3.1. Theorems 3.1–3.4 establish consistency results for the proposed test given in (3.2). These results correspond to Theorems 1–4 of Horowitz and Spokoiny (2001) for the fixed-design regression case, whereas here the observations form a strictly stationary α-mixing time series. In addition, the adaptive test L* is consistent, as established in Theorem 3.2. This is one of the advantages of our test over existing ones, such as the natural competitor proposed in Aït-Sahalia (1996a). In Section 4, we show that our test also outperforms the natural competitor in finite samples.

4. EXAMPLE OF IMPLEMENTATION AND APPLICATION IN DIFFUSION MODELS

This section illustrates the proposed adaptive test with an example. Because the bootstrap simulation procedure for selecting both the bandwidth and the simulated critical values is computationally very demanding, especially for large samples, we consider only the CIR model proposed by Cox et al. (1985) and show, through a simulated example, how to implement the adaptive test statistic L* of (3.2) in practice. We chose this model not only because both its marginal and transition density functions have closed forms but also because it has been studied extensively in the literature; see, for example, Aït-Sahalia (1999) and Hong and Li (2004).

Example 4.1

We consider the CIR model given by

$$dr_t = \kappa(\beta - r_t)\,dt + \sigma\sqrt{r_t}\,dB_t, \qquad (4.1)$$

where κ > 0, β > 0, and σ > 0 are unknown parameters and $B_t$ is standard Brownian motion. It can be shown that $\{r_t\}$ is distributed on $\mathbb{R}_+ = (0,\infty)$ if 2κβ/σ² ≥ 1. Furthermore, it follows from Lemma 3.1 of Masry and Tjøstheim (1995) that the process $\{r_t\}$ satisfies Assumption A.1(i). Alternatively, one may apply Assumption A.3′ of Aït-Sahalia (1996b, p. 552) to verify that $\{r_t\}$ is strictly stationary and α-mixing.

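As a result of (2.2), the marginal density function of $\{r_t\}$ satisfying model (4.1) is

$$f(x,\theta) = \frac{1}{\Gamma(\nu+1)}\left(\frac{2\kappa}{\sigma^2}\right)^{\nu+1} x^{\nu}\exp\left(-\frac{2\kappa}{\sigma^2}\,x\right), \qquad x > 0, \qquad (4.2)$$

where θ = (β,κ,σ), ν = 2κβ/σ² − 1, and Γ(·) is the usual gamma function. Let θ0 be the true value of θ.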
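As a numerical check on the CIR marginal density, the Python sketch below evaluates the gamma form f(x,θ) = (2κ/σ²)^{ν+1} x^ν exp(−2κx/σ²)/Γ(ν + 1), with ν = 2κβ/σ² − 1 as defined above, and verifies with the trapezoid rule that it integrates to one and has mean β, the long-run mean of the CIR process. The parameter values are illustrative assumptions chosen so that 2κβ/σ² ≥ 1; they are not the Pritsker (1998) values used later in this section.

```python
import math

def cir_marginal_density(x, kappa, beta, sigma):
    """Gamma density with shape nu + 1 = 2*kappa*beta/sigma**2 and rate 2*kappa/sigma**2."""
    if x <= 0.0:
        return 0.0
    nu = 2.0 * kappa * beta / sigma ** 2 - 1.0
    rate = 2.0 * kappa / sigma ** 2
    # work on the log scale to avoid overflow in rate**(nu + 1)
    log_f = ((nu + 1.0) * math.log(rate) + nu * math.log(x)
             - rate * x - math.lgamma(nu + 1.0))
    return math.exp(log_f)

kappa, beta, sigma = 0.5, 0.06, 0.15          # illustrative values only
assert 2 * kappa * beta / sigma ** 2 >= 1     # condition stated after (4.1)

n, upper = 20000, 1.0
step = upper / n
xs = [i * step for i in range(n + 1)]
fs = [cir_marginal_density(x, kappa, beta, sigma) for x in xs]
total = step * (sum(fs) - 0.5 * (fs[0] + fs[-1]))
mean = step * (sum(x * f for x, f in zip(xs, fs)) - 0.5 * xs[-1] * fs[-1])
print(total, mean)   # approximately 1 and beta
```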

To construct a sequence of local alternatives, we also consider using a marginal density of the form

where $\nu_1 = 2\kappa/\sigma^2$. It is known that $f_1(x,\theta)$ is the marginal density of $\{r_t\}$ satisfying the AG model proposed in Ahn and Gao (1999)

with parameter values κ > 0, β > 0, and σ > 0. The necessary and sufficient conditions for stationarity and for the unattainability of 0 and ∞ in finite expected time are κ > 0 and β > 0 (see Ahn and Gao, 1999). To show that $\{r_t\}$ is strictly stationary and α-mixing, as explained in Appendix A of Ahn and Gao (1999, pp. 755–756), one needs only to verify Assumption A.3′ of Aït-Sahalia (1996b, p. 552). It is easy to see that this assumption holds for the marginal density, drift, and diffusion functions given in (4.3) and (4.4).

The corresponding structure of the test problem (2.3) for this example can be constructed as

where

in which $\theta_1 \in \Theta$. We choose this $\Delta_T(\cdot)$ as the local shift function to ensure that the models under $H_1$ fluctuate closely around those under $H_0$. The choice of (4.5) and (4.6) ensures that (3.7) holds with s = 2, which implies that the adaptive test is consistent against this sequence of alternatives at the optimal rate. Note that Assumptions B.1 and B.2 hold.

In the following simulation, we consider using a class of alternatives of the form

where θ1 ∈ Θ and 0 < ψ < 1 is defined as the truncation parameter to be chosen.

To compute the nonparametric estimators involved, we use the normal kernel function

$$k(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right) \qquad (4.8)$$

throughout the simulation. Observe that Assumptions A.1–A.4 hold. For the CIR and AG models, we simulate the data from their marginal and transition density functions, which all have closed forms.
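For the normal kernel k(u) = (2π)^{-1/2}exp(−u²/2), the kernel constants that enter the statistics have closed forms. The Python sketch below checks numerically that R(k) = ∫k²(u)du = 1/(2√π). It also encodes the value at zero of the j-fold convolution of k, which is the N(0, j) density at 0, namely 1/√(2πj); reading $k^{(j)}$ as this j-fold convolution is our interpretation of the definition given in Section 2. For a symmetric kernel, the two-fold value (k∗k)(0) = ∫k(u)k(−u)du coincides with R(k).

```python
import math

def normal_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def trapezoid(g, a, b, n=20000):
    step = (b - a) / n
    inner = sum(g(a + i * step) for i in range(1, n))
    return step * (inner + 0.5 * (g(a) + g(b)))

R_numeric = trapezoid(lambda u: normal_kernel(u) ** 2, -10.0, 10.0)
R_closed = 1.0 / (2.0 * math.sqrt(math.pi))      # about 0.2821

def conv_power_at_zero(j):
    """Value at 0 of the j-fold convolution of N(0, 1) densities (the N(0, j) density)."""
    return 1.0 / math.sqrt(2.0 * math.pi * j)

print(R_numeric, R_closed, conv_power_at_zero(2))
```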

In the detailed simulation, we simulate the data from (4.2) for the CIR model, from (4.3) for the AG model, and then from (4.7) under $H_1$. Using the simulated data, we compute

in which R(k) =

are used after the choice of (4.8) and HT is as defined following (3.2) with

. Note that Assumption B.3 holds.

To compare L* with

in (2.4), we construct a test statistic of the form

where h* is chosen by using the following procedure.

• We simulate $X_t$ with probability 1 − ψ from the CIR model and with probability ψ from the AG model with an initial value of $\theta_1$ under $H_1$.

• Use the simulated data $\{X_t : t = 1,2,\ldots,T\}$ to estimate $\theta_1$.

• Compute the resulting function of h given by

• Repeat the preceding steps Q = 1,000 times to produce Q versions of

denoted by

for m = 1,2,…,Q. Use the Q functions of h,

for m = 1,2,…,Q, to construct their empirical bootstrap distribution function, that is,

where $I(U \le u)$ is the usual indicator function.

• For a given asymptotic critical value $ecv_\alpha$ at level α (e.g., $ecv_{0.05} = 1.645$ at the 5% level), we then calculate the following power function:

• Find, approximately, the value of h at which the power function ψ(h) is maximized, and denote the maximizer by h*.
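The bandwidth search above can be sketched as follows. Each replicate simulates one data set under $H_1$ and evaluates the statistic on the whole bandwidth grid, so all bandwidths share the same simulated data. We take the power at h to be the proportion of the Q replicates whose statistic exceeds $ecv_\alpha$, that is, one minus the empirical distribution function at $ecv_\alpha$ (our reading of the omitted display), and h* is the grid point with the largest estimated power. The names `simulate_alternative` and `statistic` are hypothetical placeholders, and the toy statistic is an artificial function constructed so that the best bandwidth is known (h = 0.2); it is not the statistic used in this paper. The asymptotic critical value comes from Python's `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist

def select_bandwidth(grid, simulate_alternative, statistic, ecv, Q=1000, rng=None):
    """Return (h_star, power) maximizing the simulated rejection rate over grid."""
    rng = rng or random.Random()
    exceed = {h: 0 for h in grid}
    for _ in range(Q):
        data = simulate_alternative(rng)          # one data set per replicate
        for h in grid:                            # same data for every h
            if statistic(data, h) > ecv:
                exceed[h] += 1
    power = {h: exceed[h] / Q for h in grid}      # 1 - empirical CDF at ecv
    h_star = max(grid, key=lambda h: power[h])
    return h_star, power

alpha = 0.05
ecv = NormalDist().inv_cdf(1.0 - alpha)           # about 1.645

# Toy ingredients (assumed, purely illustrative).
T = 100
simulate_alternative = lambda rng: [rng.gauss(0.2, 1.0) for _ in range(T)]

def statistic(data, h):
    return math.sqrt(len(data)) * sum(data) / len(data) - 5.0 * (h - 0.2) ** 2

h_star, power = select_bandwidth([0.1, 0.2, 0.4, 0.8], simulate_alternative,
                                 statistic, ecv, Q=1000, rng=random.Random(7))
print(h_star, power)
```

Reusing the same simulated data across bandwidths (common random numbers) keeps the comparison between grid points from being swamped by simulation noise.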

We then use the same parameter values as in (17) of Pritsker (1998). This means that the baseline model is model (4.1) with

. In this example, the same parameter values were also used as $\theta_1$ in computing the power of the tests L* and L0*. The truncation parameter was chosen as ψ = 0 under $H_0$, whereas it was chosen as

under $H_1$. Three sample sizes, T = 1,000, 2,755, and 5,500, were considered. The corresponding simulated critical values, $l_\alpha$ and l, of L* and L0* at level α were then found using the simulation scheme proposed in Section 3.1. The sizes of the tests were computed from data simulated under $H_0$, and the power values from data generated under $H_1$. In implementing the simulation procedure, we used M = 1,000 in the simulation scheme proposed in Section 3.1. The number of simulations used to produce Table 1 was also 1,000. Both the size and the power of L* and L0* are given in Table 1.

Table 1. Rejection rates for the marginal density tests

Remark 4.1. (i) As can be seen from Table 1, the power values of both L* and L0* look reasonable when

, or about 3%. This suggests that both L* and L0* are applicable in medium-sized samples, because the alternative was deliberately made close to the null hypothesis. We also computed the power of the tests for the case where

or 5%. Our small sample results showed that the power of L* was already 100% even when T = 1,000. In general, power increases with ψ in each case. Observe that L* is slightly more powerful than L0*, even though the bandwidth h* in

has been chosen to maximize its power. The sizes of the two tests are also close to the nominal levels of 5% in the first half of Table 1 and 1% in the second half.

(ii) We also examined the dependence of the power on the choice of the initial parameter values. Our experience suggests that the power of the tests depends mainly on the choice of the truncation parameter ψ. This is expected, because the test statistics ultimately depend on the estimation and re-estimation of the parameter vector rather than on the initial parameter values themselves. This is probably why either artificial values or parameter values estimated from real data are used as initial values for a simulation procedure. For example, Hong and Li (2004) use parameter values estimated from a U.S. interest rate series in their simulation procedure.

(iii) Compared with existing results (see Pritsker, 1998), both the size and the power of L0* have been substantially improved. This is probably because (a) the choice of h involved in

is based on the assessment of the power of

rather than on an estimation-based optimal value, and (b) to avoid using the asymptotic distribution of

and then an asymptotic critical value of 1.645 at the 5% level or 2.33 at the 1% level, we have used the bootstrap-based simulated critical value, l, at level α. We also computed the power and size values for the case where h was chosen by a cross-validation criterion; the resulting sizes and power values were similar to those obtained by Pritsker (1998), although L* always performed better than L0*. This further demonstrates that the asymptotic distribution of either

can provide only a rough indication of the finite sample behavior of the tests. In practice, we strongly recommend using the proposed bootstrap simulation procedure to obtain a simulated critical value rather than relying on an asymptotic critical value.

5. CONCLUSION

In this paper, we have considered testing the general continuous-time diffusion model (2.1) under the α-mixing condition. The results for continuous-time models under the α-mixing condition complement existing results under the β-mixing condition; see, for example, Aït-Sahalia (1996a). Moreover, an adaptive and optimal test procedure has been established. This extension parallels Horowitz and Spokoiny (2001) for fixed-design nonparametric regression and Chen, Gao, and Li (2001) for nonparametric time series regression. To deal with the α-mixing condition, we have established some novel results on moment inequalities (see Lemma C.2) and limit theorems (see Lemma A.1) for degenerate U-statistics of strongly mixing processes. Both Lemmas A.1 and C.2 are applicable to other estimation and testing problems for diffusion processes under the α-mixing condition (for more about various mixing conditions, see Doukhan, 1995). In addition, we have demonstrated how to implement the proposed test procedure in practice using a simulated example.

The results given in this paper can be extended in a number of directions. First, it is possible to consider testing both the marginal and transition density functions simultaneously, because the transition density captures the full dynamics of a diffusion process and, in particular, can distinguish diffusion processes that have the same marginal density but different transition densities. Second, the results of this paper for the short-range dependent continuous-time case can be extended to the long-range dependent continuous-time case. Third, it may be possible to relax the strict stationarity and mixing conditions, as recent work by Aït-Sahalia (1999) and Karlsen and Tjøstheim (2001) indicates that such results can be obtained without them. This direction is particularly important for two reasons: (i) in the long-range dependent case one needs to avoid assuming both long-range dependence and the mixing condition, as they contradict each other; and (ii) some important models are nonstationary. These issues are left for future research.

APPENDIX A

This Appendix lists the necessary assumptions for the establishment and the proof of the main results given in Section 2.

A.1. Assumptions. Let the parameter set Θ be an open subset of $\mathbb{R}^q$. Let

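. Define $\nabla_\theta f(x,\theta) = \partial f(x,\theta)/\partial\theta$, $\nabla_\theta^2 f(x,\theta) = \partial^2 f(x,\theta)/\partial\theta\,\partial\theta'$, and $\nabla_\theta^3 f(x,\theta) = \partial^3 f(x,\theta)/\partial\theta\,\partial\theta'\,\partial\theta''$ whenever these derivatives exist. For any q × q matrix D, define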

where

.

Assumption A.1. (i) Assume that the process $\{r_t\}$ is strictly stationary and α-mixing with the mixing coefficient $\alpha(t) = C_\alpha\alpha^t$ defined by

for all s,t ≥ 1, where $0 < C_\alpha < \infty$ and 0 < α < 1 are constants and $\Omega_i^j$ denotes the σ-field generated by $\{r_t : i \le t \le j\}$.

(ii) Assume that the univariate kernel function k(·) is nonnegative, symmetric, and four times differentiable on $\mathbb{R}^1 = (-\infty,\infty)$. In addition,

.

Assumption A.2. (i) The parameter space $\Theta \subset \mathbb{R}^q$ is compact. In a neighborhood of the true parameter θ0, f(x,θ) is twice continuously differentiable in θ; $E[(\partial f(x,\theta)/\partial\theta)(\partial f(x,\theta)/\partial\theta)^\tau]$ is of full rank. In addition, assume that G(x) is a positive and integrable function with $E[G(X_t)] < \infty$ uniformly in t ≥ 1 such that $\sup_{\theta\in\Theta}|f(X_t,\theta)|^2 \le G(X_t)$ and $\sup_{\theta\in\Theta}\|\nabla_\theta^j f(X_t,\theta)\|^2 \le G(X_t)$ for j = 1,2,3, where for

.

(ii) Assume that $\hat\theta$ is a $\sqrt{T}$-consistent estimator of θ0.

Assumption A.3. For every θ ∈ Θ:

(i) The drift and the diffusion functions are three times continuously differentiable in $x \in \mathbb{R}_+ = (0,\infty)$, and σ > 0 on $\mathbb{R}_+$.

(ii) The integral of

converges at both boundaries of D, where v is fixed in D.

(iii) The integral of

diverges at both boundaries of D.

Assumption A.4. (i) Assume that the first three derivatives of f(x) are continuous on D and that $f(x) > c_f$ on the interior of D for some $c_f > 0$. In addition, both f(x) and $f^2(x)$ are integrable on D.

(ii) The initial random variable r0 is distributed as f (x).

(iii) The true drift and diffusion functions satisfy Assumption A.3.

Assumption A.5. The bandwidth parameter h satisfies

Remark A.1. Assumptions A.1–A.4 are quite natural in this kind of problem. Assumptions A.2–A.4 correspond to Assumptions A0, A1, and A3 of Aït-Sahalia (1996a); Assumption A.1 is the exception. Assumption A.1(i) imposes the α-mixing condition, which is weaker than the β-mixing condition. Assumption A.1(ii) is quite general and allows the use of the standard normal kernel. Assumption A.5 ensures that the theoretically optimal value $h_{\mathrm{optimal}} = CT^{-1/5}$ is included. This is important, because there may be cases in which $h_{\mathrm{optimal}}$ is also optimal for testing purposes.

A.2. Technical Lemmas.

The following lemmas are necessary for the proof of the main results stated in Section 2.

LEMMA A.1. Let $\xi_t$ be an r-dimensional strictly stationary and strong mixing (α-mixing) stochastic process. Let φ(·,·) be a symmetric Borel function defined on $\mathbb{R}^r \times \mathbb{R}^r$. Assume that for any fixed $x \in \mathbb{R}^r$, $E[\varphi(\xi_1,x)] = 0$ and $E[\varphi(\xi_i,\xi_j)\mid\Omega_0^{j-1}] = 0$ for any i < j, where $\Omega_i^j$ denotes the σ-field generated by $\{\xi_s : i \le s \le j\}$. Let

. For some small constant 0 < δ < 1, let

where the maximization over P in the equation for $M_{T4}$ is taken over the four probability measures $P(\xi_1,\xi_i,\xi_j,\xi_k)$, $P(\xi_1)P(\xi_i,\xi_j,\xi_k)$, $P(\xi_1)P(\xi_{i_1})P(\xi_{i_2},\xi_{i_3})$, and $P(\xi_1)P(\xi_i)P(\xi_j)P(\xi_k)$, where $(i_1,i_2,i_3)$ is the permutation of (i,j,k) in ascending order;

Assume that all the MTs are finite. Let

If $\lim_{T\to\infty}\max\{M_T, N_T\}/\sigma_T^2 = 0$, then

Remark A.2. Lemma A.1 establishes a central limit theorem for degenerate U-statistics of strongly mixing processes. It should be pointed out that the conclusion of Lemma A.1 remains true when the martingale assumption that $E[\varphi(\xi_i,\xi_j)\mid\Omega_0^{j-1}] = 0$ for any i < j is removed. This martingale assumption is used only to allow a direct application of an existing central limit theorem (CLT) for martingales. Without it, one needs only to decompose

and then apply the martingale CLT to

. Using the condition that $E[\phi(\xi_1, x)] = 0$ for each given $x$, one can show that the terms involving $E[\phi(\xi_i, \xi_j) \mid \Omega_0^{j-1}]$ are negligible (see Roussas and Ioannides, 1987, Theorem 5.5). Thus, as in Lemma 3.2 of Hjellvik, Yao, and Tjøstheim (1996) and Theorem 2.1 of Fan and Li (1998), the key assumption is that $E[\phi(\xi_1, x)] = 0$ for each given $x$.
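The following sketch makes the key degeneracy condition $E[\phi(\xi_1, x)] = 0$ concrete. It builds a degenerate kernel by Hoeffding-centering a Gaussian kernel. Two assumptions are made so that every centering term has a closed form: the process is a stationary Gaussian AR(1) with N(0, 1) marginal, and the statistic is formed as the sum of $\phi(\xi_i, \xi_j)$ over $i < j$. Both are assumptions of this example, not the definitions in Lemma A.1, and all function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-np.square(x) / (2 * var)) / np.sqrt(2 * np.pi * var)

def centered_kernel(x, y, h):
    """Hoeffding-centred Gaussian kernel, assuming an N(0, 1) marginal for xi:
    phi(x, y) = K_h(x - y) - E K_h(x - xi) - E K_h(xi - y) + E K_h(xi - xi'),
    where K_h(u) = k(u / h) / h is the N(0, h^2) density, so that
    E K_h(x - xi) is the N(0, 1 + h^2) density at x and E K_h(xi - xi') is the
    N(0, 2 + h^2) density at 0."""
    return (normal_pdf(x - y, h ** 2) - normal_pdf(x, 1 + h ** 2)
            - normal_pdf(y, 1 + h ** 2) + normal_pdf(0.0, 2 + h ** 2))

def ar1(T, rho, rng):
    """Stationary Gaussian AR(1) with N(0, 1) marginal (alpha-mixing for |rho| < 1)."""
    xi = np.empty(T)
    xi[0] = rng.standard_normal()
    for t in range(1, T):
        xi[t] = rho * xi[t - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return xi

def u_statistic(xi, h):
    """Sum of phi(xi_i, xi_j) over 1 <= i < j <= T (strict upper triangle)."""
    P = centered_kernel(xi[:, None], xi[None, :], h)
    return np.triu(P, k=1).sum()

rng = np.random.default_rng(1)
xi = ar1(500, rho=0.5, rng=rng)
U = u_statistic(xi, h=0.3)

# Monte Carlo check of E[phi(xi_1, x)] = 0 at the fixed point x = 0.7.
draws = rng.standard_normal(200_000)
mean_phi = centered_kernel(draws, 0.7, 0.3).mean()
```

The Gaussian AR(1) is used only because its marginal is known exactly. The martingale condition of Lemma A.1 does not hold for it, which is the situation Remark A.2 addresses.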

Proof. Let

To prove Lemma A.1, it suffices to show that as T → ∞

and

By Lemma C.1 (with $\eta_1 = \phi_{ik}$, $\eta_2 = \phi_{jk}$, $l = 2$, $p_i = 2(1 + \delta)$, and $Q = 1/(1 + \delta)$),

Therefore,

because

.

Observe that

Let $\eta_{ijk} = \frac{1}{3}(\phi_{ik}\phi_{jk} + \phi_{ij}\phi_{kj} + \phi_{ji}\phi_{ki})$ and $\eta_{ij} = \frac{1}{3}\int \phi_{ik}\phi_{jk}\, dP(\xi_k)$.
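Because $\phi$ is symmetric, $\eta_{ijk}$ is invariant under every permutation of $(i, j, k)$. Each of its three terms is the product of the two "edges" meeting at one of the indices. The sketch below checks this numerically, using an arbitrary symmetric matrix as a stand-in for the values $\phi_{st}$. The helper name `eta` is hypothetical.

```python
import itertools
import numpy as np

def eta(phi, i, j, k):
    """eta_ijk = (phi_ik phi_jk + phi_ij phi_kj + phi_ji phi_ki) / 3 for a symmetric matrix phi."""
    return (phi[i, k] * phi[j, k] + phi[i, j] * phi[k, j] + phi[j, i] * phi[k, i]) / 3

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
phi = (A + A.T) / 2                     # symmetric stand-in for phi(xi_s, xi_t)

values = [eta(phi, *perm) for perm in itertools.permutations((0, 2, 5))]
is_symmetric = np.allclose(values, values[0])
```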

Then by Lemma C.2(i) in Appendix C,

Let $C_\phi = \int \phi_{12}^2 \phi_{34}^2 \, dP(\xi_1)\, dP(\xi_2)\, dP(\xi_3)\, dP(\xi_4)$, where $P(\xi)$ denotes the probability measure of $\xi$.

Using Lemma C.1 repeatedly, we have that for distinct $i, j, k, l$

where $\Delta(i, j, k, l)$ is the smallest increment between consecutive terms of the sequence obtained by arranging $i, j, k, l$ in ascending order.
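The quantity $\Delta(i, j, k, l)$ can be computed directly from its definition. The function name below is hypothetical.

```python
def min_increment(*indices):
    """Delta(i, j, k, l): smallest gap between consecutive terms after sorting distinct indices."""
    s = sorted(indices)
    if len(set(s)) != len(s):
        raise ValueError("indices must be distinct")
    return min(b - a for a, b in zip(s, s[1:]))

delta = min_increment(9, 2, 15, 5)   # sorted: 2, 5, 9, 15 -> gaps 3, 4, 6
```

For example, $(9, 2, 15, 5)$ sorts to $2, 5, 9, 15$, whose consecutive gaps are 3, 4, and 6, so $\Delta = 3$.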

Similarly to (A.5), one obtains for all distinct $i, j, k, l$

Therefore,

It now follows from (A.3)–(A.5) that for any ε > 0

Thus, (A.1) holds.

Note that for $2 \le k \le T$,

It is easy to see that

Similarly to (A.5), one obtains for any $(i, j) \neq (s, t)$,

where Δ(·) is as defined in (A.5).

Consequently, the first two terms on the right-hand side of (A.7) are of order $O(T^3 M_{T4}^{1/(1+\delta)})$, because

.

Thus, (A.2) follows from

This finishes the proof. █

Before stating the remaining lemmas, we introduce some notation.

using

where

in which $A$ is the $T \times T$ matrix with $a_{st}$ as its $(s, t)$ element.

We assume without loss of generality throughout the rest of this paper that

LEMMA A.2. Under the conditions of Theorem 2.1, we have as T → ∞

Proof. We now prove (A.9). It follows from Assumptions A.2 and A.3 that as T → ∞

This completes the proof of the first part of (A.9). For the proof of the second part of (A.9), let

Then

We first look at the main component of $\sigma_T^2$. We now have

Using Assumptions A.1–A.4, we have as T → ∞

where L(x) = ∫k(x + y)k(y) dy is as defined in (2.5).
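For the standard normal kernel allowed by Assumption A.1(ii), $L(x) = \int k(x + y)k(y)\, dy$ is the N(0, 2) density, $e^{-x^2/4}/\sqrt{4\pi}$. The sketch below checks this by trapezoidal integration. The integration limits and grid size are arbitrary choices.

```python
import numpy as np

def k(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # standard normal kernel

def L_numeric(x, lim=10.0, n=4001):
    """Trapezoidal approximation of L(x) = integral of k(x + y) k(y) dy over [-lim, lim]."""
    y = np.linspace(-lim, lim, n)
    vals = k(x + y) * k(y)
    dy = y[1] - y[0]
    return np.sum((vals[1:] + vals[:-1]) / 2) * dy

def L_closed_form(x):
    return np.exp(-x ** 2 / 4) / np.sqrt(4 * np.pi)     # N(0, 2) density

xs = [0.0, 0.5, 1.3, 2.0]
max_err = max(abs(L_numeric(x) - L_closed_form(x)) for x in xs)
```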

Therefore, as T → ∞

where $k^{(4)}(\cdot)$ is as defined in (2.5).

Similarly, one can show that as T → ∞

We now deal with the remainder term of $\mathrm{var}[N_{0T}(h)]$. By Lemma C.1 (with $\eta_1 = \phi_{ik}$, $\eta_2 = \phi_{jk}$, $l = 2$, $p_i = 2(1 + \delta)$, and $Q = 1/(1 + \delta)$),

where $M_{T1}$ is as defined in Lemma A.1.

Therefore, using the fact that

,

whose proof is similar to that of (A.17) below.

Equations (A.10)–(A.12) imply

This finishes the proof of the second part of (A.9). █

LEMMA A.3. Under the conditions of Theorem 2.1, we have as T → ∞

and

where $C_1$ is a constant and

are as defined in Section 2.

Proof. We give the proof of (A.14) in some detail only. The proof of (A.13) is similar and quite standard, and its details follow from existing results; see, for example, Fan and Gijbels (1996).

In view of the definition of $w_t(x)$ and the second equation of (A.13), to prove (A.14), it suffices to show that as T → ∞

using a Taylor expansion of $f(x) - f(x - vh)$. This finishes the proof of (A.14). █
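The expansion used above is written out below for the reader's convenience. The second line is a sketch that assumes a symmetric kernel, so that $\int v k(v)\, dv = 0$, together with $\int |v|^3 k(v)\, dv < \infty$ and $f'''$ bounded near $x$. Assumption A.4(i) gives continuity of $f'''$. The other conditions are assumptions of this sketch.

```latex
\begin{aligned}
f(x) - f(x - vh) &= vh\, f'(x) - \frac{v^2 h^2}{2} f''(x) + \frac{v^3 h^3}{6} f'''(\bar{x}),
  \quad \bar{x} \text{ between } x - vh \text{ and } x, \\
\int k(v)\,[f(x) - f(x - vh)]\,dv &= -\frac{h^2}{2} f''(x) \int v^2 k(v)\,dv + O(h^3).
\end{aligned}
```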

A.3. Proof of Theorem 2.1.

Proof of Theorem 2.1(i). To prove Theorem 2.1(i), in view of Remark A.2 and Lemma A.3, it suffices to show that

To apply Lemma A.1, let $\xi_t = X_t$ and $\phi(\xi_s, \xi_t) = \phi_{st}$ as defined previously. Let $M_T$ and $N_T$ be defined as in Lemma A.1. We now verify only the following condition listed in Lemma A.1:

for $M_{T1}$, $M_{T21}$, $M_{T3}$, $M_{T51}$, $M_{T52}$, and $M_{T6}$, where $\sigma_h^2 = h\sigma_0^2$. The others follow similarly.

For the $M_T$ part, we verify only

The others follow similarly.

Let $\psi_{st} = \frac{1}{Th}\int k\left(\frac{x - X_s}{h}\right) k\left(\frac{x - X_t}{h}\right) p(x)\, dx$. We now have

where L(·) is as defined previously.
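To see how $L(\cdot)$ enters, take the weight function to be constant, $p \equiv 1$ on the real line. This is an assumption for illustration only and is not the paper's $p$. The substitution $u = (x - X_s)/h$ then gives $\psi_{st} = \frac{1}{T} L\big((X_s - X_t)/h\big)$. The sketch below compares a quadrature value of $\psi_{st}$ with this closed form for the Gaussian kernel. The function names are hypothetical.

```python
import numpy as np

def k(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # standard normal kernel

def psi_quadrature(xs, xt, h, T, p=lambda x: np.ones_like(x), lim=12.0, n=20001):
    """psi_st = (1/(T h)) * integral of k((x - X_s)/h) k((x - X_t)/h) p(x) dx (trapezoidal rule)."""
    x = np.linspace(-lim, lim, n)
    vals = k((x - xs) / h) * k((x - xt) / h) * p(x)
    dx = x[1] - x[0]
    return np.sum((vals[1:] + vals[:-1]) / 2) * dx / (T * h)

def psi_closed_form(xs, xt, h, T):
    """L((X_s - X_t)/h) / T, using L = N(0, 2) density for the Gaussian kernel (p = 1 assumed)."""
    z = (xs - xt) / h
    return np.exp(-z ** 2 / 4) / np.sqrt(4 * np.pi) / T

T, h = 200, 0.25
approx = psi_quadrature(0.3, -0.1, h, T)
exact = psi_closed_form(0.3, -0.1, h, T)
```

With a non-constant $p$, the same quadrature applies. The closed form then no longer holds exactly.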

For any given 1 < ζ < 2 and T sufficiently large, we obtain

using Assumption A.1(ii), where $f(x, y, z)$ denotes the joint density of $(X_i, X_j, X_k)$ and $C_p$ is a constant.

Thus, as T → ∞

Hence, (A.17) shows that (A.15) holds for the first part of $M_{T1}$. The proof for the second part of $M_{T1}$ follows in a similar way. Similarly, we have that as T → ∞

using Assumption A.1(ii).

This implies that as T → ∞

Thus, (A.18) now shows that (A.15) holds for $M_{T3}$. It follows from the structure of $\{\psi_{ij}\}$ that (A.15) holds automatically for $M_{T51}$, $M_{T52}$, and $M_{T6}$, because the relevant off-diagonal terms ($s \neq t$) have zero expectation.

We now prove that (A.15) holds for $M_{T21}$. For some $0 < \delta < 1$ and $1 \le i < j < k \le T$, let $M_{T21} = E[|\psi_{ik}\psi_{jk}|^{2(1+\delta)}]$. Similarly to (A.16) and (A.17), we obtain that as T → ∞

using the fact that $\lim_{T \to \infty} Th = \infty$.

This completes the proof of (A.15) for $M_{T21}$, and thus (A.15) holds for the first part of $\{\phi_{st}\}$. Similarly, one can show that (A.15) holds for the other parts of $\{\phi_{st}\}$. Thus, we have shown that under

The proof of Theorem 2.1(i) is therefore finished. █

Proof of Theorem 2.1(ii). Note that as T → ∞

using the continuity of

in h. This completes the proof of Theorem 2.1(ii). █

Proof of Theorem 2.2. The proof follows from Theorem 2.1 and the following standard result:

APPENDIX B

This appendix lists the assumptions needed to establish and prove the main results given in Section 3.

B.1. Assumptions.

Assumption B.1. The parameter set $\Theta$ is an open subset of $R^q$ for some $q \ge 1$. The parametric family

satisfies the following conditions.

holds with probability one (almost surely).

Assumption B.2. (i) Let

be true. Then $\theta_0 \in \Theta$ and

for any ε > 0 and all sufficiently large $C_L$.

(ii) Let

be false. Then there is a θ* ∈ Θ such that

for any ε > 0 and all sufficiently large $C_L$.

Assumption B.3. (i) Assume that the set $H_T$ has the structure of (3.2) with $c_{\min} T^{-\gamma} = h_{\min} < h_{\max} = c_{\max} (\log\log T)^{-1}$, where $\gamma$, $c_{\min}$, and $c_{\max}$ are constants satisfying $0 < \gamma < 1$ and $0 < c_{\min}, c_{\max} < \infty$.

(ii) Assume that $\Delta_T(x)$ is continuous in $x \in D$ and satisfies

for all T ≥ 1.
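Horowitz and Spokoiny (2001) use a geometric grid of bandwidths running from $h_{\max}$ down to $h_{\min}$. The sketch below assumes, hypothetically, that (3.2) has the same form, $H_T = \{h_{\max} a^k : k = 0, 1, 2, \ldots,\ h_{\max} a^k \ge h_{\min}\}$ for some $0 < a < 1$. It generates such a grid satisfying Assumption B.3(i). The value $a = 0.8$ and the other constants are illustrative only.

```python
import math

def bandwidth_grid(T, gamma=1/3, c_min=1.0, c_max=1.0, a=0.8):
    """Geometric grid h_max * a**k (k = 0, 1, ...) kept while h >= h_min.
    This grid form, and all default constants, are illustrative assumptions."""
    h_min = c_min * T ** (-gamma)
    h_max = c_max / math.log(math.log(T))
    if not h_min < h_max:
        raise ValueError("Assumption B.3(i) requires h_min < h_max")
    grid = []
    h = h_max
    while h >= h_min:
        grid.append(h)
        h *= a
    return h_min, h_max, grid

h_min, h_max, H_T = bandwidth_grid(T=1000)
```

Under this construction, the number of bandwidths grows only like $\gamma \log T / \log(1/a)$. For $T = 1000$ and the defaults above, the grid has 8 points between $h_{\min} = 0.1$ and $h_{\max} \approx 0.517$.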

Remark B.1. Assumptions B.1(i) and B.1(ii) are quite standard in this kind of problem; see Assumptions 1(i) and (ii) of Horowitz and Spokoiny (2001). Assumption B.1(iii) is required to ensure that the marginal density function is identifiable. A similar condition is used in Assumption 1(iii) of Horowitz and Spokoiny (2001). It can be shown that Assumption B.1(iii) holds when $f(x, \theta)$ belongs to classes of simple linear and certain nonlinear functions in $\theta$. The identifiability assumption is imposed to exclude the case where $f(x, \theta)$ is flat as a function of $\theta$ over a certain range of $\theta$ for some values of $x$, because such a function may be neither identifiable nor a probability density. Assumption B.2 is needed to ensure that the true version of $\theta$ under

can be estimated by a

-consistent estimator. Assumption B.3(i) imposes conditions on both $h_{\min}$ and $h_{\max}$. The theoretical condition on $h_{\min}$ is quite general. In practice, we would suggest using

to include the estimation-based optimal bandwidth $h_{\text{optimal}} = C T^{-1/(2s+1)}$, because the estimation-based optimal value may also be optimal for testing purposes in some cases. The restriction on $h_{\max}$ is required only for the proof of Theorem 3.3. It should be noted that $h_{\max}$ is not necessarily the bandwidth that maximizes the power of the resulting test. As explained at the beginning of Section 2, both the existence and the reasonableness of Assumption B.3(ii) can be justified. Unlike the regression setting discussed in Horowitz and Spokoiny (2001), we need to assume

to ensure that the alternative is also a probability density. As the main results in Section 2 are concerned only with the null hypothesis, such a restrictive condition is not needed there.
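The following is a rough numerical check of the suggestion that $H_T$ should contain the estimation-based optimal bandwidth. The sample size is written as $T$, and $s = 2$ is assumed, so that $h_{\text{optimal}} = C T^{-1/5}$. Because $T^{-1/5}$ decays faster than $c_{\max}/\log\log T$, the upper bound is eventually satisfied. The lower bound $c_{\min} T^{-\gamma}$ lies below $C T^{-1/5}$ for large $T$ exactly when $\gamma > 1/(2s + 1)$. All constants below are illustrative assumptions.

```python
import math

def optimal_in_range(T, s=2, C=1.0, gamma=1/3, c_min=1.0, c_max=1.0):
    """Check h_min <= C * T**(-1/(2s+1)) <= h_max under Assumption B.3(i) (constants assumed)."""
    h_opt = C * T ** (-1 / (2 * s + 1))
    h_min = c_min * T ** (-gamma)
    h_max = c_max / math.log(math.log(T))
    return h_min <= h_opt <= h_max

checks = {T: optimal_in_range(T) for T in (100, 1000, 10_000)}
too_slow = optimal_in_range(1000, gamma=0.1)   # gamma < 1/5: h_min exceeds h_opt
```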

This paper considers only a set of discrete bandwidths for constructing the adaptive test. We believe that results corresponding to Theorems 3.1–3.4 can be established for the case where $H_T$ is an interval of continuous bandwidth values. Since $H_T$ is always chosen as a discrete set of bandwidths in practice, such an extension would be mainly of theoretical and technical interest. As it also involves considerably more tedious technical details, we do not pursue it in this paper.

B.2. Technical Lemmas. Before stating the necessary lemmas for the proof of the results given in Section 3, we introduce the following notation.

LEMMA B.1. Suppose that the conditions of Theorem 2.1 hold.

(i) For every δ > 0

in probability, where C > 0 is a constant.

(ii) For each θ ∈ Θ and sufficiently large T

Proof. (i) It follows from the definition of $Q_T(\theta)$ that

To prove Lemma B.1(i), one first needs to show that

in probability for some constant C > 0.

Using the conditions of Lemma B.1, we now have

in probability.

In view of (B.2), to prove Lemma B.1(i), it suffices to show that

in probability.

A Taylor series expansion of $f(X_t, \theta) - f(X_t, \theta_0)$ and an application of Assumption B.1(i) imply (B.3). This finishes the proof of Lemma B.1(i).

(ii) Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of a matrix $A$, respectively. In view of

to prove Lemma B.1(ii), it suffices to show that for $T$ large enough

for some C > 0. The proof of (B.5) follows along the lines of that of Lemma A.2 of Gao, Tong, and Wolff (2002). █
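Eigenvalue arguments of this kind rest on the Rayleigh-quotient bounds $\lambda_{\min}(A)\|u\|^2 \le u^\top A u \le \lambda_{\max}(A)\|u\|^2$. These hold for any symmetric matrix $A$ and vector $u$. The sketch below checks them for a random symmetric matrix, which is only a stand-in and not the matrix appearing in (B.5).

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((50, 50))
A = (B + B.T) / 2                              # symmetric stand-in matrix
lam = np.linalg.eigvalsh(A)                    # eigenvalues in ascending order
lam_min, lam_max = lam[0], lam[-1]

u = rng.standard_normal(50)
quad = u @ A @ u
norm2 = u @ u
bounds_hold = lam_min * norm2 - 1e-9 <= quad <= lam_max * norm2 + 1e-9
```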

Without loss of generality, we consider the case of q = 1 in the following lemmas and their proofs. Define

LEMMA B.2. Under the conditions of Theorem 3.1, we have for any given θ ∈ Θ and i = 1,2

Proof. It suffices to show that for any large constant C0 > 0

where

Similar to the proof of (A.1), one can show that as T → ∞

for some function C(θ).

Using Lemmas C.1 and C.2 in Appendix C and the fact that the centered summands indexed by $t$ have zero expectation for each $x \in D$, one can show that as T → ∞

Thus, equations (B.7)–(B.9) complete the proof. █

LEMMA B.3. Under the conditions of Theorem 3.1, we have as T → ∞

Proof. Similarly to (B.7), we have for a large constant $C_0 > 0$

Similarly to (B.8), we have as T → ∞

Analogous to (B.9), one can show that as T → ∞

Thus, equations (B.11)–(B.13) complete the proof of (B.10). █

LEMMA B.4. Under the conditions of Theorem 3.1, we have for each u > 0,

under

.

Proof. We now prove (B.14). Using a Taylor series expansion of $f(X_t, \theta) - f(X_t, \theta_0)$ and Assumption B.1, we have for $\theta'$ between $\theta$ and $\theta_0$

Hence, (B.4), (B.10), (B.15), and Assumption B.1(i) imply

The proof of (B.14) follows from (B.15) and (B.16). █
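In the case $q = 1$, adopted from Lemma B.2 onward, a first-order (mean-value) expansion of the kind used in these proofs has the generic form below. It is written out only to fix ideas and does not reproduce (B.15). The second line shows where a bound on $\partial f/\partial\theta$, of the kind presumably supplied by Assumption B.1, enters.

```latex
\begin{aligned}
f(X_t, \theta) - f(X_t, \theta_0) &= (\theta - \theta_0)\, \frac{\partial f(X_t, \theta')}{\partial \theta},
  \quad \theta' \text{ between } \theta \text{ and } \theta_0, \\
[f(X_t, \theta) - f(X_t, \theta_0)]^2 &\le (\theta - \theta_0)^2
  \sup_{\theta' \text{ between } \theta,\, \theta_0} \left| \frac{\partial f(X_t, \theta')}{\partial \theta} \right|^2 .
\end{aligned}
```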

LEMMA B.5. Suppose that the conditions of Theorem 3.1 hold. Then for every $u > 0$, some $h \in H_T$, and as T → ∞

under

.

Proof. In view of the definition of $Q_T(\theta)$, to prove (B.17), it suffices to show that as T → ∞

where $q_T = E[Q_T(\theta^*)]$.

Note that

where θ′ lies between θ and θ*.

In view of (B.6), (B.10), (B.18), and Assumptions B.1(i) and B.2(ii), to prove (B.17), it suffices to show that for any δ > 0,

as T → ∞.

Similar to (B.8) and (B.9), one can show that as T → ∞

Thus, equations (B.19) and (B.20) imply that as T → ∞

using $q_T = CTh(1 + o(1))$ given in the proof of Lemma B.1(ii), where $C$ is a constant independent of $T$. Lemma B.5 therefore follows from (B.21). █

Recall the notation introduced in (A.9). We assume without loss of generality that $k^{(4)}(0) = 1$ in Lemma A.2. Define

LEMMA B.6. Suppose that the conditions of Theorem 3.1 hold. Then as T → ∞

uniformly over $h \in H_T$.

Proof. (B.23) follows immediately from (2.7) and (2.8). █

LEMMA B.7. Suppose that the conditions of Theorem 3.1 hold. Then $\max_{h \in H_T} L_0(h)$ and $\max_{h \in H_T} L_T(h)$ have identical asymptotic distributions under

.

Proof. Note that $Q_T(\theta_0) = 0$ under

and that Lemmas A.3 and B.1–B.5 imply as T → ∞

Therefore, equations (B.21), (B.22), and (B.24) complete the proof of Lemma B.7. █

LEMMA B.8. Suppose that the conditions of Theorem 3.1 hold. Then for any $x \ge 0$, $h \in H_T$, and all sufficiently large $T$

Proof. It follows from the beginning of the proof of Theorem 2.1(i) that for any small δ > 0 there exists a large integer $T_0 \ge 1$ such that for $T \ge T_0$

where

.

This implies for any $T \ge T_0$ and $x \ge 0$

using

.

The proof follows by letting

for any x ≥ 0. █

For 0 < α < 1, define

to be the $1 - \alpha$ quantile of $\max_{h \in H_T} L_0(h)$.
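In practice, this quantile is approximated by simulation. The sketch below takes a matrix of simulated statistics, with one row per replication and one column per bandwidth. It returns the empirical $1 - \alpha$ quantile of the row maxima. The standard and equicorrelated normal draws used to exercise it are a hypothetical stand-in for simulated values of $L_0(h)$, motivated only by the asymptotic normality of $L_0(h)$. They are not the paper's simulation scheme, and the correlation 0.6 is arbitrary.

```python
import numpy as np

def simulated_critical_value(stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of max_h L0(h); stats has shape (n_rep, n_bandwidths)."""
    return np.quantile(stats.max(axis=1), 1 - alpha)

rng = np.random.default_rng(4)
n_rep = 200_000

# A single bandwidth: the critical value should be close to the N(0, 1) quantile 1.645.
one_h = rng.standard_normal((n_rep, 1))
cv_single = simulated_critical_value(one_h)

# Five equicorrelated bandwidths (correlation 0.6, illustrative): taking the max raises it.
m, r = 5, 0.6
common = rng.standard_normal((n_rep, 1))
many_h = np.sqrt(r) * common + np.sqrt(1 - r) * rng.standard_normal((n_rep, m))
cv_many = simulated_critical_value(many_h)
```

The comparison illustrates why the adaptive test cannot use the single-bandwidth normal critical value: maximizing over $H_T$ shifts the null distribution to the right.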

LEMMA B.9. Suppose that the conditions of Theorem 3.1 hold. Then for large enough T

Proof. The proof is trivial.

LEMMA B.10. Suppose that the conditions of Theorem 3.1 hold. Suppose that

for some $h \in H_T$, where

Then

Proof. To prove Lemma B.10, in view of Lemmas B.6 and B.7, it suffices to show that

which holds if

for some $h \in H_T$. For any $h \in H_T$, using (B.21) and then (B.17), we have

On the other hand, condition (B.25) implies that as T → ∞

Observe that

Thus, it follows from (B.26) that as T → ∞

because $L_0(h)$ is asymptotically normal and therefore bounded in probability and

.

Because of (B.27), as T → ∞

This finishes the proof. █

B.3. Proofs of Theorems 3.1–3.4.

Proof of Theorem 3.1. The proof follows from Lemmas B.6 and B.7.

Proof of Theorem 3.2. The proof is similar to that of Theorem 3.3 below, using Lemma B.1(ii). Alternatively, one can follow the proof of Theorem 2 of Horowitz and Spokoiny (2001), using Lemma B.1(ii) and the condition that

to verify (B.25). █

Proof of Theorem 3.3. Condition (3.5) ensures that the rate of convergence of fT to the parametric model F1) is the same as the rate of convergence of CT to zero. In particular, when (3.5) holds,

In view of Lemma B.10, to complete the proof of Theorem 3.3, it suffices to verify (B.25). This verification follows from Lemma B.1(ii) and (B.28). █

Proof of Theorem 3.4. The proof relies mainly on Lemma B.1(ii) and the condition of Theorem 3.4 that

to verify (B.25). █

APPENDIX C

The following two technical lemmas were used in the proofs of Lemma A.1 and Theorem 2.1. They are of independent interest and can be used in other nonparametric estimation and testing problems under the α-mixing condition.

LEMMA C.1. Suppose that $M_m^n$ are the σ-fields generated by a stationary α-mixing process $\xi_i$ with mixing coefficient $\alpha(i)$. For some positive integer $m$, let $\eta_i \in M_{s_i}^{t_i}$, where $s_1 < t_1 < s_2 < t_2 < \cdots < t_m$, and suppose $t_i - s_i > \tau$ for all $i$. Assume further that

for some $p_i > 1$ for which

Then

Proof. See Roussas and Ioannides (1987).

LEMMA C.2. (i) Let $\psi(\cdot,\cdot,\cdot)$ be a symmetric Borel function defined on $R^r \times R^r \times R^r$. Let the process $\xi_i$ be defined as in Lemma A.1. Assume that for any fixed $x, y \in R^r$, $E[\psi(\xi_1, x, y)] = 0$. Then

where $0 < \delta < 1$ is a small constant, $C > 0$ is a constant independent of $T$ and the function $\psi$, $M = \max\{M_1, M_2, M_3\}$, and

(ii) Let $\phi(\cdot,\cdot)$ be a symmetric Borel function defined on $R^r \times R^r$. Let the process $\xi_i$ be defined as in Lemma A.1. Assume that for any fixed $x \in R^r$, $E[\phi(\xi_1, x)] = 0$. Then

where δ > 0 is a constant, C > 0 is a constant independent of T and the function φ, and

Proof. As the proof of (ii) is similar to that of (i), we prove only (i). Let $i_1, \ldots, i_6$ be distinct integers with $1 \le i_j \le T$, let $1 \le k_1 < \cdots < k_6 \le T$ be the permutation of $i_1, \ldots, i_6$ in ascending order, and let $d_c$ be the $c$th largest difference among $k_{j+1} - k_j$, $j = 1, \ldots, 5$. Let

By Lemma C.1 (with $\eta_1 = \psi(\xi_{i_1}, \xi_{i_2}, \xi_{i_3})$, $\eta_2 = \psi(\xi_{i_4}, \xi_{i_5}, \xi_{i_6})$, $l = 2$, $p_i = 2(1 + \delta)$, and $Q = 1/(1 + \delta)$),

Thus,

Similarly,

Analogously, it can be shown that

On the other hand, if $\{k_6 - k_5, k_2 - k_1\} = \{d_4, d_5\}$, by using Lemma C.1 three times we have the inequality

Hence,

It follows from (C.3)–(C.7) that

Similar to (C.8), one can show that

Finally, it is easy to see that

The conclusion of Lemma C.2(i) follows immediately from (C.8)–(C.11). █
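The ordering device in this proof sorts six distinct indices and ranks the five consecutive gaps as $d_1 \ge \cdots \ge d_5$. It can be written out as follows. The function name is hypothetical.

```python
def ranked_gaps(indices):
    """Sort distinct indices k_1 < ... < k_n and return the consecutive gaps in
    decreasing order, so that entry c - 1 (0-based) is d_c."""
    k = sorted(indices)
    if len(set(k)) != len(k):
        raise ValueError("indices must be distinct")
    gaps = [b - a for a, b in zip(k, k[1:])]
    return sorted(gaps, reverse=True)

d = ranked_gaps([40, 3, 17, 18, 60, 9])   # sorted: 3, 9, 17, 18, 40, 60
```

Here the gaps are 6, 8, 1, 22, and 20, so $d_1 = 22$ and $d_5 = 1$. Case distinctions such as $\{k_6 - k_5, k_2 - k_1\} = \{d_4, d_5\}$ refer to which of these ranked gaps sit at the ends of the sorted sequence.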

References


Ahn, D.H. & B. Gao (1999) A parametric nonlinear model of term structure dynamics. Review of Financial Studies 12, 721–762.
Aït-Sahalia, Y. (1996a) Testing continuous-time models of the spot interest rate. Review of Financial Studies 9, 385–426.
Aït-Sahalia, Y. (1996b) Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560.
Aït-Sahalia, Y. (1999) Transition densities for interest rate and other nonlinear diffusions. Journal of Finance 54, 1361–1395.
Black, F. & M. Scholes (1973) The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–654.
Chapman, D. & N. Pearson (2000) Is the short rate drift actually nonlinear? Journal of Finance 55, 355–388.
Chen, S., J. Gao, & M. Li (2001) Simultaneous Specification Tests for Nonparametric Regression with Application to Diffusion Model Testing. Working paper, Department of Statistics and Applied Probability, National University of Singapore. Available at www.stat.nus.edu.sg/∼stacsx.
Cox, J., J. Ingersoll, & S. Ross (1985) An intertemporal general equilibrium model of asset prices. Econometrica 53, 363–384.
Doukhan, P. (1995) Mixing: Properties and Examples. Lecture Notes in Statistics. Springer-Verlag.
Fan, J. & I. Gijbels (1996) Local Polynomial Modelling and Its Applications. Chapman and Hall.
Fan, J. & C. Zhang (2003) A re-examination of Stanton's diffusion estimation with applications to financial model validation. Journal of the American Statistical Association 98, 118–134.
Fan, Y. & Q. Li (1998) Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification testing. Journal of Nonparametric Statistics 10, 245–271.
Gao, J. & M. King (2001) Estimation and Model Specification Testing in Nonparametric and Semiparametric Regression. Working paper, School of Mathematics and Statistics, the University of Western Australia, Australia. Available at www.maths.uwa.edu.au/∼jiti/jems.pdf.
Gao, J., H. Tong, & R. Wolff (2002) Model specification tests in nonparametric stochastic regression models. Journal of Multivariate Analysis 83, 324–359.
Hjellvik, V., Q. Yao, & D. Tjøstheim (1996) Linearity testing using local polynomial approximation. Discussion paper 60, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin, Spandauerst. 1, 10178 Berlin.
Hong, Y. & H. Li (2004) Nonparametric specification testing for continuous-time models with application to spot interest rates. Review of Financial Studies 17, forthcoming.
Horowitz, J. & V. Spokoiny (2001) An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 69, 599–632.
Jiang, G. & J. Knight (1997) A nonparametric approach to the estimation of diffusion processes with an application to a short-term interest rate model. Econometric Theory 13, 615–645.
Karlsen, H. & D. Tjøstheim (2001) Nonparametric estimation in null recurrent time series. Annals of Statistics 29, 372–416.
Masry, E. & D. Tjøstheim (1995) Nonparametric estimation and identification of nonlinear ARCH time series. Econometric Theory 11, 258–289.
Pritsker, M. (1998) Nonparametric density estimation and tests of continuous time interest rate models. Review of Financial Studies 11, 449–487.
Roussas, G. & D. Ioannides (1987) Moment inequalities for mixing sequences of random variables. Stochastic Analysis and Applications 5, 61–120.
Stanton, R. (1997) A nonparametric model of term structure dynamics and the market price of interest rate risk. Journal of Finance 52, 1973–2002.
Sundaresan, S. (2001) Continuous-time methods in finance: A review and an assessment. Journal of Finance 55, 1569–1622.
Figure: Rejection rates for the marginal density tests.