
A NONPARAMETRIC BOOTSTRAP TEST OF CONDITIONAL DISTRIBUTIONS

Published online by Cambridge University Press: 23 May 2006

Yanqin Fan (Vanderbilt University), Qi Li (Texas A&M University), and Insik Min (Kyung Hee University)

Abstract

This paper proposes a bootstrap test for the correct specification of parametric conditional distributions. It extends Zheng's test (Zheng, 2000, Econometric Theory 16, 667–691) to allow for discrete dependent variables and for mixed discrete and continuous conditioning variables. We establish the asymptotic null distribution of the test statistic with data-driven stochastic smoothing parameters. By smoothing both the discrete and continuous variables via the method of cross-validation, our test has the advantage of automatically removing irrelevant variables from the estimate of the conditional density function and, as a consequence, enjoys substantial power gains in finite samples, as confirmed by our simulation results. The simulation results also reveal that the bootstrap test successfully overcomes the size distortion problem associated with Zheng's test.

Acknowledgments: We are grateful for the insightful comments from three referees and a co-editor that greatly improved the paper. Li's research is partially supported by the Private Enterprise Research Center, Texas A&M University. Fan is grateful to the National Science Foundation for research support.

Type
Research Article
Copyright
© 2006 Cambridge University Press

1. INTRODUCTION

Currently, there exists a substantial body of work on consistent model specification testing for regression models and for unconditional distribution (density) functions; see Bierens and Ploberger (1997), Delgado and Manteiga (2001), Fan (1994, 1997, 1998), Fan and Li (1996), Hong and White (1996), Wooldridge (1992), and the references therein. In many economic applications, however, it is the distribution of one variable conditional on some other variables that is of more direct interest. The popular parametric binary and multinomial response models are two leading examples of conditional probability models. Conditional probability models are also widely used in risk management and insurance settings, where the dependent variable of interest may be the claim size (a continuous variable) and the explanatory variables usually contain a mixture of discrete and continuous variables such as sex, age, whether children are present, whether one smokes, and so forth. Moreover, in risk management analysis, one is usually interested in the entire (conditional) distribution rather than only in the conditional mean. Hence, a conditional probability model is more useful than a regression model in risk analysis. Relatively speaking, tests for conditional probability models are scarce. Zheng (2000), using kernel density estimators, proposed a consistent test for a parametric conditional density function. He showed that the limiting distribution of his test statistic is N(0,1) and that the test can detect Pitman local alternatives approaching the null distribution at the rate (nh^{q/2})^{−1/2}, where n is the sample size, h is the bandwidth, and q is the dimension of the conditioning variables. To apply Zheng's test to a given data set, one needs to choose the bandwidth, but no guidance is provided on how this should be done. Moreover, the requirement that both the dependent variable y and the conditioning variables x be continuous severely limits the scope of application of Zheng's test, as many economic data sets contain both continuous and discrete variables. Andrews (1997) proposed a conditional Kolmogorov (CK) test for a parametric conditional distribution function. His test overcomes the difficulties associated with Zheng's test: it does not involve smoothing parameters, and it allows for both discrete and continuous variables. The critical values of Andrews' CK test are obtained via a parametric bootstrap procedure, and the test can detect Pitman-type local alternatives that approach the null model at the rate O(n^{−1/2}). Although Andrews' test can handle both continuous and discrete variables, it does not produce an estimate of the conditional density function, which is undesirable when the parametric distribution function is rejected. In addition, it does not distinguish between relevant and irrelevant explanatory variables.

A related literature is the work on dynamic integral probability transform models such as that outlined in Diebold, Gunther, and Tay (1998). Corradi and Swanson (2004) and Li and Tkacz (2004) have also proposed bootstrap-based tests for conditional distributions. The Corradi and Swanson (2004) procedure is a nonsmoothing test similar to that of Andrews (1997), and their test extends Andrews' test to the time series data setting. Li and Tkacz (2004) use kernel smoothing; however, like Zheng (2000), they only consider the case whereby both y and x are continuous variables. The conventional way of handling discrete variables when estimating a conditional density function involving both discrete and continuous explanatory variables is by the so-called frequency method in which the entire sample is first split into a number of distinct cells and the data in each cell are then used to estimate the conditional density that is a function of the remaining continuous variables. For economic data, however, it is typically the case that the number of discrete cells is comparable to or even larger than the sample size. This renders the nonparametric frequency approach infeasible. Moreover, one may not know which conditional variables should be included in a particular application and hence faces the danger of including potentially irrelevant variables in the estimate. This is unfortunate, particularly in nonparametric settings, as including irrelevant explanatory variables has serious consequences for the accuracy of the resulting estimate: the rate of convergence of the density estimator will deteriorate quickly with the number of irrelevant continuous variables (the “curse of dimensionality”), whereas the number of cells will increase quite quickly with the number of irrelevant discrete variables. Recently, Hall, Racine, and Li (2004) proposed estimating a conditional density by smoothing both the discrete and continuous variables and showed that the use of cross-validation can automatically remove irrelevant variables from the resulting estimate. This is because the cross-validation method selects bandwidths that converge to some optimal values for relevant variables but selects large values for irrelevant conditional variables, thereby effectively smoothing out the irrelevant variables from the resulting estimate.
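To see the magnitudes involved in the cell-count problem just described, consider the purely illustrative case of r binary conditioning variables, for which the frequency method must split the sample into 2^r cells:

\[
r = 10:\ 2^{10} = 1{,}024 \text{ cells}; \qquad r = 20:\ 2^{20} = 1{,}048{,}576 \text{ cells},
\]

either of which is comparable to, or far exceeds, the size of many microeconomic data sets.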

In this paper, we exploit the approach of Hall et al. (2004) to establish an alternative test for a parametric conditional density function. It is constructed based on the Zheng (2000) setup; however, it improves upon Zheng's test in a number of important ways: (i) the bandwidth is automatically chosen by cross-validation, thereby avoiding potential arbitrariness in the test's outcome due to an arbitrary choice of the bandwidth; (ii) it allows for both discrete and continuous variables; and (iii) the critical values are obtained from a parametric bootstrap procedure, which corrects the size distortions present in Zheng's approach. Although (ii) and (iii) are shared by Andrews' CK test, our test automatically produces an estimate of the conditional density function when the parametric density function is rejected by the test. More importantly, by automatically smoothing both the discrete and continuous variables via the method of cross-validation, our test has the advantage of automatically removing irrelevant variables from the resulting estimate (see Hall et al., 2004) and, as a consequence, enjoys substantial power gains in finite samples, as confirmed by our simulation results. Although our proposed test can only detect Pitman local alternatives approaching the null at rates slower than O(n^{−1/2}), it can be shown that for high-frequency alternatives, our test can detect local alternatives that approach the null at rates o(n^{−1/2}) in terms of the L_1 norm of the difference between the local alternative and the null model (e.g., Fan, 1998; Fan and Li, 2000). Hence it provides a complement to Andrews' CK test.

The remainder of this paper is organized as follows. In Section 2 we review and suggest a modified version of Zheng's test statistic. We also propose a bootstrap method for approximating the null distribution of our test. Section 3 reports Monte Carlo simulation results that examine the finite-sample performance of the proposed test. Finally, Section 4 concludes. Proofs are presented in the Appendix.

2. THE NULL HYPOTHESIS AND THE TEST

2.1. Zheng's Test

We begin by briefly reviewing the test proposed by Zheng (2000). Suppose that the data consist of {y_i, x_i}_{i=1}^n, an independent and identically distributed (i.i.d.) sample drawn from the distribution of (y,x) with joint density function p(y,x). Let p(y|x) denote the conditional density function of y given x. We are interested in testing whether p(y|x) belongs to a particular parametric family. Let f(y|x,θ) denote a parametric conditional density function with θ a k × 1 parameter vector. The null hypothesis is given by

H_0: Pr[ p(y|x) = f(y|x,θ_0) ] = 1 for some θ_0 ∈ Θ,

where Θ is the parameter space, a compact set in R^k. The alternative hypothesis is the negation of the null:

H_1: Pr[ p(y|x) = f(y|x,θ) ] < 1 for all θ ∈ Θ.

The Kullback–Leibler information criterion (Kullback and Leibler, 1951), measuring the discrepancy between the two conditional density functions, is defined as

I(p,f) = E{ log[ p(y|x)/f(y|x,θ_0) ] }.

It is well known that I(p,f) ≥ 0 and I(p,f) = 0 if and only if p(y|x) = f(y|x,θ_0) almost everywhere (a.e.). Thus, I(p,f) serves as a proper measure for testing H_0. For technical reasons, instead of basing his test on the information measure itself, Zheng (2000) considered its first-order expansion (using log λ ≈ λ − 1 near λ = 1),

I(p,f) ≈ E{ [p(y|x) − f(y|x,θ_0)] / f(y|x,θ_0) }.    (2)

Weighting (2) by the marginal density p_1(x) of the conditioning variable x leads to the following measure:

I_1(p,f) = E{ [p(y|x) − f(y|x,θ_0)] f^{−1}(y|x,θ_0) p_1(x) }.

Zheng (2000) has shown that I_1(p,f) ≥ 0 and that the equality holds if and only if H_0 is true. (Indeed, writing the expectation out and using ∫[p(y|x) − f(y|x,θ_0)] dy = 0, one obtains I_1(p,f) = ∫ p_1(x)² {∫ [p(y|x) − f(y|x,θ_0)]² f^{−1}(y|x,θ_0) dy} dx ≥ 0, with equality if and only if p = f a.e.) Therefore, I_1(p,f) also serves as a proper measure for testing H_0. For continuous random variables y and x, Zheng (2000) proposed estimating p(y_i,x_i) by a standard kernel density estimator and estimating f(y_i|x_i,θ̂)p_1(x_i) by a smoothed density estimator f̃(y_i,x_i) given by

f̃(y_i,x_i) = [1/(n−1)] Σ_{j≠i} W_h((x_i − x_j)/h) ∫ w_{2,h_y}((y_i − y)/h_y) f(y|x_j,θ̂) dy,    (4)

where w_{2,h_y}(·) = h_y^{−1} w_2(·/h_y), w_2(·) is a (specially defined) univariate kernel function, and W_h(·) is the product kernel

W_h((x_i − x_j)/h) = Π_{s=1}^q h_s^{−1} w((x_{is} − x_{js})/h_s),

with w(·) a standard (second-order) univariate kernel, h_y and the h_s's the smoothing parameters, and θ̂ an estimator of θ_0 under the null model. The measure I_1(p,f) is then estimated by

T_{n,h}^c = [1/(n(n−1))] Σ_i Σ_{j≠i} W_h((x_i − x_j)/h) [ w_{2,h_y}((y_i − y_j)/h_y) − ∫ w_{2,h_y}((y_i − y)/h_y) f(y|x_j,θ̂) dy ] f^{−1}(y_i|x_i,θ̂).    (5)

To establish the null asymptotic distribution of T_{n,h}^c, Zheng (2000) suggested transforming the dependent variable so that it takes values in [0,1] and then choosing a special kernel function for w_2(·) with a property, stated in Zheng (2000), that controls the boundary behavior of the estimator as n → ∞. The use of the smoothed estimator (4) eliminates the bias of the kernel estimator of p(y_i,x_i) under H_0, so that the test statistic is appropriately centered for a wide range of smoothing parameter values. Under some regularity conditions, Zheng (2000) showed that the asymptotic null distribution of T_{n,h}^c is normal and provided a consistent estimator of its asymptotic variance.

2.2. Our Framework

We now extend Zheng's test to allow x to be of mixed type (containing both continuous and discrete components), where the dependent variable y can be discrete or continuous.

We first consider the case in which y is a discrete variable. In this case, we show that the smoothed estimator (4) reduces to a sample average. Thus, the resulting test statistic involves only summations and hence avoids the need for numerical integration.

Let x = (x^c, x^d), where x^c is a q × 1 vector of continuous variables and x^d is an r × 1 vector of discrete variables. We use x_{is}^c (x_{is}^d) to denote the sth component of x_i^c (x_i^d). We further assume that x_{is}^d takes values in {0,1,…,c_s − 1}; that is, it takes c_s distinct values.

In constructing the kernel density estimate, we use different kernel functions for the discrete and continuous variables. For the discrete variables x^d, we use the Aitchison and Aitken (1976) kernel: l(x_{is}^d, x_{js}^d, λ_s) = 1 − λ_s if x_{is}^d = x_{js}^d, and l(x_{is}^d, x_{js}^d, λ_s) = λ_s/(c_s − 1) if x_{is}^d ≠ x_{js}^d. Hence, the product kernel for the discrete variables is

L(x_i^d, x_j^d, λ) = Π_{s=1}^r [λ_s/(c_s − 1)]^{N_{ijs}} (1 − λ_s)^{1 − N_{ijs}},

where N_{ijs} = I(x_{is}^d ≠ x_{js}^d), in which I(·) is the usual indicator function, and λ_1,…,λ_r are the smoothing parameters for the discrete components, constrained by 0 ≤ λ_s ≤ (c_s − 1)/c_s. Note that when λ_s assumes the upper extreme value (c_s − 1)/c_s, we have l(x_{is}^d, x_{js}^d, λ_s = (c_s − 1)/c_s) ≡ 1/c_s, which is unrelated to (x_{is}^d, x_{js}^d); i.e., the sth component of x^d is completely smoothed out when λ_s = (c_s − 1)/c_s.

For the continuous components x^c, we use the standard (second-order) product kernel function discussed earlier. Therefore, for the mixed variable x = (x^c, x^d), the kernel function is defined by

K_γ(x_i, x_j) = W_h((x_i^c − x_j^c)/h) L(x_i^d, x_j^d, λ),

where γ = (h,λ) ≡ (h_1,…,h_q, λ_1,…,λ_r).
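As a concrete illustration, the following sketch evaluates K_γ(x_i, x_j) for one pair of observations. It is our own minimal implementation, not the authors' code: the Gaussian choice for w(·) and all names are illustrative assumptions.

```python
import numpy as np

def mixed_kernel(xc_i, xc_j, xd_i, xd_j, h, lam, cs):
    """K_gamma(x_i, x_j) = W_h((x_i^c - x_j^c)/h) * L(x_i^d, x_j^d, lambda).

    xc_*: length-q arrays (continuous components); xd_*: length-r integer
    arrays (discrete components); h: length-q bandwidths; lam: length-r
    smoothing parameters with 0 <= lam_s <= (c_s - 1)/c_s; cs: length-r
    array of support sizes c_s."""
    # Continuous part: prod_s h_s^{-1} w((x_is - x_js)/h_s), with w = N(0,1) pdf.
    u = (xc_i - xc_j) / h
    cont = np.prod(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h))
    # Discrete part (Aitchison-Aitken): 1 - lam_s on a match, lam_s/(c_s - 1) otherwise.
    disc = np.prod(np.where(xd_i == xd_j, 1.0 - lam, lam / (cs - 1.0)))
    return cont * disc
```

At λ_s = (c_s − 1)/c_s the discrete factor is identically 1/c_s, so the sth discrete component drops out of the estimate, which is precisely the smoothing-out mechanism described above.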

We now discuss how to estimate p(y_i,x_i) and p_1(x_i). Assume that y_i is a discrete variable; then we estimate p(y_i,x_i) and p_1(x_i) by the following leave-one-out kernel estimators:

p̂_{−i}(y_i,x_i) = [1/(n−1)] Σ_{j≠i} K_γ(x_i,x_j) I(y_j = y_i)  and  p̂_{1,−i}(x_i) = [1/(n−1)] Σ_{j≠i} K_γ(x_i,x_j).

To construct the smoothed estimator of f(y_i|x_i,θ̂), we replace W_h(·) in (4) by K_γ(x_i,x_j) and ∫ w_{2,h_y}((y_i − y)/h_y) f(y|x_j,θ̂) dy by Σ_y I(y_i = y) f(y|x_j,θ̂) = f(y_i|x_j,θ̂). Taking into account these modifications, we obtain

f̃_{−i}(y_i,x_i) = [1/(n−1)] Σ_{j≠i} K_γ(x_i,x_j) f(y_i|x_j,θ̂).

Using the estimators just introduced, we define our test statistic as

T_n = (1/n) Σ_i [ p̂_{−i}(y_i,x_i) − f̃_{−i}(y_i,x_i) ] f^{−1}(y_i|x_i,θ̂) = [1/(n(n−1))] Σ_i Σ_{j≠i} K_γ(x_i,x_j) [ I(y_j = y_i) − f(y_i|x_j,θ̂) ] f^{−1}(y_i|x_i,θ̂).

Note that the double summation in T_n does not include the j = i terms because we have used leave-one-out estimators for p(y_i,x_i) and p_1(x_i). The reason for using leave-one-out estimators is that, under H_0, the asymptotic distribution of T_n is then centered at zero (there is no center term).
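The statistic itself then requires only a double loop. The sketch below follows the form of T_n reconstructed above for a discrete y, reusing mixed_kernel from the previous sketch; f_null(y, xc, xd) stands for the fitted null conditional probability f(y|x,θ̂) and is an assumed interface, not part of the paper.

```python
def t_n_statistic(y, Xc, Xd, h, lam, cs, f_null):
    """T_n = [n(n-1)]^{-1} sum_{i != j} K_gamma(x_i, x_j)
             * [1(y_j = y_i) - f(y_i|x_j, theta_hat)] / f(y_i|x_i, theta_hat)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        f_ii = f_null(y[i], Xc[i], Xd[i])           # f(y_i | x_i, theta_hat)
        for j in range(n):
            if j == i:                               # leave-one-out: no j = i terms
                continue
            k_ij = mixed_kernel(Xc[i], Xc[j], Xd[i], Xd[j], h, lam, cs)
            f_ij = f_null(y[i], Xc[j], Xd[j])        # f(y_i | x_j, theta_hat)
            total += k_ij * (float(y[j] == y[i]) - f_ij) / f_ii
    return total / (n * (n - 1))
```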

The smoothing parameters h_1,…,h_q (corresponding to the continuous variables x^c) can be selected by several commonly used procedures, including the cross-validation method, the plug-in method, and some ad hoc methods. For λ_1,…,λ_r, however, neither a plug-in nor an ad hoc formula is available. Hall et al. (2004) have shown that using the cross-validation method to select λ_1,…,λ_r and h_1,…,h_q has some attractive properties: when x_s^c (x_s^d) is a relevant variable, cross-validation selects a small h_s (λ_s) that converges to zero at an optimal rate; when x_s^c (x_s^d) is an irrelevant variable (we say that x_s is irrelevant if p(y|x) is independent of x_s), cross-validation selects an extremely large value for h_s (the upper bound value for λ_s), so that the irrelevant variables are (asymptotically) automatically removed (smoothed out). Indeed, in the problem of nonparametric estimation of a conditional density, cross-validation comes into its own as a method with no obvious peers. Therefore, we choose λ_1,…,λ_r and h_1,…,h_q by the cross-validation method suggested in Hall et al. (2004).

Let (h,λ) = (h_1,…,h_q, λ_1,…,λ_r). Hall et al. propose choosing (h,λ) to minimize a cross-validation objective function CV(h,λ), referred to as (11) below. Hall et al. (2004) show that, up to an additive constant term that does not depend on (h,λ), CV(h,λ) is a consistent estimator of the weighted integrated squared error of the conditional density estimator. The objective function is built from a term G_i(x_i), whose form differs according to whether y is a continuous or a discrete variable, and from the leave-one-out kernel estimators p̂_{1,−i}(x_i) and p̂_{−i}(x_i,y_i) of p_1(x_i) and p(x_i,y_i), respectively, together with a weight function m(x_i^c) that is introduced to deal with the small random denominator problem; see Hall et al. (2004).

We use γ̂ = (ĥ, λ̂) = (ĥ_1,…,ĥ_q, λ̂_1,…,λ̂_r) to denote the resulting smoothing parameters. Assuming that all of the x variables are relevant, Hall et al. (2004) showed that

n^{1/(q+4)} ĥ_s → a_s^0 in probability for s = 1,…,q,  and  n^{2/(q+4)} λ̂_s → b_s^0 in probability for s = 1,…,r,

where a_s^0 > 0 and b_s^0 ≥ 0 are some finite constants.

THEOREM 2.1. Under conditions (C1)–(C3) given in the Appendix, we have under H_0

n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n / σ̂_0 → N(0,1) in distribution,

where T̂_n denotes T_n evaluated at the cross-validated smoothing parameters γ̂ = (ĥ, λ̂), and σ̂_0² is a consistent estimator of σ_0² = [∫W²(v) dv] E[(1 − f(y_i|x_i,θ_0)) f^{−1}(y_i|x_i,θ_0) p_1(x_i)], the asymptotic variance of n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n.

It can be shown that under H_1, n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n / σ̂_0 diverges to +∞. Hence, the T̂_n test is a consistent test. Moreover, the T̂_n test can detect local alternatives that approach the null at the rate O_p(n^{−1/2}(h_1 ⋯ h_q)^{−1/4}) = O_p(n^{−(1/2)(8+q)/(8+2q)}), which is slower than O_p(n^{−1/2}) (because h_j = O_p(n^{−1/(4+q)}) for all j = 1,…,q).
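The equality of the two rates is simple bandwidth algebra (using h_j ≍ n^{−1/(4+q)} for every j):

\[
n^{-1/2}(h_1\cdots h_q)^{-1/4} \asymp n^{-\frac{1}{2}+\frac{q}{4(4+q)}}
= n^{-\frac{2(4+q)-q}{4(4+q)}} = n^{-\frac{8+q}{16+4q}} = n^{-\frac{1}{2}\cdot\frac{8+q}{8+2q}}.
\]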

We now briefly discuss the case where the dependent variable y is continuous. In this case, one can still use Zheng's test statistic given in (5), but with w_{2,h_y}((y_i − y_j)/h_y) and W_h((x_i^c − x_j^c)/h) replaced by w_{2,ĥ_y}((y_i − y_j)/ĥ_y) and K_γ̂(x_i,x_j), respectively, where (ĥ_y, ĥ, λ̂) denote the cross-validation-selected smoothing parameters suggested by Hall et al. (2004); i.e., one chooses (h_y, h, λ) by minimizing (11), but now G_i(x_i) is defined using

w̄_{2,h_y}(v) = h_y^{−1} w̄_2(v/h_y), where w̄_2(v) = ∫ w_2(u) w_2(v − u) du

is the twofold convolution kernel derived from w_2(·).
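Purely as an illustration of the convolution operation (Zheng's w_2 is a specially constructed kernel and need not be Gaussian): if w_2 were the standard normal density φ, the twofold convolution would be the N(0,2) density,

\[
\bar w_2(v) = \int \phi(u)\,\phi(v-u)\,du = \frac{1}{2\sqrt{\pi}}\,\exp\!\left(-\frac{v^2}{4}\right).
\]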

With a slight abuse of notation, we continue to denote the resulting test statistic by T̂_n, where the vector of cross-validated smoothing parameters now contains the extra parameter ĥ_y because y_i is continuous.

The asymptotic distribution of T̂_n is given in the following theorem.

THEOREM 2.2. Under conditions (C1)–(C3) given in the Appendix, we have under H_0

n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n / σ̂_{0,c} → N(0,1) in distribution,

where σ̂_{0,c}² is a consistent estimator of σ_{0,c}² = 2[∫W²(v) dv] E[p_1(x_i)], the asymptotic variance of n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n.

The proof of Theorem 2.2 is similar to that of Theorem 1 in Zheng (2000) and is omitted here.

2.3. A Parametric Bootstrap Test

Theorems 2.1 and 2.2 provide, respectively, the asymptotic null distribution of

. Consequently, one can perform tests for H0 by comparing the value of

with its asymptotic critical value. However, it is well known that consistent nonparametric tests often suffer from substantial finite-sample size distortions. Our simulations reveal that the

shares this drawback. To overcome this problem, we propose a bootstrap procedure to more accurately approximate the finite-sample null distribution of

. It involves the following steps.

Step (i). Generate the ith bootstrap value of the dependent variable y from the parametric conditional distribution f(·|x_i, θ̂) implied by the null model. Denote this value by y_i* (i = 1,…,n). We obtain the complete bootstrap sample {x_i, y_i*}_{i=1}^n.

Step (ii). Based on the parametric null model, estimate θ using the bootstrap sample, and let θ̂* denote the resulting estimator. Compute the bootstrap statistic T̂_n* in the same way as T̂_n except that {y_i, θ̂} are replaced by {y_i*, θ̂*}, respectively. Note that we use the same cross-validation-selected smoothing parameters γ̂ = (ĥ, λ̂) in computing the bootstrap statistics; there is no re-cross-validation in computing T̂_n*.

Step (iii). Repeat steps (i) and (ii) a large number of times, say, B times, and use the empirical distribution of the B bootstrap statistics {T̂_{n,b}*}_{b=1}^B to approximate the null distribution of T̂_n.

Step (iv). The bootstrap test rejects H_0 at significance level α if T̂_n exceeds the empirical upper-α quantile (the (1 − α)th quantile) of the B bootstrap statistics.
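The procedure is mechanical; the sketch below spells it out for a binary y under a probit-type null model (matching the Monte Carlo design of Section 3.1). The helpers fit_null (ML estimation of θ, with X including an intercept column) and statistic (the test statistic computed at the fixed cross-validated (ĥ, λ̂)) are assumed interfaces, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def parametric_bootstrap_pvalue(y, X, fit_null, statistic, B=1000, seed=0):
    """Steps (i)-(iv): draw y* from f(.|x, theta_hat), re-estimate theta on
    each bootstrap sample, recompute the statistic with the SAME smoothing
    parameters (no re-cross-validation), and compare with the observed value."""
    rng = np.random.default_rng(seed)
    theta_hat = fit_null(y, X)                      # ML estimate under the null model
    t_obs = statistic(y, X, theta_hat)
    t_boot = np.empty(B)
    for b in range(B):
        p = norm.cdf(X @ theta_hat)                 # Step (i): Pr(y*=1|x) under the probit null
        y_star = rng.binomial(1, p)
        theta_star = fit_null(y_star, X)            # Step (ii): bootstrap estimate theta*
        t_boot[b] = statistic(y_star, X, theta_star)
    # Steps (iii)-(iv): reject at level alpha iff the bootstrap p-value is below alpha.
    return (1.0 + np.sum(t_boot >= t_obs)) / (B + 1.0)
```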

The following theorem justifies the asymptotic validity of the bootstrap test.

THEOREM 2.3. Assume the same conditions as in Theorem 2.1 (Theorem 2.2), except that the null hypothesis need not hold. Then

sup_{z∈R} | P*( n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n*/σ̂_0* ≤ z ) − Φ(z) | = o_p(1),    (13)

where P* denotes probability conditional on the original sample {x_i,y_i}_{i=1}^n, σ̂_0* is the bootstrap analogue of σ̂_0, and Φ(·) is the cumulative distribution function of a standard normal random variable.

The proof of Theorem 2.3 is given in the Appendix.

In words, Theorem 2.3 states that the studentized bootstrap statistic converges to N(0,1) in distribution in probability. Other authors establish the validity of bootstrap methods using the concept of convergence with probability one, whereby the left-hand side of (13) is o(1) with probability one (i.e., convergence in distribution with probability one). Here we choose the concept of convergence in distribution in probability because our test statistic involves nonparametric estimation, and it is easier to work with convergence in probability than with convergence with probability one.

Note that Theorem 2.3 holds regardless of whether the null hypothesis is true. Therefore, (i) when the null hypothesis is true, the bootstrap procedure leads to (asymptotically) correct size, because the studentized statistic n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n/σ̂_0 converges in distribution to the same N(0,1) limit under H_0; and (ii) when the null hypothesis is false, the test statistic diverges to +∞ in probability, whereas the bootstrap critical value remains asymptotically finite (say, close to the 95th percentile of the N(0,1) distribution), so the bootstrap procedure yields a consistent test.

3. MONTE CARLO SIMULATION RESULTS

In this section, we present Monte Carlo simulation results to examine the finite-sample performance of our T̂_n test.

3.1. Discrete Dependent Variable

In this simulation experiment, the dependent variable y is a {0,1} binary variable. We use slightly different notation in this section: x denotes x^c and z denotes x^d. The data generating process (DGP) for the null model is

DGPa: y_i = I(β_0 + β_1 x_i + β_2 z_i + u_i > 0),

where {x_i}_{i=1}^n is a random sample from N(0,1); z_i takes binary values {0,1}, with case (i) Pr[z_i = 1] = Pr[z_i = 0] = 1/2 and case (ii) Pr[z_i = 1] = 0.8 and Pr[z_i = 0] = 0.2; and the error term {u_i} is i.i.d. N(0,1). Moreover, x_i, z_i, and u_i are all independent of one another. The true parameters are {β_0, β_1} = {1,1}, and β_2 ∈ {1, 0.3, 0}; β_2 = 0 corresponds to the case in which z_i is in fact an irrelevant variable. This leads to the following null hypothesis:

H_0: Pr(y_i = 1 | x_i, z_i) = Φ(β_0 + β_1 x_i + β_2 z_i),

where Φ(·) is the standard normal cumulative distribution function. The parametric conditional density of the null model is estimated by the maximum likelihood (ML) method.
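Generating DGPa is straightforward; a sketch (our own, using the case (i) z-probabilities):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
b0, b1, b2 = 1.0, 1.0, 0.3                # beta_2 in {1, 0.3, 0}; 0 makes z irrelevant
x = rng.standard_normal(n)                 # x_i ~ N(0,1)
z = rng.binomial(1, 0.5, size=n)           # case (i): Pr[z=1] = 1/2
u = rng.standard_normal(n)                 # u_i ~ N(0,1), independent of (x, z)
y = (b0 + b1 * x + b2 * z + u > 0).astype(int)
# Null conditional probability: Pr[y=1|x,z] = Phi(b0 + b1*x + b2*z)
p_null = norm.cdf(b0 + b1 * x + b2 * z)
```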

The following two alternative DGPs are constructed to examine the power of the T̂_n test; the first has a nonlinear term in the index, and the second has a conditionally heteroskedastic error:

DGP1a: y_i = I(β_0 + β_1 x_i + β_2 z_i + β_3 x_i² + u_i > 0);
DGP2a: y_i = I(β_0 + β_1 x_i + β_2 z_i + u_i > 0), with the variance of u_i depending on x_i,

where x_i, z_i, and u_i are generated in the same way as before; β_0, β_1, and β_2 take the same values as previously, whereas β_3 = 1. We use the parametric bootstrap described earlier to approximate the null distribution of the test statistic T̂_n.

Our test will be compared with the CK test of Andrews (1997), whose test statistic CK_n is defined as

CK_n = max_{1≤i≤n} | n^{−1/2} Σ_{j=1}^n [ I(y_j ≤ y_i) − F(y_i | x_j, z_j, θ̂) ] I(x_j ≤ x_i) I(z_j ≤ z_i) |,

where F(·|·,·,θ) is the parametric conditional distribution function and θ̂ is the ML estimator of θ_0.
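For comparison, here is a sketch of CK_n in the form reconstructed above. F_null(y, Xrow) stands for the fitted F(y|x,z,θ̂), and the componentwise-ordering implementation is our own illustration; see Andrews (1997) for the precise definition.

```python
import numpy as np

def ck_statistic(y, X, F_null):
    """CK_n = max_i | n^{-1/2} sum_j [1(y_j <= y_i) - F(y_i|x_j, theta_hat)]
                               * 1(x_j <= x_i componentwise) |."""
    n = len(y)
    ck = 0.0
    for i in range(n):
        dominated = np.all(X <= X[i], axis=1)        # 1(x_j <= x_i), componentwise
        resid = (y <= y[i]).astype(float) - np.array([F_null(y[i], X[j]) for j in range(n)])
        ck = max(ck, abs(np.sum(resid * dominated)) / np.sqrt(n))
    return ck
```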

The sample sizes considered are n = 100 and 200, the numbers of simulations are 5,000 for size estimation and 2,000 for power estimation, and the number of bootstraps is B = 1,000 for all cases. The simulation results for discrete yi with relevant covariates only are reported in Table 1.

Table 1. DGPa: The case of discrete y_i with relevant covariates

From Table 1 we observe that for different values of β_2 (with β_2 = 1, 0.3) and different values of Pr(z_i = 1) (0.5, 0.8), the performances of the T̂_n and Andrews' tests are qualitatively the same. Overall, the estimated sizes are quite close to their nominal sizes for both tests. The power performances are mixed for the two alternative models. For the alternative DGP1a with an extra quadratic term, our T̂_n test shows higher power than Andrews' test for the sample sizes considered. However, for some cases of DGP2a with a heteroskedastic error term, Andrews' test is slightly more powerful than ours. The simulation results show that our T̂_n test complements Andrews' test.

Next we consider the case with an irrelevant covariate. We use the same DGP as before except that we now choose β_2 = 0, so that the binary discrete variable z becomes an irrelevant covariate. Because this information is unknown a priori, we still compute the conditional probability of y conditional on both x and z. In this case we expect that the cross-validation method tends to select the upper bound value λ = 1/2, so that the irrelevant covariate z is smoothed out automatically, resulting in a finite-sample power gain for the T̂_n test.

From Table 2 we observe that the power of the T̂_n test improves substantially compared with the results reported in Table 1. It is interesting to observe that for DGP2a, the power performance of the T̂_n test is now quite comparable to that of Andrews' test. Thus, the simulation results confirm that our cross-validation-based test indeed has the ability to remove irrelevant covariates and enjoys superior finite-sample power performance.

Table 2. The case of discrete y_i with irrelevant covariates

3.2. Continuous Dependent Variable

In this section we consider the case where both y and x are continuous variables, and we compare the finite-sample performance of Zheng's original test with that of our T̂_n test. The first DGP we use is the same as that in Zheng. The null model is a linear regression model with normal homoskedastic errors:

DGPb: y_i = β_0 + β_1 x_i + u_i,

where {x_i}_{i=1}^n is a random sample from N(0,1) and the error term {u_i} is i.i.d. N(0,σ²); x_i and u_i are independent of each other. The true parameters are {β_0, β_1, σ} = {1,1,1}. This leads to the following null hypothesis:

H_0: p(y|x) = σ^{−1} φ((y − β_0 − β_1 x)/σ),

where φ(·) is the standard normal density function. The parameter θ = (β_0, β_1, σ) is estimated by the ML estimation method.
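The continuous-y null model is equally easy to simulate (our sketch):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 200
b0, b1, sigma = 1.0, 1.0, 1.0
x = rng.standard_normal(n)
y = b0 + b1 * x + sigma * rng.standard_normal(n)      # DGPb
# Null conditional density: f(y|x, theta) = phi((y - b0 - b1*x)/sigma)/sigma
def f_null(yy, xx, b0=1.0, b1=1.0, sigma=1.0):
    return norm.pdf((yy - b0 - b1 * xx) / sigma) / sigma
```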

Two alternative models are considered: the first (DGP1b) is designed to detect misspecification of the regression function, and the second (DGP2b) is designed to detect heteroskedasticity of the error term; the coefficient β_2 on the additional component is set to 1 in the experiment. We also report Andrews' test for comparison purposes. The simulation results are reported in Table 3a.

Table 3a. DGPb: The case of continuous y_i

We observe from Table 3a that the parametric bootstrap method successfully overcomes the size distortion of Zheng's test. The estimated sizes of the bootstrap test are all close to their nominal values, whereas Zheng's test based on the asymptotic normal approximation is significantly undersized. For the alternatives DGP1b and DGP2b, we observe that the bootstrap T̂_n test is much more powerful than Zheng's test. There are two reasons for this. First, the bootstrap test corrects the undersize problem of Zheng's test and hence improves finite-sample power. Second, we use the data-driven cross-validation method to select the smoothing parameters, which leads to optimal smoothing in estimating the unknown conditional density functions, whereas Zheng suggested selecting the smoothing parameters by ad hoc methods; the use of optimal smoothing also enhances the finite-sample power of the test. For DGP1b, Andrews' test has power similar to that of the T̂_n test, whereas for DGP2b, Andrews' test is less powerful than the T̂_n test.

Finally, we consider a case in which there is an irrelevant continuous variable. We use basically the same setup as in DGPb except that we now set β_2 = 0, so that x_{2i} becomes an irrelevant variable. This information is not used in the estimation; that is, all estimation methods still use the full data set {y_i, x_{1i}, x_{2i}}_{i=1}^n. Because our cross-validation-based T̂_n test has the advantage of (asymptotically) removing the irrelevant variable x_2, we expect the T̂_n test to enjoy further power gains. The simulation results are reported in Table 3b.

From Table 3b we observe that the T̂_n test has good estimated sizes. Zheng's test remains undersized at the 5% and 10% levels, and Andrews' test is also somewhat undersized when an irrelevant variable is present. From the estimated power results, we see a substantial power gain of the T̂_n test over Zheng's test. Essentially, Zheng's test is based on a two-dimensional nonparametric conditional density estimate, because the smoothing parameters in Zheng's test are selected by ad hoc rules that cannot detect the irrelevant variable x_2, whereas our T̂_n test estimates, asymptotically, a one-dimensional conditional density function, because x_{2i} is smoothed out asymptotically. The T̂_n test is also more powerful than Andrews' test for this DGP (when there is an irrelevant continuous variable). Of course, we report only limited simulation results here; based on the local power analysis, we expect that there exist data generating processes for which Andrews' test will be more powerful than the T̂_n test. Our simulation results show that the T̂_n test can serve as a useful complement to Andrews' test when one is interested in testing a parametric conditional distribution.

4. CONCLUSIONS

This paper proposes a kernel-based bootstrap test for parametric conditional distribution functions. We separately consider the case where y is a discrete variable and where y is a continuous variable. In either case, the conditional variables can contain both discrete and continuous variables. By automatically smoothing both the discrete and continuous variables via the method of cross-validation, our test has the advantage of automatically removing irrelevant variables from the estimate of the conditional density function and, as a consequence, enjoys substantial power gains in finite-sample applications, as confirmed by our simulation results. The test is potentially applicable in a wide variety of applications and should prove useful to applied researchers.

APPENDIX

We first state conditions that are used to prove Theorem 2.1.

(C1) {y_i, x_i}_{i=1}^n are i.i.d. data with joint density p(y,x). The first-order derivatives of p(·,·) with respect to its continuous arguments are uniformly bounded. The marginal density p_1(x) of x_i and its first-order derivatives with respect to the continuous arguments are uniformly bounded.

(C2) (i) The parameter space Θ is a compact and convex subset of R^k. Let ∥·∥ denote the Euclidean norm. Then f(y|x,θ_0)^{−1}, ∥∂f(y|x,θ)/∂θ∥, ∥∂² log f(y|x,θ)/∂θ∂θ′∥, and ∥[∂ log f(y|x,θ)/∂θ][∂ log f(y|x,θ)/∂θ′]∥ are all bounded by a nonnegative function b(x,y) with ∫ b(x,y)^s < ∞ (s = 1,2), where ∫ denotes integration over the continuous variables and summation over the discrete variables. (ii) √n(θ̂ − θ_0) = O_p(1) under H_0.

(C3) w(·) is a nonnegative, bounded, symmetric function with ∫ w(v) dv = 1 and ∫ w(v) v² dv = c < ∞.

The preceding conditions are basically the same as those used in Zheng (2000).

We give the central limit theorem (CLT) of Hall (1984, Thm. 3.1) for degenerate U-statistics as a lemma here.

LEMMA A.1. Let U_n = [2/(n(n−1))] Σ_i Σ_{j>i} H_n(z_i,z_j) be a second-order U-statistic, where {z_i}_{i=1}^n is i.i.d. Suppose that E[H_n(z_i,z_j)|z_i] = 0 (for i ≠ j, U_n is a degenerate U-statistic) and define G_n(z_1,z_2) = E[H_n(z_3,z_1)H_n(z_3,z_2)|z_1,z_2]. If

{ E[G_n²(z_1,z_2)] + n^{−1} E[H_n⁴(z_1,z_2)] } / { E[H_n²(z_1,z_2)] }² → 0,    (A.1)

then

n E[H_n²(z_1,z_2)]^{−1/2} U_n / √2 → N(0,1) in distribution.

In the proof presented subsequently, we replace (ĥ_1,…,ĥ_q, λ̂_1,…,λ̂_r) by their nonstochastic leading terms: (h_1,…,h_q) = (a_1^0 n^{−1/(q+4)},…, a_q^0 n^{−1/(q+4)}) and (λ_1,…,λ_r) = (b_1^0 n^{−2/(q+4)},…, b_r^0 n^{−2/(q+4)}). This greatly simplifies the arguments in the proof. By the stochastic equicontinuity result of Ichimura (2000) (see Lemma A.4, which follows), we know that the conclusion continues to hold provided that n^{1/(q+4)} ĥ_s − a_s^0 = o_p(1) (s = 1,…,q) and n^{2/(q+4)} λ̂_s − b_s^0 = o_p(1) (s = 1,…,r), which hold by Theorem 3.1 of Hall et al. (2004).

Using the shorthand notation K_{γ,ij} = K_γ(x_i,x_j), f_i = f(y_i|x_i,θ_0), f̂_i = f(y_i|x_i,θ̂), f_{ij} = f(y_i|x_j,θ_0), and f̂_{ij} = f(y_i|x_j,θ̂), together with the identity

f̂_i^{−1} = f_i^{−1} − (f̂_i − f_i)/(f_i f̂_i),

we can write T_n = T_{n1} + T_{n2} + T_{n3}, where T_{n1} is the leading term, T_{n2} is linear in (f̂_i − f_i), and T_{n3} is quadratic in (f̂_i − f_i).

Let θ̄ denote a point on the line segment between θ̂ and θ_0. Expanding f̂_{ij} around θ_0 by a Taylor expansion, we have T_{n1} = T_{n1,1} + (θ̂ − θ_0)′T_{n1,2} + (θ̂ − θ_0)′T_{n1,3}(θ̄)(θ̂ − θ_0), where the definitions of T_{n1,j} (j = 1,2,3) should be apparent.

The term T_{n1,1} can be written as a second-order U-statistic (z_i = (x_i,y_i)):

T_{n1,1} = [2/(n(n−1))] Σ_i Σ_{j>i} H_n(z_i,z_j),

where H_n(z_i,z_j) = (1/2){ K_{γ,ij}[I(y_j = y_i) − f_{ij}] f_i^{−1} + K_{γ,ij}[I(y_i = y_j) − f_{ji}] f_j^{−1} }. It is easy to check that E[H_n(z_i,z_j)|z_i] = 0 and, similarly, E[H_n(z_i,z_j)|z_j] = 0. Thus, E[H_n(z_i,z_j)|z_i] = 0 and T_{n1,1} is a degenerate U-statistic.

Next we evaluate E[H_n²(z_i,z_j)]. Using l(x_{is}^d, x_{js}^d, λ_s) = λ_s/(c_s − 1) when x_{is}^d ≠ x_{js}^d, together with a standard change-of-variable argument for the continuous components, one obtains the exact order

E[H_n²(z_i,z_j)] ≍ (h_1 ⋯ h_q)^{−1}.    (A.3)

Equation (A.3) implies that {E[H_n²(z_i,z_j)]}^{−1} = O(h_1 ⋯ h_q). Similarly, one can show that E[H_n⁴(z_i,z_j)] = O((h_1 ⋯ h_q)^{−3}). Define G_n(z_1,z_2) = E[H_n(z_3,z_2)H_n(z_3,z_1)|z_1,z_2]. One can show that E[G_n²(z_i,z_j)] = O((h_1 ⋯ h_q)^{−1}). Thus, the left-hand side of (A.1) becomes

{ O((h_1 ⋯ h_q)^{−1}) + n^{−1} O((h_1 ⋯ h_q)^{−3}) } / O((h_1 ⋯ h_q)^{−2}) = O(h_1 ⋯ h_q) + O(n^{−1}(h_1 ⋯ h_q)^{−1}) = o(1).

Thus, by Lemma A.1, we know that

n(h_1 ⋯ h_q)^{1/2} T_{n1,1}/σ_0 → N(0,1) in distribution.

Define T̃_{n1,1} in the same way as T_{n1,1} except that θ_0 is replaced by θ̂. Applying Lemma 3.1 of Powell, Stock, and Stoker (1989) or Lemma 1 of Zheng (2000), it is straightforward to show that n(h_1 ⋯ h_q)^{1/2}(T̃_{n1,1} − T_{n1,1}) = o_p(1). Thus, we have

n(h_1 ⋯ h_q)^{1/2} T̃_{n1,1}/σ_0 → N(0,1) in distribution.    (A.6)

Applying a Taylor expansion to T_{n2}, i.e., expanding f̂_i around θ_0, we obtain

T_{n2} = (θ̂ − θ_0)′T_{n2,1} + (θ̂ − θ_0)′T_{n2,2}(θ̄)(θ̂ − θ_0),

where T_{n2,1} is the term linear in the score and T_{n2,2} is the second-order coefficient evaluated at θ̄.

Lemma A.2, which follows, shows that T_{n1,2} = O_p(n^{−1/2}) and T_{n2,1} = O_p(n^{−1/2}), and Lemma A.3 shows that T_{n1,3} = O_p(1), T_{n2,2} = O_p(1), and T_{n3} = O_p(n^{−1}). These results, together with √n(θ̂ − θ_0) = O_p(1), lead to

n(h_1 ⋯ h_q)^{1/2}(T_n − T_{n1,1}) = o_p(1).    (A.8)

Expressions (A.6) and (A.8) together complete the proof of Theorem 2.1. █

LEMMA A.2.

(i) T_{n1,2} = O_p(n^{−1/2}).

(ii) T_{n2,1} = O_p(n^{−1/2}).

Proof of (i). First note that E[T_{n1,2}] = 0 because E[K_{γ,ij} f_{ij}^{(1)}/f_i] = 0 under H_0. Hence, E{[T_{n1,2}]²} is a sum (over the indices i, j, i′, j′) of terms of the form E[K_{γ,ij} K_{γ,i′j′}(f_{ij}^{(1)}/f_i)′(f_{i′j′}^{(1)}/f_{i′})], normalized by [n(n−1)]^{−2}. Such a term is zero if i, j, i′, j′ all take different values (because E[K_{γ,ij} f_{ij}^{(1)}/f_i] = 0). Therefore, for E{[T_{n1,2}]²} to be nonzero, we must have either (i) i, j, i′, j′ taking three different values or (ii) i, j, i′, j′ taking two different values. For these two cases it is easy to show that the corresponding contributions are at most O(n^{−1}). Hence, E{[T_{n1,2}]²} = O(n^{−1}), and consequently, T_{n1,2} = O_p(n^{−1/2}). █

Proof of (ii). One can write T_{n2,1} = [2/(n(n−1))] Σ_i Σ_{j>i} V_{1n}(z_i,z_j) as a second-order U-statistic, where V_{1n}(z_i,z_j) = (1/2)[A_{1n}(z_i,z_j) + A_{1n}(z_j,z_i)] and A_{1n}(z_i,z_j) = A_{1n,1}(z_i,z_j) − A_{1n,2}(z_i,z_j) collects the two terms arising from the expansion, with E[A_{1n,1}(z_i,z_j)] = E[A_{1n,2}(z_i,z_j)]. Hence, E[A_{1n}(z_i,z_j)] = E[A_{1n,1}(z_i,z_j)] − E[A_{1n,2}(z_i,z_j)] = 0.

Projecting V_{1n} onto z_i gives E[V_{1n}(z_i,z_j)|z_i] = v_{1i} + (s.o.), where, in the preceding expression, A_i = B_i + (s.o.) means that Σ_i A_i = Σ_i B_i + o_p(Σ_i B_i), i.e., Σ_i B_i is the leading term of Σ_i A_i. Here v_{1i} = (1/2)p_1(x_i)[f_i^{(1)} − E(f_i^{(1)}|x_i)], and we have used E[f_{ji}^{(1)}|x_i] = E[f_i^{(1)}|x_i] = Σ_y f(y|x_i)². Using the H-decomposition of the U-statistic around this projection, we obtain T_{n2,1} = O_p(n^{−1/2}). █

LEMMA A.3.

(i) T_{n1,3} = O_p(1).

(ii) T_{n2,2} = O_p(1).

(iii) T_{n3} = O_p(n^{−1}).

Proof of (i). Recall that T_{n1,3} involves the second-order derivative f^{(2)} evaluated at θ̄. By assumption (C2) (b(·,·) is the bounding function for f^{(2)}(·)),

E[∥T_{n1,3}∥] ≤ E[K_{γ,ij} b(x_j,y_i) f_i^{−1}] = O(1),

which implies that T_{n1,3} = O_p(1). █

Proof of (ii). It is similar to the proof of (i) and is thus omitted here.

Proof of (iii). Using the identity f̂_i^{−1} = f_i^{−1} − (f̂_i − f_i)/(f_i f̂_i) together with a Taylor expansion of f̂_i around θ_0, where θ̄ lies on the line segment between θ̂ and θ_0, we can write T_{n3} = (θ̂ − θ_0)′T_{n3,1}(θ̂ − θ_0), where T_{n3,1} has leading term T_{n3,1,0}. It is easy to show that E[∥T_{n3,1,0}∥] = O(1). Hence, T_{n3,1,0} = O_p(1), which implies that T_{n3,1} = O_p(1) and T_{n3} = O_p(n^{−1}) because ∥θ̂ − θ_0∥² = O_p(n^{−1}). █

LEMMA A.4 (Ichimura). n(h_1 ⋯ h_q)^{1/2}(T_{n,γ̂} − T_{n,γ}) = o_p(1),

where γ = (h_1,…,h_q, λ_1,…,λ_r) with h_s = a_s^0 n^{−1/(q+4)} (s = 1,…,q) and λ_s = b_s^0 n^{−2/(q+4)} (s = 1,…,r), and a_s^0 > 0 and b_s^0 ≥ 0 are the uniquely defined constants given in Hall et al. (2004).

Ichimura (2000) has proved a general result that includes Lemma A.4 as a special case. Here, we provide an alternative proof for Lemma A.4 using a simple tightness argument (e.g., Mammen, 1992). Our proof consists of two parts: (i) n(h_1 ⋯ h_q)^{1/2}(T_{n,γ̂} − T_{n,γ}) = o_p(1) under H_0; (ii) the analogous result for the studentizing quantity. Because the proofs are similar, we only provide the proof for (i).

Proof of (i). Write ĉ = (n^{1/(q+4)} ĥ_1,…, n^{1/(q+4)} ĥ_q, n^{2/(q+4)} λ̂_1,…, n^{2/(q+4)} λ̂_r) and c^0 = (a_1^0,…,a_q^0, b_1^0,…,b_r^0). By Theorem 3.1 of Hall et al. (2004), we know that ĉ → c^0 (in probability). This implies that ĉ ∈ C in probability, where C = Π_{s=1}^q [a_{1s}, a_{2s}] × Π_{t=1}^r [b_{1t}, b_{2t}], in which a_{js} and b_{jt} (j = 1,2) are some positive constants with a_{1s} < a_s^0 < a_{2s} (s = 1,…,q) and b_{1t} < b_t^0 < b_{2t} (t = 1,…,r). Then Lemma A.5, which follows, shows that A_n(c) ≡ n(h_1 ⋯ h_q)^{1/2} T_n (with h_s = a_s n^{−1/(q+4)} and λ_s = b_s n^{−2/(q+4)}) is tight in C.

Define B_n(c) = A_n(c) − A_n(c^0). Then (i) becomes B_n(ĉ) = o_p(1); i.e., we want to show that, for all ε > 0,

lim_{n→∞} P( |B_n(ĉ)| > ε ) = 0.    (A.9)

For any δ > 0, denote the δ-ball centered at c^0 by C_δ = {c : ∥c − c^0∥ ≤ δ}, where ∥·∥ denotes the Euclidean norm of a vector. By Lemma A.5 we know that A_n(·) is tight. By the Arzelà–Ascoli theorem (see Billingsley, 1968, Thm. 8.2, p. 55), tightness implies the following stochastic equicontinuity condition: for all ε > 0 and η_1 > 0, there exist a δ (0 < δ < 1) and an N_1 such that

P( sup_{c∈C_δ} |A_n(c) − A_n(c^0)| > ε ) < η_1    (A.10)

for all n ≥ N_1. Expression (A.10) implies that

P( sup_{c∈C_δ} |B_n(c)| > ε ) < η_1    (A.11)

for all n ≥ N_1. Also, from ĉ → c^0 in probability, we know that for all η_2 > 0, and for the δ given previously, there exists an N_2 such that

P( ∥ĉ − c^0∥ > δ ) < η_2    (A.12)

for all n ≥ N_2. Therefore,

P( |B_n(ĉ)| > ε ) ≤ P( sup_{c∈C_δ} |B_n(c)| > ε ) + P( ĉ ∉ C_δ ) < η_1 + η_2    (A.13)

for all n ≥ max{N_1, N_2}, by (A.11) and (A.12), where we have also used the fact that the event {|B_n(ĉ)| > ε, ĉ ∈ C_δ} is a subset of {sup_{c∈C_δ} |B_n(c)| > ε}.

Equation (A.13) is equivalent to (A.9). This completes the proof of (i). █

LEMMA A.5. Let A_n(c) = n(h_1 ⋯ h_q)^{1/2} T_{n,γ}, where γ = (h,λ), h_s = a_s n^{−1/(q+4)}, λ_s = b_s n^{−2/(q+4)}, and c = (a_1,…,a_q, b_1,…,b_r) with c_s ∈ [C_{1s}, C_{2s}] and 0 < C_{1s} < C_{2s} < ∞ (s = 1,…,q+r).

Then the stochastic process A_n(c) indexed by c is tight under the sup-norm.

Proof. Write K_{γ,ij} = (h_1 ⋯ h_q)^{−1} K_{c,ij} with h_s = a_s n^{−1/(q+4)} and λ_s = b_s n^{−2/(q+4)}, where K_{c,ij} = W((x_j^c − x_i^c)/h) L(x_j^d, x_i^d, λ). Also denote δ = q/(4+q), C_1 = (a_1,…,a_q)′, and C_2 = (b_1,…,b_r)′. Then we have (h_1 ⋯ h_q)^{−1} K_{c,ij} = (Π_s a_s)^{−1} n^δ W_{C_1,ij} L_{C_2,ij}. A direct moment calculation then gives

E| n^δ W_{C_1′,ij} L_{C_2′,ij} − n^δ W_{C_1,ij} L_{C_2,ij} |² ≤ D_1 n^δ ∥c′ − c∥²,    (A.14)

where D_1 > 0 is a finite constant. In the last equality we used |L_{C_2,ij}| ≤ 1 and assumption (C3); we also replaced one of the (C_1′)^{−1/2} factors by C_1^{−1/2}, which is justified because the a_s ∈ [C_{1s}, C_{2s}] are all bounded from above and away from zero, so the difference can be absorbed into D_1.

By noting that A_n(c′) − A_n(c) is a degenerate U-statistic and using (A.14), we have

E|A_n(c′) − A_n(c)|² ≍ D ∥c′ − c∥²,

where A ≍ B means that A and B have the same order of magnitude and D is a finite positive constant. Therefore, A_n(·) (and hence B_n(·)) is tight by Theorem 3.1 of Ossiander (1987). █

Proof of Theorem 2.3. We provide a proof for the discrete dependent variable case; the continuous case is similar. To prove (13), similar to the decomposition of T_n, we decompose T_n* as T_n* = T_{n1}* + T_{n2}* + T_{n3}*, where the definitions of the T_{nj}* are similar to those of the T_{nj} with the proper changes; i.e., {y_i, θ̂} need to be changed to {y_i*, θ̂*}. We further decompose T_{n1}* = T_{n1,1}* + T_{n1,2}* + T_{n1,3}*, where the definitions of the T_{n1,j}* are similar to those of the T_{n1,j} with the proper changes (j = 1,2,3).

The term T_{n1,1}* can be written as a second-order U-statistic (z_i* = (x_i*, y_i*) = (x_i, y_i*)):

T_{n1,1}* = [2/(n(n−1))] Σ_i Σ_{j>i} H_n*(z_i*, z_j*),

where H_n*(z_i*, z_j*) is defined in the same way as H_n(z_i,z_j) but with (y_i, y_j, θ_0) replaced by (y_i*, y_j*, θ̂). It is easy to check that E*[H_n*(z_i*, z_j*)|z_i*] = 0 and, similarly, E*[H_n*(z_i*, z_j*)|z_j*] = 0. Hence, E*[H_n*(z_i*, z_j*)|z_i*] = 0. Thus, conditional on the random sample {x_i, y_i}_{i=1}^n, T_{n1,1}* is a degenerate U-statistic.

Denote U_{n,ij}* = [2/(n(n−1))] H_n*(z_i*, z_j*) and define U_n* = Σ_i Σ_{j>i} U_{n,ij}* ≡ T_{n1,1}*. We apply the CLT of de Jong (1996) for generalized quadratic forms to derive the asymptotic distribution of U_n* | {x_i,y_i}_{i=1}^n. The reason for using de Jong's CLT instead of the one in Hall (1984) is that, in the bootstrap world, the function H_n*(z_i*, z_j*) depends on i and j, because z_i* = (x_i, y_i*). By de Jong (1996, Prop. 3.2) we know that U_n*/S_n* → N(0,1) in distribution in probability if G_I*, G_II*, and G_IV* are all o_p(S_n*⁴), where S_n*² = E*[U_n*²], G_I* = Σ_i Σ_{j>i} E*[U_{n,ij}*⁴], G_II* = Σ_i Σ_{j>i} Σ_{l>j>i} [E*(U_{n,ij}*² U_{n,il}*²) + E*(U_{n,ji}*² U_{n,jl}*²) + E*(U_{n,li}*² U_{n,lj}*²)], and G_IV* = (1/2) Σ_i Σ_{j>i} Σ_s Σ_{t>s} E*(U_{n,is}* U_{n,js}* U_{n,it}* U_{n,jt}*).

Now,

S_n*² = E*[U_n*²] = [2/(n(n−1))] E*[H_n*(z_i*, z_j*)²].

Hence, the order of S_n*² is determined by that of E*[H_n*(z_i*, z_j*)²]. By using a proof similar to the proof of Lemma A.4, one can show that S_n*² has the same order as S̃_n*², where S̃_n*² is defined in the same way as S_n*² except that γ̂ is replaced by γ. Hence, we only need to establish the order of S̃_n*². Because the discrete regressors do not affect this order, for clarity we establish the order of S̃_n*² for the case with continuous regressors only. We have

E*[H̃_n*(z_i*, z_j*)²] ≤ C (h_1 ⋯ h_q)^{−1} O_p(1),

where C > 0 is a constant, which implies that S̃_n*² = O_p(n^{−2}(h_1 ⋯ h_q)^{−1}). Hence, S_n*² = O_p(n^{−2}(h_1 ⋯ h_q)^{−1}).

Next, consider G_I*, G_II*, and G_IV*. Similar to S_n*², one can show that each has the same order as its counterpart evaluated at γ, given that n^{1/(q+4)} ĥ_s − a_s^0 = o_p(1) and n^{2/(q+4)} λ̂_s − b_s^0 = o_p(1). From the preceding calculation it should be apparent that the probability orders of G_I*, G_II*, and G_IV* are solely determined by the factors of n and (h_1 ⋯ h_q) entering through the moments of H_n*. Therefore, tedious but straightforward calculations show that G_k*/S_n*⁴ = o_p(1) for all k = I, II, IV, and we know that

U_n*/S_n* → N(0,1) in distribution in probability.    (A.16)

Next, define Ũ_n* in the same way as U_n* but with H̃_n*(z_i*, z_j*) in place of H_n*(z_i*, z_j*), where H̃_n*(z_i*, z_j*) is defined in the same way as H_n*(z_i*, z_j*) except that θ̂* is replaced by θ̂. Similar to the analysis of S_n*², one can show that U_n* − Ũ_n* is asymptotically negligible and that σ̂_0*² is a consistent estimator (in the bootstrap sense) of the asymptotic variance. These results, together with (A.16), lead to

n(h_1 ⋯ h_q)^{1/2} T_{n1,1}*/σ̂_0* → N(0,1) in distribution in probability.

The analysis of T_{n1,2}*, T_{n1,3}*, T_{n2}*, and T_{n3}* is similar to that of their counterparts in the proof of Theorem 2.1, and one can show that T_{n1,1}* is the leading term of T_n*. For example, in Lemma A.2(i) we showed that T_{n1,2} = O_p(n^{−1/2}) by proving that E[T_{n1,2}²] = O(n^{−1}); by similar arguments one can show that E*[T_{n1,2}*²] = O_p(n^{−1}). The details are omitted here to save space. Therefore, we conclude that n(h_1 ⋯ h_q)^{1/2} T_n*/σ̂_0* has the same asymptotic distribution as that of n(h_1 ⋯ h_q)^{1/2} T_n/σ̂_0 under H_0. Hence, (13) holds. Because N(0,1) is a continuous distribution, by Polya's theorem (Bhattacharya and Rao, 1986), we obtain Theorem 2.3. █

REFERENCES

Aitchison, J. & C.G.G. Aitken (1976) Multivariate binary discrimination by the kernel method. Biometrika 63, 413–420.
Andrews, D.W.K. (1997) A conditional Kolmogorov test. Econometrica 65, 1097–1128.
Bhattacharya, R.N. & R.R. Rao (1986) Normal Approximations and Asymptotic Expansions. Krieger.
Bierens, H.J. & W. Ploberger (1997) Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151.
Billingsley, P. (1968) Convergence of Probability Measures. Wiley.
Corradi, V. & N. Swanson (2004) Bootstrap conditional distribution tests under dynamic misspecification. Journal of Econometrics, forthcoming.
de Jong, R.M. (1996) The Bierens test under data dependence. Journal of Econometrics 72, 1–32.
Delgado, M.A. & W.G. Manteiga (2001) Significance testing in nonparametric regression based on the bootstrap. Annals of Statistics 29, 1469–1507.
Diebold, F.X., T. Gunther, & A.S. Tay (1998) Evaluating density forecasts with applications to financial risk management. International Economic Review 39, 863–883.
Fan, Y. (1994) Testing the goodness-of-fit of a parametric density function by kernel method. Econometric Theory 10, 316–356.
Fan, Y. (1997) Goodness-of-fit tests for a multivariate distribution by the empirical characteristic function. Journal of Multivariate Analysis 62, 36–63.
Fan, Y. (1998) Goodness-of-fit tests based on kernel density estimators with fixed smoothing parameters. Econometric Theory 14, 604–621.
Fan, Y. & Q. Li (1996) Consistent model specification tests: Omitted variables and semiparametric functional forms. Econometrica 64, 865–890.
Fan, Y. & Q. Li (2000) Consistent model specification tests: Nonparametric versus Bierens' ICM tests. Econometric Theory 16, 1016–1041.
Hall, P. (1984) Central limit theorem for integrated square error of multivariate nonparametric density estimators. Journal of Multivariate Analysis 14, 1–16.
Hall, P., J. Racine, & Q. Li (2004) Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association 99, 1015–1026.
Hong, Y. & H. White (1996) Consistent specification testing via nonparametric series regression. Econometrica 63, 1133–1159.
Ichimura, H. (2000) Asymptotic Distribution of Non-parametric and Semiparametric Estimators with Data Dependent Smoothing Parameters. Manuscript, University College London.
Kullback, S. & R.A. Leibler (1951) On information and sufficiency. Annals of Mathematical Statistics 22, 79–86.
Li, F. & G. Tkacz (2004) A consistent bootstrap test for conditional density functions with time-dependent data. Journal of Econometrics, forthcoming.
Mammen, E. (1992) When Does Bootstrap Work? Asymptotic Results and Simulations. Springer-Verlag.
Ossiander, M. (1987) A central limit theorem under metric entropy with L2 bracketing. Annals of Probability 15, 897–919.
Powell, J.L., J.H. Stock, & T.M. Stoker (1989) Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.
Wooldridge, J. (1992) A test for functional form against nonparametric alternatives. Econometric Theory 8, 452–475.
Zheng, J.X. (2000) A consistent test of conditional parametric distributions. Econometric Theory 16, 667–691.