1. INTRODUCTION
Currently, there exists a substantial body of work on consistent model specification testing for regression models and for unconditional distribution (density) functions; see Bierens and Ploberger (1997), Delgado and Manteiga (2001), Fan (1994, 1997, 1998), Fan and Li (1996), Hong and White (1996), Wooldridge (1992), and the references therein. In many economic applications, however, it is the distribution of one variable conditional on some other variables that is of more direct interest. The popular parametric binary or multinomial response models are but two leading examples of conditional probability models. Conditional probability models are also widely deployed in risk management and insurance settings, where the dependent variable of interest may be the claim size (a continuous variable) and the explanatory variables usually contain a mixture of discrete and continuous variables such as sex, age, whether children are present, whether one smokes, and so forth. Moreover, in risk management analysis, one is usually interested in the entire (conditional) distribution rather than only in the conditional mean. Hence, a conditional probability model is more useful than a regression model in risk analysis. Relatively speaking, tests for conditional probability models are scarce. Zheng (2000), using kernel density estimators, proposed a consistent test for a parametric conditional density function. He showed that the limiting distribution of his test statistic is N(0,1) and that the test can detect Pitman local alternatives approaching the null distribution at the rate (nh^{q/2})^{-1/2}, where n is the sample size, h is the bandwidth, and q is the dimension of the conditioning variables. To apply Zheng's test to a given data set, one needs to choose the bandwidth, yet no guidance is provided on how this should be done.
Moreover, the requirement that both the dependent variable y and conditioning variables x are continuous variables severely limits the scope of application of Zheng's test, as many economic data sets contain both continuous and discrete variables. Andrews (1997) proposed a conditional Kolmogorov (CK) test for testing a parametric conditional distribution function. His test overcomes the difficulties associated with Zheng's test; it does not involve smoothing parameters and allows for both discrete and continuous variables. The critical values of the CK test of Andrews are obtained via a parametric bootstrap procedure, and the test can detect Pitman type local alternatives that approach the null model at the rate of O(n−1/2). Although Andrews' test can handle both continuous and discrete variables, it does not produce an estimate of the conditional density function, which is of course undesirable when the parametric distribution function is rejected. In addition, it does not distinguish between relevant and irrelevant explanatory variables.
A related literature is the work on dynamic integral probability transform models such as that outlined in Diebold, Gunther, and Tay (1998). Corradi and Swanson (2004) and Li and Tkacz (2004) have also proposed bootstrap-based tests for conditional distributions. The Corradi and Swanson (2004) procedure is a nonsmoothing test similar to that of Andrews (1997), and it extends Andrews' test to the time series setting. Li and Tkacz (2004) use kernel smoothing; however, like Zheng (2000), they consider only the case in which both y and x are continuous variables.

The conventional way of handling discrete variables when estimating a conditional density function involving both discrete and continuous explanatory variables is the so-called frequency method, in which the entire sample is first split into a number of distinct cells and the data in each cell are then used to estimate the conditional density as a function of the remaining continuous variables. For economic data, however, it is typically the case that the number of discrete cells is comparable to or even larger than the sample size. This renders the nonparametric frequency approach infeasible. Moreover, one may not know which conditioning variables should be included in a particular application and hence faces the danger of including potentially irrelevant variables in the estimate. This is unfortunate, particularly in nonparametric settings, as including irrelevant explanatory variables has serious consequences for the accuracy of the resulting estimate: the rate of convergence of the density estimator deteriorates quickly with the number of irrelevant continuous variables (the "curse of dimensionality"), whereas the number of cells increases quite quickly with the number of irrelevant discrete variables.
Recently, Hall, Racine, and Li (2004) proposed estimating a conditional density by smoothing both the discrete and continuous variables and showed that the use of cross-validation can automatically remove irrelevant variables from the resulting estimate. This is because the cross-validation method selects bandwidths that converge to optimal values for relevant variables but selects large values for irrelevant conditioning variables, thereby effectively smoothing out the irrelevant variables from the resulting estimate.
In this paper, we exploit the approach of Hall et al. (2004) to establish an alternative test for a parametric conditional density function. The test builds on the Zheng (2000) setup but improves upon Zheng's test in a number of important ways: (i) the bandwidth is chosen automatically by cross-validation, thereby avoiding potential arbitrariness in the test's outcome due to an arbitrary bandwidth choice; (ii) it allows for both discrete and continuous variables; and (iii) the critical values are obtained from a parametric bootstrap procedure, which corrects the size distortions present in Zheng's approach. Although (ii) and (iii) are shared by Andrews' CK test, our test automatically produces an estimate of the conditional density function when the parametric density function is rejected by the test. More importantly, by automatically smoothing both the discrete and continuous variables via cross-validation, our test automatically removes irrelevant variables from the resulting estimate (see Hall et al., 2004) and, as a consequence, enjoys substantial power gains in finite samples, as confirmed by our simulation results. Although our proposed test can detect Pitman local alternatives approaching the null only at rates slower than O(n^{-1/2}), it can be shown that for high-frequency alternatives our test can detect local alternatives that approach the null at rates o(n^{-1/2}) in terms of the L1 norm of the difference between the local alternative and the null model (e.g., Fan, 1998; Fan and Li, 2000). Hence it provides a complement to Andrews' CK test.
The remainder of this paper is organized as follows. In Section 2 we review and suggest a modified version of Zheng's test statistic. We also propose a bootstrap method for approximating the null distribution of our test. Section 3 reports Monte Carlo simulation results that examine the finite-sample performance of the proposed test. Finally, Section 4 concludes. Proofs are presented in the Appendix.
2. THE NULL HYPOTHESIS AND THE TEST
2.1. Zheng's Test
We begin by briefly reviewing the test proposed by Zheng (2000). Suppose that the data consist of an independent and identically distributed (i.i.d.) sample {yi,xi}, i = 1,…, n, drawn from the distribution of (y,x) with joint density function p(y,x). Let p(y|x) denote the conditional density function of y given x. We are interested in testing whether p(y|x) belongs to a particular parametric family. Let f(y|x,θ) denote a parametric conditional density function, with θ a k × 1 parameter vector. The null hypothesis is given by
$$ H_0:\ \Pr\{\, p(y|x) = f(y|x,\theta_0) \,\} = 1 \quad \text{for some } \theta_0 \in \Theta, $$

where Θ is the parameter space, a compact set in ℝ^k. The alternative hypothesis is the negation of the null:

$$ H_1:\ \Pr\{\, p(y|x) = f(y|x,\theta) \,\} < 1 \quad \text{for all } \theta \in \Theta. $$
The Kullback–Leibler information criterion (Kullback and Leibler, 1951), which measures the discrepancy between two conditional density functions, is defined as

$$ I(p,f) = \int\!\!\int \log\!\left[\frac{p(y|x)}{f(y|x,\theta_0)}\right] p(y,x)\, dy\, dx. \tag{1} $$
It is well known that I(p,f) ≥ 0 and I(p,f) = 0 if and only if p(y|x) = f (y|x,θ0) almost everywhere (a.e.). Thus, I(p,f) serves as a proper measure to test H0. For technical reasons, instead of basing his test on the information measure, Zheng (2000) considered its first-order expansion,
$$ \int\!\!\int \frac{p(y|x) - f(y|x,\theta_0)}{f(y|x,\theta_0)}\; p(y,x)\, dy\, dx. \tag{2} $$
Weighting (2) by the marginal density p1(x) of the conditioning variable x leads to the following measure:

$$ I_1(p,f) = \int\!\!\int \frac{p(y|x) - f(y|x,\theta_0)}{f(y|x,\theta_0)}\; p(y,x)\, p_1(x)\, dy\, dx. \tag{3} $$
Zheng (2000) has shown that I1(p,f) ≥ 0, with equality holding if and only if H0 is true. Therefore, I1(p,f) also serves as a proper measure for testing H0. For continuous random variables y and x, Zheng (2000) proposed estimating p(yi,xi) by a standard kernel density estimator and estimating f(yi|xi,θ0)p1(xi) by the smoothed density estimator f̄(yi,xi,θ̂) given by

$$ \bar f(y_i, x_i, \hat\theta) = \frac{1}{n}\sum_{j=1}^{n} W_h\!\left(\frac{x_i - x_j}{h}\right) \int w_{2,h_y}\!\left(\frac{y_i - y}{h_y}\right) f(y|x_j,\hat\theta)\, dy, \tag{4} $$

where w2,hy(·) = hy^{-1}w2(·), w2(·) is a (specially defined) univariate kernel function, Wh(·) is the product kernel

$$ W_h\!\left(\frac{x_i - x_j}{h}\right) = \prod_{s=1}^{q} \frac{1}{h_s}\, w\!\left(\frac{x_{is} - x_{js}}{h_s}\right), $$

with w(·) being a standard (second-order) univariate kernel, hy and the hs's the smoothing parameters, and θ̂ an estimator of θ0 under the null model. The measure I1(p,f) is then estimated by

$$ T^{c}_{n,h} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat p(y_i, x_i) - \bar f(y_i, x_i, \hat\theta)}{f(y_i|x_i,\hat\theta)}, \tag{5} $$

where p̂(yi,xi) is the kernel estimator of p(yi,xi).
To establish the asymptotic null distribution of Tn,hc, Zheng (2000) suggested transforming the dependent variable so that it takes values in [0,1] and then choosing a special kernel function for w2(·) with the property that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm007.gif?pub-status=live)
as n → ∞. The use of the smoothed estimator f̄(yi,xi,θ̂) eliminates the bias of the kernel estimator of p(yi,xi) under H0, so that the test statistic is appropriately centered for a wide range of smoothing parameter values. Under some regularity conditions, Zheng (2000) showed that the asymptotic null distribution of Tn,hc is normal and provided a consistent estimator of its asymptotic variance.
2.2. Our Framework
We now extend Zheng's test to allow for both continuous and discrete explanatory variables (x is of mixed type); the dependent variable y itself can be either discrete or continuous.
We first consider the case in which y is a discrete variable. In this case, we show that the smoothed estimator f̄−i(yi,xi,θ̂) reduces to an average estimator. Thus, the resulting test statistic involves only summations and hence avoids the need for numerical integration.
Let x = (xc,xd), where xc is a q × 1 continuous variable and xd is an r × 1 discrete variable. We use xisc (xisd) to denote the sth component of xic (xid). We further assume that xisd takes values in {0,1,…, cs − 1} (i.e., it takes cs different values).
In constructing the kernel density estimate, we use different kernel functions for the discrete and continuous variables. For the discrete variable xd, we use the Aitchison and Aitken (1976) kernel: l(xisd,xjsd,λs) = 1 − λs if xisd = xjsd, and l(xisd,xjsd,λs) = λs /(cs − 1) if xisd ≠ xjsd. Hence, the product kernel for the discrete variable is

$$ L(x_i^{d}, x_j^{d}, \lambda) = \prod_{s=1}^{r} (1-\lambda_s)^{\,1-N_{ijs}} \left(\frac{\lambda_s}{c_s-1}\right)^{\!N_{ijs}}, $$

where Nijs = I(xisd ≠ xjsd), in which I(·) is the usual indicator function, and λ1,…,λr are the smoothing parameters for the discrete components, constrained by 0 ≤ λs ≤ (cs − 1)/cs. Note that when λs assumes the upper extreme value (cs − 1)/cs, l(xisd,xjsd,λs = (cs − 1)/cs) ≡ 1/cs becomes unrelated to (xisd,xjsd); i.e., the sth component of xd is completely smoothed out when λs = (cs − 1)/cs.
For the continuous component xc, we still use the standard (second-order) product kernel function as discussed earlier. Therefore, for the mixed-type variable x = (xc,xd), the kernel function is defined by

$$ K_{\gamma}(x_i,x_j) = \left[\prod_{s=1}^{q} \frac{1}{h_s}\, w\!\left(\frac{x_{is}^{c} - x_{js}^{c}}{h_s}\right)\right] L(x_i^{d}, x_j^{d}, \lambda), \tag{6} $$

where γ = (h,λ) ≡ (h1,…, hq,λ1,…, λr).
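For concreteness, the mixed-data product kernel just described can be sketched in Python; the Gaussian choice for w(·) and all function names are our own illustration, not the paper's notation:

```python
import numpy as np

def aitchison_aitken(xd_i, xd_j, lam, c):
    """Aitchison-Aitken kernel, one value per discrete component:
    1 - lambda_s if the categories match, lambda_s/(c_s - 1) otherwise."""
    return np.where(np.asarray(xd_i) == np.asarray(xd_j),
                    1.0 - lam, lam / (c - 1.0))

def mixed_kernel(xc_i, xc_j, xd_i, xd_j, h, lam, c):
    """K_gamma(x_i, x_j): Gaussian product kernel over the q continuous
    components times the Aitchison-Aitken product kernel over the r
    discrete components."""
    u = (np.asarray(xc_i) - np.asarray(xc_j)) / h
    cont = np.prod(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h))
    disc = np.prod(aitchison_aitken(xd_i, xd_j, lam, c))
    return cont * disc
```

At the upper extreme λs = (cs − 1)/cs, the discrete factor equals 1/cs whether or not the categories match, so the sth discrete component is smoothed out, exactly as described above.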
We now discuss how to estimate p(yi,xi) and p1(xi). Assume that yi is a discrete variable; then we estimate p(yi,xi) and p1(xi) by the following leave-one-out kernel estimators:

$$ \hat p_{-i}(y_i,x_i) = \frac{1}{n-1}\sum_{j\neq i} K_{\gamma}(x_i,x_j)\, I(y_j = y_i), \tag{7} $$

$$ \hat p_{1,-i}(x_i) = \frac{1}{n-1}\sum_{j\neq i} K_{\gamma}(x_i,x_j). \tag{8} $$

To construct the smoothed estimator of f(yi|xi,θ0)p1(xi), we replace Wh(·) in (4) by Kγ(xi,xj) and the integral ∫w2,hy((yi − y)/hy) f(y|xj,θ̂) dy by the sum Σy I(yi = y) f(y|xj,θ̂) = f(yi|xj,θ̂). Taking into account these modifications, we obtain

$$ \bar f_{-i}(y_i,x_i,\hat\theta) = \frac{1}{n-1}\sum_{j\neq i} K_{\gamma}(x_i,x_j)\, f(y_i|x_j,\hat\theta). \tag{9} $$
Using p̂−i(yi,xi), p̂1,−i(xi), and f̄−i(yi,xi,θ̂) just introduced, we define our test statistic as

$$ T_{n,\gamma} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat p_{-i}(y_i,x_i) - \bar f_{-i}(y_i,x_i,\hat\theta)}{f(y_i|x_i,\hat\theta)} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i} \frac{K_{\gamma}(x_i,x_j)\,[\,I(y_j = y_i) - f(y_i|x_j,\hat\theta)\,]}{f(y_i|x_i,\hat\theta)}. \tag{10} $$

Note that the double summation in Tn,γ does not include the j = i terms because we have used the leave-one-out estimators for estimating p(yi,xi) and p1(xi). The reason for using these leave-one-out estimators is that, under H0, the asymptotic distribution of Tn,γ is then centered at zero (there is no center term).
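A statistic of this type can be computed with O(n²) operations once the kernel matrix is available. The sketch below is our own and assumes the leave-one-out form Tn,γ = [n(n − 1)]^{-1} Σi Σ_{j≠i} Kγ(xi,xj)[I(yj = yi) − f(yi|xj,θ̂)]/f(yi|xi,θ̂) for discrete y:

```python
import numpy as np

def t_stat_discrete(y, K, f_hat):
    """Leave-one-out statistic for discrete y.

    y     : (n,) array of discrete outcomes
    K     : (n, n) kernel matrix K_gamma(x_i, x_j); diagonal is ignored
    f_hat : (n, n) array with f_hat[i, j] = f(y_i | x_j, theta_hat),
            so its diagonal holds f(y_i | x_i, theta_hat)
    """
    n = len(y)
    same = (y[:, None] == y[None, :]).astype(float)  # I(y_j = y_i)
    core = K * (same - f_hat)
    np.fill_diagonal(core, 0.0)                      # drop the j = i terms
    return (core.sum(axis=1) / np.diag(f_hat)).sum() / (n * (n - 1))
```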
The smoothing parameters h1,…, hq (corresponding to the continuous variable xc) can be selected by several commonly used procedures, including the cross-validation method, the plug-in method, and some ad hoc methods. However, for λ1,…, λr, a plug-in or even an ad hoc formula is not available. Hall et al. (2004) have shown that using the cross-validation method to select λ1,…, λr and h1,…, hq has some nice properties: when xsc (xsd) is a relevant variable, the cross-validation method will select a small hs (λs) that converges to zero at an optimal rate; when xsc (xsd) is an irrelevant variable,1 it will select a large hs (a λs close to its upper bound (cs − 1)/cs), thereby smoothing out the irrelevant variable from the resulting estimate.

We say that xs is an irrelevant variable if p(y|x) is independent of xs.

Let (h,λ) = (h1,…, hq,λ1,…, λr). Hall et al. propose choosing (h,λ) by minimizing the following objective function:2
Hall et al. (2004) show that, up to an additive constant term that does not depend on (h,λ), CV(h,λ) is a consistent estimator of the weighted integrated squared error ∫∫[p̂(y|x) − p(y|x)]² p1(x)m(xc) dy dx, where ∫ · dy denotes integration over y if y is a continuous variable and summation over the support of y if y is a discrete variable.
$$ CV(h,\lambda) = \frac{1}{n}\sum_{i=1}^{n} \frac{G_{-i}(x_i)}{[\hat p_{1,-i}(x_i)]^2}\, m(x_i^c) \;-\; \frac{2}{n}\sum_{i=1}^{n} \frac{\hat p_{-i}(x_i,y_i)}{\hat p_{1,-i}(x_i)}\, m(x_i^c), \tag{11} $$

where

$$ G_{-i}(x_i) = \frac{1}{(n-1)^2} \sum_{j\neq i}\sum_{k\neq i} K_{\gamma}(x_i,x_j)\, K_{\gamma}(x_i,x_k)\, I(y_j = y_k), $$

in which p̂1,−i(xi) and p̂−i(xi,yi) are the leave-one-out kernel estimators of p1(xi) and p(xi,yi) given in (8) and (7), respectively, and m(xic) is a weight function introduced to deal with the small random denominator problem; see Hall et al. (2004).
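In code, the cross-validation choice of (h,λ) can be sketched as a grid-search minimization of a leave-one-out least-squares score in the spirit of Hall et al. (2004). This is our own illustration, not the paper's implementation: it takes one continuous and one discrete covariate, sets the weight m(·) ≡ 1, and uses a Gaussian w(·):

```python
import numpy as np

def cv_score(h, lam, xc, xd, y, c=2):
    """Leave-one-out least-squares CV score for the conditional
    probability estimator (weight m(.) = 1, Gaussian continuous kernel)."""
    n = len(y)
    u = (xc[:, None] - xc[None, :]) / h
    Kc = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)
    Kd = np.where(xd[:, None] == xd[None, :], 1.0 - lam, lam / (c - 1.0))
    K = Kc * Kd
    np.fill_diagonal(K, 0.0)                          # leave one out
    same = (y[:, None] == y[None, :]).astype(float)   # I(y_j = y_k)
    p1 = K.sum(axis=1) / (n - 1)                      # hat p_{1,-i}(x_i)
    pxy = (K * same).sum(axis=1) / (n - 1)            # hat p_{-i}(x_i, y_i)
    G = np.einsum('ij,jk,ik->i', K, same, K) / (n - 1) ** 2
    return np.mean(G / p1**2 - 2.0 * pxy / p1)

def cv_select(xc, xd, y, c=2):
    """Grid search over h and over lambda in [0, (c-1)/c]."""
    grid = [(cv_score(h, lam, xc, xd, y, c), h, lam)
            for h in np.linspace(0.1, 2.0, 20)
            for lam in np.linspace(0.0, (c - 1.0) / c, 11)]
    _, h_opt, lam_opt = min(grid)
    return h_opt, lam_opt
```

When the discrete covariate is irrelevant, the selected λ tends toward its upper bound (cs − 1)/cs, smoothing that covariate out of the estimate.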
We will use γ̂ = (ĥ,λ̂) ≡ (ĥ1,…, ĥq, λ̂1,…, λ̂r) to denote the resulting smoothing parameters. Assuming that all the x variables are relevant, Hall et al. (2004) showed that ĥs/(as0 n^{-1/(4+q)}) → 1 in probability for s = 1,…, q, and λ̂s/(bs0 n^{-2/(4+q)}) → 1 in probability for s = 1,…, r, where as0 and bs0 are some finite constants.
THEOREM 2.1. Under conditions (C1)–(C3) given in the Appendix, we have under H0

$$ n(\hat h_1 \cdots \hat h_q)^{1/2}\, T_{n,\hat\gamma} \,/\, \hat\sigma_0 \ \to\ N(0,1) \ \text{in distribution}, $$

where σ̂0² is a consistent estimator of σ0² = [∫W²(v) dv]E[(1 − f(yi|xi,θ0)) f^{-1}(yi|xi,θ0) p1(xi)], the asymptotic variance of n(ĥ1 ⋯ ĥq)^{1/2}Tn,γ̂.
It can be shown that under H1, n(ĥ1 ⋯ ĥq)^{1/2}Tn,γ̂/σ̂0 diverges to +∞. Hence, the Tn,γ̂ test is a consistent test. Moreover, the Tn,γ̂ test can detect local alternatives that approach the null at a rate of Op(n^{-1/2}(h1 ⋯ hq)^{-1/4}) = Op(n^{-(1/2)((8+q)/(8+2q))}), which is slower than Op(n^{-1/2}) (because hj = Op(n^{-1/(4+q)}) for all j = 1,…, q).
We now briefly discuss the case where the dependent variable y is continuous. In this case, one can still use Zheng's test statistic given in (5), but with w2,hy((yi − yj)/hy) and Wh((xic − xjc)/h) replaced by w2,ĥy((yi − yj)/ĥy) and Kγ̂(xi,xj), respectively, where (ĥy, ĥ1,…, ĥq, λ̂1,…, λ̂r) denote the cross-validation-selected smoothing parameters suggested by Hall et al. (2004); i.e., one chooses (hy,h,λ) by minimizing (11), but now G−i(xi) is defined as

$$ G_{-i}(x_i) = \frac{1}{(n-1)^2} \sum_{j\neq i}\sum_{k\neq i} K_{\gamma}(x_i,x_j)\, K_{\gamma}(x_i,x_k)\, \bar w_{2,h_y}\!\left(\frac{y_j - y_k}{h_y}\right), $$

where w̄2,hy(·) = hy^{-1}w̄2(·) and w̄2(v) = ∫w2(u)w2(v − u) du is the twofold convolution kernel derived from w2(·).
With a slight abuse of notation, the resulting test statistic becomes

$$ T^{c}_{n,\hat\gamma} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{f(y_i|x_i,\hat\theta)} \left\{ \frac{1}{n-1}\sum_{j\neq i} K_{\hat\gamma}(x_i,x_j) \left[ w_{2,\hat h_y}\!\left(\frac{y_i - y_j}{\hat h_y}\right) - \int w_{2,\hat h_y}\!\left(\frac{y_i - y}{\hat h_y}\right) f(y|x_j,\hat\theta)\, dy \right] \right\}, \tag{12} $$

where γ̂ = (ĥy, ĥ1,…, ĥq, λ̂1,…, λ̂r) contains the extra smoothing parameter ĥy because yi is continuous.
The asymptotic distribution of Tn,γ̂c is given in the following theorem.

THEOREM 2.2. Under conditions (C1)–(C3) given in the Appendix, we have under H0,

$$ n(\hat h_y \hat h_1 \cdots \hat h_q)^{1/2}\, T^{c}_{n,\hat\gamma} \,/\, \hat\sigma_{0,c} \ \to\ N(0,1) \ \text{in distribution}, $$

where σ̂0,c² is a consistent estimator of σ0,c² = 2[∫W²(v) dv]E[p1(xi)], the asymptotic variance of n(ĥyĥ1 ⋯ ĥq)^{1/2}Tn,γ̂c.
The proof of Theorem 2.2 is similar to that of Theorem 1 in Zheng (2000) and is omitted here.
2.3. A Parametric Bootstrap Test
Theorems 2.1 and 2.2 provide, respectively, the asymptotic null distributions of the standardized statistics based on Tn,γ̂ and Tn,γ̂c. Consequently, one can test H0 by comparing the value of the standardized statistic with its asymptotic critical value. However, it is well known that consistent nonparametric tests often suffer from substantial finite-sample size distortions, and our simulations reveal that our test shares this drawback. To overcome this problem, we propose a bootstrap procedure to more accurately approximate the finite-sample null distribution of Tn,γ̂. It involves the following steps.
Step (i). Generate the ith bootstrap value of the dependent variable y from the parametric conditional distribution f(·|xi,θ̂). Denote this value by yi* (i = 1,…, n). This yields the complete bootstrap sample {xi,yi*}, i = 1,…, n.

Step (ii). Based on the parametric null model, estimate θ using the bootstrap sample. Let θ̂* denote the resulting estimator. Compute the bootstrap statistic Tn,γ̂* in the same way as Tn,γ̂ except that (yi, θ̂) are replaced by (yi*, θ̂*), respectively. Note that we use the same cross-validation-selected smoothing parameters γ̂ in computing the bootstrap statistics; there is no re-cross-validation in computing Tn,γ̂*.

Step (iii). Repeat steps (i) and (ii) a large number of times, say, B times, and use the empirical distribution of the B bootstrap statistics Tn,γ̂* to approximate the null distribution of Tn,γ̂.

Step (iv). The bootstrap test rejects H0 at significance level α if Tn,γ̂ exceeds the empirical upper-α percentile (i.e., the (1 − α)th quantile) of the B bootstrap statistics Tn,γ̂*.
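Steps (i)–(iv) can be sketched generically as follows. The function signatures and the p-value formulation are our own; the key detail from step (ii) — reusing the same smoothing parameters with no re-cross-validation — corresponds to passing a stat_fn whose smoothing parameters are fixed once:

```python
import numpy as np

def parametric_bootstrap_pvalue(stat_fn, fit_fn, draw_fn, x, y, B=399, seed=0):
    """Parametric bootstrap for a conditional-density specification test.

    stat_fn(x, y, theta) : test statistic, computed with FIXED smoothing
                           parameters (no re-cross-validation inside)
    fit_fn(x, y)         : parametric estimate of theta under the null
    draw_fn(x, theta, g) : draws y* from f(.|x, theta) using generator g
    """
    rng = np.random.default_rng(seed)
    theta_hat = fit_fn(x, y)
    t_obs = stat_fn(x, y, theta_hat)
    t_boot = np.empty(B)
    for b in range(B):
        y_star = draw_fn(x, theta_hat, rng)     # step (i)
        theta_star = fit_fn(x, y_star)          # step (ii): re-estimate theta
        t_boot[b] = stat_fn(x, y_star, theta_star)
    # steps (iii)-(iv): reject at level alpha when the p-value < alpha,
    # i.e., when t_obs exceeds the empirical upper-alpha quantile of t_boot
    return (1.0 + np.sum(t_boot >= t_obs)) / (B + 1.0)
```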
The following theorem justifies the asymptotic validity of the bootstrap test.
THEOREM 2.3. Assume the same conditions as in Theorem 2.1 (Theorem 2.2) except the null hypothesis. We have

$$ \sup_{z\in\mathbb{R}} \left|\, \Pr\!\left[\, T^{*} \le z \,\middle|\, \{x_i,y_i\}_{i=1}^{n} \,\right] - \Phi(z) \,\right| = o_p(1), \tag{13} $$

where T* denotes the standardized bootstrap statistic and Φ(·) is the cumulative distribution function of a standard normal random variable.
The proof of Theorem 2.3 is given in the Appendix.
In words, Theorem 2.3 states that the standardized bootstrap statistic converges to N(0,1) in distribution, in probability. Some authors establish the validity of a bootstrap method using the concept of convergence with probability one, stating that the left-hand side of (13) is o(1) with probability one (i.e., convergence in distribution with probability one). Here we use the concept of convergence in distribution in probability because our test statistic involves nonparametric estimation, and it is easier to work with "convergence in probability" than with "convergence with probability one."
Note that Theorem 2.3 holds true regardless of whether the null hypothesis is true. Therefore, (i) when the null hypothesis is true, the bootstrap procedure leads to an (asymptotically) correct size for the test, because the standardized test statistic converges in distribution to the same N(0,1) limiting distribution under H0; and (ii) when the null hypothesis is false, because the standardized test statistic converges to +∞ in probability, whereas asymptotically the bootstrap critical value remains finite (say, close to the 95th percentile of the N(0,1) distribution), the bootstrap procedure leads to a consistent test.
3. MONTE CARLO SIMULATION RESULTS
In this section, we present Monte Carlo simulation results to examine the finite-sample performance of our Tn,γ̂ test.
3.1. Discrete Dependent Variable
In this simulation experiment, the dependent variable y is a {0,1} binary variable. We use slightly different notation in this section: x denotes xc and z denotes xd. The data generating process (DGP) for the null model is given by

yi = 1 if β0 + β1xi + β2zi + ui > 0, and yi = 0 otherwise,

where {xi}, i = 1,…, n, is a random sample from N(0,1); zi takes binary values {0,1}, with Pr[zi = 1] = 0.5 in case (i) and Pr[zi = 1] = 0.8 in case (ii); and the error term {ui} is i.i.d. N(0,1). Moreover, xi, zi, and ui are all independent of one another. The true parameters are (β0,β1) = (1,1) and β2 ∈ {1, 0.3, 0}; β2 = 0 corresponds to the case in which zi is in fact an irrelevant variable. This leads to the following null hypothesis:

H0: Pr[yi = 1|xi, zi] = Φ(β0 + β1xi + β2zi),

where Φ(·) is the standard normal cumulative distribution function. The parametric conditional density of the null model is estimated by the maximum likelihood (ML) method.
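As an illustration only (not the authors' code), the probit null DGP and its ML estimation can be sketched as follows; generic numerical minimization of the negative log-likelihood stands in for a dedicated probit routine:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def simulate_null(n, beta=(1.0, 1.0, 1.0), p_z=0.5, seed=0):
    """Null DGP: y_i = 1 if b0 + b1*x_i + b2*z_i + u_i > 0, u_i ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    z = (rng.random(n) < p_z).astype(float)
    b0, b1, b2 = beta
    y = (b0 + b1 * x + b2 * z + rng.standard_normal(n) > 0).astype(float)
    return x, z, y

def probit_mle(x, z, y):
    """ML estimate of (b0, b1, b2) in Pr[y=1|x,z] = Phi(b0 + b1*x + b2*z)."""
    X = np.column_stack([np.ones_like(x), x, z])
    def nll(b):
        p = np.clip(norm.cdf(X @ b), 1e-10, 1.0 - 1e-10)
        return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return minimize(nll, np.zeros(3), method="BFGS").x
```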
The following two alternative DGPs are constructed to examine the power of the Tn,γ̂ test; one has a nonlinear term in the index function, and the other has a conditionally heteroskedastic error:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-69577-mediumThumb-S0266466606060294ffm059.jpg?pub-status=live)
where xi, zi, and ui are all generated in the same way as before. Also, β0, β1, β2 take the same values as previously, whereas β3 = 1. We use the parametric bootstrap described earlier to approximate the null distribution of the test statistic Tn,γ̂.
Our test will be compared with the CK test of Andrews (1997), whose statistic CK_n is defined as

CK_n = max_{1≤j≤n} | n^{−1/2} Σ_{i=1}^n [1(y_i ≤ y_j) − F(y_j|x_i,z_i,θ̂)] 1(x_i ≤ x_j, z_i ≤ z_j) |,

where F(·|·,·,θ) is the parametric conditional distribution function and θ̂ is the ML estimator of θ_0.
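For completeness, here is a direct O(n²) computation of CK_n in the max form displayed above. This is a sketch based on our reading of Andrews (1997), not the authors' code; `F_fit` stands for a user-supplied fitted conditional CDF:

```python
import math

def ck_statistic(y, x, z, F_fit):
    """Conditional Kolmogorov statistic (our reading of Andrews, 1997):
    CK_n = max_j | n^{-1/2} sum_i [1(y_i <= y_j) - F_fit(y_j, x_i, z_i)]
                             * 1(x_i <= x_j and z_i <= z_j) |."""
    n = len(y)
    best = 0.0
    for j in range(n):
        s = 0.0
        for i in range(n):
            if x[i] <= x[j] and z[i] <= z[j]:
                s += (1.0 if y[i] <= y[j] else 0.0) - F_fit(y[j], x[i], z[i])
        best = max(best, abs(s) / math.sqrt(n))
    return best

# Deterministic toy example with F_fit constant at 1/2.
ck = ck_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0, 0, 0],
                  lambda yy, xx, zz: 0.5)
```

In practice F_fit would be the null conditional CDF evaluated at the ML estimate θ̂.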
The sample sizes considered are n = 100 and 200; the numbers of Monte Carlo replications are 5,000 for size estimation and 2,000 for power estimation; and the number of bootstrap replications is B = 1,000 in all cases. The simulation results for discrete y_i with relevant covariates only are reported in Table 1.
Table 1. DGPa: The case of discrete y_i with relevant covariates.
From Table 1 we observe that for different values of β_2 (β_2 = 1, 0.3) and different values of Pr(z_i = 1) (0.5, 0.8), the performances of the T_{n,γ̂} and Andrews' tests are qualitatively the same. Overall, the estimated sizes are quite close to their nominal sizes for both tests. The power performances are mixed for the two alternative models. For the alternative DGP1a with an extra quadratic term, our T_{n,γ̂} test shows higher power than Andrews' test for the sample sizes considered. However, for some cases of DGP2a with a heteroskedastic error term, Andrews' test is slightly more powerful than ours. The simulation results show that our T_{n,γ̂} test complements Andrews' test.
Next we consider the case with an irrelevant covariate. We use the same DGP as before except that we now choose β_2 = 0, so that the binary discrete variable z becomes an irrelevant covariate. Because this information is unknown a priori, we still compute the conditional probability of y given both x and z. In this case we expect the cross-validation method to select the upper bound value λ = 1/2, so that the irrelevant covariate z is smoothed out automatically, resulting in a finite-sample power gain for the T_{n,γ̂} test.
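The smoothing-out mechanism is easy to see from the discrete-variable kernel. For a binary covariate, an Aitchison and Aitken type kernel assigns weight 1 − λ when categories match and λ otherwise, so at the upper bound λ = 1/2 the weight no longer depends on z at all. A minimal sketch (our illustration; the paper's kernel may differ in details):

```python
def discrete_kernel(zi, z, lam):
    """Aitchison-Aitken-type kernel for a binary covariate:
    weight 1 - lam if the categories match, lam otherwise (0 <= lam <= 1/2)."""
    return 1.0 - lam if zi == z else lam

# At lam = 1/2 the weight is constant in z, so the irrelevant covariate
# contributes nothing to the conditional density estimate.
w_match = discrete_kernel(1, 1, 0.5)
w_mismatch = discrete_kernel(0, 1, 0.5)
```

Cross-validation pushing λ̂ toward 1/2 is thus exactly what "removing" an irrelevant discrete covariate means here.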
From Table 2 we observe that the power of the T_{n,γ̂} test improves substantially compared with the results reported in Table 1. It is interesting to observe that for DGP2a, the power performance of the T_{n,γ̂} test is quite comparable to that of Andrews' test. Thus, the simulation results confirm that our cross-validation-based test indeed has the ability to remove irrelevant covariates and enjoys superior finite-sample power performance.
Table 2. The case of discrete y_i with irrelevant covariates.
3.2. Continuous Dependent Variable
In this section we consider the case where both y and x are continuous variables, and we compare the finite-sample performance of Zheng's original test with that of our T_{n,γ̂} test. The first DGP we use is the same as that in Zheng (2000). The null model is a linear regression model with normal homoskedastic errors:
y_i = β_0 + β_1 x_i + u_i,
where {x_i}_{i=1}^n is a random sample from N(0,1) and the error term {u_i} is i.i.d. N(0,σ²). Moreover, x_i and u_i are independent of each other. The true parameters are (β_0,β_1,σ) = (1,1,1). This leads to the following null hypothesis:

H_0: f(y|x) = (1/σ) φ((y − β_0 − β_1 x)/σ),

where φ(·) is the standard normal density function. The parameter θ is estimated by the ML method.
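Under this null model, f(y|x) is simply the N(β_0 + β_1 x, σ²) density. As a quick sanity check (our own sketch, with hypothetical names), the fitted density can be evaluated on a grid and numerically integrated to confirm it is a proper conditional density:

```python
import math

def null_density(y, x, b0=1.0, b1=1.0, sigma=1.0):
    """f(y|x) = (1/sigma) * phi((y - b0 - b1*x)/sigma),
    i.e., the N(b0 + b1*x, sigma^2) density."""
    t = (y - b0 - b1 * x) / sigma
    return math.exp(-0.5 * t * t) / (sigma * math.sqrt(2.0 * math.pi))

# Trapezoid-rule integral of f(y | x = 0.3) over a wide grid.
x0 = 0.3
grid = [-10.0 + 0.01 * k for k in range(2001)]   # y from -10 to 10
vals = [null_density(y, x0) for y in grid]
integral = 0.01 * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```

The integral is 1 up to negligible truncation and quadrature error, as it must be for any conditional density.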
Two alternative models are considered: DGP1b introduces misspecification into the regression function through an additional term with coefficient β_2, and DGP2b introduces conditional heteroskedasticity into the error term; β_2 is set to 1 in the experiment. We also report Andrews' test for comparison purposes. The simulation results are reported in Table 3a.
Table 3a. DGPb: The case of continuous y_i.
We observe from Table 3a that the parametric bootstrap method successfully overcomes the size distortion of Zheng's test. The estimated sizes of the bootstrap test are all close to their nominal values, whereas Zheng's test based on the asymptotic normal approximation is significantly undersized. For the alternatives DGP1b and DGP2b, the bootstrap T_{n,γ̂} test is much more powerful than Zheng's test. There are two reasons for this. First, the bootstrap test corrects the undersize problem of Zheng's test and hence improves finite-sample power. Second, we use the data-driven cross-validation method to select the smoothing parameters, which leads to optimal smoothing in estimating the unknown conditional density functions, whereas Zheng suggested selecting the smoothing parameters by an ad hoc method; it turns out that optimal smoothing also enhances the finite-sample power of the test. For DGP1b, Andrews' test has power similar to that of the T_{n,γ̂} test, whereas for DGP2b, Andrews' test is less powerful than the T_{n,γ̂} test.
Finally, we consider a case in which there exists an irrelevant continuous variable. We use basically the same setup as in DGPb except that we now set β_2 = 0; therefore, x_{2i} becomes an irrelevant variable. However, this information is not used in the estimation; that is, all estimation methods still use the full data set {y_i,x_{1i},x_{2i}}_{i=1}^n. Because our cross-validation-based T_{n,γ̂} test has the advantage of (asymptotically) removing the irrelevant variable x_2, we expect the T_{n,γ̂} test to enjoy further power gains. The simulation results are reported in Table 3b.
From Table 3b we observe that the T_{n,γ̂} test has good estimated sizes. Zheng's test remains undersized at the 5% and 10% levels, and Andrews' test is also somewhat undersized when an irrelevant variable is present. From the estimated power results, we see a substantial power gain of the T_{n,γ̂} test over Zheng's test. Essentially, Zheng's test is based on a two-dimensional nonparametric conditional density estimate, because the smoothing parameters in Zheng's test are selected by ad hoc rules that cannot detect the irrelevant variable x_2, whereas our T_{n,γ̂} test estimates, asymptotically, a one-dimensional conditional density function, because x_{2i} is smoothed out asymptotically. The T_{n,γ̂} test is also more powerful than Andrews' test for this DGP (with an irrelevant continuous variable). Of course, we report only limited simulation results here; in light of the local power analysis, we expect that there exist data generating processes for which Andrews' test will be more powerful than the T_{n,γ̂} test. Our simulation results show that the T_{n,γ̂} test can serve as a useful complement to Andrews' test when one is interested in testing a parametric conditional distribution.
4. CONCLUSIONS
This paper proposes a kernel-based bootstrap test for parametric conditional distribution functions. We consider separately the case where y is a discrete variable and the case where y is continuous. In either case, the conditioning variables can contain both discrete and continuous components. By smoothing both the discrete and continuous variables via the method of cross-validation, our test automatically removes irrelevant variables from the estimate of the conditional density function and, as a consequence, enjoys substantial power gains in finite samples, as our simulation results confirm. The test is applicable in a wide variety of settings and should prove useful to applied researchers.
APPENDIX
We first state conditions that are used to prove Theorem 2.1.
(C1) {y_i,x_i}_{i=1}^n are i.i.d. with joint density p(y,x). The first-order derivatives of p(·,·) with respect to its continuous arguments are uniformly bounded. The marginal density p_1(x) of x_i and its first-order derivatives with respect to its continuous arguments are uniformly bounded.
(C2) (i) The parameter space Θ is a compact and convex subset of R^k. Let ∥·∥ denote the Euclidean norm; then f(y|x,θ_0)^{−1}, ∥∂f(y|x,θ)/∂θ∥, ∥∂² log f(y|x,θ)/∂θ ∂θ′∥, and ∥(∂ log f(y|x,θ)/∂θ)(∂ log f(y|x,θ)/∂θ′)∥ are all bounded by a nonnegative function b(x,y) with ∫ b(x,y)^s < ∞ (s = 1,2), where ∫ denotes integration over the continuous variables and summation over the discrete variables. (ii)
√n(θ̂ − θ_0) = O_p(1) under H_0.
(C3) w(·) is a nonnegative, bounded, symmetric function with ∫ w(v) dv = 1 and ∫ w(v)v² dv = c < ∞.
The preceding conditions are basically the same as those used in Zheng (2000).
We give the central limit theorem (CLT) of Hall (1984, Thm. 3.1) for degenerate U-statistics as a lemma here.
LEMMA A.1. Let

U_n = [2/(n(n − 1))] Σ_{1≤i<j≤n} H_n(z_i,z_j)

be a second-order U-statistic, where {z_i}_{i=1}^n is i.i.d. Suppose E[H_n(z_i,z_j)|z_i] = 0 for i ≠ j (so that U_n is a degenerate U-statistic) and define G_n(z_1,z_2) = E[H_n(z_3,z_1)H_n(z_3,z_2)|z_1,z_2]. If

{E[G_n²(z_1,z_2)] + n^{−1}E[H_n⁴(z_1,z_2)]}/{E[H_n²(z_1,z_2)]}² → 0 as n → ∞,  (A.1)

then

nU_n/{2E[H_n²(z_1,z_2)]}^{1/2} → N(0,1) in distribution.
In the proof presented subsequently, we will replace the cross-validated smoothing parameters (ĥ_1,…,ĥ_q,λ̂_1,…,λ̂_r) by their nonstochastic leading terms: (h_1,…,h_q) = (a_{10}n^{−1/(q+4)},…,a_{q0}n^{−1/(q+4)}) and (λ_1,…,λ_r) = (b_{10}n^{−2/(q+4)},…,b_{r0}n^{−2/(q+4)}). This greatly simplifies the arguments in the proof. By the stochastic equicontinuity result of Ichimura (2000) (see Lemma A.4, which follows), we know that the conclusion holds provided ĥ_s/h_s → 1 in probability (s = 1,…,q) and λ̂_s/λ_s → 1 in probability (s = 1,…,r), which is true by Theorem 3.1 of Hall et al. (2004).
Using the shorthand notations
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm088.gif?pub-status=live)
, and the identity
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm016.gif?pub-status=live)
we can write T_{n,γ} = T_{n1} + T_{n2} + T_{n3}, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-87250-mediumThumb-S0266466606060294ffm089.jpg?pub-status=live)
Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm090.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm091.gif?pub-status=live)
lies on the line segment joining
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm092.gif?pub-status=live)
. By Taylor expansion, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-58225-mediumThumb-S0266466606060294ffm093.jpg?pub-status=live)
where the definitions of Tn1,j (j = 1,2,3) should be apparent.
The term Tn1,1 can be written as a second-order U-statistic (zi = (xi,yi)):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm094.gif?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm095.gif?pub-status=live)
It is easy to check that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-34456-mediumThumb-S0266466606060294ffm096.jpg?pub-status=live)
Similarly,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-56304-mediumThumb-S0266466606060294ffm097.jpg?pub-status=live)
Thus, E [Hn(zi,zj)|zi] = 0 and Tn1,1 is a degenerate U-statistic.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-28066-mediumThumb-S0266466606060294ffm098.jpg?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm099.gif?pub-status=live)
, and we have used
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm100.gif?pub-status=live)
if xid ≠ xjd.
Therefore, we have

E[H_n²(z_i,z_j)] = σ_0²(h_1 … h_q)^{−1}[1 + o(1)]  (A.3)

for some constant σ_0² > 0. Equation (A.3) implies that {E[H_n²(z_i,z_j)]}^{−1} = O(h_1 … h_q). Similarly, one can show that E[H_n⁴(z_i,z_j)] = O((h_1 … h_q)^{−3}). Define G_n(z_1,z_2) = E[H_n(z_3,z_2)H_n(z_3,z_1)|z_1,z_2]. One can show that E[G_n²(z_i,z_j)] = O((h_1 … h_q)^{−1}). Thus, equation (A.1) becomes

{E[G_n²(z_1,z_2)] + n^{−1}E[H_n⁴(z_1,z_2)]}/{E[H_n²(z_1,z_2)]}² = O(h_1 … h_q) + O((nh_1 … h_q)^{−1}) = o(1).
Thus by Lemma A.1 we know that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm018.gif?pub-status=live)
Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm019.gif?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm102.gif?pub-status=live)
is defined in the same way as H_n(z_i,z_j) except that θ_0 is replaced by θ̂. Applying Lemma 3.1 of Powell, Stock, and Stoker (1989) or Lemma 1 of Zheng (2000), it is straightforward to show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm104.gif?pub-status=live)
. Thus, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm020.gif?pub-status=live)
Applying Taylor expansion to Tn2, i.e., using
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm105.gif?pub-status=live)
, we obtain
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-14836-mediumThumb-S0266466606060294frm021.jpg?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm106.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm107.gif?pub-status=live)
.
Lemma A.2, which follows, shows that Tn1,2 = Op(n−1/2) and Tn2,1 = Op(n−1/2), and Lemma A.3 shows that Tn1,3 = Op(1), Tn2,2 = Op(1), and Tn3 = Op(n−1). These results together with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm108.gif?pub-status=live)
lead to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm022.gif?pub-status=live)
Expressions (A.6) and (A.8) together complete the proof of Theorem 2.1. █
LEMMA A.2.
(i) Tn1,2 = Op(n−1/2).
(ii) Tn2,1 = Op(n−1/2).
Proof of (i).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm109.gif?pub-status=live)
First note that E [Tn1,2] = 0 because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm110.gif?pub-status=live)
Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm111.gif?pub-status=live)
The preceding expression is zero if i, j, i′, j′ all take different values (because E[K_{γ,ij} f_{ij}^{(1)}/f_i] = 0). Therefore, for E{[T_{n1,2}]²} to be nonzero, we must have either (i) i, j, i′, j′ taking three different values or (ii) i, j, i′, j′ taking two different values. For these two cases it is easy to show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm112.gif?pub-status=live)
Hence, E{[T_{n1,2}]²} = O(n^{−1}), and consequently, T_{n1,2} = O_p(n^{−1/2}). █
Proof of (ii).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm113.gif?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-47861-mediumThumb-S0266466606060294ffm114.jpg?pub-status=live)
Hence, E [A1n(zi,zj)] = E [A1n,1(zi,zj)] − E [A1n,2(zi,zj)] = 0.
One can write T_{n2,1} = [2/(n(n − 1))] Σ_i Σ_{j>i} V_{1n}(z_i,z_j) as a second-order U-statistic, where V_{1n}(z_i,z_j) = (1/2)[A_{1n}(z_i,z_j) + A_{1n}(z_j,z_i)].
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-05104-mediumThumb-S0266466606060294ffm115.jpg?pub-status=live)
where in the preceding expression, A_i = B_i + (s.o.) means that Σ_i A_i = Σ_i B_i + o_p(Σ_i B_i); i.e., Σ_i B_i is the leading term of Σ_i A_i. Here v_{1i} = (1/2)p_1(x_i)[f_i^{(1)} − E(f_i^{(1)}|x_i)], and we have used E[f_{ji}^{(1)}|x_i] = E[f_i^{(1)}|x_i] = Σ_y f(y|x_i)².
Using the H-decomposition, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm116.gif?pub-status=live)
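For reference, the H-decomposition invoked here splits a second-order U-statistic into uncorrelated projection terms; in our notation (standard, not the paper's),

```latex
% Hoeffding (H-) decomposition of the second-order U-statistic
%   U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} V_{1n}(z_i, z_j), with
%   \theta_n = E[V_{1n}(z_1, z_2)],
%   g_n(z)   = E[V_{1n}(z, z_2)] - \theta_n   \text{(first projection)},
%   r_n(z_i, z_j) = V_{1n}(z_i, z_j) - g_n(z_i) - g_n(z_j) - \theta_n .
U_n \;=\; \theta_n \;+\; \frac{2}{n}\sum_{i=1}^{n} g_n(z_i)
      \;+\; \binom{n}{2}^{-1}\sum_{1 \le i < j \le n} r_n(z_i, z_j).
```

In the proof above E[A_{1n}] = 0, so the constant term vanishes, the first projection (2/n)Σ_i v_{1i} delivers the O_p(n^{−1/2}) leading term of T_{n2,1}, and the remainder is a degenerate U-statistic of smaller order.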
LEMMA A.3.
(i) Tn1,3 = Op(1).
(ii) Tn2,2 = Op(1).
(iii) Tn3 = Op(n−1).
Proof of (i). Here
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm117.gif?pub-status=live)
. By assumption (C2) (b(·,·) is the bound function for f^{(2)}(·)):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-77373-mediumThumb-S0266466606060294ffm118.jpg?pub-status=live)
which implies that Tn1,3 = Op(1). █
Proof of (ii). It is similar to the proof of (i) and is thus omitted here.
Proof of (iii). Using
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm119.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm120.gif?pub-status=live)
lies on the line segment joining
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm121.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm122.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-61740-mediumThumb-S0266466606060294ffm123.jpg?pub-status=live)
It is easy to show that E[∥T_{n3,1,0}∥] = O(1). Hence, T_{n3,1,0} = O_p(1), which implies T_{n3,1} = O_p(1) and T_{n3} = O_p(n^{−1}) because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm124.gif?pub-status=live)
. █
LEMMA A.4. (Ichimura).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm125.gif?pub-status=live)
where γ = (h_1,…,h_q,λ_1,…,λ_r) with h_s = a_{s0}n^{−1/(q+4)} (s = 1,…,q), λ_s = b_{s0}n^{−2/(q+4)} (s = 1,…,r), and a_{s0} > 0 and b_{s0} ≥ 0 are uniquely defined constants as given in Hall et al. (2004).
Ichimura (2000) has proved a general result that includes Lemma A.4 as a special case. Here, we provide an alternative proof for Lemma A.4 using a simple tightness argument (e.g., Mammen, 1992). Our proof consists of two parts: (i)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm126.gif?pub-status=live)
under H0; (ii)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm127.gif?pub-status=live)
. Because the proofs are similar, we only provide the proof for (i).
Proof of (i). Writing
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm128.gif?pub-status=live)
, by Theorem 3.1 of Hall et al. (2004), we know that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm129.gif?pub-status=live)
(in probability). This implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm130.gif?pub-status=live)
in probability. Let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm131.gif?pub-status=live)
, where a_{js} and b_{jt} (j = 1,2) are positive constants with a_{1s} < a_{s0} < a_{2s} (s = 1,…,q) and b_{1t} < b_{t0} < b_{2t} (t = 1,…,r). Denote
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm132.gif?pub-status=live)
. Then Lemma A.5, which follows, shows that A_n(c) ≡ n(h_1 … h_q)^{1/2}T_{n,γ} (with h_s = a_s n^{−1/(q+4)} and λ_s = b_s n^{−2/(q+4)}) is tight in
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm133.gif?pub-status=live)
.
Define Bn(c) = An(c) − An(c0). Then (i) becomes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm134.gif?pub-status=live)
; i.e., we want to show that, for all ε > 0
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm023.gif?pub-status=live)
For any δ > 0, denote the δ-ball centered at c_0 by C_δ = {c : ∥c − c_0∥ ≤ δ}, where ∥·∥ denotes the Euclidean norm of a vector. By Lemma A.5 we know that A_n(·) is tight. By the Arzelà–Ascoli theorem (see Billingsley, 1968, Thm. 8.2, p. 55), tightness implies the following stochastic equicontinuity condition: for all ε > 0 and η_1 > 0, there exist δ (0 < δ < 1) and N_1 such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm024.gif?pub-status=live)
for all n ≥ N1.
Expression (A.10) implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm025.gif?pub-status=live)
for all n ≥ N1.
Also, from
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm135.gif?pub-status=live)
in probability we know that for all η2 > 0, and for the δ given previously, there exists an N2 such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm026.gif?pub-status=live)
for all n ≥ N2.
Therefore,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm027.gif?pub-status=live)
for all n ≥ max{N1,N2} by (A.11) and (A.12), where we have also used the fact that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm136.gif?pub-status=live)
is a subset of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm137.gif?pub-status=live)
.
Equation (A.13) is equivalent to (A.9). This completes the proof of (i). █
LEMMA A.5. Let A_n(c) = n(h_1 … h_q)^{1/2}T_{n,γ}, where γ = (h,λ), h_s = a_s n^{−1/(q+4)}, λ_s = b_s n^{−2/(q+4)}, c = (a_1,…,a_q,b_1,…,b_r), and c_s ∈ [C_{1s},C_{2s}] with 0 < C_{1s} < C_{2s} < ∞ (s = 1,…,q+r). Then the stochastic process A_n(c) indexed by c is tight under the sup-norm.
Proof. Write K_{γ,ij} as (h_1 … h_q)^{−1}K_{c,ij} with h_s = a_s n^{−1/(q+4)} and λ_s = b_s n^{−2/(q+4)}, where K_{c,ij} = W((X_j − X_i)/h)L(X_j^d,X_i^d,λ). Also, let δ = q/(q + 4), C_1 = (a_1,…,a_q)′, and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm138.gif?pub-status=live)
. Then we have (h_1 … h_q)^{−1}K_{c,ij} = C_1 n^δ W_{C_1,ij} L_{C_2,ij}. Also note that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm139.gif?pub-status=live)
; we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-07914-mediumThumb-S0266466606060294frm028.jpg?pub-status=live)
where D_1 > 0 is a finite constant. In the last equality we used |L_{C_2,ij}| ≤ 1 and assumption (C3). Also, we replaced one of the (C_1′)^{−1/2} factors by C_1^{−1/2}; because each a_s ∈ [C_{1s},C_{2s}] is bounded from above and below, the difference can be absorbed into D_1.
By noting that An(c′) − An(c) is a degenerate U-statistic, and using (A.14), we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-54148-mediumThumb-S0266466606060294frm029.jpg?pub-status=live)
where in the preceding expression A ∼ B means that A and B have the same order of magnitude and D is a finite positive constant. Therefore, A_n(·) (and hence B_n(·)) is tight by Theorem 3.1 of Ossiander (1987). █
Proof of Theorem 2.3. We will provide a proof for the discrete dependent variable case. The continuous case is similar. To prove (13), similar to the decomposition of Tn,γ, we decompose
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm140.gif?pub-status=live)
as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm141.gif?pub-status=live)
, where the definitions of Tnj* are similar to those of Tnj with the proper changes; i.e.,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm142.gif?pub-status=live)
need to be changed to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm143.gif?pub-status=live)
. We further decompose
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm144.gif?pub-status=live)
, where the definitions of Tn1,j* are similar to Tn1,j with the proper changes (j = 1,2,3).
The term Tn1,1* can be written as a second-order U-statistic (zi* = (xi*,yi*) = (xi,yi*)):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm145.gif?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm146.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm147.gif?pub-status=live)
.
It is easy to check that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-45499-mediumThumb-S0266466606060294ffm148.jpg?pub-status=live)
Similarly,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-41191-mediumThumb-S0266466606060294ffm149.jpg?pub-status=live)
Hence, E*[Hn*(zi*,zj*)|zi*] = 0. Thus, conditional on the random sample {xi,yi}i=1n, Tn1,1* is a degenerate U-statistic.
Denote Un,ij* = [2/(n(n − 1))]Hn*(zi*,zj*) and define Un* = [2/(n(n − 1))] Σi Σj>i Hn*(zi*,zj*) ≡ Tn1,1*. We apply the CLT of de Jong (1996) for generalized quadratic forms to derive the asymptotic distribution of Un*|{xi,yi}i=1n. The reason for using de Jong's CLT instead of the one in Hall (1984) is that, in the bootstrap world, the function Hn*(zi*,zj*) depends on i and j, because zi* = (xi,yi*). By de Jong (1996, Prop. 3.2) we know that Un*/Sn* → N(0,1) in distribution in probability if GI*, GII*, and GIV* are all op(Sn*4), where Sn*2 = E*[Un*2], GI* = Σi Σj>i E*[Un,ij*4], GII* = Σi Σj>i Σl>j [E*(Un,ij*2Un,il*2) + E*(Un,ji*2Un,jl*2) + E*(Un,li*2Un,lj*2)], and GIV* = (1/2) Σi Σj>i Σs Σt>s E*(Un,is*2Un,sj*2Un,ti*2Un,js*2).
Now,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-07479-mediumThumb-S0266466606060294ffm150.jpg?pub-status=live)
Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm151.gif?pub-status=live)
. By a proof similar to that of Lemma A.4, one can show that Sn*2 has the same order as S̄n*2, where S̄n*2 is defined in the same way as Sn*2 except that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm171.gif?pub-status=live)
is replaced by γ. Hence we only need to establish the order of S̄n*2. Because discrete regressors do not affect its order, for clarity we establish the order of S̄n*2 for the case with continuous regressors only. We have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-40035-mediumThumb-S0266466606060294ffm152.jpg?pub-status=live)
where C > 0 is a constant, which implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm153.gif?pub-status=live)
. Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm154.gif?pub-status=live)
.
Next,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm155.gif?pub-status=live)
. By arguments similar to those used for Sn*2, one can show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm156.gif?pub-status=live)
given that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm157.gif?pub-status=live)
.
From the preceding calculation it should be apparent that the probability orders of GI*, GII*, and GIV* are determined solely by the factors of n and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm158.gif?pub-status=live)
through
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm159.gif?pub-status=live)
. Therefore, tedious but straightforward calculations show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-76863-mediumThumb-S0266466606060294ffm160.jpg?pub-status=live)
Thus Gk*/Sn*4 = op(1) for k = I, II, IV, and we conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294frm030.gif?pub-status=live)
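The mechanism behind this conditional CLT can be illustrated by simulation. The sketch below uses a hypothetical design (a Gaussian kernel with a shrinking bandwidth and independent Rademacher signs, not the paper's Tn*): the resulting second-order U-statistic is degenerate by construction, and after studentization its Monte Carlo distribution is close to N(0,1).

```python
import numpy as np

rng = np.random.default_rng(42)

# Monte Carlo sketch (hypothetical design, not the paper's statistic):
# a degenerate second-order U-statistic with an n-dependent kernel
# H_n(z_i, z_j) = K((x_i - x_j)/h) * e_i * e_j, where the e_i are
# independent Rademacher signs, so E[H_n(z_i, z_j) | z_i] = 0.
def studentized_ustat(n, h, rng):
    x = rng.uniform(size=n)
    e = rng.choice([-1.0, 1.0], size=n)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # Gaussian kernel
    Hn = K * np.outer(e, e)
    iu = np.triu_indices(n, k=1)       # all pairs i < j
    U = Hn[iu].sum()
    S = np.sqrt((Hn[iu] ** 2).sum())   # studentizing scale
    return U / S

n, h, M = 100, 100 ** (-1 / 5), 500
draws = np.array([studentized_ustat(n, h, rng) for _ in range(M)])
print(round(float(draws.mean()), 2), round(float(draws.std()), 2))
```

Because e_i² = 1, the studentizing scale here equals the exact conditional standard deviation of U given x, so the simulated mean and standard deviation should be close to 0 and 1, respectively.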
Next, define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408054310-58682-mediumThumb-S0266466606060294ffm161.jpg?pub-status=live)
where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm162.gif?pub-status=live)
is defined in the same way as Hn*(zi*,zj*) except that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm163.gif?pub-status=live)
is replaced by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm164.gif?pub-status=live)
. Similar to the analysis of Sn*2, one can show that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm165.gif?pub-status=live)
and that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm166.gif?pub-status=live)
. These results together with (A.16) lead to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm167.gif?pub-status=live)
The analysis of Tn1,2*, Tn1,3*, Tn2*, and Tn3* is similar to that of their counterparts in the proof of Theorem 2.1, and one can show that Tn1,1* is the leading term of Tn*. For example, in Lemma A.2(i) we showed that Tn1,2 = Op(n−1/2) by proving that E[Tn1,22] = O(n−1); by similar arguments one can show that E*[Tn1,2*2] = Op(n−1). The details are omitted to save space. Therefore, we conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm168.gif?pub-status=live)
has the same asymptotic distribution as that of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm169.gif?pub-status=live)
. Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065628874-0502:S0266466606060294:S0266466606060294ffm170.gif?pub-status=live)
Because the N(0,1) distribution function is continuous, Polya's theorem (Bhattacharya and Rao, 1986) strengthens the convergence to uniform convergence, and Theorem 2.3 follows. █
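In practice, a result of this kind justifies computing critical values by comparing the observed statistic with its bootstrap replicates. The following generic sketch shows the mechanics with a placeholder statistic and null resampling scheme (both hypothetical, standing in for the paper's Tn* and its conditional-density bootstrap):

```python
import numpy as np

rng = np.random.default_rng(1)

# Generic bootstrap p-value sketch (placeholder statistic, not the
# paper's T_n*): compare the observed statistic with B statistics
# recomputed on samples drawn under the null hypothesis.
def bootstrap_pvalue(stat_fn, null_resample_fn, data, B, rng):
    t_obs = stat_fn(data)
    t_star = np.array([stat_fn(null_resample_fn(data, rng)) for _ in range(B)])
    # Right-tail p-value with the usual +1 finite-sample correction.
    return (1 + np.sum(t_star >= t_obs)) / (B + 1)

# Toy example: test H0: mean = 0 when the data actually have mean 2.
data = rng.normal(loc=2.0, scale=1.0, size=100)
stat = lambda d: np.sqrt(len(d)) * abs(d.mean())
# Resample under the null by recentering, then drawing with replacement.
resample = lambda d, g: g.choice(d - d.mean(), size=len(d), replace=True)
p = bootstrap_pvalue(stat, resample, data, B=399, rng=rng)
print(p < 0.05)
```

The null is violated by construction, so the bootstrap p-value is small; the +1 correction keeps the p-value strictly positive regardless of B.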