
A NONPARAMETRIC BOOTSTRAP TEST OF CONDITIONAL DISTRIBUTIONS

Published online by Cambridge University Press: 23 May 2006

Yanqin Fan (Vanderbilt University), Qi Li (Texas A&M University), and Insik Min (Kyung Hee University)

Abstract

This paper proposes a bootstrap test for the correct specification of parametric conditional distributions. It extends Zheng's test (Zheng, 2000, Econometric Theory 16, 667–691) to allow for discrete dependent variables and for mixed discrete and continuous conditioning variables. We establish the asymptotic null distribution of the test statistic with data-driven stochastic smoothing parameters. By smoothing both the discrete and continuous variables via the method of cross-validation, our test has the advantage of automatically removing irrelevant variables from the estimate of the conditional density function and, as a consequence, enjoys substantial power gains in finite samples, as confirmed by our simulation results. The simulation results also reveal that the bootstrap test successfully overcomes the size distortion problem associated with Zheng's test.

Acknowledgments: We are grateful for the insightful comments from three referees and a co-editor that greatly improved the paper. Li's research is partially supported by the Private Enterprise Research Center, Texas A&M University. Fan is grateful to the National Science Foundation for research support.

Type
Research Article
Copyright
© 2006 Cambridge University Press

1. INTRODUCTION

Currently, there exists a substantial body of work on consistent model specification testing for regression models and for unconditional distribution (density) functions; see Bierens and Ploberger (1997), Delgado and Manteiga (2001), Fan (1994, 1997, 1998), Fan and Li (1996), Hong and White (1996), Wooldridge (1992), and the references therein. In many economic applications, however, it is the distribution of one variable conditional on some other variables that is of more direct interest. The popular parametric binary and multinomial response models are two leading examples of conditional probability models. Conditional probability models are also widely used in risk management and insurance settings, where the dependent variable of interest may be the claim size (a continuous variable) and the explanatory variables usually contain a mixture of discrete and continuous variables such as sex, age, whether children are present, whether one smokes, and so forth. Moreover, in risk management analysis, one is usually interested in the entire (conditional) distribution rather than only in the conditional mean. Hence, a conditional probability model is more useful than a regression model in risk analysis. Relatively speaking, tests for conditional probability models are scarce. Zheng (2000), using kernel density estimators, proposed a consistent test for a parametric conditional density function. He showed that the limiting distribution of his test statistic is N(0,1) and that the test can detect Pitman local alternatives approaching the null distribution at the rate (nh^{q/2})^{−1/2}, where n is the sample size, h is the bandwidth, and q is the dimension of the conditioning variables. To apply Zheng's test to a given data set, one needs to choose the bandwidth, but no guidance is provided on how this should be done. Moreover, the requirement that both the dependent variable y and the conditioning variables x be continuous severely limits the scope of application of Zheng's test, as many economic data sets contain both continuous and discrete variables. Andrews (1997) proposed a conditional Kolmogorov (CK) test for a parametric conditional distribution function. His test overcomes the difficulties associated with Zheng's test: it does not involve smoothing parameters, and it allows for both discrete and continuous variables. The critical values of Andrews' CK test are obtained via a parametric bootstrap procedure, and the test can detect Pitman-type local alternatives that approach the null model at the rate O(n^{−1/2}). Although Andrews' test can handle both continuous and discrete variables, it does not produce an estimate of the conditional density function, which is undesirable when the parametric distribution function is rejected. In addition, it does not distinguish between relevant and irrelevant explanatory variables.

A related literature is the work on dynamic integral probability transform models such as that outlined in Diebold, Gunther, and Tay (1998). Corradi and Swanson (2004) and Li and Tkacz (2004) have also proposed bootstrap-based tests for conditional distributions. The Corradi and Swanson (2004) procedure is a nonsmoothing test similar to that of Andrews (1997), and their test extends Andrews' test to the time series data setting. Li and Tkacz (2004) use kernel smoothing; however, like Zheng (2000), they only consider the case whereby both y and x are continuous variables. The conventional way of handling discrete variables when estimating a conditional density function involving both discrete and continuous explanatory variables is by the so-called frequency method in which the entire sample is first split into a number of distinct cells and the data in each cell are then used to estimate the conditional density that is a function of the remaining continuous variables. For economic data, however, it is typically the case that the number of discrete cells is comparable to or even larger than the sample size. This renders the nonparametric frequency approach infeasible. Moreover, one may not know which conditional variables should be included in a particular application and hence faces the danger of including potentially irrelevant variables in the estimate. This is unfortunate, particularly in nonparametric settings, as including irrelevant explanatory variables has serious consequences for the accuracy of the resulting estimate: the rate of convergence of the density estimator will deteriorate quickly with the number of irrelevant continuous variables (the “curse of dimensionality”), whereas the number of cells will increase quite quickly with the number of irrelevant discrete variables. Recently, Hall, Racine, and Li (2004) proposed estimating a conditional density by smoothing both the discrete and continuous variables and showed that the use of cross-validation can automatically remove irrelevant variables from the resulting estimate. This is because the cross-validation method selects bandwidths that converge to some optimal values for relevant variables but selects large values for irrelevant conditional variables, thereby effectively smoothing out the irrelevant variables from the resulting estimate.
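To see the magnitudes involved in the cell-count problem just described, consider the purely illustrative case of r binary conditioning variables, for which the frequency method must split the sample into 2^r cells:

\[
r = 10:\ 2^{10} = 1{,}024 \text{ cells}; \qquad r = 20:\ 2^{20} = 1{,}048{,}576 \text{ cells},
\]

either of which is comparable to, or far exceeds, the size of many microeconomic data sets.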

In this paper, we exploit the approach of Hall et al. (2004) to establish an alternative test for a parametric conditional density function. It is constructed based on the Zheng (2000) setup; however, it improves upon Zheng's test in a number of important ways: (i) the bandwidth is automatically chosen by cross-validation, thereby avoiding potential arbitrariness in the test's outcome due to an arbitrary choice of the bandwidth; (ii) it allows for both discrete and continuous variables; and (iii) the critical values are obtained from a parametric bootstrap procedure, which corrects the size distortions present in Zheng's approach. Although (ii) and (iii) are shared by Andrews' CK test, our test automatically produces an estimate of the conditional density function when the parametric density function is rejected by the test. More importantly, by automatically smoothing both the discrete and continuous variables via the method of cross-validation, our test has the advantage of automatically removing irrelevant variables from the resulting estimate (see Hall et al., 2004) and, as a consequence, enjoys substantial power gains in finite samples, as confirmed by our simulation results. Although our proposed test can only detect Pitman local alternatives approaching the null at rates slower than O(n^{−1/2}), it can be shown that for high-frequency alternatives, our test can detect local alternatives that approach the null at rates o(n^{−1/2}) in terms of the L_1 norm of the difference between the local alternative and the null model (e.g., Fan, 1998; Fan and Li, 2000). Hence it provides a complement to Andrews' CK test.

The remainder of this paper is organized as follows. In Section 2 we review and suggest a modified version of Zheng's test statistic. We also propose a bootstrap method for approximating the null distribution of our test. Section 3 reports Monte Carlo simulation results that examine the finite-sample performance of the proposed test. Finally, Section 4 concludes. Proofs are presented in the Appendix.

2. THE NULL HYPOTHESIS AND THE TEST

2.1. Zheng's Test

We begin by briefly reviewing the test proposed by Zheng (2000). Suppose that the data consist of {y_i, x_i}_{i=1}^n, an independent and identically distributed (i.i.d.) sample drawn from the distribution of (y,x) with joint density function p(y,x). Let p(y|x) denote the conditional density function of y given x. We are interested in testing whether p(y|x) belongs to a particular parametric family. Let f(y|x,θ) denote a parametric conditional density function with θ a k × 1 parameter vector. The null hypothesis is given by

H_0: Pr[ p(y|x) = f(y|x,θ_0) ] = 1 for some θ_0 ∈ Θ,

where Θ is the parameter space, a compact set in R^k. The alternative hypothesis is the negation of the null:

H_1: Pr[ p(y|x) = f(y|x,θ) ] < 1 for all θ ∈ Θ.

The Kullback–Leibler information criterion (Kullback and Leibler, 1951), measuring the discrepancy between the two conditional density functions, is defined as

I(p,f) = E{ log[ p(y|x)/f(y|x,θ_0) ] }.

It is well known that I(p,f) ≥ 0 and I(p,f) = 0 if and only if p(y|x) = f(y|x,θ_0) almost everywhere (a.e.). Thus, I(p,f) serves as a proper measure for testing H_0. For technical reasons, instead of basing his test on the information measure itself, Zheng (2000) considered its first-order expansion (using log λ ≈ λ − 1 near λ = 1),

I(p,f) ≈ E{ [p(y|x) − f(y|x,θ_0)] / f(y|x,θ_0) }.    (2)

Weighting (2) by the marginal density p_1(x) of the conditioning variable x leads to the following measure:

I_1(p,f) = E{ [p(y|x) − f(y|x,θ_0)] f^{−1}(y|x,θ_0) p_1(x) }.

Zheng (2000) has shown that I_1(p,f) ≥ 0 and that the equality holds if and only if H_0 is true. (Indeed, writing the expectation out and using ∫[p(y|x) − f(y|x,θ_0)] dy = 0, one obtains I_1(p,f) = ∫ p_1(x)² {∫ [p(y|x) − f(y|x,θ_0)]² f^{−1}(y|x,θ_0) dy} dx ≥ 0, with equality if and only if p = f a.e.) Therefore, I_1(p,f) also serves as a proper measure for testing H_0. For continuous random variables y and x, Zheng (2000) proposed estimating p(y_i,x_i) by a standard kernel density estimator and estimating f(y_i|x_i,θ̂)p_1(x_i) by a smoothed density estimator f̃(y_i,x_i) given by

f̃(y_i,x_i) = [1/(n−1)] Σ_{j≠i} W_h((x_i − x_j)/h) ∫ w_{2,h_y}((y_i − y)/h_y) f(y|x_j,θ̂) dy,    (4)

where w_{2,h_y}(·) = h_y^{−1} w_2(·/h_y), w_2(·) is a (specially defined) univariate kernel function, and W_h(·) is the product kernel

W_h((x_i − x_j)/h) = Π_{s=1}^q h_s^{−1} w((x_{is} − x_{js})/h_s),

with w(·) a standard (second-order) univariate kernel, h_y and the h_s's the smoothing parameters, and θ̂ an estimator of θ_0 under the null model. The measure I_1(p,f) is then estimated by

T_{n,h}^c = [1/(n(n−1))] Σ_i Σ_{j≠i} W_h((x_i − x_j)/h) [ w_{2,h_y}((y_i − y_j)/h_y) − ∫ w_{2,h_y}((y_i − y)/h_y) f(y|x_j,θ̂) dy ] f^{−1}(y_i|x_i,θ̂).    (5)

To establish the null asymptotic distribution of T_{n,h}^c, Zheng (2000) suggested transforming the dependent variable so that it takes values in [0,1] and then choosing a special kernel function for w_2(·) with a property, stated in Zheng (2000), that controls the boundary behavior of the estimator as n → ∞. The use of the smoothed estimator (4) eliminates the bias of the kernel estimator of p(y_i,x_i) under H_0, so that the test statistic is appropriately centered for a wide range of smoothing parameter values. Under some regularity conditions, Zheng (2000) showed that the asymptotic null distribution of T_{n,h}^c is normal and provided a consistent estimator of its asymptotic variance.

2.2. Our Framework

We now extend Zheng's test to allow x to be of mixed type (containing both continuous and discrete components), where the dependent variable y can be discrete or continuous.

We first consider the case in which y is a discrete variable. In this case, we show that the smoothed estimator (4) reduces to a sample average. Thus, the resulting test statistic involves only summations and hence avoids the need for numerical integration.

Let x = (x^c, x^d), where x^c is a q × 1 vector of continuous variables and x^d is an r × 1 vector of discrete variables. We use x_{is}^c (x_{is}^d) to denote the sth component of x_i^c (x_i^d). We further assume that x_{is}^d takes values in {0,1,…,c_s − 1}; that is, it takes c_s distinct values.

In constructing the kernel density estimate, we use different kernel functions for the discrete and continuous variables. For the discrete variables x^d, we use the Aitchison and Aitken (1976) kernel: l(x_{is}^d, x_{js}^d, λ_s) = 1 − λ_s if x_{is}^d = x_{js}^d, and l(x_{is}^d, x_{js}^d, λ_s) = λ_s/(c_s − 1) if x_{is}^d ≠ x_{js}^d. Hence, the product kernel for the discrete variables is

L(x_i^d, x_j^d, λ) = Π_{s=1}^r [λ_s/(c_s − 1)]^{N_{ijs}} (1 − λ_s)^{1 − N_{ijs}},

where N_{ijs} = I(x_{is}^d ≠ x_{js}^d), in which I(·) is the usual indicator function, and λ_1,…,λ_r are the smoothing parameters for the discrete components, constrained by 0 ≤ λ_s ≤ (c_s − 1)/c_s. Note that when λ_s assumes the upper extreme value (c_s − 1)/c_s, we have l(x_{is}^d, x_{js}^d, λ_s = (c_s − 1)/c_s) ≡ 1/c_s, which is unrelated to (x_{is}^d, x_{js}^d); i.e., the sth component of x^d is completely smoothed out when λ_s = (c_s − 1)/c_s.

For the continuous components x^c, we use the standard (second-order) product kernel function discussed earlier. Therefore, for the mixed variable x = (x^c, x^d), the kernel function is defined by

K_γ(x_i, x_j) = W_h((x_i^c − x_j^c)/h) L(x_i^d, x_j^d, λ),

where γ = (h,λ) ≡ (h_1,…,h_q, λ_1,…,λ_r).
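As a concrete illustration, the following sketch evaluates K_γ(x_i, x_j) for one pair of observations. It is our own minimal implementation, not the authors' code: the Gaussian choice for w(·) and all names are illustrative assumptions.

```python
import numpy as np

def mixed_kernel(xc_i, xc_j, xd_i, xd_j, h, lam, cs):
    """K_gamma(x_i, x_j) = W_h((x_i^c - x_j^c)/h) * L(x_i^d, x_j^d, lambda).

    xc_*: length-q arrays (continuous components); xd_*: length-r integer
    arrays (discrete components); h: length-q bandwidths; lam: length-r
    smoothing parameters with 0 <= lam_s <= (c_s - 1)/c_s; cs: length-r
    array of support sizes c_s."""
    # Continuous part: prod_s h_s^{-1} w((x_is - x_js)/h_s), with w = N(0,1) pdf.
    u = (xc_i - xc_j) / h
    cont = np.prod(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h))
    # Discrete part (Aitchison-Aitken): 1 - lam_s on a match, lam_s/(c_s - 1) otherwise.
    disc = np.prod(np.where(xd_i == xd_j, 1.0 - lam, lam / (cs - 1.0)))
    return cont * disc
```

At λ_s = (c_s − 1)/c_s the discrete factor is identically 1/c_s, so the sth discrete component drops out of the estimate, which is precisely the smoothing-out mechanism described above.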

We now discuss how to estimate p(y_i,x_i) and p_1(x_i). Assume that y_i is a discrete variable; then we estimate p(y_i,x_i) and p_1(x_i) by the following leave-one-out kernel estimators:

p̂_{−i}(y_i,x_i) = [1/(n−1)] Σ_{j≠i} K_γ(x_i,x_j) I(y_j = y_i)  and  p̂_{1,−i}(x_i) = [1/(n−1)] Σ_{j≠i} K_γ(x_i,x_j).

To construct the smoothed estimator of f(y_i|x_i,θ̂), we replace W_h(·) in (4) by K_γ(x_i,x_j) and ∫ w_{2,h_y}((y_i − y)/h_y) f(y|x_j,θ̂) dy by Σ_y I(y_i = y) f(y|x_j,θ̂) = f(y_i|x_j,θ̂). Taking into account these modifications, we obtain

f̃_{−i}(y_i,x_i) = [1/(n−1)] Σ_{j≠i} K_γ(x_i,x_j) f(y_i|x_j,θ̂).

Using the estimators just introduced, we define our test statistic as

T_n = (1/n) Σ_i [ p̂_{−i}(y_i,x_i) − f̃_{−i}(y_i,x_i) ] f^{−1}(y_i|x_i,θ̂) = [1/(n(n−1))] Σ_i Σ_{j≠i} K_γ(x_i,x_j) [ I(y_j = y_i) − f(y_i|x_j,θ̂) ] f^{−1}(y_i|x_i,θ̂).

Note that the double summation in T_n does not include the j = i terms because we have used leave-one-out estimators for p(y_i,x_i) and p_1(x_i). The reason for using leave-one-out estimators is that, under H_0, the asymptotic distribution of T_n is then centered at zero (there is no center term).
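The statistic itself then requires only a double loop. The sketch below follows the form of T_n reconstructed above for a discrete y, reusing mixed_kernel from the previous sketch; f_null(y, xc, xd) stands for the fitted null conditional probability f(y|x,θ̂) and is an assumed interface, not part of the paper.

```python
def t_n_statistic(y, Xc, Xd, h, lam, cs, f_null):
    """T_n = [n(n-1)]^{-1} sum_{i != j} K_gamma(x_i, x_j)
             * [1(y_j = y_i) - f(y_i|x_j, theta_hat)] / f(y_i|x_i, theta_hat)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        f_ii = f_null(y[i], Xc[i], Xd[i])           # f(y_i | x_i, theta_hat)
        for j in range(n):
            if j == i:                               # leave-one-out: no j = i terms
                continue
            k_ij = mixed_kernel(Xc[i], Xc[j], Xd[i], Xd[j], h, lam, cs)
            f_ij = f_null(y[i], Xc[j], Xd[j])        # f(y_i | x_j, theta_hat)
            total += k_ij * (float(y[j] == y[i]) - f_ij) / f_ii
    return total / (n * (n - 1))
```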

The smoothing parameters h_1,…,h_q (corresponding to the continuous variables x^c) can be selected by several commonly used procedures, including the cross-validation method, the plug-in method, and some ad hoc methods. For λ_1,…,λ_r, however, neither a plug-in nor an ad hoc formula is available. Hall et al. (2004) have shown that using the cross-validation method to select λ_1,…,λ_r and h_1,…,h_q has some attractive properties: when x_s^c (x_s^d) is a relevant variable, cross-validation selects a small h_s (λ_s) that converges to zero at an optimal rate; when x_s^c (x_s^d) is an irrelevant variable (we say that x_s is irrelevant if p(y|x) is independent of x_s), cross-validation selects an extremely large value for h_s (the upper bound value for λ_s), so that the irrelevant variables are (asymptotically) automatically removed (smoothed out). Indeed, in the problem of nonparametric estimation of a conditional density, cross-validation comes into its own as a method with no obvious peers. Therefore, we choose λ_1,…,λ_r and h_1,…,h_q by the cross-validation method suggested in Hall et al. (2004).

Let (h,λ) = (h_1,…,h_q, λ_1,…,λ_r). Hall et al. propose choosing (h,λ) to minimize a cross-validation objective function CV(h,λ), referred to as (11) below. Hall et al. (2004) show that, up to an additive constant term that does not depend on (h,λ), CV(h,λ) is a consistent estimator of the weighted integrated squared error of the conditional density estimator. The objective function is built from a term G_i(x_i), whose form differs according to whether y is a continuous or a discrete variable, and from the leave-one-out kernel estimators p̂_{1,−i}(x_i) and p̂_{−i}(x_i,y_i) of p_1(x_i) and p(x_i,y_i), respectively, together with a weight function m(x_i^c) that is introduced to deal with the small random denominator problem; see Hall et al. (2004).

We use γ̂ = (ĥ, λ̂) = (ĥ_1,…,ĥ_q, λ̂_1,…,λ̂_r) to denote the resulting smoothing parameters. Assuming that all of the x variables are relevant, Hall et al. (2004) showed that

n^{1/(q+4)} ĥ_s → a_s^0 in probability for s = 1,…,q,  and  n^{2/(q+4)} λ̂_s → b_s^0 in probability for s = 1,…,r,

where a_s^0 > 0 and b_s^0 ≥ 0 are some finite constants.

THEOREM 2.1. Under conditions (C1)–(C3) given in the Appendix, we have under H_0

n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n / σ̂_0 → N(0,1) in distribution,

where T̂_n denotes T_n evaluated at the cross-validated smoothing parameters γ̂ = (ĥ, λ̂), and σ̂_0² is a consistent estimator of σ_0² = [∫W²(v) dv] E[(1 − f(y_i|x_i,θ_0)) f^{−1}(y_i|x_i,θ_0) p_1(x_i)], the asymptotic variance of n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n.

It can be shown that under H_1, n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n / σ̂_0 diverges to +∞. Hence, the T̂_n test is a consistent test. Moreover, the T̂_n test can detect local alternatives that approach the null at the rate O_p(n^{−1/2}(h_1 ⋯ h_q)^{−1/4}) = O_p(n^{−(1/2)(8+q)/(8+2q)}), which is slower than O_p(n^{−1/2}) (because h_j = O_p(n^{−1/(4+q)}) for all j = 1,…,q).
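The equality of the two rates is simple bandwidth algebra (using h_j ≍ n^{−1/(4+q)} for every j):

\[
n^{-1/2}(h_1\cdots h_q)^{-1/4} \asymp n^{-\frac{1}{2}+\frac{q}{4(4+q)}}
= n^{-\frac{2(4+q)-q}{4(4+q)}} = n^{-\frac{8+q}{16+4q}} = n^{-\frac{1}{2}\cdot\frac{8+q}{8+2q}}.
\]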

We now briefly discuss the case where the dependent variable y is continuous. In this case, one can still use Zheng's test statistic given in (5), but with w_{2,h_y}((y_i − y_j)/h_y) and W_h((x_i^c − x_j^c)/h) replaced by w_{2,ĥ_y}((y_i − y_j)/ĥ_y) and K_γ̂(x_i,x_j), respectively, where (ĥ_y, ĥ, λ̂) denote the cross-validation-selected smoothing parameters suggested by Hall et al. (2004); i.e., one chooses (h_y, h, λ) by minimizing (11), but now G_i(x_i) is defined using

w̄_{2,h_y}(v) = h_y^{−1} w̄_2(v/h_y), where w̄_2(v) = ∫ w_2(u) w_2(v − u) du

is the twofold convolution kernel derived from w_2(·).
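Purely as an illustration of the convolution operation (Zheng's w_2 is a specially constructed kernel and need not be Gaussian): if w_2 were the standard normal density φ, the twofold convolution would be the N(0,2) density,

\[
\bar w_2(v) = \int \phi(u)\,\phi(v-u)\,du = \frac{1}{2\sqrt{\pi}}\,\exp\!\left(-\frac{v^2}{4}\right).
\]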

With a slight abuse of notation, we continue to denote the resulting test statistic by T̂_n, where the vector of cross-validated smoothing parameters now contains the extra parameter ĥ_y because y_i is continuous.

The asymptotic distribution of T̂_n is given in the following theorem.

THEOREM 2.2. Under conditions (C1)–(C3) given in the Appendix, we have under H_0

n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n / σ̂_{0,c} → N(0,1) in distribution,

where σ̂_{0,c}² is a consistent estimator of σ_{0,c}² = 2[∫W²(v) dv] E[p_1(x_i)], the asymptotic variance of n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n.

The proof of Theorem 2.2 is similar to that of Theorem 1 in Zheng (2000) and is omitted here.

2.3. A Parametric Bootstrap Test

Theorems 2.1 and 2.2 provide, respectively, the asymptotic null distribution of

. Consequently, one can perform tests for H0 by comparing the value of

with its asymptotic critical value. However, it is well known that consistent nonparametric tests often suffer from substantial finite-sample size distortions. Our simulations reveal that the

shares this drawback. To overcome this problem, we propose a bootstrap procedure to more accurately approximate the finite-sample null distribution of

. It involves the following steps.

Step (i). Generate the ith bootstrap value of the dependent variable y from the parametric conditional distribution f(·|x_i, θ̂) implied by the null model. Denote this value by y_i* (i = 1,…,n). We obtain the complete bootstrap sample {x_i, y_i*}_{i=1}^n.

Step (ii). Based on the parametric null model, estimate θ using the bootstrap sample, and let θ̂* denote the resulting estimator. Compute the bootstrap statistic T̂_n* in the same way as T̂_n except that {y_i, θ̂} are replaced by {y_i*, θ̂*}, respectively. Note that we use the same cross-validation-selected smoothing parameters γ̂ = (ĥ, λ̂) in computing the bootstrap statistics; there is no re-cross-validation in computing T̂_n*.

Step (iii). Repeat steps (i) and (ii) a large number of times, say, B times, and use the empirical distribution of the B bootstrap statistics {T̂_{n,b}*}_{b=1}^B to approximate the null distribution of T̂_n.

Step (iv). The bootstrap test rejects H_0 at significance level α if T̂_n exceeds the empirical upper-α quantile (the (1 − α)th quantile) of the B bootstrap statistics.
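The procedure is mechanical; the sketch below spells it out for a binary y under a probit-type null model (matching the Monte Carlo design of Section 3.1). The helpers fit_null (ML estimation of θ, with X including an intercept column) and statistic (the test statistic computed at the fixed cross-validated (ĥ, λ̂)) are assumed interfaces, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def parametric_bootstrap_pvalue(y, X, fit_null, statistic, B=1000, seed=0):
    """Steps (i)-(iv): draw y* from f(.|x, theta_hat), re-estimate theta on
    each bootstrap sample, recompute the statistic with the SAME smoothing
    parameters (no re-cross-validation), and compare with the observed value."""
    rng = np.random.default_rng(seed)
    theta_hat = fit_null(y, X)                      # ML estimate under the null model
    t_obs = statistic(y, X, theta_hat)
    t_boot = np.empty(B)
    for b in range(B):
        p = norm.cdf(X @ theta_hat)                 # Step (i): Pr(y*=1|x) under the probit null
        y_star = rng.binomial(1, p)
        theta_star = fit_null(y_star, X)            # Step (ii): bootstrap estimate theta*
        t_boot[b] = statistic(y_star, X, theta_star)
    # Steps (iii)-(iv): reject at level alpha iff the bootstrap p-value is below alpha.
    return (1.0 + np.sum(t_boot >= t_obs)) / (B + 1.0)
```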

The following theorem justifies the asymptotic validity of the bootstrap test.

THEOREM 2.3. Assume the same conditions as in Theorem 2.1 (Theorem 2.2), except that the null hypothesis need not hold. Then

sup_{z∈R} | P*( n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n*/σ̂_0* ≤ z ) − Φ(z) | = o_p(1),    (13)

where P* denotes probability conditional on the original sample {x_i,y_i}_{i=1}^n, σ̂_0* is the bootstrap analogue of σ̂_0, and Φ(·) is the cumulative distribution function of a standard normal random variable.

The proof of Theorem 2.3 is given in the Appendix.

In words, Theorem 2.3 states that the studentized bootstrap statistic converges to N(0,1) in distribution in probability. Other authors establish the validity of bootstrap methods using the concept of convergence with probability one, whereby the left-hand side of (13) is o(1) with probability one (i.e., convergence in distribution with probability one). Here we choose the concept of convergence in distribution in probability because our test statistic involves nonparametric estimation, and it is easier to work with convergence in probability than with convergence with probability one.

Note that Theorem 2.3 holds regardless of whether the null hypothesis is true. Therefore, (i) when the null hypothesis is true, the bootstrap procedure leads to (asymptotically) correct size, because the studentized statistic n(ĥ_1 ⋯ ĥ_q)^{1/2} T̂_n/σ̂_0 converges in distribution to the same N(0,1) limit under H_0; and (ii) when the null hypothesis is false, the test statistic diverges to +∞ in probability, whereas the bootstrap critical value remains asymptotically finite (say, close to the 95th percentile of the N(0,1) distribution), so the bootstrap procedure yields a consistent test.

3. MONTE CARLO SIMULATION RESULTS

In this section, we present Monte Carlo simulation results to examine the finite-sample performance of our T̂_n test.

3.1. Discrete Dependent Variable

In this simulation experiment, the dependent variable y is a {0,1} binary variable. We use slightly different notation in this section: x denotes x^c and z denotes x^d. The data generating process (DGP) for the null model is

DGPa: y_i = I(β_0 + β_1 x_i + β_2 z_i + u_i > 0),

where {x_i}_{i=1}^n is a random sample from N(0,1); z_i takes binary values {0,1}, with case (i) Pr[z_i = 1] = Pr[z_i = 0] = 1/2 and case (ii) Pr[z_i = 1] = 0.8 and Pr[z_i = 0] = 0.2; and the error term {u_i} is i.i.d. N(0,1). Moreover, x_i, z_i, and u_i are all independent of one another. The true parameters are {β_0, β_1} = {1,1}, and β_2 ∈ {1, 0.3, 0}; β_2 = 0 corresponds to the case in which z_i is in fact an irrelevant variable. This leads to the following null hypothesis:

H_0: Pr(y_i = 1 | x_i, z_i) = Φ(β_0 + β_1 x_i + β_2 z_i),

where Φ(·) is the standard normal cumulative distribution function. The parametric conditional density of the null model is estimated by the maximum likelihood (ML) method.
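Generating DGPa is straightforward; a sketch (our own, using the case (i) z-probabilities):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
b0, b1, b2 = 1.0, 1.0, 0.3                # beta_2 in {1, 0.3, 0}; 0 makes z irrelevant
x = rng.standard_normal(n)                 # x_i ~ N(0,1)
z = rng.binomial(1, 0.5, size=n)           # case (i): Pr[z=1] = 1/2
u = rng.standard_normal(n)                 # u_i ~ N(0,1), independent of (x, z)
y = (b0 + b1 * x + b2 * z + u > 0).astype(int)
# Null conditional probability: Pr[y=1|x,z] = Phi(b0 + b1*x + b2*z)
p_null = norm.cdf(b0 + b1 * x + b2 * z)
```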

The following two alternative DGPs are constructed to examine the power of the T̂_n test; the first has a nonlinear term in the index, and the second has a conditionally heteroskedastic error:

DGP1a: y_i = I(β_0 + β_1 x_i + β_2 z_i + β_3 x_i² + u_i > 0);
DGP2a: y_i = I(β_0 + β_1 x_i + β_2 z_i + u_i > 0), with the variance of u_i depending on x_i,

where x_i, z_i, and u_i are generated in the same way as before; β_0, β_1, and β_2 take the same values as previously, whereas β_3 = 1. We use the parametric bootstrap described earlier to approximate the null distribution of the test statistic T̂_n.

Our test will be compared with the CK test of Andrews (1997), whose test statistic CK_n is defined as

CK_n = max_{1≤i≤n} | n^{−1/2} Σ_{j=1}^n [ I(y_j ≤ y_i) − F(y_i | x_j, z_j, θ̂) ] I(x_j ≤ x_i) I(z_j ≤ z_i) |,

where F(·|·,·,θ) is the parametric conditional distribution function and θ̂ is the ML estimator of θ_0.
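For comparison, here is a sketch of CK_n in the form reconstructed above. F_null(y, Xrow) stands for the fitted F(y|x,z,θ̂), and the componentwise-ordering implementation is our own illustration; see Andrews (1997) for the precise definition.

```python
import numpy as np

def ck_statistic(y, X, F_null):
    """CK_n = max_i | n^{-1/2} sum_j [1(y_j <= y_i) - F(y_i|x_j, theta_hat)]
                               * 1(x_j <= x_i componentwise) |."""
    n = len(y)
    ck = 0.0
    for i in range(n):
        dominated = np.all(X <= X[i], axis=1)        # 1(x_j <= x_i), componentwise
        resid = (y <= y[i]).astype(float) - np.array([F_null(y[i], X[j]) for j in range(n)])
        ck = max(ck, abs(np.sum(resid * dominated)) / np.sqrt(n))
    return ck
```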

The sample sizes considered are n = 100 and 200, the numbers of simulations are 5,000 for size estimation and 2,000 for power estimation, and the number of bootstraps is B = 1,000 for all cases. The simulation results for discrete yi with relevant covariates only are reported in Table 1.

Table 1. DGPa: The case of discrete y_i with relevant covariates

From Table 1 we observe that for different values of β_2 (with β_2 = 1, 0.3) and different values of Pr(z_i = 1) (0.5, 0.8), the performances of the T̂_n and Andrews' tests are qualitatively the same. Overall, the estimated sizes are quite close to their nominal sizes for both tests. The power performances are mixed for the two alternative models. For the alternative DGP1a with an extra quadratic term, our T̂_n test shows higher power than Andrews' test for the sample sizes considered. However, for some cases of DGP2a with a heteroskedastic error term, Andrews' test is slightly more powerful than ours. The simulation results show that our T̂_n test complements Andrews' test.

Next we consider the case with an irrelevant covariate. We use the same DGP as before except that we now choose β_2 = 0, so that the binary discrete variable z becomes an irrelevant covariate. Because this information is unknown a priori, we still compute the conditional probability of y conditional on both x and z. In this case we expect that the cross-validation method tends to select the upper bound value λ = 1/2, so that the irrelevant covariate z is smoothed out automatically, resulting in a finite-sample power gain for the T̂_n test.

From Table 2 we observe that the power of the T̂_n test improves substantially compared with the results reported in Table 1. It is interesting to observe that for DGP2a, the power performance of the T̂_n test is now quite comparable to that of Andrews' test. Thus, the simulation results confirm that our cross-validation-based test indeed has the ability to remove irrelevant covariates and enjoys superior finite-sample power performance.

Table 2. The case of discrete y_i with irrelevant covariates

3.2. Continuous Dependent Variable

In this section we consider the case where both y and x are continuous variables, and we compare the finite-sample performance of Zheng's original test with that of our T̂_n test. The first DGP we use is the same as that in Zheng. The null model is a linear regression model with normal homoskedastic errors:

DGPb: y_i = β_0 + β_1 x_i + u_i,

where {x_i}_{i=1}^n is a random sample from N(0,1) and the error term {u_i} is i.i.d. N(0,σ²); x_i and u_i are independent of each other. The true parameters are {β_0, β_1, σ} = {1,1,1}. This leads to the following null hypothesis:

H_0: p(y|x) = σ^{−1} φ((y − β_0 − β_1 x)/σ),

where φ(·) is the standard normal density function. The parameter θ = (β_0, β_1, σ) is estimated by the ML estimation method.
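The continuous-y null model is equally easy to simulate (our sketch):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 200
b0, b1, sigma = 1.0, 1.0, 1.0
x = rng.standard_normal(n)
y = b0 + b1 * x + sigma * rng.standard_normal(n)      # DGPb
# Null conditional density: f(y|x, theta) = phi((y - b0 - b1*x)/sigma)/sigma
def f_null(yy, xx, b0=1.0, b1=1.0, sigma=1.0):
    return norm.pdf((yy - b0 - b1 * xx) / sigma) / sigma
```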

Two alternative models are considered: the first (DGP1b) is designed to detect misspecification of the regression function, and the second (DGP2b) is designed to detect heteroskedasticity of the error term; the coefficient β_2 on the additional component is set to 1 in the experiment. We also report Andrews' test for comparison purposes. The simulation results are reported in Table 3a.

Table 3a. DGPb: The case of continuous y_i

We observe from Table 3a that the parametric bootstrap method successfully overcomes the size distortion of Zheng's test. The estimated sizes of the bootstrap test are all close to their nominal values, whereas Zheng's test based on the asymptotic normal approximation is significantly undersized. For the alternatives DGP1b and DGP2b, we observe that the bootstrap T̂_n test is much more powerful than Zheng's test. There are two reasons for this. First, the bootstrap test corrects the undersize problem of Zheng's test and hence improves finite-sample power. Second, we use the data-driven cross-validation method to select the smoothing parameters, which leads to optimal smoothing in estimating the unknown conditional density functions, whereas Zheng suggested selecting the smoothing parameters by ad hoc methods; the use of optimal smoothing also enhances the finite-sample power of the test. For DGP1b, Andrews' test has power similar to that of the T̂_n test, whereas for DGP2b, Andrews' test is less powerful than the T̂_n test.

Finally, we consider a case in which there is an irrelevant continuous variable. We use basically the same setup as in DGPb except that we now set β_2 = 0, so that x_{2i} becomes an irrelevant variable. This information is not used in the estimation; that is, all estimation methods still use the full data set {y_i, x_{1i}, x_{2i}}_{i=1}^n. Because our cross-validation-based T̂_n test has the advantage of (asymptotically) removing the irrelevant variable x_2, we expect the T̂_n test to enjoy further power gains. The simulation results are reported in Table 3b.

From Table 3b we observe that the T̂_n test has good estimated sizes. Zheng's test remains undersized at the 5% and 10% levels, and Andrews' test is also somewhat undersized when an irrelevant variable is present. From the estimated power results, we see a substantial power gain of the T̂_n test over Zheng's test. Essentially, Zheng's test is based on a two-dimensional nonparametric conditional density estimate, because the smoothing parameters in Zheng's test are selected by ad hoc rules that cannot detect the irrelevant variable x_2, whereas our T̂_n test estimates, asymptotically, a one-dimensional conditional density function, because x_{2i} is smoothed out asymptotically. The T̂_n test is also more powerful than Andrews' test for this DGP (when there is an irrelevant continuous variable). Of course, we report only limited simulation results here; based on the local power analysis, we expect that there exist data generating processes for which Andrews' test will be more powerful than the T̂_n test. Our simulation results show that the T̂_n test can serve as a useful complement to Andrews' test when one is interested in testing a parametric conditional distribution.

4. CONCLUSIONS

This paper proposes a kernel-based bootstrap test for parametric conditional distribution functions. We separately consider the case where y is a discrete variable and where y is a continuous variable. In either case, the conditional variables can contain both discrete and continuous variables. By automatically smoothing both the discrete and continuous variables via the method of cross-validation, our test has the advantage of automatically removing irrelevant variables from the estimate of the conditional density function and, as a consequence, enjoys substantial power gains in finite-sample applications, as confirmed by our simulation results. The test is potentially applicable in a wide variety of applications and should prove useful to applied researchers.

APPENDIX

We first state conditions that are used to prove Theorem 2.1.

(C1) {y_i, x_i}_{i=1}^n are i.i.d. data with joint density p(y,x). The first-order derivatives of p(·,·) with respect to its continuous arguments are uniformly bounded. The marginal density p_1(x) of x_i and its first-order derivatives with respect to the continuous arguments are uniformly bounded.

(C2) (i) The parameter space Θ is a compact and convex subset of R^k. Let ∥·∥ denote the Euclidean norm. Then f(y|x,θ_0)^{−1}, ∥∂f(y|x,θ)/∂θ∥, ∥∂² log f(y|x,θ)/∂θ∂θ′∥, and ∥[∂ log f(y|x,θ)/∂θ][∂ log f(y|x,θ)/∂θ′]∥ are all bounded by a nonnegative function b(x,y) with ∫ b(x,y)^s < ∞ (s = 1,2), where ∫ denotes integration over the continuous variables and summation over the discrete variables. (ii) √n(θ̂ − θ_0) = O_p(1) under H_0.

(C3) w(·) is a nonnegative, bounded, symmetric function with ∫ w(v) dv = 1 and ∫ w(v) v² dv = c < ∞.

The preceding conditions are basically the same as those used in Zheng (2000).

We give the central limit theorem (CLT) of Hall (1984, Thm. 3.1) for degenerate U-statistics as a lemma here.

LEMMA A.1. Let U_n = [2/(n(n−1))] Σ_i Σ_{j>i} H_n(z_i,z_j) be a second-order U-statistic, where {z_i}_{i=1}^n is i.i.d. Suppose that E[H_n(z_i,z_j)|z_i] = 0 (for i ≠ j, U_n is a degenerate U-statistic) and define G_n(z_1,z_2) = E[H_n(z_3,z_1)H_n(z_3,z_2)|z_1,z_2]. If

{ E[G_n²(z_1,z_2)] + n^{−1} E[H_n⁴(z_1,z_2)] } / { E[H_n²(z_1,z_2)] }² → 0,    (A.1)

then

n E[H_n²(z_1,z_2)]^{−1/2} U_n / √2 → N(0,1) in distribution.

In the proof presented subsequently, we replace (ĥ_1,…,ĥ_q, λ̂_1,…,λ̂_r) by their nonstochastic leading terms: (h_1,…,h_q) = (a_1^0 n^{−1/(q+4)},…, a_q^0 n^{−1/(q+4)}) and (λ_1,…,λ_r) = (b_1^0 n^{−2/(q+4)},…, b_r^0 n^{−2/(q+4)}). This greatly simplifies the arguments in the proof. By the stochastic equicontinuity result of Ichimura (2000) (see Lemma A.4, which follows), we know that the conclusion continues to hold provided that n^{1/(q+4)} ĥ_s − a_s^0 = o_p(1) (s = 1,…,q) and n^{2/(q+4)} λ̂_s − b_s^0 = o_p(1) (s = 1,…,r), which hold by Theorem 3.1 of Hall et al. (2004).

Using the shorthand notation K_{γ,ij} = K_γ(x_i,x_j), f_i = f(y_i|x_i,θ_0), f̂_i = f(y_i|x_i,θ̂), f_{ij} = f(y_i|x_j,θ_0), and f̂_{ij} = f(y_i|x_j,θ̂), together with the identity

f̂_i^{−1} = f_i^{−1} − (f̂_i − f_i)/(f_i f̂_i),

we can write T_n = T_{n1} + T_{n2} + T_{n3}, where T_{n1} is the leading term, T_{n2} is linear in (f̂_i − f_i), and T_{n3} is quadratic in (f̂_i − f_i).

Let θ̄ denote a point on the line segment between θ̂ and θ_0. Expanding f̂_{ij} around θ_0 by a Taylor expansion, we have T_{n1} = T_{n1,1} + (θ̂ − θ_0)′T_{n1,2} + (θ̂ − θ_0)′T_{n1,3}(θ̄)(θ̂ − θ_0), where the definitions of T_{n1,j} (j = 1,2,3) should be apparent.

The term T_{n1,1} can be written as a second-order U-statistic (z_i = (x_i,y_i)):

T_{n1,1} = [2/(n(n−1))] Σ_i Σ_{j>i} H_n(z_i,z_j),

where H_n(z_i,z_j) = (1/2){ K_{γ,ij}[I(y_j = y_i) − f_{ij}] f_i^{−1} + K_{γ,ij}[I(y_i = y_j) − f_{ji}] f_j^{−1} }. It is easy to check that E[H_n(z_i,z_j)|z_i] = 0 and, similarly, E[H_n(z_i,z_j)|z_j] = 0. Thus, E[H_n(z_i,z_j)|z_i] = 0 and T_{n1,1} is a degenerate U-statistic.

Next we evaluate E[H_n²(z_i,z_j)]. Using l(x_{is}^d, x_{js}^d, λ_s) = λ_s/(c_s − 1) when x_{is}^d ≠ x_{js}^d, together with a standard change-of-variable argument for the continuous components, one obtains the exact order

E[H_n²(z_i,z_j)] ≍ (h_1 ⋯ h_q)^{−1}.    (A.3)

Equation (A.3) implies that {E[H_n²(z_i,z_j)]}^{−1} = O(h_1 ⋯ h_q). Similarly, one can show that E[H_n⁴(z_i,z_j)] = O((h_1 ⋯ h_q)^{−3}). Define G_n(z_1,z_2) = E[H_n(z_3,z_2)H_n(z_3,z_1)|z_1,z_2]. One can show that E[G_n²(z_i,z_j)] = O((h_1 ⋯ h_q)^{−1}). Thus, the left-hand side of (A.1) becomes

{ O((h_1 ⋯ h_q)^{−1}) + n^{−1} O((h_1 ⋯ h_q)^{−3}) } / O((h_1 ⋯ h_q)^{−2}) = O(h_1 ⋯ h_q) + O(n^{−1}(h_1 ⋯ h_q)^{−1}) = o(1).

Thus, by Lemma A.1, we know that

n(h_1 ⋯ h_q)^{1/2} T_{n1,1}/σ_0 → N(0,1) in distribution.

Define T̃_{n1,1} in the same way as T_{n1,1} except that θ_0 is replaced by θ̂. Applying Lemma 3.1 of Powell, Stock, and Stoker (1989) or Lemma 1 of Zheng (2000), it is straightforward to show that n(h_1 ⋯ h_q)^{1/2}(T̃_{n1,1} − T_{n1,1}) = o_p(1). Thus, we have

n(h_1 ⋯ h_q)^{1/2} T̃_{n1,1}/σ_0 → N(0,1) in distribution.    (A.6)

Applying a Taylor expansion to T_{n2}, i.e., expanding f̂_i around θ_0, we obtain

T_{n2} = (θ̂ − θ_0)′T_{n2,1} + (θ̂ − θ_0)′T_{n2,2}(θ̄)(θ̂ − θ_0),

where T_{n2,1} is the term linear in the score and T_{n2,2} is the second-order coefficient evaluated at θ̄.

Lemma A.2, which follows, shows that T_{n1,2} = O_p(n^{−1/2}) and T_{n2,1} = O_p(n^{−1/2}), and Lemma A.3 shows that T_{n1,3} = O_p(1), T_{n2,2} = O_p(1), and T_{n3} = O_p(n^{−1}). These results, together with √n(θ̂ − θ_0) = O_p(1), lead to

n(h_1 ⋯ h_q)^{1/2}(T_n − T_{n1,1}) = o_p(1).    (A.8)

Expressions (A.6) and (A.8) together complete the proof of Theorem 2.1. █

LEMMA A.2.

(i) T_{n1,2} = O_p(n^{−1/2}).

(ii) T_{n2,1} = O_p(n^{−1/2}).

Proof of (i). First note that E[T_{n1,2}] = 0 because E[K_{γ,ij} f_{ij}^{(1)}/f_i] = 0 under H_0. Hence, E{[T_{n1,2}]²} is a sum (over the indices i, j, i′, j′) of terms of the form E[K_{γ,ij} K_{γ,i′j′}(f_{ij}^{(1)}/f_i)′(f_{i′j′}^{(1)}/f_{i′})], normalized by [n(n−1)]^{−2}. Such a term is zero if i, j, i′, j′ all take different values (because E[K_{γ,ij} f_{ij}^{(1)}/f_i] = 0). Therefore, for E{[T_{n1,2}]²} to be nonzero, we must have either (i) i, j, i′, j′ taking three different values or (ii) i, j, i′, j′ taking two different values. For these two cases it is easy to show that the corresponding contributions are at most O(n^{−1}). Hence, E{[T_{n1,2}]²} = O(n^{−1}), and consequently, T_{n1,2} = O_p(n^{−1/2}). █

Proof of (ii). One can write T_{n2,1} = [2/(n(n−1))] Σ_i Σ_{j>i} V_{1n}(z_i,z_j) as a second-order U-statistic, where V_{1n}(z_i,z_j) = (1/2)[A_{1n}(z_i,z_j) + A_{1n}(z_j,z_i)] and A_{1n}(z_i,z_j) = A_{1n,1}(z_i,z_j) − A_{1n,2}(z_i,z_j) collects the two terms arising from the expansion, with E[A_{1n,1}(z_i,z_j)] = E[A_{1n,2}(z_i,z_j)]. Hence, E[A_{1n}(z_i,z_j)] = E[A_{1n,1}(z_i,z_j)] − E[A_{1n,2}(z_i,z_j)] = 0.

Projecting V_{1n} onto z_i gives E[V_{1n}(z_i,z_j)|z_i] = v_{1i} + (s.o.), where, in the preceding expression, A_i = B_i + (s.o.) means that Σ_i A_i = Σ_i B_i + o_p(Σ_i B_i), i.e., Σ_i B_i is the leading term of Σ_i A_i. Here v_{1i} = (1/2)p_1(x_i)[f_i^{(1)} − E(f_i^{(1)}|x_i)], and we have used E[f_{ji}^{(1)}|x_i] = E[f_i^{(1)}|x_i] = Σ_y f(y|x_i)². Using the H-decomposition of the U-statistic around this projection, we obtain T_{n2,1} = O_p(n^{−1/2}). █

LEMMA A.3.

(i) T_{n1,3} = O_p(1).

(ii) T_{n2,2} = O_p(1).

(iii) T_{n3} = O_p(n^{−1}).

Proof of (i). Recall that T_{n1,3} involves the second-order derivative f^{(2)} evaluated at θ̄. By assumption (C2) (b(·,·) is the bounding function for f^{(2)}(·)),

E[∥T_{n1,3}∥] ≤ E[K_{γ,ij} b(x_j,y_i) f_i^{−1}] = O(1),

which implies that T_{n1,3} = O_p(1). █

Proof of (ii). It is similar to the proof of (i) and is thus omitted here.

Proof of (iii). Using the identity f̂_i^{−1} = f_i^{−1} − (f̂_i − f_i)/(f_i f̂_i) together with a Taylor expansion of f̂_i around θ_0, where θ̄ lies on the line segment between θ̂ and θ_0, we can write T_{n3} = (θ̂ − θ_0)′T_{n3,1}(θ̂ − θ_0), where T_{n3,1} has leading term T_{n3,1,0}. It is easy to show that E[∥T_{n3,1,0}∥] = O(1). Hence, T_{n3,1,0} = O_p(1), which implies that T_{n3,1} = O_p(1) and T_{n3} = O_p(n^{−1}) because ∥θ̂ − θ_0∥² = O_p(n^{−1}). █

LEMMA A.4 (Ichimura). n(h_1 ⋯ h_q)^{1/2}(T_{n,γ̂} − T_{n,γ}) = o_p(1),

where γ = (h_1,…,h_q, λ_1,…,λ_r) with h_s = a_s^0 n^{−1/(q+4)} (s = 1,…,q) and λ_s = b_s^0 n^{−2/(q+4)} (s = 1,…,r), and a_s^0 > 0 and b_s^0 ≥ 0 are the uniquely defined constants given in Hall et al. (2004).

Ichimura (2000) has proved a general result that includes Lemma A.4 as a special case. Here, we provide an alternative proof for Lemma A.4 using a simple tightness argument (e.g., Mammen, 1992). Our proof consists of two parts: (i) n(h_1 ⋯ h_q)^{1/2}(T_{n,γ̂} − T_{n,γ}) = o_p(1) under H_0; (ii) the analogous result for the studentizing quantity. Because the proofs are similar, we only provide the proof for (i).

Proof of (i). Write ĉ = (n^{1/(q+4)} ĥ_1,…, n^{1/(q+4)} ĥ_q, n^{2/(q+4)} λ̂_1,…, n^{2/(q+4)} λ̂_r) and c^0 = (a_1^0,…,a_q^0, b_1^0,…,b_r^0). By Theorem 3.1 of Hall et al. (2004), we know that ĉ → c^0 (in probability). This implies that ĉ ∈ C in probability, where C = Π_{s=1}^q [a_{1s}, a_{2s}] × Π_{t=1}^r [b_{1t}, b_{2t}], in which a_{js} and b_{jt} (j = 1,2) are some positive constants with a_{1s} < a_s^0 < a_{2s} (s = 1,…,q) and b_{1t} < b_t^0 < b_{2t} (t = 1,…,r). Then Lemma A.5, which follows, shows that A_n(c) ≡ n(h_1 ⋯ h_q)^{1/2} T_n (with h_s = a_s n^{−1/(q+4)} and λ_s = b_s n^{−2/(q+4)}) is tight in C.

Define B_n(c) = A_n(c) − A_n(c^0). Then (i) becomes B_n(ĉ) = o_p(1); i.e., we want to show that, for all ε > 0,

lim_{n→∞} P( |B_n(ĉ)| > ε ) = 0.    (A.9)

For any δ > 0, denote the δ-ball centered at c^0 by C_δ = {c : ∥c − c^0∥ ≤ δ}, where ∥·∥ denotes the Euclidean norm of a vector. By Lemma A.5 we know that A_n(·) is tight. By the Arzelà–Ascoli theorem (see Billingsley, 1968, Thm. 8.2, p. 55), tightness implies the following stochastic equicontinuity condition: for all ε > 0 and η_1 > 0, there exist a δ (0 < δ < 1) and an N_1 such that

P( sup_{c∈C_δ} |A_n(c) − A_n(c^0)| > ε ) < η_1    (A.10)

for all n ≥ N_1. Expression (A.10) implies that

P( sup_{c∈C_δ} |B_n(c)| > ε ) < η_1    (A.11)

for all n ≥ N_1. Also, from ĉ → c^0 in probability, we know that for all η_2 > 0, and for the δ given previously, there exists an N_2 such that

P( ∥ĉ − c^0∥ > δ ) < η_2    (A.12)

for all n ≥ N_2. Therefore,

P( |B_n(ĉ)| > ε ) ≤ P( sup_{c∈C_δ} |B_n(c)| > ε ) + P( ĉ ∉ C_δ ) < η_1 + η_2    (A.13)

for all n ≥ max{N_1, N_2}, by (A.11) and (A.12), where we have also used the fact that the event {|B_n(ĉ)| > ε, ĉ ∈ C_δ} is a subset of {sup_{c∈C_δ} |B_n(c)| > ε}.

Equation (A.13) is equivalent to (A.9). This completes the proof of (i). █

LEMMA A.5. Let A_n(c) = n(h_1 ⋯ h_q)^{1/2} T_{n,γ}, where γ = (h,λ), h_s = a_s n^{−1/(q+4)}, λ_s = b_s n^{−2/(q+4)}, and c = (a_1,…,a_q, b_1,…,b_r) with c_s ∈ [C_{1s}, C_{2s}] and 0 < C_{1s} < C_{2s} < ∞ (s = 1,…,q+r).

Then the stochastic process A_n(c) indexed by c is tight under the sup-norm.

Proof. Write K_{γ,ij} = (h_1 ⋯ h_q)^{−1} K_{c,ij} with h_s = a_s n^{−1/(q+4)} and λ_s = b_s n^{−2/(q+4)}, where K_{c,ij} = W((x_j^c − x_i^c)/h) L(x_j^d, x_i^d, λ). Also denote δ = q/(4+q), C_1 = (a_1,…,a_q)′, and C_2 = (b_1,…,b_r)′. Then we have (h_1 ⋯ h_q)^{−1} K_{c,ij} = (Π_s a_s)^{−1} n^δ W_{C_1,ij} L_{C_2,ij}. A direct moment calculation then gives

E| n^δ W_{C_1′,ij} L_{C_2′,ij} − n^δ W_{C_1,ij} L_{C_2,ij} |² ≤ D_1 n^δ ∥c′ − c∥²,    (A.14)

where D_1 > 0 is a finite constant. In the last equality we used |L_{C_2,ij}| ≤ 1 and assumption (C3); we also replaced one of the (C_1′)^{−1/2} factors by C_1^{−1/2}, which is justified because the a_s ∈ [C_{1s}, C_{2s}] are all bounded from above and away from zero, so the difference can be absorbed into D_1.

By noting that A_n(c′) − A_n(c) is a degenerate U-statistic and using (A.14), we have

E|A_n(c′) − A_n(c)|² ≍ D ∥c′ − c∥²,

where A ≍ B means that A and B have the same order of magnitude and D is a finite positive constant. Therefore, A_n(·) (and hence B_n(·)) is tight by Theorem 3.1 of Ossiander (1987). █

Proof of Theorem 2.3. We provide a proof for the discrete dependent variable case; the continuous case is similar. To prove (13), similar to the decomposition of T_n, we decompose T_n* as T_n* = T_{n1}* + T_{n2}* + T_{n3}*, where the definitions of the T_{nj}* are similar to those of the T_{nj} with the proper changes; i.e., {y_i, θ̂} need to be changed to {y_i*, θ̂*}. We further decompose T_{n1}* = T_{n1,1}* + T_{n1,2}* + T_{n1,3}*, where the definitions of the T_{n1,j}* are similar to those of the T_{n1,j} with the proper changes (j = 1,2,3).

The term T_{n1,1}* can be written as a second-order U-statistic (z_i* = (x_i*, y_i*) = (x_i, y_i*)):

T_{n1,1}* = [2/(n(n−1))] Σ_i Σ_{j>i} H_n*(z_i*, z_j*),

where H_n*(z_i*, z_j*) is defined in the same way as H_n(z_i,z_j) but with (y_i, y_j, θ_0) replaced by (y_i*, y_j*, θ̂). It is easy to check that E*[H_n*(z_i*, z_j*)|z_i*] = 0 and, similarly, E*[H_n*(z_i*, z_j*)|z_j*] = 0. Hence, E*[H_n*(z_i*, z_j*)|z_i*] = 0. Thus, conditional on the random sample {x_i, y_i}_{i=1}^n, T_{n1,1}* is a degenerate U-statistic.

Denote U_{n,ij}* = [2/(n(n−1))] H_n*(z_i*, z_j*) and define U_n* = Σ_i Σ_{j>i} U_{n,ij}* ≡ T_{n1,1}*. We apply the CLT of de Jong (1996) for generalized quadratic forms to derive the asymptotic distribution of U_n* | {x_i,y_i}_{i=1}^n. The reason for using de Jong's CLT instead of the one in Hall (1984) is that, in the bootstrap world, the function H_n*(z_i*, z_j*) depends on i and j, because z_i* = (x_i, y_i*). By de Jong (1996, Prop. 3.2) we know that U_n*/S_n* → N(0,1) in distribution in probability if G_I*, G_II*, and G_IV* are all o_p(S_n*⁴), where S_n*² = E*[U_n*²], G_I* = Σ_i Σ_{j>i} E*[U_{n,ij}*⁴], G_II* = Σ_i Σ_{j>i} Σ_{l>j>i} [E*(U_{n,ij}*² U_{n,il}*²) + E*(U_{n,ji}*² U_{n,jl}*²) + E*(U_{n,li}*² U_{n,lj}*²)], and G_IV* = (1/2) Σ_i Σ_{j>i} Σ_s Σ_{t>s} E*(U_{n,is}* U_{n,js}* U_{n,it}* U_{n,jt}*).

Now,

S_n*² = E*[U_n*²] = [2/(n(n−1))] E*[H_n*(z_i*, z_j*)²].

Hence, the order of S_n*² is determined by that of E*[H_n*(z_i*, z_j*)²]. By using a proof similar to the proof of Lemma A.4, one can show that S_n*² has the same order as S̃_n*², where S̃_n*² is defined in the same way as S_n*² except that γ̂ is replaced by γ. Hence, we only need to establish the order of S̃_n*². Because the discrete regressors do not affect this order, for clarity we establish the order of S̃_n*² for the case with continuous regressors only. We have

E*[H̃_n*(z_i*, z_j*)²] ≤ C (h_1 ⋯ h_q)^{−1} O_p(1),

where C > 0 is a constant, which implies that S̃_n*² = O_p(n^{−2}(h_1 ⋯ h_q)^{−1}). Hence, S_n*² = O_p(n^{−2}(h_1 ⋯ h_q)^{−1}).

Next, consider G_I*, G_II*, and G_IV*. Similar to S_n*², one can show that each has the same order as its counterpart evaluated at γ, given that n^{1/(q+4)} ĥ_s − a_s^0 = o_p(1) and n^{2/(q+4)} λ̂_s − b_s^0 = o_p(1). From the preceding calculation it should be apparent that the probability orders of G_I*, G_II*, and G_IV* are solely determined by the factors of n and (h_1 ⋯ h_q) entering through the moments of H_n*. Therefore, tedious but straightforward calculations show that G_k*/S_n*⁴ = o_p(1) for all k = I, II, IV, and we know that

U_n*/S_n* → N(0,1) in distribution in probability.    (A.16)

Next, define Ũ_n* in the same way as U_n* but with H̃_n*(z_i*, z_j*) in place of H_n*(z_i*, z_j*), where H̃_n*(z_i*, z_j*) is defined in the same way as H_n*(z_i*, z_j*) except that θ̂* is replaced by θ̂. Similar to the analysis of S_n*², one can show that U_n* − Ũ_n* is asymptotically negligible and that σ̂_0*² is a consistent estimator (in the bootstrap sense) of the asymptotic variance. These results, together with (A.16), lead to

n(h_1 ⋯ h_q)^{1/2} T_{n1,1}*/σ̂_0* → N(0,1) in distribution in probability.

The analysis of T_{n1,2}*, T_{n1,3}*, T_{n2}*, and T_{n3}* is similar to that of their counterparts in the proof of Theorem 2.1, and one can show that T_{n1,1}* is the leading term of T_n*. For example, in Lemma A.2(i) we showed that T_{n1,2} = O_p(n^{−1/2}) by proving that E[T_{n1,2}²] = O(n^{−1}); by similar arguments one can show that E*[T_{n1,2}*²] = O_p(n^{−1}). The details are omitted here to save space. Therefore, we conclude that n(h_1 ⋯ h_q)^{1/2} T_n*/σ̂_0* has the same asymptotic distribution as that of n(h_1 ⋯ h_q)^{1/2} T_n/σ̂_0 under H_0. Hence, (13) holds. Because N(0,1) is a continuous distribution, by Polya's theorem (Bhattacharya and Rao, 1986), we obtain Theorem 2.3. █

REFERENCES

Aitchison, J. & C.G.G. Aitken (1976) Multivariate binary discrimination by the kernel method. Biometrika 63, 413–420.
Andrews, D.W.K. (1997) A conditional Kolmogorov test. Econometrica 65, 1097–1128.
Bhattacharya, R.N. & R.R. Rao (1986) Normal Approximations and Asymptotic Expansions. Krieger.
Bierens, H.J. & W. Ploberger (1997) Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151.
Billingsley, P. (1968) Convergence of Probability Measures. Wiley.
Corradi, V. & N. Swanson (2004) Bootstrap conditional distribution tests under dynamic misspecification. Journal of Econometrics, forthcoming.
de Jong, R.M. (1996) The Bierens test under data dependence. Journal of Econometrics 72, 1–32.
Delgado, M.A. & W.G. Manteiga (2001) Significance testing in nonparametric regression based on the bootstrap. Annals of Statistics 29, 1469–1507.
Diebold, F.X., T. Gunther, & A.S. Tay (1998) Evaluating density forecasts with applications to financial risk management. International Economic Review 39, 863–883.
Fan, Y. (1994) Testing the goodness-of-fit of a parametric density function by kernel method. Econometric Theory 10, 316–356.
Fan, Y. (1997) Goodness-of-fit tests for a multivariate distribution by the empirical characteristic function. Journal of Multivariate Analysis 62, 36–63.
Fan, Y. (1998) Goodness-of-fit tests based on kernel density estimators with fixed smoothing parameters. Econometric Theory 14, 604–621.
Fan, Y. & Q. Li (1996) Consistent model specification tests: Omitted variables and semiparametric functional forms. Econometrica 64, 865–890.
Fan, Y. & Q. Li (2000) Consistent model specification tests: Nonparametric versus Bierens' ICM tests. Econometric Theory 16, 1016–1041.
Hall, P. (1984) Central limit theorem for integrated square error of multivariate nonparametric density estimators. Journal of Multivariate Analysis 14, 1–16.
Hall, P., J. Racine, & Q. Li (2004) Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association 99, 1015–1026.
Hong, Y. & H. White (1996) Consistent specification testing via nonparametric series regression. Econometrica 63, 1133–1159.
Ichimura, H. (2000) Asymptotic Distribution of Non-parametric and Semiparametric Estimators with Data Dependent Smoothing Parameters. Manuscript, University College London.
Kullback, S. & R.A. Leibler (1951) On information and sufficiency. Annals of Mathematical Statistics 22, 79–86.
Li, F. & G. Tkacz (2004) A consistent bootstrap test for conditional density functions with time-dependent data. Journal of Econometrics, forthcoming.
Mammen, E. (1992) When Does Bootstrap Work? Asymptotic Results and Simulations. Springer-Verlag.
Ossiander, M. (1987) A central limit theorem under metric entropy with L2 bracketing. Annals of Probability 15, 897–919.
Powell, J.L., J.H. Stock, & T.M. Stoker (1989) Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.
Wooldridge, J. (1992) A test for functional form against nonparametric alternatives. Econometric Theory 8, 452–475.
Zheng, J.X. (2000) A consistent test of conditional parametric distributions. Econometric Theory 16, 667–691.