
A CONSISTENT DIAGNOSTIC TEST FOR REGRESSION MODELS USING PROJECTIONS

Published online by Cambridge University Press:  03 November 2006

J. Carlos Escanciano
Affiliation:
Universidad de Navarra

Abstract

This paper proposes a consistent test for the goodness-of-fit of parametric regression models that overcomes two important problems of the existing tests, namely, the poor empirical power and size performance of the tests due to the curse of dimensionality and the subjective choice of parameters such as bandwidths, kernels, and integrating measures. We overcome these problems by using a residual marked empirical process based on projections (RMPP). We study the asymptotic null distribution of the test statistic, and we show that our test is able to detect local alternatives converging to the null at the parametric rate. It turns out that the asymptotic null distribution of the test statistic depends on the data generating process, and so a bootstrap procedure is considered. Our bootstrap test is robust to higher order dependence, in particular to conditional heteroskedasticity. For completeness, we propose a new minimum distance estimator constructed through the same RMPP as in the testing procedure. Therefore, the new estimator inherits all the good properties of the new test. We establish the consistency and asymptotic normality of the new minimum distance estimator. Finally, we present some Monte Carlo evidence that our testing procedure can play a valuable role in econometric regression modeling.

The author thanks Carlos Velasco and Miguel A. Delgado for useful comments. The paper has also benefited from the comments of two referees and the co-editor. This research was funded by the Spanish Ministry of Education and Science, reference number SEJ2004-04583/ECON, and by the Universidad de Navarra, reference number 16037001.

Type
Research Article
Copyright
© 2006 Cambridge University Press

1. INTRODUCTION

The purpose of the present paper is to develop a consistent, powerful, and simple diagnostic test for testing the adequacy of a parametric regression model with the property of being free of any user-chosen parameter (e.g., bandwidth) and, at the same time, being suitable for cases in which the covariate is of high or moderate finite dimension. Most consistent tests proposed in the literature give misleading results for this latter empirically relevant case. This problem is intrinsic and is often referred to as the "curse of dimensionality" in the regression literature; see Section 7.1 of Fan and Gijbels (1996) for some discussion on this problem. More precisely, let (Y,X′)′ be a random vector in a (d + 1)-dimensional Euclidean space, where Y represents the real-valued dependent (or response) variable, X is the d-dimensional explanatory variable, X ∈ R^d, and A′ denotes the matrix transpose of A. Under E|Y| < ∞, it is well known that the regression function m(x) = E[Y | X = x] is well defined. If in addition E|Y|² < ∞, then m(X) represents almost surely (a.s.) the "best" prediction of Y given X, in a mean square sense. Then, it is common in regression modeling to consider the following tautological expression:

Y = m(X) + ε,     (1)

where ε = Y − E[Y | X] is, by construction, the unpredictable part (in mean) of Y given X.

Much of the existing literature is concerned with parametric modeling in that m is assumed to belong to a given parametric family

{f(·,θ) : θ ∈ Θ},   with Θ a finite-dimensional parameter space,

and, by analogy, one considers the following parametric regression model:

Y = f(X,θ) + e(θ),

with f(X,θ) a parametric specification for the regression function m(X) and with e(θ) a random variable (r.v.), the disturbance of the model. Parametric regression models continue to be attractive to practitioners because these models have the appealing property that the parameter θ together with the functional form f(·,·) describes, in a very concise way, the relation between the response Y and the explanatory variable X. Because we do not know in advance the true regression model, to prevent wrong conclusions, every statistical inference that is based on the model f should be accompanied by a proper model check. As a matter of fact, a correct specification of m is important in model-based economic decisions and/or to interpret parameters correctly.

Note that

m(·) ∈ {f(·,θ) : θ ∈ Θ}

is tantamount to

E[e(θ0) | X] = 0 a.s. for some θ0 ∈ Θ.     (2)
There is a vast amount of literature on testing consistently the correct specification of a parametric regression model. Although the idea of the proposed consistent tests is similar in all cases, namely, comparing a parametric and a (semi-) nonparametric estimation of a functional of the conditional mean in (2), they can be divided into two classes of tests. The first class of tests uses nonparametric smoothing estimators of E[e(θ0) | X]. We call this approach the "local approach"; see Eubank and Spiegelman (1990), Eubank and Hart (1992), Wooldridge (1992), Yatchew (1992), Gozalo (1993), Härdle and Mammen (1993), Horowitz and Härdle (1994), Hong and White (1995), Zheng (1996), Li (1999), Horowitz and Spokoiny (2001), Koul and Ni (2004), and Guerre and Lavergne (2005) for some examples. A methodology related to the local approach is that of empirical likelihood procedures as proposed in Chen, Härdle, and Li (2003) and Tripathi and Kitamura (2003). The local approach requires smoothing of the data in addition to the estimation of the finite-dimensional parameter vector and leads to less precise fits. Tests based on the local approach have standard asymptotic null distributions, but their finite-sample distributions depend on the choice of a bandwidth (or similar) of the nonparametric estimator, which affects the inference procedures.

The second class of tests avoids smoothing estimation by means of reducing the conditional mean independence in (2) to an infinite (but parametric) number of unconditional orthogonality restrictions, i.e.,

E[e(θ0)w(X,x)] = 0   for almost all x ∈ Π,     (3)

where Π is a properly chosen space and the parametric family w(·,x) is such that the equivalence (3) holds; see Bierens and Ploberger (1997), Stinchcombe and White (1998), and Escanciano (2006) for primitive conditions on the family w(·,x) to satisfy this equivalence. We call the approach based on (3) the "integrated approach" because it uses the integrated (cumulative) measures of dependence E[e(θ0)w(X,x)]. In the literature the most frequently used weighting functions have been the exponential function, e.g., w(X,x) = exp(ix′X) in Bierens (1982, 1990), where i = √(−1) denotes the imaginary unit, and the indicator function w(X,x) = 1(X ≤ x); see, e.g., Stute (1997), Koul and Stute (1999), Whang (2000), and Li, Hsiao, and Zinn (2003), among many others. Different families w deliver different power properties of the integrated-approach-based tests. Most tests based on the integrated approach have nonstandard asymptotic null distributions, but they can be well approximated by bootstrap methods; see, e.g., Stute, Gonzalez-Manteiga, and Presedo-Quindimil (1998).

An important problem with the local approach arises when the dimension of the explanatory variable X is high or even moderate. The sparseness of the data in high-dimensional spaces leads most local-based test statistics to suffer a considerable bias, even for large sample sizes. This is an important practical limitation for most tests considered in the literature, because it is not uncommon in econometric modeling to have high-order models. Some statistical theories have been developed to overcome this problem; cf. generalized linear models (GLM) (see, e.g., McCullagh and Nelder, 1989) or single-index models (see, e.g., Powell, Stock, and Stoker, 1989). However, these theories are semiparametric and, therefore, need smoothing techniques. In addition, they do not cover all possible models.

Here, we propose a new consistent test within the integrated framework that compares very well to indicator- and exponential-based tests. The new test is simple to compute, does not need user-chosen parameters or high-dimensional numerical integration, is robust to higher order dependence (in particular to conditional heteroskedasticity), and presents excellent empirical power properties in finite samples; see Section 4. Furthermore, our test procedure provides a formalization of some well-known traditional exploratory tools based on residual-fitted values plots.

The organization of the paper is as follows. In Section 2 we define the residual marked process based on projections (RMPP) as the basis for our test statistic. In Section 3 we study the asymptotic null distribution and the behavior against Pitman's local alternatives of the new test statistic. For completeness of exposition, we consider in this section a new minimum distance estimator for the regression parameter based on the RMPP, and we show its consistency and asymptotic normality under similar assumptions as in the testing procedure. Also, because the asymptotic null distribution depends on the data generating process (DGP), a bootstrap procedure to approximate the asymptotic critical values of the test statistic is considered. In Section 4 we conduct a simulation exercise comparing the new proposed test with some competing tests considered in the literature. This Monte Carlo experiment shows that our new test can play a valuable role in parametric regression modeling. Proofs of the main results are deferred to Appendix A. Appendix B contains a simple algorithm to compute the new test statistic.

2. THE RESIDUAL MARKED PROCESS BASED ON PROJECTIONS

Let {Zi = (Yi,Xi′)′}i=1n be a sequence of independent and identically distributed (i.i.d.) (d + 1)-dimensional r.v.s defined on the probability space (Ω, F, P) and with the same distribution as Z = (Y,X′)′, with 0 < E|Y| < ∞. The main goal in this paper is to test the null hypothesis (2), i.e.,

H0 : E[Y | X] = f(X,θ0) a.s. for some θ0 ∈ Θ,

against the alternative

HA : P(E[Y | X] = f(X,θ)) < 1 for all θ ∈ Θ.
As argued before, one way to characterize H0 is by the infinite number of parametric unconditional moment restrictions

E[e(θ0)w(X,x)] = 0   for almost all x ∈ Π,

where the parametric family w(·,x) is such that the equivalence in (3) holds. Examples of such families are w(X,x) = 1(X ≤ x), w(X,x) = exp(ix′X), w(X,x) = sin(x′X), and w(X,x) = 1/(1 + exp(c − x′X)) with c ≠ 0; see the aforementioned references for many other families.
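As a concrete illustration (ours, not part of the paper), the four weighting families above can be written directly as functions of the data X and of the auxiliary parameter x; the logistic weight follows the form reconstructed above, with c ≠ 0 an arbitrary constant.

```python
import numpy as np

# Sketch of the weighting families w(X, x) listed above.
# X has shape (n, d); x is a d-vector (the auxiliary parameter); c != 0.

def w_indicator(X, x):
    # w(X, x) = 1(X <= x), understood componentwise for vector X
    return np.all(X <= x, axis=-1).astype(float)

def w_exponential(X, x):
    # w(X, x) = exp(i x'X), the complex exponential weight
    return np.exp(1j * (X @ x))

def w_sine(X, x):
    # w(X, x) = sin(x'X)
    return np.sin(X @ x)

def w_logistic(X, x, c=1.0):
    # w(X, x) = 1 / (1 + exp(c - x'X))
    return 1.0 / (1.0 + np.exp(c - X @ x))
```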

In view of a sample {Zi}i=1n, define the marked empirical process

Rn,w(x,θ) ≡ n^{−1/2} ∑_{i=1}^{n} (Yi − f(Xi,θ)) w(Xi,x),   x ∈ Π.     (5)

Define also Rn,w1(·) ≡ Rn,w(·,θn), where θn is a √n-consistent estimator of θ0. The marks in Rn,w1 are given by the classical residuals ei(θn) ≡ Yi − f(Xi,θn); therefore, we call Rn,w1 a residual marked empirical process.

Because of the equivalence (3), it is natural to base the tests on a distance from Rn,w1 to zero, i.e., on a norm Γ(Rn,w1), say. The most used norms are the Cramér–von Mises (CvM) and Kolmogorov–Smirnov (KS) functionals:

∫_Π |Rn,w1(x)|² Ψ(dx)   and   sup_{x∈Π} |Rn,w1(x)|,

respectively, where Ψ(x) is an integrating function satisfying some mild conditions; see A4 in Section 3. Other functionals are possible. Then, tests in the integrated approach reject the null hypothesis (2) for "large" values of Γ(Rn,w1).

The first consistent integrated test proposed in the literature was that of Bierens (1982) based on the exponential weighting family, i.e., using the residual marked process

n^{−1/2} ∑_{i=1}^{n} ei(θn) exp(ix′Φ(Xi)),

where Φ(·) is a bounded one-to-one Borel measurable mapping from R^d to R^d. Bierens (1982) considered a CvM norm with integrating measures Ψ(dx) = ϒ(x) dx, with ϒ(x) = 1(x ∈ ∏_{l=1}^{d}[−εl, εl]), where εl > 0, l = 1,…, d, are arbitrarily chosen numbers (Bierens, 1982, p. 109), and ϒ(x) equal to a d-variate normal density function (Bierens, 1982, p. 111).

On the other hand, Stute (1997) used the indicator family w(X,x) = 1(X ≤ x) in the residual marked process. The main advantage of the indicator weighting function over the exponential function is that it avoids the choice of an arbitrary integrating function Ψ, because in the indicator case this is given by the natural empirical distribution function of {Xi}i=1n. However, the indicator weight has the drawback of being more sensitive to the dimension d than the exponential weight, which is based on one-dimensional projections (see Escanciano, 2006).

In this paper we propose a test based on a new family {w,Ψ} of weighting and integrating functions that possesses the good properties of the exponential- and indicator-based tests and at the same time prevents their deficiencies. The new test avoids the arbitrary choice of the integrating function or numerical integration in high-dimensional spaces and is less sensitive to the dimension d than indicator-based tests because it is based on one-dimensional projections. The CvM test based on this new family presents an excellent performance in finite samples and is very simple to compute. In addition, the new family w formalizes some traditional model diagnostic tools based on residual-fitted values plots for linear models.

Our first aim is to avoid the problem of the curse of dimensionality. The following result can be viewed as a particularization of the Cramér–Wold principle to our main concern, the goodness-of-fit of the regression function. The term |A| denotes the Euclidean norm of A.

LEMMA 1. A necessary and sufficient condition for (2) to hold is that for any vector β ∈ R^d with |β| = 1,

E[e(θ0) | β′X] = 0 a.s.

Lemma 1 yields that consistent tests for H0 can be based on one-dimensional projections. In particular, we have the characterization of the null hypothesis H0:

E[e(θ0)1(β′X ≤ u)] = 0   for almost every (β,u) ∈ Π,

where from now on Π ≡ Sd × [−∞,∞] is the nuisance parameter space, with Sd the unit ball in R^d, i.e., Sd ≡ {β ∈ R^d : |β| = 1}. Therefore, the test we consider here rejects the null hypothesis for "large" values of the standardized sample analogue of E[e(θ0)1(β′X ≤ u)].

An approach related to ours is that of Stute and Zhu (2002), who considered the weighting family {1(β0′X ≤ u)} for model checks of GLM in an i.i.d. framework. However, note that they fix the direction to β0, the direction involved in the GLM, and so their approach is clearly different from that considered here, because we consider all the directions β in Sd simultaneously. As a consequence, our test will be consistent against all alternatives, whereas in our present framework the Stute and Zhu (2002) test is only consistent against alternatives satisfying that E[e(θ*)1(β*′X ≤ u)] ≠ 0 in a set with positive Lebesgue measure in R, where θ* and β* are the probabilistic limits under the alternative of the estimators of θ0 and β0, respectively.

The family 1(β′X ≤ u) yields the RMPP

Rn1(β,u) ≡ n^{−1/2} ∑_{i=1}^{n} ei(θn) 1(β′Xi ≤ u),   (β,u) ∈ Π.

The marks of Rn1 are given by the classical residuals and the "jumps" by the projected regressors. Note that for a fixed direction β, Rn1 is uniquely determined by the residuals and the projected variables {β′Xi}i=1n and vice versa. As in the usual residual-regressors plot, we can plot the path of Rn1 for different directions β as an exploratory diagnostic tool. In particular, in the linear model, the plot of the path of Rn1(βn,u), with βn the least squares estimator, resembles the usual residual-fitted values plot. Therefore, tests based on Rn1(βn,u) provide a formalization of such traditional well-known exploratory tools.
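To make this exploratory use concrete, the following sketch (ours; names and interface are illustrative) evaluates the path of Rn1(β,·) along a fixed direction β at the projected sample points, which can then be plotted against β′Xi in the spirit of a residual-fitted values plot.

```python
import numpy as np

def rmpp_path(residuals, X, beta):
    """Path of the residual marked process R_n^1(beta, u) along a fixed
    direction beta, evaluated at u = beta'X_i (illustrative sketch)."""
    n = len(residuals)
    proj = X @ beta                       # projected regressors beta'X_i
    order = np.argsort(proj)              # sort the jump points
    u = proj[order]
    # R_n^1(beta, u) = n^{-1/2} * sum_i e_i(theta_n) * 1(beta'X_i <= u)
    path = np.cumsum(residuals[order]) / np.sqrt(n)
    return u, path
```

For a fitted linear model one would take beta to be the (normalized) least squares coefficient vector, so that the horizontal axis is essentially the fitted value.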

To measure the distance from Rn1 to zero a norm has to be chosen. From computational considerations a CvM norm is very convenient in our context. Two facts motivate our choice of the integrating measure in the CvM norm. First, note that once the direction β is fixed, u lives in the projected regressor variable's space, and second, in principle, all the directions are equally important; cf. Lemma 1. To define our CvM test we need some notation. Let Fn,β(u) be the empirical distribution function of the projected regressors {β′Xi}i=1n and dβ the uniform density on the unit sphere. Also let Fβ(u) be the true cumulative probability distribution function (c.d.f.) of β′X. Then, we define the new CvM test as

PCvMn ≡ ∫_Π |Rn1(β,u)|² Fn,β(du) dβ.     (8)

Therefore, we reject the null hypothesis H0 for large values of PCvMn. See Appendix B for a simple algorithm to compute PCvMn from a given data set {Zi}i=1n. The next section justifies inference for PCvMn based on the asymptotic theory.
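Before turning to the asymptotics, the following sketch (ours, not the paper's algorithm) may help fix ideas: it approximates PCvMn by replacing the integral over the unit sphere with an average over randomly drawn directions β. The exact closed form used in the paper is given in Appendix B; note also that the overall normalization of the measure dβ only rescales the statistic and is immaterial once critical values are obtained by the bootstrap applied to the same statistic.

```python
import numpy as np

def pcvm_random_projections(residuals, X, n_dirs=500, rng=None):
    """Monte Carlo approximation of PCvM_n (illustrative sketch): average over
    random unit directions beta of the CvM distance of R_n^1(beta, .) with
    respect to the empirical distribution of the projections beta'X_i."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    betas = rng.standard_normal((n_dirs, d))
    betas /= np.linalg.norm(betas, axis=1, keepdims=True)   # uniform on the sphere
    total = 0.0
    for beta in betas:
        proj = X @ beta
        # R_n^1(beta, beta'X_j) for every sample point j
        R = np.array([(residuals * (proj <= u)).sum() for u in proj]) / np.sqrt(n)
        # integrate |R|^2 against the empirical distribution F_{n, beta}
        total += np.mean(R ** 2)
    return total / n_dirs
```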

Our test statistic PCvMn avoids the deficiencies of the Bierens (1982) and Stute (1997) tests, namely, the arbitrary choice of the integrating function or numerical integration in high-dimensional spaces and the low power performance when the dimension d is large, respectively. However, it is worthwhile to mention that our test is not necessarily better than the Bierens (1982) and Stute (1997) tests. In fact, using the results of Bierens and Ploberger (1997) it can be shown that all these tests are asymptotically admissible, and therefore none of them is strictly better than the others uniformly over the space of alternatives. However, in our simulations that follow we show that for the alternatives considered our test is the best or comparable to the best test. A simple intuition as to why our test performs so well with the alternatives considered is as follows. Under the alternative it can be shown that, uniformly in x ∈ Π,

n^{−1/2} Rn,w1(x) = E[e(θ1)w(X,x)] + oP(1),

where θ1 is the probabilistic limit of θn under the alternative HA. On the other hand, under the normalization E[m²(X,θ1)] = 1, where m(·,θ1) = E[e(θ1) | X = ·], it holds that the optimization problem

max_w E[e(θ1)w(X)]   subject to   E[w²(X)] = 1

attains its optimum at w*(·) = m(·,θ1). Therefore, as w(·,·) is closer to m(·,·), the test based on w is expected to have better power properties. It seems that for the models considered in Section 4 m(·,θ1) can be "well approximated" by our weight function 1(β′X ≤ u), and this might explain the good power properties of our test procedure.
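The optimality claim for w*(·) = m(·,θ1) is just the Cauchy–Schwarz inequality after conditioning on X; a one-line derivation (ours), under the normalization stated above:

```latex
% Assuming E[w^2(X)] = 1 and writing m(X,\theta_1) = E[e(\theta_1) \mid X]:
\begin{aligned}
E[e(\theta_1)\,w(X)]
  &= E\bigl[E\{e(\theta_1)\mid X\}\,w(X)\bigr]
   = E[m(X,\theta_1)\,w(X)] \\
  &\le \bigl(E[m^2(X,\theta_1)]\bigr)^{1/2}\bigl(E[w^2(X)]\bigr)^{1/2} = 1,
\end{aligned}
```

with equality if and only if w(X) is proportional to m(X,θ1) a.s.; under E[m²(X,θ1)] = 1 the maximizer is therefore w*(·) = m(·,θ1).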

During the revision process of the paper one of the referees suggested a modification of our test that might have better finite-sample performance. Based on the inequality

which follows from simple algebra, the modified test statistic is

However, contrary to PCvMn the latter test statistic involves numerical integration and is much more difficult to compute. Therefore, we do not study this modified test statistic further in the paper. Instead, the next section studies the asymptotic distribution theory for PCvMn.

3. ASYMPTOTIC THEORY

Now, we establish the limit distribution of Rn1 under the null hypothesis H0. For the asymptotic theory, note that Rn1 can be viewed as a mapping from (Ω, F, P) with values in ℓ∞(Π), the space of all real-valued functions that are uniformly bounded on Π. Let ⇒ denote weak convergence on ℓ∞(Π) and →P* denote convergence in outer probability; see Definitions 1.3.3 and 1.9.1, respectively, in van der Vaart and Wellner (1996). Also, →d stands for convergence in distribution of real r.v.s. To derive asymptotic results we consider the following assumptions. First, let us denote by FY(·) and FX(·) the marginal c.d.f. of Y and X, respectively. Also let Ψp(·) be the product measure of Fβ(·) and the uniform distribution on Sd, i.e., Ψp(dβ,du) = Fβ(du) dβ. In the discussion that follows C is a generic constant that may change from one expression to another.

Assumption A1.

A1(a) {Zi = (Yi,Xi′)′}i=1n is a sequence of i.i.d. random vectors with 0 < E|Yi| < ∞.

A1(b) E|ε|² < C.

Assumption A2. f(·,θ) is twice continuously differentiable in a neighborhood Θ0 of θ0, Θ0 ⊂ Θ. The score g(X,θ) ≡ (∂/∂θ′) f(X,θ) is such that there exists an FX(·)-integrable function M(·) with supθ∈Θ0 |g(·,θ)| ≤ M(·).

Assumption A3.

A3(a) The parameter space Θ is a compact subset of a finite-dimensional Euclidean space. The true parameter θ0 belongs to the interior of Θ. There exists a θ1 ∈ Θ such that |θn − θ1| = oP(1).

A3(b) The estimator θn satisfies the following asymptotic expansion under H0:

θn = θ0 + n^{−1} ∑_{i=1}^{n} l(Yi,Xi,θ0) + oP(n^{−1/2}),

where l(·) is such that E[l(Y,X,θ0)] = 0 and L(θ0) ≡ E[l(Y,X,θ0) l′(Y,X,θ0)] exists and is positive definite.

Assumption A4. Ψp(·) is absolutely continuous with respect to Lebesgue measure on Π.

Assumptions A1 and A2 are standard in the model checks literature; see, e.g., Bierens (1990) and Stute (1997). Assumption A3 is satisfied, e.g., for the nonlinear least squares estimator and (under further regularity assumptions) its robust modifications; see, e.g., Chapters 5 and 7 in Koul (2002). Note that A3(a) and A3(b) imply that θ0 = θ1 under the null H0, but they are not necessarily equal under the alternative. We shall show subsequently that A3 is also satisfied for a new minimum distance estimator. Assumption A4 is only necessary for consistency of the test.

Under A1 and (2), using a classical central limit theorem (CLT) for i.i.d. sequences, we have that the finite-dimensional distributions of Rn, where Rn is the process defined in (5) with θ = θ0 and w(X,x) = 1(β′X ≤ u), converge to those of a multivariate normal distribution with a zero mean vector and variance-covariance matrix given by the covariance function

K(x1,x2) ≡ E[ε² 1(β1′X ≤ u1) 1(β2′X ≤ u2)],     (9)

where x1 = (β1′,u1)′ and x2 = (β2′,u2)′. The next result is an extension of this convergence to weak convergence in the space ℓ∞(Π). Throughout the paper x = (β′,u)′ will denote the nuisance parameter, and we interchange the notation x and (β′,u)′ whenever this does not create confusion.

THEOREM 1. Under the null hypothesis H0 and Assumption A1,

Rn ⇒ R,

where R(·) is a Gaussian process with zero mean and covariance function given by (9).

In practice θ0 is unknown and has to be estimated from a sample {Zi}i=1n by an estimator θn, say. The next result shows the effect of the parameter uncertainty on the asymptotic null distribution of Rn1. To this end, let us define the function G(x,θ0) ≡ G(x) ≡ E[g(X,θ0)1(β′X ≤ u)] and let V be a normal random vector with zero mean and variance-covariance matrix given by L(θ0) as defined in A3(b).

THEOREM 2. Under the null hypothesis H0 and Assumptions A1–A3,

Rn1 ⇒ R1,

where R is the same process as in Theorem 1 and

R1(x) ≡ R(x) − G′(x)V.
Theorem 2 and the continuous mapping theorem (CMT) (see, e.g., van der Vaart and Wellner, 1996, Thm. 1.3.6) yield the asymptotic null distribution of the functional PCvMn.

COROLLARY 1. Under the assumptions of Theorem 2, for any continuous functional (with respect to the supremum norm) Γ(·),

Γ(Rn1) →d Γ(R1).

Furthermore,

PCvMn →d ∫_Π |R1(β,u)|² Fβ(du) dβ.
Note that the integrating measure in PCvMn is a random measure, but Corollary 1 shows that the asymptotic theory is not affected by this fact. Also note that the asymptotic null distribution of PCvMn depends in a complex way on the DGP and the specification under the null, and so critical values have to be tabulated for each model and each DGP, making the application of these asymptotic results difficult in practice. To overcome this problem we approximate the asymptotic null distribution of continuous functionals of Rn1 by a bootstrap procedure given subsequently.

In Assumption A3 we require that the estimator of θ0 admits an asymptotic linear representation. For completeness of the presentation we give some mild sufficient conditions under which a minimum distance estimator (see Koul, 2002, Ch. 5, and references therein) is asymptotically linear. Motivated by Lemma 1, we have that under the null

E[(Y − f(X,θ0))1(β′X ≤ u)] = 0   for almost every (β,u) ∈ Π,     (10)

and θ0 is the unique value that satisfies (10). Then, we propose estimating θ0 by the sample analogue of (10), i.e.,

θn ≡ arg min_{θ∈Θ} ∫_Π | n^{−1/2} ∑_{i=1}^{n} (Yi − f(Xi,θ)) 1(β′Xi ≤ u) |² Fn,β(du) dβ.     (11)

This estimator is a minimum distance estimator and extends in some sense the generalized method of moments (GMM) estimator, frequently used in econometric and statistical applications. This kind of generalization of GMM was first considered in Carrasco and Florens (2000) for univariate problems. Recently, and for w(X,x) = 1(X ≤ x), Dominguez and Lobato (2004) have considered an estimator similar to (11) for a conditional moment restriction in a time series context. Also using this principle, Koul and Ni (2004) have proposed a minimum distance estimator for θ0 using an L2-distance similar to that used in Härdle and Mammen (1993) in the "local approach." Our estimator θn has the advantage of being free of any user-chosen parameter (bandwidth, kernel, or integrating measure) and is expected to be more robust to the problem of the curse of dimensionality than the estimating procedures based on 1(X ≤ x) or on local approaches. Now, we shall show that θn in (11) satisfies Assumption A3. The following matrices are involved in the asymptotic variance-covariance matrix of the estimator:

For the consistency and asymptotic normality of the estimator we need an additional assumption.

Assumption A1′. The regression function f(·,θ) is such that there exists an FX(·)-integrable function Kf(·) with supθ∈Θ |f(·,θ)| ≤ Kf(·).

THEOREM 3. Under H0 and Assumptions A1, A2, and A1′

(i) the estimator given in (11) is consistent, i.e., θn → θ0 a.s.;

(ii) if, in addition, the matrix C is nonsingular, then

From Theorem 3 we immediately obtain the asymptotic linear expansion required in A3(b):

where now

Note that in general the estimator given in (11) is not asymptotically efficient. An asymptotically efficient estimator based on the same minimum distance principle can be constructed following the ideas of Carrasco and Florens (2000). This optimal estimator will require the choice of a regularization parameter needed to invert a covariance operator; see Carrasco and Florens (2000) for more details.
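For readers who wish to experiment with the minimum distance estimator in (11), the following sketch (ours) minimizes the sample criterion with a generic optimizer, replacing the exact integral over the sphere by an average over fixed random directions, as in the PCvMn approximation given earlier; f is a user-supplied regression function and theta0 a starting value.

```python
import numpy as np
from scipy.optimize import minimize

def md_criterion(theta, Y, X, f, betas):
    """Sample criterion behind the estimator in (11), with the integral over the
    sphere replaced by an average over the fixed directions `betas` (a sketch,
    not the paper's exact algorithm)."""
    n = len(Y)
    e = Y - f(X, theta)                      # residuals e_i(theta)
    value = 0.0
    for beta in betas:
        proj = X @ beta
        R = np.array([(e * (proj <= u)).sum() for u in proj]) / np.sqrt(n)
        value += np.mean(R ** 2)
    return value / len(betas)

def md_estimate(Y, X, f, theta0, n_dirs=200, seed=0):
    rng = np.random.default_rng(seed)
    betas = rng.standard_normal((n_dirs, X.shape[1]))
    betas /= np.linalg.norm(betas, axis=1, keepdims=True)
    result = minimize(md_criterion, np.asarray(theta0, dtype=float),
                      args=(Y, X, f, betas), method="Nelder-Mead")
    return result.x
```

For instance, for a linear specification one would take f = lambda X, theta: X @ theta and theta0 equal to the least squares estimate.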

Now we study the asymptotic distribution of Rn1 under a sequence of local alternatives converging to the null at the parametric rate n^{−1/2}. We consider the local alternatives

HA,n : E[Y | X] = f(X,θ0) + n^{−1/2} a(X) a.s.,     (12)

where the random variable a(X) is FX-integrable with zero mean and satisfies P(a(X) = 0) < 1. To derive the next result we need the following assumption.

Assumption A3′. The estimator θn satisfies the following asymptotic expansion under HA,n:

θn = θ0 + n^{−1} ∑_{i=1}^{n} l(Yi,Xi,θ0) + n^{−1/2} ξa + oP(n^{−1/2}),

where the function l(·) is as in Assumption A3 and ξa is a vector of the same dimension as θ0.

Remark 1. It is not difficult to show that θn in (11) satisfies Assumption A3′ under Assumptions A1, A2, and A1′ with

THEOREM 4. Under the local alternatives (12) and Assumptions A1, A2, and A3′,

Rn1 ⇒ R1 + Da,

where R1 is the process defined in Theorem 2 and the function Da(·) is the deterministic function

Da(β,u) ≡ E[a(X)1(β′X ≤ u)] − G′(β,u)ξa.
For some estimators, Da has an intuitive geometric interpretation. For instance, for the new minimum distance estimator (11) the shift function is given by

and represents the orthogonal projection in L2(Π,Ψp), the Hilbert space of all real-valued and Ψp-square-integrable functions on Π, of E[a(X)1(β′X ≤ u)] parallel to G(β,u). The next corollary is a consequence of the CMT and Theorem 4.

COROLLARY 2. Under the local alternatives (12) and Assumptions A1, A2, and A3′, for any continuous functional Γ(·),

Γ(Rn1) →d Γ(R1 + Da).

Furthermore,

PCvMn →d ∫_Π |R1(β,u) + Da(β,u)|² Fβ(du) dβ.
Note that because of Lemma 1, we have that

E[a(X)1(β′X ≤ u)] = 0 for almost every (β,u) ∈ Π if and only if a(X) = 0 a.s.

Therefore, from this result it is not difficult to show that the test based on PCvMn is able to detect asymptotically any local alternative a(·) that is not parallel to g(·,θ0). This result is not attainable for tests based on the local approach, e.g., the Härdle and Mammen (1993) test.

We have seen before that the asymptotic null distribution of continuous functionals of Rn1 depends in a complicated way on the DGP and the specification under the null. Therefore, critical values for the test statistics cannot be tabulated for general cases. Here we propose to implement the test with the assistance of a bootstrap procedure. Resampling methods have been extensively used in the model checks literature of regression models; see, e.g., Stute et al. (1998) or more recently Li et al. (2003). It is shown in these papers that the most relevant bootstrap method for regression problems is the wild bootstrap (WB) introduced in Wu (1986). We approximate the asymptotic null distribution of Rn1 by that of

Rn1*(β,u) ≡ n^{−1/2} ∑_{i=1}^{n} ei*(θn*) 1(β′Xi ≤ u),

where the sequence {ei*(θn*)}i=1n are the fixed design wild bootstrap (FDWB) residuals computed from ei*(θn*) = Yi* − f(Xi,θn*), where Yi* = f(Xi,θn) + ei(θn)Vi, θn* is the bootstrap estimator calculated from the data {(Yi*,Xi′)′}i=1n, and {Vi}i=1n is a sequence of i.i.d. random variables with zero mean, unit variance, and bounded support and also independent of the sequence {Zi}i=1n. Examples of {Vi}i=1n sequences are i.i.d. Bernoulli variates with

P(Vi = (1 − √5)/2) = (1 + √5)/(2√5)   and   P(Vi = (1 + √5)/2) = 1 − (1 + √5)/(2√5),     (13)

used in, e.g., Li et al. (2003). For other sequences see Mammen (1993). The reader is referred to Stute et al. (1998) for the theoretical justification of this bootstrap approximation and the assumptions needed. The results of these authors jointly with those proved here ensure that the proposed bootstrap test has a correct asymptotic level, is consistent, and is able to detect alternatives tending to the null at the parametric rate n^{−1/2}. The next section shows that this bootstrap procedure provides good approximations in finite samples.
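A compact sketch (ours) of the fixed design wild bootstrap scheme just described; `fit` stands for any estimator satisfying A3 (e.g., nonlinear least squares) and `statistic` for a functional of the residuals such as PCvMn, both supplied by the user.

```python
import numpy as np

def fdwb_pvalue(Y, X, f, fit, statistic, B=500, seed=0):
    """Fixed design wild bootstrap p-value for a residual-based test statistic
    (illustrative sketch of the scheme described above, not the paper's code)."""
    rng = np.random.default_rng(seed)
    theta_n = fit(Y, X)
    e = Y - f(X, theta_n)                         # residuals e_i(theta_n)
    t_obs = statistic(e, X)
    # two-point distribution in (13): zero mean, unit variance, bounded support
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    p_a = (1 + np.sqrt(5)) / (2 * np.sqrt(5))
    exceed = 0
    for _ in range(B):
        V = rng.choice([a, b], size=len(Y), p=[p_a, 1 - p_a])
        Y_star = f(X, theta_n) + e * V            # Y_i* = f(X_i, theta_n) + e_i(theta_n) V_i
        theta_star = fit(Y_star, X)               # bootstrap estimator theta_n*
        e_star = Y_star - f(X, theta_star)        # FDWB residuals e_i*(theta_n*)
        exceed += statistic(e_star, X) >= t_obs
    return exceed / B
```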

4. MONTE CARLO EVIDENCE

In this section we compare the new CvM test with some competing integrated-approach-based tests proposed in the literature. This study complements others considered in the literature; see, e.g., Miles and Mora (2003). We briefly describe our simulation setup. We denote by PCvMn the new CvM test defined in (8). For the explicit computation of PCvMn see Appendix B.

Bierens (1982, p. 111) proposed the CvM test statistic based on the exponential weight function w(X,x) = exp(ix′X) and the d-variate normal density function as the integrating function, i.e.,

We also consider here the CvM and KS statistics defined in Stute (1997), which are given by

CvMn ≡ n^{−1} ∑_{j=1}^{n} |Rn,w1(Xj)|²   and   KSn ≡ max_{1≤j≤n} |Rn,w1(Xj)|,   with w(X,x) = 1(X ≤ x),

respectively. Note that CvMn and PCvMn are the same test statistics when d = 1, by definition.

Recently, Stute and Zhu (2002) have considered an innovation process transformation of Rn1(βn,u) for testing the correct specification of GLMs, where βn is a suitable estimator of the GLM parameter, say, β0. More concretely, their test statistic is the CvM test

where

ann(u) and σnn2(u) are Nadaraya–Watson estimators of aβ0(u) = E[g(X,θ0) | β0′X = u] and σβ02(u) = E[ε² | β0′X = u], respectively,

is the 99% quantile of Fn,βn. Under the correct specification of the GLM and some additional assumptions,

SZn →d ∫01 B²(u) du,

where B(·) denotes a standard Brownian motion on [0,1]; see Stute and Zhu (2002) for further details. For the nonparametric estimators we have chosen a Gaussian kernel with bandwidth h = 0.5n^{−1/2}, as in Stute and Zhu (2002).

We consider the same FDWB for the version of the exponential Bierens test and for the Stute (1997) test as for our CvM test PCvMn. For SZn we consider empirical critical values based on 10,000 simulations on the first null model in each block of models. In the discussion that follows, εi ∼ iid N(0,1) and νi ∼ iid exp(1) are standard Gaussian and centered exponential noises, respectively. We consider in the simulations two blocks of models. In the first block the null model is

E[Yi | Xi] = Xi′α,   Xi ≡ (1, X1i, X2i)′,

where X1i = (Wi + W1i)/2 and X2i = (Wi + W2i)/2, and Wi, W1i, and W2i are i.i.d. U[0,2π], independent of εi, 1 ≤ i ≤ n. We examine the adequacy of this model under the following DGPs:

  1. DGP1: Yi = 1 + X1i + X2i + εi = Xi′α0 + εi.
  2. DGP1-EXP: Yi = 1 + X1i + X2i + νi = Xi′α0 + νi.
  3. DGP2: Yi = Xi′α0 + 0.1(W1i − π)(W2i − π) + εi.
  4. DGP3: Yi = Xi′α0 + Xi′α0 exp{−0.01(Xi′α0)²} + εi.
  5. DGP4: Yi = Xi′α0 + cos(0.6πXi′α0) + εi.

DGP1 and DGP2 are considered in Hong and White (1995). DGP3 here is similar to their DGP3; see also Koul and Stute (1999). DGP4 is similar to that considered in Eubank and Hart (1992). DGP1-EXP is considered here to show the robustness of the tests against fatter tailed error distributions. For the first block of models we consider a sample size of n = 50, 100, and 300. The number of Monte Carlo experiments is 1,000, and the number of bootstrap replications is B = 500. For the bootstrap approximation we employ the sequence {Vi}i=1n of i.i.d. Bernoulli variates given in (13). We estimate the null model by the usual least squares estimator. The nominal levels are 10%, 5%, and 1%.
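As an illustration of the first block of designs, a data-generating sketch (ours, following the formulas above):

```python
import numpy as np

def simulate_block1(n, dgp="DGP1", seed=None):
    """Generate one sample (Y, X) from the first block of designs described
    above (illustrative sketch; X contains the intercept and the two regressors)."""
    rng = np.random.default_rng(seed)
    W, W1, W2 = (rng.uniform(0, 2 * np.pi, n) for _ in range(3))
    X1, X2 = (W + W1) / 2, (W + W2) / 2
    lin = 1 + X1 + X2                            # X_i' alpha_0 with alpha_0 = (1, 1, 1)'
    eps = rng.standard_normal(n)                 # epsilon_i ~ N(0, 1)
    if dgp == "DGP1":
        Y = lin + eps
    elif dgp == "DGP1-EXP":
        Y = lin + rng.exponential(1.0, n) - 1.0  # centered exp(1) errors nu_i
    elif dgp == "DGP2":
        Y = lin + 0.1 * (W1 - np.pi) * (W2 - np.pi) + eps
    elif dgp == "DGP3":
        Y = lin + lin * np.exp(-0.01 * lin ** 2) + eps
    elif dgp == "DGP4":
        Y = lin + np.cos(0.6 * np.pi * lin) + eps
    else:
        raise ValueError("unknown design")
    X = np.column_stack([np.ones(n), X1, X2])    # regressors of the null linear model
    return Y, X
```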

In Table 1 we show the empirical rejection probabilities (RP) associated with models DGP1 and DGP1-EXP. The empirical levels of the test statistics are close to the nominal level, even for sample sizes as small as 50. The empirical levels for DGP1-EXP are less accurate than for DGP1 but are reasonable, showing that the tests are robust to fat-tailed error distributions.

Table 1. Empirical size of tests

In Table 2 we report the empirical power against DGP2. It increases with the sample size n for all test statistics, as expected. The new CvM test PCvMn has the best empirical power in all cases. The empirical power of CvMn,exp is reasonable: it is greater than or equal to that of CvMn and KSn for n = 50, and clearly better for n = 100 and 300. The Stute and Zhu (2002) test, SZn, is the worst against this alternative. The rejection probabilities of PCvMn are comparable to the best test in Hong and White (1995) against this alternative. In Table 3 we show the RP for DGP3. For this alternative SZn and our test statistic, PCvMn, generally have the best empirical powers, with SZn performing slightly better than PCvMn. Bierens' test CvMn,exp has good power properties for this alternative. Stute's test CvMn performs similarly to CvMn,exp, whereas KSn presents the worst results, with moderate power. For DGP4 (Table 4), PCvMn and CvMn,exp have excellent empirical power. Stute's tests, CvMn and KSn, and the Stute and Zhu (2002) test, SZn, have low power against this "high-frequency" alternative.

Table 2. Empirical power of tests

Table 3. Empirical power of tests

Table 4. Empirical power of tests

The second block of models is taken from Zhu (2003). The null model is

whereas the DGPs considered are

where Xi is a random d-dimensional covariate with i.i.d. U[0,2π] marginal components, d = 3 and 6. When d = 3, γ0 = (1,1,2)′ and β0 = (2,1,1)′, and when d = 6, γ0 = (1,2,3,4,5,6)′ and β0 = (6,5,4,3,2,1)′. Furthermore, we set b = 0.01, 0.02,…,0.1 when d = 3 and b = 0.001, 0.002,…,0.01 when d = 6. This experiment provides us with evidence of the power performance of the tests under local alternatives (b = 0 corresponds to the null hypothesis). The sample size is n = 25; the rest of the Monte Carlo parameters are as before.

We show the RP for these models in Figure 1. We see that in both cases, d = 3 and 6, our new test statistic PCvMn and SZn have the best empirical powers for all values of b. Neither of them is superior to the other for all values of b and for both models. For d = 3, SZn performs slightly better than PCvMn. They are followed by CvMn,exp. For d = 6, PCvMn has the best power for b ≤ 0.006, whereas SZn is the best for b > 0.006; CvMn,exp, CvMn, and KSn have very low empirical power against this alternative.

Figure 1. Rejection probabilities plots for d = 3 and 6. The solid, solid-star, dot, dash, and dash-dot lines are, respectively, for the empirical power of PCvMn, SZn, CvMn,exp, CvMn, and KSn.

Summarizing, these two Monte Carlo experiments show that our test possesses excellent power performance in finite samples for the alternatives considered. In all cases, our test has the best empirical power or is comparable to the best test among those proposed by Bierens (1982), Stute (1997), and Stute and Zhu (2002). In our Monte Carlo experiments we have focused on integrated-approach-based tests. Miles and Mora (2003) have compared through simulations some local-based and integrated-based tests. These authors conclude that for one-dimensional regressors, the integrated-approach-based tests perform slightly better than the smoothing-based ones, especially Bierens' statistic. When the number of regressors is greater than one, some of the smoothing tests considered by these authors perform better. Therefore, it is important to compare our new test with the smoothing-based tests considered by these authors, especially for the case of multivariate regressors. This study is beyond the scope of this paper and is deferred to future research. Our test has the advantage that no bandwidth selection is required, though its implementation requires a bootstrap procedure. Our Monte Carlo experiments show that our test should be considered a reasonable competitor to the best local-approach-based tests and a valuable diagnostic procedure for regression modeling.

APPENDIX A: Proofs

Proof of Lemma 1. This follows easily from Part I of Theorem 1 in Bierens (1982). █

Proof of Theorem 1. By a classical CLT we can show that the finite-dimensional distributions of Rn converge to those of the Gaussian process R. The asymptotic equicontinuity of Rn follows by a direct application of Theorem 2.5.2 in van der Vaart and Wellner (1996); see also their Problem 14 on p. 152. █

Proof of Theorem 2. Applying the classical mean value theorem argument we have

where

and where θ̄n (a mean value between θn and θ0) satisfies

|θ̄n − θ0| ≤ |θn − θ0|.

By Assumptions A1–A3, the generalization by Wolfowitz (1954) of the Glivenko–Cantelli theorem, and the uniform law of large numbers (ULLN) of Jennrich (1969), it is easy to show that I = oP(1) and II = oP(1) uniformly in x ∈ Π. Thus the theorem follows from Theorem 1 and Assumption A3. █

Proof of Corollary 1. For a nonrandom continuous functional, the result follows from the CMT and Theorem 2. For PCvMn the result follows because under the conditions of Theorem 2 we have that Rn1 is asymptotically tight and hence Lemma 3.1 in Chang (1990) applies. █

Proof of Theorem 3. The proof follows exactly the same steps as the proof of Theorems 1 and 2 in Dominguez and Lobato (2004), and thus it is omitted. █

Proof of Theorem 4. Under the local alternatives (12) write

with

Using A3′ as in Theorem 2, we obtain

uniformly in x ∈ Π. On the other hand, using the results of Wolfowitz (1954), we have uniformly in x ∈ Π

Using the preceding equations and (A.1), the theorem follows from Theorem 1 and Assumption A3′. █

APPENDIX B: Computation of the Test Statistic

By simple algebra,

PCvMn = (1/n²) ∑_{i=1}^{n} ∑_{j=1}^{n} ∑_{r=1}^{n} ei(θn) ej(θn) Aijr,   where   Aijr ≡ ∫_{Sd} 1(β′Xi ≤ β′Xr) 1(β′Xj ≤ β′Xr) dβ.

For d > 1, note that the integral Aijr is proportional to the volume of a spherical wedge, and hence we can compute it from the formula

Aijr = Aijr(0) π^{d/2−1} / Γ(d/2 + 1),

where Aijr(0) is the complementary angle between the vectors (Xi − Xr) and (Xj − Xr), measured in radians, and Γ(·) is the gamma function. Thus, Aijr(0) is given by

Aijr(0) = π − arccos( (Xi − Xr)′(Xj − Xr) / (|Xi − Xr| |Xj − Xr|) ).

Hence, the computation of these integrals is simple. In addition, there are some restrictions on the integrals Aijr that make the computation simpler: e.g., if Xi = Xj and Xi ≠ Xr then Aijr(0) = π, whereas if Xi = Xj and Xi = Xr then Aijr(0) = 2π. If Xi ≠ Xj and Xi = Xr or Xj = Xr, we have that Aijr(0) = π. Also, the symmetry property Aijr = Ajir holds.
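A direct implementation (ours) of the triple-sum expression reconstructed above; since any constant multiplying Aijr only rescales PCvMn, and the bootstrap is applied to the same statistic, the exact normalization of the spherical measure is immaterial for the resulting test.

```python
import numpy as np
from scipy.special import gamma

def pcvm_closed_form(residuals, X):
    """PCvM_n via the triple sum of Appendix B (sketch based on the formulas
    reconstructed above; valid for d > 1). Memory is O(n^3), fine for illustration."""
    n, d = X.shape
    D = X[:, None, :] - X[None, :, :]                        # D[i, r] = X_i - X_r
    norms = np.linalg.norm(D, axis=-1)                       # |X_i - X_r|
    inner = np.einsum('ird,jrd->ijr', D, D)                  # (X_i - X_r)'(X_j - X_r)
    denom = norms[:, None, :] * norms[None, :, :]            # |X_i - X_r| |X_j - X_r|
    with np.errstate(invalid='ignore', divide='ignore'):
        cos = np.clip(inner / denom, -1.0, 1.0)
        A0 = np.pi - np.arccos(cos)                          # complementary angle
    # degenerate cases, following the conventions stated in the text
    zero_ir = norms == 0                                     # X_i = X_r
    A0 = np.where(zero_ir[:, None, :] | zero_ir[None, :, :], np.pi, A0)
    A0 = np.where(zero_ir[:, None, :] & zero_ir[None, :, :], 2 * np.pi, A0)
    A = A0 * np.pi ** (d / 2 - 1) / gamma(d / 2 + 1)
    e = np.asarray(residuals, dtype=float)
    return np.einsum('i,j,ijr->', e, e, A) / n ** 2
```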


REFERENCES

Bierens, H.J. (1982) Consistent model specification tests. Journal of Econometrics 20, 105–134.
Bierens, H.J. (1990) A consistent conditional moment test of functional form. Econometrica 58, 1443–1458.
Bierens, H.J. & W. Ploberger (1997) Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151.
Carrasco, M. & J.P. Florens (2000) Generalization of GMM to a continuum of moment conditions. Econometric Theory 16, 797–834.
Chang, N.M. (1990) Weak convergence of a self-consistent estimator of a survival function with doubly censored data. Annals of Statistics 18, 391–404.
Chen, S.X., W. Härdle, & M. Li (2003) An empirical likelihood goodness-of-fit test for time series. Journal of the Royal Statistical Society, Series B 65, 663–678.
Dominguez, M. & I. Lobato (2004) Consistent estimation of models defined by conditional moment restrictions. Econometrica 72, 1601–1615.
Escanciano, J.C. (2006) Goodness-of-fit tests for linear and nonlinear time series models. Journal of the American Statistical Association 101, 531–541.
Eubank, R. & J. Hart (1992) Testing goodness-of-fit in regression via order selection criteria. Annals of Statistics 20, 1412–1425.
Eubank, R. & S. Spiegelman (1990) Testing the goodness of fit of a linear model via nonparametric regression techniques. Journal of the American Statistical Association 85, 387–392.
Fan, J. & I. Gijbels (1996) Local Polynomial Modelling and Its Applications. Chapman and Hall.
Gozalo, P.L. (1993) A consistent model specification test for nonparametric estimation of regression function models. Econometric Theory 9, 451–477.
Guerre, E. & P. Lavergne (2005) Rate-optimal data-driven specification testing for regression models. Annals of Statistics 33, 840–870.
Härdle, W. & E. Mammen (1993) Comparing nonparametric versus parametric regression fits. Annals of Statistics 21, 1926–1947.
Hong, Y. & H. White (1995) Consistent specification testing via nonparametric series regression. Econometrica 63, 1133–1159.
Horowitz, J.L. & W. Härdle (1994) Testing a parametric model against a semiparametric alternative. Econometric Theory 10, 821–848.
Horowitz, J.L. & V.G. Spokoiny (2001) An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 69, 599–631.
Jennrich, R.I. (1969) Asymptotic properties of nonlinear least squares estimators. Annals of Mathematical Statistics 40, 633–643.
Koul, H.L. (2002) Weighted Empirical Processes in Dynamic Nonlinear Models, 2nd ed. Lecture Notes in Statistics, vol. 166. Springer-Verlag.
Koul, H.L. & P. Ni (2004) Minimum distance regression model checking. Journal of Statistical Planning and Inference 119, 109–144.
Koul, H.L. & W. Stute (1999) Nonparametric model checks for time series. Annals of Statistics 27, 204–236.
Li, Q. (1999) Consistent model specification test for time series econometric models. Journal of Econometrics 92, 101–147.
Li, Q., C. Hsiao, & J. Zinn (2003) Consistent specification tests for semiparametric/nonparametric models based on series estimation methods. Journal of Econometrics 112, 295–325.
Mammen, E. (1993) Bootstrap and wild bootstrap for high-dimensional linear models. Annals of Statistics 21, 255–285.
McCullagh, P. & J. Nelder (1989) Generalized Linear Models. Monographs on Statistics and Applied Probability 37. Chapman and Hall.
Miles, D. & J. Mora (2003) On the performance of nonparametric specification tests in regression models. Computational Statistics & Data Analysis 42, 477–490.
Powell, J.L., J.M. Stock, & T.M. Stoker (1989) Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.
Stinchcombe, M. & H. White (1998) Consistent specification testing with nuisance parameters present only under the alternative. Econometric Theory 14, 295–325.
Stute, W. (1997) Nonparametric model checks for regression. Annals of Statistics 25, 613–641.
Stute, W., W. Gonzalez-Manteiga, & M. Presedo-Quindimil (1998) Bootstrap approximations in model checks for regression. Journal of the American Statistical Association 93, 141–149.
Stute, W. & L.X. Zhu (2002) Model checks for generalized linear models. Scandinavian Journal of Statistics 29, 535–545.
Tripathi, G. & Y. Kitamura (2003) Testing conditional moment restrictions. Annals of Statistics 31, 2059–2095.
van der Vaart, A.W. & J.A. Wellner (1996) Weak Convergence and Empirical Processes. Springer-Verlag.
Whang, Y.-J. (2000) Consistent bootstrap tests of parametric regression functions. Journal of Econometrics 98, 27–46.
Wolfowitz, J. (1954) Generalization of the theorem of Glivenko-Cantelli. Annals of Mathematical Statistics 25, 131–138.
Wooldridge, J. (1992) A test for functional form against nonparametric alternatives. Econometric Theory 8, 452–475.
Wu, C.F.J. (1986) Jackknife, bootstrap and other resampling methods in regression analysis (with discussion). Annals of Statistics 14, 1261–1350.
Yatchew, A.J. (1992) Nonparametric regression tests based on least squares. Econometric Theory 8, 435–451.
Zheng, X. (1996) A consistent test of functional form via nonparametric estimation technique. Journal of Econometrics 75, 263–289.
Zhu, L.X. (2003) Model checking of dimension-reduction type for regression. Statistica Sinica 13, 283–296.