
THE LIVE METHOD FOR GENERALIZED ADDITIVE VOLATILITY MODELS

Published online by Cambridge University Press:  01 December 2004

Woocheol Kim
Affiliation:
Korea Institute of Public Finance and Humboldt University of Berlin
Oliver Linton
Affiliation:
The London School of Economics

Abstract

We investigate a new separable nonparametric model for time series, which includes many autoregressive conditional heteroskedastic (ARCH) models and autoregressive (AR) models already discussed in the literature. We also propose a new estimation procedure called LIVE, or local instrumental variable estimation, that is based on a localization of the classical instrumental variable method. Our method has considerable computational advantages over the competing marginal integration or projection method. We also consider a more efficient two-step likelihood-based procedure and show that this yields both asymptotic and finite-sample performance gains. This paper is based on Chapter 2 of the first author's Ph.D. dissertation from Yale University. We thank Wolfgang Härdle, Joel Horowitz, Peter Phillips, and Dag Tjøstheim for helpful discussions. We are also grateful to Donald Andrews and two anonymous referees for valuable comments. The second author thanks the National Science Foundation and the ESRC for financial support.

Type
Research Article
Copyright
© 2004 Cambridge University Press

1. INTRODUCTION

Volatility models are of considerable interest in empirical finance. There are many types of parametric volatility models, following the seminal work of Engle (1982). These models are typically nonlinear, which poses difficulties both in computation and in deriving useful tools for statistical inference. Parametric models are prone to misspecification, especially when there is no theoretical reason to prefer one specification over another. Nonparametric models can provide greater flexibility. However, the greater generality of these models comes at a cost—including a large number of lags requires estimation of a high-dimensional smooth, which is known to behave very badly (Silverman, 1986). The “curse of dimensionality” puts severe limits on the dynamic flexibility of nonparametric models. Separable models offer an intermediate position between the complete generality of nonparametric models and the restrictiveness of parametric models. These models have been investigated in cross-sectional settings and also in time series settings.

In this paper, we investigate a generalized additive nonlinear autoregressive conditional heteroskedastic model (GANARCH):

$$y_t = m(x_t) + v^{1/2}(x_t)\,\varepsilon_t, \tag{1.1}$$
$$m(x_t) = F_m\Bigl[c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha})\Bigr], \tag{1.2}$$
$$v(x_t) = F_v\Bigl[c_v + \sum_{\alpha=1}^{d} v_\alpha(y_{t-\alpha})\Bigr], \tag{1.3}$$

where $x_t = (y_{t-1},\ldots,y_{t-d})$, the $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$ are smooth but unknown functions, and $F_m(\cdot)$ and $F_v(\cdot)$ are known monotone transformations (whose inverses are $G_m(\cdot)$ and $G_v(\cdot)$, respectively).¹

¹ The extension to allow the F transformations to be of unknown functional form is considerably more complicated; see Horowitz (2001).

The error process, $\{\varepsilon_t\}$, is assumed to be a martingale difference with unit scale, that is,

$$E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0, \qquad E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = 1,$$

where $\mathcal{F}_t$ is the σ-algebra of events generated by $\{y_k\}_{k=-\infty}^{t}$. Under some weak assumptions, time series generated by nonlinear autoregressive models can be shown to be stationary and strongly mixing with mixing coefficients decaying exponentially fast. Auestad and Tjøstheim (1990) use α-mixing or geometric ergodicity to identify their nonlinear time series model. Similar results are obtained for the additive nonlinear autoregressive conditional heteroskedastic (ARCH) process by Masry and Tjøstheim (1997); see also Cai and Masry (2000) and Carrasco and Chen (2002). We follow the same argument as Masry and Tjøstheim (1997) and assume all the conditions necessary for stationarity and the mixing property of the process $\{y_t\}_{t=1}^{n}$ in (1.1). The standard identification of the components of the mean and variance is made by

$$E[m_\alpha(y_{t-\alpha})] = 0, \qquad E[v_\alpha(y_{t-\alpha})] = 0, \tag{1.4}$$

for all α = 1,…,d. The notable aspect of the model is additivity via known links for the conditional mean and volatility functions. As will be shown later, (1.1)–(1.3) includes a wide variety of time series models in the literature. See Horowitz (2001) for a discussion of generalized additive models in a cross-sectional context.
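To fix ideas, the following minimal sketch simulates a GANARCH process of the form (1.1)–(1.3) with d = 2, taking $F_m$ to be the identity and $F_v$ the exponential (so that $G_v = \log$ and volatility is multiplicative). The component functions, constants, and sample size are our illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative component functions (not the paper's): identity mean link,
# exponential variance link, so log v(x_t) is additive in the two lags.
def m1(y): return 0.3 * np.tanh(y)
def m2(y): return -0.2 * y
def v1(y): return 0.2 * y**2
def v2(y): return 0.1 * np.abs(y)

n, burn = 500, 200
y = np.zeros(n + burn + 2)
for t in range(2, n + burn + 2):
    mean = m1(y[t-1]) + m2(y[t-2])                 # F_m = identity
    var = np.exp(-1.0 + v1(y[t-1]) + v2(y[t-2]))   # F_v = exp, c_v = -1
    y[t] = mean + np.sqrt(var) * rng.standard_normal()
y = y[burn + 2:]                                    # discard burn-in
```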

In a much simpler univariate setup, Robinson (1983), Auestad and Tjøstheim (1990), and Härdle and Vieu (1992) study kernel estimation of the conditional mean function m(·) in (1.1). The so-called CHARN (conditionally heteroskedastic autoregressive nonlinear) model is the same as (1.1) except that m(·) and v(·) are univariate functions of $y_{t-1}$. Masry and Tjøstheim (1995) and Härdle and Tsybakov (1997) apply the Nadaraya–Watson and local linear smoothing methods, respectively, to jointly estimate v(·) together with m(·). Alternatively, Fan and Yao (1996) and Ziegelmann (2002) propose local linear least squares estimation of the volatility function, with an extension by Avramidis (2002) based on local linear maximum likelihood estimation. Also, in a nonlinear vector autoregressive (VAR) context, Härdle, Tsybakov, and Yang (1998) deal with the estimation of the conditional mean in a multilagged extension similar to (1.1). Unfortunately, however, introducing more lags in nonparametric time series models has unpleasant consequences, more so than in the parametric approach. As is well known, smoothing methods in high dimensions suffer from a slower convergence rate, the "curse of dimensionality." Under twice differentiability of m(·), the optimal rate is $n^{-2/(4+d)}$, which gets rapidly worse with the dimension. In high dimensions it is also difficult to describe the function m graphically.

The additive structure has been proposed as a useful way to circumvent these problems in multivariate smoothing. By assuming the target function to be a sum of functions of the covariates, say, $m(x) = c + \sum_{\alpha=1}^{d} m_\alpha(x_\alpha)$, we can effectively reduce the dimensionality of a regression problem and improve the implementability of multivariate smoothing up to that of the one-dimensional case. Stone (1985, 1986) shows that it is possible to estimate $m_\alpha(\cdot)$ and m(·) with the one-dimensional optimal rate of convergence, for example $n^{2/5}$ for twice differentiable functions, regardless of d. The estimates are easily illustrated and interpreted. For these reasons, since the 1980s, additive models have been fundamental to nonparametric regression among both econometricians and statisticians. Regarding estimation methods for achieving the one-dimensional optimal rate, the literature suggests two different approaches: backfitting and marginal integration. The former, originally suggested by Breiman and Friedman (1985), Buja, Hastie, and Tibshirani (1989), and Hastie and Tibshirani (1987, 1990), executes iterative calculations of one-dimensional smoothing until some convergence criterion is satisfied; a sketch of the idea follows. Though intuitively appealing, the statistical properties of the backfitting algorithm were not clearly understood until the recent works of Opsomer and Ruppert (1997) and Mammen, Linton, and Nielsen (1999). They develop specific (linear) backfitting procedures and establish the geometric convergence of their algorithms and the pointwise asymptotic distributions under some conditions. However, one disadvantage of these procedures is the time-consuming iteration required for implementation. Also, the proofs for the linear case cannot be easily generalized to nonlinear cases such as generalized additive models.
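For concreteness, here is the backfitting idea for a two-component model $y = m_1(X_1) + m_2(X_2) + \varepsilon$, sketched with Nadaraya–Watson smooths; the function names and kernel choice are ours, and this is not the specific linear backfitting procedure analyzed in the papers just cited.

```python
import numpy as np

def nw_smooth(x, y, grid, h):
    """Nadaraya-Watson smooth of y on x, evaluated at the grid points."""
    u = (grid[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)                              # Gaussian kernel weights
    return (w * y).sum(axis=1) / w.sum(axis=1)

def backfit(x1, x2, y, h, iters=20):
    """Alternate univariate smooths of the partial residuals until they settle."""
    m1 = np.zeros_like(y)
    m2 = np.zeros_like(y)
    for _ in range(iters):
        m1 = nw_smooth(x1, y - m2, x1, h); m1 -= m1.mean()   # impose E[m1] = 0
        m2 = nw_smooth(x2, y - m1, x2, h); m2 -= m2.mean()   # impose E[m2] = 0
    return m1, m2
```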

A more recent approach, called marginal integration (MI), is theoretically more tractable: its statistical properties are easy to derive, because it simply averages multivariate kernel estimates. Developed independently by Newey (1994), Tjøstheim and Auestad (1994), and Linton and Nielsen (1995), its simplicity inspired subsequent applications such as Linton, Wang, Chen, and Härdle (1995) for transformation models and Linton, Nielsen, and van de Geer (2003) for hazard models with censoring. In the time series models that are special cases of (1.1) and (1.2) with $F_m$ the identity, Chen and Tsay (1993a, 1993b) and Masry and Tjøstheim (1997) apply backfitting and MI, respectively, to estimate the conditional mean function. Mammen et al. (1999) provide useful results for the same type of models by improving the previous backfitting method with some modification and successfully deriving the asymptotic properties under weak conditions. The separability assumption is also used in volatility estimation by Yang, Härdle, and Nielsen (1999), where the nonlinear ARCH model has additive mean and multiplicative volatility in the form of

$$y_t = c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha}) + \Bigl\{c_v \prod_{\alpha=1}^{d} v_\alpha(y_{t-\alpha})\Bigr\}^{1/2}\varepsilon_t. \tag{1.5}$$

To estimate (1.5), they rely on marginal integration with local linear fits as a pilot estimate and derive the asymptotic properties.

This paper makes two contributions to the additive literature. The first concerns the theoretical development of a new estimation tool, the local instrumental variable estimator for the components of additive models (LIVE for CAM), which was outlined for simple additive cross-sectional regression in Kim, Linton, and Hengartner (1999). The novelty of the procedure lies in the simple definition of the estimator based on univariate smoothing combined with new kernel weights. That is, adjusting the kernel weights via the conditional density of the covariate enables a univariate kernel smoother to estimate the corresponding additive component function consistently. In many respects, the new estimator preserves the good properties of univariate smoothers. The instrumental variable method is analytically tractable for asymptotic theory: it is shown to attain the optimal one-dimensional rate. Furthermore, it is computationally more efficient than the two existing methods (backfitting and MI) in the sense that it reduces the computations by a factor of n smoothings. The other contribution relates to the general coverage of the model we work with. The model in (1.1)–(1.3) extends ARCH models to a generalized additive framework where both the mean and variance functions are additive after some known transformation (see Hastie and Tibshirani, 1990). All the time series models in our previous discussion can be regarded as subclasses of the data generating process for $\{y_t\}$ in (1.1)–(1.3). For example, setting $G_m$ to be the identity and $G_v$ a logarithmic function reduces our model to (1.5). Similar efforts to apply transformations have been made in parametric ARCH models. Nelson (1991) considers a model for the log of the conditional variance, the exponential (G)ARCH class, to embody the multiplicative effects of volatility. The Box–Cox transformation, which is intermediate between linear and logarithmic and which allows nonseparable news impact curves, has also been proposed for volatility. Because it is hard to tell a priori which volatility structure is more realistic, and the choice should be determined by real data, our generalized additive model provides a usefully flexible specification for empirical work. Additionally, from the perspective of potential misspecification, the transformation used here alleviates the restriction imposed by the additivity assumption, which increases the approximating power of our model. Note that when the lagged variables in (1.1)–(1.3) are replaced by different covariates and the observations are independent and identically distributed (i.i.d.), the model becomes the cross-sectional additive model studied by Linton and Härdle (1996). Finally, we also consider more efficient estimation along the lines of Linton (1996, 2000).

The rest of the paper is organized as follows. Section 2 describes the main estimation idea in a simple setting. In Section 3, we define the estimator for the full model. In Section 4 we give our main results, including the asymptotic normality of our estimators. Section 5 discusses more efficient estimation. Section 6 reports a small Monte Carlo study. The proofs are contained in the Appendix.

2. NONPARAMETRIC INSTRUMENTAL VARIABLES: THE MAIN IDEA

This section explains the basic idea behind the instrumental variable method and defines the estimation procedure. For ease of exposition, this will be carried out using an example of simple additive models with i.i.d. data. We then extend the definition to the generalized additive ARCH case in (1.1)–(1.3).

Consider a bivariate additive regression model for i.i.d. data $(y, X_1, X_2)$,

$$y = m_1(X_1) + m_2(X_2) + \varepsilon,$$

where $E(\varepsilon \mid X) = 0$ with $X = (X_1, X_2)$ and the components satisfy the identification conditions $E[m_\alpha(X_\alpha)] = 0$, for α = 1, 2 (the constant term is assumed to be zero, for simplicity). Letting $\eta = m_2(X_2) + \varepsilon$, we rewrite the model as

$$y = m_1(X_1) + \eta, \tag{2.6}$$

which is a classical example of "omitted variable" regression. That is, although (2.6) appears to take the form of a univariate nonparametric regression model, smoothing y on $X_1$ will incur a bias due to the omitted variable η, because η contains $X_2$, which in general depends on $X_1$. One solution is suggested by the classical econometric notion of an instrumental variable. That is, we look for an instrument W such that

$$E(W\eta \mid X_1) = 0 \quad\text{and}\quad E(W \mid X_1) \neq 0$$

with probability one.²

² Note the contrast with the marginal integration or projection method. In that approach one defines $m_1$ by some unconditional expectation

$$m_1(x_1) = E[W(X_2)\, m(x_1, X_2)]$$

for some weighting function W that depends only on $X_2$ and that satisfies $E[W(X_2)] = 1$.
If such a random variable exists, we can write

$$m_1(X_1) = \frac{E(Wy \mid X_1)}{E(W \mid X_1)}. \tag{2.8}$$

This suggests that we estimate the function $m_1(\cdot)$ by nonparametric smoothing of Wy on $X_1$ and of W on $X_1$. In parametric models the choice of instrument is usually not obvious and requires some caution. However, our additive model has a natural class of instruments: $p_2(X_2)/p(X)$ times any measurable function of $X_1$ will do, where p(·), $p_1(\cdot)$, and $p_2(\cdot)$ are the density functions of the covariates X, $X_1$, and $X_2$, respectively. It follows that

$$\frac{E(Wy \mid X_1 = x_1)}{E(W \mid X_1 = x_1)} = \int m(x_1, x_2)\, p_2(x_2)\, dx_2,$$

as required. This formula shows what the instrumental variable estimator is estimating when m is not additive: an average of the regression function over the $X_2$ direction, exactly the same as the target of the marginal integration estimator. For simplicity we will take

$$W = \frac{p_2(X_2)}{p(X)} \tag{2.9}$$

throughout.³

³ If instead we take $W = p_1(X_1)\,p_2(X_2)/p(X)$, this satisfies $E(W \mid X_1) = 1$ and $E(W\eta \mid X_1) = 0$. However, the term $p_1(X_1)$ cancels out of the expression and is redundant.

Up to now, we have implicitly assumed that the distributions of the covariates are known a priori. In practice this is rarely true, and we have to rely on estimates of these quantities. Let $\hat p(\cdot)$, $\hat p_1(\cdot)$, and $\hat p_2(\cdot)$ be kernel estimates of the densities p(·), $p_1(\cdot)$, and $p_2(\cdot)$, respectively. Then the feasible procedure is defined by replacing the instrumental variable W with

$$\hat W = \frac{\hat p_2(X_2)}{\hat p(X)}$$

and taking sample averages instead of population expectations. Section 3 provides a rigorous statistical treatment of feasible instrumental variable estimators based on local linear estimation. See Kim et al. (1999) for a slightly different approach.

Next, we come to the main advantage of the local instrumental variable method: its computational cost. The marginal integration method needs $n^2$ regression smoothings, evaluated at the pairs $(X_{1i}, X_{2j})$ for i, j = 1,…,n, whereas the backfitting method requires nr operations, where r is the number of iterations needed to achieve convergence. The instrumental variable procedure, in contrast, takes at most 2n kernel smoothing operations in a preliminary step for estimating the instrumental variable and another n operations for the regressions. Thus, it can easily be combined with the bootstrap, whose computational cost often becomes prohibitive in the case of marginal integration (see Kim et al., 1999). A sketch of the whole procedure follows.
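The following sketch puts the pieces of Section 2 together for the bivariate model, assuming Gaussian kernels and Nadaraya–Watson smooths in place of the local linear fits used in Section 3; `live_m1` (our name) returns an estimate of $m_1$, up to an additive constant, at the grid points.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def live_m1(x1, x2, y, grid, h, g):
    """Feasible LIVE: smooth W_hat*y and W_hat on X1; return their ratio."""
    n = len(y)
    p2 = gauss((x2[:, None] - x2[None, :]) / g).sum(1) / (n * g)         # p2_hat(X2i)
    pj = (gauss((x1[:, None] - x1[None, :]) / g)
          * gauss((x2[:, None] - x2[None, :]) / g)).sum(1) / (n * g**2)  # p_hat(Xi)
    w = p2 / pj                                                          # instrument W_hat
    k = gauss((grid[:, None] - x1[None, :]) / h)                         # weights at grid
    return (k * (w * y)).sum(1) / (k * w).sum(1)                         # E(Wy|X1)/E(W|X1)

# Example with dependent covariates:
rng = np.random.default_rng(1)
x1 = rng.normal(size=400)
x2 = 0.5 * x1 + rng.normal(size=400)
y = np.sin(x1) + 0.3 * x2**2 + 0.2 * rng.normal(size=400)
m1_hat = live_m1(x1, x2, y, np.linspace(-1.5, 1.5, 31), h=0.25, g=0.3)
```

The instrument is computed once from the two density estimates; every subsequent step is a univariate smooth.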

Finally, we show how the instrumental variable approach can be applied to generalized additive models. Let F(·) be the inverse of a known link function G(·) and let $m(X) = E(y \mid X)$. The model is defined as

$$m(X) = F\{m_1(X_1) + m_2(X_2)\},$$

or equivalently $G(m(X)) = m_1(X_1) + m_2(X_2)$. We maintain the same identification condition, $E[m_\alpha(X_\alpha)] = 0$. Unlike in the simple additive model, there is no direct way to relate Wy to $m_1(X_1)$ here, so (2.8) cannot be implemented. However, under additivity

$$m_1(X_1) = \frac{E\{W\,G(m(X)) \mid X_1\}}{E(W \mid X_1)} \tag{2.11}$$

for the W defined in (2.9). Because m(·) is unknown, we need consistent estimates of m(X) in a preliminary step; the calculation in (2.11) then becomes feasible. In the next section we show how these ideas are translated into estimators for the general time series setting.

3. INSTRUMENTAL VARIABLE PROCEDURE FOR GANARCH

We start with some simplifying notation that will be used repeatedly in what follows. Let $x_t$ be the vector of the d lagged variables up to t − 1, that is, $x_t = (y_{t-1},\ldots,y_{t-d})$, or concisely $x_t = (y_{t-\alpha}, \underline{y}_{t-\alpha})$, where $\underline{y}_{t-\alpha} = (y_{t-1},\ldots,y_{t-\alpha+1},y_{t-\alpha-1},\ldots,y_{t-d})$. Defining

$$H(x_t) = [G_m(m(x_t)),\, G_v(v(x_t))]^\top,$$

we can reformulate (1.1)–(1.3) with a focus on the αth components of the mean and variance as

$$H(x_t) = \varphi_\alpha(y_{t-\alpha}) + H_{\underline{\alpha}}(\underline{y}_{t-\alpha}).$$

To save space we will use the following abbreviations for the functions to be estimated:

$$\varphi_\alpha(y_\alpha) = [M_\alpha(y_\alpha),\, V_\alpha(y_\alpha)]^\top = [c_m + m_\alpha(y_\alpha),\, c_v + v_\alpha(y_\alpha)]^\top, \qquad H_{\underline{\alpha}}(\underline{y}_{t-\alpha}) = \sum_{\beta \neq \alpha} [m_\beta(y_{t-\beta}),\, v_\beta(y_{t-\beta})]^\top.$$

Note that the components $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$ are identified, up to the constant $c = (c_m, c_v)$, by $\varphi_\alpha(\cdot)$, which will be our major interest in estimation. Subsequently, we examine in some detail each relevant step for computing the feasible nonparametric instrumental variable estimator of $\varphi_\alpha(\cdot)$. The set of observations is given by $\{y_t\}$, where n′ = n + d is the total number of observations.

3.1. Step I: Preliminary Estimation of $r_t = H(x_t)$

Because $r_t$ is unknown, we start by computing pilot estimates of the regression surface with a local linear smoother. Let $\hat m(x)$ be the first component of the vector $(\hat a, \hat b)$ that solves

$$\min_{a,\,b} \sum_{t=1}^{n} \bigl[y_t - a - b^\top (x_t - x)\bigr]^2 K_h(x_t - x), \tag{3.12}$$

where $K_h(u) = \prod_{j=1}^{d} K(u_j/h)/h^d$, K(·) is a one-dimensional kernel function, and h = h(n) is a bandwidth sequence. In a similar way, we get the estimate of the volatility surface, $\hat v(x)$, from (3.12) by replacing $y_t$ with the squared residuals, $\hat\varepsilon_t^2 = \{y_t - \hat m(x_t)\}^2$. Then, transforming $[\hat m(x_t), \hat v(x_t)]$ by the known links leads to consistent estimates of $r_t$,

$$\hat r_t = \hat H(x_t) = [G_m(\hat m(x_t)),\, G_v(\hat v(x_t))]^\top.$$
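A sketch of this pilot fit, assuming a product Gaussian kernel in place of the compactly supported K of Assumption A5; `local_linear` (our name) returns the intercept $\hat a$, that is, the minimizing first component in (3.12) evaluated at x0.

```python
import numpy as np

def local_linear(X, y, x0, h):
    """Local linear fit of y on the (n, d) matrix X at the point x0;
    returns the intercept, i.e., the pilot estimate of the surface at x0."""
    Z = X - x0                                    # centered regressors
    w = np.exp(-0.5 * ((Z / h)**2).sum(axis=1))   # product Gaussian kernel weights
    D = np.column_stack([np.ones(len(y)), Z])     # design [1, x_t - x0]
    WD = D * w[:, None]
    beta = np.linalg.solve(D.T @ WD, WD.T @ y)    # weighted least squares
    return beta[0]
```

The volatility surface is obtained by rerunning the same fit with the squared residuals in place of y, and $\hat r_t$ by passing both surfaces through $G_m$ and $G_v$.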

3.2. Step II: Instrumental Variable Estimation of Additive Components

This step involves the estimation of $\varphi_\alpha(\cdot)$, which is equivalent to $[M_\alpha(\cdot), V_\alpha(\cdot)]^\top$ up to the constant c. Let p(·) and $p_{\underline{\alpha}}(\cdot)$ denote the density functions of the random variables $(y_{t-\alpha}, \underline{y}_{t-\alpha})$ and $\underline{y}_{t-\alpha}$, respectively. Define the feasible instrument as

$$\hat W_t = \frac{\hat p_{\underline{\alpha}}(\underline{y}_{t-\alpha})}{\hat p(x_t)},$$

where $\hat p_{\underline{\alpha}}(\cdot)$ and $\hat p(\cdot)$ are computed using the kernel function L(·), for example

$$\hat p(x_t) = \frac{1}{n}\sum_{s \neq t} \prod_{j=1}^{d} L_g(y_{s-j} - y_{t-j}),$$

with $L_g(\cdot) \equiv L(\cdot/g)/g$ and g = g(n) a bandwidth sequence. The instrumental variable local linear estimates $\hat\varphi_\alpha(y_\alpha) = [\hat M_\alpha(y_\alpha), \hat V_\alpha(y_\alpha)]^\top$ are given as $(\hat a_1, \hat a_2)$ through minimizing the localized squared errors elementwise:

$$(\hat a_j, \hat b_j) = \operatorname*{arg\,min}_{a_j,\, b_j} \sum_{t=1}^{n} \hat W_t \bigl[\hat r_t^{(j)} - a_j - b_j (y_{t-\alpha} - y_\alpha)\bigr]^2 K_h(y_{t-\alpha} - y_\alpha), \qquad j = 1, 2, \tag{3.13}$$

where $\hat r_t^{(j)}$ is the jth element of $\hat r_t$.⁴

⁴ For simplicity, we choose a common bandwidth parameter for the kernel function K(·) in (3.12) and (3.13), which amounts to undersmoothing (for our choice of h) for the purposes of estimating m. Undersmoothing in the preliminary estimation of Step I allows us to control the biases from estimating m and v. In addition, the convolution kernel appearing in the asymptotic variance of Theorem 1 relies on the same bandwidth being used for K(·).

The closed form of the solution is

$$\hat\varphi_\alpha(y_\alpha)^\top = e_1^\top \Bigl[\sum_{t=1}^{n} \hat W_t K_h(y_{t-\alpha} - y_\alpha)\, D_t D_t^\top\Bigr]^{-1} \sum_{t=1}^{n} \hat W_t K_h(y_{t-\alpha} - y_\alpha)\, D_t\, \hat r_t^\top, \tag{3.14}$$

where $D_t = [1,\, y_{t-\alpha} - y_\alpha]^\top$ and $e_1 = [1, 0]^\top$.
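In code, (3.13)–(3.14) amount to a weighted least squares solve per evaluation point; a sketch under the same Gaussian-kernel assumption as before (names are ours):

```python
import numpy as np

def iv_local_linear(y_lag, r_hat, w_hat, y0, h):
    """One coordinate of (3.13): local linear fit of r_hat on y_lag at y0,
    with the kernel weights multiplied by the estimated instrument w_hat."""
    z = y_lag - y0
    k = w_hat * np.exp(-0.5 * (z / h)**2)          # W_hat_t * K_h(y_{t-a} - y0)
    D = np.column_stack([np.ones_like(z), z])      # design [1, y_{t-a} - y0]
    WD = D * k[:, None]
    a, b = np.linalg.solve(D.T @ WD, WD.T @ r_hat)
    return a                                       # intercept = phi_alpha_hat(y0)
```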

4. MAIN RESULTS

Let $\mathcal{F}_a^b$ be the σ-algebra of events generated by $\{y_t\}_{t=a}^{b}$ and α(k) the strong mixing coefficient of $\{y_t\}$, defined by

$$\alpha(k) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\; B \in \mathcal{F}_{k}^{\infty}} |P(A \cap B) - P(A)P(B)|.$$

Throughout the paper, we make the following assumptions.

Assumption A.

A1. $\{y_t\}_{t=1}^{\infty}$ is a stationary and strongly mixing process generated by (1.1)–(1.3), with a mixing coefficient such that $\sum_{k \ge 1} k^{a}\,\alpha(k)^{1-2/\nu} < \infty$, for some ν > 2 and 0 < a < (1 − 2/ν).

As pointed out by Masry and Tjøstheim (1997), the condition on the mixing coefficient in A1 is milder than the standard assumption that the coefficient decreases at a geometric rate, that is, $\alpha(k) = e^{-\beta k}$ (for some β > 0). Some technical regularity conditions are stated here. For simplicity, we assume that the process $\{y_t\}_{t=1}^{\infty}$ has a compact support.

A2. The additive component functions, mα(·) and vα(·), for α = 1,…,d, are continuous and twice differentiable on the compact support.

A3. The link functions, Gm and Gv, have bounded continuous second-order derivatives over any compact interval.

A4. The joint and marginal density functions, p(·), $p_\alpha(\cdot)$, and $p_{\underline{\alpha}}(\cdot)$, for α = 1,…,d, are continuous, twice differentiable with bounded (partial) derivatives, and bounded away from zero on the compact support.

A5. The kernel functions, K(·) and L(·), are real, bounded, nonnegative, symmetric (around zero) functions on a compact support satisfying ∫K(u) du = ∫L(u) du = 1 and ∫uK(u) du = ∫uL(u) du = 0. Also, assume that the kernel functions are Lipschitz continuous, |K(u) − K(v)| ≤ C|u − v|.

A6. (i) and (ii): rate conditions on the bandwidths h and g hold. (iii) The bandwidth satisfies a further rate condition involving a sequence {t(n)} of positive integers with t(n) → ∞.

Conditions A2–A5 are standard in kernel estimation. The continuity assumptions in A2 and A4, together with the compact support, imply that the functions are bounded. The bandwidth conditions in A6(i) and A6(ii) are necessary for showing the negligibility of the stochastic error terms arising from the preliminary estimation of m, v, and the densities. Under twice differentiability of these functions as in A2–A4, the given side conditions are satisfied when d ≤ 4. The asymptotic results that follow can be extended to the more general case of d > 4, although we do not prove this in the paper. One route to higher dimensions is to strengthen the differentiability conditions in A2–A4 and use higher order polynomials (see Kim et al., 1999). The additional bandwidth condition in A6(iii) is necessary to control the effects of the dependence of the mixing processes in showing the asymptotic normality of the instrumental variable estimates. The proof of consistency, however, does not require this condition. Define $[\nabla G_m(t), \nabla G_v(t)] = [dG_m(t)/dt,\, dG_v(t)/dt]$. Let $(K*K)_i(u) = \int K(w)K(w+u)\,w^i\,dw$, a convolution of kernel functions, $\mu^2_{K*K} = \int (K*K)_0(u)\,u^2\,du$, and let $\|K\|_2^2$ denote $\int K^2(u)\,du$. The asymptotic properties of the feasible instrumental variable estimates in (3.14) are summarized in the following theorem, whose proof is in the Appendix. Let $\kappa_3(y_\alpha, \underline{z}_\alpha) = E[\varepsilon_t^3 \mid x_t = (y_\alpha, \underline{z}_\alpha)]$ and $\kappa_4(y_\alpha, \underline{z}_\alpha) = E[(\varepsilon_t^2 - 1)^2 \mid x_t = (y_\alpha, \underline{z}_\alpha)]$. A ⊙ B denotes the matrix Hadamard product.
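As a concrete illustration of these kernel constants, suppose K is the standard Gaussian density φ (the assumptions require compact support, so this choice is purely illustrative). Then

$$(K*K)_0(u) = \int \phi(w)\,\phi(w+u)\,dw = \frac{1}{2\sqrt{\pi}}\, e^{-u^2/4}, \qquad \mu^2_{K*K} = \int (K*K)_0(u)\, u^2\, du = 2, \qquad \|K\|_2^2 = \frac{1}{2\sqrt{\pi}},$$

so the convolution of K with itself is again a symmetric density, with second moment twice that of K.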

THEOREM 1. Assume that conditions A1–A6 hold. Then,

Remarks.

1. To estimate $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$ we can use recentered estimates, subtracting from $\hat M_\alpha(\cdot)$ and $\hat V_\alpha(\cdot)$ their sample averages over $\{y_{t-\alpha}\}$. Because the sample averages converge at the parametric rate, the bias and variance of the recentered estimates are the same as those of $\hat M_\alpha(\cdot)$ and $\hat V_\alpha(\cdot)$. For $y = (y_1,\ldots,y_d)$, the estimates of the conditional mean and volatility are defined by

$$\hat m(y) = F_m\Bigl[\hat c_m + \sum_{\alpha=1}^{d} \hat m_\alpha(y_\alpha)\Bigr], \qquad \hat v(y) = F_v\Bigl[\hat c_v + \sum_{\alpha=1}^{d} \hat v_\alpha(y_\alpha)\Bigr].$$

Then, by Theorem 1 and the delta method, their asymptotic distributions follow. It is easy to see that the component estimates are asymptotically uncorrelated for any α and β and that the asymptotic variance of their sum is the sum of the variances of the components.

2. The first term of the bias is of standard form, depending only on the second derivatives, as in other local linear smoothing. The last term reflects the biases from using estimated density functions to construct the feasible instrumental variable, $\hat W_t$. When the instrument built from the known density functions, $p_{\underline{\alpha}}(\underline{y}_{t-\alpha})/p(x_t)$, is used in (3.13), the asymptotic properties of the instrumental variable estimates are the same as those in Theorem 1, except that the asymptotic bias then includes only the first two terms of $B_\alpha(y_\alpha)$.

3. The convolution kernel $(K*K)(\cdot)$ is the legacy of double smoothing in the instrumental variable estimation of "generalized" additive models, because we smooth $\hat r_t$, with $\hat m$ and $\hat v$ given by (multivariate) local linear fits. When $G_m(\cdot)$ is the identity, we can directly smooth $y_t$ instead of $\hat r_t$ to estimate the components of the conditional mean function. Then, as the following theorem shows, the second term of the bias $B_\alpha$ does not arise, and the convolution kernel in the variance is replaced by the usual kernel function.

Suppose that $F_m(t) = F_v(t) = t$ in (1.2) and (1.3). In this case, the instrumental variable estimates of $\varphi_\alpha(y_\alpha)$ can be defined in a simpler way. For $\varphi_\alpha(y_\alpha) = [M_\alpha(y_\alpha), V_\alpha(y_\alpha)]^\top = [c_m + m_\alpha(y_\alpha),\, c_v + v_\alpha(y_\alpha)]^\top$, we define the estimates by the solution to the adjusted-kernel least squares in (3.13), with the modification that the (2 × 1) vector $\hat r_t$ is replaced by $[y_t, \hat\varepsilon_t^2]^\top$, where $\hat\varepsilon_t = y_t - \hat m(x_t)$ is given in Step I in Section 3.1. Theorem 2 shows the asymptotic normality of these estimates. The proof is almost the same as that of Theorem 1 and thus is omitted.

THEOREM 2. Under the same conditions as Theorem 1,

Although the instrumental variable estimators achieve the one-dimensional optimal convergence rate, there is room for improvement in terms of variance. For example, compared with the marginal integration estimators of Linton and Härdle (1996) or Linton and Nielsen (1995), the asymptotic variances of the instrumental variable estimates of $m_1(\cdot)$ in Theorems 1 and 2 include an additional factor of $m_2^2(\cdot)$. This is because the instrumental variable approach treats $\eta = m_2(X_2) + \varepsilon$ in (2.6) as if it were the error term of the regression equation for $m_1(\cdot)$. Note that the second term of the asymptotic covariance in Theorem 2 is the same as that in Yang et al. (1999), where the authors considered only the case of additive mean and multiplicative volatility. The issue of efficiency in estimating an additive component was first addressed by Linton (1996), based on "oracle efficiency" bounds of infeasible estimators given knowledge of the other components. By this standard, both the instrumental variable and marginal integration estimators are inefficient, but they can attain the efficiency bounds through one simple additional step, following Linton (1996, 2000) and Kim et al. (1999).

5. MORE EFFICIENT ESTIMATION

5.1. Oracle Standard

In this section we define a standard of efficiency that could be achieved in the presence of certain information, and then we show how to achieve it in practice. There are several routes to efficiency here, depending on the assumptions one is willing to make about $\varepsilon_t$. We shall take an approach based on likelihood; that is, we shall assume that $\varepsilon_t$ is i.i.d. with known density function f, such as the normal or the t with given degrees of freedom. It is easy to generalize this to the case where f contains unknown parameters, but we shall not do so here. It is also possible to build an efficiency standard based on the moment conditions in (1.1)–(1.3). We choose the likelihood approach because it leads to easy calculations, links with existing work, and is the most common method for estimating parametric ARCH/GARCH models in applied work.
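For example, with standard normal errors we have π(e) = −log f(e) = e²/2 + ½ log 2π, so the negative conditional log density of $y_t$ given $x_t$, which is what gets localized below, is

$$\pi\!\left(\frac{y_t - m(x_t)}{v^{1/2}(x_t)}\right) + \tfrac12 \log v(x_t) = \frac{\{y_t - m(x_t)\}^2}{2\, v(x_t)} + \tfrac12 \log v(x_t) + \tfrac12 \log 2\pi,$$

the familiar Gaussian quasi-likelihood criterion of parametric ARCH estimation.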

There are several standards that we could apply here. First, suppose that we know $(c_m, \{m_\beta(\cdot) : \beta \neq \alpha\})$ and $(c_v, \{v_\alpha(\cdot) : \alpha\})$; then what is the best estimator we can obtain for the function $m_\alpha$ within the local polynomial paradigm? Similarly, suppose that we know $(c_m, \{m_\alpha(\cdot) : \alpha\})$ and $(c_v, \{v_\beta(\cdot) : \beta \neq \alpha\})$; then what is the best estimator we can obtain for the function $v_\alpha$? It turns out that this standard is very high and cannot be achieved in practice. Instead we ask: suppose that we know $(c_m, \{m_\beta(\cdot) : \beta \neq \alpha\})$ and $(c_v, \{v_\beta(\cdot) : \beta \neq \alpha\})$; then what is the best estimator we can obtain for the pair $(m_\alpha, v_\alpha)$? It turns out that this standard can be achieved in practice. Let π denote −log f(·), where f(·) is the density function of $\varepsilon_t$. We use $z_t$ to denote $(x_t, y_t)$, where $x_t = (y_{t-1},\ldots,y_{t-d}) = (y_{t-\alpha}, \underline{y}_{t-\alpha})$. For $\theta = (a_m, a_v, b_m, b_v)$, we define the (negative) conditional local log likelihood $l_t(\theta, \gamma_\alpha)$, and the infeasible local likelihood estimator is defined as the minimizer of

$$\sum_{t=1}^{n} l_t(\theta, \gamma_{\alpha 0})\, K_h(y_{t-\alpha} - y_\alpha),$$

where $\gamma_{\alpha 0}(\cdot) = (\gamma_{m0}(\cdot), \gamma_{v0}(\cdot)) = (c_{m0} + m_{\underline{\alpha}0}(\cdot),\, c_{v0} + v_{\underline{\alpha}0}(\cdot))$ collects the true off-α component sums. From the definition of the score function as the derivative of $l_t(\theta, \gamma_\alpha)$ with respect to θ, the first-order condition for the infeasible estimator is given by setting the kernel-weighted sum of scores to zero.

The asymptotic distribution of the local maximum likelihood estimator has been studied by Avramidis (2002). For $y = (y_1,\ldots,y_d) = (y_\alpha, \underline{y}_\alpha)$, define the information-type quantities entering the asymptotic variance below. With a minor generalization of the results of Avramidis (2002, Theorem 2), we obtain the following asymptotic properties for the infeasible estimators. Let $\varphi_\alpha^c(y_\alpha) = \varphi_\alpha(y_\alpha) - c$, where $c = (c_m, c_v)$.

THEOREM 3. Under Assumption C in the Appendix, it holds that the infeasible estimator of $\varphi_\alpha^c(y_\alpha)$ is asymptotically normal, with asymptotic variance $\Omega_\alpha^*(y_\alpha)$.

A more specific form for the asymptotic variance can be calculated. For example, suppose that the error density function f(·) is symmetric. Then the asymptotic variance of the volatility estimator can be expressed in terms of

$$g(y) = f'(y)\, f^{-1}(y)\, y + 1 \qquad\text{and}\qquad q(y) = \bigl[y^2 f''(y) f(y) + y f'(y) f(y) - y^2 f'(y)^2\bigr] f^{-2}(y).$$

When the error distribution is Gaussian, the asymptotic variance simplifies further. In this case, one can easily see that the infeasible estimator has lower asymptotic variance than the instrumental variable estimator. To see this, we note that $\nabla G_m = 1/\nabla F_m$ and $\|K\|_2^2 \le \|(K*K)_0\|_2^2$ and apply the Cauchy–Schwarz inequality. In a similar way, from $\kappa_4 = 3$ under the Gaussianity assumption on ε, a matching bound follows. These, together with $\kappa_3 = 0$, imply that the second term of $\Sigma_\alpha^*(y_\alpha)$ in Theorem 1 is greater than $\Omega_\alpha^*(y_\alpha)$ in the sense of positive definiteness, and hence $\Sigma_\alpha^*(y_\alpha) \ge \Omega_\alpha^*(y_\alpha)$, because the first term of $\Sigma_\alpha^*(y_\alpha)$ is a nonnegative matrix. The infeasible estimator is more efficient than the instrumental variable estimator because the former uses more information concerning the mean–variance structure. We finally remark that the infeasible estimator is also more efficient than the marginal integration estimator of Yang et al. (1999), whose asymptotic variance corresponds to the second term of $\Sigma_\alpha^*(y_\alpha)$; see the discussion following Theorem 2.

5.2. Feasible Estimation

Let $(\hat m, \hat v)$ be the estimators from (3.12) and (3.13) in Section 3, with the common bandwidth parameter $h_0$ chosen for the kernel function K(·). We define the feasible local likelihood estimator as the minimizer of

$$\sum_{t=1}^{n} l_t(\theta, \hat\gamma_\alpha)\, K_h(y_{t-\alpha} - y_\alpha),$$

where $\hat\gamma_\alpha$ is given by (5.15), with the additional bandwidth parameter h, possibly different from $h_0$. Then the first-order condition for the feasible estimator is given by setting the corresponding kernel-weighted sum of scores in (5.16) to zero. We have the following result.

THEOREM 4. Under Assumptions B and C in the Appendix, it holds that

This result shows that the oracle efficiency bound is achieved by the two-step estimator.
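To illustrate, here is a pointwise sketch of the two-step estimator under identity links and Gaussian errors: the Step II estimates of the off-α components are plugged in, and the local Gaussian likelihood is minimized over the four local parameters. The function name, the crude starting value, and the use of a generic optimizer are our simplifications; in practice one would start from the instrumental variable estimates and could use Newton steps with the analytic score.

```python
import numpy as np
from scipy.optimize import minimize

def two_step_point(y, y_lag, m_rest, v_rest, y0, h):
    """Local Gaussian likelihood at y0 with the off-alpha sums m_rest, v_rest
    plugged in; returns the local levels (a_m, a_v) estimating the alpha-th
    mean and variance components (plus constants) at y0."""
    z = y_lag - y0
    k = np.exp(-0.5 * (z / h)**2)                   # local kernel weights

    def nll(theta):
        am, av, bm, bv = theta
        m = m_rest + am + bm * z                    # local linear in the alpha lag
        v = np.maximum(v_rest + av + bv * z, 1e-6)  # keep the variance positive
        return np.sum(k * ((y - m)**2 / (2 * v) + 0.5 * np.log(v)))

    theta0 = np.array([0.0, y.var(), 0.0, 0.0])     # crude start (use IV estimates)
    return minimize(nll, theta0, method="Nelder-Mead").x[:2]
```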

6. NUMERICAL EXAMPLES

A small-scale simulation is carried out to investigate the finite-sample properties of both the instrumental variable and two-step estimators. The design in our experiment is an additive nonlinear ARCH(2) with volatility components $v_1(\cdot)$ and $v_2(\cdot)$ built from the (cumulative) standard normal distribution function $\Phi_N(\cdot)$, and with $\varepsilon_t$ i.i.d. N(0,1). Figure 1 (solid lines) depicts the shapes of the volatility functions defined by $v_1(\cdot)$ and $v_2(\cdot)$. Based on the preceding model, we simulate 500 samples of ARCH processes with sample size n = 500. For each realization of the ARCH process, we apply the instrumental variable estimation procedure in (3.13) to get preliminary estimates of $v_1(\cdot)$ and $v_2(\cdot)$. Those estimates are then used to compute the two-step estimates of the volatility functions based on the feasible local maximum likelihood estimator of Section 5.2, under the normality assumption for the errors. The infeasible oracle estimates are also provided for comparison. The Gaussian kernel is used for all the nonparametric estimates, and bandwidths are chosen according to the rule of thumb (Härdle, 1990) $h = c_h\, \mathrm{std}(y_t)\, n^{-1/(4+d)}$, where $\mathrm{std}(y_t)$ is the standard deviation of $y_t$. We fix $c_h = 1$ for both the density estimates (for computing the instruments, W) and the instrumental variable estimates in (3.13) and $c_h = 1.5$ for the (feasible and infeasible) local maximum likelihood estimators. To evaluate the performance of the estimators, we calculate the mean squared error (MSE), together with the mean absolute deviation error (MAE), for each simulated data set: for α = 1, 2, the errors $e_{\alpha,\mathrm{MSE}}$ and $e_{\alpha,\mathrm{MAE}}$ average the squared and absolute deviations over grid points $\{y_1,\ldots,y_{50}\}$ on [−1, 1). The grid range covers about 70% of the observations on average. Table 1 gives the averages of the $e_{\alpha,\mathrm{MSE}}$'s and $e_{\alpha,\mathrm{MAE}}$'s over the 500 repetitions.
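In code, the error measures for one replication, together with the rule-of-thumb bandwidth quoted above, look as follows (our implementation of the displayed definitions, with demeaned estimates as in the figures):

```python
import numpy as np

grid = np.linspace(-1.0, 1.0, 50, endpoint=False)        # 50 grid points on [-1, 1)

def rule_of_thumb_h(y, d, ch=1.0):
    """h = ch * std(y) * n^(-1/(4+d)), as quoted from Hardle (1990)."""
    return ch * y.std() * len(y) ** (-1.0 / (4 + d))

def mse_mae(v_hat, v_true):
    """Average squared and absolute deviation over the grid, after demeaning."""
    dev = (v_hat - v_hat.mean()) - (v_true - v_true.mean())
    return np.mean(dev**2), np.mean(np.abs(dev))
```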

Averages of volatility estimates (demeaned): (a) first lag; (b) second lag.

Average MSE and MAE for the three volatility estimators

Table 1 shows that the infeasible oracle estimator is the best of the three, as would be expected. The performance of the instrumental variable estimator seems reasonably good compared with the local maximum likelihood estimators, at least in estimating the volatility function of the first lagged variable. However, the overall accuracy of the instrumental variable estimates is improved by the two-step procedure, which behaves almost as well as the infeasible one, confirming our theoretical results in Theorem 4. For further comparison, Figure 1 shows the averaged estimates of the volatility functions, where the averages are taken, at each grid point, over the 500 simulations. In Figure 2, we also illustrate the estimates for three typical (consecutive) realizations of the ARCH processes.

Volatility estimates (demeaned).

APPENDIX

A.1. Proofs for Section 4.

The proof of Theorem 1 consists of three steps. Without loss of generality we deal with the case α = 1; here we will use the subscript 2, for expositional convenience, to denote the nuisance direction. That is, we write $p_2(\underline{y}_{k-1})$ for $p_{\underline{1}}(\underline{y}_{k-1})$ in the case of the density function. For the component functions, $m_2(\underline{y}_{k-1})$, $v_2(\underline{y}_{k-1})$, and $H_2(\underline{y}_{k-1})$ will be used instead of $m_{\underline{1}}(\underline{y}_{k-1})$, $v_{\underline{1}}(\underline{y}_{k-1})$, and $H_{\underline{1}}(\underline{y}_{k-1})$, respectively. We start by decomposing the estimation errors into the main stochastic term and the bias. We use $X_n \approx Y_n$ to denote $X_n = Y_n\{1 + o_p(1)\}$ in what follows. Let vec(X) denote the vectorization of the elements of the matrix X along its columns.

Proof of Theorem 1.

Step I. Decompositions and Approximations.

Because

is a column vector, the vectorization of equation (3.14) gives

A similar form is obtained for the true function, φ1(y1),

by the identity

because

By defining

, the estimation errors are

where

Observing

where

, it follows by adding and subtracting $r_k = \varphi_1(y_{k-1}) + H_2(\underline{y}_{k-1})$ that

As a result of the boundedness condition in Assumption A2, the Taylor expansion applied to

at [m(xk),v(xk)] yields the first term of τn as

where $m^*(x_k)$ [$v^*(x_k)$] lies between $\hat m(x_k)$ and $m(x_k)$ [$\hat v(x_k)$ and $v(x_k)$]. In a similar way, the Taylor expansion of $\varphi_1(y_{k-1})$ at $y_1$ gives the second term of $\tau_n$ as

The term

continues to be simplified by some further approximations. Define the marginal expectation of estimated density functions

as follows:

In the first approximation, we replace the estimated instrument,

, by the ratio of the expectations of the kernel density estimates, p2(yk−1)/p(xk) and deal with the linear terms in the Taylor expansions. That is,

is approximated with an error of

by t1n + t2n:

based on the following results:

To show (i), consider the first two elements of the term, for example, which are bounded elementwise by

The last equality is direct from the uniform convergence theorems in Masry (1996) that

and

. The proof for (ii) is shown by applying Lemma A.1, which follows. The negligibility of (iii) follows in a similar way from (ii), considering (A.1). Although the asymptotic properties of s0n and t2n are relatively easy to derive, additional approximation is necessary to make t1n more tractable. Note that the estimation errors of the local linear fits,

, are decomposed into

from the approximation results for the local linear smoother in Jones, Davies, and Park (1994). A similar expression holds for volatility estimates,

, with a stochastic term of (1/n)[sum ]l [Kh(xlxk)/p(xl)]v(xl)(εl2 − 1). Define

and let $J(x_l)$ denote the marginal expectation of $J_{k,n}$ with respect to $x_k$. Then the stochastic term of $t_{1n}$, after rearranging the double sums, is approximated by the corresponding sum involving $J(x_l)$, because the approximation error from $J(x_l)$ is negligible, applying the same method as in Lemma A.1. A straightforward calculation gives

where

Observe that $(K*K)_i((y_{l-1} - y_1)/h)$ in $J(x_l)$ is actually a convolution kernel and behaves just like a one-dimensional kernel function of $y_{l-1}$. This means that the standard method (central limit theorem or law of large numbers) for univariate kernel estimates can be applied to derive the asymptotics of

If we define s1n as the remaining bias term of t1n, the estimation errors of

consist of two stochastic terms,

, and three bias terms,

, where

Step II. Computation of Variance and Bias.

We start with showing the order of the main stochastic term,

where ξk = ξ1k + ξ2k,

by calculating its asymptotic variance. Dividing a normalized variance of

into the sums of variances and covariances gives

where the last equality comes from the stationarity assumption.

We claim that

where

Proof of (a). Noting

by the stationarity assumption. Applying integration with substitution of variables and a Taylor expansion, the expectation term is

where $\kappa_3(y_1, \underline{z}_2) = E[\varepsilon_t^3 \mid x_t = (y_1, \underline{z}_2)]$ and $\kappa_4(y_1, \underline{z}_2) = E[(\varepsilon_t^2 - 1)^2 \mid x_t = (y_1, \underline{z}_2)]$. █

Proof of (b). Because

By setting c(n)h → 0, as n → ∞, we separate the covariance terms into two parts:

To show the negligibility of the first part of the covariances, note that the dominated convergence theorem, applied after a Taylor expansion and integration with substitution of variables, gives

Therefore, it follows from the boundedness condition in Assumption A2 that

where $A \le B$ means $a_{ij} \le b_{ij}$ for all elements of the matrices A and B. By the construction of c(n),

Next, we turn to the negligibility of the second part of the covariances:

Let ξ2ki be the ith element of ξ2k, for i = 1,…,4. Using Davydov's lemma (in Hall and Heyde, 1980, Theorem A.5), we obtain

for some ν > 2. The boundedness of

, for example, is evident from the direct calculation that

Thus, the covariance is bounded by

This implies

if a is such that

for example, $c(n)^a h^{1-2/\nu} = 1$, which implies c(n) → ∞. If we further restrict a such that

then

Thus, c(n)h → 0 as required. Therefore,

as n goes to ∞. █

The proof of (c) is immediate from (a) and (b).

Next, we consider the asymptotic bias. Using the standard result on the kernel weighted sum of the stationary series, we first get

because

For the asymptotic bias of s1n, we again use the approximation results in Jones et al. (1994). Then, the first component of s1n, for example, is

and converges to

based on the argument for the convolution kernel given previously. A convolution of symmetric kernels is symmetric, so that $\int (K*K)_0(u)\,u\,du = 0$ and $\int (K*K)_1(u)\,u^2\,du = \int\!\!\int w\,K(w)K(w+u)\,u^2\,dw\,du = 0$. This implies that

To calculate $s_{2n}$, we use the Taylor series expansion of $p_2(\underline{y}_{k-1})/p(x_k)$:

Thus,

Finally, for the probability limit of

we note that

with

, for i = 0,1,2, and

where $q_0 = 1$, $q_1 = 0$, and $q_2 = \mu_K^2$.

Thus,

. Therefore,

Step III. Asymptotic Normality of
.

Applying the Cramér–Wold device, it is sufficient to show asymptotic normality of an arbitrary linear combination of the components. We use the small block–large block argument (see Masry and Tjøstheim, 1997). Partition the set {d, d + 1,…,n} into 2k + 1 subsets with large blocks of size $r = r_n$ and small blocks of size $s = s_n$, where

and [x] denotes the integer part of x. Define

Then,

Because of Assumption A6, there exists a sequence an → ∞ such that

defining the large block size as

It is easy to show by (A.2) and (A.3) that as n → ∞

We first show that Sn′′ and Sn′′′ are asymptotically negligible. The same argument used in step II yields

which implies

from the condition (A.4). Next, consider

where $N_j = j(r + s) + r$. Because $|N_i - N_j + k_1 - k_2| \ge r$ for $i \neq j$, the covariance term is bounded by

The last equality also follows from step II. Hence, (1/n)E {(Sn′′)2} → 0, as n → ∞. Repeating a similar argument for Sn′′′, we get

Now, it remains to show

.

Because $\eta_j$ is measurable with respect to the corresponding σ-algebra, the lemma of Volkonskii and Rozanov (1959), as given in the appendix of Masry and Tjøstheim (1997), implies that, with

,

where the last two equalities follow from (A.4). Thus, the summands {ηj} in Sn′ are asymptotically independent, because an operation similar to (A.5) yields

Finally, because of the boundedness of density and kernel functions, the Lindeberg–Feller condition for the asymptotic normality of Sn′ holds:

for every δ > 0. This completes the proof of step III.

From

, the Slutsky theorem implies

, where

. In sum,

given by

LEMMA A.1. Assume the conditions in Assumptions A1 and A4–A6. For a bounded function, F(·), it holds that

Proof. The proof of (b) is almost the same as that of (a); therefore we show only (a). By adding and subtracting $L_{l|k}(\underline{y}_{l-2} \mid \underline{y}_{k-2})$, the conditional expectation of $L_g(\underline{y}_{l-2} - \underline{y}_{k-2})$ given $\underline{y}_{k-2}$, in $r_{1n}$, we get $r_{1n} = \xi_{1n} + \xi_{2n}$, where

Rewrite ξ2n as

where k*(n) increases to infinity as n → ∞. Let

which exists as a result of the boundedness of F(xk). Then, for a large n, the first part of ξ2n is asymptotically equivalent to (1/n)k*(n)B. The second part of ξ2n is bounded by

Therefore,

, for example.

It remains to show

. Because E1n) = 0 from the law of iteration, we just compute

(1) Consider the case k = i and l ≠ j.

because, by the law of iterated expectations and the definition of $L_{j|k}(\underline{y}_{k-2})$,

(2) Consider the case l = j and k ≠ i.

We only calculate

because the rest of the triple sum consists of expectations of standard kernel estimates and is O(1/n). Note that

where $(L*L)_g(\cdot) = (1/g)\int L(u)\,L(u + \cdot/g)\,du$ is a convolution kernel. Thus, (A.6) is

(3) Consider the case with i = k, j = m:

(4) Consider the case k ≠ i, l ≠ j:

for the same reason as in (1). █

A.2. Proofs for Section 5.

Recall that $x_t = (y_{t-1},\ldots,y_{t-d}) = (y_{t-\alpha}, \underline{y}_{t-\alpha})$ and $z_t = (x_t, y_t)$. In a similar vein, let $x = (y_1,\ldots,y_d) = (y_\alpha, \underline{y}_\alpha)$ and $z = (x, y_0)$. For the score function $s^*(z,\theta,\gamma_\alpha) = s^*(z,\theta,\gamma_\alpha(\underline{y}_\alpha))$, we define its first derivative with respect to the parameter θ by

and use

to denote $E[s^*(z_t,\theta,\gamma_\alpha)]$ and $E[\nabla_\theta s^*(z_t,\theta,\gamma_\alpha)]$, respectively. Also, the score function $s^*(z,\theta,\cdot)$ is said to be Fréchet differentiable (with respect to the sup norm ∥·∥) if there is $S^*(z,\theta,\gamma_\alpha)$ such that, for all $\gamma_\alpha$ with $\|\gamma_\alpha - \gamma_{\alpha 0}\|$ small enough,

for some bounded function b(·). The term S*(z,θ,γα0) is called the functional derivative of s*(z,θ,γα) with respect to γα. In a similar way, we define ∇γ S*(z,θ,γα) to be the functional derivative of S*(z,θ,γα) with respect to γα.

Assumption B. Suppose that (i)

is nonsingular; (ii) $S^*(z,\theta,\gamma_\alpha(\underline{y}_\alpha))$ and $\nabla_\gamma S^*(z,\theta,\gamma_\alpha(\underline{y}_\alpha))$ exist and have square integrable envelopes $\bar S^*(\cdot)$ and $\overline{\nabla_\gamma S}^*(\cdot)$, satisfying

and (iii) both s*(z,θ,γα) and S*(z,θ,γα) are continuously differentiable in θ, with derivatives bounded by square integrable envelopes.

Note that the first condition is related to the identification condition of the component functions, whereas the second concerns Fréchet differentiability (up to the second order) of the score function and uniform boundedness of the functional derivatives. For the main results in Section 5, we need the following conditions. Some of the assumptions are stronger than their counterparts in Assumption A in Section 4. Let $h_0$ and h denote the bandwidth parameters used for the preliminary instrumental variable and the two-step estimates, respectively, and g the bandwidth parameter for the kernel density.

Assumption C.

1. $\{y_t\}_{t=1}^{\infty}$ is stationary and strongly mixing with a mixing coefficient $\alpha(k) = e^{-\beta k}$, for some β > 0, and $E(\varepsilon_t^4 \mid x_t) < \infty$, where $\varepsilon_t = y_t - E(y_t \mid x_t)$.

2. The joint density function, p(·), is bounded away from zero and q-times continuously differentiable on the compact support, with Lipschitz continuous remainders; that is, there exists C < ∞ such that the qth-order partial derivatives $D^\mu p$, for all vectors $\mu = (\mu_1,\ldots,\mu_d)$ with $|\mu| = q$, are Lipschitz continuous with constant C.

3. The component functions, $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$, for α = 1,…,d, are q-times continuously differentiable on the compact support, with Lipschitz continuous qth derivatives.

4. The link functions, Gm and Gv, are q-times continuously differentiable over any compact interval of the real line.

5. The kernel functions, K(·) and L(·), are of bounded support, symmetric about zero, satisfying ∫K(u) du = ∫L(u) du = 1, and of order q; that is, $\int u^i K(u)\,du = \int u^i L(u)\,du = 0$, for i = 1,…,q − 1. Also, the kernel functions are q-times differentiable with Lipschitz continuous qth derivatives.

6. The true parameters θ0 = (mα(yα),vα(yα),mα′(yα),vα′(yα)) lie in the interior of the compact parameter space Θ.

7. (i) g → 0, $ng^d \to \infty$; (ii) $h_0 \to 0$, $nh_0 \to \infty$.

8. (i) Rate conditions on the bandwidths hold and, for some integer ω > d/2, a corresponding condition holds;
 (ii) $n(h_0 h)^{2\omega+1}/\log n \to \infty$ and $h_0^{q-\omega} h^{-\omega-1/2} \to 0$;
 (iii) $nh_0^{d+(4\omega+1)}/\log n \to \infty$ and $q \ge 2\omega + 1$.

Some facts about empirical processes will be useful in the discussion that follows. Define the $L_2$-Sobolev norm (of order q) on the class of real-valued functions with domain $\mathcal{X} \subseteq \mathbb{R}^k$:

$$\|f\|_{q,2} = \Bigl(\sum_{|\mu| \le q} \int_{\mathcal{X}} |D^\mu f(x)|^2\, dx\Bigr)^{1/2},$$

where, for $x = (x_1,\ldots,x_k)$ and a k-vector $\mu = (\mu_1,\ldots,\mu_k)$ of nonnegative integers, $D^\mu f = \partial^{|\mu|} f/\partial x_1^{\mu_1}\cdots\partial x_k^{\mu_k}$ with $|\mu| = \sum_j \mu_j$, and q ≥ 1 is some positive integer. Let $\mathcal{X}$ be an open set in $\mathbb{R}^k$ with minimally smooth boundary as defined by, for example, Stein (1970). Define $\Gamma_\alpha$ as a class of smooth functions on $\mathcal{X}$ whose $L_2$-Sobolev norm is bounded by some constant C < ∞; a second such class is defined in the same way.

Define (i) an empirical process, $\nu_{1n}(\cdot)$, indexed by $\tau_1 \in \Gamma_\alpha$:

$$\nu_{1n}(\tau_1) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \bigl\{ f_1(z_t; \tau_1) - E[f_1(z_t; \tau_1)] \bigr\},$$

with pseudometric $\rho_1(\cdot,\cdot)$ on $\Gamma_\alpha$, where $f_1(w; \tau_1) = h^{-1/2} K((w_\alpha - y_\alpha)/h)\, S^*(w, \theta_0, \gamma_{\alpha 0}(\underline{w}_\alpha))\, \tau_1(\underline{w}_\alpha)$; and (ii) an empirical process, $\nu_{2n}(\cdot,\cdot)$, indexed by $(y_\alpha, \tau_2)$:

$$\nu_{2n}(y_\alpha, \tau_2) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \bigl\{ f_2(z_t; y_\alpha, \tau_2) - E[f_2(z_t; y_\alpha, \tau_2)] \bigr\},$$

with pseudometric $\rho_2(\cdot,\cdot)$, where $f_2(w; y_\alpha, \tau_2) = h_0^{-1/2} K[(w_\alpha - y_\alpha)/h_0]\,[p_{\underline{\alpha}}(\underline{w}_\alpha)/p(w)]\, G_m'(m(w))\, \tau_2(w)$.

We say that the processes $\{\nu_{1n}(\cdot)\}$ and $\{\nu_{2n}(\cdot,\cdot)\}$ are stochastically equicontinuous at $\tau_{10}$ and $(y_{\alpha 0}, \tau_{20})$, respectively (with respect to the pseudometrics $\rho_1(\cdot,\cdot)$ and $\rho_2(\cdot,\cdot)$, respectively), if the suprema of their increments over shrinking pseudometric balls converge to zero in outer probability, where $P^*$ denotes the outer measure of the corresponding probability measure; these are conditions (A.10) and (A.11), respectively.

Let $\mathcal{F}_1$ be the class of functions such as $f_1(\cdot)$ defined previously. Note that (A.10) follows if Pollard's entropy condition is satisfied by $\mathcal{F}_1$ with some square integrable envelope $\bar F_1$; see Pollard (1990) for more details. Because $f_1(w; \tau_1) = c_1(w)\,\tau_1(\underline{w}_\alpha)$ is the product of smooth functions $\tau_1$ from an infinite-dimensional class (with uniformly bounded partial derivatives up to order q) and a single unbounded function $c_1(w) = h^{-1/2} K((w_\alpha - y_\alpha)/h)\, S^*(w, \theta_0, \gamma_{\alpha 0}(\underline{w}_\alpha))$, the entropy condition is verified by Theorem 2 in Andrews (1994) on a class of functions of type III. Square integrability of the envelope $\bar F_1$ comes from Assumption B(ii). In a similar way, we can show (A.11) by applying the "mix and match" argument of Theorem 3 in Andrews (1994) to $f_2(w; y_\alpha, \tau_2) = c_2(w)\, h_0^{-1/2} K((w_\alpha - y_\alpha)/h_0)\, \tau_2(w)$, where K(·) is Lipschitz continuous in $y_\alpha$, that is, a function of type II.

Proof of Theorem 4. We only give a sketch, because the whole proof is lengthy and relies on arguments similar to Andrews (1994) or Gozalo and Linton (2000) for the i.i.d. case. Expanding the first-order condition in (5.16) and solving for

yields

where θ is the mean value between

. By the uniform law of large numbers in Gozalo and Linton (2000), we have

, which, together with (i) uniform convergence of

by Lemma A.3 and (ii) uniform continuity of the localized likelihood function, Qn(θ,γα) over Θ × Γα, yields

and thus consistency of

. Based on the ergodic theorem on the stationary time series and a similar argument to Theorem 1 in Andrews (1994), consistency of

and uniform convergence of

imply

For the numerator, we first linearize the score function. Under Assumption B(ii), $s^*(z,\theta,\gamma_\alpha)$ is Fréchet differentiable and (A.7) holds, which, because of

(by Lemma A.3 and Assumption C.8(i)), yields a proper linearization of the score term:

where $S^*(z_t, \gamma_{\alpha 0}(\underline{y}_{t-\alpha})) = S^*(z_t, \theta_0, \gamma_{\alpha 0}(\underline{y}_{t-\alpha}))$. Or equivalently, by letting

and $u_t = S^*(x_t, \gamma_{\alpha 0}(\underline{y}_{t-\alpha})) - E[S^*(x_t, \gamma_{\alpha 0}(\underline{y}_{t-\alpha})) \mid x_t = y]$, we have

Note that the asymptotic expansion of the infeasible estimator is equivalent to the first term of the linearized score function premultiplied by the inverse Hessian matrix in (A.12). Because of the asymptotic boundedness of (A.12), it suffices to show the negligibility of the second and third terms.

To calculate the asymptotic order of T2n, we make use of the preceding stochastic equicontinuity results. For a real-valued function δ(·) on

, we define an empirical process

where $f(x_t; y_\alpha, \delta) = K((y_{t-\alpha} - y_\alpha)/h)\, h^{\omega}\, S^*(x_t, \gamma_{\alpha 0}(\underline{y}_{t-\alpha}))\, \delta(\underline{y}_{t-\alpha})$, for some integer ω > d/2. Let

. From the uniform convergence rate in Lemma A.3 and the bandwidth condition C.8(ii), it follows that

Because

is bounded uniformly over

, with probability approaching one, it holds that

. Also, because, for some positive constant C < ∞,

we have

. Hence, following Andrews (1994, p. 2257), the stochastic equicontinuity condition of vn(yα,·) at δ0 = 0 implies that

; that is, T2n is approximated (with an op(1) error) by

We proceed to show the negligibility of $T_{2n}^*$. From the integrability condition on $S^*(z, \gamma_{\alpha 0}(\underline{y}_\alpha))$, it follows, by a change of variables and the dominated convergence theorem, that $\int K_h(y_\alpha - y_{\alpha 0})\, S^*(z, \gamma_{\alpha 0}(\underline{y}_\alpha))\, dF_0(z) = \int S^*[(y, y_{\alpha 0}, \underline{y}_\alpha), \gamma_{\alpha 0}(\underline{y}_\alpha)]\, p(y, y_{\alpha 0}, \underline{y}_\alpha)\, d(y, \underline{y}_\alpha) < \infty$, which, together with

-consistency of

, means that

. Because

this yields

From Lemma A.3,

where

. Under the condition C.8(i),

, integrability of the bias function bβ(yβ) and S*(z0α0(yα)) imply

where

Let

be the ith elements of

, respectively, with S*ij(·) being the (i,j) element of S*(·). By the dominated convergence theorem and the integrability condition, we have

where

and ∇Gj(·) = ∇Gm(·), for j = 1; ∇Gv(·), for j = 2. Because p2(·)/p(·) and

are bounded under the condition of compact support, applying the law of large numbers for i.i.d. errors

leads to

and consequently

. Likewise,

where

and, for the same reason as before, we get

, because E(mα(yt−α)) = E(vα(yt−α)) = 0.

We finally show negligibility of the last term:

Substituting the error decomposition for

and interchanging the summations gives

where the op(1) errors for the remaining bias terms hold under the assumption that

. For

we can easily check that $E(\pi_{1ni}(z_t, z_s) \mid z_t) = E(\pi_{1ni}(z_t, z_s) \mid z_s) = 0$ for $t \neq s$, implying that $\sum\sum_{t \neq s} \pi_{ni}(z_t, z_s)$ is a degenerate second-order U-statistic. The same conclusion also holds for the second term. Hence, the two double sums are mean zero and have variance of the same order as

which is of order n−1h−1. Therefore, T3n = op(1). █

LEMMA A.2. (Masry, 1996). Suppose that Assumption C holds. Then, for any vector

with |μ| = Σj μj ≤ ω,

LEMMA A.3. Suppose that Assumption C holds. Then, for any vector

with |μ| = Σj μj ≤ ω,

Proof. We first show (b). For notational simplicity, the bandwidth parameter h (only in this proof) abbreviates h0. From the decomposition results for the instrumental variable estimates,

By the Cauchy–Schwarz inequality and Lemma A.2 applied with Taylor expansion, it holds that

where the boundedness condition of C.2 is used for the last line. Hence, the standard argument of Masry (1996) implies that

, where $q_i = \int K(u_1)\, u_1^i\, du_1$. From $q_0 = 1$, $q_1 = 0$, and $q_2 = \mu_K^2$, we get the following uniform convergence result for the denominator term; that is,

, uniformly in

. For the numerator, we show the uniform convergence rate of the first element of τn because the other terms can be treated in the same way. Let τn1 denote the first element of τn, that is,

or alternatively,

where

Because $p_{\underline{\alpha}}(\cdot)/p(\cdot)$ is bounded away from zero and $G_m$ has a bounded second-order derivative, the functional $r(x_t; g)$ is Fréchet differentiable in g, with respect to the sup norm ∥·∥, with the (bounded) functional derivative $R(x_t; g) = [\partial r(x_t; g)/\partial g]\big|_{g = g(x_t)}$. This implies that for all g with $\|g - g_0\|$ small enough, there exists some bounded function b(·) such that

By Lemma A.2,

, and consequently, we can properly linearize τn1 as

where the $O_p$ error term holds uniformly in $y_\alpha$. After plugging $G_m(m(x_t)) = c_m + \sum_{1 \le \beta \le d} m_\beta(y_{t-\beta})$ into $r(x_t; g_0)$, a straightforward calculation shows that

where $\varsigma_t = [p_{\underline{\alpha}}(\underline{y}_{t-\alpha})/p(x_t)]\, M_{\underline{\alpha}}(\underline{y}_{t-\alpha})$ and $M_{\underline{\alpha}}(\underline{y}_{t-\alpha}) = \sum_{1 \le \beta \le d,\, \beta \neq \alpha} m_\beta(y_{t-\beta})$. Note that, as a result of the identification condition, $E[\varsigma_t \mid y_{t-\alpha}] = 0$, so the first term is a standard stochastic term appearing in kernel estimates. For a further asymptotic expansion of the second term of $\tau_{n1}$, we apply the stochastic equicontinuity argument to the empirical process $\{\nu_n(\cdot,\cdot)\}$, indexed by

, with

, such that

where $f(x_t; y_\alpha, \delta) = K[(y_{t-\alpha} - y_\alpha)/h]\, h^{\omega}\, [p_{\underline{\alpha}}(\underline{y}_{t-\alpha})/p(x_t)]\, G_m'(m(x_t))\, \delta(\underline{y}_{t-\alpha})$, for some positive integer ω > d/2. Let

. From the uniform convergence rate in Lemma A.2 and the bandwidth condition in C.8(iii), it follows that

, leading to (i)

and (ii)

, where δ0 = 0. These conditions and stochastic equicontinuity of vn(·,·) at (yα0) yield

. Thus, the second term of τn1 is approximated with an

error (uniform in yα) by

which, by substituting

, is given by

where $(K*K)(\cdot)$ is a convolution kernel as defined before. Hence, by letting $b_\alpha(y_\alpha)$ summarize the two bias terms appearing in (A.13) and (A.14), Lemma A.3(b) is shown. The uniform convergence results in part (a) then follow by the standard arguments of Masry (1996), because the two stochastic terms in the asymptotic expansion of

consist only of univariate kernels. █

REFERENCES

Andrews, D.W.K. (1994) Empirical process methods in econometrics. In R.F. Engle & D. McFadden (eds.), Handbook of Econometrics, vol. IV, pp. 2247–2294. North-Holland.
Auestad, B. & D. Tjøstheim (1990) Identification of nonlinear time series: First order characterization and order estimation. Biometrika 77, 669–687.
Avramidis, P. (2002) Local maximum likelihood estimation of volatility function. Manuscript, LSE.
Breiman, L. & J.H. Friedman (1985) Estimating optimal transformations for multiple regression and correlation (with discussion). Journal of the American Statistical Association 80, 580–619.
Buja, A., T. Hastie, & R. Tibshirani (1989) Linear smoothers and additive models (with discussion). Annals of Statistics 17, 453–555.
Cai, Z. & E. Masry (2000) Nonparametric estimation of additive nonlinear ARX time series: Local linear fitting and projections. Econometric Theory 16, 465–501.
Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.
Chen, R. (1996) A nonparametric multi-step prediction estimator in Markovian structures. Statistica Sinica 6, 603–615.
Chen, R. & R.S. Tsay (1993a) Nonlinear additive ARX models. Journal of the American Statistical Association 88, 955–967.
Chen, R. & R.S. Tsay (1993b) Functional-coefficient autoregressive models. Journal of the American Statistical Association 88, 298–308.
Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987–1008.
Fan, J. & Q. Yao (1996) Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645–660.
Gozalo, P. & O. Linton (2000) Local nonlinear least squares: Using parametric information in nonparametric regression. Journal of Econometrics 99(1), 63–106.
Hall, P. & C. Heyde (1980) Martingale Limit Theory and Its Application. Academic Press.
Härdle, W. (1990) Applied Nonparametric Regression. Econometric Monograph Series 19. Cambridge University Press.
Härdle, W. & A.B. Tsybakov (1997) Locally polynomial estimators of the volatility function. Journal of Econometrics 81, 223–242.
Härdle, W., A.B. Tsybakov, & L. Yang (1998) Nonparametric vector autoregression. Journal of Statistical Planning and Inference 68(2), 221–245.
Härdle, W. & P. Vieu (1992) Kernel regression smoothing of time series. Journal of Time Series Analysis 13, 209–232.
Hastie, T. & R. Tibshirani (1987) Generalized additive models: Some applications. Journal of the American Statistical Association 82, 371–386.
Hastie, T. & R. Tibshirani (1990) Generalized Additive Models. Chapman and Hall.
Horowitz, J. (2001) Estimating generalized additive models. Econometrica 69, 499–513.
Jones, M.C., S.J. Davies, & B.U. Park (1994) Versions of kernel-type regression estimators. Journal of the American Statistical Association 89, 825–832.
Kim, W., O. Linton, & N. Hengartner (1999) A computationally efficient oracle estimator of additive nonparametric regression with bootstrap confidence intervals. Journal of Computational and Graphical Statistics 8, 1–20.
Linton, O.B. (1996) Efficient estimation of additive nonparametric regression models. Biometrika 84, 469–474.
Linton, O.B. (2000) Efficient estimation of generalized additive nonparametric regression models. Econometric Theory 16, 502–523.
Linton, O.B. & W. Härdle (1996) Estimating additive regression models with known links. Biometrika 83, 529–540.
Linton, O.B. & J. Nielsen (1995) A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93–100.
Linton, O.B., J. Nielsen, & S. van de Geer (2003) Estimating multiplicative and additive hazard functions by kernel methods. Annals of Statistics 31, 464–492.
Linton, O.B., N. Wang, R. Chen, & W. Härdle (1995) An analysis of transformation for additive nonparametric regression. Journal of the American Statistical Association 92, 1512–1521.
Mammen, E., O.B. Linton, & J. Nielsen (1999) The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Annals of Statistics 27, 1443–1490.
Masry, E. (1996) Multivariate local polynomial regression for time series: Uniform strong consistency and rates. Journal of Time Series Analysis 17, 571–599.
Masry, E. & D. Tjøstheim (1995) Nonparametric estimation and identification of nonlinear ARCH time series: Strong convergence and asymptotic normality. Econometric Theory 11, 258–289.
Masry, E. & D. Tjøstheim (1997) Additive nonlinear ARX time series and projection estimates. Econometric Theory 13, 214–252.
Nelson, D.B. (1991) Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347–370.
Newey, W.K. (1994) Kernel estimation of partial means. Econometric Theory 10, 233–253.
Opsomer, J.D. & D. Ruppert (1997) Fitting a bivariate additive model by local polynomial regression. Annals of Statistics 25, 186–211.
Pollard, D. (1990) Empirical Processes: Theory and Applications. CBMS Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical Statistics.
Robinson, P.M. (1983) Nonparametric estimation for time series models. Journal of Time Series Analysis 4, 185–208.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall.
Stein, E.M. (1970) Singular Integrals and Differentiability Properties of Functions. Princeton University Press.
Stone, C.J. (1985) Additive regression and other nonparametric models. Annals of Statistics 13, 685–705.
Stone, C.J. (1986) The dimensionality reduction principle for generalized additive models. Annals of Statistics 14, 592–606.
Tjøstheim, D. & B. Auestad (1994) Nonparametric identification of nonlinear time series: Projections. Journal of the American Statistical Association 89, 1398–1409.
Volkonskii, V. & Y. Rozanov (1959) Some limit theorems for random functions. Theory of Probability and Its Applications 4, 178–197.
Yang, L., W. Härdle, & J. Nielsen (1999) Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis 20, 579–604.
Ziegelmann, F. (2002) Nonparametric estimation of volatility functions: The local exponential estimator. Econometric Theory 18, 985–992.