We develop general conditions for rates of convergence and convergence in distribution of iterative procedures for estimating finite-dimensional parameters. An asymptotic contraction mapping condition is the centerpiece of the theory. We illustrate some of the results by deriving the limiting distribution of a two-stage iterative estimator of regression parameters in a semiparametric binary response model. Simulation results illustrating the computational benefits of the first-stage iterative estimator are also reported.

We thank a co-editor and two referees for comments and criticisms that led to significant improvements in this paper. We also thank Roger Klein for providing us with Gauss code to compute his estimator.
There is a substantial literature on convergence properties of various widely applied iterative estimation procedures such as the expectation-maximization (EM) algorithm and its many descendants (see, e.g., McLachlan and Krishnan, 1997). Often in this literature, observed data are conditioned on, and convergence refers to numerical convergence of sample iterates to a sample fixed point. Similarly, a rate of convergence refers to how fast the sample iterates converge to the sample fixed point as the number of iterations increases. Although useful computationally, such information is not sufficient for doing asymptotic inference on the parameter of interest, namely, the population parameter estimated by the sample fixed point. For example, such information neither reveals the limiting distribution of the sample sequence nor implies a rate of convergence to the parameter of interest, and it gives no guidelines about the number of iterations needed to achieve these results. The literature on finite-step estimation sheds some light on these problems but requires a starting value that converges to the parameter of interest at a known rate (see, e.g., Robinson, 1988; Lehmann, 1983, Ch. 6.3; Bickel, 1975). By contrast, most iterative estimation procedures do not start at consistent starting values. Hence the need for theory enabling asymptotic inference about a parameter of interest for general iterative estimation procedures. This paper provides such theory.
Specifically, this paper develops checkable conditions for consistency, rates of convergence, and convergence in distribution of iterative procedures for estimating finite-dimensional parameters. The theory covers procedures such as expectation-maximization (EM), Newton–Raphson (NR), and iterative least squares (ILS). The key requirement is that the sample mapping generating the procedure and a population analogue be contraction mappings, asymptotically. We also isolate a convenient bias condition from which sensible stopping rules can be derived.
We illustrate the theory by establishing the limiting distribution of a two-stage iterative estimator of the regression parameters in a semiparametric binary response model. The first stage is an ILS procedure. We apply the theory to show that this procedure consistently estimates the regression parameters in the model. The second stage is a NR procedure that starts at the ILS estimates and is based on the criterion function of the Klein and Spady (1993) estimator that achieves the semiparametric efficiency bound for this model established by Chamberlain (1986) and Cosslett (1987). We use the theory to show that the two-stage procedure also achieves this bound. Although we use this application to illustrate the asymptotic theory, the ILS estimator is interesting in its own right because of the substantial computational advantages it can provide, particularly in applications with many observations and many regressors. To illustrate these benefits, we provide a small simulation study comparing the ILS estimator to the efficient estimator of Klein and Spady (1993). We also briefly discuss extensions of the ILS procedure to semiparametric censored regression models.
Simultaneously and independently of this work, Pastorello, Patilea, and Renault (2003, Sects. 1–4) have developed a similar theory of iterative estimation and consider applications to structural nonadaptive econometric models with an emphasis on financial market models. The theory developed in this paper covers all of their main applications. However, their theory requires a continuity condition (Pastorello et al., 2003, Assumption 1a, p. 452) that limits the applicability of their results. For example, we show that this continuity condition does not hold for the ILS estimator developed in Section 3 for the semiparametric binary response model. Nor do these authors provide guidelines for stopping rules.
The rest of the paper is organized as follows. In Section 2, we develop conditions for consistency, rates of convergence, and convergence in distribution of general iterative estimation procedures. We also provide guidelines for developing stopping rules for these procedures. In Section 3, we apply the theory to the iterative estimator for the semiparametric binary response model described earlier. Section 4 presents the simulation study. Section 5 summarizes. Proofs and other technical supporting material are provided in an Appendix.
This section develops general conditions for rates of convergence and convergence in distribution of iterative procedures for estimating finite-dimensional parameters. We begin by developing some notation.
Let (Ω, 𝓕, 𝓟) denote a probability space. That is, Ω is a sample space, 𝓕 is a σ-field of subsets of Ω, and 𝓟 is a probability measure on 𝓕. For ω ∈ Ω, let Z1(ω,β0),…,Zn(ω,β0) denote a sample of size n from 𝓟, where β0 denotes a fixed parameter of interest in Rk.
Let M(φ) denote a mapping from Rk to Rk with fixed point β0. That is, M(β0) = β0. Population analogues of sample mappings generating common iterative estimators satisfy this fixed point condition. For example, a population analogue of the sample mapping defining an EM procedure has the form

M(φ) = argmaxβ E[Qn(β|φ)],

where Qn(β|φ) is the conditional expectation of the complete data log-likelihood function given the observed data and φ. The outer expectation is over the distribution of the observed data. It follows that E[Qn(β|β0)] is the expected value of the complete data log-likelihood function, which is maximized at β0. That is, M(β0) = β0. A population analogue of a sample mapping defining a NR procedure has the form

M(φ) = φ − [H(φ)]−1G(φ),

where G(φ) is a population analogue of the gradient of the sample objective function and H(φ) is a population analogue of the corresponding sample Hessian. Under general conditions, G(β0) = 0, and so M(β0) = β0.
We show in Section 3 that the fixed point condition just described is also satisfied by a population analogue of a semiparametric ILS mapping. However, this population mapping depends on both the sample size n and the ω that generated the sample from 𝓟. To cover applications like this, from now on we write Mn(φ) for a generic population mapping, letting the subscript n suggest the possible dependence of this mapping on both n and ω. The fixed point condition, then, becomes Mn(β0) = β0 for all n and ω. Write M̂n(φ) for a sample analogue of Mn(φ). Let βn0 denote a starting point in Rk. This starting point may depend on n and ω. For i ≥ 1, define βni = Mn(βn,i−1) and β̂ni = M̂n(β̂n,i−1), where β̂n0 = βn0. We call βni a population iterate. We call β̂ni a sample iterate and also an iterative estimator of β0.
Contraction mappings play a central role in establishing the limiting behavior of β̂ni. We now introduce the notion of an asymptotic contraction mapping.
DEFINITION. For each n ≥ 1 and ω ∈ Ω, let Knω(·) be a function defined on a set S ⊆ X, where (X, E) is a metric space. The collection {Knω(·) : n ≥ 1, ω ∈ Ω} is an asymptotic contraction mapping (denoted ACM) on (S, E) if there exist a constant c in [0,1) that does not depend on n or ω, and sets {An} with each An ∈ 𝓕 and 𝓟(An) → 1 as n → ∞, such that for each ω ∈ An, Knω(·) maps S to itself and for all x, y in S,

E(Knω(x), Knω(y)) ≤ cE(x, y).
The ACM property is a property of the collection of functions {Knω(·) : n ≥ 1,ω ∈ Ω}. For ease of notation, we write {Knω(·)} for this collection.
If {Knω(·)} is an ACM on (S, E) and ω ∈ An, where An is one of the “good” sets described in the definition, then Knω(·) is a contraction mapping on (S, E). Then, by the Banach fixed point theorem (Aliprantis and Border, 1994, pp. 88–89), Knω(·) has a unique fixed point xnω in S, and any sequence defined by xi = Knω(xi−1), i ≥ 1, where x0 ∈ S, converges to xnω as i → ∞. Note that the iterates and the fixed point can depend on n and ω. Also, note that {Knω(·)} can be an ACM without Knω(·) being a contraction mapping for each n and ω.
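For intuition, the following minimal Python sketch (our illustration, not code from the paper) iterates a contraction to its fixed point in the manner just described; the mapping cos is a hypothetical stand-in for Knω:

import math

def iterate_to_fixed_point(K, x0, tol=1e-10, max_iter=1000):
    """Iterate x_i = K(x_{i-1}) until successive iterates are within tol."""
    x = x0
    for i in range(1, max_iter + 1):
        x_new = K(x)
        if abs(x_new - x) < tol:
            return x_new, i
        x = x_new
    return x, max_iter

# cos is a contraction on [0, 1] (|cos'(x)| = |sin(x)| <= sin(1) < 1 there),
# so the iterates converge to its unique fixed point, roughly 0.739085.
x_star, iters = iterate_to_fixed_point(math.cos, 1.0)
print(x_star, iters)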
To establish the limiting distribution of β̂ni, we require that both {Mn(φ)} and {M̂n(φ)} be ACMs on (B0,Ek). Here Ek is the euclidean metric on Rk and B0 is the closed ball centered at β0 of radius |βn0 − β0|. If {Mn(φ)} is an ACM on (B0,Ek), then, for each ω ∈ An (see the definition), β0 is the unique fixed point of Mn(φ) on B0. If {M̂n(φ)} is an ACM on (B0,Ek), then, for each ω ∈ An (not necessarily the same An as for Mn(φ)), M̂n(φ) has a unique fixed point on B0, which we denote β̄n. Unlike β0, β̄n typically depends on both n and ω.
Theorem 1 gives conditions for rates of convergence of β̂n,i(n) to β0. Let Z+ be the positive integers.
THEOREM 1. Let i(n) be a function from Z+ to Z+. Fix δ > 0. If

(i) {Mn(φ)} is an ACM on (B0,Ek);

(ii) nδ|βn,i(n) − β0| = Op(1) as n → ∞;

(iii) nδ supφ∈B0 |M̂n(φ) − Mn(φ)| = Op(1) as n → ∞;

then nδ|β̂n,i(n) − β0| = Op(1) as n → ∞.
Remark 1. To apply Theorem 1, one must choose i(n) to satisfy condition (ii). The choice depends on the order of convergence of the iterative procedure. Let βi, i ≥ 1, and β be points in Rk. The sequence {βi : i ≥ 1} converges to β of order σ ≥ 1 if there exists a constant κ > 0 such that for i ≥ 1, |ei|/|ei−1|σ ≤ κ, where ei = βi − β (cf. Burden, Faires, and Reynolds, 1981, p. 45). The leading special cases are σ = 1 (linear convergence) and σ = 2 (quadratic convergence). For example, the population sequence for the semiparametric ILS procedure analyzed in Section 3 exhibits linear convergence to its fixed point, whereas the population and sample sequences for the semiparametric NR procedure analyzed in Section 3 exhibit quadratic convergence to their respective fixed points.
First, we choose i(n) to satisfy condition (ii) when σ = 1. In this case, the constant κ can be taken to equal the modulus of contraction c guaranteed by condition (i). Assume supn|βn0 − β0| < ∞. A simple recursive calculation shows that nδ|βn,i(n) − β0| ≤ |βn0 − β0| for each n ≥ 1 (a stronger condition than condition (ii)) provided i(n) ≥ −δ ln n/ln c. This bound is sharp and can be used to develop a stopping rule for the iterative sequence. For instance, if σ = 1, δ = ½, n = 5,000, and c ≤ 0.9, then the stronger condition just stated is satisfied provided i(5,000) ≥ 41. Alternatively, one can estimate c with the maximum of the ratios |β̂ni − β̂n,i−1|/|β̂n,i−1 − β̂n,i−2| over a small number of ratios and for different starting values for the sequence.
Next, we choose i(n) to satisfy condition (ii) when σ > 1. Define α(σ) = κ1/(σ−1)|βn0 − β0| and assume α(σ) < 1. A recursive calculation shows that nδ|βn,i(n) − β0| ≤ |βn0 − β0| for n ≥ 1 (again, a stronger condition than condition (ii)) provided i(n) ≥ [ln σ]−1 ln(−δ ln n/ln α(σ)). To illustrate, for a smooth NR procedure, σ = 2 and the constant κ can be taken to equal 2Ck with 2C an upper bound on the absolute value of each of the second-order mixed partial derivatives of each of the k components of Mn (cf. Burden et al., 1981, p. 47).
Recall that Mn(φ) = (Mn1(φ),…,Mnk(φ))′ where each Mnj(φ) is a real-valued function of φ = (φ1,…,φk)′. A NR procedure is smooth if, for each j, the k × k matrix of second-order mixed partial derivatives of Mnj(φ) exists.
Finally, we note that condition (ii) does not require that i(n) → ∞ as n → ∞. As a simple example, consider sampling independent and identically distributed (i.i.d.) observations from a normal distribution with unknown mean β0 and known variance and estimating β0 using a NR procedure. Then it is trivial to show that no matter what the starting value βn0, βn1 = β0. That is, condition (ii) is satisfied for all δ > 0 with i(n) = 1 for all n.
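To make the stopping rules in Remark 1 concrete, here is a small Python sketch (our own illustration, not code from the paper) that computes the iteration counts implied by the two bounds; the modulus c, the constant κ, and α(σ) are assumed known or estimated as described above:

import math

def iterations_linear(delta, n, c):
    """Smallest i(n) with i(n) >= -delta * ln(n) / ln(c)  (the sigma = 1 case)."""
    return math.ceil(-delta * math.log(n) / math.log(c))

def iterations_higher_order(delta, n, sigma, alpha):
    """Smallest i(n) with i(n) >= ln(-delta ln n / ln alpha) / ln(sigma),
    for convergence of order sigma > 1 with alpha = alpha(sigma) < 1."""
    return math.ceil(math.log(-delta * math.log(n) / math.log(alpha)) / math.log(sigma))

print(iterations_linear(0.5, 5000, 0.9))           # 41, matching the example in Remark 1
print(iterations_higher_order(0.5, 5000, 2, 0.5))  # 3 NR steps when alpha(2) = 0.5 (hypothetical value)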
Remark 2. Many iterative procedures converge at rate n1/2, corresponding to δ = ½ in Theorem 1. However, this need not be the case. For example, consider estimating the regression parameters in a semiparametric binary response model using a NR procedure based on the smoothed maximum score estimator of Horowitz (1992). Depending on the smoothness of the data distribution, under the conditions of Theorem 1, this NR procedure will converge at rate nδ, for some δ < ½.
The next result gives conditions for consistency without a rate of convergence.
THEOREM 2. Let i(n) be a function from Z+ to Z+. If

(i) {Mn(φ)} is an ACM on (B0,Ek);

(ii) |βn,i(n) − β0| = op(1) as n → ∞;

(iii) supφ∈B0 |M̂n(φ) − Mn(φ)| = op(1) as n → ∞;

then |β̂n,i(n) − β0| = op(1) as n → ∞.
Remark 3. In Theorem 2, if condition (i) holds, then condition (ii) holds if i(n) → ∞ as n → ∞. No minimum rate of growth is required for i(n) because the bias term in (ii) is not inflated by a factor of nδ as it is in Theorem 1.
Our next objective is to develop conditions for convergence in distribution of β̂n,i(n). To do so, we require that {M̂n(φ)} be an ACM on (B0,Ek).

Assume that the first partial derivatives of the components of M̂n(φ) and Mn(φ) exist. Let ∇φ denote the differential operator (∂/∂φ1,…,∂/∂φk). Write Vn(φ) for the k × k matrix ∇φMn(φ) and V̂n(φ) for the k × k matrix ∇φM̂n(φ). For a k × k matrix A = (aij), let ∥A∥ denote the matrix norm sup{|Ax| : |x| = 1}.
LEMMA 3. Suppose ∇φM̂n(φ) and ∇φMn(φ) exist. If

(i) {Mn(φ)} is an ACM on (B0,Ek);

(ii) supφ∈B0 |M̂n(φ) − Mn(φ)| = op(1) as n → ∞;

(iii) supφ∈B0 ∥V̂n(φ) − Vn(φ)∥ = op(1) as n → ∞;

then {M̂n(φ)} is an ACM on (B0,Ek).
We are now in a position to establish a convergence in distribution result for β̂n,i(n). The limiting distribution depends on the limiting behavior of the infeasible estimator β̄n. Recall that if {M̂n(φ)} is an ACM on (B0,Ek), then, for each ω ∈ An, M̂n(φ) has a unique fixed point on B0, denoted β̄n. Assume that the probability limit of Vn(φ) exists and let V(φ) denote this quantity. In what follows, we use the symbol ⇒ to denote convergence in distribution.
THEOREM 4. Let i(n) be a function from Z+ to Z+ and let {εn} denote an arbitrary sequence of nonnegative real numbers converging to zero as n → ∞. For δ > 0, if

(i) |β̂n,i(n) − β0| = op(1) as n → ∞;

(ii) {M̂n(φ)} is an ACM on (B0,Ek);

(iii) nδ|β̂n,i(n) − β̄n| = op(1) as n → ∞;

(iv) nδ[M̂n(β0) − β0] ⇒ Z as n → ∞ for some random vector Z;

(v) V(φ) is continuous at β0;

(vi) supφ:|φ−β0|≤εn ∥V̂n(φ) − V(φ)∥ = op(1) as n → ∞;

then nδ(β̂n,i(n) − β0) ⇒ DZ as n → ∞, where D = [Ik − V(β0)]−1.
Remark 4. Theorem 2 provides checkable conditions that imply condition (i). Lemma 3 provides checkable conditions that imply condition (ii). Remark 1 concerning the choice of i(n) can be adapted to establish condition (iii). For example, in the application in Section 3, the sample sequence of the semiparametric NR procedure is started at a consistent estimator of β0 and exhibits quadratic convergence to its sample fixed point. Replacing Mn with M̂n and β0 with β̄n in Remark 1 and choosing i(n) ≥ [ln 2]−1 ln(−δ ln n/ln α(2)) is sufficient to establish condition (iii). Finally, note that procedures exhibiting linear convergence satisfy V(β0) ≠ 0, whereas procedures exhibiting quadratic convergence (such as NR procedures) satisfy V(β0) = 0 (Burden et al., 1981, pp. 47–48). Thus, for NR procedures such as the one presented in Section 3, D is the identity matrix, and so nδ(β̂n,i(n) − β0) has the same limiting distribution as nδ(β̄n − β0) as n → ∞.
In this section, we illustrate the theory developed in Section 2 by establishing the limiting behavior of a two-stage iterative estimator of regression parameters in a semiparametric binary response model. The first-stage estimator is an ILS estimator. We establish consistency of this estimator by developing primitive conditions implying the conditions of Theorem 2 in Section 2.
Wang and Zhou (1995) appear to have been the first to propose a semiparametric ILS estimator for binary response models. They use isotonic regression to estimate a certain conditional expectation. They do not prove consistency. The theory developed in this paper covers the ILS estimator of Wang and Zhou. However, the contraction mapping condition and uniform convergence condition of Theorem 1 are harder to prove due to the difficulty in analyzing the isotonic regression component of their estimator.
Consider the binary response model Y = 1{Y* ≥ 0}, where the latent variable Y* = X′β0 − u, X = (W1,…,Wk,Wk+1)′, β0 = (β01,…,β0k,β0,k+1)′, and u is an error term with unknown distribution. Because the distribution of u is not known in this model, restrictions are needed to identify the regression parameters. We assume that Wk+1 is nonconstant and normalize β0,k+1 to unity. Rather than introduce new notation, we reinterpret β0 as the first k components of the true parameter vector divided by β0,k+1 and u as the true error divided by β0,k+1. Also, for each φ in Rk, we write X′φ for W1φ1 + ··· + Wkφk + Wk+1. In addition, we take W1 = 1. That is, W1 is the regressor corresponding to the intercept term in the model.
Let (Y1,X1),…,(Yn,Xn) denote a sample of independent observations from the model just defined and let Xn denote the n × k matrix comprised of the first k components of each Xj, j = 1,…,n. Prewhiten Xn so that Xn′Xn = nIk.
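The prewhitening step can be implemented with a symmetric inverse square root; the following Python sketch is one way to do it (the paper does not prescribe a particular transform):

import numpy as np

def prewhiten(X):
    """Return T with the same column space as X and T'T = n * I_k."""
    n, k = X.shape
    S = X.T @ X / n                       # k x k moment matrix
    vals, vecs = np.linalg.eigh(S)        # S is symmetric positive definite
    S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return X @ S_inv_sqrt

rng = np.random.default_rng(0)
X = rng.exponential(size=(500, 6))        # hypothetical design matrix
T = prewhiten(X)
print(np.allclose(T.T @ T, 500 * np.eye(6)))   # True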
Next, let F and f denote the unknown cumulative distribution function and density function of u. Fix t in R and φ in Rk. Let F(t,φ) = 𝓟{Y = 1 | X′φ = t}. Assume that ∇tF(t,φ) exists and write f(t,φ) = ∇tF(t,φ). Note that F(t,β0) = F(t) and f(t,β0) = f(t). Define

ν(t,φ) = F(t,φ)−1 ∫(−∞,t] s f(s,φ) ds = t − F(t,φ)−1 ∫(−∞,t] F(s,φ) ds

π(t,φ) = [1 − F(t,φ)]−1 ∫(t,∞) s f(s,φ) ds = t + [1 − F(t,φ)]−1 ∫(t,∞) [1 − F(s,φ)] ds.

The second representations for ν(t,φ) and π(t,φ) follow from integration by parts arguments. Note that ν(t,β0) = E(u | u ≤ t) and π(t,β0) = E(u | u > t). Write un(φ) for (u1(φ),…,un(φ))′ with uj(φ) = F(Xj′β0)ν(Xj′φ,φ) + (1 − F(Xj′β0))π(Xj′φ,φ). Write Yj(φ) for Xj′φ − uj(φ). Define the population ILS mapping

Mn(φ) = (Xn′Xn)−1Xn′Yn(φ) = n−1Xn′Yn(φ), where Yn(φ) = (Y1(φ),…,Yn(φ))′.

Note that for all j, uj(β0) = E(u) = 0. Deduce from this and prewhitening that Mn(β0) = β0. That is, β0 is a fixed point of Mn(φ).
We now construct a sample analogue of Mn(φ) by developing numerical integral approximations of ν(Xj′φ,φ) and π(Xj′φ,φ) using nearest neighbor estimators of the F(Xj′φ,φ)'s. Fix α > 1 and let {cn} denote a sequence of nonnegative real numbers satisfying cn → ∞ as n → ∞. For t in R, define the trimming functions τn(t) = 1{|t| ≤ cn} and σn(t) = 1{|t| ≤ cnα}. Note that τn(·) trims more severely than σn(·). These functions help control tail behavior. Relabel observation numbers so that index values are ordered from smallest to largest. That is, let X1′φ ≤ ··· ≤ Xn′φ. Let X0′φ = X1′φ, Xn+1′φ = Xn′φ, and Δ(Xi′φ) = Xi′φ − Xi−1′φ, i = 1,…,n + 1. For j = 1,…,n, define

ν̂n(Xj′φ,φ) = Xj′φ − F̂(Xj′φ,φ)−1 Σi:Xi′φ≤Xj′φ F̂(Xi′φ,φ)Δ(Xi′φ)σn(Xi′φ)

π̂n(Xj′φ,φ) = Xj′φ + [1 − F̂(Xj′φ,φ)]−1 Σi:Xi′φ>Xj′φ [1 − F̂(Xi′φ,φ)]Δ(Xi′φ)σn(Xi′φ).

In the preceding expressions, F̂(Xi′φ,φ) is a nearest neighbor estimator of F(Xi′φ,φ). Let kn be a positive integer. If kn < i ≤ n − kn, then F̂(Xi′φ,φ) is the arithmetic average of the 2kn + 1 binary values Yi−kn,…,Yi+kn. These are symmetric nearest neighbor estimators. If i ≤ kn, then F̂(Xi′φ,φ) is the average of the kn + 1 binary values Yi,…,Yi+kn. If i > n − kn, then F̂(Xi′φ,φ) is the average of the kn + 1 binary values Yi−kn,…,Yi.
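In Python, the three cases of the nearest neighbor estimator translate directly into windowed averages of the sorted binary responses; this sketch (ours, using 0-based indexing) is one way to compute all n estimates:

import numpy as np

def nn_regression(y_sorted, k_n):
    """Nearest neighbor estimates of F at each sorted index value.

    y_sorted: binary Y's ordered by the index values X'phi.
    Interior points get the symmetric average of 2*k_n + 1 values;
    points within k_n of either end get one-sided averages of k_n + 1 values.
    """
    n = len(y_sorted)
    F_hat = np.empty(n)
    for i in range(n):
        if i < k_n:                                   # left boundary
            F_hat[i] = y_sorted[i:i + k_n + 1].mean()
        elif i >= n - k_n:                            # right boundary
            F_hat[i] = y_sorted[i - k_n:i + 1].mean()
        else:                                         # symmetric interior window
            F_hat[i] = y_sorted[i - k_n:i + k_n + 1].mean()
    return F_hat

y = np.array([0, 0, 1, 0, 1, 1, 1, 1], dtype=float)   # toy sorted responses
print(nn_regression(y, 2))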
Write ûn(φ) for (û1(φ),…,ûn(φ))′ with ûj(φ) = [Yjν̂n(Xj′φ,φ) + (1 − Yj)π̂n(Xj′φ,φ)]τn(Xj′φ). Write Ŷj(φ) for Xj′φ − ûj(φ) and Ŷn(φ) for (Ŷ1(φ),…,Ŷn(φ))′. Define the sample ILS mapping

M̂n(φ) = (Xn′Xn)−1Xn′Ŷn(φ) = n−1Xn′Ŷn(φ).
Note that M̂n(φ) is a function of the nearest neighbor estimators F̂(Xi′φ,φ), which are step functions. This implies that M̂n(φ) (and, therefore, the least squares criterion function) is a discontinuous function of φ, violating Assumption 1a in the theory developed in Pastorello et al. (2003, p. 452).
Let β̂n0 denote an arbitrary starting value and for each i ≥ 1, define β̂ni = M̂n(β̂n,i−1). We call β̂ni a semiparametric ILS estimator of β0.
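A skeleton of the resulting ILS loop in Python (our sketch, not the authors' code): uhat_fn is a hypothetical callback standing in for the numerical-integral construction above, and the regressor with coefficient normalized to one is assumed absorbed into its output; the convergence check mirrors the criterion used in the simulations of Section 4.

import numpy as np

def ils(X, phi0, uhat_fn, tol=1e-4, max_iter=1000):
    """Iterate phi_i = (X'X)^{-1} X' (X phi_{i-1} - uhat(phi_{i-1})).

    uhat_fn(phi): hypothetical callback returning the estimated error vector
    built from nearest neighbor estimates and numerical integrals.
    Stops when the maximum relative change between iterates is below tol.
    """
    phi = np.asarray(phi0, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    for i in range(1, max_iter + 1):
        y_adj = X @ phi - uhat_fn(phi)          # adjusted dependent variable
        phi_new = XtX_inv @ (X.T @ y_adj)       # one least squares step
        rel = np.max(np.abs(phi_new - phi) / np.maximum(np.abs(phi), 1e-12))
        phi = phi_new
        if rel < tol:
            break
    return phi, i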
To prove consistency of the ILS estimator β̂ni, we see from Theorem 2 in Section 2 that we must show that Mn(φ) is an asymptotic contraction mapping and that M̂n(φ) converges uniformly to Mn(φ). We now state and discuss primitive conditions sufficient to imply these high-level conditions.
Let N denote the closure of an open convex neighborhood of β0. Write Su for the support of u and Sφ for the support of X′φ. Write g(t,φ) for the density of X′φ evaluated at t. Throughout, we will maintain the following assumptions for each φ in N.
Assumption A0 describes the data and the model.
Assumptions A1 and A2 are assumptions about u and X and their relationship to each other. Note that we assume that the density of u is log-concave. The log-concave family is a large class of distributions that includes the normal, logistic, extreme-value, Laplace, gamma, beta, and triangular families, among many others. However, log-concavity is only a sufficient condition. As we will demonstrate in the next section, the ILS procedure can work well even when this assumption does not hold, as when u is log-convex. It is not necessary that Su = R. We make these assumptions for ease of exposition and because the infinite support case is a leading special case. The prewhitening in Assumption A2 is used to establish the local contraction mapping property. In general, prewhitening would not be effective in this regard if the index were not linear.
Assumption A3 defines the number of neighbors up to scale. From the proofs of Lemma 1A and Lemma 2A in the Appendix, we see that n1/2 << kn << n is sufficient. However, we choose n3/4 because this rate optimally balances convergence rates of stochastic and bias terms in the nearest neighbor estimators (see proof of Lemma 3A in the Appendix). The constant of proportionality could be any consistent estimate of a measure of spread in the Xi′φ's.
Assumptions A4–A8 define trimming and tail conditions. The interval [−cn,cn] defines the set of t values for which numerical integral estimates of ν(t,φ) and π(t,φ) are computed, and the interval [−cnα,cnα] defines the set of t values at which nearest neighbor nonparametric regression estimates of F(t,φ) are computed. These latter estimates are ratios with estimates of g(t,φ) in denominators and appear in denominators of estimates of ν(t,φ) and π(t,φ). To give an example of what we have in mind for Assumptions A4–A8, suppose cn is proportional to (log n)1/2. Then simple calculations show that Assumption A5 is satisfied provided the tails of X′φ decrease no faster than normal tails. If, in addition, α > (1 + δ)/δ for δ > 0, then the conditions in Assumptions A6–A8 are satisfied at β0 provided the tails of u decrease no faster than normal tails and no slower than |t|−(2+δ) as |t| → ∞.
Assumptions A9–A11 are conditions useful for bounding remainder terms in Taylor expansions or in proving uniform convergence results.
LEMMA 5. Suppose Assumptions A0, A1, A2, and A11 hold. Then there exists an open ball centered at β0 with closure B0 contained in N such that {Mn(φ)} is an ACM on (B0,Ek).
The following result is based on Lemmas 1A, 2A, and 3A in the Appendix, which extend some results of Ichimura (1997) relating kn, the number of nearest neighbors, to nearest neighbor window widths. It also uses Lemma 4A in the Appendix, a result on maximum sample spacings, to show that numerical integral approximations are close to their estimands.
LEMMA 6. Assumptions A0–A11 imply supφ∈B0 |M̂n(φ) − Mn(φ)| = op(1) as n → ∞.
The next result follows from Lemma 5, Lemma 6, and Theorem 2.
THEOREM 7. Suppose Assumptions A0–A11 hold and the starting value β̂n0 lies in the ball B0 from Lemma 5. If i(n) → ∞ as n → ∞, then |β̂n,i(n) − β0| = op(1) as n → ∞.
Remark 5. The starting value, β̂n0, for the ILS procedure may or may not be an element of B0. Thus, successful implementation of the procedure may require good starting values. Standard parametric estimates, such as probit or logit estimates, can be tried. For our simulations, we used ordinary least squares (OLS) estimates. Although these were generally poor estimates of β0, they proved to be good enough starting values for the ILS procedure. Also, note that an informal check of the contraction mapping condition can always be made: given a sequence of ILS iterates {β̂ni}, compute the ratios |β̂n,i+1 − β̂ni|/|β̂ni − β̂n,i−1|. Ratios less than unity give informal support to the contraction mapping assumption.
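This diagnostic is a one-liner to compute; the sketch below (ours) returns the successive ratios for a list of iterates:

import numpy as np

def contraction_ratios(iterates):
    """Ratios |b_{i+1} - b_i| / |b_i - b_{i-1}| for successive iterates.

    Values persistently below one lend informal support to the
    contraction mapping assumption, as described in Remark 5.
    """
    b = [np.asarray(v, dtype=float) for v in iterates]
    d = [np.linalg.norm(b[i + 1] - b[i]) for i in range(len(b) - 1)]
    return [d2 / d1 for d1, d2 in zip(d, d[1:]) if d1 > 0]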
We now develop the efficient estimator. We do so by starting from the consistent ILS estimates and taking Newton–Raphson steps using the criterion function for the efficient Klein and Spady (1993) estimator. Let d = k − 1, reinterpret β0 as the last d components of the true parameter vector, and write β̂n,ILS for the last d components of the ILS estimator. In short, we throw out the intercept term. We do this because the Klein and Spady estimator does not estimate the intercept. Let φ denote an element of Rd. Define the sample NR mapping

M̂n(φ) = φ − [Ĥn(φ)]−1Ĝn(φ),

where Ĝn(φ) is the gradient and Ĥn(φ) the Hessian or outer product gradient of the criterion function of the efficient Klein and Spady estimator as defined in Sherman (1994). This function has the form

n−1 Σj {Yj log F̂(Xj′φ,φ) + (1 − Yj) log[1 − F̂(Xj′φ,φ)]}τn(Xj′φ),

where F̂(Xj′φ,φ) is a nonparametric regression estimator of F(Xj′φ,φ) and τn(x) = 1{|x| ≤ cn}. Define β̂n0 = β̂n,ILS and for i ≥ 1, define β̂ni = M̂n(β̂n,i−1). We call β̂ni a semiparametric NR estimator of β0.
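The NR stage only needs a handful of Newton steps; the following Python sketch (ours) applies the mapping φ − [Ĥn(φ)]−1Ĝn(φ) a fixed number of times, with grad_fn and hess_fn as hypothetical callbacks for the gradient and Hessian of the sample criterion:

import numpy as np

def nr_iterate(phi0, grad_fn, hess_fn, n_steps):
    """Apply the Newton-Raphson mapping phi - H(phi)^{-1} G(phi) n_steps times.

    grad_fn(phi): gradient of the sample criterion at phi.
    hess_fn(phi): Hessian (or outer product of gradients) at phi.
    n_steps can be chosen as i(n) in Remark 4.
    """
    phi = np.asarray(phi0, dtype=float)
    for _ in range(n_steps):
        phi = phi - np.linalg.solve(hess_fn(phi), grad_fn(phi))
    return phi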
Define the population NR mapping

M(φ) = φ − [H(φ)]−1G(φ),

where G(φ) is the gradient and H(φ) the Hessian or outer product gradient of the function

E{Y log F(X′φ,φ) + (1 − Y) log[1 − F(X′φ,φ)]},

where F(t,φ) = 𝓟{Y = 1 | X′φ = t}. Note that M(φ) does not depend on n or ω and M(β0) = β0.
Define V(φ) = ∇φM(φ) and note that V(β0) = 0d, the d × d zero matrix. Assumptions in Sherman (1994) imply that H(φ) is continuous in a neighborhood of β0. It follows that V(φ) is continuous in a neighborhood of β0. Deduce that there exists an open ball centered at β0 with closure B0 such that M(φ) is a contraction mapping on (B0,Ed). Arguments in Sherman (1994) can be adapted to show that as n → ∞, (i) supφ∈B0 |M̂n(φ) − M(φ)| = op(1), (ii) supφ∈B0 ∥V̂n(φ) − V(φ)∥ = op(1), and (iii) √n(β̄n − β0) ⇒ N(0,−[H(β0)]−1). Deduce from these facts, Lemma 3, and Theorem 4 that for i(n) ≥ [ln 2]−1 ln(−0.5 ln n/ln κ) for κ ∈ (0,1) (see Remark 1), √n(β̂n,i(n) − β0) converges in distribution to a N(0,−[H(β0)]−1) random variable as n → ∞. This is the limiting distribution of the efficient Klein and Spady (1993) estimator.
Finally, we note that the ILS procedure developed in this section can be easily extended to semiparametric censored regression models. For example, if the latent variable Y* = X′β0 − u and we observe (Y,X) where Y = Y*1{Y* ≥ 0}, then an ILS estimator of β0 can be defined exactly as in this section after setting Ŷj(φ) = Yj when Yj > 0 and Ŷj(φ) = Xj′φ − π̂n(Xj′φ,φ) when Yj = 0. Similar extensions can cover other censoring schemes such as those that involve top-coding.
In this section, we provide a small simulation study comparing the speed and performance of the ILS procedure developed in Section 3 to the speed and performance of the efficient estimator of Klein and Spady (1993) for the semiparametric binary response model.
The model we use in the simulations has the form Y = 1{u ≤ X′β0}, where the regressor vector X = (W1,…,W6,W7)′ and X′β0 = W1β01 + ··· + W7β07 with β01 = 0, β02 = −2, β03 = −1, β04 = −0.5, β05 = 0.5, β06 = 2, and β07 = 1. In this model, W1 = 1 and W2–W7 are independent standard exponential random variables. In addition, u is independent of X and has either a χ2(1) or a χ2(3) distribution standardized to have mean zero and variance equal to the variance of X′β0. Thus, the signal to noise ratio in the simulations is unity. We normalize on β07 and report estimates of the slope coefficients β02, β03, β04, β05, β06.
Although the ILS procedure directly produces an intercept estimate, the Klein and Spady estimator does not. Consequently, we do not report intercept estimates.
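One way to generate a data set from this design, as a Python sketch (ours; the paper's simulations were coded in GAUSS):

import numpy as np

def simulate(n, df, seed=None):
    """One draw of the Section 4 design: Y = 1{u <= X'beta0}.

    W1 = 1; W2,...,W7 are iid standard exponential; u is chi-square(df),
    df in {1, 3}, standardized to mean zero and variance Var(X'beta0),
    and independent of X, so the signal to noise ratio is one.
    """
    rng = np.random.default_rng(seed)
    beta0 = np.array([0.0, -2.0, -1.0, -0.5, 0.5, 2.0, 1.0])
    X = np.column_stack([np.ones(n), rng.exponential(size=(n, 6))])
    index = X @ beta0
    var_index = np.sum(beta0[1:] ** 2)   # exponential W's have unit variance
    u = rng.chisquare(df, size=n)
    u = (u - df) / np.sqrt(2.0 * df) * np.sqrt(var_index)
    Y = (u <= index).astype(int)
    return Y, X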
For the ILS procedure, we choose the number of neighbors kn by a “leave one out” method of least squares cross-validation calibrated on one of the models defined previously with n = 10,000. This and A3 produce the rule kn ≈ 0.14n3/4, which we use in all the simulations. We do no trimming. Also, we use OLS starting values and say that the procedure has converged when the maximum relative difference between the components of successive iterates is less than 10−4 in absolute value. If this convergence criterion is not met after 1,000 iterations, we stop and use the last five components of the 1,000th iterate as an estimate of (β02,…,β06).
We compute the Klein and Spady criterion function using a program written in the GAUSS language by Roger Klein. This program likewise does no trimming. We compute the Klein and Spady estimator using the MAXLIK optimization routine with PROBIT starting values and say that the procedure has converged when the maximum component of the gradient is less than 10−4 in absolute value. If this criterion is not met after 50 iterations, we stop and use the 50th iterate as an estimate of (β02,…,β06). Typically, the components of the 50th iterate are not changing in the first four decimal places.
Results for both the ILS and Klein and Spady estimators are based on the same simulation data sets. All simulations are performed using GAUSS 3.5 for Windows on a Pentium-III PC with an 800 megahertz processor.
Table 1 presents means and root-mean-squared error (RMSE) statistics for the ILS and Klein and Spady (KS) estimators, based on 100 simulations. Results for the PROBIT and OLS estimators are also provided as points of comparison. For both error distributions, when n = 1,000, both the ILS and Klein and Spady estimators do well in terms of bias. The Klein and Spady estimator slightly outperforms the ILS estimator in terms of RMSE for χ2(1) errors; the two estimators are close in RMSE when errors are χ2(3). When n = 5,000, the estimators are almost indistinguishable in terms of both bias and RMSE. Note that both absolute and relative bias in PROBIT and OLS estimates increase as the parameter values increase in magnitude, the effect being more pronounced for the more skewed χ2(1) distribution.
Table 2 presents computation statistics that allow timing and convergence comparisons of the ILS and Klein and Spady estimators. These statistics correspond to the results presented in Table 1. The first column gives the median number of iterations per simulation, the second column gives the approximate time per iteration, and the third column gives the percentage of simulations where the respective convergence criteria were satisfied. We see that for both error distributions, when n = 1,000, most of the ILS simulations do not satisfy the ILS criterion. However, we see from Table 1 that this does not imply that the ILS iterates are diverging. Rather, the iterates oscillate between vectors whose components differ in the third or fourth decimal places. We suspect that this oscillation is due to lack of smoothness in M̂n(φ) when n = 1,000. Note that when n = 5,000, the median number of ILS iterations decreases to 175, whereas nearly all the simulations satisfy the ILS criterion. The Klein and Spady estimator, on the other hand, satisfies its convergence criterion most of the time, though contrary to our expectations, the percentage decreases as skewness decreases and as sample size increases.
A notable aspect of Table 2 is the timing comparison. The ILS estimator, on average, requires only about 20 seconds to estimate six parameters (five slope coefficients and an intercept) when n = 5,000. The Klein and Spady estimator, on average, requires well over 2 hours to estimate five parameters when n = 5,000. The ILS procedure is fast because (i) only O(n) calculations are needed to compute the nearest neighbor estimators of all the F(Xj′φ,φ)'s and (ii) computation time for a least squares calculation is nearly constant as a function of k, the number of estimated parameters. By contrast, computing the Klein and Spady criterion function, which involves local smoothing, is an O(n2hn) calculation where hn is the deterministic bandwidth for the kernel regression estimators of the F(Xj′φ,φ)'s. In addition, computation time can increase as a quadratic in k, because steps can require the computation of numerical gradients and Hessians of the criterion function.
This paper develops general conditions for rates of convergence and convergence in distribution of iterative procedures for estimating finite-dimensional parameters. The theory covers iterative estimation schemes such as expectation-maximization (EM), Newton–Raphson (NR), and iterative least squares (ILS) procedures. The theory requires a combination of asymptotic contraction mapping conditions and uniform convergence conditions, with convergence in distribution requiring an additional convergence in distribution result for a certain infeasible estimator. A bias condition is isolated that can be used to derive sensible stopping rules.
We illustrate the theory by establishing the limiting distribution of a two-stage iterative estimator of regression parameters in a semiparametric binary response model. The first stage is a consistent ILS procedure. The second stage is a NR procedure that is started at the ILS estimates and is based on the criterion function of the efficient Klein and Spady (1993) estimator. The ILS/NR procedure achieves the semiparametric efficiency bound for this model established by Chamberlain (1986) and Cosslett (1987). Simulations show that the ILS procedure is very fast to compute even for models with many observations and many estimable parameters. In addition, the ILS estimator is comparable to the Klein and Spady estimator in terms of root-mean-squared error in the given simulations. The ILS procedure developed for the semiparametric binary response model easily extends to cover various semiparametric censored regression models.
Proof of Theorem 1. By (i), there exists a constant c in [0,1) that does not depend on n or ω, and sets {An} with each 𝓟(An) → 1 as n → ∞, such that for each ω ∈ An, |Mn(φ) − Mn(γ)| ≤ c|φ − γ| for each φ,γ in B0. Let ρ0 = |βn0 − β0|. By (iii), there exist sets {Bn} with each 𝓟(Bn) → 1 as n → ∞, such that for each ω ∈ Bn for all n large enough, nδ supφ∈B0 |M̂n(φ) − Mn(φ)| is bounded and supφ∈B0 |M̂n(φ) − Mn(φ)| is arbitrarily small, in particular, smaller than (1 − c)ρ0. Let Cn = An ∩ Bn and note that 𝓟(Cn) → 1 as n → ∞. It follows that for each ω ∈ Cn and for each γ in B0,

|M̂n(γ) − β0| ≤ |M̂n(γ) − Mn(γ)| + |Mn(γ) − Mn(β0)| ≤ (1 − c)ρ0 + cρ0 = ρ0.

Deduce that for each ω ∈ Cn, M̂n(φ) maps B0 to itself.
Next, note that |β̂n,i(n) − β0| can be bounded by a bias term plus a stochastic term. That is,

|β̂n,i(n) − β0| ≤ |βn,i(n) − β0| + |β̂n,i(n) − βn,i(n)|.

By (ii), the bias term has order Op(n−δ) as n → ∞. Consider the stochastic term. Recall that Mn(φ) is a contraction mapping on B0 and that for each ω ∈ Cn, M̂n(φ) maps B0 to itself. It follows that for each ω ∈ Cn, the iterates βni and β̂ni, i ≥ 1, all lie in B0. Thus, for each ω ∈ Cn,

|β̂ni − βni| ≤ |M̂n(β̂n,i−1) − Mn(β̂n,i−1)| + |Mn(β̂n,i−1) − Mn(βn,i−1)| ≤ supφ∈B0 |M̂n(φ) − Mn(φ)| + c|β̂n,i−1 − βn,i−1|.

Apply the last inequality recursively to see that for each ω ∈ Cn,

|β̂n,i(n) − βn,i(n)| ≤ (1 − c)−1 supφ∈B0 |M̂n(φ) − Mn(φ)|.

Condition (iii) implies that the stochastic term has order Op(n−δ), which proves the result. █
Proof of Theorem 2. This involves a trivial modification of the proof of Theorem 1. █
Proof of Lemma 3. Fix φ and γ in B0. Because B0 is convex, we may apply a multivariate Taylor expansion (e.g., Dieudonné, 1969, p. 190) to write Mn(φ) − Mn(γ) = Λn(φ,γ)[φ − γ], where Λn(φ,γ) is the k × k matrix whose jth row is the jth row of Vn(·) evaluated at a point on the line segment joining φ and γ. Similarly, we may write M̂n(φ) − M̂n(γ) = Λ̂n(φ,γ)[φ − γ], where Λ̂n(φ,γ) is defined analogously with V̂n(·) in place of Vn(·). Use (i) and (ii) and argue as in the proof of Theorem 1 that there exist sets {Cn} with each 𝓟(Cn) → 1 as n → ∞, such that for each ω ∈ Cn, M̂n(φ) maps B0 to itself and Mn(φ) is a contraction mapping on (B0,Ek) with fixed point β0 and modulus of contraction c ∈ [0,1) independent of n and ω. Then for each ω ∈ Cn and for each φ,γ in B0,

|M̂n(φ) − M̂n(γ)| ≤ |Mn(φ) − Mn(γ)| + ∥Λ̂n(φ,γ) − Λn(φ,γ)∥|φ − γ| ≤ c|φ − γ| + ∥Λ̂n(φ,γ) − Λn(φ,γ)∥|φ − γ|.

By (iii), the last term has order op(|φ − γ|) as n → ∞, from which the result follows. █
Proof of Theorem 4. By (ii), there exist sets {An} with each 𝓟(An) → 1 as n → ∞ such that for each ω ∈ An, β̄n is the unique fixed point of M̂n(φ) on B0. For ω ∈ An, write

β̂n,i(n) − β0 = [β̂n,i(n) − β̄n] + [β̄n − β0].

By (iii), the first term on the right has order op(n−δ) as n → ∞. The result will follow if, as n → ∞,

nδ(β̄n − β0) ⇒ DZ.

Because β̄n is a fixed point of M̂n(φ), we have M̂n(β̄n) = β̄n. Thus, for each ω ∈ An,

β̄n − β0 = [M̂n(β̄n) − M̂n(β0)] + [M̂n(β0) − β0].

Expand M̂n(β̄n) about β0 to get

M̂n(β̄n) − M̂n(β0) = V̂n(βn*)[β̄n − β0],

where βn* is on the line segment connecting β̄n and β0. Combine the last two expressions to get

[Ik − V̂n(βn*)][β̄n − β0] = M̂n(β0) − β0.

Note that

Ik − V̂n(βn*) = Ik − V(β0) − [V(βn*) − V(β0)] − [V̂n(βn*) − V(βn*)].

Conditions (i) and (ii) imply that |β̄n − β0| = op(1) as n → ∞. This and the definition of βn* imply that |βn* − β0| = op(1) as n → ∞. This and condition (v) imply that the first term in brackets has order op(1) as n → ∞. Condition (vi) and the fact that |βn* − β0| = op(1) as n → ∞ imply that the second term in brackets has order op(1) as n → ∞. This and condition (iv) imply the result. █
Next, we prove several lemmas used in the proof of consistency of the semiparametric ILS estimator. The first is an extension of a result of Ichimura (1997, Lemma 4.1, p. 29) relating a nearest neighbor window width to the number of neighbors.
Recall that N is a neighborhood of β0 and, for φ in N, g(t,φ) is the density of X′φ evaluated at t. Define c(t,φ) and d(t,φ) to be the scale factors for the right and left nearest neighbor window widths, respectively, that appear in the approximations below. Note that for all t and φ, c(t,φ) ≥ 1 and d(t,φ) ≥ 1. In addition, by Assumption A10, both c(t,φ) and d(t,φ) are continuous in t and φ. Define an(t,φ) to be the distance from t to its knth nearest neighbor to the right among the index values X1′φ,…,Xn′φ and bn(t,φ) the distance from t to its knth nearest neighbor to the left. Recall that the trimming function σn(t) = 1{|t| ≤ cnα}. Fix μ > 0. Define Sn to be the interval [−cnα,cnα], Ln the lower interval [−cnα,−μ], M the middle interval [−μ,μ], and Un the upper interval [μ,cnα].
LEMMA 1A. Suppose Assumptions A2, A3, A5, A9, and A10 hold. Then, as n → ∞,

supt∈M,φ∈N |rn(t,φ) − 1| = op(1),

where either rn(t,φ) = [nc(t,φ)g(t,φ)an(t,φ)]/kn or rn(t,φ) = [nd(t,φ)g(t,φ)bn(t,φ)]/kn.
Proof. We shall prove the result for the case rn(t,φ) = [nc(t,φ)g(t,φ)an(t,φ)]/kn. The proof for the other case is similar.
Let H(·,t,φ) denote the cumulative distribution function (c.d.f.) of (X′φ − t)+. That is, for s ≥ 0,

H(s,t,φ) = 𝓟{0 < X′φ − t ≤ s}.

Let Hn(s,t,φ) denote the corresponding empirical c.d.f. That is, for s ≥ 0,

Hn(s,t,φ) = n−1 Σi 1{0 < Xi′φ − t ≤ s}.
Note that Hn(an(t,φ),t,φ) = kn/n and so

H(an(t,φ),t,φ) = kn/n + [H(an(t,φ),t,φ) − Hn(an(t,φ),t,φ)].

Standard empirical process results (e.g., Pakes and Pollard, 1989) imply that as n → ∞,

sups,t,φ |Hn(s,t,φ) − H(s,t,φ)| = Op(n−1/2).

Thus, uniformly over t ∈ M and φ ∈ N,

H(an(t,φ),t,φ) = kn/n + Op(n−1/2).   (A.1)

This and a Taylor expansion of H about s = 0 imply that uniformly over t ∈ M and φ ∈ N,

c(t,φ)g(t,φ)an(t,φ) + ½∇tg(t + s*,φ)an(t,φ)² = kn/n + Op(n−1/2),   (A.2)

where s* is between zero and an(t,φ). Because kn >> n1/2, we get from (A.1) that uniformly over t ∈ M and φ ∈ N,

H(an(t,φ),t,φ) = (kn/n)[1 + op(1)].   (A.3)
The continuity of g(t,φ) implies that c(t,φ)g(t,φ) is a continuous function of t and φ on the compact set M × N. Thus, it achieves its minimum on this set. Moreover, this minimum must be positive because the support of X′φ contains M for each φ in N. Deduce from a standard uniform law of large numbers that wp → 1 as n → ∞, there exists a positive number p independent of t and φ, such that there are at least pn points in [t,t + 1] for each t ∈ M and φ ∈ N. Because kn << n, wp → 1 as n → ∞, an(t,φ) (and therefore, s*) must be in the interval [0,1] for each t ∈ M and φ ∈ N. Because n1/2 << kn << n and 1/[c(t,φ)g(t,φ)] is bounded over M × N, it follows from (A.1) that an(t,φ) = Op(kn/n) = op(1) uniformly over M × N. The left-hand side of (A.2) equals

(kn/n)rn(t,φ) + ½∇tg(t + s*,φ)an(t,φ)².

The result follows from (A.3), an(t,φ) = Op(kn/n), and a Taylor expansion of g(t + s*,φ) about s = 0 together with the uniform convergence of s* to zero and supt,φ|∇tg(t,φ)| < ∞. █
LEMMA 2A. Suppose Assumptions A2, A3, A5, A9, and A10 hold. Then, as n → ∞,

(i) supt∈Ln,φ∈N |[nc(t,φ)g(t,φ)an(t,φ)]/kn − 1| = op(1);

(ii) supt∈Un,φ∈N |[nd(t,φ)g(t,φ)bn(t,φ)]/kn − 1| = op(1).
Proof. We shall prove (i). The proof of (ii) is similar.
To establish (i), note that an argument similar to the one given in the proof of Lemma 1A shows that wp → 1 as n → ∞, an(t,φ) (and therefore, s*) must be in the interval [0,cnα − μ + 1] for each t ∈ Ln and φ ∈ N. Fix δ > 0 such that kn << n1−δ. Because n1/2 << kn << n1−δ and 1/[c(t,φ)g(t,φ)] << nδ uniformly over Ln × N, it follows from (A.1) that an(t,φ) = op(kn/n1−2δ) uniformly over Ln × N. Result (i) now follows from (A.2) and (A.3), a Taylor expansion of g(t + s*,φ) about s = 0, the fact that s* = op(kn/n1−2δ) uniformly over Ln × N, and supt,φ|∇tg(t,φ)| < ∞. █
Recall that we use three types of nearest neighbor estimators of F(t,φ): symmetric, asymmetric from the right, and asymmetric from the left. The nearest neighbor estimator that is asymmetric from the right has the form

F̂R(t,φ) = Σi Yi1{0 ≤ Xi′φ − t ≤ an(t,φ)} / Σi 1{0 ≤ Xi′φ − t ≤ an(t,φ)},

whereas the nearest neighbor estimator that is asymmetric from the left has the form

F̂L(t,φ) = Σi Yi1{0 ≤ t − Xi′φ ≤ bn(t,φ)} / Σi 1{0 ≤ t − Xi′φ ≤ bn(t,φ)}.

The symmetric nearest neighbor estimator has the form

F̂S(t,φ) = Σi Yi1{−bn(t,φ) ≤ Xi′φ − t ≤ an(t,φ)} / Σi 1{−bn(t,φ) ≤ Xi′φ − t ≤ an(t,φ)}.

We see that if t ≠ Xj′φ for any j, then F̂S(t,φ) is an arithmetic average of 2kn binary values. We shall establish rates of uniform consistency for F̂R(t,φ). The same results with similar proofs can be obtained for the other types. Note that we can write

F̂R(t,φ) = N̂n(t,φ)/P̂n(t,φ),

where

N̂n(t,φ) = n−1 Σi Yi1{0 ≤ Xi′φ − t ≤ an(t,φ)} and P̂n(t,φ) = n−1 Σi 1{0 ≤ Xi′φ − t ≤ an(t,φ)}.

Our next result gives a rate of uniform convergence of F̂R(t,φ) to F(t,φ).
LEMMA 3A. Suppose Assumptions A2, A3, A5, A9, and A10 hold. Then, as n → ∞, for all δ > 0,

(i) supt∈Sn,φ∈N |F̂R(t,φ) − F(t,φ)| = op(n−1/4+δ);

(ii) supt∈Sn,φ∈N |F̂S(t,φ) − F(t,φ)| = op(n−1/4+δ);

(iii) supt∈Sn,φ∈N |F̂L(t,φ) − F(t,φ)| = op(n−1/4+δ).
Proof. We first prove (i). Decompose F̂R(t,φ) − F(t,φ) into a sum of a stochastic term and a bias term:

F̂R(t,φ) − F(t,φ) = P̂n(t,φ)−1n−1 Σi [Yi − F(Xi′φ,φ)]1{0 ≤ Xi′φ − t ≤ an(t,φ)} + P̂n(t,φ)−1n−1 Σi [F(Xi′φ,φ) − F(t,φ)]1{0 ≤ Xi′φ − t ≤ an(t,φ)}.   (A.4)

Let {εn} denote an arbitrary sequence of nonnegative real numbers satisfying εn → 0 as n → ∞. By Lemma 1A and Lemma 2A, wp → 1 as n → ∞, for t ∈ Sn, φ ∈ N, and a satisfying |nc(t,φ)g(t,φ)a/kn − 1| ≤ εn, the first term in (A.4) is bounded by

(2n/kn) supt,φ,a |n−1 Σi [Yi − F(Xi′φ,φ)]1{0 ≤ Xi′φ − t ≤ a}|.   (A.5)

Standard empirical process results (e.g., Pakes and Pollard, 1989) show that uniformly over t ∈ Sn, φ ∈ N, and a ≥ 0, the term in absolute value signs in (A.5) has order Op(n−1/2) as n → ∞. Deduce from this, Assumptions A3 and A5, and the condition on a, that for all δ > 0, the term in (A.5) has order op(n−1/4+δ) as n → ∞.

Next, consider the bias term in (A.4). This term, wp → 1 as n → ∞, for t ∈ Sn, φ ∈ N, and a satisfying |nc(t,φ)g(t,φ)a/kn − 1| ≤ εn, is bounded by

(2n/kn) supφ∈N ∫ |F(v,φ) − F(t,φ)|1{0 ≤ v − t ≤ a}g(v,φ) dv.   (A.6)

After the change of variable z = (v − t)/a, the integral in (A.6) equals

a ∫[0,1] |F(t + az,φ) − F(t,φ)|g(t + az,φ) dz.

A Taylor expansion of g(t + az,φ) about t, together with the bounded derivatives in Assumption A9, Assumptions A3 and A5, and the condition on a, implies that for each δ > 0, the term in (A.6) has order op(n−1/4+δ) as n → ∞. This proves (i).
Mimic the proof of (i) to prove that (iii) holds; that is,

supt∈Sn,φ∈N |F̂L(t,φ) − F(t,φ)| = op(n−1/4+δ).   (A.7)

To prove (ii), note that, up to the possible double counting of one observation, F̂S(t,φ) is a convex combination of F̂R(t,φ) and F̂L(t,φ). Apply (i), (A.7), and Assumption A5 to get the result. █
LEMMA 4A. Let P be a probability distribution and V1,…,Vn a sample of independent observations from P. Let λn be the infimum of the density of P over [−κn,κn], 0 < κn < ∞, and suppose λn > 0. For αn > 0, partition [−κn,κn] into N = (2κnn)/αn intervals of length αn/n. For i = 1,…,N, let Ai be the event that the ith interval contains at least one sample point. As n → ∞,

(i) 𝓟(∩i=1N Ai) ≥ 1 − N[1 − λnαn/n]n;

(ii) if N exp(−λnαn) → 0, then 𝓟(∩i=1N Ai) → 1.
Proof. By Bonferroni's inequality,

𝓟(∩i=1N Ai) ≥ 1 − Σi=1N 𝓟(Aic) ≥ 1 − N[1 − λnαn/n]n ≥ 1 − N exp(−λnαn).

Both (i) and (ii) now follow from simple calculus. █
Lemma 5A is used in the proof of Lemma 5. It is proved in a more general form in Klein and Spady (1993). For ease of reference, we restate the result in the form in which we use it in the proof of Lemma 5.
LEMMA 5A. Suppose f(·) is bounded and E|Wc| < ∞, c = 1,…,k. Then, for c = 1,…,k,

[∇cF(t,φ)]φ=β0 = −f(t)E(Wc | X′β0 = t).
Proof of Lemma 5. Note that Vn(φ) = ∇φ Mn(φ) is an average of i.i.d. random variables. By Assumption A11 and Lemma 2.13 in Pakes and Pollard (1989), Vn(φ) converges uniformly over N to its probability limit V(φ). Moreover, Assumption A11 implies that V(φ) is continuous on N.
Fix φ and γ in N. By a multivariate Taylor expansion, there exists a φ* on the line segment between φ and γ such that as n → ∞,

|Mn(φ) − Mn(γ)| ≤ ∥Vn(φ*)∥|φ − γ| = [∥Vn(β0)∥ + op(1)]|φ − γ|.

By the continuity of V(φ) and a standard linear algebra result, it is enough to show that wp → 1 as n → ∞, the largest eigenvalue of Vn(β0) in absolute value is strictly less than unity. To this end, recall that

Mn(φ) = n−1 Σj xjYj(φ) with Yj(φ) = Xj′φ − uj(φ),

where xj′ = (Wj1,…,Wjk) denotes the jth row of the prewhitened matrix Xn. Write ∇c for ∂/∂φc, c = 1,2,…,k. The rcth element of Vn(β0) equals

δrc − n−1 Σj Wjr[F(Xj′β0)∇cν(Xj′φ,φ) + (1 − F(Xj′β0))∇cπ(Xj′φ,φ)]β0,   (A.8)

where δrc = 1{r = c}. Recall that X = (W1,…,Wk,Wk+1) and W1 = 1. Integrate by parts and then differentiate and apply Lemma 5A to get that [∇cν(Xj′φ,φ)]β0 equals

ν1(Xj′β0)Wjc + [1 − ν1(Xj′β0)]κc(Xj′β0).

Similarly, [∇cπ(Xj′φ,φ)]β0 equals

π1(Xj′β0)Wjc + [1 − π1(Xj′β0)]κc(Xj′β0),

where κc(t) = E(Wc | X′β0 = t). Integration by parts arguments show that

ν1(t) = f(t)[t − ν(t,β0)]/F(t) and π1(t) = f(t)[π(t,β0) − t]/[1 − F(t)],   (A.9)

where ν1(t) and π1(t) denote ∇tν(t,β0) and ∇tπ(t,β0). Write d(Xj′β0) for F(Xj′β0)ν1(Xj′β0) + [1 − F(Xj′β0)]π1(Xj′β0). The term in outer brackets in (A.8) is equal to

d(Xj′β0)Wjc + [1 − d(Xj′β0)]κc(Xj′β0).

This and prewhitening let us write the rcth element of Vn(β0) as

δrc − n−1 Σj Wjr{d(Xj′β0)Wjc + [1 − d(Xj′β0)]κc(Xj′β0)}.   (A.10)

Because W1 = 1, κ1 = 1. Prewhitening implies that Wj1 = 1 for all j. Deduce that for r = 1,…,k and c = 1, the rcth element of Vn(β0) equals zero. That is, the first column of Vn(β0) is a vector of zeros. This implies that one solution of the characteristic equation of Vn(β0) must be zero. Thus, to prove that wp → 1 as n → ∞, the maximum eigenvalue of Vn(β0) in absolute value is strictly less than unity, it is enough to prove this for the (k − 1) × (k − 1) lower right-hand submatrix of Vn(β0). For convenience, we call this submatrix An.
Prewhitening implies that n−1 Σj WjrWjc = δrc. Deduce from this and (A.10) that the rcth element of An equals

n−1 Σj [1 − d(Xj′β0)]Wjr[Wjc − κc(Xj′β0)].

Let X̃n denote the n × (k − 1) matrix comprised of the second through kth components of each regressor vector and let Xβ0 denote the n × 1 vector with jth component Xj′β0. Define Tn = n−1/2X̃n and let Sn denote the n × (k − 1) matrix whose jcth element is n−1/2κc+1(Xj′β0), the conditional expectations evaluated at the components of Xβ0. Also, let Dn denote the n × n diagonal matrix with jjth element d(Xj′β0). We see that

An = Tn′[In − Dn][Tn − Sn].

By prewhitening, Tn′Tn = Ik−1 = Tn′InTn. The fact that Tn′Dn Sn = Sn′Dn Sn + op(1) as n → ∞ implies that wp → 1 as n → ∞,

An = Wn′Wn + Zn′Zn + op(1),

where Wn = [In − Dn]1/2Tn and Zn = Dn1/2[Tn − Sn]. (Note that log-concavity of u and Proposition 1 in Heckman and Honoré, 1990, imply that the diagonal elements of Dn are in [0,1], making it possible to form Dn1/2 and [In − Dn]1/2.) Thus, wp → 1 as n → ∞, An is a nonnegative definite matrix and so must have all nonnegative eigenvalues. Recall from the preceding discussion that wp → 1 as n → ∞,

An = Ik−1 − Σn′Σn + op(1),

where Σn = Dn1/2Sn. It follows that wp → 1 as n → ∞, the maximum eigenvalue of An is equal to

1 − min|x|=1 x′Σn′Σnx + op(1).

Assumptions A1 and A2 imply that the smallest eigenvalue of Σn′Σn is bounded away from zero wp → 1 as n → ∞. Thus, x′Σn′Σnx is positive and bounded away from zero for all unit vectors x in Rk−1. Thus, wp → 1 as n → ∞, the maximum eigenvalue of An is strictly less than unity. This proves the result. █
Proof of Lemma 6. Note that, uniformly over φ ∈ B0, |M̂n(φ) − Mn(φ)| = n−1|Xn′[ûn(φ) − un(φ)]| is bounded by

n−1|Xn′[ûn(φ) − ũn(φ)]| + n−1|Xn′[ũn(φ) − ūn(φ)]| + n−1|Xn′[ūn(φ) − un(φ)]|,   (A.11)

where ũn(φ) = (ũ1(φ),…,ũn(φ))′ with ũj(φ) = [Yjν(Xj′φ,φ) + (1 − Yj)π(Xj′φ,φ)]τn(Xj′φ), ūn(φ) = (ū1(φ),…,ūn(φ))′ with ūj(φ) = [F(Xj′β0)ν(Xj′φ,φ) + (1 − F(Xj′β0))π(Xj′φ,φ)]τn(Xj′φ), and un(φ) = (u1(φ),…,un(φ))′ with uj(φ) = F(Xj′β0)ν(Xj′φ,φ) + (1 − F(Xj′β0))π(Xj′φ,φ).
Each component of the second term in (A.11) is an average of zero mean i.i.d. random variables. Deduce from this, Assumption A11, euclidean properties of the relevant classes of functions, and Lemma 2.13 in Pakes and Pollard (1989) that this term has order Op(n−1/2) uniformly over φ ∈ B0.
Each component of the third term in (A.11) can be decomposed into its mean plus a term that is an average of zero mean i.i.d. random variables. This latter term has order Op(n−1/2) uniformly over φ ∈ B0. This follows from the same argument used to handle the second term in (A.11). The mean of the rth component is

E{Wr[F(X′β0)ν(X′φ,φ) + (1 − F(X′β0))π(X′φ,φ)][τn(X′φ) − 1]},

which converges to zero as n → ∞ by Assumptions A2 and A11 and dominated convergence, because τn(t) → 1 as n → ∞ for all t in R.
Consider the first term in (A.11). The result will follow if wp → 1 as n → ∞, uniformly over j for which τn(Xj′φ) = 1 and over φ ∈ B0, ν̂n(Xj′φ,φ) converges to ν(Xj′φ,φ) and π̂n(Xj′φ,φ) converges to π(Xj′φ,φ). We will prove the former. Proof of the latter is similar. Note that ν̂n(Xj′φ,φ) − ν(Xj′φ,φ) equals

F(Xj′φ,φ)−1 ∫(−∞,Xj′φ] F(s,φ) ds − F̂(Xj′φ,φ)−1 Σi:Xi′φ≤Xj′φ F̂(Xi′φ,φ)Δ(Xi′φ)σn(Xi′φ),   (A.12)

which can be written as the sum of

[F(Xj′φ,φ)−1 − F̂(Xj′φ,φ)−1] ∫(−∞,Xj′φ] F(s,φ) ds + F̂(Xj′φ,φ)−1 Σi:Xi′φ≤Xj′φ [F(Xi′φ,φ) − F̂(Xi′φ,φ)]Δ(Xi′φ)σn(Xi′φ)   (A.13)

and

F̂(Xj′φ,φ)−1 [∫(−∞,Xj′φ] F(s,φ) ds − Σi:Xi′φ≤Xj′φ F(Xi′φ,φ)Δ(Xi′φ)σn(Xi′φ)].   (A.14)
Consider (A.13) in the last display. Assumption A4 and the fact that F(t,φ) is bounded between zero and unity imply that for all δ > 0,

supφ maxj:τn(Xj′φ)=1 ∫(−∞,Xj′φ] F(s,φ) ds = O(nδ).   (A.15)

Deduce from (A.15), Lemma 3A(iii), and Assumption A6 that for all δ > 0, wp → 1 as n → ∞, uniformly over j and φ,

|F(Xj′φ,φ)−1 − F̂(Xj′φ,φ)−1| = op(n−1/4+δ).   (A.16)

It then follows from (A.15), (A.16), Lemma 3A(iii), and Assumption A6 that for all δ > 0, wp → 1 as n → ∞, uniformly over j and φ, the term in (A.13) has order op(n−1/4+δ).
Next, consider (A.14). Write Xσ′φ for the smallest index value for which σn(Xi′φ) = 1. If τn(Xj′φ) = 1, then Xj′φ ≥ −cn. By Assumption A7, F(t,φ) → 0 as t → −∞ for each φ in N. This and the triangle inequality imply that wp → 1 as n → ∞ the term in (A.14) is bounded by

nδ[Σi:Xσ′φ≤Xi′φ≤Xj′φ |∫(Xi−1′φ,Xi′φ] F(s,φ) ds − F(Xi′φ,φ)Δ(Xi′φ)| + ∫(−∞,Xσ−1′φ] F(s,φ) ds].   (A.17)

By Assumption A5 and Lemma 4A, wp → 1 as n → ∞, Xσ−1′φ can be made arbitrarily close to −cnα. Deduce that the second term in (A.17) is bounded by

nδ ∫(−∞,−cnα+1] F(s,φ) ds,

which converges to zero by Assumption A7. Consider the first term in (A.17). Define

H(ε) = ∫(t−ε,t] F(s,φ) ds.

By Assumption A9 and a Taylor expansion of H(ε) about ε = 0 followed by a Taylor expansion of F(t − ε,φ) about ε = 0, we see that H(ε) = εF(t,φ) + O(ε²) uniformly over t and φ. Deduce from this and Assumption A5 that the first term in (A.17) has order o(n1+δΔn²) for all δ > 0 where Δn = supφ Δn(φ) with Δn(φ) = maxi{Δ(Xi′φ)σn(Xi′φ)}, the norm of the partition of the interval [−cnα,cnα]. By Assumption A5 and Lemma 4A, wp → 1 as n → ∞, Δn << nδ log n/n for all δ > 0. Deduce that as n → ∞, the first term in (A.17) has order op(nδ−1) for all δ > 0. This proves the result. █