TIME-INVARIANT REGRESSOR IN NONLINEAR PANEL MODEL WITH FIXED EFFECTS
Published online by Cambridge University Press: 31 March 2005
Abstract
This paper generalizes the intuition of Hausman and Taylor (1981, Econometrica 49, 1377–1398) and develops a method of dealing with a time-invariant regressor in nonlinear panel models with fixed effects. We illustrate the usefulness of our result by discussing the implication for some nonlinear models of social interactions.
We are grateful to Dan Ackerberg and Jerry Hausman for helpful comments. The first author gratefully acknowledges financial support from NSF grant SES-0313651.
Type: Miscellanea
Copyright: © 2005 Cambridge University Press
1. INTRODUCTION
Panel data allow the possibility of controlling for unobserved individual specific effects. In linear models, such “fixed effects” are usually eliminated by differencing. An unintended consequence of differencing is that it also eliminates the time-invariant regressor, which renders its coefficient unidentified. Hausman and Taylor (1981) used an instrumental variables (IV) approach to overcome such a problem. They show that a variable that is uncorrelated with individual fixed effects can be used as a valid instrument in estimating such a coefficient.
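The way differencing removes a time-invariant regressor can be checked numerically. The following is a minimal sketch using the within (demeaning) transformation rather than first differences; the array names and the helper `within_transform` are hypothetical and chosen only for illustration.

```python
import numpy as np

def within_transform(a, n, T):
    """Subtract each individual's time average from an array stacked by individual."""
    panel = a.reshape(n, T)
    return (panel - panel.mean(axis=1, keepdims=True)).reshape(-1)

n, T = 4, 3
w_i = np.array([1.0, 2.0, 3.0, 4.0])      # time-invariant regressor, one value per individual
w = np.repeat(w_i, T)                      # stacked to length n*T
x = np.arange(n * T, dtype=float) ** 2     # a time-varying regressor

w_within = within_transform(w, n, T)       # identically zero: no variation is left
x_within = within_transform(x, n, T)       # retains within-individual variation
```

Because `w_within` is identically zero, no coefficient on it can be recovered from the transformed data, which is the identification problem described above.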
In this paper, we generalize their intuition and develop a method of dealing with a time-invariant regressor in the nonlinear framework. The method requires a large number of observations per individual (T). Because the IV estimation also requires a large number of individuals (n), we adopt an asymptotic framework where both n and T grow to infinity at the same rate. This result is made possible by recent technical progress in panel analysis under such alternative asymptotics. See, e.g., Hahn and Kuersteiner (2002, 2003), Hahn and Newey (2004), and Woutersen (2002). We illustrate the usefulness of our result by discussing the implication for some nonlinear models of social interactions.
2. IV ESTIMATION
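Suppose that we are given a set of moment restrictions
$$E\left[\varphi\left(y_{it},\, \gamma_{i0} + w_i'\delta_0 + x_{it}'\theta_0\right)\right] = 0 \tag{1}$$
for some vector-valued function φ, where yit, xit, and wi denote the dependent variable in the tth period, the time-varying regressor in the tth period, and the time-invariant regressor, respectively. Unobserved individual specific effects are summarized by the scalar variable γi. For example, in the case of the linear model
$$y_{it} = \gamma_{i0} + w_i'\delta_0 + x_{it}'\theta_0 + \varepsilon_{it} \tag{2}$$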
with (wi′,xit′) strictly exogenous such that

Our primary focus is to estimate the coefficient δ0 of the time-invariant regressor wi when γi is possibly correlated with wi and xit. We should note that estimation of θ0 does not present any substantive conceptual challenge. If both n and T grow to infinity at the same rate, θ0 can be $\sqrt{nT}$-consistently estimated. Letting αi0 ≡ γi0 + wi′δ0, we can rewrite the model as E [φ(yit,αi0 + xit′θ0)] = 0, to which we can apply variants of recently developed methods discussed in Hahn and Kuersteiner (2002, 2003), Hahn and Newey (2004), and Woutersen (2002).
To understand how δ0 could be estimated, suppose for a moment that we observe αi0. Also suppose that we observe an additional variable zi with dim(zi) = dim(wi).¹
¹ It is easy to generalize the discussion to the overidentified case where dim(zi) > dim(wi). Because the primary purpose of this paper is identification and consistent estimation of δ0, we focus on the exactly identified case.
CONDITION 1. (i) E [ziγi0] = 0; (ii) E [zi wi′] is nonsingular.
Note that zi is required to be uncorrelated with the individual fixed effects. It is clear that we can consistently estimate δ0 by
$$\tilde\delta \equiv \left(\sum_{i=1}^{n} z_i w_i'\right)^{-1} \sum_{i=1}^{n} z_i \alpha_{i0} \tag{3}$$
under the mild condition that the data are independent and identically distributed (i.i.d.) over i.
CONDITION 2. ({yi1,yi2,…},{xi1,xi2,…},zi,wi,γi0) is i.i.d. over i.
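As an illustration of the exactly identified IV step, the sketch below computes the estimator that solves (Σ zi wi′)δ = Σ zi αi0 when αi0 is treated as observed. The function name `iv_delta`, the simulated design, and the parameter values are assumptions made for the example. As a simplification, γi0 is made exactly orthogonal to zi in the sample, so Condition 1(i) holds without sampling error and the estimate reproduces δ0 up to rounding.

```python
import numpy as np

def iv_delta(z, w, alpha):
    """Exactly identified IV estimate: solves (sum_i z_i w_i') delta = sum_i z_i alpha_i.

    z, w: arrays of shape (n, d); alpha: array of shape (n,).
    """
    return np.linalg.solve(z.T @ w, z.T @ alpha)

rng = np.random.default_rng(0)
n, d = 500, 2
z = rng.normal(size=(n, d))
gamma = rng.normal(size=n)
# Illustrative simplification: remove the sample projection of gamma on z,
# so that sum_i z_i gamma_i = 0 holds exactly in this sample.
gamma = gamma - z @ np.linalg.lstsq(z, gamma, rcond=None)[0]
# w is correlated with both the instrument and the fixed effect.
w = z + 0.5 * gamma[:, None] + rng.normal(size=(n, d))
delta0 = np.array([1.0, -2.0])
alpha = gamma + w @ delta0                 # alpha_i0 = gamma_i0 + w_i' delta_0

delta_iv = iv_delta(z, w, alpha)
delta_ols = np.linalg.lstsq(w, alpha, rcond=None)[0]  # ignores the correlation of w with gamma
```

Comparing `delta_iv` with `delta_ols` in this design shows why an instrument is needed when wi is correlated with γi0.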
Hausman and Taylor (1981) noted that the estimator $\tilde\delta$ in (3) would remain consistent even if we replace αi0 by an unbiased estimate. In the nonlinear context, it is difficult to come up with such an unbiased estimator for αi0. Therefore, Hausman and Taylor's method cannot be directly applied. The basic intuition in this paper is that, when both n and T grow to infinity at the same rate, we can come up with a $\sqrt{T}$-consistent estimator for αi0, say, $\hat\alpha_i$.
CONDITION 3. n,T → ∞ such that n/T → ρ, where 0 < ρ < ∞.
Because the estimation error becomes very small as the sample size increases, the IV estimator
$$\hat\delta \equiv \left(\sum_{i=1}^{n} z_i w_i'\right)^{-1} \sum_{i=1}^{n} z_i \hat\alpha_i \tag{4}$$
is expected to be consistent for δ0 in general. This is quite intuitive. Note that $\hat\delta$ is an IV estimator of $\hat\alpha_i$ on wi. Because $\hat\alpha_i$ is a proxy for αi0, and because the “measurement error” disappears as T → ∞, we should expect the distribution of $\hat\delta$ to be similar to that of the estimator $\tilde\delta$ in (3) if T is large.
To come up with a $\sqrt{T}$-consistent estimator $\hat\alpha_i$, we will assume that

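for some functions u and v, where Xit ≡ (yi1,…,yiT,xi1,…,xiT) and where dim(θ) = dim(u) = p and dim(α) = dim(v) = 1. We will consider the estimator that solves
$$0 = \sum_{i=1}^{n}\sum_{t=1}^{T} u(X_{it};\hat\theta,\hat\alpha_i), \qquad 0 = \sum_{t=1}^{T} v(X_{it};\hat\theta,\hat\alpha_i) \quad (i = 1,\ldots,n). \tag{5}$$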
This indicates that (i) the first component u is used throughout the sample for estimation of θ0; and (ii) the second component v is used only for the ith individual to estimate αi0. We do not expect this separation to be constraining in practice. For example, in the linear model (2), we may take u(Xit;θ,αi) = xit·(yit − αi − xit′θ), v(Xit;θ,αi) = yit − αi − xit′θ, which will result in the usual fixed effects estimator.
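For the linear example just described, the two sets of estimating equations can be solved in closed form. The sketch below is an illustration only: the function name `fixed_effects`, the array layout, and the simulated data are assumptions. It computes the within estimator of θ, recovers each αi as the individual mean residual, and then evaluates both moment conditions at the solution.

```python
import numpy as np

def fixed_effects(y, x):
    """Solve the linear-model estimating equations.

    y: (n, T) outcomes; x: (n, T, p) time-varying regressors.
    Returns theta_hat of shape (p,) and alpha_hat of shape (n,).
    """
    p = x.shape[2]
    y_dm = y - y.mean(axis=1, keepdims=True)          # within transformation of y
    x_dm = x - x.mean(axis=1, keepdims=True)          # within transformation of x
    theta = np.linalg.lstsq(x_dm.reshape(-1, p), y_dm.reshape(-1), rcond=None)[0]
    alpha = y.mean(axis=1) - x.mean(axis=1) @ theta   # per-individual intercepts
    return theta, alpha

rng = np.random.default_rng(1)
n, T, p = 30, 8, 2
x = rng.normal(size=(n, T, p))
alpha0 = rng.normal(size=n)
theta0 = np.array([0.5, -1.0])
y = alpha0[:, None] + x @ theta0 + rng.normal(size=(n, T))

theta_hat, alpha_hat = fixed_effects(y, x)
resid = y - alpha_hat[:, None] - x @ theta_hat        # y_it - alpha_i - x_it' theta
u_moment = np.einsum("itp,it->p", x, resid)           # sum over i and t of x_it * residual
v_moment = resid.sum(axis=1)                          # sum over t of residual, one per individual
```

Both `u_moment` and `v_moment` vanish up to rounding, which is the sense in which the usual fixed effects estimator solves the system with u pooled over the sample and v applied individual by individual.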
Under regularity conditions discussed in Appendix A, it can be shown that $\hat\alpha_i$, which solves (5), is uniformly consistent over i.²
² See Appendix D.
Because $\hat\alpha_i$ is a proxy for αi0, there is a “measurement error.” If there is a correlation between the measurement error and the instrument zi, the resultant estimator $\hat\delta$ may be biased. Condition 4 rules this out.
CONDITION 4. E [v(Xit;θ,αi)|zi] = 0.³
³ We have v(Xit;θ,αi) = εit in the linear model (2), and Hausman and Taylor's instrument zi for such a linear model is required to satisfy E [zi·εit] = 0.
Theorem 1 establishes that the IV estimator $\hat\delta$ in (4) based on the proxy $\hat\alpha_i$ of αi0 has the same asymptotic distribution as $\tilde\delta$ in (3).
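THEOREM 1. Assume Conditions 1–7. Further assume that $E[|z_i w_i'|] < \infty$ and $E[|\gamma_{i0}^2 z_i z_i'|] < \infty$. We then have
$$\sqrt{n}\,(\hat\delta - \delta_0) \xrightarrow{d} N\!\left(0,\ \left(E[z_i w_i']\right)^{-1} E[\gamma_{i0}^2 z_i z_i'] \left(E[w_i z_i']\right)^{-1}\right).$$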
Proof. See Appendix E.
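The logic of the theorem can be traced through a short decomposition. The display below is a sketch that uses only the definition αi0 ≡ γi0 + wi′δ0; the labels $\tilde\delta$ (the infeasible estimator in (3), built from αi0) and $\hat\delta$ (the feasible estimator in (4), built from $\hat\alpha_i$) are notation assumed here for illustration.

```latex
\begin{align*}
\hat\delta - \delta_0
  &= \Big(\sum_{i} z_i w_i'\Big)^{-1} \sum_{i} z_i\,(\hat\alpha_i - w_i'\delta_0) \\
  &= \underbrace{\Big(\sum_{i} z_i w_i'\Big)^{-1} \sum_{i} z_i\,\gamma_{i0}}_{\tilde\delta - \delta_0}
   \;+\; \Big(\sum_{i} z_i w_i'\Big)^{-1} \sum_{i} z_i\,(\hat\alpha_i - \alpha_{i0}).
\end{align*}
```

After scaling by $\sqrt{n}$, the first term is the estimation error of the infeasible estimator, whose limit is governed by Condition 1. Appendix E argues that the second term, which collects the "measurement error" in the proxy, is asymptotically negligible.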
3. APPLICATION: NONLINEAR MODEL OF SOCIAL INTERACTIONS
Within the preceding framework, identification and estimation of various social effects in nonlinear models of social interactions are straightforward, which provides one way of dealing with the identification problems peculiar to these models.
For a grouped cross-section of data, a model of social interactions can have the form of a conditional likelihood f (ygi,γg0 + Eg [ygi]β + Eg [sgi′]φ + sgi′ζ + xgi′θ0). Here, ygi is the outcome/behavior of interest for the ith individual in the gth group, and Eg [·] denotes the mean for the gth group. Following the classification of Manski (1993), the coefficient on Eg [ygi] determines the strength of endogenous social effects in explaining individual outcomes. In addition to ygi, we observe sgi, a vector of individual characteristics that also generate exogenous (contextual) social effects, and xgi, a vector of individual characteristics that operate at the individual level only. Finally γg0, which is not observed by the econometrician, captures the presence of correlated group effects.
The focus of Graham and Hahn (2003) is on the linear-in-means model without contextual effects. They exploit the idea of Hausman and Taylor (1981) of using IVs to identify the parameters. Identification of the endogenous effect is made possible by an instrument that exogenously explains the between-group variation of the individual characteristics xgi. To be more specific, they consider the simplified model

and examine whether the endogenous effects can be identified in the presence of correlated effects. For such purpose, they considered the social equilibrium

They show that

under a standard strict exogeneity condition, where for any vector

. They also note that θ0 /(1 − β) is the limit of the two-stage least squares estimator applied to the social equilibrium (6) if (i) Eg [ygi] and Eg [xgi] are observed; and (ii) there exists an instrument zg such that E [zgγg0] = 0 and E [zg Eg [xgi′]] ≠ 0.
Brock and Durlauf (2001a, 2001b) are concerned with nonlinear models without correlated group effects. To be specific, they considered a logit model⁴

where hg = (Eg [sgi′],sgi′)′. They show that the parameter (k,β,ξ,θ0) is identified and can be consistently estimated by maximum likelihood estimation. Exploiting nonlinearity, they established that the endogenous effects β can be identified from ξ. Because hg contains Eg [sgi], the contextual effects are identified from the endogenous effects in nonlinear models in general.⁵
⁴ They actually considered the model

where $m_g^e$ denotes the (common) expectation of yg among agents in group g. Under some auxiliary assumptions, including rational expectations and common knowledge, the model reduces to the simpler form presented here.
⁵ They note that these identification results are still valid in the presence of multiple equilibria.
The result in the previous section can be used to identify social effects in the presence of correlated group effects. Let γg0 denote the group characteristic that may be correlated with observed variables such as Eg [ygi] or xgi. Assume that the conditional likelihood given γg0, Eg [ygi], hg, and xgi takes the form f (ygi,γg0 + Eg [ygi]β + hg′ξ + xgi′θ0). Note that the explanatory variables affect the outcome through the linear index γg0 + Eg [ygi]β + hg′ξ + xgi′θ0. For example, we may have

which is a generalized version of Brock and Durlauf's logit model that allows correlated group effects. Interpreting Eg [ygi] as just one of the regular time-invariant regressors and writing wg = (Eg [ygi],hg′)′ and wg′δ0 = βEg [ygi] + hg′ξ, we can write the conditional likelihood as f (ygi,γg0 + wg′δ0 + xgi′θ0). This model can be understood as a nonlinear panel model with group fixed effects and an individual-invariant regressor wg, for which identification results have been established earlier in this paper. We note that consistent estimation of γg0 + wg′δ0 and θ0 can be achieved by considering the maximum likelihood estimator, i.e., by taking

We may therefore conclude that the social effects are identified as such.
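To make the group-level step concrete, the sketch below solves the per-group likelihood equation for the composite effect αg = γg0 + wg′δ0 in a logit model, holding θ fixed. It is an illustration under stated assumptions: outcomes are coded 0/1 (a coding chosen here for convenience), the function name `solve_group_effect` is hypothetical, and `index` stands for xgi′θ evaluated at a given θ.

```python
import numpy as np

def solve_group_effect(y, index, iters=50):
    """Newton iterations for a in sum_i (y_i - Lambda(a + index_i)) = 0.

    y: 0/1 outcomes for the members of one group.
    index: x_gi' theta for the same members, at a given theta.
    Lambda is the logistic cdf. Assumes y is not all zeros or all ones.
    """
    a = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a + index)))
        score = np.sum(y - p)               # group-level score with respect to a
        curvature = np.sum(p * (1.0 - p))   # minus the derivative of the score
        a += score / curvature
    return a

y = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0])
index = np.linspace(-1.0, 1.0, 8)
a_hat = solve_group_effect(y, index)
p_hat = 1.0 / (1.0 + np.exp(-(a_hat + index)))
```

In a full implementation θ would be estimated jointly with the group effects; the resulting group-level estimates would then be used in an IV regression on wg with instrument zg, in the manner of the estimator based on the proxy for αi0.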
4. SUMMARY AND EXTENSION
In this paper, we generalized the result of Hausman and Taylor (1981) to nonlinear panel models with fixed effects. The usefulness of the result is illustrated with some nonlinear models of social interactions. It would be interesting to generalize Hausman and Taylor's specification test to the nonlinear setup, which is left for future research.
APPENDIX A: REGULARITY CONDITIONS
Condition 5. (i) Given time-invariant variables (αi0,zi,wi), (yit,xit) is i.i.d. over t; (ii) for every i, G(i)(θ0,αi0) = 0; (iii) for each η > 0, infi inf{(θ,α):|(θ,α)−(θ0,αi0)|>η}|G(i)(θ,α)| > 0, where

.
Remark 1. We are assuming that the αi are a deterministic sequence of numbers, i.e., all the results in this paper are results conditional on αi.
Condition 6. (i) The function g(·;θ,α) is continuous in

; (ii) the parameter space

is compact; (iii) there exists a function M(Xit) such that

and $\sup_i E[M(X_{it})^Q] < \infty$ for some Q > 64.
Condition 7. (i) $\min_i E[v(X_{it};\theta_0,\alpha_{i0})^2] > 0$; (ii)

, where

.
APPENDIX B: CONSISTENCY
LEMMA 1. Assume that Wt are i.i.d. with E [Wt] = 0 and $E[W_t^{2k}] < \infty$. Then,

for some constant C(k).
Proof. By adopting an argument in the proof of Lemma 5.1 in Lahiri (1992), we have

where for each fixed j ∈ {1,…,2k}, $\sum_\alpha$ extends over all j-tuples of positive integers (α1,…,αj) such that α1 + ··· + αj = 2k and $\sum_I$ extends over all ordered j-tuples (t1,…,tj) of integers such that 1 ≤ tj ≤ T. Also, C(α1,…,αj) stands for a bounded constant. Note that if j > k then at least one of the indices αj = 1. By independence and the fact that E [Wt] = 0 it follows that

whenever j > k. This shows that

for some constant C(k). █
LEMMA 2. Suppose that, for each i, {ξit,t = 1,2,…} is a sequence of zero mean i.i.d. random variables. We assume that {ξit,t = 1,2,…} are independent across i. We also assume that $\max_i E[|\xi_{it}|^{16}] < \infty$. Finally, we assume that n = O(T). We then have

for every η > 0.
Proof. Using Lemma 1, we obtain

, where C > 0 is a constant. Therefore, we have

, or

. █
LEMMA 3. Suppose that Conditions 3 and 6 hold. We then have for all η > 0 that

Proof. Let η > 0 be given. We note that

Let ε > 0 be chosen such that 2ε maxi E [M(Xit)] < η/3. Divide

into subsets

such that |(θ,α) − (θ′,α′)| < ε whenever (θ,α) and (θ′,α′) are in the same subset. Let (θj,αj) denote some point in ϒj for each j. Then,

and therefore

For

, we have

and therefore

by Lemma 2. Combining (B.2)–(B.4) and n = O(T), we obtain the desired conclusion. █
APPENDIX C: EXPANSION
Let

be such that

. Letting Ui(Xit;θ,αi) ≡ u(Xit;θ,αi) − ρi0 v(Xit;θ,αi), $\rho_{i0} \equiv E[\partial u(X_{it};\theta,\alpha_i)/\partial\alpha_i']\,(E[\partial v(X_{it};\theta,\alpha_i)/\partial\alpha_i'])^{-1}$ (in the likelihood framework, U is the efficient score for θ), we can recognize that

is a solution to

Let F ≡ (F1,…,Fn) denote the collection of distribution functions Fi, where each Fi denotes the distribution function of Xit. Let

, where

denotes the empirical distribution function for the stratum i. Define F(ε) ≡ F + εΔiT for $\varepsilon \in [0, T^{-1/2}]$, where

. For each fixed θ and ε, let V(Xit;θ,αi) ≡ v(Xit;θ,αi) and αi(θ,Fi(ε)) be the solution to the estimating equation 0 = ∫Vi [θ,αi(θ,Fi(ε))] dFi(ε) and let θ(ε) be the solution to the estimating equation

. By Taylor series expansion, we have

, where

is somewhere between 0 and $T^{-1/2}$. We therefore have

The last term in (C.1) can be shown to be op(1) by the same method as in Hahn and Newey (2004).
LEMMA 4. For every η > 0, we have

.
Proof. Only the first assertion is proved. The second assertion can be proved similarly. Let η be given. Recall that

. We therefore have

, from which we find

. By Lemma 3, we also have

. Therefore, for every

with probability equal to $1 - o(T^{-1})$, we have

where ε ≡ infi inf{(θ,α) : |(θ,α)−(θ0,αi0)|>η}|G(i)(θ,α)| > 0. It follows that

LEMMA 5. Suppose that, for each i, {ξit(φ),t = 1,2,…} is a sequence of zero mean i.i.d. random variables indexed by some parameter φ ∈ Φ. We assume that {ξit(φ), t = 1,2,…} are independent across i. We also assume that supφ∈Φ|ξit(φ)| ≤ Bit for some sequence of random variables Bit that is i.i.d. across t and independent across i. Finally, we assume that $\max_i E[|B_{it}|^{64}] < \infty$ and n = O(T). We then have

for every υ such that

. Here, {φi} is an arbitrary sequence in Φ.
Proof. By Markov's inequality, we obtain

By Lemma 1, we have

for some C. Therefore, we have

LEMMA 6. Suppose that Ki(·;θ(ε),αi(θ(ε),ε)) is equal to $\partial^{m_1+m_2} g(X_{it};\theta(\varepsilon),\alpha_i(\theta(\varepsilon),\varepsilon))/\partial\theta^{m_1}\partial\alpha_i^{m_2}$ for some m1 + m2 ≤ 1,…,5. Then, for any η > 0, we have

Also,

for some constant

.
Proof. Note that we have

Therefore, we have

the right-hand side of which can, by Lemmas 4 and 5, be bounded in absolute value by some η > 0 with probability $1 - o(T^{-1})$, which proves the first claim. The second claim can be proved similarly.
As for the third claim, we can show using Lemma 5 that maxi|∫Ki(·;θ(ε), αi(θ(ε),ε)) dΔiT| can be bounded in absolute value by $C\,T^{(1/10)-\upsilon}$ for some constant C > 0 and υ such that

with probability $1 - o(T^{-1})$. █
LEMMA 7.

for some constant

.
Proof. In Hahn and Kuersteiner (2003), it is shown that

Using Lemma 6, we can see that $\left(\int[\partial V_i(\cdot,\theta,\varepsilon)/\partial\alpha_i]\,dF_i(\varepsilon)\right)^{-1}$ is uniformly bounded away from zero with probability $1 - o(T^{-1})$. We can also see that, with probability $1 - o(T^{-1})$, ∫[∂Vi(·,θ,ε)/∂θ] dFi(ε) is uniformly bounded by some constant C and ∫Vi(·,θ,ε) dΔiT is uniformly bounded by $C\,T^{(1/10)-\upsilon}$. █
LEMMA 8.

for some constant

.
Proof. In Hahn and Kuersteiner (2003), it is shown that θε(ε) is equal to

Using Lemmas 6 and 7 we can bound the denominator of θε(ε) by some C > 0 and the numerator by some $C\,T^{(1/10)-\upsilon}$ with probability $1 - o(T^{-1})$. █
LEMMA 9.

for some constant

. Here, $\alpha_{i\theta_r\theta_{r'}} \equiv \partial^2\alpha_i/\partial\theta_r\partial\theta_{r'}$. We similarly define $\alpha_{i\theta_r\varepsilon}$.
Proof. By repeatedly differentiating 0 = ∫Vi(·,θ,ε) dFi(ε) with respect to ε, we obtain

The result then follows by applying the same argument as in the proof of Lemma 7. █
LEMMA 10.

for some constant

.
Sketch of Proof. By repeatedly differentiating

with respect to ε, we obtain a characterization of θεε(ε). (For more detailed characterization, see Hahn and Kuersteiner, 2003.) The conclusion follows by combining it with Lemmas 6–9. █
APPENDIX D: UNIFORM CONSISTENCY OF $\hat\alpha_i$
THEOREM 2. Under Conditions 3, 5, 6, and 7, we have

where

and Pr[maxi|κi| ≥ η] = o(1) for every η > 0. Here, vit ≡ vit(Xit,θ0,αi0).
Let

denote αi that sets

equal to zero. (In other words, let

.) We then have the expansion

for some

. Let vi(·,ε) ≡ Vi(θ(F(ε)),αi(Fi(ε))). The first-order condition may be written as 0 = ∫vi(·,ε) dFi(ε). Differentiating repeatedly with respect to ε, we obtain


Because dvi(·,ε)/dε = [∂vi(·,ε)/∂θ′](∂θ/∂ε) + [∂vi(·,ε)/∂αi](∂αi /∂ε), (D.2) implies that

Evaluating at ε = 0 we obtain

where Viα ≡ ∂v(Xit;θ,αi)/∂αi, Viθ ≡ ∂v(Xit;θ,αi)/∂θ, and θε(0) can be deduced from (C.2). Next, consider

such that

is characterized by

We now combine (D.1), (D.4), and

and obtain

Here, (D.6) can be obtained by evaluating (C.2) at ε = 0.
In light of (D.7), Theorem 2 can be obtained by showing that

for any

. This follows from representation (D.5) and also from Lemmas 6, 8, and 10.
APPENDIX E: PROOF OF THEOREM 1
Theorem 2 implies that we have

Under Conditions 4 and 5, the term

is of order op(1), and substitution of

for αi0 in (3) does not affect the asymptotic distribution of the resultant estimator (4):

REFERENCES