Across the social sciences, many theories involve cross-unit interactions resulting from spillovers in predictors (e.g., externalities) or outcomes (e.g., interdependence). Where researchers are explicitly interested in cross-unit relationships, spatial econometric models are widely used (Anselin 1988; Franzese and Hays 2007). Even if researchers are otherwise uninterested in these relationships, however, accounting for spatial dependence is often necessary to recover unbiased estimates.
With a variety of spatial models to select from and widely available software routines, estimating spatial models is easier than ever. Yet one prerequisite for spatial analysis continues to frustrate applied researchers: the specification of the spatial weights matrix. Specifying the spatial weights matrix requires additional theories and data for cross-unit relations (Neumayer and Plümper 2016). Researchers, however, often lack theory-backed information to motivate this choice (Corrado and Fingleton 2012). As a result, any spatial weights matrix can be contested and the value of the resulting estimates disputed. This may lead researchers away from spatial econometric models and toward models that ignore spatial relationships entirely, especially when understanding spatial relationships is not the primary concern.
In this note, we consider the consequences of ignoring spatial interactions outright and of introducing them with error in the weights matrix. We first derive bounds of the bias from ignoring spatial dependence. Exploiting several features unique to spatial relationships, we obtain bounds that are more informative than common expressions for omitted variables bias. We then demonstrate that omitting spatial terms produces worse results than estimation based on a misspecified network under nondifferential error. As such, we argue that researchers should prefer spatial models, even when they possess limited knowledge of the network.
1 Confounding from Omitted Spatial Dependence
We consider two spatial processes: (1) spillovers of predictors across units, in the form of a spatial lag of X (SLX) model, and (2) outcome interdependence between units, in the form of a spatial autoregressive (SAR) model.
Consider an SLX data-generating process:
$$\mathbf{y} = \mathbf{x}\beta + \mathbf{W}\mathbf{x}\theta + \boldsymbol{\epsilon}, \tag{1}$$
where $\mathbf{y}$, $\mathbf{x}$, and $\boldsymbol{\epsilon}$ are $N$-length vectors of the outcome, predictor, and error term, respectively. $\mathbf{W}$ is an $N \times N$ spatial weights matrix specifying network ties between units. We make the usual assumptions about $\mathbf{W}$: it has zeroes along the diagonal, non-negative elements, and is normalized using standard approaches. Interest is in estimating the coefficient $\beta$.
Omitting $\mathbf {W}\mathbf {x}$ results in standard omitted variables bias:
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \theta\,\frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{2}$$
Our goal is to identify, for given sample realizations of $\mathbf{W}$ and $\mathbf{x}$, more informative bounds for the bias than expression (2).[1] Usually, the omitted variable bias formula does not offer much leverage, because the covariances involving the omitted terms are unknown. When the omitted variable is a spatial lag of a predictor, however, we have more information, because $\mathbf{W}\mathbf{x}$ is a linear combination of the values of $\mathbf{x}$. Consequently, knowledge of $\mathbf{x}$ is sufficient to produce empirical bounds.
Specifically, for a large class of common weights matrices,[2]
$$0 \;\leq\; \frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})} \;\leq\; 1. \tag{3}$$
Under these conditions, the upper bound of the bias in Equation (2) is $\theta$. In many contexts, own-unit values of a predictor can be reasonably assumed to have a larger effect than other-unit values, such that $|\beta| \geq |\theta|$. This implies that the maximum asymptotic bias is $\beta$.[3]
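To make this step explicit, under condition (3) and the assumption $|\beta| \geq |\theta|$,
$$\bigl|\operatorname{plim}\hat{\beta}_{OLS} - \beta\bigr| \;=\; |\theta|\,\frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})} \;\leq\; |\theta| \;\leq\; |\beta|.$$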
Thus, for any given sample, $\beta$ provides an upper bound on the bias in $\hat{\beta}_{OLS}$, and, asymptotically, $\hat{\beta}_{OLS}$ is in the interval $[0, 2\beta]$. Except for randomness, therefore, we should not observe sign switches as a consequence of omitting $\mathbf{W}\mathbf{x}$. Moreover, given a potentially biased estimate $\hat{\beta}_{OLS}$, we can estimate the lower bound of $\beta$ as $\frac{\hat{\beta}_{OLS}}{2}$. This lower bound shifts closer toward $\hat{\beta}_{OLS}$ as the magnitude of spillovers decreases, allowing for assessment of the sensitivity of substantive effects.
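For intuition, a minimal numpy sketch of this check; the weights matrix, data, and variable names are hypothetical illustrations rather than the paper's replication code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150

# Hypothetical row-standardized weights matrix: sparse symmetric 0/1 ties,
# zero diagonal, normalized by row sums (the clip guards against isolates).
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T
W = A / A.sum(axis=1, keepdims=True).clip(min=1.0)

# Spatially correlated predictor so that Cov(x, Wx) is nontrivial.
u = rng.normal(size=n)
x = np.linalg.solve(np.eye(n) - 0.4 * W, u)
Wx = W @ x

# Sample analogue of condition (3); for common (e.g., row-standardized)
# weights matrices this ratio is at most 1.
ratio = np.cov(x, Wx)[0, 1] / np.var(x, ddof=1)
print(f"Cov(x, Wx) / Var(x) = {ratio:.3f}")

# Given a potentially biased OLS estimate, beta_hat / 2 is the implied
# lower bound on beta (0.530 is the estimate from the example below).
beta_hat_ols = 0.530
print(f"implied lower bound on beta: {beta_hat_ols / 2:.3f}")
```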
As an illustrative example, consider economic voting: how do economic conditions shape voting behavior? In addition to local GDP growth, growth in neighboring units can matter for evaluations of incumbents through benchmarking effects (Kayser and Peress 2012). Arel-Bundock, Blais, and Dassonneville (2019) demonstrate that many of these theories translate into SLX models: $\theta$, the coefficient on $\mathbf{W}\mathbf{x}$, captures benchmarking effects; $\beta$, the coefficient on $\mathbf{x}$, captures conventional economic voting.
It is difficult to imagine a scenario under which growth in neighboring countries has a larger effect on vote choices than domestic growth, so $|\beta| \geq |\theta|$ seems reasonable. We estimate three models using the Kayser and Peress (2012) data, assuming that their network specification captures the true $\mathbf{W}$: (1) incumbent vote share regressed on growth ($\mathbf{x}$), plus controls; (2) incumbent vote share regressed on growth ($\mathbf{x}$) and trade-weighted global growth ($\mathbf{W}\mathbf{x}$), plus controls; and (3) trade-weighted global growth regressed on growth, plus controls. The first model yields a potentially biased estimate $\hat{\beta}$ (0.530), the second model yields estimates for $\beta$ (0.577) and $\theta$ ($-0.173$) that we treat as “true,” and the third model yields an estimate (0.270) of $\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})/\mathrm{Var}(\mathbf{x})$.
This demonstrates that the main conditions necessary for our bound are plausible (and conservative) in real-world data: $|\theta| = 0.173 < |\beta| = 0.577$, and $\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})/\mathrm{Var}(\mathbf{x}) = 0.270 \leq 1$. Additionally, the lower bound on $\beta$ estimated from the biased $\hat{\beta}$, that is, $\frac{\hat{\beta}}{2} = \frac{0.530}{2} = 0.265$, holds because the bias, in this case, was attenuating. However, this lower bound also holds for alternative $\mathbf{W}$'s that we have not considered. That is, we do not need to assume that trade is the appropriate edge when making cross-country economic evaluations.
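As a rough consistency check, plugging these estimates into the form of Equation (2) approximately recovers the naive coefficient:
$$0.577 + (-0.173)\times 0.270 \;\approx\; 0.530,$$
while the implied lower bound $0.530/2 = 0.265$ indeed falls below the “true” estimate of $0.577$.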
Similar results follow for the SAR data-generating process,
$$\mathbf{y} = \rho\,\mathbf{W}\mathbf{y} + \mathbf{x}\beta + \boldsymbol{\epsilon}. \tag{4}$$
Omitting the spatial lag of the outcome ($\mathbf {W}\mathbf {y}$) induces bias
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \rho\,\frac{\mathrm{Cov}(\mathbf{W}\mathbf{y},\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{5}$$
Using condition (3) and the derivation detailed in Betz, Cook, and Hollenbach (2020a), this expression can be rewritten as
$$0 \;\leq\; \operatorname{plim} \hat{\beta}_{OLS} \;\leq\; \beta\left(1 + \frac{\rho}{1-\rho}\right), \tag{6}$$
with $\hat{\beta}_{OLS}$ in $[0,\infty)$.[4]
These results have several implications. First, asymptotically, an omitted spatial lag of the outcome cannot produce a sign reversal on the estimated coefficient. Moreover, the bias is proportional to $\beta $, the true effect.
Second, our assumptions have been purposefully weak. Restricting the domain of $\rho$ yields tighter bounds. For example, with $\rho < 0.5$, which still implies strong spatial interdependence, the bounds on $\hat{\beta}_{OLS}$ are identical to those derived earlier, $[0, 2\beta]$. Additionally, expression (6) allows for a simple form of sensitivity analysis: determining the permissible values of $\rho$ for a desired lower bound on $\beta$, or graphing the lower bound of $\beta$ given $\rho$.
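As a sketch of that sensitivity exercise, assuming the upper bound in (6) takes the $\beta\left(1+\frac{\rho}{1-\rho}\right)$ form, any assumed ceiling $\bar{\rho}$ on the interdependence converts the estimate into a lower bound:
$$\beta \;\geq\; (1-\bar{\rho})\,\operatorname{plim}\hat{\beta}_{OLS},$$
so that, for example, $\bar{\rho} = 0.5$ reproduces the $\hat{\beta}_{OLS}/2$ bound from the SLX case, while smaller ceilings move the bound closer to $\hat{\beta}_{OLS}$.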
Finally, the bias can again be expressed with data on hand. As we demonstrate in the Appendix, empirical bounds can be calculated from the sample data for arbitrary $\mathbf{W}$, which will be tighter than those implied by (6) because they yield a finite upper bound on $\mathrm{cov}(\mathbf{W}\mathbf{y},\mathbf{x})$.
2 Bias from a Misspecified Network
Omitting relevant spatial inputs induces bias, yet we can still infer substantively relevant information from such results. Modeling these spatial terms explicitly promises greater gains. To do so, researchers must presupply the weights matrix. In applied work, researchers often fear that they do not have sufficient information to accurately specify $\mathbf{W}$, which may cause them to forgo modeling spatial terms at all. Returning to the example of economic voting, Arel-Bundock, Blais, and Dassonneville (2019) note that the existing literature provides no theoretically grounded argument for a specific choice of $\mathbf{W}$. Perhaps as a consequence, few studies of conventional economic voting account for benchmarking effects.
Given the centrality of the specification of $\mathbf{W}$, these concerns have received considerable attention (Corrado and Fingleton 2012; Neumayer and Plümper 2016). Researchers have suggested that uncertainty over competing $\mathbf{W}$s can be assessed using information criteria (Halleck Vega and Elhorst 2015) or addressed with Bayesian model averaging (Juhl 2020), and that the choice may be less essential than presumed because of the high degree of correlation among different $\mathbf{W}$s (LeSage and Pace 2014). We demonstrate that spatial models with misspecified weights matrices weakly dominate nonspatial models under random measurement error of the weights matrix.
First, consider an SLX process. Suppose instead of $\mathbf{W}$ we possess a noisy $\widetilde{\mathbf{W}}$, such that
$$\widetilde{\mathbf{W}}\mathbf{x} = \mathbf{W}\mathbf{x} + \mathbf{e}, \tag{7}$$
where $\mathbf{W}\mathbf{x} \perp \mathbf{e}$, indicating that the spatial lag suffers from classical, nondifferential measurement error. Estimating an SLX model yields
$$\operatorname{plim} \hat{\theta}_{SLX} = \lambda\,\theta, \qquad \lambda = \frac{\sigma^2_{\mathbf{W}\mathbf{x}|\mathbf{x}}}{\sigma^2_{\mathbf{W}\mathbf{x}|\mathbf{x}} + \sigma^2_{\mathbf{e}}}, \tag{8}$$
where $\sigma^2_{\mathbf{W}\mathbf{x}|\mathbf{x}}$ is the residual variance of regressing $\mathbf{W}\mathbf{x}$ on $\mathbf{x}$, and $\lambda$ is the bivariate reliability ratio (Carroll et al. 2006). Because $\lambda$ is bounded on the unit interval, Equation (8) indicates the usual attenuation bias.
The corresponding bias in the estimate of $\beta $ is
$$\operatorname{plim} \hat{\beta}_{SLX} = \beta + (1-\lambda)\,\theta\,\frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{9}$$
This expression corresponds to the omitted variables bias in Equation (2), weighted by $(1-\lambda)$. Thus, the bias in Equation (9) can be no greater than the bias in Equation (2); omitting a spatial predictor is the limiting case of including a spatial predictor with a misspecified weights matrix. Because $\hat{\beta}_{SLX}$ is less biased than $\hat{\beta}_{OLS}$, the implied lower bound on $\beta$ is also more informative.
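A small Monte Carlo sketch of this comparison; the weights matrix, parameter values, and helper function are hypothetical, intended only to illustrate how a mismeasured spatial lag removes part, but not all, of the omitted-variable bias:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, theta = 500, 1.0, 0.5

# Hypothetical row-standardized weights matrix.
A = (rng.random((n, n)) < 0.02).astype(float)
np.fill_diagonal(A, 0.0)
W = A / A.sum(axis=1, keepdims=True).clip(min=1.0)

def slopes(y, *regressors):
    """OLS slope coefficients (constant included, then dropped)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

est = []
for _ in range(200):
    u = rng.normal(size=n)
    x = np.linalg.solve(np.eye(n) - 0.5 * W, u)          # spatially correlated predictor
    Wx = W @ x
    y = beta * x + theta * Wx + rng.normal(size=n)        # SLX data-generating process
    Wx_tilde = Wx + rng.normal(scale=Wx.std(), size=n)    # classical error, as in (7)
    b_ols = slopes(y, x)[0]                # omits the spatial lag entirely
    b_slx = slopes(y, x, Wx_tilde)[0]      # uses the mismeasured spatial lag
    est.append((b_ols, b_slx))

b_ols_mean, b_slx_mean = np.mean(est, axis=0)
# On average, b_slx_mean falls between b_ols_mean and the true beta.
print(b_ols_mean, b_slx_mean, beta)
```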
For the SAR model, estimation is more complicated: the simultaneity of $\mathbf{y}$ and $\mathbf{W}\mathbf{y}$ necessitates maximum likelihood or instrumental variable (IV) methods (Anselin 1988). While IV strategies typically offer relief from measurement error, this is not the case for spatial models, where the instruments are spatially lagged realizations of the predictors. Because these are generated using the same weights matrix as the outcome, they inherit, and are correlated with, the measurement error. Thus, misspecifying the weights matrix results in asymptotically biased estimates (see the Appendix).
To derive the bias expression in the SAR model, we consider a just-identified IV model where $\widetilde{\mathbf{W}}\mathbf{x}$ is used as an instrument for $\widetilde{\mathbf{W}}\mathbf{y}$. Analogously to the SLX model, the IV estimation produces
$$\operatorname{plim} \hat{\beta}_{IV} = \beta + (1-\lambda)\,\rho\,\frac{\mathrm{Cov}(\mathbf{W}\mathbf{y},\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{10}$$
As before, the bias in Equation (10) can be no greater than the bias in Equation (6). Consequently, a misspecified weights matrix still induces bias, but less than the omitted variable bias from ignoring spatial interdependence altogether.
This should encourage researchers to consider spatial models even where knowledge of the unit ties is imperfect. Not only do spatial estimators of $\beta$ weakly dominate those from nonspatial models, but researchers also obtain sample estimates of $\theta$ or $\rho$. This allows calculating postspatial and total effects of $\mathbf{x}$ (Franzese and Hays 2007; LeSage and Pace 2009), yielding a more complete understanding of the relationship of interest.
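For completeness, a sketch of how such quantities are typically summarized from SAR estimates, in the spirit of the average-effects decompositions in LeSage and Pace (2009); the function and its arguments are illustrative, not taken from our replication materials:

```python
import numpy as np

def sar_average_effects(W, rho, beta):
    """Average direct, indirect (spillover), and total effects of x in a SAR model.

    W is the (normalized) N x N weights matrix, rho the spatial-autoregressive
    parameter, and beta the coefficient on x. Effects are read off the
    multiplier matrix (I - rho * W)^{-1} * beta.
    """
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta
    direct = S.diagonal().mean()      # average own-unit effect
    total = S.sum(axis=1).mean()      # average total effect (own + spillovers)
    return direct, total - direct, total
```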
3 Simulation
The following simulations demonstrate the small-sample performance of spatial models when $\mathbf{W}$ is misspecified. We focus on the SAR model, which is the most widely used spatial model in applied research.[5] We generate data where both $\mathbf{y}$ and $\mathbf{x}$ are governed by SAR processes:
$$\mathbf{y} = \rho_y \mathbf{W}\mathbf{y} + \mathbf{x}\beta + \boldsymbol{\epsilon}, \tag{11}$$
$$\mathbf{x} = \rho_x \mathbf{W}\mathbf{x} + \mathbf{u}, \tag{12}$$
where $\mathbf{u}$ and $\boldsymbol{\epsilon}$ are $N$-length vectors with elements drawn from $\mathcal{N}(0,1)$. $\beta$ reflects the direct (i.e., prespatial) effect of $\mathbf{x}$ on $\mathbf{y}$, while $\rho_y$ and $\rho_x$ determine the strength of the spatial autocorrelation in $\mathbf{y}$ and $\mathbf{x}$.[6]
We hold $\mathbf{W}$ and $\mathbf{u}$ fixed across simulations. Locations for observations are determined by drawing vertical and horizontal coordinates from $\mathcal{U}(0,5)$. Based on these coordinates, we generate a binary 10-nearest-neighbor $\mathbf{W}$ matrix. We fix $\beta$ at $2$ and the number of observations at $150$ across the experiments, focusing on variation in the spatial autoregressive parameters $\rho_x$ and $\rho_y$, which we vary between $0$ (i.e., no spatial interdependence), $0.3$, and $0.6$ (i.e., high spatial interdependence). For each of these $9$ experimental settings, we simulate $2{,}000$ data sets.
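A compact numpy sketch of this design; it is our own reimplementation for illustration (in particular, the exact min–max normalization rule is an assumption), not the replication code:

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta, rho_x, rho_y = 150, 2.0, 0.6, 0.6

# Uniform coordinates on a 5 x 5 plane, then a binary 10-nearest-neighbor W.
coords = rng.uniform(0, 5, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
W = np.zeros((n, n))
nearest = np.argsort(dist, axis=1)[:, :10]
W[np.repeat(np.arange(n), 10), nearest.ravel()] = 1.0

# Min-max normalization: assumed here to divide W by the smaller of the
# largest row sum and the largest column sum.
W = W / min(W.sum(axis=1).max(), W.sum(axis=0).max())

# SAR processes for both x and y, as in Equations (11) and (12).
I_n = np.eye(n)
u = rng.normal(size=n)
eps = rng.normal(size=n)
x = np.linalg.solve(I_n - rho_x * W, u)
y = np.linalg.solve(I_n - rho_y * W, beta * x + eps)
```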
To induce misspecification in the matrix $\widetilde{\mathbf{W}}$ used in the estimation, we generate a second connectivity matrix ($\mathbf{M}$) based on a new random draw of locations. $\mathbf{M}$ is therefore independent of the true $\mathbf{W}$ used in the data-generating process. We then generate the set of connectivity matrices used in the model estimation ($\widetilde{\mathbf{W}}$) as a mixture of the true ($\mathbf{W}$) and false ($\mathbf{M}$) matrices. Specifically, the elements $\tilde{w}_{i,j}$ are determined as
$$\tilde{w}_{i,j} = \begin{cases} w_{i,j} & \text{with probability } 1-p,\\ m_{i,j} & \text{with probability } p, \end{cases}$$
where $p$ is the probability of misclassification, which we increase from $0$ (no error) to $1$ (all error) in increments of $0.05$. In total, this produces $21$ connectivity matrices used in estimation, which are all normalized using min–max normalization.[7]
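A sketch of this misspecification step, assuming the elementwise mixing rule above; the k-nearest-neighbor helper mirrors the previous block and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 150, 10

def knn_matrix(coords, k):
    """Binary k-nearest-neighbor connectivity matrix from 2-D coordinates."""
    m = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    A = np.zeros((m, m))
    A[np.repeat(np.arange(m), k), np.argsort(d, axis=1)[:, :k].ravel()] = 1.0
    return A

W_true = knn_matrix(rng.uniform(0, 5, size=(n, 2)), k)    # "true" network
M_false = knn_matrix(rng.uniform(0, 5, size=(n, 2)), k)   # independent "false" network

def mix_and_normalize(W, M, p, rng):
    """Take the false tie with probability p, elementwise, then min-max normalize."""
    W_tilde = np.where(rng.random(W.shape) < p, M, W)
    return W_tilde / min(W_tilde.sum(axis=1).max(), W_tilde.sum(axis=0).max())

# 21 estimation matrices, from no error (p = 0) to all error (p = 1).
W_tildes = [mix_and_normalize(W_true, M_false, p, rng) for p in np.linspace(0, 1, 21)]
```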
Using the simulated data for $\mathbf{y}$ and $\mathbf{x}$, we estimate nonspatial linear models (via OLS) and SAR models (via ML) using $\widetilde{\mathbf{W}}$s of varying accuracy (decreasing in $p$, the probability of misspecification). For each model, we record $\hat{\beta}$ to assess performance. Figure 1 shows the results for the simulations of the SAR process based on 10 nearest neighbors and min–max normalization. Each cell presents the results for one combination of $\rho_x$ and $\rho_y$: $\rho_x$ increases from $0$ to $0.6$ going from left to right, and $\rho_y$ increases moving from top to bottom. In each cell, we plot the densities of coefficient estimates at different levels of the misspecification probability $p$. Darker shading indicates higher levels of misspecification. The densities of $\hat{\beta}$s for nonspatial models are plotted in black. The bias in both the nonspatial models and the misspecified spatial models increases in $\rho_y$ and $\rho_x$, and is largest in the bottom-right cell.
Figure 1. Coefficients with misspecification of $\mathbf{W}$ in SAR models based on 10 nearest neighbors with min–max normalization (orange/grey) and OLS models omitting the spatial lag (black).
The results underscore three points. First, as the misspecification of $\widetilde{\mathbf{W}}$ increases, the bias in $\hat{\beta}$ increases. Yet even with high interdependence and a mismeasured (or omitted) $\mathbf{W}$, the observed bias is much smaller than the bounds derived above. Second, the SAR model weakly dominates the nonspatial model: even a SAR model estimated with a random $\widetilde{\mathbf{W}}$ does no worse than omitting the spatial term. Finally, the simulation results confirm our analytical results. For example, inequality (6) implies a maximum bias of $3$, and the bias in the simulations clearly maintains that bound. Moreover, with $\rho_y = \rho_x = 0.6$, on average we obtain $\frac{\mathrm{cov}(\mathbf{W}\mathbf{y},\mathbf{x})}{\mathrm{var}(\mathbf{x})} = 1.45$. Equation (5) thus implies an OLS estimate of $2.87$, identical to the average OLS estimate in the simulations. Tables C.1 and C.2 in the Appendix report these quantities for all simulation scenarios.
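Written out, under the forms of (5) and (6) above, with $\beta = 2$ and $\rho_y = 0.6$ the maximum bias is $\beta\,\rho_y/(1-\rho_y) = 2 \times 0.6/0.4 = 3$, and the implied OLS estimate is
$$\operatorname{plim}\hat{\beta}_{OLS} \;=\; 2 + 0.6 \times 1.45 \;=\; 2.87.$$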
4 Conclusion
Researchers frequently suspect spatial dependence in their data, but lack knowledge of the precise network. Fearing that selecting the wrong network may open them to criticism, researchers may forgo spatial models altogether. Here, we have demonstrated the potential biases introduced from omitting spatial terms outright versus including them with error. Our results should encourage the estimation of spatial models even if researchers have imperfect information. As researchers in these settings likely lack strong theory-based specifications, we point to Griffith's five rules of thumb for specifying weights matrices (Griffith 1996).
We emphasize that our results do not hold under differential measurement error. We suspect that differential measurement error is most likely for network ties that violate the exogeneity assumption for spatial weights, implying that traditional spatial econometric models would be inappropriate. However, we hope that future work extends our results to other contexts and more complex forms of measurement error. Several of the features identified here may be useful in these efforts. First, prior research focuses on misspecification in the weights matrix, yet errors manifest in the empirical model as vectors. Second, restrictions on $\mathbf{x}$, such as limiting the analysis to binary $\mathbf{x}$, imply restrictions on $\mathbf{W}\mathbf{x}$. Finally, row and min–max normalization imply bounds for the vector range and vector sum. Recognizing these attributes could prove useful for new analytical and empirical approaches in future research.
Acknowledgments
An earlier version of this paper benefited from feedback at the 2017 Annual Meeting for the Society of Political Methodology. We particularly thank Andrew Bridy, three anonymous reviewers, and the editor for their useful advice. All remaining errors are ours alone. Authors are listed in alphabetical order; equal authorship is implied. For further questions contact Scott J. Cook. Portions of this research were conducted with high performance research computing resources provided by Texas A&M University (http://hprc.tamu.edu).
Data Availability Statement
Replication code for this article has been published in Code Ocean, a computational reproducibility platform that enables users to run the code and can be viewed interactively at https://codeocean.com/capsule/7618094/tree (Betz, Cook, and Hollenbach 2020b). A preservation copy of the same code and data can also be accessed via Dataverse at https://doi.org/10.7910/DVN/ADIFOV (Betz, Cook, and Hollenbach 2020c).
Supplementary Materials
To view supplementary material for this article, please visit https://dx.doi.org/10.1017/pan.2020.26.