Across the social sciences, many theories involve cross-unit interactions resulting from spillovers in predictors (e.g., externalities) or outcomes (e.g., interdependence). Where researchers are explicitly interested in cross-unit relationships, spatial econometric models are widely used (Anselin 1988; Franzese and Hays 2007). Even if researchers are otherwise uninterested in these relationships, however, accounting for spatial dependence is often necessary to recover unbiased estimates.
With a variety of spatial models to select from and widely available software routines, estimating spatial models is easier than ever. Yet one prerequisite for spatial analysis continues to frustrate applied researchers: the specification of the spatial weights matrix. Specifying the spatial weights matrix requires additional theories and data for cross-unit relations (Neumayer and Plümper 2016). Researchers, however, often lack theory-backed information to motivate this choice (Corrado and Fingleton 2012). As a result, any spatial weights matrix can be contested and the value of the resulting estimates disputed. This may lead researchers away from spatial econometric models and toward models that ignore spatial relationships entirely, especially when understanding spatial relationships is not the primary concern.
In this note, we consider the consequences of ignoring spatial interactions outright and of introducing them with error in the weights matrix. We first derive bounds of the bias from ignoring spatial dependence. Exploiting several features unique to spatial relationships, we obtain bounds that are more informative than common expressions for omitted variables bias. We then demonstrate that omitting spatial terms produces worse results than estimation based on a misspecified network under nondifferential error. As such, we argue that researchers should prefer spatial models, even when they possess limited knowledge of the network.
1 Confounding from Omitted Spatial Dependence
We consider two spatial processes: (1) spillovers of predictors across units, in the form of a spatial lag of X (SLX) model, and (2) outcome interdependence between units, in the form of a spatial autoregressive (SAR) model.
Consider an SLX data-generating process:
$$\mathbf{y} = \mathbf{x}\beta + \mathbf{W}\mathbf{x}\theta + \boldsymbol{\epsilon}, \tag{1}$$
where $\mathbf{y}$, $\mathbf{x}$, and $\boldsymbol{\epsilon}$ are $N$-length vectors of the outcome, predictor, and error term, respectively. $\mathbf{W}$ is an $N \times N$ spatial weights matrix specifying network ties between units. We make the usual assumptions about $\mathbf{W}$: it has zeroes along the diagonal, non-negative elements, and is normalized using standard approaches. Interest is in estimating the coefficient $\beta$.
Omitting $\mathbf {W}\mathbf {x}$ results in standard omitted variables bias:
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \theta\,\frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{2}$$
Our goal is to identify, for given sample realizations of $\mathbf{W}$ and $\mathbf{x}$, more informative bounds for the bias than expression (2).[1] Usually, the omitted variable bias formula does not offer much leverage, because the covariances involving the omitted terms are unknown. When the omitted variable is a spatial lag of a predictor, however, we have more information, because $\mathbf{W}\mathbf{x}$ is a linear combination of the values of $\mathbf{x}$. Consequently, knowledge of $\mathbf{x}$ is sufficient to produce empirical bounds.
Specifically, for a large class of common weights matrices,[2]
$$0 \;\leq\; \frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})} \;\leq\; 1. \tag{3}$$
Under these conditions, the upper bound of the bias in Equation (2) is $\theta$. In many contexts, own-unit values of a predictor can be reasonably assumed to have a larger effect than other-unit values, such that $|\beta| \geq |\theta|$. This implies that the maximum asymptotic bias is $\beta$.[3]
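To make this step explicit, under condition (3) and the assumption $|\beta| \geq |\theta|$,
$$\bigl|\operatorname{plim}\hat{\beta}_{OLS} - \beta\bigr| \;=\; |\theta|\,\frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})} \;\leq\; |\theta| \;\leq\; |\beta|.$$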
Thus, for any given sample, $\beta$ provides an upper bound on the bias in $\hat{\beta}_{OLS}$, and, asymptotically, $\hat{\beta}_{OLS}$ is in the interval $[0, 2\beta]$. Except for randomness, therefore, we should not observe sign switches as a consequence of omitting $\mathbf{W}\mathbf{x}$. Moreover, given a potentially biased estimate $\hat{\beta}_{OLS}$, we can estimate the lower bound of $\beta$ as $\frac{\hat{\beta}_{OLS}}{2}$. This lower bound shifts closer toward $\hat{\beta}_{OLS}$ as the magnitude of spillovers decreases, allowing for assessment of the sensitivity of substantive effects.
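For intuition, a minimal numpy sketch of this check; the weights matrix, data, and variable names are hypothetical illustrations rather than the paper's replication code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150

# Hypothetical row-standardized weights matrix: sparse symmetric 0/1 ties,
# zero diagonal, normalized by row sums (the clip guards against isolates).
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T
W = A / A.sum(axis=1, keepdims=True).clip(min=1.0)

# Spatially correlated predictor so that Cov(x, Wx) is nontrivial.
u = rng.normal(size=n)
x = np.linalg.solve(np.eye(n) - 0.4 * W, u)
Wx = W @ x

# Sample analogue of condition (3); for common (e.g., row-standardized)
# weights matrices this ratio is at most 1.
ratio = np.cov(x, Wx)[0, 1] / np.var(x, ddof=1)
print(f"Cov(x, Wx) / Var(x) = {ratio:.3f}")

# Given a potentially biased OLS estimate, beta_hat / 2 is the implied
# lower bound on beta (0.530 is the estimate from the example below).
beta_hat_ols = 0.530
print(f"implied lower bound on beta: {beta_hat_ols / 2:.3f}")
```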
As an illustrative example, consider economic voting: how do economic conditions shape voting behavior? In addition to local GDP growth, growth in neighboring units can matter for evaluations of incumbents through benchmarking effects (Kayser and Peress 2012). Arel-Bundock, Blais, and Dassonneville (2019) demonstrate that many of these theories translate into SLX models: $\theta$, the coefficient on $\mathbf{W}\mathbf{x}$, captures benchmarking effects; $\beta$, the coefficient on $\mathbf{x}$, captures conventional economic voting.
It is difficult to imagine a scenario under which growth in neighboring countries has a larger effect on vote choices than domestic growth, so $|\beta| \geq |\theta|$ seems reasonable. We estimate three models using the Kayser and Peress (2012) data, assuming that their network specification captures the true $\mathbf{W}$: (1) incumbent vote share regressed on growth ($\mathbf{x}$), plus controls; (2) incumbent vote share regressed on growth ($\mathbf{x}$) and trade-weighted global growth ($\mathbf{W}\mathbf{x}$), plus controls; and (3) trade-weighted global growth regressed on growth, plus controls. The first model yields a potentially biased estimate $\hat{\beta}$ (0.530), the second model yields estimates for $\beta$ (0.577) and $\theta$ ($-0.173$) that we treat as “true,” and the third model yields an estimate (0.270) of $\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})/\mathrm{Var}(\mathbf{x})$.
This demonstrates that the main conditions necessary for our bound are plausible (and conservative) in real-world data: $|\theta| = 0.173 < |\beta| = 0.577$, and $\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})/\mathrm{Var}(\mathbf{x}) = 0.270 \leq 1$. Additionally, the lower bound on $\beta$ estimated from the biased $\hat{\beta}$, that is, $\frac{\hat{\beta}}{2} = \frac{0.530}{2} = 0.265$, holds because the bias, in this case, was attenuating. However, this lower bound also holds for alternative $\mathbf{W}$'s that we have not considered. That is, we do not need to assume that trade is the appropriate edge when making cross-country economic evaluations.
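As a rough consistency check, plugging these estimates into the form of Equation (2) approximately recovers the naive coefficient:
$$0.577 + (-0.173)\times 0.270 \;\approx\; 0.530,$$
while the implied lower bound $0.530/2 = 0.265$ indeed falls below the “true” estimate of $0.577$.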
Similar results follow for the SAR data-generating process,
$$\mathbf{y} = \rho\,\mathbf{W}\mathbf{y} + \mathbf{x}\beta + \boldsymbol{\epsilon}. \tag{4}$$
Omitting the spatial lag of the outcome ($\mathbf {W}\mathbf {y}$) induces bias
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \rho\,\frac{\mathrm{Cov}(\mathbf{W}\mathbf{y},\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{5}$$
Using condition (3) and the derivation detailed in Betz, Cook, and Hollenbach (2020a), this expression can be rewritten as
$$0 \;\leq\; \operatorname{plim} \hat{\beta}_{OLS} \;\leq\; \beta\left(1 + \frac{\rho}{1-\rho}\right), \tag{6}$$
with $\hat{\beta}_{OLS}$ in $[0,\infty)$.[4]
These results have several implications. First, asymptotically, an omitted spatial lag of the outcome cannot produce a sign reversal on the estimated coefficient. Moreover, the bias is proportional to $\beta $, the true effect.
Second, our assumptions have been purposefully weak. Restricting the domain of $\rho$ yields tighter bounds. For example, with $\rho < 0.5$, which still implies strong spatial interdependence, the bounds on $\hat{\beta}_{OLS}$ are identical to those derived earlier, $[0, 2\beta]$. Additionally, expression (6) allows for a simple form of sensitivity analysis: determining the permissible values of $\rho$ for a desired lower bound on $\beta$, or graphing the lower bound of $\beta$ given $\rho$.
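As a sketch of that sensitivity exercise, assuming the upper bound in (6) takes the $\beta\left(1+\frac{\rho}{1-\rho}\right)$ form, any assumed ceiling $\bar{\rho}$ on the interdependence converts the estimate into a lower bound:
$$\beta \;\geq\; (1-\bar{\rho})\,\operatorname{plim}\hat{\beta}_{OLS},$$
so that, for example, $\bar{\rho} = 0.5$ reproduces the $\hat{\beta}_{OLS}/2$ bound from the SLX case, while smaller ceilings move the bound closer to $\hat{\beta}_{OLS}$.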
Finally, the bias can again be expressed with data on hand. As we demonstrate in the Appendix, empirical bounds can be calculated from the sample data for arbitrary $\mathbf{W}$, which will be tighter than those implied by (6) because they yield a finite upper bound on $\mathrm{cov}(\mathbf{W}\mathbf{y},\mathbf{x})$.
2 Bias from a Misspecified Network
Omitting relevant spatial inputs induces bias, yet we can still infer substantively relevant information from such results. Modeling these spatial terms explicitly promises greater gains. To do so, researchers must presupply the weights matrix. In applied work, researchers often fear that they do not have sufficient information to accurately specify $\mathbf{W}$, which may cause them to forgo modeling spatial terms at all. Returning to the example of economic voting, Arel-Bundock, Blais, and Dassonneville (2019) note that the existing literature provides no theoretically grounded argument for a specific choice of $\mathbf{W}$. Perhaps as a consequence, few studies of conventional economic voting account for benchmarking effects.
Given the centrality of the specification of $\mathbf{W}$, these concerns have received considerable attention (Corrado and Fingleton 2012; Neumayer and Plümper 2016). Researchers have suggested that uncertainty over competing $\mathbf{W}$s can be assessed using information criteria (Halleck Vega and Elhorst 2015) or addressed with Bayesian model averaging (Juhl 2020), and that the choice may be less essential than presumed because of the high degree of correlation among different $\mathbf{W}$s (LeSage and Pace 2014). We demonstrate that spatial models with misspecified weights matrices weakly dominate nonspatial models under random measurement error of the weights matrix.
First, consider an SLX process. Suppose instead of $\mathbf{W}$ we possess a noisy $\widetilde{\mathbf{W}}$, such that
$$\widetilde{\mathbf{W}}\mathbf{x} = \mathbf{W}\mathbf{x} + \mathbf{e}, \tag{7}$$
where $\mathbf{W}\mathbf{x} \perp \mathbf{e}$, indicating that the spatial lag suffers from classical, nondifferential measurement error. Estimating an SLX model yields
$$\operatorname{plim} \hat{\theta}_{SLX} = \lambda\,\theta, \qquad \lambda = \frac{\sigma^2_{\mathbf{W}\mathbf{x}|\mathbf{x}}}{\sigma^2_{\mathbf{W}\mathbf{x}|\mathbf{x}} + \sigma^2_{\mathbf{e}}}, \tag{8}$$
where $\sigma^2_{\mathbf{W}\mathbf{x}|\mathbf{x}}$ is the residual variance of regressing $\mathbf{W}\mathbf{x}$ on $\mathbf{x}$, and $\lambda$ is the bivariate reliability ratio (Carroll et al. 2006). Because $\lambda$ is bounded on the unit interval, Equation (8) indicates the usual attenuation bias.
The corresponding bias in the estimate of $\beta $ is
$$\operatorname{plim} \hat{\beta}_{SLX} = \beta + (1-\lambda)\,\theta\,\frac{\mathrm{Cov}(\mathbf{x},\mathbf{W}\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{9}$$
This expression corresponds to the omitted variables bias in Equation (2), weighted by $(1-\lambda)$. Thus, the bias in Equation (9) can be no greater than the bias in Equation (2); omitting a spatial predictor is the limiting case of including a spatial predictor with a misspecified weights matrix. Because $\hat{\beta}_{SLX}$ is less biased than $\hat{\beta}_{OLS}$, the implied lower bound on $\beta$ is also more informative.
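A small Monte Carlo sketch of this comparison; the weights matrix, parameter values, and helper function are hypothetical, intended only to illustrate how a mismeasured spatial lag removes part, but not all, of the omitted-variable bias:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, theta = 500, 1.0, 0.5

# Hypothetical row-standardized weights matrix.
A = (rng.random((n, n)) < 0.02).astype(float)
np.fill_diagonal(A, 0.0)
W = A / A.sum(axis=1, keepdims=True).clip(min=1.0)

def slopes(y, *regressors):
    """OLS slope coefficients (constant included, then dropped)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

est = []
for _ in range(200):
    u = rng.normal(size=n)
    x = np.linalg.solve(np.eye(n) - 0.5 * W, u)          # spatially correlated predictor
    Wx = W @ x
    y = beta * x + theta * Wx + rng.normal(size=n)        # SLX data-generating process
    Wx_tilde = Wx + rng.normal(scale=Wx.std(), size=n)    # classical error, as in (7)
    b_ols = slopes(y, x)[0]                # omits the spatial lag entirely
    b_slx = slopes(y, x, Wx_tilde)[0]      # uses the mismeasured spatial lag
    est.append((b_ols, b_slx))

b_ols_mean, b_slx_mean = np.mean(est, axis=0)
# On average, b_slx_mean falls between b_ols_mean and the true beta.
print(b_ols_mean, b_slx_mean, beta)
```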
For the SAR model, estimation is more complicated: the simultaneity of $\mathbf{y}$ and $\mathbf{W}\mathbf{y}$ necessitates maximum likelihood or instrumental variable (IV) methods (Anselin 1988). While IV strategies typically offer relief from measurement error, this is not the case for spatial models, where the instruments are spatially lagged realizations of the predictors. Because these are generated using the same weights matrix as the outcome, they inherit, and are correlated with, the measurement error. Thus, misspecifying the weights matrix results in asymptotically biased estimates (see the Appendix).
To derive the bias expression in the SAR model, we consider a just-identified IV model where $\widetilde{\mathbf{W}}\mathbf{x}$ is used as an instrument for $\widetilde{\mathbf{W}}\mathbf{y}$. Analogously to the SLX model, the IV estimation produces
$$\operatorname{plim} \hat{\beta}_{IV} = \beta + (1-\lambda)\,\rho\,\frac{\mathrm{Cov}(\mathbf{W}\mathbf{y},\mathbf{x})}{\mathrm{Var}(\mathbf{x})}. \tag{10}$$
As before, the bias in Equation (10) can be no greater than the bias in Equation (6). Consequently, a misspecified weights matrix still induces bias, but less than the omitted variable bias from ignoring spatial interdependence altogether.
This should encourage researchers to consider spatial models even where knowledge of the unit ties is imperfect. Not only do spatial estimators of $\beta$ weakly dominate those from nonspatial models, but researchers also obtain sample estimates of $\theta$ or $\rho$. This allows calculating postspatial and total effects of $\mathbf{x}$ (Franzese and Hays 2007; LeSage and Pace 2009), yielding a more complete understanding of the relationship of interest.
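For completeness, a sketch of how such quantities are typically summarized from SAR estimates, in the spirit of the average-effects decompositions in LeSage and Pace (2009); the function and its arguments are illustrative, not taken from our replication materials:

```python
import numpy as np

def sar_average_effects(W, rho, beta):
    """Average direct, indirect (spillover), and total effects of x in a SAR model.

    W is the (normalized) N x N weights matrix, rho the spatial-autoregressive
    parameter, and beta the coefficient on x. Effects are read off the
    multiplier matrix (I - rho * W)^{-1} * beta.
    """
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta
    direct = S.diagonal().mean()      # average own-unit effect
    total = S.sum(axis=1).mean()      # average total effect (own + spillovers)
    return direct, total - direct, total
```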
3 Simulation
The following simulations demonstrate the small-sample performance of spatial models when $\mathbf{W}$ is misspecified. We focus on the SAR model, which is the most widely used spatial model in applied research.[5] We generate data where both $\mathbf{y}$ and $\mathbf{x}$ are governed by SAR processes:
$$\mathbf{y} = \rho_y \mathbf{W}\mathbf{y} + \mathbf{x}\beta + \boldsymbol{\epsilon}, \tag{11}$$
$$\mathbf{x} = \rho_x \mathbf{W}\mathbf{x} + \mathbf{u}, \tag{12}$$
where $\mathbf{u}$ and $\boldsymbol{\epsilon}$ are $N$-length vectors with elements drawn from $\mathcal{N}(0,1)$. $\beta$ reflects the direct (i.e., prespatial) effect of $\mathbf{x}$ on $\mathbf{y}$, while $\rho_y$ and $\rho_x$ determine the strength of the spatial autocorrelation in $\mathbf{y}$ and $\mathbf{x}$.[6]
We hold $\mathbf{W}$ and $\mathbf{u}$ fixed across simulations. Locations for observations are determined by drawing vertical and horizontal coordinates from $\mathcal{U}(0,5)$. Based on these coordinates, we generate a binary 10-nearest-neighbor $\mathbf{W}$ matrix. We fix $\beta$ at $2$ and the number of observations at $150$ across the experiments, focusing on variation in the spatial autoregressive parameters $\rho_x$ and $\rho_y$, which we vary between $0$ (i.e., no spatial interdependence), $0.3$, and $0.6$ (i.e., high spatial interdependence). For each of these $9$ experimental settings, we simulate $2{,}000$ data sets.
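A compact numpy sketch of this design; it is our own reimplementation for illustration (in particular, the exact min–max normalization rule is an assumption), not the replication code:

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta, rho_x, rho_y = 150, 2.0, 0.6, 0.6

# Uniform coordinates on a 5 x 5 plane, then a binary 10-nearest-neighbor W.
coords = rng.uniform(0, 5, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
W = np.zeros((n, n))
nearest = np.argsort(dist, axis=1)[:, :10]
W[np.repeat(np.arange(n), 10), nearest.ravel()] = 1.0

# Min-max normalization: assumed here to divide W by the smaller of the
# largest row sum and the largest column sum.
W = W / min(W.sum(axis=1).max(), W.sum(axis=0).max())

# SAR processes for both x and y, as in Equations (11) and (12).
I_n = np.eye(n)
u = rng.normal(size=n)
eps = rng.normal(size=n)
x = np.linalg.solve(I_n - rho_x * W, u)
y = np.linalg.solve(I_n - rho_y * W, beta * x + eps)
```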
To induce misspecification in the matrix $\widetilde{\mathbf{W}}$ used in the estimation, we generate a second connectivity matrix ($\mathbf{M}$) based on a new random draw of locations. $\mathbf{M}$ is therefore independent of the true $\mathbf{W}$ used in the data-generating process. We then generate the set of connectivity matrices used in the model estimation ($\widetilde{\mathbf{W}}$) as a mixture of the true ($\mathbf{W}$) and false ($\mathbf{M}$) matrices. Specifically, the elements $\tilde{w}_{i,j}$ are determined as
$$\tilde{w}_{i,j} = \begin{cases} w_{i,j} & \text{with probability } 1-p,\\ m_{i,j} & \text{with probability } p, \end{cases}$$
where $p$ is the probability of misclassification, which we increase from $0$ (no error) to $1$ (all error) in increments of $0.05$. In total, this produces $21$ connectivity matrices used in estimation, which are all normalized using min–max normalization.[7]
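A sketch of this misspecification step, assuming the elementwise mixing rule above; the k-nearest-neighbor helper mirrors the previous block and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 150, 10

def knn_matrix(coords, k):
    """Binary k-nearest-neighbor connectivity matrix from 2-D coordinates."""
    m = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    A = np.zeros((m, m))
    A[np.repeat(np.arange(m), k), np.argsort(d, axis=1)[:, :k].ravel()] = 1.0
    return A

W_true = knn_matrix(rng.uniform(0, 5, size=(n, 2)), k)    # "true" network
M_false = knn_matrix(rng.uniform(0, 5, size=(n, 2)), k)   # independent "false" network

def mix_and_normalize(W, M, p, rng):
    """Take the false tie with probability p, elementwise, then min-max normalize."""
    W_tilde = np.where(rng.random(W.shape) < p, M, W)
    return W_tilde / min(W_tilde.sum(axis=1).max(), W_tilde.sum(axis=0).max())

# 21 estimation matrices, from no error (p = 0) to all error (p = 1).
W_tildes = [mix_and_normalize(W_true, M_false, p, rng) for p in np.linspace(0, 1, 21)]
```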
Using the simulated data for $\mathbf{y}$ and $\mathbf{x}$, we estimate nonspatial linear models (via OLS) and SAR models (via ML) using $\widetilde{\mathbf{W}}$s of varying accuracy (decreasing in $p$, the probability of misspecification). For each model, we record $\hat{\beta}$ to assess performance. Figure 1 shows the results for the simulations of the SAR process based on 10 nearest neighbors and min–max normalization. Each cell presents the results for one combination of $\rho_x$ and $\rho_y$: $\rho_x$ increases from $0$ to $0.6$ going from left to right, and $\rho_y$ increases moving from top to bottom. In each cell, we plot the densities of coefficient estimates at different levels of the misspecification probability $p$. Darker shading indicates higher levels of misspecification. The densities of $\hat{\beta}$s for nonspatial models are plotted in black. The bias in both the nonspatial models and the misspecified spatial models increases in $\rho_y$ and $\rho_x$, and is largest in the bottom-right cell.
Figure 1. Coefficients with misspecification of $\mathbf{W}$ in SAR models based on 10 nearest neighbors with min–max normalization (orange/grey) and OLS models omitting the spatial lag (black).
The results underscore three points. First, as the misspecification of $\widetilde{\mathbf{W}}$ increases, the bias in $\hat{\beta}$ increases. Yet even with high interdependence and a mismeasured (or omitted) $\mathbf{W}$, the observed bias is much smaller than the bounds derived above. Second, the SAR model weakly dominates the nonspatial model: even a SAR model estimated with a random $\widetilde{\mathbf{W}}$ does no worse than omitting the spatial term. Finally, the simulation results confirm our analytical results. For example, inequality (6) implies a maximum bias of $3$, and the bias in the simulations clearly maintains that bound. Moreover, with $\rho_y = \rho_x = 0.6$, on average we obtain $\frac{\mathrm{cov}(\mathbf{W}\mathbf{y},\mathbf{x})}{\mathrm{var}(\mathbf{x})} = 1.45$. Equation (5) thus implies an OLS estimate of $2.87$, identical to the average OLS estimate in the simulations. Tables C.1 and C.2 in the Appendix report these quantities for all simulation scenarios.
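Written out, under the forms of (5) and (6) above, with $\beta = 2$ and $\rho_y = 0.6$ the maximum bias is $\beta\,\rho_y/(1-\rho_y) = 2 \times 0.6/0.4 = 3$, and the implied OLS estimate is
$$\operatorname{plim}\hat{\beta}_{OLS} \;=\; 2 + 0.6 \times 1.45 \;=\; 2.87.$$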
4 Conclusion
Researchers frequently suspect spatial dependence in their data, but lack knowledge of the precise network. Fearing that selecting the wrong network may open them to criticism, researchers may forgo spatial models altogether. Here, we have demonstrated the potential biases introduced from omitting spatial terms outright versus including them with error. Our results should encourage the estimation of spatial models even if researchers have imperfect information. As researchers in these settings likely lack strong theory-based specifications, we point to Griffith's five rules of thumb for specifying weights matrices (Griffith 1996).
We emphasize that our results do not hold under differential measurement error. We suspect that differential measurement error is most likely for network ties that violate the exogeneity assumption for spatial weights, implying that traditional spatial econometric models would be inappropriate. However, we hope that future work extends our results to other contexts and more complex forms of measurement error. Several of the features identified here may be useful in these efforts. First, prior research focuses on misspecification in the weights matrix, yet errors manifest in the empirical model as vectors. Second, restrictions on $\mathbf{x}$, such as limiting the analysis to binary $\mathbf{x}$, imply restrictions on $\mathbf{W}\mathbf{x}$. Finally, row and min–max normalization imply bounds for the vector range and vector sum. Recognizing these attributes could prove useful for new analytical and empirical approaches in future research.
Acknowledgments
An earlier version of this paper benefited from feedback at the 2017 Annual Meeting for the Society of Political Methodology. We particularly thank Andrew Bridy, three anonymous reviewers, and the editor for their useful advice. All remaining errors are ours alone. Authors are listed in alphabetical order; equal authorship is implied. For further questions contact Scott J. Cook. Portions of this research were conducted with high performance research computing resources provided by Texas A&M University (http://hprc.tamu.edu).
Data Availability Statement
Replication code for this article has been published in Code Ocean, a computational reproducibility platform that enables users to run the code and can be viewed interactively at https://codeocean.com/capsule/7618094/tree (Betz, Cook, and Hollenbach 2020b). A preservation copy of the same code and data can also be accessed via Dataverse at https://doi.org/10.7910/DVN/ADIFOV (Betz, Cook, and Hollenbach 2020c).
Supplementary Materials
To view supplementary material for this article, please visit https://dx.doi.org/10.1017/pan.2020.26.