Data structures that define relations between pairs of actors are ubiquitous in political science: examples include the study of events such as legislative cosponsorship, trade, interstate conflict, and the formation of international agreements. The dominant paradigm for dealing with such data, however, is not a network approach but rather a dyadic design, in which an interaction between a pair of actors is considered independent of interactions between any other pair in the system. To highlight the ubiquity of this approach, the following represent just a sampling of the articles published from the 1980s to the present in the American Journal of Political Science (AJPS) and American Political Science Review (APSR) that assume dyadic independence: Dixon (1983), Mansfield, Milner, and Rosendorff (2000), Lemke and Reed (2001), Mitchell (2002), Dafoe (2011), Carnegie (2014), and Fuhrmann and Sechser (2014).
The implication of this assumption is that when, for example, Vietnam and the United States decide to form a trade agreement, they make this decision independently of what they have done with other countries and of what other countries in the international system have done among themselves. An even stronger assumption is that Japan declaring war against the United States is independent of the decision of the United States to go to war against Japan. A common refrain from those who favor the dyadic approach is that many events are only bilateral (Diehl and Wright 2016), which would alleviate the need for an approach that incorporates interdependencies between observations. However, even bilateral events and processes take place within a broader system, and occurrences in one part of the system may depend on events in another. At a minimum, we do not know whether independence of events and processes characterizes what we observe.
In this article, we introduce the additive and multiplicative effects (AME) model and compare it to two popular alternatives: the latent space model (LSM) and the exponential random graph model (ERGM). The AME approach to network modeling is a flexible framework that can be used to estimate many different types of cross-sectional and longitudinal networks with binary, ordinal, or continuous edges within a generalized linear model framework. Our approach addresses ways in which observations can be interdependent while still allowing scholars to focus on examining theories that may only be relevant at the monadic or dyadic level. Further, at the network level it accounts for nodal and dyadic dependence patterns and provides a descriptive visualization of higher-order dependencies such as homophily and stochastic equivalence.
The article is organized as follows. We begin by briefly discussing the difficulties in studying dyadic data through approaches that assume observational independence. We then introduce the AME framework in two steps. First, we discuss nodal and dyadic dependencies that may lead to non-iid observations and show how the additive effects portion of AME can be used to model these dependencies. Second, we discuss how the multiplicative effects portion of the AME framework can be used to model third-order effects while still enabling researchers to study exogenous covariates of interest. We then briefly contrast these latent variable models with ERGM and conclude with an application to a cross-sectional network measuring collaborations during the policy design of the Swiss $\text{CO}_{2}$ act. We show that AME provides a superior goodness of fit to the data in terms of its ability to predict linkages and capture network dependencies.
1 Addressing Dependencies in Dyadic Data
Relational, or dyadic, data provide measurements of how pairs of actors relate to one another. The easiest way to organize such data is the directed dyadic design, in which a set of $n$ actors is paired together to form a dataset of $z=n\times(n-1)$ directed dyads, each of which serves as a unit of analysis. For a set of $n=4$ actors, $\{i,j,k,l\}$, this tabular design results in $4\times 3=12$ observations, as shown in Table 1.
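As a minimal illustration (a Python sketch, not code from the original study), the ordered pairs of the four-actor example can be enumerated with the standard library's `itertools.permutations`; each ordered pair is one row of the stacked dyadic dataset.

```python
from itertools import permutations

# The four actors of the running example; any hashable labels work.
actors = ["i", "j", "k", "l"]

# Every ordered (sender, receiver) pair with sender != receiver is one
# directed dyad, i.e., one row of the stacked dyadic dataset.
dyads = list(permutations(actors, 2))

n = len(actors)
print(len(dyads), n * (n - 1))  # 12 12
print(dyads[:3])                # [('i', 'j'), ('i', 'k'), ('i', 'l')]
```

Because both $(i,j)$ and $(j,i)$ appear, a directed design always has twice as many rows as there are unordered pairs.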
When modeling these types of data, scholars typically employ a generalized linear model (GLM) estimated via maximum likelihood. The stochastic component of this model reflects our assumptions about the probability distribution from which the data are generated: $y_{ij}\sim P(Y|\theta_{ij})$, with a probability density or mass function such as the normal, binomial, or Poisson, and we assume that each dyad in the sample is independently drawn from that particular distribution, given $\theta_{ij}$. The systematic component characterizes the model for the parameters of that distribution and describes how $\theta_{ij}$ varies as a function of a set of nodal and dyadic covariates, $\mathbf{X}_{ij}$: $\theta_{ij}=\boldsymbol{\beta}^{\top}\mathbf{X}_{ij}$. The key assumption we make when applying this modeling technique is that, given $\mathbf{X}_{ij}$ and the parameters of our distribution, the dyadic observations are conditionally independent. Specifically, using the observations from Table 1 as an example, we construct the joint density function over all dyads as
$$P(y_{ij},y_{ji},y_{ik},\ldots,y_{lk}\mid\theta_{ij},\theta_{ji},\theta_{ik},\ldots,\theta_{lk})=P(y_{ij}\mid\theta_{ij})\times P(y_{ji}\mid\theta_{ji})\times\cdots\times P(y_{lk}\mid\theta_{lk}).\tag{1}$$
We next convert the joint probability into a likelihood: $\mathcal{L}(\boldsymbol{\theta}|\mathbf{Y})=\prod_{\alpha=1}^{n\times(n-1)}P(y_{\alpha}|\theta_{\alpha})$, where $\alpha$ indexes the directed dyads.
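To make the independence assumption concrete, here is a minimal Python sketch, assuming binary ties and a logit link (both our illustrative choices). Because the likelihood factorizes over dyads, its logarithm is a plain sum of per-dyad terms. The function name `bernoulli_logit_loglik` and the toy data are hypothetical.

```python
import math

def bernoulli_logit_loglik(beta, X, y):
    """Log-likelihood of a logistic GLM that treats every directed dyad as
    conditionally independent given its covariates.

    beta: coefficients; X: one covariate row per dyad; y: one 0/1 outcome per dyad.
    """
    total = 0.0
    for x_row, y_obs in zip(X, y):
        theta = sum(b * x for b, x in zip(beta, x_row))  # systematic component
        p = 1.0 / (1.0 + math.exp(-theta))                # inverse logit
        # Independence lets the joint density factorize, so the
        # log-likelihood is just a sum of per-dyad contributions.
        total += y_obs * math.log(p) + (1 - y_obs) * math.log(1 - p)
    return total

# Toy data: an intercept plus one dyadic covariate for three directed dyads.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1, 0, 1]
print(bernoulli_logit_loglik([0.0, 0.0], X, y))  # equals 3 * log(0.5)
```

Nothing in this sum links $y_{ij}$ to $y_{ji}$ or $y_{ik}$, which is exactly the assumption questioned in the next paragraph.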
The likelihood as defined above is only valid if we are able to assume that, for example, $y_{ij}$ is independent of $y_{ji}$ and $y_{ik}$ given the set of covariates we specified. Assuming that the dyad $y_{ij}$ is conditionally independent of the dyad $y_{ji}$ asserts that there is no reciprocity in the dataset, an assumption that in many cases seems quite untenable. A harder problem is the assumption that $y_{ij}$ is conditionally independent of $y_{ik}$; the difficulty here follows from the possibility that $i$'s relationship with $k$ depends on how $i$ relates to $j$ and how $j$ relates to $k$, or, more simply put, the “enemy of my enemy [may be] my friend.” Accordingly, inferences drawn from misspecified models that ignore potential interdependencies between dyadic observations are likely to suffer from a number of problems, including biased estimates of the effects of independent variables, uncalibrated confidence intervals, and poor predictive performance.
2 Additive Part of AME
The dependencies that tend to develop in relational data can be more easily understood when we move away from stacking dyads on top of one another and turn instead to a matrix design, as illustrated in Table 2. Operationally, this type of data structure is represented as an $n\times n$ matrix, $\mathbf{Y}$, whose diagonal is typically undefined. The $ij$th entry defines the relationship sent from $i$ to $j$ and can be continuous or discrete. Relations between actors in a network setting do not always involve senders and receivers. Such networks are referred to as undirected, and all relations between actors are symmetric, meaning $y_{ij}=y_{ji}$.
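A minimal Python sketch of the move from stacked dyads to the matrix design, assuming dyadic values arrive as a dictionary keyed by (sender, receiver); the helper names `dyads_to_matrix` and `is_symmetric` are hypothetical. Rows index senders, columns index receivers, and the undefined diagonal is stored as `None`.

```python
def dyads_to_matrix(actors, dyad_values):
    """Arrange directed-dyad values into an n x n sociomatrix.

    dyad_values maps (sender, receiver) -> value; the diagonal stays None
    because self-ties are typically undefined.
    """
    index = {a: pos for pos, a in enumerate(actors)}
    n = len(actors)
    Y = [[None] * n for _ in range(n)]
    for (sender, receiver), value in dyad_values.items():
        Y[index[sender]][index[receiver]] = value  # row = sender, column = receiver
    return Y

def is_symmetric(Y):
    """True if y_ij == y_ji for every off-diagonal pair (an undirected network)."""
    n = len(Y)
    return all(Y[i][j] == Y[j][i] for i in range(n) for j in range(i + 1, n))

actors = ["i", "j", "k"]
values = {("i", "j"): 1, ("j", "i"): 0, ("i", "k"): 1,
          ("k", "i"): 1, ("j", "k"): 0, ("k", "j"): 0}
Y = dyads_to_matrix(actors, values)
print(Y)                # [[None, 1, 1], [0, None, 0], [1, 0, None]]
print(is_symmetric(Y))  # False, because y_ij != y_ji
```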
The most common type of dependency that arises in networks is first-order, or nodal, dependency, which reflects the fact that we typically find significant heterogeneity in activity levels across nodes. The implication of this across-node heterogeneity is within-node homogeneity of ties, meaning that values across a row, say $\{y_{ij},y_{ik},y_{il}\}$, will be more similar to each other than to other values in the adjacency matrix because each of these values has a common sender $i$. This type of dependency manifests when sender $i$ tends to be more or less active in the network than other senders. Similarly, while some actors may be more active in sending ties to others in the network, others may be more popular targets; this would manifest in observations down a column, $\{y_{ji},y_{ki},y_{li}\}$, being more similar. Last, actors who are more likely to send ties in a network may also be more likely to receive them, meaning that the row and column means of an adjacency matrix may be correlated. Another ubiquitous type of structural interdependency is reciprocity. This is a second-order, or dyadic, dependency relevant only to directed datasets, and it asserts that values of $y_{ij}$ and $y_{ji}$ may be statistically dependent. The prevalence of these types of potential interactions within directed dyadic data further complicates the basic assumption of observational independence.
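The first- and second-order patterns just described can be summarized descriptively. The following Python sketch is our own illustration (the function names are hypothetical). It computes the spread of row means (sender heterogeneity), the spread of column means (receiver heterogeneity), their correlation, and the within-dyad correlation between $y_{ij}$ and $y_{ji}$, ignoring the undefined diagonal.

```python
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def dependence_summaries(Y):
    """First- and second-order summaries of a sociomatrix Y (diagonal ignored)."""
    n = len(Y)
    row_means = [statistics.fmean(Y[i][j] for j in range(n) if j != i) for i in range(n)]
    col_means = [statistics.fmean(Y[i][j] for i in range(n) if i != j) for j in range(n)]
    # Pair each y_ij with its reciprocal y_ji over all ordered off-diagonal pairs.
    yij = [Y[i][j] for i in range(n) for j in range(n) if i != j]
    yji = [Y[j][i] for i in range(n) for j in range(n) if i != j]
    return {
        "sd_row_means": statistics.stdev(row_means),    # sender heterogeneity
        "sd_col_means": statistics.stdev(col_means),    # receiver heterogeneity
        "row_col_corr": pearson(row_means, col_means),  # active senders also popular?
        "reciprocity": pearson(yij, yji),               # within-dyad correlation
    }

# Hypothetical symmetric network on four nodes: reciprocity is 1 by construction.
Y = [[None, 1, 1, 0],
     [1, None, 1, 0],
     [1, 1, None, 1],
     [0, 0, 1, None]]
print(dependence_summaries(Y))
```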
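We model first- and second-order dependencies in AME using a set of additive effects motivated by the social relations model (SRM) developed by Warner, Kenny, and Stoto (1979) and Li and Loken (2002). Specifically, we decompose the variance of observations in an adjacency matrix in terms of heterogeneity across row means (outdegree), heterogeneity along column means (indegree), correlation between row and column means, and correlations within dyads:
$$\begin{aligned}
y_{ij} &= \mu + e_{ij},\\
e_{ij} &= a_{i} + b_{j} + \epsilon_{ij},\\
\{(a_{1},b_{1}),\ldots,(a_{n},b_{n})\} &\stackrel{\text{iid}}{\sim} N(0,\Sigma_{ab}),\\
\{(\epsilon_{ij},\epsilon_{ji}): i\neq j\} &\stackrel{\text{iid}}{\sim} N(0,\Sigma_{\epsilon}),\\
\Sigma_{ab} &= \begin{pmatrix}\sigma_{a}^{2} & \sigma_{ab}\\ \sigma_{ab} & \sigma_{b}^{2}\end{pmatrix},\qquad
\Sigma_{\epsilon} = \sigma_{\epsilon}^{2}\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}.
\end{aligned}\tag{2}$$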
Here $\mu$ provides a baseline measure of the network's overall mean level (its density, for binary ties), and $e_{ij}$ represents residual variation. The residual variation decomposes into three parts: a row/sender effect ($a_{i}$), a column/receiver effect ($b_{j}$), and a within-dyad effect ($\epsilon_{ij}$). The row and column effects are modeled jointly to account for correlation in how active an actor is in sending and receiving ties. Heterogeneity in the row and column means is captured by $\sigma_{a}^{2}$ and $\sigma_{b}^{2}$, respectively, and $\sigma_{ab}$ describes the linear relationship between these two effects (i.e., whether actors who send many ties also receive many ties). Beyond these first-order dependencies, second-order dependencies are described by $\sigma_{\epsilon}^{2}$ and a within-dyad correlation, or reciprocity, parameter $\rho$.
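To see how these variance components generate dependence, the following Python sketch simulates one continuous directed network from the social relations model described above, with $y_{ij}=\mu+a_i+b_j+\epsilon_{ij}$. It draws correlated $(a_i,b_i)$ pairs and correlated $(\epsilon_{ij},\epsilon_{ji})$ pairs with a 2×2 Cholesky factor. The function name and the parameter values are illustrative assumptions, not estimates from any dataset.

```python
import random

def simulate_srm(n, mu, sigma_a, sigma_b, sigma_ab, sigma_e, rho, seed=0):
    """Draw one continuous directed network from the SRM
    y_ij = mu + a_i + b_j + eps_ij; the diagonal is left as None."""
    rng = random.Random(seed)

    def bivariate_normal(s1, s2, cov):
        # Cholesky factor of [[s1**2, cov], [cov, s2**2]] applied to two N(0, 1) draws.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        l21 = cov / s1
        l22 = (s2 ** 2 - l21 ** 2) ** 0.5
        return s1 * z1, l21 * z1 + l22 * z2

    # Correlated sender (a_i) and receiver (b_i) effects for each actor.
    ab = [bivariate_normal(sigma_a, sigma_b, sigma_ab) for _ in range(n)]
    Y = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Within-dyad errors: common variance sigma_e**2, correlation rho.
            e_ij, e_ji = bivariate_normal(sigma_e, sigma_e, rho * sigma_e ** 2)
            Y[i][j] = mu + ab[i][0] + ab[j][1] + e_ij
            Y[j][i] = mu + ab[j][0] + ab[i][1] + e_ji
    return Y

Y = simulate_srm(n=60, mu=0.0, sigma_a=1.0, sigma_b=1.0, sigma_ab=0.5,
                 sigma_e=1.0, rho=0.8, seed=1)
```

Under this model $\text{Cov}(y_{ij},y_{ji})=2\sigma_{ab}+\rho\sigma_{\epsilon}^{2}$ and $\text{Var}(y_{ij})=\sigma_a^2+\sigma_b^2+\sigma_\epsilon^2$, so with the values above the implied within-dyad correlation is $(1+0.8)/3=0.6$. The sender-receiver covariance, not only $\rho$, contributes to observed reciprocity.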
We incorporate the covariance structure described in Equation (2) into the systematic component of a GLM framework: $\boldsymbol{\beta}^{\top}\mathbf{X}_{ij}+a_{i}+b_{j}+\epsilon_{ij}$, where $\boldsymbol{\beta}^{\top}\mathbf{X}_{ij}$ accommodates the inclusion of dyadic, sender, and receiver covariates. This approach incorporates row, column, and within-dyad dependence in a way that is widely used and understood by applied researchers: a regression framework with additive random effects to accommodate the variances and covariances often seen in relational data. Furthermore, because it is embedded in a GLM, this approach handles a diversity of outcome distributions.
3 Multiplicative Part of AME
Missing from the additive effects portion of the model is an accounting of third-order dependence patterns that can arise in relational data. A third-order dependency is a dependency that involves triads rather than dyads. Third-order effects are common in relational datasets and can arise from the presence of some set of shared attributes between nodes that affects their probability of interacting with one another.
For example, a common finding in the political economy literature is that democracies are more likely to form trade agreements with one another; the shared attribute here is a country's political system. A binary network in which actors tend to form ties with others based on some set of shared characteristics often leads to a network graph with a high number of “transitive triads,” in which sets of actors $\{i,j,k\}$ are each linked to one another. The leftmost plot in Figure 1 provides a representation of a network that exhibits this type of pattern. The relevant implication for statistical inference is that, unless we are able to specify the exogenous variables that explain this prevalence of triads, the probability of $j$ and $k$ forming a tie is not independent of the ties that already exist between those actors and $i$.
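One way to quantify this pattern in an undirected binary network is to count closed triads and compare them to all two-paths. Their ratio, the transitivity, estimates how often $j$ and $k$ are tied given that both are tied to $i$. The Python sketch below stores edges as a set of `frozenset` pairs; that representation and the function names are our own choices for illustration.

```python
from itertools import combinations

def count_closed_triads(adj):
    """Number of node triples {i, j, k} that are all linked to one another,
    for an undirected network given as a set of frozenset edges."""
    nodes = sorted({v for edge in adj for v in edge})
    return sum(
        1
        for i, j, k in combinations(nodes, 3)
        if {frozenset((i, j)), frozenset((j, k)), frozenset((i, k))} <= adj
    )

def transitivity(adj):
    """Share of two-paths j - i - k that are closed by a j - k tie."""
    nodes = {v for edge in adj for v in edge}
    degree = {v: sum(v in edge for edge in adj) for v in nodes}
    two_paths = sum(d * (d - 1) // 2 for d in degree.values())
    return 3 * count_closed_triads(adj) / two_paths if two_paths else 0.0

edges = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
print(count_closed_triads(edges), transitivity(edges))  # 1 0.6
```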
Another third-order dependence pattern that cannot be accounted for in the additive effects framework is stochastic equivalence. A pair of actors $i$ and $j$ are stochastically equivalent if the probability of $i$ relating to, and being related to by, every other actor is the same as the corresponding probability for $j$. This refers to the idea that there will be groups of nodes in a network with similar relational patterns. Dependence patterns such as this are not uncommon in social science applications. Recent work estimates a stochastic equivalence structure to explain the formation of preferential trade agreements (PTAs) between countries (Manger, Pickup, and Snijders 2012). Specifically, the authors suggest that PTA formation is related to differences in per capita income levels between countries. Countries falling into high-, middle-, and low-income per capita levels will have patterns of PTA formation that are determined by the groups into which they fall. Such a structure is represented in the rightmost panel of Figure 1: here the lightly shaded group of nodes at the top can represent high-income countries, the nodes on the bottom left middle-income countries, and the darkest nodes low-income countries. The behavior of actors in a network can at times be governed by group-level dynamics, and failing to account for such dynamics leaves potentially important parts of the data generating process ignored.
We account for third-order dependence patterns using a latent variable framework, and our goal in doing so is twofold: (1) to adequately represent third-order dependence patterns, and (2) to improve our ability to conduct inference on exogenous covariates. Latent variable models assume that relationships between nodes are mediated by a small number ($K$) of node-specific unobserved latent variables. We contrast the approach that we utilize within AME, the latent factor model (LFM), with the latent space model, which is among the most widely used in the networks literature. For the sake of exposition, we consider the case where relations are symmetric to describe the differences between these approaches. Both approaches can be incorporated into the framework that we have been constructing through the inclusion of an additional term, $\alpha(\mathbf{u}_{i},\mathbf{u}_{j})$, that captures latent third-order characteristics of a network. The general forms of $\alpha(\mathbf{u}_{i},\mathbf{u}_{j})$ for these latent variable models are shown in Equation (3):
$$\begin{aligned}
&\text{LSM:} && \alpha(\mathbf{u}_{i},\mathbf{u}_{j})=-|\mathbf{u}_{i}-\mathbf{u}_{j}|, \quad \mathbf{u}_{i}\in\mathbb{R}^{K},\ i\in\{1,\ldots,n\},\\
&\text{LFM:} && \alpha(\mathbf{u}_{i},\mathbf{u}_{j})=\mathbf{u}_{i}^{\top}\Lambda\,\mathbf{u}_{j}, \quad \mathbf{u}_{i}\in\mathbb{R}^{K},\ i\in\{1,\ldots,n\},\ \Lambda \text{ a } K\times K \text{ diagonal matrix}.
\end{aligned}\tag{3}$$
In the LSM approach, each node $i$ has some unknown latent position in $K$-dimensional space, $\mathbf{u}_{i}\in\mathbb{R}^{K}$, and the probability of a tie between a pair $ij$ is a function of the negative Euclidean distance between them: $-|\mathbf{u}_{i}-\mathbf{u}_{j}|$. Because latent distances for a triple of actors obey the triangle inequality, this formulation models the tendencies toward homophily commonly found in social networks. This approach is implemented in the latentnet $\mathsf{R}$ package, which is part of the statnet suite (Krivitsky and Handcock 2015). However, this approach also comes with an important shortcoming: it confounds stochastic equivalence and homophily. Consider two nodes $i$ and $j$ that are proximate to one another in $K$-dimensional Euclidean space; this implies not only that $|\mathbf{u}_{i}-\mathbf{u}_{j}|$ is small but also that $|\mathbf{u}_{i}-\mathbf{u}_{l}|\approx|\mathbf{u}_{j}-\mathbf{u}_{l}|$, the result being that nodes $i$ and $j$ will by construction be assumed to possess the same relational patterns with other actors such as $l$ (i.e., that they are stochastically equivalent). Thus LSMs confound strong ties with stochastic equivalence, and this approach cannot adequately model data with many ties between nodes that have different network roles. This is problematic because real-world networks exhibit varying degrees of stochastic equivalence and homophily. In these situations, the LSM would end up representing only part of the network structure.
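A short numerical sketch of this confounding, using hypothetical two-dimensional latent positions and Python's `math.dist` (Python 3.8+). Because $i$ and $j$ are close, the triangle inequality forces their distances to any third actor $l$ to differ by at most $|\mathbf{u}_i-\mathbf{u}_j|$. Conversely, two actors with identical positions always receive the maximal tie propensity $\alpha=0$ with each other.

```python
import math

def lsm_alpha(u_i, u_j):
    """Latent space term: negative Euclidean distance between latent positions."""
    return -math.dist(u_i, u_j)

# Hypothetical 2-D latent positions: i and j sit close together, l is far away.
u = {"i": (0.0, 0.0), "j": (0.1, 0.0), "l": (3.0, 4.0)}

print(lsm_alpha(u["i"], u["j"]))  # about -0.1: a strong i-j tie
print(lsm_alpha(u["i"], u["l"]), lsm_alpha(u["j"], u["l"]))
# -5.0 and about -4.94: because i and j are close, their relations with l are
# forced to be nearly identical (triangle inequality), i.e. near-equivalence.
```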
In the latent factor model, each actor has an unobserved vector of characteristics, $\mathbf{u}_{i}=\{u_{i,1},\ldots,u_{i,K}\}$, which describes its behavior as an actor in the network. The probability of a tie from $i$ to $j$ depends on the extent to which $\mathbf{u}_{i}$ and $\mathbf{u}_{j}$ are “similar” (i.e., point in the same direction) and on whether the entries of $\Lambda$ are greater or less than zero. More specifically, the similarity in the latent factors, $\mathbf{u}_{i}\approx\mathbf{u}_{j}$, corresponds to how stochastically equivalent a pair of actors are, and the eigenvalues (the diagonal entries of $\Lambda$) determine whether the network exhibits positive or negative homophily. For example, say that we estimate a rank-one latent factor model (i.e., $K=1$); in this case $\mathbf{u}_{i}$ is represented by a scalar $u_{i,1}$, similarly $\mathbf{u}_{j}=u_{j,1}$, and $\Lambda$ will have just one diagonal element $\lambda$. The average effect this will have on $y_{ij}$ is simply $\lambda\times u_{i,1}\times u_{j,1}$, where $\lambda>0$ indicates homophily and $\lambda<0$ heterophily. This approach can represent both varying degrees of homophily and stochastic equivalence.
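The rank-one case can be checked by hand. In this Python sketch with hypothetical factor values, actors $i$ and $j$ have identical factors, so they relate identically to any third actor $l$ (stochastic equivalence). The sign of $\lambda$ alone decides whether the two equivalent actors are drawn together or pushed apart, something the LSM cannot express.

```python
def lfm_alpha(u_i, u_j, lam):
    """Latent factor term u_i' Lambda u_j, with the diagonal of Lambda given as a list."""
    return sum(l * a * b for l, a, b in zip(lam, u_i, u_j))

# Hypothetical rank-one (K = 1) factors: i and j are identical, l is opposite.
u_i, u_j, u_l = [1.0], [1.0], [-1.0]

for lam in ([2.0], [-2.0]):
    print(lam, lfm_alpha(u_i, u_j, lam), lfm_alpha(u_i, u_l, lam), lfm_alpha(u_j, u_l, lam))
# [2.0]  2.0 -2.0 -2.0 -> homophily: equivalent actors i and j are drawn together
# [-2.0] -2.0 2.0 2.0  -> heterophily: equivalent actors i and j are pushed apart
```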
In addition to summarizing dependence patterns in networks, scholars are often concerned with accounting for interdependencies so that they can better estimate the effects of exogenous covariates. Both the latent space and latent factor models attempt to do this because they are “conditional independence models,” in that they assume that ties are conditionally independent given all of the observed predictors and unknown node-specific parameters: $p(Y|X,U)=\prod_{i<j}p(y_{i,j}|x_{i,j},\mathbf{u}_{i},\mathbf{u}_{j})$. Typical parametric models of this form relate $y_{i,j}$ to $(x_{i,j},\mathbf{u}_{i},\mathbf{u}_{j})$ via a link function:
$$E[y_{i,j}\mid x_{i,j},\mathbf{u}_{i},\mathbf{u}_{j}]=g(\eta_{i,j}),\qquad \eta_{i,j}=\boldsymbol{\beta}^{\top}x_{i,j}+\alpha(\mathbf{u}_{i},\mathbf{u}_{j}),$$
where $g$ is an inverse link function.
However, the structure of $\alpha(\mathbf{u}_{i},\mathbf{u}_{j})$ can result in very different interpretations for any estimates of the regression coefficients $\boldsymbol{\beta}$. For example, suppose the latent effects $\{\mathbf{u}_{1},\ldots,\mathbf{u}_{n}\}$ are near zero on average (if not, their mean can be absorbed into an intercept parameter and row and column additive effects). Under the LFM, the average value of $\alpha(\mathbf{u}_{i},\mathbf{u}_{j})=\mathbf{u}_{i}^{\top}\Lambda\mathbf{u}_{j}$ will be near zero, and so we have
$$\bar{\eta}=\boldsymbol{\beta}^{\top}\bar{x}+\overline{\mathbf{u}_{i}^{\top}\Lambda\mathbf{u}_{j}}\approx\boldsymbol{\beta}^{\top}\bar{x},$$
where bars denote averages over pairs $i<j$.
The implication is that $\boldsymbol{\beta}$ can be interpreted as yielding the “average” value of $\eta_{i,j}$. On the other hand, under the LSM
$$\bar{\eta}=\boldsymbol{\beta}^{\top}\bar{x}-\overline{|\mathbf{u}_{i}-\mathbf{u}_{j}|}<\boldsymbol{\beta}^{\top}\bar{x}.$$
In this case, $\boldsymbol{\beta}^{\top}\bar{x}$ does not represent an “average” value of the linear predictor $\eta_{i,j}$; it represents a maximal value, as if all actors were zero distance from each other in the latent social space. For example, consider the simplest case of a normally distributed network outcome with an identity link:
$$y_{i,j}=\boldsymbol{\beta}^{\top}x_{i,j}+\alpha(\mathbf{u}_{i},\mathbf{u}_{j})+\epsilon_{i,j}.$$
Under the LSM, $\bar{y}\approx\boldsymbol{\beta}^{\top}\bar{x}-\overline{|\mathbf{u}_{i}-\mathbf{u}_{j}|}<\boldsymbol{\beta}^{\top}\bar{x}$, and so we can no longer interpret $\boldsymbol{\beta}$ as representing the linear relationship between $y$ and $x$. Instead, it may be thought of as describing an average hypothetical “maximal” relationship between $y_{i,j}$ and $x_{i,j}$.
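The intercept-shift argument can be checked numerically. This Python sketch draws hypothetical centered latent vectors ($n=100$, $K=2$, with an assumed diagonal $\Lambda$) and averages each latent term over all pairs $i<j$. The LFM term averages to roughly zero, while the LSM term is strictly negative, so under an identity link the LSM's $\boldsymbol{\beta}^{\top}\bar{x}$ sits above $\bar{y}$ by about the average latent distance.

```python
import math
import random

rng = random.Random(42)
n, K = 100, 2
U = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(n)]
# Center the latent vectors; a nonzero mean would be absorbed by the intercept
# and the additive row and column effects.
col_means = [sum(row[k] for row in U) / n for k in range(K)]
U = [[row[k] - col_means[k] for k in range(K)] for row in U]

lam = [1.0, -0.5]  # hypothetical diagonal of Lambda
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

# Average latent contribution to eta_ij under each model.
lfm_mean = sum(sum(l * U[i][k] * U[j][k] for k, l in enumerate(lam))
               for i, j in pairs) / len(pairs)
lsm_mean = sum(-math.dist(U[i], U[j]) for i, j in pairs) / len(pairs)
print(round(lfm_mean, 3), round(lsm_mean, 3))
```

With centered factors the LFM average equals $-\sum_k \lambda_k \sum_i u_{ik}^2 / (n(n-1))$, which shrinks like $1/n$. The LSM average stays near the mean pairwise distance no matter how large $n$ is.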
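Thus the LFM provides two important benefits. First, we are able to capture a wider assortment of dependence patterns that arise in relational data, and, second, parameter interpretation is more straightforward. The AME approach considers the regression model shown in Equation (4):
$$\begin{aligned}
y_{ij} &= g(\theta_{ij}),\\
\theta_{ij} &= \boldsymbol{\beta}^{\top}\mathbf{X}_{ij}+e_{ij},\\
e_{ij} &= a_{i}+b_{j}+\epsilon_{ij}+\alpha(\mathbf{u}_{i},\mathbf{v}_{j}),\\
\alpha(\mathbf{u}_{i},\mathbf{v}_{j}) &= \mathbf{u}_{i}^{\top}\mathbf{D}\mathbf{v}_{j}=\sum_{k=1}^{K}d_{k}u_{ik}v_{jk},
\end{aligned}\tag{4}$$
where $\mathbf{D}$ is a $K\times K$ diagonal matrix with entries $d_{k}$.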
Using this framework, we are able to model the dyadic observations as conditionally independent given $\boldsymbol{\theta}$, where $\boldsymbol{\theta}$ depends on the unobserved random effects, $\mathbf{e}$. These effects are in turn modeled to account for the potential first-, second-, and third-order dependencies that we have discussed. As described in Equation (2), $a_{i}+b_{j}+\epsilon_{ij}$ are the additive random effects in this framework and account for sender, receiver, and within-dyad dependence. The multiplicative effects, $\mathbf{u}_{i}^{\top}\mathbf{D}\mathbf{v}_{j}$, are used to capture higher-order dependence patterns that are left over in $\boldsymbol{\theta}$ after accounting for any known covariate information.
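A minimal Python sketch of how the pieces combine in a directed network. It computes $\boldsymbol{\beta}^{\top}\mathbf{X}_{ij}+a_i+b_j+\mathbf{u}_i^{\top}\mathbf{D}\mathbf{v}_j$ for every ordered pair and then draws binary ties by adding within-dyad errors correlated at $\rho$ and thresholding at zero. The probit-style threshold is our assumption for binary outcomes, and all names and values are hypothetical. Note that $\mathbf{u}_i$ (sender factors) and $\mathbf{v}_j$ (receiver factors) may differ, which lets sending and receiving roles diverge.

```python
import random

def ame_theta(beta, X, a, b, d, U, V):
    """beta' X_ij + a_i + b_j + u_i' D v_j for all i != j (diagonal None).

    X[i][j]: list of dyadic covariates; a, b: sender and receiver effects;
    d: diagonal of D; U, V: n x K sender and receiver latent factors.
    """
    n = len(a)
    theta = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                xb = sum(bk * xk for bk, xk in zip(beta, X[i][j]))
                mult = sum(dk * uk * vk for dk, uk, vk in zip(d, U[i], V[j]))
                theta[i][j] = xb + a[i] + b[j] + mult
    return theta

def sample_probit_ties(theta, rho, rng):
    """y_ij = 1(theta_ij + eps_ij > 0), where (eps_ij, eps_ji) are standard
    normal with correlation rho (an assumed probit-style threshold)."""
    n = len(theta)
    Y = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            e1 = rng.gauss(0, 1)
            e2 = rho * e1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)  # corr(e1, e2) = rho
            Y[i][j] = int(theta[i][j] + e1 > 0)
            Y[j][i] = int(theta[j][i] + e2 > 0)
    return Y

# Hypothetical three-actor example: an intercept-only covariate and K = 1.
n = 3
X = [[[1.0] for _ in range(n)] for _ in range(n)]
theta = ame_theta(beta=[1.0], X=X, a=[0.0] * n, b=[0.0] * n,
                  d=[2.0], U=[[1.0], [0.0], [-1.0]], V=[[1.0], [0.0], [-1.0]])
print(theta)  # [[None, 1.0, -1.0], [1.0, None, 1.0], [-1.0, 1.0, None]]
Y = sample_probit_ties(theta, rho=0.5, rng=random.Random(0))
```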
3.1 ERGMs
An alternative approach to accounting for third-order dependence patterns is the ERGM. Whereas AME seeks to estimate interdependencies in a network through a set of latent variables, ERGM approaches are useful when researchers are interested in the role that specific network statistics play in giving rise to an observed network. These network statistics could include the number of transitive triads in a network, balanced triads, reciprocal pairs, and so on. In the ERGM framework, a set of statistics, $S(\mathbf{Y})$, defines a model. Given the chosen set of statistics, the probability of observing a particular network dataset $\mathbf{Y}$ can be expressed as
$$P(\mathbf{Y}\mid\boldsymbol{\beta})=\frac{\exp\left(\boldsymbol{\beta}^{\top}S(\mathbf{Y})\right)}{\sum_{\mathbf{Y}^{*}\in\mathcal{Y}}\exp\left(\boldsymbol{\beta}^{\top}S(\mathbf{Y}^{*})\right)}.\tag{5}$$
Here $\boldsymbol{\beta}$ represents a vector of model coefficients for the specified network statistics, $\mathcal{Y}$ denotes the set of all obtainable networks, and the denominator is used as a normalizing factor (Hunter et al. 2008). This approach provides a way to state that the probability of observing a given network depends on the patterns that it exhibits, which are operationalized in the list of network statistics specified by the researcher. Within this approach one can test the role that a variety of network statistics play in giving rise to a particular network. In addition, researchers can easily accommodate nodal and dyadic covariates. Further, because of the Hammersley–Clifford theorem, any probability distribution over networks can be represented by the form shown in Equation (5).
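To show what a set of statistics $S(\mathbf{Y})$ looks like, here is a Python sketch computing three common choices for a directed binary network: edges, mutual (reciprocated) pairs, and transitive triples. It also computes the numerator $\boldsymbol{\beta}^{\top}S(\mathbf{Y})$ on the log scale. The statistic definitions, function names, and coefficient values are our illustrative assumptions and are not taken from any particular ERGM software.

```python
from itertools import permutations

def ergm_stats(Y):
    """S(Y) = [edges, mutual pairs, transitive triples] for a directed binary
    sociomatrix Y given as a list of 0/1 lists (diagonal ignored)."""
    n = len(Y)
    edges = sum(Y[i][j] for i, j in permutations(range(n), 2))
    mutual = sum(Y[i][j] * Y[j][i] for i in range(n) for j in range(i + 1, n))
    # Ordered triples with i->j, j->k and the shortcut i->k.
    ttriples = sum(Y[i][j] * Y[j][k] * Y[i][k] for i, j, k in permutations(range(n), 3))
    return [edges, mutual, ttriples]

def log_numerator(beta, Y):
    """beta' S(Y): the log of the numerator of the ERGM probability."""
    return sum(b * s for b, s in zip(beta, ergm_stats(Y)))

Y = [[0, 1, 1],
     [1, 0, 1],
     [0, 0, 0]]
print(ergm_stats(Y))                       # [4, 1, 2]
print(log_numerator([-1.0, 0.5, 0.2], Y))  # about -3.1

# The denominator sums over every possible directed network: 2**(n*(n-1)) terms.
# For the 34-actor network in the empirical comparison, that is a 338-digit number.
print(len(str(2 ** (34 * 33))))            # 338
```

The size of that sum is why ERGMs are usually fit with simulation-based approximations rather than by evaluating the normalizing constant directly.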
A notable issue when estimating ERGMs, however, is that the estimated model can become degenerate. Degeneracy here means that the model places a large amount of probability on a small subset of networks that fall in the set of obtainable networks, $\mathcal{Y}$, but bear little resemblance to the observed network, $\mathbf{Y}$ (Schweinberger 2011). Some have argued that model degeneracy is simply a result of model misspecification (Handcock 2003; Goodreau et al. 2008; Handcock et al. 2008). However, this points to an important caveat in interpreting the implications of the Hammersley–Clifford theorem. Though this theorem ensures that any network can be represented through an ERGM, it says nothing about the complexity of the sufficient statistics ($S(\mathbf{Y})$) required to do so. Failure to properly account for higher-order dependence structures through an appropriate specification can at best lead to model degeneracy, which provides an obvious indication that the specification needs to be altered, and at worst deliver a result that converges but does not appropriately capture the interdependencies in the network. The consequence of the latter case is a set of inferences that will continue to be biased as a result of unmeasured heterogeneity, thus defeating the major motivation for pursuing an inferential network model in the first place.
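Degeneracy is an issue that arises in large networks, but a toy case shows the underlying bimodality. The following Python sketch enumerates all $2^6=64$ undirected networks on four nodes and computes the exact ERGM probabilities for an edges-plus-triangles model. The coefficients ($-3$ for edges, $4.5$ for triangles) are hypothetical and were chosen to make the effect visible. The expected number of edges is moderate, yet almost all probability falls on the empty and complete networks, and moderately dense networks are rarely generated.

```python
import math
from itertools import combinations, product

nodes = range(4)
dyads = list(combinations(nodes, 2))    # the 6 possible undirected ties
triples = list(combinations(nodes, 3))  # the 4 possible triangles

def k4_stats(edge_set):
    """(number of edges, number of triangles) for a network on four nodes."""
    triangles = sum(all(p in edge_set for p in combinations(t, 2)) for t in triples)
    return len(edge_set), triangles

def ergm_distribution(theta_edges, theta_triangles):
    """Exact ERGM probabilities for every network on four nodes, obtained by
    enumerating all 2**6 = 64 networks to compute the normalizing constant."""
    weights = {}
    for bits in product((0, 1), repeat=len(dyads)):
        edge_set = frozenset(d for d, on in zip(dyads, bits) if on)
        e, t = k4_stats(edge_set)
        weights[edge_set] = math.exp(theta_edges * e + theta_triangles * t)
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

probs = ergm_distribution(-3.0, 4.5)
p_by_edges = [sum(p for g, p in probs.items() if len(g) == m) for m in range(7)]
print([round(p, 3) for p in p_by_edges])  # mass piles up at 0 and 6 edges
```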
In the following section we compare the latent distance model, ERGM, and the AME model using an application presented in Cranmer et al. (2017), which allows us to contrast these approaches directly.
4 Empirical Comparison
We utilize a cross-sectional network measuring whether each actor indicated that it collaborated with another actor during the policy design of the Swiss $\text{CO}_{2}$ act (Ingold 2008). This is a directed relational matrix, as an actor $i$ can indicate that it collaborated with $j$ even if $j$ did not state that it collaborated with $i$. The Swiss government proposed this act in 1995 with the goal of undertaking a 10% reduction in $\text{CO}_{2}$ emissions by 2012. The act was accepted in the Swiss Parliament in 2000 and implemented in 2008. Ingold (2008), and subsequent work by Ingold and Fischer (2014), sought to determine what drives collaboration among actors trying to affect climate change policy. The actors included in this network are those identified by experts as holding an important position in Swiss climate policy. In total, Ingold identifies 34 relevant actors: five state actors, eleven industry and business representatives, seven environmental NGOs and civil society organizations, five political parties, and six scientific institutions and consultants. We follow Ingold and Fischer (2014) and Cranmer et al. (2017) in developing a model specification to understand and predict link formation in this network.
The LSM we fit on this network includes a two-dimensional Euclidean distance metric. The ERGM specification for this network includes not only the same exogenous variables as the LSM but also a number of endogenous characteristics of the network. The AME model we fit includes the same exogenous covariates and accounts for nodal and dyadic heterogeneity using the SRM. Third-order effects are represented by the latent factor model with $K=2$. Last, we also include a logistic model, as that is still the standard in most of the field. Parameter estimates for these models are shown in Table 3.
The first point to note is that, in general, the parameter estimates returned by the AME, while similar to those of the ERGM, are quite different from those of the LSM. For example, while the LSM returns a result for the Opposition/alliance variable that diverges from the ERGM, the AME returns a result similar to that reported by Ingold and Fischer. Similar discrepancies appear for other parameters, such as Influence attribution and Alter's influence degree. Each of these discrepancies is eliminated when using AME. As described previously, this is because the LSM approach complicates the interpretation of the effects of exogenous variables due to the construction of the latent variable term.
There are also a few differences between the parameter estimates that result from the ERGM and AME. Using the AME, we find evidence that Preference dissimilarity is associated with a reduced probability of collaboration between a pair of actors, which is in line with the theoretical expectations of Ingold and Fischer. In addition, the AME results differ from the ERGM results for the nodal effects related to whether a receiver of a collaboration is a government actor, Alter=Government actor, and whether the sender is an environmental NGO, Ego=Environmental NGO.
4.1 Tie formation prediction
To test which model more accurately captures the data generating process for this network, we utilize a cross-validation procedure to assess the out-of-sample performance for each of the models presented in Table 3. Our cross-validation approach proceeds as follows:
∙ Randomly divide the $n\times (n-1)$ data points into $S$ sets of roughly equal size, letting $s_{ij}$ be the set to which pair $\{ij\}$ is assigned.
∙ For each $s\in \{1,\ldots ,S\}$ :
– Obtain estimates of the model parameters conditional on $\{y_{ij}:s_{ij}\neq s\}$ , the data on pairs not in set $s$ .
– For pairs $\{kl\}$ in set $s$ , let ${\hat{y}}_{kl}=E[y_{kl}|\{y_{ij}:s_{ij}\neq s\}]$ , the predicted value of $y_{kl}$ obtained using data not in set $s$ .
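The cross-validation steps above can be sketched in Python as follows. The model is passed in as a pair of `fit`/`predict` callables, so any of the four approaches could in principle be plugged in. The stand-in model here (`fit_sender_means`, which predicts each sender's training-set tie rate) is purely hypothetical, and the network is simulated toy data, not the Swiss collaboration network.

```python
import random

def cv_predictions(Y, S, fit, predict, seed=0):
    """Out-of-sample predictions for every off-diagonal entry of Y.

    fit(train) -> model, where train maps (i, j) -> y_ij for pairs NOT in the
    held-out set; predict(model, k, l) -> predicted y_kl.
    """
    n = len(Y)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    random.Random(seed).shuffle(pairs)
    # Dealing the shuffled pairs out in turn gives S sets of roughly equal size.
    fold = {pair: pos % S for pos, pair in enumerate(pairs)}
    Y_hat = [[None] * n for _ in range(n)]
    for s in range(S):
        train = {p: Y[p[0]][p[1]] for p in pairs if fold[p] != s}
        model = fit(train)
        for (k, l), f in fold.items():
            if f == s:
                Y_hat[k][l] = predict(model, k, l)
    return Y_hat

# Hypothetical stand-in model (not LSM, ERGM, or AME): each sender's mean tie rate.
def fit_sender_means(train):
    totals, counts = {}, {}
    for (i, _), y in train.items():
        totals[i] = totals.get(i, 0) + y
        counts[i] = counts.get(i, 0) + 1
    return {i: totals[i] / counts[i] for i in totals}

def predict_sender_mean(model, k, l):
    return model.get(k, 0.0)

rng = random.Random(1)
n = 34  # the number of actors in the application; the ties here are simulated
Y = [[None if i == j else int(rng.random() < 0.2) for j in range(n)] for i in range(n)]
Y_hat = cv_predictions(Y, S=45, fit=fit_sender_means, predict=predict_sender_mean)
```

With $S=45$ and $34\times 33=1122$ directed pairs, each set holds roughly 25 pairs, or about 2% of the data.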
The procedure summarized in the steps above generates a sociomatrix of out-of-sample predictions of the observed data. Each entry $\hat{y}_{ij}$ is a predicted value obtained from a subset of the data that does not include $y_{ij}$. In this application we set $S$ to 45, which corresponds to randomly excluding approximately 2% of the data from each estimation. Using the set of out-of-sample predictions generated by the cross-validation procedure, we provide a series of tests to assess model fit. The leftmost plot in Figure 2 compares the four approaches in terms of their ability to predict the out-of-sample occurrence of collaboration based on receiver operating characteristic (ROC) curves. ROC curves show the trade-off between the true positive rate (TPR, or sensitivity) and the false positive rate (FPR, or 1 − specificity) for each model. Models with a better fit according to this test should have curves that follow the left-hand border and then the top border of the ROC space. On this diagnostic, the AME model performs best, closely followed by ERGM. The Logit and LSM approaches lag notably behind the other specifications.
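For readers who want to reproduce this kind of diagnostic, the following Python sketch computes ROC points by sweeping a threshold down through each distinct predicted value and integrates them with the trapezoidal rule. The four observations are toy values, not results from the application.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_roc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(y_true, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(y_true, scores))  # 0.75
```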
A more intuitive visualization of the differences between these modeling approaches can be gleaned by examining the separation plots included along the bottom-right edge of the ROC plot. This visualization tool plots each of the observations in the dataset (in this case, actor pairs) according to their predicted value, from left (low values) to right (high values). Network links are colored by modeling approach, and models with a good fit should place all of them toward the right of the plot. This type of visualization emphasizes that the AME and ERGM models perform better than the alternatives.
The last diagnostic we highlight to assess predictive performance is the precision-recall (PR) curve. In both ROC and PR space we utilize the TPR, also referred to as recall, though in the former it is plotted on the $y$-axis and in the latter on the $x$-axis. The difference, however, is that in ROC space we utilize the FPR, while in PR space we use precision. FPR measures the fraction of negative examples that are misclassified as positive, while precision measures the fraction of examples classified as positive that are truly positive. PR curves are useful in situations where correctly predicting events is more interesting than simply predicting nonevents (Davis and Goadrich 2006). This is especially relevant for many relational datasets in political science, such as those on conflict, because events in such data are extremely sparse and it is relatively easy to correctly predict nonevents.
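A matching Python sketch for PR space, computed on toy data. Recall is the TPR, and precision is the share of predicted positives that are true links. Here we summarize the curve with step-wise average precision rather than linear interpolation between PR points. This is our own design choice, and the AUC values reported in Figure 2 may have been computed differently.

```python
def pr_points(y_true, scores):
    """(recall, precision) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        predicted_pos = sum(1 for s in scores if s >= t)
        points.append((tp / pos, tp / predicted_pos))
    return points

def average_precision(y_true, scores):
    """Step-wise area under the PR curve: precision weighted by recall increments."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in pr_points(y_true, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(average_precision(y_true, scores))  # about 0.833 (= 5/6)
```

Unlike the FPR, precision does not have the large pool of true negatives in its denominator. Adding many easy nonevents therefore leaves this summary unchanged, which is why it separates models more sharply in sparse networks.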
In the case of our application dataset, the vast majority of dyads (80%) do not have a network linkage, which points to the relevance of assessing performance using PR curves, as we do in the rightmost plot of Figure 2. The relative ordering of the models remains similar, but the differences in how well they perform become much starker. Here we find that the AME approach performs notably better than each of the alternatives in actually predicting network linkages. Area under the curve (AUC) statistics are provided in Figure 2, and these also highlight AME's superior out-of-sample performance.
4.2 Capturing network attributes
We also assess which of these models best captures the network features of the dependent variable. To do this, we compare the observed network with a set of networks simulated from the estimated models. We simulate 1,000 networks from the three models and compare how well they align with the observed network in terms of four network statistics: (1) the empirical standard deviation of the row means (i.e., heterogeneity of nodes in terms of the ties they send); (2) the empirical standard deviation of the column means (i.e., heterogeneity of nodes in terms of the ties they receive); (3) the empirical within-dyad correlation (i.e., a measure of reciprocity in the network); and (4) a normalized measure of triadic dependence. A comparison of the LSM, ERGM, and AME models on these four statistics is shown in Figure 3.
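The fourth statistic is the least familiar, so here is a Python sketch of one plausible version, along with the observed-versus-simulated comparison. We average $e_{ij}e_{jk}e_{ki}$ over all directed 3-cycles, where $e=y-\bar{y}$, and divide by $\text{sd}(y)^3$. This normalization is our assumption, and the exact statistic behind Figure 3 may differ. The "fitted model" is a hypothetical stand-in that draws independent ties at the observed density, and the observed network is a toy two-group network.

```python
import random

def triad_dependence(Y):
    """Normalized triadic dependence of a sociomatrix Y (None = undefined).

    Averages e_ij * e_jk * e_ki over all observed directed 3-cycles, where
    e = y - mean(y), then divides by sd(y)**3 so the value is scale-free.
    """
    n = len(Y)
    obs = [v for row in Y for v in row if v is not None]
    m = sum(obs) / len(obs)
    sd = (sum((v - m) ** 2 for v in obs) / (len(obs) - 1)) ** 0.5
    E = [[None if v is None else v - m for v in row] for row in Y]
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if E[i][j] is None:
                continue
            for k in range(n):
                if E[j][k] is None or E[k][i] is None:
                    continue
                total += E[i][j] * E[j][k] * E[k][i]
                count += 1
    return total / (count * sd ** 3)

def simulate_bernoulli(n, p, rng):
    """Stand-in 'fitted model': independent ties with a common probability p."""
    return [[None if i == j else int(rng.random() < p) for j in range(n)] for i in range(n)]

# Hypothetical observed network: two groups of five actors, tied only within groups.
group = [0] * 5 + [1] * 5
Y_obs = [[None if i == j else int(group[i] == group[j]) for j in range(10)] for i in range(10)]
density = sum(v for row in Y_obs for v in row if v is not None) / 90

rng = random.Random(1)
observed = triad_dependence(Y_obs)
simulated = [triad_dependence(simulate_bernoulli(10, density, rng)) for _ in range(200)]
share = sum(s >= observed for s in simulated) / len(simulated)
print(round(observed, 3), share)  # the independence model cannot reproduce the triads
```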
Here it quickly becomes apparent that the LSM fails to capture how active and popular actors are in the Swiss climate change mitigation network.Footnote 19 The AME and ERGM specifications again perform about equally well. If this diagnostic showed that the AME model did not adequately represent the observed network, this would indicate that we might want to increase $K$ to better account for network interdependencies; the exogenous covariates the researcher has chosen would not need to change. If the networks simulated from the ERGM did not align with the observed statistics in Figure 3, this would instead indicate that an incorrect set of endogenous dependencies had been specified.
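The comparison in Figure 3 can be sketched as a predictive check: compute a statistic on the observed network, compute it on each simulated network, and ask whether the observed value is typical of the simulations. The Python sketch below assumes a hypothetical dyadic-independence model with a single tie probability (not the LSM, ERGM, or AME fits) and a synthetic network with strong sender heterogeneity; the two-sided tail-probability summary is our own choice. Because this null model has no sender effects, the observed standard deviation of row means falls far outside its simulated distribution, a failure analogous to the LSM's in Figure 3.

```python
import random
import statistics
from typing import List, Optional

Network = List[List[Optional[int]]]  # diagonal entries are None (no self-ties)


def sd_row_means(Y: Network) -> float:
    """Statistic (1): standard deviation of row means (heterogeneity in ties sent)."""
    n = len(Y)
    return statistics.stdev([statistics.mean([Y[i][j] for j in range(n) if j != i]) for i in range(n)])


def simulate_independent(n: int, p: float, rng: random.Random) -> Network:
    """Hypothetical null model: every directed tie is an independent Bernoulli(p) draw."""
    return [[None if i == j else int(rng.random() < p) for j in range(n)] for i in range(n)]


def predictive_p_value(observed: float, simulated: List[float]) -> float:
    """Two-sided share of simulated statistics at least as extreme as the observed value."""
    m = len(simulated)
    upper = sum(s >= observed for s in simulated) / m
    lower = sum(s <= observed for s in simulated) / m
    return min(1.0, 2 * min(upper, lower))


# Synthetic observed network: nodes 0-9 send ties to everyone, nodes 10-19 send none.
n = 20
observed = [[None if i == j else int(i < 10) for j in range(n)] for i in range(n)]
density = statistics.mean([observed[i][j] for i in range(n) for j in range(n) if i != j])

rng = random.Random(1)
sims = [sd_row_means(simulate_independent(n, density, rng)) for _ in range(1000)]
print(sd_row_means(observed), predictive_p_value(sd_row_means(observed), sims))
```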
5 Conclusion
The AME approach to estimation and inference in network data provides a number of benefits over extant alternatives in political science. Specifically, it provides a modeling framework for dyadic data that is based on familiar statistical tools such as linear regression, GLMs, random effects, and factor models.Footnote 20 Further, we have shown that alternatives such as the LSM complicate parameter interpretation because of how the latent variable term is constructed. A benefit of AME is that its focus aligns with the interests of most IR scholars, which lie primarily in the effects of exogenous covariates. This focus matters for social science researchers more generally, as many studies that employ relational data still rely on conceptualizations that are monadic or dyadic in nature.
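To make the link to these familiar tools explicit, a generic AME specification for a directed, continuous dyadic outcome can be written as follows. The notation is ours and may differ from the symbols used elsewhere in the paper; for binary ties, a probit version treats the observed tie as the indicator that a latent variable with this form exceeds zero.

$$
y_{ij} = \boldsymbol{\beta}^\top \mathbf{x}_{ij} + a_i + b_j + \mathbf{u}_i^\top \mathbf{v}_j + \epsilon_{ij}, \qquad \mathbf{u}_i, \mathbf{v}_j \in \mathbb{R}^K,
$$

$$
(a_i, b_i)^\top \sim \mathrm{N}(\mathbf{0}, \Sigma_{ab}), \qquad (\epsilon_{ij}, \epsilon_{ji})^\top \sim \mathrm{N}\!\left(\mathbf{0},\ \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).
$$

Read term by term, $\boldsymbol{\beta}^\top \mathbf{x}_{ij}$ is the regression or GLM component for exogenous covariates; the sender and receiver random effects $a_i$ and $b_j$ absorb heterogeneity in ties sent and received; $\rho$ captures within-dyad correlation (reciprocity); and the $K$-dimensional multiplicative factor term $\mathbf{u}_i^\top \mathbf{v}_j$ captures higher-order dependencies such as transitivity. These components roughly correspond to the four statistics used in Figure 3.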
ERGMs are best suited for cases in which scholars are interested in studying the role that particular types of node- and dyad-based network configurations play in generating the network. Though valuable, this is often orthogonal to the interests of most researchers, who focus on the effect of a particular exogenous variable, such as democracy, on a dyadic variable like conflict while simply accounting for network dependencies. In addition, using the application dataset in this article, we show that the AME approach outperforms both ERGM and LSM in out-of-sample prediction and is also better able to capture network dependencies than the LSM.
More broadly, relational data structures are composed of actors that are part of a system.Footnote 21 It is unlikely that this system can be viewed simply as a collection of isolated actors or pairs of actors. At the very least, whether dependencies between observations occur can be examined. Failure to take interdependencies into account leads to biased parameter estimates and poorly fitting models. Using standard diagnostics such as those shown in Figure 3, one can easily assess whether an assumption of independence is reasonable. We stress this point because a common misunderstanding seems to have emerged within the social science literature relying on dyadic data: that a network-based approach is only necessary if one has theoretical explanations that extend beyond the dyad. This is not at all the case, and findings that continue to employ a dyadic design may misrepresent the effects of the very variables they are interested in. The AME approach that we have detailed here provides a statistically familiar way for scholars to account for unobserved network structures in relational data.
Supplementary material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2018.50.