1. Introduction
The traditional theory of exchangeability concerns families of random variables whose distribution is invariant with respect to symmetries of a well-defined index set. This includes the classical study of exchangeable sequences ${\left( {{X_i}} \right)_{i \in I}}$, which satisfy ${({X_{\sigma (i)}})_{i \in I}}\underline{\underline D} \,\,\,{({X_{\sigma (i)}})_{i \in I}}$ for all bijections $\alpha :\,I \to I$, with ‘$\underline{\underline {\rm{D}}} $‘ indicating equality in distribution. The concept naturally extends to two-dimensional arrays ${\left( {{X_{ij}}} \right)_{i,j \in I}}$ and arbitrary d-dimensional arrays (Xi 1···i d)i 1,...,i d∈I, for which exchangeability implies that
An exception to this setup arises in the study of exchangeable random partitions, which were initially studied by Kingman [Reference Kingman21] as models for classification of species in ecology, and have since proven useful throughout population genetics [Reference Ewens17] and statistical applications for clustering [Reference McCullagh and Yang24]; see [Reference Crane11] for a recent survey. In this case, exchangeability is defined as symmetry with respect to relabeling of an equivalence relation.
As we discuss in further detail below, Kingman’s theory of exchangeable partitions arises naturally in certain genetic and ecological contexts, in which a random sample of units are partitioned according to their membership in otherwise unlabeled categories. Here we expand upon Kingman’s initial treatment by describing a general class of random structures whose distributions are exchangeable with respect to relabeling of their relations. The theory of relational exchangeability presented here is thus a refinement of Kingman’s theory of exchangeable partitions and the theory of edge exchangeable random networks. Before characterizing this class of structures, we first discuss several motivating examples.
1.1. Exchangeable partitions
Consider a random sequence $X = \left( {{X_{1,}} \cdots \,{X_n}} \right)$ taking values in an at most countable set S. For concreteness, we may regard S as a set of species, so that X records the species for a random sample of animals from a certain population. From X, we define an equivalence relation $'{ \sim _X}'$ on $\left[ n \right]: = \left\{ {1, \cdots ,n} \right\}$ by
We write $\prod \left( X \right) = \left\{ {{B_1},\,{B_{2,}} \cdots } \right\}$ to denote the set partition whose blocks B 1, B 2, … are the equivalence classes induced $'{ \sim _X}'$. We sometimes write $'{ \sim _\prod }'$ in place of $'{ \sim _X}'$ when convenient.
Suppose now that $X = \left( {{X_{1,}} \cdots \,{X_n}} \right)$ is exchangeable, meaning that
for all permutations $\alpha :\left[ n \right] \to \left[ n \right]$. Then exchangeability of X induces exchangeability on the equivalence relation $\prod : = \prod \left( X \right)$ as defined in (1.1) in the sense that $${\Pi ^\sigma }^{\underline{\underline {\rm{D}}} }\,\Pi $$ for all permutations σ $\left[ n \right] \to \left[ n \right]$, with Πσ defined by
A key difference between exchangeable sequences and exchangeable partitions is that the equivalence classes in a set partition are unlabeled. As there are no definitive characteristics of the elements independent of how they relate to each other through the equivalence relation, the exchangeability condition is thus a distributional invariance of the associated relation instead of a distributional invariance with respect to the relabeling of the unobserved labels (i.e. species).
1.2. More general structures
The above theory of exchangeable random partitions can be refined to a wider class of structures that arise in modern data science and network analysis applications. Consider, for example, a uniform random sampling of emails from a database, e.g. the Enron email dataset [Reference Klimt and Yang22]. If emails are sampled sequentially from the database then the resulting data structure takes the form of a sequence of binary relations among individuals in a population, i.e. the sequence of sender–receiver pairs corresponding to a given email exchange, as in Figure 1, below.
Whereas email correspondences are binary (i.e. a communication between caller and receiver), other common network structures may not be. Common examples of generic interaction networks include the network of actor collaborations formed by sampling from the Internet Movie Database (IMDb) or the network of scientific coauthorships obtained by sampling articles from arXiv. These examples give rise to edge-labeled hypergraphs, as shown in Figures 5 and 6, below.
Still other examples arise, as in the path-labeled networks of Figure 7, below, which might represent the network structure obtained by sampling paths in the Internet. Such path sampling mechanisms were critical to the early studies of Internet topology; see, e.g. [Reference Achlioptas, Clauset, Kempe and Moore1] for some discussion of traceroute sampling and its impact on network analysis. We discuss these examples in further detail in Section 3.
The above examples are special cases of what we call relational structures. These structures are relational by virtue of their being observed by sampling a relation (e.g. species, phone call, movie collaboration, coauthorship, path between routers, etc.). The theory emerging is thus different from more traditional theories of exchangeability, for which the natural sampling unit, and thus the invariance, is defined with respect to relabeling of the underlying population.
As we observe beginning in Section 2, exchangeable relational data structures exhibit different probabilistic behaviors than more traditional exchangeable array data. A key difference lies in the assumed observation mechanism: whereas a random sequence (X 1, …, Xn) represents individual measurements taken on a sample of n distinct individuals, a random partition of [n], as in (1.1), represents a composite binary relation on [n]. The difference is subtle, as can be seen through the different theories for exchangeable random sequences (de Finetti’s theorem) and exchangeable random partitions (Kingman’s theorem). We begin in Section 2 with a discussion of edge exchangeable random networks, the simplest nontrivial extension of Kingman’s theory. We focus specifically on the distinction between the vertex- and edge-centric perspectives of network data, and why the latter is most appropriate for modeling networks constructed from sampled interactions, as in the Enron dataset. In Section 3, we extend this perspective to multiway interactions, such as networks constructed by observing scientific collaborations and paths in the Internet. In Section 4, we generalize these special cases into a framework for relationally labeled structures and prove our main theorem, which characterizes the probabilistic structure of relationally exchangeable structures uniquely in terms of a class of interaction propensity processes. By specializing our theory to the case of exchangeable equivalence relations, we recover Kingman’s paintbox correspondence. However, the argument needed to obtain our general representation requires a careful handling of elements called ‘blips’, which correspond to the ‘dust’ in the special case of Kingman’s theorem. We make concluding remarks in Section 5.
2. Edge-centric view of network structures
Consider sampling uniformly at random from a database of telephone calls exchanged among individuals in a countable set S. The sampling procedure results in an (S × S)-valued sequence $X = \left( {{X_1}, \cdots ,{X_n}} \right)$, where each Xi is an ordered pair (ci, ri) representing the caller ci and receiver ri. Such a call sequence can be represented by a vertex-edge labeled network in a straightforward way. For example, the sequence of ordered pairs given by
can be represented by a network-like object as shown in Figure 1(a).
Assuming that the sequence X is exchangeable induces a related exchangeability property on the induced vertex-edge labeled structure. By exchangeability of X, the model assigns equal probability to any two structures that are isomorphic up to relabeling edges, as in Figure 1(a) and (b). If we further assume that $X = \left( {{X_1},{X_n}, \cdots } \right)$ is countably infinite, then the one-to-one correspondence between the sequence X and its vertex-edge labeled network representation implies that the class of models for such data is determined by de Finetti’s theorem for exchangeable sequences.
But in much the same way that the species ‘names’ are disregarded when representing the classification of animals by a partition whose (unlabeled) blocks represent the class of animals of a given species, the essential structure of the call sequence is determined by the network structure obtained by disregarding the vertex ‘names’ in Figure 1. A key observation, which guides our development of the general theory below, is that exchangeability of the structure in Figure 1 is induced by the invariance of X with respect to relabeling its indices. In the vertex-edge labeled network representation, the indices labeling X correspond to the edge labels, thus giving rise to the concept of edge exchangeable random graphs, whose distributions are invariant with respect to arbitrary relabeling of their edges, as shown in Figure 2.
The principle of edge exchangeability was described in [Reference Crane and Dempsey13] and [Reference Crane and Dempsey14] as an alternative to the more conventional property of vertex exchangeability exhibited by graphon models, e.g. [Reference Aldous6], [Reference Bickel and Chen8], and [Reference Hoover19]. A similar nonparametric Bayesian treatment of edge exchangeability was described in [Reference Cai and Campbell9] and [Reference Cai, Campbell and Broderick10]. Edge exchangeability has a number of statistical and practical implications, which improve upon known drawbacks of vertex exchangeable models. For example, although real-world networks are widely believed to be sparse, it is well known that vertex exchangeable models are appropriate only for modeling networks that are ‘dense’. Edge exchangeable models, on the other hand, are able to handle sparsity in a natural and computationally tractable way. (See [14, Sections 4–6] for an illustration of the computation aspects of edge exchangeable models, in particular the instantiation of the Hollywood model in [14, Section 4].) Here we build upon these observations toward a more general theory of structures behaving in an analogous way to exchangeable random partitions and edge exchangeable random graphs.
To appreciate the new perspective offered by the theory of edge and relationally exchangeable networks presented here, it is important to clarify how the edge-centric perspective arises naturally out of our motivating example of phone call sampling. In the vertex-edge labeled structure in Figure 1, it may seem natural to consider the vertex labels a, b, c, d, and e as arbitrary ‘names’ chosen from S. Since the names are arbitrary, a standard line of reasoning may lead to the conclusion that the specific names do not matter, and thus that the model should be invariant under arbitrary reassignment of vertex names, as illustrated in Figure 3. But it is important to bear in mind that vertex exchangeability assumes not only distributional invariance with respect to relabeling of the sampled vertices, but also invariance with respect to relabeling of both sampled and unsampled vertices, thus implying that the observed vertices are representative of the population of all vertices (sampled and unsampled). In our running example, this is simply not the case, as uniform random sampling of telephone calls makes the observed vertices a size-biased sample according to their relative frequency of appearance in the database.
We conclude this section with a formal definition of the edge-labeled graphs shown in Figure 4, which serves as a precursor of our general framework of relational exchangeability below. Just as we disregarded the species names in passing from X to its induced equivalence relation ‘~X’ in (1.1), we may also disregard the vertex labels in Figure 1 to obtain an edge-labeled graph, as shown in Figure 4. The edge-labeled graph is defined formally as the equivalence class X≅ of all sequences X′ that yield the same structure as X after removing vertex labels:
Here we overload notation by allowing the bijection ρ : S → S to act on S × S by (c, r) ↦ (ρ(c), ρ(r)), so that ρ X′:= ρ ○ X′: [n] → S × S is well defined by composition of functions. Exchangeability of X immediately implies edge exchangeability of the edge-labeled graph associated to X≅.
Though the definition in (2.2) may appear foreboding at first, we stress that it is merely the extension of (1.1) to sequences of ordered pairs. In particular, if we let X : [n] → S be a sequence of species labels, as in the discussion preceding (1.1), then the analog to (2.2) is
which is exactly equivalent to the relation ‘~X’ defined in (1.1). We explain this notation further in our technical treatment given in Section 4. But first we present additional examples to illustrate the breadth of the framework presented here.
3. Multiway interaction networks: extending the edge-centric perspective
In the previous section we discussed data structures constructed from sampled phone calls involving exactly two individuals, a caller and receiver. But many real-world networks are constructed from interactions involving two or more constituent elements. For example, let S represent a set of scientists, and let X = (X 1, …, Xn) record names of coauthors on a collection of n scientific journal articles sampled uniformly from a database, such as arXiv. Then each Xi is a finite subset {si ,1, …, si,ki} ⊂ S and the data sequence X 1, …, Xn gives rise to a vertexhyperedge labeled structure.
Suppose that we observe the following sequence of ordered pairs obtained by the above procedure with n = 4:
As with the example from the previous section, the vertex labels serve only to distinguish among the observed elements. Once the relations of all observed vertices are recorded, through the induced network structure, these vertex labels can be disregarded. So while the data is observed in the form of a sequence, the essential structure can be represented by the hypergraph-labeled structure on the right-hand side of Figure 5.
The assumption that articles were observed by uniform sampling again makes X an exchangeable sequence, which in turn induces exchangeability on the associated hyperedgelabeled network constructed by disregarding vertex labels in the induced network structure in a manner analogous to the construction of edge-labeled graphs from the equivalence class (2.2). See Figure 6 for an illustration.
3.1. Examples of relational structures
Here we briefly describe additional relational structures that commonly arise in modern statistical applications and share the same essential structure as the above two examples. Each example points toward the basic structure of relationally exchangeable networks given in the next section. The α-structures of Section 4 unify these various structures into a single formal mathematical framework for studying relationally exchangeable structures.
We start with a similar example to the previous scientific collaboration network. Let S represent a set of individuals, and let X = (X 1, …, Xn) record names of senders and recipient lists of n emails sampled uniformly from an email database. Each Xi is an ordered pair (si, (ri 1, …, rik i)) consisting of a single element of S (the sender) and a vector of elements of S of some finite, arbitrary length (the recipients). In an email network, we could further classify the recipients as ‘direct recipients’ (i.e. listed in the ‘to’ field of the email), or recipients who were in the ‘cc’ or ‘bcc’ field of the email. This additional classification adds structure to the data. Our more general treatment of α-structures is helpful to represent this structure.
For another example, let S be a set of Internet Protocol (IP) addresses and let each entry of X = (X 1, …, Xn) correspond to the path traversed by a message sent between two IP addresses over the Internet. Each Xi corresponds to a path (si, ai,1, …, ai,k i, ti) from source si to target ti by passing through the intermediate nodes ai ,1, …, ai ,k i, altogether indicating that the path from s to ai ,1 to ai ,2 and so on until passing from si,k 1 to ti. If X was obtained, for example, by uniform random sampling of source-target pairs (si, ti) and then by applying an algorithm, such as traceroute, to obtain a path between si and ti, then the sequence of paths X is exchangeable and, therefore, so is the induced path-labeled structure shown in Figure 7. Note in Figure 7 that each path is labeled by a distinct line type, thus preserving the structure of the data in the same manner that is was observed. The more conventional approach to representing network data as a graph loses information about the dependency of the network on the path structure.
Another example in the realm of networks is to take each Xi to be an r-step ego network, obtained, e.g. by snowball sampling a neighborhood of size r from a randomly chosen vertex. The relationally labeled structure is constructed by piecing together the neighborhoods obtained from the r-neighborhoods of n randomly chosen ‘egos’ in this population. We omit the details of this example and instead move on to our general treatment.
Myriad other data structures arise according to a similar recipe as those above: let S be a set of elements, and let $\mathcal{R}$ be a set of relations on the elements in S; sample an exchangeable sequence X taking values in $\mathcal{R}$; construct a structure from X (as in Figures 1 and 5) and obtain the corresponding relationally labeled structure by removing the vertex labels (as in Figures 4, 6, and 7). The resulting structure is exchangeable with respect to relabeling of its relations, a property which we call relational exchangeability. We now define this notion formally and prove a generic structure theorem for infinite relationally exchangeable structures. The formal statement is given in Theorem 4.1. Our formal definition of relational structures and proof of the associated structural theorem offer a rigorous probabilistic foundation for statistical modeling of relational structures.
4. Relational exchangeability
Let $(S,\mathcal{S})$ be any Borel space. A countably infinite sequence of S-valued random variables X = (X 1, X 2, … ) is exchangeable if ${X^\sigma }\,\underline{\underline D} \,X\,$ for every permutation σ : ℕ → ℕ, where Xσ :=(Xσ (1), Xσ (2), … ) is the reordering of X according to σ. By de Finetti’s theorem [Reference De Finetti16], the distribution of every such countably infinite sequence can be expressed as a mixture of independent, identically distributed (i.i.d.) sequences. In particular, with $\mathcal{P}(S)$ denoting the space of probability measures on $(S,\mathcal{S})$, there is a unique probability measure ϕ on $\mathcal{P}(S)$ such that the distribution of X can be expressed as
where ν ∞ denotes the infinite product measure induced by ν on S ∞. We call the measure ϕ in (4.1) the de Finetti measure of X.
De Finetti’s theorem figures into our treatment of relationally exchangeable structures, which we now define. Note that all of the previous examples, and many more that could arise in practice, involve relational structures as they are defined in the coming section. Example 4.1 provides a concrete illustration.
4.1. Relational structures
Let α = {R 1, R 2, …} be a countable collection of relation symbols, and let α : ℕ → ℕ ∪{0} be a signature such that α ( j) = 0 implies that α (k) = 0 for all k ≥ j. An α-structure with domain A is a collection $\mathcal{A}=(A;\,R_1^{\mathcal{A}}, R_2^{\mathcal{A}},\ldots\!)$, where A is a set and each $R_j^{\cal A} \subseteq {A^{\alpha (\>j)}}$ is a relation of arity α ( j) on A. We adopt the convention that A 0 := ∅, so that $R_j^{\mathcal{A}}=\emptyset$ whenever α( j) = 0. When discussing these structures below, we may sometimes leave the domain implicit, writing $\mathcal{A}=(R_1^{\mathcal{A}},R_2^{\mathcal{A}},\ldots)$ if no confusion will result.
Remark 4.1. Although the α-structures defined above seem to be nonstandard in the probability literature, they are a standard object in the mathematical field of model theory, e.g. [Reference Hodges18], and they provide a useful setting in which to study general exchangeable structures, as we do here. For example, the framework of α-structures has proven a powerful technique in very recent studies of exchangeable structures in theoretical and applied probability theory [Reference Ackerman2], [Reference Ackerman, Freer, Kruckman and Patel4], [Reference Ackerman, Freer and Patel3], [Reference Crane12], [Reference Crane and Towsner15]. In our treatment below, we leverage this theory to study a general class of structures that arise naturally in applied probability and statistics.
Note that the domain A is for a particular α-structure. In the remainder of the text, the population of constituent elements take labels in ℤ. Therefore, every α-structure has domain A that is a finite subset of ℤ. An α-structure pertaining to a caller–receiver pair, for example, will have domain A of cardinality two; that is, A is the set of labels used to indicate the two individuals involved in the phone call. Of course, the actual elements of ℤ used to identify the particular caller and receiver are immaterial. For a set of α-structures, we are simply interested in the relations, not the identities of the constituent elements. Therefore, we will investigate the equivalence class of sets of α-structures with the same inherent relational structure.
We write dom $\mathcal{A}: = A$ for the domain of $\mathcal{A}$ and FINα for the set of all α-structures $\mathcal{A}$ with finite dom ${\cal A} \subset = \{ \ldots , - 1,0,1, \ldots \} $ such that $\sum_{j=1}^{\infty}|R_j^{\mathcal{A}}|<\infty$, where |S| denotes the cardinality of a set S. Writing dom ${\cal A} { \subset _f}$ to denote that dom $\mathcal{A}$ is a finite subset of ℤ, we have
The condition $\sum_{j\geq1}|R_j^{\mathcal{A}}|<\infty$ implies that to every ${\cal A} \in FI{N_\alpha }$ there is an $r_{\mathcal{A}}: = \max\{k\geq1\colon R_k^{\mathcal{A}}\neq\emptyset\}<\infty$, making it convenient to sometimes express ${\cal A} \in FI{N_\alpha }$ as $(\! dom\mathcal{A};\, R_1^{\mathcal{A}},\ldots,R_{r_{\mathcal{A}}}^{\mathcal{A}})$ by omitting the infinite sequence of empty relations after $r_{\mathcal{A}}$. We note that the set FINα is countable.
For any A* ⊇ A and any injection ρ : A* → A′, there is an induced action on $\mathcal{A}=(A;\, R_1^{\mathcal{A}},R_2^{\mathcal{A}},\ldots)$ by $\mathcal{A}\mapsto\rho(\mathcal{A})=(\rho(A);\,R_1^{\rho(\mathcal{A})}, R_2^{\rho(\mathcal{A})},\ldots)$, with dom $ dom\rho(\mathcal{A})=\rho(A): = \{\rho(a)\colon a\in A\}$, and
for each j = 1, 2, ….
For n ≥ 1 and any subset of finite α-structures $R\,{ \cong \over {\left[ n \right]}}$ for which each $\mathcal{A}\in\mathcal{R}$ has dom ${\cal A}{ \subset _f}\,$, we consider the set ${\text{R}}\frac{ \cong }{{\left[ n \right]}}$ of equivalence classes of $\mathcal{R}$-valued sequences $R\,{ \cong \over {\left[ n \right]}}$ obtained as follows. Any $R\,{ \cong \over {\left[ n \right]}}$ determines a relationally labeled structure by removing the labels of the elements contained in each of the x (i) while maintaining the integrity of the overall structure induced by x. Formally, we define the relationally labeled structure induced by x as the equivalence class
where $\rho {\text{ }}x:[n] \mathcal{R}$ is defined by $\left( {\rho x} \right)\left( i \right) = \left( {\rho x} \right)\left( i \right))$ as in (4.2). We write $R\frac{ \cong }{{\left[ n \right]}}$ as the set of all x≅ constructed from some $x:[n] \to \mathcal{R}$ in this way.
Remark 4.2. (Notation.) Below we reserve lowercase letters x, y, … for generic (nonrandom) objects and uppercase letters X, Y, … for random objects. We use bold letters x, y, … for generic (nonrandom) functions $x:[n] \to \mathcal{R}$ and X, Y, … for random functions $x:[n] \to \mathcal{R}$.
Example 4.1. In this example, we demonstrate how the modern data structures introduced in Section 1 fit into the framework of α-structures. For this, we write $\mathcal{A}=(A;\,R_1^{\mathcal{A}},R_2^{\mathcal{A}},\ldots)$ as an α-structure with dom $\mathcal{A}=A$ and α varying according to the context.
(a) For α (1) = 1 and $\alpha \left( k \right) = 0for\,k\, \ge \,2$, each α-structure $\mathcal{A}$ is determined by a subset $R_1^{\mathcal{A}}\subseteq A$. If we then choose $\mathcal{R} \subseteq \,FI{N_\alpha }$ to consist of all singleton subsets of ℤ, so that each $R_1^{\mathcal{A}}$ has the form $i \in $, then the equivalence class of $x:[n] \to \mathcal{R}$ in (4.3) corresponds to the equivalence relation induced by x as in (1.1).
(b) For α (1) = 2 and α (k) = 0 for k ≥ 2, each α-structure $\mathcal{A}$ with dom $\mathcal{A}=A$ corresponds to a binary relation $R_1^{\mathcal{A}}\subseteq A^2$. Taking $\mathcal{R} \subseteq {\rm{ }}FI{N_\alpha }$ to consist of all structures with $R_1^{\mathcal{A}}=\{(i,j)\}$ for i ≠ j corresponds to sampling from the phone call database in Section 1: each sampled Xk = {(i, j)} corresponds to a caller–receiver pair (i, j). The equivalence class x≅ gives an edge-labeled (directed) graph as in Figure 2. (To get an undirected graph, we would take each $R_1^{\mathcal{A}}$ to consist of symmetric pairs {(i, j), (j, i)}.)
(c) Taking α( j) = j for all j ≥ 1 means that any α-structure is determined by a collection $R_j^{\mathcal{A}}\subseteq A^j$ of subsets of j-tuples for every j ≥ 1. If we take R ⊂ FINα to consist of only those α-structures of the form $R_{k}^{\mathcal{A}}=\left\{\left(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(k)}\right)\right.$: permutations σ : [k] → [k]} for some k ≥ 1 and $R_j^{\mathcal{A}}=\emptyset$ for all j ≠ k, then the elements $\mathcal{A}\in\mathcal{R}$ can be used to represent the set of coauthors in a sample of articles.
(b) A small alteration of the previous α-structure can be used to represent the sender– receiver pair list in a sample of emails. Take α(1) = 1 and α(j + 1) = j for all j ≥ 2. Then take R ⊂ FINα to consist of only those α-structures of the form $R_{1}^{\mathcal{A}}=\{(a_1) \}$ and $R_{k+1}^{\mathcal{A}}=\left\{\left(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(k)}\right)\right.$: permutations $\alpha :\left[ k \right] \to \left[ k \right]\} $ for some k ≥ 2 and $R_{k+1}^{\mathcal{A}}=\left\{\left(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(k)}\right)\right.$ for all j ≠ k ≥ 2.
(e) Also for α( j) = j for all j ≥ 1, if we take $R \subset \,FI{N_\alpha }$ to consist of only those α-structures of the form $R_k^{\mathcal{A}}=\{(a_1,a_2,\ldots,a_{k})\}$ for some k ≥ 1 and $R_j^{\mathcal{A}}=\emptyset$ for all j ≠ k, then each $\mathcal{A}\in\mathcal{R}$ corresponds to a path from a 1 to ak, as discussed at the end of Section 1.
Note that the above formalism applies to structures with multiple relations, e.g. a graph with a distinguished community of vertices would have α(1) = 2 (for the binary relation of edges), α(2) = 1 (for the distinguished subset of vertices), and α(k) = 0 for k ≥ 3.
Example 4.1 demonstrates that α-structures are well-defined mathematical objects which compactly represent a wide range of relational structures prevalent in modern statistical settings. Examples of α-structures include graphs, hypergraphs, partitions, unary relations, and much more general structures. Our main theorem, Theorem 4.1, establishes a generic construction for infinitely exchangeable sequences of relational structures. This theorem will provide a necessary probabilistic foundation for research into statistical modeling of modern relational datasets.
4.2. Relational exchangeability
Our discussion below considers the case of random structures labeled by the countable set ℕ, which are those structures constructed just as in (4.3) but for a countable sequence $x:{\text{ }}{\Bbb N} \to \mathcal{R}$. It is important to note the difference between the index set ℕ and the domain of the structures in $\mathcal{R}$, which we take to be ℤ. The domain labeling is ‘quotiented out’ in (4.3), while the indexing ℕ remains, serving as the ‘edge labels’ in the associated interpretation as an edge-labeled graph in Figure 2.
We write $R\frac{ \cong }{{\Bbb N}}$ to denote the set of such structures and we equip $R\frac{ \cong }{{\Bbb N}}$ with the Borel σ-field associated to its product-discrete topology induced by the following metric. For any ${x_ \cong } \in \,R\frac{ \cong }{{\Bbb N}}$ and n ≥ 1, we define the restriction ${R_n}\,x \cong \,\,\,of\,x \cong $ to ${R_n}\,x \cong of\,x\, \cong $ as the structure obtained by taking any ${x^\prime } \in {x_ \cong }\,\,\left( {i.e.\,\,\,} \right.{x^\prime }:\,{\Bbb N} \to \,R$ for which there exists ρ such that ρ x′= x), restricting its domain to [n] by ${x^\prime }_{|[n]}:[n] \to \mathcal{R}$, i ↦ x′(i), and putting ${R_{n\,\,}}{x_ \cong }: = {({x^\prime }_{|[n]})_ \cong }$ as in (4.3). It is clear from the definition of $R\frac{ \cong }{{\Bbb N}}$ that this is well defined and does not depend on the specific choice of x′ ∈ x≅. We then define the metric on $R\frac{ \cong }{{\Bbb N}}$ by
with the convention that 1/∞ = 0, under which $R\frac{ \cong }{{\Bbb N}}$ is compact.
For any permutation σ : $ \to $, we define the relabeling of ${x_ \cong } \in \,R\frac{ \cong }{{\Bbb N}}$ by σ as the structure $x_{\cong}^{\sigma}$ obtained by first choosing any x′ ∈ x≅ and putting $x_{\cong}^{\sigma}: = (x'\circ\sigma)_{\cong}$, where $x'^\circ \sigma :{\text{ }}{\Bbb N} \to \mathcal{R}$ is defined by the usual composition of functions, $\left( {x' \circ \sigma } \right)\left( i \right): = x'\left( {\sigma \left( i \right)} \right)$. It is once again clear that this does not depend on the specific choice of representative x′ ∈ x≅ since the actions of $\rho : \to $ and $\sigma : \to $ commute for all x, i.e. $\rho \,\left( {{x}^{'}}\circ \sigma \right)\text{ }=\left( \rho \text{ }{{x}^{'}} \right){}^\circ \sigma $.
Definition 4.1. (Relational exchangeability.) A random structure ${x_ \cong } \in R\frac{ \cong }{{\Bbb N}}$ is relationally exchangeable if $X_ \cong ^\sigma\underline{\underline D} X \cong $ for all permutations σ : ℕ → ℕ.
4.3. Representation theorem
Our main theorem establishes a generic construction for infinite relationally exchangeable structures X≅ in ${\text{R}}\frac{ \cong }{{\Bbb N}}$. The construction proceeds by sampling a sequence X 1, X 2, … conditionally i.i.d. in a related space $\mathcal{R}^\star$ and then modifying these observations to obtain a new sequence ${X^\dagger }: = (X_1^\dagger ,X_2^\dagger , \ldots )$. We then construct the equivalence class $X_ \cong ^\dagger $ based on the sequence $X\frac{\dagger }{1},X\frac{\dagger }{2}, \ldots $ as in (4.3). Since stating the theorem formally requires some new ideas and notation, we give the construction and key ideas of the proof prior to stating the result in Theorem 4.1.
As above, let $\sigma : \to \cup \left( 0 \right)$, and let $\mathcal{R} \subseteq \,FI{N_\alpha }$ be a set of α-structures such that α ( j) = 0 implies that α (k) = 0 for all k ≥ j and each $\mathcal{A}\in\mathcal{R}$ has dom $\mathcal{A}{ \subset _f}$. Since $\mathcal{R}$ is at most countable, we may fix an ordering $\mathcal{R}=\{\mathcal{S}_n\}_{n\geq1}$ of its elements. Given any $ = (dom;{\mkern 1mu} R_1^,R_2^, \ldots ) \in {\rm{FI}}{{\rm{N}}_\alpha }$ and any [0, 1]-valued sequence ${\left( {{T_i}} \right)_{i \in }}$, we define $T\mathcal{A}\,:\!\equiv T\circ\mathcal{A}$ as the α-structure $T\mathcal{A}$ with dom $(T\mathcal{A}): = \{T_i\colon i\in dom\mathcal{A}\}$ and relations $R_j^{T\mathcal{A}}$ given by
for each j = 1, 2, …. We then define
as the set of α-structures obtained by associating [0, 1]-valued labels to the elements of $\mathcal{R}$. More generally, if $(\Xi_{i})_{i\in dom\mathcal{A}}$ is a collection of subsets ${\Xi _i} \subseteq \left[ {0,1} \right]$, then we define $\Xi\mathcal{A}$ by
We equip $\mathcal{S}_{[0,1]}$ with the σ-field on $\mathcal{S}_{[0,1]}$ generated by all sets of the form $\Xi\mathcal{A}$ with $\mathcal{A}\in\mathcal{R}$ and $\Xi=(\Xi_i)_{i\in dom\mathcal{A}}$ a collection of Borel subsets of [0, 1].
From any $x\cong \,\in \,\,R\frac{\cong }{\mathbb{N}}$ and any sequence $\xi = {\left( {{\xi _i}} \right)_{i \in }}$ of i.i.d. Uniform [0, 1] random variables, we write $\xi \,x \cong $ to denote a random ${{S}_{\left[ 0,1 \right]}}$-valued sequence $\left( {{Z_{_I}}} \right)i \ge 1$ obtained by first taking any representative x′ ∈ x≅ and then putting Zi = ξ x′(i) for each i ≥ 1, with ξ x′(i) defined as in (4.4). Since x≅ is fixed in this example, each ξ x′(i) corresponds to an assignment of Uniform[0, 1] random labels to the elements in the domain of x′(i). Since ξi ≠ ξj with probability 1 for all i ≠ j, it is immediate that the distribution of ξ x ≅ does not depend on the manner in which representative x′ is chosen.
Now, for any infinite relationally exchangeable structure ${{X}_{\cong }}\in \,R\frac{\cong }{\mathbb{N}}$ and a sequence ξ of Uniform[0, 1] random variables independent of X≅, ξ X ≅ = (Zi)i≥1 defines a random sequence obtained by putting Zi = ξ x′(i) for a representative x′ ∈ x≅ on the event X≅ = x≅. By exchangeability of X≅, the sequence Zi = (Zi)i≥1 obtained in this way is also exchangeable and de Finetti’s theorem implies that Z is distributed as an i.i.d. sequence from a random measure ν on ${{S}_{\left[ 0,1 \right]}}$, as described in (4.1).
Given ν, we define the propensity of u ∈ [0, 1] in Z by
which equals the conditional probability of the event {u ∈ dom Zj} given ν for each j ≥ 1. It is clear that, with probability 1, each u ∈ [0, 1] appears in either 0, 1, or infinitely many of the relations (Zi)i≥1. First, if ν*(u) > 0 then the strong law of large numbers implies that u occurs in infinitely many of the relations Zi with proportion ν*(u). If ν*(u) = 0 and u ∈ Zi for some i, then the probability that u occurs in Zj is ν*(u) = 0 independently for each j ≠ i and, therefore, u appears only once in Z with probability 1.
Clearly, the set $\mathcal{U}=\{u\colon \nu^\star(u)>0\}$ is at most countable since | dom (Zi)| < ∞ for each i = 1, 2, …. We can, therefore, order the elements of $\mathcal{U}$ as $u_1,u_2,\ldots$ as u 1, u 2, … such that ν*(uj) ≥ ν* (uj +1) for j ≥ 1, breaking ties ν*(v) = ν*(w) as follows. For each $\mathcal{A}\in\mathcal{R}$, we define
to be the measure assigned to the subset of $\mathcal{S}_{[0,1]}$ whose structure is consistent with $\mathcal{A}$ and which contains u in its domain. Assuming that ν*(v) = ν*(w), we assign the smaller label to v if there is some k ≥ 1 such that $\nu^\star(v;\,\mathcal{S}_j)=\nu^\star(w;\,\mathcal{S}_j)$ for all j < k and $\nu^\star(v;\,\mathcal{S}_j)=\nu^\star(w;\,\mathcal{S}_j)$. If $\nu^\star(v;\,\mathcal{S}_j)=\nu^\star(w;\,\mathcal{S}_j)$ for all j ≥ 1 then we label v and w in increasing order, i.e. if we are to assign labels j and j + 1 to v and w and if v < w, then we shall put uj = v and uj +1 = w; otherwise, we put uj = w and uj +1 = v. The ordering $\mathcal{U}=(u_j)_{j\geq1}$ is thus uniquely determined by ν and the fixed ordering of $\mathcal{R}$ chosen at the outset.
Now given $\mathcal{U}$, for any $\mathcal{A}\in\mathcal{S}_{[0,1]}$, we define $\mathcal{A}^\star$ (suppressing the dependence on ν) by replacing each occurrence of $u_j\in\mathcal{U}$ by j and replacing each occurrence of $v'\notin\mathcal{U}$ in $\mathcal{A}$ by a unique nonpositive integer z(v′) = 0, −1, −2, … so that, for v′, $v''\notin\mathcal{U}$ and both in dom $\mathcal{A}$ with v′≤ v″ we have z(v′) ≤ z(v″) and the z(v′) are chosen to be the largest possible nonpositive integers that satisfy this condition. (For example, if v 1 < v 2 < v 3 are the only elements in dom $\mathcal{A}$ that are not in $\mathcal{U}$, we assign z(v 3) = 0, z(v 2) = −1, and z(v 1) = −2.)
Every $\mathcal{A}\in\mathcal{S}_{[0,1]}$ thus corresponds to a unique such $\mathcal{A}^\star$ and we define $\mathcal{R}^\star$ as the set of all structures $\mathcal{A}^\star$ obtained in this way. Note that, since $\mathcal{R}$ is at most countable, so is $\mathcal{R}^\star$. Also, although we have constructed $\mathcal{R}^\star$ using $\mathcal{U}$ (and therefore ν), the set $\mathcal{R}^\star$ depends only on $\mathcal{R}$, justifying the term $\mathcal{R}$-simplex in our definition of $ FR$ in (4.6) below.
The elements of $\mathcal{R}^\star$ are thus α-structures $\mathcal{A}^\star$ with dom ${A^ * } \subseteq $. We define the $\mathcal{R}$-simplex $ FR$ by
on which we equip the metric
and the associated Borel σ-field.
Any $f\in FR$ determines a unique probability measure εf on ${\rm{R}}{ \cong \over }$ by first drawing X = (X 1, X 2, … ) i.i.d. from
and then constructing ${X^\dagger } = (X{\dagger \over 1},X{\dagger \over 2}, \ldots )$ from X as follows. We initialize by putting m 0 = 0. For each n ≥ 1, given mn −1, we replace the nonpositive elements in dom Xn according to the following rule.
(i) If dom Xn has no nonpositive elements then put mn = mn −1 and $X{ \cong \over n}: = {X_n}$.
(ii) If dom Xn has nonpositive elements 0, …, −k for k ≥ 0 then define $X{\dagger \over n}$ by replacing each occurrence of −i for 0 ≤ i ≤ k in Xn by mn −1 − i, then putting mn = mn −1 − k − 1 and keeping positive elements unchanged. See Example 4.2 below for an illustration.
Note the distinction between the use of nonpositive elements in the constructions of Xi and $X{\dagger \over i}$, respectively. The nonpositive elements of each $X_i \in \mathcal{R}^\star$ serve to denote the nonrecurring particles that appear only within this particular relation. The nonpositive elements of $$X{\dagger \over i}$$, on the other hand, serve to denote the nonrecurring particles across all relations in the sequence $$X\dagger $$. We define εf to be the distribution of $X{\dagger \over \cong }$ constructed by applying (4.3) to the sequence X †.
From ν, we define $f^{\nu}=(\,f_{\mathcal{B}}^{\nu})_{\mathcal{B}\in\mathcal{R}^\star}\in FR$ by
This choice of $f^{\nu}=(\,f^{\nu}_{\mathcal{B}})_{\mathcal{B}\in\mathcal{R}^\star}$ is uniquely determined by ν and the fixed ordering of $\mathcal{R}$. Conversely, given $f=(\,f_{\mathcal{B}})_{\mathcal{B}\in\mathcal{R}^\star}\in FR$, we construct a measure νf on $\mathcal{S}_{[0,1]}$ to be the distribution of ξY from (4.4) for Y drawn from distribution (4.7) and ξ = (ξi)i∈ℤ i.i.d. Uniform[0, 1] independent of Y. Proposition 4.1 proves that the above procedure does not alter the random, relationally labeled structure.
Proposition 4.1. Let X≅ be relationally exchangeable, and let θ = (θi)i∈ℤ be i.i.d. Uniform[0, 1] independent of X ≅. Then (((θ X ≅)*)†)≅ = X≅ a.s., where (θ X ≅ )* denotes the application of $^{\star}\colon \mathcal{S}_{[0,1]}\to\mathcal{R}^\star$ to each component of the sequence θ X ≅.
Proof. Let θ = (θi)i∈ℤ be i.i.d. Uniform [0, 1] independent of X≅. Each event X≅ = x≅ gives rise to a probability measure ν on $\mathcal{S}_{[0,1]}$ through the de Finetti measure of θ x ≅; see (4.1). Let $\mathcal{U}=(u_i)_{i\geq1}$ be the ordered subset of [0, 1] corresponding to the atoms of ν* as in (4.5). Since θ is i.i.d. Uniform[0, 1], we have
which implies that distinct i, j ∈ ℤ are labeled distinctly in θ x ≅ with probability 1. In particular, the nonpositive labels in (θ x ≅.)* account for all those elements that appear in only one entry of the sequence θ x ≅.It follows that ((θ x ≅)*)† ∈ x≅ with probability 1 and, thus, ((θ x ≅)*)†)≅ = x≅ with probability 1 for all possible outcomes X≅ = x≅.
Example 4.2. To illustrate the above procedure, let $\mathcal{R}$ consist of the singleton sets ({i}), i ≥ 1, just as in Example 4.1(a), written as {1}, {2}, … for simplicity. Let $\mathcal{U}=(u_i)_{i\geq1}$ be a countable subset of [0, 1]. From $\mathcal{R}=\{\{1\},\{2\},\ldots\}$, we obtain $\mathcal{R}^\star=\{\{0\},\{1\},\ldots\}$ since, for any sequence T =(Ti)i∈ℤ and any {i}, i ≥ 1, the transformed relation T{i} ≡ {Ti} has either $T_i\in\mathcal{U}$ or $T_i\not\in\mathcal{U}$. If $T_i=u_k\in\mathcal{U}$ then {Ti}* = {k} for k ≥ 1; and if $T_i=v\not\in\mathcal{U}$ then {Ti}* = ({0}).
Given $(\,f_{\{0\}},f_{\{1\}},\ldots)\in FR$, we generate a sequence X 1, X 2, … of singleton sets i.i.d. as in (4.7), from which we obtain $X_1^{\over dagger},X_2^{\over dagger},\ldots$ by reassigning occurrences of 0 to the greatest negative integer that has not yet appeared in the sequence. For example, if the sequence begins {4}, {0}, {2}, {0}, {2}, {2}, {0}, …, then we reassign labels to obtain {4}, {0}, {2}, {−1}, {2}, {2}, {−2}, …. Delabeling according to (4.3) (or equivalently (1.1)) yields the equivalence classes {1}, {2}, {3, 5, 6}, {4}, {7}.
For another example, let $\mathcal{R}$ consist of ordered pairs {(i, j)}, 1 ≤ i ≠ j < ∞, so that we are in the phone call example of Section 1. $\mathcal{U}=(u_i)_{i\geq1}$ be a countable subset of [0, 1]. Then
since any T = (Ti)i∈ℤ and any $\{(i,j)\}\in\mathcal{R}$ is transformed by T{(i, j)} = {(Ti, Tj)}, which may have $T_i=u_{i'}\in\mathcal{U}$ and $T_j=u_{j'}\in\mathcal{U}$ in which case {(Ti, Tj)}* = {(i′, j′ )}. If $T_i=u_{i'}\in\mathcal{U}$ and $T_j\not\in\mathcal{U}$, then {(Ti, Tj)}* = {(i′, 0)}. If $T_i\not\in\mathcal{U}$ and Tj = uj ′ then {(Ti, Tj)}* = {(0, j′)}. If Ti, $T_i,T_j\not\in\mathcal{U}$ then {(Ti, Tj)}* will equal {(0, −1)} or {(− 1, 0)} depending on whether Tj < Ti or Ti < Tj, respectively.
The above construction gives the following representation theorem.
Theorem 4.1. Let α be a signature, ${\mathcal R} \subseteq {\rm{FI}}{{\rm{N}}_\alpha }$, and fix an ordering $\mathcal{R}=(\mathcal{S}_n)_{n\geq1}$. Let X ≅ be a relationally exchangeable random structure in ${\rm{R}}{ \cong \over }$. Then there exists a probability measure ϕ on $ FR$ such that X ≅~εϕ, where
A canonical version of the measure ϕ in Theorem 4.1 can be constructed as in (4.8), and this is the measure we construct in the following proof.
Proof of Theorem 4.1. We proceed by constructing ϕ as in (4.8) and showing that $Y\frac{\dagger }{\cong }\sim ~{{\varepsilon }_{\phi }}$ satisfies $Y{ \over \cong }\,{X_ \cong }$.
To see this, we first let ψ be the de Finetti measure on $\mathcal{P}(\mathcal{S}_{[0,1]})$ (i.e. the space of probability measures on $\mathcal{S}_{[0,1]}$) associated to ξ X ≅ for ξ = (ξi)i∈ℤ i.i.d. Uniform[0, 1] independent of X≅. In particular, the distribution of ξ X ≅ is conditionally i.i.d. from ν ~ ψ. The measure ψ determines a measure ϕ on $ FR$ through (4.8). We need to show that $Y\,\frac{\dagger }{_{\cong }}\,\tilde{\ }{{\varepsilon }_{\phi }}$ satisfies $Y{\dagger \over \cong }{\rm{ }}{X_ \cong }$.
Given f ~ ϕ, we construct Y = (Y 1, Y 2, … ) as conditionally i.i.d. from the distribution in (4.7) and Y† as in (i) and (ii) above. Let $\mathcal{U}=(u_i)_{i\geq1}$ be the atoms of θ Y† for θ = (θi)i∈ℤ i.i.d. Uniform[0, 1] independent of Y†, let ζ = (ζk)k≤0 be i.i.d. Uniform[0, 1] independent of everything else, and define ξf = (ξi)i∈ℤ by
First note that the conditional distribution of ξf Y† given f is the same as the conditional law of θ Y† given νf, since the ξi for i ≥ 1 were constructed from the atoms of $\theta \,Y{\dagger \over \cong }$. Writing $\mathcal{D}(X\mid Y)$ to denote the conditional distribution of X given Y, we thus have
where (4.10) follows from the construction of ϕ from the de Finetti measure ψ of ξ X ≅. From (4.9) and (4.10), it follows that
and, thus,
By Proposition 4.1, we have ${(({(\theta \,Y{\dagger \over \cong })^ \star }))_ \cong } = Y{\dagger \over \cong }$ a.s. and (((θ X ≅)*)†)≅ = X≅, implying that $Y{\dagger \over \cong }{X_ \cong }$, as desired.
4.4. Special cases
As discussed in Example 4.2, the case in which $\mathcal{R}$ corresponds to the singleton sets {i} for i ≥ 1 gives $ FR$ equal to the ranked simplex
To see this, note that $\mathcal{S}_{[0,1]}$ is the set of singleton elements of [0, 1], i.e.
An exchangeable sequence X = (X 1, X 2, … ) in $\mathcal{S}_{[0,1]}$ gives rise to a random countable subset $\mathcal{U}\subset[0,1]$ of elements that appear infinitely often among the Xi and its complement $[0,1]\setminus\mathcal{U}$ consisting of all elements appearing at most once among the Xi. In the construction of X† from X described above, any occurrence Xi = {u} for $u\notin\mathcal{U}$ gives rise to $X_i^{\star}=\{0\}$, explaining why the definition of $ FR$ in (4.6) corresponds to the simplex of elements (fi)i≥0.
Following Example 4.1(b), the case of edge exchangeable random (directed) graphs corresponds to ${\mathcal R} \subset FI{N_\alpha }$ with α(1) = 2 and α(k) = 0 for k ≥ 2, with each $\mathcal{A}\in\mathcal{R}$ having $R_1^{\mathcal{A}}=\{(i,j)\},\,i\neq j$, i ≠ j. In this case, $\mathcal{S}_{[0,1]}$ corresponds to all pairs (u, v) for u, v ∈ [0, 1]. An exchangeable sequence X in $\mathcal{S}_{[0,1]}$ determines a random subset $\mathcal{U}\subset[0,1]$. Now, in each Xi there can be 0, 1, or 2 elements $v\notin\mathcal{U}$. Occurrences of such elements are replaced by 0 and −1, as explained in Example 4.2.
In general, we call the occurrences of any $u\notin\mathcal{U}$ in the sequence X blips. These elements are merely a ‘blip’ in the overall sequence X—they occur only for an instant and then never again. Our labeling convention in defining $\mathcal{R}^\star$ is that the nonpositive labels correspond to the blips. Theorem 4.1 describes how blips arise in general relationally exchangeable structures.
5. Concluding
Though relational structures are commonplace in modern statistics, the theory of relational exchangeability presented here stands apart from most other studies of exchangeable structures by de Finetti [Reference De Finetti16], Aldous [Reference Aldous5], Hoover [Reference Hoover19], Kallenberg [Reference Kallenberg20], and others [Reference Ackerman, Freer and Patel3], [Reference Austin and Panchenko7], [Reference Crane and Towsner15]. The most relevant prior work is Kingman’s on exchangeable random set partitions [Reference Kingman21], which can be reinterpreted as relationally exchangeable structures in the sense described above.
Recent work in the statistical analysis of network data underscores the significance of relationally labeled structures in applications, as many data structures which are typically represented graphically, such as social networks and networks detailing email correspondence and professional collaborations, arise from a process by which interactions or relations accumulate within a population of otherwise indistinguishable individuals. The work in [Reference Crane and Dempsey14] focused on the case of edge exchangeable random graphs, but the additional examples in Sections 1–3 involving repeated path sampling and snowball sampling are also highly relevant in network applications.
Our main representation theorem serves two immediate statistical purposes. First, the representation characterizes a general class of nonparametric statistical models of potential interest in the aforementioned applications. Second, it establishes that vertices arrive in size-biased random order in relationally exchangeable structures, explaining why the common assumption of exchangeable vertex labeling, as presented in graphon models [Reference Lovász and Szegedy23], is not tenable in many applications. We reserve discussion of these practical implications for other work; see [Reference Crane and Dempsey13] and [Reference Crane and Dempsey14].