Relational exchangeability

Harry Crane; Walter Dempsey

doi:10.1017/jpr.2019.13

Relational exchangeability

Part of: Stochastic processes Operations research and management science

Published online by Cambridge University Press: 12 July 2019

Harry Crane and

Walter Dempsey

Show author details

Harry Crane*: Affiliation:
Rutgers University
Walter Dempsey*: Affiliation:
Harvard University
*: *Postal address: Department of Statistics & Biostatistics, Rutgers University, 110 Frelinghuysen Avenue, Piscataway, NJ 08854, USA. Email address: hcrane@stat.rutgers.edu Partially supported by NSF grants CNS-1523785 and CAREER DMS-1554092.
**Postal address: Department of Statistics, Harvard University, One Oxford Street, Cambridge, MA 02138, USA. Email address: wdempsey@fas.harvard.edu

Article contents

Abstract
Introduction
Edge-centric view of network structures
Multiway interaction networks: extending the edge-centric perspective
Relational exchangeability
Concluding
References

Rights & Permissions

Abstract

A relationally exchangeable structure is a random combinatorial structure whose law is invariant with respect to relabeling its relations, as opposed to its elements. Historically, exchangeable random set partitions have been the best known examples of relationally exchangeable structures, but the concept now arises more broadly when modeling interaction data in modern network analysis. Aside from exchangeable random partitions, instances of relational exchangeability include edge exchangeable random graphs and hypergraphs, path exchangeable processes, and a range of other network-like structures. We motivate the general theory of relational exchangeability, with special emphasis on the alternative perspective it provides and its benefits in certain applied probability problems. We then prove a de Finetti-type structure theorem for the general class of relationally exchangeable structures.

Keywords

Edge exchangeable graph Kingman’s correspondence paintbox process exchangeable random partition

MSC classification

Primary: 60G09: Exchangeability

Secondary: 90B15: Network models, stochastic

Type: Research Papers
Information: Journal of Applied Probability , Volume 56 , Issue 1 , March 2019 , pp. 192 - 208

DOI: https://doi.org/10.1017/jpr.2019.13 [Opens in a new window]
Copyright: © Applied Probability Trust 2019

1. Introduction

The traditional theory of exchangeability concerns families of random variables whose distribution is invariant with respect to symmetries of a well-defined index set. This includes the classical study of exchangeable sequences ${\left( {{X_i}} \right)_{i \in I}}$, which satisfy ${({X_{\sigma (i)}})_{i \in I}}\underline{\underline D} \,\,\,{({X_{\sigma (i)}})_{i \in I}}$ for all bijections $\alpha :\,I \to I$, with ‘$\underline{\underline {\rm{D}}} $‘ indicating equality in distribution. The concept naturally extends to two-dimensional arrays ${\left( {{X_{ij}}} \right)_{i,j \in I}}$ and arbitrary d-dimensional arrays (X_i _{₁···i _d})_{i ₁,...,i _d∈I}, for which exchangeability implies that

$${({X_{\sigma ({i_1}) \cdots \sigma ({i_d})}})_{{i_1}, \ldots ,{i_d} \in I}}\,\,\,_ = ^D{({X_{{i_1} \cdots {i_d}}})_{{i_1}, \ldots ,{i_d} \in I}}\quad {\rm{for}}\,{\rm{all}}\,{\rm{permutations}}\,\,\,\,\sigma :{\rm{I}} \to {\rm{I}}.$$

An exception to this setup arises in the study of exchangeable random partitions, which were initially studied by Kingman [Reference Kingman21] as models for classification of species in ecology, and have since proven useful throughout population genetics [Reference Ewens17] and statistical applications for clustering [Reference McCullagh and Yang24]; see [Reference Crane11] for a recent survey. In this case, exchangeability is defined as symmetry with respect to relabeling of an equivalence relation.

As we discuss in further detail below, Kingman’s theory of exchangeable partitions arises naturally in certain genetic and ecological contexts, in which a random sample of units are partitioned according to their membership in otherwise unlabeled categories. Here we expand upon Kingman’s initial treatment by describing a general class of random structures whose distributions are exchangeable with respect to relabeling of their relations. The theory of relational exchangeability presented here is thus a refinement of Kingman’s theory of exchangeable partitions and the theory of edge exchangeable random networks. Before characterizing this class of structures, we first discuss several motivating examples.

1.1. Exchangeable partitions

Consider a random sequence $X = \left( {{X_{1,}} \cdots \,{X_n}} \right)$ taking values in an at most countable set S. For concreteness, we may regard S as a set of species, so that X records the species for a random sample of animals from a certain population. From X, we define an equivalence relation $'{ \sim _X}'$ on $\left[ n \right]: = \left\{ {1, \cdots ,n} \right\}$ by

(1.1)

$$i \sim xj\quad {\rm{if}}\,{\rm{and}}\,{\rm{only}}\,\,{\rm{if}}\quad {X_i} = {X_j}.$$

We write $\prod \left( X \right) = \left\{ {{B_1},\,{B_{2,}} \cdots } \right\}$ to denote the set partition whose blocks B ₁, B ₂, … are the equivalence classes induced $'{ \sim _X}'$. We sometimes write $'{ \sim _\prod }'$ in place of $'{ \sim _X}'$ when convenient.

Suppose now that $X = \left( {{X_{1,}} \cdots \,{X_n}} \right)$ is exchangeable, meaning that

$$({X_{\sigma (1)}}, \ldots ,{X_{\sigma (n)}}){\,^{\underline{\underline D} }}({X_1}, \ldots ,{X_n})$$

for all permutations $\alpha :\left[ n \right] \to \left[ n \right]$. Then exchangeability of X induces exchangeability on the equivalence relation $\prod : = \prod \left( X \right)$ as defined in (1.1) in the sense that $${\Pi ^\sigma }^{\underline{\underline {\rm{D}}} }\,\Pi $$ for all permutations σ $\left[ n \right] \to \left[ n \right]$, with Π^σ defined by

$$i \sim {\Pi ^\sigma }j\quad {\rm{if}}{\mkern 1mu} \,{\rm{and}}\,{\mkern 1mu} {\rm{only}}\,{\mkern 1mu} {\rm{if}}\quad {\sigma ^{{\rm{ - 1}}}}({\rm{i}}){ \sim _\Pi }{\sigma ^{{\rm{ - 1}}}}({\rm{j}}).$$

A key difference between exchangeable sequences and exchangeable partitions is that the equivalence classes in a set partition are unlabeled. As there are no definitive characteristics of the elements independent of how they relate to each other through the equivalence relation, the exchangeability condition is thus a distributional invariance of the associated relation instead of a distributional invariance with respect to the relabeling of the unobserved labels (i.e. species).

1.2. More general structures

The above theory of exchangeable random partitions can be refined to a wider class of structures that arise in modern data science and network analysis applications. Consider, for example, a uniform random sampling of emails from a database, e.g. the Enron email dataset [Reference Klimt and Yang22]. If emails are sampled sequentially from the database then the resulting data structure takes the form of a sequence of binary relations among individuals in a population, i.e. the sequence of sender–receiver pairs corresponding to a given email exchange, as in Figure 1, below.

Figure 1: (a) Representation of the sequence of ordered pairs in (2.1). (b) Representation of the sequence of ordered pairs in S = {a, b, c, d, e} with $X'_2=(c,a)$, $ X'_2=(d,e)$, $X'_3=(a,c)$, and $X'_4=(a,b)$. The object in (b) can be obtained from that in (a) by reordering $\left( {{X_{1,}}{X_{2,}}{X_{3,}}{X_{4,}}} \right)$ as $\left( {{X_{2,}}{X_{3,}}{X_{4,}}{X_{1,}}} \right)$ . In both (a) and (b), the edges are labeled according to where they appear in the sequence.

Whereas email correspondences are binary (i.e. a communication between caller and receiver), other common network structures may not be. Common examples of generic interaction networks include the network of actor collaborations formed by sampling from the Internet Movie Database (IMDb) or the network of scientific coauthorships obtained by sampling articles from arXiv. These examples give rise to edge-labeled hypergraphs, as shown in Figures 5 and 6, below.

Figure 5: (a) Network representation of the sequence in (3.1). Note that the ‘triangle’ with vertices a, b, and c is labeled 1, the ‘line’ with ends a and c is labeled 2, the ‘triangle’ with vertices b, d, and e is labeled 3, and the ‘rectangle’ with vertices a, d, f, and g is labeled 4. (b) Network representation of the sequence X ₁ = (b, c, e), X ₂ = (b, e), X ₃ = (c, f, g), X ₄ = (b, f, h, k). (c) Edge-labeled hypergraph induced by removing vertex labels from the networks in both (a) and (b).

Figure 6: Two edge-labeled hypergraphs equivalent up to reordering of their edge labels. The graphs in (a) and (b) have equal probability under a hyperedge exchangeable model.

Still other examples arise, as in the path-labeled networks of Figure 7, below, which might represent the network structure obtained by sampling paths in the Internet. Such path sampling mechanisms were critical to the early studies of Internet topology; see, e.g. [Reference Achlioptas, Clauset, Kempe and Moore1] for some discussion of traceroute sampling and its impact on network analysis. We discuss these examples in further detail in Section 3.

Figure 7: Both (a) and (b) correspond to structures induced by taking the equivalence class over paths between vertices. In this illustration, all paths have the same source vertex and each path is labeled by a line type, so that the structure in (b) can be obtained by re-typing the paths from (a) (i.e. type 2 to 4, 3 to 2, and 4 to 3).

The above examples are special cases of what we call relational structures. These structures are relational by virtue of their being observed by sampling a relation (e.g. species, phone call, movie collaboration, coauthorship, path between routers, etc.). The theory emerging is thus different from more traditional theories of exchangeability, for which the natural sampling unit, and thus the invariance, is defined with respect to relabeling of the underlying population.

As we observe beginning in Section 2, exchangeable relational data structures exhibit different probabilistic behaviors than more traditional exchangeable array data. A key difference lies in the assumed observation mechanism: whereas a random sequence (X ₁, …, X_n) represents individual measurements taken on a sample of n distinct individuals, a random partition of [n], as in (1.1), represents a composite binary relation on [n]. The difference is subtle, as can be seen through the different theories for exchangeable random sequences (de Finetti’s theorem) and exchangeable random partitions (Kingman’s theorem). We begin in Section 2 with a discussion of edge exchangeable random networks, the simplest nontrivial extension of Kingman’s theory. We focus specifically on the distinction between the vertex- and edge-centric perspectives of network data, and why the latter is most appropriate for modeling networks constructed from sampled interactions, as in the Enron dataset. In Section 3, we extend this perspective to multiway interactions, such as networks constructed by observing scientific collaborations and paths in the Internet. In Section 4, we generalize these special cases into a framework for relationally labeled structures and prove our main theorem, which characterizes the probabilistic structure of relationally exchangeable structures uniquely in terms of a class of interaction propensity processes. By specializing our theory to the case of exchangeable equivalence relations, we recover Kingman’s paintbox correspondence. However, the argument needed to obtain our general representation requires a careful handling of elements called ‘blips’, which correspond to the ‘dust’ in the special case of Kingman’s theorem. We make concluding remarks in Section 5.

2. Edge-centric view of network structures

Consider sampling uniformly at random from a database of telephone calls exchanged among individuals in a countable set S. The sampling procedure results in an (S × S)-valued sequence $X = \left( {{X_1}, \cdots ,{X_n}} \right)$, where each X_i is an ordered pair (c_i, r_i) representing the caller c_i and receiver r_i. Such a call sequence can be represented by a vertex-edge labeled network in a straightforward way. For example, the sequence of ordered pairs given by

(2.1)

$$\begin{equation}X_1=(a,b), \qquad X_2=(c,a), \qquad X_3=(d,e), \qquad X_4=(a,c)\end{equation}$$

can be represented by a network-like object as shown in Figure 1(a).

Assuming that the sequence X is exchangeable induces a related exchangeability property on the induced vertex-edge labeled structure. By exchangeability of X, the model assigns equal probability to any two structures that are isomorphic up to relabeling edges, as in Figure 1(a) and (b). If we further assume that $X = \left( {{X_1},{X_n}, \cdots } \right)$ is countably infinite, then the one-to-one correspondence between the sequence X and its vertex-edge labeled network representation implies that the class of models for such data is determined by de Finetti’s theorem for exchangeable sequences.

But in much the same way that the species ‘names’ are disregarded when representing the classification of animals by a partition whose (unlabeled) blocks represent the class of animals of a given species, the essential structure of the call sequence is determined by the network structure obtained by disregarding the vertex ‘names’ in Figure 1. A key observation, which guides our development of the general theory below, is that exchangeability of the structure in Figure 1 is induced by the invariance of X with respect to relabeling its indices. In the vertex-edge labeled network representation, the indices labeling X correspond to the edge labels, thus giving rise to the concept of edge exchangeable random graphs, whose distributions are invariant with respect to arbitrary relabeling of their edges, as shown in Figure 2.

Figure 2: (a) Edge-labeled network induced by disregarding vertex labels in Figure 1(a). (b) Edgelabeled graph induced by disregarding vertex labels in Figure 1(b). The graphs in (a) and (b) have equal probability under an edge exchangeable model.

The principle of edge exchangeability was described in [Reference Crane and Dempsey13] and [Reference Crane and Dempsey14] as an alternative to the more conventional property of vertex exchangeability exhibited by graphon models, e.g. [Reference Aldous6], [Reference Bickel and Chen8], and [Reference Hoover19]. A similar nonparametric Bayesian treatment of edge exchangeability was described in [Reference Cai and Campbell9] and [Reference Cai, Campbell and Broderick10]. Edge exchangeability has a number of statistical and practical implications, which improve upon known drawbacks of vertex exchangeable models. For example, although real-world networks are widely believed to be sparse, it is well known that vertex exchangeable models are appropriate only for modeling networks that are ‘dense’. Edge exchangeable models, on the other hand, are able to handle sparsity in a natural and computationally tractable way. (See [14, Sections 4–6] for an illustration of the computation aspects of edge exchangeable models, in particular the instantiation of the Hollywood model in [14, Section 4].) Here we build upon these observations toward a more general theory of structures behaving in an analogous way to exchangeable random partitions and edge exchangeable random graphs.

To appreciate the new perspective offered by the theory of edge and relationally exchangeable networks presented here, it is important to clarify how the edge-centric perspective arises naturally out of our motivating example of phone call sampling. In the vertex-edge labeled structure in Figure 1, it may seem natural to consider the vertex labels a, b, c, d, and e as arbitrary ‘names’ chosen from S. Since the names are arbitrary, a standard line of reasoning may lead to the conclusion that the specific names do not matter, and thus that the model should be invariant under arbitrary reassignment of vertex names, as illustrated in Figure 3. But it is important to bear in mind that vertex exchangeability assumes not only distributional invariance with respect to relabeling of the sampled vertices, but also invariance with respect to relabeling of both sampled and unsampled vertices, thus implying that the observed vertices are representative of the population of all vertices (sampled and unsampled). In our running example, this is simply not the case, as uniform random sampling of telephone calls makes the observed vertices a size-biased sample according to their relative frequency of appearance in the database.

Figure 3: (a) Vertex-labeled network induced by disregarding edge labels in either Figure 1(a) or (b).

We conclude this section with a formal definition of the edge-labeled graphs shown in Figure 4, which serves as a precursor of our general framework of relational exchangeability below. Just as we disregarded the species names in passing from X to its induced equivalence relation ‘~_X’ in (1.1), we may also disregard the vertex labels in Figure 1 to obtain an edge-labeled graph, as shown in Figure 4. The edge-labeled graph is defined formally as the equivalence class X_≅ of all sequences X′ that yield the same structure as X after removing vertex labels:

Figure 4: Three vertex-edge labeled networks with the same induced edge structure, as shown in the edge-labeled network on the right.

(2.2)

$$X \cong {\mkern 1mu} : = \{ {X^\prime }:[n] \to S \times S\mid \rho {X^\prime } = X{\mkern 1mu} {\rm{for}}\,{\rm{some}}\,{\rm{bijection}}\,\rho :{\rm{S}} \to {\rm{S}}\} .$$

Here we overload notation by allowing the bijection ρ : S → S to act on S × S by (c, r) ↦ (ρ(c), ρ(r)), so that ρ X′:= ρ ○ X′: [n] → S × S is well defined by composition of functions. Exchangeability of X immediately implies edge exchangeability of the edge-labeled graph associated to X_≅.

Though the definition in (2.2) may appear foreboding at first, we stress that it is merely the extension of (1.1) to sequences of ordered pairs. In particular, if we let X : [n] → S be a sequence of species labels, as in the discussion preceding (1.1), then the analog to (2.2) is

$${X_ \cong } = \{ {X^\prime }:[n] \to S\mid {(\rho ({X'_i}))_{i \in S}} = {({X_i})_{i \in S}}{\rm{ for\,\,\, some\,\,\, bijection }}\rho :S \to S\} ,$$

which is exactly equivalent to the relation ‘~_X’ defined in (1.1). We explain this notation further in our technical treatment given in Section 4. But first we present additional examples to illustrate the breadth of the framework presented here.

3. Multiway interaction networks: extending the edge-centric perspective

In the previous section we discussed data structures constructed from sampled phone calls involving exactly two individuals, a caller and receiver. But many real-world networks are constructed from interactions involving two or more constituent elements. For example, let S represent a set of scientists, and let X = (X ₁, …, X_n) record names of coauthors on a collection of n scientific journal articles sampled uniformly from a database, such as arXiv. Then each X_i is a finite subset {s_i _,1, …, s_i,ki} ⊂ S and the data sequence X ₁, …, X_n gives rise to a vertexhyperedge labeled structure.

Suppose that we observe the following sequence of ordered pairs obtained by the above procedure with n = 4:

(3.1)

$$\begin{equation}X_1 = \{a,b,c\}, \qquad X_2 = \{a,c\}, \qquad X_3 = \{b,d,e\}, \qquad X_4= \{a,d,f,g\}.\end{equation}$$

As with the example from the previous section, the vertex labels serve only to distinguish among the observed elements. Once the relations of all observed vertices are recorded, through the induced network structure, these vertex labels can be disregarded. So while the data is observed in the form of a sequence, the essential structure can be represented by the hypergraph-labeled structure on the right-hand side of Figure 5.

The assumption that articles were observed by uniform sampling again makes X an exchangeable sequence, which in turn induces exchangeability on the associated hyperedgelabeled network constructed by disregarding vertex labels in the induced network structure in a manner analogous to the construction of edge-labeled graphs from the equivalence class (2.2). See Figure 6 for an illustration.

3.1. Examples of relational structures

Here we briefly describe additional relational structures that commonly arise in modern statistical applications and share the same essential structure as the above two examples. Each example points toward the basic structure of relationally exchangeable networks given in the next section. The α-structures of Section 4 unify these various structures into a single formal mathematical framework for studying relationally exchangeable structures.

We start with a similar example to the previous scientific collaboration network. Let S represent a set of individuals, and let X = (X ₁, …, X_n) record names of senders and recipient lists of n emails sampled uniformly from an email database. Each X_i is an ordered pair (s_i, (r_i ₁, …, r_ik _{_i})) consisting of a single element of S (the sender) and a vector of elements of S of some finite, arbitrary length (the recipients). In an email network, we could further classify the recipients as ‘direct recipients’ (i.e. listed in the ‘to’ field of the email), or recipients who were in the ‘cc’ or ‘bcc’ field of the email. This additional classification adds structure to the data. Our more general treatment of α-structures is helpful to represent this structure.

For another example, let S be a set of Internet Protocol (IP) addresses and let each entry of X = (X ₁, …, X_n) correspond to the path traversed by a message sent between two IP addresses over the Internet. Each X_i corresponds to a path (s_i, a_i,₁, …, a_i,k _{_i}, t_i) from source s_i to target t_i by passing through the intermediate nodes a_i _,1, …, a_i _{,k _i}, altogether indicating that the path from s to a_i _,1 to a_i _,2 and so on until passing from s_i,k _₁ to t_i. If X was obtained, for example, by uniform random sampling of source-target pairs (s_i, t_i) and then by applying an algorithm, such as traceroute, to obtain a path between s_i and t_i, then the sequence of paths X is exchangeable and, therefore, so is the induced path-labeled structure shown in Figure 7. Note in Figure 7 that each path is labeled by a distinct line type, thus preserving the structure of the data in the same manner that is was observed. The more conventional approach to representing network data as a graph loses information about the dependency of the network on the path structure.

Another example in the realm of networks is to take each X_i to be an r-step ego network, obtained, e.g. by snowball sampling a neighborhood of size r from a randomly chosen vertex. The relationally labeled structure is constructed by piecing together the neighborhoods obtained from the r-neighborhoods of n randomly chosen ‘egos’ in this population. We omit the details of this example and instead move on to our general treatment.

Myriad other data structures arise according to a similar recipe as those above: let S be a set of elements, and let $\mathcal{R}$ be a set of relations on the elements in S; sample an exchangeable sequence X taking values in $\mathcal{R}$; construct a structure from X (as in Figures 1 and 5) and obtain the corresponding relationally labeled structure by removing the vertex labels (as in Figures 4, 6, and 7). The resulting structure is exchangeable with respect to relabeling of its relations, a property which we call relational exchangeability. We now define this notion formally and prove a generic structure theorem for infinite relationally exchangeable structures. The formal statement is given in Theorem 4.1. Our formal definition of relational structures and proof of the associated structural theorem offer a rigorous probabilistic foundation for statistical modeling of relational structures.

4. Relational exchangeability

Let $(S,\mathcal{S})$ be any Borel space. A countably infinite sequence of S-valued random variables X = (X ₁, X ₂, … ) is exchangeable if ${X^\sigma }\,\underline{\underline D} \,X\,$ for every permutation σ : ℕ → ℕ, where X^σ :=(X_σ ₍₁₎, X_σ ₍₂₎, … ) is the reordering of X according to σ. By de Finetti’s theorem [Reference De Finetti16], the distribution of every such countably infinite sequence can be expressed as a mixture of independent, identically distributed (i.i.d.) sequences. In particular, with $\mathcal{P}(S)$ denoting the space of probability measures on $(S,\mathcal{S})$, there is a unique probability measure ϕ on $\mathcal{P}(S)$ such that the distribution of X can be expressed as

(4.1)

$$P(X \in \cdot )\, = \int {{P_{\left( S \right)}}} \,{v^\infty }\left( \cdot \right)\,\phi \,\left( {dv} \right)$$

where ν ^∞ denotes the infinite product measure induced by ν on S ^∞. We call the measure ϕ in (4.1) the de Finetti measure of X.

De Finetti’s theorem figures into our treatment of relationally exchangeable structures, which we now define. Note that all of the previous examples, and many more that could arise in practice, involve relational structures as they are defined in the coming section. Example 4.1 provides a concrete illustration.

4.1. Relational structures

Let α = {R ₁, R ₂, …} be a countable collection of relation symbols, and let α : ℕ → ℕ ∪{0} be a signature such that α ( j) = 0 implies that α (k) = 0 for all k ≥ j. An α-structure with domain A is a collection $\mathcal{A}=(A;\,R_1^{\mathcal{A}}, R_2^{\mathcal{A}},\ldots\!)$, where A is a set and each $R_j^{\cal A} \subseteq {A^{\alpha (\>j)}}$ is a relation of arity α ( j) on A. We adopt the convention that A ⁰ := ∅, so that $R_j^{\mathcal{A}}=\emptyset$ whenever α( j) = 0. When discussing these structures below, we may sometimes leave the domain implicit, writing $\mathcal{A}=(R_1^{\mathcal{A}},R_2^{\mathcal{A}},\ldots)$ if no confusion will result.

Remark 4.1. Although the α-structures defined above seem to be nonstandard in the probability literature, they are a standard object in the mathematical field of model theory, e.g. [Reference Hodges18], and they provide a useful setting in which to study general exchangeable structures, as we do here. For example, the framework of α-structures has proven a powerful technique in very recent studies of exchangeable structures in theoretical and applied probability theory [Reference Ackerman2], [Reference Ackerman, Freer, Kruckman and Patel4], [Reference Ackerman, Freer and Patel3], [Reference Crane12], [Reference Crane and Towsner15]. In our treatment below, we leverage this theory to study a general class of structures that arise naturally in applied probability and statistics.

Note that the domain A is for a particular α-structure. In the remainder of the text, the population of constituent elements take labels in ℤ. Therefore, every α-structure has domain A that is a finite subset of ℤ. An α-structure pertaining to a caller–receiver pair, for example, will have domain A of cardinality two; that is, A is the set of labels used to indicate the two individuals involved in the phone call. Of course, the actual elements of ℤ used to identify the particular caller and receiver are immaterial. For a set of α-structures, we are simply interested in the relations, not the identities of the constituent elements. Therefore, we will investigate the equivalence class of sets of α-structures with the same inherent relational structure.

We write dom $\mathcal{A}: = A$ for the domain of $\mathcal{A}$ and FIN_α for the set of all α-structures $\mathcal{A}$ with finite dom ${\cal A} \subset = \{ \ldots , - 1,0,1, \ldots \} $ such that $\sum_{j=1}^{\infty}|R_j^{\mathcal{A}}|<\infty$, where |S| denotes the cardinality of a set S. Writing dom ${\cal A} { \subset _f}$ to denote that dom $\mathcal{A}$ is a finite subset of ℤ, we have

$$FI{N_\alpha }:{\rm{ = }}{\mkern 1mu} \{ A = (A;R_1^A,R_2^A, \ldots ):A{\mkern 1mu} { \subset _f}{\mkern 1mu} {\mkern 1mu} {\mkern 1mu} {\mkern 1mu} {\rm{and}}{\mkern 1mu} {\mkern 1mu} {\mkern 1mu} \sum\limits_{{\rm{j = 1}}}^\infty {\rm{|}} {\rm{R}}_{\rm{j}}^A{\rm{|}} < \infty \} .$$

The condition $\sum_{j\geq1}|R_j^{\mathcal{A}}|<\infty$ implies that to every ${\cal A} \in FI{N_\alpha }$ there is an $r_{\mathcal{A}}: = \max\{k\geq1\colon R_k^{\mathcal{A}}\neq\emptyset\}<\infty$, making it convenient to sometimes express ${\cal A} \in FI{N_\alpha }$ as $(\! dom\mathcal{A};\, R_1^{\mathcal{A}},\ldots,R_{r_{\mathcal{A}}}^{\mathcal{A}})$ by omitting the infinite sequence of empty relations after $r_{\mathcal{A}}$. We note that the set FIN_α is countable.

For any A* ⊇ A and any injection ρ : A* → A′, there is an induced action on $\mathcal{A}=(A;\, R_1^{\mathcal{A}},R_2^{\mathcal{A}},\ldots)$ by $\mathcal{A}\mapsto\rho(\mathcal{A})=(\rho(A);\,R_1^{\rho(\mathcal{A})}, R_2^{\rho(\mathcal{A})},\ldots)$, with dom $ dom\rho(\mathcal{A})=\rho(A): = \{\rho(a)\colon a\in A\}$, and

(4.2)

$$(\rho ({a_1}), \ldots ,\rho ({a_{\alpha (\>j)}})) \in R_j^{\rho ({\cal A})}\quad {\rm{ifandonlyif}}\quad ({{\rm{a}}_{\rm{1}}}, \ldots ,{{\rm{a}}_{\alpha {\rm{ = }}(\>{\rm{j}})}}) \in {\rm{R}}_{\rm{j}}^{\cal A}$$

for each j = 1, 2, ….

For n ≥ 1 and any subset of finite α-structures $R\,{ \cong \over {\left[ n \right]}}$ for which each $\mathcal{A}\in\mathcal{R}$ has dom ${\cal A}{ \subset _f}\,$, we consider the set ${\text{R}}\frac{ \cong }{{\left[ n \right]}}$ of equivalence classes of $\mathcal{R}$-valued sequences $R\,{ \cong \over {\left[ n \right]}}$ obtained as follows. Any $R\,{ \cong \over {\left[ n \right]}}$ determines a relationally labeled structure by removing the labels of the elements contained in each of the x (i) while maintaining the integrity of the overall structure induced by x. Formally, we define the relationally labeled structure induced by x as the equivalence class

(4.3)

$${x_ \cong }: = \{ {x^\prime }:[n] \to R\mid \rho \,{x^\prime } = x\,{\mkern 1mu} {\text{for}}\,{\mkern 1mu} {\text{some}}{\mkern 1mu} \,\,{\text{bijection}}{\mkern 1mu} {\mkern 1mu} {\mkern 1mu} \rho :\,{\Bbb Z} \to {\Bbb Z}\} ,$$

where $\rho {\text{ }}x:[n] \mathcal{R}$ is defined by $\left( {\rho x} \right)\left( i \right) = \left( {\rho x} \right)\left( i \right))$ as in (4.2). We write $R\frac{ \cong }{{\left[ n \right]}}$ as the set of all x_≅ constructed from some $x:[n] \to \mathcal{R}$ in this way.

Remark 4.2. (Notation.) Below we reserve lowercase letters x, y, … for generic (nonrandom) objects and uppercase letters X, Y, … for random objects. We use bold letters x, y, … for generic (nonrandom) functions $x:[n] \to \mathcal{R}$ and X, Y, … for random functions $x:[n] \to \mathcal{R}$.

Example 4.1. In this example, we demonstrate how the modern data structures introduced in Section 1 fit into the framework of α-structures. For this, we write $\mathcal{A}=(A;\,R_1^{\mathcal{A}},R_2^{\mathcal{A}},\ldots)$ as an α-structure with dom $\mathcal{A}=A$ and α varying according to the context.

(a) For α (1) = 1 and $\alpha \left( k \right) = 0for\,k\, \ge \,2$, each α-structure $\mathcal{A}$ is determined by a subset $R_1^{\mathcal{A}}\subseteq A$. If we then choose $\mathcal{R} \subseteq \,FI{N_\alpha }$ to consist of all singleton subsets of ℤ, so that each $R_1^{\mathcal{A}}$ has the form $i \in $, then the equivalence class of $x:[n] \to \mathcal{R}$ in (4.3) corresponds to the equivalence relation induced by x as in (1.1).
(b) For α (1) = 2 and α (k) = 0 for k ≥ 2, each α-structure $\mathcal{A}$ with dom $\mathcal{A}=A$ corresponds to a binary relation $R_1^{\mathcal{A}}\subseteq A^2$. Taking $\mathcal{R} \subseteq {\rm{ }}FI{N_\alpha }$ to consist of all structures with $R_1^{\mathcal{A}}=\{(i,j)\}$ for i ≠ j corresponds to sampling from the phone call database in Section 1: each sampled X_k = {(i, j)} corresponds to a caller–receiver pair (i, j). The equivalence class x_≅ gives an edge-labeled (directed) graph as in Figure 2. (To get an undirected graph, we would take each $R_1^{\mathcal{A}}$ to consist of symmetric pairs {(i, j), (j, i)}.)
(c) Taking α( j) = j for all j ≥ 1 means that any α-structure is determined by a collection $R_j^{\mathcal{A}}\subseteq A^j$ of subsets of j-tuples for every j ≥ 1. If we take R ⊂ FIN^α to consist of only those α-structures of the form $R_{k}^{\mathcal{A}}=\left\{\left(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(k)}\right)\right.$: permutations σ : [k] → [k]} for some k ≥ 1 and $R_j^{\mathcal{A}}=\emptyset$ for all j ≠ k, then the elements $\mathcal{A}\in\mathcal{R}$ can be used to represent the set of coauthors in a sample of articles.
(b) A small alteration of the previous α-structure can be used to represent the sender– receiver pair list in a sample of emails. Take α(1) = 1 and α(j + 1) = j for all j ≥ 2. Then take R ⊂ FIN^α to consist of only those α-structures of the form $R_{1}^{\mathcal{A}}=\{(a_1) \}$ and $R_{k+1}^{\mathcal{A}}=\left\{\left(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(k)}\right)\right.$: permutations $\alpha :\left[ k \right] \to \left[ k \right]\} $ for some k ≥ 2 and $R_{k+1}^{\mathcal{A}}=\left\{\left(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(k)}\right)\right.$ for all j ≠ k ≥ 2.
(e) Also for α( j) = j for all j ≥ 1, if we take $R \subset \,FI{N_\alpha }$ to consist of only those α-structures of the form $R_k^{\mathcal{A}}=\{(a_1,a_2,\ldots,a_{k})\}$ for some k ≥ 1 and $R_j^{\mathcal{A}}=\emptyset$ for all j ≠ k, then each $\mathcal{A}\in\mathcal{R}$ corresponds to a path from a ₁ to a_k, as discussed at the end of Section 1.

Note that the above formalism applies to structures with multiple relations, e.g. a graph with a distinguished community of vertices would have α(1) = 2 (for the binary relation of edges), α(2) = 1 (for the distinguished subset of vertices), and α(k) = 0 for k ≥ 3.

Example 4.1 demonstrates that α-structures are well-defined mathematical objects which compactly represent a wide range of relational structures prevalent in modern statistical settings. Examples of α-structures include graphs, hypergraphs, partitions, unary relations, and much more general structures. Our main theorem, Theorem 4.1, establishes a generic construction for infinitely exchangeable sequences of relational structures. This theorem will provide a necessary probabilistic foundation for research into statistical modeling of modern relational datasets.

4.2. Relational exchangeability

Our discussion below considers the case of random structures labeled by the countable set ℕ, which are those structures constructed just as in (4.3) but for a countable sequence $x:{\text{ }}{\Bbb N} \to \mathcal{R}$. It is important to note the difference between the index set ℕ and the domain of the structures in $\mathcal{R}$, which we take to be ℤ. The domain labeling is ‘quotiented out’ in (4.3), while the indexing ℕ remains, serving as the ‘edge labels’ in the associated interpretation as an edge-labeled graph in Figure 2.

We write $R\frac{ \cong }{{\Bbb N}}$ to denote the set of such structures and we equip $R\frac{ \cong }{{\Bbb N}}$ with the Borel σ-field associated to its product-discrete topology induced by the following metric. For any ${x_ \cong } \in \,R\frac{ \cong }{{\Bbb N}}$ and n ≥ 1, we define the restriction ${R_n}\,x \cong \,\,\,of\,x \cong $ to ${R_n}\,x \cong of\,x\, \cong $ as the structure obtained by taking any ${x^\prime } \in {x_ \cong }\,\,\left( {i.e.\,\,\,} \right.{x^\prime }:\,{\Bbb N} \to \,R$ for which there exists ρ such that ρ x′= x), restricting its domain to [n] by ${x^\prime }_{|[n]}:[n] \to \mathcal{R}$, i ↦ x′(i), and putting ${R_{n\,\,}}{x_ \cong }: = {({x^\prime }_{|[n]})_ \cong }$ as in (4.3). It is clear from the definition of $R\frac{ \cong }{{\Bbb N}}$ that this is well defined and does not depend on the specific choice of x′ ∈ x_≅. We then define the metric on $R\frac{ \cong }{{\Bbb N}}$ by

$$d({x_ \cong },{x^\prime }_ \cong ):\, = {1 \over {1 + \sup \{ n \ge 1:\,\,\,{R_n}{x_ \cong } = {R_n}{x^\prime }_ \cong \} }},\qquad {x_ \cong },{x^\prime }_ \cong \in R\mathop \cong \limits_ \,,$$

with the convention that 1/∞ = 0, under which $R\frac{ \cong }{{\Bbb N}}$ is compact.

For any permutation σ : $ \to $, we define the relabeling of ${x_ \cong } \in \,R\frac{ \cong }{{\Bbb N}}$ by σ as the structure $x_{\cong}^{\sigma}$ obtained by first choosing any x′ ∈ x_≅ and putting $x_{\cong}^{\sigma}: = (x'\circ\sigma)_{\cong}$, where $x'^\circ \sigma :{\text{ }}{\Bbb N} \to \mathcal{R}$ is defined by the usual composition of functions, $\left( {x' \circ \sigma } \right)\left( i \right): = x'\left( {\sigma \left( i \right)} \right)$. It is once again clear that this does not depend on the specific choice of representative x′ ∈ x_≅ since the actions of $\rho : \to $ and $\sigma : \to $ commute for all x, i.e. $\rho \,\left( {{x}^{'}}\circ \sigma \right)\text{ }=\left( \rho \text{ }{{x}^{'}} \right){}^\circ \sigma $.

Definition 4.1. (Relational exchangeability.) A random structure ${x_ \cong } \in R\frac{ \cong }{{\Bbb N}}$ is relationally exchangeable if $X_ \cong ^\sigma\underline{\underline D} X \cong $ for all permutations σ : ℕ → ℕ.

4.3. Representation theorem

Our main theorem establishes a generic construction for infinite relationally exchangeable structures X_≅ in ${\text{R}}\frac{ \cong }{{\Bbb N}}$. The construction proceeds by sampling a sequence X ₁, X ₂, … conditionally i.i.d. in a related space $\mathcal{R}^\star$ and then modifying these observations to obtain a new sequence ${X^\dagger }: = (X_1^\dagger ,X_2^\dagger , \ldots )$. We then construct the equivalence class $X_ \cong ^\dagger $ based on the sequence $X\frac{\dagger }{1},X\frac{\dagger }{2}, \ldots $ as in (4.3). Since stating the theorem formally requires some new ideas and notation, we give the construction and key ideas of the proof prior to stating the result in Theorem 4.1.

As above, let $\sigma : \to \cup \left( 0 \right)$, and let $\mathcal{R} \subseteq \,FI{N_\alpha }$ be a set of α-structures such that α ( j) = 0 implies that α (k) = 0 for all k ≥ j and each $\mathcal{A}\in\mathcal{R}$ has dom $\mathcal{A}{ \subset _f}$. Since $\mathcal{R}$ is at most countable, we may fix an ordering $\mathcal{R}=\{\mathcal{S}_n\}_{n\geq1}$ of its elements. Given any $ = (dom;{\mkern 1mu} R_1^,R_2^, \ldots ) \in {\rm{FI}}{{\rm{N}}_\alpha }$ and any [0, 1]-valued sequence ${\left( {{T_i}} \right)_{i \in }}$, we define $T\mathcal{A}\,:\!\equiv T\circ\mathcal{A}$ as the α-structure $T\mathcal{A}$ with dom $(T\mathcal{A}): = \{T_i\colon i\in dom\mathcal{A}\}$ and relations $R_j^{T\mathcal{A}}$ given by

(4.4)

$$({T_{{a_1}}}, \ldots ,{T_{{a_{\alpha (j)}}}}) \in R_j^{TA}\quad {\rm{if}}{\mkern 1mu} {\rm{and}}{\mkern 1mu} {\rm{only}}{\mkern 1mu} {\rm{if}}\quad ({{\rm{a}}_{\rm{1}}}, \ldots ,{{\rm{a}}_{\alpha ({\rm{j}})}}) \in {\rm{R}}_{\rm{j}}^A$$

for each j = 1, 2, …. We then define

$${{S}_{[0,1]}}:=\bigcup\limits_{n\ge 1}{\{T{{S}_{n}}:T={{\left( {{T}_{i}} \right)}_{i\in \mathbb{Z}}}\in {{\left[ 0,1 \right]}^{\mathbb{Z}}}\}}$$

as the set of α-structures obtained by associating [0, 1]-valued labels to the elements of $\mathcal{R}$. More generally, if $(\Xi_{i})_{i\in dom\mathcal{A}}$ is a collection of subsets ${\Xi _i} \subseteq \left[ {0,1} \right]$, then we define $\Xi\mathcal{A}$ by

$$\Xi {\cal A}: = \{ T{\cal A}:{T_i} \in {\Xi _i}{\rm{ for each }}i \in dom\,{\cal A}\} .$$

We equip $\mathcal{S}_{[0,1]}$ with the σ-field on $\mathcal{S}_{[0,1]}$ generated by all sets of the form $\Xi\mathcal{A}$ with $\mathcal{A}\in\mathcal{R}$ and $\Xi=(\Xi_i)_{i\in dom\mathcal{A}}$ a collection of Borel subsets of [0, 1].

From any $x\cong \,\in \,\,R\frac{\cong }{\mathbb{N}}$ and any sequence $\xi = {\left( {{\xi _i}} \right)_{i \in }}$ of i.i.d. Uniform [0, 1] random variables, we write $\xi \,x \cong $ to denote a random ${{S}_{\left[ 0,1 \right]}}$-valued sequence $\left( {{Z_{_I}}} \right)i \ge 1$ obtained by first taking any representative x′ ∈ x_≅ and then putting Z_i = ξ x′(i) for each i ≥ 1, with ξ x′(i) defined as in (4.4). Since x_≅ is fixed in this example, each ξ x′(i) corresponds to an assignment of Uniform[0, 1] random labels to the elements in the domain of x′(i). Since ξ_i ≠ ξ_j with probability 1 for all i ≠ j, it is immediate that the distribution of ξ x _≅ does not depend on the manner in which representative x′ is chosen.

Now, for any infinite relationally exchangeable structure ${{X}_{\cong }}\in \,R\frac{\cong }{\mathbb{N}}$ and a sequence ξ of Uniform[0, 1] random variables independent of X_≅, ξ X _≅ = (Zi)_i≥1 defines a random sequence obtained by putting Z_i = ξ x′(i) for a representative x′ ∈ x_≅ on the event X_≅ = x_≅. By exchangeability of X_≅, the sequence Z_i = (Z_i)_i≥1 obtained in this way is also exchangeable and de Finetti’s theorem implies that Z is distributed as an i.i.d. sequence from a random measure ν on ${{S}_{\left[ 0,1 \right]}}$, as described in (4.1).

Given ν, we define the propensity of u ∈ [0, 1] in Z by

(4.5)

$$\begin{equation}\nu^\star(u): = \nu(\{\mathcal{A}\in\mathcal{S}_{[0,1]}\colon u\in dom\mathcal{A}\}),\end{equation}$$

which equals the conditional probability of the event {u ∈ dom Z_j} given ν for each j ≥ 1. It is clear that, with probability 1, each u ∈ [0, 1] appears in either 0, 1, or infinitely many of the relations (Z_i)_i≥1. First, if ν*(u) > 0 then the strong law of large numbers implies that u occurs in infinitely many of the relations Z_i with proportion ν*(u). If ν*(u) = 0 and u ∈ Z_i for some i, then the probability that u occurs in Z_j is ν*(u) = 0 independently for each j ≠ i and, therefore, u appears only once in Z with probability 1.

Clearly, the set $\mathcal{U}=\{u\colon \nu^\star(u)>0\}$ is at most countable since | dom (Z_i)| < ∞ for each i = 1, 2, …. We can, therefore, order the elements of $\mathcal{U}$ as $u_1,u_2,\ldots$ as u ₁, u ₂, … such that ν*(u_j) ≥ ν* (u_j ₊₁) for j ≥ 1, breaking ties ν*(v) = ν*(w) as follows. For each $\mathcal{A}\in\mathcal{R}$, we define

$${{\nu }^{\star }}(u;\,A):=\nu (\{TA:T={{({{T}_{i}})}_{i\in }}\in \text{ }{{[0,1]}^{\mathbb{Z}}}\text\,\,\,{such}\text\,\,\,{that}\,\,\,u\in dom(T\,A)\}),\qquad u\in [0,1],$$

to be the measure assigned to the subset of $\mathcal{S}_{[0,1]}$ whose structure is consistent with $\mathcal{A}$ and which contains u in its domain. Assuming that ν*(v) = ν*(w), we assign the smaller label to v if there is some k ≥ 1 such that $\nu^\star(v;\,\mathcal{S}_j)=\nu^\star(w;\,\mathcal{S}_j)$ for all j < k and $\nu^\star(v;\,\mathcal{S}_j)=\nu^\star(w;\,\mathcal{S}_j)$. If $\nu^\star(v;\,\mathcal{S}_j)=\nu^\star(w;\,\mathcal{S}_j)$ for all j ≥ 1 then we label v and w in increasing order, i.e. if we are to assign labels j and j + 1 to v and w and if v < w, then we shall put u_j = v and u_j ₊₁ = w; otherwise, we put u_j = w and u_j ₊₁ = v. The ordering $\mathcal{U}=(u_j)_{j\geq1}$ is thus uniquely determined by ν and the fixed ordering of $\mathcal{R}$ chosen at the outset.

Now given $\mathcal{U}$, for any $\mathcal{A}\in\mathcal{S}_{[0,1]}$, we define $\mathcal{A}^\star$ (suppressing the dependence on ν) by replacing each occurrence of $u_j\in\mathcal{U}$ by j and replacing each occurrence of $v'\notin\mathcal{U}$ in $\mathcal{A}$ by a unique nonpositive integer z(v′) = 0, −1, −2, … so that, for v′, $v''\notin\mathcal{U}$ and both in dom $\mathcal{A}$ with v′≤ v″ we have z(v′) ≤ z(v″) and the z(v′) are chosen to be the largest possible nonpositive integers that satisfy this condition. (For example, if v ₁ < v ₂ < v ₃ are the only elements in dom $\mathcal{A}$ that are not in $\mathcal{U}$, we assign z(v ₃) = 0, z(v ₂) = −1, and z(v ₁) = −2.)

Every $\mathcal{A}\in\mathcal{S}_{[0,1]}$ thus corresponds to a unique such $\mathcal{A}^\star$ and we define $\mathcal{R}^\star$ as the set of all structures $\mathcal{A}^\star$ obtained in this way. Note that, since $\mathcal{R}$ is at most countable, so is $\mathcal{R}^\star$. Also, although we have constructed $\mathcal{R}^\star$ using $\mathcal{U}$ (and therefore ν), the set $\mathcal{R}^\star$ depends only on $\mathcal{R}$, justifying the term $\mathcal{R}$-simplex in our definition of $ FR$ in (4.6) below.

The elements of $\mathcal{R}^\star$ are thus α-structures $\mathcal{A}^\star$ with dom ${A^ * } \subseteq $. We define the $\mathcal{R}$-simplex $ FR$ by

(4.6)

$${F_R}: = \left\{ {\{ {{({f_B})}_{B \in {R^ \star }}}:{f_B} \ge 0\,{\rm{and}}\sum\limits_{B \in {R^ \star }} {{f_B}} = 1} \right\},$$

on which we equip the metric

$$d_{FR}(\,f,f'): = \sum_{\mathcal{B}\in\mathcal{R}^\star}|f_{\mathcal{B}}-f'_{\mathcal{B}}|,\qquad f,f'\in FR,$$

and the associated Borel σ-field.

Any $f\in FR$ determines a unique probability measure ε_f on ${\rm{R}}{ \cong \over }$ by first drawing X = (X ₁, X ₂, … ) i.i.d. from

(4.7)

$$\begin{equation}\label{eq:f-induced}\mathbb{P}(X_i=\mathcal{B}\mid f)=f_{\mathcal{B}},\qquad\mathcal{B}\in\mathcal{R}^\star,\end{equation}$$

and then constructing ${X^\dagger } = (X{\dagger \over 1},X{\dagger \over 2}, \ldots )$ from X as follows. We initialize by putting m ₀ = 0. For each n ≥ 1, given m_n ₋₁, we replace the nonpositive elements in dom X_n according to the following rule.

(i) If dom X_n has no nonpositive elements then put m_n = m_n ₋₁ and $X{ \cong \over n}: = {X_n}$.
(ii) If dom X_n has nonpositive elements 0, …, −k for k ≥ 0 then define $X{\dagger \over n}$ by replacing each occurrence of −i for 0 ≤ i ≤ k in X_n by m_n ₋₁ − i, then putting m_n = m_n ₋₁ − k − 1 and keeping positive elements unchanged. See Example 4.2 below for an illustration.

Note the distinction between the use of nonpositive elements in the constructions of X_i and $X{\dagger \over i}$, respectively. The nonpositive elements of each $X_i \in \mathcal{R}^\star$ serve to denote the nonrecurring particles that appear only within this particular relation. The nonpositive elements of $$X{\dagger \over i}$$, on the other hand, serve to denote the nonrecurring particles across all relations in the sequence $$X\dagger $$. We define ε_f to be the distribution of $X{\dagger \over \cong }$ constructed by applying (4.3) to the sequence X ^†.

From ν, we define $f^{\nu}=(\,f_{\mathcal{B}}^{\nu})_{\mathcal{B}\in\mathcal{R}^\star}\in FR$ by

(4.8)

$$\begin{equation}\label{eq:canonical}f^{\nu}_{\mathcal{B}}: = \nu(\{\mathcal{A}\in\mathcal{S}_{[0,1]}\colon\mathcal{A}^\star=\mathcal{B}\}),\qquad\mathcal{B}\in\mathcal{R}^\star.\end{equation}$$

This choice of $f^{\nu}=(\,f^{\nu}_{\mathcal{B}})_{\mathcal{B}\in\mathcal{R}^\star}$ is uniquely determined by ν and the fixed ordering of $\mathcal{R}$. Conversely, given $f=(\,f_{\mathcal{B}})_{\mathcal{B}\in\mathcal{R}^\star}\in FR$, we construct a measure ν_f on $\mathcal{S}_{[0,1]}$ to be the distribution of ξY from (4.4) for Y drawn from distribution (4.7) and ξ = (ξ_i)_i∈ℤ i.i.d. Uniform[0, 1] independent of Y. Proposition 4.1 proves that the above procedure does not alter the random, relationally labeled structure.

Proposition 4.1. Let X_≅ be relationally exchangeable, and let θ = (θ_i)_i∈ℤ be i.i.d. Uniform[0, 1] independent of X _≅. Then (((θ X _≅)*)^†)_≅ = X_≅ a.s., where (θ X _≅ )* denotes the application of $^{\star}\colon \mathcal{S}_{[0,1]}\to\mathcal{R}^\star$ to each component of the sequence θ X _≅.

Proof. Let θ = (θ_i)_i∈ℤ be i.i.d. Uniform [0, 1] independent of X_≅. Each event X_≅ = x_≅ gives rise to a probability measure ν on $\mathcal{S}_{[0,1]}$ through the de Finetti measure of θ x _≅; see (4.1). Let $\mathcal{U}=(u_i)_{i\geq1}$ be the ordered subset of [0, 1] corresponding to the atoms of ν* as in (4.5). Since θ is i.i.d. Uniform[0, 1], we have

$$\mathbb{P}\{\theta_i\neq\theta_j\text{ for all }i\neq j\}=1,$$

which implies that distinct i, j ∈ ℤ are labeled distinctly in θ x _≅ with probability 1. In particular, the nonpositive labels in (θ x _≅.)* account for all those elements that appear in only one entry of the sequence θ x _≅.It follows that ((θ x _≅)*)^† ∈ x_≅ with probability 1 and, thus, ((θ x _≅)*)^†)_≅ = x_≅ with probability 1 for all possible outcomes X_≅ = x_≅.

Example 4.2. To illustrate the above procedure, let $\mathcal{R}$ consist of the singleton sets ({i}), i ≥ 1, just as in Example 4.1(a), written as {1}, {2}, … for simplicity. Let $\mathcal{U}=(u_i)_{i\geq1}$ be a countable subset of [0, 1]. From $\mathcal{R}=\{\{1\},\{2\},\ldots\}$, we obtain $\mathcal{R}^\star=\{\{0\},\{1\},\ldots\}$ since, for any sequence T =(T_i)_i∈ℤ and any {i}, i ≥ 1, the transformed relation T{i} ≡ {T_i} has either $T_i\in\mathcal{U}$ or $T_i\not\in\mathcal{U}$. If $T_i=u_k\in\mathcal{U}$ then {T_i}* = {k} for k ≥ 1; and if $T_i=v\not\in\mathcal{U}$ then {T_i}* = ({0}).

Given $(\,f_{\{0\}},f_{\{1\}},\ldots)\in FR$, we generate a sequence X ₁, X ₂, … of singleton sets i.i.d. as in (4.7), from which we obtain $X_1^{\over dagger},X_2^{\over dagger},\ldots$ by reassigning occurrences of 0 to the greatest negative integer that has not yet appeared in the sequence. For example, if the sequence begins {4}, {0}, {2}, {0}, {2}, {2}, {0}, …, then we reassign labels to obtain {4}, {0}, {2}, {−1}, {2}, {2}, {−2}, …. Delabeling according to (4.3) (or equivalently (1.1)) yields the equivalence classes {1}, {2}, {3, 5, 6}, {4}, {7}.

For another example, let $\mathcal{R}$ consist of ordered pairs {(i, j)}, 1 ≤ i ≠ j < ∞, so that we are in the phone call example of Section 1. $\mathcal{U}=(u_i)_{i\geq1}$ be a countable subset of [0, 1]. Then

$${\varepsilon _\phi }( \cdot ) = \int_{FR} {{\varepsilon _f}} ( \cdot )\phi (df).$$

since any T = (T_i)_i∈ℤ and any $\{(i,j)\}\in\mathcal{R}$ is transformed by T{(i, j)} = {(T_i, T_j)}, which may have $T_i=u_{i'}\in\mathcal{U}$ and $T_j=u_{j'}\in\mathcal{U}$ in which case {(T_i, T_j)}* = {(i′, j′ )}. If $T_i=u_{i'}\in\mathcal{U}$ and $T_j\not\in\mathcal{U}$, then {(T_i, T_j)}* = {(i′, 0)}. If $T_i\not\in\mathcal{U}$ and T_j = u_j _′ then {(T_i, T_j)}* = {(0, j′)}. If T_i, $T_i,T_j\not\in\mathcal{U}$ then {(T_i, T_j)}* will equal {(0, −1)} or {(− 1, 0)} depending on whether T_j < T_i or T_i < T_j, respectively.

The above construction gives the following representation theorem.

Theorem 4.1. Let α be a signature, ${\mathcal R} \subseteq {\rm{FI}}{{\rm{N}}_\alpha }$, and fix an ordering $\mathcal{R}=(\mathcal{S}_n)_{n\geq1}$. Let X _≅ be a relationally exchangeable random structure in ${\rm{R}}{ \cong \over }$. Then there exists a probability measure ϕ on $ FR$ such that X _≅~εϕ, where

$$\mathcal{R}^{\star}=\{\{(i,0)\}\colon i\geq1\}\cup\{\{(0,i)\}\colon \geq1\}\cup\{\{(0,-1)\},\{(\!-1,0)\}\}\cup\mathcal{R},$$

A canonical version of the measure ϕ in Theorem 4.1 can be constructed as in (4.8), and this is the measure we construct in the following proof.

Proof of Theorem 4.1. We proceed by constructing ϕ as in (4.8) and showing that $Y\frac{\dagger }{\cong }\sim ~{{\varepsilon }_{\phi }}$ satisfies $Y{ \over \cong }\,{X_ \cong }$.

To see this, we first let ψ be the de Finetti measure on $\mathcal{P}(\mathcal{S}_{[0,1]})$ (i.e. the space of probability measures on $\mathcal{S}_{[0,1]}$) associated to ξ X _≅ for ξ = (ξ_i)_i∈ℤ i.i.d. Uniform[0, 1] independent of X_≅. In particular, the distribution of ξ X _≅ is conditionally i.i.d. from ν ~ ψ. The measure ψ determines a measure ϕ on $ FR$ through (4.8). We need to show that $Y\,\frac{\dagger }{_{\cong }}\,\tilde{\ }{{\varepsilon }_{\phi }}$ satisfies $Y{\dagger \over \cong }{\rm{ }}{X_ \cong }$.

Given f ~ ϕ, we construct Y = (Y ₁, Y ₂, … ) as conditionally i.i.d. from the distribution in (4.7) and Y^† as in (i) and (ii) above. Let $\mathcal{U}=(u_i)_{i\geq1}$ be the atoms of θ Y^† for θ = (θ_i)_i∈ℤ i.i.d. Uniform[0, 1] independent of Y^†, let ζ = (ζ_k)_k≤0 be i.i.d. Uniform[0, 1] independent of everything else, and define ξ_f = (ξ_i)_i∈ℤ by

$$\xi_i=\Bigg\{\begin{array}{@{}ll}u_i,& i\geq1,\\\zeta_i,& \text{otherwise.}\end{array}$$

First note that the conditional distribution of ξ_f Y^† given f is the same as the conditional law of θ Y^† given ν_f, since the ξ_i for i ≥ 1 were constructed from the atoms of $\theta \,Y{\dagger \over \cong }$. Writing $\mathcal{D}(X\mid Y)$ to denote the conditional distribution of X given Y, we thus have

(4.9)

$$\matrix{ {{\cal D}({\xi _f}Y\mathop {\dagger \,}\limits_ \cong \mid f)} \hfill & { = {\cal D}(\theta \,Y\dagger \mid {\nu _f}),} \hfill \cr } $$

(4.10)

$$\matrix{ {{\cal D}({\xi _f}Y\mathop {\dagger \,}\limits_ \cong \mid f)} \hfill & { = {\cal D}(\theta {X_ \cong }\mid \,{\nu _f}),} \hfill \cr } $$

where (4.10) follows from the construction of ϕ from the de Finetti measure ψ of ξ X _≅. From (4.9) and (4.10), it follows that

$$\matrix{ {{\cal D}({{(({{({\xi _f}Y\mathop \dagger \limits_{_ \cong } )}^ \star }))}_ \cong }\mid f) = {\cal D}({{({{({{(\theta \,Y\mathop \dagger \limits_{_ \cong } )}^ \star })}^\dagger })}_ \cong }\,\mid {\nu _f}),} \cr {{\cal D}({{(({{({\xi _f}Y\mathop \dagger \limits_{_ \cong } )}^ \star }))}_ \cong }\mid \,f) = {\cal D}({{({{({{(\theta \,X\mathop \dagger \limits_{_ \cong } )}^ \star })}^\dagger })}_ \cong }\,\mid \,{\nu _f}),} \cr } $$

and, thus,

$${\cal D}({({({(\theta \,{Y^{\mathop \dagger \limits_{_ \cong } }})^ \star })^\dagger })_{ \cong \,\,}}\mid \,\,{\nu _f}) = {\cal D}({({({(\theta {X_ \cong })^ \star })^\dagger })_{ \cong \,\,}}\mid \,\,{\nu _f}).$$

By Proposition 4.1, we have ${(({(\theta \,Y{\dagger \over \cong })^ \star }))_ \cong } = Y{\dagger \over \cong }$ a.s. and (((θ X _≅)*)^†)_{≅ =} X_≅, implying that $Y{\dagger \over \cong }{X_ \cong }$, as desired.

4.4. Special cases

As discussed in Example 4.2, the case in which $\mathcal{R}$ corresponds to the singleton sets {i} for i ≥ 1 gives $ FR$ equal to the ranked simplex

$$\Delta^{\downarrow}: = \Bigg\{(\,f_0,f_1,\ldots)\colon f_1\geq f_2\geq\cdots\geq0,\ f_0\geq0,\ \sum_{i\geq0}f_i=1\Bigg\}.$$

To see this, note that $\mathcal{S}_{[0,1]}$ is the set of singleton elements of [0, 1], i.e.

$$\mathcal{S}_{[0,1]}: = \{\{u\}\colon u\in[0,1]\}.$$

An exchangeable sequence X = (X ₁, X ₂, … ) in $\mathcal{S}_{[0,1]}$ gives rise to a random countable subset $\mathcal{U}\subset[0,1]$ of elements that appear infinitely often among the X_i and its complement $[0,1]\setminus\mathcal{U}$ consisting of all elements appearing at most once among the X_i. In the construction of X^† from X described above, any occurrence X_i = {u} for $u\notin\mathcal{U}$ gives rise to $X_i^{\star}=\{0\}$, explaining why the definition of $ FR$ in (4.6) corresponds to the simplex of elements (f_i)_i≥0.

Following Example 4.1(b), the case of edge exchangeable random (directed) graphs corresponds to ${\mathcal R} \subset FI{N_\alpha }$ with α(1) = 2 and α(k) = 0 for k ≥ 2, with each $\mathcal{A}\in\mathcal{R}$ having $R_1^{\mathcal{A}}=\{(i,j)\},\,i\neq j$, i ≠ j. In this case, $\mathcal{S}_{[0,1]}$ corresponds to all pairs (u, v) for u, v ∈ [0, 1]. An exchangeable sequence X in $\mathcal{S}_{[0,1]}$ determines a random subset $\mathcal{U}\subset[0,1]$. Now, in each X_i there can be 0, 1, or 2 elements $v\notin\mathcal{U}$. Occurrences of such elements are replaced by 0 and −1, as explained in Example 4.2.

In general, we call the occurrences of any $u\notin\mathcal{U}$ in the sequence X blips. These elements are merely a ‘blip’ in the overall sequence X—they occur only for an instant and then never again. Our labeling convention in defining $\mathcal{R}^\star$ is that the nonpositive labels correspond to the blips. Theorem 4.1 describes how blips arise in general relationally exchangeable structures.

5. Concluding

Though relational structures are commonplace in modern statistics, the theory of relational exchangeability presented here stands apart from most other studies of exchangeable structures by de Finetti [Reference De Finetti16], Aldous [Reference Aldous5], Hoover [Reference Hoover19], Kallenberg [Reference Kallenberg20], and others [Reference Ackerman, Freer and Patel3], [Reference Austin and Panchenko7], [Reference Crane and Towsner15]. The most relevant prior work is Kingman’s on exchangeable random set partitions [Reference Kingman21], which can be reinterpreted as relationally exchangeable structures in the sense described above.

Recent work in the statistical analysis of network data underscores the significance of relationally labeled structures in applications, as many data structures which are typically represented graphically, such as social networks and networks detailing email correspondence and professional collaborations, arise from a process by which interactions or relations accumulate within a population of otherwise indistinguishable individuals. The work in [Reference Crane and Dempsey14] focused on the case of edge exchangeable random graphs, but the additional examples in Sections 1–3 involving repeated path sampling and snowball sampling are also highly relevant in network applications.

Our main representation theorem serves two immediate statistical purposes. First, the representation characterizes a general class of nonparametric statistical models of potential interest in the aforementioned applications. Second, it establishes that vertices arrive in size-biased random order in relationally exchangeable structures, explaining why the common assumption of exchangeable vertex labeling, as presented in graphon models [Reference Lovász and Szegedy23], is not tenable in many applications. We reserve discussion of these practical implications for other work; see [Reference Crane and Dempsey13] and [Reference Crane and Dempsey14].

References

Achlioptas, D., Clauset, A., Kempe, D. and Moore, C. (2005). On the bias of traceroute sampling or, power-law degree distributions in regular graphs. In Proc. 37th Annual ACM Symposium on Theory of Computing, Association for Computing Machinery, New York, pp. 694–703.Google Scholar

Ackerman, N. (2015). Representations of Aut(ℳ)-invariant measures: part 1. Preprint. Available at https://arxiv.org/abs/1509.06170v1.Google Scholar

Ackerman, N., Freer, C. and Patel, R. (2014) Invariant measures concentrated on countable structures. Preprint. Available at https://arxiv.org/abs/1206.4011v3.Google Scholar

Ackerman, N., Freer, C., Kruckman, A. and Patel, R. (2017). Properly ergodic structures. Preprint. Available at https://arxiv.org/abs/1710.09336v1.Google Scholar

Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. J. Multivariate Anal. 11, 581–598.CrossRef Google Scholar

Aldous, D. J. (1985). Exchangeability and related topics. In École d’été de Probabilités de Saint-Flour, XIII—1983 (Lecture Notes Math. 1117), Springer, Berlin, pp. 1–198.CrossRef Google Scholar

Austin, T. and Panchenko, D. (2014). A hierarchical version of the de Finetti and Aldous-Hoover representations. Prob. Theory Relat. Fields 159, 809–823.CrossRef Google Scholar

Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Nat. Acad. Sci. USA 106, 21068–21073.CrossRef Google Scholar PubMed

Cai, D. and Campbell, T. (2015). Edge-exchangeable graphs and sparsity. In NIPS 2015 Workshop on Networks in the Social and Information Sciences.Google Scholar

Cai, D., Campbell, T. and Broderick, T. (2016). Edge-exchangeable graphs and sparsity. Neural Information Processing Systems.Google Scholar

Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci. 31, 1–19.CrossRef Google Scholar

Crane, H. (2018). Combinatorial Lévy processes. Ann. Appl. Prob. 28, 395–339.CrossRef Google Scholar

Crane, H. and Dempsey, W. (2016). A framework for statistical network modeling. Preprint. Available at https://arxiv.org/abs/1509.08185v4.Google Scholar

Crane, H. and Dempsey, W. (2018). Edge exchangeable models for interaction networks. J. Amer. Statist. Assoc. 113, 1311–1326CrossRef Google Scholar PubMed

Crane, H. and Towsner, H. (2018). Relatively exchangeable structures. J. Symbolic Logic 83, 416–442.CrossRef Google Scholar

De Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincaré 7, 1–68.Google Scholar

Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoret. Pop. Biol. 3, 87–112.CrossRef Google Scholar PubMed

Hodges, W. (1993). Model Theory (Encyclopedia Math. Appl. 42). Cambridge University Press.Google Scholar

Hoover, D. (1979). Relations on probability spaces and arrays of random variables. Preprint. Institute for Advanced Studies, Princeton, NJ.Google Scholar

Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles. Springer, New York.Google Scholar

Kingman, J. F. C. (1978). Random partitions in population genetics. Proc. R. Soc. London A 361, 1–20.CrossRef Google Scholar

Klimt, B. and Yang, Y. (2004). Introducing the enron corpus. CEAS. In Proc. CEAS 2004 (Mountain View, CA, 2004).Google Scholar

Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. J. Combinatorial Theory B 96, 933–957.CrossRef Google Scholar

McCullagh, P. and Yang, J. (2006). Stochastic classification models. In International Congress of Mathematicians. Vol. III, European Mathematical Society, Zürich, pp. 669–686.Google Scholar

Figure 5: (a) Network representation of the sequence in (3.1). Note that the ‘triangle’ with vertices a, b, and c is labeled 1, the ‘line’ with ends a and c is labeled 2, the ‘triangle’ with vertices b, d, and e is labeled 3, and the ‘rectangle’ with vertices a, d, f, and g is labeled 4. (b) Network representation of the sequence X1 = (b, c, e), X2 = (b, e), X3 = (c, f, g), X4 = (b, f, h, k). (c) Edge-labeled hypergraph induced by removing vertex labels from the networks in both (a) and (b).

Figure 6: Two edge-labeled hypergraphs equivalent up to reordering of their edge labels. The graphs in (a) and (b) have equal probability under a hyperedge exchangeable model.

Figure 3: (a) Vertex-labeled network induced by disregarding edge labels in either Figure 1(a) or (b).

Figure 4: Three vertex-edge labeled networks with the same induced edge structure, as shown in the edge-labeled network on the right.

Article contents

Relational exchangeability

Abstract

Keywords

MSC classification

1. Introduction

1.1. Exchangeable partitions

1.2. More general structures

2. Edge-centric view of network structures

3. Multiway interaction networks: extending the edge-centric perspective

3.1. Examples of relational structures

4. Relational exchangeability

4.1. Relational structures

4.2. Relational exchangeability

4.3. Representation theorem

4.4. Special cases

5. Concluding

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests