A Prolegomenon to Differential Equations and Variational Methods on Graphs

Yves van Gennip; Jeremy Budd

doi:10.1017/9781009346641

1 Introduction

The Element you are about to read tells a tale that shrank in the writing. The opposite is true for its companion volume [Reference van Gennip and Budd190]. When we were asked to write a contribution to what would become the Elements in Non-local Data Interactions: Foundations and Applications series in April 2021,Footnote ¹ quickly the idea took hold to provide an overview, a snapshot of the field of differential equations and variational methods on graphs and their applications to machine learning, image processing, and image analysis. But this is a very active field, being developed in a variety of different directions, and so the story we wanted to tell outgrew the format of a contribution to the Cambridge Elements series. We are grateful to Cambridge University Press and the editors for deeming our full contribution worthy enough of its own publication [Reference van Gennip and Budd190] and allowing us to extract and adapt some introductory sections for publication in the Elements series. This Element provides an introduction to differential equations and variational methods on graphs seen through the lenses of the authors. It focuses on some areas to which we ourselves have actively contributed, but at the same time aims to give sufficient background to equip the interested reader to explore this fascinating topic. We have aimed to make this Element a satisfyingly comprehensive read in itself, while also allowing it to be a teaser for its more extensive companion volume. Those who compare both volumes will find that the companion volume contains, besides additional details in some of the sections that correspond to those in the current Element, chapters on applications of differential equations on graphs and their computational implementation, as well as chapters on further theoretical explorations of the relationships between various graph-based models and of the continuum limits of such models when the number of nodes of the underlying graph is taken to infinity.

Differential equations, both ordinary differential equations (ODEs) and partial differential equations (PDEs), have a long history in mathematics and we assume they need no introduction to the reader. Slightly younger, yet still of a venerable age, is the study of graphs.Footnote ² By a differential equation on a graph we mean a discretization of a differential equation, usually a PDE, on a graph: if we can write the PDE as $F (u (x, t)) = 0$ , for a differential operator $F$ and a function $u$ defined on (a subset of) $R^{m} \times R$ , then we obtain a differential equationFootnote ³ on a graph by replacing the (spatial) derivatives with respect to $x$ in $F$ by finite difference operators based on the structure of the graph and replacing $u$ by a function defined on (a subset of) $V \times R$ , where $V$ is the set of nodes of the graph.

Variational models in the calculus of variations are typically formulated in terms of minimization of a function(al). To consider such a model on a graph, we discretize the functional by replacing integrals with the appropriate sums and differential operators with the corresponding finite difference operators on graphs. This process also turns finite-dimensional the space of admissible functions over which the functional is minimized. The line between calculus of variations and mathematical optimization becomes blurry here, or is even crossed. Because the authors of this Element approach these models from the point of view of variational calculus, we will include them under the umbrella of variational models (on graphs), even if the originators of any particular model may have had a different inspiration when proposing that model.

The field of machine learning is concerned with the development of methods and algorithms to analyse data sets. ‘Learning’ in this context refers to leveraging the properties of some collection of ‘training data’ (which may or may not be a part of the data set which is to be analysed) to draw conclusions about the data set. Machine learning has undertaken an enormous flight in the twenty-first century. Terms like ‘big data’, ‘machine learning’, and ‘artificial intelligence’ are now commonplace for many people, both because of the commercial successes of the many tech companies that exist by the grace of data availability and the methods to learn from the data, and because of the enormous speed with which deep-learned algorithms have transformed many areas of science, industry, and public and private life. Scientific curiosity goes hand in hand with a societal need to understand the methods that play such a big role in so many sectors.

But what is the role of differential equations in all of this? After all, many of the advances in machine learning, both in terms of development of new methods and analysis of existing ones, come from statistics, computer science, and the specific application fields – scientific or industrial – where the methods are used. One might argue for the general notion that increased diversity in the points of view from which a particular topic is scrutinized leads to different and complementary insights that all strengthen the full picture. There certainly is validity in that notion when it comes to the study of machine learning; in this case, though, there are stronger ties that go beyond generalities.

A substantial part of the root system of differential equations in machine learning lies in the field(s) of mathematical image processing and image analysis.Footnote ⁴ Not only do (the ingredients of) many differential-equation-based machine learning methods have roots in the mathematical imaging literature and many machine learning methods have applications in imaging problems, but there is also a substantial overlap in the communities active in these fields.

Despite the success of artificial neural networks, they are not central objects in this Element. The main focus in the current Element is on those methods that represent the data (e.g., the image) in terms of a graph in order to apply a graph-based variational model or differential equation.

These methods have many desirable properties, which have made them popular in recent years. Because the graph-based models and equations have close connections to well-established models and equations in the continuum setting, there is a wealth of ideas to pursue and techniques to apply to study and understand them. Moreover, the development of numerical methods for differential equations has a long and rich history, yielding many algorithms that can be adapted to the graph-based setting. Another key difference between most machine learning methods discussed in this Element compared to deep learning methods is that the latter are mostly data-driven,Footnote ⁵ which usually means many training data are required to obtain well-performing networks, while the former are explicitly model-driven and so tend to require fewer training data (but also are less likely to discover patterns for which the models are not built to look).

The scope of this Element is broad in some parts and narrow in others. On the one hand, we wish to provide an overview of an exciting, ever-broadening, and expansive field. Thus, in the general literature overview in Section 2 we have taken a very broad view of the topic of differential equations on graphs and their applications with the aim of placing it in its historical context and pointing the reader to the many aspects and fields that are closely related to it. On the other hand, in the remainder of this Element, we focus on very specific models with a certain amount of bias in the direction of those models with which the authors have close experience themselves, yet not to the complete exclusion of other models. These models revolve around the graph Ginzburg–Landau functional, which is introduced in this Element in Section 5 and which plays a central role in a number of graph-based clustering and classification methods.

Graph clustering and classification are similar tasks that both aim to use the structure of the graph to group its nodes into subsets called clusters or classes (or sometimes communities or phases, depending on the context or application). In most of the settings that we discuss here, these subsets are required to be pairwise disjoint, so that mathematically we can speak of a partition of the node set (assuming nonempty subsets) and we obtain non-overlapping clusters or classes. A key guiding principle of both tasks is to have strong connections (i.e., many edges or, in an edge-weighted graph, highly weighted edges) between nodes in the same class or cluster and few between nodes in different clusters or classes.Footnote ⁶ This is not the only requirement for a good clustering or classification – in the absence of any other desiderata, a trivial partition of the node set into one cluster would maximize intra-cluster connectivity and minimize inter-cluster connectivity – and so additional demands are typically imposed. Two types of constraints that will receive close attention in this Element are constraints on cluster sizes and constraints that encourage fidelity to a priori known class membership of certain nodes. The presence of such a priori known labels is what sets classification apart from clustering.

There are mathematical imaging tasks that can be formulated in terms of graph clustering or classification, most notably image segmentation, which can be viewed as the task of clustering or classifying the pixels of a digital image based on their contribution to (or membership of) objects of interest in the image. Other imaging tasks, such as image denoising, reconstruction, or inpainting, can be formulated in ways that are mathematically quite closely related to graph clustering and classification.

We hope this Element may serve as an overview and an inspiration, both for those who already work in the area of differential equations on graphs and for those who do not (yet) and wish to familiarize themselves with a field rich with mathematical challenges and abundant applications.

1.1 Outline of This Element

We give a brief overview of the history of and literature in the field of differential equations and variational methods on graphs with applications in machine learning and image analysis in Section 2. For a more extensive overview, we refer to section 1.2 of the companion volume [Reference van Gennip and Budd190]. In Section 3 we lay the mathematical foundations that we require to formulate differential equations and variational models on graphs. For example, we define spaces of node functions and important operators and functionals on these spaces, such as the graph Laplacian operators and the graph total variation functional.

In most of this Element we consider undirected graphs, but in Section 4 we discuss very briefly works that generalize some of the concepts from Section 3 to directed graphs, in particular graph Laplacians.

Besides the graph total variation functional, another very important graph-based functional, which we have already mentioned, is the graph Ginzburg–Landau functional. It deserves its own place in the spotlight; thus, in Section 5 we introduce it and discuss some of its variants and properties, including its connection to the graph total variation functional.

The spectrum of an operator gives insight into its behaviour. In Section 6 indeed we see that the spectra of graph Laplacians shed some light on their usefulness in graph clustering and classification problems.

The role of functionals in variational models is as ‘energy’ or an ‘objective function’ that needs to be minimized. Two important dynamical systems that (approximately) accomplish this minimization for the graph Ginzburg–Landau functional are described by the graph Allen–Cahn equation and the graph Merriman–Bence–Osher scheme, respectively. The former, which is an ordinary differential equation obtained as a gradient flow of the graph Ginzburg–Landau functional, is the topic of Section 7, while the latter is the focus of Section 8.

Closely related to the graph Allen–Cahn and Merriman–Bence–Osher dynamics are the graph mean curvature flow dynamics that are described in Section 9. Although exactly how closely these are related is still an open question, one property most of them have in common is a threshold on the parameter of the model below which the dynamics trivializes. This freezing phenomenon is discussed in Section 10.

The main focus in this Element is on models that cluster or classify the node set of the graph into two subsets. In Section 11 we take a look at multiclass extensions of some of these models.

In Section 12 we discuss some finite difference methods on graphs, namely Laplacian learning and Poisson learning. Finally, Section 13 provides a brief conclusion to this Element as we look forward (or sideways) to additional topics that appear in the companion volume [Reference van Gennip and Budd190].

2 History and Literature Overview

Any area of mathematical enquiry will have its roots in what came before. The field of differential equations on graphs for clustering and classification problems is no exception, so there is always the risk that the starting point of any historical overview may feel somewhat arbitrary. It is inescapably personal too; the priorities and narratives generated by our own research inevitably have influenced our departure point for this story and the route we will follow afterwards. As such, the references given in this section are not meant to be exhaustive, nor are all contributions from the references that are given always exhaustively described. One reason this field has generated such enthusiastic interest is that it brings together ideas from many different directions, from statistics and machine learning to discrete mathematics and mathematical analysis. We encourage any attempts to understand the history of this field from perspectives different from the one we provide here, shining more light on the exciting diversity of viewpoints differential equations on graphs have to offer.

For a fuller overview of the literature, we refer to section 1.2 of the companion volume [Reference van Gennip and Budd190]. Neither the current section nor the literature section in the companion volume are exhaustive overviews, if such a thing is even a possibility, but they should provide enough starting points for someone who is eager to learn about this active field of research. We apologize for the undoubtedly many important works in their respective areas that are missing from our bibliography.

As a double ignition point for the growing interest in differential equations on graphs by the (applied) mathematical analysis community in the late noughties and early tens of the twenty-first century, especially in relation to image processing and data analysis applications, we mention works by Abderrahim Elmoataz and collaborators such as [Reference Desquesnes, Elmoataz and Lézoray63, Reference Elmoataz, Lézoray and Bougleux75, Reference Lozes, Elmoataz and Lézoray131] – which have a strong focus on $p$ -Laplacians, $\infty$ -Laplacians, and morphological operations such as dilation and erosion on graphs, as well as processing of point clouds in three dimensions – and works by Andrea L. Bertozzi and collaborators, like [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22, Reference Ekaterina, Tijana and Bertozzi144, Reference Bertozzi189, Reference Nestor, Braxton and Bertozzi191] – which deal with the Ginzburg–Landau functional on graphs and derived dynamics such as the Allen–Cahn equation and Merriman–Bence–Osher (MBO) scheme on graphs. This is not to say there were no earlier investigations into the topic of differential operators and equations on graphs, for example in [Reference Neuberger160, Reference Wardetzky, Mathur, Kälberer and Grinspun195] or in the context of consensus problems [Reference Moreau150, Reference Reza and Murray164], or variational ‘energy’ minimization problems on graphs, such as in the context of Markov random fields [Reference Szeliski, Zabih, Scharstein, Veksler, Kolmogorov, Agarwala, Tappen and Rother181], but the two groups we mentioned earlier provided a sustained drive for the investigation of both the applied and theoretical aspects of differential equations on graphs in the setting on which we are focusing in the current Element.

We note that in the current Element, when we talk about differential equations on graphs, we typically mean partial differential equations whose spatial derivatives have been replaced by discrete finite difference operators on graphs (see Section 3.2), leading to ODEs or finite-difference equations. A priori this is different from the systems of PDEs formulated on the edges of a network that are coupled through boundary conditions on the nodes, which are also studied under the name ‘differential equations on graphs (or networks)’ [Reference Vol’pert192].

New research directions tend not to spring into being fully formed, and also the works mentioned earlier have had the benefit of a rich pre-existing literature in related fields. In particular, many of the variational models and differential equations that are studied on graphs have been inspired by continuum cousins that came before and by the, often sizeable, literature that exists about those models and equations. Examples are the Allen–Cahn equation [Reference Allen and Cahn5], the Cahn–Hilliard equation [Reference Cahn and Hilliard40], the Merriman–Bence–Osher scheme [Reference Barry, Bence and Osher146, Reference Barry, Bence and Osher147], flow by mean curvature [Reference Fred, Taylor and Wang6, Reference Brakke30, Reference Osher and Sethian167], total variation flow [Reference Fuensanta, Coloma, Vicent and Mazón9], and the Mumford–Shah and Chan–Vese variational models [Reference Chan and Vese47, Reference Mumford and Shah159].

Inspiration has also come from other directions, such as discrete calculus [Reference Grady and Polimeni101] and numerical analysis and scientific computing [Reference Belongie, Fowlkes, Chung, Malik, Anders, Gunnar, Mads and Johansen18]. The latter fields not only provide state-of-the-art methods that allow for fast implementations of the graph methods on graphs with many nodes – something which is very important in modern-day applications that deal with large data sets or high-resolution images – but also offer theoretical tools for dealing with discretizations of continuum equations, even though the specifics may differ substantially between a discretization designed with the goal of approximating a continuum problem as accurately as possible and a discretization which is a priori determined by the graph structure that is given by the problem or inherent in the application at hand.

Some of the most prominent applications that have been tackled by differential equations and variational models on graphs, especially by the models and methods central to the current Element, are graph clustering and classification [Reference Schaeffer174], and other applications that are – or can be formulated to become – related, such as community detection [Reference Porter, Onnela and Mucha171], image segmentation [Reference Chan and Shen46], and graph learning [Reference Xia, Sun, Shuo, Aziz, Wan, Pan and Liu200]. For an extensive look at these, and other, applications, we refer to chapter 4 of the companion volume [Reference van Gennip and Budd190]. In the current Element we focus on the core graph clustering and classification applications.

Since the early pioneering works we mentioned at the start of this section, differential equations on graphs have enjoyed a lot of attention from applied analysts and researchers from adjacent fields. As a very rough attempt at classification of these different research efforts, we distinguish between those papers that study differential equations and variational models purely at the discrete-graph level and those that are interested in continuum limits of those discrete equations and models as a way to establish their consistency. (Some papers may combine elements of both categories.) The focus of this Element is on the first category. For a closer look at the second category, we refer to chapter 7 of [Reference van Gennip and Budd190].

In this first category, we encounter papers that study particular graph-based differential operators or graph-based dynamics, such as the eikonal equation (and the related eikonal depth), $p$ -eikonal equation, $p$ - and $\infty$ -Laplacians [Reference Böhle, Kuehn, Mulas and Jost38, Reference Calder and Ettehad42, Reference Elmoataz, Toutain and Tenbrinck76, Reference Manfredi, Oberman and Sviridov138, Reference Zhou, Schölkopf, Walter, Robert and Hanbury206], semigroup evolution equations [Reference Mugnolo153], dynamics [Reference Knill and Rak121] such as mean curvature flow and morphological evolution equations (related to morphological filtering and graph-based front propagation) [Reference Chakik, Abdallah and Desquesnes70, Reference Vinh-Thong, Elmoataz and Lézoray182] or advection [Reference Rak172], or discrete variational models such as trend filtering on graphs [Reference Smola, Alexander and Tibshirani194] and the graph Mumford–Shah model [Reference Huiyi, Justin, Bertozzi, Bae, Chan, Tony and Lysaker106, Reference Parker and Feng168].

Of special interest in the context of the current Element are the graph Allen–Cahn equation, graph MBO scheme, and graph mean curvature flow [Reference Budd33, Reference Ekaterina, Tijana and Bertozzi144, Reference van Gennip187, Reference Nestor, Braxton and Bertozzi191], which are discussed in much greater detail in Sections 7, 8, and 9. For details about applications in which these graph-based dynamics have been used, we refer to chapter 4 of [Reference van Gennip and Budd190]. Some of the applications that are considered in that volume require an extension of the classical two-phase versions of the Allen–Cahn and MBO dynamics to a multiclass context [Reference Cristina, Arjuna, Percus, of Fred, Ana and Marsico94, Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95]; other variations on multiclass MBO have been developed, such as an incremental reseeding method [Reference Bresson, Huiyi, Laurent, Szlam, von Brecht, Xue-Cheng, Egil and Lysaker31] and auction dynamics [Reference Jacobs, Merkurjev and Esedoḡlu110].

The publications focusing on the discrete level also include papers that study connections between graph-based differential operators or dynamics on the one hand, and on the other hand graph-based concepts that are useful for studying graph structures, which sometimes already had a history outside of the differential-equations-on-graphs literature. For example: the modulus on graphs [Reference Albin, Brunner, Perez, Poggi-Corradini and Wiens3], Cheeger cuts and ratio cuts [Reference Merkurjev, Bertozzi, Yan and Lerman141], ranking algorithms and centrality measures such as heat kernel PageRank [Reference Ekaterina, Bertozzi and Chung142], nonconservative alpha-centrality (as opposed to conservative PageRank) [Reference Ghosh and Lerman98], centrality measures and community structure based on the interplay between dynamics via parameterized (or generalized) Laplacians and the network structure [Reference Yan, Teng, Lerman and Ghosh201], random walks and SimRank on uncertain graphs (i.e., graphs in which each edge has a probability of existence assigned to it) [Reference Zhu, Zou and Jianzhong209], distance and proximity measures on graphs [Reference Avrachenkov, Chebotarev and Rubanov12, Reference Chebotarev, Frank and Barbaresco48], and a hubs-biased resistance distance (based on graph Laplacians) [Reference Estrada and Mugnolo79].

We also draw attention here to graph-based active learning [Reference Kevin, John, Jason, Xoaquin, Zhan, Jeff, Bertozzi, Edmund and Garber148], Laplacian learning [Reference Zhu, Ghahramani, Lafferty, Tom and Mishra211], Poisson learning [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44], and results on the Lipschitz regularity of functions in terms of Laplacians on point clouds [Reference Calder, Trillos and Lewicka45]. For overview articles about graph-based methods in machine learning and image processing, we refer to [Reference Bertozzi, of Sirakov, Boyan, Paulo and Viana20, Reference Bertozzi, Merkurjev, of Kimmel, Ron and Tai23, Reference Chong, Ding, Yan and Pan51].

In the continuum setting, the dynamics of both the Allen–Cahn equation and the MBO scheme are known to approximate flow by mean curvature, in a sense that has been made precise through convergence analysis [Reference Barles and Georgelin16, Reference Lia and Kohn32, Reference Caffarelli and Souganidis39, Reference Evans80, Reference Laux and Lelmi125, Reference Laux and Yip126]. Rigorous connections between the graph Allen–Cahn equation and graph MBO scheme have been established in [Reference Budd33, Reference Budd and van Gennip35, Reference Budd and van Gennip36, Reference Budd, van Gennip and Latz37]; their connections with graph mean curvature flow are open questions. Details about these established connections and open questions, as well as a closer look at the various dynamics in the continuum setting, can be found in chapter 6 of [Reference van Gennip and Budd190].

In many of the works just cited, the graphs under consideration on which the variational models or differential equations are formulated are finite, undirected graphs, with edges that connect nodes pairwise and that, if weighted, have a positive weight. Moreover, the graphs are unchanging – also, for the continuum limits that we briefly mentioned, even though the limit $| V | \to \infty$ is considered, typically at each fixed $| V |$ , the graph structure is static.

This leaves a lot of room for generalizations and extensions. In this Element we refrain from delving into these generalizations in too much detail, although in Section 4 we do briefly discuss Laplacians on directed graphs [Reference Bauer17, Reference Fanuel, Alaíz and Suykens87, Reference Hein, Audibert and von Luxburg104, Reference Zhou, Schölkopf, Hofmann, Lawrence, Yair and Bottou207]. Other possible generalizations are to multislice networks [Reference Mucha, Richardson, Macon, Porter and Onnela151], hypergraphs (in which edges can connect more than two nodes) [Reference Bosch, Klamt and Stoll26; Reference Ikeda and Uchida108], metric graphs and quantum graphs (in which edges are represented by intervals of the real line) [Reference Kennedy, Kurasov, Léna and Mugnolo118], signed graphs that can have positive and negative edge weights [Reference Cucuringu, Pizzoferrato and van Gennip60], metric random walk spaces (of which locally finite positively edge-weighted connected graphs are a special case) [Reference Mazón, Marcos and Toledo139; Reference Mazón, Marcos and Toledo140]Footnote ⁷, and graphs changing in time [Reference Bonaccorsi, Cottini and Mugnolo25].

Of interest also is the connection between methods on graphs and the constructions used to build the graphs, as is considered in, for example, [Reference Trillos, Nicolás and Yang97]. Some more details about building graph models for specific applications are given in section 4.2 of [Reference van Gennip and Budd190].

3 Calculus on Undirected Edge-Weighted Graphs

3.1 Graphs, Function Spaces, Inner Products, and Norms

Except where explicitly stated otherwise, in this Element we consider finite,Footnote ⁸ simple (i.e., without multi-edgesFootnote ⁹ and without self-loopsFootnote ¹⁰), connected,Footnote ¹¹ edge-weighted graphs $G = (V, E, ω)$ – if a graph $G$ is mentioned without further specification, it is assumed to have these properties. Figure 3.1 shows an example of such a graph. Here $V$ is the set of nodes or vertices of the graph, $E \subseteq V \times V$ is the set of edges,Footnote ¹² and $ω : V \times V \to R$ is the edge weight function which vanishes on $E^{c} := (V \times V) ∖ E$ ; thus $ω |_{E^{c}} = 0$ . Unless otherwise specified, we assume that $ω |_{E} > 0$ . In this framework, an unweighted graph $G = (V, E)$ can be viewed as an edge-weighted graph $G = (V, E, ω)$ with $ω |_{E} = 1$ .

Figure 3.1 An example of a finite, simple, connected, and undirected graph.

We will assume that $| V | \geq 2$ . It will often be useful to identify the nodes in $V$ with the numbers $1$ to $n \in N$ ,Footnote ¹³ and we write $V = [n] := {1, 2, \dots, n}$ . Unspecified nodes from $V$ we denote by $i, j, k, \dots$ . The edge from $i$ to $j$ is denoted by $(i, j)$ ; the nodes $i$ and $j$ are endpoints of this edge. For any node function $u$ , that is, a function $u$ whose domain is (a subset of) $V$ , we write $u_{i}$ for $u (i)$ . Similarly, for any function $φ$ whose domain is (a subset of) $V \times V$ , we write $φ_{i j}$ for $φ (i, j)$ . Any such function which vanishes on $E^{c}$ we will call an edge function. We define the function spaces of node and edge functions,

\begin{matrix} V & := {u : V \to R}, & E & := {φ : V \times V \to R : φ |_{E^{c}} = 0} . \end{matrix}

We use subscripts to indicate alternative codomains: if $A$ is a set, then

\begin{matrix} V_{A} & := {u : V \to A}, & E_{A} & := {φ : V \times V \to A : φ |_{E^{c}} = 0} . \end{matrix}

In particular, $V = V_{R}$ and $E = E_{R}$ .

When it is not explicitly stated differently (as in Section 4), we consider undirected graphs, namely graphs for which $(i, j) \in E$ if and only if $(j, i) \in E$ . Two nodes $i, j \in V$ in an undirected graph are called adjacent, or neighbours, if $(i, j) \in E$ . The condition that $j \neq i$ , which is sometimes explicitly included in the definition of neighbour, is superfluous under our assumption of absence of self-loops. The matrix $A$ with entries $A_{i j} := ω_{i j}$ is called the (weighted) adjacency matrix, or weight matrix of the graph. In an undirected graph, we require $ω$ to be symmetric in its arguments, that is, for all $i, j \in V$ , $ω_{i j} = ω_{j i}$ . Some authors demand their edge functions $φ \in E$ to be skew-symmetric, namely $φ_{i j} = - φ_{j i}$ . We do not require this assumption and so will not impose it.

Since we consider graphs without self-loops, for all $i \in V$ , $ω_{i i} = 0$ . The degree of node $i$ is $d_{i} := \sum_{j \in V} ω_{i j}$ . A graph is connected if, for all $i, j \in V$ with $i \neq j$ , there exist finitely many nodes $i_{1}, i_{2}, \dots, i_{k}$ such that $i_{1} = i$ , $i_{k} = j$ , and $ω_{i_{1} i_{2}} \dots ω_{i_{k - 1} i_{k}} > 0$ . A graph which is not connected is called disconnected. Since our graphs are assumed to be connected (unless stated otherwise), we have, for all $i \in V$ , $d_{i} > 0$ .

In some situations it will be useful to have a shorthand notation to indicate adjacency of nodes (in an undirected graph): for $i \in V$ , we write $j \sim i$ if and only if $j \in V$ and $(i, j) \in E$ . Equivalently under our assumption that $ω |_{E} > 0$ , we have that $j \sim i$ if and only if $j \in V$ and $ω_{i j} > 0$ .

Our first step to defining a calculus on node and edge functions is to define an inner product structureFootnote ¹⁴ on $V$ and $E$ . LetFootnote ¹⁵ $r \in [0, 1]$ , $q \in [1 / 2, 1]$ (see Remark 3.9), $u, v \in V$ , and $φ, ψ \in E$ . Then

⟨ u, v ⟩_{V} := \sum_{i \in V} d_{i}^{r} u_{i} v_{i} and ⟨ φ, ψ ⟩_{E} := \frac{1}{2} \sum_{(i, j) \in E} ω_{i j}^{2 q - 1} φ_{i j} ψ_{i j} .

(3.1)

We note that the factor $\frac{1}{2}$ compensates for the ‘double count’ of edges $(i, j)$ and $(j, i)$ in an undirected graph. In this and other circumstances it can be convenient to rewrite the sum over $(i, j) \in E$ as a double sum over $i, j \in V$ . This can be done since $φ_{i j} = ψ_{i j} = ω_{i j} = 0$ if $(i, j) ̸ \in E$ . In this case we have to interpret $ω_{i j}^{0}$ to be $0$ (not $1$ !) if $(i, j) ̸ \in E$ .

These inner products induce norms in the usual way: $∥ u ∥_{V} := \sqrt{⟨ u, u ⟩_{V}}$ and $∥ φ ∥_{E} := \sqrt{⟨ φ, φ ⟩_{E}}$ . Other commonly used norms on $V$ and $E$ are the $p$ -norms (for $p \in [1, \infty)$ ) and $\infty$ -norms:

We note that the $p = 2$ norms are the norms induced by the inner products.

If $r = 0$ , then $∥ u ∥_{V}$ is the Euclidean $2$ -norm of the vector with components $u_{i}$ . To avoid specifying $r = 0$ , we may write $∥ u ∥_{2}$ in this case, with $ℓ^{2}$ inner product $⟨ \cdot, \cdot ⟩_{2}$ . We also use this notation for vectors not (necessarily) meant to be interpreted as node functions. Similarly, if $r = 0$ , then $∥ u ∥_{V, p}$ is equal to the $p$ -norm for the vector with components $u_{i}$ and we may write $∥ u ∥_{p}$ instead.

If $S \subseteq V$ , we define $χ_{S}$ to be its indicator (or characteristic) function, that is,

(χ_{S})_{i} := (\begin{matrix} 1, & if i \in S, \\ 0, & if i \in S^{c} := V ∖ S . \end{matrix})

In particular $χ_{V} = 1 \in V$ (and we may write $1$ for $χ_{V}$ , or $c$ for $c χ_{V}$ , if $c \in R$ is a constant) and $χ_{\emptyset} = 0 \in V$ . Similarly, we define indicator functions $χ_{E^{'}}$ for edge subsets $E^{'} \subseteq E$ . We will also need a variant indicator function $ı_{A}$ for $A \subseteq R$ , defined to be $ı_{A} (x) = 0$ if $x \in A$ and $ı_{A} (x) = + \infty$ if $x \in R ∖ A$ .

The volume of a node subset $S \subseteq V$ and volume of an edge subset $E^{'} \subseteq E$ are defined to be (for any $p \in [1, \infty)$ ), respectively,

vol S := ∥ χ_{S} ∥_{V, p}^{p} = \sum_{i \in S} d_{i}^{r}, vol E^{'} := ∥ χ_{E^{'}} ∥_{E, p}^{p} = \frac{1}{2} \sum_{(i, j) \in E^{'}} ω_{i j}^{2 q - 1} .

(3.2)

Lemma 3.1. In this lemma we interpret $\frac{1}{\infty} = 0$ . Let $u, v \in V$ and $φ, ψ \in E$ . For all $p, p^{'} \in [1, \infty]$ such that $\frac{1}{p} + \frac{1}{p^{'}} = 1$ , these Hölder inequalities hold:

∥ u v ∥_{V, 1} \leq ∥ u ∥_{V, p} ∥ v ∥_{V, p^{'}} and ∥ φ ψ ∥_{E, 1} \leq ∥ φ ∥_{E, p} ∥ ψ ∥_{E, p^{'}} .

Moreover, these embedding estimates are satisfied, if $s, t \in [1, \infty]$ with $s \leq t$ :

∥ u ∥_{V, s} \leq {(vol V)}^{\frac{1}{s} - \frac{1}{t}} ∥ u ∥_{V, t} and ∥ φ ∥_{E, s} \leq {(vol E)}^{\frac{1}{s} - \frac{1}{t}} ∥ φ ∥_{E, t} .

Furthermore,

lim_{p \to \infty} ∥ u ∥_{V, p} = ∥ u ∥_{V, \infty} and lim_{p \to \infty} ∥ φ ∥_{E, p} = ∥ φ ∥_{E, \infty} .

In fact, for all $p \in [1, \infty)$ we have

min_{i \in V} d_{i}^{\frac{r}{p}} ∥ u ∥_{V, \infty} \leq ∥ u ∥_{V, p} \leq {(vol V)}^{\frac{1}{p}} ∥ u ∥_{V, \infty}

and

2^{- \frac{1}{p}} min_{(i, j) \in E} ω_{i j}^{\frac{2 q - 1}{p}} ∥ φ ∥_{E, \infty} \leq ∥ φ ∥_{E, p} \leq {(vol E)}^{\frac{1}{p}} ∥ φ ∥_{E, \infty} .

Proof. For a proof we refer to lemma 2.1.1 of [Reference van Gennip and Budd190]. □

Instead of interpreting $φ \in E$ in the standard way as a function from $V \times V$ to $R$ , we may also view it as a function from $V$ to $V$ : for all $i \in V$ , $φ_{i \cdot} \in V$ . This prompts the following definitions of a node-dependent inner product and $p$ - and $\infty$ -norms: for all $φ, ψ \in E$ , all $i \in V$ , and all $p \in [1, \infty)$ ,

\begin{matrix} (φ, ψ)_{i} & := \frac{1}{2} \sum_{j \in V} ω_{i j}^{2 q - 1} φ_{i j} ψ_{i j}, \\ ∥ φ ∥_{i, p} & := {(\frac{1}{2} \sum_{j \in V} ω_{i j}^{2 q - 1} | φ_{i j} |^{p})}^{\frac{1}{p}}, \\ ∥ φ ∥_{i, \infty} & := max {| φ_{i j} | : j \in V} . \end{matrix}

(3.3)

Corollary 3.2. In this corollary we interpret $\frac{1}{\infty} = 0$ . Let $φ, ψ \in E$ and $i \in V$ . For $p, p^{'} \in [1, \infty]$ such that $\frac{1}{p} + \frac{1}{p^{'}} = 1$ , a Hölder inequality holds:

∥ φ ψ ∥_{i, 1} \leq ∥ φ ∥_{i, p} ∥ ψ ∥_{i, p^{'}} .

Moreover, an embedding estimate is satisfied, if $s, t \in [1, \infty]$ with $s \leq t$ :

∥ φ ∥_{i, s} \leq {(\frac{1}{2} \sum_{j \in V} ω_{i j}^{2 q - 1})}^{\frac{1}{s} - \frac{1}{t}} ∥ φ ∥_{i, t} .

Furthermore, for all $φ \in E$ and for all $p \in [1, \infty)$ ,

2^{- \frac{1}{p}} min_{\begin{matrix} j \in V \\ ω_{i j} > 0 \end{matrix}} ω_{i j}^{\frac{2 q - 1}{p}} ∥ φ ∥_{i, \infty} \leq ∥ φ ∥_{i, p} \leq {(\frac{1}{2} \sum_{j \in V} ω_{i j}^{2 q - 1})}^{\frac{1}{p}} ∥ φ ∥_{i, \infty},

and thus in particular,

lim_{p \to \infty} ∥ φ ∥_{i, p} = ∥ φ ∥_{i, \infty} .

Proof. We refer to corollary 2.1.2 in [Reference van Gennip and Budd190]. □

3.2 Graph Gradient, ( $p$ -)Dirichlet Energy, ( $p$ -)Laplacian, and Total Variation

To be able to do calculus on graphs, we require a discrete analogue of the derivative: the graph gradient. For $u \in V$ the gradient $\nabla u \in E$ is defined byFootnote ¹⁶

(\nabla u)_{i j} := ω_{i j}^{1 - q} (u_{j} - u_{i}) .

Remark 3.3. The graph gradient provides a good motivation for defining the node-dependent inner product and norms in (3.3). The continuum gradient $\nabla u$ of a function $u : R^{m} \to R$ is a vector-valued function with length $∥ \nabla u ∥_{2}$ – or, in general, $p$ -norm – a real-valued function on $R^{m}$ . Analogously, in the graph setting the node-dependent norms from (3.3) are real-valued functions on $V$ .

In Ta et al. [Reference Vinh-Thong, Elmoataz and Lézoray183] a connection is made between the node-dependent $\infty$ -norms and morphological dilation and erosion operators, if $q = 1$ or $ω (V \times V) \subseteq {0, 1}$ . Using superscripts $+$ and $-$ to denote the positive part $x^{+} := max (0, x)$ and negative part $x^{-} := - min (0, x)$ of a number $x \in R$ , respectively, we compute

\begin{matrix} ∥ (\nabla u)^{+} ∥_{i, \infty} & = max_{j \sim i} max (0, u_{j} - u_{i}) \\ = max (u_{j} : j = i or j \sim i) - u_{i} =: (δ_{a} (u))_{i} - u_{i}, \\ ∥ (\nabla u)^{-} ∥_{i, \infty} & = max_{j \sim i} (- min (0, u_{j} - u_{i})) = - (min (u_{j} : j = i or j \sim i) - u_{i}) \\ = u_{i} - min (u_{j} : j = i or j \sim i) =: u_{i} - (ε_{a} (u))_{i} . \end{matrix}

In [Reference Vinh-Thong, Elmoataz and Lézoray183] the operators $δ_{a}$ and $ε_{a}$ are called the dilation and erosion operators, respectively, in analogy to similar operators in the continuum setting. We can understand these names, if we apply the operators to the indicator function $u = χ_{S}$ of a node subset $S \subseteq V$ . Then $δ_{a} (u) = χ_{S^{δ}}$ and $ε_{a} (u) = χ_{S^{ε}}$ , where the dilated set $S^{δ}$ consists of $S$ with all the nodes that have at least one neighbour in $S$ added, and the eroded set $S^{ε}$ consists of $S$ with all the nodes that have at least one neighbour in $S^{c}$ (i.e., $V ∖ S$ ) removed. In El Chakik et al. [Reference Chakik, Abdallah and Desquesnes70] on weighted graphs the nonlocal dilation operator and nonlocal erosion operator,

(N L D (u))_{i} := u_{i} + ∥ (\nabla u)^{+} ∥_{i, \infty} and (NLE (u))_{i} := u_{i} - ∥ (\nabla u)^{-} ∥_{i, \infty},

respectively, are introduced. The preceding computation shows that in the case of an unweighted graph, $δ_{a} = N L D$ and $ε_{a} = NLE$ .

We define the graph divergence $d i v : E \to V$ to be given by

(d i v φ)_{i} := \frac{1}{2} d_{i}^{- r} \sum_{j \in V} ω_{i j}^{q} (φ_{j i} - φ_{i j}),

since this is the adjoint of the graph gradient: it can be checked that, for all $u \in V$ and all $φ \in E$ , $⟨ \nabla u, φ ⟩_{E} = ⟨ u, d i v φ ⟩_{V}$ . We thus define a graph LaplacianFootnote ¹⁷ $Δ : V \to V$ as the divergence of the gradient:

(Δ u)_{i} := (d i v \nabla u)_{i} = d_{i}^{- r} \sum_{j \in V} ω_{i j} (u_{i} - u_{j}) .

(3.4)

We note that $Δ$ is independent of $q$ . If $r = 0$ , $Δ$ is called the combinatorial (or unnormalized) graph Laplacian, while if $r = 1$ , it is called the random walk (or asymmetrically normalized) graph Laplacian. We note that $Δ$ is self-adjoint:

⟨ u, Δ v ⟩_{V} = ⟨ Δ u, v ⟩_{V} .

(3.5)

Remark 3.4. Since we consider finite graphs with $| V | = [n]$ , there is a bijective correspondence between functions in $u \in V$ and (column) vectors in $R^{n}$ with entries $u_{i}$ . It follows that linear operators on $V$ can be represented by matrices. In particular, the graph Laplacian $Δ$ has associated matrix $D^{- r} (D - A) = D^{1 - r} - D^{- r} A$ , where $D$ is the diagonal degree matrix with $D_{i i} := d_{i}$ and, as in Section 3.1, the matrix $A$ is the weighted adjacency matrix with entries $A_{i j} = ω_{i j}$ . In particular, the matrix associated to the combinatorial graph Laplacian is $D - A$ , and to the random walk graph LaplacianFootnote ¹⁸ is $I - D^{- 1} A$ , where by $I$ we denote the identity matrix of the appropriate size. To avoid complicating the notation and text, we will freely use both kinds of representation without always explicitly noting any switches or changing notation; for example, $u$ can denote a function or vector and $Δ$ may be the Laplacian operator or matrix. We note that the multiplication of two node functions $u v$ becomes a matrix-vector multiplication $U v$ in vector notation, with $U$ the diagonal matrix with $U_{i i} = u_{i}$ .

Remark 3.5. The name random walk Laplacian for $Δ$ with $r = 1$ comes from the fact that the $D^{- 1} A$ term in the associated matrix $I - D^{- 1} A$ (see Remark 3.4) is a right-stochastic matrix, that is, its rows sum to one, and can thus be interpreted as the transition matrix of a discrete-time Markov chain with $(D^{- 1} A)_{i j} = d_{i}^{- 1} ω_{i j}$ being the probability of the transition from state $i$ to state $j$ in a single time step. Associating the states with nodes of a graph, the Markov chain describes a random walk on that graph. The (negative) unnormalized graph Laplacian $- Δ$ (with $r = 0$ ) is the (infinitesimal) generator of a continuous-time Markov chain,Footnote ¹⁹ as described in detail in remark 2.1.5 in [Reference van Gennip and Budd190].

Remark 3.6. We would be remiss not to mention the symmetrically normalized Laplacian, which is represented in matrix form as

Δ^{s y m} := D^{- \frac{1}{2}} (D - A) D^{- \frac{1}{2}} = D^{r - \frac{1}{2}} (D^{1 - r} - D^{- r} A) D^{- \frac{1}{2}} = I - D^{- \frac{1}{2}} A D^{- \frac{1}{2}} .

(3.6)

This graph Laplacian appears frequently in the literature, for example, Chung [Reference Chung55], but is not captured in the preceding framework, as it would require a different scaling for each term in the gradient $\nabla u$ . We refer to Zhou and Schölkopf [206, section 2.3] for details. We note that $Δ^{s y m}$ is self-adjoint, with respect to $⟨ \cdot, \cdot ⟩_{V}$ for $r = 0$ . More generally, the two-parameter normalized graph Laplacian

Δ^{(s, t)} := D^{- s} (D - A) D^{- t}

(3.7)

is self-adjoint with respect to $⟨ \cdot, \cdot ⟩_{V}$ forFootnote ²⁰ $r = s - t$ . Furthermore, for all $a \in R$

D^{- a} Δ^{(s, t)} D^{a} = Δ^{(s + a, t - a)}

and so $Δ^{(s, t)}$ and $Δ^{(s^{'}, t^{'})}$ are similar whenever $s + t = s^{'} + t^{'}$ . In particular, $Δ^{(s, t)}$ is similar to the symmetric matrix $Δ^{((s + t) / 2, (s + t) / 2)}$ and thus $Δ$ and $Δ^{s y m}$ are similar when $r = 1$ . For further details, we refer to Budd [Reference Budd34, chapter 2].

In Smola and Kondor [Reference Smola, Kondor, Bernhard and Warmuth180, theorem 3] it is shown that graph Laplacians are, in a sense, invariant to vertex permutations. It is also shown that they are, in some sense, the only linear operators that depend linearly on the graph’s adjacency matrix to be so.

In Zhou and Belkin [Reference Zhou, Belkin, of Gordon, Geoffrey, David and Dudík208] a geometry graph Laplacian is used, defined as

L^{g e o m} := I - (D^{g e o m})^{- 1} A^{g e o m},

where $A^{g e o m} := D^{- 1} A D^{- 1}$ and $D^{g e o m}$ is the corresponding degree matrix. We also mention that in Merkurjev et al. [Reference Merkurjev, Nguyen and Wei145] a multiscale Laplacian is introduced.

The graph Dirichlet ‘energy’ of $u \in V$ (as defined in e.g. Van Gennip and Bertozzi [Reference Bertozzi189, appendix A]) is

\begin{matrix} \frac{1}{2} ∥ \nabla u ∥_{E}^{2} & = \frac{1}{2} ⟨ u, Δ u ⟩_{V} = \frac{1}{4} \sum_{i, j \in V} ω_{i j} (u_{i} - u_{j})^{2} \\ = max {⟨ d i v φ, u ⟩_{V} : φ \in E and ∥ φ ∥_{E} \leq 1} . \end{matrix}

(3.8)

We observe that $∥ \nabla u ∥_{E}^{2}$ does not depend on $q$ .

For Poincaré inequalities [Reference Dacorogna, Gangbo and Subía61, Reference Dym and McKean66, Reference Evans81] involving the Dirichlet energy on (infinite) graphs, we refer to [Reference Coulhon and Koskela58, Reference Levi, Santagati, Tabacco and Vallarino129]. Computing the Gateaux derivative of the Dirichlet energy, we recognize the graph Laplacian:

\begin{matrix} {(\frac{d}{d α} \frac{1}{2} ∥ \nabla (u + α v) ∥_{E}^{2})}_{α = 0} & = \frac{1}{2} \frac{d}{d α} [⟨ Δ u, u ⟩_{V} + 2 α ⟨ Δ u, v ⟩_{V} + α^{2} ⟨ Δ v, v ⟩_{V}]_{α = 0} \\ = ⟨ Δ u, v ⟩_{V}, \end{matrix}

(3.9)

where $α \in R$ , $u, v \in V$ , and we have used the self-adjointness of $Δ$ from (3.5).

In [Reference Budd34, theorem 4.1.2] it is shown that the two-parameter normalized graph Laplacian from (3.7) also is the first variation of a Dirichlet-type functional:

{(\frac{d}{d α} \frac{1}{2} ∥ \nabla D^{- t} (u + α v) ∥_{E}^{2})}_{α = 0} = ⟨ Δ^{(s, t)} u, v ⟩_{V},

where $(D^{- t} u)_{i} = d_{i}^{- t} u_{i}$ and in the $V$ -inner product $r = s - t$ .

Changing the edge function norm in the maximum formulation in (3.8) to a maximum norm leads to a definition of graph total variation:

\begin{matrix} T V (u) & := max {⟨ d i v φ, u ⟩_{V} : φ \in E and ∥ φ ∥_{E, \infty} \leq 1} \\ = \frac{1}{2} \sum_{i, j \in V} ω_{i j}^{q} | u_{i} - u_{j} | = ∥ \nabla u ∥_{E, 1} . \end{matrix}

(3.10)

The second equality in (3.10) follows since the maximum in the definition is achieved at $φ = s g n (\nabla u)$ (where the signum function acts elementwise: $φ_{i j} = s g n (u_{j} - u_{i})$ ), where $s g n$ can be any representative of the signum function equivalence class in $L_{l o c}^{1} (R)$ , that is, $s g n (x) = 1$ if $x > 0$ , $s g n (x) = - 1$ if $x < 0$ , and $s g n (0)$ may be defined to equal any arbitrary, but determined, real number.Footnote ²¹ This can be seen by rewriting $⟨ d i v φ, u ⟩_{V} = ⟨ φ, \nabla u ⟩_{E}$ . This will play a role again when we consider curvature in Section 9. For the final equality in (3.10) we used that the edge weights $ω_{i j}$ are nonnegative.

If $u \in V_{{0, 1}}$ , then $(u_{i} - u_{j})^{2} = | u_{i} - u_{j} |$ and thus by (3.8), if $q = 1$ , then

∥ \nabla u ∥_{E}^{2} = \frac{1}{2} \sum_{i, j \in V} ω_{i j} (u_{i} - u_{j})^{2} = \frac{1}{2} \sum_{i, j \in V} ω_{i j} | u_{i} - u_{j} | = T V (u) .

Analogously to the ‘anisotropic’ total variation in (3.10), an isotropic total variation can be defined. See Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, remark 2.1].

The graph $p$ -Dirichlet energy – so called in analogy to the Dirichlet energy in (3.8) – is

\begin{matrix} \frac{1}{p} ∥ \nabla u ∥_{E, p}^{p} & = \frac{1}{2 p} \sum_{i, j \in V} ω_{i j}^{2 q - 1} | ω_{i j}^{1 - q} (u_{j} - u_{i}) |^{p} \\ = \frac{1}{2 p} \sum_{i, j \in V} ω_{i j}^{(2 - p) q + p - 1} | u_{i} - u_{j} |^{p} . \end{matrix}

(3.11)

We note that for $p = 2$ indeed we recover the graph Dirichlet energy from (3.8) and for $p = 1$ we obtain the graph total variation from (3.10).

For $p \in [1, \infty)$ , the graph $p$ -Laplacian $Δ_{p} : V \to V$ is definedFootnote ²² via the Gateaux derivative of the $p$ -Dirichlet energy by requiring that, for all $u, v \in V$ ,

\begin{matrix} ⟨ Δ_{p} u, v ⟩_{V} & = {(\frac{d}{d s} \frac{1}{p} ∥ \nabla (u + s v) ∥_{E, p}^{p})}_{s = 0} \\ = \frac{1}{2} \sum_{i \in V} \sum_{\begin{matrix} j \in V \\ u_{j} \neq u_{i} \end{matrix}} ω_{i j}^{(2 - p) q + p - 1} | u_{i} - u_{j} |^{p - 2} (u_{i} - u_{j}) (v_{i} - v_{j}) \\ = \sum_{i \in V} \sum_{\begin{matrix} j \in V \\ u_{i} \neq u_{j} \end{matrix}} ω_{i j}^{(2 - p) q + p - 1} | u_{i} - u_{j} |^{p - 2} (u_{i} - u_{j}) v_{i} . \end{matrix}

(3.12)

(We note that we used the symmetry of $ω$ .) Thus, for all $u \in V$ and for all $i \in V$ ,

(Δ_{p} u)_{i} := d_{i}^{- r} \sum_{\begin{matrix} j \in V \\ u_{j} \neq u_{i} \end{matrix}} ω_{i j}^{(2 - p) q + p - 1} | u_{i} - u_{j} |^{p - 2} (u_{i} - u_{j}) .

(3.13)

From (3.9) we see that we recover our standard graph Laplacian if we choose $p = 2$ . For other values of $p$ , the operator $Δ_{p}$ is not linear. We note that (for general $p$ ), if $q = 1$ , then the exponent of $ω_{i j}$ in $Δ_{p}$ equals $1$ (see Remark 3.9).

By splitting the sum in (3.13), we obtain

\begin{matrix} (Δ_{p} u)_{i} & = d_{i}^{- r} \sum_{\begin{matrix} j \in V \\ u_{i} > u_{j} \end{matrix}} ω_{i j}^{(2 - p) q + p - 1} (u_{i} - u_{j})^{p - 1} - d_{i}^{- r} \sum_{\begin{matrix} j \in V \\ u_{j} > u_{i} \end{matrix}} ω_{i j}^{(2 - p) q + p - 1} (u_{j} - u_{i})^{p - 1} \\ = 2 d_{i}^{- r} ∥ ω^{\frac{1 - q}{p}} (\nabla u)^{-} ∥_{i, p - 1}^{p - 1} - 2 d_{i}^{- r} ∥ ω^{\frac{1 - q}{p}} (\nabla u)^{+} ∥_{i, p - 1}^{p - 1} . \end{matrix}

In particular, $Δ_{p} u = 0$ if and only if, for all $i \in V$ , $∥ ω^{\frac{1 - q}{p}} (\nabla u)^{-} ∥_{i, p - 1} = ∥ ω^{\frac{1 - q}{p}} (\nabla u)^{+} ∥_{i, p - 1}$ . From the final inequalities in Corollary 3.2, we know that

\begin{matrix} lim_{p \to \infty} ∥ ω^{\frac{1 - q}{p}} (\nabla u)^{-} ∥_{i, p - 1} & = ∥ (\nabla u)^{-} ∥_{i, \infty}, \\ lim_{p \to \infty} ∥ ω^{\frac{1 - q}{p}} (\nabla u)^{+} ∥_{i, p - 1} & = ∥ (\nabla u)^{+} ∥_{i, \infty} . \end{matrix}

This inspires the following definitionFootnote ²³ of the graph $\infty$ -Laplacian from Elmoataz et al. [Reference Elmoataz, Desquesnes, Lakhdari and Lézoray73], for all $u \in V$ and for all $i \in V$ :

(Δ_{\infty} u)_{i} := ∥ (\nabla u)^{-} ∥_{i, \infty} - ∥ (\nabla u)^{+} ∥_{i, \infty} .

Then $Δ_{\infty} u = 0$ if and only if, for all $i \in V$ , $∥ (\nabla u)^{-} ∥_{i, \infty} = ∥ (\nabla u)^{+} ∥_{i, \infty}$ . The results in Lemma 3.7 give further justification for this definition of $Δ_{\infty}$ . In order to state the lemma, we need the concept of lexicographic ordering of edge functions (we refer to Kyng et al. [Reference Rasmus, Anup, Sushant, Spielman, Peter, Elad and Kale123, section 2]): given functions $φ, ψ \in E$ , we define $φ ⪯ ψ$ if and only if, after reordering the values of $φ$ and $ψ$ in nonincreasing order, the first value of $φ$ that is different from the corresponding value of $ψ$ is smaller than this corresponding value, or no such value exists (i.e., $φ$ and $ψ$ are equal after the reorderings).

Lemma 3.7. Let $S \subseteq V$ be a nonempty subset and let $u^{0} : S \to R$ . The minimization problem

{a r g m i n}_{\begin{matrix} u \in V \end{matrix}} ∥ \nabla u ∥_{E, \infty} s.t. u |_{S} = u^{0}

has a solution. Out of all minimizers, there is a unique one $u^{*}$ , which satisfies, for all $u \in V$ with $u |_{S} = u^{0}$ , $\nabla u^{*} ⪯ \nabla u$ . Moreover, if $u \in V$ with $u |_{S} = u^{0}$ , then $u = u^{*}$ if and only if, for all $i \in V ∖ S$ , $(Δ_{\infty} u)_{i} = 0$ . Furthermore, for $p \in (1, \infty)$ , the minimization problem

{a r g m i n}_{\begin{matrix} u \in V \end{matrix}} ∥ \nabla u ∥_{E, p} s.t. u |_{S} = u^{0}

has a unique solution $u^{p}$ . If $q = \frac{1}{2}$ , then $lim_{p \to \infty} u^{p} = u^{*}$ .

Proof. We refer to lemma 2.1.7 in [Reference van Gennip and Budd190]. □

Because it is the first variation of the graph $p$ -Dirichlet energy (see (3.12)), the graph $p$ -Laplacian in (3.13) is sometimes also called the variational graph $p$ -Laplacian to distinguish it from the game-theoretic graph $p$ -LaplacianFootnote ²⁴

Δ_{p}^{game} := \frac{1}{p} Δ + α (1 - \frac{2}{p}) Δ_{\infty},

(3.14)

for some constant $α > 0$ ; see Flores et al. [Reference Flores, Calder and Lerman90].Footnote ²⁵ A similar operator is used in Elmoataz et al. [Reference Elmoataz, Desquesnes and Lézoray72], where it is called the graph normalized $p$ -Laplacian and where the constants in the linear combination of $Δ$ and $Δ_{\infty}$ are kept as parameters that can be chosen depending on the application.

For a generalization of the discrete calculus presented in these past few sections to (undirected and directed) hypergraphs, that is, ‘graphs’ whose (hyper)edge set $E$ is a subset of $P (V) ∖ {\emptyset}$ ,Footnote ²⁶ we refer to [Reference Bosch, Klamt and Stoll26, Reference Fazeny88, Reference Fazeny, Tenbrinck, Burger, of Calatroni, Luca, Marco, Serena and Santacesaria89, Reference Ikeda and Uchida108, Reference Jost and Mulas112].

Remark 3.8. Hypergraphs do not receive much attention in this work, but we do want to mention a string of recent works by Mulas, Jost, and collaborators, which define graph Laplacians [Reference Jost and Mulas112] and $p$ -Laplacians [Reference Jost, Mulas and Zhang113] on hypergraphs, study their spectraFootnote ²⁷ [Reference Galuppi, Mulas and Venturello92, Reference Mulas, Horak, Jost, of Battiston, Federico and Petri155, Reference Mulas and Zhang157], investigate random walks [Reference Mulas, Kuehn, Böhle and Jost156] and other hypergraph dynamics [Reference Bühler and Hein24], and generalize concepts such as the Cheeger cut [Reference Mulas154], independence number (i.e., the maximum size of a subset $S \subseteq V$ such that, for all distinct $i, j \in S$ , there is no hyperedge to which they both belong), and colouring number (i.e., the fewest colours needed to colour all vertices such that no hyperedge contains two vertices with the same colour) [Reference Abiad, Mulas and Zhang1] to hypergraphs.

We note that Mulas et al. [Reference Mulas, Horak, Jost, of Battiston, Federico and Petri155] also provides generalizations of graph Laplacians to signed graphs (which are graphs in which edge weights can be positive and negative), graphs with self-loops, and directed graphs. For the latter case, the graph Laplacian from Bauer [Reference Bauer17] is used; see Section 4.

For extensions to hypergraphs of the graph limit theories of graphons and graphops (see chapter 7 of [Reference van Gennip and Budd190]) we refer to Elek and Szegedy [Reference Elek and Szegedy71] and Zhao [Reference Zhao203] (hypergraphons), and Zucal [Reference Zucal212] (hypergraphops), respectively.

Methods on hypergraphs have been applied to problems related to our interests in this work, for example hypergraph signal processing in Zhang et al. [Reference Zhang, Ding and Cui202], heat diffusion on hypergraphs for finding bipartite components in Macgregor and Sun [Reference Macgregor, Sun, Beygelzimer, Dauphin, Yann, Liang and Vaughan134] (see also Macgregor [Reference Macgregor133, chapter 7]), and semi-supervised learning (see section 4.1.2 in [Reference van Gennip and Budd190]) on hypergraphs for early diagnosis of Alzheimer’s disease in Aviles-Rivero et al. [Reference Aviles-Rivero, Christina, Nicolas, Zoe, Schönlieb, Linwei, Qi, Thomas, Speidel and Li11].

3.3 Miscellaneous Considerations

To define the distance $d_{i j}^{G}$ between two nodes $i$ and $j$ on a graph $G$ , we first need the concept of a path on $V$ . If $i, j \in V$ are distinct nodes, a path $γ_{i j}$ from $i$ to $j$ is a tuple of $ℓ \in N$ distinct vertices $γ_{i j} = (i_{1}, \dots, i_{ℓ})$ such that $i_{1} = i$ , $i_{ℓ} = j$ , and, for all $k \in [ℓ - 1]$ , $i_{k + 1} \sim i_{k}$ . The length of the path $γ_{i j}$ is defined to be

| γ_{i j} | := \sum_{k = 1}^{ℓ} ω_{i_{k} i_{k + 1}}^{q - 1} .

The reasons for this choice of scaling of $ω$ are explored in some more detail in Remark 3.9. We now define the graph distance between nodes $i \neq j$ as

d_{i j}^{G} := min_{γ_{i j} : a path from i to j} | γ_{i j} | .

(3.15)

We also define $d_{i i}^{G} := 0$ . We note that the path of minimal length between two neighbouring nodes, is not necessarily the ‘direct’ path $(i, j)$ . If $S \subseteq V$ is not empty, we define the distance from $i \in V$ to $S$ as

d_{i}^{S} := min_{j \in S} d_{i j}^{G} .

For $S = \emptyset$ we define, for all $i \in V$ , $d_{i}^{\emptyset} := + \infty$ . By Manfredi et al. [Reference Manfredi, Oberman and Sviridov138, section 3.1, example 2], if $S \neq \emptyset$ , $u = d^{S}$ is the unique solution to an eikonal equation:

(\begin{matrix} min_{j \sim i} (\nabla u)_{i j} = - 1, & if i \in V ∖ S, \\ u_{i} = 0, & if i \in S . \end{matrix})

That $d^{S}$ is a solution can be understood as follows. If $i \in V ∖ S$ , the minimum value of $(\nabla d^{S})_{i j}$ among all neighbours $j$ of $i$ will be achieved at a $j$ that lies on a shortest path from $i$ to $S$ , in which case $d_{j}^{S} - d_{i}^{S} = - d_{i j}^{G} = - ω_{i j}^{q - 1}$ and thus $(\nabla d^{S})_{i j} = - 1$ . If $i \in S$ , then $d_{i}^{S} = 0$ . For a proof of uniqueness we refer to [Reference Manfredi, Oberman and Sviridov138, section 6.2]. We note that the eikonal equation, while inspired by a differential equation from the continuum setting, is a difference equation on graphs.

It is useful to note that we can rewrite the eikonal equation. Let $i \in V ∖ S$ . Since the equation requires $min_{j \sim i} (\nabla u)_{i j} = - 1$ , we can replace $\nabla u$ by $- (\nabla u)^{-}$ without changing the equation. Since $min_{j \sim i} [- (\nabla u)_{i j}^{-}] = - max_{j \sim i} (\nabla u)_{i j}^{-} = - ∥ (\nabla u)^{-} ∥_{i, \infty}$ , we rewrite the eikonal equation as

(\begin{matrix} ∥ (\nabla u)^{-} ∥_{i, \infty} = 1, & if i \in V ∖ S, \\ u_{i} = 0, & if i \in S . \end{matrix})

Remark 3.9. The parameter $q$ , introduced in the definition of the $E$ -inner product and used in the definition of the gradient, does not appear in the Laplacian and Dirichlet energy, but it does impact the definition of total variation.

There are two common choices for $q$ . If we imagine the graph gradient to be a generalization of the finite-difference discretization of the continuum gradient from a regular grid to a general graph, it is natural to associate $ω_{i j}^{1 - q}$ with the length scale $h^{- 1}$ , where $h$ is the mesh size of the ‘grid’. Reasoning similarly for the graph Laplacian, which ‘should’ be analogous to a finite-difference discretization of the continuum Laplacian, $d_{i}^{- r} ω_{i j} = h^{- 2}$ . If the weights $ω_{i j}$ are scaled by a constant factor, the degree $d_{i}$ scales with the same factor, thus $ω_{i j}^{1 - r} \propto h^{- 2}$ . Hence $ω_{i j}^{1 - r} / ω_{i j}^{1 - q} = ω_{i j}^{q - r} \propto h^{- 2} / h^{- 1} = h^{- 1}$ . As both $ω_{i j}^{q - r}$ and $ω_{i j}^{1 - q}$ scale with $h^{- 1}$ , we might choose $q$ by $q - r = 1 - q$ , that is, $q = \frac{1}{2} (1 + r)$ . The common picks $r = 0$ and $r = 1$ then correspond to $q = \frac{1}{2}$ and $q = 1$ , respectively.

This numerical-analysis-based analogy cannot be extended indefinitely. If we want to interpret the summations in the Dirichlet energy and total variation as discrete quadratures approximating integrals, then the volume of the grid cells of the discretization will have to be incorporated into $ω_{i j}$ . Since this volume depends on the dimension of the space over which it is being integrated, this cannot be made consistent with our preceding considerations.

In the remainder of this work (except in Sections 4, 5.3, and 9 where $q$ returns very briefly) we will make the choice $q = 1$ , even when $r \neq 1$ . This is inspired by Theorem 5.1, which states that the graph Ginzburg–Landau functional (which we will define in Section 5) $Γ$ -converges to graph total variation with $q = 1$ in the relevant parameter limit $ε ↓ 0$ . The graph Ginzburg–Landau functional, which is at the heart of much that is discussed in this Element, is independent of $q$ ; the fact that in the limit total variation with $q = 1$ appears can thus be viewed as a selection criterion for our parameter choice.

Another advantage of choosing $q = 1$ is that the exponent of $ω_{i j}$ in the graph $p$ -Dirichlet energy in (3.11) is independent of $p$ (and equal to $1$ ) if and only if $q = 1$ . The graph $p$ -Dirichlet energy is a common choice of regularizer in graph-based variational models (see section 12 and section 4.1 in [Reference van Gennip and Budd190]) and gave rise to the graph $p$ -Laplacian in (3.13).

If $G = (V, E)$ is an unweighted graph, we call the graph $G^{'} = (V^{'}, E^{'})$ a subgraph of $G$ if $V^{'} \subseteq V$ , $E^{'} \subseteq E$ , and $(i, j) \in E^{'}$ implies $i, j \in V^{'}$ . If, additionally, $i, j \in V^{'}$ and $(i, j) \in E$ implies $(i, j) \in E^{'}$ , then $G^{'}$ is called an induced subgraph (or vertex-induced subgraph, or subgraph induced by $V^{'}$ ) of $G$ . A subgraph $G^{'}$ of $G$ is connected if for all distinct nodes $i, j \in V^{'}$ , there exists a path from $i$ to $j$ . A subgraph $G^{'}$ is a component (or connected component) if it is connected and it is maximal in the sense that if is a connected subgraph of $G$ and $G^{'}$ is a subgraph of $G^{'}$ , then . Any connected component of $G$ necessarily is an induced subgraph of $G$ .

These notions straightforwardly carry over to the case in which $\tilde{G} = (V, E, ω)$ is an edge-weighted graph; that is, we call ${\tilde{G}}^{'} = (V^{'}, E^{'}, χ_{E^{'}} ω |_{V^{'} \times V^{'}})$ Footnote ²⁸ a subgraph of $\tilde{G}$ , a (vertex-)induced subgraph of $\tilde{G}$ , connected, or a connected component of ${\tilde{G}}^{'}$ , if $(V^{'}, E^{'})$ is a subgraph of $(V, E)$ , a (vertex-)induced subgraph of $(V, E)$ , connected, or a connected component of $(V, E)$ , respectively.

4 Directed Graphs

In most of this Element we restrict our attention to undirected graphs – which we defined in Section 3.1 as graphs for which $(i, j) \in E$ if and only if $(j, i) \in E$ – as that is the setting in which most of the work has been done on the topics we cover. In this section we will take a brief detour to directed graphs – that is, graphs in which $(i, j) \in E$ and $(j, i) / \in E$ may both be true for given nodes $i, j \in V$ – and we give a brief overview of (a priori different) approaches to defining graph Laplacians on such graphs. Each of these approaches uses a different aspect of Laplacians on undirected graphs as a starting point to define a Laplacian on a directed graph.

For an edge-weighted directed graph, the edge weights $ω$ need not be symmetric, that is, it is possible that $ω_{i j} \neq ω_{j i}$ . We still assume $ω_{i j} \geq 0$ , with $ω_{i j} > 0$ if and only if $(i, j) \in E$ .

Figure 4.1 shows an example of a directed graph, where an arrow on an edge pointing from node $i$ to node $j$ indicates the edge $(i, j)$ . In particular, the graph in Figure 4.1 is an oriented graph, namely a directed graph in which $(i, j) \in E$ implies $(j, i) / \in E$ . We do not require the directed graphs to be oriented.

Figure 4.1 An example of a directed graph; specifically, an oriented graph.

A direct adaptation of the random walk graph Laplacian to directed graphs is given in Bauer [Reference Bauer17]. In Zhou et al. [Reference Zhou, Schölkopf, Hofmann, Lawrence, Yair and Bottou207] the authors make use of the one-to-one correspondence between directed graphs and bipartite graphsFootnote ²⁹ to define new undirected graphs on the two independent vertex classes of the bipartite graph and combine the Laplacians on those new graphs into a Laplacian on the directed graph. In Chung [Reference Chung52] the author recalls the connection between graph Laplacians and random walks to define a graph Laplacian via a random walk on a directed graph. In Hein et al. [Reference Hein, Audibert and von Luxburg104] a similar approach is followed as we usedFootnote ³⁰ for our undirected Laplacian in Section 3, that is, the authors start by defining inner product structures on the spaces of node and edge functions and a discrete gradient, then find a divergence as the adjoint of the gradient and define the Laplacian as the divergence of the gradient. In Shubin [Reference Shubin177] a magnetic Laplacian is introduced for physical reasons; in [Reference Fanuel, Alaíz, Fernández and Suykens86, Reference Fanuel, Alaíz and Suykens87, Reference Xue, Higham and Zygalakis100] it is used for community detection, visualization, and detection of hierarchical structures in directed graphs (see chapter 4 in [Reference van Gennip and Budd190]). In MacKay et al. [Reference MacKay, Johnson and Sansom137] and Gong et al. [Reference Xue, Higham and Zygalakis100] the trophic Laplacian is introduced, in Klus and Djurdjevac Conrad [Reference Klus and Conrad119] the forward-backward Laplacian, and in Jost et al. [Reference Jost, Mulas and Torres114] and Mulas et al. [Reference Mulas, Zhang and Zucal158] the non-backtracking Laplacian. We refer to section 2.5 of [Reference van Gennip and Budd190] for more details on these approaches.

5 The Graph Ginzburg–Landau Functional

A central object in this work is the graph Ginzburg–Landau functionalFootnote ³¹ ${G L}_{ε} : V \to R$ , which was first introduced in Bertozzi and Flenner [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22]. It consists of the Dirichlet energy plus a potential term:

{G L}_{ε} (u) := \frac{1}{2} α (ε) ∥ \nabla u ∥_{E}^{2} + \frac{1}{ε} W (u) .

There are some variations in the literature regarding the choice of the $ε$ -dependent factor $α (ε)$ and the choice of potential term, which we briefly discuss here.

In [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22] the authors choose $α (ε) = ε$ in analogy to the continuum Ginzburg–Landau functional (see section 3.7 in [Reference van Gennip and Budd190]). In Van Gennip and Bertozzi [Reference Bertozzi189, section 3.1], however, the authors argue that, if one is interested in the limit $ε ↓ 0$ , then $α (ε) = 1$ is a better choice. The reason for this is (in brief) that the divergence of the Dirichlet energy for which $ε$ is meant to compensate in the continuum case, does not occur for the discrete graph Dirichlet energy, which remains bounded. Thus instead of compensating for divergent behaviour, a factor $ε$ would completely remove the influence of the Dirichlet term in the limit $ε ↓ 0$ , leading to a trivial (and uninteresting) limit.

When considering the variational properties of ${G L}_{ε}$ at a fixed $ε > 0$ , the difference between $α (ε) = ε$ and $α (ε) = 1$ is minimal, since $ε^{- 1 / 2} {G L}_{\sqrt{ε}}$ with $α (ε) = ε$ equals ${G L}_{ε}$ with $α = 1$ . Moreover, if we consider gradient flows derived from ${G L}_{ε}$ (see Section 7), going from $α (ε) = ε$ to $α (ε) = 1$ involves only a time rescaling $t_{n e w} = t_{o l d} / ε$ .

The different choices in the potential term $W$ concern two different issues. The first choice involves the factor $d_{i}^{r}$ in the $V$ -inner product. In [Reference Bertozzi and Flenner21; Reference Bertozzi and Flenner22] the authors use $W (u) = \sum_{i \in V} W (u_{i})$ , where is a potential function, which we will consider in much more detail shortly. In Budd and Van Gennip [Reference Budd and van Gennip35] it is noted that $W (u) = ⟨ W \circ u, 1 ⟩_{V} = \sum_{i \in V} d_{i}^{r} W (u_{i})$ is a preferable choice when studying the $V$ -gradient flow of ${G L}_{ε}$ (see Section 7), as it removes the factor $d_{i}^{r}$ (which, for $r \neq 0$ , could be considered a node-dependent time rescaling) from the gradient flow.Footnote ³²

The second choice in $W$ concerns the potential function . In the classical continuum Ginzburg–Landau functional, this is a smooth double-well potential, whose graph resembles (a smooth variant of) the letter ‘W’. An example is shown in Figure 5.1(a). Such a function has three important properties: smoothness, a global minimum which is achieved in exactly two locations (two ‘wells’), and some kind of coercivity which implies that $W (x) \to \infty$ if $| x | \to \infty$ . In some papers, such as in [Reference Bertozzi189, section 2.3], no specific choice is made for $W$ , but precise conditions are given for the smoothness, wells, and coercivity properties of $W$ . Other papers choose a specific (quartic) potential from the class of potentials that satisfy these conditions, such as $W (x) = \frac{1}{4} (x^{2} - 1)^{2}$ in [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22] or Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, section 5] with wells at $x \in {- 1, 1}$ or $W (x) = \frac{1}{4} x^{2} (x - 1)^{2}$ in Calatroni et al. [Reference Luca, Hannah and Flenner41] with wells at $x \in {0, 1}$ . For most purposes, the location of the wells is not important.Footnote ³³ A significant exception is the case of the signless Ginzburg–Landau functional, which we discuss in Section 5.3, and the graph Cahn–Hilliard equation (which requires a mass condition), which is addressed in Section 7.4.

Figure 5.1

(a) Two types of potential $W$ that are commonly used in the Ginzburg–Landau functional: the double-well potential $W (x) = \frac{1}{4} x^{2} (x - 1)^{2}$

(a) and the double-obstacle potential from (5.1).

When ${G L}_{ε}$ is minimized, the potential term drives ‘phase separation’, that is, minimizers (especially at small $ε > 0$ ) will take values which are close to the values at which $W$ has its wells (thus typically $0$ and $1$ , or $- 1$ and $1$ ).

A significant departure from the standard double-well potential can be found in Budd et al. [Reference Budd and van Gennip35, Reference Budd and van Gennip36, Reference Budd, van Gennip and Latz37]. In those papers the double-obstacle potential is used (since it will be clear from the context which potential we are using at any given time in this Element, we use the same symbol $W$ to denote it):

W (x) = (\begin{matrix} \frac{1}{2} x (1 - x), & if 0 \leq x \leq 1, \\ + \infty, & otherwise . \end{matrix})

(5.1)

Figure 5.1 (b) shows a visualization of the double-obstacle potential. We note that the double-obstacle potential also achieves its minimum in two wells, located at $x \in {0, 1}$ , but the coercive nature of the potential is stronger than that of the double-well potential, with $W$ attaining finite values only in the interval $[0, 1]$ . We also call attention to the quadratic, rather than quartic, nature of the potential on $[0, 1]$ . This is convenient when considering gradient flows in which the ‘derivative’ of $W$ appears, which is linear on $(0, 1)$ . A potential difficulty when using the double-obstacle potential compared to the double-well potential is that $W$ is not differentiable outside of $(0, 1)$ . We discuss these issues in more detail in Section 7 and in section 6 in [Reference van Gennip and Budd190], where we outline the results from [Reference Budd and van Gennip35, Reference Budd and van Gennip36, Reference Budd, van Gennip and Latz37].

5.1 Constraints

Minimizing ${G L}_{ε}$ (in any of its variants that we have discussed) over $V$ without imposing any further constraints, leads to constant minimizers $u = c$ , where $c$ can be the location of any of the wells of $W$ .Footnote ³⁴ Besides being necessary to avoid trivial minimizers, in applications constraints are often suggested naturally by a priori known information about the required solutions. In this Element we consider two common constraints:

∙ a ‘hard’ mass (or volume) constraint:Footnote ³⁵ $M (u) := ⟨ u, 1 ⟩_{V} = M$ , for a fixed and given $M \geq 0$ ;
∙ a ‘soft’ fidelity constraint, where a fidelity term of the form $\frac{1}{p} ∥ μ^{\frac{1}{p}} (u - f) ∥_{V, p}^{p}$ is added to to ${G L}_{ε} (u)$ , where $μ \in V_{[0, \infty)} ∖ {0}$ is a node-dependent fidelity parameter function with support equal to some subset $Z \subseteq V$ and $f \in V$ is a reference function to which fidelity should be ‘softly’ enforced on $Z$ .

These constraints are discussed in more detail in [Reference Bertozzi189, section 2.5] and play an important role in [Reference Budd and van Gennip35] (mass constraint) and [Reference Budd, van Gennip and Latz37] (fidelity constraint with $p = 2$ ).

The hard mass constraint can be directly included in the functional that has to be minimized by adding a term $ı_{{0}} (M (u) - M)$ to ${G L}_{ε} (u)$ . In [Reference Bae and Merkurjev13, Reference Jacobs, Merkurjev and Esedoḡlu110] in related contexts relaxed mass constraints are used, in which upper and lower bounds are imposed for the mass instead of a precise value.

A graph functional consisting of the Dirichlet energy term and a fidelity term (or, in other words, a Ginzburg–Landau functional with fidelity term, but without potential term) is studied in Zhou and Schölkopf [Reference Zhou and Schölkopf205].

From an application-based point of view, the presence of a fidelity constraint usually differentiates graph classification from clustering (we refer to chapter 4 of [Reference van Gennip and Budd190] for more details).

5.2 Graph Cuts and $Γ$ -Convergence to Total Variation

One important reason why minimization of ${G L}_{ε}$ can be useful in applications that require clustering or classification of the node set of a graph, is that at small $ε > 0$ the functional ${G L}_{ε}$ captures an (approximate) notion of perimeter of a node subset. To understand this, we first note that, for $S \subseteq V$ ,

T V (χ_{S}) = \sum_{\begin{matrix} i \in S \end{matrix}} \sum_{j \in S^{c}} ω_{i j} =: C u t (S) .

This quantity is called the graph cut of $S$ and can be understood as a discrete perimeter (length) of $S$ .

Since the graph cut is a measure for the strength of the connections between the sets $S$ and $S^{c}$ , it is a candidate for the objective function that is to be minimized in graph clustering. In practice it turns out that minimizing the graph cut subject only to the constraints that $S$ and $S^{c}$ are nonempty sets, often leads to an unbalanced clustering in which one of the two sets contains the vast majority of the graph nodes. In many applications this is not what is required, which has led to many ‘balanced’ (graph) cuts, such as the ratio cut [Reference Lars and Kahng102], normalized cut [Reference Shi and Malik176], min-max cut [Reference Ding, Xiaofeng, Zha, Ming, Simon, of Cercone, Nick, Young and Wu64], edge expansion (related to the sparsest cut problem) [Reference Arora, Hazan and Kale10], and Cheeger cutFootnote ³⁶ (or conductance) [Reference Cheeger and Gunning49, Reference Chung55]:

\begin{matrix} R C u t (S) & := \frac{C u t (S)}{| S |} + \frac{C u t (S)}{| S^{c} |}, N C u t (S) := \frac{C u t (S)}{vol S} + \frac{C u t (S)}{vol S^{c}}, \\ M C u t (S) & := \frac{C u t (S)}{\sum_{\begin{matrix} i \in S \\ j \in S \end{matrix}} ω_{i j}} + \frac{C u t (S)}{\sum_{\begin{matrix} i \in S^{c} \\ j \in S^{c} \end{matrix}} ω_{i j}}, E C u t (S) := \frac{C u t (S)}{min (| S |, | S^{c} |)}, \\ and C C u t (S) & := \frac{C u t (S)}{min (vol S, vol S^{c})}, \end{matrix}

respectively, where $r = 1$ is used in the computations of the volume. Since the main interest in these cuts is as objective functions in minimization problems, different versions of the cuts that are equivalent for minimization purposes may be encountered, such as $\frac{C u t (S)}{| S | | S^{c} |}$ for the ratio cut, which differs from the preceding definition above only by a constant factor $\frac{1}{| V |}$ .

These balanced cuts have been normalized by factors which, upon minimization, avoid one of the sets $S$ and $S^{c}$ being much larger than the other, where ‘large’ is to be interpreted according to the specific measure used in the normalizing factors. Intuitively, these normalization factors play a role similar to the mass constraint from Section 5.1. Minimization of such balanced cuts is NP-hard (see, for example, [Reference Shi and Malik176, Reference von Luxburg193] and references therein). For more information about relaxations of some of these minimization problems in terms of eigenvalue problems for graph Laplacians, we refer to von Luxburg [Reference von Luxburg193].

Multiclass extensions of the preceding cuts for a partition ${S_{i}}_{i \in I}$ of $V$ are straightforward to construct: $\sum_{i \in I} \frac{C u t (S_{i})}{| S_{i} |}$ for the ratio cut, $\sum_{i \in I} \frac{c u t S_{i}}{min (| S_{i} |, | S_{i}^{c} |)}$ for the edge expansion, and similarly for the other balanced cuts.

For more details about graph cuts, we refer to Izenman [Reference Izenman109, chapter 11].

The following theorem shows the connection between the graph cut and ${G L}_{ε}$ . It uses the language of $Γ$ -convergence [Reference Braides28, Reference Maso62].

Theorem 5.1. Assume that $W$ is either a smooth double-well potential with wellsFootnote ³⁷ at $x \in {0, 1}$ , satisfying the coercivity condition $lim_{| x | \to \infty} W (x) = \infty$ , or is the double-obstacle potential. Let $W$ be defined in either of the two ways that are introduced near the beginning of (the current) Section 5 and let $α (ε) = 1$ .

When $ε ↓ 0$ , the sequence $({G L}_{ε})$ $Γ$ -converges to the functional:

Moreover, the following compactness result holds: if $(ε_{k})$ is a sequence of strictly positive real numbers converging to zero and $(u_{k})$ is a sequence in $V$ for which ${G L}_{ε_{k}} (u_{k})$ is bounded uniformly in $k$ , then there exists a subsequence of $(u_{k})$ which converges to a $u_{\infty} \in V$ .

Proof. In the double-well case, the proof can be found in [Reference Bertozzi189, theorems 3.1 and 3.2]. In the double-obstacle case, the proof of $Γ$ -convergence is very similar and given in [Reference Budd and van Gennip35, theorem 6.1]. The proof of the compactness result is not given there, but it is simpler than the one in [Reference Bertozzi189, theorem 3.2]: the double-obstacle potential forces the functions $u_{k}$ to lie in the compact set $V_{[0, 1]}$ . Compactness of $V_{[0, 1]}$ is a consequence of the Heine–Borel theorem, if we identify the set with the closed and bounded set of vectors $[0, 1]^{n} \subseteq R^{n}$ .

The proof in [Reference Bertozzi189] works with $W (u) = \sum_{i \in V} W (u_{i})$ , while the proof in [Reference Budd and van Gennip35] has $W (u) = ⟨ W \circ u, 1 ⟩_{V}$ . □

As a consequence of Theorem 5.1, by the fundamental theorem of $Γ$ -convergence (see [Reference Braides28, Reference Maso62]), if $(u_{k})$ is a sequence such that $u_{k}$ minimizes ${G L}_{ε_{k}}$ over $V$ , then any cluster point of the sequence minimizes ${G L}_{0}$ . Of course, that in itself is not very interesting, as we already know that the minimizers of ${G L}_{ε}$ over $V$ are constant functions. In [Reference Bertozzi189, theorem 3.6] it is shown that the results of Theorem 5.1 remain true (mutatis mutandis) if any one of the constraints from Section 5.1 is implemented in the double-well case and [Reference Budd33, theorem 7.2.4, corollary 7.2.5] shows the same for the double-obstacle case with a fidelity constraint. The double-obstacle case with a mass constraint can be tackled with similar techniques as the other cases.

5.3 Signless Laplacians and Signless Ginzburg–Landau Functional

A variant of the graph Ginzburg–Landau functional is the signless graph Ginzburg–Landau functional ${G L}_{ε}^{σ} : V \to R$ :

where the signless graph gradient is defined to be $(\nabla^{σ} u)_{i j} := ω_{i j}^{1 - q} (u_{j} + u_{i})$ . We note that the only difference with the standard graph gradient and graph Ginzburg–Landau functional is the sum rather than the difference of $u_{j}$ and $u_{i}$ in the gradient. In Keetch and Van Gennip [Reference Keetch and van Gennip117] and Keetch [Reference Keetch116], minimization of ${G L}_{ε}^{σ}$ is used to find approximate solutions to the Max-Cut problem, which consists of finding a partition of the node set $V$ into $S \subseteq V$ and $S^{c}$ , in such a way that $C u t (S)$ is maximized. As before near the beginning of Section 5, we note that the choice between $α (ε) = 1$ and $α (ε) = ε$ is not relevant for this minimization problem. The choice of $W$ , however, is important. In particular, the wells of $W$ should be placed symmetrically with respect to the origin, otherwise the constant function $u = 0$ would be the unique minimizer. In [Reference Keetch116, Reference Keetch and van Gennip117] the authors use $W (x) = (x^{2} - 1)^{2}$ , $W (u) = \sum_{i \in V} W (u_{i})$ , $α (ε) = 1$ , and $q = 1$ . We note that no additional constraints, such as a mass or fidelity constraint, are required to avoid trivial minimizers of ${G L}_{ε}^{σ}$ .

As for the standard graph gradient, we can define the signless graph divergence as the adjoint of the signless gradient. Having those ingredients, we then define the signless graph Laplacian as the signless divergence of the signless gradient, the signless graph Dirichlet energy as in (3.8), but using the signless gradient instead, and signless graph total variation as in (3.10), but using the signless divergence instead. In each of these cases, this simply translates to replacing the difference in the divergence or Laplacian by a summation. In particular,

(Δ^{σ} u)_{i} := d_{i}^{- r} \sum_{j \in V} ω_{i j} (u_{i} + u_{j}) and {T V}^{σ} (u) := \frac{1}{2} \sum_{i, j \in V} ω_{i j} | u_{i} + u_{j} | .

(5.2)

We note that the signless Laplacian has matrix representation $Δ^{σ} = D^{- r} (D + A)$ (see Remark 3.4). Besides the combinatorial ( $r = 0$ ) and random walk ( $r = 1$ ) versions, the symmetrically normalized signless Laplacian $Δ^{σ, sym} = D^{- \frac{1}{2}} (D + A) D^{- \frac{1}{2}}$ also appears in the literature. It is a special case of the signless variant of the two-parameter graph Laplacian in (3.7) (see also Remark 3.6).

The signless analogue of Theorem 5.1 holds, with ${G L}_{ε}$ , $T V$ , and $V_{{0, 1}}$ replaced by ${G L}_{ε}^{σ}$ , ${T V}^{σ}$ , and $V_{{- 1, 1}}$ , respectively, as proven in [Reference Keetch and van Gennip117, lemmas 4.3 and 4.4]. This justifies the use of ${G L}_{ε}^{+}$ as an approximate objective function for the Max-Cut problem, since, for $S \subseteq V$ , ${T V}^{σ} (χ_{S} - χ_{S^{c}}) = 2 \sum_{i, j \in V} ω_{i j} - 4 C u t (S)$ [Reference Keetch and van Gennip117, lemma 4.1]. Thus minimization of ${T V}^{σ} (χ_{S} - χ_{S^{c}})$ over $S$ maximizes the cut. We note that, given that the wells of $W$ are located at $x \in {- 1, 1}$ , the relevant set ‘indicator’ function is here $χ_{S} - χ_{S^{c}}$ , not $χ_{S}$ .

6 Spectrum of the Graph Laplacians

Indicators of the usefulness of the graph Laplacians for clustering problems can be found in their spectrum, particularly the property that is stated in the following lemma. In this section (Section 6), we temporarily drop the assumption that our graphs are connected. We begin with a very important result.

Lemma 6.1. All eigenvalues of $Δ$ and $Δ^{s y m}$ are real and nonnegative. The smallest eigenvalue of each of these operators is $0$ and its algebraic and geometric multiplicities are both equal to the number $k \in N$ of connected components $G_{1}, \dots, G_{k}$ of the graph $G$ . The associated eigenspace for $Δ$ is spanned by the set of indicator functions (on $V$ ) ${χ_{S^{i}}}_{i \in [k]}$ , where $S^{i} \subseteq V$ is the set of nodes that induces the subgraph $G_{i}$ . The associated eigenspace for $Δ^{s y m}$ is spanned by ${f^{i}}_{i \in [k]}$ where $f^{i} : V \to R$ is defined by, for all $j \in V$ , $f_{j}^{i} := d_{j}^{1 / 2} (χ_{S^{i}})_{j}$ .

Proof. This well-known result has been proven and reproven in many places, for example Chung [Reference Chung55, lemma 1.7] and von Luxburg [Reference von Luxburg193, propositions 1–4]. □

This lemma shows that if the clustering problem is trivial, in the sense that each cluster is its own connected component, then the number of clusters is given by the multiplicity of the zero eigenvalue and the corresponding eigenspace contains information about the nodes belonging to each cluster. The expectation that graph Laplacians are also useful in nontrivial clustering problems, in which the graph itself may be connected, but certain induced subgraphs have a high edge volume (see (3.2)) and low connectivity to other parts of the graph, is based on the hope that small perturbations to the edge structure of the trivial case will preserve the important information in the first (in ascending order) eigenvalues of the graph Laplacian and their corresponding eigenfunctions (or eigenspaces). This hope is (non-rigorously) justified in practice by the success of Laplacian-based clustering techniques, such as spectral clustering [Reference Izenman109, Reference Ng, Jordan, Weiss, Thomas, Suzanna and Ghahramani161, Reference Shi and Malik176, Reference von Luxburg193] or the ones discussed in [Reference van Gennip and Budd190].

This hope can also be given some rigorous support, for example through higher-order Cheeger inequalities (see Lee et al. [Reference Lee, Gharan and Trevisan128, theorem 1.1])Footnote ³⁸ of the form (for $K \in N \cap [2, n]$ ),

\frac{1}{2} λ_{K} \leq min_{{S_{k}}_{k = 1}^{K}} max (\frac{C u t (S_{k})}{vol S_{k}} : k \in [K]) \leq C (K) \sqrt{λ_{K}},

(6.1)

where $λ_{K}$ denotes the $K^{th}$ eigenvalue (in nondecreasing order) of the random walk graph Laplacian and $C (K) = O (K^{2})$ as $K \to \infty$ .Footnote ³⁹ The minimum in (6.1) is taken over all partitions of $V$ into $K$ (per definition of partition nonempty and pairwise disjoint) subsets $S_{k}$ of $V$ . These inequalities tell us that, if there is a large gap between the $K^{th}$ and $(K + 1)^{st}$ eigenvalues, then the value of the Cheeger constant, that is, the value in between the inequalities in (6.1) (see also footnote Footnote 36), will be significantly higher if we partition $V$ into $K + 1$ subsets, than if we partition it into $K$ subsets.

For further details about the eigenvectors that support their use in clustering, we refer to the structure theorems of Peng et al. [Reference Peng, Sun and Zanetti169] and Macgregor and Sun [Reference Macgregor, Sun, Kamalika, Stefanie, Le, Csaba and Gang136] (see also Macgregor [Reference Macgregor133, chapter 4]). In Macgregor and Sun [Reference Macgregor, Sun, Kamalika, Stefanie, Le, Csaba and Gang136] (see also Macgregor [Reference Macgregor133, chapter 5]) it is shown that in some cases the use of fewer than $K$ eigenvectors (specifically, in the spectral clustering method; see section 4.1.2 in [Reference van Gennip and Budd190]) to find $K$ clusters can be beneficial.

More rigorous support is given by the results of Hoffmann et al. [Reference Franca, Bamdad, Oberai and Stuart105], whose paper looked at graph Laplacians $Δ^{(s, t)}$ (see (3.7)) that are built upon data sampled from a probability density which is a perturbation of perfectly separated clusters, and the continuum limit of these Laplacians obtained as the number of data points tends to infinity (more details on this continuum object and this random setting can be found in section 7.2 in [Reference van Gennip and Budd190]). It was shown in [Reference Franca, Bamdad, Oberai and Stuart105, theorem 3.2] that if $s + t = 1$ (as is the case for the random walk and symmetrically normalized Laplacians), then this continuum operator has a spectral gap between its second and third eigenvalue which is independent of the size of the perturbation (refer to [Reference van Gennip and Budd190] for more details).

The second smallest eigenvalue of a graph LaplacianFootnote ⁴⁰ is called the algebraic connectivity or Fiedler value. By Lemma 6.1, for a connected graph this value is strictly positive. The associated eigenvector is called the Fiedler vector.Footnote ⁴¹

Figure 6.1

(a) An example of a disconnected graph

(b) and that same graph with nodes coloured according to the value of the Fiedler vector at that node.

We illustrate these properties in Figures 6.1, 6.2, and 6.3. Figure 6.1 shows a disconnected graph, and Figure 6.1 (b) shows how the Fiedler vector distinguishes between the connected components of the graph. Figure 6.3 (a) shows the spectrum of the associated unnormalized graph Laplacian, and we observe that the first two eigenvalues are zero, in accordance with Lemma 6.1. Next, in Figure 6.2 we make this graph connected by adding a single extra edge. Figure 6.2 (b) shows how the Fiedler vector still distinguishes between the two clusters. Figure 6.3 (b) shows the spectrum of the associated graph Laplacian, and we observe that the second eigenvalue is now non-zero – in accordance with Lemma 6.1 – but the other eigenvalues are largely unchanged.

Figure 6.2

(a) The graph from Figure 6.1 with an extra edge added to make it connected

(b) and that same connected graph with nodes coloured according to the value of the Fiedler vector at that node.

(a) and Figure 6.2

(b) plotted on a (crunched) log scale to emphasize the change in the second eigenvalue, indicated by a square marker.

Figure 6.3 The spectra of the (unnormalized) graph Laplacians of the graphs from Figure 6.1

One key feature of the Laplacian spectrum is its stability with respect to noise. In El Karoui and Wu [Reference Karoui and Hau-Tieng115, lemmas 3.1 and 3.2] it was shown that if there exist $(α_{i})_{i = 1}^{n}$ and $ε > 0$ such that for all $i, j \in V$ , $α_{i} > 0$ and $| ω_{i, j} - α_{i} {\tilde{ω}}_{i, j} | \leq ε$ ,Footnote ⁴² then the corresponding random walk Laplacians built from $ω$ and $\tilde{ω}$ differ in spectral normFootnote ⁴³ by at most an $O (ε)$ term which is independent of the $α_{i}$ . For a more comprehensive discussion of this stability, we refer to Ding and Wu [Reference Ding and Hau-Tieng65] and the references therein. This property is of practical importance, as it allows (spectral) graph Laplacian methods to be applied robustly to noisy data.

There are other properties of the spectrum that can be proven rigorously. Since we will not directly use them in the current work, we refer to other sources such as [Reference Chung55, Reference Nestor, Braxton and Bertozzi191, Reference von Luxburg193] for more information.

An analogous result to Lemma 6.1 also holds for the signless graph Laplacians (see Section 5.3). We recall the definition of bipartite graphs from footnote Footnote 29.

Lemma 6.2. All eigenvalues of $Δ^{σ}$ and $Δ^{σ, sym}$ are real and nonnegative. Assume the graph $G$ has $k \in N$ connected components $G_{1}, \dots, G_{k}$ induced by the subsets $S^{i} \subseteq V$ ( $i \in [k]$ ) and let $k^{'} \in N \cap [0, k]$ . Then both $Δ^{σ}$ and $Δ^{σ, sym}$ have eigenvalue $0$ with algebraic and geometric multiplicity each equal to $k^{'}$ if and only if $k^{'}$ of the connected components are bipartite graphs.

In that case we may assume (possibly after relabelling) that, for all $i \in [k^{'}]$ , $G_{i}$ is bipartite. Denote the partite sets of $G_{i}$ by $T^{i} \subseteq S^{i}$ and $S^{i} ∖ T^{i}$ . Then the eigenspace corresponding to the eigenvalue $0$ for $Δ^{σ}$ is spanned by the set of indicator functions (on $V$ ) ${χ_{T^{i}} - χ_{S^{i} ∖ T^{i}}}_{i \in [k^{'}]}$ . The eigenspace corresponding to the eigenvalue $0$ for $Δ^{σ, sym}$ is spanned by ${f^{i}}_{i \in [k^{'}]}$ where the functions $f^{i} : V \to R$ are defined by, for all $j \in V$ , $f_{j}^{i} := d_{j}^{1 / 2} ((χ_{T^{i}})_{j} - (χ_{S^{i} ∖ T^{i}})_{j})$ .

Proof. A proof is provided in Keetch and Van Gennip [Reference Keetch and van Gennip117, proposition 4]. □

Similar to how the result from Lemma 6.1 helps interpreting the usefulness of graph Laplacians for clustering, so the result from Lemma 6.2 clarifies the usefulness of signless graph Laplacians for Max-Cut problems, as in [Reference Keetch116, Reference Keetch and van Gennip117]. The objective in a Max-Cut problem is to find a partition of the node set $V$ of a given graph into two disjoint nonempty subsets $T$ and $V ∖ T$ such that $C u t (T) = max_{\tilde{T} \subseteq V} C u t (\tilde{T})$ (see Section 5.2). In the trivial case, that is, for a graph in which all $k$ connected components are bipartite) the maximum cut can be achieved by setting $T = ⋃_{i = 1}^{k} T_{i}$ (using the notation from Lemma 6.2) and the information about each $T_{i}$ is encoded in the eigenspace corresponding to the eigenvalue $0$ . Analogously to the clustering case, the hope (justified by numerical tests in [Reference Keetch116, Reference Keetch and van Gennip117]) is that, if the connected components are perturbations of bipartite graphs, enough of this information remains encoded in those eigenfunctions (or eigenspaces) that correspond to the smallest eigenvalues of the signless graph Laplacians, such that it can be used to construct a partition of $V$ that, in a sense,Footnote ⁴⁴ is the closest to an actual bipartition as is possible in the given graph (or a good approximation thereof).

For information about the (nonlinear) spectrum of graph $p$ -Laplacians, we refer to Bühler and Hein [Reference Böhle, Kuehn, Mulas and Jost38].

7 Gradient Flow: Allen–Cahn

Starting from this section, we only consider undirected graphs. In the definition of ${G L}_{ε}$ , we make the choices $α (ε) = 1$ and $W (u) = ⟨ W \circ u, 1 ⟩_{V}$ .Footnote ⁴⁵ We will specify whether $W$ is a double-well or double-obstacle potential where this is relevant. Also, we recall from Remark 3.9 that we choose $q = 1$ .

In Section 5 we have argued that minimizing ${G L}_{ε}$ under a mass or fidelity constraint gives constrained approximate minimum cuts. This raises the question of how a minimizer can be found. We introduce three different options aimed at minimizing (or approximately minimizing) ${G L}_{ε}$ : a gradient flow (in the current section), a threshold dynamics scheme (Section 8), and flow by mean curvature (Section 9). It should be noted that none of these options provably leads to global minimizers of ${G L}_{ε}$ , but in many cases it can be shown that the scheme in question decreases ${G L}_{ε}$ (or a closely related functional).

This work focuses on graph models, not the continuum models that were their inspiration, and so we will not delve deeply into those models. In section 3.7 of [Reference van Gennip and Budd190] we give a short (and very much not exhaustive) overview of some of the important literature on the continuum models that inspired many of the graph models we discuss in this work.

7.1 Smooth Double-Well Potential

First we consider the $V$ -gradient flow of ${G L}_{ε}$ [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22, Reference Nestor, Braxton and Bertozzi191]. The equation describing this flow is of the form $\frac{d u}{d t} = - {g r a d}_{V} {G L}_{ε} (u)$ , where for all $u \in V$ , the $V$ -gradient of ${G L}_{ε}$ at $u$ is the unique function ${g r a d}_{V} {G L}_{ε} (u) \in V$ such that, with $s \in R$ and for all $v \in V$ ,

{(\frac{d}{d s} {G L}_{ε} (u + s v))}_{s = 0} = ⟨ {g r a d}_{V} {G L}_{ε} (u), v ⟩_{V} .

In analogy to the continuum case (see section 3.7.2 of [Reference van Gennip and Budd190]), the resulting equation is called the graph Allen–Cahn equation:

\frac{d u}{d t} = - Δ u - \frac{1}{ε} W^{'} \circ u .

(7.1)

We have assumed here that the double-well potential $W$ is chosen to be smooth (in the sense of Section 5) and, as is standard practice for gradient flows, we have introduced an ‘artificial time’ variable $t$ , such that at each time $t$ , $u (t) \in V$ . To be precise, borrowing notation from Budd and Van Gennip [Reference Budd and van Gennip35], we define the following spaces, given an interval $T \subseteq R$ and subset $A \subseteq R$ :

\begin{matrix} V_{t \in T} & := {u : T \to V}, V_{A, t \in T} := {u : T \to V_{A}}, \\ L^{2} (T; V) & := {u \in V_{t \in T} : ∥ u ∥_{t \in T} < \infty}, \\ H^{1} (T; V) & := {u \in L^{2} (T; V) : \frac{d u}{d t} \in L^{2} (T; V)}, \\ H_{l o c}^{1} (T; V) & := {u \in V_{t \in T} : \forall a, b \in T u |_{(a, b)} \in H^{1} ((a, b); V)} . \end{matrix}

The norm $∥ \cdot ∥_{t \in T}$ in the definition of $L^{2} (T; V)$ is induced by the inner product

(u, v)_{t \in T} := \int_{T} ⟨ u (t), v (t) ⟩_{V} d t

and $\frac{d u}{d t} \in L^{2} (T; V)$ denotes the generalized time derivative of $u$ , that is, the function which satisfies, for all $v \in C_{c}^{\infty} (T; V)$ , ${(u, \frac{d v}{d t})}_{t \in T} = - {(\frac{d u}{d t}, v)}_{t \in T} .$

Remark 7.1. If we were not to choose $α (ε) = 1$ , but leave $α (ε)$ unspecified, then (7.1) would be

\frac{d u}{d t} = - α (ε) Δ u - \frac{1}{ε} W^{'} \circ u

instead. Rescaling by $α (ε)$ gives

\frac{1}{α (ε)} \frac{d u}{d t} = - Δ u - \frac{1}{ε α (ε)} W^{'} \circ u or \frac{d u}{d \tilde{t}} = - Δ u - \frac{1}{\tilde{ε}} W^{'} \circ u,

if we define $\tilde{t} = α (ε)^{- 1} t$ and $\tilde{ε} = ε α (ε)$ . So, even though the choice of $α (ε)$ matters when considering the limiting behaviour of ${G L}_{ε}$ as $ε ↓ 0$ (see Theorem 5.1), when considering the graph Allen–Cahn equation we can assume, without loss of generality, that $α (ε) = 1$ , as indeed we did earlier in this section and as will be our default choice except where it is relevant to specify otherwise.

The choice of $W$ has more influence on the Allen–Cahn equation. Choosing $W (u) = \sum_{i \in V} W (u_{i})$ (rather than $W (u) = ⟨ W \circ u, 1 ⟩_{V}$ ) leads to a term $- \frac{1}{ε} D^{- r} W^{'} \circ u$ (if we interpret $W^{'} \circ u$ as the vector with components $W^{'} (u_{i})$ for $i \in V$ ) instead of $- \frac{1}{ε} W^{'} \circ u$ . To the authors’ knowledge, there are currently no studies that focus on the influence that the extra scaling $D^{- r}$ (if $r \neq 0$ ) has in practical applications. In our discussion of applications in chapter 4 of [Reference van Gennip and Budd190], we will mostly ignore this difference.

Remark 7.2. If we wish to similarly obtain a graph Allen–Cahn equation with the two-parameter normalization of the graph Laplacian $Δ^{(s_{1}, s_{2})}$ (which, we recall from (3.7), is defined to be $D^{- s_{1}} (D - A) D^{- s_{2}}$ ), that is,

\frac{d u}{d t} = - Δ^{(s_{1}, s_{2})} u - \frac{1}{ε} W^{'} \circ u,

as a $V$ -gradient flow of a Ginzburg–Landau-like energy, then we must modify the Ginzburg–Landau energy to

\frac{1}{2} ∥ \nabla (D^{- s_{2}} u) ∥_{E}^{2} + \frac{1}{ε} W (u),

and take the $V$ -gradient flow (and inner products in the energy) with $r = s_{1} - s_{2}$ . The same modification mutatis mutandis applies to all of the Allen–Cahn equations in (this) Section 7. For further details, we refer to Budd [Reference Budd34].

Remark 7.3. Since the right-hand side of (7.1) is locally Lipschitz continuous, by the Picard–Lindelöf theorem (see Hale [Reference Hale103, chapter I, theorem 3.1]), for every $u^{0} \in V$ there exists a unique continuously differentiable solution $u$ locally in time of the initial-value problem with Equation (7.1) and initial condition $u (0) = u_{0}$ . Since (7.1) is the $V$ -gradient flow of ${G L}_{ε}$ , we have, for all $t \geq 0$ in the domain of $u$ , ${G L}_{ε} (u (t)) \leq {G L}_{ε} (u_{0}) < \infty$ . Since $W$ satisfies a coercivity condition (i.e., $W (x) \to \infty$ if $| x | \to \infty$ ; see Section 5), this implies that $∥ u (t) ∥_{V, \infty}$ is bounded for all $t \geq 0$ in the domain of $u$ (there is no finite-time blowup). A standard continuation result for ODEs [Reference Hale103, chapter I, theorem 2.1] tells us that the solution $u$ exists for all $t \in [0, \infty)$ .

Remark 7.4. As discussed in some more detail in section 3.7.6 of [Reference van Gennip and Budd190], solutions of the continuum Allen–Cahn equation converge to solutions of continuum mean curvature flow (in some sense that can be made precise) as $ε ↓ 0$ . For more details about the expectations this raises for the graph variants, we refer to Section 9.2.

Remark 7.5. In Keetch and Van Gennip [Reference Keetch and van Gennip117] (see also Keetch [Reference Keetch116]) the signless graph Allen–Cahn equation is derived, based on the signless graph Ginzburg–Landau functional from Section 5.3. It differs from the equation in (7.1) in that it uses the signless graph Laplacian $Δ^{σ}$ from (5.2) rather than $Δ$ .

7.2 Double-Obstacle Potential

Instead of using a smooth double-well potential $W$ , we can also consider the $V$ -gradient flow of ${G L}_{ε}$ using the nondifferentiable double-obstacle potential. In that case ${g r a d}_{V} {G L}_{ε} (u)$ is no longer uniquely determined at each $u$ . Instead we require the use of subgradients [Reference Ekeland and Temam67, definition 5.1] and the differential equation describing the gradient flow is replaced by a differential inclusion, $\frac{d u}{d t} \in - \partial_{V} {G L}_{ε} (u)$ , where the subdifferential $\partial_{V} {G L}_{ε} (u)$ of ${G L}_{ε}$ at $u \in V$ (with respect to the $V$ -inner product) is the setFootnote ⁴⁶

\begin{matrix} (\begin{matrix} (β \in V : \forall w \in V ⟨ w - u, β ⟩_{V} + {G L}_{ε} (u) \leq {G L}_{ε} (w)), & if {G L}_{ε} (u) < + \infty, \\ \emptyset, & if {G L}_{ε} (u) = + \infty . \end{matrix}) \end{matrix}

Using the double-obstacle potential in ${G L}_{ε}$ then gives us the following gradient flow differential inclusion:Footnote ⁴⁷

\frac{d u}{d t} = - Δ u + \frac{1}{ε} (- \frac{1}{2} + u + β), where, \forall i \in V, β_{i} (t) \in - \partial_{R} ı_{[0, 1]} (u_{i} (t)) .

(7.2)

We define a solution of (7.2) on an interval $T \subseteq R$ to be a pair $(u, β) \in V_{[0, 1], t \in T} \times V_{t \in T}$ with $u \in H_{l o c}^{1} (T; V) \cap C^{0} (T; V)$ such that (7.2) is satisfied at a.e. $t \in T$ and for all $i \in V$ .Footnote ⁴⁸ In Budd and Van Gennip [Reference Budd and van Gennip35, theorem 3.2] it is proven that if $(u, β)$ is a solution to (7.2), then $β$ is determined at all $i \in V$ and a.e. $t \in T$ to be

β_{i} (t) = (\begin{matrix} \frac{1}{2} + ε (Δ u (t))_{i}, & if u_{i} (t) = 0, \\ 0, & if u_{i} (t) \in (0, 1), \\ - \frac{1}{2} + ε (Δ u (t))_{i}, & if u_{i} (t) = 1. \end{matrix})

This means that the differential inclusion above is actually a differential equation.

When discussing the Allen–Cahn equation with the double-obstacle potential from (7.2), we have stated the result for a.e. $t \in T$ . This contrasts with the Allen–Cahn equation with the double-well potential from (7.1), for which we have seen in Remark 7.3 that, for every initial condition, a unique solution exists on $[0, \infty)$ . In [Reference Budd and van Gennip35, theorems 3.9 and 3.10] it is shown that, for all initial conditions $u^{0} \in V_{[0, 1]}$ there exists a pair $(u, β) \in V_{[0, 1], t \in T} \times V_{t \in T}$ which satisfies (7.2) for a.e. $t \in T$ with $u (0) = u^{0}$ and $u \in H_{l o c}^{1} (T; V) \cap C^{0, 1} (T; V)$ . Moreover, $u$ is uniquely determined for all $t \in T$ and $β$ for a.e. $t \in T$ .

The choice for the double-obstacle potential makes it possible to connect the graph Allen–Cahn equation with the graph Merriman–Bence–Osher (MBO) scheme, which we introduce in Section 8. This connection is looked at in more detail in chapter 6 of [Reference van Gennip and Budd190].

Also, (numerical) practice is influenced by the choice for the double-obstacle potential, as argued in Bosch et al. [Reference Bosch, Klamt and Stoll26], where the (fidelity-forced; see Section 7.3) graph Allen–Cahn equation with a non-smooth potential outperforms the one with a smooth potential on image segmentation tasks (for more detail on image segmentation, we refer to section 4.1.3 of [Reference van Gennip and Budd190]).

7.3 Allen–Cahn with Constraints

We can incorporate the mass constraint or fidelity constraint from Section 5.1 into the gradient flow. Starting with the latter, recall that the fidelity term $\frac{1}{p} ∥ μ^{1 / p} (u - f) ∥_{V, p}^{p}$ is added to ${G L}_{ε} (u)$ , where $μ \in V_{[0, \infty)} ∖ {0}$ has support $Z \subseteq V$ . For simplicity we restrict ourselves here to the case $p = 2$ . This is also most commonly used in the literature, for example, Bertozzi and Flenner [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22] (double-well potential) and Budd et al. [Reference Budd, van Gennip and Latz37] (double-obstacle potential). At the level of the $V$ -gradient flow equation, this adds a term $- μ (u - f)$ (i.e., minus the first variation of the fidelity term in the functional) to the right-hand side of (7.1) or (7.2), leading to the fidelity-forced Allen–Cahn equation on graphs:

\frac{d u}{d t} = - Δ u - \frac{1}{ε} g (u, β) - μ (u - f),

(7.3)

where $g (u, β) = W^{'} \circ u$ for the double-well potential case (see (7.1)) and $g (u, β) = \frac{1}{2} - u - β$ for the double-obstacle potential case (see (7.2)).

To instead impose a mass constraint $M (u) = M$ , we recall from Section 5.1 that we can do so by adding the term $ı_{{0}} (M (u) - M)$ to the functional. In the gradient flow this leads to an additional term [Reference Ekeland and Temam67, proposition 5.6] which should be an element of the subdifferential $\partial_{V} m (u)$ , where $m (u) := ı_{{0}} (M (u) - M)$ .

Lemma 7.6. Let $u \in V$ and $M \in R$ , then

\partial_{V} m (u) = (\begin{matrix} {v \in V : \exists c \in R \forall i \in V v_{i} = c}, & if M (u) = M, \\ \emptyset, & if M (u) \neq M . \end{matrix})

Proof. The proof is found in lemma 3.1.6 in [Reference van Gennip and Budd190]. □

Lemma 7.6 shows that, if a mass constraint is imposed, an additional constant term is added to the gradient flow. The value of this constant is determined by the requirement that the resulting equation conserves mass. To be explicit, if $\frac{d u}{d t} = - Δ u - \frac{1}{ε} g (u, β) + c$ (recall the definition of $g (u, β)$ from its introduction in the fidelity-forced case), then we require

0 = \frac{d}{d t} ⟨ u, 1 ⟩_{V} = - \frac{1}{ε} ⟨ g (u, β), 1 ⟩_{V} + c vol V = - \frac{1}{ε} M (g (u, β)) + c vol V,

where we used that $M (Δ u) = 0$ . Hence $c = \frac{1}{ε vol V} M (g (u, β))$ and the mass-conserving Allen–Cahn equation is

\frac{d u}{d t} = - Δ u - \frac{1}{ε} g (u, β) + \frac{1}{ε vol V} M (g (u, β)) .

(7.4)

We emphasize that $c$ is constant as function in $V$ , that is, it has the same value at each node, but may (and does) depend on $u$ .

We may also impose the fidelity and mass constraints simultaneously. Either by explicit computation of the subdifferential of the sum of the fidelity term and $m$ , or by using the fact that this subdifferential of the sum is equal to the sum of the respective subdifferentials [Reference Ekeland and Temam67, proposition 5.6], we find that the resulting mass-conserving fidelity-forced ( $p = 2$ ) Allen–Cahn equation (version 1) is

\frac{d u}{d t} = - Δ u - \frac{1}{ε} g (u, β) - μ (u - f) + c,

for some constant $c \in R$ . As before, we can determine the value of $c$ by requiring that $⟨ \frac{d u}{d t}, 1 ⟩_{V} = 0$ , which leads to $c = \frac{1}{vol V} (\frac{1}{ε} M (g (u, β)) + ⟨ μ, u - f ⟩_{V})$ .

In Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] a different approach is taken. Instead of the fidelity term from Section 5.1, a term $- ⟨ u, μ f ⟩_{V}$ is added to the mass-conserving Ginzburg–Landau functional.Footnote ⁴⁹ We note the minus sign. In the minimization of the functional, this term encourages $u_{i}$ to be large whenever $μ_{i} f_{i}$ is large and to be small whenever $μ_{i} f_{i}$ is small. The corresponding mass-conserving fidelity-forced Allen–Cahn equation (version 2) is

\frac{d u}{d t} = - Δ u - \frac{1}{ε} g (u, β) + μ f + c,

where again $c$ is determined by the requirement of mass conservation: $c = \frac{1}{vol V} (\frac{1}{ε} M (g (u, β)) - ⟨ μ, f ⟩_{V})$ .

As was the case for the Allen–Cahn equation without constraints (see Remark 7.3), if we use the double-well potential in the constrained equation in (7.3) or (7.4), then the right-hand side of the equation is locally Lipschitz continuous and thus existence and uniqueness of solutions locally in time for any given initial condition follow from the Picard–Lindelöf theorem of ODE theory (see Hale [Reference Hale103, chapter I, theorem 3.1]). A gradient flow argument as in Remark 7.3, this time using the constrained Ginzburg–Landau functionals, again rules out finite-time blowupFootnote ⁵⁰ and thus the solution can be extended to $[0, \infty)$ (see [Reference Hale103, chapter I, theorem 2.1]).

In the double-obstacle potential case (as for (7.2)), we define a solution of (7.3) (or (7.4)) on an interval $T \subseteq R$ to be a pair $(u, β) \in V_{[0, 1], t \in T} \times V_{t \in T}$ with $u \in H_{l o c}^{1} (T; V) \cap C^{0} (T; V)$ such that (7.3) (or (7.4)) is satisfied at a.e. $t \in T$ and, for all $i \in V$ and a.e. $t \in T$ , $β_{i} (t) \in - \partial_{R} ı_{[0, 1]} (u_{i} (t))$ . (We will sometimes simply refer to $u$ as a solution, implying the existence of a corresponding $β$ .) We refer to Budd and Van Gennip [Reference Budd and van Gennip36, theorems 3.8 and 3.9] and Budd et al. [Reference Budd, van Gennip and Latz37, theorem 2.7] for proofs of existence and uniqueness of solutions for the initial-value problems corresponding to the mass-constrained and fidelity-forced graph Allen–Cahn equations, respectively. The function $u$ is uniquely determined for all $t$ . In the fidelity-forced case the function $β$ is uniquely determined for a.e. $t$ ; in the mass-conserving case $β$ is uniquely determined up to a constant for almost every time $t$ . Moreover, this constant is zero if there exists such a time $t$ and a node $i$ such that $u_{i} (t) \in (0, 1)$ (for more details, we refer to [Reference Budd and van Gennip36, theorem 3.8]).

7.4 Different Metrics

A gradient flow is determined by the functional of which the gradient is taken and the metric structure with respect to which the gradient is constructed. Earlier we used the metric induced by the $V$ -inner product. Other choices lead to other gradient flows. In Van Gennip [Reference van Gennip188, definition 4.1], for example, the $H^{- 1}$ -inner product for functions $u, v \in V$ with $M (u) = M (v) = 0$ is introduced:

⟨ u, v ⟩_{H^{- 1}} := ⟨ \nabla φ, \nabla ψ ⟩_{E} = ⟨ Δ φ, ψ ⟩_{V} = ⟨ φ, Δ ψ ⟩_{V},

where $φ, ψ \in V$ solve the graph Poisson equations $Δ φ = u$ and $Δ ψ = v$ , respectively. In [Reference van Gennip188, section 3.4] it is proven that the zero-mass conditions on $u$ and $v$ are necessary and sufficient to ensure that these Poisson equations have solutions. Moreover, these solutions are unique up to an additive constant, which does not influence the inner product. In [Reference van Gennip188, Supplementary Materials section 4] the graph Cahn–Hilliard equation (named thus, in analogy to the continuum $H^{- 1}$ -gradient flow of the Ginzburg–Landau functional; see section 3.7.3 in [Reference van Gennip and Budd190]) is derived as the $H^{- 1}$ -gradient flow of ${G L}_{ε}$ :

\begin{matrix} \frac{d u}{d t} = - Δ Δ u - \frac{1}{ε} Δ (W^{'} \circ u) .^{51} \end{matrix}

Footnote ⁵¹Since $M (Δ w) = 0$ for any $w \in V$ , we see that the Cahn–Hilliard equation automatically conserves mass. It may appear restrictive that we required $u$ to have zero mass, but if $M (u) \neq 0$ , then we can easily transform $u$ into a function with zero mass: $v = u - \frac{M (u)}{vol V}$ . This addition of a constant (as a function in $V$ ) term does not affect the Dirichlet term in ${G L}_{ε}$ and effectively shifts the graph of $W$ (and thus its wells) by $M (u) / vol V$ to the right. We note that this shift does depend on $u$ , but if we restrict ourselves to functions $u \in V$ with prescribed mass $M$ (not necessarily zero), we get a constant shift $M / vol V$ . This restriction is natural, since the Cahn–Hilliard equation preserves mass.

Other choices for the metric are possible, but have not yet been explored in the literature, to the best current knowledge of the authors.

8 Merriman–Bence–Osher Scheme

In this section we take a look at a threshold dynamics scheme called the Merriman–Bence–Osher (MBO) scheme.

8.1 Definition of the MBO Scheme

The MBO scheme in the continuum was originally introduced in Merriman et al. [Reference Barry, Bence and Osher146, Reference Barry, Bence and Osher147] as a method to approximate the flow by mean curvature (we refer to section 3.7.5 of [Reference van Gennip and Budd190] for more information about continuum mean curvature flow and to Section 9 of this Element for graph variants), but has found its way into the ‘differential equations on graphs’ literature in recent years mostly as an alternative way to (approximately) minimize ${G L}_{ε}$ . Before we consider some reasons why it is not unreasonable to use the scheme for this purpose, we will first give its recursive definition on graphs:

Graph MBO Scheme (without Constraints)

∙ Initialize. Choose an initial condition $u^{0} = χ_{S_{0}}$ with $S^{0} \subseteq V$ and a ‘time step’Footnote ⁵² $τ > 0$ .
∙ Step $k + 1$ : diffusion. Solve the diffusion/heat equation $\frac{d u}{d t} = - Δ u$ on $(0, τ]$ with initial condition $u (0) = u^{k}$ .
∙ Step $k + 1$ : threshold. Define, for all $i \in V$ , $u_{i}^{k + 1} := (\begin{matrix} 0, & if u_{i} (τ) < \frac{1}{2}, \\ 1, & if u_{i} (τ) \geq \frac{1}{2} . \end{matrix})$
∙ Stop. Stop the scheme when a stopping condition or predetermined number of steps has been achieved.Footnote ⁵³

By standard methods for linear ODEs (see Hale [Reference Hale103, section III.1]), it follows that the diffusion step has a unique outcome $u (τ)$ , given an initial condition $u^{k}$ .

The output of this scheme is a sequence of functions $u^{0}, u^{1}, u^{2}, \dots \in V_{{0, 1}}$ , or equivalently a sequence of subsets $S^{0}, S^{1}, S^{2}, \dots \subseteq V$ , where $u^{k} = χ_{S^{k}}$ . We will freely move between both representations as is convenient.

A potential, minor, source of ambiguity in the literature concerns the value of $u_{i}^{k + 1}$ if $u_{i} (τ)$ is exactly at the threshold: if $u_{i} (τ) = \frac{1}{2}$ , we can decide to set $u^{k + 1} = 0$ or $u^{k + 1} = 1$ . We have decided on the latter for the algorithm given earlier, but in other literature the other choice may have been made. In practice, it is unlikely that $u_{i} (τ) = \frac{1}{2}$ is exactly achieved, but for theoretical purposes this choice will determine the (non-)strictness of various inequalities along the way.

A first, hand-waving, reason to expect the graph MBO scheme to minimize ${G L}_{ε}$ is its superficial similarity to the Allen–Cahn equation from (7.1). Instead of solving a nonlinear differential equation as in (7.1), when executing the MBO scheme the solution of a linear differential equation is sought in the diffusion step up to a, typically small,Footnote ⁵⁴ time $τ > 0$ , after which the phase-separating drive of the $W$ -derived nonlinear term from (7.1) is mimicked by the thresholding step. (For some more details about this interpretation of the MBO scheme as a time-splitting scheme, we refer to remark 6.1.8 in [Reference van Gennip and Budd190].)

Another reason to suspect similar behaviour from the graph MBO scheme as from the graph Allen–Cahn equation is that in the continuum both processes approximate mean curvature flow (for small $ε$ in the Allen–Cahn case and small $τ$ in the MBO case). This can be made rigorous in the form of precisely formulated convergence results, whose details fall outside the scope of the current work. In section 3.7.6 of [Reference van Gennip and Budd190] we give a brief overview of some literature about the continuum versions of the graph models, equations, and schemes we discuss in the current work, including these convergence results.

In chapter 6 of that same work [Reference van Gennip and Budd190], we also present a rigorous link between a specific time discretization of the Allen–Cahn equation with double-obstacle potential (7.2) and the MBO scheme.

8.2 The Signless MBO Scheme

Starting from the signless Ginzburg–Landau functional from Section 5.3, Keetch and Van Gennip [Reference Keetch and van Gennip117] and Keetch [Reference Keetch116] define a signless MBO scheme.Footnote ⁵⁵

Signless Graph MBO Scheme

∙ Initialize (signless). Choose an initial condition $u^{0} = χ_{S_{0}} - χ_{S_{0}^{c}}$ with $S^{0} \subseteq V$ and a ‘time step’ $τ > 0$ .
∙ Step $k + 1$ : signless diffusion. Solve the signless diffusion/heat equation $\frac{d u}{d t} = - Δ^{σ} u$ on $(0, τ]$ with initial condition $u (0) = u^{k}$ .
∙ Step $k + 1$ : signless threshold. Define, for all $i \in V$ ,
$u_{i}^{k + 1} := (\begin{matrix} - 1, & if u_{i} (τ) < 0, \\ 1, & if u_{i} (τ) \geq 0. \end{matrix})$
∙ Stop.

We note that, besides the use of the signless graph Laplacian $Δ^{σ}$ in the diffusion step, we also require the binary functionsFootnote ⁵⁶ $u^{k}$ to take values that are symmetric with respect to the origin, as argued in Section 5.3.

As with the diffusion step before, standard ODE techniques guarantee the existence of a unique outcome of the signless diffusion step.

In [Reference Keetch and van Gennip117] it is shown that the signless MBO scheme performs well for the Max-Cut problem (see Section 5.3) in some practical tests, finding comparable or greater cut values than the Goemans–Williamson algorithm [Reference Goemans and Williamson99], in less computation time – but without any rigorous performance guarantees. Finding such guarantees for any of the MBO-type algorithms is currently still an open question of great interest.

8.3 The MBO Scheme with Constraints

A mass or fidelity constraint, as in Section 5.1, can also be incorporated into the MBO scheme. In the current section we will state and explain these schemes. The rigorous link between the Allen–Cahn equation and MBO scheme which was first established in Budd and Van Gennip [Reference Budd and van Gennip35] and which is also presented in chapter 6 of [Reference van Gennip and Budd190] does explain how these MBO schemes with constraint can be derived from the Allen–Cahn equations with constraint as presented in Section 7.3.

To incorporate a mass constraint as in [Reference van Gennip188, section 5.3]Footnote ⁵⁷ and [Reference Budd and van Gennip36, theorem 4.16], we first require some more notation. If $u$ is the solution of the $(k + 1)^{s t}$ diffusion step, we denote the preimage of $x \in R$ under $u$ at time $τ$ by

U (x) := {i \in V : u_{i} (τ) = x} .

Let $L \in N$ be the (always strictly positive and finite) number of $x_{ℓ} \in R$ for which $U (x_{ℓ}) \neq \emptyset$ and assume the labels $x_{ℓ}$ are such that $x_{1} < \dots < x_{L}$ . Then by [Reference Budd and van Gennip36, theorem 4.16] there is a unique $ℓ^{*} \in [K]$ such that

\sum_{ℓ = ℓ^{*} + 1}^{L} M (U (x_{ℓ})) < M (u^{k}) \leq \sum_{ℓ = ℓ^{*}}^{L} M (U (x_{ℓ})) .

We now replace the threshold step by a mass-conserving threshold step to get a mass-conserving MBO scheme:

Mass-Conserving Graph MBO Scheme

Initialize.
Step $k + 1$ : diffusion.
Step $k + 1$ : mass-conserving threshold. Let $u^{k + 1}$ be any function in $V$ which satisfies $M (u^{k + 1}) = M (u^{k})$ and, for all $i \in V$ ,
$u_{i}^{k + 1} := (\begin{matrix} 0, & if u_{i} (τ) \in ⋃_{1 \leq ℓ < ℓ^{*}} U (x_{ℓ}), \\ 1, & if u_{i} (τ) \in ⋃_{ℓ^{*} < ℓ \leq L} U (x_{ℓ}) . \end{matrix})$
Stop.

Remark 8.1. In the mass-conserving threshold step we order the nodes by their value of the diffused state $u_{i} (τ)$ and assign, by setting $u_{i}^{k + 1} = 1$ , as much mass to the nodes at the top of the order (with the highest diffused state values $x_{L} > \dots > x_{ℓ^{*} + 1}$ ) as we can without assigning more than the required total mass $M (u_{k})$ . The leftover mass can be assigned in any way to the nodes in $U (x_{ℓ^{*}})$ . If and only if $| U (x_{ℓ^{*}}) | > 1$ and $\sum_{ℓ = ℓ^{*}}^{L} M (U (x_{ℓ})) \neq M (u^{k})$ , this introduces nonuniqueness into the scheme (see [Reference Budd and van Gennip36, theorem 4.16] or [Reference Budd33, theorem 4.2.16]). In [Reference van Gennip188] the choice is made to pick exactly enough (arbitrarily chosen) nodes $i \in U (x_{ℓ^{*}})$ so that all leftover mass can be assigned to them. If $r \neq 0$ , the maximum amount of mass that can be assigned per node, namely $d_{i}^{r}$ , is node-dependent, so that it is possible (in practice likely) that the leftover mass does not match exactly the sum $\sum d_{i}^{r}$ over the chosen nodes. In this case one of the chosen nodes does not get its full mass assigned, that is the resulting function $u^{k + 1}$ will be non-binary, since $u_{i} / \in {0, 1}$ at this one node. Other choices to deal with the nonuniqueness are possible, for example, an equal division of mass over all nodes in $U (x_{ℓ^{*}})$ , which may also lead to a non-binary function $u^{k + 1}$ .

Next, we incorporate a fidelity constraint into the MBO scheme [Reference Budd, van Gennip and Latz37], instead of a mass constraint. This time we change the diffusion step of the scheme, instead of the threshold step. Restricting to the case $p = 2$ (i.e., a fidelity term of the form $\frac{1}{2} ∥ μ^{\frac{1}{2}} (u - f) ∥_{V}^{2}$ in the functional) and taking a clue from the fidelity-forced Allen–Cahn equation from (7.3), we replace the diffusion step by a fidelity-forced diffusion step to get a fidelity-forced MBO scheme (version 1):

Fidelity-Forced Graph MBO Scheme (Version 1)

Initialize.
Step $k + 1$ : fidelity-forced diffusion. Solve the fidelity-forced diffusion/heat equation $\frac{d u}{d t} = - Δ u - μ (u - f)$ on $(0, τ]$ with initial condition $u (0) = u^{k}$ .
Step $k + 1$ : threshold.
Stop.

The interest in MBO schemes on graphs started with the introduction of this fidelity-forced scheme in Merkurjev et al. [Reference Ekaterina, Tijana and Bertozzi144] for classification and image processing applications (see also chapter 4 of [Reference van Gennip and Budd190]).

As before, by the usual methods for linear systems of ODEs (see Hale [Reference Hale103, chapter III.1]), it follows that the fidelity-forced diffusion step has a unique outcome, given an initial condition $u^{k}$ .

In Section 7.3 we described an alternative way to incorporate a fidelity constraint into a mass-conserving Allen–Cahn equation from Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44]. This can be used in a non-mass-conserving context as well, giving rise to the following fidelity-forced scheme (version 2) (the continuum limit of which is examined in Laux and Lelmi [Reference Laux and Lelmi124]; see also section 7.2.6 of [Reference van Gennip and Budd190]):

Fidelity-Forced Graph MBO Scheme (Version 2)

Initialize.
Step $k + 1$ : fidelity-forced diffusion. Solve the fidelity-forced diffusion/heat equation $\frac{d u}{d t} = - Δ u + μ f$ on $(0, τ]$ with initial condition $u (0) = u^{k}$ .
Step $k + 1$ : threshold.
Stop.

If we impose both a fidelity and mass constraint at once, both the diffusion and threshold steps of the original MBO scheme get replaced. While the threshold step gets replaced by the mass-conserving threshold step as described earlier, the diffusion step must be replaced by a mass-conserving fidelity-forced diffusion step to obtain a mass-conserving fidelity-forced MBO scheme (version 1):

Mass-Conserving Fidelity-Forced Graph MBO Scheme (Version 1)

Initialize.
Step $k + 1$ : mass-conserving fidelity-forced diffusion. Solve the mass-conserving fidelity-forced diffusion/heat equation $\frac{d u}{d t} = - Δ u - μ (u - f) + \frac{1}{vol V} ⟨ μ, u - f ⟩_{V}$ on $(0, τ]$ with initial condition $u (0) = u^{k}$ .
Step $k + 1$ : mass-conserving threshold.
Stop.

As above, we can also define a different mass-conserving fidelity-forced MBO scheme, used in Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44], which uses the following mass-conserving fidelity-forced diffusion step instead of the one described earlier to arrive at a mass-conserving fidelity-forced MBO scheme (version 2):

Mass-Conserving Fidelity-Forced Graph MBO Scheme (Version 2)

Initialize.
Step $k + 1$ : mass-conserving fidelity-forced diffusion. Solve the mass-conserving fidelity-forced diffusion/heat equation
$\frac{d u}{d t} = - Δ u + μ f - \frac{1}{vol V} ⟨ μ, f ⟩_{V}$ on $(0, τ]$ with initial condition $u (0) = u^{k}$ .
Step $k + 1$ : mass-conserving threshold.
Stop.

As for the other diffusion(-like) steps earlier, we note that mass-conserving fidelity-forced diffusion initial-value problems have unique solutions.

As mentioned before, for explanations of how these alternative steps in the MBO scheme can be derived, we refer to [Reference van Gennip and Budd190]. It is also interesting to compare the two alternative fidelity-forced and mass-conserving fidelity-forced MBO schemes which we presented above. A full comparison is an interesting topic for future work. As an initial step, in the companion volume we have included a closer look at the steady states for the two mass-conserving fidelity-forced diffusion steps. It should be noted that in practice these steps will be run for a short time and steady states will not be achieved, but they do shed light on some features that distinguish both methods.

9 Graph Curvature and Mean Curvature Flow

As with many of the other concepts, functionals, and dynamics we have introduced thus far, mean curvature flow (or motion by mean curvature) finds it origins in the continuum setting. It refers to the evolution of a set in which each boundary point moves with a normal velocity proportional to the local mean curvature at that point. In this section we describe various attempts to define a meaningful concept of curvature and mean curvature flow (MCF) on a graph. For some references about mean curvature flow in the continuum setting, we refer to section 3.7.5 of [Reference van Gennip and Budd190]. First we introduce various notions of curvature on graphs.

9.1 Curvature

Various different definitions of graph curvature can be found in the literature. In this section we discuss a few of them.

In the discussion following (3.10) we determined that the maximum in the definition of $T V (u)$ is achieved by $φ = s g n (\nabla u)$ . In the continuum setting, if $u$ is the characteristic function of a set with smooth boundary, the supremum in the definition of total variation can be achieved by a smooth extension of the normal vector field on the boundary of the set. Inspired by this, in the graph setting, if $u$ is the characteristic function of some node subset $S \subseteq V$ , namely $u = χ_{S}$ , then we can interpret this edge function $φ$ as a graph normal analogously to the normal vector field in the continuum:

ν_{i j}^{S} := (s g n (\nabla χ_{S}))_{i j} = (\begin{matrix} s g n ((χ_{S})_{j} - (χ_{S})_{i}), & if ω_{i j} > 0, \\ s g n (0), & if ω_{i j} = 0. \end{matrix})

This prompted Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191] to define graph (mean) curvature for a node subset $S$ as $κ := d i v ν^{S}$ . Thus (allowing for a brief return of our parameter $q$ from (3.1)):

\begin{matrix} κ_{i} & = \frac{1}{2} d_{i}^{- r} \sum_{j \in V} ω_{i j}^{q} ((s g n (\nabla χ_{S}))_{j i} - (s g n (\nabla χ_{S}))_{i j}) \\ = \frac{1}{2} d_{i}^{- r} (\begin{matrix} 2 \sum_{j \in S^{c}} ω_{i j}^{q} + \sum_{j \in S} ω_{i j}^{q} (s g n (0) - s g n (0)), & if i \in S, \\ - 2 \sum_{j \in S} ω_{i j}^{q} + \sum_{j \in S^{c}} ω_{i j}^{q} (s g n (0) - s g n (0)), & if i \in S^{c}, \end{matrix}) \\ = d_{i}^{- r} (\begin{matrix} \sum_{j \in S^{c}} ω_{i j}^{q}, & if i \in S, \\ - \sum_{j \in S} ω_{i j}^{q}, & if i \in S^{c} . \end{matrix}) \end{matrix}

We note that $κ$ is independent of the value chosen for $s g n (0)$ .

In El Bouchairi et al. [Reference Bouchairi, Abderrahim and Fadili69, remark 5.1(ii)] the graph mean curvature of a function $u \in V$ at $i \in V$ is defined as

(K (u))_{i} := - d_{i}^{- 1} \sum_{j \in V} ω_{i j} s g n (u_{j} - u_{i}),

(9.1)

where explicitly $s g n (0) = 1$ is chosen. For $r = 1$ and $q = 1$ , $K (χ_{S})$ would equal $κ$ , if $s g n (0) = 0$ had been chosen instead. The reason for the choice $s g n (0) = 1$ in $K$ is so that $(K (u))_{i}$ can be interpreted as the mean curvature $K_{i}^{{j \in V : u_{j} \geq u_{i}}}$ of the superlevel set ${j \in V : u_{j} \geq u_{i}}$ at $i \in V$ , where, for $S \subseteq V$ and $i \in V$ ,

K_{i}^{S} := d_{i}^{- 1} (\sum_{j \in S^{c}} ω_{i j} - \sum_{j \in S} ω_{i j}) .

We note that $(K (χ_{S}))_{i} = (\begin{matrix} K_{i}^{S}, & if i \in S, \\ - 1, & if i \in S^{c} . \end{matrix})$ The curvature used in El Chakik et al. [Reference Chakik, Abdallah and Desquesnes70] is the same as $K$ from (9.1).

In Zhou and Schölkopf [Reference Zhou, Schölkopf, Walter, Robert and Hanbury206, section 2.4] the authors define the graph mean curvature of $u \in V$ as

- \frac{1}{2} d i v (\frac{\nabla u}{2 ∥ \nabla u ∥_{E}}),

where $q = 1$ is chosen in the definition of the $E$ -norm (see (3.1)) and the gradient and divergence operators are scaled to be consistent with the symmetrically normalized Laplacian of (3.6). Aside from the differences in scaling, this approach is similar to the ones mentioned earlier.

Finally we mention two notions of Ricci curvature on graphs: Ollivier’s Ricci curvature and Forman’s Ricci curvature.

Ollivier curvature (or coarse Ricci curvature; see Ollivier [Reference Ollivier165, Reference Ollivier, Motoko, Masanori and Kumagai166], also Münch and Wojciechowski [Reference Münch and Wojciechowski152]) is defined to be

(κ_{G}^{O})_{i j} := 1 - \frac{W_{G} (μ_{i}, μ_{j})}{d_{i j}^{G}},

where $μ : V \to P (V)$ assigns a probability measure on the node set $V$ to each node in $V$ Footnote ⁵⁸ and $W_{G}$ is a graph Wasserstein distance. In this context, the graph distance $d_{i j}^{G}$ often is chosen to be as in (3.15) with $q = 2$ (or equivalently $q = 0$ , with reciprocal weights) – for unweighted graphs the choice of $q$ in $d_{i j}^{G}$ is, of course, not relevant.

In Ni et al. [Reference Chien-Chun, Lin, Luo and Gao162, Reference Chien-Chun, Lin, Luo and Gao163] and Tian et al. [Reference Tian, Lubberts and Weber185] the probability-measure-valued node function $μ$ is chosen to be

μ_{i} (j) := (\begin{matrix} α, & if j = i, \\ \frac{(1 - α) e^{- (d_{i j}^{G})^{p}}}{\sum_{k \in {j \in V : j \sim i}} e^{- (d_{i k}^{G})^{p}}}, & if j \sim i, \\ 0, & otherwise, \end{matrix})

where, for given $i \in V$ , $μ_{i} \in P (V)$ assigns probability $μ_{i} (j)$ to node $j \in V$ . The parameters $α \in [0, 1]$ and $p \geq 0$ are both chosen to be zero for unweighted graphs, in which case the probability distribution $μ_{i}$ is uniform over the neighbours of $i$ . On random geometric graphs (see section 7.2 of [Reference van Gennip and Budd190]), Van der Hoorn et al. [Reference Cunningham, Gabor, Carlo and Krioukov186, section III.B] has a slightly different definition for Ollivier curvature:

({\tilde{κ}}_{G}^{O})_{i j} := 1 - \frac{W_{G} (μ_{i}, μ_{j})}{δ},

where $δ > 0$ is the distance between the sample points corresponding to nodes $i$ and $j$ along the geodesic on the manifold from which the points are sampled, and $μ_{i}$ is the uniform distribution over the set of vertices that are within a weighted graph distance of at most $δ > 0$ of the node $i$ . If the edge weights $ω_{i j}$ are chosen to be equal to the distance between $i$ and $j$ on the manifold, then in [Reference Cunningham, Gabor, Carlo and Krioukov186], this curvature is proven to convergeFootnote ⁵⁹ to the Ricci curvature on the manifold that underlies the random geometric graph. In García Trillos and Weber [Reference Trillos and Weber96], explicit rates are proved for this convergence. We refer to those papers for further details, such as details about the graph Wasserstein distance (whose explanation would take us a bit too far afield here.) In particular, we refer to [Reference Trillos and Weber96] for an in-depth discussion of various choices of the graph distance and their impact on the result, and to [Reference Cunningham, Gabor, Carlo and Krioukov186, section II] and the references therein for information about more distinct notions of graph curvature.

Forman curvature (or combinatorial Ricci curvature) is defined in Forman [Reference Forman91] (see also [Reference Jost and Münch111, Reference Weber, Saucan and Jost196, Reference Weber, Saucan and Jost197]) for cell complexes. Here we do not introduce the concept in this general setting, but rather give its definition applied to graphs. Besides edge weights $ω_{i j}$ , the definition also allows for node weights $ω_{i}$ . In our standard setting without node weights, we may assume that, for all $i \in V$ , $ω_{i} = 1$ . Forman curvature is then defined to beFootnote ⁶⁰

(κ_{G}^{F})_{i j} := ω_{i j} (\frac{ω_{i}}{ω_{i j}} + \frac{ω_{j}}{ω_{i j}} - 2 \sum_{k \in V} (\frac{ω_{i}}{\sqrt{ω_{i j} ω_{i k}}} + \frac{ω_{j}}{\sqrt{ω_{i j} ω_{k j}}})) .

9.2 Mean Curvature Flow

In Section 9.1 we saw that Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191] and El Chakik et al. [Reference Chakik, Abdallah and Desquesnes70] introduce very similar notions of graph mean curvature. They also both give definitions of graph MCF, but do so starting from different descriptions of MCF at the continuum level.

In [Reference Chakik, Abdallah and Desquesnes70] the inspiration comes from the level set description of MCF in the continuum [Reference Chen, Giga and Goto50, Reference Clarenz, Haußer, Rumpf, Voigt, Weikard and of Voigt56, Reference Evans and Spruck82, Reference Evans and Spruck83, Reference Evans and Spruck84, Reference Evans and Spruck85]: $\frac{d u}{d t} = | \nabla u | d i v (\frac{\nabla u}{| \nabla u |})$ , where the operators $d i v$ and $\nabla$ here denote the continuum versions of the divergence and gradient, respectively. If $u$ satisfies this equation, then each level set of $u$ evolves according to flow by mean curvature. Since $d i v (\frac{\nabla u}{| \nabla u |})$ gives the curvature of the level sets of $u$ , this leads [Reference Chakik, Abdallah and Desquesnes70] to introduce the following equations for MCF on graphs:

\begin{matrix} \frac{d u_{i}}{d t} & = 2^{p} (K (u))_{i}^{+} ∥ (\nabla u)^{+} ∥_{i, p} - 2^{p} (K (u))_{i}^{-} ∥ (\nabla u)^{-} ∥_{i, p}, & if p \in [1, \infty), \\ \frac{d u_{i}}{d t} & = (K (u))_{i}^{+} ∥ (\nabla u)^{+} ∥_{i, \infty} - (K (u))_{i}^{-} ∥ (\nabla u)^{-} ∥_{i, \infty}, & if p = \infty . \end{matrix}

(9.2)

Recall the definitions of the positive and negative parts (superscripts $\pm$ ) from Remark 3.3. Also recall the node-dependent $p$ -norms from (3.3).

A similar MCF is used in El Bouchairi et al. [Reference Bouchairi, Abderrahim and Fadili69] and called power mean curvature flow, where each instance of $K (u)$ in (9.2) is replaced by $| K (u) |^{α - 1} K (u)$ for a parameter $α \in [0, 1]$ (not to be confused with $α (ε)$ from Section 5). In particular, if $α = 1$ , then [Reference Bouchairi, Abderrahim and Fadili69, formula (6.5)] is the same as (9.2).

If locally in time the mean curvature $(K (u))_{i}$ at node $i$ is positive, only the term $2^{p} (K (u))_{i}^{+} ∥ (\nabla u)^{+} ∥_{i, p}$ or $(K (u))_{i}^{+} ∥ (\nabla u)^{+} ∥_{i, \infty}$ remains in the right-hand side of the the preceding equations, which leads to what [Reference Chakik, Abdallah and Desquesnes70] calls a discrete dilation process. Similarly, if locally in time the mean curvature $(K (u))_{i} < 0$ , only the terms with $(K (u))_{i}^{-}$ survive, leading to discrete erosion processes.

To understand those names, we recall from Remark 3.3 that [Reference Chakik, Abdallah and Desquesnes70, Reference Vinh-Thong, Elmoataz and Lézoray183] defines (nonlocal) dilation and erosion operators, $N L D$ and $NLE$ , respectively, in terms of the node-dependent $\infty$ -norms. These allow us to rewrite the graph MCF equation in (9.2) for $p = \infty$ as

\frac{d u_{i}}{d t} = (K (u))_{i}^{+} {(N L D (u))}_{i} + (K (u))_{i}^{-} {(NLE (u))}_{i} - u_{i} ((K (u))_{i}^{+} + (K (u))_{i}^{-}) .

For further brief discussions of the mean curvature flow from [Reference Chakik, Abdallah and Desquesnes70], in particular the decrease of total variation along its trajectories, we refer to Budd and Van Gennip [Reference Budd and van Gennip35, remark 6.4] and Budd [Reference Budd33, section 7.5]. Variants of this graph MCF equation are studied in El Bouchairi [Reference Bouchairi68] and El Bouchairi et al. [Reference Bouchairi, Abderrahim and Fadili69].

In [Reference Nestor, Braxton and Bertozzi191] (we also refer to Van Gennip [Reference van Gennip187, appendix B.1]) a different, variational, formulation is given for MCF, inspired by the variational formulation for continuum mean curvature flow in Almgren et al. [Reference Fred, Taylor and Wang6] and Luckhaus and Sturzenhecker [Reference Luckhaus and Sturzenhecker132]. This is a discrete-in-time scheme generating a sequence of node subsets $S^{k}$ , starting from an initial set $S^{0} \subseteq V$ , as follows:

S^{k + 1} \in {a r g m i n}_{S \subseteq V} T V (χ_{S}) + \frac{1}{ð t} ⟨ χ_{S} - χ_{S^{k}}, (χ_{S} - χ_{S^{k}}) d^{S^{k}} ⟩_{V} .

(9.3)

Here $d^{S^{k}}$ denotes the graph distance function (see Section 3.3) to $S^{k}$ , which is the graph boundary of $S^{k}$ defined by

S^{k} := {i \in V : \exists j \sim i {(χ_{S^{k}})}_{i} \neq {(χ_{S^{k}})}_{j}}

and $ð t > 0$ is a time-step parameter.

Besides its cosmetic resemblance to the continuum variational formulation of MCF in [Reference Fred, Taylor and Wang6, Reference Luckhaus and Sturzenhecker132], the definition for MCF in (9.3) has the desired property that any sequence $(S^{k})$ satisfying it does not increase total variation. To be precise, since $S = S^{k}$ is admissible in the minimization problem in (9.3), we see that

T V (χ_{S^{k + 1}}) \leq T V (χ_{S^{k + 1}}) + \frac{1}{ð t} ⟨ χ_{S^{k + 1}} - χ_{S^{k}}, (χ_{S^{k + 1}} - χ_{S^{k}}) d^{S^{k}} \leq T V (χ_{S^{k}}) .

It is not true, however, that $T V (χ_{S^{k + 1}}) = T V (χ_{S^{k}})$ only if $S^{k + 1} = S^{k}$ . Consider the following counterexample. Let $G = (V, E, ω)$ be the graph with $V = {1, 2, 3, 4}$ , $ω_{12} = ω_{21} = ω_{23} = ω_{32} = ω_{34} = ω_{43} = 1$ , $ω_{i j} = 0$ in all other cases, and $E$ contains $(i, j)$ if and only if $ω_{i j} \neq 0$ . If $S^{k} = {1, 2}$ , then $S^{k} = {2, 3}$ and thus $d_{1}^{S^{k}} = d_{4}^{S^{k}} = 1$ and $d_{2}^{S^{k}} = d_{3}^{S^{k}} = 0$ (recall from Remark 3.9 that we have chosen $q = 1$ ). Moreover, $d_{1}^{r} = d_{4}^{r} = 1$ and $d_{2}^{r} = d_{3}^{r} = 2^{r}$ . For this example, the second term in the objective function in (9.3) thus becomes

\frac{1}{ð t} ⟨ χ_{S} - χ_{{1, 2}}, (χ_{S} - χ_{{1, 2}}) d^{S^{k}} ⟩_{V} = \frac{1}{ð t} (((χ_{S})_{1} - 1)^{2} + (χ_{S})_{4}^{2}) .

Since $T V (χ_{S^{k}}) = 1$ , we know that $T V (χ_{S^{k + 1}}) \leq 1$ . Furthermore, if $S \in {\emptyset, V}$ , then $T V (χ_{S}) = 0$ , and if $S / \in {\emptyset, V}$ , then $T V (χ_{S}) \geq 1$ . In the former case, the objective function has value $\frac{1}{ð t}$ . Looking for potential minimizers $S$ in (9.3) that are different from $\emptyset$ and $V$ , we know that $T V (χ_{S}) = 1$ , thus $S \in {{1}, {4}, {1, 2}, {3, 4}, {1, 2, 3}, {2, 3, 4}}$ . Among these candidates, only those that minimize the second term of the objective functional are potential miminizers of (9.3), thus we require $1 \in S$ and $4 / \in S$ . This leaves $S \in {{1}, {1, 2}, {1, 2, 3}}$ . For each of these candidates the objective function has value $1$ and thus, if $ð t < 1$ , all these candidates are solutions $S^{k + 1}$ of (9.3). In particular we note that two of these candidates are different from $S^{k}$ , yet $T V (χ_{S^{k + 1}}) = T V (χ_{S^{k}}) = 1$ .

Mean curvature flow of a form very similar to (9.3) is analysed on square grids by Braides et al. in [Reference Braides, Gelli and Novaga29].

In light of the close connection between the MBO scheme (and the Allen–Cahn equation) and MCF in the continuum that was briefly touched upon in Section 8.1 (and Remark 7.4) – we refer to section 3.7.6 of [Reference van Gennip and Budd190] for more details – a natural question is whether such a connection also exists in the graph setting. In [Reference Budd and van Gennip35, section 6.2] (and in more detail in [Reference Budd33, section 3]) it is argued that a close connection between the graph MBO scheme and the variational graph MCF from (9.3) is not to be expected.

A first reason for this is that, even though when the time step is large (i.e., large $τ$ for MBO and large $ð t$ for MCF) each scheme will reach a stationary state equal to $u = 0$ ( $S = \emptyset$ ) or $u = 1$ ( $S = V$ ) after a single time step, the conditions governing which of these two states is chosen are different for MBO and MCF.

Since the main interest in these schemes is at small time steps, this first reason by itself would not necessarily be enough to abandon the formulation of MCF from (9.3). The other reason provided in [Reference Budd33, section 3] is more damning, if our intention is to connect MBO and MCF. Given a graph $G = (V, E, ω)$ (satisfying the conditions described in Section 3.1) and $η > 0$ , we can consider the $η$ -completion of $G$ , which we define to be the graph $G^{η} = (V, V^{2}, ω^{η})$ , where $ω_{i j}^{η} = ω_{i j}$ if $ω_{i j} > 0$ and $ω_{i j}^{η} = η$ if $i \neq j$ and $ω_{i j} = 0$ . In [Reference Budd33, theorem 7.3.1] it is shown that the difference (in $V$ -norm) between solutions of the heat equation on $G$ and on $G^{η}$ is bounded by $C_{η} t e^{C t}$ , where $C > 0$ is a constant and $0 < C_{η} = O (η)$ as $η ↓ 0$ .Footnote ⁶¹ Unless in very specific circumstances in which this small difference places both solutions on different sides of the threshold value in the MBO scheme, this shows that the MBO scheme will have very similar, if not the same, solutions on $G$ and $G^{η}$ . A similar conclusion is shown to hold for solutions of the graph Allen–Cahn equation.Footnote ⁶² On the other hand, however, if $S^{k} / \in {\emptyset, V}$ , then the graph boundary $S^{k}$ on $G^{η}$ is equal to $V$ , thus $d^{S^{k}} = 0$ and (9.3) reduces to $S^{k + 1} \in {a r g m i n}_{S \subseteq V} T V (χ_{S}) = {\emptyset, V}$ . For most graphs $G$ , this behaviour of MCF on $G^{η}$ will differ significantly from the behaviour of MCF (according to (9.3)) on $G$ . In other words, while the behaviour of solutions to the Allen–Cahn equation or MBO scheme is expected to be very stable under the $η$ -completion of $G$ , the behaviour of solutions of (9.3) is not. Simple redefinitions of the graph boundary, for example as $S^{k} \cap S^{k}$ or $S^{k} \cap (V ∖ S^{k})$ , do nothing to alter this conclusion.

A promising alternative is offered in [Reference Budd and van Gennip35, section 6.2] and [Reference Budd33, section 7.4]:

S^{k + 1} \in {a r g m i n}_{S \subseteq V} T V (χ_{S}) + \frac{1}{τ} ∥ e^{- \frac{1}{2} τ Δ} (χ_{S} - χ_{S^{k}}) ∥_{V}^{2} .

(9.4)

Just as we saw for solutions of (9.3), solutions of (9.4) satisfy $T V (χ_{S^{k + 1}}) \leq T V (χ_{S^{k}})$ . Unlike before, we now also have that equality is achieved if and only if $S^{k + 1} = S^{k}$ , since $e^{- \frac{1}{2} τ Δ}$ is an invertible operator and thus equality in

T V (χ_{S^{k + 1}}) \leq T V (χ_{S^{k + 1}}) + \frac{1}{τ} ∥ e^{- \frac{1}{2} τ Δ} (χ_{S^{k + 1}} - χ_{S^{k}}) ∥_{V}^{2} \leq T V (χ_{S^{k}}) .

cannot be achieved unless $χ_{S^{k + 1}} - χ_{S^{k}} = 0$ [Reference Budd33, proposition 7.4.1].

The main objection against the formulation in (9.3) does not apply to the formulation in (9.4). The total variation term in the objective function in (9.4) depends linearly on the edge weights and thus will differ only by an $O (η)$ term between graphs $G$ and $G^{η}$ ; the second term in the objective function depends on the solution of the graph diffusion equation and thus, by the argument from [Reference Budd33, theorem 7.3.1] discussed earlier, will also only differ by an $O (η)$ term.

The other, minor, objection which was brought against (9.3) also does not apply here. It is argued in [Reference Budd33, section 7.4] that the behaviour at large $τ$ of the scheme in (9.3) is very similar to that of the MBO scheme at large $τ$ .

In section 6.3 of [Reference van Gennip and Budd190] we compare MCF defined by (9.4) more closely with the MBO scheme.

10 Freezing of Allen–Cahn, MBO, and Mean Curvature Flow

Freezing (or pinning) is the phenomenon where for very small values of the parameters $ε$ (in the graph Allen–Cahn equation), $τ$ (in the graph MBO scheme), or $ð t$ or $τ$ (in the variational graph mean curvature flow) no, or only trivial, dynamics occur. (See also footnote Footnote 54.)

In the graph Allen–Cahn equation, if the parameter $ε$ is very small, the potential term dominates the right-hand side of the equation, leading to dynamics in which each value $u_{i}$ simply settles into the well of $W$ in which it starts. Rigorous quantitative statements are made in Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, theorem 5.3], Budd and Van Gennip [Reference Budd and van Gennip35, remark 4.7], Budd and van Gennip [Reference Budd and van Gennip36, section 3.3], and Budd [Reference Budd33, theorem 3.4.11]. We collect them in the following lemma.

We define $ρ_{Δ}$ to be the spectral radius of $Δ$ , namely the maximal eigenvalue of $Δ$ . In general, we write $ρ_{O}$ for the spectral radius of a linear operator $O : V \to V$ .

Lemma 10.1. Let $W (x) = (x - 1)^{2} (x + 1)^{2}$ . Let $u^{0} \in V$ and let $u$ solve

(\begin{matrix} \frac{d u_{i}}{d t} = - (Δ u)_{i} - \frac{1}{ε} d_{i}^{- r} W^{'} (u_{i}), & for all i \in V and t > 0, \\ u (0) = u^{0} . \end{matrix})

Then there exists a $C > 0$ such that, for all $t \geq 0$ , $∥ u (t) ∥_{V} \leq C$ . Now assume additionally that there exists an $α \in (0, 1)$ such that, for all $i \in V$ , $| u_{i}^{0} | \geq α$ . If

\begin{matrix} ε & \leq C^{- 1} ρ_{Δ}^{- 1} 4 α (1 - α^{2}) {(max_{i \in V} d_{i})}^{- \frac{r}{2}}, or \\ ε & \leq {(sup_{t \geq 0} ∥ Δ u (t) ∥_{V, \infty})}^{- 1} 4 α (1 - α^{2}) {(max_{i \in V} d_{i})}^{- r}, \end{matrix}

then, for all $i \in V$ , $t \mapsto s g n (u_{i} (t))$ is constant.

Now assume that $W$ is the double obstacle potential from (5.1) instead and $T$ is some interval. Let $S$ be a strict and nonempty subset of $V$ . Then $u (t) = χ_{S}$ , for all $t \in T$ , is a solution of (7.2) if and only if

ε \leq \frac{1}{2} ∥ Δ χ_{S} ∥_{V, \infty}^{- 1} .

Similarly $u (t) = χ_{S}$ solves the mass-conserving Equation (7.4) if and only if

ε max_{i \in S^{c}} | (Δ χ_{S})_{i} | \leq 1 - ε max_{i \in S} | (Δ χ_{S})_{i} | .

This latter condition is satisfied if $ε \leq \frac{1}{2} ∥ Δ χ_{S} ∥_{V, \infty}^{- 1} .$

Lastly, $u (t) = χ_{S}$ solves the fidelity-forced equation (7.3) if and only if

ε max_{i \in S^{c}} (| (Δ χ_{S})_{i} | + f_{i}) \leq \frac{1}{2} and ε max_{i \in S} (| (Δ χ_{S})_{i} | - f_{i} + μ_{i}) \leq \frac{1}{2} .

For the graph MBO scheme, the pinning or freezing phenomenon appears in a different guise. If the time parameter $τ$ in the diffusion step is so small that at every node the local function value does not cross the threshold value, then the thresholding step will return the initial value and hence the scheme is stationary. The critical value of $τ$ below which this behaviour is produced, depends not only on the structure of the graph, but also on the initial condition that is fed into the diffusion step. After all, it is known that for fixed $τ$ the graph MBO schemes converge in a finite number of steps [Reference Nestor, Braxton and Bertozzi191, proposition 4.6] and so eventually a stationary state will be reached which, for the given $τ$ , will be ‘frozen’ by the scheme. In practical applications, however, one wants to choose a value of $τ$ that does not immediately freeze the initial condition that is given at the start of the first iteration of the scheme. The following lemma collects bounds on the critical value for $τ$ for various variants of the graph MBO scheme. These bounds are not sharp and to the best of the authors’ knowledge, sharp bounds are not currently known. Similar restrictions on the size of $τ$ were already discussed in the context of a finite-difference discretization of the continuum MBO scheme in Ruuth [Reference Ruuth173, section 2.3]. The bounds listed in the next lemma are taken from Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, theorem 4.2], Budd and Van Gennip [Reference Budd and van Gennip35, theorem 4.5 and remark 4.6], Budd and Van Gennip [Reference Budd and van Gennip36, theorem 4.24] (for the mass-conserving scheme), Budd [Reference Budd33, section 4.3.2] (for both the mass-conserving scheme and the fidelity-forced scheme), Keetch and Van Gennip [Reference Keetch and van Gennip117, theorem 5.3] (for the signless scheme), and Keetch [Reference Keetch116, theorem 4.2.3] (also for the signless scheme). The references [Reference Budd33, Reference Budd and van Gennip35, Reference Budd and van Gennip36] actually provide information about a whole family of schemes, of which graph MBO is only one member. These so-called semidiscrete implicit Euler (SDIE) schemesFootnote ⁶³ are the main objects of study in chapter 6 of [Reference van Gennip and Budd190] and we briefly address the topic of freezing then and there.

To ease notation, we define $A : V \to V$ by $A u := Δ u + μ u$ . Recall that $ρ_{A}$ and $ρ_{Δ^{σ}}$ are the spectral radii of $A$ and the signless Laplacian $Δ^{σ}$ , respectively.

Lemma 10.2. Let $k \in N_{0}$ , $S \subseteq V$ such that $\emptyset \neq S \neq V$ , and $u^{k} = χ_{S}$ . Let $u^{k + 1}$ be the result of one iteration of either the graph MBO scheme or the mass-conserving graph MBO scheme applied to initial condition $u^{k}$ . If

τ < \frac{1}{2 ∥ Δ χ_{S} ∥_{V, \infty}}

or

τ < ρ_{Δ}^{- 1} log (1 + \frac{1}{2} {(min_{i \in V} d_{i})}^{r / 2} vol S^{- 1 / 2}),

(10.1)

then $u^{k + 1} = u^{k}$ .

Alternatively, let $S \subseteq V$ ( $S = \emptyset$ or $S = V$ are now allowed) and let $u^{k + 1}$ be the result of one iteration of the fidelity-forced graph MBO scheme applied to initial condition $u^{k}$ . If

τ < \frac{1}{2 sup_{t \in [0, \infty)} ∥ e^{- t A} χ_{V} ∥_{V, \infty} ∥ A χ_{S} - f ∥_{V, \infty}}

or

τ < ρ_{A}^{- 1} log (1 + \frac{1}{2} {(min_{i \in V} d_{i})}^{r / 2} {(vol S^{1 / 2} + ρ_{A}^{- 1} ∥ f ∥_{V})}^{- 1}),

then $u^{k + 1} = u^{k}$ .

Finally, let $S \subseteq V$ and $u^{k} = χ_{S} - χ_{S^{c}}$ , and let $u^{k + 1}$ be the result of one iteration of the signless graph MBO scheme applied to initial condition $u^{k}$ . If

τ < ρ_{Δ^{σ}}^{- 1} log (1 + {(min_{i \in V} d_{i})}^{r / 2} vol V^{- 1 / 2}),

then $u^{k + 1} = u^{k}$ .

If we let $f = 0$ and $A = Δ$ in the fidelity-forced case, we observe that we recover the conditions for the graph MBO scheme without forcing.

We note that the restriction $\emptyset \neq S \neq V$ in the first part of Lemma 10.2, does not diminish the generality of the lemma, since $χ_{\emptyset}$ and $χ_{V}$ are stationary states of the (mass-conserving) graph MBO scheme for any $τ > 0$ .

For the signless graph MBO scheme only one condition for $τ$ is given in Lemma 10.2, whereas two conditions are given for the other two schemes. To mimick one of the proofs from the other schemes to try and derive a second condition for the signless scheme, we would require a comparison principle (as in [Reference Nestor, Braxton and Bertozzi191, lemma 2.6(d)]; see also [Reference Budd33, theorem 3.2.6]) for the signless graph Laplacian $Δ^{σ}$ to produce an analogue of the proof of [Reference Nestor, Braxton and Bertozzi191, theorem 4.2], or we would require $e^{- t Δ^{σ}}$ to have a matrix representation with only nonnegative elements to mimick the proof from [Reference Budd33, lemma 4.3.4]. Neither is true; for a counterexample see section 3.4 of [Reference van Gennip and Budd190]. (Of course, the fact that mimicking the proofs from the other schemes does not work, does not prove a similar second condition does not hold.)

Remark 10.3. In section 6.1 [Reference van Gennip and Budd190] we present variational formulations of the graph MBO scheme and its fidelity-forced and mass-conserving variants. The functions generated by these schemes are solutions of these variational problems, but, due to some subtle nonuniqueness issues that are explained in that book, these are not the only solutions. Despite that, the results presented in Lemma 10.1 for these three graph MBO schemes in fact hold for all solutions of their variational formulations.

In [Reference Nestor, Braxton and Bertozzi191, theorem 4.8] (with an updated proof in [Reference van Gennip187, appendix A]) a condition for pinning in the graph MBO scheme at a specific node was given that depended only on the local graph structure around that node. Unfortunately, later in [Reference van Gennip187, theorem 3.1], it was proven that this is an ‘empty’ condition, namely that it cannot be satisfied by any graph. In [Reference van Gennip187, sections 4 and 5] ‘nonempty’ local conditions are provided specifically for star graphs and regular trees.

Pinning occurs in the MBO scheme if $τ$ is chosen to be small; on the other hand, if $τ$ is large, another example of trivial dynamics may occur, as one implementation of the diffusion step could lead to a state $u (τ)$ which is close to the constant steady state $\frac{M (u^{0})}{vol V} χ_{V}$ and hence the subsequent threshold step would return either $u^{1} = χ_{V}$ or $u^{1} = χ_{\emptyset}$ , which are stationary states of the MBO scheme. The following lemma gives a precise result (originally published as [Reference Nestor, Braxton and Bertozzi191, theorem 4.3]). We write $λ_{2}$ for the second smallest eigenvalue of $Δ$ (i.e., the algebraic connectivity; see Section 6).Footnote ⁶⁴

Lemma 10.4. Let $k \in N_{0}$ , $S \subseteq V$ , $u^{k} = χ_{S}$ , and assume that $R_{S} := \frac{M (u^{k})}{vol V} \neq \frac{1}{2}$ . Let $u^{k + 1}$ be the result of one iteration of the graph MBO scheme applied to initial condition $u^{k}$ . If

τ > λ_{2}^{- 1} log (\frac{(vol S)^{1 / 2} (vol S^{c})^{1 / 2}}{(vol V)^{1 / 2} | R_{S} - \frac{1}{2} | {(min_{i \in V} d_{i})}^{r / 2}}),

(10.2)

then

u^{k + 1} = (\begin{matrix} χ_{V}, & if R_{S} > \frac{1}{2}, \\ χ_{\emptyset}, & if R_{S} < \frac{1}{2} . \end{matrix})

It is natural to ask if there is a gap between the lower bound of (10.1) below which pinning occurs in the MBO scheme and the upper bound of (10.2) above which the dynamics is trivial in the way described in Lemma 10.4. The following result from [Reference Nestor, Braxton and Bertozzi191, theorem 4.4] (with corrections to the proof in [Reference van Gennip187, appendix B.2]; see also the discussion in [Reference Budd33, section 7.3.2]) answers affirmatively.

Lemma 10.5. If $\frac{λ_{2}}{ρ_{Δ}} < \frac{log \sqrt{2}}{log (3 / 2)} \approx 0.85$ , then there exists a $τ$ which does satisfy neither the inequality in (10.1), nor the inequality in (10.2).

We emphasize that there is no reason to believe that the inequality conditions in the lemmas above are sharp. In particular, Lemma 10.5 does not preclude the possibility that, for a particular given graph and particular given initial condition $u^{k}$ , every choice of $τ > 0$ leads to either the behaviour described in Lemma 10.2 or the behaviour from Lemma 10.4.

In Boyd et al. [Reference Boyd, Bae, Tai and Bertozzi27, propositions 4.1 and 4.2], results similar to some of those in Lemmas 10.2 and 10.4 are presented for an MBO scheme with a generalized diffusion step.Footnote ⁶⁵ That particular scheme is used for modularity optimization (see section 4.1.4 of [Reference van Gennip and Budd190]). The practical suggestion in [Reference Boyd, Bae, Tai and Bertozzi27, section 4.5] is to choose $τ$ as the geometric mean of the (trivial-dynamics-avoiding) upper and lower bounds that are obtained for $τ$ .

We end this section by taking a brief look at the behaviour of graph mean curvature flow for small $τ$ and large $τ$ . As explained in Section 9.2 there is currently not one main definition for the graph MCF and the behaviour at small or large time steps will depend on which form of the scheme that is chosen. For some notes on the small time-step behaviour of some of the variational graph MCF formulations we refer the interested reader to [Reference Nestor, Braxton and Bertozzi191, remark 3.10] and [Reference Budd33, theorem 7.4.2 and note 38], for information about behaviour at large time steps to [Reference Budd33, sections 7.3.2 and 7.4]. We end this section by giving some results from [Reference Budd33, section 7.4] for the graph MCF defined in (9.4) in detail.

We require the Lambert $W$ -function [Reference Corless, Gonnet, Hare, Jeffrey and Knuth57], which we denote by $W_{L}$ . We note that, if $x \in [0, \infty)$ (as is the relevant case in Lemma 10.6), then $W_{L}$ is the unique function on $[0, \infty)$ that satisfies $W_{L} (x) e^{W_{L} (x)} = x$ . To make the following bound more intuitive, we note the following simple approximation for $W_{L} (x)$ when $x \geq 0$ , which is given in Iacono and Boyd [Reference Roberto and Boyd107]:

W_{L} (x) \approx log (1 + \frac{x}{1 + \frac{1}{2} log (1 + x)}) .

Lemma 10.6. Let $k \in N$ , $S^{k} \subseteq V$ , and let $S^{k + 1}$ be a graph MCF update defined by (9.4). If

\begin{matrix} τ e^{τ ρ_{Δ}} < \frac{min_{i \in V} d_{i}^{r}}{max_{S \subseteq V} T V (χ_{S})}, or equivalently, τ < ρ_{Δ}^{- 1} W_{L} (\frac{ρ_{Δ} min_{i \in V} d_{i}^{r}}{max_{S \subseteq V} T V (χ_{S})}), \end{matrix}

then $S^{k + 1} = S^{k}$ .

Now assume $τ > \frac{vol V}{min_{(i, j) \in E} ω_{i j}}$ , then $S^{k + 1} \in {\emptyset, V}$ . If $vol S^{k} < \frac{1}{2} vol V$ , then $S^{k + 1} = \emptyset$ . Alternatively, if $vol S^{k} > \frac{1}{2} vol V$ instead, then $S^{k + 1} = V$ .

We note that the volume conditions on $S^{k}$ in the ‘large $τ$ ’ case of Lemma 10.6, correspond to the conditions on $R_{S}$ in Lemma 10.4 which decided the ‘large $τ$ ’ behaviour of the MBO scheme.

The results we have mentioned thus far have all been obtained by studying the discrete schemes directly and all provide parameter regions in which freezing is guaranteed to happen. A different approach is to consider the continuum limit of the graph-based MBO scheme and obtain parameter scalings under which the limiting dynamics is not frozen, as a way to guide parameter choices to avoid freezing. One example in the literature is Misiats and Yip [Reference Misiats and Yip149], which considers three regimes for the parameter $τ$ for MBO on a two-dimensional square grid with grid size $h$ . With $h = C (τ)^{γ}$ , for some constant $C > 0$ , the regimes are $γ > 1$ , in which the limiting dynamics is mean curvature flow; $γ < 1$ , in which the limiting dynamics is frozen; and the critical case $γ = 1$ in which the dynamics depends on finer details. Another example is Reference Laux and LelmiLaux and Lelmi (2021, theorem 6 and corollary 8) which studies the continuum limit of (a generalization of) the graph MBO scheme on random geometric graphs and guarantees nontrivial limit dynamics if $τ ↓ 0$ such that $lim_{| V | \to \infty} [(log (| V |))^{α} τ]^{- 1} = 0$ and $lim_{| V | \to \infty} ε (log (| V |))^{β} = 0$ , for constants $α > 0$ (not to be confused with $α (ε)$ from Section 5) and $β > 0$ in a specified range. Here $ε$ is the length scale used to construct the edge weights in the graph. (More details about the construction of the random geometric graphs are given in section 7.2 of [Reference van Gennip and Budd190].) Interestingly, the convergence to limiting dynamics remains valid even if the spectral decomposition of the graph Laplacian is truncated after the lowest $K$ eigenvalues and their corresponding eigenfunctions, as is commonly done in implementations to keep the scheme computationally manageable and to filter out noise that may be present in the higher eigenfunctions (see also chapter 5 [Reference van Gennip and Budd190]). In this case $K \geq (log (| V |))^{q}$ is required, for $q > 0$ (unrelated to the parameter $q$ from Section 3) in a specified range.

11 Multiclass Extensions

The models we have discussed so far all describe exactly two phases (or classes, or clusters), which allows for a characterization of the state of the system via a single (approximately) binary-valued function $u$ . Many of these models can be extended to incorporate multiple classes; in this section we will discuss some of these extensions, specifically for the Ginzburg–Landau functional, Allen–Cahn equation, and MBO scheme. (In Shen and Kang [Reference Shen and Kang175] total variation for functions on a continuum domain and with a discrete multiclass range (quantum TV) is investigated for image processing applications.)

The two-phase nature of the functional ${G L}_{ε}$ is the result of the double-well potential and double-obstacle potential each achieving their minimum value in two places (‘wells’). For a multiphase version of the Ginzburg–Landau functional, we thus require a potential with multiple minimizers.

One approach is using a potential function $\tilde{W} : R \to R$ with multiple wells instead of $W$ , as is done with a periodic-well potential in Garcia-Cardona et al. [Reference Cristina, Arjuna, Percus and Marsico93; Reference Cristina, Arjuna, Percus, of Fred, Ana and Marsico94]. The one-dimensional domain of $\tilde{W}$ has the advantage that the state of the system can still be described by a single real-valued function $u : V \to R$ ; but having more than two wells in the one-dimensional domain necessarily creates an asymmetry in the distances between the wells. For example, if the potential has wells located at $- 1$ , $0$ , and $1$ , then the distance between the first and third wells is twice that between the first and second wells. Combined with the Dirichlet energy term in the Ginzburg–Landau functional, this creates an unwanted bias in favour of interfaces between the $- 1$ - and $0$ -phases (or between the $0$ - and $1$ -phases) over interfaces between the $- 1$ - and $1$ -phases. In [Reference Cristina, Arjuna, Percus and Marsico93; Reference Cristina, Arjuna, Percus, of Fred, Ana and Marsico94] this problem is resolved by the use of a generalized difference function in the Dirichlet energy term, which compensates for the asymmetry.

A second approach, which has found more traction in the literature, is using a potential function on a higher-dimensional domain, as in Merkurjev et al. [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143] and Garcia-Cardona et al. [Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95]. The state of a system with $K$ phases is now described by a vector-valued function $u : V \to R^{K}$ . We denote the set of such functions by $V^{K} := V_{R^{K}}$ and the $k$ th component of the value of such a function $u$ at node $i$ by $(u_{i})_{k} \in R$ . Just as there is a bijective correspondence between functions $u \in V$ and vectors in $R^{n}$ (see Remark 3.4), there is also a bijective correspondence between functions $u \in V^{K}$ and $n$ -by- $K$ real matrices $U$ , with entries $U_{i j} = (u_{i})_{j}$ , that is, $U_{i j}$ is the $j^{th}$ entry of the vector $u_{i} \in R^{K}$ . We write $u_{\cdot j}$ for the $j^{th}$ component function of $u$ , that is, $(u_{\cdot j})_{i} = (u_{i})_{j}$ . The $K$ phases are represented by the standard basis vectorsFootnote ⁶⁶ $e_{k} \in R^{K}$ (i.e., $(e_{k})_{j} = 1$ if $k = j$ and $(e_{k})_{j} = 0$ if $k \neq j$ ), with $k \in [K]$ , and thus the potential function should have wells located at the corners of the $K$ -dimensional Gibbs simplex

Σ^{K} := (x \in [0, 1]^{K} : \sum_{k = 1}^{K} x_{k} = 1) .

(11.1)

If $E_{K} := {e_{1}, \dots, e_{K}}$ is the set of standard basis vectors in $R^{K}$ , then

V_{E}^{K} := V_{E_{K}}

is the set of node functions that take values on the corners of the simplex. Hence, if $u \in V_{E}^{K}$ with matrix representation $U \in R^{n \times K}$ and we write $U_{i *}$ for the $i$ th row of $U$ , then, for all $i \in [n]$ , $U_{i *} \in E_{K}$ .

In [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95] the multiclass potential

W_{K} (x) := \frac{1}{2} \prod_{k = 1}^{K} \frac{1}{4} ∥ x - e_{k} ∥_{1}^{2}

is used for $x \in R^{K}$ , where $∥ x ∥_{1} := \sum_{i = 1}^{K} | x_{i} |$ and, more generally, $∥ x ∥_{p} := {(\sum_{i = 1}^{K} | x_{i} |^{p})}^{\frac{1}{p}}$ for $p \in [1, \infty)$ and $x \in R^{K}$ . The Dirichlet energy term now extends straightforwardly to $K$ classes:Footnote ⁶⁷

\frac{1}{2} \sum_{k = 1}^{K} ∥ \nabla u_{\cdot k} ∥_{E}^{2},

where in matrix representation $u_{\cdot k}$ is represented by the $k^{th}$ column of $U$ . Applying these multiclass extensions to the graph Ginzburg–Landau functional from Section 5, we arrive at the following multiclass graph Ginzburg–Landau functional ${G L}_{ε}^{K} : V^{K} \to R$ :

{G L}_{ε}^{K} (u) := \frac{1}{2} α (ε) \sum_{k = 1}^{K} ∥ \nabla u_{\cdot k} ∥_{E}^{2} + \frac{1}{ε} W_{K} (u),

where $W_{K} (u) := \sum_{i \in V} W_{K} (u_{i})$ or $W_{K} (u) := \sum_{i \in V} d_{i}^{r} W_{K} (u_{i})$ (see Section 5 for a discussion about these two options in the context of the two-phase functional ${G L}_{ε}$ and about $α (ε)$ ; also see Remark 7.1).

In the case when $K = 2$ , ${G L}_{ε}^{K}$ is not equal to ${G L}_{ε}$ . We can, however, derive a connection between the two under the assumption that $u_{\cdot 1} = v$ and $u_{\cdot 2} = 1 - v$ , for some $v \in V$ . We note that this is equivalent to requiring $u \in Σ^{2}$ , and note that that node $i$ being in one of the pure phases, namely $u_{i} = e_{1} \in R^{2}$ or $u_{i} = e_{2} \in R^{2}$ , corresponds to $v_{i} = 1$ or $v_{i} = 0$ , respectively. Then we compute

\frac{1}{2} (∥ \nabla u_{\cdot 1} ∥_{E}^{2} + ∥ \nabla u_{\cdot 2} ∥_{E}^{2}) = \frac{1}{2} (∥ \nabla v ∥_{E}^{2} + ∥ \nabla (1 - v) ∥_{E}^{2}) = ∥ \nabla v ∥_{E}^{2}

and

W_{2} (u_{i}) = \frac{1}{32} ∥ u_{i} - e_{1} ∥_{1}^{2} ∥ u_{i} - e_{2} ∥_{1}^{2} = \frac{1}{32} {(2 | 1 - v_{i} |)}^{2} {(2 | v_{i} |)}^{2} = 2 W (v_{i}),

if $W (x) = \frac{1}{4} x^{2} (x - 1)^{2}$ , which is one of the options discussed in Section 5. Hence, under these assumptions,

{G L}_{ε}^{2} (u) = 2 {G L}_{ε} (v),

and so for minimization purposes both functionals are equivalent.

Given a multiclass graph Ginzburg–Landau functional, a corresponding multiclass graph Allen–Cahn equation is defined as its gradient flow with respect to the $V^{K}$ -inner product

⟨ u, v ⟩_{V^{K}} := \sum_{k = 1}^{K} ⟨ u_{\cdot k}, v_{\cdot k} ⟩_{V},

for $u, v \in V^{K}$ . With $s \in R$ and for all $u, v \in V^{K}$ , we have

{(\frac{d}{d s} {G L}_{ε}^{K} (u))}_{s = 0} = α (ε) ⟨ \overline{Δ} u, v ⟩_{V^{K}} + \frac{1}{ε} ⟨ D W_{K} \circ u, v ⟩_{V^{K}},

where $\overline{Δ} : V^{K} \to V^{K}$ and $D W_{K} : V^{K} \to V^{K}$ are defined by, for all $u \in V^{K}$ , $i \in V$ , and $k \in [K]$ ,

{({(\overline{Δ} u)}_{i})}_{k} := {(Δ u_{\cdot k})}_{i} and {({(D W_{K} \circ u)}_{i})}_{k} := \partial_{k} W_{K} (u_{i}),

with $\partial_{k}$ the partial derivative with respect to the $k^{th}$ component of the independent variable in $R^{K}$ . Here we have chosen $W_{K} (u) := \sum_{i \in V} d_{i}^{r} W_{K} (u_{i})$ ; if $W_{K} (u) := \sum_{i \in V} W_{K} (u_{i})$ instead, we would need to include the degrees in the second term, as in Remark 7.1, or, equivalently, redefine ${({(D W_{K} (u))}_{i})}_{k} := d_{i}^{- r} \partial_{k} W_{K} (u_{i})$ .

This leads to the following multiclass graph Allen–Cahn equation:

\frac{d u}{d t} = - α (ε) \overline{Δ} u - \frac{1}{ε} D W_{K} \circ u .

In [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95] also a fidelity-forcing term is included in the Allen–Cahn equation.

Similarly, also a multiclass MBO scheme can be defined. In the diffusion step $K$ uncoupled diffusion equations are solved, leading to an intermediate output $u \in V^{K}$ . In the threshold step, we first project each $u_{i} \in R^{K}$ onto the simplex $Σ^{K}$ and then assign (one of) the closest corner(s) of the simplex to each node $i$ . This leads to the following scheme.

Multiclass Graph MBO Scheme

Initialize (multiclass). Choose an initial condition $u^{0} \in V_{E}^{K}$ , and $τ > 0$ .
Step $l + 1$ : multiclass diffusion. Solve the multiclass diffusion/heat equations: for all $k \in [K]$ , $\frac{d u_{\cdot k}}{d t} = - Δ u_{\cdot k}$ on $(0, τ]$ with initial condition $u (0) = u^{l}$ .
Step $l + 1$ : multiclass threshold. For all $i \in V$ , define $u_{i}^{l + 1} := e_{k^{*}} \in R^{K}$ , where $k^{*} \in {a r g m i n}_{k \in [K]} ∥ v_{i} - e_{k} ∥_{2}$ (if $k^{*}$ is not unique, a $k$ is chosen uniformly at random out of all minimizers) with $v_{i} := {a r g m i n}_{x \in Σ^{K}} ∥ u_{i} - x ∥_{2}$ .Footnote ⁶⁸
Stop.

The initial condition $u^{0}$ and the output $u^{l + 1}$ of each iteration are functions in $V_{E}^{K}$ and can be interpreted as indicator functions for the $K$ classes: for $k \in [K]$ , define $S_{k}^{l} := {i \in V : u_{i}^{l} = e_{k}}$ , then $u_{\cdot k}^{l} = χ_{S_{k}^{l}}$ . The diffusion step thus consists of $K$ uncoupled instances of the standard (i.e., $2$ -phase) MBO scheme diffusion step, one for each of the $K$ classes. In Cucuringu et al. [Reference Cucuringu, Pizzoferrato and van Gennip60, appendix A.3] it is shown that the multiclass threshold step is equivalent to assigning $u_{i}^{l + 1} := e^{k^{*}}$ where $k^{*} \in {a r g m a x}_{k \in [K]} (u_{i})_{k}$ . In other words, the diffusion step is followed by a ‘majority vote’ determining to which class each node gets assigned.

In Bresson et al. [Reference Bresson, Huiyi, Laurent, Szlam, von Brecht, Xue-Cheng, Egil and Lysaker31] a variant of graph multiclass MBO is presented, with a slightly different random-walk-based diffusion step, but more importantly with an incremental reseeding step between iterations: each iteration the initial condition of the diffusion step is created by randomly sampling nodes from each class based on the output of the previous iteration’s multiclass threshold step. The number of sampled nodes per class is increased in each iteration.

The ideas presented earlier for the multiclass graph MBO scheme can directly be transferred to the signless setting from Sections 5.3 and 8.2 (see Keetch [Reference Keetch116, chapter 5]) and to the fidelity-forced setting [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95].

In the signless case we need to take care to replace the standard simplex by

Σ_{\pm}^{K} := (x \in [- 1, 1]^{K} : \sum_{k = 1}^{K} x_{k} = 2 - K),

with corners $E_{K}^{\pm} := {e_{1}^{\pm}, \dots, e_{K}^{\pm}}$ , where, for all $i, k \in [K]$ , $(e_{k}^{\pm})_{i} = 1$ if $i = k$ and $(e_{k}^{\pm})_{i} = - 1$ if $i \neq k$ . Let $V_{E^{\pm}}^{K} := V_{E_{K}^{\pm}}$ be the set of node-functions with values on the corners of $Σ_{\pm}^{K}$ .

Signless Multiclass Graph MBO Scheme

Initialize (signless multiclass). Choose $u^{0} \in V_{E^{\pm}}^{K}$ and $τ > 0$ .
Step $l + 1$ : signless multiclass diffusion. Solve the multiclass signless diffusion/heat equations: for all $k \in [K]$ , $\frac{d u_{\cdot k}}{d t} = - Δ^{σ} u_{\cdot k}$ on $(0, τ]$ ; $u (0) = u^{l}$ .
Step $l + 1$ : signless multiclass threshold. For all $i \in V$ , define $u_{i}^{l + 1} := e_{k^{*}}^{\pm} \in R^{K}$ , where $k^{*} \in {a r g m i n}_{k \in [K]} ∥ v_{i} - e_{k} ∥_{2}$ (if $k^{*}$ is not unique, a $k$ is chosen uniformly at random out of all minimizers) with $v_{i} := {a r g m i n}_{x \in Σ_{\pm}^{K}} ∥ u_{i} - x ∥_{2}$ .Footnote ⁶⁹
Stop.

We again refer to [Reference Cucuringu, Pizzoferrato and van Gennip60, appendix A.3] for a ‘majority vote’ interpretation of the signless multiclass threshold step.

For the multiclass fidelity-forced MBO scheme, we need multiclass versions of the fidelity parameter function $μ$ and reference function $f$ from Section 5.1; that is, $μ \in V_{[0, \infty)^{K}} ∖ {0}$ Footnote ⁷⁰ and $f \in V^{K}$ . In [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95] each $μ_{\cdot k}$ is chosen to be the same and $f$ is chosen from $V_{E}^{K}$ .

Multiclass Fidelity-Forced Graph MBO Scheme

Initialize (multiclass).
Step $l + 1$ : multiclass fidelity-forced diffusion. Solve the multiclass fidelity-forced diffusion/heat equations: for all $k \in [K]$ , $\frac{d u_{\cdot k}}{d t} = - Δ u_{\cdot k} - μ_{\cdot k} (u_{\cdot k} - f_{\cdot k})$ on $(0, τ]$ with initial condition $u (0) = u^{l}$ .
Step $l + 1$ : multiclass threshold.
Stop.

A multiclass mass-conserving graph MBO scheme called auction dynamics is developed in Jacobs et al. [Reference Jacobs, Merkurjev and Esedoḡlu110] (for $r = 0$ ), based on the (unconstrained) continuum multiclass MBO scheme that was introduced in Esedoḡlu and Otto [Reference Esedoḡlu and Otto77] to approximate multiclass mean curvature flow. The mass of each class is specified, that is, for a given $M \in R^{K}$ it is imposed that, for all $k \in [K]$ , $M (u_{\cdot k}) = M_{k}$ . In order that the mass is conserved, the simple majority vote from the multiclass threshold step has to be adapted by a class-dependent correction term $p_{k}$ . (Approximately) determining this term is accomplished in [Reference Jacobs, Merkurjev and Esedoḡlu110] via the duality theory of linear programming. The name ‘auction dynamics’ is inspired by an interpretation of the scheme in which each node is bidding for membership of one of the $K$ classes, with $u_{i} \in R^{K}$ the vector of weighted preferences of node $i$ for each class and $p_{k}$ the price of membership for class $k$ . For details we refer to [Reference Jacobs, Merkurjev and Esedoḡlu110].

At the time of writing and to the best knowledge of the authors, it is an open question whether this auction dynamics scheme can also be derived starting from a multiclass mass-conserving Allen–Cahn equation similar to the way the binary mass-conserving MBO scheme from Section 8.3 is derived as explained in section 6.1.2 of [Reference van Gennip and Budd190].

In [Reference Jacobs, Merkurjev and Esedoḡlu110] auction dynamics is also extended to handle upper and lower bounds on the masses of the classes, rather than exactly prescribed masses.

In the context of Poisson learning (see Section 12) Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] implement a multiclass version of the mass-conserving fidelity-forced graph MBO scheme (version 2) from Section 8.3, which they name PoissonMBO and which takes the following form.Footnote ⁷¹

Multiclass Mass-Conserving Fidelity-Forced Graph MBO Scheme

Initialize (multiclass).
Step $l + 1$ : multiclass mass-conserving fidelity-forced diffusion. Solve the multiclass mass-conserving fidelity-forced diffusion/heat equations: for all $k \in [K]$ , $\frac{d u_{\cdot k}}{d t} = - Δ u_{\cdot k} + μ_{\cdot k} f_{\cdot k} - ⟨ μ_{\cdot k}, f_{\cdot k} ⟩_{V}$ on $(0, τ]$ , with $u (0) = u^{l}$ .
Step $l + 1$ : multiclass mass-conserving threshold. For all $i \in V$ , define $u_{i}^{l + 1} := e_{k^{*}} \in R^{K}$ , where $k^{*} \in {a r g m i n}_{k \in [K]} ∥ v_{i} - e_{k} ∥_{2}$ (in case of nonuniqueness of minimizers one $k$ is chosen uniformly at random out of all minimizers) with $v_{i} := \sum_{k = 1}^{K} c_{k} (u_{i})_{k}$ , where the strictly positive constants $c_{k}$ are chosen such that, for all $k \in [K]$ , $M (u_{\cdot k}) = M (u_{\cdot k}^{0})$ .
Stop.

It is argued in [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44, remark 2.2] that the constants $c_{k} > 0$ in the multiclass mass-conserving threshold step can always be chosen such that the mass constraints are satisfied.

In section 7.2.6 of [Reference van Gennip and Budd190] some further multiclass MBO schemes and their continuum limits are discussed.

12 Laplacian Learning and Poisson Learning

A function $u \in V$ is harmonic at $i \in V$ , if $(Δ u)_{i} = 0$ . In Zhu et al. [Reference Zhu, Ghahramani, Lafferty, Tom and Mishra211] the random walk graph Laplacian ( $r = 1$ ) is used, but other Laplacians appear in the literature as well. If $S \subseteq V$ and $u$ is harmonic at all $i \in V ∖ S$ , then $u$ is said to be harmonic on $V ∖ S$ . If $S \neq \emptyset$ and $\tilde{f} : S \to R$ is given, there exists a unique function $u \in V$ such that $u$ is harmonic on $V ∖ S$ and $u |_{S} = \tilde{f}$ , see Lovász [Reference Lovász130, theorem 3.1.2]. This is called the harmonic extension of $f$ . For introductory notes on harmonic functions on graphs, see Wigderson [Reference Wigderson199]. In [Reference Zhu, Ghahramani, Lafferty, Tom and Mishra211] this harmonic extension of $\tilde{f}$ is suggested as a good candidate for semi-supervised learning (we refer to section 4.1.2 of [Reference van Gennip and Budd190] for more information about semi-supervised learning) of binary labels on a graph (label propagation), given the a priori labels $\tilde{f}$ on $S$ . This technique is called Laplacian learning (or Laplace learning). In Zhu et al. [Reference Zhu, Lafferty and Ghahramani210] the harmonic extension is interpreted as the mean of an associated Gaussian Random Field. To go from the real-valued harmonic extension $\tilde{f}$ to binary labels, simple thresholding suffices – or mass-conserving thresholding if the preferred cluster sizes are known – in the vein of the MBO scheme(s) of Section 11. Extensions to multiple labels are also possible, similar to the multiclass extension of Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] which we discuss in Section 11.

The Laplacian learning problem

(\begin{matrix} Δ \tilde{u} = 0, & on V ∖ S, \\ \tilde{u} = \tilde{f}, & on S, \end{matrix})

(12.1)

can be transformed, via $u := \tilde{u} - f$ , where

f := (\begin{matrix} 0, & on V ∖ S, \\ \tilde{f}, & on S, \end{matrix})

(12.2)

to the semihomogeneous Dirichlet problem

(\begin{matrix} Δ u = - Δ f, & on V ∖ S, \\ u = 0, & on S . \end{matrix})

The unique solution to this problem can be constructed in terms of a Green’s function [Reference Chung and Yau53; Reference Chung and Yau54], which itself can be expressed in terms of equilibrium measures [Reference Enrique, Carmona and Encinas19; Reference van Gennip188] (see also appendix B of [Reference van Gennip and Budd190]).

The Laplacian learning problem in (12.1) also has a variational formulation.

Lemma 12.1. Let $u \in V$ , then $u$ is the unique solution of (12.1) if and only if

u \in {a r g m i n}_{\begin{matrix} \hat{u} \in V \end{matrix}} ∥ \nabla \hat{u} ∥_{E}^{2} s.t. \hat{u} |_{S} = \tilde{f} .

(12.3)

Proof. For a proof, we refer to lemma 3.5.1 in [Reference van Gennip and Budd190]. □

Laplacian learning and variants (often using the Dirichlet energy as a regularizer in a variational setting, inspired by (12.3)), have been applied in various settings, for example in Zhou and Schölkopf [Reference Zhou, Schölkopf, Walter, Robert and Hanbury206], Zhou et al. [Reference Zhou, Huang, Schölkopf, Luc and Wrobel204], and Elmoataz et al. [Reference Elmoataz, Desquesnes and Toutain74]. One generalization is the $p$ -Laplacian learning problem (see [Reference Zhou, Schölkopf, Walter, Robert and Hanbury206]) which uses $∥ \nabla u ∥_{E, p}^{p}$ in (12.3) instead of $∥ \nabla u ∥_{E}^{2}$ .Footnote ⁷² However, when there are many a priori unlabelled nodes (i.e., when $| V ∖ S |$ is large), there are some drawbacks. In particular, for random geometric graphs built from point clouds in $R^{d}$ it has been shown that when $| S |$ is fixed and $p \leq d$ , solutions to the limiting problem $| V ∖ S | \to \infty$ are relatively flat, except for spikes concentrated around the a priori labelled points (see, e.g., [Reference Zhou, Belkin, of Gordon, Geoffrey, David and Dudík208]). When $p > d$ , solutions to the limiting problem do not exhibit this unwanted behaviour, assuming the length scale in the edge weight function (which regulates the strength of the connection between points in the point cloud based on their Euclidean distance) decreases slowly enough as $| V ∖ S | \to \infty$ . For technical details, we refer to Slepčev and Thorpe [Reference Slepčev and Thorpe179]; see also [Reference Alicandro, Ansini, Braides, Piatnitski and Tribuzio4, section 7.2]. In Crook et al. [Reference Crook, Hurst, Schönlieb, Thorpe and Zygalakis59] this connection to the limiting problem is exploited to develop a new numerical method for solving $p$ -Laplacian learning. In Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] this degeneracy issue for $p = 2$ is related to the random walk interpretation of Laplace learning, in which $u_{i}$ equals the expected value of ${\tilde{f}}_{j}$ , where $j \in S$ is the first node in $S$ that a random walk starting from node $i$ hits.Footnote ⁷³ Also for $p = 2$ , Calder and Slepčev in [Reference Calder and Slepčev43] address this degeneracy issue by adapting the weights of edges that connect to nodes (i.e., points in the point cloud) that are near to the a priori labelled points.

When $p = \infty$ , $\infty$ -Laplacian learning is also called Lipschitz learning, see Kyng et al. [Reference Rasmus, Anup, Sushant, Spielman, Peter, Elad and Kale123]. Various graph-to-continuum consistency results for Lipschitz learning, as well as learning with the game-theoretic $p$ -Laplacian from (3.14) (or variants thereof) are established in, for example, Flores et al. [Reference Flores, Calder and Lerman90] (for more literature references, see section 3.5 of [Reference van Gennip and Budd190]).

In Weihs and Thorpe [Reference Weihs and Thorpe198], fractional-Laplacian learning is studied (especially its consistency at the variational level; we refer to section 7.4 of [Reference van Gennip and Budd190]), which is based on another variant of the graph Dirichlet energy:

⟨ u, Δ^{s} u ⟩_{V},

with parameter $s > 0$ (and $r = 0$ in $Δ$ ).

Another variant is Poisson learning, which is suggested in [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] as a way to avoid the degeneracy issue that we discussed earlier:

Δ u = χ_{S} (f - \frac{M (f)}{M (χ_{S})}),

(12.4)

where $f$ is as in (12.2). The choice $r = 0$ is made for the Laplacian $Δ$ and mass functional $M$ . To select a unique solution, the constraint $M (u) = 0$ is imposed, where this time $r = 1$ is chosen in $M$ .Footnote ⁷⁴ As in the case of Laplace learning, to go from the real-valued function $u$ to labels, a (mass-conserving) thresholding step is used. This is extended to multiple labels (see Section 11) and implemented using a multiclass variant of the mass-conserving fidelity-forced graph MBO scheme (version 2) from Section 8.3, which is named the Poisson MBO algorithm in [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44]. Indeed, in the two-class case we see that the stationary solution to the diffusion step of this Poisson graph MBO scheme – namely to the mass-conserving fidelity-forced diffusion step of the mass-conserving fidelity-forced graph MBO scheme (version 2) – satisfies (12.4) with $χ_{S}$ in the right-hand side replaced by $μ$ with support equal to $S$ .

Per [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44, theorem 2.3], the equivalent variational formulation of (12.4) isFootnote ⁷⁵

u \in {a r g m i n}_{\begin{matrix} v \in V \end{matrix}} \frac{1}{2} ∥ \nabla v ∥_{E}^{2} - {(χ_{S} (f - \frac{M (f)}{M (χ_{S})}), v)}_{V} s.t. M (v) = 0.

For an interpretation of Poisson learning in terms of random walks, we refer to [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44, theorem 2.1]. In appendix B of [Reference van Gennip and Budd190] we have a detailed look at solutions of (12.4).

In the recent work of Thorpe and Wang [Reference Thorpe, Wang, of Bruna, Joan, Jan and Zdeborova184] a robust certification against adversarial attacks is proved for graph Laplace learning, that is, the classification result remains unchanged if the data points (assumed to be embedded in $R^{m}$ ) are perturbed with some bounded perturbation. Work currently in preparation of Korolev et al. [Reference Korolev, Kuger and Thorpe122] studies Laplace learning in an infinite-dimensional setting.

13 Conclusions

Using the tools discribed in Section 3, discretized variants of differential equations and variational models can be formulated on graphs. We have focused mostly on the graph Ginzburg–Landau model and dynamics related to that model, such as those described by the graph Allen–Cahn equation, the graph MBO scheme, and graph mean curvature flow. In Section 12 we also had a look at the graph Laplace equation and graph Poisson equation.

A common theme that unites these various models and equations, besides being discretized versions of well-known continuum variational models and PDEs, is their application in machine learning and mathematical imaging.

Besides containing extended versions of some of the sections from the current book, the companion volume, [Reference van Gennip and Budd190], also contains a full chapter about these applications in machine learning and imaging. It has another chapter dealing with the numerical implementation of these methods, a chapter about the connections between the graph Allen–Cahn equation, the graph MBO scheme, and graph mean curvature flow, and a chapter about discrete-to-continuum limits of graph-based models. Furthermore, it contains a, necessarily relatively brief, overview of the continuum models from which the graph-based models are derived, and a discussion of connections with other fields and open questions.

If this work is part of a snapshot of the current state of this very active mathematical field, then its companion is meant to be a fuller and broader overview. While both works contain a literature overview that can suggest further directions to explore for the interested reader, the companion volume contains a significantly more extensive bibliography to explore.

Acknowledgements

First of all we would like to thank Daniel Tenbrinck for taking the initiative that has made it possible for this work to come into existence. It, and [190], grew from the fertile ground that has been the NoMADS (Nonlocal Methods for Arbitrary Data Sources) project which was funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 777826.

Our thank also goes out to Luca Calatroni who has been our main point of contact for all questions editorial and David Tranah, our invaluable contact at Cambridge University Press. Furthermore, we very much appreciate the feedback of the two anonymous reviewers, whose comments have led to many improvements throughout this Element.

The content of this Element is built upon the expertise we have built in the course of our own research over many years, as well as the expertise of many others which we have had the privilege to learn from through the literature, scientific presentations, and – most enjoyably – many personal conversations. It would be impossible to thank all the people whose influence can be found to a smaller or larger degree in this Element, so we restrict ourselves to mentioning Andrea Bertozzi at the University of California, Los Angeles, and Abderrahim Elmoataz at the Université de Caen Normandie, and their many collaborators, who through their scientific works and their personal engagement have put the field of PDE-inspired differential equations on graphs for imaging and machine learning on the map as a vital and vibrant area of research. It can always remain a matter of debate in the larger community what may or even should be considered the origin of a new (sub)field in mathematics, but at least in the narrow context of the authors’ own contributions to this field, the people and groups from Los Angeles and Caen have been instrumental in shaping it. Moreover, we would like to thank explicitly Jonas Latz for his contributions to our comment on continuous-time Markov chains in Remark 3.5 and Daniele Avitabile for pointing out an alternative name for the semidiscrete implicit Euler scheme (see footnote Footnote 63).

The authors acknowledge financial support by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 777826 via the NoMADS project. The second author has received funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC-2047/1 - 390685813, and from start-up funds at the California Institute of Technology.

The open access edition of this Element was financed by the above-mentioned NoMADS project and the TU Delft Open Access Fund.

Luca Calatroni
Centre National de la Recherche Scientifique (CNRS)
Luca Calatroni is a permanent junior research scientist of the French Centre of Scientific Research (CNRS) at the laboratory I3S of Sophia-Antipolis, France. He got his PhD in applied mathematics in 2016 as part of the Cambridge Centre for Analysis (DTC) and he worked as post-doctoral research fellow at the École Polytechnique (Palaiseau, France) with a Lecteur Hadamard fellowship funded by the FMJH. His research interests include variational models for mathematical imaging, inverse problems, non-smooth and non-convex optimization with applications to biomedical imaging, computational neurosciences and digital art restoration.

Martin Burger
Friedrich-Alexander University Erlangen-Nürnberg, Germany
Raymond Chan
City University of Hong Kong
Ekaterina Rapinchuk
Michigan State University, USA
Carola-Bibiane Schönlieb
University of Cambridge, UK
Daniel Tenbrinck
Friedrich-Alexander University Erlangen-Nürnberg, Germany

About the Series

This series provides a mathematical description of the modelling, the analysis and the optimisation aspects of processing data that features complex and non-local relationships. It is intended for both students and researchers wishing to deepen their understanding of non-local analysis and explore applications to image and data processing.

Element contents

A Prolegomenon to Differential Equations and Variational Methods on Graphs

Summary

Keywords

1 Introduction

1.1 Outline of This Element

2 History and Literature Overview

3 Calculus on Undirected Edge-Weighted Graphs

3.1 Graphs, Function Spaces, Inner Products, and Norms

3.2 Graph Gradient, (p-)Dirichlet Energy, (p-)Laplacian, and Total Variation

3.3 Miscellaneous Considerations

4 Directed Graphs

5 The Graph Ginzburg–Landau Functional

5.1 Constraints

5.2 Graph Cuts and Γ-Convergence to Total Variation

5.3 Signless Laplacians and Signless Ginzburg–Landau Functional

6 Spectrum of the Graph Laplacians

7 Gradient Flow: Allen–Cahn

7.1 Smooth Double-Well Potential

7.2 Double-Obstacle Potential

7.3 Allen–Cahn with Constraints

7.4 Different Metrics

8 Merriman–Bence–Osher Scheme

8.1 Definition of the MBO Scheme

Graph MBO Scheme (without Constraints)

8.2 The Signless MBO Scheme

Signless Graph MBO Scheme

8.3 The MBO Scheme with Constraints

Fidelity-Forced Graph MBO Scheme (Version 1)

Fidelity-Forced Graph MBO Scheme (Version 2)

Mass-Conserving Fidelity-Forced Graph MBO Scheme (Version 1)

Mass-Conserving Fidelity-Forced Graph MBO Scheme (Version 2)

9 Graph Curvature and Mean Curvature Flow

9.1 Curvature

9.2 Mean Curvature Flow

10 Freezing of Allen–Cahn, MBO, and Mean Curvature Flow

11 Multiclass Extensions

Multiclass Graph MBO Scheme

Signless Multiclass Graph MBO Scheme

Multiclass Fidelity-Forced Graph MBO Scheme

Multiclass Mass-Conserving Fidelity-Forced Graph MBO Scheme

12 Laplacian Learning and Poisson Learning

13 Conclusions

Acknowledgements

Footnotes

References

Save element to Kindle

Save element to Dropbox

Save element to Google Drive

3.2 Graph Gradient, ( $p$ -)Dirichlet Energy, ( $p$ -)Laplacian, and Total Variation

5.2 Graph Cuts and $Γ$ -Convergence to Total Variation