
A Prolegomenon to Differential Equations and Variational Methods on Graphs

Published online by Cambridge University Press:  05 February 2025

Yves van Gennip
Affiliation:
Delft University of Technology
Jeremy Budd
Affiliation:
California Institute of Technology

Summary

The use of differential equations on graphs as a framework for the mathematical analysis of images emerged about fifteen years ago and has since burgeoned, with applications also in machine learning. The authors have written a bird's eye view of theoretical developments that will enable newcomers to quickly get a flavour of key results and ideas. Additionally, they provide a substantial bibliography which will point readers to where fuller details and other directions can be explored. This title is also available as open access on Cambridge Core.
Type
Element
Information
Online ISBN: 9781009346641
Publisher: Cambridge University Press
Print publication: 27 February 2025
Creative Commons
This content is Open Access and distributed under the terms of the Creative Commons Attribution licence CC-BY-NC-ND 4.0 https://creativecommons.org/cclicenses/

1 Introduction

The Element you are about to read tells a tale that shrank in the writing. The opposite is true for its companion volume [190]. When we were asked to write a contribution to what would become the Elements in Non-local Data Interactions: Foundations and Applications series in April 2021, the idea quickly took hold to provide an overview, a snapshot of the field of differential equations and variational methods on graphs and their applications to machine learning, image processing, and image analysis. But this is a very active field, being developed in a variety of different directions, and so the story we wanted to tell outgrew the format of a contribution to the Cambridge Elements series. We are grateful to Cambridge University Press and the editors for deeming our full contribution worthy of its own publication [190] and for allowing us to extract and adapt some introductory sections for publication in the Elements series. This Element provides an introduction to differential equations and variational methods on graphs seen through the lenses of the authors. It focuses on some areas to which we ourselves have actively contributed, but at the same time aims to give sufficient background to equip the interested reader to explore this fascinating topic. We have aimed to make this Element a satisfyingly comprehensive read in itself, while also allowing it to be a teaser for its more extensive companion volume. Those who compare both volumes will find that the companion volume contains, besides additional details in some of the sections that correspond to those in the current Element, chapters on applications of differential equations on graphs and their computational implementation, as well as chapters on further theoretical explorations of the relationships between various graph-based models and of the continuum limits of such models when the number of nodes of the underlying graph is taken to infinity.

Differential equations, both ordinary differential equations (ODEs) and partial differential equations (PDEs), have a long history in mathematics and we assume they need no introduction to the reader. Slightly younger, yet still of a venerable age, is the study of graphs. By a differential equation on a graph we mean a discretization of a differential equation, usually a PDE, on a graph: if we can write the PDE as $F(u(x,t))=0$, for a differential operator $F$ and a function $u$ defined on (a subset of) $\mathbb{R}^m\times\mathbb{R}$, then we obtain a differential equation on a graph by replacing the (spatial) derivatives with respect to $x$ in $F$ by finite difference operators based on the structure of the graph and replacing $u$ by a function defined on (a subset of) $V\times\mathbb{R}$, where $V$ is the set of nodes of the graph.
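To make this recipe concrete, consider the graph analogue of the heat equation, $\partial_t u = -\Delta u$ in the sign convention of Section 3 (where the graph Laplacian is positive semidefinite). The following minimal sketch – our own illustration, not code from this Element, with a made-up weight matrix – discretizes it with NumPy and forward Euler time stepping:

import numpy as np

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])      # made-up weighted adjacency matrix of a toy graph
L = np.diag(A.sum(axis=1)) - A    # combinatorial graph Laplacian D - A
u = np.array([1., 0., 0.])        # initial node function
dt = 0.01                         # time step for forward Euler
for _ in range(1000):
    u = u - dt * (L @ u)          # du/dt = -L u, the graph heat equation
print(u)                          # diffuses towards the constant function 1/3

Replacing the continuum Laplacian by the graph Laplacian turns the PDE into a system of ODEs indexed by the nodes, which is why such equations can be studied with ODE techniques.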

Variational models in the calculus of variations are typically formulated in terms of minimization of a function(al). To consider such a model on a graph, we discretize the functional by replacing integrals with the appropriate sums and differential operators with the corresponding finite difference operators on graphs. This process also makes the space of admissible functions over which the functional is minimized finite-dimensional. The line between calculus of variations and mathematical optimization becomes blurry here, or is even crossed. Because the authors of this Element approach these models from the point of view of variational calculus, we will include them under the umbrella of variational models (on graphs), even if the originators of any particular model may have had a different inspiration when proposing that model.
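As a schematic example (ours; the graph quantities are defined precisely in Section 3), the continuum Dirichlet energy and its graph counterpart correspond as

\[
\frac12\int_{\mathbb{R}^m}|\nabla u(x)|^2\,dx \quad\longleftrightarrow\quad \frac14\sum_{i,j\in V}\omega_{ij}(u_i-u_j)^2,
\]

with the integral replaced by a sum over nodes and the spatial derivative by weighted differences $u_j-u_i$; the admissible functions $u$ now live in the finite-dimensional space $\mathbb{R}^{|V|}$.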

The field of machine learning is concerned with the development of methods and algorithms to analyse data sets. ‘Learning’ in this context refers to leveraging the properties of some collection of ‘training data’ (which may or may not be a part of the data set which is to be analysed) to draw conclusions about the data set. Machine learning has expanded enormously in the twenty-first century. Terms like ‘big data’, ‘machine learning’, and ‘artificial intelligence’ are now commonplace for many people, both because of the commercial successes of the many tech companies that exist by the grace of data availability and the methods to learn from those data, and because of the enormous speed with which deep-learned algorithms have transformed many areas of science, industry, and public and private life. Scientific curiosity goes hand in hand with a societal need to understand the methods that play such a big role in so many sectors.

But what is the role of differential equations in all of this? After all, many of the advances in machine learning, both in terms of development of new methods and analysis of existing ones, come from statistics, computer science, and the specific application fields – scientific or industrial – where the methods are used. One might argue for the general notion that increased diversity in the points of view from which a particular topic is scrutinized leads to different and complementary insights that all strengthen the full picture. There certainly is validity in that notion when it comes to the study of machine learning; in this case, though, there are stronger ties that go beyond generalities.

A substantial part of the root system of differential equations in machine learning lies in the field(s) of mathematical image processing and image analysis. Not only do (the ingredients of) many differential-equation-based machine learning methods have roots in the mathematical imaging literature and many machine learning methods have applications in imaging problems, but there is also a substantial overlap in the communities active in these fields.

Despite the success of artificial neural networks, they are not central objects in this Element. The main focus in the current Element is on those methods that represent the data (e.g., the image) in terms of a graph in order to apply a graph-based variational model or differential equation.

These methods have many desirable properties, which have made them popular in recent years. Because the graph-based models and equations have close connections to well-established models and equations in the continuum setting, there is a wealth of ideas to pursue and techniques to apply to study and understand them. Moreover, the development of numerical methods for differential equations has a long and rich history, yielding many algorithms that can be adapted to the graph-based setting. Another key difference between most machine learning methods discussed in this Element and deep learning methods is that the latter are mostly data-driven, which usually means many training data are required to obtain well-performing networks, while the former are explicitly model-driven and so tend to require fewer training data (but also are less likely to discover patterns for which the models are not built to look).

The scope of this Element is broad in some parts and narrow in others. On the one hand, we wish to provide an overview of an exciting, ever-broadening, and expansive field. Thus, in the general literature overview in Section 2 we have taken a very broad view of the topic of differential equations on graphs and their applications with the aim of placing it in its historical context and pointing the reader to the many aspects and fields that are closely related to it. On the other hand, in the remainder of this Element, we focus on very specific models with a certain amount of bias in the direction of those models with which the authors have close experience themselves, yet not to the complete exclusion of other models. These models revolve around the graph Ginzburg–Landau functional, which is introduced in this Element in Section 5 and which plays a central role in a number of graph-based clustering and classification methods.

Graph clustering and classification are similar tasks that both aim to use the structure of the graph to group its nodes into subsets called clusters or classes (or sometimes communities or phases, depending on the context or application). In most of the settings that we discuss here, these subsets are required to be pairwise disjoint, so that mathematically we can speak of a partition of the node set (assuming nonempty subsets) and we obtain non-overlapping clusters or classes. A key guiding principle of both tasks is to have strong connections (i.e., many edges or, in an edge-weighted graph, highly weighted edges) between nodes in the same class or cluster and few between nodes in different clusters or classes. This is not the only requirement for a good clustering or classification – in the absence of any other desiderata, a trivial partition of the node set into one cluster would maximize intra-cluster connectivity and minimize inter-cluster connectivity – and so additional demands are typically imposed. Two types of constraints that will receive close attention in this Element are constraints on cluster sizes and constraints that encourage fidelity to a priori known class membership of certain nodes. The presence of such a priori known labels is what sets classification apart from clustering.

There are mathematical imaging tasks that can be formulated in terms of graph clustering or classification, most notably image segmentation, which can be viewed as the task of clustering or classifying the pixels of a digital image based on their contribution to (or membership of) objects of interest in the image. Other imaging tasks, such as image denoising, reconstruction, or inpainting, can be formulated in ways that are mathematically quite closely related to graph clustering and classification.

We hope this Element may serve as an overview and an inspiration, both for those who already work in the area of differential equations on graphs and for those who do not (yet) and wish to familiarize themselves with a field rich with mathematical challenges and abundant applications.

1.1 Outline of This Element

We give a brief overview of the history of and literature in the field of differential equations and variational methods on graphs with applications in machine learning and image analysis in Section 2. For a more extensive overview, we refer to section 1.2 of the companion volume [190]. In Section 3 we lay the mathematical foundations that we require to formulate differential equations and variational models on graphs. For example, we define spaces of node functions and important operators and functionals on these spaces, such as the graph Laplacian operators and the graph total variation functional.

In most of this Element we consider undirected graphs, but in Section 4 we discuss very briefly works that generalize some of the concepts from Section 3 to directed graphs, in particular graph Laplacians.

Besides the graph total variation functional, another very important graph-based functional, which we have already mentioned, is the graph Ginzburg–Landau functional. It deserves its own place in the spotlight; thus, in Section 5 we introduce it and discuss some of its variants and properties, including its connection to the graph total variation functional.

The spectrum of an operator gives insight into its behaviour. Indeed, in Section 6 we see that the spectra of graph Laplacians shed some light on their usefulness in graph clustering and classification problems.

In variational models, functionals play the role of an ‘energy’ or ‘objective function’ that needs to be minimized. Two important dynamical systems that (approximately) accomplish this minimization for the graph Ginzburg–Landau functional are described by the graph Allen–Cahn equation and the graph Merriman–Bence–Osher scheme, respectively. The former, which is an ordinary differential equation obtained as a gradient flow of the graph Ginzburg–Landau functional, is the topic of Section 7, while the latter is the focus of Section 8.

Closely related to the graph Allen–Cahn and Merriman–Bence–Osher dynamics are the graph mean curvature flow dynamics that are described in Section 9. Although exactly how closely these are related is still an open question, one property most of them have in common is a threshold on the parameter of the model below which the dynamics trivializes. This freezing phenomenon is discussed in Section 10.

The main focus in this Element is on models that cluster or classify the node set of the graph into two subsets. In Section 11 we take a look at multiclass extensions of some of these models.

In Section 12 we discuss some finite difference methods on graphs, namely Laplacian learning and Poisson learning. Finally, Section 13 provides a brief conclusion to this Element as we look forward (or sideways) to additional topics that appear in the companion volume [190].

2 History and Literature Overview

Any area of mathematical enquiry will have its roots in what came before. The field of differential equations on graphs for clustering and classification problems is no exception, so there is always the risk that the starting point of any historical overview may feel somewhat arbitrary. It is inescapably personal too; the priorities and narratives generated by our own research inevitably have influenced our departure point for this story and the route we will follow afterwards. As such, the references given in this section are not meant to be exhaustive, nor are all contributions from the references that are given always exhaustively described. One reason this field has generated such enthusiastic interest is that it brings together ideas from many different directions, from statistics and machine learning to discrete mathematics and mathematical analysis. We encourage any attempts to understand the history of this field from perspectives different from the one we provide here, shining more light on the exciting diversity of viewpoints differential equations on graphs have to offer.

For a fuller overview of the literature, we refer to section 1.2 of the companion volume [190]. Neither the current section nor the literature section in the companion volume is an exhaustive overview, if such a thing is even possible, but they should provide enough starting points for someone who is eager to learn about this active field of research. We apologize for the undoubtedly many important works in their respective areas that are missing from our bibliography.

As a double ignition point for the growing interest in differential equations on graphs by the (applied) mathematical analysis community in the late noughties and early tens of the twenty-first century, especially in relation to image processing and data analysis applications, we mention works by Abderrahim Elmoataz and collaborators such as [63, 75, 131] – which have a strong focus on $p$-Laplacians, $\infty$-Laplacians, and morphological operations such as dilation and erosion on graphs, as well as processing of point clouds in three dimensions – and works by Andrea L. Bertozzi and collaborators, like [21, 22, 144, 189, 191] – which deal with the Ginzburg–Landau functional on graphs and derived dynamics such as the Allen–Cahn equation and Merriman–Bence–Osher (MBO) scheme on graphs. This is not to say there were no earlier investigations into the topic of differential operators and equations on graphs, for example in [160, 195] or in the context of consensus problems [150, 164], or variational ‘energy’ minimization problems on graphs, such as in the context of Markov random fields [181], but the two groups we mentioned earlier provided a sustained drive for the investigation of both the applied and theoretical aspects of differential equations on graphs in the setting on which we are focusing in the current Element.

We note that in the current Element, when we talk about differential equations on graphs, we typically mean partial differential equations whose spatial derivatives have been replaced by discrete finite difference operators on graphs (see Section 3.2), leading to ODEs or finite-difference equations. A priori this is different from the systems of PDEs formulated on the edges of a network that are coupled through boundary conditions on the nodes, which are also studied under the name ‘differential equations on graphs (or networks)’ [192].

New research directions tend not to spring into being fully formed, and also the works mentioned earlier have had the benefit of a rich pre-existing literature in related fields. In particular, many of the variational models and differential equations that are studied on graphs have been inspired by continuum cousins that came before and by the, often sizeable, literature that exists about those models and equations. Examples are the Allen–Cahn equation [5], the Cahn–Hilliard equation [40], the Merriman–Bence–Osher scheme [146, 147], flow by mean curvature [6, 30, 167], total variation flow [9], and the Mumford–Shah and Chan–Vese variational models [47, 159].

Inspiration has also come from other directions, such as discrete calculus [101] and numerical analysis and scientific computing [18]. The latter fields not only provide state-of-the-art methods that allow for fast implementations of the graph methods on graphs with many nodes – something which is very important in modern-day applications that deal with large data sets or high-resolution images – but also offer theoretical tools for dealing with discretizations of continuum equations, even though the specifics may differ substantially between a discretization designed with the goal of approximating a continuum problem as accurately as possible and a discretization which is a priori determined by the graph structure that is given by the problem or inherent in the application at hand.

Some of the most prominent applications that have been tackled by differential equations and variational models on graphs, especially by the models and methods central to the current Element, are graph clustering and classification [174], and other applications that are – or can be formulated to become – related, such as community detection [171], image segmentation [46], and graph learning [200]. For an extensive look at these, and other, applications, we refer to chapter 4 of the companion volume [190]. In the current Element we focus on the core graph clustering and classification applications.

Since the early pioneering works we mentioned at the start of this section, differential equations on graphs have enjoyed a lot of attention from applied analysts and researchers from adjacent fields. As a very rough attempt at classification of these different research efforts, we distinguish between those papers that study differential equations and variational models purely at the discrete-graph level and those that are interested in continuum limits of those discrete equations and models as a way to establish their consistency. (Some papers may combine elements of both categories.) The focus of this Element is on the first category. For a closer look at the second category, we refer to chapter 7 of [190].

In this first category, we encounter papers that study particular graph-based differential operators or graph-based dynamics, such as the eikonal equation (and the related eikonal depth), the $p$-eikonal equation, $p$- and $\infty$-Laplacians [38, 42, 76, 138, 206], semigroup evolution equations [153], dynamics [121] such as mean curvature flow and morphological evolution equations (related to morphological filtering and graph-based front propagation) [70, 182] or advection [172], or discrete variational models such as trend filtering on graphs [194] and the graph Mumford–Shah model [106, 168].

Of special interest in the context of the current Element are the graph Allen–Cahn equation, graph MBO scheme, and graph mean curvature flow [33, 144, 187, 191], which are discussed in much greater detail in Sections 7, 8, and 9. For details about applications in which these graph-based dynamics have been used, we refer to chapter 4 of [190]. Some of the applications that are considered in that volume require an extension of the classical two-phase versions of the Allen–Cahn and MBO dynamics to a multiclass context [94, 95]; other variations on multiclass MBO have been developed, such as an incremental reseeding method [31] and auction dynamics [110].

The publications focusing on the discrete level also include papers that study connections between graph-based differential operators or dynamics on the one hand, and on the other hand graph-based concepts that are useful for studying graph structures, which sometimes already had a history outside of the differential-equations-on-graphs literature. For example: the modulus on graphs [3], Cheeger cuts and ratio cuts [141], ranking algorithms and centrality measures such as heat kernel PageRank [142], nonconservative alpha-centrality (as opposed to conservative PageRank) [98], centrality measures and community structure based on the interplay between dynamics via parameterized (or generalized) Laplacians and the network structure [201], random walks and SimRank on uncertain graphs (i.e., graphs in which each edge has a probability of existence assigned to it) [209], distance and proximity measures on graphs [12, 48], and a hubs-biased resistance distance (based on graph Laplacians) [79].

We also draw attention here to graph-based active learning [148], Laplacian learning [211], Poisson learning [44], and results on the Lipschitz regularity of functions in terms of Laplacians on point clouds [45]. For overview articles about graph-based methods in machine learning and image processing, we refer to [20, 23, 51].

In the continuum setting, the dynamics of both the Allen–Cahn equation and the MBO scheme are known to approximate flow by mean curvature, in a sense that has been made precise through convergence analysis [16, 32, 39, 80, 125, 126]. Rigorous connections between the graph Allen–Cahn equation and graph MBO scheme have been established in [33, 35, 36, 37]; their connections with graph mean curvature flow are open questions. Details about these established connections and open questions, as well as a closer look at the various dynamics in the continuum setting, can be found in chapter 6 of [190].

In many of the works just cited, the graphs under consideration on which the variational models or differential equations are formulated are finite, undirected graphs, with edges that connect nodes pairwise and that, if weighted, have a positive weight. Moreover, the graphs are unchanging – also, for the continuum limits that we briefly mentioned, even though the limit $|V|\to\infty$ is considered, typically at each fixed $|V|$ the graph structure is static.

This leaves a lot of room for generalizations and extensions. In this Element we refrain from delving into these generalizations in too much detail, although in Section 4 we do briefly discuss Laplacians on directed graphs [17, 87, 104, 207]. Other possible generalizations are to multislice networks [151], hypergraphs (in which edges can connect more than two nodes) [26, 108], metric graphs and quantum graphs (in which edges are represented by intervals of the real line) [118], signed graphs that can have positive and negative edge weights [60], metric random walk spaces (of which locally finite positively edge-weighted connected graphs are a special case) [139, 140], and graphs changing in time [25].

Of interest also is the connection between methods on graphs and the constructions used to build the graphs, as is considered in, for example, [97]. Some more details about building graph models for specific applications are given in section 4.2 of [190].

3 Calculus on Undirected Edge-Weighted Graphs

3.1 Graphs, Function Spaces, Inner Products, and Norms

Except where explicitly stated otherwise, in this Element we consider finite, simple (i.e., without multi-edges and without self-loops), connected, edge-weighted graphs $G=(V,E,\omega)$ – if a graph $G$ is mentioned without further specification, it is assumed to have these properties. Figure 3.1 shows an example of such a graph. Here $V$ is the set of nodes or vertices of the graph, $E\subseteq V\times V$ is the set of edges, and $\omega: V\times V\to\mathbb{R}$ is the edge weight function, which vanishes on $E^c:=(V\times V)\setminus E$; thus $\omega|_{E^c}=0$. Unless otherwise specified, we assume that $\omega|_E>0$. In this framework, an unweighted graph $G=(V,E)$ can be viewed as an edge-weighted graph $G=(V,E,\omega)$ with $\omega|_E=1$.

Figure 3.1 An example of a finite, simple, connected, and undirected graph.

We will assume that $|V|\ge 2$. It will often be useful to identify the nodes in $V$ with the numbers $1$ to $n\in\mathbb{N}$, and we write $V=[n]:=\{1,2,\ldots,n\}$. Unspecified nodes from $V$ we denote by $i,j,k,\ldots$. The edge from $i$ to $j$ is denoted by $(i,j)$; the nodes $i$ and $j$ are endpoints of this edge. For any node function $u$, that is, a function $u$ whose domain is (a subset of) $V$, we write $u_i$ for $u(i)$. Similarly, for any function $\varphi$ whose domain is (a subset of) $V\times V$, we write $\varphi_{ij}$ for $\varphi(i,j)$. Any such function which vanishes on $E^c$ we will call an edge function. We define the function spaces of node and edge functions,

\[
\mathcal{V} := \{u: V\to\mathbb{R}\}, \qquad \mathcal{E} := \{\varphi: V\times V\to\mathbb{R} : \varphi|_{E^c}=0\}.
\]

We use subscripts to indicate alternative codomains: if $A$ is a set, then

\[
\mathcal{V}_A := \{u: V\to A\}, \qquad \mathcal{E}_A := \{\varphi: V\times V\to A : \varphi|_{E^c}=0\}.
\]

In particular, $\mathcal{V}=\mathcal{V}_{\mathbb{R}}$ and $\mathcal{E}=\mathcal{E}_{\mathbb{R}}$.

When it is not explicitly stated differently (as in Section 4), we consider undirected graphs, namely graphs for which $(i,j)\in E$ if and only if $(j,i)\in E$. Two nodes $i,j\in V$ in an undirected graph are called adjacent, or neighbours, if $(i,j)\in E$. The condition that $j\ne i$, which is sometimes explicitly included in the definition of neighbour, is superfluous under our assumption of absence of self-loops. The matrix $A$ with entries $A_{ij}:=\omega_{ij}$ is called the (weighted) adjacency matrix, or weight matrix, of the graph. In an undirected graph, we require $\omega$ to be symmetric in its arguments, that is, for all $i,j\in V$, $\omega_{ij}=\omega_{ji}$. Some authors demand their edge functions $\varphi\in\mathcal{E}$ to be skew-symmetric, namely $\varphi_{ij}=-\varphi_{ji}$. We do not require this assumption and so will not impose it.

Since we consider graphs without self-loops, for all $i\in V$, $\omega_{ii}=0$. The degree of node $i$ is $d_i:=\sum_{j\in V}\omega_{ij}$. A graph is connected if, for all $i,j\in V$ with $i\ne j$, there exist finitely many nodes $i_1,i_2,\ldots,i_k$ such that $i_1=i$, $i_k=j$, and $\omega_{i_1 i_2}\cdots\omega_{i_{k-1}i_k}>0$. A graph which is not connected is called disconnected. Since our graphs are assumed to be connected (unless stated otherwise), we have, for all $i\in V$, $d_i>0$.

In some situations it will be useful to have a shorthand notation to indicate adjacency of nodes (in an undirected graph): for $i\in V$, we write $j\sim i$ if and only if $j\in V$ and $(i,j)\in E$. Equivalently, under our assumption that $\omega|_E>0$, we have that $j\sim i$ if and only if $j\in V$ and $\omega_{ij}>0$.

Our first step to defining a calculus on node and edge functions is to define an inner product structure on $\mathcal{V}$ and $\mathcal{E}$. Let $r\in[0,1]$, $q\in[1/2,1]$ (see Remark 3.9), $u,v\in\mathcal{V}$, and $\varphi,\psi\in\mathcal{E}$. Then

\[
\langle u,v\rangle_{\mathcal{V}} := \sum_{i\in V} d_i^r u_i v_i \quad\text{and}\quad \langle\varphi,\psi\rangle_{\mathcal{E}} := \frac12\sum_{(i,j)\in E}\omega_{ij}^{2q-1}\varphi_{ij}\psi_{ij}. \tag{3.1}
\]

We note that the factor $\frac12$ compensates for the ‘double count’ of edges $(i,j)$ and $(j,i)$ in an undirected graph. In this and other circumstances it can be convenient to rewrite the sum over $(i,j)\in E$ as a double sum over $i,j\in V$. This can be done since $\varphi_{ij}=\psi_{ij}=\omega_{ij}=0$ if $(i,j)\notin E$. In this case we have to interpret $\omega_{ij}^0$ to be $0$ (not $1$!) if $(i,j)\notin E$.
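The following small sketch (our own illustration, with an arbitrary example weight matrix) computes the inner products in (3.1) as double sums over $i,j\in V$, using the convention just described that terms with $(i,j)\notin E$ are excluded:

import numpy as np

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])          # made-up symmetric weight matrix omega_ij
r, q = 0.0, 1.0                        # parameters r and q from the text
d = W.sum(axis=1)                      # degrees d_i

def ip_V(u, v):
    return np.sum(d**r * u * v)        # <u, v>_V = sum_i d_i^r u_i v_i

def ip_E(phi, psi):
    mask = W > 0                       # restrict to the edge set E
    return 0.5 * np.sum(W[mask]**(2*q - 1) * phi[mask] * psi[mask])

u = np.array([1., 2., 3.])
phi = np.where(W > 0, 1.0, 0.0)        # an edge function: vanishes off E
print(ip_V(u, u), ip_E(phi, phi))      # 14.0 and 3.0 for this example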

These inner products induce norms in the usual way: $\|u\|_{\mathcal{V}} := \sqrt{\langle u,u\rangle_{\mathcal{V}}}$ and $\|\varphi\|_{\mathcal{E}} := \sqrt{\langle\varphi,\varphi\rangle_{\mathcal{E}}}$. Other commonly used norms on $\mathcal{V}$ and $\mathcal{E}$ are the $p$-norms (for $p\in[1,\infty)$) and $\infty$-norms:

\[
\|u\|_{\mathcal{V},p} := \Big(\sum_{i\in V} d_i^r |u_i|^p\Big)^{\frac1p}, \qquad \|u\|_{\mathcal{V},\infty} := \max_{i\in V}|u_i|,
\]
\[
\|\varphi\|_{\mathcal{E},p} := \Big(\frac12\sum_{(i,j)\in E}\omega_{ij}^{2q-1}|\varphi_{ij}|^p\Big)^{\frac1p}, \qquad \|\varphi\|_{\mathcal{E},\infty} := \max_{(i,j)\in E}|\varphi_{ij}|.
\]

We note that the $p=2$ norms are the norms induced by the inner products.

If $r=0$, then $\|u\|_{\mathcal{V}}$ is the Euclidean $\ell^2$-norm of the vector with components $u_i$. To avoid specifying $r=0$, we may write $\|u\|_2$ in this case, with $\ell^2$ inner product $\langle\cdot,\cdot\rangle_2$. We also use this notation for vectors not (necessarily) meant to be interpreted as node functions. Similarly, if $r=0$, then $\|u\|_{\mathcal{V},p}$ is equal to the $\ell^p$-norm of the vector with components $u_i$ and we may write $\|u\|_p$ instead.

If $S\subseteq V$, we define $\chi_S$ to be its indicator (or characteristic) function, that is,

\[
(\chi_S)_i := \begin{cases} 1, & \text{if } i\in S,\\ 0, & \text{if } i\in S^c := V\setminus S.\end{cases}
\]

In particular $\chi_V = 1_{\mathcal{V}}$ (and we may write $1$ for $\chi_V$, or $c$ for $c\chi_V$, if $c\in\mathbb{R}$ is a constant) and $\chi_\emptyset = 0_{\mathcal{V}}$. Similarly, we define indicator functions $\chi_{E'}$ for edge subsets $E'\subseteq E$. We will also need a variant indicator function $\imath_A$ for $A\subseteq\mathbb{R}$, defined by $\imath_A(x)=0$ if $x\in A$ and $\imath_A(x)=+\infty$ if $x\in\mathbb{R}\setminus A$.

The volume of a node subset $S\subseteq V$ and the volume of an edge subset $E'\subseteq E$ are defined to be (for any $p\in[1,\infty)$), respectively,

\[
\operatorname{vol} S := \|\chi_S\|_{\mathcal{V},p}^p = \sum_{i\in S} d_i^r, \qquad \operatorname{vol} E' := \|\chi_{E'}\|_{\mathcal{E},p}^p = \frac12\sum_{(i,j)\in E'}\omega_{ij}^{2q-1}. \tag{3.2}
\]

Lemma 3.1. In this lemma we interpret $\frac{1}{\infty}=0$. Let $u,v\in\mathcal{V}$ and $\varphi,\psi\in\mathcal{E}$. For all $p,p'\in[1,\infty]$ such that $\frac1p+\frac1{p'}=1$, these Hölder inequalities hold:

\[
\|uv\|_{\mathcal{V},1} \le \|u\|_{\mathcal{V},p}\,\|v\|_{\mathcal{V},p'} \quad\text{and}\quad \|\varphi\psi\|_{\mathcal{E},1} \le \|\varphi\|_{\mathcal{E},p}\,\|\psi\|_{\mathcal{E},p'}.
\]

Moreover, these embedding estimates are satisfied, if $s,t\in[1,\infty]$ with $s\le t$:

\[
\|u\|_{\mathcal{V},s} \le (\operatorname{vol} V)^{\frac1s-\frac1t}\|u\|_{\mathcal{V},t} \quad\text{and}\quad \|\varphi\|_{\mathcal{E},s} \le (\operatorname{vol} E)^{\frac1s-\frac1t}\|\varphi\|_{\mathcal{E},t}.
\]

Furthermore,

\[
\lim_{p\to\infty}\|u\|_{\mathcal{V},p} = \|u\|_{\mathcal{V},\infty} \quad\text{and}\quad \lim_{p\to\infty}\|\varphi\|_{\mathcal{E},p} = \|\varphi\|_{\mathcal{E},\infty}.
\]

In fact, for all $p\in[1,\infty)$ we have

\[
\Big(\min_{i\in V} d_i^r\Big)^{\frac1p}\|u\|_{\mathcal{V},\infty} \le \|u\|_{\mathcal{V},p} \le (\operatorname{vol} V)^{\frac1p}\|u\|_{\mathcal{V},\infty}
\]

and

\[
2^{-\frac1p}\Big(\min_{(i,j)\in E}\omega_{ij}^{2q-1}\Big)^{\frac1p}\|\varphi\|_{\mathcal{E},\infty} \le \|\varphi\|_{\mathcal{E},p} \le (\operatorname{vol} E)^{\frac1p}\|\varphi\|_{\mathcal{E},\infty}.
\]

Proof. For a proof we refer to lemma 2.1.1 of [190]. □

Instead of interpreting $\varphi\in\mathcal{E}$ in the standard way as a function from $V\times V$ to $\mathbb{R}$, we may also view it as a function from $V$ to $\mathcal{V}$: for all $i\in V$, $\varphi_i\in\mathcal{V}$. This prompts the following definitions of a node-dependent inner product and $p$- and $\infty$-norms: for all $\varphi,\psi\in\mathcal{E}$, all $i\in V$, and all $p\in[1,\infty)$,

\[
(\varphi,\psi)_i := \frac12\sum_{j\in V}\omega_{ij}^{2q-1}\varphi_{ij}\psi_{ij}, \qquad \|\varphi\|_{i,p} := \Big(\frac12\sum_{j\in V}\omega_{ij}^{2q-1}|\varphi_{ij}|^p\Big)^{\frac1p}, \qquad \|\varphi\|_{i,\infty} := \max\{|\varphi_{ij}| : j\in V\}. \tag{3.3}
\]

Corollary 3.2. In this corollary we interpret $\frac{1}{\infty}=0$. Let $\varphi,\psi\in\mathcal{E}$ and $i\in V$. For $p,p'\in[1,\infty]$ such that $\frac1p+\frac1{p'}=1$, a Hölder inequality holds:

\[
\|\varphi\psi\|_{i,1} \le \|\varphi\|_{i,p}\,\|\psi\|_{i,p'}.
\]

Moreover, an embedding estimate is satisfied, if $s,t\in[1,\infty]$ with $s\le t$:

\[
\|\varphi\|_{i,s} \le \Big(\frac12\sum_{j\in V}\omega_{ij}^{2q-1}\Big)^{\frac1s-\frac1t}\|\varphi\|_{i,t}.
\]

Furthermore, for all $\varphi\in\mathcal{E}$ and for all $p\in[1,\infty)$,

\[
2^{-\frac1p}\Big(\min_{j\in V:\,\omega_{ij}>0}\omega_{ij}^{2q-1}\Big)^{\frac1p}\|\varphi\|_{i,\infty} \le \|\varphi\|_{i,p} \le \Big(\frac12\sum_{j\in V}\omega_{ij}^{2q-1}\Big)^{\frac1p}\|\varphi\|_{i,\infty},
\]

and thus in particular,

\[
\lim_{p\to\infty}\|\varphi\|_{i,p} = \|\varphi\|_{i,\infty}.
\]

Proof. We refer to corollary 2.1.2 in [190]. □

3.2 Graph Gradient, (p-)Dirichlet Energy, (p-)Laplacian, and Total Variation

To be able to do calculus on graphs, we require a discrete analogue of the derivative: the graph gradient. For $u\in\mathcal{V}$ the gradient $\nabla u\in\mathcal{E}$ is defined by

\[
(\nabla u)_{ij} := \omega_{ij}^{1-q}(u_j-u_i).
\]
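As a quick illustration (ours, with an arbitrary example graph), the gradient can be stored as a $|V|\times|V|$ array that vanishes off the edge set:

import numpy as np

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])     # made-up symmetric weight matrix
q = 0.5
u = np.array([1., 2., 4.])

# (grad u)_ij = omega_ij^(1-q) (u_j - u_i), zero where (i,j) is not an edge:
grad_u = np.where(W > 0, W**(1 - q) * (u[None, :] - u[:, None]), 0.0)
print(grad_u)                    # antisymmetric here, since omega is symmetric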

Remark 3.3. The graph gradient provides a good motivation for defining the node-dependent inner product and norms in (3.3). The continuum gradient $\nabla u$ of a function $u:\mathbb{R}^m\to\mathbb{R}$ is a vector-valued function whose length $\|\nabla u\|_2$ – or, in general, $p$-norm – is a real-valued function on $\mathbb{R}^m$. Analogously, in the graph setting the node-dependent norms from (3.3) are real-valued functions on $V$.

In Ta et al. [183] a connection is made between the node-dependent $\infty$-norms and morphological dilation and erosion operators, if $q=1$ or $\omega(V\times V)\subseteq\{0,1\}$. Using superscripts $+$ and $-$ to denote the positive part $x^+ := \max(0,x)$ and negative part $x^- := -\min(0,x)$ of a number $x\in\mathbb{R}$, respectively, we compute

\[
\|(\nabla u)^+\|_{i,\infty} = \max_{j\sim i}\max(0,u_j-u_i) = \max\{u_j : j=i \text{ or } j\sim i\} - u_i =: (\delta_a(u))_i - u_i,
\]
\[
\|(\nabla u)^-\|_{i,\infty} = \max_{j\sim i}\big(-\min(0,u_j-u_i)\big) = u_i - \min\{u_j : j=i \text{ or } j\sim i\} =: u_i - (\varepsilon_a(u))_i.
\]

In [183] the operators $\delta_a$ and $\varepsilon_a$ are called the dilation and erosion operators, respectively, in analogy to similar operators in the continuum setting. We can understand these names if we apply the operators to the indicator function $u=\chi_S$ of a node subset $S\subseteq V$. Then $\delta_a(u)=\chi_{S^\delta}$ and $\varepsilon_a(u)=\chi_{S^\varepsilon}$, where the dilated set $S^\delta$ consists of $S$ with all the nodes that have at least one neighbour in $S$ added, and the eroded set $S^\varepsilon$ consists of $S$ with all the nodes that have at least one neighbour in $S^c$ (i.e., $V\setminus S$) removed. In El Chakik et al. [70], on weighted graphs the nonlocal dilation operator and nonlocal erosion operator,

\[
(\mathrm{NLD}(u))_i := u_i + \|(\nabla u)^+\|_{i,\infty} \quad\text{and}\quad (\mathrm{NLE}(u))_i := u_i - \|(\nabla u)^-\|_{i,\infty},
\]

respectively, are introduced. The preceding computation shows that in the case of an unweighted graph, $\delta_a=\mathrm{NLD}$ and $\varepsilon_a=\mathrm{NLE}$.

We define the graph divergence $\operatorname{div}:\mathcal{E}\to\mathcal{V}$ to be given by

\[
(\operatorname{div}\varphi)_i := \frac{1}{2d_i^r}\sum_{j\in V}\omega_{ij}^{q}(\varphi_{ji}-\varphi_{ij}),
\]

since this is the adjoint of the graph gradient: it can be checked that, for all $u\in\mathcal{V}$ and all $\varphi\in\mathcal{E}$, $\langle\nabla u,\varphi\rangle_{\mathcal{E}} = \langle u,\operatorname{div}\varphi\rangle_{\mathcal{V}}$. We thus define a graph Laplacian $\Delta:\mathcal{V}\to\mathcal{V}$ as the divergence of the gradient:

\[
(\Delta u)_i := (\operatorname{div}\nabla u)_i = d_i^{-r}\sum_{j\in V}\omega_{ij}(u_i-u_j). \tag{3.4}
\]

We note that $\Delta$ is independent of $q$. If $r=0$, $\Delta$ is called the combinatorial (or unnormalized) graph Laplacian, while if $r=1$, it is called the random walk (or asymmetrically normalized) graph Laplacian. We note that $\Delta$ is self-adjoint:

\[
\langle u,\Delta v\rangle_{\mathcal{V}} = \langle\Delta u,v\rangle_{\mathcal{V}}. \tag{3.5}
\]

Remark 3.4. Since we consider finite graphs with $V=[n]$, there is a bijective correspondence between functions $u\in\mathcal{V}$ and (column) vectors in $\mathbb{R}^n$ with entries $u_i$. It follows that linear operators on $\mathcal{V}$ can be represented by matrices. In particular, the graph Laplacian $\Delta$ has associated matrix $D^{-r}(D-A)=D^{1-r}-D^{-r}A$, where $D$ is the diagonal degree matrix with $D_{ii}:=d_i$ and, as in Section 3.1, the matrix $A$ is the weighted adjacency matrix with entries $A_{ij}=\omega_{ij}$. In particular, the matrix associated to the combinatorial graph Laplacian is $D-A$, and to the random walk graph Laplacian is $I-D^{-1}A$, where by $I$ we denote the identity matrix of the appropriate size. To avoid complicating the notation and text, we will freely use both kinds of representation without always explicitly noting any switches or changing notation; for example, $u$ can denote a function or vector and $\Delta$ may be the Laplacian operator or matrix. We note that the multiplication of two node functions $uv$ becomes a matrix-vector multiplication $Uv$ in vector notation, with $U$ the diagonal matrix with $U_{ii}=u_i$.
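The sketch below (our own illustration, with an arbitrary example weight matrix) builds these matrices and checks them against the entrywise formula (3.4):

import numpy as np

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
d = W.sum(axis=1)
D = np.diag(d)

L_comb = D - W                              # combinatorial Laplacian (r = 0)
L_rw = np.eye(3) - np.diag(1.0 / d) @ W     # random walk Laplacian (r = 1)

u = np.array([3., 1., 2.])
for r, L in [(0, L_comb), (1, L_rw)]:
    # (Delta u)_i = d_i^{-r} sum_j omega_ij (u_i - u_j) = d_i^{-r} (d_i u_i - (W u)_i)
    delta_u = d ** (-float(r)) * (d * u - W @ u)
    print(np.allclose(delta_u, L @ u))      # True, True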

Remark 3.5. The name random walk Laplacian for $\Delta$ with $r=1$ comes from the fact that the $D^{-1}A$ term in the associated matrix $I-D^{-1}A$ (see Remark 3.4) is a right-stochastic matrix, that is, its rows sum to one, and can thus be interpreted as the transition matrix of a discrete-time Markov chain with $(D^{-1}A)_{ij}=d_i^{-1}\omega_{ij}$ being the probability of the transition from state $i$ to state $j$ in a single time step. Associating the states with nodes of a graph, the Markov chain describes a random walk on that graph. The (negative) unnormalized graph Laplacian $-\Delta$ (with $r=0$) is the (infinitesimal) generator of a continuous-time Markov chain, as described in detail in remark 2.1.5 in [190].

Remark 3.6. We would be remiss not to mention the symmetrically normalized Laplacian, which is represented in matrix form as

\[
\Delta_{\mathrm{sym}} := D^{-\frac12}(D-A)D^{-\frac12} = D^{r-\frac12}\big(D^{1-r}-D^{-r}A\big)D^{-\frac12} = I - D^{-\frac12}AD^{-\frac12}. \tag{3.6}
\]

This graph Laplacian appears frequently in the literature, for example, Chung [55], but is not captured in the preceding framework, as it would require a different scaling for each term in the gradient $\nabla u$. We refer to Zhou and Schölkopf [206, section 2.3] for details. We note that $\Delta_{\mathrm{sym}}$ is self-adjoint with respect to $\langle\cdot,\cdot\rangle_{\mathcal{V}}$ for $r=0$. More generally, the two-parameter normalized graph Laplacian

\[
\Delta^{(s,t)} := D^{-s}(D-A)D^{-t} \tag{3.7}
\]

is self-adjoint with respect to $\langle\cdot,\cdot\rangle_{\mathcal{V}}$ for $r=s-t$. Furthermore, for all $a\in\mathbb{R}$,

\[
D^{-a}\Delta^{(s,t)}D^{a} = \Delta^{(s+a,t-a)},
\]

and so $\Delta^{(s,t)}$ and $\Delta^{(s',t')}$ are similar whenever $s+t=s'+t'$. In particular, $\Delta^{(s,t)}$ is similar to the symmetric matrix $\Delta^{((s+t)/2,(s+t)/2)}$ and thus $\Delta$ and $\Delta_{\mathrm{sym}}$ are similar when $r=1$. For further details, we refer to Budd [34, chapter 2].

In Smola and Kondor [180, theorem 3] it is shown that graph Laplacians are, in a sense, invariant under vertex permutations. It is also shown that they are, in some sense, the only linear operators that depend linearly on the graph’s adjacency matrix and are invariant in this way.

In Zhou and Belkin [208] a geometry graph Laplacian is used, defined as

\[
L^{\mathrm{geom}} := I - (D^{\mathrm{geom}})^{-1}A^{\mathrm{geom}},
\]

where $A^{\mathrm{geom}} := D^{-1}AD^{-1}$ and $D^{\mathrm{geom}}$ is the corresponding degree matrix. We also mention that in Merkurjev et al. [145] a multiscale Laplacian is introduced.

The graph Dirichlet ‘energy’ of $u\in\mathcal{V}$ (as defined in e.g. Van Gennip and Bertozzi [189, appendix A]) is

\[
\frac12\|\nabla u\|_{\mathcal{E}}^2 = \frac12\langle u,\Delta u\rangle_{\mathcal{V}} = \frac14\sum_{i,j\in V}\omega_{ij}(u_i-u_j)^2, \qquad \|\nabla u\|_{\mathcal{E}} = \max\{\langle\operatorname{div}\varphi,u\rangle_{\mathcal{V}} : \varphi\in\mathcal{E} \text{ and } \|\varphi\|_{\mathcal{E}}\le 1\}. \tag{3.8}
\]

We observe that $\|\nabla u\|_{\mathcal{E}}^2$ does not depend on $q$.
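As a numerical sanity check (ours, on an arbitrary example graph) that the quadratic form and the double sum in (3.8) agree:

import numpy as np

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
u = np.array([3., 1., 2.])
L = np.diag(W.sum(axis=1)) - W                     # Laplacian with r = 0

quad_form = 0.5 * u @ (L @ u)                      # (1/2) <u, Delta u>_V
double_sum = 0.25 * np.sum(W * (u[:, None] - u[None, :])**2)
print(np.isclose(quad_form, double_sum))           # True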

For Poincaré inequalities [61, 66, 81] involving the Dirichlet energy on (infinite) graphs, we refer to [58, 129]. Computing the Gateaux derivative of the Dirichlet energy, we recognize the graph Laplacian:

\[
\frac{d}{d\alpha}\frac12\big\|\nabla(u+\alpha v)\big\|_{\mathcal{E}}^2\Big|_{\alpha=0} = \frac12\frac{d}{d\alpha}\Big[\langle\Delta u,u\rangle_{\mathcal{V}} + 2\alpha\langle\Delta u,v\rangle_{\mathcal{V}} + \alpha^2\langle\Delta v,v\rangle_{\mathcal{V}}\Big]\Big|_{\alpha=0} = \langle\Delta u,v\rangle_{\mathcal{V}}, \tag{3.9}
\]

where $\alpha\in\mathbb{R}$, $u,v\in\mathcal{V}$, and we have used the self-adjointness of $\Delta$ from (3.5).

In [34, theorem 4.1.2] it is shown that the two-parameter normalized graph Laplacian from (3.7) also is the first variation of a Dirichlet-type functional:

\[
\frac{d}{d\alpha}\frac12\big\|\nabla D^{-t}(u+\alpha v)\big\|_{\mathcal{E}}^2\Big|_{\alpha=0} = \langle\Delta^{(s,t)}u,v\rangle_{\mathcal{V}},
\]

where $(D^{-t}u)_i = d_i^{-t}u_i$ and in the $\mathcal{V}$-inner product $r=s-t$.

Changing the edge function norm in the maximum formulation in (3.8) to a maximum norm leads to a definition of graph total variation:

\[
TV(u) := \max\{\langle\operatorname{div}\varphi,u\rangle_{\mathcal{V}} : \varphi\in\mathcal{E} \text{ and } \|\varphi\|_{\mathcal{E},\infty}\le 1\} = \frac12\sum_{i,j\in V}\omega_{ij}^q|u_i-u_j| = \|\nabla u\|_{\mathcal{E},1}. \tag{3.10}
\]

The second equality in (3.10) follows since the maximum in the definition is achieved at $\varphi=\operatorname{sgn}(\nabla u)$ (where the signum function acts elementwise: $\varphi_{ij}=\operatorname{sgn}(u_j-u_i)$), where $\operatorname{sgn}$ can be any representative of the signum function equivalence class in $L^1_{\mathrm{loc}}(\mathbb{R})$, that is, $\operatorname{sgn}(x)=1$ if $x>0$, $\operatorname{sgn}(x)=-1$ if $x<0$, and $\operatorname{sgn}(0)$ may be defined to equal any arbitrary, but determined, real number. This can be seen by rewriting $\langle\operatorname{div}\varphi,u\rangle_{\mathcal{V}} = \langle\varphi,\nabla u\rangle_{\mathcal{E}}$. This will play a role again when we consider curvature in Section 9. For the final equality in (3.10) we used that the edge weights $\omega_{ij}$ are nonnegative.

If $u\in\mathcal{V}_{\{0,1\}}$, then $(u_i-u_j)^2=|u_i-u_j|$ and thus by (3.8), if $q=1$, then

\[
\|\nabla u\|_{\mathcal{E}}^2 = \frac12\sum_{i,j\in V}\omega_{ij}(u_i-u_j)^2 = \frac12\sum_{i,j\in V}\omega_{ij}|u_i-u_j| = TV(u).
\]
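This identity is easy to check numerically; the sketch below (ours, with an arbitrary example graph) compares the two quantities for a binary node function with $q=1$:

import numpy as np

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
u = np.array([1., 0., 0.])                  # u in V_{0,1}

diff = u[:, None] - u[None, :]
tv = 0.5 * np.sum(W * np.abs(diff))         # TV(u) for q = 1
grad_norm_sq = 0.5 * np.sum(W * diff**2)    # ||grad u||_E^2 for q = 1
print(tv, grad_norm_sq)                     # 2.0 2.0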

Analogously to the ‘anisotropic’ total variation in (3.10), an isotropic total variation can be defined; see Van Gennip et al. [191, remark 2.1].

The graph $p$-Dirichlet energy – so called in analogy to the Dirichlet energy in (3.8) – is

\[
\frac1p\|\nabla u\|_{\mathcal{E},p}^p = \frac{1}{2p}\sum_{i,j\in V}\omega_{ij}^{2q-1}\big|\omega_{ij}^{1-q}(u_j-u_i)\big|^p = \frac{1}{2p}\sum_{i,j\in V}\omega_{ij}^{(2-p)q+p-1}|u_i-u_j|^p. \tag{3.11}
\]

We note that for $p=2$ we indeed recover the graph Dirichlet energy from (3.8) and for $p=1$ we obtain the graph total variation from (3.10).

For $p\in[1,\infty)$, the graph $p$-Laplacian $\Delta_p:\mathcal{V}\to\mathcal{V}$ is defined via the Gateaux derivative of the $p$-Dirichlet energy by requiring that, for all $u,v\in\mathcal{V}$,

\[
\langle\Delta_p u,v\rangle_{\mathcal{V}} = \frac{d}{ds}\frac1p\big\|\nabla(u+sv)\big\|_{\mathcal{E},p}^p\Big|_{s=0}
= \frac12\sum_{i\in V}\sum_{\substack{j\in V\\ u_j\ne u_i}}\omega_{ij}^{(2-p)q+p-1}|u_i-u_j|^{p-2}(u_i-u_j)(v_i-v_j)
= \sum_{i\in V}\sum_{\substack{j\in V\\ u_j\ne u_i}}\omega_{ij}^{(2-p)q+p-1}|u_i-u_j|^{p-2}(u_i-u_j)\,v_i. \tag{3.12}
\]

(We note that we used the symmetry of $\omega$.) Thus, for all $u\in\mathcal{V}$ and for all $i\in V$,

\[
(\Delta_p u)_i := d_i^{-r}\sum_{\substack{j\in V\\ u_j\ne u_i}}\omega_{ij}^{(2-p)q+p-1}|u_i-u_j|^{p-2}(u_i-u_j). \tag{3.13}
\]

From (3.9) we see that we recover our standard graph Laplacian if we choose $p=2$. For other values of $p$, the operator $\Delta_p$ is not linear. We note that (for general $p$), if $q=1$, then the exponent of $\omega_{ij}$ in $\Delta_p$ equals $1$ (see Remark 3.9).
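A direct implementation of (3.13) – our own sketch, not code from this Element – which can be checked against the matrix form of the standard Laplacian at $p=2$:

import numpy as np

def p_laplacian(W, u, p, q=1.0, r=0.0):
    d = W.sum(axis=1)
    diff = u[:, None] - u[None, :]                  # (u_i - u_j)
    mask = (W > 0) & (diff != 0)                    # sum over j with u_j != u_i
    expo = (2 - p) * q + p - 1                      # exponent of omega_ij in (3.13)
    terms = np.zeros_like(W)
    terms[mask] = W[mask]**expo * np.abs(diff[mask])**(p - 2) * diff[mask]
    return d**(-r) * terms.sum(axis=1)

W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
u = np.array([3., 1., 2.])
L = np.diag(W.sum(axis=1)) - W
print(np.allclose(p_laplacian(W, u, p=2.0), L @ u))   # True: p = 2 gives the graph Laplacian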

By splitting the sum in (3.13), we obtain

\[
(\Delta_p u)_i = d_i^{-r}\sum_{\substack{j\in V\\ u_i>u_j}}\omega_{ij}^{(2-p)q+p-1}(u_i-u_j)^{p-1} - d_i^{-r}\sum_{\substack{j\in V\\ u_j>u_i}}\omega_{ij}^{(2-p)q+p-1}(u_j-u_i)^{p-1}
= 2d_i^{-r}\big\|\omega^{\frac{1-q}{p-1}}(\nabla u)^-\big\|_{i,p-1}^{p-1} - 2d_i^{-r}\big\|\omega^{\frac{1-q}{p-1}}(\nabla u)^+\big\|_{i,p-1}^{p-1}.
\]

In particular, $\Delta_p u = 0$ if and only if, for all $i\in V$, $\big\|\omega^{\frac{1-q}{p-1}}(\nabla u)^-\big\|_{i,p-1} = \big\|\omega^{\frac{1-q}{p-1}}(\nabla u)^+\big\|_{i,p-1}$. From the final inequalities in Corollary 3.2, we know that

\[
\lim_{p\to\infty}\big\|\omega^{\frac{1-q}{p-1}}(\nabla u)^-\big\|_{i,p-1} = \|(\nabla u)^-\|_{i,\infty}, \qquad \lim_{p\to\infty}\big\|\omega^{\frac{1-q}{p-1}}(\nabla u)^+\big\|_{i,p-1} = \|(\nabla u)^+\|_{i,\infty}.
\]

This inspires the following definition of the graph $\infty$-Laplacian from Elmoataz et al. [73], for all $u\in\mathcal{V}$ and for all $i\in V$:

\[
(\Delta_\infty u)_i := \|(\nabla u)^-\|_{i,\infty} - \|(\nabla u)^+\|_{i,\infty}.
\]

Then $\Delta_\infty u = 0$ if and only if, for all $i\in V$, $\|(\nabla u)^-\|_{i,\infty} = \|(\nabla u)^+\|_{i,\infty}$. The results in Lemma 3.7 give further justification for this definition of $\Delta_\infty$. In order to state the lemma, we need the concept of lexicographic ordering of edge functions (we refer to Kyng et al. [123, section 2]): given functions $\varphi,\psi\in\mathcal{E}$, we define $\varphi\preceq\psi$ if and only if, after reordering the values of $\varphi$ and $\psi$ in nonincreasing order, the first value of $\varphi$ that is different from the corresponding value of $\psi$ is smaller than this corresponding value, or no such value exists (i.e., $\varphi$ and $\psi$ are equal after the reorderings).

Lemma 3.7. Let $S\subseteq V$ be a nonempty subset and let $u^0: S\to\mathbb{R}$. The minimization problem

\[
\operatorname*{argmin}_{u\in\mathcal{V}} \|\nabla u\|_{\mathcal{E},\infty} \quad\text{s.t.}\quad u|_S = u^0
\]

has a solution. Out of all minimizers, there is a unique one, $u^*$, which satisfies, for all $u\in\mathcal{V}$ with $u|_S=u^0$, $\nabla u^*\preceq\nabla u$. Moreover, if $u\in\mathcal{V}$ with $u|_S=u^0$, then $u=u^*$ if and only if, for all $i\in V\setminus S$, $(\Delta_\infty u)_i=0$. Furthermore, for $p\in(1,\infty)$, the minimization problem

\[
\operatorname*{argmin}_{u\in\mathcal{V}} \|\nabla u\|_{\mathcal{E},p} \quad\text{s.t.}\quad u|_S = u^0
\]

has a unique solution $u^{(p)}$. If $q=\frac12$, then $\lim_{p\to\infty}u^{(p)} = u^*$.

Proof. We refer to lemma 2.1.7 in [190]. □

Because it is the first variation of the graph $p$-Dirichlet energy (see (3.12)), the graph $p$-Laplacian in (3.13) is sometimes also called the variational graph $p$-Laplacian, to distinguish it from the game-theoretic graph $p$-Laplacian

\[
\Delta_p^{\mathrm{game}} := \frac1p\Delta + \alpha\Big(1-\frac2p\Big)\Delta_\infty, \tag{3.14}
\]

for some constant $\alpha>0$; see Flores et al. [90]. A similar operator is used in Elmoataz et al. [72], where it is called the graph normalized $p$-Laplacian and where the constants in the linear combination of $\Delta$ and $\Delta_\infty$ are kept as parameters that can be chosen depending on the application.

For a generalization of the discrete calculus presented in these past few sections to (undirected and directed) hypergraphs, that is, ‘graphs’ whose (hyper)edge set $E$ is a subset of $\mathcal{P}(V)\setminus\{\emptyset\}$, we refer to [26, 88, 89, 108, 112].

Remark 3.8. Hypergraphs do not receive much attention in this work, but we do want to mention a string of recent works by Mulas, Jost, and collaborators, which define graph Laplacians [112] and $p$-Laplacians [113] on hypergraphs, study their spectra [92, 155, 157], investigate random walks [156] and other hypergraph dynamics [24], and generalize concepts such as the Cheeger cut [154], the independence number (i.e., the maximum size of a subset $S\subseteq V$ such that, for all distinct $i,j\in S$, there is no hyperedge to which they both belong), and the colouring number (i.e., the fewest colours needed to colour all vertices such that no hyperedge contains two vertices with the same colour) [1] to hypergraphs.

We note that Mulas et al. [155] also provides generalizations of graph Laplacians to signed graphs (which are graphs in which edge weights can be positive and negative), graphs with self-loops, and directed graphs. For the latter case, the graph Laplacian from Bauer [17] is used; see Section 4.

For extensions to hypergraphs of the graph limit theories of graphons and graphops (see chapter 7 of [190]) we refer to Elek and Szegedy [71] and Zhao [203] (hypergraphons), and Zucal [212] (hypergraphops), respectively.

Methods on hypergraphs have been applied to problems related to our interests in this work, for example hypergraph signal processing in Zhang et al. [202], heat diffusion on hypergraphs for finding bipartite components in Macgregor and Sun [134] (see also Macgregor [133, chapter 7]), and semi-supervised learning (see section 4.1.2 in [190]) on hypergraphs for early diagnosis of Alzheimer’s disease in Aviles-Rivero et al. [11].

3.3 Miscellaneous Considerations

To define the distance $d^G_{ij}$ between two nodes $i$ and $j$ on a graph $G$, we first need the concept of a path on $V$. If $i,j\in V$ are distinct nodes, a path $\gamma_{ij}$ from $i$ to $j$ is a tuple of $\ell\in\mathbb{N}$ distinct vertices $\gamma_{ij}=(i_1,\ldots,i_\ell)$ such that $i_1=i$, $i_\ell=j$, and, for all $k\in[\ell-1]$, $i_{k+1}\sim i_k$. The length of the path $\gamma_{ij}$ is defined to be

\[
|\gamma_{ij}| := \sum_{k=1}^{\ell-1}\omega_{i_k i_{k+1}}^{q-1}.
\]

The reasons for this choice of scaling of $\omega$ are explored in some more detail in Remark 3.9. We now define the graph distance between nodes $i\ne j$ as

\[
d^G_{ij} := \min\big\{|\gamma_{ij}| : \gamma_{ij} \text{ a path from } i \text{ to } j\big\}. \tag{3.15}
\]

We also define $d^G_{ii} := 0$. We note that the path of minimal length between two neighbouring nodes is not necessarily the ‘direct’ path $(i,j)$. If $S\subseteq V$ is not empty, we define the distance from $i\in V$ to $S$ as

\[
d_{iS} := \min_{j\in S} d^G_{ij}.
\]

For $S=\emptyset$ we define, for all $i\in V$, $d_{i\emptyset} := +\infty$. By Manfredi et al. [138, section 3.1, example 2], if $S\ne\emptyset$, $u=d_{\cdot S}$ is the unique solution to an eikonal equation:

\[
\begin{cases}
\min_{j\sim i}(\nabla u)_{ij} = -1, & \text{if } i\in V\setminus S,\\
u_i = 0, & \text{if } i\in S.
\end{cases}
\]

That $d_{\cdot S}$ is a solution can be understood as follows. If $i\in V\setminus S$, the minimum value of $(\nabla d_{\cdot S})_{ij}$ among all neighbours $j$ of $i$ will be achieved at a $j$ that lies on a shortest path from $i$ to $S$, in which case $d_{jS}-d_{iS} = -d^G_{ij} = -\omega_{ij}^{q-1}$ and thus $(\nabla d_{\cdot S})_{ij} = -1$. If $i\in S$, then $d_{iS}=0$. For a proof of uniqueness we refer to [138, section 6.2]. We note that the eikonal equation, while inspired by a differential equation from the continuum setting, is a difference equation on graphs.

It is useful to note that we can rewrite the eikonal equation. Let $i\in V\setminus S$. Since the equation requires $\min_{j\sim i}(\nabla u)_{ij} = -1$, we can replace $\nabla u$ by $-(\nabla u)^-$ without changing the equation. Since $\min_{j\sim i}\big[-(\nabla u)^-_{ij}\big] = -\max_{j\sim i}(\nabla u)^-_{ij} = -\|(\nabla u)^-\|_{i,\infty}$, we rewrite the eikonal equation as

\[
\begin{cases}
\|(\nabla u)^-\|_{i,\infty} = 1, & \text{if } i\in V\setminus S,\\
u_i = 0, & \text{if } i\in S.
\end{cases}
\]
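To illustrate (our own sketch, with a made-up weighted graph), one can compute $d_{\cdot S}$ by Dijkstra's algorithm with edge lengths $\omega_{ij}^{q-1}$ and then verify the rewritten eikonal equation:

import heapq
import numpy as np

W = np.array([[0., 4., 0.],
              [4., 0., 1.],
              [0., 1., 0.]])
q = 0.5
S = {0}
n = len(W)

dist = np.full(n, np.inf)                      # will hold d_{iS}
heap = []
for i in S:
    dist[i] = 0.0
    heapq.heappush(heap, (0.0, i))
while heap:                                    # Dijkstra with edge lengths omega_ij^(q-1)
    d_i, i = heapq.heappop(heap)
    if d_i > dist[i]:
        continue
    for j in range(n):
        if W[i, j] > 0 and d_i + W[i, j] ** (q - 1) < dist[j]:
            dist[j] = d_i + W[i, j] ** (q - 1)
            heapq.heappush(heap, (dist[j], j))

u = dist
grad = np.where(W > 0, W ** (1 - q) * (u[None, :] - u[:, None]), 0.0)
neg_norm = np.maximum(-grad, 0.0).max(axis=1)  # ||(grad u)^-||_{i,infty}
print(u, neg_norm)                             # neg_norm equals 1 at every node outside S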

Remark 3.9. The parameter $q$, introduced in the definition of the $\mathcal{E}$-inner product and used in the definition of the gradient, does not appear in the Laplacian and Dirichlet energy, but it does impact the definition of total variation.

There are two common choices for $q$. If we imagine the graph gradient to be a generalization of the finite-difference discretization of the continuum gradient from a regular grid to a general graph, it is natural to associate $\omega_{ij}^{1-q}$ with the length scale $h^{-1}$, where $h$ is the mesh size of the ‘grid’. Reasoning similarly for the graph Laplacian, which ‘should’ be analogous to a finite-difference discretization of the continuum Laplacian, $d_i^{-r}\omega_{ij} \sim h^{-2}$. If the weights $\omega_{ij}$ are scaled by a constant factor, the degree $d_i$ scales with the same factor, thus $\omega_{ij}^{1-r} \sim h^{-2}$. Hence $\omega_{ij}^{1-r}/\omega_{ij}^{1-q} = \omega_{ij}^{q-r} \sim h^{-2}/h^{-1} = h^{-1}$. As both $\omega_{ij}^{q-r}$ and $\omega_{ij}^{1-q}$ scale as $h^{-1}$, we might choose $q$ such that $q-r=1-q$, that is, $q=\frac12(1+r)$. The common picks $r=0$ and $r=1$ then correspond to $q=\frac12$ and $q=1$, respectively.

This numerical-analysis-based analogy cannot be extended indefinitely. If we want to interpret the summations in the Dirichlet energy and total variation as discrete quadratures approximating integrals, then the volume of the grid cells of the discretization will have to be incorporated into $\omega_{ij}$. Since this volume depends on the dimension of the space over which it is being integrated, this cannot be made consistent with our preceding considerations.

In the remainder of this work (except in Sections 4, 5.3, and 9, where $q$ returns very briefly) we will make the choice $q=1$, even when $r\ne 1$. This is inspired by Theorem 5.1, which states that the graph Ginzburg–Landau functional (which we will define in Section 5) $\Gamma$-converges to graph total variation with $q=1$ in the relevant parameter limit $\varepsilon\to 0$. The graph Ginzburg–Landau functional, which is at the heart of much that is discussed in this Element, is independent of $q$; the fact that in the limit total variation with $q=1$ appears can thus be viewed as a selection criterion for our parameter choice.

Another advantage of choosing $q=1$ is that the exponent of $\omega_{ij}$ in the graph $p$-Dirichlet energy in (3.11) is independent of $p$ (and equal to $1$) if and only if $q=1$. The graph $p$-Dirichlet energy is a common choice of regularizer in graph-based variational models (see Section 12 and section 4.1 in [190]) and gave rise to the graph $p$-Laplacian in (3.13).

If $G=(V,E)$ is an unweighted graph, we call the graph $G'=(V',E')$ a subgraph of $G$ if $V'\subseteq V$, $E'\subseteq E$, and $(i,j)\in E'$ implies $i,j\in V'$. If, additionally, $i,j\in V'$ and $(i,j)\in E$ implies $(i,j)\in E'$, then $G'$ is called an induced subgraph (or vertex-induced subgraph, or subgraph induced by $V'$) of $G$. A subgraph $G'$ of $G$ is connected if for all distinct nodes $i,j\in V'$, there exists a path in $G'$ from $i$ to $j$. A subgraph $G'$ is a component (or connected component) if it is connected and it is maximal in the sense that if $G''$ is a connected subgraph of $G$ and $G'$ is a subgraph of $G''$, then $G''=G'$. Any connected component of $G$ necessarily is an induced subgraph of $G$.

These notions straightforwardly carry over to the case in which $\tilde G=(V,E,\omega)$ is an edge-weighted graph; that is, we call $\tilde G'=(V',E',\chi_{E'}\,\omega|_{V'\times V'})$ a subgraph of $\tilde G$, a (vertex-)induced subgraph of $\tilde G$, connected, or a connected component of $\tilde G$, if $(V',E')$ is a subgraph of $(V,E)$, a (vertex-)induced subgraph of $(V,E)$, connected, or a connected component of $(V,E)$, respectively.

4 Directed Graphs

In most of this Element we restrict our attention to undirected graphs – which we defined in Section 3.1 as graphs for which $(i,j)\in E$ if and only if $(j,i)\in E$ – as that is the setting in which most of the work has been done on the topics we cover. In this section we will take a brief detour to directed graphs – that is, graphs in which $(i,j)\in E$ and $(j,i)\notin E$ may both be true for given nodes $i,j\in V$ – and we give a brief overview of (a priori different) approaches to defining graph Laplacians on such graphs. Each of these approaches uses a different aspect of Laplacians on undirected graphs as a starting point to define a Laplacian on a directed graph.

For an edge-weighted directed graph, the edge weights $\omega$ need not be symmetric, that is, it is possible that $\omega_{ij}\ne\omega_{ji}$. We still assume $\omega_{ij}\ge 0$, with $\omega_{ij}>0$ if and only if $(i,j)\in E$.

Figure 4.1 shows an example of a directed graph, where an arrow on an edge pointing from node $i$ to node $j$ indicates the edge $(i,j)$. In particular, the graph in Figure 4.1 is an oriented graph, namely a directed graph in which $(i,j)\in E$ implies $(j,i)\notin E$. We do not require the directed graphs to be oriented.

Figure 4.1 An example of a directed graph; specifically, an oriented graph.

A direct adaptation of the random walk graph Laplacian to directed graphs is given in Bauer [17]. In Zhou et al. [207] the authors make use of the one-to-one correspondence between directed graphs and bipartite graphs to define new undirected graphs on the two independent vertex classes of the bipartite graph and combine the Laplacians on those new graphs into a Laplacian on the directed graph. In Chung [52] the author recalls the connection between graph Laplacians and random walks to define a graph Laplacian via a random walk on a directed graph. In Hein et al. [104] a similar approach is followed to the one we used for our undirected Laplacian in Section 3, that is, the authors start by defining inner product structures on the spaces of node and edge functions and a discrete gradient, then find a divergence as the adjoint of the gradient and define the Laplacian as the divergence of the gradient. In Shubin [177] a magnetic Laplacian is introduced for physical reasons; in [86, 87, 100] it is used for community detection, visualization, and detection of hierarchical structures in directed graphs (see chapter 4 in [190]). In MacKay et al. [137] and Gong et al. [100] the trophic Laplacian is introduced, in Klus and Djurdjevac Conrad [119] the forward-backward Laplacian, and in Jost et al. [114] and Mulas et al. [158] the non-backtracking Laplacian. We refer to section 2.5 of [190] for more details on these approaches.

5 The Graph Ginzburg–Landau Functional

A central object in this work is the graph Ginzburg–Landau functionalFootnote 31 GLε: V→R, which was first introduced in Bertozzi and Flenner [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22]. It consists of the Dirichlet energy plus a potential term:

\[ \mathrm{GL}_\varepsilon(u) := \frac{1}{2}\,\alpha(\varepsilon)\,\|\nabla u\|_{\mathcal{E}}^2 + \frac{1}{\varepsilon}\,W(u). \]

There are some variations in the literature regarding the choice of the ε-dependent factor α(ε) and the choice of potential term, which we briefly discuss here.

In [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22] the authors choose α(ε)=ε in analogy to the continuum Ginzburg–Landau functional (see section 3.7 in [Reference van Gennip and Budd190]). In Van Gennip and Bertozzi [Reference Bertozzi189, section 3.1], however, the authors argue that, if one is interested in the limit ε→0, then α(ε)=1 is a better choice. The reason for this is (in brief) that the divergence of the Dirichlet energy for which ε is meant to compensate in the continuum case does not occur for the discrete graph Dirichlet energy, which remains bounded. Thus, instead of compensating for divergent behaviour, a factor ε would completely remove the influence of the Dirichlet term in the limit ε→0, leading to a trivial (and uninteresting) limit.

When considering the variational properties of GLε at a fixed ε>0, the difference between α(ε)=ε and α(ε)=1 is minimal, since ε^{−1/2}GL_{ε^{1/2}} with α(ε)=ε equals GLε with α(ε)=1. Moreover, if we consider gradient flows derived from GLε (see Section 7), going from α(ε)=ε to α(ε)=1 involves only a time rescaling t_new = t_old/ε.

The different choices in the potential term W concern two different issues. The first choice involves the factor d_i^r in the V-inner product. In [Reference Bertozzi and Flenner21; Reference Bertozzi and Flenner22] the authors use W(u)=∑_{i∈V}W(u_i), where W is a potential function, which we will consider in much more detail shortly. In Budd and Van Gennip [Reference Budd and van Gennip35] it is noted that W(u)=⟨W∘u,1⟩_V=∑_{i∈V}d_i^r W(u_i) is a preferable choice when studying the V-gradient flow of GLε (see Section 7), as it removes the factor d_i^r (which, for r≠0, could be considered a node-dependent time rescaling) from the gradient flow.Footnote 32

The second choice in W concerns the potential function W. In the classical continuum Ginzburg–Landau functional, this is a smooth double-well potential, whose graph resembles (a smooth variant of) the letter ‘W’. An example is shown in Figure 5.1(a). Such a function has three important properties: smoothness, a global minimum which is achieved in exactly two locations (two ‘wells’), and some kind of coercivity which implies that W(x)→∞ if |x|→∞. In some papers, such as in [Reference Bertozzi189, section 2.3], no specific choice is made for W, but precise conditions are given for the smoothness, wells, and coercivity properties of W. Other papers choose a specific (quartic) potential from the class of potentials that satisfy these conditions, such as W(x)=¼(x²−1)² in [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22] or Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, section 5] with wells at x∈{−1,1}, or W(x)=¼x²(x−1)² in Calatroni et al. [Reference Luca, Hannah and Flenner41] with wells at x∈{0,1}. For most purposes, the location of the wells is not important.Footnote 33 A significant exception is the case of the signless Ginzburg–Landau functional, which we discuss in Section 5.3, and the graph Cahn–Hilliard equation (which requires a mass condition), which is addressed in Section 7.4.

Figure 5.1 Two types of potential W that are commonly used in the Ginzburg–Landau functional: (a) the double-well potential W(x)=¼x²(x−1)² and (b) the double-obstacle potential from (5.1).

When GLε is minimized, the potential term drives ‘phase separation’, that is, minimizers (especially at small ε>0) will take values which are close to the values at which W has its wells (thus typically 0 and 1, or −1 and 1).

A significant departure from the standard double-well potential can be found in Budd et al. [Reference Budd and van Gennip35, Reference Budd and van Gennip36, Reference Budd, van Gennip and Latz37]. In those papers the double-obstacle potential is used (since it will be clear from the context which potential we are using at any given time in this Element, we use the same symbol W to denote it):

\[ W(x) := \begin{cases} \frac{1}{2}x(1-x), & \text{if } 0 \le x \le 1, \\ +\infty, & \text{otherwise}. \end{cases} \tag{5.1} \]

Figure 5.1(b) shows a visualization of the double-obstacle potential. We note that the double-obstacle potential also achieves its minimum in two wells, located at x∈{0,1}, but the coercive nature of the potential is stronger than that of the double-well potential, with W attaining finite values only in the interval [0,1]. We also call attention to the quadratic, rather than quartic, nature of the potential on [0,1]. This is convenient when considering gradient flows in which the ‘derivative’ of W appears, which is linear on (0,1). A potential difficulty when using the double-obstacle potential compared to the double-well potential is that W is not differentiable outside of (0,1). We discuss these issues in more detail in Section 7 and in section 6 in [Reference van Gennip and Budd190], where we outline the results from [Reference Budd and van Gennip35, Reference Budd and van Gennip36, Reference Budd, van Gennip and Latz37].
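For readers who want to experiment numerically, the following minimal Python sketch evaluates GLε for both choices of potential function. It is our own illustration, not code from the cited works: it assumes r=0 and q=1 (so that ‖∇u‖²_E = ½∑_{i,j}ω_{ij}(u_j−u_i)²), α(ε)=1, and the quartic well W(x)=¼x²(x−1)².

```python
import numpy as np

def gl_energy(u, W_mat, eps, potential="double_well"):
    """Graph Ginzburg-Landau energy GL_eps(u), assuming alpha(eps) = 1,
    r = 0 and q = 1. W_mat is the symmetric weight matrix."""
    diff = u[None, :] - u[:, None]              # diff[i, j] = u_j - u_i
    dirichlet = 0.25 * np.sum(W_mat * diff**2)  # (1/2) * ||grad u||_E^2
    if potential == "double_well":              # W(x) = x^2 (x - 1)^2 / 4
        pot = np.sum(0.25 * u**2 * (u - 1.0) ** 2)
    else:                                       # double-obstacle potential (5.1)
        if np.any((u < 0.0) | (u > 1.0)):
            return np.inf                       # outside the obstacle: +infinity
        pot = np.sum(0.5 * u * (1.0 - u))
    return dirichlet + pot / eps
```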

5.1 Constraints

Minimizing GLε (in any of its variants that we have discussed) over V without imposing any further constraints leads to constant minimizers u=c, where c can be the location of any of the wells of W.Footnote 34 Besides being necessary to avoid such trivial minimizers, in applications constraints are often suggested naturally by a priori known information about the required solutions. In this Element we consider two common constraints:

  1. a ‘hard’ mass (or volume) constraint:Footnote 35 M(u):=⟨u,1⟩_V=M, for a fixed and given M≥0;

  2. a ‘soft’ fidelity constraint, where a fidelity term of the form (1/p)‖μ^{1/p}(u−f)‖_{V,p}^p is added to GLε(u), where μ∈V_{[0,∞)}∖{0} is a node-dependent fidelity parameter function with support equal to some subset Z⊆V and f∈V is a reference function to which fidelity should be ‘softly’ enforced on Z.

These constraints are discussed in more detail in [Reference Bertozzi189, section 2.5] and play an important role in [Reference Budd and van Gennip35] (mass constraint) and [Reference Budd, van Gennip and Latz37] (fidelity constraint with p=2).

The hard mass constraint can be directly included in the functional that has to be minimized by adding a term ı_{{0}}(M(u)−M) to GLε(u). In related contexts, relaxed mass constraints are used in [Reference Bae and Merkurjev13, Reference Jacobs, Merkurjev and Esedoḡlu110], in which upper and lower bounds are imposed for the mass instead of a precise value.

A graph functional consisting of the Dirichlet energy term and a fidelity term (or, in other words, a Ginzburg–Landau functional with fidelity term, but without potential term) is studied in Zhou and Schölkopf [Reference Zhou and Schölkopf205].

From an application-based point of view, the presence of a fidelity constraint usually differentiates graph classification from clustering (we refer to chapter 4 of [Reference van Gennip and Budd190] for more details).

5.2 Graph Cuts and Γ-Convergence to Total Variation

One important reason why minimization of GLε can be useful in applications that require clustering or classification of the node set of a graph is that at small ε>0 the functional GLε captures an (approximate) notion of perimeter of a node subset. To understand this, we first note that, for S⊆V,

\[ \mathrm{TV}(\chi_S) = \sum_{i\in S}\sum_{j\in S^c} \omega_{ij} =: \mathrm{Cut}(S). \]

This quantity is called the graph cut of S and can be understood as a discrete perimeter (length) of S.

Since the graph cut is a measure for the strength of the connections between the sets S and S^c, it is a candidate for the objective function that is to be minimized in graph clustering. In practice it turns out that minimizing the graph cut subject only to the constraints that S and S^c are nonempty sets often leads to an unbalanced clustering in which one of the two sets contains the vast majority of the graph nodes. In many applications this is not what is required, which has led to many ‘balanced’ (graph) cuts, such as the ratio cut [Reference Lars and Kahng102], normalized cut [Reference Shi and Malik176], min-max cut [Reference Ding, Xiaofeng, Zha, Ming, Simon, of Cercone, Nick, Young and Wu64], edge expansion (related to the sparsest cut problem) [Reference Arora, Hazan and Kale10], and Cheeger cutFootnote 36 (or conductance) [Reference Cheeger and Gunning49, Reference Chung55]:

\[
\begin{aligned}
\mathrm{RCut}(S) &:= \frac{\mathrm{Cut}(S)}{|S|} + \frac{\mathrm{Cut}(S)}{|S^c|}, &
\mathrm{NCut}(S) &:= \frac{\mathrm{Cut}(S)}{\mathrm{vol}\,S} + \frac{\mathrm{Cut}(S)}{\mathrm{vol}\,S^c}, \\
\mathrm{MCut}(S) &:= \frac{\mathrm{Cut}(S)}{\sum_{i\in S}\sum_{j\in S}\omega_{ij}} + \frac{\mathrm{Cut}(S)}{\sum_{i\in S^c}\sum_{j\in S^c}\omega_{ij}}, &
\mathrm{ECut}(S) &:= \frac{\mathrm{Cut}(S)}{\min(|S|,|S^c|)}, \\
\mathrm{CCut}(S) &:= \frac{\mathrm{Cut}(S)}{\min(\mathrm{vol}\,S,\mathrm{vol}\,S^c)},
\end{aligned}
\]

respectively, where r=1 is used in the computations of the volume. Since the main interest in these cuts is as objective functions in minimization problems, different versions of the cuts that are equivalent for minimization purposes may be encountered, such as Cut(S)/(|S||S^c|) for the ratio cut, which differs from the preceding definition only by a constant factor 1/|V|.

These balanced cuts have been normalized by factors which, upon minimization, avoid one of the sets S and Sc being much larger than the other, where ‘large’ is to be interpreted according to the specific measure used in the normalizing factors. Intuitively, these normalization factors play a role similar to the mass constraint from Section 5.1. Minimization of such balanced cuts is NP-hard (see, for example, [Reference Shi and Malik176, Reference von Luxburg193] and references therein). For more information about relaxations of some of these minimization problems in terms of eigenvalue problems for graph Laplacians, we refer to von Luxburg [Reference von Luxburg193].

Multiclass extensions of the preceding cuts for a partition {S_i}_{i∈I} of V are straightforward to construct: ∑_{i∈I}Cut(S_i)/|S_i| for the ratio cut, ∑_{i∈I}Cut(S_i)/min(|S_i|,|S_i^c|) for the edge expansion, and similarly for the other balanced cuts.

For more details about graph cuts, we refer to Izenman [Reference Izenman109, chapter 11].
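As a concrete illustration, the sketch below computes the graph cut and a few of the balanced cuts above from a weight matrix; the function name and the boolean-mask interface are our own choices, not taken from the cited works.

```python
import numpy as np

def balanced_cuts(W_mat, S):
    """Cut(S) and three balanced variants for a boolean node mask S.
    Volumes are computed with r = 1, i.e. vol S = sum of degrees in S."""
    Sc = ~S
    cut = W_mat[np.ix_(S, Sc)].sum()    # sum_{i in S, j in S^c} w_ij
    deg = W_mat.sum(axis=1)
    rcut = cut / S.sum() + cut / Sc.sum()               # ratio cut
    ncut = cut / deg[S].sum() + cut / deg[Sc].sum()     # normalized cut
    ccut = cut / min(deg[S].sum(), deg[Sc].sum())       # Cheeger cut
    return cut, rcut, ncut, ccut
```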

The following theorem shows the connection between the graph cut and GLε. It uses the language of Γ-convergence [Reference Braides28, Reference Maso62].

Theorem 5.1. Assume that W is either a smooth double-well potential with wellsFootnote 37 at x∈{0,1}, satisfying the coercivity condition lim_{|x|→∞}W(x)=∞, or is the double-obstacle potential. Let W be defined in either of the two ways that are introduced near the beginning of (the current) Section 5 and let α(ε)=1.

When ε→0, the sequence (GLε) Γ-converges to the functional GL_0 given by

\[ \mathrm{GL}_0(u) := \begin{cases} \mathrm{TV}(u), & \text{if } u \in \mathcal{V}_{\{0,1\}}, \\ +\infty, & \text{otherwise}. \end{cases} \]

Moreover, the following compactness result holds: if (ε_k) is a sequence of strictly positive real numbers converging to zero and (u_k) is a sequence in V for which GL_{ε_k}(u_k) is bounded uniformly in k, then there exists a subsequence of (u_k) which converges to some u∈V.

Proof. In the double-well case, the proof can be found in [Reference Bertozzi189, theorems 3.1 and 3.2]. In the double-obstacle case, the proof of Γ-convergence is very similar and given in [Reference Budd and van Gennip35, theorem 6.1]. The proof of the compactness result is not given there, but it is simpler than the one in [Reference Bertozzi189, theorem 3.2]: the double-obstacle potential forces the functions u_k to lie in the compact set V_{[0,1]}. Compactness of V_{[0,1]} is a consequence of the Heine–Borel theorem, if we identify the set with the closed and bounded set of vectors [0,1]^n ⊆ R^n.

The proof in [Reference Bertozzi189] works with W(u)=∑_{i∈V}W(u_i), while the proof in [Reference Budd and van Gennip35] has W(u)=⟨W∘u,1⟩_V. □

As a consequence of Theorem 5.1, by the fundamental theorem of Γ-convergence (see [Reference Braides28, Reference Maso62]), if (uk) is a sequence such that uk minimizes GLεk over V, then any cluster point of the sequence minimizes GL0. Of course, that in itself is not very interesting, as we already know that the minimizers of GLε over V are constant functions. In [Reference Bertozzi189, theorem 3.6] it is shown that the results of Theorem 5.1 remain true (mutatis mutandis) if any one of the constraints from Section 5.1 is implemented in the double-well case and [Reference Budd33, theorem 7.2.4, corollary 7.2.5] shows the same for the double-obstacle case with a fidelity constraint. The double-obstacle case with a mass constraint can be tackled with similar techniques as the other cases.

5.3 Signless Laplacians and Signless Ginzburg–Landau Functional

A variant of the graph Ginzburg–Landau functional is the signless graph Ginzburg–Landau functional GL_ε^σ: V→R:

\[ \mathrm{GL}^\sigma_\varepsilon(u) := \frac{1}{2}\,\alpha(\varepsilon)\,\|\nabla^\sigma u\|_{\mathcal{E}}^2 + \frac{1}{\varepsilon}\,W(u), \]

where the signless graph gradient is defined to be (∇^σ u)_{ij} := ω_{ij}^{1−q}(u_j+u_i). We note that the only difference with the standard graph gradient and graph Ginzburg–Landau functional is the sum rather than the difference of u_j and u_i in the gradient. In Keetch and Van Gennip [Reference Keetch and van Gennip117] and Keetch [Reference Keetch116], minimization of GL_ε^σ is used to find approximate solutions to the Max-Cut problem, which consists of finding a partition of the node set V into S⊆V and S^c, in such a way that Cut(S) is maximized. As before near the beginning of Section 5, we note that the choice between α(ε)=1 and α(ε)=ε is not relevant for this minimization problem. The choice of W, however, is important. In particular, the wells of W should be placed symmetrically with respect to the origin, otherwise the constant function u=0 would be the unique minimizer. In [Reference Keetch116, Reference Keetch and van Gennip117] the authors use W(x)=(x²−1)², W(u)=∑_{i∈V}W(u_i), α(ε)=1, and q=1. We note that no additional constraints, such as a mass or fidelity constraint, are required to avoid trivial minimizers of GL_ε^σ.

As for the standard graph gradient, we can define the signless graph divergence as the adjoint of the signless gradient. Having those ingredients, we then define the signless graph Laplacian as the signless divergence of the signless gradient, the signless graph Dirichlet energy as in (3.8), but using the signless gradient instead, and signless graph total variation as in (3.10), but using the signless divergence instead. In each of these cases, this simply translates to replacing the difference in the divergence or Laplacian by a summation. In particular,

\[ (\Delta^\sigma u)_i := d_i^{-r} \sum_{j\in V} \omega_{ij}\,(u_i+u_j) \qquad\text{and}\qquad \mathrm{TV}^\sigma(u) := \frac{1}{2}\sum_{i,j\in V} \omega_{ij}\,|u_i+u_j|. \tag{5.2} \]

We note that the signless Laplacian has matrix representation Δ^σ = D^{−r}(D+A) (see Remark 3.4). Besides the combinatorial (r=0) and random walk (r=1) versions, the symmetrically normalized signless Laplacian Δ^{σ,sym} = D^{−1/2}(D+A)D^{−1/2} also appears in the literature. It is a special case of the signless variant of the two-parameter graph Laplacian in (3.7) (see also Remark 3.6).

The signless analogue of Theorem 5.1 holds, with GLε, TV, and V_{{0,1}} replaced by GL_ε^σ, TV^σ, and V_{{−1,1}}, respectively, as proven in [Reference Keetch and van Gennip117, lemmas 4.3 and 4.4]. This justifies the use of GL_ε^σ as an approximate objective function for the Max-Cut problem, since, for S⊆V, TV^σ(χ_S−χ_{S^c}) = 2∑_{i,j∈V}ω_{ij} − 4Cut(S) [Reference Keetch and van Gennip117, lemma 4.1]. Thus minimization of TV^σ(χ_S−χ_{S^c}) over S maximizes the cut. We note that, given that the wells of W are located at x∈{−1,1}, the relevant set ‘indicator’ function is here χ_S−χ_{S^c}, not χ_S.

6 Spectrum of the Graph Laplacians

Indicators of the usefulness of the graph Laplacians for clustering problems can be found in their spectrum, particularly the property that is stated in the following lemma. In this section (Section 6), we temporarily drop the assumption that our graphs are connected. We begin with a very important result.

Lemma 6.1. All eigenvalues of Δ and Δ^{sym} are real and nonnegative. The smallest eigenvalue of each of these operators is 0 and its algebraic and geometric multiplicities are both equal to the number k∈N of connected components G_1,…,G_k of the graph G. The associated eigenspace for Δ is spanned by the set of indicator functions (on V) {χ_{S_i}}_{i∈[k]}, where S_i⊆V is the set of nodes that induces the subgraph G_i. The associated eigenspace for Δ^{sym} is spanned by {f^i}_{i∈[k]}, where f^i: V→R is defined by, for all j∈V, f^i_j := d_j^{1/2}(χ_{S_i})_j.

Proof. This well-known result has been proven and reproven in many places, for example Chung [Reference Chung55, lemma 1.7] and von Luxburg [Reference von Luxburg193, propositions 1–4]. □

This lemma shows that if the clustering problem is trivial, in the sense that each cluster is its own connected component, then the number of clusters is given by the multiplicity of the zero eigenvalue and the corresponding eigenspace contains information about the nodes belonging to each cluster. The expectation that graph Laplacians are also useful in nontrivial clustering problems, in which the graph itself may be connected, but certain induced subgraphs have a high edge volume (see (3.2)) and low connectivity to other parts of the graph, is based on the hope that small perturbations to the edge structure of the trivial case will preserve the important information in the first (in ascending order) eigenvalues of the graph Laplacian and their corresponding eigenfunctions (or eigenspaces). This hope is (non-rigorously) justified in practice by the success of Laplacian-based clustering techniques, such as spectral clustering [Reference Izenman109, Reference Ng, Jordan, Weiss, Thomas, Suzanna and Ghahramani161, Reference Shi and Malik176, Reference von Luxburg193] or the ones discussed in [Reference van Gennip and Budd190].

This hope can also be given some rigorous support, for example through higher-order Cheeger inequalities (see Lee et al. [Reference Lee, Gharan and Trevisan128, theorem 1.1])Footnote 38 of the form (for K∈N∩[2,n])

\[ \frac{\lambda_K}{2} \;\le\; \min_{\{S_k\}_{k=1}^K} \max\left\{ \frac{\mathrm{Cut}(S_k)}{\mathrm{vol}\,S_k} : k\in[K] \right\} \;\le\; C(K)\sqrt{\lambda_K}, \tag{6.1} \]

where λ_K denotes the Kth eigenvalue (in nondecreasing order) of the random walk graph Laplacian and C(K)=O(K²) as K→∞.Footnote 39 The minimum in (6.1) is taken over all partitions of V into K (per definition of partition nonempty and pairwise disjoint) subsets S_k of V. These inequalities tell us that, if there is a large gap between the Kth and (K+1)st eigenvalues, then the value of the Cheeger constant, that is, the value in between the inequalities in (6.1) (see also footnote Footnote 36), will be significantly higher if we partition V into K+1 subsets than if we partition it into K subsets.

For further details about the eigenvectors that support their use in clustering, we refer to the structure theorems of Peng et al. [Reference Peng, Sun and Zanetti169] and Macgregor and Sun [Reference Macgregor, Sun, Kamalika, Stefanie, Le, Csaba and Gang136] (see also Macgregor [Reference Macgregor133, chapter 4]). In Macgregor and Sun [Reference Macgregor, Sun, Kamalika, Stefanie, Le, Csaba and Gang136] (see also Macgregor [Reference Macgregor133, chapter 5]) it is shown that in some cases the use of fewer than K eigenvectors (specifically, in the spectral clustering method; see section 4.1.2 in [Reference van Gennip and Budd190]) to find K clusters can be beneficial.

More rigorous support is given by the results of Hoffmann et al. [Reference Franca, Bamdad, Oberai and Stuart105], whose paper looked at graph Laplacians Δ(s,t) (see (3.7)) that are built upon data sampled from a probability density which is a perturbation of perfectly separated clusters, and the continuum limit of these Laplacians obtained as the number of data points tends to infinity (more details on this continuum object and this random setting can be found in section 7.2 in [Reference van Gennip and Budd190]). It was shown in [Reference Franca, Bamdad, Oberai and Stuart105, theorem 3.2] that if s+t=1 (as is the case for the random walk and symmetrically normalized Laplacians), then this continuum operator has a spectral gap between its second and third eigenvalue which is independent of the size of the perturbation (refer to [Reference van Gennip and Budd190] for more details).

The second smallest eigenvalue of a graph LaplacianFootnote 40 is called the algebraic connectivity or Fiedler value. By Lemma 6.1, for a connected graph this value is strictly positive. The associated eigenvector is called the Fiedler vector.Footnote 41

Figure 6.1 (a) An example of a disconnected graph and (b) that same graph with nodes coloured according to the value of the Fiedler vector at that node.

We illustrate these properties in Figures 6.1, 6.2, and 6.3. Figure 6.1 shows a disconnected graph, and Figure 6.1 (b) shows how the Fiedler vector distinguishes between the connected components of the graph. Figure 6.3 (a) shows the spectrum of the associated unnormalized graph Laplacian, and we observe that the first two eigenvalues are zero, in accordance with Lemma 6.1. Next, in Figure 6.2 we make this graph connected by adding a single extra edge. Figure 6.2 (b) shows how the Fiedler vector still distinguishes between the two clusters. Figure 6.3 (b) shows the spectrum of the associated graph Laplacian, and we observe that the second eigenvalue is now non-zero – in accordance with Lemma 6.1 – but the other eigenvalues are largely unchanged.
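The behaviour illustrated in these figures is easy to reproduce numerically. The following short Python sketch computes the Fiedler value and vector of the unnormalized graph Laplacian; the example graph (two triangles joined by one edge) is our own stand-in for the graphs in the figures.

```python
import numpy as np

def fiedler(W_mat):
    """Second smallest eigenvalue (Fiedler value) and corresponding
    eigenvector (Fiedler vector) of the unnormalized Laplacian D - A."""
    L = np.diag(W_mat.sum(axis=1)) - W_mat
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vals[1], vecs[:, 1]

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
lam2, v2 = fiedler(A)
print(lam2, v2 > 0)   # the sign pattern of v2 separates the two triangles
```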

Figure 6.2 (a) The graph from Figure 6.1 with an extra edge added to make it connected and (b) that same connected graph with nodes coloured according to the value of the Fiedler vector at that node.

Figure 6.3 The spectra of the (unnormalized) graph Laplacians of the graphs from Figure 6.1(a) and Figure 6.2(a), plotted on a (crunched) log scale to emphasize the change in the second eigenvalue, indicated by a square marker.

One key feature of the Laplacian spectrum is its stability with respect to noise. In El Karoui and Wu [Reference Karoui and Hau-Tieng115, lemmas 3.1 and 3.2] it was shown that if there exist (α_i)_{i=1}^n and ε>0 such that for all i,j∈V, α_i>0 and |ω_{ij}−α_iω̃_{ij}|≤ε,Footnote 42 then the corresponding random walk Laplacians built from ω and ω̃ differ in spectral normFootnote 43 by at most an O(ε) term which is independent of the α_i. For a more comprehensive discussion of this stability, we refer to Ding and Wu [Reference Ding and Hau-Tieng65] and the references therein. This property is of practical importance, as it allows (spectral) graph Laplacian methods to be applied robustly to noisy data.

There are other properties of the spectrum that can be proven rigorously. Since we will not directly use them in the current work, we refer to other sources such as [Reference Chung55, Reference Nestor, Braxton and Bertozzi191, Reference von Luxburg193] for more information.

An analogous result to Lemma 6.1 also holds for the signless graph Laplacians (see Section 5.3). We recall the definition of bipartite graphs from footnote Footnote 29.

Lemma 6.2. All eigenvalues of Δ^σ and Δ^{σ,sym} are real and nonnegative. Assume the graph G has k∈N connected components G_1,…,G_k induced by the subsets S_i⊆V (i∈[k]) and let k′∈N∩[0,k]. Then both Δ^σ and Δ^{σ,sym} have eigenvalue 0 with algebraic and geometric multiplicity each equal to k′ if and only if k′ of the connected components are bipartite graphs.

In that case we may assume (possibly after relabelling) that, for all i∈[k′], G_i is bipartite. Denote the partite sets of G_i by T_i⊆S_i and S_i∖T_i. Then the eigenspace corresponding to the eigenvalue 0 for Δ^σ is spanned by the set of indicator functions (on V) {χ_{T_i}−χ_{S_i∖T_i}}_{i∈[k′]}. The eigenspace corresponding to the eigenvalue 0 for Δ^{σ,sym} is spanned by {f^i}_{i∈[k′]}, where the functions f^i: V→R are defined by, for all j∈V, f^i_j := d_j^{1/2}((χ_{T_i})_j−(χ_{S_i∖T_i})_j).

Proof. A proof is provided in Keetch and Van Gennip [Reference Keetch and van Gennip117, proposition 4]. □

Similar to how the result from Lemma 6.1 helps in interpreting the usefulness of graph Laplacians for clustering, so the result from Lemma 6.2 clarifies the usefulness of signless graph Laplacians for Max-Cut problems, as in [Reference Keetch116, Reference Keetch and van Gennip117]. The objective in a Max-Cut problem is to find a partition of the node set V of a given graph into two disjoint nonempty subsets T and V∖T such that Cut(T)=max_{T̃⊆V}Cut(T̃) (see Section 5.2). In the trivial case, that is, for a graph in which all k connected components are bipartite, the maximum cut can be achieved by setting T=⋃_{i=1}^k T_i (using the notation from Lemma 6.2) and the information about each T_i is encoded in the eigenspace corresponding to the eigenvalue 0. Analogously to the clustering case, the hope (justified by numerical tests in [Reference Keetch116, Reference Keetch and van Gennip117]) is that, if the connected components are perturbations of bipartite graphs, enough of this information remains encoded in those eigenfunctions (or eigenspaces) that correspond to the smallest eigenvalues of the signless graph Laplacians, such that it can be used to construct a partition of V that, in a sense,Footnote 44 is the closest to an actual bipartition as is possible in the given graph (or a good approximation thereof).

For information about the (nonlinear) spectrum of graph p-Laplacians, we refer to Bühler and Hein [Reference Böhle, Kuehn, Mulas and Jost38].

7 Gradient Flow: Allen–Cahn

Starting from this section, we only consider undirected graphs. In the definition of GLε, we make the choices α(ε)=1 and W(u)=⟨W∘u,1⟩_V.Footnote 45 We will specify whether W is a double-well or double-obstacle potential where this is relevant. Also, we recall from Remark 3.9 that we choose q=1.

In Section 5 we have argued that minimizing GLε under a mass or fidelity constraint gives constrained approximate minimum cuts. This raises the question of how a minimizer can be found. We introduce three different options aimed at minimizing (or approximately minimizing) GLε: a gradient flow (in the current section), a threshold dynamics scheme (Section 8), and flow by mean curvature (Section 9). It should be noted that none of these options provably leads to global minimizers of GLε, but in many cases it can be shown that the scheme in question decreases GLε (or a closely related functional).

This work focuses on graph models, not the continuum models that were their inspiration, and so we will not delve deeply into those models. In section 3.7 of [Reference van Gennip and Budd190] we give a short (and very much not exhaustive) overview of some of the important literature on the continuum models that inspired many of the graph models we discuss in this work.

7.1 Smooth Double-Well Potential

First we consider the V-gradient flow of GLε [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22, Reference Nestor, Braxton and Bertozzi191]. The equation describing this flow is of the form du/dt = −grad_V GLε(u), where for all u∈V, the V-gradient of GLε at u is the unique function grad_V GLε(u)∈V such that, with s∈R and for all v∈V,

\[ \frac{d}{ds}\,\mathrm{GL}_\varepsilon(u+sv)\bigg|_{s=0} = \big\langle \mathrm{grad}_{\mathcal{V}}\,\mathrm{GL}_\varepsilon(u),\, v \big\rangle_{\mathcal{V}}. \]

In analogy to the continuum case (see section 3.7.2 of [Reference van Gennip and Budd190]), the resulting equation is called the graph Allen–Cahn equation:

\[ \frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon}\,W'\circ u. \tag{7.1} \]

We have assumed here that the double-well potential W is chosen to be smooth (in the sense of Section 5) and, as is standard practice for gradient flows, we have introduced an ‘artificial time’ variable t, such that at each time t, u(t)∈V. To be precise, borrowing notation from Budd and Van Gennip [Reference Budd and van Gennip35], we define the following spaces, given an interval T⊆R and subset A⊆R:

\[
\begin{aligned}
\mathcal{V}_t^T &:= \{u : T \to \mathcal{V}\}, \qquad
\mathcal{V}_{A,t}^T := \{u : T \to \mathcal{V}_A\}, \\
L^2(T;\mathcal{V}) &:= \{u \in \mathcal{V}_t^T : \|u\|_t^T < \infty\}, \\
H^1(T;\mathcal{V}) &:= \Big\{u \in L^2(T;\mathcal{V}) : \tfrac{du}{dt} \in L^2(T;\mathcal{V})\Big\}, \\
H^1_{\mathrm{loc}}(T;\mathcal{V}) &:= \big\{u \in \mathcal{V}_t^T : \forall a,b \in T,\ u|_{(a,b)} \in H^1((a,b);\mathcal{V})\big\}.
\end{aligned}
\]

The norm ‖·‖_t^T in the definition of L²(T;V) is induced by the inner product

\[ (u,v)_t^T := \int_T \langle u(t), v(t)\rangle_{\mathcal{V}}\, dt \]

and du/dt ∈ L²(T;V) denotes the generalized time derivative of u, that is, the function which satisfies, for all v∈C_c^∞(T;V), (u, dv/dt)_t^T = −(du/dt, v)_t^T.

Remark 7.1. If we were not to choose α(ε)=1, but leave α(ε) unspecified, then (7.1) would be

\[ \frac{du}{dt} = -\alpha(\varepsilon)\,\Delta u - \frac{1}{\varepsilon}\,W'\circ u \]

instead. Rescaling by α(ε) gives

\[ \frac{1}{\alpha(\varepsilon)}\frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon\,\alpha(\varepsilon)}\,W'\circ u \qquad\text{or}\qquad \frac{du}{d\tilde t} = -\Delta u - \frac{1}{\tilde\varepsilon}\,W'\circ u, \]

if we define t̃ = α(ε)t and ε̃ = εα(ε). So, even though the choice of α(ε) matters when considering the limiting behaviour of GLε as ε→0 (see Theorem 5.1), when considering the graph Allen–Cahn equation we can assume, without loss of generality, that α(ε)=1, as indeed we did earlier in this section and as will be our default choice except where it is relevant to specify otherwise.

The choice of W has more influence on the Allen–Cahn equation. Choosing W(u)=∑_{i∈V}W(u_i) (rather than W(u)=⟨W∘u,1⟩_V) leads to a term (1/ε)D^{−r}W'∘u (if we interpret W'∘u as the vector with components W'(u_i) for i∈V) instead of (1/ε)W'∘u. To the authors’ knowledge, there are currently no studies that focus on the influence that the extra scaling D^{−r} (if r≠0) has in practical applications. In our discussion of applications in chapter 4 of [Reference van Gennip and Budd190], we will mostly ignore this difference.

Remark 7.2. If we wish to similarly obtain a graph Allen–Cahn equation with the two-parameter normalization of the graph Laplacian Δ^{(s_1,s_2)} (which, we recall from (3.7), is defined to be D^{−s_1}(D−A)D^{−s_2}), that is,

\[ \frac{du}{dt} = -\Delta^{(s_1,s_2)} u - \frac{1}{\varepsilon}\,W'\circ u, \]

as a V-gradient flow of a Ginzburg–Landau-like energy, then we must modify the Ginzburg–Landau energy to

\[ \frac{1}{2}\,\big\|\nabla(D^{-s_2}u)\big\|_{\mathcal{E}}^2 + \frac{1}{\varepsilon}\,W(u), \]

and take the V-gradient flow (and inner products in the energy) with r=s_1−s_2. The same modification mutatis mutandis applies to all of the Allen–Cahn equations in (this) Section 7. For further details, we refer to Budd [Reference Budd34].

Remark 7.3. Since the right-hand side of (7.1) is locally Lipschitz continuous, by the Picard–Lindelöf theorem (see Hale [Reference Hale103, chapter I, theorem 3.1]), for every u_0∈V there exists a unique continuously differentiable solution u locally in time of the initial-value problem with Equation (7.1) and initial condition u(0)=u_0. Since (7.1) is the V-gradient flow of GLε, we have, for all t≥0 in the domain of u, GLε(u(t))≤GLε(u_0)<∞. Since W satisfies a coercivity condition (i.e., W(x)→∞ if |x|→∞; see Section 5), this implies that ‖u(t)‖_V is bounded for all t≥0 in the domain of u (there is no finite-time blowup). A standard continuation result for ODEs [Reference Hale103, chapter I, theorem 2.1] tells us that the solution u exists for all t∈[0,∞).
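As a quick illustration of (7.1) in practice, here is a crude forward-Euler time stepper. It is a sketch of ours rather than a scheme from the cited literature; it assumes the quartic well W(x)=¼x²(x−1)², and the step size dt must be chosen small because of the stiff 1/ε term.

```python
import numpy as np

def allen_cahn_euler(u0, W_mat, eps, dt, n_steps, r=0):
    """Forward-Euler integration of du/dt = -Delta u - W'(u)/eps, with
    (Delta u)_i = d_i^{-r} sum_j w_ij (u_i - u_j) and W(x) = x^2(x-1)^2/4."""
    deg = W_mat.sum(axis=1)
    u = u0.astype(float).copy()
    for _ in range(n_steps):
        lap_u = (deg * u - W_mat @ u) / deg**r
        w_prime = 0.5 * u * (u - 1.0) * (2.0 * u - 1.0)   # W'(u)
        u = u - dt * (lap_u + w_prime / eps)
    return u
```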

Remark 7.4. As discussed in some more detail in section 3.7.6 of [Reference van Gennip and Budd190], solutions of the continuum Allen–Cahn equation converge to solutions of continuum mean curvature flow (in some sense that can be made precise) as ε→0. For more details about the expectations this raises for the graph variants, we refer to Section 9.2.

Remark 7.5. In Keetch and Van Gennip [Reference Keetch and van Gennip117] (see also Keetch [Reference Keetch116]) the signless graph Allen–Cahn equation is derived, based on the signless graph Ginzburg–Landau functional from Section 5.3. It differs from the equation in (7.1) in that it uses the signless graph Laplacian Δσ from (5.2) rather than Δ.

7.2 Double-Obstacle Potential

Instead of using a smooth double-well potential W, we can also consider the V-gradient flow of GLε using the nondifferentiable double-obstacle potential. In that case grad_V GLε(u) is no longer uniquely determined at each u. Instead we require the use of subgradients [Reference Ekeland and Temam67, definition 5.1] and the differential equation describing the gradient flow is replaced by a differential inclusion, du/dt ∈ −∂_V GLε(u), where the subdifferential ∂_V GLε(u) of GLε at u∈V (with respect to the V-inner product) is the setFootnote 46

\[ \partial_{\mathcal{V}}\mathrm{GL}_\varepsilon(u) := \begin{cases} \{\beta\in\mathcal{V} : \forall w\in\mathcal{V}\ \ \langle w-u,\beta\rangle_{\mathcal{V}} + \mathrm{GL}_\varepsilon(u) \le \mathrm{GL}_\varepsilon(w)\}, & \text{if } \mathrm{GL}_\varepsilon(u)<+\infty, \\ \emptyset, & \text{if } \mathrm{GL}_\varepsilon(u)=+\infty. \end{cases} \]

Using the double-obstacle potential in GLε then gives us the following gradient flow differential inclusion:Footnote 47

\[ \frac{du}{dt} = -\Delta u + \frac{1}{\varepsilon}\Big({-\frac{1}{2}} + u + \beta\Big), \quad\text{where, } \forall i\in V,\ \beta_i(t) \in -\partial ı_{[0,1]}(u_i(t)). \tag{7.2} \]

We define a solution of (7.2) on an interval T⊆R to be a pair (u,β)∈V_{[0,1],t}^T×V_t^T with u∈H¹_loc(T;V)∩C⁰(T;V) such that (7.2) is satisfied at a.e. t∈T and for all i∈V.Footnote 48 In Budd and Van Gennip [Reference Budd and van Gennip35, theorem 3.2] it is proven that if (u,β) is a solution to (7.2), then β is determined at all i∈V and a.e. t∈T to be

\[ \beta_i(t) = \begin{cases} \frac{1}{2} + \varepsilon(\Delta u(t))_i, & \text{if } u_i(t)=0, \\ 0, & \text{if } u_i(t)\in(0,1), \\ -\frac{1}{2} + \varepsilon(\Delta u(t))_i, & \text{if } u_i(t)=1. \end{cases} \]

This means that the differential inclusion above is actually a differential equation.
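To make this characterization concrete, the following small helper (our own illustration, with names and tolerances chosen ad hoc) evaluates β from u and Δu according to the case distinction above.

```python
import numpy as np

def obstacle_multiplier(u, lap_u, eps, tol=1e-12):
    """beta from the characterization below (7.2): nonzero only on the
    obstacle set {u = 0} union {u = 1}."""
    beta = np.zeros_like(u)
    at0 = np.abs(u) < tol            # nodes with u_i = 0
    at1 = np.abs(u - 1.0) < tol      # nodes with u_i = 1
    beta[at0] = 0.5 + eps * lap_u[at0]
    beta[at1] = -0.5 + eps * lap_u[at1]
    return beta
```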

When discussing the Allen–Cahn equation with the double-obstacle potential from (7.2), we have stated the result for a.e. t∈T. This contrasts with the Allen–Cahn equation with the double-well potential from (7.1), for which we have seen in Remark 7.3 that, for every initial condition, a unique solution exists on [0,∞). In [Reference Budd and van Gennip35, theorems 3.9 and 3.10] it is shown that, for all initial conditions u_0∈V_{[0,1]}, there exists a pair (u,β)∈V_{[0,1],t}^T×V_t^T which satisfies (7.2) for a.e. t∈T with u(0)=u_0 and u∈H¹_loc(T;V)∩C^{0,1}(T;V). Moreover, u is uniquely determined for all t∈T and β for a.e. t∈T.

The choice for the double-obstacle potential makes it possible to connect the graph Allen–Cahn equation with the graph Merriman–Bence–Osher (MBO) scheme, which we introduce in Section 8. This connection is looked at in more detail in chapter 6 of [Reference van Gennip and Budd190].

Also, (numerical) practice is influenced by the choice for the double-obstacle potential, as argued in Bosch et al. [Reference Bosch, Klamt and Stoll26], where the (fidelity-forced; see Section 7.3) graph Allen–Cahn equation with a non-smooth potential outperforms the one with a smooth potential on image segmentation tasks (for more detail on image segmentation, we refer to section 4.1.3 of [Reference van Gennip and Budd190]).

7.3 Allen–Cahn with Constraints

We can incorporate the mass constraint or fidelity constraint from Section 5.1 into the gradient flow. Starting with the latter, recall that the fidelity term (1/p)‖μ^{1/p}(u−f)‖_{V,p}^p is added to GLε(u), where μ∈V_{[0,∞)}∖{0} has support Z⊆V. For simplicity we restrict ourselves here to the case p=2. This is also most commonly used in the literature, for example, Bertozzi and Flenner [Reference Bertozzi and Flenner21, Reference Bertozzi and Flenner22] (double-well potential) and Budd et al. [Reference Budd, van Gennip and Latz37] (double-obstacle potential). At the level of the V-gradient flow equation, this adds a term −μ(u−f) (i.e., minus the first variation of the fidelity term in the functional) to the right-hand side of (7.1) or (7.2), leading to the fidelity-forced Allen–Cahn equation on graphs:

\[ \frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon}\,g(u,\beta) - \mu(u-f), \tag{7.3} \]

where g(u,β)=W'∘u for the double-well potential case (see (7.1)) and g(u,β)=½−u−β for the double-obstacle potential case (see (7.2)).

To instead impose a mass constraint M(u)=M, we recall from Section 5.1 that we can do so by adding the term ı_{{0}}(M(u)−M) to the functional. In the gradient flow this leads to an additional term [Reference Ekeland and Temam67, proposition 5.6] which should be an element of −∂_V m(u), where m(u):=ı_{{0}}(M(u)−M).

Lemma 7.6. Let u∈V and M∈R, then

\[ \partial_{\mathcal{V}}\, m(u) = \begin{cases} \{v\in\mathcal{V} : \exists c\in\mathbb{R}\ \forall i\in V\ \ v_i=c\}, & \text{if } M(u)=M, \\ \emptyset, & \text{if } M(u)\neq M. \end{cases} \]

Proof. The proof is found in lemma 3.1.6 in [Reference van Gennip and Budd190]. □

Lemma 7.6 shows that, if a mass constraint is imposed, an additional constant term is added to the gradient flow. The value of this constant is determined by the requirement that the resulting equation conserves mass. To be explicit, if du/dt = −Δu − (1/ε)g(u,β) + c (recall the definition of g(u,β) from its introduction in the fidelity-forced case), then we require

\[ 0 = \frac{d}{dt}\langle u,1\rangle_{\mathcal{V}} = -\frac{1}{\varepsilon}\langle g(u,\beta),1\rangle_{\mathcal{V}} + c\,\mathrm{vol}\,V = -\frac{1}{\varepsilon}M(g(u,\beta)) + c\,\mathrm{vol}\,V, \]

where we used that M(Δu)=0. Hence c = (1/(ε vol V)) M(g(u,β)) and the mass-conserving Allen–Cahn equation is

\[ \frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon}\,g(u,\beta) + \frac{1}{\varepsilon\,\mathrm{vol}\,V}\,M(g(u,\beta)). \tag{7.4} \]

We emphasize that c is constant as a function in V, that is, it has the same value at each node, but may (and does) depend on u.
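A minimal sketch of the right-hand side of (7.4), under the same illustrative assumptions as before (r=0, so that vol V equals the number of nodes and M(v)=∑_i v_i, and the quartic double-well so that g(u,β)=W'∘u):

```python
import numpy as np

def mass_conserving_rhs(u, W_mat, eps):
    """Right-hand side of (7.4) for r = 0 and the quartic double-well.
    The constant c = M(g(u)) / (eps * vol V) restores mass conservation."""
    deg = W_mat.sum(axis=1)
    lap_u = deg * u - W_mat @ u                   # combinatorial Laplacian
    g = 0.5 * u * (u - 1.0) * (2.0 * u - 1.0)     # g(u) = W'(u)
    c = g.sum() / (eps * u.size)                  # ensures d/dt sum(u) = 0
    return -lap_u - g / eps + c
```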

We may also impose the fidelity and mass constraints simultaneously. Either by explicit computation of the subdifferential of the sum of the fidelity term and m, or by using the fact that this subdifferential of the sum is equal to the sum of the respective subdifferentials [Reference Ekeland and Temam67, proposition 5.6], we find that the resulting mass-conserving fidelity-forced (p=2) Allen–Cahn equation (version 1) is

\[ \frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon}\,g(u,\beta) - \mu(u-f) + c, \]

for some constant c∈R. As before, we can determine the value of c by requiring that ⟨du/dt,1⟩_V=0, which leads to c = (1/vol V)((1/ε)M(g(u,β)) + ⟨μ,u−f⟩_V).

In Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] a different approach is taken. Instead of the fidelity term from Section 5.1, a term −⟨u,μf⟩_V is added to the mass-conserving Ginzburg–Landau functional.Footnote 49 We note the minus sign. In the minimization of the functional, this term encourages u_i to be large whenever μ_if_i is large and to be small whenever μ_if_i is small. The corresponding mass-conserving fidelity-forced Allen–Cahn equation (version 2) is

\[ \frac{du}{dt} = -\Delta u - \frac{1}{\varepsilon}\,g(u,\beta) + \mu f + c, \]

where again c is determined by the requirement of mass conservation: c = (1/vol V)((1/ε)M(g(u,β)) − ⟨μ,f⟩_V).

As was the case for the Allen–Cahn equation without constraints (see Remark 7.3), if we use the double-well potential in the constrained equation in (7.3) or (7.4), then the right-hand side of the equation is locally Lipschitz continuous and thus existence and uniqueness of solutions locally in time for any given initial condition follow from the Picard–Lindelöf theorem of ODE theory (see Hale [Reference Hale103, chapter I, theorem 3.1]). A gradient flow argument as in Remark 7.3, this time using the constrained Ginzburg–Landau functionals, again rules out finite-time blowupFootnote 50 and thus the solution can be extended to [0,) (see [Reference Hale103, chapter I, theorem 2.1]).

In the double-obstacle potential case (as for (7.2)), we define a solution of (7.3) (or (7.4)) on an interval T⊆R to be a pair (u,β)∈V_{[0,1],t}^T×V_t^T with u∈H¹_loc(T;V)∩C⁰(T;V) such that (7.3) (or (7.4)) is satisfied at a.e. t∈T and, for all i∈V and a.e. t∈T, β_i(t)∈−∂ı_{[0,1]}(u_i(t)). (We will sometimes simply refer to u as a solution, implying the existence of a corresponding β.) We refer to Budd and Van Gennip [Reference Budd and van Gennip36, theorems 3.8 and 3.9] and Budd et al. [Reference Budd, van Gennip and Latz37, theorem 2.7] for proofs of existence and uniqueness of solutions for the initial-value problems corresponding to the mass-constrained and fidelity-forced graph Allen–Cahn equations, respectively. The function u is uniquely determined for all t. In the fidelity-forced case the function β is uniquely determined for a.e. t; in the mass-conserving case β is uniquely determined up to a constant for almost every time t. Moreover, this constant is zero if there exists such a time t and a node i such that u_i(t)∈(0,1) (for more details, we refer to [Reference Budd and van Gennip36, theorem 3.8]).

7.4 Different Metrics

A gradient flow is determined by the functional of which the gradient is taken and the metric structure with respect to which the gradient is constructed. Earlier we used the metric induced by the V-inner product. Other choices lead to other gradient flows. In Van Gennip [Reference van Gennip188, definition 4.1], for example, the H^{−1}-inner product for functions u,v∈V with M(u)=M(v)=0 is introduced:

\[ \langle u,v\rangle_{H^{-1}} := \langle \nabla\varphi, \nabla\psi\rangle_{\mathcal{E}} = \langle \Delta\varphi, \psi\rangle_{\mathcal{V}} = \langle \varphi, \Delta\psi\rangle_{\mathcal{V}}, \]

where φ,ψ∈V solve the graph Poisson equations Δφ=u and Δψ=v, respectively. In [Reference van Gennip188, section 3.4] it is proven that the zero-mass conditions on u and v are necessary and sufficient to ensure that these Poisson equations have solutions. Moreover, these solutions are unique up to an additive constant, which does not influence the inner product. In [Reference van Gennip188, Supplementary Materials section 4] the graph Cahn–Hilliard equation (named thus, in analogy to the continuum H^{−1}-gradient flow of the Ginzburg–Landau functional; see section 3.7.3 in [Reference van Gennip and Budd190]) is derived as the H^{−1}-gradient flow of GLε:

\[ \frac{du}{dt} = -\Delta\Delta u - \frac{1}{\varepsilon}\,\Delta(W'\circ u). \]Footnote 51

Footnote 51: Since M(Δw)=0 for any w∈V, we see that the Cahn–Hilliard equation automatically conserves mass. It may appear restrictive that we required u to have zero mass, but if M(u)≠0, then we can easily transform u into a function with zero mass: v = u − M(u)/vol V. This addition of a constant (as a function in V) term does not affect the Dirichlet term in GLε and effectively shifts the graph of W (and thus its wells) by M(u)/vol V to the right. We note that this shift does depend on u, but if we restrict ourselves to functions u∈V with prescribed mass M (not necessarily zero), we get a constant shift M/vol V. This restriction is natural, since the Cahn–Hilliard equation preserves mass.
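For illustration, the H^{−1}-inner product is easy to compute numerically via a graph Poisson solve; the sketch below (our own, for r=0, using a pseudoinverse to handle the constant-function kernel of Δ) assumes a connected graph and zero-mass inputs.

```python
import numpy as np

def h_minus1_inner(u, v, W_mat):
    """<u, v>_{H^{-1}} = <phi, v>_V where Delta phi = u (r = 0 sketch).
    The pseudoinverse fixes the additive constant in phi, which does
    not influence the inner product."""
    L = np.diag(W_mat.sum(axis=1)) - W_mat   # combinatorial Laplacian
    phi = np.linalg.pinv(L) @ u              # solve the Poisson equation
    return float(phi @ v)
```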

Other choices for the metric are possible, but have not yet been explored in the literature, to the best of the authors’ current knowledge.

8 Merriman–Bence–Osher Scheme

In this section we take a look at a threshold dynamics scheme called the Merriman–Bence–Osher (MBO) scheme.

8.1 Definition of the MBO Scheme

The MBO scheme in the continuum was originally introduced in Merriman et al. [Reference Barry, Bence and Osher146, Reference Barry, Bence and Osher147] as a method to approximate the flow by mean curvature (we refer to section 3.7.5 of [Reference van Gennip and Budd190] for more information about continuum mean curvature flow and to Section 9 of this Element for graph variants), but has found its way into the ‘differential equations on graphs’ literature in recent years mostly as an alternative way to (approximately) minimize GLε. Before we consider some reasons why it is not unreasonable to use the scheme for this purpose, we will first give its recursive definition on graphs:

Graph MBO Scheme (without Constraints)

  1. Initialize. Choose an initial condition u^0=χ_{S_0} with S_0⊆V and a ‘time step’Footnote 52 τ>0.

  2. Step k+1: diffusion. Solve the diffusion/heat equation du/dt = −Δu on (0,τ] with initial condition u(0)=u^k.

  3. Step k+1: threshold. Define, for all i∈V,
\[ u_i^{k+1} := \begin{cases} 0, & \text{if } u_i(\tau) < \frac{1}{2}, \\ 1, & \text{if } u_i(\tau) \ge \frac{1}{2}. \end{cases} \]

  4. Stop. Stop the scheme when a stopping condition or predetermined number of steps has been achieved.Footnote 53

By standard methods for linear ODEs (see Hale [Reference Hale103, section III.1]), it follows that the diffusion step has a unique outcome u(τ), given an initial condition uk.

The output of this scheme is a sequence of functions u^0,u^1,u^2,…∈V_{{0,1}}, or equivalently a sequence of subsets S_0,S_1,S_2,…⊆V, where u^k=χ_{S_k}. We will freely move between both representations as is convenient.
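In dense matrix form, one iteration of the scheme can be sketched in a few lines of Python (our own illustration; practical implementations use sparse solvers or truncated spectral expansions instead of a full matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

def mbo_step(u, W_mat, tau, r=1):
    """One graph MBO iteration: diffuse with du/dt = -Delta u for time
    tau, then threshold at 1/2. r = 1 gives the random walk Laplacian."""
    deg = W_mat.sum(axis=1)
    L = np.diag(deg) - W_mat
    if r == 1:
        L = L / deg[:, None]             # D^{-1} (D - A)
    u_tau = expm(-tau * L) @ u           # exact solution of the heat equation
    return (u_tau >= 0.5).astype(float)  # threshold step
```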

A potential, minor, source of ambiguity in the literature concerns the value of u_i^{k+1} if u_i(τ) is exactly at the threshold: if u_i(τ)=½, we can decide to set u_i^{k+1}=0 or u_i^{k+1}=1. We have decided on the latter for the algorithm given earlier, but in other literature the other choice may have been made. In practice, it is unlikely that u_i(τ)=½ is exactly achieved, but for theoretical purposes this choice will determine the (non-)strictness of various inequalities along the way.

A first, hand-waving, reason to expect the graph MBO scheme to minimize GLε is its superficial similarity to the Allen–Cahn equation from (7.1). Instead of solving a nonlinear differential equation as in (7.1), when executing the MBO scheme the solution of a linear differential equation is sought in the diffusion step up to a, typically small,Footnote 54 time τ>0, after which the phase-separating drive of the W-derived nonlinear term from (7.1) is mimicked by the thresholding step. (For some more details about this interpretation of the MBO scheme as a time-splitting scheme, we refer to remark 6.1.8 in [Reference van Gennip and Budd190].)

Another reason to suspect similar behaviour from the graph MBO scheme as from the graph Allen–Cahn equation is that in the continuum both processes approximate mean curvature flow (for small ε in the Allen–Cahn case and small τ in the MBO case). This can be made rigorous in the form of precisely formulated convergence results, whose details fall outside the scope of the current work. In section 3.7.6 of [Reference van Gennip and Budd190] we give a brief overview of some literature about the continuum versions of the graph models, equations, and schemes we discuss in the current work, including these convergence results.

In chapter 6 of that same work [Reference van Gennip and Budd190], we also present a rigorous link between a specific time discretization of the Allen–Cahn equation with double-obstacle potential (7.2) and the MBO scheme.

8.2 The Signless MBO Scheme

Starting from the signless Ginzburg–Landau functional from Section 5.3, Keetch and Van Gennip [Reference Keetch and van Gennip117] and Keetch [Reference Keetch116] define a signless MBO scheme.Footnote 55

Signless Graph MBO Scheme

  1. Initialize (signless). Choose an initial condition u^0=χ_{S_0}−χ_{S_0^c} with S_0⊆V and a ‘time step’ τ>0.

  2. Step k+1: signless diffusion. Solve the signless diffusion/heat equation du/dt = −Δ^σ u on (0,τ] with initial condition u(0)=u^k.

  3. Step k+1: signless threshold. Define, for all i∈V,
\[ u_i^{k+1} := \begin{cases} -1, & \text{if } u_i(\tau) < 0, \\ 1, & \text{if } u_i(\tau) \ge 0. \end{cases} \]

  4. Stop.

We note that, besides the use of the signless graph Laplacian Δσ in the diffusion step, we also require the binary functionsFootnote 56 uk to take values that are symmetric with respect to the origin, as argued in Section 5.3.

As with the diffusion step before, standard ODE techniques guarantee the existence of a unique outcome of the signless diffusion step.

In [Reference Keetch and van Gennip117] it is shown that the signless MBO scheme performs well for the Max-Cut problem (see Section 5.3) in some practical tests, finding cut values comparable to or greater than those of the Goemans–Williamson algorithm [Reference Goemans and Williamson99], in less computation time – but without any rigorous performance guarantees. Finding such guarantees for any of the MBO-type algorithms is currently still an open question of great interest.
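Mirroring the earlier MBO sketch, one iteration of the signless scheme can be written as follows (again our own illustrative code, with the same dense-matrix caveats):

```python
import numpy as np
from scipy.linalg import expm

def signless_mbo_step(u, W_mat, tau, r=1):
    """One signless MBO iteration: diffuse with Delta^sigma = D^{-r}(D + A)
    for time tau, then threshold to values in {-1, 1}."""
    deg = W_mat.sum(axis=1)
    L_sig = np.diag(deg) + W_mat         # D + A
    if r == 1:
        L_sig = L_sig / deg[:, None]     # D^{-1} (D + A)
    u_tau = expm(-tau * L_sig) @ u
    return np.where(u_tau >= 0.0, 1.0, -1.0)
```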

8.3 The MBO Scheme with Constraints

A mass or fidelity constraint, as in Section 5.1, can also be incorporated into the MBO scheme. In the current section we will state and explain these schemes. The rigorous link between the Allen–Cahn equation and the MBO scheme, which was first established in Budd and Van Gennip [Reference Budd and van Gennip35] and which is also presented in chapter 6 of [Reference van Gennip and Budd190], explains how these MBO schemes with constraint can be derived from the Allen–Cahn equations with constraint as presented in Section 7.3.

To incorporate a mass constraint as in [Reference van Gennip188, section 5.3]Footnote 57 and [Reference Budd and van Gennip36, theorem 4.16], we first require some more notation. If u is the solution of the (k+1)st diffusion step, we denote the preimage of x∈R under u at time τ by

\[ U(x) := \{i \in V : u_i(\tau) = x\}. \]

Let L∈N be the (always strictly positive and finite) number of x∈R for which U(x)≠∅ and assume the labels x_ℓ are such that x_1<⋯<x_L. Then by [Reference Budd and van Gennip36, theorem 4.16] there is a unique ℓ_*∈[L] such that

\[ \sum_{\ell=\ell_*+1}^{L} M\big(\chi_{U(x_\ell)}\big) < M(u^k) \le \sum_{\ell=\ell_*}^{L} M\big(\chi_{U(x_\ell)}\big). \]

We now replace the threshold step by a mass-conserving threshold step to get a mass-conserving MBO scheme:

Mass-Conserving Graph MBO Scheme

  • Initialize.

  • Step k+1: diffusion.

  • Step k+1: mass-conserving threshold. Let u^{k+1} be any function in V which satisfies M(u^{k+1})=M(u^k) and, for all i∈V,

\[ u_i^{k+1} := \begin{cases} 0, & \text{if } i \in \bigcup_{1\le\ell<\ell_*} U(x_\ell), \\ 1, & \text{if } i \in \bigcup_{\ell_*<\ell\le L} U(x_\ell). \end{cases} \]

  • Stop.

Remark 8.1. In the mass-conserving threshold step we order the nodes by their value of the diffused state u_i(τ) and assign, by setting u_i^{k+1}=1, as much mass to the nodes at the top of the order (with the highest diffused state values x_L>⋯>x_{ℓ_*+1}) as we can without assigning more than the required total mass M(u^k). The leftover mass can be assigned in any way to the nodes in U(x_{ℓ_*}). If and only if |U(x_{ℓ_*})|>1 and ∑_{ℓ=ℓ_*}^{L}M(χ_{U(x_ℓ)})≠M(u^k), this introduces nonuniqueness into the scheme (see [Reference Budd and van Gennip36, theorem 4.16] or [Reference Budd33, theorem 4.2.16]). In [Reference van Gennip188] the choice is made to pick exactly enough (arbitrarily chosen) nodes i∈U(x_{ℓ_*}) so that all leftover mass can be assigned to them. If r≠0, the maximum amount of mass that can be assigned per node, namely d_i^r, is node-dependent, so that it is possible (in practice likely) that the leftover mass does not match exactly the sum of d_i^r over the chosen nodes. In this case one of the chosen nodes does not get its full mass assigned, that is, the resulting function u^{k+1} will be non-binary, since u_i^{k+1}∉{0,1} at this one node. Other choices to deal with the nonuniqueness are possible, for example, an equal division of mass over all nodes in U(x_{ℓ_*}), which may also lead to a non-binary function u^{k+1}.
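For r=0 the mass-conserving threshold reduces to keeping the m nodes with the largest diffused values, as in the following sketch (our own; ties within U(x_{ℓ_*}) are broken arbitrarily by the sorting routine):

```python
import numpy as np

def mass_conserving_threshold(u_tau, m):
    """Mass-conserving threshold for r = 0: M(u) = sum(u), so m is the
    number of nodes to be assigned the value 1."""
    order = np.argsort(-u_tau)       # nodes by decreasing diffused value
    u_next = np.zeros_like(u_tau)
    u_next[order[:int(m)]] = 1.0
    return u_next
```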

Next, we incorporate a fidelity constraint into the MBO scheme [Reference Budd, van Gennip and Latz37], instead of a mass constraint. This time we change the diffusion step of the scheme, instead of the threshold step. Restricting to the case p=2 (i.e., a fidelity term of the form ½‖μ^{1/2}(u−f)‖_V² in the functional) and taking our cue from the fidelity-forced Allen–Cahn equation from (7.3), we replace the diffusion step by a fidelity-forced diffusion step to get a fidelity-forced MBO scheme (version 1):

Fidelity-Forced Graph MBO Scheme (Version 1)

  • Initialize.

  • Step k+1: fidelity-forced diffusion. Solve the fidelity-forced diffusion/heat equation du/dt = −Δu − μ(u−f) on (0,τ] with initial condition u(0)=u^k.

  • Step k+1: threshold.

  • Stop.

The interest in MBO schemes on graphs started with the introduction of this fidelity-forced scheme in Merkurjev et al. [Reference Ekaterina, Tijana and Bertozzi144] for classification and image processing applications (see also chapter 4 of [Reference van Gennip and Budd190]).

As before, by the usual methods for linear systems of ODEs (see Hale [Reference Hale103, chapter III.1]), it follows that the fidelity-forced diffusion step has a unique outcome, given an initial condition uk.
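Since the fidelity-forced diffusion step is an affine linear ODE, it can be solved in closed form. The following sketch (our own, for the combinatorial Laplacian, and assuming B = Δ + diag(μ) is invertible, which holds for instance on a connected graph with μ≠0) does exactly that.

```python
import numpy as np
from scipy.linalg import expm

def fidelity_forced_diffusion(u0, W_mat, mu, f, tau):
    """Solve du/dt = -Delta u - mu * (u - f) exactly up to time tau.
    Writes the flow as u(t) = u_ss + exp(-t B)(u0 - u_ss), where
    B = Delta + diag(mu) and B u_ss = mu * f."""
    B = np.diag(W_mat.sum(axis=1)) - W_mat + np.diag(mu)
    u_ss = np.linalg.solve(B, mu * f)        # steady state of the flow
    return u_ss + expm(-tau * B) @ (u0 - u_ss)
```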

In Section 7.3 we described an alternative way to incorporate a fidelity constraint into a mass-conserving Allen–Cahn equation from Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44]. This can be used in a non-mass-conserving context as well, giving rise to the following fidelity-forced scheme (version 2) (the continuum limit of which is examined in Laux and Lelmi [Reference Laux and Lelmi124]; see also section 7.2.6 of [Reference van Gennip and Budd190]):

Fidelity-Forced Graph MBO Scheme (Version 2)

  • Initialize.

  • Step k+1: fidelity-forced diffusion. Solve the fidelity-forced diffusion/heat equation du/dt = −Δu + μf on (0,τ] with initial condition u(0)=u^k.

  • Step k+1: threshold.

  • Stop.

If we impose both a fidelity and mass constraint at once, both the diffusion and threshold steps of the original MBO scheme get replaced. While the threshold step gets replaced by the mass-conserving threshold step as described earlier, the diffusion step must be replaced by a mass-conserving fidelity-forced diffusion step to obtain a mass-conserving fidelity-forced MBO scheme (version 1):

Mass-Conserving Fidelity-Forced Graph MBO Scheme (Version 1)

  • Initialize.

  • Step k+1: mass-conserving fidelity-forced diffusion. Solve the mass-conserving fidelity-forced diffusion/heat equation du/dt = −Δu − μ(u−f) + (1/vol V)⟨μ,u−f⟩_V on (0,τ] with initial condition u(0)=u^k.

  • Step k+1: mass-conserving threshold.

  • Stop.

As above, we can also define a different mass-conserving fidelity-forced MBO scheme, used in Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44], which uses the following mass-conserving fidelity-forced diffusion step instead of the one described earlier to arrive at a mass-conserving fidelity-forced MBO scheme (version 2):

Mass-Conserving Fidelity-Forced Graph MBO Scheme (Version 2)

  • Initialize.

  • Step k+1: mass-conserving fidelity-forced diffusion. Solve the mass-conserving fidelity-forced diffusion/heat equation du/dt = −Δu + μf − (1/vol V)⟨μ,f⟩_V on (0,τ] with initial condition u(0)=u^k.

  • Step k+1: mass-conserving threshold.

  • Stop.

As for the other diffusion(-like) steps earlier, we note that mass-conserving fidelity-forced diffusion initial-value problems have unique solutions.

As mentioned before, for explanations of how these alternative steps in the MBO scheme can be derived, we refer to [Reference van Gennip and Budd190]. It is also interesting to compare the two alternative fidelity-forced and mass-conserving fidelity-forced MBO schemes which we presented above; a full comparison is an interesting topic for future work. As an initial step, in the companion volume we have included a closer look at the steady states for the two mass-conserving fidelity-forced diffusion steps. It should be noted that in practice these steps will be run for a short time and steady states will not be achieved, but they do shed light on some features that distinguish both methods.

9 Graph Curvature and Mean Curvature Flow

As with many of the other concepts, functionals, and dynamics we have introduced thus far, mean curvature flow (or motion by mean curvature) finds its origins in the continuum setting. It refers to the evolution of a set in which each boundary point moves with a normal velocity proportional to the local mean curvature at that point. In this section we describe various attempts to define a meaningful concept of curvature and mean curvature flow (MCF) on a graph. For some references about mean curvature flow in the continuum setting, we refer to section 3.7.5 of [Reference van Gennip and Budd190]. First we introduce various notions of curvature on graphs.

9.1 Curvature

Various different definitions of graph curvature can be found in the literature. In this section we discuss a few of them.

In the discussion following (3.10) we determined that the maximum in the definition of TV(u) is achieved by φ=sgn(∇u). In the continuum setting, if u is the characteristic function of a set with smooth boundary, the supremum in the definition of total variation can be achieved by a smooth extension of the normal vector field on the boundary of the set. Inspired by this, in the graph setting, if u is the characteristic function of some node subset S⊆V, namely u=χ_S, then we can interpret this edge function φ as a graph normal analogously to the normal vector field in the continuum:

\[ \nu^S_{ij} := \big(\mathrm{sgn}(\nabla\chi_S)\big)_{ij} = \begin{cases} \mathrm{sgn}\big((\chi_S)_j - (\chi_S)_i\big), & \text{if } \omega_{ij} > 0, \\ \mathrm{sgn}(0), & \text{if } \omega_{ij} = 0. \end{cases} \]

This prompted Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191] to define graph (mean) curvature for a node subset S as κ := div ν^S. Thus (allowing for a brief return of our parameter q from (3.1)):

\[
\begin{aligned}
\kappa_i &= \frac{1}{2} d_i^{-r} \sum_{j\in V} \omega_{ij}^q \Big( \big(\mathrm{sgn}(\nabla\chi_S)\big)_{ji} - \big(\mathrm{sgn}(\nabla\chi_S)\big)_{ij} \Big) \\
&= \frac{1}{2} d_i^{-r} \begin{cases} 2\sum_{j\in S^c}\omega_{ij}^q + \sum_{j\in S}\omega_{ij}^q\big(\mathrm{sgn}(0)-\mathrm{sgn}(0)\big), & \text{if } i\in S, \\ -2\sum_{j\in S}\omega_{ij}^q + \sum_{j\in S^c}\omega_{ij}^q\big(\mathrm{sgn}(0)-\mathrm{sgn}(0)\big), & \text{if } i\in S^c, \end{cases} \\
&= d_i^{-r} \begin{cases} \sum_{j\in S^c}\omega_{ij}^q, & \text{if } i\in S, \\ -\sum_{j\in S}\omega_{ij}^q, & \text{if } i\in S^c. \end{cases}
\end{aligned}
\]

We note that κ is independent of the value chosen for sgn(0).
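In code, this curvature is a one-liner per branch; the sketch below (our own helper, with S given as a boolean mask) implements the final expression above.

```python
import numpy as np

def graph_curvature(W_mat, S, r=1, q=1):
    """kappa_i = d_i^{-r} * sum_{j in S^c} w_ij^q for i in S, and
    -d_i^{-r} * sum_{j in S} w_ij^q for i in S^c (S a boolean mask)."""
    deg = W_mat.sum(axis=1)
    Wq = W_mat**q
    kappa = np.where(S, Wq[:, ~S].sum(axis=1), -Wq[:, S].sum(axis=1))
    return kappa / deg**r
```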

In El Bouchairi et al. [Reference Bouchairi, Abderrahim and Fadili69, remark 5.1(ii)] the graph mean curvature of a function u∈V at i∈V is defined as

\[ (K(u))_i := -d_i^{-1} \sum_{j\in V} \omega_{ij}\, \mathrm{sgn}(u_j - u_i), \tag{9.1} \]

where explicitly sgn(0)=1 is chosen. For r=1 and q=1, K(χ_S) would equal κ, if sgn(0)=0 had been chosen instead. The reason for the choice sgn(0)=1 in K is so that (K(u))_i can be interpreted as the mean curvature K_i^{{j∈V:u_j≥u_i}} of the superlevel set {j∈V:u_j≥u_i} at i∈V, where, for S⊆V and i∈V,

\[ K_i^S := d_i^{-1}\bigg( \sum_{j\in S^c}\omega_{ij} - \sum_{j\in S}\omega_{ij} \bigg). \]

We note that (K(χ_S))_i = K_i^S if i∈S and (K(χ_S))_i = −1 if i∈S^c. The curvature used in El Chakik et al. [Reference Chakik, Abdallah and Desquesnes70] is the same as K from (9.1).

In Zhou and Schölkopf [Reference Zhou, Schölkopf, Walter, Robert and Hanbury206, section 2.4] the authors define the graph mean curvature of u∈V as

\[ \frac{1}{2}\,\mathrm{div}\left( \frac{\nabla u}{2\,\|\nabla u\|_{\mathcal{E}}} \right), \]

where q=1 is chosen in the definition of the E-norm (see (3.1)) and the gradient and divergence operators are scaled to be consistent with the symmetrically normalized Laplacian of (3.6). Aside from the differences in scaling, this approach is similar to the ones mentioned earlier.

Finally we mention two notions of Ricci curvature on graphs: Ollivier’s Ricci curvature and Forman’s Ricci curvature.

Ollivier curvature (or coarse Ricci curvature; see Ollivier [Reference Ollivier165, Reference Ollivier, Motoko, Masanori and Kumagai166], also Münch and Wojciechowski [Reference Münch and Wojciechowski152]) is defined to be

\[ (\kappa_G^O)_{ij} := 1 - \frac{W^G(\mu_i,\mu_j)}{d^G_{ij}}, \]

where μ: V→P(V) assigns a probability measure on the node set V to each node in V,Footnote 58 and W^G is a graph Wasserstein distance. In this context, the graph distance d^G_{ij} often is chosen to be as in (3.15) with q=2 (or equivalently q=0, with reciprocal weights) – for unweighted graphs the choice of q in d^G_{ij} is, of course, not relevant.

In Ni et al. [Reference Chien-Chun, Lin, Luo and Gao162, Reference Chien-Chun, Lin, Luo and Gao163] and Tian et al. [Reference Tian, Lubberts and Weber185] the probability-measure-valued node function μ is chosen to be

\[ \mu_i(j) := \begin{cases} \alpha, & \text{if } j = i, \\ (1-\alpha)\,\dfrac{e^{-(d^G_{ij})^p}}{\sum_{k\in\{j'\in V : j'\sim i\}} e^{-(d^G_{ik})^p}}, & \text{if } j \sim i, \\ 0, & \text{otherwise}, \end{cases} \]

where, for given i∈V, μ_i∈P(V) assigns probability μ_i(j) to node j∈V and j∼i denotes that j is a neighbour of i. The parameters α∈[0,1] and p≥0 are both chosen to be zero for unweighted graphs, in which case the probability distribution μ_i is uniform over the neighbours of i. On random geometric graphs (see section 7.2 of [Reference van Gennip and Budd190]), Van der Hoorn et al. [Reference Cunningham, Gabor, Carlo and Krioukov186, section III.B] use a slightly different definition for Ollivier curvature:

$$(\tilde\kappa^O_G)_{ij} := 1 - \frac{W_G(\mu_i,\mu_j)}{\delta},$$

where $\delta>0$ is the distance between the sample points corresponding to nodes $i$ and $j$ along the geodesic on the manifold from which the points are sampled, and $\mu_i$ is the uniform distribution over the set of vertices that are within a weighted graph distance of at most $\delta$ of the node $i$. If the edge weights $\omega_{ij}$ are chosen to be equal to the distance between $i$ and $j$ on the manifold, then in [Reference Cunningham, Gabor, Carlo and Krioukov186] this curvature is proven to convergeFootnote 59 to the Ricci curvature on the manifold that underlies the random geometric graph. In García Trillos and Weber [Reference Trillos and Weber96], explicit rates are proved for this convergence. We refer to those papers for further details, such as details about the graph Wasserstein distance (whose explanation would take us a bit too far afield here). In particular, we refer to [Reference Trillos and Weber96] for an in-depth discussion of various choices of the graph distance and their impact on the result, and to [Reference Cunningham, Gabor, Carlo and Krioukov186, section II] and the references therein for information about further distinct notions of graph curvature.
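As a concrete illustration, the sketch below computes an Ollivier curvature on a small unweighted graph, with $\mu_i$ the uniform distribution over the neighbours of $i$ (the $\alpha=0$, unweighted case above) and the Wasserstein distance $W_G$ evaluated as a transport linear program over shortest-path distances. The graph data are illustrative, and this is only a minimal sketch of the definitions, not the implementation used in the cited works.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # unweighted example graph
D = shortest_path(A, unweighted=True)       # graph distances d^G
n = len(A)
alpha = 0.0

def mu(i):
    """Uniform distribution over the neighbours of i (alpha = 0 case)."""
    m = np.zeros(n)
    m[i] = alpha
    nbrs = A[i] > 0
    m[nbrs] = (1 - alpha) / nbrs.sum()
    return m

def W1(p, q):
    """Wasserstein-1 distance: min <D, pi> over transport plans pi."""
    Aeq = np.zeros((2 * n, n * n))
    for i in range(n):
        Aeq[i, i * n:(i + 1) * n] = 1       # row sums of pi equal p
        Aeq[n + i, i::n] = 1                # column sums of pi equal q
    res = linprog(D.flatten(), A_eq=Aeq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

i, j = 0, 1
print(1 - W1(mu(i), mu(j)) / D[i, j])       # Ollivier curvature of edge (i, j)
```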

Forman curvature (or combinatorial Ricci curvature) is defined in Forman [Reference Forman91] (see also [Reference Jost and Münch111, Reference Weber, Saucan and Jost196, Reference Weber, Saucan and Jost197]) for cell complexes. Here we do not introduce the concept in this general setting, but rather give its definition applied to graphs. Besides edge weights $\omega_{ij}$, the definition also allows for node weights $\omega_i$. In our standard setting without node weights, we may assume that, for all $i\in V$, $\omega_i=1$. Forman curvature is then defined to beFootnote 60

$$(\kappa^F_G)_{ij} := \omega_{ij}\left(\frac{\omega_i}{\omega_{ij}} + \frac{\omega_j}{\omega_{ij}} - \sum_{k\in V\setminus\{i,j\}}\left[\frac{\omega_i}{\sqrt{\omega_{ij}\omega_{ik}}} + \frac{\omega_j}{\sqrt{\omega_{ij}\omega_{kj}}}\right]\right),$$

where the terms in the sum are understood to be present only for those $k$ for which the edge weights under the square roots are strictly positive.
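A direct transcription into code, under the convention just stated and with unit node weights, might look as follows (an illustrative sketch; the example graph is arbitrary):

```python
import numpy as np

def forman(W, i, j, w_node=None):
    """Forman curvature of edge (i, j); skips k with zero edge weight."""
    n = len(W)
    w = np.ones(n) if w_node is None else w_node
    s = 0.0
    for k in range(n):
        if k in (i, j):
            continue
        if W[i, k] > 0:
            s += w[i] / np.sqrt(W[i, j] * W[i, k])
        if W[k, j] > 0:
            s += w[j] / np.sqrt(W[i, j] * W[k, j])
    return W[i, j] * (w[i] / W[i, j] + w[j] / W[i, j] - s)

W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # triangle
print(forman(W, 0, 1))   # 4 - d_i - d_j = 0 on this unweighted graph
```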

9.2 Mean Curvature Flow

In Section 9.1 we saw that Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191] and El Chakik et al. [Reference Chakik, Abdallah and Desquesnes70] introduce very similar notions of graph mean curvature. They also both give definitions of graph MCF, but do so starting from different descriptions of MCF at the continuum level.

In [Reference Chakik, Abdallah and Desquesnes70] the inspiration comes from the level set description of MCF in the continuum [Reference Chen, Giga and Goto50, Reference Clarenz, Haußer, Rumpf, Voigt, Weikard and of Voigt56, Reference Evans and Spruck82, Reference Evans and Spruck83, Reference Evans and Spruck84, Reference Evans and Spruck85]: $\frac{du}{dt} = |\nabla u|\operatorname{div}\!\left(\frac{\nabla u}{|\nabla u|}\right)$, where the operators $\operatorname{div}$ and $\nabla$ here denote the continuum versions of the divergence and gradient, respectively. If $u$ satisfies this equation, then each level set of $u$ evolves according to flow by mean curvature. Since $\operatorname{div}\!\left(\frac{\nabla u}{|\nabla u|}\right)$ gives the curvature of the level sets of $u$, this leads [Reference Chakik, Abdallah and Desquesnes70] to introduce the following equations for MCF on graphs:

$$\frac{du_i}{dt} = 2^{\frac1p}\Big[(K(u))_i^+\,\big\|(\nabla u)^+\big\|_{i,p} - (K(u))_i^-\,\big\|(\nabla u)^-\big\|_{i,p}\Big], \quad\text{if } p\in[1,\infty), \qquad \frac{du_i}{dt} = (K(u))_i^+\,\big\|(\nabla u)^+\big\|_{i,\infty} - (K(u))_i^-\,\big\|(\nabla u)^-\big\|_{i,\infty}, \quad\text{if } p=\infty. \tag{9.2}$$

Recall the definitions of the positive and negative parts (superscripts ±) from Remark 3.3. Also recall the node-dependent p-norms from (3.3).

A similar MCF is used in El Bouchairi et al. [Reference Bouchairi, Abderrahim and Fadili69] and called power mean curvature flow, where each instance of $K(u)$ in (9.2) is replaced by $|K(u)|^{\alpha-1}K(u)$ for a parameter $\alpha\in[0,1]$ (not to be confused with $\alpha(\varepsilon)$ from Section 5). In particular, if $\alpha=1$, then [Reference Bouchairi, Abderrahim and Fadili69, formula (6.5)] is the same as (9.2).

If locally in time the mean curvature $(K(u))_i$ at node $i$ is positive, only the term $2^{\frac1p}(K(u))_i^+\|(\nabla u)^+\|_{i,p}$ or $(K(u))_i^+\|(\nabla u)^+\|_{i,\infty}$ remains in the right-hand side of the preceding equations, which leads to what [Reference Chakik, Abdallah and Desquesnes70] calls a discrete dilation process. Similarly, if locally in time the mean curvature $(K(u))_i<0$, only the terms with $(K(u))_i^-$ survive, leading to discrete erosion processes.

To understand those names, we recall from Remark 3.3 that [Reference Chakik, Abdallah and Desquesnes70, Reference Vinh-Thong, Elmoataz and Lézoray183] defines (nonlocal) dilation and erosion operators, NLD and NLE, respectively, in terms of the node-dependent $\infty$-norms. These allow us to rewrite the graph MCF equation in (9.2) for $p=\infty$ as

$$\frac{du_i}{dt} = (K(u))_i^+\,\mathrm{NLD}(u)_i + (K(u))_i^-\,\mathrm{NLE}(u)_i - u_i\Big[(K(u))_i^+ + (K(u))_i^-\Big].$$
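A minimal sketch of one explicit-Euler step of this $p=\infty$ flow is given below, with $q=1$ in the gradient, $\mathrm{sgn}(0)=1$ in $K$ as in (9.1), and the operators written out as $\mathrm{NLD}(u)_i = u_i + \|(\nabla u)^+\|_{i,\infty}$ and $\mathrm{NLE}(u)_i = u_i - \|(\nabla u)^-\|_{i,\infty}$; these normalization choices are assumptions made for illustration, not the cited authors' implementation.

```python
import numpy as np

def graph_mcf_step_inf(u, W, dt):
    """One explicit-Euler step of the p = infinity graph MCF (9.2)."""
    d = W.sum(axis=1)
    diff = u[None, :] - u[:, None]                  # (i, j) entry: u_j - u_i
    sgn = np.where(diff == 0, 1.0, np.sign(diff))   # sgn(0) := 1 as in (9.1)
    Ku = -(W * sgn).sum(axis=1) / d                 # graph mean curvature
    gplus = np.max(W * np.maximum(diff, 0.0), axis=1)    # ||(grad u)^+||_{i,inf}
    gminus = np.max(W * np.maximum(-diff, 0.0), axis=1)  # ||(grad u)^-||_{i,inf}
    NLD = u + gplus                                 # nonlocal dilation
    NLE = u - gminus                                # nonlocal erosion
    Kp, Km = np.maximum(Ku, 0.0), np.maximum(-Ku, 0.0)
    return u + dt * (Kp * NLD + Km * NLE - u * (Kp + Km))

# Example call on a path graph with a non-binary initial state.
W = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
print(graph_mcf_step_inf(np.array([0.0, 0.3, 0.9, 1.0]), W, 0.1))
```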

For further brief discussions of the mean curvature flow from [Reference Chakik, Abdallah and Desquesnes70], in particular the decrease of total variation along its trajectories, we refer to Budd and Van Gennip [Reference Budd and van Gennip35, remark 6.4] and Budd [Reference Budd33, section 7.5]. Variants of this graph MCF equation are studied in El Bouchairi [Reference Bouchairi68] and El Bouchairi et al. [Reference Bouchairi, Abderrahim and Fadili69].

In [Reference Nestor, Braxton and Bertozzi191] (we also refer to Van Gennip [Reference van Gennip187, appendix B.1]) a different, variational, formulation is given for MCF, inspired by the variational formulation for continuum mean curvature flow in Almgren et al. [Reference Fred, Taylor and Wang6] and Luckhaus and Sturzenhecker [Reference Luckhaus and Sturzenhecker132]. This is a discrete-in-time scheme generating a sequence of node subsets $S_k$, starting from an initial set $S_0\subseteq V$, as follows:

$$S_{k+1} \in \operatorname*{argmin}_{S\subseteq V}\; \mathrm{TV}(\chi_S) + \frac{1}{\eth t}\big\langle \chi_S - \chi_{S_k},\, (\chi_S - \chi_{S_k})\, d_{\partial S_k}\big\rangle_{\mathcal V}. \tag{9.3}$$

Here $d_{\partial S_k}$ denotes the graph distance function (see Section 3.3) to $\partial S_k$, the graph boundary of $S_k$, which is defined by

$$\partial S_k := \big\{ i\in V : \exists\, j\sim i \text{ such that } (\chi_{S_k})_i \ne (\chi_{S_k})_j \big\},$$

and $\eth t>0$ is a time-step parameter.

Besides its cosmetic resemblance to the continuum variational formulation of MCF in [Reference Fred, Taylor and Wang6, Reference Luckhaus and Sturzenhecker132], the definition for MCF in (9.3) has the desired property that any sequence $(S_k)$ satisfying it does not increase total variation. To be precise, since $S=S_k$ is admissible in the minimization problem in (9.3), we see that

$$\mathrm{TV}(\chi_{S_{k+1}}) \le \mathrm{TV}(\chi_{S_{k+1}}) + \frac{1}{\eth t}\big\langle \chi_{S_{k+1}} - \chi_{S_k},\, (\chi_{S_{k+1}} - \chi_{S_k})\, d_{\partial S_k}\big\rangle_{\mathcal V} \le \mathrm{TV}(\chi_{S_k}).$$

It is not true, however, that $\mathrm{TV}(\chi_{S_{k+1}})=\mathrm{TV}(\chi_{S_k})$ only if $S_{k+1}=S_k$. Consider the following counterexample. Let $G=(V,E,\omega)$ be the graph with $V=\{1,2,3,4\}$, $\omega_{12}=\omega_{21}=\omega_{23}=\omega_{32}=\omega_{34}=\omega_{43}=1$, $\omega_{ij}=0$ in all other cases, and $E$ contains $(i,j)$ if and only if $\omega_{ij}\ne 0$. If $S_k=\{1,2\}$, then $\partial S_k=\{2,3\}$ and thus $(d_{\partial S_k})_1=(d_{\partial S_k})_4=1$ and $(d_{\partial S_k})_2=(d_{\partial S_k})_3=0$ (recall from Remark 3.9 that we have chosen $q=1$). Moreover, $d_1^r=d_4^r=1$ and $d_2^r=d_3^r=2^r$. For this example, the second term in the objective function in (9.3) thus becomes

$$\frac{1}{\eth t}\big\langle \chi_S - \chi_{\{1,2\}},\, (\chi_S - \chi_{\{1,2\}})\, d_{\partial S_k}\big\rangle_{\mathcal V} = \frac{1}{\eth t}\Big[\big((\chi_S)_1 - 1\big)^2 + (\chi_S)_4^2\Big].$$

Since $\mathrm{TV}(\chi_{S_k})=1$, we know that $\mathrm{TV}(\chi_{S_{k+1}})\le 1$. Furthermore, if $S\in\{\emptyset,V\}$, then $\mathrm{TV}(\chi_S)=0$, and if $S\notin\{\emptyset,V\}$, then $\mathrm{TV}(\chi_S)\ge 1$. In the former case, the objective function has value $\frac{1}{\eth t}$. Looking for potential minimizers $S$ in (9.3) that are different from $\emptyset$ and $V$, we know that $\mathrm{TV}(\chi_S)=1$, thus $S\in\big\{\{1\},\{4\},\{1,2\},\{3,4\},\{1,2,3\},\{2,3,4\}\big\}$. Among these candidates, only those that minimize the second term of the objective functional are potential minimizers of (9.3), thus we require $1\in S$ and $4\notin S$. This leaves $S\in\big\{\{1\},\{1,2\},\{1,2,3\}\big\}$. For each of these candidates the objective function has value $1$ and thus, if $\eth t<1$, all these candidates are solutions $S_{k+1}$ of (9.3). In particular we note that two of these candidates are different from $S_k$, yet $\mathrm{TV}(\chi_{S_{k+1}})=\mathrm{TV}(\chi_{S_k})=1$.
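The following brute-force sketch verifies this counterexample numerically by enumerating all sixteen subsets of $V$ (an illustrative check, feasible only because the graph is tiny):

```python
import itertools
import numpy as np

W = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
n, dt = 4, 0.5                             # any time step dt < 1 shows the effect
chi_k = np.array([1.0, 1.0, 0.0, 0.0])     # S_k = {1, 2} (1-based labels)
d_bdry = np.array([1.0, 0.0, 0.0, 1.0])    # graph distance to bdry(S_k) = {2, 3}

def TV(chi):                               # total variation with q = 1
    return 0.5 * np.sum(W * np.abs(chi[:, None] - chi[None, :]))

best, minimizers = np.inf, []
for bits in itertools.product([0.0, 1.0], repeat=n):
    chi = np.array(bits)
    # <v, v d_bdry>_V with v = chi - chi_k; the degree factors d_i^r play no
    # role here because d_bdry is nonzero only at the two degree-one nodes
    obj = TV(chi) + np.sum((chi - chi_k) ** 2 * d_bdry) / dt
    if obj < best - 1e-12:
        best, minimizers = obj, [bits]
    elif abs(obj - best) < 1e-12:
        minimizers.append(bits)

print(best, minimizers)   # value 1 at {1}, {1,2} and {1,2,3} (1-based labels)
```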

Mean curvature flow of a form very similar to (9.3) is analysed on square grids by Braides et al. in [Reference Braides, Gelli and Novaga29].

In light of the close connection between the MBO scheme (and the Allen–Cahn equation) and MCF in the continuum that was briefly touched upon in Section 8.1 (and Remark 7.4) – we refer to section 3.7.6 of [Reference van Gennip and Budd190] for more details – a natural question is whether such a connection also exists in the graph setting. In [Reference Budd and van Gennip35, section 6.2] (and in more detail in [Reference Budd33, section 3]) it is argued that a close connection between the graph MBO scheme and the variational graph MCF from (9.3) is not to be expected.

A first reason for this is that, even though when the time step is large (i.e., large $\tau$ for MBO and large $\eth t$ for MCF) each scheme will reach a stationary state equal to $u=0$ ($S=\emptyset$) or $u=1$ ($S=V$) after a single time step, the conditions governing which of these two states is chosen are different for MBO and MCF.

Since the main interest in these schemes is at small time steps, this first reason by itself would not necessarily be enough to abandon the formulation of MCF from (9.3). The other reason provided in [Reference Budd33, section 3] is more damning, if our intention is to connect MBO and MCF. Given a graph $G=(V,E,\omega)$ (satisfying the conditions described in Section 3.1) and $\eta>0$, we can consider the $\eta$-completion of $G$, which we define to be the graph $G_\eta=(V,V^2,\omega^\eta)$, where $\omega^\eta_{ij}=\omega_{ij}$ if $\omega_{ij}>0$ and $\omega^\eta_{ij}=\eta$ if $i\ne j$ and $\omega_{ij}=0$. In [Reference Budd33, theorem 7.3.1] it is shown that the difference (in $\mathcal V$-norm) between solutions of the heat equation on $G$ and on $G_\eta$ is bounded by $C_\eta t e^{Ct}$, where $C>0$ is a constant and $0<C_\eta=O(\eta)$ as $\eta\to 0$.Footnote 61 Unless in very specific circumstances in which this small difference places both solutions on different sides of the threshold value in the MBO scheme, this shows that the MBO scheme will have very similar, if not the same, solutions on $G$ and $G_\eta$. A similar conclusion is shown to hold for solutions of the graph Allen–Cahn equation.Footnote 62 On the other hand, however, if $S_k\notin\{\emptyset,V\}$, then the graph boundary $\partial S_k$ on $G_\eta$ is equal to $V$, thus $d_{\partial S_k}=0$ and (9.3) reduces to $S_{k+1}\in\operatorname{argmin}_{S\subseteq V}\mathrm{TV}(\chi_S)=\{\emptyset,V\}$. For most graphs $G$, this behaviour of MCF on $G_\eta$ will differ significantly from the behaviour of MCF (according to (9.3)) on $G$. In other words, while the behaviour of solutions to the Allen–Cahn equation or MBO scheme is expected to be very stable under the $\eta$-completion of $G$, the behaviour of solutions of (9.3) is not. Simple redefinitions of the graph boundary, for example as $\partial S_k\cap S_k$ or $\partial S_k\cap(V\setminus S_k)$, do nothing to alter this conclusion.

A promising alternative is offered in [Reference Budd and van Gennip35, section 6.2] and [Reference Budd33, section 7.4]:

$$S_{k+1} \in \operatorname*{argmin}_{S\subseteq V}\; \mathrm{TV}(\chi_S) + \frac{1}{\tau}\Big\| e^{-\frac12\tau\Delta}\big(\chi_S - \chi_{S_k}\big)\Big\|_{\mathcal V}^2. \tag{9.4}$$

Just as we saw for solutions of (9.3), solutions of (9.4) satisfy $\mathrm{TV}(\chi_{S_{k+1}})\le\mathrm{TV}(\chi_{S_k})$. Unlike before, we now also have that equality is achieved if and only if $S_{k+1}=S_k$, since $e^{-\frac12\tau\Delta}$ is an invertible operator and thus equality in

$$\mathrm{TV}(\chi_{S_{k+1}}) \le \mathrm{TV}(\chi_{S_{k+1}}) + \frac{1}{\tau}\Big\| e^{-\frac12\tau\Delta}\big(\chi_{S_{k+1}} - \chi_{S_k}\big)\Big\|_{\mathcal V}^2 \le \mathrm{TV}(\chi_{S_k})$$

cannot be achieved unless $\chi_{S_{k+1}}-\chi_{S_k}=0$ [Reference Budd33, proposition 7.4.1].
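For very small graphs the update (9.4) can be evaluated exactly by exhaustive search, as in the following sketch (with $r=0$, so that the $\mathcal V$-norm is Euclidean; this choice, and the use of a dense matrix exponential, are made purely for illustration):

```python
import itertools
import numpy as np
from scipy.linalg import expm

def mcf_step_94(W, chi_k, tau):
    """One update of (9.4) by exhaustive search (r = 0: Euclidean V-norm)."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    E = expm(-0.5 * tau * L)                # e^{-tau Delta / 2}
    n = len(W)

    def TV(chi):                            # TV(chi_S) with q = 1
        return 0.5 * np.sum(W * np.abs(chi[:, None] - chi[None, :]))

    best, best_chi = np.inf, None
    for bits in itertools.product([0.0, 1.0], repeat=n):
        chi = np.array(bits)
        obj = TV(chi) + np.sum((E @ (chi - chi_k)) ** 2) / tau
        if obj < best:
            best, best_chi = obj, chi
    return best_chi

W = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
print(mcf_step_94(W, np.array([1.0, 1.0, 0.0, 0.0]), tau=0.5))
```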

The main objection against the formulation in (9.3) does not apply to the formulation in (9.4). The total variation term in the objective function in (9.4) depends linearly on the edge weights and thus will differ only by an O(η) term between graphs G and Gη; the second term in the objective function depends on the solution of the graph diffusion equation and thus, by the argument from [Reference Budd33, theorem 7.3.1] discussed earlier, will also only differ by an O(η) term.

The other, minor, objection which was brought against (9.3) also does not apply here. It is argued in [Reference Budd33, section 7.4] that the behaviour at large $\tau$ of the scheme in (9.4) is very similar to that of the MBO scheme at large $\tau$.

In section 6.3 of [Reference van Gennip and Budd190] we compare MCF defined by (9.4) more closely with the MBO scheme.

10 Freezing of Allen–Cahn, MBO, and Mean Curvature Flow

Freezing (or pinning) is the phenomenon where for very small values of the parameters $\varepsilon$ (in the graph Allen–Cahn equation), $\tau$ (in the graph MBO scheme), or $\eth t$ or $\tau$ (in the variational graph mean curvature flow) no, or only trivial, dynamics occur. (See also Footnote 54.)

In the graph Allen–Cahn equation, if the parameter ε is very small, the potential term dominates the right-hand side of the equation, leading to dynamics in which each value ui simply settles into the well of W in which it starts. Rigorous quantitative statements are made in Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, theorem 5.3], Budd and Van Gennip [Reference Budd and van Gennip35, remark 4.7], Budd and van Gennip [Reference Budd and van Gennip36, section 3.3], and Budd [Reference Budd33, theorem 3.4.11]. We collect them in the following lemma.

We define $\rho_\Delta$ to be the spectral radius of $\Delta$, namely the maximal eigenvalue of $\Delta$. In general, we write $\rho_O$ for the spectral radius of a linear operator $O:\mathcal V\to\mathcal V$.

Lemma 10.1. Let $W(x)=(x-1)^2(x+1)^2$. Let $u^0\in\mathcal V$ and let $u$ solve

$$\frac{du_i}{dt} = -(\Delta u)_i - \frac{1}{\varepsilon} d_i^{-r} W'(u_i), \quad\text{for all } i\in V \text{ and } t>0, \qquad u(0)=u^0.$$

Then there exists a $C>0$ such that, for all $t\ge 0$, $\|u(t)\|_{\mathcal V}\le C$. Now assume additionally that there exists an $\alpha\in(0,1)$ such that, for all $i\in V$, $|u^0_i|\ge\alpha$. If

$$\varepsilon \le C^{-1}\rho_\Delta^{-1}\,\frac{4\alpha(1-\alpha^2)}{\max_{i\in V} d_i^{r/2}}, \quad\text{or}\quad \varepsilon \le \Big(\sup_{t\ge 0}\big\|\Delta u(t)\big\|_{\mathcal V,\infty}\Big)^{-1}\frac{4\alpha(1-\alpha^2)}{\max_{i\in V} d_i^{r}},$$

then, for all $i\in V$, $t\mapsto\mathrm{sgn}(u_i(t))$ is constant.

Now assume that $W$ is the double-obstacle potential from (5.1) instead and that $T$ is some interval. Let $S$ be a strict and nonempty subset of $V$. Then $u(t)=\chi_S$, for all $t\in T$, is a solution of (7.2) if and only if

$$\varepsilon \le \tfrac12\big\|\Delta\chi_S\big\|_{\mathcal V,\infty}^{-1}.$$

Similarly $u(t)=\chi_S$ solves the mass-conserving Equation (7.4) if and only if

$$\varepsilon\max_{i\in S^c}\big|(\Delta\chi_S)_i\big| \le 1 - \varepsilon\max_{i\in S}\big|(\Delta\chi_S)_i\big|.$$

This latter condition is satisfied if $\varepsilon\le\tfrac12\|\Delta\chi_S\|_{\mathcal V,\infty}^{-1}$.

Lastly, $u(t)=\chi_S$ solves the fidelity-forced equation (7.3) if and only if

$$\varepsilon\max_{i\in S^c}\big(\big|(\Delta\chi_S)_i\big| + f_i\big) \le \frac12 \quad\text{and}\quad \varepsilon\max_{i\in S}\big(\big|(\Delta\chi_S)_i\big| - f_i + \mu_i\big) \le \frac12.$$
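The double-well part of Lemma 10.1 is easy to observe numerically. The sketch below integrates the graph Allen–Cahn equation on a small triangle graph with forward Euler (the graph, the value $\alpha=0.5$, and all step sizes are illustrative assumptions) and shows that for small $\varepsilon$ each $u_i$ retains the sign of $u_i^0$:

```python
import numpy as np

W_adj = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]], dtype=float)    # triangle graph
d = W_adj.sum(axis=1)
L = np.diag(d) - W_adj                        # combinatorial Laplacian
r = 1                                         # degree exponent in d_i^{-r}

def Wprime(x):                                # W(x) = (x-1)^2 (x+1)^2
    return 4.0 * x * (x**2 - 1.0)

u = np.array([0.6, -0.5, 0.7])                # |u_i^0| >= alpha = 0.5
eps, h = 1e-3, 1e-5                           # small epsilon; small Euler step
for _ in range(200_000):
    u = u + h * (-(L @ u) / d**r - Wprime(u) / (eps * d**r))

print(u)    # close to [1, -1, 1]: every node settled into its initial well
```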

For the graph MBO scheme, the pinning or freezing phenomenon appears in a different guise. If the time parameter τ in the diffusion step is so small that at every node the local function value does not cross the threshold value, then the thresholding step will return the initial value and hence the scheme is stationary. The critical value of τ below which this behaviour is produced, depends not only on the structure of the graph, but also on the initial condition that is fed into the diffusion step. After all, it is known that for fixed τ the graph MBO schemes converge in a finite number of steps [Reference Nestor, Braxton and Bertozzi191, proposition 4.6] and so eventually a stationary state will be reached which, for the given τ, will be ‘frozen’ by the scheme. In practical applications, however, one wants to choose a value of τ that does not immediately freeze the initial condition that is given at the start of the first iteration of the scheme. The following lemma collects bounds on the critical value for τ for various variants of the graph MBO scheme. These bounds are not sharp and to the best of the authors’ knowledge, sharp bounds are not currently known. Similar restrictions on the size of τ were already discussed in the context of a finite-difference discretization of the continuum MBO scheme in Ruuth [Reference Ruuth173, section 2.3]. The bounds listed in the next lemma are taken from Van Gennip et al. [Reference Nestor, Braxton and Bertozzi191, theorem 4.2], Budd and Van Gennip [Reference Budd and van Gennip35, theorem 4.5 and remark 4.6], Budd and Van Gennip [Reference Budd and van Gennip36, theorem 4.24] (for the mass-conserving scheme), Budd [Reference Budd33, section 4.3.2] (for both the mass-conserving scheme and the fidelity-forced scheme), Keetch and Van Gennip [Reference Keetch and van Gennip117, theorem 5.3] (for the signless scheme), and Keetch [Reference Keetch116, theorem 4.2.3] (also for the signless scheme). The references [Reference Budd33, Reference Budd and van Gennip35, Reference Budd and van Gennip36] actually provide information about a whole family of schemes, of which graph MBO is only one member. These so-called semidiscrete implicit Euler (SDIE) schemesFootnote 63 are the main objects of study in chapter 6 of [Reference van Gennip and Budd190] and we briefly address the topic of freezing then and there.

To ease notation, we define $A:\mathcal V\to\mathcal V$ by $Au := \Delta u + \mu u$. Recall that $\rho_A$ and $\rho_{\Delta^\sigma}$ are the spectral radii of $A$ and the signless Laplacian $\Delta^\sigma$, respectively.

Lemma 10.2. Let $k\in\mathbb N_0$, let $S\subseteq V$ be such that $\emptyset\ne S\ne V$, and let $u^k=\chi_S$. Let $u^{k+1}$ be the result of one iteration of either the graph MBO scheme or the mass-conserving graph MBO scheme applied to initial condition $u^k$. If

$$\tau < \tfrac12\big\|\Delta\chi_S\big\|_{\mathcal V,\infty}^{-1}$$

or

$$\tau < \rho_\Delta^{-1}\log\left(1 + \frac{\frac12\min_{i\in V} d_i^{r/2}}{(\operatorname{vol} S)^{1/2}}\right), \tag{10.1}$$

then $u^{k+1}=u^k$.

Alternatively, let $S\subseteq V$ ($S=\emptyset$ or $S=V$ are now allowed) and let $u^{k+1}$ be the result of one iteration of the fidelity-forced graph MBO scheme applied to initial condition $u^k$. If

$$\tau < \frac12\left(\sup_{t\in[0,\infty)}\big\| e^{-tA}\chi_V\big\|_{\mathcal V,\infty}\,\big\| A\chi_S - f\big\|_{\mathcal V,\infty}\right)^{-1}$$

or

$$\tau < \rho_A^{-1}\log\left(1 + \frac{\frac12\min_{i\in V} d_i^{r/2}}{(\operatorname{vol} S)^{1/2} + \rho_A^{-1}\|f\|_{\mathcal V,1}}\right),$$

then $u^{k+1}=u^k$.

Finally, let $S\subseteq V$ and $u^k=\chi_S-\chi_{S^c}$, and let $u^{k+1}$ be the result of one iteration of the signless graph MBO scheme applied to initial condition $u^k$. If

$$\tau < \rho_{\Delta^\sigma}^{-1}\log\left(1 + \frac{\min_{i\in V} d_i^{r/2}}{(\operatorname{vol} V)^{1/2}}\right),$$

then $u^{k+1}=u^k$.

If we let $f=0$ and $A=\Delta$ in the fidelity-forced case, we observe that we recover the conditions for the graph MBO scheme without forcing.

We note that the restriction $\emptyset\ne S\ne V$ in the first part of Lemma 10.2 does not diminish the generality of the lemma, since $\chi_\emptyset$ and $\chi_V$ are stationary states of the (mass-conserving) graph MBO scheme for any $\tau>0$.
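The bound (10.1) is straightforward to test numerically. In the sketch below (illustrative graph and subset; $r=1$, so that $\operatorname{vol} S$ is the sum of the degrees in $S$, and the bound is used as reconstructed above) we take $\tau$ slightly below the bound, perform one diffusion-plus-threshold iteration, and observe pinning:

```python
import numpy as np
from scipy.linalg import expm

W_adj = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
d = W_adj.sum(axis=1)
r = 1
L = np.diag(1.0 / d) @ (np.diag(d) - W_adj)   # random-walk Laplacian (r = 1)
rho = np.max(np.abs(np.linalg.eigvals(L)))    # spectral radius rho_Delta

chi_S = np.array([1.0, 1.0, 0.0, 0.0])
volS = np.sum(d * chi_S)                      # vol S with r = 1
tau = 0.9 / rho * np.log(1 + 0.5 * np.min(d**(r / 2)) / np.sqrt(volS))

u = expm(-tau * L) @ chi_S                    # one diffusion step
u_next = (u > 0.5).astype(float)              # threshold step
print(np.array_equal(u_next, chi_S))          # True: the scheme is pinned
```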

For the signless graph MBO scheme only one condition for $\tau$ is given in Lemma 10.2, whereas two conditions are given for the other two schemes. To mimic one of the proofs from the other schemes to try and derive a second condition for the signless scheme, we would require a comparison principle (as in [Reference Nestor, Braxton and Bertozzi191, lemma 2.6(d)]; see also [Reference Budd33, theorem 3.2.6]) for the signless graph Laplacian $\Delta^\sigma$ to produce an analogue of the proof of [Reference Nestor, Braxton and Bertozzi191, theorem 4.2], or we would require $e^{-t\Delta^\sigma}$ to have a matrix representation with only nonnegative elements to mimic the proof from [Reference Budd33, lemma 4.3.4]. Neither is true; for a counterexample see section 3.4 of [Reference van Gennip and Budd190]. (Of course, the fact that mimicking the proofs from the other schemes does not work does not prove that a similar second condition fails to hold.)

Remark 10.3. In section 6.1 of [Reference van Gennip and Budd190] we present variational formulations of the graph MBO scheme and its fidelity-forced and mass-conserving variants. The functions generated by these schemes are solutions of these variational problems, but, due to some subtle nonuniqueness issues that are explained in that book, these are not the only solutions. Despite that, the results presented in Lemma 10.2 for these three graph MBO schemes in fact hold for all solutions of their variational formulations.

In [Reference Nestor, Braxton and Bertozzi191, theorem 4.8] (with an updated proof in [Reference van Gennip187, appendix A]) a condition for pinning in the graph MBO scheme at a specific node was given that depended only on the local graph structure around that node. Unfortunately, later in [Reference van Gennip187, theorem 3.1], it was proven that this is an ‘empty’ condition, namely that it cannot be satisfied by any graph. In [Reference van Gennip187, sections 4 and 5] ‘nonempty’ local conditions are provided specifically for star graphs and regular trees.

Pinning occurs in the MBO scheme if $\tau$ is chosen to be small; on the other hand, if $\tau$ is large, another example of trivial dynamics may occur, as one application of the diffusion step could lead to a state $u(\tau)$ which is close to the constant steady state $\frac{\mathcal M(u^0)}{\operatorname{vol} V}\chi_V$ and hence the subsequent threshold step would return either $u^1=\chi_V$ or $u^1=\chi_\emptyset$, which are stationary states of the MBO scheme. The following lemma gives a precise result (originally published as [Reference Nestor, Braxton and Bertozzi191, theorem 4.3]). We write $\lambda_2$ for the second smallest eigenvalue of $\Delta$ (i.e., the algebraic connectivity; see Section 6).Footnote 64

Lemma 10.4. Let $k\in\mathbb N_0$, $S\subseteq V$, $u^k=\chi_S$, and assume that $R_S := \frac{\mathcal M(u^k)}{\operatorname{vol} V} \ne \frac12$. Let $u^{k+1}$ be the result of one iteration of the graph MBO scheme applied to initial condition $u^k$. If

$$\tau > \lambda_2^{-1}\log\left(\frac{(\operatorname{vol} S)^{1/2}(\operatorname{vol} S^c)^{1/2}}{(\operatorname{vol} V)^{1/2}\,\big|R_S-\frac12\big|\,\min_{i\in V} d_i^{r/2}}\right), \tag{10.2}$$

then

$$u^{k+1} = \begin{cases}\chi_V, & \text{if } R_S > \frac12,\\ \chi_\emptyset, & \text{if } R_S < \frac12.\end{cases}$$

It is natural to ask if there is a gap between the lower bound of (10.1) below which pinning occurs in the MBO scheme and the upper bound of (10.2) above which the dynamics is trivial in the way described in Lemma 10.4. The following result from [Reference Nestor, Braxton and Bertozzi191, theorem 4.4] (with corrections to the proof in [Reference van Gennip187, appendix B.2]; see also the discussion in [Reference Budd33, section 7.3.2]) answers this affirmatively.

Lemma 10.5. If $\frac{\lambda_2}{\rho_\Delta} < \frac{\log 2}{2\log(3/2)} \approx 0.85$, then there exists a $\tau$ which satisfies neither the inequality in (10.1) nor the inequality in (10.2).

We emphasize that there is no reason to believe that the inequality conditions in the lemmas above are sharp. In particular, Lemma 10.5 does not preclude the possibility that, for a particular given graph and particular given initial condition uk, every choice of τ>0 leads to either the behaviour described in Lemma 10.2 or the behaviour from Lemma 10.4.

In Boyd et al. [Reference Boyd, Bae, Tai and Bertozzi27, propositions 4.1 and 4.2], results similar to some of those in Lemmas 10.2 and 10.4 are presented for an MBO scheme with a generalized diffusion step.Footnote 65 That particular scheme is used for modularity optimization (see section 4.1.4 of [Reference van Gennip and Budd190]). The practical suggestion in [Reference Boyd, Bae, Tai and Bertozzi27, section 4.5] is to choose τ as the geometric mean of the (trivial-dynamics-avoiding) upper and lower bounds that are obtained for τ.

We end this section by taking a brief look at the behaviour of graph mean curvature flow for small $\tau$ and large $\tau$. As explained in Section 9.2 there is currently not one main definition for graph MCF, and the behaviour at small or large time steps will depend on which form of the scheme is chosen. For some notes on the small time-step behaviour of some of the variational graph MCF formulations we refer the interested reader to [Reference Nestor, Braxton and Bertozzi191, remark 3.10] and [Reference Budd33, theorem 7.4.2 and note 38], and for information about behaviour at large time steps to [Reference Budd33, sections 7.3.2 and 7.4]. We end by giving some results from [Reference Budd33, section 7.4] for the graph MCF defined in (9.4) in detail.

We require the Lambert $W$-function [Reference Corless, Gonnet, Hare, Jeffrey and Knuth57], which we denote by $W_L$. We note that, if $x\in[0,\infty)$ (as is the relevant case in Lemma 10.6), then $W_L$ is the unique function on $[0,\infty)$ that satisfies $W_L(x)e^{W_L(x)}=x$. To make the following bound more intuitive, we note the following simple approximation for $W_L(x)$ when $x\ge 0$, which is given in Iacono and Boyd [Reference Roberto and Boyd107]:

$$W_L(x) \approx \log\left(\frac{1+x}{1+\frac12\log(1+x)}\right).$$
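As a quick sanity check (a sketch using SciPy's implementation of the principal branch), one can compare this approximation with the exact values:

```python
import numpy as np
from scipy.special import lambertw

x = np.array([0.1, 1.0, 10.0, 100.0])
exact = lambertw(x).real          # principal branch; real-valued for x >= 0
approx = np.log((1 + x) / (1 + 0.5 * np.log(1 + x)))
print(np.max(np.abs(exact - approx)))   # small uniform error for x >= 0
```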

Lemma 10.6. Let $k\in\mathbb N$, $S_k\subseteq V$, and let $S_{k+1}$ be a graph MCF update defined by (9.4). If

$$\tau e^{\tau\rho_\Delta} < \frac{\min_{i\in V} d_i^{r}}{\max_{S\subseteq V}\mathrm{TV}(\chi_S)}, \quad\text{or equivalently,}\quad \tau < \rho_\Delta^{-1} W_L\!\left(\frac{\rho_\Delta\min_{i\in V} d_i^{r}}{\max_{S\subseteq V}\mathrm{TV}(\chi_S)}\right),$$

then $S_{k+1}=S_k$.

Now assume $\tau > \frac{\operatorname{vol} V}{\min_{(i,j)\in E}\omega_{ij}}$; then $S_{k+1}\in\{\emptyset,V\}$. If $\operatorname{vol} S_k < \frac12\operatorname{vol} V$, then $S_{k+1}=\emptyset$. Alternatively, if $\operatorname{vol} S_k > \frac12\operatorname{vol} V$ instead, then $S_{k+1}=V$.

We note that the volume conditions on $S_k$ in the 'large $\tau$' case of Lemma 10.6 correspond to the conditions on $R_S$ in Lemma 10.4 which decided the 'large $\tau$' behaviour of the MBO scheme.

The results we have mentioned thus far have all been obtained by studying the discrete schemes directly and all provide parameter regions in which freezing is guaranteed to happen. A different approach is to consider the continuum limit of the graph-based MBO scheme and obtain parameter scalings under which the limiting dynamics is not frozen, as a way to guide parameter choices to avoid freezing. One example in the literature is Misiats and Yip [Reference Misiats and Yip149], which considers three regimes for the parameter $\tau$ for MBO on a two-dimensional square grid with grid size $h$. With $h=C\tau^{\gamma}$, for some constant $C>0$, the regimes are $\gamma>1$, in which the limiting dynamics is mean curvature flow; $\gamma<1$, in which the limiting dynamics is frozen; and the critical case $\gamma=1$ in which the dynamics depends on finer details. Another example is Laux and Lelmi [Reference Laux and Lelmi, theorem 6 and corollary 8], which studies the continuum limit of (a generalization of) the graph MBO scheme on random geometric graphs and guarantees nontrivial limit dynamics if $\tau\to 0$ such that $\lim_{|V|\to\infty}\big[(\log(|V|))^{\alpha}\tau\big]^{-1}=0$ and $\lim_{|V|\to\infty}\varepsilon(\log(|V|))^{\beta}=0$, for constants $\alpha>0$ (not to be confused with $\alpha(\varepsilon)$ from Section 5) and $\beta>0$ in a specified range. Here $\varepsilon$ is the length scale used to construct the edge weights in the graph. (More details about the construction of the random geometric graphs are given in section 7.2 of [Reference van Gennip and Budd190].) Interestingly, the convergence to limiting dynamics remains valid even if the spectral decomposition of the graph Laplacian is truncated after the lowest $K$ eigenvalues and their corresponding eigenfunctions, as is commonly done in implementations to keep the scheme computationally manageable and to filter out noise that may be present in the higher eigenfunctions (see also chapter 5 of [Reference van Gennip and Budd190]). In this case $K\ge(\log(|V|))^{q}$ is required, for $q>0$ (unrelated to the parameter $q$ from Section 3) in a specified range.

11 Multiclass Extensions

The models we have discussed so far all describe exactly two phases (or classes, or clusters), which allows for a characterization of the state of the system via a single (approximately) binary-valued function u. Many of these models can be extended to incorporate multiple classes; in this section we will discuss some of these extensions, specifically for the Ginzburg–Landau functional, Allen–Cahn equation, and MBO scheme. (In Shen and Kang [Reference Shen and Kang175] total variation for functions on a continuum domain and with a discrete multiclass range (quantum TV) is investigated for image processing applications.)

The two-phase nature of the functional GLε is the result of the double-well potential and double-obstacle potential each achieving their minimum value in two places (‘wells’). For a multiphase version of the Ginzburg–Landau functional, we thus require a potential with multiple minimizers.

One approach is using a potential function $\tilde W:\mathbb R\to\mathbb R$ with multiple wells instead of $W$, as is done with a periodic-well potential in Garcia-Cardona et al. [Reference Cristina, Arjuna, Percus and Marsico93; Reference Cristina, Arjuna, Percus, of Fred, Ana and Marsico94]. The one-dimensional domain of $\tilde W$ has the advantage that the state of the system can still be described by a single real-valued function $u:V\to\mathbb R$; but having more than two wells in the one-dimensional domain necessarily creates an asymmetry in the distances between the wells. For example, if the potential has wells located at $-1$, $0$, and $1$, then the distance between the first and third wells is twice that between the first and second wells. Combined with the Dirichlet energy term in the Ginzburg–Landau functional, this creates an unwanted bias in favour of interfaces between the $-1$- and $0$-phases (or between the $0$- and $1$-phases) over interfaces between the $-1$- and $1$-phases. In [Reference Cristina, Arjuna, Percus and Marsico93; Reference Cristina, Arjuna, Percus, of Fred, Ana and Marsico94] this problem is resolved by the use of a generalized difference function in the Dirichlet energy term, which compensates for the asymmetry.

A second approach, which has found more traction in the literature, is using a potential function on a higher-dimensional domain, as in Merkurjev et al. [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143] and Garcia-Cardona et al. [Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95]. The state of a system with $K$ phases is now described by a vector-valued function $u:V\to\mathbb R^K$. We denote the set of such functions by $\mathcal V^K:=\{u:V\to\mathbb R^K\}$ and the $k$th component of the value of such a function $u$ at node $i$ by $(u_i)_k\in\mathbb R$. Just as there is a bijective correspondence between functions $u\in\mathcal V$ and vectors in $\mathbb R^n$ (see Remark 3.4), there is also a bijective correspondence between functions $u\in\mathcal V^K$ and $n$-by-$K$ real matrices $U$, with entries $U_{ij}=(u_i)_j$, that is, $U_{ij}$ is the $j$th entry of the vector $u_i\in\mathbb R^K$. We write $u^j$ for the $j$th component function of $u$, that is, $(u^j)_i=(u_i)_j$. The $K$ phases are represented by the standard basis vectorsFootnote 66 $e_k\in\mathbb R^K$ (i.e., $(e_k)_j=1$ if $k=j$ and $(e_k)_j=0$ if $k\ne j$), with $k\in[K]$, and thus the potential function should have wells located at the corners of the $K$-dimensional Gibbs simplex

$$\Sigma^K := \left\{x\in[0,1]^K : \sum_{k=1}^K x_k = 1\right\}. \tag{11.1}$$

If $E^K:=\{e_1,\ldots,e_K\}$ is the set of standard basis vectors in $\mathbb R^K$, then

$$\mathcal V_{E^K} := \big\{u : V\to E^K\big\}$$

is the set of node functions that take values on the corners of the simplex. Hence, if $u\in\mathcal V_{E^K}$ with matrix representation $U\in\mathbb R^{n\times K}$ and we write $U_{i*}$ for the $i$th row of $U$, then, for all $i\in[n]$, $U_{i*}\in E^K$.

In [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95] the multiclass potential

$$W^K(x) := \frac12\prod_{k=1}^K \frac14\big\|x-e_k\big\|_1^2$$

is used for $x\in\mathbb R^K$, where $\|x\|_1:=\sum_{i=1}^K|x_i|$ and, more generally, $\|x\|_p:=\left(\sum_{i=1}^K|x_i|^p\right)^{\frac1p}$ for $p\in[1,\infty)$ and $x\in\mathbb R^K$. The Dirichlet energy term now extends straightforwardly to $K$ classes:Footnote 67

$$\frac12\sum_{k=1}^K\big\|\nabla u^k\big\|_{\mathcal E}^2,$$

where in matrix representation $u^k$ is represented by the $k$th column of $U$. Applying these multiclass extensions to the graph Ginzburg–Landau functional from Section 5, we arrive at the following multiclass graph Ginzburg–Landau functional $\mathrm{GL}^K_\varepsilon:\mathcal V^K\to\mathbb R$:

$$\mathrm{GL}^K_\varepsilon(u) := \frac12\alpha(\varepsilon)\sum_{k=1}^K\big\|\nabla u^k\big\|_{\mathcal E}^2 + \frac1\varepsilon W^K(u),$$

where $W^K(u):=\sum_{i\in V}W^K(u_i)$ or $W^K(u):=\sum_{i\in V}d_i^r W^K(u_i)$ (see Section 5 for a discussion about these two options in the context of the two-phase functional $\mathrm{GL}_\varepsilon$ and about $\alpha(\varepsilon)$; also see Remark 7.1).

In the case when $K=2$, $\mathrm{GL}^K_\varepsilon$ is not equal to $\mathrm{GL}_\varepsilon$. We can, however, derive a connection between the two under the assumption that $u^1=v$ and $u^2=1-v$, for some $v\in\mathcal V$. We note that this is equivalent to requiring that $u$ takes values in $\Sigma^2$, and that node $i$ being in one of the pure phases, namely $u_i=e_1\in\mathbb R^2$ or $u_i=e_2\in\mathbb R^2$, corresponds to $v_i=1$ or $v_i=0$, respectively. Then we compute

$$\frac12\Big(\big\|\nabla u^1\big\|_{\mathcal E}^2 + \big\|\nabla u^2\big\|_{\mathcal E}^2\Big) = \frac12\Big(\big\|\nabla v\big\|_{\mathcal E}^2 + \big\|\nabla(1-v)\big\|_{\mathcal E}^2\Big) = \big\|\nabla v\big\|_{\mathcal E}^2$$

and

$$W^2(u_i) = \frac1{32}\big\|u_i-e_1\big\|_1^2\,\big\|u_i-e_2\big\|_1^2 = \frac1{32}\big(2|1-v_i|\big)^2\big(2|v_i|\big)^2 = 2W(v_i),$$

if $W(x)=\frac14 x^2(x-1)^2$, which is one of the options discussed in Section 5. Hence, under these assumptions,

$$\mathrm{GL}^2_\varepsilon(u) = 2\,\mathrm{GL}_\varepsilon(v),$$

and so for minimization purposes both functionals are equivalent.

Given a multiclass graph Ginzburg–Landau functional, a corresponding multiclass graph Allen–Cahn equation is defined as its gradient flow with respect to the $\mathcal V^K$-inner product

$$\langle u,v\rangle_{\mathcal V^K} := \sum_{k=1}^K\langle u^k,v^k\rangle_{\mathcal V},$$

for $u,v\in\mathcal V^K$. With $s\in\mathbb R$ and for all $u,v\in\mathcal V^K$, we have

$$\frac{d}{ds}\mathrm{GL}^K_\varepsilon(u+sv)\Big|_{s=0} = \alpha(\varepsilon)\big\langle\Delta u, v\big\rangle_{\mathcal V^K} + \frac1\varepsilon\big\langle DW^K u, v\big\rangle_{\mathcal V^K},$$

where $\Delta:\mathcal V^K\to\mathcal V^K$ and $DW^K:\mathcal V^K\to\mathcal V^K$ are defined by, for all $u\in\mathcal V^K$, $i\in V$, and $k\in[K]$,

$$\big(\Delta u\big)_i^k := \big(\Delta u^k\big)_i \quad\text{and}\quad \big(DW^K u\big)_i^k := \partial_k W^K(u_i),$$

with $\partial_k$ the partial derivative with respect to the $k$th component of the independent variable in $\mathbb R^K$. Here we have chosen $W^K(u):=\sum_{i\in V}d_i^r W^K(u_i)$; if $W^K(u):=\sum_{i\in V}W^K(u_i)$ instead, we would need to include the degrees in the second term, as in Remark 7.1, or, equivalently, redefine $\big(DW^K u\big)_i^k := d_i^{-r}\partial_k W^K(u_i)$.

This leads to the following multiclass graph Allen–Cahn equation:

$$\frac{du}{dt} = -\alpha(\varepsilon)\Delta u - \frac1\varepsilon DW^K u.$$

In [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95] a fidelity-forcing term is also included in the Allen–Cahn equation.

Similarly, a multiclass MBO scheme can be defined. In the diffusion step $K$ uncoupled diffusion equations are solved, leading to an intermediate output $u\in\mathcal V^K$. In the threshold step, we first project each $u_i\in\mathbb R^K$ onto the simplex $\Sigma^K$ and then assign (one of) the closest corner(s) of the simplex to each node $i$. This leads to the following scheme.

Multiclass Graph MBO Scheme

  • Initialize (multiclass). Choose an initial condition $u^0\in\mathcal V_{E^K}$, and $\tau>0$.

  • Step $l+1$: multiclass diffusion. Solve the multiclass diffusion/heat equations: for all $k\in[K]$, $\frac{du^k}{dt}=-\Delta u^k$ on $(0,\tau]$ with initial condition $u(0)=u^l$.

  • Step $l+1$: multiclass threshold. For all $i\in V$, define $u_i^{l+1}:=e_{k^*}\in\mathbb R^K$, where $k^*\in\operatorname{argmin}_{k\in[K]}\|v_i-e_k\|_2$ (if $k^*$ is not unique, a $k^*$ is chosen uniformly at random out of all minimizers) with $v_i:=\operatorname{argmin}_{x\in\Sigma^K}\|u_i-x\|_2$.Footnote 68

  • Stop.

The initial condition $u^0$ and the output $u^{l+1}$ of each iteration are functions in $\mathcal V_{E^K}$ and can be interpreted as indicator functions for the $K$ classes: for $k\in[K]$, define $S_k^l:=\{i\in V: u_i^l=e_k\}$; then $(u^l)^k=\chi_{S_k^l}$. The diffusion step thus consists of $K$ uncoupled instances of the standard (i.e., 2-phase) MBO scheme diffusion step, one for each of the $K$ classes. In Cucuringu et al. [Reference Cucuringu, Pizzoferrato and van Gennip60, appendix A.3] it is shown that the multiclass threshold step is equivalent to assigning $u_i^{l+1}:=e_{k^*}$ where $k^*\in\operatorname{argmax}_{k\in[K]}(u_i)_k$. In other words, the diffusion step is followed by a 'majority vote' determining to which class each node gets assigned.
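In code, the scheme above can be sketched compactly using this majority-vote form of the threshold step (the choice of the random-walk Laplacian, the stopping rule, and the random tie-breaking are illustrative implementation choices, not prescribed by the cited papers):

```python
import numpy as np
from scipy.linalg import expm

def multiclass_mbo(W, U0, tau, iters=20, rng=None):
    """U0: n x K matrix with one-hot rows; returns an n x K one-hot matrix."""
    rng = np.random.default_rng(rng)
    d = W.sum(axis=1)
    L = np.diag(1.0 / d) @ (np.diag(d) - W)   # random-walk Laplacian
    E = expm(-tau * L)
    U = U0.astype(float)
    for _ in range(iters):
        V = E @ U                             # K uncoupled diffusion steps
        U_new = np.zeros_like(U)              # majority-vote threshold step
        for i, row in enumerate(V):
            winners = np.flatnonzero(row == row.max())
            U_new[i, rng.choice(winners)] = 1.0
        if np.array_equal(U_new, U):          # stationary state reached
            break
        U = U_new
    return U
```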

In Bresson et al. [Reference Bresson, Huiyi, Laurent, Szlam, von Brecht, Xue-Cheng, Egil and Lysaker31] a variant of graph multiclass MBO is presented, with a slightly different random-walk-based diffusion step, but more importantly with an incremental reseeding step between iterations: each iteration the initial condition of the diffusion step is created by randomly sampling nodes from each class based on the output of the previous iteration’s multiclass threshold step. The number of sampled nodes per class is increased in each iteration.

The ideas presented earlier for the multiclass graph MBO scheme can directly be transferred to the signless setting from Sections 5.3 and 8.2 (see Keetch [Reference Keetch116, chapter 5]) and to the fidelity-forced setting [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95].

In the signless case we need to take care to replace the standard simplex by

$$\Sigma^K_\pm := \left\{x\in[-1,1]^K : \sum_{k=1}^K x_k = 2-K\right\},$$

with corners $E^K_\pm:=\{e_1^\pm,\ldots,e_K^\pm\}$, where, for all $i,k\in[K]$, $(e_k^\pm)_i=1$ if $i=k$ and $(e_k^\pm)_i=-1$ if $i\ne k$. Let $\mathcal V_{E^K_\pm}:=\{u:V\to E^K_\pm\}$ be the set of node functions with values on the corners of $\Sigma^K_\pm$.

Signless Multiclass Graph MBO Scheme

  • Initialize (signless multiclass). Choose $u^0\in\mathcal V_{E^K_\pm}$ and $\tau>0$.

  • Step $l+1$: signless multiclass diffusion. Solve the multiclass signless diffusion/heat equations: for all $k\in[K]$, $\frac{du^k}{dt}=-\Delta^\sigma u^k$ on $(0,\tau]$; $u(0)=u^l$.

  • Step $l+1$: signless multiclass threshold. For all $i\in V$, define $u_i^{l+1}:=e_{k^*}^\pm\in\mathbb R^K$, where $k^*\in\operatorname{argmin}_{k\in[K]}\|v_i-e_k^\pm\|_2$ (if $k^*$ is not unique, a $k^*$ is chosen uniformly at random out of all minimizers) with $v_i:=\operatorname{argmin}_{x\in\Sigma^K_\pm}\|u_i-x\|_2$.Footnote 69

  • Stop.

We again refer to [Reference Cucuringu, Pizzoferrato and van Gennip60, appendix A.3] for a ‘majority vote’ interpretation of the signless multiclass threshold step.

For the multiclass fidelity-forced MBO scheme, we need multiclass versions of the fidelity parameter function $\mu$ and reference function $f$ from Section 5.1; that is, $\mu\in\mathcal V_{[0,\infty)^K}\setminus\{0\}$Footnote 70 and $f\in\mathcal V^K$. In [Reference Ekaterina, Cristina, Bertozzi, Arjuna and Percus143; Reference Cristina, Ekaterina, Bertozzi, Arjuna and Percus95] each $\mu^k$ is chosen to be the same and $f$ is chosen from $\mathcal V_{E^K}$.

Multiclass Fidelity-Forced Graph MBO Scheme

  • Initialize (multiclass).

  • Step $l+1$: multiclass fidelity-forced diffusion. Solve the multiclass fidelity-forced diffusion/heat equations: for all $k\in[K]$, $\frac{du^k}{dt}=-\Delta u^k-\mu^k(u^k-f^k)$ on $(0,\tau]$ with initial condition $u(0)=u^l$.

  • Step l+1: multiclass threshold.

  • Stop.

A multiclass mass-conserving graph MBO scheme called auction dynamics is developed in Jacobs et al. [Reference Jacobs, Merkurjev and Esedoḡlu110] (for $r=0$), based on the (unconstrained) continuum multiclass MBO scheme that was introduced in Esedoḡlu and Otto [Reference Esedoḡlu and Otto77] to approximate multiclass mean curvature flow. The mass of each class is specified, that is, for a given $M\in\mathbb R^K$ it is imposed that, for all $k\in[K]$, $\mathcal M(u^k)=M_k$. In order that the mass is conserved, the simple majority vote from the multiclass threshold step has to be adapted by a class-dependent correction term $p_k$. (Approximately) determining this term is accomplished in [Reference Jacobs, Merkurjev and Esedoḡlu110] via the duality theory of linear programming. The name 'auction dynamics' is inspired by an interpretation of the scheme in which each node is bidding for membership of one of the $K$ classes, with $u_i\in\mathbb R^K$ the vector of weighted preferences of node $i$ for each class and $p_k$ the price of membership for class $k$. For details we refer to [Reference Jacobs, Merkurjev and Esedoḡlu110].

At the time of writing and to the best knowledge of the authors, it is an open question whether this auction dynamics scheme can also be derived starting from a multiclass mass-conserving Allen–Cahn equation similar to the way the binary mass-conserving MBO scheme from Section 8.3 is derived as explained in section 6.1.2 of [Reference van Gennip and Budd190].

In [Reference Jacobs, Merkurjev and Esedoḡlu110] auction dynamics is also extended to handle upper and lower bounds on the masses of the classes, rather than exactly prescribed masses.

In the context of Poisson learning (see Section 12) Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] implement a multiclass version of the mass-conserving fidelity-forced graph MBO scheme (version 2) from Section 8.3, which they name PoissonMBO and which takes the following form.Footnote 71

Multiclass Mass-Conserving Fidelity-Forced Graph MBO Scheme

  • Initialize (multiclass).

  • Step $l+1$: multiclass mass-conserving fidelity-forced diffusion. Solve the multiclass mass-conserving fidelity-forced diffusion/heat equations: for all $k\in[K]$, $\frac{du^k}{dt}=-\Delta u^k+\mu^k f^k-\frac{\langle\mu^k,f^k\rangle_{\mathcal V}}{\operatorname{vol} V}\chi_V$ on $(0,\tau]$, with $u(0)=u^l$.

  • Step $l+1$: multiclass mass-conserving threshold. For all $i\in V$, define $u_i^{l+1}:=e_{k^*}\in\mathbb R^K$, where $k^*\in\operatorname{argmin}_{k\in[K]}\|v_i-e_k\|_2$ (in case of nonuniqueness of minimizers one $k^*$ is chosen uniformly at random out of all minimizers) with $v_i:=\sum_{k=1}^K c_k (u_i)_k e_k$, where the strictly positive constants $c_k$ are chosen such that, for all $k\in[K]$, $\mathcal M\big((u^{l+1})^k\big)=\mathcal M\big((u^0)^k\big)$.

  • Stop.

It is argued in [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44, remark 2.2] that the constants ck>0 in the multiclass mass-conserving threshold step can always be chosen such that the mass constraints are satisfied.

In section 7.2.6 of [Reference van Gennip and Budd190] some further multiclass MBO schemes and their continuum limits are discussed.

12 Laplacian Learning and Poisson Learning

A function $u\in\mathcal V$ is harmonic at $i\in V$ if $(\Delta u)_i=0$. In Zhu et al. [Reference Zhu, Ghahramani, Lafferty, Tom and Mishra211] the random walk graph Laplacian ($r=1$) is used, but other Laplacians appear in the literature as well. If $S\subseteq V$ and $u$ is harmonic at all $i\in V\setminus S$, then $u$ is said to be harmonic on $V\setminus S$. If $S\ne\emptyset$ and $\tilde f:S\to\mathbb R$ is given, there exists a unique function $u\in\mathcal V$ such that $u$ is harmonic on $V\setminus S$ and $u|_S=\tilde f$; see Lovász [Reference Lovász130, theorem 3.1.2]. This is called the harmonic extension of $\tilde f$. For introductory notes on harmonic functions on graphs, see Wigderson [Reference Wigderson199]. In [Reference Zhu, Ghahramani, Lafferty, Tom and Mishra211] this harmonic extension of $\tilde f$ is suggested as a good candidate for semi-supervised learning (we refer to section 4.1.2 of [Reference van Gennip and Budd190] for more information about semi-supervised learning) of binary labels on a graph (label propagation), given the a priori labels $\tilde f$ on $S$. This technique is called Laplacian learning (or Laplace learning). In Zhu et al. [Reference Zhu, Lafferty and Ghahramani210] the harmonic extension is interpreted as the mean of an associated Gaussian Random Field. To go from the real-valued harmonic extension to binary labels, simple thresholding suffices – or mass-conserving thresholding if the preferred cluster sizes are known – in the vein of the MBO scheme(s) of Section 11. Extensions to multiple labels are also possible, similar to the multiclass extension of Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] which we discuss in Section 11.

The Laplacian learning problem

$$\Delta\tilde u = 0 \quad\text{on } V\setminus S, \qquad \tilde u = \tilde f \quad\text{on } S, \tag{12.1}$$

can be transformed, via $u:=\tilde u-f$, where

$$f := \begin{cases} 0, & \text{on } V\setminus S,\\ \tilde f, & \text{on } S, \end{cases} \tag{12.2}$$

to the semihomogeneous Dirichlet problem

$$\Delta u = -\Delta f \quad\text{on } V\setminus S, \qquad u = 0 \quad\text{on } S.$$

The unique solution to this problem can be constructed in terms of a Green’s function [Reference Chung and Yau53; Reference Chung and Yau54], which itself can be expressed in terms of equilibrium measures [Reference Enrique, Carmona and Encinas19; Reference van Gennip188] (see also appendix B of [Reference van Gennip and Budd190]).

The Laplacian learning problem in (12.1) also has a variational formulation.

Lemma 12.1. Let $u\in\mathcal V$; then $u$ is the unique solution of (12.1) if and only if

$$u \in \operatorname*{argmin}_{\hat u\in\mathcal V}\; \big\|\nabla\hat u\big\|_{\mathcal E}^2 \quad\text{s.t.}\quad \hat u|_S = \tilde f. \tag{12.3}$$

Proof. For a proof, we refer to lemma 3.5.1 in [Reference van Gennip and Budd190]. □
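In practice the harmonic extension is computed by a single linear solve on the unlabelled nodes. The sketch below does this with the combinatorial Laplacian (the choice of Laplacian, like the example graph, is an illustrative assumption):

```python
import numpy as np

def harmonic_extension(W, S, f_tilde):
    """S: array of labelled node indices; f_tilde: their label values."""
    n = len(W)
    L = np.diag(W.sum(axis=1)) - W             # combinatorial Laplacian
    u = np.zeros(n)
    u[S] = f_tilde
    unl = np.setdiff1d(np.arange(n), S)        # unlabelled nodes
    # (L u)_unl = 0  =>  L[unl, unl] u_unl = -L[unl, S] f_tilde
    u[unl] = np.linalg.solve(L[np.ix_(unl, unl)],
                             -L[np.ix_(unl, S)] @ f_tilde)
    return u

W = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
print(harmonic_extension(W, np.array([0, 3]), np.array([0.0, 1.0])))
# on this path graph the extension interpolates linearly: [0, 1/3, 2/3, 1]
```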

Laplacian learning and variants (often using the Dirichlet energy as a regularizer in a variational setting, inspired by (12.3)) have been applied in various settings, for example in Zhou and Schölkopf [Reference Zhou, Schölkopf, Walter, Robert and Hanbury206], Zhou et al. [Reference Zhou, Huang, Schölkopf, Luc and Wrobel204], and Elmoataz et al. [Reference Elmoataz, Desquesnes and Toutain74]. One generalization is the $p$-Laplacian learning problem (see [Reference Zhou, Schölkopf, Walter, Robert and Hanbury206]) which uses $\|\nabla\hat u\|_{\mathcal E,p}^p$ in (12.3) instead of $\|\nabla\hat u\|_{\mathcal E}^2$.Footnote 72 However, when there are many a priori unlabelled nodes (i.e., when $|V\setminus S|$ is large), there are some drawbacks. In particular, for random geometric graphs built from point clouds in $\mathbb R^d$ it has been shown that when $|S|$ is fixed and $p\le d$, solutions to the limiting problem $|V\setminus S|\to\infty$ are relatively flat, except for spikes concentrated around the a priori labelled points (see, e.g., [Reference Zhou, Belkin, of Gordon, Geoffrey, David and Dudík208]). When $p>d$, solutions to the limiting problem do not exhibit this unwanted behaviour, assuming the length scale in the edge weight function (which regulates the strength of the connection between points in the point cloud based on their Euclidean distance) decreases slowly enough as $|V\setminus S|\to\infty$. For technical details, we refer to Slepčev and Thorpe [Reference Slepčev and Thorpe179]; see also [Reference Alicandro, Ansini, Braides, Piatnitski and Tribuzio4, section 7.2]. In Crook et al. [Reference Crook, Hurst, Schönlieb, Thorpe and Zygalakis59] this connection to the limiting problem is exploited to develop a new numerical method for solving $p$-Laplacian learning. In Calder et al. [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] this degeneracy issue for $p=2$ is related to the random walk interpretation of Laplace learning, in which $u_i$ equals the expected value of $\tilde f_j$, where $j\in S$ is the first node in $S$ that a random walk starting from node $i$ hits.Footnote 73 Also for $p=2$, Calder and Slepčev in [Reference Calder and Slepčev43] address this degeneracy issue by adapting the weights of edges that connect to nodes (i.e., points in the point cloud) that are near to the a priori labelled points.

When $p=\infty$, $\infty$-Laplacian learning is also called Lipschitz learning; see Kyng et al. [Reference Rasmus, Anup, Sushant, Spielman, Peter, Elad and Kale123]. Various graph-to-continuum consistency results for Lipschitz learning, as well as learning with the game-theoretic $p$-Laplacian from (3.14) (or variants thereof), are established in, for example, Flores et al. [Reference Flores, Calder and Lerman90] (for more literature references, see section 3.5 of [Reference van Gennip and Budd190]).

In Weihs and Thorpe [Reference Weihs and Thorpe198], fractional-Laplacian learning is studied (especially its consistency at the variational level; we refer to section 7.4 of [Reference van Gennip and Budd190]), which is based on another variant of the graph Dirichlet energy:

$$\langle u, \Delta^s u\rangle_{\mathcal V},$$

with parameter s>0 (and r=0 in Δ).

Another variant is Poisson learning, which is suggested in [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44] as a way to avoid the degeneracy issue that we discussed earlier:

$$\Delta u = \chi_S f - \frac{\mathcal M(f)}{\mathcal M(\chi_S)}\chi_S, \tag{12.4}$$

where f is as in (12.2). The choice r=0 is made for the Laplacian Δ and mass functional M. To select a unique solution, the constraint M(u)=0 is imposed, where this time r=1 is chosen in M.Footnote 74 As in the case of Laplace learning, to go from the real-valued function u to labels, a (mass-conserving) thresholding step is used. This is extended to multiple labels (see Section 11) and implemented using a multiclass variant of the mass-conserving fidelity-forced graph MBO scheme (version 2) from Section 8.3, which is named the Poisson MBO algorithm in [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44]. Indeed, in the two-class case we see that the stationary solution to the diffusion step of this Poisson graph MBO scheme – namely to the mass-conserving fidelity-forced diffusion step of the mass-conserving fidelity-forced graph MBO scheme (version 2) – satisfies (12.4) with χS in the right-hand side replaced by μ with support equal to S.

Per [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44, theorem 2.3], the equivalent variational formulation of (12.4) isFootnote 75

$$u \in \operatorname*{argmin}_{v\in\mathcal V}\; \frac12\big\|\nabla v\big\|_{\mathcal E}^2 - \left\langle \chi_S f - \frac{\mathcal M(f)}{\mathcal M(\chi_S)}\chi_S,\, v\right\rangle_{\mathcal V} \quad\text{s.t.}\quad \mathcal M(v)=0.$$

For an interpretation of Poisson learning in terms of random walks, we refer to [Reference Calder, Cook, Thorpe, Slepčev, Hal and Singh44, theorem 2.1]. In appendix B of [Reference van Gennip and Budd190] we have a detailed look at solutions of (12.4).
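Concretely, since the right-hand side of (12.4) has zero mass (with $r=0$), the equation can be solved directly with a pseudoinverse, after which the additive constant is fixed by the $r=1$ zero-mass constraint. A minimal sketch under these conventions:

```python
import numpy as np

def poisson_learning(W, S, f_tilde):
    """Solve (12.4) with M(u) = 0 (r = 1); S: labelled indices."""
    n = len(W)
    d = W.sum(axis=1)
    L = np.diag(d) - W                      # combinatorial Laplacian (r = 0)
    f = np.zeros(n); f[S] = f_tilde
    chi_S = np.zeros(n); chi_S[S] = 1.0
    rhs = chi_S * f - (f.sum() / chi_S.sum()) * chi_S   # zero-mean source
    u = np.linalg.pinv(L) @ rhs             # solution up to an additive constant
    return u - np.sum(d * u) / d.sum()      # enforce sum_i d_i u_i = 0

W = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
print(poisson_learning(W, np.array([0, 3]), np.array([-1.0, 1.0])))
```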

In the recent work of Thorpe and Wang [Reference Thorpe, Wang, of Bruna, Joan, Jan and Zdeborova184] a robust certification against adversarial attacks is proved for graph Laplace learning; that is, the classification result remains unchanged if the data points (assumed to be embedded in $\mathbb R^m$) are perturbed with some bounded perturbation. Work currently in preparation by Korolev et al. [Reference Korolev, Kuger and Thorpe122] studies Laplace learning in an infinite-dimensional setting.

13 Conclusions

Using the tools described in Section 3, discretized variants of differential equations and variational models can be formulated on graphs. We have focused mostly on the graph Ginzburg–Landau model and dynamics related to that model, such as those described by the graph Allen–Cahn equation, the graph MBO scheme, and graph mean curvature flow. In Section 12 we also had a look at the graph Laplace equation and graph Poisson equation.

A common theme that unites these various models and equations, besides being discretized versions of well-known continuum variational models and PDEs, is their application in machine learning and mathematical imaging.

Besides containing extended versions of some of the sections from the current Element, the companion volume [Reference van Gennip and Budd190] also contains a full chapter about these applications in machine learning and imaging. It has another chapter dealing with the numerical implementation of these methods, a chapter about the connections between the graph Allen–Cahn equation, the graph MBO scheme, and graph mean curvature flow, and a chapter about discrete-to-continuum limits of graph-based models. Furthermore, it contains a, necessarily relatively brief, overview of the continuum models from which the graph-based models are derived, and a discussion of connections with other fields and open questions.

If this work is part of a snapshot of the current state of this very active mathematical field, then its companion is meant to be a fuller and broader overview. While both works contain a literature overview that can suggest further directions to explore for the interested reader, the companion volume contains a significantly more extensive bibliography to explore.

Acknowledgements

First of all we would like to thank Daniel Tenbrinck for taking the initiative that has made it possible for this work to come into existence. It, and [190], grew from the fertile ground that has been the NoMADS (Nonlocal Methods for Arbitrary Data Sources) project which was funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 777826.

Our thanks also go out to Luca Calatroni, who has been our main point of contact for all questions editorial, and to David Tranah, our invaluable contact at Cambridge University Press. Furthermore, we very much appreciate the feedback of the two anonymous reviewers, whose comments have led to many improvements throughout this Element.

The content of this Element is built upon the expertise we have built in the course of our own research over many years, as well as the expertise of many others which we have had the privilege to learn from through the literature, scientific presentations, and – most enjoyably – many personal conversations. It would be impossible to thank all the people whose influence can be found to a smaller or larger degree in this Element, so we restrict ourselves to mentioning Andrea Bertozzi at the University of California, Los Angeles, and Abderrahim Elmoataz at the Université de Caen Normandie, and their many collaborators, who through their scientific works and their personal engagement have put the field of PDE-inspired differential equations on graphs for imaging and machine learning on the map as a vital and vibrant area of research. It can always remain a matter of debate in the larger community what may or even should be considered the origin of a new (sub)field in mathematics, but at least in the narrow context of the authors’ own contributions to this field, the people and groups from Los Angeles and Caen have been instrumental in shaping it. Moreover, we would like to thank explicitly Jonas Latz for his contributions to our comment on continuous-time Markov chains in Remark 3.5 and Daniele Avitabile for pointing out an alternative name for the semidiscrete implicit Euler scheme (see footnote Footnote 63).

The authors acknowledge financial support by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 777826 via the NoMADS project. The second author has received funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC-2047/1 - 390685813, and from start-up funds at the California Institute of Technology.

The open access edition of this Element was financed by the above-mentioned NoMADS project and the TU Delft Open Access Fund.


Footnotes

1 The idea for this Element and its companion book owes a great debt to the Nonlocal Methods for Arbitrary Data Sources (NoMADS) project funded by the European Union’s Horizon 2020 research and innovation programme (Marie Skłodowska-Curie grant agreement No. 777826).

2 In this Element, by ‘graph’ we mean the discrete objects consisting of vertices and (potentially weighted) edges connecting the vertices that are studied in graph theory. Where we want to talk about the graph of a function, this is made explicit. As is quite common, we tend to use ‘graph’ and ‘network’ [Reference Estrada and Knight78] interchangeably, although some might prefer to distinguish between networks in the real world and the mathematical graphs that can be used to model them. We also use related terminology, such as vertices and nodes [Reference Barabási15, box 2.1], interchangeably.

3 Or a difference equation, if u does not depend on the variable t.

4 We sometimes refer to these two fields collectively as ‘imaging’ or ‘mathematical imaging’, or just ‘image analysis’ for brevity.

5 Although there are recent trends to incorporate (physical) models into the data-driven machinery.

6 There may be deviations from or additions to this general requirement. For example, in Macgregor and Sun [Reference Macgregor, Sun and Zhang135] (see also Macgregor [Reference Macgregor133, chapter 6]) the goal is to find two clusters (with an algorithm that is local, in the sense that its run time is independent of the size of the graph) that are densely connected to each other, but weakly to the rest of the graph.

7 In [Reference Mazón, Marcos and Toledo139] heat flow on metric random walk spaces is studied, in [Reference Mazón, Marcos and Toledo140] total variation flow.

8 This means |V|<∞.

9 Given two nodes i,j∈V, there is at most one edge (i,j)∈E, as is already implied by E⊆V×V.

10 Each edge connects two distinct nodes, that is, for all $i\in V$, $(i,i)\in(V\times V)\setminus E$. For a preprint discussing Laplacians (which we will introduce later) on graphs with self-loops see Açıkmeşe [Reference Açıkmeşe2].

11 The definition of connectivity is given later in this section.

12 In this particular case, since we do not allow self-loops, E is always a strict subset of V×V.

13 To avoid ambiguity, we note that we write N for the strictly positive natural numbers and N0:=N∪{0}. Also, to avoid ambiguity regarding zero, we call the numbers in the sets (0,∞) and (−∞,0) strictly positive and strictly negative, respectively.

14 At first glance it may seem that the bilinear forms defined here are not inner products, since $\langle u,u\rangle_{\mathcal V}$ may be zero, even if $u\ne 0$, if there is an $i\in V$ such that $d_i^r=0$, and similarly $\langle\varphi,\varphi\rangle_{\mathcal E}$ may be zero, even if $\varphi\ne 0$, if there are $i,j\in V$ such that $\omega_{ij}^{2q-1}=0$. However, the possibility $d_i^r=0$ is excluded by our assumption of connectedness and, while $\omega_{ij}^{2q-1}=0$ will be true for some $i,j\in V$ for all graphs except complete graphs, the definition of $\mathcal E$ requires $\varphi_{ij}=0$ for such $i$ and $j$.

15 The intervals [0,1] and [1/2,1] from which r and q are chosen, respectively, are mostly intervals of convenience: they cover the most common values that appear in the literature – r∈{0,1} and q∈{1/2,1} – and allow for interpolation between them. We do not expect major, or possibly any, changes to be necessary if we allow r,q∈R instead.

16 Recall the convention that ωij0=0 if ωij=0.

17 These graph Laplacians can also be defined in terms of an incidence matrix of the graph, i.e., a matrix that has nonzero entry in the i th row and k th column, if and only if node i is an endpoint of edge k (under some arbitrary, yet fixed, ordering of the edges, where in an undirected graph (i,j) and (j,i) are treated as being a single edge); see for example [Reference Chung55, section 1.2]. We do not explicitly use this characterization of graph Laplacians in this work.

18 The random walk graph Laplacian is sometimes also called the left-normalized graph Laplacian to distinguish it from the right-normalized graph Laplacian that has associated matrix I−AD−1. This latter Laplacian does not fit into the framework of (3.4), but can be represented as member of the two-parameter family of graph Laplacians in (3.7).

19 The authors thank Jonas Latz for this observation.

20 We can allow any s,t∈R, so in this context r is not restricted to [0,1].

21 The usual definition includes sgn(0)=0. Some of the works we cite use a different definition at x=0 or do not specify the value in x=0.

22 Another graph p-Laplacian does appear in the literature, for example in [Reference Elmoataz, Lézoray and Bougleux75, Reference Elmoataz, Toutain and Tenbrinck76, Reference Zhou, Schölkopf, Walter, Robert and Hanbury206], inspired by the p-Laplacian in the continuum (about which section 3.7.5 in [Reference van Gennip and Budd190] contains some information). For further details we refer to footnote Footnote 40 in [Reference van Gennip and Budd190].

23 The definition of the graph $\infty$-Laplacian is not fully consistent throughout the literature. Some definitions differ by an overall sign change or an overall factor $\frac12$, for example in [Reference Elmoataz, Desquesnes and Lézoray72; Reference Elmoataz, Desquesnes and Toutain74]. In Flores et al. [Reference Flores, Calder and Lerman90] it is defined at node $i$ to be $\max_{j\in V}(\nabla u)_{ij}+\min_{j\in V}(\nabla u)_{ij}$, inspired by equation (2.18) in [Reference van Gennip and Budd190]. As we can see in the proof of Lemma 3.7, which is given in lemma 2.1.7 of [Reference van Gennip and Budd190], the solution set of the important equation $(\Delta_\infty u)_i=0$ is the same under all these definitions. The choice of $q$ in $\nabla u$ may also differ between papers, with $q=\frac12$ and $q=1$ being common choices.

24 The definition of the game-theoretic graph p-Laplacian is not consistent throughout the literature. For more details we refer to footnote Footnote 43 of [Reference van Gennip and Budd190].

25 See also Peres and Sheffield [170] for the game-theoretic Laplacian in the continuum setting.

26 Here P(V) denotes the power set of V, namely, the set of all subsets of V.

27 We refer to Section 6 for information about the spectrum of (non-hyper)graph Laplacians.

28 In a slight abuse of notation, $\chi_{E'}$ is viewed here as a function on $V'\times V'$ instead of $E$, such that $\chi_{E'}\omega|_{V'\times V'}$ is the weight function that assigns $\omega_{ij}$ to $(i,j)\in V'\times V'$ if $(i,j)\in E'$ and that assigns $0$ to $(i,j)\in V'\times V'$ otherwise.

29 That is, graphs where the vertex set can be partitioned into disjoint sets $V_1$ and $V_2$, the partite sets, such that every edge has exactly one endpoint in $V_1$ and one in $V_2$.

30 More accurately, our Section 3 is mainly based on the setup in [189; 191], which in turn was based on [104].

31 Not to be confused with the Ginzburg–Landau model for superconductivity. Other names for the (continuum version of the) functional may be encountered, especially in the continuum literature, such as Allen–Cahn, Cahn–Hilliard, Van der Waals–Cahn–Hilliard, or Modica–Mortola functional.

32 In fact, in [21; 22] this factor is also not present in the gradient flow, since in those papers a Euclidean inner product is used, that is, the $\mathcal{V}$-inner product with $r=0$.

33 The transformation $v=2u-1$ turns $u^2(u-1)^2$ into $\frac{1}{16}(v^2-1)^2$, and thus $GL_\varepsilon(u)$ with potential $u^2(u-1)^2$ into $\frac14 GL_\varepsilon(v)$ with potential $\frac14(v^2-1)^2$.
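
For completeness, the one-line computation behind this footnote (our own verification, using only the substitution $u=\frac{v+1}{2}$) reads
\[
u^2(u-1)^2=\Bigl(\frac{v+1}{2}\Bigr)^{2}\Bigl(\frac{v-1}{2}\Bigr)^{2}=\frac{\bigl((v+1)(v-1)\bigr)^{2}}{16}=\frac{1}{16}(v^{2}-1)^{2}=\frac14\cdot\frac14(v^{2}-1)^{2},
\]
so the wells at $0$ and $1$ in the $u$ variable become wells at $-1$ and $1$ in the $v$ variable, at the cost of the overall factor $\frac14$.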

34 So $c\in\{-1,1\}$ or $c\in\{0,1\}$ for the specific double-well and double-obstacle potentials we have just discussed.

35 Here, we use ‘mass’ and ‘volume’ interchangeably, as, if $S\subseteq V$, then $\mathcal{M}(\chi_S)=\operatorname{vol} S$.

36 The minimal value $\min_{S\subseteq V}\mathrm{CCut}(S)$ is called the Cheeger constant (see also (6.1)). Some sources, such as [55, section 2.3], also speak of the edge expansion in this context.

37 The specific placement of the wells is not crucial, but the domain on which $GL_0$ is finite, as well as the multiplicative factor in $GL_0(u)$ (currently ‘$1$’) needed to turn $|u_i-u_j|$ into $(u_i-u_j)^2$ at the values in the wells, would have to be adapted to reflect any alternative choice.

38 These inequalities are generalizations of the Cheeger (constant) inequality for $K=2$; see for example Alon and Milman [8], Alon [7], Sinclair and Jerrum [178], or Chung [55, chapter 2]. We note that if $K=2$, then the maximum in (6.1) is the Cheeger cut $\mathrm{CCut}$ from Section 5.2.

39 This means that there exist constants $C_1,C_2>0$ such that for all $K\geq C_1$, $|C(K)|\leq C_2 K^2$.

40 Commonly the combinatorial graph Laplacian is used to define the Fiedler value and Fiedler vector, yet in practical (clustering) applications other Laplacians are used as well, often with greater success; see, for example, Bertozzi and Flenner [21; 22, section 2.4]. In Le Gorrec et al. [127] it is argued that the eigenvector corresponding to the second-largest eigenvalue of the adjacency matrix, where the latter is first scaled (for example, by way of Knight and Ruiz [120]) into the doubly stochastic form $D_1AD_2$ (i.e., each row and each column of $D_1AD_2$ sums to one) with $D_1$ and $D_2$ diagonal matrices, has a structure that more clearly indicates cluster structure than Fiedler vectors.
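
To illustrate the doubly stochastic scaling referred to here, consider the following minimal sketch (our own, using plain Sinkhorn iteration rather than the faster Newton-type algorithm of Knight and Ruiz [120]; the example matrix is hypothetical):

import numpy as np

def balance(A, n_iter=500):
    # Alternately rescale rows and columns; for a nonnegative matrix with
    # suitable support this yields diagonal matrices D1, D2 such that
    # D1 @ A @ D2 is (approximately) doubly stochastic.
    d1 = np.ones(A.shape[0])
    d2 = np.ones(A.shape[1])
    for _ in range(n_iter):
        d1 = 1.0 / (A @ d2)      # enforce unit row sums
        d2 = 1.0 / (A.T @ d1)    # enforce unit column sums
    return np.diag(d1), np.diag(d2)

A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])  # symmetric weighted adjacency matrix
D1, D2 = balance(A)
S = D1 @ A @ D2
print(S.sum(axis=0), S.sum(axis=1))  # both close to vectors of ones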

41 The associated eigenspace can have dimension two or higher, yet the literature typically speaks about the Fiedler vector.

42 And $\min_{i\in V}\frac{1}{n}\sum_{j\in V,\,j\neq i}\omega_{ij}>\varepsilon$. We note that here we have taken $G=\tilde G=1$ in [115, lemmas 2.1, 2.2, and 2.3] to align their setting with ours.

43 The spectral norm of a matrix is its largest singular value, namely the square root of the largest eigenvalue of the product of the transpose (or conjugate transpose for complex matrices) of the matrix and the matrix itself (in either order).
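
In symbols (a standard linear-algebra identity, recorded here for convenience): for a matrix $A$,
\[
\|A\|_2=\sigma_{\max}(A)=\sqrt{\lambda_{\max}(A^{*}A)}=\sqrt{\lambda_{\max}(AA^{*})},
\]
where $A^{*}$ denotes the transpose (conjugate transpose in the complex case); the nonzero eigenvalues of $A^{*}A$ and $AA^{*}$ coincide, which is why either order may be used.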

44 Namely in the sense that it leads to (a good approximation of) the maximum cut value for that graph.

45 Some of the sources that are cited throughout may make different choices. Unless these choices strongly impact the claims and results we present here, we will not draw attention to such differences.

46 Per the general definition of the subdifferential in Ekeland and Temam [67, definition 5.1], $\beta$ should be in the topological dual of $\mathcal{V}$. Since $\mathcal{V}$ is a finite-dimensional inner product space (and thus also a Hilbert space), it is reflexive and we identify $\mathcal{V}$ with its topological dual.

47 The (negative) subdifferential is given by
\[
-\partial\imath_{[0,1]}(u_i(t)) = \begin{cases} [0,\infty), & \text{if } u_i(t)=0,\\ \{0\}, & \text{if } 0<u_i(t)<1,\\ (-\infty,0], & \text{if } u_i(t)=1,\\ \emptyset, & \text{if } u_i(t)<0 \text{ or } u_i(t)>1. \end{cases}
\]

48 We may sometimes simply refer to u as a solution, implying the existence of a corresponding β.

49 In [44] a multiclass functional is used; see Section 11.

50 Alternatively, for solutions of (7.4), we can also use the fact that the (finite) mass of the solution is conserved to rule out finite-time blowup.

51 In [188, Supplementary Materials section 4] an extra term appears, because the gradient flow is taken not of $GL_\varepsilon$, but of $GL_\varepsilon$ plus an additional term. Choosing $\gamma=0$ in [188] returns us to the $GL_\varepsilon$ case. Moreover, an additional factor $D^{-r}$ appears in the potential term in [188], since $W(u)=\sum_{i\in V}W(u_i)$ is used instead of $W(u)=\langle W\circ u,1\rangle_{\mathcal{V}}$.

52 Also called the diffusion time (step).

53 For the mathematical analysis of the scheme, it is convenient to assume there is no stopping condition and no upper bound on the number of steps, so that the scheme generates an infinite sequence of output functions $u_k$.

54 What ‘small’ exactly means in this context is an interesting topic for study; see, for example, the results on time step selection or freezing (aka pinning) in [33, 35–37, 124, 173, 187, 191]. See also Section 10.

55 In [117; 116] the threshold step assigns $-1$ instead of $1$ to nodes $i$ with $u_i(\tau)=0$. As discussed in Section 8.1, arbitrariness in the choice of value assigned ‘at the threshold’ is a source of nonuniqueness of definition of these schemes. Here we have kept our choice consistent with that in Section 8.1.

56 Sometimes a function which takes two independent variables as input is called a binary function. That is not the sense in which we will use this term. When we refer to a binary function, we mean a function whose image is a subset of a set with two elements, most commonly {0,1} or {−1,1}.

57 We note that the description of the (mcOKMBO) algorithm in [188, section 5.3] contains typos: all instances of ‘$d_i^r u_i$’ in the description at the top of page 2357 should be replaced by ‘$d_i^r$’.

58 Here P(V) denotes the set of probability measures on V, not to be confused with the power set of V, which it denotes in some other, clearly indicated, places in this work.

59 Under appropriate conditions, such as the correct scaling of both $\delta$ and the connectivity radius (see remark 7.2.4 in [190]) in the random geometric graph with the number of nodes in the graph, and a rescaling of $\kappa_G$ with $\delta^{-2}$.

60 The factor $2$ in the last term appears if we count $(i,j)$ and $(j,i)$ as different edges; if we identify these edges as the same edge, the factor is $1$.

61 By this we mean that there exists a $\tilde C>0$ such that, for $\eta>0$ small enough, $|C_\eta|\leq\tilde C\eta$. (See footnote 39.)

62 Technically, it is only shown for the Allen–Cahn equation with smooth enough potentials, such as the double-well potential. In [33, note 39] it is argued that a similar conclusion holds when the double-obstacle potential is used, except potentially in very pathological circumstances.

63 Recently, the authors were made aware that the SDIE scheme may be called an implicit exponential scheme. We thank Daniele Avitabile for letting us know. For more details we refer to footnote 210 of [190].

64 We recall from Lemma 6.1 that the smallest eigenvalue is zero and, since we are working with a connected graph, $\lambda_2>0$.

65 In the notation of [27], for $\gamma=0$ the scheme of Boyd et al. is the same as the graph MBO scheme (without constraints) from Section 8.1 with the combinatorial graph Laplacian.

66 In machine learning, there is the related notion of one-hot encoding of categorical variables: if data point $i$ belongs to category $j$, then this can be encoded in a matrix $U$ with $U_{ij}=1$. In our context each data point belongs to one and only one phase, in which case the one-hot encoded rows of $U$ are corners of a Gibbs simplex (see (11.1)). In a more general setting in which a data point can belong to multiple categories, one-hot encoded rows can contain multiple ones.
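
As a small illustration of one-hot encoding (our own sketch; the labels and all names are hypothetical):

import numpy as np

labels = np.array([0, 2, 1, 2])  # data point i belongs to category labels[i]
K = 3                            # number of categories
U = np.eye(K)[labels]            # row i is the one-hot encoding of data point i

# Each row contains a single 1, so the rows are corners of the Gibbs simplex.
assert np.allclose(U.sum(axis=1), 1.0)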

67 In [143; 95] the symmetrically normalized Laplacian $\Delta_{\mathrm{sym}}$ from (3.6) is used instead of $\Delta$ in the Dirichlet energy, which can be written in the form $\frac12\sum_{k=1}^K\langle u_{\cdot k},\Delta_{\mathrm{sym}}u_{\cdot k}\rangle_{\mathcal{V}}$.

68 Existence and uniqueness of $v_i$ follow from a well-known result; see, for example, Bagirov et al. [14, lemma 2.2].

69 As in footnote 68, existence and uniqueness of $v_i$ follow from, for example, Bagirov et al. [14, lemma 2.2].

70 If fidelity-forcing is required in each of the $K$ classes, then each $\mu_{\cdot k}$ should be non-zero.

71 To be consistent with the other MBO schemes, we write f for the of Section 8.3.

72 In fact, in various papers, for example Zhou and Schölkopf [206], the graph $p$-Laplacian from (3.13) – which is the one derived from the graph $p$-Dirichlet energy – is not used, but rather the $p$-Laplacian derived by discretizing the continuum $p$-Laplacian; see footnote 22.

73 For a brief overview of the connection between random walks and the graph Laplace equation, we refer to Van Gennip [188, Supplementary materials, section 2] and the references therein.

74 For a brief discussion of a redefined Poisson learning which uses a consistent $r$ and which generalizes the Laplacian used to the two-parameter Laplacian $\Delta^{(s,t)}$ from (3.7), see Budd [34].

75 To be consistent with [44], again we must choose $r=0$, except in the constraint $\mathcal{M}(v)=0$, where $r=1$.

References

Abiad, Aida, Mulas, Raffaella, and Zhang, Dong. 2021. Coloring the normalized Laplacian for oriented hypergraphs. Linear Algebra and Its Applications, 629, 192–207.
Açıkmeşe, Behçet. 2015. Spectrum of Laplacians for Graphs with Self-Loops. arXiv:1505.08133 [math.OC].
Albin, Nathan, Brunner, Megan, Perez, Roberto, Poggi-Corradini, Pietro, and Wiens, Natalie. 2015. Modulus on graphs as a generalization of standard graph theoretic quantities. Conformal Geometry and Dynamics of the American Mathematical Society, 19(13), 298–317.
Alicandro, Roberto, Ansini, Nadia, Braides, Andrea, Piatnitski, Andrey, and Tribuzio, Antonio. 2023. A Variational Theory of Convolution-Type Functionals. SpringerBriefs on PDEs and Data Science. Singapore: Springer.
Allen, Samuel M., and Cahn, John W. 1979. A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening. Acta Metallurgica, 27(6), 1085–1095.
Almgren, Fred, Taylor, Jean E., and Wang, Lihe. 1993. Curvature-driven flows: a variational approach. SIAM Journal on Control and Optimization, 31(2), 387–438.
Alon, Noga. 1986. Eigenvalues and expanders. Combinatorica, 6(2), 83–96. DOI: https://doi.org/10.1007/BF02579166.
Alon, Noga, and Milman, Vitali D. 1985. $\lambda_1$, isoperimetric inequalities for graphs, and superconcentrators. Journal of Combinatorial Theory, Series B, 38(1), 73–88.
Andreu, Fuensanta, Ballester, Coloma, Caselles, Vicent, and Mazón, José M. 2001. Minimizing total variation flow. Differential and Integral Equations, 14(3), 321–360.
Arora, Sanjeev, Hazan, Elad, and Kale, Satyen. 2010. $O(\sqrt{\log n})$ approximation to SPARSEST CUT in $\tilde{O}(n^2)$ time. SIAM Journal on Computing, 39(5), 1748–1771.
Aviles-Rivero, Angelica I., Runkel, Christina, Papadakis, Nicolas, Kourtzi, Zoe, and Schönlieb, Carola-Bibiane. 2022. Multi-modal hypergraph diffusion network with dual prior for Alzheimer classification. Pages 717–727 of Wang, Linwei, Dou, Qi, Fletcher, P. Thomas, Speidel, Stefanie, and Li, Shuo (eds.), Medical Image Computing and Computer Assisted Intervention: MICCAI 2022. Cham: Springer Nature.
Avrachenkov, Konstantin, Chebotarev, Pavel, and Rubanov, Dmytro. 2019. Similarities on graphs: kernels versus proximity measures. European Journal of Combinatorics, 80, 47–56.
Bae, Egil, and Merkurjev, Ekaterina. 2017. Convex variational methods on graphs for multiclass segmentation of high-dimensional data and point clouds. Journal of Mathematical Imaging and Vision, 58, 468–493.
Bagirov, Adil, Karmitsa, Napsu, and Mäkelä, Marko M. 2014. Introduction to Nonsmooth Optimization. Cham: Springer.
Barabási, Albert-László. 2016. Network Science. Cambridge: Cambridge University Press.
Barles, Guy, and Georgelin, Christine. 1995. A simple proof of convergence for an approximation scheme for computing motions by mean curvature. SIAM Journal on Numerical Analysis, 32(2), 484–500.
Bauer, Frank. 2012. Normalized graph Laplacians for directed graphs. Linear Algebra and Its Applications, 436(11), 4193–4222.
Belongie, Serge, Fowlkes, Charless, Chung, Fan, and Malik, Jitendra. 2002. Spectral partitioning with indefinite kernels using the Nyström extension. Pages 531–542 of Heyden, Anders, Sparr, Gunnar, Nielsen, Mads, and Johansen, Peter (eds.), European Conference on Computer Vision. Berlin: Springer. https://link.springer.com/book/10.1007/3-540-47969-4
Bendito, Enrique, Carmona, Ángeles, and Encinas, Andrés M. 2003. Solving Dirichlet and Poisson problems on graphs by means of equilibrium measures. European Journal of Combinatorics, 24(4), 365–375.
Bertozzi, Andrea L. 2018. Graphical models in machine learning, networks, and uncertainty quantification. Pages 3853–3880 of Sirakov, Boyan, Ney de Souza, Paulo, and Viana, Marcelo (eds.), Proceedings of the International Congress of Mathematicians, vol. 3. Singapore: World Scientific.
Bertozzi, Andrea L., and Flenner, Arjuna. 2012. Diffuse interface models on graphs for classification of high dimensional data. Multiscale Modeling & Simulation, 10(3), 1090–1118.
Bertozzi, Andrea L., and Flenner, Arjuna. 2016. Diffuse interface models on graphs for classification of high dimensional data. SIAM Review, 58(2), 293–328.
Bertozzi, Andrea L., and Merkurjev, Ekaterina. 2019. Graph-based optimization approaches for machine learning, uncertainty quantification and networks. Pages 503–531 of Kimmel, Ron, and Tai, Xue-Cheng (eds.), Processing, Analyzing and Learning of Images, Shapes, and Forms: Part 2. Handbook of Numerical Analysis, vol. 20. Amsterdam: Elsevier.
Bühler, Thomas, and Hein, Matthias. 2009. Spectral clustering based on the graph $p$-Laplacian. Pages 81–88 of Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY: Association for Computing Machinery.
Bonaccorsi, Stefano, Cottini, Francesca, and Mugnolo, Delio. 2021. Random evolution equations: well-posedness, asymptotics, and applications to graphs. Applied Mathematics and Optimization, 84, 2849–2887.
Bosch, Jessica, Klamt, Steffen, and Stoll, Martin. 2018. Generalizing diffuse interface methods on graphs: nonsmooth potentials and hypergraphs. SIAM Journal on Applied Mathematics, 78(3), 1350–1377.
Boyd, Zachary M., Bae, Egil, Tai, Xue-Cheng, and Bertozzi, Andrea L. 2018. Simplified energy landscape for modularity using total variation. SIAM Journal on Applied Mathematics, 78(5), 2439–2464.
Braides, Andrea. 2002. $\Gamma$-Convergence for Beginners. Oxford Lecture Series in Mathematics and Its Applications, vol. 22. Oxford: Oxford University Press.
Braides, Andrea, Gelli, Maria Stella, and Novaga, Matteo. 2010. Motion and pinning of discrete interfaces. Archive for Rational Mechanics and Analysis, 195(2), 469–498.
Brakke, Kenneth A. 1978. The Motion of a Surface by Its Mean Curvature. Mathematical Notes, vol. 20. Princeton, NJ: Princeton University Press.
Bresson, Xavier, Hu, Huiyi, Laurent, Thomas, Szlam, Arthur, and von Brecht, James. 2018. An incremental reseeding strategy for clustering. Pages 203–219 of Tai, Xue-Cheng, Bae, Egil, and Lysaker, Marius (eds.), International Conference on Imaging, Vision and Learning based on Optimization and PDEs – IVLOPDE 2016: Imaging, Vision and Learning Based on Optimization and PDEs. Mathematics and Visualization. Cham: Springer International.
Bronsard, Lia, and Kohn, Robert V. 1991. Motion by mean curvature as the singular limit of Ginzburg–Landau dynamics. Journal of Differential Equations, 90(2), 211–237.
Budd, Jeremy. 2022. Theory and Applications of PDE Methods in Graph-Based Learning. PhD thesis, Delft University of Technology.
Budd, Jeremy. 2023. Graph-Based Learning for Image Processing. https://jeremybudd.com/. Online lecture notes; accessed 7 August 2023.
Budd, Jeremy, and van Gennip, Yves. 2020. Graph Merriman–Bence–Osher as a semidiscrete implicit Euler scheme for graph Allen–Cahn flow. SIAM Journal on Mathematical Analysis, 52(5), 4101–4139.
Budd, Jeremy, and van Gennip, Yves. 2022. Mass-conserving diffusion-based dynamics on graphs. European Journal of Applied Mathematics, 33(3), 423–471.
Budd, Jeremy, van Gennip, Yves, and Latz, Jonas. 2021. Classification and image processing with a semi-discrete scheme for fidelity forced Allen–Cahn on graphs. GAMM-Mitteilungen, Special Issue: Scientific Machine Learning – Part I, 44, 1–43.
Böhle, Tobias, Kuehn, Christian, Mulas, Raffaella, and Jost, Jürgen. 2022. Coupled hypergraph maps and chaotic cluster synchronization. Europhysics Letters, 136(4), 40005.
Caffarelli, Luis A., and Souganidis, Panagiotis E. 2010. Convergence of nonlocal threshold dynamics approximations to front propagation. Archive for Rational Mechanics and Analysis, 195(1), 1–23.
Cahn, John W., and Hilliard, John E. 1958. Free energy of a nonuniform system. I. Interfacial free energy. The Journal of Chemical Physics, 28(2), 258–267.
Calatroni, Luca, van Gennip, Yves, Schönlieb, Carola-Bibiane, Rowland, Hannah M., and Flenner, Arjuna. 2017. Graph clustering, variational image segmentation methods and Hough transform scale detection for object measurement in images. Journal of Mathematical Imaging and Vision, 57, 269–291.
Calder, Jeff, and Ettehad, Mahmood. 2022. Hamilton–Jacobi equations on graphs with applications to semi-supervised learning and data depth. Journal of Machine Learning Research, 23(318), 1–62.
Calder, Jeff, and Slepčev, Dejan. 2020. Properly-weighted graph Laplacian for semisupervised learning. Applied Mathematics and Optimization, 82, 1111–1159.
Calder, Jeff, Cook, Brendan, Thorpe, Matthew, and Slepčev, Dejan. 2020. Poisson learning: graph based semi-supervised learning at very low label rates. Pages 1306–1316 of Daumé III, Hal, and Singh, Aarti (eds.), Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119.
Calder, Jeff, García Trillos, Nicolás, and Lewicka, Marta. 2022. Lipschitz regularity of graph Laplacians on random data clouds. SIAM Journal on Mathematical Analysis, 54(1), 1169–1222.
Chan, Tony F., and Shen, Jianhong (Jackie). 2005. Image Processing and Analysis. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Chan, Tony F., and Vese, Luminita A. 2001. Active contours without edges. IEEE Transactions on Image Processing, 10(2), 266–277.
Chebotarev, Pavel. 2013. Studying new classes of graph metrics. Pages 207–214 of Nielsen, Frank, and Barbaresco, Frédéric (eds.), Geometric Science of Information: Proceedings of the First International Conference GSI 2013. Berlin: Springer.
Cheeger, Jeff. 1970. A lower bound for the smallest eigenvalue of the Laplacian. Pages 195–199 of Gunning, Robert C. (ed.), Problems in Analysis (Papers dedicated to Salomon Bochner, 1969). Princeton, NJ: Princeton University Press.
Chen, Yun Gang, Giga, Yoshikazu, and Goto, Shun’ichi. 1991. Uniqueness and existence of viscosity solutions of generalized mean curvature flow equations. Journal of Differential Geometry, 33(3), 749–786.
Chong, Yanwen, Ding, Yun, Yan, Qing, and Pan, Shaoming. 2020. Graph-based semi-supervised learning: a review. Neurocomputing, 408, 216–230.
Chung, Fan. 2005. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9, 1–19.
Chung, Fan, and Yau, Shing-Tung. 2000. Discrete Green’s functions. Journal of Combinatorial Theory, Series A, 91(1), 191–214.
Chung, Fan, and Yau, Shing-Tung. 2007. Discrete Green’s functions. https://mathweb.ucsd.edu/~fan/wp/green.pdf. Online version of [53] with additional revisions, file dated 4 December 2007; accessed 29 August 2022.
Chung, Fan R. K. 1997. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, vol. 92. Providence, RI: American Mathematical Society.
Clarenz, Ulrich, Haußer, Frank, Rumpf, Martin, Voigt, Axel, and Weikard, Ulrich. 2005. On level set formulations for anisotropic mean curvature flow and surface diffusion. Pages 227–237 of Voigt, Axel (ed.), Multiscale Modeling in Epitaxial Growth. International Series of Numerical Mathematics, vol. 149. Basel: Birkhäuser.
Corless, Robert M., Gonnet, Gaston H., Hare, David E. G., Jeffrey, David J., and Knuth, Donald E. 1996. On the Lambert $W$ function. Advances in Computational Mathematics, 5, 329–359.
Coulhon, Thierry, and Koskela, Pekka. 2004. Geometric interpretations of $L^p$-Poincaré inequalities on graphs with polynomial volume growth. Milan Journal of Mathematics, 72, 209–248.
Crook, Oliver M., Hurst, Tim, Schönlieb, Carola-Bibiane, Thorpe, Matthew, and Zygalakis, Konstantinos C. 2019. PDE-Inspired Algorithms for Semi-Supervised Learning on Point Clouds. arxiv.org/abs/1909.10221v1 [math.NA].
Cucuringu, Mihai, Pizzoferrato, Andrea, and van Gennip, Yves. 2021. An MBO scheme for clustering and semi-supervised clustering of signed networks. Communications in Mathematical Sciences, 19(1), 73–109.
Dacorogna, Bernard, Gangbo, Wilfrid, and Subía, Nelson. 1992. Sur une généralisation de l’inégalité de Wirtinger. Annales de l’Institut Henri Poincaré / Analyse non linéaire, 9(1), 29–50.
Dal Maso, Gianni. 1993. An Introduction to $\Gamma$-Convergence. First ed. Progress in Nonlinear Differential Equations and Their Applications, vol. 8. Boston, MA: Birkhäuser.
Desquesnes, Xavier, Elmoataz, Abderrahim, and Lézoray, Olivier. 2012. Generalized fronts propagation on weighted graphs. Pages 371–381 of Proceedings of ALGORITMY: 19th Conference on Scientific Computing, 2012, Slovakia. Bratislava: Comenius University, Department of Applied Mathematics and Statistics. www.iam.fmph.uniba.sk/amuc/ojs/index.php/algoritmy/issue/view/18
Ding, Chris H. Q., He, Xiaofeng, Zha, Hongyuan, Gu, Ming, and Simon, Horst D. 2001. A min-max cut algorithm for graph partitioning and data clustering. Pages 107–114 of Cercone, Nick, Lin, Tsau Young, and Wu, Xindong (eds.), Proceedings 2001 IEEE International Conference on Data Mining. Washington, DC: IEEE Computer Society. https://ieeexplore.ieee.org/xpl/conhome/7762/proceeding.
Ding, Xiucai, and Wu, Hau-Tieng. 2022. Impact of signal-to-noise ratio and bandwidth on graph Laplacian spectrum from high-dimensional noisy point cloud. IEEE Transactions on Information Theory, 69(3), 1899–1931.
Dym, Harry, and McKean, Henry Pratt. 1985. Fourier Series and Integrals. Elsevier Academic Press.
Ekeland, Ivar, and Temam, Roger. 1976. Convex Analysis and Variational Problems. Amsterdam: North-Holland. Translated from the French, Studies in Mathematics and Its Applications, vol. 1.
El Bouchairi, Imad. 2021. Partial differential equations on graphs: continuum limits on sparse graphs and applications. PhD thesis, University of Caen Normandy.
El Bouchairi, Imad, Elmoataz, Abderrahim, and Fadili, Jalal M. 2023. Nonlocal perimeters and curvature flows on graphs with applications in image processing and high-dimensional data classification. SIAM Journal on Imaging Sciences, 16(1), 368–392.
El Chakik, Abdallah, Elmoataz, Abderrahim, and Desquesnes, Xavier. 2014. Mean curvature flow on graphs for image and manifold restoration and enhancement. Signal Processing, 105, 449–463.
Elek, Gábor, and Szegedy, Balázs. 2012. A measure-theoretic approach to the theory of dense hypergraphs. Advances in Mathematics, 231(3), 1731–1772.
Elmoataz, Abderrahim, Desquesnes, Xavier, and Lézoray, Olivier. 2012. Non-local morphological PDEs and $p$-Laplacian equation on graphs with applications in image processing and machine learning. IEEE Journal of Selected Topics in Signal Processing, 6(7), 764–779.
Elmoataz, Abderrahim, Desquesnes, Xavier, Lakhdari, Zakaria, and Lézoray, Olivier. 2014. Nonlocal infinity Laplacian equation on graphs with applications in image processing and machine learning. Mathematics and Computers in Simulation, 102, 153–163. 4th International Conference on Approximation Methods and Numerical Modeling in Environment and Natural Resources: PART II.
Elmoataz, Abderrahim, Desquesnes, Xavier, and Toutain, Matthieu. 2017. On the game $p$-Laplacian on weighted graphs with applications in image processing and data clustering. European Journal of Applied Mathematics, 28(6), 922–948.
Elmoataz, Abderrahim, Lézoray, Olivier, and Bougleux, Sébastien. 2008. Nonlocal discrete $p$-Laplacian driven image and manifold processing. Comptes Rendus Mécanique, 336(5), 428–433.
Elmoataz, Abderrahim, Toutain, Matthieu, and Tenbrinck, Daniel. 2015. On the $p$-Laplacian and $\infty$-Laplacian on graphs with applications in image and data processing. SIAM Journal on Imaging Sciences, 8(4), 2412–2451.
Esedoḡlu, Selim, and Otto, Felix. 2015. Threshold dynamics for networks with arbitrary surface tensions. Communications on Pure and Applied Mathematics, 68(5), 808–864.
Estrada, Ernesto, and Knight, Philip Anthony. 2015. A First Course in Network Theory. Oxford: Oxford University Press.
Estrada, Ernesto, and Mugnolo, Delio. 2022. Hubs-biased resistance distances on graphs and networks. Journal of Mathematical Analysis and Applications, 507(1), 125728.
Evans, Lawrence C. 1993. Convergence of an algorithm for mean curvature motion. Indiana University Mathematics Journal, 42(2), 533–557.
Evans, Lawrence C. 2010. Partial Differential Equations. Second ed. Graduate Studies in Mathematics, vol. 19. Providence, RI: American Mathematical Society.
Evans, Lawrence C., and Spruck, Joel. 1991. Motion of level sets by mean curvature. I. Journal of Differential Geometry, 33(3), 635–681.
Evans, Lawrence C., and Spruck, Joel. 1992a. Motion of level sets by mean curvature. II. Transactions of the American Mathematical Society, 330(1), 321–332.
Evans, Lawrence C., and Spruck, Joel. 1992b. Motion of level sets by mean curvature. III. The Journal of Geometric Analysis, 2(2), 121–150.
Evans, Lawrence C., and Spruck, Joel. 1995. Motion of level sets by mean curvature. IV. The Journal of Geometric Analysis, 5(1), 77–114.
Fanuel, Michaël, Alaíz, Carlos M., Fernández, Ángela, and Suykens, Johan A. K. 2018. Magnetic eigenmaps for the visualization of directed networks. Applied and Computational Harmonic Analysis, 44(1), 189–199.
Fanuel, Michaël, Alaíz, Carlos M., and Suykens, Johan A. K. 2017. Magnetic eigenmaps for community detection in directed networks. Physical Review E, 95(Feb), 022302.
Fazeny, Ariane. 2023. $p$-Laplacian Operators on Hypergraphs. arXiv:2304.06468 [math.OC]. M.Sc. thesis.
Fazeny, Ariane, Tenbrinck, Daniel, and Burger, Martin. 2023. Hypergraph $p$-Laplacians, scale spaces, and information flow in networks. Pages 677–690 of Calatroni, Luca, Donatelli, Marco, Morigi, Serena, Prato, Marco, and Santacesaria, Matteo (eds.), Scale Space and Variational Methods in Computer Vision. Lecture Notes in Computer Science (LNCS), vol. 14009. Cham: Springer International.
Flores, Mauricio, Calder, Jeff, and Lerman, Gilad. 2022. Analysis and algorithms for $\ell_p$-based semi-supervised learning on graphs. Applied and Computational Harmonic Analysis, 60, 77–122.
Forman, Robin. 2003. Bochner’s method for cell complexes and combinatorial Ricci curvature. Discrete & Computational Geometry, 29(3), 323–374.
Galuppi, Francesco, Mulas, Raffaella, and Venturello, Lorenzo. 2023. Spectral theory of weighted hypergraphs via tensors. Linear and Multilinear Algebra, 71(3), 317–347.
Garcia-Cardona, Cristina, Flenner, Arjuna, and Percus, Allon G. 2013. Multiclass diffuse interface models for semi-supervised learning on graphs. Pages 78–86 of Fred, Ana, and De Marsico, Maria (eds.), Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2013). Springer Nature. https://link.springer.com/book/10.1007/978-3-319-12610-4.
Garcia-Cardona, Cristina, Flenner, Arjuna, and Percus, Allon G. 2015. Multiclass semi-supervised learning on graphs using Ginzburg–Landau functional minimization. Pages 119–135 of Fred, Ana, and De Marsico, Maria (eds.), Pattern Recognition Applications and Methods. Advances in Intelligent Systems and Computing (AISC), vol. 318. Cham: Springer International.
Garcia-Cardona, Cristina, Merkurjev, Ekaterina, Bertozzi, Andrea L., Flenner, Arjuna, and Percus, Allon G. 2014. Multiclass data segmentation using diffuse interface methods on graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8), 1600–1613.
García Trillos, Nicolás, and Weber, Melanie. 2023. Continuum limits of Ollivier’s Ricci curvature on data clouds: pointwise consistency and global lower bounds. arxiv.org/abs/2307.02378v1 [math.DG].
García Trillos, Nicolás, Sanz-Alonso, Daniel, and Yang, Ruiyi. 2019. Local regularization of noisy point clouds: improved global geometric estimates and data analysis. Journal of Machine Learning Research, 20(136), 1–37.
Ghosh, Rumi, and Lerman, Kristina. 2014. Rethinking centrality: the role of dynamical processes in social network analysis. Discrete & Continuous Dynamical Systems: Series B, 19(5), 1355–1372.
Goemans, Michel X., and Williamson, David P. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6), 1115–1145.
Gong, Xue, Higham, Desmond J., and Zygalakis, Konstantinos. 2021. Directed network Laplacians and random graph models. Royal Society Open Science, 8(211144).
Grady, Leo J., and Polimeni, Jonathan R. 2010. Discrete Calculus: Applied Analysis on Graphs for Computational Science. London: Springer.
Hagen, Lars, and Kahng, Andrew B. 1992. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11(9), 1074–1085.
Hale, Jack K. 2009. Ordinary Differential Equations. Second ed. Mineola, NY: Dover.
Hein, Matthias, Audibert, Jean-Yves, and von Luxburg, Ulrike. 2007. Graph Laplacians and their convergence on random neighborhood graphs. Journal of Machine Learning Research, 8, 1325–1368.
Hoffmann, Franca, Hosseini, Bamdad, Oberai, Assad A., and Stuart, Andrew M. 2022. Spectral analysis of weighted Laplacians arising in data clustering. Applied and Computational Harmonic Analysis, 56, 189–249.
Hu, Huiyi, Sunu, Justin, and Bertozzi, Andrea L. 2015. Multi-class graph Mumford–Shah model for plume detection using the MBO scheme. Pages 209–222 of Tai, Xue-Cheng, Bae, Egil, Chan, Tony F., and Lysaker, Marius (eds.), Energy Minimization Methods in Computer Vision and Pattern Recognition. Cham: Springer International.
Iacono, Roberto, and Boyd, John P. 2017. New approximations to the principal real-valued branch of the Lambert W-function. Advances in Computational Mathematics, 43(6), 1403–1436.
Ikeda, Masahiro, and Uchida, Shun. 2023. Nonlinear evolution equation associated with hypergraph Laplacian. Mathematical Methods in the Applied Sciences, 46(8), 9463–9476.
Izenman, Alan Julian. 2023. Network Models for Data Science: Theory, Algorithms, and Applications. Cambridge: Cambridge University Press.
Jacobs, Matt, Merkurjev, Ekaterina, and Esedoḡlu, Selim. 2018. Auction dynamics: a volume constrained MBO scheme. Journal of Computational Physics, 354, 288–310.
Jost, Jürgen, and Münch, Florentin. 2021. Characterizations of Forman Curvature. arxiv.org/abs/2110.04554v1 [math.DG].
Jost, Jürgen, and Mulas, Raffaella. 2019. Hypergraph Laplace operators for chemical reaction networks. Advances in Mathematics, 351, 870–896.
Jost, Jürgen, Mulas, Raffaella, and Zhang, Dong. 2022. $p$-Laplace operators for oriented hypergraphs. Vietnam Journal of Mathematics, 50(2), 323–358.
Jost, Jürgen, Mulas, Raffaella, and Torres, Leo. 2023. Spectral theory of the non-backtracking Laplacian for graphs. Discrete Mathematics, 346(10), 113536.
Karoui, Noureddine El, and Wu, Hau-Tieng. 2016. Graph connection Laplacian methods can be made robust to noise. The Annals of Statistics, 44(1), 346–372.
Keetch, Blaine. 2020. Applying partial differential equations on networks to approximate the Max-Cut and Max-K-Cut problems. PhD thesis, University of Nottingham.
Keetch, Blaine, and van Gennip, Yves. 2019. A Max-Cut approximation using a graph based MBO scheme. Discrete and Continuous Dynamical Systems: Series B, 24(11), 6091–6139.
Kennedy, James B., Kurasov, Pavel, Léna, Corentin, and Mugnolo, Delio. 2021. A theory of spectral partitions of metric graphs. Calculus of Variations and Partial Differential Equations, 60(61).
Klus, Stefan, and Djurdjevac Conrad, Nataša. 2023. Koopman-based spectral clustering of directed and time-evolving graphs. Journal of Nonlinear Science, 33(1), Paper No. 8, 22.
Knight, Philip A., and Ruiz, Daniel. 2013. A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis, 33(3), 1029–1047.
Knill, Oliver, and Rak, Annie. 2016. Differential equations on graphs. https://people.math.harvard.edu/~knill/pde/pde.pdf. HCRP Summer 2016 program project notes.
Korolev, Yury, Kuger, Lorenz, and Thorpe, Matthew. Laplace Learning for Feature Vectors in Hilbert Spaces. In preparation.
Kyng, Rasmus, Rao, Anup, Sachdeva, Sushant, and Spielman, Daniel A. 2015. Algorithms for Lipschitz learning on graphs. Pages 1190–1223 of Grünwald, Peter, Hazan, Elad, and Kale, Satyen (eds.), Proceedings of the 28th Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 40. Paris: Proceedings of Machine Learning Research.
Laux, Tim, and Lelmi, Jona. 2021. Large data limit of the MBO scheme for data clustering: $\Gamma$-convergence of the thresholding energies. arXiv:2112.06737 [math.AP].
Laux, Tim, and Lelmi, Jona. 2022. De Giorgi’s inequality for the thresholding scheme with arbitrary mobilities and surface tensions. Calculus of Variations and Partial Differential Equations, 61(35).
Laux, Tim, and Yip, Nung Kwan. 2019. Analysis of diffusion generated motion for mean curvature flow in codimension two: a gradient-flow approach. Archive for Rational Mechanics and Analysis, 232, 1113–1163.
Le Gorrec, Luce, Mouysset, Sandrine, Duff, Iain S., Knight, Philip Anthony, and Ruiz, Daniel. 2020. Uncovering hidden block structure for clustering. Pages 140–155 of Brefeld, Ulf, Fromont, Elisa, Hotho, Andreas, Knobbe, Arno, Maathuis, Marloes, and Robardet, Céline (eds.), Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science (LNAI), vol. 11906. Cham: Springer International.
Lee, James R., Oveis Gharan, Shayan, and Trevisan, Luca. 2014. Multiway spectral partitioning and higher-order Cheeger inequalities. Journal of the ACM, 61(6).
Levi, Matteo, Santagati, Federico, Tabacco, Anita, and Vallarino, Maria. 2023. Poincaré inequalities on graphs. Analysis Mathematica, 49(2), 529–544.
Lovász, László. 2009. Geometric Representations of Graphs. https://web.cs.elte.hu/~lovasz/geomrep.pdf. Online notes; accessed 4 August 2022.
Lozes, François, Elmoataz, Abderrahim, and Lézoray, Olivier. 2012. Nonlocal processing of 3D colored point clouds. Pages 1968–1971 of Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). New York, NY: IEEE. https://ieeexplore.ieee.org/xpl/conhome/6425799/proceeding.
Luckhaus, Stephan, and Sturzenhecker, Thomas. 1995. Implicit time discretization for the mean curvature flow equation. Calculus of Variations and Partial Differential Equations, 3(2), 253–271.
Macgregor, Peter. 2022. On Learning the Structure of Clusters in Graphs. arxiv.org/abs/2212.14345 [cs.DS]. PhD thesis, University of Edinburgh.
Macgregor, Peter, and Sun, He. 2021a. Finding bipartite components in hypergraphs. Pages 7912–7923 of Ranzato, Marc’Aurelio, Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy S., and Wortman Vaughan, Jennifer (eds.), Advances in Neural Information Processing Systems, vol. 34. New York, NY: Curran Associates.
Macgregor, Peter, and Sun, He. 2021b. Local algorithms for finding densely connected clusters. Pages 7268–7278 of Meila, Marina, and Zhang, Tong (eds.), Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139.
Macgregor, Peter, and Sun, He. 2022. A tighter analysis of spectral clustering, and beyond. Pages 14717–14742 of Chaudhuri, Kamalika, Jegelka, Stefanie, Song, Le, Szepesvari, Csaba, Niu, Gang, and Sabato, Sivan (eds.), Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162.
MacKay, Robert S., Johnson, Samuel, and Sansom, Brandon. 2020. How directed is a directed network? Royal Society Open Science, 7(201138).
Manfredi, Juan J., Oberman, Adam M., and Sviridov, Alexander P. 2015. Nonlinear elliptic partial differential equations and $p$-harmonic functions on graphs. Differential and Integral Equations, 28(1–2), 79–102.
Mazón, José M., Solera, Marcos, and Toledo, Julián. 2020a. The heat flow on metric random walk spaces. Journal of Mathematical Analysis and Applications, 483(2), 123645.
Mazón, José M., Solera, Marcos, and Toledo, Julián. 2020b. The total variation flow in metric random walk spaces. Calculus of Variations and Partial Differential Equations, 59(29), 1–64.
Merkurjev, Ekaterina, Bertozzi, Andrea, Yan, Xiaoran, and Lerman, Kristina. 2017. Modified Cheeger and ratio cut methods using the Ginzburg–Landau functional for classification of high-dimensional data. Inverse Problems, 33, 074003.
Merkurjev, Ekaterina, Bertozzi, Andrea L., and Chung, Fan. 2018. A semi-supervised heat kernel pagerank MBO algorithm for data classification. Communications in Mathematical Sciences, 16(5), 1241–1264.
Merkurjev, Ekaterina, Garcia-Cardona, Cristina, Bertozzi, Andrea L., Flenner, Arjuna, and Percus, Allon G. 2014. Diffuse interface methods for multiclass segmentation of high-dimensional data. Applied Mathematics Letters, 33, 29–34.
Merkurjev, Ekaterina, Kostic, Tijana, and Bertozzi, Andrea L. 2013. An MBO scheme on graphs for segmentation and image processing. SIAM Journal on Imaging Sciences, 6(4), 1903–1930.
Merkurjev, Ekaterina, Nguyen, Duc Duy, and Wei, Guo-Wei. 2022. Multiscale Laplacian learning. Applied Intelligence, 53, 15727–15746.
Merriman, Barry, Bence, James K., and Osher, Stanley J. 1992. Diffusion Generated Motion by Mean Curvature. UCLA Department of Mathematics CAM report 92-18. https://ww3.math.ucla.edu/cam-reports-1986-2000/
Merriman, Barry, Bence, James K., and Osher, Stanley J. 1993. Diffusion generated motion by mean curvature. AMS Selected Letters, Crystal Grower’s Workshop, 73–83.
Miller, Kevin, Mauro, John, Setiadi, Jason, Baca, Xoaquin, Shi, Zhan, Calder, Jeff, and Bertozzi, Andrea L. 2022. Graph-based active learning for semi-supervised classification of SAR data. Page 120950C of Zelnio, Edmund, and Garber, Frederick D. (eds.), Algorithms for Synthetic Aperture Radar Imagery XXIX, vol. 12095. SPIE.
Misiats, Oleksandr, and Yip, Nung Kwan. 2016. Convergence of space-time discrete threshold dynamics to anisotropic motion by mean curvature. Discrete and Continuous Dynamical Systems, 36(11), 6379–6411. https://www.aimsciences.org/article/doi/10.3934/dcds.2016076.
Moreau, Luc. 2004. Stability of continuous-time distributed consensus algorithms. Pages 3998–4003 of 43rd IEEE Conference on Decision and Control, vol. 4. DOI: https://doi.org/10.1109/CDC.2004.1429377.
Mucha, Peter J., Richardson, Thomas, Macon, Kevin, Porter, Mason A., and Onnela, Jukka-Pekka. 2010. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
Münch, Florentin, and Wojciechowski, Radosław K. 2019. Ollivier Ricci curvature for general graph Laplacians: heat equation, Laplacian comparison, non-explosion and diameter bounds. Advances in Mathematics, 356, 106759.
Mugnolo, Delio. 2014. Semigroup Methods for Evolution Equations on Networks. Cham: Springer International.
Mulas, Raffaella. 2021. A Cheeger cut for uniform hypergraphs. Graphs and Combinatorics, 37(6), 2265–2286.
Mulas, Raffaella, Horak, Danijela, and Jost, Jürgen. 2022a. Graphs, simplicial complexes and hypergraphs: spectral theory and topology. Pages 1–58 of Battiston, Federico, and Petri, Giovanni (eds.), Higher-Order Systems. Understanding Complex Systems. Cham: Springer International.
Mulas, Raffaella, Kuehn, Christian, Böhle, Tobias, and Jost, Jürgen. 2022b. Random walks and Laplacians on hypergraphs: when do they match? Discrete Applied Mathematics, 317, 26–41.
Mulas, Raffaella, and Zhang, Dong. 2021. Spectral theory of Laplace operators on oriented hypergraphs. Discrete Mathematics, 344(6), 112372.
Mulas, Raffaella, Zhang, Dong, and Zucal, Giulio. 2024. There is no going back: properties of the non-backtracking Laplacian. Linear Algebra and Its Applications, 680, 341–370. https://doi.org/10.1016/j.laa.2023.10.014.
Mumford, David, and Shah, Jayant. 1989. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42, 577–685. DOI: https://doi.org/10.1002/cpa.3160420503.
Neuberger, John M. 2006. Nonlinear elliptic partial difference equations on graphs. Experimental Mathematics, 15(1), 91–107.
Ng, Andrew Y., Jordan, Michael I., and Weiss, Yair. 2001. On spectral clustering: analysis and an algorithm. Pages 849–856 of Dietterich, Thomas G., Becker, Suzanna, and Ghahramani, Zoubin (eds.), Advances in Neural Information Processing Systems, vol. 14. Cambridge, MA: Massachusetts Institute of Technology Press.
Ni, Chien-Chun, Lin, Yu-Yao, Luo, Feng, and Gao, Jie. 2019a. Author correction: community detection on networks with Ricci flow. Scientific Reports, 9(12870).
Ni, Chien-Chun, Lin, Yu-Yao, Luo, Feng, and Gao, Jie. 2019b. Community detection on networks with Ricci flow. Scientific Reports, 9(9984).
Olfati-Saber, Reza, and Murray, Richard M. 2004. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9), 1520–1533.
Ollivier, Yann. 2009. Ricci curvature of Markov chains on metric spaces. Journal of Functional Analysis, 256(3), 810–864.
Ollivier, Yann. 2010. A survey of Ricci curvature for metric spaces and Markov chains. Pages 343–381 of Kotani, Motoko, Hino, Masanori, and Kumagai, Takashi (eds.), Probabilistic Approach to Geometry. Advanced Studies in Pure Mathematics, vol. 57. Tokyo: Mathematical Society of Japan.
Osher, Stanley J., and Sethian, James A. 1988. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton–Jacobi formulations. Journal of Computational Physics, 79(1), 12–49.
Parker, Brian J., and Feng, Dagan D. 2005. Graph-based Mumford–Shah segmentation of dynamic PET with application to input function estimation. IEEE Transactions on Nuclear Science, 52(1), 79–89.
Peng, Richard, Sun, He, and Zanetti, Luca. 2017. Partitioning well-clustered graphs: spectral clustering works! SIAM Journal on Computing, 46(2), 710–743.
Peres, Yuval, and Sheffield, Scott. 2008. Tug-of-war with noise: a game-theoretic view of the $p$-Laplacian. Duke Mathematical Journal, 145(1), 91–120.
Porter, Mason A., Onnela, Jukka-Pekka, and Mucha, Peter J. 2009. Communities in networks. Notices of the American Mathematical Society, 56(9), 1082–1097, 1164–1166.
Rak, Annie. 2017. Advection on Graphs. http://nrs.harvard.edu/urn-3:HUL.InstRepos:38779537. Senior thesis.
Ruuth, Steven J. 1996. Efficient algorithms for diffusion-generated motion by mean curvature. PhD thesis, University of British Columbia.
Schaeffer, Satu Elisa. 2007. Graph clustering. Computer Science Review, 1(1), 27–64.
Shen, Jianhong, and Kang, Sung Ha. 2007. Quantum TV and applications in image processing. Inverse Problems and Imaging, 1(3), 557–575.
Shi, Jianbo, and Malik, Jitendra. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Shubin, Mikhail A. 1994. Discrete magnetic Laplacian. Communications in Mathematical Physics, 164, 259–275.
Sinclair, Alistair, and Jerrum, Mark. 1989. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1), 93–133.
Slepčev, Dejan, and Thorpe, Matthew. 2019. Analysis of $p$-Laplacian regularization in semisupervised learning. SIAM Journal on Mathematical Analysis, 51(3), 2085–2120.
Smola, Alexander J., and Kondor, Risi. 2003. Kernels and regularization on graphs. Pages 144–158 of Schölkopf, Bernhard, and Warmuth, Manfred K. (eds.), Learning Theory and Kernel Machines. Lecture Notes in Computer Science (LNAI), vol. 2777. Berlin: Springer.
Szeliski, Richard, Zabih, Ramin, Scharstein, Daniel, Veksler, Olga, Kolmogorov, Vladimir, Agarwala, Aseem, Tappen, Marshall, and Rother, Carsten. 2008. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(6), 1068–1080.
Ta, Vinh-Thong, Elmoataz, Abderrahim, and Lézoray, Olivier. 2008. Nonlocal morphological levelings by partial difference equations over weighted graphs. Pages 1–4 of 2008 19th International Conference on Pattern Recognition. IEEE Computer Society Digital Library. www.computer.org/csdl/proceedings/icpr/2008/12OmNx8wTfL.
Ta, Vinh-Thong, Elmoataz, Abderrahim, and Lézoray, Olivier. 2011. Nonlocal PDEs-based morphology on weighted graphs for image and data processing. IEEE Transactions on Image Processing, 20(6), 1504–1516.
Thorpe, Matthew, and Wang, Bao. 2022. Robust certification for Laplace learning on geometric graphs. Pages 896–920 of Bruna, Joan, Hesthaven, Jan, and Zdeborova, Lenka (eds.), Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference. Proceedings of Machine Learning Research, vol. 145. Brookline, MA: Microtome Publishing.
Tian, Yu, Lubberts, Zachary, and Weber, Melanie. 2023. Curvature-Based Clustering on Graphs. arxiv.org/abs/2307.10155v1 [cs.SI].
van der Hoorn, Pim, Cunningham, William J., Lippner, Gabor, Trugenberger, Carlo, and Krioukov, Dmitri. 2021. Ollivier–Ricci curvature convergence in random geometric graphs. Physical Review Research, 3(Mar), 013211.
van Gennip, Yves. 2019. Graph MBO on star graphs and regular trees. With corrections to DOI 10.1007/s00032-014-0216-8. Milan Journal of Mathematics, 87(1), 141–168.
van Gennip, Yves. 2020. An MBO scheme for minimizing the graph Ohta–Kawasaki functional. Journal of Nonlinear Science, 30, 2325–2373.
van Gennip, Yves, and Bertozzi, Andrea L. 2012. $\Gamma$-convergence of graph Ginzburg–Landau functionals. Advances in Differential Equations, 17(11–12), 1115–1180.
van Gennip, Yves, and Budd, Jeremy. 2024. Differential Equations and Variational Methods on Graphs: with Applications in Machine Learning and Image Analysis. Cambridge: Cambridge University Press.
van Gennip, Yves, Guillen, Nestor, Osting, Braxton, and Bertozzi, Andrea L. 2014. Mean curvature, threshold dynamics, and phase field theory on finite graphs. Milan Journal of Mathematics, 82(1), 3–65.
Vol’pert, Aizik Isaakovich. 2015. Differential equations on graphs. Mathematical Modelling of Natural Phenomena, 10(5), 6–15. Reprinted with permission from Mathematics of the USSR-Sbornik, Vol. 17 (1972), no. 4.
von Luxburg, Ulrike. 2007. A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Wang, Yu-Xiang, Sharpnack, James, Smola, Alexander J., and Tibshirani, Ryan J. 2016. Trend filtering on graphs. Journal of Machine Learning Research, 17(105), 1–41.
Wardetzky, Max, Mathur, Saurabh, Kälberer, Felix, and Grinspun, Eitan. 2007. Discrete Laplace operators: no free lunch. Pages 33–37 of SGP ’07: Proceedings of the Fifth Eurographics Symposium on Geometry Processing. Goslar: Eurographics.
Weber, Melanie, Saucan, Emil, and Jost, Jürgen. 2017a. Characterizing complex networks with Forman–Ricci curvature and associated geometric flows. Journal of Complex Networks, 5(4), 527–550.
Weber, Melanie, Saucan, Emil, and Jost, Jürgen. 2017b. Coarse geometry of evolving networks. Journal of Complex Networks, 6(5), 706–732.
Weihs, Adrien, and Thorpe, Matthew. 2024. Consistency of fractional graph-Laplacian regularization in semi-supervised learning with finite labels. SIAM Journal on Mathematical Analysis, 56(4), 4253–4295. https://doi.org/10.1137/23M1559087.
Wigderson, Yuval. 2016. Harmonic Functions on Graphs. https://web.stanford.edu/~yuvalwig/math/teaching/HarmonicNotes.pdf. Online notes for Mathcamp 2016, Colby College, Waterville, ME; accessed 4 August 2022.
Xia, Feng, Sun, Ke, Yu, Shuo, Aziz, Abdul, Wan, Liangtian, Pan, Shirui, and Liu, Huan. 2021. Graph learning: a survey. IEEE Transactions on Artificial Intelligence, 2(2), 109–127.
Yan, Xiaoran, Teng, Shang-hua, Lerman, Kristina, and Ghosh, Rumi. 2016. Capturing the interplay of dynamics and networks through parameterizations of Laplacian operators. PeerJ Computer Science, 2(e57).
Zhang, Songyang, Ding, Zhi, and Cui, Shuguang. 2020. Introducing hypergraph signal processing: theoretical foundation and practical applications. IEEE Internet of Things Journal, 7(1), 639–660.
Zhao, Yufei. 2015. Hypergraph limits: a regularity approach. Random Structures & Algorithms, 47(2), 205–226.
Zhou, Dengyong, Huang, Jiayuan, and Schölkopf, Bernhard. 2005. Learning from labeled and unlabeled data on a directed graph. Pages 1036–1043 of De Raedt, Luc, and Wrobel, Stefan (eds.), Proceedings of the 22nd International Conference on Machine Learning. New York, NY: Association for Computing Machinery.
Zhou, Dengyong, and Schölkopf, Bernhard. 2004 (January). A regularization framework for learning from graph data. Pages 132–137 of ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields. Munich: Max Planck Society for the Advancement of Science. https://is.mpg.de/publications/2688.
Zhou, Dengyong, and Schölkopf, Bernhard. 2005. Regularization on discrete spaces. Pages 361–368 of Kropatsch, Walter G., Sablatnig, Robert, and Hanbury, Allan (eds.), Pattern Recognition. Berlin: Springer.
Zhou, Dengyong, Schölkopf, Bernhard, and Hofmann, Thomas. 2004. Semi-supervised learning on directed graphs. Pages 1633–1640 of Saul, Lawrence K., Weiss, Yair, and Bottou, Léon (eds.), Advances in Neural Information Processing Systems (NIPS 2004), vol. 17. Cambridge, MA: Massachusetts Institute of Technology Press.
Zhou, Xueyuan, and Belkin, Mikhail. 2011. Semi-supervised learning by higher order regularization. Pages 892–900 of Gordon, Geoffrey, Dunson, David, and Dudík, Miroslav (eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 15. Fort Lauderdale, FL: PMLR.
Zhu, Rong, Zou, Zhaonian, and Li, Jianzhong. 2017. SimRank on uncertain graphs. IEEE Transactions on Knowledge and Data Engineering, 29(11), 2522–2536.
Zhu, Xiaojin, Lafferty, John, and Ghahramani, Zoubin. 2003a. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. Pages 215–222 of Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003). Washington, DC: Association for the Advancement of Artificial Intelligence.
Zhu, Xiaojin, Ghahramani, Zoubin, and Lafferty, John. 2003b. Semi-supervised learning using Gaussian fields and harmonic functions. Pages 912–919 of Fawcett, Tom, and Mishra, Nina (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003). Washington, DC: Association for the Advancement of Artificial Intelligence.
Zucal, Giulio. 2023. Action convergence of general hypergraphs and tensors. arxiv.org/abs/2308.00226v1 [math.CO].
Figure 3.1 An example of a finite, simple, connected, and undirected graph.

Figure 4.1 An example of a directed graph; specifically, an oriented graph.

Figure 5.1 Two types of potential $W$ that are commonly used in the Ginzburg–Landau functional: (a) the double-well potential $W(x)=\frac14 x^2(x-1)^2$ and (b) the double-obstacle potential from (5.1).

Figure 6.1 (a) An example of a disconnected graph and (b) that same graph with nodes coloured according to the value of the Fiedler vector at that node.

Figure 6.2 (a) The graph from Figure 6.1 with an extra edge added to make it connected and (b) that same connected graph with nodes coloured according to the value of the Fiedler vector at that node.

Figure 6.3 The eigenvalues of the graphs from Figures 6.1 and 6.2, with panel (b) plotted on a (crunched) log scale to emphasize the change in the second eigenvalue, indicated by a square marker.
