Gauge Theory and the Geometrisation of Physics

Henrique De Andrade Gomes

doi:10.1017/9781009029308

1 Introduction

1.1 Gauge Theories

Gauge theories lie at the heart of modern physics: in particular, they constitute the Standard Model of particle physics. But they have so far received far less philosophical analysis than the other revolutions of twentieth-century physics, namely, relativity and quantum mechanics. This is unfortunate, since they raise many philosophical questions.

For example: at its simplest, the idea of gauge is that nature is best described using a descriptively redundant language. This idea is tied to important topics in philosophy: Putnam’s permutation argument, structural scientific realism, Fregean sense and reference, to mention just a few. But in this context, the idea also prompts a conceptual puzzle: how can mere redundancy be scientifically fruitful and explanatory? Here, we will focus on this last puzzle, and try to give some answers. But first, we need to know what gauge theory is in more detail.

The first thing to say is that gauge theory is about symmetries. As a first approximation, we can think of a symmetry of a physical theory as a map (function) on the physical states that the theory attributes to the system it describes. The important property of this map is that it preserves the values of a salient, usually large, set of physical quantities of the system. Of course, this broad idea is made precise in various different ways: for example as a map on the space of states, or on the set of quantities; and as a map that must respect the system’s dynamics, for example, by mapping solutions to solutions or even by preserving the value of the Lagrangian.

But what makes a symmetry “gauge”? In the dictionary, “to gauge” means “to estimate or measure.” Historically, the relation to this dictionary meaning comes from Weyl (1929), who introduced gauge fields in an attempt to understand electromagnetism as a measure of length-change along a curve in spacetime (I will give a brief historical timeline in Section 1.3). But that original meaning bears little resemblance to how “gauge” is understood today.

Today, we say that a gauge theory is a theory that has local symmetries: symmetries whose action on the local state of a system over a given spacetime region does not determine their action on the state over any nonoverlapping region.

The states with which gauge theories are concerned are described by fields over spacetime: we have a space of determinables over each spacetime point, and fixing a particular state or value of a field fixes a determinate value over each spacetime point. This is very much like the description of states of general relativity, which are described by metric and matter tensors. So what are the possible values of the states of gauge theory? In our most current understanding of particle physics – given by what’s called “the Standard Model,” developed in the 1960s and 1970s – there are many fields: one for each particle type. So we have an electron field, a photon field, a gluon field, a quark field, a Higgs field, and so on. Gauge theory parts ways from general relativity in that the value space of these fields is not “soldered” onto spacetime. It has a rich structure that is not just supervenient on the properties of space and time.

And as to gauge symmetries, they are more than mere solution-preserving maps on the space of solutions of the theory: they are characterized independently of the states or quantities on which they act, forming a group whose elements can also be seen as points of a smooth manifold. In short: gauge symmetries are described by Lie groups. For illustration, the symmetry group of the Standard Model is called $SU(3) \times SU(2) \times U(1)$, and we will have more to say about it in Section 4. The action of a gauge group on each spacetime point severely constrains, or partially determines, the properties of the particles of the Standard Model, which must vary in a prescribed manner under the action of this group.

Summing up this quick introduction of gauge symmetries, they are usually understood as: (i) leaving points of spacetime invariant, and (ii) not affecting the physical states of the system.

But both (i) and (ii) are somewhat flexible. Many physicists call the symmetries of general relativity “gauge”, and these don’t satisfy (i); and as to (ii), though physicists will agree that a gauge transformation has the connotation of being empirically undetectable, they are less concerned about whether symmetry-related states are metaphysically identical.

In contrast, for philosophers, faced with a symmetry, the obvious question is: do the two states thus related represent the same physical state of affairs? Of course, this was the very question at the center of Newton’s dispute with Leibniz, encapsulated in the Leibniz-Clarke Correspondence. It has resonated down the centuries, and rightly so: it lies at the center of natural philosophy. For it raises a cluster of good logico-philosophical questions about the identity of objects (both bodies, and spatial points and spacetime points), and about possibility: some of them linked, for example, whether some version of the principle of the identity of indiscernibles is true.

I will, of course, not be able to rehash these questions here; this is a short introduction! In fact, I won’t be able to comprehensively review any of these topics even in the narrow case of gauge theory. So, like most politicians, when faced with a tough question I will answer a different, easier one (and hope the audience doesn’t notice). I will approach the many questions about gauge symmetry and equivalence only obliquely, by drawing parallels between the case of bona fide gauge theories and general relativity, the foremost example of a spacetime theory with symmetries. The hope is to show that, to the extent that one can draw certain kinds of conclusions – about contentious topics such as locality and physical equivalence – in the case of general relativity, there is very good reason to believe that these conclusions also apply to gauge theories.

General relativity is a thoroughly “geometricised” theory, and so it pays to understand gauge theory also through geometry. The geometrical theory that deals with symmetry groups acting at each spacetime point is called the theory of fiber bundles. In this formalism, the mathematical object determining how the photon, the gluons, or, more generally, how internal states of all particles, evolve is called the gauge potential.

Historically, gauge theory was not introduced geometrically. It was a tool used to simplify the field equations of electromagnetism using the gauge potential, and the inherent ambiguity in its representation – the origin of gauge freedom – was considered an indication that the potential lacked physical meaning. The geometrization of this theory and its generalizations was one of the most formidable examples of an unintentional convergence between physics and mathematics. As C.N. Yang (1983, p. 567) recollects:

The Maxwell equations and the principles of quantum mechanics led to the idea of gauge invariance. Attempts to generalize this idea, motivated by physical concepts of phases, symmetries, and conservation laws, led to the theory of non-Abelian gauge fields. That non-Abelian gauge fields are conceptually identical to ideas in the beautiful theory of bundles, developed by mathematicians without reference to the physical world, left me astounded. In 1975, I discussed my considerations with Chern and said, “This is both exciting and perplexing, as you mathematicians invented these concepts out of nothing.” Immediately, he protested, “No, no. These concepts were not dreamed up. They were natural and real.”

And it is indeed true that Yang and Mills were exclusively seeking a generalization of Maxwell’s equations, with no knowledge of a geometric interpretation via bundles (an interpretation that we aim to delve into). Their quest was more than justified on purely physical grounds: the theory of quantum electrodynamics is one of the most successful in the history of physics. The goal of physics (or that of a significant portion of physicists) was then (and perhaps still is) to place all particles on the same footing as the photon.

1.2 Roadmap for This Element

First, I should be clear about my intentions. I don’t want to merely repeat or summarize what is easily found in the literature. I want to advertise my own, sometimes novel, understanding of gauge theory. I will try to be clear about what is orthodoxy and what is not; the unorthodox will be marked with a $#$ . Likewise, some sections will go slow, and be appropriate for a beginning graduate student, covering more standard material; others will require more mathematical background, and be more suitable for the professional researcher: the more advanced will be marked with a $*$ .

After setting so many topics aside, and being forthright about my intentions, let me tell you what you will find in this Element.

It is often said that it is not the business of physics, and science more generally, to ask “why” things happen the way that they do; its purview is only to ask “how.” This motto may rightly apply to some teleological ideas which have been long abandoned in physics. But nowadays, whenever a new formalism is introduced, physicists – and philosophers of physics even more so – want to understand why it is necessary. If we are going to postulate a rich internal structure to our theories and, perhaps more surprisingly, transformations of our models that correspond to no empirical difference (as per (ii) previously), we had better have good reason to do so.

In my view, there are many answers to the “why” question. Unfortunately, I only have space to focus on two of them, which will be given in Sections 2 and 4; and I will skim a third reason in Section 5. The first is pragmatic, the second is geometrical, and the third is relational. All three are compatible, and indeed complementary.¹

In Section 2 I will discuss a methodological reason to introduce more degrees of freedom than are to be counted as physically distinguishing. We will see that gauge redundancy is useful for constructing new theories for a very specific reason, related to Noether’s famous theorems. The purpose of this section is to provide a good practical reason to introduce gauge, or redundant, degrees of freedom, so it will assume certain properties of gauge fields, leaving their conceptual and mathematical treatment to the next section.

Section 3 will thus present the modern mathematical view of gauge theory and give a conceptual appraisal of the variables involved. Here I will introduce the theory of fiber bundles: the appropriate mathematical formalism to talk about “internal (or value) spaces” over spacetime. (Although there are several good textbooks with the mathematical machinery expounded in this section, I think none cover the subset that I think is most important for the conceptually minded physicist or philosopher; and none strike quite the balance between rigor and simplicity that I am aiming for.) The overarching conceptual theme of this section is that the modern mathematical view involves understanding gauge symmetries geometrically, as more like spacetime symmetries.

In Section 4 this analogy will be fully expanded and expounded. Here, using the machinery of Section 3, I give a very brief introduction to the Standard Model. I will argue that, as far as physics is concerned, the local symmetries of gauge theory are very closely analogous to the local Lorentz symmetry of spacetime. In this analogy, neither kind of symmetry is fundamental; both arise as sets of transformations that preserve a local geometric structure – of an internal vector space in the case of gauge theory and of the metric of spacetime in general relativity, respectively.

It would be impossible to write an introduction to gauge theory without mentioning the Aharonov–Bohm effect in electromagnetism – a philosopher’s favorite! The effect is important because it supposedly captures a phenomenon that cannot be described using only local, gauge invariant quantities. So I will finish this Element by setting the Aharonov–Bohm effect firmly within the geometric formalism, and discussing what is the appropriate notion of nonlocality that the Aharonov–Bohm effect evinces. But more importantly, I will use the effect to discuss nonlocality and nonseparability, and their relationship to gauge theory.

Now I will end this introduction with a brief historical timeline, recounting the most important steps in the development of gauge theories, in both physics and mathematics. An excellent source for this material is O’Raifeartaigh (1997).

1.3 Historical Timeline

1.3.1 In Physics

1918–19: Weyl’s “unified theory”/infinitesimal geometry introduces gauge (Eichung) rescaling symmetry.
1929: Weyl introduces the gauge principle for the Abelian group U(1) in quantum mechanics, in order to “explain” Maxwell’s theory of electromagnetism.
1954: Yang & Mills, and Shaw, produce the first non-Abelian gauge theory for the group SU(2) (as an attempt to describe the strong interaction between proton/neutron as a doublet).
1954–55: Independently, R. Utiyama develops the framework of gauge theory for any Lie group $G$. He shows that general relativity is, in a certain sense, a gauge theory of the local Lorentz group $G = SO(1, 3)$.
1960s–70s: By a series of rapid developments, the Standard Model of particle theory arises (electroweak unification by Glashow in 1961; spontaneous symmetry breaking (SSB) mechanism by Englert-Brout-Higgs in 1964; quarks by Gell-Mann and Zweig in 1964; electroweak theory with SSB by Weinberg and Salam in 1967; renormalizability by ’t Hooft in 1971; asymptotic freedom by Politzer (and Gross and Wilczek) in 1973; etc.). Particle physics is described as a gauge field theory with gauge group $G = U(1) \times SU(2) \times SU(3)$.

1.3.2 In Mathematics

1916–17: Theory of connections on manifolds by Levi-Civita, Schouten.
1918–19: Weyl’s infinitesimal geometry falls within this current of ideas.
1920s: Cartan’s “espaces généralisés,” a vast synthesis of (pseudo-) Riemannian and Klein geometries.
1930s: Whitney’s first definition of fibered spaces, or fiber bundles: spaces with “structured points.”
1950s–60s: Mature theory of connections on fiber bundles (Ehresmann, 1950; Steenrod, 1951; Kobayashi, 1957).

The two strands finally converged in the mid-1970s. In 1975, T. T. Wu and C.N. Yang published a paper about the physicist’s electromagnetic field theory and its relationship to the mathematician’s fiber bundle theory (Wu & Yang, 1975). To clarify the deep – and precise – relation between these two strands, they constructed a dictionary. In 1976 Isadore Singer visited Stony Brook and Yang gave him a copy of the Wu-Yang preprint, which Singer took to Oxford. There, Michael Atiyah and other mathematicians studied the paper and began to work on gauge fields and related topics, leading to a period of close collaboration between mathematicians and physicists. Figure 1 is a table taken from a paper in this period.

Figure 1 The Wu-Yang Dictionary, as described by Isadore Singer, in an article about Weyl in Wells (1988).

2 Why Gauge? A Noether, Methodological Reason

All interpretations of modern gauge theories adopt two core assumptions at their foundation. The first is that gauge symmetry arises when there are more variables in a theory than there are physical degrees of freedom. Hence the well-known soubriquets: gauge is “descriptive redundancy,” “surplus structure,” and “descriptive fluff.” Correspondingly, considerable effort has been devoted to techniques for eliminating gauge redundancy in order to appropriately interpret gauge theories.² The second assumption is that a theory with gauge symmetry constitutes the gold standard of a modern physical theory: witness the gauge symmetry invoked in the Standard Model. This leads to a remarkable puzzle of gauge symmetry: if interpreting gauge symmetry requires eliminating it, then why is gauge symmetry so ubiquitous?

The purpose of this section is to articulate one answer to this question: namely, that gauge symmetry provides a path to building appropriate dynamical theories – and that this rationale invokes the two theorems of Emmy Noether (1918).³ Noether’s first and better-known theorem (commonly called simply Noether’s theorem) implies that global (or what we will call rigid) symmetries of a classical Lagrangian field theory – that is, symmetries in which the redundancy is specified in exactly the same way at all spacetime points – correspond to charges that are conserved over time, such as energy and angular momentum. For example, the conservation of an electron’s charge can be viewed as arising from the (redundant) global phases of the electron’s wavefunction. But we will be equally concerned with Noether’s second theorem, which is about local (or what I will also call malleable) gauge symmetries – meaning that the specified redundancy varies between spacetime points. Although these theorems’ physical significance is, of course, already well recognized, including in the philosophical literature (Brading & Brown, 2000, 2003), in this section I will urge that these two theorems give us a further answer to the puzzle, “why gauge?”
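
As a minimal illustration of the first theorem, consider a free complex scalar field standing in for the electron field. The Lagrangian, the mass parameter $m$, and the normalization of the current below are assumptions of this sketch rather than anything fixed by the text; the point is only to show how a rigid phase symmetry yields a conserved current.

```latex
\begin{align}
% Assumed toy model: free complex scalar field
\mathcal{L} &= \partial_{\mu}\overline{\psi}\,\partial^{\mu}\psi - m^{2}\,\overline{\psi}\psi, \\
% Rigid phase symmetry: \theta is the same at every spacetime point
\psi &\mapsto e^{i\theta}\psi, \qquad \delta\psi = i\theta\,\psi, \qquad \delta\overline{\psi} = -i\theta\,\overline{\psi}, \\
% Noether current, with the overall constant \theta dropped
j^{\mu} &= \frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\psi)}\,(i\psi)
         + \frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\overline{\psi})}\,(-i\overline{\psi})
         = i\left(\psi\,\partial^{\mu}\overline{\psi} - \overline{\psi}\,\partial^{\mu}\psi\right), \\
% On solutions of (\Box + m^{2})\psi = 0 and its complex conjugate
\partial_{\mu} j^{\mu} &= i\left(\psi\,\Box\overline{\psi} - \overline{\psi}\,\Box\psi\right)
         = i\left(-m^{2}\psi\overline{\psi} + m^{2}\overline{\psi}\psi\right) = 0 .
\end{align}
```

The conserved charge is then $Q = \int j^{0} \, d^{3}x$, which is constant in time for fields that fall off sufficiently fast at spatial infinity.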

To get the gist of the argument to come, let us take the example of the electron field. As we know, the charges carried by electrons are sources for electromagnetic fields. And we take the interaction between the electric charges to be mediated by the electromagnetic field; and we measure the electromagnetic field by its effect on electric charges. But just like spacetime has its own dynamics in general relativity, electromagnetic fields have their own dynamics, even in the absence of charges: there are nontrivial solutions of the Maxwell equations even in vacuum; for example, an electromagnetic plane wave.

Thus suppose that we are canvassing the possibilities for the dynamical laws of a field that is sourced by a charge – call it the force field – and that we want to ensure that charge is conserved. Suppose further that for each possible dynamical law we should be able to infer the quantity of charge in a sufficiently small region of space via the behavior of the force field surrounding this region, as I described previously. But certain dynamics of the force field would not permit this kind of inference: think of field lines emanating from a given region of space that suddenly vanish, or diminish in density. Charges might be conserved, but in this case we could never infer the charge contained in a region by its effect on other charges. So, if charges are to interact via the force field and are to be conserved, it is natural to impose a consistency constraint on the dynamics of the force fields. Simply put: conservation of material charges requires compatible dynamics of the force fields. This constraint applies to all interactions and all charges of the Standard Model of particle physics and general relativity. Indeed, such constraints will need to be imposed even for those conserved charges and associated interactions that we have not yet come across in our theorizing.
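
In the electromagnetic case, the inference from field to charge is what Gauss's law provides. As a sketch, in units where the vacuum permittivity is set to 1 (an assumption made for brevity), with $\rho$ the charge density and $R$ a spatial region with boundary $\partial R$:

```latex
\begin{equation}
\nabla \cdot \vec{E} = \rho
\quad\Longrightarrow\quad
Q(R) := \int_{R} \rho \, d^{3}x = \oint_{\partial R} \vec{E} \cdot d\vec{S} .
\end{equation}
```

A dynamics in which field lines could vanish between the charge and the surface $\partial R$ would break the second equality, and with it the possibility of reading off the enclosed charge from the surrounding field.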

And such constraints are not methodologically idle. Indeed they have often guided the formulation of our theories: before gauge theory, scientists such as Einstein and Maxwell proposed early versions of the dynamics for the fields of their respective theories that did not satisfy these constraints, and both of them had many a headache for that reason. It was precisely this kind of consistency that eventually led them to postulate the final form of their equations.⁴

One of the main merits of gauge theories is that they allow us to “cut out the middle man” that is this method of trial and error. How do they do this?

In gauge theories, the symmetries are local – their action on one spacetime region is independent of their action elsewhere. Noether’s second theorem applies to this kind of symmetry, but, unlike her first theorem, which applies to global, or rigid, symmetries, it gives no straightforward conservation law; it only implies that the equations of motion are not all independent of each other. In other words, the theorem says that there are fewer independent equations of motion than there are degrees of freedom. Thus the original degrees of freedom are constrained: it is only a constrained subset of the original degrees of freedom at some initial time that is uniquely, deterministically, propagated to the future (see e.g. Brading & Brown (2003) for a conceptual overview).

Since the set of local symmetries contains the global symmetries, which are responsible for charge conservation via Noether’s first theorem, when we extend global to local symmetries, there should emerge a relationship between charge conservation according to Noether’s first theorem and the constraints that arise via Noether’s second theorem. And surely enough, a relationship does emerge. The amazing, even if not entirely surprising, fact about this relationship is that it is precisely the one required for consistency between charge conservation and the dynamics of the interacting field.
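
For Maxwell theory this relationship can be written in a few lines. The following is a sketch in standard four-dimensional notation, with $j^{\nu}$ the matter current; the index placement and unit conventions are assumptions of this illustration, and the general argument is the subject of Section 2.2.

```latex
\begin{align}
\partial_{\mu} F^{\mu\nu} &= j^{\nu}
  && \text{(field equations with a source)} \\
\partial_{\nu}\partial_{\mu} F^{\mu\nu} &\equiv 0
  && \text{(identity: } F^{\mu\nu} = -F^{\nu\mu} \text{ and partial derivatives commute)} \\
\Longrightarrow \quad \partial_{\nu} j^{\nu} &= 0
  && \text{(charge conservation)}
\end{align}
```

The middle line holds for every field configuration, whether or not it solves the equations of motion. This is the sense in which the equations are not all independent, and it means that only sources with a conserved current can be coupled consistently to the field.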

This answer to the question of “why gauge” is an instance of the much more general role for gauge, which I have sketched previously, and which has hardly been discussed in the philosophical literature: gauge symmetry supports theory construction. Although some philosophers like Brading and Brown (2003) have pointed out the role of gauge symmetry in theory construction, a more specific answer to the puzzle of gauge symmetry that I will advocate here is that it constrains the coupling of charges to forces. This construal of the gauge argument is, to an extent, idiosyncratic. But, as experts will be quick to note, the usual gauge argument in its common textbook form is fraught with difficulties.

Here is the roadmap for the remainder of this section. In Section 2.1 I will rehearse the usual gauge argument and its woes. In Section 2.2, I will present the much more general gauge argument sketched previously, which I will call the Noether gauge argument, in the context of classical Lagrangian field theory. The key to understanding this argument is the combined use of both Noether’s first and second theorems.

2.1 The Gauge Argument and Its Critics

The textbook gauge argument or gauge principle uses gauge invariance to motivate a quantum theory of electromagnetism. We begin Section 2.1.1 with a brief presentation of this argument as it is usually presented. Classic textbook statements can be found in Schutz (1980: §6.14), Göckeler & Schücker (1989: §4.2), and Ryder (1996: §3.3), among many other places. Then in Section 2.1.2 we assess it. The argument has been discussed in the form herein by philosophers as well, such as Teller (1997, 2000), Brown (1999), Martin (2002), and Wallace (2009, §2). To the end of Section 2, I will take a very pragmatic approach: I will leave a more conceptual introduction to the variables of gauge theory to Section 3.

2.1.1 Beware: Dubious Arguments Ahead

We begin by describing a quantum system with the Hilbert space $L^{2}(\mathbb{R}^{3})$ of wavefunctions, recalling that a unique pure quantum state is represented not by a vector, but by a “ray” of vectors related by a complex unit. This implies that the transformation $ψ (\vec{x}) \mapsto e^{i θ} ψ (\vec{x})$ for some $θ \in \mathbb{R}$, referred to as a “global phase” transformation, acts identically on rays, and is in this sense an invariance of the quantum system.⁵ This invariance is incorporated in the specification of the dynamics of the system, either via the Hamiltonian or the action, since either contains only real-valued functions such as $| ψ |^{2}$ and $\partial_{i} ψ \partial_{i} \overline{ψ}$.

But now, the story goes, suppose we replace this global phase with a “local phase” transformation $ψ (\vec{x}) \mapsto e^{i ϕ (\vec{x})} ψ (\vec{x})$, in which the constant $θ$ is replaced with a function $ϕ : \mathbb{R}^{3} \to \mathbb{R}$, or indeed with a smooth one-parameter family of such functions $ϕ_{t} (\vec{x})$ for each $t \in \mathbb{R}$; or, adopting the covariant notation in which $x = (t, \vec{x})$, with a function that we write as $ϕ (x)$. This transformation is “local” in the sense that its values vary smoothly across space and time.

The corresponding Hilbert space map $R_{ϕ} : ψ \mapsto e^{i ϕ} ψ$ does not act identically on rays. As to the dynamics, whereas $| ψ |^{2}$ would remain invariant under such a transformation, that would not be the case for terms involving derivatives, such as $\partial_{i} ψ \partial_{i} \overline{ψ}$ , which under such a transformation acquires terms depending on $\partial_{i} ϕ$ such as $(\partial_{i} ϕ)^{2} ψ \overline{ψ}$ .
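
The extra terms can be made explicit. Writing $ψ' := e^{i ϕ} ψ$ (the prime is notation introduced here for the transformed wavefunction) and using the fact that $ϕ$ is real, a short computation gives:

```latex
\begin{align}
\partial_{i}\psi' &= e^{i\phi}\bigl(\partial_{i}\psi + i(\partial_{i}\phi)\,\psi\bigr), \\
\partial_{i}\psi'\,\partial_{i}\overline{\psi'}
  &= \bigl(\partial_{i}\psi + i(\partial_{i}\phi)\,\psi\bigr)
     \bigl(\partial_{i}\overline{\psi} - i(\partial_{i}\phi)\,\overline{\psi}\bigr) \\
  &= \partial_{i}\psi\,\partial_{i}\overline{\psi}
     + i(\partial_{i}\phi)\bigl(\psi\,\partial_{i}\overline{\psi} - \overline{\psi}\,\partial_{i}\psi\bigr)
     + (\partial_{i}\phi)^{2}\,\psi\overline{\psi} .
\end{align}
```

Both additional terms vanish when $ϕ$ is constant, which recovers the invariance under global phase transformations.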

However, one might still wish to postulate that this transformation has no “physical effect” on the system, or is “gauge.” Various motivations for this step are given in the textbooks, often with vague references to general covariance of the kind found in general relativity. But to mimic the standard presentation, we will simply press forward, referring to $R_{ϕ} : ψ \mapsto e^{i ϕ} ψ$ as a local or malleable gauge transformation.

With respect to the dynamics, we still need to say something about the noninvariant terms involving derivatives. The big move of the gauge argument is to first introduce a vector $A = (A_{1}, A_{2}, A_{3})$ and a scalar $V$ , which are assumed to behave under the gauge transformation as,

$$A \mapsto A + \nabla ϕ_{t}, \qquad V \mapsto V - \frac{d ϕ_{t}}{d t}. \tag{2.1.1}$$

To restore invariance of the dynamics under gauge transformations, and with an eye toward a modern gauge theory formulated as a vector bundle with a derivative operator, we write $\partial_{μ} := (\frac{d}{d t}, \nabla)$ and $A_{μ} := (V, -A)$, where the relative sign is what allows the transformations (2.1.1) to cancel the derivatives of the phase. One then finds that gauge invariance is restored by replacing $\partial_{μ}$ with

$$D_{μ} := \partial_{μ} + i A_{μ} = \left(\frac{d}{d t} + i V, \nabla - i A\right) = (D_{t}, D). \tag{2.1.2}$$

This is commonly referred to as a “covariant derivative,” and it has the form of the familiar gauge freedom of the electromagnetic four-potential. That is, if we call $A_{0} := V$ , these transformations leave invariant the following tensor:

$$F_{μ ν} = \partial_{μ} A_{ν} - \partial_{ν} A_{μ}, \tag{2.1.3}$$

where the electric and magnetic fields are recovered in a given coordinate system as $E_{i} = F_{0 i}$ and $B^{i} = ϵ^{i j k} F_{j k}$, where $x^{0}$ is the time coordinate, $i, j, k$ are spatial indices, and $ϵ^{i j k}$ is the totally antisymmetric tensor in space. In short, it appears as if minimal electromagnetic coupling has been derived out of nothing, or at least from an assumption of gauge invariance.
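
The steps of the argument can be checked numerically in one spatial dimension. The sketch below is illustrative only: the sample wavefunction, potential, and phase function are hypothetical choices, derivatives are taken by central finite differences, and the spatial covariant derivative is taken to be $d/dx - i A$ with $A \mapsto A + dϕ/dx$, a sign convention assumed here so that the shift in $A$ cancels the derivative of the phase.

```python
import cmath


def psi(x):
    # Hypothetical sample wavefunction: Gaussian envelope times a plane wave.
    return cmath.exp(-x * x + 2j * x)


def potential(x):
    # Hypothetical vector potential A(x); any smooth function would do.
    return 0.4 * x


def phase(x):
    # Hypothetical local gauge parameter phi(x).
    return 0.5 * x * x


def derivative(f, x, h=1e-5):
    # Central finite-difference approximation to df/dx.
    return (f(x + h) - f(x - h)) / (2 * h)


def plain_density(f, x):
    # |d psi/dx|^2, the derivative term without any gauge potential.
    return abs(derivative(f, x)) ** 2


def covariant_density(f, a, x):
    # |(d/dx - i A) psi|^2, using the sign convention assumed in this sketch.
    return abs(derivative(f, x) - 1j * a(x) * f(x)) ** 2


def rotate(f, phi):
    # psi -> exp(i phi(x)) psi
    return lambda x: cmath.exp(1j * phi(x)) * f(x)


def shift(a, phi):
    # A -> A + d phi/dx
    return lambda x: a(x) + derivative(phi, x)


def gauge_report(x=0.3, theta=0.7):
    # Compare the derivative terms before and after each transformation.
    psi_global = rotate(psi, lambda _: theta)
    psi_local = rotate(psi, phase)
    a_local = shift(potential, phase)
    return {
        "global_change": abs(plain_density(psi_global, x) - plain_density(psi, x)),
        "local_change": abs(plain_density(psi_local, x) - plain_density(psi, x)),
        "covariant_change": abs(
            covariant_density(psi_local, a_local, x)
            - covariant_density(psi, potential, x)
        ),
    }


if __name__ == "__main__":
    for name, value in gauge_report().items():
        print(f"{name}: {value:.3e}")
```

Calling `gauge_report()` returns three numbers: the change in $|\partial_{x} ψ|^{2}$ under a global phase, its change under a local phase, and the change in $|D ψ|^{2}$ under the combined local transformation of $ψ$ and $A$. Up to finite-difference error, the first and third vanish while the second does not.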

2.1.2 Criticisms of the Gauge Argument

That is how the story is usually presented. I agree: it is far from water-tight. The argument begins with a system with a global symmetry, gratuitously generalizes it to a local symmetry – which, to emphasize, was not required for mathematical consistency or for empirical adequacy – and then, in order to fix the ensuing noninvariance of the governing equations, proceeds to conjecture a new force of nature, which, so far, has no reason to be dynamical at all. Ultimately, the argument gives us no reason to think of the field $A_{μ}$ as being related to Maxwell’s equations. To put it uncharitably: the argument fixes a problem that didn’t exist by conjecturing a redundant field, and then turns this game around, claiming to come out successfully by “retrodicting” the existence of electromagnetism. More charitably: the gauge argument suffers from at least three categories of concerns. I will set out each of these three concerns here and in Section 2.2 present an alternative Noether gauge argument that answers them entirely.

The first category of concerns is the gauge argument’s claim to have derived a dynamics that is specifically electromagnetic in nature. Although a formal set of operators $A_{μ}$, built from $V$ and $A$, has been included in the dynamics, no evidence is given that these operators take the form required for any specific electromagnetic potential, or that the coupling to $A_{μ}$ will be proportional to a particle’s charge $e$, or even that $A_{μ}$ is nonzero. And if they could be shown to be nonzero, then as Wallace (2009, p. 210) rightly asks: “how do neutral particles fit into the argument?” A minimally coupled dynamics does not apply to neutral particles, and yet since the gauge argument never mentioned or assumed anything about charge, it presumably is intended to apply to them.

A second category of problems arises out of the free-wheeling argumentative style of the gauge argument. For example, it is not a strict deductive derivation of either the electromagnetic potential or the dynamics. At best, the gauge argument appears to show that one can adopt a minimally coupled Hamiltonian in order to assure gauge invariance. But this does not ensure that one must do so: the door appears to be left open for other dynamics to be gauge invariant, but without taking the minimally coupled form that the gauge argument advocates. As Martin (2002, p. S230) writes: “The most I think we can safely say is that the form of the dynamics characteristic of successful physical (gauge) theories is suggested through running the gauge argument.”

Another example of free-wheeling argumentation is in the motivation for requiring the local gauge transformations $R_{ϕ} : ψ \mapsto e^{i ϕ (x)} ψ$ to be symmetries. Sometimes a preference for this transformation over global phase transformations is dubiously motivated by a desire to avoid superluminal signaling.⁶ In other cases it is motivated by the coordinate invariance of a spatial coordinate system. But as Wallace (2009, p. 210) points out, no reason is given as to why we do not similarly consider local transformations of configuration space, momentum space, or any other space, to be symmetries. Nor is there any clear reason why the $U(1)$ symmetry of electromagnetism is chosen as the global symmetry motivating the move to the local symmetry, as opposed to (say) the $SU(3)$ symmetry of the strong nuclear force.

Regarding the generalization of the gauge argument to other global symmetry groups beyond electromagnetism, I wholeheartedly agree with Wallace: one should expect, and indeed I will argue in Section 2.2, that an appropriate generalization of the gauge argument can also be applied to these more general gauge groups.

My approach here speaks to a third category of concerns, that the gauge argument is awkwardly placed as an argument for a quantum theory of electromagnetism. Here too I agree with Wallace:

In fact, it seems to me that the standard argument feels convincing only because, when using it, we forget what the wavefunction really is [i.e. a wavefunction on configuration space]. It is not a complex classical field on spacetime, yet the standard argument, in effect, assumes that it is. This in turn suggests that the true home of the gauge argument is not non-relativistic quantum mechanics, but classical field theory.

(Wallace, 2009, p. 211)

In Section 2.2, we will switch perspectives from the verdammten Quantenspringerei to the context of classical Lagrangian field theory, and propose a framework that substantially clarifies the roles of global gauge symmetries, of local gauge symmetries, and of their relationship, which I will call the “Noether gauge argument.”

2.2 A Noether Reason for Gauge

In Section 2.2.1 I will set out the assumptions required for my argument based on Noether’s theorems. Then, in Section 2.2.2 I will set up the mathematical background and equations that will be analyzed on a case-by-case basis in the subsections of Section 2.2.3.

2.2.1 Overview

For a more general view of how gauge symmetries constrain the dynamics of a physical theory, I will now, as announced in Section 2.1.2, make a two-step use of the theorems of Emmy Noether (1918): the first, and then the second. I will refer to this as the Noether gauge argument. Agreed: this is by no means a new observation, since practicing physicists use this property of gauge frequently!Footnote ⁷ But I believe it is worth highlighting and clarifying exactly the kind of information that can be extracted in various cases, as part of my advocacy that philosophical discussions of gauge should better recognize gauge’s significance for theory construction.

To recall the sketch of the argument: the Noether gauge argument proceeds in two steps. First, we choose a rigid gauge symmetry associated with an arbitrary global gauge group, and propose that its action produces a variational symmetry: by Noether’s first theorem, this guarantees the presence of a collection of conserved quantities. But matter fields do not exist in isolation: they couple to other “force” fields. Thus, in the second step, we introduce such a field and apply Noether’s second theorem, “loosening” the rigid global symmetries to local, malleable ones.

About the generality and applicability of Noether’s theorems, there are several issues that I will not address, but which should bridle undue enthusiasm (cf. Brown (2022)). First, Noether’s theorems apply only to those theories that admit a Lagrangian (variational) formulation. But there are useful mathematical models that do not admit such a formulation: Fourier’s heat equation and the Navier-Stokes equations are well-known examples. Second, if one takes the equations of motion and not the Lagrangian as fundamental, there are many Lagrangians that give the same equations of motion, and there is, in general, no unique symmetry associated with a given conservation law, or vice versa (though most of these ambiguities can be accounted for by different boundary terms or boundary conditions, which don’t affect the fundamental meaning of the conservation law). Third, the meaning of the conserved quantity obtained from a given symmetry could be theory dependent. For instance, one can obtain a conserved quantity associated with time translation symmetry for a damped oscillator, but this is not energy as usually construed (for nondamped systems). There is also the matter of explanatory priority between conservation laws and symmetries, which I will not address here.

Thus, I will not only assume the minimum conditions under which the theorems apply, but in the interest of clarity and pedagogy, I will make several simplifying assumptions, both about the Lagrangian density and about the action of the gauge group, some of which are not strictly speaking necessary but which simplify my argument. So there are mathematically more general and more abstract ways to formulate this argument (see e.g. Gomes (2022)), but for our analysis, it is worthwhile to be specific about the field content of the theory, and show how local, or malleable symmetries provide three concrete constraints on the dynamics (namely, the vanishing of the three lines in Equation (2.2.8)). The interpretation of these constraints can be seen on a case-by-case, or sector-by-sector, basis: we will consider their implications for global versus local symmetries, as well as for theories that contain a force field that transforms under the transformation versus those that contain no such force field. Thus in the following sections we will spell out the consequences of the three constraints for four different sectors of the theory.Footnote ⁸

Throughout this discussion, I will follow standard practice and distinguish two equivalence relations for classical fields on a manifold. First, I will write “=” to denote ordinary equality between fields, irrespective of the satisfaction of the equations of motion, and refer to this as strong or off-shell equality. Second, given a fixed Lagrangian, I will write “ $\approx$ ” to denote equality between fields that holds if the Euler-Lagrange equations are satisfied for that Lagrangian, and refer to this as weak or on-shell equality.Footnote ⁹

2.2.2 Mathematical Setup*

Now I will introduce, without much explanation, some of the mathematical objects that will be the focus of Section 3. Here, the reader should just take the definitions at face value: they will be explained and motivated in that section.

We start by assuming that $φ$ is some field on spacetime, whose dynamically possible models are determined by a Lagrangian scalar function $L (φ) \in C^{\infty} (M)$ , as those that extremize the integral of this function over $M$ (called the action functional), a condition which we write as:Footnote ¹⁰

δ \int_{M} L (φ) = 0.

(2.2.1)

Equivalently, using the linearity and Leibniz property of $δ$ and integrating by parts successively, we isolate $δ φ$ and write the condition (2.2.1) as yielding equations of motion, up to boundary terms:

δ L = E L \cdot δ φ + d θ (δ φ),

(2.2.2)

where EL is the Euler-Lagrange functional (the left-hand part of the Euler-Lagrange equations) which has one component for each direction of $δ φ$ , and $θ$ is a linear operator on variations of the fields, but it is a differential form of codimension one on spacetime (i.e. it is a boundary term).Footnote ¹¹

Suppose that, for any value of $φ$ , there is a family of transformations $δ_{ξ} φ$ , whose parameters $ξ$ form an algebra, which is such that $δ_{ξ} δ_{ξ^{'}} φ - δ_{ξ^{'}} δ_{ξ} φ = δ_{[ξ, ξ^{'}]} φ$ ,Footnote ¹² and so that $δ_{ξ} L = 0$ . So from (2.2.2):

δ_{ξ} L = E L \cdot δ_{ξ} φ + d θ_{ξ} = 0,

(2.2.3)

and so, for dynamically possible models, for which $E L = 0$ , we get

d θ_{ξ} \approx 0,

(2.2.4)

where $θ_{ξ}$ is the Noether charge associated to the symmetry $ξ$ , and where $θ_{ξ} := θ (δ_{ξ} φ)$ . The Noether charge inherits the ambiguity $θ \to θ + d κ$ in the boundary term, but the ambiguity does not matter for conserved quantities, since $d \circ d = 0$ (or, equivalently, the boundary of a boundary is empty: $\partial \partial B = \emptyset$ ).

Noether’s second theorem also follows from (2.2.3). Assume that the symmetries are malleable, or local, so that we can restrict to those $ξ$ such that $ξ_{| B} = 0$ . Now, there are inner products $⟨ ∙, ∙ ⟩$ and $⟨ ⟨ ∙, ∙ ⟩ ⟩$ , on the appropriate function spaces of $φ$ and $ξ$ , respectively, so that

δ_{ξ} \int L (φ) = \int E L \cdot δ_{ξ} φ = \int ⟨ E L, δ_{ξ} φ ⟩ = \int ⟨ ⟨ Δ^{†} E L, ξ ⟩ ⟩ = 0

(2.2.5)

where $Δ^{†}$ is the formal adjoint of $δ_{ξ} φ$ , seen as a linear operator on $ξ$ (see Fischer & Marsden (1979) for a thorough, geometric formulation of such inner products and adjoints in the space of fields of gauge theory and general relativity). Since (2.2.5) must vanish for all such $ξ$ , it implies a local equation that the Euler-Lagrange equations must satisfy everywhere, and which is valid off-shell:

Δ^{†} E L = 0.

(2.2.6)

Because these constraints are valid off-shell, they reflect a kinematic property of the variables of the theory. For instance, the Bianchi identities for the curvature tensors (which apply both to (semi-)Riemannian geometry and to principal fiber bundles; cf. Proposition 4 in “The Curvature” section) are purely geometric properties that will give rise to such constraints. These are geometrical identities that tell us that not all components of the curvature tensors are independent: they satisfy differential and algebraic constraints. In all theories under consideration here, local, covariant tensors that involve derivatives of the fundamental variables – either the gauge potential or the metric – must be written in terms of curvature (cf. e.g. Lovelock (1972)), and so obey such geometric constraints. Thus, the equations of motion obtained from Lagrangians that involve derivatives of the fundamental variables will in some way or another inherit these constraints and so cannot all be independent. In other words, there are more independent variables than there are equations of motion. For this reason such equations of motion cannot be used to uniquely determine the evolution of all the fundamental variables: they do so only for a constrained subset. I will get back to this topic in Section 5.3.1, when we discuss non-locality.
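As a hedged illustration of (2.2.6), consider the simplest case: a pure Abelian theory with no matter fields. The sketch below assumes the standard Maxwell Lagrangian $L = - \frac{1}{4} F_{μ ν} F^{μ ν}$ with $F_{μ ν} = \partial_{μ} A_{ν} - \partial_{ν} A_{μ}$ , the transformation $δ_{ξ} A_{μ} = \partial_{μ} ξ$ , and inner products given by plain integrals of pointwise products. The normalization and sign conventions are assumptions made for the example, not choices fixed by the text.

```latex
% Assumed: L = -\tfrac14 F_{\mu\nu}F^{\mu\nu}, \delta_\xi A_\mu = \partial_\mu\xi,
% and \xi vanishing on the boundary of the integration region.
\begin{align*}
EL^{\mu} &= \frac{\partial L}{\partial A_\mu}
          - \partial_\nu \frac{\partial L}{\partial(\partial_\nu A_\mu)}
          = -\,\partial_\nu F^{\mu\nu} = \partial_\nu F^{\nu\mu}, \\
\int EL^{\mu}\,\partial_\mu \xi &= -\int (\partial_\mu EL^{\mu})\,\xi
  \quad\Longrightarrow\quad \Delta^{\dagger} EL = -\,\partial_\mu EL^{\mu}, \\
\partial_\mu EL^{\mu} &= \partial_\mu \partial_\nu F^{\nu\mu} = 0
  \quad \text{identically, since } F^{\nu\mu} = -F^{\mu\nu}.
\end{align*}
```

The last identity holds whether or not the equations of motion are satisfied, which is the sense in which (2.2.6) is an off-shell statement: the four components of $E L^{μ}$ are tied together by one differential relation before any dynamics is imposed.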

Now let us be more specific. Let us start with the material charges, which I will represent as fields with no spacetime index, but, assuming the value space is a vector space $V$ , with indices $a$ ; so that $ψ^{a} (x) \in V$ for $x \in M$ , where $M$ is spacetime. In relativity, $M$ is a smooth Lorentzian manifold, that is, equipped with a nondegenerate symmetric bilinear product that is not positive definite, having a signature of $(3, 1)$ . To simplify the notation, when discussing gauge theory, I’ll consider the case of a Minkowski metric, whose Levi-Civita covariant derivative I’ll denote as $\partial$ (the generalization to another metric would amount to a minimal replacement $\partial \to \nabla$ ). From the pragmatic, nongeometric standpoint of this section, assume that the symmetries associated to the conservation of charges arise from the action of a Lie group, $G$ , on $V$ (cf. footnote 5). We define this action pointwise as $g \cdot ψ (x) = g (x) \cdot ψ (x) \in V$ . Let $t_{I}^{i j}$ be the $n$ -dimensional Hermitean matrix representation on $V$ of the Lie algebra $\mathfrak{g}$ of $G$ , that is, $t : \mathfrak{g} \to G L (V)$ , where the $I$ are indices of the Lie algebra space, in the domain of the map, and $i, j$ denote the matrix indices in the image of the map, acting linearly on $V$ .Footnote ¹³

I will also assume that the forces that are sourced by $ψ$ have direction in space – as forces are prone to have – and take value again in some internal vector space. For reasons to be clarified in Section 3, at this point I will take these value spaces as being linearly isomorphic to the Lie algebra: so the force fields are labeled $A_{μ}^{I}$ , they take vectors of $M$ to the Lie algebra $\mathfrak{g}$ , with $μ$ representing the spacetime components of the vector.Footnote ¹⁴ These fields are associated with a dynamics by postulating a real-valued action functional $S (ψ_{i}, A_{μ}^{I})$ , whose extremal values provide the equations of motion.

We take the (malleable, or local) gauge transformations, infinitesimally parametrized by a Lie-algebra-valued $ξ \in \mathfrak{g}$ , to act on our fundamental variables as:

\begin{aligned} δ_{ξ} ψ_{i} &= ξ^{I} t_{I}^{i j} ψ_{j} = (ξ t ψ)_{i}, \\ δ_{ξ} A_{μ}^{I} &= D_{μ} ξ^{I} = \partial_{μ} ξ^{I} + [ξ, A_{μ}]^{I}, \end{aligned}

(2.2.7)

where the square brackets are the Lie algebra commutators (again, we will justify these transformations in Section 3). These transformation rules are not as general as they could be, but neither are they arbitrary: they are highly constrained by the theory of representations of Lie groups on vector spaces! But even without going into representation theory, the reader can recognize that these are the first-order terms of the Lie algebra action on the respective vector spaces – in particular, “first-order” in the derivatives of $ξ$ and in powers of $A$ and $ψ$ – and in this sense provide an appropriate approximation of any malleable gauge transformation.

Our aim now is to constrain how the matter fields $ψ$ couple to force fields. Let $L (ψ, \partial ψ, A, \partial A)$ be the Lagrangian defining our action $S (ψ, A)$ , which we assume for simplicity does not depend on derivatives of higher order than two.Footnote ¹⁵ Variation along the directions of the gauge transformations (2.2.7) yields (with summation convention on all indices):

\begin{aligned} & \left( \frac{δ L}{δ ψ_{i}} (t^{I} ψ)_{i} + \frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} \partial_{μ} ψ)_{i} + \left( \frac{δ L}{δ A_{ν}}, A_{ν} \right)^{I} + \left( \frac{δ L}{δ \partial_{ν} A_{μ}}, \partial_{μ} A_{ν} \right)^{I} \right) ξ_{I} \\ & + \left( \frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} ψ)_{i} + \frac{δ L}{δ A_{μ}^{I}} + \left( \frac{δ L}{δ \partial_{ν} A_{μ}}, A_{ν} \right)^{I} \right) \partial_{μ} ξ_{I} \\ & + \frac{δ L}{δ \partial_{ν} A_{μ}^{I}} \partial_{μ} \partial_{ν} ξ^{I} = 0 \end{aligned}

(2.2.8)

Since the derivatives of $ξ$ are functionally independent, this equation implies that each line must vanish separately: the first line is a consequence of global symmetries – the equation would have to be satisfied even if the symmetry was independent of the spacetime point – and the remaining two are consequences of local symmetries. These are the fundamental constraints on the dynamics that we propose to analyze, and the task of the remainder of this section will be to unpack them.

The requirement that each of these lines vanishes provides a strong constraint on the form of the Lagrangian, and hence on the dynamics. This, I claim, provides the core of the Noether gauge argument.

2.2.3 The Four Different Cases

To extract interesting physical information from the constraint given by (2.2.8), there are four sectors to compare, arising from the use of either global or local symmetries, and either $A$ -independent or $A$ -dependent Lagrangians. We treat each sector in turn.

The results will be: when $A$ does not figure in the Lagrangian, a theory with global symmetries can be dynamically nontrivial and complete. In such a theory the charges don’t couple to forces, so it will not require further constraints for consistency. With local symmetries and no $A$ -dependence, the constraints demand that the dynamics be trivial, that is, no kinetic term for the matter field can appear in the Lagrangian. When forces have their own dynamics, that is, when the Lagrangian is $A$ -dependent, a theory with global symmetries may be incomplete, and require further constraints to render the dynamics of $A$ compatible with charge conservation; an example will be given. It is only in the last case, where we have local symmetries and $A$ -dependence, that the equations of motion coupling forces to charges are automatically consistent with the conservation of charges (and so no further constraints are required). Thus, we will see the power of malleable symmetries and $A$ -dependence together to secure an interacting dynamics that conserves charge. And this will be our Noether gauge argument.

Force-Independent Lagrangian, with Global Symmetries

First, suppose we are as in the first step of the textbook gauge argument: there is no $A$ in sight, and the symmetry is global, so that $\partial_{μ} ξ^{I} = 0 = \partial_{μ} \partial_{ν} ξ^{I}$ . Then the vanishing of the first line of Equation (2.2.8) reduces to

\frac{δ L}{δ ψ_{i}} (t^{I} ψ)_{i} + \frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} \partial_{μ} ψ)_{i} = 0.

(2.2.9)

But by the Euler-Lagrange equations $E L (ψ)_{i} \approx 0$ , where $E L (ψ)_{i} = \frac{δ L}{δ ψ_{i}} - \partial_{μ} \frac{δ L}{δ \partial_{μ} ψ_{i}}$ , we have

\frac{δ L}{δ ψ_{i}} \approx \partial_{μ} \frac{δ L}{δ \partial_{μ} ψ_{i}},

(2.2.10)

where we again are using “ $\approx$ ” to denote “on-shell” equality. Applying this to Equation (2.2.9) we find that

\partial_{μ} (\frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} ψ)_{i}) = \partial^{μ} J_{μ}^{I} (ψ) \approx 0

(2.2.11)

where we have defined the part that is conserved as the matter current:

J_{μ}^{I} (ψ) := \frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} ψ)_{i},

(2.2.12)

as is customary.

In summary, we have derived what is guaranteed by Noether’s first theorem, that the current $J_{μ}^{I} (ψ)$ is conserved on-shell. Or, turning this around: symmetry requires the Lagrangian to be restricted so that $J_{μ}^{I} (ψ)$ defined in Equation (2.2.12) is divergenceless. Having constrained the space of theories in this manner, there are no more equations to satisfy: conservation of charge is consistent with the dynamics and no further constraints need to be imposed.
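To make (2.2.12) concrete, here is a minimal worked example, assuming a single complex Klein-Gordon field with the $U (1)$ phase symmetry. The Lagrangian, the treatment of $ψ$ and $\bar{ψ}$ as independent components, and the normalization of the generator are illustrative assumptions, not choices made in the text.

```latex
% Assumed: L = \partial_\mu\bar\psi\,\partial^\mu\psi - m^2\bar\psi\psi,
% with \delta_\xi\psi = i\xi\psi and \delta_\xi\bar\psi = -i\xi\bar\psi, \xi constant.
% Equations of motion: \Box\psi \approx -m^2\psi and \Box\bar\psi \approx -m^2\bar\psi.
\begin{align*}
J^{\mu} &= \frac{\partial L}{\partial(\partial_\mu\psi)}\,(i\psi)
         + \frac{\partial L}{\partial(\partial_\mu\bar\psi)}\,(-i\bar\psi)
         = i\left(\psi\,\partial^\mu\bar\psi - \bar\psi\,\partial^\mu\psi\right), \\
\partial_\mu J^{\mu} &= i\left(\psi\,\Box\bar\psi - \bar\psi\,\Box\psi\right)
         \approx i\left(-m^2\psi\bar\psi + m^2\bar\psi\psi\right) = 0 .
\end{align*}
```

The first equality in the second line is off-shell (the cross terms $\partial_{μ} ψ \, \partial^{μ} \bar{ψ}$ cancel identically); only the final step uses the equations of motion, which is why the conservation law is written with “ $\approx$ ”.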

Force-Independent Lagrangian, with Local Symmetries

In the next case, suppose that we allow, in addition to the equations of the preceding case (“Force-Independent Lagrangian, with Global Symmetries”), those arising from $\partial ξ \neq 0$ , while still not allowing for an $A$ in the theory. We get, in addition to Equations (2.2.11) and (2.2.12), from the vanishing of the second line of Equation (2.2.8):

\frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} ψ)_{i} = J_{μ}^{I} (ψ) = 0.

(2.2.13)

So here the conserved currents are forced to vanish. Clearly this condition is guaranteed for all field values if $\frac{δ L}{δ \partial_{μ} ψ_{i}} = 0$ , which requires a vanishing kinetic term. A careful analysis of more general cases reveals this is the only generic solution.Footnote ¹⁶
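The obstruction can be seen directly in a simple case. As a sketch, assume a complex Klein-Gordon field with $L = \partial_{μ} \bar{ψ} \, \partial^{μ} ψ - m^{2} \bar{ψ} ψ$ (an illustrative choice, not one made in the text), and let the phase parameter $ξ$ depend on the spacetime point.

```latex
% Assumed: \delta_\xi\psi = i\xi(x)\psi, \delta_\xi\bar\psi = -i\xi(x)\bar\psi.
\begin{align*}
\delta_\xi(\partial_\mu\psi) &= i\xi\,\partial_\mu\psi + i(\partial_\mu\xi)\,\psi, \qquad
\delta_\xi(\partial_\mu\bar\psi) = -i\xi\,\partial_\mu\bar\psi - i(\partial_\mu\xi)\,\bar\psi, \\
\delta_\xi L &= (\partial_\mu\xi)\; i\left(\psi\,\partial^\mu\bar\psi - \bar\psi\,\partial^\mu\psi\right)
             = (\partial_\mu\xi)\,J^{\mu} .
\end{align*}
```

The terms proportional to $ξ$ itself cancel, as they must for a global symmetry; what survives is the matter current contracted with $\partial_{μ} ξ$ . Since $\partial_{μ} ξ$ is arbitrary, invariance would require $J^{μ}$ to vanish for every field configuration, which fails for this Lagrangian.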

This analysis pinpoints the obstacle appearing in the textbook gauge argument that we rehearsed in Section 2.1.1. When the matter field Lagrangian has a nontrivial kinetic term, local transformations cannot be variational symmetries. That is: if we impose local symmetries without introducing a gauge potential, we cannot consistently also allow a term in the Lagrangian including $\partial_{μ} ψ_{i}$ . It is to allow such terms and still retain the local symmetries that the next two sections will introduce the gauge potential.

Force-Dependent Lagrangian, with Global Symmetries

We first proceed precisely as in the first case, introducing the $A$ field, but still keeping the symmetries global. Using the equations of motion for $A$ as well as those of $ψ$ , that is, $E L [A] = 0$ as well as $E L [ψ] = 0$ , we get, in direct analogy to (2.2.11), a conserved current that is a sum of two currents:Footnote ¹⁷

\partial_{μ} (\frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} ψ)_{i} + {(\frac{δ L}{δ \partial_{ν} A_{μ}}, A_{ν})}^{I}) = \partial^{μ} (J_{μ}^{I} (ψ) + {\tilde{J}}_{μ}^{I} (A)) \approx 0

(2.2.14)

and nothing more; there are no further conditions that the terms of the Lagrangian need to obey. (Here, the definition of ${\tilde{J}}_{μ}^{I} (A)$ is given implicitly by (2.2.14).)

So, unlike the previous case, which admitted only a trivial kinetic term for the matter field $ψ$ , this sector will admit many possible dynamics. The problem here is of a different nature: the theories are not sufficiently constrained; the equations of motion do not automatically guarantee conservation of charges.

Let us look at an example of how things can go wrong in this intermediate sector containing forces but only global symmetries, for the simple, Abelian theory. In the Abelian theory, $\tilde{J} (A) \equiv 0$ , since quantities trivially commute. Thus, Equation (2.2.14) only contains the standard conservation of the matter charges and the symmetries are silent about the relationship between this charge and the dynamics of the forces.

Consider a kinetic term of the form $\partial_{(μ} A_{ν)} \partial^{(μ} A^{ν)}$ , where round brackets denote symmetrization. This differs from the standard Maxwell kinetic term for the gauge potential, namely $F_{μ ν} F^{μ ν} := \partial_{[μ} A_{ν]} \partial^{[μ} A^{ν]}$ , where square brackets denote anti-symmetrization. But the symmetrized version is nonetheless gauge-invariant (under global transformations). Now, the Euler-Lagrange equations for this theory differ only very slightly from the Maxwell-Klein-Gordon equations. The equations of motion for $A$ yield:

\partial^{μ} (\partial_{(μ} A_{ν)}) = J_{ν}

(2.2.15)

in contrast with the usual $\partial^{μ} (\partial_{[μ} A_{ν]}) = J_{ν}$ . But while the divergence of the right-hand side automatically vanishes, unlike the usual case the divergence of the left-hand side does not:

\partial^{ν} \partial^{μ} (\partial_{(μ} A_{ν)}) = \partial^{μ} \partial_{μ} \partial^{ν} A_{ν} = \Box \partial^{ν} A_{ν} \not\equiv 0.

(2.2.16)

At this point, we would have to go back to the drawing board and introduce more constraints on the theory: this theory does not couple forces to charges in a manner that guarantees charge conservation.
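The contrast between the two kinetic terms can be spelled out in one line each. The sketch assumes the unit-weight conventions $\partial_{(μ} A_{ν)} = \frac{1}{2} (\partial_{μ} A_{ν} + \partial_{ν} A_{μ})$ and $\partial_{[μ} A_{ν]} = \frac{1}{2} (\partial_{μ} A_{ν} - \partial_{ν} A_{μ})$ , which the text does not state explicitly but which reproduce (2.2.16).

```latex
% Assumed: (anti)symmetrization carries a factor 1/2; partial derivatives commute.
\begin{align*}
\partial^\nu\partial^\mu\,\partial_{[\mu}A_{\nu]}
 &= \tfrac12\left(\Box\,\partial^\nu A_\nu - \partial^\nu\partial_\nu\,\partial^\mu A_\mu\right) = 0, \\
\partial^\nu\partial^\mu\,\partial_{(\mu}A_{\nu)}
 &= \tfrac12\left(\Box\,\partial^\nu A_\nu + \partial^\nu\partial_\nu\,\partial^\mu A_\mu\right)
  = \Box\,\partial^\nu A_\nu .
\end{align*}
```

In the anti-symmetrized case the field equation is compatible with $\partial^{ν} J_{ν} \approx 0$ for any $A$ . In the symmetrized case it forces the extra condition $\Box \partial^{ν} A_{ν} \approx 0$ , which the global symmetry alone does not supply.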

Thus we glimpse my overall thesis: only by introducing local gauge symmetries do we restrict interactions between forces and their sources so that they are consistent with the conservation of the matter current.

Of course, in this example the culprit is easily found: the kinetic term $\partial_{(μ} A_{ν)} \partial^{(μ} A^{ν)}$ is not invariant under local transformations. According to the next section – our fourth sector – requiring this stronger form of invariance will restrict us to the space of consistent interactions. No tweaking required.

Force-Dependent Lagrangian, with Local Symmetries

In this fourth sector, we again obtain (2.2.14), from the vanishing of the first line of (2.2.8) – the constraint for the global symmetry – since nothing changes at that level. But, from the vanishing of the second line in Equation (2.2.8), we have:

- \frac{δ L}{δ A_{μ}^{I}} = \frac{δ L}{δ \partial_{μ} ψ_{i}} (t^{I} ψ)_{i} + {(\frac{δ L}{δ \partial_{ν} A_{μ}}, A_{ν})}^{I} = J_{μ}^{I} (ψ) + {\tilde{J}}_{μ}^{I} (A) .

(2.2.17)

To substitute for the left-hand side, we once again use the Euler-Lagrange equations for $A$ :

E L (A)_{μ}^{I} = \frac{δ L}{δ A_{μ}^{I}} - \partial_{ν} \frac{δ L}{δ \partial_{ν} A_{μ}^{I}} \approx 0.

(2.2.18)

Defining $\frac{δ L}{δ \partial_{ν} A_{μ}^{I}} =: k_{μ ν}^{I}$ , we now obtain:

J_{μ}^{I} (ψ) + {\tilde{J}}_{μ}^{I} (A) = - \partial^{ν} k_{μ ν}^{I} - E L (A)_{μ}^{I} \approx - \partial^{ν} k_{μ ν}^{I}

(2.2.19)

This equation links both the matter and force currents to the dynamics of the force field, given in $E L (A)_{μ}^{I}$ .

We already know from the constraint for the global symmetry, the first line of Equation (2.2.8), that the sum of the currents is divergence-free on shell (cf. Equation (2.2.14)). Thus, taking the divergence of (2.2.19), the left-hand side vanishes, and so consistency between charge conservation and the dynamics of the force fields demands that the right-hand side also vanish, implying that $\partial^{ν} \partial^{μ} k_{μ ν}^{I} = 0$ . Since partial derivatives commute, all we need in order to satisfy conservation is that:

k_{μ ν}^{I} = - k_{ν μ}^{I} \quad \text{or} \quad k_{μ ν}^{I} = k_{[μ ν]}^{I},

(2.2.20)

which is just what we have from the vanishing of the third line of Equation (2.2.8). So the condition was automatically satisfied. The result of including local symmetries, in this simple case, restricts us to consider Lagrangians in which the derivatives of $A_{μ}^{I}$ only enter in anti-symmetrized form: $\partial_{[μ} A_{ν]}^{I}$ . This restriction excludes the previous example of Equation (2.2.15).
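Both halves of this step are short index manipulations, sketched here under the assumption that $k_{μ ν}^{I}$ is smooth, so that partial derivatives commute. Index positions follow the text, with contractions understood through the Minkowski metric.

```latex
% (a) Antisymmetry of k is sufficient for the double divergence to vanish:
%     relabel \mu <-> \nu, commute the derivatives, then use k_{\nu\mu} = -k_{\mu\nu}.
\begin{align*}
\partial^\mu\partial^\nu k^{I}_{\mu\nu}
 = \partial^\nu\partial^\mu k^{I}_{\nu\mu}
 = \partial^\mu\partial^\nu k^{I}_{\nu\mu}
 = -\,\partial^\mu\partial^\nu k^{I}_{\mu\nu}
 \;\Longrightarrow\; \partial^\mu\partial^\nu k^{I}_{\mu\nu} = 0 .
\end{align*}
% (b) The third line of (2.2.8) supplies that antisymmetry: only the symmetric
%     part of k contracts with the symmetric \partial_\mu\partial_\nu\xi^I.
\begin{align*}
k^{I}_{\mu\nu}\,\partial_\mu\partial_\nu\xi^{I}
 = k^{I}_{(\mu\nu)}\,\partial_\mu\partial_\nu\xi^{I} = 0
 \ \ \text{for all } \xi
 \;\Longrightarrow\; k^{I}_{(\mu\nu)} = 0 .
\end{align*}
```

Step (b) uses the fact that the second derivatives of $ξ$ at a point can be chosen as an arbitrary symmetric array, so the symmetric part of $k_{μ ν}^{I}$ has nowhere to hide.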

More generally, if we try to find a Lagrangian that includes force fields without obeying the relations obtained from the local symmetries, the equations of motion of the force fields and those relating force fields and matter may require further constraints to be compatible with charge conservation, as we saw in the counter-example in the previous section. This is one superpower of local gauge symmetries: they link charge conservation – taken as empirical fact or on a priori grounds – with the form of the Lagrangian for the force fields.

3 Gauge Theory and the Geometry of Fiber Bundles

This is the section in which I introduce the standard geometric approach to gauge theory. (In Section 4, I will introduce a less standard geometric approach.)

In Section 3.1 I will motivate the use of fiber bundles without appealing to any complicated mathematics. This section will introduce the main ideas to be developed in the rest of the section in a pedagogic fashion. Section 3.2 is more mathematically advanced and gets an asterisk (*). Indeed, it is the most mathematically involved in this Element, and so it merits a further preamble. The modern mathematical formalism of gauge theories relies on the theory of principal and associated fiber bundles. I will not give a comprehensive account here (cf. e.g. Kobayashi & Nomizu (1963); Michor (2008) for rigorous mathematical treatments, Nakahara (2003) for a physics-based approach, or Baez & Muniain (1994) for a more pedagogic conceptual introduction). There are also more (many more!) mathematically comprehensive sources on this topic in the literature, but I will focus only on the parts that are important for the geometric picture of gauge theory, and not get bogged down in existence proofs, and so on. For the demonstrations that are included, I will try to use more modern, shorter proofs, that as far as I know are only scattered throughout the literature. In Section 3.3 I will summarize the main ingredients that go into building a gauge theory of particles using the mathematics developed in the previous sections.

3.1 A Brief Introduction to Fiber Bundles

Our intuitive picture of a field over space or spacetime is something like temperature. A temperature field can be written as a map from space or spacetime $M$ to the real numbers, $T : M \to \mathbb{R}$ ; each point in $M$ is assigned a temperature. We want to consider fields that have a more complicated “internal structure,” or “charge structure,” than temperature, so we can generalize from real numbers to vectors, in which case instead of $T$ we have $ψ : M \to V$ , a map from spacetime to some vector space $V$ .

Such a map gives us a definite identity relation for the value of the field at two different points of $M$ . Namely, two points $x, y$ can have the same value of temperature, or be mapped to the same element of $V$ . We could have a less rigid structure, where each $x \in M$ gets its own “copy” of $V$ , with all such copies being linearly isomorphic to $V$ , but where we leave the isomorphism unspecified.Footnote ¹⁸ This is how we implement the idea that there is no absolute comparison of elements of $V$ belonging to different points of $M$ . Now, an isomorphism from the copy of $V$ over $x$ to one over $y$ will be given by a parallel transport between these two spaces, which requires further structure to be defined. In general this isomorphism may depend on the path taken from $x$ to $y$ . Fields then correspond to a particular assignment of one value $v_{x} \in V$ per point $x \in M$ : these are called sections of the vector bundle over $M$ with typical fiber $V$ .

One example of such vector spaces $V$ is familiar from differential geometry: namely, from the tangent bundle $T M$ , whose elements are, at each point, tangent to curves that pass through that point and are such that $T_{x} M ≃ \mathbb{R}^{4} = V$ , for a four-dimensional spacetime, where here $≃$ represents a linear isomorphism that is not canonically specified.Footnote ¹⁹ Indeed, even if $T M$ were globally trivializable, so that a product structure could be found for its totality: $T M ≃ M \times \mathbb{R}^{4}$ , this would not mean we could identify an element $v \in \mathbb{R}^{4}$ at different points of $M$ , because such an identification would depend on the choice of isomorphism between $T_{x} M ≃ \mathbb{R}^{4}$ .

Because the elements of $T_{x} M$ correspond to tangents of curves passing through $x$ , we say the tangent bundle is a vector bundle that is “soldered” onto spacetime. But the fields employed in modern theoretical physics – representing different properties of matter – live in more general vector bundles than $T M$ , and are not soldered to spacetime.

These “charged fields” have components at each spacetime point that are not associated to spacetime directions; they represent degrees of freedom that are “internal”: think of it as a “color” or as a kind of charge. Such charged matter fields interact through fundamental forces other than the gravitational force, and each of these forces is related to a given symmetry group. The forces tell us how the charge value at one spacetime point gets dragged along a spacetime curve to another charge value at another spacetime point.

The main idea underlying the physical significance of the parallel transport of internal quantities was already well stated in the paper that introduced this mathematical machinery into physics, Yang and Mills (1954):

The conservation of isotopic spin is identical with the requirement of invariance of all interactions under isotopic spin rotation. This means that when electromagnetic interactions can be neglected, as we shall hereafter assume to be the case, the orientation of the isotopic spin is of no physical significance. The differentiation between a neutron and a proton is then a purely arbitrary process. As usually conceived, however, this arbitrariness is subject to the following limitation: once one chooses what to call a proton, what a neutron, at one space-time point, one is then not free to make any choices at other space-time points.

The idea here is that calling a particle a proton or a neutron at a given point is meaningless; only relational or, more broadly, structural properties of the theory can have physical significance, for instance, whether your original “proton” became a “neutron” upon going around a loop.Footnote ²⁰ The only physically relevant information is a notion of sameness across different points of spacetime: thus, once we label a given particle as, for example, a proton at one point of spacetime, the structure of the bundle specifies what would also count as a proton at another spacetime point, infinitesimally nearby. In Section 3.2, we give the technical conditions that make this idea precise.

3.2 Fiber Bundles in Gauge Theory

This is a rather long section, and more mathematically involved than the others. But I will start slow, in Section 3.2.1, providing more motivation for using fiber bundles in general, and then principal fiber bundles and their associated vector bundles. As I mentioned previously, the basic idea of a bundle is that it has internal spaces associated with each spacetime point – called fibers – and there is no canonical way to identify points in different fibers. I will introduce fibers that are vector spaces, and then will try to give some intuition for principal fiber bundles as bundles of linear frames for these vector spaces. In Section 3.2.2 I will develop the promised mathematical machinery, with particular attention to conceptual elements.

3.2.1 The Intuition Behind Fiber Bundles

To gather intuition about principal fiber bundles (PFBs) as the “organizers” of symmetry principles, as described in Section 3.1, it is worthwhile to introduce them in the context of the familiar tangent vector fields on $M$ .

Fiber bundles are spaces that locally look like a product; that is, they form a ‘bundle’ of fibers over a base manifold (usually spacetime). Let us denote fiber bundles by $E$ ; they are smooth manifolds that admit the action of a surjective projection $π_{E} : E \to M$ so that any point of $M$ has a neighborhood, $U \subset M$ , such that $E$ is locally of the form $π_{E}^{- 1} (U) ≃ U \times V$ , where $V$ , as previously, is isomorphic to some “fiber”: a space over each point of $M$ and in which the fields take their values, and similarly for all subsets of $U$ , which ensures that $π_{E}^{- 1} (x) ≃ V$ . But the isomorphism between $π_{E}^{- 1} (U)$ and $U \times V$ is not unique, which is why there is no canonical identification of elements of fibers over different points of spacetime. Each choice of isomorphism is called “a trivialization” of the bundle: it is basically a coordinate system that makes the local product structure explicit. It is standard to denote a fiber bundle $E$ over $M$ , with typical fiber $V$ , with the triple $(E, M, V)$ .

Definition 1 (A section of a bundle) A field-configuration for $E$ is called a section, and it is a map $κ : U \to E$ such that $π_{E} \circ κ = \mathrm{Id}_{U}$ . We denote smooth sections like this by $κ \in Γ (E)$ .Footnote ²¹

Sections replace the functions $\tilde{κ} : M \to V$ , that we would employ if the fields had a fixed, or “absolute” – that is, spacetime independent – space of values.

There are essentially two kinds of bundles that we will encounter here: a vector bundle and a principal fiber bundle; a third type, an associated vector bundle, is a vector bundle that is associated to a principal bundle.

Each matter field in a gauge theory is described by a section on a vector bundle, corresponding to that field. Indeed, given a vector bundle $E$ over $M$ (which we will describe next in more depth), we can directly define an affine connection $D$ as:

D : Γ (E) \to Γ (T^{*} M \otimes E)

(3.2.1)

such that the product rule

D (f s) = d f \otimes s + f D s

(3.2.2)

is satisfied for all smooth, real (or complex)-valued functions $f$ . But then the reader should ask: aren’t we essentially done? If we can define a covariant derivative for different matter fields directly, why introduce any other kind of bundle? What further structure do we need, for example, in order to write down a Lagrangian?

The problem, as it stands, is that each vector bundle has its own covariant derivative, and so the covariant derivatives of different matter fields are “uncoordinated.” Without such a coordination, covariant derivatives of different but interacting fields would not “march-in-step.” This would imply that the notion of relative “charge,” for example, of electric charge of different matter fields, would be extremely history dependent, and unhelpful. The role of principal and associated bundles is to provide a mechanism for the coordination of covariant derivatives among fields that have charges of the same type. In other words, associated vector bundles inherit their covariant derivatives from a single principal bundle, and so, if we tie each force to a principal bundle, we solve this coordination problem. I will discuss this further in Section 4.

Two Examples of Bundles

Example of a vector bundle: the tangent bundle. The tangent bundle, $T M$ , serves again to illustrate these constructions. A smooth tangent vector field is a smooth assignment of elements of $T M$ over $M$ . In this case it is usual to use, instead of $κ$ , the notation $X \in Γ (T M)$ , with $π_{T M} : T M \to M$ mapping $X_{x} \in T_{x} M \mapsto x \in M$ . The tangent bundle $T M$ locally has the form of a product space, $U \times V$ , with $V ≃ R^{4}$ .

Example of a principal bundle: the bundle of frames of the tangent bundle. We can build a principal bundle as the set of all linear frames of $T M$ , called “the frame bundle” (where “frame” means “basis of the tangent space $T_{x} M$ ”), written $L (T M)$ . The fiber over each point of the base space $M$ consists of all choices $\{e_{I} (x)\}_{I = 1, \dots, 4} \in L (T M)$ , of sets of spanning and linearly independent vectors (here the index $I$ enumerates the basis elements); and there is a one-to-one map between the group $G L (R^{4})$ and the fiber: we can use the group to go from any frame to any other (at that same point), but there is no basis that canonically corresponds to $I d \in G L (R^{4})$ . Similarly, given any vector bundle $E$ with typical fiber $V$ , the bundle of frames $L (E)$ forms a principal fiber bundle with $G L (V)$ as the structure group.

This example illustrates a feature of principal fiber bundles that distinguishes them from vector bundles: the fibers of a principal bundle can be mapped 1-1 not to a vector space but to a Lie group $G$ , and since the fibers have no preferred identity element, they are isomorphic to $G$ only as a homogeneous space.
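To make the absence of a preferred identity element concrete, here is a minimal Python sketch. It assumes a two-dimensional fiber (rather than $R^{4}$ ) and stores a frame as a 2×2 matrix whose rows are the basis vectors, so that the group acts by left multiplication; the helper names `matmul`, `inverse` and `relating_group_element` are hypothetical and chosen only for this illustration. It shows that any two frames are related by exactly one group element, although no frame is singled out as "the identity".

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Exact inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("rows are linearly dependent, so this is not a frame")
    return [[Fraction(m[1][1]) / det, Fraction(-m[0][1]) / det],
            [Fraction(-m[1][0]) / det, Fraction(m[0][0]) / det]]

def relating_group_element(frame_a, frame_b):
    """The unique g in GL(2) such that g . frame_a == frame_b."""
    return matmul(frame_b, inverse(frame_a))

IDENTITY = [[1, 0], [0, 1]]

frame_a = [[1, 0], [1, 1]]  # basis vectors (1, 0) and (1, 1)
frame_b = [[2, 1], [0, 3]]  # basis vectors (2, 1) and (0, 3)
g = relating_group_element(frame_a, frame_b)
```

In this sketch `g` is the matrix with rows (1, 1) and (-3, 3). Relating a frame to itself returns the identity matrix, which is the statement that the action is free; but the identity of the group labels a relation between frames, not any particular frame.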

Parallel Transport

We can now use this principal bundle to “coordinate” the parallel transport of different tensor fields.Footnote ²²

The bundle of frames is perfect for illustrating, in a familiar setting, how parallel transport is encoded by connections in principal fiber bundles. Directions transversal to the fiber will relate frames over neighboring points of $M$ ; they will tell us which basis over $x + δ x$ corresponds to a chosen basis over $x \in M$ . Imagining the manifold $M$ to lie horizontally on the page, we think of the fibers as vertical, and, on $P$ , we dub as horizontal a preferred set of directions transversal to the fibers, that we take as a preferential link between the frames on neighboring fibers. The horizontal space at a point is isomorphic to the tangent space of the base manifold under that point: $H_{p} ≃ T_{π (p)} M$ (cf. next subsection). So, the vertical spaces – the fibers – are part of the basic structure of the principal bundle, but a preferred choice of a transversal distribution – called horizontal – is not. Indeed, in physical theories, the principal bundle will be the fixed background on which the horizontal distribution is dynamical.

Thus, in the frame bundles, a horizontal direction at a point determines which frames in neighboring fibers correspond to each other, or are parallel transported. By expanding a vector field in these frames, the parallel transport of the vector fields is straightforwardly defined by constancy of the components of the vector in that parallel transported frame.

Now we will see precisely how these definitions fit together, and how we can understand the entire machinery of gauge theory geometrically.

3.2.2 The Mathematics of Principal Bundles*

A principal fiber bundle is, in short, just a manifold where some group acts, and whose equivalence classes under the group action correspond 1-1 to points of spacetime. In detail:

Definition 2 (a Principal Fiber Bundle) is a smooth manifold $P$ that admits a smooth free action of a (path-connected, semi-simple) Lie group, $G$ : that is, there is a map $G \times P \to P$ with $(g, p) \mapsto g \cdot p$ for some left action $\cdot$ and such that for each $p \in P$ , the isotropy group is the identity (i.e. $G_{p} := \{g \in G | g \cdot p = p\} = \{e\}$ ).

Naturally, we construct a projection $π : P \to M$ onto equivalence classes, given by $p \sim q \Leftrightarrow p = g \cdot q$ for some $g \in G$ . That is: the base space $M$ is the orbit space of $P$ , $M = P / G$ , with the quotient topology; that is, it is characterized by an open and continuous $π : P \to M$ .Footnote ²³ By definition, $G$ acts transitively on each fiber, that is, on each orbit of the group. Here, unlike in the general definition of a fiber bundle, we don’t need to postulate the local product structure: $π^{- 1} (U) ≃ U \times G$ ; it is easy to prove that this follows from Definition 2 (see the section “Local sections” for the proof).

The automorphism group of $P$ consists of fiber-preserving diffeomorphisms, that is:

Definition 3 Diffeomorphisms

τ : P \to P such that τ (g \cdot p) = g \cdot τ (p) .

(3.2.3)

Vertical automorphisms are those fiber-preserving diffeomorphisms for which $π \circ τ = π$ ; that is, they are purely “vertical” automorphisms of the bundle.

But to link fibers, we need to postulate more structure than just $P$ : we need a connection.

The Ehresmann Connection-Form

Given an element $ξ$ of the Lie-algebra $g$ , and the action of $G$ on $P$ , we use the exponential to find an action of $g$ on $P$ . This defines an embedding of the Lie algebra into the tangent space at each point, given by the hash operator: $#_{p} : g \to T_{p} P$ . The image of this embedding we call the vertical space $V_{p}$ at a point $p \in P$ : it is tangent to the orbits of the group, and is linearly spanned by vectors of the form

for ξ \in g : ξ^{#} (p) := \frac{d}{d t}_{| t = 0} (exp (t ξ) \cdot p) \in V_{p} \subset T_{p} P .

(3.2.4)

Vector fields of the form $ξ^{#}$ for $ξ \in g$ are called fundamental vector fields.Footnote ²⁴ The vertical spaces are defined canonically from the group action, as in (3.2.4). But we can define an “orthogonal” projection operator, $\hat{V}$ such that:

\hat{V} |_{V} = I d |_{V}, \hat{V} \circ \hat{V} = \hat{V},

(3.2.5)

and defining $H \subset T P$ as $H := \ker (\hat{V})$ . It follows that $\hat{H} = I d - \hat{V}$ and so $\hat{V} \circ \hat{H} = \hat{H} \circ \hat{V} = 0$ .Footnote ²⁵ Moreover, since $π_{*} \circ \hat{V} = 0$ it follows that:

π_{*} \circ \hat{H} = π_{*} .

(3.2.6)

As I said in the previous section, the connection-form should be visualized essentially as the projection onto the vertical spaces: given some infinitesimal direction, or change of frames, the vertical projection picks out the part of that change that was due solely to a different choice of frames, and the connection-form tells us what that change of frame was. The only difference between $\hat{V}$ and $ω$ is that the latter is $g$ -valued. Thus we get it via the isomorphism between $V_{p}$ and $g$ ( $ω$ ’s inverse is $# : g \to V \subset T P$ ).

One often defines the connection directly, without appeal to vertical spaces:

Definition 4 (An Ehresmann connection-form) $ω$ is defined as a Lie-algebra valued one form on $P$ , satisfying the following properties:

ω (ξ^{#}) = ξ and {L_{g}}^{*} ω = {A d}_{g} ω,

(3.2.7)

where the adjoint representation of $G$ on $g$ is defined as ${A d}_{g} ξ = g ξ g^{- 1}$ , for $ξ \in g$ ; ${L_{g}}^{*}$ is the pull-back of $T P$ induced by the diffeomorphism $g : P \to P$ .

But it is possible to show that

Proposition 1 A Lie-algebra-valued one form on $P$ satisfies (3.2.7) if and only if $ω = #^{- 1} \circ \hat{V}$ (where $#^{- 1}$ is only defined in the restriction to the vertical subspace $V \subset T P$ ).

The relationship between the connection, the Lie-algebra, and the vertical projection is illustrated in Figure 2.

Figure 2 The relation between the Ehresmann connection form $ω$ and a vertical projection, on the principal fiber bundle with structure group $G$ . Taken from Wikipedia, under Creative Commons License.
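The two conditions in (3.2.7) can be checked numerically in the simplest possible toy model. The sketch below assumes a bundle over a single point, so that $P$ is just the group $G L (2)$ acting on itself by left multiplication, every direction is vertical, and the connection-form is taken to be $ω_{p} (v) = v p^{- 1}$ . That choice of model, and all function names, are assumptions made for illustration; they are not constructions from the text.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def expm(m, terms=25):
    """Matrix exponential by its power series (adequate for small 2x2 matrices)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def fundamental_vector(xi, p, h=1e-5):
    """Central-difference estimate of d/dt (exp(t xi) . p) at t = 0, cf. (3.2.4)."""
    plus = matmul(expm([[h * x for x in row] for row in xi]), p)
    minus = matmul(expm([[-h * x for x in row] for row in xi]), p)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

def omega(p, v):
    """Toy connection-form on the group itself: omega_p(v) = v p^{-1} (assumed)."""
    return matmul(v, inverse(p))

def adjoint(g, xi):
    """Ad_g xi = g xi g^{-1}."""
    return matmul(matmul(g, xi), inverse(g))

def close(a, b, tol=1e-6):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

p = [[2.0, 1.0], [0.5, 1.0]]    # a point of P
g = [[1.0, 2.0], [0.0, 1.0]]    # a group element
xi = [[0.0, 1.0], [-1.0, 0.3]]  # a Lie-algebra element
v = [[0.4, -1.0], [2.0, 0.1]]   # a tangent vector at p

# First condition: omega(xi^#) = xi.
first_condition = close(omega(p, fundamental_vector(xi, p)), xi)
# Second condition: the push-forward of v by left translation is g v,
# and omega evaluated on it at g . p equals Ad_g omega_p(v).
second_condition = close(omega(matmul(g, p), matmul(g, v)),
                         adjoint(g, omega(p, v)))
```

Both `first_condition` and `second_condition` evaluate to `True` in this model. Over a nontrivial base the content of a connection lies in the horizontal directions, which this one-point model does not have; the sketch only illustrates how the two defining conditions constrain the vertical part.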

If, on the second condition in (3.2.7), we take the infinitesimal pull-back, we get the Lie derivative along a vector $ξ^{#}$ on the left-hand side, and a Lie-algebra commutator on the right-hand side, that is,

L_{ξ^{#}} ω = [ω, ξ] .

(3.2.8)

This equation is only valid for fundamental vector fields, $ξ^{#}$ . But a vertical field may be vertical without being fundamental: we could take different $ξ \in g$ at different orbits (as discussed in footnote 24), that is, $Z_{p}^{v} := (ξ (π (p))^{#})_{p} \in T_{p} P$ , which we abbreviate to $(ξ (x)^{#})_{p}$ . The Lie derivative in (3.2.8) is not $C^{\infty}$ -linear on $ξ^{#}$ , and so we expect some difference when we compute $L_{ξ (x)^{#}} ω$ . To see what that is, we first define the inner derivative (or alternating contraction operator) on differential forms, $ι$ , so that the contraction of $ι_{ξ^{#} (π (p))} Λ$ is $C^{\infty}$ -linear in $ξ^{#} (x)$ , for any form $Λ$ . Thus:

ι_{ξ^{#} (x)} d ω_{p} = [ω_{p}, ξ (x)_{p}]

(3.2.9)

Now, I will merely state Cartan’s Magic Formula, describing the relation between the Lie derivative and inner and exterior derivative (which is proven inductively)

L_{ξ^{#} (x)} (∙) = (ι_{ξ^{#} (x)} d + d ι_{ξ^{#} (x)}) (∙) .

(3.2.10)

This equation is extremely useful in differential calculus, and here it can be used to compute:

L_{ξ^{#} (x)} ω = ι_{ξ^{#} (x)} d ω + d (ω (ξ^{#} (x))) = [ω, ξ (x)] + d ξ (x),

(3.2.11)

where, reinstating the $π (p)$ in place of $x$ , we read the action of the second term on $Z \in Γ (T P)$ as $d ξ (π (p)) (Z) = π_{*} (Z) [ξ (π (p))]$ , which, in a local trivialization takes the derivative of the spacetime function and leaves the Lie-algebra values intact. Equation (3.2.11) will be useful to compute the change of the connection under a change of gauge.

Let us pause here for a second to describe an important related notion, of horizontal lift. The horizontal lift of a vector $X_{x} \in T_{x} M$ through $p \in π^{- 1} (x) \subset P$ is a horizontal vector $X_{p}^{h}$ such that $π_{*} X_{p}^{h} = X_{x}$ .

Let $[[∙, ∙]]_{N}$ be the commutator of vector fields on a smooth manifold $N$ . Then it is easy to show that (see Kobayashi & Nomizu (1963, Prop. 1.3)): (i) the lift of $X + Y$ is $X^{h} + Y^{h}$ , (ii) $f^{h} X^{h}$ is the lift of $f X$ , where $f^{h} := f \circ π$ , (iii) $\hat{H} ([[X^{h}, Y^{h}]]_{P})$ is the horizontal lift of $[[X, Y]]_{M}$ (this is the only nontrivial item, but it is easy to prove: for $\hat{H} ([[X^{h}, Y^{h}]]_{P})$ is horizontal, and $π_{*} \hat{H} ([[X^{h}, Y^{h}]]_{P}) = π_{*} ([[X^{h}, Y^{h}]]_{P}) = [[X, Y]]_{M}$ from (3.2.6) and the first two items).

In the principal bundle formalism parallel transport along a curve $γ : [0, 1] \to M$ , with $γ (0) = x$ and $γ (1) = y$ , is described via a horizontal lift $γ^{h}$ of $γ$ through a particular initial point, or frame, $γ^{h} (0) = p \in π^{- 1} (x)$ . So one might be inclined to think that parallel transport requires the stipulation of an initial $p \in π^{- 1} (x)$ , for example, an initial frame. But the horizontal lift commutes with the group action, $γ^{h} \circ L_{g} = L_{g} \circ γ^{h}$ (which follows from horizontal curves being sent to horizontal curves by translation of the origin; cf. Kobayashi & Nomizu (1963, Ch. II, Prop. 3.2)). That means we can think of parallel transport as an isomorphism of an initial to a final fiber, for example, for the path $γ$ :

τ_{γ} : π^{- 1} (x) \to π^{- 1} (y) .

(3.2.12)

Given two different curves, $γ, γ^{'}$ , both between $x$ and $y$ , it follows that there exists a $g \in G$ such that $τ_{γ} = g \cdot τ_{γ^{'}}$ . By the composition properties of parallel transport, it is customary to focus only on closed curves $γ$ , starting (and ending) at $x \in M$ . For a path-connected $M$ , the subgroup generated by such elements for all such closed paths depends on the base point $x$ , only up to conjugation in $G$ . So usually, the total group generated by parallel transport around closed curves is called the holonomy group, denoted by $H o l (ω)$ .
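A minimal numerical sketch of these statements, for the abelian group $U (1)$ : it assumes a flat two-dimensional base, a local potential $A = \frac{B}{2} (- y d x + x d y)$ with a hypothetical constant $B$ , and the convention that transport along a path multiplies the fiber by $exp (i \int A)$ . None of these choices comes from the text; they are only meant to show two paths between the same endpoints differing by a group element, which is the transport around the closed loop they form.

```python
import cmath

FIELD = 0.5  # hypothetical uniform field strength B

def potential(x, y):
    """Components (A_x, A_y) of the assumed potential A = (B/2)(-y dx + x dy)."""
    return (-0.5 * FIELD * y, 0.5 * FIELD * x)

def line_integral(path, steps_per_edge=50):
    """Midpoint-rule integral of A along a polygonal path given by its corners."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx = (x1 - x0) / steps_per_edge
        dy = (y1 - y0) / steps_per_edge
        for k in range(steps_per_edge):
            a_x, a_y = potential(x0 + (k + 0.5) * dx, y0 + (k + 0.5) * dy)
            total += a_x * dx + a_y * dy
    return total

def transport(path):
    """Transport phase along the path, with the assumed convention exp(i * integral of A)."""
    return cmath.exp(1j * line_integral(path))

SIDE = 2.0
start, end = (0.0, 0.0), (SIDE, SIDE)
gamma = [start, (SIDE, 0.0), end]        # bottom edge, then right edge
gamma_prime = [start, (0.0, SIDE), end]  # left edge, then top edge
loop = [start, (SIDE, 0.0), end, (0.0, SIDE), start]  # gamma, then gamma_prime reversed

# The group element relating the two transports between the same endpoints.
g = transport(gamma) / transport(gamma_prime)
```

In this sketch `g` coincides with `transport(loop)`, and the loop integral equals the enclosed flux `FIELD * SIDE ** 2`. Because $U (1)$ is abelian the result does not depend on the base point at all; for a non-abelian group it would depend on it up to conjugation, as stated above.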

It is clear that, since parallel transport can be thought of at the level of entire fibers, as in (3.2.12), there is a frame-independent abstract mathematical object that corresponds to the Ehresmann connection form, sometimes called the Atiyah-Lie connection. This is a section of the vector bundle $T^{*} P / G$ .Footnote ²⁶ In other words, if we know what parallel transport is at $p$ , we know what it is at $g \cdot p$ . By getting rid of this redundancy, we can find a global spacetime representation of the connection $ω$ . This Atiyah-Lie connection is a section on the bundle of connections, that is, $Υ \in Γ (T^{*} P / G)$ , where $T^{*} P / G$ is a vector bundle over spacetime.Footnote ²⁷

The Curvature

To define curvature, we note that an infinitesimally small parallelogram with horizontal sides that projects onto a closed parallelogram on $M$ , may not close on $P$ : if a horizontal parallelogram starts at $p \in P$ , it may end at $g \cdot p$ . In other words, the horizontal distributions need not be involutive.

Definition 5 (The curvature Ω of ω) is a Lie-algebra valued two-form on $P$ :

Ω (∙, ∙) := ω ([[\hat{H} (∙), \hat{H} (∙)]]_{P})

(3.2.14)

Let $P (M, G)$ be a principal fiber bundle and $ρ$ a representation of $G$ on a finite-dimensional vector space $V$ ; $ρ (a)$ is a linear transformation of $V$ for each $a \in G$ and $ρ (a b) = ρ (a) ρ (b)$ for $a, b \in G$ . And let $Λ^{n} (N)$ be the space of alternating n-forms on a smooth manifold $N$ .

Definition 6 (Pseudo-tensorial and tensorial forms.) A pseudotensorial form of degree $r$ on $P$ of type $(ρ, V)$ is a $V$ -valued $r$ -form $φ$ on $P$ such that

L_{g}^{*} φ = ρ (g) \cdot φ for g \in G .

(3.2.15)

Such a form $φ$ is called a tensorial form if it is horizontal, that is, $φ (X_{1}, \dots, X_{r}) = 0$ whenever at least one of the tangent vectors $X_{i}$ of $P$ is vertical, that is, tangent to a fiber.

In other words, a pseudo-tensorial form is covariant under the group action, but not necessarily horizontal: it is only when it is also horizontal that we call it tensorial.

Then we define

Definition 7 (The gauge covariant exterior derivative) of pseudo-tensorial forms as:

D φ := (d φ) \circ \hat{H} .

(3.2.16)

It is easy to show that, whereas $d φ$ is still only pseudo-tensorial, $D φ$ is tensorial, and not only pseudo-tensorial. In the language of principal fiber bundles, this is why minimal coupling, $d \to D$ , renders functions “coordinate-independent”: they acquire trivial dependence on the vertical directions along the fiber (which represent “coordinate” or frame changes).

We can rewrite (3.2.16) as:

D φ (∙) = d φ (∙) - d φ (ω (∙)^{#}),

(3.2.17)

where the second term is linear in $ω (∙)^{#}$ (i.e. can be read as $ι_{ω (∙)^{#}} d φ$ ) and can be understood as the vertical correction to the ‘gradient’ of $φ$ , so that this ‘gradient’ stays horizontal. We have:

ι_{ω (∙)^{#}} d φ = ρ (ω (∙)) φ .

(3.2.18)

Locally, in a trivialization of $P$ , we write $φ_{| U} \in Γ (Λ^{n} (π^{- 1} (U)) \otimes V)$ , so $φ_{| U} = φ^{i} e_{i}$ , where $φ^{i} \in Λ^{n} (π^{- 1} (U))$ is a real-valued pseudo-tensorial n-form, and $\{e_{i}\}$ is a basis for $V$ . Then, since $ω$ is a one-form, $ρ (ω)_{| U} \in Γ (Λ^{1} (π^{- 1} (U)) \otimes G L (V))$ and we can write, in this basis:

(ρ (ω) φ)_{| U} = ρ (ω)_{j}^{i} \land φ^{j} e_{i},

(3.2.19)

which gives the usual expression for the action of the covariant derivative in (3.2.17). This action on other Lie-algebra valued forms will usually be written just as $[ω, φ]$ , with the understanding that, on a trivialization, we apply the $\land$ to the differential forms and the Lie bracket to the Lie algebra elements.

As with $ω$ , pseudo-tensorial forms are only required to be equivariant under the pointwise action of the group, as in (3.2.15). Under spacetime dependent transformations, pseudo-tensorial forms are not necessarily equivariant (satisfying something like (3.2.15)). But the gauge-covariant derivative corrects for that: that is its role. In the infinitesimal case, we now prove:

Proposition 2 For a pseudo-tensorial form $φ$ as per Definition 6, under an infinitesimal vertical automorphism, $ξ (x)^{#}$ , we have the equivariance property:

L_{ξ^{#} (x)} D φ = ρ (ξ) D φ

We use Cartan’s Magic Formula (3.2.10) and the Lie derivative of the Ehresmann connection, given in (3.2.11), in Equation (3.2.17), written as: $D φ = d φ - ρ (ω) φ$ . Applying $d ι_{ξ^{#} (x)}$ to anything horizontal, like $D φ$ , vanishes completely, since it first linearly contracts the horizontal form with a vertical vector. Now we apply $ι_{ξ^{#} (x)} d$ to $D φ$ , obtaining (first term vanishes since $d d = 0$ ):

\begin{matrix} ι_{ξ^{#} (x)} d (D φ) = - ι_{ξ^{#} (x)} d (ρ (ω) φ) \end{matrix}

(3.2.21)

So, first, as will be shown in Proposition 3 just below:Footnote ²⁸

- ι_{ξ^{#} (x)} d ω = - ι_{ξ^{#} (x)} (- [ω, ω]) = - [ξ (x), ω] + [ω, ξ (x)],

(3.2.22)

and so:

\begin{matrix} ι_{ξ^{#}} d (D φ) = & ρ (- [ξ, ω] + [ω, ξ]) φ - ι_{ξ^{#}} (ρ (ω) d φ) \end{matrix}

(3.2.23)

\begin{matrix} = & (- [ρ (ξ), ρ (ω)] + [ρ (ω), ρ (ξ)]) φ - ρ (ξ) d φ \\ + ρ (ω) ρ (ξ) φ \end{matrix}

(3.2.24)

\begin{matrix} = & ρ (ξ) (d φ - ρ (ω) φ) = ρ (ξ) D φ, \end{matrix}

(3.2.25)

where in going from the first to the second line, I used (3.2.18), and $ι_{ξ^{#}} ω = ξ$ , and going from the second to the third I used $[a, b] = \frac{1}{2} (a b - b a)$ . □.

From Proposition 2, it follows that:

L_{ξ^{#} (x)} Ω = [ξ, Ω] .

(3.2.26)

Footnote ²⁹

Proposition 3 The next two definitions of curvature are equivalent to (3.2.14):

\begin{matrix} Ω & = D ω \\ Ω & = d ω + [ω, ω], \end{matrix}

(3.2.30)

where $d$ is the exterior derivative on $P$ .

The proof proceeds through explicit insertion of horizontal and fundamental vertical vector fields and multi-linearity. First, one writes

d ω (X, Y) = X [ω (Y)] - Y [ω (X)] - ω ([[X, Y]]_{P}),

(3.2.31)

the standard formula for the exterior derivative of a 1-form. (Note: this differs from the formula in some textbooks, such as in Kobayashi & Nomizu (1963), by a factor of 2 on the left-hand side; this gives a difference of 1/2 on the second term on (3.2.31)). So for two horizontal fields $X^{h}, Y^{h}$ , since $ω (X^{h}) = 0 = [ω (X^{h}), ω (Y^{h})]$ , it is immediate that:

(d ω) \circ \hat{H} (X^{h}, Y^{h}) = d ω (X^{h}, Y^{h}) = d ω (X^{h}, Y^{h}) + [ω (X^{h}), ω (Y^{h})] .

(3.2.32)

And using (3.2.31) it is immediate that $d ω (X^{h}, Y^{h}) = ω ([[\hat{H} (X^{h}), \hat{H} (Y^{h})]]_{P}) .$ This is the only case which has nonvanishing curvature. Now, for two vertical fields, the only nontrivial part of the equalities is to show that:

d ω (ξ^{#}, η^{#}) = - [ω (ξ^{#}), ω (η^{#})] .

We write, for $ξ, η \in g$ , $X = ξ^{#}, Y = η^{#}$ , and note that, because the orbits form integral submanifolds, commutators of vertical vector fields are vertical, and $[ξ, η]^{#} = [[ξ^{#}, η^{#}]]_{P}$ . So it follows from (3.2.31) that $d ω (ξ^{#}, η^{#}) = - [ξ, η]$ . I will only sketch the case for one vertical and one horizontal field (cf. Kobayashi & Nomizu (1963, Theorem 5.2) for more detail). The idea is to show that the commutator $[[ξ^{#}, X^{h}]]_{P}$ between a fundamental vector field $ξ^{#}$ and a horizontal vector field $X^{h}$ (that is covariant under $G$ ) is horizontal as well. To show that, we define the horizontal field $X_{g \cdot p}^{h} = {L_{g}}_{*} X_{p}^{h}$ , and note that the Lie derivative $L_{ξ^{#}} X^{h} = \lim_{t \to 0} \frac{1}{t} (X^{h} - {L_{\exp (t ξ)}}_{*} X^{h})$ is also horizontal, since it is the difference between two horizontal vectors at $p$ , and so this ensures that the right-hand side of (3.2.31) vanishes. □

Proposition 4 (the Bianchi identity) $D Ω = 0$ .

By the definition, it is sufficient to compute its value on three horizontal vectors (the others vanish). The gauge-covariant exterior derivative is (anti)linear, so $\hat{H} (X^{h}, Y^{h}, Z^{h}) = (X^{h}, Y^{h}, Z^{h})$ and:

D Ω (X^{h}, Y^{h}, Z^{h}) = (d Ω) (X^{h}, Y^{h}, Z^{h}) = (d d ω + [d ω, ω] - [ω, d ω]) (X^{h}, Y^{h}, Z^{h}) = 0,

(3.2.33)

since every term has at least one contraction of $ω$ with a horizontal vector. The Bianchi identity is a nontrivial condition that any curvature satisfies (see Baez & Muniain (1994, p. 278) for the geometric interpretation of this identity). □

In order to connect the previous definitions to the usual definition of $Ω$ in terms of the exterior product, we pick out a basis for the Lie-algebra, $\{ϵ_{I} \in g\}$ , with structure constants $c_{J K}^{I}$ defined by $ϵ_{I} c_{J K}^{I} = [ϵ_{J}, ϵ_{K}]$ . In terms of this basis, we write $ω = ω^{I} ϵ_{I}$ and (3.2.30) becomes:

Ω^{I} = d ω^{I} - c_{J K}^{I} ω^{J} \land ω^{K} .

(3.2.34)
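As an illustration of how structure constants are read off from a basis, the following sketch takes the Lie algebra of the rotation group in three dimensions as a stand-in example. It assumes the basis matrices $(ϵ_{I})_{j k} = - ε_{I j k}$ and the ordinary matrix commutator $a b - b a$ , so normalizations may differ by a factor from the convention for $[a, b]$ used elsewhere in this section; the function names are hypothetical.

```python
def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

# Assumed basis of so(3): (eps_I)_{jk} = -levi_civita(I, j, k).
BASIS = [[[-levi_civita(I, j, k) for k in range(3)] for j in range(3)]
         for I in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(a, b):
    """Ordinary matrix commutator a b - b a (an assumed normalization)."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

def components(m):
    """Coefficients c^I of an antisymmetric matrix m = c^I eps_I in BASIS."""
    return [m[2][1], m[0][2], m[1][0]]

# STRUCTURE[I][J][K] = c^I_{JK}, defined by eps_I c^I_{JK} = [eps_J, eps_K].
STRUCTURE = [[[components(commutator(BASIS[J], BASIS[K]))[I]
               for K in range(3)] for J in range(3)] for I in range(3)]

def jacobi_holds():
    """Check the Jacobi identity, written in terms of the structure constants."""
    rng = range(3)
    return all(
        sum(STRUCTURE[M][J][K] * STRUCTURE[N][I][M]
            + STRUCTURE[M][K][I] * STRUCTURE[N][J][M]
            + STRUCTURE[M][I][J] * STRUCTURE[N][K][M] for M in rng) == 0
        for I in rng for J in rng for K in rng for N in rng)
```

With these assumptions the constants come out as the antisymmetric symbol itself, `STRUCTURE[I][J][K] == levi_civita(I, J, K)`. Their antisymmetry in the lower pair is what allows the bracket term in (3.2.34) to be written with a wedge product of the component one-forms.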

Local Sections

Locally over $M$ , it is possible to choose a smooth embedding $σ$ of the group identity into the fibers of $P$ . These are called

Definition 8 (Local sections of P) are maps $σ : U \to P$ such that $π \circ σ = i d$ .

So for $U \subset M$ , there is a map $σ : U \to P$ such that $P$ is locally of the form $U \times G$ . For principal bundles, this need not be assumed, but follows from the definitions.

Proposition 5 (Local product structure) Any principal bundle $P$ admits local diffeomorphisms $\overline{σ} : U \times G \to π^{- 1} (U)$ .

Here I will only sketch the proof. The idea is to build a tubular neighborhood (see e.g. Guillemin & Pollack (2010)) around any given orbit. Roughly, we first construct a $G$ -invariant Riemannian metric on $P$ . In more detail, any differentiable manifold admits a Riemannian metric, and if the group $G$ is connected and compact, we can take a smearing – an integral over the group action, using the Haar measure – of the original metric. Now one finds the orthogonal space to the orbit, at a given point $p \in P$ and uses the Riemann exponential map to find a small “slice” that intersects each orbit in a neighborhood of $π^{- 1} (x)$ only once. This gives a local diffeomorphism between a neighborhood of $(x, I d) \in U \times G$ and a neighborhood of $p \in P$ . Moving the slice up and down according to the group action spans the entire “tubular” neighborhood of the orbit, giving a diffeomorphism between $π^{- 1} (U)$ and $U \times G$ . □

Definition 9 (A trivializing diffeomorphism) is a diffeomorphism $U \times G ≃ π^{- 1} (U)$ , given by $\overline{σ} : U \times G \to P$

The trivializing diffeomorphism is defined by a section $σ$ :

\overline{σ} : (x, g) \mapsto g \cdot σ (x), whose inverse is {\overline{σ}}^{- 1} : p \mapsto (π (p), g_{σ} (p)^{- 1})

(3.2.35)

where $g_{σ} : π^{- 1} (U) \to G$ gives $g_{σ} (p)$ as the unique group element taking $p$ to the local section, that is, $g_{σ} (p)$ is the group element such that

g_{σ} (p) \cdot p = σ (π (p)) .

(3.2.36)

Footnote ³⁰ Thus we have a condition:

g_{σ} (g \cdot p) = g_{σ} (p) g^{- 1} .

(3.2.37)

Call this equivariance of $g_{σ}$ between the given action of $G$ on $P$ and $G$ ’s action on itself.

A transition between the trivializing diffeomorphisms $\overline{σ}$ and $\overline{τ}$ takes an $(x, g)$ in the domain of $\overline{σ}$ to an element in $U \times G$ in the domain of $\overline{τ}$ by first taking $(x, g) \mapsto p = g \cdot σ (x)$ and then using the inverse $p \mapsto (π (p), g_{τ} (p)^{- 1})$ . Since

τ (x) = g_{τ} (σ (x)) \cdot σ (x) = g_{τ} (g_{σ} (p) \cdot p) \cdot (σ (x)) = g_{τ} (p) g_{σ}^{- 1} (p) \cdot σ (x),

(3.2.38)

it is clear that $g_{τ} (p) g_{σ}^{- 1} (p)$ will give the transition between the two sections. From (3.2.37):

g_{τ} (g \cdot p) g_{σ}^{- 1} (g \cdot p) = g_{τ} (p) g_{σ}^{- 1} (p),

(3.2.39)

so that the map $g_{τ} g_{σ}^{- 1}$ depends only on the fiber $π^{- 1} (x)$ , that is, depends only on $x \in M$ . We call

g_{τ σ} := g_{τ} g_{σ}^{- 1} : U \to G, the transition function between σ and τ .

(3.2.40)

Thus we get a local diffeomorphism from one trivialization to another:

{\overline{τ}}^{- 1} \circ \overline{σ} : (x, g) \mapsto (x, g g_{τ σ}^{- 1}) .

(3.2.41)

From (3.2.40) it is straightforward to see that the transition functions obey:

\begin{matrix} g_{τ σ} g_{σ τ} & = I d \end{matrix}

(3.2.42)

\begin{matrix} g_{β τ} g_{τ σ} & = g_{β σ}; \end{matrix}

(3.2.43)

which are called the cocycle conditions.
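The relations (3.2.36), (3.2.37), (3.2.39) and the cocycle conditions can be checked directly in a toy model. The sketch below assumes a single fiber, identified with $G L (2)$ acting on itself by left multiplication, so that the value of a section at the base point is just a matrix; this model and the names `to_section` and `transition` are illustrative assumptions, not constructions from the text.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Exact inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[Fraction(m[1][1]) / det, Fraction(-m[0][1]) / det],
            [Fraction(-m[1][0]) / det, Fraction(m[0][0]) / det]]

def to_section(section_value, p):
    """g_sigma(p): the unique group element with g_sigma(p) . p = sigma(x), cf. (3.2.36)."""
    return matmul(section_value, inverse(p))

def transition(tau_value, sigma_value, p):
    """g_{tau sigma} = g_tau(p) g_sigma(p)^{-1}, cf. (3.2.40)."""
    return matmul(to_section(tau_value, p), inverse(to_section(sigma_value, p)))

IDENTITY = [[1, 0], [0, 1]]

# Values of three hypothetical local sections at one base point x.
sigma = [[1, 0], [0, 1]]
tau = [[1, 2], [0, 1]]
beta = [[2, 0], [1, 1]]

p = [[3, 1], [1, 1]]   # some point in the fiber over x
g = [[0, 1], [-1, 2]]  # some group element

g_tau_sigma = transition(tau, sigma, p)
```

In this model `g_tau_sigma` does not change when `p` is replaced by `matmul(g, p)`, which is the content of (3.2.39), and the products of transition functions reproduce (3.2.42) and (3.2.43) exactly.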

Although I will not show it here, given an atlas of charts $U_{α} \subset M$ , and local sections $σ^{α}$ , we can define a principal bundle directly from the stitching together of local trivializations with transitions obeying the cocycle conditions (3.2.43); and a given $P$ with group $G$ is reducible to a $P^{'}$ with group $G^{'} \subset G$ , iff the transition functions lie in $G^{'}$ (see Kobayashi & Nomizu (1963, Props. 5.2 and 5.3)).

The Gauge Potentials

Given local sections $σ$ on each chart domain $U$ , we define a local spacetime representative of $ω$ , as the pullback of the connection, $A^{σ} := σ^{*} ω \in Γ (Λ^{1} (U) \otimes g)$ ; (here $σ$ is not a spacetime index; we momentarily keep it in the notation as a reminder of the reliance on a choice of section). In coordinates $x^{μ}$ on $U \subset M$ , and for $ϵ_{I} \in g$ a Lie-algebra basis we write: $σ^{*} ω = A = A_{μ}^{I} d x^{μ} ϵ_{I}$ , and $A_{μ}^{I} \in C^{\infty} (U)$ . Similarly, we can define the field-strength $F^{σ} = σ^{*} Ω$ . It is important to note that the sections $σ$ are not usually horizontal: indeed, from (3.2.14) the horizontal distribution is involutive – and thus is the tangent to a submanifold of $P$ – iff the curvature vanishes. This is why, even though the connection-form $ω$ vanishes along horizontal directions, there is in general no section for which the pull-back $A^{σ}$ vanishes: it will vanish only for a fully horizontal section.

Proposition 6 We can use an infinitesimal transformation (as given in Equation (3.2.40)), obtaining an infinitesimally different section from a Lie-algebra valued function $ξ := ξ^{I} ϵ_{I} : U \to g$ , with coefficients $ξ^{I} \in C^{\infty} (U)$ . The infinitesimally different representative of $ω$ , already given in (2.2.7), is:

δ_{ξ} A := d ξ + [A, ξ] = D ξ, in coordinates: δ_{ξ} A_{μ}^{I} := \partial_{μ} ξ^{I} + [A_{μ}, ξ]^{I} = D_{μ} ξ^{I},

(3.2.44)

where $D_{μ} (∙) = \partial_{μ} (∙) + [A_{μ}, ∙]$ , the gauge-covariant derivative defined in (3.2.17), here acts on Lie-algebra valued scalar functions.Footnote ³¹

The proposition follows immediately from applying the pull back by $σ$ to the Lie derivative of $ω$ along a vertical direction, given in Equation (3.2.11). □

Similarly, for $F^{σ} = σ^{*} Ω$ (omitting the subscript $σ$ ):

F = d A - [A, A], in coordinates: F_{μ ν}^{I} = \partial_{[μ} A_{ν]}^{I} - [A_{μ}, A_{ν}]^{I},

(3.2.45)

with the square bracket in the subscripts denoting anti-symmetrization. (Here $\partial_{μ}$ may equally be replaced by $\nabla_{μ}$ , the Levi-Civita covariant derivative on spacetime, since the anti-symmetrization removes the symmetric connection coefficients.) Applying to (3.2.26) the same reasoning used to show that the gauge connection transforms as (3.2.44), we show that

δ_{ξ} F_{μ ν}^{I} = [ξ, F_{μ ν}]^{I} .

(3.2.46)
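In the abelian case the brackets in (3.2.44) and (3.2.46) vanish, so the potential shifts by $d ξ$ and the field strength does not change. The following sketch checks this with finite differences on a two-dimensional patch. The particular potential, the gauge parameter, and the normalization $F_{x y} = \partial_{x} A_{y} - \partial_{y} A_{x}$ are hypothetical choices made only for the illustration.

```python
def potential(x, y):
    """A hypothetical abelian gauge potential (A_x, A_y)."""
    return (-y * y, x * y)

def gauge_parameter(x, y):
    """A hypothetical gauge parameter xi(x, y)."""
    return x * x * y

def partial(f, x, y, axis, h=1e-3):
    """Central-difference partial derivative of a scalar function f(x, y)."""
    if axis == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def transformed_potential(x, y):
    """A + d(xi): the abelian case of (3.2.44), where [A, xi] = 0."""
    a_x, a_y = potential(x, y)
    return (a_x + partial(gauge_parameter, x, y, 0),
            a_y + partial(gauge_parameter, x, y, 1))

def field_strength(a, x, y):
    """F_xy = d_x A_y - d_y A_x for a potential a(x, y) -> (A_x, A_y)."""
    return (partial(lambda u, v: a(u, v)[1], x, y, 0)
            - partial(lambda u, v: a(u, v)[0], x, y, 1))

point = (0.7, -1.3)
before = field_strength(potential, *point)
after = field_strength(transformed_potential, *point)
```

For the chosen potential the field strength is $3 y$ , and `before` and `after` agree up to rounding. In the non-abelian case the same comparison would instead show $F$ changing by the commutator in (3.2.46).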

3.2.3 Associated Bundles

In Section 3.2.1 I said that the horizontal directions encode parallel transport in vector bundles, but I have not yet described this encoding. Again it is useful to illustrate the main ideas using the tangent bundle $T M$ and the frame bundle, $L (T M)$ . We proceed as follows: take a vector $X_{x}$ at a given point $x \in M$ : an element of the fiber $T_{x} M ≃ F = R^{4}$ , where according to a frame, $\{e_{I} (x)\} \in L (T M)$ , we write $X_{x} = a^{I} e_{I} \in T_{x} M$ as the ordered quadruplet $(a^{1}, \dots, a^{4}) \in R^{4}$ . Each element of $P = L (T M)$ gives a linear isomorphism from $R^{4}$ onto a tangent space $T_{x} M$ . We can rotate the frame by a matrix $g_{J}^{I}$ to obtain $\{g_{J}^{I} e_{I} (x)\} \in L (T M)$ . The components of $X_{x}$ will change accordingly, as $a^{K} \mapsto a^{K} g_{K L}^{- 1}$ . With the two transformations, we obtain the same vector: $a^{K} g_{K L}^{- 1} g^{L I} e_{I} = a^{I} e_{I}$ . Thus, if we write a doublet $(p, v)$ as, respectively, the frame and the components, we want to identify $(p, v)$ with $(g p, v g^{- 1})$ (where we have simplified the notation for the action of the group to be just juxtaposition). So we get an associated bundle, denoted by $T M ≃ L (T M) \times_{ρ} R^{4}$ :

L (T M) \times_{ρ} R^{4} = L (T M) \times R^{4} / \sim where (p, v) \sim (g p, v g^{- 1}),

(3.2.47)

and denote the equivalence classes with square brackets: $[p, v] \in L (T M) \times_{ρ} R^{4}$ . More generally, $E$ is a vector bundle over $M$ with typical fiber $V$ that is associated to $P$ with structure group $G$ , iff:

E ≃ P \times_{ρ} V = P \times V / \sim where (p, v) \sim (g p, ρ (g^{- 1}) v),

(3.2.48)

where $ρ : G \to G L (V)$ is a representation of $G$ on $V$ . As in the case of $R^{4}$ , given any vector bundle $E$ , we could construct a principal bundle as $L (E)$ , and recover $E = L (E) \times_{ρ} V$ .
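The identification in (3.2.47) can be made concrete with a small sketch. It assumes a two-dimensional fiber, stores a frame as a matrix whose rows are the basis vectors, and stores components as a row of numbers, so that the vector is the row of components times the frame matrix; the function names are hypothetical. It shows that changing the frame by $g$ while changing the components by $g^{- 1}$ leaves the vector itself untouched.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Exact inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[Fraction(m[1][1]) / det, Fraction(-m[0][1]) / det],
            [Fraction(-m[1][0]) / det, Fraction(m[0][0]) / det]]

def row_times_matrix(row, m):
    """Row vector times a 2x2 matrix."""
    return [sum(row[i] * m[i][j] for i in range(2)) for j in range(2)]

def vector_from(frame, comps):
    """The vector a^I e_I represented by the pair (frame, components)."""
    return row_times_matrix(comps, frame)

def act(g, frame, comps):
    """The equivalent pair (g p, v g^{-1}) of (3.2.47)."""
    return matmul(g, frame), row_times_matrix(comps, inverse(g))

frame = [[1, 0], [1, 1]]  # basis vectors (1, 0) and (1, 1)
comps = [2, 3]            # components a^1 = 2, a^2 = 3
g = [[1, 1], [-3, 3]]     # a change of frame

new_frame, new_comps = act(g, frame, comps)
```

Both `vector_from(frame, comps)` and `vector_from(new_frame, new_comps)` give the same vector, here (5, 3), so the equivalence class $[p, v]$ is a frame-independent way of naming that vector.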

Connections on an Associated Bundle

Once we have constructed associated bundles in this way, parallel transport for any vector bundle comes naturally from a notion of horizontality in the principal bundle. To find the parallel transport of the vector $X_{x}$ along $Y_{x}$ , take the curve $γ (t) \in M$ with $γ (0) = x$ , so that $γ^{'} (0) = Y_{x}$ . Given a frame $p_{x} \in P$ so that $π (p_{x}) = x$ , we take the horizontal lift of $γ (t)$ through $p_{x}$ : call it $γ^{h} (t)$ . Let $X_{x} = [p_{x}, v]$ , where $v \in V$ are the components of $X_{x}$ in terms of the basis $p_{x}$ . By definition, the curve in $E$ given by $[γ^{h} (t), v]$ is parallel transported, that is, gives a parallel transport of $X_{x}$ along $γ (t)$ . Now, we define $v_{X} : P \to V$ such that, for all $p \in P$

X (π (p)) = [p, v_{X} (p)], where v_{X} (g \cdot p) = g^{- 1} v_{X} (p);

(3.2.49)

that is, $v_{X} (p)$ are the components of $X (π (p))$ on the basis $p$ (and therefore $v_{X}$ obeys the covariance property on the right of (3.2.49)). Thus we define the covariant derivative of $X$ along $Y$ at $x$ , as:

D_{Y} X (x) := [γ^{h} (0), {(\frac{d}{d t})}_{t = 0} v_{X} (γ^{h} (t))],

(3.2.50)

where ${(\frac{d}{d t})}_{t = 0} v_{X}$ acts component-by-component. In words, we compare the parallel transported components of $X$ with the actual components of $X$ ; their nonconstancy corresponds to the failure of $X$ to be parallel transported, and to the non-vanishing covariant derivative of $X$ . In this way a covariant derivative is just the standard derivative of the vector components as described in the horizontal – or parallel transported – frame.

In practice, this definition is employed by choosing a particular trivialization, or basis of frames, on an open set $U \subset M$ for an associated vector bundle $E$ with typical fiber $V$ ; this is a section of the bundle of frames $L (E)$ . Call this basis $σ = \{e_{i}\}_{i = 1}^{k}$ and its algebraic dual $σ^{*} = \{e^{i}\}_{i = 1}^{k}$ . A linear transformation of $E_{x}$ is an element of $E n d (E_{x}) := E_{x}^{*} \otimes E_{x}$ , and we can describe the extent to which the chosen basis is nonparallel along a certain direction by a 1-form valued in such linear transformations, which we write as:

ω^{σ} = {ω^{σ}}_{i}{}^{j} \otimes e^{i} \otimes e_{j} \in Γ (T^{*} U \otimes (E^{*} \otimes E))

(3.2.51)

where ${ω^{σ}}_{i}{}^{j} \in Γ (T^{*} U)$ . Thus, for $X \in T_{x} M$ ,

D_{X} e_{j} = {ω^{σ}}_{j}{}^{i} (X) e_{i} .

(3.2.52)

Now for some section $s \in Γ (E)$ of the real (or complex) vector bundle, we locally write $s = s^{i} e_{i}$ , and the covariant derivative of $s$ becomes:

D s = d s^{j} \otimes e_{j} + s^{i} {ω^{σ}}_{i}{}^{j} \otimes e_{j} .

(3.2.53)
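The step from (3.2.52) to (3.2.53) is an application of the Leibniz rule; a short sketch follows. The superscript $σ$ is dropped for readability, and the rank-one special case at the end, with a real 1-form $A$ and the convention $ω = i A$ , is an assumption added purely for illustration.

```latex
% Leibniz rule applied to s = s^i e_i, using D e_i = \omega_i{}^j \otimes e_j
% (equation (3.2.52) with the direction X left unevaluated):
\begin{align*}
  D s &= D(s^{i} e_{i})
       = d s^{i} \otimes e_{i} + s^{i}\, D e_{i} \\
      &= d s^{j} \otimes e_{j} + s^{i}\, \omega_{i}{}^{j} \otimes e_{j}.
\end{align*}
% Assumed special case: a complex line bundle (k = 1) with a single frame e
% and \omega = iA for a real 1-form A on U.
\[
  D s = (d s^{1} + i A\, s^{1}) \otimes e .
\]
```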

Admissible Bases and Subgroups of $G L (n)$

If the principal bundle is construed as just a bundle of linear frames, how can we justify a restriction to a subgroup of the most general group of transformations between frames? The restriction corresponds to the preservation of some added structure on $V$ . In other words, when $V$ is not just a bare vector space, but, for example, a normed vector space, we would like changes of basis to preserve this structure, for example, the orthonormality of the basis vectors, and this restricts the bundle of linear frames to the appropriate sub-bundle of admissible frames. This sub-bundle has as its structure group the automorphisms of a typical fiber that has more than just the linear vector space structure that we started off with.

Let us illustrate the relationship between the structure of the fiber and the set of frames that are adapted to it. Suppose that the typical fiber has an added positive-definite inner product structure: take $(\cdot, \cdot)$ to be the canonical inner product in $R^{4}$ , and regard $p \in P$ as a linear isomorphism from $R^{4}$ to $T_{π (p)} M$ . Then we can define an inner product on $T M$ , for $X, Y \in T_{x} M$ and any $p \in π^{- 1} (x)$ , as

(p^{- 1} X, p^{- 1} Y) = ⟨ X, Y ⟩,

(3.2.54)

where invariance of $(\cdot, \cdot)$ under $O (4)$ implies that the inner product is independent of which basis $p \in π^{- 1} (x)$ we take. The converse – that a Riemannian structure $⟨ \cdot, \cdot ⟩$ for the associated bundle induces a subbundle of $L (T M)$ – is also easy to show: again, seeing $p \in L (T M)$ as a linear isomorphism from $R^{4}$ to $T_{π (p)} M$ , we have $p \in P \subset L (T M)$ iff, for all $u, v \in R^{4}$ , $(u, v) = ⟨ p u, p v ⟩$ .Footnote ³² This corresponds to $G = O (4)$ ; similarly, $S O (4)$ adds an orientation to $R^{4}$ . But we can extend these constructions to more general cases, in which the typical fiber is not soldered onto spacetime. For instance, $G = U (n)$ corresponds to $V$ being a complex $n$ -dimensional vector space with a Hermitian inner product; and $G = S U (n)$ adds an orientation, and so on. The moral is that the added structure on $V$ induces an added structure on the associated vector bundle if and only if the transformation group $G \subset G L (n)$ preserves that added structure.
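The frame independence of (3.2.54) can be checked in two lines. The sketch below assumes that two admissible frames over the same point are related by $p' = p \circ g$ with $g \in O (4)$ acting on $R^{4}$ ; this is one way of spelling out the action that the text abbreviates as $g p$ , and the opposite convention works in the same way.

```latex
% Assumption: p' = p \circ g, with g \in O(4) acting on R^4.
\begin{align*}
  (p'^{-1} X,\; p'^{-1} Y)
    &= (g^{-1} p^{-1} X,\; g^{-1} p^{-1} Y) \\
    &= (p^{-1} X,\; p^{-1} Y)
     = \langle X, Y \rangle ,
\end{align*}
% where the second equality uses (h u, h v) = (u, v) for every h \in O(4),
% applied to the orthogonal matrix h = g^{-1}.
```

If $g$ were a general element of $G L (4)$ the second equality would fail, which is why the inner product singles out the sub-bundle of orthonormal frames.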

3.2.4 Getting Rid of Associated Bundles

Here we will look at how certain structures of $E$ seen as an associated bundle to $P$ (e.g. as $L (E) \times_{ρ} V$ ) can be understood directly on $E$ , without mention of $P$ .

Relation to Connections of $E$ Expressed without Frames of $L (E)$

We can also describe a connection on $E$ directly in terms of a trivialization of $E$ , without mentioning $L (E)$ and a choice of basis therein. For that, recall the expression of the covariant derivative directly in terms of the vector bundle, as in Equation (3.2.2). Call $C (E)$ the space of covariant derivatives for $E$ , and let $Δ (E) := Γ (T^{*} M \otimes E n d (E))$ . Given any $D_{o}, D \in C (E)$ , there exists an $ω_{D}^{o} \in Δ (E)$ such that $D_{o} - D = ω_{D}^{o}$ . Therefore the map:Footnote ³³

\begin{matrix} Δ (E) & \to C (E) \\ ω & \mapsto D_{o} - ω \end{matrix}

(3.2.55)

is a bijection: that is, the space of covariant derivatives is an affine space over the vector space of connections, $Δ (E)$ . This is why, in any trivialization of $E$ – a trivialization that plays the analogous role of the choice of frames of $L (E)$ in (3.2.53) – we can take $D_{o} \to d \otimes I d$ , and take connections to parametrize the space of covariant derivatives. Ultimately, it is why the covariant derivatives are described as vector bosons: 1-forms valued on $E n d (E)$ , a fact that will be important in Section 4.
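The reason the difference of two covariant derivatives lands in $Δ (E)$ can be checked in one line. The sketch assumes the Leibniz rule $D (h s) = d h \otimes s + h D s$ for a smooth function $h$ , which is the standard defining property of a covariant derivative and is presumably what Equation (3.2.2) encodes.

```latex
% For D_o, D in C(E), a smooth function h on M and a section s of E:
\begin{align*}
  (D_{o} - D)(h s)
    &= \bigl(dh \otimes s + h\, D_{o} s\bigr) - \bigl(dh \otimes s + h\, D s\bigr) \\
    &= h\, (D_{o} - D)\, s .
\end{align*}
% The inhomogeneous terms dh \otimes s cancel, so D_o - D is linear over
% smooth functions, i.e. it is a section of T^*M \otimes End(E).
```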

Of course, under a change of frame, $ω^{σ}$ given in (3.2.51) will transform in the familiar, inhomogeneous form, given in (3.2.17) (or (3.2.44)). This gives a passive interpretation of gauge transformations. But we can formulate the corresponding active interpretation in terms of $Δ (E)$ by considering two fiber-wise linearly isomorphic vector bundles, $E, \tilde{E}$ , over $M$ (i.e. related by a diffeomorphism $f : E \to \tilde{E}$ such that $π_{\tilde{E}} \circ f = π_{E}$ , where $f$ takes $π_{E}^{- 1} (x) \to π_{\tilde{E}}^{- 1} (x)$ by a linear isomorphism).

Two connections, $D$ and $\tilde{D}$ , in two linearly isomorphic vector bundles are isomorphic if they are related by conjugation by the linear isomorphism. This relation guarantees that the following diagram commutes (for all $X \in Γ (T M)$ ):

\begin{matrix} Γ (E) & \overset{D_{X}}{\to} & Γ (E) \\ f ↓ & & ↓ f \\ Γ (\tilde{E}) & \underset{{\tilde{D}}_{X}}{\to} & Γ (\tilde{E}) \end{matrix}

Thus we can represent the connection $D$ under a bundle isomorphism, obtaining a new connection

{\tilde{D}}_{X} (s) = f D_{X} (f^{- 1} s) \Rightarrow {\tilde{D}}_{X} = f D_{X} f^{- 1}

(3.2.56)

or equivalently, $f D_{X} = {\tilde{D}}_{X} f$ . And, of course, if $D$ is related to $ω$ and $\tilde{D}$ is related to $\tilde{ω}$ then the relationship between $ω$ and $\tilde{ω}$ is that given in (3.2.11) (or (3.2.44)).Footnote ³⁴
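To make the last remark concrete, here is a sketch of how $\tilde{ω}$ follows from (3.2.56) in a local trivialization. It assumes the sign convention $D = d + ω$ of (3.2.53) and writes $g$ for the matrix-valued function that represents $f$ in the trivialization; both are assumptions of the sketch, and the signs change under the opposite convention.

```latex
\begin{align*}
  \tilde{D} s
    &= g\, D (g^{-1} s)
     = g \bigl( d(g^{-1})\, s + g^{-1}\, ds + \omega\, g^{-1} s \bigr) \\
    &= ds + \bigl( g\, \omega\, g^{-1} + g\, d(g^{-1}) \bigr)\, s ,
\end{align*}
% so that, using g\, d(g^{-1}) = -(dg)\, g^{-1} (differentiate g g^{-1} = 1),
\[
  \tilde{\omega} = g\, \omega\, g^{-1} - (dg)\, g^{-1} .
\]
```

The first term is the homogeneous (conjugation) part; the second is the inhomogeneous part that is characteristic of a connection.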

The Structure Group $G$ as a Holonomy Group

As with the fiber-wise application of (3.2.12), which could be seen in terms of frames, we can define parallel transport for vector bundles as a linear isomorphism between different fibers. Given a covariant derivative (3.2.2) and a curve $γ$ in $M$ such that $γ (0) = x$ , where $E$ is the vector bundle and $E_{x}$ is the fiber over $x \in M$ , we define the parallel transport along $γ$ as the unique linear isomorphism:

τ_{γ (t)} : E_{x} \to E_{γ (t)}

(3.2.58)

such that given any $X_{x} \in E_{x}$ ,

D_{γ^{'}} (τ_{γ (t)} (X_{x})) = 0,

(3.2.59)

where $τ_{γ (t)} (X_{x}) \in Γ (E |_{γ})$ . For two curves $γ, γ^{'} : [0, 1] \to M$ , with $γ (0) = γ^{'} (0)$ and $γ (1) = γ^{'} (1) = y$ :

g \cdot τ_{γ} = τ_{γ^{'}}, \quad g \in E n d (E_{y}).

(3.2.60)

If the covariant derivative preserves the structure on the typical fiber (so would correspond to an Ehresmann connection on the bundle of admissible frames, as described in the “Admissible Bases and Subgroups of $G L (n)$ ” section), then in (3.2.60) we have $g \in A u t (E_{y}) \subset E n d (E_{y})$ , where $A u t (E_{y})$ is the group of transformations of $E_{y}$ that are not only linear (and so in $E n d (E_{y})$ ) but also preserve the added structure on $E_{y}$ . Alternatively, by the composition properties of parallel transport, we can see parallel transport around a closed curve starting (and ending) at $x \in M$ as an element $g \in A u t (E_{x})$ . If we take all the closed curves, this generates a subgroup of $A u t (E_{x})$ called ${H o l}_{(x)} (D)$ .

It can be shown that, on a simply-connected region, the holonomy depends on $x$ only up to conjugation by a group element. Thus it is customary to drop the base point and refer simply to the holonomy group $H o l (D)$ . From the section “Relation to Connections of $E$ Expressed without Frames of $L (E)$ ,” it follows that, for two linearly isomorphic bundles $E, \tilde{E}$ with isomorphic connections, $H o l (D) = H o l (\tilde{D})$ . It can also be shown that, given a connection $D$ , one can find a principal bundle $(P, M, G)$ , with a connection $ω$ , such that the holonomy group is isomorphic (as a $G$ -torsor) to the structure group $G$ , and $E$ is an associated bundle to $P$ with $D$ being the induced connection from $ω$ (cf. Michor (Reference Michor2008, Theo. 17.11)).
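For a concrete feel for (3.2.59) and the holonomy group, consider a sketch in the simplest setting. It assumes a complex line bundle with a Hermitian inner product, trivialized over a region containing a closed curve $γ$ , with $D = d + i A$ for a real 1-form $A$ ; these choices are illustrative assumptions, not taken from the text.

```latex
% Parallel transport equation (3.2.59) for the single component s(t) along \gamma:
\[
  \frac{d s}{d t} + i\, A(\dot{\gamma}(t))\, s(t) = 0
  \quad\Longrightarrow\quad
  s(t) = \exp\!\Bigl( -i \int_{0}^{t} A(\dot{\gamma}(t'))\, dt' \Bigr)\, s(0).
\]
% For a closed curve, \gamma(1) = \gamma(0) = x, the holonomy is the phase
\[
  \tau_{\gamma} = \exp\!\Bigl( -i \oint_{\gamma} A \Bigr)
  \in U(1) \simeq \mathrm{Aut}(E_{x}),
\]
% and if \gamma bounds a surface \Sigma inside the region, Stokes' theorem gives
% \oint_{\gamma} A = \int_{\Sigma} dA, so the holonomy is fixed by the curvature dA.
```

Because $A$ is real, the transported component keeps its modulus, which is the line-bundle version of the statement that $g \in A u t (E_{x})$ preserves the added structure.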

3.3 Summary of Classical Gauge Theory

We are now in a position to summarize the basic ingredients for the classical description of the interaction between a particle and a gauge field, whose elements we have surveyed thus far. To do so we need to employ associated vector bundles, principal fiber bundles, and Ehresmann connections. Different matter fields are represented as sections of different vector bundles. These fields interact via different forces of nature, with each force being associated to a Lie group. By associating a collection of vector bundles with the same principal bundle, we ensure that the parallel transports of a collection of matter fields that are charged under the same force are coordinated. For example, charged scalar, electron, and quark fields all interact electromagnetically; and that interaction is mediated by the same fundamental electromagnetic field (mutatis mutandis, for other interactions, e.g., replacing “electromagnetism” by the “strong force”). This means that the relevant covariant derivative operators on the vector bundles in which these matter fields are valued have the same parallel transport properties.

So here are the basic ingredients of a gauge theory of particles:

1. A smooth (semi) Riemannian manifold $M$ . – This plays the role of spacetime.
2. A finite-dimensional vector space $F$ equipped with an inner product $⟨ \cdot, \cdot ⟩$ . – This is the space where the field corresponding to a particle takes its values. This space is determined by the internal structure of the particle in question (phase, isospin, etc.) and is called the internal space. Typical examples are $C, C^{2}, C^{4}$ or (in the standard interpretation; not in that of Section 4) Lie algebras $u (1), s u (2)$ .
3. A Lie group $G$ and a representation $ρ : G \to G L (F)$ orthogonal with respect to $⟨ \cdot, \cdot ⟩$ . – $G$ then acts on the bases of internal states at each point. The orthogonality of the representation is necessary for the inner product not to depend on the chosen basis of internal states.
4. A principal $G$ -bundle over $M$ : $(P, π, M, G)$ . – This bundle can be identified with the bundle of admissible $G$ -bases over $M$ . A section of $P$ is an admissible $G$ -reference relative to which we describe, for example, our wave function.
5. A connection $ω$ on $P$ , with curvature $Ω$ . – This connection provides us with the intrinsic variation of bases. Applied over a local reference $σ$ , we obtain the local gauge potential, $A = σ^{*} ω$ . Similarly, we obtain the local curvature, $F = σ^{*} Ω$ . Thus far, all of the previous items have described nondynamical features of the models, that is, features not subject to a gauge variational principle; this is the first dynamical element of the theory.
6. A global section $φ$ of the associated vector bundle $P \times_{ρ} F$ . – Matter fields will be associated with such sections that satisfy the Euler-Lagrange equations of some action functional involving the local potentials $A$ .
7. An action $S (φ, ω)$ whose stationary points are classical solutions. – Typically, this functional is of the form:
$S (φ, ω) = \int_{M} (a ∥ Ω ∥^{2} + c ∥ D φ ∥^{2})$ (3.3.1)
where $D$ is the covariant exterior derivative determined by $ω$ , which ensures, together with the norm on the algebra and tensor fields $∥ ∙ ∥$ , that the action functional is gauge invariant. The constant $a$ is called the normalization constant, and $c$ is the coupling constant.
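The gauge invariance claimed for (3.3.1) can be sketched as follows. The transformation rules written here (conjugation for the curvature, the representation $ρ$ for $φ$ and $D φ$ ) are the standard ones and are stated as assumptions, since this summary does not display them; the norm on the algebra is likewise assumed to be invariant under conjugation.

```latex
% Assumed behaviour under a gauge transformation g : M \to G
\[
  \Omega \mapsto g\, \Omega\, g^{-1}, \qquad
  \varphi \mapsto \rho(g)\, \varphi, \qquad
  D\varphi \mapsto \rho(g)\, D\varphi .
\]
% Conjugation invariance of the norm on the algebra, and orthogonality of \rho
% with respect to the inner product on F (item 3), then give
\[
  \| g\, \Omega\, g^{-1} \|^{2} = \| \Omega \|^{2}, \qquad
  \| \rho(g)\, D\varphi \|^{2} = \| D\varphi \|^{2},
\]
% so each term of (3.3.1), and hence S, is unchanged.
```

The key point is that $D φ$ transforms like $φ$ itself, with no derivative of $g$ appearing; an ordinary derivative $d φ$ would not have this property.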

In the next section, we will see how these ingredients come together in models of particle physics.

4 Why Gauge? A Geometrical Reason

In Section 3 we saw one sense in which gauge theory is geometrical, and here we will look at another. Clearly, the label “geometrical” is ambiguous. For instance, it is often taken to connote properties related to distance relations. Although there is one interpretation of gauge theories and gauge transformations that is geometric in this sense – called Kaluza-Klein theory – that is also not the sense we will focus on here. Here I want to assess whether gauge transformations can be understood naturally as automorphisms of a local and internal geometric structure, like Lorentz transformations are automorphisms of the local Lorentzian metric; and whether the Ehresmann connection can be understood as determining parallel transport for this internal geometry, like an affine connection determines parallelism for tensor fields over spacetime.

In this enterprise we encounter two putative disanalogies: one minor and one major. In brief, the minor putative disanalogy between local gauge symmetries and local spacetime symmetries is that, apparently, the Ehresmann connection makes ineliminable use of principal fiber bundles, whereas the Levi-Civita connection for spacetime does not. However, as I presaged in Section 3.2.1 (cf. discussion after Equation (3.2.2)), this minor disanalogy is a consequence of the fact that for spacetimes we only use tensor bundles over $T M$ , whereas we seem to need unrelated vector bundles for gauge theory. This fact leads to the major putative disanalogy: in the gauge case we stipulate by hand that different vector bundles are associated to the same principal bundle, which is why they covary under parallel transport, whereas in the spacetime case different tensor bundles obligatorily covary under parallel transport. In Section 4.1 I will describe this major putative disanalogy in more detail. Then, in Section 4.2, I will describe the minor putative disanalogy and already dispel it by recalling aspects of Section 3.2.4. In Section 4.3 I will dispel the major putative disanalogy: a more laborious enterprise, that will involve showing that the whole content of the SM consists of fields living on certain internal spaces.

And here is this section’s answer to these putative disanalogies, in slogan form: gauge transformations can be understood naturally as automorphisms of an internal geometric structure, to which the theory is ontologically committed; and the Ehresmann connection can be understood as defining parallel transport in these spaces, similarly to the Levi-Civita connection determining parallelism for tensor fields over spacetime.

4.1 A Major Putative Disanalogy between Gauge Theory and Gravity

In their modern mathematical guise, particles exist as sections of distinct vector bundles over spacetime. In more colloquial terms, particles are described by fields that take values in a variety of internal vector spaces coexisting over each spacetime point. In the standard mathematical explanation that we saw in Section 3.2.2, those fields that interact are associated to a single principal $G$ -bundle, $P$ – where $G$ is the symmetry group regimenting a particular interaction – and each principal bundle is endowed with a single Ehresmann affine connection $ω$ . Thus, in the Standard Model of particle physics (SM henceforth), all fields charged under the same gauge group get their parallel transport from the same mathematical object, $ω$ ; that is why their parallel transport “marches in step.” In the words of Weatherall (Reference Weatherall2016: p. 2401):

Principal bundles are auxiliary [in the sense that only] vector bundles represent possible local states of matter; principal bundles coordinate between these vector bundles ... [they are auxiliary] in the sense in which a coach is auxiliary to the players on the field.

This is a beguiling metaphor, but is it explanatory? It certainly falls short of the familiar geometric explanation for symmetry and parallel transport that we get in general relativity. There, all tensor fields co-rotate under parallel transport because they are sections of vector bundles built from the same tangent bundle, $T M$ . It is the tangent bundle that underpins a unified account of parallel transport for tensor fields.

In the gauge case, the textbook tradition – indeed, so far as I know, the extant literatureFootnote ³⁵ – reveals no similarly powerful explanation for why the fields that couple through the strong force march in step under parallel transport.

Of course, there is a straightforward definition of covariant derivative for an arbitrary vector bundle (given in Equation (3.2.2)) that specializes, when the vector bundle is the tangent bundle, to the usual definition of covariant derivatives. This definition does not mention frames, groups, and so on. So there is no disanalogy there. But that formulation of covariant derivative is “bundle-solipsist”: it works for each matter field but offers us no link between fields. Using this covariant derivative leaves the “marching-in-step” of sections of any two different vector bundles under parallel transport completely mysterious.

Thus a halfway house to solving this “coordination problem” is the textbook’s demand that particles whose parallel transport should march in step are all associated to the same principal fiber bundle, with the same structure group and Ehresmann connection. This is only a halfway house because, in Weatherall’s vivid metaphor, in the textbook tradition we choose to assign a single coach to all of the players.

To summarize, here is the major putative disanalogy that we are addressing in this section: it is clear why in general relativity the same Levi-Civita connection should guide the parallel transport of different tensor fields; it is clear why they co-rotate or march in step. But it is not yet clear why the same Ehresmann connection should guide the parallel transport of different gauge fields.

4.2 Parallel Transport and Frame Dependence $#$

Here is a minor putative disanalogy that we should get out of the way. Lie groups seem to appear explicitly in the principal fiber bundles encoding the parallel transport of particle fields, whereas these groups need not be invoked for the parallel transport of tensor fields in spacetime.

The formulation of general relativity that is most apt to expound the two putative disanalogies employs an orthonormal basis of vectors at each spacetime point and a connection-form that describes their parallel transport.

The different orthonormal bases are related by elements of the Lie group $O (3, 1)$ (or $S O (3, 1)$ , if spacetime orientation is important). But this is also the group that leaves the Minkowski metric on a $3 + 1$ space invariant (and its subgroup of orientation preserving transformations). In other words, the symmetry group – for example, $S O (3, 1)$ – that acts on the orthonormal bases is tied to the preservation of the structure of a ‘typical fiber’; so $S O (3, 1) ≃ A u t (T_{x} M)$ , with only the particular isomorphism being given by a choice of frame. Thus, in general relativity, the reason we obtain an $S O (3, 1)$ action on the space of frames is that each fiber $T_{x} M$ has a Lorentzian inner product structure.

Similarly, as we saw in Section 3.2.3, Lie groups of a principal bundle seen as a bundle of frames of a vector bundle reflect the structure of the vector bundle’s typical fiber in a frame independent way: the gauge group is no longer postulated as fundamental but instead acquires meaning as the invariance group of the typical fiber of $E$ . Moreover, as we saw in “The Structure Group $G$ as a Holonomy Group” section, we can think of parallel transport on a vector bundle in a frame-independent way as being a structure preserving map, carrying the fiber’s structure from one point of spacetime to another along a spacetime path (cf. Equation (3.2.60)). Differences in parallel transport give rise to the holonomy group, which recovers the gauge group $G$ (as the group of automorphisms of the typical fiber). Just as well, they recover in this manner the group $S O (3, 1)$ in the case of spacetime.

In sum, for both spacetime tensors and vector bundles, covariant derivatives can be characterized invariantly, without mentioning frames, gauges, and so on. In the same way that we think of the Levi-Civita connection as determining the rotation of the local tangent space as one moves from one point to another (and not as the explicit transport of a specific tangent basis), we think of an affine connection on a vector bundle as determining the rotation of internal spaces.

4.3 How to Dispel the Putative Disanalogy: The Internal Spaces $#$

This section will dispel the second putative disanalogy between parallel transport of spacetime and internal quantities. And here is the compulsory warning: this section’s approach to gauge theory is idiosyncratic; it is not part of the standard lore about gauge theory and so the section gets a $#$ .

Let us first set aside all questions about the ‘external’ spacetime geometry. A matter field can be described as the tensor product between an “internal” and an “external” component: the internal space – for example, $C^{2}$ – on which gauge fields take values, and external spinor fields in the case of matter fields, or external tensor fields $X$ in the case of gauge bosons. So, in the standard formulation, gauge fields are acted on by representations of the gauge group and its Lie algebra, while, for example, spinor fields are acted on by representations of the Spin group and its Lie algebra ( $s o (3, 1)$ ), which correspond to changes of frames for the tangent bundle. Here, I will focus only on the gauge part.

Now, in order to interpret the Ehresmann connection and gauge transformations as on a par with the Levi-Civita connection we need to respond to the major putative disanalogy: in the SM, different fields live in different spaces, and the Ehresmann connection lives outside of these spaces but plays an auxiliary role. Here we will see that interacting fields can be seen as sections of bundles built up from the same internal spaces, or typical fibers. For instance, in the same way that a symmetric, covariant tensor of rank two is built from two copies of $T M$ , (the internal part of) quarks will have components in a typical fiber isomorphic to $C^{3}$ , and gluons will be sections of certain (symmetrized, traceless) tensor bundles, involving $C^{3}$ and $(C^{3})^{*}$ . Thus, by describing the connection form $ω$ in the bundle of admissible frames of $(E, M, C^{3})$ , we have a geometric reason for the parallel transport of the different quarks and leptons marching in step. This allows us to understand a principal bundle $P$ , not as “fundamental and yet auxiliary,” but as a bundle of frames of a single vector bundle $E$ for each force, which is what figures in our ontology. This argument will, of course, require a brief description of the particle content of the SM.

The SM is represented in terms of Weyl fermions, which are two-component spacetime spinors. But I am only interested in the structure of the internal spaces: the spaces where the gauge connections act. So here I am basically ignoring the spacetime spinor structure of the SM (though it is somewhat implicit in the notation of left- or right-handed particles to be used subsequently). When representing the full fermionic content of the SM, this spinor part would be included as factors in a tensor product with the internal part that I am interested in and aim to describe in this section. I will get back to this point next.

The part of the (minimal) SM that I am interested in consists of forty-five complex numbers, organized into three generations, which means it has the same structure repeated three times. We can understand this repetition in terms of direct sums:

C^{45} = C^{15} \oplus C^{15} \oplus C^{15}

(4.3.1)

The following table tells us how these components transform, and it is organized into blocks whose elements can transform into each other (elements from different generations, or blocks, cannot). So each $C^{15}$ breaks down into the five rows of the following table (I will here only focus on the first generation).Footnote ³⁶

Now let us unpack Table 1. First, the columns are labeled with the groups that are associated to the types of interaction: strong ( $S U (3)$ ), weak ( $S U (2)$ ), and hyperweak ( $U (1)$ ).

The quarks: these are represented by the first three rows of the table. As to the first column: quarks clearly feel the strong forces, and they transform under the standard, or fundamental, representation of $S U (3)$ , labeled “3,” which just means $S U (3)$ acts on elements of $C^{3}$ via matrices which preserve the volume element and complex inner product of $C^{3}$ . So the components of quarks corresponding to the first row can be seen as vectors in internal spaces isomorphic to (a structured) $C^{3}$ . Now, $q_{L}$ is a left-handed quark doublet, which is a doublet of the form $q_{L} = (u_{L}, d_{L})$ . In the first generation these would be called up-left and down-left, respectively; in the second generation they would be charm-left and strange-left, and in the third generation they would be top-left and bottom-left. The reason $q_{L}$ is called a doublet – unlike the two rows beneath it, representing the up-right and the down-right quarks, $u_{R}$ and $d_{R}$ , which are singlets – is that the components of $q_{L}$ , namely, $u_{L}$ and $d_{L}$ , are charged under the weak nuclear force, and transform into each other under the action of $S U (2)$ . In the entry corresponding to $q_{L} \times S U (2)$ this transformation property is represented by the number 2, which means that $q_{L}$ transforms as an element of $C^{2}$ under the fundamental representation of $S U (2)$ . The number 1 for the entries $u_{R} \times S U (2)$ and $d_{R} \times S U (2)$ means that $u_{R}$ and $d_{R}$ are neutral under the weak forces, so cannot transform into each other (because, being singlets, they don’t transform at all under $S U (2)$ ). Finally, the left-handed quark doublet has a “weak hypercharge” of $1 / 6$ under $U (1)$ , which means that it is a complex number (an element of $C$ ) which, under the action of a given $U (1)$ phase shift generator $ξ$ , has its phase rotate at the rate of $ξ / 6$ (that is, it is multiplied by $e^{i ξ / 6}$ ); mutatis mutandis for the down-right and up-right quarks.Footnote ³⁷
The leptons: these are represented by the remaining two rows in the table and have a kind of parallel structure to the quarks, but, of course, they are all neutral under $S U (3)$ (they are not charged under strong interactions). $ℓ_{L}$ is the left-handed lepton doublet, which is of the form $ℓ_{L} = (e_{L}, ν_{L})$ . In the first generation these are the left-handed electron and neutrino (in the second and third they get “muon” and “tau” prefixes). Again, we put $e_{L}$ and $ν_{L}$ in the same row because they are charged under $S U (2)$ (they are charged under the weak forces), and transform into each other, unlike the particle of the remaining row, the right-handed electron $e_{R}$ , which is neutral under $S U (2)$ . The hypercharge of $ℓ_{L}$ is $- 1 / 2$ (which does not coincide with its electric charge; see footnote 37). The electric charge of the right-handed electron is, as expected, $- 1$ .

With the basic ingredients in place, I will now, in Section 4.3.1, defend my interpretation of Table 1, arguing that it dispels the major (putative) disanalogy between gauge and gravity that I described previously. In Section 4.3.2, I will present five possible objections to my interpretation.

4.3.1 Interpretation

The first two columns of Table 1 contain only one kind of nontrivial representation: the fundamental. So, in these columns, elements of $S U (3)$ and $S U (2)$ are $3 \times 3$ and $2 \times 2$ matrices, respectively, acting on elements of $C^{3}$ and $C^{2}$ , preserving their canonical inner product and oriented volume.Footnote ³⁸ The third column, under $U (1)$ is, in one sense, the most familiar from classical electromagnetism: it represents an overall phase, where different charges transform with different rotation speeds under $U (1)$ .Footnote ³⁹

So we clearly have $C^{3}, C^{2}, C^{1}$ over each spacetime point, where particles take their values. These are the typical fibers of three different fundamental vector bundles, call them $(E^{3}, M, C^{3}), (E^{2}, M, C^{2}), (E^{1}, M, C^{1})$ , or $E^{3}, E^{2}, E^{1}$ for short, where, for each, a fiber at a point is isomorphic to a complex vector space with inner product and orientation: for $π_{n} : E^{n} \to M$ , $π_{n}^{- 1} (x) ≃ C^{n}$ (but recall: there is no canonical isomorphism). Each of these vector bundles is analogous to $T M$ in the spacetime case, and we also naturally have the dual bundles (of linear functionals): $(E^{3})^{*}, (E^{2})^{*}, (E^{1})^{*}$ , that are necessary in order to represent the corresponding anti-particles. The groups of automorphisms of these fibers are, again, (noncanonically) isomorphic to $S U (3), S U (2)$ , and $U (1)$ , respectively, which necessarily emerge via (3.2.60), or upon the introduction of a frame, as explained in Section 4.2.

Now, as usual, we can join these vector bundles in different ways, using different kinds of products; and as for tensor fields over spacetime, here too, the most important for our purposes is the tensor product.Footnote ⁴⁰ Of course, a group action or representation on a vector space $V$ induces a representation on arbitrary tensor products of $V$ and $V^{*}$ ; and so it is here: the structure of the typical fiber defines a group that acts on that typical fiber, and that action naturally extends to all tensor products.Footnote ⁴¹

In the first row the left-handed quark doublet has components lying along $C^{3}, C^{2}$ , and $C^{1}$ : we must locate it within a space of three colors, and of two isospin charges, and of one hypercharge. The internal part of the left-handed quark doublet is a section of the bundle

q_{L} \in Γ (E^{3} \otimes E^{2} \otimes E^{1}) .

(4.3.2)

Unlike the first row of Table 1, the particles in the following two rows have no component along $C^{2}$ , which is why they are not charged under $S U (2)$ . So for example, the down-right quark has three options for color, and only one option for isospin and electric charge. In contrast, the left-handed lepton doublet has no components along $C^{3}$ , but has components along $C^{2}$ ; and the right-handed electron has no components along either $C^{3}$ or $C^{2}$ (that is why it is not charged under either the strong or the weak interactions); it only has components along $C^{1}$ (cf. footnote 37).
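As a consistency check, the component counts just described can be tallied against the $C^{15}$ of (4.3.1). The tabulation below is a sketch assembled from the description of the five rows in the text; it is not a copy of Table 1, and the column headings are my own labels.

```latex
% Representation labels as described in the text; a label 1 means
% "no component along that space" (a singlet).
\begin{tabular}{l|cc|c}
  field      & $SU(3)$ label & $SU(2)$ label & complex components \\ \hline
  $q_{L}$    & 3             & 2             & $3 \times 2 = 6$   \\
  $u_{R}$    & 3             & 1             & $3 \times 1 = 3$   \\
  $d_{R}$    & 3             & 1             & $3 \times 1 = 3$   \\
  $\ell_{L}$ & 1             & 2             & $1 \times 2 = 2$   \\
  $e_{R}$    & 1             & 1             & $1 \times 1 = 1$   \\
\end{tabular}
% Total per generation: 6 + 3 + 3 + 2 + 1 = 15,
% and three generations give 3 \times 15 = 45, as in (4.3.1).
```

The $U (1)$ factor never changes the count, because its typical fiber $C^{1}$ is one-dimensional.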

As I said previously, the odd man out in Table 1 is the third column, corresponding to the $U (1)$ weak hypercharges, since there we have multiple non-neutral values. How should we interpret the different weak hypercharges as properties of sections of vector bundles? One immediate answer comes from a rather trivial technical point. Since $C^{1}$ has complex dimension 1, arbitrary tensor products of $C^{1}$ will also have complex dimension 1.Footnote ⁴² But if a particle is, formally, a section of a vector bundle $E^{1} \otimes E^{1} := E_{2}^{1}$ , under a rotation of $E^{1}$ ’s typical fiber $C$ by $θ$ , because of the multilinearity of the tensor product, that section of $E_{2}^{1}$ picks up a phase of $2 θ$ . Thus, formally, taking the lowest charge as the unit, we can think of a weak hypercharge of $\frac{N}{6}$ as being due to the $N$ -th tensor product of $E^{1}$ , which we call $E_{N}^{1}$ , and negative charges are sections of tensor products of $(E^{1})^{*}$ . But, precisely because these tensor products are still 1-dimensional, not much changes in terms of the representation of these sections: there are no added degrees of freedom.Footnote ⁴³
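The multilinearity argument can be written out explicitly. In the sketch below, the action on the dual space is taken to be the usual contragredient one (a functional is precomposed with the inverse transformation); that convention is an assumption of the sketch.

```latex
% Let e^{i\theta} act on the typical fiber C of E^1. On a decomposable element
% of the N-fold tensor product E^1_N, multilinearity gives
\[
  (e^{i\theta} v_{1}) \otimes \cdots \otimes (e^{i\theta} v_{N})
    = e^{i N \theta}\, (v_{1} \otimes \cdots \otimes v_{N}) ,
\]
% while a linear functional \alpha in (E^1)^* transforms with the inverse phase,
\[
  \alpha \mapsto \alpha \circ (e^{i\theta})^{-1} = e^{-i\theta}\, \alpha ,
\]
% so N-fold tensor powers of the dual carry the phase e^{-i N \theta}.
```

For $N = 2$ this reproduces the phase $2 θ$ quoted above for sections of $E_{2}^{1}$ .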

In the first two columns, the representations $3$ and $2$ describe the number of degrees of freedom of the particle in these spaces: vectors in $C^{3}$ have three and vectors in $C^{2}$ have two. Indeed, for the same reason, we label with an ‘ $8$ ’ the representation of the gluon, whose internal components, in our geometric treatment, would be an element of $Γ (E^{3} \otimes_{T} (E^{3})^{*})$ , where $T$ stands for traceless (which is necessary for parallel transport to be not only linear, but compatible with the inner product). So ‘8’ is the number of internal degrees of freedom that such a field would have, and its tensor structure implies it is acted on by the adjoint representation of the group action on $E^{3}$ .Footnote ⁴⁴
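The count behind the label ‘8’ can be sketched as follows. This is a simple dimension count under the assumption that "traceless" removes exactly one component from the full tensor product; the general formula for $n$ is added here for illustration:

```latex
% Sketch: dimension of the traceless part of C^n \otimes (C^n)^*.
\begin{align*}
\dim \big( C^{n} \otimes (C^{n})^{*} \big) &= n^{2} , \\
\dim \big( C^{n} \otimes_{T} (C^{n})^{*} \big) &= n^{2} - 1
  \qquad \text{(the trace condition removes one component)} , \\
n = 3 : \quad 3^{2} - 1 &= 8 , \qquad
n = 2 : \quad 2^{2} - 1 = 3 .
\end{align*}
```

The same number $n^{2} - 1$ is the dimension of the Lie algebra of $S U (n)$ , which is why this tensor product carries the adjoint representation.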

As with the fermions, we can, of course, have different sections of vector bosons. Any such vector boson defines an affine connection $D$ that is compatible with the fiber structure. So the structure group $G ≃ \mathrm{Aut} (E_{x})$ still emerges explicitly, even “physically,”Footnote ⁴⁵ by parallel transport along all the different curves, through (3.2.60).

But the gluon does not fit Table 1 because it is not a fermion, and does not decompose into a tensor product with Weyl spinors as the rest of the table does; it is a boson, and its spacetime part is a 1-form. Indeed, this is the case for all the affine connections, which, in particle physics terminology, are called the gluon, the W and the Z-bosons. These are the degrees of freedom dictating the parallel transport of color, isospin, and (hyper)charge, which, along a given spacetime curve $γ : [0, 1] \to M$ take, respectively, the fibers of $E^{3}, E^{2},$ and $E^{1}$ over $γ (0) \in M$ to the fibers of $E^{3}, E^{2},$ and $E^{1}$ over $γ (1) \in M$ , as a linear, structure-preserving transformation.

Summing up, apart from (4.3.2), we get:

u_{R} \in Γ (E^{3} \otimes E_{4}^{1}), d_{R} \in Γ (E^{3} \otimes E_{- 2}^{1}), ℓ_{L} \in Γ (E^{2} \otimes E_{- 3}^{1}), e_{R} \in Γ (E_{- 6}^{1}),

(4.3.3)

and adding the vector bosons (one for each $S U (n)$ ), for which we include its 1-form component in spacetime:

ω_{n} \in Γ (T^{*} M \otimes E^{n} \otimes_{T} (E^{n})^{*}) .

(4.3.4)

We can conceive of each generation as having the following decomposition into five factors:

\begin{matrix} C^{15} = & (C^{3} \otimes C^{2} \otimes C_{1}^{1}) \oplus (C^{3} \otimes C_{4}^{1}) \oplus (C^{3} \otimes C_{- 2}^{1}) \\ & \oplus (C^{2} \otimes C_{- 3}^{1}) \oplus C_{- 6}^{1} . \end{matrix}

(4.3.5)
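As a quick check of the total in (4.3.5), the dimensions of the five summands can be added directly, using the fact that each hypercharge factor $C_{N}^{1}$ is one-dimensional:

```latex
% Sketch: dimension count for one generation, summand by summand.
\begin{align*}
\underbrace{3 \cdot 2 \cdot 1}_{q_L}
+ \underbrace{3 \cdot 1}_{u_R}
+ \underbrace{3 \cdot 1}_{d_R}
+ \underbrace{2 \cdot 1}_{\ell_L}
+ \underbrace{1}_{e_R}
= 6 + 3 + 3 + 2 + 1 = 15 .
\end{align*}
```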

And we can finally answer the main question of this subsection, and indeed of the section: why do the parallel transports of different, mutually interacting particles, as sections of different vector bundles, march in step?

In the textbook tradition (see e.g. Nakahara (Reference Nakahara2003, Ch. 9)), the answer is postulated: the gauge symmetry group is not derived as preserving some physical structure, it is postulated in the definition of the principal bundle, which, as I said, is there merely auxiliary. But here I’ve argued that, just as tensor bundles are constructed from the underpinning geometry of $T M$ and tensors have components in the spaces thus constructed, particle fields have components in internal spaces corresponding to color, isospin, and (hyper)charge, that are constructed from the underpinning geometry that is isomorphic to $C^{3}, C^{2}$ , and $C^{1}$ , endowed with an inner product and, except in the case of $C^{1}$ , an orientation. Parallel transport marches in step because it concerns the underpinning internal geometry.

In this tensorial representation of the fields of gauge theory, there is no need for indices, except to denote the type of tensor under consideration. In the analogous spacetime case, this is called the abstract index notation for spacetime tensors. In that case, such tensors are invariant under passive, that is, coordinate transformations. It is only upon introducing a coordinate chart that we can talk about a spacetime tensor’s components transforming under a change of coordinates. But coordinate-free, abstract spacetime tensors are not invariant under active diffeomorphisms, which induce a linear isomorphism between different tangent bundles.

The situation for gauge theory as I have developed it here is very similar. One can explicitly introduce internal indices by introducing a choice of frame for the vector bundle, that is, a section of the principal bundle $L (E)$ . In this case, one recovers the gauge transformations via a change of frames, which amounts to a change of section of the principal bundle $L (E)$ : these are construed as “passive gauge transformations.” But we could also take the active, or global, point of view. Namely, given a structure-preserving linear isomorphism between vector bundles, we obtain different, but isomorphic connections. The transformation between these connections corresponds to the active view of gauge transformations (cf. Equation (3.2.57)). Nonetheless, the structure groups $S U (3) \times S U (2) \times U (1)$ are the symmetries that preserve the internal geometry, and emerge explicitly upon comparisons of parallel transported tensors via Equation (3.2.60): the holonomy group $\mathrm{Hol} (D)$ is invariant under linear isomorphisms and is isomorphic to the automorphism group.

4.3.2 Possible Objections

Here I will address five possible objections to the geometric viewpoint: the first is more technical; the second is conceptual; the third is metaphysical; the fourth is about completeness; and the fifth is about applications beyond the SM. All but the first two lead to concessions about my framework. Lastly, I will dissolve one apparent source of tension between this section’s geometric viewpoint and Section 2’s more methodological one.

First the technical possible objection: I said previously that the spinor structure of the fields comes in as a factor in a tensor product with the internal tensorial structure. But that is not exactly right for the table as I presented it: it would require me to represent the SM solely in terms of one chirality, which is certainly possible. Instead of having both right- and left-handed spinors, one can include in the table only left-handed ones; I preferred not to mix particles and anti-particles in the table, which is why I instead used both chiralities. Using a single chirality would have the advantage of being rigorous about the tensor product between internal spaces and spinors but would have the disadvantage of having to introduce complex conjugates of the representations, for example, use $\overline{3}$ instead of $3$ for the first and fourth rows of Table 1, and also having to introduce $q_{L}^{c}$ , the anti-left-handed quark doublet, and $ℓ_{L}^{c}$ , the anti-left-handed lepton doublet. But, of course, doing this would not offend my main thesis, since complex conjugation of $C^{3}$ is an operation that requires no more structure than I have posited; it is analogous to taking $T^{*} M$ to be defined by $T M$ (as linear functionals thereof).

Table 1 The representation of the SM groups on fermions.

Field	$S U (3)$	$S U (2)$	$U (1)$
$q_{L}$	3	2	$\frac{1}{6}$
$u_{R}$	3	1	$\frac{2}{3}$
$d_{R}$	3	1	$- \frac{1}{3}$
$ℓ_{L}$	1	2	$- \frac{1}{2}$
$e_{R}$	1	1	$- 1$

Now I’ll address the second, conceptual objection: given the Lagrangian of the SM written in a local coordinate system, I could extract all of the invariances and symmetry transformations directly. Invariance of the Lagrangian would constrain the internal values of the different particle fields to appropriately co-rotate. This is a true statement, but I don’t think it is explanatory. For the same could, of course, be said about general covariance in general relativity. There, it is the geometric interpretation that underpins the universal coupling of all of the fields to a single spacetime geometry. But this universality could fail; for instance, if “bi-metric” Lagrangians for gravity were adopted, we could have more than one Levi-Civita connection, which could dictate parallel transport differently for different fields. Reversing the explanatory arrow, the fact that such bi-metric theories have little empirical support can be explained by the more parsimonious, familiar geometric interpretation of general relativity. Similarly, my argument here shows that the most parsimonious explanation for the current form of the Standard Model (without the analogous “bi-metrics”), is that it concerns an internal structured space, isomorphic to $C^{3} \times C^{2} \times C^{1}$ .

The third objection is very similar in spirit to the second one, but it plays out one level lower in the hierarchy of mathematical structures. Whereas the second was about the basic geometric objects describing parallel transport, the third concerns the underpinning spaces in which the fields in question live. For the internal complex spaces I have presented are not analogous to tangent spaces with Lorentzian inner-product in all relevant senses: there is a privilege afforded to the tangent space which isn’t similarly afforded to complex internal spaces, since each element of the tangent space is identified with an infinitesimal path through the base manifold: the tangent space is “soldered” onto spacetime. Thus the particular vector bundle $E$ has to be postulated and, we must assume, shared by interacting fields.Footnote ⁴⁶ Nonetheless, I maintain that the explanation afforded here distinguishes itself by putting structure, rather than symmetry, first. In contrast, the standard formalism posits both the symmetry of the principal fiber bundle and the vector bundles, and demands their compatibility, which goes unexplained.Footnote ⁴⁷

Fourthly, my description of the SM here was not complete. The attentive reader will have noticed a glaring omission: the Higgs particle is nowhere to be found in Table 1. There are, at bottom, two reasons for this omission. The first is that the Higgs would not fit in Table 1: it is a scalar field on $M$ , not a spin 1/2 fermion, and so does not fit the required (but implicit) tensor product structure. The second, more relevant reason, is that the Higgs and spontaneous symmetry breaking (SSB) make things rather more complicated, with added non-gauge interactions between the Higgs and other particles through Yukawa couplings. It is mostly differences in these couplings that distinguish the three generations of the SM. The up, charm and top quarks have the same electric charge, along with the same weak and strong interactions – they primarily differ in their mass, which comes from the Higgs field. The same thing holds for the down, strange and bottom quarks, along with the electron, muon and tau leptons. And yet there is a single generation of bosons, meaning that they are all parallel transported by the same connections. The striking similarity and apparent redundancy of the three generations is one of the great mysteries of the SM, even within the standard approach. In order to address this issue in this formalism, one would need to better understand gauge-invariant construals of the Higgs mechanism and Yukawa couplings (see e.g. Struyve (Reference Struyve2011) and Berghofer et al. (Reference Berghofer, Francois and Friederich2023, Ch. 5)), in terms of invariant geometric structures along the lines that I have proposed here. I leave a full treatment of Yukawa couplings, the Higgs, and SSB for further work.

Here is the fifth possible objection, about applications beyond the SM: the interpretation of the SM that I have proposed here was very straightforward because different non-neutral charges appear only in the $C^{1}$ sector.Footnote ⁴⁸ In that one-dimensional sector, the different charges arise from tensor products (by multi-linearity) at no additional ontological price, since these products imply no additional degrees of freedom for the particles in question. So a worry might emerge that we could not account for different charges for the other forces, and that the scope of the geometric interpretation is narrower than the scope of the standard interpretation in terms of principal fiber bundles and their associated bundles.

However, at least for $S U (n)$ , the geometric interpretation pursued here can recover all the different representations (representing different kinds of particles) by using tensor products and the internal geometric structures of the fibers $C^{n}$ (see e.g. Coleman (Reference Coleman1965) and Zee (Reference Zee2016, Ch. IV.4)). Indeed, we saw one such construction for the gauge boson, that lives in the adjoint representation, in Equation (4.3.4). That representation corresponds to a traceless tensor product between an internal space and its adjoint. And although for $n > 1$ , the number of degrees of freedom of such internal tensor fields is different for different valences, this is as it should be: the number of degrees of freedom of sections of spacetime tensor fields of valence $(j, k)$ depends on $j$ and $k$ , after all.

However, for some of the exceptional Lie groups, whose geometric interpretation is much more involved (cf. Adams (Reference Adams1996)), I believe that my interpretation might fail (e.g. if there is no minimal vector space whose structure is preserved by the group, or if there is some representation that cannot be understood in terms of tensor products of such a vector space). In these cases, my interpretation certainly becomes less natural, and so I also leave this for further study.

Finally, recall Section 2, where I gave a methodological reason for gauge symmetry: namely, that it ensured that the dynamics of fields and charges was compatible with charge conservation. Is the new, geometric viewpoint explored in the current section in contradiction with that methodological reason? No, clearly the new viewpoint merely provides a geometrical origin to the symmetry and highlights the power of geometry in physics. Indeed, as argued in Section 2.2.2, the constraints emerging from Noether’s second theorem are naturally interpreted in terms of geometric constraints on the curvature tensors and so fit nicely with this new viewpoint.

5 The Aharonov–Bohm Effect, Nonlocality, and Nonseparability

In this section I will focus on a topic that is very popular in philosophical treatments: the Aharonov–Bohm effect, henceforth, the AB effect. The effect is usually portrayed as being of a quantum nature; I think this is a mistake: the fact that an experimental probe of these effects employs the superposition principle is, in my view, accidental, not essential.

Instead, I will argue here that the importance of the effect is in showing there are physically salient gauge-invariant quantities that cannot be captured by the curvature tensor. The effect shows, so to speak, the fundamental significance of parallel transport, beyond what is encoded in the curvature.

Another nonessential feature of the effect as it is usually portrayed is its reliance on the nontrivial topology of space, which is a very obvious nonlocal fact. Although this portrayal is correct within the vacuum sector of the Abelian theory, even in a background space that is topologically trivial there are similar effects that have a similar significance.

Thus, in Section 5.1, I provide the standard description of the AB effect, in the vacuum sector of the Abelian theory. Next, in Section 5.2, I will show that a trivial topology does not completely close the gap between curvature and gauge-invariant quantities: in the non-Abelian vacuum case, even in a background spacetime that is topologically trivial, there still are gauge-invariant quantities that cannot be expressed using only the curvature. Finally, in Section 5.3, I discuss the sense in which the content of the connection that outstrips that of the curvature is nonlocal and in which it is nonseparable. As we will see, there are important differences between the two. This section will give a very brief introduction to a third, relational reason for introducing gauge symmetry.

5.1 AB Effect in the Abelian Vacuum

Does the physical content of the gauge potential in the Abelian theory outstrip that of the Maxwell–Faraday tensor? As is immediate to observe from (3.2.26), the curvature is gauge-invariant in the Abelian case. This often leads to questions about whether physical theories couldn’t be entirely described without the use of the gauge-variant potentials. But surprisingly, Abelian gauge theory has more than curvature as its fundamental degrees of freedom. The AB effect describes physical, or gauge-invariant, features of the theory that cannot be articulated using only the curvature. These features appear even in vacuum, though there they require spacetime to have (effectively) a nontrivial topology.

Historically, in order to investigate the physical significance of the gauge potential, Aharonov and Bohm proposed an electron interference experiment, in which a beam is split into two branches which go around a solenoid and are brought back together to form an interference pattern.Footnote ⁴⁹ This solenoid is perfectly shielded, so that the magnetic field vanishes outside it and no electron can penetrate inside and detect the magnetic field directly.Footnote ⁵⁰

The experiment involves two different set-ups – solenoid on or off – which produce two different interference patterns. As the magnetic flux in the solenoid changes, the interference fringes shift. And yet, in both set-ups, the field-strength (i.e. the magnetic field) along the paths accessible to the charged particles is zero. So, the general outline of the experiment is: (a) the observable phenomena change when the current in the solenoid changes; and (b) the electrons that produce the phenomena are shielded from entering the region of nonzero magnetic fields; so (c) if we rule out unmediated action-at-a-distance, whatever physical difference accounts for the change must be located outside the solenoid.

Thus, to explain the different patterns, one must either conjecture a nonlocal action of the field-strength upon the particles, or regard the gauge potential as carrying ontic significance. Taking this second stance, the AB effect shows that the gauge potential cuts finer physical distinctions than the field-strength tensor can distinguish. How much finer?

Supposing such electrons take the paths $γ_{1}$ and $γ_{2}$ around the solenoid, we can infer from the shift in the interference pattern that there is a field-dependent contribution to the relative phase of electron paths that pass to the left and to the right of the solenoid, given by:Footnote ⁵¹

e^{i Δ} = \exp \left( i \oint_{γ_{1} \circ γ_{2}} A \right),

(5.1.1)

where, assuming the electrostatic situation, we use bold-face to denote the spatial one-form $A$ without indices. This one-form satisfies $d A = B$ , where $d$ is the spatial exterior derivative. A gauge transformation $A \to A + d λ$ (for any $λ \in C^{\infty} (Σ)$ ) will not affect (5.1.1), since $γ_{1} \circ γ_{2} ≃ S^{1}$ is a closed loop, and so $\oint_{S^{1}} d λ = 0$ by Stokes’ theorem. Thus, the phase difference $Δ$ depends only on the gauge-equivalence class of $A$ .
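The link between the phase in (5.1.1) and the flux through the solenoid can be sketched with Stokes’ theorem. The surface $S$ and the flux symbol $\Phi$ are assumptions introduced for this illustration: $S$ is any surface bounded by the loop $γ_{1} \circ γ_{2}$ (it necessarily crosses the solenoid), and the coupling constant is taken to be absorbed into $A$ , as (5.1.1) appears to do:

```latex
% Sketch: the relative phase equals the magnetic flux enclosed by the two paths.
\begin{align*}
\Delta = \oint_{\partial S} A = \int_{S} dA = \int_{S} B =: \Phi .
\end{align*}
% Gauge invariance: a closed loop has no boundary, so the exact part drops out.
\begin{align*}
\oint_{\partial S} (A + d\lambda)
  = \oint_{\partial S} A + \oint_{\partial S} d\lambda
  = \oint_{\partial S} A .
\end{align*}
```

Although $B$ vanishes everywhere along the paths themselves, the loop integral of $A$ still registers the flux confined inside the solenoid.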

To find out more precisely what is the physical information in the equivalence classes of the gauge potential that outstrips what can be encoded by the curvature, we proceed as follows. Given spatial gauge potentials $A^{1}, A^{2}$ on the spatial surface $Σ$ , define $C := A^{1} - A^{2}$ where $C$ is a 1-form on $Σ$ . Suppose $A^{1}, A^{2}$ are such that

d A^{1} =: F^{1} = F^{2} := d A^{2},

(5.1.2)

and so

d C = d A^{1} - d A^{2} = 0.

(5.1.3)

Now, if there are $C$ such that $C \neq d λ$ (for any $λ \in C^{\infty} (Σ)$ ), then $A^{1}$ and $A^{2}$ are not related by a gauge-transformation and so are not in the same gauge-equivalence class, in spite of having the same curvature. By definition, such a $C$ would represent a nonzero element of $H^{1} (Σ) := \mathrm{Ker}\, d^{1} / \mathrm{Im}\, d^{0}$ , where $d^{1}$ is the exterior derivative operator acting on the space of 1-forms on $Σ$ , $Λ^{1} (Σ)$ , and $d^{0}$ is that same operator acting on smooth functions (or 0-forms). This space is called the first de Rham cohomology of $Σ$ and it is nontrivial only if there are loops in $Σ$ that are not contractible to a point: a topological condition. For such $Σ$ , we can therefore find distinct equivalence classes $[A^{1}] \neq [A^{2}]$ that can nonetheless correspond to the same electric and magnetic field. (See Belot (Reference Belot1998, Sec. 4) for a more thorough philosophical analysis of this paragraph’s discussion.)
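A standard example of such a $C$ may help fix ideas. It is offered as an illustration under assumptions not made in the text: $Σ$ is taken to be the punctured plane $R^{2} \setminus \{0\}$ (a stand-in for the region outside an idealized, infinitely thin solenoid), $\varphi$ is the polar angle, and $\Phi$ is a constant:

```latex
% Sketch: a closed but not exact 1-form on the punctured plane.
\begin{align*}
C &= \frac{\Phi}{2\pi}\, d\varphi
   = \frac{\Phi}{2\pi}\, \frac{x\, dy - y\, dx}{x^{2} + y^{2}} ,
   \qquad dC = 0 \ \text{ on } \Sigma , \\
\oint_{S^{1}} C &= \frac{\Phi}{2\pi} \int_{0}^{2\pi} d\varphi = \Phi .
\end{align*}
% If C were d\lambda for a smooth (single-valued) \lambda on \Sigma,
% the loop integral would vanish; hence C is not exact whenever \Phi \neq 0.
```

Here $\varphi$ is only locally a function, which is exactly why $C$ is closed without being the differential of any $λ \in C^{\infty} (Σ)$ .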

5.2 The Non-Abelian, Vacuum Case

In the non-Abelian case, we have an even stronger result. Namely, if two connections $A$ and $A^{'}$ have the same curvature $F \neq 0$ , even on a simply-connected region, and in vacuum, they are not necessarily gauge-equivalent. Therefore, generally, there is indeed more physical information captured by holonomies or Wilson loops than by the curvature. A simple example is the following: take the gauge group $S U (2)$ and base manifold $R^{2}$ . The Pauli matrices, denoted as $σ_{1}$ , $σ_{2}$ , and $σ_{3}$ , form, once multiplied by $i$ , a basis for the Lie algebra $\mathfrak{su} (2)$ :

σ_{1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad σ_{2} = \begin{pmatrix} 0 & - i \\ i & 0 \end{pmatrix}, \quad σ_{3} = \begin{pmatrix} 1 & 0 \\ 0 & - 1 \end{pmatrix} .

The Pauli matrices satisfy the following algebraic relations, known as the Pauli algebra:

\begin{matrix} σ_{1}^{2} & = σ_{2}^{2} = σ_{3}^{2} = \mathrm{Id} & \text{(where } \mathrm{Id} \text{ is the identity matrix)} \\ σ_{i} σ_{j} & = - σ_{j} σ_{i} \text{ for } i \neq j & \text{(anticommutation)} \\ σ_{i} σ_{j} & = δ_{i j} \mathrm{Id} + i ϵ_{i j k} σ_{k} & \text{(where } ϵ_{i j k} \text{ is the Levi-Civita symbol)} \end{matrix}

(5.2.1)

Now consider

A = - i y σ_{3} d x + i x σ_{3} d y

(5.2.2)

A^{'} = i σ_{1} d x - i σ_{2} d y

(5.2.3)

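Here $d A^{'} = 0$ , since $A^{'}$ has constant coefficients, and $A \land A = 0$ , since $A$ contains only $σ_{3}$ ; the curvature of $A$ thus comes entirely from $d A$ , and that of $A^{'}$ entirely from $A^{'} \land A^{'}$ . Using (5.2.1), it is simple to verify that, calculating the curvatures $F = d A + A \land A$ , we get: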

F = F^{'} = 2 i σ_{3} d x \land d y .

(5.2.4)
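The computation behind (5.2.4) can be spelled out term by term. This is a sketch under the convention $F = d A + A \land A$ , using only the Pauli algebra (5.2.1) and $d y \land d x = - d x \land d y$ :

```latex
% Curvature of A = -i y \sigma_3 dx + i x \sigma_3 dy:
\begin{align*}
dA &= -i \sigma_3\, dy \wedge dx + i \sigma_3\, dx \wedge dy
    = 2 i \sigma_3\, dx \wedge dy , \\
A \wedge A &= (i \sigma_3)^{2}\, (-y\, dx + x\, dy) \wedge (-y\, dx + x\, dy) = 0 ,
\end{align*}
% so F = dA = 2 i \sigma_3 dx \wedge dy.
%
% Curvature of A' = i \sigma_1 dx - i \sigma_2 dy (constant coefficients):
\begin{align*}
dA' &= 0 , \\
A' \wedge A' &= (i \sigma_1)(-i \sigma_2)\, dx \wedge dy
              + (-i \sigma_2)(i \sigma_1)\, dy \wedge dx \\
             &= (\sigma_1 \sigma_2 - \sigma_2 \sigma_1)\, dx \wedge dy
              = 2 i \sigma_3\, dx \wedge dy ,
\end{align*}
% so F' = A' \wedge A' = 2 i \sigma_3 dx \wedge dy = F.
```

The two connections thus reach the same curvature by different routes: one through its derivative term, the other through its non-Abelian commutator term.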

If $A$ and $A^{'}$ were gauge-related, there should exist $g \in C^{\infty} (R^{2}, S U (2))$ such that $A^{'} = g A g^{- 1} - d g g^{- 1}$ , in which case $F^{'} = g F g^{- 1} = F$ . That is, $F$ should be invariant under such a $g$ , or, infinitesimally, $F$ should commute with the generator of the transformation. Since $F \propto σ_{3}$ , and, from (5.2.1), the only transformations that commute with $σ_{3}$ are generated by $σ_{3}$ , we would have $g_{o} = e^{i θ σ_{3}}$ for some $θ (x, y)$ . From (5.2.2), since $A$ only contains $σ_{3}$ , we get that $g_{o} A g_{o}^{- 1} = A$ and thus

g_{o} A g_{o}^{- 1} - d g_{o} g_{o}^{- 1} = A - i d θ σ_{3} .

(5.2.5)

Clearly, since this expression still only contains $σ_{3}$ , there is no $θ$ that can transform it into $A^{'}$ .

5.3 Nonlocality and Nonseparability

It is often said (cf. e.g. Belot (Reference Belot1998), Healey (Reference Healey2007), Healey & Gomes (Reference Healey, Gomes and Zalta2021), Myrvold (Reference Myrvold2011)) that the Aharonov–Bohm effect of classical electromagnetism evinces a form of nonlocality, something that otherwise might have been thought of as confined to nonclassical physics. In the same vein, it is often said that gauge-invariant quantities are nonlocal. In Section 5.3.1 I will argue that the nonlocality in question is relatively benign. This argument will lead us to yet another reason for gauge symmetry, introduced by Rovelli (Reference Rovelli2014). In Section 5.3.2 I will argue that, notwithstanding Section 5.3.1’s deflationary account of nonlocality, there is still an interesting notion of nonseparability at play in gauge theories, but such a notion is the norm in physical theories, not the exception. Finally, in Section 5.4, I conclude this section and this Element.

5.3.1 Nonlocality and Rovelli’s Relational Reason for Gauge

In Equation (5.1.1), the AB effect was quantified by the holonomy of the connection along a closed curve (which we first encountered in Section 4.2, cf. Equation (3.2.60), and will discuss further in Section 5.3.2; cf. Equation (5.3.1)). This is a nonlocal, gauge-invariant quantity, and it raises the question of whether all gauge-invariant quantities are in some sense nonlocal. This question has a long and vexed history, which I will not be able to fully cover here (cf. Berghofer et al. (Reference Berghofer, Francois and Friederich2023); Carrozza & Höhn (Reference Carrozza and Höhn2022); Earman (Reference Earman, Kargon, Achinstein and Kelvin1987) and Gomes (Reference Gomes2024b, Sec. 3.4)). All I can offer is a summary.

Of course, there are gauge-invariant quantities that are local. These are easy to find using the matter fields: $ψ \overline{ψ}$ is just one example. But even in vacuum they are easy to find: $F_{I}^{μ ν} F_{μ ν}^{I}$ is gauge-invariant.Footnote ⁵² But, as we saw in the previous section, not all gauge-invariant quantities can be written in terms of the curvature. A certain set of variables can be used to describe all the physical states of a system iff they can describe all the possible initial data for that system, so it pays to have a small digression into the canonical formalism.

As we saw in Section 2, the equations of motion of theories such as general relativity and Yang-Mills are not all independent, and thus they only uniquely determine the evolution of a subset of the original degrees of freedom. In practice, this means that initial data for these two theories must satisfy elliptic differential equations. Since elliptic equations do not describe anything propagating – their boundary conditions are instantaneous and nonlocally determine the solution within the bounded region – we get a nonlocal parametrization of initial data that satisfy these constraints.

There is thus a certain freedom in choosing which “components of the fields” will be uniquely propagated, or will evolve deterministically; each choice corresponds to a gauge-fixing, or, equivalently, to a parametrization of the solutions of the initial value constraints. In other words, each gauge-fixing can be seen as a choice of conjugate degrees of freedom – configuration and conjugate momentum variables – that are uniquely propagated because they satisfy the constraints and are freely specifiable; cf. Gomes and Butterfield (Reference Gomes and Butterfield2022) for more about this interpretation of the initial value constraints and nonlocality.

So that is a way of seeing nonlocality of gauge-invariant quantities through the lens of the canonical (or Hamiltonian) approach to these theories. From the covariant perspective we can give another heuristic explanation. The connection-form determines parallel transport on a vector bundle $E$ , but it is not invariant: it will transform if we apply a fiber-preserving linear automorphism on $E$ . Intuitively, in order to extract invariant information from parallel transport, we must somehow compare parallel transported objects. And this is a nonlocal operation. Heuristically, we could see this from the inhomogeneous, that is, nontensorial transformation properties of the connection, given in Equation (3.2.11) (for the Ehresmann connection, or (3.2.44) for the projection onto a section). Namely, in order to extract the content of the connection that is invariant under gauge transformations we need to eliminate this derivative; that is, we need to use integration.

Another way to describe gauge-invariant quantities is to anchor the representation of a state to a physical system, with the ensuing representation being straightforwardly understood in terms of gauge-invariant relations to this physical system (see Carrozza & Höhn (Reference Carrozza and Höhn2022); Gomes (Reference Gomes2024b) for a defense of this view). For instance, given a set of four scalar fields obeying functionally independent Klein-Gordon equations, we can understand harmonic gauge in general relativity as using these fields as coordinates on a region of spacetime (cf. Bamonti (Reference Bamonti2023) and references therein). Similarly, in gauge theory, a nowhere vanishing charged matter field could select an internal frame for a vector bundle, and we would describe other fields relative to this frame. In electromagnetism, unitary gauge can be understood in this way, and it is completely local (see e.g. Gomes (Reference Gomes2024b, Sec. 4.2) and Wallace (Reference Wallace2024)).Footnote ⁵³ But, again, one cannot pick out a frame using only local properties of parallel transport: it is for this reason that gauge-fixings based only on the connection – such as Coulomb gauge – are nonlocal: in order to describe this particular kind of internal frame, we need to resort to properties of the field at spatially separated points.

Thus we arrive at the doorstep of another answer to the question of “why gauge,” introduced by Rovelli (Reference Rovelli2014): namely, that gauge freedom is essentially relational in character. This is easy to motivate within the geometric perspective I have developed here: for example, a vector in a vector space is not invariant under rotations, but the inner product between two vectors, a relational quantity, is. And similarly with internal quantities: for instance, the decomposition of a quark field into three colors is gauge-dependent, but when measured relative to, and coupled to, some other physical system, these components can be interpreted invariantly.
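The vector-space example can be written out in one line. As a sketch, assume $v, w \in C^{n}$ with the standard Hermitian inner product and let $U$ be any unitary transformation (these symbols are introduced here for illustration):

```latex
% Sketch: individual vectors rotate, but their inner product does not.
\begin{align*}
v \mapsto U v \neq v \ \text{ in general} ,
\qquad
\langle U v, U w \rangle = v^{\dagger} U^{\dagger} U w = v^{\dagger} w
  = \langle v, w \rangle .
\end{align*}
```

The components of $v$ alone change under $U$ , whereas the relation between $v$ and $w$ is left untouched.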

Indeed, Rovelli argues more broadly that symmetry-invariant descriptions of most physical systems arise only via relations between different parts of those systems. And, since it is an answer that this Element is too short to do justice to, I defer to Rovelli’s (Reference Rovelli2014, p. 7) excellent summary:

Gauge invariance is not just mathematical redundancy; it is an indication of the relational character of fundamental observables in physics. These do not refer to properties of a single entity. They refer to relational properties between entities: relative velocity, relative localization, relative orientation in internal space, and so on.[...] Gauge is ubiquitous. It is not unphysical redundancy of our mathematics. It reveals the relational structure of our world. [my italics]

This relational understanding of gauge fits nicely with the geometrical viewpoint provided in Section 4.

In sum, nonlocality arises because the function that takes the original local degrees of freedom in an arbitrary frame to a unique frame that is defined “physically” or via properties of the fields, is often nonlocal: the value of an element in the subset at point $x$ depends on the values of the original degrees of freedom at other points (see Gomes & Butterfield (Reference Gomes and Butterfield2022, Sec. 1.1, point (3)) for more discussion about this sort of (nonsignaling) classical nonlocality).Footnote ⁵⁴ And that is just because we determine, or “construct” the frame from properties of the fields, and some invariant relations between different components of some of the fields are nonlocal.

Gomes (Reference Gomes2019, Reference Gomes2021) and Gomes & Riello (Reference Gomes and Riello2021) develop Rovelli’s ideas further, distinguishing two aspects of the ‘coupling’ of systems that are related to symmetries. The first aspect is closer to Rovelli’s (Reference Rovelli2014) justification for gauge, briefly described previously. As they put it: in order to join gauge-invariant descriptions of subsystems, we need to employ gauge degrees of freedom. The second aspect of coupling that is related to symmetries is that there is more than one way to successfully couple gauge-invariant subsystems, and this multiplicity gives rise to a symmetry with empirical significance. This second notion is tightly related to that of nonseparability of physical systems, which we now turn to.

5.3.2 Nonseparability

As I said, the notion of nonlocality that I described earlier applies more generally than what is required for the AB effect: most gauge-fixings lead to such a nonlocal representation of the state (cf. Gomes (Reference Gomes2024b, Sec. 3)). But as we will see in this section, the AB effect also illustrates another, related idea: nonseparability. Broadly, the idea is that gauge-invariant states on patches or regions of space or spacetime do not uniquely determine the gauge-invariant state on the union of those patches or regions. In Gomes (Reference Gomes2021), I also argued that this kind of nonseparability, or holism, can be construed as an empirical manifestation of symmetries as applied ‘externally’ to a subsystem. In other words, there are often many ways in which to put together the physical states of two patches or regions into a whole, and all these different ways are related to each other by a symmetry transformation that acts only on one of the subsystems. Apart from the case of electromagnetism, the existence of such transformations is contingent on special – specially homogeneous – states at the boundary of the region.Footnote ⁵⁵

Similar versions of separability are offered by Healey (Reference Healey2007, p. 26), who calls it Spatial Separability, Belot (Reference Belot1998, p. 540), whose term is Synchronic Locality, and Myrvold (Reference Myrvold2011, p. 427), who calls it Patchy Separability. The guiding idea there is very similar to the one I described earlier (from Gomes (Reference Gomes2021)): it is that the state of a region supervenes on assignments of intrinsic properties to patches of the region, where the patches may be taken to be arbitrarily small.

But, whereas Gomes (Reference Gomes2021) formulates these properties using gauge-fixings of the states and considers also the non-Abelian theory, each of these papers formulates the question of separability for electromagnetism in holonomy variables. Although their formulation is inadequate to deal with non-Abelian theories – and so Gomes (Reference Gomes2021) is more general – for the vacuum Abelian case the use of holonomies is simpler and adequate. So in this section, I will describe nonseparability in electromagnetism using holonomies.

Given the space of loops – smooth embeddings into spacetime, $c : S^{1} \to M$ – one can form a basis of gauge-invariant quantities, called holonomies:

$$H(c) := \exp\left(i \oint_{c} A\right).$$

(5.3.1)

Clearly these are gauge-invariant, since under a gauge transformation $A \mapsto A + dλ$

$$H(c) \mapsto H(c) \exp\left(i \oint_{c} dλ\right) = H(c) \exp\big(i (λ(\overline{x}) - λ(\overline{x}))\big) = H(c),$$

(5.3.2)

where $\overline{x}$ is taken as any base point for the loop $c$ .
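The invariance just stated can be checked numerically. The following is only a minimal sketch under stated assumptions: the holonomy is taken to be the phase $\exp (i \oint_{c} A)$, matching the convention of (5.3.5); the loop is the unit circle in a two-dimensional plane; and the potential, the gauge parameter $λ$ and all function names are illustrative choices, not taken from the text.

```python
import cmath
import math


def line_integral(A, curve, n=2000):
    """Midpoint-rule approximation of the integral of the 1-form A along a curve.

    A(x, y) returns the components (A_x, A_y); curve(t) returns the point
    (x, y) reached at parameter t in [0, 1].
    """
    total = 0.0
    for k in range(n):
        x0, y0 = curve(k / n)
        x1, y1 = curve((k + 1) / n)
        ax, ay = A(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += ax * (x1 - x0) + ay * (y1 - y0)
    return total


def holonomy(A, curve):
    """H(c) = exp(i * integral of A along c), a unit complex number."""
    return cmath.exp(1j * line_integral(A, curve))


def potential(x, y):
    # Illustrative potential (an assumption) with uniform curvature 0.6 dx ^ dy.
    return (-0.3 * y, 0.3 * x)


def gauge_parameter_gradient(x, y):
    # Gradient of the illustrative gauge parameter lambda = sin(x) * y + x**2.
    return (math.cos(x) * y + 2.0 * x, math.sin(x))


def transformed_potential(x, y):
    # Gauge-transformed potential A + d(lambda).
    ax, ay = potential(x, y)
    gx, gy = gauge_parameter_gradient(x, y)
    return (ax + gx, ay + gy)


def unit_circle(t):
    angle = 2.0 * math.pi * t
    return (math.cos(angle), math.sin(angle))


h_before = holonomy(potential, unit_circle)
h_after = holonomy(transformed_potential, unit_circle)

# The two values agree up to discretisation error, although the potentials differ.
print(abs(h_after - h_before))
```

The holonomy itself is far from trivial here (its phase is the enclosed flux), so the agreement is not an artefact of both values being equal to one.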

In order to compose holonomies for different curves, we take $c_{-}$ to be an open curve that ends where $c_{+}$ , another open curve, begins. Then we define the composition $c_{-} \circ c_{+}$ as a map from $[- 1 / 2, 1 / 2]$ into $M$ , which takes $[- 1 / 2, 0]$ to traverse $c_{-}$ and $[0, 1 / 2]$ to traverse $c_{+}$ . Following this composition law, it is easy to see that

$$H(c_{-} \circ c_{+}) = H(c_{-}) H(c_{+}),$$

(5.3.3)

with the right-hand side understood as complex multiplication in the Abelian case. Thus, if $c_{-} = c_{+}^{- 1}$ (the same curve with opposite orientation), $H(c_{-}) H(c_{+}) = \mathrm{Id}$. This property underlies the graphical calculus of Figure 3.

Figure 3 The composition properties of holonomies guarantee separability in the absence of holes: the two arrows going along the middle line cancel out.

Suppose we split a simply connected region into two patches that only overlap in their common, simply connected boundary. If we compose regional loops $c_{+}, c_{-}$ that run in opposite directions along the boundary, the contributions of the shared segments cancel out, and so, as Myrvold (Reference Myrvold2011) argues, we recover the gauge-invariant holonomy corresponding to a larger loop $c$ not contained in either region. In this case, as seen in Figure 3, any holonomy can be recovered from the holonomies of curves intrinsic to the patches, and so Abelian gauge theory in vacuum is separable.
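The cancellation along the shared boundary, and the composition law (5.3.3) behind it, can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration: the two patches are adjacent unit squares in a plane, the potential is an arbitrary smooth choice, the holonomy is taken as the phase $\exp (i \int A)$, and the helper names are hypothetical.

```python
import cmath


def segment_integral(A, p, q, n=400):
    """Midpoint-rule integral of the 1-form A along the straight segment p -> q."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for k in range(n):
        ax, ay = A(x0 + (k + 0.5) * dx, y0 + (k + 0.5) * dy)
        total += ax * dx + ay * dy
    return total


def holonomy(A, vertices):
    """exp(i * integral of A) along the polygonal path through `vertices`."""
    phase = sum(segment_integral(A, p, q) for p, q in zip(vertices, vertices[1:]))
    return cmath.exp(1j * phase)


def potential(x, y):
    # Illustrative potential (an assumption) with curvature 3y dx ^ dy.
    return (-y * y, x * y)


# Counterclockwise loops around two adjacent patches and around their union.
left_loop = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
right_loop = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
outer_loop = [(0, 0), (2, 0), (2, 1), (0, 1), (0, 0)]

h_left = holonomy(potential, left_loop)
h_right = holonomy(potential, right_loop)
h_outer = holonomy(potential, outer_loop)

# Composition law: c_minus ends where c_plus begins; together they form left_loop.
c_minus = [(0, 0), (1, 0), (1, 1)]
c_plus = [(1, 1), (0, 1), (0, 0)]
h_minus = holonomy(potential, c_minus)
h_plus = holonomy(potential, c_plus)
h_composed = holonomy(potential, c_minus + c_plus[1:])

# A curve followed by its reverse has trivial holonomy.
h_reversed = holonomy(potential, c_minus[::-1])

print(abs(h_left * h_right - h_outer))     # shared edge x = 1 cancels
print(abs(h_composed - h_minus * h_plus))  # composition law
print(abs(h_minus * h_reversed - 1))       # opposite orientations cancel
```

The shared edge at $x = 1$ is traversed upwards by the left loop and downwards by the right loop, which is the cancellation depicted by the two arrows along the middle line of Figure 3.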

According to Myrvold (Reference Myrvold2011), separability fails only for nonsimply connected manifolds. For, as can be seen in Figure 4, holonomies intrinsic to the two regions could not cancel out around the hole, for they are not collinear there.

Figure 4 In the presence of holes, there may be holonomies that are not separable into holonomies that are intrinsic to the patches that do not contain the holes.
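A hedged numerical sketch of the situation in Figure 4: it assumes an idealised solenoid at the origin of a plane, with the flat potential $A = (Φ / 2 π) \, d θ$ outside it, where the flux $Φ$, the sample loops and all function names are illustrative assumptions. The sample loops confined to the right or left of the hole have trivial holonomy whatever the flux, while the loop around the hole has holonomy $\exp (i Φ)$.

```python
import cmath
import math


def segment_integral(A, p, q, n=400):
    """Midpoint-rule integral of the 1-form A along the straight segment p -> q."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for k in range(n):
        ax, ay = A(x0 + (k + 0.5) * dx, y0 + (k + 0.5) * dy)
        total += ax * dx + ay * dy
    return total


def holonomy(A, vertices):
    """exp(i * integral of A) along the polygonal path through `vertices`."""
    phase = sum(segment_integral(A, p, q) for p, q in zip(vertices, vertices[1:]))
    return cmath.exp(1j * phase)


def solenoid_potential(flux):
    """Flat potential (flux / 2 pi) d(theta) on the plane minus the origin."""
    def A(x, y):
        r2 = x * x + y * y
        return (-flux * y / (2.0 * math.pi * r2), flux * x / (2.0 * math.pi * r2))
    return A


# Counterclockwise square loops: one around the hole, one inside each patch.
around_hole = [(2, -2), (2, 2), (-2, 2), (-2, -2), (2, -2)]
in_right_patch = [(1, -1), (3, -1), (3, 1), (1, 1), (1, -1)]
in_left_patch = [(-3, -1), (-1, -1), (-1, 1), (-3, 1), (-3, -1)]


def sample_holonomies(flux):
    """Return (right-patch, left-patch, around-hole) holonomies for a given flux."""
    A = solenoid_potential(flux)
    return (
        holonomy(A, in_right_patch),
        holonomy(A, in_left_patch),
        holonomy(A, around_hole),
    )


for flux in (0.7, 2.1):
    print(flux, sample_holonomies(flux))
```

Two different values of the flux thus agree on these patch-intrinsic holonomies but disagree on the holonomy around the hole, which is the sense in which the regional data fail to fix the global state.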

But Myrvold (2011) considers only the vacuum case. And, as becomes evident using gauge-fixings (cf. Gomes (Reference Gomes2021)), in the presence of charges we can get nonseparability even in a trivial topological background (see also Greaves & Wallace (Reference Greaves and Wallace2014); ’t Hooft (Reference t Hooft1980); Wallace (Reference Wallace2022)). In the Abelian case, we can use holonomies to exhibit, but not quantify, this nonseparability, as follows.

Let again $A$ be the electromagnetic gauge potential, and $ψ, \overline{ψ}$ the charged Klein-Gordon field and its conjugate. A gauge transformation maps

$$A(x) \mapsto A(x) + dλ(x), \quad ψ(x) \mapsto \exp(i λ(x)) \, ψ(x), \quad \overline{ψ}(x) \mapsto \exp(- i λ(x)) \, \overline{ψ}(x).$$

(5.3.4)

Consider one positive and one negative charge at the points $x_{1}$ and $x_{2}$ , represented by fields $ψ$ and $\overline{ψ}$ that have singular support on $x_{1}$ and $x_{2}$ respectively, and an open curve $c$ whose initial and final points are $x_{1}$ and $x_{2}$ , respectively. Then

$$Q(c) := \overline{ψ}(x_{2}) \exp\left(i \int_{c} A\right) ψ(x_{1})$$

(5.3.5)

is also gauge-invariant, since

$$\begin{aligned} \exp\left(i \int_{c} A\right) &\mapsto \exp\left(i (λ(x_{2}) - λ(x_{1})) + i \int_{c} A\right) \\ &= \exp(i λ(x_{2})) \exp\left(i \int_{c} A\right) \exp(- i λ(x_{1})), \end{aligned}$$

(5.3.6)

which cancels with the transformations of $ψ$ and $\overline{ψ}$ , by (5.3.4). Here, because we are assuming there are no charges except at the ends of $c$ , we cannot break $Q (c)$ up into gauge-invariant quantities $Q (c_{1, 2, \dots})$ attached to smaller segments $c_{1, 2, \dots} \subset c$ .Footnote ⁵⁶
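A minimal numerical sketch of both claims follows. It assumes that $ψ$ sits at the initial point $x_{1}$ of $c$ and $\overline{ψ}$ at its final point $x_{2}$, with $c$ a straight segment in a plane; the field values, the potential, the gauge parameter and the helper names are all illustrative assumptions. The full quantity $Q (c)$ is unchanged by the transformation (5.3.4), whereas the factor attached to the first half of $c$ is not, since no charge sits at the midpoint to absorb the phase picked up there.

```python
import cmath
import math


def segment_integral(A, p, q, n=400):
    """Midpoint-rule integral of the 1-form A along the straight segment p -> q."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for k in range(n):
        ax, ay = A(x0 + (k + 0.5) * dx, y0 + (k + 0.5) * dy)
        total += ax * dx + ay * dy
    return total


def wilson_line(A, p, q):
    """exp(i * integral of A) along the straight segment p -> q."""
    return cmath.exp(1j * segment_integral(A, p, q))


def potential(x, y):
    # Illustrative potential (an assumption).
    return (-y * y, x * y)


def gauge_parameter(x, y):
    # Illustrative gauge parameter lambda (an assumption).
    return math.sin(x) * y + x * x


def transformed_potential(x, y):
    # A + d(lambda), using the exact gradient of gauge_parameter.
    ax, ay = potential(x, y)
    return (ax + math.cos(x) * y + 2.0 * x, ay + math.sin(x))


x1, x2 = (0.0, 0.0), (2.0, 1.0)  # endpoints of c, where the charges sit
midpoint = (1.0, 0.5)            # a point of c with no charge on it
psi_x1 = 0.8 + 0.3j              # value of psi at x1 (illustrative)
psibar_x2 = 0.5 - 1.1j           # value of psi-bar at x2 (illustrative)

lam1 = gauge_parameter(*x1)
lam2 = gauge_parameter(*x2)

# Q(c) before and after the gauge transformation of A, psi and psi-bar.
q_before = psibar_x2 * wilson_line(potential, x1, x2) * psi_x1
q_after = (
    cmath.exp(-1j * lam2) * psibar_x2
    * wilson_line(transformed_potential, x1, x2)
    * cmath.exp(1j * lam1) * psi_x1
)

# The piece attached to the first half of c picks up an uncompensated phase.
w_before = wilson_line(potential, x1, midpoint) * psi_x1
w_after = wilson_line(transformed_potential, x1, midpoint) * cmath.exp(1j * lam1) * psi_x1

print(abs(q_after - q_before))  # small: Q(c) is gauge-invariant
print(abs(w_after - w_before))  # not small: the half-segment piece is not
```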

5.4 Conclusion

Here is one thing the AB effect illustrates: even though curvature encodes the local geometric tensors that involve derivatives of the connection, there are geometric facts that arise from comparing vectors parallel-transported along different curves. These facts involve not derivatives of the connection but its integrals. This applies to internal as well as spacetime vectors: there are several treatments of (close analogues of) the AB effect within general relativity; cf. Anandan (Reference Anandan1977), Dowker (Reference Dowker1967), Ford & Vilenkin (Reference Ford and Vilenkin1981). And although topological facts are, in a sense, nonlocal, these facts are mostly important in the Abelian case, where they merely allow sufficiently distinct connections all with the same curvature.

While it is true that holonomies in electromagnetism are nonlocal, symmetry-invariant quantities, the kind of nonlocality that they evince is general. For instance, it arises from using properties of the fields to fix the internal frames in which initial data is uniquely propagated by the equations of motion, a procedure called gauge-fixing.

But that is not the most important point that is illustrated by the AB effect. As we saw, the effect also illustrates nonseparability (even in the vacuum, Abelian case). Nonseparability is a feature not only of gauge theories, but also of other theories with symmetry, even nonrelativistic particle mechanics. It implies that fixing the symmetry-invariant content of subsystems does not fix the symmetry-invariant content of the composition of those subsystems. In other words: the physical content intrinsic to subsystems can be put together in more ways than one. Since these different ways are obtained from each other by “external symmetry transformations” (cf. Gomes (Reference Gomes2021)), we see here a new reason for introducing at least some symmetries: because they describe the physically inequivalent ways to couple the same subsystem states.


About the Series

This Cambridge Elements series provides concise and structured introductions to all the central topics in the philosophy of physics. The Elements in the series are written by distinguished senior scholars and bright junior scholars with relevant expertise, producing balanced, comprehensive coverage of multiple perspectives in the philosophy of physics.
