1 Introduction
The stellarator is a promising approach to magnetic confinement, as it does not require plasma current to produce rotational transform and thus is inherently steady state and stable to current-driven modes. However, collisionless trajectories are not guaranteed to be confined in three-dimensional geometry as they are in axisymmetric systems (Gibson & Taylor Reference Gibson and Taylor1967; Helander Reference Helander2014). This can lead to poor confinement of energetic particles and increased neoclassical transport. A possible solution is the application of numerical optimization techniques to carefully tailor the magnetic geometry for improved confinement properties. One of the first demonstrations of this technique was in the design of the Wendelstein 7-X (W7-X) stellarator (Grieger et al.
Reference Grieger, Lotz, Merkel, Nührenberg, Sapper, Strumberger, Wobig, Burhenn, Erckmann and Gasparino1992; Lotz, Nührenberg & Schwab Reference Lotz, Nührenberg and Schwab1990), which was optimized for small neoclassical transport in the
$1/\unicode[STIX]{x1D708}$
regime and for small bootstrap current, in addition to several other physics criteria.
Another approach to improve confinement in stellarators is to obtain a configuration whose magnetic field strength exhibits a symmetry direction when expressed in Boozer coordinates, known as quasi-symmetry (Nührenberg & Zille Reference Nührenberg and Zille1988). This leads to a conserved canonical momentum of the guiding centre motion such that neoclassical properties are similar to those in a tokamak. However, perfect quasi-symmetry can never be achieved globally in practice (Garren & Boozer Reference Garren and Boozer1991; Landreman & Sengupta Reference Landreman and Sengupta2018), and it is often desirable to include symmetry-breaking components of the magnetic field strength in consideration of other design parameters, such as magneto-hydrodynamic (MHD) stability and energetic particle confinement (Nelson et al. Reference Nelson, Berry, Brooks, Cole, Chrzanowski, Fan, Fogarty, Goranson, Heitzenroeder and Hirshman2003; Henneberg et al. Reference Henneberg, Drevlak, Nührenberg, Beidler, Turkin, Loizu and Helander2019). As one must allow for breaking of quasi-symmetry, it remains essential to include a measure of neoclassical transport such that the symmetry-breaking harmonics of the field strength do not significantly degrade the confinement.
Neoclassical transport is governed by solutions of the drift kinetic equation, (DKE) (2.1), from which moments (e.g. radial fluxes and bootstrap current) are computed. The DKE local to a flux surface can be solved numerically (Landreman et al. Reference Landreman, Smith, Mollén and Helander2014; Belli & Candy Reference Belli and Candy2015). However, this four-dimensional problem is expensive to solve within an optimization loop, especially in low-collisionality regimes for which increased pitch-angle resolution is required in the collisional boundary layer.
Therefore, it may be desirable to consider an analytic reduction of the DKE. Under the assumption of low collisionality, a bounce-averaged DKE can be considered (Beidler & D’haeseleer Reference Beidler and D’haeseleer1995; Calvo et al.
Reference Calvo, Velasco, Parra, Alonso and García-Regaña2018). While bounce averaging can significantly reduce the computational cost by decreasing the spatial dimensionality, this approach typically requires restrictions on the geometry, such as closeness to omnigeneity or a model magnetic field. Additional reduction of the DKE can be made in low-collisionality regimes, resulting in semi-analytic expressions. For example the effective ripple,
$\unicode[STIX]{x1D716}_{\text{eff}}$
(Nemov et al.
Reference Nemov, Kasilov, Kernbichler and Heyn1999), quantifies the geometric dependence of the
$1/\unicode[STIX]{x1D708}$
radial transport, where
$\unicode[STIX]{x1D708}$
is the collision frequency, and has been widely used during optimization studies (Zarnstorff et al.
Reference Zarnstorff, Berry, Brooks, Fredrickson, Fu, Hirshman, Hudson, Ku, Lazarus and Mikkelsen2001; Ku et al.
Reference Ku, Garabedian, Lyon, Turnbull, Grossman, Mau, Zarnstorff and Team2008; Henneberg et al.
Reference Henneberg, Drevlak, Nührenberg, Beidler, Turkin, Loizu and Helander2019). This model, though, assumes very small values of the radial electric field,
$E_{r}$
, which is not always an experimentally relevant regime. A low-collisionality semi-analytic bootstrap current model (Shaing et al.
Reference Shaing, Crume, Tolliver, Hirshman and Van Rij1989) is also commonly adopted for stellarator design (Beidler et al.
Reference Beidler, Grieger, Herrnegger, Harmeyer, Lotz, Maassberg, Merkel, Nührenberg, Rau and Sapper1990; Hirshman et al.
Reference Hirshman, Spong, Whitson, Nelson, Batchelor, Lyon, Sanchez, Brooks, Y.-Fu and Goldston1999). However, this analytic expression is known to be ill-behaved near rational surfaces. Furthermore, benchmarks with numerical solutions of the DKE in the low-collisionality limit have been shown to differ significantly from the semi-analytic model (Beidler et al.
Reference Beidler, Allmaier, Isaev, Kasilov, Kernbichler, Leitold, Maaßberg, Mikkelsen, Murakami and Schmidt2011; Kernbichler et al.
Reference Kernbichler, Kasilov, Kapper, Martitsch, Nemov, Albert and Heyn2016). Any analytic reduction of the DKE implies additional assumptions, such as on the collisionality, size of
$E_{r}$
, or on the magnetic geometry.
Due to the limitations of bounce-averaged and semi-analytic models, there are benefits to computing neoclassical quantities using numerical solutions to the DKE without approximation. With the numerical methods currently used for stellarator optimization, this approach becomes computationally challenging within an optimization loop. Due to their fully three-dimensional nature, optimization of stellarator geometry requires navigation through high-dimensional spaces, such as the space of the shape of the outer boundary of the plasma or the shapes of electromagnetic coils. The number of parameters required to describe these spaces,
$N$
, is often quite large (
$O(10^{2})$
). Knowledge of the gradient of the objective function with respect to these parameters can greatly improve the convergence to a local minimum. Once a descent direction is identified, each iteration reduces to a one-dimensional line search. Gradient-based optimization with the Levenberg–Marquardt algorithm in the STELLOPT code (Strickler et al.
Reference Strickler, Hirshman, Spong, Cole, Lyon, Nelson, Williamson and Ware2004) has been widely used in the stellarator community and led to the design of the National Compact Stellarator Experiment, NCSX (Reiman et al.
Reference Reiman, Fu, Hirshman, Ku, Monticello, Mynick, Redi, Spong, Zarnstorff and Blackwell1999).
Although derivative information is valuable, numerically computing the derivative of a figure of merit
$f$
(for example, with finite difference derivatives) can be prohibitively expensive, as
$f$
must be evaluated
$O(N)$
times. For neoclassical optimization, this implies solving the DKE
$O(N)$
times; thus including finite-collisionality neoclassical quantities in the objective function is often impractical. In this work we describe an adjoint method for neoclassical optimization. With this method, the computation of the derivatives of
$f$
with respect to
$N$
parameters has cost comparable to solving the DKE twice, thus making the inclusion of these quantities possible within an optimization loop. In this work we obtain derivatives of neoclassical figures of merit with respect to local geometric parameters on a surface rather than the outer boundary or coil shapes. However, the geometric derivatives we compute provide an important step toward adjoint-based optimization of MHD equilibria, as discussed in § 5.2.2.
Adjoint methods have been applied in many fields including aerodynamic engineering and computational fluid dynamics (Pironneau Reference Pironneau1974; Glowinski & Pironneau Reference Glowinski and Pironneau1975), geophysics (Fichtner, Bunge & Igel Reference Fichtner, Bunge and Igel2006; Plessix Reference Plessix2006), structural engineering (Allaire et al. Reference Allaire, De Gournay, Jouve and Toader2005) and tokamak divertor design (Dekeyser, Reiter & Baelmans Reference Dekeyser, Reiter and Baelmans2014a ,Reference Dekeyser, Reiter and Baelmans b ,Reference Dekeyser, Reiter and Baelmans c ). They have only recently been implemented for stellarator design, namely for the design of coil shapes (Paul et al. Reference Paul, Landreman, Bader and Dorland2018) and efficiently computing shape gradients for MHD equilibria (Antonsen, Paul & Landreman Reference Antonsen, Paul and Landreman2019). The numerical method is quite general and has the potential to greatly impact many inverse design problems in magnetic confinement fusion.
In § 2 we provide an overview of the numerical solution of the DKE local to a flux surface. In § 3 the adjoint neoclassical method is described. Two approaches to the adjoint method, termed continuous and discrete, are presented, and their implementation and benchmarks are discussed in § 4. The adjoint method is used to compute derivatives of moments of the neoclassical distribution function with respect to local geometric quantities. The derivative information can be used to identify regions of increased sensitivity to magnetic perturbations, as discussed in § 5.1. We demonstrate adjoint-based optimization in § 5.2.1 by locally modifying the field strength on a flux surface. A discussion of the application of this method for optimization of MHD equilibria is presented in 5.2.2. Finally, the adjoint method is applied to accelerate the calculation of the ambipolar electric field in § 5.3.
2 Drift kinetic equation
The stellarator Fokker–Planck iterative neoclassical solver (SFINCS) code (Landreman et al. Reference Landreman, Smith, Mollén and Helander2014) solves the drift kinetic equation,

for general stellarator geometry. Here
$\boldsymbol{b}=\boldsymbol{B}/B$
is a unit vector in the direction of the magnetic field,
$v_{\Vert }=\boldsymbol{v}\boldsymbol{\cdot }\boldsymbol{b}$
is the parallel component of the velocity and
$2\unicode[STIX]{x03C0}\unicode[STIX]{x1D713}$
is the toroidal flux. The Fokker–Planck collision operator is
$C_{s}(f_{1s})$
, linearized about a Maxwellian
$f_{Ms}=n_{s}v_{ts}^{-3}\unicode[STIX]{x03C0}^{-3/2}\text{e}^{-v^{2}/v_{ts}^{2}}$
where
$v_{ts}=\sqrt{2T_{s}/m_{s}}$
is the thermal speed,
$n_{s}$
is the density,
$T_{s}$
is the temperature,
$m_{s}$
is the mass and the subscript indicates species. In (2.1), derivatives are performed holding
$W_{s}=m_{s}v^{2}/2+q_{s}\unicode[STIX]{x1D6F7}$
and
$\unicode[STIX]{x1D707}=v_{\bot }^{2}/2B$
fixed, where
$v=\sqrt{\boldsymbol{v}\boldsymbol{\cdot }\boldsymbol{ v}}$
is the magnitude of velocity,
$\unicode[STIX]{x1D6F7}$
is the electrostatic potential,
$v_{\bot }=\sqrt{v^{2}-v_{\Vert }^{2}}$
is the perpendicular velocity and
$q_{s}$
is the charge. The radial magnetic drift is

assuming a magnetic field in MHD force balance, and
$\boldsymbol{v}_{E}$
is the
$\boldsymbol{E}\times \boldsymbol{B}$
velocity

Throughout we assume
$\unicode[STIX]{x1D6F7}=\unicode[STIX]{x1D6F7}(\unicode[STIX]{x1D713})$
such that (2.1) is linear. In (2.1) we will not consider the effect of inductive electric fields, as this can be assumed to be small for stellarators without inductive current drive. We also do not consider the effects of magnetic drifts tangential to the flux surface in (2.1), as these only become important when
$E_{r}$
is small (Paul et al.
Reference Paul, Landreman, Poli, Spong, Smith and Dorland2017).
SFINCS solves (2.1) locally on a flux surface
$\unicode[STIX]{x1D713}$
, thus it is four-dimensional. The SFINCS coordinates include two angles (poloidal angle
$\unicode[STIX]{x1D703}$
and toroidal angle
$\unicode[STIX]{x1D701}$
), speed
$x_{s}=v/v_{ts}$
and pitch angle
$\unicode[STIX]{x1D709}_{s}=v_{\Vert }/v$
. Specifics about the implementation of (2.1) in the SFINCS code are described in appendix A. We will refer to two choices of implementation, the full trajectory model and the DKES trajectory model. The DKES model is employed in the drift kinetic equation solver (DKES) (van Rij & Hirshman Reference van Rij and Hirshman1989). The full trajectory model maintains
$\unicode[STIX]{x1D707}$
conservation as radial coupling (terms involving
$\unicode[STIX]{x2202}f_{1s}/\unicode[STIX]{x2202}\unicode[STIX]{x1D713}$
) is dropped. While the DKES model does not conserve
$\unicode[STIX]{x1D707}$
when the radial electric field
$E_{r}\neq 0$
, the adjoint operator under the DKES model takes a particularly simple form as discussed in § 3.1. This model also does not introduce any unphysical constraints on the distribution function when
$E_{r}=0$
, as occurs for the full trajectory model (Landreman et al.
Reference Landreman, Smith, Mollén and Helander2014). These constraints motivate the introduction of particle and heat sources, which are discussed in the following section. We will discuss some of the details of the implementation of the DKE in the SFINCS code as these need to be considered in arriving at the adjoint equation. However, the adjoint neoclassical approach is quite general and could be implemented in other drift kinetic codes with slight modification.
From solutions of (2.1), several neoclassical quantities are computed, including the flux surface-averaged parallel flow,

the radial particle flux,

and the radial heat flux (sometimes referred to as an energy flux),

We will also consider species-summed quantities including the bootstrap current,
$J_{b}=\sum _{s}q_{s}n_{s}V_{\Vert ,s}$
, the radial current,
$J_{r}=\sum _{s}q_{s}\unicode[STIX]{x1D6E4}_{s}$
, and the total heat flux,
$Q_{\text{tot}}=\sum _{s}Q_{s}$
. Here the effective normalized radius is
$\unicode[STIX]{x1D70C}=\sqrt{\unicode[STIX]{x1D713}/\unicode[STIX]{x1D713}_{0}}$
, where
$2\unicode[STIX]{x03C0}\unicode[STIX]{x1D713}_{0}$
is the toroidal flux at the boundary.
2.1 Sources and constraints
To avoid unphysical constraints on
$f_{1s}$
implied by the moment equations of (2.1) in the presence of a non-zero
$E_{r}$
(Landreman et al.
Reference Landreman, Smith, Mollén and Helander2014), particle and heat sources are added to the DKE (A 1),

where
$S_{1s}^{f}(\unicode[STIX]{x1D713})$
and
$S_{2s}^{f}(\unicode[STIX]{x1D713})$
are unknowns such that
$S_{1s}^{f}$
provides a particle source and
$S_{2s}^{f}$
provides a heat source. The collisionless trajectory operator in SFINCS coordinates is

and the inhomogeneous drive term is
$\mathbb{S}_{0s}=-(\boldsymbol{v}_{\text{m}s}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713})\unicode[STIX]{x2202}f_{Ms}/\unicode[STIX]{x2202}\unicode[STIX]{x1D713}$
. The source functions are determined via the requirement that
$\langle \int \text{d}^{3}v\,f_{1s}\rangle _{\unicode[STIX]{x1D713}}=0$
and
$\langle \int \text{d}^{3}v\,x_{s}^{2}f_{1s}\rangle _{\unicode[STIX]{x1D713}}=0$
(i.e.
$f_{1s}$
does not provide net density or pressure). So, the following system of equations is solved,

The velocity-space averaging operations are denoted
$\mathbb{L}_{1s}\,f_{1s}=\langle \int \text{d}^{3}v\,f_{1s}\rangle _{\unicode[STIX]{x1D713}}$
and
$\mathbb{L}_{2s}\,f_{1s}=\langle \int \text{d}^{3}v\,f_{1s}x_{s}^{2}\rangle _{\unicode[STIX]{x1D713}}$
. The full multi-species system can be written as,

Here the linear systems corresponding to each species as in (2.9) are coupled through the collision operator. We use the following notation to refer to the above system,

3 Adjoint approach
The goal of the adjoint neoclassical approach is to efficiently compute derivatives of a moment of the distribution function,
${\mathcal{R}}$
(e.g.
$V_{\Vert ,s},\unicode[STIX]{x1D6E4}_{s},Q_{s},J_{b},J_{r},Q_{\text{tot}}$
), with respect to many parameters. Consider a set of parameters,
$\unicode[STIX]{x1D6FA}=\{\unicode[STIX]{x1D6FA}_{i}\}_{i=1}^{N_{\unicode[STIX]{x1D6FA}}}$
, on which
${\mathcal{R}}$
depends. Computing a forward difference derivative with respect to
$\unicode[STIX]{x1D6FA}$
requires
$N_{\unicode[STIX]{x1D6FA}}+1$
solutions of (2.11). With the adjoint approach,
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
can be computed with one solution of (2.11) and one solution of a linear adjoint equation of the same size as (2.11). Thus if
$N_{\unicode[STIX]{x1D6FA}}$
is very large and the solution to (2.11) is computationally expensive to obtain, the adjoint approach can reduce the cost by a factor of
$N_{\unicode[STIX]{x1D6FA}}$
. For stellarator optimization, it is desirable to compute derivatives with respect to parameters which describe the magnetic geometry. In fully three-dimensional geometry,
$N_{\unicode[STIX]{x1D6FA}}$
is
$O(10^{2})$
and solving (2.11) is the most expensive part of computing
${\mathcal{R}}$
(rather than constructing the linear system or taking a moment of the distribution function). Thus the adjoint approach can provide a computational savings of a factor of
$O(10^{2})$
. The adjoint method is also advantageous over numerical derivatives, as it avoids additional noise from discretization error. In what follows we consider
$\unicode[STIX]{x1D6FA}$
to be a set of parameters describing the magnetic geometry, which will be specified in § 4.
We compute the derivatives of
${\mathcal{R}}$
using two approaches. In the first approach, we define an inner product which involves integrals over the distribution function, and an adjoint operator is obtained with respect to this inner product. This we refer to as the continuous approach. In the second approach, we consider the DKE after discretization, defining an adjoint operator with respect to the Euclidean dot product. This we refer to as the discrete approach. While these approaches should provide identical results within discretization error, the advantages and drawbacks of each approach will be discussed at the end of § 3.2.
3.1 Continuous approach
Let
$F=\{F_{s}\}_{s=1}^{N_{\text{species}}}$
be the set of unknowns computed with SFINCS before discretization, denoted by the column vector in (2.10) with
$F_{s}$
given by (2.9). That is,
$F$
consists of a set of
$N_{\text{species}}$
distribution functions over
$(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701},x_{s},\unicode[STIX]{x1D709}_{s})$
and their associated source functions. We define an inner product between two such quantities in the following way,

Here the superscript on
$S_{1s}$
and
$S_{2s}$
denotes the distribution function with which the source functions are associated and the sum is over species. The space of continuous functions,
$F$
, of this form such that
$\langle F,F\rangle$
is bounded will be denoted by
${\mathcal{H}}$
. It can be seen that (3.1) is indeed an inner product, as it satisfies conjugate symmetry (
$\langle G,F\rangle =\langle F,G\rangle$
$\forall F,G\in {\mathcal{H}}$
), linearity (
$\langle F+G,H\rangle =\langle F,H\rangle +\langle G,H\rangle$
$\forall F,G,H\in {\mathcal{H}}$
and
$\langle F,aG\rangle =a\langle F,G\rangle$
$\forall F,G\in {\mathcal{H}}$
,
$a\in \mathbb{R}$
), and positive definiteness (
$\langle F,F\rangle \geqslant 0$
and
$\langle F,F\rangle =0$
only if
$F=0$
$\forall F\in {\mathcal{H}}$
) (Rudin Reference Rudin2006). This implies that if
${\mathcal{H}}$
is finite-dimensional, then for any linear operator
$L$
there exists a unique adjoint operator
$L^{\dagger }$
such that
$\langle LF,G\rangle =\langle F,L^{\dagger }G\rangle$
for all
$F,G\in {\mathcal{H}}$
. While here
${\mathcal{H}}$
is not finite-dimensional, we will show that such an adjoint operator exists for this inner product.
Note that the norm associated with this inner product
$\Vert F\Vert =\sqrt{\langle F,F\rangle }$
is similar to the free energy norm,

which obeys a conservation equation in gyrokinetic theory (Krommes & Hu Reference Krommes and Hu1994; Abel et al. Reference Abel, Plunk, Wang, Barnes, Cowley, Dorland and Schekochihin2013; Landreman, Plunk & Dorland Reference Landreman, Plunk and Dorland2015). The choice of inner product (3.1) is advantageous, as the linearized Fokker–Planck collision operator becomes self-adjoint for species linearized about Maxwellians with the same temperature. In what follows, we assume that all included species are of the same temperature. This assumption could be lifted, with a modification to the collision operator that appears in the adjoint equation (see appendix B). This assumption is not necessary when using the discrete approach (see § 3.2).
Consider a moment of the distribution function
${\mathcal{R}}\in \{V_{\Vert ,s},\unicode[STIX]{x1D6E4}_{s},Q_{s},J_{b},J_{r},Q_{\text{tot}}\}$
, which can be written as an inner product with a vector
$\widetilde{{\mathcal{R}}}\in {\mathcal{H}}$
,

according to (3.1). For example,

where the column structure corresponds with that in (2.9) and (2.10).
We are interested in computing the derivative of
${\mathcal{R}}$
with respect to a set of parameters,
$\unicode[STIX]{x1D6FA}=\{\unicode[STIX]{x1D6FA}_{i}\}_{i=1}^{N_{\unicode[STIX]{x1D6FA}}}$
. This derivative can be computed with the chain rule,

The subscripts in (3.5) denote the quantity that is held fixed while the derivative is computed. The first term on the right-hand side accounts for the explicit dependence on
$\unicode[STIX]{x1D6FA}_{i}$
while the second accounts for the implicit dependence on
$\unicode[STIX]{x1D6FA}_{i}$
through
$F$
. Here
$(\unicode[STIX]{x2202}F/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}_{i})_{\mathbb{L}F=\mathbb{S}}$
can be computed by considering perturbations to the linear system (2.11), noting that in general both
$\mathbb{L}$
and
$\mathbb{S}$
can depend on
$\unicode[STIX]{x1D6FA}$
,

Computing
$(\unicode[STIX]{x2202}F/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA})_{\mathbb{L}F=\mathbb{S}}$
using (3.6) requires solving
$N_{\unicode[STIX]{x1D6FA}}$
linear systems of the same dimension as the DKE (2.11). To avoid this additional computational cost, we instead solve an adjoint equation,

In what follows, we show that the adjoint variable,
$q^{{\mathcal{R}}}$
, can be used to compute
$(\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA})_{\mathbb{L}F=\mathbb{S}}$
without solving (3.6) for each
$\unicode[STIX]{x1D6FA}_{i}$
. Using (3.7) with (3.5),

and applying the adjoint property, we obtain

Using (3.6),

So, (3.10) provides the same derivative information as (3.5). Thus, using (3.10), the derivative with respect to
$\unicode[STIX]{x1D6FA}$
can be computed with the solution to two linear systems, (2.11) and (3.7). The partial derivatives on the right-hand side of (3.10) can be computed analytically by considering the explicit geometric dependence of
${\mathcal{R}}$
,
$\mathbb{L}$
and
$\mathbb{S}$
.
When
$N_{\unicode[STIX]{x1D6FA}}$
is large, the cost of computing
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
using (3.10) is dominated not by the linear solve but by constructing
$\unicode[STIX]{x2202}\mathbb{S}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
and
$\unicode[STIX]{x2202}\mathbb{L}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
and computing the inner product. Thus the cost still scales with
$N_{\unicode[STIX]{x1D6FA}}$
. However, we obtain a significant savings in comparison with forward difference derivatives, as shown in § 4.
The adjoint operator for each species takes the following form,

where
$\mathbb{L}_{1s}^{\dagger }=5/2\mathbb{L}_{1s}-\mathbb{L}_{2s}$
and
$\mathbb{L}_{2s}^{\dagger }=3/2\mathbb{L}_{1s}-\mathbb{L}_{2s}$
. The same column structure is used as for the forward operator (2.10),
$\mathbb{L}^{\dagger }=\{\mathbb{L}_{s}^{\dagger }\}_{i=1}^{N_{\text{species}}}$
. The quantity
$\mathbb{L}_{0s}^{\dagger }$
satisfies
$\langle \int \text{d}^{3}v\,g_{1s}\mathbb{L}_{0s}\,f_{1s}/f_{Ms}\rangle _{\unicode[STIX]{x1D713}}=\langle \int \text{d}^{3}v\,f_{1s}\mathbb{L}_{0s}^{\dagger }g_{1s}/f_{Ms}\rangle _{\unicode[STIX]{x1D713}}$
and depends on which trajectory model is applied. The expression (3.11) can be verified by noting that

For the DKES trajectories the adjoint operator is,

This anti-self-adjoint property is used in obtaining the variational principle which provides bounds on neoclassical transport coefficients in the DKES code (van Rij & Hirshman Reference van Rij and Hirshman1989). For full trajectories it is,

The anti-self-adjoint property does not hold for this trajectory model as the
$\boldsymbol{E}\times \boldsymbol{B}$
drift (C 9) is no longer divergenceless. See appendix C for details on obtaining these adjoint operators.
3.2 Discrete approach
Next we consider the discrete adjoint approach. Let
$\overrightarrow{\boldsymbol{F}}$
be the set of unknowns computed with SFINCS after discretization of
$F$
. The linear DKE (2.11) upon discretization can then be written schematically as

In this case, we can define an inner product as the vector dot product,

In real Euclidean space, the adjoint operator,
$(\overleftrightarrow{\boldsymbol{L}})^{\dagger }$
, which satisfies

is simply the transpose of the matrix,
$(\overleftrightarrow{\boldsymbol{L}})^{\text{T}}$
. Again, the moments of the distribution function,
${\mathcal{R}}$
can be expressed as an inner product with a vector
$\overrightarrow{\boldsymbol{R}}$
,

Using the discrete approach, the following adjoint equation must be solved

The adjoint variable,
$\overrightarrow{\boldsymbol{q}}^{{\mathcal{R}}}$
, can again be used to compute
$(\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}_{i})_{\overleftrightarrow{\boldsymbol{L}}\overrightarrow{\boldsymbol{F}}=\overrightarrow{\boldsymbol{S}}}$
,

As with the continuous approach, the partial derivatives on the right-hand side can be computed analytically. In this way, the derivative of
${\mathcal{R}}$
with respect to
$\unicode[STIX]{x1D6FA}$
can be computed with only two linear solves, (3.15) and (3.19).
In the SFINCS implementation, the DKE is typically solved with the preconditioned GMRES algorithm. In the continuous approach, a preconditioner matrix for both the forward and adjoint operator must be lower–upper (
$LU$
)-factorized. Here the preconditioner matrix is the same as the full matrix but without cross-species or speed coupling. As the adjoint matrix is sufficiently different from the forward matrix, we do not obtain convergence when the same preconditioner is used for both problems. However, in the discrete approach, the
$LU$
-factorization for the preconditioner of the forward matrix can be reused for the preconditioner of the adjoint matrix (If a matrix
$\unicode[STIX]{x1D63C}$
has been factorized as
$\unicode[STIX]{x1D63C}=LU$
then
$\unicode[STIX]{x1D63C}^{\text{T}}=U^{\text{T}}L^{\text{T}}$
where
$U^{\text{T}}$
is lower triangular and
$L^{\text{T}}$
is upper triangular). This provides a significant reduction in memory and computational cost for the discrete approach.
Furthermore, the discrete adjoint approach provides the exact derivatives for the discretized problem. With this method the adjoint equation is obtained using the vector dot product and matrix transpose which can be computed without any numerical approximation. The error in the derivatives obtained by the adjoint method are therefore only limited by the tolerance to which the linear solve is performed with GMRES. On the other hand, the continuous adjoint approach relies on a continuous inner product which must ultimately be approximated numerically. Thus the continuous approach provides the exact derivatives only in the limit that the discrete approximation of the inner product exactly reproduces the continuous inner product. Therefore we expect the results of the discrete and adjoint approaches to agree within discretization error, as will be demonstrated in § 4.
The continuous approach can be advantageous in that an adjoint equation may be prescribed independently of the discretization scheme. Note that in the discrete approach, the adjoint operator is obtained from the matrix transpose of the discretized forward operator, which implies that the same spatial and velocity resolution parameters must be used for both the forward and adjoint solutions. In this work we will employ the same discretization parameters for both the adjoint and forward problems, but this restriction is not required for the continuous approach.
4 Implementation and benchmarks
The adjoint method has been implemented in the SFINCS code using both the discrete and continuous approaches. The magnetic geometry is specified in Boozer coordinates (Helander Reference Helander2014) such that the covariant form of the magnetic field is,

where
$I(\unicode[STIX]{x1D713})=\unicode[STIX]{x1D707}_{0}I_{T}(\unicode[STIX]{x1D713})/2\unicode[STIX]{x03C0}$
and
$G(\unicode[STIX]{x1D713})=\unicode[STIX]{x1D707}_{0}I_{P}(\unicode[STIX]{x1D713})/2\unicode[STIX]{x03C0}$
,
$I_{T}(\unicode[STIX]{x1D713})$
is the toroidal current enclosed by
$\unicode[STIX]{x1D713}$
, and
$I_{P}(\unicode[STIX]{x1D713})$
is the poloidal current outside of
$\unicode[STIX]{x1D713}$
. The contravariant form is

where
$\unicode[STIX]{x1D704}(\unicode[STIX]{x1D713})$
is the rotational transform. The Jacobian is obtained from dotting (4.1) with (4.2),

As
$K(\unicode[STIX]{x1D713},\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})$
does not appear in any of the trajectory coefficients ((A 2) and (A 5)), in the drive term in (A 1), or in the geometric factors used to define the moments of the distribution function ((2.4) to (2.6)), all the geometric dependence enters through
$B(\unicode[STIX]{x1D713},\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})$
,
$G(\unicode[STIX]{x1D713})$
,
$I(\unicode[STIX]{x1D713})$
and
$\unicode[STIX]{x1D704}(\unicode[STIX]{x1D713})$
. We choose to use Boozer coordinates for these computations as it reduces the number of geometric parameters that must be considered, but the neoclassical adjoint method is not limited to this choice of coordinate system.
We approximate
$B$
by a truncated Fourier series,

where
$j$
sums over Fourier modes
$m_{j}\leqslant m_{\max }$
and
$|n_{j}|\leqslant n_{\max }$
such that
$n_{j}$
is an integer multiple of
$N_{P}$
, the number of field periods. In (4.4), we have assumed stellarator symmetry such that
$B(-\unicode[STIX]{x1D703},-\unicode[STIX]{x1D701})=B(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})$
, and
$N_{p}$
symmetry such that
$B(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701}+2\unicode[STIX]{x03C0}/N_{P})=B(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})$
. Thus we compute derivatives with respect to the parameters
$\unicode[STIX]{x1D6FA}=\{B_{mn}^{c},I(\unicode[STIX]{x1D713}),G(\unicode[STIX]{x1D713}),\unicode[STIX]{x1D704}(\unicode[STIX]{x1D713})\}$
. Additionally, derivatives with respect to
$E_{r}$
are computed, which are used for efficient ambipolar solutions and computing derivatives of geometric quantities at ambipolarity (see § 5.3) rather than at fixed
$E_{r}$
.
To demonstrate, we compute
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{00}^{c}$
for moments of the ion distribution function using the discrete and continuous adjoint methods. A 3-mode model of the standard configuration W7-X geometry at
$\unicode[STIX]{x1D70C}=\sqrt{\unicode[STIX]{x1D713}/\unicode[STIX]{x1D713}_{0}}=0.5$
is used (table 1 in Beidler et al. (Reference Beidler, Allmaier, Isaev, Kasilov, Kernbichler, Leitold, Maaßberg, Mikkelsen, Murakami and Schmidt2011)),

where
$B_{01}^{c}=0.04645B_{00}^{c}$
,
$B_{11}^{c}=-0.04351B_{00}^{c}$
and
$B_{10}^{c}=-0.01902B_{00}^{c}$
. Electron and ion (
$Z=1$
) species are included, and the derivatives are computed at the ambipolar
$E_{r}$
with the full trajectory model. The derivatives are also computed with a forward difference approach with varying step size
$\unicode[STIX]{x0394}B_{00}^{c}$
. In figure 1 we show the fractional difference between
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{00}^{c}$
computed using the adjoint method and with forward difference derivatives. We see that at large values of
$\unicode[STIX]{x0394}B_{00}^{c}$
, the adjoint and numerical derivatives begin to differ significantly due to discretization error from the forward difference approximation. The fractional error decreases proportional to
$\unicode[STIX]{x0394}B_{00}^{c}$
as expected until the rounding error begins to dominate (Sauer Reference Sauer2012) when
$\unicode[STIX]{x0394}B_{00}^{c}/(B_{00}^{c})$
is approximately
$10^{-4}$
, where
$B_{00}^{c}$
is the value of the unperturbed mode. The discrete and continuous approaches show qualitatively similar trends, though the minimum fractional difference is lower in the discrete approach due to the additional discretization error that arises with the continuous approach. With sufficient resolution parameters (41
$\unicode[STIX]{x1D703}$
grid points, 61
$\unicode[STIX]{x1D701}$
grid points, 85
$\unicode[STIX]{x1D709}$
basis functions and 7
$x$
basis functions), the fractional error of the continuous approach is
${\leqslant}0.1\,\%$
and should not be significant for most applications. We find similar agreement for other derivatives and with the DKES trajectory model.

Figure 1. Fractional difference between derivatives with respect to
$B_{00}^{c}$
computed with the adjoint method and with a forward difference derivative with step size
$\unicode[STIX]{x0394}B_{00}^{c}$
. The full trajectory model was used with (a) the discrete and (b) the continuous adjoint approaches.
To demonstrate that the discrete and continuous methods indeed produce the same derivative information, we compute the fractional difference between the derivatives computed with the two methods as a function of the resolution parameters. As an example, in figure 2(a) we show the fractional difference in
$\unicode[STIX]{x2202}Q_{i}/\unicode[STIX]{x2202}\unicode[STIX]{x1D704}$
, where
$Q_{i}$
is the radial ion heat flux, as a function of the number of Legendre polynomials used for the pitch angle discretization,
$N_{\unicode[STIX]{x1D709}}$
, keeping the other resolution parameters fixed. As
$N_{\unicode[STIX]{x1D709}}$
is increased, the fractional differences converges to a finite value, approximately
$10^{-4}$
, due to the discretization error in the other resolution parameters. Similar resolution parameters are required for the convergence of the moment itself,
$Q_{i}$
, and its derivative computed with the continuous method,
$\unicode[STIX]{x2202}Q_{i}/\unicode[STIX]{x2202}\unicode[STIX]{x1D704}$
. Convergence of
$Q_{i}$
within 5 % is obtained with
$N_{\unicode[STIX]{x1D709}}=38$
, similar to that required for the convergence of
$\unicode[STIX]{x2202}Q/\unicode[STIX]{x2202}\unicode[STIX]{x1D704}$
, as can be seen in figure 2(a).
In figure 2(b) we compare the cost of calculating derivatives of one moment with respect to
$N_{\unicode[STIX]{x1D6FA}}$
parameters using the continuous and discrete adjoint methods and forward difference derivatives. All computations are performed on the Edison computer at NERSC using 48 processors, and the elapsed wall time is reported. Here we include the cost of solving the linear system and computing diagnostics
$N_{\unicode[STIX]{x1D6FA}}+1$
times for the forward difference approach, and the cost of solving the forward and adjoint linear systems and computing diagnostics for the adjoint approaches. The cost of the continuous approach is slightly more than that of the discrete approach due to the cost of factorizing the adjoint preconditioner. However, at large
$N_{\unicode[STIX]{x1D6FA}}$
the cost of computing diagnostics for the adjoint approach (e.g. computing
$\unicode[STIX]{x2202}\mathbb{S}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
and
$\unicode[STIX]{x2202}\mathbb{L}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
and performing the inner product in (3.10)) dominates that of solving the adjoint linear system; thus the discrete and continuous approaches become comparable in cost. In this regime, the adjoint approach provides speed-up by a factor of approximately
$50$
.

Figure 2. (a) The fractional difference between
$\unicode[STIX]{x2202}Q_{i}/\unicode[STIX]{x2202}\unicode[STIX]{x1D704}$
computed with the continuous and discrete approaches converges with the number of pitch-angle Legendre modes,
$N_{\unicode[STIX]{x1D709}}$
. (b) Comparison of the computational cost of computing
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
with forward difference derivatives and the adjoint approach as a function of
$N_{\unicode[STIX]{x1D6FA}}$
, the number of parameters in the gradient.
5 Applications of the adjoint method
5.1 Local magnetic sensitivity analysis
Using the adjoint method, it is possible to compute derivatives of a moment of the distribution function with respect to the Fourier amplitudes of the field strength,
$\{\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{c}\}$
. Rather than consider sensitivity in Fourier space, we would like to compute the sensitivity to local perturbations of the field strength. We now quantify the relationship between these two representations of sensitivity information.
Consider the Gâteaux functional derivative (Delfour & Zolâsio Reference Delfour and Zolâsio2011) of
${\mathcal{R}}$
with respect to
$B$
,

Here we consider a perturbation to the field strength at fixed
$I(\unicode[STIX]{x1D713})$
,
$G(\unicode[STIX]{x1D713})$
and
$\unicode[STIX]{x1D704}(\unicode[STIX]{x1D713})$
. As
$\unicode[STIX]{x1D6FF}{\mathcal{R}}(\unicode[STIX]{x1D6FF}B;B(\boldsymbol{r}))$
is a linear functional of
$\unicode[STIX]{x1D6FF}B$
, by the Riesz representation theorem (Rudin Reference Rudin2006),
$\unicode[STIX]{x1D6FF}{\mathcal{R}}$
can be expressed as an inner product with
$\unicode[STIX]{x1D6FF}B$
and some element of the appropriate space. The function
$\unicode[STIX]{x1D6FF}B$
is defined on a flux surface,
$\unicode[STIX]{x1D713}$
; thus it is sensible to express
$\unicode[STIX]{x1D6FF}{\mathcal{R}}$
in the following way,

Here
$\unicode[STIX]{x1D6FF}B(\boldsymbol{r})$
describes the local perturbation to the field strength, and
$\unicode[STIX]{x1D6FF}{\mathcal{R}}$
quantifies the corresponding change to the moment
${\mathcal{R}}$
. The function
$S_{{\mathcal{R}}}$
is analogous to the shape gradient, which quantifies the change in a figure of merit which results from a differential perturbation to a shape (Landreman & Paul Reference Landreman and Paul2018). The shape gradient will be discussed further in § 5.2.2.
Suppose that
$B$
is stellarator symmetric and
$N_{P}$
symmetric. If
$E_{r}=0$
, then
$S_{{\mathcal{R}}}$
must also possess stellarator and
$N_{P}$
symmetry (see appendix D). However, when
$E_{r}\neq 0$
,
$S_{{\mathcal{R}}}$
is no longer guaranteed to have stellarator symmetry. Nonetheless, it may be desirable to ignore the stellarator-asymmetric part of
$S_{{\mathcal{R}}}$
if an optimized stellarator-symmetric configuration is desired. For the remainder of this work, we will make this assumption, though the analysis could be extended to consider the effect of breaking of stellarator symmetry. The quantity
$S_{{\mathcal{R}}}$
can be approximated by a truncated Fourier series under these assumptions,

where
$k$
sums over
$m\leqslant m_{\max }$
and
$|n|\leqslant n_{\max }$
such that
$n$
is an integer multiple of
$N_{P}$
. The quantity
$\unicode[STIX]{x1D6FF}B(\boldsymbol{r})$
can be written in terms of perturbations to the Fourier coefficients,

where again the sum is only taken over
$N_{P}$
symmetric modes. Now
$\unicode[STIX]{x1D6FF}{\mathcal{R}}$
can be written in terms of perturbations to the Fourier coefficients,

In this way, (5.2) can be expressed as a linear system,

where

If the same number of modes is used to discretize
$\unicode[STIX]{x1D6FF}{\mathcal{R}}$
and
$S_{{\mathcal{R}}}$
, then the linear system is square.

Figure 3. (a) The local magnetic sensitivity function for the bootstrap current,
$S_{J_{b}}$
, is shown for the W7-X standard configuration. Positive values indicate that increasing the field strength at a given location will increase
$J_{b}$
through (5.2). (b) The local sensitivity function for the ion particle flux,
$S_{\unicode[STIX]{x1D6E4}_{i}}$
.
In contrast to derivatives with respect to the Fourier modes of
$B$
, the sensitivity function,
$S_{{\mathcal{R}}}$
, is a spatially local quantity, quantifying the change in a figure of merit resulting from a local perturbation of the field strength. In this way,
$S_{{\mathcal{R}}}$
can inform where perturbations to the magnetic field strength can be tolerated. The sensitivity function could be related directly to a local magnetic tolerance using the method described in section 9 of Landreman & Paul (Reference Landreman and Paul2018). In contrast with that work, here we are considering perturbations to the field strength on any flux surface rather than at the plasma boundary. However,
$S_{{\mathcal{R}}}$
still provides insight into where trim coils should be placed or coil displacements can be tolerated without sacrificing desired neoclassical properties. The sensitivity function can also be used for gradient-based optimization in the space of the field strength on a flux surface as in § 5.2.1.
We compute
$S_{J_{b}}$
for the W7-X standard configuration at
$\unicode[STIX]{x1D70C}=0.70$
, shown in figure 3(a). We use a fixed-boundary equilibrium that preceded the coil design and does not include coil ripple, and the full equilibrium is used rather than the truncated Fourier series considered in § 4. The same resolution parameters are used as in § 4, and derivatives with respect to
$B_{mn}^{c}$
are computed for
$m_{\max }=n_{\max }=20$
. The largest modes for this configuration are the helical curvature
$B_{11}^{c}$
, the toroidal curvature
$B_{10}^{c}$
and the toroidal mirror
$B_{01}^{c}$
. We find that
$S_{J_{b}}$
is large and negative on the inboard side, indicating that increasing the magnitude of the toroidal curvature component of
$B$
would lead to an increase in
$J_{b}$
. This result is in agreement with previous analysis of the dependence of the bootstrap current on these three modes in the W7-X magnetic configuration space (Maassberg, Lotz & Nührenberg Reference Maassberg, Lotz and Nührenberg1993), which found that at low collisionality the bootstrap current coefficients depend strongly on the toroidal curvature. In figure 3(b) is the sensitivity function for the ion particle flux,
$S_{\unicode[STIX]{x1D6E4}_{i}}$
, computed for the same configuration using
$m_{\max }=20$
and
$n_{\max }=25$
. We find that the particle flux is more sensitive to perturbations on the outboard side in localized regions, while on the inboard side the sensitivity is relatively small in magnitude.
5.2 Gradient-based optimization
5.2.1 Optimization of field strength
As a second demonstration of the adjoint neoclassical method, we consider optimizing in the space of the field strength on a surface, taking
$\unicode[STIX]{x1D6FA}=\{B_{mn}^{c}\}$
. As Boozer coordinates are used, the covariant form (4.1) satisfies
$(\unicode[STIX]{x1D735}\times \boldsymbol{B})\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}=0$
and the contravariant form (4.2) satisfies
$\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{B}=0$
. As we will artificially modify the field strength while keeping other geometry parameters fixed, the resulting field will not necessarily satisfy both of these conditions with both the covariant and contravariant forms. While there is no guarantee that the resulting field strength will be consistent with a global equilibrium solution, it provides insight into how local changes to the field strength can impact neoclassical properties. As a second step, the outer boundary could be optimized to match the desired field strength on a single surface. In § 5.2.2, we discuss how the derivatives computed in this work could be coupled to optimization of an MHD equilibrium.
We perform optimization with a Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton method (Nocedal & Wright Reference Nocedal and Wright1999) using an objective function
$\unicode[STIX]{x1D712}^{2}=J_{b}^{2}$
. A backtracking line search is used at each iteration to find a step size that satisfies a condition of sufficient decrease of
$\unicode[STIX]{x1D712}^{2}$
. We use the same equilibrium as in § 5.1, retaining modes
$m\leqslant 12$
and
$|n|\leqslant 12$
, and compute derivatives with respect to these modes. Convergence to
$\unicode[STIX]{x1D712}^{2}\leqslant 10^{-10}$
was obtained within 8 BFGS iterations (28 function evaluations), as shown in figure 4(a). The difference in field strength between the initial and optimized configuration,
$B_{\text{opt}}-B_{\text{init}}$
, is shown in figure 4(b). As expected from the analysis in § 5.1, the field strength increased on the outboard side and decreased on the inboard side in comparison with
$B_{\text{init}}$
.

Figure 4. (a) Convergence of
$\unicode[STIX]{x1D712}^{2}=J_{b}^{2}$
for optimization over
$\unicode[STIX]{x1D6FA}=\{B_{mn}^{c}\}$
with an adjoint-based BFGS method. (b) The change in field strength from the initial to optimized configuration.
5.2.2 Optimization of MHD equilibria
The local sensitivity function,
$S_{{\mathcal{R}}}$
, along with
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}I$
,
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}G$
, and
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D704}$
, can be used to determine how perturbations to the outer boundary of the plasma,
$\unicode[STIX]{x2202}\unicode[STIX]{x1D6E4}$
, result in perturbations to
${\mathcal{R}}$
. This is quantified through the idea of the shape gradient, which is described below. The partial derivatives of
${\mathcal{R}}$
can be computed with the adjoint method outlined in § 3, and the shape gradient can be obtained with only one additional MHD equilibrium solution through the application of another adjoint method.
Consider a figure of merit which is integrated over a toroidal domain,
$\unicode[STIX]{x1D6E4}$
,

where
$w(\unicode[STIX]{x1D713})$
is a weighting function. That is, SFINCS is run on a set of
$\unicode[STIX]{x1D713}$
surfaces within
$\unicode[STIX]{x1D6E4}$
and the volume integral is computed numerically. Here we consider
$\unicode[STIX]{x2202}\unicode[STIX]{x1D6E4}$
to be the plasma boundary used for a fixed-boundary MHD equilibrium calculation. The perturbation to
$f_{{\mathcal{R}}}$
resulting from normal perturbation to
$\unicode[STIX]{x2202}\unicode[STIX]{x1D6E4}$
can be written in the following form,

under certain assumptions of smoothness (Delfour & Zolésio Reference Delfour and Zolésio2011). This can be thought of as another instance of the Riesz representation theorem, as
$\unicode[STIX]{x1D6FF}f_{{\mathcal{R}}}$
is a linear functional of
$\unicode[STIX]{x1D6FF}\boldsymbol{r}$
. Here
$\boldsymbol{n}$
is the outward unit normal on
$\unicode[STIX]{x2202}\unicode[STIX]{x1D6E4}$
and
$\unicode[STIX]{x1D6FF}\boldsymbol{r}$
is a vector field describing the perturbation to the surface. Intuitively, only normal perturbations to
$\unicode[STIX]{x2202}\unicode[STIX]{x1D6E4}$
result in a change to
$f_{{\mathcal{R}}}$
. The shape gradient is
${\mathcal{G}}$
, which quantifies the contribution of a local normal perturbation of the boundary to the change in
$f_{{\mathcal{R}}}$
. The shape gradient can be used for fixed-boundary optimization of equilibria or for analysis of sensitivity to perturbations of magnetic surfaces. It can be computed using a second adjoint method, where a perturbed MHD force balance equation is solved with the addition of a bulk force which depends on derivatives computed from the neoclassical adjoint method (Antonsen et al.
Reference Antonsen, Paul and Landreman2019). While the continuous neoclassical adjoint method described in this work arises from the self-adjointness of the linearized Fokker–Planck operator, the adjoint method for MHD equilibria arises from the self-adjointness of the MHD force operator. In practice these two adjoint methods could be coupled by first computing an MHD equilibrium solution, computing neoclassical transport and its geometric derivatives from this equilibrium with the neoclassical adjoint method, and passing these derivatives back to the equilibrium code to compute the shape gradient with the perturbed MHD adjoint method. In this way derivatives of neoclassical quantities with respect to the shape of the outer boundary are computed with only two equilibrium solutions and two DKE solutions. This calculation will be reported in a future publication.
Rather than solve an additional adjoint equation, the outer boundary could be optimized by numerically computing derivatives of
$\{B_{mn}^{c}(\unicode[STIX]{x1D713}),G(\unicode[STIX]{x1D713}),I(\unicode[STIX]{x1D713})\}$
with respect to the double Fourier series describing the outer boundary shape in cylindrical coordinates,
$\{R_{mn}^{c},Z_{mn}^{s}\}$
, using a finite difference method. This could be done using the STELLOPT code (Spong et al.
Reference Spong, Hirshman, Whitson, Batchelor, Carreras, Lynch and Rome1998; Reiman et al.
Reference Reiman, Fu, Hirshman, Ku, Monticello, Mynick, Redi, Spong, Zarnstorff and Blackwell1999) with BOOZ_XFORM (Sanchez et al.
Reference Sanchez, Hirshman, Ware, Berry and Spong2000) to perform the coordinate transformation. For example, if the rotational transform is held fixed in the variational moments equilibrium code (VMEC) equilibrium calculation (Hirshman & Whitson Reference Hirshman and Whitson1983), the derivative of a moment,
${\mathcal{R}}$
, with respect to a boundary coefficient,
$R_{mn}^{c}$
, can be computed as,

where
$\unicode[STIX]{x2202}{\mathcal{R}}(\unicode[STIX]{x1D713})/\unicode[STIX]{x2202}B_{mn}^{c}(\unicode[STIX]{x1D713})$
,
$\unicode[STIX]{x2202}{\mathcal{R}}(\unicode[STIX]{x1D713})/\unicode[STIX]{x2202}G(\unicode[STIX]{x1D713})$
and
$\unicode[STIX]{x2202}{\mathcal{R}}(\unicode[STIX]{x1D713})/\unicode[STIX]{x2202}I(\unicode[STIX]{x1D713})$
are computed with the neoclassical adjoint method and
$\unicode[STIX]{x2202}B_{mn}^{c}(\unicode[STIX]{x1D713})/\unicode[STIX]{x2202}R_{mn}^{c}$
,
$\unicode[STIX]{x2202}G(\unicode[STIX]{x1D713})/\unicode[STIX]{x2202}R_{mn}^{c}$
and
$\unicode[STIX]{x2202}I(\unicode[STIX]{x1D713})/\unicode[STIX]{x2202}R_{mn}^{c}$
are computed with finite difference derivatives using STELLOPT. Similarly, derivatives of
$\{B_{mn}^{c}(\unicode[STIX]{x1D713}),G(\unicode[STIX]{x1D713}),I(\unicode[STIX]{x1D713})\}$
could be computed with respect to coil parameters using a free-boundary equilibrium solution, allowing for direct optimization of neoclassical quantities with respect to coil shapes. The neoclassical calculation with SFINCS is typically significantly more expensive than the equilibrium calculation (for the geometry discussed in § 5.1 fixed-boundary VMEC took 54 s while SFINCS took 157 s on 4 processors of the NERSC Edison computer). As such, combining adjoint-based with finite difference derivatives can still result in a significant computational savings.
5.3 Ambipolarity
As stellarators are not intrinsically ambipolar, the radial electric field is not truly an independent parameter. The ambipolar
$E_{r}$
must be obtained which satisfies the condition
$J_{r}(E_{r})=0$
. The application of adjoint-based derivatives for computing the ambipolar solution is discussed in § 5.3.1. An adjoint method to compute derivatives with respect to geometric parameters at fixed ambipolarity is discussed in § 5.3.2.
5.3.1 Accelerating ambipolar solve
A nonlinear root-finding algorithm must be used to compute the ambipolar
$E_{r}$
. This root finding can be accelerated with derivative information, such as with a Newton–Raphson method (Press et al.
Reference Press, Teukolsky, Vetterling and Flannery2007). The derivative required,
$\unicode[STIX]{x2202}J_{r}/\unicode[STIX]{x2202}E_{r}$
, can be computed with the discrete or continuous adjoint method as described in § 3 with the replacement
$\unicode[STIX]{x1D6FA}_{i}\rightarrow E_{r}$
, considering
${\mathcal{R}}=J_{r}$
.

Figure 5. The ambipolar root is obtained with Brent, Newton–Raphson and Newton hybrid nonlinear root solvers. The derivatives obtained with the adjoint method provide better convergence properties for the Newton methods.
We implement three nonlinear root-finding methods: Brent’s method (Brent Reference Brent2013), the Newton–Raphson method and a hybrid between the bisection and Newton–Raphson methods (Press et al. Reference Press, Teukolsky, Vetterling and Flannery2007). Brent’s method guarantees at least linear convergence by combining quadratic interpolation with bisection and does not require derivatives. The Newton–Raphson method can provide quadratic convergence under certain assumptions but in general is not guaranteed to converge. If an iterate lies near a stationary point or a poor initial guess is given, the method can fail. For this reason we implement the hybrid method, which combines the possible quadratic convergence properties of Newton–Raphson with the guaranteed linear convergence of the bisection method. Both Brent’s method and the hybrid method require the root to be bracketed, and therefore may require additional function evaluations in order to obtain the bracket.
We compare these methods in figure 5, using the W7-X standard configuration considered in § 5.1 with the full trajectory model and the discrete adjoint approach, beginning with an initial guess of
$E_{r}=-10~\text{kV}~\text{m}^{-1}$
with bounds at
$E_{r}^{\min }=-100~\text{kV}~\text{m}^{-1}$
and
$E_{r}^{\max }=100~\text{kV}~\text{m}^{-1}$
. The root is located at
$E_{r}=-3.56~\text{kV}~\text{m}^{-1}$
. For this example, the hybrid and Newton methods had nearly identical convergence properties, though the Newton method is less expensive as it does not require
$J_{r}$
to be evaluated at the bounds of the interval. To obtain the same tolerance, the Newton method provided a 14 % savings in wall clock time over Brent’s method.
In the above discussion we have made the assumption that there is only one stable root of interest. Of course a given configuration may possess several roots, especially if the ions and electrons are in different collisionality regimes (Hastings, Houlberg & Shaing Reference Hastings, Houlberg and Shaing1985). Multiple roots can be obtained by performing several root solves with different initial values and brackets, which could be trivially parallelized. Thus the adjoint method could still provide an acceleration in this more general case.
5.3.2 Derivatives at ambipolarity
The adjoint method described in § 3 assumes that
$E_{r}$
is held constant when computing derivatives with respect to
$\unicode[STIX]{x1D6FA}$
. However,
$E_{r}$
cannot truly be determined independently from geometric quantities, as the ambipolar solution should be recomputed as the geometry is altered. It is therefore desirable to compute derivatives at fixed ambipolarity (fixed
$J_{r}=0$
) rather than at fixed
$E_{r}$
. This is performed by solving an additional adjoint equation,

in the continuous approach or

in the discrete approach. Details are described in appendix E.
It should be noted that by computing derivatives at ambipolarity we assume that a given moment
${\mathcal{R}}$
is a differentiable function of the geometry at fixed
$J_{r}=0$
. That is, this method cannot be applied to cases in which a stable root disappears as the geometry varies. As this will occur at a stationary point of
$J_{r}(E_{r})$
, this situation could be avoided within an optimization loop by computing derivatives at constant
$E_{r}$
rather than constant
$J_{r}$
if
$|\unicode[STIX]{x2202}J_{r}/\unicode[STIX]{x2202}E_{r}|$
falls below a given threshold at ambipolarity.

Figure 6. (a) The cost of computing the gradient
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}$
at ambipolarity scales with
$N_{\unicode[STIX]{x1D6FA}}$
, the number of parameters in
$\unicode[STIX]{x1D6FA}$
. (b) The fractional difference between
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{00}^{c}$
at constant ambipolarity obtained with the adjoint method and with finite difference derivatives.
Although an additional adjoint solve is required, this method of computing derivatives at ambipolarity is advantageous as several linear solves are typically required to obtain the ambipolar root. A comparison of the computational cost between the adjoint method and forward difference method for derivatives at ambipolarity is shown in figure 6(a). Here the full trajectory model is used, and the result for both the discrete and continuous adjoint methods are shown. For the finite difference derivative, the ambipolar solve is performed with Brent’s method at each step in
$\unicode[STIX]{x1D6FA}$
. As in figure 2(b), we find that for large
$N_{\unicode[STIX]{x1D6FA}}$
the cost of the continuous and discrete approaches are essentially the same, as the cost is no longer dominated by the linear solve. When computing the derivatives at ambipolarity, both adjoint methods decrease the cost by a factor of approximately
$200$
for large
$N_{\unicode[STIX]{x1D6FA}}$
.

Figure 7. The sensitivity function for the ion particle flux,
$S_{\unicode[STIX]{x1D6E4}_{i}}$
, is computed at (a) constant
$E_{r}$
and (b) constant
$J_{r}$
. Similarly,
$S_{J_{b}}$
is computed at (c) constant
$E_{r}$
and (d) constant
$J_{r}$
.
In figure 6(b) we show a benchmark between derivatives at ambipolarity,
$(\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{00}^{c})_{J_{r}}$
, computed with the discrete adjoint method and with forward difference derivatives. For the forward difference method, the Newton solver is used to obtain the ambipolar
$E_{r}$
as
$B_{00}^{c}$
is varied. As the forward difference step size
$\unicode[STIX]{x0394}B_{00}^{c}$
decreases, the fractional difference again decreases proportional to
$\unicode[STIX]{x0394}B_{00}^{c}$
until it reaches a minimum when
$\unicode[STIX]{x0394}B_{00}^{c}/B_{00}^{c}$
is approximately
$10^{-4}$
. In comparison with figure 1, we see that the minimum fractional difference is slightly larger at fixed ambipolarity than at fixed
$E_{r}$
, as the tolerance parameters associated with the Newton solver introduce an additional source of error to the forward difference approach.
In figure 7(a,b) we compare the sensitivity function for the particle flux,
$S_{\unicode[STIX]{x1D6E4}_{i}}$
, computed using derivatives at constant
$E_{r}$
with that computed at constant
$J_{r}$
. Here derivatives are computed using the discrete adjoint method with full trajectories, and the sensitivity function is constructed as described in § 5.1. The configuration and numerical parameters are the same as described in § 5.1. At constant
$J_{r}$
the large region of increased sensitivity on the outboard side that appears at constant
$E_{r}$
remains, though the overall magnitude of the sensitivity decreases. Thus it may be important to account for the effect of the ambipolar
$E_{r}$
when optimizing for radial transport. In figure 7(c,d) we perform the same comparison for
$S_{J_{b}}$
, finding the derivatives at fixed
$E_{r}$
and at fixed
$J_{r}$
to be virtually identical. This is to be expected, as numerical calculations of neoclassical transport coefficients for W7-X have found that the bootstrap coefficients are much less sensitive to
$E_{r}$
than those for the radial transport (figures 18 and 26 in Beidler et al. (Reference Beidler, Allmaier, Isaev, Kasilov, Kernbichler, Leitold, Maaßberg, Mikkelsen, Murakami and Schmidt2011)). Furthermore, the bootstrap current in the
$1/\unicode[STIX]{x1D708}$
regime is independent of
$E_{r}$
, and the finite-collisionality correction is small for optimized stellarators, such as W7-X (Helander, Parra & Newton Reference Helander, Parra and Newton2017). Therefore, the ambipolarity corrections to the derivatives are less important for
$J_{b}$
than for the radial transport.
6 Conclusions
We have described a method by which moments
${\mathcal{R}}$
of the neoclassical distribution function can be differentiated efficiently with respect to many parameters. The adjoint approach requires defining an inner product from which the adjoint operator is obtained. We consider two choices for this inner product. One choice corresponds with computing the adjoint of the linear operator after discretization, and the other corresponds with computing it before discretization. In the case of the former, the Euclidean dot product can be used, and in the case of the latter, an inner product whose corresponding norm is similar to the free energy norm (3.1) is defined. In § 4, we show that these approaches provide the same derivative information within discretization error, as expected. Both methods provide reduction in computational cost by a factor of approximately
$50$
in comparison with forward difference derivatives when differentiating with respect to many (
$O(10^{2})$
) parameters. In § 5.3.2 the adjoint method is extended to compute derivatives at ambipolarity. This method provides a reduction in cost by a factor of approximately
$200$
over a forward difference approach. We have implemented this method in the SFINCS code, and similar methods could be applied to other drift kinetic solvers.
In this work we consider derivatives with respect to geometric quantities that enter the DKE through Boozer coordinates. However, the adjoint neoclassical method we have described is much more general, allowing for many possible applications. For example, derivatives of the radial fluxes with respect to the temperature and density profiles could be used to accelerate the solution of the transport equations using a Newton method (Barnes et al. Reference Barnes, Abel, Dorland, Görler, Hammett and Jenko2010). The transport solution could furthermore be incorporated into the optimization loop to self-consistently evolve the macroscopic profiles in the presence of neoclassical fluxes. Rather than simply optimizing for minimal fluxes, an objective function such as the total fusion power could be considered (Highcock et al. Reference Highcock, Mandell, Barnes and Dorland2018), with optimization accelerated by adjoint-based derivatives.
Another application of the continuous adjoint formulation is correction of discretization error. The same solution obtained in § 3.1 can be used to quantify and correct for the error in a moment,
${\mathcal{R}}$
, providing similar accuracy to that computed with a higher-order stencil or finer mesh without the associated cost. This method has been applied in the field of computational fluid dynamics by solving adjoint Euler equations (Venditti & Darmofal Reference Venditti and Darmofal1999; Pierce & Giles Reference Pierce and Giles2004) and could prove useful for efficiently obtaining solutions of the DKE in low-collisionality regimes.
In § 5.2.1 we have shown an example of adjoint-based neoclassical optimization, where the optimization space is taken to be the Fourier modes of the field strength on a surface,
$\{B_{mn}^{c}\}$
. While optimization within this space is not necessarily consistent with a global equilibrium solution, it demonstrates the adjoint neoclassical method for efficient optimization. In § 5.2.2, two approaches to self-consistently optimize MHD equilibria are discussed. Further discussion and demonstration of these approaches will be provided in a future publication.
In appendix D we show that when
$E_{r}=0$
and the unperturbed geometry is stellarator symmetric, the sensitivity functions for moments of the distribution function are also stellarator symmetric. However, when
$E_{r}\neq 0$
this is no longer true. This implies that obtaining minimal neoclassical transport in the
$\sqrt{\unicode[STIX]{x1D708}}$
regime may require breaking of stellarator symmetry. In this work we have ignored the effects of stellarator symmetry breaking, although we hope to extend this work to study these effects in the future.
Acknowledgements
The authors acknowledge helpful discussion with L.-M. Imbert-Gérard and assistance with the STELLOPT code from S. Lazerson. This work was supported by the US Department of Energy through grants DE-FG02-93ER-54197 and DE-FC02-08ER-54964. The computations presented in this paper have used resources at the National Energy Research Scientific Computing Center (NERSC). Support for I.G.A. for the initial work was provided by the Chalmers University of Technology, under the auspices of the Framework grant for Strategic Energy Research (Dnr. 2014-5392) from Vetenskapsråadet.
Appendix A. Trajectory models
In the SFINCS coordinate system, the DKE can be written in the following way,

To obtain the trajectory coefficients (
$\dot{\boldsymbol{r}}$
,
${\dot{x}}_{s}$
and
$\dot{\unicode[STIX]{x1D709}}_{s}$
) several approximations are made. For example, any terms that require radial coupling (
$\unicode[STIX]{x1D713}$
derivatives of
$f_{1s}$
) cannot be retained, as this would necessitate solving a five-dimensional system.
Under the full trajectory model, the trajectory coefficients are chosen such that
$\unicode[STIX]{x1D707}$
conservation is maintained as radial coupling is dropped,

Under the DKES trajectory model, the
$\boldsymbol{E}\times \boldsymbol{B}$
velocity is taken to be divergenceless,

where the flux surface average of a quantity
$A$
is,

$\sqrt{g}=(\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\times \unicode[STIX]{x1D735}\unicode[STIX]{x1D703}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D701})^{-1}$
is the Jacobian, and
$V^{\prime }(\unicode[STIX]{x1D713})=\int _{0}^{2\unicode[STIX]{x03C0}}\text{d}\unicode[STIX]{x1D703}\int _{0}^{2\unicode[STIX]{x03C0}}\text{d}\unicode[STIX]{x1D701}\sqrt{g}$
. Under the DKES trajectory model, the trajectory coefficients are taken to be,

These effective trajectories are adopted in the widely used DKES code (Hirshman et al. Reference Hirshman, Shaing, van Rij, Beasley and Crume1986b ; van Rij & Hirshman Reference van Rij and Hirshman1989).
Appendix B. Adjoint collision operator
We want to find an adjoint collision operator,
$C_{s}^{\dagger }$
, that satisfies the following relation,

The linearized Fokker–Planck collision operator can be written as

where
$s^{\prime }$
sums over species. The first term on the right-hand side of (B 2) is referred to as the test-particle collision operator,
$C_{ss^{\prime }}^{\text{T}}(f_{1s})=C_{ss^{\prime }}(f_{1s},f_{Ms^{\prime }})$
, and the second the field-particle collision operator,
$C_{ss^{\prime }}^{F}(f_{1s^{\prime }})=C_{ss^{\prime }}(f_{Ms},f_{1s^{\prime }})$
. The test and field terms satisfy the following relations (Rosenbluth, Hazeltine & Hinton Reference Rosenbluth, Hazeltine and Hinton1972; Sugama, Watanabe & Nunami Reference Sugama, Watanabe and Nunami2009),


For collisions between species of the same temperature, we see that
$C_{s}(f_{1s})$
is self-adjoint. The adjoint operator with respect to the inner product (3.1) is thus,

Appendix C. Adjoint collisionless trajectories
We want to find an adjoint operator,
$\mathbb{L}_{0s}^{\dagger }$
, that satisfies,

for both trajectory models, where
$\mathbb{L}_{0s}$
is defined in (2.8) with (A 5) for the DKES trajectories model and (A 2) for the full trajectory model. Throughout we use the velocity-space element in SFINCS coordinates,
$\text{d}^{3}v=2\unicode[STIX]{x03C0}v_{ts}^{3}x_{s}^{2}\,\text{d}\unicode[STIX]{x1D709}_{s}\,\text{d}x_{s}$
.
C.1 DKES trajectories
The operator under consideration is,

Considering the contribution of the streaming term in (C 2) to the left-hand side of (C 1) we obtain,

Here the identity
$\langle \unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{Q}\rangle _{\unicode[STIX]{x1D713}}=1/V^{\prime }(\unicode[STIX]{x1D713})\unicode[STIX]{x2202}/\unicode[STIX]{x2202}\unicode[STIX]{x1D713}(V^{\prime }(\unicode[STIX]{x1D713})\langle \boldsymbol{Q}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\rangle _{\unicode[STIX]{x1D713}})$
for any vector
$\boldsymbol{Q}$
has been used. We next consider the contribution of the
$\boldsymbol{E}\times \boldsymbol{B}$
drift term in (C 2),

Here we have used the identity

for any
$w$
. We consider the contribution of the mirror-force term in (C 2),

Combining (C 3)–(C 6), we obtain

Therefore, in the DKES trajectory model we obtain (3.13).
C.2 Full trajectories
The operator under consideration for the full model is,

The contribution to (C 1) from the streaming term in (C 8) is identical to that in the case of the DKES trajectory model, (C 3). We next consider the contribution from the
$\boldsymbol{E}\times \boldsymbol{B}$
drift term in (C 8),

again using (C 5). The contribution from the
${\dot{x}}_{s}$
term in (C 8) is,

The contribution from the mirror term in (C 8) is the same as in the case of the DKES trajectories model (C 6). We consider the contribution from the final term in (C 8),

Combining (C 3), (C 9), (C 10), (C 6) and (C 11), we obtain

Therefore, under the full trajectory model we obtain (3.14).
Appendix D. Symmetry of the sensitivity function
In this appendix we discuss several symmetry properties of the local sensitivity function,
$S_{{\mathcal{R}}}$
, defined through (5.2). The arguments that follow are similar to those in appendix C of Landreman & Paul (Reference Landreman and Paul2018). Throughout we will assume that
$B$
is stellarator symmetric and
$N_{P}$
symmetric. We will show that this implies
$N_{P}$
symmetry of
$S_{{\mathcal{R}}}$
. In the limit that
$E_{r}\rightarrow 0$
, then
$S_{{\mathcal{R}}}$
also has stellarator symmetry.
D.1 Symmetry of
$S_{{\mathcal{R}}}$
implied by Fourier derivatives
First we would like to show that
$S_{{\mathcal{R}}}$
is stellarator symmetric if and only if
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{s}=0$
for all
$m$
and
$n$
, where we express
$B$
in a Fourier series,

The perturbation,
$\unicode[STIX]{x1D6FF}B$
, is decomposed similarly. We begin with the ‘if’ portion of the argument. From (5.2) we have,

Suppose
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{s}=0$
for all
$m$
and
$n$
. The quantity
$(\sqrt{g}S_{{\mathcal{R}}})$
can be represented as a Fourier series,

From (D 2), we see that
$A_{mn}^{s}=0$
for all
$m$
and
$m$
. Thus the quantity
$(\sqrt{g}S_{{\mathcal{R}}})$
must be even under the transformation
$(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})\rightarrow (-\unicode[STIX]{x1D703},-\unicode[STIX]{x1D701})$
. We now note that
$\sqrt{g}$
must be even from (4.3) under the assumption that
$B$
is stellarator symmetric. Therefore
$S_{{\mathcal{R}}}$
must be stellarator symmetric, assuming that
$\sqrt{g}$
does not vanish anywhere, which must be the case for any well-defined coordinate transformation.
We continue with the ‘only if’ portion of the argument. Suppose
$S_{{\mathcal{R}}}$
is stellarator symmetric. As
$\sqrt{g}$
is also stellarator symmetric,
$(\sqrt{g}S_{{\mathcal{R}}})$
can be expressed in a Fourier series as (D 3) with
$A_{mn}^{s}=0$
for all
$m$
and
$n$
. Thus from (D 2)
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{s}=0$
for all
$m$
and
$n$
.
We next show that if
$B$
is
$N_{P}$
symmetric, then
$S_{{\mathcal{R}}}$
is
$N_{P}$
symmetric if and only if
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{c}=0$
for all
$n$
that are not integer multiples of
$N_{P}$
. We begin with the ‘if’ portion of the argument. From (5.2),

Suppose
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{c}=0$
for all
$n$
which are not integer multiples of
$N_{P}$
. Here
$(\sqrt{g}S_{{\mathcal{R}}})$
can be expressed in a Fourier series as (D 3) with
$A_{mn}^{s}=0$
for all
$m$
and
$n$
. Inserting the Fourier series into (D 4), we find that
$A_{mn}^{c}=0$
for all
$n$
that are not integer multiples of
$N_{P}$
. Thus
$(\sqrt{g}S_{{\mathcal{R}}})$
must be
$N_{P}$
symmetric. As
$\sqrt{g}$
must be
$N_{P}$
symmetric, this implies
$S_{{\mathcal{R}}}$
possesses the same symmetry.
Next we consider the ‘only if’ portion of the argument. Suppose that
$S_{{\mathcal{R}}}$
is
$N_{P}$
symmetric. As
$\sqrt{g}$
is also
$N_{P}$
symmetric, then
$(\sqrt{g}S_{{\mathcal{R}}})$
can be expressed in a Fourier series as (D 3) where the sum includes
$n$
that are integer multiples of
$N_{P}$
. Inserting the Fourier series into (D 4), we find that
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{c}=0$
for all
$n$
that are not integer multiples of
$N_{P}$
.
D.2 Symmetry of Fourier derivatives
To continue, we need to show that
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{s}=0$
for all
$m$
and
$n$
and
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{c}=0$
for all
$n$
which are not integer multiples of
$N_{P}$
. We begin with the
$N_{P}$
symmetry argument. We consider the symmetry of
$f_{1s}$
implied by (A 1). Under the transformation
$\unicode[STIX]{x1D701}\rightarrow \unicode[STIX]{x1D701}+2\unicode[STIX]{x03C0}/N_{P}$
, we find that each of the trajectory coefficients remain unchanged, as well as the source term and collision operator. Therefore we can conclude that
$f_{1s}$
is
$N_{P}$
symmetric. We can also note that each of the
$\widetilde{{\mathcal{R}}}$
vectors are
$N_{P}$
symmetric, as well as
$\sqrt{g}$
. We consider the integrand that appears in the flux surface average in (3.3),

Here the superscript and subscript on
$\widetilde{{\mathcal{R}}}$
denotes that we consider the unknowns corresponding to the distribution function of species
$s$
. We note that
$D_{s}(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701}+2\unicode[STIX]{x03C0}/N_{P})=D_{s}(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})$
. The quantity
${\mathcal{R}}$
can be expressed in terms of
$D_{s}$
as follows,

Next we consider the functional derivative of
${\mathcal{R}}$
with respect to
$B$
, defined as in (5.1). The derivative with respect to
$B_{mn}^{c}$
can be thus defined as,

As the functional derivative maintains the
$N_{P}$
symmetry of
$D_{s}$
and
$\sqrt{g}$
, the quantity in parenthesis in (D 7) can be expressed in a Fourier series containing only
$n$
that are integer multiples of
$N_{P}$
. Thus we see that the quantity
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{c}=0$
for all
$n$
that are not integer multiples of
$N_{P}$
.
Next we consider a similar argument for stellarator symmetry. We begin by considering the symmetry of
$f_{1s}$
implied by (A 1) in the case
$E_{r}=0$
. Under the transformation
$(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701},v_{\Vert })\rightarrow (-\unicode[STIX]{x1D703},-\unicode[STIX]{x1D701},-v_{\Vert })$
, we see that both the collisionless trajectory operator and the collision operator maintain the parity of
$f_{1s}$
, while the source term is odd. Therefore,
$f_{1s}$
must be odd under this transformation. In this case, we can write
$f_{1s}$
as

where
$f_{a,s}^{-}(x_{s},-\unicode[STIX]{x1D709}_{s})=-f_{a,s}^{-}(x_{s},\unicode[STIX]{x1D709}_{s})$
,
$f_{a,s}^{+}(x_{s},-\unicode[STIX]{x1D709}_{s})=f_{a,s}^{+}(x_{s},-\unicode[STIX]{x1D709}_{s})$
and analogous expressions for
$f_{b,s}^{+}$
and
$f_{b,s}^{-}$
.
We next note that each of the
$\widetilde{{\mathcal{R}}}_{s}^{f}$
are odd under the transformation
$(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701},v_{\Vert })\rightarrow (-\unicode[STIX]{x1D703},-\unicode[STIX]{x1D701},-v_{\Vert })$
. As
$\sqrt{g}$
is even, then we can express
$\widetilde{{\mathcal{R}}}_{s}^{f}\sqrt{g}$
in a similar way to (D 8),

The integrand that appears in the flux surface average becomes,

We see that
$D_{s}$
is even with respect to the transformation
$(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})\rightarrow (-\unicode[STIX]{x1D703},-\unicode[STIX]{x1D701})$
. The quantity
${\mathcal{R}}$
can be written as in (D 6) and the derivative with respect to a stellarator-asymmetric mode is

The functional derivative with respect to
$B$
does not change the parity of
$D_{s}$
or
$\sqrt{g}$
, thus we see that the quantity in parenthesis in the above equation is even with respect to the transformation
$(\unicode[STIX]{x1D703},\unicode[STIX]{x1D701})\rightarrow (-\unicode[STIX]{x1D703},-\unicode[STIX]{x1D701})$
. Therefore,
$\unicode[STIX]{x2202}{\mathcal{R}}/\unicode[STIX]{x2202}B_{mn}^{s}=0$
for all
$m$
and
$n$
. A similar argument cannot be made if
$E_{r}\neq 0$
, as the inhomogeneous drive term in (A 1) no longer has definite parity. However, according to the arguments in (Hirshman, Shaing & Van Rij Reference Hirshman, Shaing and Van Rij1986a
) the transport coefficients do obey this symmetry property.
Appendix E. Derivatives at ambipolarity
In this appendix we derive an expression for derivatives of moments of the distribution function at fixed ambipolarity rather than fixed
$E_{r}$
by determining the relationship between geometry parameters,
$\unicode[STIX]{x1D6FA}$
, and
$E_{r}$
. To begin, it is assumed that the continuous adjoint approach outlined in § 3.1 is used. The approach taken here is analogous to that used in appendix A of Paul et al. (Reference Paul, Landreman, Bader and Dorland2018), in which an additional adjoint equation is used to compute derivatives at a fixed constraint function for optimization of stellarator coil shapes.
Consider the set of unknowns computed with SFINCS,
$F$
, which depends on parameters
$\unicode[STIX]{x1D6FA}$
and
$E_{r}$
. The total differential of
$F$
satisfies

which follows from (2.11). Consider
$J_{r}(F,\unicode[STIX]{x1D6FA})$
, which depends on
$E_{r}$
through
$F$
. The total differential of
$J_{r}$
can be computed,

which can be written using (E 1) and the solution to (5.11),

By enforcing
$\text{d}J_{r}(F(\unicode[STIX]{x1D6FA},E_{r}),\unicode[STIX]{x1D6FA})=0$
, we obtain the relationship between
$E_{r}$
and
$\unicode[STIX]{x1D6FA}$
at ambipolarity,

Consider a moment of the distribution function,
${\mathcal{R}}(F,\unicode[STIX]{x1D6FA})$
. The derivative with respect to
$\unicode[STIX]{x1D6FA}_{i}$
at fixed ambipolarity can thus be computed,

The first term corresponds to the explicit dependence on
$\unicode[STIX]{x1D6FA}_{i}$
, while the second contains dependence through
$F$
. Here
$(\unicode[STIX]{x2202}F/\unicode[STIX]{x2202}\unicode[STIX]{x1D6FA}_{i})_{J_{r}}$
satisfies,

from (E 1) using (E 4). Using (E 6) and (3.7), we find

An analogous expression can be obtained using the discrete approach,

where (5.12) has been used.