Variational treatment of inertia–gravity waves interacting with a quasi-geostrophic mean flow

Published online by Cambridge University Press:  14 November 2016

Rick Salmon*
Affiliation: Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093-0213, USA
*Email address for correspondence: rsalmon@ucsd.edu

Abstract

The equations for three-dimensional hydrostatic Boussinesq dynamics are equivalent to a variational principle that is closely analogous to the variational principle for classical electrodynamics. Inertia–gravity waves are analogous to electromagnetic waves, and available potential vorticity (i.e. the amount by which the potential vorticity exceeds the potential vorticity of the rest state) is analogous to electric charge. The Lagrangian can be expressed as the sum of three parts. The first part corresponds to quasi-geostrophic dynamics in the absence of inertia–gravity waves. The second part corresponds to inertia–gravity waves in the absence of quasi-geostrophic flow. The third part represents a coupling between the inertia–gravity waves and quasi-geostrophic motion. This formulation provides the basis for a general theory of inertia–gravity waves interacting with a quasi-geostrophic mean flow.

Type: Papers
Copyright: © 2016 Cambridge University Press

1 Introduction

This paper offers a new variational principle which seems to be especially useful for studying the interactions between inertia–gravity waves and quasi-geostrophic flow. The variational principle is developed for shallow-water dynamics in § 2 and for hydrostatic Boussinesq dynamics in § 3. The latter is of primary interest. The Lagrangian for Boussinesq dynamics,

(1.1) $$\begin{eqnarray}L=L_{1}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE},B]+L_{2}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE},\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD}],\end{eqnarray}$$

depends on the five fields $\unicode[STIX]{x1D713}$, $\unicode[STIX]{x1D6FE}$, $B$, $\unicode[STIX]{x1D6FC}$ and $\unicode[STIX]{x1D6FD}$. These have the following physical interpretations: $\unicode[STIX]{x1D713}$ is the streamfunction and $\unicode[STIX]{x1D6FE}_{t}$ is the velocity potential corresponding to the ‘thickness weighted’ velocity, i.e. the velocity times the local isopycnal separation; $B$ is the Bernoulli function; $\unicode[STIX]{x1D6FC}$ and $\unicode[STIX]{x1D6FD}$ are particle labels which track and measure the available potential vorticity (APV), defined as the difference between the potential vorticity and its value in the rest state. Variations of $\unicode[STIX]{x1D713}$ yield the equation that relates the velocity to the APV. Variations of $\unicode[STIX]{x1D6FE}$ yield the divergence equation. Variations of $B$ yield the hydrostatic relation. Variations of $\unicode[STIX]{x1D6FC}$ and $\unicode[STIX]{x1D6FD}$ yield the equation for the advection of APV. The importance of APV to wave–mean theory is strongly emphasized by Wagner & Young (2015).

If the flow is initially at rest, then the APV vanishes, so that $L_{2}\equiv 0$ and does not contribute. In § 4, we use $\unicode[STIX]{x1D6FF}L_{1}=0$ to calculate the $O(a^{2})$ mean flow that arises when inertia–gravity waves of amplitude $a$ propagate into a region that is initially at rest. This is the classic problem treated by Bretherton (1969). If the wave amplitude varies slowly enough, we obtain (4.31), in which the curl of the pseudomomentum represents a source of quasi-geostrophic potential vorticity.

Section 5 considers the case of non-vanishing APV. In this case, $L_{2}$ does not vanish, and in § 5 we develop approximations to $L_{2}$ that are analogous to those developed for $L_{1}$ in § 4. Section 6 combines the results of §§ 4 and 5 to obtain a Lagrangian (6.12) that depends on the streamfunction $\bar{\unicode[STIX]{x1D713}}$ for the Lagrangian mean velocity, two variables $\tilde{\unicode[STIX]{x1D6FC}}$ and $\tilde{\unicode[STIX]{x1D6FD}}$ that track and label the available potential vorticity $\tilde{Q}$ that would occur if the APV were advected by the mean flow alone, and the fluid particle displacements $(\unicode[STIX]{x1D709},\unicode[STIX]{x1D702})$ caused by the waves. The Lagrangian (6.12) is our primary result. It consists of three parts. The first part, (6.13), corresponds to quasi-geostrophic motion in the absence of inertia–gravity waves. The second part, (6.14), corresponds to inertia–gravity waves in the absence of a mean flow. The third part, (6.15), represents a coupling between the inertia–gravity waves and the quasi-geostrophic mean flow.

The resulting dynamics always includes (7.3) governing the advection of $\tilde{Q}$ by the Lagrangian mean streamfunction $\bar{\unicode[STIX]{x1D713}}$, but the equation (7.31) relating $\tilde{Q}$ to $\bar{\unicode[STIX]{x1D713}}$ contains a wave contribution $q^{w}$ whose precise form depends on the strength of our assumptions about the wave field. The weakest such assumption is that the waves and the mean flow are separated only by their time scales. This was the assumption made by Wagner & Young (2015), and in § 7 and appendix B we show that our $q^{w}$ is equivalent to theirs.

A somewhat stronger assumption is that the waves and the mean flow are separated both in time scale and in vertical length scale. This is the case for near-inertial waves, which oscillate rapidly in $t$ and in $z$, but may have horizontal length scales comparable to those of the quasi-geostrophic mean flow. Near-inertial motion was analysed by Young & Ben Jelloul (1997) and Xie & Vanneste (2015), and in § 7 we obtain wave–mean equations closely related to theirs.

The strongest assumption is the Wentzel–Kramers–Brillouin (WKB) assumption that the waves and the mean flow have a scale separation with respect to all four of the independent variables $x,y,z,t$. In this case, we recover results similar to those of Bühler & McIntyre (1998).

Our ability to recover results such as these, which were obtained by powerful methods that seem to have little in common, hints at the synthetic potential of our method.

Throughout the present paper, we refer to an analogy between our formulation and the standard Lagrangian formulation of classical electrodynamics. This analogy, noted by Salmon (2014, hereafter S14), largely inspired the present work. S14 treated a simplified shallow-water dynamics (for which the analogy is stronger) and did not consider coordinate-system rotation. In S14, the potential vorticity was assumed to be concentrated in point vortices, which are analogous to electrons. In the present paper, we relax all of the unrealistic assumptions of S14. (However, we continue to assume that the Coriolis parameter is a constant.) The electrodynamic analogy still seems quite useful, and it is explained more fully in appendix A. However, readers who find it unhelpful, or even annoying, are invited to ignore it completely. The present paper is entirely understandable in its own terms.

2 Shallow-water dynamics

The shallow-water equations are

(2.1) $$\begin{eqnarray}\displaystyle & \boldsymbol{u}_{t}+\unicode[STIX]{x1D735}\left(c^{2}{\hat{h}}+{\textstyle \frac{1}{2}}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)=(\unicode[STIX]{x1D701}+f_{0})(v,-u), & \displaystyle\end{eqnarray}$$
(2.2) $$\begin{eqnarray}\displaystyle & {\hat{h}}_{t}+\unicode[STIX]{x1D735}\boldsymbol{\cdot }({\hat{h}}\boldsymbol{u})=0, & \displaystyle\end{eqnarray}$$

where $\boldsymbol{u}(\boldsymbol{x},t)=(u,v)$ is the fluid velocity at location $\boldsymbol{x}=(x,y)$ and time $t$, ${\hat{h}}=h/h_{0}$ is the depth divided by its constant mean value $h_{0}$, $c^{2}=gh_{0}$, where $g$ is the gravitational acceleration, $\unicode[STIX]{x1D701}=\unicode[STIX]{x1D735}\times \boldsymbol{u}\equiv v_{x}-u_{y}$ is the vertical component of vorticity and $f_{0}$ is the constant Coriolis parameter. Subscripts denote partial derivatives and $\unicode[STIX]{x1D735}=(\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y})$. For definiteness, we suppose that the fluid is horizontally unbounded and quiescent at infinity. The vorticity and divergence equations corresponding to (2.1)–(2.2) are

(2.3) $$\begin{eqnarray}\unicode[STIX]{x1D701}_{t}=-\unicode[STIX]{x1D735}\boldsymbol{\cdot }((\unicode[STIX]{x1D701}+f_{0})\boldsymbol{u})\end{eqnarray}$$

and

(2.4) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}t}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{u}+\unicode[STIX]{x1D6FB}^{2}\left(c^{2}{\hat{h}}+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)=\unicode[STIX]{x1D735}\times ((\unicode[STIX]{x1D701}+f_{0})\boldsymbol{u}).\end{eqnarray}$$

The shallow-water equations conserve energy in the form

(2.5) $$\begin{eqnarray}H=\iint \,\text{d}\boldsymbol{x}{\displaystyle \frac{1}{2}}(h\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}+gh^{2})=h_{0}\iint \,\text{d}\boldsymbol{x}{\displaystyle \frac{1}{2}}({\hat{h}}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}+c^{2}{\hat{h}}^{2}),\end{eqnarray}$$

and the potential vorticity

(2.6) $$\begin{eqnarray}q\equiv (\unicode[STIX]{x1D701}+f_{0})/{\hat{h}},\end{eqnarray}$$

on fluid particles,

(2.7) $$\begin{eqnarray}q_{t}+\boldsymbol{u}\boldsymbol{\cdot }\unicode[STIX]{x1D735}q=0.\end{eqnarray}$$
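The step from (2.2) and (2.3) to (2.7) is short, and a sketch of it may be helpful. Because $f_{0}$ is constant, (2.3) with $\zeta+f_{0}={\hat{h}}q$ reads $({\hat{h}}q)_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}({\hat{h}}q\boldsymbol{u})=0$, and expanding the products gives

$$({\hat{h}}q)_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}({\hat{h}}q\boldsymbol{u})={\hat{h}}\left(q_{t}+\boldsymbol{u}\boldsymbol{\cdot}\boldsymbol{\nabla}q\right)+q\left({\hat{h}}_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}({\hat{h}}\boldsymbol{u})\right)=0.$$

The last bracket vanishes by (2.2), which leaves (2.7) wherever ${\hat{h}}>0$.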

In this section, we generalize the variational principle of S14 to full shallow-water dynamics (2.1)–(2.2) with continuously distributed vorticity and constant Coriolis parameter $f_{0}$. We start by representing the physical variables

(2.8) $$\begin{eqnarray}\displaystyle & {\hat{h}}=1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}, & \displaystyle\end{eqnarray}$$
(2.9) $$\begin{eqnarray}\displaystyle & {\hat{h}}u=-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt}, & \displaystyle\end{eqnarray}$$
(2.10) $$\begin{eqnarray}\displaystyle & {\hat{h}}v=\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt} & \displaystyle\end{eqnarray}$$

in terms of the potentials $\unicode[STIX]{x1D713}(\boldsymbol{x},t)$ and $\unicode[STIX]{x1D6FE}(\boldsymbol{x},t)$. Compared with the method followed in S14, the representation (2.8)–(2.10) corresponds to the immediate adoption of the Coulomb gauge, as explained in appendix A. For present purposes, it is sufficient to note that (2.8)–(2.10) is a general representation that automatically satisfies the shallow-water continuity equation (2.2).
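That (2.8)–(2.10) satisfy (2.2) identically can be checked in one line. Substituting into the left-hand side of (2.2),

$${\hat{h}}_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}({\hat{h}}\boldsymbol{u})=-\nabla^{2}\gamma_{t}+\partial_{x}(-\psi_{y}+\gamma_{xt})+\partial_{y}(\psi_{x}+\gamma_{yt})=-\nabla^{2}\gamma_{t}+\nabla^{2}\gamma_{t}=0,$$

for any sufficiently smooth $\psi$ and $\gamma$; the $\psi$-terms cancel by equality of mixed partial derivatives.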

We motivate the derivation of the variational principle by first imposing – and then gradually lifting – a strong constraint: we assume that the potential vorticity (2.6) is uniform. If the potential vorticity is uniform, then

(2.11) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}=f_{0}{\hat{h}}.\end{eqnarray}$$

By (2.8)–(2.10), equation (2.11) is equivalent to

(2.12) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}\left(\frac{\unicode[STIX]{x1D713}_{y}-\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)=-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}.\end{eqnarray}$$

Both (2.12) and the corresponding form of (2.4), namely

(2.13) $$\begin{eqnarray}\left(\frac{-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{xt}+\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{yt}+\unicode[STIX]{x1D6FB}^{2}\left(c^{2}{\hat{h}}+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)=f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713},\end{eqnarray}$$

result from the variational principle $\unicode[STIX]{x1D6FF}L_{1}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE}]=0$, where

(2.14) $$\begin{eqnarray}L_{1}=\iiint \,\text{d}\boldsymbol{x}\,\text{d}t\left(\frac{1}{2}\frac{(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}+\frac{1}{2}\frac{(\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}-\frac{1}{2}c^{2}(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})^{2}\right).\end{eqnarray}$$
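As a sketch of how the $\psi$-variation works (the $\gamma$-variation is analogous but longer), note that by (2.9)–(2.10) the first two terms of (2.14) sum to $\frac{1}{2}{\hat{h}}\boldsymbol{u}\boldsymbol{\cdot}\boldsymbol{u}$. Varying $\psi$ at fixed $\gamma$, and integrating by parts under the assumption that boundary terms vanish (consistent with quiescence at infinity), one finds

$$\delta L_{1}=\iiint \text{d}\boldsymbol{x}\,\text{d}t\left(-u\,\delta\psi_{y}+v\,\delta\psi_{x}+f_{0}\boldsymbol{\nabla}\gamma\boldsymbol{\cdot}\boldsymbol{\nabla}\delta\psi\right)=-\iiint \text{d}\boldsymbol{x}\,\text{d}t\left(v_{x}-u_{y}+f_{0}\nabla^{2}\gamma\right)\delta\psi .$$

Requiring this to vanish for arbitrary $\delta\psi$ gives $\zeta=-f_{0}\nabla^{2}\gamma$, which is (2.12).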

Now, we partly relax the constraint: we allow the potential vorticity to be non-uniform, but we assume that the departures from (2.11) are concentrated in delta functions. That is, we assume that

(2.15) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}=f_{0}{\hat{h}}+\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\boldsymbol{x}_{i}(t)).\end{eqnarray}$$

This is a consistent restriction on shallow-water dynamics in the sense that (2.15) is compatible with the conservation of total vorticity, $\iint \,\text{d}\boldsymbol{x}hq$. The sum in (2.15) is the product of the fluid depth ${\hat{h}}$ and what Wagner & Young (2015) call the ‘available potential vorticity’ – the difference between the potential vorticity defined by (2.6) and its uniform value in the state of rest. This difference is, like the potential vorticity itself, conserved on fluid particles. In a slight departure from the terminology of Wagner & Young (2015), we shall refer to the sum itself as the available potential vorticity (APV). This sum obeys the flux-form conservation law (2.40).

Since the delta functions in (2.15) represent singularities in potential vorticity, they must, according to (2.7), move at the fluid velocity $\boldsymbol{u}$. That is, the velocities of the point vortices must be given by $\dot{\boldsymbol{x}}_{i}(t)=\boldsymbol{u}(\boldsymbol{x}_{i}(t),t)$, or, using (2.8)–(2.10),

(2.16) $$\begin{eqnarray}(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})\dot{\boldsymbol{x}}_{i}(t)=(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{tx},\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{ty}),\end{eqnarray}$$

where the overdot denotes time differentiation. The Lagrangian corresponding to (2.16) is

(2.17) $$\begin{eqnarray}L_{2}[\boldsymbol{x}_{i}(t)]=\int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}(-x_{i}{\dot{y}}_{i}+\unicode[STIX]{x1D713}(\boldsymbol{x}_{i},t)-\unicode[STIX]{x1D6FE}_{y}(\boldsymbol{x}_{i},t){\dot{x}}_{i}+\unicode[STIX]{x1D6FE}_{x}(\boldsymbol{x}_{i},t){\dot{y}}_{i}).\end{eqnarray}$$

That is, $\unicode[STIX]{x1D6FF}L_{2}/\unicode[STIX]{x1D6FF}\boldsymbol{x}_{i}=0$ implies (2.16).
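The Euler–Lagrange equations behind this statement can be sketched as follows. Write the integrand of (2.17) for a single vortex as $\Gamma_{i}\ell$ with $\ell=-x_{i}\dot{y}_{i}+\psi-\gamma_{y}\dot{x}_{i}+\gamma_{x}\dot{y}_{i}$ (the symbol $\ell$ is introduced here only for illustration). Along the vortex path, $\text{d}\gamma_{y}/\text{d}t=\gamma_{yt}+\gamma_{xy}\dot{x}_{i}+\gamma_{yy}\dot{y}_{i}$, and similarly for $\gamma_{x}$, so that

$$\frac{\partial \ell}{\partial x_{i}}-\frac{\text{d}}{\text{d}t}\frac{\partial \ell}{\partial \dot{x}_{i}}=-(1-\nabla^{2}\gamma)\dot{y}_{i}+\psi_{x}+\gamma_{yt}=0,$$

$$\frac{\partial \ell}{\partial y_{i}}-\frac{\text{d}}{\text{d}t}\frac{\partial \ell}{\partial \dot{y}_{i}}=(1-\nabla^{2}\gamma)\dot{x}_{i}+\psi_{y}-\gamma_{xt}=0,$$

with all fields evaluated at $(\boldsymbol{x}_{i}(t),t)$. The mixed terms in $\gamma_{xy}$ cancel, and the two results are the $y$- and $x$-components of (2.16), respectively.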

We obtain the Lagrangian for shallow-water dynamics by combining (2.14) and (2.17) in the form

(2.18) $$\begin{eqnarray}\displaystyle & & \displaystyle L[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t),\boldsymbol{x}_{i}(t)]\nonumber\\ \displaystyle & & \displaystyle \quad =L_{1}[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t)]+L_{2}[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t),\boldsymbol{x}_{i}(t)]=L_{1}[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t)]\nonumber\\ \displaystyle & & \displaystyle \qquad +\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\boldsymbol{x}_{i}(t))(-x{\dot{y}}_{i}+\unicode[STIX]{x1D713}(\boldsymbol{x},t)-\unicode[STIX]{x1D6FE}_{y}(\boldsymbol{x},t){\dot{x}}_{i}+\unicode[STIX]{x1D6FE}_{x}(\boldsymbol{x},t){\dot{y}}_{i}).\qquad \quad\end{eqnarray}$$

As we shall show, shallow-water dynamics with point vortices of APV is equivalent to the requirement that (2.18) be stationary with respect to arbitrary independent variations of the fields $\unicode[STIX]{x1D713}(\boldsymbol{x},t)$ and $\unicode[STIX]{x1D6FE}(\boldsymbol{x},t)$ and the vortex locations $\boldsymbol{x}_{i}(t)$. The last line of (2.18) is an alternative way of writing (2.17) that is useful for performing the field variations. The form (2.17) is preferable for performing the vortex-location variations. Before demonstrating the equivalence between (2.18) and shallow-water dynamics, we fully relax the constraint on potential vorticity.

Suppose that the APV is not confined to delta functions but varies continuously in space. It must still be advected at the fluid velocity $\boldsymbol{u}$. We replace the ansatz (2.15) by

(2.19) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}=f_{0}{\hat{h}}+\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)},\end{eqnarray}$$

where $\unicode[STIX]{x1D6FC}(\boldsymbol{x},t)$ and $\unicode[STIX]{x1D6FD}(\boldsymbol{x},t)$ are a set of fluid particle labels that measure APV and move with the fluid. Thus,

(2.20) $$\begin{eqnarray}(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})\unicode[STIX]{x1D736}_{t}+(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{tx})\unicode[STIX]{x1D736}_{x}+(\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{ty})\unicode[STIX]{x1D736}_{y}=0,\end{eqnarray}$$

where $\unicode[STIX]{x1D736}=(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})$. Compare (2.20) with (2.16). The labels $\unicode[STIX]{x1D6FC}$ and $\unicode[STIX]{x1D6FD}$ replace the subscript $i$ on $\unicode[STIX]{x1D6E4}_{i}$ in (2.15). Just as the last term in (2.15) integrates to $\sum _{i}\unicode[STIX]{x1D6E4}_{i}$, the last term in (2.19) integrates to $\iint \,\text{d}\unicode[STIX]{x1D6FC}\,\text{d}\unicode[STIX]{x1D6FD}$; the total APV is a constant.

We obtain the general variational principle for shallow-water dynamics by replacing

(2.21) $$\begin{eqnarray}\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\boldsymbol{x}_{i}(t))\rightarrow \frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\end{eqnarray}$$

and $\dot{\boldsymbol{x}}_{i}\rightarrow \unicode[STIX]{x2202}\boldsymbol{x}/\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}$ in (2.18), where $\boldsymbol{x}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD},\unicode[STIX]{x1D70F})$ is the location at time $\unicode[STIX]{x1D70F}$ of the fluid particle labelled $(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})$. Then,

(2.22) $$\begin{eqnarray}L_{2}=\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\left(-x\frac{\unicode[STIX]{x2202}y}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}+\unicode[STIX]{x1D713}(\boldsymbol{x},t)-\unicode[STIX]{x1D6FE}_{y}(\boldsymbol{x},t)\frac{\unicode[STIX]{x2202}x}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}+\unicode[STIX]{x1D6FE}_{x}(\boldsymbol{x},t)\frac{\unicode[STIX]{x2202}y}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}\right).\end{eqnarray}$$

By the chain rule,

(2.23) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}\unicode[STIX]{x1D736}(\boldsymbol{x},t)=\frac{\unicode[STIX]{x2202}\unicode[STIX]{x1D736}}{\unicode[STIX]{x2202}t}+\left(\frac{\unicode[STIX]{x2202}\boldsymbol{x}}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\right)\unicode[STIX]{x1D736}=0.\end{eqnarray}$$

Solving (2.23) for $\unicode[STIX]{x2202}\boldsymbol{x}/\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}$, we obtain

(2.24) $$\begin{eqnarray}\displaystyle & \displaystyle \frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\frac{\unicode[STIX]{x2202}x}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}=\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)}, & \displaystyle\end{eqnarray}$$
(2.25) $$\begin{eqnarray}\displaystyle & \displaystyle \frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\frac{\unicode[STIX]{x2202}y}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}=\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}. & \displaystyle\end{eqnarray}$$
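Equations (2.24)–(2.25) are Cramer's rule applied to the two linear equations (2.23), one for each label. The following Python sketch checks this at a single point using exact rational arithmetic. The numerical values of the partial derivatives and the helper names `jac` and `label_velocity` are hypothetical and chosen only for illustration; nothing here is taken from the original paper.

```python
from fractions import Fraction


def jac(a, b, i, j):
    """Jacobian d(alpha, beta)/d(i, j) from dicts of partial derivatives."""
    return a[i] * b[j] - a[j] * b[i]


def label_velocity(a, b):
    """Return (x_tau, y_tau) from (2.24)-(2.25).

    a and b hold the partial derivatives of alpha and beta with respect
    to x, y and t at one point in space and time.
    """
    q = jac(a, b, "x", "y")  # d(alpha, beta)/d(x, y), i.e. the APV
    if q == 0:
        raise ValueError("d(alpha, beta)/d(x, y) = 0: velocity not determined")
    return jac(a, b, "y", "t") / q, jac(a, b, "t", "x") / q


# Arbitrary illustrative values of the partial derivatives (assumptions).
alpha = {"x": Fraction(2), "y": Fraction(-1), "t": Fraction(3)}
beta = {"x": Fraction(1, 2), "y": Fraction(4), "t": Fraction(-5)}

x_tau, y_tau = label_velocity(alpha, beta)

# Residuals of (2.23): theta_t + x_tau * theta_x + y_tau * theta_y per label.
residuals = [d["t"] + x_tau * d["x"] + y_tau * d["y"] for d in (alpha, beta)]
print(x_tau, y_tau, residuals)
```

Both residuals vanish exactly, as they must wherever the Jacobian $\partial(\alpha,\beta)/\partial(x,y)$ is non-zero. Where that Jacobian vanishes, (2.24)–(2.25) do not determine $\partial\boldsymbol{x}/\partial\tau$, which is why the sketch raises an error in that case.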

Then, using (2.24)–(2.25) to eliminate $\unicode[STIX]{x2202}x/\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}$ and $\unicode[STIX]{x2202}y/\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}$ from (2.22), and noting that the first term in (2.22) simplifies as

(2.26) $$\begin{eqnarray}\displaystyle \iiint \,\text{d}t\,\text{d}\boldsymbol{x}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\left(-x\frac{\unicode[STIX]{x2202}y}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}\right) & = & \displaystyle \iiint \,\text{d}t\,\text{d}\boldsymbol{x}\left(-x\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}\right)\nonumber\\ \displaystyle & = & \displaystyle \iiint \,\text{d}t\,\text{d}\boldsymbol{x}\left(\unicode[STIX]{x1D6FC}\frac{\unicode[STIX]{x2202}(x,\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}\right)\nonumber\\ \displaystyle & = & \displaystyle \iiint \,\text{d}t\,\text{d}\boldsymbol{x}\;(-\unicode[STIX]{x1D6FC}\unicode[STIX]{x1D6FD}_{t}),\end{eqnarray}$$

we obtain the generalization of (2.18) to the case of continuously varying vorticity, namely

(2.27) $$\begin{eqnarray}\displaystyle & & \displaystyle L[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t),\unicode[STIX]{x1D736}(\boldsymbol{x},t)]=L_{1}[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t)]+L_{2}[\unicode[STIX]{x1D713}(\boldsymbol{x},t),\unicode[STIX]{x1D6FE}(\boldsymbol{x},t),\unicode[STIX]{x1D6FC}(\boldsymbol{x},t),\unicode[STIX]{x1D6FD}(\boldsymbol{x},t)]\nonumber\\ \displaystyle & & \displaystyle \quad =\iiint \,\text{d}\boldsymbol{x}\,\text{d}t\left(\frac{1}{2}\frac{(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}+\frac{1}{2}\frac{(\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}-\frac{1}{2}c^{2}(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})^{2}\right)\nonumber\\ \displaystyle & & \displaystyle \qquad +\,\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\left(-\unicode[STIX]{x1D6FC}\unicode[STIX]{x1D6FD}_{t}+\unicode[STIX]{x1D713}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}-\unicode[STIX]{x1D6FE}_{y}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)}+\unicode[STIX]{x1D6FE}_{x}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}\right).\end{eqnarray}$$
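The manipulations in (2.26) rest on a cyclic integration-by-parts property of Jacobians, sketched here under the assumption that all boundary terms vanish. Since $\partial(B,C)/\partial(t,x)=(BC_{x})_{t}-(BC_{t})_{x}$ for any smooth $B$ and $C$,

$$\iiint \text{d}t\,\text{d}\boldsymbol{x}\;A\,\frac{\partial(B,C)}{\partial(t,x)}=\iiint \text{d}t\,\text{d}\boldsymbol{x}\;B\left(C_{t}A_{x}-C_{x}A_{t}\right)=\iiint \text{d}t\,\text{d}\boldsymbol{x}\;B\,\frac{\partial(C,A)}{\partial(t,x)}.$$

Taking $A=-x$, $B=\alpha$ and $C=\beta$ gives the second line of (2.26). The third line follows because $x$ and $t$ are independent variables, so that $\partial(x,\beta)/\partial(t,x)=x_{t}\beta_{x}-x_{x}\beta_{t}=-\beta_{t}$.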

We shall demonstrate that the requirement that (2.27) be stationary with respect to arbitrary variations in its four fields is equivalent to shallow-water dynamics.

The equations resulting from the requirement that (2.27) be stationary are

(2.28) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D713}:\quad \frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}\left(\frac{\unicode[STIX]{x1D713}_{y}-\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)+f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}=Q, & \displaystyle\end{eqnarray}$$
(2.29) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FE}:\quad \left(\frac{-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{xt}+\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{yt}+\unicode[STIX]{x1D6FB}^{2}\left(c^{2}{\hat{h}}+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}=\unicode[STIX]{x1D735}\times \boldsymbol{J}, & \displaystyle\end{eqnarray}$$
(2.30) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D736}:\quad (1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})\unicode[STIX]{x1D736}_{t}+(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{tx})\unicode[STIX]{x1D736}_{x}+(\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{ty})\unicode[STIX]{x1D736}_{y}=0, & \displaystyle\end{eqnarray}$$

where ${\hat{h}}$ and $\boldsymbol{u}$ are given by (2.8)–(2.10),

(2.31) $$\begin{eqnarray}Q\equiv \frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\end{eqnarray}$$

is the APV and

(2.32) $$\begin{eqnarray}\boldsymbol{J}\equiv (J^{x},J^{y})\equiv \left(\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)},\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}\right).\end{eqnarray}$$

From (2.24)–(2.25) and (2.31), it follows that

(2.33) $$\begin{eqnarray}\boldsymbol{J}=\frac{\unicode[STIX]{x2202}\boldsymbol{x}}{\unicode[STIX]{x2202}\unicode[STIX]{x1D70F}}Q=\boldsymbol{u}Q.\end{eqnarray}$$

Thus, $\boldsymbol{J}$ is the flux of APV. In terms of the physical variables defined by (2.8)–(2.10), equations (2.28) and (2.29) take the forms

(2.34) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}-f_{0}{\hat{h}}=Q\end{eqnarray}$$

and

(2.35) $$\begin{eqnarray}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{u}_{t}+\unicode[STIX]{x1D6FB}^{2}\left(c^{2}{\hat{h}}+{\textstyle \frac{1}{2}}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)-f_{0}({\hat{h}}v)_{x}+f_{0}({\hat{h}}u)_{y}=(vQ)_{x}-(uQ)_{y}.\end{eqnarray}$$

Eliminating $Q$ between (2.34) and (2.35), we obtain the shallow-water divergence equation (2.4). To see that (2.28)–(2.30) imply the shallow-water vorticity equation (2.3), we make use of the identity

(2.36) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}t}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)}+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}=0,\end{eqnarray}$$

which holds for any two functions $\unicode[STIX]{x1D6FC}(x,y,t)$ and $\unicode[STIX]{x1D6FD}(x,y,t)$. By (2.31)–(2.33), equation (2.36) is equivalent to

(2.37) $$\begin{eqnarray}Q_{t}+\unicode[STIX]{x1D735}\boldsymbol{\cdot }(Q\boldsymbol{u})=0.\end{eqnarray}$$

Substituting (2.34) into (2.37) and using (2.2), we obtain (2.3). This concludes the proof that stationarity of (2.27) is equivalent to shallow-water dynamics.
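For completeness, the intermediate steps of this proof can be sketched as follows. First, inserting $Q$ from (2.34) into the right-hand side of (2.35) gives

$$(vQ)_{x}-(uQ)_{y}=\boldsymbol{\nabla}\times((\zeta+f_{0})\boldsymbol{u})-f_{0}({\hat{h}}v)_{x}+f_{0}({\hat{h}}u)_{y},$$

and the two $f_{0}$-terms cancel those on the left-hand side of (2.35), leaving (2.4). Second, the identity (2.36) holds because each Jacobian is itself a divergence,

$$\frac{\partial(\alpha,\beta)}{\partial(x,y)}=(\alpha\beta_{y})_{x}-(\alpha\beta_{x})_{y},\quad \frac{\partial(\alpha,\beta)}{\partial(y,t)}=(\alpha\beta_{t})_{y}-(\alpha\beta_{y})_{t},\quad \frac{\partial(\alpha,\beta)}{\partial(t,x)}=(\alpha\beta_{x})_{t}-(\alpha\beta_{t})_{x},$$

so that the left-hand side of (2.36) becomes

$$(\alpha\beta_{y})_{xt}-(\alpha\beta_{x})_{yt}+(\alpha\beta_{t})_{yx}-(\alpha\beta_{y})_{tx}+(\alpha\beta_{x})_{ty}-(\alpha\beta_{t})_{xy}=0$$

by equality of mixed partial derivatives. Third, with $Q=\zeta+f_{0}-f_{0}{\hat{h}}$ from (2.34),

$$Q_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}(Q\boldsymbol{u})=\zeta_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}((\zeta+f_{0})\boldsymbol{u})-f_{0}\left({\hat{h}}_{t}+\boldsymbol{\nabla}\boldsymbol{\cdot}({\hat{h}}\boldsymbol{u})\right)=0,$$

in which the last bracket vanishes by (2.2), so that what remains is (2.3).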

Before proceeding, we briefly return to the Lagrangian (2.18) for point vortices of APV. The equations resulting from the requirement that (2.18) be stationary with respect to variations of $\unicode[STIX]{x1D713}(\boldsymbol{x},t)$, $\unicode[STIX]{x1D6FE}(\boldsymbol{x},t)$ and $\boldsymbol{x}_{i}(t)$ are equation (2.28) with

(2.38) $$\begin{eqnarray}Q=\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\boldsymbol{x}_{i}(t)),\end{eqnarray}$$

equation (2.29) with

(2.39) $$\begin{eqnarray}\boldsymbol{J}=\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\boldsymbol{x}_{i}(t))\dot{\boldsymbol{x}}_{i}\end{eqnarray}$$

and (2.16). The definitions (2.38) and (2.39) imply

(2.40) $$\begin{eqnarray}Q_{t}+\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{J}=0.\end{eqnarray}$$
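The step from (2.38)–(2.39) to (2.40) is a short chain-rule calculation, sketched here. Each delta function depends on time only through $\boldsymbol{x}_{i}(t)$, and $\dot{\boldsymbol{x}}_{i}$ does not depend on $\boldsymbol{x}$, so

$$\frac{\partial}{\partial t}\delta(\boldsymbol{x}-\boldsymbol{x}_{i}(t))=-\dot{\boldsymbol{x}}_{i}\boldsymbol{\cdot}\boldsymbol{\nabla}\delta(\boldsymbol{x}-\boldsymbol{x}_{i}(t))=-\boldsymbol{\nabla}\boldsymbol{\cdot}\left(\dot{\boldsymbol{x}}_{i}\,\delta(\boldsymbol{x}-\boldsymbol{x}_{i}(t))\right).$$

Multiplying by $\Gamma_{i}$ and summing over $i$ gives $Q_{t}=-\boldsymbol{\nabla}\boldsymbol{\cdot}\boldsymbol{J}$.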

If the fluid depth is nearly constant, then we may replace the denominators $1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}$ in (2.14) by unity, obtaining the quadratic approximation

(2.41) $$\begin{eqnarray}L_{1}^{0}=\iiint \,\text{d}\boldsymbol{x}\,\text{d}t\left(\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}+\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}-\frac{1}{2}c^{2}(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})^{2}\right)\end{eqnarray}$$

to $L_{1}$. If $f_{0}=0$, then the Lagrangian $L_{1}^{0}+L_{2}$, where $L_{2}$ is given by (2.17), is equivalent to the Lagrangian given in S14, which was shown in that paper to be closely analogous to the Lagrangian for classical electrodynamics. In this approximation, the waves are linear non-dispersive waves that interact with the point vortices but not with each other. The point vortices are analogous to electrons. The term $Q$ is analogous to electric charge density and $\boldsymbol{J}$ is analogous to electric current density. Equation (2.16) is analogous to the Lorentz force law. Further details on the electrodynamic analogy are given in appendix A.

In the present paper, we have greater use for the continuous-vorticity Lagrangian (2.27) than for its discrete-vorticity counterpart (2.18). The latter remains a valuable thinking tool (as illustrated in § 5) and may be useful for numerical studies. However, it is worth pointing out that point-vortex dynamics corresponding to (2.18) cannot be obtained as a special case of (2.27). This is obvious from the fact that, in applying point-vortex dynamics, we must omit the influence of each vortex on itself. Furthermore, any attempt to derive the continuous-vorticity dynamics by averaging over point vortices would introduce terms analogous to the molecular viscosity terms that arise when one attempts to derive ideal fluid dynamics by averaging over molecular motions.

The continuous-vorticity Lagrangian (2.27) seems to be a good starting point for theories describing the interactions between inertia–gravity waves and quasi-geostrophic flow. Three reasons for this are readily apparent. First, the Eulerian average of (2.9) and (2.10) corresponds to the Lagrangian mean velocity, often denoted $\bar{\boldsymbol{u}}^{L}$. Thus, the average values of $\unicode[STIX]{x1D713}$ and $\unicode[STIX]{x1D6FE}_{t}$ represent the streamfunction and velocity potential for the Lagrangian mean flow. The automatic satisfaction of (2.2) by the representation (2.8)–(2.10) corresponds to the fact that, in the Lagrangian mean, no Reynolds fluxes occur in the average of (2.2).

Second, the Lagrangian $L_{1}$, by itself, describes the fluid motion that results when waves propagate into a region that is initially at rest, and in which the APV therefore vanishes. That is, equation (2.14) is the Lagrangian for nonlinear inertia–gravity waves uncontaminated by APV. If $L_{1}$ is approximated as (2.41), then the equations resulting from $\unicode[STIX]{x1D6FF}L_{1}^{0}=0$ combine to yield the Klein–Gordon equation,

(2.42) $$\begin{eqnarray}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}_{tt}+f_{0}^{2}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}-c^{2}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}=0,\end{eqnarray}$$

for $\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}$, which describes linear shallow-water inertia–gravity waves. The fully nonlinear form (2.14) will be used in § 4 to determine the mean flow that arises when inertia–gravity waves propagate into an initially quiescent region.
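To make the wave content of (2.42) concrete, one can substitute a plane wave $\gamma\propto\exp[\text{i}(kx+ly-\omega t)]$, where the wavenumbers $k$, $l$ and frequency $\omega$ are symbols introduced here for illustration. Each Laplacian then contributes a factor $-(k^{2}+l^{2})$ and each pair of time derivatives a factor $-\omega^{2}$, which gives the dispersion relation $\omega^{2}=f_{0}^{2}+c^{2}(k^{2}+l^{2})$ for non-zero wavenumber. The Python sketch below evaluates this relation and checks that the plane-wave symbol of (2.42) vanishes. The function names and parameter values are hypothetical and are not taken from the original paper.

```python
import math


def igw_frequency(k, l, f0, c):
    """Positive root of omega^2 = f0^2 + c^2 (k^2 + l^2)."""
    return math.sqrt(f0**2 + c**2 * (k**2 + l**2))


def klein_gordon_symbol(omega, k, l, f0, c):
    """Factor multiplying gamma when exp[i(kx + ly - omega t)] is put in (2.42).

    Each Laplacian contributes -(k^2 + l^2); each pair of time
    derivatives contributes -omega^2.
    """
    kk = k**2 + l**2
    return kk * omega**2 - f0**2 * kk - c**2 * kk**2


# Hypothetical illustrative values (assumptions, not from the paper).
f0 = 1.0e-4                        # Coriolis parameter, s^-1
c = 2.0                            # wave speed sqrt(g * h0), m s^-1
k, l = 2 * math.pi / 50.0e3, 0.0   # 50 km wavelength in x, rad m^-1

omega = igw_frequency(k, l, f0, c)
residual = klein_gordon_symbol(omega, k, l, f0, c)
scale = (k**2 + l**2) * omega**2   # size of one term, for a relative check
print(f"omega/f0 = {omega / f0:.3f}, relative residual = {residual / scale:.1e}")
```

The relative residual is zero up to floating-point rounding. The relation also shows that $\omega\geqslant f_{0}$ for all wavenumbers and that the waves become non-dispersive, $\omega=c\sqrt{k^{2}+l^{2}}$, when $f_{0}=0$.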

Third, there is an intimate connection between (2.27) and the Lagrangian for quasi-geostrophic motion. To see this most directly, suppose that all of the $\unicode[STIX]{x1D6FE}$ -terms except $f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}$ and $c^{2}(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})^{2}$ are simply dropped from (2.27). (This can be justified by scaling the variables in the manner appropriate for geostrophic flow.) The resulting Lagrangian is

(2.43) $$\begin{eqnarray}\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\left(\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}-\frac{1}{2}c^{2}(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})^{2}-\unicode[STIX]{x1D6FC}\unicode[STIX]{x1D6FD}_{t}+\unicode[STIX]{x1D713}J(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})\right),\end{eqnarray}$$

where $J(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})\equiv \unicode[STIX]{x1D6FC}_{x}\unicode[STIX]{x1D6FD}_{y}-\unicode[STIX]{x1D6FD}_{x}\unicode[STIX]{x1D6FC}_{y}$ . If we use

(2.44) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FE}:\quad \unicode[STIX]{x1D6FB}^{2}(f_{0}\unicode[STIX]{x1D713}+c^{2}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})=0,\end{eqnarray}$$

hence

(2.45) $$\begin{eqnarray}\unicode[STIX]{x1D713}=-c^{2}/f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}=g/f_{0}(h-h_{0}),\end{eqnarray}$$

to eliminate $\unicode[STIX]{x1D6FE}$ from (2.43), we obtain the Lagrangian

(2.46) $$\begin{eqnarray}L_{QG}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD}]=\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\left(\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}+\frac{1}{2}\frac{f_{0}^{2}}{c^{2}}\unicode[STIX]{x1D713}^{2}-\unicode[STIX]{x1D6FC}\unicode[STIX]{x1D6FD}_{t}+\unicode[STIX]{x1D713}J(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})\right)\end{eqnarray}$$

for quasi-geostrophic dynamics. The condition (2.45) is a statement of geostrophic balance. Stationarity of (2.46) implies

(2.47) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D713}:\quad \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}-\frac{f_{0}^{2}}{c^{2}}\unicode[STIX]{x1D713}=J(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD}), & \displaystyle\end{eqnarray}$$
(2.48) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FD}:\quad \unicode[STIX]{x1D6FC}_{t}+J(\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FC})=0, & \displaystyle\end{eqnarray}$$
(2.49) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FC}:\quad \unicode[STIX]{x1D6FD}_{t}+J(\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FD})=0. & \displaystyle\end{eqnarray}$$

Using the Jacobi identity,

(2.50) $$\begin{eqnarray}J(J(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD}),\unicode[STIX]{x1D713})+J(J(\unicode[STIX]{x1D6FD},\unicode[STIX]{x1D713}),\unicode[STIX]{x1D6FC})+J(J(\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FC}),\unicode[STIX]{x1D6FD})=0,\end{eqnarray}$$

we find that (2.47)–(2.49) imply the quasi-geostrophic potential vorticity equation,

(2.51) $$\begin{eqnarray}(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}-\unicode[STIX]{x1D713}/{r_{d}}^{2})_{t}+J(\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}-\unicode[STIX]{x1D713}/{r_{d}}^{2})=0,\end{eqnarray}$$

where $r_{d}=c/f_{0}$ is the deformation radius.
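The elimination that leads from (2.43) to (2.46) can be checked pointwise. After the term f₀∇ψ·∇γ is integrated by parts (boundary terms are assumed to vanish), the γ-dependent part of the integrand of (2.43) is −f₀ψ∇²γ − ½c²(∇²γ)², and geostrophic balance (2.45) should turn it into ½(f₀²/c²)ψ². The following is a minimal sketch of that arithmetic; the helper names and the sample numbers are hypothetical.

```python
def gamma_terms(psi, lap_gamma, f0, c):
    """gamma-dependent part of the (2.43) integrand after integrating
    f0 grad(psi).grad(gamma) by parts."""
    return -f0 * psi * lap_gamma - 0.5 * c**2 * lap_gamma**2

def balanced_lap_gamma(psi, f0, c):
    """Geostrophic balance (2.45): lap(gamma) = -(f0 / c^2) psi."""
    return -f0 * psi / c**2

def qg_potential_term(psi, f0, c):
    """The psi^2 term of (2.46): (1/2)(f0^2 / c^2) psi^2 = psi^2 / (2 r_d^2)."""
    return 0.5 * (f0 / c)**2 * psi**2

f0, c = 1.0e-4, 3.0                          # assumed sample values
samples = [-2.0e3, -5.0, 0.0, 7.5, 1.0e3]    # arbitrary values of psi
pairs = [(gamma_terms(p, balanced_lap_gamma(p, f0, c), f0, c),
          qg_potential_term(p, f0, c)) for p in samples]
```

Each pair agrees to rounding error, which is the pointwise content of the step from (2.43) to (2.46).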

A more systematic treatment of (2.27) yields theories that describe the interactions between motions that obey (2.51) at leading order and those that obey (2.42) at leading order. The use of the Lagrangian formulation guarantees that all conservation laws will be automatically maintained. However, instead of applying this strategy to shallow-water dynamics, we generalize our results to hydrostatic Boussinesq dynamics, which are of much greater interest to meteorologists and oceanographers. That such a generalization must exist is obvious from the close correspondence between the shallow-water equations and the hydrostatic Boussinesq equations in buoyancy coordinates.

3 Hydrostatic Boussinesq dynamics

In buoyancy coordinates, the hydrostatic Boussinesq equations take the form

(3.1) $$\begin{eqnarray}\displaystyle & \displaystyle \boldsymbol{u}_{t}+\unicode[STIX]{x1D735}\left(B+{\textstyle \frac{1}{2}}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)=(\unicode[STIX]{x1D701}+f_{0})(v,-u), & \displaystyle\end{eqnarray}$$
(3.2) $$\begin{eqnarray}\displaystyle & z=-B_{\unicode[STIX]{x1D703}}, & \displaystyle\end{eqnarray}$$
(3.3) $$\begin{eqnarray}\displaystyle & z_{\unicode[STIX]{x1D703}t}+(uz_{\unicode[STIX]{x1D703}})_{x}+(vz_{\unicode[STIX]{x1D703}})_{y}=0. & \displaystyle\end{eqnarray}$$

Here, $\boldsymbol{u}(x,y,\unicode[STIX]{x1D703},t)=\boldsymbol{u}(\boldsymbol{x},\unicode[STIX]{x1D703},t)=(u,v)$ is the horizontal velocity, $\unicode[STIX]{x1D703}$ is the buoyancy, $z(\boldsymbol{x},\unicode[STIX]{x1D703},t)$ is the height of the isobuoyancy surface corresponding to $\unicode[STIX]{x1D703}$ at the horizontal location $\boldsymbol{x}$ and $B=p-z\unicode[STIX]{x1D703}$ , with $p$ equal to the pressure divided by the constant mass density. As before, $\unicode[STIX]{x1D701}=\unicode[STIX]{x1D735}\times \boldsymbol{u}\equiv v_{x}-u_{y}$ , but now all horizontal derivatives are taken with $\unicode[STIX]{x1D703}$ fixed. For simplicity, we consider a horizontally unbounded fluid with flat rigid boundaries at $z_{1}$ and $z_{2}=z_{1}+h_{0}$ . At these boundaries, the buoyancy is (and remains) uniform at the values $\unicode[STIX]{x1D703}_{1}$ and $\unicode[STIX]{x1D703}_{2}$ . Thus, the boundary conditions on (3.1)–(3.3) are

(3.4a,b ) $$\begin{eqnarray}B_{\unicode[STIX]{x1D703}}(\boldsymbol{x},\unicode[STIX]{x1D703}_{1},t)=-z_{1},\quad B_{\unicode[STIX]{x1D703}}(\boldsymbol{x},\unicode[STIX]{x1D703}_{2},t)=-z_{2}.\end{eqnarray}$$

The associated vorticity and divergence equations are

(3.5) $$\begin{eqnarray}\unicode[STIX]{x1D701}_{t}=-\unicode[STIX]{x1D735}\boldsymbol{\cdot }((\unicode[STIX]{x1D701}+f_{0})\boldsymbol{u})\end{eqnarray}$$

and

(3.6) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}t}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{u}+\unicode[STIX]{x1D6FB}^{2}\left(B+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)=\unicode[STIX]{x1D735}\times ((\unicode[STIX]{x1D701}+f_{0})\boldsymbol{u}),\end{eqnarray}$$

where now $\unicode[STIX]{x1D735}=(\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y})$ is the horizontal gradient operator with $\unicode[STIX]{x1D703}$ held fixed. The potential vorticity equation is

(3.7) $$\begin{eqnarray}\frac{\text{D}}{\text{D}t}\left(\frac{\unicode[STIX]{x1D701}+f_{0}}{z_{\unicode[STIX]{x1D703}}}\right)=0,\end{eqnarray}$$

where

(3.8) $$\begin{eqnarray}\frac{\text{D}}{\text{D}t}\equiv \frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}t}+u\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}+v\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}.\end{eqnarray}$$

For derivations of these equations, see, for example, Salmon (1998, pp. 105–107).

We seek a variational principle analogous to that found for shallow-water dynamics. First, as in § 2, we suppose that the fluid was, at some time, in the state of rest. In this state, the potential vorticity depends only on $\unicode[STIX]{x1D703}$ . However, since the potential vorticity and buoyancy are conserved on fluid particles, it must be true that

(3.9) $$\begin{eqnarray}\frac{\unicode[STIX]{x1D701}+f_{0}}{z_{\unicode[STIX]{x1D703}}}=F(\unicode[STIX]{x1D703})\end{eqnarray}$$

at all times, for some function $F$ . In the state of rest, $\unicode[STIX]{x1D701}=0$ and $z_{\unicode[STIX]{x1D703}}={N_{0}}^{-2}(\unicode[STIX]{x1D703})$ , so evaluating (3.9) there gives $F(\unicode[STIX]{x1D703})=f_{0}{N_{0}}^{2}(\unicode[STIX]{x1D703})$ , where $N_{0}(\unicode[STIX]{x1D703})$ is the Väisälä frequency associated with the state of rest. Thus,

(3.10) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}=f_{0}{N_{0}}^{2}(\unicode[STIX]{x1D703})z_{\unicode[STIX]{x1D703}}.\end{eqnarray}$$

We introduce the potential representation

(3.11) $$\begin{eqnarray}\displaystyle & {N_{0}}^{2}(\unicode[STIX]{x1D703})z_{\unicode[STIX]{x1D703}}=1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}, & \displaystyle\end{eqnarray}$$
(3.12) $$\begin{eqnarray}\displaystyle & {N_{0}}^{2}(\unicode[STIX]{x1D703})z_{\unicode[STIX]{x1D703}}u=-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt}, & \displaystyle\end{eqnarray}$$
(3.13) $$\begin{eqnarray}\displaystyle & {N_{0}}^{2}(\unicode[STIX]{x1D703})z_{\unicode[STIX]{x1D703}}v=\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}. & \displaystyle\end{eqnarray}$$

Just as (2.8)–(2.10) automatically satisfy (2.2), the representation (3.11)–(3.13) automatically satisfies (3.3). The prescribed function ${N_{0}}^{2}(\unicode[STIX]{x1D703})$ is analogous to $h_{0}^{-1}$ in § 2. In terms of $\unicode[STIX]{x1D713}$ and $\unicode[STIX]{x1D6FE}$ , the ansatz (3.10) takes the same form,

(3.14) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}\left(\frac{\unicode[STIX]{x1D713}_{y}-\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)=-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE},\end{eqnarray}$$

as in shallow-water dynamics. The $\unicode[STIX]{x1D703}$ -derivative of (3.2) is

(3.15) $$\begin{eqnarray}B_{\unicode[STIX]{x1D703}\unicode[STIX]{x1D703}}=-N_{0}^{-2}(\unicode[STIX]{x1D703})(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}).\end{eqnarray}$$

Let

(3.16) $$\begin{eqnarray}B=B_{0}(\unicode[STIX]{x1D703})+\hat{B}(x,y,\unicode[STIX]{x1D703},t),\end{eqnarray}$$

where $B_{0}(\unicode[STIX]{x1D703})$ is associated with the state of rest. Then, the hydrostatic equation (3.15) becomes

(3.17) $$\begin{eqnarray}\hat{B}_{\unicode[STIX]{x1D703}\unicode[STIX]{x1D703}}=N_{0}^{-2}(\unicode[STIX]{x1D703})\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}\end{eqnarray}$$

and the divergence equation (3.6) becomes

(3.18) $$\begin{eqnarray}\left(\frac{-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{xt}+\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{yt}+\unicode[STIX]{x1D6FB}^{2}\left(\hat{B}+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)=f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}.\end{eqnarray}$$

Equations (3.14), (3.17) and (3.18) result from the variational principle $\unicode[STIX]{x1D6FF}L_{1}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE},\hat{B}]=0$ , where

(3.19) $$\begin{eqnarray}\displaystyle L_{1} & = & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}\unicode[STIX]{x1D703}\,\text{d}t\left(\frac{1}{2}{\hat{B}_{\unicode[STIX]{x1D703}}}^{2}+\frac{1}{N_{0}^{2}}\left[\frac{1}{2}\frac{(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}\right.\right.\nonumber\\ \displaystyle & & \displaystyle +\left.\left.\frac{1}{2}\frac{(\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}+\hat{B}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}\right]\right).\end{eqnarray}$$

The Lagrangian (3.19) is analogous to (2.14).

Before proceeding, we slightly adjust the vertical coordinate. We define a new buoyancy coordinate $Z(\unicode[STIX]{x1D703})$ by

(3.20a,b ) $$\begin{eqnarray}\text{d}Z/\text{d}\unicode[STIX]{x1D703}=N_{0}^{-2}(\unicode[STIX]{x1D703}),\quad Z(\unicode[STIX]{x1D703}_{1})=z_{1}.\end{eqnarray}$$

Thus, $Z(\unicode[STIX]{x1D703})$ is the height of buoyancy surface $\unicode[STIX]{x1D703}$ in the state of rest. With this definition, (3.19) takes the form

(3.21) $$\begin{eqnarray}\displaystyle L_{1} & = & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}Z\,\text{d}t\left(\frac{1}{2}N_{0}^{-2}{\hat{B}_{Z}}^{2}+\frac{1}{2}\frac{(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}\right.\nonumber\\ \displaystyle & & \displaystyle +\left.\frac{1}{2}\frac{(\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt})^{2}}{(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}+\hat{B}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}\right).\end{eqnarray}$$

The integration limits on $Z$ are $z_{1}$ and $z_{2}$ , and we henceforth regard $N_{0}=N_{0}(Z)$ . From now on, we work in the $(x,y,Z,t)$ system. Equation (3.10) takes the form

(3.22) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}=f_{0}\,\text{d}z/\text{d}Z\end{eqnarray}$$

and (3.17) takes the form

(3.23) $$\begin{eqnarray}(N_{0}^{-2}\hat{B}_{Z})_{Z}=\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}.\end{eqnarray}$$
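The coordinate change (3.20a,b) amounts to a one-dimensional quadrature. The sketch below is a minimal illustration, assuming a hypothetical rest-state profile N₀²(θ) supplied as a Python function and a plain trapezoidal rule. For uniform N₀ the result should reduce to Z = z₁ + (θ − θ₁)/N₀².

```python
def rest_height(theta, theta1, z1, n0_squared, steps=1000):
    """Integrate dZ/dtheta = 1 / N0^2(theta) from theta1, with
    Z(theta1) = z1, using the trapezoidal rule."""
    d_theta = (theta - theta1) / steps
    total = 0.0
    for i in range(steps):
        a = theta1 + i * d_theta
        b = a + d_theta
        total += 0.5 * (1.0 / n0_squared(a) + 1.0 / n0_squared(b)) * d_theta
    return z1 + total

# Assumed example: uniform stratification N0^2 = 1e-4 s^-2,
# buoyancy range 0.02 m s^-2 above a lower boundary at z1 = -1000 m
N2 = 1.0e-4
z_uniform = rest_height(theta=0.02, theta1=0.0, z1=-1000.0,
                        n0_squared=lambda th: N2)
```

With these assumed numbers the surface θ = 0.02 m s⁻² rests 200 m above the lower boundary. A non-uniform profile only changes the function passed as `n0_squared`.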

Following the same path as in § 2 (but skipping the step corresponding to point vortices), we generalize the ansatz (3.22) to

(3.24) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}=f_{0}\frac{\text{d}z}{\text{d}Z}+\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)},\end{eqnarray}$$

where $(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})$ are labels that measure APV and move along buoyancy surfaces at the horizontal velocity $\boldsymbol{u}$ . Thus,

(3.25) $$\begin{eqnarray}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})_{t}+\boldsymbol{u}\boldsymbol{\cdot }\unicode[STIX]{x1D735}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})=0.\end{eqnarray}$$

We define

(3.26) $$\begin{eqnarray}L_{2}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE},\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD}]=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}Z\,\text{d}t\left(-\unicode[STIX]{x1D6FC}\unicode[STIX]{x1D6FD}_{t}+\unicode[STIX]{x1D713}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}-\unicode[STIX]{x1D6FE}_{y}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)}+\unicode[STIX]{x1D6FE}_{x}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}\right).\end{eqnarray}$$

The Lagrangian (3.21) is analogous to (2.14), and (3.26) is analogous to the last line of (2.27). As we now demonstrate, the variational principle $\unicode[STIX]{x1D6FF}(L_{1}+L_{2})=0$ is equivalent to hydrostatic Boussinesq dynamics.

The variational principle $\unicode[STIX]{x1D6FF}(L_{1}+L_{2})=0$ implies

(3.27) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\unicode[STIX]{x1D713}:\quad \frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}\left(\frac{\unicode[STIX]{x1D713}_{y}-\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)+f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}=\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)},\end{eqnarray}$$
(3.28) $$\begin{eqnarray}\displaystyle & & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FE}:\quad \left(\frac{-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{xt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{xt}+\left(\frac{\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{yt}}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}\right)_{yt}+\unicode[STIX]{x1D6FB}^{2}\left(\hat{B}+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}\nonumber\\ \displaystyle & & \displaystyle \quad =\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}x}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)}-\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}y}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)},\end{eqnarray}$$
(3.29) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\hat{B}:\quad (N_{0}^{-2}\hat{B}_{Z})_{Z}=\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}, & \displaystyle\end{eqnarray}$$
(3.30) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FD}:\quad \unicode[STIX]{x1D6FC}_{t}+u\unicode[STIX]{x1D6FC}_{x}+v\unicode[STIX]{x1D6FC}_{y}=0, & \displaystyle\end{eqnarray}$$
(3.31) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FC}:\quad \unicode[STIX]{x1D6FD}_{t}+u\unicode[STIX]{x1D6FD}_{x}+v\unicode[STIX]{x1D6FD}_{y}=0, & \displaystyle\end{eqnarray}$$

and the boundary conditions $\hat{B}_{Z}=0$ at $Z=z_{1}$ and $Z=z_{2}$ , where, by (3.11)–(3.13),

(3.32) $$\begin{eqnarray}\boldsymbol{u}=\frac{(-\unicode[STIX]{x1D713}_{y}+\unicode[STIX]{x1D6FE}_{tx},\unicode[STIX]{x1D713}_{x}+\unicode[STIX]{x1D6FE}_{ty})}{1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}}.\end{eqnarray}$$

Solving (3.30) and (3.31) for $u$ and $v$ , we obtain

(3.33a,b ) $$\begin{eqnarray}u=\frac{1}{Q}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(y,t)},\quad v=\frac{1}{Q}\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(t,x)},\end{eqnarray}$$

where

(3.34) $$\begin{eqnarray}Q=\frac{\unicode[STIX]{x2202}(\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD})}{\unicode[STIX]{x2202}(x,y)}\end{eqnarray}$$

is the APV. Then, (3.27) may be rewritten as

(3.35) $$\begin{eqnarray}\unicode[STIX]{x1D701}+f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}=Q\end{eqnarray}$$

and (3.28) may be rewritten as

(3.36) $$\begin{eqnarray}\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}t}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{u}+\unicode[STIX]{x1D6FB}^{2}\left(\hat{B}+\frac{1}{2}\boldsymbol{u}\boldsymbol{\cdot }\boldsymbol{u}\right)-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}=\unicode[STIX]{x1D735}\times (Q\boldsymbol{u}).\end{eqnarray}$$

By (3.32) and (3.35), the right-hand side of (3.36) is

(3.37) $$\begin{eqnarray}\unicode[STIX]{x1D735}\times ((\unicode[STIX]{x1D701}+f_{0})\boldsymbol{u}-f_{0}(1-\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})\boldsymbol{u})=\unicode[STIX]{x1D735}\times ((\unicode[STIX]{x1D701}+f_{0})\boldsymbol{u})-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}.\end{eqnarray}$$

Thus, (3.36) is equivalent to the divergence equation (3.6). By (3.33a,b) and (3.34), the mathematical identity (2.36) takes the same form,

(3.38) $$\begin{eqnarray}Q_{t}+\unicode[STIX]{x1D735}\boldsymbol{\cdot }(Q\boldsymbol{u})=0,\end{eqnarray}$$

as in § 2. Substituting (3.32) and (3.35) into (3.38), we obtain the vorticity equation (3.5). This concludes the proof that $\unicode[STIX]{x1D6FF}(L_{1}+L_{2})=0$ is equivalent to hydrostatic Boussinesq dynamics.
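The step from (3.30)–(3.31) to (3.33a,b) is Cramer's rule applied to a 2×2 linear system for (u, v), which requires Q ≠ 0. The sketch below checks it with arbitrary numbers standing in for the partial derivatives of the labels α and β at a single point. All names and values are illustrative assumptions.

```python
def jac(a1, a2, b1, b2):
    """Jacobian d(alpha, beta)/d(s1, s2) from the four partial derivatives."""
    return a1 * b2 - b1 * a2

def label_velocity(a_t, a_x, a_y, b_t, b_x, b_y):
    """Return (u, v, Q) from (3.33a,b) and (3.34); requires Q != 0."""
    Q = jac(a_x, a_y, b_x, b_y)        # d(alpha, beta)/d(x, y)
    u = jac(a_y, a_t, b_y, b_t) / Q    # d(alpha, beta)/d(y, t) divided by Q
    v = jac(a_t, a_x, b_t, b_x) / Q    # d(alpha, beta)/d(t, x) divided by Q
    return u, v, Q

# Arbitrary sample derivatives of alpha and beta at one point
a_t, a_x, a_y = 0.3, 1.2, -0.4
b_t, b_x, b_y = -0.7, 0.5, 2.0
u, v, Q = label_velocity(a_t, a_x, a_y, b_t, b_x, b_y)

# Residuals of the advection equations (3.30) and (3.31)
res_alpha = a_t + u * a_x + v * a_y
res_beta = b_t + u * b_x + v * b_y
```

Both residuals vanish to rounding error, confirming that the velocity recovered from the labels advects them.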

In the rest of this paper, we replace $Z$ by $z$ to simplify notation, but it is important to remember that $z$ is a disguised buoyancy coordinate. It is in fact the same coordinate introduced by Young (2012).

To further simplify matters, we adopt the standard field-theory convention that all variations vanish at the extremities of the fluid, both in time and in space. That is, we do not attempt to incorporate boundary conditions (which are in some sense arbitrary) into the variational principle. Our focus is on the equations themselves.

4 Bretherton flow

As a first example of the use of the variational principle discovered in the previous section, we consider the flow that develops when waves enter a region that is initially at rest. Bühler & McIntyre (2005) call this ‘Bretherton flow’ after the pioneering work of Bretherton (1969). If the fluid is initially at rest, then the APV vanishes, and the dynamics is completely described by $\unicode[STIX]{x1D6FF}L_{1}[\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE},\hat{B}]=0$ , where $L_{1}$ is given by (3.21). If the waves are sufficiently weak, we may write

(4.1) $$\begin{eqnarray}L_{1}=L_{1}^{0}+L_{1C},\end{eqnarray}$$

where

(4.2) $$\begin{eqnarray}L_{1}^{0}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}N_{0}^{-2}{\hat{B}_{z}}^{2}+\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}+\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}+f_{0}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}+\hat{B}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}\right)\end{eqnarray}$$

contains the quadratic terms in $L_{1}$ , and

(4.3) $$\begin{eqnarray}L_{1C}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}+J(\unicode[STIX]{x1D713},\unicode[STIX]{x1D6FE}_{t})+\frac{1}{2}\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}\right)(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}+(\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE})^{2}+\cdots \,)\end{eqnarray}$$

contains the higher-order corrections. By (3.11), small $\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}$ corresponds to a nearly undisturbed stratification.

If the wave amplitude is small, then, to leading order, the waves are governed by $\unicode[STIX]{x1D6FF}L_{1}^{0}=0$ , which implies

(4.4) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D713}:\quad \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}+f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}=0, & \displaystyle\end{eqnarray}$$
(4.5) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D6FF}\hat{B}:\quad (N_{0}^{-2}\hat{B}_{z})_{z}=\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}, & \displaystyle\end{eqnarray}$$
(4.6) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D6FE}:\quad \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}_{tt}-f_{0}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D713}+\unicode[STIX]{x1D6FB}^{2}\hat{B}=0. & \displaystyle\end{eqnarray}$$

Using (4.4) and (4.5) to eliminate $\unicode[STIX]{x1D713}$ and $\unicode[STIX]{x1D6FE}$ from (4.6), we obtain a single equation for $\hat{B}$ ,

(4.7) $$\begin{eqnarray}(\unicode[STIX]{x2202}_{tt}+f_{0}^{2})(N_{0}^{-2}\hat{B}_{z})_{z}+\unicode[STIX]{x1D6FB}^{2}\hat{B}=0.\end{eqnarray}$$

If $N_{0}(z)$ varies slowly, then (4.7) supports inertia–gravity waves of the form $\hat{B}\propto \cos (kx+ly+mz-\unicode[STIX]{x1D714}t)$ with dispersion relation

(4.8) $$\begin{eqnarray}\unicode[STIX]{x1D714}^{2}=f_{0}^{2}+K^{2}N_{0}^{2}/m^{2}\equiv \unicode[STIX]{x1D714}_{r}(K,m,z)^{2},\end{eqnarray}$$

where $K^{2}=k^{2}+l^{2}$ . The dispersion relation (4.8) is correct for hydrostatic inertia–gravity waves, which have frequencies much smaller than $N_{0}$ .
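The dispersion relation (4.8) can be verified by substituting the plane wave into (4.7) with N₀ treated as locally uniform, so that each derivative becomes a multiplier. The sketch below does this and also reports the ratio ω/N₀, which should be small for the hydrostatic relation to apply. The function names and sample values are assumptions chosen for illustration.

```python
import math

def omega_r(K, m, N0, f0):
    """Relative frequency from (4.8): omega_r^2 = f0^2 + K^2 N0^2 / m^2."""
    return math.sqrt(f0**2 + (K * N0 / m)**2)

def wave_operator_symbol(omega, K, m, N0, f0):
    """Multiplier left by (4.7) on B ~ cos(kx + ly + mz - omega t),
    treating N0 as locally uniform."""
    # d^2/dt^2 -> -omega^2, d^2/dz^2 -> -m^2, horizontal Laplacian -> -K^2
    return (-omega**2 + f0**2) * (-m**2 / N0**2) - K**2

f0, N0 = 1.0e-4, 1.0e-2    # assumed sample values, 1/s
K, m = 1.0e-4, 1.0e-2      # assumed wavenumbers, rad/m
w = omega_r(K, m, N0, f0)
symbol = wave_operator_symbol(w, K, m, N0, f0)
hydrostatic_ratio = w / N0  # should be much smaller than 1
```

For these assumed values the two contributions to ω² are equal, and the frequency is roughly 1 % of N₀, well inside the hydrostatic range.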

If the wave amplitude $a$ is small, then the waves, which are approximately governed by (4.4)–(4.6), induce an $O(a^{2})$ mean flow that may be computed from $L_{1}$ . Let

(4.9a-c ) $$\begin{eqnarray}\unicode[STIX]{x1D713}=\bar{\unicode[STIX]{x1D713}}+\unicode[STIX]{x1D713}^{\prime },\quad \unicode[STIX]{x1D6FE}=\bar{\unicode[STIX]{x1D6FE}}+\unicode[STIX]{x1D6FE}^{\prime },\quad \hat{B}=\bar{B}+B^{\prime },\end{eqnarray}$$

where a bar denotes the mean flow and a prime denotes the wave. We assume that the wave is slowly varying in the sense that its amplitude, wavenumbers and frequency all vary on scales that are large compared with its wavelength and period. We follow the procedure of Whitham (1965, 1974). The averaged Lagrangian is

(4.10) $$\begin{eqnarray}\langle L_{1}\rangle =L_{1}^{0}[\bar{B},\bar{\unicode[STIX]{x1D713}},\bar{\unicode[STIX]{x1D6FE}}]+\langle L_{1}^{0}[B^{\prime },\unicode[STIX]{x1D713}^{\prime },\unicode[STIX]{x1D6FE}^{\prime }]\rangle +\langle L_{1C}[\bar{\unicode[STIX]{x1D713}}+\unicode[STIX]{x1D713}^{\prime },\bar{\unicode[STIX]{x1D6FE}}+\unicode[STIX]{x1D6FE}^{\prime }]\rangle ,\end{eqnarray}$$

where $L_{1}^{0}$ and $L_{1C}$ are defined by (4.2) and (4.3). The angle brackets denote the average over wave phase. Thus, $\langle \bar{\unicode[STIX]{x1D713}}\rangle =\bar{\unicode[STIX]{x1D713}}$ and $\langle \unicode[STIX]{x1D713}^{\prime }\rangle =0$ . We approximate the last term in (4.10) by keeping only the cubic contributions to (4.3) that are quadratic in the primes. Then,

(4.11) $$\begin{eqnarray}\langle L_{1C}\rangle =\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t(M\bar{\unicode[STIX]{x1D713}}+S\bar{\unicode[STIX]{x1D6FE}}),\end{eqnarray}$$

where

(4.12) $$\begin{eqnarray}M=\langle J(\unicode[STIX]{x1D6FE}_{t}^{\prime },\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime })\rangle -\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }\rangle\end{eqnarray}$$

and

(4.13) $$\begin{eqnarray}\displaystyle S & = & \displaystyle \unicode[STIX]{x1D6FB}^{2}\left({\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }\rangle +\langle J(\unicode[STIX]{x1D713}^{\prime },\unicode[STIX]{x1D6FE}_{t}^{\prime })\rangle +{\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}^{\prime }\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}^{\prime }\rangle \right)\nonumber\\ \displaystyle & & \displaystyle +\,\langle J(\unicode[STIX]{x1D713}^{\prime },\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime })\rangle _{t}+\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}_{t}^{\prime }\rangle _{t}.\end{eqnarray}$$

The enormous advantage of Whitham’s method is that terms in the Lagrangian may be simplified before the variations are taken to determine the equations. It turns out that all of the wave-forcing terms on the right-hand sides of (4.12) and (4.13) are negligible except for the first term in (4.12). Let

(4.14) $$\begin{eqnarray}\unicode[STIX]{x1D6FE}^{\prime }=A(x,y,z,t)\cos \unicode[STIX]{x1D719}(x,y,z,t),\end{eqnarray}$$

where $A$ is the slowly varying amplitude and $\unicode[STIX]{x1D719}$ is the rapidly varying phase. The frequency and wavenumbers,

(4.15a-d ) $$\begin{eqnarray}\unicode[STIX]{x1D714}=-\unicode[STIX]{x1D719}_{t},\quad k=\unicode[STIX]{x1D719}_{x},\quad l=\unicode[STIX]{x1D719}_{y},\quad m=\unicode[STIX]{x1D719}_{z},\end{eqnarray}$$

all vary slowly. First, we consider the two terms on the right-hand side of (4.12). At leading order,

(4.16) $$\begin{eqnarray}\langle J(\unicode[STIX]{x1D6FE}_{t}^{\prime },\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime })\rangle =\langle \unicode[STIX]{x1D6FE}_{tx}^{\prime }\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }\rangle _{y}-\langle \unicode[STIX]{x1D6FE}_{ty}^{\prime }\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }\rangle _{x}=\unicode[STIX]{x1D735}\times \left({\textstyle \frac{1}{2}}\unicode[STIX]{x1D714}K^{2}A^{2}\boldsymbol{k}\right),\end{eqnarray}$$

where $K^{2}=k^{2}+l^{2}$ . For the second term, we have

(4.17) $$\begin{eqnarray}-\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }\rangle =f_{0}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle \unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}^{\prime }\rangle ,\end{eqnarray}$$

where we have used the fact that $\unicode[STIX]{x1D713}^{\prime }=-f_{0}\unicode[STIX]{x1D6FE}^{\prime }$ at leading order; see (4.4). The phase average in (4.17) is over two factors, $\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D6FE}^{\prime }$ and $\unicode[STIX]{x1D735}\unicode[STIX]{x1D6FE}^{\prime }$ , that are $90^{\circ }$ out of phase. Thus, (4.17) is smaller than (4.16) by a factor $\unicode[STIX]{x1D716}$ , where $\unicode[STIX]{x1D716}\ll 1$ is the ‘scale-separation’ parameter – the ratio of the wavelength to the length scale of slow variation. Similar remarks apply to all of the terms in (4.13): in contrast to (4.16), all involve at least two derivatives of the slowly varying wave amplitude, frequency and wavenumbers. Thus, the approximation (4.11) becomes

(4.18) $$\begin{eqnarray}\langle L_{1C}\rangle =\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D735}\times \left(\frac{1}{2}\unicode[STIX]{x1D714}K^{2}A^{2}\boldsymbol{k}\right)\end{eqnarray}$$

and the complete Lagrangian (4.10) becomes

(4.19) $$\begin{eqnarray}\displaystyle & & \displaystyle \langle L_{1}\rangle [\bar{B},\bar{\unicode[STIX]{x1D713}},\bar{\unicode[STIX]{x1D6FE}},A,\unicode[STIX]{x1D719}]\nonumber\\ \displaystyle & & \displaystyle \quad =\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left[\frac{1}{2}N_{0}^{-2}{\bar{B}_{z}}^{2}+\frac{1}{2}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}+\frac{1}{2}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}+f_{0}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}+\bar{B}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}\right.\nonumber\\ \displaystyle & & \displaystyle \qquad +\left.\frac{1}{4}K^{2}A^{2}(\unicode[STIX]{x1D714}^{2}-\unicode[STIX]{x1D714}_{r}^{2})+\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D735}\times \left(\frac{1}{2}\unicode[STIX]{x1D714}K^{2}A^{2}\boldsymbol{k}\right)\right],\end{eqnarray}$$

where the relative frequency $\unicode[STIX]{x1D714}_{r}$ is defined by (4.8). The equations resulting from $\unicode[STIX]{x1D6FF}\langle L_{1}\rangle =0$ are

(4.20) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\bar{B}:\quad (N_{0}^{-2}\bar{B}_{z})_{z}-\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}=0, & \displaystyle\end{eqnarray}$$
(4.21) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D713}}:\quad \unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}+f_{0}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}=\unicode[STIX]{x1D735}\times (E\boldsymbol{k}/\unicode[STIX]{x1D714}), & \displaystyle\end{eqnarray}$$
(4.22) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D6FE}}:\quad \unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}_{tt}-f_{0}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}+\unicode[STIX]{x1D6FB}^{2}\bar{B}=0, & \displaystyle\end{eqnarray}$$
(4.23) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}A^{2}:\quad \unicode[STIX]{x1D714}^{2}=f_{0}^{2}+K^{2}N_{0}^{2}/m^{2}+2\unicode[STIX]{x1D714}\bar{\boldsymbol{U}}\boldsymbol{\cdot }\boldsymbol{k}, & \displaystyle\end{eqnarray}$$
(4.24) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D719}:\quad \left(\frac{E}{\unicode[STIX]{x1D714}}\left(1-\frac{\bar{\boldsymbol{U}}\boldsymbol{\cdot }\boldsymbol{k}}{\unicode[STIX]{x1D714}}\right)\right)_{t}+\unicode[STIX]{x1D735}_{3}\boldsymbol{\cdot }\left(\frac{E}{\unicode[STIX]{x1D714}}\left(\frac{\unicode[STIX]{x1D714}_{r}}{\unicode[STIX]{x1D714}}\boldsymbol{c}_{gr}+\bar{\boldsymbol{U}}\right)\right)=0, & \displaystyle\end{eqnarray}$$

where $E\equiv (1/2)K^{2}\unicode[STIX]{x1D714}^{2}A^{2}$ , $\bar{\boldsymbol{U}}\equiv (-\bar{\unicode[STIX]{x1D713}}_{y},\bar{\unicode[STIX]{x1D713}}_{x})$ , $\unicode[STIX]{x1D735}_{3}\equiv (\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y},\unicode[STIX]{x2202}_{z})$ and

(4.25) $$\begin{eqnarray}\boldsymbol{c}_{gr}=\left(\frac{\unicode[STIX]{x2202}\unicode[STIX]{x1D714}_{r}}{\unicode[STIX]{x2202}k},\frac{\unicode[STIX]{x2202}\unicode[STIX]{x1D714}_{r}}{\unicode[STIX]{x2202}l},\frac{\unicode[STIX]{x2202}\unicode[STIX]{x1D714}_{r}}{\unicode[STIX]{x2202}m}\right)=\frac{1}{\sqrt{1+f_{0}^{2}m^{2}/K^{2}N_{0}^{2}}}\frac{N_{0}}{m}\left(\frac{k}{K},\frac{l}{K},-\frac{K}{m}\right)\end{eqnarray}$$

is the relative group velocity. The only forcing term in the equations (4.20)–(4.22) determining the mean flow is the curl of the pseudomomentum,

(4.26) $$\begin{eqnarray}\boldsymbol{p}\equiv E\boldsymbol{k}/\unicode[STIX]{x1D714}.\end{eqnarray}$$

This $O(a^{2})$ term induces an $O(a^{2})$ mean flow that contributes the small final term to the dispersion relation (4.23). Equation (4.23) differs from the expected Doppler-shifted result $\unicode[STIX]{x1D714}=\unicode[STIX]{x1D714}_{r}+\boldsymbol{U}\boldsymbol{\cdot }\boldsymbol{k}$ by an even smaller $O(a^{4})$ term, namely $(\boldsymbol{U}\boldsymbol{\cdot }\boldsymbol{k})^{2}$ , because the approximation (4.18) excludes terms that are quadratic in the mean flow. Similar remarks apply to (4.24). If we keep only the largest terms in each equation, then (4.20)–(4.22) are unchanged, (4.23) reduces to $\unicode[STIX]{x1D714}=\unicode[STIX]{x1D714}_{r}$ and (4.24) takes the familiar form

(4.27) $$\begin{eqnarray}(E/\unicode[STIX]{x1D714})_{t}+\unicode[STIX]{x1D735}_{3}\boldsymbol{\cdot }(\boldsymbol{c}_{g}E/\unicode[STIX]{x1D714})=0\end{eqnarray}$$

of action conservation. In this limit, we may consider the slowly varying wavetrain to be a prescribed solution of the linear equations (4.4)–(4.6). The only challenge is to determine the induced mean flow from (4.20)–(4.22). However, it is worth pointing out that (4.20)–(4.24) are consistent with all conservation laws (including total energy conservation) because they have been obtained from a variational principle, whereas a posteriori approximations like (4.27) generally destroy exact conservation.
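The statements about (4.23) and (4.25) can be checked numerically. The following sketch is illustrative only. It assumes that the relative frequency (4.8), which is not restated in this section, satisfies $\unicode[STIX]{x1D714}_{r}^{2}=f_{0}^{2}+K^{2}N_{0}^{2}/m^{2}$ (the form implied by (4.23) when $\bar{\boldsymbol{U}}=0$), it takes $m>0$, and it uses arbitrary non-dimensional parameter values and hypothetical function names.

```python
import math


def omega_r(k, l, m, f0, N0):
    """Relative frequency, ASSUMING omega_r^2 = f0^2 + K^2 N0^2 / m^2."""
    return math.sqrt(f0**2 + (k**2 + l**2) * N0**2 / m**2)


def c_gr_closed_form(k, l, m, f0, N0):
    """Right-hand side of (4.25); assumes m > 0."""
    K = math.hypot(k, l)
    pref = (N0 / m) / math.sqrt(1.0 + f0**2 * m**2 / (K**2 * N0**2))
    return (pref * k / K, pref * l / K, -pref * K / m)


def c_gr_finite_difference(k, l, m, f0, N0, h=1e-6):
    """Central-difference estimate of (d/dk, d/dl, d/dm) of omega_r."""
    dk = (omega_r(k + h, l, m, f0, N0) - omega_r(k - h, l, m, f0, N0)) / (2 * h)
    dl = (omega_r(k, l + h, m, f0, N0) - omega_r(k, l - h, m, f0, N0)) / (2 * h)
    dm = (omega_r(k, l, m + h, f0, N0) - omega_r(k, l, m - h, f0, N0)) / (2 * h)
    return (dk, dl, dm)


def omega_from_4_23(U_dot_k, w_r):
    """Positive root of omega^2 = w_r^2 + 2 * omega * (U . k), i.e. (4.23)."""
    return U_dot_k + math.sqrt(U_dot_k**2 + w_r**2)


# Arbitrary non-dimensional values chosen for illustration (not from the paper).
k, l, m, f0, N0 = 0.3, 0.4, 2.0, 1.0, 10.0
w_r = omega_r(k, l, m, f0, N0)
cg_closed = c_gr_closed_form(k, l, m, f0, N0)
cg_fd = c_gr_finite_difference(k, l, m, f0, N0)

# Gap between (4.23) and the Doppler formula omega = w_r + U . k for a weak flow.
U_dot_k = 1.0e-3 * w_r
doppler_gap = omega_from_4_23(U_dot_k, w_r) - (w_r + U_dot_k)
leading_gap = U_dot_k**2 / (2.0 * w_r)
```

Under these assumptions the finite-difference gradient should agree with the closed form (4.25), and the frequency obtained from (4.23) should exceed the Doppler-shifted value by approximately $(\bar{\boldsymbol{U}}\boldsymbol{\cdot }\boldsymbol{k})^{2}/2\unicode[STIX]{x1D714}_{r}$, which is the frequency-space counterpart of the missing quadratic term.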

The general solution of the mean flow equations (4.20)–(4.22) includes free inertia–gravity waves, but to include them would be redundant: we are solely interested in the forced response to $\unicode[STIX]{x1D735}\times \boldsymbol{p}$ . If $f_{0}=0$ , then $\bar{\unicode[STIX]{x1D6FE}}$ and $\bar{B}$ vanish, and

(4.28) $$\begin{eqnarray}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}=\unicode[STIX]{x1D735}\times \boldsymbol{p}\end{eqnarray}$$

determines the Lagrangian mean velocity. If $f_{0}\neq 0$ , then (4.20)–(4.22) may be combined to give

(4.29) $$\begin{eqnarray}(\unicode[STIX]{x2202}_{tt}+f_{0}^{2})(N_{0}^{-2}\bar{B}_{z})_{z}+\unicode[STIX]{x1D6FB}^{2}\bar{B}=f_{0}\unicode[STIX]{x1D735}\times \boldsymbol{p}.\end{eqnarray}$$
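For completeness, the elimination can be written out as follows; this is a sketch of the intermediate algebra that uses only (4.20)–(4.22) and the definition (4.26). Equation (4.20) gives $\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}=(N_{0}^{-2}\bar{B}_{z})_{z}$, and (4.21) gives $\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}=\unicode[STIX]{x1D735}\times \boldsymbol{p}-f_{0}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}$. Substituting the second relation into (4.22) yields

$$\begin{eqnarray}(\unicode[STIX]{x2202}_{tt}+f_{0}^{2})\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}+\unicode[STIX]{x1D6FB}^{2}\bar{B}=f_{0}\unicode[STIX]{x1D735}\times \boldsymbol{p},\end{eqnarray}$$

and replacing $\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}$ by $(N_{0}^{-2}\bar{B}_{z})_{z}$ recovers (4.29).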

Suppose that the pseudomomentum $\boldsymbol{p}$ corresponds to a wavepacket. The wavepacket propagates at the group velocity corresponding to its ‘carrier wavenumber’ $\boldsymbol{k}$ . The forced solution of (4.29) propagates at this same group velocity. Hence, we may solve (4.29) in the reference frame moving with the packet by replacing $\unicode[STIX]{x2202}_{t}\rightarrow -\boldsymbol{c}_{gr}\boldsymbol{\cdot }\unicode[STIX]{x1D735}_{3}$ in (4.29). If the group velocity is sufficiently small, then the $\unicode[STIX]{x2202}_{tt}$ -term in (4.29) is negligible, and (4.29) reduces to

(4.30) $$\begin{eqnarray}\unicode[STIX]{x1D6FB}^{2}\bar{B}+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}z}\left(\frac{f_{0}^{2}}{N_{0}^{2}}\frac{\unicode[STIX]{x2202}\bar{B}}{\unicode[STIX]{x2202}z}\right)=f_{0}\unicode[STIX]{x1D735}\times \boldsymbol{p}.\end{eqnarray}$$

In this same limit, equation (4.22) implies $\bar{B}=f_{0}\bar{\unicode[STIX]{x1D713}}$ , so that (4.30) is equivalent to

(4.31) $$\begin{eqnarray}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}z}\left(\frac{f_{0}^{2}}{N_{0}^{2}}\frac{\unicode[STIX]{x2202}\bar{\unicode[STIX]{x1D713}}}{\unicode[STIX]{x2202}z}\right)=\unicode[STIX]{x1D735}\times \boldsymbol{p}.\end{eqnarray}$$

The left-hand side of (4.31) is the quasi-geostrophic potential vorticity. Thus, if the inertia–gravity wavepacket propagates slowly enough, the mean flow response is quasi-geostrophic.
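The role of the packet speed can be illustrated for a single Fourier component of the forcing. The sketch below is not part of the original analysis. It assumes constant $N_{0}$, a response proportional to $\exp [\text{i}(k_{x}x+k_{y}y+k_{z}z)]$ in the frame moving with the packet, where $(k_{x},k_{y},k_{z})$ is a hypothetical envelope wavenumber (not the carrier wavenumber $\boldsymbol{k}$), and arbitrary non-dimensional values. With $\unicode[STIX]{x2202}_{t}\rightarrow -\boldsymbol{c}\boldsymbol{\cdot }\unicode[STIX]{x1D735}_{3}$, equation (4.29) becomes algebraic and can be compared with the quasi-geostrophic balance (4.31).

```python
def bbar_hat(curl_p_hat, kx, ky, kz, c, f0, N0):
    """Fourier amplitude of B-bar from (4.29) with d/dt -> -c . grad_3.

    Assumes constant N0 and fields proportional to exp(i(kx x + ky y + kz z)).
    """
    K2 = kx**2 + ky**2
    sigma = c[0] * kx + c[1] * ky + c[2] * kz  # frequency induced by translation
    return -f0 * curl_p_hat / (K2 + (f0**2 - sigma**2) * kz**2 / N0**2)


def psibar_hat_qg(curl_p_hat, kx, ky, kz, f0, N0):
    """Fourier amplitude of psi-bar from the quasi-geostrophic balance (4.31)."""
    K2 = kx**2 + ky**2
    return -curl_p_hat / (K2 + f0**2 * kz**2 / N0**2)


# Arbitrary non-dimensional values chosen for illustration (not from the paper).
f0, N0 = 1.0, 10.0
kx, ky, kz = 0.2, 0.1, 3.0
curl_p_hat = 1.0

psi_qg = psibar_hat_qg(curl_p_hat, kx, ky, kz, f0, N0)

# Ratio B-bar / (f0 psi-bar_QG) for increasing packet speeds along x.
ratios = []
for speed in (0.0, 0.01, 0.1):
    c = (speed, 0.0, 0.0)
    ratios.append(bbar_hat(curl_p_hat, kx, ky, kz, c, f0, N0) / (f0 * psi_qg))
```

In this setting the ratio tends to unity, that is $\bar{B}\rightarrow f_{0}\bar{\unicode[STIX]{x1D713}}$, when $(\boldsymbol{c}\boldsymbol{\cdot }\unicode[STIX]{x1D735}_{3})^{2}$ acting on the envelope is small compared with $f_{0}^{2}$, which is one way to read the phrase 'slowly enough'.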

The WKB ansatz (4.14), which assumes a scale separation in all four of the independent variables $(x,y,z,t)$, is overly restrictive, because inertia–gravity waves with frequencies near $f_{0}$ correspond to large horizontal length scales. If we assume that only the time scale separates the mean flow from the waves, then only the last two terms in (4.13) may be neglected; all of the other terms in (4.12) and (4.13) must be retained. However, before proceeding with less restrictive assumptions, we generalize the dynamics to the case of non-vanishing APV.

5 Non-vanishing available potential vorticity

If the APV does not vanish, then the full Lagrangian includes $L_{2}$ , which couples the inertia–gravity waves to the APV. The introduction of APV is analogous to the introduction of electric charge. In this section, we develop approximations to $L_{2}$ analogous to those developed for $L_{1}$ in the previous section. We start by considering the form (2.17) of $L_{2}$ corresponding to shallow-water dynamics with point vortices of APV. Let

(5.1) $$\begin{eqnarray}\boldsymbol{x}_{i}(t)=\bar{\boldsymbol{x}}_{i}(t)+\unicode[STIX]{x1D743}_{i}(t),\end{eqnarray}$$

where $\bar{\boldsymbol{x}}_{i}(t)=\langle \boldsymbol{x}_{i}(t)\rangle$ is the average location of the $i$ th point vortex and $\unicode[STIX]{x1D743}_{i}(t)=(\unicode[STIX]{x1D709}_{i}(t),\unicode[STIX]{x1D702}_{i}(t))$ is the departure therefrom. Thus, $\unicode[STIX]{x1D709}_{i}(t)$ and $\unicode[STIX]{x1D702}_{i}(t)$ are rapidly fluctuating variables like $\unicode[STIX]{x1D6FE}^{\prime }(\boldsymbol{x},t)$ and $\unicode[STIX]{x1D713}^{\prime }(\boldsymbol{x},t)$ . Substituting (5.1) into (2.17) and averaging, we obtain

(5.2) $$\begin{eqnarray}\displaystyle \langle L_{2}\rangle & = & \displaystyle \left\langle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\left(-\bar{x}_{i}\dot{\bar{y}}_{i}-\unicode[STIX]{x1D709}_{i}\dot{\unicode[STIX]{x1D702}}_{i}+\unicode[STIX]{x1D713}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)-\unicode[STIX]{x1D6FE}_{y}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)(\dot{\bar{x}}_{i}+\dot{\unicode[STIX]{x1D709}}_{i})\right.\right.\nonumber\\ \displaystyle & & \displaystyle +\left.\!\left.\unicode[STIX]{x1D6FE}_{x}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)(\dot{\bar{y}}_{i}+\dot{\unicode[STIX]{x1D702}}_{i})\right)\!\!\vphantom{\int \mathop{\sum }_{i}}\right\rangle .\end{eqnarray}$$

We proceed by Taylor expansion of the functions $\unicode[STIX]{x1D713}$ , $\unicode[STIX]{x1D6FE}_{y}$ and $\unicode[STIX]{x1D6FE}_{x}$ about the argument $\bar{\boldsymbol{x}}_{i}$ . Consider the single term

(5.3) $$\begin{eqnarray}\left\langle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D713}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)\right\rangle =\left\langle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}(\bar{\unicode[STIX]{x1D713}}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)+\unicode[STIX]{x1D713}^{\prime }(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t))\right\rangle .\end{eqnarray}$$

The function $\bar{\unicode[STIX]{x1D713}}$ depends slowly on its argument no matter what the character of the argument itself. Similarly, the function $\unicode[STIX]{x1D713}^{\prime }$ depends rapidly on its argument. Proceeding with the Taylor expansion, and keeping terms up to quadratic order in the fast variables (the terms linear in the fast variables average to zero), we obtain

(5.4) $$\begin{eqnarray}\displaystyle & & \displaystyle \left\langle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D713}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)\right\rangle \nonumber\\ \displaystyle & & \displaystyle \quad \approx \left\langle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\left(\bar{\unicode[STIX]{x1D713}}(\bar{\boldsymbol{x}}_{i},t)+\frac{1}{2}(\unicode[STIX]{x1D743}_{i}\boldsymbol{\cdot }\unicode[STIX]{x1D735})^{2}\bar{\unicode[STIX]{x1D713}}(\bar{\boldsymbol{x}}_{i},t)+\unicode[STIX]{x1D743}_{i}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }(\bar{\boldsymbol{x}}_{i},t)\right)\right\rangle .\end{eqnarray}$$

In the preceding section, we approximated $L_{1}$ by keeping only the cubic terms of the form bar–prime–prime. Since the constant $\unicode[STIX]{x1D6E4}_{i}$ counts as a ‘bar’ variable, the middle term on the right-hand side of (5.4) is of the order bar–bar–prime–prime. Thus, if we treat $L_{2}$ in the same manner as we treated $L_{1}$ in the previous section, we can consistently neglect this middle term. (We come back to this point in § 8.) Our approximation becomes

(5.5) $$\begin{eqnarray}\left\langle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D713}(\bar{\boldsymbol{x}}_{i}+\unicode[STIX]{x1D743}_{i},t)\right\rangle \approx \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}(\bar{\unicode[STIX]{x1D713}}(\bar{\boldsymbol{x}}_{i},t)+\langle \unicode[STIX]{x1D743}_{i}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }(\bar{\boldsymbol{x}}_{i},t)\rangle ).\end{eqnarray}$$
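The wave contribution retained in (5.5) can be illustrated with a one-dimensional toy problem. The sketch below is not from the paper: the displacement $\unicode[STIX]{x1D709}=a\cos \unicode[STIX]{x1D714}t$, the wave field $\unicode[STIX]{x1D713}^{\prime }=b\cos (kx-\unicode[STIX]{x1D714}t)$ and all parameter values are hypothetical choices. It compares the exact fast-time average of $\unicode[STIX]{x1D713}^{\prime }$ evaluated at the displaced position with the leading Taylor term $\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D713}_{x}^{\prime }\rangle$ evaluated at the mean position.

```python
import math


def fast_time_average(func, omega, n=512):
    """Average func(t) over one period 2*pi/omega using the midpoint rule."""
    period = 2.0 * math.pi / omega
    return sum(func((j + 0.5) * period / n) for j in range(n)) / n


# Hypothetical one-dimensional parameters (small displacement: k * a = 0.1).
a, b, k, omega, xbar = 0.05, 1.0, 2.0, 3.0, 0.7


def xi(t):
    """Fluctuating particle displacement."""
    return a * math.cos(omega * t)


def psi_prime(x, t):
    """Fluctuating streamfunction."""
    return b * math.cos(k * x - omega * t)


def psi_prime_x(x, t):
    """x-derivative of the fluctuating streamfunction."""
    return -b * k * math.sin(k * x - omega * t)


# Exact average at the displaced position versus the term kept in (5.5).
exact = fast_time_average(lambda t: psi_prime(xbar + xi(t), t), omega)
leading = fast_time_average(lambda t: xi(t) * psi_prime_x(xbar, t), omega)
relative_error = abs(exact - leading) / abs(leading)
```

For this toy problem the two averages differ only at cubic order in $ka$, so the relative error is small when the displacement is small compared with the wavelength.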

The last two terms in (5.2) may be approximated in a similar manner, and the result is

(5.6) $$\begin{eqnarray}\displaystyle \langle L_{2}\rangle & = & \displaystyle \int \,\text{d}t\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i} (-\bar{x}_{i}\dot{\bar{y}}_{i}+\bar{\unicode[STIX]{x1D713}}(\bar{\boldsymbol{x}}_{i},t)-\bar{\unicode[STIX]{x1D6FE}}_{y}(\bar{\boldsymbol{x}}_{i},t)\dot{\bar{x}}_{i}+\bar{\unicode[STIX]{x1D6FE}}_{x}(\bar{\boldsymbol{x}}_{i},t)\dot{\bar{y}}_{i}-\langle \unicode[STIX]{x1D709}_{i}\dot{\unicode[STIX]{x1D702}}_{i}\rangle \nonumber\\ \displaystyle & & \displaystyle +\,\langle \unicode[STIX]{x1D743}_{i}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }(\bar{\boldsymbol{x}}_{i},t)\rangle -\langle \unicode[STIX]{x1D6FE}_{y}^{\prime }(\bar{\boldsymbol{x}}_{i},t)\dot{\unicode[STIX]{x1D709}}_{i}\rangle +\langle \unicode[STIX]{x1D6FE}_{x}^{\prime }(\bar{\boldsymbol{x}}_{i},t)\dot{\unicode[STIX]{x1D702}}_{i}\rangle \!).\end{eqnarray}$$

The $\bar{\unicode[STIX]{x1D6FE}}$ -terms in (5.6) involve only the mean flow and represent higher-order corrections to quasi-geostrophy. Neglecting these terms, and defining $\unicode[STIX]{x1D743}(\boldsymbol{x},t)$ to be the fluctuating displacement of the fluid particle currently located at $\boldsymbol{x}$ (whether or not there is any APV there), we rewrite (5.6) as

(5.7) $$\begin{eqnarray}\langle L_{2}\rangle =\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\bar{\boldsymbol{x}}_{i}(t))(-x\dot{\bar{y}}_{i}+\bar{\unicode[STIX]{x1D713}}-\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle +\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }\rangle -\langle \unicode[STIX]{x1D6FE}_{y}^{\prime }\unicode[STIX]{x1D709}_{t}\rangle +\langle \unicode[STIX]{x1D6FE}_{x}^{\prime }\unicode[STIX]{x1D702}_{t}\rangle ).\end{eqnarray}$$

Because of the delta function, all of the terms in the integrand of (5.7) can be considered to be functions of $(\boldsymbol{x},t)$ . We define

(5.8) $$\begin{eqnarray}\tilde{Q}(x,y,t)\equiv \mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\bar{\boldsymbol{x}}_{i}(t)).\end{eqnarray}$$

In contrast to the $Q$ defined by (2.38), $\tilde{Q}$ depends slowly on time; it evolves with the average locations of the point vortices. However, $\tilde{Q}$ is not the average of $Q$; in that way it resembles the quantity $\tilde{\unicode[STIX]{x1D70C}}$ introduced by Bühler & McIntyre (1998). Just as their $\tilde{\unicode[STIX]{x1D70C}}$ represents the mass density that would occur if the fluid moved at the average velocity, $\tilde{Q}$ represents the APV that would be present if the point vortices moved at their average velocity $\dot{\bar{\boldsymbol{x}}}_{i}$.

We pass to the limit of continuous APV by replacing

(5.9) $$\begin{eqnarray}\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\unicode[STIX]{x1D6FF}(\boldsymbol{x}-\bar{\boldsymbol{x}}_{i}(t))\rightarrow \frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)},\end{eqnarray}$$

where $(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})$ are labelling variables for $\tilde{Q}$. Compare (5.9) with (2.21). Then, following steps similar to those between (2.21) and (2.27), we arrive at

(5.10) $$\begin{eqnarray}\langle L_{2}\rangle =\iiint \,\text{d}t\,\text{d}\boldsymbol{x}\left(-\tilde{\unicode[STIX]{x1D6FC}}\tilde{\unicode[STIX]{x1D6FD}}_{t}+\frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)}\left(\bar{\unicode[STIX]{x1D713}}-\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle +\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D713}^{\prime }\rangle -\langle \unicode[STIX]{x1D6FE}_{y}^{\prime }\unicode[STIX]{x1D709}_{t}\rangle +\langle \unicode[STIX]{x1D6FE}_{x}^{\prime }\unicode[STIX]{x1D702}_{t}\rangle \right)\right)\end{eqnarray}$$

for the shallow-water case. The $\langle L_{2}\rangle$ for hydrostatic Boussinesq dynamics requires only an additional integration with respect to $z$ and the reinterpretation of all of the derivatives in (5.10) as derivatives with $z$ held fixed.

The Lagrangian (5.10) depends on the four fast variables $\unicode[STIX]{x1D713}^{\prime }$ , $\unicode[STIX]{x1D6FE}^{\prime }$ , $\unicode[STIX]{x1D709}$ and $\unicode[STIX]{x1D702}$ . However, we may use the equations

(5.11) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D709}:\quad \unicode[STIX]{x1D702}_{t}=\unicode[STIX]{x1D713}_{x}^{\prime }+\unicode[STIX]{x1D6FE}_{ty}^{\prime }, & \displaystyle\end{eqnarray}$$
(5.12) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D702}:\quad \unicode[STIX]{x1D709}_{t}=-\unicode[STIX]{x1D713}_{y}^{\prime }+\unicode[STIX]{x1D6FE}_{tx}^{\prime } & \displaystyle\end{eqnarray}$$

to eliminate $\unicode[STIX]{x1D713}^{\prime }$ and $\unicode[STIX]{x1D6FE}^{\prime }$ in favour of $\unicode[STIX]{x1D709}$ and $\unicode[STIX]{x1D702}$ . Note that the variations $\unicode[STIX]{x1D6FF}\unicode[STIX]{x1D743}$ are performed inside the averaging symbol in (5.10), and that the time integrations by parts needed to establish (5.11)–(5.12) ignore the slow time variations of the Jacobian in (5.10). Equations (5.11)–(5.12) equate the time derivatives of $\unicode[STIX]{x1D743}$ to the fluctuating velocity. Substituting (5.11)–(5.12) back into (5.10) (and adding the $z$ -integration required for three-dimensional Boussinesq dynamics), we obtain

(5.13) $$\begin{eqnarray}\langle L_{2}\rangle =\iiiint \,\text{d}t\,\text{d}\boldsymbol{x}\,\text{d}z\left(-\tilde{\unicode[STIX]{x1D6FC}}\tilde{\unicode[STIX]{x1D6FD}}_{t}+\frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)}(\bar{\unicode[STIX]{x1D713}}+\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle )\right)\end{eqnarray}$$

for hydrostatic Boussinesq dynamics.
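The reduction of the wave terms in (5.10) to the single term $\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle$ in (5.13) relies only on (5.11)–(5.12) and on the vanishing of the average of any exact fast-time derivative. The sketch below checks this at a fixed point with hypothetical periodic signals (none of them taken from the paper); the gradients of $\unicode[STIX]{x1D713}^{\prime }$ are not chosen freely but are fixed by (5.11)–(5.12).

```python
import math

N = 256  # midpoint-rule points per fast period (exact for these trig polynomials)
TIMES = [(j + 0.5) * 2.0 * math.pi / N for j in range(N)]


def avg(f):
    """Average of f(t) over one fast period of length 2*pi."""
    return sum(f(t) for t in TIMES) / N


# Hypothetical periodic signals at a fixed position, with analytic derivatives.
def xi(t): return 0.3 * math.cos(t) + 0.1 * math.sin(2 * t)
def xi_t(t): return -0.3 * math.sin(t) + 0.2 * math.cos(2 * t)
def eta(t): return 0.2 * math.sin(t) + 0.05 * math.cos(3 * t)
def eta_t(t): return 0.2 * math.cos(t) - 0.15 * math.sin(3 * t)
def gam_x(t): return 0.4 * math.cos(2 * t + 0.3)     # gamma'_x
def gam_xt(t): return -0.8 * math.sin(2 * t + 0.3)   # gamma'_xt
def gam_y(t): return 0.25 * math.sin(t + 1.0)        # gamma'_y
def gam_yt(t): return 0.25 * math.cos(t + 1.0)       # gamma'_yt


# Gradients of psi' determined by (5.11) and (5.12).
def psi_x(t): return eta_t(t) - gam_yt(t)
def psi_y(t): return -xi_t(t) + gam_xt(t)


# Wave terms multiplying the Jacobian in (5.10) ...
wave_terms_5_10 = (
    -avg(lambda t: xi(t) * eta_t(t))
    + avg(lambda t: xi(t) * psi_x(t) + eta(t) * psi_y(t))
    - avg(lambda t: gam_y(t) * xi_t(t))
    + avg(lambda t: gam_x(t) * eta_t(t))
)
# ... and the single wave term multiplying the Jacobian in (5.13).
wave_term_5_13 = avg(lambda t: xi(t) * eta_t(t))
```

The two quantities agree to rounding error for these signals, because the $\unicode[STIX]{x1D6FE}^{\prime }$ terms combine into exact time derivatives and $-\langle \unicode[STIX]{x1D702}\unicode[STIX]{x1D709}_{t}\rangle =\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle$.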

6 The complete Lagrangian

Now, we combine the results of §§ 4 and 5 to obtain the complete Lagrangian. To obtain a form of $\langle L_{1}\rangle$ that is compatible with (5.13), we must eliminate the variables $\unicode[STIX]{x1D713}^{\prime }$ and $\unicode[STIX]{x1D6FE}^{\prime }$ from (4.10)–(4.13). Using (5.11)–(5.12), we obtain

(6.1) $$\begin{eqnarray}\displaystyle \langle L_{1}\rangle & = & \displaystyle \!\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\!\left(\frac{1}{2}N_{0}^{-2}(\bar{B}_{z})^{2}+\frac{1}{2}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}+\frac{1}{2}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}+f_{0}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}+\bar{B}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}\right.\nonumber\\ \displaystyle & & \displaystyle +\left.\left\langle \frac{1}{2}N_{0}^{-2}(B_{z}^{\prime })^{2}+\frac{1}{2}\unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}+f_{0}\unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}+B^{\prime }(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\right\rangle +\bar{\unicode[STIX]{x1D713}}M+\bar{\unicode[STIX]{x1D6FE}}\unicode[STIX]{x1D6FB}^{2}T\right),\end{eqnarray}$$

where

(6.2) $$\begin{eqnarray}M=-\unicode[STIX]{x1D735}\times \langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}_{t}\rangle\end{eqnarray}$$

and

(6.3) $$\begin{eqnarray}T=\left\langle {\textstyle \frac{1}{2}}\unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}\right\rangle .\end{eqnarray}$$

In writing (6.1), we have dropped the last two terms in (4.13): since the average is an average over the fast time dependence of the waves, we have $\unicode[STIX]{x2202}_{t}\langle \rangle =0$ .

Now, we embark on a series of simplifications. First, assuming that the mean flow response will be slow, we drop the $(1/2)\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}$ -term from (6.1). This neglect is equivalent to the neglect of $\bar{B}_{tt}$ in (4.29), which led to (4.31). By a priori neglect of $(1/2)\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}_{t}$ , we prevent the mean flow from developing its own inertia–gravity waves.

Next, noting that the slow variables $\bar{B}$ and $\bar{\unicode[STIX]{x1D6FE}}$ appear only in $\langle L_{1}\rangle$ , we use the equation

(6.4) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D6FE}}:\quad -f_{0}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}+\unicode[STIX]{x1D6FB}^{2}\bar{B}+\unicode[STIX]{x1D6FB}^{2}T=0\end{eqnarray}$$

to remove $\bar{B}$ and $\bar{\unicode[STIX]{x1D6FE}}$ from (6.1). That is, by substituting

(6.5) $$\begin{eqnarray}\bar{B}=f_{0}\bar{\unicode[STIX]{x1D713}}-T\end{eqnarray}$$

into the terms

(6.6) $$\begin{eqnarray}\displaystyle & & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}N_{0}^{-2}(\bar{B}_{z})^{2}+f_{0}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D6FE}}+\bar{B}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}+\bar{\unicode[STIX]{x1D6FE}}\unicode[STIX]{x1D6FB}^{2}T\right)\nonumber\\ \displaystyle & & \displaystyle \quad =\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}N_{0}^{-2}(\bar{B}_{z})^{2}+\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D6FE}}(-f_{0}\bar{\unicode[STIX]{x1D713}}+\bar{B}+T)\right),\end{eqnarray}$$

we obtain

(6.7) $$\begin{eqnarray}\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}N_{0}^{-2}(\bar{B}_{z})^{2}+0\right)\approx \iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\frac{f_{0}^{2}}{N_{0}^{2}}(\bar{\unicode[STIX]{x1D713}}_{z})^{2}-\frac{f_{0}}{N_{0}^{2}}\bar{\unicode[STIX]{x1D713}}_{z}T_{z}\right)\end{eqnarray}$$

after neglecting terms that are quartic in the wave amplitudes. With these simplifications, equation (6.1) takes the form

(6.8) $$\begin{eqnarray}\displaystyle \langle L_{1}\rangle & = & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}+\frac{1}{2}\frac{f_{0}^{2}}{N_{0}^{2}}(\bar{\unicode[STIX]{x1D713}}_{z})^{2}-\frac{f_{0}}{N_{0}^{2}}\bar{\unicode[STIX]{x1D713}}_{z}T_{z}\right.\nonumber\\ \displaystyle & & \displaystyle +\left.\left\langle \frac{1}{2}N_{0}^{-2}(B_{z}^{\prime })^{2}+\frac{1}{2}\unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}+f_{0}\unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}+B^{\prime }(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\right\rangle +\bar{\unicode[STIX]{x1D713}}M\right).\end{eqnarray}$$

Finally, we note that $\unicode[STIX]{x1D6FF}(\langle L_{1}\rangle +\langle L_{2}\rangle )=0$ implies

(6.9) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D713}}:\quad \unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}+\left(\frac{f_{0}^{2}}{N_{0}^{2}}\bar{\unicode[STIX]{x1D713}}_{z}\right)_{z}-M-\left(\frac{f_{0}}{N_{0}^{2}}T_{z}\right)_{z}=\frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)}.\end{eqnarray}$$

To consistent order, we may use (6.9) to eliminate the Jacobian on the right-hand side of (5.13) from its product with $\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle$ . Neglecting terms quartic in the wave amplitudes, we obtain

(6.10) $$\begin{eqnarray}\langle L_{2}\rangle [\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}},\bar{\unicode[STIX]{x1D713}},\unicode[STIX]{x1D709},\unicode[STIX]{x1D702}]=\iiiint \,\text{d}t\,\text{d}\boldsymbol{x}\,\text{d}z\left(-\tilde{\unicode[STIX]{x1D6FC}}\tilde{\unicode[STIX]{x1D6FD}}_{t}+\bar{\unicode[STIX]{x1D713}}\frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)}+\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle {\mathcal{L}}\bar{\unicode[STIX]{x1D713}}\right),\end{eqnarray}$$

where

(6.11) $$\begin{eqnarray}{\mathcal{L}}\equiv \unicode[STIX]{x1D6FB}^{2}+\unicode[STIX]{x2202}_{z}\frac{f_{0}^{2}}{N_{0}^{2}}\unicode[STIX]{x2202}_{z}\end{eqnarray}$$

is the familiar potential vorticity operator.

Combining (6.8) and (6.10), we obtain the averaged Lagrangian for the complete system as

(6.12) $$\begin{eqnarray}\langle L_{1}+L_{2}\rangle =L_{QG}[\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}}]+L_{IG}[\unicode[STIX]{x1D709},\unicode[STIX]{x1D702},B^{\prime }]+L_{C}[\bar{\unicode[STIX]{x1D713}},\unicode[STIX]{x1D709},\unicode[STIX]{x1D702}],\end{eqnarray}$$

where

(6.13) $$\begin{eqnarray}L_{QG}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}+\frac{1}{2}\frac{f_{0}^{2}}{N_{0}^{2}}(\bar{\unicode[STIX]{x1D713}}_{z})^{2}-\tilde{\unicode[STIX]{x1D6FC}}\tilde{\unicode[STIX]{x1D6FD}}_{t}+\bar{\unicode[STIX]{x1D713}}\frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)}\right)\end{eqnarray}$$

is the Lagrangian for quasi-geostrophic dynamics in the absence of inertia–gravity waves,

(6.14) $$\begin{eqnarray}L_{IG}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left\langle \frac{1}{2}N_{0}^{-2}(B_{z}^{\prime })^{2}+\frac{1}{2}\unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}+f_{0}\unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}+B^{\prime }(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\right\rangle\end{eqnarray}$$

is the Lagrangian for inertia–gravity waves in the absence of a quasi-geostrophic mean flow and

(6.15) $$\begin{eqnarray}L_{C}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\bar{\unicode[STIX]{x1D713}}\left(M+\left(\frac{f_{0}}{N_{0}^{2}}T_{z}\right)_{z}+{\mathcal{L}}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle \right)\end{eqnarray}$$

is the Lagrangian that couples the inertia–gravity waves to the quasi-geostrophic motion. The quadratic wave averages $M$ and $T$ are defined by (6.2) and (6.3). The terms involving $M$ and $T$ occur in (6.1) and would be present even if there were no APV. The last term in (6.15) represents the coupling between the waves and the APV.

The Lagrangian (6.12)–(6.15) is our fundamental result. To derive it, we have assumed that the quasi-geostrophic motion, now represented by $(\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})$ , evolves slowly in comparison with the inertia–gravity waves, now represented by $(\unicode[STIX]{x1D709},\unicode[STIX]{x1D702},B^{\prime })$ , and we have identified the average with an average over fast time. This assumption of a time scale separation between the mean flow and the waves would seem to be the weakest assumption on which to base any wave–mean theory: the waves cannot have frequencies less than $f_{0}$ , and the quasi-geostrophic flow evolves on a time scale longer than $1/f_{0}$ . Stronger assumptions lead to simplifications of (6.15) that yield simpler equations. In the next section, we explore a spectrum of possibilities.

7 Wave–mean equations

We work with the Lagrangian (6.12)–(6.15), which assumes that the waves and the mean flow are separated only by their time scales. We obtain the equations for the mean flow by varying $\bar{\unicode[STIX]{x1D713}}$ , $\tilde{\unicode[STIX]{x1D6FC}}$ and $\tilde{\unicode[STIX]{x1D6FD}}$ . By the Jacobi identity (2.50) and the definition

(7.1) $$\begin{eqnarray}\tilde{Q}=\frac{\unicode[STIX]{x2202}(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})}{\unicode[STIX]{x2202}(x,y)},\end{eqnarray}$$

the equations

(7.2) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\tilde{\unicode[STIX]{x1D736}}:\quad \tilde{\unicode[STIX]{x1D736}}_{t}+J(\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D736}})=0\end{eqnarray}$$

imply

(7.3) $$\begin{eqnarray}\tilde{Q}_{t}+J(\bar{\unicode[STIX]{x1D713}},\tilde{Q})=0.\end{eqnarray}$$
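The step from (7.2) to (7.3) can be spelled out as follows. This sketch assumes the usual notation $J(a,b)\equiv \unicode[STIX]{x2202}(a,b)/\unicode[STIX]{x2202}(x,y)$ and takes (2.50) to be the standard Jacobi identity $J(a,J(b,c))+J(b,J(c,a))+J(c,J(a,b))=0$; neither is restated in this section, so both are assumptions here. Differentiating (7.1) in time, substituting (7.2) for $\tilde{\unicode[STIX]{x1D6FC}}_{t}$ and $\tilde{\unicode[STIX]{x1D6FD}}_{t}$, and using the antisymmetry of $J$ gives

$$\begin{eqnarray}\displaystyle \tilde{Q}_{t} & = & \displaystyle J(\tilde{\unicode[STIX]{x1D6FC}}_{t},\tilde{\unicode[STIX]{x1D6FD}})+J(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}}_{t})=-J(J(\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}}),\tilde{\unicode[STIX]{x1D6FD}})-J(\tilde{\unicode[STIX]{x1D6FC}},J(\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FD}}))\nonumber\\ \displaystyle & = & \displaystyle J(\tilde{\unicode[STIX]{x1D6FD}},J(\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}}))+J(\tilde{\unicode[STIX]{x1D6FC}},J(\tilde{\unicode[STIX]{x1D6FD}},\bar{\unicode[STIX]{x1D713}}))=-J(\bar{\unicode[STIX]{x1D713}},J(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}}))=-J(\bar{\unicode[STIX]{x1D713}},\tilde{Q}),\end{eqnarray}$$

where the third equality is the Jacobi identity with $(a,b,c)=(\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})$.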

The $\unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D713}}$ -variation yields the defining equation for $\tilde{Q}$ . The variations of $\unicode[STIX]{x1D709}$ , $\unicode[STIX]{x1D702}$ and $B^{\prime }$ yield the equations for the waves. The precise form of these equations will depend on additional assumptions that we may wish to make. At leading order, $\unicode[STIX]{x1D6FF}L_{IG}=0$ implies the linear dynamics

(7.4) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D709}:\quad \unicode[STIX]{x1D709}_{tt}-f_{0}\unicode[STIX]{x1D702}_{t}=-B_{x}^{\prime }, & \displaystyle\end{eqnarray}$$
(7.5) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}\unicode[STIX]{x1D702}:\quad \unicode[STIX]{x1D702}_{tt}+f_{0}\unicode[STIX]{x1D709}_{t}=-B_{y}^{\prime }, & \displaystyle\end{eqnarray}$$
(7.6) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D6FF}B^{\prime }:\quad \unicode[STIX]{x2202}_{z}(N_{0}^{-2}B_{z}^{\prime })=\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y}, & \displaystyle\end{eqnarray}$$

which are equivalent to (4.4)–(4.6). We may consistently use (7.4)–(7.6) to simplify the coupling Lagrangian (6.15).
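A plane-wave check of (7.4)–(7.6) may be helpful at this point. The sketch below is illustrative only: it assumes constant $N_{0}$ and fields proportional to $\exp [\text{i}(kx+ly+mz-\unicode[STIX]{x1D714}t)]$, and the function names and parameter values are hypothetical. The amplitudes are fixed by (7.4) and (7.6), so that the residual of (7.5) acts as the solvability condition, which vanishes when $\unicode[STIX]{x1D714}^{2}=f_{0}^{2}+K^{2}N_{0}^{2}/m^{2}$.

```python
import math


def omega_ig(k, l, m, f0, N0):
    """Frequency from omega^2 = f0^2 + K^2 N0^2 / m^2."""
    return math.sqrt(f0**2 + (k**2 + l**2) * N0**2 / m**2)


def polarization(k, l, m, omega, f0, N0):
    """Amplitudes (X, Y, B_hat) of (xi, eta, B') with X = 1, from (7.4) and (7.6)."""
    c = N0**2 / m**2
    X = 1.0 + 0.0j
    Y = (omega**2 - c * k**2) * X / (c * k * l + 1j * omega * f0)
    B_hat = -1j * N0**2 * (k * X + l * Y) / m**2
    return X, Y, B_hat


def residuals(X, Y, B_hat, k, l, m, omega, f0, N0):
    """Left minus right sides of (7.4), (7.5), (7.6) for plane-wave amplitudes."""
    r4 = -omega**2 * X + 1j * omega * f0 * Y + 1j * k * B_hat
    r5 = -omega**2 * Y - 1j * omega * f0 * X + 1j * l * B_hat
    r6 = -(m**2 / N0**2) * B_hat - 1j * (k * X + l * Y)
    return r4, r5, r6


# Arbitrary non-dimensional values chosen for illustration (not from the paper).
k, l, m, f0, N0 = 0.3, 0.4, 2.0, 1.0, 10.0
omega = omega_ig(k, l, m, f0, N0)

res_on = residuals(*polarization(k, l, m, omega, f0, N0), k, l, m, omega, f0, N0)
omega_off = 1.1 * omega  # deliberately violates the dispersion relation
res_off = residuals(*polarization(k, l, m, omega_off, f0, N0),
                    k, l, m, omega_off, f0, N0)

# Phase relation for a wave vector aligned with x (l = 0): eta / xi = -i f0 / omega.
omega_x = omega_ig(k, 0.0, m, f0, N0)
_, Y_x, _ = polarization(k, 0.0, m, omega_x, f0, N0)
```

For the wave vector aligned with $x$, the amplitude ratio is purely imaginary, so that $\unicode[STIX]{x1D709}$ and $\unicode[STIX]{x1D702}$ are $90^{\circ }$ out of phase in that configuration.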

The strongest possible assumption appears to be the WKB assumption of a separation in scale with respect to all four of the independent variables $x,y,z,t$ . Then, the averaging operator corresponds to an average over wave phase. Let $\unicode[STIX]{x1D716}$ be the scale-separation parameter, as in § 4. Then, the last two terms in (6.15) are $O(\unicode[STIX]{x1D716}^{2})$ because they involve two spatial derivatives of phase averages, whereas the $\bar{\unicode[STIX]{x1D713}}M$ -term in (6.15) is $O(\unicode[STIX]{x1D716})$ , as already noted in § 4. Thus, under WKB scaling,

(7.7) $$\begin{eqnarray}\displaystyle L_{C} & {\approx} & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t(-\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D735}\times \langle \unicode[STIX]{x1D743}_{t}(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\rangle )\nonumber\\ \displaystyle & {\approx} & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t(-\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D735}\times \langle (\unicode[STIX]{x1D709}_{x}\unicode[STIX]{x1D709}_{t},\unicode[STIX]{x1D702}_{y}\unicode[STIX]{x1D702}_{t})\rangle )=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t(\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D735}\times \boldsymbol{p}),\qquad\end{eqnarray}$$

where

(7.8) $$\begin{eqnarray}\boldsymbol{p}=-(\langle \unicode[STIX]{x1D709}_{x}\unicode[STIX]{x1D709}_{t}\rangle ,\langle \unicode[STIX]{x1D702}_{y}\unicode[STIX]{x1D702}_{t}\rangle ).\end{eqnarray}$$

The second approximation in (7.7) uses the fact that the variables $\unicode[STIX]{x1D709}$ and $\unicode[STIX]{x1D702}$ are $90^{\circ }$ out of phase, as may be shown from (7.4)–(7.6). Using (7.4)–(7.6) and $\unicode[STIX]{x2202}_{x}\langle \rangle =0$, etc., we can rewrite the right-hand side of (7.8) in several ways, including the one preferred by Bühler & McIntyre (1998). In the case of a slowly varying wavetrain, we find that $\boldsymbol{p}=E\boldsymbol{k}/\unicode[STIX]{x1D714}$, as in § 4. The variational principle generalizes the results obtained in § 4. The equation

(7.9) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D713}}:\quad \unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}+\frac{\unicode[STIX]{x2202}}{\unicode[STIX]{x2202}z}\left(\frac{f_{0}^{2}}{N_{0}^{2}}\frac{\unicode[STIX]{x2202}\bar{\unicode[STIX]{x1D713}}}{\unicode[STIX]{x2202}z}\right)-\unicode[STIX]{x1D735}\times \boldsymbol{p}=\tilde{Q}\end{eqnarray}$$

replaces (4.31), and we now have (7.3) for advection of APV. The dispersion relation (4.23) and the action equation (4.24) are unchanged. These results are consistent with Bühler & McIntyre (1998). They obtain (7.3) and (7.9), but their wave equations differ from (4.23) and (4.24) because they assume that the mean flow is $O(1)$. For reasons that are further explained in § 8, we have assumed that the mean flow is $O(a^{2})$.

A weaker assumption than the WKB assumption is that the waves have frequencies near the inertial frequency $f_{0}$ . Such waves vary rapidly in $t$ and $z$ , but not in $x$ or $y$ . Assuming that the average corresponds to an average over fast time and fast $z$ , we can consistently neglect the $z$ -derivatives of averages in the coupling Lagrangian (6.15). Then,

(7.10) $$\begin{eqnarray}L_{C}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\bar{\unicode[STIX]{x1D713}}(\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle -\unicode[STIX]{x1D735}\times \langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}_{t}\rangle ).\end{eqnarray}$$

We use (7.6) to eliminate $B^{\prime }$ from (6.14). Then, (6.14) takes the form

(7.11) $$\begin{eqnarray}L_{IG}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\langle \unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}\rangle +f_{0}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle -\frac{1}{2}N_{0}^{2}\left\langle \left(\int \,\text{d}z(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\right)^{2}\right\rangle \right).\end{eqnarray}$$
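The last term of (7.11) can be traced as follows. This is a sketch in which $D$ is a shorthand introduced only for this remark, and in which the constant of integration in $z$ and the boundary contributions from the integration by parts are assumed to vanish. With $D\equiv \int \,\text{d}z(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})$, equation (7.6) integrates to $N_{0}^{-2}B_{z}^{\prime }=D$, and $B^{\prime }(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})=B^{\prime }D_{z}$, so that

$$\begin{eqnarray}\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left\langle \frac{1}{2}N_{0}^{-2}(B_{z}^{\prime })^{2}+B^{\prime }D_{z}\right\rangle =\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left\langle \frac{1}{2}N_{0}^{-2}(B_{z}^{\prime })^{2}-B_{z}^{\prime }D\right\rangle =-\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\,\frac{1}{2}N_{0}^{2}\langle D^{2}\rangle ,\end{eqnarray}$$

where the final step substitutes $B_{z}^{\prime }=N_{0}^{2}D$ in both terms.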

In near-inertial motion, the last term in (7.11) is small. With this in mind, we rewrite the complete Lagrangian as

(7.12) $$\begin{eqnarray}\langle L_{1}+L_{2}\rangle =L_{QG}[\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}}]+L_{I}[\unicode[STIX]{x1D709},\unicode[STIX]{x1D702}]+L_{CN}[\bar{\unicode[STIX]{x1D713}},\unicode[STIX]{x1D709},\unicode[STIX]{x1D702}],\end{eqnarray}$$

where $L_{QG}$ is given by (6.13),

(7.13) $$\begin{eqnarray}L_{I}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\langle \unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}\rangle +f_{0}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle \right)\end{eqnarray}$$

is the Lagrangian for inertia waves and

(7.14) $$\begin{eqnarray}L_{CN}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle -\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D735}\times \langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}_{t}\rangle -\frac{1}{2}N_{0}^{2}\left\langle \left(\int \,\text{d}z(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\right)^{2}\right\rangle \right)\end{eqnarray}$$

is the sum of (7.10) and the last term in (7.11). We shall treat all of the terms in (7.14) as small terms.

Following Young & Ben Jelloul (1997) and Xie & Vanneste (2015, hereafter XV), we set

(7.15) $$\begin{eqnarray}\unicode[STIX]{x1D709}+\text{i}\unicode[STIX]{x1D702}=\unicode[STIX]{x1D712}_{z}\text{e}^{-\text{i}f_{0}t}.\end{eqnarray}$$

The exponential factor in (7.15) represents the fast time dependence of inertial motion. The complex coefficient $\unicode[STIX]{x1D712}(x,y,z,t)$ varies slowly in $x,y,t$ but rapidly in $z$. From (7.15), we have

(7.16) $$\begin{eqnarray}\unicode[STIX]{x1D709}_{t}+\text{i}\unicode[STIX]{x1D702}_{t}=-\text{i}f_{0}\unicode[STIX]{x1D712}_{z}\text{e}^{-\text{i}f_{0}t}+\unicode[STIX]{x1D712}_{zt}\text{e}^{-\text{i}f_{0}t}\end{eqnarray}$$

and

(7.17) $$\begin{eqnarray}\int \,\text{d}z(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})=\frac{1}{2}(\unicode[STIX]{x1D712}_{x}-\text{i}\unicode[STIX]{x1D712}_{y})\text{e}^{-\text{i}f_{0}t}+\text{c.c.},\end{eqnarray}$$

where c.c. denotes the complex conjugate. We substitute (7.16) and (7.17) back into (7.13) and (7.14), computing the averages as averages over the fast time dependence of the exponential factors. In (7.13), we must keep terms of the first and second order. In (7.14), we keep only the leading-order terms. The Coriolis parameter $f_{0}$ serves as a convenient ordering parameter. Thus, in (7.13) we set

(7.18) $$\begin{eqnarray}{\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}\rangle ={\textstyle \frac{1}{2}}f_{0}^{2}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle +\text{i}f_{0}\langle \unicode[STIX]{x1D712}_{zt}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle\end{eqnarray}$$

and

(7.19) $$\begin{eqnarray}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle =-{\textstyle \frac{1}{2}}f_{0}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle -\text{i}{\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D712}_{zt}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle ,\end{eqnarray}$$

and in (7.14) we set

(7.20) $$\begin{eqnarray}\displaystyle & \displaystyle \langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle =-{\textstyle \frac{1}{2}}f_{0}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle , & \displaystyle\end{eqnarray}$$
(7.21) $$\begin{eqnarray}\displaystyle & \displaystyle \unicode[STIX]{x1D735}\times \langle \unicode[STIX]{x1D743}_{t}(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\rangle ={\textstyle \frac{1}{2}}\text{i}f_{0}\langle J(\unicode[STIX]{x1D712}_{z}^{\ast },\unicode[STIX]{x1D712}_{z})\rangle -{\textstyle \frac{1}{4}}f_{0}\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle & \displaystyle\end{eqnarray}$$

and

(7.22) $$\begin{eqnarray}\left\langle \left(\int \,\text{d}z(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\right)^{2}\right\rangle =\frac{1}{2}\langle \unicode[STIX]{x1D735}\unicode[STIX]{x1D712}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}^{\ast }\rangle .\end{eqnarray}$$

Here, $^{\ast }$ denotes complex conjugation and the averaging symbols enclose the rapid $z$-dependence of $\unicode[STIX]{x1D712}$. In simplifying the terms, we have freely used integrations by parts with respect to the rapid variations in $z$ and $t$, and the fact that the averages are averages over these rapid variations. With these substitutions, (7.13) takes the form

(7.23) $$\begin{eqnarray}L_{I}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\frac{1}{2}\text{i}f_{0}\langle \unicode[STIX]{x1D712}_{zt}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle\end{eqnarray}$$

and (7.14) takes the form

(7.24) $$\begin{eqnarray}L_{CN}=\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(\frac{1}{2}\text{i}f_{0}\bar{\unicode[STIX]{x1D713}}\langle J(\unicode[STIX]{x1D712}_{z},\unicode[STIX]{x1D712}_{z}^{\ast })\rangle -\frac{1}{4}f_{0}\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle -\frac{1}{4}N_{0}^{2}\langle \unicode[STIX]{x1D735}\unicode[STIX]{x1D712}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}^{\ast }\rangle \right).\end{eqnarray}$$
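
The leading-order averages used above can be checked numerically. The following Python sketch is an illustration only, not part of the derivation: it freezes the slowly varying envelope $\unicode[STIX]{x1D712}_{z}$ at an arbitrary complex value (so that $\unicode[STIX]{x1D712}_{zt}$ is neglected), samples one inertial period, and compares the fast-time averages with the leading terms of (7.18) and (7.19). The values of `f0` and `chi_z` and the helper name `fast_time_average` are assumptions made for the example.

```python
import cmath
import math


def fast_time_average(func, f0, n=720):
    """Average func(t) over one inertial period 2*pi/f0 using n uniform samples."""
    period = 2.0 * math.pi / f0
    return sum(func(period * k / n) for k in range(n)) / n


f0 = 1.0e-4            # assumed Coriolis parameter (s^-1)
chi_z = 0.3 - 0.7j     # assumed frozen value of the envelope chi_z


def w(t):
    """xi + i*eta from (7.15) with the envelope held fixed."""
    return chi_z * cmath.exp(-1j * f0 * t)


def w_t(t):
    """xi_t + i*eta_t: the first term of (7.16); chi_zt is neglected here."""
    return -1j * f0 * w(t)


# <xi eta_t>, to be compared with -(1/2) f0 |chi_z|^2
avg_xi_eta_t = fast_time_average(lambda t: w(t).real * w_t(t).imag, f0)
# (1/2) <xi_t . xi_t>, to be compared with (1/2) f0^2 |chi_z|^2
avg_kinetic = fast_time_average(lambda t: 0.5 * abs(w_t(t)) ** 2, f0)

print(avg_xi_eta_t, -0.5 * f0 * abs(chi_z) ** 2)
print(avg_kinetic, 0.5 * f0 ** 2 * abs(chi_z) ** 2)
# The O(f0^2) parts of the integrand of (7.13) cancel:
print(avg_kinetic + f0 * avg_xi_eta_t)
```

The cancellation printed on the last line is the reason that only the term involving $\unicode[STIX]{x1D712}_{zt}$ survives in (7.23).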

The complete Lagrangian for near-inertial motion is

(7.25) $$\begin{eqnarray}\displaystyle & & \displaystyle L_{QG}[\bar{\unicode[STIX]{x1D713}},\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}}]+L_{I}[\unicode[STIX]{x1D712}]+L_{CN}[\bar{\unicode[STIX]{x1D713}},\unicode[STIX]{x1D712}]\nonumber\\ \displaystyle & & \displaystyle \quad =\iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\left(-\frac{1}{2}\bar{\unicode[STIX]{x1D713}}{\mathcal{L}}\bar{\unicode[STIX]{x1D713}}-\tilde{\unicode[STIX]{x1D6FC}}\tilde{\unicode[STIX]{x1D6FD}}_{t}+\bar{\unicode[STIX]{x1D713}}J(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})+\frac{\text{i}f_{0}}{2}\langle \unicode[STIX]{x1D712}_{zt}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle -\frac{N_{0}^{2}}{4}\langle \unicode[STIX]{x1D735}\unicode[STIX]{x1D712}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}^{\ast }\rangle \right.\nonumber\\ \displaystyle & & \displaystyle \qquad +\left.\frac{\text{i}f_{0}}{2}\bar{\unicode[STIX]{x1D713}}\langle J(\unicode[STIX]{x1D712}_{z},\unicode[STIX]{x1D712}_{z}^{\ast })\rangle -\frac{f_{0}}{4}\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle \right).\end{eqnarray}$$

The variations of $\tilde{\unicode[STIX]{x1D6FC}}$ and $\tilde{\unicode[STIX]{x1D6FD}}$ lead to (7.3), as in general. The variations of $\bar{\unicode[STIX]{x1D713}}$ and $\unicode[STIX]{x1D712}$ yield

(7.26) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\bar{\unicode[STIX]{x1D713}}:\quad {\mathcal{L}}\bar{\unicode[STIX]{x1D713}}+\frac{\text{i}f_{0}}{2}\langle J(\unicode[STIX]{x1D712}_{z}^{\ast },\unicode[STIX]{x1D712}_{z})\rangle +\frac{f_{0}}{4}\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle =\tilde{Q}\end{eqnarray}$$

and

(7.27) $$\begin{eqnarray}\unicode[STIX]{x1D6FF}\unicode[STIX]{x1D712}^{\ast }:\quad \unicode[STIX]{x1D712}_{zzt}+\frac{\text{i}N_{0}^{2}}{2f_{0}}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}+J(\bar{\unicode[STIX]{x1D713}},\unicode[STIX]{x1D712}_{z})_{z}+\frac{\text{i}}{2}(\unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}})_{z}=0.\end{eqnarray}$$

(The variations $\unicode[STIX]{x1D6FF}\unicode[STIX]{x1D712}$ and $\unicode[STIX]{x1D6FF}\unicode[STIX]{x1D712}^{\ast }$ may be taken independently, but they lead to the same result.) Note that the terms in (7.27) vary rapidly in $z$. Equations (7.3), (7.26) and (7.27) are very close to the equations derived by XV. In our notation, the XV equations comprise (7.3),

(7.28) $$\begin{eqnarray}{\mathcal{L}}\bar{\unicode[STIX]{x1D713}}+\frac{\text{i}f_{0}}{2}\langle J(\unicode[STIX]{x1D712}_{z}^{\ast },\unicode[STIX]{x1D712}_{z})\rangle +\frac{f_{0}}{4}\langle 2\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}_{z}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}_{z}^{\ast }-\unicode[STIX]{x1D712}_{zz}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}^{\ast }-\unicode[STIX]{x1D712}_{zz}^{\ast }\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}\rangle =\tilde{Q}\end{eqnarray}$$

and

(7.29) $$\begin{eqnarray}\unicode[STIX]{x1D712}_{zzt}+\frac{\text{i}N_{0}^{2}}{2f_{0}}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}+J(\bar{\unicode[STIX]{x1D713}},\unicode[STIX]{x1D712}_{z})_{z}+\frac{\text{i}}{2}(\bar{\unicode[STIX]{x1D713}}_{zz}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}+\unicode[STIX]{x1D712}_{zz}\unicode[STIX]{x1D6FB}^{2}\bar{\unicode[STIX]{x1D713}}-2\unicode[STIX]{x1D735}\bar{\unicode[STIX]{x1D713}}_{z}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}_{z})=0.\end{eqnarray}$$

Equation (7.29) was previously derived by Young & Ben Jelloul (1997), who did not consider the effect of the waves on the mean flow. The XV equations correspond to a Lagrangian, namely

(7.30) $$\begin{eqnarray}\displaystyle L_{XV} & = & \displaystyle \iiiint \,\text{d}\boldsymbol{x}\,\text{d}z\,\text{d}t\;\left(-\frac{1}{2}\bar{\unicode[STIX]{x1D713}}{\mathcal{L}}\bar{\unicode[STIX]{x1D713}}-\tilde{\unicode[STIX]{x1D6FC}}\tilde{\unicode[STIX]{x1D6FD}}_{t}+\bar{\unicode[STIX]{x1D713}}J(\tilde{\unicode[STIX]{x1D6FC}},\tilde{\unicode[STIX]{x1D6FD}})+\frac{\text{i}f_{0}}{2}\langle \unicode[STIX]{x1D712}_{zt}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle -\frac{N_{0}^{2}}{4}\langle \unicode[STIX]{x1D735}\unicode[STIX]{x1D712}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D712}^{\ast }\rangle \right.\nonumber\\ \displaystyle & & \displaystyle +\left.\frac{\text{i}f_{0}}{2}\bar{\unicode[STIX]{x1D713}}\langle J(\unicode[STIX]{x1D712}_{z},\unicode[STIX]{x1D712}_{z}^{\ast })\rangle -\frac{f_{0}}{4}\bar{\unicode[STIX]{x1D713}}\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D712}_{z}^{\ast }\rangle +\frac{f_{0}}{4}\bar{\unicode[STIX]{x1D713}}\langle \unicode[STIX]{x1D712}_{z}\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}^{\ast }+\unicode[STIX]{x1D712}_{z}^{\ast }\unicode[STIX]{x1D6FB}^{2}\unicode[STIX]{x1D712}\rangle _{z}\right),\end{eqnarray}$$

that differs from (7.25) by a term with a factor of the form $\unicode[STIX]{x2202}_{z}\langle \rangle$. This term is negligible if, as we have assumed, there is a vertical scale separation between the quasi-geostrophic mean flow and the near-inertial waves. Similarly, the differences between (7.26) and (7.28), and between (7.27) and (7.29), are insignificant under this assumption. The important point is that both sets of equations – ours and those of XV – being equivalent to variational principles, correspond to a consistent set of conservation laws. This fact was used to great advantage by XV.
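
As a consistency check on (7.27) in the absence of a mean flow ($\bar{\unicode[STIX]{x1D713}}=0$), a plane wave $\unicode[STIX]{x1D712}\propto \text{e}^{\text{i}(kx+ly+mz-\unicode[STIX]{x1D714}t)}$ gives $\unicode[STIX]{x1D714}=N_{0}^{2}(k^{2}+l^{2})/(2f_{0}m^{2})$. Because the fast factor $\text{e}^{-\text{i}f_{0}t}$ was removed in (7.15), the total wave frequency is $f_{0}+\unicode[STIX]{x1D714}$, which should approximate the standard hydrostatic inertia–gravity frequency $(f_{0}^{2}+N_{0}^{2}(k^{2}+l^{2})/m^{2})^{1/2}$ whenever the shift is small. The Python sketch below compares the two. The parameter values and function names are illustrative assumptions and are not taken from the analysis above.

```python
import math


def near_inertial_shift(k, l, m, f0, n0):
    """Frequency shift omega of a plane-wave solution of (7.27) with zero mean flow."""
    return n0 ** 2 * (k * k + l * l) / (2.0 * f0 * m * m)


def hydrostatic_frequency(k, l, m, f0, n0):
    """Hydrostatic inertia-gravity wave frequency for uniform f0 and N0."""
    return math.sqrt(f0 ** 2 + n0 ** 2 * (k * k + l * l) / (m * m))


f0 = 1.0e-4                    # assumed Coriolis parameter (s^-1)
n0 = 5.0e-3                    # assumed buoyancy frequency (s^-1)
k = 2.0 * math.pi / 100.0e3    # assumed 100 km horizontal wavelength
l = 0.0
m = 2.0 * math.pi / 200.0      # assumed 200 m vertical wavelength

shift = near_inertial_shift(k, l, m, f0, n0)
exact_shift = hydrostatic_frequency(k, l, m, f0, n0) - f0
relative_error = abs(shift - exact_shift) / exact_shift

# Shift from (7.27), shift from the full dispersion relation (both in units of f0),
# and the relative difference between them.
print(shift / f0, exact_shift / f0, relative_error)
```

For these assumed values the shift is about half a per cent of $f_{0}$, and the two estimates of the shift differ by well under one per cent, as expected for near-inertial waves.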

Finally, we consider the case in which we make no additional assumptions beyond the separation in time scales used to derive (6.12)–(6.15). Then, all of the terms in (6.15) must be kept, and the requirement that (6.12) be stationary with respect to variations in $\bar{\unicode[STIX]{x1D713}}$ yields the generalization

(7.31) $$\begin{eqnarray}{\mathcal{L}}\bar{\unicode[STIX]{x1D713}}+q^{w}=\tilde{Q}\end{eqnarray}$$

of (7.9) and (7.26), where

(7.32) $$\begin{eqnarray}\displaystyle q^{w} & = & \displaystyle -M-\unicode[STIX]{x2202}_{z}\frac{f_{0}}{N_{0}^{2}}T_{z}-{\mathcal{L}}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle \nonumber\\ \displaystyle & = & \displaystyle \unicode[STIX]{x1D735}\times \langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}_{t}\rangle -\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle -\unicode[STIX]{x2202}_{z}\frac{f_{0}}{N_{0}^{2}}\unicode[STIX]{x2202}_{z}\left(f_{0}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle +\frac{1}{2}\langle \unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}\rangle \right).\end{eqnarray}$$

Using (7.4)–(7.6) and $\unicode[STIX]{x2202}_{t}\langle \rangle =0$, we find that

(7.33) $$\begin{eqnarray}{\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D743}_{t}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{t}\rangle =-{\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D743}_{tt}\rangle =-f_{0}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle +{\textstyle \frac{1}{2}}\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}B^{\prime }\rangle .\end{eqnarray}$$

By similar manipulations,

(7.34) $$\begin{eqnarray}\unicode[STIX]{x1D735}\times \langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}_{t}\rangle -\unicode[STIX]{x1D6FB}^{2}\langle \unicode[STIX]{x1D709}\unicode[STIX]{x1D702}_{t}\rangle =\langle J(\unicode[STIX]{x1D709}_{t},\unicode[STIX]{x1D709})\rangle +\langle J(\unicode[STIX]{x1D702}_{t},\unicode[STIX]{x1D702})\rangle +\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle (\unicode[STIX]{x1D702}_{x}-\unicode[STIX]{x1D709}_{y})\unicode[STIX]{x1D743}_{t}\rangle .\end{eqnarray}$$

Thus, (7.32) may be written as

(7.35) $$\begin{eqnarray}q^{w}=\langle J(\unicode[STIX]{x1D709}_{t},\unicode[STIX]{x1D709})\rangle +\langle J(\unicode[STIX]{x1D702}_{t},\unicode[STIX]{x1D702})\rangle +\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle (\unicode[STIX]{x1D702}_{x}-\unicode[STIX]{x1D709}_{y})\unicode[STIX]{x1D743}_{t}\rangle -\left(\frac{f_{0}}{2N_{0}^{2}}\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}B^{\prime }\rangle _{z}\right)_{z}.\end{eqnarray}$$

Assuming only the separation in time scales, Wagner & Young (2015, hereafter WY) derive (7.31) with

(7.36) $$\begin{eqnarray}q^{w}=\langle J(\unicode[STIX]{x1D709}_{t},\unicode[STIX]{x1D709})\rangle +\langle J(\unicode[STIX]{x1D702}_{t},\unicode[STIX]{x1D702})\rangle +f_{0}\langle J(\unicode[STIX]{x1D709},\unicode[STIX]{x1D702})\rangle +\frac{f_{0}}{2}\langle \unicode[STIX]{x1D709}_{i}\unicode[STIX]{x1D709}_{j}\rangle _{,i,j}.\end{eqnarray}$$

We must show that our (7.35) agrees with their (7.36). WY work in Cartesian coordinates with fluid particle displacements $(\unicode[STIX]{x1D709}_{1},\unicode[STIX]{x1D709}_{2},\unicode[STIX]{x1D709}_{3})=(\unicode[STIX]{x1D709},\unicode[STIX]{x1D702},\unicode[STIX]{x1D701})$. (The vertical displacement $\unicode[STIX]{x1D701}$ must not be confused with the relative vorticity denoted by the same symbol earlier in the paper.) Although the first three terms on the right-hand side of (7.36) involve only $x$- and $y$-derivatives, the last term in (7.36) contains index summations from 1 to 3. Thus, the connection between (7.35) and (7.36) is not obvious. In appendix B, we show that (7.35) and (7.36) are in fact equivalent.

8 Discussion

This paper extends the familiar analogy between electric charge and quasi-geostrophic potential vorticity to include the effects of inertia–gravity waves. In this analogy, quasi-geostrophic dynamics corresponds to a kind of Coulomb dynamics in which slowly moving charges interact via action-at-a-distance forces. The introduction of inertia–gravity waves alters this picture in two ways. First, nonlinear interactions between the waves – a feature not present in electrodynamics – can temporarily create a virtual electric charge, i.e. Bretherton flow. This charge is captured by the mean flow if dissipation removes the waves, but even if no dissipation occurs, the virtual charge interacts with other charges in the fluid, as beautifully illustrated in the examples analysed by Bühler & McIntyre (2003). Second, the Stokes drift of the waves contributes to the advection of the charges. It is remarkable that both of these effects, here computed separately as coupling contributions to $L_{1}$ and $L_{2}$, appear as contributions to $q^{w}$ in (7.31).

One could extend our theory by including more of the terms that occur in the expansion (4.3) of $L_{1}$ and the Taylor expansion of $L_{2}$ as given by (5.2). In these expansions, we have kept only cubic terms of the form ‘bar–prime–prime’. This approximation, also introduced via scaling assumptions by WY and XV, assumes that the quasi-geostrophic flow is weak compared with the inertia–gravity waves. Although this is certainly correct for Bretherton flow, oceanographers would be happy to see terms of the form ‘bar–bar–prime–prime’. Unfortunately, the number of such terms is daunting. Besides the desire to avoid complexity, our choice of coupling terms was made in the hope of recovering the results of WY and XV. In the WKB limit, we recover results very similar to those of Bühler & McIntyre (1998) despite their scaling assumption of a stronger mean flow. All three of these papers are challenging papers whose precise methods seem to have little in common. Our ability to recover similar results suggests that the present paper represents a valid synthesis.

Despite the agreement with other work, our methods of approximating the Lagrangian will strike many readers as ad hoc. The development of a more rigorous approach faces two fundamental difficulties. First, there seem to be no dependable rules determining when it is legal to substitute the results of a variational principle back into the variational principle itself. Second, it is useless to apply formal scaling theory to the terms in the Lagrangian because each such term typically contributes to several equations. The importance of this contribution varies from equation to equation and thus cannot be gauged from the size of the term in the Lagrangian.

The most accepted method in our field is asymptotic expansion applied directly to the fluid equations, but there is a sense in which that method too is ad hoc. Virtually all interesting solutions of the fluid equations share the property of turbulence that small differences in initial conditions lead to completely different solutions after a finite time. It must equally be true that the small changes introduced by even the best approximations lead to solutions that eventually differ significantly from the exact solution. The most that can be hoped for is that the approximate solutions resemble the exact solution in a statistical sense. And this is likely to be true only if the approximation maintains conservation laws. Thus, we should distinguish between an approximation that exactly conserves an approximate form of energy and an approximation that only approximately conserves anything. The energy of the latter is likely to increase indefinitely, leading to an enormous statistical divergence from the exact solution. As ad hoc as they may appear to be, Lagrangian approximation methods are the champions at maintaining conservation laws. So long as the corresponding symmetry property is not disturbed, the conservation law survives.

Finally, it should be emphasized that all theories of the general type discussed in this paper are ‘low-energy’ theories in the sense that only at low energy can the flow even approximately be considered to consist of quasi-geostrophic dynamics weakly coupled to free inertia–gravity waves. At higher system energy, the two interact in ways that change the fundamental character of the flow. Typically, the free inertia–gravity waves are replaced by a balanced dynamics much more complicated than quasi-geostrophy, in which much of the flow field is slaved to the potential vorticity; see especially McIntyre & Norton (2000).

Acknowledgements

This work was inspired by the elegant papers of Bühler & McIntyre (1998), Wagner & Young (2015) and Xie & Vanneste (2015). I thank G. L. Wagner, W. R. Young, J.-H. Xie and three anonymous referees for valuable comments.

Appendix A. Analogy with electrodynamics

This appendix clarifies the connection between S14 and the present work. S14 considered a simplified shallow-water dynamics with continuity equation

(A 1) $$\begin{eqnarray}{\hat{h}}_{t}+\unicode[STIX]{x1D735}\boldsymbol{\cdot }\boldsymbol{u}=0.\end{eqnarray}$$

However, the same strategy applies to the exact continuity equation (2.2). If we rewrite (2.2) as a statement of vanishing space–time divergence,

(A 2) $$\begin{eqnarray}(\unicode[STIX]{x2202}_{t},\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y})\cdot ({\hat{h}},{\hat{h}}u,{\hat{h}}v)=0,\end{eqnarray}$$

then it follows from (A 2) that

(A 3) $$\begin{eqnarray}({\hat{h}},{\hat{h}}u,{\hat{h}}v)=(\unicode[STIX]{x2202}_{t},\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y})\times (-\unicode[STIX]{x1D713},A,B)\end{eqnarray}$$

for some vector $(-\unicode[STIX]{x1D713},A,B)$. That is,

(A 4) $$\begin{eqnarray}\displaystyle & {\hat{h}}=B_{x}-A_{y}, & \displaystyle\end{eqnarray}$$
(A 5) $$\begin{eqnarray}\displaystyle & {\hat{h}}u=-\unicode[STIX]{x1D713}_{y}-B_{t}, & \displaystyle\end{eqnarray}$$
(A 6) $$\begin{eqnarray}\displaystyle & {\hat{h}}v=\unicode[STIX]{x1D713}_{x}+A_{t}. & \displaystyle\end{eqnarray}$$
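
A quick numerical sanity check of (A 4)–(A 6): for any smooth potentials, the fields that they define satisfy the continuity equation (A 2) identically. The Python sketch below uses central differences, which commute with one another, so the discrete residual vanishes to rounding error. The test potentials `psi`, `A` and `B`, the step `H` and the helper `d` are hypothetical choices made only for illustration.

```python
import math

H = 1.0e-2  # finite-difference step (assumed)


def d(f, axis):
    """Return the central-difference derivative of f(t, x, y) along axis 0=t, 1=x, 2=y."""
    def df(t, x, y):
        plus = [t, x, y]
        minus = [t, x, y]
        plus[axis] += H
        minus[axis] -= H
        return (f(*plus) - f(*minus)) / (2.0 * H)
    return df


# Hypothetical smooth potentials, chosen arbitrarily.
def psi(t, x, y):
    return math.sin(x + 2.0 * y - t)


def A(t, x, y):
    return math.cos(x * y) * math.exp(-0.1 * t)


def B(t, x, y):
    return x + math.sin(t) * math.cos(x - y)


def h(t, x, y):        # (A 4)
    return d(B, 1)(t, x, y) - d(A, 2)(t, x, y)


def hu(t, x, y):       # (A 5)
    return -d(psi, 2)(t, x, y) - d(B, 0)(t, x, y)


def hv(t, x, y):       # (A 6)
    return d(psi, 1)(t, x, y) + d(A, 0)(t, x, y)


def continuity_residual(t, x, y):
    """Left-hand side of (A 2): h_t + (hu)_x + (hv)_y."""
    return d(h, 0)(t, x, y) + d(hu, 1)(t, x, y) + d(hv, 2)(t, x, y)


residual = continuity_residual(0.3, 0.7, -0.4)
print(residual)  # rounding-level residual
```

The check runs in one direction only: it confirms that the potentials guarantee continuity, which is the property exploited in the variational principle.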

Following steps similar to those in S14, it is straightforward to show that the non-rotating shallow-water equations, with vorticity confined to delta functions located at $\boldsymbol{x}_{i}(t)$, are equivalent to the requirement that

(A 7) $$\begin{eqnarray}\displaystyle L & = & \displaystyle \iiint \,\text{d}\boldsymbol{x}\,\text{d}t\left(\frac{1}{2}\frac{(A_{t}+\unicode[STIX]{x1D713}_{x})^{2}}{(B_{x}-A_{y})}+\frac{1}{2}\frac{(B_{t}+\unicode[STIX]{x1D713}_{y})^{2}}{(B_{x}-A_{y})}-\frac{1}{2}c^{2}(B_{x}-A_{y})^{2}\right)\nonumber\\ \displaystyle & & \displaystyle +\,\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\int \,\text{d}t(\unicode[STIX]{x1D713}(\boldsymbol{x}_{i}(t),t)-A(\boldsymbol{x}_{i}(t),t){\dot{x}}_{i}(t)-B(\boldsymbol{x}_{i}(t),t){\dot{y}}_{i}(t))\end{eqnarray}$$

be stationary with respect to variations in $\unicode[STIX]{x1D713},A,B$ and $\boldsymbol{x}_{i}$.

If we set $B_{x}-A_{y}=1$ (i.e. $h=h_{0}$) in the denominators of (A 7), then the resulting

(A 8) $$\begin{eqnarray}\displaystyle L & = & \displaystyle \iiint \,\text{d}\boldsymbol{x}\,\text{d}t\left(\frac{1}{2}(A_{t}+\unicode[STIX]{x1D713}_{x})^{2}+\frac{1}{2}(B_{t}+\unicode[STIX]{x1D713}_{y})^{2}-\frac{1}{2}c^{2}(B_{x}-A_{y})^{2}\right)\nonumber\\ \displaystyle & & \displaystyle +\,\mathop{\sum }_{i}\unicode[STIX]{x1D6E4}_{i}\int \,\text{d}t(\unicode[STIX]{x1D713}(\boldsymbol{x}_{i}(t),t)-A(\boldsymbol{x}_{i}(t),t){\dot{x}}_{i}(t)-B(\boldsymbol{x}_{i}(t),t){\dot{y}}_{i}(t))\end{eqnarray}$$

is equivalent to the Lagrangian given in S14. The dynamics corresponding to (A 8) were called ‘wave–vortex dynamics’ in S14. Because the first line in (A 7) is non-quadratic in the potentials $\unicode[STIX]{x1D713},A$ and $B$, gravity waves interact nonlinearly in exact shallow-water dynamics, even in the absence of vorticity. However, because the first line in (A 8) is quadratic, the gravity waves in wave–vortex dynamics are linear waves that never interact. In that respect, they resemble electrodynamic waves, and, as noted in S14, the Lagrangian (A 8) is in fact closely analogous to the Lagrangian for classical electrodynamics; see, for example, Landau & Lifshitz (1975, pp. 67–69). In this analogy, $\unicode[STIX]{x1D6E4}_{i}$ corresponds to the charge on the particle located at $\boldsymbol{x}_{i}(t)$. In fact, the only difference between (A 8) and the Lagrangian for classical electrodynamics in two space dimensions is that (A 8) contains no term analogous to

(A 9) $$\begin{eqnarray}\int mc\,\text{d}s,\end{eqnarray}$$

where $m$ is the mass of the electron, $c$ is the speed of light and $\text{d}s$ is the differential of proper time.

The vector $(-\unicode[STIX]{x1D713},A,B)$ is not unique; to it we may add a vector $(\unicode[STIX]{x2202}_{t},\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y})\unicode[STIX]{x1D706}(\boldsymbol{x},t)$ without affecting (A 3). Thus, we are free to add a gauge condition. S14 adopted the Lorentz gauge. In the present paper, we choose the Coulomb gauge,

(A 10) $$\begin{eqnarray}A_{x}+B_{y}=0.\end{eqnarray}$$

Then,

(A 11) $$\begin{eqnarray}(A,B)=(\unicode[STIX]{x1D6FE}_{y},x-\unicode[STIX]{x1D6FE}_{x})\end{eqnarray}$$

for some $\unicode[STIX]{x1D6FE}$. The $x$-term in (A 11) just separates out the mean depth. Substituting (A 11) into (A 4)–(A 6), we obtain (2.8)–(2.10). Substituting (A 11) into (A 7), we obtain (2.18) with $f_{0}=0$. The choice of the Coulomb gauge shifts the emphasis from the momentum equations to the vorticity and divergence equations.

The introduction of non-vanishing coordinate-system rotation is easy at any stage of the development. However, $f_{0}\neq 0$ precipitates a choice between physically equivalent formulations of the dynamics. S14 introduced charges as lumps of potential vorticity. In the present paper, the charges represent lumps of APV. These two approaches are equally valid, but if the former is chosen, then coordinate-system rotation enters as a background medium with a uniform electric charge. The inertia–gravity waves become gravity waves that continually interact with the background charge. This proves to be a clumsy way of looking at inertia–gravity waves.

Because no term like (A 9) appears in (A 7), (2.16), which is analogous to the Lorentz force law for the $i$th charged particle,

(A 12) $$\begin{eqnarray}m_{i}\ddot{\boldsymbol{x}}_{i}=q_{i}(\boldsymbol{E}+\dot{\boldsymbol{x}}_{i}\times \boldsymbol{B}),\end{eqnarray}$$

reduces to the analogue of

(A 13) $$\begin{eqnarray}0=\boldsymbol{E}+\dot{\boldsymbol{x}}_{i}\times \boldsymbol{B},\end{eqnarray}$$

in which the electric force cancels the magnetic force. That is, (2.16) lacks the acceleration term $\ddot{\boldsymbol{x}}_{i}$ because point vortices, unlike electrons, are massless.
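
To make (A 13) concrete: in two space dimensions with the magnetic field normal to the plane, $\boldsymbol{B}=B\hat{\boldsymbol{z}}$, the force balance $0=\boldsymbol{E}+\dot{\boldsymbol{x}}_{i}\times \boldsymbol{B}$ determines the velocity of the massless charge as $\dot{\boldsymbol{x}}_{i}=(E_{y},-E_{x})/B$. The Python sketch below solves for this velocity and confirms that the residual force vanishes. The function names and the field values are hypothetical.

```python
def massless_velocity(ex, ey, bz):
    """Velocity (xdot, ydot) solving 0 = E + xdot x B for B = bz * z_hat (bz != 0)."""
    return ey / bz, -ex / bz


def lorentz_force_per_charge(ex, ey, bz, xdot, ydot):
    """In-plane components of E + xdot x B for B = bz * z_hat."""
    # (xdot, ydot, 0) x (0, 0, bz) = (ydot * bz, -xdot * bz, 0)
    return ex + ydot * bz, ey - xdot * bz


ex, ey, bz = 0.4, -1.5, 2.0   # assumed field values
xdot, ydot = massless_velocity(ex, ey, bz)
force = lorentz_force_per_charge(ex, ey, bz, xdot, ydot)
print((xdot, ydot), force)
```

The velocity is perpendicular to $\boldsymbol{E}$, so a massless charge drifts along lines of constant electric potential. In the fluid analogy this corresponds to a point vortex being carried by the local flow rather than accelerated by it.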

In overall summary, fluid dynamics is more complicated than electrodynamics in that the fluid waves interact with each other, even in the absence of charge (i.e. vorticity). However, electrodynamics is more complicated than fluid dynamics in that electrons, unlike point vortices, have mass.

The Lagrangians given in the present paper are believed to be new. They differ from the more conventional Lagrangians for a perfect fluid, which typically depend upon labelled fluid particles or upon a set of Clebsch potentials; see, for example, Salmon (1998, chap. 7). Given the analogy between fluid dynamics and classical electrodynamics developed in the present paper, one could ask whether electrodynamics has a variational principle like the conventional variational principle of fluid dynamics, that is, solely in terms of the labelled locations of charged particles, and not involving the electromagnetic potentials at all. The answer, of course, is yes. It is the Fokker action principle discussed in the famous paper by Wheeler & Feynman (1949).

Appendix B. Agreement with the result of Wagner & Young (2015)

In this appendix, we show that our (7.35) agrees with WY’s (7.36). WY work in Cartesian coordinates. In Cartesian coordinates, the linear wave dynamics are

(B 1) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D709}_{tt}-f_{0}\unicode[STIX]{x1D702}_{t}=-p_{x}, & \displaystyle\end{eqnarray}$$
(B 2) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D702}_{tt}+f_{0}\unicode[STIX]{x1D709}_{t}=-p_{y}, & \displaystyle\end{eqnarray}$$
(B 3) $$\begin{eqnarray}\displaystyle & p_{z}=b, & \displaystyle\end{eqnarray}$$
(B 4) $$\begin{eqnarray}\displaystyle & b_{t}+N_{0}^{2}\unicode[STIX]{x1D701}_{t}=0, & \displaystyle\end{eqnarray}$$
(B 5) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y}+\unicode[STIX]{x1D701}_{z}=0, & \displaystyle\end{eqnarray}$$

where $p$ is the pressure and $b$ is the buoyancy. If we define

(B 6) $$\begin{eqnarray}\displaystyle & p=B^{\prime }, & \displaystyle\end{eqnarray}$$
(B 7) $$\begin{eqnarray}\displaystyle & \unicode[STIX]{x1D701}=-N_{0}^{-2}B_{z}^{\prime }, & \displaystyle\end{eqnarray}$$
(B 8) $$\begin{eqnarray}\displaystyle & b=B_{z}^{\prime }, & \displaystyle\end{eqnarray}$$

then (7.4)–(7.6) are formally identical to (B 1)–(B 5). (We can ignore the difference between the derivatives in the two coordinate systems; this difference is negligible at the order of $q^{w}$.) Directly from (B 1)–(B 5), WY demonstrate that

(B 9) $$\begin{eqnarray}\langle \unicode[STIX]{x1D743}_{3}\boldsymbol{\cdot }\unicode[STIX]{x1D735}_{3}p\rangle _{z}=-2N_{0}^{2}\left\langle \unicode[STIX]{x1D743}_{3}\boldsymbol{\cdot }\unicode[STIX]{x1D735}_{3}\unicode[STIX]{x1D701}+{\textstyle \frac{1}{2}}\unicode[STIX]{x1D701}^{2}\unicode[STIX]{x2202}_{z}\ln N_{0}^{2}\right\rangle ,\end{eqnarray}$$

where $\unicode[STIX]{x1D743}_{3}=(\unicode[STIX]{x1D709},\unicode[STIX]{x1D702},\unicode[STIX]{x1D701})$ and $\unicode[STIX]{x1D735}_{3}=(\unicode[STIX]{x2202}_{x},\unicode[STIX]{x2202}_{y},\unicode[STIX]{x2202}_{z})$. (This is their equation (A 10).) It follows from (B 7) and (B 9) that

(B 10) $$\begin{eqnarray}\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}B^{\prime }\rangle _{z}=-2N_{0}^{2}\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D701}\rangle .\end{eqnarray}$$

The linear dynamics (7.4)–(7.6) or (B 1)–(B 5) imply

(B 11) $$\begin{eqnarray}(\unicode[STIX]{x1D702}_{x}-\unicode[STIX]{x1D709}_{y})_{t}=-f_{0}(\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y}).\end{eqnarray}$$

It follows from (B 11) that

(B 12) $$\begin{eqnarray}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle (\unicode[STIX]{x1D702}_{x}-\unicode[STIX]{x1D709}_{y})\unicode[STIX]{x1D743}_{t}\rangle =f_{0}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}\rangle .\end{eqnarray}$$
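
The relation (B 11), on which (B 12) rests, can be checked for a single plane wave. Writing $(\unicode[STIX]{x1D709},\unicode[STIX]{x1D702},p)=\text{Re}[(\hat{\unicode[STIX]{x1D709}},\hat{\unicode[STIX]{x1D702}},\hat{p})\text{e}^{\text{i}(kx+ly-\unicode[STIX]{x1D714}t)}]$ (the vertical structure plays no role here), (B 1) and (B 2) become two linear equations for $\hat{\unicode[STIX]{x1D709}}$ and $\hat{\unicode[STIX]{x1D702}}$, and (B 11) becomes $\unicode[STIX]{x1D714}(k\hat{\unicode[STIX]{x1D702}}-l\hat{\unicode[STIX]{x1D709}})=-\text{i}f_{0}(k\hat{\unicode[STIX]{x1D709}}+l\hat{\unicode[STIX]{x1D702}})$. The Python sketch below solves the two-by-two system by Cramer's rule and evaluates both sides. The wavenumbers, the frequency and the pressure amplitude are arbitrary assumed values, and the function name is hypothetical.

```python
def displacement_amplitudes(k, l, omega, f0, p_hat):
    """Solve (B 1)-(B 2) for plane-wave amplitudes (xi_hat, eta_hat) by Cramer's rule.

    With fields proportional to exp(i(kx + ly - omega t)):
        -omega^2 xi_hat + i omega f0 eta_hat = -i k p_hat
        -i omega f0 xi_hat - omega^2 eta_hat = -i l p_hat
    """
    a11, a12 = -omega ** 2, 1j * omega * f0
    a21, a22 = -1j * omega * f0, -omega ** 2
    r1, r2 = -1j * k * p_hat, -1j * l * p_hat
    det = a11 * a22 - a12 * a21        # equals omega^2 (omega^2 - f0^2)
    xi_hat = (r1 * a22 - a12 * r2) / det
    eta_hat = (a11 * r2 - a21 * r1) / det
    return xi_hat, eta_hat


f0, omega = 1.0e-4, 1.3e-4     # assumed Coriolis parameter and wave frequency (s^-1)
k, l = 2.0e-4, -5.0e-5         # assumed horizontal wavenumbers (m^-1)
p_hat = 1.0 + 0.5j             # assumed complex pressure amplitude

xi_hat, eta_hat = displacement_amplitudes(k, l, omega, f0, p_hat)
lhs = omega * (k * eta_hat - l * xi_hat)       # amplitude of (eta_x - xi_y)_t
rhs = -1j * f0 * (k * xi_hat + l * eta_hat)    # amplitude of -f0 (xi_x + eta_y)
print(lhs, rhs)
```

The check uses only (B 1) and (B 2), so the frequency need not satisfy the dispersion relation; it must only differ from $0$ and $\pm f_{0}$ so that the system is non-singular.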

Substituting (B 10) and (B 12) into (7.35), we obtain

(B 13) $$\begin{eqnarray}q^{w}=\langle J(\unicode[STIX]{x1D709}_{t},\unicode[STIX]{x1D709})\rangle +\langle J(\unicode[STIX]{x1D702}_{t},\unicode[STIX]{x1D702})\rangle +f_{0}\unicode[STIX]{x1D735}\boldsymbol{\cdot }\langle (\unicode[STIX]{x1D709}_{x}+\unicode[STIX]{x1D702}_{y})\unicode[STIX]{x1D743}\rangle +f_{0}\langle \unicode[STIX]{x1D743}\boldsymbol{\cdot }\unicode[STIX]{x1D735}\unicode[STIX]{x1D701}\rangle _{z}.\end{eqnarray}$$

Using (B 5), it is straightforward to show that (B 13) is equivalent to WY’s result (7.36).

References

Bretherton, F. P. 1969 On the mean motion induced by internal gravity waves. J. Fluid Mech. 36, 785–803.
Bühler, O. & McIntyre, M. E. 1998 On non-dissipative wave–mean interactions in the atmosphere or oceans. J. Fluid Mech. 354, 301–343.
Bühler, O. & McIntyre, M. E. 2003 Remote recoil: a new wave–mean interaction effect. J. Fluid Mech. 492, 207–230.
Bühler, O. & McIntyre, M. E. 2005 Wave capture and wave–vortex duality. J. Fluid Mech. 534, 67–95.
Landau, L. D. & Lifshitz, E. M. 1975 The Classical Theory of Fields. Pergamon.
McIntyre, M. E. & Norton, W. A. 2000 Potential vorticity inversion on a hemisphere. J. Atmos. Sci. 57, 1214–1235.
Salmon, R. 1998 Lectures on Geophysical Fluid Dynamics. Oxford University Press.
Salmon, R. 2014 Analogous formulation of electrodynamics and two-dimensional fluid dynamics. J. Fluid Mech. 761, R2, 1–12.
Wagner, G. L. & Young, W. R. 2015 Available potential vorticity and wave-averaged quasi-geostrophic flow. J. Fluid Mech. 785, 401–424.
Wheeler, J. A. & Feynman, R. P. 1949 Classical electrodynamics in terms of direct interparticle action. Rev. Mod. Phys. 21 (3), 425–433.
Whitham, G. B. 1965 A general approach to linear and nonlinear waves using a Lagrangian. J. Fluid Mech. 22, 273–283.
Whitham, G. B. 1974 Linear and Nonlinear Waves. Wiley.
Xie, J.-H. & Vanneste, J. 2015 A generalized-Lagrangian-mean model of the interactions between near-inertial waves and mean flow. J. Fluid Mech. 774, 143–169.
Young, W. R. 2012 An exact thickness-weighted average formulation of the Boussinesq equations. J. Phys. Oceanogr. 42, 692–707.
Young, W. R. & Ben Jelloul, M. 1997 Propagation of near-inertial oscillations through a geostrophic flow. J. Mar. Res. 55, 735–766.