Transportation on spheres via an entropy formula

Gordon Blower

doi:10.1017/prm.2022.54

Part of: Distribution theory - Probability Calculus on manifolds; nonlinear operators

Published online by Cambridge University Press: 05 September 2022

Affiliation: Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK (g.blower@lancaster.ac.uk)

Abstract

The paper proves transportation inequalities for probability measures on spheres for the Wasserstein metrics with respect to cost functions that are powers of the geodesic distance. Let $\mu$ be a probability measure on the sphere ${\bf S}^n$ of the form $d\mu =e^{-U(x)}{\rm d}x$ where ${\rm d}x$ is the rotation invariant probability measure, and $(n-1)I+{\hbox {Hess}}\,U\geq {\kappa _U}I$, where $\kappa _U>0$. Then any probability measure $\nu$ of finite relative entropy with respect to $\mu$ satisfies ${\hbox {Ent}}(\nu \mid \mu ) \geq (\kappa _U/2)W_2(\nu,\, \mu )^2$. The proof uses an explicit formula for the relative entropy which is also valid on connected and compact $C^\infty$ smooth Riemannian manifolds without boundary. A variation of this entropy formula gives the Lichnérowicz integral.

Keywords

Wasserstein metric; curvature; transport; convexity

MSC classification

Primary: 60E15: Inequalities; stochastic orderings 58C35: Integration on manifolds; measures on manifolds

Type: Research Article
Information: Proceedings of the Royal Society of Edinburgh Section A: Mathematics, Volume 153, Issue 5, October 2023, pp. 1467–1478

DOI: https://doi.org/10.1017/prm.2022.54
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2022. Published by Cambridge University Press on behalf of The Royal Society of Edinburgh

1. Transportation on the sphere

Optimal transportation involves moving unit mass from one probability distribution to another, at minimal cost, where the cost is measured by Wasserstein's distance.

Definition Let $(M,\,d)$ be a compact metric space and let $\mu$ and $\nu$ be probability measures on $M$. Then for $1\leq p<\infty$, Wasserstein's distance from $\mu$ to $\nu$ is $W_p(\nu,\, \mu )$, where

(1.1)\begin{align} W_p(\nu, \mu )^p=\inf_\pi\Bigg\{ \iint_{M\times M} d(x,y)^p \pi ({\rm d}x{\rm d}y): \pi\in {\hbox{Prob}}(M\times M)\Bigg\}\end{align}

where the probability measure $\pi$ has marginals $\nu$ and $\mu$ (see [Reference Dudley8, Reference Villani14]).
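As a small illustration of definition (1.1), the following Python sketch computes $W_p$ between two uniform measures supported on equally many points of ${\bf S}^2$. It assumes that, for uniform weights, an optimal coupling can be taken to be a permutation of the points (Birkhoff's theorem), so brute force over permutations suffices for very small supports. The helper names `geodesic_distance` and `wasserstein_uniform` are illustrative and do not come from the paper.

```python
import itertools
import math

def geodesic_distance(x, y):
    """Great-circle distance between unit vectors x and y in R^3."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))

def wasserstein_uniform(xs, ys, p=2):
    """W_p between the uniform measures on the point lists xs and ys.

    Assumes len(xs) == len(ys); the infimum in (1.1) is then attained
    at a permutation coupling, which we find by exhaustive search.
    """
    n = len(xs)
    best = min(
        sum(geodesic_distance(xs[i], ys[s[i]]) ** p for i in range(n))
        for s in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

north, south = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
east = (1.0, 0.0, 0.0)

w1 = wasserstein_uniform([north, east], [east, south], p=1)
w2 = wasserstein_uniform([north, east], [east, south], p=2)
print(w1, w2)
```

In this toy example both distances come out as $\pi /2$: for $p=2$ it is cheaper to move each point a quarter circle than to leave one point fixed and move the other half way round the sphere.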

Transportation inequalities are results that bound the transportation cost $W_p(\nu,\, \mu )^p$ in terms of $\mu$, $\nu$ and geometrical quantities of $(M,\,d)$. Typically, one chooses $\mu$ to satisfy special conditions, and then one imposes minimal hypotheses on $\nu$. In this section, we consider the case where $(M,\,d)$ is the unit sphere ${\bf S}^2$ in ${\bf R}^3$, and obtain transportation inequalities by vector calculus. In section two, we extend these methods to a connected, compact and $C^\infty$ smooth Riemannian manifold $(M,\,d)$.

On ${\bf S}^2$, let $\theta \in [0,\, 2\pi )$ be the longitude and $\phi \in [0,\, \pi ]$ the colatitude, so the area measure is ${\rm d}x=\sin \phi \, d\phi d\theta$. Let $ABC$ be a spherical triangle where $A$ is the North Pole; then by [Reference Kimura and Okamoto10] the Green's function $G(B,\,C)=-(4\pi )^{-1}\log (1-\cos d(B,\,C))$ may be expressed in terms of the longitude and colatitude of $B$ and $C$ via the spherical cosine formula. A related cost function is listed in [Reference Villani14], p 972. Given probability measures $\mu$ and $\nu$ on ${\bf S}^2$, we can form

\[ G(\mu -\nu )(x)=\int_{{\bf S}^2}G(x,y)(\mu ({\rm d}y)-\nu ({\rm d}y)) \]

with gradient in the $x$ variable

\[ \nabla G(\mu -\nu )(x)=\int_{{\bf S}^2}\nabla_xG(x,y)(\mu ({\rm d}y)-\nu ({\rm d}y)). \]

Proposition 1.1 Let $\mu$ and $\nu$ be nonatomic probability measures on ${\bf S}^2$. Then

(1.2)\begin{equation} W_1(\mu,\nu )\leq \int_{{\bf S}^2}\Vert\nabla G(\mu-\nu )(x)\Vert {\rm d}x.\end{equation}

Proof. The Green's function is chosen so that $\nabla \cdot \nabla G(B,\,C)=\delta _B(C)-1/(4\pi )$ in the sense of distributions. Given non-atomic probability measures $\mu$ and $\nu$ on ${\bf S}^2$, their difference $\mu -\nu$ is orthogonal to the constants on ${\bf S}^2,$ so for a $1$-Lipschitz function $\varphi : {\bf S}^2\rightarrow {\bf R}$, we have

(1.3)\begin{align} \int_{{\bf S}^2} \varphi (x)(\mu ({\rm d}x)-\nu ({\rm d}x))& =\int_{{\bf S}^2} \varphi (x)\nabla\cdot\nabla G(\mu-\nu )(x) {\rm d}x\nonumber\\ & ={-}\int_{{\bf S}^2} \nabla\varphi (x)\cdot\nabla G(\mu-\nu )(x)\,{\rm d}x \end{align}

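where $\Vert \nabla \varphi \Vert \leq 1$ almost everywhere since $\varphi$ is $1$-Lipschitz; so by Kantorovich's duality theorem [Reference Dudley8], taking the supremum over such $\varphi$, the Wasserstein transportation distance is bounded by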

(1.4)\begin{equation} W_1(\mu,\nu )\leq \int_{{\bf S}^2}\Vert\nabla G(\mu-\nu )(x)\Vert {\rm d}x.\end{equation}

Definition Suppose that $\mu$ is a probability measure and $\nu$ is a probability measure that is absolutely continuous with respect to $\mu$, so $d\nu =vd\mu$ for some probability density function $v\in L^1(\mu )$. Then the relative entropy of $\nu$ with respect to $\mu$ is

(1.5)\begin{equation} {\hbox{Ent}}(\nu \mid\mu )=\int_{{\bf S}^2} \log v(y)\,\nu ({\rm d}y),\end{equation}

where $0\leq {\hbox {Ent}}(\nu \mid \mu ) \leq \infty$ by Jensen's inequality.
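For intuition, here is a minimal Python sketch of the discrete analogue of (1.5), in which $\mu$ and $\nu$ are probability vectors on a finite set and $v_i=\nu _i/\mu _i$. The finite setting and the function name `relative_entropy` are assumptions made for illustration only.

```python
import math

def relative_entropy(nu, mu):
    """Discrete Ent(nu | mu) = sum_i nu_i * log(nu_i / mu_i)."""
    total = 0.0
    for n_i, m_i in zip(nu, mu):
        if n_i == 0.0:
            continue  # convention: 0 * log 0 = 0
        if m_i == 0.0:
            return math.inf  # nu is not absolutely continuous w.r.t. mu
        total += n_i * math.log(n_i / m_i)
    return total

mu = [0.25, 0.25, 0.25, 0.25]
nu = [0.5, 0.25, 0.125, 0.125]
ent = relative_entropy(nu, mu)
print(ent)  # equals log(2) / 4 for this choice
```

The value is zero exactly when $\nu =\mu$ and is infinite when $\nu$ charges a point that $\mu$ does not, matching the range $0\leq {\hbox {Ent}}(\nu \mid \mu )\leq \infty$.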

At $x\in {\bf S}^2$, we have tangent space $T_x{\bf S}^2=\{ y\in {\bf R}^3: x\cdot y=0\}$. For $y\in T_x{\bf S}^2$ with $\Vert y\Vert =1$, we consider $\exp _x(ty)=x\cos t+y\sin t$ so that $\exp _x(0)=x$, $\Vert \exp _x(ty)\Vert =1$ and $(d/{\rm d}t)_{t=0}\exp _x(ty)=y$; hence $\exp _x:T_x{\bf S}^2\rightarrow {\bf S}^2$ gives the exponential map. We let $J_{\exp _x}$ be the Jacobian determinant of this map.
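The exponential map on ${\bf S}^2$ is easy to experiment with numerically. The sketch below implements $\exp _x(\tau )=\cos (\Vert \tau \Vert )x+\sin (\Vert \tau \Vert )\tau /\Vert \tau \Vert$ for a tangent vector $\tau$ at $x$ and checks that the image lies on the sphere at geodesic distance $\Vert \tau \Vert$ from $x$. The function names are illustrative, and the check assumes $\Vert \tau \Vert <\pi$.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def exp_map(x, tau):
    """Exponential map on the unit sphere; tau must satisfy dot(x, tau) = 0."""
    r = norm(tau)
    if r == 0.0:
        return tuple(x)
    c, s = math.cos(r), math.sin(r) / r
    return tuple(c * xi + s * ti for xi, ti in zip(x, tau))

x = (0.0, 0.0, 1.0)
tau = (0.3, -0.4, 0.0)  # tangent at x, with norm 0.5
y = exp_map(x, tau)
dist = math.acos(max(-1.0, min(1.0, dot(x, y))))
print(norm(y), dist)
```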

Suppose that $\mu ({\rm d}x)=e^{-U (x)}{\rm d}x$ is a probability measure and $\nu$ is a probability measure that is absolutely continuous with respect to $\mu$, so $d\nu =vd\mu$. We say that a Borel function $\Psi :{\bf S}^2\rightarrow {\bf S}^2$ induces $\nu$ from $\mu$ if $\int f(y)\nu ({\rm d}y)=\int f(\Psi (x))\mu ({\rm d}x )$ for all $f\in C({\bf S}^2; {\bf R})$. McCann [Reference McCann12] showed that there exists $\Psi$ that gives the optimal transport strategy for the $W_2$ metric; further, there exists a Lipschitz function $\psi : {\bf S}^2\rightarrow {\bf R}$ such that $\Psi (x)=\exp _x(\nabla \psi (x))$; so that

(1.6)\begin{equation} W_2(\nu, \mu )^2=\int_{{\bf S}^2} d(\Psi (x),x)^2 \mu ({\rm d}x)= \int_{{\bf S}^2} \Vert \nabla\psi (x)\Vert^2 \mu ({\rm d}x).\end{equation}

Talagrand developed $T_p$ inequalities in which $W_p(\nu,\, \mu )^p$ is bounded in terms of ${\hbox {Ent}}(\nu \mid \mu )$, as in [Reference Villani14], p 569. In [Reference Cordero-Erausquin5] and [Reference Cordero-Erausquin, McCann and Schmuckensläger6], the authors obtain some functional inequalities that are related to $T_p$ inequalities. Here we offer an approach that is more direct, and uses only basic differential geometry to augment McCann's fundamental result. The key point is an explicit formula for the relative entropy in terms of the optimal transport maps.

Lemma 1.2 Suppose that $\nu$ has finite relative entropy with respect to $\mu,$ and let

(1.7)\begin{equation} H={\hbox{Hess}}_x\psi (x)\quad {\hbox{and}}\quad A={\hbox{Hess}}_xd(x,y)^2/2\quad {\hbox{at}}\quad y=\Psi(x);\end{equation}

let $\Psi _t(x)=\exp _x(t\nabla \psi (x))$ for $t\in [0,\,1]$. Then the relative entropy satisfies

(1.8)\begin{align} {\hbox{Ent}}(\nu \mid\mu )& \geq\int_{{\bf S}^2}\Bigg( {\hbox{trace}}\,\bigl( H-\log (A+H)\bigr)-\log J_{\exp_x}(\nabla\psi (x))\nonumber\\ & \quad +\int_0^1(1-t) \frac{d^2}{{\rm d}t^2}U (\Psi_t(x)){\rm d}t\Bigg)\mu ({\rm d}x), \end{align}

where $A$ is positive definite, $H$ is symmetric and $A+H$ is also positive definite, and

(1.9)\begin{equation} {\hbox{trace}}\,(H-\log (A+H))\geq 0.\end{equation}

If $\psi \in C^2,$ then equality holds in (1.8).

Proof. To express the relative entropy in terms of the transportation map, we adapt an argument from [Reference Blower1]. We have ${\hbox {Ent}}(\nu \mid \mu )=\int _{{\bf S}^2} \log v(\Psi (x))\mu ({\rm d}x)$, where the integrand is

(1.10)\begin{equation} \log v(\Psi (x))=U (\Psi (x))-U (x)-\log J_\Psi (x),\end{equation}

where the final term arises from the Jacobian of the change of variable $y=\Psi (x)$, where $\Psi =\Psi _1$ and $\Psi _t(x)=\exp _x(t\nabla \psi (x))$. We compute this Jacobian by the chain rule for derivatives with respect to $x$. Specifically by [Reference Cordero-Erausquin, McCann and Schmuckensläger6] p 622, we have ${\hbox {Hess}}(\psi (x)+d(x,\,y)^2/2)\geq 0$ and

(1.11)\begin{equation} \log J_\Psi (x)=\log J_{\exp_x}(\nabla \psi (x))+\log\det {\hbox{Hess}}(\psi (x)+d(x,y)^2/2)\end{equation}

where $J_{\exp _x}$ is the Jacobian of $\exp _x:T_x{\bf S}^2\rightarrow {\bf S}^2$ and ${\hbox {Hess}}=D_x^2$ is the Hessian, where the expression is evaluated at $y=\exp _x(\nabla \psi (x))$. For $x\in {\bf S}^2$ and $\tau \in {\bf R}^3$ such that $x\cdot \tau =0$, we have $\tau \in T_x{\bf S}^2$ and

(1.12)\begin{equation} \exp_x( \tau )=\cos (\Vert \tau\Vert )\, x+\frac{\sin (\Vert \tau\Vert )}{\Vert \tau \Vert}\tau;\end{equation}

see [Reference Cordero-Erausquin5]. By a vector calculus computation, which we replicate from [Reference Cordero-Erausquin5], one finds

(1.13)\begin{equation} J_{\exp_x}(\nabla \psi (x)) =\frac{\sin\Vert\nabla \psi (x)\Vert }{\Vert\nabla \psi (x)\Vert}.\end{equation}

With $\psi :{\bf S}^2\rightarrow {\bf R}$ we have $\nabla \psi (x)\perp x$, so $0=x\cdot \nabla \psi (x),$ hence $0=\nabla \psi (x)+{\hbox {Hess}}(\psi (x)) x$. We write $\theta =\Vert \nabla \psi (x)\Vert$ for the angle between $x$ and $\Psi (x)$ so

\[ \Psi (x)=\exp_x(\nabla \psi (x))=x\cos\theta +{\frac{\sin\theta}{\theta}}\nabla\psi (x); \]

let $v=x\times \theta ^{-1}\nabla \psi (x)$ where $\times$ denotes the usual vector product; then $\{ x,\, \theta ^{-1}\nabla \psi (x),\, v\}$ gives an orthonormal basis of ${\bf R}^3$. Hence

\[ {\frac{\partial \Psi}{\partial v}}=v\cos \theta -\sin\theta \langle \nabla\theta ,v\rangle x+\Bigg(\cos\theta -{\frac{\sin\theta}{\theta}} \Bigg)\langle\nabla\theta ,v\rangle {\frac{\nabla\psi (x)}{\theta}}+{\frac{\sin\theta}{\theta}} {\hbox{Hess}}\psi (x) v, \]

and we obtain (1.13) from the final factor. Then by spherical trigonometry, we have

(1.14)\begin{equation} \cos d(\exp_x(\tau ), y)=(\cos \Vert\tau\Vert )\, \cos d(x,y) +{\frac{\sin \Vert\tau\Vert}{\Vert \tau\Vert}}\langle \tau ,y\rangle ,\end{equation}

so we have $\langle \nabla _x \cos d(x,\,y),\, \tau \rangle =\langle y,\, \tau \rangle$ and $\langle {\hbox {Hess}}_x\cos d(x,\,y)\tau,\, \tau \rangle =-(\cos d(x,\,y)) \Vert \tau \Vert ^2$; so

(1.15)\begin{equation} \langle A\tau, \tau\rangle= {\frac{1}{2}}\bigl\langle {\hbox{Hess}}_x d(x,y)^2\tau,\tau \bigr\rangle =\frac{d(x,y)}{\tan d(x,y)} \Vert \tau\Vert^2 +\Bigg(1-{\frac{d(x,y)}{\tan d(x,y)}}\Bigg) {\frac{\langle y, \tau\rangle^2}{\sin^2d(x,y)}};\end{equation}

hence $A$ is positive definite and is a rank-one perturbation of a multiple of the identity matrix. Note that the formulas degenerate on the cut locus $d(x,\,y)=\pi ;$ consider the international date line opposite the Greenwich meridian.
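Formula (1.15) can be checked numerically, since for a geodesic $t\mapsto \exp _x(t\tau )$ the second derivative of $t\mapsto d(\exp _x(t\tau ),\,y)^2/2$ at $t=0$ equals $\langle A\tau ,\,\tau \rangle$. The following sketch compares a central second difference with the right-hand side of (1.15); the test points, the step size `h` and the function names are choices made for illustration, and the comparison assumes $0<d(x,\,y)<\pi$.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def dist(a, b):
    """Geodesic distance on the unit sphere."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def exp_map(x, tau):
    """exp_x(tau) as in (1.12); tau must be tangent to the sphere at x."""
    r = math.sqrt(dot(tau, tau))
    if r == 0.0:
        return tuple(x)
    return tuple(math.cos(r) * p + math.sin(r) / r * q for p, q in zip(x, tau))

def hessian_form(x, y, tau):
    """Right-hand side of (1.15), valid for 0 < d(x, y) < pi."""
    d = dist(x, y)
    ratio = d / math.tan(d)
    return ratio * dot(tau, tau) + (1.0 - ratio) * dot(y, tau) ** 2 / math.sin(d) ** 2

def second_difference(x, y, tau, h=1e-3):
    """Central second difference of t -> d(exp_x(t tau), y)^2 / 2 at t = 0."""
    def f(t):
        return 0.5 * dist(exp_map(x, tuple(t * q for q in tau)), y) ** 2
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2

x = (0.0, 0.0, 1.0)
y = (math.sin(1.0), 0.0, math.cos(1.0))  # d(x, y) = 1
along = (1.0, 0.0, 0.0)    # unit tangent pointing from x towards y
across = (0.0, 1.0, 0.0)   # unit tangent perpendicular to that geodesic
mixed = (0.6, 0.8, 0.0)

for tau in (along, across, mixed):
    print(hessian_form(x, y, tau), second_difference(x, y, tau))
```

Along the geodesic the quadratic form equals $1$, and across it the form equals $d/\tan d$, which are the two eigenvalues of $A$.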

We have

(1.16)\begin{equation} {\hbox{Ent}}(\nu \mid\mu )=\int_{{\bf S}^2}\bigl( U (\Psi (x))-U (x)-\log J_\Psi (x)\bigr)e^{{-}U (x)}{\rm d}x\end{equation}

in which

(1.17)\begin{equation} U (\Psi (x))-U (x)=\langle \nabla U (x), \nabla \psi (x)\rangle +\int_0^1 (1-t){\frac{d^2}{{\rm d}t^2}}U (\Psi_t(x)){\rm d}t,\end{equation}

and we can combine the first two terms in (1.16) by the divergence theorem so

(1.18)\begin{equation} \int_{{\bf S}^2} \langle \nabla U (x), \nabla\psi (x)\rangle e^{{-}U (x)} {\rm d}x=\int_{{\bf S}^2} \nabla\cdot\nabla \psi (x) e^{{-}U (x)} {\rm d}x.\end{equation}

Hence from (1.11) we have

(1.19)\begin{equation} {\hbox{Ent}}(\nu \mid\mu )\,{=}\! \int_{{\bf S}^2} (\nabla\cdot \nabla \psi (x)\,{-}\log J_\Psi (x)) \mu ({\rm d}x)\,{+}\!\int_{{\bf S}^2}\int_0^1(1\,{-}\,t) {\frac{d^2}{{\rm d}t^2}}U (\Psi_t(x)){\rm d}t\mu ({\rm d}x),\end{equation}

in which the Alexandrov Hessian [Reference Cordero-Erausquin, McCann and Schmuckensläger6], [Reference Villani14] p 363 satisfies

(1.20)\begin{equation} {\hbox{trace}}\, {\hbox{Hess}}_x\psi (x)\leq\nabla\cdot\nabla \psi (x)=\Delta_D \psi (x),\end{equation}

where $\Delta _D\psi$ is the distributional derivative of the Lipschitz function $\psi$; so we recognize (1.8).

We have an orthonormal basis

(1.21)\begin{equation} \Bigg\{ x, {\frac{\nabla\psi (x)}{\Vert \nabla \psi (x)\Vert}}, x\times {\frac{\nabla\psi (x)}{\Vert \nabla \psi (x)\Vert }}\Bigg\}\end{equation}

for ${\bf R}^3$ in which the final two vectors give an orthonormal basis for $T_x{\bf S}^2$. Then

(1.22)\begin{equation} \Bigg\langle A {\frac{\nabla\psi (x)}{\Vert \nabla \psi (x)\Vert}}, {\frac{\nabla\psi (x)}{\Vert \nabla \psi (x)\Vert }}\Bigg\rangle =1\end{equation}

and

(1.23)\begin{equation} \Bigg\langle A \Bigg(x\times {\frac{\nabla\psi (x)}{\Vert \nabla \psi (x)\Vert}}\Bigg), x\times {\frac{\nabla\psi (x)}{\Vert \nabla \psi (x)\Vert}}\Bigg\rangle ={\frac{d(x,y)}{\tan d(x,y)}},\end{equation}

hence $A$ and $H$ have the form

(1.24)\begin{equation} A=\left[\begin{array}{@{}cc@{}} 1 & 0\\ 0 & \dfrac{\Vert \nabla\psi (x)\Vert}{\tan \Vert\nabla\psi (x)\Vert}\end{array}\right], \quad H=\left[\begin{array}{@{}cc@{}}h & \beta\\ \beta & k \end{array}\right]\end{equation}

with respect to the stated basis of $T_x{\bf S}^2$.

The function $f(x)=x-1-\log x$ for $x>0$ is convex and takes its minimum value at $f(1)=0$. Let $T$ be a self-adjoint matrix with eigenvalues $\lambda _1\geq \dots \geq \lambda _n$ where $\lambda _n>-1$; then the Carleman determinant of $I+T$ is $\det _2(I+T)=\prod _{j=1}^n (1+\lambda _j)e^{-\lambda _j}$. Since $A+H$ is positive definite, as in [Reference Blower1] corollary 4.3, we can apply the spectral theorem to compute the Carleman determinant and show that

(1.25)\begin{equation} -\log\det_2(A+H)={\hbox{trace}}\,\bigl( A+H-I-\log (A+H)\bigr)\geq 0\end{equation}

(1.26)\begin{align} {\hbox{trace}}\,\bigl( H-\log (A+H)\bigr)& ={\hbox{trace}}\,\bigl( A+H-I-\log (A+H)\bigr) +{\hbox{trace}}\, (I-A)\nonumber\\ & \geq 0+1-{\frac{\Vert\nabla\psi (x)\Vert}{\tan\Vert\nabla\psi (x)\Vert}}\geq 0. \end{align}
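The inequality (1.26) can be sanity-checked for $2\times 2$ matrices of the form (1.24). The sketch below evaluates ${\hbox {trace}}\,(H-\log (A+H))$ from the eigenvalues of $A+H$ and compares it with $1-\theta /\tan \theta$, where $\theta =\Vert \nabla \psi (x)\Vert$. The sample values of $\theta ,\,h,\,\beta ,\,k$ are arbitrary test data, not values from the paper, and are chosen so that $A+H$ is positive definite.

```python
import math

def eig_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def trace_term(theta, h, beta, k):
    """trace(H - log(A + H)) for A = diag(1, theta / tan(theta)) as in (1.24)."""
    a22 = theta / math.tan(theta) if theta != 0.0 else 1.0
    low, high = eig_sym2(1.0 + h, beta, a22 + k)
    if low <= 0.0:
        raise ValueError("A + H must be positive definite")
    # trace log(A + H) is the sum of the logarithms of the eigenvalues
    return (h + k) - (math.log(low) + math.log(high))

def lower_bound(theta):
    """trace(I - A) = 1 - theta / tan(theta), nonnegative on [0, pi)."""
    return 1.0 - theta / math.tan(theta) if theta != 0.0 else 0.0

samples = [(0.5, 0.2, 0.1, -0.1), (1.0, -0.3, 0.4, 0.5), (2.5, 1.0, 0.0, 4.0)]
for theta, h, beta, k in samples:
    print(trace_term(theta, h, beta, k), lower_bound(theta))
```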

Proposition 1.3 Suppose that the Hessian matrix of $U$ satisfies

(1.27)\begin{equation} {\hbox{Hess}}\,U (x)+ I\geq \kappa_U I\quad (x\in {\bf S}^2)\end{equation}

for some $\kappa _U>0$. Then $\mu$ satisfies the transportation inequality

(1.28)\begin{equation} {\hbox{Ent}}(\nu\mid\mu )\geq {\frac{\kappa_U}{2}}W_2(\nu ,\mu )^2.\end{equation}

This applies in particular when $\mu$ is normalized surface area measure.

Proof. Let $K:[0,\, \pi )\rightarrow {\bf R}$ be the function

(1.29)\begin{equation} K(\alpha )=1-{\frac{\alpha}{\tan\alpha}}+\log {\frac{\alpha}{\sin\alpha}}={\frac{d}{d\alpha}}\Bigg( \alpha \log {\frac{\alpha}{\sin\alpha}}\Bigg).\end{equation}

Then from (1.13) and (1.26) we have

\[ \int_{{\bf S}^2} ( \nabla\cdot \nabla \psi (x)-\log J_\Psi (x)) \mu ({\rm d}x)\geq \int_{{\bf S}^2} \Bigg(\! -\log\det_2(A+H)+K(\Vert\nabla \psi (x)\Vert)\!\Bigg)\mu ({\rm d}x). \]

Considering the final integral in (1.8), we have

(1.30)\begin{equation} {\frac{\partial\Psi_t(x) }{\partial t}}={-}\Vert\nabla\psi(x)\Vert\sin (t\Vert\nabla\psi (x)\Vert) x+\cos (t\Vert\nabla\psi (x)\Vert)\nabla\psi (x)\end{equation}

which has constant speed $\Vert {\frac {\partial \Psi _t(x) }{\partial t}}\Vert =\Vert \nabla \psi (x)\Vert$ and $\langle {\frac {\partial \Psi _t(x) }{\partial t}},\, \Psi _t(x)\rangle =0;$ also

(1.31)\begin{align} {\frac{\partial^2}{\partial t^2}}U (\Psi_t(x))& =\Bigl\langle {\hbox{Hess}}U \circ \Psi_t(x) {\frac{\partial\Psi_t(x)}{\partial t}}, {\frac{\partial\Psi_t(x) }{\partial t}}\Bigr\rangle \nonumber\\ & \quad -\Vert\nabla\psi (x)\Vert^2 \bigl\langle (\nabla U )\circ\Psi_t(x), \Psi_t(x)\bigr\rangle,\end{align}

where the final term is zero since $\nabla U\circ \Psi _t(x)$ is in the tangent space at $\Psi _t(x)$, hence is perpendicular to $\Psi _t(x)$. We therefore have the crucial inequality

(1.32)\begin{align} {\hbox{Ent}}(\nu\mid\mu )& \geq\int_{{\bf S}^2} \Bigg( -\log\det_2(A+H)+K(\Vert\nabla \psi (x)\Vert)\nonumber\\ & \quad +\int_0^1(1-t)\Bigl\langle {\hbox{Hess}}U\circ \Psi_t(x){\frac{\partial\Psi_t(x) }{\partial t}}, {\frac{\partial\Psi_t(x) }{\partial t}}\Bigr\rangle {\rm d}t\Bigg)\mu ({\rm d}x) \end{align}

To simplify the function $K$, we recall from [Reference Gradsteyn and Ryzhik9] 8.342 the Maclaurin series

(1.33)\begin{align} \log{\frac{\alpha}{\sin\alpha}}& =\log\Gamma \Bigg(1+ {\frac{\alpha}{\pi}}\Bigg)+\log\Gamma \Bigg(1-{\frac{\alpha}{\pi}}\Bigg)\nonumber\\ & =\sum_{m=1}^\infty {\frac{\zeta (2m)}{\pi^{2m}m}}\alpha^{2m}\qquad (\vert\alpha\vert<\pi ), \end{align}

where we have introduced Euler's $\Gamma$ function and Riemann's $\zeta$ function, so

(1.34)\begin{equation} K(\alpha )=\sum_{m=1}^\infty {\frac{(2m+1)\zeta (2m)}{\pi^{2m}m}}\alpha^{2m}\geq {\frac{3\zeta(2)}{\pi^2}}\alpha^2={\frac{\alpha^2}{2}}.\end{equation}
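The series (1.34) and the bound $K(\alpha )\geq \alpha ^2/2$ are easy to confirm numerically. The sketch below compares the closed form (1.29) with a truncation of the series; the function `zeta` is a hypothetical helper that approximates $\zeta (s)$ by a partial sum with an integral estimate of the tail, which is adequate for even integers $s\geq 2$ at the tolerance used here.

```python
import math

def K(alpha):
    """Closed form (1.29) for 0 <= alpha < pi."""
    if alpha == 0.0:
        return 0.0
    return 1.0 - alpha / math.tan(alpha) + math.log(alpha / math.sin(alpha))

def zeta(s, terms=1000):
    """Approximate zeta(s) for s >= 2: partial sum plus a tail estimate."""
    partial = sum(k ** -s for k in range(1, terms + 1))
    return partial + terms ** (1 - s) / (s - 1) - 0.5 * terms ** -s

def K_series(alpha, terms=40):
    """Truncation of the Maclaurin series in (1.34)."""
    return sum(
        (2 * m + 1) * zeta(2 * m) / (m * math.pi ** (2 * m)) * alpha ** (2 * m)
        for m in range(1, terms + 1)
    )

alphas = [0.1, 0.5, 1.0, 2.0]
for a in alphas:
    print(a, K(a), K_series(a), 0.5 * a * a)
```

The leading coefficient is $3\zeta (2)/\pi ^2=1/2$ and all later coefficients are positive, which is where the lower bound $\alpha ^2/2$ comes from.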

Now we consider (1.32) with the hypothesis (1.27) in force. The Carleman determinant contributes a nonnegative term as in (1.25). By (1.27) we have ${\hbox {Hess}}\,U\geq (\kappa _U-1)I$, and the geodesic $\Psi _t(x)$ has constant speed $\Vert \nabla \psi (x)\Vert$ while $\int _0^1(1-t)\,{\rm d}t=1/2$, so the final integral in (1.32) is at least $(\kappa _U-1)\Vert \nabla \psi (x)\Vert ^2/2$. This combines with the integral of $K(\Vert \nabla \psi (x)\Vert )$ and the bound (1.34) to give

(1.35)\begin{align} {\hbox{Ent}}(\nu \mid \mu )& \geq \int_{{\bf S}^2} \Bigl( K(\Vert \nabla\psi (x)\Vert )+{\frac{\kappa_U-1}{2}}\Vert \nabla\psi (x)\Vert^2\Bigr)\mu ({\rm d}x)\nonumber\\ & \geq{\frac{\kappa_U}{2}}\int_{{\bf S}^2}\Vert \nabla\psi (x)\Vert^2\mu ({\rm d}x)\nonumber\\ & ={\frac{\kappa_U}{2}}W_2(\nu,\mu)^2. \end{align}

When $\mu$ is normalized surface area, $U$ is a constant and the hypothesis (1.27) holds with $\kappa _U=1$.

2. Transportation on compact Riemannian manifolds

Let $M$ be a connected, compact and $C^\infty$ smooth Riemannian manifold of dimension $n$ without boundary, and let $g$ be the Riemannian metric tensor, giving metric $d$. Let $\mu ({\rm d}x)=e^{-U(x)}{\rm d}x$ be a probability measure on $M$ where ${\rm d}x$ is Riemannian measure and $U\in C^2(M; {\bf R})$. Suppose that $\nu$ is a probability measure on $M$ that is of finite relative entropy with respect to $\mu$. Then by McCann's theory [Reference McCann12], there exists a Lipschitz function $\psi :M\rightarrow {\bf R}$ such that $\Psi (x)=\exp _x(\nabla \psi (x))$ induces $\nu$ from $\mu$; we then let $\Psi _t(x)=\exp _x(t\nabla \psi (x))$. We proceed to compute quantities which we need for our extension of lemma 1.2.

Given distinct points $x,\,y\in M$, we suppose that $x=\exp _y(\xi )$, and for $w\in T_yM$ introduce

(2.1)\begin{equation} \gamma (s,t)= \exp_y(t(\xi +sw))\end{equation}

so that $t\mapsto \gamma (s,\,t)$ is a geodesic, and in particular $\gamma (0,\,t)$ is the geodesic from $y=\gamma (0,\,0)$ to $x=\gamma (0,\,1)$. When $y=\exp _x(\nabla \psi (x))$ for a Lipschitz function $\psi :M\rightarrow {\bf R}$, we can determine $\xi$ as follows. Let $\phi (z)=-\psi (z)$ and introduce its infimal convolution

(2.2)\begin{equation} \phi^c (y)=\inf_w\Bigl\{ {\frac{1}{2}}d(y,w)^2-\phi (w)\Bigr\} \end{equation}

which is attained at $x$ since $y=\exp _x(\nabla \psi (x))=\exp _x(-\nabla \phi (x))$. Now $\phi ^{cc}(x)=\phi (x)$, so

(2.3)\begin{equation} \phi (x)=\inf_w\Bigl\{ {\frac{1}{2}}d(x,w)^2-\phi^c (w)\Bigr\} \end{equation}

where the infimum is attained at $y$ since $\phi (x)+\phi ^c(y)=d(x,\,y)^2/2$. By lemma 2 of [Reference McCann12], $\phi ^c$ is Lipschitz and

(2.4)\begin{equation} x=\exp_y(-\nabla \phi^c(y)). \end{equation}
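The transforms (2.2) and (2.3) can be illustrated on a discretized circle, which is used here only as a convenient stand-in for $M$. The sketch below computes $\phi ^c$, $\phi ^{cc}$ and $\phi ^{ccc}$ on a grid and checks the general facts $\phi ^{cc}\geq \phi$ and $\phi ^{ccc}=\phi ^c$; equality $\phi ^{cc}=\phi$ is expected only for suitable (c-concave) $\phi$. The grid size, the sample function and the names are assumptions for illustration.

```python
import math

N = 60
points = [2.0 * math.pi * i / N for i in range(N)]  # grid on the unit circle

def circle_distance(a, b):
    """Geodesic distance between angles a and b on the unit circle."""
    gap = abs(a - b) % (2.0 * math.pi)
    return min(gap, 2.0 * math.pi - gap)

def c_transform(phi):
    """phi^c(y) = min over grid points w of d(y, w)^2 / 2 - phi(w), as in (2.2)."""
    return [
        min(0.5 * circle_distance(y, w) ** 2 - phi[j] for j, w in enumerate(points))
        for y in points
    ]

phi = [0.3 * math.cos(w) for w in points]  # sample potential, chosen arbitrarily
phi_c = c_transform(phi)
phi_cc = c_transform(phi_c)
phi_ccc = c_transform(phi_cc)

gap_cc = max(b - a for a, b in zip(phi, phi_cc))
print(gap_cc)  # largest value of phi^cc - phi on the grid
```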

The speed of $\gamma (0,\,t)$ is given by

(2.5)\begin{align} \Bigl\Vert {\frac{\partial\gamma}{\partial t}}\Bigr\Vert& =\Vert\nabla \phi^c(y)\Vert =d(y, \exp_y(-\nabla\phi^c(y)))=d(x,y)\nonumber\\ & =d(x,\exp_x(-\nabla\phi (x)))=\Vert\nabla\psi (x)\Vert .\end{align}

Let $R$ be the curvature of the Levi–Civita derivation $\nabla$ so

\[ R(X,Y)Z=\nabla_Y\nabla_XZ-\nabla_X\nabla_YZ-\nabla_{[Y,X]}Z\quad (X,Y,Z\in T_xM). \]

Then by [Reference Pedersen13] p 36, for all $Y\in T_xM$, the curvature operator $R_Y: X\mapsto R(X,\,Y)Y$ is self-adjoint with respect to the scalar product on $T_xM$. Also

(2.6)\begin{equation} Y(s,t)={\frac{\partial}{\partial s}}\gamma (s,t)\end{equation}

satisfies the initial conditions

(2.7)\begin{equation} Y(s,0)=0,\quad {\frac{\partial Y}{\partial t}}(0,0)=w,\end{equation}

and Jacobi's differential equation [Reference Chavel4] (2.43)

(2.8)\begin{equation} {\frac{\partial^2Y}{\partial t^2}}+R\Bigg( {\frac{\partial\gamma}{\partial t}}, Y\Bigg){\frac{\partial\gamma}{\partial t}} =0.\end{equation}

By calculating the first variation of the length formula [Reference Pedersen13] p 161, one shows that

(2.9)\begin{equation} {\frac{1}{2}}\Bigl\langle {\hbox{Hess}}_x d(x,y)^2 Y(0,1),Y(0,1)\Bigr\rangle =g\Bigg({\frac{\partial Y}{\partial t}}(0,1), Y(0,1)\Bigg).\end{equation}

Assume that there are no conjugate points on $\gamma (s,\,t)$. Then by varying $w$, we can make $Y(0,\,1)$ cover a neighbourhood of $0$ in $T_xM$. Let

(2.10)\begin{equation} A={\frac{1}{2}}{\hbox{Hess}}_x d(x,y)^2\Bigr\vert_{y=\exp_x(\nabla\psi (x))},\end{equation}

and

(2.11)\begin{equation} H ={\hbox{Hess}}\,\psi (x).\end{equation}

Let $J_{\exp _x}(v)$ be the Jacobian of the map $T_xM\rightarrow M$ given by $v\mapsto \exp _x(v)$, as in (3.4) of [Reference Cabre3].

Lemma 2.1 Suppose that $\Psi _t (x)=\exp _x(t \nabla \psi (x))$, where $\Psi _1$ induces the probability measure $\nu$ from $\mu$ and gives the optimal transport map for the $W_2$ metric. Then the relative entropy satisfies

(2.12)\begin{align} {\hbox{Ent}}(\nu \mid\mu )& \geq\int_{M}\Bigg( {\hbox{trace}}\,\bigl( H-\log (A+H)\bigr)-\log J_{\exp_x}(\nabla\psi (x))\nonumber\\ & \quad +\int_0^1(1-t) \Bigl\langle {\hbox{Hess}}\,U\circ \Psi_t(x){\frac{\partial \Psi_t(x)}{\partial t}}, {\frac{\partial\Psi_t(x)}{\partial t}}\Bigr\rangle {\rm d}t\Bigg)\mu ({\rm d}x), \end{align}

where $H$ is symmetric and $A+H$ is also positive definite. If $\psi \in C^2(M; {\bf R})$, then equality holds in (2.12).

Proof. This is similar to lemma 1.2. As in (1.25) and (1.26), we have

(2.13)\begin{align} {\hbox{trace}}\,\bigl( H-\log (A+H)\bigr)& ={-}\log\det_2(A+H)+{\hbox{trace}}(I-A)\nonumber\\ & \geq {\hbox{trace}}(I-A), \end{align}

and by standard calculations [Reference Pedersen13] p 32 we have

(2.14)\begin{equation} {\frac{\partial^2}{\partial t^2}}U (\Psi_t(x))=\Bigl\langle {\hbox{Hess}}U\circ \Psi_t(x){\frac{\partial \Psi_t(x)}{\partial t}}, {\frac{\partial\Psi_t(x)}{\partial t}}\Bigr\rangle\end{equation}

since $\Psi _t(x)$ is a geodesic.

The curvature operator is the symmetric operator $R_Z:Y\mapsto R(Z,\,Y)Z$. If $M$ has nonnegative Ricci curvature so that $R_Z\geq 0$ as a matrix for all $Z$, then we have

(2.15)\begin{equation} -\log J_{\exp_x}(\nabla\psi (x))\geq 0\end{equation}

by (3.4) of [Reference Cabre3].

The following result recovers the Lichnérowicz integral, as in (4.16) of [Reference Blower1] and (1.1) of [Reference Deuschel and Stroock7]. This integral also appears implicitly in the Hessian calculations in appendix D of [Reference Lott and Villani11]. Let $\Vert H\Vert _{HS}$ be the Hilbert–Schmidt norm of $H$.

Proposition 2.2 Suppose that $\psi \in C^2(M; {\bf R})$ and $\Psi _\tau (x)=\exp _x(\tau \nabla \psi (x))$ induces a probability measure $\nu _\tau$ from $\mu$ such that $\Psi _\tau$ is the optimal transport map for the $W_2$ metric. Then

(2.16)\begin{align} {\hbox{Ent}}(\nu_\tau \mid \mu )& ={\frac{\tau^2}{2}}\int_{M} \Bigg( \Vert {\hbox {Hess}}\,\psi (x)\Vert_{HS}^2 +{\hbox{trace}}\, R_{\nabla\psi (x)}\nonumber\\ & \quad +\bigl\langle {\hbox{Hess}}\,U (x)\nabla\psi (x),\nabla\psi (x)\bigr\rangle\Bigg) \mu ({\rm d}x)+O(\tau^3)\quad (\tau\rightarrow 0+). \end{align}

Proof. For small $\tau >0$, we rescale $\psi$ to $\tau \psi$ and consider $y=\exp _x(\tau \nabla \psi (x))$; then we return to $x$ along a geodesic $\gamma _\tau (t)=\exp _y(-t\nabla (-\tau \psi )^c(y))$ for $0\leq t\leq 1$ with constant speed $\tau \Vert \nabla \psi (x)\Vert$. Observe that $\tau \psi (x)=(-\tau \psi )^c(y)-\tau ^2\Vert \nabla \psi (x)\Vert ^2/2$, and $\nabla _xd(x,\,y)^2/2=-\exp _x^{-1}(y)=-\tau \nabla \psi (x)$ and $\nabla _yd(x,\,y)^2/2=-\exp _y^{-1}(x)=\nabla (-\tau \psi )^c(y)$ by Gauss's Lemma. Recalling that the curvature operator is self-adjoint by page 36 of [Reference Pedersen13], we choose the basis of $T_yM$ so that the first basis vector points along the direction of the geodesic $\gamma _\tau (0)$. Hence Jacobi's equation (2.8) can be expressed as a second-order differential equation in block matrix form, with a symmetric matrix $S_{-\nabla (-\tau \psi )^c(y)}$ given by components of the curvature tensor such that

(2.17)\begin{equation} R\Bigg( {\frac{d\gamma_\tau}{{\rm d}t}},Y\Bigg){\frac{d\gamma_\tau}{{\rm d}t}} =\left[\begin{array}{@{}cc@{}}0 & 0\\ 0 & S_{-\nabla(-\tau\psi )^c(y)}\end{array}\right]Y\quad (0< t<1)\end{equation}

as in (2.4) of [Reference Cordero-Erausquin, McCann and Schmuckensläger6]. Then the Jacobi equation reduces to a first-order block matrix equation with blocks of shape $(1+(n-1))\times (1+(n-1))$ in a $(2n)\times (2n)$ matrix

(2.18)\begin{equation} {\frac{d}{{\rm d}t}}\left[\begin{array}{@{}c@{}}Y\\ V\end{array}\right] =\left[\begin{array}{@{}cccc@{}}0 & 0 & 1 & 0\\ 0 & 0 & 0 & I_{n-1}\\ 0 & 0 & 0 & 0\\ 0 & -S_{-\nabla (-\tau\psi )^c(y)} & 0 & 0\end{array}\right] \left[\begin{array}{@{}c@{}}Y\\ V\end{array}\right] ;\quad \left[\begin{array}{@{}c@{}} Y(0)\\ V(0)\end{array}\right] =\left[\begin{array}{@{}c@{}}0\\ w\end{array}\right].\end{equation}

To find the limit as $\tau \rightarrow 0$, we can assume that $S_{-\nabla (-\tau \psi )^c (y)}$ is constant on the geodesic, and may be expressed as $\tau ^2 S$ where $\tau ^2 S=S_{\tau \nabla \psi (x)}$ has shape ${(n-1)\times (n-1)}$. The functions $\cos \alpha$ and $\sin \alpha /\alpha$ are entire and even, so $\cos \sqrt {s}$ and $\sin \sqrt {s}/\sqrt {s}$ are entire functions, hence they operate on complex matrices. Note that the matrix

\[ T=\left[\begin{array}{@{}cc@{}}0 & 0\\ 0 & S_{-\nabla(-\tau\psi)^c(y)}\end{array}\right] \]

in the bottom left corner is symmetric, has rank less than or equal to $n-1$, and does not depend upon $t$. Hence we consider the matrix

\[ \left[\begin{array}{@{}c@{}} Y\\ V\end{array}\right] =\left[\begin{array}{@{}cc@{}}\cos (t\sqrt {T}) & {\dfrac{\sin( t\sqrt {T})}{\sqrt {T}}}\\ -\sqrt T\sin (t\sqrt{ T}) & \cos ( t\sqrt { T})\end{array}\right]\left[\begin{array}{@{}c@{}} Y_0\\ V_0\end{array}\right] \]

which has derivative

\[ {\frac{d}{{\rm d}t}}\left[\begin{array}{@{}c@{}} Y\\ V\end{array}\right] =\left[\begin{array}{@{}cc@{}} 0 & I\\ -T & 0\end{array}\right]\left[\begin{array}{@{}cc@{}}\cos (t\sqrt {T}) & {\dfrac{\sin (t\sqrt {T})}{\sqrt {T}}}\\ -\sqrt {T}\sin (t\sqrt {T}) & \cos (t\sqrt {T})\end{array}\right]\left[\begin{array}{@{}c@{}} Y_0\\ V_0\end{array}\right] \]

so we can use this formula to solve (2.18). So the approximate differential equation has solution

(2.19)\begin{equation} \left[\begin{array}{@{}c@{}}Y(1)\\ V(1)\end{array}\right] =\left[\begin{array}{@{}cccc@{}} 1 & 0 & 1 & 0\\ 0 & \cos \tau\sqrt{S} & 0 & {\dfrac{\sin \tau\sqrt{S}}{\tau\sqrt{S}}}\\ 0 & 0 & 1 & 0\\ 0 & -\tau\sqrt{S}\sin \tau\sqrt{S} & 0 & \cos \tau\, \sqrt{S}\end{array}\right]\left[\begin{array}{@{}c@{}}0\\ w\end{array}\right].\end{equation}
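The transverse block of (2.19) can be checked against a direct numerical integration of the Jacobi system. The sketch below treats the simplest case, assuming that $S$ is a positive scalar, so that the equation reads $Y''+\tau ^2SY=0$ with $Y(0)=0$ and $Y'(0)=w$, and integrates it by the classical fourth-order Runge–Kutta method. The function name, the step count and the sample values of $\tau$, $S$ and $w$ are illustrative assumptions.

```python
import math

def solve_jacobi_scalar(curvature, w, steps=200):
    """Integrate Y' = V, V' = -curvature * Y on [0, 1] from Y(0) = 0, V(0) = w."""
    h = 1.0 / steps
    y, v = 0.0, w
    for _ in range(steps):
        k1y, k1v = v, -curvature * y
        k2y, k2v = v + 0.5 * h * k1v, -curvature * (y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, -curvature * (y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, -curvature * (y + h * k3y)
        y += h * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return y, v

tau, S, w = 0.7, 2.0, 1.0
root = tau * math.sqrt(S)
y1, v1 = solve_jacobi_scalar(tau ** 2 * S, w)

# Closed-form entries from the transverse block of (2.19)
print(y1, math.sin(root) / root * w)
print(v1, math.cos(root) * w)
```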

Hence by (2.9) we have

(2.20)\begin{equation} A=\left[\begin{array}{@{}cc@{}} 1 & 0\\ 0 & {\dfrac{\tau\sqrt{S}}{\tan\tau\sqrt{S}}}\end{array}\right]=(1+O(\tau^2))I_n\end{equation}

which gives rise to the approximation

(2.21)\begin{equation} {\hbox{trace}}(I_n-A)={\hbox{trace}}\Bigg( I_{n-1}-{\frac{\tau\sqrt{S}}{\tan\tau\sqrt{S}}}\Bigg)= {\frac{\tau^2}{3}}{\hbox{trace}}(S)+O(\tau^4)\qquad (\tau\rightarrow 0+),\end{equation}

and likewise we obtain

(2.22)\begin{equation} -\log J_{\exp_x}(\tau \nabla\psi (x)) ={-}\log\det {\frac{\sin \tau\sqrt{S}}{\tau\sqrt{S}}}={\frac{\tau^2}{6}}{\hbox{trace}}(S)+O(\tau^4).\end{equation}
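The leading terms in (2.21) and (2.22) come from the scalar expansions $1-x/\tan x=x^2/3+O(x^4)$ and $-\log (\sin x/x)=x^2/6+O(x^4)$, applied with $x=\tau \sqrt {s}$ for each eigenvalue $s$ of $S$. The following sketch checks both expansions numerically for a single positive eigenvalue; the function names are illustrative.

```python
import math

def trace_defect(x):
    """1 - x / tan(x), the scalar version of trace(I - A) in (2.21)."""
    return 1.0 - x / math.tan(x)

def neg_log_jacobian(x):
    """-log(sin(x) / x), the scalar version of (2.22)."""
    return -math.log(math.sin(x) / x)

for x in (0.1, 0.05, 0.01):
    # The remainders should shrink like x^4
    print(x, trace_defect(x) - x * x / 3.0, neg_log_jacobian(x) - x * x / 6.0)
```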

From (2.19), we have

(2.23)\begin{align} -\log\det_2 (A+\tau H)& ={\frac{1}{2}}{\hbox{trace}}\bigl( (A-I_n+\tau H)^2\bigr)+O(\tau^3)\nonumber\\ & ={\frac{\tau^2}{2}}{\hbox{trace}}(H^2)+O(\tau^3)\nonumber\\ & ={\frac{\tau^2}{2}}\Vert {\hbox{Hess}}\, \psi (x)\Vert_{HS}^2+O(\tau^3), \end{align}

so the result follows by lemma 2.1.
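The first step of (2.23) rests on the expansion $-\log \det _2(I+T)={\hbox {trace}}\,(T-\log (I+T))={\hbox {trace}}(T^2)/2+O(\Vert T\Vert ^3)$. The sketch below checks this through the eigenvalues, in the simplified situation $T=\tau H$ where the $O(\tau ^2)$ contribution of $A-I_n$ is ignored. The eigenvalues chosen for $H$ are arbitrary test data.

```python
import math

def neg_log_det2(eigenvalues):
    """-log det_2(I + T) computed from the eigenvalues of T, each > -1."""
    return sum(lam - math.log1p(lam) for lam in eigenvalues)

h_eigs = [1.0, -0.5, 0.25]  # sample eigenvalues of H (illustrative)
hs_norm_sq = sum(lam * lam for lam in h_eigs)  # squared Hilbert-Schmidt norm

for tau in (0.1, 0.01):
    exact = neg_log_det2([tau * lam for lam in h_eigs])
    leading = 0.5 * tau ** 2 * hs_norm_sq
    print(tau, exact, leading)
```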

We conclude with a transportation inequality which generalizes proposition 1.3 to the unit spheres ${\bf S}^n$. See [Reference Blower and Bolley2] for a discussion of measures on product spaces.

Theorem 2.3 Let $M={\bf S}^n$ for some $n\geq 2,$ and suppose that

(2.24)\begin{equation} (n-1)I+{\hbox{Hess}} \,U (x)\geq \kappa_U I\qquad (x\in {\bf S}^{n})\end{equation}

for some $\kappa _U>0$. Then

(2.25)\begin{equation} {\hbox{Ent}}(\nu\mid\mu )\geq {\frac{\kappa_U} {2}}W_2(\nu, \mu )^2.\end{equation}

Proof. In this case, the curvature operator is constant, so we have $S_{\nabla \psi (x)} Y=\Vert \nabla \psi (x)\Vert ^2Y$, so

(2.26)\begin{equation} {\hbox{trace}}\,R_{\nabla\psi (x)} =(n-1)\Vert \nabla\psi (x)\Vert^2.\end{equation}

Thus the result follows with a similar proof to proposition 1.3 using data from the proof of proposition 2.2.
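One way to read this proof, offered here as a sketch rather than as the author's exact argument, is as follows. On ${\bf S}^n$ the matrix $A$ has eigenvalue $1$ once and $\theta /\tan \theta$ with multiplicity $n-1$, where $\theta =\Vert \nabla \psi (x)\Vert$, and $J_{\exp _x}=(\sin \theta /\theta )^{n-1}$, so ${\hbox {trace}}(I-A)-\log J_{\exp _x}=(n-1)K(\theta )$ with $K$ as in (1.29). The hypothesis (2.24) gives ${\hbox {Hess}}\,U\geq (\kappa _U-(n-1))I$, so the pointwise integrand is at least $(n-1)K(\theta )+(\kappa _U-(n-1))\theta ^2/2$, which the code below compares with $\kappa _U\theta ^2/2$ on a grid. The function names and grid are illustrative.

```python
import math

def K(alpha):
    """The function (1.29) for 0 <= alpha < pi."""
    if alpha == 0.0:
        return 0.0
    return 1.0 - alpha / math.tan(alpha) + math.log(alpha / math.sin(alpha))

def pointwise_lower_bound(theta, n, kappa):
    """(n - 1) K(theta) + (kappa - (n - 1)) theta^2 / 2 on the sphere S^n."""
    return (n - 1) * K(theta) + 0.5 * (kappa - (n - 1)) * theta ** 2

thetas = [0.05 * j for j in range(1, 62)]  # grid in (0, pi)
worst = min(
    pointwise_lower_bound(t, n, kappa) - 0.5 * kappa * t * t
    for n in (2, 3, 4, 5)
    for kappa in (0.5, 1.0, 3.0)
    for t in thetas
)
print(worst)  # nonnegative, up to rounding, when K(theta) >= theta^2 / 2
```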

Acknowledgments

I thank Graham Jameson for helpful remarks concerning inequalities which led to (1.34). I am also grateful to the referee, whose helpful comments improved the exposition.

References

[1] Blower, G. The Gaussian isoperimetric inequality and transportation. Positivity 7 (2003), 203–224.

[2] Blower, G. and Bolley, F. Concentration of measure on product spaces with applications to Markov processes. Studia Math. 175 (2006), 47–72.

[3] Cabre, X. Nondivergent elliptic equations on manifolds with nonnegative curvature. Comm. Pure Appl. Math. 50 (1997), 623–665.

[4] Chavel, I. Riemannian Geometry: a modern introduction (Cambridge: Cambridge University Press, 1993).

[5] Cordero-Erausquin, D. Prékopa–Leindler inequalities sur la sphère. C.R. Acad. Sci. Paris 329 (1999), 789–792.

[6] Cordero-Erausquin, D., McCann, R. J. and Schmuckensläger, M. Prékopa–Leindler type inequalities on Riemannian manifolds, Jacobi fields and optimal transport. Annales de la Fac. Sci. Toulouse Math. 15 (2006), 613–635.

[7] Deuschel, J.-D. and Stroock, D. W. Hypercontractivity and spectral gap of symmetric diffusions with applications to stochastic Ising models. J. Funct. Anal. 92 (1990), 30–48.

[8] Dudley, R. M. Real Analysis and Probability, 2nd ed. (Cambridge: Cambridge University Press, 2004).

[9] Gradsteyn, I. S. and Ryzhik, I. M. Table of Integrals, Series and Products (Boston: Academic Press, 1965).

[10] Kimura, Y. and Okamoto, H. Vortex motion on a sphere. J. Phys. Soc. Japan 56 (1987), 4203–4206.

[11] Lott, J. and Villani, C. Ricci curvature for metric measure spaces via optimal transport. Annals of Math. (2) 169 (2009), 903–991.

[12] McCann, R. J. Polar factorization of maps on Riemannian manifolds. Geom. Funct. Anal. 11 (2001), 589–608.

[13] Pedersen, P. Riemannian Geometry, 2nd ed. (New York: Springer, 2006).

[14] Villani, C. Optimal Transport: Old and New (Berlin: Springer, 2009).
