Each Feller process with continuous paths has an associated Fokker-Planck equation, determined by the second order differential operator that generates the process. In particular, for standard Brownian motion on Euclidean space the generator bears an intimate relationship with the Laplacian operator; this was discussed in detail in my previous blog. Here we consider the so-called Langevin process, which is the solution of the stochastic differential equation
$$dX_t = -\nabla\Psi(X_t)\, dt + \sqrt{2\beta^{-1}}\, dW_t,$$
where the drift is a gradient. It turns out that for this process the solution of the associated Fokker-Planck equation is the minimizer of an action. This variational formulation originated in
Reference
R. Jordan, D. Kinderlehrer, F. Otto, The Variational Formulation of the Fokker-Planck Equation, SIAM J. Math. Anal.
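For later reference, the Fokker-Planck equation associated with the Langevin SDE above, governing the density $\rho(t,\cdot)$ of $X_t$ (written with the same $\Psi$ and $\beta$ as in the SDE), is
$$\partial_t \rho = \operatorname{div}\!\left(\rho\,\nabla\Psi\right) + \beta^{-1}\Delta\rho,$$
whose stationary solution is the Gibbs density $\rho_\infty \propto e^{-\beta\Psi}$.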
The optimization can be done with gradient descent in an appropriate metric space called Wasserstein space, whose elements are probability measures and whose metric is given by a transportation cost. Rather than discussing the variational formulation of the Fokker-Planck equation associated with the Langevin equation above, we will first dive into the general theory of optimal transportation, which will give us a better picture of the underlying concept of Wasserstein space and related theorems.
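To make "gradient descent in Wasserstein space" concrete, here is a sketch of the discrete minimizing-movement scheme from the reference above (the notation $W_2$ for the quadratic Wasserstein distance is defined later in this post; $\tau > 0$ is a time step):
$$\rho_{k+1} \in \arg\min_{\rho}\left\{ \frac{1}{2\tau}\, W_2^2(\rho,\rho_k) + \int \Psi\, d\rho + \beta^{-1}\!\int \rho\log\rho\; dx \right\};$$
as $\tau \to 0$, the piecewise-constant interpolation of the $\rho_k$ converges to the solution of the Fokker-Planck equation above.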
Elements of Optimal Transportation
In some sense diffusion models transport an unknown data distribution into a familiar distribution, the Gaussian distribution. So let us discuss transportation with respect to the standard Gaussian measure $\gamma$ on $\mathbb{R}^n$.
First, some background. Let $X$ be a measurable space and let $c : X \times X \to [0,\infty]$ be a function that measures the cost $c(x,y)$ of transporting one unit of mass from $x$ to $y$. We define the transportation cost from a probability measure $\mu$ on $X$ to another probability measure $\nu$ on $X$ as
$$\mathcal{T}_c(\mu,\nu) = \inf_{\pi} \int_{X \times X} c(x,y)\, d\pi(x,y),$$
the infimum taken over all probability measures $\pi$ on $X \times X$ such that $\mu$ is the first marginal and $\nu$ the second marginal of $\pi$.
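For finitely supported measures the infimum above is a finite-dimensional linear program over couplings. A minimal sketch (the support points, weights, and cost below are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Kantorovich problem: source weights mu, target weights nu,
# cost matrix C[i, j] = cost of moving one unit of mass from x_i to y_j.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
mu = np.array([0.2, 0.5, 0.3])
nu = np.array([0.6, 0.4])
C = (x[:, None] - y[None, :]) ** 2          # quadratic cost |x - y|^2

m, n = C.shape
# Equality constraints: row sums of the coupling equal mu, column sums equal nu.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0        # sum_j pi[i, j] = mu[i]
for j in range(n):
    A_eq[m + j, j::n] = 1.0                 # sum_i pi[i, j] = nu[j]
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("transport cost:", res.fun)
print("optimal coupling:\n", res.x.reshape(m, n))
```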
As a fun historical fact, more than 200 years ago Monge wanted to figure out how to move rubble to build a fortification so as to minimize the cost. In his original formulation, the cost of moving a unit of mass from $x$ to $y$ in $\mathbb{R}^3$ is the distance $|x-y|$. Our modern-day formulation involves two probability measures $\mu$ and $\nu$, representing the rubble and the fortification, defined on probability spaces $(X,\mu)$ and $(Y,\nu)$, and asks for a measurable mapping $T : X \to Y$ (the so-called transport map) which induces the pushforward of $\mu$ onto $\nu$, i.e. $T_\#\mu = \nu$, so as to minimize the transportation cost $\int_X c(x, T(x))\, d\mu(x)$.
Something interesting occurs when $c = d$ for a metric $d$ on $X$: the Kantorovich–Rubinstein duality gives
$$\mathcal{T}_d(\mu,\nu) = \sup_f \left( \int_X f\, d\mu - \int_X f\, d\nu \right),$$
where the supremum is taken over all $f$ on $X$ such that $|f(x) - f(y)| \le d(x,y)$ for all $x, y \in X$.
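A sanity check of this duality (my own toy example): for two point masses $\mu = \delta_x$ and $\nu = \delta_y$, the only coupling is $\delta_{(x,y)}$, so $\mathcal{T}_d(\mu,\nu) = d(x,y)$. On the dual side, the $1$-Lipschitz function $f(z) = d(z,y)$ gives $\int f\, d\mu - \int f\, d\nu = d(x,y) - 0$, so the supremum is attained and both sides agree.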
Gaussian Transport Inequalities of Talagrand
Let $c(x,y) = |x-y|^2$ be the square of the usual distance on Euclidean space. An interesting inequality due to M. Talagrand
Reference
M. Talagrand, Transportation Cost for Gaussian and Other Product Measures, Geom. Funct. Anal.
states that for any probability measure $\nu$ on $\mathbb{R}^n$ that is absolutely continuous with respect to the standard Gaussian measure $\gamma$, we have the following upper bound on the transport cost:
$$\mathcal{T}_c(\nu,\gamma) \le 2\int_{\mathbb{R}^n} \log\!\left(\frac{d\nu}{d\gamma}\right) d\nu,$$
where $\frac{d\nu}{d\gamma}$ is the Radon–Nikodym derivative (aka the density of $\nu$ with respect to $\gamma$) and the right-hand integral is the relative entropy, or KL-divergence, $H(\nu \mid \gamma)$ of the measures. In Wasserstein distance this says the following. Let $W_2(\nu,\gamma) := \mathcal{T}_c(\nu,\gamma)^{1/2}$; then
$$W_2(\nu,\gamma) \le \sqrt{2\, H(\nu \mid \gamma)}.$$
Likewise, if $\nu$ and $\mu$ are both measures on $\mathbb{R}^n$, with the former absolutely continuous with respect to the latter, a bound of the same form,
$$W_2(\nu,\mu) \le \sqrt{2C\, H(\nu \mid \mu)}$$
for a constant $C > 0$ depending on $\mu$, can hold; this is the Talagrand inequality made precise below.
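A worked example (my own, with $\gamma$ the standard Gaussian as above): a mean shift saturates Talagrand's inequality. For $\nu = N(m, I_n)$, the translation coupling $x \mapsto x + m$ has cost $|m|^2$, which matches the general lower bound $W_2^2(\nu,\gamma) \ge |\mathbb{E}_\nu X - \mathbb{E}_\gamma X|^2 = |m|^2$, so $W_2^2(\nu,\gamma) = |m|^2$. On the other hand $\log\frac{d\nu}{d\gamma}(x) = \langle m, x\rangle - \frac{|m|^2}{2}$, so $H(\nu \mid \gamma) = \frac{|m|^2}{2}$. Hence $W_2^2(\nu,\gamma) = 2H(\nu \mid \gamma)$, with equality.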
Quadratic Cost Transportation
If the cost function $c = d^2$ is the square of some metric $d$ on $X$, then the square root of the transportation cost,
$$W_2(\mu,\nu) = \mathcal{T}_{d^2}(\mu,\nu)^{1/2},$$
is a metric between pairs of probability measures, and it is called the Wasserstein distance. $W_2$ metrizes the weak-star topology on the space $\mathcal{P}_2(X)$ of probability measures on $X$ with finite second moment; that is, for any sequence $(\mu_k)$ and $\mu$ in $\mathcal{P}_2(X)$, $W_2(\mu_k,\mu) \to 0$ if and only if $\mu_k \to \mu$ weakly and the second moments converge. In particular, convergence of measures under the Wasserstein-2 distance above implies weak convergence of measures in the sense that $\int_X f\, d\mu_k \to \int_X f\, d\mu$ for every bounded continuous function $f$ on $X$.
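A quick numerical illustration (a sketch of mine; the helper w2_empirical_1d and the sample sizes are arbitrary choices, and it relies on the standard fact that on the real line the optimal coupling of two equally weighted empirical measures matches order statistics):

```python
import numpy as np

def w2_empirical_1d(xs, ys):
    """W_2 between two equally weighted empirical measures on the real line.

    In one dimension the optimal coupling is monotone, so it simply matches
    the i-th smallest sample of xs with the i-th smallest sample of ys.
    """
    xs, ys = np.sort(xs), np.sort(ys)
    return np.sqrt(np.mean((xs - ys) ** 2))

rng = np.random.default_rng(0)
n = 100_000
a = rng.normal(loc=0.0, scale=1.0, size=n)     # samples from N(0, 1)
b = rng.normal(loc=2.0, scale=1.0, size=n)     # samples from N(2, 1)
# For two Gaussians with equal variance, W_2 equals the distance between the means.
print(w2_empirical_1d(a, b))                   # approximately 2.0
```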
Information Theory Concepts
The relative entropy between measures $\nu$ and $\mu$ is
$$H(\nu \mid \mu) = \int \log\!\left(\frac{d\nu}{d\mu}\right) d\nu$$
if $\nu$ is absolutely continuous with respect to $\mu$, and $+\infty$ otherwise. The relative Fisher information is
$$I(\nu \mid \mu) = \int \left|\nabla \log\frac{d\nu}{d\mu}\right|^2 d\nu.$$
The probability measure $\mu$ satisfies a logarithmic Sobolev inequality with constant $\rho > 0$, denoted LSI($\rho$), if for every probability measure $\nu$ absolutely continuous with respect to $\mu$,
$$H(\nu \mid \mu) \le \frac{1}{2\rho}\, I(\nu \mid \mu).$$
The probability measure $\mu$ satisfies a Talagrand inequality with constant $\rho > 0$, denoted T($\rho$), if for every probability measure $\nu$ absolutely continuous with respect to $\mu$, with finite second moments,
$$W_2(\nu,\mu) \le \sqrt{\frac{2\, H(\nu \mid \mu)}{\rho}}.$$
The probability measure $\mu$ satisfies LSI + T($\rho$), the inequality obtained by chaining the two above, if for every probability measure $\nu$ absolutely continuous with respect to $\mu$,
$$W_2(\nu,\mu) \le \frac{1}{\rho}\sqrt{I(\nu \mid \mu)}.$$
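A quick numerical sanity check of these definitions, with the constants exactly as written above (a sketch of mine; the test measure $N(m, s^2)$ and the integration grid are arbitrary, and the closed forms used for the one-dimensional Gaussian KL divergence and $W_2$ distance are the standard ones):

```python
import numpy as np

# Check LSI(1) and T(1) for the standard Gaussian gamma = N(0, 1) against a
# test measure nu = N(m, s^2), using closed forms for H and W_2 and a
# numerical quadrature for the relative Fisher information I(nu | gamma).
m, s = 1.3, 0.7

H = 0.5 * (s**2 + m**2 - 1.0) - np.log(s)        # KL(N(m, s^2) || N(0, 1))
W2 = np.sqrt(m**2 + (s - 1.0)**2)                # W_2 between 1-d Gaussians

x = np.linspace(-12, 12, 200_001)
dx = x[1] - x[0]
nu = np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
log_ratio = -(x - m)**2 / (2 * s**2) + x**2 / 2 - np.log(s)      # log(dnu/dgamma)
I_rel = np.sum(np.gradient(log_ratio, x)**2 * nu) * dx           # relative Fisher info

print(f"H = {H:.4f}, I = {I_rel:.4f}, W2 = {W2:.4f}")
print("LSI(1):", H <= I_rel / 2 + 1e-9)          # H  <= I / 2
print("T(1):  ", W2 <= np.sqrt(2 * H) + 1e-9)    # W2 <= sqrt(2 H)
```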
Theorem of Otto and Villani
Let $M$ be an $n$-dimensional smooth and complete Riemannian manifold, with $\mathrm{Ric}$ denoting the Ricci curvature tensor. Let $\mu$ be a probability measure with finite second order moment, where $d\mu = e^{-\Psi}\, d\mathrm{vol}$ and $\Psi \in C^2(M)$. If $\mu$ satisfies a logarithmic Sobolev inequality, denoted LSI($\rho$), for some $\rho > 0$, then it also satisfies T($\rho$) and the chained inequality LSI + T($\rho$).
Reference
F. Otto, C. Villani, Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality, J. Funct. Anal.
Let $\mu$ be a probability measure on $\mathbb{R}^n$ with finite second order moment, where $d\mu = e^{-\Psi}\,dx$ and $D^2\Psi \ge K I_n$ for a $K \in \mathbb{R}$. Then for all probability measures $\nu$ on $\mathbb{R}^n$ absolutely continuous with respect to $\mu$,
$$H(\nu \mid \mu) \le W_2(\nu,\mu)\,\sqrt{I(\nu \mid \mu)} - \frac{K}{2}\, W_2^2(\nu,\mu).$$
In particular if $\Psi$ is convex, then
$$H(\nu \mid \mu) \le W_2(\nu,\mu)\,\sqrt{I(\nu \mid \mu)}.$$
See the discussion in
Reference
F. Otto, C. Villani, Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality, J. Funct. Anal.
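A small remark on how these statements interact (my own summary of the standard argument): when $K > 0$, the inequality above recovers a logarithmic Sobolev inequality by maximizing its right-hand side over the value of $W_2(\nu,\mu)$,
$$H(\nu \mid \mu) \;\le\; \sup_{w \ge 0}\left( w\,\sqrt{I(\nu \mid \mu)} - \frac{K}{2}\,w^2 \right) \;=\; \frac{1}{2K}\, I(\nu \mid \mu),$$
that is, $\mu$ satisfies LSI($K$); this is the Bakry–Émery criterion.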
Whereas the model elliptic equation, the Poisson equation $\Delta u = f$, prescribes the sum of the eigenvalues of the Hessian of $u$, the Monge-Ampère equation prescribes the product of its eigenvalues:
$$\det D^2 u(x) = f(x), \qquad x \in \Omega,$$
for an open subset $\Omega \subset \mathbb{R}^n$, a convex function $u : \Omega \to \mathbb{R}$, and a prescribed function $f$. The Gaussian curvature equation
$$\det D^2 u(x) = \kappa(x)\,\bigl(1 + |\nabla u(x)|^2\bigr)^{\frac{n+2}{2}},$$
with prescribed $\kappa$, serves as the basic prototype of Monge-Ampère equation, which imposes the Gaussian curvature $\kappa(x)$ of the graph of $u$ at $(x, u(x))$.
A variation of the equation,
$$\det D^2 u(x) = \frac{f(x)}{g(\nabla u(x))},$$
turns out to be useful in optimal transportation, in the study of the regularity of transport maps.
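A quick consistency check of this equation (my own toy example): take the transport map to be linear. Let $u(x) = \frac{1}{2}\langle Ax, x\rangle$ with $A$ symmetric positive definite, so that $\nabla u(x) = Ax$ and $\det D^2 u(x) = \det A$. If $\nabla u$ pushes a density $f$ forward to a density $g$, the change-of-variables formula gives $g(Ax)\,\det A = f(x)$, which is exactly $\det D^2 u(x) = \frac{f(x)}{g(\nabla u(x))}$.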
Studying weak solutions (so-called Alexandrov solutions) of the Monge-Ampère equation led to the following formulation. First recall some concepts. Given an open connected convex set $\Omega \subset \mathbb{R}^n$, the subdifferential of a convex function $u : \Omega \to \mathbb{R}$ is defined by
$$\partial u(x) = \left\{ p \in \mathbb{R}^n : u(y) \ge u(x) + \langle p,\, y - x\rangle \ \text{for all } y \in \Omega \right\}.$$
The Monge-Ampère measure of $u$ is defined for every Borel set $E \subset \Omega$ by
$$\mu_u(E) = |\partial u(E)|,$$
where $|\cdot|$ denotes the Lebesgue measure and $\partial u(E) = \bigcup_{x \in E} \partial u(x)$. It turns out that if $u \in C^2(\Omega)$ then $\mu_u = \det D^2 u\; dx$ in $\Omega$.
Given an open convex set $\Omega$ and a Borel measure $\mu$ on $\Omega$, a convex function $u : \Omega \to \mathbb{R}$ is called an Alexandrov solution to the Monge-Ampère equation
$$\det D^2 u = \mu$$
if $\mu_u = \mu$ as Borel measures.
The notation above is such that if $\mu = f\,dx$ then we say $u$ solves $\det D^2 u = f$.
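A one-dimensional example of these definitions (mine): for $u(x) = |x|$ on $\Omega = \mathbb{R}$ we have $\partial u(x) = \{\operatorname{sgn}(x)\}$ for $x \ne 0$ and $\partial u(0) = [-1,1]$. Hence $\mu_u(E) = |\partial u(E)| = 2$ whenever $0 \in E$ and $= 0$ otherwise, i.e. $\mu_u = 2\delta_0$, so $u(x) = |x|$ is an Alexandrov solution of $\det D^2 u = 2\delta_0$.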
Let $u$ be a convex function defined on the bounded, open, and connected convex set $\Omega \subset \mathbb{R}^n$. The Alexandrov maximum principle is that if $u = 0$ on $\partial\Omega$, then
$$|u(x)|^n \le C_n\, (\operatorname{diam}\Omega)^{n-1}\, \operatorname{dist}(x,\partial\Omega)\, |\partial u(\Omega)| \qquad \text{for all } x \in \Omega,$$
where $C_n$ is a constant depending only on the dimension.
A theorem states that for a bounded, open, and connected convex set $\Omega \subset \mathbb{R}^n$ and a nonnegative Borel measure $\mu$ with $\mu(\Omega) < \infty$, the following Dirichlet problem has an Alexandrov solution:
$$\det D^2 u = \mu \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega.$$
It turns out that if $\mu$ and $\nu$ are compactly supported probability measures on $\mathbb{R}^n$ and $\mu$ is absolutely continuous with respect to the Lebesgue measure, then:
There exists a unique solution $T$ of the optimal transport problem with quadratic cost $c(x,y) = \frac{1}{2}|x-y|^2$ (which turns out to be equivalent to the cost $-\langle x, y\rangle$, since the squared-norm terms integrate to constants depending only on $\mu$ and $\nu$).
There exists a convex function $u$ such that the optimal transport map is given by $T(x) = \nabla u(x)$ for $\mu$-a.e. $x$.
If $\mu = f\,dx$ and $\nu = g\,dy$, then $T$ is differentiable a.e. and
$$|\det \nabla T(x)| = \frac{f(x)}{g(T(x))} \qquad \text{for a.e. } x.$$
Any convex function admits a Hessian in the distributional sense. It turns out that $u$ solves the Monge-Ampère equation
$$\det D^2 u(x) = \frac{f(x)}{g(\nabla u(x))}$$
with the boundary condition $\nabla u(\operatorname{supp}\mu) \subseteq \operatorname{supp}\nu$. In this case we call $u$ a Brenier solution of the Monge-Ampère equation. More about this and the partial transport problem can be found in
Reference
G. de Philippis, A. Figalli, The Monge-Ampère Equation and Its Link to Optimal Transportation, Bull. Amer. Math. Soc.
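As a concrete instance of Brenier's theorem (a sketch of mine, not taken from the survey above; the helper gaussian_brenier_map and the example moments are hypothetical, while the closed form of the affine map between Gaussian measures is the standard one):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_brenier_map(m1, S1, m2, S2):
    """Brenier (optimal) map T(x) = m2 + A (x - m1) pushing N(m1, S1) to N(m2, S2).

    A is symmetric positive definite, so T is the gradient of the convex
    function u(x) = m2 . x + 0.5 (x - m1) . A (x - m1).
    """
    r1 = np.real(sqrtm(S1))
    r1_inv = np.linalg.inv(r1)
    A = r1_inv @ np.real(sqrtm(r1 @ S2 @ r1)) @ r1_inv

    def T(x):
        return m2 + (x - m1) @ A   # A is symmetric, so no transpose is needed

    return T, A

# Hypothetical example: push N(m1, S1) forward to N(m2, S2) and check the moments.
m1 = np.zeros(2)
S1 = np.array([[2.0, 0.3], [0.3, 1.0]])
m2 = np.array([1.0, -1.0])
S2 = np.array([[1.0, -0.2], [-0.2, 0.5]])
T, A = gaussian_brenier_map(m1, S1, m2, S2)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(m1, S1, size=200_000)
Y = T(X)
print(Y.mean(axis=0))   # close to m2
print(np.cov(Y.T))      # close to S2
```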
We now sketch out some interesting ideas, not too rigorously. On a $d$-dimensional Riemannian manifold $M$ with metric $g$, and reference measure $d\mu = e^{-\Psi}\, d\mathrm{vol}$ with a potential function $\Psi$, we can define a differential operator
$$L = \Delta - \nabla\Psi\cdot\nabla.$$
We define $\Gamma(f,h) = \nabla f\cdot\nabla h$, which gives rise to a quadratic energy
$$\mathcal{E}(f) = \int_M \Gamma(f,f)\, d\mu = -\int_M f\, Lf\, d\mu.$$
Moreover $L$ generates a semigroup $P_t = e^{tL}$, which gives rise to the solution of the diffusion equation
$$\partial_t u = Lu, \qquad u(0,\cdot) = f, \qquad u(t,\cdot) = P_t f.$$
When $M = \mathbb{R}^d$ and $\Psi = 0$, so that $\mu$ is the Lebesgue measure, the operator $L$ is the Laplacian, and the corresponding process is heat flow. For $\Psi \ne 0$ we have a drift-diffusion operator. In particular the potential $\Psi(x) = \frac{|x|^2}{2}$ gives rise to a Gaussian measure and the corresponding Ornstein-Uhlenbeck operator
$$Lf(x) = \Delta f(x) - x\cdot\nabla f(x).$$
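To close, a minimal simulation sketch (mine; the step size, horizon, and initial point are arbitrary) of the process generated by the Ornstein-Uhlenbeck operator above, i.e. the SDE $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, whose invariant measure is the standard Gaussian:

```python
import numpy as np

# Euler-Maruyama simulation of the Ornstein-Uhlenbeck SDE dX = -X dt + sqrt(2) dW,
# whose generator is L f = f'' - x f' and whose invariant measure is N(0, 1).
rng = np.random.default_rng(0)
n_paths, n_steps, dt = 50_000, 2_000, 5e-3

x = np.full(n_paths, 3.0)                     # start far from equilibrium
for _ in range(n_steps):
    x += -x * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)

# After t = n_steps * dt = 10 the law of X_t is essentially the standard Gaussian.
print("mean ~ 0:", x.mean())
print("var  ~ 1:", x.var())
```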