Tails of exit times from unstable equilibria on the line

Yuri Bakhtin; Zsolt Pajor-Gyulai

doi:10.1017/jpr.2020.16

Tails of exit times from unstable equilibria on the line

Part of: Markov processes Stochastic analysis Random dynamical systems

Published online by Cambridge University Press: 16 July 2020

Yuri Bakhtin and

Zsolt Pajor-Gyulai

Show author details

Yuri Bakhtin*: Affiliation:
New York University
Zsolt Pajor-Gyulai*: Affiliation:
New York University
*: *Postal address: Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, NY, 10012, USA. Email: bakhtin@cims.nyu.edu
*Postal address: Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, NY, 10012, USA. Email: bakhtin@cims.nyu.edu

Article contents

Abstract
Introduction
Proof of Theorem
Linear system with additive noise
Proof of Theorem 2 for $\alpha\in(1,1+\beta)$
Extension to arbitrary timescales
Rare transitions in heteroclinic networks
References

Rights & Permissions

Abstract

For a one-dimensional smooth vector field in a neighborhood of an unstable equilibrium, we consider the associated dynamics perturbed by small noise. We give a revealing elementary proof of a result proved earlier using heavy machinery from Malliavin calculus. In particular, we obtain precise vanishing noise asymptotics for the tail of the exit time and for the exit distribution conditioned on atypically long exits. We also discuss our program on rare transitions in noisy heteroclinic networks.

Keywords

vanishing noise limit unstable equilibrium exit problem polynomial decay

MSC classification

Primary: 60H10: Stochastic ordinary differential equations

Secondary: 60J60: Diffusion processes 37H10: Generation, random and stochastic difference and differential equations

Type: Research Papers
Information: Journal of Applied Probability , Volume 57 , Issue 2 , June 2020 , pp. 477 - 496

DOI: https://doi.org/10.1017/jpr.2020.16 [Opens in a new window]
Copyright: © Applied Probability Trust 2020

1. Introduction

In a recent paper [Reference Bakhtin and Pajor-Gyulai4] we studied tails of diffusion exit times from neighborhoods of unstable critical points, in the limit of vanishing noise, in one dimension. The typical exit time $\kappa_{\varepsilon}$ in this setting (see the detailed description of the setting below) is of the order of $\frac{1}{\lambda}\log\frac{1}{{\varepsilon}}$ , where $\lambda>0$ is the local expansion coefficient of the linearization of the system near the critical point, and ${\varepsilon}\downarrow 0$ is the noise magnitude. The main result of [Reference Bakhtin and Pajor-Gyulai4] is that the following polynomial asymptotics holds for a class of initial conditions near the critical point:

(1.1)

\begin{equation}\mathbb{P}\left(\kappa_{\varepsilon}>\frac{\alpha}{\lambda}\log \frac{1}{{\varepsilon}}\right)=c{\varepsilon}^{\alpha-1}(1+o(1)),\qquad {\varepsilon}\downarrow 0,\end{equation}

for $\alpha>1$ , with an explicit dependence of the factor c on the initial condition and the parameters of the model; see (1.7) below for details.

This result is part of an ongoing effort to understand the long-term properties of multi-dimensional diffusions in the context of noisy heteroclinic networks, including the limiting behavior of invariant distributions associated with such systems in the vanishing noise limit. The typical behavior in such settings is understood for timescales logarithmic in ${\varepsilon}^{-1}$ ; see [Reference Almada and Bakhtin1, Reference Bakhtin2, Reference Bakhtin3]. To see what happens in the long run, though, one has to quantify rare events responsible for transitions that are atypical at the logarithmic time scale. Our work in progress shows that these rare events play a crucial role in the long-term dynamics near noisy heteroclinic networks. Moreover, we argue that they occur exactly due to atypically long stays near unstable critical points. The resulting picture is similar to that of metastability but with polynomial transition rates in place of exponential ones. We give more details on this picture in Section 6, while here we only reiterate that the result of the form (1.1) and its ramifications will be crucial for that program. However, the technique we used in [Reference Bakhtin and Pajor-Gyulai4] to analyze densities of auxiliary random variables was based on heavy tools from Malliavin calculus. That approach somewhat obscures the reason why this result is true and does not seem to be tractable when applied to the study of the analogous exit problem in the neighborhood of a hyperbolic saddle in ${\mathbb R}^d$ , $d>1$ , i.e. when both attracting and repelling directions are present.

In the present note, our goal is to give a new proof of this result that (a) is based on a more precise description of the dynamics at small scales, (b) uses more elementary tools of stochastic calculus, and (c) has a strong potential to be applicable in higher dimensions. In fact, we prove a slightly more general result on probabilities of the form $\mathbb{P}\big(\kappa_{\varepsilon}>\frac{\alpha}{\lambda}\log \frac{1}{{\varepsilon}}+t\big)$ , $\alpha>1$ , for all $t\in{\mathbb R}$ instead of $t=0$ as considered in [Reference Bakhtin and Pajor-Gyulai4]. It turns out that, asymptotically, the dependence on t is exponential, which implies that for any $T\in{\mathbb R}$ , $\kappa_{\varepsilon}-\frac{\alpha}{\lambda}\log \frac{1}{{\varepsilon}}-T$ conditioned on $\kappa_{\varepsilon}-\frac{\alpha}{\lambda}\log \frac{1}{{\varepsilon}}>T$ converges in distribution to an exponential random variable. This phenomenon is a manifestation of loss of memory in the system under conditioning, and it is consistent with the fact that $\kappa_{\varepsilon} - \frac{1}{\lambda}\log \frac{1}{{\varepsilon}}$ converges in distribution to a random variable with exponentially decaying right tails; see, e.g., [Reference Bakhtin2].

An important ingredient in this note is Lemma 3.2, a conditional equidistribution result that states that the distribution of the diffusion, conditioned on no exit from a small interval, converges to the uniform distribution. Thus our new approach is closer in spirit to the one based on quasi-stationary distributions; see [Reference Champagnat and Villemonais6]. However, the existing general theory does not provide answers for us since in our situation both the system and the domain depend on ${\varepsilon}$ . Moreover, the timescales we are interested in are too short for the $t\to\infty$ limit to be a good approximation while taking $\varepsilon\downarrow 0$ .

Let us be more precise now. We consider the family of stochastic differential equations

(1.2)

\begin{equation}{\rm d} X_{{\varepsilon}}(t)=b\left(X_{{\varepsilon}}(t)\right){\rm d} t+{\varepsilon}\sigma(X_{{\varepsilon}}(t)){\rm d} W(t) \end{equation}

on a bounded interval $\mathcal{I}=[q_-,q_+]\subseteq \mathbb{R}$ with the origin in its interior. The drift is given by a vector field $b\in\mathcal{C}^{2}({\mathbb R})$ and the random perturbation is given via a standard Brownian motion W with respect to a filtration $(\mathcal{F}_t)_{t\ge 0}$ defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ under the usual conditions. The noise magnitude is given by a small parameter ${\varepsilon}>0$ in front of the diffusion coefficient $\sigma$ , which is assumed to be Lipschitz and satisfy $\sigma(0)>0$ . Although we are interested only in the evolution within $\mathcal{I}$ , we can assume that b and $\sigma$ are globally Lipschitz without changing the setting.

Standard results on stochastic differential equations (see, e.g., [8, Chapter 5]) imply that, for any starting location $X^{{\varepsilon}}(0)\in\mathcal{I}$ , (1.2) has a unique strong solution up to the exit time from $\mathcal{I}$ ,

\begin{equation*}\tau_{\mathcal{I}}^{{\varepsilon}}=\inf\{t\geq 0\,{:}\,X_{{\varepsilon}}(t)\in\partial \mathcal{I}\}.\end{equation*}

Let $(S^t)_{t\in{\mathbb R}}$ be the flow generated by the vector field b, i.e. $x(t)=S^tx_0$ is the solution of the autonomous ordinary differential equation

\begin{equation*}\dot{x}(t)=b(x(t)),\qquad x(0)=x_0\in{\mathbb R}\end{equation*}

(we recall that solving this equation for negative times is equivalent to solving $\dot y(t)=-b(y(t))$ for $y(t)=x(\!-t)$ ). We assume that there is a unique repelling zero of the vector field b on ${\mathbb R}$ , which, without loss of generality, we place at the origin. In other words, we assume that $b(0)=0$ and, for some $\lambda>0$ and $\eta\in\mathcal{C}^2(\mathcal{I})$ ,

(1.3)

\begin{equation}b(x)=\lambda x+\eta(x)|x|^2,\qquad x\in\mathcal{I}.\end{equation}

Note that since the origin is the only zero of b in the closed interval $\mathcal{I}$ , this assumption implies that, for all $x\neq 0$ , there is a uniquely defined finite time T(x) such that $S^{T(x)}\in\partial\mathcal{I}$ .

Under the condition (1.3), the map $f\,{:}\,\mathcal{I}\to{\mathbb R}$ defined by

(1.4)

\begin{equation}f(x)=\lim_{t\to\infty}{\rm e}^{\lambda t} S^{-t}x=x-\int_0^\infty {\rm e}^{\lambda s}\eta(S^{-s}x)|S^{-s}x|^2 \, {\rm d} s\end{equation}

is an order-preserving $\mathcal{C}^2$ -diffeomorphism (see [Reference Eizenberg7]). In particular, $f(q_-)<0<f(q_+)$ . This map linearizes the flow $(S^t)$ , see (2.1), and helps to state the main result concisely; see (1.7).

Under the above assumptions, a version of the following theorem was proved in [Reference Bakhtin and Pajor-Gyulai4]. In its statement and throughout the paper we use

(1.5)

\begin{equation}\psi(t,x)=\frac{1}{\sqrt{2\pi t}\,}{\rm e}^{-\frac{x^2}{2t}}.\end{equation}

Theorem 1.1. Consider $X_\varepsilon$ defined by (1.2) with initial condition $X_\varepsilon(0)=\varepsilon x$ , and let $K({\varepsilon})$ be any function that satisfies

(1.6)

\begin{equation}\lim_{\varepsilon\downarrow 0}\varepsilon^{\gamma}K(\varepsilon)=0 \qquad \text{for all}\ \gamma>0.\end{equation}

Then, for all $\alpha>1$ and all $t\in{\mathbb R}$ ,

(1.7)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|x|\leq K(\varepsilon)}\left|\varepsilon^{-(\alpha-1)}\mathbb{P}\left(\tau_{\mathcal{I}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}+t;\ X_\varepsilon(\tau_{\mathcal{I}}^\varepsilon)=q_{\pm}\right)-{\rm e}^{-\lambda t}|f(q_\pm)|\psi_0(x)\right|=0,\end{equation}

where

\begin{equation*}\psi_0(x)=\psi\left(\frac{\sigma^2(0)}{2\lambda},x\right)=\sqrt{\frac{\lambda}{\pi}}\frac{{\rm e}^{-\lambda\left(\frac{x}{\sigma(0)}\right)^2}}{\sigma(0)}.\end{equation*}

In particular, for any $T\in{\mathbb R}$ ,

\begin{align*}&\textsc{Law}\left[\left(\tau_{\mathcal{I}}^\varepsilon-\frac{\alpha}{\lambda}\log\varepsilon^{-1}-T,\ X_\varepsilon(\tau_{\mathcal{I}}^\varepsilon)\right) \, \Big| \, \tau_{\mathcal{I}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}+T\right]\\[3pt] &\quad \Rightarrow \exp_\lambda\otimes\left(\frac{|f(q_-)|}{|f(q_-)|+|f(q_+)|}\delta_{q_-}+\frac{|f(q_+)|}{|f(q_-)|+|f(q_+)|}\delta_{q_+}\right)\!,\end{align*}

where $\Rightarrow$ stands for weak convergence, and $\exp_{\lambda}$ is the exponential distribution with rate $\lambda>0$ , i.e. $\exp_\lambda[t,\infty)={\rm e}^{-\lambda t}$ for $t\ge 0$ .

Remark 1.1. We say that a function satisfying (1.6) grows at most subpolynomially at 0. We say that a function $c(\varepsilon)$ decays at most subpolynomially at 0 if $1/c(\varepsilon)$ grows subpolynomially at 0. For brevity, we will usually omit the reference to 0 and simply say grows/decays subpolynomially even when the function might not actually grow.

Remark 1.2. The theorem is stated for initial conditions that are at most of the order of ${\varepsilon}$ away from the origin up to a subpolynomial factor. The case of initial conditions of the order of ${\varepsilon}^\beta$ for $\beta<1$ is less interesting since then the tails of exit times decay as stretched exponentials of ${\varepsilon}^{-1}$ instead of the power decay given by (1.7) (see Proposition 2.1).

Remark 1.3. In the statement of Theorem 1.1 and in the following we adopt the usual convention that each relation involving $\pm$ and $\mp$ stands for two relations, one with all top signs and one with all bottom signs.

The brief outline of our approach to the proof of this theorem is as follows. It is convenient to work in coordinates given by the function f defined in (1.4) where the drift is linear. We study the dynamics of the linear process in two separate phases: (1) in a neighborhood of the critical point of radius $\varepsilon^{\beta}$ for $\beta\in(0,1)$ ; (2) between leaving this small neighborhood and reaching the boundary of $f(\mathcal{I})$ .

In the second stage, the drift dominates the noise, and the process closely follows the corresponding deterministic trajectory one obtains by setting $\varepsilon=0$ . The outcome of the first stage, though, i.e. the exit from $[\!-\varepsilon^{\beta},\varepsilon^{\beta}]$ , is determined by a delicate interplay between the noise and the drift in an even smaller neighborhood of the origin ( $\beta$ can be chosen arbitrarily close to one). We study this regime by introducing an auxiliary process $Z_\varepsilon(t)$ with constant diffusion coefficient approximating $Y_\varepsilon(t)=f(X_\varepsilon(t))$ pathwise, at least over time intervals that are not too large and for which Theorem 1.1 is easier to establish. Since $Z_\varepsilon(t)$ and $Y_\varepsilon(t)$ do not, in general, stay close on longer timescales, we introduce an iterative scheme to tackle this problem. Namely, we split the longer time interval into shorter ones and show that a useful approximation result, which holds under conditioning on the process not having exited the spatial interval, can be applied sequentially.

The plan of the paper is as follows. In Section 2 we perform the aforementioned change of variables to linearize the drift and prove Theorem 1.1 using an intermediate result on the exit from a small neighborhood of the origin. In Section 3 we introduce an auxiliary process, which is fully linear and thus allows us to derive certain properties of the exit problem through explicit calculations. In Section 4 we prove an approximation result which allows us to transfer these properties from the fully linear process to the case where only the drift is linear, as long as the timescales involved are not too large. Finally, in Section 5 we use an iterative scheme to lift this limitation, thereby finishing the proof of the intermediate result. In Section 6 we explain how the result of this paper fits our program on long-term behavior of diffusions near heteroclinic networks.

2. Proof of Theorem 1.1

As outlined above, we study the system first in a small neighborhood of the origin and then after the process has escaped this small neighborhood.

Let us start with the first part. The diffeomorphism $f\,{:}\,\mathcal{I}\to{\mathbb R}$ introduced in (1.4) and its inverse $g=f^{-1}$ provide a conjugation between the flow $(S^t)$ and a linear flow:

(2.1)

\begin{equation}f(S^tx)={\rm e}^{\lambda t}f(x),\qquad\textrm{or}\qquad f'(x)b(x)=\lambda f(x).\end{equation}

Note that the integrand in (1.4) is quadratic when x is close to zero and thus we have $f(0)=0$ and $f'(0)=1$ . Outside of $\mathcal{I}$ , we define f so that f and f are bounded.

Let $Y_\varepsilon(t)=f(X_\varepsilon(t))$ for times prior to the escape from $\mathcal{I}$ . Itô’s formula and (2.1) then imply that this process satisfies the stochastic differential equation

(2.2)

\begin{equation}{\rm d} Y_\varepsilon(t)=\lambda Y_\varepsilon(t){\rm d} t+\varepsilon\tilde\sigma(Y_\varepsilon(t)){\rm d} W(t)+\frac{\varepsilon^2}{2}h(Y_\varepsilon(t)){\rm d} t\end{equation}

for $t<\tau_{\mathcal{I}}^\varepsilon$ , where $\tilde\sigma(y)=f'(g(y))\sigma(g(y))$ and $h(y)=f''(g(y))\sigma^2(g(y))$ . Due to the boundedness of f and f, $\tilde\sigma$ and h are also bounded.

By Duhamel’s formula, $Y_\varepsilon$ satisfies the integral equation

(2.3)

\begin{equation}Y_\varepsilon(t)={\rm e}^{\lambda t}\big(Y_\varepsilon(0) + \varepsilon U_\varepsilon(t)+\varepsilon^2 V_\varepsilon(t)\big),\end{equation}

where

\begin{equation*}U_\varepsilon(t)=\int_0^t{\rm e}^{-\lambda s}\tilde\sigma(Y_\varepsilon(s)) \, {\rm d} W(s),\qquad V_\varepsilon(t)=\frac{1}{2}\int_0^t{\rm e}^{-\lambda s}h(Y_\varepsilon(s)) \, {\rm d} s.\end{equation*}

Due to our conventions on f, f outside of $\mathcal{I}$ , the processes $U_{\varepsilon}(t)$ and $V_{\varepsilon}(t)$ are defined for all $t\ge 0$ . Moreover, the boundedness of h immediately implies the boundedness of $V_\varepsilon(t)$ :

(2.4)

\begin{equation}\|V_{\varepsilon}(\cdot)\|_\infty\le \frac{\|h\|_\infty}{2\lambda},\end{equation}

where $\|\cdot\|_\infty$ is the sup-norm on $[0,\infty)$ . The boundedness of $\tilde\sigma$ yields a similar conclusion about the quadratic variation of $U_\varepsilon(t)$ . Hence, the existence of constants $c_1,c_2, N_0>0$ such that

(2.5)

\begin{equation}\mathbb{P}\bigg(\sup_{t\ge 0}|U_\varepsilon(t\wedge\tau_\mathcal{I}^{\varepsilon})|\geq N\bigg)\leq c_1{\rm e}^{-c_2N^2},\qquad N\geq N_0,\end{equation}

is implied by the following exponential martingale inequality (see, e.g., Problem 12.10 in [Reference Bass5]):

Lemma 2.1. Let M(t) be a centered martingale with quadratic variation process $\langle M\rangle_t$ . Then

\begin{equation*}\mathbb{P}\bigg(\sup_{t\ge 0}|M(t)|\geq a; \langle M\rangle_{\infty}\leq b\bigg)\leq 2{\rm e}^{-\frac{a^2}{2b}}.\end{equation*}

Let us take $\beta\in(0,1)$ and set $\mathcal{V}=g\left([\!-\varepsilon^{\beta},\varepsilon^{\beta}]\right)\subseteq \mathcal{I}$ . The following result describes the tail behavior of $\tau_\mathcal{V}^\varepsilon$ , the exit time from $\mathcal{V}$ . In particular, it says that, in the $\varepsilon\downarrow 0$ asymptotics, the choice of the exit direction is distributed symmetrically independently of the exit time.

Theorem 2.1. Let $Y_\varepsilon(0)=\varepsilon y$ , where $|y|\leq K(\varepsilon)$ with $K(\varepsilon)$ growing subpolynomially at 0. Then, for all $\alpha>1$ , $C\in{\mathbb R}$ , and any function $c({\varepsilon})$ satisfying $\lim_{{\varepsilon}\to 0}c({\varepsilon})=0$ , there is $\beta_0\in(0,1)$ such that, for $\beta\in(\beta_0,1)$ , we have

\begin{equation*} \lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}\bigg|\varepsilon^{-(\alpha-1)}\mathbb{P}\bigg(\tau_{\mathcal{V}}^\varepsilon>\frac{\alpha-\beta}{\lambda}\log\varepsilon^{-1}-C+c(\varepsilon);\, Y_\varepsilon(\tau_\mathcal{V}^\varepsilon)=\pm\varepsilon^{\beta}\bigg)-{\rm e}^{\lambda C}\psi_0(y)\bigg|=0.\end{equation*}

We give the proof of Theorem 2.1 in Section 5.

After exit from $\mathcal{V}$ , the deterministic dynamics dominates the evolution, which means that the exit time will be close to

(2.6)

\begin{equation}T_\varepsilon^\pm\coloneqq T\left(g\left(\pm \varepsilon^{\beta}\right)\right)=\frac{\beta}{\lambda}\log\varepsilon^{-1}+\frac{1}{\lambda}\log |f(q_\pm)|,\end{equation}

the time it takes for $X_0(t)$ to exit $\mathcal{I}$ starting at $g\left(\pm \varepsilon^{\beta}\right)$ . This is captured by the following standard large deviation estimates.

Proposition 2.1. Let $X_{\varepsilon}(0)=g\left(\pm \varepsilon^{\beta}\right)$ . Then, for every $\beta'\in(0,\beta)$ and subpolynomially decaying function $c(\varepsilon)>0$ , we have

(2.7)

\begin{equation} \mathbb{P}\left(\left|\tau_\mathcal{I}^\varepsilon-T_\varepsilon^\pm\right|>c({\varepsilon})\right)=o\left(\exp\bigg\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\bigg\}\right)\end{equation}

and

(2.8)

\begin{equation}\mathbb{P}\left(X_\varepsilon(\tau_\mathcal{I}^\varepsilon)=q_{\pm}\right)\geq 1-o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\!.\end{equation}

Proof. We start by showing that with overwhelming probability the exit happens through the endpoint that is on the same side as the starting point. Indeed, (2.3), (2.4), and (2.5) imply

\begin{align*}\mathbb{P}(X_\varepsilon(\tau_\mathcal{I}^\varepsilon)=q_{\mp})&=\mathbb{P}(Y_\varepsilon(\tau_\mathcal{I}^\varepsilon)=f(q_{\mp}))=\mathbb{P}\bigg({\varepsilon}^\beta <\left|{\varepsilon} U_\varepsilon(\tau_\mathcal{I}^\varepsilon)+\varepsilon^2V_\varepsilon(\tau_\mathcal{I}^\varepsilon)\right|\bigg)\\[3pt] &\leq \mathbb{P}\bigg({\varepsilon}^\beta <{\varepsilon}\sup_{t>0}\left| U_\varepsilon(t\wedge\tau_\mathcal{I}^\varepsilon)\right|+\varepsilon^2\frac{\|h\|_{\infty}}{2\lambda}\bigg)\\[3pt] &\leq\mathbb{P}\bigg({\varepsilon}^{-(1-\beta)}-\varepsilon\frac{\|h\|_{\infty}}{2\lambda} <\sup_{t>0}\left| U_\varepsilon(t\wedge\tau_\mathcal{I}^\varepsilon)\right|\!\!\bigg)=o\bigg(\!\!\exp\bigg\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\bigg\}\bigg),\end{align*}

and (2.8) follows.

To prove (2.7), let us introduce

\begin{equation*}U_\varepsilon =U_\varepsilon\left((T_\varepsilon^\pm+c(\varepsilon))\wedge\tau_\mathcal{I}^\varepsilon\right)\!,\qquad V_\varepsilon =V_\varepsilon\left((T_\varepsilon^\pm+c(\varepsilon))\wedge\tau_\mathcal{I}^\varepsilon\right)\!,\end{equation*}

and note that (2.3) implies

\begin{align*}&\mathbb{P}\big(\tau_\mathcal{I}^{\varepsilon}>T_{\varepsilon}^{\pm} + c({\varepsilon})\big) \\[3pt] &\quad \le \mathbb{P}\left(f(q_-)< {\rm e}^{\lambda \left(T_{\varepsilon}^{\pm}+c({\varepsilon})\right)}\left(\pm{\varepsilon}^\beta + {\varepsilon} U_\varepsilon+\varepsilon^2V_\varepsilon\right) <f(q_+)\right)\\[3pt] &\quad = \mathbb{P}\left({\varepsilon}^\beta {\rm e}^{-\lambda c({\varepsilon})}\frac{f(q_-)}{|f(q_\pm)|} <\pm{\varepsilon}^\beta + {\varepsilon} U_\varepsilon(t)+\varepsilon^2V_\varepsilon(t)<{\varepsilon}^\beta {\rm e}^{-\lambda c({\varepsilon})}\frac{f(q_+)}{|f(q_\pm)|}\right)\\[3pt] &\quad\le \mathbb{P}\left( \left|U_\varepsilon\right|>\varepsilon^{-(1-\beta)}\left(1-{\rm e}^{-\lambda c({\varepsilon})}\right)-\varepsilon\frac{\|h\|_\infty}{2\lambda}\right) + o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\\[3pt] &\quad=o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\!,\end{align*}

where we used (2.6), (2.8), (2.4), (2.5), and the subpolynomial decay of $c(\varepsilon)$ . Similarly,

\begin{align*}& \mathbb{P} \big(\tau_\mathcal{I}^{\varepsilon}< T_{\varepsilon}^{\pm}-c({\varepsilon})\big) \\[3pt] &\quad\le \mathbb{P}\left(\sup_{t<T_{\varepsilon}^{\pm}-c({\varepsilon})}\!\!\!\!{\rm e}^{\lambda (t-c({\varepsilon}))}\left|\pm{\varepsilon}^\beta + {\varepsilon} U_\varepsilon(t)+\varepsilon^2V_\varepsilon(t)\right| > |f(q_{\pm})|\right)+ o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\\[3pt] &\quad\le \mathbb{P}\!\left(\!{\rm e}^{\lambda (T_{\varepsilon}^{\pm}-c({\varepsilon}))}\!\left(\!{\varepsilon}^\beta + {\varepsilon}\! \sup_{t<T_{\varepsilon}^\pm} \left|U_\varepsilon(t)\right|+\varepsilon^2\frac{\|h\|_{\infty}}{2\lambda}\!\right) > |f(q_\pm)|\right)+ o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\\[3pt] &\quad= \mathbb{P}\left({\varepsilon}^\beta + {\varepsilon} \sup_{t<T_{\varepsilon}^{\pm}} \left|U_\varepsilon(t)\right|+\varepsilon^2\frac{\|h\|_{\infty}}{2\lambda}\ge {\varepsilon}^\beta {\rm e}^{\lambda c({\varepsilon})}\right) + o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\\[3pt] &\quad\leq \mathbb{P}\left( \sup_{t>0} \left|U_\varepsilon(t\wedge\tau_\mathcal{I}^\varepsilon)\right|\ge {\varepsilon}^{-(1-\beta)} \big({\rm e}^{\lambda c({\varepsilon})}-1\big)-\varepsilon \frac{\|h\|_{\infty}}{2\lambda}\right)+ o\left(\exp\left\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\right\}\right)\\[3pt] &\quad=o\bigg(\exp\bigg\{-\frac{1}{{\varepsilon}^{2(1-\beta')}}\bigg\}\bigg)\!.\\[-35pt]\end{align*}

□

Proof of Theorem 1.1. We have

\begin{equation*}Y_\varepsilon(0)= f(X_\varepsilon(0))=f(\varepsilon x)=\varepsilon x + \mathcal{O}\big(\varepsilon^2 x^2\big)=\varepsilon\big(x+\mathcal{O}\big(\varepsilon x^2\big)\big)\end{equation*}

and

\begin{equation*}|x+\mathcal{O}(\varepsilon x^2)|\leq K(\varepsilon)+\mathcal{O}\left(\varepsilon (K(\varepsilon))^2\right)\leq 2K(\varepsilon)\end{equation*}

for small enough $\varepsilon$ , so the right-hand side is a function subpolynomially growing at 0. Let us define $\theta^{\pm}_\varepsilon=T^\pm_{\varepsilon}-(\tau_\mathcal{I}^\varepsilon-\tau_\mathcal{V}^\varepsilon)$ . Due to (2.8),

\begin{align*}\mathbb{P}\big(X_\varepsilon\left(\tau_{\mathcal{I}}^\varepsilon\right)&=q_\pm;\ \tau_{\mathcal{I}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}{+t}\big)\\[3pt] & =\mathbb{P}\bigg(Y_\varepsilon\left(\tau_{\mathcal{V}}^\varepsilon\right)=\varepsilon^{\beta};\ \tau_{\mathcal{V}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}{+t}-T^\pm_{\varepsilon}+\theta^{\pm}_\varepsilon\bigg)+o\big(\varepsilon^{\alpha-1}\big),\end{align*}

where the error term (along with all subsequent error terms) is uniform in the starting points $|x|\leq K(\varepsilon)$ . Note that the strong Markov property implies the conditional independence of $\theta^{\pm}_\varepsilon$ and $\tau_\mathcal{V}^\varepsilon$ given $Y_\varepsilon(\tau_\mathcal{V}^\varepsilon)=\varepsilon^{\beta}$ . This, along with (2.7), allows us to give upper and lower estimates of the first term on the right-hand side:

\begin{align*}\mathbb{P}\Big(Y_\varepsilon\left(\tau_{\mathcal{V}}^\varepsilon\right)&=\pm\varepsilon^{\beta};\ \tau_{\mathcal{V}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}{+t}-T^\pm_{\varepsilon}+c(\varepsilon)\bigg) + o\big(\varepsilon^{\alpha-1}\big)\\[3pt] &\leq\mathbb{P}\Big(Y_\varepsilon\left(\tau_{\mathcal{V}}^\varepsilon\right)=\varepsilon^{\beta};\ \tau_{\mathcal{V}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}{+t}-T^\pm_{\varepsilon}+\theta_\varepsilon^{\pm}\Big)\\[3pt] &\leq\mathbb{P}\left(Y_\varepsilon\left(\tau_{\mathcal{V}}^\varepsilon\right)=\pm\varepsilon^{\beta};\ \tau_{\mathcal{V}}^\varepsilon>\frac{\alpha}{\lambda}\log\varepsilon^{-1}{+t}-T^\pm_{\varepsilon}-c(\varepsilon)\right) + o\big(\varepsilon^{\alpha-1}\big) , \end{align*}

where $c(\varepsilon)$ is an arbitrary positive function that decays subpolynomially. Now we may apply Theorem 2.1 with $C={-t} +\lambda^{-1}\log|f(q_\pm)|$ to both sides, which concludes the proof.□

3. Linear system with additive noise

In this section we introduce an auxiliary process, which is a simpler special case of (2.2). Namely, we consider

(3.1)

\begin{align}{\rm d} Z_{\varepsilon}(t)&=\lambda Z_{\varepsilon}(t){\rm d} t + {\varepsilon} \sigma_0 {\rm d} W(t),\qquad Z_{\varepsilon}(0)=\varepsilon z,\end{align}

where $\sigma_0=\tilde{\sigma}(0)=\sigma(0)>0$ , $|z|<K({\varepsilon})$ , and $K({\varepsilon})$ grows subpolynomially at 0.

We will need a precise description of the exit of $Z_\varepsilon$ from $\mathcal{V}_Z=[\!-\varepsilon^{\beta}(1+\delta_\varepsilon),\varepsilon^{\beta}(1+\delta_\varepsilon)]$ for $\beta\in (0,1)$ and any $\delta_{\varepsilon}>0$ satisfying $\delta_\varepsilon\downarrow 0$ as $\varepsilon\downarrow 0 $ . Let us introduce a stopping time

\begin{equation*}\tau_{\varepsilon}=\inf\!\big\{t>0\,{:}\, |Z_{\varepsilon}(t)|={\varepsilon}^\beta(1+\delta_\varepsilon)\big\},\end{equation*}

and, for a constant C and a subpolynomially decaying at 0 function $c({\varepsilon})$ , a deterministic time

(3.2)

\begin{equation}t_\varepsilon=\frac{\alpha-\beta}{\lambda}\log\varepsilon^{-1}-C+c({\varepsilon}).\end{equation}

We will often use $C_\varepsilon=C-c(\varepsilon)$ . The first result of this section is a version of Theorem 2.1 for the process $Z_{\varepsilon}$ with stronger control of the dependence on the initial point.

Lemma 3.1. For any $\alpha>1$ and any subpolynomially decaying function $c(\varepsilon)$ in the definition of $t_{\varepsilon}$ , there is $c>0$ such that

\begin{equation*}\lim_{\varepsilon\downarrow 0}\sup_{|z|\leq K(\varepsilon)}{\rm e}^{cz^2}\left|\varepsilon^{-(\alpha-1)}\mathbb{P}\left(\tau_\varepsilon>t_\varepsilon\right)-2{\rm e}^{\lambda C}\psi_0(z)\right|=0.\end{equation*}

Proof. Duhamel’s formula gives an explicit solution to (3.1):

(3.3)

\begin{equation}Z_{\varepsilon}(t)={\varepsilon} {\rm e}^{\lambda t} (z+\sigma_0 N(t)),\end{equation}

where $N(t)=\int_0^t {\rm e}^{-\lambda s} \, {\rm d} W(s)$ . Plugging in $\tau_\varepsilon$ for t, we obtain

\begin{equation*} (1+\delta_\varepsilon)\varepsilon^{\beta}=|Z_{\varepsilon}(\tau_\varepsilon)|={\varepsilon} {\rm e}^{\lambda \tau_\varepsilon} \left|z+\sigma_0N(\tau_\varepsilon)\right|,\end{equation*}

which is equivalent to

\begin{equation*}\tau_\varepsilon=\frac{1-\beta}{\lambda}\log\varepsilon^{-1}-\frac{1}{\lambda}\log|z+\sigma_0N(\tau_\varepsilon)|+o(1).\end{equation*}

Therefore,

\begin{equation*}\left\{\tau_\varepsilon> t_\varepsilon\right\}=\Big\{|z+\sigma_0N(\tau_\varepsilon)| < {\rm e}^{\lambda C_{\varepsilon}}\varepsilon^{\alpha-1}(1+o(1))\Big\}.\end{equation*}

By the martingale convergence theorem, N(t) converges to a random variable $N_{\infty}$ as $t\to\infty$ almost surely and in $L^1$ . We claim that, in addition, there are $c_1,c_2>0$ such that, for any $L>0$ ,

(3.4)

\begin{equation}\mathbb{P}\left(|N(\tau_\varepsilon)-N_\infty|>L\varepsilon^{\alpha-\beta};\ \tau_\varepsilon>t_\varepsilon\right)\leq c_1 {\rm e}^{-c_2 L^2}.\end{equation}

Indeed, Lemma 2.1 implies

\begin{align*}&\mathbb{P}\left(\sup_{t\geq t_\varepsilon}|N(t)-N_\infty|\geq L\varepsilon^{\alpha-\beta}\right)\\ &\quad \leq2\mathbb{P}\left(\sup_{t\geq t_\varepsilon}|N(t)-N(t_\varepsilon)|\geq L\varepsilon^{(\alpha-\beta)}/2\right)\leq c_1\exp\Big\{-c_2L^2\varepsilon^{2(\alpha-\beta)}e^{2\lambda t_\varepsilon}\Big\}\end{align*}

for some $c_1,c_2>0$ , which proves the claim since $\varepsilon^{\alpha-\beta}{\rm e}^{\lambda t_\varepsilon}={\rm e}^{-\lambda C_{\varepsilon}}$ .

Let us fix $\gamma\in (\alpha-1,\alpha-\beta)$ and write

\begin{equation*}\mathbb{P}\left(\tau_\varepsilon> t_\varepsilon\right)=H_1(z,\varepsilon)+H_2(z,\varepsilon),\end{equation*}

where

\begin{align*}H_1(z,\varepsilon)&=\mathbb{P}\big(|z+\sigma_0N_\infty+\sigma_0(N(\tau_\varepsilon)-N_{\infty})|< {\rm e}^{\lambda C_{\varepsilon} }\varepsilon^{\alpha-1}(1+o(1)); \\ & \quad |N(\tau_\varepsilon)-N_{\infty}|\le \varepsilon^{\gamma} \big),\\H_2(z,\varepsilon)&=\mathbb{P}\left(|z+\sigma_0N(\tau_\varepsilon)| < {\rm e}^{\lambda C_{\varepsilon}}\varepsilon^{\alpha-1}(1+o(1));\ |N(\tau_\varepsilon)-N_{\infty}|>\varepsilon^{\gamma}\right)\!.\end{align*}

The random variables $N(\tau_\varepsilon)$ and $N(\tau_\varepsilon)-N_\infty$ are independent due to the strong Markov property. So the Gaussian tail of the maximum of the Brownian motion and (3.4) imply

\begin{equation*}H_2(z,\varepsilon)\leq c_1{\rm e}^{-c_2z^2}o(\varepsilon^{\alpha-1}).\end{equation*}

The desired asymptotics of $H_1(z,{\varepsilon})$ follows from the explicit form of the Gaussian density of the random variable $z+\sigma_0 N_\infty$ .□

The next result is based on the fact that the distribution of $Z_\varepsilon(t_\varepsilon)$ conditioned on non-exit is approximately uniform over $[\!-\varepsilon^{\beta}(1+\delta_\varepsilon),\varepsilon^{\beta}(1+\delta_\varepsilon)]$ . In fact, this stronger statement is proved as an intermediate step. We first note that the density of any absolutely continuous random variable conditioned on a positive probability event is well defined.

Lemma 3.2. If $f_{\varepsilon}^c(u)$ is the probability density of $Z_{\varepsilon}(t_{\varepsilon})$ conditioned on $\{\tau_{\varepsilon} > t_{\varepsilon}\}$ , then, for any $\delta>0$ ,

(3.5)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|u|\leq(1-\delta)\varepsilon^{\beta}}\sup_{|z|\leq K(\varepsilon)}\left|{\varepsilon}^{\beta}f_{\varepsilon}^c(u)- \frac{1}{2}\right|=0.\end{equation}

Moreover, for any integrable function h with exponentially decaying tails and any subpolynomially growing function $K(\varepsilon)$ ,

(3.6)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|z|\leq K(\varepsilon)}\left|\varepsilon^{-(1-\beta)}\mathbb{E}\left[h\left(\varepsilon^{-1}Z_{\varepsilon}(t_\varepsilon)\right) \mid \tau_\varepsilon>t_{\varepsilon}\right]-\frac{1}{2} \int_{-\infty}^{\infty} h(y) \, {\rm d} y\right|=0.\end{equation}

Proof. By (3.3) and the Dambis–Dubins–Schwartz theorem (see, e.g., [8, Section 3.4B]),

\begin{equation*}Z_{\varepsilon}(t)={\varepsilon} {\rm e}^{\lambda t} \left(z+\sigma_0\int_0^t {\rm e}^{-\lambda s} \, {\rm d} W(s)\right)={\varepsilon} {\rm e}^{\lambda t} \left(z+B(r(t))\right)\!,\end{equation*}

where $r(t)=\sigma_0^2\left(1-{\rm e}^{-2\lambda t}\right)/(2\lambda)$ , and B is an auxiliary standard Brownian motion. Since

\begin{equation*}r(t_{\varepsilon})=\sigma_0^2\frac{1-{\varepsilon}^{2(\alpha-\beta)}{\rm e}^{2\lambda C_\varepsilon}}{2\lambda},\end{equation*}

we have

\begin{align*}Z_{\varepsilon}(t_{\varepsilon})={\varepsilon}^{1-\alpha+\beta}{\rm e}^{-\lambda C_\varepsilon}\left(z+B\left(\sigma_0^2\frac{1-{\varepsilon}^{2(\alpha-\beta)}{\rm e}^{2\lambda C_\varepsilon}}{2\lambda}\right)\right)\!.\end{align*}

This is a Gaussian random variable. Its density at a point $u\in{\mathbb R}$ is given by

\begin{equation*} p_{\varepsilon}(u)=\frac{\sqrt{\lambda}{\rm e}^{\lambda C_{\varepsilon}} }{\sqrt{\pi}\sigma_0 {\varepsilon}^{1-\alpha+\beta}(1+o(1))}\exp\left\{-\frac{(u{\varepsilon}^{\alpha-\beta-1} {\rm e}^{\lambda C_{\varepsilon}}-z)^2}{\frac{\sigma^2_0}{\lambda}(1+o(1))}\right\}\!.\end{equation*}

If $|u|\le{\varepsilon}^\beta$ , then $|u{\varepsilon}^{\alpha-1-\beta}|\le {\varepsilon}^{\alpha-1}$ , so

(3.7)

\begin{equation} p_{\varepsilon}(u)={\varepsilon}^{\alpha-\beta-1} {\rm e}^{\lambda C} \psi_0(z)(1+o(1)) \end{equation}

uniformly over u and z satisfying $|u|\le{\varepsilon}^\beta$ , $|z|\leq K(\varepsilon)$ .

Therefore, (3.5) will follow from

\begin{equation*}\lim_{\varepsilon\downarrow 0}\sup_{|u|\leq (1-\delta)\varepsilon^\beta}\sup_{|z|\leq K(\varepsilon)}\left|\frac{f_{\varepsilon}(z,u)}{{\varepsilon}^{\alpha-\beta-1}{\rm e}^{\lambda C}\psi_0(z)}-1\right|=0,\end{equation*}

where $f_{\varepsilon}(z,u)$ is the sub-probability density of $Z_\varepsilon(t_\varepsilon)$ on the event $\{\tau_\varepsilon>t_\varepsilon\}$ . For this, it suffices to see that

(3.8)

\begin{equation}\sup_{|u|\leq (1-\delta)\varepsilon^\beta}\sup_{|z|\leq K(\varepsilon)}\frac{g_{\varepsilon}(z,u)}{\psi_0(z)} \end{equation}

decays exponentially fast as ${\varepsilon}\downarrow 0$ , where $g_{\varepsilon}(z,u)$ is the sub-probability density of $Z_{\varepsilon}(t_{\varepsilon})$ on the event $\{\tau_{\varepsilon} \le t_{\varepsilon}\}$ . Given that $Z_{\varepsilon}(\tau_\varepsilon)=\varepsilon^{\beta}(1+\delta_\varepsilon)$ (similarly for $-\varepsilon^{\beta}(1+\delta_\varepsilon)$ ), we have

\begin{equation*}Z_{\varepsilon}(t_{\varepsilon})={\rm e}^{\lambda (t_{\varepsilon}-\tau_\varepsilon)}\left(\varepsilon^{\beta}(1+\delta_\varepsilon)+{\varepsilon} \sigma_0 \int_0^{t_{\varepsilon}-\tau_\varepsilon}{\rm e}^{-\lambda s} \, {\rm d} W(s)\right) , \end{equation*}

and thus

(3.9)

\begin{equation}g_{\varepsilon}(z,u)=\int_0^{t_{\varepsilon}}\mathbb{P}\{\tau_{\varepsilon}\in [t,t+{\rm d} t)\}G({\varepsilon},t, u),\end{equation}

where

\begin{align*}G({\varepsilon},t,u)&=\psi\left({\varepsilon}^2\sigma_0^2\frac{w^2-1}{2\lambda}, w\varepsilon^{\beta}(1+\delta_\varepsilon)-u\right)\\&= \frac{1}{\sqrt{2\pi {\varepsilon}^2\sigma_0^2\frac{w^2-1}{2\lambda}}\,}\exp\left\{-\frac{\lambda(w\varepsilon^{\beta}(1+\delta_\varepsilon)-u)^2}{{\varepsilon}^2\sigma_0^2(w^2-1)}\right\}\!,\end{align*}

with $\psi(\cdot,\cdot)$ as introduced in (1.5) and $w={\rm e}^{\lambda(t_{\varepsilon}-t)}$ .

We claim that there is $c>0$ such that

(3.10)

\begin{equation}\sup\{G({\varepsilon},t,u)\,{:}\, 0\le t\le t_{\varepsilon},\ |u|\le(1-\delta) \varepsilon^{\beta} \}=\mathcal{O}\Big({\rm e}^{-c/{\varepsilon}^{2(1-\beta)}}\Big).\end{equation}

This, along with (3.9) and the fact that $\psi_0(z)>\lambda^{1/2}\pi^{-1/2}\sigma_0^{-1}{\rm e}^{-\lambda K^2({\varepsilon})/\sigma_0^2}$ for all ${\varepsilon}$ and z satisfying $|z|<K({\varepsilon})$ , will imply the desired exponential decay in (3.8).

Let us fix any $w_0$ and find $c_0>0$ such that

\begin{equation*}\lambda (w\varepsilon^{\beta}(1+\delta_\varepsilon)-(1-\delta)\varepsilon^\beta )^2/(\sigma_0^2(w^2-1))\ge c_0\varepsilon^{2\beta}\end{equation*}

for $w> w_0$ and all ${\varepsilon}>0$ . Then there is a constant $c_0$ such that, for all $|u|\le (1-\delta){\varepsilon}^\beta$ and ${\varepsilon}>0$ ,

(3.11)

\begin{equation}G({\varepsilon},t,u)\le \frac{1}{\sqrt{2\pi {\varepsilon}^2\sigma_0^2\frac{w_0^2-1}{2\lambda}}\,}{\rm e}^{-c_0/{\varepsilon}^{2(1-\beta)}},\qquad w>w_0.\end{equation}

If $1 \le w \le w_0$ , then $(w\varepsilon^{\beta}(1+\delta_\varepsilon)-u)^2\ge\Delta\coloneqq \varepsilon^{2\beta}\delta^2$ . So, denoting $D={\varepsilon}^2\sigma_0^2(w^2-1)/(2\lambda)$ , we obtain

(3.12)

\begin{equation}G({\varepsilon},t,u)\le \frac{1}{\sqrt{2\pi D}\,}{\rm e}^{-\frac{\Delta}{2D}}=\psi(D, \Delta),\qquad 1\le w\le w_0.\end{equation}

The restriction on w implies $0\le D \le c_1{\varepsilon}^2$ for some $c_1$ . To maximize $\psi(D,\Delta)$ , we compute

\begin{equation*}\partial_D\ln \psi(D,\Delta)= -\frac{1}{2D} + \frac{\Delta}{2 D^2},\end{equation*}

so $\psi(D,\Delta)$ grows in $D\in [0,\Delta]$ , and we obtain, from (3.12),

(3.13)

\begin{equation}G({\varepsilon},t,x)\le \frac{1}{\sqrt{2\pi c_1 {\varepsilon}^2}\,}{\rm e}^{-\frac{\Delta}{2c_1{\varepsilon}^2}}.\end{equation}

Combining (3.11) and (3.13), we obtain (3.10) and hence (3.8). Thus, (3.5) is proved.

To prove (3.6), we write

\begin{align*}&\mathbb{E}\left[h(\varepsilon^{-1}Z_{\varepsilon}(t_\varepsilon)) \mid \tau_\varepsilon>t_{\varepsilon}\right] \\ &\quad =\int_{|u|<(1-\delta)\varepsilon^{\beta}}h(\varepsilon^{-1}u)f_\varepsilon^c(u) \, {\rm d} u + \int_{(1-\delta)\varepsilon^{\beta}\le|u|\le (1+\delta_{\varepsilon}){\varepsilon}^\beta}h(\varepsilon^{-1}u)f_\varepsilon^c(u) \, {\rm d} u\end{align*}

and notice that the first term on the right-hand side equals

\begin{align*}&\int_{-(1-\delta)\varepsilon^{\beta}}^{(1-\delta)\varepsilon^{\beta}}h(\varepsilon^{-1}x)f_\varepsilon^c(x) \, {\rm d} x=\frac{1}{2\varepsilon^{\beta}}(1+o(1))\int_{-(1-\delta)\varepsilon^{\beta}}^{(1-\delta)\varepsilon^{\beta}}h(\varepsilon^{-1}x) \, {\rm d} x\\[3pt] &\quad = \frac{1}{2}\varepsilon^{1-\beta}(1+o(1))\int_{-(1-\delta)\varepsilon^{-(1-\beta)}}^{(1-\delta)\varepsilon^{-(1-\beta)}}h(x) \, {\rm d} x=\frac{1}{2}\varepsilon^{1-\beta}\int_{-\infty}^{\infty}h(x) \, {\rm d} x+o(\varepsilon^{1-\beta}),\end{align*}

while the second term decays much faster than ${\varepsilon}^{1-\beta}$ due to the decay assumption applied to h.

4. Proof of Theorem 2 for $\alpha\in(1,1+\beta)$

We start by studying the deviations $\Delta_{\varepsilon}(t)=Y_{\varepsilon}(t)-Z_{\varepsilon}(t)$ as long as both processes $Y_{\varepsilon}(t)$ and $Z_{\varepsilon}(t)$ are close to the origin. Let us fix $\beta\in(0,1)$ and introduce the stopping times

\begin{equation*}\bar\tau_{\varepsilon}=\inf\{t\ge 0\,{:}\, |Y_{\varepsilon}(t)|={\varepsilon}^\beta \},\qquad\bar{\bar\tau}_{\varepsilon}=\inf\{t\ge 0\,{:}\, |Y_{\varepsilon}(t)|=2{\varepsilon}^\beta \}.\end{equation*}

Lemma 4.1. Suppose $\alpha\in(1,1+\beta)$ , $\beta'\in(\alpha-1,\beta)$ , $L(\varepsilon)>0$ is a bounded function, and

(4.1)

\begin{equation}t_\epsilon^{\prime}=\frac{\alpha-\beta}{\lambda}\log \left(L(\varepsilon)\varepsilon^{-1}\right)\!.\end{equation}

Then, for sufficiently small ${\varepsilon}>0$ , we have

(4.2)

\begin{equation}\mathbb{P}\left(\sup_{0\le t\le t_{\varepsilon}' \wedge \bar{\bar\tau}_{\varepsilon}} |\Delta_{\varepsilon}(t)|>{\varepsilon}^{\beta+\beta'-(\alpha-1) } \right)\leq 2\exp\left\{-\frac{c}{{\varepsilon}^{2(\beta-\beta')}}\right\}\!,\end{equation}

provided $Y_\varepsilon(0)=Z_\varepsilon(0)={\varepsilon} y$ for every $y\in(\!-2{\varepsilon}^\beta, 2{\varepsilon}^\beta)$ .

Proof. Recalling (2.3), we obtain

\begin{equation*}\Delta_{\varepsilon}(t)={\varepsilon} {\rm e}^{\lambda t} \big(I_{\varepsilon}^{(1)}(t)+I_{\varepsilon}^{(2)}(t)\big),\end{equation*}

where $I_{\varepsilon}^{(1)}(t)=\int_0^t {\rm e}^{-\lambda s}(\tilde\sigma(Y_{\varepsilon}(s))-\sigma_0)\,{\rm d} W(s)$ and $I^{(2)}_\varepsilon=\mathcal{O}(\varepsilon)$ .

Clearly, $I_\varepsilon^{(1)}$ is a martingale satisfying $\langle I_{\varepsilon}^{(1)}\rangle_t=\mathcal{O}\left({\varepsilon}^{2\beta}\right)$ for $t\le \bar{\bar\tau}_{\varepsilon}$ , so for any $c_0>0$ , there is $c>0$ such that

\begin{equation*}\mathbb{P}\left(\sup_{s\le \bar{\bar\tau}_{\varepsilon}}|I_{\varepsilon}^{(1)}|>c_0{\varepsilon}^{\beta'}\right)\le 2 {\rm e}^{-c\frac{{\varepsilon}^{2\beta'}}{{\varepsilon}^{2\beta}}}\le 2\exp\left\{-c\frac{1}{{\varepsilon}^{2(\beta-\beta')}}\right\}\!,\end{equation*}

by Lemma 2.1. Therefore

\begin{equation*}\mathbb{P}\left(\sup_{s\le \bar{\bar\tau}_{\varepsilon}}|I_{\varepsilon}^{(1)}+I_{\varepsilon}^{(2)}|>2c_0{\varepsilon}^{\beta'}\right)\le 2\exp\left\{-c\frac{1}{{\varepsilon}^{2(\beta-\beta')}}\right\}\!,\end{equation*}

and on the complementary event we have

\begin{equation*}\sup_{0\le t\le t'_{\varepsilon} \wedge \bar{\bar\tau}_{\varepsilon}} |\Delta_{\varepsilon}(t)|<2c_0 {\varepsilon} L^{\alpha-\beta}({\varepsilon}) {\varepsilon}^{-(\alpha-\beta)}{\varepsilon}^{\beta'} < {\varepsilon}^{\beta+\beta'-(\alpha-1) }\end{equation*}

if $c_0$ is chosen sufficiently small, which finishes the proof.□

Based on this approximation result and the calculation for $Z_\varepsilon$ in the previous section, the following theorem proves Theorem 2.1 for $\alpha$ not too large.

Theorem 4.1. Let $\alpha\in(1,1+\beta)$ and let $t_{\varepsilon}$ be as in (3.2). Then there is $c>0$ such that

(4.3)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}{\rm e}^{cy^2}\left|\varepsilon^{-(\alpha-1)}\mathbb{P}\left(\bar\tau_\varepsilon>t_{\varepsilon};\ Y_{\varepsilon}(\bar\tau_\varepsilon)=\pm\varepsilon^{\beta}\right)-{\rm e}^{\lambda C}\psi_0(y)\right|=0.\end{equation}

Proof. We start with an upper bound on $\mathbb{P}(\bar\tau_\varepsilon>t_\varepsilon)$ in terms of $Z_{\varepsilon}$ . Let us fix any $\beta''\in(\beta,2\beta -(\alpha-1))$ , so that $\beta'=\beta'' -\beta+\alpha -1\in(\alpha-1,\beta)$ , which will allow us to apply Lemma 4.1 several times in this proof. Let us take any family of events $(B_{\varepsilon})_{{\varepsilon}>0}$ and estimate

\begin{align*}&\mathbb{P}(\bar\tau_{\varepsilon}>t_\varepsilon;\ B_{\varepsilon})=I_1({\varepsilon})+I_2({\varepsilon})\\[3pt] &\quad =\mathbb{P}\bigg(\bar\tau_{\varepsilon}>t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon\wedge\bar{\bar\tau}_{\varepsilon}}|\Delta_\varepsilon(t)|<\varepsilon^{\beta''};\ B_{\varepsilon}\bigg)+\mathbb{P}\bigg(\bar\tau_{\varepsilon}>t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon\wedge\bar{\bar\tau}_{\varepsilon}}|\Delta_\varepsilon(t)|\geq\varepsilon^{\beta''};\ B_{\varepsilon}\bigg).\end{align*}

Note that $t_\varepsilon$ is of the form (4.1) with $L({\varepsilon})={\rm e}^{-\frac{(\alpha-\beta)C_{\varepsilon}}{\lambda}}$ and thus (4.2) implies

(4.4)

\begin{equation}I_2({\varepsilon})=o_{\exp}(1),\end{equation}

where, for any $\gamma>0$ , we use $o_{\exp}(1)$ as a shorthand for $o({\rm e}^{-{\varepsilon}^{-\gamma}})$ . Also,

\begin{equation*}I_1({\varepsilon})=\mathbb{P}\bigg(\bar\tau_{\varepsilon}>t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon}|\Delta_\varepsilon(t)|<\varepsilon^{\beta''};\ B_{\varepsilon}\bigg).\end{equation*}

We need to approximate this in terms of the exit time of $Z_{\varepsilon}$ instead of $\bar\tau_{\varepsilon}$ . We do not have control over the difference of these two times in general as we can only control the difference of the processes until $t_{\varepsilon}$ . Instead, we are going to set a different threshold for $Z_{\varepsilon}$ to reach. Let $\gamma\in (\beta,\beta'')$ , $l_\varepsilon^1=\varepsilon^{\beta}$ , and $l_\varepsilon^2=l_\varepsilon^1+{\varepsilon}^\gamma$ . This implies

\begin{equation*}\mathbb{P}\bigg(\bar\tau_{\varepsilon}>t_\varepsilon;\ \tau^Z_\varepsilon \leq t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon}|\Delta_\varepsilon(t)|\leq\varepsilon^{\beta''};\ B_{\varepsilon}\bigg)=0,\end{equation*}

where $\tau^Z_{\varepsilon}$ is the exit time from $[\!-l_\varepsilon^2,l_\varepsilon^2]$ for $Z_{\varepsilon}$ , and thus

\begin{equation*}I_1({\varepsilon})=\mathbb{P}\bigg(\bar\tau_{\varepsilon}>t_\varepsilon;\ \tau^Z_\varepsilon >t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon}|\Delta_\varepsilon(t)|\leq\varepsilon^{\beta''};\ B_{\varepsilon}\bigg)\leq \mathbb{P}\left(\tau^Z_{\varepsilon}>t_\varepsilon;\ B_{\varepsilon}\right)\!.\end{equation*}

Combining this with (4.4), we obtain

(4.5)

\begin{equation}\mathbb{P}\big(\bar\tau_{\varepsilon}>t_\varepsilon;\ B_{\varepsilon}\big)\leq\mathbb{P}\left(\tau_\varepsilon^Z>t_\varepsilon;\ B_{\varepsilon}\right)+o_{\exp}(1).\end{equation}

Next, we set $l_\varepsilon^3=l_\varepsilon^{1}-\varepsilon^{\gamma}$ and define $\eta_\varepsilon^Z$ to be the exit time of $Z_{\varepsilon}$ from $[\!-l_\varepsilon^3,l_\varepsilon^3]$ . Lemma 4.1 and the fact that $\{\bar\tau_\varepsilon\leq t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon\wedge\bar{\bar\tau}_\varepsilon}|\Delta_\varepsilon(t)|\leq\varepsilon^{\beta''}\}\subset \{\eta_\varepsilon^Z\le t_\varepsilon\}$ imply

(4.6)

\begin{align}\mathbb{P}(\bar\tau_\varepsilon>t_\varepsilon;\ B_{\varepsilon})&\geq\mathbb{P}\left(\bar\tau_{\varepsilon}>t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon\wedge\bar{\bar\tau}_\varepsilon}|\Delta_\varepsilon(t)|\leq\varepsilon^{\beta''};\ B_{\varepsilon}\right)\\[3pt]\notag&=\mathbb{P}(B_{\varepsilon})+o_{\exp}(1)-\mathbb{P}\left(\bar\tau_{\varepsilon}\leq t_\varepsilon;\ \sup_{0\leq t\leq t_\varepsilon\wedge\bar{\bar\tau}_\varepsilon}|\Delta_\varepsilon(t)|\leq\varepsilon^{\beta''};\ B_{\varepsilon}\right)\\[3pt] \notag&\geq \mathbb{P}(B_{\varepsilon})+o_{\exp}(1)-\mathbb{P}\left(\eta_\varepsilon^Z\leq t_\varepsilon;\ B_{\varepsilon}\right)=\mathbb{P}\left(\eta_\varepsilon^Z>t_\varepsilon;\ B_{\varepsilon}\right)+o_{\exp}(1).\end{align}

Combining (4.5) and (4.6), we obtain

(4.7)

\begin{equation}\mathbb{P}(\eta_\varepsilon^Z>t_\varepsilon;\ B_{\varepsilon})-o_{\exp}(1)\leq\ \mathbb{P}(\bar\tau_\varepsilon>t_\varepsilon;\ B_{\varepsilon})\ \leq \mathbb{P}(\tau_\varepsilon^Z>t_\varepsilon;\ B_{\varepsilon})+o_{\exp}(1).\end{equation}

This, Lemma 3.1, and our choice of $l_\varepsilon^i$ , $i=1,2,3$ , imply

(4.8)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}{\rm e}^{cy^2}\left|\varepsilon^{-(\alpha-1)}\mathbb{P}\left(\bar\tau_\varepsilon>t_{\varepsilon}\right)-2{\rm e}^{\lambda C}\psi_0(y)\right|=0.\end{equation}

To finish the proof, we will need

(4.9)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{\varepsilon^{\beta''}\le |u| \le \varepsilon^{\beta}}\left|\mathbb{P}\left(Y_{\varepsilon}(\bar\tau^{\varepsilon})=\varepsilon^{\beta}|Y_\varepsilon(t_\varepsilon)=u\right)-{\textbf{1}}_{u>0}\right|=0,\end{equation}

which holds since, due to (2.3), $Y_{\varepsilon}(\bar\tau_\varepsilon)=\varepsilon^{\beta}$ is equivalent to $\varepsilon^{-1}u+U_\varepsilon(\bar\tau_\varepsilon)+\varepsilon V_\varepsilon(\bar\tau_\varepsilon)>0,$ and so

\begin{equation*}\left|\mathbb{P}\left(Y_{\varepsilon}(\bar\tau^{\varepsilon})=\varepsilon^{\beta} \mid Y_\varepsilon(t_\varepsilon)=u\right)-{\textbf{1}}_{u>0}\right|\leq\mathbb{P}\left(|U_\varepsilon(\bar\tau_\varepsilon)|\geq \varepsilon^{-(1-\beta'')}-\varepsilon V(\bar\tau_\varepsilon)\right)\to 0,\ \varepsilon\downarrow 0,\end{equation*}

due to the boundedness of $V(\bar\tau_\varepsilon)$ and (2.5).

Using (4.9), we can write

(4.10)

\begin{align}\mathbb{P}\left(Y(\bar\tau_\varepsilon)=\varepsilon^{\beta} \mid \bar\tau_\varepsilon>t_{\varepsilon}\right)&=\mathbb{E}\left[\mathbb{P}\left(Y_{\varepsilon}(\bar\tau^{\varepsilon})=\varepsilon^{\beta} \mid Y_\varepsilon(t_\varepsilon)\right)|\ \bar\tau_\varepsilon>t_\varepsilon\right] \notag\\[3pt] &=\mathbb{E}\Big[\mathbb{P}\big(Y_{\varepsilon}(\bar\tau^{\varepsilon})=\varepsilon^{\beta}|Y_\varepsilon(t_\varepsilon)\big);\ |Y_\varepsilon(t_\varepsilon)|<\varepsilon^{\beta''}\, \big|\, \bar\tau_\varepsilon>t_\varepsilon\Big]\notag\\[3pt] &\quad+\mathbb{P}\big(Y_\varepsilon(t_\varepsilon)>\varepsilon^{\beta''} \mid \bar\tau_\varepsilon>t_\varepsilon\big) + o(1)\notag\\[3pt] &=A_1+A_2+o(1).\end{align}

Due to (4.2),

(4.11)

\begin{align}A_1&=\mathbb{P}\bigg(|Y_\varepsilon(t_\varepsilon)|<\varepsilon^{\beta''};\ \sup_{0\leq t\leq t_\varepsilon}|\Delta_\varepsilon(t)|\leq\varepsilon^{\beta''} \mid \bar\tau_\varepsilon>t_\varepsilon\bigg)+o(1)\notag\\[3pt] &\leq \mathbb{P}\left(|Z_\varepsilon(t_\varepsilon)|<2\varepsilon^{\beta''} \mid \bar\tau_\varepsilon>t_\varepsilon\right)+o(1)\to 0.\end{align}

In the last convergence, we used (3.7) to compute

\begin{equation*}\mathbb{P}(|Z_\varepsilon(t_\varepsilon)|<2\varepsilon^{\beta''})=2{\varepsilon}^{\beta''}{\varepsilon}^{1-\alpha+\beta}{\rm e}^{\lambda C}\psi_0(y)(1+o(1)),\end{equation*}

and we used (4.7) with $B_{\varepsilon}\equiv \Omega$ , along with Lemma 3.1, to compute $\mathbb{P}(\bar\tau_{\varepsilon}>t_{\varepsilon})=2 {\rm e}^{\lambda C}\psi_0(y){\varepsilon}^{\alpha-1}(1+o(1))$ , so $\mathbb{P}(|Z_\varepsilon(t_\varepsilon)|<2\varepsilon^{\beta''})/\mathbb{P}(\bar\tau_{\varepsilon}>t_{\varepsilon})\to 0$ follows from our assumptions on $\alpha$ , $\beta$ , and $\beta''$ .

Also, due to (4.2),

\begin{equation*}\mathbb{P}\big(Z_\varepsilon(t_\varepsilon)>2\varepsilon^{\beta''} \mid \bar\tau_\varepsilon>t_\varepsilon\big)+o(1)\leq A_2 \leq \mathbb{P}\left(Z_\varepsilon(t_\varepsilon)>0 \mid \bar\tau_\varepsilon>t_\varepsilon\right)+o(1).\end{equation*}

Using (4.7) with $B_{\varepsilon}=\{Z_\varepsilon(t_\varepsilon)>2\varepsilon^{\beta''}\}$ , $B_{\varepsilon}=\{Z_\varepsilon(t_\varepsilon)>0\}$ , and $B_{\varepsilon}=\Omega$ , we can switch conditioning to that in terms of $Z_{\varepsilon}$ :

\begin{equation*}\mathbb{P}\Big(Z_\varepsilon(t_\varepsilon)>2\varepsilon^{\beta''} \mid \eta_\varepsilon^Z>t_\varepsilon\Big)+o(1)\leq A_2 \leq \mathbb{P}\left(Z_\varepsilon(t_\varepsilon)>0 \mid \tau_\varepsilon^Z>t_\varepsilon\right)+o(1),\end{equation*}

where both the left- and right-hand sides converge to $1/2$ as $\varepsilon\downarrow 0$ due to Lemma 3.2. Combining this with (4.10) and (4.11), noticing that all the o(1) terms in these estimates are independent of the starting point y, and using (4.8), we obtain (4.3), which completes the proof.

5. Extension to arbitrary timescales

The goal of this section is to extend Theorem 4.1 for arbitrary $\alpha>1$ and thus prove Theorem 2.1. We set $\theta=\alpha-\beta$ , $L(\varepsilon)={\rm e}^{-\frac{\lambda}{\theta} C_\varepsilon}$ , and

\begin{equation*}t_\varepsilon=\frac{\theta}{\lambda}\log\big(L(\varepsilon)\varepsilon^{-1}\big)=\frac{\theta}{\lambda}\log {\varepsilon}^{-1}-C_{\varepsilon}.\end{equation*}

When $\theta\in(1-\beta,1)$ , Theorem 4.1 applies and there is nothing new to prove. Here we study the case $\theta\geq 1$ . Up to this point the only restriction on $\beta$ was $\beta\in(0,1)$ . Let us now set $N=[\theta]+1\ge 2$ , $\beta_0=\frac{1}{2}\left(1+\frac{\theta}{N}\right)$ , and assume $\beta\in (\beta_0,1)$ throughout this section. We have

(5.1)

\begin{equation}\theta<N<\frac{\theta}{1-\beta}.\end{equation}

We also define $t_\varepsilon'=t_\varepsilon/N$ and $t_{\varepsilon,k} = kt_\varepsilon'$ , $k=0,1,\dots, N$ . Our plan is to track $Y_{\varepsilon,k}=Y_{\varepsilon}(t_{\varepsilon,k})$ , $k=0,1,\dots, N$ , using the results of the previous section on the short intervals $[t_{\varepsilon,k},t_{\varepsilon,k+1}]$ .

The first step is the following lemma, which establishes that the process needs to stay close to the origin to delay the exit.

Lemma 5.1. We have

(5.2)

\begin{equation}\max_{k=0,\ldots,N-1}\mathbb{P}\left(\sup_{t\leq t_{\varepsilon,k}}|Y_\varepsilon(t)|>\varepsilon K(\varepsilon);\ \bar\tau_{\varepsilon}>t_{\varepsilon,k+1}\right)\leq C_1{\rm e}^{-C_2K^2(\varepsilon)}\end{equation}

for some $C_1,C_2>0$ . In particular,

\begin{equation*}\max_{k=0,\dots,N-1}\mathbb{P}\left(\max_{u=0,\dots,k}|Y_{\varepsilon,u}|>\varepsilon K(\varepsilon);\ \bar\tau_{\varepsilon}>t_{\varepsilon,k+1}\right)\leq C_1{\rm e}^{-C_2K^2(\varepsilon)}.\end{equation*}

Proof. Using the strong Markov property and applying Duhamel’s principle (2.3) and (2.4) to the initial condition y with $|y|>{\varepsilon} K({\varepsilon})$ , we reduce the lemma to the estimate

\begin{align*}&\mathbb{P}\left({\rm e}^{\lambda t'_{\varepsilon}}\inf_{t\le t_{\varepsilon}}\left|y+{\varepsilon} U_{\varepsilon}(t)+{\varepsilon}^2 V_{\varepsilon}(t) \right|<{\varepsilon}^\beta\right)\\ &\quad \le \mathbb{P}\left(\sup_{t\le t_{\varepsilon}}|U_{\varepsilon}(t)|> K({\varepsilon})-{\varepsilon}^{\beta+\frac{\theta}{N}-1- \frac{\theta}{N}\log(L({\varepsilon}))}-{\varepsilon} \frac{\|h\|_\infty}{2\lambda}\right)\!,\end{align*}

and the desired inequality follows by (2.5) since $\beta+\frac{\theta}{N}-1>0$ due to (5.1).□

We now collect some results needed for our iteration scheme.

Lemma 5.2. Let $Y_{\varepsilon}(0) = \varepsilon y$ . Then there is $c>0$ such that

(5.3)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\le K(\varepsilon)}{\rm e}^{cy^2}\left|\varepsilon^{-\left(\frac{\theta}{N}+\beta-1\right)}\mathbb{P}\left(Y_{\varepsilon}(\bar\tau_\varepsilon)=\pm\varepsilon^{\beta};\ \bar\tau_\varepsilon>t_\varepsilon'\right)-{\rm e}^{\frac{\lambda C}{N}}\psi_0(y)\right|=0.\end{equation}

Moreover, for any Lipschitz function h on ${\mathbb R}$ , exponentially decaying at $\infty$ , we have

(5.4)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}{\rm e}^{cy^2}\left|\varepsilon^{-\frac{\theta}{N}}\mathbb{E}\left[h(\varepsilon^{-1}Y_{\varepsilon,1});\ \bar\tau_\varepsilon>t_\varepsilon'\right]-{\rm e}^{\frac{\lambda C}{N}}\psi_0(y) \int_{-\infty}^{\infty} h(y) \, {\rm d} y\right| =0.\end{equation}

Proof. The first claim follows from Theorem 4.1 with $C/N$ in place of C and $\alpha=\beta+\theta/N$ . Note that this value of $\alpha$ belongs to $(1,1+\beta)$ due to (5.1).

The second claim is a direct consequence of the first one and

(5.5)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}\left|\varepsilon^{-(1-\beta)}\mathbb{E}\left[h(\varepsilon^{-1}Y_{\varepsilon,1})|\, \bar\tau_\varepsilon>t_\varepsilon'\right]- \frac{1}{2}\int_{-\infty}^{\infty} h(y) \, {\rm d} y\right|=0.\end{equation}

Once again, to prove this, we would like to use the result for the linear process. However, the estimate in (4.2) is insufficient when applied directly. Instead, let us note that

\begin{equation*}\varepsilon^{-1}Y_{\varepsilon,1}-\varepsilon^{-1}Z_{\varepsilon,1}={\rm e}^{\lambda t_\varepsilon'}\int_0^{t_\varepsilon'}{\rm e}^{-\lambda t}\left(\sigma(Y_\varepsilon(t))-\sigma_0\right){\rm d} W(t)=\varepsilon^{-\theta/N}I_{\varepsilon}(t_\varepsilon'),\end{equation*}

where $I_{\varepsilon}(t_\varepsilon')={\rm e}^{-\lambda C_{\varepsilon}/N}\int_0^{t_\varepsilon'}{\rm e}^{-\lambda t}\left(\sigma(Y_\varepsilon(t))-\sigma_0\right){\rm d} W(t)$ and choose any

\begin{equation*}\beta'\in\Bigg(\frac{1}{2}\Bigg(1+\frac{\theta}{2}\Bigg),\beta\Bigg).\end{equation*}

The exponential martingale inequality (Lemma 2.1) and the Lipschitz continuity of $\sigma$ imply

\begin{align*}\mathbb{P}\left(\sup_{s\le \bar{\bar\tau}_{\varepsilon}}|I_{\varepsilon}|>{\varepsilon}^{\beta'}\right)= o_{\exp}(1),\\[-18pt]\end{align*}

and thus

\begin{equation*}\mathbb{P}\left(\varepsilon^{-(1-\beta)}\left|\varepsilon^{-1}Y_{\varepsilon,1}-\varepsilon^{-1}X_{\varepsilon,1}\right|>\varepsilon^{\beta'-\theta/N-1 +\beta} \mid \bar\tau_\varepsilon>t_\varepsilon'\right)= o_{\exp}(1).\end{equation*}

Note that $\beta'-\theta/N-1+\beta>0$ due to the choice of $\beta$ and $\beta'$ . Now we can use the last display and the Lipschitz continuity of h to obtain

\begin{equation*}\varepsilon^{-(1-\beta)}\mathbb{E}\left[\left|h(\varepsilon^{-1}Y_{\varepsilon,1})-h(\varepsilon^{-1}Z_{\varepsilon,1})\right| \mid \bar\tau_\varepsilon>t_\varepsilon'\right]\leq \|h\|_{\rm Lip}\varepsilon^{\beta'-\theta/N-1 +\beta} + o_{\exp}(1)\to 0\end{equation*}

as $\varepsilon\downarrow 0$ . This and (3.6) imply (5.5), which completes the proof of the lemma.□

Finally, the next theorem implies Theorem 2.1 with $\beta_0=\frac{1}{2}\left(1+\frac{\theta}{N}\right)$ .

Theorem 5.1. There is $c>0$ such that

(5.6)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)} {\rm e}^{cy^2} \left|\varepsilon^{-(\alpha-1)}\mathbb{P}\left(\bar\tau_\varepsilon>t_{\varepsilon}\right)-2{\rm e}^{\lambda C}\psi_0(y)\right|=0,\end{equation}

and

(5.7)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}\left|\mathbb{P}\left(Y_\varepsilon\left(\bar\tau_\varepsilon\right)=\pm\varepsilon^{\beta} \mid \bar\tau_\varepsilon>t_\varepsilon\right)-\frac{1}{2}\right|=0.\end{equation}

Proof. We will use $o_K(1)$ to denote any function that decays faster than any power of $\varepsilon$ as $\varepsilon\downarrow 0$ . It suffices to prove the theorem in the case where the function $K({\varepsilon})$ grows fast enough as ${\varepsilon}\to0$ to guarantee that the right-hand side of (5.2) is $o_K(1)$ . To see that the theorem will then follow in full generality, we just notice that enlarging the set of initial conditions y from $\{|y|\le K({\varepsilon})\}$ to $\{|y|\le K({\varepsilon})\vee |\log {\varepsilon}|\}$ reduces the situation to that special case.

We will prove by induction that, for every $k=1,\dots, N$ , there is $c>0$ such that

(5.8)

\begin{equation}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}{\rm e}^{cy^2}\left|\varepsilon^{-\left(k\frac{\theta}{N}+\beta-1\right)}\mathbb{P}_y\left(\bar\tau_\varepsilon>t_{{\varepsilon},k}\right)-2{\rm e}^{\frac{\lambda k C}{N}}\psi_0(y)\right|=0,\end{equation}

where we explicitly indicate the dependence on the starting point $Y_\varepsilon(0)=\varepsilon y$ as a subscript in $\mathbb{P}_y$ for clarity. The case $k=N$ is the desired result (5.6). The base of induction, the case $k=1$ , is the first claim of Lemma 5.2. Let us make the induction step assuming that (5.8) holds for some k.

Lemma 5.1 and the Markov property allows us to write

(5.9)

\begin{align}\mathbb{P}_{y}\left(\bar\tau_\varepsilon>t_{\varepsilon,k+1}\right) &= \mathbb{P}_{y}\left(\bar\tau_\varepsilon>t_{\varepsilon,k+1};\ |Y_{\varepsilon,1}|\leq\varepsilon K(\varepsilon)\right) + o_K(1)\nonumber\\[3pt] & =\int_{-K(\varepsilon)}^{K(\varepsilon)}\mathbb{P}_{y'}\left(\bar\tau_\varepsilon>t_{\varepsilon,k}\right)\mathbb{P}_{y}\left(Y_{\varepsilon,1}\in\varepsilon {\rm d} y';\ \bar\tau_\varepsilon>t_\varepsilon'\right)+o_K(1).\end{align}

Using the induction hypothesis (5.8), we obtain

\begin{equation*}\mathbb{P}_{y'}\left(\bar\tau_\varepsilon>t_{\varepsilon,k}\right)=2\varepsilon^{k\frac{\theta}{N}+\beta-1}{\rm e}^{\frac{\lambda k C}{N}}\psi_0(y')+{\rm e}^{-cy'^2}o\left(\varepsilon^{k\frac{\theta}{N}+\beta-1}\right)\!,\end{equation*}

where the error term is uniform over $|y'|\leq K(\varepsilon)$ . This means that the first term on the right-hand side of (5.9) can be written as

(5.10)

\begin{align}&\int_{-K(\varepsilon)}^{K(\varepsilon)}\mathbb{P}_{y}\left(\bar\tau_\varepsilon>t_{\varepsilon,k+1} \mid Y_{\varepsilon,1}=\varepsilon y'\right)\mathbb{P}_{y}\left(Y_{\varepsilon,1}\in\varepsilon {\rm d} y';\ \bar\tau_\varepsilon>t_\varepsilon'\right)\nonumber\\[3pt] &\quad =2 \varepsilon^{k\frac{\theta}{N}+\beta-1}{\rm e}^{\frac{\lambda k C}{N}}\mathbb{E}_{y}\left[\psi_0\left(\varepsilon^{-1}Y_{\varepsilon,1}\right);\ |Y_{\varepsilon,1}|\leq\varepsilon K(\varepsilon);\ \bar\tau_\varepsilon>t_\varepsilon'\right] +H(\varepsilon,y),\end{align}

where the error term satisfies

(5.11)

\begin{equation}\sup_{|y|\leq K(\varepsilon)}|H(\varepsilon,y)|\\\leq\sup_{|y|\leq K(\varepsilon)}\mathbb{E}_{y}\big( {\rm e}^{-c(\varepsilon^{-1}Y_{\varepsilon,1})^2} ;\ \bar\tau_\varepsilon>t_\varepsilon'\big) o \big(\varepsilon^{k\frac{\theta}{N}+\beta-1}\big)=o \big(\varepsilon^{(k+1)\frac{\theta}{N}+\beta-1}\big) ,\end{equation}

where we used (5.4) in the last step with $h(y)={\rm e}^{-cy^2}$ . The main term on the right-hand side of (5.10) can be estimated using Lemma 5.1 and (5.4) with $h(y)=\psi_0(y)$ (so $\int_{\mathbb R} h(y)\, {\rm d} y=1$ ):

(5.12)

\begin{align}\mathbb{E}_{y}\left[\psi_0\left(\varepsilon^{-1}Y_{\varepsilon,1}\right);\ |Y_{\varepsilon,1}|\leq\varepsilon K(\varepsilon);\ \bar\tau_\varepsilon>t_\varepsilon'\right] &=\mathbb{E}_{y}\left[\psi_0\left(\varepsilon^{-1}Y_{\varepsilon,1}\right);\ \bar\tau_\varepsilon>t_\varepsilon'\right] +o_K(1)\nonumber\\[3pt] &=\varepsilon^{\frac{\theta}{N}}{\rm e}^{\frac{\lambda C}{N}}\psi_0(y) + o\big(\varepsilon^{\frac{\theta}{N}}\big),\end{align}

and this expansion holds uniformly over $|y|\leq K(\varepsilon)$ .

Putting together (5.9), (5.10), (5.11) and (5.12) yields, for sufficiently small $c'>0$ ,

\begin{equation*}\lim_{\varepsilon\downarrow 0}\sup_{|y|\leq K(\varepsilon)}{\rm e}^{c'y^2}\left|\varepsilon^{-\left((k+1)\frac{\theta}{N}+\beta-1\right)}\mathbb{P}_y\left(\bar\tau_\varepsilon>t_{{\varepsilon},k+1}\right)-2{\rm e}^{\frac{\lambda (k+1)C}{N}}\psi_0(y)\right|=0.\end{equation*}

This completes the induction step and finishes the proof of (5.6).

To prove (5.7), note first that (5.2) and the strong Markov property imply

\begin{align*}&\mathbb{P}\left(Y_\varepsilon\left(\bar\tau_\varepsilon\right)=\pm\varepsilon^{\beta}\mid\bar\tau_\varepsilon>t_\varepsilon\right)\\[3pt] &\quad =\int_{-K(\varepsilon)}^{K(\varepsilon)} \mathbb{P}_y\left(Y_\varepsilon\left(\bar\tau_\varepsilon\right)=\pm\varepsilon^{\beta}\mid\bar\tau_\varepsilon>t_{\varepsilon,1}\right)\mathbb{P}\left(Y_{\varepsilon,N-1}\in \varepsilon {\rm d} y\mid\bar\tau_\varepsilon>t_{\varepsilon,N-1}\right)+o(1).\end{align*}

Using (5.3), the integrand can be written as

\begin{equation*}\mathbb{P}_y\left(Y_\varepsilon\left(\bar\tau_\varepsilon\right)=\pm\varepsilon^{\beta}\mid\bar\tau_\varepsilon>t_{\varepsilon,1}\right)=\frac{\varepsilon^{\frac{\theta}{N}+\beta-1}{\rm e}^{\lambda C}\psi_0(y)+o\big(\varepsilon^{\frac{\theta}{N}+\beta-1}\big)}{2\varepsilon^{\frac{\theta}{N}+\beta-1}{\rm e}^{\lambda C}\psi_0(y)+o\big(\varepsilon^{\frac{\theta}{N}+\beta-1}\big)}=\frac{1}{2}+o(1),\end{equation*}

where the error terms are uniform in y, and thus another application of (5.2) finishes the proof of (5.7).□

6. Rare transitions in heteroclinic networks

In this section we discuss, briefly and nonrigorously, the questions that led us to study the tails of exit times in detail. These questions originate in the long-term behavior of diffusions near heteroclinic networks in the vanishing noise limit. A heteroclinic network is a feature of the phase portrait associated with a vector field composed of multiple hyperbolic critical points (‘saddles’) connected to each other by heteroclinic orbits; see an example of a phase portrait with a heteroclinic network for a cellular flow in Figure 1.

Figure 1: A heteroclinic network is the backbone of this phase portrait associated with a cellular flow.

Let us consider a diffusion process solving the Itô equation (1.2) in ${\mathbb R}^2$ (although, with minor modifications, the discussion below applies to higher dimensions as well) with drift b giving rise to a heteroclinic network. The typical behavior of such processes for times that are of the order of $\log {\varepsilon}^{-1}$ was studied in [Reference Almada and Bakhtin1, Reference Bakhtin2, Reference Bakhtin3]. Its main features depend mainly on the linearization of the drift b near the saddle points and can be described as follows. Upon reaching a small neighborhood of a saddle, the process spends a long (logarithmic in ${\varepsilon}$ ) time in that neighborhood where the drift is weak and eventually exits along the unstable manifold associated with the positive eigenvalue $\lambda$ of the linearization. This manifold is composed of two outgoing heteroclinic orbits, so the dynamics chooses one of them and follows it for a time of the order of 1 until it reaches the next saddle where it will also eventually decide between two outgoing directions, etc. This description may seem to imply the picture where the limiting (as ${\varepsilon}\to 0$ ) process is essentially a random walk on the directed graph of heteroclinic connections. However, the character of the limiting process is often not Markovian and depends on the linearizations of b near the saddles.

To see what is going on, let us consider a two-dimensional diffusion $(X_{\varepsilon},Y_{\varepsilon})$ near a model saddle described by equations

\begin{align*}{\rm d} X_{{\varepsilon}}(t)&=\lambda X_{{\varepsilon}}(t){\rm d} t+{\varepsilon} {\rm d} W(t),\\[3pt] {\rm d} Y_{{\varepsilon}}(t)&=-\mu Y_{{\varepsilon}}{\rm d} t+ {\varepsilon} {\rm d} B(t),\end{align*}

driven by independent standard Wiener processes W and B. Here, $\lambda>0$ and $\mu>0$ can be viewed as coefficients of expansion and contraction, respectively. Assuming that $(X_{\varepsilon}(0),Y_{\varepsilon}(0))=(0,1)$ , i.e. starting the process on the stable manifold of the saddle located at the origin, we are interested in the distribution of $(X_{\varepsilon}(\tau_{\varepsilon}),Y_{\varepsilon}(\tau_{\varepsilon}))$ , where $\tau_{\varepsilon}=\inf\{t\ge 0\,{:}\,|X_{\varepsilon}(t)|=1\}$ is the exit time from the strip $[\!-1,1]\times{\mathbb R}$ . Since

\begin{align*}X_{{\varepsilon}}(t)&={\varepsilon} {\rm e}^{\lambda t} N(t),\\Y_{\varepsilon}(t)&={\rm e}^{-\mu t}+{\varepsilon} M(t),\end{align*}

where

\begin{equation*}N(t)=\int_0^t\! {\rm e}^{-\lambda s} \, {\rm d} W(s)\stackrel{{\rm d}}{\longrightarrow} N(\infty),\quad M(t)=\int_0^t\! {\rm e}^{-\mu(t-s)} \, {\rm d} B(s)\stackrel{{\rm d}}{\longrightarrow} M(\infty) \quad \text{\rm as\ } t\to\infty,\end{equation*}

we find that, for small ${\varepsilon}$ ,

(6.1)

\begin{equation}\tau_{\varepsilon} \stackrel{{\rm d}}{\approx}\frac{1}{\lambda}\log \frac{1}{{\varepsilon}}+\frac{1}{\lambda}\log\frac{1}{|N(\infty)|} \end{equation}

and

(6.2)

\begin{equation}Y_{\varepsilon}(\tau_{\varepsilon}) \stackrel{{\rm d}}{\approx} {\varepsilon}^{\rho} |N(\infty)|^\rho+{\varepsilon} M(\infty),\end{equation}

where $\rho=\mu/\lambda$ . This allows us to conclude that, as ${\varepsilon}\to 0$ , the distribution of the exit point $Y_{\varepsilon}(\tau_{\varepsilon})$ depends cruciially on how $\rho$ compares to 1. In particular, if $\rho<1$ (i.e. the contraction is not as strong as expansion: $\mu<\lambda$ ) then the first term ${\varepsilon}^{\rho} |N(\infty)|^\rho$ dominates. It is positive and scales as ${\varepsilon}^\rho\gg {\varepsilon}$ , i.e. it is stronger than the noise magnitude ${\varepsilon}$ . Therefore, in this situation, at the next saddle point the system is most likely to stay on the same side of the heteroclinic network. If $\rho> 1$ , then the probability of choosing either of the two outgoing connections at the next saddle approaches $1/2$ as ${\varepsilon}\to0$ .

This analysis can be extended to more general initial conditions, to nonlinear drift and diffusion coefficients, to higher dimensions, and to sequences of saddles. The result is that for each sequence of saddles one can iteratively determine the asymptotic probability of realization of each next step along that sequence and the scaling asymptotics of the associated exit distributions. This was done in [Reference Almada and Bakhtin1, Reference Bakhtin2, Reference Bakhtin3]. The result is that at logarithmic timescales (the saddle exit times are typically logarithmic in ${\varepsilon}$ , see (6.1)), certain pathways in the network are typical but many pathways are not realized due to the largely one-sided exit distributions scaling as ${\varepsilon}^\alpha$ with $\alpha<1$ , which in turn are due to insufficient contraction at saddles.

This is interesting per se, and among other applications gives an explanation of the poor vocabulary of excitation patterns in neural networks and similar dynamics modeled by Lotka–Volterra-type systems with small noise. However, this information is not sufficient to address questions about timescales that are longer than logarithmic, such as the limiting behavior of the invariant distribution. To answer these questions, one must quantify the probabilities of rare events corresponding to atypical exits from saddle points. This means that one needs to study probabilities of events like $\mathbb{P}(Y_{\varepsilon}(\tau_{\varepsilon})\sim {\varepsilon})$ for $Y_{\varepsilon}(\tau_{\varepsilon})$ given in (6.2). The second term in (6.2) is of the order of ${\varepsilon}$ , so ignoring many technical details we reduce this question to estimating

(6.3)

\begin{equation}\mathbb{P}({\varepsilon}^\rho|N(\infty)|^\rho\sim{\varepsilon})\sim\mathbb{P}(N(\infty)\sim{\varepsilon}^{\gamma}) \sim c{\varepsilon}^{\gamma}\end{equation}

with $\gamma=\frac{1}{\rho}-1$ . The last relation holds since $N(\infty)$ has continuous Lebesgue density at 0.

This means that the probability of an atypical exit from the saddle is asymptotically polynomial in ${\varepsilon}$ , of the order of ${\varepsilon}^\gamma$ , which in turn means that one typically has to wait for time of the order of ${\varepsilon}^{-\gamma}\log {\varepsilon}^{-1}$ before one sees such a rare event happen. Ordering all exponents emerging in such calculations for all rare transitions, $\gamma_1<\gamma_2<\cdots<\gamma_N$ , and introducing $T_{k,{\varepsilon}}={\varepsilon}^{-\gamma_k}\log {\varepsilon}^{-1}$ , we see that, for $t_{\varepsilon}$ satisfying

\begin{equation*}T_{k,{\varepsilon}}\ll t_{\varepsilon} \ll T_{k+1,{\varepsilon}}, \end{equation*}

transitions can be classified into admissible (that typically occur many times up to $t_{\varepsilon}$ ) and rare (that typically do not occur at all up to $t_{\varepsilon}$ ). As $t_{\varepsilon}$ crosses a level $T_{k,{\varepsilon}}$ from below, some new transitions become available. Increasing $t_{\varepsilon}$ gradually from 0 to values beyond $T_{N,{\varepsilon}}$ creates a hierarchical structure of merging clusters and associated timescales such that, at each timescale, the system explores one cluster, making no transitions between different clusters.

This picture containing the description of the limit of invariant distribution, homogenization results, etc., is similar to the Freidlin–Wentzell picture of metastability and the associated hierarchy of cycles. The important difference is that in our picture the probabilities of rare events decay polynomially and the associated timescales grow polynomially, while the large deviation estimates in the Freidlin–Wentzell theory lead to exponentially decaying probabilities and exponentially growing transition times between the metastable states.

We do not have a rigorous derivation of a general precise version of the asymptotic relation (6.3). The difficulties that emerge are related to handling nonlinearities in the drift and diffusion terms, and to the fact that $N(\infty)$ and $M(\infty)$ are only approximations to the (mutually dependent) random variables $N_{\varepsilon}(\tau_{\varepsilon})$ and $M_{\varepsilon}(\tau_{\varepsilon})$ , where the ${\varepsilon}$ subscript of $N_{\varepsilon}$ and $M_{\varepsilon}$ refers to the fact that for the case of non-additive noise, these stochastic processes do depend on ${\varepsilon}$ . So, to realize this program, among other things we must either prove that the density of $N_{\varepsilon}(\tau_{\varepsilon})$ uniformly converges to the Gaussian density in a small neighborhood of zero, or find other means to compare the distribution of $N_{\varepsilon}(\tau_{\varepsilon})$ to the Gaussian at small scales, which requires going beyond the known weak convergence of distributions.

According to (6.1), if $N(\infty)$ takes an atypically small value of the order of ${\varepsilon}^{\gamma}$ , then

(6.4)

\begin{equation}\tau_{\varepsilon}\approx\frac{1+\gamma}{\lambda}\log\frac{1}{{\varepsilon}},\end{equation}

i.e. exit takes an abnormally long time (the typical exit time corresponds to $\gamma=0$ ). In other words, the rare transitions determining the long-term behavior of diffusions near heteroclinic networks occur due to atypically long stays in the neighborhood of saddle points withstanding the repulsion in the unstable direction.

In the present paper (as well as in [Reference Bakhtin and Pajor-Gyulai4]), we study the polynomial decay of the distribution of exit times at scales described by (6.4). We believe that the method we propose here is applicable in the multi-dimensional situation, and we plan to give a rigorous treatment of it in upcoming publications.

Acknowledgement

Yuri Bakhtin gratefully acknowledges partial support from the NSF via grant DMS-1811444.

References

Almada, S. andBakhtin, Y. (2011). Normal forms approach to diffusion near hyperbolic equilibria. Nonlinearity 24, 1883–1907.Google Scholar

Bakhtin, Y. (2010). Small noise limit for diffusions near heteroclinic networks. Dynam. Syst. 25, 413–431.10.1080/14689367.2010.482520CrossRef Google Scholar

Bakhtin, Y. (2011). Noisy heteroclinic networks. Prob. Theory Relat. Fields 150, 1–42.CrossRef Google Scholar

Bakhtin, Y. andPajor-Gyulai, Z. (2019). Malliavin calculus approach to long exit times from an unstable equilibrium. Ann. Appl. Prob. 29, 827–850.10.1214/18-AAP1387CrossRef Google Scholar

Bass, R. F. (2011). Stochastic Processes (Camb. Ser. Statist. Probab. Math. 33). Cambridge University Press.Google Scholar

Champagnat, N. andVillemonais, D. (2016). Exponential convergence to quasi-stationary distribution and Q-process. Prob. Theory Relat. Fields 164, 243–283.10.1007/s00440-014-0611-7CrossRef Google Scholar

Eizenberg, A. (1984). The exit distributions for small random perturbations of dynamical systems with a repulsive type stationary point. Stochastics 12, 251–275.CrossRef Google Scholar

Karatzas, I. andShreve, S. (1991). Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer, New York.Google Scholar

Figure 1: A heteroclinic network is the backbone of this phase portrait associated with a cellular flow.

Article contents

Tails of exit times from unstable equilibria on the line

Abstract

Keywords

MSC classification

1. Introduction

2. Proof of Theorem 1.1

3. Linear system with additive noise

4. Proof of Theorem 2 for $\alpha\in(1,1+\beta)$

5. Extension to arbitrary timescales

6. Rare transitions in heteroclinic networks

Acknowledgement

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests