
Thinning and multilevel Monte Carlo methods for piecewise deterministic (Markov) processes with an application to a stochastic Morris–Lecar model

Published online by Cambridge University Press:  29 April 2020

Vincent Lemaire*
Affiliation:
Sorbonne University
Michèle Thieullen*
Affiliation:
Sorbonne University
Nicolas Thomas*
Affiliation:
Sorbonne University
*Postal address: Laboratory of Probability, Statistics and Modeling (LPSM), UMR CNRS 8001, Sorbonne University–Campus Pierre et Marie Curie, Case 158, 4 place Jussieu, F-75252 Paris Cedex 5, France. Email addresses: vincent.lemaire@sorbonne-universite.fr, michele.thieullen@upmc.fr, nicolas.thomas@upmc.fr

Abstract

In the first part of this paper we study approximations of trajectories of piecewise deterministic processes (PDPs) when the flow is not given explicitly by the thinning method. We also establish a strong error estimate for PDPs as well as a weak error expansion for piecewise deterministic Markov processes (PDMPs). These estimates are the building blocks of the multilevel Monte Carlo (MLMC) method, which we study in the second part. The coupling required by the MLMC is based on the thinning procedure. In the third part we apply these results to a two-dimensional Morris–Lecar model with stochastic ion channels. In the range of our simulations the MLMC estimator outperforms classical Monte Carlo.

Type
Original Article
Copyright
© Applied Probability Trust 2020

1. Introduction

In this paper we are interested in the approximation of the trajectories of piecewise deterministic processes (PDPs). We establish strong error estimates for a PDP and a weak error expansion for a piecewise deterministic Markov process (PDMP). Then we study the application of the multilevel Monte Carlo (MLMC) method in order to approximate expectations of functionals of PDMPs. Our motivation comes from neuroscience, where the whole class of stochastic conductance-based neuron models can be interpreted as PDMPs. The response of a neuron to a stimulus, called neural encoding, is considered relevant information for understanding the functional properties of such excitable cells. Thus many quantities of interest, such as mean first spike latency, mean interspike intervals, and mean firing rate, can be modelled as expectations of functionals of PDMPs.

PDPs were introduced by Davis [Reference Davis5] as a general class of stochastic processes characterized by a deterministic evolution between two successive random times. In the case where the deterministic evolution part follows a family of ordinary differential equations (ODEs), the corresponding PDP enjoys the Markov property and is called a PDMP. The distribution of a PDMP is thus determined by three parameters called the characteristics of the PDMP: a family of vector fields, a jump rate (intensity function), and a transition measure.

We first consider, on a finite time interval [0, T], a general PDP $(x_t)$ that is not necessarily Markov and whose flow is not given explicitly. Approximating its flows by the classical Euler scheme and using our previous work [Reference Lemaire, Thieullen and Thomas22], we build a thinning algorithm which provides us with an exact simulation of an approximation of $(x_t)$, which we denote by $(\overline x_t)$. The process $(\overline x_t)$ is a PDP with explicitly solvable flows, constructed by thinning of a homogeneous Poisson process.

In fact this thinning construction provides a whole family of approximations indexed by the time step $h>0$ of the Euler scheme. We prove that for any real-valued smooth function F the following strong estimate holds:

(1.1) \begin{equation}{\text{there exist $V_1>0, V_2>0$ such that}} \ {\mathbb{E}}[ |F(\overline x_T) - F(x_T)|^2 ] \leq V_1 h + V_2 h^2.\end{equation}

Moreover, if $(x_t)$ is a PDMP the following weak error expansion holds:

(1.2) \begin{equation}{\text{there exists $ c_1>0$ such that}}\ {\mathbb{E}}[ F(\overline x_T)] - {\mathbb{E}}[ F(x_T) ] = c_1 h +{\text{o}}(h^{2}).\end{equation}

The estimate (1.1) is mainly based on the construction of the pair $(x_t, \overline x_t)$ and on the fact that the Euler scheme is of order 1; this is why it is valid for a general PDP and its Euler scheme. In contrast, the estimate (1.2) relies on properties which are specific to PDMPs, such as the Feynman–Kac formula.

The MLMC method relies simultaneously on estimates (1.1) and (1.2); this is why we study its application to the PDMP framework instead of the more general PDP framework. MLMC extends the classical Monte Carlo (MC) method, which is a very general approach to estimating expectations using stochastic simulations. The complexity (i.e. the number of operations necessary in the simulation) associated with an MC estimation can be prohibitive, especially when the complexity of an individual random sample is very high. Unlike classical MC, MLMC relies on repeated independent random samplings taken at different levels of accuracy. MLMC can then greatly reduce the complexity of classical MC by performing most simulations with low accuracy and low complexity, and only a few simulations with high accuracy and high complexity. MLMC was introduced by Heinrich [Reference Heinrich18] and developed by Giles [Reference Giles11]. The MLMC estimator has been efficiently used in various fields of numerical probability such as SDEs [Reference Giles11], Markov chains [Reference Anderson and Higham1, Reference Anderson, Higham and Sun2, Reference Glynn and Rhee14], Lévy processes [Reference Ferreiro-Castilla, Kyprianou, Scheichl and Suryanarayana10], jump diffusions [Reference Dereich7, Reference Dereich and Heidenreich8, Reference Xia and Giles28], and nested Monte Carlo [Reference Giorgi13, Reference Lemaire and Pagés21]. See [Reference Giles12] for more references. To the best of our knowledge, application of MLMC to PDMPs has not been considered.

For the sake of clarity, let us describe the general principle behind the MLMC improvement. We are interested in the estimation of ${\mathbb{E}}[X]$, where X is a real-valued square-integrable random variable on a probability space $( \Omega,\mathcal{F}, \mathbb{P} )$. When X can be simulated exactly, the classical MC estimator $Y = (1/N)\sum_{k=1}^N X^k$, where $X^k, k \geq 1$, are independent random variables distributed as X, is unbiased. The associated $L^2$-error satisfies $\| Y-{\mathbb{E}}[X] \|_2^2 = \text{Var}(Y)=\frac{1}{N} \text{Var}(X)$. If we quantify the precision by the $L^2$-error, then a user-prescribed precision $\epsilon^2 > 0$ is achieved for $N ={\text{O}}(\epsilon^{-2})$, so that in this case the global complexity is of order ${\text{O}}(\epsilon^{-2})$.

Assume now that X cannot be simulated exactly (or cannot be simulated at a reasonable cost) and that we can build a family of real-valued random variables $(X_h, h > 0)$ on $( \Omega,\mathcal{F}, \mathbb{P} )$ which converges weakly and strongly to X as $h \rightarrow 0$ in the following sense:

(1.3) \begin{equation}{\text{there exist $c_1>0, \alpha >0$ such that}}\ {\mathbb{E}}[ X_h] - {\mathbb{E}}[ X ] = c_1 h^\alpha +{\text{o}}(h^{2\alpha}),\end{equation}

and

(1.4) \begin{equation}{\text{there exist $ V_1>0, \beta >0 $ such that}}\ {\mathbb{E}}[ |X_h - X|^2 ] \leq V_1 h^\beta.\end{equation}

Assume, moreover, that for $h > 0$ the random variable $X_h$ can be simulated at reasonable complexity (the complexity increases as $h \rightarrow 0$). The classical MC estimator now reads

(1.5) \begin{equation}Y = \frac{1}{N} \sum_{k=1}^N X^k_h,\end{equation}

where $X^k_h, k\geq 1$ are independent random variables identically distributed as $X_h$. The bias and the variance of the estimator (1.5) are given by ${\mathbb{E}}[Y] - {\mathbb{E}}[X]={\mathbb{E}}[X_h] - {\mathbb{E}}[X]\simeq c_1 h^\alpha$ and $\text{Var}(Y)=\frac{1}{N} \text{Var}(X_h)$ respectively. From the strong estimate (1.4) we have $\text{Var}(X_h) \rightarrow \text{Var}(X)$ as $h \rightarrow 0$, so that $\text{Var}(X_h)$ is asymptotically a constant independent of h. If as above we quantify the precision by the $L^2$-error and use $\| Y-{\mathbb{E}}[X] \|_2^2 = ({\mathbb{E}}[Y] - {\mathbb{E}}[X])^2 + \text{Var}(Y)$, we obtain that the estimator (1.5) achieves a user-prescribed precision $\epsilon^2 > 0$ for $h = {\text{O}}(\epsilon^{1/\alpha})$ and $N ={\text{O}}(\epsilon^{-2})$, so that the global complexity of the estimator is now ${\text{O}}(\epsilon^{-2-1/\alpha})$.
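For completeness, here is the short computation behind this last complexity claim; it is a sketch under the standard assumption (implicit above) that one sample of $X_h$ costs ${\text{O}}(h^{-1})$ operations. Balancing the squared bias and the variance in the $L^2$-error gives

\begin{equation*}\| Y-{\mathbb{E}}[X] \|_2^2 \simeq c_1^2 h^{2\alpha} + \frac{\text{Var}(X_h)}{N} \leq \epsilon^2\quad\Longrightarrow\quad h = {\text{O}}(\epsilon^{1/\alpha}), \quad N = {\text{O}}(\epsilon^{-2}),\end{equation*}

so that the total cost is of order $N \times h^{-1} = {\text{O}}(\epsilon^{-2}) \times {\text{O}}(\epsilon^{-1/\alpha}) = {\text{O}}(\epsilon^{-2-1/\alpha})$.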

The MLMC method takes advantage of the estimate (1.4) in order to reduce the global complexity. Let us fix $L \geq 2$ and consider for $l \in \{1,\ldots,L\}$ a geometrically decreasing sequence $(h_l,1 \leq l \leq L)$ , where $h_l = h^{*} M^{-(l-1)}$ for fixed $h^{*} >0$ and $M>1$ . The indexes l are called the levels of the MLMC and the complexity of $X_{h_l}$ increases as the level increases. Thanks to the weak expansion (1.3), the quantity ${\mathbb{E}} [X_{h_L}]$ approximates ${\mathbb{E}} [X]$ . Using the linearity of the expectation, the quantity ${\mathbb{E}} [X_{h_L}]$ can be decomposed over the levels $l \in \{1,\ldots,L \}$ as follows:

(1.6) \begin{equation}{\mathbb{E}} [X_{h_L}] = {\mathbb{E}} [X_{h^{*}}] + \sum_{l=2}^L {\mathbb{E}} [X_{h_l} - X_{h_{l-1}}].\end{equation}

For each level $l \in \{1,\ldots,L\}$ , a classical MC estimator is used to approximate ${\mathbb{E}} [X_{h_l} - X_{h_{l-1}}]$ and ${\mathbb{E}} [X_{h^{*}}]$ . At each level, a number $N_l \geq 1$ of samples are required and the key point is that the random variables $X_{h_l}$ and $X_{h_{l-1}}$ are assumed to be correlated in order to make the variance of $X_{h_l} - X_{h_{l-1}}$ small. Considering at each level $l=2,\ldots,L$ independent pairs $(X_{h_l},X_{h_{l-1}})$ of correlated random variables, the MLMC estimator then reads

(1.7) \begin{equation}Y = \frac{1}{N_1} \sum_{k=1}^{N_1} X_{h^{*}}^{k} + \sum_{l=2}^L \frac{1}{N_l} \sum_{k=1}^{N_l} (X_{h_l}^{k} - X_{h_{l-1}}^{k}),\end{equation}

where $(X_{h^{*}}^{k}, k \geq 1)$ is a sequence of independent and identically distributed random variables distributed as $X_{h^{*}}$ and $( (X^{k}_{h_l},X^{k}_{h_{l-1}}), k \geq 1 )$ for $l=2,\ldots,L$ are independent sequences of independent copies of $(X_{h_l},X_{h_{l-1}})$ and independent of $(X_{h^{*}}^{k})$ . It is known (see [Reference Giles11] or [Reference Lemaire and Pagés21]) that given a precision $\epsilon > 0$ and provided that the family $(X_h,h > 0)$ satisfies the strong and weak error estimates (1.4) and (1.3), the multilevel estimator (1.7) achieves a precision $\| Y-{\mathbb{E}}[X] \|_2^2 =\epsilon^2$ with a global complexity of order ${\text{O}}(\epsilon^{-2})$ if $\beta > 1$ , ${\text{O}}(\epsilon^{-2}(\log(\epsilon))^2)$ if $\beta = 1$ , and ${\text{O}}(\epsilon^{-2 - (1-\beta)/\alpha})$ if $\beta < 1$ . This complexity result shows the importance of the parameter $\beta$ . Finally, let us mention that in the case $\beta > 1$ it is possible to build an unbiased multilevel estimator: see [Reference Glynn and Rhee15].
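To make (1.7) concrete, here is a minimal Python sketch of the multilevel estimator. The samplers `sample_coarse` and `sample_pair` are hypothetical placeholders: in our setting, `sample_coarse` would return one draw of $X_{h^{*}}$ and `sample_pair` one draw of the correlated pair $(X_{h_l},X_{h_{l-1}})$, the correlation being obtained by thinning the same Poisson process with the two time steps.

```python
import numpy as np

def mlmc_estimator(sample_coarse, sample_pair, N, h_star, M=2, rng=None):
    """Minimal sketch of the multilevel estimator (1.7).

    sample_coarse(h, rng)              -> one draw of X_h,
    sample_pair(h_fine, h_coarse, rng) -> one draw of the correlated pair (X_{h_fine}, X_{h_coarse}).
    N = [N_1, ..., N_L] gives the number of samples used at each level.
    """
    rng = np.random.default_rng() if rng is None else rng
    L = len(N)
    h = [h_star * M ** (-l) for l in range(L)]   # h[l-1] corresponds to h_l = h_star * M^{-(l-1)}
    # level 1: plain Monte Carlo at the coarsest accuracy h_star
    Y = np.mean([sample_coarse(h[0], rng) for _ in range(N[0])])
    # levels l = 2, ..., L: Monte Carlo estimates of the corrections E[X_{h_l} - X_{h_{l-1}}]
    for l in range(1, L):
        corrections = []
        for _ in range(N[l]):
            x_fine, x_coarse = sample_pair(h[l], h[l - 1], rng)
            corrections.append(x_fine - x_coarse)
        Y += np.mean(corrections)
    return Y
```

In practice the sample sizes $N_l$ would be chosen from pilot estimates of the level variances, as in [Reference Giles11]; the sketch above simply takes them as given.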

Estimates (1.1) and (1.2) suggest investigating the use of the MLMC method in the PDMP framework with $\beta=1$ and $\alpha =1$ . Letting $X = F(x_T)$ and $X_h = F(\overline x_T)$ for $h > 0$ and F a smooth function, we define an MLMC estimator of ${\mathbb{E}} [F(x_T)]$ just as in (1.7) (denoted here by $Y^{\text{MLMC}}$ ) where the processes involved at the level l are correlated by thinning. Since these processes are constructed using two different time steps, the probability of accepting a proposed jump time differs from one process to the other. Moreover, the discrete components of the post-jump locations may also be different. This results in the presence of the term $V_1 h$ in the estimate (1.1). In order to improve the convergence rate (to increase the parameter $\beta$ ) in (1.1), we show that for a given PDMP $(x_t)$ we have the following auxiliary representation:

(1.8) \begin{equation}{\mathbb{E}} [F(x_T)] = {\mathbb{E}} [F(\tilde x_T) \tilde R_T].\end{equation}

The PDMP $(\tilde x_t)$ and its Euler scheme are such that their discrete components jump at the same times and in the same state. $(\tilde R_t)$ is a process that depends on $(\tilde x_t, t \in [0,T])$ . The representation (1.8) is inspired by the change of probability introduced in [Reference Xia and Giles28] and is actually valid for a general PDP (Proposition 2.2) so that ${\mathbb{E}} [F(\overline x_T)] = {\mathbb{E}} [F(\underline{\tilde x}_T) \underline{\tilde R}_T]$ , where $(\underline{\tilde x}_t)$ is the Euler scheme corresponding to $(\tilde x_t)$ and $(\underline{\tilde R}_t)$ is a process that depends on $(\underline{\tilde x}_t, t \in [0,T])$ . Letting $X=F(\tilde x_T) \tilde R_T$ and $X_h=F(\underline{\tilde x}_T) \underline{\tilde R}_T$ , we define a second MLMC estimator (denoted by $\tilde Y^{\text{MLMC}}$ ) where now the discrete components of the Euler schemes $(\underline{\tilde x}_t)$ involved at the level l always jump in the same states and at the same times. To sum up, the first MLMC estimator we consider ( $ Y^{\text{MLMC}}$ ) derives from (1.6), where the corrective term at level l is ${\mathbb{E}}[F(\overline x_T^{h_l}) - F(\overline x_T^{h_{l-1}})]$ , whereas the corrective term of the second estimator ( $\tilde Y^{\text{MLMC}}$ ) is ${\mathbb{E}}[F(\underline{\tilde x}_T^{h_l})\underline{\tilde R}_T^{h_l} - F(\underline{\tilde x}_T^{h_{l-1}})\underline{\tilde R}_T^{h_{l-1}}]$ . For readability, we no longer write the dependence of the approximations on the time step. For the processes $(F(\underline{\tilde x}_t) \underline{\tilde R}_t)$ and $(F(\tilde x_t) \tilde R_t$ ) we show the strong estimate

\begin{equation*}{\text{there exists $ \tilde V_1>0$ such that}}\ {\mathbb{E}}[ |F(\underline{\tilde x}_T) \underline{\tilde R}_T - F(\tilde x_T) \tilde R_T|^2 ] \leq \tilde V_1 h^2,\end{equation*}

so that we end up with $\beta=2$ and the complexity drops from ${\text{O}}(\epsilon^{-2}(\log(\epsilon))^2)$ to ${\text{O}}(\epsilon^{-2})$.

As an application we consider the PDMP version of the two-dimensional Morris–Lecar model (see [Reference Pakdaman, Thieullen and Wainrib25]), which takes into account the precise description of the ionic channels and in which the flows are not explicit. Let us mention [Reference Benam, Le Borgne, Malrieu and Zitt3] for the application of quantitative bounds for the long-time behaviour of PDMPs to a stochastic three-dimensional Morris–Lecar model. The original deterministic Morris–Lecar model was introduced in [Reference Morris and Lecar23] to account for various oscillating states in the barnacle giant muscle fibre. Because of its low dimension, this model is among the favourite conductance-based models in computational neuroscience. Furthermore, this model is particularly interesting because it reproduces some of the main features of excitable cell response, such as the shape, amplitude, and threshold of the action potential, and the refractory period. We compare the classical MC and the MLMC estimators on the two-dimensional stochastic Morris–Lecar model to estimate the mean value of the membrane potential at a fixed time. It turns out that in the range of our simulations the MLMC estimator outperforms the MC one. This suggests that MLMC estimators can be used successfully in the framework of PDMPs.

As mentioned above, the quantities of interest, such as mean first spike latency, mean interspike intervals, and mean firing rate, can be modelled as expectations of path-dependent functionals of PDMPs. This setting can then be considered as a natural extension of this work.

The paper is organized as follows. In Section 2 we construct a general PDP by thinning, and we give a representation of its distribution in terms of the thinning data (Proposition 2.1). In Section 3 we establish strong error estimates (Theorems 3.1 and 3.2). In Section 4 we establish a weak error expansion (Theorem 4.1). In Section 5 we compare the efficiency of the classical and multilevel Monte Carlo estimators on the two-dimensional stochastic Morris–Lecar model.

2. Piecewise deterministic process by thinning

2.1. Construction

In this subsection we introduce the setting and recall some results on the thinning method from our previous paper [Reference Lemaire, Thieullen and Thomas22]. Let $E \,:\!= \Theta \times \mathbb{R}^d$ where $\Theta$ is a finite or countable set and $d \geq 1$ . A piecewise deterministic process (PDP) is defined via the following characteristics:

  1. a family of functions $( \Phi_{\theta}, \theta \in \Theta )$ such that $\Phi_{\theta} \colon \mathbb{R}_+ \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ for all $\theta \in \Theta$ ,

  2. a measurable function $\lambda\colon E \rightarrow ]0,+\infty[$ ,

  3. a transition measure $Q \colon E \times \mathcal{B}(E) \rightarrow [0,1]$ .

We let $x = (\theta,\nu) $ denote a generic element of E. We only consider PDPs with continuous $\nu$ -component, so for $A \in \mathcal{B}(\Theta)$ and $B \in \mathcal{B}(\mathbb{R}^d)$ we write

(2.1) \begin{equation}Q(x, A \times B) = Q(x, A) \delta_{\nu}(B).\end{equation}

If we write $x=(\theta_x,\nu_x)$ , then it holds that

\begin{equation*}Q((\theta_x,\Phi_{\theta_x}(t,\nu_x)), \,{\text{d}} \theta \,{\text{d}} \nu)=Q((\theta_x,\Phi_{\theta_x}(t,\nu_x)),\,{\text{d}} \theta)\delta_{\Phi_{\theta_x}(t,\nu_x)}({\text{d}} \nu).\end{equation*}

Our results do not depend on the dimension of the variable in $\mathbb{R}^d$ so we restrict ourselves to $\mathbb{R}$ ( $d=1$ ) for readability. We work under the following assumption.

Assumption 2.1. There exists $\lambda^{*} < +\infty$ such that, for all $x \in E$ , $ \lambda(x) \leq \lambda^{*}$ .

In [Reference Lemaire, Thieullen and Thomas22] we considered a general upper bound $\lambda^{*}$ . In the present paper $\lambda^{*}$ is a constant (see Assumption 2.1). Let $( \Omega,\mathcal{F},\mathbb{P} )$ be a probability space on which we define the following.

  1. A homogeneous Poisson process $(N^{*}_t,t \geq 0)$ with intensity $\lambda^{*}$ (given in Assumption 2.1) whose successive jump times are denoted $(T^{*}_k, k \geq 1 )$. We set $T^{*}_0 =0$.

  2. Two sequences of i.i.d. random variables with uniform distribution on [0, 1], $(U_k, k \geq 1)$ and $(V_k, k \geq 1)$, independent of each other and independent of $( T^{*}_k, k \geq 1 )$.

Given $T>0$ we construct iteratively the sequence of jump times and post-jump locations $(T_n,(\theta_n,\nu_n), n \geq 0)$ of the E-valued PDP $(x_t, t \in [0,T])$ that we want to obtain in the end using its characteristics $( \Phi, \lambda, Q )$ . Let $(\theta_0,\nu_0) \in E$ be fixed and let $T_0 = 0$ . We construct $T_1$ by thinning of $(T^{*}_k)$ , that is,

(2.2) \begin{equation}T_1 \,:\!= T^{*}_{\tau_1},\end{equation}

where

(2.3) \begin{equation}\tau_1 \,:\!= \inf \{ k > 0\colon U_k \lambda^{*} \leq \lambda( \theta_0, \Phi_{\theta_0}( T^{*}_k,\nu_0))\}.\end{equation}

We let $|\Theta|$ denote the cardinal of $\Theta$ (which may be infinite) and we set $\Theta = \{k_1,\ldots,k_{|\Theta|}\}$ . For $j \in \{ 1,\ldots,|\Theta| \}$ we introduce the functions $a_j$ defined on E by

(2.4) \begin{equation}a_j(x) \,:\!= \sum_{i=1}^j Q(x,\{k_i\})\quad\text{{for all} } x \in E.\end{equation}

By convention, we set $a_0 \,:\!= 0$ . We also introduce the function H defined by

\begin{equation*}H(x,u) \,:\!= \sum_{i=1}^{|\Theta|} k_i \mathbb{1}_{ a_{i-1}(x) < u \leq a_i(x)}\quad\text{{for all} } x \in E, \ u \in [0,1].\end{equation*}

For all $x \in E$ , $H(x,.)$ is the inverse of the cumulative distribution function of $Q(x,.)$ (see e.g. [Reference Devroye9]). Then we construct $(\theta_1,\nu_1)$ from the uniform random variable $V_1$ and the function H as follows:

\begin{align*}(\theta_1,\nu_1)&=( H( (\theta_{0},\Phi_{\theta_0}( T^{*}_{\tau_1},\nu_0)), V_1 ), \Phi_{\theta_0}( T^{*}_{\tau_1},\nu_0) ) \\* &=( H( (\theta_{0},\Phi_{\theta_0}( T_{1},\nu_0)), V_1 ), \Phi_{\theta_0}( T_1,\nu_0) ).\end{align*}

Thus, the distribution of $(\theta_1,\nu_1)$ given $( \tau_1, ( T^{*}_k)_{k \leq \tau_1} )$ is $Q((\theta_{0},\Phi_{\theta_0}( T^{*}_{\tau_1},\nu_0)),.)$ or, in view of (2.1),

\begin{equation*}\sum_{k \in \Theta} Q ( (\theta_{0},\Phi_{\theta_0}( T^{*}_{\tau_1},\nu_0)), \{k\} ) \delta_{( k,\Phi_{\theta_0}( T^{*}_{\tau_1},\nu_0) )}.\end{equation*}

For $n>1$ , assume that $(\tau_{n-1}, (T^{*}_k)_{k \leq \tau_{n-1}}, (\theta_{n-1},\nu_{n-1}) )$ is constructed. Then we construct $T_n$ by thinning of $(T^{*}_k)$ conditionally to $(\tau_{n-1}, ( T^{*}_k)_{k \leq \tau_{n-1}}, (\theta_{n-1},\nu_{n-1}) )$ , that is,

\begin{equation*}T_n \,:\!= T^{*}_{\tau_n},\end{equation*}

where

\begin{equation*}\tau_n \,:\!= \inf \{ k > \tau_{n-1}\colon U_k \lambda^{*} \leq \lambda(\theta_{n-1}, \Phi_{\theta_{n-1}}(T^{*}_k-T^{*}_{\tau_{n-1}},\nu_{n-1}) ) \}.\end{equation*}

Then we construct $(\theta_n,\nu_n)$ using the uniform random variable $V_n$ and the function H as follows:

\begin{align*}(\theta_n,\nu_n)&\,:\!=( H( (\theta_{n-1}, \Phi_{\theta_{n-1}}( T^{*}_{\tau_n} - T^{*}_{\tau_{n-1}},\nu_{n-1})), V_n ), \Phi_{\theta_{n-1}}( T^{*}_{\tau_n} - T^{*}_{\tau_{n-1}},\nu_{n-1}) )\\* &\,\kern1pt =( H( (\theta_{n-1}, \Phi_{\theta_{n-1}}( T_{n} - T_{n-1},\nu_{n-1})), V_n ), \Phi_{\theta_{n-1}}( T_n - T_{n-1},\nu_{n-1}) )\notag.\end{align*}

We define the PDP $x_t$ for all $t \in [0,T]$ from the process $(T_n,(\theta_n,\nu_n))$ by

(2.5) \begin{equation}x_t\,:\!=( \theta_{n}, \Phi_{\theta_{n}}(t-T_{n},\nu_{n}) ),\quad t \in [T_n,T_{n+1}[.\end{equation}

Thus $x_{T_n}=(\theta_n,\nu_n)$ and $x^-_{T_n} = (\theta_{n-1},\nu_n)$ . We also define the counting process associated with the jump times $N_t \,:\!= \sum_{n \geq 1} \mathbb{1}_{T_n \leq t}$ .
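The construction above translates directly into a simulation algorithm. The following Python sketch simulates one trajectory skeleton $(T_n,(\theta_n,\nu_n))$ on [0, T] by thinning; the arguments `lam`, `Phi` and `Q_cdf` are hypothetical placeholders for the intensity $\lambda$, the flow $\Phi$ and the cumulative probabilities $a_j$ of (2.4), and $\Theta$ is identified with $\{0,\ldots,|\Theta|-1\}$.

```python
import numpy as np

def simulate_pdp_by_thinning(theta0, nu0, T, lam_star, lam, Phi, Q_cdf, rng=None):
    """Sketch of the thinning construction of Section 2.1 (hypothetical helper names).

    lam(theta, nu)    : jump rate lambda(theta, nu), bounded by lam_star (Assumption 2.1).
    Phi(theta, t, nu) : flow Phi_theta(t, nu) of the continuous component.
    Q_cdf(theta, nu)  : array (a_1, ..., a_|Theta|) of cumulative probabilities of
                        Q((theta, nu), .), as in (2.4), so that the new theta is H(x, V_n).
    Returns the jump times T_n and the post-jump locations (theta_n, nu_n) on [0, T].
    """
    rng = np.random.default_rng() if rng is None else rng
    jump_times, states = [0.0], [(theta0, nu0)]
    theta, nu, t_last = theta0, nu0, 0.0
    t_star = 0.0                                   # current point T*_k of the Poisson grid
    while True:
        t_star += rng.exponential(1.0 / lam_star)  # next proposed jump time
        if t_star > T:
            break
        # thinning step: accept T*_k with probability lam(theta, Phi(theta, T*_k - T_n, nu)) / lam_star
        if rng.uniform() * lam_star <= lam(theta, Phi(theta, t_star - t_last, nu)):
            nu = Phi(theta, t_star - t_last, nu)   # the continuous component does not jump
            theta = int(np.searchsorted(Q_cdf(theta, nu), rng.uniform()))  # theta_n = H(x_{T_n}^-, V_n)
            jump_times.append(t_star)
            states.append((theta, nu))
            t_last = t_star
    return jump_times, states
```

Between jump times, the process is recovered from this skeleton via (2.5).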

2.2. Approximation of a PDP

In applications we may not know the functions $\Phi_{\theta}$ explicitly. In this case, we use a numerical scheme $\overline \Phi_{\theta}$ approximating $\Phi_{\theta}$. In this paper we consider schemes for which there exist positive constants $C_1$ and $C_2$, independent of h and $\theta$, such that

(2.6) \begin{equation}\sup_{t\in{[0,T]}} | \Phi_{\theta}(t,\nu_1)-\overline \Phi_{\theta}(t,\nu_2) | \leq {\text{e}}^{C_1 T} |\nu_1-\nu_2| + C_2 h\quad\text{{for all} } \theta \in \Theta, \ (\nu_1, \nu_2) \in \mathbb{R}^2.\end{equation}

To each family $(\overline \Phi_{\theta})$ we can associate a PDP constructed as above, which we denote by $(\overline x_t)$. We emphasize that there is a positive probability that $(x_t)$ and $(\overline x_t)$ jump at different times and/or in different states, even if they are both constructed from the same data $(N^{*}_t)$, $(U_k)$, and $(V_k)$. However, if the characteristics $(\Phi,\tilde \lambda, \tilde Q)$ of a PDP $(\tilde x_t)$ are such that $\tilde \lambda$ and $\tilde Q$ depend only on $\theta$, i.e. $\tilde \lambda(x) = \tilde \lambda(\theta)$ and $\tilde Q(x,.) = \tilde Q(\theta,.)$ for all $x=(\theta,\nu) \in E$, then its embedded Markov chain $(\tilde T_n, (\tilde \theta_n,\tilde \nu_n), n\geq 0)$ is such that $(\tilde \theta_n, n\geq 0)$ is an autonomous Markov chain with kernel $\tilde Q$ and the counting process associated with $(\tilde T_n, n\geq 0)$ has intensity $\tilde \lambda_t = \sum_{n \geq 0} \tilde \lambda(\tilde \theta_n) \mathbb{1}_{\tilde T_n \leq t < \tilde T_{n+1}}$. In particular, $(\tilde \theta_n)$ and $(\tilde \tau_n)$ do not depend on $\Phi$. The particular form of the characteristics $\tilde \lambda$ and $\tilde Q$ implies that the PDP $(\tilde x_t)$ and its approximation $(\underline{\tilde x}_t)$ are correlated via the same process $(\tilde \tau_n,\tilde \theta_n)$. In other words, these processes always jump at exactly the same times and their $\theta$-components always jump in the same states. Such processes $(\tilde x_t)$ are easier to handle, both theoretically and numerically, than the general case. They will be useful below.
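As an illustration of the last point, when $\tilde \lambda$ and $\tilde Q$ depend only on $\theta$ the accepted indices $(\tilde \tau_n)$ and the discrete states $(\tilde \theta_n)$ can be drawn without evaluating any flow: each point of the Poisson grid is accepted independently with probability $\tilde \lambda(\theta)/\lambda^{*}$, so the increments $\tilde \tau_n - \tilde \tau_{n-1}$ are geometric. The following sketch (with hypothetical helper names) makes this explicit; in the coupled construction used later the same uniforms $(U_k)$ and $(V_k)$ are of course reused for the process and its approximation, and this snippet only illustrates that the law of $(\tilde \tau_n, \tilde \theta_n)$ does not involve $\Phi$.

```python
import numpy as np

def simulate_shared_skeleton(theta0, lam_tilde, Q_tilde_cdf, lam_star, n_star, rng=None):
    """Sketch: accepted indices (tau_n) and discrete states (theta_n) when the intensity
    lam_tilde and the kernel Q_tilde depend only on theta.  Each of the n_star points of
    the Poisson(lam_star) grid is accepted with probability lam_tilde(theta)/lam_star,
    so tau_n - tau_{n-1} is geometric on {1, 2, ...}; no flow Phi is needed."""
    rng = np.random.default_rng() if rng is None else rng
    taus, thetas = [0], [theta0]
    while True:
        p = lam_tilde(thetas[-1]) / lam_star
        tau = taus[-1] + rng.geometric(p)          # index of the next accepted point T*_k
        if tau > n_star:
            return taus[1:], thetas[1:]
        taus.append(tau)
        thetas.append(int(np.searchsorted(Q_tilde_cdf(thetas[-1]), rng.uniform())))
```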

The following lemma (which is important for several proofs below) gives a direct consequence of the estimate (2.6).

Lemma 2.1. Let $(\Phi_{\theta})$ and $(\overline \Phi_{\theta})$ satisfy (2.6). Let $(t_n, n \geq 0)$ be an increasing sequence of non-negative real numbers with $t_0=0$ and let $(\alpha_n, n \geq 0)$ be a sequence of $\Theta$ -valued components. For a given $\nu \in \mathbb{R}$ , let us define iteratively the sequences $(\beta_n, n \geq 0)$ and $(\overline \beta_n, n \geq 0)$ as follows:

\begin{alignat*}{3}\beta_n & = \Phi_{\alpha_{n-1}}(t_n-t_{n-1},\beta_{n-1}), & \qquad\overline \beta_n & = \overline \Phi_{\alpha_{n-1}}(t_n-t_{n-1},\overline \beta_{n-1}),\\\beta_0 &=\nu, &\quad\overline \beta_0 &=\nu.\end{alignat*}

Then for all $n \geq 1$ we have

\begin{equation*}|\overline \beta_n - \beta_n|\leq{\text{e}}^{C_1 t_n} n C_2 h,\end{equation*}

where $C_1$ and $C_2$ are positive constants independent of h.

Proof of Lemma 2.1. Let $n \geq 1$ . From the estimate (2.6), we have for all $k \leq n$

\begin{equation*}| \overline \beta_k - \beta_k |\leq{\text{e}}^{C_1(t_k - t_{k-1})}|\overline \beta_{k-1} - \beta_{k-1}| + C_2 h,\end{equation*}

and therefore

\begin{equation*}{\text{e}}^{- C_1 t_k} | \overline \beta_k - \beta_k |\leq{\text{e}}^{-C_1 t_{k-1}}|\overline \beta_{k-1} - \beta_{k-1}| + C_2 h.\end{equation*}

By summing up these inequalities for $1 \leq k \leq n$ and since $\beta_0 = \overline \beta_0$ , we obtain

\begin{equation}| \overline \beta_n - \beta_n |\leq{\text{e}}^{C_1 t_n} n C_2 h.\end{equation}

2.3. Application to the construction of a PDMP and its associated Euler scheme

In this subsection we define a PDMP and its associated Euler scheme from the construction of Section 2.1. Consider a family of vector fields $(f_{\theta}, \theta \in \Theta)$ satisfying the following assumption.

Assumption 2.2. For all $\theta \in \Theta$ , the function $f_{\theta} \colon \mathbb{R} \rightarrow \mathbb{R}$ is bounded and Lipschitz with constant L independent of $\theta$ .

If we choose $\Phi_{\theta} = \phi_{\theta}$ in the above construction, where for all $x = (\theta,\nu) \in E$ we let $(\phi_{\theta}(t,\nu), t \geq 0)$ denote the unique solution of the ODE

(2.7) \begin{equation}\dfrac{{\text{d}} y(t)}{{\text{d}} t} = f_{\theta} ( y(t) ), \quad y(0) = \nu,\end{equation}

then the corresponding PDP is Markov since $\phi$ satisfies the semi-group property that reads $\phi_{\theta}(t+s,\nu) = \phi_{\theta}(t,\phi_{\theta}(s,\nu))$ for all $t,s \geq 0$ and for all $(\theta,\nu) \in E$ . In this case, the process $(x_t)$ is a piecewise deterministic Markov process (see [Reference Davis6] or [Reference Jacobsen20]).

Let $h > 0$ . We approximate the solution of (2.7) by the Euler scheme with time step h. First, we define the Euler subdivision of $[0,+\infty[$ with time step h, denoted $(\overline t_i, i \geq 0)$ , by $\overline t_i \,:\!= ih$ .

Then, for all $x = (\theta,\nu) \in E$ , we define the sequence $(\overline y_i(x), i \geq 0)$ , the classical Euler scheme, iteratively by

\begin{equation*}\overline y_{i+1}(x) = \overline y_i(x) + h f_{\theta}(\overline y_i(x)), \quad \overline y_0(x) = \nu,\end{equation*}

where we write $\overline y_i(x)$ to emphasize the dependence on the initial condition. Finally, for all $x = (\theta,\nu) \in E$ , we set

(2.8) \begin{equation}\overline \phi_{\theta}(t,\nu)\,:\!=\overline y_i(x) + (t - \overline t_i)\, f_{\theta}(\overline y_i(x))\quad\text{{for all} } t \in [\overline t_i,\overline t_{i+1}].\end{equation}

We construct the approximating process $(\overline x_t)$ as follows. Its continuous component starts from $\nu_0$ at time 0 and follows the flow $\overline \phi_{\theta_0}(t,\nu_0)$ until the first jump time $\overline T_1$ , which we construct by (2.2) and (2.3) of Section 2.1, where we replace $\Phi_{\theta_0}( T^{*}_k,\nu_0)$ by $\overline \phi_{\theta_0}( T^{*}_k,\nu_0)$ . At time $\overline T_1$ the continuous component of $\overline x_{\overline T_1}$ is equal to $\overline \phi_{\theta_0}( \overline T_1,\nu_0)\,:\!=\overline \nu_1$ since there is no jump in the continuous component. The discrete component jumps to $\overline \theta_1$ . We iterate this procedure with the new flow $\overline \phi_{\overline \theta_1}(t- \overline T_1,\overline \nu_1)$ until the next jump time $\overline T_2$ given by (2.2) and (2.3) with $\overline \phi_{\overline \theta_1}( T^{*}_k-\overline T_1, \overline \nu_1)$ , and so on. We proceed by iteration to construct $(\overline x_t)$ on [0, T].

Consequently, the discretization grid for $(\overline x_t)$ on the interval [0, T] is random and is formed by the points $\overline T_n + kh$ for $n = 0,\ldots, \overline N_T$ and $k = 0, \ldots, \lfloor (\overline T_{n+1} \wedge T - \overline T_n)/h \rfloor$ . This differs from the SDE case, where the classical grid is fixed.

By classical results of numerical analysis (see e.g. [Reference Hairer, Nørsett and Wanner17]), the continuous Euler scheme (2.8) (also called the Euler polygon) satisfies estimate (2.6). If we choose $\Phi_{\theta} = \overline \phi_{\theta}$ in the above construction then the corresponding PDP $(\overline x_t)$ is not Markov, since the functions $\overline \phi_{\theta}(.,\nu)$ do not satisfy the semi-group property (see [Reference Jacobsen20]).
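For illustration, the continuous Euler scheme (2.8) takes only a few lines; this is a sketch in which `f_theta` stands for the vector field $f_{\theta}$ of (2.7).

```python
def euler_flow(f_theta, t, nu, h):
    """Continuous Euler scheme (2.8): full Euler steps up to the last grid point t_i
    before t, then linear interpolation on [t_i, t_{i+1}].  A sketch only."""
    n_full = int(t // h)             # index i such that t lies in [t_i, t_{i+1})
    y = nu                           # y_0(x) = nu
    for _ in range(n_full):
        y = y + h * f_theta(y)       # y_{i+1}(x) = y_i(x) + h f_theta(y_i(x))
    return y + (t - n_full * h) * f_theta(y)
```

A function of this kind can be passed (through a wrapper taking $\theta$ as first argument) as the flow in the thinning sketch of Section 2.1, which produces the process $(\overline x_t)$ described above.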

2.4. Thinning representation for the marginal distribution of a PDP

The sequence $(T_n,(\theta_n,\nu_n), n\geq 0)$ is an $\mathbb{R}_+ \times E$ -valued Markov chain with respect to its natural filtration $\mathcal{F}_n$ and with kernel K defined by

(2.9) \begin{align}& K ((t,\theta,\nu), \,{\text{d}} u \,{\text{d}} j \,{\text{d}} z)\notag \\* &\quad \,:\!=\mathbb{1}_{u \geq t}\, \lambda(\theta,\Phi_{\theta}(u-t,\nu))\ {\exp\bigg(\!{-\int_{0}^{u-t}\lambda(\theta,\Phi_{\theta}(s,\nu))\,{\text{d}} s}\bigg)}\, Q((\theta,\Phi_{\theta}(u-t,\nu)), \,{\text{d}} j \,{\text{d}} z) \,{\text{d}} u .\end{align}

For $n \geq 1$ , the law of the random variable $T_n-T_{n-1}$ given $\mathcal{F}_{n-1}$ admits the density given for $t \geq 0$ by

(2.10) \begin{equation}\lambda(\theta_{n-1},\Phi_{\theta_{n-1}}(t,\nu_{n-1}))\ {\exp\bigg(\!{- \int_0^t \lambda(\theta_{n-1},\Phi_{\theta_{n-1}}(s,\nu_{n-1})) \,{\text{d}} s}\bigg)}.\end{equation}

Classically, the marginal distribution of $x_t$ is expressed using (2.5), the intensity $\lambda$ via (2.10), and the kernel K (see (2.9)). Indeed, for fixed $x_0=x \in E$ and for any bounded measurable function g, we can write

(2.11) \begin{align}{\mathbb{E}} [ g(x_t) ] &=\sum_{n \geq 0} {\mathbb{E}} [ g( \theta_n, \Phi_{\theta_n}(t-T_n,\nu_n) ) \mathbb{1}_{N_t =n} ] \notag\\* &=\sum_{n \geq 0} {\mathbb{E}} [ g( \theta_n,\Phi_{\theta_n}(t-T_n,\nu_n) ) \mathbb{1}_{T_n \leq t} {\mathbb{E}} [ \mathbb{1}_{T_{n+1}>t} | \mathcal{F}_n ] ] \notag\\ &=\sum_{n \geq 0} {\mathbb{E}} \bigg[ g(\theta_n, \Phi_{\theta_n}(t-T_n,\nu_n) )\mathbb{1}_{T_n \leq t}\ {\exp\bigg(\!{-\int_0^{t-T_n} \lambda(\theta_n, \Phi_{\theta_n}(u,\nu_n) ) \,{\text{d}} u }\bigg)} \bigg]\end{align}
(2.12) \begin{align}&\qquad\qquad= \sum_{n \geq 0} \int_0^t \!\!\int_E g(\theta, \Phi_{\theta}(t-s,\nu) )\ {\exp\bigg(\!{-\!\int_0^{t-s} \!\!\lambda( \theta, \Phi_{\theta}(u,\nu) ) \,{\text{d}} u }\bigg)}\, K^n ((0,x),\, {\text{d}} s \,{\text{d}} \theta \,{\text{d}} \nu){,} \end{align}

where $K^0 \,:\!= \delta$ and $K^n = K \circ \ldots \circ K$ n times, that is,

\begin{equation*}\int_0^t \int_E K^n ( (0,x), \,{\text{d}} s \,{\text{d}} y )=\int_0^t \int_E \int_{(\mathbb{R}_+ \times E)^{n-1}} K((0,x), \,{\text{d}} t_1 \,{\text{d}} y_1) \ldots K((t_{n-1},y_{n-1}), \,{\text{d}} s \,{\text{d}} y).\end{equation*}

However, since we have constructed $(x_t)$ by thinning, we would prefer to express the distribution of $x_t$ using the upper bound $\lambda^{*}$ , the Poisson process $( N^{*}_t, t \geq 0)$ , and the sequences $(U_k, k\in {\mathbb N})$ , $(V_k, k\in {\mathbb N})$ . In Proposition 2.1 we give another representation of (2.11). The product term which appears in the expectation on the right-hand side of the equality in Proposition 2.1 should be interpreted as the conditional survival function,

\[ t \rightarrow {\exp \bigg( {-\int_0^{t-T_n} \lambda(\theta_n, \Phi_{\theta_n}(u,\nu_n) ) \,{\text{d}} u }\bigg)}, \]

of $T_{n+1}$ in (2.11).

Proposition 2.1. Let $(x_t, t \in [0,T])$ be a PDP with characteristics $(\Phi,\lambda,Q)$ constructed in Section 2.1 and let $n \in \mathbb{N}$ . Then

\begin{align*}{\mathbb{E}}[g(x_t)\mathbb{1}_{\{N_t=n\}}]&=\sum_{m\geq n}\, \sum_{1\leq p_1<p_2< \cdots <p_n\leq m} \sum_{\theta\in \Theta} {\mathbb{E}}\bigg[Q(x^-_{T^{*}_{p_{n}}},\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\nu_n))\\* &\quad\, \times\mathbb{1}_{\{\tau_i=p_i, 1\leq i\leq n, N^{*}_t=m\}}\prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\nu_n))}{\lambda^{*}}\bigg)\bigg].\end{align*}

The following proposition and its corollaries will be useful in Section 3. In their statements $(x_t, t \in [0,T])$ and $(\tilde x_t, t \in [0,T])$ are PDPs constructed in Section 2.1 using the same data $( N^{*}_t)$ , $(U_k)$ , $(V_k)$ and the same initial point $x \in E$ but with different sets of characteristics.

The following results are inspired by the change of probability introduced in [Reference Xia and Giles28] where the authors are interested in the application of the MLMC to jump-diffusion SDEs with state-dependent intensity. In our case we need a change of probability which guarantees not only that the processes jump at the same times but also in the same states.

Proposition 2.2. Let $(\Phi,\lambda,Q)$ (resp. $(\Phi,\tilde \lambda,\tilde Q)$ ) denote the characteristics of $(x_t)$ (resp. $(\tilde x_t)$ ). Let us assume that $\tilde\lambda$ and $\tilde Q$ depend only on $\theta$, that $\tilde Q$ is always positive, and that $0< \tilde \lambda(\theta) < \lambda^{*}$ for all $\theta \in \Theta$. For every integer n, let us define, on the event $\{ \tilde N_t = n \},$

\begin{equation*}\tilde Z_n=\dfrac{Q(\tilde x^-_{T^{*}_{\tilde \tau_n}},\tilde\theta_n)}{\tilde Q(\tilde\theta_{n-1},\tilde\theta_n)} \bigg( \bigg(1-\dfrac{\tilde\lambda(\tilde\theta_n)}{\lambda^{*}} \bigg)^{N_t^{*}-\tilde\tau_n} \bigg)^{-1} \prod_{q=\tilde\tau_n+1}^{N_t^{*}} \bigg( 1-\dfrac{\lambda(\tilde\theta_n,\Phi_{\tilde\theta_n}(T_{q}^{*}-T_{\tilde \tau_n}^{*},\tilde\nu_n))}{\lambda^{*}} \bigg),\end{equation*}

the product being equal to 1 if $\tilde \tau_n=N_t^{*}$ and, for all $1\leq \ell\leq n-1,$

\begin{align*}\tilde Z_{\ell}&=\dfrac{Q(\tilde x^-_{T^{*}_{\tilde \tau_{\ell}}},\tilde\theta_{\ell})}{\tilde Q(\tilde\theta_{\ell-1},\tilde\theta_{\ell})}\, \bigg(\dfrac{\tilde\lambda(\tilde\theta_{\ell})}{\lambda^{*}}\bigg(1-\dfrac{\tilde\lambda(\tilde\theta_{\ell})}{\lambda^{*}} \bigg)^{\tilde\tau_{\ell+1}-\tilde\tau_{\ell}-1}\bigg)^{-1}\notag\\* &\quad\, \times \dfrac{\lambda(\tilde\theta_{\ell},\Phi_{\tilde\theta_{\ell}}(T_{\tilde\tau_{\ell+1}}^{*}-T_{\tilde\tau_{\ell}}^{*},\tilde\nu_{\ell}))}{\lambda^{*}}\prod_{q=\tilde\tau_{\ell} +1}^{\tilde\tau_{\ell+1}-1} \bigg(1-\dfrac{\lambda(\tilde\theta_{\ell},\Phi_{\tilde\theta_{\ell}}(T_{q}^{*}-T_{\tilde\tau_{\ell}}^{*},\tilde\nu_{\ell}))}{\lambda^{*}} \bigg),\\ \tilde Z_0&=\bigg( \dfrac{\tilde \lambda(\tilde \theta_0)}{\lambda^{*}} \bigg( 1-\dfrac{\tilde \lambda(\tilde \theta_0)}{\lambda^{*}} \bigg)^{\tilde \tau_1 -1} \bigg)^{-1}\dfrac{\lambda(\tilde \theta_{0},\Phi_{\tilde \theta_{0}}(T^{*}_{\tilde \tau_1},\tilde \nu_{0}))}{\lambda^{*}} \prod_{q=1}^{\tilde \tau_1 -1} \bigg( 1-\dfrac{\lambda(\tilde \theta_{0},\Phi_{\tilde \theta_{0}}(T^{*}_q,\tilde \nu_{0}))}{\lambda^{*}} \bigg),\\* \tilde R_n&=\tilde Z_n\, \, \, \prod_{\ell=0}^{n-1} \, \tilde Z_{\ell}.\end{align*}

Then, for all $n \geq 0$ we have

\begin{equation*}{\mathbb{E}}[g(\tilde x_t)\, \tilde R_n\, \mathbb{1}_{\{\tilde N_t=n\}}]={\mathbb{E}}[g(x_t)\, \mathbb{1}_{\{N_t=n\}}].\end{equation*}

Corollary 2.1. Under the assumptions of Proposition 2.2, setting $\tilde R_t = \tilde R_{\tilde N_t}$ , we have

\begin{equation*}{\mathbb{E}}[g(\tilde x_t) \tilde R_t ]={\mathbb{E}}[g(x_t)].\end{equation*}

Remark 2.1. Proposition 2.2 looks like Girsanov’s theorem (see [Reference Palmowski and Rolski26]), but we do not use martingale theory here.

Remark 2.2. We have chosen to state Proposition 2.2 with a PDP $(\tilde x_t)$ whose intensity and transition measure only depend on $\theta$ for the sake of readability. In fact the arguments of the proof are valid for non-homogeneous intensity and transition measures of the form $\tilde \lambda (x,t)$ and $\tilde Q((x,t),{\text{d}} y)$ for $x=(\theta,\nu) \in E$ . A possible choice of such characteristics is $\tilde \lambda (x,t) = \lambda(\theta, \tilde \Phi_{\theta}(t,\nu))$ and $\tilde Q((x,t),{\text{d}} y) = Q( (\theta,\tilde \Phi_{\theta}(t,\nu)), {\text{d}} y )$ for $\tilde \Phi$ a given function. This remark will be implemented in Section 5.4.

Corollary 2.2. Let $(\Phi,\lambda,Q)$ (resp. $(\tilde \Phi, \lambda, Q)$ ) be the set of characteristics of $(x_t)$ (resp. $(\tilde x_t)$ ). We assume that Q is always positive and that $0< \lambda(x) < \lambda^{*}$ for all $x \in E$ . Let $(\mu_n)$ be the sequence defined by $\mu_0 = \nu$ and $\mu_n = \tilde \Phi_{\theta_{n-1}}(T_n-T_{n-1},\mu_{n-1})$ for $n \geq 1$ . For every integer n, let us define, on the event $\{ N_t = n \}$,

\begin{align*}\tilde Z_n&=\dfrac{Q ( (\theta_{n-1},\mu_n),\theta_n )}{Q ( (\theta_{n-1},\nu_n),\theta_n )}\Bigg( \prod_{q=\tau_n+1}^{N_t^{*}} 1-\dfrac{\lambda ( \theta_n, \Phi_{\theta_n}(T^{*}_q - T^{*}_{\tau_n},\nu_n) )}{\lambda^{*}} \Bigg)^{-1}\\* &\quad\, \times\prod_{q=\tau_n+1}^{N_t^{*}} \bigg(1-\dfrac{\lambda ( \theta_n, \tilde \Phi_{\theta_n}(T^{*}_q - T^{*}_{\tau_n},\mu_n) )}{\lambda^{*}} \bigg),\end{align*}

the products being equal to 1 if $ \tau_n=N_t^{*}$ and for all $1\leq \ell\leq n-1,$

\begin{align*}\tilde Z_{\ell}&=\dfrac{Q ( (\theta_{\ell-1},\mu_{\ell}),\theta_{\ell} )}{ Q ( (\theta_{\ell-1},\nu_{\ell}),\theta_{\ell} )}\,\bigg( \dfrac{\lambda ( \theta_{\ell}, \Phi_{\theta_{\ell}}(T^{*}_{\tau_{\ell+1}} - T^{*}_{\tau_{\ell}},\nu_{\ell}) )}{\lambda^{*}}\prod_{q=\tau_{\ell} +1}^{\tau_{\ell+1}-1}\! \bigg(\! 1-\dfrac{\lambda ( \theta_{\ell}, \Phi_{\theta_{\ell}}(T^{*}_{q} - T^{*}_{\tau_{\ell}},\nu_{\ell}) )}{\lambda^{*}} \bigg) \bigg)^{-1} \\* &\quad\, \times \dfrac{\lambda ( \theta_{\ell}, \tilde \Phi_{\theta_{\ell}}(T^{*}_{\tau_{\ell+1}} - T^{*}_{\tau_{\ell}},\mu_{\ell}) )}{\lambda^{*}}\prod_{q=\tau_{\ell} +1}^{\tau_{\ell+1}-1} \bigg( 1-\dfrac{\lambda ( \theta_{\ell}, \tilde \Phi_{\theta_{\ell}}(T^{*}_{q} - T^{*}_{\tau_{\ell}},\mu_{\ell}) )}{\lambda^{*}} \bigg),\notag\\ \tilde Z_0&=\bigg(\dfrac{\lambda \big( \theta_0, \Phi_{\theta_0}(T^{*}_{\tau_{1}},\nu_0) \big)}{\lambda^{*}} \prod_{q=1}^{\tau_{1}-1} \bigg( 1-\dfrac{\lambda \big( \theta_0, \Phi_{\theta_0}(T^{*}_{q},\nu_0) \big)}{\lambda^{*}} \bigg) \bigg)^{-1} \notag\\& \quad\, \times \dfrac{\lambda \big( \theta_0, \tilde \Phi_{\theta_0}(T^{*}_{\tau_{1}},\mu_0) \big)}{\lambda^{*}} \prod_{q=1}^{\tau_{1}-1} \bigg( 1-\dfrac{\lambda \big( \theta_0, \tilde \Phi_{\theta_0}(T^{*}_{q},\mu_0) \big)}{\lambda^{*}} \bigg),\\ \tilde R_n&=\tilde Z_n\, \, \, \prod_{\ell=0}^{n-1} \, \tilde Z_{\ell}. \notag\end{align*}

Then, for all $n \geq 0$ we have

\begin{equation*}{\mathbb{E}}[g (\theta_n, \tilde \Phi_{\theta_n}(t-T_n,\mu_n) )\, \tilde R_n\, \mathbb{1}_{\{ N_t=n\}}]={\mathbb{E}}[g(\tilde x_t)\, \mathbb{1}_{\{\tilde N_t=n\}}].\end{equation*}

Proof of Proposition 2.1. It holds that $\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n\}\subset \{N_t^{*}\geq p_n\}$ . Then

(2.13) \begin{equation}{\mathbb{E}}[g(x_t)\mathbb{1}_{\{N_t=n\}}]=\sum_{m\geq n}\, \sum_{1\leq p_1<p_2< \cdots <p_{n}\leq m}\, \, {\mathbb{E}}[g(x_t)\mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n,\, N_t^{*}=m\}}].\end{equation}

The event $\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, N_t^{*}=m\}$ corresponds to the following conditions:

  1. $N_t^{*}=m$ ,

  2. among the times $T^{*}_{\ell}, 1\leq \ell\leq m$ , exactly n are accepted by the thinning method; these are the $T^{*}_{p_i}, 1\leq i\leq n$ , and all the others are rejected.

We proceed by induction, starting from the fact that all the $T^{*}_q, \, p_n+1\leq q\leq m$ are rejected, which corresponds to the event

\begin{equation*} U_q> \dfrac{\lambda(\theta_n,\Phi_{\theta_n}(T_q^{*}-T_{p_n}^{*},\nu_n))}{\lambda^{*}}\quad {\text{for all } p_n+1\leq q\leq m}.\end{equation*}

The random variable $\mathbb{1}_{\{\tau_i=p_i,\, 1\leq i\leq n\}}$ depends on

\[(\theta_{\ell}, \nu_{\ell}, 1\leq \ell \leq n-1, T^{*}_i, 1\leq i\leq p_n, U_j, 1\leq j\leq p_n ){,}\]

where by construction $\nu_{\ell}=\Phi_{\theta_{\ell-1}}(T^{*}_{p_{\ell}}-T^{*}_{p_{\ell-1}},\nu_{\ell-1})$ and $\theta_{\ell}=H((\theta_{\ell-1},\nu_{\ell}),V_{\ell})$, which implies that $(\theta_{\ell}, \nu_{\ell}, 1\leq \ell \leq n-1)$ depend on $(T^{*}_i, 1\leq i\leq p_{n-1}, U_j, 1\leq j\leq p_{n-1}, V_k, 1\leq k\leq n-1).$ Thus $V_n$ is independent of all the other thinning random variables that are present in $g(x_t)\mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}$. The conditional expectation of $g(x_t)\mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, N_t^{*}=m\}}$ with respect to the vector $(T^{*}_i, 1\leq i\leq m+1, U_j, 1\leq j\leq m, V_k, 1\leq k\leq n-1)$ is therefore an expectation over $V_n$ alone, in which this vector acts as a parameter. Since the law of $H(x,V_n)$ is $Q(x,\cdot)$ for all $x \in E$, we obtain for $p_1<p_2< \cdots <p_n\leq m$,

(2.14) \begin{align}&{\mathbb{E}}[g(x_t)\mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}]\notag\\* &\quad = {\mathbb{E}}\Bigg[\sum_{\theta\in \Theta} Q(x^-_{T^{*}_{p_{n}}},\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\nu_n))\notag\\* &\quad\quad\, \times F(\theta,U_j,1\leq j\leq m,\, T^{*}_{\ell},1\leq \ell\leq m+1, V_k, 1\leq k\leq n-1)\Bigg],\end{align}

with

\begin{align*}&F(\theta,U_j,1\leq j\leq m,\, T^{*}_{\ell},1\leq \ell\leq m+1, V_k, 1\leq k\leq n-1)\notag\\* &\quad =\mathbb{1}_{\{N_t^{*}=m, \tau_i=p_i,\, 1\leq i\leq n\}}\prod_{q=p_n+1}^{m}\, \mathbb{1}_{U_{q}> \lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\nu_n))/\lambda^{*}}.\end{align*}

In (2.14) the random variables $(U_{q}, \, p_n+1\leq q\leq m)$ are independent of the vector $(T^{*}_i, 1\leq i\leq m+1, U_j, 1\leq j\leq p_n, V_k, 1\leq k\leq n-1)$ . Conditioning by this vector, we obtain

(2.15) \begin{align}&{\mathbb{E}}[g(x_t)\mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}] \notag\\* &\quad =\sum_{\theta\in \Theta}\, {\mathbb{E}}\bigg[Q(x^-_{T^{*}_{p_{n}}},\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\nu_n))\mathbb{1}_{\{N_t^{*}=m, \tau_i=p_i,\, 1\leq i\leq n\}}\notag\\* &\quad\quad\, \times \prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\nu_n))}{\lambda^{*}}\bigg)\bigg].\end{align}

Summing the above equality over $1\leq p_1<p_2<\cdots <p_{n}\leq m$ and over $m\geq n$, and using equation (2.13), yields the result.

Just as with the successive use of conditioning to obtain (2.12), we can iterate on the form (2.15) by first integrating out $V_{n-1}$ conditionally on all the other random variables and then integrating out $(U_{q}, \, p_{n-1}+1\leq q \leq p_n)$ conditionally on all the remaining ones, and so on. However, the terms that appear do not have the same structure, since the $U_{q}$ correspond to rejection for $p_{n-1}+1 \leq q \leq p_n-1$ whereas $U_{p_{n}}$ corresponds to acceptance. Consequently the next step yields

(2.16) \begin{align}&{\mathbb{E}}[g(x_t)\mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}] \notag\\* &\quad =\sum_{\alpha\in \Theta}\sum_{\theta\in \Theta}\, {\mathbb{E}}\bigg[Q(x^-_{T^{*}_{p_{n-1}}},\alpha)Q ( (\alpha,\nu_n),\theta ) \notag \\&\quad\quad\, \times g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\nu_n))\mathbb{1}_{\{N_t^{*}=m, \tau_i=p_i,\, 1\leq i\leq n-1\}}\notag\\&\quad\quad\, \times \dfrac{\lambda(\alpha,\Phi_{\alpha}(T_{p_n}^{*}-T_{p_{n-1}}^{*},\nu_{n-1}))}{\lambda^{*}}\prod_{q=p_{n-1}+1}^{p_n-1}\bigg(1-\dfrac{\lambda(\alpha,\Phi_{\alpha}(T_{q}^{*}-T_{p_{n-1}}^{*},\nu_{n-1}))}{\lambda^{*}}\bigg)\notag\\* &\quad\quad\, \times \prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\nu_n))}{\lambda^{*}}\bigg)\bigg],\end{align}

where we write $\nu_n$ for simplicity, keeping in mind that

\begin{align*}\nu_n&=\Phi_{\theta_{n-1}}(T^{*}_{p_n}-T^{*}_{p_{n-1}},\nu_{n-1})\\* &=\Phi_{\theta_{n-1}}(T^{*}_{p_n}-T^{*}_{p_{n-1}},\Phi_{\theta_{n-2}}(T^{*}_{p_{n-1}}-T^{*}_{p_{n-2}},\nu_{n-2}))\\* &=\Phi_{\alpha}(T^{*}_{p_n}-T^{*}_{p_{n-1}},\Phi_{\theta_{n-2}}(T^{*}_{p_{n-1}}-T^{*}_{p_{n-2}},\nu_{n-2})).\end{align*}

In (2.16), the product term

\[\dfrac{\lambda(\alpha,\Phi_{\alpha}(T_{p_n}^{*}-T_{p_{n-1}}^{*},\nu_{n-1}))}{\lambda^{*}}\prod_{q=p_{n-1}+1}^{p_n-1}\bigg(1-\dfrac{\lambda(\alpha,\Phi_{\alpha}(T_{q}^{*}-T_{p_{n-1}}^{*},\nu_{n-1}))}{\lambda^{*}}\bigg)\]

should be interpreted as the probability density function of $T_n$, which appears in (2.12) via the kernel K.

Moreover, the previous arguments apply to

\[{\mathbb{E}}(g(x_t)\, f(\theta_i,\nu_i, 1\leq i\leq n-1, \theta_n,\nu_n, T^{*}_k, 1\leq k\leq m)\, \mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}){,}\]

where f is a measurable function, and provide

(2.17) \begin{align}&{\mathbb{E}}[g(x_t) f(\theta_i,\nu_i, 1\leq i\leq n-1, \theta_n,\nu_n, T^{*}_k, 1\leq k\leq m) \, \mathbb{1}_{\{N_t=n, \tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}]\notag\\&\quad =\sum_{\theta\in \Theta} {\mathbb{E}}\bigg[Q(x^-_{T^{*}_{p_{n}}},\theta) g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\nu_n)) \notag\\&\quad\quad\, \times f(\theta_i,\nu_i, 1\leq i\leq n-1, \theta, \nu_n, T^{*}_k, 1\leq k\leq m)\, \mathbb{1}_{\{N_t^{*}=m, \tau_i=p_i,\, 1\leq i\leq n\}}\notag\\* &\quad\quad\, \times \, \prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\nu_n))}{\lambda^{*}}\bigg)\bigg].\end{align}

Below we prove Proposition 2.2. The other statements can be proved analogously.

Proof of Proposition 2.2. By assumption the (jump) characteristics $(\tilde \lambda, \tilde Q)$ of $(\tilde x_t)$ depend only on $\theta$ . Let $p_1<p_2< \cdots <p_n\leq m$ . Applying the same arguments as in (2.17) to $(\tilde x_t)$ and using the definitions of $\tilde Z_{\ell}, \, 0\leq \ell\leq n$ and $\tilde R_n$ , we obtain

\begin{align*}&{\mathbb{E}}[g(\tilde x_t)\, \tilde R_n\, \mathbb{1}_{\{\tilde N_t=n, \tilde\tau_i=p_i, 1\leq i\leq n, N_t^{*}=m\}}]\\* &\quad =\sum_{\theta\in \Theta} {\mathbb{E}}\bigg[\tilde Q(\tilde\theta_{n-1},\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},{\tilde\nu}_n))\, \tilde Z_n\, \prod_{\ell=0}^{n-1} \, \tilde Z_{\ell}\,\mathbb{1}_{\{N_t^{*}=m, {\tilde\tau}_i=p_i,\, 1\leq i\leq n\}}\bigg] \,\bigg(1-\dfrac{\tilde\lambda(\theta)}{\lambda^{*}}\bigg)^{m-p_n}\\&\quad =\sum_{\theta\in \Theta}\, {\mathbb{E}}\bigg[\tilde Q(\tilde\theta_{n-1},\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\tilde\nu_n)) \, \prod_{\ell=0}^{n-1} \, \tilde Z_{\ell}\,\, \mathbb{1}_{\{N_t^{*}=m, \tilde\tau_i=p_i,\, 1\leq i\leq n\}} \,\bigg(1-\dfrac{\tilde\lambda(\theta)}{\lambda^{*}}\bigg)^{m-p_n}\\&\quad\quad\, \times \bigg(\bigg(1-\dfrac{\tilde\lambda(\theta)}{\lambda^{*}}\bigg)^{m-p_n}\bigg)^{-1}\dfrac{Q(\tilde x^-_{ T^{*}_{p_{n}}},\theta)}{\tilde Q(\tilde\theta_{n-1},\theta)}\prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\tilde\nu_n))}{\lambda^{*}}\bigg)\bigg]\\&\quad =\sum_{\theta\in \Theta}\, {\mathbb{E}}\bigg[Q(\tilde x^-_{T^{*}_{p_{n}}},\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\tilde\nu_n)) \,\tilde Z_{n-1}\, \prod_{\ell=0}^{n-2} \, \tilde Z_{\ell}\,\, \mathbb{1}_{\{N_t^{*}=m, \tilde\tau_i=p_i,\, 1\leq i\leq n\}}\\* &\quad\quad\, \times \prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\tilde\nu_n))}{\lambda^{*}}\bigg)\bigg].\end{align*}

We iterate the above argument based on the use of (2.17) and we use the definition of $\tilde Z_{n-1}$ to obtain

\begin{align*}&{\mathbb{E}}[g(\tilde x_t) \tilde R_n \mathbb{1}_{\{\tilde N_t=n, \tilde\tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}] \notag\\* &\quad =\sum_{\alpha\in \Theta}\sum_{\theta\in \Theta}\, {\mathbb{E}}\bigg[Q(\tilde x^-_{T^{*}_{p_{n-1}}},\alpha) Q( (\alpha,\tilde \nu_n),\theta)\, g( \theta, \Phi_{\theta}(t-T^{*}_{p_n},\tilde\nu_n)) \notag\\&\quad\quad\, \times \prod_{\ell=0}^{n-2} \, \tilde Z_{\ell}\, \, \mathbb{1}_{\{N_t^{*}=m, \tilde\tau_i=p_i,\, 1\leq i\leq n-1\}}\prod_{q=p_n+1}^{m}\bigg(1-\dfrac{\lambda(\theta,\Phi_{\theta}(T_{q}^{*}-T_{p_n}^{*},\tilde\nu_n))}{\lambda^{*}}\bigg)\\* &\quad\quad\, \times \dfrac{\lambda(\alpha,\Phi_{\alpha}(T_{p_n}^{*}-T_{p_{n-1}}^{*},\tilde\nu_{n-1}))}{\lambda^{*}}\prod_{q=p_{n-1}+1}^{p_{n}-1}\bigg(1-\dfrac{\lambda(\alpha,\Phi_{\alpha}(T_{q}^{*}-T_{p_{n-1}}^{*},\tilde\nu_{n-1}))}{\lambda^{*}}\bigg)\bigg],\end{align*}

where for short $\tilde\nu_{n}=\Phi_{\alpha}(T^{*}_{p_n}-T^{*}_{p_{n-1}},\tilde\nu_{n-1})$ and $\tilde\nu_{n-1}=\Phi_{\tilde\theta_{n-2}}(T^{*}_{p_{n-1}}-T^{*}_{p_{n-2}},\tilde\nu_{n-2})$ . Comparing the latter expression to (2.16) and using induction, we conclude that

\begin{equation*}{\mathbb{E}}[g(\tilde x_t) \tilde R_n \mathbb{1}_{\{\tilde N_t=n, \tilde\tau_i=p_i,\, 1\leq i\leq n, \, N_t^{*}=m\}}]={\mathbb{E}}[g(x_t)\,\mathbb{1}_{\{N_t=n, \tau_i=p_i, 1\leq i\leq n, N_t^{*}=m\}}].\end{equation*}

It remains to sum over $p_i$, $1\leq i\leq n$, and over m.

3. Strong error estimates

In this section we are interested in strong error estimates. We state the main assumptions and theorems below; the proofs are given in Sections 3.2 and 3.3, respectively.

Assumption 3.1. For all $\theta \in \Theta$ and for all $A \in \mathcal{B}(\Theta)$ , the functions $\nu \mapsto \lambda(\theta,\nu)$ and $\nu \mapsto Q( (\theta,\nu),A)$ are Lipschitz with constants $L_{\lambda}>0$ , $L_Q>0$ , respectively, independent of $\theta$ .

Theorem 3.1. Let $\Phi_{\theta}$ and $\overline \Phi_{\theta}$ satisfy (2.6) and let $(x_t, t \in [0,T])$ and $(\overline x_t, t \in [0,T])$ be the corresponding PDPs constructed in Section 2.1 with $x_0 = \overline x_0 = x$ for some $x \in E$ . Assume that $\Theta$ is finite and that $\lambda$ and Q satisfy Assumption 3.1. Then, for all bounded functions $F\colon E \rightarrow \mathbb{R}$ such that for all $\theta \in \Theta$ the function $\nu \mapsto F(\theta,\nu)$ is $L_F$ -Lipschitz, where $L_F$ is positive and independent of $\theta$ , there exist constants $V_1 > 0$ and $V_2 > 0$ independent of the time step h such that

\begin{equation*}{\mathbb{E}} [ |F(\overline x_T) - F(x_T)|^2 ] \leq V_1 h + V_2 h^2.\end{equation*}

Remark 3.1. When the numerical scheme $\overline \Phi_{\theta}$ is of order $p \geq 1$ , which means

\[ \sup_{t \in [0,T]} |\Phi_{\theta}(t,\nu_1) - \overline \Phi_{\theta}(t,\nu_2)| \leq {\text{e}}^{C_1 T}|\nu_1-\nu_2| + C_2 h^p {,} \]

we have

\[ {\mathbb{E}} [ |F(\overline x_T) - F(x_T)|^2 ] \leq V_1 h^p + V_2 h^{2p}. \]

Assumption 3.2. There exist positive constants $\rho$ , $\tilde \lambda_{\min}$ , $\tilde \lambda_{\max}$ such that, for all $(i,j) \in \Theta^2$ , $\rho \leq \tilde Q(i,j)$ and $\tilde \lambda_{\min} \leq \tilde\lambda(i) \leq \tilde \lambda_{\max}<\lambda^{*}$ .

Theorem 3.2. Let $\Phi_{\theta}$ and $\overline \Phi_{\theta}$ satisfy (2.6) and let $(\tilde x_t, t \in [0,T])$ and $(\underline{ \tilde x}_t, t \in [0,T])$ be the corresponding PDPs constructed in Section 2.1 with $\underline{\tilde x}_0 =\tilde x_0 = x$ for some $x \in E$ . Let $(\tilde R_t, t \in [0,T])$ and $(\underline{ \tilde R}_t, t \in [0,T])$ be defined as in Corollary 2.1. Under Assumptions 3.1 and 3.2 and for all bounded functions $F\colon E \rightarrow \mathbb{R}$ such that for all $\theta \in \Theta$ the function $\nu \mapsto F(\theta,\nu)$ is $L_F$ -Lipschitz ( $L_F > 0$ ), there exists a positive constant $\tilde V_1$ independent of the time step h such that

\begin{equation*}\mathbb{E} [ |F(\underline{\tilde x}_T) \underline{\tilde R}_T - F(\tilde x_T) \tilde R_T|^2 ]\leq\tilde V_1 h^2.\end{equation*}

We now introduce the random variable $\overline \tau^\dagger$ which will play an important role in the strong error estimate of Theorem 3.1 as well as in the identification of the coefficient $c_1$ in the weak error expansion in Section 4 (see the proof of Theorem 4.1 in Section 4.2).

Definition 3.1. Let us define $\overline \tau^\dagger \,:\!= \inf\{ k > 0 \colon (\tau_k,\theta_k) \neq (\overline \tau_k,\overline \theta_k) \}$ .

The random variable $\overline \tau^\dagger$ enables us to partition the trajectories of the pair $(x_t,\overline x_t)$ in a sense that we now make precise. Consider the event

(3.1) \begin{equation}\{ \min( T_{\overline \tau^\dagger}, \overline T_{\overline \tau^\dagger}) > T \} = \{ N_T = \overline N_T, (T_1,\theta_1) = (\overline T_1,\overline \theta_1),\ldots, (T_{N_T},\theta_{N_T}) = (\overline T_{\overline N_T},\overline \theta_{\overline N_T}) \},\end{equation}

where $(T_n)$ and $(\overline T_n)$ denote the sequences of jump times of $(x_t)$ and $(\overline x_t)$ . On this event $\{ \min( T_{\overline \tau^\dagger}, \overline T_{\overline \tau^\dagger}) > T\}$ the trajectories of the discrete time processes $(T_n,\theta_n)$ and $(\overline T_n, \overline \theta_n)$ are equal for all n such that $T_n \in [0,T]$ (or equivalently $\overline T_n \in [0,T]$ ). Moreover, the complement, i.e. $\{ \min( T_{\overline \tau^\dagger}, \overline T_{\overline \tau^\dagger}) \leq T \}$ , contains the trajectories for which $(T_n,\theta_n)$ and $(\overline T_n,\overline \theta_n)$ differ on [0, T] (there exists $n \leq N_T \vee \overline N_T$ such that $T_n \neq \overline T_n$ or $\theta_n \neq \overline \theta_n$ ).

3.1. Preliminary lemmas

In this subsection we start with two lemmas which will be useful for proving Theorems 3.1 and 3.2.

Lemma 3.1. Let K be a finite set. We let $|K|$ denote the cardinal of K, and for $i=1,\ldots,|K|$ we let $k_i$ denote its elements. Let $( p_i, 1 \leq i \leq |K| )$ and $(\overline p_i, 1 \leq i \leq |K|)$ be two probabilities on K. Let $a_j \,:\!= \sum_{i=1}^{j} p_i $ and $ \overline a_j \,:\!= \sum_{i=1}^{j} \overline p_i$ for all $j \in \{1,\ldots,|K|\}$ . By convention, we set $a_0 = \overline a_0 \,:\!= 0$ . Let X and $ \overline X $ be two K-valued random variables defined by

\[X \,:\!= G(U),\quad\overline X \,:\!= \overline G (U),\]

where

\[U \sim \mathcal{U} ([0,1]) ,\quad G(u) = \sum_{j=1}^{|K|} k_j \mathbb{1}_ { a_{j-1} < u \leq a_{j} }{,}\quad\overline G (u) = \sum_{j=1}^{|K|} k_j \mathbb{1}_ { \overline a_{j-1} < u \leq \overline a_j }\quad \text{for all $ u\in [0,1]$.}\]

Then, we have

\[\mathbb{P}(X \neq \overline X) \leq \sum_{j=1}^{|K|-1}|a_{j}-\overline a_{j}|.\]

Proof of Lemma 3.1. By definition of X and $\overline X$ and since the sets $[ a_{j-1} , a_j ] \cap [ \overline a_{j-1} , \overline a_j]$, $j=1,\ldots,|K|$, are pairwise disjoint, we have

\begin{equation*}\mathbb{P} ( X = \overline X )=\sum_{j=1}^{|K|} \mathbb{P} ( U \in { [ a_{j-1} , a_j ] \cap [ \overline a_{j-1} , \overline a_j] } ).\end{equation*}

Moreover, for all $1 \leq j \leq |K|$ , we have

\begin{equation*}\mathbb{P} ( U \in { [ a_{j-1} , a_j ] \cap [ \overline a_{j-1} , \overline a_j] } )=\begin{cases}0 & \text{if } [ a_{j-1} , a_j ] \cap [ \overline a_{j-1} , \overline a_j] = \emptyset, \\a_j \wedge \overline a_j - a_{j-1} \vee \overline a_{j-1} & \text{if } [ a_{j-1} , a_j ] \cap [ \overline a_{j-1} , \overline a_j] \neq \emptyset.\end{cases}\end{equation*}

Thus, letting $x^+ \,:\!= \max(x,0)$ denote the positive part of $x \in \mathbb{R}$ and using $x^+ \geq x$ , we obtain

\begin{equation*}\mathbb{P} ( X = \overline X )\geq\sum_{j=1}^{|K|} ( a_j \wedge \overline a_j - a_{j-1} \vee \overline a_{j-1} ).\end{equation*}

Adding and subtracting $ a_j \vee \overline a_j$ in the above sum yields

\begin{equation*}\mathbb{P} ( X = \overline X )\geq\sum_{j=1}^{|K|} (a_j \vee \overline a_j - a_{j-1} \vee \overline a_{j-1}) + \sum_{j=1}^{|K|} (a_j \wedge \overline a_j - a_j \vee \overline a_j).\end{equation*}

The first sum above is a telescopic sum: since $a_{|K|} = \overline a_{|K|} = 1$ and $a_{0} = \overline a_{0} = 0$, it equals 1. The second sum equals $-\sum_{j=1}^{|K|} | a_j - \overline a_j | = -\sum_{j=1}^{|K|-1} | a_j - \overline a_j |$, the last term vanishing because $a_{|K|} = \overline a_{|K|}$. We therefore have

\begin{equation}\mathbb{P} ( X = \overline X ) \geq 1 - \sum_{j=1}^{|K|-1} | a_j - \overline a_j |.\end{equation}

Taking complements yields $\mathbb{P}(X \neq \overline X) \leq \sum_{j=1}^{|K|-1} | a_j - \overline a_j |$ , which is the claimed bound.
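The coupling in Lemma 3.1 is the inverse-transform construction driven by a single uniform variable, and the bound is easy to check numerically. The following C++ snippet is a minimal illustration only (it is not part of the implementation used in Section 5): the two discrete distributions are arbitrary, and the empirical frequency of $\{X \neq \overline X\}$ is compared with $\sum_{j=1}^{|K|-1}|a_j - \overline a_j|$ .

```cpp
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Inverse-transform sampling: returns the index j such that a_{j-1} < u <= a_j,
// where (a_j) are the cumulative sums of the probability vector p.
int sampleFromCdf(const std::vector<double>& p, double u) {
    double a = 0.0;
    for (std::size_t j = 0; j < p.size(); ++j) {
        a += p[j];
        if (u <= a) return static_cast<int>(j);
    }
    return static_cast<int>(p.size()) - 1;   // guard against rounding errors
}

int main() {
    // Two illustrative distributions on a three-point set (not taken from the paper).
    std::vector<double> p    = {0.50, 0.30, 0.20};
    std::vector<double> pbar = {0.40, 0.35, 0.25};

    std::mt19937_64 gen(42);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    const int n = 1000000;
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        double u = unif(gen);                // common randomness for X and Xbar
        if (sampleFromCdf(p, u) != sampleFromCdf(pbar, u)) ++mismatches;
    }

    // Bound of Lemma 3.1: sum over j = 1, ..., |K|-1 of |a_j - abar_j|.
    double bound = 0.0, a = 0.0, abar = 0.0;
    for (std::size_t j = 0; j + 1 < p.size(); ++j) {
        a += p[j]; abar += pbar[j];
        bound += std::abs(a - abar);
    }
    std::cout << "P(X != Xbar) ~ " << static_cast<double>(mismatches) / n
              << " <= " << bound << '\n';
}
```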

Lemma 3.2. Let $(a_n, n \geq 1)$ and $(b_n, n \geq 1)$ be two real-valued sequences. For all $n \geq 1$ , we have

\[\prod_{i=1}^n a_i - \prod_{i=1}^n b_i = \sum_{i=1}^n ( a_i - b_i ) \prod_{j=i+1}^n a_j \prod_{j=1}^{i-1} b_j {.}\]

Proof of Lemma 3.2. We use induction on n. The identity is trivial for $n=1$ , and the inductive step follows by writing $\prod_{i=1}^{n+1} a_i - \prod_{i=1}^{n+1} b_i = a_{n+1} \big( \prod_{i=1}^{n} a_i - \prod_{i=1}^{n} b_i \big) + (a_{n+1} - b_{n+1}) \prod_{j=1}^{n} b_j$ and applying the induction hypothesis to the term in parentheses.
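As a sanity check of the identity (with arbitrary illustrative sequences), the following short C++ snippet compares the two sides numerically.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Left-hand side: prod_i a_i - prod_i b_i.
double lhs(const std::vector<double>& a, const std::vector<double>& b) {
    double pa = 1.0, pb = 1.0;
    for (std::size_t i = 0; i < a.size(); ++i) { pa *= a[i]; pb *= b[i]; }
    return pa - pb;
}

// Right-hand side: sum_i (a_i - b_i) * prod_{j > i} a_j * prod_{j < i} b_j.
double rhs(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double term = a[i] - b[i];
        for (std::size_t j = i + 1; j < a.size(); ++j) term *= a[j];
        for (std::size_t j = 0; j < i; ++j) term *= b[j];
        sum += term;
    }
    return sum;
}

int main() {
    std::vector<double> a = {0.9, 1.1, 0.7, 1.3};    // arbitrary illustrative values
    std::vector<double> b = {1.0, 0.8, 0.95, 1.2};
    assert(std::fabs(lhs(a, b) - rhs(a, b)) < 1e-12);
}
```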

3.2. Proof of Theorem 3.1

First, we write

\begin{align*}&\mathbb{E} [ | F(\overline x_T) - F(x_T)|^2 ] \\* &\quad =\mathbb{E} [ \mathbb{1}_{\min(T_{\overline \tau^\dagger}, \overline T_{\overline \tau^\dagger}) \leq T} |F(\overline x_T) - F(x_T)|^2 ]+\mathbb{E} [ \mathbb{1}_{\min(T_{\overline \tau^\dagger}, \overline T_{\overline \tau^\dagger}) > T} |F(\overline x_T) - F(x_T)|^2 ]\\* &\quad =\!:\,\overline P + \overline D,\end{align*}

where $\overline \tau^\dagger$ is defined in Definition 3.1. The term $\overline P$ is of the order of the probability that the discrete-time processes $(T_n,\theta_n)$ and $(\overline T_n, \overline \theta_n)$ differ on [0, T]. The term $\overline D$ is of the order of the squared error of the Euler scheme, because on the corresponding event the discrete-time processes $(T_n,\theta_n)$ and $(\overline T_n,\overline \theta_n)$ coincide on [0, T]. In the following we prove that $\overline P = {\text{O}}(h)$ and that $\overline D = {\text{O}}(h^2)$ .

Step 1: estimation of $\overline P$ . The function F being bounded, we have

\[\overline P \leq 4 M_F^2 \mathbb{P} (\!\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) \leq T ) {,}\]

where $M_F>0$ . Moreover, for $k \geq 1$ ,

\[\{\overline \tau^\dagger = k \} = \{ \overline \tau^\dagger > k-1 \} \bigcap \{ (\tau_k,\theta_k ) \neq (\overline \tau_k,\overline \theta_k ) \}.\]

Hence

\begin{align*}\mathbb{P} (\!\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) \leq T )&=\sum_{k \geq 1} {\mathbb{E}} [ \mathbb{1}_{\min(T_k,\overline T_k) \leq T} \mathbb{1}_{\overline \tau^\dagger = k} ] \\* &=\sum_{k \geq 1} {\mathbb{E}} [ \mathbb{1}_{\min(T_k,\overline T_k) \leq T} \mathbb{1}_{\overline \tau^\dagger > k-1} \mathbb{1}_{(\tau_k,\theta_k ) \neq (\overline \tau_k,\overline \theta_k )} ] \\* &\leq\sum_{k \geq 1} ( \overline J_k + 2 \overline I_k ){,}\end{align*}

where

(3.2) \begin{equation} \begin{aligned}\overline J_k&\,:\!={\mathbb{E}} [\mathbb{1}_{\min(T_k,\overline T_k)\, \leq\, T} \mathbb{1}_{\overline \tau^\dagger > k-1} \mathbb{1}_{\tau_k = \overline \tau_k} \mathbb{1}_{\theta_k \neq \overline \theta_k} ],\\* \overline I_k&\,:\!={\mathbb{E}} [ \mathbb{1}_{\min(T_k,\overline T_k)\, \leq\, T} \mathbb{1}_{\overline \tau^\dagger > k-1} \mathbb{1}_{\tau_k \neq \overline \tau_k} ].\end{aligned}\end{equation}

We start with $\overline J_k$ . First note that, for $k \geq 1$ , $\{\tau_k = \overline \tau_k\} = \{T_k = \overline T_k\}$ , and that on the event $\{T_k = \overline T_k\}$ , we have $\min(T_k,\overline T_k) = T_k$ , so

\[\overline J_k={\mathbb{E}} [\mathbb{1}_{T_k \leq T} \mathbb{1}_{\overline \tau^\dagger > k-1} \mathbb{1}_{\tau_k = \overline \tau_k} \mathbb{1}_{\theta_k \neq \overline \theta_k} ].\]

We emphasize that it makes no difference in the rest of the proof if we choose $\min(T_k,\overline T_k) = \overline T_k$ . Since

\[\{ \overline \tau^\dagger > k-1 \} = \bigcap_{i=0}^{k-1} \{ (\tau_i,\theta_i) = (\overline \tau_i,\overline \theta_i) \},\]

we can rewrite $\overline J_k$ as follows:

(3.3) \begin{equation}\sum_{{\genfrac{}{}{0pt}{2}{1 \leq p_1 < \ldots < p_k}{\alpha_1,\ldots,\alpha_{k-1} \in \Theta}}}{\mathbb{E}} [ \mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T} \mathbb{1}_{\theta_k \neq \overline \theta_k} ].\end{equation}

By construction we have $\theta_k = H((\theta_{k-1},\nu_k),V_k)$ and $\overline \theta_k = H((\overline \theta_{k-1},\overline \nu_k),V_k)$ . The random variable $\mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T}$ depends on the vector $(U_i,1 \leq i \leq p_k, T^{*}_j, 1 \leq j \leq p_k, V_q, 1 \leq q \leq k-1 )$ , which is independent of $V_k$ . Conditioning by this vector in (3.3) and applying Lemma 3.1 yields

\begin{align*}&{\mathbb{E}} [ \mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T} \mathbb{1}_{\theta_k \neq \overline \theta_k} ] \\* &\quad \leq{\mathbb{E}} \Bigg[ \mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T} \sum_{j=1}^{|\Theta|-1} | a_j( \alpha_{k-1},\overline\nu_k) - a_j(\alpha_{k-1},\nu_k) | \Bigg].\end{align*}

From the definition of $a_j$ (see (2.4)), the triangle inequality, and since Q is $L_Q$ -Lipschitz, we have

\[\sum_{j=1}^{|\Theta|-1} | a_j( \alpha_{k-1},\overline\nu_k) - a_j(\alpha_{k-1},\nu_k) | \leq \dfrac{( |\Theta|-1 ) |\Theta|}{2} L_{Q} |\overline \nu_k - \nu_k|.\]

Since we are on the event

\[\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k\} \bigcap \{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\},\]

the application of Lemma 2.1 yields $|\overline \nu_k - \nu_k| \leq {\text{e}}^{L T^{*}_{p_k}} k C h$ . Thus $\overline J_k \leq C_1 h {\mathbb{E}}[\mathbb{1}_{T_k \leq T}k]$ , where $C_1$ is a constant independent of h. Moreover,

\[\sum_{k \geq 1 }\mathbb{1}_{T_k \leq T} k = \sum_{k=1}^{N_T} k \leq N_T^2\quad\text{and}\quad{\mathbb{E}}[ N_T^2] \leq {\mathbb{E}}[ (N^{*}_T)^2] < +\infty\]

so that

\[\sum_{k \geq 1} \overline J_k = {\text{O}}(h).\]

From the definition of $\overline I_k$ (see (3.2)), we can write

\begin{align*}\overline I_k&={\mathbb{E}} [ \mathbb{1}_{\min(T_k,\overline T_k) \leq T} \mathbb{1}_{\overline \tau^\dagger > k-1} ( \mathbb{1}_{\tau_k < \overline \tau_k} + \mathbb{1}_{\tau_k > \overline \tau_k} ) ] \\&={\mathbb{E}} [ \mathbb{1}_{T_k \leq T} \mathbb{1}_{\overline \tau^\dagger > k-1} \mathbb{1}_{\tau_k < \overline \tau_k} ]+{\mathbb{E}} [ \mathbb{1}_{\overline T_k \leq T} \mathbb{1}_{\overline \tau^\dagger > k-1} \mathbb{1}_{\tau_k > \overline \tau_k} ]\\&=\!:\,\overline I^{(1)}_k + \overline I^{(2)}_k.\end{align*}

The second equality above follows since $\{\tau_k < \overline \tau_k \} = \{T_k < \overline T_k\}$ and $\{\tau_k > \overline \tau_k \} = \{T_k > \overline T_k\}$ . We only treat the term $\overline I^{(1)}_k$ ; the term $\overline I^{(2)}_k$ can be treated similarly by swapping $(\tau_k,T_k)$ and $(\overline \tau_k,\overline T_k)$ . Just as in the previous case, we can rewrite $\overline I^{(1)}_k$ as follows:

(3.4) \begin{equation}\sum_{{\genfrac{}{}{0pt}{2}{1 \leq p_1 < \ldots < p_k}{\alpha_1,\ldots,\alpha_{k-1} \in \Theta}}}{\mathbb{E}} [ \mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k-1\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T} \mathbb{1}_{\tau_k = p_k} \mathbb{1}_{p_k < \overline \tau_k} ].\end{equation}

In (3.4) we have

\begin{align*} & \{\tau_k = p_k\} \cap \{p_k < \overline \tau_k\} \\* &\quad \subseteq \{ \lambda(\alpha_{k-1}, \overline \Phi_{\alpha_{k-1}}(T^{*}_{p_k}-T^{*}_{p_{k-1}},\overline \nu_{k-1})) < U_{p_k} \lambda^{*} \leq \lambda( \alpha_{k-1}, \Phi_{\alpha_{k-1}}(T^{*}_{p_k}-T^{*}_{p_{k-1}}, \nu_{k-1})) \}.\end{align*}

The random variable $\mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k-1\}} \, \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}}\, \mathbb{1}_{T^{*}_{p_k} \leq T}$ depends on

\[(U_i,1 \leq i \leq p_{k-1}, T^{*}_j, 1 \leq j \leq p_k, V_q, 1 \leq q \leq k-1 ){,}\]

which is independent of $U_{p_k}$ . Conditioning by this vector in (3.4) yields

\begin{align*}&{\mathbb{E}} [ \mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k-1\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T} \mathbb{1}_{\tau_k = p_k} \mathbb{1}_{p_k < \overline \tau_k} ]\\* &\quad\leq{\mathbb{E}} [ \mathbb{1}_{\{\tau_i= \overline \tau_i =p_i, 1\leq i \leq k-1\}} \mathbb{1}_{\{\theta_i = \overline \theta_i=\alpha_i, 1\leq i \leq k-1\}} \mathbb{1}_{T^{*}_{p_k} \leq T} \\* &\quad\quad\, \times |\lambda(\alpha_{k-1}, \overline \Phi_{\alpha_{k-1}}(T^{*}_{p_k}-T^{*}_{p_{k-1}},\overline \nu_{k-1}))-\lambda( \alpha_{k-1}, \Phi_{\alpha_{k-1}}(T^{*}_{p_k}-T^{*}_{p_{k-1}}, \nu_{k-1})) | ]{.}\end{align*}

Using the Lipschitz continuity of $\lambda$ and then Lemma 2.1, we obtain

\[\overline I^{(1)}_k \leq C_2 h {\mathbb{E}}[\mathbb{1}_{T_k \leq T} k]{,}\]

where $C_2$ is a constant independent of h. Concerning the term $\overline I^{(2)}_k$ , the same arguments yield the estimate $\overline I^{(2)}_k \leq C_2 h {\mathbb{E}}[\mathbb{1}_{\overline T_k \leq T} k]$ . We conclude, in the same way as in the estimation of $\overline J_k$ above, that $\sum_{k \geq 1} \overline I_k = {\text{O}}(h)$ .

Step 2: estimation of $\overline D$ . Note that for $n \geq 0$ we have

\[\{N_T = n \} \cap \{ \min( T_{\overline \tau^\dagger}, \overline T_{\overline \tau^\dagger}) > T \}=\{N_T = n \} \cap \{ \overline N_T = n \} \cap \{ \overline \tau^\dagger > n \},\]

where we can swap $\{N_T = n \}$ and $\{\overline N_T = n \}$ . Thus, using the partition $ \{N_T = n,n \geq 0\}$ , we have

\begin{equation*}\overline D=\sum_{n \geq 0} {\mathbb{E}} [ \mathbb{1}_{N_T=n} \mathbb{1}_{\overline N_T=n} \mathbb{1}_{\overline \tau^\dagger > n} | F(\theta_n, \overline \Phi_{ \theta_n}(T- T_n,\overline \nu_n)) - F(\theta_n,\Phi_{\theta_n}(T-T_n,\nu_n)) |^2 ] {.}\end{equation*}

Application of the Lipschitz continuity of F and of Lemma 2.1 yields

\begin{equation*}| F(\theta_n, \overline \Phi_{\theta_n}(T-T_n,\overline \nu_n)) - F(\theta_n,\Phi_{\theta_n}(T-T_n,\nu_n)) |\leq L_F \,{\text{e}}^{ L T} (n+1)Ch.\end{equation*}

Then we have $\overline D \leq C_3 h^2 \sum_{n \geq 0} {\mathbb{E}} [ \mathbb{1}_{N_T=n} (n+1)^2 ]$ , where $C_3$ is a constant independent of h. Since

\[\sum_{n \geq 0} {\mathbb{E}} [ \mathbb{1}_{N_T=n} (n+1)^2 ] = {\mathbb{E}}[(N_T+1)^2] \leq {\mathbb{E}}[(N^{*}_T+1)^2] < +\infty,\]

we conclude that $\overline D = {\text{O}}(h^2)$ .

3.3. Proof of Theorem 3.2

First we reorder the terms in $\tilde R_T$ . We write $\tilde R_T = \tilde{\textsf{Q}}_T \tilde{\textsf{S}}_T \tilde{\textsf{H}}_T$ , where

(3.5) \begin{align}& \tilde{\textsf{Q}}_T = \prod_{l=1}^{\tilde N_T} \dfrac{Q(\tilde x_{T^{*}_{\tilde \tau_l}}^-,\tilde \theta_l)}{\tilde Q(\tilde \theta_{l-1},\tilde \theta_l)},\end{align}
(3.6) \begin{align}& \tilde{\textsf{S}}_T= \prod_{l=1}^{\tilde N_T} \dfrac{\lambda(\tilde \theta_{l-1},\Phi_{\tilde \theta_{l-1}}(T^{*}_{\tilde \tau_l}-T^{*}_{\tilde \tau_{l-1}},\tilde \nu_{l-1}))}{\lambda^{*}} \prod_{k=\tilde \tau_{l-1}+1}^{\tilde \tau_l} \bigg(1 - \dfrac{\lambda(\tilde \theta_{l-1},\Phi_{\tilde \theta_{l-1}}(T^{*}_{k}-T^{*}_{\tilde \tau_{l-1}},\tilde \nu_{l-1}))}{\lambda^{*}}\bigg)\\ & \qquad\times \prod_{l=\tilde \tau_{\tilde N_T}+1}^{N^{*}_T} \bigg(1 - \dfrac{\lambda(\tilde \theta_{\tilde N_T},\Phi_{\tilde \theta_{\tilde N_T}}(T^{*}_{l}-T^{*}_{\tilde \tau_{\tilde N_T}},\tilde \nu_{\tilde N_T}))}{\lambda^{*}}\bigg),\nonumber \end{align}
(3.7) \begin{align} \tilde{\textsf{H}}_T = \prod_{l=1}^{\tilde N_T} \bigg( \dfrac{\tilde \lambda(\tilde \theta_{l-1})}{\lambda^{*}} \bigg(1-\dfrac{\tilde \lambda(\tilde \theta_{l-1})}{\lambda^{*}}\bigg)^{\tilde \tau_l - \tilde \tau_{l-1}-1} \bigg)^{-1} \bigg( \bigg(1-\dfrac{\tilde \lambda(\tilde \theta_{\tilde N_T})}{\lambda^{*}}\bigg)^{N^{*}_T - \tilde \tau_{\tilde N_T}} \bigg)^{-1}. \end{align}

Likewise we reorder the terms in $\underline{\tilde R}_T$ , writing $\underline{\tilde R}_T = \underline{\tilde{\textsf{Q}}}_T \underline{\tilde{\textsf{S}}}_T \tilde{\textsf{H}}_T$ , where $\underline{\tilde{\textsf{Q}}}_T$ and $\underline{\tilde{\textsf{S}}}_T$ are defined as in (3.5) and (3.6), with $\tilde x$ and $\Phi$ replaced by $\underline{\tilde x}$ and $\overline \Phi$ . Since the processes $(\tilde \theta_n)$ and $(\tilde \tau_n)$ do not depend on $\Phi$ or $\overline \Phi$ , the term $\tilde{\textsf{H}}_T$ is the same in $\tilde R_T$ and $\underline{\tilde R}_T$ . To prove Theorem 3.2, let us decompose the problem and write

\begin{align*}|F(\underline{\tilde x}_T) \underline{\tilde R}_T - F(\tilde x_T) \tilde R_T |&=| (F(\underline{\tilde x}_T) - F(\tilde x_T)) \underline{\tilde R}_T + (\underline{\tilde R}_T- \tilde R_T) F(\tilde x_T) | \\&\leq| F(\underline{\tilde x}_T) - F(\tilde x_T) | |\underline{\tilde R}_T| + |\underline{\tilde R}_T- \tilde R_T| |F(\tilde x_T)|,\end{align*}

so that

\begin{align*}\mathbb{E} [ |F(\underline{\tilde x}_T) \underline{\tilde R}_T - F(\tilde x_T) \tilde R_T|^2 ]&\leq2 {\mathbb{E}} [ | F(\underline{\tilde x}_T) - F(\tilde x_T) |^2 |\underline{\tilde R}_T|^2 ]+2 {\mathbb{E}} [ |\underline{\tilde R}_T- \tilde R_T|^2 |F(\tilde x_T) |^2 ]\\* &=\!:\, 2 \overline D + 2 \overline C.\end{align*}

In the following we show that $\overline C = {\text{O}}(h^2)$ and $\overline D = {\text{O}}(h^2)$ .

Step 1: estimation of $\overline C$ . The function F being bounded, we have

\[ \overline C \leq M_F^2 {\mathbb{E}} [ |\underline{\tilde R}_T - \tilde R_T|^2 ]{,} \]

where $M_F$ is a positive constant. Moreover, for all $\theta \in \Theta$ we have

\[ (1-\tilde \lambda(\theta)/\lambda^{*})^{-1} \leq (1-\tilde \lambda_{\text{max}}/\lambda^{*})^{-1} \quad\text{and}\quad (\tilde \lambda(\theta)/\lambda^{*})^{-1} \leq (\tilde \lambda_{\text{min}}/\lambda^{*})^{-1}. \]

Thus

\[ \tilde{\textsf{H}}_T \leq \bigg( \dfrac{\tilde \lambda_{\text{min}}}{\lambda^{*}}\bigg(1-\dfrac{\tilde \lambda_{\text{max}}}{\lambda^{*}}\bigg) \bigg)^{- N^{*}_T} {,} \]

and using the definition of $\tilde R$ and $\underline{\tilde R}$ (see (3.5), (3.6), and (3.7)) we can write

\begin{equation*}|\underline{\tilde R}_T - \tilde R_T|\leq\bigg( \dfrac{\tilde \lambda_{\text{min}}}{\lambda^{*}}\bigg(1-\dfrac{\tilde \lambda_{\text{max}}}{\lambda^{*}}\bigg) \bigg)^{- N^{*}_T} ( |\underline{\tilde{\textsf{Q}}}_T - \tilde{\textsf{Q}}_T| \tilde{\textsf{S}}_T + |\underline{\tilde{\textsf{S}}}_T - \tilde{\textsf{S}}_T| \underline{\tilde{\textsf{Q}}}_T ).\end{equation*}

We set $\overline J = |\underline{\tilde{\textsf{Q}}}_T - \tilde{\textsf{Q}}_T| \tilde{\textsf{S}}_T$ and $\overline I = |\underline{\tilde{\textsf{S}}}_T - \tilde{\textsf{S}}_T| \underline{\tilde{\textsf{Q}}}_T$ . To provide the desired estimate for $\overline C$ , we proceed as follows. First, we work $\omega$ by $\omega$ to determine (random) bounds for $\overline J$ and $\overline I$ , from which we deduce a (random) bound for $|\underline{\tilde R}_T - \tilde R_T|$ . Finally, we take the expectation. We start with $\overline I$ . For all $(\theta,\nu) \in E$ and for all $t \geq 0$ , we have from Assumption 2.1 that $1-\lambda(\theta,\Phi_{\theta}(t,\nu))/\lambda^{*} \leq 1$ and $\lambda(\theta,\Phi_{\theta}(t,\nu))/\lambda^{*} \leq 1$ . Then, using Lemma 3.2 (twice), we have

\begin{align*}|\underline{\tilde{\textsf{S}}}_T - \tilde{\textsf{S}}_T|&\leq\dfrac{1}{\lambda^{*}} \sum_{l=1}^{\tilde N_T+1} \sum_{k=\tilde \tau_{l-1}+1}^{\tilde \tau_l \wedge N^{*}_T} \!\!|\lambda(\tilde \theta_{l-1}, \overline \Phi_{\tilde \theta_{l-1}}(T^{*}_k-T^{*}_{\tilde \tau_{l-1}},\underline{\tilde \nu}_{l-1}))\\ &\quad-\lambda(\tilde \theta_{l-1}, \Phi_{\tilde \theta_{l-1}}(T^{*}_k-T^{*}_{\tilde \tau_{l-1}},{\tilde \nu}_{l-1}))|.\end{align*}

Using the Lipschitz continuity of $\lambda$ and Lemma 2.1, we find that, for all $l=1,\ldots,\tilde N_T+1$ and $k=\tilde \tau_{l-1}+1,\ldots,\tilde \tau_l \wedge N^{*}_T$ ,

\begin{equation*}|\lambda(\tilde \theta_{l-1}, \overline \Phi_{\tilde \theta_{l-1}}(T^{*}_k-T^{*}_{\tilde \tau_{l-1}},\underline{\tilde \nu}_{l-1}))-\lambda(\tilde \theta_{l-1}, \Phi_{\tilde \theta_{l-1}}(T^{*}_k-T^{*}_{\tilde \tau_{l-1}},{\tilde \nu}_{l-1}))|\leq{\text{e}}^{LT} C h l.\end{equation*}

Moreover, for all $l=1,\ldots,\tilde N_T+1$ we have $\tilde \tau_l \wedge N^{*}_T-\tilde \tau_{l-1} \leq N^{*}_T$ , so that

\[|\underline{\tilde{\textsf{S}}}_T - \tilde{\textsf{S}}_T| \leq N^{*}_T (N^{*}_T+1)^2 C_1 h{,}\]

where $C_1$ is a positive constant independent of h. Finally, since $\underline{\tilde{\textsf{Q}}}_T \leq \rho^{-N^{*}_T}$ we have

(3.8) \begin{equation}\overline I\leq\rho^{-N^{*}_T} N^{*}_T (N^{*}_T+1)^2 C_1 h.\end{equation}

Now, consider $\overline J$ . Note that from Assumption 2.1 we have $\tilde{\textsf{S}}_T \leq 1$ . We use the same type of arguments as for $\overline I$ . That is, we successively use Lemma 3.2, the Lipschitz continuity of Q, and Lemma 2.1 to obtain

(3.9) \begin{equation}\overline J \leq \rho^{- N^{*}_T} (N^{*}_T)^2 C_2 h,\end{equation}

where $C_2$ is a positive constant independent of h. Then we derive from the previous estimates (3.8) and (3.9) that

\begin{equation*}|\underline{\tilde R}_T - \tilde R_T|\leq\Xi_1(N^{*}_T) C_3 h,\end{equation*}

where

\[\Xi_1(n) = \bigg( \rho \dfrac{\tilde \lambda_{\text{min}}}{\lambda^{*}}\bigg(1-\dfrac{\tilde \lambda_{\text{max}}}{\lambda^{*}}\bigg) \bigg)^{- n} n (n+1)^2\quad\text{and}\quad C_3=\max(C_1,C_2).\]

Finally, we have ${\mathbb{E}}[|\underline{\tilde R}_T - \tilde R_T|^2] \leq C_3^2 h^2 {\mathbb{E}} [ \Xi_1(N^{*}_T)^2 ]$ . Since

\[{\mathbb{E}} [ \Xi_1(N^{*}_T)^2 ] < +\infty{,}\]

we conclude that $\overline C = {\text{O}}(h^2)$ .

Step 2: estimation of $\overline D$ . Recall that $\smash{\tilde x_T=(\tilde \theta_{\tilde N_T}, \Phi_{\tilde \theta_{\tilde N_T}}(T-\tilde T_{\tilde N_T},\tilde \nu_{\tilde N_T}))}$ and $\underline{\tilde x}_T=(\tilde \theta_{\tilde N_T},\overline \Phi_{\tilde \theta_{\tilde N_T}}(T-\tilde T_{\tilde N_T},\underline{\tilde \nu}_{\tilde N_T}))$ . Then, using the Lipschitz continuity of F, Lemma 2.1, and the fact that $\tilde N_T \leq N^{*}_T$ , we get

\begin{equation*}|F(\underline{\tilde x}_T) - F(\tilde x_T)|\leq L_F \,{\text{e}}^{L T} (\tilde N_T+1)C h\leq L_F \,{\text{e}}^{L T} (N^{*}_T+1)C h.\end{equation*}

Moreover,

\[|\underline{\tilde R}_T| \leq \bigg( \rho \dfrac{\tilde \lambda_{\text{min}}}{\lambda^{*}}\bigg(1-\dfrac{\tilde \lambda_{\text{max}}}{\lambda^{*}}\bigg) \bigg)^{-N^{*}_T}{,}\]

so that $\overline D \leq C_4 h^2 {\mathbb{E}} [\Xi_2(N^{*}_T)^2 ]$ , where $C_4$ is a positive constant independent of h and

\[\Xi_2(n) = (n+1)\bigg( \rho \dfrac{\tilde \lambda_{\text{min}}}{\lambda^{*}}\bigg(1-\dfrac{\tilde \lambda_{\text{max}}}{\lambda^{*}}\bigg) \bigg)^{-n}.\]

Since ${\mathbb{E}} [\Xi_2(N^{*}_T)^2 ] < +\infty$ , we conclude that $\overline D = {\text{O}}(h^2)$ .

4. Weak error expansion

In this section we are interested in a weak error expansion for the PDMP $(x_t)$ of Section 2.3 and its associated Euler scheme $(\overline x_t)$ . First of all, we recall from [Reference Davis5] that the generator $\mathcal{A}$ of the process $(t,x_t)$ which acts on functions g defined on $\mathbb{R}_+ \times E$ is given by

(4.1) \begin{equation}\mathcal{A} g(t,x) = {\partial_t g}(t,x) + f(x) {\partial_{\nu} g}(t,x) + \lambda(x) \int_E ( g(t,y) - g(t,x) ) Q(x,\,{\text{d}} y),\end{equation}

where for notational convenience we have set

\[ {\partial_{\nu} g}(t,x) \,:\!= \dfrac{\partial g}{ \partial \nu}(t,\theta,\nu),\quad {\partial_t g}(t,x) \,:\!= \dfrac{\partial g}{ \partial t}(t,x){,}\quad\text{and}\quad f(x) = f_{\theta}(\nu) \quad \text{for all $x =(\theta,\nu) \in E$.} \]

Below, we state the assumptions and the main theorem of this section. Its proof, which is inspired by [Reference Talay and Tubaro27] (see also [Reference Pagés24] or [Reference Graham and Talay16]), is deferred to Section 4.2.

Assumption 4.1. For all $ \theta \in \Theta $ and for all $A \in \mathcal{B}(\Theta)$ , the functions $\nu \mapsto Q ( (\theta,\nu), A )$ , $\nu \mapsto \lambda(\theta,\nu )$ , and $\nu \mapsto f_{\theta}(\nu )$ are bounded and twice continuously differentiable with bounded derivatives.

Assumption 4.2. The solution u of the integro-differential equation

(4.2) \begin{equation}\begin{cases}\mathcal{A} u(t,x) = 0 & (t,x) \in [0,T[ \times E, \\u(T,x) = F(x) & x \in E,\end{cases}\end{equation}

with $F \colon E \rightarrow \mathbb{R}$ a bounded function and $\mathcal{A}$ given by (4.1), is such that for all $\theta \in \Theta$ , the function $(t,\nu) \mapsto u(t,\theta,\nu)$ is bounded and twice differentiable with bounded derivatives. Moreover, the second derivatives of $(t,\nu) \mapsto u(t,\theta,\nu)$ are uniformly Lipschitz in $\theta$ .

Theorem 4.1. Let $(x_t, t \in [0,T])$ be a PDMP and $(\overline x_t, t \in [0,T])$ its approximation constructed in Section 2.3 with $x_0 = \overline x_0 = x$ for some $x \in E$ . Under Assumptions 4.1 and 4.2, for any bounded function $F\colon E \rightarrow \mathbb{R}$ there exists a constant $c_1$ independent of h such that

(4.3) \begin{equation}{\mathbb{E}} [F(\overline x_T)] - {\mathbb{E}} [F(x_T)]=h c_1 + {\text{O}}(h^2).\end{equation}

Remark 4.1. If $(\tilde x_t)$ is a PDMP whose characteristics $\tilde \lambda, \tilde Q$ satisfy the assumptions of Proposition 2.2 and $(\underline{\tilde x}_t)$ is its approximation, we deduce from Theorem 4.1 that

(4.4) \begin{equation}{\mathbb{E}}[F(\underline{\tilde x}_T) \underline{\tilde R}_T] - {\mathbb{E}} [F(\tilde x_T) \tilde R_T]= h c_1 + {\text{O}}(h^2).\end{equation}

4.1. Further results on PDMPs: Itô and Feynman–Kac formulas

Definition 4.1. Let us define the following operators, which act on functions g defined on $\mathbb{R}_+ \times E$ :

\begin{align*}\mathcal{T} g(t,x)& \,:\!={\partial_t g}(t,x) + f(x) {\partial_{\nu} g}(t,x),\\* \mathcal{S} g(t,x)& \,:\!=\lambda(x) \int_E ( g(t,y) - g(t,x) ) Q(x, \,{\text{d}} y).\end{align*}

From Definition 4.1, the generator $\mathcal{A}$ defined by (4.1) reads $\mathcal{A} g(t,x) = \mathcal{T} g(t,x) + \mathcal{S} g(t,x)$ . We introduce the random counting measure p associated with the PDMP $(x_t)$ defined by $p([0,t] \times A) \,:\!= \sum_{n \geq 1} \mathbb{1}_{T_n \leq t} \mathbb{1}_{Y_n \in A}$ for $t \in [0,T]$ and for $A \in \mathcal{B}(E)$ . The compensator of p, denoted by pʹ, is given from [Reference Davis5] by

\begin{equation*}p'( [0,t] \times A ) = \int_0^t \lambda(x_s) Q(x_s,A) \,{\text{d}} s.\end{equation*}

Hence, $q \,:\!= p - p'$ is a martingale with respect to the filtration generated by p, denoted by $(\mathcal{F}_t^p)_{t \in [0,T]}$ . Similarly, we introduce $\overline p$ , $\overline p'$ , $\overline q$ , and $(\mathcal{F}_t^{\overline p})_{t \in [0,T]}$ to be the same objects as above but corresponding to the approximation $(\overline x_t)$ . The fact that $\overline p'$ is the compensator of $\overline p$ and that $\overline q$ is a martingale follows from marked point process theory: see [Reference Brémaud4].

Definition 4.2. Let us define the following operators, which act on functions g defined on $\mathbb{R}_+ \times E$ :

\begin{align*}\overline{\mathcal{T}} g(t,x,y)&\,:\!={\partial_t g}(t,x) + f(y) {\partial_{\nu} g}(t,x),\\* \overline{\mathcal{A}} g(t,x,y)&\,:\!=\overline{\mathcal{T}} g(t,x,y) + \mathcal{S} g(t,x).\end{align*}

Remark 4.2. For all functions g defined on $\mathbb{R}_+ \times E$ , $\overline{\mathcal{T}} g(t,x,x) = \mathcal{T} g(t,x)$ , so that $\overline{\mathcal{A}} g(t,x,x) = \mathcal{A} g(t,x)$ .

The next theorem provides Itô formulas for the PDMP $(x_t)$ and its approximation $(\overline x_t)$ . For all $s \in [0,T]$ , we set $\overline \eta(s) \,:\!= \overline T_n + kh$ if $s \in [\overline T_n + kh, (\overline T_n + (k+1)h) \wedge \overline T_{n+1}[$ for some $n \geq 0$ and for some $k \in \{0,\ldots, \lfloor (\overline T_{n+1} - \overline T_n)/h \rfloor\}$ .

Theorem 4.2. Let $(x_t, t \in [0,T])$ and $(\overline x_t, t \in [0,T])$ be a PDMP and its approximation, respectively, constructed in Section 2.3 with $x_0 = \overline x_0 = x$ for some $x \in E$ . For all bounded functions $g \colon \mathbb{R}_+ \times E \rightarrow \mathbb{R}$ continuously differentiable with bounded derivatives, we have

(4.5) \begin{equation}g(t,x_t) = g(0,x) + \int_0^t \mathcal{A} g (s,x_s) \,{\text{d}} s + M^g_t,\end{equation}

where

\[M^g_t \,:\!= \int_0^t \int_E ( g(s,y) - g(s,x_{s-}) ) q({\text{d}} s \,{\text{d}} y)\]

is a true $\mathcal{F}_t^p$ -martingale, and

(4.6) \begin{equation}g(t,\overline x_t) = g(0,x) + \int_{0}^{t} \overline{\mathcal{A}} g(s,\overline x_s,\overline x_{\overline \eta(s)}) \,{\text{d}} s + \overline M^g_t,\end{equation}

where

\[\overline M^g_t \,:\!= \int_0^t \int_E ( g(s,y) - g(s,\overline x_{s-}) ) \overline q({\text{d}} s \,{\text{d}} y)\]

is a true $\mathcal{F}_t^{\overline p}$ -martingale.

Proof of Theorem 4.2. The proof of (4.5) is given in [Reference Davis5]. We prove (4.6) following the same arguments. Since $\overline q = \overline p - \overline p'$ , we have

\begin{equation*}\overline M^g_t=\sum_{k \geq 1 } \mathbb{1}_{\overline T_k \leq t} ( g(\overline T_k, \overline x_{\overline T_k}) - g(\overline T_k, \overline x_{\overline T_k}^-) ) - \int_0^t \mathcal{S} g(s,\overline x_s)\,{\text{d}} s.\end{equation*}

Consider the above sum. As in [Reference Davis5], we write, on the event $\{\overline N_t = n\}$ , that

\begin{align*}&\sum_{k \geq 1 } \mathbb{1}_{\overline T_k \leq t} ( g(\overline T_k, \overline x_{\overline T_k}) - g(\overline T_k, \overline x_{\overline T_k}^-) ) \\* &\quad =g(t,\overline x_t) - g(0,x) - \Bigg[ g(t,\overline x_t) - g(\overline T_n, \overline x_{\overline T_n}) + \sum_{k=0}^{n-1} g(\overline T_{k+1}, \overline x_{\overline T_{k+1}}^-) - g(\overline T_{k}, \overline x_{\overline T_{k}}) \Bigg].\end{align*}

For all $k \leq n-1$ , we decompose the increment

\[g(\overline T_{k+1},\overline x_{\overline T_{k+1}}^-) - g(\overline T_k,\overline x_{\overline T_k})\]

as a sum of increments on the intervals $[\overline T_k + ih, (\overline T_k + (i+1)h)\wedge \overline T_{k+1} ] \subset [\overline T_k, \overline T_{k+1} ]$ . Without loss of generality we are led to consider increments of the form $g(t,\theta,\overline \phi_{\theta}(t,\nu)) - g(ih,\theta,\overline y_i(x))$ for some $i \geq 0$ , $t \in [ih,(i+1)h]$ and for all $x = (\theta,\nu) \in E$ , where we recall that $\overline \phi$ is defined by (2.8). The function g is smooth enough to write

\begin{equation*}g(t,\theta,\overline \phi_{\theta}(t,\nu)) - g(ih,\theta,\overline y_i(x))=\int_{ih}^{t} ( {\partial_t g} + f_{\theta}(\overline y_i(x)) {\partial_{\nu} g} )(s,\theta,\overline \phi_{\theta}(s, \nu) ) \,{\text{d}} s.\end{equation*}

Then the above arguments together with Definition 4.2 yield

\begin{equation}g(t,\overline x_t) - g(\overline T_{n},\overline x_{\overline T_{n}}) + \sum_{k=0}^{n-1} g(\overline T_{k+1},\overline x_{\overline T_{k+1}}^-) - g(\overline T_k,\overline x_{\overline T_k})=\int_{0}^t \overline{\mathcal{T}} g (s,\overline x_s,\overline x_{\overline \eta(s)}) \,{\text{d}} s.\end{equation}

Combining this identity with the expression for $\overline M^g_t$ above gives (4.6).

The following theorem gives us a way to represent the solution of the integro-differential equation (4.2) as the conditional expected value of a functional of the terminal value of the PDMP $(x_t)$ . It plays a key role in the proof of Theorem 4.1.

Theorem 4.3. (PDMP’s Feynman–Kac formula [Reference Davis6].) Let $F \colon E \rightarrow \mathbb{R}$ be a bounded function. Then the integro-differential equation (4.2) has a unique solution $u \colon \mathbb{R}_+ \times E \rightarrow \mathbb{R}$ given by

\begin{equation*}u(t,x) = {\mathbb{E}} [ F(x_T) | x_t = x], \quad (t,x) \in [0,T] \times E.\end{equation*}

4.2. Proof of Theorem 4.1

We provide a proof in two steps. First, we give an appropriate representation of the weak error ${\mathbb{E}} [F (\overline x_T)] - {\mathbb{E}} [F (x_T)]$ . Then we use this representation to identify the coefficient $c_1$ in (4.3).

Step 1: representing ${\mathbb{E}} [F (\overline x_T)] - {\mathbb{E}} [F (x_T)]$ . Let u denote the solution of (4.2). From Theorem 4.3 we can write

\[{\mathbb{E}} [F (\overline x_T)] - {\mathbb{E}} [F (x_T)]={\mathbb{E}} [u(T,\overline x_T)] - u(0,x).\]

Then, the application of the Itô formula (4.6) to u at time T yields

\begin{equation*}u(T,\overline x_T) = u(0,x) + \int_{0}^{T} \overline{\mathcal{A}} u (s,\overline x_s,\overline x_{\overline \eta(s)}) \,{\text{d}} s + \overline M^u_T.\end{equation*}

Since $(\overline M^u_t)$ is a true martingale, we obtain

\begin{equation*}{\mathbb{E}} [u(T,\overline x_T) - u(0,x)] = {\mathbb{E}} \bigg[ \int_{0}^{T} \overline{\mathcal{A}} u (s,\overline x_s,\overline x_{\overline \eta(s)}) \,{\text{d}} s \bigg].\end{equation*}

For $s \in [0,T]$ we have $\overline{\mathcal{A}} u(s,\overline x_s,\overline x_{\overline \eta(s)})=\partial_t u(s,\overline x_s) + f(\overline x_{\overline \eta(s)}){\partial_{\nu} u}(s,\overline x_s) + \mathcal{S} u(s,\overline x_s)$ (see Definition 4.2). From the regularity of $\lambda$ , Q, and u (see Assumptions 4.1 and 4.2), the functions $\partial_t u$ , ${\partial_{\nu} u}$ , and $\mathcal{S} u$ are smooth enough to apply the Itô formula (4.6) to each of them between $\overline \eta(s)$ and s. This yields

\begin{align*}\partial_t u(s,\overline x_s) & = \partial_t u(\overline \eta(s),\overline x_{\overline \eta(s)}) + \int_{\overline \eta(s)}^{s} \overline{\mathcal{A}} (\partial_t u)(r,\overline x_r,\overline x_{\overline \eta(r)}) \,{\text{d}} r + \overline M^{\partial_t u}_s - \overline M^{\partial_t u}_{\overline \eta(s)},\\* {\partial_{\nu} u}(s,\overline x_s) & = {\partial_{\nu} u}(\overline \eta(s),\overline x_{\overline \eta(s)}) + \int_{\overline \eta(s)}^{s} \overline{\mathcal{A}} ({\partial_{\nu} u})(r,\overline x_r,\overline x_{\overline \eta(r)}) \,{\text{d}} r + \overline M^{{\partial_{\nu} u}}_s - \overline M^{{\partial_{\nu} u}}_{\overline \eta(s)},\\* \mathcal{S} u(s,\overline x_s) & = \mathcal{S} u(\overline \eta(s),\overline x_{\overline \eta(s)}) + \int_{\overline \eta(s)}^{s} \overline{\mathcal{A}} (\mathcal{S} u)(r,\overline x_r,\overline x_{\overline \eta(r)}) \,{\text{d}} r + \overline M^{\mathcal{S} u}_s - \overline M^{\mathcal{S} u}_{\overline \eta(s)}.\end{align*}

Moreover, since $\overline \eta(r) = \overline \eta(s)$ for $r \in [\overline \eta(s),s]$ , we have

\begin{align*}f(\overline x_{\overline \eta(s)}){\partial_{\nu} u}(s,\overline x_s)&=f(\overline x_{\overline \eta(s)}){\partial_{\nu} u}(\overline \eta(s),\overline x_{\overline \eta(s)})\\* &\quad\,+\int_{\overline \eta(s)}^{s} f(\overline x_{\overline \eta(r)})\overline{\mathcal{A}} ({\partial_{\nu} u})(r,\overline x_r,\overline x_{\overline \eta(r)}) \,{\text{d}} r + f(\overline x_{\overline \eta(s)})(\overline M^{{\partial_{\nu} u}}_s - \overline M^{{\partial_{\nu} u}}_{\overline \eta(s)}),\end{align*}

so that

\begin{align*}\overline{\mathcal{A}} u(s,\overline x_s,\overline x_{\overline \eta(s)})&=\overline{\mathcal{A}}u(\overline \eta(s),\overline x_{\overline \eta(s)},\overline x_{\overline \eta(s)}) + \int_{\overline \eta(s)}^s \Upsilon(r,\overline x_r,\overline x_{\overline \eta(r)})\,{\text{d}} r\\* &\quad\, +\overline M^{\partial_t u}_s - \overline M^{\partial_t u}_{\overline \eta(s)} + f(\overline x_{\overline \eta(s)})(\overline M^{{\partial_{\nu} u}}_s - \overline M^{{\partial_{\nu} u}}_{\overline \eta(s)}) + \overline M^{\mathcal{S} u}_s - \overline M^{\mathcal{S} u}_{\overline \eta(s)},\end{align*}

where

(4.7) \begin{equation}\Upsilon(t,x,y) \,:\!= ( \overline{\mathcal{A}} (\partial_t u) + f(y) \overline{\mathcal{A}} ({\partial_{\nu} u}) + \overline{\mathcal{A}} (\mathcal{S} u) )(t,x,y).\end{equation}

Since $\overline{\mathcal{A}}u(t,x,x) = \mathcal{A}u(t,x)$ , the first term in the above equality is 0 by Theorem 4.3. By using Fubini’s theorem and the fact that $(\overline M^{\partial_t u}_t)$ and $(\overline M^{\mathcal{S} u}_t)$ are true martingales, we obtain

\begin{equation*}{\mathbb{E}} \bigg[ \int_0^T \overline M^{\partial_t u}_s - \overline M^{\partial_t u}_{\overline \eta(s)} \,{\text{d}} s \bigg]={\mathbb{E}} \bigg[ \int_0^T \overline M^{\mathcal{S} u}_s - \overline M^{\mathcal{S} u}_{\overline \eta(s)} \,{\text{d}} s \bigg]=0.\end{equation*}

Moreover, since $(\overline M^{{\partial_{\nu} u}}_t)$ is a $\mathcal{F}^{\overline p}_{t}$ -martingale, we have

\begin{equation*}{\mathbb{E}} \bigg[ \int_0^T f(\overline x_{\overline \eta(s)})(\overline M^{{\partial_{\nu} u}}_s - \overline M^{{\partial_{\nu} u}}_{\overline \eta(s)}) \,{\text{d}} s \bigg]= \int_0^T{\mathbb{E}} \Big[ f(\overline x_{\overline \eta(s)}) {\mathbb{E}}[ \overline M^{{\partial_{\nu} u}}_s - \overline M^{{\partial_{\nu} u}}_{\overline \eta(s)} | \mathcal{F}^{\overline p}_{\overline \eta(s)} ] \Big] \,{\text{d}} s=0.\end{equation*}

Collecting the above results, we obtain

\begin{equation*}{\mathbb{E}} [F(\overline x_T)] - {\mathbb{E}} [F(x_T)] ={\mathbb{E}} \bigg[ \int_0^T \int_{\overline \eta(s)}^s \Upsilon(r,\overline x_{r},\overline x_{\overline \eta(r)}) \,{\text{d}} r \,{\text{d}} s \bigg].\end{equation*}

We can compute an explicit form of $\Upsilon$ in terms of u, f, $\lambda$ , Q and their derivatives. Indeed, $\Upsilon$ is given by (4.7), and we have

\begin{align*}\overline{\mathcal{A}}(\partial_t u)(t,x,y)& ={\partial^2_{tt} u}(t,x) + f(y){\partial^2_{t\nu} u}(t,x) + \mathcal{S} (\partial_t u)(t,x),\\* ( f\overline{\mathcal{A}}({\partial_{\nu} u}) )(t,x,y)& =f(y) ( {\partial^2_{t\nu} u}(t,x) + f(y){\partial^2_{\nu\nu} u}(t,x) + \mathcal{S}({\partial_{\nu} u})(t,x) ),\\* \overline{\mathcal{A}}(\mathcal{S} u)(t,x,y)&=\partial_t (\mathcal{S} u)(t,x) + f(y) \partial_{\nu} (\mathcal{S} u)(t,x) + \mathcal{S}(\mathcal{S} u)(t,x).\end{align*}

Application of the Taylor formula to the functions ${\partial^2_{tt} u}$ , ${\partial^2_{t\nu} u}$ , ${\partial^2_{\nu\nu} u}$ , $\mathcal{S} (\partial_t u)$ , $\mathcal{S}({\partial_{\nu} u})$ , $\partial_t (\mathcal{S} u)$ , $\partial_{\nu} (\mathcal{S} u)$ and $\mathcal{S}(\mathcal{S} u)$ at the order 0 around $(\overline \eta(r),\overline x_{\overline \eta(r)})$ yields $\Upsilon(r,\overline x_r,\overline x_{\overline \eta(r)})=\Upsilon(\overline \eta(r),\overline x_{\overline \eta(r)},\overline x_{\overline \eta(r)}) + {\text{O}}(h).$ Setting $\Psi(t,x) = \Upsilon(t,x,x) $ and recalling that for $r \in [\overline \eta(s),s]$ , $\overline \eta(r) = \overline \eta(s)$ and that $|s - \overline \eta(s)| \leq h$ , we obtain

\begin{equation*}{\mathbb{E}} [F(\overline x_T)] - {\mathbb{E}} [F(x_T)] ={\mathbb{E}} \bigg[ \int_0^T (s - \overline \eta(s) )\Psi(\overline \eta(s),\overline x_{\overline \eta(s)}) \,{\text{d}} s \bigg]+ {\text{O}}(h^2).\end{equation*}

Consider the expectation on the right-hand side of the above equality. We decompose the integral into a (finite) sum of integrals over the intervals $[\overline T_n + kh, (\overline T_n + (k+1)h)\wedge \overline T_{n+1} ]$ , on which $\overline \eta(s)$ , and hence $\Psi(\overline \eta(s),\overline x_{\overline \eta(s)})$ , is constant. Without loss of generality, we are led to consider integrals of the form $\int_{kh}^t (s-kh)C \,{\text{d}} s $ for some $k \geq 0$ , $t \in [kh,(k+1)h]$ , and C a bounded (random) constant. We have

\[\int_{kh}^t (s-kh)C \,{\text{d}} s = \dfrac{t - kh}{2} \int_{kh}^tC \,{\text{d}} s{,}\]

and moreover adding and subtracting h in the numerator of $(t - kh)/2$ yields

\begin{equation*}\int_{kh}^t (s-kh)C \,{\text{d}} s=\dfrac{h}{2} \int_{kh}^tC \,{\text{d}} s + \dfrac{t - (k+1)h}{2} \int_{kh}^tC \,{\text{d}} s.\end{equation*}

Since C is bounded we deduce that

\[\int_{kh}^t (s-kh)C \,{\text{d}} s = \dfrac{h}{2} \int_{kh}^tC \,{\text{d}} s + {\text{O}}(h^2).\]

Since $\Psi$ is assumed bounded and ${\mathbb{E}}[\overline N_T] < +\infty$ , the above arguments yield the following representation:

(4.8) \begin{equation}{\mathbb{E}} [F(\overline x_T)] - {\mathbb{E}} [F(x_T)] =\dfrac{h}{2}{\mathbb{E}} \bigg[ \int_0^T \Psi(\overline \eta(s),\overline x_{\overline \eta(s)}) \,{\text{d}} s \bigg]+{\text{O}}(h^2).\end{equation}

Step 2: from the representation (4.8) to the expansion at the order one. In this step we show that

\[{\mathbb{E}} \bigg[ \int_0^T \Psi(\overline \eta(s),\overline x_{\overline \eta(s)}) \,{\text{d}} s \bigg]={\mathbb{E}} \bigg[ \int_0^T \Psi(s,x_{s}) \,{\text{d}} s \bigg]+{\text{O}}(h).\]

First, we introduce the random variables $\overline \Gamma$ and $\Gamma$ defined by

\[\overline \Gamma \,:\!=\int_{0}^{ T} \Psi (\overline \eta(s),\overline x_{\overline \eta(s)}) \,{\text{d}} s\quad\text{and}\quad\Gamma \,:\!=\int_{0}^{ T} \Psi (\overline \eta(s),x_{\overline \eta(s)}) \,{\text{d}} s {,}\]

and write

\begin{equation*}{\mathbb{E}} [ |\overline \Gamma - \Gamma| ]={\mathbb{E}} \big[\mathbb{1}_{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) \leq T} |\overline \Gamma - \Gamma| \big]+{\mathbb{E}} \big[\mathbb{1}_{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) > T} |\overline \Gamma - \Gamma| \big],\end{equation*}

where $\overline \tau^\dagger$ is defined in Definition 3.1. Since $\Psi$ is bounded and $\mathbb{P}(\!\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) \leq T ) = {\text{O}}(h) $ (see the proof of Theorem 3.1), we have

\[{\mathbb{E}} \big[ |\overline \Gamma - \Gamma| \mathbb{1}_{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) \leq T} \big]={\text{O}}(h).\]

Now, recall from (3.1) that, on the event $\{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) > T\}$ , we have $T_k = \overline T_k$ and $\theta_k = \overline \theta_k$ for all $k \geq 1$ such that $T_k \in [0,T]$ . Thus, for all $n \leq \overline N_T$ and for all $s \in [\overline T_n,\overline T_{n+1}[$ we have $\overline x_{\overline \eta(s)} = (\overline \theta_n, \overline \phi_{\overline \theta_n}(\overline \eta(s)-\overline T_n,\overline \nu_n))$ and $ x_{\overline \eta(s)} = (\overline \theta_n, \phi_{\overline \theta_n}(\overline \eta(s)-\overline T_n, \nu_n))$ . Consequently, on the event $\{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) > T \}$ we have

\begin{align*}|\overline \Gamma - \Gamma|\leq\sum_{n=0}^{\overline N_T} \int_{\overline T_n}^{\overline T_{n+1} \wedge T} | \Psi (\overline \eta(s),\overline \theta_n,\overline \phi_{\overline \theta_n}(\overline \eta(s)-\overline T_n,\overline \nu_n)) - \Psi (\overline \eta(s),\overline \theta_n, \phi_{\overline \theta_n}(\overline \eta(s)-\overline T_n, \nu_n)) | \,{\text{d}} s.\end{align*}

From the regularity Assumptions 4.1 and 4.2, the function $\nu \mapsto \Psi(t,\theta,\nu)$ , being a sum of products of bounded Lipschitz functions, is uniformly Lipschitz in $(t,\theta)$ with some constant $L_{\Psi}$ . Thus, from this Lipschitz property and the application of Lemma 2.1, we get

\begin{equation*}| \Psi (\overline \eta(s),\overline \theta_n,\overline \phi_{\overline \theta_n}(\overline \eta(s)-\overline T_n,\overline \nu_n)) - \Psi (\overline \eta(s),\overline \theta_n, \phi_{\overline \theta_n}(\overline \eta(s)-\overline T_n, \nu_n)) |\leq L_{\Psi} C \,{\text{e}}^{LT} (n+1) h.\end{equation*}

From the above inequality, we find that

\[{\mathbb{E}} \big[ \mathbb{1}_{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) > T} |\overline \Gamma - \Gamma| \big] \leq L_{\Psi} C \,{\text{e}}^{LT} T h {\mathbb{E}} [\overline N_T (\overline N_T+1)].\]

Since $\overline N_T \leq N^{*}_T$ and ${\mathbb{E}} [ N^{*}_T (N^{*}_T+1)] < +\infty$ , we conclude that

\[{\mathbb{E}} \big[ \mathbb{1}_{\min(T_{\overline \tau^\dagger},\overline T_{\overline \tau^\dagger}) > T} |\overline \Gamma - \Gamma| \big] = {\text{O}}(h).\]

We have shown that

\[{\mathbb{E}} \bigg[ \int_0^T \Psi(\overline \eta(s),\overline x_{\overline \eta(s)}) \,{\text{d}} s \bigg]={\mathbb{E}} \bigg[ \int_0^T \Psi(\overline \eta(s), x_{\overline \eta(s)}) \,{\text{d}} s \bigg]+{\text{O}}(h).\]

Secondly, from the regularity Assumptions 4.1 and 4.2, the function $(t,\nu) \mapsto \Psi(t,\theta,\nu)$ is uniformly Lipschitz in $\theta$ . Moreover, for all $s \in [0,T]$ there exists $k \geq 0$ such that both s and $\overline \eta(s)$ belong to the same interval $[\overline T_k,\overline T_{k+1}[$ , so that $x_s = (\theta_k,\phi_{\theta_k}(s-\overline T_k,\nu_k))$ and $x_{\overline \eta(s)} = (\theta_k,\phi_{\theta_k}(\overline \eta(s)-\overline T_k,\nu_k))$ . Thus, from the Lipschitz continuity of $\Psi$ , from the fact that $|s-\overline \eta(s)| \leq h$ and since $f_{\theta}$ is uniformly bounded in $\theta$ , we have $|\Psi(s,x_s) - \Psi (\overline \eta(s),x_{\overline \eta(s)})| \leq C h$ , where C is a constant independent of h. Then we obtain

\[\sup_{s \in [0,T]} | {\mathbb{E}} [\Psi(s,x_s)] - {\mathbb{E}} [\Psi (\overline \eta(s),x_{\overline \eta(s)}) ] |\leq C h{,}\]

from which we deduce that

\[\bigg| {\mathbb{E}} \bigg[ \int_{0}^T \Psi (\overline \eta(s),x_{\overline \eta(s)}) \,{\text{d}} s \bigg]-{\mathbb{E}} \bigg[ \int_{0}^T \Psi (s,x_s) \,{\text{d}} s \bigg] \bigg|\leq CT h.\]

Finally, the weak error expansion reads

\begin{equation*}{\mathbb{E}} [F (\overline x_T)] - {\mathbb{E}} [F (x_T)]=\dfrac{h}{2} {\mathbb{E}} \bigg[ \int_{0}^T \Psi (s,x_s) \,{\text{d}} s \bigg] +{\text{O}}(h^2).\end{equation*}

5. Numerical experiment

In this section we use the theoretical results above to apply the MLMC method to the PDMP version of the two-dimensional Morris–Lecar model (shortened to PDMP 2d-ML).

5.1. The PDMP two-dimensional Morris–Lecar

The deterministic Morris–Lecar model was introduced in 1981 by Catherine Morris and Harold Lecar in [Reference Morris and Lecar23] to explain the dynamics of the barnacle muscle fibre. This model belongs to the family of conductance-based models (like the Hodgkin–Huxley model [Reference Hodgkin and Huxley19]) and takes the following form:

(5.1) \begin{equation}\begin{aligned} \dfrac{{\text{d}} v}{{\text{d}} t} & = \dfrac{1}{C} ( I - g_{\text{Leak}} (v - V_{\text{Leak}}) - g_{\text{Ca}} M_{\infty}(v) (v - V_{\text{Ca}}) - g_{\text{K}} n (v - V_{\text{K}}) ),\\\dfrac{{\text{d}} n}{{\text{d}} t} & = (1 - n) \alpha_{\text{K}}(v) - n \beta_{\text{K}}(v),\end{aligned} \end{equation}

where

\begin{align*} M_{\infty}(v) & = (1 + \tanh[ (v-V_1 )/ V_2 ] )/2,\\[3pt] \alpha_{\text{K}}(v) & = \lambda_{\text{K}}(v) N_{\infty}(v),\\[3pt] \beta_{\text{K}}(v) & = \lambda_{\text{K}}(v) (1- N_{\infty}(v)),\\[3pt] N_{\infty}(v) & = (1 + \tanh[ (v-V_3 )/ V_4 ] )/2,\\[3pt] \lambda_{\text{K}}(v) & = \overline \lambda_{\text{K}} \cosh((v-V_3 )/ 2V_4 ). \end{align*}

In this section we consider the PDMP version of (5.1), which we denote by $(x_t, t \in [0,T])$ , $T > 0$ , whose characteristics $(\,f,\lambda,Q)$ are given by

\begin{gather*} f(\theta,\nu) = \dfrac{1}{C} \bigg( I - g_{\text{Leak}} (\nu - V_{\text{Leak}}) - g_{\text{Ca}} M_{\infty}(\nu) (\nu - V_{\text{Ca}}) - g_{\text{K}} \dfrac{\theta}{N_{\text{K}}} (\nu - V_{\text{K}}) \bigg),\\* \lambda(\theta,\nu) = (N_{\text{K}} - \theta) \alpha_{\text{K}}(\nu) + \theta \beta_{\text{K}}(\nu), \\* Q( (\theta,\nu), \{\theta +1\} ) = \dfrac{(N_{\text{K}} - \theta) \alpha_{\text{K}}(\nu)}{\lambda(\theta,\nu)}, \quad Q ( (\theta,\nu), \{\theta - 1\} ) = \dfrac{\theta \beta_{\text{K}}(\nu)}{\lambda(\theta,\nu)}.\end{gather*}

The state space of the model is $E = \{0,\ldots,N_{\text{K}}\} \times \mathbb{R}$ where $N_{\text{K}} \geq 1$ stands for the number of potassium gates. The values of the parameters used in the simulations are $V_1 = -1.2$ , $V_2 = 18$ , $V_3 = 2$ , $V_4 = 30$ , $\overline \lambda_{\text{K}} = 0.04$ , $C = 20$ , $g_{\text{Leak}} = 2$ , $V_{\text{Leak}} = -60$ , $g_{\text{Ca}} = 4.4$ , $V_{\text{Ca}} = 120$ , $g_{\text{K}} = 8$ , $V_{\text{K}} = -84$ , $I = 60$ , $N_{\text{K}} = 100$ . See Figure 1.

Figure 1. Ten trajectories of the characteristics of the PDMP 2d-ML on [0,100]. (a) Membrane potential as a function of time: red curves, stochastic potential; black curve, deterministic potential. (b) Proportion of opened gates as a function of time: red curves, stochastic gates ( $\theta/N_{\text{K}}$ ); black curve, deterministic gates (n). (c) Probability of opening a gate ( $Q(x_t, \{\theta_t+1\})$ ) as a function of time. (d) Jump rate ( $\lambda(x_t)$ ) as a function of t.
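The characteristics $(\,f,\lambda,Q)$ above translate directly into code. The following C++ sketch is a plain transcription of these formulas with the parameter values listed above hard-coded; it is an illustration only and not the implementation used for the simulations below.

```cpp
#include <cmath>
#include <iostream>

// Parameters of the PDMP 2d-ML as listed above (values used in the simulations).
namespace ml {
    const double V1 = -1.2, V2 = 18.0, V3 = 2.0, V4 = 30.0;
    const double lambdaKbar = 0.04;        // \overline{\lambda}_K
    const double C = 20.0, gLeak = 2.0, VLeak = -60.0;
    const double gCa = 4.4, VCa = 120.0, gK = 8.0, VK = -84.0;
    const double I = 60.0;
    const int    NK = 100;                 // number of potassium gates N_K

    double Minf(double v)   { return 0.5 * (1.0 + std::tanh((v - V1) / V2)); }
    double Ninf(double v)   { return 0.5 * (1.0 + std::tanh((v - V3) / V4)); }
    double lamK(double v)   { return lambdaKbar * std::cosh((v - V3) / (2.0 * V4)); }
    double alphaK(double v) { return lamK(v) * Ninf(v); }
    double betaK(double v)  { return lamK(v) * (1.0 - Ninf(v)); }

    // Vector field f(theta, nu): drift of the membrane potential between jumps.
    double f(int theta, double nu) {
        return (I - gLeak * (nu - VLeak) - gCa * Minf(nu) * (nu - VCa)
                  - gK * (static_cast<double>(theta) / NK) * (nu - VK)) / C;
    }
    // Jump rate lambda(theta, nu).
    double lambda(int theta, double nu) {
        return (NK - theta) * alphaK(nu) + theta * betaK(nu);
    }
    // Transition probabilities Q((theta,nu), {theta+1}) and Q((theta,nu), {theta-1}).
    double qUp(int theta, double nu)   { return (NK - theta) * alphaK(nu) / lambda(theta, nu); }
    double qDown(int theta, double nu) { return theta * betaK(nu) / lambda(theta, nu); }
}

int main() {
    // Evaluate the characteristics at an arbitrary state (theta, nu) = (50, -30).
    std::cout << "f      = " << ml::f(50, -30.0) << '\n'
              << "lambda = " << ml::lambda(50, -30.0) << '\n'
              << "Q(+1)  = " << ml::qUp(50, -30.0) << '\n'
              << "Q(-1)  = " << ml::qDown(50, -30.0) << '\n';
}
```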

5.2. Classical and multilevel Monte Carlo estimators

In this subsection we introduce the classical and multilevel Monte Carlo estimators in order to estimate the quantity ${\mathbb{E}} [ F(x_T) ]$ , where $(x_t, t \in [0,T])$ is the PDMP 2d-ML and $F(\theta,\nu) = \nu$ for $(\theta,\nu) \in E$ so that $F(x_T)$ gives the value of the membrane potential at time T. Note that other possible choices are $F(\theta,\nu) = \nu^n$ or $F(\theta,\nu) = \theta^n$ for some $n \geq 2$ . In those cases, the quantity ${\mathbb{E}} [ F(x_T) ]$ gives the moments of the membrane potential or the number of open gates at time T so that we can compute statistics on these biological variables.

Let $X \,:\!= F(x_T)$ . It will be convenient below to emphasize the dependence of the Euler scheme $(\overline x_t)$ on a time step h. We introduce a family of random variables $(X_h, h > 0)$ defined by $X_h \,:\!= F(\overline x_T)$ , where for a given $h>0$ the corresponding PDP $(\overline x_t)$ is constructed as in Section 2.3 with time step h. In particular, the processes $(\overline x_t)$ for $h >0$ are correlated through the same randomness $(U_k)$ , $(V_k)$ , and $(N^{*}_t)$ . We build a classical Monte Carlo estimator of ${\mathbb{E}}[X]$ based on the family $(X_h, h > 0)$ as follows:

(5.2) \begin{equation}Y^{\text{MC}} = \dfrac{1}{N} \sum_{k=1}^N X_h^{k},\end{equation}

where $(X_h^{k},k \geq 1)$ is an i.i.d. sequence of random variables distributed like $X_h$ . The parameters $h>0$ and $N \in \mathbb{N}$ have to be determined. We build a multilevel Monte Carlo estimator based on the family $(X_h, h > 0)$ as follows:

(5.3) \begin{equation}Y^{\text{MLMC}} = \dfrac{1}{N_1} \sum_{k=1}^{N_1} X_{h^{*}}^{k} + \sum_{l=2}^L \dfrac{1}{N_l} \sum_{k=1}^{N_l} (X_{h_l}^{k} - X_{h_{l-1}}^{k}),\end{equation}

where $( (X_{h_l}^{k}, X_{h_{l-1}}^{k}), k \geq 1 )$ for $l=2,\ldots,L$ are independent sequences of independent copies of the pair $(X_{h_l}, X_{h_{l-1}})$ and independent of the i.i.d. sequence $(X_{h^{*}}^{k}, k \geq 1)$ . The parameter $h^{*}$ is a free parameter that we fix in Section 5.4. The parameters $L \geq 2$ , $M \geq 2$ , $N \geq 1$ and $q = (q_1, \ldots, q_L) \in ]0,1[^L$ with $\sum_{l=1}^L q_l = 1$ have to be determined; then we set $N_l \,:\!= \lceil N q_l\rceil$ , $h_l \,:\!= h^{*} M^{-(l-1)}$ .

We also set $\tilde X \,:\!= F(\tilde x_T) \tilde R_T $ , where $\tilde R_T$ is defined as in Proposition 2.2 with an intensity $\tilde \lambda$ and a kernel $\tilde Q$ that will be specified in Section 5.4, and let $(\tilde X_h, h > 0)$ be such that $\tilde X_h \,:\!= F(\underline{\tilde x}_T) \underline{\tilde R}_T$ for all $h >0$ . By Proposition 2.2 we have ${\mathbb{E}}[X] = {\mathbb{E}}[\tilde X]$ and ${\mathbb{E}}[X_h] = {\mathbb{E}}[\tilde X_h]$ for $h >0$ . Consequently, we build likewise a multilevel estimator $\tilde Y^{\text{MLMC}}$ based on the family $(\tilde X_h, h > 0)$ .
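For concreteness, the following C++ sketch shows how the levels of the estimator (5.3) are combined. The per-sample simulators simulateX and simulateCoupledPair are hypothetical placeholders standing for the thinning-based construction of Section 2.3, in which the two members of a pair are driven by the same randomness $(U_k)$ , $(V_k)$ , and $(N^{*}_t)$ ; the toy definitions used here (a Gaussian variable with an ${\text{O}}(h)$ bias and shared noise) only make the sketch self-contained and are not the PDMP scheme. The parameter values in main are likewise illustrative.

```cpp
#include <cmath>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// Toy stand-ins (NOT the thinning scheme of Section 2.3): X_h = Z + h*(1 + W), with the
// Gaussian variables Z and W shared by the two members of a coupled pair, so that the
// fine and coarse levels are strongly correlated.
double simulateX(double h, std::mt19937_64& gen) {
    std::normal_distribution<double> g(0.0, 1.0);
    double Z = g(gen), W = g(gen);
    return Z + h * (1.0 + W);
}

std::pair<double, double> simulateCoupledPair(double hFine, double hCoarse,
                                              std::mt19937_64& gen) {
    std::normal_distribution<double> g(0.0, 1.0);
    double Z = g(gen), W = g(gen);                  // common randomness for both levels
    return {Z + hFine * (1.0 + W), Z + hCoarse * (1.0 + W)};
}

// Multilevel estimator (5.3): first level with step hStar, then L-1 correction levels,
// the step being refined by a factor M at each level and N_l = ceil(N * q_l).
double mlmcEstimator(double hStar, int L, int M, long N,
                     const std::vector<double>& q, std::mt19937_64& gen) {
    double estimate = 0.0;
    for (int l = 1; l <= L; ++l) {
        long Nl = static_cast<long>(std::ceil(N * q[l - 1]));
        double hl = hStar * std::pow(static_cast<double>(M), -(l - 1)); // h_l = h* M^{-(l-1)}
        double levelSum = 0.0;
        for (long k = 0; k < Nl; ++k) {
            if (l == 1) {
                levelSum += simulateX(hl, gen);                  // first level X_{h*}
            } else {
                auto [fine, coarse] = simulateCoupledPair(hl, hl * M, gen);
                levelSum += fine - coarse;                       // correction X_{h_l} - X_{h_{l-1}}
            }
        }
        estimate += levelSum / static_cast<double>(Nl);
    }
    return estimate;
}

int main() {
    std::mt19937_64 gen(2020);
    std::vector<double> q = {0.7, 0.2, 0.1};        // illustrative allocation, sum = 1
    double y = mlmcEstimator(0.1, 3, 4, 10000, q, gen);
    std::cout << "MLMC estimate of E[X]: " << y << '\n';
}
```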

The complexity of the classical Monte Carlo estimator $Y^{\text{MC}}$ depends on the parameters (h, N) and that of the multilevel estimators $Y^{\text{MLMC}}$ and $\tilde Y^{\text{MLMC}}$ depends on (L, q, N). In order to compare those estimators we proceed as in [Reference Lemaire and Pagés21] (see also [Reference Pagés24]), that is to say, for each estimator we determine the parameters which minimize the global complexity (or cost) subject to the constraint that the resulting $L^2$ -error must be lower than a prescribed $\epsilon >0$ .

As in [Reference Lemaire and Pagés21], we let $V_1$ , $c_1$ , $\alpha$ , $\beta$ , and $\text{Var}(X)$ be the structural parameters associated with the family $(X_h,h > 0)$ and X. We know theoretically from Theorem 3.1 (strong estimate) and Theorem 4.1 (weak expansion) that $(\alpha,\beta) = (1,1)$ , whereas $V_1$ , $c_1$ , and $\text{Var}(X)$ are not explicit (we explain how we estimate them in Section 5.3). Moreover, the structural parameters $\tilde V_1$ , $\tilde c_1$ , $\tilde \alpha$ , $\tilde \beta$ , and $\text{Var}(\tilde X)$ associated with $(\tilde X_h, h > 0)$ and $\tilde X$ are such that $\tilde \alpha = \alpha$ , $\tilde c_1 = c_1$ (see (4.4)), $\tilde \beta = 2$ (see Theorem 3.2), and $\tilde V_1$ , $\text{Var}(\tilde X)$ are not explicit.

The classical and multilevel estimators defined above are linear and of Monte Carlo type in the sense described in [Reference Lemaire and Pagés21]. The optimal parameters of those estimators are then expressed in terms of the corresponding structural parameters as follows (see [Reference Lemaire and Pagés21] or [Reference Pagés24]). For a user-prescribed $\epsilon > 0$ , the classical Monte Carlo parameters h and N are

(5.4) \begin{equation}h(\epsilon) = (1 + 2\alpha)^{-1/(2\alpha)} \bigg( \dfrac{\epsilon}{|c_1|} \bigg)^{1/\alpha},\quad N(\epsilon) = \bigg( 1 + \dfrac{1}{2\alpha} \bigg) \dfrac{\text{Var}(X) ( 1 + \rho h^{\beta/2}(\epsilon) )^2}{\epsilon^2},\end{equation}

where $\rho = \sqrt{V_1 / \text{Var}(X)}$ . The parameters of the estimator $Y^{\text{MLMC}}$ are given in Table 1, where $n_l \,:\!= M^{l-1}$ for $l = 1, \ldots, L$ with the convention $n_0 = n_0^{-1} = 0$ . The parameters of $\tilde Y^{\text{MLMC}}$ are given in a similar way using $\tilde V_1$ , $\tilde \beta$ , and $\text{Var}(\tilde X)$ . Finally, the parameter $M(\epsilon)$ is determined as in [Reference Lemaire and Pagés21, Section 5.1].

Table 1: Optimal parameters for the MLMC estimator (5.3).
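As an illustration of (5.4), the helper below computes $h(\epsilon)$ and $N(\epsilon)$ from the structural parameters; it is a direct transcription of the formulas, and the values used in main are those estimated later ( $c_1 = 4.58$ , $V_1 = 7.25$ , $\text{Var}(X) = 335$ , $(\alpha,\beta)=(1,1)$ ). The optimal MLMC parameters of Table 1 would be obtained analogously but are not reproduced here.

```cpp
#include <cmath>
#include <iostream>

// Optimal parameters (5.4) of the classical Monte Carlo estimator, as functions of the
// prescribed precision epsilon and the structural parameters (alpha, beta, c1, V1, Var(X)).
struct McParams { double h; long N; };

McParams optimalMcParams(double eps, double alpha, double beta,
                         double c1, double V1, double varX) {
    double h   = std::pow(1.0 + 2.0 * alpha, -1.0 / (2.0 * alpha))
               * std::pow(eps / std::fabs(c1), 1.0 / alpha);
    double rho = std::sqrt(V1 / varX);
    double N   = (1.0 + 1.0 / (2.0 * alpha)) * varX
               * std::pow(1.0 + rho * std::pow(h, beta / 2.0), 2.0) / (eps * eps);
    return {h, static_cast<long>(std::ceil(N))};
}

int main() {
    // Structural parameters estimated in Sections 5.3-5.4.
    for (int k = 1; k <= 5; ++k) {
        double eps = std::pow(2.0, -k);
        McParams p = optimalMcParams(eps, 1.0, 1.0, 4.58, 7.25, 335.0);
        std::cout << "eps = " << eps << "  h = " << p.h << "  N = " << p.N << '\n';
    }
}
```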

5.3. Methodology

We compare the classical and multilevel Monte Carlo estimators in terms of precision, CPU-time, and complexity. The precision of an estimator Y is defined by the $L^2$ -error

\[\| Y - {\mathbb{E}}[X] \|_2 = \sqrt{ ({\mathbb{E}}[Y] - {\mathbb{E}}[X])^2 + \text{Var}(Y) }{,}\]

also known as the root mean square error (RMSE). The CPU-time represents the time needed to compute one realization of an estimator. The complexity is defined as the number of time steps involved in the simulation of an estimator. Let Y denote the estimator (5.2) or (5.3). We estimate the bias of Y by

\begin{equation*}\widehat b_R = \dfrac{1}{R} \sum_{k=1}^R Y^{k} - {\mathbb{E}} [X],\end{equation*}

where $Y^{1},\ldots,Y^{R}$ are R independent replications of the estimator. We estimate the variance of Y by

\begin{equation*}\widehat v_R = \dfrac{1}{R} \sum_{k=1}^R v^{k},\end{equation*}

where $v^{1},\ldots,v^{R}$ are R independent replications of v, the empirical variance of Y. In the case where Y is the crude Monte Carlo estimator, we set

\begin{equation*}v = \dfrac{1}{N(N-1)} \sum_{k=1}^N (X_h^{k} - m_N)^2,\quad m_N = \dfrac{1}{N} \sum_{k=1}^N X_h^{k}.\end{equation*}

If Y is the MLMC estimator, we set

\begin{equation*}v = \dfrac{1}{N_1 (N_1 -1)} \sum_{k=1}^{N_1} (X_{h^{*}}^{k} - m^{(1)}_{N_1})^2 + \sum_{l = 2}^L \dfrac{1}{N_l (N_l-1)} \sum_{k=1}^{N_l} (X_{h_l}^{k}-X_{h_{l-1}}^{k} - m^{(l)}_{N_l})^2,\end{equation*}

where

\[m^{(1)}_{N_1} = \dfrac{1}{N_1} \sum_{k=1}^{N_1} X_{h^{*}}^{k}{,}\]

and for $l \geq 2$ ,

\[m^{(l)}_{N_l} = \dfrac{1}{N_l} \sum_{k=1}^{N_l} (X_{h_l}^{k}-X_{h_{l-1}}^{k}).\]

Then we define the empirical RMSE $\widehat \epsilon_R$ by

(5.5) \begin{equation}\widehat \epsilon_R = \sqrt{ \widehat b_R^2 + \widehat v_R }.\end{equation}
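In practice (5.5) is computed as in the following C++ sketch, where the vectors y and v contain the R realizations $Y^{k}$ of the estimator and the corresponding empirical variances $v^{k}$ , and refMean is the reference value of ${\mathbb{E}}[X]$ described below; the function is a direct transcription of the formulas above.

```cpp
#include <cmath>
#include <vector>

// Empirical RMSE (5.5) from R independent replications of an estimator Y:
//   y[k]    : k-th realization Y^k of the estimator,
//   v[k]    : k-th realization v^k of its empirical variance,
//   refMean : reference value of E[X] (estimated separately, see below).
double empiricalRmse(const std::vector<double>& y, const std::vector<double>& v,
                     double refMean) {
    const std::size_t R = y.size();
    double meanY = 0.0, meanV = 0.0;
    for (std::size_t k = 0; k < R; ++k) { meanY += y[k]; meanV += v[k]; }
    meanY /= R;                      // (1/R) sum_k Y^k
    meanV /= R;                      // hat v_R
    double bias = meanY - refMean;   // hat b_R
    return std::sqrt(bias * bias + meanV);
}
```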

The numerical computation of (5.5) for both estimators (5.2) and (5.3) requires computation of the optimal parameters given by (5.4) and in Table 1 of Section 5.2, which are expressed in terms of the structural parameters $c_1$ , $V_1$ , and $\text{Var}(X)$ . Moreover, computation of the bias requires the value ${\mathbb{E}}[X]$ . Since there is no closed formula for the mean and variance of X, we estimate them using a crude Monte Carlo estimator with $h=10^{-5}$ and $N=10^6$ . The constants $c_1$ and $V_1$ are not explicit; we use the same estimator of $V_1$ as in [Reference Lemaire and Pagés21, Section 5.1], that is,

(5.6) \begin{equation}\widehat{V_1} = (1 + M^{-\beta/2})^{-2} h^{- \beta} {\mathbb{E}} [ |X_h - X_{h/M}|^2 ],\end{equation}

and we use the following estimator of $c_1$ :

(5.7) \begin{equation}\widehat{c_1} = (1- M^{- \alpha} )^{-1} h^{-\alpha} {\mathbb{E}} [ X_{h/M} - X_h].\end{equation}

The estimator of $c_1$ is obtained by writing the weak error expansion for the two time steps h and $h/M$ , summing and neglecting the ${\text{O}}(h^2)$ term. In (5.6) we use $(h,M)=(0.1,4)$ and in (5.7) we use $(h,M) = (1,4)$ , and the expectations are estimated using a classical Monte Carlo of size $N=10^4$ on $(X_{h/M},X_h)$ . We emphasize that we are interested in the order of $c_1$ and $V_1$ , so we do not need a precise estimate here.
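A possible implementation of (5.6) and (5.7) replaces the two expectations by empirical means over the N simulated pairs $(X_h^{k}, X_{h/M}^{k})$ , as in the following sketch (a transcription of the formulas; the sample containers are assumed to be filled by the simulation routine).

```cpp
#include <cmath>
#include <vector>

// Estimators (5.6) and (5.7) of V1 and c1, from N paired samples (xH[k], xHM[k]) of
// (X_h, X_{h/M}) simulated with the same randomness.
double estimateV1(const std::vector<double>& xH, const std::vector<double>& xHM,
                  double h, double M, double beta) {
    double m2 = 0.0;
    for (std::size_t k = 0; k < xH.size(); ++k) m2 += (xH[k] - xHM[k]) * (xH[k] - xHM[k]);
    m2 /= static_cast<double>(xH.size());              // E[|X_h - X_{h/M}|^2]
    return m2 / (std::pow(1.0 + std::pow(M, -beta / 2.0), 2.0) * std::pow(h, beta));
}

double estimateC1(const std::vector<double>& xH, const std::vector<double>& xHM,
                  double h, double M, double alpha) {
    double m1 = 0.0;
    for (std::size_t k = 0; k < xH.size(); ++k) m1 += xHM[k] - xH[k];
    m1 /= static_cast<double>(xH.size());              // E[X_{h/M} - X_h]
    return m1 / ((1.0 - std::pow(M, -alpha)) * std::pow(h, alpha));
}
```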

5.4. Numerical results

In this subsection we first illustrate the results of Theorems 3.1 and 3.2 on the Morris–Lecar PDMP and then compare the MC and MLMC estimators. The simulations were carried out on a computer with an Intel Core i5-4300U CPU @ 1.90GHz $\times$ 4 processor. The code is written in C++. We implement the estimator $\tilde Y^{\text{MLMC}}$ (see Section 5.2) for the following choices of the parameters $(\tilde \lambda, \tilde Q)$ .

Case 1: $\tilde \lambda(\theta) = 1$ and

\[\tilde Q( \theta, \{\theta +1\} ) = \dfrac{N_{\text{K}} - \theta}{N_{\text{K}}},\quad\tilde Q ( \theta, \{\theta - 1\} ) = \dfrac{\theta}{N_{\text{K}}}.\]

Case 2: $\tilde \lambda(x,t) = \lambda(\theta, v(t))$ and $\tilde Q((x,t), \,{\text{d}} y) = Q((\theta,v(t)), \,{\text{d}} y)$ , where v denotes the first component of the solution of (5.1).

Cases 1 and 2 correspond to the application of Proposition 2.2. Based on Corollary 2.2, we also consider the following case.

Case 3: Consider the quantity ${\mathbb{E}} [F(x_T) - F(\tilde x_T)]$ , where $(x_t)$ and $(\tilde x_t)$ are PDPs with characteristics $(\Phi,\lambda,Q)$ and $(\tilde \Phi,\lambda,Q)$ respectively. By Corollary 2.2, we have ${\mathbb{E}}[F(\tilde x_T)] = {\mathbb{E}}[F(y_T) \tilde R_T]$ , where $(y_t)$ is a PDP whose discrete component jumps in the same states and at the same times as the discrete component of $(x_t)$ , and $(\tilde R_t)$ is the corresponding corrective process. Thus, we consider the quantity ${\mathbb{E}} [F(x_T) - F(y_T) \tilde R_T]$ instead of ${\mathbb{E}} [F(x_T) - F(\tilde x_T)]$ .

Case 3 implies using the following MLMC estimator, which is slightly different from (5.3):

\begin{equation*}\tilde Y^{\text{MLMC}} = \dfrac{1}{N_1} \sum_{k=1}^{N_1} X_{h^{*}}^{k} + \sum_{l=2}^L \dfrac{1}{N_l} \sum_{k=1}^{N_l} (X_{h_l}^{k} - \tilde X_{h_{l-1}}^{k}),\end{equation*}

where $( (X_{h_l}^{k}, \tilde X_{h_{l-1}}^{k}), k \geq 1 )$ for $l=2,\ldots,L$ are independent sequences of independent copies of the pair $(X_{h_l}, \tilde X_{h_{l-1}}) = (F(\overline x_T),F(\overline y_T)\tilde R_T)$ . Here $(\overline y_t)$ is a PDP whose discrete component jumps to the same states and at the same times as the discrete component of the Euler scheme $(\overline x_t)$ with time step $h_l$ , whose deterministic motions are given by the approximate flows with time step $h_{l-1}$ , and $(\tilde R_t)$ is the corresponding corrective process (see Corollary 2.2).

Figure 2 confirms numerically that ${\mathbb{E}} [ | X_{h_l} - X_{h_{l-1}} |^2 ] = {\text{O}}(h_l)$ and that ${\mathbb{E}}[ |\tilde X_{h_l} - \tilde X_{h_{l-1}}|^2 ] = {\text{O}}(h_l^2)$ for Cases 1, 2, and 3 (see Theorems 3.1 and 3.2 respectively). Indeed, for $T=10$ (see Figure 2(a)), we observe that the curve corresponding to the decay of ${\mathbb{E}}[ | X_{h_l} - X_{h_{l-1}}|^2 ]$ as l increases is approximately parallel to a line of slope $-1$ and that the curves corresponding to the decay of ${\mathbb{E}}[ |\tilde X_{h_l} - \tilde X_{h_{l-1}}|^2 ]$ in Cases 1, 2, and 3 are parallel to a line of slope $-2$ . We also see that the curves corresponding to Cases 2 and 3 are approximately similar, and that for some value of l those curves go below the one corresponding to ${\mathbb{E}}[ | X_{h_l} - X_{h_{l-1}}|^2 ]$ . The curve corresponding to Case 1 is always above all the others; this indicates that the $L^2$ -error (or the variance) in Case 1 is too big (with respect to the others) and that is why we do not consider this case below. As T increases (see Figures 2(b) and 2(c)), the theoretical order of the numerical schemes is still observed. However, for $T=20$ , a slight difference begins to emerge between Cases 2 and 3 (Case 3 being better) and this difference is accentuated for $T=30$ , so we do not represent Case 2.

Figure 2. Plots (a), (b), and (c) show the decay of ${\mathbb{E}} [(X_{h_l} - X_{h_{l-1}})^2]$ and ${\mathbb{E}}[(\tilde X_{h_l} - \tilde X_{h_{l-1}})^2]$ (y-axis, $\log_M$ scale) as a function of l with $h_l = h \times M^{-(l-1)}$ , $h=1$ , $M=4$ , for different values of the final time T: (a) $T=10$ , (b) $T=20$ , (c) $T=30$ . For a visual guide, we added black solid lines with slopes $-1$ and $-2$ .

For the Monte Carlo simulations we set $T=30$ , $\lambda^{*}=10$ , and the time step at the first level of the MLMC is set to $h^{*}= 0.1$ . We choose this value for $h^{*}$ because it corresponds (on average) to the length of an interval $[T^{*}_n,T^{*}_{n+1}]$ between two successive jump times of the auxiliary Poisson process $(N^{*}_t)$ . The estimation of the true value and variance gives ${\mathbb{E}}[X] = -31.4723$ and $\text{Var}(X) = 335$ . Note that $v(30) = -35.3083$ , where v is the deterministic membrane potential solving (5.1), so there is an offset between the deterministic potential and the mean of the stochastic potential. We replicate the simulation of the classical and multilevel estimators 100 times to compute the empirical RMSE, so that $R=100$ in (5.5).
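The empirical error quantities reported in the tables below can be computed from the $R=100$ replications as in the following sketch. The exact convention of (5.5) is not restated here; the decomposition used is the standard one, with the reference value ${\mathbb{E}}[X]$ taken from the estimation above, and `mlmc_replications` is a hypothetical array of estimator realizations.

```python
import numpy as np

def empirical_error(replications, reference):
    """Empirical RMSE, bias and variance over R independent replications of an estimator."""
    Y = np.asarray(replications, dtype=float)
    rmse = np.sqrt(np.mean((Y - reference) ** 2))   # empirical RMSE     (hat epsilon_R)
    bias = np.mean(Y) - reference                   # empirical bias     (hat b_R)
    var = np.var(Y, ddof=1)                         # empirical variance (hat v_R)
    return rmse, bias, var

# e.g. rmse, bias, var = empirical_error(mlmc_replications, -31.4723)
```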

The results of the Monte Carlo simulations are shown in Table 2 for the classical Monte Carlo estimator $Y^{\text{MC}}$ and in Tables 3 and 4 for the multilevel estimators $Y^{\text{MLMC}}$ and $\tilde Y^{\text{MLMC}}$ (Case 3). As an example, the first line of Table 3 reads as follows: for a user-prescribed $\epsilon= 2^{-1} = 0.5$ , the MLMC estimator $Y^{\text{MLMC}}$ is implemented with $L = 2$ levels, the time step at the first level is $h^{*} = 0.1$ , this time step is refined by a factor $n_l = M^{l-1}$ with $M=2$ at each level, and the sample size is $N = 2600$ . For these parameters, the numerical complexity of the estimator is $\text{Cost}(Y^{\text{MLMC}} ) = 28\,200$ , the empirical RMSE is $\widehat \epsilon_{100} = 0.389$ , and the computational time of one realization of $Y^{\text{MLMC}}$ is $0.362$ seconds. We also report the empirical bias $\widehat b_{100}$ and the empirical variance $\widehat v_{100}$ in view of (5.5).

Table 2: Results and parameters of the Monte Carlo estimator $Y^{\text{MC}}$ . Estimated values of the structural parameters: $c_1=4.58$ , $V_1=7.25$ .

Table 3: Results and parameters of the multilevel Monte Carlo estimator $ Y^{\text{MLMC}}$ . Estimated values of the structural parameters: $c_1=4.58$ , $V_1=7.25$ .

Table 4: Results and parameters of the multilevel Monte Carlo estimator $ \tilde Y^{\text{MLMC}}$ (Case 3). Estimated values of the structural parameters: $\tilde c_1=3.91$ , $\tilde V_1=34.1$ .

The results indicate that MLMC outperforms classical MC. More precisely, for the largest values of $\epsilon$ (i.e. $k=1,2,3$ ) the complexity and CPU-time of the classical and multilevel MC estimators are of the same order. As $\epsilon$ decreases (i.e. as k increases), the difference in complexity and CPU-time between classical and multilevel MC increases. Indeed, for $k=5$ the complexity of the estimator $Y^{\text{MC}}$ is approximately 13 times that of $Y^{\text{MLMC}}$ and 19 times that of $\tilde Y^{\text{MLMC}}$ . The same behaviour appears in the complexity ratio $\text{Cost}(Y^{\text{MLMC}})/\text{Cost}(\tilde Y^{\text{MLMC}})$ as $\epsilon$ decreases, although the gap between the two MLMC estimators grows more slowly than the gap between the MC and MLMC estimators. Recall that the computational benefit of MLMC over MC grows as the prescribed $\epsilon$ decreases.

Both the classical and multilevel estimators yield an empirical RMSE close to the prescribed precision (see Tables 2, 3, and 4), so the choice of parameters is well suited. For readability, Figures 3(a) and 3(b) display the ratios of the complexities and of the CPU-times of the three estimators $Y^{\text{MC}}$ , $Y^{\text{MLMC}}$ , and $\tilde Y^{\text{MLMC}}$ as functions of $\epsilon$ .

Figure 3. (a) Ratio of the complexities and (b) ratio of the CPU-times with respect to the complexity and CPU-time of the estimator $\tilde Y^{\text{MLMC}}$ as a function of the prescribed $\epsilon$ ( $\log_2$ scale for the x-axis, $\log$ scale for the y-axis).
