1. Introduction
Sampling paths of a diffusion process remains a challenging problem in applied probability. The major bottleneck is that its finite-dimensional distributions are seldom available in closed form, and one must often resort to time-discretized numerical approximations. These approximations, however, induce bias and approximation errors that are difficult to quantify. Moreover, reducing such errors requires refining the time-discretization grid, which, in turn, increases computational costs. In this context, exact simulation algorithms, which aim to recover samples from the true finite-dimensional distributions of a diffusion, have become increasingly popular.
The standard approach to exact simulation of diffusions is based on the family of exact rejection algorithms, which rely on an acceptance–rejection scheme that requires samples from a candidate diffusion only at a finite collection of (random) time points in order to take a decision. The candidate needs to be such that it can be simulated without approximation, and it needs to allow for the construction of the acceptance–rejection probability by means of a Girsanov transformation of measures; see [Reference Karatzas and Shreve34]. Once a candidate is accepted, the algorithm returns a skeleton of the target path, and the remaining segments can be sampled at any other time instant by simulating from suitable diffusion bridges and with no further reference to the unknown target distribution.
In their seminal paper, Beskos and Roberts [Reference Beskos and Roberts5] present an exact rejection algorithm for simulation of paths of a class of one-dimensional diffusions, which requires imposing some boundedness assumptions on the drift of the target diffusion and its derivative. Acknowledged as too restrictive, these assumptions are relaxed to one-sided bounds in a second publication [Reference Beskos, Papaspiliopoulos and Roberts3], and a further extension based on a layered Brownian bridge construction provides an exact simulation method for boundary crossing and hitting times [Reference Beskos, Papaspiliopoulos and Roberts4]. Other extensions include algorithms that allow for simulation of killed diffusions and have applications to double-barrier option pricing problems [Reference Casella and Roberts9], and a localized exact algorithm that relaxes any boundedness assumptions by considering smaller pieces of the target path and can be used to simulate diffusions with boundaries [Reference Chen and Huang10]. Another interesting approach is that presented in [Reference Pollock, Johansen and Roberts40], where the authors provide an exact rejection algorithm for jump diffusions.
The exact rejection algorithms presented above have two main downsides: they impose rather strong assumptions on the drift, and they require the Lamperti transformation (see [Reference Kloeden and Platen37]) to obtain a unit volatility coefficient, which hinders generalization to the multivariate case. To overcome these issues, alternative techniques have been proposed, such as the one in [Reference Blanchet and Zhang6], which introduces an exact algorithm for simulation of multivariate diffusions based on tolerance-enforced simulation and rough path analysis. This algorithm overcomes the more restrictive assumptions required in [Reference Beskos and Roberts5] and [Reference Beskos, Papaspiliopoulos and Roberts3] but has, admittedly, infinite expected running time.
Another restrictive feature of exact rejection algorithms is that they rely heavily on the availability of suitable candidates (in all cases mentioned above, Brownian motion or slight modifications thereof). For diffusions with finite boundaries, for example, Brownian candidates may either differ too much from the target, thus providing low acceptance probabilities, or be unsuitable to construct the acceptance–rejection probability itself. Rejection algorithms with candidates other than Brownian motion include [Reference Jenkins31], which uses Bessel proposals to simulate target diffusions with a finite entrance boundary.
In this context, the recent work in [Reference Jenkins and Spanò32] extends the class of diffusions for which exact rejection simulation is possible. The authors propose a simulation technique to recover samples from neutral Wright–Fisher diffusions that, in turn, are used as candidates in an exact rejection algorithm for simulating a wider target family of one-dimensional Wright–Fisher diffusions. Diffusions of this class, as well as their multivariate counterparts, are extensively used in population genetics, where proliferation of exact simulation algorithms can foster the use of suitable inferential techniques such as approximate Bayesian computation; see [Reference Tavaré, Balding, Griffiths and Donnelly46], [Reference Beaumont, Zhang and Balding2], and [Reference Fearnhead and Prangle21].
Along these lines, the main contribution of this paper is to present an exact rejection algorithm for coupled Wright–Fisher diffusions, with candidates built from samples of independent (multivariate) neutral Wright–Fisher diffusions that can be recovered using the techniques presented in [Reference Jenkins and Spanò32]. The coupled Wright–Fisher diffusion [Reference Aurell, Ekeberg and Koski1] is a family of multivariate Wright–Fisher diffusions that models how different allele types (genetic traits) co-evolve across different loci (different locations along the genome). It incorporates parent-dependent mutation at each locus, interlocus selection in the form of pairwise interactions, and free recombination. The model is based on quasi-linkage equilibrium where the fitness coefficients are inspired by a Potts model (see [Reference Aurell, Ekeberg and Koski1, Reference Neher and Shraiman38]) and generalize the classical additive fitness under weak selection (see e.g. [Reference Bürger7, Ch. II]) to the multi-locus case. The assumptions in the coupled Wright–Fisher model are suitable, for example, for studying networks of loci in recombining populations of bacteria (e.g. Streptococcus pneumoniae) that evolve under shared selective pressure when the linkage disequilibrium is low across the genome; see [Reference Ekeberg14, Reference Skwark43]. In contrast, the model is unsuitable in populations of bacteria where the amount of homologous recombination is low, which makes it difficult to separate couplings arising from recombination from those arising from selection (e.g. Streptococcus pyogenes).
The coupled Wright–Fisher diffusion corresponds to a haploid version of the two-locus model in the case of weak selection, loose linkage in [Reference Ethier and Nagylaki17]; see also [Reference Fearnhead20] for a different multi-locus extension. Diffusion models in the presence of epistasis, i.e., allelic interactions between loci, have been used in the analysis of sets of human immune system genes in the presence of balancing selection; see [Reference Buzbas, Joyce and Rosenberg8]. Conversely, diffusion approximations have been deemed poor in some scenarios, e.g. for low mutation rates where the stationary density is ill-defined at the boundary; see [Reference Hössjer, Tyvand and Miloh30].
To complete the proposed exact rejection algorithm, a further contribution of this paper deals with simulation of multivariate Wright–Fisher bridges, for which an exact simulation technique is provided. These bridges allow sampling further points of the path once a skeleton of the coupled Wright–Fisher diffusion has been accepted. Our sampling approach can therefore be viewed as a generalization of that presented in [Reference Jenkins and Spanò32] for the one-dimensional Wright–Fisher diffusions to the multivariate case.
The rest of the paper is structured as follows. In Section 2 we briefly present the main properties and structure of the family of coupled Wright–Fisher diffusions, together with a formal overview of exact rejection algorithms. In Section 3 we recall and present some revised algorithms for exact simulation of one- and multidimensional neutral Wright–Fisher diffusions, i.e., those needed for sampling our candidate processes. This leads us to the proposed exact rejection algorithm for coupled Wright–Fisher diffusions (Section 4). Section 5 includes performance results illustrated through several simulation scenarios, and in Section 6 the technique for simulating exactly from a multivariate Wright–Fisher bridge is provided, which completes the sampling scheme. Finally, Section 7 contains mathematical proofs.
2. Background
This section provides the necessary insights on the structure and main properties of the coupled Wright–Fisher family of diffusions and fixes some notation. It also provides a brief overview of exact rejection algorithms for diffusions, which constitute the basis of our work.
2.1. Coupled Wright–Fisher diffusions
The family of Wright–Fisher models, and more specifically their diffusion approximations, has been widely used in population genetics; see, for instance, [Reference Kimura36] and [Reference Griffiths24]. In its simplest form, the Wright–Fisher model describes the evolution of the frequency of two allele types at a single locus that have the same fitness, and whose configuration at each new generation of individuals is chosen uniformly and with replacement from that of the current generation in a haploid population of constant size. Extensions of the model include consideration of more than two allele types that might be located at different loci, and can incorporate other evolutionary forces such as mutation, selection, and recombination. A comprehensive overview of the family of Wright–Fisher models can be found, for example, in [Reference Crow and Kimura12] or [Reference Ewens18].
With the proliferation of genome-wide association studies, questions arise about how genetic variants associated with numerous diseases co-evolve or interact over time. Moreover, the increasing availability of allele frequency time series data is fostering the study of evolutionary forces such as mutation or selection; see [Reference Steinrücken, Bhaskar and Song44], [Reference Skwark43], [Reference Tataru, Simonsen, Bataillon and Hobolth45], and [Reference Nené, Mustonen and Illingworth39]. Within this framework, the coupled Wright–Fisher model [Reference Aurell, Ekeberg and Koski1] tracks the evolution of frequencies of allele types located at different loci and, in addition to locus-wise mutation and selection, describes possible selective pairwise allele interactions between loci in a haploid population. Its diffusion approximation can be derived as the weak limit of a sequence of discrete coupled Wright–Fisher models characterized by the assumption that the evolution of the population at one locus is conditionally independent of the other loci given the state of the previous generation. The coupled Wright–Fisher diffusion can be expressed as a system of stochastic differential equations of the form
where $X_t$ is a vector of frequencies of allele types, $\alpha$ governs their mutations, and G contains the single and pairwise selective locus interactions.
Let L denote the total number of loci and $d_i\geq 2$ the number of different allele types in each of them. For $n\,:\!=\,\sum_{i=1}^L [d_i-1]$, let us index each element of an n-dimensional vector x by $i\in\{1,\ldots , L\}$, referring to a specific locus, and $j\in\{1,\ldots, d_i-1\}$, referring to an allele type, so that $x=\{x^i\}_{i=1}^L$ where each $x^i=\{x^{ij}\}_{j=1}^{d_i-1}$. If $x\in\mathbb{R}^n$ refers to the vector of allele type frequencies $X_t$, the elements of the drift $\alpha(x)\in\mathbb{R}^n$ take the form
where $|\theta|=\sum_{k=1}^{d_i}\theta^{i}_{k}$ and $\theta^{i}_{k}>0$ denote the parent-independent mutation rates to allele type $k\in\{1,\ldots, d_i\}$ at locus i, so that mutations occur at each locus separately. Wright–Fisher diffusions with drift $\alpha(x)$ correspond to the reversible neutral mutations allele model. The coupled Wright–Fisher model also admits parent-dependent mutations, but we will not consider these here.
The coupling term $G(x)\in\mathbb{R}^n$ has general form (Svirezhev–Shahshahani gradient)
where the square of the diffusion matrix $D(x)=\text{diag}(D^i(x^i))\in\mathbb{R}^{n\times n}$ is an L-block diagonal matrix with entries
$\nabla_{x}$ is the gradient operator with respect to each component of x; f transforms x into the augmented $(n+L)$-dimensional vector $\overline{x}$ that reflects the dependency between allele frequencies at locus i, i.e.,
and $\overline{V}(\overline{x})\in \mathbb{R}$,
where $s\in\mathbb{R}^{n+L}$ is the vector of within-locus selection parameters and $H\in\mathbb{R}^{(n+L)\times(n+L)}$ is a symmetric between-loci pairwise interactions matrix. The matrix H is in fact built by L blocks of zeros of size $d_i\times d_i$ in the main diagonal (denoted by $0^{ii}$), and off-diagonal blocks of the form $H^{il}=(H^{li})^T\in\mathbb{R}^{d_i\times d_l}$, $i\neq l$, $i,l\in\{1,\ldots, L\}$,
so that interactions of each locus with itself are not permitted. Note that if one removes the coupling term G, (2.1) simply becomes the usual multivariate neutral mutations Wright–Fisher diffusion, where $X_t$ describes the evolution of allele frequencies that evolve at each locus without interacting. The explicit form of $V(x)\,:\!=\,\nabla_{x} (\overline{V}\circ f)(x)$ in terms of s and H is specified in the following proposition, whose proof can be found in Section 7.
Proposition 2.1. Let $V(x)\in\mathbb{R}^n$ be the n-dimensional vector such that $V(x)\,:\!=\,\nabla_{x} (\overline{V}\circ f)(x)$. Then $\forall i \in\{1,\ldots, L\}$, $j\in\{1,\ldots, d_i-1\}$,
with
where $h^{il}_{jk}$ denotes the entry in row j and column k of the block $H^{il}$ of H.
Following Kimura’s formulation [Reference Kimura35], the explicit stationary density of (2.1) can also be obtained by solving the corresponding Fokker–Planck equation (see [Reference Aurell, Ekeberg and Koski1]); the solution takes the form
where
and Z is the normalizing constant
Here $\mathcal{X}=\{x\in\mathbb{R}^n\mid x^{ij}\geq 0, \sum_{j=1}^{d_i-1}x^{ij}\leq 1\}$.
The representation (2.3) resembles the stationary density of the haploid version of the model studied by Fearnhead (see [Reference Fearnhead20, Theorem 2]), which in the two-locus case agrees with the coupled Wright–Fisher diffusion.
2.2. Overview of exact rejection algorithms for simulation of diffusions
This subsection provides an overview of the exact rejection algorithm presented in [Reference Beskos and Roberts5]; the same sampling scheme is later followed for simulating from the coupled Wright–Fisher diffusion, as detailed further in Algorithm 4. Let
where $\mu({\cdot})$ is such that (2.4) admits a weakly unique solution $(X_t)_{t\in[0,T]}$. Let $\mathbb{Q}_{x_0}$ be the law of the process $(X_t)_{t\in[0,T]}$, and let $\mathbb{P}_{x_0}$ denote the law of a Brownian motion $(B_t)_{t\in[0,T]}$ starting at $B_0=x_0$. By means of a Girsanov transformation of measures one can write the Radon–Nikodym derivative of $\mathbb{Q}_{x_0}$ with respect to $\mathbb{P}_{x_0}$ as
Assuming that $\mu({\cdot})$ is differentiable everywhere and using Itô’s lemma, (2.5) can be rewritten as
where $\tilde{A}(x)\,:\!=\,\int_0^x\mu(u)du$. Imposing the further conditions that $\tilde{A}(x)$ be bounded above by a constant $K^A$, and that $(\mu^2+\mu')/2$ be bounded between constants $K^-$ and $K^+$, we find that
with $\tilde{A}(x)\leq K^A$ and $K^-\leq \tilde{\phi}(x)\,:\!=\,\dfrac{1}{2}[\mu^2(x)+\mu'(x)]\leq K^+$, is a suitable acceptance–rejection probability.
Note that an exact evaluation of the integral in (2.6) is not possible without any approximation error, because this would require the storage of infinitely many points of the sample candidate path $B=(B_t)_{t\in[0,T]}$. However, the key point of the exact algorithms proposed in [Reference Beskos and Roberts5] and references therein is that an event occurring with probability (2.6) can be evaluated by sampling B only at a finite number of time points. This follows because the last term in (2.6) can be interpreted as the probability of the event $\omega_{\tilde{\phi}}$ that no points from a homogeneous spatial Poisson process $\Phi=\{(t_j,\psi_j): j=1,\ldots,J\}$ with unit intensity on $[0,T]\times[0,K^+-K^-]$ lie below the graph of $t\mapsto\tilde{\phi}(B_t)$. A formal statement and proof of this observation can be found in Theorem 1 of [Reference Beskos, Papaspiliopoulos and Roberts3].
Therefore, the exact sampling procedure (detailed in Algorithm 1) starts by drawing a sample from $\Phi$ that will determine the (random) time points at which the candidate B will be drawn, and then provides a skeleton of $(X_t)_{t\in[0,T]}$ at such time points upon acceptance of the sampled candidate (that is, if all the evaluated $\tilde{\phi}(B_t)$ lie above the sampled Poisson points). Note that the last point on the candidate path, $B_T$, serves to evaluate an event that occurs with probability $\exp\{\tilde{A}(B_T)-K^A\}$ and that is independent of $\omega_{\tilde{\phi}}$. In previously presented versions of the algorithm, $B_T$ is considered to follow a specific probability distribution, thus slightly modifying the candidate to be a certain biased Brownian motion, which is shown to improve the algorithm’s efficiency. Such an option is not needed for our purposes and is therefore not fully described here. In brief, this modification permits the boundedness condition on $\tilde{A}(x)$ to be relaxed, but the equivalent function in our proposed exact algorithm is already bounded.
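To make the decision rule concrete, the following is a minimal Python sketch of a single acceptance attempt under the boundedness assumptions above. The function names and signatures are ours for illustration only, the candidate is plain Brownian motion (the biased-Brownian refinement is omitted, as in our setting), and the Poisson points are compared with the shifted graph $\tilde{\phi}(B_t)-K^-$, which lies in $[0,K^+-K^-]$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def ea_attempt(phi, K_minus, K_plus, A_tilde, K_A, x0, T):
    """One acceptance attempt of the basic exact rejection scheme (Algorithm 1 style).

    phi     : x -> (mu(x)**2 + mu'(x)) / 2, assumed bounded in [K_minus, K_plus]
    A_tilde : x -> integral of mu from 0 to x, assumed bounded above by K_A
    Returns (accepted, times, values): the candidate skeleton at the Poisson times and at T.
    """
    # Homogeneous Poisson process of unit intensity on [0, T] x [0, K_plus - K_minus]
    J = rng.poisson(T * (K_plus - K_minus))
    t = np.sort(rng.uniform(0.0, T, J))
    psi = rng.uniform(0.0, K_plus - K_minus, J)

    # Brownian candidate evaluated at the Poisson times and at T
    grid = np.append(t, T)
    B = x0 + np.cumsum(rng.normal(0.0, np.sqrt(np.diff(np.concatenate(([0.0], grid))))))

    # Accept iff no Poisson point lies below the shifted graph of phi(B_t) ...
    below_graph = psi < np.array([phi(b) for b in B[:-1]]) - K_minus
    # ... and an independent event of probability exp{A_tilde(B_T) - K_A} occurs
    endpoint_ok = rng.uniform() < np.exp(A_tilde(B[-1]) - K_A)
    return (not below_graph.any()) and endpoint_ok, grid, B
```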
3. Simulation of the candidate processes
This section is devoted to describing existing simulation strategies for the candidate processes in the exact rejection algorithm that will be presented later in Section 4. Suitable candidate processes in our setting will be L independent $(d_i-1)$-dimensional neutral Wright–Fisher diffusions $(X_t)_{t\in [0,T]}$, each one a weakly unique solution of
with $\alpha(X_t)$ a $(d_i-1)$-dimensional vector with $\alpha^{ij}(x)=\frac{1}{2}(\theta^i_j-|\theta| x^{ij})$.
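For concreteness, the locus-wise neutral drift just defined can be computed as follows (a trivial sketch; the array layout is our own convention, not the paper's):

```python
import numpy as np

def neutral_drift(x, theta):
    """alpha^{ij}(x) = (theta^i_j - |theta^i| * x^{ij}) / 2 for a single locus,
    where x has length d_i - 1 and theta has length d_i."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return 0.5 * (theta[:-1] - theta.sum() * x)
```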
3.1. Transition density function expansions
Exact simulation of each neutral Wright–Fisher diffusion is possible by exploiting an available eigenfunction expansion of its transition density function that allows a probabilistic representation; see, for example, [Reference Griffiths, Spanò, Bingham and Goldie29].
For a fixed locus $i\in\{1,\ldots, L\}$, let $x=(x_1,\ldots, x_{d-1})$ be a vector of initial frequencies and $\theta+l$ a d-dimensional vector with entries $\theta_j+l_j, \ j\in\{1,\ldots, d\}$. Then the probabilistic representation of the transition density function of $(X_t)_{t\in [0,T]}$ in (3.1) is given by
where the $q_m^{\theta}(t)$ are transition functions of a pure death process $A_{\infty}^{\theta}(t)$ with an entrance boundary at $\infty$, $\mathcal{M}_{m,x}({\cdot})$ is the probability mass function (PMF) of a multinomial random variable, and $\mathcal{D}_{\theta+l}({\cdot})$ is the probability density function of a Dirichlet random variable; that is,
and
with $l_d=m-\sum_{j=1}^{d-1} l_j$. A more detailed description of the process $A_{\infty}^{\theta}(t)$ and an exact sampling technique are provided later on.
In the case of one-dimensional ($d=2$) Wright–Fisher diffusions, the multivariate components of the mixture in (3.2) reduce to their one-dimensional counterparts, i.e., a binomial and a beta random variable, respectively [Reference Griffiths, Spanò, Bingham and Goldie29].
A sampling strategy for $g(x,\cdot;t)$ is summarized in Algorithm 2; see [Reference Griffiths and Li28] or [Reference Jenkins and Spanò32] for an analogous version in the one-dimensional case (the latter also includes a modification for the infinite-dimensional case, that is, for Fleming–Viot diffusions). Once expressed in probabilistic terms and given the simplicity of Algorithm 2, recovering samples from $g(x,\cdot;t)$ seems straightforward. However, sampling exactly from $q_m^{\theta}(t)$ poses some difficulty because it is only known in infinite series form. Previous approaches for simulating from approximated versions of $q_m^{\theta}(t)$ can be found in [Reference Griffiths and Li28], [Reference Griffiths25], [Reference Griffiths26], and [Reference Jewett and Rosenberg33]. In the next section, we review the exact simulation procedure presented in [Reference Jenkins and Spanò32], which is the one used here.
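The sampling scheme of Algorithm 2 amounts to drawing the mixture representation (3.2) hierarchically; a minimal sketch is given below, assuming a sampler for $q_m^{\theta}(t)$ is available (e.g. the one reviewed in the next subsection). Names and signatures are illustrative and not taken from the paper's pseudocode.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def sample_neutral_wf_transition(x, theta, t, sample_q):
    """Draw Y ~ g(x, . ; t) for a (d-1)-dimensional neutral Wright-Fisher diffusion.

    x        : current frequencies, length d-1 (the d-th coordinate is 1 - sum(x))
    theta    : mutation parameters, length d
    sample_q : callable (t, theta) -> one draw M from q_m^theta(t)
    """
    m = sample_q(t, theta)                         # Step 1: M ~ q_m^theta(t)
    x_full = np.append(x, 1.0 - np.sum(x))         # complete the simplex coordinate
    l = rng.multinomial(m, x_full)                 # Step 2: L ~ Multinomial(M, x)
    y_full = rng.dirichlet(np.asarray(theta) + l)  # Step 3: Y ~ Dirichlet(theta + L)
    return y_full[:-1]                             # first d-1 coordinates of Y
```

With L loci, the candidate for the coupled model is obtained by calling this routine once per locus, independently.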
3.2. Exact simulation of the ancestral process $A_{\infty}^{\theta}$
We describe here the sampling procedure for recovering exact samples of $q_m^{\theta}(t)$, the transition functions of the aforementioned death process $A_{\infty}^{\theta}$. In more detail, let $\{A_n^{\theta}(t): t\geq 0\}$ be a pure death process on $\mathbb{N}$ such that $A_n^{\theta}(0)=n$ almost surely and with its only possible transition $m\to m-1$ occurring at rate $m(m+|\theta|-1)/2$ for each $m=1,\ldots,n$; that is, it represents the number of non-mutant lineages that coalesce backwards in time in the coalescent process with mutation. Then, let $q_m^{\theta}(t)=\lim_{n \to \infty}\mathbb{P}(A_n^{\theta}(t)=m)$.
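As a brute-force illustration only (not part of the exact scheme), $q_m^{\theta}(t)$ can be approximated by simulating the death process from a large initial value n; this is useful mainly as a sanity check against the exact sampler described below. All parameter values in the snippet are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def death_process_at_t(n, abs_theta, t):
    """Simulate A_n^theta(t): start at n, jump m -> m-1 at rate m*(m + |theta| - 1)/2."""
    m, elapsed = n, 0.0
    while m > 0:
        elapsed += rng.exponential(scale=2.0 / (m * (m + abs_theta - 1)))
        if elapsed > t:
            break
        m -= 1
    return m

# Empirical approximation of q_m^theta(t) with arbitrary parameters
draws = [death_process_at_t(n=200, abs_theta=1.5, t=0.3) for _ in range(5000)]
q_hat = np.bincount(draws) / len(draws)
```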
For a more thorough interpretation of the transition density $g(x,\cdot, t)$ and its one-dimensional counterpart, it is worth noting that the expansion in (3.2) is derived via a duality principle for Markov processes [Reference Ethier and Kurtz16], that is, from the moment dual process of the Wright–Fisher diffusion, which is also a pure death process representing lineages backwards in time; see for example [Reference Etheridge and Griffiths15] or [Reference Griffiths, Spanò, Bingham and Goldie29] for a complete derivation and details. The corresponding dual (coalescent) process for coupled Wright–Fisher diffusions is derived in [Reference Favero, Hult and Koski19].
An expression for $q_m^{\theta}(t)$ starting from the entrance boundary at infinity is as follows (see [Reference Griffiths24]):
where
As shown in [Reference Jenkins and Spanò32], samples from $q_m^{\theta}(t)$ can be recovered exactly by means of a variant of the alternating series method, described in [Reference Devroye13, Chapter 4]. In brief, the alternating series method would require the sequence of coefficients $b_{m+i}^{(t,\theta)}(m)$ to be decreasing in i for each m, a condition that is not always met here. Nonetheless, one can exploit the fact that there exists a finite $C_m^{(t,\theta)}$ such that for all m and $i\geq C_m^{(t,\theta)}$ the sequence of coefficients $b^{(t,\theta)}_{m+i}(m)$ decreases monotonically as i tends to $\infty$. More explicitly, there exists
Then, once $C_m^{(t,\theta)}$ is available, the remainder of the sequence of coefficients is guaranteed to be decreasing and the alternating series method can be applied. The following proposition summarizes the main properties of the bound $C_m^{(t,\theta)}$.
Proposition 3.1. ([Reference Jenkins and Spanò32, Proposition 1]) Let $b_{m+i}^{(t,\theta)}(m)$ be the coefficients defined in (3.4) and let $C_m^{(t,\theta)}$ be as in (3.5). Then
(i) $C_m^{(t,\theta)}<\infty$ for all m;
(ii) $b_{m+i}^{(t,\theta)}(m)\downarrow 0$ as $i\to\infty$ for all $i\geq C_m^{(t,\theta)}$;
(iii) $C_m^{(t,\theta)}=0$ for all $m>D_{\epsilon}^{(t,\theta)}$, where
(3.6) \begin{equation} D_{\epsilon}^{(t,\theta)}= \inf\left\{u\geq \left(\frac{1}{t}-\frac{|\theta|+1}{2}\right)\vee 0 \,:\, (|\theta|+2u-1)e^{-u(u+|\theta|-1)t/2}<1-\epsilon \right\}, \end{equation}
for $\epsilon\in[0,1)$.
Part (iii) of Proposition 3.1 will be of interest later in proposing the exact sampling algorithm for $(d-1)$-dimensional Wright–Fisher bridges (Section 6), where an explicit bound on m is needed.
Following Proposition 3.1, one can then recover exact samples from $q_m^{\theta}(t)$ because the terms $b^{(t,\theta)}_{m+i}(m)$ become monotonically smaller with increasing i, and for each m there exists $k_m$, an element of $\tilde{k}\in\mathbb{R}^{M+1}$ (i.e., $\tilde{k}=\{k_m\}_{m=0}^M$), such that
Because
and because both $S^-_{\tilde{k}}(M)$ and $S^+_{\tilde{k}}(M)$ can be computed from finitely many terms, given $U\sim \text{Uniform}(0,1)$ we can find $\tilde{k}^0\in\mathbb{R}^{M+1}$ with elements $k^0_m$ such that
for each $m\in\{0,\ldots, M\}$.
Now, if $\tilde{k}^0$ is such that $S^-_{\tilde{k}^0}(M) > U$, standard inverse sampling provides
with M exactly distributed according to $q_m^{\theta}(t)$. The sampling strategy consists in exploring the summands in $S^-_{\tilde{k}}(M)$ and $S^+_{\tilde{k}}(M)$ through their respective indices m and $k_m$ until, for a given realization of U, the condition (3.7) is satisfied.
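The logic of this search can be sketched generically as follows, for the simpler variant that starts at $M=0$ and only increases M (the mode-centred, telescopic variant of Algorithm 3 is discussed next). The callables lower and upper are hypothetical stand-ins for the truncated alternating sums built from the coefficients in (3.4); they are assumed to bound $q_m^{\theta}(t)$ from below and above and to converge to it as the truncation level k grows.

```python
def alternating_series_inverse_sample(u, lower, upper):
    """Inverse-CDF sampling of a discrete variable whose probabilities q_m are only
    accessible through converging bounds lower(m, k) <= q_m <= upper(m, k).

    u : a Uniform(0,1) realization.
    Returns the smallest M with q_0 + ... + q_M > u.
    """
    M, k = 0, [0]                 # k[m] is the current truncation level for q_m
    while True:
        s_minus = sum(lower(m, k[m]) for m in range(M + 1))
        s_plus = sum(upper(m, k[m]) for m in range(M + 1))
        if s_minus > u:           # true partial sum up to M already exceeds u
            return M
        if s_plus < u:            # true partial sum up to M is certainly below u
            M += 1
            k.append(0)
        else:                     # bounds are inconclusive: tighten them
            k = [km + 1 for km in k]
```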
A complete simulation procedure is presented in Algorithm 3, where several improvements mentioned in [Reference Jenkins and Spanò32] have been incorporated. Most notably, the variable M is initialized at the integer nearest to the mean of a certain distribution (rather than at 0) that serves as an estimate of the mode $\hat{q}_{\text{mod}}$ of $q_m^{\theta}(t)$, a modification that decreases computation times substantially. This initialization originates from an asymptotic approximation of the transition functions $q_m^{\theta}(t)$ (see [Reference Griffiths25]), which states that, as $t\to 0$, $A_{\infty}^{\theta}(t)$ is asymptotically normally distributed; that is,
where $\mu^{(t, \theta)}=2\eta/t$ and
with $\eta=\beta/(e^{\beta}-1)$ for $\beta\neq 0$ and $\eta=1$ otherwise, where $\beta=\frac{1}{2}(|\theta|-1)t$, and where $\stackrel{\mathcal{D}}{\longrightarrow}$ denotes convergence in distribution; see [Reference Jenkins and Spanò32, Theorem 1].
Note that this initialization is possible because the exploration of the summands in $S^-_{\tilde{k}}(M)$ and $S^+_{\tilde{k}}(M)$ does not need to occur in any specific order. It is also worth mentioning that, when M is initialized at 0, the vector $\tilde{k}$ is updated increasingly, i.e. a new element is added to the vector at every iteration in which M is increased by one unit. In Algorithm 3, however, M is updated telescopically; that is, at each new iteration j, M moves farther from $\hat{q}_{\text{mod}}$ by one unit, alternatingly above or below. This in turn entails updating the corresponding $M+1$ elements of $\tilde{k}$ accordingly; i.e., the number of elements might either increase or decrease at each iteration. In addition, because M is updated telescopically, when an M such that $S^-_{\tilde{k}}(M)>U$ is found, one needs to check that it is indeed the infimum over all M for which this condition is satisfied. For precise results on the complexity of Algorithm 3 and simulation performance we refer the reader to [Reference Jenkins and Spanò32].
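For reference, the initial value of M suggested by this asymptotic approximation (the mean $2\eta/t$ rounded to the nearest integer, with $\eta$ as in the display above) can be computed as below; this is only the initialization heuristic, and the names are ours.

```python
import numpy as np

def initial_m(abs_theta, t):
    """Integer nearest to the approximate mean 2*eta/t of A_infinity^theta(t),
    with eta = beta/(e^beta - 1), beta = (|theta| - 1) * t / 2 (and eta = 1 if beta = 0)."""
    beta = 0.5 * (abs_theta - 1.0) * t
    eta = beta / np.expm1(beta) if beta != 0.0 else 1.0
    return int(round(2.0 * eta / t))
```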
At this stage, Algorithm 3 can be used in Step 1 of Algorithm 2, and an exact sampling procedure for $(d-1)$-dimensional neutral Wright–Fisher diffusions is completed.
4. Exact rejection algorithm for simulating coupled Wright–Fisher diffusions
Let $X_t$ be the n-dimensional vector of allele frequencies satisfying (2.1). Following the same scheme as Algorithm 1 in Section 2, Algorithm 4 simulates exact skeletons of paths of coupled Wright–Fisher diffusions with L loci and $d_i$ allele types each, $i\in\{1,\ldots,L\}$. Candidate processes in this case are L independent $(d_i-1)$-dimensional neutral Wright–Fisher diffusions, each one a weakly unique solution of (3.1) and sampled exactly following Algorithm 2.
The exact rejection algorithm proposed in this paper relies on the existence and characterization of the following acceptance–rejection probability, which is detailed in Theorem 4.1 and whose proof is deferred to Section 7.
Theorem 4.1. Let $\mathbb{CWF}_{\alpha,G,x_0}$ be the law of X, solution of (2.1), and let $\mathbb{WF}L_{\alpha,x_0}$ be the joint law of L independent $(d_i-1)$-dimensional neutral Wright–Fisher diffusions, each a weakly unique solution of (3.1), $i\in\{1,\ldots, L\}$. Then the Radon–Nikodym derivative of $\mathbb{CWF}_{\alpha,G,x_0}$ with respect to $\mathbb{WF}L_{\alpha,x_0}$ is of the form
and there exist constants $A^-, A^+, C^-$, and $C^+$ such that $A(X_0,X_T)$ is bounded on $[0,1]^n\times[0,1]^n$ by $A^-\leq A(X_0,X_T)\leq A^+$ and $\phi(X_t)$ is bounded on $[0,1]^n$ by $C^-\leq\phi(X_t)\leq C^+$, with
and
Using Algorithm 2 in Step 3 (or the corresponding modification for one-dimensional diffusions, if $d_i=2$ for some i), Algorithm 4 returns an exactly simulated skeleton of $(X_t)_{t\in[0,T]}$ solution of (2.1).
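Mirroring the sketch given after Algorithm 1, a single attempt of this rejection step could look as follows. Here $\phi$, A and the bounds are those of Theorem 4.1, and sample_candidate_at is assumed to return the L independent neutral Wright–Fisher candidates evaluated at the requested ordered times (built by applying Algorithm 2 sequentially between consecutive time points). Names and signatures are again illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def cwf_attempt(x0, T, phi, A, A_plus, C_minus, C_plus, sample_candidate_at):
    """One attempt of the coupled Wright-Fisher rejection step (Algorithm 4 style)."""
    # Poisson points of unit intensity on [0, T] x [0, C_plus - C_minus]
    J = rng.poisson(T * (C_plus - C_minus))
    t = np.sort(rng.uniform(0.0, T, J))
    psi = rng.uniform(0.0, C_plus - C_minus, J)

    times = np.append(t, T)
    X = sample_candidate_at(x0, times)          # candidate skeleton, one row per time

    below = psi < np.array([phi(x) for x in X[:-1]]) - C_minus
    endpoint_ok = rng.uniform() < np.exp(A(x0, X[-1]) - A_plus)
    return (not below.any()) and endpoint_ok, times, X
```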
The algorithm’s computational complexity can also be established; this is made precise in Proposition 4.1, whose proof can be found in Section 7.
Proposition 4.1. Let L be the number of loci; let M(t) denote the total number of coefficients that must be computed in the implementation of Algorithm 3, where $t\in(0,T)$ is the time distance between two sampled skeleton points; and let N(T) denote the number of Poisson points required until the first skeleton in Algorithm 4 is accepted. Then $\text{E}[LM(t)]<\infty$ and $\text{E}[N(T)]<\infty$, and more specifically, there exists $\kappa>0$ such that
In summary, the complexity of Algorithm 4 increases either as $t\to 0$, when the average number of coefficients to be computed in Algorithm 3 increases as $1/t$, or with increasing T, when the average number of Poisson points needed until acceptance increases exponentially.
The latter is easily solvable, simply by considering shorter intervals $[t_{k-1}, t_k]$ such that
and then concatenating the accepted skeletons in each of them. To solve the problem when $t\to 0$, we follow the recommendation in [Reference Jenkins and Spanò32], and whenever $t<0.05$, resort to the approximation in (3.8).
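A simple way to implement this splitting of [0, T] is sketched below, under the assumption that a single-interval sampler is available; dt_max stands in for the bound on the subinterval length implied by the displayed condition, which we do not reproduce here.

```python
import numpy as np

def simulate_by_concatenation(x0, T, dt_max, attempt):
    """Accept a skeleton on each subinterval of length at most dt_max, restarting the
    candidate from the previously accepted endpoint, and concatenate the results.
    `attempt(x_start, dt)` runs one rejection trial, returning (accepted, times, values)."""
    edges = np.append(np.arange(0.0, T, dt_max), T)
    skeleton_t, skeleton_x = [0.0], [x0]
    for t0, t1 in zip(edges[:-1], edges[1:]):
        accepted = False
        while not accepted:
            accepted, times, values = attempt(skeleton_x[-1], t1 - t0)
        skeleton_t.extend(t0 + np.asarray(times))
        skeleton_x.extend(list(values))
    return np.array(skeleton_t), skeleton_x
```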
While asymptotically the algorithm's growth rate does not depend on the number of loci, it is worth mentioning that with increasing L the acceptance probability decreases, as a larger number of skeletons needs to be accepted simultaneously, which naturally affects the algorithm's feasibility. This will become clearer in the next section, where simulation results for examples with $L=2$ and $L=4$ are provided. As expected, the acceptance probability also decreases whenever the target diffusion differs more from the neutral Wright–Fisher candidate. This is exemplified in the next section, where results are shown for two coupled Wright–Fisher models with the same number of loci and mutation parameters, but different selective pairwise interactions.
5. Numerical experiments
In the following, several implementations of Algorithm 4 are shown along with their simulation results. The examples below represent plausible network structures that arise in biological applications and whose interaction parameters are of interest. Examples of these include frequency-dependent selection in population dynamics with applications to vaccine interventions [Reference Corander11], genome-wide discovery of interdependent loci affecting antibiotic resistance [Reference Schubert, Maddamsetti, Nyman, Farhat and Marks42], co-evolution of interacting human gamete-recognition genes [Reference Rohlfs, Swanson and Weir41], and analysis of sequence data [Reference Gao, Zhou and Aurell23].
Consider the case of two loci with two allele types each, i.e., $L=2$ and $d_1=d_2=2$. A particular example with mutation, one-type allele interaction between loci, and no within-locus selection reduces (2.1) to
where the $\alpha^{i1}({\cdot})$ are as in (2.2), $B=(B_t^1,B_t^2)$ is a vector of independent Brownian motions, $H^{12}=H^{21}=\begin{pmatrix} h&\quad 0 \\ 0&\quad 0\end{pmatrix}$, and H is 0 elsewhere.
In this case, $A(X_0,X_T)\,:\!=\,h(X_T^{11}X_T^{21}-X_0^{11}X_0^{21})$ and
and the bound constants were set to $A^+=h(1-x_0^{11}x_0^{21})$, $C^-=-\frac{h}{2}(|\theta^{1}|+|\theta^{2}|)$, and $C^+=\frac{h}{2}(\frac{h}{2}+\theta^{1}_{1}+\theta^{2}_{1})$, where $|\theta^i|=\theta_{1}^{i}+\theta_{2}^{i}$.
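As a quick numerical illustration of these formulas, with parameter values chosen arbitrarily for this snippet (they are not the values used in the tables below), the bound constants evaluate to:

```python
h = 0.1
theta1, theta2 = (0.5, 0.5), (0.5, 0.5)        # arbitrary mutation parameters
x0_11, x0_21 = 0.5, 0.5                        # arbitrary starting frequencies

A_plus = h * (1.0 - x0_11 * x0_21)                      # = 0.075
C_minus = -0.5 * h * (sum(theta1) + sum(theta2))        # = -0.1
C_plus = 0.5 * h * (0.5 * h + theta1[0] + theta2[0])    # = 0.0525
```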
The results of several simulation scenarios for sampled skeletons of the diffusion solution of (5.1) are shown in Table 1 and Table 2, which report the total length T of the considered interval [0,T], the initialization of the path $(x_0^{11}, x_0^{21})$, the average number of attempts (drawn skeletons) until acceptance, the average number of Poisson points needed until acceptance, the average number of coefficients computed in Algorithm 3, the acceptance probability, the number of approximations needed due to small values of t between drawn points of the skeleton, and the average time in seconds per accepted path.
As shown in Table 1 and Table 2, the average number of coefficients needed is larger for shorter intervals, where the sampled Poisson points are more likely to be close to each other ($t\to 0$), than for longer intervals, as expected from the results presented in Proposition 4.1. Also, the acceptance probabilities when $h=1$ drop to around half of those obtained when $h=0.1$. This is also expected, as the model with the larger pairwise interaction parameter differs more from the candidate paths, so acceptance of the candidate becomes harder. This is also reflected in the average numbers of attempts, Poisson points, and coefficients needed, which increase consistently in the case $h=1$. Running time also increases with increasing T. In the case $h=1$ and $T=5$ the total running times became prohibitive.
In order to establish the correctness of Algorithm 4 and to provide a qualitative comparison, sampled paths of the solution of (5.1) at large T are compared with the corresponding stationary density for different mutation parameters (see Figure 1 and Figure 2), with satisfactory results. Note that the stationary density for (5.1) can be explicitly computed, and its normalizing constant reads as follows (see [Reference Aurell, Ekeberg and Koski1]):
where $a^{(n)}=a(a+1)\ldots(a+n-1)$. For the qualitative comparisons in Figure 1 and Figure 2, the infinite sum is truncated at a sufficiently high n so that the remainder is negligible. For the parameters in Tables 1 and 2 the truncation level is set to $n=70$.
Consider now the case of four loci with two allele types each, i.e., $L=4$ and $d_i=2$ for $i\in\{1,2,3,4\}$. A particular example with mutation, one-type allele interactions between loci, and no within-locus selection reduces (2.1) to
where the $\alpha^{i1}({\cdot})$ are as in (2.2), $B=(B_t^1,B_t^2,B_t^3,B_t^4)$ is a vector of independent Brownian motions,
and H is 0 elsewhere. In this case,
and
and the bound constants were set to
Similarly to the previous example with $L=2$ and interaction parameter $h=1$, the model in (5.2) differs more from the candidate process than, say, a model with only one pairwise interaction parameter (that is, a model with, for example, $h_2=h_3=0$). This is clearly reflected in the low average acceptance probabilities, or, equivalently, in the average number of attempts or simulated Poisson points needed until acceptance; see Table 3. Moreover, simulating from model (5.2) implicitly requires the simultaneous acceptance of four candidate paths, which makes acceptance more difficult. Nonetheless, it is still feasible to use Algorithm 4 in these scenarios, as other approximate simulation strategies would be affected by similar problems.
6. Simulation of multidimensional neutral Wright–Fisher bridges
To complete our simulation scheme, this section presents an exact simulation technique for sampling from neutral $(d-1)$-dimensional Wright–Fisher bridges. As mentioned before, once Algorithm 4 recovers a skeleton of the desired coupled Wright–Fisher diffusion, the remainder of the path can be filled by sampling from the corresponding neutral Wright–Fisher bridges, with no further reference to the target distribution needed; see, for example, [Reference Beskos, Papaspiliopoulos and Roberts3].
Consider a $(d-1)$-dimensional Wright–Fisher bridge, between x at time 0 and z at time t. Its transition density is given by
where $g(\cdot,\cdot;\cdot)$ is as in (3.2); see [Reference Fitzsimmons, Pitman and Yor22]. The precise eigenfunction expansion for $g_{z,t}(x,\cdot;s)$ is provided in the following proposition, whose proof can be found in Section 7.
Proposition 6.1. Let $g_{z,t}(x,\cdot;s)$ be as in (6.1), the transition density function of a $(d-1)$-dimensional Wright–Fisher bridge. Then its eigenfunction expansion reads
with
where $\mathcal{DM}_{\theta+l;n}({\cdot})$ denotes the PMF of a Dirichlet–multinomial random variable, with
Following the result in Proposition 6.1, a sampling scheme for $g_{z,t}(x,y;s)$ is provided in Algorithm 5.
Similarly to Step 1 in Algorithm 2, sampling exactly from the discrete random variable with PMF $p_{m,n,l,r}^{(x,z,s,t,\theta)}$ is not straightforward. However, given the results obtained so far, it only remains to find how to evaluate $g(x,z;t)$ without approximation error, and the sampling strategy will be complete. As pointed out in [Reference Jenkins and Spanò32] for the one-dimensional case, note that the problem of evaluating $g(x,z;t)$ at x and z is different from that of sampling from it.
By (3.2) and (3.3), one obtains
where $L_m\sim\text{Multinomial}(m,x)$.
As in Section 3, the aim is to find monotonically converging bounds on $p_{m,n,l,r}^{(x,z,s,t,\theta)}$ so that the alternating series method can be applied.
Let
for $m=0,1,\ldots,$ which, rearranging the terms in (6.2), gives the alternating series
Indeed, it can be proved that the terms $(d_i)_{i\geq 0}$ decrease monotonically beyond a certain threshold, characterized in the following results; this is the condition required to apply the alternating series method.
First, note that the strategy presented in [Reference Jenkins and Spanò32] for the one-dimensional case applies here almost without change, with the exception of the terms involving $\mathbb{E}[\mathcal{D}_{\theta+L_m}(z)]$, which by an analogous (generalized) argument can also be shown to decrease in m; this is the content of the following lemma, proved in Section 7.
Lemma 6.1. Let $L_m\sim{Multinomial}(m,x)$. Then for all $m\in\mathbb{N}$,
where
Now, similarly to (3.5), the following bound is defined:
This is used in Proposition 6.2 below, which fully characterizes the bound on m. The proof is again deferred to Section 7.
Proposition 6.2. Let the sequence $(d_i)_{i\geq 0}$ be as defined in (6.3), and consider the bounds $E^{(t,\theta)}$, $D_{\epsilon}^{(t,\theta)}$, and $\tilde{K}^{(\theta,x,z)}$ as in (6.5), (3.6), and (6.4) respectively. Then, for $\epsilon\in(0,1)$,
for all $m\geq E^{(t,\theta)}\vee D_{\epsilon}^{(t,\theta)}\vee 2\tilde{K}^{(\theta,x,z)}/\epsilon$.
Once the bound in Proposition 6.2 is established, exact simulation of $(d-1)$-dimensional Wright–Fisher bridges ($d>2$) is possible by setting
which for $2u\geq \tilde{F}_{m,n,l,r}^{(s,t,\theta)}$ provides the monotonically converging bounds
where
see [Reference Jenkins and Spanò32, Proposition 4].
To recover exact samples from $p_{m,n,l,r}^{(x,z,s,t,\theta)}$, consider a bijective pairing function $\Sigma:\mathbb{N}\rightarrow \mathbb{N}\times \mathbb{N}\times \mathbb{N}^d \times \mathbb{N}^d$ such that $\Sigma(j)=(m,n,l,r)$. Now, for each j there exists $v_j$, an element of $\tilde{v}\in\mathbb{R}^{J+1}$ (i.e., $\tilde{v}=\{v_j\}_{j=0}^J$), such that
providing a setting analogous to the one presented in Section 3. The proposed exact sampling scheme can be found in [Reference Jenkins and Spanò32, Algorithm 5], which we reproduce here as Algorithm 6 for completeness.
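The paper only requires that some such bijection $\Sigma$ exists; one concrete choice, shown purely for illustration, is obtained by nesting inverses of the Cantor pairing function.

```python
import math

def cantor_unpair(j):
    """Inverse of the Cantor pairing N x N -> N, pi(a, b) = (a + b)(a + b + 1)/2 + b."""
    w = (math.isqrt(8 * j + 1) - 1) // 2
    b = j - w * (w + 1) // 2
    return w - b, b

def sigma(j, d):
    """A bijection N -> N x N x N^d x N^d, Sigma(j) = (m, n, l, r), built by
    repeatedly splitting j with cantor_unpair (one concrete choice, not the paper's)."""
    j, m = cantor_unpair(j)
    j, n = cantor_unpair(j)
    l = []
    for _ in range(d):
        j, li = cantor_unpair(j)
        l.append(li)
    r = [0] * d
    for i in range(d - 1):
        j, r[i] = cantor_unpair(j)
    r[d - 1] = j
    return m, n, tuple(l), tuple(r)
```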
Other approaches to exact simulation of one-dimensional Wright–Fisher bridges include the one recently proposed in [Reference Griffiths, Jenkins and Spanò27], which is restricted to the case where either $\theta_1$ or $\theta_2$ is 0 and one or both of x and z are 0, and is therefore not applicable here.
7. Proofs
Proof of Proposition 2.1. Let $V:\mathbb{R}^n \to \mathbb{R}^{n}$ be such that $V\,:\!=\,\nabla_{x}(\overline{V}\circ f)(x)$, where we recall that
Then for all $i\in\{1,\ldots, L\}$, $j\in\{1,\ldots,d_i-1\}$, we have
where the last equality holds because whenever $i=l$, all entries of the blocks $H^{ii}$ are 0.
Proof of Theorem 4.1. By definition of $A({\cdot})$, $\phi({\cdot})$, and $X_t$,
Because $V({\cdot})$ and $D({\cdot})$ are continuous on $[0,1]^n$, there exist constants $C^-$ and $C^+$ such that
Consequently,
so Novikov’s condition is fulfilled and (7.1) can be identified as a Girsanov transformation [Reference Karatzas and Shreve34] with Girsanov kernel $(V(X_t))^T D^{\frac{1}{2}}(X_t)$.
Let $\mathbb{Q}$ be the probability measure with
It follows that the law of X under $\mathbb{Q}$ coincides with $\mathbb{CWF}_{\alpha,G,x_0}$. Indeed, by Girsanov’s theorem,
is a $\mathbb{Q}$-Brownian motion and
Now, by Proposition 2.1, and for $K_s,K_l, K_{lk}\in\mathbb{R}^n$,
where the rightmost term comes from pairing terms of the form
and recalling that $K^{ij}_{lk}=K^{lk}_{ij}$ and that the condition $i\neq l$ prevents squared terms. Hence,
The fact that
is bounded follows immediately, concluding the proof.
Proof of Proposition 4.1. Let M(t) be the total number of coefficients that must be computed in Algorithm 3, with $t\in(0,T)$ the time distance between two sampled skeleton points. By [Reference Jenkins and Spanò32, Proposition 5(iv)], there exists a $\kappa>0$ such that $\mathbb{E}[M(t)]=o(t^{-(1+\kappa)})$ as $t\to 0$, and further random coefficients needed in Algorithm 2 do not add to the algorithms’ complexity. Similarly, although our rejection scheme uses Algorithm 3 L times,
so the algorithm’s complexity is proportional to L, but its growth rate as $t\to 0$ remains the same as with $L=1$.
Now let $\epsilon\,:\!=\,d\mathbb{CWF}_{\alpha,G,x_0}/d\mathbb{WF}L_{\alpha,x_0}$ be the acceptance–rejection probability in Algorithm 4. Because $A^-\leq A(X_T)\leq A^+$ and $C^-\leq\phi(X_t)\leq C^+$,
Let D refer to the number of Poisson points needed to decide upon acceptance or rejection of a proposed path. Then, following [Reference Beskos, Papaspiliopoulos and Roberts3, Proposition 3],
where the first equality follows from considering the expectation of the sum of all drawn Poisson points $\sum_{i=1}^I D_i$ over I iterations of the algorithm until the first path is accepted. First conditioning on I and applying the law of iterated expectations, and then applying the law of total expectation, concludes the proof.
Proof of Proposition 6.1. Let $g_{z,t}(x,y;s)$ be the transition density function of a $(d-1)$-dimensional Wright–Fisher bridge, between x at time 0 and z at time t. By (6.1) and (3.2),
By definition,
Multiplying and dividing by $\dfrac{\Gamma(|\theta+l+r|)}{\prod_{j=1}^d \Gamma(\theta_j+l_j+r_j)}$ and rearranging shows that
Now, identifying the coefficients of $p_{m,n,l,r}^{(x,z,s,t,\theta)}$, the proof is complete.
Proof of Lemma 6.1. For the sake of simplicity, we will first consider the case $d=3$ and then show that, by analogous arguments, the results extend to the general $(d-1)$-dimensional case. First, note that the indices of the sum on the right-hand side of
can be seen as being placed on the $(d-1)$-face of the d-simplex $\mathcal{S}_d=\{l\in\mathbb{R}^d\mid l_{j}\geq 0,\sum_{j=1}^{d}l_j=m\}$. For example, if $d=3$, we only need to consider indexes $l_1$ and $l_2$ (see Figure 3). Thus,
Let us define the quantities
where $\tilde{l}\in\mathbb{R}^d$ is equal to l except in its jth component, for which $\tilde{l}_j=l_j+1$. As a special case, when $j=d$, one can write $\tilde{l}_d=m-\sum_{j=1}^{d-1}l_j+1$.
Then, for $l_1\leq\lfloor mz_1 \rfloor, l_2\leq\lfloor mz_2 \rfloor$,
Here, in the second inequality, note that the function
is decreasing in m, and thus for $m\geq 1$, it attains its maximum value at g(1). Then, if
one obtains $f(m)\leq f(0)\vee g(m)\leq f(0)\vee g(1)$ yielding the desired result.
Similarly, for $l_1\leq\lfloor mz_1 \rfloor, \lfloor mz_2 \rfloor\leq l_2$,
and for $\lfloor mz_1 \rfloor\leq l_1$ and all $l_2$,
Combining the inequalities in (7.3), (7.4), and (7.5),
where, in the first inequality, terms starting at $l_j=\lfloor mz_j \rfloor+1$ are compared one-to-one to terms starting at $l_j=\lfloor mz_j \rfloor$, and the last inequality holds after taking common factors and noting that the terms for $l_j=\lfloor mz_j \rfloor$ are bounded by both
and
The proof for the general $(d-1)$-dimensional case follows analogously. Consider
where $|l|_{d-2}=\sum_{j=1}^{d-2} l_j$.
Following the strategy used above, the sums can be partitioned into terms such that either
(a) $l_j\leq \lfloor mz_j \rfloor$ for all j;
(b) $\lfloor mz_j \rfloor\leq l_j$ for all $j\neq 1$, and $l_i\leq \lfloor mz_i \rfloor$ for all $i\neq j$; or
(c) $\lfloor mz_1 \rfloor\leq l_1$, and $l_i$ is free for all $i\neq 1$.
Note that this partition covers all (non-exclusive) combinations and includes all the elements of the sum. Now, comparing $\mathbb{E}[\mathcal{D}_{\theta+L_{m+1}}(z)]$ with $\mathbb{E}[\mathcal{D}_{\theta+L_{m}}(z)]$, the bounding constants for each case are as follows:
(a)
\begin{equation*}\left(\dfrac{|\theta|}{\theta_d}\left(1-\sum_{j=1}^{d-1}z_j\right)\vee \dfrac{2(1+|\theta|)}{1-\sum_{j=1}^{d-1}z_j}\right)\left(1-\sum_{j=1}^{d-1}x_j\right),\end{equation*}
(b)
\begin{equation*}\left(\dfrac{|\theta|}{\theta_j}z_j\vee \dfrac{2(1+|\theta|)}{z_j}\right)x_j,\end{equation*}
(c)
\begin{equation*}\left(\dfrac{|\theta|}{\theta_1}z_1\vee \dfrac{2(1+|\theta|)}{z_1}\right)x_1.\end{equation*}
This yields
with
Proof of Proposition 6.2. The proof follows from that of [Reference Jenkins and Spanò32, Proposition 3], which is reproduced here for completeness.
The inequality $d_{2m+1}<d_{2m}$ follows because if $m\geq E^{(t,\theta)}$ then $2j\geq C_{m-j}^{(t,\theta)}$ for all $j=0,\ldots, m$, which, by Proposition 3.1, implies
Multiplying by $\mathbb{E}[\mathcal{D}_{\theta+L_{m-j}}(z)]$ and then summing over $j=0,\ldots,m$ gives
Proving $d_{2m+2}<d_{2m+1}$ requires some extra steps. First, note that
where $j=r-1$. Noting now that $2j+1>2j\geq C_{m-j}^{(t,\theta)} \text{ for all } j=0,\ldots, m$, which implies
and using the same argument as in (7.6) yields
where the sum is taken only over $j=1,\ldots, m$ so that the remaining terms in $d_{2m+2}$ and $d_{2m+1}$ can be compared. Indeed, it only remains to prove that
Note now that
where
the second equality follows from (3.4), and the last inequality holds because $h_m(k)<h_k(k)=|\theta|+2k+1$; see the proof of [Reference Jenkins and Spanò32, Proposition 1].
Because by hypothesis $m\geq D_{\epsilon}^{(t,\theta)}$, recalling the definition of $D_{\epsilon}^{(t,\theta)}$ in (3.6) and choosing $k=m+1$ in (7.7) yields
Finally,
where the last inequality follows from Lemma 6.1 and the fact that $m+1>m\geq 2\tilde{K}^{(\theta,x,z)}/\epsilon$, yielding
which concludes the proof.
Acknowledgements
Part of this work was done while C. G.-P. was a doctoral student at the Unit of Biostatistics, Institute of Environmental Medicine, at Karolinska Institutet, Stockholm, Sweden. Part of this research was supported by the Swedish Research Council through grants 621-2013-4628 (H. H.) and 40-2012-5952 (T. K.).