1. Introduction
Weighted ensemble [6, 11, 16, 17, 23, 25, 35, 39, 40, 51, 52, 56] is an importance sampling method, based on interacting particles, for distributions associated with a Markov chain. In this article we focus on sampling the average of a function with respect to the steady-state distribution of a generic Markov chain. By generic, we mean that the only thing we might know about the Markov chain is how to sample it; in particular, we may not know its stationary distribution up to a normalization factor.
Weighted ensemble consists of a collection of evolving particles with associated weights. In this sense, weighted ensemble can be understood as a kind of sequential Monte Carlo method [4, 18, 19, 20, 28, 32, 48, 49]. In weighted ensemble, the particles evolve between selection steps according to the law of the underlying Markov chain. In each selection step, some of the particles are copied while others are killed; the resulting particles are given new weights so that weighted ensemble is statistically unbiased [52].
The selection step is based on dividing the particles into bins, where the particles in each bin are resampled according to their relative weights. In practice, the binning, and the number of copies in each bin, should be chosen so that important particles survive and irrelevant particles are killed. The definition of the bins, and how many copies to maintain in each, requires some care. With appropriate choices, weighted ensemble can have drastically smaller variance than direct Monte Carlo, or independent particles; see the references above, or [54] for a more complete list.
Weighted ensemble was developed for applications in computational chemistry [35] ranging from state space exploration [23] to protein association [35] and protein folding [55]. One important application we have in mind is the computation of the mean time for a protein to unfold [6]. This time can be reformulated as the inverse of the steady-state flux into the unfolded state of the underlying Markovian molecular dynamics, with an added sink in the unfolded state and source in the folded state [34]. This dynamics can approach its steady state on time scales significantly smaller than the mean unfolding time [53]. As the flux into the unfolded state is usually very small, importance sampling is needed to estimate it with reasonable precision [6, 43].
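The flux relation just described can be illustrated concretely. The sketch below uses a hypothetical three-state kernel (not taken from the references), with source state 0 and target state 2, both assumptions for illustration: the mean first passage time into the target, computed directly by a linear solve, agrees with the inverse steady-state flux of the source–sink ("recycled") chain.

```python
# Hill-relation sketch: the mean first passage time (MFPT) into a target
# state equals 1 / (steady-state flux into the target) for the chain with
# a sink at the target and a source at the initial state.
# Hypothetical 3-state kernel, chosen only for illustration.
K = [[0.5, 0.5, 0.0],   # transition probabilities from state 0
     [0.3, 0.6, 0.1],   # from state 1
     [0.2, 0.3, 0.5]]   # from state 2 (the target; unused after recycling)
source, target = 0, 2

# Recycled kernel on the non-target states {0, 1}: probability entering
# the target is instantly fed back into the source.
states = [0, 1]
Kt = [[K[i][j] + (K[i][target] if j == source else 0.0) for j in states]
      for i in states]

# Stationary distribution of the 2x2 recycled chain from detailed balance
# of the two off-diagonal flows: pi0 * Kt[0][1] = pi1 * Kt[1][0].
p0 = Kt[1][0] / (Kt[1][0] + Kt[0][1])
pi = [p0, 1.0 - p0]

# Steady-state flux into the target, and the Hill-relation MFPT.
flux = sum(pi[i] * K[i][target] for i in states)
mfpt_hill = 1.0 / flux

# Direct MFPT from the source: solve (I - K_AA) m = 1 on A = {0, 1}
# by Cramer's rule for the 2x2 system.
a, b = 1.0 - K[0][0], -K[0][1]
c, d = -K[1][0], 1.0 - K[1][1]
det = a * d - b * c
m0 = (d - b) / det  # first component of (I - K_AA)^{-1} [1, 1]^T

print(mfpt_hill, m0)  # both are approximately 18.0 for this kernel
```

For this kernel both routes give a mean unfolding-style passage time of 18 steps; when the flux is tiny, the direct-simulation route becomes expensive, which is the motivation for importance sampling.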
Other unbiased methods, differing from weighted ensemble in that they usually sample finite-time quantities rather than ergodic averages, include Adaptive Multilevel Splitting [7, 8, 10], Forward Flux Sampling [1], and some sequential Monte Carlo methods [13, 20, 49]. This unbiased property allows for a relatively straightforward study of variance using martingale techniques [2, 7, 18, 20]. In this article we extend these techniques to study the long-time stability of weighted ensemble.
Our main contribution here is a proof of the consistency of weighted ensemble via an ergodic theorem. We believe that this is the first ergodic theorem for an interacting particle system in which the interactions come from resampling.
A secondary contribution comes from explicit formulas for the variance of weighted ensemble at finite particle number. The proof of the ergodic theorem is a straightforward consequence of these formulas. On the theoretical side, our variance formulas are handy for understanding the rate of weighted ensemble convergence, and on the practical side, they could be used for optimizing the method. We mostly leave this discussion to other works, including our companion paper [3]; see also [2] and the references above.
This article is organized as follows. In Section 2 we describe weighted ensemble in detail. In Section 3 we state our main results, including the unbiased property (Theorem 2), the ergodic theorem (Theorem 3), and the variance formulas (Theorem 4). In Section 4 we compare weighted ensemble to direct Monte Carlo, and give a simple example illustrating the potential gain. All of our proofs are in Section 5.
2. Description of the method
Weighted ensemble consists of a fixed number, N, of particles belonging to a common state space, each carrying a positive scalar weight, and undergoing repeated selection and mutation steps. In the selection step, some of the particles are copied, and others are killed, according to a stratification or binning scheme. In the mutation step, the particles evolve according to an underlying Markov kernel K.
At time t before selection, the particles, called parents, are $\xi_t^1,\ldots,\xi_t^N$. At time t after selection, the particles, called children, are $\hat\xi_t^1,\ldots,\hat\xi_t^N$. The weights of the parents and children are $\omega_t^1,\ldots,\omega_t^N$ and $\hat\omega_t^1,\ldots,\hat\omega_t^N$, respectively. The following diagram illustrates weighted ensemble evolution:
\[
\big(\xi_t^i,\omega_t^i\big)_{i=1}^N \;\xrightarrow{\;\text{selection}\;}\; \big(\hat\xi_t^i,\hat\omega_t^i\big)_{i=1}^N \;\xrightarrow{\;\text{mutation}\;}\; \big(\xi_{t+1}^i,\omega_{t+1}^i\big)_{i=1}^N \;\xrightarrow{\;\text{selection}\;}\; \cdots \tag{1}
\]
The initial particles $\xi_0^1, \ldots,\xi_0^N$ can be arbitrary. The initial weights must be strictly positive and sum to one: $\omega_0^i>0$ for all i, and $\omega_0^1+\cdots + \omega_0^N =1$. The children are initially just copies of their parents, but they evolve forward in time conditionally independently. When we say conditionally independent, we mean conditional on the $\sigma$-algebra representing the information from (1) up to the current time.
Weighted ensemble requires the user to choose, before the selection step at time t, a collection of non-empty bins that partition the set of parents, as well as a particle allocation that defines the number of children in each bin. We write u for the bins, and $N_t(u) \ge 1$ for the number of children in bin u at time t. We require that $\sum_u N_t(u) = N$. The bins can change in time, but for simpler notation we leave this implicit.
In the selection step, the children in each bin are obtained by sampling with replacement from the parents in the bin, according to their weight distribution, as many times as the particle allocation specifies. The children’s weights in each bin are all the same after selection, and the total weight in the bin is preserved [Reference Costaouec, Feng, Izaguirre and Darve16].
In more detail, define the total weight in bin u at time t as
\[
\omega_t(u) \;:\!=\; \sum_{i\colon \xi_t^i \in u}\omega_t^i.
\]
The numbers, $N_t^i$, of children of the parents $\xi_t^i \in u$ are conditionally multinomial:
\[
\big(N_t^i\big)_{i\colon \xi_t^i \in u} \,\Big|\, {\mathcal F}_t \;\sim\; \text{multinomial}\bigg(N_t(u),\, \Big(\frac{\omega_t^i}{\omega_t(u)}\Big)_{i\colon \xi_t^i \in u}\bigg). \tag{2}
\]
Children are assigned to the same bins as their parents, with weights
\[
\hat{\omega}_t^j = \frac{\omega_t(u)}{N_t(u)} \quad \text{if } \hat{\xi}_t^j \in u. \tag{3}
\]
Selections in distinct bins are conditionally independent.
In the mutation step, the children evolve conditionally independently via K:
\[
{\mathbb P}\big(\xi_{t+1}^1 \in dx^1, \ldots, \xi_{t+1}^N \in dx^N \,\big|\, \hat{\mathcal F}_t\big) = \prod_{i=1}^N K\big(\hat{\xi}_t^i, dx^i\big). \tag{4}
\]
The weights do not change during the mutation step. Thus
\[
\omega_{t+1}^i = \hat{\omega}_t^i, \qquad i = 1,\ldots,N. \tag{5}
\]
We summarize weighted ensemble in the following algorithm.
Algorithm 1. Choose initial weights $\omega_0^1,\ldots,\omega_0^N>0$ summing to 1 and initial particles $\xi_0^1,\ldots,\xi_0^N$. Then iterate over $t \ge 0$:
Selection step

1. Partition the parents $\xi_t^1,\ldots,\xi_t^N$ into a collection of bins.

2. Assign a number $N_t(u)\ge 1$ of children to the parents in each bin u.

3. Sample $N_t(u)$ children from the parents in bin u, with replacement, using
\[{\mathbb{P}}(\text{sample $\xi_t^j$ in bin {u}}) = \dfrac{\omega_t^j}{\omega_t(u)},\]
where $\omega_t(u) \;:\!=\; \sum_{i\colon \xi_t^i \in u}\omega_t^i$ is the total weight in bin u.

4. Give all the children in bin u the same weight
\[\hat{\omega}_t^j = \dfrac{\omega_t(u)}{N_t(u)} \quad \text{if }\hat{\xi}_t^j \in u.\]
Mutation step

1. Evolve the children $\hat{\xi}_t^1,\ldots,\hat{\xi}_t^N$ conditionally independently using K to get the next parents $\xi_{t+1}^1,\ldots,\xi_{t+1}^N$. Keep the same weights $\omega_{t+1}^j = \hat{\omega}_t^j$, $j=1,\ldots,N$.
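To make Algorithm 1 concrete, here is a minimal Python sketch for a finite-state chain. The kernel K, the choice of bins (one bin per occupied state), and the near-uniform allocation are illustrative assumptions, not part of the algorithm; any binning of the particles and any allocation with $\sum_u N_t(u) = N$ would do.

```python
import random

random.seed(0)

# Hypothetical two-state Markov kernel, chosen only for illustration.
K = {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]}

def mutate(x):
    # One step of the underlying Markov chain from state x.
    states, probs = zip(*K[x])
    return random.choices(states, weights=probs)[0]

def allocate(bins, N):
    # Spread the N children as evenly as possible over the occupied bins.
    base, extra = divmod(N, len(bins))
    return {u: base + (1 if k < extra else 0) for k, u in enumerate(bins)}

def we_step(particles, weights):
    """One selection + mutation step of Algorithm 1, with bins = states."""
    N = len(particles)
    bins = sorted(set(particles))
    alloc = allocate(bins, N)
    children, child_weights = [], []
    for u in bins:
        idx = [i for i, x in enumerate(particles) if x == u]
        wu = sum(weights[i] for i in idx)            # bin weight omega_t(u)
        picks = random.choices(idx, weights=[weights[i] for i in idx],
                               k=alloc[u])           # multinomial resampling
        children += [particles[i] for i in picks]
        child_weights += [wu / alloc[u]] * alloc[u]  # equal weights in bin u
    # Mutation: children evolve independently via K; weights are unchanged.
    return [mutate(x) for x in children], child_weights

# Weighted time average of f = indicator of state 1; the stationary
# probability of state 1 under this kernel is 1/3.
N, T = 10, 2000
particles, weights = [0] * N, [1.0 / N] * N
total = 0.0
for _ in range(T):
    total += sum(w for x, w in zip(particles, weights) if x == 1)
    particles, weights = we_step(particles, weights)
estimate = total / T
```

By the ergodic theorem below (Theorem 3), `estimate` converges to $\int f\,{\mathrm d}\mu = 1/3$ as $T \to \infty$; note also that the total weight stays exactly 1 across steps, since selection only redistributes weight within bins.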
2.1. Algorithm details and remarks
A few remarks are in order to clarify Algorithm 1.
- We abuse notation by writing $\xi_t^i \in u$ or $\hat{\xi}_t^j \in u$ to indicate that $\xi_t^i$ or $\hat{\xi}_t^j$ is in bin u, even though the bins form a partition of the particles, and not (necessarily) a partition of the state space.
- A child is simply a copy of its parent: if $\hat{\xi}_t^j$ is a child of $\xi_t^i$, then $\hat{\xi}_t^j = \xi_t^i$. The indices of the children are not important, so specifying the number of children of each parent is enough to define them.
- Since the weights do not change in the mutation step and the selection step preserves the total weight, the total weight is constant in time: $\omega_t^1 + \cdots + \omega_t^N = 1$ for all $t \ge 0$. We discuss the importance of this in Remark 1.
- We assume that the bins and particle allocation at time t are included in ${\mathcal F}_t$, the $\sigma$-algebra generated by the information from Algorithm 1 just before the tth selection step. We also write $\hat{\mathcal F}_t$ for the $\sigma$-algebra generated by the information from Algorithm 1 just after the tth selection step. See Section 5.1 for details.
- We assume multinomial sampling in the bins because it leads to simple explicit variance expressions in terms of intrabin variances. In Remark 3 we comment on residual sampling, which performs much better than multinomial resampling and still admits nice variance formulas.
For our ergodic theorem, the bins and particle allocation can be arbitrary. To actually do better than direct Monte Carlo, they must be judiciously chosen. The most common strategy is to define bins based on a carefully constructed partition of state space – particles occupy the same bin when they belong to the same element of the partition – and then allocate children approximately uniformly among these bins. Some knowledge about the underlying problem is needed to choose the bins, but this strategy has had considerable success, as the references in the introduction attest (see [54] for a mostly current list of application papers). We propose a different strategy in our companion paper [3] that uses our variance analysis below. We summarize that strategy in Section 4 below.
3. Main results
3.1. Unbiased property
We begin with the unbiased property of weighted ensemble. This property was previously noted in [52], and proved in a slightly different setting in [2].
Theorem 2. (Unbiased property.) For each $t\ge 0$ and all bounded measurable g,
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big)\Bigg] = \int K^t g \,{\mathrm{d}}\nu,
\]
where $\nu$ is the weighted ensemble initial distribution,
\[
\nu \;:\!=\; {\mathbb E}\Bigg[\sum_{i=1}^N \omega_0^i\, \delta_{\xi_0^i}\Bigg].
\]
We could interpret Theorem 2 as follows. If $(X_t)$ is a Markov chain with kernel K and initial distribution $\nu$, then
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big)\Bigg] = {\mathbb E}\big[g(X_t)\big].
\]
In this sense, weighted ensemble gives unbiased estimates of the law of the underlying Markov chain.
3.2. Ergodic theorem
To ensure that weighted ensemble is ergodic, the underlying Markov kernel K must be ergodic in some sense. We assume that K is uniformly ergodic [27].
Assumption 1. There exist $c>0$, $\lambda \in [0,1)$, and a probability measure $\mu$ such that
\[
\sup_{\xi}\, \big\| K^t(\xi,\cdot) - \mu \big\|_{\rm TV} \le c\lambda^t \quad \text{for all } t \ge 0.
\]
Here and below, f is a fixed bounded measurable function.
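For a small finite-state chain, Assumption 1 is easy to check directly: it holds with $\lambda$ equal to the second-largest eigenvalue modulus of K. The sketch below uses a hypothetical two-state kernel (not from the text, and with second eigenvalue 0.7) and verifies the geometric decay of the worst-case total variation distance numerically.

```python
# Numerically check sup_xi || K^t(xi,.) - mu ||_TV <= c * lambda^t
# for a hypothetical 2-state kernel with second eigenvalue 0.7.
K = [[0.9, 0.1], [0.2, 0.8]]
mu = [2.0 / 3.0, 1.0 / 3.0]   # stationary distribution of K
lam = 0.7                      # second eigenvalue of K

def matmul(A, B):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tv_sup(Kt):
    # sup over starting states of the total variation distance to mu.
    return max(0.5 * sum(abs(Kt[i][j] - mu[j]) for j in range(2))
               for i in range(2))

Kt, dists = K, []
for t in range(1, 11):
    dists.append(tv_sup(Kt))   # distance after t steps
    Kt = matmul(Kt, K)

ratios = [dists[t + 1] / dists[t] for t in range(len(dists) - 1)]
# For a two-state chain the deviation from mu lies along a single
# eigenvector, so each ratio equals lam exactly (up to rounding).
```

Here the assumption holds with $c = 1$ and $\lambda = 0.7$; for chains with more states the decay is geometric only up to a constant, which is why the assumption carries the prefactor c.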
Theorem 3. (Ergodic theorem.) If Assumption 1 holds, then with probability 1,
\[
\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1}\sum_{i=1}^N \omega_t^i\, f\big(\xi_t^i\big) = \int f\,{\mathrm{d}}\mu. \tag{6}
\]
Convergence of the mean of the time average in (6), at the same rate as direct Monte Carlo, follows from Assumption 1 and the unbiased property (Theorem 2). For the ergodic theorem to hold, and for weighted ensemble to beat direct Monte Carlo, the variance of the time average should be sufficiently small. Well-behaved variance is not automatic for unbiased methods; see Remark 1 below.
3.3. Variance formulas
Here, we give exact, finite-N formulas for the variance of weighted ensemble, based on a martingale decomposition. To obtain concise formulas, we need some notation. Define the intrabin distributions
\[
\eta_t^u \;:\!=\; \frac{1}{\omega_t(u)} \sum_{i\colon \xi_t^i \in u} \omega_t^i\, \delta_{\xi_t^i},
\]
where $\delta_{\xi}$ is the Dirac delta distribution centered at $\xi$. Define also
\[
h_{t,T} \;:\!=\; \sum_{s=0}^{T-t-1} K^s f.
\]
For a probability measure $\eta$ and a bounded measurable function g, define
\[
{\rm{Var}}_\eta(g) \;:\!=\; \int g^2 \,{\mathrm{d}}\eta - \bigg(\int g \,{\mathrm{d}}\eta\bigg)^2,
\]
and in particular, let ${\rm{Var}}_K g(\xi) \;:\!=\; {\rm{Var}}_{K(\xi,\cdot)}(g) = Kg^2(\xi) - (Kg)^2(\xi)$.
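On a finite state space these definitions reduce to elementary sums. The sketch below evaluates $Kg$, $Kg^2$, and ${\rm{Var}}_K g$ pointwise for a hypothetical two-state kernel and observable (both assumptions for illustration).

```python
# Pointwise conditional variance Var_K g(xi) = K(g^2)(xi) - (Kg)^2(xi)
# for a hypothetical 2-state kernel K and observable g.
K = [[0.9, 0.1], [0.2, 0.8]]
g = [0.0, 1.0]   # g = indicator of state 1

Kg  = [sum(K[x][y] * g[y] for y in range(2)) for x in range(2)]        # Kg
Kg2 = [sum(K[x][y] * g[y] ** 2 for y in range(2)) for x in range(2)]   # K(g^2)
var_K = [Kg2[x] - Kg[x] ** 2 for x in range(2)]
# From state 0: Kg = 0.1 and Var_K g = 0.1 - 0.01 = 0.09, the variance of
# a Bernoulli(0.1) random variable, as expected for an indicator g.
```

These pointwise conditional variances are exactly the quantities that enter the mutation variance below, weighted by the squared particle weights.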
Theorem 4. (Variance formulas.) For each time $T >0$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn7.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn8.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn9.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn10.png?pub-status=live)
The expression in (8) can be interpreted as the variance coming from the initial condition, while the expressions in (9) and (10) can be understood as the variances arising from each selection and mutation step, respectively.
Using Theorem 4, the proof of the ergodic theorem is straightforward. Under Assumption 1, we can show that the variances of $h_{t,T}$ and $Kh_{t,T}$ are uniformly bounded in t and T. This makes the weighted ensemble variance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU11.png?pub-status=live)
The ergodic theorem then follows from standard arguments. Note that the variance expressions (7)–(10) by themselves do not require Assumption 1.
Beyond the ergodic theorem, these variance formulas are interesting in their own right, since they could be used to design binning and particle allocation schemes that minimize the weighted ensemble variance. Indeed, that is what we have done in our companion paper [3] (see also [2]). We discuss this more in the next section.
4. Comparison to direct Monte Carlo
There are many existing works showing that weighted ensemble can provide significant gains over direct Monte Carlo: see for instance the reference list in [54], and our companion paper [3]. The main goal of this article is to prove the consistency of weighted ensemble from explicit variance formulas, not to reaffirm this point. Nonetheless, we include a brief discussion here.
Weighted ensemble works by reducing the mutation variance (10), compared to that of direct Monte Carlo (see (28) below), via the selection step. However, this comes at the cost of a positive selection variance (9), compared to direct Monte Carlo, which has selection variance equal to zero. Thus weighted ensemble beats direct Monte Carlo if the reduction in mutation variance is greater than the selection variance cost. (The initial condition variances may be ignored, since they are ${\mathrm{O}}(1/T^2)$ for weighted ensemble and direct Monte Carlo, while the overall variances are ${\mathrm{O}}(1/T)$.)
In more detail, the weighted ensemble mutation variance is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn11.png?pub-status=live)
A Lagrange multiplier calculation shows that the expression inside the expectation in (11) is minimized over $N_t(u)$, subject to the constraint $\sum_u N_t(u) = N$, when
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn12.png?pub-status=live)
Plugging this into (11) makes the weighted ensemble mutation variance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn13.png?pub-status=live)
By Jensen’s inequality, (13) is less than or equal to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn14.png?pub-status=live)
which, by the unbiased property of weighted ensemble (Theorem 2), is exactly the direct Monte Carlo mutation variance (see Remark 2 below). This form of optimal mutation variance gain was originally observed in [2, Remark 4.1].
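The Lagrange multiplier computation can be checked numerically. The sketch below assumes, consistent with the discussion above, that the allocation-dependent part of the mutation variance has the form $\sum_u \omega_t(u)^2\, v_u / N_t(u)$, where $v_u$ stands for the intrabin average of ${\rm{Var}}_K h_{t+1,T}$; the bin weights and intrabin variances are made-up numbers, and the integer constraint on the allocation is ignored (continuous relaxation).

```python
from math import sqrt

# Hypothetical bin weights omega(u) and intrabin mean conditional variances v(u).
omega = [0.5, 0.3, 0.15, 0.05]
v     = [0.01, 0.04, 0.25, 1.0]
N = 100.0

def objective(alloc):
    # Allocation-dependent mutation variance: sum_u omega(u)^2 v(u) / N(u).
    return sum(w * w * vu / n for w, vu, n in zip(omega, v, alloc))

# Lagrange-optimal (continuous) allocation: N(u) proportional to
# omega(u) * sqrt(v(u)), normalized to sum to N.
z = sum(w * sqrt(vu) for w, vu in zip(omega, v))
opt = [N * w * sqrt(vu) / z for w, vu in zip(omega, v)]

uniform = [N / len(omega)] * len(omega)

# At the optimum the objective collapses, by Cauchy-Schwarz, to
# (sum_u omega(u) sqrt(v(u)))^2 / N, which uniform allocation cannot beat.
opt_value, uniform_value = objective(opt), objective(uniform)
```

Allocating more children to bins with large weight or large conditional variance is precisely the mechanism by which selection reduces the mutation variance.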
The selection variance (9) is small whenever, at each time t, bins u are chosen inside which $Kh_{t+1,T}$ does not vary too much. With enough particles and bins, it is possible to keep the selection variance arbitrarily small, while also controlling the mutation variance by keeping the particle allocation close to the optimal (12). Even with modest numbers of particles and bins, weighted ensemble has proved to be useful; see the list of applications in [54], most of which use relatively small N.
In our companion paper [3], we propose an optimization strategy based on choosing the particle allocation to minimize mutation variance, and the bins to minimize selection variance. There, the particle allocation is a simplified version of (12), in which we use the limit $h \;:\!=\; \lim_{T\to \infty} (h_{t,T}- (T-t)\int f\,{\mathrm{d}}\mu)$ in place of $h_{t+1,T}$, and the bins are chosen to make the intrabin variances of Kh small. Of course, estimating Kh and ${\rm{Var}}_Kh$ is a difficult problem. In [3] we propose using a Markov state model to get ‘cheap’ approximations of Kh and ${\rm{Var}}_Kh$; such models are already commonly used for preconditioning weighted ensemble simulations [6, 14, 15].
4.1. Example
Below is a simple example illustrating the variance reduction in weighted ensemble, compared to direct Monte Carlo, in the context of our variance analysis above. Consider a Markov chain on three states, with transition matrix
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU12.png?pub-status=live)
where $\delta>0$ is small, and we take $f(1) = f(2) = 0$ and $f(3) = 1$.
For weighted ensemble, we will assign particles to the same bin if and only if they occupy the same point in space. Following the usual method in the literature, we allocate an approximately equal number of children to each bin.
In this case the variance of weighted ensemble can be estimated as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn15.png?pub-status=live)
where the first approximation in (15) holds for large enough N and T, and the second approximation uses direct calculations, dropping terms of higher order than $\delta^3$. The variance of direct Monte Carlo (see Remark 2) can be estimated by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn16.png?pub-status=live)
for large T. Figure 1 shows numerical confirmation of these estimates. Note the smaller variance (higher order in $\delta$) for weighted ensemble, compared to direct Monte Carlo.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_fig1.png?pub-status=live)
Figure 1. Comparison of weighted ensemble with direct Monte Carlo when $\delta = 0.001$ and $T = 500$. (a) Average values of $({{1}/{T}})\sum_{t=0}^{T-1}\sum_{i=1}^N \omega_t^i f(\xi_t^i)$ versus N, computed from $10^4$ independent trials, for weighted ensemble and direct Monte Carlo. Error bar widths are $\sigma_T/10^2$, where $\sigma_T^2$ are the empirical variances. (b) Weighted ensemble empirical standard deviation compared with (15). (c) Direct Monte Carlo empirical standard deviation compared with (16).
5. Derivations
5.1. Notation
Below, ${\mathcal F}_t$ is the $\sigma$-algebra generated by the parents and their weights from times $0 \le s \le t$, the children and their weights from times $0 \le s \le t-1$, and the bins and particle allocations from times $0 \le s \le t$. Meanwhile, $\hat{\mathcal F}_t$ is the $\sigma$-algebra generated by ${\mathcal F}_t$ together with the children and their weights at time t. Throughout, g denotes a bounded measurable function, and c a positive constant whose value can change between different equations. We will use the notation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn17.png?pub-status=live)
5.2. One-step means
Lemma 1. For each $i=1,\ldots,N$ and $t \ge 0$,
\[
{\mathbb E}\big[\omega_{t+1}^i\, g\big(\xi_{t+1}^i\big) \,\big|\, \hat{\mathcal F}_t\big] = \hat{\omega}_t^i\, Kg\big(\hat{\xi}_t^i\big). \tag{18}
\]
At each time $t \ge 0$, for each bin u,
\[
{\mathbb E}\Bigg[\sum_{j\colon \hat{\xi}_t^j \in u} \hat{\omega}_t^j\, g\big(\hat{\xi}_t^j\big) \,\Bigg|\, {\mathcal F}_t\Bigg] = \sum_{i\colon \xi_t^i \in u} \omega_t^i\, g\big(\xi_t^i\big). \tag{19}
\]
Proof. By (4), ${\mathbb E}[g(\xi_{t+1}^i)\mid \hat{\mathcal F}_t] = Kg(\hat{\xi}_t^i)$. Now (18) follows from this and (5). Meanwhile, by (2), ${\mathbb E}[N_t^i\mid {\mathcal F}_t] = N_t(u)\omega_t^i/\omega_t(u)$ if $\xi_t^i \in u$. Thus, by (3) and (17),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU13.png?pub-status=live)
Lemma 2. (One-step means.) For each time $t \ge 0$,
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_{t+1}^i\, g\big(\xi_{t+1}^i\big) \,\Bigg|\, \hat{\mathcal F}_t\Bigg] = \sum_{i=1}^N \hat{\omega}_t^i\, Kg\big(\hat{\xi}_t^i\big), \tag{20}
\]
\[
{\mathbb E}\Bigg[\sum_{j=1}^N \hat{\omega}_t^j\, g\big(\hat{\xi}_t^j\big) \,\Bigg|\, {\mathcal F}_t\Bigg] = \sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big). \tag{21}
\]
Proof. This follows immediately from Lemma 1, by summing over the particles in (18) to get (20), and summing over the bins in (19) to get (21).
5.3. Proof of the unbiased property
Proof of Theorem 2. By repeated application of Lemma 2 with the tower property,
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big) \,\Bigg|\, {\mathcal F}_0\Bigg] = \sum_{i=1}^N \omega_0^i\, K^t g\big(\xi_0^i\big). \tag{22}
\]
Taking expectations in (22) gives the result.
5.4. Doob martingale and variance decomposition
Below, define the Doob martingale
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU14.png?pub-status=live)
Proposition 1. For $0 \le t \le T-1$, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn23.png?pub-status=live)
Proof. This comes from Lemma 2 by repeated application of the tower property.
Proposition 2. (Martingale variance decomposition.) For each $T>0$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn24.png?pub-status=live)
Proof. It is straightforward to check that all the martingale differences $D_{t+1}-\hat{D}_t$ and $\hat{D}_t-D_t$ are uncorrelated with each other and with $D_0$. The proof is finished by writing
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU15.png?pub-status=live)
as a telescoping sum of the martingale differences, computing ${\mathbb E}[D_{T-1}^2]$ in terms of the martingale differences, and subtracting ${\mathbb E}[D_{T-1}]^2 = {\mathbb E}[D_0]^2$ from the resulting expression.
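The orthogonality computation just sketched can be written out as follows (a standard calculation, with the increments indexed as in the proof above):

```latex
% Telescoping D_{T-1} into martingale increments:
\begin{align*}
D_{T-1} &= D_0 + \sum_{t=0}^{T-2}\bigl(\hat{D}_t - D_t\bigr)
              + \sum_{t=0}^{T-2}\bigl(D_{t+1} - \hat{D}_t\bigr).
\end{align*}
% Each increment has conditional mean zero given the preceding
% sigma-algebra, so all cross terms vanish in expectation, and
\begin{align*}
{\mathbb E}\bigl[D_{T-1}^2\bigr] - {\mathbb E}[D_0]^2
  &= \bigl({\mathbb E}\bigl[D_0^2\bigr] - {\mathbb E}[D_0]^2\bigr)
   + \sum_{t=0}^{T-2}{\mathbb E}\bigl[(\hat{D}_t - D_t)^2\bigr]
   + \sum_{t=0}^{T-2}{\mathbb E}\bigl[(D_{t+1} - \hat{D}_t)^2\bigr].
\end{align*}
```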
In the proof of Theorem 4 below, we will see that the initial condition variance, selection variance, and mutation variance in (24) are the same as those in (8)–(10).
5.5. Proof of the variance formulas
Proof of Theorem 4. Using the formula (23) for the Doob martingale, together with the one-step mean formula (20), the weight update formula (5), and the conditional independence of particle evolution in (4),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn25.png?pub-status=live)
By (25), the weight update formula (3), and the bin mean formula (19),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU16.png?pub-status=live)
In light of Proposition 2, this gives (10). Now we turn to (9). Using the formula (23) for the Doob martingale, the one-step mean formula (21), and the fact that selections in distinct bins are conditionally independent,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn26.png?pub-status=live)
Using (26), the weight update formula (3), and the conditional independence property of multinomial resampling (see e.g. [26, equation (6)]),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU17.png?pub-status=live)
Taking expectations in this expression and appealing to Proposition 2 gives (9).
5.6. Remarks on the variance formulas
Remark 1. We briefly comment on the variance for other methods of the selection and mutation type (1). Consider a method with the same mutation step, but a different selection step that is still unbiased in the sense of (21).
In this case the same variance decomposition (24) applies, with selection variance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn27.png?pub-status=live)
For a method like weighted ensemble in which the total weight is always 1, $Kh_{t+1,T}$ can be replaced with $Kh_{t+1,T} - (T-t-1)\int f\,{\mathrm{d}}\mu$ in (27) without otherwise changing the equation. Assumption 1 shows that $Kh_{t+1,T} - (T-t-1)\int f\,{\mathrm{d}}\mu$ is uniformly bounded in t and T. As a result, the selection variance (27) is ${\mathrm{O}}(1/T)$, and the ergodic theorem remains valid.
We observed numerically that if the total weight varies at each time, then the weights tend to approach zero, and the variance (7) is of order T as $T \to \infty$. Indeed, $Kh_{t+1,T}$ is typically of order T as $T \to \infty$, which suggests that in this case the selection variance (27) is of the order of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU18.png?pub-status=live)
Of course, the ergodic theorem fails if the variance (7) goes to infinity as $T \to \infty$.
Remark 2. Note that direct Monte Carlo – which we define as independent, equally weighted particles – is a special case of weighted ensemble in which each particle always has weight $1/N$, every parent always gets its own bin u, and $N_t(u)$ is always 1. In this case the selection variance is zero, while the mutation variance is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn28.png?pub-status=live)
Remark 3. If the selections in the bins use residual multinomial resampling [26] instead of multinomial resampling, then the selection variance (9) becomes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU19.png?pub-status=live)
where $r_t(u)$ and $\gamma_t^u$ are defined from the residuals
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU20.png?pub-status=live)
by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU21.png?pub-status=live)
We omit proof, but include this formula in case it is useful for optimizations, since residual multinomial resampling performs much better than multinomial resampling yet still admits simple explicit variance expressions.
5.7. Proof of the ergodic theorem
Lemma 3. If Assumption 1 holds, then as $T \to \infty$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU22.png?pub-status=live)
Proof. By Assumption 1, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn29.png?pub-status=live)
where now $c>0$ is a different constant. Thus
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU23.png?pub-status=live)
This shows that ${\rm{Var}}_\eta h_{t+1,T} \le C^2$, and similarly ${\rm{Var}}_\eta Kh_{t+1,T} \le C^2$, for any probability distribution $\eta$. As a result, the selection and mutation variances (9) and (10) are both ${\mathrm{O}}(1/T)$ as $T \to \infty$. By similar arguments,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU24.png?pub-status=live)
which makes the initialization variance (8) ${\mathrm{O}}(1/T^2)$. Thus the variance in (7) is ${\mathrm{O}}(1/T)$.
Proof of Theorem 3. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU25.png?pub-status=live)
Then, for $0 < S \le T$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn30.png?pub-status=live)
By Lemma 3, ${\rm{Var}}(\theta_T) = {\mathrm{O}}(1/T)$. So by Chebyshev’s inequality, there is $c>0$ so that
for large enough T. With $T_n = n^2$, by the Borel–Cantelli lemma, there is $n_0$ such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn31.png?pub-status=live)
By the unbiased property and Assumption 1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn32.png?pub-status=live)
Now, given $S>0$, we can choose $T_n$ so that $T_n \le S \le T_{n+1}$ and write
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn33.png?pub-status=live)
By (30)–(32), with probability 1, the right-hand side of (33) vanishes as $S \to \infty$.
Acknowledgements
The author would like to acknowledge Frédéric Cérou, Peter Christman, Josselin Garnier, Gideon Simpson, Gabriel Stoltz and Brian Van Koten for helpful comments, and especially Jeremy Copperman, Matthias Rousset, Robert J. Webber, and Dan Zuckerman for interesting discussions and insights. The author thanks Robert J. Webber for pointing out errors and making many helpful suggestions concerning a previous version of the manuscript.
Funding information
The author gratefully acknowledges support from the National Science Foundation via the awards NSF-DMS-1818726 and NSF-DMS-1522398.
Competing interests
There were no competing interests to declare during the preparation or publication of this article.