1. Introduction
Weighted ensemble [6, 11, 16, 17, 23, 25, 35, 39, 40, 51, 52, 56] is an importance sampling method, based on interacting particles, for distributions associated with a Markov chain. In this article we focus on sampling the average of a function with respect to the steady-state distribution of a generic Markov chain. By generic, we mean that the only thing we might know about the Markov chain is how to sample it; in particular, we may not know its stationary distribution up to a normalization factor.
Weighted ensemble consists of a collection of evolving particles with associated weights. In this sense, weighted ensemble can be understood as a kind of sequential Monte Carlo method [4, 18, 19, 20, 28, 32, 48, 49]. In weighted ensemble, the particles evolve between selection steps according to the law of the underlying Markov chain. In each selection step, some of the particles are copied while others are killed; the resulting particles are given new weights so that weighted ensemble is statistically unbiased [52].
The selection step is based on dividing the particles into bins, where the particles in each bin are resampled according to their relative weights. In practice, the binning, and the number of copies in each bin, should be chosen so that important particles survive and irrelevant particles are killed. The definition of the bins, and how many copies to maintain in each, requires some care. With appropriate choices, weighted ensemble can have drastically smaller variance than direct Monte Carlo, or independent particles; see the references above, or [54] for a more complete list.
Weighted ensemble was developed for applications in computational chemistry [35] ranging from state space exploration [23] to protein association [35] and protein folding [55]. One important application we have in mind is the computation of the mean time for a protein to unfold [6]. This time can be reformulated as the inverse of the steady-state flux into the unfolded state of the underlying Markovian molecular dynamics, with an added sink in the unfolded state and source in the folded state [34]. This dynamics can approach its steady state on time scales significantly smaller than the mean unfolding time [53]. As the flux into the unfolded state is usually very small, importance sampling is needed to estimate it with reasonable precision [6, 43].
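The flux relation just described can be illustrated concretely. The sketch below uses a hypothetical three-state kernel (not taken from the references), with source state 0 and target state 2, both assumptions for illustration: the mean first passage time into the target, computed directly by a linear solve, agrees with the inverse steady-state flux of the source–sink ("recycled") chain.

```python
# Hill-relation sketch: the mean first passage time (MFPT) into a target
# state equals 1 / (steady-state flux into the target) for the chain with
# a sink at the target and a source at the initial state.
# Hypothetical 3-state kernel, chosen only for illustration.
K = [[0.5, 0.5, 0.0],   # transition probabilities from state 0
     [0.3, 0.6, 0.1],   # from state 1
     [0.2, 0.3, 0.5]]   # from state 2 (the target; unused after recycling)
source, target = 0, 2

# Recycled kernel on the non-target states {0, 1}: probability entering
# the target is instantly fed back into the source.
states = [0, 1]
Kt = [[K[i][j] + (K[i][target] if j == source else 0.0) for j in states]
      for i in states]

# Stationary distribution of the 2x2 recycled chain from detailed balance
# of the two off-diagonal flows: pi0 * Kt[0][1] = pi1 * Kt[1][0].
p0 = Kt[1][0] / (Kt[1][0] + Kt[0][1])
pi = [p0, 1.0 - p0]

# Steady-state flux into the target, and the Hill-relation MFPT.
flux = sum(pi[i] * K[i][target] for i in states)
mfpt_hill = 1.0 / flux

# Direct MFPT from the source: solve (I - K_AA) m = 1 on A = {0, 1}
# by Cramer's rule for the 2x2 system.
a, b = 1.0 - K[0][0], -K[0][1]
c, d = -K[1][0], 1.0 - K[1][1]
det = a * d - b * c
m0 = (d - b) / det  # first component of (I - K_AA)^{-1} [1, 1]^T

print(mfpt_hill, m0)  # both are approximately 18.0 for this kernel
```

For this kernel both routes give a mean unfolding-style passage time of 18 steps; when the flux is tiny, the direct-simulation route becomes expensive, which is the motivation for importance sampling.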
Other unbiased methods, differing from weighted ensemble in that they usually sample finite-time quantities rather than ergodic averages, include Adaptive Multilevel Splitting [7, 8, 10], Forward Flux Sampling [1], and some sequential Monte Carlo methods [13, 20, 49]. This unbiased property allows for a relatively straightforward study of variance using martingale techniques [2, 7, 18, 20]. In this article we extend these techniques to study the long-time stability of weighted ensemble.
Our main contribution here is a proof of the consistency of weighted ensemble via an ergodic theorem. We believe that this is the first ergodic theorem for an interacting particle system in which the interactions come from resampling.
A secondary contribution comes from explicit formulas for the variance of weighted ensemble at finite particle number. The proof of the ergodic theorem is a straightforward consequence of these formulas. On the theoretical side, our variance formulas are handy for understanding the rate of weighted ensemble convergence, and on the practical side, they could be used for optimizing the method. We mostly leave this discussion to other works, including our companion paper [3]; see also [2] and the references above.
This article is organized as follows. In Section 2 we describe weighted ensemble in detail. In Section 3 we state our main results, including the unbiased property (Theorem 2), the ergodic theorem (Theorem 3), and the variance formulas (Theorem 4). In Section 4 we compare weighted ensemble to direct Monte Carlo, and give a simple example illustrating the potential gain. All of our proofs are in Section 5.
2. Description of the method
Weighted ensemble consists of a fixed number, N, of particles belonging to a common state space, each carrying a positive scalar weight, and undergoing repeated selection and mutation steps. In the selection step, some of the particles are copied, and others are killed, according to a stratification or binning scheme. In the mutation step, the particles evolve according to an underlying Markov kernel K.
At time t before selection, the particles, called parents, are $\xi_t^1,\ldots,\xi_t^N$. At time t after selection, the particles, called children, are $\hat\xi_t^1,\ldots,\hat\xi_t^N$. The weights of the parents and children are $\omega_t^1,\ldots,\omega_t^N$ and $\hat\omega_t^1,\ldots,\hat\omega_t^N$, respectively. The following diagram illustrates weighted ensemble evolution:
\[
\big(\xi_t^i,\omega_t^i\big)_{i=1}^N \;\xrightarrow{\;\text{selection}\;}\; \big(\hat\xi_t^i,\hat\omega_t^i\big)_{i=1}^N \;\xrightarrow{\;\text{mutation}\;}\; \big(\xi_{t+1}^i,\omega_{t+1}^i\big)_{i=1}^N \;\xrightarrow{\;\text{selection}\;}\; \cdots \tag{1}
\]
The initial particles $\xi_0^1, \ldots,\xi_0^N$ can be arbitrary. The initial weights must be strictly positive and sum to one: $\omega_0^i>0$ for all i, and $\omega_0^1+\cdots + \omega_0^N =1$. The children are initially just copies of their parents, but they evolve forward in time conditionally independently. When we say conditionally independent, we mean conditional on the $\sigma$-algebra representing the information from (1) up to the current time.
Weighted ensemble requires the user to choose, before the selection step at time t, a collection of non-empty bins that partition the set of parents, as well as a particle allocation that defines the number of children in each bin. We write u for the bins, and $N_t(u) \ge 1$ for the number of children in bin u at time t. We require that $\sum_u N_t(u) = N$. The bins can change in time, but for simpler notation we leave this implicit.
In the selection step, the children in each bin are obtained by sampling with replacement from the parents in the bin, according to their weight distribution, as many times as the particle allocation specifies. The children’s weights in each bin are all the same after selection, and the total weight in the bin is preserved [Reference Costaouec, Feng, Izaguirre and Darve16].
In more detail, define the total weight in bin u at time t as
\[
\omega_t(u) \;:\!=\; \sum_{i\colon \xi_t^i \in u}\omega_t^i.
\]
The numbers, $N_t^i$, of children of the parents $\xi_t^i \in u$ are conditionally multinomial:
\[
\big(N_t^i\big)_{i\colon \xi_t^i \in u} \,\Big|\, {\mathcal F}_t \;\sim\; \text{multinomial}\bigg(N_t(u),\, \Big(\frac{\omega_t^i}{\omega_t(u)}\Big)_{i\colon \xi_t^i \in u}\bigg). \tag{2}
\]
Children are assigned to the same bins as their parents, with weights
\[
\hat{\omega}_t^j = \frac{\omega_t(u)}{N_t(u)} \quad \text{if } \hat{\xi}_t^j \in u. \tag{3}
\]
Selections in distinct bins are conditionally independent.
In the mutation step, the children evolve conditionally independently via K:
\[
{\mathbb P}\big(\xi_{t+1}^1 \in dx^1, \ldots, \xi_{t+1}^N \in dx^N \,\big|\, \hat{\mathcal F}_t\big) = \prod_{i=1}^N K\big(\hat{\xi}_t^i, dx^i\big). \tag{4}
\]
The weights do not change during the mutation step. Thus
\[
\omega_{t+1}^i = \hat{\omega}_t^i, \qquad i = 1,\ldots,N. \tag{5}
\]
We summarize weighted ensemble in the following algorithm.
Algorithm 1. Choose initial weights $\omega_0^1,\ldots,\omega_0^N>0$ summing to 1 and initial particles $\xi_0^1,\ldots,\xi_0^N$. Then iterate over $t \ge 0$:
Selection step

1. Partition the parents $\xi_t^1,\ldots,\xi_t^N$ into a collection of bins.

2. Assign a number $N_t(u)\ge 1$ of children to the parents in each bin u.

3. Sample $N_t(u)$ children from the parents in bin u, with replacement, using
\[{\mathbb{P}}(\text{sample $\xi_t^j$ in bin {u}}) = \dfrac{\omega_t^j}{\omega_t(u)},\]
where $\omega_t(u) \;:\!=\; \sum_{i\colon \xi_t^i \in u}\omega_t^i$ is the total weight in bin u.

4. Give all the children in bin u the same weight
\[\hat{\omega}_t^j = \dfrac{\omega_t(u)}{N_t(u)} \quad \text{if }\hat{\xi}_t^j \in u.\]
Mutation step

1. Evolve the children $\hat{\xi}_t^1,\ldots,\hat{\xi}_t^N$ conditionally independently using K to get the next parents $\xi_{t+1}^1,\ldots,\xi_{t+1}^N$. Keep the same weights $\omega_{t+1}^j = \hat{\omega}_t^j$, $j=1,\ldots,N$.
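To make Algorithm 1 concrete, here is a minimal Python sketch for a finite-state chain. The kernel K, the choice of bins (one bin per occupied state), and the near-uniform allocation are illustrative assumptions, not part of the algorithm; any binning of the particles and any allocation with $\sum_u N_t(u) = N$ would do.

```python
import random

random.seed(0)

# Hypothetical two-state Markov kernel, chosen only for illustration.
K = {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]}

def mutate(x):
    # One step of the underlying Markov chain from state x.
    states, probs = zip(*K[x])
    return random.choices(states, weights=probs)[0]

def allocate(bins, N):
    # Spread the N children as evenly as possible over the occupied bins.
    base, extra = divmod(N, len(bins))
    return {u: base + (1 if k < extra else 0) for k, u in enumerate(bins)}

def we_step(particles, weights):
    """One selection + mutation step of Algorithm 1, with bins = states."""
    N = len(particles)
    bins = sorted(set(particles))
    alloc = allocate(bins, N)
    children, child_weights = [], []
    for u in bins:
        idx = [i for i, x in enumerate(particles) if x == u]
        wu = sum(weights[i] for i in idx)            # bin weight omega_t(u)
        picks = random.choices(idx, weights=[weights[i] for i in idx],
                               k=alloc[u])           # multinomial resampling
        children += [particles[i] for i in picks]
        child_weights += [wu / alloc[u]] * alloc[u]  # equal weights in bin u
    # Mutation: children evolve independently via K; weights are unchanged.
    return [mutate(x) for x in children], child_weights

# Weighted time average of f = indicator of state 1; the stationary
# probability of state 1 under this kernel is 1/3.
N, T = 10, 2000
particles, weights = [0] * N, [1.0 / N] * N
total = 0.0
for _ in range(T):
    total += sum(w for x, w in zip(particles, weights) if x == 1)
    particles, weights = we_step(particles, weights)
estimate = total / T
```

By the ergodic theorem below (Theorem 3), `estimate` converges to $\int f\,{\mathrm d}\mu = 1/3$ as $T \to \infty$; note also that the total weight stays exactly 1 across steps, since selection only redistributes weight within bins.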
2.1. Algorithm details and remarks
A few remarks are in order to clarify Algorithm 1.
- We abuse notation by writing $\xi_t^i \in u$ or $\hat{\xi}_t^j \in u$ to indicate that $\xi_t^i$ or $\hat{\xi}_t^j$ is in bin u, even though the bins form a partition of the particles, and not (necessarily) a partition of the state space.
- A child is simply a copy of its parent: if $\hat{\xi}_t^j$ is a child of $\xi_t^i$, then $\hat{\xi}_t^j = \xi_t^i$. The indices of the children are not important, so specifying the number of children of each parent is enough to define them.
- Since the weights do not change in the mutation step and the selection step preserves the total weight, the total weight is constant in time: $\omega_t^1 + \cdots + \omega_t^N = 1$ for all $t \ge 0$. We discuss the importance of this in Remark 1.
- We assume that the bins and particle allocation at time t are included in ${\mathcal F}_t$, the $\sigma$-algebra generated by the information from Algorithm 1 just before the tth selection step. We also write $\hat{\mathcal F}_t$ for the $\sigma$-algebra generated by the information from Algorithm 1 just after the tth selection step. See Section 5.1 for details.
- We assume multinomial sampling in the bins because it leads to simple explicit variance expressions in terms of intrabin variances. In Remark 3 we comment on residual sampling, which performs much better than multinomial resampling and still admits nice variance formulas.
For our ergodic theorem, the bins and particle allocation can be arbitrary. To actually do better than direct Monte Carlo, they must be judiciously chosen. The most common strategy is to define bins based on a carefully constructed partition of state space – particles occupy the same bin when they belong to the same element of the partition – and then allocate children approximately uniformly among these bins. Some knowledge about the underlying problem is needed to choose the bins, but this strategy has had considerable success, as the references in the introduction attest (see [54] for a mostly current list of application papers). We propose a different strategy in our companion paper [3] that uses our variance analysis below. We summarize that strategy in Section 4 below.
3. Main results
3.1. Unbiased property
We begin with the unbiased property of weighted ensemble. This property was previously noted in [52], and proved in a slightly different setting in [2].
Theorem 2. (Unbiased property.) For each $t\ge 0$ and all bounded measurable g,
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big)\Bigg] = \int K^t g \,{\mathrm{d}}\nu,
\]
where $\nu$ is the weighted ensemble initial distribution,
\[
\nu \;:\!=\; {\mathbb E}\Bigg[\sum_{i=1}^N \omega_0^i\, \delta_{\xi_0^i}\Bigg].
\]
We could interpret Theorem 2 as follows. If $(X_t)$ is a Markov chain with kernel K and initial distribution $\nu$, then
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big)\Bigg] = {\mathbb E}\big[g(X_t)\big].
\]
In this sense, weighted ensemble gives unbiased estimates of the law of the underlying Markov chain.
3.2. Ergodic theorem
To ensure that weighted ensemble is ergodic, the underlying Markov kernel K must be ergodic in some sense. We assume that K is uniformly ergodic [27].
Assumption 1. There exist $c>0$, $\lambda \in [0,1)$, and a probability measure $\mu$ such that
\[
\sup_{\xi}\, \big\| K^t(\xi,\cdot) - \mu \big\|_{\rm TV} \le c\lambda^t \quad \text{for all } t \ge 0.
\]
Here and below, f is a fixed bounded measurable function.
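For a small finite-state chain, Assumption 1 is easy to check directly: it holds with $\lambda$ equal to the second-largest eigenvalue modulus of K. The sketch below uses a hypothetical two-state kernel (not from the text, and with second eigenvalue 0.7) and verifies the geometric decay of the worst-case total variation distance numerically.

```python
# Numerically check sup_xi || K^t(xi,.) - mu ||_TV <= c * lambda^t
# for a hypothetical 2-state kernel with second eigenvalue 0.7.
K = [[0.9, 0.1], [0.2, 0.8]]
mu = [2.0 / 3.0, 1.0 / 3.0]   # stationary distribution of K
lam = 0.7                      # second eigenvalue of K

def matmul(A, B):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tv_sup(Kt):
    # sup over starting states of the total variation distance to mu.
    return max(0.5 * sum(abs(Kt[i][j] - mu[j]) for j in range(2))
               for i in range(2))

Kt, dists = K, []
for t in range(1, 11):
    dists.append(tv_sup(Kt))   # distance after t steps
    Kt = matmul(Kt, K)

ratios = [dists[t + 1] / dists[t] for t in range(len(dists) - 1)]
# For a two-state chain the deviation from mu lies along a single
# eigenvector, so each ratio equals lam exactly (up to rounding).
```

Here the assumption holds with $c = 1$ and $\lambda = 0.7$; for chains with more states the decay is geometric only up to a constant, which is why the assumption carries the prefactor c.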
Theorem 3. (Ergodic theorem.) If Assumption 1 holds, then with probability 1,
\[
\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1}\sum_{i=1}^N \omega_t^i\, f\big(\xi_t^i\big) = \int f\,{\mathrm{d}}\mu. \tag{6}
\]
Convergence of the mean of the time average in (6), at the same rate as direct Monte Carlo, follows from Assumption 1 and the unbiased property (Theorem 2). For the ergodic theorem to hold, and for weighted ensemble to beat direct Monte Carlo, the variance of the time average should be sufficiently small. Well-behaved variance is not automatic for unbiased methods; see Remark 1 below.
3.3. Variance formulas
Here, we give exact, finite-N formulas for the variance of weighted ensemble, based on a martingale decomposition. To obtain concise formulas, we need some notation. Define the intrabin distributions
\[
\eta_t^u \;:\!=\; \frac{1}{\omega_t(u)} \sum_{i\colon \xi_t^i \in u} \omega_t^i\, \delta_{\xi_t^i},
\]
where $\delta_{\xi}$ is the Dirac delta distribution centered at $\xi$. Define also
\[
h_{t,T} \;:\!=\; \sum_{s=0}^{T-t-1} K^s f.
\]
For a probability measure $\eta$ and a bounded measurable function g, define
\[
{\rm{Var}}_\eta(g) \;:\!=\; \int g^2 \,{\mathrm{d}}\eta - \bigg(\int g \,{\mathrm{d}}\eta\bigg)^2,
\]
and in particular, let ${\rm{Var}}_K g(\xi) \;:\!=\; {\rm{Var}}_{K(\xi,\cdot)}(g) = Kg^2(\xi) - (Kg)^2(\xi)$.
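On a finite state space these definitions reduce to elementary sums. The sketch below evaluates $Kg$, $Kg^2$, and ${\rm{Var}}_K g$ pointwise for a hypothetical two-state kernel and observable (both assumptions for illustration).

```python
# Pointwise conditional variance Var_K g(xi) = K(g^2)(xi) - (Kg)^2(xi)
# for a hypothetical 2-state kernel K and observable g.
K = [[0.9, 0.1], [0.2, 0.8]]
g = [0.0, 1.0]   # g = indicator of state 1

Kg  = [sum(K[x][y] * g[y] for y in range(2)) for x in range(2)]        # Kg
Kg2 = [sum(K[x][y] * g[y] ** 2 for y in range(2)) for x in range(2)]   # K(g^2)
var_K = [Kg2[x] - Kg[x] ** 2 for x in range(2)]
# From state 0: Kg = 0.1 and Var_K g = 0.1 - 0.01 = 0.09, the variance of
# a Bernoulli(0.1) random variable, as expected for an indicator g.
```

These pointwise conditional variances are exactly the quantities that enter the mutation variance below, weighted by the squared particle weights.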
Theorem 4. (Variance formulas.) For each time $T >0$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn7.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn8.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn9.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn10.png?pub-status=live)
The expression in (8) can be interpreted as the variance coming from the initial condition, while the expressions in (9) and (10) can be understood as the variances arising from each selection and mutation step, respectively.
Using Theorem 4, the proof of the ergodic theorem is straightforward. Under Assumption 1, we can show that the variances of $h_{t,T}$ and $Kh_{t,T}$ are uniformly bounded in t and T. This makes the weighted ensemble variance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU11.png?pub-status=live)
The ergodic theorem then follows from standard arguments. Note that the variance expressions (7)–(10) by themselves do not require Assumption 1.
Beyond the ergodic theorem, these variance formulas are interesting in their own right, since they could be used to design binning and particle allocation schemes that minimize the weighted ensemble variance. Indeed, that is what we have done in our companion paper [3] (see also [2]). We discuss this more in the next section.
4. Comparison to direct Monte Carlo
There are many existing works showing that weighted ensemble can provide significant gains over direct Monte Carlo: see for instance the reference list in [54], and our companion paper [3]. The main goal of this article is to prove the consistency of weighted ensemble from explicit variance formulas, not to reaffirm this point. Nonetheless, we include a brief discussion here.
Weighted ensemble works by reducing the mutation variance (10), compared to that of direct Monte Carlo (see (28) below), via the selection step. However, this comes at the cost of a positive selection variance (9), compared to direct Monte Carlo, which has selection variance equal to zero. Thus weighted ensemble beats direct Monte Carlo if the reduction in mutation variance is greater than the selection variance cost. (The initial condition variances may be ignored, since they are ${\mathrm{O}}(1/T^2)$ for weighted ensemble and direct Monte Carlo, while the overall variances are ${\mathrm{O}}(1/T)$.)
In more detail, the weighted ensemble mutation variance is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn11.png?pub-status=live)
A Lagrange multiplier calculation shows that the expression inside the expectation in (11) is minimized over $N_t(u)$, subject to the constraint $\sum_u N_t(u) = N$, when
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn12.png?pub-status=live)
Plugging this into (11) makes the weighted ensemble mutation variance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn13.png?pub-status=live)
By Jensen’s inequality, (13) is less than or equal to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn14.png?pub-status=live)
which, by the unbiased property of weighted ensemble (Theorem 2), is exactly the direct Monte Carlo mutation variance (see Remark 2 below). This form of optimal mutation variance gain was originally observed in [2, Remark 4.1].
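The Lagrange multiplier computation can be checked numerically. The sketch below assumes, consistent with the discussion above, that the allocation-dependent part of the mutation variance has the form $\sum_u \omega_t(u)^2\, v_u / N_t(u)$, where $v_u$ stands for the intrabin average of ${\rm{Var}}_K h_{t+1,T}$; the bin weights and intrabin variances are made-up numbers, and the integer constraint on the allocation is ignored (continuous relaxation).

```python
from math import sqrt

# Hypothetical bin weights omega(u) and intrabin mean conditional variances v(u).
omega = [0.5, 0.3, 0.15, 0.05]
v     = [0.01, 0.04, 0.25, 1.0]
N = 100.0

def objective(alloc):
    # Allocation-dependent mutation variance: sum_u omega(u)^2 v(u) / N(u).
    return sum(w * w * vu / n for w, vu, n in zip(omega, v, alloc))

# Lagrange-optimal (continuous) allocation: N(u) proportional to
# omega(u) * sqrt(v(u)), normalized to sum to N.
z = sum(w * sqrt(vu) for w, vu in zip(omega, v))
opt = [N * w * sqrt(vu) / z for w, vu in zip(omega, v)]

uniform = [N / len(omega)] * len(omega)

# At the optimum the objective collapses, by Cauchy-Schwarz, to
# (sum_u omega(u) sqrt(v(u)))^2 / N, which uniform allocation cannot beat.
opt_value, uniform_value = objective(opt), objective(uniform)
```

Allocating more children to bins with large weight or large conditional variance is precisely the mechanism by which selection reduces the mutation variance.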
The selection variance (9) is small whenever, at each time t, bins u are chosen inside which $Kh_{t+1,T}$ does not vary too much. With enough particles and bins, it is possible to keep the selection variance arbitrarily small, while also controlling the mutation variance by keeping the particle allocation close to the optimal (12). Even with modest numbers of particles and bins, weighted ensemble has proved to be useful; see the list of applications in [54], most of which use relatively small N.
In our companion paper [3], we propose an optimization strategy based on choosing the particle allocation to minimize mutation variance, and the bins to minimize selection variance. There, the particle allocation is a simplified version of (12), in which we use the limit $h \;:\!=\; \lim_{T\to \infty} (h_{t,T}- (T-t)\int f\,{\mathrm{d}}\mu)$ in place of $h_{t+1,T}$, and the bins are chosen to make the intrabin variances of Kh small. Of course, estimating Kh and ${\rm{Var}}_Kh$ is a difficult problem. In [3] we propose using a Markov state model to get ‘cheap’ approximations of Kh and ${\rm{Var}}_Kh$; such models are already commonly used for preconditioning weighted ensemble simulations [6, 14, 15].
4.1. Example
Below is a simple example illustrating the variance reduction in weighted ensemble, compared to direct Monte Carlo, in the context of our variance analysis above. Consider a Markov chain on three states, with transition matrix
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU12.png?pub-status=live)
where $\delta>0$ is small, and we take $f(1) = f(2) = 0$ and $f(3) = 1$.
For weighted ensemble, we will assign particles to the same bin if and only if they occupy the same point in space. Following the usual method in the literature, we allocate an approximately equal number of children to each bin.
In this case the variance of weighted ensemble can be estimated as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn15.png?pub-status=live)
where the first approximation in (15) holds for large enough N and T, and the second approximation uses direct calculations, dropping terms of higher order than $\delta^3$. The variance of direct Monte Carlo (see Remark 2) can be estimated by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn16.png?pub-status=live)
for large T. Figure 1 shows numerical confirmation of these estimates. Note the smaller variance (higher order in $\delta$) for weighted ensemble, compared to direct Monte Carlo.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_fig1.png?pub-status=live)
Figure 1. Comparison of weighted ensemble with direct Monte Carlo when $\delta = 0.001$ and $T = 500$. (a) Average values of $({{1}/{T}})\sum_{t=0}^{T-1}\sum_{i=1}^N \omega_t^i f(\xi_t^i)$ versus N, computed from $10^4$ independent trials, for weighted ensemble and direct Monte Carlo. Error bar widths are $\sigma_T/10^2$, where $\sigma_T^2$ are the empirical variances. (b) Weighted ensemble empirical standard deviation compared with (15). (c) Direct Monte Carlo empirical standard deviation compared with (16).
5. Derivations
5.1. Notation
Below, ${\mathcal F}_t$ is the $\sigma$-algebra generated by the parents and their weights from times $0 \le s \le t$, the children and their weights from times $0 \le s \le t-1$, and the bins and particle allocations from times $0 \le s \le t$. Meanwhile, $\hat{\mathcal F}_t$ is the $\sigma$-algebra generated by ${\mathcal F}_t$ together with the children and their weights at time t. Throughout, g denotes a bounded measurable function, and c a positive constant whose value can change between different equations. We will use the notation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn17.png?pub-status=live)
5.2. One-step means
Lemma 1. For each $i=1,\ldots,N$ and $t \ge 0$,
\[
{\mathbb E}\big[\omega_{t+1}^i\, g\big(\xi_{t+1}^i\big) \,\big|\, \hat{\mathcal F}_t\big] = \hat{\omega}_t^i\, Kg\big(\hat{\xi}_t^i\big). \tag{18}
\]
At each time $t \ge 0$, for each bin u,
\[
{\mathbb E}\Bigg[\sum_{j\colon \hat{\xi}_t^j \in u} \hat{\omega}_t^j\, g\big(\hat{\xi}_t^j\big) \,\Bigg|\, {\mathcal F}_t\Bigg] = \sum_{i\colon \xi_t^i \in u} \omega_t^i\, g\big(\xi_t^i\big). \tag{19}
\]
Proof. By (4), ${\mathbb E}[g(\xi_{t+1}^i)\mid \hat{\mathcal F}_t] = Kg(\hat{\xi}_t^i)$. Now (18) follows from this and (5). Meanwhile, by (2), ${\mathbb E}[N_t^i\mid {\mathcal F}_t] = N_t(u)\omega_t^i/\omega_t(u)$ if $\xi_t^i \in u$. Thus, by (3) and (17),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU13.png?pub-status=live)
Lemma 2. (One-step means.) For each time $t \ge 0$,
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_{t+1}^i\, g\big(\xi_{t+1}^i\big) \,\Bigg|\, \hat{\mathcal F}_t\Bigg] = \sum_{i=1}^N \hat{\omega}_t^i\, Kg\big(\hat{\xi}_t^i\big), \tag{20}
\]
\[
{\mathbb E}\Bigg[\sum_{j=1}^N \hat{\omega}_t^j\, g\big(\hat{\xi}_t^j\big) \,\Bigg|\, {\mathcal F}_t\Bigg] = \sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big). \tag{21}
\]
Proof. This follows immediately from Lemma 1, by summing over the particles in (18) to get (20), and summing over the bins in (19) to get (21).
5.3. Proof of the unbiased property
Proof of Theorem 2. By repeated application of Lemma 2 with the tower property,
\[
{\mathbb E}\Bigg[\sum_{i=1}^N \omega_t^i\, g\big(\xi_t^i\big) \,\Bigg|\, {\mathcal F}_0\Bigg] = \sum_{i=1}^N \omega_0^i\, K^t g\big(\xi_0^i\big). \tag{22}
\]
Taking expectations in (22) gives the result.
5.4. Doob martingale and variance decomposition
Below, define the Doob martingale
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU14.png?pub-status=live)
Proposition 1. For $0 \le t \le T-1$, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn23.png?pub-status=live)
Proof. This comes from Lemma 2 by repeated application of the tower property.
Proposition 2. (Martingale variance decomposition.) For each $T>0$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn24.png?pub-status=live)
Proof. It is straightforward to check that all the martingale differences $D_{t+1}-\hat{D}_t$ and $\hat{D}_t-D_t$ are uncorrelated with each other and with $D_0$. The proof is finished by writing
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU15.png?pub-status=live)
as a telescoping sum of the martingale differences, computing ${\mathbb E}[D_{T-1}^2]$ in terms of the martingale differences, and subtracting ${\mathbb E}[D_{T-1}]^2 = {\mathbb E}[D_0]^2$ from the resulting expression.
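The orthogonality computation just sketched can be written out as follows (a standard calculation, with the increments indexed as in the proof above):

```latex
% Telescoping D_{T-1} into martingale increments:
\begin{align*}
D_{T-1} &= D_0 + \sum_{t=0}^{T-2}\bigl(\hat{D}_t - D_t\bigr)
              + \sum_{t=0}^{T-2}\bigl(D_{t+1} - \hat{D}_t\bigr).
\end{align*}
% Each increment has conditional mean zero given the preceding
% sigma-algebra, so all cross terms vanish in expectation, and
\begin{align*}
{\mathbb E}\bigl[D_{T-1}^2\bigr] - {\mathbb E}[D_0]^2
  &= \bigl({\mathbb E}\bigl[D_0^2\bigr] - {\mathbb E}[D_0]^2\bigr)
   + \sum_{t=0}^{T-2}{\mathbb E}\bigl[(\hat{D}_t - D_t)^2\bigr]
   + \sum_{t=0}^{T-2}{\mathbb E}\bigl[(D_{t+1} - \hat{D}_t)^2\bigr].
\end{align*}
```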
In the proof of Theorem 4 below, we will see that the initial condition variance, selection variance, and mutation variance in (24) are the same as those in (8)–(10).
5.5. Proof of the variance formulas
Proof of Theorem 4. Using the formula (23) for the Doob martingale, together with the one-step mean formula (20), the weight update formula (5), and the conditional independence of particle evolution in (4),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn25.png?pub-status=live)
By (25), the weight update formula (3), and the bin mean formula (19),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU16.png?pub-status=live)
In light of Proposition 2, this gives (10). Now we turn to (9). Using the formula (23) for the Doob martingale, the one-step mean formula (21), and the fact that selections in distinct bins are conditionally independent,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn26.png?pub-status=live)
Using (26), the weight update formula (3), and the conditional independence property of multinomial resampling (see e.g. [26, equation (6)]),
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU17.png?pub-status=live)
Taking expectations in this expression and appealing to Proposition 2 gives (9).
5.6. Remarks on the variance formulas
Remark 1. We briefly comment on the variance for other methods of the selection and mutation type (1). Consider a method with the same mutation step, but a different selection step that is still unbiased in the sense of (21).
In this case the same variance decomposition (24) applies, with selection variance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn27.png?pub-status=live)
For a method like weighted ensemble in which the total weight is always 1, $Kh_{t+1,T}$ can be replaced with $Kh_{t+1,T} - (T-t-1)\int f\,{\mathrm{d}}\mu$ in (27) without otherwise changing the equation. Assumption 1 shows that $Kh_{t+1,T} - (T-t-1)\int f\,{\mathrm{d}}\mu$ is uniformly bounded in t and T. As a result, the selection variance (27) is ${\mathrm{O}}(1/T)$, and the ergodic theorem remains valid.
We observed numerically that if the total weight varies at each time, then the weights tend to approach zero, and the variance (7) is of order T as $T \to \infty$. Indeed, $Kh_{t+1,T}$ is typically of order T as $T \to \infty$, which suggests that in this case the selection variance (27) is of the order of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU18.png?pub-status=live)
Of course, the ergodic theorem fails if the variance (7) goes to infinity as $T \to \infty$.
Remark 2. Note that direct Monte Carlo – which we define as independent, equally weighted particles – is a special case of weighted ensemble in which each particle always has weight $1/N$, every parent always gets its own bin u, and $N_t(u)$ is always 1. In this case the selection variance is zero, while the mutation variance is
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn28.png?pub-status=live)
Remark 3. If the selections in the bins use residual multinomial resampling [26] instead of multinomial resampling, then the selection variance (9) becomes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU19.png?pub-status=live)
where $r_t(u)$ and $\gamma_t^u$ are defined from the residuals
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU20.png?pub-status=live)
by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU21.png?pub-status=live)
We omit proof, but include this formula in case it is useful for optimizations, since residual multinomial resampling performs much better than multinomial resampling yet still admits simple explicit variance expressions.
5.7. Proof of the ergodic theorem
Lemma 3. If Assumption 1 holds, then as $T \to \infty$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU22.png?pub-status=live)
Proof. By Assumption 1, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn29.png?pub-status=live)
where now $c>0$ is a different constant. Thus
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU23.png?pub-status=live)
This shows that ${\rm{Var}}_\eta h_{t+1,T} \le C^2$, and similarly ${\rm{Var}}_\eta Kh_{t+1,T} \le C^2$, for any probability distribution $\eta$. As a result, the selection and mutation variances (9) and (10) are both ${\mathrm{O}}(1/T)$ as $T \to \infty$. By similar arguments,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU24.png?pub-status=live)
which makes the initialization variance (8) ${\mathrm{O}}(1/T^2)$. Thus the variance in (7) is ${\mathrm{O}}(1/T)$.
Proof of Theorem 3. Define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqnU25.png?pub-status=live)
Then, for $0 < S \le T$,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn30.png?pub-status=live)
By Lemma 3, ${\rm{Var}}(\theta_T) = {\mathrm{O}}(1/T)$. So by Chebyshev’s inequality, there is $c>0$ so that
for large enough T. With $T_n = n^2$, by the Borel–Cantelli lemma, there is $n_0$ such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn31.png?pub-status=live)
By the unbiased property and Assumption 1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn32.png?pub-status=live)
Now, given $S>0$, we can choose $T_n$ so that $T_n \le S \le T_{n+1}$ and write
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220328010303824-0027:S0021900221000383:S0021900221000383_eqn33.png?pub-status=live)
By (30)–(32), with probability 1, the right-hand side of (33) vanishes as $S \to \infty$.
Acknowledgements
The author would like to acknowledge Frédéric Cérou, Peter Christman, Josselin Garnier, Gideon Simpson, Gabriel Stoltz and Brian Van Koten for helpful comments, and especially Jeremy Copperman, Matthias Rousset, Robert J. Webber, and Dan Zuckerman for interesting discussions and insights. The author thanks Robert J. Webber for pointing out errors and making many helpful suggestions concerning a previous version of the manuscript.
Funding information
The author gratefully acknowledges support from the National Science Foundation via the awards NSF-DMS-1818726 and NSF-DMS-1522398.
Competing interests
There were no competing interests to declare during the preparation or publication of this article.