Published online by Cambridge University Press: 12 December 2005
This article deals with the estimation of rare event probabilities using fast simulation based on the splitting method. In this technique, sample paths are split into multiple copies at various stages of the simulation. Our aim is to optimize the algorithm and to obtain a precise confidence interval for the estimator using branching processes. The numerical results presented suggest that the method is reasonably efficient.
The analysis of rare events is of great importance in many fields because of the risk associated with such events. Their probabilities are often of order 10^−9 to 10^−12. There are several ways to study them: the first is statistical analysis, based on the standard extreme value distributions, but this requires a long observation period (see Aldous [1]); the second is modeling, which leads to estimating the rare event probability either by an analytical approach (see Sadowsky [10]) or by simulation.
In this article we focus on the simulation approach based on the Monte Carlo method. Nevertheless, a crude simulation is impractical for estimating such small probabilities: to estimate probabilities of order 10^−10 with acceptable confidence would require the simulation of at least 10^12 events (which corresponds to the occurrence of 100 rare events).
To overcome these limits, fast simulation techniques are applied. In particular, importance sampling (IS) is a refinement of Monte Carlo methods. The main idea of IS is to make the occurrence of the rare event more frequent. More precisely, IS consists of selecting a change of measure that minimizes the variance of the estimator. Using another method based on particle systems, Cerou, Del Moral, Legland, and Lezaud [3] gave theoretical results on the convergence of this kind of algorithm. In this article, we deal with the RESTART (REpetitive Simulation Trials After Reaching Thresholds) algorithm presented by Villen-Altamirano and Villen-Altamirano [11] and based on splitting. The basic idea of splitting is to partition the state space of the system into a series of nested subsets and to consider the rare event as the intersection of a nested sequence of events. When a given subset is entered by a sample trajectory, random retrials are generated from the initial state corresponding to the state of the system at the entry point. Thus, the system trajectory is split into a number of new subtrajectories. However, the analysis of the RESTART model presents numerous difficulties because of the lack of hypotheses and the complexity of the formulas.
In this article we build a simple model of splitting for which we are able to derive precise conclusions. It is based on the same idea: before entering the rare event A, there exist intermediate states visited more often than A by the trajectory: A = B_{M+1} ⊂ B_M ⊂ ··· ⊂ B_1. Let
. The fact that a sample trajectory enters B_i is represented by a Bernoulli trial. Every time a sample trajectory enters a subset B_i, i = 1,…,M, it is divided into a number R_i of subtrajectories starting from level i. More precisely, we generate N random variables with common Bernoulli law Ber(P_1) and check whether the subset B_1 is reached. If so, we duplicate the trials into R_1 retrials of Ber(P_2) and check whether the subset B_2 is reached. Thus,
and an unbiased estimator of P is
where
are independent and identically distributed (i.i.d.), N_A is the number of trials that reach A during the simulation, and N is the number of particles initially generated. An optimal algorithm is chosen via the minimization of the variance of
for a given budget. For this, we have to describe the cost of a given simulation: Each time a particle is launched, it generates an average cost that is assumed here to be a function h of the transition probability. Therefore, the (average) cost is
where r_i = R_1···R_i, i = 1,…,M, r_0 = 1, and P_{i|0} = P_1···P_i, i = 1,…,M + 1, with P_{0|0} = 1. Then the optimal algorithm is described by
and M is given by M = [ln P/y_0] − 1 or M = [ln P/y_0], where y_0 is the solution of Eq. (30). The optimal sampling numbers are independent of the budget, which only determines the optimal number of independent particles generated initially. In the special case of h = 1,
Thus, the optimal sampling number and the optimal transition probabilities are independent of the rare event probability. For example, if P = 10^−12 and C = 10^3, then M = 16, P_i ≈ 0.2, R_i = 5, and N = 59.
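These optimal values for h = 1 can be reproduced numerically. The sketch below (in Python; the helper name `optimal_params` is ours) solves y = 2(e^y − 1) by bisection, picks M among the two integer candidates around ln P/y_0 via the cost criterion, here taken as (M + 1)²(P^{−1/(M+1)} − 1), a form consistent with Example 3.2, and then sets P_i = P^{1/(M+1)}, R ≈ 1/P_i, and N ≈ C/(M + 1):

```python
import math

def optimal_params(P, C):
    """Optimal splitting parameters in the unit-cost case h = 1 (sketch)."""
    # Solve y = 2(e^y - 1) for its negative root by bisection.
    g = lambda y: y - 2.0 * (math.exp(y) - 1.0)
    lo, hi = -10.0, -1e-9           # g(lo) < 0 < g(hi), one sign change
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    y0 = 0.5 * (lo + hi)            # ~ -1.5936

    # Candidate numbers of thresholds around ln P / y0; pick the cheaper one.
    n = math.log(P) / y0
    cost = lambda M: (M + 1) ** 2 * (P ** (-1.0 / (M + 1)) - 1.0)
    M = min((math.floor(n) - 1, math.floor(n)), key=cost)

    Pi = P ** (1.0 / (M + 1))       # common transition probability
    R = round(1.0 / Pi)             # retrials per threshold
    N = round(C / (M + 1))          # initial particles allowed by budget C
    return y0, M, Pi, R, N
```

For P = 10^−12 and C = 10^3 this recovers M = 16, P_i ≈ 0.2, R = 5, and N = 59, as stated above.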
Example 1.1: To analyze the behavior of the different implementations described earlier, we perform a simulation experiment using these methods. We consider a queuing network and want to estimate the occupancy of the finite-buffer queuing system M/M/1/C_0. The results are presented in Figure 1. As expected, since we proceed for a given cost C (C = 10^4), the crude simulation stops after a few iterations, because the number of samples run at the beginning is insufficient. Note, however, that the splitting simulation and the theoretical analysis give very close results.
Example 1.2: This model can be applied to approximate counting (see Jerrum and Sinclair [7] and Diaconis and Holmes [5]). Given a positive real vector a = (a_i)_{i=1}^n and a real number b, we want to estimate the number of 0–1 vectors x = (x_i)_{i=1}^n such that
For more details, see Section 3.2.
Remark 1.1: Hereafter we will take all the R_i equal to R and all the P_i equal to P_0 = 1/R. Thus, RP_0 = 1.
The aim of the article is to give a precise confidence interval of
. The bound involving the variance of
and given by the Markov inequality is not precise enough. Therefore, as done in the theory of large deviations, we introduce the Laplace transform of
, which can be rewritten as
, where fM is the Mth functional iterate of a Bin(R,P0) generating function (g.f.). The elementary theory of branching processes leads to precise bounds of fM and to a precise confidence interval that we can compare to the confidence interval if we only use the variance. For example, for P = 10−9, C = 108, and α = 0.02, the variance gives a bound about 10−2 and the Laplace transform gives a bound approximating 10−12.
The article is organized as follows. Section 2 describes the importance splitting model, presents our model and goals (the analysis of the behavior of the probability P of a rare event), and introduces an estimator
. Section 3 is dedicated to the optimization of the algorithm. In Section 4 we obtain a precise confidence interval of the estimator via branching processes. Finally, in Section 5 we conclude and discuss the merits of this approach and potential directions for further researches.
Our goal is to estimate the probability of a rare event A corresponding, for example, to a process X(t) hitting a certain level L. The main hypothesis is that before entering the target event, there exist intermediate states visited more frequently than A by the trajectory. Thus, define a sequence of sets of states B_i such that A = B_{M+1} ⊂ B_M ⊂ ··· ⊂ B_1, which determines a partition of the state space into regions B_i \ B_{i+1} called the importance regions (as represented in Figure 2). In general, these sets are defined through a function Φ, called the importance function, from the state space to
such that for all i, B_i = {Φ ≥ T_i} for some values T_i called thresholds, with T_1 ≤ T_2 ≤ ··· ≤ T_M ≤ L.
In this model a more frequent occurrence of the rare event is achieved by performing a number of simulation retrials when the process enters regions where the chance of occurrence of the rare event is higher. The fundamental idea consists of generating N Bernoulli Ber(P_1) trials and checking whether the subset B_1 is reached. If so, we duplicate the trials into R_1 retrials of Ber(P_2) and check whether the subset B_2 is reached. If none of the higher levels is reached, the simulation stops.
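The generate-and-split mechanism just described can be sketched in a few lines (Python; the function name and interface are ours). Each level-i success spawns R_i retrials, and the final count of hits, renormalized by N r_M, estimates P:

```python
import random

def splitting_estimate(N, probs, retrials, rng=random):
    """One run of the Bernoulli splitting model (sketch).

    probs    = [P_1, ..., P_{M+1}]  transition probabilities
    retrials = [R_1, ..., R_M]      splitting factors
    """
    trials = N                            # Bernoulli(P_1) trials at the start
    for i, p in enumerate(probs):
        successes = sum(rng.random() < p for _ in range(trials))
        if i < len(retrials):
            trials = successes * retrials[i]   # split each success into R_i retrials
        else:
            NA = successes                     # hits of the rare event A
    r = 1
    for R in retrials:
        r *= R                            # r_M = R_1 * ... * R_M
    return NA / (N * r)                   # unbiased estimator of P_1 * ... * P_{M+1}
```

Averaging many independent runs recovers P = P_1 ··· P_{M+1}, in line with the unbiasedness shown in Section 3.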
Thus, by the Bayes formula,
Then P is the product of M + 1 quantities (conditional probabilities) that are easier to estimate, and with more accuracy, than the probability P of the rare event itself, for a given simulation effort.
The estimator
defined in (2) can be rewritten as
where 1_{i_0 i_1···i_j} represents the result of the jth trial. In that case,
Moreover, we define
as the probability of reaching A, and we suppose that the process forgets the past after reaching a level; this holds as soon as the process is Markov.
First, note that
is unbiased since
As done in [11], the variance of the estimator
is derived by induction and the variance for k thresholds is given by
where
represents the estimator of P in a simulation with k thresholds.
Clearly, the formula holds in straightforward simulation (i.e., when k = 0), since
is a renormalized sum of i.i.d. Bernoulli variables with parameter P.
To go from k to k + 1, assume (12); we then have to prove that the formula holds for k + 1 thresholds. First, note that for all random variables X and Y that are independent given the set B, with X σ(B)-measurable, we have
Now let
The random variables X_{i_0} are i.i.d. with common law Ber(P_1), and conditionally on the event B_1, X_{i_0} and Z_{i_0} are independent. Note that each Z_{i_0} is the estimator of P in a model with k thresholds, T_2 to T_{k+1}, for the trajectory issued from the success of X_{i_0}. Thus,
and by the induction hypothesis,
So applying (13) with X ∼ Ber(P_1) and Z ∼ Z_{i_0}, we have
Thus, for M thresholds,
Remark 3.1: The induction principle has a concrete interpretation: if in a simulation with M steps, the retrials generated at the first level are not taken into account except for one, which we call the main trial, we obtain a simulation with M − 1 steps.
As stated in Section 1, our aim is to minimize the variance for a fixed budget, giving optimal values for N, R_1,…,R_M, P_1,…,P_{M+1}, and M. Therefore, we have to describe the cost of a given simulation: each time a particle is launched, it generates an average cost, assumed here to be a function h of the transition probability. We assume the following:
The (average) cost is then
where N_i is the number of trials that have reached threshold i. Finally,
Example 3.1: We want to study the model of the simple random walk on
starting from zero, which we kill as soon as it reaches the level −1 or k (success if we reach k, failure otherwise).
So let X_n be such that
, where {Y_n} is a sequence of random variables valued in {−1, 1} with
and define T_k = inf{n ≥ 0 : X_n = −1 or k}.
One can easily check that X_n and X_n² − n are martingales. By Doob's stopping theorem,
, which yields
(i.e., the cost needed to reach the next level is (1/p) − 1 if p is the success probability).
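Under the symmetric ±1 step assumption, these martingale identities give P(hit k before −1) = 1/(k + 1) and E[T_k] = k = (1/p) − 1, which a short simulation confirms (a Python sketch; the function name is ours):

```python
import random

def hit_or_ruin(k, rng):
    """Run the symmetric walk from 0 until it hits -1 or k.

    Returns (reached_k, number_of_steps)."""
    x, steps = 0, 0
    while -1 < x < k:
        x += 1 if rng.random() < 0.5 else -1
        steps += 1
    return x == k, steps
```

Averaging over many runs, the empirical success frequency approaches 1/(k + 1) and the mean stopping time approaches k, matching Doob's identities above.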
To minimize the variance of
, the optimal values are derived in three steps:
Optimal values for N, R_1,…,R_M. Using the method of Lagrange multipliers, we get
Optimal values for P_1,…,P_{M+1}. Thus, the variance becomes
Proceeding as previously under the constraint P = P_1···P_{M+1}, we obtain that all of the P_i satisfy
. If we assume that there exists a unique solution to this equation, we have P_i = g(λ); hence, P = g(λ)^{M+1} and g(λ) = P^{1/(M+1)}. Finally,
Optimal value for M. The optimal values for P_1,…,P_{M+1} imply that the optimal R_i becomes 1/P_i, i = 1,…,M; thus,
which we want to minimize in M. Note that R_i P_i = 1. Let
whose derivative vanishes at
In general, this does not give an integer. We have y_0 = ln P/(M + 1) (i.e., M + 1 = [ln P/y_0] or [ln P/y_0] + 1). Let ln P/y_0 = n + x with 0 < x < 1. Then the following hold:
The value of the ratio ρ := f (n − 1)/f (n) gives the best choice for M as follows:
Thus, the optimal number of thresholds is given by M = [ln P/y_0] − 1 or M = [ln P/y_0], where y_0 solves F(y) = 0. Then M minimizes
Example 3.2: For h = 1, we have to solve y = 2(e^y − 1). We get y_1 = 0 and y_2 ≈ −1.5936; y_2 corresponds to the minimum, and the optimal value of M is
With P = 10^−k, we have
Note that M increases as P decreases, and with this value of M, each R_i and P_i becomes
Thus, the optimal sampling number and the optimal transition probabilities are independent of the rare event probability.
Moreover, asymptotically, M = n = [ln P/y_0] − 1; thus,
Application 3.1: In approximate counting, remember that the goal is to estimate the number of Knapsack solutions (i.e., the cardinality of Ω defined by
for a given positive real vector a = (a_i)_{i=1}^n and real number b). We might try to apply the Markov chain Monte Carlo (MCMC) method [9]: construct a Markov chain
with state space Ω = {x ∈ {0,1}^n : a·x ≤ b} and transitions from each state x = (x_1,…,x_n) ∈ Ω defined by the following:
The new state is y. This random walk on the hypercube, truncated by the hyperplane a·x = b, converges to the uniform distribution over Ω. This suggests a procedure for selecting Knapsack solutions almost uniformly at random. Starting in state (0,…,0), simulate
for sufficiently many steps that the distribution over states is “close” to uniform.
The problem is to bound the number of steps necessary to make the Markov chain
“close” to stationarity. More precisely, we need a bound on the mixing time:
where Δ_x(t) = max_{S⊂Ω} |P^t(x, S) − Π(S)| and Π is the stationary distribution. In [7], it is shown that
steps suffice.
Keep the vector a fixed but allow b to vary. Use
instead of
to emphasize the dependence on b. Assume without loss of generality that a_1 ≤ ··· ≤ a_n and define
. One can check that
Now write
The ratio ρ_i = |Ω(b_i)|/|Ω(b_{i+1})| may be estimated by sampling almost uniformly from Ω(b_{i+1}) using the Markov chain
and computing the fraction of the samples that lie within Ω(b_i).
Now take a = [1, 2, 3, 4], b = 3, h = 1, R = 5, and C = 2600. We choose the levels as follows: first, define b_1 = 0, b_2 = 1, b_3 = 3, b_4 = 3, and b_5 = b; second, define B_0 = Ω, B_1 = Ω(b_4), B_2 = Ω(b_3), B_3 = Ω(b_2), and B_4 = Ω(b_1). Thus, here, M = n − 1, N = C/n, and n_step = 1020. Obviously, Card(Ω) = 5. We run three different simulations: the first, suggested in [7], estimates the n ratios independently; the other two are the crude and the splitting simulations. We obtain different estimations of Card(Ω):
Even though the levels are not optimal, splitting provides an improvement.
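For this small instance the product-of-ratios scheme can be checked end to end. In the sketch below (Python; the function names are ours), we enumerate Ω(b) exactly and sample uniformly from it directly, standing in for the truncated-hypercube Markov chain; with the levels b_i above, the estimate reproduces Card(Ω) = 5:

```python
import itertools
import random

def omega(a, b):
    """Enumerate Omega(b) = {x in {0,1}^n : a.x <= b} (exact, small n only)."""
    return [x for x in itertools.product((0, 1), repeat=len(a))
            if sum(ai * xi for ai, xi in zip(a, x)) <= b]

def count_by_ratios(a, levels, samples, rng):
    """Estimate |Omega(b)| as |Omega(b_1)| / (rho_1 * rho_2 * ...), where
    rho_i = |Omega(b_i)|/|Omega(b_{i+1})| is estimated by sampling uniformly
    from Omega(b_{i+1}).  Exact uniform sampling stands in here for the
    Markov chain of the text."""
    est = float(len(omega(a, levels[0])))        # |Omega(b_1)| = 1 when b_1 = 0
    for b_lo, b_hi in zip(levels, levels[1:]):
        pool = omega(a, b_hi)
        hits = sum(1 for _ in range(samples)
                   if sum(ai * xi for ai, xi in zip(a, rng.choice(pool))) <= b_lo)
        est /= hits / samples                    # divide by the estimated rho_i
    return est
```

With a = [1, 2, 3, 4] and levels (0, 1, 3, 3, 3) as above, the exact count is 5 and the sampled estimate concentrates around it; the two constant ratios (b_i = b_{i+1}) contribute factors of 1.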
Let us briefly describe the possible solutions of (30). Remember that we want to solve (30); that is, if z = e^y and z ≠ 0, 1,
First, let z_0 be the solution of 2(z − 1) = ln z. Since h′ ≤ 0, H is negative, and a quick survey shows that L is positive on ]0, z_0[ and negative on ]z_0, 1[. As a consequence, the solutions of (37) lie in ]z_0, 1[, if they exist. Thus, solving (30) is equivalent to studying the intersections of H and L. Examining these functions shows that there are two cases (see Fig. 3):
Case 1: An odd number of intersections between L and H
Case 2: An even number of intersections (possibly none) between L and H
Note that y = 0 is a solution of (30). In case 1, it corresponds to a maximum, and in case 2, it corresponds to a minimum. The second case is excluded since we made the assumption h(1) > 0.
Remark 3.2: The solution y = 0 corresponds to the following optimal values:
However, P_i = 1 implies that P = 1, and R_i = 1 means that we just perform a crude simulation.
Example 3.3: Here, P = 10^−12 and C = 10^4.
1. In Example 3.1, h(1) = 0 and we are in the second case: the unique solution y = 0 is the minimum.
2. Let h(x) = 1/x − 8x² + 12x − 5. Then h(1) = 0 and we are in the first case: y = 0 and y ≈ −0.9919 are the solutions; y = 0 is the maximum and the other solution is the minimum. Taking y ≈ −0.9919, we obtain
and we can take R = 3 and N = 23.
3. Let h(x) = (1/x − 1)² e^{6x}. Then h(1) = 0 and we are in the second case: y = 0, y_1 ≈ −0.4612, and y_2 ≈ −0.5645 are the solutions; y = 0 is the minimum and the second solution is the maximum.
4. Let h(x) = 1/x. Here, h(1) = 1. We want to solve (30), whose solutions are y = 0 and y ≈ −0.6438. Taking y ≈ −0.6438, we obtain
and we can take R = 2 and N = 34.
Thus, the control of the variance of
gives a crude confidence interval for P. Indeed, we get
This estimate is, in general, useless. For example, for h = 1, M = 12, and α = 10^−2, the upper bound becomes ≈ 5 × 10^5/N. To obtain a bound lower than 1, we would need N ≥ 5 × 10^5. To improve it, we use Chernoff's bounding method instead of the Markov inequality: for all λ > 0,
where
is the log-Laplace of
. Optimization on λ > 0 provides
Similarly,
Let ψ* be the Cramér transform of ψ: ψ*(τ) = sup_λ [λτ − ψ(λ)]. Thus,
So we want to obtain an accurate lower bound on ψ*.
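To see why the Chernoff/Cramér route pays off, here is a small self-contained comparison (in Python; the setting, the empirical mean of N i.i.d. Ber(p) trials, is a simplified stand-in for our estimator, and the function names are ours):

```python
import math

def chebyshev_bound(p, N, t):
    """Chebyshev: P(mean of N Bernoulli(p) >= t) <= Var / (N (t - p)^2)."""
    return p * (1.0 - p) / (N * (t - p) ** 2)

def chernoff_bound(p, N, t):
    """Chernoff: exp(-N psi*(t)), psi* the Cramer transform of Ber(p)."""
    psi_star = t * math.log(t / p) + (1.0 - t) * math.log((1.0 - t) / (1.0 - p))
    return math.exp(-N * psi_star)

p, N, t = 0.2, 1000, 0.3
print(chebyshev_bound(p, N, t))   # ~ 1.6e-2
print(chernoff_bound(p, N, t))    # ~ 6e-13: exponentially sharper
```

The variance bound decays like 1/N, while the Laplace-transform bound decays exponentially in N; this is the gap illustrated in Figure 5.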
Remark 3.3: Although we would therefore like to take R_i so that R_i P_i = 1, we are constrained to choose R_i to be a positive integer. Hereafter, we suppose that we are in the optimal case, where R_i = 1/P_i is an integer.
To study the Laplace transform of
, we turn to the theory of branching processes (see Harris [6], Lyons [8], and Athreya and Ney [2]). More precisely, we consider our splitting model as a Galton–Watson process, the thresholds representing the different generations.
We consider a Galton–Watson model (Z_n), where the size Z_n of the nth generation is the number of particles that have reached the level B_n, with one particle run at the beginning. Then Z_0 = 1 and Z_n satisfies the following recurrence relation:
where X_i^n is the number of particles among R_n that have reached the (n + 1)st level. The (X_i^n)_{i≥1} are i.i.d. with common Binomial law with parameters (R_n, P_{n+1}), and X_i^0 ∼ Ber(P_1). Take the optimal values of Section 3.2:
Let
, the g.f. of Z_1. Then the g.f. of Z_n is the nth iterate of f. Since
, we get
where g is the g.f. of a Ber(P_0) and f is the g.f. of a Bin(R, P_0). Thus, we are interested in the expression of f_M, the Mth functional iterate of f.
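Although f_M has no tractable closed form, the iterates are easy to evaluate numerically. A minimal sketch (Python; names ours) checking the basic properties used in what follows — f_M(1) = 1, f_M′(1) = 1 in the critical case RP_0 = 1, and f_M(0) increasing toward 1 — is:

```python
def gf_binomial(R, P0):
    """Generating function of a Bin(R, P0) law."""
    return lambda s: (P0 * s + 1.0 - P0) ** R

def iterate(f, M, s):
    """M-th functional iterate f_M(s)."""
    for _ in range(M):
        s = f(s)
    return s

f = gf_binomial(5, 0.2)          # critical case: R * P0 = 1
print(iterate(f, 1, 0.0))        # extinction by generation 1: 0.8**5 = 0.32768
print(iterate(f, 20, 0.0))       # extinction probability increases toward 1
```

In the critical case f_M′(1) = (f′(1))^M = 1 for every M, which can also be checked by a numerical derivative at s = 1.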
Here,
, so we are in the critical case of the branching process, which ensures that the simulation algorithm stops with probability 1 as M → ∞ (see [6]), since if f^(3)(1) < ∞,
This reflects the increasing rarity as the number M of thresholds increases and the probabilities between the levels decrease.
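The classical critical-case asymptotic P(Z_n > 0) ∼ 2/(n f″(1)) = 1/(nα_1) (Kolmogorov's estimate, a standard branching-process fact quoted here as background, not a result of this article) can be checked numerically by iterating the g.f. (Python sketch; names ours):

```python
def survival_probability(R, P0, n):
    """P(Z_n > 0) = 1 - f_n(0), computed by iterating the Bin(R, P0) g.f."""
    s = 0.0
    for _ in range(n):
        s = (P0 * s + 1.0 - P0) ** R
    return 1.0 - s

R, P0 = 5, 0.2                          # critical: R * P0 = 1
alpha1 = R * (R - 1) * P0 ** 2 / 2.0    # f''(1)/2 = 0.4 here
for n in (10, 100, 1000):
    # The normalized survival probability tends to 1 as n grows.
    print(n, survival_probability(R, P0, n) * n * alpha1)
```

The ratio approaches 1 from below, consistent with the geometric lower bound of Proposition 4.2, which forces P(Z_n > 0) ≤ 1/(1 + nα_1).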
In our case,
The iterated function f_M has no explicit tractable form, so we will derive bounds on f_M(s) around s = 1. To do this, we state a general result on the Laplace transform in critical Galton–Watson models, which we could not find in the literature.
Remark 4.1: Remember that f_n and its derivatives are convex. Furthermore, for all 0 ≤ s ≤ 1, s ≤ f(s) ≤ f(1) = 1, and by induction, f(s) ≤ f_2(s) ≤ ··· ≤ 1. Finally, we obtain f_n(s) → 1, since f_n(s) ≥ f_n(0) and f_n(0) → 1.
Proposition 4.1: Let α_1 = f″(1)/2, C = (max_{s∈[0,1]} f‴(s))/(6α_1), and γ_n = nα_1[1 − (C/n)(log n + 1)] − α_1. Then, for s close to 1 and large n,
Proof: Upper bound: Using Taylor's expansion, with f_n(s) ≤ θ_n ≤ f_n(1) = 1,
since f′(1) = 1. Let r_n = 1 − f_n(s); then r_n satisfies
Now let α_0 = f″(0)/2. Define the decreasing sequences (a_n) and (b_n) satisfying
Then
1. Upper bound on b_n: Since 0 ≤ b_j ≤ 1, we have
Thus,
2. Lower bound on a_n: Apply this upper bound to a_n (with α_0 becoming α_1):
By substituting (66) in
, we get
Finally, (63) and (67) lead to the upper bound on f_n in (58).
Lower bound: In fact, we prove by induction that
For n = 1, the left-hand side of (68) is given by Remark 4.1. Then note that
; thus, for n large enough,
. For all 1 − (1/n) ≤ s ≤ 1,
However, by induction, we have
, and so, since f is increasing, taking
,
(i.e.,
, where
. Note that
; more precisely, we have
and we finally obtain the left-hand side of (58) since γ → h_γ is increasing. █
In the particular case of f(s) = (P_0 s + 1 − P_0)^R, we can derive a more precise lower bound:
Proposition 4.2: For s close to 1,
Observe that this is precise at s = 1.
Proof: Let h(s) = 1 − ((1 − s)/(1 + α_1(1 − s))). Since f(1) = h(1) = 1, f′(1) = h′(1) = 1, and f″(1) = h″(1) = 2α_1, the sign of f − h depends on the sign of the third derivative of f − h, which is negative here. Hence h ≤ f. Since f is increasing, we deduce (74) by induction. █
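Proposition 4.2 can be checked numerically: the fractional linear function h iterates in closed form, and its nth iterate bounds the binomial iterates from below (a small Python sketch; function names are ours, with alpha1 = f″(1)/2):

```python
def f_iter(R, P0, n, s):
    """n-th functional iterate of the Bin(R, P0) generating function."""
    for _ in range(n):
        s = (P0 * s + 1.0 - P0) ** R
    return s

def geom_iter(alpha1, n, s):
    """Closed-form n-th iterate of h(s) = 1 - (1-s)/(1 + alpha1*(1-s)).

    The Moebius map h satisfies 1/(1 - h(s)) = 1/(1 - s) + alpha1,
    so iteration simply adds n*alpha1."""
    return 1.0 - (1.0 - s) / (1.0 + n * alpha1 * (1.0 - s))
```

On a grid of s in [0, 1] and for various n, f_iter dominates geom_iter, as Proposition 4.2 asserts, with equality at s = 1.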
We plot in Figure 4a the upper bound and the two lower bounds for P = 10^−12 and s near 1.
Remark 4.2: First, let us note that, by convexity, for all s ≥ 1,
hence, f (s) ≥ s, and by induction on n,
We remark that for s > 1, the iterated function increases rapidly to infinity.
Proposition 4.3: Let γ_n′ = nα_1[1 + (C/n)(log n + 1)] − α_1. Then, for s close to 1 and large n,
Proof: Proceeding as in Proposition 4.1 leads to the upper bound. Here, f_n → ∞ as n → +∞, which prevents us from making a Taylor expansion around 1. To overcome this difficulty, consider k_n, the inverse function of f_n; it is the nth functional iterate of the g.f. k (the inverse function of f), which takes the value 1 at 1, has derivative 1 at 1 and negative second derivative, and satisfies k_n → 1 as n → +∞. Thus, making a Taylor expansion and using the same tools as previously, we get
where β_2 = k″(s)/2 and s_n := 1 + 1/(nα_1). Using the link between k_n and f_n and the upper bound on k_n,
The lower bound on k_n leads to an upper bound on f_n; however, it provides no improvement. █
As done earlier, we can derive a more precise upper bound in the particular case of f(s) = (P_0 s + 1 − P_0)^R:
Proposition 4.4: For s close to 1,
We plot in Figure 4b these three bounds for P = 10^−12 and s near 1.
About the geometric distribution. If the law of X is such that the probabilities p_k are in geometric proportion (
for k = 1, 2, …, and p_0 = 1 − p_1 − p_2 − ···, with b, c > 0 and b ≤ 1 − c), then the associated g.f. is a rational function:
Taking b = (1 − c)² and c = α_1/(1 + α_1) leads to
So we have compared the nth functional iterate of a Binomial g.f. to that of a geometric g.f. This suggests comparing the importance splitting models with Binomial and with geometric laws. The geometric-law model is set up in the following way: we run particles one after the other; as long as the next level is not reached, we keep on generating particles, and then we start again from the level the particle is at (the geometric distribution being the law of the first success).
This link is also stressed by Cosnard and Demongeot in [4]: for m = 1 and σ² = f″(1) = 2α_1, the asymptotic behavior of f_{2n} is the same as that of the geometric distribution with the same variance (i.e., of h).
Remember that
Considering the gradients of the functions, we prove that the supremum over λ ≥ 0 is reached near zero. So we can use the upper bounds on f_M obtained in the previous subsection, which lead to lower bounds on ψ*:
where
Finally,
One can easily obtain explicit but complex expressions for F(x) and G(x). We plot in Figure 5 the upper bounds obtained from the variance and from the Laplace transform, for different values of α, the prescribed error of the confidence interval. We take P = 10^−9 and the optimal values obtained above for the parameters. Note that the upper bound given by the Laplace transform is better than the bound given by Chebyshev's inequality, with the variance. We obtain
. In the preceding example, where P = 10^−9, if we fix α = 0.05 and L close to 0.01, then the corresponding costs needed are 3 × 10^7 for the variance and 3 × 10^6 for the Laplace transform.
The simplified model described here has two main shortcomings. First, we cannot, in general, choose the optimal levels P_i. In practice, we only have an empirical estimation of the P_i, and we can adjust the levels according to them. A more precise analysis is then needed to get confidence intervals for the estimation. Second, the optimal sampling number at each level is not an integer in general. Therefore, in practice, the number of particles generated at each step should be chosen at random, either such that
. Thus, we finally need to work in a random environment. This requires precise asymptotics of random iterates of the Laplace transform, whose analysis is more delicate than the one presented here and will be the subject of a forthcoming paper.
I gratefully acknowledge my supervisors Dominique Bakry and Pascal Lezaud for their helpful suggestions and comments. I would also like to thank Persi Diaconis for telling me about approximate counting and Pieter-Tjerk de Boer for making a number of valuable remarks that improved the precision of the figures.