ON THE OUTCOME OF A CASCADING FAILURE MODEL

Claude Lefèvre

doi:10.1017/S0269964806060256

ON THE OUTCOME OF A CASCADING FAILURE MODEL

Published online by Cambridge University Press: 01 June 2006

Claude Lefèvre

Show author details

Claude Lefèvre: Affiliation:
Institut de Statistique et de Recherche Opérationnelle, Université Libre de Bruxelles, B-1050 Bruxelles, Belgique, E-mail: clefevre@ulb.ac.be

Article contents

Abstract
1. INTRODUCTION
2. THE BASIC MODEL
3. A GENERALIZED MODEL
4. EXAMPLES AND BOUNDS
Acknowledgments
References

Rights & Permissions

Abstract

This article is concerned with a loading-dependent model of cascading failure proposed recently by Dobson, Carreras, and Newman [6]. The central problem is to determine the distribution of the total number of initial components that will have finally failed. A new approach based on a closed connection with epidemic modeling is developed. This allows us to consider a more general failure model in which the additional loads caused by successive failures are arbitrarily fixed (instead of being constant as in [6]). The key mathematical tool is provided by the partial joint distributions of order statistics for a sample of independent uniform (0,1) random variables.

Type: Research Article
Information: Probability in the Engineering and Informational Sciences , Volume 20 , Issue 3 , July 2006 , pp. 413 - 427

DOI: https://doi.org/10.1017/S0269964806060256 [Opens in a new window]
Copyright: © 2006 Cambridge University Press

1. INTRODUCTION

In a recent article, Dobson, Carreras, and Newman [6] have proposed an interesting model to describe the occurrence of cascading failures in a closed system of n components. The basic points of the model can be summarized as follows. Initially, the n components have random loads L₁,…, L_n that correspond to independent uniform (0,1) random variables (r.v.'s). Following some disturbance, a load d is added to each of the n loads. Any component fails if its new load is larger than one. When a components fails, a new load p is added to each of the components that are still functioning. This can cause further failures in a cascade. The central problem is to determine the distribution of a r.v. N that represents the total number of components that will have finally failed. This loading-dependent model of cascading failure is motivated by the study of large blackouts of electric power transmission systems (see also [6] for a bibliography).

Section 2 is mainly concerned with the same failure model. First, we reformulate this model by means of a cascade algorithm, different but equivalent to the original one, that allows us to deal with the occurrences of failures one by one. Moreover, it is convenient to introduce the r.v.'s U₁ = 1 − L₁,…,U_n = 1 − L_n, which can be interpreted as the initial resistances of the n components. Obviously, these r.v.'s are still independent uniform (0,1); let U_1:n,…,U_n:n be the associated order statistics. We then show that the distribution of N can be expressed in terms of the joint distributions of the order statistics {U_i:n, 1 ≤ i ≤ k} for 1 ≤ k ≤ n, with respect to the linear boundary {d + (i − 1)p, 1 ≤ i ≤ k}. As a corollary, we are able to derive, in an enlightening way, a key result obtained in [6], namely in case of nonsaturation N has a quasi-binomial distribution. Furthermore, we prove how this result can be extended to a model for which the additional loads after failures are independent and identically distributed (i.i.d.) r.v.'s.

In Section 3 we discuss a generalized failure model in which the additional loads are still constant but arbitrarily fixed (instead of being equal to each other). In particular, this extension covers those cases in which the initial resistances are i.i.d. r.v.'s with any given (not necessarily uniform) distribution. For this model, we can now write the distribution of N in terms of the joint distributions of the order statistics {U_i:n, 1 ≤ i ≤ k} for 1 ≤ k ≤ n, with respect to an arbitrary (instead of linear) nondecreasing boundary {s_i, 1 ≤ i ≤ k}. We then show that the state probabilities of N still have a remarkable structure and that they can be calculated using a simple recursion. As special situations, we examine in more detail a model built with several redundant components (so that the first failures will not generate any additional load) and a model for which the resistances are exponentially distributed and the additional loads are independent r.v.'s. Finally, some numerical illustrations and simple bounds are presented in Section 4.

The approach that we will follow to tackle this cascading failure is inspired by an argument used in epidemic modeling to determine the distribution of the final number of new infections for epidemics of the S (susceptible) → I (infected) → R (removed) schema (see, e.g., Ball and O'Neill [2], Lefèvre and Utev [8], and Picard and Lefèvre [9]). Very roughly, in an epidemic context, the n components represent the n initial susceptibles and the failure of a component corresponds to the infection of a susceptible (by one of the infectives present at that time). The additional load in case of failure means that once infected, an individual adds its infectiousness to the other infectives against the remaining susceptibles. An indirect contribution of our article is thus to point out the existence of a closed connection, within this framework, between reliability and epidemic modelings.

Applying the proposed method asks us to evaluate the partial joint distributions of order statistics for a sample of n independent uniform (0,1) r.v.'s. Much on this problem can be found in Shorack and Wellner [10, Chap.9]; see also Denuit, Lefèvre, and Picard [5] and the references therein. In [5], it is established that the left tail distributions such as derived here rely on an underlying polynomial structure of the Abel–Gontcharoff form. This property, however, is not exploited in the present work, at least not in a direct or explicit way.

2. THE BASIC MODEL

Consider a system made of n identical components. Initially, the n components are unfailed and have random loads L₁,…, L_n that are independent and uniformly distributed on (0,1). The cascade proceeds according to the following algorithm (Dobson et al. [6]):

Step 0

[bull ] Load increment. A disturbance d is added to the load of each of the n components.

[bull ] Failure test. For each component j, 1 ≤ j ≤ n, if the load L_j is greater than one, j fails. Let M₀ be the number of failures.

Step 1

[bull ] Load increment. If there are failures, a disturbance M₀ p is added to the load of each of the n − M₀ remaining components.

[bull ] Failure test. For each of these components j, if the new load d + M₀ p + L_j is greater than one, j fails. Let M₁ be the number of failures.

Step 2

[bull ] Load increment. A disturbance M₁ p is added to the load of each of the n − M₀ − M₁ remaining components.

[bull ] Failure test. For each of these components j, if the new load d + (M₀ + M₁)p + L_j is greater than one, j fails. Let M₂ be the number of failures.

The next steps are similar. Failures can occur until time T, when there are no more new failures. Indeed, when M_T = 0, there is no new load added to the remaining components, so that the cascade process stops (the system is stabilized). The total number of failures is N′ = M₀ + M₁ + ··· + M_T−1, the number of unfailed components being n − N′.

An equivalent cascade algorithm. We now construct another algorithm that leads to the same total number of failures in the system. As indicated in Section 1, a similar approach is used in epidemic modeling to determine the final size distribution of an epidemic of the SIR schema. Let us introduce the order statistics L_1:n,…, L_n:n, which represent the initial loads of the n components arranged by increasing weights.

Step 0

[bull ] Load increment. A disturbance d is added to the load of each of the n components.

[bull ] Failure test. If the largest load d + L_n:n is greater than one, the corresponding component fails.

Step 1

[bull ] Load increment. If this occurs, a disturbance p is added to the load of each of the n − 1 remaining components.

[bull ] Failure test. If the new load d + p + L_n−1:n is greater than one, the corresponding component fails.

Step 2

[bull ] Load increment. A disturbance p is added to the load of each of the n − 2 remaining components.

[bull ] Failure test. If the new load d + 2p + L_n−2:n is greater than one, the corresponding component fails.

Here too, failures stop when all of the loads of the remaining components are smaller than one. Let N be the associated total number of failures.

It can be easily understood that as long as the total number of failures is considered, both algorithms are quite equivalent; that is, N′ and N have the same distribution. For instance, in the former formulation any given unfailed component at step 1 receives an additional load M₀ p, whereas in the latter formulation, that component receives the same load in several steps. This change of timescale, however, will not affect the final possible failure of the component.

The new version of the algorithm allows us to express the distribution of N in a simple way, using the key tool of order statistics. Indeed, we directly see that

In (2.1), it was implicitly assumed that 1 − d − (n − 1)p > 0. If this is not true, define j* = min{j : 1 − d − jp ≤ 0}. Obviously, P(N ≥ k) is still given by (2.1) when k = 1,…, j*, whereas P(N ≥ k) = P(N ≥ j*) when k = j* + 1,…, n, meaning that the occurrence of j* failures necessarily yields n failures. Such a consequence of cascading is called a saturation effect in [6]. To simplify the presentation, we will continue with (2.1) without saturation.

Now let us introduce the r.v.'s U₁ = 1 − L₁,…,U_n = 1 − L_n. Obviously, these U_i's are still independent and uniformly distributed on (0,1). By construction of the model, they can be interpreted as the initial random resistances of the n components. Denote by U_1:n,…,U_n:n the associated order statistics (i.e., the n initial resistances arranged by increasing strengths). Then we can rewrite (2.1) as

From (2.2), we then get

where we put U_n+1:n = 1, say.

In order to evaluate the probability (2.3), we begin by deriving the preliminary result (2.4) below. That formula is well known (see, e.g., [5]); the simple proof given here will be adapted later to more general situations. For any fixed size k (1 ≤ k ≤ n), consider a sample U₁,…,U_k of independent and uniform (0,1) r.v.'s, and denote by U_1:k,…,U_k:k the associated order statistics. Let x be any real in (0,d).

Lemma 2.1:

Proof: We will proceed by induction. Obviously, for k = 1, P[x < U_1:1 < d] = d − x. Suppose that (2.4) holds for k = 1,…, l (≤ n − 1). For k = l + 1, we start by representing the probability on the left-hand side of (2.4) as follows. Consider the l + 1 possible choices U_i, 1 ≤ i ≤ l + 1, for the first ordered value U_1:l+1, and denote by U_1:l⁽ⁱ⁾,…,U_l:l⁽ⁱ⁾ the order statistics of the original sample deprived of U_i (which reduces to a sample of size l). The probability in (2.4) can be expressed as

By conditioning on U_i, we can then rewrite (2.5) as

However, by induction, (2.6) is equivalent to

which, after integration, yields the right-hand side of (2.4) for k = l + 1. █

It now becomes easy to show from (2.3) that N has a quasi-binomial distribution (in the sense of Consul [3]). This result was derived in [6]; as indicated in [6], it was also obtained in other contexts and by different methods. For related problems, see, for example, Stadje [11] and Zacks [12].

Proposition 2.2:

Proof: Since the n initial components are i.i.d., a number of N = k failures can arise in an analogous fashion among all of the sets of k components, hence the factor

in (2.8). Fix any given set of k components as the set having those k failures. Looking at (2.3), we first see that the other n − k components will not fail with the probability (1 − d − kp)^n−k, as indicated in (2.8). Furthermore, when k ≥ 1, the failure of all of the k components of the set now means that [U_1:k < d,U_2:k < d + p,…,U_k:k < d + (k − 1)p] with a sample of size k (instead of n), and this will occur with a probability given by (2.4) evaluated at x = 0, hence the remaining factor in (2.8). █

The noncascade situation arises when p = 0. As expected, (2.8) then reduces to the binomial distribution for a sample size n and a success (i.e., failure) probability d.

A randomized extension. The functioning of the components can be influenced by various factors, internal or external. In such situations, the additional loads imposed to the system are not known with certainty but can be viewed as random elements of the model. So, let us suppose that the initial disturbance corresponds to a r.v. D and that, independently of D, the successive loads after failures correspond to i.i.d. r.v.'s W₁,W₂,…,W_n−1. Denote the cumulated additional loads by P₁ = W₁,P₂ = W₁ + W₂,…, P_n−1 = W₁ + ··· + W_n−1. To avoid saturation, we assume that D + P_n−1 < 1 a.s. (this hypothesis is satisfied if, for instance, D and the W_i's are bounded by 1/n). Now (2.8) for the law of N can be extended as follows (see Lefèvre and Picard [7] for a similar result derived in an epidemic context).

Proposition 2.3:

Proof: For any given k, let us condition the event (N = k) with respect to the two r.v.'s D and P_k. By adapting the argument followed for (2.8), we then get

where θ₀(0) ≡ 1 and the other θ_k(0)'s represent the following conditional probabilities evaluated at x = 0:

x being any real in (0,D). Now we will prove that

which leads to the announced formula (2.9).

As for Lemma 2.1, we proceed by recurrence and consider (2.11) for k = l + 1. First, (2.6) is replaced by

Now, for the term P[···] in (2.13), let us condition on the r.v. P₁, and put

. We obtain that

However, by the hypothesis of recurrence (for k = l) and using

, (2.14) then becomes

To close, we observe that E(P₁ | P_l+1) = P_l+1 /(l + 1), so that (2.15) is analogous to (2.7) with P_l+1 /(l + 1) substituted for p, hence (2.12) for k = l + 1. █

3. A GENERALIZED MODEL

In this section we discuss another generalization of the model in [6] by assuming this time that the additional loads in case of failures are still constant but arbitrarily fixed. Such an extension allows us to widen the field of applications of the model.

Thus, a large system is often built, for security reasons, with several redundant components. Then the first failures of components will not influence the functioning conditions applied to the other components. In the model of Section 2, this means that a load p is added to each unfailed component only when there are at least c + 1 failures (instead of one), with 0 ≤ c ≤ n − 1. The probability (2.2) is then modified as follows: If k = 1,…, c,

and if k = c + 1,…, n,

More generally, let us assume that the additional loads are successively of given weights w₁,…,w_n−1. Denote the cumulated weights by p₁ = w₁,…, p_n−1 = w₁ + ··· + w_n−1, and to avoid saturation, suppose that d + p_n−1 < 1. Then (2.2) is changed in

Such an extension is also relevant when in (2.2) the initial resistances of the n components are independent but no longer uniformly distributed. More precisely, let us suppose that these resistances are represented by i.i.d. random variables X₁,…, X_n, with an arbitrary continuous distribution function F. Let X_1:n,…, X_n:n be the associated order statistics. The initial disturbance is still denoted by d, and the successive loads are still denoted by p₁,…, p_n−1; assume that d and d + p_n−1 are in the support of F. Then, instead of (2.2), we have

However, since the vector [X_1:n,…, X_n:n] has the same distribution as [F⁻¹(U_1:n),…, F⁻¹(U_n:n)], (3.4) is equivalent to

where we put s₁ = F(d),s₂ = F(d + p₁),…, s_n = F(d + p_n−1). Of course, (3.5) covers all of the previous cases.

Now let us determine from (3.5) the exact probability of N. To begin, we will extend Lemma 2.1. Specifically, we consider again any fixed sample of size k (1 ≤ k ≤ n) and define

x being any real in (0,s₁), with ξ₀ ≡ 1. These ξ_k's are no longer known explicitly but they can be computed recursively from (3.7).

Lemma 3.1: For any real y,

Proof: First, we observe that as for (2.6), one can write, for 0 ≤ l ≤ n − 1,

Now let us proceed by induction and suppose that (3.7) is true for k = 1,…, l (≤ n − 1). For k = l + 1, applying induction to the probability on the right-hand side of (3.8) yields

However, computing the first integral in (3.9) and again using (3.8) for the second integral, we find that (3.9) reduces to

hence (3.7) for k = l + 1. █

The recursion (3.7) can sometimes be slightly simplified by choosing an appropriate value for y. For instance, y = 1 will guarantee that the coefficients in the recursion are all positive. Several applications are given in the following.

The distribution (2.8) for N can now be generalized by (3.10) in terms of the probabilities ξ_k(0). The reader is referred to [5] for an alternative proof based on Abel–Gontcharoff polynomials.

Proposition 3.2:

Proof: As for (2.3), we have

Thus, it suffices to follow the argument given in the proof of Proposition 2.2 using (3.6) with x = 0. █

Formula (3.10) with the recursion (3.7) can be combined to give the single recursion (3.11).

Corollary 3.3:

Proof: Equation (3.7) with x = 0 and y = 1 yields

After multiplication by

, (3.12) can be rewritten as

However, applying (3.10) inside the sum of (3.13) allows us to exhibit P(N = j), which then leads to (3.11). █

Notice that the relations (3.11) constitute a triangular system of n + 1 linear equations in the n + 1 unknown probabilities P(N = j). For instance, (3.11) for k = 0 gives P(N = 0) = (1 − s₁)ⁿ, and for k = n, P(N = 0) + ··· + P(N = n) = 1.

Case with redundant components. Let us go back to the previous particular model specified by (3.1), (3.2), with a threshold of c + 1 failures. Here thus,

Corollary 3.4: Under (3.14), the law of N is given by (3.10), where, if k = 1,…, c,

and if k = c + 1,…, n,

Proof: Equation (3.15) is straightforward and we will focus on (3.16). For k = c + 1,…, n, choosing x = 0 and y = d in (3.7) then yields

Proceeding by recurrence, let us suppose that (3.16) is true for ξ_j(0) with j = c + 1,…, k − 1. Then, by substitution in (3.17), and after rearrangement and permutation of the two sums, we obtain

However, for any real x,

since the left-hand side of (3.19) corresponds to Δ^m(x^m−1), where Δ denotes the usual forward difference operator. Thus, using (3.19) we see that the second sum in (3.18) reduces to −(k − c)^k−l−1. Finally, (3.18) becomes

hence (3.16), as indicated. █

Case with exponential resistances. Let us assume that the resistances have an exponential law, with parameter μ (> 0) say. Thus, s₁ = 1 − exp(−μd), and for k = 1,… n − 1, s_1+k = 1 − exp[−μ(d + p_k)], with p_k = w₁ + ··· + w_k. In other words,

where q₀ ≡ exp(−μd) and q_k ≡ exp(−μw_k), k = 1,…, n − 1. This particular situation is of the type met in epidemic theory with the so-called Reed–Frost model. The homogeneous case where q₁ = ··· = q_n−1 ≡ q corresponds to that model in its standard version (see, e.g., Andersson and Britton [1] and Daley and Gani [4]); the nonhomogeneous case with different q_k's gives a nonstationary version of the model (see [7]). Under (3.20), (3.11) becomes

An advantage of this case is that the additional loads imposed on the system can be considered, without difficulty, as random elements. Specifically, let us suppose that the initial disturbance and the successive loads in case of failure correspond to independent r.v.'s, D and W₁,…,W_n−1, all possibly with different laws. In an epidemic context, W₁,…,W_n are i.i.d. r.v.'s that represent lengths of infectious periods; the model then corresponds to the randomized version of the Reed–Frost model (e.g., [1,4]). Now, for this randomized case, the n + 1 relations (3.21) can still be extended into a linear system of similar structure. More precisely, for j = 0,…, n, define the parameters

Corollary 3.5: Under randomized (3.20),

Proof: First, we condition on fixed values d for D and w₁,…,w_n−1 for W₁,…,W_n−1. The conditional state probabilities of N thus satisfy the relations (3.21). We observe that P(N = j), 0 ≤ j ≤ n, depends only on the parameters q₀,…,q_j. Let us multiply both sides of (3.21) by (q₀···q_k)^n−k, and on the right-hand side, replace (q₀···q_k)^n−k/(q₀···q_j)^n−k by (q_1+j···q_k)^n−k . Then, to eliminate the condition, we take the expectation, which yields, using the parameters (3.22),

hence the system (3.23). █

4. EXAMPLES AND BOUNDS

For illustration, we consider a special case of nonregular type for the model examined in Section 3. Specifically, (3.5) is assumed to hold now with

where θ > 0. This arises when the resistances are i.i.d. r.v.'s with a power-function distribution [i.e., F(x) = x^θ for x ∈ (0,1) with θ > 0] and when the initial disturbance is equal to d and each additional load is equal to p, with d + p(n − 1) < 1. If θ = 1, the model corresponds to the case in Section 2.

We have evaluated the probability law of N, using the recursion (3.10), (3.12), when the system has n = 50 components and for different sets of parameters (with nonsaturation). Figure 1 gives this distribution on a logarithmic scale if

and θ = 0.5,0.7,1, or 1.1. As expected, the power parameter θ plays a crucial role; in particular, the mode is at k = 50 for θ = 0.5 or 0.7 and at k = 0 if θ = 1 or 1.1. In Figure 2, the parameters are θ = 0.5 and d = 0.1p, 0.75p, or p, with

. We observe that varying the initial disturbance d moderately affects the distribution; of course, the influence exerted by d depends also on the value of θ.

Log plot of the distribution of the number N of failed components, under (4.1) with and for four different θ's.

Log plot of the distribution of the number N of failed components, under (4.1) with and for three different d's.

It is worth pointing out that for n large, the recursion is not always efficient and can even generate negative values for certain probabilities of N.

To close, we indicate that for arbitrary s₁,…, s_n, the distribution of N can be approximated by finding upper and lower bounds for the probabilities ξ_k(0) in (3.10). Such simple bounds are provided in Property 4.1.

Property 4.1: In general,

If the sequence s₁,…, s_k is concave (resp. convex), then

and for any given k′ ∈ {1,…, k − 1},

provided, in the convex case, that s_k′+1 − k′(s_k′+1 − s_k′) > 0.

Proof: Let us first derive (4.2). By definition (3.6),

whereas choosing x = 0 and y = s_k in (3.7) yields

Now if the sequence s₁,…, s_k is concave, then a lower linear approximation to this sequence is given by

and an upper linear approximation is given by

for any k′ ∈ {1,…, k − 1}. Moreover, it is obvious that

It then remains to evaluate the two bounds in (4.5) by applying (2.4), which leads to (4.3) and (4.4). The convex case can be treated similarly, under the constraint for (4.4) that s₁^(u) defined earlier is positive. █

Acknowledgments

This research was supported in part by the Banque Nationale de Belgique. I am grateful to I. Coulibaly (ISRO) for his help with the numerical part. I also wish to thank the Editor for helpful remarks.

References

REFERENCES

Andersson, H. & Britton, T. (2000). Stochastic epidemic models and their statistical analysis. Lecture Notes in Statistics 151. New York: Springer-Velag.CrossRef

Ball, F.G. & O'Neill, P.D. (1999). The distribution of general final state random variables for stochastic epidemic models. Journal of Applied Probability 36: 473–491.Google Scholar

Consul, P.C. (1974). A simple urn model dependent on pre-determined strategy. Sankhya 36: 391–399.Google Scholar

Daley, D.J. & Gani, J. (1999) Epidemic modelling: An introduction. Cambridge: Cambridge University Press.

Denuit, M., Lefèvre, C., & Picard, P. (2003). Polynomial structures in order statistics distributions. Journal of Statistical Planning and Inference 113: 151–178.Google Scholar

Dobson, I., Carreras, B.A., & Newman, D.E. (2005). A loading-dependent model of probabilistic cascading failure. Probability in the Engineering and Informational Sciences 19: 15–32.Google Scholar

Lefèvre, C. & Picard, P. (2005). Non-stationarity and randomization in the Reed–Frost epidemic model. Journal of Applied Probability 42: 950–963.Google Scholar

Lefèvre, C. & Utev, S. (1996). Comparing sums of exchangeable Bernoulli random variables. Journal of Applied Probability 33: 285–310.Google Scholar

Picard, P. & Lefèvre, C. (1990). A unified analysis of the final state and severity distribution in collective Reed–Frost epidemic processes. Advances in Applied Probability 22: 269–294.Google Scholar

Shorack, G.R. & Wellner, J.A. (1986). Empirical processes with applications to statistics. New York: Wiley.

Stadje, W. (1993). Distribution of first-exit times for empirical counting and Poisson processes with moving boundaries. Stochastic Models 9: 91–103.Google Scholar

Zacks, S. (1991). Distributions of stopping times for Poisson processes with linear boundaries. Stochastic Models 7: 233–242.Google Scholar

Log plot of the distribution of the number N of failed components, under (4.1) with and for four different θ's.

Log plot of the distribution of the number N of failed components, under (4.1) with and for three different d's.

Article contents

ON THE OUTCOME OF A CASCADING FAILURE MODEL

Abstract

1. INTRODUCTION

2. THE BASIC MODEL

3. A GENERALIZED MODEL

4. EXAMPLES AND BOUNDS

Acknowledgments

References

REFERENCES

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests