1. INTRODUCTION
Failure modes and effects analysis (FMEA) is a powerful tool for identifying potential failure modes and preventing them from occurring (Vandenbrande, 1998). Although it is recognized worldwide across many industries, the technique has documented limitations. In particular, its usefulness is severely restricted by the spreadsheet-based approach used to elicit and represent failure causal knowledge. In these models, system variables and states are inadequately named using natural language, and causal relationships are represented imprecisely and ambiguously, so the inference results are, in turn, generally unsatisfactory (Lee, 2001). This raises two fundamental issues: how to represent failure causal knowledge, and how to realize an automated FMEA.
In response to the weakly structured nature of FMEA, researchers have been interested in failure knowledge modeling and reasoning through the construction of formal function, behavior, and state models. Teoh and Case (2004) provided the knowledge representation required to build an FMEA model, and employed functional reasoning to enable automatic FMEA generation from historical data. Eubanks et al. (1996) proposed a behavior-based FMEA model that decomposes a system into functions and behaviors and then maps this model to physical artifacts. This method focuses on identifying a comprehensive set of failure modes rather than on performing efficient automated analysis. Zhao et al. (2004) presented an intelligent FMEA based on a system hierarchical model and a fault input/output relationship net model, which is an effective way to improve automated FMEA by applying expert knowledge. Lee (2001) used Bayesian networks as a common knowledge representation and reasoning formalism for building FMEA and diagnostic models (BN-FMEA), which can improve knowledge representation and inference power. Price et al. (1997) combined functional reasoning with structural reasoning to carry out the safety analysis of electrical designs, and developed an automated FMEA system. Ruiz et al. (2000) introduced a state-based approach as an alternative to FMEA, which improves significantly on conventional FMEA by adding structure to the approach. Bell et al. (1992) developed an automated FMEA with the multipurpose causal (MPC) tool. It is built around a flexible causal reasoning module, and has been adapted to various computer-aided design and engineering platforms. Ormsby et al. (1991) proposed a concept for automated FMEA using qualitative reasoning in a model-based environment to make the analysis extensible to other domains. Kara-Zaitri et al. (1991) discussed an ordered matrix FMEA, which is a pictorial representation retaining all relevant qualitative and quantitative information about a failure mode or component, and developed a computer program to realize the automatic FMEA. Russomanno et al. (1994) related the FMEA process to various artificial intelligence techniques, which facilitates computer simulation. These FMEA approaches, although useful, are relatively weak at providing a formal representation that supports both a thorough comprehension of the physical system and insight into the failure knowledge needed to automate the FMEA process. In other words, they cannot describe both the underlying system and its different failure characteristics in a single unified and concise model. Consequently, traceability becomes difficult to achieve and the accuracy of the results becomes questionable, especially when the underlying system has highly complicated structures and logical relationships. From the viewpoint of supporting automated FMEA, this is a key concern.
Toward this end, this paper proposes a formal failure knowledge representation model intended to assist with the modeling and reasoning portions of FMEA. The new modeling methodology uses the well-established technique of polychromatic sets to represent the failure modes and their causes and effects in a unified mathematical language, which provides a more powerful modeling formalism than conventional FMEA. Combined with the reasoning matrices, this model can serve as the foundation for automating the failure effects analysis process, and the analysis can be performed without any special software. Because the polychromatic sets theory offers a standardized mathematical model, the method is advantageous both for formal knowledge representation and for the subsequent reasoning analysis. Another advantage is that the model closely follows the actual structure of the system, and is therefore readily understood by the failure analyst.
The rest of this paper is organized as follows. Section 2 discusses the mathematical aspects of the polychromatic sets theory. Section 3 proposes a formal modeling method based on the polychromatic sets theory. Section 4 elaborates the reasoning matrices and gives a specific procedure for identifying failure modes and their effects. An illustrative example is presented in Section 5. Finally, Section 6 concludes the paper and offers suggestions for future research.
2. POLYCHROMATIC SETS THEORY
Polychromatic sets theory is a newly established system theory of information processing proposed by the Russian scientist V.V. Pavlov. Its key idea is to use a standardized mathematical model to simulate different objects. Owing to its formal mathematical foundation, the polychromatic sets theory has made significant progress in problem formalization in many engineering domains, such as product life-cycle simulation (Pavlov, 2000), product conceptual design (Gao & Li, 2006), process modeling (Li & Xu, 2003), and so on.
In conventional set theory, a set comprises a group of elements. The constituent elements differ only in their names, even though they may differ in many other characteristics, and these other characteristics cannot be represented formally. In a polychromatic set, not only the constituent elements but also the set as a whole can be pigmented with different colors to represent the research object as well as the characteristics of its elements. Given a polychromatic set $A = (a_1, \ldots, a_i, \ldots, a_n)$, the color $F_j(A)$ corresponding to the entirety of $A$ is defined as a unified color, and the color set $F(A) = [F_1(A), \ldots, F_j(A), \ldots, F_p(A)]$ is called the unified pigmentation, whereas the color $F_j(a_i)$ corresponding to the element $a_i$ is defined as an individual color, and the color set $F(a_i) = [F_1(a_i), \ldots, F_j(a_i), \ldots, F_m(a_i)]$ is called the individual pigmentation. When an object is represented with polychromatic sets, the unified color $F_j(A)$ and the individual color $F_j(a_i)$ correspond to the $j$th characteristic of the object and of its element, respectively.
In the polychromatic sets, the Boolean matrix is a very useful instrument to store the relationships existing between the elements, individual colors, and unified colors. When simulating the real-life system, the existence of individual colors is the key factor that determines the availability of unified colors; therefore, the unified colors and individual colors are always correlated and the correlations are represented by using the following Boolean matrix:
$$
\|c_{ij}\|_{F(a),\,F(A)} = [F(a) \times F(A)] =
\begin{array}{c|ccccc}
 & F_1(A) & \cdots & F_j(A) & \cdots & F_p(A) \\
\hline
F_1(a) & c_{11} & \cdots & c_{1j} & \cdots & c_{1p} \\
\vdots & \vdots & & \vdots & & \vdots \\
F_i(a) & c_{i1} & \cdots & c_{ij} & \cdots & c_{ip} \\
\vdots & \vdots & & \vdots & & \vdots \\
F_m(a) & c_{m1} & \cdots & c_{mj} & \cdots & c_{mp}
\end{array}
\tag{1}
$$
where $c_{ij} = 1$ indicates that the individual color $F_i(a)$ affects the existence of the unified color $F_j(A)$, and $c_{ij} = 0$ indicates the opposite.
In addition, the relationships between the individual colors and the elements, namely, the individual pigmentation of all the elements, are represented by the Boolean matrix [A × F(a)], whereas the relationships between the unified colors and the constituent elements are represented by the Boolean matrix [A × F(A)]. Furthermore, the colors of a polychromatic set are expressed as Boolean vectors, and their logical operations are the same as the Boolean vector operations in a Boolean vector space, such as the logic sum (∨) and the logic product (∧). For more information on polychromatic sets theory, see Li and Xu (2003), Xu et al. (2005), and Gao and Li (2006).
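As a rough illustration of these definitions, the following Python sketch stores the Boolean matrix [A × F(a)] as nested lists and applies the logic sum and logic product to its rows. The element names, color names, and matrix entries are invented for this example and are not taken from the cited literature.

```python
# Hypothetical toy polychromatic set: 3 elements and 2 individual colors.
elements = ["a1", "a2", "a3"]
individual_colors = ["F1(a)", "F2(a)"]

# [A x F(a)]: row i is the individual pigmentation of element a_i,
# stored as a Boolean vector (1 = the element has that color).
A_x_Fa = [
    [1, 0],  # a1
    [1, 1],  # a2
    [0, 1],  # a3
]


def logic_sum(u, v):
    """Element-wise OR of two Boolean vectors."""
    return [x | y for x, y in zip(u, v)]


def logic_product(u, v):
    """Element-wise AND of two Boolean vectors."""
    return [x & y for x, y in zip(u, v)]


def colors_of(row, names):
    """Translate a Boolean vector back into the names of the colors it marks."""
    return [name for bit, name in zip(row, names) if bit]


pigmentation_a2 = colors_of(A_x_Fa[1], individual_colors)
shared_a1_a2 = logic_product(A_x_Fa[0], A_x_Fa[1])  # colors common to a1 and a2
combined_a1_a3 = logic_sum(A_x_Fa[0], A_x_Fa[2])    # colors present in a1 or a3
```

Plain nested lists are used only to keep the sketch dependency-free; any Boolean array type would serve equally well.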
3. FORMAL MODELING METHOD
3.1. System decomposition
It is impractical and unnecessary to analyze the entire system simultaneously. To obtain a reasonable and sufficient scope of study, a large system can be arranged naturally into a hierarchy by means of system decomposition, which eases the analysis (Russomanno et al., 1993; Sharma et al., 2005). In this paper, the issue of representing knowledge of how a system is constituted is approached from a structural perspective, although the approach is not limited to that perspective. That is, a function or behavior model is equally applicable; these models simply represent different ways of understanding a system. In the hierarchical structural model (see Fig. 1), the upper level system is composed of several subsystems that are, in turn, composed of subsubsystems, and so forth (Graham-Jones & Mellor, 1995). In this iterative manner, the complex system can be decomposed down to the smallest replaceable unit or part (e.g., a bolt, pipe, or seal). The definition of system “levels” is arbitrary and the process of decomposition is intuitive. Each level in the hierarchy contains one or more elements that together represent a static view of the system at a particular granularity, and the granularity of the view increases with the level.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153232-48745-mediumThumb-S089006040900002X_fig1g.jpg?pub-status=live)
Fig. 1. Hierarchical structural model.
The complex system decomposition is a very difficult task because of our limited understanding of the system. The ideal solution is to break the system into subsystems of smaller dimension by analyzing the parts that can be decoupled and the subassemblies that are weakly interconnected. These subsystems are typically quite coherent, and it is easy to separate them from the rest of the system. Building and decomposing system models is a subjective process; however, the process itself can provide an improved understanding of the system. Note that the term “component” is used to describe the physical entities that achieve a certain function, and usually includes the subsystems, subassemblies, parts, and so forth, unless specified.
3.2. Failure causal relationship analysis
Failure modes occur at different levels of system aggregation. A component may be involved in more than one failure mode, and a failure mode may involve more than one component that fails because of the same cause. The phenomena of system failures are complicated by the interdependence of these failure modes, so a thorough FMEA needs to examine the effects of many potential failure modes in as many locations as possible (Hawkins & Woollons, 1998). From the structural point of view, there is an intrinsic, hierarchical causal relationship between the failure modes (Hu et al., 2003; Zhao et al., 2004). For a system failure, the failure sources at the first level are the subsystems, and the failure causes are the current failure modes of these subsystems. The failure sources at the second and third levels are, respectively, the functional modules and subassemblies, and the causes are the related failure modes of these components. The part failure modes are thus the lowest level failures, and their causes are usually the possible original causes of a system failure. In other words, the failure modes of a lower level may affect those of the upper level, which results in the upper level components failing. This failure scenario is described as the next higher level effect in the spreadsheet-based FMEA. Another scenario is that the failure modes of some components interact with each other at the same system level (Zhao et al., 2004). These failure modes are defined as interactive failures (Sun et al., 2006). When an interactive failure happens, the influencing component either causes the affected component to fail immediately or accelerates its deterioration step by step. This scenario can be described by extending the concept of the local effect in conventional FMEA; that is, the effects resulting from interactive failures can also be defined as local effects.
These two failure propagation mechanisms allow even a small failure to propagate widely through a complex system. As shown in Figure 2, the failure mode FM1 of the component $A_{a1}$ will have an effect on the higher level component $A_a$, and the failure mode of $A_b$ will have a local effect on the same-level component $A_a$. The effects of the bottom failure modes will propagate to the entire system $A$, which results in the end effect EEF5.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153232-31398-mediumThumb-S089006040900002X_fig2g.jpg?pub-status=live)
Fig. 2. Iterative relation between FMEA spreadsheets.
The unstructured knowledge representation of the spreadsheet-based FMEA model cannot provide full traceability between the failure modes and the effects of a complex system, and it is not suited to analyzing multiple failures. That is why a more powerful failure knowledge modeling method is needed.
3.3. Formal knowledge representation with polychromatic sets
The requirements that the new method has to fulfill follow from the problem described above. In the hierarchical structural model, each component represents the smallest distinct unit at the granularity of its level, and can be expanded into a structure at the next lower level. Similarly, each failure mode can be traced back until the original causes are reached. To provide a useful visualization of how the failure modes (FM) contribute to the failures of the components (C) at different levels, a predefined mapping framework is proposed, as shown in Figure 3. The left side of the framework illustrates the causal logical relationships between the failure modes, whereas the right side shows the hierarchical decomposition of the complex system. The specific correlation between a failure mode node and a component node is denoted by a dashed line. Through this mapping process, the failure modes are attached to the corresponding components at the different levels, for example, FM0 → C0, FM12 → C11.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153231-79560-mediumThumb-S089006040900002X_fig3g.jpg?pub-status=live)
Fig. 3. Mapping framework between failure modes and components.
As mentioned before, each failure mode can be assigned to a corresponding component, so we regard it as an inherent property of that component. To facilitate expression and computer processing, polychromatic sets can be employed to represent a component and its failure modes as an integrated whole. Suppose a polychromatic set $A = (a_1, \ldots, a_i, \ldots, a_n)$ denotes a group of components at a certain level. Then the individual color $F_j(a_i)$ can be adopted to represent a single failure mode of the component $a_i$, whereas the individual pigmentation $F(a_i)$ can be used to represent the whole set of its failure modes. Accordingly, the next higher level effect of a failure mode can be represented by the unified color $F_j(A)$ corresponding to the entirety of $A$, and all these effects can be represented by the unified pigmentation $F(A)$. The failure modes and their effects are therefore defined as the different colors of the components, which facilitates formalization and programming. Because the complex system is organized as a hierarchy, a formal failure knowledge representation model for the complex system can be constructed as shown in Figure 4.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153232-70130-mediumThumb-S089006040900002X_fig4g.jpg?pub-status=live)
Fig. 4. Formal failure knowledge representation model.
In the above model, a component and its failure modes are combined and represented as a two-tuple $\langle F[A(k, i_k, j_{k-1})];\ A(k, i_k, j_{k-1}) \rangle$, where $A(k, i_k, j_{k-1})$ denotes the $i_k$th component at the $k$th level and $F[A(k, i_k, j_{k-1})]$ denotes the corresponding failure modes of the component. In addition, $A(k, i_k, j_{k-1})$ is a subcomponent of the $j_{k-1}$th component at the $(k-1)$th level. Specifically, $\langle F[A(0,0,0)];\ A(0,0,0) \rangle$ is the top node, in which $A(0,0,0)$ represents the entire system and $F[A(0,0,0)]$ represents the corresponding top-level failure modes. The total number of nodes at the $k$th level is $n_k$.
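One possible way to hold these two-tuples in a program is sketched below in Python. The class name `Node`, its field names, the helper functions, and the sample failure mode labels are all assumptions made for this illustration; the paper itself does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One two-tuple <F[A(k, i_k, j_{k-1})]; A(k, i_k, j_{k-1})>."""
    level: int                            # k
    index: int                            # i_k, position within level k
    parent: int                           # j_{k-1}, index of the parent at level k - 1
    failure_modes: frozenset = frozenset()  # F[A(k, i_k, j_{k-1})]


def children(nodes, parent):
    """Components at the next lower level that belong to `parent`."""
    return [n for n in nodes
            if n.level == parent.level + 1 and n.parent == parent.index]


def nodes_at_level(nodes, k):
    """n_k, the total number of nodes at level k."""
    return sum(1 for n in nodes if n.level == k)


# Hypothetical three-level model; the labels are placeholders.
top = Node(0, 0, 0, frozenset({"F1"}))
c1 = Node(1, 1, 0, frozenset({"F1", "F2"}))
c2 = Node(1, 2, 0, frozenset({"F3"}))
g1 = Node(2, 1, 1, frozenset({"F4"}))
model = [top, c1, c2, g1]
```

Here `children(model, top)` returns the two level-1 components, and `nodes_at_level(model, 1)` gives the count of nodes on that level.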
An advantage of this representation model is that the system can be modeled at any desired level. Because the failure modes on the bottom level are usually well known and can be found, with more specific descriptions, in existing failure mode databases (Contini, 1995; Bluvband et al., 2005), these failure modes are represented within the three-tuple set $S$:

$$
S = \{ \langle a_i;\ F(a_i);\ f(a_i) \rangle \mid i = 1, 2, \ldots, n \}
\tag{2}
$$

where $a_i$ is a bottom level component, and $F(a_i)$ and $f(a_i)$ are its failure modes and original failure causes, respectively. The original causes of the system failure may vary with the choice of bottom level components. In addition, some special failures cannot be represented as readily as normal ones, such as software failures, operator mistakes, failures of components beyond the system boundary, and so forth. In this paper, these failures are treated as original causes of the system failure and are assigned to the related components.
4. FAILURE EFFECTS ANALYSIS
4.1. Reasoning matrix
Once the failure modes and effects are defined in terms of the mathematical notion of “color,” the required causal relationships between the failure modes need to be represented and stored in a more convenient way. The theory of polychromatic sets, and the Boolean matrix in particular, provides a feasible approach for solving imprecise and incomplete reasoning problems.
First, the different failure modes must be related to their corresponding components. Let $A(k, j_{k-1}) = \bigcup_{i_k=1}^{n(k, j_{k-1})} A(k, i_k, j_{k-1})$ and $F(A(k, j_{k-1})) = \bigcup_{i_k=1}^{n(k, j_{k-1})} F(A(k, i_k, j_{k-1}))$, where $n(k, j_{k-1})$ is the number of child nodes of the $j_{k-1}$th node at the $(k-1)$th level. The relationships between a component at the $k$th level, $A(k, i_k, j_{k-1})$, and its related failure modes $F[A(k, i_k, j_{k-1})]$ can be represented and reasoned about using the following Boolean matrix [A × F(a)] of the polychromatic sets:
$$
\|c_{i_k j}\|_{A,\,F(a)} = [A(k, j_{k-1}) \times F(A(k, j_{k-1}))] =
\begin{array}{c|ccccc}
 & F_1(A(k, j_{k-1})) & \cdots & F_j(A(k, j_{k-1})) & \cdots & F_p(A(k, j_{k-1})) \\
\hline
A(k, 1, j_{k-1}) & c_{11} & \cdots & c_{1j} & \cdots & c_{1p} \\
\vdots & \vdots & & \vdots & & \vdots \\
A(k, i_k, j_{k-1}) & c_{i_k 1} & \cdots & c_{i_k j} & \cdots & c_{i_k p} \\
\vdots & \vdots & & \vdots & & \vdots \\
A(k, n(k, j_{k-1}), j_{k-1}) & c_{n(k, j_{k-1})\,1} & \cdots & c_{n(k, j_{k-1})\,j} & \cdots & c_{n(k, j_{k-1})\,p}
\end{array}
\tag{3}
$$
where $c_{i_k j} = 1$ if $F_j[A(k, j_{k-1})] \in F[A(k, i_k, j_{k-1})]$, and $c_{i_k j} = 0$ otherwise. This means that, for a given reasoning matrix, if $c_{i_k j}$ is 1 for some $i_k$ and $j$, then a definite correlation exists between the $i_k$th component and the $j$th failure mode. In contrast, there is no correlation if $c_{i_k j}$ is 0. Thus, the reasoning matrix determines whether there is a relationship between the $i_k$th component and the $j$th failure mode.
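The construction of such a matrix from a component-to-failure-mode listing can be sketched as follows. This is only an illustration: the function name `build_component_matrix` is hypothetical, and although the component names echo the example system in Section 5, the failure mode labels are invented.

```python
def build_component_matrix(failure_modes_by_component):
    """Build [A x F(a)] for the sibling components of one parent node.

    Rows follow the components in insertion order; columns are the union of
    all their failure modes in first-seen order. Entry c[i][j] is 1 when
    failure mode j belongs to component i, and 0 otherwise.
    """
    components = list(failure_modes_by_component)
    columns = []
    for modes in failure_modes_by_component.values():
        for mode in modes:
            if mode not in columns:
                columns.append(mode)
    matrix = [
        [1 if mode in failure_modes_by_component[comp] else 0 for mode in columns]
        for comp in components
    ]
    return components, columns, matrix


# Hypothetical input: failure mode labels are placeholders.
rows, cols, c = build_component_matrix({
    "compressor": ["overheating", "low discharge pressure"],
    "oil pump": ["low oil pressure", "overheating"],
})
```

A lookup such as `c[rows.index("oil pump")][cols.index("overheating")]` then answers whether a given failure mode is attached to a given component.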
According to the hierarchical nature of failures, a component may affect its higher level “containers” through failure propagation; that is, the upper level failure modes $F[A(k, i_k, j_{k-1})]$ are always regarded as the failure effects derived from the failure modes at the next lower level, $F[A(k+1, i_{k+1}, i_k)]$. Therefore, the causal relationships between the failure modes at adjacent levels can be described and reasoned about using the following Boolean matrix [F(a) × F(A)], corresponding to expression (1):

$$
\|c_{ij}\|_{F(a),\,F(A)} = [F(A(k+1, i_k)) \times F(A(k, i_k, j_{k-1}))],
\tag{4}
$$

where $c_{ij} = 1$ if the failure mode $F_j[A(k, i_k, j_{k-1})]$ is caused by the failure mode $F_i[A(k+1, i_k)]$ at the next lower level. In addition, a component may also respond to failure events generated by other components on the same level, namely, interactive failures. Similarly, the causal relationships between interactive failure modes can be described and reasoned about using the self-correlative Boolean matrix [F(a) × F(a)]:

$$
\|c_{ij}\|_{F(a),\,F(a)} = [F(A(k, j_{k-1})) \times F(A(k, j_{k-1}))].
\tag{5}
$$

In Eq. (5), $c_{ij} = 1$ if the failure mode $F_j[A(k, j_{k-1})]$ is caused by another failure mode $F_i[A(k, j_{k-1})]$ on the same level.
As described previously, conventional FMEA considers only single failures and displays only the system effects stemming from them. If multiple failures are considered, completely new system effects or top events that were previously not taken into account can occur. That is, system failure events are not always caused by single failures; double, triple, or higher-order failure combinations should also be considered. To this end, the causal relationships between failure modes are classified into two kinds, distinguished by the disjunction format (P∨S) and the conjunction format (P∧S) of the polychromatic sets. These are similar to the “OR” and “AND” operations in fault tree analysis (FTA). They can be expressed as:

$$
\mathrm{P^{\vee}S}: \quad F_j(A(k, i_k, j_{k-1})) = \bigvee_{i_{k'} \geq 1} F_j(A(k', i_{k'}, j_{k'-1})),
\tag{6}
$$

$$
\mathrm{P^{\wedge}S}: \quad F_j(A(k, i_k, j_{k-1})) = \bigwedge_{i_{k'} \geq 1} F_j(A(k', i_{k'}, j_{k'-1})),
\tag{7}
$$

where $k' = k$ or $k + 1$. When more than one combination of failure modes ($l \geq 1$) can affect the existence of another failure mode at the same or the next higher level, the composition of these combinations is expressed as

$$
F_j(A(k, i_k, j_{k-1})) = \bigvee_{l \geq 1} \bigwedge_{i_{k'} \geq 1} F_j(A(k', i_{k'}, j_{k'-1}))_l.
\tag{8}
$$
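The OR-of-ANDs structure of the last expression can be illustrated with a small Python sketch. The function name `occurs` and the failure mode labels are hypothetical; each inner set stands for one combination $l$ whose members must all be present.

```python
def occurs(combinations, active):
    """Return True if any combination of causes is fully active.

    `combinations` is a list of sets (one set per combination l);
    `active` is the set of failure modes currently present.
    The outer any() plays the role of the logic sum and the inner all()
    the role of the logic product.
    """
    return any(all(cause in active for cause in combo) for combo in combinations)


# Hypothetical causes of one upper level failure mode:
# either FM_a alone, or FM_b together with FM_c.
causes = [{"FM_a"}, {"FM_b", "FM_c"}]

single_failure = occurs(causes, {"FM_a"})
incomplete_pair = occurs(causes, {"FM_b"})
double_failure = occurs(causes, {"FM_b", "FM_c"})
```

With these inputs the single failure FM_a and the double failure FM_b with FM_c both trigger the effect, whereas FM_b alone does not, which mirrors the distinction between the disjunction and conjunction formats.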
4.2. Failure effect analysis process
Failure effects analysis needs to identify the failure modes and their end effects on the entire system, which is essential for the system decomposition and model construction. Each stage of the failure effects analysis starts with a set of components at one level, identifies all the possible failure modes of these components, and then assesses the effects on the local level, the next higher level, and the system as a whole. However, these three kinds of failure effects are not available directly. The effects on the local level and on the upper level are determined first; the end effect is then obtained by iterating upward until the top level is reached. Figure 5 shows a flow chart of the specific procedure for carrying out failure effects analysis.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153233-94381-mediumThumb-S089006040900002X_fig5g.jpg?pub-status=live)
Fig. 5. Failure effects analysis process.
Step 1. Determine the system level to be analyzed. At the first stage of the failure effects analysis, the system level must be fixed so that a focused analysis can be performed. The choice of level depends on the particular application or on the knowledge level of the maintenance personnel (usually the component level). There is no benefit in extending the analysis to too low a level, because doing so introduces too many failure modes and defeats the purpose of the study.
Step 2. Identify the potential failure modes of the components at the selected system level. This requires theoretical and engineering insight into each of the failures defined previously, with the goal of narrowing the field down to the predominant failing components and their predominant failure modes. This step is accomplished through theoretical analysis, followed by comparison with and grouping of the actual maintenance data. If the current level is the top level, go directly to Step 5; otherwise go to the next step.
Step 3. Analyze the local effects of the selected failure modes. The local effects on the same level can be captured by searching the self-correlative reasoning matrix [F(a) × F(a)]. If local effects exist, add the corresponding failure modes of the affected components to the candidate failure modes that need further analysis; otherwise go directly to Step 4.
Step 4. Analyze the next higher level effects of the failure modes determined in Step 3. The failure effects on the next higher level can be captured by searching the reasoning matrix [F(a) × F(A)] until the top level is reached. Through this reasoning analysis, all possible failure modes at the next higher level are identified and are treated as the candidate failure modes for the next pass through Step 2.
Step 5. Export the end effects on the entire system. Through the iterative search over the reasoning matrices, the end effects on the entire system are obtained automatically. In general, the end effects can be derived from the list of functional requirements for the system under analysis.
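The five steps can be sketched as a short Python routine. This is a simplified illustration under several assumptions: level 0 is taken to be the top level, each reasoning matrix is stored sparsely as a dictionary holding only its 1-entries, only disjunctive (single-failure) propagation is handled, and the function name `end_effects` and the labels `FM_Aa` and `FM_Ab` are invented (FM1 and EEF5 loosely follow Figure 2).

```python
def end_effects(start_level, initial_modes, local, upward):
    """Propagate failure modes from `start_level` up to level 0.

    local[k][m]  : same-level modes caused by mode m, i.e. row m of [F(a) x F(a)]
    upward[k][m] : level k - 1 modes caused by mode m, i.e. row m of [F(a) x F(A)]
    Returns the end effects and a per-level trace of candidate failure modes.
    """
    level = start_level              # Step 1: chosen system level
    candidates = set(initial_modes)  # Step 2: identified failure modes
    trace = []
    while level > 0:
        # Step 3: add local (interactive) effects until nothing new appears.
        frontier = list(candidates)
        while frontier:
            mode = frontier.pop()
            for affected in local.get(level, {}).get(mode, ()):
                if affected not in candidates:
                    candidates.add(affected)
                    frontier.append(affected)
        trace.append((level, sorted(candidates)))
        # Step 4: collect the next higher level effects.
        higher = set()
        for mode in candidates:
            higher.update(upward.get(level, {}).get(mode, ()))
        candidates = higher
        level -= 1
    trace.append((0, sorted(candidates)))
    return candidates, trace         # Step 5: end effects on the system


# Hypothetical model with three levels.
local = {1: {"FM_Ab": {"FM_Aa"}}}
upward = {2: {"FM1": {"FM_Aa"}}, 1: {"FM_Aa": {"EEF5"}}}

effects_from_part, trace_from_part = end_effects(2, {"FM1"}, local, upward)
effects_from_peer, trace_from_peer = end_effects(1, {"FM_Ab"}, local, upward)
```

The returned trace records the candidate failure modes at each level, which is one simple way to retain the traceability between a bottom level failure mode and its end effect.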
Through the above analysis process, most of the data needed to form an FMEA report can be extracted from the causal relationships. It must be remarked here that the rest of the information, such as the risk priority number (RPN), current control, and recommended action, should be provided by the analyst at certain stages of the FMEA generation process. As the analysis procedure is repetitive, it lends itself to computer automation, but requires a program to perform the reasoning portion of the analysis. Evidently, the failure diagnostic reasoning can be interpreted as a reverse search process operating on the reasoning matrices.
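The reverse search mentioned above might look like the following Python sketch, which walks the same kind of sparse dictionaries in the opposite direction. The representation, the function name `possible_causes`, and the labels `FM_Aa` and `FM_Ab` are assumptions made for illustration only.

```python
def possible_causes(effect, level, local, upward):
    """Collect every (level, failure mode) that can lead to `effect`.

    local[k][m]  : same-level modes caused by mode m
    upward[k][m] : level k - 1 modes caused by mode m
    The search reads the matrices column-wise: it looks for rows that
    contain a 1 in the column of the failure mode being explained.
    """
    found = set()
    stack = [(level, effect)]
    while stack:
        lvl, mode = stack.pop()
        same = [(lvl, m) for m, affected in local.get(lvl, {}).items()
                if mode in affected]
        lower = [(lvl + 1, m) for m, effects in upward.get(lvl + 1, {}).items()
                 if mode in effects]
        for cause in same + lower:
            if cause not in found:  # also guards against causal loops
                found.add(cause)
                stack.append(cause)
    return found


# Hypothetical model with three levels (level 0 is the whole system).
local = {1: {"FM_Ab": {"FM_Aa"}}}
upward = {2: {"FM1": {"FM_Aa"}}, 1: {"FM_Aa": {"EEF5"}}}

causes_of_end_effect = possible_causes("EEF5", 0, local, upward)
```

For the end effect EEF5 this returns the level-1 modes FM_Aa and FM_Ab and the level-2 mode FM1 as candidate causes.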
5. APPLICATION EXAMPLE
5.1. System description
According to Dean (2003), the low-pressure and medium-pressure air compressor (LP-MPAC) plants on various ships are defined as “a sum of all components and the relationships among them that participate in an air supply to maintain the system pressure at the desired level.” To aid understanding of the proposed model and the associated reasoning method, the FMEA for a low-pressure reciprocating air compressor (LPRAC) plant is investigated. The LPRAC plant is a typical complex system with multiple levels, and the organization of its failure knowledge is complicated because many factors must be considered. Therefore, before the new modeling methodology is applied, the system concerned and its physical boundary must be well defined. The components beyond the system boundary are not taken into account, and the failures of these components are treated as original failure causes that need not be analyzed further. Briefly, the system is composed of several components such as the oil pump, compressor, fresh water (FW) filter, driver motor, and so forth. Some of these components can be broken down further into a lower level; for example, the compressor can be divided into the high pressure/low pressure (HP/LP) cylinder, piston, gas valve, slide, and so forth. An example of a hierarchical structural decomposition scheme for the LPRAC plant is depicted in Figure 6.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153356-10031-mediumThumb-S089006040900002X_fig6g.jpg?pub-status=live)
Fig. 6. Structural model of LPRAC plant.
The failure effects analysis requires a deep understanding of the end effects on the entire system. The way to define the end effects is to investigate the system function failure, that is, “The inability of a system to meet a specified performance standard.” Functional failures can be described by judging whether the functional outputs of the system are too much, too little, or degraded. In addition, it is necessary to define the possible failure modes on the bottom level, which are considered the direct or original causes of the system function failure. The failure modes on the bottom level are usually well known and specifically described in advance. To accomplish the failure effects analysis, the failure modes on the other levels need to be well defined and organized as well; these failure modes can be derived from the system technical manuals, the failure reports, or the maintenance records.
5.2. Formal modeling
As discussed above, our formal representation model is a structure-based and modularized method that enables us to perform a failure effects analysis more precisely and completely. It is useful not only for finding potential failure modes while avoiding the incorrectness problems inherent in traditional methods, but also for guiding designers and domain experts in eliciting the domain knowledge incrementally. To facilitate failure effects analysis, a formal failure knowledge representation model is constructed using the aforementioned modeling method, as shown in Figure 7.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153426-67874-mediumThumb-S089006040900002X_fig7g.jpg?pub-status=live)
Fig. 7. Formal failure knowledge representation model of LPRAC plant.
In Figure 7, the single node <F[A(0,0,0)]; A(0,0,0)> forms the first level, in which A(0,0,0) denotes the entire system and F[A(0,0,0)] = {F j0, j = 1, 2, … , m 0} indicates a set of function failure modes (m 0 is the number of failure modes). This node can be further decomposed into a group of child nodes, namely, <F[A(1,i,0)]; A(1,i,0)> [i = 1, 2, … , n(1, 0)], in which A(1,i,0) denotes the ith component on the second level and is written A i1 for simplicity. Accordingly, F[A(1,i,0)] = {F j1, j = 1, 2, … , m i} represents the set of failure modes related to the component A i1. Repeating this process, the components and failure modes on each lower level are represented in the same way until the lowest level is reached. In this study, only a limited number of levels, components, and failure modes are taken into consideration, but this is adequate for illustrating the concepts and ideas put forward in this paper. A list of the selected failure modes and components is provided in Table 1.
Table 1. Failure mode/component definition and description
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153430-27415-mediumThumb-S089006040900002X_tab1.jpg?pub-status=live)
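The node structure <F[A]; A> described for Figure 7 can be sketched as a small tree. This is only a minimal illustration, not the authors' implementation: the class name `Node`, its fields, and the string labels (for example `"F3^2"` for F 32 and `"A4^1"` for A 41) are assumptions introduced here, and only the components and failure modes named in the text are included.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node <F[A]; A>: a component A together with its failure-mode set F[A]."""
    component: str                                   # component code, e.g. "A(0,0,0)"
    name: str                                        # readable component name
    failure_modes: dict[str, str] = field(default_factory=dict)  # code -> description
    children: list[Node] = field(default_factory=list)

    def walk(self, level=0):
        """Yield (level, node) pairs top-down, depth first."""
        yield level, self
        for child in self.children:
            yield from child.walk(level + 1)


# Fragment only: codes and names are those quoted in the text; the rest of
# Table 1 and Figure 7 is omitted.
gas_valve = Node("A1^2", "gas valve", {"F3^2": "gas valve leakage"})
compressor = Node(
    "A4^1", "compressor",
    {"F5^1": "inadequate airflow", "F6^1": "unqualified exhaust parameters"},
    [gas_valve],
)
plant = Node(
    "A(0,0,0)", "LPRAC plant",
    {"F2^0": "low pressure, low capacity",
     "F3^0": "pressure satisfactory, low capacity"},
    [compressor],
)

for level, node in plant.walk():
    print("  " * level + f"{node.component} {node.name}: {sorted(node.failure_modes)}")
```

Keeping the failure-mode set on the node itself mirrors the pairing of a component with its "colors"; the causal links between levels are deliberately left out here because the model stores them in the reasoning matrices.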
A thorough FMEA should look into all the potential failure modes, particularly those that cause double, triple, and higher order system functional failures. To help the analyst trace a failure horizontally or vertically at an arbitrary indenture level, complete reasoning matrices are used to store the failure causal knowledge of the complex system. For simplicity, these reasoning matrices are integrated into one chart, as shown in Figure 8.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153423-34428-mediumThumb-S089006040900002X_fig8g.jpg?pub-status=live)
Fig. 8. Reasoning matrices.
The reasoning matrices that map the failure modes to the corresponding components on the second and third levels are the Boolean matrices (1) and (4), respectively. A “1” at the intersection of a failure mode and a component indicates a definite correlative relationship between them. In addition, the reasoning matrices relating failure modes at adjacent levels are matrices (2) and (3), which have the upper level failure mode codes as rows (or columns) and the lower level failure mode codes as columns (or rows). Similarly, a “1” indicates a definite causal relationship between the two failure modes. Furthermore, the relationships between failure modes on the same system level also need to be considered; for example, the excessive wear, faceted run surfaces, and fractures found in the piston rings are caused by excessive friction against the inner surface of the piston liner. Because this type of failure is easy to represent and interpret with a self-correlative reasoning matrix, it is not discussed in detail here.
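A component-to-failure-mode matrix of the kind described above can be held as a plain list of 0/1 rows. The sketch below is a hypothetical fragment of matrix (4): it contains only the entries quoted in Section 5.3 (A 12 with F 32; A 42 with F 12, F 22, F 62, F 72), the helper names `modes_of` and `owners_of` are invented for illustration, and the full matrix in Figure 8 is larger.

```python
# Hypothetical fragment of reasoning matrix (4): rows are third-level
# components, columns are third-level failure modes. Only the entries
# quoted in Section 5.3 are filled in.
components = ["A1^2", "A4^2"]
modes = ["F1^2", "F2^2", "F3^2", "F6^2", "F7^2"]
matrix4 = [
    [0, 0, 1, 0, 0],   # A1^2 (gas valve)
    [1, 1, 0, 1, 1],   # A4^2
]


def modes_of(component):
    """Return the failure modes enabled for a component (the 1s in its row)."""
    row = matrix4[components.index(component)]
    return [mode for mode, bit in zip(modes, row) if bit == 1]


def owners_of(mode):
    """Return the components a failure mode belongs to (the 1s in its column)."""
    col = modes.index(mode)
    return [comp for comp, row in zip(components, matrix4) if row[col] == 1]


print(modes_of("A4^2"))
print(owners_of("F3^2"))
```

Reading a row gives the failure modes that remain enabled once a component is selected; reading a column answers the opposite question of which component a failure mode belongs to.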
5.3. Failure effects analysis
The failure effects analysis starts with the failure modes of interest and all their possible effects on the initial indenture level. Each failure effect is then propagated upward until the top level is reached. Automated causal reasoning based on the reasoning matrices is straightforward. The failure mode selection, as is common in the traditional FMEA, is accomplished by an additional search of the reasoning matrix (1) or (4); all failure modes that cannot belong to the selected component are disabled automatically. For example, in reasoning about the single-point failure “gas valve leakage” (F 32) of the gas valve (A 12), the failure effects on the compressor assembly (A 41) are obtained by searching the reasoning matrix (3), namely, “inadequate airflow” (F 51) and “unqualified exhaust parameters” (F 61). Furthermore, the failure modes F 51 and F 61 might each result in a failure of the entire system, that is, “pressure satisfactory, low capacity” (F 30) and “low pressure, low capacity” (F 20), respectively, which are found by searching the reasoning matrix (2).
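The single-point example above can be traced with a few lines of code. This is a minimal sketch under stated assumptions: each matrix is stored sparsely as a mapping from a cause to the effects marked “1”, only the entries quoted in the preceding paragraph are included (the real rows in Figure 8 may hold further 1s), and the names `MATRIX_3`, `MATRIX_2`, `step`, and `propagate` are hypothetical.

```python
# Sparse rows of the reasoning matrices: cause -> set of effects marked "1".
# Only the entries quoted in the text are included (an assumption: the full
# matrices in Figure 8 may contain more).
MATRIX_3 = {"F3^2": {"F5^1", "F6^1"}}              # third level -> second level
MATRIX_2 = {"F5^1": {"F3^0"}, "F6^1": {"F2^0"}}    # second level -> top level


def step(causes, matrix):
    """Union of the effects of every cause, one indenture level up."""
    effects = set()
    for cause in causes:
        effects |= matrix.get(cause, set())
    return effects


def propagate(initial, matrices):
    """Return the effect sets reached at each level, bottom-up."""
    trace = [set(initial)]
    for matrix in matrices:
        trace.append(step(trace[-1], matrix))
    return trace


trace = propagate({"F3^2"}, [MATRIX_3, MATRIX_2])
for level, effects in zip(("third", "second", "top"), trace):
    print(level, sorted(effects))
```

The sketch treats every “1” as an independent (OR) cause; conjunctive relationships would need an extra encoding and are not handled here.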
We now continue with the investigation of double failures by focusing on specific failure modes. For instance, focusing on the third level components A 12 and A 42, the corresponding failure modes are enabled in the reasoning matrix (4), namely, F 32 and (F 12, F 22, F 62, F 72). Assuming a double-point failure (a combination of two failure modes leading to a failure) occurs at the third level, that is, F 32 ∧ F 62, three combinations of failure modes on the second level are found by searching the reasoning matrix (3), that is, (F 51 ∧ F 71) ∨ (F 61 ∧ F 71) ∨ (F 71 ∧ F 121). The final effects on the top level are again obtained by searching the reasoning matrix (2); for example, the system failure event (F 20 ∧ F 50) will happen when the double-point failure (F 71 ∧ F 121) is chosen as the failure cause. Applying the same process to triple or higher order failures, the desired failure effects on an arbitrary system level are displayed automatically. In particular, some failure modes cause a new failure mode only in combination with one another, which means that these failure modes are linked by the AND operation. The reasoning matrix needs to be modified for this case; for example, we can use “−1” at the intersection of the failure mode and its causes to denote this type of conjunctive (∧) causal relationship. The results obtained from the failure effects analysis using the proposed approach are equivalent to the published results in Dean (1992). Most importantly, we realize automatic diagnostic reasoning for more than one top failure event through the reverse process. For example, the different combinations of failure modes that cause the failure state of the system (F 30 ∧ F 50) are obtained by searching the reasoning matrix (2), and the logical expression is written as follows:
![\eqalign{F_3^0 \wedge F_5^0 &= \lpar F_5^1 \wedge F_{12}^1\rpar \vee \lpar F_5^1 \wedge F_{17}^1 \rpar \vee \lpar F_5^1 \wedge F_{19}^1 \rpar \vee \lpar F_5^1 \wedge F_{21}^1 \rpar \cr &\quad \vee \lpar F_5^1 \wedge F_{22}^1 \rpar \vee \lpar F_5^1 \wedge F_{24}^1 \rpar}](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160328065532116-0898:S089006040900002X_eqnU1.gif?pub-status=live)
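One way to read the expression above is as a product of cause sets: pick one second-level cause for each top event and take the disjunction over all such picks. The sketch below follows that reading. It is an assumption that F 30 has the single cause F 51 and that F 50 has the six causes listed; these entries are inferred from the expression, not transcribed from Figure 8, and the names `CAUSES_2` and `diagnose` are hypothetical.

```python
from itertools import product

# Reasoning matrix (2) read top-down: top-level failure mode -> second-level
# causes marked "1". Entries are inferred from the expression in the text
# and are an assumption, not a transcription of Figure 8.
CAUSES_2 = {
    "F3^0": ["F5^1"],
    "F5^0": ["F12^1", "F17^1", "F19^1", "F21^1", "F22^1", "F24^1"],
}


def diagnose(top_events, causes):
    """Expand a conjunction of top events into a list of cause conjunctions."""
    alternatives = [causes[event] for event in top_events]
    seen, result = set(), []
    for choice in product(*alternatives):      # one cause per top event
        key = frozenset(choice)
        if key not in seen:                    # skip reordered duplicates
            seen.add(key)
            # dict.fromkeys drops a cause repeated within one conjunction
            result.append(tuple(dict.fromkeys(choice)))
    return result


expression = diagnose(["F3^0", "F5^0"], CAUSES_2)
print(" OR ".join("(" + " AND ".join(term) + ")" for term in expression))
```

Under these assumptions the six conjunctions produced match the six terms of the expression; applying the same function to each second-level term with matrix (3) would continue the diagnosis toward the bottom level.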
These failure modes can be further analyzed until the bottom level is reached. The results of the effects and diagnosis analysis give useful information on how to improve the system reliability by identifying the weakest parts, where the most effective preventive maintenance can be adopted. Of course, a combinatorial explosion problem arises when the system and the failure modes are more complicated. We can use approximate failure rates for components to select the most likely combinations of failures for analysis. This means that a large number of unlikely multiple failures can be ignored, enabling the engineers to concentrate on the most significant multiple failures.
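The pruning idea can be illustrated as follows. This is a sketch only: the probabilities in `RATES` are invented for illustration and come neither from the paper nor from Dean, the function name `likely_combinations` is hypothetical, and the failure modes are assumed to be statistically independent so that a joint probability is the product of the individual ones.

```python
from itertools import combinations
from math import prod

# Hypothetical failure probabilities, invented for illustration only.
RATES = {"F1^2": 1e-2, "F2^2": 5e-4, "F3^2": 2e-2, "F6^2": 1e-3, "F7^2": 1e-5}


def likely_combinations(rates, order, threshold):
    """Keep the combinations whose joint probability reaches the threshold.

    Assumes independent failure modes, so the joint probability of a
    combination is the product of its members' probabilities.
    """
    kept = []
    for combo in combinations(sorted(rates), order):
        joint = prod(rates[mode] for mode in combo)
        if joint >= threshold:
            kept.append((combo, joint))
    # Most likely combinations first
    return sorted(kept, key=lambda item: item[1], reverse=True)


pairs = likely_combinations(RATES, order=2, threshold=8e-6)
for combo, joint in pairs:
    print(combo, f"{joint:.1e}")
```

With these made-up numbers, four of the ten possible double failures survive the threshold; the threshold itself is an engineering judgment that the text does not prescribe.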
5.4. Assessment of methods
The polychromatic sets (PS) approach to FMEA improves on the traditional FMEA process by adding more structure and formalism to the failure knowledge representation, which in turn facilitates the automated process of failure effects analysis (see Section 5.3). In contrast, most conventional FMEA methods give very little attention to these points. To compare the proposed approach with current FMEA efforts (Kara-Zaitri et al., Reference Kara-Zaitri, Keller, Barody and Fleming1991; Eubanks et al., Reference Eubanks, Kmenta and Ishii1996; Ruiz et al., Reference Ruiz, Paniagua, Alberto and Sanabria2000; Lee, Reference Lee2001; Teoh & Case, Reference Teoh and Case2004), we systematically examine the methods according to their modeling instruments, reasoning mechanism, ability to observe multiple failures, and so forth. The result of the comparison is shown in Table 2.
Table 2. Method comparison
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627153427-30369-mediumThumb-S089006040900002X_tab2.jpg?pub-status=live)
Note: ++, most appropriate; +, medium appropriate; −, less appropriate.
It is found that our proposed method is advantageous in several respects: it provides a more unified and concise representation of the failure knowledge, improves the accuracy and consistency of the analysis results, supports multiple failures analysis, bridges the design–diagnosis modeling gap, and facilitates implementation in the computer. For example, about 101 blocks and 202 linkages are needed to describe the elements provided in Table 1 using the functional diagram (Teoh & Case, Reference Teoh and Case2004), whereas only 18 nodes and 17 borders are needed in our proposed model. Further improvements in terms of formalism can be made for more complex systems by modularizing the representation model before the analysis takes place. Unlike the spreadsheet-based approach, representing the causal relationships in the form of a Boolean matrix lends itself to mathematical manipulation and gives significant savings in computation. The accuracy and consistency of the results are ensured by a deductive inference process applied to the Boolean matrices. In particular, this new modeling method also provides efficient support for multiple failures analysis and diagnostic reasoning, and needs no special software package. There is no prescription for these types of analysis in conventional FMEA procedures.
However, the proposed method provides limited ability to satisfy the quantitative simulation requirement. The current model can be expanded and integrated with other quantitative analysis methods to support quantitative simulation. In addition, the proposed method uses a matrix approach for reasoning about causes and effects but does not provide an efficient algorithm to perform this reasoning. These issues remain to be addressed in future work.
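As one illustration of the mathematical manipulation that Boolean matrices allow, two adjacent-level matrices can be composed with a Boolean matrix product to relate non-adjacent levels directly. The sketch below is not taken from the paper: `bool_matmul` is a hypothetical helper, the matrices are fragments holding only the entries quoted in Section 5.3, and conjunctive (“−1”) entries are not handled.

```python
def bool_matmul(a, b):
    """Boolean product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b))))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Fragments of matrices (3) and (2); rows are causes, columns are effects.
third = ["F3^2"]
second = ["F5^1", "F6^1"]
top = ["F2^0", "F3^0"]
m3 = [[1, 1]]        # F3^2 -> F5^1, F6^1
m2 = [[0, 1],        # F5^1 -> F3^0
      [1, 0]]        # F6^1 -> F2^0

m32 = bool_matmul(m3, m2)   # third level -> top level in one step
print(m32)
```

The composed row marks both F 20 and F 30 as end effects of F 32, in agreement with the single-point example, but it no longer records which intermediate failure mode carried the effect.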
6. CONCLUSIONS
FMEA has been well recognized as a standardized engineering technique to help identify, rank, and alleviate potential failures in a design or process. This paper concentrates on the failure knowledge modeling and reasoning aspects of FMEA. By incorporating the system structural decomposition, polychromatic sets theory is employed as a powerful tool to construct the formal failure knowledge representation model. In this model, a failure mode is defined as a “color” in the mathematical terminology of polychromatic sets, and the failure causal relationships are stored in Boolean matrices, which can be readily expressed and implemented in the computer. This model provides organizational support for managing the available knowledge on the characteristics of failure propagation. From the same representation model, the complete FMEA results for various candidate failure modes can be obtained through a bottom-up search process applied repeatedly to the reasoning matrices. The method always produces the same results for the same set of failure modes, resulting in a better, more consistent analysis. In particular, the model can also be used to realize automatic diagnostic reasoning for more than one top failure event in a top-down manner, which helps to bridge the design–diagnosis modeling gap and promote FMEA knowledge reuse. The case study of the LPRAC plant demonstrates that the proposed modeling and reasoning methods provide a practical and feasible approach for FMEA formalization.
ACKNOWLEDGMENTS
This research work was funded by the National High-Tech R&D Program of China (grant 2006AA04Z441).
G. Li received a Bachelor's degree in mechanical engineering from Xi'an Jiaotong University in 2002 and is currently a graduate student at the State Key Laboratory for Manufacturing System Engineering at Xi'an Jiaotong University, China. His interests are in FMECA automation, safety analysis, and RCM.
J.M. Gao is a Professor at CIM Institute in the Department of Mechanical Engineering at Xi'an Jiaotong University, China, where he received his PhD in mechanical engineering. Dr. Gao has been working in quality management engineering for 15 years. His research and teaching interests are industrial safety, total quality management, integrated quality systems, and ERP.
F.M. Chen is a Professor at CIM Institute in the Department of Mechanical Engineering at Xi'an Jiaotong University, China, where he received his PhD in mechanical engineering. His research and teaching interests are CIMS, CE, and CAD.