Hostname: page-component-745bb68f8f-f46jp Total loading time: 0 Render date: 2025-02-06T12:15:54.822Z Has data issue: false hasContentIssue false

An efficient diagnosis algorithm for inconsistent constraint sets

Published online by Cambridge University Press:  10 June 2011

A. Felfernig*
Affiliation:
Institute for Software Technology, Graz University of Technology, Graz, Austria
M. Schubert
Affiliation:
Institute for Software Technology, Graz University of Technology, Graz, Austria
C. Zehentner
Affiliation:
Institute for Software Technology, Graz University of Technology, Graz, Austria
*
Reprint requests to: A. Felfernig, Institute for Software Technology, Graz University of Technology, Inffeldgasse 16b, A-8010 Graz, Austria. E-mail: alexander.felfernig@ist.tugraz.at
Rights & Permissions [Opens in a new window]

Abstract

Constraint sets can become inconsistent in different contexts. For example, during a configuration session the set of customer requirements can become inconsistent with the configuration knowledge base. Another example is the engineering phase of a configuration knowledge base where the underlying constraints can become inconsistent with a set of test cases. In such situations we are in the need of techniques that support the identification of minimal sets of faulty constraints that have to be deleted in order to restore consistency. In this paper we introduce a divide and conquer-based diagnosis algorithm (FastDiag) that identifies minimal sets of faulty constraints in an overconstrained problem. This algorithm is specifically applicable in scenarios where the efficient identification of leading (preferred) diagnoses is crucial. We compare the performance of FastDiag with the conflict-directed calculation of hitting sets and present an in-depth performance analysis that shows the advantages of our approach.

Type
Articles
Copyright
Copyright © Cambridge University Press 2012

1. INTRODUCTION

Constraint technologies (Tsang, Reference Tsang1993) are applied in different areas such as configuration (Mittal & Frayman, Reference Mittal and Frayman1989; Fleischanderl et al., Reference Fleischanderl, Friedrich, Haselboeck, Schreiner and Stumptner1998; Sinz & Haag, Reference Sinz and Haag2007), recommendation (Felfernig et al., Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009), and scheduling (Castillo et al., Reference Castillo, Borrajo and Salido2005). There are many scenarios where the underlying constraint sets can become overconstrained. For example, when implementing a configuration knowledge base, constraints can become inconsistent with a set of test cases (Felfernig et al., Reference Felfernig, Friedrich, Jannach and Stumptner2004). Alternatively, when interacting with a configurator application (O'Sullivan et al., Reference O'Sullivan, Papdopoulos, Faltings and Pu2007; Felfernig et al., Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009), the given set of customer requirements (represented as constraints) can become inconsistent with the configuration knowledge base. In both situations there is a need of an intelligent assistance that actively supports users of a constraint-based application (end users or knowledge engineers). A widespread approach to support users in the identification of minimal sets of faulty constraints is to combine conflict detection (e.g., see Junker, Reference Junker2004) with a corresponding hitting set algorithm (DeKleer & Williams, Reference DeKleer and Williams1987; Reiter, Reference Reiter1987; DeKleer et al., Reference DeKleer, Mackworth and Reiter1992). In their original form these algorithms are applied for the calculation of minimal (cardinality) diagnoses that are typically determined with breadth-first search. Further diagnosis algorithms have been developed that follow a best-first search regime where the expansion of the hitting set search tree is guided by failure probabilities of components (DeKleer, Reference DeKleer1990). Another example for such an approach is presented in (Felfernig et al., Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009), where similarity metrics are used to guide the (best-first) search for a preferred (plausible) minimal diagnosis (including repairs).

Both simple breadth-first search and best-first search diagnosis approaches are predominantly relying on the calculation of conflict sets (Junker, Reference Junker2004). In this context, the determination of a minimal diagnosis of cardinality n requires the identification of at least n minimal conflict sets. In this paper we introduce a diagnosis algorithm (FastDiag) that allows to determine one minimal diagnosis at a time with the same computational effort related to the calculation of one conflict set at a time. The algorithm supports the identification of preferred diagnoses given predefined preferences regarding a set of decision alternatives. FastDiag is boosting the applicability of diagnosis methods in scenarios such as online configuration and reconfiguration (Felfernig et al., Reference Felfernig, Friedrich, Jannach and Stumptner2004), recommendation of products and services (Felfernig et al., Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009), and (more generally) in scenarios where the efficient calculation of preferred (leading) diagnoses is crucial (DeKleer, Reference DeKleer1990). FastDiag is not restricted to constraint-based systems but it is also applicable, for example, in the context of SAT solving (Marques-Silva & Sakallah, Reference Marques-Silva and Sakallah1996) and description logics reasoning (Friedrich & Shchekotykhin, Reference Friedrich and Shchekotykhin2005).

The remainder of this paper is organized as follows. In Section 2 we introduce a simple example configuration task from the automotive domain. In Section 3 we discuss the basic hitting set-based approach to the calculation of diagnoses. In Section 4 we introduce an algorithm (FastDiag) for calculating preferred diagnoses for a given overconstrained problem. In Section 5 we present a detailed evaluation of FastDiag, which clearly outperforms standard hitting set-based algorithms in the calculation of the topmost-n preferred diagnoses. With Section 6 we provide an overview of related work in the field. The paper is concluded with Section 7.

2. EXAMPLE DOMAIN: CAR CONFIGURATION

Car configuration will serve as a working example throughout this paper. Because we exploit configuration problems for the discussion of our diagnosis algorithm, we first introduce a formal definition of a configuration task. This definition is based on Felfernig et al. (Reference Felfernig, Friedrich, Jannach and Stumptner2004) but is given in the context of a constraint satisfaction problem (CSP; Tsang, Reference Tsang1993).

Definition 1 (configuration task)

A configuration task can be defined as a CSP (V, D, C). Here, V = {v 1, v 2, … , v n} represents a set of finite domain variables; D = {dom(v 1), dom(v 2), … , dom(vn)} represents a set of variable domains dom(vk), where dom(vk) represents the domain of variable vk; C = C KBC R, where C KB = {c 1, c 2, … , cq} is a set of domain specific constraints (the configuration knowledge base) that restrict the possible combinations of values assigned to the variables in V; and C R = {cq +1, cq +2, … , ct} is a set of customer requirements that is also represented as constraints.■

A simplified example of a configuration task in the automotive domain is the following. In this example, type represents the car type, pdc is the park distance control functionality, fuel represents the fuel consumption per 100 km, a skibag allows ski stowage inside the car, and 4-wheel represents the corresponding actuation type. These variables describe the potential set of requirements that can be specified by the user (customer). The possible combinations of these requirements are defined by a set of constraints that are denoted as C KB, which is defined as C KB = {c 1, c 2, c 3, c 4} in our example. Furthermore, we assume the set of customer requirements C R = {c 5, c 6, c 7}.Footnote 1

  • V = {type, pdc, fuel, skibag, 4-wheel}

  • D = {dom(type) = {city, limo, combi, xdrive}, dom(pdc) = {yes, no}, dom(fuel) = {4l, 6l, 10l}, dom(skibag) = {yes, no}, dom(4-wheel) = {yes, no}

  • C KB = {c 1: 4-wheel = yestype = xdrive, c 2: skibag = yestypecity, c 3: fuel = 4ltype = city, c 4: fuel = 6ltypexdrive}

  • C R = {c 5: type = combi, c 6: fuel = 4l, c 7: 4-wheel = yes}

On the basis of this configuration task definition, we can now introduce the definition of a concrete configuration (solution for a configuration task).

Definition 2 (configuration)

A configuration for a given configuration task (V, D, C) is an instantiation I = {v 1= ins1, v 2 = ins2, … , vn = insn}, where insk ∈ dom(vk).■

A configuration is consistent if the assignments in I are consistent with the ciC. Furthermore, a configuration is complete if all variables in V are instantiated. Finally, a configuration is valid if it is consistent and complete.

3. DIAGNOSING OVERCONSTRAINED PROBLEMS

For the configuration task introduced in Section 2 we are not able to find a solution, for example, a combi-type car does not support a fuel consumption of 4l per 100 km. Consequently, we want to identify minimal sets of constraints (ciC R) which have to be deleted in order to be able to identify a solution (restore the consistency). In the example of Section 2 the set of constraints C R = {c 5, c 6, c 7} is inconsistent with the constraints C KB = {c 1, c 2, c 3, c 4}, that is, no solution can be found for the underlying configuration task. A standard approach to determine a minimal set of constraints that have to be deleted from an overconstrained problem is to resolve all minimal conflicts contained in the constraint set. The determination of such constraints is based on a conflict detection algorithm (e.g., see Junker, Reference Junker2004), the derivation of the corresponding diagnoses is based on the calculation of hitting sets (Reiter, Reference Reiter1987). Because both the notion of a (minimal) conflict and the notion of a (minimal) diagnosis will be used in the following sections, we provide the corresponding definitions here.

Definition 3 (conflict set)

A conflict set is a set CS ⊆ C R such that C KB ∪ CS is inconsistent. CS is a minimal if there does not exist a conflict set CS′ with CS′ ⊂ CS.■

In our working example we can identify three minimal conflict sets, which are CS1 = {c 5, c 6}, CS2 = {c 5, c 7}, and CS3 = {c 6, c 7}.

CS1, CS2, and CS3 are conflict sets because CS1C KB ∨ CS2C KB ∨ CS3CKB is inconsistent. The minimality property is fulfilled because a conflict set CS4 does not exist with CS4 ⊂ CS1 or CS4 ⊂ CS2 or CS4 ⊂ CS3. The standard approach to resolve the given conflicts is the construction of a corresponding hitting set directed acyclic graph (HSDAG; Reiter, Reference Reiter1987) where the resolution of all minimal conflict sets automatically corresponds to the identification of a minimal diagnosis. A minimal diagnosis in our application context is a minimal set of customer requirements contained in the set of car features (C R) that has to be deleted from C R in order to make the remaining constraints consistent with C KB. Because we are dealing with the diagnosis of customer requirements, we introduce the definition of a customer requirements diagnosis problem (Definition 4). This definition is based on the definition given in Felfernig et al. (Reference Felfernig, Friedrich, Jannach and Stumptner2004).

Definition 4 (CR diagnosis problem)

A customer requirements diagnosis (CR diagnosis) problem is defined as a tuple (C KB, C R), where C R is the set of given customer requirements and C KB represents the constraints part of the configuration knowledge base.■

The definition of a CR diagnosis that corresponds to a given CR diagnosis problem is the following (see Definition 5).

Definition 5 (CR diagnosis)

A CR diagnosis for a CR diagnosis problem (C KB, C R) is a set Δ ⊆ C R, such that C KB ∪ (C R − Δ) is consistent. Δ is minimal if there does not exist a diagnosis Δ′ ⊂ Δ, such that C KB ∪ (C R − Δ′) is consistent.■

The HSDAG algorithm for determining minimal diagnoses is discussed in detail in Reiter (Reference Reiter1987). The concept of this algorithm will be explained on the basis of our working example. It relies on a conflict detection algorithm that is responsible for detecting minimal conflicts in a given set of constraints (in our case, in the given customer requirements). One conflict detection algorithm is QuickXplain (Junker, Reference Junker2004), which is based on an efficient divide and conquer search strategy. For the purposes of our working example let us assume that the first minimal conflict set determined by QuickXplain is the set CS1 = {c 5, c 6}. Because of the minimality property, we are able to resolve each conflict by simply deleting one element from the set, for example, in the case of CS1 we have to either delete c 5 or c 6. Each variant to resolve a conflict set is represented by a specific path in the corresponding HSDAG. The HSDAG for our working example is depicted in Figure 1. The deletion of c 5 from CS1 triggers the calculation of another conflict set CS3 = {c 6, c 7} because C R − {c 5} ∪ C KB is inconsistent. If we decide to delete c 6 from CS1, C R − {c 6} ∪ C KB remains inconsistent, which means that QuickXplain returns another minimal conflict set, which is CS2 = {c 5, c 7}.

Fig. 1. Hitting set directed acyclic graph (HSDAG; Reiter, Reference Reiter1987) for the CR diagnosis problem (C R = {c 5, c 6, c 7}, C KB = {c 1, c 2, c 3, c 4}). The sets {c 5, c 6}, {c 6, c 7}, and {c 5, c 7} are the minimal diagnoses; the conflict sets CS1, CS2, and CS3 are determined on the basis of QuickXplain (Junker, Reference Junker2004).

The original HSDAG algorithm (Reiter, Reference Reiter1987) follows a strict breadth-first search regime. Following this strategy, the next node to be expanded in our working example is the minimal conflict set CS3, which has been returned by QuickXplain for C R − {c 5} ∪ C KB. In this context, the first option to resolve CS3 is to delete c 6. This option is a valid one and Δ1= {c 5, c 6} is the resulting minimal diagnosis. The second option for resolving CS3 is to delete the constraint c 7. In this case, we have identified the next minimal diagnosis Δ2 = {c 5, c 7} because C R − {c 5, c 7} ∪ C KB is consistent. This way we are able to identify all minimal sets of constraints Δi that, if deleted from C R, help to restore the consistency with C KB. If we want to calculate the complete set of diagnoses for our working example, we still have to resolve the conflict set CS2. The first option to resolve CS2 is to delete c 5, because {c 5, c 6} has already been identified as a minimal diagnosis, we can close this node in the HSDAG. The second option to resolve CS2 is to delete c 7. In this case we have determined the third minimal diagnosis, which is Δ3 = {c 6, c 7}.

In our working example we are able to enumerate all possible diagnoses that help to restore consistency. However, the calculation of all minimal diagnoses is expensive and thus in many cases not practicable for interactive settings. Because users are often interested in a reduced subset of all the potential diagnoses, alternative algorithms are needed that are capable of identifying preferred diagnoses (Reiter, Reference Reiter1987; DeKleer, Reference DeKleer1990; Felfernig et al., Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009). Such approaches have already been developed (DeKleer, Reference DeKleer1990; Felfernig et al., Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009); however, they are still based on the resolution of conflict sets, which is computationally expensive (see Section 5). Our idea presented in this paper is a diagnosis algorithm that helps to determine preferred diagnoses without the need of calculating the corresponding conflict sets. The basic properties of FastDiag will be discussed in Section 4.

4. CALCULATING PREFERRED DIAGNOSES WITH FastDiag

4.1. Preferred diagnoses

Users typically prefer to keep the important requirements and to change or delete (if needed) the less important ones (Junker, Reference Junker2004). The major goal of (model-based) diagnosis tasks is to identify the preferred (leading) diagnoses, which are not necessarily minimal cardinality ones (DeKleer, Reference DeKleer1990). For the characterization of a preferred diagnosis we will rely on the definition of a total ordering of the given set of constraints in C (respectively C R). Such a total ordering can be achieved, for example, by directly asking the customer regarding the preferences, by applying multiattribute utility theory (Winterfeldt & Edwards, Reference Winterfeldt and Edwards1986; Ardissono et al., Reference Ardissono, Felfernig, Friedrich, Jannach, Petrone, Schaefer and Zanker2003) where the determined interest dimensions correspond with the attributes of C R or by applying the rankings determined by conjoint analysis (Belanger, Reference Belanger2005). The following definition of a lexicographical ordering (Definition 6) is based on total orderings for constraints that has been applied in (Junker, Reference Junker2004) for the determination of preferred conflict sets.

Definition 6 (total lexicographical ordering)

Given a total order < on C, we enumerate the constraints in C in increasing < order c 1..cn, starting with the least important constraints (i.e., ci < cji < j). We compare two subsets X and Y of C lexicographically:

\matrix{X \gt_{\rm lex}Y \quad {\rm iff} \cr & \cr \exists k\colon \ c_k \in Y - X \quad {\rm and} \cr & \cr X \bigcap \lcub c_{k+1}\comma \ldots \comma \; c_t \rcub = Y \bigcap \lcub c_{k+1}\comma \ldots \comma \; c_t \rcub .}

Based on this definition of a lexicographical ordering, we can now introduce the definition of a preferred diagnosis.■

Definition 7 (preferred diagnosis)

A minimal diagnosis Δ for a given CR diagnosis problem (C R, C KB) is a preferred diagnosis for (C R, C KB) if and only if another minimal diagnosis Δ′ with Δ′ >lex Δ.■

In our working example we assumed the lexicographical ordering (c 5 < c 6 < c 7), that is, the most important customer requirement is c 7 (the 4-wheel functionality). If we assume that X = {c 5, c 7} and Y = {c 6, c 7} then YX = {c 6} and X ∩ {c 7} = Y ∩ {c 7}. Intuitively, {c 5, c 7} is a preferred diagnosis compared to {c 6, c 7} because both diagnoses include c 7 but c 5 is less important than c 6. If we change the ordering to (c 7 < c 6 < c 5), FastDiag would then determine {c 6, c 7} as the preferred minimal diagnosis.

4.2. FastDiag approach

For the following discussions we introduce the set AC, which is initially set to C KBC R [the union of customer requirements (C R) and the configuration knowledge base (C KB)] and subsequently changed when the algorithm runs. The basic idea of the FastDiag algorithm (Algorithm 1) is the following.Footnote 2 In our example, the set of customer requirements C R = {c 5, c 6, c 7} includes at least one minimal diagnosis because C KB is consistent and C KBC R is inconsistent. In the extreme case C R itself represents the minimal diagnosis, which then means that all constraints in C R are part of the diagnosis, that is, each ciC R represents a singleton conflict. In our case C R obviously does not represent a minimal diagnosis—the set of diagnoses in our working example is {Δ1 = {c 5, c 6}, Δ2 = {c 5, c 7}, Δ3 = {c 6, c 7}} (see Section 3). The next step in Algorithm 1 is to divide the set of customer requirements C R = {c 5, c 6, c 7} into the two sets C1 = {c 5} and C2 = {c 6, c 7} and to check whether AC − C1 is already consistent. If this is the case, we can omit the set C2 because at least one minimal diagnosis can already be identified in C1. In our case, AC − {c 5} is inconsistent, which means that we have to consider further elements from C2. Therefore, C2 = {c 6, c 7} is divided into the sets {c 6} and {c 7}. In the next step we can check whether AC – (C1 ∪ {c 6}) is consistent—this is the case that means that we do not have to further take into account {c 7} for determining the diagnosis. Because {c 5} does not include a diagnosis but {c 5} ∪ {c6} includes a diagnosis, we can deduce that {c 6} must be part of the diagnosis. The final step is to check whether AC – {c 6} leads to a diagnosis without including {c 5}. We see that AC – {c 6} is inconsistent, that is, Δ = {c 5, c 6} is a minimal diagnosis for the CR diagnosis problem (C R = {c 5, c 6, c 7}, C KB = {c 1, … , c 4}). An execution trace of the FastDiag algorithm in the context of our working example is shown in Figure 2.

Algorithm 1: FastDiag

  1. 1 func FastDiag(C ⊆ AC, AC = {c 1..ct}) : diagnosis Δ

  2. 2 ifisEmpty(C) or inconsistent(AC – C) return

  3. 3 elsereturn FD(∅, C, AC);

  4. 4 func FD(D, C = {c 1..cq}, AC) : diagnosis Δ

  5. 5 if D ≠ ∅ and consistent(AC) return ∅;

  6. 6 ifsingleton(C) return C;

  7. 7 k=q/2;

  8. 8 C1 = {c 1..ck}; C2 = {c k+1..cq};

  9. 9 D1 = FD(C1, C2, AC − C1);

  10. 10 D2 = FD(D1, C1, AC − D1);

  11. 11 return(D1 ∪ D2);

Fig. 2. FastDiag execution trace for the C R diagnosis problem (C R = {c 5, c 6, c 7}, C KB = {c 1, c 2, c 3, c 4}). Enumerations 1–6 show the order in which the different incarnations of the FD function are activated.

4.3. Calculating n > 1 diagnoses

In order to be able to calculate n > 1 diagnosesFootnote 3 with FastDiag we have to adopt the HSDAG construction introduced in Reiter (Reference Reiter1987) by substituting the resolution of conflicts (see Fig. 1) with the deletion of elements ci from C R (C) (see Fig. 3). In this case, a path in the HSDAG is closed if no further diagnoses can be identified for this path or the elements of the current path are a superset of an already closed path (containment check). Conform to the HSDAG approach presented in Reiter (Reference Reiter1987), we expand the search tree in a breadth-first manner. In our working example, we can delete {c 5} (one element of the first diagnosis Δ1 = {c 5, c 6}) from the set C R of diagnosable elements and restart the algorithm for finding another minimal diagnosis for the CR diagnosis problem ({c 6, c 7}, C KB). Because AC − {c 5} is inconsistent, we can conclude that C R = {c 6, c 7} includes another minimal diagnosis (Δ2 = {c 6, c 7}), which is determined by FastDiag for the CR diagnosis problem (C R − {c 5}, C KB). Finally, we have to check whether the CR diagnosis problem ({c 5, c 7}, C KB) leads to another minimal diagnosis. This is the case, that is, we have identified the last minimal diagnosis that is Δ3 = {c 5, c 7}. The calculation of all diagnoses in our working example on the basis of FastDiag is depicted in Figure 3.

Fig. 3. FastDiag: calculating the complete set of minimal diagnoses. Enumerations 1–6 show the order in which the different incarnations of the FastDiag algorithm are activated.

Note that for a given set of constraints (C) FastDiag always calculates the preferred diagnosis in terms of Definition 7. If Δ1 is the diagnosis returned by FastDiag and we delete one element from Δ1 (e.g., c 5), then FastDiag returns the preferred diagnosis for the CR diagnosis problem ({c 5, c 6, c 7} − {c 5}, {c 1, … , c 7}), which is Δ2 in our example case, that is, Δ1 >lex Δ2. Consequently, diagnoses part of one path in the search tree (such as Δ1 and Δ2 in Fig. 3) are in a strict preference ordering. However, there is only a partial order between individual diagnoses in the search tree in the sense that a diagnosis at level k is not necessarily preferable to a diagnosis at level k + 1.

4.4. FastDiag properties

A detailed listing of the basic operations of FastDiag is shown in Algorithm 1. First, the algorithm checks whether the constraints in C contain a diagnosis, that is, whether AC − C is consistent—the function assumes that it is activated in the case that AC is inconsistent. If AC − C is inconsistent or C = ∅, FastDiag returns the empty set as result (no solution can be found, line 2 of the algorithm). If at least one diagnosis is contained in the set of constraints C, FastDiag activates the FastDiag function that is in charge of retrieving a preferred diagnosis (line 3 of the algorithm). FastDiag follows a divide and conquer strategy where the recursive function FastDiag divides the set of constraints (in our case the elements of C R) into two different subsets (C1 and C2; line 8 of the algorithm) and tries to figure out whether C1 already contains a diagnosis (line 5 of the algorithm). If this is the case, FastDiag does not further take into account the constraints in C2. If only one element is remaining in the current set of constraints C and the current set of constraints in AC is still inconsistent, then the element in C is part of a minimal diagnosis (line 6 of the algorithm). FastDiag is complete in the sense that if C contains exactly one minimal diagnosis then FD will find it. If there are multiple minimal diagnoses then one of them (the preferred one, see Definition 7) is returned. The recursive function FD is triggered if AC − C is consistent and C consists of at least one constraint. In such a situation a corresponding minimal diagnosis can be identified. If we assume the existence of a minimal diagnosis Δ that cannot be identified by FastDiag, this would mean that there exists at least one constraint ca in C that is part of the diagnosis but not returned by FD. The only way in which elements can be deleted from C (i.e., not included in a diagnosis) is by the return ∅ statement in FD and ∅ is only returned in the case that AC is consistent, which means that the elements of C2 (C1) from the previous FD incarnation are not part of the preferred diagnosis. Consequently, it is not possible to delete elements from C, which are part of the diagnosis. FastDiag computes only minimal diagnoses in the sense of Definition 5. If we assume the existence of a nonminimal diagnosis Δ calculated by FastDiag, this would mean that there exists at least one constraint ca with Δ − {ca} is still a diagnosis. The only situation in which elements of C are added to a diagnosis Δ is if C itself contains exactly one element. If C contains only one element (let us assume ca) and AC is inconsistent (in the function FastDiag) then ca is the only element that can be deleted from AC, that is, ca must be part of the diagnosis.

5. EVALUATION

5.1. Performance of FastDiag

In this section we will compare the performance of FastDiag with the performance of the hitting set algorithm (Reiter, Reference Reiter1987) in combination with the QuickXplain conflict detection algorithm introduced in (Junker, Reference Junker2004).

The worst case complexity of FastDiag in terms of the number of consistency checks needed for calculating one minimal diagnosis is 2d × log2(n/d) + 2d, where d is the minimal diagnoses set size and n is the number of constraints (in C). The best case complexity is log2(n/d) + 2d. In the worst case each element of the diagnosis is contained in a different path of the search tree: log2(n/d) is the depth of the path, 2d represents the branching factor and the number of leaf-node consistency checks. In the best case all elements of the diagnosis are contained in one path of the search tree.

The worst case complexity of QuickXplain in terms of consistency checks needed for calculating one minimal conflict set is 2k ⋅ log2(n/k ) + 2k where k is the minimal conflicts set size and n is again the number of constraints (in C; Junker, Reference Junker2004). The best case complexity of QuickXplain in terms of the number of consistency checks needed is log2(n/k ) + 2k (Junker, Reference Junker2004). Consequently, the number of consistency checks per conflict set (QuickXplain) and the number of consistency checks per diagnosis (FastDiag) fall into a logarithmic complexity class.

Let n cs be the number of minimal conflict sets in a constraint set and n diag be the number of minimal diagnoses, then we need n diag FastDiag calls (see Algorithm 1) plus n cs additional consistency checks and n cs activations of QuickXplain with n diag additional consistency checks for determining all diagnoses. The results of a performance evaluation of FastDiag are depicted in the Figure 4, Figure 5, Figure 6, and Figure 7. The basis for these evaluations was the bicycle configuration knowledge base taken from the CLib4 (www.itu.dk/research/cla/externals/clib/) configuration benchmarks library (34 variables and about 65 constraints). For this example knowledge base we randomly generated different sets of requirements (of cardinality 5, 7, 10, and 15 requirements) and measured the performance of calculating corresponding diagnosis sets (the first diagnosis, first 5 diagnoses, first 10 diagnoses, and all diagnoses). The run time performance of the different diagnosis algorithms and the needed amount of TP calls is shown in the Figures 4–7. As solver we used the CLib-based decision diagram representation that allows for backtracking-free solution search. The tests have been executed on a standard desktop computer (Intel Core 2 Quad CPU QD9400 2.66-GHz CPU with 2 GB of RAM). Note that we have evaluated the performance of FastDiag with different other benchmark configuration knowledge bases on the CLib Web page with basically the same result. FastDiag shows to be a valuable alternative for determining diagnoses in interactive settings especially for calculating the preferred first-n solutions.

Fig. 4. Calculating the first minimal diagnosis with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Fig. 5. Calculating the topmost-5 minimal diagnoses with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Fig. 6. Calculating the topmost-10 minimal diagnoses with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Fig. 7. Calculating all minimal diagnoses with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Figure 4 shows a comparison between the hitting set based diagnosis approach (denoted as HSDAG) and the FastDiag algorithm (denoted as FastDiag) in the case that only one diagnosis is calculated. FastDiag clearly outperforms the HSDAG approach independent of the way in which diagnoses are calculated (breadth-first or best-first). Figure 5 shows the performance evaluation for calculating the topmost-5 minimal diagnoses. The result is similar to the one for calculating the first diagnosis, that is, FastDiag outperforms the two HSDAG versions. Our evaluations show that FastDiag is very efficient in calculating preferred minimal diagnoses.

5.2. Empirical evaluation

Based on a computer configuration dataset of the Graz University of Technology (N = 415 configurations) we evaluated the three presented approaches with respect to their capability of predicting diagnoses that are acceptable for the user (diagnoses leading to selected configurations). Each entry of the dataset consists of a set of initial user requirements C R inconsistent with the configuration knowledge base C KB and the configuration that had been finally selected by the user. Because the original requirements stored in the dataset are inconsistent with the configuration knowledge base, we could determine those diagnoses that indicated which minimal sets of requirements have to be deleted in order to be able to find a solution.

We evaluated the prediction accuracy of the three diagnosis approaches (breadth-first HSDAG, FastDiag, and best-first HSDAG). We first measured the distance between the predicted position of a diagnosis leading to a selected configuration and the expected position of the diagnosis (which is 1). This distance was measured in terms of the root mean square deviation (RMSD; see Formula 1). The results of this first analysis are the following: an important result is that FastDiag has the lowest RMSD value (0.95), best-first HSDAG has a similar prediction quality (RMSD = 0.97), and breadth-first HSDAG has the worst prediction quality (RMSD = 1.64).

(1)
{\rm RMSD} = \sqrt {{1 \over n}\sum\limits_1^n \lpar {\rm predicted \;position} - 1\rpar ^2}.

RMSD is an often used quality estimate but it provides only a limited view on the precision of a (diagnosis) prediction. Therefore, we wanted to analyze the precision of the diagnosis selection strategies discussed in this paper—a measure for the precision of a diagnosis algorithm is depicted in Formula 2. The idea behind this measure is to describe how often a diagnosis that leads to a selected configuration (selected by the user) is among the topmost-n ranked diagnoses. As shown in Table 1, FastDiag and best-first HSDAG have highest prediction accuracy in terms of precision, whereas the breadth-first HSDAG approach shows the worst precision.

(2)
precision = {\vert correctly \, predicted \, diagnoses \vert \over \vert predicted \, diagnoses \vert }.

Table 1. Precision of FastDiag versus HSDAG based approaches

Note: HSDAG, hitting set directed acyclic graph.

We applied a Mann–Whitney U test in order to statistically analyze differences between the three diagnosis approaches in terms of ranking behavior. We conducted a pairwise comparison between the diagnosis approaches on the basis of the mentioned Mann–Whitney U test. We could identify a significant difference between the rankings of best-first HSDAG and breadth-first HSDAG based diagnosis (p = 6.625e −5) and also between FastDiag and breadth-first HSDAG based diagnosis (p < 2.441e −7). There was no significant difference between best-first HSDAG and FastDiag in terms of ranking behavior (p = 0.12).

6. RELATED WORK

6.1. Knowledge base analysis

The authors of Felfernig et al. (Reference Felfernig, Friedrich, Jannach and Stumptner2004) introduce an algorithm for the automated debugging of configuration knowledge bases. The idea is to combine a conflict detection algorithm such as QuickXplain (Junker, Reference Junker2004) with the hitting set algorithm used in model-based diagnosis (MBD; Reiter, Reference Reiter1987) for the calculation of minimal diagnoses. In this context, conflicts are induced by test cases (examples) that, for example, stem from previous configuration sessions, have been automatically generated, or have been explicitly defined by domain experts. Further applications of MBD in constraint set debugging are introduced in Felfernig et al. (Reference Felfernig, Friedrich, Isak, Shchekotykhin, Teppan and Jannach2007) where diagnosis concepts are used to identify minimal sets of faulty transition conditions in state charts and in Felfernig et al. (Reference Felfernig, Friedrich, Teppan and Isak2008) where MBD is applied for the identification of faulty utility constraint sets in the context of knowledge-based recommendation. In contrast to Felfernig et al. (Reference Felfernig, Friedrich, Jannach and Stumptner2004, Reference Felfernig, Friedrich, Isak, Shchekotykhin, Teppan and Jannach2007, Reference Felfernig, Friedrich, Teppan and Isak2008), our work provides an algorithm that allows to directly determine diagnoses without the need to determine corresponding conflict sets. FastDiag can be applied in knowledge engineering scenarios for calculating preferred diagnoses for faulty knowledge bases given that we are able to determine reasonable ordering for the given set of constraints; this could be achieved, for example, by the application of corresponding complexity metrics (Chen & Suen, Reference Chen and Suen2003).

6.2. Conflict detection

In contrast to the algorithm presented in this paper, calculating diagnoses for inconsistent requirements typically relies on the existence of (minimal) conflict sets. A well-known algorithm with a logarithmic number of consistency checks, depending on the number of constraints in the knowledge base and the cardinality of the minimal conflicts, is QuickXplain (Junker, Reference Junker2004). It has made a major contribution to more efficient interactive constraint-based applications. QuickXplain is based on a divide and conquer strategy. FastDiag relies on the same principle of divide and conquer but with a different focus, namely, the determination of minimal diagnoses. QuickXplain calculates minimal conflict sets based on the assumption of a linear preference ordering among the constraints. Similarly, if we assume a linear preference ordering of the constraints in C, FastDiag calculates preferred diagnoses.

6.3. Interactive settings

Note that in the interactive configuration scenario discussed in this paper our goal was to support open configuration that lets the user explore the configuration space where the system proactively points out inconsistent requirements, such a functionality is often provided by commercial configuration environments. O'Sullivan et al. (Reference O'Sullivan, Papdopoulos, Faltings and Pu2007) focus on interactive settings where users of constraint-based applications are confronted with situations where no solution can be found. In this context, O'Sullivan et al. (Reference O'Sullivan, Papdopoulos, Faltings and Pu2007) introduce the concept of minimal exclusion sets that correspond to the concept of minimal diagnoses as defined in Reiter (Reference Reiter1987). As mentioned, the major focus of O'Sullivan et al. (Reference O'Sullivan, Papdopoulos, Faltings and Pu2007) are settings where the proposed algorithm supports users in the identification of acceptable exclusion sets. The authors propose an algorithm (representative explanations) that helps to improve the quality of the presented exclusion set (in terms of diversity) and thus increases the probability of finding an acceptable exclusion set for the user. Our diagnosis approach calculates preferred diagnoses in terms of a predefined ordering of the constraint set. Thus, compared to the work of O'Sullivan et al. (Reference O'Sullivan, Papdopoulos, Faltings and Pu2007), we follow a different approach in terms of focusing more on preferences than on the degree of representativeness.

6.4. Diagnosis algorithms

There are a couple of algorithms that help to improve the efficiency of diagnosis determination; they are further developments of the original algorithm introduced by Reiter (Reference Reiter1987). These approaches focus on making the construction of hitting sets more efficient. Wotawa (Reference Wotawa2001) introduces an algorithm that reduces the number of subset checks compared to the original HSDAG approach (Reiter, Reference Reiter1987). Fijany and Vatan (Reference Fijany and Vatan2004) introduce an approach to represent the problem of determining minimal hitting sets as a corresponding integer programming problem. Further approaches to optimize the determination of hitting sets are discussed in Lin and Jiang (Reference Lin and Jiang2003). All the mentioned approaches rely on (minimal) conflict sets that are the basis for calculating a set of minimal diagnoses, whereas FastDiag is a complete and minimal diagnosis algorithm without the need of conflict sets. It is important to mention that especially when calculating the first n-diagnoses (for n > 1, i.e., not a single diagnosis), FastDiag can also exploit the mentioned algorithms of Lin and Jiang (Reference Lin and Jiang2003) and Wotawa (Reference Wotawa2001) for the calculation of more than one diagnosis, that is, it is not bound to the usage of the original HSDAG algorithm. Lin and Jiang (Reference Lin and Jiang2002) introduce an approach to determine hitting sets on the basis of genetic algorithms; a similar approach to the determination of diagnoses is presented in Feldman et al. (Reference Feldman, Provan and Gemund2008) who introduce a stochastic fault diagnosis algorithm, which is based on greedy stochastic search. Such approaches show to significantly improve search performance; however, there is no general guarantee of completeness and diagnosis minimality. Finally, there exist a couple of algorithms that are improving the algorithmic performance of diagnosis calculation due to additional knowledge about the structural properties of the diagnosis problem. For example, Jannach and Liegl (Reference Jannach and Liegl2006) show the determination of (minimal) diagnoses for the case of conjunctive queries on database tables (the set of diagnoses can be precompiled by executing the individual parts of the query on the given data set). Siddiqi and Huang (Reference Siddiqi and Huang2007) show one approach to exploit structural properties of system descriptions to improve the overall performance of diagnosis determination; in this case cones are areas in a gate with a certain structure and a certain probability of including a diagnosis, and the search process focuses on exactly those areas. FastDiag does not exploit specific properties of the underlying constraint set; however, taking into account such properties can further improve the performance of the algorithm. Corresponding evaluations are within the scope of future work.

6.5. Personalized diagnosis

Many of the existing diagnosis approaches do not take into account the need for personalizing the set of diagnoses to be presented to a user. Identifying diagnoses of interest in an efficient manner is a clear surplus regarding the acceptance of the underlying application, for example, users of a configurator application are not necessarily interested in minimal cardinality diagnoses (Reiter, Reference Reiter1987) but rather in those that correspond to their current preferences. A first step toward the application of personalization concepts in the context of knowledge-based recommendation is presented in Felfernig et al. (Reference Felfernig, Friedrich, Schubert, Mandl, Mairitsch and Teppan2009). The authors introduce an approach that calculates leading diagnoses on the basis of similarity measures used for determining n-nearest neighbors. A general approach to the identification of preferred diagnoses is introduced in DeKleer (Reference DeKleer1990), where probability estimates are used to determine the leading diagnoses with the overall goal to minimize the number of measurements needed for identifying a malfunctioning device. Basic principles of determining diagnoses in knowledge-based recommendation scenarios are discussed in Jannach and Liegl (Reference Jannach and Liegl2006). Furthermore, Froehlich et al. (Reference Fröhlich, Nejdl and Schroeder1994) introduce a logical characterization of preferences that are expressed as preference relations on single diagnoses and modal logical formulas on groups of diagnoses. In contrast to our work, Froehlich et al. (Reference Fröhlich, Nejdl and Schroeder1994) do not provide an algorithm to efficiently calculate preferred diagnoses. We see our work as a major contribution in this context because FastDiag helps to identify leading diagnoses more efficiently. Further empirical studies in different application contexts are within the major focus of our future work.

7. CONCLUSION

In this paper we have introduced a new diagnosis algorithm (FastDiag), which allows the efficient calculation of one diagnosis at a time with logarithmic complexity in terms of the number of consistency checks. Thus, the computational complexity for the calculation of one minimal diagnosis is equal to the calculation of one minimal conflict set in hitting set-based diagnosis approaches. The algorithm is especially applicable in settings where the number of conflict sets is equal to or larger than the number of diagnoses, or in settings where preferred (leading) diagnoses are needed. Issues for future work are the determination of repair actions for diagnoses, the further development of FastDiag for supporting anytime diagnosis tasks, and the conduction of further empirical studies in different configurator application domains.

Alexander Felfernig is a Professor of applied software engineering at Graz University of Technology. Alexander is also Cofounder and Director of ConfigWorks, a company focused on the development of knowledge-based recommendation technologies. Prof. Felfernig's research focuses on intelligent methods and algorithms supporting the development and maintenance of complex knowledge bases. Furthermore, he is interested in the application of AI techniques in the software engineering context, for example, the application of decision and recommendation technologies to make software requirements engineering processes more effective. In 2009, Dr. Felfernig received the Heinz–Zemanek Award from the Austrian Computer Society for his research.

Monika Schubert is a PhD student in the group of Applied Software Engineering at Graz University of Technology. She received her MS in software engineering and economy from Graz University of Technology. Her research focuses on knowledge-based systems, intelligent product configuration, model-based diagnosis, and product recommendation. Monika is also interested in user interaction with complex knowledge bases.

Christoph Zehentner is a PhD student in the Applied Software Engineering Group at Graz University of Technology. He received his MS in software engineering and economy from Graz University of Technology. His research focuses on testing and debugging of knowledge-based systems as well as recommendations in group decision environments. Christoph is also interested in multiple-agent system coordination and decision making.

Footnotes

1 Note that constraints are not necessarily unary or binary (we tried to keep the example simple). They can also be n-ary.

2 In Algorithm 1 we use the set C instead of C R because the application of the algorithm is not restricted to inconsistent sets of customer requirements.

3 Typically a CR diagnosis problem has more than one related diagnosis.

References

REFERENCES

Ardissono, L., Felfernig, A., Friedrich, G., Jannach, D., Petrone, G., Schaefer, R., & Zanker, M. (2003). A framework for the development of personalized, distributed web-based configuration systems. AI Magazine 24(3), 93108.Google Scholar
Belanger, F. (2005). A conjoint analysis of online consumer satisfaction. Journal of Electronic Commerce Research 6, 95111.Google Scholar
Castillo, L., Borrajo, D., & Salido, M. (2005). Planning, Scheduling and Constraint Satisfaction: From Theory to Practice. Amsterdam: IOS Press.Google Scholar
Chen, Z., & Suen, C. (2003). Measuring the complexity of rule-based expert systems. Expert Systems with Applications 7(4), 467481.CrossRefGoogle Scholar
DeKleer, J. (1990). Using crude probability estimates to guide diagnosis. AI Journal 45(3), 381391.Google Scholar
DeKleer, J., Mackworth, A., & Reiter, R. (1992). Characterizing diagnoses and systems. AI Journal 56(2–3), 197222.Google Scholar
DeKleer, J., & Williams, B. (1987). Diagnosing multiple faults. AI Journal 32(1), 97130.Google Scholar
Feldman, A., Provan, G., & Gemund, A. (2008). Computing minimal diagnoses by greedy stochastic search. Proc. 23rd AAAI Conf. Artificial Intelligence (AAAI’08), pp. 911–918, Chicago.Google Scholar
Felfernig, A., Friedrich, G., Isak, K., Shchekotykhin, K., Teppan, E., & Jannach, D. (2007). Automated debugging of recommender user interface descriptions. Journal of Applied Intelligence 31(1), 114.CrossRefGoogle Scholar
Felfernig, A., Friedrich, G., Jannach, D., & Stumptner, M. (2004). Consistency-based diagnosis of configuration knowledge bases. AI Journal 152(2), 213234.Google Scholar
Felfernig, A., Friedrich, G., Schubert, M., Mandl, M., Mairitsch, M., & Teppan, E. (2009). Plausible repairs for inconsistent requirements. Proc. 21st Int. Joint Conf. Artificial Intelligence (IJCAI’09), pp. 791–796, Pasadena, CA.Google Scholar
Felfernig, A., Friedrich, G., Teppan, E., & Isak, K. (2008). Intelligent debugging and repair of utility constraint sets in knowledge-based recommender applications. Proc. 13th ACM Int. Conf. Intelligent User Interfaces (IUI’08), pp. 218–226, Canary Islands, Spain.CrossRefGoogle Scholar
Fijany, A., & Vatan, F. (2004). New approaches for efficient solutions of hitting set problems. Proc. Int. Symp. Information and Communication Technologies, pp. 1–10, Cancun, Mexico.Google Scholar
Fleischanderl, G., Friedrich, G., Haselboeck, A., Schreiner, H., & Stumptner, M. (1998). Configuring large systems using generative constraint satisfaction. IEEE Intelligent Systems 13(4), 5968.CrossRefGoogle Scholar
Friedrich, G., & Shchekotykhin, K. (2005). A general diagnosis method for ontologies. Proc. 4th Int. Semantic Web Conference (ISWC’05), LNCS, Vol. 3729, pp. 232246. New York: Springer.Google Scholar
Fröhlich, P., Nejdl, W., & Schroeder, M. (1994). A formal semantics for preferences and strategies in model-based diagnosis. Proc. 5th Int. Workshop on Principles of Diagnosis (DX-94), pp. 106–113.Google Scholar
Jannach, D., & Liegl, J. (2006). Conflict-directed relaxation of constraints in content-based recommender systems. Proc. IEA/AIE 2006, pp. 819829, Annency, France.CrossRefGoogle Scholar
Junker, U. (2004). QuickXplain: preferred explanations and relaxations for over-constrained problems. Proc. 19th National Conf. Artificial Intelligence (AAAI’04), pp. 167–172, San Jose, CA.Google Scholar
Lin, L., & Jiang, Y. (2002). Computing minimal hitting sets with genetic algorithms. Algorithmica 32(1), 95106.Google Scholar
Lin, L., & Jiang, Y. (2003). The computation of hitting sets: review and new algorithm. Information Processing Letters 86, 177–184.CrossRefGoogle Scholar
Marques-Silva, J., & Sakallah, K. (1996). Grasp: a new search algorithm for satisfiability. Proc. Int. Conf. Computer-Aided Design, pp. 220–227, Santa Clara, CA.CrossRefGoogle Scholar
Mittal, S., & Frayman, F. (1989). Towards a generic model of configuration tasks. Proc. 11th Int. Joint Conf. Artificial Intelligence (IJCAI’89), pp. 1395–1401, Detroit, MI.Google Scholar
O'Sullivan, B., Papdopoulos, A., Faltings, B., & Pu, P. (2007). Representative explanations for over-constrained problems. Proc. 22nd National Conf. Artificial Intelligence (AAAI’07), pp. 323–328, Vancouver, Canada.Google Scholar
Reiter, R. (1987). A theory of diagnosis from first principles. AI Journal 23(1), 5795.Google Scholar
Siddiqi, S., & Huang, J. (2007). Hierarchical diagnosis of multiple faults. Proc. 20th Int. Joint Conf. Artificial Intelligence (IJCAI’07), pp. 581–586, Hyderabad, India.Google Scholar
Sinz, C., & Haag, A. (2007). Configuration. IEEE Intelligent Systems 22(1), 7890.CrossRefGoogle Scholar
Tsang, E. (1993). Foundations of Constraint Satisfaction. New York: Academic.Google Scholar
Winterfeldt, D., & Edwards, W. (1986). Decision Analysis and Behavioral Research. New York: Cambridge University Press.Google Scholar
Wotawa, F. (2001). A variant of Reiter's hitting-set algorithm. Information Processing Letters 79, 4551.CrossRefGoogle Scholar
Figure 0

Fig. 1. Hitting set directed acyclic graph (HSDAG; Reiter, 1987) for the CR diagnosis problem (CR = {c5, c6, c7}, CKB = {c1, c2, c3, c4}). The sets {c5, c6}, {c6, c7}, and {c5, c7} are the minimal diagnoses; the conflict sets CS1, CS2, and CS3 are determined on the basis of QuickXplain (Junker, 2004).

Figure 1

Fig. 2. FastDiag execution trace for the CR diagnosis problem (CR = {c5, c6, c7}, CKB = {c1, c2, c3, c4}). Enumerations 1–6 show the order in which the different incarnations of the FD function are activated.

Figure 2

Fig. 3. FastDiag: calculating the complete set of minimal diagnoses. Enumerations 1–6 show the order in which the different incarnations of the FastDiag algorithm are activated.

Figure 3

Fig. 4. Calculating the first minimal diagnosis with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Figure 4

Fig. 5. Calculating the topmost-5 minimal diagnoses with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Figure 5

Fig. 6. Calculating the topmost-10 minimal diagnoses with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Figure 6

Fig. 7. Calculating all minimal diagnoses with FastDiag versus hitting set-based diagnosis on the basis of QuickXplain for 5, 7, 10, and 15 user requirements (req): performance in milliseconds on the top and number of needed TP calls on the bottom. [A color version of this figure can be viewed online at journals.cambridge.org/aie]

Figure 7

Table 1. Precision of FastDiag versus HSDAG based approaches