1. INTRODUCTION
Product configuration has provided constraint programming with one of its most successful application domains (Sabin & Weigel, Reference Sabin and Weigel1998; Junker, Reference Junker, Rossi, van Beek and Walsh2006). Model-based, particularly constraint-based, approaches to configuration are the most successful in practice (http://www.gartner.com), because constraint-based product configurators are specified in a highly declarative formalism.
Configuration presents several modeling and reasoning challenges. First, it is challenging to maintain consistent integration between product catalogs and constraint-based configurator models. Second, constraint-based approaches need to be able to handle taxonomic inheritance among components and subsystems. Third, the space of possible configurable products is often unbounded but might be subject to resource restrictions. Fourth, users have preferences, and full customization must be possible.
Most configurator engines restrict the configuration process to some degree. In particular, a configurator will typically configure systems before subsystems. Also, isomorphic configurations provide challenges for the configurator; isomorphic configurations can be regarded as being structurally symmetric. For this reason, many configurators represent the product being configured as a set of systems rather than associating each system with a variable, which can introduce unnecessary symmetries into the configuration space.
Although constraint satisfaction techniques have supported configuration for many years, they have required extensions to the basic constraint satisfaction problem. For example, composite constraint satisfaction (Sabin & Freuder, Reference Sabin, Freuder and Luger1996) has been introduced to handle hierarchical system configuration. Mittal and Falkenhainer (Reference Mittal and Falkenhainer1990) introduced dynamic constraint satisfaction to cover problems in which the existence of features depended on the existence or values of other features; this scheme was subsequently given a formal logical semantics (Bowen & Bahler, Reference Bowen and Bahler1991). This work was subsequently extended by other researchers (Mailharro, Reference Mailharro1998; Stumptner et al., Reference Stumptner, Friedrich and Haselböck1998). Dynamic constraint satisfaction has more recently been referred to as conditional constraint satisfaction (Gelle & Faltings, Reference Gelle and Faltings2003) to distinguish dynamism due to conditional relevance of some variables and constraints from dynamism due to uncertainty and environmental change.
Many authors (including Mittal and Falkenhainer) reformulate conditional constraint satisfaction problems (CCSPs) into classic CSPs by introducing redundant domain values and augmenting the problem constraints so that some problem variables take a “not defined” value (Sabin & Gelle, Reference Sabin and Gelle2006). Although feature models (FMs) appear quite different from CCSPs, they can also be mapped to CSPs (Benavides et al., Reference Benavides, Ruiz-Cortés and Trinidad2005) and other data structures. These reformulations are problematic. First, they seem unnatural as a modeling technique, especially for large real-world configuration problems. Second, they become impractical and difficult to maintain, especially when the configuration space is extremely large or unbounded.
Our motivation arises from sophisticated tools that the formal methods community has developed for modelling and reasoning about complex engineered artifacts that can be regarded as configuration problems (Hinchey et al., Reference Hinchey, Jackson, Cousot, Cook, Bowen and Margaria2008). Our objective is to study the utility of formal methods for modeling and reasoning about configuration models. The two main contributions of this paper are the following:
1. Using well-known examples, we show how to model constraint-based configuration problems naturally and concisely in the formal methods package Alloy (Jackson, Reference Jackson2002), which is usually used for modeling software systems, and in the answer-set programming (ASP) language lparse (http://www.tcs.hut.fi/Software/smodels/)
2. In addition to providing a natural modeling paradigm, these approaches are capable of providing reasoning capabilities that are very appropriate for configuration, in particular, verifying the specification of the configuration problem to ensure that specific flaws are absent, a problem identified and studied in earlier work (Sabin & Freuder, Reference Sabin and Freuder1998). We argue that formalisms such as Alloy and lparse provide modeling tools that can be easily used by nonexperts to model and reason about configuration problems directly and naturally.
In Section 2 we informally present conditional constraint satisfaction, motivated by a well-known configuration problem, which we use as a running example. We also list some flaws that can occur in the specification of conditional configuration problems. We present both a formal methods approach (Section 3) and an answer set programming approach (Section 4) to reasoning about CCSPs. In Section 5 we show how we can easily identify flaws in CCSPs and demonstrate that Mittal and Falkenhainer's benchmark problem exhibits such flaws. We briefly show how ASP can easily find solutions involving the minimum or maximum number of options in Section 6. We turn our attention to FMs in Section 7, showing how an ASP approach can reason about these configuration models. In Section 8 we present XML representations for both CCSPs and FMs and report on our software to apply ASP methods to the data stored in such XML representations. Finally, in Section 9 we draw several conclusions and summarize our plans for future study.
2. CONDITIONAL CONSTRAINT SATISFACTION
Mittal and Falkenhainer (Reference Mittal and Falkenhainer1990) introduced CCSPs. A CCSP differs from a classical CSP in that some variables are marked as potential, which means that they need not take a value in all solutions. CCSPs allow activity constraints that deal with the existence of potential variables, including the following:
• require variable (RV), which stipulates that under certain value assignments to other variables, a potential variable must exist;
• require not variable (RN), which stipulates that under certain value assignments to other variables, a potential variable must not exist;
• always require variable (ARV), which stipulates that the existence of some other variable implies the existence of a potential variable; and
• always require not variable (ARN), which stipulates that the existence of some other variable precludes the existence of a potential variable.
Mittal and Falkenhainer demonstrate these concepts by presenting two examples. In the first, the task is to generate valid configurations of options for a car. Because we plan to encode this example for our own purposes, we present it essentially as Mittal and Falkenhainer do in Figure 1. This small model captures, among other constraints, that luxury vehicles must have some sort of sunroof (constraint 1), that any sort of sunroof requires an option for glass (constraint 6), that an sr1 sunroof has no opener (constraint 10), and that a luxury car may not have an ac1 air conditioner (constraint 14).
Given such a CCSP, one can pose several queries:
1. Find/count/enumerate solutions to the CCSP. To find is to compute a single solution; to count is to discover the number of unique solutions, and to enumerate is to list all those solutions.
2. Enumerate all variable flaws in the CCSP. A variable flaw is a potential variable that is present in all solutions, so it is really a classical, not a potential, variable, or a potential variable that is never present in any solution.
3. Enumerate all value flaws in the CCSP. A value flaw is a value for a variable (actual or potential) that is never achieved by any solution.
4. Find/count/enumerate minimum/maximum solutions to the CCSP. A minimum (maximum) solution is one with the fewest (most) potential variables.
5. Find/count/enumerate minimal/maximal solutions to the CCSP. A minimal (maximal) solution is one in which removing (adding) any potential variable leads to a nonsolution.
3. REASONING ABOUT CCSPs IN ALLOY
Alloy Analyzer 4.0 is a language originally intended to model design of data structures. Jackson presents the formal semantics of Alloy in a comprehensive manner (Jackson, Reference Jackson2002). Alloy has been widely used for modeling large complex engineering systems (http://alloy.mit.edu/community/models). It provides a way to specify types and constrain their instances. It can convert those types and constraints into SAT problems that it then solves, displaying the solutions via a graphical interface. If it fails to find a solution, the specification is most likely inconsistent, although the solver might not have searched a large enough population of instances; the specification indicates how many instances of each type to generate for testing purposes. In this sense, Alloy is not a complete solver.
If the graphical representation of the solution seems erroneous to the user, new constraints that the user adds to the specification can prevent the erroneous interpretation.
We find that Alloy is well suited to represent CCSPs. Figure 2 presents our Alloy representation of part of the car example from Figure 1. In Alloy, a sig introduces a type. These types, something like classes in object-oriented programming languages, may be defined to contain members. To model the car-configuration problem, we introduce a sig called Car with a member for each variable. During configuration we define instances of Car.
Each car has required attributes, including a package. The fact that every instance of a car comprises one of each of these attributes is specified with the keyword one. These attributes represent the classical CSP variables of the problem.
A car has additional optional attributes, including a battery and an air conditioner. These attributes correspond to the potential variables of the problem. We specify them with the keyword lone to state that each instance of a car may have at most one instance of each of these attributes. Alloy can now generate one instance of every classical variable and an optional instance of every potential variable. The particular instance that Alloy generates captures the CSP idea of a variable's value.
We introduce each CCSP variable with an abstract sig, introducing a type (such as Package) that has no direct instances. Then we introduce subtypes (such as Luxury). These subtypes may have at most one instance each.
Constraints are represented inside a fact. RV and ARV constraints differ in the form of their left-hand side, referring either to values (like Luxury in constraint 2) or variables (like sunroof in constraint 6). RN and ARN constraints (like constraints 13 and 14) differ from RV and ARV constraints only in that they have no on the right-hand side.
This representation would lead to a fifth and sixth sort of constraint not contemplated by Mittal zFalkenhainer, in which the nonexistence of a potential variable leads to the existence or nonexistence of another potential variable. We would model such constraints in Alloy by facts with no on the left-hand side and either one or no on the right-hand side.
The Alloy program is executable. It generates the solution in Figure 3, among others. Unfortunately, Alloy gives us no way to directly count or enumerate the solutions, short of interacting multiple times with the Alloy Analyzer to request the next solution.
4. REASONING ABOUT CCSPs IN ASP
To represent CCSPs using ASP, we use the syntax that lparse recognizes and converts to a form acceptable to solvers such as smodels (Niemela & Simons Reference Niemelä, Simons, Dix, Furbach and Nerode1997), clasp (Gebser et al., Reference Gebser, Kaufmann, Neumann and Schaub2007), and Cmodels (which converts lparse into SAT and invokes a SAT solver; Giunchiglia et al., Reference Giunchiglia, Yu and Maratea2004). ASP programs deal with predicates, which are either true or false. We introduce a predicate for each value of each actual and potential variable. For instance, the predicate package(luxury) represents the value luxury for the variable package. In any given model, this predicate is either true or false.
Many ASP solvers allow cardinality-constrained predicates, in which the number of true predicates in a given list is bounded above, below, or both. We say
to represent a cardinality-constrained predicate stating that at least 0 and at most 1 of the three predicates in the list is true. The car may have no battery at all, but if it has one, the battery must be one of small, medium, or large. lparse allows a shorthand for lists of predicates that share the same functor; we can equivalently write
We use ASP implications to represent CCSP constraints:
This implication says that if a car has the luxury package, it must have at least one of the battery sizes.
Finally, ASP programs have failures,Footnote 1 indicated by an empty left-hand side of an implication. The conjunction of predicates (some of which may be negated) on the right-hand side must not be true in any satisfying model. For example,
specifies that no model may have an sr1 sunroof and have either an automatic or a manual opener.
An excerpt of the cars specification in lparse syntax is presented in Figure 4. Although the lparse representation is not as elegant as the Alloy one, it is not difficult to read. Instead of one, we bound above and below by 1 (as in the rule for the classical variable package). Instead of lone, we bound below by 0 and above by 1 (as in the rule for the potential variable battery). Instead of no, we bound a failure below by 1, as in rule 10. We represent constraints that preclude particular values by failures (rule 13). We represent classical constraints that imply particular values by implications (rule 16).
ASP solvers, unlike Alloy, can easily enumerate all solutions. Each solution is an answer set, that is, a set of predicates that satisfies all the rules in the specification. A few of the 450 solutions to the car configuration specification are presented in Table 1. One can look through the list of solutions to search for variable and value flaws. However, one can also generate them automatically, as we show in the next section.
Note: ASP, answer-set programming.
5. AUTOMATICALLY CHECKING FOR SPECIFICATION FLAWS
Specifications of CCSPs can contain a variety of flaws that can be difficult to detect manually (Sabin & Freuder, Reference Sabin and Freuder1998). We can apply both the Alloy and ASP approaches to identify flaws in the constraint specification. In particular, we can discover that the car-configuration problem exhibits both a variable flaw and a value flaw. As we mentioned earlier, a variable flaw occurs when a potential variable is required in all models, that is, an option is not really optional. A value flaw occurs when a value cannot exist in any valid configuration, so it does not represent an option.
5.1. Checking for specification flaws in alloy
We extend the model we presented in Figure 2 to introduce an abstract type Flaw, with lone subtypes for each category of flaw for we would like to test. Here are some of the sig definitions for the possible flaws in our model.
We then introduce constraints that force the existence of flaw instances, such as one for BatteriesFlaw.
Given similar definitions for each flaw, we can run the Alloy Analyzer requiring no flaws for a large number of cars as follows:
This run fails to find an instance; by experiment, we need to raise the number of flaws to 2 before the analyzer finds a solution. This solution includes an instance of batteryFlaw (all cars have batteries: a variable flaw) and noConvertible (there are no convertibles: a value flaw).
Figure 5 shows the Alloy Analyzer visualization of the flaws in Mittal and Falkenhainer's specification from Figure 1. We clearly see four instances of Car, with links to their associated components. However, on the far right of the figure we see an instance of batteryFlaw and of noConvertible. The fact that we see instances of these flaws demonstrates that they exist in the specification. The instance of batteryFlaw indicates the presence of the variable flaw highlighted above, namely, that batteries are not optional, despite the fact that the specification suggests the opposite. The presence of the value flaw that no convertible cars are possible is indicated by the instance of noConvertible.
We can explain these flaws, once we find them, by referring to the original specification of Figure 1. Rule 7 forces a battery in every car that has an engine, and engine is a classical variable. We might as well call battery a classical variable as well.
The other, noConvertible, is a value flaw: it is impossible to generate a convertible. This flaw is hidden in the implications of the activity and classical constraints. By constraint 11, convertibles do not have sunroofs. By constraints 1 and 3, cars with the luxury and deluxe packages do have sunroofs, so by elimination, convertibles must have the standard package. But by rule 15, cars with the standard package are not convertibles.
5.2. Checking for specification flaws in ASP
We expand the lparse representation for the car CCSP by adding a second, numeric, argument to every predicate. The new argument represents car number. For example, package(luxury,4) is a predicate indicating that the fourth car has a luxury package. Now rules like
are shorthands that lparse expands (in a process called grounding) to a new rule for each valid value of N. We can limit any solution to four car designs.
The number 4 is arbitrary; we will use it for the examples to follow. Next, we introduce new nullary predicates for each value of each variable (both classical and potential) to indicate the fact that no car at all uses a particular value, such as in this rule:
The grounder converts this shorthand into a rule containing a list of predicates package(luxury,1) to package (luxury,4). If not a single one of these package predicates is true, which happens if none of the N cars has the luxury package, then noLuxury is true, indicating a possible value flaw.
For each potential variable, we introduce two rules, like these:
The grounder expands the first rule to four rules, one for each car. If for any car, no sunroof at all is specified, then okSunroof is asserted. If no car at all asserts okSunroof, then we have a variable flaw, as evidenced by asserting sunroofFlaw.
A solution to this expanded program may contain one of the predicates indicating a flaw for several reasons. One is that the particular solutions chosen for the N cars may simply not be the ones that demonstrate the use of each value and the absence of each potential variable. We deal with this possibility by asking the solver to minimize the number of such predicates:
The solver now searches for solutions containing N cars that have the fewest flaws. If we limit N to 3, for instance, we find at least four flaws: noLuxury, noConvertible, batteryFlaw, noManual. Setting N = 4 produces the apparent flaws batteryFlaw and noConvertible. No matter how high we set N, these flaws remain.
When we try the same technique on the second example that Mittal and Falkenhainer present (we omit the second example in the interest of space), we also find both a variable flaw (can capacity) and a value flaw (the particle-physics value of the ontology variable).
It is instructive to note that only four cars are needed to cover the reachable parts of the variable domains; we might have expected that far more are needed. We can inspect these cars to verify that all reachable values are covered and that potential variables can be omitted, as in Table 2.
6. OPTIMAL CARDINALITY CONFIGURATIONS
We might often be interested in finding solutions to a set of conditional constraints that involve the fewest number of options or the largest number of options. We briefly demonstrate how such queries can be answered using our ASP model. We can use the minimize construct of lparse with our original formulation (before we add the numeric argument) to find a minimum solution, that is, a solution with the fewest potential variables. The requirement we add is simply as follows:
Using the clasp solver for our lparse model we obtain 18 optimal (minimum) solutions, including those presented in Table 3. Similarly, by using maximize, we can enumerate all 176 maximum solutions, such as those also presented in the table.
7. FMs
We now consider FMs, another form of specification that is encountered in domains such as software configuration, in which the architecture of an artifact is represented graphically. Although FMs appear quite different from CCSPs, they have very similar purposes and yield to very similar analysis. FMs are directed acyclic graphs, where nodes are called features and edges imply various kinds of constraints (Czarnecki & Eisenecker Reference Czarnecki and Eisenecker2000). A solution is a subset of the features that satisfies all of the constraints. If a feature is present in a solution, then all the features on the path from it to the root of the tree must also be present. A feature in the tree may be marked as mandatory, meaning that it must be present in any solution if its parent is present; otherwise, it is optional. A feature may indicate that its set of children constitutes an OR set, meaning that if the feature is present, at least one of the children must be present. Similarly, a feature may indicate that is set of children constitutes an XOR set, meaning that if the feature is present, exactly one of its children must be present. Additional nontree edges indicate that if a feature is present, its successor along the edge must also be present or must not be present.
Figure 6 is based on Segura's (2008) FM for mobile telephones. Each node in the tree is a feature that might or might not be included in any model. Filled circles above features indicate that the feature is mandatory if the parent feature is included in a model. Open circles indicate optional features. Filled semicircles under a node indicate an OR set of children; if the parent is included in the model, at least one of the children must be included. Open semicircles under a node indicate an XOR set of children; if the parent is included in the model, exactly one of the children must be included. Therefore, the Media feature is optional, but if it is present, the MP3 subfeature is mandatory. The OS feature requires that exactly one of its subfeatures, Symbian or WinCE, must be present. The Messaging feature requires that at least one of its subfeatures, SMS and MMS, must be present. The Games feature requires the presence of the Java feature elsewhere in the tree.
FMs are in most ways like CCSPs. Features in FMs are like variables in CCSPs. These variables have only one possible value, which we can depict as yes. The mandatory, optional, and edge constraints are like activity constraints. The OR and XOR constraints do not map directly to CCSPs, however.
Given these similarities, it is not surprising that representing FMs in Alloy or ASP is very much like representing CCSPs. In lparse, for instance, we can indicate the mandatory nature of Settings and the optional nature of Media this way, where N refers to the serial number distinguishing phones:
We specify the constraint that Settings is implied by any child:
We represent the OR and XOR set constraints for the children of Messaging and OS:
Finally, the extra edge constraint from Games to Java:
FMs are also subject to flaws. If a feature is marked as optional but exists in all solutions, the FM has a optional-feature flaw. If a feature is absent in all models, the FM has a missing-feature flaw. These flaws can result from nontree edges. For instance, an edge from AlarmClock to Symbian in Figure 6 (Segura, Reference Segura2008) would create a missing-feature flaw for WinCE. An edge from AlarmClock to Java would create an optional-feature flaw for Java.
Methods very similar to those we use in CCSPs can discover these flaws in FMs.
8. XML REPRESENTATIONS
In order to standardize how we represent CCSPs and FMs, we have designed XML Document Type Definitions (DTDs) for both, based roughly on XCSP 2.1, the DTD for CSPs (http://www.cril.univ-artois.fr/CPAI08/XCSP2_1.pdf). We have also written Perl scripts that accept instances of CCSPs and FMs obeying these DTDs, generate lparse renditions of the constraints, and then apply clasp to count or enumerate solutions, find minimum and maximum solutions, and detect flaws. In this way we can automatically generate a formal model of a configuration problem from a very natural specification.
Figure 7 shows our XML representation of the cars CCSP. The constraints typically name a variable or value as a condition and as a result. Either may be negated (as in constraint 10). The XML representation may include the logical connector and in the condition (constraint 12).
Figure 8 shows our XML representation of the phones FM, which nests feature nodes to mirror the picture of Figure 6.
9. DISCUSSION
Product configuration is a major industrial application domain for constraint satisfaction techniques. CCSPs and FMs have been developed to represent configuration problems in a direct and natural way. In this paper we have presented two alternative approaches to reasoning about specifications of conditional constraint sets: one approach based on well-established formal methods techniques for reasoning about software specifications, and another based on ASP. The models of the constraint specification are natural in both cases and do not require any reformulation of the original CCSP or FM. We have also shown how we could automate the testing for variable and value flaws (for CCSPs), and missing-feature and optional-feature flaws (for FMs), and that it is possible to find optimal cardinality specifications.
The DTD and Perl script are available from the authors under the GNU General Public License (http://www.gnu.org/copyleft/gpl.html). We have used this software on the fairly large “bikes” configuration (http://www.itu.dk/research/cla/externals/clib/Bike.pm), with 27 variables, some them with domains of size 14, 16, and 36. Our analyzer sets N to twice the largest domain size and tries for 10 s to minimize flaws. It then uses divide and conquer to verify each of the discovered flaws, which might be false positives due to insufficiently large N or incomplete minimization within the time limit. Each verification, however, is very fast and not subject to false positives. In the “bikes” specification, our analyzer finds 100 potential flaws in 10 s of minimization and then in another 9 s verifies that 5 are actual value flaws. Finding a solution with a given variable set to a specific value is quite fast (about 0.04 s) even in this relatively large specification; verifying a flaw takes about 0.09 s. We therefore think that the ASP approach scales well. Alloy also scales well; it is used routinely for reasoning about large complex industrial specifications (http://alloy.mit.edu/community/).
10. CONCLUSION
Our future work will study three problems.
1. We will generalize constraint-based explanation techniques so we can give advice on resolving flaws in problem specifications, thus contributing to the emerging literature on conflict detection in formal specifications (Torlak et al., Reference Torlak, Chang, Jackson, Cuellar, Maibaum and Sere2008).
2. We will apply fault detection to configuration, so fixing the value of a variable will eliminate all newly unreachable values of other variables.
3. We will investigate how to handle nondiscrete variables, such as real ranges.
ACKNOWLEDGMENTS
Raphael Finkel's work was partially supported by the US National Science Foundation (Grant IIS-0325063). Barry O'Sullivan is funded by the Science Foundation Ireland (Grant 05/IN/I886). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies. This work is an extension of a conference paper (Finkel & O'Sullivan, Reference Finkel and O'Sullivan2009).
Raphael Finkel has been a Professor of computer science at the University of Kentucky in Lexington since 1987. He attained his PhD from Stanford University in 1976. He was associated with the first work on quad trees; k-d trees; quotient networks; and the Roscoe/Arachne, Charlotte, Yackos, and Unify operating systems. Dr. Finkel was involved in developing DIB, a package for dynamically distributing tree-structured computations on an arbitrary number of computers. His research includes tools for Unix system administration, databases, operating systems, distributed algorithms, computational morphology, Web-based homework, and ASP applications. He has published over 50 articles in refereed journals and has written two textbooks.
Barry O'Sullivan is the Associate Director of the Cork Constraint Computation Centre and a Senior Lecturer in the Department of Computer Science at University College Cork. He attained his PhD from University College Cork in 1999. His main areas of research interest are constraint programming, artificial intelligence, and optimization, with a focus on application domains such as cancer care, health, environmental sustainability, computer/network security, configuration, design, telecommunications, combinatorial auctions, and electronic commerce. Dr. Sullivan is also interested in theoretical computer science, particularly parameterized complexity and its applications.