1. INTRODUCTION
Work processes in large organizations often constitute generic frameworks that encompass all possible development activities rather than specific development processes that are well suited to the needs of a project. Organization-wide guidelines are often specified as comprehensive work processes describing the sequence of activities to be followed in development. To provide commonality and accountability, and to communicate acknowledged good practice across the enterprise, such processes are often established as reference processes that cover a wide range of development projects. As a result, these processes exhibit large size, high complexity, and require a high degree of individualization for specific projects. Companies such as Motorola (Fitzgerald et al., 2003) and Siemens AG (Schmelzer & Sesselmann, 2004) are committed to process-driven systems development and have developed formalized work processes that each development project must comply with.
The nature of products and environments among business units exerts significant pressure toward diversity and creates the desire to adapt generic reference processes to suit specific development projects and business units, while retaining as much commonality as possible with the reference process. The size of the processes and discrepancies between the reference processes and the needs of particular projects render this task difficult. In addition to the structure of the reference process, organization-specific constraints associated with particular tasks must be respected when altering the reference template (Bajec et al., 2007). Current practice is to manually apply changes to the reference process template, which is a time-consuming and error-prone approach.
Commercial process modeling tools often provide support only for basic editing tasks. The inability to express and validate organization-specific requirements has led to poor process management, where errors in processes are often rectified only once compliance violations with respect to the reference process have been detected. Automated support for process design, adaptation, and validation has the potential to yield significant improvements in the quality of customized processes and a reduction in the effort required to perform adaptation. Although considerable results have been achieved by formal modeling and analysis of processes (van der Aalst, 2000), analysis techniques predominantly focus on verifying that undesirable states, such as deadlocks, cannot be reached in any execution. Organization-specific requirements and constraints have largely been disregarded.
In contrast, methods developed for automated configuration and customization of complex systems have traditionally incorporated vast quantities of domain-specific knowledge. In this context, configuration means assembling a larger system from a set of available components. Modeling principles to guide the formulation of large domain-specific knowledge bases have been established (Soininen et al., 1998; Stumptner et al., 1998; Felfernig et al., 2001; Asikainen & Männistö, 2009) in conjunction with efficient inference procedures (Mailharro, 1998; Stumptner et al., 1998; Magro, 2010). Declarative modeling principles advocate maintainability and scalability of the approach. However, most approaches have been designed for the configuration of physical systems and product lines, whereas work processes have received comparatively little attention. Albert et al. (2005) present a workflow composition methodology that relies on domain-specific and generic models, but requires a library of given workflow templates. Conceptual (Heiskala et al., 2005) and constraint-based models (Mayer et al., 2009) have been developed for the synthesis of executable software processes from individual components. Customization methods for service agreements have also been developed (Dausch & Hsu, 2006).
However, existing configuration methods cannot directly be applied to work process adaptation: the absence of detailed specifications of the predominantly manual activities in development processes precludes the application of process synthesis from individual work steps, while the need to adapt the process structure based on specific project requirements prohibits the creation of a comprehensive library of specialized process templates. Instead, generic reference processes in conjunction with lightweight domain-specific models can be exploited to guide the configuration process in a semiautomatic manner.
This article presents extensions of established configuration mechanisms to support process adaptation and validation in the presence of domain-specific entities and constraints. A process manipulation framework is presented that allows the semiautomatic adaptation of generic processes to specific requirements of a given project. The framework extends constraint-based configuration techniques to processes by introducing a process-oriented constraint language and solving strategies. This article shows that existing constraint-based formalisms lack the ability to adequately represent path constraints, such as precedence and dominance, in generic work process models. It contributes a constraint language suited to formalize path constraints within a generic process metamodel for work processes and demonstrates how this formalism can be integrated into a generative constraint satisfaction framework for reference process configuration. A version of the approach was used to provide support in instantiating real-world processes in the software development process framework employed by Siemens AG.
First, process models and related adaptation tasks are discussed. Second, a model-based framework for process customization is introduced and a constraint language for process configuration is presented. Third, the results obtained from using a version of the framework for providing process editing support at Siemens AG are discussed.
2. WORK PROCESS MODELS
A variety of models for documenting work processes in large organizations have been developed. Process models describe the types of activities and the order in which they are performed. Some models include explicit time constraints and the resources consumed and produced by activities. Process models generally differ in the level of detail, the representation language, and the degree to which the meaning of each model element is captured in a formal language. Aalst and Hee (2004) provide a summary of the major business process and workflow modeling approaches.
In this article, we use the notation of the event-driven process chain (EPC) modeling framework (Keller et al., 1992) for illustration purposes. The EPC framework has been widely adopted to describe business processes related to system development and software project management, and its use by Siemens AG meant that EPC processes were available for testing and project outcomes could be directly applied. However, the presented approach is largely independent of any concrete process modeling language and can be applied to any framework where activities, control flow, and constraints are captured explicitly. In particular, it must be noted that EPCs are a semiformal notation, and our framework is consistent with the use of the notation in the ARIS toolkit that Siemens uses for working with EPCs.
The EPC approach relies on a flow-chart-like graphical language for presentation, where individual activities are linked together to form process “chains” describing the possible sequences of executions (the “control flow” in process modeling terminology). Figure 1 shows an example of an EPC process model. A process is represented as a graph consisting of nodes and edges (flow connectors). Nodes of different shapes denote functions, and flow connectors specify the possible paths through the process. Nodes represent “regular” functions (activities), control functions, or events. Regular functions are drawn as rectangular shapes and represent activities that are performed. The label of a function informally describes the activity it represents. Control functions are drawn as circles and govern the execution and synchronization of different branches in the process. The EPC language provides control functions to select one of several successor branches (∨ split function) and to execute several branches in parallel (∧ split function). The same notation is used to merge alternative branches (∨ join function) and to synchronize the execution in different branches (∧ join function). Events are represented by hexagons and indicate that a certain state in the process has been reached.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626160848-22669-mediumThumb-S0890060410000594_fig1g.jpg?pub-status=live)
Fig. 1. The event-driven process chain and corresponding function allocation diagram.
The semantics of the EPC language (Keller et al., 1992; Kindler, 2006) are inspired by the operational semantics of Petri nets (Aalst & Hee, 2004) and specify the possible sequences of functions that can be executed using a simple activation model based on the transfer of activation tokens along the control flow connectors. Informally, a function is ready to be executed if all incoming control flow connectors provide a control token. Once activated, a function consumes one or more control tokens from its incoming flows, and generates one or more control tokens on its outgoing flows once the function has finished executing. The tokens traveling along outgoing flows subsequently activate their successor function(s).
The example process in Figure 1 specifies that the test specification can be developed in parallel with the design specification and the implementation, but both must be completed before testing can be performed. Similarly, the intellectual property declaration can be created while the implementation and testing activities are in progress, provided that all are completed before the software component is integrated with other components.
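The token-based activation model described above can be sketched in a few lines of Python. This is a simplified illustration of AND-join/AND-split behavior; the node and flow names are hypothetical and do not reflect the actual EPC or ARIS data model:

```python
# Simplified EPC-style activation: a node fires when every incoming flow
# carries a token; firing consumes those tokens and emits one token on
# each outgoing flow (AND-join/AND-split behavior).

def ready(node, tokens, incoming):
    """An AND-join node is ready when all incoming flows hold a token."""
    return all(tokens.get(flow, 0) > 0 for flow in incoming[node])

def fire(node, tokens, incoming, outgoing):
    """Consume one token per incoming flow, produce one per outgoing flow."""
    for flow in incoming[node]:
        tokens[flow] -= 1
    for flow in outgoing[node]:
        tokens[flow] = tokens.get(flow, 0) + 1

# Tiny fragment of the example: two predecessors feed Test via f1 and f2.
incoming = {"Test": ["f1", "f2"]}
outgoing = {"Test": ["f3"]}
tokens = {"f1": 1, "f2": 0}

assert not ready("Test", tokens, incoming)  # second predecessor not done yet
tokens["f2"] = 1                            # second predecessor completes
fire("Test", tokens, incoming, outgoing)
print(tokens)  # {'f1': 0, 'f2': 0, 'f3': 1}
```

For an ∨-join, the readiness test would instead require a token on at least one incoming flow.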
Although the EPC framework prescribes the structural rules and activation rules for activities every model must adhere to, the meaning of the user-defined activities is described by textual labels and is not otherwise captured formally. The EPC model can be complemented with function allocation diagrams (FAD; Scheer, 2000) that model the artifacts, data, and resources that are used and produced by a function. Figure 1 shows an example of an FAD. The diagram shows the documents that are required and produced by the activity (polygons with zig-zag bottom line), resources allocated to the activity (rectangular nodes with double border), and files related to its execution.
The EPC and the FAD diagrams provide orthogonal views of the same process. Although the EPC notation facilitates visualization and manipulation of the control flow of the process, the FAD's main concern is resource allocation and artifact tracking.
To effectively manipulate a process, both views must be considered in unison. However, the size of typical development processes prohibits detailed visualization of an entire process and its dependencies. Because changes made in a model of a subprocess or FAD easily propagate to other parts in the process, it is difficult to assess implications of local changes on the entire process. As a result, manual process adaptation may result in processes that no longer conform to the prescribed reference processes, and may contain activities that are unnecessary in the context of a specific project. Similar problems can arise with resource allocation, where resources are allocated that may be unavailable at the required time.
2.1. Process adaptation
Process adaptation approaches can be distinguished by the adaptation principles that are applied. We distinguish reductive adaptation and generative adaptation.
Reductive adaptation is predominantly concerned with making minor modifications to a given reference process: changes to subprocesses and the implied effects that propagate throughout the entire development process. Typically, some subprocesses would be deleted, alternative subprocesses for an abstract activity would be selected from a given library, and resources would be defined and allocated to individual activities.
Generative adaptation strives to synthesize entire processes and subprocesses in order to satisfy a given goal. In contrast to reductive adaptation, no reference process is given, and the goal and properties of individual activities must be specified in a formal language. This approach has gained prominence particularly in the context of service-oriented software architecture, where complex software systems are synthesized using models of the individual software components. The main focus of this article is on reductive adaptation that will support the specialization of generic work processes to suit a specific project context. Reductive adaptation fits more naturally in this context, because the activities in the generic process are not usually specified in a formal language. However, the formal framework presented in this article can be extended to process synthesis (Mayer et al., 2009).
Typical adaptation operations include adding and removing process elements, such as activities, milestones, artifacts and resources, duplicating a subprocess, and associating activities with specific artifacts and resources.
Although reference processes are often created with dedicated process modeling tools, such as ARIS (IDS Scheer, 2006), and executed with the help of project management tools and workflow engines, little support is available for semiautomated tailoring and compliance checking of resulting processes against the reference processes. Niknafs and Ramsin (2008) investigated available tools and concluded that only limited prototypes exist that support a few steps of the method engineering process but do not provide a complete solution. For example, the authors identified only a single tool that is process oriented. However, it does not provide the semantic data model that is required for automated verification. Killisperger (2010) also concluded that the same limitations apply in the areas of information system development, human-centered processes, and system-centered processes.
Recognizing and managing the implications of local changes on distant activities and finding suitable repairs for violations that occur during adaptation all remain predominantly manual tasks. Consequently, it has been observed that adaptation frequently results in invalid or inefficient processes, where activities are performed in a way that violates the reference process, or where activities are carried out that are not actually required to achieve the goal of the development project.
Project-specific adaptation of development processes has attracted significant attention, such as Brinkkemper's method engineering proposal (Brinkkemper, 1996) for the creation of situational methods, and the extension of the “product lines” concept to software processes (Armbrust et al., 2008) in order to manage variation. Simple forms of process adaptation have also been introduced in business process and workflow modeling frameworks, such as configurable EPCs (Rosemann & van der Aalst, 2007). However, their focus is usually on structural properties of the process, and methods are restricted to simple adaptations, such as enabling and disabling individual activities and pattern-based rewriting of local subprocesses. Validation of the resulting process focuses on the structural integrity of the process and its operational behavior, such as verifying the absence of deadlocks. Domain-specific constraints and constraints governing the assignment of resources throughout the process are usually not respected in the generic adaptation and validation procedures.
Transforming a generic reference process into a project-specific development process generally involves three tasks: tailoring, resource allocation, and instantiation of artifacts (Killisperger, 2010).
2.1.1. Tailoring
Tailoring is “the act of adjusting the definitions and/or of particularizing the terms of a reference process description to derive a new process applicable to an alternative (and probably less general) environment” (Ginsberg & Quinn, 1995). Tailoring is not restricted to cutting away unneeded parts but also includes the addition, change, and duplication of parts of the process. Reference processes describe the development of components in an exemplary way for all components of a system to be developed. This general description must be instantiated for each component. Tailoring may affect particular sections or constructs of the process as well as specific elements such as activities, artifacts, milestones, and resources.
2.1.2. Resource Allocation
The assignment of resources to activities is called resource allocation (Aalst & Hee, 2004). In software development, this predominantly involves the assignment of human resources to activities. However, reference processes are defined in terms of abstract roles rather than functions, which must be translated into specific project resources. This translation may be complicated by availability and scheduling constraints: some roles may be filled by multiple resources, and project participants may act in several roles. For example, the role Developer in a reference process would normally be filled by multiple engineers.
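A minimal sketch of this role-to-resource translation follows. All names are hypothetical, and a real allocator would additionally respect availability and scheduling constraints:

```python
# Hypothetical role-based allocation: each abstract role may be filled by
# several resources, and one resource may act in several roles.

def candidates_for(role, fills):
    """Resources whose role set includes the requested role."""
    return sorted(r for r, roles in fills.items() if role in roles)

def allocate(role_of, fills):
    """Naively assign the first candidate to each activity; fails if a
    role cannot be filled at all."""
    assignment = {}
    for activity, role in role_of.items():
        matches = candidates_for(role, fills)
        if not matches:
            raise ValueError(f"no resource fills role {role!r}")
        assignment[activity] = matches[0]
    return assignment

role_of = {"Implement Component": "Developer", "Test Component": "Tester"}
fills = {"Alice": {"Developer", "Tester"}, "Bob": {"Developer"}}
print(allocate(role_of, fills))
# {'Implement Component': 'Alice', 'Test Component': 'Alice'}
```

Note that Alice acts in two roles here, illustrating the many-to-many relation between roles and resources described above.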
2.1.3. Instantiation of artifacts
Because a reference process is independent of any particular project, artifacts are described by generic templates and placeholders. Similar to processes, such generic artifacts must be individualized for a specific project, and an appropriate number of copies must be associated with relevant activities in the tailored process.
2.2. Process adaptation requirements
Large-scale processes are notoriously difficult to tailor, as it can be difficult to keep track of the structure of the overall process hierarchy, interdependencies between tasks in different subprocesses, and the allocation of resources to tasks at different times. Tools like the ARIS process modeling system (IDS Scheer, 2006) have been used in many organizations to document their work processes. The availability of convenient graphical user interface tools to create and visualize complex processes makes these tools an attractive choice for process modelers. However, complex adaptation and restructuring of large processes remains difficult, as little support is provided to ensure that the result adheres to the structural and domain-specific constraints that users may have imposed on the processes.
Several aspects must be considered simultaneously in order to provide effective support for creating and editing process models.
2.2.1. Custom Metamodel
Activity types, their properties, and relations to other process elements shall reflect organization-specific processes. A flexible approach is desired, where the individual classes of activities, relations, and other process entities can be changed without altering the software implementation. Most existing frameworks provide either a fixed model that is usually accompanied by rigid mathematical analysis tools (van der Aalst, 2000) or flexible process editors with few analysis capabilities. The approach presented in this article aims to cover the middle ground between the two extremes, providing a flexible metamodel that is enriched with constraints that govern structural and organization-specific requirements on process models.
2.2.2. Syntactic correctness
The syntax of a modeling language is generally defined by rules governing the structural composition of process elements and their connections. These rules are typically defined in a metamodel that specifies which process models are valid in the modeling language. Figure 2 depicts a metamodel for the EPC language introduced earlier in this article. For example, the EPC metamodel requires that control flow connectors link only functions and events. Two subsequent flow connectors without an intermediate function or event violate the metamodel and should be rejected by any modeling environment. Furthermore, it is required that each flow connector originates and ends in a function or event; no “dangling” connectors are allowed.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626160846-88619-mediumThumb-S0890060410000594_fig2g.jpg?pub-status=live)
Fig. 2. The partial event-driven process chain metamodel in Unified Modeling Language notation (List & Korherr, 2005).
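The two structural rules just stated can be sketched as a simple check over a flat edge list. This is an illustration only; the actual EPC/ARIS schema is considerably richer:

```python
# Illustrative syntax check: connectors must link existing functions or
# events, and must never link directly to another connector or unknown node.

def check_syntax(nodes, flows):
    """nodes maps node id -> kind ('function', 'event', ...);
    flows is a list of (source, target) pairs.
    Returns a list of violations (empty if the model is syntactically valid)."""
    errors = []
    for src, tgt in flows:
        for endpoint in (src, tgt):
            if endpoint not in nodes:
                errors.append(f"dangling connector at {endpoint!r}")
            elif nodes[endpoint] not in ("function", "event"):
                errors.append(f"connector may not link to {endpoint!r}")
    return errors

nodes = {"e1": "event", "f1": "function"}
assert check_syntax(nodes, [("e1", "f1")]) == []
assert check_syntax(nodes, [("e1", "x9")]) == ["dangling connector at 'x9'"]
```

A modeling environment would run such checks incrementally, rejecting edits that introduce violations.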
Although syntactic correctness is an essential requirement for any process model, this criterion is insufficient to guarantee that a model with well-defined operational behavior is obtained in general.
2.2.3. Semantic correctness
The execution of a process model is governed by rules outlining the operational behavior of the model elements. For EPC models, the execution of activities is defined by firing rules derived from Petri nets as outlined in van der Aalst (1999). Each node in an EPC model with its connectors is mapped to a Petri net fragment. The simple and precise execution semantics of Petri nets specify the execution sequence of the activities in the EPC model. The behavior exhibited by a process model is defined by applying the firing rules to each activity in the process model.
Although the propagation of tokens by a single activity can be understood easily, the global behavior of complex processes may be difficult to assess. In particular, syntactic correctness does not imply that the overall process model functions correctly, that is, each possible execution admitted by the model will eventually reach a final activity and complete the process. For example, a connector that flows into the same activity it originates from will prohibit this activity from ever being activated. Consequently, the execution will halt at this point and prohibit the process from completing successfully. This condition is generally known as “deadlock” (van der Aalst, 2000).
Although some undesirable patterns can be detected locally, in general, the entire process must be analyzed in order to validate its behavior. Simulators and other process analysis tools have been developed to aid in this task (van der Aalst, 2000); however, most process editing tools provide only limited support for such validation. As a result, the implications of local changes in large processes may be difficult to assess.
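As a rough illustration of the local-versus-global distinction: the self-loop condition above can be caught by inspecting single connectors, whereas completion requires at least a reachability analysis over the whole process graph (the structure and names below are hypothetical):

```python
from collections import deque

def self_loops(flows):
    """Local check: connectors that flow back into the node they leave."""
    return [src for src, tgt in flows if src == tgt]

def can_complete(flows, start, end):
    """Global check: is the final node reachable from the start node?"""
    succ = {}
    for src, tgt in flows:
        succ.setdefault(src, []).append(tgt)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return True
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

flows = [("start", "a"), ("a", "a")]    # 'a' waits on itself forever
assert self_loops(flows) == ["a"]
assert can_complete(flows, "start", "end") is False
assert can_complete(flows + [("a", "end")], "start", "end") is True
```

Reachability is only a necessary condition here; a full behavioral check would have to simulate the token semantics to rule out deadlocks on join nodes.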
Assessing semantic correctness for EPC models is further complicated by the fact that the meaning and effects of individual user-defined activities are only captured informally. Although the flow of tokens can be simulated, the EPC model lacks the information that is necessary to verify that domain-specific assumptions and constraints have not been violated. Typically, this verification must be performed manually once the model has been created (Killisperger et al., 2008). Because this task is time consuming and may require detailed knowledge of the entire process, modified processes often do not undergo appropriate validation when changes are applied. Serious errors in several large business processes have been identified through systematic analysis (Mendling, 2009), and previously unknown errors in the customized Siemens processes were uncovered by our analysis described in Section 4 of this article.
2.2.4. Data flow and resource usage
In addition to correct control flow, work processes must also adhere to the correct use of artifacts and resources generated and consumed by the process. For software development processes, typical examples include artifacts, such as documents and source code that are produced, and resources, such as manpower, software licenses, and computing systems. Pure process models generally focus on the control flow aspects of processes and leave the implications for resource allocation and availability hidden.
If resource allocation is done separately, critical resources may be unavailable at the time an activity would require them. In addition, the shared use of artifacts may induce dependencies between tasks that may not be explicit in the process model. If processes are restructured without consideration of hidden dependencies, seemingly correct process models may exhibit deadlocks because of unavailable input artifacts and resources. Furthermore, resource-induced dependencies may link seemingly unrelated subprocesses, making it difficult to assess the impact of local changes on the overall process.
Consider the example process in Figure 1. Both activities Implement Component and Create Intellectual Property Rights File require input from the software developer. The shared resource induces a potential dependency between the activities. If resources are tight, activities may have to be scheduled sequentially rather than in parallel (as would be permitted by the process model). Furthermore, the execution of the entire subprocess Implement Component also depends on the remote IP Department's schedule. This dependency is not evident from the control flow and the activity labels alone, but it may severely constrain the possible process schedules, something that must be taken into consideration when altering the Implement Component subprocess.
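The kind of hidden, resource-induced dependency discussed above can be made explicit by intersecting the resource sets of activities. The sketch below follows the Figure 1 example loosely; activity and resource names are illustrative:

```python
from itertools import combinations

def induced_dependencies(uses):
    """uses maps activity -> set of required resources. Activities sharing
    a resource may not be schedulable in parallel when resources are tight."""
    pairs = []
    for a, b in combinations(sorted(uses), 2):
        shared = uses[a] & uses[b]
        if shared:
            pairs.append((a, b, shared))
    return pairs

uses = {
    "Implement Component": {"Software Developer"},
    "Create IPR File": {"Software Developer", "IP Department"},
    "Write Test Specification": {"Test Engineer"},
}
for a, b, shared in induced_dependencies(uses):
    print(f"{a} <-> {b} via {sorted(shared)}")
# Create IPR File <-> Implement Component via ['Software Developer']
```

Such derived dependencies can be overlaid on the control flow view to warn the designer before a restructuring step separates activities that silently compete for the same resource.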
Large work process models are typically defined as a (hierarchical) process where the possible sequences of (abstract) activities are specified, but the precise meaning of the individual activities is not usually formally expressed. It follows that any attempt to provide automated support to the process designer will have to rely on semi-interactive mechanisms and seek the designer's input to resolve ambiguities and choose between different alternative adaptations.
3. PROCESS CONFIGURATION FRAMEWORK
Adapting a common reference development process to the needs of a particular project can be seen as a configuration task, where the general reference process must be tailored to suit the requirements of a given project. The configuration goal in this scenario is a process that is consistent with the reference process template, has appropriate resources allocated, and no extraneous activities. Although project managers will usually be able to decide which changes must be made to the reference process, applying these changes consistently remains challenging for the reasons outlined in the previous sections. It is here where semiautomated support to check the validity of the customized process and suggest alternative changes is most valuable.
Approaching this problem as a configuration task, the reference process constitutes the initial configuration, and the configuration goal is a valid process that conforms to both the generic organizational and the project-specific requirements. However, different from traditional applications of automated configuration tools, where systems are predominantly built up from an empty initial state, configuration of processes works on an initial process that is subsequently modified to suit a goal that may not have been completely formalized. Rather, configuration in this context largely follows the propose-and-revise principle (Marcus & McDermott, 1989), where manual changes applied by the project managers are validated, and remedies for problems are proposed. The ability of configuration frameworks to incorporate both the concrete initial process state and the generic requirements of the reference process makes this approach particularly attractive. The search for repairs of inconsistencies between the given modified process and the reference process replaces the synthesis of complete configurations from a sparsely instantiated initial configuration.
Although the overall knowledge-based configuration principle is suitable for process adaptation, the underlying knowledge base describing the valid process variants is quite different to that of classical configuration. In the process adaptation scenario, the exact properties of activities are rarely formalized in detail, and constraints on processes will usually involve a number of activities along a path.
The following sections introduce the conceptual framework that supports the representation of processes and the constraints required in the system development process domain.
The framework addresses the major expectations set out for a semi-interactive process configuration tool: organization-specific metamodels are incorporated in an extensible process metamodel, which also forms the basis for constraints that enforce syntactic and semantic correctness. Artifact and resource usage are also governed by constraints. Process entities that should not be altered (because they correspond to activities that have already executed or that have been introduced manually) are excluded by constraints that prohibit the application of certain modifications to these process entities. Temporary inconsistencies are tolerated while manual changes are made, as explanations for constraint violations are provided by highlighting the process entities involved in a violated constraint. Remaining inconsistencies are resolved later through heuristic search. Heuristics attempt to confine changes to local subprocesses and derive processes with no unnecessary entities.
3.1. Conceptual metamodel for process configuration
Knowledge-based configuration requires a specification of the potential constituent components and constraints that specify the set of possible configurations. For work processes, the available component types, their properties and associated value domains, and possible relationships between components must be given. Constraints attached to each entity restrict the valid values and relationships between entities comprising a configuration.
Our process configuration framework relies on a generic process metamodel that captures process entities such as elements, relations between elements, and process scopes. Figure 3 shows the major components of the conceptual process metamodel that has been developed for the Siemens AG application. The model consists of a generic framework that remains the same for all processes, and organization-specific entities that represent the concepts and properties relevant for a particular organization or application domain. Although some model parts are specific to the Siemens Reference Process, the modeling approach is generic and can be applied to a variety of different process models, languages, and domains. The metamodel is also consistent with the EPC metamodel in Figure 2. The embedding can be extended to the full EPC metamodel described by List and Korherr (2005) by introducing additional subclasses and constraints. Some classes have been renamed to reflect the organization-specific terms that are used in the definition of the Siemens reference process. For example, the FlowConnector in the EPC metamodel is named Flow in our framework. Intermediate subclasses such as ExecutableElement have also been introduced to capture constraints common to all subclasses that exist in the reference process but not in the generic EPC metamodel.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626160903-25751-mediumThumb-S0890060410000594_fig3g.jpg?pub-status=live)
Fig. 3. The main entities of the Siemens Reference Process metamodel in Unified Modeling Language notation. The model part highlighted in bold represents the generic process framework, and the remaining parts are specific to the Siemens Reference Process. Most organization-specific attributes have been omitted for brevity.
Conceptual class Entity forms the central component of our metamodel and acts as an abstraction of all elements of a process. This concept introduces a unique identifier for each entity instance and establishes the relationships and constraints between the model entities.
A process is represented as a graph structure, where the nodes represent executable actions, resources, and artifacts. Class Element forms the basis of a hierarchy that models specific types of nodes in each category.
Class ExecutableElement models an executable node. Most immediate specializations are determined by the EPC process modeling language. Class Split represents EPC nodes that introduce multiple branches into the process, and class Join models nodes where multiple branches rejoin.
The subclasses representing ∧- and ∨-split and join activities have been omitted for space reasons. Class Event represents signals that trigger subsequent executable elements in the process. Class Activity is used to model activities that will be carried out in the process (represented as Functions in the EPC language). The subclasses AutomaticActivity and HumanActivity distinguish between manual activities and those that are carried out automatically by software systems. These classes are modeled separately, because different constraints apply to each kind.
Executable elements are complemented with models of the resources and artifacts involved in the execution of activities. Class Resource models the common properties of all resources available in the reference process; its specializations HumanResource and NonhumanResource cover personnel and software and hardware tools, respectively.
Class Artifact and its specializations WorkProduct and Guideline express the data that are associated with activities. Class WorkProduct covers documents, source code, and other data that are used and created in the development process. Class Guideline represents external information that is required to perform an activity, where the information does not change throughout the development process. Constraints specific to each subclass reflect the different usage patterns.
In addition to the element hierarchy, control flow between executable elements and associations between activities, resources, and artifacts must be captured in the metamodel. In contrast to many other configuration frameworks, relations are represented explicitly as conceptual classes. This approach is preferred over models where connections are represented implicitly as properties, because conceptually different relationships can be distinguished by specialization, and constraints can be attached to each class. An equivalent model where all constraints are attached to Elements could only be obtained at the expense of duplicating constraints in multiple classes.
Class Flow represents a relation between elements. Subclass ControlFlow models the propagation of activation between activities in a process. The subtree rooted in class ResourceAssignment models the assignment of resources to activities. Classes ResponsibleResourceAssignment and ExecutingResourceAssignment are specific to the Siemens Reference Process and distinguish the different roles that resources can play. Class InformationFlow links an artifact with the activities where it is created and used. The information flow is partitioned into input and output flows, because different constraints apply to each class.
Class Scope comprises the entities in a process or subprocess. Its main purpose is to capture constraints that apply to an entire subprocess rather than a single element or flow entity. This model element is essential for representing constraints that express properties of paths limited to certain subprocesses.
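To make the hierarchy concrete, the generic framework classes described above can be sketched in code. The following is a minimal Python rendering; class names follow the article, but the attribute set is illustrative and omits the organization-specific parts of Figure 3:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    uid: str  # unique identifier introduced by the Entity concept

@dataclass
class Element(Entity):
    label: str = ""

@dataclass
class ExecutableElement(Element):
    pass

@dataclass
class Activity(ExecutableElement):
    pass

@dataclass
class HumanActivity(Activity):
    pass

@dataclass
class AutomaticActivity(Activity):
    pass

@dataclass
class Flow(Entity):
    source: str = ""  # uid of the source Element
    target: str = ""  # uid of the target Element

@dataclass
class ControlFlow(Flow):
    pass

@dataclass
class Scope(Entity):
    # uids of the entities contained in this (sub)process
    entities: list = field(default_factory=list)
```

Relation classes such as ResourceAssignment and InformationFlow would be added as further Flow subclasses in the same style.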
To avoid potential confusion, it is important to note that this model is a metamodel of the process models built by process designers; it is not a metamodel of the configuration process. The metamodel is a conceptual description of the problem domain and can hence be categorized as a configuration model in the sense of Soininen et al. (Reference Soininen, Tiihonen, Männistö and Sulonen1998). The metamodel does not describe configuration model concepts, which could be seen as a metamodel of the configuration task. Because the term metamodel is not actually used by Soininen et al. (Reference Soininen, Tiihonen, Männistö and Sulonen1998), we retain the conventional use of the term in software engineering as “a collection of concepts in a domain.”
The conceptual metamodel provides the basis for specifying the activities and flows comprising the reference process and its constraints that all variants must satisfy. The next section introduces the formal framework that will be used to encode the conceptual metamodel as constraints that can be used to automatically validate process variants.
3.2. Constraint-based configuration
As a problem-solving technique in technical systems, the term configuration refers to the assembly of systems from smaller components. To automate this process, the available components and possible interconnections must be formalized. Typically, a taxonomy of principal component types, their attributes, their interfaces through which components can be connected to other components, and constraints that govern the valid compositions must be specified. It is assumed that the available component types are known, but the size and structure of a particular solution is not. It has been shown that this knowledge about a problem domain can be elegantly expressed as a constraint satisfaction problem (CSP; Mittal, Reference Mittal1990; Stumptner et al., Reference Stumptner, Friedrich and Haselböck1998).
A CSP consists of a finite set of variables V and a finite set of constraints C in which bold is used to denote set-valued variables. A constraint is a relation over a set of variables V′ ⊆ V that specifies the valid combinations of value assignments to these variables. The set of values that may be assigned to a variable is called the domain of that variable. Solving a CSP means finding an assignment to all the variables such that all constraints are satisfied.
The main obstacle in using CSPs for configuration is that the set of variables and constraints is static and thus cannot be used to describe problems where the structure or size of the solution is unknown. Dynamic constraint satisfaction formalisms have been developed where the set of variables is extended on demand, and conditional constraint satisfaction techniques have been devised in order to flexibly enable and disable constraints (Rossi et al., Reference Rossi, Beek and Walsh2006).
Generative constraint satisfaction problems (GCSPs; Stumptner et al., Reference Stumptner, Friedrich and Haselböck1998) provide a combination of both techniques by lifting constraints and variables to a metalevel, where generative constraints describe the valid CSP instances. Generative constraints can be seen as constraint schemata that drive the solving process by introducing fresh CSP variables and constraints as new components are added to the configuration. Therefore, GCSPs can adapt the problem size and structure during the solving process based on the information contained in the partial solution.
A GCSP is characterized by the set of available component types, their attributes, and a set of generative constraints: let T be the set of available component types and let A be the set of attribute names.
Definition 1 (GCSP)
A GCSP is a tuple (X, Γ, C, P), where X is a set of metavariables, Γ is a set of generative constraints over X, C is an infinite set of constraint variables representing component instances, and P = {C.a | C ∈ C, a ∈ A} is an infinite set of constraint variables representing the components' attributes. Each component variable C ∈ C may be assigned a value from T, whereas an attribute variable P ∈ P may take its value from C (if P represents a connection to another component) or a numeric or string value (if P represents a simple attribute).■
Each variable C ∈ C may be active or inactive: it is active if C corresponds to a component that is part of the solution of a configuration problem, and inactive otherwise. The same holds for variables in P.
A generative constraint is a logical implication of the form Φ ⇒ Ψ over variables in X, where Φ represents the precondition of the constraint and Ψ specifies the variables and constraints that are introduced into the CSP if Φ holds for some instantiation of X in C. Typically, Φ is used to express that constraint Ψ is applicable only for particular component types, whereas Ψ enforces the requirements on attribute values and connections. Free variables in generative constraints are implicitly universally quantified.
A configuration problem confines a GCSP to those CSPs that satisfy the configuration goal, which is expressed by an initial set of components and constraints that must be met.
Definition 2 (configuration problem)
A configuration problem is a CSP 〈V, Δ〉, where V ⊆ C ∪ P represents the initially active variables, and Δ contains a set of (generative) constraints over V that express the initial configuration and the desired configuration goal.■
A solution to a configuration problem R is given by an assignment of values to all CSP variables in R such that all constraints in R are satisfied and all instances of generative constraints over variables in R are satisfied.
To find a solution, all generative constraints ϕ ⇒ ψ ∈ Γ are instantiated with variables in V to check if ϕ is satisfied. If an instantiation of a generative constraint is created, the CSP is extended with the variables and constraints in ψ. This procedure repeats until no more constraints and variables can be instantiated. A solution to the configuration problem has been found if all possible instantiations of generative constraints have been applied and a consistent complete assignment of values to variables in V has been found. Otherwise, the procedure backtracks and selects alternative instantiations of generative constraints.
Consider the example where a software component is to be implemented as part of a work process. Among other things, this activity requires the software design specification, a developer, and suitable computing resources. Assume that the component types are T = {ImplementationActivity, SourceCode, SWDeveloper, DesignSpec, Workstation}, and let A = {C, D, S, W}. Here the attributes model connections between components: C refers to a source file, D models the software developer, S the specification, and W the assigned workstation to be used for development. The GCSP representing this domain contains an infinite number of component variables c₁, c₂, …, each of which may be assigned a component type from T. Similarly, the set of property variables is given by P = {c₁.C, c₁.D, …, c₂.C, …}, each with domain {c₁, c₂, …}.
Generative constraints describe the relationships and restrictions between component and attribute variables. For example, to assert that each ImplementationActivity component must be connected with a design specification (a component of type DesignSpec) via its S attribute, a generative constraint
$$X = \text{ImplementationActivity} \wedge X.S = Y \Rightarrow Y = \text{DesignSpec} \tag{1}$$
is asserted in Γ. Here, X and Y denote metavariables that will be instantiated with variables in C. Note that the domain of X.S is the set of component variables; therefore, the generative constraint establishes the structure of the instance-level CSP dynamically. Once instantiated, the generative constraint manifests as an ordinary CSP constraint, asserting that, for some CSP variable cᵢ, the property variable cᵢ.S must refer to another component variable cⱼ whose value is DesignSpec. Assuming that similar generative constraints are defined for DesignSpec and the remaining component types, further constraints will be created that restrict the values of cⱼ's property variables and neighbors. A complete configuration is then given by a set of component variables together with instantiated generative constraints such that all instantiations of generative constraints have been formed and are consistent.
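The expand-until-fixpoint behavior described above can be illustrated with a small sketch. The representation is our own simplification (components as a dict from variable names to types, generative constraints as Python functions); it is not the authors' implementation, and backtracking over alternative instantiations is omitted:

```python
def expand(components, generative_constraints):
    """Instantiate generative constraints until a fixpoint is reached.

    components: dict mapping CSP variable name -> assigned component type.
    Each generative constraint is modeled as a function that inspects the
    current components and returns (new_vars, new_constraints).
    """
    constraints = set()
    while True:
        changed = False
        for gc in generative_constraints:
            new_vars, new_cons = gc(components)
            for var, typ in new_vars:
                if var not in components:
                    components[var] = typ
                    changed = True
            for con in new_cons:
                if con not in constraints:
                    constraints.add(con)
                    changed = True
        if not changed:
            return components, constraints

def needs_design_spec(components):
    """Generative constraint from the running example: every
    ImplementationActivity must be connected, via its S attribute,
    to a component of type DesignSpec."""
    new_vars, new_cons = [], []
    for var, typ in list(components.items()):
        if typ == "ImplementationActivity":
            spec = var + "_spec"  # fresh variable for the specification
            new_vars.append((spec, "DesignSpec"))
            new_cons.append((var + ".S", "=", spec))
    return new_vars, new_cons

comps, cons = expand({"c1": "ImplementationActivity"}, [needs_design_spec])
# comps now also contains c1_spec of type DesignSpec, and cons records
# the ordinary CSP constraint c1.S = c1_spec.
```

The second pass over the generative constraints produces nothing new, so the loop terminates: the introduced DesignSpec component does not itself match the precondition.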
3.3. Constraints for process configuration
In an attempt to extend the configuration approach from physical systems to processes, constraints associated with the entities in the process metamodel must be identified. The constraints will be used by the configuration engine to assess the validity of a (possibly incomplete) candidate process and to guide the search for admissible process variants.
In order to obtain a formal model suitable for automated configuration, the process metamodel and its associated constraints must be rephrased in terms of components and connections. The process entities in Figure 3 form the component types, and their attributes translate into attributes of the corresponding component type. Associations in the metamodel are also represented as attributes. We adopt the model introduced by Mailharro (Reference Mailharro1998) to represent set-valued attributes and relations: associations with cardinality greater than 1 appear as multiple instances of the same attribute. For example, a Scope entity may be associated with multiple subentities via its entities association. An instance of the Scope entity may have multiple entities attribute instances, each referring to a different Entity instance. Cardinality constraints restricting the number of associations are also an integral part of process modeling. In the following sections, Mailharro's work on direct associations between components is extended to generative constraints formalizing the reachability of components within a process.
Table 1 lists the predicates and operators permitted in generative constraints. Although other connectives may be introduced, this set was found to be sufficient to express all constraints used when applying the approach to the large-scale company-wide Siemens reference process described in this article. Following an established constraint notation (Stumptner et al., Reference Stumptner, Friedrich and Haselböck1998), the term T(C) expresses that metavariable C must refer to an entity of type T, or a subtype thereof. Expression C.A refers to an attribute A of a component represented by variable C.
Table 1. Predicates and operators for generative constraints
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626161240-79805-mediumThumb-S0890060410000594_tab1.jpg?pub-status=live)
Note: C, C₁, and C₂ denote metavariables in generative constraints; E, E₁, and E₂ represent logic expressions; N₁ and N₂ represent numeric expressions; T is an entity type name; and V and A denote a variable and an attribute identifier, respectively.
The configuration of processes poses additional challenges related to expressing properties of paths and subprocesses that may not be known precisely at the time the constraint is written. Different types of constraints can be distinguished based on the locality of affected process entities.
3.3.1. Entity and attribute constraints
Entity constraints affect only a single entity or a local scope within the entire process. Such a constraint restricts only the values of attributes of entities that can be reached from a single entity by navigating a fixed path along the relations between entities in the process model. This type of constraint is most often used to specify value range restrictions on attributes.
In the simplest case, only the attributes of a single process entity are affected by a constraint. For example, assume that entity ExecutableElement has an attribute duration that holds the length of the time span during which the element will be executed, and that the duration must always be a nonnegative number. This constraint can be expressed as a generative constraint:
$$\text{ExecutableElement}(E) \Rightarrow E.\text{duration} \ge 0 \tag{2}$$
Similar to the constraints in the work process configuration example given earlier, ExecutableElement(E) acts as a type test that determines when the consequent of the constraint will apply. In this example, the constraint applies to all constraint variables that model an instance of entity ExecutableElement. Type hierarchies and inheritance relationships in the metamodel are taken into account implicitly; thus, the constraint also applies to all instances of subtypes.
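A minimal sketch of how such a type test can respect the inheritance hierarchy; the subtype table and flat entity encoding are our own illustrative assumptions, not the article's implementation:

```python
# Illustrative subtype table for part of the metamodel of Figure 3.
SUPERTYPE = {
    "Activity": "ExecutableElement",
    "HumanActivity": "Activity",
    "AutomaticActivity": "Activity",
    "Event": "ExecutableElement",
    "ExecutableElement": "Element",
}

def is_a(entity_type, wanted):
    """Type test T(C): true if entity_type is `wanted` or a subtype of it."""
    while entity_type is not None:
        if entity_type == wanted:
            return True
        entity_type = SUPERTYPE.get(entity_type)
    return False

def check_duration(entity):
    """The duration constraint from the text: if the entity is an
    ExecutableElement (or any subtype), its duration must be nonnegative."""
    if is_a(entity["type"], "ExecutableElement"):
        return entity.get("duration", 0) >= 0
    return True  # the constraint does not apply to other entity types
```

Because the type test walks the hierarchy, an entity of type HumanActivity with a negative duration is rejected even though the constraint mentions only ExecutableElement.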
Constraints reflecting project-specific restrictions on the entity types that may appear in a process also belong to this category. Certain activities and flows can be excluded from a process by type constraints and constraints on attribute values. For example, the following constraint prohibits all activities with a label containing the word “Mechanical.”
$$\text{Activity}(C) \Rightarrow C.\text{label} \neq \text{"*Mechanical*"} \tag{3}$$
A comprehensive domain model for reasoning about the properties of activities and artifacts can be incorporated in the same way if available. Although the given example constraint may seem overly simplistic and error-prone, the controlled language used in the definition of the reference process resulted in sufficient precision when applying the approach to solve the process adaptation problem in the Siemens Process Framework (SPF) as described later in this article.
3.3.2. Association constraints
Constraints that govern the valid connections between entities belong in this category. Unlike the previous example, such a constraint may refer to the properties of neighboring components by navigating via relations to connected entities. For example, the requirement that only a HumanResource element can be responsible for the execution of an activity can be expressed as follows:
$$\text{Activity}(A) \wedge \text{ResponsibleRA}(RA) \wedge RA.\text{source} = A \wedge RA.\text{target} = R \Rightarrow \text{HumanResource}(R) \tag{4}$$
Here, the dot notation is first used to navigate from the ResponsibleRA to the connected Activity entity, and is used again to constrain the possible values of the target attribute of the connecting entity. In this case, the target is restricted to be of type HumanResource (or a subtype thereof).
3.3.3. Cardinality constraints
The same mechanism can be used to enforce cardinality constraints for relations between entities. For example, the following generative constraints express that every ExecutableElement must be related to a resource via exactly one instance of the ResponsibleRA relation:
$$\text{ExecutableElement}(E) \Rightarrow \exists R : \text{ResponsibleRA}(R) \wedge R.\text{source} = E \tag{5}$$

$$\begin{aligned}
&\text{ExecutableElement}(E) \wedge \text{ResponsibleRA}(R_1) \wedge \text{ResponsibleRA}(R_2)\\
&\quad \wedge R_1.\text{source} = E \wedge R_2.\text{source} = E \Rightarrow R_1 = R_2
\end{aligned} \tag{6}$$
However, cardinality constraints can be expressed more succinctly by using set reification, denoted as ◯V : E, in conjunction with the set cardinality operator ‖·‖. Set reification collects all values bound to variable V in a model of a given logic expression E into a multiset. Variable V must appear free in E and is implicitly universally quantified. This operation is commonly known as collect in other object-oriented constraint languages (Object Management Group, 2006). This connective is convenient for applying predicates to sets of selected entities. Using the reified notation, the constraint can be expressed more naturally:
$$\text{ExecutableElement}(E) \Rightarrow \left\| \bigcirc R : (\text{ResponsibleRA}(R) \wedge R.\text{source} = E) \right\| = 1 \tag{7}$$
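Operationally, the reified cardinality constraint amounts to counting the matching flow instances per element. A simplified sketch (the flat entity and flow encodings are our own, and subtype handling is omitted):

```python
def responsible_count_ok(entities, flows):
    """Check that every executable element is the source of exactly one
    ResponsibleRA flow. Entities are (id, type) pairs; flows are
    (type, source, target) triples."""
    for eid, etype in entities:
        if etype == "ExecutableElement":
            # the multiset collected by the reification, reduced to its size
            n = sum(1 for ftype, src, _ in flows
                    if ftype == "ResponsibleRA" and src == eid)
            if n != 1:
                return False
    return True
```

Both a missing assignment and a duplicate assignment violate the count of exactly one, mirroring the conjunction of the existence and uniqueness constraints above.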
3.3.4. Path constraints
Restrictions that relate to the properties of the entities in a path of a process require a different modeling approach. Path constraints arise when relationships between entities that are distant from each other must be enforced. For example, a reference process constraint that every software component must have undergone testing before it can be integrated into larger units can be informally expressed as a path constraint:
In every feasible process execution from an Implement Component activity to an Integrate Component activity, there must be a Perform Test Component activity. (CONS)
Although it is easy to see that this constraint is satisfied in the example process shown in Figure 1, the constraint may affect different entities in other processes. For example, if there were additional entities between “Implement Component” and “Perform Test Component,” the constraint would have to propagate over those components. Such scenarios arise easily in process adaptation as processes are merged and restructured to accommodate resource constraints and scheduling conflicts. In general, the entities involved in a path constraint may not be known at the metamodel level, and the exact path connecting affected entities may not be known at the time the constraint is formulated. Instead, a (partial) process instance is required to determine the concrete set of entity instances that form the connecting paths. In the configuration of physical systems, one can introduce an additional attribute to express aggregate properties, such as the total voltage of a battery compartment instead of dealing with individual batteries. However, the same principle cannot easily be applied in the process domain, where activities of unrelated subprocesses may be interleaved freely. Many attributes and propagation constraints would be required, yielding a knowledge base that is difficult to maintain. Therefore, this type of constraint cannot be expressed in a GCSP that relies on fixed paths between components. The constraint language must be extended to incorporate path constraints with a dynamic scope that is determined from a concrete process instance. Once a concrete process is available, the execution paths and their components can be determined easily, and path constraints can be instantiated and evaluated.
In order to evaluate a path constraint the affected entities must be determined. This computation is essentially a reachability problem defined on the process graph, where constraints specify properties that must be true in some execution state in the future (or past) relative to another state. This problem has been studied thoroughly in the field of modal logic and its application to automated verification of software and hardware systems (Clarke et al., Reference Clarke, Grumberg, Jha, Lu and Veith2003).
Computing all reachable states induced by a given process model is computationally difficult (Jensen, Reference Jensen1997). More importantly, this can only be done once a concrete process model is available for analysis, which is not usually the case when the configuration knowledge base is built, because generative constraints should be written to be independent of any particular example configuration. The differences between the structural process model and its reachable state space may make it difficult for users who are familiar with EPCs but are not experts in formal logic to write appropriate constraints. Process algebras (Milner, Reference Milner1990) may align more closely with parallel processes but may be difficult to use for nonexperts, as algebraic specification languages, the EPC framework, and constraint-based techniques all follow different modeling paradigms.
The discrepancy between representations of structure and transitions between reachable states has profound implications for writing path constraints. Writing temporal logic formulas to verify particular reachability properties requires us to know the branching behavior of the concrete process, where the execution is split into multiple parallel branches, and where the branches rejoin. For example, in an attempt to formalize constraint (CONS) the formula
$$\begin{aligned}
&\text{Activity}(I) \wedge I.\text{label} = \text{"Integrate Component"} \Rightarrow\\
&\quad \mathbf{EO}\bigl[\text{Activity}(T) \wedge T.\text{label} = \text{"Perform Test Component"} \wedge{}\\
&\qquad \mathbf{EO}\bigl[\text{Activity}(M) \wedge M.\text{label} = \text{"Implement Component"}\bigr]\bigr]
\end{aligned} \tag{8}$$
could be devised. Here, the modal operator EO (Latvala et al., Reference Latvala, Biere, Heljanko and Junttila2005) is used to denote that a control-flow path exists where, in the past, its argument proposition is true at least once. Note that the constraint is adequate for the process example in Figure 1, but may not be strong enough for other processes with different structure and other instances of the same activity types. The fundamental problem is that the transition relation underlying the modal operators can only be determined once the branching nodes in the process are known. A formula specific to that structure can then be devised. Unfortunately, this constraint may also apply in different process variants, where it may have unintended effects. For example, if the parallel split and join nodes in Figure 1 were replaced with decision and merge nodes (this alternative may arise because of an erroneous manual change), the constraint would still apply but would no longer enforce that Integrate Component is always preceded by implementation and testing.
We circumvent this problem by adopting a restricted language to express path constraints that is based directly on the process structure and not on its corresponding state representation. Our language is based on the observation that many interesting path properties can be formalized based on the concept of dominance (Cytron et al., Reference Cytron, Ferrante, Rosen, Wegman and Zadeck1991).
The binary dominance relation describes the relation between process entities with respect to the set of all possible process executions. It is usually defined based on the Control Flow Graph (Muchnick, Reference Muchnick1997), a representation of the possible sequential execution behaviors of a program. Informally, an entity A dominates an entity B if and only if, in all process executions from the start element to the end element, an execution of A precedes every execution of B. For example, activity Develop Design Spec Component dominates activity Implement Component in Figure 1. The dominance relation defined on the inverse control flow graph is called postdominance. An entity A postdominates entity B if and only if all process executions from B to the end of the process subsequently pass through A.
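Dominance can be computed directly from the graph structure with the standard iterative fixpoint, without enumerating executions. A sketch under the usual control-flow-graph assumptions (single start node; the node and edge encodings are our own):

```python
def dominators(nodes, edges, start):
    """Dominator sets via the standard iterative fixpoint: a node
    dominates itself, and dom(n) = {n} united with the intersection
    of dom(p) over all control-flow predecessors p of n."""
    preds = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
    dom = {n: set(nodes) for n in nodes}  # start pessimistically large
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            if preds[n]:
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            else:
                new = {n}  # unreachable from start; only self-dominated
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

For a diamond (start→a, start→b, a→c, b→c), neither a nor b dominates c, but start does; running the same procedure on the reversed edges from the end node yields postdominators.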
Dominance and postdominance can be generalized to a set of elements:
Definition 3 (domination by a set)
Let A = {A₁, …, Aₙ} be a set of entities, and let B be an entity in a sequential process P. Set A dominates B if and only if, in all executions from the start element to the end element, an execution of an entity in A precedes every execution of B.■
Postdominance can be generalized analogously. For notational convenience, the set braces will be omitted for singleton sets in the remainder of this article.
The classic definitions of control flow graph, dominance, postdominance, and their extensions to sets assume a sequential process. We extend these definitions to concurrent processes as follows:
Definition 4 (sequentially implied dominance)
Let P be a process model and let P′ be the projection of P onto subtypes of entities ExecutableElement and ControlFlow. That is, P′ is the subprocess of P where all other elements have been removed. Let further A be a set of entities and B be an entity in P′. Set A s-dominates B if and only if A dominates B in all sequential executions admitted by P′.■
The projection of P onto its executable elements is the smallest process that admits the same set of executions as the original process. The s-postdominance relation can be defined analogously.
For example, the process definition in Figure 1 admits 11 possible sequential executions (split and join nodes are ignored for simplicity). In all executions, activity Implement Component is executed before Perform Test, hence the former s-dominates the latter. Activity Create IP Rights File does not s-dominate Perform Test, as there is a possible execution sequence where testing is performed before any intellectual property is documented. In fact, there are 3 such sequences.
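The semantics of Definition 4 can be checked by brute force on small examples by enumerating all sequential executions (topological linearizations) of the projected process. This is only a reference semantics for fully parallel fragments, written under our own graph encoding, not the structural inference the article relies on:

```python
from itertools import permutations

def s_dominates(a, b, nodes, edges):
    """Check s-dominance of `a` over `b` by enumerating every sequential
    execution admitted by the control-flow edges. Exponential in the
    number of nodes; suitable only for tiny examples."""
    def respects(order):
        pos = {n: i for i, n in enumerate(order)}
        return all(pos[u] < pos[v] for u, v in edges)

    for order in permutations(nodes):
        if respects(order) and order.index(a) > order.index(b):
            return False  # found an execution where b runs before a
    return True
```

With nodes {impl, test, ip} and the single edge impl→test, impl s-dominates test, but ip does not: some valid interleaving runs the test before the intellectual property is documented, matching the Create IP Rights File example above.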
The use of s-dominance for specifying process constraints allows the process modelers to specify constraints in terms of the process structure, whereas the evaluation engine will be responsible for checking that the relation is satisfied in all possible sequential executions. Although this operation is potentially more expensive than evaluating simple component constraints, dominance can be inferred from the process structure (Cytron et al., Reference Cytron, Ferrante, Rosen, Wegman and Zadeck1991). No exhaustive simulation of the possible executions is necessary.
The constraint language for specifying generative constraints has been extended to include the s-dominance and s-postdominance relations and set reification: s-dominance is denoted by A ◃ B, s-postdominance by A ▹ B, and set reification by ◯V : E. Table 1 lists the predicates and operators permitted in generative constraints. Other connectives may be introduced; however, these operations were sufficient to express all constraints used in our case study.
The constraint that implementation activities must precede testing, which must precede integration, can now be expressed in the extended constraint language:
$$\begin{aligned}
&\text{Activity}(M) \wedge M.\text{label} = \text{"Implement Component"} \wedge \text{Activity}(T) \wedge T.\text{label} = \text{"Perform Test Component"}\\
&\quad \wedge \text{Activity}(I) \wedge I.\text{label} = \text{"Integrate Component"} \Rightarrow M \triangleleft T \wedge T \triangleleft I
\end{aligned} \tag{9}$$
The use of the s-dominance relation simplifies the constraint such that no modal operator and nesting of temporal expressions is necessary.
3.3.5. Scope constraints
Dominance constraints can also be used to restrict the scope of generative constraints to subprocesses. For example, certain quality assurance activities and milestones related to component development must not be bypassed by any path through the development process. By using dominance constraints, the scope of a generative constraint can be reduced to the subregion or subprocess that is relevant to such a path constraint. A conjunction of s-dominance and s-postdominance can be used to isolate a region that is delimited by unique entry and exit elements. For example, the requirement that, in each subprocess delimited by entities Implement Component Decided and Component Implemented, activity Create IP Rights File must be executed can be formalized as follows:
$$\begin{aligned}
&Activity(S) \wedge S.label = \text{``Implement Component Decided''} \wedge Activity(E) \wedge E.label = \text{``Component Implemented''}\\
&\quad \Rightarrow \exists P : \bigl(S \lhd P \wedge P \rhd E \wedge P \lhd E \wedge Activity(P) \wedge P.label = \text{``Create IP Rights File''}\bigr)
\end{aligned}\tag{10}$$
The dominance constraints S ◃ P ∧ P ▹ E confine P to match only elements that are inside a region that can only be entered via S and only be exited via E. The third constraint P ◃ E states that in all executions where E is executed, P must be executed before E. Hence, P cannot be bypassed.
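s-postdominance can be evaluated by the same style of fixpoint computed against the flow direction: a node's postdominator set is itself plus the intersection of its successors' sets. A minimal sketch, again with illustrative names:

```python
def postdominators(succ, exit_node):
    """Postdominator sets of a flow graph by fixpoint iteration.

    `succ` maps each node to its list of successors; `exit_node` is the
    unique exit node.  Representation is illustrative.
    """
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pdom = {n: set(nodes) for n in nodes}     # start from the full set
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:                            # iterate to a fixpoint
        changed = False
        for n in nodes - {exit_node}:
            ss = [pdom[s] for s in succ.get(n, [])]
            new = ({n} | set.intersection(*ss)) if ss else {n}
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom
```

Reading P ▹ E as "E postdominates P", as in the region constraint above, membership tests over the dominator and postdominator sets suffice to evaluate S ◃ P ∧ P ▹ E.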
In the previous discussion we have implicitly assumed that there is only one instance of each activity type in a process. If multiple instances may be present, additional constraints must be introduced to ensure that the entry and exit elements of each subprocess match. We resort to constraints relating the entities of a subprocess to its unique Scope entity. The same principle can also be used to enforce that all activities in a subprocess share a common property. For example, all development activities in a subprocess should relate to the same work product:
$$\begin{aligned}
&Scope(S) \wedge Activity(A1) \wedge A1.parent = S \wedge OutputIF(F1) \wedge F1.source = A1\\
&\quad \wedge Activity(A2) \wedge A2.parent = S \wedge OutputIF(F2) \wedge F2.source = A2\\
&\quad \Rightarrow \exists W : WorkProduct(W) \wedge F1.target = W \wedge F2.target = W
\end{aligned}\tag{11}$$
3.3.6. Data flow constraints
Dominance and postdominance provide a natural means to relate entities that create artifacts to those that use them. Constraints are associated directly with the artifacts and flows rather than with the activities that create or use an artifact. Generative constraints associated with an artifact entity can navigate its associated flow entities to reach the activities that create and use the artifact. For example, the invariant that an artifact must be created before it can be used as input to an activity is modeled as follows:
$$\begin{aligned}
&Artifact(A) \wedge InputIF(IF) \wedge IF.source = A \wedge IF.target = U \wedge Activity(U)\\
&\quad \Rightarrow \exists C : \exists OF : \bigl(Activity(C) \wedge OutputIF(OF) \wedge OF.source = C \wedge OF.target = A \wedge C \lhd U\bigr)
\end{aligned}\tag{12}$$
The constraint expresses that for each activity that requires an artifact, there must be an activity that creates the artifact and dominates the activity using the artifact. If an artifact can be created by one of multiple activities, the constraint can be revised to
$$\begin{aligned}
&Artifact(A) \wedge InputIF(IF) \wedge IF.source = A \wedge IF.target = U \wedge Activity(U)\\
&\quad \Rightarrow \exists Cs : \bigl(Cs \subseteq \bigcirc C : (Activity(C) \wedge OutputIF(IF') \wedge IF'.source = C \wedge IF'.target = A) \wedge Cs \lhd U\bigr)
\end{aligned}\tag{13}$$
The constraint states that for any activity U using an artifact A there must be a set of activities Cs that supply the artifact and that together dominate its use. Therefore, no execution may exist where U is reached without A being created first. However, it may not be known precisely which activity will create A when the process is executed.
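The set-dominance test Cs ◃ U can be decided without enumerating executions: Cs dominates U exactly when removing the nodes in Cs disconnects U from the entry. A minimal sketch, assuming an illustrative adjacency-list graph:

```python
def set_dominates(succ, entry, cs, u):
    """True iff every path from `entry` to `u` passes through `cs`.

    Equivalent formulation: after deleting the nodes in `cs`, `u` is no
    longer reachable from `entry`.  Illustrative sketch only.
    """
    if u in cs or entry in cs:
        return True
    seen, stack = {entry}, [entry]
    while stack:                   # depth-first search avoiding cs
        n = stack.pop()
        if n == u:
            return False           # found a path that bypasses cs
        for s in succ.get(n, []):
            if s not in seen and s not in cs:
                seen.add(s)
                stack.append(s)
    return True
```

In a diamond where an artifact is created on either branch, the two creators together dominate the use even though neither does individually.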
3.4. Solution search
Configuration systems in some domains can operate largely without human intervention; however, fully automatic process configuration is difficult to achieve. Although constraints on process structure and some domain-specific aspects can be exploited, many possible configurations remain for large processes. While additional formalization of process entities would eliminate some candidates, the effort required to build detailed specifications of activities precludes this approach in practice. For example, EPC models specify possible control and data flow between process elements, but do not usually include specifications of the detailed behavior of functions or the structure and meaning of artifacts. Although more formal models may exclude invalid process candidates that would be admitted by a simpler model, formalizing the missing knowledge is time consuming and often requires considerable expertise in formal languages. For these reasons, process configuration as presented in this article abandons the goal of obtaining all optimal configurations and adopts semi-interactive heuristic search procedures to compute solutions.
The overall search for processes proceeds as outlined in Figure 4: starting with a copy of the reference process as initial configuration, the requirements and constraints are specified by the user. In addition, manual changes to the initial process can be made in an ad hoc manner (e.g., through a graphical editor). When changes are made, the configurator evaluates the generative constraints on the candidate process and highlights violated constraints and their associated process entities. Subsequently, heuristic search is applied to resolve constraint violations. The resulting candidate process is presented for validation to the user, who may request further changes and recommence the search. This process repeats until a valid process that is consistent with all constraints is obtained.
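The loop of Figure 4 can be sketched as follows. The callable parameters stand in for the graphical editor, the heuristic search, and the user's acceptance decision; all names and signatures are illustrative, not the tool's actual API.

```python
def evaluate(constraints, process):
    """Constraints are predicates over a process; return the violated ones."""
    return [c for c in constraints if not c(process)]

def configure(reference, constraints, edit, accepts, repair):
    """Semi-interactive configuration loop (illustrative API).

    `edit` applies the user's ad hoc changes, `repair` stands in for the
    heuristic search, and `accepts` is the user's validation decision.
    """
    candidate = dict(reference)        # start from a copy of the reference
    while True:
        candidate = edit(candidate)    # ad hoc manual changes by the user
        violated = evaluate(constraints, candidate)
        if violated:                   # heuristic search resolves violations
            candidate = repair(candidate, violated)
        if not evaluate(constraints, candidate) and accepts(candidate):
            return candidate
```

Termination is left to the user in this sketch: if violations remain or the candidate is rejected, the loop returns to manual editing, mirroring the semi-interactive workflow.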
Fig. 4. The interactive process configuration workflow. [A color version of this figure can be viewed online at journals.cambridge.org/aie]
For clarity of presentation we omit much of the specific technical implementation details and optimizations and focus on the overall conceptual mechanism. For example, change operations can be applied incrementally by maintaining the differences to a base model rather than copying the entire model each time the configuration is modified.
The initial candidate process is obtained from a copy of the generic reference process and its constraints, amended with the constraints describing project-specific requirements. For example, the process fragment in Figure 5 can be obtained from the reference process. Assuming the process will be tailored to a pure software development project, a generative constraint can be added that excludes all development activities not related to software. Resource and scheduling constraints can be added to direct the assignment of resources to activities. Furthermore, facts specifying the available personnel and their roles can be added to the process. Manual operations can be applied to create, modify, and delete process entities. Changes are recorded and constraints may be added to ensure that the subsequent heuristic search respects the manual modifications.
Fig. 5. An example process fragment. Activities and resource assignments highlighted in gray are deleted when the process is tailored to a pure software development project.
Once the initial configuration and constraints have been established, automated search procedures are applied to resolve constraint violations. Because exhaustive search is prohibitively expensive, heuristics are applied to find good solutions. When applying the framework to the Siemens reference process, the quality of a configuration is indicated by the inverse of the number of constraint violations and the number of process entities. Other organization-specific measures, such as makespan or robustness with respect to delays or resource availability, can also be applied to guide the search. In our case study, minimizing the number of constraint violations while minimizing the process size as a secondary measure was sufficient to obtain “satisfying” configurations that improved upon the best processes that had been created manually. Optimizing makespan and other process measures was not a primary concern in our study.
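The lexicographic quality measure used in the case study (number of violations first, process size second) can be sketched as a sort key; the process representation here is an illustrative placeholder.

```python
def quality(process, constraints):
    """Lexicographic quality of a candidate configuration.

    Fewer constraint violations rank first, fewer process entities
    second; lower tuples are better.  Representation is illustrative.
    """
    violations = sum(1 for c in constraints if not c(process))
    return (violations, len(process['entities']))
```

Candidates can then be ranked with `sorted(candidates, key=lambda p: quality(p, constraints))`; organization-specific measures such as makespan could be appended as further tuple components.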
Change operators are applied to the process in order to resolve constraint violations. Our framework incorporated the basic operations to add and remove process elements in a path, add and remove flows between elements, duplicate subprocesses, and assign resources and artifacts to elements. These operations closely align with the operations that had been carried out manually by project managers. The selection of operators is guided by heuristics and restricted by constraints reflecting the modifications made by project managers. Numeric and other constraints are solved by standard CSP techniques.
For semi-interactive configuration, heuristics that apply only minimal alterations to the existing configuration are preferred to extensive but “optimal” changes, because large modifications tend to be difficult to trace by users. Currently, the framework adopts the following generic principles to direct search; organization-specific heuristics can be implemented if desired. Heuristics are listed in order of preference.
1. Promoted pruning: The predominant operation in process tailoring is the removal of unnecessary entities from the generic reference process. The removal of one entity may propagate throughout the process and yield smaller processes with fewer (violated) constraints. For this reason, deletion of entities is considered the preferred resolution for constraint violations, unless the entity was introduced manually by the user.
2. Minimum change: Dually, the creation of entities is only considered if no other resolution yields a valid configuration. This strategy, in conjunction with constraints on the number of created entities, favors small processes and prohibits infinite regress of the search procedure in subprocesses where no consistent solution exists (Mayer et al., Reference Mayer, Thiagarajan and Stumptner2009).
3. Minimal violation: If multiple repair candidates apply, select one with the least number of remaining violated constraints.
4. Locality of changes: A large number of possible alternatives may exist for routing data and control flow in a partially specified process. Although most alternatives may be admissible given the generative constraints, flows between distant entities make it hard to comprehend processes. Furthermore, artifacts tend to be manipulated in closely related activities. To account for this observation, flows between entities with small distance are preferred. Distance is measured as the minimum number of control flow steps between two entities.
5. Deferred synchronization: Frequent use of synchronization elements yields processes with little concurrent execution. Because the absence of parallel activities may impact future resource assignment, the introduction of synchronization elements is delayed as long as possible (but not beyond the scope of the current subprocess). This heuristic favors parallel branches but respects locality of control flow.
Using these heuristics, the neighborhood of the candidate configuration is explored in order to find a revised, valid process. Because the heuristics may not be able to resolve all constraint violations, further user interaction may be necessary.
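The preference order of these heuristics can be sketched as a ranking over repair candidates. The candidate representation (a dict with `kind`, `distance`, and an `apply` function) is illustrative only, not the framework's data model.

```python
def choose_repair(candidates, process, constraints):
    """Pick the preferred repair operation under the heuristic ordering:
    deletions before additions (promoted pruning / minimum change),
    then fewest remaining violations (minimal violation), then smallest
    control-flow distance of any introduced flow (locality of changes).
    """
    def remaining(op):
        # number of constraints still violated after applying the repair
        return sum(1 for c in constraints if not c(op['apply'](process)))

    rank = {'delete': 0, 'modify': 1, 'add': 2}   # pruning before creation
    return min(candidates,
               key=lambda op: (rank[op['kind']],
                               remaining(op),
                               op.get('distance', 0)))
```

Ties broken by `distance` reflect the preference for flows between nearby entities; deferred synchronization would enter as a further tie-breaking component.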
If the candidate configuration obtained from the heuristic configuration step satisfies all constraints, the adaptation process terminates. Otherwise, the user is given the option to resolve some of the remaining inconsistencies manually and repeat the heuristic search.
From an organizational point of view, custom adaptation of reference processes may also induce a maintenance problem when changes in the reference process must be applied to a large number of variants. Although reference process maintenance is beyond the scope of this article, techniques from version control systems and configuration can also be applied to resolve this problem. Graph-based differencing algorithms, such as the approach used by Zeller (Reference Zeller2002), can be applied to compare different versions of the reference process and merge differences into the modified processes. Once changes have been applied, constraints can be verified and configuration recommenced if necessary. In this approach, the search for valid configurations that repair violated constraints corresponds to the conflict resolution step in manual version management. Once tailored processes have been revised, running instances of previous process versions may need to be adapted to conform to the new model (Weber et al., Reference Weber, Reichert and Rinderle-Ma2008).
Figure 5 shows a generic reference process fragment that includes activities related to both software and hardware development. For a software project, the hardware development-related aspects are irrelevant and should be removed. Although deleting the relevant entities is trivial in this example, it can be a tedious task in a large process. The constraint-based approach allows the project manager to exclude all hardware-related development activities by asserting a constraint like Eq. (3). The adaptation can then proceed largely automatically, which will result in the removal of the shaded entities in Figure 5. Control flow connectors will be rerouted accordingly. Data flow constraints ensure that artifacts related to the development of mechanical parts will be removed even if they are only indirectly related to the deleted activities. For example, the use of the Mechanic RE Specification document in activity Prove Feasibility of RE will be automatically removed. Constraints prohibiting unnecessary process entities, such as the empty parallel control flows arising after deletion, trigger the removal of the redundant split and join functions.
3.5. Discussion
Constraint formalisms on top of object-oriented models are not new. The Object Constraint Language (OCL; Object Management Group, 2006), an extension of the Unified Modeling Language (UML), also follows this approach. Although arbitrary constraints over object graphs can be expressed in OCL, the constraint language cannot easily capture path constraints where the structure of the specific process configuration is not known. For example, expressing a dominance constraint requires quantification over all paths using the forall operator. However, the knowledge engineer would have to anticipate the expressions to reach the set of relevant branching points in the process for all possible process configurations. Alternative implementations where dominance is simulated using recursion, custom properties and OCL iterators are too restrictive or require complex expressions (Beckert & Trentelman, Reference Beckert and Trentelman2005). Related languages like XPath (Berglund et al., Reference Berglund, Boag, Chamberlin, Fernández, Kay, Robie and Siméon2007) can express reachability and quantification but suffer from the same problems as the approach using OCL's forall operator.
Temporal constraint satisfaction problems have also been applied to reason about the order of execution of activities. Allen's (Reference Allen1983) temporal operators, for example, are sufficient to express partial orderings of tasks along a path, but quantification over execution paths is difficult in the standard constraint formalism if the process structure is not known at the time the constraints are written. Therefore, precedence relationships between artifacts on alternative branches, such as constraint [Eq. (13)], cannot easily be enforced in a generative way on the process metamodel.
Thomas and Fellmann (Reference Thomas and Fellmann2007) combined EPCs with domain-specific models such that activities and related metadata in a process can be formalized. In this approach the entities in an EPC process model are annotated with concepts defined in an EPC metamodel similar to the one presented in this article. Entities in the EPC metamodel are linked with concepts and instances described in a domain-specific ontology. From the annotations the domain- or organization-specific meaning of each process element can be inferred in terms of the underlying ontologies, and ambiguities in the textual labels in the plain EPC model can be resolved. This approach is orthogonal to the configuration and repair aspects addressed in this article, but could be used to introduce additional problem-specific metadata and constraints in the process model.
Egyed et al. (Reference Egyed, Letier and Finkelstein2008) used language-specific constraints to highlight errors in software design models. Software design diagrams in the UML are converted into a CSP where constraints verify the consistency of multiple diagrams representing different aspects of the same software component. From violated constraints the conflicting diagram elements and potential repairs are inferred and ranked using heuristics. The approach shares the semi-interactive repair principle and constraints expressed in OCL, but relies on a metamodel specific to the UML.
Ly et al. (Reference Ly, Rinderle and Dadam2008) validated concrete workflow executions using workflow models annotated with semantic constraints. In this work workflow models are enriched with dependency and exclusion constraints between activities. The constraints are used to verify workflows after ad hoc changes have been made and if running instances of the old workflow can be migrated without conflict to the modified model. This work focuses mainly on verification of manual changes and not on computing tailored work processes. Although domain-specific constraints are used to detect errors in executions, their constraint language lacks expressiveness and configuration and repair aspects are left aside.
Existing workflow adaptation methods predominantly focus on the avoidance of deadlocks and on the transition of running instances after workflows have been changed (Weber et al., Reference Weber, Reichert and Rinderle-Ma2008). In particular, workflow patterns and structural adaptation of process models and their instances in response to manual changes have been studied widely. However, domain-specific models, resources, and constraints are not usually considered. Methods based on pattern recognition and rewriting rely on a predetermined catalog of possible patterns and remedies, which is typically unavailable in ad hoc process configuration scenarios.
Specific executable processes can also be generated by automated planning approaches (Sirin et al., Reference Sirin, Parsia, Wu, Hendler and Nau2004). Although planners use powerful reasoning techniques, detailed models of actions and goals are required which prohibits the direct application of planning algorithms to the work process configuration problem addressed in this article. If available, our approach could also benefit from detailed semantic constraints to improve conflict detection and search heuristics.
4. SIEMENS AG PROCESS INSTANTIATION
Information system engineering and design processes at Siemens AG are nested within a general framework, the Siemens Process Framework (SPF; Fig. 6). Its purpose is to standardize the management and structure of processes. The main components of the SPF are the Reference Process House (RPH), the Roles of individuals and committees in process management, and Process Modeling Methods (Schmelzer & Sesselmann, Reference Schmelzer and Sesselmann2004). The RPH describes the hierarchical structure of Siemens processes. It includes all business processes that cover the relationship to customers as well as internal management processes, such as strategic planning and control, and support processes, for example, the assignment of human resources to roles in a project.
Fig. 6. The structure of the Siemens Process Framework.
The development processes are nested in several levels of abstraction. The RPH provides the structure of the top four levels (levels 0–3), with the RPH as detailed in Figure 6 corresponding to level 0. Processes at level 1 typically specify an abstract sequence of activities, such as Plan, Product management, Define, Implement, Operate, and Phase out. The structure and appearance of lower levels are determined by business units according to their individual needs. Detailed processes are specified as EPCs and FADs (see Fig. 1).
Overall, the product life cycle management part comprises nearly 3800 elements. Although a wide variety of tasks are covered, not all activities apply to all projects, and the generic process framework can usually be reduced significantly.
Adaptation is performed in two steps. First, processes relevant to a project are identified in the reference processes at levels 0 and 1, and irrelevant activities are removed. Second, detailed tailoring is performed in order to create extra activities required by a project, fill roles, and assign resources. This process is performed iteratively over the duration of the project, because the complexity of the reference process and unplanned changes prohibit project managers from forming a comprehensive plan at the start of a project.
To support adaptation, a process editor has been built that integrates with the overall process management systems at Siemens AG. The architecture of the system is shown in Figure 7. The editor manipulates the processes stored in the commercial ARIS system via a custom-built interface. The XML-based XPDL process definition language (WFMC, 2008) is used as data exchange format. The interface allows the project manager to edit and visualize the processes and resources associated with a project. Figure 8 shows snapshots of the user interface for manipulating processes and associated resources. This architecture ensures that the ARIS system remains the central authority for process information and existing project management tools need not be altered, a factor that we considered to be crucial for the adoption of any advanced process adaptation tool. Generative constraints were expressed in a language derived from the OCL (Object Management Group, 2006). This has the benefit that the metamodel and the constraints can be developed and versioned together in a UML tool.
Fig. 7. The process management systems architecture at Siemens AG.
Fig. 8. The graphical user interface of the process adaptation tool. The left image shows the process adaptation interface, and the right image shows the interface to browse and manipulate activities and resources.
We constructed the metamodel for the case study to fit the structure of the reference software development processes at Siemens AG, where the processes are stored, retrieved, and displayed using the EPC modeling language. After testing various smaller scenarios using the metamodel with interactive users, a major case study applied the model to create a tailored variant of the reference process for a particular business unit for real-world use. Manual adaptation had been tried and abandoned several times because, even in teams with elaborate review processes, consistency of the adapted process could not be guaranteed.
The metamodel of the reference process was inspired by the EPC process modeling language, but includes additional entities and constraints that are specific to the development practices at Siemens AG. The model was developed in collaboration with the project managers and developers at Siemens AG. The metamodel comprises 33 entity types, including phases (composite activities modeled as subprocesses), milestones, and control flows. Figure 3 shows the structure of the main entities in the model. In addition, 139 generative constraints describe the valid composition of entities and flows into processes. Restrictions derived from the EPC language as well as rules specific to the business unit's development process have been incorporated. The model was developed iteratively, intertwining constraint formulation and evaluation on selected subprocesses, to ensure that the constraints are specific enough to rule out invalid processes while tolerating the variation necessary for effective process customization. Overall, the resulting reference process includes about 3800 process elements and about 14,000 instantiations of generative constraints. Because a significant fraction of Siemens development processes involves embedded software in electronic and mechatronic systems, the process is actually of much wider scope than pure software development, which results in significant variation across business units. However, the size and complexity of the process meant that even seemingly trivial adaptations could require insurmountable effort. As a result, process management had so far been restricted to merely using the reference process as a rough guideline, without adapting it to business unit or project-specific needs as originally envisioned.
Once the model had been built, the reference process could be tailored to suit selected development scenarios. For the initial full-scale application of the framework, the adaptation of the reference process for a software business unit was chosen. Experts at Siemens AG identified 108 activities and 41 work products that had to be removed. These changes triggered additional adaptations that were applied automatically by our tool. The resulting process included 1959 elements, down from 3790 before the adaptation (a reduction of 48%). The execution of this task completed in 221 s of CPU time on a standard PC (excluding the time required to import and export the process from ARIS). Manual creation of this particular process (using project teams and review meetings) had been tried several times and abandoned because of the impossibility of tracing dependencies between process elements and verifying conformance with constraints.
Note that because of the declarative modeling approach, the framework and tool can also be used to validate any given process. Applied to the overall 3800 element Siemens reference process, the tool identified 177 previously unknown constraint violations.
The application to the real-world Siemens problem has demonstrated that automated process configuration has the potential to achieve substantial improvements in process quality while reducing the cost of process maintenance. We anticipate that automated process configuration will yield significant savings due to the reduced time required to tailor processes and the elimination of activities that do not contribute to the project's outcomes.
5. CONCLUSION AND FUTURE DIRECTIONS
The adaptation of large-scale generic processes to the requirements of specific projects is a challenging task where automated tool support has been desired. The process adaptation framework presented in this article extends well-known constraint-based configuration principles to the process domain. The approach generalizes traditional component-oriented configuration techniques into a uniform formal framework for the configuration of entire processes. The framework comprises an extensible metamodel of organization-specific process entities and relations and a declarative constraint language that incorporates constraints on process structure and execution paths. The metamodel in conjunction with the declarative constraint language allows organizations to minimize the effort required to adopt the approach by gradually amending the level of detail that is reflected formally. Generic heuristics are applied to select changes that resolve inconsistencies such that the overall process structure is maintained. Methods or algorithms that implement the approach can be generic or chosen for the specific context.
For the Siemens case, a semi-interactive tool was built that fit within the Siemens process environment, allowing project managers to tailor and validate work processes with respect to a set of project-specific constraints and a given reference process. The approach has helped to identify a large number of errors in development processes that had been manually tailored at Siemens AG. Furthermore, substantially smaller processes have been obtained from the monolithic reference process with very little effort. The efforts involved in creating the initial metamodel, constraints, and tool implementation have been offset by a significant reduction in time required to tailor and validate specific processes.
Given that considerable improvements in the management of development processes at Siemens AG have been demonstrated, we intend to conduct further studies and evaluate our configuration approach on additional processes sourced from different organizations and work process domains. Our primary objective will be to refine the heuristic adaptation mechanisms for common work process patterns and change operations and to further improve the quality of tailored processes.
The investigations in this project have focused largely on structural and flow aspects within a process. Further extensions can be incorporated that reflect additional properties of the domain, such as an explicit representation of time and detailed resource allocation and scheduling. Expanding the declarative representation to incorporate approaches that strive to capture semantics of larger building blocks of the design also opens further research directions. Developing declarative means to specify context-dependent search strategies in order to synthesize efficient implementations would also be desired to yield even more efficient yet flexible tools.
ACKNOWLEDGMENTS
This research was supported by the Australian Research Council (Grant DP0988961) and by Siemens AG.
Wolfgang Mayer is a Lecturer at the University of South Australia. He received his PhD from the University of South Australia and his MS in computer science from the Vienna University of Technology. His research interests include analysis, composition, and diagnosis of process models and software systems. Dr. Mayer's work focuses on model-based reasoning and knowledge representation techniques with applications to fault localization, automated configuration, and data- and workflow integration.
Markus Stumptner is a Professor of computer science at the University of South Australia, where he directs the Advanced Computing Research Centre. He received MS and PhD degrees in computer science from the Vienna University of Technology. Dr. Stumptner's research interests include object-oriented modeling, knowledge representation, and model-based reasoning in areas such as configuration and diagnosis.
Peter Killisperger is a Researcher at Siemens Corporate Research and Technologies in Munich. He received a PhD in information technology from the University of South Australia, an MS in distributed and multimedia information systems from the Heriot–Watt University in Edinburgh, Scotland, and a diploma degree in business informatics from the University of Applied Sciences in Augsburg, Germany. Dr. Killisperger's work focuses on improving information systems development processes.
Georg Grossmann is a Research Fellow at the University of South Australia. His PhD thesis on behavior-based integration of object-oriented information systems was awarded the Ian Davey Research Thesis Prize for the most outstanding research thesis at the University of South Australia. He also received an MS in economics and computer science (jointly from University of Vienna and Vienna University of Technology). Dr. Grossmann's current research interests include business process integration, behavior-based integration of Web services, ontology-driven data integration, and distributed event-based systems.