Intelligent product-gene acquisition method based on K-means clustering and mutual information-based feature selection algorithm

Pan Li; Yanzhao Ren; Yan Yan; Guoxin Wang

doi:10.1017/S0890060419000258


Published online by Cambridge University Press: 08 November 2019

Pan Li: Affiliation:
School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China
Yanzhao Ren: Affiliation:
College of Information and Electrical Engineering, China Agricultural University, Beijing, China
Yan Yan: Affiliation:
School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China
Guoxin Wang*: Affiliation:
School of Mechanical Engineering, Beijing Institute of Technology, Beijing, China
*: Author for correspondence: Guoxin Wang, E-mail: wangguoxin@bit.edu.cn

Abstract

Conceptual design is a key stage of product design and has received increasing attention in recent years. However, this stage is characterized by limited information, large uncertainty, and multidisciplinary aspects. Acquiring conceptual design information therefore increases workload and time cost; it is sometimes difficult to develop novel solutions, and the feasibility of solutions obtained from such limited and uncertain information is difficult to guarantee. Genetics-based design (GBD) is an effective approach to developing novel solutions and improving the reuse of knowledge, which is consistent with the goal of the conceptual design process. Product-gene acquisition is the premise and basis of GBD. At present, there are few reported studies in this area; most existing works are confined to structuring the acquisition process, and there are limited studies on specific implementation techniques. To explore specific implementation technologies for product-gene acquisition, this paper proposes an intelligent acquisition method based on K-means clustering and a mutual information-based feature selection algorithm. The product genes defined in this paper are key product information that determines the nature of the product and influences the conceptual design process. Thus, solutions obtained from them are more feasible than those based on limited and uncertain information. An illustrative example is presented. The results show that the proposed method can achieve intelligent acquisition of product genes to a certain extent. Further, the proposed method will allow designers to quickly search for the corresponding product genes when performing similar functional design tasks.

Keywords

Conceptual design; feature selection; K-means; mutual information; product genes

Type: Research Article
Information: AI EDAM, Volume 33, Special Issue 4: Intelligent Interaction Design, November 2019, pp. 469-483

DOI: https://doi.org/10.1017/S0890060419000258
Copyright: Copyright © Cambridge University Press 2019

Introduction

Conceptual design has received much attention in recent years because of its importance in product design. Specifically, the conceptual design of a product can determine the innovation and quality of the final solution devised to achieve the design objectives. It is generally accepted that conceptual design imposes the fewest constraints on designers and offers the greatest innovation possibilities. Thus, conceptual design can best showcase designer experience, wit, and creativity (Ullman, 1992). Research (Zheng et al., 2015) has shown that this stage determines 70%−80% of the lifecycle cost while itself accounting for approximately 5% of that cost. Further, if decisions made in this stage involve errors, it is difficult to repair those mistakes in the production stage or in other downstream stages (Huang et al., 2013). These factors motivate designers to constantly seek new conceptual design methods to improve product quality, reduce development cost, and shorten the time to market (Srinivasan et al., 2015).

Conceptual design is a process of abstracting target functions from design requirements and using the necessary technical means to obtain structural schemes that meet the target functions; its core process is the function–structure scheme mapping (Pahl et al., 1984), in which functions are the aims of conceptual design and structural schemes are the carriers of functions. However, conceptual design is characterized by a lack of information, considerable uncertainty, multidisciplinary aspects, and a large number of possible alternative solutions (Bucknall and Ciaramella, 2010; Steimel et al., 2013). These characteristics increase the difficulty of product innovation and extend the product design timeframe, specifically in the following three aspects:

1) It is difficult to retrieve and share multidisciplinary design knowledge with multisource heterogeneity.
2) The design information in the conceptual design phase is limited and uncertain. It is difficult to determine which knowledge is the key information affecting product quality and the conceptual design process in the context of the information explosion.
3) Solutions obtained according to limited and uncertain information have little probability of meeting functional requirements, and many iterations are generally required to get a relatively satisfactory product.

Therefore, a method of determining the key design information while expressing it in a unified manner is urgently needed.

One solution to this problem is the introduction of the product-gene concept to conceptual design. A product gene is a standardized information set that determines the essential properties of products and can be passed from parent products to child products. The product-gene concept has been developed with the aim of reusing the key information that impacts the design process by learning from bio-genetic engineering technology (Chen et al., 2005a). Genetics-based design (GBD) is an effective approach to developing novel solutions, which can reduce the blindness of innovation, increase the solution space, and improve the feasibility of the solutions and the reuse rate of knowledge (Feng et al., 2002; Chen et al., 2005a, 2005b; Chen and Feng, 2009). Therefore, introducing the product gene to represent key information and to support the conceptual design process is an effective way to solve the problems caused by multisource heterogeneous and finitely fuzzy information. However, product-gene acquisition has remained at the process structuring stage, and few intelligent algorithms are available to implement this process. To solve this problem, this paper presents an intelligent product-gene acquisition method based on a K-means clustering algorithm and a feature selection algorithm based on mutual information (MI). First, a product-gene definition is provided based on an analysis of the conceptual design process and an analogy between the function expression process and the biological trait expression process. Then, a product-gene coding method based on function elements is presented. Finally, an intelligent method based on K-means and MI-based feature selection is proposed for product-gene acquisition.

Relevant literature

This work primarily covers three areas of knowledge: product genes, functional expression, and machine learning. This section provides a brief review of the literature relevant to these three areas.

Product gene

A product gene is a standardized set of key product information that determines the product nature and that can be passed from parent products to offspring products. Several studies have focused on product genes (Gero and Kazakov, 1998; Gero, 2000; Feng et al., 2002; Luo et al., 2004; Chen et al., 2005a, 2005b, 2006; Tai et al., 2007; Shang et al., 2009; Teng et al., 2010; Reich and Shai, 2012). Design genes associated with space layout planning problems were considered in Gero and Kazakov (1998); the results showed that gene evolution could induce partial decomposition of the layout problems and rapidly identify solutions. Further, the evolved genes represent design features that can be re-used at a later time for a range of similar problems. In addition, a product-gene representation and acquisition method based on a population of product cases was proposed, with an evolution methodology based on the product genes being presented (Tai et al., 2007). In that study, the product gene was defined as a collection of information combined with a function base, principle base, and structure base. The proposed method was reported to be helpful for information formulation, accumulation, and reuse, and for addressing the difficulty of design information acquisition in traditional design. Further, Chen et al. (2005) defined a product gene as an information set consisting of a verb and attributes. Hence, a knowledge base was constructed by extracting product genes from principle solutions via reverse transcription and translation. Then, a genetics-based approach to the solution of conceptual design problems while handling functional relation changes was proposed, with the suggestion that product genes can bridge the gap between relation change-based functions and their corresponding principle solutions. Similarly, a product gene obtained through extraction from a function model was proposed (Feng et al., 2002), being composed of process characteristics and solution characteristics. Then, product-gene mechanisms such as “transcription” and “translation” were presented. Hence, a conceptual design methodology based on inheritance and reorganization of product genes was developed, which is helpful for product principle innovation in conceptual design. A gene engineering-based innovation method was applied to innovate products via artificial differentiation of their virtual chromosomes (Chen et al., 2005). A logically structured process and reverse acquisition of virtual chromosomes according to function requirements was proposed. This method was reported to be helpful for reducing product innovation blindness. Reich and Shai (2012) proposed an interdisciplinary engineering knowledge genome (IEKG) concept, which is parallel to the human genome concept. The IEKG was composed of system genes and method genes, which are helpful for transforming knowledge among many disciplines. User implicit knowledge was considered in the collaborative conceptual design of products, and a feature gene concept was proposed (Luo et al., 2004). Feature genes could be obtained through the division of features, such as shape, function, ergonomics, and manufacturing. Finally, a product-gene definition comprising function and structure genes was presented by Teng et al. (2010), and a product-gene search method was subsequently proposed based on the degree of match between the target product (TP) gene and an existing product gene in the database.

The literature review shows that although significant advances have been reported in the literature, product-gene acquisition has remained at the process structuring stage. Existing studies have only given the acquisition process of product genes, whereas very few studies have considered a feasible intelligent algorithm to implement the process.

Functional expression

Functional expression is, in essence, the transformation from functions to structural schemes in conceptual design. To solve this problem, function modeling is often considered the most useful method, as it will ideally remove designers' biases and generate more complete conceptual solutions (Chakrabarti et al., 2011). To support function synthesis design, various models have been proposed during the last few years. In 1990, AI research scientist Gero (1990) proposed a real-time design thinking prototype based on behavioral reflection theory. He analyzed the influence of the ever-changing design situation on the design process and proposed a scenario-based function–behavior–structure (FBS) model that received widespread recognition in the design community.

Many researchers have attempted to apply and expand this model. Gero and Udo (2004) proposed eight basic design processes in product design based on the FBS model framework and established a situated FBS model that can realize the conceptual expression of a dynamic world. Based on an analysis of relationships among elements in the FBS model, Qian and Gero (2009) established a formulation of design knowledge that provides theoretical support for the analogous design of products. Umeda et al. (2009) established a function expression model based on functions, behaviors, and states. This model can provide support for conceptual design in the analysis and integration stage. Vermaas and Dorst (2007) clarified two problems that existed in the FBS model proposed in Gero (1990) and tried to solve these problems from a philosophical point of view. Using an object-oriented concept, Feng et al. (2002) defined the characteristics of action process, object, environment, and effect in the product design process as a feature vector and presented a vector-based function model. Chen et al. (2005) proposed a model of mapping from an objective function to a behavior principle scheme to solve problems of objective function expression with a changing relationship. However, this model worked only for a separation function and did not consider the behavior–structure mapping process.

In addition, some scholars have used reverse derivation to express functions, which is based on mapping from structures and behaviors to functions. For example, Bogoni (1998) provided a structure–behavior–function (SBF) model to express the mapping of internal attributes of objects between concepts and entities. Since then, a number of scholars have built on this model, producing, for example, the environmentally bound SBF (ESBF) model (Prabhakar and Goel, 1998), an SBF model framework based on ontologies (Ying et al., 2004), and a function expression model based on an input–output stream reference (Kitamura and Mizoguchi, 2003).

Among the above-mentioned function models, the FBS model proposed by Gero is the one most in line with designers' thoughts and product design process and is generally accepted as the most typical functional expression model at present.

Machine learning

Machine learning is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence (AI) and data science (Jordan and Mitchell, 2015). This paper draws on clustering and feature selection algorithms from machine learning; the relevant literature is discussed in the following subsections.

Clustering algorithm

Clustering is the process of dividing elements into multiple categories or clusters based on their similarity (Rodriguez and Laio, 2014). A common method of evaluating the “similarity” between two samples is to calculate the “distance” between them. Note that the use of different clustering algorithms for the same dataset may yield different results. The time costs of different algorithms also vary and are determined by the algorithm characteristics. There are four main types of commonly used clustering algorithms: K-means, DBSCAN, Gaussian Mixture, and BIRCH (Yuan et al., 2017). Among them, the Gaussian Mixture (Srivastava et al., 2017) clustering method has high complexity and is unsuitable for large-scale data processing. For the BIRCH (Saravanan and Srinivasan, 2011) clustering algorithm, the cluster must be spherical as this algorithm utilizes the radius or diameter to control the cluster boundary; thus, for other cases, this algorithm does not work effectively. DBSCAN (Kumar and Reddy, 2016) is one of the few algorithms for which the number of clusters does not have to be set. This algorithm can find many clusters that cannot be found by the K-means algorithm; however, it is not suitable for data with large variations in density or for high-dimensional data. K-means (Macqueen, 1967) is an efficient clustering algorithm that can overcome the disadvantages of the other three algorithms, and it is also suitable for application to high-dimensional large-scale data. However, this algorithm is sensitive to the initialization of the cluster centroids.
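To make the distance-based notion of similarity concrete, the following is a minimal sketch of the standard K-means iteration (assign each sample to its nearest centroid, then move each centroid to the mean of its members). It is an illustration only and not the authors' implementation; the function names, the squared Euclidean distance, the random choice of initial centroids, and the toy data are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` into `k` groups; returns (labels, centroids).

    Initial centroids are k samples drawn at random (an assumption of
    this sketch), so the result can depend on `seed`.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D samples.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because the starting centroids are sampled at random, different seeds can lead to different partitions on less clearly separated data, which is the initialization sensitivity noted above.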

Feature selection algorithm

Feature selection, also called variable elimination, is helpful in understanding data, reducing computation requirements, reducing the effect of the curse of dimensionality, and improving predictor performance (Chandrashekar and Sahin, 2014). Its focus is to select a subset of variables from the input that can describe the input data and provide good prediction results while reducing effects from noise or irrelevant variables (Guyon et al., 2003). To remove irrelevant features, a feature selection criterion is required that can measure the relevance of each input feature to the output class or labels. There are three main feature selection methods: filter methods, wrapper methods, and embedded methods. Filter methods use variable ranking techniques as the principal criteria for variable selection by ordering; the criteria of these methods mainly include the Pearson correlation coefficient and MI (Guyon et al., 2003; Lazar et al., 2012). Wrapper methods use the predictor as a black box and the predictor performance as the objective function to evaluate the variable subset; these methods are broadly classified into sequential feature selection algorithms and heuristic feature search algorithms (Lazar et al., 2012); typical wrapper methods include the tree structure (Kohavi and John, 1997) and genetic algorithm (Rudi and Yaqub, 2015). The main drawback of wrapper methods is the larger number of computations required to obtain the feature subset. Embedded methods automatically perform feature selection during the training of the learner. The objective is to reduce the computation time required for reclassifying different subsets, which is performed in wrapper methods. There are two main embedded methods: one is based on MI (Battiti, 1994; Peng et al., 2005) and the other is based on the weights of a classifier (Keerthi et al., 2014).

Peng et al. (2005) presented a max-relevance and min-redundancy (mRMR) feature selection method and proved that the mRMR criteria are equivalent to the max-dependency criteria if only one feature is selected at a time. This is called the “first-order” incremental search. Moreover, it was shown that the proposed method could significantly improve classification accuracy at lower computational expense.
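As a hedged illustration of MI-based selection, the sketch below estimates the mutual information between two discrete variables from sample counts and then performs a greedy, one-feature-at-a-time search that scores each candidate as relevance minus mean redundancy. The difference form of the score, the function names, and the toy data are assumptions of this example; it is not taken from the paper or from Peng et al.'s code.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical MI (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def mrmr(features, target, n_select):
    """Greedy incremental selection: relevance minus mean redundancy.

    `features` maps a feature name to its list of discrete values.
    """
    remaining = list(features)
    selected = []
    while remaining and len(selected) < n_select:

        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the class label is determined jointly by x and y,
# and x_copy duplicates x, so it is relevant but fully redundant.
features = {
    "x": [0, 0, 1, 1, 0, 0, 1, 1],
    "x_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "y": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [2 * a + b for a, b in zip(features["x"], features["y"])]
selected = mrmr(features, target, n_select=2)
print(selected)
```

In this toy case all three features are equally relevant to the label, but once `x` has been chosen the redundancy term penalizes `x_copy`, so `y` is selected second.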

Product-gene definition and structure

The product genes discussed in this article pertain to the conceptual design process and are mainly used to study the key product information elements that affect this process. Thus, the conceptual design process should be analyzed before product genes are discussed. Note that process modeling is one of the most effective means of analyzing the conceptual design process.

Product-gene definition

Conceptual design is a key stage in the overall product design cycle and determines the design solution, innovation, and nature of the resultant product. From analogy with biological genes, we define product genes as a collection of key information that determines the conceptual process for the product, which can be “inherited” and “mutated” during the conceptual design process. Here, heredity means that offspring products can retain the same structure and functions as the parent products through replication of product genes, whereas mutation refers to the phenomenon in which the product functions or structures change as a result of genetic changes.

Product-gene structure

The product-gene structure is mainly explored using the following approach:

1. The product conceptual design process is analyzed and a process model is established;
2. The product-gene structure composition is deduced from the model; this is analogous to the exploration method for a biological gene structure.

These two steps are discussed separately below.

Conceptual design process modeling

One of the most important objectives of conceptual design is to seek a structure solution that most satisfies the objective functions. The input of the conceptual design process is the target functions, and the output is the product structure scheme that most satisfies the target functions. Considering these characteristics, we established the simple conceptual design process model shown in Figure 1.

Fig. 1. Conceptual design process model.

From the perspective of functional expression, this article defines the behaviors and key product attributes as the key information that influences the conceptual design process. Behaviors can be represented by action verbs. The key product attributes mainly correspond to the physical and geometric properties of the products, which can be expressed by parameters.

Definition and structure of product genes

The biological traits of a given organism describe its ability to respond to or generate a certain effect on a particular subject; for example, the phagocytic traits of phagocytic cells (Selsted and Ouellette, 1995) and the lodging resistances of certain rice varieties (Ookawa et al., 2010). Bio-proteins form the material basis of life, being the main agents of life activity and the main carriers of biological traits. Similarly, the functions of a product also describe that product's ability to respond or react to specific objects; consider, for example, the cooling and heating capabilities of air conditioners. The structure is the material basis of a product and the carrier of a product's functions. Therefore, product functions can be taken as being analogous to biological traits, with their structures corresponding to biological proteins (Feng et al., 2002; Chen et al., 2005a). It is worth noting that, in conceptual design, structure schemes are applied as the “proteins” of products, rather than simply structures.

Studies of biological genes begin with the biological traits and ultimately end with the production of proteins through gene expression. Similarly, as product functions and structure schemes are taken as being analogous to biological traits and proteins, respectively, the intermediate process from product function to structure scheme can be regarded as the product-gene expression process.

Following the double-stranded structure of biological genes, product genes can also be expressed as a double-stranded structure, in which the behaviors and key attributes are recognized as “bases” and the relationships between them correspond to “hydrogen bonds.” The gene structure derivation process is shown in Figure 2. In this figure, A, B, and VA, respectively, refer to product attributes, behaviors, and the relationships between them. Here, A and B play the role of bases, analogous to the biological bases A′, T′, C′, and G′ (adenine, thymine, cytosine, and guanine, respectively); VA is the relationship between B and A. B can be represented by action verbs. Note that VA is not studied in this article, because it only acts as a link between the attributes and process characteristics, similar to the role of hydrogen bonds in biological genes.

Fig. 2. Product-gene double-helix structure.

The product-gene structure proposed in this paper can express the product behaviors and also represent the key product attributes. The behaviors represent the characteristics of the function realization process, while the key attributes indicate the product features. Previously reported works (Deng et al., 2000; Georgiou et al., 2016; Zhao et al., 2016) show that changes in a product's attribute (or activation of a new attribute) mean that the product's nature has undergone a qualitative change; that is, innovation has occurred. Therefore, attribute analysis is very important for system improvement to achieve innovative design. The product-gene structure proposed in this paper includes not only the process elements for function expression but also important attributes that influence the product properties and innovation. Therefore, the product genes proposed herein are reasonable and effective for the achievement of product replication and innovation.

Product-gene coding method based on function elements

In organisms, DNA acts as the gene carrier. There are one or more gene fragments in one DNA chain. Mendel believed that each trait of an organism is controlled by a particular genetic factor (gene) (Bhattacharyya et al., 1990). Further, eukaryotic genes are structurally divided into coding regions and non-coding regions.

The functions of mechanical products are similar to biological traits, while their genetic factors, that is, product genes, are similar to biological genes. Therefore, analogous to the biological gene encoding form, a product-gene coding method based on functional elements is proposed in this paper. In this approach, every function unit corresponds to one product-gene chain and can also be called the product DNA. All the chains corresponding to the function units of a product comprise its genetic material. These chains are structurally divided into coding and noncoding regions, as detailed in Table 1. The coding region mainly includes the gene address coding regions and base element coding regions, while the noncoding region includes a functional element corresponding to the chain, with “Start” and “End” components to the chain. Product genes are stored in the product-gene library as coding chains.

Table 1. Product-gene coding chain

In Table 1, F_i is the ith function unit, Start is the factor that initiates the transcription of this chain, N is the address of the first product gene in this chain, A_m represents the key attributes that influence the expression of F_i, B_y is a behavior of F_i, and End is the factor that ends transcription of this chain. The action verb represents the behavior of the product. In general, there is only one behavior corresponding to one function element.

This product-gene coding method not only helps designers to learn from genetic engineering for application to product-genetic engineering but also maps product genes with every function element. Further, product genes are separated by address, which clarifies the gene chains. By applying this coding method, designers can quickly search for product-gene chains corresponding to certain target function elements during the design process. This approach helps reduce the designers' labor intensity and shortens their information search time. In addition, as product genes contain key information on products, the information obtained through this method is of high quality and designers are highly likely to apply it in practice. This, in turn, increases the design knowledge reuse rate.
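The coding chain and the lookup by function element described above can be sketched as a small data structure. This is an illustrative assumption rather than the layout of Table 1: the class names, the field order inside `encode`, and the air-conditioner values are hypothetical, and only the components named in the text (function unit F_i, Start, address N, attributes A_m, behavior B_y, End) are represented.

```python
from dataclasses import dataclass, field


@dataclass
class ProductGeneChain:
    """One coding chain for a single function unit (hypothetical layout)."""

    function_unit: str  # F_i, stored in the noncoding region
    address: int  # N, address of the first product gene in the chain
    behavior: str  # B_y, an action verb
    attributes: dict = field(default_factory=dict)  # A_m: name -> value

    def encode(self):
        """Flatten the chain into a list bracketed by Start and End."""
        attribute_items = sorted(self.attributes.items())
        return ["Start", self.function_unit, self.address,
                self.behavior, *attribute_items, "End"]


class ProductGeneLibrary:
    """Stores coding chains indexed by the function unit they express."""

    def __init__(self):
        self._chains = {}

    def add(self, chain):
        self._chains.setdefault(chain.function_unit, []).append(chain)

    def find(self, function_unit):
        """Return all chains recorded for a target function unit."""
        return list(self._chains.get(function_unit, []))


# Hypothetical entry: values are invented purely for illustration.
library = ProductGeneLibrary()
library.add(ProductGeneChain(
    function_unit="cool air",
    address=1,
    behavior="cool",
    attributes={"rated_power_kW": 2.5, "refrigerant": "R32"},
))
matches = library.find("cool air")
print(matches[0].encode())
```

Indexing the chains by function unit is one simple way to support the quick search for chains that match a target function element; other indexing schemes would serve equally well.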

Intelligent acquisition method for product genes

Organisms reproduce through operational mechanisms, such as gene replication, expression, and mutation. Similarly, in conceptual design, product information can be passed from parent to offspring products and used to create innovative products through operational techniques, such as product-gene reproduction, expression, and mutation. The difference is that the genetics and variations of organisms are spontaneous behaviors, whereas product inheritance and innovation must rely on human participation. To develop product-gene manipulation technology that serves product conceptual design, product-gene acquisition is an important problem that designers must consider first.

The product-gene acquisition process, also called reverse transcription of product genes, is a reverse process of product-gene expression. This process uses certain technical means to obtain product genes based on existing products or product schemes.

Comparison of biological genetic central dogma and product-genetic central dogma

In biological genetic engineering, biological genes are genetic materials that support the basic structures and mechanisms of life. Biological traits are the external manifestations of biological mechanisms, while proteins are their carriers. The DNAs of an organism are transcribed to RNAs, which are then translated into proteins. Proteins are vectors that express biological traits. An identical organism can be cloned from an organism through gene replication, transcription, and translation processes (Fig. 3). Conversely, RNAs and DNAs can be obtained from biological proteins through reverse transcription, which is the inverse process of the gene expression process consisting of transcription and translation processes. Reverse transcription is also called the biological gene acquisition process, which is the premise and basis for research and utilization of biological genes. This is the biological genetic central dogma.

Fig. 3. Comparison of biological genetic central dogma and product-genetic central dogma.

As described in the section “Product-gene structure,” product functions, structure schemes, product genes, and product-gene expression processes were analogous to biological traits, biological proteins, biological genes, and biological gene expression processes, respectively. Product-gene expression includes the transcription and translation processes. Drawing on the biological genetic central dogma, a product-genetic central dogma has been proposed, as shown in Figure 3. In this figure, product genes, action principles, and structure schemes were likened to DNAs, RNAs, and proteins, respectively. Therefore, the transcription and translation processes, respectively, refer to the mapping of product genes to action principles and action principles to structural schemes. Product-gene replication refers to the process of retrieving the product genes that meet function requirements from the product-gene pool. These genes could be used directly in the gene expression process. Reverse transcription, also called product-gene acquisition, refers to the reverse process of product-gene expression, which uses necessary technical means to obtain product genes based on existing products or product schemes. Therefore, the product-genetic central dogma could be expressed as follows: action principles can be obtained through the transcription of product genes and then be translated into structure schemes. In turn, product genes can be gained through reverse transcription of action principles and structure schemes. Meanwhile, parent products' genetic information can be passed to their offspring products by means of the product-gene replication, transcription, and translation processes.

Intelligent acquisition method for product genes

At present, methods for obtaining product genes mainly include reverse engineering-based (Feng et al., 2002; Chen and Feng, 2004; Chen et al., 2005a, 2005b; Ai and Wang, 2012; Ai et al., 2013; Zhu et al., 2013; Waris et al., 2016) and case-based (Tai et al., 2007; Ai et al., 2013) product-gene acquisition techniques. Most of these methods provide only a structured model of the top-level acquisition process. Few studies have addressed the specific methods, technologies, or algorithms needed to implement each step of the process. As a result, product-gene acquisition remains at the level of process structure and cannot be truly implemented. The present article presents a product-gene acquisition methodology combining a K-means clustering algorithm and an MI-based feature selection algorithm, as shown in Figure 4, with the aim of providing a feasible and intelligent acquisition algorithm for product genes.

Fig. 4. Product-gene acquisition method diagram. TP: target product; C: category of TP.

As shown in Figure 4, the input of this method is product information, which involves details of the structure, action process, and parameters of a product, while the output is the product genes of this product. The product information could be obtained through probabilistic methods or knowledge-based engineering (KBE) methods. Traditional probabilistic methods mainly include tree-structured dependencies, Okapi BM25, and Bayesian networks (Manning et al., 2010). The tree-structured dependency model removes the assumption that terms are independent, but the estimation problem has held back its practical success. Okapi BM25 pays more attention to term frequency and document length without introducing too many additional parameters into the model, but it does not address the meaning and grammatical structures of terms. Bayesian networks allow learning and inference with arbitrary knowledge within arbitrary directed acyclic graphs. However, they require the premise that the distributions of the variables are completely independent, which is hard to satisfy in reality. KBE is one of the research fields of AI and has gradually become a research hotspot (Russell and Norvig, 2013). Several KBE technologies can be used in information retrieval, such as the knowledge map-based method (Hao et al., 2014), the ontology-based method (Chris et al., 2016), and the knowledge component (Hao et al., 2012). These technologies consider the meaning and grammatical structures of terms, which could overcome the shortcomings of traditional probabilistic methods. The acquisition of product information from products is not our research focus.

Here, we use TP to refer to the target product. The specific steps of our methodology are as follows:

(1) In accordance with the TP action process, an SV(P)O labeling method is adopted to analyze the main functions and verbs of this product. These functions are decomposed into functional elements according to the principles of decomposition and reconstruction. Each function element is represented by a certain performance index. The SV(P)O (Chen et al., 2011) labeling method is a semantic annotation method based on ontologies, where S is the description subject, V is the operation of S, P represents the attributes of O or the extent of V, and O is the operating object of S. In this SV(P)O structure, P provides additional instructions for VO and is optional. The information abstracted using this method is the core design knowledge.

The principle of decomposition and reconstruction (PDR) is a theory for dividing and synthesizing items (Lin et al., 2013). For functional design in engineering, functions are divided into multi-level sub-functions through decomposition until function elements are obtained. Conversely, functional elements and sub-functions can be reconstructed into total functions.
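
The SV(P)O annotation and the decomposition step can be pictured with a small data structure. The following is only a minimal sketch: the class names `SVPO` and `Function`, the helper `function_elements`, and the example functions are assumptions made for illustration and are not taken from the paper's tables or from the cited ontology-based method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SVPO:
    """One SV(P)O annotation: subject, verb, object, and the optional P."""
    subject: str
    verb: str
    obj: str
    p: Optional[str] = None  # attribute of O or extent of V; may be omitted


@dataclass
class Function:
    """A node of the function tree; nodes without children are function elements."""
    label: SVPO
    children: List["Function"] = field(default_factory=list)


def function_elements(node: Function) -> List[Function]:
    """Decomposition: walk the tree and collect its leaves (function elements)."""
    if not node.children:
        return [node]
    leaves = []
    for child in node.children:
        leaves.extend(function_elements(child))
    return leaves


# Hypothetical total function with two levels of sub-functions.
total = Function(
    SVPO("shell", "carry", "payload"),
    children=[
        Function(SVPO("shell", "hold", "material")),
        Function(
            SVPO("shell", "withstand", "loads", p="various"),
            children=[
                Function(SVPO("shell", "withstand", "loads", p="internal")),
                Function(SVPO("shell", "withstand", "loads", p="external")),
            ],
        ),
    ],
)
elements = function_elements(total)
verbs = [e.label.verb for e in elements]
```

Reconstruction is the reverse reading of the same tree: the leaves and intermediate nodes combine upward into the total function at the root.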

(2) A sample set is selected, having multiple product instances of the same family and the same major functions as TP. Note that the products belonging to a product family have the same attributes but different attribute values (Messac et al., 2013).

(3) The functions and attributes of the sample set and TP are assigned and normalized to remove dimensions. During the assignment, different assignment methods are used for different objects. When assigning attributes, we use their true values to represent them if they are quantitative and a fuzzy evaluation method if they are qualitative. The function unit assignment is completed by the assignment of their corresponding indicators. The indicator assignment method is the same as the attribute assignment method.

Note that normalization must be implemented, as the function and attribute data have different dimensions. The min–max normalization method is adopted in this work. After normalization, the values of the functions and attributes are converted into data points with values between 0 and 1.

(4) The K-means clustering algorithm is utilized to divide the samples into K clusters and to determine the cluster (C) to which TP belongs. The K-means clustering algorithm includes four main steps:

1. K-value setting: K is the number of desired clusters and is specified by the user.
2. Data preprocessing: The attribute data are normalized and the dimensions removed. These data include both attribute data of the sample set and those of TP. They are normalized together.
3. Sample classification: Starting from randomly chosen initial centroids, the K-means clustering algorithm iteratively assigns the samples to K clusters; the samples and centroid of each cluster are recorded in each iteration.
4. Judgment: The cluster to which TP belongs is determined. After sample clustering, the distance between TP and the centroid of every cluster is calculated according to the Euclidean distance (Amirteimoori and Kordrostami, 2010). Its function expression is as follows:

(1)

$$\rho(X, U) = \sqrt{\sum_i (x_i - \bar{x})^2},$$

where $x_i$ is the ith component of the TP vector X and $\bar{x}$ is the corresponding component of the centroid U; the distance is computed for the centroid of each of the K clusters.
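
The four K-means steps and the judgment of Eq. (1) can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in this work: the function names, the optional `init` argument (added here only to make the toy run reproducible), and the sample values are all assumptions.

```python
import math
import random


def euclidean(a, b):
    """Eq. (1): square root of the summed squared component differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(samples, k, max_iter=500, tol=1e-4, seed=0, init=None):
    """Basic K-means iteration; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Random initial centroids unless explicit ones are supplied.
    start = init if init is not None else rng.sample(samples, k)
    centroids = [list(p) for p in start]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(s, centroids[c]))
            for s in samples
        ]
        # Update step: recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # stop once the centroids no longer move
            break
    return centroids, labels


def nearest_cluster(tp, centroids):
    """Judgment step: index of the centroid closest to the target product."""
    return min(range(len(centroids)), key=lambda c: euclidean(tp, centroids[c]))


# Toy, already normalized samples (illustrative values only).
samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
           [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
centroids, labels = kmeans(samples, k=2, init=[samples[0], samples[3]])
tp_cluster = nearest_cluster([0.92, 0.97], centroids)
```

Because the outcome of K-means depends on the initial centroids, a practical run would repeat the procedure from several random starts and keep the best result.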

(5) Key attributes are selected from the sample set S of cluster C through a feature selection method based on MI proposed by Peng et al. (2005). This method is based on the minimal-redundancy–maximal-relevance (mRMR) criterion. Key attributes corresponding to every function element are thus selected from S. The main steps of this method are described as follows:

(a) Data preprocessing: Attribute data and function element data are normalized together to remove their dimensions.
(b) MI calculation: The MI between every attribute and every function element of C is calculated. The attributes of C are recognized as features in this method. Further, the normalized MI (NMI) is calculated to convert MI into values between zero and one.
MI is a useful information metric in information theory. It can be regarded as the amount of information that one random variable contains about another random variable (Paninski, 2014). If two random variables X and Y are discrete, the MI between them is defined in terms of their probabilities p(x), p(y) and their joint probability p(x, y):
(2) $$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \left( \frac{p(x, y)}{p(x)\,p(y)} \right).$$
If these two random variables are continuous, the MI between them is defined in terms of their probability densities p(x), p(y) and their joint probability density p(x, y):
(3) $$I(X;Y) = \int_X \int_Y p(x, y) \log \left( \frac{p(x, y)}{p(x)\,p(y)} \right) {\rm d}x\,{\rm d}y.$$
NMI is the normalization of MI and is obtained by using entropy as the denominator to scale MI to a value between zero and one. Its function expression is as follows:
(4) $${\rm U}(X;Y) = \frac{2I(X;Y)}{H(X) + H(Y)},$$
where H(X) and H(Y) are the entropies of X and Y. Their function expressions are as follows:
(5) $$H(X) = \sum p(x) I(x) = -\sum p(x) \log_2 p(x),$$
(6) $$H(Y) = \sum p(y) I(y) = -\sum p(y) \log_2 p(y).$$
(c) Max-relevance selection: Max-relevance is applied to select features from S. The individual selected features are required to have the largest MI with every function element of S.
Max-relevance selects the features that satisfy Eq. (7), in which D is the mean of the MI values between the individual features $x_i$ ($x_i \in S$) and the function element $f_\lambda$.
(7) $$\max D(S, f_\lambda), \quad D = \frac{1}{\vert S \vert} \sum_{x_i \in S} I(x_i; f_\lambda).$$
(d) Min-redundancy selection: The feature set selected according to max-relevance is sometimes not the best set, because its features are likely to be highly redundant; that is, the dependency among these features could be large. Therefore, the min-redundancy criterion of Eq. (8) is added to select mutually exclusive features:
(8) $$\min R(S), \quad R = \frac{1}{\vert S \vert^2} \sum_{x_i, x_j \in S} I(x_i; x_j),$$
where $x_i$ and $x_j$ in Eq. (8) are features of S. The criterion combining the above two constraints is called mRMR, and it can be expressed as follows:
(9) $$\max \phi(D, R), \quad \phi = D - R.$$
Peng et al. (2005) proved that the mRMR criterion is equivalent to the max-dependency criterion if one feature is selected (added) at a time. They called this type of selection the “first-order” incremental selection. They also showed that the resultant selection accuracy is significantly improved. In our methodology, this “first-order” incremental selection based on mRMR and a forward search algorithm are adopted to select key attributes corresponding to every function element of class C. Therefore, the following equation is used instead of Eq. (9):
(10) $$\max_{x_j \in X - S_{m-1}} \left[ I(x_j; f_\lambda) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right].$$
The key attributes obtained from this method constitute the product genes of TP.
(e) The product genes of TP are formed and stored in a product-gene library. The attributes obtained from the above four steps represent the characteristics common to the products in the same class as TP. These characteristics are the most basic features that characterize TP. Apart from the common characteristics, the individual characteristics of TP should also be represented in the product-gene acquisition process. These characteristics distinguish TP from the other products in its class. Except for the geometric features that represent the product layout, the parameter values of most features are not required in the conceptual design stage. Therefore, we retain parameter values only for the geometric features that involve the product layout.
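
The first-order incremental selection of Eq. (10) can be sketched as a forward search. The sketch below assumes that the attribute and indicator values have already been discretized so that the discrete MI of Eq. (2) applies (with a base-2 logarithm); the function names and the toy data are hypothetical, and this is not claimed to be the exact estimator used in this work.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Eq. (2) for two equally long sequences of discrete values (log base 2)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (cnt / n) * math.log2(cnt * n / (px[u] * py[v]))
        for (u, v), cnt in pxy.items()
    )


def mrmr_rank(features, target):
    """Order feature names by the incremental rule of Eq. (10).

    features: dict mapping an attribute name to its list of discretized values
    target:   discretized indicator values of one function element
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]  # first pick uses max-relevance only
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)  # mean MI with the m - 1 features already chosen
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" duplicates "a", while "c" is weaker but less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 0],
    "b": [0, 0, 0, 0, 1, 1, 1, 0],
    "c": [0, 0, 1, 0, 1, 0, 1, 1],
}
ranking = mrmr_rank(features, target)
```

In this toy case the duplicate attribute is ranked last even though its relevance equals that of the first pick, which is the effect the min-redundancy term is meant to produce.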

Together with behaviors corresponding to every function element, the product genes are stored as coded chains in the product-gene library using the coding method proposed in Table 1. The form of the product-gene library is presented in Table 2.

Table 2. Product-gene library form

Use of product genes in conceptual design

The obtained product genes could be used to drive the conceptual design process for engineering products. Target functions could be gained according to design requirements and then be decomposed into functional elements through PDR. Product genes corresponding to function elements could be retrieved from the product-gene library and be used as key information to express function elements. Analogous to the biological gene expression process, structure schemes could be obtained through the product-gene expression process, which involves the transcription and translation process. Thus, the conceptual design process driven by product genes is schematically presented in Figure 5.

Fig. 5. Conceptual design driven by product genes.

Product genes obtained through the proposed acquisition methodology contain the key information needed to realize the product and to carry out the conceptual design process. This clarifies and standardizes product information and thereby addresses the problems caused by limited and fuzzy design information in the conceptual design stage; that is, it reduces the workload and time of information retrieval and increases the feasibility of solutions.

Illustrative example

We take a product-gene extraction process of an agricultural solid spray aircraft shell, hereafter labeled AS, as an example to illustrate the methodology proposed in this paper. The shell is a device that forms the aircraft shape, holds solid spray material, and connects to and mounts other parts of the aircraft. AS can withstand various loads from the exterior and interior during flight. In addition, the shape and surface quality of AS have a significant influence on the resistance and friction encountered during the flight. Further, the volume of AS determines the amount of internal space and the spray material. A structure schematic diagram of AS is shown in Figure 6.

Fig. 6. Structure schematic diagram of AS.

The parameters of AS are listed in Table 3.

Table 3. Key attributes of shell AS

In accordance with the product-gene acquisition methodology proposed in the section “Intelligent acquisition method for product genes,” the gene acquisition process of AS was implemented as follows:

1. The main functions and behaviors were analyzed according to the AS action process. Then, the functions were decomposed into function elements. The verbs and functional elements obtained in this step are listed in Table 4.

Table 4. Main functions and behaviors of AS

According to the SV(P)O semantic annotation of the action process, the main functions could be decomposed into four function elements, as detailed in Table 5.

Table 5. Functional elements of AS

2. A sample set of the same family as AS was composed. In this study, 99 cases were selected from the case library to form a sample set. Their attributes were identical to those of AS, while the attribute values were different. The attributes and their values are displayed in Figure 7. Assuming that the sample set $D = \{x_0, x_1, \ldots, x_{98}\}$ is a set of 99 unmarked samples, each sample is represented by $x_i = (x_{i0}, x_{i1}, \ldots, x_{ij}, \ldots, x_{im})$, where m + 1 corresponds to the number of parameters $x_i$ contains. In this example, m = 15.

Fig. 7. Portion of attributes and their corresponding values for sample set.

3. The K-means algorithm was used to cluster samples and determine the category (called C) to which AS belongs. Generally, according to the cone angle values, the aircraft shell could be divided into three types: (1) cone-column-skirt shape (cone angle 1 ≠ 0, cone angle 2 = 0, and cone angle 3 ≠ 0); (2) double cone (cone angle 1 ≠ 0, cone angle 2 = cone angle 3 ≠ 0); and (3) three cone (cone angle 1 ≠ 0, cone angle 2 ≠ 0, and cone angle 3 ≠ 0). Therefore, in this study, the variable k in the K-means algorithm was set to 3.

The dimensions of the attributes were not all the same, and some attributes were dimensionless quantities. Thus, it was necessary to normalize the parameters after the data were collected. The normalization was implemented using the min–max normalization method, expressed as

(11)

$$X'_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min} + e},$$

where $X_{\max}$ and $X_{\min}$ are the maximum and minimum data values of the original dataset, respectively, and $X_i$ and $X'_i$ are the original and normalized data values, respectively. Further, e is a denominator supplement term with a very small value; its role is to prevent a division-by-zero error when $X_{\max} = X_{\min}$.
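
Eq. (11) translates directly into code. The sketch below is illustrative only: the function names, the value chosen for e (`eps=1e-12`), and the sample rows are assumptions, since the text states only that e is very small.

```python
def min_max_normalize(column, eps=1e-12):
    """Eq. (11): (X_i - X_min) / (X_max - X_min + e) for one attribute column."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo + eps) for x in column]


def normalize_table(rows, eps=1e-12):
    """Normalize every attribute (column) of a list of equally long rows."""
    columns = [min_max_normalize(list(col), eps) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Illustrative values only. The last row plays the role of the target product,
# which is normalized together with the samples. The second attribute is
# constant, so without e its denominator would be zero.
rows = [[10.0, 0.2], [20.0, 0.2], [30.0, 0.2], [15.0, 0.2]]
normalized = normalize_table(rows)
```

One consequence of the supplement term is that the largest value of an attribute maps to a number marginally below 1 rather than exactly 1, which is harmless for the distance calculations that follow.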

In this study, the K-means clustering algorithm was applied to cluster the samples into three categories, and the maximum number of iterations was set to 500. The program was written in Python, and the clustering results are shown in Figs. 8–10. There were 27, 48, and 24 samples in clusters 1–3, respectively.

Fig. 8. Two-dimensional visualization figure of clustering results.

Fig. 9. Portion of K-means clustering results for f samples.

Fig. 10. Centroids of first 16 iterations.

Then, the distances between AS and the centroids of every cluster were calculated using Eq. (1). The results show that the distance between AS and the centroid of cluster 2 was the smallest; thus, AS belonged to cluster 2.

4. Product-gene extraction from cluster 2 was performed using MI-based feature selection. The corresponding indicators of function elements $F_1$–$F_4$ are max-stress, freedom, capacity, and max-flying resistance, respectively, as detailed in Figure 11.

Fig. 11. Portion of function elements' corresponding indicators of sample set.

The MI-based feature selection algorithm, based on the mRMR criterion (Peng et al., 2005), was adopted to extract the product genes from cluster 2. The first-order incremental strategy was used for feature selection. Suppose X is the feature set with all the features of cluster 2, and we already have $S_{m-1}$, the feature set with m − 1 features. The task is to select the mth feature from the set $\{X - S_{m-1}\}$. The objective function is

(12)

$$\max_{x_{*j} \in X - S_{m-1}} \left[ {\rm NMI}(x_{*j}; f_\lambda) - \frac{1}{m-1} \sum_{x_{*i} \in S_{m-1}} {\rm NMI}(x_{*j}; x_{*i}) \right],$$

where NMI is the normalization of MI. It could be expressed as

(13)

$${\rm NMI}(X, Y) = \frac{2\,{\rm MI}(X, Y)}{H(X) + H(Y)}.$$

In Eq. (13), H(X) and H(Y) are entropy of X and Y, respectively.
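
A small sketch of Eq. (13) for discrete data is given below. It assumes discretized values and base-2 logarithms; the function names and the rule of returning 0 when both entropies are 0 are choices made for this illustration only.

```python
import math
from collections import Counter


def entropy(x):
    """Shannon entropy H(X) in bits for a sequence of discrete values."""
    n = len(x)
    return -sum((cnt / n) * math.log2(cnt / n) for cnt in Counter(x).values())


def mutual_information(x, y):
    """Discrete MI in bits, computed from empirical frequencies."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (cnt / n) * math.log2(cnt * n / (px[u] * py[v]))
        for (u, v), cnt in pxy.items()
    )


def nmi(x, y):
    """Eq. (13): 2 * MI / (H(X) + H(Y)); taken as 0 if both entropies are 0."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 0.0


x = [0, 0, 1, 1]
identical = nmi(x, x)               # a variable compared with itself
independent = nmi(x, [0, 1, 0, 1])  # empirically independent variables
```

The two calls bracket the range of the measure: identical variables give the upper end of the scale and empirically independent variables give the lower end.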

The first-order incremental algorithm was implemented in Python 3.6, and the results are shown in Table 6.

Table 6. Results of MI-based feature selection algorithm

In Table 6, the number in each cell represents the subscript of the corresponding function element or attribute. This table gives the order of the attributes for each functional element according to the mRMR criterion. Because the ranking includes all features, a cutoff is still needed. In this study, eightfold cross-validation was carried out, and the attributes corresponding to every function element were obtained with low classification error. Due to space limitations, the verification process is not detailed in this paper.

Together with the action verbs obtained in step (1), product genes are acquired.

5. Product genes are stored in the form of coding chains in the product-gene library, as detailed in Table 7.

Table 7. Product-gene library fragment for AS

The obtained product genes are stored in a product-gene library in the form of coding chains, as shown in Table 7. In product design, designers can retrieve the product genes from the product-gene pool according to the required functions. Assume that the functional elements obtained by decomposing the target function include the functional element that “withstands various loads from internal and external.” The designer can then find, in the product-gene pool, the first product-gene chain in Table 7 and any other product-gene chains capable of achieving this functional element. Then, these product-gene chains can be expressed as structural schemes through the transcription and translation process. The structural solutions corresponding to each functional element are randomly combined to form structural solutions of the product. Finally, these structural solutions are optimized and the optimal one is determined. If the optimal solution meets the design requirements, it is the final structural solution of the product. If not, the design iteration process is re-run until a product structure that meets the design requirements is obtained.
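
The retrieval and combination steps described above can be sketched with a dictionary keyed by functional element. Everything in this sketch is hypothetical: the library contents, the chain labels (which do not follow the coding scheme of Table 1), and the function names. For simplicity it enumerates every combination rather than sampling combinations at random, and it leaves out the transcription and translation steps.

```python
from itertools import product

# Hypothetical library: functional element -> stored product-gene chains.
gene_library = {
    "withstand loads": ["chain_A1", "chain_A2"],
    "hold material": ["chain_B1"],
}


def retrieve(library, functional_elements):
    """Return the candidate gene chains for each required functional element."""
    missing = [f for f in functional_elements if f not in library]
    if missing:
        raise KeyError(f"no product genes stored for: {missing}")
    return [library[f] for f in functional_elements]


def combine(candidates):
    """Enumerate every combination with one chain per functional element."""
    return [tuple(combo) for combo in product(*candidates)]


required = ["withstand loads", "hold material"]
schemes = combine(retrieve(gene_library, required))
```

Each resulting tuple corresponds to one candidate product scheme that would then be expressed, optimized, and checked against the design requirements.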

Case analysis

We have proposed an intelligent methodology for product-gene acquisition. This methodology combines a K-means clustering algorithm and MI-based feature selection algorithm and can obtain product genes from existing product information. The application of this approach to the case of an agricultural solid spray aircraft shell has proved its effectiveness.

In our proposed approach, the input is the product information, which includes the structure, parameters, and action process. Information such as behaviors, attributes, and function elements can be obtained through the SV(P)O semantic annotation method, which is a knowledge-based engineering method based on domain ontologies. Because of space limitations, we do not elaborate on this aspect in this article.

To extract the product genes from the AS information, 99 samples were first selected from the AS product family and then clustered through the K-means algorithm into three clusters. Then, the distances between AS and every centroid of these three clusters were computed to identify the cluster in which AS was located. Finally, the key attributes of the product genes were selected from AS and the samples of its cluster through the MI-based feature selection algorithm and eightfold cross-validation. Note that, in the feature selection process, we chose the AS product information and all the 48 samples in cluster 2 as the selected object, instead of the AS product information only. This choice was made because the product genes constitute key information that can be transferred from parent to offspring products; this information can also determine the categories to which the products belong. Compared with the selection of one AS only as a research object, this method of obtaining product genes using multiple products in a product cluster is more general. Product genes obtained in this manner can share the common characteristics of this product type.

The clustering algorithm was used when determining the category of AS. In this study, the Euclidean distance was applied to calculate the distance between two samples; this is the most commonly used distance calculation method. The AS example selected in this paper has a large number of samples with multidimensional characteristics, and the cluster number could be predetermined. The K-means clustering algorithm is simple and is suitable for both general and big data cases. Therefore, K-means was adopted in this work. The K-means algorithm package in the Sklearn library [implemented in the Python programming language – developed by Guido van Rossum, 1989 (Morgan et al., 2013)] was employed. To compensate for the influence of the initial centroids, the algorithm defaults to 10 centroid initializations; it is run once for each initialization, and the best result is returned. The maximum number of iterations was set to 500. The iteration ended when the error between the ith iteration and the (i + 1)th iteration was less than 0.0001 or when 500 iterations had been performed. In addition, we conducted a dimension reduction visualization. The results showed that the clustering results tend to be stable at the 16th iteration, as shown in Figure 10. Further, the K-means clustering algorithm quickly reached convergence and terminated in 16.8 s.
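
The settings described above map onto the parameters of `sklearn.cluster.KMeans` roughly as follows. This is a sketch under stated assumptions rather than the script used in this study: the wrapper function `cluster_and_assign` and the `random_state` argument are additions for illustration, and scikit-learn is imported lazily so that the parameter mapping can be read without it installed.

```python
# Parameter values taken from the description in the text.
KMEANS_PARAMS = {
    "n_clusters": 3,   # three shell types
    "n_init": 10,      # ten centroid initializations; the best run is kept
    "max_iter": 500,   # cap on the number of iterations
    "tol": 1e-4,       # convergence tolerance between successive iterations
}


def cluster_and_assign(samples, target, random_state=0):
    """Fit KMeans on normalized samples and return (labels, target_cluster)."""
    from sklearn.cluster import KMeans  # requires scikit-learn

    model = KMeans(random_state=random_state, **KMEANS_PARAMS)
    labels = model.fit_predict(samples)
    # predict() assigns the target product to its nearest centroid.
    target_cluster = int(model.predict([target])[0])
    return labels, target_cluster
```

Here `samples` would be the normalized attribute rows of the 99 cases and `target` the normalized attribute row of AS.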

After the sample clustering and cluster determination of AS were complete, feature selection was performed. The purpose of this step was to select key attributes relevant to the function elements. The selected objects were AS and the 48 samples in cluster 2 obtained in the K-means clustering process. In this study, we regarded the problem of feature selection as a classification problem; that is, for a functional element, the related and unrelated features were divided into two categories. In classification problems, the optimal characterization condition often means the minimal classification error. In an unsupervised situation, achieving this goal usually requires the max-dependency scheme. However, the max-dependency criterion requires an estimate of the multivariate density, which is difficult to obtain accurately. The mRMR criterion is equivalent to the max-dependency criterion if only one feature is selected at a time; it can also effectively improve the classification accuracy and has a lower time cost. We therefore used the first-order mRMR method to select features relevant to the functional elements. The forward selection method was applied, and the selection results are shown in Table 6. The MI-based feature selection algorithm finished in 25.5 s. Then, an eightfold cross-validation was carried out to set the feature number that resulted in the smallest classification error. Together with the behaviors obtained through the SV(P)O semantic annotation, product genes were acquired and stored in a product-gene library, as detailed in Table 7.

The results indicate that the product-gene acquisition method proposed in this paper, which combines a K-means clustering algorithm and an MI-based feature selection algorithm, is an intelligent method, as it carried out the task in 43 s with no human involvement. Furthermore, product genes stored in the library correspond to function elements. In this way, designers can quickly find the corresponding product genes according to the desired function elements.

Conclusion

We present an intelligent acquisition method for product genes and show that it is effective and intelligent. The product genes defined in this paper are the key product information that determines the intrinsic characteristics of products and influences the conceptual design process. The acquisition of product genes is the basis for conceptual design based on product genes, and the work of this paper therefore lays the foundation for such design. In addition, structural schemes obtained according to product genes will be more feasible than those obtained according to fuzzy, finite, and uncertain information in the conceptual design process. Furthermore, conceptual design is a process of expressing target functions as structural schemes. The product-gene coding and storage method proposed in this paper is based on function elements, which helps designers to quickly and efficiently retrieve product genes corresponding to functions from the product-gene pool. This will help to improve design efficiency and reduce the labor intensity of designers.

In this paper, product genes are extracted from a large number of existing products that are in the same family as the TP. How to obtain product genes based on limited or existing small sample products is one of our research priorities in the future.

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments that served to enhance this paper.

Funding

Financial support is gratefully acknowledged from the National Ministries (JCKY2016602B007) and the National Natural Science Foundation of China (NSFC 51375049).

Pan Li received a BS degree from the Shandong University of Technology in 2010, an M.S. degree from the Shandong University of Technology in 2013, and a PhD degree from the Beijing Institute of Technology in 2019. She is currently a technical researcher in China Justice Big Data Institute. Her research interests include conceptual design, artificial intelligence, knowledge engineering, and wisdom court.

Yanzhao Ren received a BS degree from the Shandong University of Technology in 2010, an MS degree from the Shandong University of Technology in 2012, and a PhD degree from China Agricultural University in 2018. He is currently a postdoctoral researcher at China Agricultural University, China. His current research interests include the agricultural Internet of Things and agricultural intelligent equipment.

Yan Yan received a BS degree in mechanical engineering from the Beijing Institute of Technology, China, in 1989 and the PhD degree in mechanical engineering from the Beijing Institute of Technology, China, in 2001. She is currently a professor at the Beijing Institute of Technology, China. Her current research interests include reconfigurable manufacturing systems, intelligent design, and knowledge engineering.

Guoxin Wang received a BS degree from Lanzhou Jiaotong University in 2001, an MS degree from Lanzhou Jiaotong University in 2004, and a PhD degree from the Beijing Institute of Technology in 2007. He was a visiting scholar at the University of Oklahoma, USA from 2014 to 2015. He is currently an associate professor at the Beijing Institute of Technology, China. His current research interests include reconfigurable manufacturing systems, intelligent design, and knowledge engineering.

References

Ai, QS and Wang, Y (2012) Review of contemporary product gene research in design and modeling areas. Journal of Advanced Mechanical Design, Systems, and Manufacturing 6, 1234–1249.

Ai, QS, Wang, Y and Liu, Q (2013) An intelligent method of product scheme design based on product gene. Advances in Mechanical Engineering 5, 323–335.

Amirteimoori, A and Kordrostami, S (2010) A Euclidean distance-based measure of efficiency in data envelopment analysis. Optimization 59, 985–996.

Battiti, R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks 5, 537–550.

Bhattacharyya, MK, Smith, AM, Ellis, THN, Hedley, C and Martin, C (1990) The wrinkled-seed character of pea described by Mendel is caused by a transposon-like insertion in a gene encoding starch-branching enzyme. Cell 60, 115–122.

Bogoni, L (1998) More than just shape: a representation for functionality. Artificial Intelligence in Engineering 12, 337–354.

Bucknall, RWG and Ciaramella, KM (2010) On the conceptual design and performance of a matrix converter for marine electric propulsion. IEEE Transactions on Power Electronics 25, 1497–1508.

Chakrabarti, A, Shea, K, Stone, R, Cagan, J, Campbell, M, Hernandez, NV and Wood, KL (2011) Computer-based design synthesis research: an overview. Journal of Computing and Information Science in Engineering 11, 519–523.

Chandrashekar, G and Sahin, F (2014) A survey on feature selection methods. Computers & Electrical Engineering 40, 16–28.

Chen, KZ and Feng, XA (2004) Virtual genes of manufacturing products and their reforms for product innovative design. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 218, 557–574.

Chen, KZ and Feng, XA (2009) A gene-engineering-based design method for the innovation of manufactured products. Journal of Engineering Design 20, 175–193.

Chen, K, Feng, X and Chen, X (2005a) Reverse deduction of virtual chromosomes of manufactured products for their gene-engineering-based innovative design. Computer-Aided Design 37, 1191–1203.

Chen, Y, Feng, PE and Lin, ZQ (2005b) A genetics-based approach for the principle conceptual design of mechanical products. International Journal of Advanced Manufacturing Technology 27, 225–233.

Chen, Y, Feng, PE, He, B, Lin, ZQ and Xie, YB (2006) Automated conceptual design of mechanisms using improved morphological matrix. Journal of Mechanical Design 128, 516–526.

Chen, S, Yan, Y and Wang, GX (2011) Product-design knowledge retrieval based on ontology. Journal of Beijing Institute of Technology 20, 379–386.

Chris, MM, Ying, L and Daniel, MA (2016) Ontology-based executable design decision template representation and reuse. AI EDAM: Artificial Intelligence for Engineering Design, Analysis and Manufacturing 30, 390–405.

Deng, YM, Tor, SB and Britton, GA (2000) Abstracting and exploring functional design information for conceptual mechanical product design. Engineering with Computers 16, 36–52.

Feng, PE, Chen, Y and Zhang, S (2002) Product gene based conceptual design. Chinese Journal of Mechanical Engineering 38, 1–6.

Georgiou, A, Haritos, G, Fowler, M and Imani, Y (2016) Attribute and technology value mapping for conceptual product design phase. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 230, 1745–1756.

Gero, JS (1990) Design prototypes: a knowledge representation schema for design. AI Magazine 11, 26–36.

Gero, JS (2000) Computational models of innovative and creative design processes. Technological Forecasting and Social Change 64, 183–196.

Gero, JS and Kazakov, V (1998) Evolving design genes in space layout planning problems. Artificial Intelligence in Engineering 12, 163–176.

Gero, JS and Udo, K (2004) The situated function-behaviour-structure framework. Design Studies 25, 373–391.

Guyon, I, Elisseeff, A and Kaelbling, LP (2003) An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182.

Hao, J, Yang, HC and Yan, Y (2012) Configurable knowledge component technology oriented to product design tasks. Computer Integrated Manufacturing Systems 18, 705–712.

Hao, J, Yan, Y, Gong, L, Wang, GL and Lin, JJ (2014) Knowledge map-based method for domain knowledge browsing. Decision Support Systems 61, 106–114.

Huang, HZ, Liu, Y, Li, YF, Xue, LH and Wang, ZL (2013) New evaluation methods for conceptual design selection using computational intelligence techniques. Journal of Mechanical Science and Technology 27, 733–746.CrossRef Google Scholar

Jordan, MI and Mitchell, TM (2015) Machine learning: trends, perspectives, and prospects. Science 349, 255–260.CrossRef Google Scholar PubMed

Keerthi, SS, Shevade, SK, Bhattacharyya, C and Murthy, KRK (2014) Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation 13, 637–649.CrossRef Google Scholar

Kitamura, Y and Mizoguchi, R (2003) Ontology-based description of functional design knowledge and its use in a functional way server. Expert Systems with Applications 24, 153–166.CrossRef Google Scholar

Kohavi, R and John, GH (1997) Wrappers for feature subset selection. Artificial Intelligence 97, 273–324.CrossRef Google Scholar

Kumar, KM and Reddy, ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method. Pattern Recognition 58, 39–48.CrossRef Google Scholar

Lazar, C, Taminau, J, Meganck, S, Steenhoff, D, Coletta, A, Molter, C, de Schaetzen, V, Duque, R, Bersini, H and Nowe, A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9, 1106–1119.CrossRef Google Scholar PubMed

Lin, H, Gao, JZ, Zhou, Y, Lu, GL, Ye, M, Zhang, CX, Liu, LG and Yang, RG (2013) Semantic decomposition and reconstruction of residential scenes from LiDAR data. ACM Transactions on Graphics 32, 66.CrossRef Google Scholar

Luo, SJ, Sun, SQ, Pan, YH and Zhu, SS. (2004) A case study on product collaborative conceptual design technology based on user implicit knowledge. International Conference on Computer Supported Cooperative Work in Design 1, Xiamen, China, pp. 191–196.Google Scholar

Macqueen, J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, Oakland, CA, USA, pp. 281–297.Google Scholar

Manning, CD, Raghavan, P and Schütze, H (2010) Introduction to Information Retrieval. England: Cambridge University Press.Google Scholar

Messac, A, Martinez, MP and Simpson, TW (2013) Introduction of a product family penalty function using physical programming. Journal of Mechanical Design 124, 164–172.CrossRef Google Scholar

Morgan, AN, Perley, DA and Cenko, S (2013) Evidence for dust destruction from the early-time colour change of GRB 120119A. Monthly Notices of the Royal Astronomical Society 440, 1810–1823.CrossRef Google Scholar

Ookawa, T, Hobo, T and Yano, M (2010) New approach for rice improvement using a pleiotropic QTL gene for lodging resistance and yield. Nature Communications 1, 132.CrossRef Google Scholar PubMed

Pahl, G, Beitz, W and Feldhusen, J (1984) Engineering Design. London: Springer.Google Scholar

Paninski, L (2014) Estimation of entropy and mutual information. Neural Computation 15, 1191–1253.CrossRef Google Scholar

Peng, H, Long, F and Ding, C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226–1238.CrossRef Google Scholar PubMed

Prabhakar, S and Goel, AK (1998) Functional modeling for enabling adaptive design of devices for new environments. Artificial Intelligence in Engineering 12, 417–444.CrossRef Google Scholar

Qian, L and Gero, JS (2009) Function–behavior–structure paths and their role in analogy-based design. Artificial Intelligence for Engineering Design Analysis and Manufacturing 10, 289–312.CrossRef Google Scholar

Reich, Y and Shai, O (2012) The interdisciplinary engineering knowledge genome. Research in Engineering Design 23, 251–264.CrossRef Google Scholar

Rodriguez, A and Laio, A (2014) Machine learning. Clustering by fast search and find of density peaks. Science 344, 1492–1496.CrossRef Google Scholar PubMed

Rudi, S and Yaqub, R (2015) Coevolutionary and genetic algorithm based building spatial and structural design. AIEDAM: Artificial Intelligence for Engineering Design Analysis and Manufacturing 29, 351–370.Google Scholar

Russell, SJ and Norvig, P (2013) Artificial Intelligence: A Modern Approach, 3rd Edn. China: Tsinghua University Press.Google Scholar

Saravanan, D and Srinivasan, DS (2011) A proposed new algorithm for hierarchical clustering suitable for video data mining. International journal of Data Mining and Knowledge Engineering 3, 565–568.Google Scholar

Selsted, ME and Ouellette, AJ (1995) Defensins in granules of phagocytic and non-phagocytic cells. Trends in Cell Biology 5, 114–119.CrossRef Google Scholar PubMed

Shang, Y, Huang, KZ and Zhang, QP (2009) Genetic model for conceptual design of mechanical products based on functional surface. The International Journal of Advanced Manufacturing Technology 42, 211–221.CrossRef Google Scholar

Srinivasan, V, Chakrabarti, A and Lindemann, U (2015) An empirical understanding of use of internal analogies in conceptual design. AIEDAM: Artificial Intelligence for Engineering Design Analysis and Manufacturing 29, 147–160.CrossRef Google Scholar

Srivastava, A, Subramaniyan, AK and Wang, L (2017) Analytical global sensitivity analysis with Gaussian processes. Artificial Intelligence for Engineering Design Analysis & Manufacturing 31, 235–250.CrossRef Google Scholar

Steimel, J, Harrmann, M and Schembecker, G (2013) Model-based conceptual design and optimization tool support for the early stage development of chemical processes under uncertainty. Computers & Chemical Engineering 59, 63–73.CrossRef Google Scholar

Tai, LG, Zhong, TX and Miao, ZH (2007) Product gene representation and acquisition method based on population of product cases. Chinese Journal of Mechanical Engineering 20, 114–119.CrossRef Google Scholar

Teng, R, Cao, X and Gao, S (2010) Hybrid integrated design and realizable strategy of database of mechanical product gene. 2010 Sixth International Conference on Natural Computation, Yantai, China, August 10–12, 2010, Volume 8, pp. 4039–4044, doi:10.1109/ICNC.2010.5584838.CrossRef Google Scholar

Ullman, DG (1992) The Mechanical Design Process. Vol. 2. New York: McGraw-Hill.Google Scholar

Umeda, Y, Ishii, M, Yoshioka, M and Shimomura, Y (2009) Supporting conceptual design based on the function-behavior-state modeler. AIEDAM: Artificial Intelligence for Engineering Design Analysis and Manufacturing 10, 275–288.CrossRef Google Scholar

Vermaas, PE and Dorst, K (2007) On the conceptual framework of John Gero's FBS-model and the prescriptive aims of design methodology. Design Studies 28, 133–157.CrossRef Google Scholar

Waris, MM, Sanin, C and Szczerbicki, E (2016) Framework for product innovation using SOEKS and decisional DNA. Asian Conference on Intelligent Information and Database Systems, Da Nang, Vietnam, March 14–16, 2016, pp. 480–489.Google Scholar

Ying, H, Li, SP and Guo, M (2004) Research on ontology-based product knowledge S-B-F representation model. Computer Integrated Manufacturing Systems 10, 30–38.Google Scholar

Yuan, G, Sun, P, Zhao, J, Wang, C and Wang, C (2017) A review of moving object trajectory clustering algorithms. Artificial Intelligence Review 47, 123–144CrossRef Google Scholar

Zhao, M, Liu, ZM, Wang, YQ and Shi, RM (2016) Process-Based Knowledge Engineering and Innovation. China: Aviation Industry Press.Google Scholar

Zheng, H, Feng, YX and Tan, JR (2015) Research on intelligent product conceptual design based on cognitive process. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 230, 2060–2072.Google Scholar

Zhu, HM, Zhang, YB and Zhu, MZ (2014) Research on the reverse mold assembly technology based on product gene. Applied Mechanics and Materials 475, 1463–1467Google Scholar