首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 609 毫秒
1.
基于聚类的快速多目标遗传算法   总被引:8,自引:1,他引:8  
多目标遗传算法非常适合于求解多目标优化问题.讨论了进化个体之间的支配关系及有关性质,论证了可以用快速排序的方法对进化群体中的个体进行分类,同时探讨了用聚类方法来保持群体的多样性,具体讨论了基于层次凝聚距离的聚类,在此基础上提出了用分类和聚类的方法构造新的进化群体.理论分析与实验结果表明,所讨论的方法比较国际上已有的方法具有更快的收敛速度.  相似文献   

2.
Clustering is an important tool in data mining process. Fuzzy \(c\)-means is one of the most classic methods. But it has been criticized that it is sensitive to the initial cluster centers and is easy to fall into a local optimum. Not depending on the selection of the initial population, evolutionary algorithm is used to solve the problems existed in original fuzzy \(c\)-means algorithm. However, evolutionary algorithm emphasizes the competition in the population. But in the real world, the evolution of biological population is not only the result of internal competition, but also the result of mutual competition and cooperation among different populations. Co-evolutionary algorithm is an emerging branch of evolutionary algorithm. It focuses on the internal competition, while on the cooperation among populations. This is more close to the process of natural biological evolution and co-evolutionary algorithm is a more excellent bionic algorithm. An immune clustering algorithm based on co-evolution is proposed in this paper. First, the clonal selection method is used to achieve the competition within population to reconstruct each population. The internal evolution of each population is completed during this process. Second, co-evolution operation is conducted to realize the information exchange among populations. Finally, the iteration results are compared with the global best individuals, with a strategy called elitist preservation, to find out the individual with a highest fitness value, that is, the result of clustering. Compared with four state-of-art algorithms, the experimental results indicate that the proposed algorithm outperforms other algorithms on the test data in the highest accuracy and average accuracy.  相似文献   

3.
A population-based algorithm-generator for real-parameter optimization   总被引:1,自引:1,他引:0  
In this paper, we propose a population-based, four-step, real-parameter optimization algorithm-generator. The approach divides the task of reaching near the optimum solution into four independent plans of (i) selecting good solutions from a solution base, (ii) generating new solutions using the selected solutions, (iii) choosing inferior or spurious solutions for replacement, and (iv) updating the solution base with good new or old solutions. Interestingly, many classical and evolutionary optimization algorithms are found to be representable by this algorithm-generator. The paper also recommends an efficient optimization algorithm with the possibility of using a number of different recombination plans and parameter values. With a systematic parametric study, the paper finally recommends a real-parameter optimization algorithm which outperforms a number of existing classical and evolutionary algorithms. To extend this study, the proposed algorithm-generator can be utilized to develop new and more efficient population-based optimization algorithms. The treatment of population-based classical and evolutionary optimization algorithms identically through the proposed algorithm-generator is the main hall-mark of this paper and should enable researchers from both classical and evolutionary fields to understand each others methods better and interact in a more coherent manner.  相似文献   

4.
A two-leveled symbiotic evolutionary algorithm for clustering problems   总被引:3,自引:3,他引:0  
Because of its unsupervised nature, clustering is one of the most challenging problems, considered as a NP-hard grouping problem. Recently, several evolutionary algorithms (EAs) for clustering problems have been presented because of their efficiency for solving the NP-hard problems with high degree of complexity. Most previous EA-based algorithms, however, have dealt with the clustering problems given the number of clusters (K) in advance. Although some researchers have suggested the EA-based algorithms for unknown K clustering, they still have some drawbacks to search efficiently due to their huge search space. This paper proposes the two-leveled symbiotic evolutionary clustering algorithm (TSECA), which is a variant of coevolutionary algorithm for unknown K clustering problems. The clustering problems considered in this paper can be divided into two sub-problems: finding the number of clusters and grouping the data into these clusters. The two-leveled framework of TSECA and genetic elements suitable for each sub-problem are proposed. In addition, a neighborhood-based evolutionary strategy is employed to maintain the population diversity. The performance of the proposed algorithm is compared with some popular evolutionary algorithms using the real-life and simulated synthetic data sets. Experimental results show that TSECA produces more compact clusters as well as the accurate number of clusters.  相似文献   

5.
Although the \(k\)-NN classifier is a popular classification method, it suffers from the high computational cost and storage requirements it involves. This paper proposes two effective cluster-based data reduction algorithms for efficient \(k\)-NN classification. Both have low preprocessing cost and can achieve high data reduction rates while maintaining \(k\)-NN classification accuracy at high levels. The first proposed algorithm is called reduction through homogeneous clusters (RHC) and is based on a fast preprocessing clustering procedure that creates homogeneous clusters. The centroids of these clusters constitute the reduced training set. The second proposed algorithm is a dynamic version of RHC that retains all its properties and, in addition, it can manage datasets that cannot fit in main memory and is appropriate for dynamic environments where new training data are gradually available. Experimental results, based on fourteen datasets, illustrate that both algorithms are faster and achieve higher reduction rates than four known methods, while maintaining high classification accuracy.  相似文献   

6.
Clustering analysis is one of the most commonly used data processing algorithms. Over half a century, K-means remains the most popular clustering algorithm because of its simplicity. Recently, as data volume continues to rise, some researchers turn to MapReduce to get high performance. However, MapReduce is unsuitable for iterated algorithms owing to repeated times of restarting jobs, big data reading and shuffling. In this paper, we address the problems of processing large-scale data using K-means clustering algorithm and propose a novel processing model in MapReduce to eliminate the iteration dependence and obtain high performance. We analyze and implement our idea. Extensive experiments on our cluster demonstrate that our proposed methods are efficient, robust and scalable.  相似文献   

7.
The one-dimensional clustering aims to group real-values of an input array into identified number of clusters. Some of the current algorithms, such as the k-means, need the number of clusters in advance, and use a goal function based on minimizing the sum of squared Euclidean distances to the mean of each group. This paper shows why this goal function is not efficient, even for one-dimensional case, then proposes an O (n × log n) efficient algorithm for the one-dimensional clustering purposes. The proposed algorithm can automatically detect the number of clusters. The performance of the proposed algorithm is approved across several experiments. In addition, results of experiments show why the goal function used in some current algorithms like the k-means is not suitable for the one-dimensional clustering.  相似文献   

8.
In this paper an efficient evolutionary algorithm is proposed which could be applied to real-time problems such as robotics applications. The only parameter of the proposed algorithm is the “Population Size” which makes the proposed algorithm similar to parameter-less algorithms, and the only operator applied during the algorithm execution is the bacterial conjugation operator, which makes using and implementation of the proposed algorithm much easier. The procedure of the bacterial conjugation operator used in this algorithm is different from operators of the same name previously used in other evolutionary algorithms such as the pseudo bacterial genetic algorithm or the microbial genetic algorithm. For a collection of 23 benchmark functions and some other well-known optimization problems, the experimental results show that the proposed algorithm has better performance when compared to particle swarm optimization and a simple genetic algorithm.  相似文献   

9.
The heuristic clustering methods based on similarity coefficient are considered to be very efficient for providing modularity and flexibility in a cellular manufacturing systems (CMS's). Various algorithms have been implemented in these heuristic methods. However, these algorithms suffer from string effect which is also known as “chaining”. Sedveral studies have reported this problem, yet not much research has been conducted to investigate its impact on actual clustering process.

This paper presents results from an analytical study performed to determine the severity of chaining problem and other characteristics associated with the clustering process of four selected algorithms. The four algorithms are Single linkage clustering (SLINK), Average linkage clustering (ALINK), Weighted average linkage clustering (WLINK), and Complete linkage clustering (CLINK). A sample of fifty problems with randomly generated data sets was used to determine feasible solutions consisting of machine cells and corresponding part families from each of the four algorithms. A quantitative measure is proposed for evaluating the performance of different algorithms. The study concludes that the chaining effect for CLINK, WLINK, ALINK and SLINK progresively worsens from CLINK to SLINK in the same order. The study also provides important guidelines to designers of a CMS in selecting the most efficient algorithm for a given problem data. Several important statistical results are also presented.  相似文献   


10.
Scalable Clustering Algorithms with Balancing Constraints   总被引:2,自引:0,他引:2  
Clustering methods for data-mining problems must be extremely scalable. In addition, several data mining applications demand that the clusters obtained be balanced, i.e., of approximately the same size or importance. In this paper, we propose a general framework for scalable, balanced clustering. The data clustering process is broken down into three steps: sampling of a small representative subset of the points, clustering of the sampled data, and populating the initial clusters with the remaining data followed by refinements. First, we show that a simple uniform sampling from the original data is sufficient to get a representative subset with high probability. While the proposed framework allows a large class of algorithms to be used for clustering the sampled set, we focus on some popular parametric algorithms for ease of exposition. We then present algorithms to populate and refine the clusters. The algorithm for populating the clusters is based on a generalization of the stable marriage problem, whereas the refinement algorithm is a constrained iterative relocation scheme. The complexity of the overall method is O(kN log N) for obtaining k balanced clusters from N data points, which compares favorably with other existing techniques for balanced clustering. In addition to providing balancing guarantees, the clustering performance obtained using the proposed framework is comparable to and often better than the corresponding unconstrained solution. Experimental results on several datasets, including high-dimensional (>20,000) ones, are provided to demonstrate the efficacy of the proposed framework.
Joydeep GhoshEmail:
  相似文献   

11.
Clustering is needed in various applications such as biometric person authentication, speech coding and recognition, image compression and information retrieval. Hundreds of clustering methods have been proposed for the task in various fields but, surprisingly, there are few extensive studies actually comparing them. An important question is how much the choice of a clustering method matters for the final pattern recognition application. Our goal is to provide a thorough experimental comparison of clustering methods for text-independent speaker verification. We consider parametric Gaussian mixture model (GMM) and non-parametric vector quantization (VQ) model using the best known clustering algorithms including iterative (K-means, random swap, expectation-maximization), hierarchical (pairwise nearest neighbor, split, split-and-merge), evolutionary (genetic algorithm), neural (self-organizing map) and fuzzy (fuzzy C-means) approaches. We study recognition accuracy, processing time, clustering validity, and correlation of clustering quality and recognition accuracy. Experiments from these complementary observations indicate clustering is not a critical task in speaker recognition and the choice of the algorithm should be based on computational complexity and simplicity of the implementation. This is mainly because of three reasons: the data is not clustered, large models are used and only the best algorithms are considered. For low-order models, choice of the algorithm, however, can have a significant effect.  相似文献   

12.
Multi-objective clustering algorithms are preferred over its conventional single objective counterparts as they incorporate additional knowledge on properties of data in the from of objectives to extract the underlying clusters present in many datasets. Researchers have recently proposed some standardized multi-objective evolutionary clustering algorithms based on genetic operations, particle swarm optimization, clonal selection principles, differential evolution and simulated annealing, etc. In many cases it is observed that hybrid evolutionary algorithms provide improved performance compared to that of individual algorithm. In this paper an automatic clustering algorithm MOIMPSO (Multi-objective Immunized Particle Swarm Optimization) is proposed, which is based on a recently developed hybrid evolutionary algorithm Immunized PSO. The proposed algorithm provides suitable Pareto optimal archive for unsupervised problems by automatically evolving the cluster centers and simultaneously optimizing two objective functions. In addition the algorithm provides a single best solution from the Pareto optimal archive which mostly satisfy the users' requirement. Rigorous simulation studies on 11 benchmark datasets demonstrate the superior performance of the proposed algorithm compared to that of the standardized automatic clustering algorithms such as MOCK, MOPSO and MOCLONAL. An interesting application of the proposed algorithm has also been demonstrated to classify the normal and aggressive actions of 3D human models.  相似文献   

13.
Partitional clustering is a common approach to cluster analysis. Although many algorithms have been proposed, partitional clustering remains a challenging problem with respect to the reliability and efficiency of recovering high quality solutions in terms of its criterion functions. In this paper, we propose a niching genetic k-means algorithm (NGKA) for partitional clustering, which aims at reliably and efficiently identifying high quality solutions in terms of the sum of squared errors criterion. Within the NGKA, we design a niching method, which encourages mating among similar clustering solutions while allowing for some competitions among dissimilar solutions, and integrate it into a genetic algorithm to prevent premature convergence during the evolutionary clustering search. Further, we incorporate one step of k-means operation into the regeneration steps of the resulted niching genetic algorithm to improve its computational efficiency. The proposed algorithm was applied to cluster both simulated data and gene expression data and compared with previous work. Experimental results clear show that the NGKA is an effective clustering algorithm and outperforms two other genetic algorithm based clustering methods implemented for comparison.  相似文献   

14.
Individual privacy may be compromised during the process of mining for valuable information, and the potential for data mining is hindered by the need to preserve privacy. It is well known that k-means clustering algorithms based on differential privacy require preserving privacy while maintaining the availability of clustering. However, it is difficult to balance both aspects in traditional algorithms. In this paper, an outlier-eliminated differential privacy (OEDP) k-means algorithm is proposed that both preserves privacy and improves clustering efficiency. The proposed approach selects the initial centre points in accordance with the distribution density of data points, and adds Laplacian noise to the original data for privacy preservation. Both a theoretical analysis and comparative experiments were conducted. The theoretical analysis shows that the proposed algorithm satisfies ε-differential privacy. Furthermore, the experimental results show that, compared to other methods, the proposed algorithm effectively preserves data privacy and improves the clustering results in terms of accuracy, stability, and availability.  相似文献   

15.
Outlier detection is an important data mining task with many contemporary applications. Clustering based methods for outlier detection try to identify the data objects that deviate from the normal data. However, the uncertainty regarding the cluster membership of an outlier object has to be handled appropriately during the clustering process. Additionally, carrying out the clustering process on data described using categorical attributes is challenging, due to the difficulty in defining requisite methods and measures dealing with such data. Addressing these issues, a novel algorithm for clustering categorical data aimed at outlier detection is proposed here by modifying the standard \(k\)-modes algorithm. The uncertainty regarding the clustering process is addressed by considering a soft computing approach based on rough sets. Accordingly, the modified clustering algorithm incorporates the lower and upper approximation properties of rough sets. The efficacy of the proposed rough \(k\)-modes clustering algorithm for outlier detection is demonstrated using various benchmark categorical data sets.  相似文献   

16.
The efficiency of universal electric motors that are widely used in home appliances can be improved by optimizing the geometry of the rotor and the stator. Expert designers traditionally approach this task by iteratively evaluating candidate designs and improving them according to their experience. However, the existence of reliable numerical simulators and powerful stochastic optimization techniques make it possible to automate the design procedure. We present a comparative study of six stochastic optimization algorithms in designing optimal rotor and stator geometries of a universal electric motor where the primary objective is to minimize the motor power losses. We compare three methods from the domain of evolutionary computation, generational evolutionary algorithm, steady-state evolutionary algorithm and differential evolution, two particle-based methods, particle-swarm optimization and electromagnetism-like algorithm, and a recently proposed multilevel ant stigmergy algorithm. By comparing their performance, the most efficient method for solving the problem is identified and an explanation of its success is offered.  相似文献   

17.

Context

Component identification, the process of evolving legacy system into finely organized component-based software systems, is a critical part of software reengineering. Currently, many component identification approaches have been developed based on agglomerative hierarchical clustering algorithms. However, there is a lack of thorough investigation on which algorithm is appropriate for component identification.

Objective

This paper focuses on analyzing agglomerative hierarchical clustering algorithms in software reengineering, and then identifying their respective strengths and weaknesses in order to apply them effectively for future practical applications.

Method

A series of experiments were conducted for 18 clustering strategies combined according to various similarity measures, weighting schemes and linkage methods. Eleven subject systems with different application domains and source code sizes were used in the experiments. The component identification results are evaluated by the proposed size, coupling and cohesion criteria.

Results

The experimental results suggested that the employed similarity measures, weighting schemes and linkage methods can have various effects on component identification results with respect to the proposed size, coupling and cohesion criteria, so the hierarchical clustering algorithms produced quite different clustering results.

Conclusions

According to the experimental results, it can be concluded that it is difficult to produce perfectly satisfactory results for a given clustering algorithm. Nevertheless, these algorithms demonstrated varied capabilities to identify components with respect to the proposed size, coupling and cohesion criteria.  相似文献   

18.
Sanghamitra  Sriparna   《Pattern recognition》2007,40(12):3430-3451
In this paper, an evolutionary clustering technique is described that uses a new point symmetry-based distance measure. The algorithm is therefore able to detect both convex and non-convex clusters. Kd-tree based nearest neighbor search is used to reduce the complexity of finding the closest symmetric point. Adaptive mutation and crossover probabilities are used. The proposed GA with point symmetry (GAPS) distance based clustering algorithm is able to detect any type of clusters, irrespective of their geometrical shape and overlapping nature, as long as they possess the characteristic of symmetry. GAPS is compared with existing symmetry-based clustering technique SBKM, its modified version, and the well-known K-means algorithm. Sixteen data sets with widely varying characteristics are used to demonstrate its superiority. For real-life data sets, ANOVA and MANOVA statistical analyses are performed.  相似文献   

19.
Meilă  Marina  Heckerman  David 《Machine Learning》2001,42(1-2):9-29
We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation–Maximization (EM) algorithm, a winner take all version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.  相似文献   

20.
膜计算(也称为P系统或膜系统)是一种新颖的分布式、并行计算模型.为了处理数据聚类问题,提出了一种采用混合进化机制的膜聚类算法.它使用了一个由3个细胞组成的组织P系统,为一个待聚类的数据集发现最优的簇中心.其对象表示候选的簇中心,并且这3个细胞分别使用了3种不同的进化机制:遗传算子、速度-位移模型和差分进化机制.然而,所使用的速度-位移模型和差分进化机制是结合了这个特殊膜结构和转运机制所提出的改进版本.这种混合进化机制能够增强系统中对象的多样性和改善收敛性能.在混合进化机制和转运机制控制下,这种膜聚类算法能够确定一个数据集的良好划分.所提出的膜聚类算法在3个人工数据集和5个真实数据集上被评估,并与k-means和几种进化聚类算法进行比较.统计显著性测试建立了所提出的膜聚类算法的优势.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号