Similar literature
1.
In this paper a new multiobjective (MO) clustering technique (GenClustMOO) is proposed which can automatically partition the data into an appropriate number of clusters. Each cluster is divided into several small hyperspherical subclusters and the centers of all these small sub-clusters are encoded in a string to represent the whole clustering. For assigning points to different clusters, these local sub-clusters are considered individually. For the purpose of objective function evaluation, these sub-clusters are merged appropriately to form a variable number of global clusters. Three objective functions, one reflecting the total compactness of the partitioning based on the Euclidean distance, the other reflecting the total symmetry of the clusters, and the last reflecting the cluster connectedness, are considered here. These are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method, in order to detect the appropriate number of clusters as well as the appropriate partitioning. The symmetry present in a partitioning is measured using a newly developed point symmetry based distance. Connectedness present in a partitioning is measured using the relative neighborhood graph concept. Since AMOSA, as well as any other MO optimization technique, provides a set of Pareto-optimal solutions, a new method is also developed to determine a single solution from this set. Thus the proposed GenClustMOO is able to detect the appropriate number of clusters and the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. 
The effectiveness of GenClustMOO is demonstrated on nineteen artificial and seven real-life data sets of varying complexity, in comparison with another recent multiobjective clustering technique (MOCK), a single-objective genetic-algorithm-based automatic clustering technique (VGAPS-clustering), K-means, and single linkage clustering. Part of the experiments also demonstrates the effectiveness of AMOSA as the underlying optimization technique in GenClustMOO, in comparison with another evolutionary MO algorithm, PESA2.
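The abstract above relies on a multiobjective optimizer returning a set of Pareto-optimal solutions. As a minimal sketch of that idea only (not AMOSA itself, and not the authors' code), the following filters a list of objective vectors down to its non-dominated subset. It assumes every objective is to be minimized; the function names `dominates` and `non_dominated` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming all objectives are minimized.

    a dominates b when a is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates (the Pareto front)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


if __name__ == "__main__":
    # Each tuple is (objective 1, objective 2) for one candidate clustering.
    candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(non_dominated(candidates))
```

Choosing one clustering from such a front is a separate step; the abstract states that a new method is developed for it but does not describe the method, so it is not sketched here.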

2.
In this paper the problem of automatically clustering a data set is posed as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. The proposed multiobjective clustering technique uses a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. A variable number of cluster centers is encoded in each string, and the number of clusters present in different strings varies over a range. Points are assigned to clusters based on the newly developed point symmetry based distance rather than the Euclidean distance. Two cluster validity indices, the Euclidean distance based XB-index and the recently developed point symmetry distance based Sym-index, are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. The proposed clustering technique is thus able to detect both the proper number of clusters and the appropriate partitioning from data sets having either hyperspherical clusters or point symmetric clusters. A new semi-supervised method is also proposed to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown for seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, and by two single-objective genetic algorithm based automatic clustering techniques, VGAPS-clustering and GCUK-clustering.
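The XB-index named above is usually understood as the Xie-Beni index: the total within-cluster squared distance divided by the number of points times the smallest squared distance between two cluster centers, with lower values indicating a better partitioning. The sketch below computes a crisp (hard-assignment) variant under that assumption; the abstract does not give the formula, and the function names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xb_index(points: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Crisp Xie-Beni index (assumed form); lower is better.

    labels[i] is the index in `centers` of the cluster that points[i]
    belongs to. At least two distinct centers are required.
    """
    n = len(points)
    # Numerator: total squared distance of each point to its own center.
    compactness = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    # Denominator: n times the smallest squared separation between centers.
    min_separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (n * min_separation)


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    ctrs = [(0, 1), (10, 1)]
    print(xb_index(pts, ctrs, [0, 0, 1, 1]))
```

In a multiobjective setting such as the one described, this value would be one coordinate of the objective vector for a candidate partitioning.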

3.
In this paper a new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed which is capable of identifying the correct partitioning as well as the most relevant set of features from a data set. A newly developed multiobjective simulated annealing based optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimization strategy. Features and cluster centers are encoded in the form of a string. Three objective functions are used: two internal cluster validity indices measuring the goodness of the obtained partitioning using the Euclidean distance and the point symmetry based distance, respectively, and a count of the number of features. These three objectives are optimized simultaneously using AMOSA in order to detect the appropriate subset of features, the appropriate number of clusters, and the appropriate partitioning. Points are allocated to different clusters using a point symmetry based distance. Mutation changes the feature combination as well as the set of cluster centers. Since AMOSA, like any other MOO technique, provides a set of solutions on the final Pareto front, a technique based on the concept of semi-supervised classification is developed to select a solution from the given set. The effectiveness of FeaClusMOO is shown on seven higher-dimensional real-life data sets in comparison with other clustering techniques: its Euclidean distance based version, in which the Euclidean distance is used for cluster assignment; a genetic algorithm based automatic clustering technique (VGAPS-clustering) using the point symmetry based distance with all the features; and K-means clustering with all features.

4.
In this paper, the automatic segmentation of a multispectral magnetic resonance image of the brain is posed as a clustering problem in the intensity space. The automatic clustering problem is then modelled as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. A multiobjective clustering technique, named MCMOClust, is used to solve this problem. MCMOClust uses a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Each cluster is divided into several small hyperspherical sub-clusters, and the centers of all these sub-clusters are encoded in a string to represent the whole clustering. For assigning points to different clusters, these local sub-clusters are considered individually. For the purpose of objective function evaluation, these sub-clusters are merged appropriately to form a variable number of global clusters. Two cluster validity indices, the Euclidean distance based XB-index and the recently developed point symmetry distance based Sym-index, are optimized simultaneously to automatically evolve the appropriate number of clusters present in MR brain images. A semi-supervised method is used to select a single solution from the final Pareto optimal front of MCMOClust. The method is applied to several simulated T1-weighted, T2-weighted and proton density magnetic resonance brain images, both normal and with MS lesions. The superiority of the method over the Fuzzy C-means and Expectation Maximization clustering algorithms and over a newly developed symmetry based fuzzy genetic clustering technique (Fuzzy-VGAPS) is demonstrated quantitatively. The automatic segmentation obtained by the multiseed based multiobjective clustering technique (MCMOClust) is also compared with the available ground truth information.

5.
Classifying the pixels of satellite images into homogeneous regions is a very challenging task, as different regions have different types of land cover. Some land covers contain more regions, while some contain relatively smaller regions (e.g., bridges, roads). In satellite image segmentation, no prior information is available about the number of clusters. In this paper, this problem is solved using semi-supervised clustering, which combines properties of unsupervised and supervised classification. Three cluster validity indices are optimized simultaneously using AMOSA, a modern multiobjective optimization technique based on simulated annealing. The first two cluster validity indices, the symmetry distance based Sym-index and the Euclidean distance based I-index, are based on unsupervised properties. The last one, the Minkowski index, is a cluster validity index based on supervised information. To obtain the supervised information, the fuzzy C-means clustering technique is applied first. Then, based on the highest membership values of the data points to their respective clusters, 10% of the data points are chosen at random together with their class labels. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on three satellite image data sets of different cities of India. Results are also compared with existing clustering techniques.

6.
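An improved ant clustering algorithm based on a point symmetry distance is proposed. Instead of using the Euclidean distance to compute the similarity of objects within a cluster, the algorithm uses a new point symmetry distance; on data sets with symmetry properties, it can effectively identify the number of clusters and a suitable partitioning. In the algorithm, artificial ants represent data objects and search for the most suitable partitioning according to the clustering rules defined by the algorithm. Finally, clustering experiments on different data sets compare the algorithm with the standard ant clustering algorithm. The experimental results confirm the effectiveness of the algorithm.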

7.
Clustering is a significant data mining task which partitions datasets based on similarities among data. This technique plays a very important role in the rapidly growing field known as exploratory data analysis. A key difficulty of effective clustering is to define proper grouping criteria that reflect fundamentally different aspects of a good clustering solution such as compactness and separation of clusters. Moreover, in the conventional clustering algorithms only a single criterion is considered that may not conform to the diverse and complex shapes of the underlying clusters. In this study, partitional clustering is defined as a multiobjective optimization problem. The aim is to obtain well-separated, connected, and compact clusters and for this purpose, two objective functions have been defined based on the concepts of data connectivity and cohesion. These functions are the core of an efficient multiobjective particle swarm optimization algorithm, which has been devised for and applied to automatic grouping of large unlabeled datasets. A comprehensive experimental study is conducted and the obtained results are compared with the results of four other state-of-the-art clustering techniques. It is shown that the proposed algorithm can achieve the optimal number of clusters, is robust and outperforms, in most cases, the other methods on the selected benchmark datasets.

8.
The objective of brain image segmentation is to partition brain images into non-overlapping homogeneous regions representing the different anatomical structures. Magnetic resonance brain image segmentation has a large number of applications in the diagnosis of neurological disorders such as Alzheimer's disease and Parkinson-related syndromes. Automatically segmenting an MR brain image is, however, not an easy task. Several unsupervised and supervised classification techniques have been developed in the literature to solve this problem. Supervised classification techniques are more time-consuming and costly because they require sufficient labeled data. In contrast, unsupervised classification techniques work without any prior information but suffer from local-trap problems. To overcome the problems associated with both, we propose a new semi-supervised clustering technique based on multiobjective optimization and apply it to the automatic segmentation of MR brain images in the intensity space. Multiple centers are used to encode a cluster in the form of a string. The proposed clustering technique uses the intensity values of the brain pixels as the features. It also assumes that the actual class labels of 10% of the points of a particular image data set are known. Three cluster validity indices are used as the objective functions and are optimized simultaneously using AMOSA, a modern multiobjective optimization technique based on simulated annealing. The first two cluster validity indices, the symmetry distance based Sym-index and the Euclidean distance based I-index, are based on unsupervised properties. The last one, the Minkowski index, is a cluster validity index based on supervised information.
The effectiveness of the proposed semi-supervised clustering technique is demonstrated on several simulated MR normal brain images and MR brain images with multiple sclerosis lesions. Its performance is compared with that of popular image segmentation techniques such as Fuzzy C-means and Expectation Maximization, and of recent image clustering techniques such as the multiobjective MCMOClust technique and Fuzzy-VGAPS clustering.

9.
In this article, a new symmetry based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters is done based on a point symmetry based distance rather than the Euclidean distance. A newly proposed point symmetry based cluster validity index, Sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is therefore able to detect both convex and non-convex clusters irrespective of their sizes and shapes as long as they possess the point symmetry property. Kd-tree based nearest neighbor search is used to reduce the complexity of computing the point symmetry based distance. A proof of the convergence property of the variable string length GA with point symmetry based distance clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to the variable string length Genetic K-means algorithm (GCUK-clustering) and a recently developed weighted sum validity function based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.
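The abstract above does not define the point symmetry based distance. As a rough sketch only, the code below assumes one commonly cited form: reflect the point through the cluster center, average the Euclidean distances from the reflected point to its `knear` nearest data points, and multiply by the Euclidean distance between the point and the center. The function names, the `knear=2` default, and the brute-force neighbor search (used here in place of the Kd-tree search the abstract mentions) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclid(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ps_distance(x: Point, center: Point, data: List[Point], knear: int = 2) -> float:
    """Point symmetry based distance of x with respect to center (assumed form).

    A small value means x has a near mirror image in `data` about `center`
    and is also close to `center`.
    """
    # Mirror image of x through the center: 2 * center - x.
    reflected = tuple(2 * c - xi for xi, c in zip(x, center))
    # Brute-force search for the knear data points closest to the reflection.
    nearest = sorted(euclid(reflected, p) for p in data)[:knear]
    symmetry_term = sum(nearest) / len(nearest)
    return symmetry_term * euclid(x, center)


if __name__ == "__main__":
    ring = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    print(ps_distance((1, 0), (0, 0), ring))
```

Sorting all distances costs O(n log n) per query; a spatial index such as a Kd-tree is the kind of structure that reduces this cost on larger data sets.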

10.
Clustering of symbolic data is a necessary process in many scientific disciplines, and fuzzy c-means clustering for interval data (IFCM) is one of the most popular algorithms. This paper presents an adaptive fuzzy c-means clustering algorithm for interval-valued data based on an interval-dividing technique. The method gives a fuzzy partition and a prototype for each fuzzy cluster by optimizing an objective function. The adaptive distance between a pattern and its cluster center varies with each iteration of the algorithm and may either differ from one cluster to another or be the same for all clusters. The novel part of this approach is that it takes into account every point in both intervals when computing the distance between the cluster and its representative. Experiments are conducted on synthetic data sets and a real data set. To compare the overall performance of the proposed method with four other existing methods, the corrected Rand index, the value of the objective function and the number of iterations are used as the evaluation criteria. Clustering results demonstrate that the algorithm proposed in this paper has remarkable advantages.

11.
Due to data sparseness and attribute redundancy in high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. To effectively address this issue, this paper presents a new optimization algorithm for clustering high-dimensional categorical data, which is an extension of the k-modes clustering algorithm. In the proposed algorithm, a novel weighting technique for categorical data is developed to calculate two weights for each attribute (or dimension) in each cluster and use the weight values to identify the subsets of important attributes that categorize different clusters. The convergence of the algorithm under an optimization framework is proved. The performance and scalability of the algorithm are evaluated experimentally on both synthetic and real data sets. The experimental studies show that the proposed algorithm is effective in clustering categorical data sets and also scalable to large data sets owing to its linear time complexity with respect to the number of data objects, attributes or clusters.

12.
Identifying the correct number of clusters and the appropriate partitioning are important considerations in clustering, for which several cluster validity indices, primarily utilizing the Euclidean distance, have been used in the literature. In this paper a new measure of connectivity is incorporated in the definitions of seven cluster validity indices, namely DB-index, Dunn-index, Generalized Dunn-index, PS-index, I-index, XB-index and SV-index, thereby yielding seven new cluster validity indices which are able to automatically detect clusters of any shape, size or convexity as long as they are well-separated. Here connectivity is measured using a novel approach following the concept of the relative neighborhood graph. It is empirically established that incorporating the property of connectivity significantly improves the capabilities of these indices in identifying the appropriate number of clusters. The well-known single linkage and K-means clustering techniques are used as the underlying partitioning algorithms. Results on eight artificially generated and three real-life data sets show that the connectivity based Dunn-index performs best among the seven indices. Comparisons are made with the original versions of these seven cluster validity indices.

13.
Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a given data set. Some existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits clustering performance on complex data structures. This paper proposes a clustering algorithm that represents one cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate the regions of high density, because of its low computational cost and good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Multiple prototypes with small separations are grouped into a given number of clusters by an agglomerative method. New prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure and is robust to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.

14.
Data clustering using bacterial foraging optimization (total citations: 1; self-citations: 0; citations by others: 1)
Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. This article presents a new clustering algorithm based on the mechanism analysis of Bacterial Foraging (BF). It is an optimization methodology for the clustering problem in which a group of bacteria forage to converge to certain positions, which serve as the final cluster centers, by minimizing the fitness function. The quality of this approach is evaluated on several well-known benchmark data sets. Compared with the popular k-means algorithm, an ACO-based algorithm and the PSO-based clustering technique, experimental results show that the proposed algorithm is an effective clustering technique and can be used to handle data sets with various cluster sizes, densities and multiple dimensions.

15.
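The multiple-means clustering algorithm with a specified number K of clusters builds on the K-means algorithm by introducing multiple sub-clusters, which mitigates the weakness of K-means on non-convex data sets; it formalizes multiple-means clustering as an optimization problem and can obtain better clustering results. However, the algorithm is sensitive to the initial prototypes, and selecting prototypes at random makes the clustering results unstable. To address these problems, a stable K-multiple-means clustering algorithm is proposed, and its complexity and convergence are briefly discussed. The algorithm first constructs a graph based on the nearest-neighbor relations among the data samples, divides the data into several groups according to the connected components of the graph, and takes the mean of each group as an initial prototype. It then solves the optimization problem by alternating iteration to obtain the final clustering result. Experiments on artificial and real data sets show that the algorithm yields more stable and better clustering results.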
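The initialization step described in this abstract (nearest-neighbor graph, connected components, group means as initial prototypes) can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes each point is linked to its single nearest neighbor under the Euclidean distance, and the function names are hypothetical. The subsequent alternating optimization is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_prototypes(points: List[Point]) -> List[Tuple[float, ...]]:
    """Initial prototypes from connected components of a 1-nearest-neighbor graph.

    Requires at least two points. Each point is joined to its nearest
    neighbor; every connected component contributes its mean as a prototype.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Index of the point closest to points[i], excluding itself.
        j = min((k for k in range(n) if k != i),
                key=lambda k: sq_dist(points[i], points[k]))
        parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[Point]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])

    # Mean of each group, computed coordinate by coordinate.
    return [tuple(sum(coord) / len(g) for coord in zip(*g)) for g in groups.values()]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(sorted(nearest_neighbor_prototypes(data)))
```

Because the grouping is determined by the data alone, repeated runs give the same prototypes, which is the stability property the abstract emphasizes over random selection.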

16.
This article describes a multiobjective spatial fuzzy clustering algorithm for image segmentation. To obtain satisfactory segmentation performance for noisy images, the proposed method introduces the non-local spatial information derived from the image into fitness functions which respectively consider the global fuzzy compactness and fuzzy separation among the clusters. After producing the set of non-dominated solutions, the final clustering solution is chosen by a cluster validity index utilizing the non-local spatial information. Moreover, to automatically evolve the number of clusters in the proposed method, a real-coded variable string length technique is used to encode the cluster centers in the chromosomes. The proposed method is applied to synthetic and real images contaminated by noise and compared with k-means, fuzzy c-means, two fuzzy c-means clustering algorithms with spatial information and a multiobjective variable string length genetic fuzzy clustering algorithm. The experimental results show that the proposed method behaves well in evolving the number of clusters and obtaining satisfactory performance on noisy image segmentation.

17.
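To improve the convergence performance of the quantum-behaved particle swarm optimization algorithm and to avoid premature convergence of particles, a quantum-behaved particle swarm optimization algorithm based on a comprehensive learning strategy is proposed. A new data clustering algorithm is designed on this basis; through a special particle encoding, it can automatically determine the best number of clusters during clustering. Clustering experiments on five test data sets compare it with two other dynamic clustering algorithms. The results show that the dynamic clustering algorithm based on quantum-behaved particle swarm optimization with the comprehensive learning strategy obtains good clustering results and has good application prospects.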

18.
Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. This article presents a new clustering algorithm based on the mechanism analysis of chaotic ant swarm (CAS). It is an optimization methodology for the clustering problem which aims to obtain a globally optimal assignment by minimizing the objective function. The proposed algorithm combines three advantages: it finds a globally optimal solution to the objective function, it is not sensitive to clusters of different size and density, and it is suitable for multi-dimensional data sets. The quality of this approach is evaluated on several well-known benchmark data sets. Compared with the popular k-means algorithm and the PSO-based clustering technique, experimental results show that the algorithm is an effective clustering technique and can be used to handle data sets with complex cluster sizes, densities and multiple dimensions.

19.
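To address the problems that the data competition algorithm computes similarity with the Euclidean distance, requires the number of clusters to be specified manually, and cannot determine cluster centers accurately and automatically, a data competition clustering algorithm that determines cluster centers automatically is proposed. The concept of a data field is introduced so that the computed potential values better match the true distribution of the data set. The potential of each data point is combined with its local minimum distance to form a decision graph from which the cluster centers are determined automatically, and clustering is then completed according to the nearest-neighbor principle. Experiments on artificial and real data sets show that the proposed algorithm achieves better clustering performance than the original data competition algorithm.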

20.
This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed robust competitive agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights to each data point. The first set of constrained weights represents degrees of sharing, and is used to create a competitive environment and to generate a fuzzy partition of the data set. The second set corresponds to robust weights, and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find an unknown number of clusters of various shapes in noisy data sets, as well as to fit an unknown number of parametric models simultaneously. Several examples, such as clustering/mixture decomposition, line/plane fitting, segmentation of range images, and estimation of motion parameters of multiple objects, are shown.
