首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
针对K—Modes算法的不足,提出了一种基于信任值的分类属性聚类算法TrustCCluster,该算法不需预先给定聚类个数,聚类结果稳定且不依赖于初始值的选取。在真实数据上验证了TrustC—Cluster聚类算法,并与K—Modes和P—Modes算法进行了对比,实验结果表明TmstCCluster算法是有效、可行的。  相似文献   

2.
基于新的相异度量的模糊K-Modes聚类算法   总被引:3,自引:2,他引:1       下载免费PDF全文
白亮  曹付元  梁吉业 《计算机工程》2009,35(16):192-194
传统的模糊K-Modes聚类算法采用简单匹配方法度量对象与Mode之间的相异程度,没有充分考虑Mode对类的代表程度,容易造成信息的丢失,弱化了类内的相似性。针对上述问题,通过对象对类的隶属度反映Mode对类的代表程度,提出一种新的相异度量,并将它应用于传统的模糊K—Modes聚类算法。与传统的K—Modes和模糊K-Modes聚类算法相比,该相异度量是有效的。  相似文献   

3.
基于免疫规划的K-means聚类算法   总被引:48,自引:0,他引:48  
在分析K—means聚类算法的优越性和存在不足的基础上,提出了一种新的聚类算法——基于免疫规划的K—means聚类算法.理论分析和仿真结果表明,该算法不仅有效地克服了传统的K—means聚类算法易陷入局部极小值的缺点,而且明显地避免了对初始化选值敏感性的问题,同时也有较快的收敛速度.  相似文献   

4.
模糊K-Prototypes(FKP)算法能够对包含数值属性和分类属性相混合的数据集进行有效聚类,但是存在对初始值敏感、容易陷入局部极小值的问题.为了克服该缺点,提出了一种基于粒子群优化(PSO)算法和FKP算法的混合聚类算法,先利用PSO算法确定FKP的初始聚类中心,再将PSO聚类结果作为后续FKP算法的初始值.实验结果表明,新算法具有良好的收敛性和稳定性,聚类效果优于单一使用FKP算法.  相似文献   

5.
针对模糊C-均值聚类算法对初始化分类参数(包括起始聚类中心位置和初始化分类隶属度矩阵)的选择比较敏感而导致分类结果差异性较大,以及错误分类会给解决实际问题带来难以预料后果的不足,本文从反映数据聚类后类间分离性测度的划分系数入手,提出了可变加权划分系数的新概念,并用于数据分类效果的评价。实验结果表明,本文提出的评价方法不仅是可行的,而且比模糊C-均值聚类算法的目标函数作为数据分类效果的评价准则更好。  相似文献   

6.
为了改善K均值聚类算法对初始聚类中心敏感和易于陷入局部最优的不足,提出人工蜂群算法和K均值聚类算法相结合的想法,即基于人工蜂群优化的K均值聚类算法。通过全局寻优能力强的人工蜂群算法初始化K均值的聚类中心并优化聚类中心的位置,从而帮助K均值跳出局部极值,优化聚类效果。将混合聚类算法用Iris、Red Wine和New Red Wine数据集做聚类测试,结果表明该算法既克服了原始K均值聚类算法容易受初始聚类中心影响和不稳定的缺点,又具有良好的性能和聚类效果。  相似文献   

7.
基于二阶模糊聚类算法的雷达目标距离像识别   总被引:1,自引:0,他引:1  
彭翔  周代英 《计算机应用》2011,31(2):399-401
针对于模糊C-均值(FCM)算法敏感于聚类中心初始值的缺点,提出一种基于二阶模糊聚类方法。该方法利用传递闭包(TC)算法无初始化的优点,先对样本集按一定分类水平进行划分,选取若干类,求得这些类的样本均值作为FCM算法的初始聚类中心。一方面能够获得理想的聚类中心初始值,同时还能通过分类水平值来优化聚类中心数和聚类中心,避免局部最优,克服一致性聚类。利用该算法对三类飞机目标的实测一维距离像数据进行了识别实验,实验结果表明,基于二阶模糊聚类方法的识别率比FCM有了明显的改善。  相似文献   

8.
本文提出了基于信息熵和K均值算法混合迭代模糊聚类的客户细分模型,解决了模糊聚类的原型初始化参数问题.将信息熵和K均值算法引入模糊聚类中进行分析,并结合联通客户的大样本数据进行实际分析,与传统方法相比,取得了较好的效果.  相似文献   

9.
分析了模糊聚类中的FCM(Fuzzy C—Means)算法,利用该算法对一个TCP连接日志的抽样数据进行聚类,利用聚类中心对任选的两组数据集进行分类,并对聚类结果进行了分析。  相似文献   

10.
基于流数据的模糊聚类算法   总被引:1,自引:0,他引:1  
对流数据进行有效聚类是一个吸引研究者很大注意力的问题.传统的聚类挖掘算法只能适用于纯数值属性数据或纯分类属性数据,很难适用于混合属性的数据.针对混合属性数据的特点,在借鉴AcluStream算法的基础上,提出了一种模糊聚类算法.算法对流数据的相异度分类度量,定量属性使用欧氏距离和曼哈坦距离度量,定性属性可以采用hamming距离度量.模糊聚类算法的主要步骤有两步:第一步,运用最小距离聚类算法进行聚类,构成一个初始类.第二步,对基于最小距离聚类算法进行聚类所得到的初始簇,运用密度聚类方法进行聚合或分割,使得聚类集合稳定.实践证明:该算法是快速地有效的.  相似文献   

11.
A fuzzy k-modes algorithm for clustering categorical data   总被引:12,自引:0,他引:12  
This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effectiveness of the algorithm is demonstrated with experimental results  相似文献   

12.
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.  相似文献   

13.
On the impact of dissimilarity measure in k-modes clustering algorithm   总被引:3,自引:0,他引:3  
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in (Z. He, et al., 2005), (O. San, et al., 2004) which allows the use of the k-modes paradigm to obtain a cluster with strong intrasimilarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and the convergence of the algorithm under the optimization framework  相似文献   

14.
聚类是数据挖掘中重要的技术之一,它是按照相似原则将数据进行分类。然而分类型数据的聚类是学习算法中重要而又棘手的问题。传统的k-modes算法采用简单的0-1匹配方法定义两个属性值之间的相异度,没有将整个数据集的分布考虑进来,导致差异性度量不够准确。针对这个问题,提出基于结构相似性的k-modes算法。该算法不仅考虑属性值它们本身的异同,而且考虑了它们在其他属性下所处的结构。从集群识别和准确率两个方面进行仿真实验,表明基于结构相似性的k-modes算法在伸缩性和准确率方面更有效。  相似文献   

15.
提出了一种基于新相异度量的模糊K-Modes算法。该算法假定不同属性对聚类结果有不同程度的影响,定义了新的属性值函数,以基于划分相似度的聚类精确度作为聚类结果的评价准则。通过真实数据的实验结果表明,新的基于相异度量的模糊K-Modes算法比传统的模糊K-Modes算法有更好的聚类效果。  相似文献   

16.
Clustering categorical data sets using tabu search techniques   总被引:2,自引:0,他引:2  
Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria. The fuzzy k-means-type algorithm is best suited for implementing this clustering operation because of its effectiveness in clustering data sets. However, working only on numeric values limits its use because data sets often contain categorical values. In this paper, we present a tabu search based clustering algorithm, to extend the k-means paradigm to categorical domains, and domains with both numeric and categorical values. Using tabu search based techniques, our algorithm can explore the solution space beyond local optimality in order to aim at finding a global solution of the fuzzy clustering problem. It is found that the clustering results produced by the proposed algorithm are very high in accuracy.  相似文献   

17.
In this research, a data clustering algorithm named as non-dominated sorting genetic algorithm-fuzzy membership chromosome (NSGA-FMC) based on K-modes method which combines fuzzy genetic algorithm and multi-objective optimization was proposed to improve the clustering quality on categorical data. The proposed method uses fuzzy membership value as chromosome. In addition, due to this innovative chromosome setting, a more efficient solution selection technique which selects a solution from non-dominated Pareto front based on the largest fuzzy membership is integrated in the proposed algorithm. The multiple objective functions: fuzzy compactness within a cluster (π) and separation among clusters (sep) are used to optimize the clustering quality. A series of experiments by using three UCI categorical datasets were conducted to compare the clustering results of the proposed NSGA-FMC with two existing methods: genetic algorithm fuzzy K-modes (GA-FKM) and multi-objective genetic algorithm-based fuzzy clustering of categorical attributes (MOGA (π, sep)). Adjusted Rand index (ARI), π, sep, and computation time were used as performance indexes for comparison. The experimental result showed that the proposed method can obtain better clustering quality in terms of ARI, π, and sep simultaneously with shorter computation time.  相似文献   

18.

针对粗糙模糊聚类算法对初值敏感、易陷入局部最优和聚类性能依赖阈值选择等问题, 提出一种混合蛙跳与阴影集优化的粗糙模糊聚类算法(SFLA-SRFCM). 通过设置自适应调节因子, 以增加混合蛙跳算法的局部搜索能力; 利用类簇上、下近似集的模糊类内紧密度和模糊类间分离度构造新的适应度函数; 采用阴影集自适应获取类簇阈值. 实验结果表明, SFLA-SRFCM 算法是有效的, 并且具有更好的聚类精度和有效性指标.

  相似文献   

19.
聚类算法是数据挖掘中的重要方法,针对现有适用类属性和混合型属性的数据集聚类算法如k-modes算法、k-prototypes算法和模糊k-prototypes算法等的不足,提出一种新的方法——类属性分解法。这种方法有更高的稳定性和可靠性,并能有效地减少随机性。  相似文献   

20.
Categorical data clustering is a difficult and challenging task due to the special characteristic of categorical attributes: no natural order. Thus, this study aims to propose a two-stage method named partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA) for categorical data. The proposed PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters in the first stage. Then, the merge stage is designed to select two clusters among the clusters that generated in the first stage based on its inter-cluster distances and merge two selected clusters to one cluster. This procedure is repeated until the number of clusters equals to the predetermined number of clusters. Thereafter, some particular instances in each cluster are considered to be re-assigned to other clusters based on the intra-cluster distances. The proposed PM-FGCA is implemented on ten categorical datasets from UCI machine learning repository. In order to evaluate the clustering performance, the proposed PM-FGCA is compared with some existing methods such as k-modes algorithm, fuzzy k-modes algorithm, genetic fuzzy k-modes algorithm, and non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Adjusted Ranked Index (ARI), Normalized Mutual Information (NMI), and Davies–Bouldin (DB) index are selected as three clustering validation indices which are represented to both external index (i.e., ARI and NMI) and internal index (i.e., DB). Consequently, the experimental result shows that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号