1.
2.
3.
4.
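The fuzzy K-Prototypes (FKP) algorithm can effectively cluster data sets that mix numeric and categorical attributes, but it is sensitive to initial values and easily becomes trapped in local minima. To overcome this drawback, a hybrid clustering algorithm based on particle swarm optimization (PSO) and FKP is proposed: PSO is first used to determine the initial cluster centers, and the PSO clustering result is then used as the initial value for the subsequent FKP algorithm. Experimental results show that the new algorithm has good convergence and stability, and that its clustering results are better than those of FKP used alone.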
5.
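The fuzzy C-means clustering algorithm is sensitive to the choice of initial classification parameters (including the starting positions of the cluster centers and the initial membership matrix), which leads to large variations in classification results, and misclassification can have unpredictable consequences in practical problems. To address these shortcomings, this paper starts from the partition coefficient, which measures the separation between classes after clustering, proposes the new concept of a variable weighted partition coefficient, and uses it to evaluate the quality of data classification. Experimental results show that the proposed evaluation method is not only feasible but also a better criterion for evaluating classification quality than the objective function of the fuzzy C-means algorithm.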
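As a point of reference for the abstract above, the following is a minimal sketch of the classical (unweighted) partition coefficient of a fuzzy membership matrix, i.e. the mean over samples of the sum of squared memberships. The variable weighted variant proposed in the paper is not specified in the abstract, so it is not reproduced here; the function name and the list-of-rows input layout are assumptions made for illustration.

```python
def partition_coefficient(memberships):
    """Classical partition coefficient of a fuzzy partition.

    `memberships` is assumed to be a list of rows, one row per sample,
    where each row holds that sample's membership in every cluster and
    sums to 1. This is the unweighted coefficient, not the weighted
    variant from the abstract.
    """
    n = len(memberships)
    if n == 0:
        raise ValueError("membership matrix must contain at least one sample")
    # Sum of squared memberships over all samples and clusters, divided by n.
    return sum(u * u for row in memberships for u in row) / n


# A crisp partition scores 1; a maximally fuzzy 2-cluster partition scores 1/2.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
print(partition_coefficient(crisp), partition_coefficient(fuzzy))
```

Values closer to 1 indicate a crisper partition, while values near 1/c (for c clusters) indicate memberships spread almost evenly across clusters.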
6.
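To mitigate the sensitivity of the K-means clustering algorithm to the initial cluster centers and its tendency to fall into local optima, this paper proposes combining the artificial bee colony algorithm with K-means, i.e., a K-means clustering algorithm based on artificial bee colony optimization. The artificial bee colony algorithm, which has strong global search capability, initializes the K-means cluster centers and optimizes their positions, helping K-means escape local extrema and improving the clustering results. The hybrid algorithm was tested on the Iris, Red Wine, and New Red Wine data sets; the results show that it overcomes the original K-means algorithm's sensitivity to initial cluster centers and its instability, and achieves good performance and clustering quality.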
7.
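Radar target range profile recognition based on a two-stage fuzzy clustering algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
To address the sensitivity of the fuzzy C-means (FCM) algorithm to the initial cluster centers, a two-stage fuzzy clustering method is proposed. Exploiting the fact that the transitive closure (TC) algorithm needs no initialization, the method first partitions the sample set at a given classification level, selects several classes, and uses the sample means of these classes as the initial cluster centers for FCM. This yields good initial cluster centers and, through the classification level, also allows the number of clusters and the cluster centers themselves to be optimized, avoiding local optima and overcoming coincident clusters. Recognition experiments were carried out on measured one-dimensional range profile data of three types of aircraft targets; the results show that the recognition rate of the two-stage fuzzy clustering method is markedly better than that of FCM.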
8.
9.
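The FCM (Fuzzy C-Means) algorithm for fuzzy clustering is analyzed. The algorithm is used to cluster sampled data from a TCP connection log, the resulting cluster centers are used to classify two arbitrarily chosen data sets, and the clustering results are analyzed.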
10.
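A fuzzy clustering algorithm for stream data   Cited by: 1 (self-citations: 0, citations by others: 1)
Effective clustering of stream data is a problem that has attracted considerable research attention. Traditional clustering algorithms apply only to purely numeric or purely categorical data and are hard to apply to mixed-attribute data. Considering the characteristics of mixed-attribute data and drawing on the AcluStream algorithm, a fuzzy clustering algorithm is proposed. The algorithm measures the dissimilarity of stream data by attribute type: quantitative attributes use the Euclidean and Manhattan distances, and qualitative attributes can use the Hamming distance. The algorithm has two main steps. First, a minimum-distance clustering algorithm is applied to form the initial clusters. Second, the initial clusters are merged or split with a density-based clustering method until the cluster set is stable. Practice shows that the algorithm is fast and effective.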
11.
A fuzzy k-modes algorithm for clustering categorical data   Cited by: 12 (self-citations: 0, citations by others: 12)
This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effectiveness of the algorithm is demonstrated with experimental results.
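The membership step of a fuzzy k-modes style algorithm can be sketched as follows, assuming the usual fuzzy k-means membership formula with the simple matching dissimilarity substituted for the squared Euclidean distance. The function names, the tuple encoding of objects, the default `alpha`, and the tie-breaking rule for an object that coincides with more than one mode are assumptions for illustration, not details taken from the abstract.

```python
def simple_matching(x, z):
    """Number of attributes on which two categorical objects disagree."""
    return sum(1 for a, b in zip(x, z) if a != b)


def fuzzy_memberships(x, modes, alpha=1.5):
    """Membership of object `x` in each cluster, given the cluster modes.

    `alpha` > 1 is the fuzziness exponent (hypothetical default).
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    dists = [simple_matching(x, z) for z in modes]
    if 0 in dists:
        # The object coincides with a mode: assign it fully to that cluster.
        # Assumption: if several modes match, the first one wins.
        first = dists.index(0)
        return [1.0 if l == first else 0.0 for l in range(len(modes))]
    exponent = 1.0 / (alpha - 1.0)
    # w_l = 1 / sum_h (d_l / d_h)^(1 / (alpha - 1))
    return [1.0 / sum((d_l / d_h) ** exponent for d_h in dists) for d_l in dists]


modes = [("a", "b", "d"), ("x", "y", "z")]
print(fuzzy_memberships(("a", "b", "c"), modes, alpha=2.0))
```

The memberships of one object always sum to 1, and an object closer to a mode (fewer mismatches) receives a larger share of membership in that cluster.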
12.
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values   Cited by: 76 (self-citations: 0, citations by others: 76)
Zhexue Huang 《Data mining and knowledge discovery》1998,2(3):283-304
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values
prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms
which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes
algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with
modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function.
With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The
k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes
algorithms to allow for clustering objects described by mixed numeric and categorical attributes. We use the well known soybean
disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on
two real world data sets with half a million objects each show that the two algorithms are efficient when clustering large
data sets, which is critical to data mining applications.
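The three ingredients named in the abstract above (simple matching dissimilarity, frequency-based mode update, and a combined dissimilarity for mixed data) can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the tuple encoding of objects, and the default weight `gamma` are hypothetical, and the combined measure is taken to be the squared Euclidean distance on numeric attributes plus `gamma` times the number of categorical mismatches.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: count of categorical attributes that differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def update_mode(cluster):
    """Frequency-based mode: the most frequent category of each attribute.

    `cluster` is a non-empty list of equal-length tuples of categories.
    """
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Combined dissimilarity between an object and a prototype.

    Squared Euclidean distance on the numeric part plus `gamma` times the
    simple matching dissimilarity on the categorical part. `gamma` balances
    the two parts; its default here is an arbitrary placeholder.
    """
    numeric = sum((p - q) ** 2 for p, q in zip(x_num, proto_num))
    return numeric + gamma * matching_dissimilarity(x_cat, proto_cat)


cluster = [("red", "s"), ("red", "m"), ("blue", "m")]
mode = update_mode(cluster)  # most frequent value per attribute
print(mode, mixed_dissimilarity((1.0, 2.0), ("red", "s"), (0.0, 0.0), mode, gamma=0.5))
```

With `gamma` set to 0 the combined measure reduces to the k-means distance, and with no numeric attributes it reduces to the k-modes dissimilarity, which is how a single measure can cover both cases.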
13.
Ng MK, Li MJ, Huang JZ, He Z 《IEEE transactions on pattern analysis and machine intelligence》2007,29(3):503-507
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in (Z. He, et al., 2005), (O. San, et al., 2004) which allows the use of the k-modes paradigm to obtain a cluster with strong intrasimilarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and the convergence of the algorithm under the optimization framework.
14.
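Clustering is one of the important techniques in data mining; it groups data according to a similarity principle. Clustering categorical data, however, is an important but difficult problem for learning algorithms. The traditional k-modes algorithm defines the dissimilarity between two attribute values by simple 0-1 matching and does not take the distribution of the whole data set into account, so the dissimilarity measure is not accurate enough. To address this problem, a k-modes algorithm based on structural similarity is proposed. It considers not only whether the attribute values themselves are the same, but also the structure they exhibit under the other attributes. Simulation experiments on cluster identification and accuracy show that the structural-similarity-based k-modes algorithm is more effective in terms of scalability and accuracy.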
15.
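A fuzzy K-Modes algorithm based on a new dissimilarity measure is proposed. The algorithm assumes that different attributes influence the clustering result to different degrees, defines a new attribute value function, and uses clustering accuracy based on partition similarity as the criterion for evaluating clustering results. Experiments on real data show that the new algorithm achieves better clustering results than the traditional fuzzy K-Modes algorithm.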
16.
Clustering categorical data sets using tabu search techniques   Cited by: 2 (self-citations: 0, citations by others: 2)
Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria. The fuzzy k-means-type algorithm is best suited for implementing this clustering operation because of its effectiveness in clustering data sets. However, working only on numeric values limits its use because data sets often contain categorical values. In this paper, we present a tabu search based clustering algorithm, to extend the k-means paradigm to categorical domains, and domains with both numeric and categorical values. Using tabu search based techniques, our algorithm can explore the solution space beyond local optimality in order to aim at finding a global solution of the fuzzy clustering problem. It is found that the clustering results produced by the proposed algorithm are very high in accuracy.
17.
In this research, a data clustering algorithm named non-dominated sorting genetic algorithm-fuzzy membership chromosome (NSGA-FMC), based on the K-modes method and combining a fuzzy genetic algorithm with multi-objective optimization, was proposed to improve the clustering quality on categorical data. The proposed method uses fuzzy membership values as the chromosome. In addition, owing to this chromosome setting, a more efficient solution selection technique, which selects a solution from the non-dominated Pareto front based on the largest fuzzy membership, is integrated in the proposed algorithm. Two objective functions, fuzzy compactness within a cluster (π) and separation among clusters (sep), are used to optimize the clustering quality. A series of experiments using three UCI categorical datasets were conducted to compare the clustering results of the proposed NSGA-FMC with two existing methods: genetic algorithm fuzzy K-modes (GA-FKM) and multi-objective genetic algorithm-based fuzzy clustering of categorical attributes (MOGA (π, sep)). Adjusted Rand index (ARI), π, sep, and computation time were used as performance indexes for comparison. The experimental results showed that the proposed method can obtain better clustering quality in terms of ARI, π, and sep simultaneously with shorter computation time.
18.
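To address the problems that rough fuzzy clustering algorithms are sensitive to initial values, easily fall into local optima, and have clustering performance that depends on the choice of threshold, a rough fuzzy clustering algorithm optimized by shuffled frog leaping and shadowed sets (SFLA-SRFCM) is proposed. An adaptive adjustment factor is introduced to strengthen the local search ability of the shuffled frog leaping algorithm; a new fitness function is constructed from the fuzzy intra-class compactness and fuzzy inter-class separation of the upper and lower approximations of the clusters; and shadowed sets are used to obtain the cluster thresholds adaptively. Experimental results show that the SFLA-SRFCM algorithm is effective and achieves better clustering accuracy and validity indices.
19.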
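Clustering is an important method in data mining. To address the shortcomings of existing clustering algorithms for data sets with categorical and mixed attributes, such as the k-modes, k-prototypes, and fuzzy k-prototypes algorithms, a new method, categorical attribute decomposition, is proposed. The method offers higher stability and reliability and effectively reduces randomness.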
20.
Categorical data clustering is a difficult and challenging task due to the special characteristic of categorical attributes: no natural order. Thus, this study proposes a two-stage method named partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA) for categorical data. In the first stage, the proposed PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters. Then, the merge stage selects two of the clusters generated in the first stage based on their inter-cluster distances and merges them into one cluster. This procedure is repeated until the number of clusters equals the predetermined number of clusters. Thereafter, some particular instances in each cluster are considered for re-assignment to other clusters based on the intra-cluster distances. The proposed PM-FGCA is implemented on ten categorical datasets from the UCI machine learning repository. To evaluate the clustering performance, the proposed PM-FGCA is compared with existing methods such as the k-modes algorithm, the fuzzy k-modes algorithm, the genetic fuzzy k-modes algorithm, and the non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and the Davies–Bouldin (DB) index are selected as the three clustering validation indices, covering both external indices (ARI and NMI) and an internal index (DB). The experimental results show that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.
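For readers unfamiliar with the external index mentioned above, the following is a minimal sketch of the Adjusted Rand Index computed from two label sequences via pair counting over their contingency table. The function name and the handling of degenerate partitions (returning 1.0 when both partitions are trivial) are assumptions made for illustration; the abstract does not describe how the authors computed the index.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same objects."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two objects counts as agreement
    # Pairs placed together in both labelings (contingency table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # assumption: both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Cluster names do not matter, only the grouping of objects.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index is 1 for identical groupings regardless of how clusters are named, is close to 0 for random agreement, and can be negative when agreement is worse than chance.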