
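Enhanced clustering ensemble algorithm based on characteristics of data sets
Cite this article:HOU Yong, ZHENG Xuefeng. Enhanced clustering ensemble algorithm based on characteristics of data sets[J]. Journal of Computer Applications, 2013, 33(8): 2204-2207.
Authors:HOU Yong, ZHENG Xuefeng
Affiliation:1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2. College of Humanities and Science, Shandong Vocational College of Economics and Business, Weifang, Shandong 261011, China
Funding:Shandong Province Enterprise Training and Staff Education Research Project; Weifang Social Science Planning Key Project; Shandong Province Higher Education Humanities and Social Sciences Research Program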
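Abstract:Currently popular clustering ensemble algorithms cannot offer an appropriate treatment according to the differing characteristics of different data sets. To address this, a new enhanced clustering ensemble algorithm based on the characteristics of data sets is proposed, consisting of base clustering generation, base clustering selection and a consensus function. Guided by the characteristics of the data set, the algorithm uses a heuristic method to select suitable base clusterings, builds the final set of base clusterings and produces the final clustering result. In the experiments, three benchmark data sets (ecoli, leukaemia and Vehicle) were clustered, and the clustering errors of the proposed algorithm were 0.014, 0.489 and 0.479 respectively, always the lowest when compared with the Bagging based Structure Ensemble Approach (BSEA), Hybrid Cluster Ensemble (HCE) and Cluster-Oriented Ensemble Classifier (COEC) algorithms. As the number of candidate base clusterings was increased, the Normalized Mutual Information (NMI) of the proposed algorithm was always higher than that of the compared algorithms. The experimental results show that, compared with these clustering ensemble algorithms, the proposed algorithm achieves the highest clustering accuracy and the strongest scalability.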

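Keywords:base clustering  consensus function  clustering ensemble algorithm  clustering error  adaptivity  Normalized Mutual Information (NMI)
Received:2013-02-04
Revised:2013-03-12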

Enhanced clustering ensemble algorithm based on characteristics of data sets
HOU Yong, ZHENG Xuefeng. Enhanced clustering ensemble algorithm based on characteristics of data sets[J]. Journal of Computer Applications, 2013, 33(8): 2204-2207.
Authors:HOU Yong, ZHENG Xuefeng
Affiliation:1. College of Humanities and Science, Shandong Vocational College of Economics and Business, Weifang, Shandong 261011, China
2. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Abstract:Popular clustering ensemble algorithms cannot provide an appropriate treatment for data sets with different characteristics. To overcome this defect, a new clustering ensemble algorithm, Enhanced Clustering Ensemble algorithm based on Characteristics of Data sets (ECECD), was proposed. ECECD is composed of base clustering generation, base clustering selection and a consensus function. Based on the characteristics of the data set, it selects a particular range of ensemble members to form the final ensemble and produces the final clustering. Three benchmark data sets (ecoli, leukaemia and Vehicle) were clustered in the experiment, and the clustering errors obtained by the proposed algorithm were 0.014, 0.489 and 0.361 respectively, always the lowest when compared with those of the Bagging based Structure Ensemble Approach (BSEA), Hybrid Cluster Ensemble (HCE) and Cluster-Oriented Ensemble Classifier (COEC). The Normalized Mutual Information (NMI) values of the proposed algorithm were also always higher than those of these algorithms as the number of candidate base clusterings increased. Therefore, compared with these popular clustering ensemble algorithms, the proposed algorithm has the highest clustering precision and the strongest scalability.
Keywords:base clustering  consensus function  clustering ensemble algorithm  clustering error  adaptivity  Normalized Mutual Information (NMI)  
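
The abstract names three stages (generation of base clusterings, selection of ensemble members, and a consensus function) and evaluates with NMI, but it does not describe the selection heuristic or the consensus function that ECECD uses. The following is only a minimal pure-Python sketch of a generic clustering ensemble under stated assumptions: the selection rule (keep the members with the highest mean pairwise NMI), the co-association consensus with a 0.5 threshold, the toy candidate labelings, and all function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def nmi(a, b):
    """Normalized mutual information of two labelings (sqrt normalization)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:  # at least one labeling is a single cluster
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def select(candidates, keep):
    """Assumed placeholder rule: keep the `keep` candidates that agree most
    (highest mean NMI) with the other candidates. Needs >= 2 candidates."""
    def mean_nmi(k):
        others = [c for i, c in enumerate(candidates) if i != k]
        return sum(nmi(candidates[k], c) for c in others) / len(others)
    ranked = sorted(range(len(candidates)), key=mean_nmi, reverse=True)
    return [candidates[k] for k in ranked[:keep]]


def consensus(labelings, threshold=0.5):
    """Co-association consensus: link two points when they share a cluster in
    more than `threshold` of the labelings, then return connected components."""
    n, m = len(labelings[0]), len(labelings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(lab[i] == lab[j] for lab in labelings) / m
        if together > threshold:
            parent[find(i)] = find(j)

    ids = {}  # relabel components 0, 1, 2, ... in order of first appearance
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy candidate base clusterings of six points (made-up data).
candidates = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],  # disagrees with the rest
]
selected = select(candidates, keep=3)
print(consensus(selected))  # -> [0, 0, 0, 1, 1, 1]
```

In this toy run the fourth candidate has the lowest mean NMI and is dropped, and the remaining three agree on the two-cluster partition. A real pipeline would generate the candidates by running clustering algorithms with varied parameters rather than hard-coding them.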