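A Fast Clustering Algorithm for Large-scale and High Dimensional Data
Cite this article: LIU Ming, WANG Xiao-Long, LIU Yuan-Chao. A Fast Clustering Algorithm for Large-scale and High Dimensional Data[J]. Acta Automatica Sinica, 2009, 35(7): 859-866.
Authors: LIU Ming, WANG Xiao-Long, LIU Yuan-Chao
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z197, 2007AA01Z172) and the National Natural Science Foundation of China (60435020)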
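Abstract: A self-organizing map clustering algorithm for large-scale, high-dimensional data is proposed. The algorithm compresses each neuron's feature set, building the neuron's feature vector only from features relevant to the document class that the neuron represents, which reduces clustering time. Because the selected features effectively distinguish the document classes mapped to different neurons and exclude interference from irrelevant features, clustering precision also improves. Experimental results show that the method effectively speeds up clustering and improves clustering accuracy, achieving fairly good clustering results.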

Keywords: vector compression; neuron merging; intra-cluster similarity; inter-cluster distinctness
Received: 2008-7-1
Revised: 2008-12-3

A Fast Clustering Algorithm for Large-scale and High Dimensional Data
LIU Ming, WANG Xiao-Long, LIU Yuan-Chao. A Fast Clustering Algorithm for Large-scale and High Dimensional Data[J]. Acta Automatica Sinica, 2009, 35(7): 859-866.
Authors: LIU Ming, WANG Xiao-Long, LIU Yuan-Chao
Affiliation:1.School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: A novel self-organizing map (SOM) clustering algorithm for large-scale, high-dimensional data is proposed in this paper. By compressing neurons' feature sets and selecting only relevant features to construct neurons' feature vectors, the algorithm substantially reduces clustering time. At the same time, because the selected features effectively distinguish the document classes that are mapped to different neurons, the algorithm avoids interference from irrelevant features and improves clustering precision. Experimental results demonstrate that the method accelerates clustering, improves clustering precision significantly, and achieves relatively good clustering results.
Keywords:Vector compression  neuron combination  intra-cluster similarity  inter-cluster distinctness
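
The abstract describes the idea only at a high level, so the following is a minimal sketch of one way neuron feature vectors could be compressed, not the authors' implementation. It assumes documents and neurons are sparse term-weight dictionaries, that "relevant" features are simply the `k` largest-magnitude weights of a neuron, and that only the winning neuron is updated (a full self-organizing map would also update the winner's neighbours on the map grid). All function names and parameters (`compress`, `cosine`, `best_matching_unit`, `update`, `train`, `k`, `lr`, `epochs`) are hypothetical.

```python
import math


def compress(weights, k):
    """Keep only the k largest-magnitude features of a sparse vector.

    Assumption: weight magnitude stands in for the paper's relevance criterion.
    Ties are broken by feature name so the result is deterministic.
    """
    top = sorted(weights.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:k]
    return dict(top)


def cosine(doc, neuron):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # The dot product iterates over the neuron only, so its cost is
    # bounded by the compressed feature count k.
    dot = sum(w * doc.get(f, 0.0) for f, w in neuron.items())
    norm_d = math.sqrt(sum(v * v for v in doc.values()))
    norm_n = math.sqrt(sum(w * w for w in neuron.values()))
    if norm_d == 0.0 or norm_n == 0.0:
        return 0.0
    return dot / (norm_d * norm_n)


def best_matching_unit(doc, neurons):
    """Index of the neuron most similar to the document."""
    return max(range(len(neurons)), key=lambda i: cosine(doc, neurons[i]))


def update(neuron, doc, lr, k):
    """Move the neuron toward the document, then re-compress to k features."""
    features = set(neuron) | set(doc)
    moved = {
        f: neuron.get(f, 0.0) + lr * (doc.get(f, 0.0) - neuron.get(f, 0.0))
        for f in features
    }
    return compress(moved, k)


def train(docs, neurons, k=3, lr=0.5, epochs=5):
    """Winner-only competitive learning with compressed neuron vectors."""
    neurons = [compress(n, k) for n in neurons]
    for _ in range(epochs):
        for doc in docs:
            i = best_matching_unit(doc, neurons)
            neurons[i] = update(neurons[i], doc, lr, k)
    return neurons


if __name__ == "__main__":
    docs = [
        {"cat": 1.0, "dog": 1.0},
        {"cat": 1.0, "pet": 1.0},
        {"stock": 1.0, "bond": 1.0},
        {"stock": 1.0, "market": 1.0},
    ]
    trained = train(docs, [{"cat": 1.0}, {"stock": 1.0}], k=3)
    for doc in docs:
        print(sorted(doc), "->", best_matching_unit(doc, trained))
```

Because each neuron keeps at most `k` features, the similarity computation in this sketch depends on `k` rather than on the full vocabulary size, which is the kind of saving the abstract attributes to vector compression.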