A Fast Clustering Algorithm for Large-scale and High Dimensional Data

Cite this article:	LIU Ming, WANG Xiao-Long, LIU Yuan-Chao. A Fast Clustering Algorithm for Large-scale and High Dimensional Data[J]. Acta Automatica Sinica, 2009, 35(7): 859-866.

Authors:	LIU Ming, WANG Xiao-Long, LIU Yuan-Chao

Affiliation:	1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

Funding:	Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z197, 2007AA01Z172) and the National Natural Science Foundation of China (60435020)

Abstract:	A self-organizing map (SOM) clustering algorithm for large-scale, high-dimensional data is proposed. The algorithm compresses each neuron's feature set: a neuron's feature vector is built only from the features relevant to the document class that the neuron represents, which reduces clustering time. Because the selected features effectively distinguish the document classes mapped to different neurons, interference from irrelevant features is avoided and clustering precision improves. Experimental results show that the method speeds up clustering, improves clustering accuracy, and achieves a fairly satisfactory clustering result.

Keywords:	vector compression; neuron combination; intra-cluster similarity; inter-cluster distinctness

Received:	2008-7-1
Revised:	2008-12-3
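
The abstract gives no formulas, so the following is only a minimal sketch of the compression idea it describes, not the authors' algorithm. It assumes that "relevant" features are the k largest-magnitude weights of a neuron, that similarity is cosine similarity over the retained features, and that only the winning neuron is updated (no neighborhood function, no neuron merging). The names `compress`, `similarity`, `best_matching_neuron`, and `train` are hypothetical.

```python
import numpy as np

def compress(weights, k):
    """Keep only the k largest-magnitude features of a neuron.

    Returns (indices, values). Top-k by magnitude is an assumed
    selection rule; the paper's actual criterion is not given here.
    """
    idx = np.argsort(-np.abs(weights), kind="stable")[:k]
    idx.sort()
    return idx, weights[idx]

def similarity(doc, neuron):
    """Cosine similarity restricted to the neuron's retained features."""
    idx, vals = neuron
    sub = doc[idx]  # only k entries of the document are touched
    denom = np.linalg.norm(sub) * np.linalg.norm(vals)
    return float(sub @ vals / denom) if denom > 0 else 0.0

def best_matching_neuron(doc, neurons):
    """Index of the compressed neuron most similar to the document."""
    return int(np.argmax([similarity(doc, n) for n in neurons]))

def train(docs, n_neurons, k, epochs=5, lr=0.5, seed=0):
    """Winner-only training loop with recompression after every update."""
    rng = np.random.default_rng(seed)
    # Initialise neurons from randomly chosen documents (an assumption).
    dense = docs[rng.choice(len(docs), n_neurons, replace=False)].astype(float)
    neurons = [compress(w, k) for w in dense]
    for _ in range(epochs):
        for doc in docs:
            j = best_matching_neuron(doc, neurons)
            dense[j] += lr * (doc - dense[j])   # move winner toward the document
            idx, vals = compress(dense[j], k)   # reselect the k retained features
            dense[j] = 0.0
            dense[j, idx] = vals                # discard all other features
            neurons[j] = (idx, vals)
        lr *= 0.8                               # simple learning-rate decay
    return neurons
```

With k much smaller than the vocabulary size, each similarity computation costs O(k) instead of O(number of features), which is the source of the speed-up in this sketch.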
This article is indexed in CNKI and other databases.