Chinese Text Clustering Based on the LLE-k-Means Method (基于LLE-k均值方法的中文文本聚类)

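Citation:	Feng Yan, Wang Hongyuan, Cheng Qicai, Liu Aiping. Chinese Text Clustering Based on the LLE-k-Means Method [J]. Computer and Digital Engineering (计算机与数字工程), 2010, 38(11): 10-12, 21.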

Authors:	Feng Yan, Wang Hongyuan, Cheng Qicai, Liu Aiping

Affiliation:	School of Information Engineering, Changzhou University, Changzhou 213164

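Funding:	National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Natural Science Foundation of the Jiangsu Higher Education Institutions; national funding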

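Abstract:	In text clustering, the high dimensionality of text feature vectors makes it very difficult to estimate the statistical characteristics of the samples, so effective dimensionality reduction is necessary. The LLE algorithm exploits the local symmetries of linear reconstructions to uncover nonlinear structure in a high-dimensional data space, and maps each high-dimensional data point to a corresponding low-dimensional point while preserving the neighborhood relationships among the data points. This paper studies Chinese text clustering with an LLE-k-means method. LLE is first used to reduce the dimensionality, and k-means is then used to cluster the resulting linear feature vectors. Compared with PCA, ISOMAP, and LLE, the results show that the LLE-k-means algorithm gives better visualization.
Keywords:	text clustering; LLE; dimensionality reduction; k-means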
A Method of LLE-k Means for Chinese Text Clustering

Feng Yan, Wang Hongyuan, Cheng Qicai, Liu Aiping. A Method of LLE-k Means for Chinese Text Clustering [J]. Computer and Digital Engineering, 2010, 38(11): 10-12, 21.

Authors:	Feng Yan, Wang Hongyuan, Cheng Qicai, Liu Aiping

Affiliation:	School of Information Engineering, Changzhou University, Changzhou 213164

Abstract:	In text clustering, the high dimensionality of text feature vectors makes it very difficult to assess statistical characteristics, so effective dimensionality reduction is necessary. The locally linear embedding (LLE) algorithm exploits the local symmetries of linear reconstructions to uncover nonlinear structure in a high-dimensional data space. Data points in the high-dimensional space are mapped to corresponding points in a lower-dimensional space while the neighborhood relationships between data points are preserved. This paper uses LLE-k means to study Chinese text clustering. The dimensionality is first reduced with the LLE algorithm, and the k-means algorithm is then used for cluster analysis. The method is compared with PCA, ISOMAP, and LLE, and the results show that LLE-k means gives better visualization.

Keywords:	text clustering; LLE; dimensionality reduction; k-means
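
The two-stage pipeline described in the abstract (LLE for dimensionality reduction, then k-means on the reduced vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lle` and `kmeans`, the neighbor count, the regularization constant, and the synthetic data standing in for Chinese text feature vectors are all hypothetical choices that do not come from the paper.

```python
import numpy as np


def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Locally linear embedding of the rows of X (n_samples x n_features)."""
    n = X.shape[0]
    # Pairwise squared distances; a point is never its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 1: weights that best reconstruct each point from its neighbors,
    # constrained to sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]              # neighbors centered on x_i
        G = Z @ Z.T                        # local Gram matrix (K x K)
        # Regularize so the system stays solvable (assumed constant).
        G += reg * (np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # Step 2: coordinates that preserve those weights are the bottom
    # eigenvectors of M = (I - W)^T (I - W). The first eigenvector is
    # skipped (it is the constant vector when the neighbor graph is connected).
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1], W


def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of Y."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical stand-in for document feature vectors: three groups of 40
# points driven by 2 latent factors, lifted nonlinearly into 50 dimensions.
rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(c, 0.5, size=(40, 2))
                    for c in ([0, 0], [3, 0], [0, 3])])
X = np.tanh(0.3 * latent @ rng.normal(size=(2, 50)))

Y, W = lle(X, n_neighbors=10, n_components=2)   # reduce 50-D -> 2-D
labels, centers = kmeans(Y, k=3)                # cluster in the reduced space
print(np.bincount(labels, minlength=3))         # cluster sizes
```

In a real text-clustering setting, `X` would be replaced by document vectors (for example term-weight vectors built after Chinese word segmentation), and the two-dimensional `Y` could be plotted directly, which is the kind of visualization the abstract refers to.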
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.
