Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics

Cite this article:	YANG Wan-xia, SUN Li-he, HUANG Yong-feng. Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics[J]. Computer Engineering, 2012, 38(22): 171-175.

Authors:	YANG Wan-xia (杨婉霞), SUN Li-he (孙理和), HUANG Yong-feng (黄永峰)

Affiliation:	1. College of Technology, Gansu Agricultural University, Lanzhou 730070, China; 2. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; 3. College of Foreign Languages and Literature, Northwest Normal University, Lanzhou 730070, China

Funding:	Supported by the National "863" Program and the Tsinghua University Independent Scientific Research Fund

Received:	2012-06-14
Revised:	2012-09-12

Abstract:	The primary difficulty of text clustering lies in the high-dimensional sparsity of text representations. To address it, a short text clustering algorithm that combines semantic and statistical features is proposed. A first dimension reduction is achieved by using a semantic dictionary to analyze the semantic relatedness of words. A second dimension reduction is achieved by feature selection with statistical methods. Short texts are then clustered on the features that remain after both reductions. Experimental results show that the algorithm achieves good clustering quality and efficiency on short texts.
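
The two-stage reduction described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract names neither the semantic dictionary, the statistical criterion, nor the clustering method: `CONCEPTS` is a hypothetical toy dictionary, document frequency stands in for the statistical feature selection, and a small k-means with cosine similarity stands in for the clustering step.

```python
from collections import Counter

# Hypothetical toy "semantic dictionary": related words share one concept label.
CONCEPTS = {
    "car": "vehicle", "auto": "vehicle", "truck": "vehicle",
    "cheap": "low_price", "bargain": "low_price",
    "film": "movie", "cinema": "movie",
}

def semantic_reduce(tokens):
    """First reduction: replace each word by its concept label, if it has one."""
    return [CONCEPTS.get(t, t) for t in tokens]

def select_features(docs, k):
    """Second reduction (assumed criterion): keep the k features with the
    highest document frequency; ties are broken alphabetically."""
    df = Counter(f for doc in docs for f in set(doc))
    return sorted(df, key=lambda f: (-df[f], f))[:k]

def vectorize(doc, features):
    """Term-count vector over the selected features only."""
    counts = Counter(doc)
    return [float(counts[f]) for f in features]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, seeds, iters=10):
    """Small k-means variant: assign by cosine similarity, update by mean.
    `seeds` are indices of the vectors used as initial centroids."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_short_texts(texts, k_features, seeds):
    """Run both reductions, then cluster the reduced vectors."""
    docs = [semantic_reduce(t.lower().split()) for t in texts]
    features = select_features(docs, k_features)
    vectors = [vectorize(d, features) for d in docs]
    return features, kmeans(vectors, seeds)

texts = ["cheap car deal", "bargain auto sale",
         "new film tonight", "cinema film review"]
features, labels = cluster_short_texts(texts, k_features=3, seeds=[0, 2])
print(features, labels)
```

In this toy input the four texts share no surface words across the car pair, so the dictionary step is what makes the first two texts comparable at all; the selection step then shrinks the vocabulary from eight features to three before clustering.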

Keywords:	feature selection; clustering; short text; Vector Space Model (VSM); semantic; dimension reduction
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.