
Text clustering based on information granularity (基于信息粒度的文本聚类算法)

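Cite this article:	ZHAO Ya-qin, ZOU Hong-yan. Text clustering based on information granularity[J]. Computer Engineering and Design (计算机工程与设计), 2009, 30(22).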

Authors:	ZHAO Ya-qin (赵亚琴), ZOU Hong-yan (邹红艳)

Affiliation:	College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing, Jiangsu 210037, China (both authors)

Funding:	Nanjing Forestry University Fund for Highly Qualified Talent (南京林业大学高学历人才基金项目)

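Abstract:	To address the high dimensionality and sparsity of text data, a text clustering method based on the principle of information granularity is proposed. First, definitions are given for the sparse feature of a text, the sparse feature vector of a text, the sparse similarity of texts, the membership degree of an equivalence relation, and the generalized equivalence relation; on this basis, the principle of information granularity is used to generate initial clusters. Next, a method for computing inter-cluster similarity is proposed and derived theoretically, and clusters are merged accordingly. The clustering process does not depend on the order of the input samples, and the effective compression of the text data improves the algorithm's efficiency.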
Keywords:	information granularity; data compression; membership degree of equivalence relation; text clustering
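The abstract names the ingredients of the initial clustering step but not their formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' method. Each text is assumed to be a sparse term-to-weight dictionary (standing in for the "sparse feature vector"), cosine similarity over shared terms stands in for the "sparse similarity", and two texts are linked when their similarity reaches a hypothetical threshold `lam`. Taking the transitive closure of these links with union-find turns the relation into an equivalence relation whose classes serve as initial granules. The names `sparse_cosine`, `initial_granules`, `lam` and the toy documents are invented for this example.

```python
import math
from itertools import combinations


def sparse_cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors (term -> weight)."""
    # Iterate over the smaller vector; only shared terms add to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def initial_granules(docs: list[dict[str, float]], lam: float) -> list[list[int]]:
    """Hypothetical granulation: link docs whose similarity >= lam, then take
    the transitive closure with union-find so the links form an equivalence."""
    parent = list(range(len(docs)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(docs)), 2):
        if sparse_cosine(docs[i], docs[j]) >= lam:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    # Sort members and classes so the output is canonical.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy documents, already reduced to sparse term weights.
docs = [
    {"forest": 1.0, "tree": 1.0},
    {"forest": 1.0, "tree": 0.5},
    {"stock": 1.0, "market": 1.0},
    {"stock": 1.0, "price": 1.0},
]
print(initial_granules(docs, lam=0.4))  # [[0, 1], [2, 3]]
```

Because connected components do not depend on the order in which pairs are visited, permuting `docs` only relabels the indices in the result; this mirrors, in simplified form, the order independence claimed in the abstract.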
Text clustering based on information granularity

ZHAO Ya-qin,ZOU Hong-yan. Text clustering based on information granularity[J]. Computer Engineering and Design, 2009, 30(22)

Authors:	ZHAO Ya-qin, ZOU Hong-yan

Abstract:	High dimensionality, sparsity, and noise are three key factors that affect the accuracy of text clustering. A text clustering algorithm based on information granularity is presented. Several definitions are given, including the sparse character, sparse vector, and sparse similarity of a text, the membership degree of an equivalence relation, and the generalized equivalence relation. Based on these definitions, the equivalence relation theory of information granularity is applied to form initial equivalence clusters. An inter-cluster similarity is then derived and used to merge pairs of initial clusters. The proposed method is independent of the order of the input objects, and its effective data compression improves efficiency. Experimental results show that the method clusters text data effectively and efficiently.

Keywords:	information granularity; data compression; membership degree of equivalence relation; text clustering
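The abstract states that an inter-cluster similarity is derived and used to merge initial clusters, but it does not give the formula. As an illustration only, the sketch below substitutes standard average linkage (the mean pairwise similarity between members of two clusters) and a hypothetical stopping threshold `mu`; the names `inter_cluster_similarity`, `merge_clusters`, `mu` and the toy similarity matrix are invented for the example. Initial clusters are lists of document indices, and the most similar pair is merged repeatedly until no pair reaches `mu`.

```python
def inter_cluster_similarity(c1: list[int], c2: list[int],
                             sim: list[list[float]]) -> float:
    """Average linkage: mean of sim[i][j] over all cross-cluster pairs."""
    total = sum(sim[i][j] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def merge_clusters(clusters: list[list[int]], sim: list[list[float]],
                   mu: float) -> list[list[int]]:
    """Greedily merge the most similar pair of clusters while its similarity >= mu."""
    clusters = [sorted(c) for c in clusters]
    while len(clusters) > 1:
        best = None  # (similarity, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = inter_cluster_similarity(clusters[a], clusters[b], sim)
                if best is None or s > best[0]:  # ties keep the first pair found
                    best = (s, a, b)
        s, a, b = best
        if s < mu:
            break  # no remaining pair is similar enough to merge
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Hypothetical symmetric document-similarity matrix for five texts.
sim = [
    [1.0, 0.9, 0.6, 0.1, 0.0],
    [0.9, 1.0, 0.5, 0.0, 0.1],
    [0.6, 0.5, 1.0, 0.2, 0.1],
    [0.1, 0.0, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.1, 0.8, 1.0],
]
print(merge_clusters([[0, 1], [2], [3, 4]], sim, mu=0.4))  # [[0, 1, 2], [3, 4]]
```

Working from a precomputed similarity matrix keeps the merge step separate from how document similarity is defined. Ties between equally similar pairs are broken by list position here, so this particular sketch does not by itself guarantee the order independence the paper claims.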
This article is indexed in Wanfang Data and other databases.
