
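Text clustering algorithm based on information granularity
Cite this article: ZHAO Ya-qin, ZOU Hong-yan. Text clustering algorithm based on information granularity[J]. Computer Engineering and Design, 2009, 30(22).
Authors: ZHAO Ya-qin  ZOU Hong-yan
Affiliation: College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing, Jiangsu 210037
Funding: Nanjing Forestry University Foundation for Highly Educated Talents
Abstract: Text data are high-dimensional and sparse. To address these characteristics, a text clustering method based on the principle of information granularity is proposed. First, definitions are given for the sparse feature of a text, the sparse feature vector of a text, the sparse similarity of texts, the membership degree of an equivalence relation, and the generalized equivalence relation; on this basis, the principle of information granularity is used to generate initial clusters. Then a method for computing inter-cluster similarity is proposed and derived theoretically, and it is used to merge clusters. The clustering process does not depend on the order of the input samples, and effective compression of the text data improves the execution efficiency of the algorithm.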

Keywords: information granularity  data compression  membership degree of equivalence relation  text clustering

Text clustering based on information granularity
ZHAO Ya-qin,ZOU Hong-yan.Text clustering based on information granularity[J].Computer Engineering and Design,2009,30(22).
Authors:ZHAO Ya-qin  ZOU Hong-yan
Abstract: High dimensionality, sparseness, and noise are three key factors that influence the accuracy of text clustering. A text clustering algorithm based on information granularity is presented. Several definitions are given, including the sparse feature, sparse vector, and sparse similarity of text, the membership degree of an equivalence relation, and the generalized equivalence relation. Based on these definitions, equivalence relation theory from information granularity is applied to form initial equivalent clusters. Inter-cluster similarity is then derived and used to merge initial clusters. The proposed method is independent of the order of the input objects and compresses the data effectively. The experimental results show that the method can cluster text data effectively and efficiently.
Keywords:information granularity  data compression  membership degree of equivalence relation  text clustering
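
The abstract outlines a two-stage procedure: form initial clusters from an equivalence relation over sparse text features, then merge clusters by inter-cluster similarity. The record does not give the paper's actual definitions, so the following is only a rough sketch of that general shape under stated assumptions. Jaccard similarity over term sets stands in for the paper's sparse similarity, the transitive closure of a thresholded similarity stands in for the generalized equivalence relation, and average pairwise similarity stands in for the derived inter-cluster similarity. All function names and thresholds are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two term sets (ASSUMED stand-in for the paper's sparse similarity)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def initial_clusters(feature_sets, threshold):
    """Group documents into the connected components of the graph
    'similarity >= threshold'. Components are the classes of the transitive
    closure, so the resulting partition does not depend on input order."""
    parent = list(range(len(feature_sets)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(feature_sets)), 2):
        if jaccard(feature_sets[i], feature_sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(feature_sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_similarity(c1, c2, feature_sets):
    """Average pairwise similarity between two clusters (ASSUMED measure)."""
    total = sum(jaccard(feature_sets[i], feature_sets[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def merge_clusters(clusters, feature_sets, merge_threshold):
    """Repeatedly merge the most similar pair of clusters until no pair
    reaches merge_threshold."""
    clusters = [sorted(c) for c in clusters]
    while len(clusters) > 1:
        best_sim, (i, j) = max(
            (cluster_similarity(clusters[a], clusters[b], feature_sets), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_sim < merge_threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


# Toy documents, each reduced to the set of terms it contains.
docs = [
    {"cluster", "text", "granule"},
    {"cluster", "text", "sparse"},
    {"forest", "tree", "wood"},
    {"forest", "tree", "leaf"},
    {"text", "sparse", "vector"},
]
print(initial_clusters(docs, 0.5))  # [[0, 1, 4], [2, 3]]
```

Storing each document as the set of terms it actually contains is one simple way to exploit sparsity: only non-zero dimensions are kept, which loosely mirrors the data-compression idea in the abstract without claiming to reproduce it.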
This article is indexed in Wanfang Data and other databases.