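Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model
Cite this article: ZHOU Wan, ZHOU Cai-xue. Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model [J]. Computer Engineering & Science (计算机工程与科学), 2010, 32(8): 98-100.
Authors: ZHOU Wan, ZHOU Cai-xue
Affiliation: School of Information Science and Technology, Jiujiang University, Jiujiang, Jiangxi 332005, China
Abstract: Text classification is a research hotspot and a core technology in information retrieval and data mining, and it has received wide attention and developed rapidly in recent years. The concept lattice is an effective tool for rule extraction and data analysis, but the efficiency of constructing it has long been a major obstacle to its application. This paper studies the extraction of text classification rules based on the extended concept lattice model, using rough sets together with the extended concept lattice model to extract the rules. The method uses a concept tree to remove most redundant concepts, so that only a small number of concepts need to be constructed to extract all of the classification rules. It is efficient, and the rules it extracts are the same as those obtained from the full concept lattice. Experiments with the algorithm in the MATLAB 7.0 environment show that its recall is slightly lower than that of the KNN and SVM algorithms while its precision is higher than both, so the classification rules perform comparably to KNN and SVM when used for text classification.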

Keywords: text classification; data mining; rough set; concept lattice; classification rule
Received: 2009-05-22
Revised: 2009-09-10

Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model
ZHOU Wan, ZHOU Cai-xue. Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model [J]. Computer Engineering & Science, 2010, 32(8): 98-100.
Authors: ZHOU Wan, ZHOU Cai-xue
Affiliation: School of Information Science and Technology, Jiujiang University, Jiujiang 332005, China
Abstract: Automatic text categorization is a foundation of text mining, and text feature selection is at the core of text categorization. The concept lattice is an effective tool for rule extraction and data analysis; however, its construction efficiency is very low. This paper extracts text categorization rules based on the extended concept lattice model, exploiting the concept lattice for categorization rule extraction while eliminating useless concepts. The method can extract all of the rules by constructing only a few concepts, which makes it efficient. Experiments run in the MATLAB 7.0 environment show that the recall of the algorithm is slightly lower than that of KNN and SVM, but its precision is higher than both. Therefore, when the classification rules are applied to text categorization, the effect is comparable to that of KNN and SVM.
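
To make the idea of extracting categorization rules from a concept lattice concrete, the following is a minimal sketch in Python. It is not the extended concept lattice algorithm of the paper (which is not specified in the abstract and was run in MATLAB): it enumerates formal concepts by brute force, which is exponential in the number of documents, and the toy documents, terms, labels, and function names are all hypothetical. A concept whose documents all share one class yields a rule "terms imply class".

```python
from itertools import combinations

# Hypothetical toy context: each document maps to the set of terms it contains.
DOCS = {
    "d1": {"goal", "team"},
    "d2": {"goal", "team", "league"},
    "d3": {"stock", "market"},
    "d4": {"stock", "market", "team"},
}
# Hypothetical class label for each document.
LABELS = {"d1": "sports", "d2": "sports", "d3": "finance", "d4": "finance"}


def extent(terms):
    """Documents that contain every term in `terms`."""
    return frozenset(d for d, ts in DOCS.items() if terms <= ts)


def intent(docs):
    """Terms shared by every document in `docs` (all terms if `docs` is empty)."""
    sets = [DOCS[d] for d in docs]
    if not sets:
        return frozenset().union(*DOCS.values())
    return frozenset(set.intersection(*sets))


def concepts():
    """Enumerate all formal concepts (extent, intent) by closing every document subset."""
    found = set()
    names = sorted(DOCS)
    for r in range(len(names) + 1):
        for group in combinations(names, r):
            b = intent(group)          # terms common to the group
            found.add((extent(b), b))  # closure gives a formal concept
    return found


def rules():
    """Map intent -> class for concepts whose non-empty extent has a single class."""
    out = {}
    for a, b in concepts():
        classes = {LABELS[d] for d in a}
        if a and b and len(classes) == 1:
            out[b] = classes.pop()
    return out
```

In this toy context the concept with intent `{"team"}` covers documents of both classes and therefore yields no rule, while more specific concepts such as `{"goal", "team", "league"}` repeat what `{"goal", "team"}` already states. Avoiding the construction of such redundant concepts is the kind of saving the abstract attributes to the extended model.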
Keywords: document categorization; data mining; rough set; concept lattice; categorization rule
This article is indexed in Wanfang Data (万方数据) and other databases.