
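Feature Selection Method Using Classificatory Concentration and Hierarchical Reduction
Cite this article: CHEN Lv-qiang, ZHU Hao-dong, FU Ming-lan. Feature selection method using classificatory concentration and hierarchical reduction[J]. Computer Engineering and Applications, 2010, 46(30): 134-137.
Authors: CHEN Lv-qiang, ZHU Hao-dong, FU Ming-lan
Affiliations: 1. Department of Information Engineering, Huangshan University, Huangshan, Anhui 245021, China; 2. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China; 3. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China
Funding: Sichuan Province Science and Technology Plan Project; Huangshan University Scientific Research Plan Project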
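Abstract: Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly affects the classification results. This paper first briefly analyzes several classic feature selection methods and summarizes their shortcomings. It then proposes the concept of classificatory concentration, introduces the idea of hierarchical reduction into rough set theory, and presents an improved attribute reduction algorithm based on hierarchical reduction. Finally, it combines this reduction algorithm with classificatory concentration to form a comprehensive feature selection method. The method first uses classificatory concentration for a preliminary selection that filters out some terms and thereby reduces the sparsity of the feature space, and then applies the proposed reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the method performs well.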

Keywords: text categorization; feature selection; classificatory concentration; hierarchical reduction
Received: 2009-03-26
Revised: 2010-03-18
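The abstract describes a two-stage pipeline: a concentration-based filter followed by rough-set reduction to remove redundant terms. Because the paper's formulas are not given here, the Python sketch below rests on stated assumptions. `class_concentration` uses a hypothetical score, not the authors' definition. The second stage substitutes the standard greedy QuickReduct algorithm, which uses the rough-set dependency degree over binary term presence, for the authors' improved hierarchical reduction algorithm, which is not reproduced. All function names and the threshold value are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Doc = FrozenSet[str]  # a document represented as the set of terms it contains


def class_concentration(docs: Sequence[Doc], labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical concentration score (assumption, not the paper's formula).

    For term t and class c: (fraction of class-c documents containing t)
    times (fraction of the documents containing t that belong to c).
    The score of t is the maximum over classes, so a term that occurs in
    most documents of one class and rarely elsewhere scores close to 1.
    """
    class_size = Counter(labels)
    df_tc: Dict[str, Counter] = defaultdict(Counter)  # term -> class -> document frequency
    for doc, c in zip(docs, labels):
        for t in doc:
            df_tc[t][c] += 1
    scores: Dict[str, float] = {}
    for t, per_class in df_tc.items():
        df_t = sum(per_class.values())
        scores[t] = max((n / class_size[c]) * (n / df_t) for c, n in per_class.items())
    return scores


def dependency(docs: Sequence[Doc], labels: Sequence[str], attrs: Tuple[str, ...]) -> float:
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    Documents with identical presence/absence patterns on `attrs` fall into the
    same indiscernibility block; a block lies in the positive region only if
    all of its documents share one class label.
    """
    block_labels: Dict[Tuple[bool, ...], Set[str]] = defaultdict(set)
    block_size: Counter = Counter()
    for doc, c in zip(docs, labels):
        key = tuple(a in doc for a in attrs)
        block_labels[key].add(c)
        block_size[key] += 1
    positive = sum(n for key, n in block_size.items() if len(block_labels[key]) == 1)
    return positive / len(docs)


def quick_reduct(docs: Sequence[Doc], labels: Sequence[str], candidates: List[str]) -> List[str]:
    """Greedy QuickReduct (standard rough-set reduction, used here as a stand-in
    for the paper's hierarchical reduction): repeatedly add the attribute that
    raises the dependency degree most, until it matches that of all candidates."""
    target = dependency(docs, labels, tuple(candidates))
    reduct: List[str] = []
    current = dependency(docs, labels, ())
    while current < target:
        best, best_gamma = None, current
        for a in candidates:
            if a in reduct:
                continue
            gamma = dependency(docs, labels, tuple(reduct + [a]))
            if gamma > best_gamma:
                best, best_gamma = a, gamma
        if best is None:  # no single attribute raises dependency; stop rather than loop forever
            break
        reduct.append(best)
        current = best_gamma
    return reduct


def select_features(docs: Sequence[Doc], labels: Sequence[str], threshold: float) -> List[str]:
    # Stage 1: preliminary selection, keeping terms whose concentration reaches the threshold
    scores = class_concentration(docs, labels)
    kept = sorted(t for t, s in scores.items() if s >= threshold)
    # Stage 2: eliminate redundancy among the surviving terms by rough-set reduction
    return quick_reduct(docs, labels, kept)


if __name__ == "__main__":
    docs = [
        frozenset({"goal", "match", "team"}),
        frozenset({"goal", "team", "coach"}),
        frozenset({"stock", "market", "price"}),
        frozenset({"stock", "price", "bank"}),
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_features(docs, labels, threshold=0.8))  # ['goal']
```

On this toy corpus, stage 1 discards the four terms that occur in only one document of their class (score 0.5). Stage 2 then keeps only `goal`: its presence or absence alone already separates the two classes, so `price`, `stock`, and `team` are redundant in the rough-set sense.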

This article is indexed in VIP, Wanfang Data, and other databases.