
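Method based on least document frequency for text feature extraction
Cite this article: SU Dan, ZHOU Mingquan, WANG Xuesong, REN Yuzhi. Method based on least document frequency for text feature extraction[J]. Computer Engineering and Applications, 2012, 48(10): 164-166, 178.
Authors: SU Dan  ZHOU Mingquan  WANG Xuesong  REN Yuzhi
Affiliation: College of Information Science and Technology, Beijing Normal University, Beijing 100875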
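Abstract: Existing improved feature extraction methods fall short in quantifying feature distribution information, which substantially degrades their classification performance. To address this problem, an improved feature extraction method based on least document frequency is proposed, namely the TF-LDF algorithm. The algorithm uses least document frequency to quantify a feature's inter-class concentration and intra-class dispersion, and so reflects the feature distribution more accurately. A comparison of experimental results shows that the TF-LDF algorithm achieves better classification performance.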

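Keywords: feature extraction  feature distribution  inter-class concentration  intra-class dispersion  Term Frequency-Least Document Frequency (TF-LDF)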

Method based on least document frequency for text feature extraction
SU Dan, ZHOU Mingquan, WANG Xuesong, REN Yuzhi. Method based on least document frequency for text feature extraction[J]. Computer Engineering and Applications, 2012, 48(10): 164-166, 178.
Authors: SU Dan  ZHOU Mingquan  WANG Xuesong  REN Yuzhi
Affiliation: College of Information Science and Technology, Beijing Normal University, Beijing 100875, China
Abstract: Conventional improved methods of text feature extraction quantify feature distribution inadequately, which to a large extent degrades classification performance. To address this problem, an improved feature extraction method based on Least Document Frequency (LDF), the TF-LDF algorithm, is proposed. It uses LDF to quantify the inter-class concentration and intra-class dispersion of features, and thus reflects the feature distribution more accurately. Experimental comparison shows that the TF-LDF algorithm achieves better classification results.
Keywords: feature extraction  feature distribution  concentration among classes  dispersion within class  Term Frequency-Least Document Frequency (TF-LDF)
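
Illustrative sketch: the abstract does not give the LDF or TF-LDF formulas, so the Python below is not the authors' algorithm. It only shows the kind of raw statistic such a measure could be built on, namely a term's document frequency broken down by class. The function names, the toy corpus, and the two summary values (`min_df`, `max_share`) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def class_document_frequencies(docs):
    """Return {term: {label: number of documents of that class containing the term}}."""
    df = defaultdict(Counter)
    for label, tokens in docs:
        for term in set(tokens):  # count each term at most once per document
            df[term][label] += 1
    return df


def distribution_summary(docs):
    """Summarize, per term, how its document frequency spreads over the classes."""
    labels = sorted({label for label, _ in docs})
    df = class_document_frequencies(docs)
    summary = {}
    for term, per_class in df.items():
        counts = [per_class.get(label, 0) for label in labels]
        total = sum(counts)  # always >= 1: the term occurs in at least one document
        summary[term] = {
            "per_class_df": dict(zip(labels, counts)),
            # Hypothetical stand-ins, NOT the paper's definitions:
            "min_df": min(counts),             # smallest per-class document frequency
            "max_share": max(counts) / total,  # 1.0 means the term occurs in one class only
        }
    return summary


# Toy corpus of (class label, tokens); invented for this example.
docs = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["team", "coach", "goal"]),
    ("finance", ["stock", "market", "team"]),
    ("finance", ["stock", "bank"]),
]
summary = distribution_summary(docs)
```

In this toy corpus, "goal" appears only in sports documents, so its `max_share` is 1.0, while "team" appears in both classes and gets a lower value. A real feature weight would combine statistics of this kind with term frequency according to the definitions in the paper itself.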
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.