
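Research on text sentiment classification based on improved feature selection method
Cite this article: Mingxin LIU, Jing CHEN, Qiyuan WANG. Research on text sentiment classification based on improved feature selection method[J]. Telecommunications Science (电信科学), 2018, 34(10): 85-95.
Authors: Mingxin LIU  Jing CHEN  Qiyuan WANG
Affiliations: 1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China; 2. Hebei Key Laboratory of Information Transmission and Signal Processing, Qinhuangdao 066004, Hebei, China; 3. Computer Virtual Technology Laboratory in Hebei Province, Qinhuangdao 066004, Hebei, China
Funding: National Natural Science Foundation of China (61602401); National Natural Science Foundation of China (61472340)
Abstract: An improved information gain feature selection method that incorporates a sentiment dictionary is proposed. First, an improved method is proposed to address a weakness of existing information gain feature selection, which emphasizes the document frequency of feature words while neglecting corpus balance. Second, considering the influence of sentiment words on text classification, a sentiment-dictionary-based feature selection algorithm, IGSC (information gain combining sentiment classification), is proposed for text classification. By matching the sentiment words in a text and assigning weights to them, the algorithm reduces the feature dimension and addresses the problem of sparse text data degrading classification performance. Finally, the proposed feature selection method is validated and analyzed experimentally on a travel review data set. The experimental results show that the proposed method improves classification accuracy, recall and F value, and has good classification stability.

Keywords: information gain  sentiment dictionary  feature selection  sentiment classification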

Research on text sentiment classification based on improved feature selection method
Mingxin LIU, Jing CHEN, Qiyuan WANG. Research on text sentiment classification based on improved feature selection method[J]. Telecommunications Science, 2018, 34(10): 85-95.
Authors: Mingxin LIU  Jing CHEN  Qiyuan WANG
Affiliation: 1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; 2. Hebei Key Laboratory of Information Transmission and Signal Processing, Qinhuangdao 066004, China; 3. Computer Virtual Technology Laboratory in Hebei Province, Qinhuangdao 066004, China
Abstract: An improved information gain feature selection method combined with a sentiment dictionary is proposed. First, an improved method is proposed to address a weakness of existing information gain feature selection, which emphasizes the document frequency of feature words while ignoring corpus balance. Second, considering the influence of sentiment words on text classification, a sentiment-dictionary-based feature selection method, IGSC (information gain combining sentiment classification), is proposed for text classification. By matching the sentiment words in a text and assigning weights to them, the method reduces the feature dimension and addresses the problem of text data sparseness degrading classification performance. Finally, the proposed feature selection method is verified and analyzed experimentally on a travel review data set. The experimental results show that the proposed method improves classification accuracy, recall and F value, and offers good classification stability.
Keywords:information gain  sentiment dictionary  feature selection  sentiment classification  
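
The abstract does not give the formulas behind IGSC, so the following is only a minimal sketch of the general idea it describes: score each term by textbook information gain, then favor terms found in a sentiment dictionary. The function names, the multiplicative `boost` factor, and the toy reviews are all assumptions made for illustration; this is not the authors' algorithm and does not include their corpus-balance correction.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Textbook IG of a term: H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    classes = sorted(set(labels))
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_t = entropy([with_t[c] for c in classes])
    h_not = entropy([without_t[c] for c in classes])
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not


def select_features(docs, labels, lexicon, k, boost=2.0):
    """Keep the k best terms by IG; `boost` is a hypothetical lexicon weight."""
    vocab = set().union(*docs)
    scores = {
        t: information_gain(docs, labels, t) * (boost if t in lexicon else 1.0)
        for t in vocab
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy travel reviews, each already tokenized into a set of words (made-up data).
docs = [{"great", "view"}, {"great", "food"}, {"dirty", "room"}, {"dirty", "view"}]
labels = ["pos", "pos", "neg", "neg"]
lexicon = {"great", "dirty"}  # stand-in for a real sentiment dictionary

print(select_features(docs, labels, lexicon, k=2))
```

In this toy corpus "great" and "dirty" each separate the two classes perfectly, while "view" appears equally in both and carries no information, so keeping only the top-ranked terms shrinks the feature space in the way the abstract describes.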