
A Word-Expansion LDA Model for Short-Text Sentiment Analysis
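Cite this article: SHEN Ji, MA Zhiqiang, LI Tuya, ZHANG Li. A word-expansion LDA model for short-text sentiment analysis[J]. Journal of Shandong University (Engineering Science), 2018, 48(3): 120-126. DOI: 10.6040/j.issn.1672-3961.0.2017.407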
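Authors: SHEN Ji  MA Zhiqiang  LI Tuya  ZHANG Li
Affiliation: College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China
Funding: National Natural Science Foundation of China (61650205); Natural Science Foundation of Inner Mongolia Autonomous Region (2014MS0608)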
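Abstract: Sentiment polarity classification of short texts tends to have low accuracy. To address this, a sentiment analysis model for short texts is proposed on the basis of latent Dirichlet allocation (LDA). The model identifies sentiment words in short texts by part of speech and applies constrained word expansion to them to form an expansion set, which increases the co-occurrence frequency between sentiment words. Adding the expansion set to the sentiment words already found in a text lengthens the short text and lets the model extract sentiment information; in this way, the model turns topic clustering into sentiment-topic clustering. The model was validated on 4 000 short texts labeled with positive or negative sentiment polarity. The results show that its accuracy is about 11% higher than that of the joint sentiment topic model and about 9.5% higher than that of the latent sentiment model, and that it discovers more sentiment words. This demonstrates that the model extracts richer sentiment features from short texts and achieves higher accuracy in sentiment polarity classification.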

Keywords: short text  sentiment analysis  latent Dirichlet allocation  unsupervised learning  word expansion  document-topic generative model  
Received: 2017-05-09

A word extend LDA model for short text sentiment
SHEN Ji, MA Zhiqiang, LI Tuya, ZHANG Li. A word extend LDA model for short text sentiment[J]. Journal of Shandong University (Engineering Science), 2018, 48(3): 120-126. DOI: 10.6040/j.issn.1672-3961.0.2017.407
Authors:SHEN Ji  MA Zhiqiang  LI Tuya  ZHANG Li
Affiliation:College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China
Abstract: To address the low accuracy of sentiment polarity analysis for short texts, this research presented a sentiment analysis model for short texts based on latent Dirichlet allocation (LDA). The model searched the short texts for emotional words by part of speech and expanded them, under constraints, into an extended set, enhancing the co-occurrence frequency between emotional words. The model added the extended set to the emotional words discovered in the short texts, which increased the length of the short texts, allowed emotional information to be extracted, and turned topic clustering into emotion topic clustering. The model was evaluated on 4 000 short texts with positive or negative polarity. The results showed that the model improved sentiment classification accuracy by 11.8% over the joint sentiment topic model and by 9.5% over the latent sentiment model, and found more emotional words at the same time. This shows that the model extracts richer emotion features from short texts and achieves higher accuracy in sentiment classification.
Keywords:short text  word extend function  latent Dirichlet allocation  unsupervised learning  sentiment analysis  document-topic generative model  
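The abstract describes the word-expansion step only at a high level. The Python sketch below shows one way it could work, under loudly stated assumptions: the part-of-speech tags treated as sentiment-bearing (`SENTIMENT_POS`), the similarity lists (`NEIGHBORS`), and the `top_k`/`min_sim` constraints are all hypothetical and are not taken from the paper. It shows how appending expansion words to two short texts that share no words creates a co-occurrence between `good` and `great` that an LDA model could then exploit. The LDA stage itself is not shown.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: the abstract does not say which POS tags mark sentiment words;
# ICTCLAS-style tags are used here (a = adjective, d = adverb, v = verb).
SENTIMENT_POS = {"a", "d", "v"}


def find_sentiment_words(tagged_doc, sentiment_pos=SENTIMENT_POS):
    """Return the words of a (word, pos) list whose tag marks them as sentiment words."""
    return [word for word, pos in tagged_doc if pos in sentiment_pos]


def constrained_expansion(words, neighbors, top_k=2, min_sim=0.6):
    """Expand each sentiment word with at most top_k neighbors whose similarity
    is at least min_sim. Both limits are hypothetical stand-ins for the
    paper's (unspecified) expansion constraint."""
    expansion = []
    for word in words:
        # Rank candidates from most to least similar.
        ranked = sorted(neighbors.get(word, []), key=lambda pair: pair[1], reverse=True)
        kept = [cand for cand, sim in ranked if sim >= min_sim][:top_k]
        expansion.extend(kept)
    return expansion


def expand_document(tagged_doc, neighbors, top_k=2, min_sim=0.6):
    """Append the expansion set to the original tokens, lengthening the short text."""
    tokens = [word for word, _ in tagged_doc]
    sentiment_words = find_sentiment_words(tagged_doc)
    return tokens + constrained_expansion(sentiment_words, neighbors, top_k, min_sim)


def cooccurrence(docs):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts


# HYPOTHETICAL similarity lists (e.g., from word embeddings or a thesaurus).
NEIGHBORS = {
    "good": [("great", 0.8), ("nice", 0.7), ("bad", 0.3)],
    "great": [("good", 0.8), ("excellent", 0.75)],
}

# Two toy short texts, already segmented and POS-tagged.
DOCS = [
    [("phone", "n"), ("good", "a")],
    [("screen", "n"), ("great", "a")],
]

if __name__ == "__main__":
    raw = [[word for word, _ in doc] for doc in DOCS]
    expanded = [expand_document(doc, NEIGHBORS) for doc in DOCS]
    print(expanded)
    # prints [['phone', 'good', 'great', 'nice'], ['screen', 'great', 'good', 'excellent']]
    print(cooccurrence(raw)[("good", "great")], cooccurrence(expanded)[("good", "great")])
    # prints 0 2
```

The expanded token lists would then be passed to an ordinary LDA implementation. Capping expansions per word (`top_k`) and requiring a minimum similarity (`min_sim`) is one simple way to keep expansion from flooding a short text with loosely related words; the paper's actual constraint may differ.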
This article is indexed in CNKI and other databases.