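A Matching Algorithm Between the Tweets in Twitter and SentiWordNet
Citation: Yi Shunming, Zhou Hongbin, Zhou Guodong. A Matching Algorithm Between the Tweets in Twitter and SentiWordNet[J]. Journal of Nanjing Normal University (Engineering and Technology Edition), 2016(3). DOI: 10.3969/j.issn.1672-1292.2016.03.007
Authors: Yi Shunming¹, Zhou Hongbin¹, Zhou Guodong²
Affiliations: ¹Department of Electronics and Information Engineering, Shazhou Professional Institute of Technology, Suzhou, Jiangsu 215600, China; ²School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China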
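Abstract: Research on Twitter sentiment classification often obtains sentiment values by matching the words of a tweet against the synonym entries (synsets) of a sentiment lexicon. Tweets, however, are written informally and contain many slang terms, abbreviations and special symbols, so many of their words match no lexicon entry, and the resulting low matching rate directly degrades sentiment classification performance. Building on the linguistic characteristics of Twitter, this paper proposes a set of algorithms for matching tweets against the sentiment lexicon SentiWordNet. The method first preprocesses tweet content through data cleaning, substitution, part-of-speech tagging and lemmatization, and further incorporates named entity recognition, word segmentation of hashtag content, negation handling based on Word Clusters, and phrase matching. Experimental results show that the method achieves a matching rate above 90%.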

Keywords: tweets; sentiment classification; SentiWordNet; matching algorithm
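The abstract lists word segmentation of hashtag content but does not describe the procedure. One common way to do it, shown here only as an assumption and not as the authors' method, is to split on capitalization when a tag is camel-cased and otherwise run a dictionary-based dynamic program that prefers the segmentation with the fewest words. `VOCAB`, `split_camel_case`, `segment` and `segment_hashtag` are hypothetical names, and the vocabulary is a toy.

```python
import re
from typing import Optional

# Hypothetical toy vocabulary; a real system would need a large English word list.
VOCAB = {"i", "love", "happy", "friday", "monday", "hate", "so", "not", "good", "day"}


def split_camel_case(tag: str) -> list[str]:
    """Split at case changes, digits and underscores: 'HappyFriday' -> ['Happy', 'Friday']."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag)


def segment(text: str, vocab: set[str] = VOCAB) -> Optional[list[str]]:
    """Split lowercase text into the fewest vocabulary words by dynamic programming.

    best[k] is the shortest segmentation of text[:k], or None if none exists.
    """
    best: list[Optional[list[str]]] = [None] * (len(text) + 1)
    best[0] = []
    longest = max(map(len, vocab))
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - longest), end):
            prefix = best[start]
            if prefix is None or text[start:end] not in vocab:
                continue
            candidate = prefix + [text[start:end]]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[len(text)]


def segment_hashtag(hashtag: str) -> list[str]:
    """Turn '#HappyFriday' or '#ilovemonday' into lowercase words."""
    body = hashtag.lstrip("#")
    parts = split_camel_case(body)
    if len(parts) > 1:  # casing already marks the word boundaries
        return [p.lower() for p in parts]
    words = segment(body.lower())
    # Fall back to the unsplit tag when the vocabulary cannot cover it.
    return words if words is not None else [body.lower()]
```

For example, `segment_hashtag("#ilovemonday")` returns `['i', 'love', 'monday']`, which can then be tagged, lemmatized and looked up like any other tweet word. Minimizing the word count is the simplest tie-breaker; a unigram-frequency model is a common refinement when several segmentations are possible.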

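Several of the steps named in the abstract (substitution of slang, negation handling based on Word Clusters, and phrase matching) can be combined with the matching-rate measure into a single lookup loop. This is a minimal sketch under stated assumptions, not the authors' implementation: tokens are assumed to be already POS-tagged with WordNet's n/v/a/r letters and lemmatized; `LEXICON`, `SUBSTITUTIONS`, `WORD_CLUSTERS` and `SEED_NEGATORS` are hypothetical toy tables (a real lookup could instead query NLTK's `sentiwordnet.senti_synsets(lemma, pos)`); and the matching rate is taken to be matched content tokens divided by candidate content tokens, a definition the abstract does not spell out.

```python
import re

# Hypothetical miniature stand-in for SentiWordNet:
# (lemma, WordNet POS) -> (positive score, negative score).
# Multi-word entries are joined with underscores, as in WordNet.
LEXICON: dict[tuple[str, str], tuple[float, float]] = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("love", "v"): (0.5, 0.0),
    ("movie", "n"): (0.0, 0.0),
    ("let_down", "v"): (0.0, 0.5),
}

# Hypothetical substitution table for slang and abbreviations.
SUBSTITUTIONS = {"gr8": "great", "luv": "love", "u": "you"}

# Hypothetical excerpt of a Brown-style word-cluster table (word -> bit string).
# Informal spellings of a negator tend to share its cluster, so every word in
# a seed negator's cluster is treated as a negation cue.
WORD_CLUSTERS = {"not": "111010", "dont": "111010", "dnt": "111010",
                 "never": "111011", "nvr": "111011", "good": "0101"}
SEED_NEGATORS = {"not", "no", "never"}
_neg_clusters = {WORD_CLUSTERS[w] for w in SEED_NEGATORS if w in WORD_CLUSTERS}
NEGATION_CUES = SEED_NEGATORS | {
    w for w, c in WORD_CLUSTERS.items() if c in _neg_clusters
}

CLAUSE_END = {".", ",", "!", "?", ";", "but"}  # tokens that close a negation scope
CONTENT_POS = {"n", "v", "a", "r"}  # the four WordNet POS classes
NOISE = re.compile(r"^(?:https?://\S+|@\w+)$")  # URLs and @mentions


def match_tweet(tokens: list[tuple[str, str]], max_ngram: int = 3):
    """Match preprocessed (lemma, pos) tokens against LEXICON.

    Returns (matches, matching_rate). Each match is (entry, pos_score,
    neg_score); the two scores are swapped inside a negation scope.
    """
    # Data cleaning and substitution: drop URLs/@mentions, expand slang.
    cleaned = [(SUBSTITUTIONS.get(w.lower(), w.lower()), pos)
               for w, pos in tokens if not NOISE.match(w)]
    matches: list[tuple[str, float, float]] = []
    candidates = matched = 0
    negated = False
    i = 0
    while i < len(cleaned):
        word, pos = cleaned[i]
        if word in NEGATION_CUES:
            negated = True  # open a negation scope
            i += 1
        elif word in CLAUSE_END:
            negated = False  # punctuation or "but" closes it
            i += 1
        elif pos not in CONTENT_POS:
            i += 1  # not a lexicon candidate (pronoun, determiner, ...)
        else:
            # Phrase matching: longest n-gram first, keyed by the first token's POS.
            for n in range(min(max_ngram, len(cleaned) - i), 0, -1):
                entry = "_".join(w for w, _ in cleaned[i:i + n])
                scores = LEXICON.get((entry, pos))
                if scores is not None:
                    p, q = scores
                    matches.append((entry, q, p) if negated else (entry, p, q))
                    candidates += n
                    matched += n
                    i += n
                    break
            else:
                candidates += 1  # unmatched content word
                i += 1
    rate = matched / candidates if candidates else 0.0
    return matches, rate
```

For the tagged input `[("i", "p"), ("luv", "v"), ("this", "d"), ("movie", "n"), ("but", "c"), ("it", "p"), ("be", "v"), ("not", "r"), ("good", "a")]`, the sketch matches `love`, `movie` and a negated `good`, leaves `be` unmatched, and reports a matching rate of 0.75. Swapping the positive and negative scores inside a negation scope is only one simple convention; the abstract does not say how negated context is scored.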
This article is indexed in CNKI and other databases.