The Construction of an Emotion Annotated Corpus on Microblog Text
Citation: YAO Yuanlin, WANG Shuwei, XU Ruifeng, LIU Bin, GUI Lin, LU Qin, WANG Xiaolong. The Construction of an Emotion Annotated Corpus on Microblog Text[J]. Journal of Chinese Information Processing, 2014, 28(5): 83-91.
Authors: YAO Yuanlin, WANG Shuwei, XU Ruifeng, LIU Bin, GUI Lin, LU Qin, WANG Xiaolong
Affiliations: 1. Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055;
2. Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Funding: National Natural Science Foundation of China (61203378, 61300112, 61370165); Specialized Research Fund for the Doctoral Program of Higher Education (20122302120 070); Natural Science Foundation of Guangdong Province (S2012040007390, S2013010014475); Open Project Fund of the National Laboratory of Pattern Recognition; Shenzhen Basic Research Program (JCYJ20120613152557576, JC201005260118A); Shenzhen International Cooperation Program (GJHZ201206131 106 1217); Baidu University Collaboration Project.
Abstract: Research on text emotion analysis has progressed rapidly in recent years, but Chinese emotion-annotated corpora, especially those built on microblog text, remain underdeveloped. To support the analysis of how emotion is expressed in Chinese microblog text and the evaluation of emotion analysis algorithms, we first observed and analyzed the characteristics of emotion expression in microblogs in depth and, on that basis, designed a complete emotion annotation specification. Following this specification, we performed microblog-level annotation first: each microblog is labeled for whether it expresses emotion and, if it does, with the emotion categories it contains (multi-label annotation). Next, we annotated each sentence in the microblogs for whether it expresses emotion and which emotion categories it carries, together with the intensity of each category. The corpus currently contains 14,000 microblogs totaling 45,431 sentences. It served as the standard resource for the NLP&CC 2013 Chinese microblog emotion analysis evaluation and has substantially advanced research on microblog emotion analysis.

Keywords: emotion corpus; corpus construction; emotion annotation; microblog text
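A minimal sketch of how one record of the two-level scheme in the abstract (multi-label categories per microblog; categories plus a per-category intensity per sentence) could be represented and sanity-checked. The class and function names, the seven category names, and the 1 to 5 intensity range are hypothetical choices for illustration; the abstract does not list the categories or state the intensity scale.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical label set: the abstract does not list the emotion categories.
EMOTIONS = {"like", "happiness", "sadness", "anger", "disgust", "fear", "surprise"}

# Hypothetical intensity range: the abstract only says intensity is annotated.
MIN_INTENSITY, MAX_INTENSITY = 1, 5


@dataclass
class Sentence:
    text: str
    has_emotion: bool
    # Maps each annotated emotion category to its intensity.
    intensities: dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        # A sentence marked as emotional must carry at least one category, and vice versa.
        if self.has_emotion != bool(self.intensities):
            raise ValueError("has_emotion must agree with the presence of emotion labels")
        for label, value in self.intensities.items():
            if label not in EMOTIONS:
                raise ValueError(f"unknown emotion category: {label}")
            if not MIN_INTENSITY <= value <= MAX_INTENSITY:
                raise ValueError(f"intensity out of range for {label}: {value}")


@dataclass
class Microblog:
    has_emotion: bool
    # Multi-label: a microblog with emotion may carry several categories.
    emotions: set[str]
    sentences: list[Sentence]

    def validate(self) -> None:
        if self.has_emotion != bool(self.emotions):
            raise ValueError("has_emotion must agree with the presence of emotion labels")
        unknown = self.emotions - EMOTIONS
        if unknown:
            raise ValueError(f"unknown emotion categories: {sorted(unknown)}")
        for sentence in self.sentences:
            sentence.validate()


def corpus_stats(corpus: list[Microblog]) -> tuple[int, int]:
    """Validate every record, then return (number of microblogs, number of sentences)."""
    for blog in corpus:
        blog.validate()
    return len(corpus), sum(len(blog.sentences) for blog in corpus)


# Usage: one emotional microblog with one emotional and one neutral sentence.
blog = Microblog(
    has_emotion=True,
    emotions={"happiness", "surprise"},
    sentences=[
        Sentence("I actually won the lottery today!", True, {"surprise": 3, "happiness": 2}),
        Sentence("Going to collect the prize this afternoon.", False),
    ],
)
print(corpus_stats([blog]))  # (1, 2)
```

Keeping the microblog-level labels and the sentence-level labels as separate fields mirrors the two annotation passes described above; the sketch deliberately does not derive one level from the other, since the abstract does not say how they relate.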
This article is indexed in CNKI and other databases.