A Study of the Influence of Topics on Micro-blog Opinion Sentence Identification
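Citation: LUO Ling, CHEN Yi-dong, CAO Mao-yuan. A study of the influence of topics on micro-blog opinion sentence identification[J]. Digital Community & Smart Home, 2014(1): 123-127.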
Authors: LUO Ling  CHEN Yi-dong  CAO Mao-yuan
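Affiliation: Department of Intelligent Science and Technology, School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China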
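Funding: National Natural Science Foundation of China (61005052); National Key Technology R&D Program of China (2012BAH14F03); Fundamental Research Funds for the Central Universities (2010121068); Natural Science Foundation of Fujian Province (201U01369)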
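Abstract: To extract opinion information quickly and accurately from the massive volume of online content, opinion sentence identification has become an active research topic in natural language processing. Most current opinion sentence identification systems are based on machine learning, and machine-learning classifiers are generally sensitive to differences between domains. To address this problem, this paper presents an empirical study of whether a micro-blog opinion sentence identification system is affected by micro-blog topics. To compensate for the shortage of training data, we also expanded the training set by automatically annotating web data with rule-based methods. Experimental results show that micro-blog topics differ from one another, and that training a separate model for each topic improves the performance of the micro-blog opinion sentence identification system.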

Keywords: opinion sentence identification  machine learning  topic  rules
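The abstract mentions expanding the training set by automatically annotating web data with rules, but this page does not say which rules were used. The following is a minimal Python sketch assuming a simple high-precision scheme; the lexicon `OPINION_WORDS`, the phrase list `SUBJECTIVE_MARKERS`, and the functions `rule_label` and `expand_training_set` are hypothetical illustrations, not the authors' method. A sentence is labeled only when the two cues agree, and is otherwise left out. English tokens are used for readability; Chinese micro-blog text would need a word segmenter in place of the regex tokenizer.

```python
import re
from typing import List, Optional

# Hypothetical seed resources: the paper's actual rules are not given here.
OPINION_WORDS = {"love", "hate", "great", "terrible", "awful", "support", "oppose"}
SUBJECTIVE_MARKERS = ("i think", "i feel", "in my opinion", "i hope")

def tokenize(text: str) -> List[str]:
    # Lowercased alphabetic tokens (an assumption; Chinese needs segmentation).
    return re.findall(r"[a-z']+", text.lower())

def rule_label(sentence: str) -> Optional[int]:
    """Return 1 (opinion), 0 (non-opinion), or None when the rules abstain."""
    lowered = sentence.lower()
    has_word = any(t in OPINION_WORDS for t in tokenize(sentence))
    has_marker = any(m in lowered for m in SUBJECTIVE_MARKERS)
    if has_word and has_marker:
        return 1      # both cues present: label as opinion
    if not has_word and not has_marker:
        return 0      # no subjective cue at all: label as non-opinion
    return None       # mixed evidence: too uncertain to use for training

def expand_training_set(labeled, unlabeled):
    """Append rule-labeled web sentences to a manually labeled seed set."""
    expanded = list(labeled)
    for sentence in unlabeled:
        label = rule_label(sentence)
        if label is not None:
            expanded.append((sentence, label))
    return expanded

seed = [("The phone was released in May", 0)]
web = ["I think the new screen is great", "The match starts at eight", "Great goal!"]
expanded = expand_training_set(seed, web)
print(expanded)
# [('The phone was released in May', 0), ('I think the new screen is great', 1),
#  ('The match starts at eight', 0)]
```

Abstaining on mixed evidence trades coverage for label precision, which matters when automatically labeled data is mixed with a small hand-labeled set.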

A Study on the Effects of Topics on an Opinion Sentences Identification System for Micro-blog Data
LUO Ling,CHEN Yi-dong,CAO Mao-yuan.A Study on the Effects of Topics on an Opinion Sentences Identification System for Micro-blog Data[J].Digital Community & Smart Home,2014(1):123-127.
Authors:LUO Ling  CHEN Yi-dong  CAO Mao-yuan
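Affiliation: Department of Intelligent Science and Technology, School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China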
Abstract: As an important stage of information extraction, Opinion Sentence Identification (OSI) has attracted increasing attention from NLP researchers over the past decade. As in other areas of NLP, most current OSI systems are built on machine learning (ML) techniques, which often suffer from domain/topic adaptation problems. In this paper, an empirical study was conducted to test whether topic differences among micro-blog data affect the performance of an ML-based OSI system; rule-based automatic annotation was used to expand the training set. The experimental results indicate that introducing a topic classifier and training separate models on the sub-topics significantly improves the performance of the OSI system on micro-blog data.
Keywords: opinion sentence identification  machine learning  topic  rule-based
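The abstract describes a pipeline in which a topic classifier first assigns each micro-blog sentence to a sub-topic, and an OSI model trained only on that sub-topic then decides whether the sentence expresses an opinion. The page does not name the learner, so the minimal sketch below assumes multinomial Naive Bayes with add-one smoothing for both stages; `NaiveBayes`, `TopicRoutedOSI`, and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercased alphabetic tokens; Chinese text would need a word segmenter.
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, c, tokens):
        v = len(self.vocab)
        score = math.log(self.prior[c] / self.n)
        for t in tokens:
            if t in self.vocab:  # ignore words never seen in training
                score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        return max(self.classes, key=lambda c: self._log_score(c, tokens))

class TopicRoutedOSI:
    """Route each sentence to the opinion classifier trained on its topic."""

    def fit(self, examples):
        # examples: iterable of (sentence, topic, is_opinion) triples
        examples = list(examples)
        self.topic_clf = NaiveBayes().fit(
            [s for s, _, _ in examples], [t for _, t, _ in examples])
        grouped = defaultdict(lambda: ([], []))
        for sentence, topic, label in examples:
            grouped[topic][0].append(sentence)
            grouped[topic][1].append(label)
        # One opinion classifier per topic, trained only on that topic's data.
        self.models = {t: NaiveBayes().fit(xs, ys) for t, (xs, ys) in grouped.items()}
        return self

    def predict(self, sentence):
        topic = self.topic_clf.predict(sentence)
        return self.models[topic].predict(sentence)

train = [
    ("i love this new phone screen", "phone", 1),
    ("the phone ships with android", "phone", 0),
    ("battery life on this phone is terrible", "phone", 1),
    ("the phone has a six inch screen", "phone", 0),
    ("the referee was awful tonight", "football", 1),
    ("the match kicks off at eight", "football", 0),
    ("what a brilliant goal by the striker", "football", 1),
    ("the striker joined the club in may", "football", 0),
]
osi = TopicRoutedOSI().fit(train)
print(osi.predict("i love this phone"))            # 1
print(osi.predict("the match kicks off at nine"))  # 0
```

Each per-topic model sees only its own topic's vocabulary, which is one plausible reading of why topic-specific training could help; the cost is less training data per model, which is where rule-based training-set expansion can compensate.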
This article is indexed in CNKI, VIP and other databases.