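User opinion extraction with adaptive crowdsourced labeling under cost constraints
Cite this article:ZHAO Wei, LIN Yuming, HUANG Taoyi, LI You. User opinion extraction with adaptive crowdsourced labeling under cost constraints[J]. Journal of Computer Applications, 2019, 39(5): 1351-1356. DOI: 10.11772/j.issn.1001-9081.2018112496
Authors:ZHAO Wei  LIN Yuming  HUANG Taoyi  LI You
Affiliation:Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004; Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (Guilin University of Electronic Technology), Guilin, Guangxi 541004
Funding:National Natural Science Foundation of China (61562014, U1711263); Key Project of the Guangxi Natural Science Foundation (2018GXNSFDA281049); Guilin University of Electronic Technology Outstanding Graduate Thesis Cultivation Program (16YJPYSS15); Guilin University of Electronic Technology Innovation Project of Graduate Education (2018YJCX48); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201916).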
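Abstract:User reviews contain rich opinion information that is a valuable reference for potential customers and merchants. Opinion targets and opinion words are the core objects in user reviews, and extracting them automatically is a central task for intelligent applications built on reviews. The problem is currently addressed mainly with supervised extraction methods, which depend on high-quality labeled samples for model training, yet traditional manual labeling is time-consuming, laborious, and costly. Crowdsourcing offers an effective way to build high-quality training sets, but differences in workers' background knowledge and other factors make the quality of their labels uneven. To obtain high-quality labeled samples at limited cost, an adaptive crowdsourced labeling method based on an assessment of worker expertise is proposed for building a reliable opinion target-opinion word dataset. First, workers with a high level of expertise are identified at small cost. Then, a task distribution mechanism based on worker reliability is designed. Finally, an effective label fusion algorithm is designed that exploits the dependency between opinion targets and opinion words and integrates the labels of different workers into a final, reliable result. A series of experiments on real datasets shows that, compared with the GLAD model and the majority vote (MV) algorithm, the proposed method improves the reliability of the constructed opinion target-opinion word dataset by about 10% when the cost budget is small.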

Keywords:opinion mining   crowdsourcing   cost constraint   worker detection   data integration
Received:2018-12-04
Revised:2018-12-18

User opinion extraction based on adaptive crowd labeling with cost constrain
ZHAO Wei,LIN Yuming,HUANG Taoyi,LI You. User opinion extraction based on adaptive crowd labeling with cost constrain[J]. Journal of Computer Applications, 2019, 39(5): 1351-1356. DOI: 10.11772/j.issn.1001-9081.2018112496
Authors:ZHAO Wei  LIN Yuming  HUANG Taoyi  LI You
Affiliation:1. Guangxi Key Laboratory of Trusted Software(Guilin University of Electronic Technology), Guilin Guangxi 541004, China;2. Guangxi Key Laboratory of Automatic Detecting Technology and Instruments(Guilin University of Electronic Technology), Guilin Guangxi 541004, China
Abstract:User reviews contain a wealth of opinion information that is a valuable reference for potential customers and merchants. Opinion targets and opinion words are the core objects of user reviews, so extracting them automatically is a key task for intelligent review applications. The problem is currently solved mainly with supervised extraction methods, which depend on high-quality labeled samples to train the model, while traditional manual labeling is time-consuming, laborious, and costly. Crowdsourcing provides an effective way to build a high-quality training set. However, the quality of the labeling results is uneven because of factors such as the workers' background knowledge. To obtain high-quality labeled samples at a limited cost, an adaptive crowdsourced labeling method based on an evaluation of workers' professional level was proposed to construct a reliable opinion target-opinion word dataset. Firstly, workers with a high professional level were identified at small cost. Then, a task distribution mechanism based on worker reliability was designed. Finally, an effective fusion algorithm for labeling results was designed using the dependency relationship between opinion targets and opinion words, and the final reliable results were generated by integrating the labeling results of different workers. A series of experiments on real datasets shows that, compared with the GLAD (Generative model of Labels, Abilities, and Difficulties) model and the MV (Majority Vote) method, the proposed method improves the reliability of the constructed opinion target-opinion word dataset by about 10% when the cost budget is low.
Keywords:opinion mining   crowdsourcing   cost constraint   worker detection   data integration
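
The abstract contrasts plain majority voting (MV) with aggregation that takes worker reliability into account. The Python sketch below illustrates only that general idea and is not the authors' algorithm: it estimates each worker's accuracy on a few known-answer tasks and then weights votes by that estimate. The function names, the three-way label set ("target", "opinion", "other"), the smoothing constant, and the toy data are all assumptions made for illustration. The paper's fusion step also uses the dependency between opinion targets and opinion words, which this sketch omits.

```python
from collections import Counter, defaultdict


def estimate_reliability(gold_answers, worker_answers, smoothing=1.0):
    """Estimate each worker's accuracy on known-answer (gold) tasks.

    Additive smoothing keeps the estimate strictly between 0 and 1.
    """
    reliability = {}
    for worker, answers in worker_answers.items():
        scored = [task for task in answers if task in gold_answers]
        correct = sum(answers[task] == gold_answers[task] for task in scored)
        reliability[worker] = (correct + smoothing) / (len(scored) + 2 * smoothing)
    return reliability


def majority_vote(labels):
    """Unweighted vote over {worker: label}; ties go to the alphabetically first label."""
    counts = Counter(labels.values())
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def weighted_vote(labels, reliability, default=0.5):
    """Vote in which each worker's label counts in proportion to estimated reliability."""
    scores = defaultdict(float)
    for worker, label in labels.items():
        scores[label] += reliability.get(worker, default)
    best = max(scores.values())
    return min(label for label, s in scores.items() if s == best)


# Hypothetical gold tasks: the correct tag for four review tokens.
gold = {"g1": "target", "g2": "opinion", "g3": "other", "g4": "target"}

# Hypothetical worker answers on those gold tasks.
answers_on_gold = {
    "expert": {"g1": "target", "g2": "opinion", "g3": "other", "g4": "target"},
    "novice_a": {"g1": "target", "g2": "other", "g3": "target", "g4": "other"},
    "novice_b": {"g1": "other", "g2": "opinion", "g3": "target", "g4": "opinion"},
}

reliability = estimate_reliability(gold, answers_on_gold)

# A new, unlabeled token on which the two novices outvote the expert.
new_labels = {"expert": "opinion", "novice_a": "other", "novice_b": "other"}

print("majority vote:", majority_vote(new_labels))
print("weighted vote:", weighted_vote(new_labels, reliability))
```

In this toy case the two low-accuracy workers outvote the accurate one under MV, while the weighted vote follows the accurate worker. That is the kind of situation in which identifying reliable workers early can pay off under a small budget.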
This article is indexed in VIP, Wanfang Data, and other databases.