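Phrase Mining in the E-commerce Domain Based on Co-training (基于协同训练的电商领域短语挖掘)
Cite this article: XU Yong, LIU Jingping, XIAO Yanghua, ZHU Muhua. 基于协同训练的电商领域短语挖掘[J]. 计算机工程 (Computer Engineering), 2020, 46(4): 70-76,84
Authors: XU Yong (许勇), LIU Jingping (刘井平), XIAO Yanghua (肖仰华), ZHU Muhua (朱慕华)
Affiliations: School of Computer Science and Technology, Fudan University, Shanghai 200433; Alibaba Network Technology Co., Ltd., Hangzhou 311121
Abstract: Text in the e-commerce domain usually does not follow the conventions of general-domain text, so traditional phrase mining methods achieve low precision on it. To address this, a co-training-based phrase mining method for the e-commerce domain is proposed. A phrase classification model based on semantic features is used to detect reversed-order expressions in e-commerce text, and a co-training framework for phrase mining is built to reduce the cost of labeling training data in the domain corpus. On this basis, the Stacking method is used to combine the strengths of the statistical model and the semantic model, improving overall mining performance. Experimental results on a Taobao query corpus show that the method achieves higher precision and recall than the ClassPhrase and AutoPhrase methods.

Keywords: ensemble learning; phrase mining; co-training; deep learning; named entity recognition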

Phrase Mining in Ecommerce Based on Cooperative Training
XU Yong, LIU Jingping, XIAO Yanghua, ZHU Muhua. Phrase Mining in Ecommerce Based on Cooperative Training[J]. Computer Engineering, 2020, 46(4): 70-76,84
Authors: XU Yong  LIU Jingping  XIAO Yanghua  ZHU Muhua
Affiliation: (School of Computer Science, Fudan University, Shanghai 200433, China; Alibaba Network Technology Co., Ltd., Hangzhou 311121, China)
Abstract: Texts in ecommerce usually do not follow the conventions of expression used in general-domain texts, so traditional phrase mining methods achieve low precision on ecommerce text. Therefore, this paper proposes a phrase mining method based on cooperative training. A phrase classification model based on semantic features is used to detect reversed-order expressions in ecommerce texts. A cooperative training framework for phrase mining is then constructed to reduce the cost of labeling training data in the domain corpus. On this basis, the Stacking method is used to integrate the advantages of the statistical model and the semantic model, improving the overall mining performance. Experimental results on a Taobao query corpus show that, compared with the ClassPhrase and AutoPhrase methods, the proposed method achieves higher precision and recall.
Keywords: ensemble learning  phrase mining  cooperative training  deep learning  Named Entity Recognition (NER)
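
The cooperative training idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the models or features, so the nearest-centroid classifier, the function name `co_train`, and the toy feature values below are all assumptions chosen for illustration. Two classifiers, one per feature view (a "statistical" view and a "semantic" view), take turns labeling the unlabeled candidate they are most confident about, and each new label enters a shared pool that the other view trains on next. The Stacking step is not shown.

```python
from math import dist


class CentroidClassifier:
    """Tiny nearest-centroid classifier, a stand-in for a real per-view model."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            # Mean of each feature column for this class.
            self.centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return self

    def predict_with_margin(self, x):
        # Confidence margin = gap between the two nearest centroid distances.
        ranked = sorted((dist(x, c), label) for label, c in self.centroids.items())
        return ranked[0][1], ranked[1][0] - ranked[0][0]


def co_train(view_a, view_b, seed_labels, rounds=3, per_round=1):
    """Grow a label pool by letting two feature views label data for each other.

    view_a, view_b: lists of feature tuples, aligned by candidate index.
    seed_labels: dict mapping candidate index -> 0/1 (must contain both classes).
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = sorted(labels)
            clf = CentroidClassifier().fit(
                [view[i] for i in known], [labels[i] for i in known]
            )
            unlabeled = [i for i in range(len(view)) if i not in labels]
            if not unlabeled:
                return labels
            scored = []
            for i in unlabeled:
                pred, margin = clf.predict_with_margin(view[i])
                scored.append((margin, pred, i))
            # Most confident predictions first.
            scored.sort(key=lambda s: -s[0])
            for _margin, pred, i in scored[:per_round]:
                labels[i] = pred
    return labels


# Hypothetical toy data: six phrase candidates, two invented feature views.
STAT_VIEW = [(0.9, 0.8), (0.1, 0.2), (0.8, 0.9), (0.2, 0.1), (0.7, 0.7), (0.3, 0.2)]
SEM_VIEW = [(0.9,), (0.1,), (0.8,), (0.2,), (0.85,), (0.15,)]
SEED = {0: 1, 1: 0}  # only two candidates are hand-labeled

if __name__ == "__main__":
    print(co_train(STAT_VIEW, SEM_VIEW, SEED))
```

Only two of the six candidates start with labels; the rest are labeled by the models themselves, which is how a scheme of this kind can reduce manual annotation cost.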
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.