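Fake comment detection based on heterogeneous ensemble learning
Cite this article: ZHANG Dapeng, LIU Yajun, ZHANG Wei, SHEN Fen, YANG Jiansheng. Fake comment detection based on heterogeneous ensemble learning[J]. Journal of Shandong University (Engineering Science), 2020, 50(2): 1-9.
Authors: ZHANG Dapeng  LIU Yajun  ZHANG Wei  SHEN Fen  YANG Jiansheng
Affiliations: School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei; College of Information Engineering, Hebei Institute of Architecture and Civil Engineering, Zhangjiakou 075000, Hebei
Funding: Zhangjiakou Science and Technology Research and Development Directive Plan Projects (1711007B, 1711045H, 1811009B-04)
Abstract: To prevent malicious competition among sellers, ensure fair trading on e-commerce platforms, and protect consumers' rights, this work addresses the small data sets and inaccurate labeling found in fake comment detection by improving related algorithms on the fake comment data set most recently released by Amazon. Because the Word2vec model cannot recognize word pairs in English, a Bigram-Word2vec model is proposed. A "two-class weighted hard voting method" is proposed to handle the case in heterogeneous ensemble learning where the classifiers' votes are tied, and a "weighted soft voting method" is proposed to address how classifier weights are set in heterogeneous ensemble learning. The experimental results show that these improvements achieve relatively good results.

Keywords: machine learning  heterogeneous ensemble learning  voting  fake comment detection  Word2vec
Received: 2019-07-24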
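
The abstract notes that Word2vec cannot recognize English word pairs and proposes a Bigram-Word2vec model, but it does not say how the pairs are found. As a minimal sketch of one plausible preprocessing step, assuming a simple frequency threshold (the function name `merge_bigrams`, the `min_count` parameter, and the sample reviews are hypothetical, not taken from the paper), frequent adjacent pairs can be joined into single tokens before the corpus is passed to an embedding model:

```python
from collections import Counter


def merge_bigrams(sentences, min_count=2):
    """Join adjacent word pairs seen at least `min_count` times into one token.

    Assumption: a plain count threshold stands in for whatever pair
    selection rule the paper actually uses.
    """
    # Count every adjacent pair across the whole corpus.
    pair_counts = Counter(
        (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
    )
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            is_pair = (
                i + 1 < len(sent)
                and pair_counts[(sent[i], sent[i + 1])] >= min_count
            )
            if is_pair:
                # Emit the pair as one token and skip both words.
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


# Hypothetical tokenized reviews.
reviews = [
    ["great", "battery", "life", "for", "the", "price"],
    ["battery", "life", "is", "short"],
    ["great", "price"],
]
merged_reviews = merge_bigrams(reviews)
print(merged_reviews)
```

The merged token lists could then be used as the training corpus for any Word2vec implementation, so that a pair such as `battery_life` receives its own vector.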

Fake comment detection based on heterogeneous ensemble learning
Dapeng ZHANG, Yajun LIU, Wei ZHANG, Fen SHEN, Jiansheng YANG. Fake comment detection based on heterogeneous ensemble learning[J]. Journal of Shandong University (Engineering Science), 2020, 50(2): 1-9.
Authors:Dapeng ZHANG  Yajun LIU  Wei ZHANG  Fen SHEN  Jiansheng YANG
Affiliation: 1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China; 2. College of Information Engineering, Hebei Institute of Architecture and Civil Engineering, Zhangjiakou 075000, Hebei, China
Abstract: To prevent vicious competition among sellers, ensure fair trading on e-commerce platforms, and protect the rights of consumers, related algorithms were improved using the latest fake comment data set released by Amazon, addressing the small data sets and inaccurate labeling found in fake comment detection. Because the Word2vec model cannot recognize word pairs in English, the Bigram-Word2vec model was proposed. "Two-class weighted hard voting" was proposed to handle the case in heterogeneous ensemble learning where the classifiers' votes are tied. "Weighted soft voting" was proposed to address how classifier weights are set in heterogeneous ensemble learning. The experimental results showed that these improvements achieved relatively good results.
Keywords:machine learning  heterogeneous ensemble learning  voting  fake comment detection  Word2vec  
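
The abstract names two voting schemes without giving their formulas. The following is a minimal sketch of how they might work for a binary fake/genuine decision, under stated assumptions: hard votes are counted first and classifier weights are consulted only when the count is tied, while soft voting takes a weighted average of each classifier's predicted probability. The weights (imagined here as validation accuracies), the 0.5 threshold, and both function names are hypothetical and are not taken from the paper.

```python
def weighted_hard_vote(labels, weights):
    """Binary hard vote; a tie in raw vote counts is broken by summed weights.

    Assumption: weights matter only in a tie. If the weights also tie,
    this sketch falls back to label 0.
    """
    votes_1 = sum(1 for y in labels if y == 1)
    votes_0 = len(labels) - votes_1
    if votes_1 != votes_0:
        return 1 if votes_1 > votes_0 else 0
    weight_1 = sum(w for y, w in zip(labels, weights) if y == 1)
    weight_0 = sum(w for y, w in zip(labels, weights) if y == 0)
    return 1 if weight_1 > weight_0 else 0


def weighted_soft_vote(probs, weights):
    """Weighted average of each classifier's P(fake); label 1 if >= 0.5."""
    p_fake = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return (1 if p_fake >= 0.5 else 0), p_fake


# Hypothetical weights for four heterogeneous classifiers.
weights = [0.9, 0.6, 0.6, 0.5]

# Two votes each way: the summed weights decide.
tie_for_fake = weighted_hard_vote([1, 0, 0, 1], weights)
tie_for_genuine = weighted_hard_vote([0, 1, 1, 0], weights)

# One confident, highly weighted classifier outweighs three unsure ones.
soft_label, soft_p = weighted_soft_vote([0.9, 0.4, 0.4, 0.2], weights)
print(tie_for_fake, tie_for_genuine, soft_label, round(soft_p, 3))
```

In the soft-voting call, an unweighted mean of the four probabilities would be 0.475 and give label 0, so the example shows the weights changing the outcome.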
This article is indexed in CNKI, Wanfang Data, and other databases.