
Research on a Topic Crawler Based on a Convolutional Neural Network Fused with LDA
Cite as: WANG Kui, FEI Chenjie, LIU Baisong. Convolutional Neural Network Themed Reptile Research Based on LDA[J]. Computer Engineering and Applications, 2019, 55(11): 123-128.
Authors: WANG Kui, FEI Chenjie, LIU Baisong
Affiliations (in author order): School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China; School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China; Library and Information Center, Ningbo University, Ningbo, Zhejiang 315211, China
Funding: National Social Science Fund of China; provincial/ministerial-level laboratory open fund
Abstract: When computing topic similarity, traditional topic crawlers typically rely on methods based on word frequency, the vector space model, or semantic similarity, which limits how far the accuracy of the similarity calculation can be improved. This paper therefore proposes a topic crawler built on a convolutional neural network fused with LDA. The topic judgment module is treated as a text classification problem, and a deep neural network is used to improve crawler performance. Topic features extracted by LDA are concatenated after the convolutional layer to compensate for the topic information that a conventional convolutional neural network lacks. Experimental results show that the method effectively improves the average accuracy of the topic judgment module and compares favorably with other methods in a real crawling environment.

Keywords: convolutional neural network; topic crawler; deep learning; LDA topic model
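
The fusion step described in the abstract, where LDA topic features are concatenated after the convolutional layer before classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`conv_max_pool`, `classify`), the filter width, the layer sizes, the random untrained weights, and the hand-written `topic_dist` standing in for real LDA output are all assumptions made for illustration.

```python
import math
import random


def conv_max_pool(embeddings, filters):
    """Slide each 1-D filter over the token sequence, apply ReLU, then max-pool over time.

    embeddings: list of token vectors (seq_len x dim)
    filters: list of filters, each a list of weight vectors (width x dim)
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            act = sum(w * x
                      for wv, xv in zip(filt, window)
                      for w, x in zip(wv, xv))
            best = max(best, act)
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings, topic_dist, filters, weights, biases):
    """Concatenate pooled CNN features with the topic vector, then apply a dense softmax layer."""
    features = conv_max_pool(embeddings, filters) + list(topic_dist)
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return features, softmax(scores)


# Toy input: all sizes and values below are illustrative assumptions.
rng = random.Random(0)
dim, seq_len, n_filters, n_topics, n_classes = 4, 6, 3, 5, 2
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
           for _ in range(n_filters)]
topic_dist = [0.6, 0.1, 0.1, 0.1, 0.1]  # stand-in for an LDA document-topic distribution
weights = [[rng.uniform(-1, 1) for _ in range(n_filters + n_topics)]
           for _ in range(n_classes)]
biases = [0.0] * n_classes

features, probs = classify(embeddings, topic_dist, filters, weights, biases)
print(len(features), probs)
```

In this sketch the classifier sees a feature vector whose first `n_filters` entries come from the convolution and whose remaining `n_topics` entries are the topic distribution, so the dense layer can weigh local n-gram evidence and document-level topic evidence together. In a real crawler the weights would be trained and `probs` would decide whether a fetched page is on topic.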
This article is indexed in Wanfang Data and other databases.
