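Text Categorization Combining Latent Topic Information and Convolutional Semantic Features (融合潜在主题信息和卷积语义特征的文本分类)
Cite this article: Chen Peixin, Guo Wu. Text Categorization Combining Latent Topic Information and Convolutional Semantic Features[J]. Journal of Signal Processing (信号处理), 2017, 33(8): 1090-1096.
Authors: Chen Peixin (陈培新), Guo Wu (郭武)
Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China
Funding: National Key Research and Development Program of China (2016YFB1001300)
Abstract: Classical probabilistic topic models mine the latent topic information of a text from word co-occurrences and are widely used in text clustering and categorization. In recent years, following the success of word embeddings and various neural network models in natural language processing, neural-network-based text categorization has become the mainstream of research. This paper compares a Convolutional Neural Network (CNN) with probabilistic topic models on topic categorization of text and shows that the CNN performs better on this task. Building on this result, the paper uses the CNN to extract a feature vector for each text and names it the convolutional semantic feature. To better characterize the topic information of a text, the latent topic distribution of the text is added to the convolutional semantic feature, yielding a more effective text feature representation. Experimental results show that, compared with a probabilistic topic model or a CNN model used alone, the new feature representation significantly improves the F1 score on the topic categorization task.

Keywords: probabilistic topic model; word embedding; convolutional neural network; text categorization
Received: 2017-01-10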

Text Categorization Combining Latent Topic Information and Convolutional Semantic Features
Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China
Abstract: Classical probabilistic topic models discover the latent topic information of documents from word co-occurrences and are therefore widely used in text clustering and categorization tasks. In the last few years, with the successful application of word embeddings and neural networks, text categorization based on neural networks has become the research mainstream. This paper demonstrates the superiority of neural networks in text categorization by comparing Convolutional Neural Networks (CNN) with probabilistic topic models. On this basis, the paper extracts a document feature vector with a CNN and names it the Convolutional Semantic Feature. To describe the topic information of documents better, the paper proposes a new feature that combines the Convolutional Semantic Feature with latent topic information. The experimental results show that the new feature is superior to a probabilistic topic model or a CNN model used alone and significantly improves the F1 score on topic categorization tasks.
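
As a rough illustration of the feature fusion described in the abstract, the sketch below builds a small "convolutional semantic feature" (ReLU filter responses over sliding word windows, followed by max-over-time pooling) and joins it with a latent topic distribution. It is not the authors' implementation. The abstract does not state how the two features are combined, so plain concatenation is an assumption here. The function names `conv_semantic_feature` and `fuse` are hypothetical, the filter weights are random rather than trained, and the topic probabilities are made-up values standing in for the output of a probabilistic topic model.

```python
import random

def conv_semantic_feature(embeddings, filters, width):
    """Max-over-time pooled ReLU response of each filter over sliding word windows."""
    n = len(embeddings)
    feature = []
    for filt in filters:
        best = 0.0  # ReLU floor: negative responses are clipped to zero
        for start in range(n - width + 1):
            # Flatten `width` consecutive word vectors into one window vector.
            window = [x for vec in embeddings[start:start + width] for x in vec]
            response = sum(w * x for w, x in zip(filt, window))
            best = max(best, response)
        feature.append(best)
    return feature

def fuse(conv_feature, topic_dist):
    """Concatenate the two views into one document vector (assumed fusion rule)."""
    return list(conv_feature) + list(topic_dist)

rng = random.Random(0)
dim, width, n_filters = 4, 2, 3
# Toy document: 6 words, each a random 4-dimensional embedding.
doc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
# Untrained filters; a real model would learn these weights.
filters = [[rng.uniform(-1, 1) for _ in range(dim * width)] for _ in range(n_filters)]
# Placeholder topic distribution over 3 topics (illustrative values only).
topic_dist = [0.7, 0.2, 0.1]

fused = fuse(conv_semantic_feature(doc, filters, width), topic_dist)
print(len(fused))  # 6 = 3 filter responses + 3 topic probabilities
```

The fused vector would then be passed to a classifier. Concatenation keeps the two sources separable, so a downstream classifier can weight topic evidence and convolutional evidence independently.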