Authorship identification of text based on attention mechanism
Cite this article: ZHANG Yang, JIANG Minghu. Authorship identification of text based on attention mechanism[J]. Journal of Computer Applications, 2021, 41(7): 1897-1901.
Authors: ZHANG Yang  JIANG Minghu
Affiliation: School of Humanities, Tsinghua University, Beijing 100084, China
Funding: National Natural Science Foundation of China (62036001).
Abstract: The accuracy of neural-network-based authorship identification drops sharply when the number of candidate authors is large. To improve accuracy, a neural network consisting of fast text classification (fastText) and an attention layer is proposed and combined with continuous Part-Of-Speech (POS) n-gram features to identify the authors of Chinese novels. In experiments against Text Convolutional Neural Network (TextCNN), Text Recurrent Neural Network (TextRNN), Long Short-Term Memory (LSTM) network and fastText, the proposed model obtains the highest classification accuracy. Compared with the fastText model, introducing the attention mechanism raises the accuracy for the different POS n-gram features by 2.14 percentage points on average. The model also retains the speed and efficiency of fastText, and the text features it uses can be applied to other languages.

Keywords: authorship identification  Part-Of-Speech (POS) n-gram  neural network  fast text classification (fastText)  attention mechanism
Received: 2020-10-08
Revised: 2020-12-15
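
The abstract names two ingredients, continuous POS n-gram features and an attention layer on top of fastText, without giving architectural details. The following is a minimal sketch of the general idea only, not the authors' model: fastText-style classifiers average feature embeddings uniformly, and an attention layer can replace that uniform average with a softmax-weighted one. The tag names, the toy 2-dimensional embeddings, the dot-product scoring and the fixed query vector are all assumptions made for illustration; in a trained model the embeddings and query would be learned.

```python
import math
from typing import List, Sequence, Tuple


def pos_ngrams(tags: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams over a sequence of POS tags."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Weighted average of feature vectors.

    Weights are softmax(dot(vector, query)). With an all-zero query every
    weight is equal, so the result is the plain mean used by uniform
    averaging. Dot-product scoring is an assumption of this sketch.
    """
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


# Hypothetical POS tag sequence for one sentence (tag names are illustrative).
tags = ["n", "v", "n", "u", "n"]
bigrams = pos_ngrams(tags, 2)

# Toy embeddings chosen by hand; a real model would learn these.
embedding = {
    ("n", "v"): [1.0, 0.0],
    ("v", "n"): [0.0, 1.0],
    ("n", "u"): [1.0, 1.0],
    ("u", "n"): [0.0, 0.0],
}
vectors = [embedding[g] for g in bigrams]

mean_doc = attention_pool(vectors, [0.0, 0.0])  # uniform weights: plain mean
attn_doc = attention_pool(vectors, [2.0, 0.0])  # favours the first dimension
```

Here `mean_doc` corresponds to uniform averaging of the bigram vectors, while `attn_doc` gives more weight to bigrams whose vectors align with the query. The pooled vector would then feed a linear classifier over candidate authors, which this sketch omits.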
This article is indexed in Wanfang Data and other databases.