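Chinese Text Classification Method Based on the ERNIE-BiGRU Model
Cite this article: LEI Jingsheng, QIAN Ye. Chinese Text Classification Method Based on the ERNIE-BiGRU Model[J]. Journal of Shanghai University of Electric Power, 2020, 36(4): 329-335, 350.
Authors: LEI Jingsheng  QIAN Ye
Affiliation: School of Computer Science and Technology, Shanghai University of Electric Power
Abstract: Word-vector representations used in news text classification do not adequately preserve a character's contextual information within its sentence, nor its polysemy. To address this, the knowledge-enhanced semantic representation (ERNIE) pre-trained model is used to compute each character's vector from its context. The resulting vector retains the character's contextual information and is adjusted according to the character's polysemy, which strengthens its semantic representation. A bidirectional gated recurrent unit (BiGRU) is added after the ERNIE model, and the trained vectors are fed into the BiGRU for training to obtain the text classification result. Experiments show that on THUCNews, a public dataset of Sina News, the model achieves a precision of 94.32%, a recall of 94.12%, and an F1 score of 0.9422, indicating good performance on Chinese text classification tasks.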

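Keywords: text classification  enhanced representation through knowledge integration (ERNIE)  bidirectional gated recurrent unit (BiGRU)  pre-trained model  knowledge integration
Received: 2020/2/24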
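
The architecture outlined in the abstract (contextual character vectors from ERNIE, passed through a BiGRU, then classified) might be sketched in PyTorch as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the ERNIE encoder is replaced by a random tensor standing in for its output, and the class name `BiGRUClassifier`, the vector size of 768, the GRU hidden size of 256, and the 10 output classes are all hypothetical choices that the abstract does not specify.

```python
import torch
import torch.nn as nn


class BiGRUClassifier(nn.Module):
    """BiGRU head over contextual character vectors (hypothetical sizes)."""

    def __init__(self, input_size=768, gru_hidden=256, num_classes=10):
        super().__init__()
        # Bidirectional GRU reads the sequence left-to-right and right-to-left.
        self.bigru = nn.GRU(
            input_size, gru_hidden, batch_first=True, bidirectional=True
        )
        # The two final hidden states are concatenated, hence 2 * gru_hidden.
        self.classifier = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, char_vectors):
        # char_vectors: (batch, seq_len, input_size), e.g. an encoder's output.
        _, h_n = self.bigru(char_vectors)  # h_n: (2, batch, gru_hidden)
        # h_n[0] is the forward direction's final state, h_n[1] the backward's.
        sentence_vector = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(sentence_vector)  # unnormalized class scores


# Stand-in for ERNIE output: 4 texts, 32 characters each, 768-dim vectors.
# A real pipeline would obtain these from a pre-trained ERNIE encoder.
fake_ernie_output = torch.randn(4, 32, 768)
model = BiGRUClassifier()
logits = model(fake_ernie_output)
print(tuple(logits.shape))  # (4, 10)
```

Using the concatenated final hidden states as the sentence vector is one common design choice; pooling over all time steps would be another, and the abstract does not say which the authors used.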

Chinese-text Classification Method Based on ERNIE-BiGRU
LEI Jingsheng, QIAN Ye. Chinese-text Classification Method Based on ERNIE-BiGRU[J]. Journal of Shanghai University of Electric Power, 2020, 36(4): 329-335, 350.
Authors: LEI Jingsheng  QIAN Ye
Affiliation: School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200082, China
Abstract: In news text classification, word-vector representations do not adequately preserve a character's information within its sentence or its ambiguity. Using the ERNIE pre-trained model, the vector of each character is computed from its context. The vector retains the character's contextual information and is adjusted according to the character's ambiguity, which enhances its semantic representation. A BiGRU layer is added after the ERNIE model, and the trained vectors are used as the input of the BiGRU for training to obtain the text classification result. Experiments show that on THUCNews, a public dataset of Sina News, the model achieves a precision of 94.32%, a recall of 94.12%, and an F1 value of 0.9422, indicating good performance on Chinese text classification tasks.
Keywords: text classification  enhanced representation through knowledge integration  bidirectional gated recurrent unit  pre-trained model  knowledge integration
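
As a quick consistency check on the reported figures, the F1 value can be recomputed as the harmonic mean of the reported precision (94.32%) and recall (94.12%). The sketch below assumes the standard definition F1 = 2PR / (P + R); the abstract does not say how the scores were averaged across classes, so agreement here is only a sanity check, and the function name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.9432
reported_recall = 0.9412

f1 = f1_from(reported_precision, reported_recall)
print(round(f1, 4))  # 0.9422
```

The recomputed value matches the reported F1 of 0.9422 to four decimal places.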
This article is indexed in CNKI and other databases.