
Chinese Named Entity Recognition Method Based on ALBERT-BGRU-CRF
Cite this article: LI Junhuai, CHEN Miaomiao, WANG Huaijun, CUI Ying'an, ZHANG Aihua. Chinese Named Entity Recognition Method Based on ALBERT-BGRU-CRF[J]. Computer Engineering, 2022, 48(6): 89-94+106.
Authors: LI Junhuai  CHEN Miaomiao  WANG Huaijun  CUI Ying'an  ZHANG Aihua
Affiliation: 1. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048, China; 2. Sapa Chalco Aluminium Products (Chongqing) Co., Ltd., Chongqing 401326, China
Funding: National Key Research and Development Program of China (2018YFB1703000)
Received: 2021-05-12
Revised: 2021-07-26

Abstract: Named Entity Recognition (NER) is an important foundation for upper-level natural language processing tasks such as knowledge graph construction, search engines, and recommendation systems. Chinese NER labels and classifies the proper nouns or specific named entities in a text sequence. Existing Chinese NER methods cannot effectively extract long-distance semantic information or handle polysemy. To address these problems, this study proposes a Chinese NER method based on a model that combines the ALBERT pre-trained language model, a Bidirectional Gated Recurrent Unit (BGRU), and a Conditional Random Field (CRF), called the ALBERT-BGRU-CRF model. First, the ALBERT pre-trained language model performs word embedding on the input text to obtain dynamic word vectors, which effectively resolves the polysemy problem. Second, the BGRU extracts contextual semantic features to further understand the semantics and to capture semantic features between distant words. Finally, the concatenated vector is fed into the CRF layer and decoded with the Viterbi algorithm, which reduces the probability of outputting incorrect labels. This yields the entity annotations and completes the Chinese NER task. The experimental results show that the accuracy and recall of the ALBERT-BGRU-CRF model for Chinese NER on the MSRA corpus reach 95.16% and 94.58%, respectively. Its F1 value is also 4.43 and 3.78 percentage points higher than those of the fragment neural network model and the CNN-BiLSTM-CRF model, respectively.

Keywords: Named Entity Recognition (NER)  pre-trained language model  Bidirectional Gated Recurrent Unit (BGRU)  Conditional Random Field (CRF)  word vector  deep learning
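
The abstract states that the CRF layer is decoded with the Viterbi algorithm to reduce the probability of incorrect label output. The following is a minimal sketch of that decoding step only, not the authors' implementation. The label set, the emission scores (standing in for BGRU outputs), and the transition scores (standing in for learned CRF parameters) are all hypothetical values chosen for illustration.

```python
from typing import List

# Hypothetical BIO label set for a single entity type (assumption, not from the paper).
LABELS = ["O", "B-PER", "I-PER"]

# Hypothetical per-character label scores, as a BGRU layer might produce.
EMISSIONS = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.2],
    [0.1, 0.3, 2.0],
]

# Hypothetical CRF transition scores: TRANSITIONS[i][j] scores label i -> label j.
# The O -> I-PER move is heavily penalized because it is invalid under BIO tagging.
TRANSITIONS = [
    [0.0, 0.0, -10000.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]


def greedy_decode(emissions: List[List[float]]) -> List[int]:
    """Pick the best label at each position independently (no CRF)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the label index sequence with the highest total score."""
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # score[j]: best score of any path that ends in label j at the current position.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for j in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print([LABELS[i] for i in greedy_decode(EMISSIONS)])
    print([LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)])
```

With these toy scores, position-wise argmax yields O, I-PER, I-PER, which is not a valid BIO sequence, whereas Viterbi decoding with the transition penalty yields O, B-PER, I-PER. This is the sense in which sequence-level decoding can suppress label sequences that are locally attractive but globally invalid.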
This article is indexed by Wanfang Data and other databases.