Chinese Named Entity Recognition Based on Contextualized Char Embeddings
Original title: 基于上下文相关字向量的中文命名实体识别
Citation: ZHANG Dong, CHEN Wen-liang. Chinese Named Entity Recognition Based on Contextualized Char Embeddings[J]. Computer Science (计算机科学), 2021, 48(3): 233-238.
Authors: ZHANG Dong (张栋)  CHEN Wen-liang (陈文亮)
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Named entity recognition (NER) aims to identify proper nouns in text and classify them. Because training data for supervised learning are usually annotated by hand, which is time-consuming and labor-intensive, large-scale annotated data are hard to obtain. To address the data scarcity caused by the lack of large-scale annotated corpora in Chinese NER, as well as the polysemy of characters that conventional character embeddings cannot handle, this paper uses contextualized char embeddings pre-trained on large-scale unlabeled data: a language model generates context-dependent character embeddings that improve the performance of the Chinese NER model. In addition, to address out-of-vocabulary (OOV) words in NER, the paper proposes a Chinese NER system based on a character-level language model. The character embeddings learned by the language model serve as the input to the NER model, so that the same Chinese character receives different representations in different contexts. Experiments on six Chinese NER datasets show that contextualized char embeddings improve the NER model, raising the average F1 score by 4.95%. Further analysis shows that the new system also performs well on OOV entities and on some special types of Chinese entities.

Keywords: Named entity recognition  Language model  Contextualized char embeddings
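
The abstract reports results as F1 and as performance on OOV entities, but does not spell out how either is computed. The sketch below shows one common way to do it, and it is an illustration rather than the authors' evaluation code. It assumes BIO tagging, exact-match entity-level scoring, and a definition of "OOV entity" as a gold entity whose surface string never appears as an entity in the training data. The function names (`extract_entities`, `f1_score`, `oov_recall`) and the example sentence are hypothetical.

```python
from typing import List, Set, Tuple

# An entity is (start index, end index exclusive, type). Assumed representation.
Entity = Tuple[int, int, str]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    entities: Set[Entity] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        # An open span ends at "O", at any "B-", or at an "I-" of another type.
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != etype
        ):
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no open span is ignored (strict reading of BIO).
    return entities


def f1_score(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Entity-level F1: a prediction counts only if span and type both match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def oov_recall(
    chars: List[str], gold: Set[Entity], pred: Set[Entity], train_entities: Set[str]
) -> float:
    """Recall restricted to gold entities unseen in training (assumed OOV definition)."""
    oov_gold = {e for e in gold if "".join(chars[e[0]:e[1]]) not in train_entities}
    if not oov_gold:
        return 0.0
    return len(oov_gold & pred) / len(oov_gold)


# Hypothetical example: one sentence, one correct and one wrong prediction.
chars = list("小明在北京大学读书")
gold_tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]
pred_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]

gold = extract_entities(gold_tags)
pred = extract_entities(pred_tags)
train_entities = {"北京大学"}  # entity strings seen in (hypothetical) training data

sentence_f1 = f1_score(gold, pred)
sentence_oov_recall = oov_recall(chars, gold, pred, train_entities)
```

In this example the person name is predicted correctly and the organization is not, so precision and recall are both 1/2. The only gold entity unseen in training is the person name, which the prediction recovers. Scoring whole spans rather than individual characters is what makes a partially matched entity such as 北京 count as an error.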
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).