Chinese Named Entity Recognition Based on Contextualized Char Embeddings


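Cite this article:	ZHANG Dong, CHEN Wen-liang. Chinese Named Entity Recognition Based on Contextualized Char Embeddings[J]. Computer Science, 2021, 48(3): 233-238.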

Authors:	ZHANG Dong, CHEN Wen-liang

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006 (both authors)

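Abstract:	Named entity recognition (NER) aims to identify proper nouns in text and classify them. Because training data for supervised learning is usually annotated by hand, which is time-consuming and labor-intensive, large-scale annotated data is hard to obtain. To address the data scarcity caused by the lack of large-scale annotated corpora in Chinese NER, as well as the single-character polysemy that traditional character embeddings cannot handle, this paper uses contextualized character embeddings pre-trained on large-scale unlabeled data; that is, a language model generates context-dependent character embeddings to improve the performance of a Chinese NER model. In addition, to address the out-of-vocabulary (OOV) word problem in NER, the paper proposes a Chinese NER system based on a character-level language model. The character embeddings learned by the language model serve as the input to the NER model, so that the same Chinese character has different representations in different contexts. Experiments were conducted on six Chinese NER datasets. The results show that contextualized character embeddings substantially improve the NER model, raising the average F1 score by 4.95%. Further analysis shows that the new system also performs well on OOV entities and on some special types of Chinese entities.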
Keywords:	named entity recognition; language model; contextualized character embeddings
Chinese Named Entity Recognition Based on Contextualized Char Embeddings

ZHANG Dong, CHEN Wen-liang. Chinese Named Entity Recognition Based on Contextualized Char Embeddings[J]. Computer Science, 2021, 48(3): 233-238.

Authors:	ZHANG Dong, CHEN Wen-liang

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Abstract:	Named entity recognition (NER) aims to identify and classify proper nouns in text. Training data for supervised learning is usually annotated manually, which is time-consuming and labor-intensive, so large-scale annotated data is difficult to obtain. To address the data scarcity caused by the lack of large-scale annotated corpora, and the polysemy problem that traditional char embeddings cannot handle in the Chinese NER task, this paper uses contextualized char embeddings pre-trained on large-scale unlabeled data to improve the performance of the Chinese NER model. Furthermore, to address the problem of out-of-vocabulary (OOV) words in NER, this paper proposes a Chinese NER system based on a character-level language model. The contextualized char embeddings generated by the language model are used as the input of the NER model, so that the same Chinese character has different representations in different contexts. Experiments are conducted on six Chinese NER datasets. The results show that the proposed model improves performance, raising the average F1 by 4.95%. Further analysis shows that the proposed model also performs well on OOV entities and on some special types of Chinese entities.

Keywords:	Named entity recognition; Language model; Contextualized char vector
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
