
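Recognition model for French named entities based on deep neural network
Cite this article: YAN Hong, CHEN Xingshu, WANG Wenxian, WANG Haizhou, YIN Mingyong. Recognition model for French named entities based on deep neural network[J]. Journal of Computer Applications, 2019, 39(5): 1288-1292.
Authors: YAN Hong  CHEN Xingshu  WANG Wenxian  WANG Haizhou  YIN Mingyong
Affiliations: College of Computer Science, Sichuan University, Chengdu 610065; College of Cybersecurity, Sichuan University, Chengdu 610065; Cybersecurity Research Institute, Sichuan University, Chengdu 610065
Funding: National Natural Science Foundation of China (61802270); International R&D and Transformation Platform for Transformative Technologies project of the National "Mass Entrepreneurship and Innovation" Demonstration Base (C700011); Sichuan Province Key Research and Development Project (2018G20100).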
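Abstract: In existing research on French Named Entity Recognition (NER), machine learning models mostly use the character morphological features of words, while multilingual general-purpose named entity models use the semantic features represented by character and word embeddings; neither considers semantic, character morphological and grammatical features together. To address this shortcoming, CGC-fr, a French named entity recognition model based on a deep neural network, was designed. First, the word embedding, character embeddings and grammar feature vector of each word are extracted from the text. Then, a Convolutional Neural Network (CNN) extracts each word's character feature from its character embedding sequence. Finally, a Bi-directional Gated Recurrent Unit network (BiGRU) and a Conditional Random Field (CRF) classifier identify the named entities in French text from the word embeddings, character features and grammar feature vectors. In the experiments, CGC-fr reached an F1 value of 82.16% on the test set, an improvement of 5.67, 1.79 and 1.06 percentage points over the machine learning model NERC-fr and the multilingual general-purpose neural network models LSTM-CRF and Char attention, respectively. The experimental results show that the CGC-fr model, which fuses the three kinds of features, has an advantage over the other models.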

Keywords: named entity recognition; French; deep neural network; natural language processing; sequence labeling
Received: 2018-10-26
Revised: 2018-12-26
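
The abstract reports only the F1 value of CGC-fr and its margins over the three baselines. As a small worked check, the baseline F1 values implied by those margins can be computed as below. These baseline figures are derived here by subtraction and are not stated in the abstract itself; the function and constant names are illustrative.

```python
from decimal import Decimal

# F1 of CGC-fr on the test set, in percent, as reported in the abstract.
CGC_FR_F1 = Decimal("82.16")

# Improvement of CGC-fr over each baseline, in percentage points (from the abstract).
MARGINS = {
    "NERC-fr": Decimal("5.67"),
    "LSTM-CRF": Decimal("1.79"),
    "Char attention": Decimal("1.06"),
}


def implied_baseline_f1(cgc_f1, margins):
    """Return the F1 each baseline must have for the stated margins to hold."""
    return {name: cgc_f1 - gain for name, gain in margins.items()}


if __name__ == "__main__":
    for name, f1 in implied_baseline_f1(CGC_FR_F1, MARGINS).items():
        print(f"{name}: {f1}%")
```

Under this reading, the implied baseline scores are 76.49% for NERC-fr, 80.37% for LSTM-CRF and 81.10% for Char attention.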

Recognition model for French named entities based on deep neural network
YAN Hong, CHEN Xingshu, WANG Wenxian, WANG Haizhou, YIN Mingyong. Recognition model for French named entities based on deep neural network[J]. Journal of Computer Applications, 2019, 39(5): 1288-1292.
Authors: YAN Hong  CHEN Xingshu  WANG Wenxian  WANG Haizhou  YIN Mingyong
Affiliation: 1. College of Computer Science, Sichuan University, Chengdu Sichuan 610065, China; 2. College of Cybersecurity, Sichuan University, Chengdu Sichuan 610065, China; 3. Cybersecurity Research Institute, Sichuan University, Chengdu Sichuan 610065, China
Abstract: In existing research on French Named Entity Recognition (NER), machine learning models mostly use the character morphological features of words, while multilingual generic named entity models use the semantic features represented by word embeddings; neither takes semantic, character morphological and grammatical features into account together. To address this shortcoming, a deep neural network based model, CGC-fr, was designed to recognize French named entities. First, the word embedding, character embeddings and grammar feature vector of each word were extracted from the text. Then, a Convolutional Neural Network (CNN) was used to extract each word's character feature from its character embedding sequence. Finally, a Bi-directional Gated Recurrent Unit network (BiGRU) and a Conditional Random Field (CRF) were used to label the named entities in French text according to the word embeddings, character features and grammar feature vectors. In the experiments, the F1 value of the CGC-fr model reached 82.16% on the test set, which is 5.67, 1.79 and 1.06 percentage points higher than that of the NERC-fr, LSTM (Long Short-Term Memory network)-CRF and Char attention models, respectively. The experimental results show that the CGC-fr model, which combines the three kinds of features, has an advantage over the other models.
Keywords: Named Entity Recognition (NER); French; neural network; Natural Language Processing (NLP); sequence labeling
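
The feature-fusion step described in the abstract can be illustrated with a minimal pure-Python sketch: a 1-D convolution with max-over-time pooling turns a word's character embeddings into a fixed-size character feature, which is then concatenated with the word embedding and the grammar feature vector. The abstract does not give filter widths, dimensions, activation or padding, so the window width of 3, the ReLU, the zero padding and all function names here are assumptions for illustration, not the authors' implementation. The BiGRU and CRF layers that consume this representation are omitted.

```python
import random


def char_cnn_feature(char_vectors, filters, width=3):
    """1-D convolution over character embeddings, then max-over-time pooling.

    char_vectors: non-empty list of per-character embedding vectors (length d each).
    filters: list of filters, each a flat weight list of length width * d.
    Returns one pooled value per filter.
    """
    d = len(char_vectors[0])
    # Zero-pad both ends so words shorter than the window still yield a position
    # (padding scheme is an assumption).
    pad = [[0.0] * d for _ in range(width - 1)]
    padded = pad + list(char_vectors) + pad
    feature = []
    for weights in filters:
        best = float("-inf")
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window))
            best = max(best, max(0.0, activation))  # ReLU, then max pooling
        feature.append(best)
    return feature


def token_representation(word_vec, char_vectors, grammar_vec, filters, width=3):
    """Concatenate word embedding, CNN character feature and grammar feature vector."""
    char_feature = char_cnn_feature(char_vectors, filters, width)
    return list(word_vec) + char_feature + list(grammar_vec)


if __name__ == "__main__":
    rng = random.Random(0)
    char_dim, word_dim, n_filters, width = 4, 5, 3, 3
    # Random weights stand in for trained parameters.
    filters = [[rng.uniform(-1, 1) for _ in range(width * char_dim)]
               for _ in range(n_filters)]
    word = "Paris"
    char_vectors = [[rng.uniform(-1, 1) for _ in range(char_dim)] for _ in word]
    word_vec = [rng.uniform(-1, 1) for _ in range(word_dim)]
    grammar_vec = [1.0, 0.0]  # hypothetical one-hot grammar feature
    rep = token_representation(word_vec, char_vectors, grammar_vec, filters, width)
    print(len(rep))  # word_dim + n_filters + len(grammar_vec)
```

In a full model along the lines of the abstract, one such vector per token would form the input sequence to the BiGRU, whose outputs the CRF would decode into entity labels.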
This article is indexed by VIP, Wanfang Data and other databases.