
An intellectual property entity recognition method based on Transformer and technological word information
Citation: WANG Yuhui, DU Junping, SHAO Yingxia. An intellectual property entity recognition method based on Transformer and technological word information[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2023, 18(1): 186-193. DOI: 10.11992/tis.202203036
Authors: WANG Yuhui (1, 2), DU Junping (1, 2), SHAO Yingxia (1, 2)
Affiliation: 1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: Patent text contains a large amount of entity information. Named entity recognition (NER) can extract the intellectual property (IP) entities that carry its key information, helping researchers understand patent content more quickly. Existing named entity extraction methods struggle to make full use of the word-level semantic information carried by variation in technical vocabulary. This paper proposes an IP entity extraction method based on Transformer and technical word information. The method uses the BERT language model to produce precise character vector representations. During character vector generation, it adds technical word information that an iterated dilated convolutional neural network (ID-CNN) extracts from those character vectors, which improves the representation of IP entities. Finally, a Transformer encoder with relative position encoding learns deep semantic information from the character vector sequence and predicts the entity labels. Experimental results on public and annotated patent datasets show that the method improves entity recognition accuracy.

Keywords: Chinese named entity recognition; intellectual property; Transformer encoder; information fusion; vector representation; science and technology big data; patent; deep learning
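The abstract names three stages: character vectors, technical word features extracted from them by an iterated dilated CNN and fused back in, and a Transformer encoder with relative position encoding that predicts labels. It does not give layer sizes, dilation rates, the fusion operator, the exact relative position scheme, or the decoding layer. The pure-Python sketch below fills those gaps with assumptions and is not the authors' implementation. Random vectors stand in for BERT outputs. Fusion is assumed to be element-wise addition. The attention uses Shaw et al.-style learned relative position embeddings added to the keys, which may differ from the paper's encoding. Labels come from a per-token linear layer plus argmax, where a CRF could be used instead. All function names (`id_cnn`, `rel_self_attention`, `predict_labels`, etc.) are hypothetical.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dilated_conv(seq: list[Vector], weights: list[Matrix], bias: Vector, dilation: int) -> list[Vector]:
    """Width-3 dilated 1D convolution with zero padding; output length equals input length."""
    n = len(seq)
    out = []
    for t in range(n):
        acc = list(bias)
        for offset, w in zip((-1, 0, 1), weights):
            s = t + offset * dilation
            if 0 <= s < n:  # positions outside the sequence act as zero padding
                acc = [a + b for a, b in zip(acc, matvec(w, seq[s]))]
        out.append(relu(acc))
    return out


def id_cnn(seq: list[Vector], block, iterations: int) -> list[Vector]:
    """Iterated dilated CNN: the same block (shared weights) is applied `iterations` times."""
    h = seq
    for _ in range(iterations):
        for weights, bias, dilation in block:
            h = dilated_conv(h, weights, bias, dilation)
    return h


def rel_self_attention(seq, wq, wk, wv, rel_emb, max_dist):
    """Single-head self-attention; a learned embedding for the clipped offset j - i is added to each key."""
    q = [matvec(wq, x) for x in seq]
    k = [matvec(wk, x) for x in seq]
    v = [matvec(wv, x) for x in seq]
    d = len(q[0])
    out = []
    for i in range(len(seq)):
        scores = []
        for j in range(len(seq)):
            r = rel_emb[max(-max_dist, min(max_dist, j - i)) + max_dist]
            scores.append(sum(qi * (kj + rj) for qi, kj, rj in zip(q[i], k[j], r)) / math.sqrt(d))
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * vj[c] for e, vj in zip(exps, v)) for c in range(d)])
    return out


def predict_labels(seq: list[Vector], w_out: Matrix) -> list[int]:
    """Per-token linear scores followed by argmax (assumed decoder; a CRF could replace it)."""
    labels = []
    for x in seq:
        scores = matvec(w_out, x)
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


rng = random.Random(0)
d, n, num_labels = 4, 6, 3
chars = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]  # stand-in for BERT character vectors
block = [([rand_matrix(d, d, rng) for _ in range(3)], [0.0] * d, dil) for dil in (1, 1, 2)]
words = id_cnn(chars, block, iterations=2)  # technical word features
fused = [[c + w for c, w in zip(cv, wv)] for cv, wv in zip(chars, words)]  # assumed additive fusion
max_dist = 3
rel_emb = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(2 * max_dist + 1)]
enc = rel_self_attention(fused, rand_matrix(d, d, rng), rand_matrix(d, d, rng),
                         rand_matrix(d, d, rng), rel_emb, max_dist)
labels = predict_labels(enc, rand_matrix(num_labels, d, rng))
```

Reusing one block with shared weights is what makes the convolution "iterated": each pass widens the receptive field without adding parameters. With width-3 kernels and the illustrative dilations (1, 1, 2), one pass covers 4 characters on either side of a position and two passes cover 8, enough to span a multi-character technical term.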