An intellectual property entity recognition method based on Transformer and technological word information

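Cite this article:	WANG Yuhui^1,2, DU Junping^1,2, SHAO Yingxia^1,2. An intellectual property entity recognition method based on Transformer and technological word information[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 186-193. DOI: 10.11992/tis.202203036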

Authors:	WANG Yuhui^1,2, DU Junping^1,2, SHAO Yingxia^1,2

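Affiliations:	1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China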

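Abstract:	Patent texts contain a large amount of entity information. Named entity recognition can extract intellectual property entities that carry key information, helping researchers understand patent content more quickly. Existing named entity extraction methods struggle to make full use of the word-level semantic information that arises from variation in specialized vocabulary. This paper proposes an intellectual property entity extraction method based on Transformer and technical word information. It uses the BERT language model to provide accurate character vector representations and, during character vector generation, adds technical word information extracted from the character vectors by an iterated dilated convolutional network, which improves the representation of intellectual property entities. Finally, a Transformer encoder with relative position encoding learns deep semantic information from the character vector sequence and predicts the entity labels. Experimental results on a public dataset and an annotated patent dataset show that the method improves the accuracy of entity recognition.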
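Keywords:	Chinese named entity recognition; intellectual property; Transformer encoder; information fusion; vector representation; science and technology big data; patent; deep learning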
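
The abstract mentions technical word information extracted from character vectors by an iterated dilated convolutional network. As a rough illustration only, the sketch below shows in plain Python how repeatedly applying a small stack of dilated convolutions widens the context each character sees. The function names, the dilation schedule (1, 1, 2), the single scalar kernel shared by all feature dimensions, and the concatenation in `fuse` are all assumptions made for illustration; the abstract does not specify the authors' layer sizes or fusion step.

```python
from typing import List, Sequence

Vector = List[float]


def dilated_conv1d(seq: Sequence[Vector], kernel: Sequence[float], dilation: int) -> List[Vector]:
    """Dilated 1-D convolution with zero padding, followed by ReLU.

    Assumption: one odd-length scalar kernel is shared by every feature
    dimension (a simplification of a real convolution layer).
    """
    half = len(kernel) // 2
    dim = len(seq[0])
    out: List[Vector] = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for k, weight in enumerate(kernel):
            src = t + (k - half) * dilation  # neighbours are `dilation` steps apart
            if 0 <= src < len(seq):          # positions outside the text count as zeros
                for d in range(dim):
                    acc[d] += weight * seq[src][d]
        out.append([max(0.0, v) for v in acc])  # ReLU
    return out


def iterated_dilated_block(seq: Sequence[Vector], kernel: Sequence[float],
                           dilations: Sequence[int] = (1, 1, 2),
                           iterations: int = 2) -> List[Vector]:
    """Apply the same stack of dilated convolutions `iterations` times."""
    hidden = [list(v) for v in seq]
    for _ in range(iterations):
        for dilation in dilations:
            hidden = dilated_conv1d(hidden, kernel, dilation)
    return hidden


def receptive_field(kernel_size: int, dilations: Sequence[int], iterations: int) -> int:
    """Number of input characters that can influence one output position."""
    return 1 + iterations * sum((kernel_size - 1) * d for d in dilations)


def fuse(char_vecs: Sequence[Vector], word_feats: Sequence[Vector]) -> List[Vector]:
    """Hypothetical fusion step: concatenate both feature vectors per character."""
    return [list(c) + list(w) for c, w in zip(char_vecs, word_feats)]
```

With a width-3 kernel, dilations (1, 1, 2), and two iterations, `receptive_field(3, (1, 1, 2), 2)` reports that each output position covers 17 characters, which is how a shallow stack can span multi-character technical terms.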
An intellectual property entity recognition method based on Transformer and technological word information

WANG Yuhui^1,2, DU Junping^1,2, SHAO Yingxia^1,2. An intellectual property entity recognition method based on Transformer and technological word information[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 186-193. DOI: 10.11992/tis.202203036

Authors:	WANG Yuhui^1,2, DU Junping^1,2, SHAO Yingxia^1,2

Affiliations:	1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract:	Patent text contains abundant entity information. Named entity recognition can extract the intellectual property (IP) entities that carry key information, which helps researchers understand patent content faster. Existing named entity extraction methods have difficulty making full use of the word-level semantic information introduced by variation in technical terms. This paper proposes an IP entity extraction method based on Transformer and technical word information. The method obtains accurate character vector representations from the BERT language model and, while generating the character vectors, adds technical word information extracted from them by an iterated dilated convolutional neural network, improving the representation of IP entities. Finally, a Transformer encoder with relative position encoding learns the deep semantic information of the text from the character vector sequence and predicts the entity labels. Experimental results on public and annotated patent datasets show that the method improves entity recognition accuracy.

Keywords:	Chinese named entity recognition; intellectual property; Transformer encoder; information fusion; vector representation; science and technology big data; patent; deep learning
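
The abstract refers to a Transformer encoder with relative position encoding but does not give its formulation. The sketch below is a simplified stand-in, not the authors' method: it adds a scalar bias, looked up by the clipped signed offset between two characters, to ordinary scaled dot-product attention scores. The function `relative_attention`, the `rel_bias` table, and the clipping distance `max_dist` are hypothetical names chosen for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_attention(queries: Sequence[Vector], keys: Sequence[Vector],
                       values: Sequence[Vector], rel_bias: Dict[int, float],
                       max_dist: int) -> Tuple[List[Vector], List[List[float]]]:
    """Single-head attention whose scores include a relative-position bias.

    rel_bias maps a signed offset (j - i), clipped to [-max_dist, max_dist],
    to a scalar, so left and right context can be weighted differently.
    """
    scale = math.sqrt(len(queries[0]))
    outputs: List[Vector] = []
    all_weights: List[List[float]] = []
    for i, query in enumerate(queries):
        scores = []
        for j, key in enumerate(keys):
            offset = max(-max_dist, min(max_dist, j - i))
            content = sum(a * b for a, b in zip(query, key)) / scale
            scores.append(content + rel_bias[offset])
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the value vectors for position i.
        outputs.append([
            sum(weights[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights
```

Because the bias depends on the sign of the offset as well as its size, a character can attend differently to what precedes it and what follows it, which absolute position embeddings do not express directly.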
