An Unsupervised Named Entity Recognition Method for the Low-resource Electric Power Domain
Citation: LIU Yin, ZHANG Kai, WANG Huijian, YANG Guanqun. An Unsupervised Named Entity Recognition Method for the Low-resource Electric Power Domain [J]. Journal of Chinese Information Processing, 2022, 36(6): 69-79.
Authors: LIU Yin, ZHANG Kai, WANG Huijian, YANG Guanqun
Affiliation: State Grid Shandong Electric Power Group Co., Ltd., Jinan, Shandong 250001, China
Funding: Science and Technology Project of State Grid Shandong Electric Power Company (2020A-013)
Abstract: This paper proposes an unsupervised method for named entity recognition in the electric power domain under low-resource conditions, using only unannotated documents. The method collects power-domain corpora and updates the power-domain lexicon with string-frequency statistics; it also parses entity words and their types out of structured power data and obtains a representative-word embedding for each entity type via representation learning. BERT whole word masking is then used to predict words in the text, and the semantic similarity between these words and the representative words of each entity type is computed to recognize named entities and decide their types. Experiments show that the method places low demands on data, is highly practical, and can easily be reused in other domains.

Keywords: named entity recognition  unsupervised method  electric power domain  BERT whole word masking
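
The typing step summarized in the abstract (comparing a candidate word with the representative words of each entity type) could look roughly like the sketch below. It is a minimal illustration, not the paper's code: the pre-computed word vectors, the averaging of representative embeddings, and the 0.5 threshold are all assumed for the example.

# Minimal sketch of entity-type assignment by embedding similarity (illustrative only).
# Assumes pre-computed word vectors; the averaging and the threshold are hypothetical choices.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def assign_type(candidate_vec, type_representatives, threshold=0.5):
    """type_representatives: {entity_type: list of representative-word vectors}."""
    best_type, best_score = None, threshold
    for etype, rep_vecs in type_representatives.items():
        # Compare the candidate with the averaged representative embedding of this type.
        score = cosine(candidate_vec, np.mean(rep_vecs, axis=0))
        if score > best_score:
            best_type, best_score = etype, score
    return best_type  # None means the word is not recognized as any known entity type
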

Unsupervised Low-resource Named Entity Recognition in the Electric Power Domain
LIU Yin, ZHANG Kai, WANG Huijian, YANG Guanqun. Unsupervised Low-resource Named Entity Recognition in the Electric Power Domain [J]. Journal of Chinese Information Processing, 2022, 36(6): 69-79.
Authors: LIU Yin, ZHANG Kai, WANG Huijian, YANG Guanqun
Affiliation: State Grid Shandong Electric Power Group Co., Ltd., Jinan, Shandong 250001, China
Abstract: This paper proposes an unsupervised method for low-resource named entity recognition in the electric power domain. We collect the target-domain corpus and use string-frequency statistics to update the domain vocabulary. We also obtain a small set of entity words with their types by parsing the structured electric power maintenance manuals, and the representative words for each entity type are selected according to word-embedding similarity. At the same time, we pre-train an electric power BERT model with the whole word masking technique, and predict the entity words in the text and their possible entity types by computing their semantic similarities with those representative words. Experiments show that our method is feasible under low-resource data conditions and can easily be reused in other domains.
Keywords: named entity recognition  unsupervised method  electric power domain  BERT whole word masking
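
The whole word masking prediction step described in the abstract could be sketched roughly as follows. The sketch assumes the public bert-base-chinese checkpoint and the Hugging Face transformers API as stand-ins for the power-domain BERT pre-trained in the paper; every character of the target word is masked and the top predictions at those positions are collected for the later similarity comparison with the representative words.

# Hedged sketch of whole word masking prediction with a generic Chinese BERT.
# The paper pre-trains its own electric-power BERT; "bert-base-chinese" is a stand-in here.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def predict_masked_word(sentence, word, top_k=5):
    # Whole word masking: replace every character of the target word with [MASK].
    mask_span = "".join([tokenizer.mask_token] * len(word))
    inputs = tokenizer(sentence.replace(word, mask_span, 1), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    # Top-k predicted characters at each masked position; these candidates can then be
    # compared with the entity-type representative words by embedding similarity.
    return [tokenizer.convert_ids_to_tokens(logits[pos].topk(top_k).indices.tolist())
            for pos in mask_positions]

# Illustrative call (the sentence and word are made up, not from the paper's data):
# predict_masked_word("变压器在变电站内正常运行", "变压器")
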