A Text Matching Method by Combining Pre-trained Model and Language Knowledge Base

Citation: ZHOU Yeheng, SHI Jiahan, XU Ruifeng. A Text Matching Method by Combining Pre-trained Model and Language Knowledge Base [J]. Journal of Chinese Information Processing, 2020, 34(2): 63-72.
Authors: ZHOU Yeheng, SHI Jiahan, XU Ruifeng
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
Funding: National Natural Science Foundation of China (U1636103, 61632011, 61876053); Shenzhen Foundational Research Projects (JCYJ20180507183527919, JCYJ20180507183608379); Shenzhen Key Technology Research Project (JSGG20170817140856618); Shenzhen Securities Information Joint Research Program; Innovative Research Course Program of Harbin Institute of Technology (Shenzhen)

Abstract: For the text matching task, this paper proposes a method that combines a large-scale pre-trained model with an external language knowledge base. Building on the pre-trained model, the method introduces external linguistic knowledge through two learning tasks generated from WordNet: a synonym-antonym lexical knowledge task and a phrase-collocation knowledge task. These tasks are then trained jointly with the MT-DNN multi-task learning model to further improve performance, and the model is finally fine-tuned on annotated text matching data. Experimental results on two public datasets, MRPC and QQP, show that introducing external language knowledge for joint training on top of the pre-training and fine-tuning framework effectively improves text matching performance.

Keywords: text matching; pre-trained model; language knowledge base fusion
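The abstract describes deriving an auxiliary synonym-antonym classification task from WordNet and training it jointly with MT-DNN. As a minimal sketch of what such generated training data could look like: the tiny `LEXICON` dictionary below is a hand-made stand-in for WordNet's synonym and antonym relations, and the function name and label scheme are assumptions for illustration, not details from the paper.

```python
# Sketch: turning lexical relations into labeled word pairs for an
# auxiliary classification task (1 = synonyms, 0 = antonyms).
# LEXICON is a hypothetical stand-in for WordNet synset/antonym lookups.
LEXICON = {
    "happy": ({"glad", "joyful"}, {"sad"}),
    "big":   ({"large", "huge"}, {"small"}),
}

def build_examples(lexicon):
    """Emit (word_a, word_b, label) triples from the relation lexicon."""
    examples = []
    for word, (synonyms, antonyms) in lexicon.items():
        for syn in sorted(synonyms):   # positive pairs from synonymy
            examples.append((word, syn, 1))
        for ant in sorted(antonyms):   # negative pairs from antonymy
            examples.append((word, ant, 0))
    return examples

print(build_examples(LEXICON))
```

In the paper's setup, pairs like these would form one of the auxiliary knowledge tasks trained jointly with MT-DNN's matching tasks before the final fine-tuning on MRPC/QQP-style labeled data.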
This paper is indexed in databases including Weipu (VIP).