Data Augmentation in Chinese Open-domain Question Answering
Citation: DU Jiaju, YE Deming, SUN Maosong. Data Augmentation in Chinese Open-domain Question Answering[J]. Journal of Chinese Information Processing, 2022, 36(11): 121-130
Authors: DU Jiaju, YE Deming, SUN Maosong
Affiliation: 1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 2. Institute of Artificial Intelligence, Tsinghua University, Beijing 100084, China; 3. State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China
Funding: National Key Research and Development Program of China (2020AAA0106500)

Abstract: Open-domain question answering (OpenQA) is an important task in natural language processing. Current OpenQA models tend to perform only shallow text matching between questions and passages and often fail even on simple questions. Part of the reason for these errors is that reading comprehension datasets lack some patterns that are common in real-world scenarios. This paper proposes several data augmentation methods that improve the robustness of OpenQA models and effectively reduce the impact of these common patterns. In addition, we construct and publicly release a new OpenQA dataset for evaluating how models actually perform in real-world settings. Experimental results show that the proposed methods improve performance in practical scenarios.

Keywords: open-domain question answering; robustness; data augmentation
Received: 2020-12-31
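
The abstract does not spell out the concrete augmentation strategies, so the following Python sketch only illustrates, under stated assumptions, the general idea it motivates: pairing a question with a passage that shares surface wording but contains no answer, and labelling that pair unanswerable so a reader model cannot rely on shallow question-passage matching alone. All names here (QAExample, augment_with_distractors, the overlap threshold) are hypothetical and not taken from the paper.

```python
# A minimal sketch, assuming one generic augmentation idea (lexically similar but
# unanswerable distractor passages); the paper's actual methods are not described
# in this abstract, and every identifier below is hypothetical.
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAExample:
    question: str
    passage: str
    answer: Optional[str]  # None marks an unanswerable (negative) example


def char_overlap(question: str, passage: str) -> float:
    """Fraction of question characters that also appear in the passage
    (character level, a crude but workable similarity signal for Chinese)."""
    q_chars = set(question)
    return len(q_chars & set(passage)) / max(len(q_chars), 1)


def augment_with_distractors(examples: List[QAExample],
                             passages: List[str],
                             min_overlap: float = 0.5,
                             per_example: int = 1) -> List[QAExample]:
    """For each answerable example, add passages that look similar to the
    question but do not contain the gold answer, labelled as unanswerable,
    so the reader must go beyond surface word overlap."""
    augmented = list(examples)
    for ex in examples:
        if ex.answer is None:
            continue
        candidates = [p for p in passages
                      if ex.answer not in p
                      and char_overlap(ex.question, p) >= min_overlap]
        for p in random.sample(candidates, min(per_example, len(candidates))):
            augmented.append(QAExample(ex.question, p, answer=None))
    return augmented
```

Such negative pairs would simply be appended to the reader's training data; the threshold and the number of distractors per question are tuning choices, not values reported by the authors.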