Data Augmentation in Chinese Open-domain Question Answering
Citation: DU Jiaju, YE Deming, SUN Maosong. Data Augmentation in Chinese Open-domain Question Answering[J]. Journal of Chinese Information Processing, 2022, 36(11): 121-130
Authors: DU Jiaju, YE Deming, SUN Maosong
Affiliation: 1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 2. Institute of Artificial Intelligence, Tsinghua University, Beijing 100084, China; 3. State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China
Funding: National Key Research and Development Program of China (2020AAA0106500)

Abstract: Open-domain question answering (OpenQA) is an important task in natural language processing. Current OpenQA models tend to perform only shallow text matching between questions and passages and often fail even on simple questions. Part of the reason for these errors is that reading comprehension datasets lack some patterns that are common in real-world scenarios. This paper proposes several data augmentation methods that improve the robustness of OpenQA models and effectively reduce the impact of these common patterns. In addition, we construct and publicly release a new OpenQA dataset for evaluating how models actually perform in real-world settings. Experimental results show that the proposed methods improve performance in practical scenarios.

Keywords: open-domain question answering; robustness; data augmentation
Received: 2020-12-31
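
The abstract does not spell out the concrete augmentation strategies, so the following Python sketch only illustrates, under stated assumptions, the general idea it motivates: pairing a question with a passage that shares surface wording but contains no answer, and labelling that pair unanswerable so a reader model cannot rely on shallow question-passage matching alone. All names here (QAExample, augment_with_distractors, the overlap threshold) are hypothetical and not taken from the paper.

```python
# A minimal sketch, assuming one generic augmentation idea (lexically similar but
# unanswerable distractor passages); the paper's actual methods are not described
# in this abstract, and every identifier below is hypothetical.
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAExample:
    question: str
    passage: str
    answer: Optional[str]  # None marks an unanswerable (negative) example


def char_overlap(question: str, passage: str) -> float:
    """Fraction of question characters that also appear in the passage
    (character level, a crude but workable similarity signal for Chinese)."""
    q_chars = set(question)
    return len(q_chars & set(passage)) / max(len(q_chars), 1)


def augment_with_distractors(examples: List[QAExample],
                             passages: List[str],
                             min_overlap: float = 0.5,
                             per_example: int = 1) -> List[QAExample]:
    """For each answerable example, add passages that look similar to the
    question but do not contain the gold answer, labelled as unanswerable,
    so the reader must go beyond surface word overlap."""
    augmented = list(examples)
    for ex in examples:
        if ex.answer is None:
            continue
        candidates = [p for p in passages
                      if ex.answer not in p
                      and char_overlap(ex.question, p) >= min_overlap]
        for p in random.sample(candidates, min(per_example, len(candidates))):
            augmented.append(QAExample(ex.question, p, answer=None))
    return augmented
```

Such negative pairs would simply be appended to the reader's training data; the threshold and the number of distractors per question are tuning choices, not values reported by the authors.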