
Keyword Aware Question Generation Based on Pre-Trained Language Model

Citation: YU Zunrui, MAO Zhendong, WANG Quan, ZHANG Yongdong. Keyword Aware Question Generation Based on Pre-Trained Language Model[J]. Computer Engineering, 2022, 48(2): 125-131.
Authors: YU Zunrui  MAO Zhendong  WANG Quan  ZHANG Yongdong
Affiliation: 1. School of Information Science and Technology, University of Science and Technology of China, Hefei 230000, China; 2. Beijing Baidu Netcom Science Technology Co., Ltd., Beijing 100000, China
Fund: National Natural Science Foundation of China (U19A2057)
Received: 2021-01-06
Revised: 2021-02-21

Abstract: The Question Generation (QG) task is to automatically generate a corresponding question from a given text passage and answer. Existing QG methods suffer from error accumulation and from the one-answer-to-multiple-questions situation inherent in QG tasks. To address these problems, this paper proposes a keyword-aware question generation method. Building on a pre-trained language model, we design the network structures of a keyword classification model and a QG model. The input passage contains keywords; to make the generated question contain the same keywords, and thereby keep the question semantically consistent with the passage, the keyword classification model extracts the keywords in the passage, and a feature distinguishing keywords from non-keywords is integrated into the input of the QG model. This feature serves as global information for the generation process, reducing the QG model's reliance on locally optimal decisions alone and thus reducing error accumulation and the one-answer-to-multiple-questions problem. Experimental results on the SQuAD dataset show that the method improves the quality of generated questions: its BLEU-4 score reaches 24, higher than QG models with a copy mechanism or with semantic supervision. The method has been put into large-scale industrial application on Baidu Baike (Baidu Encyclopedia), a ten-million-scale data platform.
Keywords: Question Generation (QG); pre-trained language model; keyword classification; self-attention mask; embedding vector
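
To make the input-fusion idea in the abstract concrete, the sketch below is a minimal illustration, not the paper's released code: it adds a keyword-indicator embedding to the ordinary token embeddings before a Transformer-based generator encodes the passage. All class and variable names are hypothetical, the keyword labels are assumed to come from a separate keyword classification model as the abstract describes, and the vocabulary and hidden sizes are only example values.

import torch
import torch.nn as nn

class KeywordAwareEmbedding(nn.Module):
    # Fuses a keyword/non-keyword indicator into the token embeddings that feed
    # the pre-trained generator. Label 1 marks a keyword token, label 0 a
    # non-keyword token (hypothetical sketch, not the paper's exact architecture).
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        self.keyword_embedding = nn.Embedding(2, hidden_size)  # 0 = non-keyword, 1 = keyword

    def forward(self, token_ids: torch.Tensor, keyword_labels: torch.Tensor) -> torch.Tensor:
        # token_ids, keyword_labels: (batch, seq_len); output: (batch, seq_len, hidden_size)
        return self.token_embedding(token_ids) + self.keyword_embedding(keyword_labels)

if __name__ == "__main__":
    embed = KeywordAwareEmbedding(vocab_size=30522, hidden_size=768)  # BERT-like example sizes
    token_ids = torch.randint(0, 30522, (2, 16))       # toy passage-plus-answer token ids
    keyword_labels = torch.randint(0, 2, (2, 16))      # toy keyword indicators from the classifier
    fused = embed(token_ids, keyword_labels)
    print(fused.shape)                                 # torch.Size([2, 16, 768])

In such a setup the fused embeddings would replace the plain token embeddings at the input of the pre-trained encoder, so the keyword signal is visible throughout generation as global information rather than only at individual decoding steps.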