A Chinese Corpus for Non-task-oriented Dialogue Systems with Five-grade Manual Annotations

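Cite as:	LI Jing, ZHANG Haisong, SONG Yan. A Chinese Corpus for Non-task-oriented Dialogue Systems with Five-grade Manual Annotations [J]. Journal of Chinese Information Processing, 2019, 33(3): 17-24.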

Authors:	LI Jing, ZHANG Haisong, SONG Yan

Affiliation:	Tencent AI Lab, Shenzhen, Guangdong 518052, China

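Abstract:	This paper builds a large-scale, manually annotated Chinese dataset targeting response quality in non-task-oriented dialogue. The dataset contains more than 27,000 dialogue prompts collected from social media and more than 82,000 responses to those prompts. To produce high-quality annotations, professional annotators were invited to label each response according to its relevance, coherence, informativeness, and interestingness, and whether it has the potential to keep the conversation going. The annotation uses a five-grade rating scheme: bad, mediocre, acceptable, good, and excellent. To test whether the resulting dataset is valid and useful, a range of unsupervised and supervised models were evaluated on it, with response selection as the task. Experimental results show that the dataset significantly improves the quality of response selection.
Keywords:	dialogue system; manual annotation; Chinese dataset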
A Chinese Corpus for Non-task-oriented Dialogue Systems with Five-grade Manual Annotations

LI Jing, ZHANG Haisong, SONG Yan. A Chinese Corpus for Non-task-oriented Dialogue Systems with Five-grade Manual Annotations [J]. Journal of Chinese Information Processing, 2019, 33(3): 17-24.

Authors:	LI Jing, ZHANG Haisong, SONG Yan

Affiliation:	Tencent AI Lab, Shenzhen, Guangdong 518052, China

Abstract:	This paper presents a large-scale corpus for non-task-oriented dialogue systems, which contains over 27K distinct prompts with more than 82K responses collected from social media. To annotate this corpus, we define a 5-grade rating scheme (bad, mediocre, acceptable, good, and excellent) with respect to the relevance, coherence, informativeness, interestingness, and the potential to move a conversation forward. To test the validity and usefulness of the produced corpus, we compare various unsupervised and supervised models for response selection. Experimental results confirm that the proposed corpus is helpful in training response selection models.

Keywords:	dialogue system; manual annotation; Chinese corpus
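
As a rough illustration of how five-grade labels of this kind could be used for response selection, the sketch below represents each candidate response with its grade and ranks the candidates for one prompt. The class names, the integer values 1 to 5, and the toy prompt and responses are all assumptions made for illustration; they are not the corpus's actual file format, its contents, or the models evaluated in the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    """Grade names follow the abstract; the numeric values 1-5 are an assumption."""
    BAD = 1
    MEDIOCRE = 2
    ACCEPTABLE = 3
    GOOD = 4
    EXCELLENT = 5


@dataclass
class AnnotatedResponse:
    """Hypothetical record: one candidate response plus its manual grade."""
    text: str
    grade: Grade


def rank_responses(responses):
    """Return the responses sorted from highest to lowest grade."""
    return sorted(responses, key=lambda r: r.grade, reverse=True)


# Invented toy data, not taken from the corpus.
prompt = "What should I cook tonight?"
candidates = [
    AnnotatedResponse("I don't know.", Grade.MEDIOCRE),
    AnnotatedResponse("How about noodles? What ingredients do you have?", Grade.EXCELLENT),
    AnnotatedResponse("The weather is nice.", Grade.BAD),
]

best = rank_responses(candidates)[0]
print(best.grade.name, "-", best.text)
```

In a response selection setting, a model's ranking of the candidates would be compared against the ranking implied by grades like these.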
This article is indexed in VIP (维普) and other databases.