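Korean Sentence Ordering Based on Sub-Word Level Vector and Pointer Network
Cite this article: YAN Xiaodong, XIE Xiaoqing. Korean Sentence Ordering Based on Sub-Word Level Vector and Pointer Network[J]. Journal of Chinese Information Processing (中文信息学报), 2022, 36(8): 54-61
Authors: YAN Xiaodong (闫晓东), XIE Xiaoqing (解晓庆)
Affiliations: 1. School of Information Engineering, Minzu University of China, Beijing 100089;
2. National Language Resource Monitoring & Research Center, Minority Languages Branch, Beijing 100089
Abstract: Sentence ordering is an important task in natural language processing. Its applications include multi-document summarization, question answering and text generation. Improper sentence ordering produces logically incoherent text and lowers its readability. This paper adapts deep learning methods that are widely used for Chinese and English to the rich morphological variation of Korean words. It proposes a Korean sentence ordering model based on sub-word level word vectors and a pointer network. The aim is to address the inability of traditional methods to capture deep semantic information. To obtain Korean word vectors, we propose a word vector training method based on morpheme splitting (MorV) and compare it with the sub-word n-gram word vector training method (SG). We use two sentence vector methods, one based on a convolutional neural network (CNN) and one based on a long short-term memory network (LSTM), and combine each with the pointer network in separate experiments. The results show that combining MorV with LSTM sentence vectors better captures the semantic and logical relations between sentences and improves sentence ordering.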
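The abstract contrasts two ways of building Korean word vectors from units smaller than a word. The sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation.

It assumes that SG denotes a fastText-style subword skip-gram model. In that model, each word is wrapped in `<` and `>` markers and split into character n-grams of 3 to 6 characters (fastText's defaults). The word is then represented by the sum of the n-gram vectors. fastText also adds a vector for the whole word, which is omitted here.

The abstract does not describe the segmenter behind MorV. The hypothetical `morpheme_split` helper therefore strips a suffix taken from a toy list of Korean particles. `TOY_SUFFIXES` and `toy_vector` are placeholders for a real morphological analyser and a trained embedding table.

```python
import hashlib
from typing import List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """fastText-style sub-word units: wrap the word in '<' and '>' boundary
    markers and collect every character n-gram with n_min <= n <= n_max."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


# Hypothetical toy list of Korean particles; a real system would call a
# morphological analyser instead of doing crude suffix matching.
TOY_SUFFIXES = ["에서", "에게", "으로", "은", "는", "이", "가", "을", "를", "로"]


def morpheme_split(word: str) -> List[str]:
    """Split off the longest matching toy suffix (stand-in for morpheme analysis)."""
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return [word[: -len(suffix)], suffix]
    return [word]


def toy_vector(unit: str, dim: int = 4) -> List[float]:
    """Deterministic placeholder embedding, standing in for a trained lookup table."""
    digest = hashlib.sha256(unit.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def sg_word_vector(word: str, dim: int = 4) -> List[float]:
    """SG-style composition: the word vector is the sum of its n-gram vectors."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        total = [t + x for t, x in zip(total, toy_vector(gram, dim))]
    return total


if __name__ == "__main__":
    word = "학교에서"  # "at school": noun 학교 + locative particle 에서
    print(char_ngrams(word))
    print(morpheme_split(word))       # ['학교', '에서']
    print(len(sg_word_vector(word)))  # 4
```

For 학교에서 ("at school"), the n-gram route yields fragments such as `학교에` and `교에서`, which cross the boundary between the noun and the particle. The morpheme route yields the units 학교 and 에서. This illustrates why morpheme-aware units may suit a morphologically rich language such as Korean.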

Keywords: word vector  morpheme splitting  pointer network  sentence ordering

Korean Sentence Ordering Based on Sub-Word Level Vector and Pointer Network
YAN Xiaodong,XIE Xiaoqing. Korean Sentence Ordering Based on Sub-Word Level Vector and Pointer Network[J]. Journal of Chinese Information Processing, 2022, 36(8): 54-61
Authors:YAN Xiaodong  XIE Xiaoqing
Affiliation:1.School of Information Engineering, Minzu University of China, Beijing 100089, China;
2.National Language Resource Monitoring & Research Center, Minority Languages Branch, Beijing 100089, China
Abstract:Sentence ordering is one of the core tasks in natural language processing, with wide applications in multi-document summarization, question answering and text generation. Considering the rich morphological variation of Korean words, this paper puts forward a Korean sentence ordering model based on sub-word level word vectors and a pointer network. A word vector training method based on morpheme splitting (MorV) is presented and compared with the sub-word n-gram word vector training method (SG) for obtaining Korean word vectors. Two sentence vector methods, based on a convolutional neural network (CNN) and a long short-term memory network (LSTM) respectively, are each combined with the pointer network. The results show that combining MorV with LSTM sentence vectors better captures the semantic and logical relationships between sentences and improves sentence ordering.
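The pointer network outputs a sentence order by repeatedly "pointing" at one of the input sentences. The following is a minimal greedy-decoding sketch. It assumes the additive attention of Vinyals et al. (2015), u_j = v^T tanh(W1 e_j + W2 d). Here e_j is the vector of sentence j, which in the paper's setting would come from the CNN or LSTM sentence encoder, and d is the decoder state. Sentences already placed are masked out of the softmax, so each sentence appears exactly once.

Several parts are simplified and should be read as assumptions:

- The weights `w1`, `w2`, `v` and the sentence vectors are hand-set toy values, not trained parameters.
- The LSTM decoder is replaced by the previously chosen sentence's vector.
- Training is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def pointer_scores(encodings: List[Vector], state: Vector,
                   w1: Matrix, w2: Matrix, v: Vector) -> Vector:
    """Additive attention score u_j = v^T tanh(W1 e_j + W2 d) for every sentence j."""
    wd = matvec(w2, state)
    scores = []
    for e in encodings:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w1, e), wd)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return scores


def masked_softmax(scores: Vector, used: List[bool]) -> Vector:
    """Softmax over sentences not yet placed; placed sentences get probability 0."""
    m = max(s for s, u in zip(scores, used) if not u)
    exps = [0.0 if u else math.exp(s - m) for s, u in zip(scores, used)]
    z = sum(exps)
    return [x / z for x in exps]


def greedy_order(encodings: List[Vector], w1: Matrix, w2: Matrix,
                 v: Vector) -> List[int]:
    """Return the predicted order as a list of sentence indices."""
    n, dim = len(encodings), len(encodings[0])
    used = [False] * n
    # Hypothetical simplification: start from the mean sentence vector, then use
    # the last chosen sentence's vector as the state instead of an LSTM decoder.
    state = [sum(e[k] for e in encodings) / n for k in range(dim)]
    order: List[int] = []
    for _ in range(n):
        probs = masked_softmax(pointer_scores(encodings, state, w1, w2, v), used)
        best = max((j for j in range(n) if not used[j]), key=lambda j: probs[j])
        order.append(best)
        used[best] = True
        state = encodings[best]
    return order


if __name__ == "__main__":
    # Toy 2-d "sentence vectors" and hand-set weights (stand-ins for trained values).
    sentences = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(greedy_order(sentences, identity, identity, [1.0, 1.0]))
```

Greedy decoding commits to the locally best sentence at each step. Beam search over partial orders is a common alternative when a single early mistake is costly.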
Keywords:word vector    morpheme split    pointer network    sentence ordering  