
Chinese Single-Document Summarization Model DSum-SSE
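Cite this article: HE Junmin, LU Menghua, MENG Kui. Chinese single-document summarization model DSum-SSE[J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(15): 200-206.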
Authors: 赫俊民 (HE Junmin), 鲁梦华 (LU Menghua), 孟魁 (MENG Kui)
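Affiliations: 1. Geophysical Research Institute, Shengli Oilfield Company, China Petroleum & Chemical Corporation, Dongying, Shandong 257093, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China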
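Abstract: To address two problems in Chinese document summarization, the lack of reliable datasets and the immaturity of supervised summarization models, this work builds a Chinese document-level summarization corpus of more than 200,000 articles (Chinese Document-level Extractive Summarization Dataset, CDESD) and proposes a supervised document-level extractive summarization model (Document Summarization with SPA Sentence Embedding, DSum-SSE). The model is built on a neural network framework. It first uses an end-to-end framework that combines a Pointer and an attention mechanism to solve sentence-level abstractive (generative) summarization, which yields representation vectors reflecting the core meaning of each sentence. On this basis, it then introduces an extreme Pointer mechanism to carry out document-level extractive summarization. Experiments show that, compared with the unsupervised single-document summarization algorithm TextRank, DSum-SSE can produce higher-quality summaries. CDESD and DSum-SSE supplement, respectively, the corpus data and the models available for Chinese document-level summarization.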

Keywords: document-level text summarization; extractive summarization; end-to-end framework; attention mechanism; Pointer
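The abstract says DSum-SSE first learns sentence vectors and then applies a Pointer mechanism to pick summary sentences, but it does not spell out the architecture. The sketch below is therefore only a generic illustration of pointer-style sentence selection in the spirit of Pointer Networks, not the authors' model: additive attention scores every input sentence, the resulting softmax is read as a distribution over input positions, and the highest-probability sentence is "pointed to" and masked before the next step. Everything specific here is an assumption: the weights `W1`, `W2`, `v` are random rather than trained, the first query is a mean-pooled document vector, feeding the chosen sentence vector back as the next query stands in for a real decoder RNN, and the names `pointer_select` and `random_matrix` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: List[float]) -> List[float]:
    """Stable softmax; entries equal to -inf (masked) get probability 0."""
    top = max(s for s in scores if s != -math.inf)
    exps = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_select(sent_vecs: List[Vector], query: Vector, w1: Matrix,
                   w2: Matrix, v: Vector, k: int) -> List[int]:
    """Point to up to k sentences, one per step, with additive attention.

    score_j = v . tanh(W1 h_j + W2 q); already-chosen sentences are masked.
    After each step the chosen sentence vector becomes the next query, a crude
    (assumed) stand-in for feeding the pointed-to input back into a decoder.
    Returns the chosen indices in document order.
    """
    chosen: List[int] = []
    for _ in range(min(k, len(sent_vecs))):
        q_proj = matvec(w2, query)
        scores = []
        for j, h in enumerate(sent_vecs):
            if j in chosen:
                scores.append(-math.inf)  # never point to the same sentence twice
                continue
            hidden = [math.tanh(a + b) for a, b in zip(matvec(w1, h), q_proj)]
            scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
        probs = softmax(scores)  # pointer distribution over input sentences
        best = max(range(len(probs)), key=probs.__getitem__)
        chosen.append(best)
        query = sent_vecs[best]
    return sorted(chosen)


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Hypothetical untrained weights, uniform in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hid = 4, 3
    sents = random_matrix(rng, 5, dim)  # 5 hypothetical sentence embeddings
    doc = [sum(col) / len(sents) for col in zip(*sents)]  # mean-pooled first query
    w1, w2 = random_matrix(rng, hid, dim), random_matrix(rng, hid, dim)
    v = random_matrix(rng, 1, hid)[0]
    print(pointer_select(sents, doc, w1, w2, v, k=2))
```

Masking chosen positions with `-inf` is what makes this an extractive selector: each step must point to a new input sentence, so the output is always a subset of the document.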

Chinese Document-Level Summary Model—DSum-SSE
HE Junmin, LU Menghua, MENG Kui. Chinese Document-Level Summary Model—DSum-SSE[J]. Computer Engineering and Applications, 2021, 57(15): 200-206.
Authors: HE Junmin, LU Menghua, MENG Kui
Affiliations: 1. Shengli Geophysical Research Institute of China Petroleum and Chemical Corporation, Dongying, Shandong 257093, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Abstract: Text summarization technology filters out the important information in a text and presents it in a well-organized form, helping people obtain information quickly. In Chinese single-document summarization, supervised summarization models remain immature because reliable datasets are lacking. This paper builds CDESD (Chinese Document-level Extractive Summarization Dataset), a Chinese document-level summarization corpus of more than 200,000 articles, and proposes DSum-SSE (Document Summarization with SPA Sentence Embedding), a supervised document-level extractive summarization model. The model is based on a neural network framework. It first uses a sequence-to-sequence framework that combines a Pointer and an attention mechanism to solve sentence-level abstractive summarization, which yields representation vectors reflecting the core meaning of each sentence; on this basis, it then introduces an extreme Pointer mechanism to complete the supervised document-level extractive summarization algorithm. Experiments show that, compared with the popular unsupervised document-level extractive summarization algorithm TextRank, DSum-SSE can provide higher-quality summaries. CDESD and DSum-SSE supplement, respectively, the corpus data and the models available for Chinese document-level summarization.
Keywords: document-level summarization; extractive summary; sequence-to-sequence; attention mechanism; Pointer
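The abstract uses TextRank as its unsupervised baseline. For readers unfamiliar with it, here is a minimal sketch of sentence-level TextRank as formulated by Mihalcea and Tarau (2004): sentences are graph nodes, edges are weighted by token overlap normalized by the log lengths of the two sentences, and a weighted PageRank iteration with damping factor 0.85 ranks the nodes. This is not necessarily the configuration used in the paper's experiments. The sketch assumes sentences have already been segmented into tokens (Chinese text needs word segmentation first), and the example tokens are hypothetical.

```python
import math
from typing import List, Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Token-overlap similarity from the TextRank formulation:
    |shared tokens| / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0.0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences: List[List[str]], k: int, d: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> List[int]:
    """Return the indices of the k highest-scoring sentences, in document order."""
    n = len(sentences)
    if n == 0:
        return []
    # Symmetric weighted graph without self-loops.
    w = [[overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # Weighted PageRank: WS(i) = (1 - d) + d * sum_j w_ji / out(j) * WS(j)
        new_scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # converged
            break
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical pre-segmented sentences.
    doc = [
        ["模型", "摘要", "句子", "向量"],
        ["摘要", "句子", "抽取"],
        ["模型", "向量", "注意力"],
        ["天气", "晴朗"],
    ]
    print(textrank(doc, k=3))  # → [0, 1, 2]
```

In the example, the first sentence shares tokens with the second and third and therefore ranks highest, while the unrelated fourth sentence has no edges and keeps only the base score of 1 - d. Because TextRank needs no labeled data, it is a natural baseline for a supervised model trained on a corpus such as CDESD.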
This article is indexed in Wanfang Data and other databases.