An Automatic Text Summarization Model Construction Method Based on BERT
Cite this article: YUE Yi-feng, HUANG Wei, REN Xiang-hui. An automatic text summarization model construction method based on BERT [J]. Computer and Modernization, 2020, 0(1): 63-68.
Authors: YUE Yi-feng, HUANG Wei, REN Xiang-hui
Affiliation: North China Institute of Computing Technology, Beijing 100083, China
Funding: National Key Research and Development Program of China
Abstract: Traditional word vectors cannot effectively represent polysemous words during automatic text summarization, which lowers the accuracy and readability of the generated summaries. To address this problem, a construction method for an automatic text summarization model based on BERT (Bidirectional Encoder Representations from Transformers) is proposed. The method introduces the BERT pre-trained language model to enrich the semantic representation of word vectors; the resulting vectors are fed into a Seq2Seq model for training, yielding an automatic summarization model that generates summaries quickly. Experimental results on the Gigaword dataset show that the model effectively improves the accuracy and readability of the generated summaries and is suitable for automatic text summarization tasks.

Keywords: text summarization; BERT model; attention mechanism; Sequence-to-Sequence (Seq2Seq) model
Received: 2020-02-13

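The pipeline the abstract describes — contextual BERT embeddings of the source text weighted by an attention mechanism inside a Seq2Seq decoder — can be sketched minimally as follows. This is an illustrative sketch only, not the paper's model: the random `enc` matrix is a hypothetical stand-in for BERT's output embeddings, and a single dot-product attention step stands in for the full encoder-decoder training loop.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    """Dot-product attention: weight each encoder state (standing in for a
    BERT contextual embedding of one source token) by its relevance to the
    current decoder state, and return the weighted context vector."""
    scores = encoder_states @ decoder_state      # (T,) one score per token
    weights = softmax(scores)                    # (T,) sums to 1
    context = weights @ encoder_states           # (d,) context vector
    return weights, context

# Toy stand-ins: 5 source tokens with 8-dim "BERT" embeddings (random here),
# plus one decoder hidden state of the same dimension.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=(8,))

w, ctx = attention_context(enc, dec)
print(w.shape, ctx.shape)
```

In the actual model, `ctx` would be concatenated with the decoder state at each step to predict the next summary token; here it only demonstrates how attention focuses the decoder on the most relevant source embeddings.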
This article is indexed in the 维普 (VIP) and 万方数据 (Wanfang Data) databases, among others.