首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 116 毫秒
1.
为了获取同一事件的汉越双语新闻的自动摘要,该文提出了一种多特征融合的汉越双语新闻摘要方法。关于同一事件的新闻文本,其句子间具有一定的关联关系,利用这些关联关系有助于生成摘要。根据该思想,首先计算句子间的新闻要素共现程度及句子间的相似度;然后将这两种特征融入句子无向图,并利用图排序算法对句子进行排序;之后结合句子的位置特征对排序结果进行调序;最后挑选重要句子并去除冗余生成摘要。在汉越双语新闻文档集上进行了摘要实验,结果表明该方法取得了较好的结果,具有有效性。  相似文献   

2.
针对文本自动摘要任务中生成式摘要模型对句子的上下文理解不够充分、生成内容重复的问题,基于BERT和指针生成网络(PGN),提出了一种面向中文新闻文本的生成式摘要模型——BERT-指针生成网络(BERT-PGN)。首先,利用BERT预训练语言模型结合多维语义特征获取词向量,从而得到更细粒度的文本上下文表示;然后,通过PGN模型,从词表或原文中抽取单词组成摘要;最后,结合coverage机制来减少重复内容的生成并获取最终的摘要结果。在2017年CCF国际自然语言处理与中文计算会议(NLPCC2017)单文档中文新闻摘要评测数据集上的实验结果表明,与PGN、伴随注意力机制的长短时记忆神经网络(LSTM-attention)等模型相比,结合多维语义特征的BERT-PGN模型对摘要原文的理解更加充分,生成的摘要内容更加丰富,全面且有效地减少重复、冗余内容的生成,Rouge-2和Rouge-4指标分别提升了1.5%和1.2%。  相似文献   

3.
Internet网络新闻文本自动摘要的研究   总被引:6,自引:0,他引:6  
官礼和 《计算机工程与设计》2007,28(14):3518-3520,F0003
给出了Internet网络新闻中文文本自动摘要的基本思路和基本步骤,讨论了断句、分词算法.针对自动摘要中新闻文本的4种形式特征,提出了一套新的自动摘要方案:首先综合新闻文本的4种形式特征对词汇和句子赋予不同的权值,然后根据权值大小按给定的比例挑选句子,并进行平滑处理,生成文字流畅且具备一定质量的摘要.最后实验分析表明效果较好.  相似文献   

4.
涉案舆情新闻文本摘要任务是从涉及特定案件的舆情新闻文本中,获取重要信息作为其简短摘要,因此对于相关人员快速掌控舆情态势具有重要作用。涉案舆情新闻文本摘要相比开放域文本摘要任务,通常涉及特定的案件要素,这些要素对摘要生成过程有重要的指导作用。因此,该文结合深度学习框架,提出了一种融入案件要素的涉案舆情新闻文本摘要方法。首先构建涉案舆情新闻摘要数据集并定义相关案件要素,然后通过注意力机制将案件要素信息融入新闻文本的词、句子双层编码过程中,生成带有案件要素信息的新闻文本表征,最后利用多特征分类层对句子进行分类。为了验证算法有效性,在构造的涉案舆情新闻摘要数据集上进行实验。实验结果表明,该方法相比基准模型取得了更好的效果,具有有效性和先进性。  相似文献   

5.
在对中文文本进行摘要提取时,传统的TextRank算法只考虑节点间的相似性,忽略了文本的其他重要信息。首先,针对中文单文档,在现有研究的基础上,使用TextRank算法,一方面考虑句子间的相似性,另一方面,使TextRank算法与文本的整体结构信息、句子的上下文信息等相结合,如文档句子或者段落的物理位置、特征句子、核心句子等有可能提升权重的句子,来生成文本的摘要候选句群;然后对得到的摘要候选句群做冗余处理,以除去候选句群中相似度较高的句子,得到最终的文本摘要。最后通过实验验证,该算法能够提高生成摘要的准确性,表明了该算法的有效性。  相似文献   

6.
当前,突发热点事件的传播日益迅猛与广泛.如何通过事件抽取准确快速地抽取出事件触发词及其事件元素,有助于决策者分析舆情态势、引导社会舆论.针对现有事件抽取方法多是从单个句子中抽取事件元素,而突发热点事件的事件元素往往分布在多个句子当中的问题,提出了一种基于图注意力网络的突发热点事件联合抽取方法,该方法分为三个阶段:基于TextRank的事件句抽取、基于图注意力网络的篇章级事件联合抽取、突发热点事件补全.在抽取出新闻主旨事件以后对整篇新闻做事件抽取,利用候选事件与新闻主旨事件的事件向量相似度以及事件论元相似度对该新闻主旨事件进行补全.实验结果表明,该方法在DUEE1.0数据集上进行触发词抽取和论元角色抽取任务时的F1指标分别达到83.2%、59.1%;在中文突发事件语料库上进行触发词抽取和论元角色抽取任务时的F1指标分别达到82.7%、58.7%,验证了模型的合理性和有效性.  相似文献   

7.
为了解决传统抽象式摘要模型生成的中文摘要难以保存原文本语义信息的问题,提出了一种融合语言特征的抽象式中文摘要模型。模型中添加了拼接层,将词性、命名实体、词汇位置、TF-IDF等特征拼接到词向量上,使输入模型的词向量包含更多的维度的语义信息来确定关键实体。结合指针机制有选择地复制原文中的关键词到摘要中,从而提高生成的摘要的语义相关性。使用LCSTS新闻数据集进行实验,取得了高于基线模型的ROUGE得分。分析表明本模型能够生成语义相关度较高的中文摘要。  相似文献   

8.
为了让计算机能够对中文文章提取摘要,提出一种中文摘要自动生成算法。该算法基于Gensim自然语言处理框架实现,并在原有的基础上做出了改进,算法主要分为两个阶段。关键句生成阶段,对中文语料进行预处理,并放入Gensim框架中的Word2vec模型进行训练,修改TextRank算法使其能够接受词向量的输入生成无向图从而找到关键句;摘要生成框架构建阶段,根据文章结构与Gensim框架中的LDA主题模型所提取的关键词,赋予句子不同的权值,将分数高的几个句子组合生成文章摘要。Rouge摘要评测结果表明,该算法生成的摘要能够包含文章关键信息,相比于其他自动文摘算法,句意通顺程度得到了提升。  相似文献   

9.
为了更深入地挖掘突发事件Web新闻并应用于应急管理,提出了突发事件Web新闻时间抽取方法。首先引入中文的时间关系理论;然后从突发事件Web新闻的时间构成、时间位置特征以及时间常用词三个方面分析了突发事件Web新闻的表达特征;基于此,提出突发事件Web新闻的时间抽取方法,通过统计学习,正确率较理想;最后,基于已抽取到的事件发生时间信息,程序实现了突发事件Web新闻排序。  相似文献   

10.
针对文本摘要任务中的生成式摘要模型对原文理解不充分且容易生成重复文本等问题,提出将词向量模型ALBERT与统一预训练模型UniLM相结合的算法,构造出一种ALBERT-UniLM摘要生成模型。该模型采用预训练动态词向量ALBERT替代传统的BERT基准模型进行特征提取获得词向量。利用融合指针网络的UniLM语言模型对下游生成任务微调,结合覆盖机制来降低重复词的生成并获取摘要文本。实验以ROUGE评测值作为评价指标,在2018年CCF国际自然语言处理与中文计算会议(NLPC-C2018)单文档中文新闻摘要评价数据集上进行验证。与BERT基准模型相比,ALBERT-UniLM模型的Rouge-1、Rouge-2和Rouge-L指标分别提升了1.57%、1.37%和1.60%。实验结果表明,提出的ALBERT-UniLM模型在文本摘要任务上效果明显优于其他基准模型,能够有效提高文本摘要的生成质量。  相似文献   

11.
Information ordering is a nontrivial task in multi‐document summarization (MDS), which typically relies on the traditional vector space model (VSM) notorious for semantic deficiency. In this article, we propose a novel event‐enriched VSM to alleviate the problem by building event semantics into sentence representations. The mediation of event information between sentence and term, especially in the news domain, has an intuitive appeal as well as technical advantage in common sentence‐level operations such as sentence similarity computation. Inspired by the block‐style writing by humans, we base the sentence ordering algorithm on sentence clustering. To accommodate the complexity introduced by event information, we adopt a soft‐to‐hard clustering strategy on the event and sentence levels, using expectation–maximization clustering and K‐means, respectively. For the purpose of cluster‐based sentence ordering, the event‐enriched VSM enables us to design an ordering algorithm to enhance event coherence computed between sentence and sentence–context pairs. Drawing on the findings of earlier research, we also incorporate topic continuity measures and time information into the scheme. We evaluate the performance of the model and its variants automatically and manually, with experimental results showing clear advantage of the event‐based model over baseline and non‐event‐based models in information ordering for multi‐document news summarization. We are confident that the event‐enriched VSM has even greater potential in summarization and beyond, which awaits further research. © 2014 Wiley Periodicals, Inc.  相似文献   

12.
针对新闻文本领域,该文提出一种基于查询的自动文本摘要技术,更加有针对性地满足用户信息需求。根据句子的TF-IDF、与查询句的相似度等要素,计算句子权重,并根据句子指示的时间给定不同的时序权重系数,使得最近发生的新闻内容具有更高的权重,最后使用最大边界相关的方法选择摘要句。通过与基于TF-IDF、Text-Rank、LDA等六种方法的对比,该摘要方法ROUGE评测指标上优于其他方法。从结合评测结果及摘要示例可以看出,该文提出的方法可以有效地从新闻文档集中摘取核心信息,满足用户查询内容的信息需求。  相似文献   

13.
Event summarization is a task to generate a single, concise textual representation of an event. This task does not consider multiple development phases in an event. However, news articles related to long and complicated events often involve multiple phases. Thus, traditional approaches for event summarization generally have difficulty in capturing event phases in summarization effectively. In this paper, we define the task of Event Phase Oriented News Summarization (EPONS). In this approach, we assume that a summary contains multiple timelines, each corresponding to an event phase. We model the semantic relations of news articles via a graph model called Temporal Content Coherence Graph. A structural clustering algorithm EPCluster is designed to separate news articles into several groups corresponding to event phases. We apply a vertex-reinforced random walk to rank news articles. The ranking results are further used to create timelines. Extensive experiments conducted on multiple datasets show the effectiveness of our approach.  相似文献   

14.
带有时间标志的演化式摘要是近年来提出的自然语言处理任务,其本质是多文档自动文摘,它的研究对象是互联网上连续报道的热点新闻文档。针对互联网新闻事件报道的动态演化、动态关联和信息重复等特点,该文提出了一种基于局部—全局主题关系的演化式摘要方法,该方法将新闻事件划分为多个不同的子主题,在考虑时间演化的基础上同时考虑子主题之间的主题演化,最后将新闻标题作为摘要输出。实验结果表明,该方法是有效的,并且在以新闻标题作为输入输出时,和当前主流的多文档摘要和演化摘要方法相比,在Rouge评价指标上有显著提高。  相似文献   

15.
Storyline-based summarization for news topic retrospection   总被引:2,自引:0,他引:2  
Electronics newspapers gradually become main sources for news readers. When facing the numerous reports on a series of events in a topic, a summary of stories from news reports will benefit news readers in reviewing the news topic efficiently. Besides identifying events and presenting news titles and keywords the TDT (Topic Detection and Tracking) techniques are used to do, a summarized text to present event evolution is necessary for general news readers to review events under a news topic. This paper proposes a topic retrospection process and implements the SToRe (Story-line based Topic Retrospection) system that identifies various events under a news topic, and composes a summary that news readers can get the sketch of event evolution in the topic. It consists of three main functions: event identification, main storyline construction and storyline-based summarization. The constructed main storyline can remove the irrelevant events and present a main theme. The storyline-based summarization extracts the representative sentences and takes the main theme as the template to compose the summary. The storyline summary not only provides readers enough information to understand the development of a news topic, but also serves as an index for readers to search corresponding news reports. Following a design science paradigm, a lab experiment is conducted to evaluate the SToRe system in the question-and-answer (Q&A) setting. The experimental results show that SToRe enables news readers to effectively and efficiently capture the evolution of a news topic.  相似文献   

16.
In text summarization, relevance and coverage are two main criteria that decide the quality of a summary. In this paper, we propose a new multi-document summarization approach SumCR via sentence extraction. A novel feature called Exemplar is introduced to help to simultaneously deal with these two concerns during sentence ranking. Unlike conventional ways where the relevance value of each sentence is calculated based on the whole collection of sentences, the Exemplar value of each sentence in SumCR is obtained within a subset of similar sentences. A fuzzy medoid-based clustering approach is used to produce sentence clusters or subsets where each of them corresponds to a subtopic of the related topic. Such kind of subtopic-based feature captures the relevance of each sentence within different subtopics and thus enhances the chance of SumCR to produce a summary with a wider coverage and less redundancy. Another feature we incorporate in SumCR is Position, i.e., the position of each sentence appeared in the corresponding document. The final score of each sentence is a combination of the subtopic-level feature Exemplar and the document-level feature Position. Experimental studies on DUC benchmark data show the good performance of SumCR and its potential in summarization tasks.  相似文献   

17.
案件舆情摘要是从涉及特定案件的新闻文本簇中,抽取能够概括其主题信息的几个句子作为摘要.案件舆情摘要可以看作特定领域的多文档摘要,与一般的摘要任务相比,可以通过一些贯穿于整个文本簇的案件要素来表征其主题信息.在文本簇中,由于句子与句子之间存在关联关系,案件要素与句子亦存在着不同程度的关联关系,这些关联关系对摘要句的抽取有着重要的作用.提出了基于案件要素句子关联图卷积的案件文本摘要方法,采用图的结构来对多文本簇进行建模,句子作为主节点,词和案件要素作为辅助节点来增强句子之间的关联关系,利用多种特征计算不同节点间的关联关系.然后,使用图卷积神经网络学习句子关联图,并对句子进行分类得到候选摘要句.最后,通过去重和排序得到案件舆情摘要.在收集到的案件舆情摘要数据集上进行实验,结果表明:提出的方法相比基准模型取得了更好的效果,引入要素及句子关联图对案件多文档摘要有很好的效果.  相似文献   

18.
基于滑动窗口的微博时间线摘要算法   总被引:1,自引:0,他引:1  
时间线摘要是在时间维度上对文本进行内容归纳和概要生成的技术。传统的时间线摘要主要研究诸如新闻之类的长文本,而本文研究微博短文本的时间线摘要问题。由于微博短文本内容特征有限,无法仅依靠文本内容生成摘要,本文采用内容覆盖性、时间分布性和传播影响力3种指标评价时间线摘要,并提出了基于滑动窗口的微博时间线摘要算法(Microblog timeline summariaztion based on sliding window, MTSW)。该算法首先利用词项强度和熵来确定代表性词项;然后基于上述3种指标构建出评价时间线摘要的综合评价指标;最后采用滑动窗口的方法,遍历时间轴上的微博消息序列,生成微博时间线摘要。利用真实微博数据集的实验结果表明,MTSW算法生成的时间线摘要可以有效地反映热点事件发展演化的过程。  相似文献   

19.
抽取式摘要的核心问题在于合理地建模句子,正确地判断句子重要性。该文提出一种计算句子话题重要性的方法,通过分析句子与话题的语义关系,判断句子是否描述话题的重要信息。针对自动摘要任务缺乏参考摘要作为训练数据的问题,该文提出一种基于排序学习的半监督训练框架,利用大规模未标注新闻语料训练模型。在DUC2004多文档摘要任务上的实验结果表明,该文提出的话题重要性特征能够作为传统启发式特征的有效补充,改进摘要质量。  相似文献   

20.
A fuzzy ontology and its application to news summarization.   总被引:7,自引:0,他引:7  
In this paper, a fuzzy ontology and its application to news summarization are presented. The fuzzy ontology with fuzzy concepts is an extension of the domain ontology with crisp concepts. It is more suitable to describe the domain knowledge than domain ontology for solving the uncertainty reasoning problems. First, the domain ontology with various events of news is predefined by domain experts. The document preprocessing mechanism will generate the meaningful terms based on the news corpus and the Chinese news dictionary defined by the domain expert. Then, the meaningful terms will be classified according to the events of the news by the term classifier. The fuzzy inference mechanism will generate the membership degrees for each fuzzy concept of the fuzzy ontology. Every fuzzy concept has a set of membership degrees associated with various events of the domain ontology. In addition, a news agent based on the fuzzy ontology is also developed for news summarization. The news agent contains five modules, including a retrieval agent, a document preprocessing mechanism, a sentence path extractor, a sentence generator, and a sentence filter to perform news summarization. Furthermore, we construct an experimental website to test the proposed approach. The experimental results show that the news agent based on the fuzzy ontology can effectively operate for news summarization.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号