首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
In this paper, we propose a new method for automatic query-reply in social network. Information extraction and query-oriented summarization method are applied here to reply people’s query. There are few effective and commonly used methods on filtering the redundancy and noise of the raw data, which results in the poor quality of the reply. Due to the characteristics of social network messages, we pay more attention to reducing the noise and eliminating the redundancy of the messages to ensure the quality of the final reply. First, we propose an information extraction method to extract high quality information from social network messages, which is based on time-frequency transformation. Second, query-oriented text summarization is implemented to generate a brief and concise summary as the final reply, which is based on the scoring, ranking and selection of sentences of high quality social network messages produced by previous step. Experimental results show that the research is effective in filtering the redundancy and noise of social network messages, the final query-reply results outperform other commonly used methods’ results in both automatic evaluation and manual evaluation. Through our approach, noise and redundancy of social network messages are effectively filtered. Certainly, our method improves the quality of the reply for people’s query.  相似文献   

2.
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of automatically creating a compressed version of a given document that provides useful information to users, and multi-document summarization is to produce a summary delivering the majority of information content from a set of documents about an explicit or implicit main topic. In our study we focus on sentence based extractive document summarization. We propose the generic document summarization method which is based on sentence clustering. The proposed approach is a continue sentence-clustering based extractive summarization methods, proposed in Alguliev [Alguliev, R. M., Aliguliyev, R. M., Bagirov, A. M. (2005). Global optimization in the summarization of text documents. Automatic Control and Computer Sciences 39, 42–47], Aliguliyev [Aliguliyev, R. M. (2006). A novel partitioning-based clustering method and generic document summarization. In Proceedings of the 2006 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology (WI–IAT 2006 Workshops) (WI–IATW’06), 18–22 December (pp. 626–629) Hong Kong, China], Alguliev and Alyguliev [Alguliev, R. M., Alyguliev, R. M. (2007). Summarization of text-based documents with a determination of latent topical sections and information-rich sentences. Automatic Control and Computer Sciences 41, 132–140] Aliguliyev, [Aliguliyev, R. M. (2007). Automatic document summarization by sentence extraction. Journal of Computational Technologies 12, 5–15.]. The purpose of present paper to show, that summarization result not only depends on optimized function, and also depends on a similarity measure. The experimental results on an open benchmark datasets from DUC01 and DUC02 show that our proposed approach can improve the performance compared to sate-of-the-art summarization approaches.  相似文献   

3.
Relevance feedback (RF) is a technique that allows to enrich an initial query according to the user feedback. The goal is to express more precisely the user’s needs. Some open issues arise when considering semi-structured documents like XML documents. They are mainly related to the form of XML documents which mix content and structure information and to the new granularity of information. Indeed, the main objective of XML retrieval is to select relevant elements in XML documents instead of whole documents. Most of the RF approaches proposed in XML retrieval are simple adaptation of traditional RF to the new granularity of information. They usually enrich queries by adding terms extracted from relevant elements instead of terms extracted from whole documents. In this article, we describe a new approach of RF that takes advantage of two sources of evidence: the content and the structure. We propose to use the query term proximity to select terms to be added to the initial query and to use generic structures to express structural constraints. Both sources of evidence are used in different combined forms. Experiments were carried out within the INEX evaluation campaign and results show the effectiveness of our approaches.  相似文献   

4.
为了满足用户的个性化需求,提供尽可能丰富、实用、方便的文摘结果,该文设计了面向查询的多文档自动文摘的多种摘要模式。在将查询返回的文档集合表示为以文本、段落为节点的双层复杂网络结构以发现子主题的基础上,除传统的摘要模式外,该文又设计了概括摘要、局部摘要、全局摘要和详细摘要这四种摘要模式,并给出了各种摘要的生成方法。支持用户以主题为线索自主漫游,按照一定的逻辑顺序浏览信息。  相似文献   

5.
信息爆炸是信息化时代面临的普遍性问题, 为了从海量文本数据中快速提取出有价值的信息, 自动摘要技术成为自然语言处理(natural language processing, NLP)领域中的研究重点. 多文档摘要的目的是从一组具有相同主题的文档中精炼出重要内容, 帮助用户快速获取关键信息. 针对目前多文档摘要中存在的信息不全面、冗余度高的问题, 提出一种基于多粒度语义交互的抽取式摘要方法, 将多粒度语义交互网络与最大边界相关法(maximal marginal relevance, MMR)相结合, 通过不同粒度的语义交互训练句子的表示, 捕获不同粒度的关键信息, 从而保证摘要信息的全面性; 同时结合改进的MMR以保证摘要信息的低冗余度, 通过排序学习为输入的多篇文档中的各个句子打分并完成摘要句的抽取. 在Multi-News数据集上的实验结果表明基于多粒度语义交互的抽取式多文档摘要模型优于LexRank、TextRank等基准模型.  相似文献   

6.
案件舆情摘要是从涉及特定案件的新闻文本簇中,抽取能够概括其主题信息的几个句子作为摘要.案件舆情摘要可以看作特定领域的多文档摘要,与一般的摘要任务相比,可以通过一些贯穿于整个文本簇的案件要素来表征其主题信息.在文本簇中,由于句子与句子之间存在关联关系,案件要素与句子亦存在着不同程度的关联关系,这些关联关系对摘要句的抽取有着重要的作用.提出了基于案件要素句子关联图卷积的案件文本摘要方法,采用图的结构来对多文本簇进行建模,句子作为主节点,词和案件要素作为辅助节点来增强句子之间的关联关系,利用多种特征计算不同节点间的关联关系.然后,使用图卷积神经网络学习句子关联图,并对句子进行分类得到候选摘要句.最后,通过去重和排序得到案件舆情摘要.在收集到的案件舆情摘要数据集上进行实验,结果表明:提出的方法相比基准模型取得了更好的效果,引入要素及句子关联图对案件多文档摘要有很好的效果.  相似文献   

7.
文本情感摘要任务旨在对带有情感的文本数据进行浓缩、提炼进而产生文本所表达的关于情感意见的摘要,用以帮助用户更好地阅读、理解情感文本的内容。该文主要研究多文档的文本情感摘要问题,重点针对网络上存在的同一个产品的多个评论进行摘要抽取。在情感文本中,情感相关性是一个重要的特点,该文将充分考虑情感信息对文本情感摘要的重要影响。同时,对于评论语料,质量高的评论或者说可信度高的评论可以帮助用户更好的了解评论中所评价的对象。因此,该文将充分考虑评论质量对文本情感摘要的影响。并且为了进行关于文本情感摘要的研究,该文收集并标注了一个基于产品评论的英文多文档文本情感摘要语料库。实验证明,情感信息和评论质量能够帮助多文档文本情感摘要,提高摘要效果。  相似文献   

8.
Update summarization is a new challenge in automatic text summarization. Different from the traditional static summarization, it deals with the dynamically evolving document collections of a single topic changing over time, which aims to incrementally deliver salient and novel information to a user who has already read the previous documents. How to have a content selection and linguistic quality control in a temporal context are the two new challenges brought by update summarization. In this paper, we address a novel content selection framework based on evolutionary manifold-ranking and normalized spectral clustering. The proposed evolutionary manifold-ranking aims to capture the temporal characteristics and relay propagation of information in dynamic data stream and user need. This approach tries to keep the summary content to be important, novel and relevant to the topic. Incorporation with normalized spectral clustering is to make summary content have a high coverage for each sub-topic. Ordering sub-topics and selecting sentences are dependent on the rank score from evolutionary manifold-ranking and the proposed redundancy removal strategy with exponent decay. The evaluation results on the update summarization task of Text Analysis Conference (TAC) 2008 demonstrate that our proposed approach is competitive. In the 71 run systems, we receive three top 1 under PYRAMID metrics, ranking 13th in ROUGE-2, 15th in ROUGE-SU4 and 21st in BE.  相似文献   

9.
于广川  贺瑞芳  刘洋  党建武 《软件学报》2017,28(10):2654-2673
时序推特摘要是文本摘要任务中的一个重要分支,旨在从热点事件相关的海量推特流中总结出随时间演化的简要推特集,以帮助用户快速获取信息.推特作为当今最流行的社交媒体平台,其信息量爆发式的增长以及文本碎片的非结构性,使得单纯依赖文本内容的传统摘要方法不再适用.与此同时,社交媒体的新特性也为推特摘要带来了新的机遇.将推特流视作信号,剖析了其中的复杂噪声,提出融合推特流随时序变化的宏微观信号以及用户社交上下文语境信息的时序推特摘要新方法.首先,通过小波分析对推特流全局时序信息建模,实现某一关键词相关的热点子事件时间点检测;接着,融入推特流局部时序信息和用户社交信息建立推特的随机步图模型摘要框架,为每个热点子事件生成推特摘要.在算法评估过程中,对真实推特数据集进行了专家时间点和专家摘要的人工标注,实验结果表明了小波分析和融合了时序-社交上下文语境的图模型在时序推特摘要中的有效性.  相似文献   

10.
With the proliferation of mobile devices and the growing necessity for gender information in personalized intelligent systems, gender prediction of mobile users has become an important research issue. Text data in mobile devices are known to have high discriminative power for gender, but transmitting those data to the outside of a device has a security risk and raises a privacy concern of users. This study introduces an on-device gender prediction framework, by which the entire data analysis is performed inside a device minimizing the privacy risk. To cope with the resource limitation of mobile devices, gender information of a user is predicted by matching the user’s mobile text data against gender representative wordsets which are constructed from web documents using a word evaluation measure. From the experiments conducted on real-world datasets, the effectiveness of the proposed framework was confirmed, and it was concluded that not only discriminability of a word but also popularity should be considered for the on-device gender prediction. The proposed framework is simple yet very powerful for gender prediction that its practical application to various expert and intelligent systems is possible attributed to the low computational complexity and high prediction performances.  相似文献   

11.
We present an optimization-based unsupervised approach to automatic document summarization. In the proposed approach, text summarization is modeled as a Boolean programming problem. This model generally attempts to optimize three properties, namely, (1) relevance: summary should contain informative textual units that are relevant to the user; (2) redundancy: summaries should not contain multiple textual units that convey the same information; and (3) length: summary is bounded in length. The approach proposed in this paper is applicable to both tasks: single- and multi-document summarization. In both tasks, documents are split into sentences in preprocessing. We select some salient sentences from document(s) to generate a summary. Finally, the summary is generated by threading all the selected sentences in the order that they appear in the original document(s). We implemented our model on multi-document summarization task. When comparing our methods to several existing summarization methods on an open DUC2005 and DUC2007 data sets, we found that our method improves the summarization results significantly. This is because, first, when extracting summary sentences, this method not only focuses on the relevance scores of sentences to the whole sentence collection, but also the topic representative of sentences. Second, when generating a summary, this method also deals with the problem of repetition of information. The methods were evaluated using ROUGE-1, ROUGE-2 and ROUGE-SU4 metrics. In this paper, we also demonstrate that the summarization result depends on the similarity measure. Results of the experiment showed that combination of symmetric and asymmetric similarity measures yields better result than their use separately.  相似文献   

12.
多文档自动文摘能够帮助人们自动、快速地获取信息,是目前的一个研究热点。相比于单文档自动文摘,多文档自动文摘需要更多考虑文档之间的相关性,以及文档信息之间的冗余性。因此如何控制信息冗余是多文档自动文摘的一个关键所在。该文在考虑文摘特性的基础上提出了一个冗余度控制模型,该模型通过计算文本单元在主题概率分布之间的相似度来决定句子的选择,从而达到控制冗余的目的。实验结果表明,该方法能够有效降低冗余度,且总体性能优于现有的自动文摘系统。  相似文献   

13.
张庆  周璠  华成  徐光华 《计算机工程》2012,38(13):224-227
针对数控机床可靠性信息分散、信息采集手段和管理方式落后等问题,分析机床可靠性信息构成,构建机床可靠性信息共享模型和交互模型,为机床在不同阶段、不同人员、不同平台间的信息交换和共享奠定基础。通过面向可靠性信息模型的XML Schema,建立可靠性分析的信息索引XML文档,实现信息的统一组织表达。对可靠性信息的采集、管理及应用进行系统部署,为机床企业的可靠性信息管理提供完整的解决方案。  相似文献   

14.
随机冲浪模型;顺序关系;主题相关性;句子重排  相似文献   

15.

Given an information need and the corresponding set of documents retrieved, it is known that user assessments for such documents differ from one user to another. One frequent reason that is put forward is the discordance between text complexity and user reading fluency. We explore this relationship from three different dimensions: quantitative features, subjective-assessed difficulty, and reader/text factors. In order to evaluate quantitative features, we wondered whether it is possible to find differences between documents that are evaluated by the user and those that are ignored according to the complexity of the document. Secondly, a task related to the evaluation of the relevance of short texts is proposed. For this end, users evaluated the relevance of these short texts by answering 20 queries. Documents complexity and relevance assessments were done previously by some human experts. Then, the relationship between participants assessments, experts assessments and document complexity is studied. Finally, a third experimentation was performed under the prism of neuro-Information Retrieval: while the participants were monitored with an electroencephalogram (EEG) headset, we tried to find a correlation among EEG signal, text difficulty and the level of comprehension of texts being read during the EEG recording. In light of the results obtained, we found some weak evidence showing that users responded to queries according to text complexity and user’s reading fluency. For the second and third group of experiments, we administered a sub-test from the Woodcock Reading Mastery Test to ensure that participants had a roughly average reading fluency. Nevertheless, we think that additional variables should be studied in the future in order to achieve a sound explanation of the interaction between text complexity and user profile.

  相似文献   

16.
XML can supply the standard data type in information exchange format on a lot of data generated in running database or applied programs for a company by using the advantage that it can describe meaningful information directly. Accordingly since there are increasing needs for the efficient management and telemedicine security of the massive volume of XML data, it is necessary to develop a secure access control mechanism for XML. The existing access control has not taken information structures and semantics into full consideration due to the fundamental limitations of HTML. In addition, access control for XML documents allows read operations only, and there are problems of slowing down the system performance due to the complex authorization evaluation process. To resolve this problem, this paper designs and builds a XACS (XML Access Control System), which is capable of making fined-grained access control. This only provides data corresponding to its users’ authority levels by authorizing them to access only the specific items of XML documents when they are searching XML documents in telemedicine. To accomplish this, XACS eliminates certain parts of the documents that are inaccessible and transmits the parts accessible depending on the users’ authority levels. In addition, it can be expanded to existing web servers because XML documents are used based on the normal web sites. The telemedicine secure and the guidelines are provided to enable quick and precise understanding of the information, and thus the safety enhancement gets improved. Ultimately, this paper suggests an empirical telemedicine application to confirm the adequacy and validity using the proposed method.  相似文献   

17.
主题关键词信息融合的中文生成式自动摘要研究   总被引:2,自引:0,他引:2  
随着大数据和人工智能技术的迅猛发展,传统自动文摘研究正朝着从抽取式摘要到生成式摘要的方向演化,从中达到生成更高质量的自然流畅的文摘的目的.近年来,深度学习技术逐渐被应用于生成式摘要研究中,其中基于注意力机制的序列到序列模型已成为应用最广泛的模型之一,尤其在句子级摘要生成任务(如新闻标题生成、句子压缩等)中取得了显著的效果.然而,现有基于神经网络的生成式摘要模型绝大多数将注意力均匀分配到文本的所有内容中,而对其中蕴含的重要主题信息并没有细致区分.鉴于此,本文提出了一种新的融入主题关键词信息的多注意力序列到序列模型,通过联合注意力机制将文本中主题下重要的一些关键词语的信息与文本语义信息综合起来实现对摘要的引导生成.在NLPCC 2017的中文单文档摘要评测数据集上的实验结果验证了所提方法的有效性和先进性.  相似文献   

18.
As information is available in abundance for every topic on internet, condensing the important information in the form of summary would benefit a number of users. Hence, there is growing interest among the research community for developing new approaches to automatically summarize the text. Automatic text summarization system generates a summary, i.e. short length text that includes all the important information of the document. Since the advent of text summarization in 1950s, researchers have been trying to improve techniques for generating summaries so that machine generated summary matches with the human made summary. Summary can be generated through extractive as well as abstractive methods. Abstractive methods are highly complex as they need extensive natural language processing. Therefore, research community is focusing more on extractive summaries, trying to achieve more coherent and meaningful summaries. During a decade, several extractive approaches have been developed for automatic summary generation that implements a number of machine learning and optimization techniques. This paper presents a comprehensive survey of recent text summarization extractive approaches developed in the last decade. Their needs are identified and their advantages and disadvantages are listed in a comparative manner. A few abstractive and multilingual text summarization approaches are also covered. Summary evaluation is another challenging issue in this research field. Therefore, intrinsic as well as extrinsic both the methods of summary evaluation are described in detail along with text summarization evaluation conferences and workshops. Furthermore, evaluation results of extractive summarization approaches are presented on some shared DUC datasets. Finally this paper concludes with the discussion of useful future directions that can help researchers to identify areas where further research is needed.  相似文献   

19.
涉案舆情新闻文本摘要任务是从涉及特定案件的舆情新闻文本中,获取重要信息作为其简短摘要,因此对于相关人员快速掌控舆情态势具有重要作用。涉案舆情新闻文本摘要相比开放域文本摘要任务,通常涉及特定的案件要素,这些要素对摘要生成过程有重要的指导作用。因此,该文结合深度学习框架,提出了一种融入案件要素的涉案舆情新闻文本摘要方法。首先构建涉案舆情新闻摘要数据集并定义相关案件要素,然后通过注意力机制将案件要素信息融入新闻文本的词、句子双层编码过程中,生成带有案件要素信息的新闻文本表征,最后利用多特征分类层对句子进行分类。为了验证算法有效性,在构造的涉案舆情新闻摘要数据集上进行实验。实验结果表明,该方法相比基准模型取得了更好的效果,具有有效性和先进性。  相似文献   

20.
Automatic text summarization is an essential tool in this era of information overloading. In this paper we present an automatic extractive Arabic text summarization system where the user can cap the size of the final summary. It is a direct system where no machine learning is involved. We use a two pass algorithm where in pass one, we produce a primary summary using Rhetorical Structure Theory (RST); this is followed by the second pass where we assign a score to each of the sentences in the primary summary. These scores will help us in generating the final summary. For the final output, sentences are selected with an objective of maximizing the overall score of the summary whose size should not exceed the user selected limit. We used Rouge to evaluate our system generated summaries of various lengths against those done by a (human) news editorial professional. Experiments on sample texts show our system to outperform some of the existing Arabic summarization systems including those that require machine learning.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号