
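基于事件要素的组合模型微博热点事件摘要提取 (Extracting Microblog Hot-Event Summaries with a Combined Model Based on Event Elements)
Citation: Li Gang, Xu Wei, Wang Xinping. 基于事件要素的组合模型微博热点事件摘要提取 [J]. 图书情报工作 (Library and Information Service), 2018, 62(1): 96-105.
Authors: Li Gang (李纲), Xu Wei (徐伟), Wang Xinping (王馨平)
Affiliation: School of Information Management, Wuhan University, Wuhan 430072
Funding: This article is one of the research outcomes of the National Social Science Fund of China Major Project "Research on Deep Aggregation and Services of Web Information Resources for Subject Domains" (Project No. 12&ZD221).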
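Abstract: [Purpose/significance] To help readers quickly grasp the full course of a hot event from the massive volume of microblog posts it generates, and to improve the accuracy and readability of microblog event summaries, this paper proposes a multi-model method, based on event elements, for extracting timeline summaries of microblog hot events. [Method/process] In view of the features of microblog text, the method combines the characteristics of the topic model (LDA) and the mutual information maximum entropy model (MaxEnt-MI) to extract summary keywords for an event, screens microblogs by their propagation value and topic relevance, and generates a timeline summary in the form time-summary keywords-summary microblog. [Result/conclusion] On a manually annotated test set, the F value is 8%-13% higher than that of the traditional TextRank method, and internal testing shows a clear improvement in summary readability. The number of experimental texts and test sets and the richness of the events need further expansion, and more weighting-strategy models should be considered to improve summary accuracy. The experimental results and test feedback indicate that the proposed method meets users' information needs for hot-event summaries well and improves the accuracy of microblog summary extraction.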

Keywords: text mining; event summarization; latent Dirichlet allocation; mutual information maximum entropy model
Received: 2017-05-10

Hot Event Summary on Micro-blog Generated by Multi Model Based on Event Elements
Li Gang, Xu Wei, Wang Xinping. Hot Event Summary on Micro-blog Generated by Multi Model Based on Event Elements [J]. Library and Information Service, 2018, 62(1): 96-105.
Authors:Li Gang  Xu Wei  Wang Xinping
Affiliation: School of Information Management, Wuhan University, Wuhan 430072
Abstract: [Purpose/significance] To help readers understand the context of news events on micro-blog platforms and to improve the readability and accuracy of micro-blog event summaries, we propose a method, based on event elements, for extracting event summaries organized along a time axis. [Method/process] Given the characteristics of micro-blog text, we combine the features of LDA and the mutual information maximum entropy model (MaxEnt-MI) to extract event summary keywords, screen micro-blogs by their communication value and theme relevance, and generate the event summary in the form time-keywords-micro-blog. [Result/conclusion] Compared with the traditional TextRank method on a manually labeled test set, the F value increases by 8% to 13%, and internal tests show that the readability of the summaries is significantly improved. The number of experimental texts and test sets and the richness of the events need to be further expanded, and more weighting strategies should be considered to improve the accuracy of the summaries. The experimental results and the test feedback show that the proposed method is feasible and effective: it meets users' needs for hot event summary information and improves the accuracy of micro-blog summary extraction.
Keywords: text mining; event summarization; latent Dirichlet allocation; mutual information maximum entropy model
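
The abstract names the selection criteria (communication value and theme relevance), the time-keywords-micro-blog output form, and the F value used for evaluation, but it gives no formulas. The Python sketch below is only an illustration under stated assumptions: the `Post` fields, the linear weighting with `alpha`, and the keyword-overlap `relevance` function are hypothetical and are not taken from the paper, and the per-day keywords are passed in as input rather than computed with LDA or MaxEnt-MI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    date: str      # posting date, e.g. "2017-05-01"
    text: str
    reposts: int   # assumed proxy for communication value
    comments: int  # assumed proxy for communication value


def relevance(text, keywords):
    """Fraction of the day's keywords that occur in the post text."""
    if not keywords:
        return 0.0
    return sum(1 for k in keywords if k in text) / len(keywords)


def build_timeline(posts, keywords_by_date, alpha=0.5):
    """For each date, keep the post with the highest combined score.

    alpha is a hypothetical weight between spread and relevance.
    """
    by_date = defaultdict(list)
    for p in posts:
        by_date[p.date].append(p)

    timeline = []
    for date in sorted(by_date):
        day_posts = by_date[date]
        kws = keywords_by_date.get(date, [])
        # Normalise spread by the day's maximum; fall back to 1 if all are zero.
        max_spread = max(p.reposts + p.comments for p in day_posts) or 1

        def score(p):
            spread = (p.reposts + p.comments) / max_spread
            return alpha * spread + (1 - alpha) * relevance(p.text, kws)

        best = max(day_posts, key=score)
        timeline.append((date, kws, best.text))
    return timeline


def f_value(selected, reference):
    """Harmonic mean of precision and recall over two sets of items."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(selected)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)
```

With this weighting, a widely reposted but off-topic post can lose to a less popular post that matches the day's keywords. How the two criteria are actually balanced in the paper is not stated in the abstract.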