
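Research on the Text Clustering Method of Science and Technology Reports Based on the Topic Model
Cite this article: Qu Jingye, Chen Zhen, Zheng Yanning. Research on the Text Clustering Method of Science and Technology Reports Based on the Topic Model[J]. Library and Information Service (图书情报工作), 2018, 62(4): 113-120.
Authors: Qu Jingye, Chen Zhen, Zheng Yanning
Affiliations: 1. College of Information Technology and Media, Beihua University, Jilin 132013; 2. Institute of Scientific and Technical Information of China, Beijing 100038
Funding: This paper is one of the research outcomes of the Jilin Province Education Science "13th Five-Year Plan" project "Research on the Application of Project-Based Teaching in Basic University Computer Courses" (project no. GH170061).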
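Abstract: [Purpose/Significance] To explore a text clustering method that incorporates a topic model and takes science and technology reports as the document type, to extend technology monitoring services based on scientific literature into a new area, and to propose a new method for semantic analysis of science and technology reports. [Method/Process] Using the science and technology reports in the National Science and Technology Report Service System as the data source, the study first applies the LDA topic model to mine topics from the preprocessed reports, then applies a clustering algorithm combining Ward and K-means to the text vectors that carry the topic distribution information, in an attempt to propose a new text mining method suited to clustering science and technology report documents. [Result/Conclusion] The experimental results show that the LDA topic model can mine the topic information in science and technology reports effectively and accurately, and that the proposed combination of Ward and K-means clusters science and technology reports better than other traditional clustering algorithms.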

Keywords: science and technology report; topic model; LDA; text clustering
Received: 2017-08-12
Revised: 2017-11-13

Research on the Text Clustering Method of Science and Technology Reports Based on the Topic Model
Qu Jingye, Chen Zhen, Zheng Yanning. Research on the Text Clustering Method of Science and Technology Reports Based on the Topic Model[J]. Library and Information Service, 2018, 62(4): 113-120.
Authors: Qu Jingye, Chen Zhen, Zheng Yanning
Affiliation: 1. Information Technology and Media College of Beihua University, Jilin 132013; 2. Institute of Scientific and Technical Information of China, Beijing 100038
Abstract: [Purpose/significance] This paper explores a text clustering method for science and technology reports based on the topic model, opens a new area for technology monitoring services built on scientific literature, and puts forward a new semantic analysis method for science and technology reports. [Method/process] Taking the reports in the national science and technology report service system as the data source, the study first conducts topic mining with the LDA model on the preprocessed text; it then carries out a clustering analysis, using a combination of Ward and K-means, on the text vectors of the abstracts, which contain the topic distribution information. On this basis it proposes a text mining method suited to clustering science and technology reports. [Result/conclusion] The experimental results show that the LDA model mines the topics of science and technology reports effectively and accurately, and that the proposed combination of Ward and K-means clusters science and technology reports better than other traditional clustering algorithms.
Keywords:science and technology report  topic model  LDA  text clustering  
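
The abstract does not state how Ward and K-means are combined. A minimal sketch of one common reading, offered as an assumption rather than the authors' procedure: Ward's agglomerative criterion produces an initial partition, and the centroids of that partition seed K-means. The input rows are hypothetical document-topic distributions of the kind an LDA model would output; all function names and values below are illustrative.

```python
from itertools import combinations

def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(vectors, k):
    """Naive agglomerative clustering with Ward's merge cost.

    Repeatedly merges the pair of clusters whose union gives the smallest
    increase in within-cluster variance, until k clusters remain.
    Returns a list of k lists of row indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = centroid([vectors[i] for i in clusters[a]])
            cb = centroid([vectors[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            cost = na * nb / (na + nb) * sq_dist(ca, cb)  # Ward criterion
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def kmeans(vectors, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

def ward_then_kmeans(vectors, k):
    """Assumed combination: Ward partition centroids seed K-means."""
    seeds = [centroid([vectors[i] for i in c]) for c in ward_clusters(vectors, k)]
    return kmeans(vectors, seeds)

# Hypothetical document-topic distributions (each row sums to 1).
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.10, 0.10, 0.80],
]
labels, centers = ward_then_kmeans(doc_topics, k=3)
print(labels)
```

Seeding K-means from a hierarchical partition removes its dependence on random initialization, which may be one motivation for combining the two; the pairwise search here is cubic in the number of documents, so a real corpus would call for a library implementation.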