
基于改进K-均值的微博热点话题发现方法
Cite this article:陈阳键,温秋华.基于改进K-均值的微博热点话题发现方法[J].太赫兹科学与电子信息学报,2023,21(3):378-383.
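Authors:陈阳键(CHEN Yangjian)  温秋华(WEN Qiuhua)
Affiliations:1.Digital Service Center, Guangzhou Open University (Guangzhou Radio and TV University), Guangzhou, Guangdong 510000;2.College of Information Science and Technology, Jinan University, Guangzhou, Guangdong 510000
Funding:Ninth Batch of Education and Teaching Reform Fund Projects for Universities in Guangzhou, Guangdong Province (2017F10)
Abstract:Micro-blog text data is high-dimensional and shows pronounced synonymy and polysemy. Traditional hot topic detection methods that combine the vector space model (VSM) with K-means suffer from low accuracy, high computational complexity, and cluster centers that are hard to determine. This paper proposes a micro-blog text vectorization method in which a relevance vector machine (RVM) optimizes the VSM. First, the adaptive feature selection ability of the RVM is used to reduce the dimensionality of the VSM feature vectors. Next, principal component analysis (PCA) determines the initial cluster centers for K-means, and the K-means algorithm then produces the clustering result. Finally, a heat index is defined from the numbers of reposts, comments, and high-influence users of the micro-blogs, and the topic with the largest heat index is taken as the current hot topic. Experiments on a real micro-blog text dataset show that, relative to two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.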
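
The abstract does not say how PCA is turned into initial cluster centers, so the following is only a minimal sketch under one common assumption: documents are projected onto the first principal component, sorted by that score, split into k equal-sized groups, and each group's mean seeds one K-means center. The function names (`first_principal_direction`, `pca_seeded_centers`, `kmeans`) and the toy vectors are hypothetical, and the RVM-based feature selection step is not reproduced here.

```python
import math

def first_principal_direction(centered, iters=200):
    """Approximate the first principal component by power iteration."""
    n, d = len(centered), len(centered[0])
    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = X^T (X v), proportional to the covariance matrix times v.
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def pca_seeded_centers(rows, k):
    """Assumed seeding rule: split documents along the first component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = first_principal_direction(centered)
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    order = sorted(range(n), key=lambda i: scores[i])
    centers = []
    for g in range(k):  # requires n >= k so that no group is empty
        group = order[g * n // k:(g + 1) * n // k]
        centers.append([sum(rows[i][j] for i in group) / len(group)
                        for j in range(d)])
    return centers

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, centers, max_iters=50):
    """Plain Lloyd iterations starting from the supplied centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(len(centers)),
                          key=lambda c: sq_dist(r, centers[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy term-count vectors (illustrative only, not data from the paper).
docs = [[3, 0, 1], [4, 0, 0], [3, 1, 0], [0, 3, 1], [0, 4, 0], [1, 3, 0]]
labels, centers = kmeans(docs, pca_seeded_centers(docs, k=2))
```

Seeding from the data's dominant direction makes the result deterministic, which is the usual motivation for replacing random initialization.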

Keywords:hot topic detection  vector space model  topic clustering  data dimensionality reduction  micro-blog
Received:2020/9/14
Revised:2021/2/9

Micro-blog hot topic detection method based on improved K-means
CHEN Yangjian,WEN Qiuhua.Micro-blog hot topic detection method based on improved K-means[J].Journal of Terahertz Science and Electronic Information Technology,2023,21(3):378-383.
Authors:CHEN Yangjian  WEN Qiuhua
Abstract:Micro-blog text data is high-dimensional and exhibits pronounced synonymy and polysemy. Traditional topic detection methods based on the Vector Space Model(VSM) combined with K-means suffer from low accuracy, high computational complexity, and difficulty in determining the cluster centers. A text vectorization method in which a Relevance Vector Machine(RVM) optimizes the VSM is proposed. First, the dimension of the VSM feature vectors is reduced by using the adaptive feature selection ability of the RVM; then Principal Component Analysis(PCA) is applied to determine the initial cluster centers of the K-means algorithm, and K-means is employed to obtain the clustering results. Finally, a heat index is defined according to the numbers of micro-blog reposts, comments, and high-influence users, and the topic with the largest heat index is taken as the current hot topic. Experiments on a real micro-blog text dataset show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
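
The abstract names the inputs of the heat index (reposts, comments, high-influence users) but not its formula. As a hedged illustration only, the sketch below assumes a simple weighted sum; the class `TopicStats`, the functions `heat_index` and `hottest_topic`, the unit weights, and the counts are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class TopicStats:
    name: str
    reposts: int
    comments: int
    influential_users: int

def heat_index(t, w_repost=1.0, w_comment=1.0, w_user=1.0):
    # Assumed form: a weighted sum of the three counts named in the abstract.
    return (w_repost * t.reposts
            + w_comment * t.comments
            + w_user * t.influential_users)

def hottest_topic(topics, **weights):
    # The topic with the largest heat index is taken as the current hot topic.
    return max(topics, key=lambda t: heat_index(t, **weights))

# Toy counts for three clustered topics (illustrative only).
topics = [
    TopicStats("topic_a", reposts=120, comments=40, influential_users=2),
    TopicStats("topic_b", reposts=90, comments=150, influential_users=5),
    TopicStats("topic_c", reposts=30, comments=20, influential_users=1),
]
hot = hottest_topic(topics)
```

Keeping the weights as parameters leaves room to emphasize high-influence users, for example `hottest_topic(topics, w_user=50.0)`, without changing the selection logic.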
Keywords:hot topic detection  Vector Space Model  topic clustering  data dimensionality reduction  Micro-blog