
Penalized Matrix Decomposition Based on CLSVSM and Its Application in Text Topic Clustering
Cite this article: NIU Feng-gao, FENG Shi-jia, HUANG Chen. Penalized Matrix Decomposition Based on CLSVSM and Its Application in Text Topic Clustering[J]. Computer and Modernization, 2021, 0(5): 66-72.
Authors: NIU Feng-gao, FENG Shi-jia, HUANG Chen
Affiliation: School of Mathematical Sciences, Shanxi University, Taiyuan, Shanxi 030006
Funding: Shanxi Province Applied Basic Research Program (Excellent Young Scholars Fund) (201801D211002); National Statistical Science Research Project (2017LY04); Shanxi Province Higher Education Outstanding Achievement Cultivation Project (2019KJ004)
Abstract: A sound representation of text information is important for text topic clustering and retrieval. To address the high dimensionality of text representation models, this paper studies penalized matrix decomposition (PMD) on top of the co-occurrence latent semantic vector space model (CLSVSM). PMD imposes sparsity constraints on the vectors to extract core feature terms, from which the original data are reconstructed. Combining co-occurrence analysis theory with PMD, the semantic information among feature terms is mined further and a semantic kernel function (PMD_K) is constructed. Both methods are applied to text topic clustering. The experimental results show that PMD and PMD_K both cluster markedly better than the other methods compared; in terms of the F value, PMD_K improves on the earlier 95%CLSVSM_K method by 21.9%. Combining PMD with the text representation model improves the efficiency and accuracy of text topic clustering while avoiding complex operations on high-dimensional matrices.

Keywords: CLSVSM (Co-occurrence Latent Semantic Vector Space Model); penalized matrix decomposition (PMD); semantic kernel function; text topic clustering
Received: 2021-06-03
This article is indexed in Wanfang Data and other databases.
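
The abstract does not spell out the decomposition itself. As a rough illustration of what "sparsity constraints on the vectors" can look like, the sketch below assumes the commonly cited rank-one PMD formulation with L1 bounds on both factors (Witten, Tibshirani and Hastie): alternate soft-thresholded updates of u and v, choosing each threshold by bisection so that the unit-length factor satisfies its L1 bound. The function names (`pmd_rank1`, `sparse_unit`), the all-ones initialization, and the plain-list matrix format are assumptions made for this example; it is not the authors' implementation and involves no CLSVSM-specific steps.

```python
import math

def soft_threshold(vec, delta):
    """Shrink each entry toward zero by delta; entries with |x| <= delta become 0."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in vec]

def unit(vec):
    """Scale vec to unit L2 norm (an all-zero vector is returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [0.0] * len(vec)

def l1_norm(vec):
    return sum(abs(x) for x in vec)

def sparse_unit(vec, c, steps=60):
    """Unit-L2 vector obtained by soft-thresholding vec, with L1 norm <= c.

    The threshold is found by bisection. Assumes c >= 1 and a unique
    largest-magnitude entry; otherwise the bound may be unreachable and
    the result can collapse to the zero vector.
    """
    u = unit(vec)
    if l1_norm(u) <= c:          # bound already satisfied, no shrinkage needed
        return u
    lo, hi = 0.0, max(abs(x) for x in vec)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if l1_norm(unit(soft_threshold(vec, mid))) > c:
            lo = mid             # still too dense: shrink harder
        else:
            hi = mid             # feasible: try a smaller threshold
    return unit(soft_threshold(vec, hi))

def pmd_rank1(X, c_u, c_v, n_iter=50):
    """Rank-one sparse factorization X ~ d * u * v^T with L1-bounded u and v."""
    n, p = len(X), len(X[0])
    v = unit([1.0] * p)          # simplification: all-ones start, not an SVD start
    u = [0.0] * n
    for _ in range(n_iter):
        # u-step: threshold X v; v-step: threshold X^T u
        u = sparse_unit([sum(X[i][j] * v[j] for j in range(p)) for i in range(n)], c_u)
        v = sparse_unit([sum(X[i][j] * u[i] for i in range(n)) for j in range(p)], c_v)
    d = sum(u[i] * X[i][j] * v[j] for i in range(n) for j in range(p))
    return d, u, v
```

For a document-term matrix with rows as documents, the nonzero entries of `v` would mark the terms retained by the sparse factor. A unit-length vector of length p has L1 norm between 1 and sqrt(p), so bounds outside that range are either infeasible or have no effect; smaller bounds give sparser factors.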