
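Topic Analysis Based on LDA Model
Cite this article: SHI Jing, FAN Meng, LI Wan-Long. Topic Analysis Based on LDA Model[J]. Acta Automatica Sinica, 2009, 35(12): 1586-1592.
Authors: SHI Jing  FAN Meng  LI Wan-Long
Affiliation: 1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012
Abstract: Building on text segmentation, the topic of each segment is determined and the central topic of the whole text is then summarized, so that the topical thread of the text emerges. Topics are represented as strings of words. For accurate analysis, LDA (latent Dirichlet allocation) is used to model both the corpus and the text, Clarity is used to measure the similarity between blocks, and segment boundaries are identified at local minima. The topic words of each segment are extracted according to the Shannon information of the words, and are extended beyond the analyzed text through clustering of background words and topic-word association, in an attempt to uncover the meaning hidden beneath the surface of the words. Experiments show that the results of the text analysis are clearly better than those of other methods and can provide valuable preprocessing for subsequent text reasoning.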

Keywords: Topic analysis  LDA model  text segmentation  Gibbs sampling
Received: 2008-7-16
Revised: 2009-3-25

Topic Analysis Based on LDA Model
SHI Jing,FAN Meng,LI Wan-Long.Topic Analysis Based on LDA Model[J].Acta Automatica Sinica,2009,35(12):1586-1592.
Authors: SHI Jing  FAN Meng  LI Wan-Long
Affiliation: 1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012; 2. Department of Science and Research Administration, Changchun University of Technology, Changchun 130012; 3. College of Computer Science and Technology, Jilin University, Changchun 130012
Abstract: Segment topics are identified on the basis of text segmentation, and the main topic of the whole text is then generalized from them. Topics are represented by means of word clusters. LDA (latent Dirichlet allocation) is used to model the corpus and the text. Clarity is used as the measure of similarity between blocks, and segmentation points are identified at local minima. The topic words of each segment are extracted according to their Shannon information. With the help of background word clustering and topic-word association, words that do not appear explicitly in the analyzed text can also be used to express the topics, in an attempt to uncover the meaning behind the words. Experiments show that the analysis results are clearly better than those of other methods, providing valuable preprocessing for subsequent text reasoning.
Keywords: Topic analysis  latent Dirichlet allocation (LDA) model  text segmentation  Gibbs sampling
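
The two mechanical steps named in the abstract, placing segment boundaries at local minima of the inter-block similarity and ranking topic words by Shannon information, can be illustrated with a minimal sketch. It does not reproduce the paper's method: the similarity values are assumed to be given (the Clarity measure and the LDA model are not computed here), and the scoring rule (a word's count in the segment times its self-information under a background distribution) is only one plausible reading of "extracted according to Shannon information". The function names, the `floor` parameter and the toy data are all hypothetical.

```python
import math
from collections import Counter


def find_boundaries(similarities):
    """Return indices of strict local minima in a list of adjacent-block similarities.

    Assumption: similarities[i] is the similarity between block i and block i + 1,
    computed elsewhere (for example with a Clarity-style measure over LDA models).
    """
    return [
        i
        for i in range(1, len(similarities) - 1)
        if similarities[i] < similarities[i - 1]
        and similarities[i] < similarities[i + 1]
    ]


def topic_words(segment_tokens, background_probs, top_k=2, floor=1e-6):
    """Rank the words of a segment by count * self-information (in bits).

    Assumption: this scoring rule is illustrative, not the paper's formula.
    Words missing from background_probs get the probability `floor`.
    """
    counts = Counter(segment_tokens)
    scores = {
        word: -n * math.log2(background_probs.get(word, floor))
        for word, n in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Toy data, invented for illustration only.
similarities = [0.9, 0.4, 0.8, 0.7, 0.2, 0.6]
background = {"the": 0.5, "model": 0.25, "topic": 0.125, "text": 0.125}
segment = ["the", "topic", "model", "topic"]

print(find_boundaries(similarities))     # [1, 4]
print(topic_words(segment, background))  # ['topic', 'model']
```

In the toy run, the frequent background word "the" carries only 1 bit of information and is ranked below the rarer words "topic" and "model", which is the intended effect of weighting by self-information.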