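TMvis: A Visual Analysis System Based on LDA Topic Modelling
Cite this article: Tang Ying, Su Jianming, Tong Ning. TMvis: A Visual Analysis System Based on LDA Topic Modelling[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 0(10): 1728-1738
Authors: Tang Ying, Su Jianming, Tong Ning
Affiliation: College of Computer Science and Technology, Zhejiang University of Technology
Funding: National Natural Science Foundation of China, General Program (71571160); Zhejiang Provincial Public Welfare Technology Research Program (LGG19F020012)
Abstract: Topic modeling is an important class of text mining methods and is widely used to extract the topics of a text corpus, but its results are difficult to interpret and adjust. To assist users in building a dictionary and to help them understand and tune topic models, we design and implement a progressive visual analysis framework with two visualization workspaces: a corpus refinement workspace, which helps users build the dictionary efficiently, and a topic model workspace, which provides multi-scale information visualization to help users understand the topic model and improve it interactively. We implement TMvis, an interactive visual topic modeling system for the Web, and design a controlled experiment on the 20newsgroups news data that demonstrates the effectiveness of the method. In addition, a case study on Douban movie data verifies the practicality of the system.

Keywords: text data; topic modeling; visual analysis; model improvement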

TMvis: A Visual Analysis System Based on LDA Topic Modelling
Tang Ying, Su Jianming, Tong Ning. TMvis: A Visual Analysis System Based on LDA Topic Modelling[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 0(10): 1728-1738
Authors: Tang Ying, Su Jianming, Tong Ning
Affiliation: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310000
Abstract: Topic modeling is one of the most important text mining methods and has been widely used to analyze the topic composition of a text corpus. Its main drawback is that the results are difficult to interpret and adjust. To help users understand and manipulate topic models, we design and implement a progressive visual analysis framework with two visualization components: a corpus refinement component, which helps users construct the dictionary efficiently, and a topic modeling component, which presents multi-dimensional information about the topics and supports interactive manipulation of the topic model. The effectiveness of the proposed approach is tested in a controlled experiment on the 20 newsgroups news dataset. A case study on a real Douban movie dataset further verifies the practicality of TMvis.
Keywords: text data; topic modeling; visual analysis; model improving
This article is indexed in CNKI, VIP, and other databases.