
Study on Intelligent Classification of Improved TFIDF 95598 Work Order Based on LDA
Original title: 基于LDA主题的改进TFIDF 95598工单智能分类研究
Cite this article: WU Guanghua, LI Hongyu, LIU Ergang, LIU Changfa, LI Qian. Study on Intelligent Classification of Improved TFIDF 95598 Work Order Based on LDA[J]. Microcomputer Applications (微型电脑应用), 2020(3): 87-90.
Authors: WU Guanghua (武光华), LI Hongyu (李洪宇), LIU Ergang (刘二刚), LIU Changfa (柳长发), LI Qian (李倩)
Affiliation: Electric Power Research Institute, State Grid Hebei Electric Power Co., Ltd., Shijiazhuang 050000
Abstract: To improve the accuracy of intelligent classification of 95598 work orders, an improved TFIDF algorithm based on LDA (Latent Dirichlet Allocation) is proposed. Feature words are first extracted from the text and then clustered with the K-means algorithm. An LDA model is constructed to obtain the probability distributions θ and φ, from which the semantic influence (SI) is computed and used as the weight of each feature word. The improved algorithm is denoted SI-TFIDF (semantic influence-term frequency inverse document frequency). SI-TFIDF and the traditional TFIDF algorithm are both used to extract feature words from the Sogou database, and the extracted words are clustered with K-means. The comparison shows that feature words extracted by SI-TFIDF cluster better than those extracted by TFIDF, which verifies the reliability of the proposed method. In simulation experiments on 95598 complaint work orders, SI-TFIDF achieves a higher clustering accuracy than the traditional TFIDF algorithm, which indicates that SI-TFIDF is better suited to classifying complaint work orders.

Keywords: 95598; complaint work orders; Latent Dirichlet allocation; term frequency inverse document frequency
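
The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the formula for the semantic influence SI, so the code below uses a stand-in that is purely an assumption: SI(w, d) = Σ_k θ[d][k]·φ[k][w], the probability of word w in document d under the topic model. The toy documents and the values of θ and φ are hypothetical and hand-picked rather than fitted by LDA, and the K-means clustering step is omitted. It should be read as one possible interpretation, not as the authors' implementation.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf * idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def semantic_influence(theta_d, phi, term):
    """ASSUMED stand-in for SI: sum over topics k of theta[d][k] * phi[k][term]."""
    return sum(theta_d[k] * phi[k].get(term, 0.0) for k in range(len(theta_d)))

def si_tfidf(docs, theta, phi):
    """Scale each TF-IDF weight by the (assumed) semantic influence."""
    base_weights = tfidf(docs)
    return [
        {term: w * semantic_influence(theta[d], phi, term)
         for term, w in doc_weights.items()}
        for d, doc_weights in enumerate(base_weights)
    ]

# Hypothetical toy data: three tokenized work orders and two topics.
docs = [
    ["outage", "repair", "outage"],
    ["bill", "charge", "bill"],
    ["outage", "bill", "delay"],
]
theta = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # document-topic distribution
phi = [                                        # topic-word distribution
    {"outage": 0.5, "repair": 0.3, "delay": 0.2},
    {"bill": 0.5, "charge": 0.3, "delay": 0.2},
]

base = tfidf(docs)
weighted = si_tfidf(docs, theta, phi)
```

In this toy setup, plain TF-IDF ranks "repair" above "outage" in the first document because "repair" is rarer across documents, while the SI-scaled weights rank "outage" higher because it is more probable under that document's dominant topic. The resulting weight vectors could then be passed to any K-means implementation.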
This article is indexed in VIP (维普) and other databases.