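A multi-label classification method for bad information based on label similarity
Cite this article: LIU Zhuo-Ran, HU Yang, LIU Li, FENG Xu-peng, LIU Li-jun, HUANG Qing-song. A multi-label classification method for bad information based on label similarity [J]. Application Research of Computers, 2016, 33(4).
Authors: LIU Zhuo-Ran  HU Yang  LIU Li  FENG Xu-peng  LIU Li-jun  HUANG Qing-song
Affiliations: Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Educational Technology and Network Center, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Funding: National Natural Science Foundation of China (81360230); Innovation Fund for Technology-Based Small and Medium-Sized Enterprises, Ministry of Science and Technology (13C26215305404)
Abstract: In multi-label classification, the correlation between labels is an important factor affecting classification performance. Traditional classic multi-label classification methods, such as the BR algorithm and the ML-KNN algorithm, ignore the influence of label correlations on actual classification, and their results have remained unsatisfactory. For the multi-label classification of bad information, whose categories are very highly correlated, performance degrades even further. To address these problems, this paper improves RAkEL, a classic multi-label classification algorithm. First, the similarity coefficients between labels are computed from the training texts. Next, a comprehensive label similarity coefficient matrix is computed according to a user-defined hierarchy of bad information. Finally, during the RAkEL voting process, the final result label set is re-determined from the comprehensive label similarity and the center label. In a comparison with traditional classification methods on a real corpus, experiments show that the method performs well on the classification of bad information.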

Keywords: multi-label classification  label correlation  bad information  center label  similarity coefficient matrix
Received: 2014/12/17
Revised: 2016/2/24

A method based on label similarity for multi-label classification of bad information
LIU Zhuo-Ran, HU Yang, LIU Li, FENG Xu-peng, LIU Li-jun and HUANG Qing-song. A method based on label similarity for multi-label classification of bad information [J]. Application Research of Computers, 2016, 33(4).
Authors:LIU Zhuo-Ran  HU Yang  LIU Li  FENG Xu-peng  LIU Li-jun and HUANG Qing-song
Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Educational Technology and Network Center, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Abstract: In multi-label classification, the relationships between labels play an important role in classification performance. Traditional multi-label classification methods handle each label independently and ignore the influence of these relationships, so their results are often unsatisfactory, especially when dealing with bad information. To address these problems, a modified algorithm based on RAkEL, a classic multi-label classification algorithm, is presented. First, the similarity coefficients between labels are computed. Next, the similarity coefficient matrix between labels is calculated according to the hierarchy chart for bad information. Finally, in the voting process of RAkEL, the result label set is determined using the similarity coefficient matrix. Experimental results on a real corpus involving bad information show that the method achieves better performance than traditional multi-label classification methods.
Keywords:multi-label classification  bad information  similarity coefficient  similarity coefficient matrix  voting process
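
The first two steps in the abstract (label similarity from training data, then a combined matrix that also reflects the hierarchy) can be illustrated with a minimal sketch. The abstract does not give the similarity formula or the way the two sources are combined, so everything here is an assumption: Jaccard co-occurrence stands in for the similarity coefficient, a simple weighted average with a hypothetical weight `alpha` stands in for the combination, and the function names are made up for illustration.

```python
from itertools import combinations


def cooccurrence_similarity(label_sets, labels):
    """Jaccard similarity for every label pair, from training label sets.

    Assumption: Jaccard co-occurrence is a stand-in; the paper's actual
    similarity coefficient is not stated in the abstract.
    Every label appearing in label_sets must also appear in labels.
    """
    count = {label: 0 for label in labels}  # documents carrying each label
    pair_count = {}                         # documents carrying both labels
    for label_set in label_sets:
        for label in label_set:
            count[label] += 1
        for a, b in combinations(sorted(label_set), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    similarity = {}
    for a, b in combinations(sorted(labels), 2):
        both = pair_count.get((a, b), 0)
        union = count[a] + count[b] - both
        similarity[(a, b)] = both / union if union else 0.0
    return similarity


def combined_similarity(cooc_sim, hierarchy_sim, alpha=0.5):
    """Blend data-driven and hierarchy-based similarity.

    Assumption: a weighted average with hypothetical weight alpha; pairs
    missing from hierarchy_sim are treated as having hierarchy similarity 0.
    """
    return {
        pair: alpha * value + (1 - alpha) * hierarchy_sim.get(pair, 0.0)
        for pair, value in cooc_sim.items()
    }
```

With training label sets such as `[{"A", "B"}, {"A"}, {"B", "C"}, {"A", "B"}]`, the pair `("A", "B")` co-occurs in two of the four documents that carry either label, so its Jaccard similarity is 0.5. How the resulting matrix and the center label adjust the RAkEL votes is described only in the full paper and is not sketched here.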