
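An image spam filtering method based on integrated learning
Cite this article:ZHAO Jun-sheng, HOU Sheng, WANG Xin-yu, YIN Yu-jie. An image spam filtering method based on integrated learning[J]. Computer Engineering & Science (计算机工程与科学), 2020, 42(6): 1049-1059.
Authors:ZHAO Jun-sheng (赵俊生)  HOU Sheng (候圣)  WANG Xin-yu (王鑫宇)  YIN Yu-jie (尹玉洁)
Affiliation:(College of Information Engineering, Inner Mongolia University of Technology, Hohhot, Inner Mongolia 010080)
Funding:Natural Science Foundation of Inner Mongolia Autonomous Region; Key Natural Science Fund of Inner Mongolia University of Technology; National Natural Science Foundation of China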
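Abstract:Most current image spam filtering techniques train on internationally used spam image data sets, whose images differ in character from the image spam circulating in China. The image data are not updated in real time and only a single classifier is used, so filtering performance is hard to guarantee. To address this problem, a database of domestic spam images is built first. Color, texture, and shape features are extracted from the images, and the K-NN classification algorithm is used to select the HSV color histogram as the preferred feature, on which different classifiers are trained, tested, and compared. Three base classifiers (a rough set-based K-NN algorithm, the Naive Bayes algorithm, and the SVM algorithm) are then combined, and a strong ensemble classifier is formed by serial iterative boosting. The method filters domestic image spam effectively and raises accuracy and recall at the same time, to 97.3% and 96.1% respectively, while lowering the false positive rate to 2.7%.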

Keywords:image spam filtering  image classification  ensemble learning  K-NN algorithm  HSV color histogram
Received:2019-08-10
Revised:2020-01-10
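
The abstract names the HSV color histogram as the feature selected for classification but does not give its parameters. The following is a minimal sketch of such a feature using only the Python standard library. The function name `hsv_histogram`, the 8x3x3 bin layout, and the normalization to unit sum are illustrative assumptions, not the authors' settings.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 3, 3)):
    """Return a normalized, flattened HSV histogram.

    pixels: iterable of (r, g, b) tuples with channels in 0..255.
    bins:   number of bins for hue, saturation, value (assumed layout).
    """
    h_bins, s_bins, v_bins = bins
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp the upper edge (value 1.0) into the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    if count == 0:
        return hist
    # Normalize so images of different sizes are comparable.
    return [x / count for x in hist]


# Toy "image": three pure-red pixels and one pure-blue pixel.
feature = hsv_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
```

The resulting vector has one entry per (hue, saturation, value) bin and could be passed to any of the classifiers the abstract mentions. Pixel data would in practice come from an image decoding library, which is outside this sketch.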

An image spam filtering method based on integrated learning
ZHAO Jun-sheng, HOU Sheng, WANG Xin-yu, YIN Yu-jie. An image spam filtering method based on integrated learning[J]. Computer Engineering & Science, 2020, 42(6): 1049-1059.
Authors:ZHAO Jun-sheng  HOU Sheng  WANG Xin-yu  YIN Yu-jie
Affiliation:(College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China)
Abstract:Currently, the majority of image spam filtering techniques use an internationally common image spam data set as the training set. Such a data set is not updated in real time, and its images differ in character from the image spam circulating in China. In addition, only a single type of classifier is employed, so filtering performance is hard to guarantee. To address this issue, a database of domestic spam images is constructed first, and the color, texture, and shape features of the images are extracted. The K-NN classification algorithm is then used to select the HSV color histogram feature, on which different classifiers are trained, tested, and compared. Three base classifiers (rough set-based K-NN, Naive Bayes, and SVM) are combined by a serial iterative boosting method to form a strong ensemble classifier, which can effectively filter domestic image spam. The accuracy and recall of image spam filtering are improved at the same time, to 97.3% and 96.1% respectively, and the false positive rate is reduced to 2.7%.
Keywords:image spam filtering  image classification  integrated learning  K-NN algorithm  HSV color histogram  
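
The abstract describes combining three base classifiers through serial iterative boosting without giving the update rule. The sketch below shows one common way this could look: AdaBoost-style vote weights computed in sequence over a fixed pool of already-trained classifiers. This is an assumption, not the authors' algorithm. In particular, the base classifiers are not retrained on the reweighted samples here, and the three threshold rules are hypothetical stand-ins for trained K-NN, Naive Bayes, and SVM models.

```python
import math


def boost_combine(classifiers, samples, labels):
    """Compute one AdaBoost-style vote weight (alpha) per classifier.

    classifiers: callables mapping a sample to +1 (spam) or -1 (normal).
    labels:      true labels, each +1 or -1.
    """
    n = len(samples)
    weights = [1.0 / n] * n
    alphas = []
    for clf in classifiers:
        preds = [clf(x) for x in samples]
        # Weighted error of this classifier under the current sample weights.
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        alphas.append(alpha)
        # Increase the weight of misclassified samples for the next round.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return alphas


def predict(classifiers, alphas, x):
    """Weighted vote of all classifiers; ties go to +1."""
    score = sum(a * clf(x) for a, clf in zip(alphas, classifiers))
    return 1 if score >= 0 else -1


# Hypothetical stand-ins: each rule looks at one feature of a 3-value sample.
classifiers = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.5 else -1,
    lambda x: 1 if x[2] > 0.5 else -1,
]
# Toy data: each rule is wrong on a different pair of samples.
samples = [(0.2, 0.8, 0.9), (0.7, 0.1, 0.3), (0.9, 0.4, 0.8),
           (0.1, 0.6, 0.2), (0.8, 0.7, 0.3), (0.3, 0.2, 0.6)]
labels = [1, -1, 1, -1, 1, -1]

alphas = boost_combine(classifiers, samples, labels)
ensemble_preds = [predict(classifiers, alphas, x) for x in samples]
```

On this toy data each stand-in rule misclassifies two of the six samples, while the weighted vote classifies all six correctly, which illustrates why combining complementary classifiers can improve on any single one. The reported 97.3% accuracy and 96.1% recall come from the authors' experiments and are not reproduced by this sketch.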
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.