A Bayesian Spam Filtering Method Based on Words Probability
(一种基于成词概率的贝叶斯垃圾邮件过滤方法)
Citation: LIN Wei. A Bayesian Spam Filtering Method Based on Words Probability[J]. Microcomputer Development (微机发展), 2011(9): 242-244, 249.
Author: LIN Wei (林伟)
Affiliation: Department of Computer Science, Sichuan Police College, Luzhou 646000, Sichuan, China
Funding: Sichuan Province Youth Software Innovation Project Fund (2007AA42)
Abstract: Bayesian classification performs well in filtering English e-mail but has consistently performed poorly on Chinese e-mail. Feature selection is an important step in spam filtering and can effectively improve filtering results. This paper uses word-formation probability (成词概率, rendered as "words probability" in the English title) as the basis for feature selection: a candidate feature set is built constructively, information gain is then used to measure the relationship between each feature and the classes, and the N features with the largest information gain are selected to form the final feature vector space. E-mails are then classified with the Bayesian method on this basis. Experimental results show that the method outperforms the traditional Bayesian method based on mechanical word segmentation in both classification time and classification effectiveness.

Keywords: spam; word-formation probability; Bayesian method
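
The abstract names the pipeline (candidate features, information-gain ranking, top-N selection, Bayesian classification) but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes each e-mail has already been reduced to a list of candidate features; how word-formation probability builds those candidates is not specified in the abstract and is not reproduced here. The Bernoulli (present/absent) model, the Laplace smoothing, and all function names (`information_gain`, `select_features`, `train_naive_bayes`, `classify`) are illustrative assumptions, and the English tokens merely stand in for Chinese candidate features.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, feature):
    """IG = H(class) - H(class | feature present or absent)."""
    classes = sorted(set(labels))
    with_f = Counter(l for d, l in zip(docs, labels) if feature in d)
    without_f = Counter(l for d, l in zip(docs, labels) if feature not in d)
    h_class = entropy([labels.count(c) for c in classes])
    h_cond = 0.0
    for part in (with_f, without_f):
        size = sum(part.values())
        h_cond += size / len(docs) * entropy([part[c] for c in classes])
    return h_class - h_cond


def select_features(docs, labels, n_features):
    """Keep the n_features candidates with the largest information gain."""
    vocab = sorted({f for d in docs for f in d})
    ranked = sorted(vocab, key=lambda f: information_gain(docs, labels, f),
                    reverse=True)
    return ranked[:n_features]


def train_naive_bayes(docs, labels, features):
    """Estimate class priors and P(feature present | class).

    Assumption: Bernoulli model with add-one (Laplace) smoothing.
    """
    prior, likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, l in zip(docs, labels) if l == c]
        prior[c] = len(class_docs) / len(docs)
        likelihood[c] = {
            f: (sum(f in d for d in class_docs) + 1) / (len(class_docs) + 2)
            for f in features
        }
    return prior, likelihood


def classify(doc, prior, likelihood):
    """Return the class with the highest log-posterior score."""
    present = set(doc)
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for f, p in likelihood[c].items():
            score += math.log(p if f in present else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data: hypothetical candidate features, not taken from the paper.
docs = [["free", "invoice"], ["invoice", "discount"],
        ["meeting", "notice"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]

features = select_features(docs, labels, n_features=2)
prior, likelihood = train_naive_bayes(docs, labels, features)
print(features, classify(["free", "invoice"], prior, likelihood))
```

Restricting the classifier to the top-N features keeps the per-message cost proportional to N rather than to the full candidate vocabulary, which is consistent with the abstract's claim of improved classification time, although the paper's actual measurements are not reproduced here.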
This article is indexed in databases including VIP (维普).