
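Research on E-mail Classification Based on Dynamic Characteristics Library
Cite this article: MU Jun-peng, DONG Kui-feng, ZHANG Ming. Research on E-mail Classification Based on Dynamic Characteristics Library[J]. Computer and Modernization (计算机与现代化), 2012, 0(7): 120-123
Authors: MU Jun-peng  DONG Kui-feng  ZHANG Ming
Affiliations: [1] Shanghai Publishing and Printing College, Shanghai 200093 [2] College of Information Engineering, Shanghai Maritime University, Shanghai 201306
Funding: Supported by the Shanghai Maritime University Research Fund (20100091)
Abstract: As e-mail classification technology continues to develop, organizing and managing mail more effectively requires extracting dynamic features from constantly changing mail and classifying messages according to those features. Starting from the dynamic features of e-mail, this work implements a mail client program, uses the ICTCLAS word segmentation tool from the Chinese Academy of Sciences to segment Chinese e-mail accurately, computes feature weights with an improved TF-IDF algorithm, and runs simulation experiments with the WEKA mining tool. The experimental results show that classifying e-mail by its dynamic features is feasible and, to a certain extent, yields a reasonable and effective classification.

Keywords: dynamic features  e-mail classification  Chinese word segmentation  TF-IDF  WEKA  data mining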
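
The abstract does not say how the TF-IDF algorithm was improved, so the following is only a minimal sketch of the textbook TF-IDF weighting that such a pipeline would start from. The function name `tf_idf` and the sample messages are hypothetical; the token lists stand in for the output of a word segmenter such as ICTCLAS, which is not called here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute textbook TF-IDF weights.

    docs: list of token lists, one per message (already segmented).
    Returns one {term: weight} dict per message.
    """
    n_docs = len(docs)

    # Document frequency: in how many messages each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / total                  # relative term frequency
            idf = math.log(n_docs / df[term])   # rarer terms weigh more
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


# Hypothetical, pre-segmented messages (illustrative data only).
emails = [
    ["会议", "通知", "会议", "时间"],
    ["发票", "通知"],
    ["会议", "纪要"],
]
weights = tf_idf(emails)
```

With this plain formulation a term that occurs in every message gets weight zero, which is one reason practical systems often adjust the IDF term; whether the paper's improvement is of that kind is not stated in the abstract.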

Research on E-mail Classification Based on Dynamic Characteristics Library
MU Jun-peng,DONG Kui-feng,ZHANG Ming. Research on E-mail Classification Based on Dynamic Characteristics Library[J]. Computer and Modernization, 2012, 0(7): 120-123
Authors:MU Jun-peng  DONG Kui-feng  ZHANG Ming
Affiliation: 1. Shanghai Publishing and Printing College, Shanghai 200093, China; 2. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Abstract: As e-mail classification technology develops, organizing and managing mail more effectively requires extracting features from constantly changing e-mail and classifying messages according to those changing characteristics. This article approaches the problem through the dynamic characteristics of messages: it uses a mail client program, the ICTCLAS tool for Chinese word segmentation, an improved TF-IDF algorithm to calculate mail feature weights, and the WEKA mining tool to evaluate the results in a simulation experiment. The experimental results show that classifying mail by the dynamic characteristics of messages is feasible and that, to a certain extent, the method classifies mail reasonably and effectively.
Keywords:dynamic characteristics  mail classification  Chinese word segmentation  TF-IDF  WEKA  data mining
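
To run experiments in WEKA, feature weights are typically supplied in WEKA's ARFF file format. The abstract does not describe how the authors fed their data to WEKA, so the sketch below is an assumption about one plausible hand-off: the helper `to_arff`, the relation name, the generic `term_i` attribute names, and the class labels are all hypothetical.

```python
def to_arff(relation, vocabulary, rows, classes):
    """Render weighted messages as dense ARFF text.

    vocabulary: ordered list of terms, one numeric attribute each.
    rows: list of (weights_dict, class_label) pairs.
    classes: list of allowed class labels (nominal attribute).
    """
    lines = [f"@relation {relation}", ""]
    # Generic attribute names avoid quoting issues with arbitrary terms.
    for i, _ in enumerate(vocabulary):
        lines.append(f"@attribute term_{i} numeric")
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines.append("")
    lines.append("@data")
    for weights, label in rows:
        # Terms absent from a message get weight 0.
        values = [f"{weights.get(term, 0.0):.4f}" for term in vocabulary]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical example: two terms, one labeled message.
arff_text = to_arff(
    "emails",
    ["会议", "通知"],
    [({"会议": 0.5}, "work")],
    ["work", "finance"],
)
```

The resulting text can be saved to a `.arff` file and opened in WEKA; for large vocabularies the sparse ARFF variant would be more compact than the dense rows used here.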
This article is indexed in CNKI, VIP (维普), and other databases.