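An Algorithm for Vectoring SMS Based on Word2vec
Cite as:王贵新,郑孝宗,张浩然,张小川.An Algorithm for Vectoring SMS Based on Word2vec[J].电子科技,2016,29(4):49.
Authors:王贵新  郑孝宗  张浩然  张小川
Affiliation:(1.School of Software Engineering,Chongqing Institute of Engineering,Chongqing 402260;2.School of Computer Science,Chongqing University of Technology,Chongqing 400054)
Funding:National Natural Science Foundation of China (60443004);university research funds (2014xcxtd05;2014xzky05)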
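Abstract:To address the still unsatisfactory performance of spam SMS filtering, this paper proposes a new feature extraction method for short messages. The method draws on recent results built on deep learning theory and on the Word2vec tool. Based on the content and structural characteristics of Chinese short messages, an SMS vectorization algorithm is designed with this tool. The algorithm maps each message to a vector, and spam classification experiments are run with it on a deep belief network. The experimental results show that generalization performance improves by about 5% over previously reported results.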

Keywords:deep belief network  deep learning  short messages  vectorization  

An Algorithm for Vectoring SMS Based on Word2vec
WANG Guixin,ZHENG Xiaozong,ZHANG Haoran,ZHANG Xiaochuan.An Algorithm for Vectoring SMS Based on Word2vec[J].Electronic Science and Technology,2016,29(4):49.
Authors:WANG Guixin  ZHENG Xiaozong  ZHANG Haoran  ZHANG Xiaochuan
Affiliation:(1.School of Software Engineering,Chongqing Institute of Engineering,Chongqing 402260,China; 2.School of Computer Science,Chongqing University of Technology,Chongqing 400054,China)
Abstract:This paper proposes a new feature extraction method for SMS to improve spam message filtering. The method uses recent results from deep learning theory together with the Word2vec tool. Taking the content and structural characteristics of Chinese short messages into account, an SMS vectorization algorithm is designed on top of this tool. The algorithm maps each text message to a vector. Classification experiments on spam messages are carried out with the proposed algorithm on a deep belief network. The results show that generalization performance improves by about 5% over previously reported results.
Keywords:deep belief nets  deep learning  short messages  vectoring  
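
The abstract does not spell out how a message is turned into a vector, so the following is only a minimal sketch of one common approach, not the authors' algorithm: average the word vectors of the tokens in a message (mean pooling). The embedding table `TOY_VECTORS`, the function name `vectorize_message`, and the sample tokens are all hypothetical; a real pipeline would first segment the Chinese text into words and look the vectors up in a trained Word2vec model.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings standing in for vectors trained with Word2vec.
# The values are made up for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "免费": [0.9, 0.1, 0.0],
    "中奖": [0.8, 0.2, 0.1],
    "开会": [0.1, 0.9, 0.3],
    "明天": [0.0, 0.7, 0.5],
}


def vectorize_message(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the mean of the vectors of all known tokens.

    Tokens missing from the table are skipped. If no token is known,
    the zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = vectors.get(tok)
        if vec is None:
            continue  # out-of-vocabulary token: ignore it
        for i in range(dim):
            total[i] += vec[i]
        known += 1
    if known == 0:
        return total
    return [x / known for x in total]


# Tokens are assumed to be already segmented; "点击" is not in the table.
spam_vec = vectorize_message(["免费", "中奖", "点击"], TOY_VECTORS, dim=3)
ham_vec = vectorize_message(["明天", "开会"], TOY_VECTORS, dim=3)
```

Each message ends up as a fixed-length vector regardless of how many words it contains, which is the property a downstream classifier such as a deep belief network needs from its input.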
This article is indexed in 万方数据 (Wanfang Data) and other databases.