
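Malicious code classification algorithm based on multi-feature fusion
Cite this article:LANG Dapeng, DING Wei, JIANG Haochen, CHEN Zhiyuan. Malicious code classification algorithm based on multi-feature fusion[J]. Journal of Computer Applications, 2019, 39(8): 2333-2338.
Authors:LANG Dapeng  DING Wei  JIANG Haochen  CHEN Zhiyuan
Affiliation:College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
Funding:Open project of the Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences (10201050201).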
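Abstract:Most studies on malicious code classification focus on family classification or on separating malicious from benign code, while classification by category is comparatively rare. To address this, a malicious code classification algorithm based on multi-feature fusion is proposed. Three groups of features are extracted from texture maps and disassembly files and fused for classification. First, gray-level co-occurrence matrix features are extracted from the source files and the disassembly files, and opcode sequences are extracted with the n-gram algorithm. Next, an improved Information Gain (IG) algorithm is used to extract the opcode features. The feature groups are then standardized and learned with Random Forest (RF) as the classifier. Finally, a random forest classifier based on multi-feature fusion is implemented. In training and testing on nine categories of malicious code, the proposed algorithm reaches 85% accuracy, which is higher than random forest with a single feature and than multi-layer perceptron and Logistic regression classifiers with multiple features.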

Keywords:malicious code  texture feature  opcode sequence  random forest  static analysis
Received:2019-01-16
Revised:2019-04-17
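
The opcode steps named in the abstract (n-gram extraction followed by information-gain-based selection) can be illustrated with a minimal sketch. The record does not describe the paper's improved IG variant, so the code below uses the standard information gain of a binary "n-gram present" feature instead. The function names, the toy opcode sequences, and the class labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=3):
    """Return the n-grams (as tuples) of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(samples, labels, gram):
    """Standard IG of the binary feature 'gram occurs in the sample'.

    Assumption: this is textbook information gain, not the improved
    variant used in the paper, which this record does not specify.
    """
    present = [y for grams, y in zip(samples, labels) if gram in grams]
    absent = [y for grams, y in zip(samples, labels) if gram not in grams]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


# Hypothetical toy corpus: (opcode sequence, category label).
corpus = [
    (["push", "mov", "call", "pop"], "worm"),
    (["push", "mov", "call", "ret"], "worm"),
    (["xor", "mov", "jmp", "ret"], "trojan"),
    (["xor", "mov", "jmp", "pop"], "trojan"),
]
samples = [set(opcode_ngrams(ops, n=2)) for ops, _ in corpus]
labels = [label for _, label in corpus]

# Rank every observed 2-gram by information gain, highest first.
vocabulary = sorted(set().union(*samples))
ranked = sorted(vocabulary,
                key=lambda g: information_gain(samples, labels, g),
                reverse=True)
```

In this toy corpus the 2-gram `("push", "mov")` appears in every "worm" sample and in no "trojan" sample, so it carries the full one bit of label entropy. An n-gram that occurs in only a single sample scores lower.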

Malicious code classification algorithm based on multi-feature fusion
LANG Dapeng,DING Wei,JIANG Haochen,CHEN Zhiyuan.Malicious code classification algorithm based on multi-feature fusion[J].Journal of Computer Applications,2019,39(8):2333-2338.
Authors:LANG Dapeng  DING Wei  JIANG Haochen  CHEN Zhiyuan
Affiliation:1. College of Computer Science and Technology, Harbin Engineering University, Harbin Heilongjiang 150001, China;2. Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
Abstract:Most research on malicious code classification addresses either family classification or the distinction between malicious and benign code, while classification by category has received comparatively little attention. To address this gap, a malicious code classification algorithm based on multi-feature fusion was proposed. Three sets of features extracted from texture maps and disassembly files were fused for classification. Firstly, gray-level co-occurrence matrix features were extracted from the source files and disassembly files, and opcode sequences were extracted with the n-gram algorithm. Secondly, an improved Information Gain (IG) algorithm was used to extract the opcode features. Thirdly, the feature groups were standardized and a Random Forest (RF) classifier was trained on them. Finally, the random forest classifier based on multi-feature fusion was implemented. In training and testing on nine types of malicious code, the proposed algorithm achieved 85% accuracy, higher than random forest with a single feature and than multi-layer perceptron and Logistic regression classifiers with multiple features.
Keywords:malicious code  texture feature  opcode sequence  Random Forest (RF)  static analysis
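
The texture feature mentioned in the abstract, the gray-level co-occurrence matrix (GLCM), can be illustrated with a small standard-library sketch. It assumes that a file's raw bytes are laid out as a grayscale image of fixed width, which is a common way to build such texture maps but is not confirmed by this record. The image width, the number of gray levels, the pixel offset, and the three statistics computed are all illustrative choices, and the function names are hypothetical.

```python
def bytes_to_image(data, width):
    """Lay raw bytes out as rows of `width` grayscale pixels (0-255).

    A trailing partial row is dropped.
    """
    rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(rows)]


def glcm(image, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).

    Assumes the image contains at least one valid pixel pair.
    """
    # Quantize 0-255 intensities into `levels` gray levels.
    q = [[p * levels // 256 for p in row] for row in image]
    height, width = len(q), len(q[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[q[y][x]][q[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "angular_second_moment": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Hypothetical 16-byte "file": four rows of dark, dark, bright, bright.
data = bytes([0, 0, 255, 255] * 4)
image = bytes_to_image(data, width=4)
features = glcm_features(glcm(image, levels=2))
```

Statistics like these form one feature group. In a fusion setup of the kind the abstract describes, each group would be standardized and concatenated with the opcode features before training a classifier such as a random forest.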
This article is indexed in Wanfang Data and other databases.