首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 781 毫秒
1.
方磊  魏强  武泽慧  杜江  张兴明 《计算机科学》2021,48(10):286-293
二进制代码相似性检测在程序的追踪溯源和安全审计中都有着广泛而重要的应用.近年来,神经网络技术被应用于二进制代码相似性检测,突破了传统检测技术在大规模检测任务中遇到的性能瓶颈,因此基于神经网络嵌入的代码相似性检测技术逐渐成为热门研究.文中提出了一种基于神经网络的二进制函数相似性检测技术,该技术首先利用统一的中间表示来消除不同汇编代码在指令架构上的差异;其次在程序基本块级别,利用自然语言处理的词嵌入模型来学习中间表示代码,以获得基本块语义嵌入;然后在函数级别,利用改进的图神经网络模型来学习函数的控制流信息,同时兼顾基本块的语义,获得最终的函数嵌入;最后通过计算两个函数嵌入向量间的余弦距离来度量函数间的相似性.文中实现了一个基于该技术的原型系统,实验表明该技术的程序代码表征学习过程能够避免人为偏见的引入,改进的图神经网络更适合学习函数的控制流信息,系统的可扩展性和检测的准确率较现有方案都得到了提升.  相似文献   

2.
提出了一种基于结构化指纹的静态分析模型,用于辅助逆向工作者对恶意代码及其变种进行分析.该方法依据所提取的恶意代码及其变种的结构化指纹特征,在调用图和控制流图两个层次对两个文件进行同构比较,找出发生改变的函数以及发生改变的基本块,从而帮助逆向工作者迅速定位和发现恶意代码及其变种的不同之处,便于进一步分析.该模型采用了结构化特征及素数乘积等方法,可以较好地对抗一些常见的代码迷惑手段,从而识别出一些变形的代码是等价的.  相似文献   

3.
二进制小片段代码指令序列较短,基本块逻辑调用图结构简单,有限语义信息影响代码相似性比较结果,为此提出一种融合知识表示学习的二进制代码小片段相似性比较模型(BSM)。分别提取小片段代码的函数知识和函数代码,利用注意力机制和双向长短记忆得到知识嵌入,使用序列学习模型或图神经网络得到函数嵌入,融合知识嵌入和函数嵌入作为小片段代码向量表示。实验结果表明,BSM模型在跨平台比较上优于其它对比模型,说明模型能提升小片段代码比较的准确度。  相似文献   

4.
夏冰  庞建民  周鑫  单征 《计算机应用》2022,42(4):985-998
随着物联网和工业互联网的快速发展,网络空间安全的研究日益受到工业界和学术界的重视。由于源代码无法获取,二进制代码相似性搜索成为漏洞挖掘和恶意代码分析的关键核心技术。首先,从二进制代码相似性搜索基本概念出发,给出二进制代码相似性搜索系统框架;然后,围绕相似性技术系统介绍二进制代码语法相似性搜索、语义相似性搜索和语用相似性搜索的发展现状;其次,从二进制哈希、指令序列、图结构、基本块语义、特征学习、调试信息恢复和函数高级语义识别等角度总结比较现有解决方案;最后,展望二进制代码相似性搜索未来发展方向与前景。  相似文献   

5.
李继中  蒋烈辉  舒辉  常瑞 《计算机应用》2014,34(4):1025-1028
二进制代码中的密码算法识别与筛选对于恶意软件分析、密码算法应用安全性验证有着重要意义。分析了密码函数代码实现中内存数据操作特征和基本块循环结构特征,根据二进制数据的信息熵理论,实验验证了密码算法内存操作数据的高熵值特性,构建了基于动态循环信息熵的密码函数筛选模型,并采用动静结合的方法重构基本块循环中的动态读写内存数据。测试结果表明了筛选模型的可靠性和准确性。  相似文献   

6.
二进制翻译系统是一种基于软件的跨平台代码迁移系统,它将一种体系结构的二进制代码翻译成另一种体系结构的二进制代码.二进制翻译可以用于解决遗产代码的迁移问题,也可以实现不同硬件平台之间软件的通用.浮点栈的处理已成为以X86为源的二进制翻译的研究中的关键性问题之一,如何处理X86浮点栈问题直接关系到以X86为源的二进制翻译系统的性能.针对X86浮点寄存器栈的特征,提出了一种扩展虚拟栈(extending virtual stack)处理方案.它采用归一的方法,保证了每个基本块中的运算所涉及到的浮点寄存器可以直接映射到目标机器中的浮点寄存器,确保了翻译的效率,并利用翻译时的分析避免了在入口处不必要的判断;同时还给出了在基本块入口处判别一个基本块是否会出现浮点栈上溢和下溢的充分必要条件,为生成更加高效的代码提供了条件.实验表明,它能够在保证正确实现其功能的前提下,获得更好的执行效率.  相似文献   

7.
二进制翻译技术能够有效解决二进制兼容问题,促进新型体系结构的发展,也是虚拟机技术的重要组成部分,具有重要的研究和应用价值,但是其效率仍然有待提升,特别是目标代码生成的效率。设计了一种高效的目标代码生成算法——代码生成的子图覆盖算法(subgraph covering for code generation,SCCG),能够以尽可能少的代价生成精简的目标代码。该算法应用数据流图对二进制代码中的基本块进行建模,获取指令间的数据相关,并采用基于子图覆盖的贪心算法得到目标代码。在TransARM原型系统中进行了实现和测试,结果表明该算法获得了更优质的目标代码,并且成本得到了有效控制。  相似文献   

8.
Fuzzing测试是一种基于缺陷注入的自动软件测试技术,近几年来广泛应用于软件测试、安全漏洞挖掘等领域.当Fuzzing测试出现异常时,需要人工逆向精确定位触发漏洞的代码位置,分析的工作量大,而且分析效率较为低下.论文以现有二进制比较技术为基础,根据二进制目标软件中代码运行的依赖关系,提出一种针对执行记录的两阶段比较算法,在函数级别和基本块级别上对文件正常执行和出现异常的跟踪记录内容进行比较,得到所有相同和不同的指令执行序列.依托指令序列信息,可以有效辅助人工分析,大大降低分析的工作量,提高分析效率.  相似文献   

9.
软件安全逆向分析中程序结构解析模型设计   总被引:2,自引:0,他引:2  
提出了一种基于二进制文件的程序结构解析模型。该模型通过对二进制文件反汇编,去除汇编文件中的冗余信息,对汇编文件进行静态分析,构建带有索引依赖信息的基本块,并以该基本块为基础提取二进制程序的内部控制流与函数调用关系信息,最后给出程序内部控制流图以及函数调用关系图。该模型不依赖程序的源文件,以二进制文件为分析对象,实用性和通用性比较好;实验结果表明模型对二进制程序内部结构解析具有较高的准确性。  相似文献   

10.
提出动态二进制翻译的两种优化方案:基本块和热路径;分析了从代码中抽取值得优化部分的详细过程;同时也给出针对这两种方案的一些优化方法.最后简单介绍了当前一些动态二进制翻译系统所采用的优化技术.  相似文献   

11.
An abstraction resilient to common malware obfuscation techniques is the call-graph. A call-graph is the representation of an executable file as a directed graph with labeled vertices, where the vertices correspond to functions and the edges to function calls. Unfortunately, most of the interesting graph comparison problems, including full-graph comparison and computing the largest common subgraph, belong to the \(NP\) -hard class. This makes the study and use of graphs in large scale systems difficult. Existing work has focused only on offline clustering and has not addressed the issue of clustering streams of graphs. In this paper we present Classy, a scalable distributed system that clusters streams of large call-graphs for purposes including automated malware classification and facilitating malware analysts. Since algorithms aimed at clustering sets are not suitable for clustering streams of objects, we propose the use of a clustering algorithm that relies on the notion of candidate clusters and reference samples therein. We demonstrate via thorough experimentation that this approach yields results very close to the offline optimal. Graph similarity is determined by computing a graph edit distance (GED) of pairs of graphs using an adapted version of simulated annealing. Furthermore, we present a novel lower bound for the GED. We also study the problem of approximating statistics of clusters of graphs when the distances of only a fraction of all possible pairs have been computed. Finally, we present results and statistics from a real production-side system that has clustered and contains more than 0.8 million graphs.  相似文献   

12.
A similarity metric method of obfuscated malware using function-call graph   总被引:1,自引:0,他引:1  
Code obfuscating technique plays a significant role to produce new obfuscated malicious programs, generally called malware variants, from previously encountered malwares. However, the traditional signature-based malware detecting method is hard to recognize the up-to-the-minute obfuscated malwares. This paper proposes a method to identify the malware variants based on the function-call graph. Firstly, the function-call graphs were created from the disassembled codes of program; then the caller–callee relationships of functions and the operational code (opcode) information about functions, combining the graph coloring techniques were used to measure the similarity metric between two function-call graphs; at last, the similarity metric was utilized to identify the malware variants from known malwares. The experimental results show that the proposed method is able to identify the obfuscated malicious softwares effectively.  相似文献   

13.
可执行文件比较广泛应用于软件版权检测、恶意软件家族检测、异常检测的模式更新以及补丁分析.传统方法无法满足应用对速度和精度的要求.在函数、基本块和指令级别上设计了一元指令签名、基于函数控制流程图邻接矩阵的函数一元结构签名、指令的强/中/弱一元签名,并提出了融合签名和属性的函数匹配算法、基本块匹配算法,从而简化了已有指令比较,可抗指令重排,优于SPP.并通过匹配权统计以及严格的最大唯一匹配策略和Hash进一步降低误报,提高效率.最后,实现原型工具PEDiff,并通过实验证实了该比较方法在速度和精度上具有良好的性能.  相似文献   

14.
Graph-based malware detection using dynamic analysis   总被引:1,自引:0,他引:1  
We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated by the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is sent to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but rather use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.  相似文献   

15.
Detection of malware using data mining techniques has been explored extensively. Techniques used for detecting malware based on structural features rely on being able to identify anomalies in the structure of executable files. The structural attributes of an executable that can be extracted include byte ngrams, Portable Executable (PE) features, API call sequences and Strings. After a thorough analysis we have extracted various features from executable files and applied it on an ensemble of classifiers to efficiently detect malware. Ensemble methods combine several individual pattern classifiers in order to achieve better classification. The challenge is to choose the minimal number of classifiers that achieve the best performance. An ensemble that contains too many members might incur large storage requirements and even reduce the classification performance. Hence the goal of ensemble pruning is to identify a subset of ensemble members that performs at least as good as the original ensemble and discard any other members.  相似文献   

16.
新出现的恶意代码大部分是在原有恶意代码基础上修改转换而来.许多变形恶意代码更能自动完成该过程,由于其特征码不固定,给传统的基于特征码检测手段带来了极大挑战.采用归一化方法,并结合使用传统检测技术是一种应对思路.本文针对指令乱序这种常用变形技术提出了相应的归一化方案.该方案先通过控制依赖分析将待测代码划分为若干基本控制块,然后依据数据依赖图调整各基本控制块中的指令顺序,使得不同变种经处理后趋向于一致的规范形式.该方案对指令乱序的两种实现手段,即跳转法和非跳转法,同时有效.最后通过模拟测试对该方案的有效性进行了验证.  相似文献   

17.
夏宏  刘立宇 《计算机工程》2008,34(3):113-115
寄生程序是指注入到可执行文件中的程序代码,被广泛地应用在二进制文件加解密、版权保护等领域。病毒也是寄生程序的一种。Linux下的寄生程序很难利用宿主没有加载的动态连接库,使其功能受到很大限制。该文通过对ELF动态连接机制的研究,采用了一种寄生程序通过proc文件系统进行加载和利用动态库的方法,并对这种方法进行了实现。  相似文献   

18.
A metamorphic virus is a type of malware that modifies its code using a morphing engine. Morphing engines are used to generate a large number of metamorphic malware variants by performing different obfuscation techniques. Since each metamorphic malware has its own unique structure, signature based anti-virus programs are ineffective to detect these metamorphic variants. Therefore, detection of these kind of viruses becomes an increasingly important task. Recently, many researchers have focused on extracting common patterns of metamorphic variants that can be used as micro-signatures to identify the metamorphic malware executables. With the similar motivation, in this work, we propose a novel metamorphic malware identification method, named HLES-MMI (Higher-level Engine Signature based Metamorphic Malware Identification). The proposed method firstly constructs a unique graph structure, called as co-opcode graph, for each metamorphic family, then extracts engine-specific opcode patterns from the graphs. Finally, it generates higher-level signature belonging to each family by representing the extracted opcode-patterns with a binary vector. Experimental results on four datasets produced by different morphing engines demonstrate the effectiveness and efficiency of the proposed method by comparing with several existing malware identification methods.  相似文献   

19.
Writing modern day executable packers has turned into a rather profitable business. In many cases, the reason for packing is not protecting genuine applications against piracy or plagiarism, but rather avoiding reverse-engineering and detection of malicious samples. Unlike developers, which show moderate interest for using a packer and lack time and resources for creating one, malware creators show a huge interest and are willing to spend large amounts of money to use this technology (especially if it offers protection against security solutions). This happens mainly because protecting from piracy and plagiarism isn’t that profitable as spreading new and undetected malware on as many computers as possible. Consequently, creating a custom packer designed to avoid malware detection has grown into a very profitable business.However, developing a good packer is not an easy task to accomplish. Novel techniques of achieving anti-static analysis, anti-virtual machine, anti-sandbox, anti-emulation, anti-debugging, anti-patching, and so on, have to be discovered and added regularly. From the malware creator’s perspective, this must happen frequently enough so that the updates are issued shortly after malware researchers analyze and bypass the existing mechanisms because, once these techniques are bypassed, the detection rate increases in the case of the malware samples packed with the old version of the packer.In this paper, we present our findings which resulted from closely monitoring the fight between malware researchers and packer developers during a period of almost two years. We focus on three different packers used for prevalent malware families like Upatre, Gamarue, Hedsen. We named those packers UPA 1, UPA 2, and UPA 3 and we discuss the mechanisms used in them to achieve anti-emulation. Each technique is presented by listing the code and explaining the inner workings in details. In the end, we manage to get a grasp of the current trends in achieving anti-emulation when developing modern packers.  相似文献   

20.
Malware has considerably increased recently, posing a serious security danger to both people and enterprises. In order to distinguish and stop the negative effects of malware, a variety of machine and deep learning approaches have been used to detect it. However, while extracting malware features, the feature-to-feature spatial hierarchy is not taken into account by the existing techniques and as a result, information is lost during the pooling operation. Hence, a modified capsule deep neural network was developed in which discriminative features are extracted from three channel image derived from malware binary with considering feature-to-feature spatial hierarchy. Also, conventional capsule deep neural network is modified by adding a global average pooling layer before fully connected layer thereby classified the dataset as malicious or benign without any loss of information. Moreover, these malwares were not accurately classified based on their families using existing variants of convolutional neural network (CNN) since malware family variants can modify due to minute changes in malware binaries. Hence, a hybrid deep convolutional neural network (DCNN) and long-short-term memory (LSTM) has been utilized that determine minute changes in malware binaries using LSTM without vanishing gradient issue and effectively perform malware family classification using DCNN. As a result, the proposed approach successfully identifies malware in executable files and categorizes malware into families with 98.5% accuracy.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号