首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 609 毫秒
1.
罗琼程  吴强 《计算机应用研究》2009,26(12):4572-4576
动态优化是动态二进制翻译研究中一个十分重要的课题,数据预取优化能提高现代处理器体系结构应用程序性能。基于超级块(Superblock)的动态数据预取优化采用软件插桩方式收集应用程序的load访存延迟信息并构造Superblock;然后根据延迟信息以及Superblock数据流分析得出的寄存器定值引用关系,对延迟load指令进行预取优化。通过在龙芯DigitalBridge动态二进制翻译系统上实验验证,数据预取优化可以提高翻译后SPEC2000浮点测试程序代码的平均性能3.3%,开销远小于0.5%。  相似文献   

2.
基于动态插桩的工具被广泛应用于程序分析中, 但该类工具都面临着严重的性能问题。这类工具的性能开销主要由两部分组成, 即插桩引擎的开销和用户定义的分析程序的开销。为降低用户定义的分析工具的开销, 首先分析了基于动态插桩的工具的性能开销的组成, 并通过实验分析了造成性能开销的几点原因及其对工具的性能影响; 根据分析结论提出了使用离线分析方式来优化工具性能, 最后通过并行数据收集来进一步提升工具性能。使用该方法能减少分析程序5%~15%的CPU占用时间。  相似文献   

3.
陈彬  肖侬  蔡志平  王志英 《软件学报》2010,21(12):3186-3198
针对大规模虚拟机环境下软件的按需部署,提出了一种基于预取的按需软件部署优化机制,能够降低用户端虚拟机的启动延迟以及为用户提供更好的虚拟机本地运行性能.基于用户使用软件的行为特点以及虚拟磁盘映像的细粒度分割,预取机制在后台对服务器端存储的虚拟磁盘映像进行预取,通过一种基于访问频率和优先级的预取目标识别算法AFPTR(access frequency and priority-based prefetch target recognition)和一种预取量动态调节机制,将预取集中在用户使用的少数小尺寸的虚拟磁盘映像上,并在预取过程中对预取量进行动态自适应地调节,以提高虚拟磁盘访问的本地命中率,进而提高用户端虚拟机的运行性能.基于QEMU虚拟机和Linux平台,实现了基于预取的按需软件部署原型系统.实验结果表明,预取机制能够有效地降低虚拟机的启动延迟,并能提高虚拟机的本地运行性能,支持虚拟机环境下按需、快速的软件部署.  相似文献   

4.
针对应用在移植到异构多核高性能计算机系统中所面临的可移植性差以及性能优化难度大的问题,文中提出一种面向异构多核架构的自适应编译框架.通过源到源编译解决传统并行编程模型应用向异构多核架构的映射问题;同时利用动态剖分信息,自适应地调整插桩并配置优化策略,形成迭代式的自动优化过程.文中自适应编译框架将软硬件映射机制与优化策略结合,有效地解决了同构并行应用向异构多核架构的移植问题并提高了应用的整体性能.实验结果表明,文中基于Cell架构实现的原型系统,很好地解决了异构多核架构下应用移植性等问题,同时应用性能有所提高.  相似文献   

5.
模糊测试是一种行之有效的软件缺陷检测方法.其基本思想是生成大量随机输入,从而广泛探索程序行为,并以此发现程序崩溃和崩溃背后的软件缺陷.显然,纯随机的输入无法高效探索程序行为,大量程序缺陷也难以导致崩溃.为了进一步提升模糊测试的有效性,模糊测试往往引入静态插桩技术,用于加快探索程序状态空间速度,提升发现缺陷的能力.因此,引入静态插桩已经成为当下模糊测试的经典实践.聚焦于模糊测试场景下的插桩需求,除了介绍静态插桩技术的基本原理外,从安全特性强化和导向信息收集两个视角出发,系统性地分析了当下静态插桩的典型方法.同时,针对插桩的额外开销问题,全面地测量了不同插桩方案下的程序的执行速度,并与基线的未插桩程序进行比对.最后基于上述分析和测量,初步展望了静态插桩的优化方向.  相似文献   

6.
飞行控制软件测试中插桩技术的优化方法   总被引:2,自引:1,他引:1       下载免费PDF全文
飞行控制软件主要采用Host-Target的仿真测试模式,并基于插桩技术实现覆盖测试。针对采用传统的程序插桩技术,往往会大量增加程序运行时间,降低程序的实时性甚至导致软件失效的问题,通过分析插桩在程序仿真测试中对程序各阶段执行时间和程序实时性的影响,提出一种优化桩信息传输过程的插桩测试方法。实验结果表明了该方法的有效性。  相似文献   

7.
基于性能分析的自适应插桩框架   总被引:1,自引:0,他引:1       下载免费PDF全文
插桩技术用于跟踪获取软件系统的运行时信息,是软件性能管理工具中不可或缺的一个部分。目前,存在的各类插桩工具所产生的插桩点在目标程序运行的过程中往往是不可改变的。在实际的程序异常、错误检测中,用户关心的程序代码的位置在不同阶段往往不同,因此目标程序运行的不同阶段所需要获取信息的插桩点位置是不同的,如果插桩点过多会导致插桩工具浪费系统资源,产生系统资源消耗过大的问题;插桩点过少则会导致无法准确定位目标软件发生异常位置。针对以上问题,使用基于线性回归和K-Means的分析模型,分析目标软件的性能数据,在其运行过程中动态地改变插桩点,尽可能的减少资源的消耗;另外,采用朴素贝叶斯分类模型对插桩类进行筛选,减少植入插桩点的类,可降低插桩带来的资源消耗。实验表明:与传统的工具相比,使用此插桩框架进行监控,被测网页的平均响应时间减少6.88%,同时对目标程序的干扰更小。  相似文献   

8.
传统的基于覆盖率反馈的模糊测试工具通过跟踪代码覆盖率来指导测试用例的变异,从而发现目标程序中潜在的漏洞。但在闭源软件的模糊测试过程中,跟踪覆盖率不仅带来额外的开销,而且在模糊测试开销中占据主导。本文通过对Windows平台闭源软件模糊测试开销的剖析,锁定其中两个主要来源,插桩开销和“预热”开销。基于上述分析,提出了一种基于稀疏插桩跟踪的模糊测试方法,在不影响覆盖率计算精度的前提下,采用基于稀疏插桩的跟踪策略,仅对目标程序中覆盖率不可推导的基本块或分支进行插桩跟踪,并根据跟踪结果推导其余基本块或分支的被覆盖情况;同时结合“预热”优化,避免因动态插桩平台反复启动以及对目标程序代码的重复翻译所引入的时间开销。基于上述方法实现的原型工具SiCsFuzzer,在Windows平台9个规模在286KB~19.3MB,类型涉及图片处理、视频处理、文件压缩、加密和文档处理等类型应用所组成的测试集上,跟踪覆盖率引入的额外开销为程序正常执行时间的1.1倍,比传统的基于覆盖率反馈的模糊测试工具快3倍,并发现PDFtk和XnView程序最新版本中的未知漏洞各1个。  相似文献   

9.
孙家泽  易刚  舒新峰 《计算机工程》2021,47(12):215-220
针对并发程序数据竞争检测时准确率低和开销大的问题,基于Adaboost模型设计并发程序数据竞争语句级检测方法。对多线程并发程序进行插桩操作,记录指令的相关内存信息,并对提取出的指令集做语句级转化处理,利用语句对相关属性特征构建并发程序Adaboost数据竞争检测模型,实现多线程程序数据竞争检测工具ADR。实验结果表明,相比于Eraser、Djit+和Thread Sanitizer工具,ADR能够在降低时间及内存开销的同时,有效提高分类准确率,验证了所提方法的有效性。  相似文献   

10.
针对C、C++程序常出现的内存泄漏、内存越界访问、内存的不匹配释放等错误进行了研究,分析了现有的内存错误检测工具和方法,在基于开源的动态二进制插桩框架Pin的基础上,采用函数族的内存信息块管理方法和生命周期法,实现了在Linux平台下运行的内存检测工具MemGuard原型.该原型能有效地检测出内存泄漏、内存越界访问、内存的不匹配释放等问题,并通过与运行在Valgrind上的工具Memcheck的对比实验证明了该原型的有效性以及高效性.  相似文献   

11.
Stride prefetching is recognized as an important technique to improve memory access performance. The prior work usually profiles and/or analyzes the program behavior offline, and uses the identified stride patterns to guide the compilation process by injecting the prefetch instructions at appropriate places. There are some researches trying to enable stride prefetching in runtime systems with online profiling, but they either cannot discover cross-procedural prefetch opportunity, or require special supports in hardware or garbage collection. In this paper, we present a prefetch engine for JVM (Java Virtual Machine). It firstly identifies the candidate load operations during just-in-time (JIT) compilation, and then instruments the compiled code to profile the addresses of those loads. The runtime profile is collected in a trace buffer, which triggers a prefetch controller upon a protection fault. The prefetch controller analyzes the trace to discover any stride patterns, then modifies the compiled code to inject the prefetch instructions in place of the instrumentations. One of the major advantages of this engine is that, it can detect striding loads in any virtual code places for both regular and irregular code, not being limited with plain loop or procedure scopes. Actually we found the cross-procedural patterns take about 30% of all the prefetchings in the representative Java benchmarks. Another major advantage of the engine is that it has runtime overhead much smaller (the maximal is less than 4.0%) than the benefits it brings. Our evaluation with Apache Harmony JVM shows that the engine can achieve an average 6.2% speed-up with SPECJVM98 and DaCapo on Intel Pentium 4 platform, in spite of the runtime overhead.  相似文献   

12.
Java virtual machine (JVM) crashes are often due to an invalid memory reference to the JVM heap. Before the bug that caused the invalid reference can be fixed, its location must be identified. It can be in either the JVM implementation or the native library written in C invoked from Java applications. To help system engineers identify the location, we implemented a feature using page protection that prevents threads executing native methods from referring to the JVM heap. This feature protects the JVM heap during native method execution; if the heap is referred to invalidly, it interrupts the execution by generating a page-fault exception. It then reports the location where the exception was generated. The runtime overhead for using this feature depends on the frequency of native method calls because the protection is switched on each time a native method is called. We evaluated the runtime overhead by running the SPECjvm98, SPECjbb2000, VolanoMark, and JFCMark benchmark suites on a PC with two Intel Xeon® 1.6 GHz processors. The performance loss was less than 2% for the benchmark items that do not call native methods so frequently (104 times per second) and 5%–20% for the benchmark items that do (104–105 times per second). The worst performance loss was 54%, which was recorded for a benchmark item that calls native methods 2.0×106 times per second.  相似文献   

13.
Heap sprays are a new buffer overflow attack (BOA) form that can significantly increase the successful chance of a BOA even though the attacked process is protected by a lot of state-of-the-art anti-BOA mechanisms, such as ASLR, non-executable stack/DEP, signature-based IDSes, and type-safe languages. In this paper, we propose a glibc-and-ASLR-based solution to heap sprays—Heap Spray Protector (HSP). HSP controls the number and location of int 80 instructions in a process and hides the whereabouts of the only legal int 80 instruction; hence, HSP makes it difficult for attackers to issue a system call, let alone a heap spray attack. Moreover HSP can help ASLR defend against memory information leaking attacks. Furthermore, because HSP only modifies the glibc library and the kernel, it does not need to modify any source code or executable file. Finally, HSP allows attackers to execute as much code as possible before an attack can really create some damage; therefore, it enables the attacked hosts to collect more information about attackers which may be useful to block future attacks. Experimental results show HSP implemented on a Linux platform can effectively defend a system against heap sprays with less than 4.56% performance overhead.  相似文献   

14.
In this paper we propose and evaluate a new data-prefetching technique for cache coherent multiprocessors. Prefetches are issued by a functional unit called a prefetch engine which is controlled by the compiler. We let second-level cache misses generate cache miss traps and start the prefetch engine in a trap handler. The trap handler is fast (40–50 cycles) and does not normally delay the program beyond the memory latency of the miss. Once started, the prefetch engine executes on its own and causes no instruction overhead. The only instruction overhead in our approach is when a trap handler completes after data arrives. The advantages of this technique are (1) it exploits static compiler analysis to determine what to prefetch, which is hard to do in hardware, (2) it uses prefetching with very little instruction overhead, which is a limitation for traditional software-controlled prefetching, and (3) it is accurate in the sense that it generates very little useless traffic while maintaining a high prefetching coverage. We also study whether one could emulate the prefetch engine in software, which would not require any additional hardware beyond support for generating cache miss traps and ordinary prefetch instructions. In this paper we present the functionality of the prefetch engine and a compiler algorithm to control it. We evaluate our technique on six parallel scientific and engineering applications using an optimizing compiler with our algorithm and a simulated multiprocessor. We find that the prefetch engine removes up to 67% of the memory access stall time at an instruction overhead less than 0.42%. The emulated prefetch engine removes in general less stall time at a higher instruction overhead.  相似文献   

15.
Dynamic flexibility is a major challenge in modern system design to react to context or applicative requirements evolutions. Adapting behaviors may impose substantial code modification across the whole system, in the field, without service interruption and without state loss. This paper presents the JnJVM, a full Java virtual machine (JVM) that satisfies these needs by using dynamic aspect weaving techniques and a component architecture. It supports adding or replacing its own code, while it is running, with no overhead on unmodified code execution. Our measurements reveal similar performance when compared with the monolithic JVM Kaffe. Three illustrative examples show different extension scenarios: (i) modifying the JVMs behavior; (ii) adding capabilities to the JVM; and (iii) modifying applications behavior. Copyright © 2008 John Wiley & Sons, Ltd.  相似文献   

16.
The popularity of Java and recent advances in compilation and execution technology for Java are making the language one of the preferred ones in the field of high-performance scientific and engineering computing. A distributed Java Virtual Machine supports transparent parallel execution of multi-threaded Java programs on a cluster of computers. It provides an alternative platform for high-performance scientific computations. In this paper, we present the design of a global object space for a distributed JVM. It virtualizes a single Java object heap across machine boundaries to facilitate transparent object accesses. We leverage runtime object connectivity information to detect distributed shared objects (DSOs) that are reachable from threads at different nodes to facilitate efficient memory management in the distributed JVM. Based on the concept of DSO, we propose a framework to characterize object access patterns, along three orthogonal dimensions. With this framework, we are able to effectively calibrate the runtime memory access patterns and dynamically apply optimized cache coherence protocols to minimize consistency maintenance overhead. The optimization devices include an object home migration method that optimizes the single-writer access pattern, synchronized method migration that allows the execution of a synchronized method to take place remotely at the home node of its locked object, and connectivity-based object pushing that uses object connectivity information to optimize the producer–consumer access pattern. Several benchmark applications in scientific computing have been tested on our distributed JVM. We report the performance results and give an in-depth analysis of the effects of the proposed adaptive solutions.  相似文献   

17.
基于垃圾收集的Java程序性能改善方法*   总被引:1,自引:0,他引:1  
JVM中堆的默认配置有时并不适用于大型程序,这会使系统在垃圾收集上耗费过多的时间,进而导致程序性能下降。简要分析了HotSpot JVM垃圾收集机制,然后结合已有研究,以一个网管程序为例,提出了通过调整垃圾收集提高程序运行效能的方法。  相似文献   

18.
Nowadays, Java is used in all types of embedded devices. For these memory-constrained systems, the automatic dynamic memory manager (Garbage Collector or GC) has been always a key factor in terms of the Java Virtual Machine (JVM) performance. Moreover, in current embedded platforms, power consumption is becoming as important as performance. Thus, in this paper we present an exploration, from an energy viewpoint, of the different possibilities of memory hierarchies for high-performance embedded systems when used by state-of-the-art GCs. This is a starting point for a better understanding of the interactions between the Java applications, the memory hierarchy and the GC.Hence, we subsequently present two techniques to reduce energy consumption on Java-based embedded systems, based on exploiting GC information. The first technique uses GC execution behavior to reduce leakage energy consumption taking advantage of the low-power mode of actual multi-banked SDRAM memories and it is intended for generational collectors. This technique can achieve a reduction up to 50% of SDRAM memory leakage.The second technique involves the inclusion of a software-controlled (scratch-pad) memory that stores GC instructions under the JVM control to reduce the active energy consumption and also improve the performance of the target embedded system and it is aimed at all kind of garbage collectors. For this last technique we have experimented with two different approaches for selecting the GC code to be stored in the scratchpad memory: one static and one dynamic. Our experimental results show that the proposed dynamic scratchpad management approach for GCs enables up to 63% energy consumption reduction and 25% performance improvement during the collector phase, which means, in terms of JVM execution, a global reduction of 29% and 17% for energy and cycles, respectively.Overall, this work outlines that the key for an efficient low-power implementation of Java Virtual Machines for high-performance embedded systems is the synergy between the GC choice, the memory architecture tuning, and the inclusion of power management schemes controlled by the JVM, exploiting knowledge of the GC behavior.  相似文献   

19.
Conventional methods supporting Java binary security mainly rely on the security of the host Java Virtual Machine (JVM). However, malicious Java binaries keep exploiting the vulnerabilities of JVMs, escaping their sandbox restrictions and allowing attacks on end-user systems. Administrators must confront the difficulties and dilemmas brought on by security upgrades. On the other hand, binary rewriting techniques have been advanced to allow users to enforce security policies directly on the mobile code. They have the advantages of supporting a richer set of security policies and a self-constrained written code. However, the high administrative and performance overhead caused by security configuration and code rewriting have prevented rewriters from becoming a practical security tool. In this paper, we address these problems by integrating binary code rewriters with Web caching proxies and build the security system called PB-JARS, a Proxy-based JAva Rewriting System. PB-JARS works as a complimentary system to existing JVM security mechanisms by placing another line of defense between users and their end-user systems. It gives system administrators centralized security control and management for the mobile code and security policies. We evaluated PB-JARS using a real Java binary traffic model derived from analyzing real Web trace records. Our results show that adding binary rewriting to a Web caching system can be very efficient in improving end-host security at a low cost.  相似文献   

20.
ZD码(ZigZag-decodable codes)是基于之字形解码算法设计生成的一类纠删码, 它仅需要少量的计算即可修复存储系统中的故障数据, 但需要存储相对其他纠删码更多的冗余数据以保证系统的高可靠性. 为了降低ZD码产生的存储开销, 本文通过分析当前在存储系统中使用的之字形解码的思想, 提出了一种优化的之字形解码算法. 新的解码算法能够更充分利用校验数据中的信息来完成数据修复. 基于新的解码算法, 本文相应的提出了一种新的ZD码编码方案, 由于新算法更高的信息利用率, 新的编码方案能够用更少的存储开销来满足存储系统的高可靠性. 实验结果表明, 本文提出的ZD码编码方案具有最优的存储开销, 且编解码性能远高于目前广泛使用的RS码.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号