期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

朱广林吕方赖庆宽陈华英何先波《计算机工程与科学》2021,43(6):962-968

编译优化技术的目的是挖掘程序中的优化空间,提高程序编译或运行效率,无效代码删除优化是被广泛使用的编译优化技术之一,它旨在删除程序中不可达的代码,以提升程序的执行效率。许多应用程序的执行路径往往与运行时的输入参数值相关,并且在一些分支路径上与运行时参数值相结合,可能存在无效代码,通过现有的无效代码删除优化,很难做出优化处理。为此,提出一种依赖数据流分析的激进蝴蝶优化方法,利用SSA中间表示,根据动态运行时的参数可能值,自动为程序生成代码形状类似蝴蝶（butterfly）的分支代码,使编译器在程序编译阶段为相关优化提供可行的优化依据。最后通过实验验证了该方法的有效性和可行性。相似文献

2.

Automating relational operations on data structures

Cohen D. Campbell N. 《Software, IEEE》1993,10(3):53-60

An approach to having compilers do most of the implementation detail work in programming that divides a program into two parts is described. The specification part describes what the program should do, but in a way that avoids commitment to implementation details. The annotation part provides implementation instructions that the compiler will carry out. Annotations affect execution efficiency, but not functional behavior. They are very high level and usually very short and hence encourage experimentation. To try out different implementation choices, the programmer simply changes the annotations and recompiles. The implementation details related to data representations are discussed. The testing of compilers that produce Lisp code for iteration, and for adding and deleting tuples of composite relations, is reviewed 相似文献

3.

Compilation techniques for parallel systems

《Parallel Computing》1999,25(13-14):1741-1783

Over the past two decades tremendous progress has been made in both the design of parallel architectures and the compilers needed for exploiting parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and effectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through which parallelism is detected and expressed in form of a program representation. Next compilation techniques for scheduling instruction level parallelism (ILP) are discussed along with the relationship between the nature of compiler support and type of processor architecture. Compilation techniques for exploiting loop and task level parallelism on shared-memory multiprocessors (SMPs) are summarized. Locality optimizations that must be used in conjunction with parallelization techniques for achieving high performance on machines with complex memory hierarchies are also discussed. Finally we provide an overview of compilation techniques for distributed memory machines that must perform partitioning of both code and data for parallel execution. Communication optimization and code generation issues that are unique to such compilers are also briefly discussed. 相似文献

4.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Grigori Fursin Yuriy Kashnikov Abdul Wahid Memon Zbigniew Chamski Olivier Temam Mircea Namolaru Elad Yom-Tov Bilha Mendelson Ayal Zaks Eric Courtois Francois Bodin Phil Barnard Elton Ashton Edwin Bonilla John Thomson Christopher K. I. Williams Michael O��Boyle 《International journal of parallel programming》2011,39(3):296-327

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on Intel Xeon processor. 相似文献

5.

Development of dynamic protection against timing channels

Shahrzad Kananizadeh Kirill Kononenko 《International Journal of Information Security》2017,16(6):641-651

Information systems face many threats, such as covert channels, which declassify hidden information by, e.g., analyzing the program execution time. Such threats exist at various stages of the execution of instructions. Even if software developers are able to neutralize these threats in source code, new attack vectors can arise in compiler-generated machine code from these representations. Existing approaches for preventing vulnerabilities have numerous restrictions related to both their functionality and the range of threats that can be found and removed. This study presents a technique for removing threats and generating safer code using dynamic compilation in an execution environment by combining information from program analysis of the malicious code and re-compiling such code to run securely. The proposed approach stores summary information in the form of rules that can be shared among analyses. The annotations enable us to conduct the analyses to mitigate threats. Developers can update the analyses and control the volume of resources that are allocated to perform these analyses by changing the precision. The authors’ experiments show that the binary code created by applying the suggested method is of high quality. 相似文献

6.

Quick compilers using peephole optimization

Jack W. Davidson David B. Whalley 《Software》1989,19(1):79-97

Abstract machine modelling is a popular technique for developing portable compilers. A compiler can be quickly realized by translating the abstract machine operations to target machine operations. The problem with these compilers is that they trade execution efficiency for portability. Typically, the code emitted by these compilers runs two to three times slower than the code generated by compilers that employ sophisticated code generators. This paper describes a C compiler that uses abstract machine modelling to achieve portability. The emitted target machine code is improved by a simple, classical rule-directed peephole optimizer. Our experiments with this compiler on four machines show that a small number of very general handwritten patterns (under 40) yields code that is comparable to the code from compilers that use more sophisticated code generators. As an added bonus, compilation time on some machines is reduced by 10 to 20 per cent. 相似文献

7.

Integration of semantic verifiers into Java language compilers

A. V. Klepinin A. A. Melentyev 《Automatic Control and Computer Sciences》2011,45(7):408-412

This paper introduces one method for source code static semantic analysis at compilation time directly within standard compilers. The method is implemented via unified integration with Java compilers to get full access to Abstract Syntax Tree (AST) of compiled files after semantic analysis stage of compilation process. The unified integration is implemented by common AST interfaces and adapters to AST implementations of Sun/Oracle javac and Eclipse Compiler for Java (ecj) compilers. This method provides transparent integration with Eclipse and Netbeans integrated developer environments without a need for any special plugins. Several examples of program verification rules are presented to demonstrate the method. 相似文献

8.

Compiling Java just in time 总被引：1，自引：0，他引：1

《Micro, IEEE》1997,17(3):36-43

The Java programming language promises portable, secure execution of applications. Early Java implementations relied on interpretation, leading to poor performance compared to compiled programs. Compiling Java programs to the native machine instructions provides much higher performance. Because traditional compilation would defeat Java's portability and security, another approach is necessary. This article describes some of the important issues related to just-in-time, or JIT, compilation techniques for Java. We focus on the JIT compilers developed by Sun for use with the JDK (Java Development Kit) virtual machine running on SPARC and Intel processors. (Access the Web at www.sun. com/workshop/java/jit for these compilers and additional information.) We also discuss performance improvements and limitations of JIT compilers. Future Java implementations may provide even better performance, and we outline some specific techniques that they may use 相似文献

9.

Safe,multiphase bounds check elimination in Java

Andreas Gampe Jeffery von Ronne David Niedzielski Jonathan Vasek Kleanthis Psarris 《Software》2011,41(7):753-788

As part of its type‐safety regime, Java's semantics require precise exceptions at runtime when programs attempt out‐of‐bound array accesses. This paper describes a Java implementation that utilizes a multiphase approach to identifying safe array accesses. This approach reduces runtime overhead by spreading the out‐of‐bounds checking effort across multiple phases of compilation and execution: production of mobile code from source code, just‐in‐time (JIT) compilation in the virtual machine, application method invocations, and the execution of individual array accesses. The code producer uses multiple passes (including common subexpression elimination, load elimination, induction variable substitution, speculation of dynamically verified invariants, and inequality constraint analysis) to identify unnecessary bounds checks and prove their redundancy. During class‐loading and JIT compilation, the virtual machine verifies the proofs, inserts code to dynamically validate speculated invariants, and generates code specialized under the assumption that the speculated invariants hold. During each runtime method invocation, the method parameters and other inputs are checked against the speculated invariants, and execution reverts to unoptimized code if the speculated invariants do not hold. The combined effect of the multiple phases is to shift the effort associated with bounds‐checking array access to phases that are executed earlier and less frequently, thus, reducing runtime overhead. Experimental results show that this approach is able to eliminate more bounds checks than prior approaches with minimal overhead during JIT compilation. These results also show the contribution of each of the passes to the overall elimination. Furthermore, this approach increased the speed at which the benchmarks executed by up to 16%. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

10.

攻击网页浏览器:面向脚本代码块的ROP Gadget注入

袁平海曾庆凯张云剑刘尧《软件学报》2020,31(2):247-265

即时编译机制（just-in-time compilation）改善了网页浏览器执行JavaScript脚本的性能,同时也为攻击者向浏览器进程注入恶意代码提供了便利.借助即时编译器,攻击者可以将脚本中的整型常数放置到动态代码缓存区,以便注入二进制恶意代码片段（称为gadget）.通过常数致盲等去毒化处理,基于常数的注入已经得到有效遏制.证实了不使用常数转而通过填充脚本代码块也能实施gadget注入,并实现图灵完备的计算功能.在编译一段给定的脚本代码时,即时编译器生成的动态代码中通常存在着一些固定的机器指令序列.这些指令序列的存在性不受常数致盲和地址空间布局随机化等安全机制的影响,同时,这些指令序列中可能蕴涵着攻击者期望的gadget.在实施攻击时,攻击者可以汇集特定的脚本代码块来构造一个攻击脚本,再借助即时编译器来注入gadget.在x86-64架构上评估了这种注入攻击在SpiderMonkey和GoogleV8这两个开源即时编译引擎上的可行性.通过给这两个引擎输入大量的JavaScript脚本,可以得到较为丰富的动态代码块.在这些动态代码块上的统计分析结果表明,这两个引擎生成的动态代码中都存在图灵完备的gadget集合.在实际攻击场景中,攻击者可以利用的脚本集合完全包含且远远多于实验用的脚本.因此,攻击者可以采用该方法注入需要的gadget,以便构造出实现任意功能的ROP（return-orientedprogramming）代码. 相似文献

11.

基于提示的移动代码安全检查

胡荣贵陈意云郭帆张昱《小型微型计算机系统》2004,25(2):187-191

基于语言内部安全机制能够有效地保证移动代码的安全执行，其思想是要在移动代码中附加详细且足够的满足安全策略检查的注解信息．基于提示的移动代码的安全检查，克服了目前PCC存在的验证条件必须回送和证明长度过于庞大等缺陷，从而获得了更佳的代码安全检查性能．相似文献

12.

Ahead-of-time compilation of JavaScript programs

R. Zhuykov E. Sharygin 《Programming and Computer Software》2017,43(1):51-59

Modern virtual machines for JavaScript use just-in-time (JIT) compilation to produce binary code. JIT compilers cannot perform complex optimizations. In contrast, static compilation has unlimited capabilities for complex optimizing transformations, but it cannot be efficiently applied to dynamic languages, such as JavaScript. In this paper, a general approach to the ahead-of-time compilation of programs in dynamic languages is proposed, and this approach is used for improving two virtual machines JavaScript- Core and V8. In the implementation of the improved JavaScriptCore engine with ahead-of-time compilation, the specifics of using JavaScript programs as a part of locally stored applications for the ARM platform were taken into account. In the V8 engine for the x86-64 platform, the ahead-of-time compilation is implemented by caching an optimized internal representation in a separate file. 相似文献

13.

Microbenchmarks for determining branch predictor organization

Milena Milenkovic Aleksandar Milenkovic Jeffrey Kulick 《Software》2004,34(5):465-487

In order to achieve an optimum performance of a given application on a given computer platform, a program developer or compiler must be aware of computer architecture parameters, including those related to branch predictors. Although dynamic branch predictors are designed with the aim of automatically adapting to changes in branch behavior during program execution, code optimizations based on the information about predictor structure can greatly increase overall program performance. Yet, exact predictor implementations are seldom made public, even though processor manuals provide valuable optimization tips. This paper presents an experimental flow with a series of microbenchmarks that determine the organization and size of a branch predictor using on‐chip performance monitoring registers. Such knowledge can be used either for manual code optimization or for design of new, more architecture‐aware compilers. Three examples illustrate how insight into exact branch predictor organization can be directly applied to code optimization. The proposed experimental flow is illustrated with microbenchmarks tuned for Intel Pentium III and Pentium 4 processors, although they can easily be adapted for other architectures. The described approach can also be used during processor design for performance evaluation of various branch predictor organizations and for testing and validation during implementation. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

14.

Region-based compilation: Introduction, motivation, and initial experience 总被引：1，自引：0，他引：1

Richard E. Hank Wen-mei W. Hwu B. Ramakrishna Rau 《International journal of parallel programming》1997,25(2):113-146

The most important task of a compiler designed to exploit instruction-level parallelism (ILP) is instruction scheduling. If higher levels of ILP are to be achieved, the compiler must use, as the unit of scheduling, regions consisting of multiple basic blocks—preferably those that frequently execute consecutively, and which capture cycles in the program’s execution. Traditionally, compilers have been built using the function as the unit of compilation. In this framework, function boundaries often act as barriers to the formation of the most suitable scheduling regions. Function inlining may be used to circumvent this problem by assembling strongly coupled functions into the same compilation unit, but at the cost of very large function bodies. Consequently, global optimizations whose compile time and space requirements are superlinear in the size of the compilation unit, may be rendered prohibitively expensive. This paper introduces a new approach, called region-based compilation, wherein the compiler, after inlining, repartitions the program into more desirable compilation units, termed regions. Region-based compilation allows the compiler to control problem size and complexity while exposing inter-procedural scheduling, optimization and code motion opportunities. 相似文献

15.

Code Annotation for Safe and Efficient Dynamic Object Resolution

Andreas Hartmann Wolfram Amme Jeffery von Ronne Michael Franz 《Electronic Notes in Theoretical Computer Science》2004,82(2):362

The execution time of object oriented programs can be drastically reduced by transforming “non escaping” objects into a collection of its component scalar data fields. But for languages that support dynamic linking, this kind of optimization (which we call “object resolution”) can usually only be performed at runtime, when the entire program is available for analysis. In such cases, the resulting performance increases will be offset by the additional costs that arise during the analysis and restructuring phases.In this paper, we describe work in progress, which provides an annotation technique that reduces the runtime overhead required for performing object resolutions. Our method performs a partial static escape analysis of each class at compile-time and then annotates the intermediate representation of that class with information which the just-in-time (JIT) compiler can use for object resolution. We apply this technique to the safe TSA intermediate representation, producing a simple extension to safe TSA's type system that guarantees a safe and verifiable transmission of the annotated program. 相似文献

16.

一个基于混合并发模型的Java虚拟机 总被引：3，自引：0，他引：3

杨博王鼎兴郑纬民《软件学报》2002,13(7):1250-1256

从解释执行到及时编译的转变极大地提高了Java程序的运行速度.但是,现有的Java虚拟机还有待进一步的改进.提出了一种新的Java虚拟机编译与执行模型--混合并发模型HCCEM(hybrid concurrent compilation and execution model).该模型通过多线程控制方式将字节码的编译与执行过程相重叠,从而获取加速的效果.另外还给出了基于HCCEM的Java虚拟机JAFFE的设计方案,并就实现中的执行模式切换、异常处理以及层次线程等问题进行了讨论.实验结果表明,HCCEM能相似文献

17.

An experimental evaluation of data dependence analysis techniques 总被引：1，自引：0，他引：1

Psarris K. Kyriakopoulos K. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(3):196-213

Optimizing compilers rely upon program analysis techniques to detect data dependences between program statements. Data dependence information captures the essential ordering constraints of the statements in a program that need to be preserved in order to produce valid optimized and parallel code. Data dependence testing is very important for automatic parallelization, vectorization, and any other code transformation. In this paper, we examine the impact of data dependence analysis in practice. A number of data dependence tests have been proposed in the literature. In each test, there are different trade offs between accuracy and efficiency. We present an experimental evaluation of several data dependence tests, including the Banerjee test, the I-Test, and the Omega test. We compare these tests in terms of data dependence accuracy, compilation efficiency, effectiveness in parallelization, and program execution performance. We analyze the reasons why a data dependence test can be inexact and we explain how the examined tests handle such cases. We run various experiments using the Perfect Club Benchmarks and the scientific library Lapack. We present the measured accuracy of each test and the reasons for any approximation. We compare these tests in term's of efficiency and we analyze the trade offs between accuracy and efficiency. We also determine the impact of each data dependence test on the total compilation time. Finally, we measure the number of loops parallelized by each test and we compare the execution performance of each benchmark on a multiprocessor. Our results indicate that the Omega test is more accurate, but also very inefficient in the cases where the other two tests are inaccurate. In general, the cost of the Omega test is high and uses a significant percentage of the total compilation time. Furthermore, the difference in accuracy of the Omega test over the Banerjee test and the l-Test does not improve parallelization and program execution performance. 相似文献

18.

一种新型类型化中间语言的优化实现技术 总被引：1，自引：0，他引：1

李筏青陈晖陈意云《计算机工程》2005,31(5):63-65

类型化中间语言是提高代码安全性的一类重要方法。然而在其实现过程中,庞大的类型信息很难被高效地表达和操作。一个未经优化的实现将会给系统带来指数级增长的开销。该文描述了一种新型的类型化中间语言的优化实现技术,并将其成功应用于Intel ORP (Open Runtime Platform)的即时编译器中。相似文献

19.

结合模型和迭代编译优化矩阵相乘程序

陆平静王正华车永刚《计算机工程与科学》2009,31(Z1)

高性能计算应用程序获得的持续性能与机器峰值性能的差距日益扩大,很大程度上制约着高性能计算的发展。程序变换通过对程序进行适应机器体系结构特征的优化变换,提高程序实际执行性能,是解决该问题的有效途径之一。很多高级程序变换均具有数值参数,为了获得最优性能,需要仔细选择参数的值。传统的编译器使用简单的模型选择这些参数,难以适应日趋复杂的硬件平台和应用程序。迭代编译通过生成不同的程序版本并在实际硬件评估上运行程序,来评估关键优化参数的值并决定能够产生最优性能的值,显著优于静态方法,但巨大的优化开销限制了其应用范围。本文针对矩阵相乘程序提出一种结合性能模型和迭代编译的优化方法,利用基于对机器体系结构和程序的经验知识构造性能模型约束优化空间,并使用遗传算法加速在优化空间中寻找优秀解的过程。实验结果表明,该方法可以较低的开销获得更优的性能优化效果。相似文献

20.

Quantifying the Benefits of SSA-Based Mobile Code

Wolfram Amme Jeffery von Ronne Michael Franz 《Electronic Notes in Theoretical Computer Science》2005,141(2):103

High-performance just-in-time compilers for Java need to invest considerable effort before actual code generation can commence. This is in part due to the very nature of the Java Virtual Machine, which is not well matched to the requirements of optimizing code generators. Alternative transportation formats based on Static Single Assignment form should theoretically be superior to virtual machines, but this claim has not previously been validated in practice. This paper revisits the topic and attempts to quantify the effect of using an SSA-based mobile code representation (IR) instead of a virtual-machine based one.To this end, we have integrated full support for a verifiable SSA-based IR into Jikes RVM, an existing Java execution environment. The resulting system is capable of loading and executing Java programs represented in either format, traditional JVM bytecode as well as the SSA-based representation, and it can even execute programs made up of a mixture of the two formats. In our implementation, the two alternative just-in-time compilation pipelines share a common low-level code generator.Performance results are encouraging and show simultaneous improvements in both compilation time and code quality relative to Jikes RVM's standard optimizing compiler for JVM class files. They support the hypothesis that SSA-based intermediate representations offer advantages in the context of just-in-time compilation. 相似文献