期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Interprocedural optimization: Experimental results

Stephen Richardson Mahadevan Ganapathi 《Software》1989,19(2):149-169

The problem of tracking data flow across procedure boundaries has a long history of theoretical study by people who believed that such information would be useful for code optimization. Building upon previous work, an algorithm for interprocedural data flow analysis has been implemented. The algorithm produces three flow-insensitive summary sets: MOD, USE and ALIASES. The utility of the resulting information was investigated using an optimizing Pascal compiler. Over a sampling of 27 bench-marks, new optimizations performed as a result of interprocedural summary information contributed almost nothing to program execution speed. Finally, related optimization techniques of possibly greater potential are discussed. 相似文献

2.

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Grigori Fursin Yuriy Kashnikov Abdul Wahid Memon Zbigniew Chamski Olivier Temam Mircea Namolaru Elad Yom-Tov Bilha Mendelson Ayal Zaks Eric Courtois Francois Bodin Phil Barnard Elton Ashton Edwin Bonilla John Thomson Christopher K. I. Williams Michael O��Boyle 《International journal of parallel programming》2011,39(3):296-327

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on Intel Xeon processor. 相似文献

3.

利用软件线程集成技术实现硬件到软件的转移

蒋书波张焕春经亚枝《计算机工程与应用》2002,38(19):147-149

该文介绍了线程集成，一种在通用单片微处理器或微控制器上低耗并行执行的新方法，后级编译技术有效地插入多个控制线程，并提供细粒度的多个线程而不用上下文切换的方法，这样允许用软件完成实时的功能来代替专用外围硬件。该文研究了在主线程中集成实时客户线程时的代码转移，生成的集成线程能满足所有的实时性，线程集成的概念和代码转移被应用到实际中来检验这种方法的可行性。相似文献

4.

Using profile information to assist classic code optimizations

Pohua P. Chang Scott A. Mahlke Wen-Mei W. Hwu 《Software》1991,21(12):1301-1321

This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new components, an execution profiler and a profile-based code optimizer, which are not commonly found in traditional optimizing compilers. The execution profiler inserts probes into the input program, executes the input program for several inputs, accumulates profile information and supplies this information to the optimizer. The profile-based code optimizer uses the profile information to expose new optimization opportunities that are not visible to traditional global optimization methods. Experimental results show that the profile-based code optimizer significantly improves the performance of production C programs that have already been optimized by a high-quality global code optimizer. 相似文献

5.

编译器中的edge profiling设计和实现 总被引：4，自引：0，他引：4

董希谦张兆庆《计算机科学》2003,30(1):46-48

1.简介这篇论文是在从事ORC(Open Research Compiler)编译器的过程中完成的。ORC是中科院计算所同Intel公司的合作项目,为Intel的新一代64位芯片开发开放源代码的先进编译器。ORC的实现目标:为研究机构和团体提供一个可靠稳定的、开放源代码的编译基础结构,方便其在ORC的基础上进行编译器和体系结构的研究;与当前其他IA-64编译器相比,要有比较高的性能。代码优化的一个重要目标是为了减少程序的执行时间。相似文献

6.

Compiler to interpreter: experiences with a distributed programming language

Robert M. Gebala Carole M. McNamee Ronald A. Olsson 《Software》2001,31(9):893-909

One interpretive approach for handling concurrency is to provide an interpreter instance for each executing language‐level process. Such an approach has mainly been applied to concurrent implementations of logic and functional languages. This paper describes the use of this approach in constructing an interpreter for an imperative, distributed programming language from an existing compiler and run‐time support system (RTS). Primary design goals were to exploit the existing compiler to the extent possible as well as to have minimal impact on the RTS used to support concurrency. We have been successful in meeting these goals. Additionally, performance results show our interpreter's execution times compare favorably to the times required for compilation, linkage, and execution of small programs or programs with a significant number of calls to the RTS; on such programs, our interpreter's performance also compares favorably to that of the standard Java implementation. However, for larger programs and programs with fewer calls to the underlying RTS, the conventional compiler‐based implementation outperforms the interpreter implementation. For many distributed programs in which network costs dominate, the performances of the two implementations differ little. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献

7.

The concurrency models of Ada and occam—a practical comparison

《Journal of Microcomputer Applications》1991,14(4):363-377

A comparison of the concurrency models of Ada and occam is presented. This comparison is performed in terms of an Ada-to-occam compiler. A subset of the Ada programming language may be compiled allowing a study of how programs and algorithms expressed using the Ada concurrency model can be mapped to the occam concurrency model. The resultant occam code may then be executed on a transputer. This paper describes the Ada and occam concurrency models, highlights the major differences between them, discusses the problems encountered in trying to map concurrent Ada programs to equivalent occam programs as a result of these differences and explains how these problems were overcome in the present compiler. 相似文献

8.

Anomaly Detection in Concurrent Software by Static Data Flow Analysis

《IEEE transactions on pattern analysis and machine intelligence》1980,(3):265-278

Algorithms are presented for detecting errors and anomalies in programs which use synchronization constructs to implement concurrency. The algorithms employ data flow analysis techniques. First used in compiler object code optimization, the techniques have more recently been used in the detection of variable usage errors in dngle process programs. By adapting these existing algorithms, the sane classes of variable usage errors can be detected in concurrent process programs. Important classes of errors unique to concurrent process programs are also described, and algorithms for their detection are presented. 相似文献

9.

Quasidynamic layout optimizations for improving data locality

Kadayif I. Kandemir M. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(11):996-1011

Compiler-directed locality optimization techniques are effective in reducing the number of cycles spent in off-chip memory accesses. Recently, methods have been developed that transform memory layouts of data structures at compile-time to improve spatial locality of nested loops beyond current control-centric (loop nest-based) optimizations. Most of these data-centric transformations use a single static (program-wide) memory layout for each array. A disadvantage of these static layout-based locality enhancement strategies is that they might fail to optimize codes that manipulate arrays, which demand different layouts in different parts of the code. We introduce a new approach, which extends current static layout optimization techniques by associating different memory layouts with the same array in different parts of the code. We call this strategy "quasidynamic layout optimization." In this strategy, the compiler determines memory layouts (for different parts of the code) at compile time, but layout conversions occur at runtime. We show that the possibility of dynamically changing memory layouts during the course of execution adds a new dimension to the data locality optimization problem. Our strategy employs a static layout optimizer module as a building block and, by repeatedly invoking it for different parts of the code, it checks whether runtime layout modifications bring additional benefits beyond static optimization. Our experiments indicate significant improvements in execution time over static layout-based locality enhancing techniques. 相似文献

10.

Translation and Run-Time Validation of Optimized Code

Lenore Zuck Amir Pnueli Yi Fang Benjamin Goldberg Ying Hu 《Electronic Notes in Theoretical Computer Science》2002,70(4):179-200

The paper presents approaches to the validation of optimizing compilers. The emphasis is on aggressive and architecture-targeted optimizations which try to obtain the highest performance from modern architectures, in particular EPIC-like micro-processors. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code.First we survey the standard approach to validation of optimizations which preserve the loop structure of the code (though they may move code in and out of loops and radically modify individual statements), present a simulation-based general technique for validating such optimizations, and describe a tool, VOC-64, which implements these technique. For more aggressive optimizations which, typically, alter the loop structure of the code, such as loop distribution and fusion, loop tiling, and loop interchanges, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the necessary extensions to the VOC-64 in order to validate these structure-modifying optimizations.Finally, the paper discusses preliminary work on run-time validation of speculative loop optimizations, that involves using run-time tests to ensure the correctness of loop optimizations which neither the compiler nor compiler-validation techniques can guarantee the correctness of. Unlike compiler validation, run-time validation has not only the task of determining when an optimization has generated incorrect code, but also has the task of recovering from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange, loop tiling, and software pipelining and appears to be quite promising. 相似文献

11.

The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication

David I. August Wen-Mei W. Hwu Scott A. Mahlke 《International journal of parallel programming》1999,27(5):381-423

Predicated execution is a promising architectural feature for exploiting instruction-level parallelism in the presence of control flow. Compiling for predicated execution involves converting program control flow into conditional, or predicated, instructions. This process is known as if-conversion. In order to apply ifconversion effectively, one must address two major issues: what should be ifconverted and when the if-conversion should be performed. A compiler's use of predication as a representation is most effective when large amounts of code are if-converted and when if-conversion is performed early in the compilation procedure. On the other hand, efficient execution of code generated for a processor with predicated execution requires a delicate balance between control flow and predication. The appropriate balance is tightly coupled with scheduling decisions and detailed processor characteristics. This paper presents a compilation framework based on partial reverse if-conversion that allows the compiler to maximize the benefits of predication as a compiler representation while delaying the final balancing of control flow and predication to schedule time. 相似文献

12.

一种加速访存地址计算的编译优化

高秀武姜军白书敬黄亮明《计算机工程》2023,49(1):173-180

在国产申威高性能多核服务器系统中,基础编译系统对应用程序中访存操作进行代码生成时,没有考虑国产处理器指令特征,导致编译器生成的访存地址计算代码效率较低,影响国产高性能处理器的性能。为充分发挥国产处理器高性能计算能力,提出一种加速访存地址计算的编译优化方法。加速访存地址计算编译优化基于处理器支持带扩展因子的运算指令,在编译器后端内存地址表达式合法性检查中,添加针对乘加模式的地址计算表达式合法性检查算法,自动识别地址表达式中存在的乘加运算并进行合法性检验,对符合条件的地址表达式在代码生成阶段匹配生成带扩展因子的运算指令来快速计算访存地址,从而加快访存指令的发射与执行以及应用程序中的访存地址生成,提升访存效率。使用行业标准性能测试集SPEC CPU2006对优化效果进行评测,结果表明,相比优化前SPECspeed Integer与SPECspeed Float Point两个子集,该优化方法平均性能分别提高了2.53%与1.50%。相似文献

13.

Optimizing Procedure Calls for Delayed Non-local Execution Protocol

VINCENZO LOIA MICHEL QUAGGETTO 《Software》1996,26(12):1303-1334

This work originated from the challenge to strengthen a C-like language compiler developed to support the compilation of sleepers, which are tools which allow complete access to the run-time stack in a delayed non-local execution protocol. Sleepers use a specialized form of procedure call, for which the gain in time execution and space memory allocation becomes a crucial need. These two objectives have been attained thanks to a methodology which generalizes the traditional distinction between tail-calls and normal-calls, introducing the orthogonal distinction between calls in which the calling environment is needed after the call site and calls in which it is not. These two dichotomies divide the space of calls into four classes. The strategy described in this paper is a simple and general framework which can be used to optimize the call-procedure statement in C compilers. The paper discusses optimization techniques appropriate to each class in turn, providing code details for SPARC and ALPHA processors. 相似文献

14.

Construction of speculative optimization algorithms

A. A. Belevantsev S. S. Gaisaryan V. P. Ivannikov 《Programming and Computer Software》2008,34(3):138-153

In modern processors, instructions to perform operations are often produced before it becomes known that this is required. Such an expedient, which is called speculative execution, helps to reveal parallelism at the instruction level. In the EPIC architectures, the speculative execution is completely controlled by the compiler, which makes it possible to avoid using complex hardware mechanisms for supporting speculative instruction production. Moreover, the idea of the speculative execution can be used by the compiler in machine-independent optimizations. The paper describes a scheme of construction of the speculative optimization that is based on the selection of properties of the control flow and data flow that are important from the optimization standpoint and on the estimation of the probabilities of their fulfillment. The probabilities found are used for searching and constructing advantageous speculative and bookkeeping transformations. For optimizations that include only speculative movements of instructions upwards along the control flow graph, on the basis of the suggested scheme, a method has been developed that includes algorithms for finding probabilities of data and control dependences, for estimating benefit of speculative movements, and for constructing a recovery code. On the basis of this method, an algorithm for the speculative scheduling of instructions for the Intel Itanium architecture has been developed and implemented. Specific features of its implementation and experimental results are described. 相似文献

15.

一种多倍数据供应的编译优化方法

彭飞顾乃杰高翔孙明明《小型微型计算机系统》2011,32(11)

数据的快速及时供应对访存密集型程序的性能有着直接的影响.提出一种多倍数据供应MDS(Multiple Data Supply)的编译优化方法,在不增加处理器设计复杂度的前提下,利用现有处理器的高带宽,一次对内存进行多个数据的读写,减少访存次数,提高应用程序效率.在编译优化阶段,利用自动向量化技术,生成向量形式的树结构,增加一条新的扩展路径来处理从向量化的树结构到底层结构的扩展.针对向量化后树结构的多样性问题,设计新的优化遍以及RAC(Register Assignment Chain)替换算法进行专门处理.在龙芯3A处理器平台上,对SPEC-CPU2000的测试,CINT程序平均性能提升11.6％,CFP程序平均性能提升14.4％. 相似文献

16.

Compiler techniques for the distribution of data and computation

Navarro A. Zapata E. Padua D. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(6):545-562

This paper presents a new method that can be applied by a parallelizing compiler to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a locality-communication graph (ICG) and the formulation of the compiler technique as a mixed integer nonlinear programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes the process of how the compiler extracts the locality information from a nonannotated code and focuses on how this compiler can derive the optimization problem, solve it, and generate the parallel code with the automatically selected iteration and data distributions. In addition, we include a discussion about our model and the solutions - the decompositions - that it provides. The approach presented in the paper is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines. 相似文献

17.

An Action Compiler Targeting Standard ML

Jrgen Iversen 《Electronic Notes in Theoretical Computer Science》2005,141(4):167

相似文献

18.

Code Analysis for Temporal Predictability

Jan Gustafsson Björn Lisper Raimund Kirner Peter Puschner 《Real-Time Systems》2006,32(3):253-277

The execution time of software for hard real-time systems must be predictable. Further, safe and not overly pessimistic bounds for the worst-case execution time (WCET) must be computable. We conceived a programming strategy called WCET-oriented programming and a code transformation strategy, the single-path conversion, that aid programmers in producing code that meets these requirements. These strategies avoid and eliminate input-data dependencies in the code. The paper describes the formal analysis, based on abstract interpretation, that identifies input-data dependencies in the code and thus forms the basis for the strategies provided for hard real-time code development. This work has been supported by the ARTIST2 Network of Excellence on Embedded Systems Design of IST FP6. Raimund Kirner is an assistant professor in computer science in the Real-Time Systems Group of the Vienna University of Technology. He received a Master's degree in computer science and a doctoral degree in technical sciences both from the Vienna University of Technology in Austria in the years 2000 and 2003, respectively. His research interests include worst-case execution time analysis, compiler support for worst-case execution time analysis, and the verification of real-time systems. Peter Puschner is a professor in computer science at Vienna University of Technology. His main research focus is on worst-case execution time (WCET) analysis for real-time programs. Puschner has been working on WCET analysis for more than ten years and has strongly influenced the state of the art in this field. He has published numerous papers on WCET analysis and software/hardware architectures supporting temporal predictability. He was a guest editor for the special issue on WCET analysis of the Kluwer International Journal on Real-Time Systems and chaired the program committee of the IEEE International Symposium on Object-oriented Real-time distributed Computing in 2003 and the Euromicro Real-Time Systems Conference in 2004. In 2000/2001 Peter Puschner spent one year as a Marie-Curie research fellow at the University of York, England. 相似文献

19.

Simple code optimizations

David R. Hanson 《Software》1983,13(8):745-763

Program optimization has received a great deal of attention for many years, which has resulted in numerous advances in compiler technology. The effectiveness of various simple optimizations has received comparably little attention during the same time period. The simplicity of most programs suggests that straightforward optimizations pay the greatest dividends. This paper describes three such optimizations suitable for one-pass compilers. The optimizations involve expression rearrangement, instruction selection, and the use of a cache for the allocation of resources. The cost of these optimizations is low; none require major changes to the size or structure of the compiler or reduce compilation speed by more than 10%. The benefits are high; each optimization results in at least a 10% average reduction in object code size and a corresponding reduction in execution time. Examples and implementation details are also described. 相似文献

20.

Multi-threaded code generation from Signal program to OpenMP

Kai HU Teng ZHANG Zhibin YANG 《Frontiers of Computer Science》2013,7(5):617-626

相似文献