期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Embedded Software Verification Using Symbolic Execution and Uninterpreted Functions

David Currie Xiushan Feng Masahiro Fujita Alan J. Hu Mark Kwan Sreeranga Rajan 《International journal of parallel programming》2006,34(1):61-91

Symbolic simulation and uninterpreted functions have long been staple techniques for formal hardware verification. In recent years, we have adapted these techniques for the automatic, formal verification of low-level embedded software—specifically, checking the equivalence of different versions of assembly language programs. Our approach, though limited in scalability, has proven particularly promising for the intricate code optimizations and complex architectures typical of high-performance embedded software, such as for DSPs and VLIW processors. Indeed, one of our key findings was how easy it was to create or retarget our verification tools to different, even very complex, machines. The resulting tools automatically verified or found previously unknown bugs in several small sequences of industrial and published example code. This paper provides an introduction to these techniques and a review of our results. 相似文献

2.

Deciding Boolean Algebra with Presburger Arithmetic

Viktor Kuncak Huu Hai Nguyen Martin Rinard 《Journal of Automated Reasoning》2006,36(3):213-239

相似文献

3.

A compiler optimization algorithm for shared-memory multiprocessors

McKinley K.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):769-787

This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel, Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines 相似文献

4.

面向未解释程序的合作验证方法

杜一德洪伟疆陈振邦王戟《软件学报》2023,34(7):3116-3133

未解释程序的验证问题通常是不可判定的,但是最近有研究发现,存在一类满足coherence性质的未解释程序,其验证问题是可判定的,并且计算复杂度为PSPACE完全的.在这个结果的基础上,一个针对一般未解释程序验证的基于路径抽象的反例抽象精化（CEGAR）框架被提出,并展现了良好的验证效率.即使如此,对未解释程序的验证工作依然需要多次迭代,特别是利用该方法在针对多个程序验证时,不同的程序之间的验证过程是彼此独立的,存在验证开销巨大的问题.本文发现被验证的程序之间较为相似时,不可行路径的抽象模型可以在不同的程序之间复用.因此,本文提出了一个合作验证的框架,收集在验证过程中不可行路径的抽象模型,并在对新程序进行验证时,用已保存的抽象模型对程序进行精化,提前删减一些已验证的程序路径,从而提高验证效率.此外,本文通过对验证过程中的状态信息进行精简,对现有的基于状态等价的路径抽象方法进行优化,以进一步提升其泛化能力.本文对合作验证的框架以及路径抽象的优化方法进行了实现,并在两个具有代表性的程序集上分别取得了2.70x和1.49x的加速. 相似文献

5.

Optimizing OpenMP Programs on Software Distributed Shared Memory Systems

Min Seung-Jai Basumallik Ayon Eigenmann Rudolf 《International journal of parallel programming》2003,31(3):225-249

This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed system. The long-term goal of our project is to quantify the degree to which such a use is possible and develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, resulting in 31% average improvement on our benchmarks. 相似文献

6.

Cacheminer: A runtime approach to exploit cache locality on SMP

Yong Yan Xiaodong Zhang 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(4):357-374

Exploiting cache locality of parallel programs at runtime is a complementary approach to a compiler optimization. This is particularly important for those applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted architecture-dependent hints, our system, called Cacheminer, reorganizes and partitions a parallel loop using the memory-access space of its execution. Through effective runtime transformations, our system maximizes the data reuse in each partitioned data region assigned in a cache, and minimizes the data sharing among the partitioned data regions assigned to all caches. The executions of tasks in the partitions are scheduled in an adaptive and locality-presented way to minimize the execution time of programs by trading off load balance and locality. We have implemented the Cacheminer runtime library on two commercial SMP servers and an SimCS simulated SMP. Our simulation and measurement results show that our runtime approach can achieve comparable performance with the compiler optimizations for programs with regular computation and memory-access patterns, whose load balance and cache locality can be well optimized by the tiling and other program transformations. However, our experimental results show that our approach is able to significantly improve the memory performance for the applications with irregular computation and dynamic memory access patterns. These types of programs are usually hard to optimize by static compiler optimizations 相似文献

7.

基于电路拓扑结构分析的等价性验证方法

李光辉冯冬芹曾松伟《计算机辅助设计与图形学学报》2008,20(12)

随着集成电路设计规模的日益增大,结合多种推理引擎已成为组合电路形式化等价性验证的重要手段.提出一种基于电路拓扑结构分析的组合等价性验证方法,将电路的拓扑结构与验证算法的复杂性关联起来.在验证过程开始之前,利用min-cut方法计算表征电路复杂性的"电路宽度",以确定最佳的推理引擎,避免了传统的引擎切换过程,提高了算法的效率.针对ISCAS85电路的实验结果表明了该方法的效率和可行性. 相似文献

8.

F@BOOL@: Experiment with a simple verifying compiler based on SAT-solvers

N. V. Shilov 《Automatic Control and Computer Sciences》2011,45(7):428-436

A verifying compiler is computer system program that translates programs written by a human from a high-level language to equivalent executable programs and proves (verifies) mathematical statements specified by a human concerning the properties of the translated programs. The objective of the project F@BOOL@ is to develop a user friendly, compact, and portable verifying compiler of annotated computational programs that uses efficient and reliable automatic SAT solvers as the tools for automatic validation of correctness conditions (instead of semiautomatic proof techniques). In the period from 2006 to 2009, the SAT solver zChaff was used in the project F@BOOL@. The first experiments on the verification of simple Mini-NIL programs were performed using this solver, namely, the programs swapping variable values, checking whether three integer numbers are the sides of an equilateral or an isoscales triangle, and searching for one fake coin among 15 coins using scales. This paper considers the main ideas of the project F@BOOL@ and gives the details of the experiment on the verification of the program solving the coin puzzle. 相似文献

9.

BDD Based Procedures for a Theory of Equality with Uninterpreted Functions

Anuj Goel Khurram Sajid Hai Zhou Adnan Aziz Vigyan Singhal 《Formal Methods in System Design》2003,22(3):205-224

The logic of equality with uninterpreted functions has been proposed for verifying abstract hardware designs. The ability to perform fast satisfiability checking over this logic is imperative for such verification paradigms to be successful. We present symbolic methods for satisfiability checking for this logic. The first procedure is based on restricting analysis to finite instantiations of the variables. The second procedure directly reasons about equality by introducing Boolean-valued indicator variables for equality. Theoretical and experimental evidence shows the superiority of the second approach. 相似文献

10.

Coinductive Verification of Program Optimizations Using Similarity Relations

Sabine Glesner Johannes Leitner Jan Olaf Blech 《Electronic Notes in Theoretical Computer Science》2007,176(3):61

Formal verification methods have gained increased importance due to their ability to guarantee system correctness and improve reliability. Nevertheless, the question how proofs are to be formalized in theorem provers is far from being trivial, yet very important as one needs to spend much more time on verification if the formalization was not cleverly chosen. In this paper, we develop and compare two different possibilities to express coinductive proofs in the theorem prover Isabelle/HOL. Coinduction is a proof method that allows for the verification of properties of also non-terminating state-transition systems. Since coinduction is not as widely used as other proof techniques as e.g. induction, there are much fewer “recipes” available how to formalize corresponding proofs and there are also fewer proof strategies implemented in theorem provers for coinduction. In this paper, we investigate formalizations for coinductive proofs of properties on state transition sequences. In particular, we compare two different possibilities for their formalization and show their equivalence. The first of these two formalizations captures the mathematical intuition, while the second can be used more easily in a theorem prover. We have formally verified the equivalence of these criteria in Isabelle/HOL, thus establishing a coalgebraic verification framework. To demonstrate that our verification framework is suitable for the verification of compiler optimizations, we have introduced three different, rather simple transformations that capture typical problems in the verification of optimizing compilers, even for non-terminating source programs. 相似文献

11.

航天嵌入式软件整数溢出的形式化验证方法

高猛滕俊元王政《软件学报》2021,32(10):2977-2992

整数溢出引起的软件系统安全性问题屡见不鲜,已有的模型检测技术由于存在状态空间爆炸、不能有效支持中断驱动型程序检测等缺点而少有工程应用.结合真实案例,对航天嵌入式软件整数溢出问题的分布和特征进行了系统性的分析.在有界模型检测技术的基础上,结合整数溢出特征,提出了基于整数溢出变量依赖的程序模型约简技术;同时,针对中断驱动型程序,结合中断函数特征抽象,提出了基于干扰变量的中断驱动程序顺序化方法.经过基准测试程序和真实航天嵌入式软件实验,结果表明：该方法在保证整数溢出问题检出率的前提下,不仅能够提高分析效率,还使得已有的模型检测技术能够适用于中断驱动型程序整数溢出检测. 相似文献

12.

Validating More Loop Optimizations

Ying Hu Clark Barrett Benjamin Goldberg Amir Pnueli 《Electronic Notes in Theoretical Computer Science》2005,141(2):69

Translation validation is a technique for ensuring that a translator, such as a compiler, produces correct results. Because complete verification of the translator itself is often infeasible, translation validation advocates coupling the verification task with the translation task, so that each run of the translator produces verification conditions which, if valid, prove the correctness of the translation. In previous work, the translation validation approach was used to give a framework for proving the correctness of a variety of compiler optimizations, with a recent focus on loop transformations. However, some of these ideas were preliminary and had not been implemented. Additionally, there were examples of common loop transformations which could not be handled by our previous approaches.This paper addresses these issues. We introduce a new rule Reduce for loop reduction transformations, and we generalize our previous rule Validate so that it can handle more transformations involving loops. We then describe how all of this (including some previous theoretical work) is implemented in our compiler validation tool TVOC. 相似文献

13.

Exploiting domain-specific properties: compiling parallel dynamicneural network algorithms into efficient code

Prechelt L. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(11):1105-1117

Domain-specific constraints can be exploited to implement compiler optimizations that are not otherwise feasible. Compilers for neural network learning algorithms can achieve near-optimal colocality of data and processes and near-optimal balancing of load over processors, even for dynamically irregular problems. This is impossible for general programs, but restricting programs to the neural algorithm domain allows for the exploitation of domain-specific properties. The operations performed by neural algorithms are broadcasts, reductions, and object-local operations only; the load distribution is regular with respect to the (perhaps irregular) network topology; changes of network topology occur only from time to time. A language, compilation techniques, and a compiler implementation on the MasPar MP-1 are described and quantitative results for the effects of various optimizations used in the compiler are shown. Conservative experiments with weight pruning algorithms yield performance improvements of 27 percent due to load balancing and 195 percent improvement is achieved due to data locality, both compared to unoptimized versions. Two other optimizations-connection allocation and selecting the number of replicates-speed programs up by about 50 percent and 100 percent, respectively. This work can be viewed as a case study in exploiting domain-specific information; some of the principles presented here may apply to other domains as well 相似文献

14.

Language and Compiler Design for Streaming Applications

Saman Amarasinghe Michael l. Gordon Michal Karczmarek Jasper Lin David Maze Rodric M. Rabbah William Thies 《International journal of parallel programming》2005,33(2-3):261-278

High-performance streaming applications are a new and distinct domain of programs that is increasingly important. The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain. At the same time, the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analysis and optimizations. In this paper, we motivate, describes and justify the StreamIt language which include a structured model of streams, a messaging system for control, and a natural textual syntax. 相似文献

15.

Combining Loop Transformations Considering Caches and Scheduling

Michael E. Wolf Dror E. Maydan Ding-Kai Chen 《International journal of parallel programming》1998,26(4):479-503

The performance of modern microprocessors is greatly affected by cache behavior, instruction scheduling, register allocation and loop overhead. High-level loop transformations such as fission, fusion, tiling, interchanging and outer loop unrolling (e.g., unroll and jam) are well known to be capable of improving all these aspects of performance. Difficulties arise because these machine characteristics and these optimizations are highly interdependent. Interchanging two loops might, for example, improve cache behavior but make it impossible to allocate registers in the inner loop. Similarly, unrolling or interchanging a loop might individually hurt performance but doing both simultaneously might help performance. Little work has been published on how to combine these transformations into an efficient and effective compiler algorithm. In this paper, we present a model that estimates total machine cycle time taking into account cache misses, software pipelining, register pressure and loop overhead. We then develop an algorithm to intelligently search through the various, possible transformations, using our machine model to select the set of transformations leading to the best overall performance. We have implemented this algorithm as part of the MIPSPro commercial compiler system. We give experimental results showing that our approach is both effective and efficient in optimizing numerical programs. 相似文献

16.

Vector data flow analysis for SIMD optimizations on OpenCL programs

Yu‐Te Lin Jenq‐Kuen Lee 《Concurrency and Computation》2016,28(5):1629-1654

Multi‐core systems equipped with micro processing units and accelerators such as digital signal processors (DSPs) and graphics processing units (GPUs) have become a major trend in processor design in recent years in attempts to meet ever‐increasing application performance requirements. Open Computing Language (OpenCL) is one of the programming languages that include new extensions proposed to exploit the computing power of these kinds of processors. Among the newly extended language features, the single‐instruction multiple‐data (SIMD) linguistics and vector types are added to OpenCL to exploit hardware features of the accelerators. The addition makes it necessary to consider how traditional compiler data flow analysis can be adopted to meet the optimization requirements of vector linguistics. In this paper, we propose a calculus framework to support the data flow analysis of vector constructs for OpenCL programs that compilers can use to perform SIMD optimizations. We model OpenCL vector operations as data access functions in the style of mathematical functions. We then show that the data flow analysis for OpenCL vector linguistics can be performed based on the data access functions. Based on the information gathered from data flow analysis, we illustrate a set of SIMD optimizations on OpenCL programs. The experimental results incorporating our calculus and our proposed compiler optimizations show that the proposed SIMD optimizations can provide average performance improvements of 22% on x86 CPUs and 4% on advanced micro devices GPUs. For the selected 15 benchmarks, 11 of them are improved on x86 CPUs, and six of them are improved on advanced micro devices GPUs. The proposed framework has the potential to be used to construct other SIMD optimizations on OpenCL programs. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

17.

面向RISC-V的汇编程序语义等价性自动化测试系统

徐学政王涛方健张光达《计算机系统应用》2021,30(11):33-40

本文设计并实现了一套面向RISC-V的汇编程序语义等价性自动化测试系统.在面向RISC-V开发软件时,尤其是基于扩展指令(例如向量指令)编写高效的程序时,很难避免以手写汇编的方式编写代码.例如,为标准的C函数库编写相应的向量版函数.与编译器自动生成的代码不同,手写的汇编代码虽然可以最大限度地提高程序的效率,但因绕过了编译时对程序的约束(如类型检查、寄存器分配等)而对开发者提出了更高的要求.能否对新版本与标准版本的汇编程序进行快速地、自动化的语义等价性测试,将大大影响代码的正确性和软件开发和调试的效率.已有面向RISC-V的测试框架缺乏对语义等价性测试的支持,也未考虑程序执行带来的副作用.本研究基于模拟器的动态测试环境,设计并实现了一套面向RISC-V的汇编程序语义等价性自动化测试系统.系统通过跟踪机器状态,捕获程序执行的副作用,并结合用户定义的测试目标生成测试报告.实验表明,本系统相比已有的测试系统,能够有效地对RISC-V汇编程序的语义等价性进行测试. 相似文献

18.

基于规范划分集的并行循环计算划分

下载免费PDF全文

黄其军杨建武余华山许卓群《软件学报》2003,14(3):362-368

计算划分问题是并行编译中最为重要的问题之一.针对并行循环,在数据分布确定的情况下,提出了基于规范集的计算划分算法,具体讨论了规范集的获取方法及综合通信与负载均衡的最优方案选取算法.实验表明,在并行循环处理方面,这一算法与以前几种算法相比更加简单、有效;采用这一算法的p_HPF编译器对数据并行应用问题可以获得良好的加速比和效率.该编译器已在石油领域得到应用. 相似文献

19.

A Loop Transformation Algorithm for Communication Overlapping

Kazuaki Ishizaki Hideaki Komatsu Toshio Nakatani 《International journal of parallel programming》2000,28(2):135-154

Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. It also describes a method of generating messages to exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on distributed memory machines, such as the RISC System/6000 Scalable POWERparallel System. This paper also discusses the architectural problems of efficient optimization. 相似文献

20.

一个有效的并行分析算法 总被引：3，自引：0，他引：3

胡永刚乔如良《计算机学报》1999,22(2):134-140

并行分析在并行编译系统中有着很重要的作用,它的优劣直接影响到编译系统的成败,随着机群系统及其并行开发环境的发展,多数的并行系统可支持多重并行循环的运行。而对只支持一重并行循环的编程系统,选择并行运行效率最高的循环,也是很重要的。为此,本文提出了一个有效的循环并行分析方案,它不但能给出多层循环的并行性,而且能够处理绝大部分实际应用中的并行性问题,本文对传统的并行分析算法进行修改,并给出了一个有效的并相似文献