Similar Literature
 20 similar documents found (search time: 156 ms)
1.
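A large body of legacy serial code needs to be parallelized, but the complexity of parallel programs and the diversity of parallel computing platforms make such conversion costly. To address this, a three-layer parallel programming framework based on a markup language is designed. It carries out two conversion stages: from the serial program layer to the parallel intermediate code layer, and from the parallel intermediate code layer to the program layer of the target parallel programming language. The parallel intermediate code layer is realized by annotating serial code with language markup; this layer is in effect an abstraction over the programming languages of shared-memory and distributed-memory parallel platforms. The framework also implements a performance markup method that can be used to tune parallel parameters automatically. Experiments on radar data processing show that the corresponding parallel code is generated and that its parallel speedup is comparable to that of hand-written parallel code.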

2.
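Domain: An Abstraction Mechanism Supporting the Conceptual Design of Parallel Programs   Cited by: 1 (self-citations: 1, citations by others: 0)
1 Introduction. For a long time, a wide gap has separated the potential high performance of parallel computers from their actual performance and ease of use. One reason is that lagging parallel programming methodology has led much parallel programming practice to follow the approach of traditional serial program development; another major bottleneck is that parallel program development lacks the support of efficient programming environments. At present, parallel programs are constructed mainly in two ways. (1) Writing parallel programs from parallel algorithms. Programs built this way generally achieve high parallel efficiency, but they demand too much of users in application domains. (2) Using parallelization tools to convert serial programs. Since a large number of mature applications have accumulated on serial computers, porting serial programs to parallel computers with automatic parallelizing compilation tools is undoubtedly of great practical significance. Because the parallelization approach lacks high-level global information and parallelism information about the code being compiled, its parallel granularity is fine and its parallel efficiency is often not ...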

3.
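邹卫军. Computer Engineering (《计算机工程》), 2008, 34(9): 268-269
Computation and data decomposition is key to parallel compilation for distributed-memory systems. During parallelism detection in a parallelizing optimizing compiler, no globally consistent decomposition can be found for many serial codes. To address this, the paper proposes a dynamic decomposition algorithm that incorporates program control flow. It increases the influence of control flow on the decomposition, so that the result is better suited to the parallel code generated automatically by the back end. Experimental analysis demonstrates the effectiveness of the method.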

4.
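刘有耀, 杨鹏程. Journal of Computer Applications (《计算机应用》), 2016, 36(9): 2422-2426
To address the problem that a large amount of legacy code cannot be reused, a new compilation tool is designed that converts serial C code into hybrid parallel code based on MPI+OpenMP, reducing the development cost of parallel programming. First, by optimizing JavaCC, a lexical and syntax analyzer capable of parsing C is implemented; it analyzes the source code and generates an abstract syntax tree. Second, control dependence and data dependence analyses are performed on the source code according to the syntax tree, yielding partitions of parallelizable statement blocks. Third, the target code is obtained with the proposed parallel code generation method. Finally, a simulation and verification environment for the target code is built on Visual Studio 2010. Experimental results show that the tool parallelizes serial code automatically with reasonably good results: its speedup differs from that of hand-written code by 8.2% to 18.4%.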
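The dependence analysis step in the abstract above can be illustrated with a minimal Python sketch. It is not the tool's implementation (the tool is built on JavaCC and targets C). It assumes each statement has already been summarized by hypothetical read and write sets, and it applies Bernstein's conditions to group consecutive statements into blocks whose members do not depend on one another. `Stmt`, `independent`, and `partition` are illustrative names, not names from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """A statement summarized by the variables it reads and writes (assumed model)."""
    text: str
    reads: frozenset
    writes: frozenset


def independent(a: Stmt, b: Stmt) -> bool:
    """Bernstein's conditions: no flow, anti, or output dependence between a and b."""
    return (
        not (a.writes & b.reads)       # flow dependence (read after write)
        and not (a.reads & b.writes)   # anti dependence (write after read)
        and not (a.writes & b.writes)  # output dependence (write after write)
    )


def partition(stmts):
    """Greedily group consecutive statements into blocks of pairwise independent statements.

    Statements inside one block may run concurrently; blocks keep their original order.
    """
    blocks = []
    for s in stmts:
        if blocks and all(independent(s, t) for t in blocks[-1]):
            blocks[-1].append(s)
        else:
            blocks.append([s])
    return blocks


program = [
    Stmt("a = x + 1", frozenset({"x"}), frozenset({"a"})),
    Stmt("b = y * 2", frozenset({"y"}), frozenset({"b"})),
    Stmt("c = a + b", frozenset({"a", "b"}), frozenset({"c"})),
]
blocks = partition(program)
block_texts = [[s.text for s in block] for block in blocks]
```

Here the first two statements land in one block and the third, which reads both `a` and `b`, starts a new one. A real tool would derive the read and write sets from the abstract syntax tree and would also account for control dependence and array subscripts.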

5.
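陈俊朴. Computer Engineering (《计算机工程》), 2009, 35(10): 33-36
Network processors have a parallel architecture, whereas their high-level languages usually have serial semantics. Parallelizing compilation of a serial program requires introducing synchronization, and the quality of that synchronization in turn affects the efficiency of the generated code. For programs on network processors, a program partitioning algorithm that optimizes synchronization is proposed to increase program parallelism. Experimental data show that on several representative network applications the algorithm increases program parallelism and improves performance.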

6.
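To address the low simulation efficiency of complex civil aircraft electromechanical systems, a distributed parallel modeling and simulation method based on the Data Distribution Service is proposed. The communication characteristics of the Data Distribution Service are analyzed. Using the API functions of the Data Distribution Service and the AMESim application programming interface, a data exchange interface between AMESim and the distributed interconnection architecture platform is designed on top of the existing platform, realizing distributed parallel modeling and simulation of a civil aircraft elevator system and hydraulic power system. Simulation results show that the distributed method preserves the synchronization of simulation variables before and after distribution, extends the scale of complex system simulation, and reduces simulation computation time by 12.7%.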

7.
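With the further development and wider adoption of grid technology, enterprises place higher demands on application design, and application programming grows increasingly complex. Applications based on grid technology must be written in accordance with grid semantics, and they need a programming model that can provide multiple programming interfaces at once. Because the number of grid interfaces will keep growing and the code will become larger and more complex, this paper exploits the cross-platform nature of the Java language and proposes a scheme for building a distributed thread programming model, which is of practical value for the development and design of grid applications.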

8.
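With the development of computer architecture, distributed-memory architecture, thanks to its good scalability, has gradually come to dominate the high-performance computer architecture market. To convert existing serial programs into parallel programs that can run on high-performance computers, researchers have proposed parallelizing compilers. However, compilers for distributed-memory parallel systems have developed relatively slowly, while compilers for shared-memory parallel systems and their associated techniques have gradually matured. One feasible way to develop a compiler for distributed-memory parallel systems is to improve an existing compiler for shared-memory parallel systems so that it automatically generates MPI (Message Passing Interface) parallel programs that can run on distributed-memory high-performance computers. Accordingly, this paper designs and implements a back end supporting MPI code generation for Open64, a compiler for shared-memory parallel systems. Based on the characteristics of distributed parallelizing compilation, Open64 is improved mainly in three respects: automatic generation of computation partitioning, improved loop optimization, and automatic generation of MPI parallel code. These changes enable parallelizing compilation for distributed memory. In the experiments, Open64 with the MPI back end is used to compile serial programs, and the generated MPI parallel code runs directly on distributed-memory high-performance computers. Compared with the MPI code generated by traditional compilers for distributed-memory parallel systems, parallel efficiency is markedly improved.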

9.
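A Dynamic Decomposition Algorithm Based on Incorporating Program Control Flow   Cited by: 1 (self-citations: 1, citations by others: 0)
Computation and data decomposition is key to parallel compilation for distributed-memory systems. During parallelism detection in a parallelizing optimizing compiler, no globally consistent decomposition can be found for many serial codes. To address this, the paper proposes a dynamic decomposition algorithm that incorporates program control flow. It increases the influence of control flow on the decomposition, so that the result is better suited to the parallel code generated automatically by the back end. Experimental analysis demonstrates the effectiveness of the method.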

10.
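An Agent Communication Model Based on KQML in a Distributed Intelligent Vision System   Cited by: 2 (self-citations: 0, citations by others: 2)
In a distributed intelligent vision system, communication among multiple Visual Agents is implemented with KQML, but the performatives reserved by KQML alone cannot support such high-level, complex, task-level interaction between agents. KQML is therefore extended with seven performatives, whose definitions and semantics are given, and an agent communication model for the distributed intelligent vision system is constructed.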

11.
The execution model for mobile, dynamically‐linked, object‐oriented programs has evolved from fast interpretation to a mix of interpreted and dynamically compiled execution. The primary motivation for dynamic compilation is that compiled code executes significantly faster than interpreted code. However, dynamic compilation, which is performed while the application is running, introduces execution delay. In this paper we present two dynamic compilation techniques that enable high performance execution while reducing the effect of this compilation overhead. These techniques can be classified as (1) decreasing the amount of compilation performed, and (2) overlapping compilation with execution. We first present and evaluate lazy compilation, an approach used in most dynamic compilation systems in which individual methods are compiled on‐demand upon their first invocation. This is in contrast to eager compilation, in which all methods in a class are compiled when a new class is loaded. In this work, we describe our experience with eager compilation, as well as the implementation and transition to lazy compilation. We empirically detail the effectiveness of this decision. Our experimental results using the SpecJVM Java benchmarks and the Jalapeño JVM show that, compared to eager compilation, lazy compilation results in 57% fewer methods being compiled and reductions in total time of 14 to 26%. Total time in this context is compilation plus execution time. Next, we present profile‐driven, background compilation, a technique that augments lazy compilation by using idle cycles in multiprocessor systems to overlap compilation with application execution. With this approach, compilation occurs on a thread separate from that of application threads so as to reduce intermittent, and possibly substantial, delay in execution. Profile information is used to prioritize methods as candidates for background compilation. 
Methods are compiled according to this priority scheme so that performance‐critical methods are invoked using optimized code as soon as possible. Our results indicate that background compilation can achieve the performance of off‐line compiled applications and masks almost all compilation overhead. We show significant reductions in total time of 14 to 71% over lazy compilation. Copyright © 2001 John Wiley & Sons, Ltd.
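The contrast between eager and lazy compilation described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's built-in `compile` and `eval` stand in for a JIT compiler, and `LazyCompiler`, `invoke`, and `load_eagerly` are hypothetical names, not part of the Jalapeño JVM.

```python
class LazyCompiler:
    """Toy method table that compiles a method on its first invocation."""

    def __init__(self, sources):
        self.sources = sources   # method name -> source text of a lambda expression
        self.compiled = {}       # method name -> compiled callable
        self.compile_count = 0   # how many methods have been compiled so far

    def _compile(self, name):
        # Stand-in for the dynamic compiler: turn source text into a callable.
        self.compile_count += 1
        return eval(compile(self.sources[name], f"<{name}>", "eval"))

    def invoke(self, name, *args):
        # Lazy policy: compile only when the method is first called.
        if name not in self.compiled:
            self.compiled[name] = self._compile(name)
        return self.compiled[name](*args)

    def load_eagerly(self):
        # Eager policy: compile every method as soon as the "class" is loaded.
        for name in self.sources:
            if name not in self.compiled:
                self.compiled[name] = self._compile(name)


sources = {
    "square": "lambda x: x * x",
    "cube": "lambda x: x * x * x",
    "negate": "lambda x: -x",
}

lazy = LazyCompiler(sources)
results = (lazy.invoke("square", 4), lazy.invoke("square", 5))

eager = LazyCompiler(sources)
eager.load_eagerly()
```

After the two calls, the lazy instance has compiled one method while the eager instance has compiled all three, which mirrors the reduction in compiled methods that the abstract reports. Background compilation would move `_compile` onto a separate thread fed by a priority queue; that part is not shown here.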

12.
M. K. Crowe. Software, 1987, 17(7): 455-467
A system for dynamic compilation under the Unix operating system is described. The basis of the system is an incremental assembler that can be used statically or during program execution to insert or replace a module in an executable image. All cross-module references are via offsets into a run-time symbol table. All generated code is independent of its location or the location of the symbol table. The symbol table and all modules reside in memory segments compatible with the memory allocator malloc(). The symbol table origin is maintained in a processor register. Library procedures allow the assembler (or C compiler) to be called to alter the currently executing program, or to place a stub function which acts as a trap, so that when the stub is invoked it causes a file to be dynamically compiled into the executing program to replace the stub with a bona fide procedure. This facilitates the construction of advanced interactive environments using native code. Some example applications, to Prolog and to incremental compilation, are considered.

13.
OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2’s recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. We demonstrate that an application written once at a high-level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.

14.
Techniques are described for the automatic generation of self-scheduling parallel programs. Both scheduling algorithms and the concurrent components of applications are expressed in a high-level concurrent language. Partitioning and data dependency information are expressed by simple control statements, which may be generated either automatically or manually. A self-scheduling compiler, implemented as a source-to-source transformation, takes application code, control statements, and scheduling routines and generates a new program that can schedule its own execution on a parallel computer. The approach has several advantages compared to previous proposals. It generates programs that are portable over a wide range of parallel computers. There is no need to embed special control structures in application programs. The use of a high-level language to express applications and scheduling algorithms facilitates the development, modification, and reuse of parallel programs.

15.
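The goal of compiler optimization is to uncover optimization opportunities in a program and improve its compilation or run-time efficiency. Dead code elimination is one of the most widely used compiler optimizations; it aims to remove unreachable code from a program to improve execution efficiency. The execution paths of many applications depend on the values of run-time input parameters, and on some branch paths, once those run-time values are taken into account, there may be dead code that existing dead code elimination can hardly remove. To this end, an aggressive butterfly optimization method relying on data flow analysis is proposed. Using the SSA intermediate representation and the possible values of run-time parameters, it automatically generates branch code whose shape resembles a butterfly, giving the compiler a workable basis for related optimizations at compile time. Experiments verify the effectiveness and feasibility of the method.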
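The basic idea behind this abstract, that fixing a parameter value can make whole branches unreachable, can be shown with a small Python sketch. It is not the butterfly method itself. It assumes a control flow graph stored as a dictionary from block name to successor list, and `fold_branch`, `reachable_blocks`, and `eliminate_unreachable` are hypothetical helper names.

```python
def fold_branch(cfg, block, taken):
    """Return a copy of cfg in which `block` jumps only to `taken`.

    Models a conditional branch whose outcome is known for a given parameter value.
    """
    if taken not in cfg[block]:
        raise ValueError(f"{taken!r} is not a successor of {block!r}")
    folded = dict(cfg)
    folded[block] = [taken]
    return folded


def reachable_blocks(cfg, entry):
    """Collect every block reachable from `entry` with an iterative depth-first search."""
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(cfg.get(block, ()))
    return seen


def eliminate_unreachable(cfg, entry):
    """Drop every block that cannot be reached from `entry`."""
    live = reachable_blocks(cfg, entry)
    return {block: list(succs) for block, succs in cfg.items() if block in live}


# "entry" branches on a run-time parameter; "slow" and "log" are needed
# only when the parameter selects the slow path.
cfg = {
    "entry": ["fast", "slow"],
    "fast": ["exit"],
    "slow": ["log"],
    "log": ["exit"],
    "exit": [],
}

specialized = eliminate_unreachable(fold_branch(cfg, "entry", "fast"), "entry")
```

Without the known parameter value every block is reachable and nothing is removed; after folding the branch, `slow` and `log` disappear from the specialized graph. Generating one specialized copy per possible parameter value is presumably what produces the branching shape the abstract describes, but that reading is an assumption.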

16.
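Targeting the characteristics of networked PLC control systems, an event-graph-based method for modeling and compiling control programs is proposed. By mapping the control system to a discrete event system, an event graph model of the control program is built. An improved depth-first search algorithm decouples the event graph, decomposing the serially executed control program into event sequences that can execute in parallel. The event sequences are grouped according to the locations of I/O variables and the expected execution time of instructions, and are downloaded to the most suitable devices. Network communication instructions are inserted to synchronize variables between devices. Experimental results show that the method effectively identifies and extracts the parallel tasks in a control program and distributes them sensibly to different controllers, while keeping the control logic correct and synchronized.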

17.
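An Interactive Fortran77 Parallelizing System   Cited by: 6 (self-citations: 1, citations by others: 5)
陈文光, 杨博, 王紫瑶, 郑丰宙, 郑纬民. Journal of Software (《软件学报》), 1999, 10(12): 1259-1267
A parallelizing compiler can convert existing serial programs into parallel programs automatically or semi-automatically. The results of automatic parallelization in existing systems still fall short of manual parallelization, because the analysis capability of parallelizing tools is insufficient and because semantic information inherent in the program cannot be understood by the tools. TIPS (Tsinghua interactive parallelizing system) provides friendly interactive tools so that the user and the compiler work closely together, which is an effective way to improve the capability and efficiency of parallelizing systems.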

18.
Just-in-Time (JIT) compilation is frequently employed in order to speed up the execution of platform-independent and dynamically extensible mobile code applications. Since the time required for dynamic compilation directly influences a program's execution time, JIT compilers usually utilize only simple and fast techniques for program analysis and optimization. To further improve the analysis and optimization process of such compilers, program annotations can be used. However, almost all current annotation approaches suffer from the fact that the verification of transmitted program information is time consuming and therefore will not be carried out on the consumer side of a mobile code system. In this paper, we present a verifiable annotation technique that is based on a well known iterative data flow algorithm and which can be used for the transmission of all program information that can be derived through data flow analysis. Preliminary measurements of compilation and verification time indicate that the presented technique seems to be implementable and therefore could be used as an all-purpose transportation technique for safe program annotations.
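Why a data flow annotation can be cheaper to verify than to compute can be sketched in Python, using liveness analysis as the example. This is an assumption-laden illustration, not the paper's technique: the producer iterates to a fixed point, while the consumer checks in a single pass that the transmitted sets satisfy the data flow equations. The block encoding and the names `solve_liveness` and `verify_annotation` are hypothetical.

```python
def solve_liveness(blocks):
    """Producer side: iterate the liveness equations until nothing changes.

    blocks maps a block name to (use, defs, successors), where use and defs are sets.
    Returns (live_in, live_out), each a dict from block name to a set of variables.
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name, (use, defs, succs) in blocks.items():
            out = set().union(*(live_in[s] for s in succs))
            inn = use | (out - defs)
            if out != live_out[name] or inn != live_in[name]:
                live_out[name], live_in[name] = out, inn
                changed = True
    return live_in, live_out


def verify_annotation(blocks, live_in, live_out):
    """Consumer side: one pass checking that the annotation is a fixed point."""
    for name, (use, defs, succs) in blocks.items():
        out = set().union(*(live_in[s] for s in succs))
        if live_out[name] != out or live_in[name] != use | (out - defs):
            return False
    return True


# B1 uses x and defines y; B2 uses y and defines z.
blocks = {
    "B1": ({"x"}, {"y"}, ["B2"]),
    "B2": ({"y"}, {"z"}, []),
}

live_in, live_out = solve_liveness(blocks)
accepted = verify_annotation(blocks, live_in, live_out)

# A tampered annotation claiming that nothing is live on entry to B2.
tampered_in = {**live_in, "B2": set()}
rejected = not verify_annotation(blocks, tampered_in, live_out)
```

The check accepts any fixed point, not only the least one. For liveness every fixed point contains the least one, so an accepted annotation can be imprecise but never claims that a live variable is dead.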

19.
The Dalvik virtual machine (VM) is an integral component used to execute applications in Android, which is one of the leading operating systems for mobile devices. The Dalvik VM is an interpreter and is equipped with a trace‐based just‐in‐time compiler for enhancing the execution performance of frequently executed paths, or traces. However, traces generated by the Dalvik VM can be stopped in a conditional branch or a method call/return, which means that these traces usually have a short lifetime, decreasing the effectiveness of the compiler optimizations applied to them. Furthermore, the just‐in‐time compiler applies only a few simple optimizations because of performance considerations. In this article we present a traces‐to‐region (T2R) framework that extends traces to regions and statically compiles these regions into native binaries so as to improve the execution of Android applications. The T2R framework involves three main stages: (i) the profiling stage, in which the run‐time trace information of an application is extracted; (ii) the compilation stage, in which regions are constructed from the extracted traces and are statically compiled into a native binary; and (iii) the execution stage, in which the compiled binary is loaded into the code cache when the application starts to execute. Experiments performed on an Android tablet demonstrated that the T2R framework was effective in improving the execution performance of applications by 10.5–16.2% and decreasing the size of the code cache by 4.6–28.5%. Copyright © 2015 John Wiley & Sons, Ltd.

20.
Region-based compilation: Introduction, motivation, and initial experience   Cited by: 1 (self-citations: 0, citations by others: 1)
The most important task of a compiler designed to exploit instruction-level parallelism (ILP) is instruction scheduling. If higher levels of ILP are to be achieved, the compiler must use, as the unit of scheduling, regions consisting of multiple basic blocks, preferably those that frequently execute consecutively and that capture cycles in the program’s execution. Traditionally, compilers have been built using the function as the unit of compilation. In this framework, function boundaries often act as barriers to the formation of the most suitable scheduling regions. Function inlining may be used to circumvent this problem by assembling strongly coupled functions into the same compilation unit, but at the cost of very large function bodies. Consequently, global optimizations whose compile time and space requirements are superlinear in the size of the compilation unit may be rendered prohibitively expensive. This paper introduces a new approach, called region-based compilation, wherein the compiler, after inlining, repartitions the program into more desirable compilation units, termed regions. Region-based compilation allows the compiler to control problem size and complexity while exposing inter-procedural scheduling, optimization and code motion opportunities.
