
1.

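Identifying Potential Parallel Regions Based on Region Average Execution Time and Data Dependence Information. Total citations: 1 (self-citations: 0; citations by others: 1)

张超, 王蕾, 向晓娅, 冯晓兵. 《计算机学报》 (Chinese Journal of Computers), 2008, 31(10)

As multicore processors become the dominant trend in processor design, applications must execute in parallel to keep improving performance. Traditional automatic parallelization handles the regular loops of scientific computing well, but it cannot yet effectively parallelize irregular programs that contain many function calls and pointer references. To address this, the paper proposes a method that identifies potential parallel regions from region average execution time and data dependence information, so that some irregular programs can be parallelized efficiently. The main contributions are: (1) automatic identification of several kinds of parallelism in a program, including not only the fine-grained parallelism across loop iterations covered by traditional analysis, but also the coarse-grained parallelism among loop bodies and function call sites that traditional analysis cannot yet handle effectively; from the many candidates, a benefit analysis based on region average execution time selects suitable regions to parallelize; (2) automatic identification of the number and type of data dependences between potential parallel regions, and of the program variables that cause them. Based on these analysis results, the authors used a behavior-oriented speculative parallelization system (behavior oriented parallelism) to parallelize four benchmarks from SPEC2006. The parallelized programs achieved average speedups of 300% and 260% on Intel and AMD multicore processors, respectively.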
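The benefit analysis mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the selection rule, so everything here is an assumption: the `RegionProfile` record, the `select_regions` helper, the microsecond unit, and the idea of comparing a region's average execution time against a fixed threshold (`min_average_us`) standing in for the cost of starting a parallel task.

```python
from dataclasses import dataclass


@dataclass
class RegionProfile:
    """Hypothetical profile record for one candidate region."""
    name: str
    total_time_us: float  # total time spent in the region across the run
    invocations: int      # number of times the region was entered

    @property
    def average_time_us(self) -> float:
        # Average execution time per invocation; 0 for regions never entered.
        return self.total_time_us / self.invocations if self.invocations else 0.0


def select_regions(profiles, min_average_us):
    """Keep regions whose average time is at least the assumed threshold,
    ordered from longest to shortest average time."""
    candidates = [p for p in profiles if p.average_time_us >= min_average_us]
    return sorted(candidates, key=lambda p: p.average_time_us, reverse=True)


# Illustrative numbers only, not measurements from the paper.
profiles = [
    RegionProfile("loop_body_A", 9000.0, 30),   # 300 us per invocation
    RegionProfile("call_site_B", 500.0, 100),   # 5 us per invocation
    RegionProfile("loop_iter_C", 4000.0, 20),   # 200 us per invocation
]
selected = [p.name for p in select_regions(profiles, min_average_us=50.0)]
print(selected)
```

Regions that run only briefly per invocation (here `call_site_B`) are dropped because the assumed task start-up cost would outweigh any gain from running them in parallel.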

2.

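A Speculative Parallelization Method for Irreducible Loops

邓之刚, 曾国荪, 周静. 《计算机工程与科学》 (Computer Engineering & Science), 2007, 29(10): 135-138

Traditional parallelizing compilers usually handle irreducible loops by node splitting, which unavoidably duplicates code. This paper uses speculation to exploit the parallelism of irreducible loops: the method finds the irreducible loops in a program at compile time and, at run time, predicts each loop's entry with a "persistent reference" strategy, thereby parallelizing the irreducible loop.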

3.

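Software Pipelining of Nested Loops: Comparison and Improvement

李文龙, 汤志忠. 《计算机科学》 (Computer Science), 2004, 31(3): 163-166

Loop parallelization is one of the core problems of parallelizing compilation. Many scientific programs spend most of their execution time in loops, so effectively exploiting loop parallelism improves the efficiency of the whole program. Nested loops are the most common case, so parallelizing them has both theoretical and practical importance. The rapid growth of hardware resources in modern processors also makes it necessary to exploit parallelism across the whole multidimensional iteration space. Most current software pipelining algorithms target only the innermost loop, and only a few pipeline nested loops. This paper presents several software pipelining algorithms for nested loops and compares their similarities and differences, providing guidance for choosing an algorithm when implementing a compiler.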

4.

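A Loop Embedding Method for DO Loops Containing Procedure Calls

原庆能, 丁永华, 臧斌宇, 朱传琪. 《软件学报》 (Journal of Software), 1997, 8(11): 809-816

Loops are the program construct richest in parallelism and are therefore the main target of parallelizing compilation. Procedure calls inside a loop, however, seriously hinder data dependence analysis of the loop, so much of its potential parallelism goes unexploited. The loop embedding method proposed in this paper makes it possible to parallelize some loops that contain procedure calls. For loops of this kind whose parallelism other interprocedural analysis techniques can also exploit, loop embedding has lower parallelization overhead and gives more precise analysis. Loop embedding also reduces the scheduling overhead caused by repeated procedure calls. The method has been implemented in AFT (Automatic Fortran Transformer), the automatic parallelizing compilation system developed by the authors, and experiments on the SPEC92 benchmark suite show that it is effective.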

5.

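New Techniques in Automatic Parallelizing Compilation. Total citations: 1 (self-citations: 0; citations by others: 1)

阳雪林, 于勐, 陈道蓄, 谢立. 《软件学报》 (Journal of Software), 2000, 11(9): 1268-1275

Automatic parallelizing compilation provides important support for parallelizing existing serial programs and for writing new parallel ones, and has therefore received attention for more than 20 years. In recent years, progress in dependence analysis, program transformation, data distribution and redistribution, and scheduling has brought automatic parallelizing compilation closer to practical use. This paper surveys the latest advances in automatic parallelizing compilation and identifies problems that further research needs to solve.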

6.

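An Automatic Parallelization Framework for Complex Nested Loops Based on LLVM Pass

马春燕, 吕炳旭, 叶许姣, 张雨. 《软件学报》 (Journal of Software), 2023, 34(7): 3022-3042

With the widespread adoption of multicore processors, automatic parallelization of serial code in embedded legacy systems is an active research topic. Automatically parallelizing complex nested loops, which are characterized by imperfect nesting and non-affine dependences, remains technically challenging. This paper proposes CNLPF, an automatic parallelization framework for complex nested loops based on LLVM Pass. First, it proposes a representation model for complex nested loops, the loop structure tree, and automatically converts the regular regions of nested loops into this representation. Then it performs data dependence analysis on the loop structure tree to build intra-loop and inter-loop dependences. Finally, it generates parallel loop code based on the OpenMP shared-memory programming model. For six programs from the SPEC2006 suite containing nearly 500 complex nested loops, the authors measured the proportion of complex nested loops and tested parallel speedup. The results show that the framework can handle complex nested loops that LLVM Polly cannot optimize, strengthening LLVM's parallelizing optimization capability, and that combining the method with Polly improves speedup by 9%-43% over Polly alone.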

7.

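A Feature-Based Method for Discovering Parallelizable Points in Programs

郭慎, 李培峰, 朱巧明. 《计算机应用与软件》 (Computer Applications and Software), 2011, 28(4)

The first problem in parallelizing compilation is discovering the parallelizable points in a program. The method characterizes a program's parallelizability by features such as program execution time, the loop portions of the program, data dependence analysis, and the ratio of execution time to loop iteration count, and uses an SVM over these features to mine parallelizable points. Experiments show that the method better meets practical needs, and that the discovered points yield considerable speedups once parallelized.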

8.

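A Dynamic Identification Technique for Parallelizable Loops on Distributed Systems

阳雪林, 于勐, 陈道蓄, 谢立. 《软件学报》 (Journal of Software), 2002, 13(8): 1718-1722

This paper addresses dynamic dependence analysis, in a distributed environment, of loops in irregular serial programs from which an inspector loop can be extracted, and proposes a dynamic analysis algorithm based on the inspector/executor model. Its contributions are: (1) the algorithm can itself run in parallel on a distributed system; (2) it directly analyzes loops with copy-in and last-value assignment operations; (3) it gives a parallelization method for the loop; (4) it does not require the loop to be fully parallel, and also supports parallel execution of some partially parallel loops. Theoretical analysis and experiments show that, with a suitable number of processors, a good speedup is obtained when the loop can be parallelized, and the overhead added to serial execution is small when it cannot. The approach thus offers a new means of exploiting more loop parallelism in distributed environments.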

9.

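Research on Compilation Techniques for Parallel Inference Machines

黄志毅, 胡守仁. 《计算机科学》 (Computer Science), 1990, 17(3): 10-15

Automatic mode recognition, data dependence analysis, exploitation of AND-parallelism, side-effect handling, granularity analysis of parallelism, handling of concurrent languages, and extension of the WAM instruction set are among the issues facing compilation for parallel inference machines. This paper discusses each of these issues and the authors' work on them, and outlines the prospects for research on compilation techniques for parallel inference machines.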

10.

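A New Parallelizing Compilation Technique Based on Decomposition Transformation. Total citations: 1 (self-citations: 0; citations by others: 1)

陈清萍, 李晓峰, 郑世荣. 《计算机科学》 (Computer Science), 1998, (1)

Parallelizing transformation is an important part of parallelizing compilation: it restructures the source program into an equivalent form that exposes more opportunities for parallelism. Traditional parallelizing transformation techniques focus mainly on exploiting loop parallelism, using various loop parallelization techniques to restructure loops so that they have …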

11.

Automatic Parallelization of Recursive Procedures

Manish Gupta, Sayak Mukhopadhyay, Navin Sinha. 《International journal of parallel programming》, 2000, 28(6): 537-562

Parallelizing compilers have traditionally focussed mainly on parallelizing loops. This paper presents a new framework for automatically parallelizing recursive procedures that typically appear in divide-and-conquer algorithms. We present compile-time analysis, using powerful, symbolic array section analysis, to detect the independence of multiple recursive calls in a procedure. This allows exploitation of a scalable form of nested parallelism, where each parallel task can further spawn off parallel work in subsequent recursive calls. We describe a runtime system which efficiently supports this kind of nested parallelism without unnecessarily blocking tasks. We have implemented this framework in a parallelizing compiler, which is able to automatically parallelize programs like quicksort and mergesort, written in C. For cases where even the advanced compile-time analysis we describe is not able to prove the independence of procedure calls, we propose novel techniques for speculative runtime parallelization, which are more efficient and powerful in this context than analogous techniques proposed previously for speculatively parallelizing loops. Our experimental results on an IBM G30 SMP machine show good speedups obtained by following our approach.

12.

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

Rauchwerger L., Padua D.A. 《Parallel and Distributed Systems, IEEE Transactions on》, 1999, 10(2): 160-180

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is reexecuted serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks, which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods.
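The run-time dependence test described in this abstract can be sketched in simplified form. This is an illustration under stated assumptions, not the authors' implementation: it covers only a shadow-array check for cross-iteration dependences with privatization, it leaves out reduction recognition entirely, and it analyzes a recorded access trace sequentially instead of marking shadow arrays during a real speculative parallel run. The function name `lrpd_style_test`, the trace format, and the returned labels are all hypothetical.

```python
def lrpd_style_test(iterations, size):
    """Classify a loop from its per-iteration access trace on one shared array.

    iterations: list of iterations, each a list of (kind, index) pairs in
                program order, where kind is "r" (read) or "w" (write).
    size:       number of elements in the array under test.
    """
    written_somewhere = [False] * size  # written in at least one iteration
    read_only_in_iter = [False] * size  # read in an iteration that never writes it
    exposed_read = [False] * size       # read before any write within an iteration
    total_writes = 0                    # distinct writes summed over iterations

    for accesses in iterations:
        written = set()
        exposed = set()
        for kind, idx in accesses:
            if kind == "w":
                written.add(idx)
            elif idx not in written:
                exposed.add(idx)
        total_writes += len(written)
        for idx in written:
            written_somewhere[idx] = True
        for idx in exposed:
            exposed_read[idx] = True
            if idx not in written:
                read_only_in_iter[idx] = True

    distinct_written = sum(written_somewhere)

    # An element written in one iteration and only read in another means a
    # value may flow across iterations: speculation fails.
    if any(w and r for w, r in zip(written_somewhere, read_only_in_iter)):
        return "serial"
    # Every written element was written by exactly one iteration.
    if total_writes == distinct_written:
        return "doall"
    # Several iterations write the same element; private copies are valid
    # only if no iteration reads such an element before writing it.
    if any(w and e for w, e in zip(written_somewhere, exposed_read)):
        return "serial"
    return "doall-with-privatization"


independent = [[("r", 0), ("w", 0)], [("r", 1), ("w", 1)]]  # a[i] = a[i] + 1
temporary = [[("w", 0), ("r", 0)], [("w", 0), ("r", 0)]]    # t = ...; use t
carried = [[("w", 0)], [("r", 0), ("w", 1)]]                # a[i] = a[i-1]
print(lrpd_style_test(independent, 2))
print(lrpd_style_test(temporary, 1))
print(lrpd_style_test(carried, 2))
```

In a speculative setting, a `"serial"` result would mean discarding the parallel run's results and reexecuting the loop serially, as the abstract describes.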

13.

Evaluating automatic parallelization in SUIF

Sungdo Moon, Byoungro So, Hall M.W. 《Parallel and Distributed Systems, IEEE Transactions on》, 2000, 11(1): 36-49

This paper presents the results of an experiment to measure empirically the remaining opportunities for exploiting loop-level parallelism that are missed by the Stanford SUIF compiler, a state-of-the-art automatic parallelization system targeting shared-memory multiprocessor architectures. For the purposes of this experiment, we have developed a run-time parallelization test called the Extended Lazy Privatizing Doall (ELPD) test, which is able to simultaneously test multiple loops in a loop nest. The ELPD test identifies a specific type of parallelism where each iteration of the loop being tested accesses independent data, possibly by making some of the data private to each processor. For 29 programs in three benchmark suites, the ELPD test was executed at run time for each candidate loop left unparallelized by the SUIF compiler to identify which of these loops could safely execute in parallel for the given program input. The results of this experiment point to two main requirements for improving the effectiveness of parallelizing compiler technology: incorporating control flow tests into analysis and extracting low-cost run-time parallelization tests from analysis results.

14.

Run-Time Support for the Automatic Parallelization of Java Programs

Bryan Chan, Tarek S. Abdelrahman. 《The Journal of supercomputing》, 2004, 28(1): 91-117

15.

Unified Interprocedural Parallelism Detection

Jay P. Hoeflinger, Yunheung Paek, Kwang Yi. 《International journal of parallel programming》, 2001, 29(2): 185-215

16.

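Dynamic Data-Flow Analysis Using Backward Information

吴蓉, 李剑慧, 朱传琪. 《计算机工程》 (Computer Engineering), 2001, 27(7): 103-104, 150

This paper introduces the basic method of dynamic data-flow analysis, analyzes its shortcomings under complex control flow, and proposes the BPD test, which uses backward information to perform dynamic data-flow analysis. The test eliminates the side effects of dynamically dead code and extracts a substantial share of a loop's parallelism. Experimental results on fpppp.f from the SPEC95 benchmark suite confirm that the BPD test achieves significant speedups that other existing methods cannot.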

17.

EFFECTIVE PARALLELIZATION TECHNIQUES FOR LOOP NESTS WITH NON-UNIFORM DEPENDENCES

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(1):37-64

The parallelism of loop nests with non-uniform dependences is difficult to extract and is explored ineffectively by existing parallelization schemes. In this paper, we propose new efficient techniques for extracting the parallelism of loop nests with non-uniform dependences using their irregularity. In this way, current highly parallel multiprocessor systems such as multithreaded and clustering multiprocessor systems can be fully utilized. The four mechanisms are (a) parallelization part splitting, (b) partial parallelization decomposition, (c) irregular loop interchange and (d) growing pattern detection. They explore the parallelism of special parallel patterns for nested loops with non-uniform dependences. The loop transformations used in uniform loops are also applied in non-uniform dependence loops after legality tests. We apply the results of classical convex theory and detect special parallel patterns of dependence vectors. We also propose an algorithm that combines the above mechanisms to enhance parallelism. We demonstrate that our technique gives much better speedup and extracts more parallelism than the existing techniques. Thus, we are encouraged by these apparent enhancements to pursue further development.

18.

Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons

Alexandra Jimborean, Philippe Clauss, Jean-François Dollinger, Vincent Loechner, Juan Manuel Martinez Caamaño. 《International journal of parallel programming》, 2014, 42(4): 529-545

We propose a framework based on an original generation and use of algorithmic skeletons, and dedicated to speculative parallelization of scientific nested loop kernels, able to apply at run-time polyhedral transformations to the target code in order to exhibit parallelism and data locality. Parallel code generation is achieved almost at no cost by using binary algorithmic skeletons that are generated at compile-time, and that embed the original code and operations devoted to instantiate a polyhedral parallelizing transformation and to verify the speculations on dependences. The skeletons are patched at run-time to generate the executable code. The run-time process includes a transformation selection guided by online profiling phases on short samples, using an instrumented version of the code. During this phase, the accessed memory addresses are used to compute on-the-fly dependence distance vectors, and are also interpolated to build a predictor of the forthcoming accesses. Interpolating functions and distance vectors are then employed for dependence analysis to select a parallelizing transformation that, if the prediction is correct, does not induce any rollback during execution. In order to ensure that the rollback time overhead stays low, the code is executed in successive slices of the outermost original loop of the nest. Each slice can be either a parallel version which instantiates a skeleton, a sequential original version, or an instrumented version. Moreover, such slicing of the execution provides the opportunity of transforming differently the code to adapt to the observed execution phases, by patching differently one of the pre-built skeletons. The framework has been implemented with extensions of the LLVM compiler and an x86-64 runtime system. Significant speed-ups are shown on a set of benchmarks that could not have been handled efficiently by a compiler.
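The step in which accessed memory addresses are turned into dependence distance vectors can be illustrated with a small sketch. It is a toy model under assumptions, not the framework's code: the trace format (iteration vector, address, access kind), the function name `distance_vectors`, and the row-major address formula in the example are all hypothetical, and the interpolation-based predictor is not modeled.

```python
def distance_vectors(trace):
    """Collect loop-carried dependence distance vectors from a profiled trace.

    trace: list of (iteration_vector, address, kind) tuples in execution
           order, where kind is "r" (read) or "w" (write).
    Returns the set of non-zero distance vectors (sink minus source).
    """
    last_write = {}   # address -> iteration vector of the most recent write
    reads_since = {}  # address -> iteration vectors that read since that write
    distances = set()

    def record(source, sink):
        diff = tuple(b - a for a, b in zip(source, sink))
        if any(diff):  # skip dependences within a single iteration
            distances.add(diff)

    for iteration, address, kind in trace:
        if kind == "r":
            if address in last_write:                 # flow dependence
                record(last_write[address], iteration)
            reads_since.setdefault(address, []).append(iteration)
        else:
            if address in last_write:                 # output dependence
                record(last_write[address], iteration)
            for reader in reads_since.get(address, []):  # anti dependence
                record(reader, iteration)
            last_write[address] = iteration
            reads_since[address] = []
    return distances


# Trace of: for i in 1..3: for j in 0..2: a[i][j] = a[i-1][j] + 1
# with an assumed row-major address i * cols + j.
cols = 3
trace = []
for i in range(1, 4):
    for j in range(cols):
        trace.append(((i, j), (i - 1) * cols + j, "r"))
        trace.append(((i, j), i * cols + j, "w"))

vectors = distance_vectors(trace)
print(vectors)
```

Every dependence in this trace is carried by the outer loop only, so the inner `j` loop carries none and is a candidate for parallel execution.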

19.

Using knowledge-based systems for research on parallelizing compilers

Chao-Tung Yang, Shian-Shyong Tseng, Yun-Woei Fann, Ting-Ku Tsai, Ming-Huei Hsieh, Cheng-Tien Wu. 《Concurrency and Computation》, 2001, 13(3): 181-208

20.

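Call Localization Optimization in the Parallelization of Object-Oriented Languages

于勐, 臧婉瑜, 谢立, 孙钟秀, 过敏意. 《计算机学报》 (Chinese Journal of Computers), 2002, 25(4): 409-416

This paper proposes a method for applying call localization to object-oriented languages in a parallel environment, and discusses in detail the conditions under which the technique applies and how it reduces the overhead of remote procedure calls in loops. The optimization first splits a loop into several loops that contain remote calls, then assigns each resulting loop to the processor that holds the objects used in it, and finally simplifies the iteration space and transfers data by message passing. It is performed after object distribution and loop parallelization, and localizes function calls to processors. It can further exploit task parallelism in loops, lower computational complexity, and reduce function call overhead, and is especially suited to optimizing small functions inside loops in object-oriented languages. The technique has been implemented in JAPS-II, the automatic parallelizing Java compiler designed by the authors; in experiments, it achieved superlinear speedup.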