首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 578 毫秒
1.
软件流水技术是对程序及微程序中的循环进行优化的一种有效方法,可对基本块构成的循环体进行软件流水的LURPR算法已取得令人满意的效果。本文将在LURPR法的基础上,把软件流水技术扩展到任意结构的循环体,并给出相应的GURPR算法,GURPR算法可对任意的含非正常入口、条件出口、支路、循环嵌套及子程序调用的循环体进行软件流水。  相似文献   

2.
URPR——一种实现软件流水技术的方法   总被引:2,自引:0,他引:2  
软件流水技术是对AP数组处理机循环程序进行优化的一种有效方法.本文介绍一种在微代码循环压缩URCR算法基础上研究的URPR算法.首先对循环体进行展开,展开的个数取决于循环体之间的数据相关程度,然后将展开后的循环体逐个进行安放,最后进行收拢得到一个优化后的新循环体.初步实验验证了URPR比目前现有一些方法具有优越性.  相似文献   

3.
文[1]介绍了下标追踪法的基本思想、基本理论与基本方法,从理论上证明了时序层次在等价变换下达到标准形是可并行处理的充要条件。然而,时序层次是在全程追踪的基础上建立起来的,全程追踪往往需要过大的计算量与存储量,难以实现。尤其是在DO变量的始值或/和终值编译时不可计值的情况下,整个时序层次实际上是未知的。因此,[1]的成果多半只有理论价值,仅在DO变量的始值与终值编译时均可计值且相差不远的条件下,能够加以实施。在文[1]的基础上,本文介绍下标追踪法工程实现中的下列技术: 1.循环体在追踪等价意义下的最简形式; 2.确定追踪区间; 3.放宽限制条件; 4.数据结构的设计; 5.向多重循环拓广; 6.识别与构造算法。核心内容是证明了层次片断定理。该定理给出追踪区间的确定方法和公式,循环体内任意两个语句的任意两个数组项之间所有类型的数据依赖关系,必定会在这个追踪区间内发生。这个定理实际上解决了坐标方法未能解决的整数环上受限二元一次方程的求解问题。在那里,添加同名数组下标表达式一次项系数全同的条件回避掉这个问题。层次片断定理突破了下标追踪法工程化的难关。从计算公式和许多实例可以看到:下标追踪法的计算量与存储量不仅是可接受的,甚至是比较小的。也是由于有了层次片断定理,对赋值语句循环的限制进一步放宽,达到只要求步长和线性下标表达式系数编译时可计值的程度。  相似文献   

4.
3种提高软件流水有效性的算法:比较和结合   总被引:1,自引:0,他引:1  
李文龙  陈彧  林海波  汤志忠 《软件学报》2005,16(10):1822-1832
软件流水是开发循环程序指令级并行性的技术,它通过并行执行连续的多个循环体来加快循环的执行速度.在软件流水中,循环体的重叠增加了寄存器需求,导致寄存器压力增大,当目标处理机所提供的寄存器不足时,软件流水可能失败.在Itanium处理机上评估了NAS和SPEC2000基准程序中的软件流水循环的寄存器需求,发现静态寄存器不足是造成软件流水失败的主要原因,提出了3种增加软件流水个数、提高软件流水有效性的算法:限制循环展开因子的算法(register sensitive unrolling,简称RSU)、堆栈寄存器分配算法(stacked registerallocation,简称SRA)以及变量类型转换的算法(variabletype conversion,简称VTC).RSU根据静态寄存器需求确定一个合理的展开因子,增加了软件流水的成功率;SRA和VTC分别使用空闲的堆栈寄存器和旋转寄存器来充当静态寄存器,提高了寄存器的利用率.在面向Itanium处理器的开放源码编译器ORC(open research compiler)上实现了这3种算法,通过NAS程序的测试比较了这3种算法的有效性,同时对它们的结合应用进行了研究和实验.  相似文献   

5.
为了解决谓词抽象技术面临的程序中循环体的每次迭代都至少需要一个谓词来实现的难题,提出了一个两阶段的不完全判定过程,用来对一个包含循环的反例进行可行性模拟.通过给出的循环探测算法来从抽象模型中提取出包含循环的反例,并用循环迭代的数量作为参数来确定模拟实例.实验结果表明,该方法在典型的缓冲溢出实例中的表现优于传统的抽象求精方法.  相似文献   

6.
江南  汪吕蒙  张晓瞳  何炎祥 《软件学报》2022,33(6):2115-2126
迭代计算数据流等式的解,是数据流分析的常用方法.计算支配节点,从而识别自然循环,是许多现代编译器优化分析的重要组成部分.机械化验证高效的求解支配节点的算法通常是获得一个实际的“验证编译器”不可或缺的一部分.为了形式化证明一个高效的迭代求解严格支配节点的算法(CHK),首先建立了值域是逆序列表集合的半格结构,逆序列表中的元素是控制流图中节点的逆后序遍历次序,并证明了它是一个半格,其偏序满足上升链条件.然后使用半格结构,实现了一个基于工作表的Kildall迭代算法,计算严格支配节点.接下来,首先给出了控制流图中支配节点的定义性规范和相关性质定理,然后构造并证明了迭代求解算法所满足的重要性质.利用这些性质定理,相对于定义性规范,证明了该迭代求解算法的正确性和完备性.最后进行总结,并讨论未来工作.整个形式化开发使用的是定理证明助手Isabelle/HOL.  相似文献   

7.
串行循环向量化的图论方法   总被引:1,自引:0,他引:1  
本文讨论FORTRAN内层DO循环的向量化问题。对于满足给定条件的DO循环,根据文中导出的计算公式,可以精确地构造出与循环体相对应的时序依赖关系图。按照判别一个有向图是否存在有向回路的充要条件,采取逐个消去图中某些出度或入度为零的顶点的“消元法”,找出循环体中可以向量化的语句,并统一讨论了确定语句顺序和引进临时数组的算法。  相似文献   

8.
对于给定的距离参数。,性质测试算法A需以高概率正确地区分给定的对象具备预定性质II与二远离性质 II。若存在II的测试算法A满足其询问复杂性独立于规模参数n,则称II是可测的。设H是一个图,性质仔了)℃。为 不含井子图的图所构成的集合。在有界度模型中,Goldreich与Ron证明了对任意连通图H,性质仔力℃。是可测 的}s}。在邻接矩阵模型中,证明了对任意图H,不管其连通与否,性质件厂re。是可测的。  相似文献   

9.
将循环平稳理论引入传播算子算法中,提出了循环传播算子的DOA估计算法。利用有用信号与干扰信号循环频率的不同和噪声在感兴趣的循环频率上不呈现谱相关的性质,有效滤除了干扰信号和噪声的影响,实现了高分辨的DOA估计,并且突破了传统传播算子算法信号源数一定要小于阵列个数的限制;同时它不需要对高维空时协方差矩阵进行特征值分解,与基于特征子空间分解的算法相比运算量低,利于实时处理。仿真结果证明了该算法的有效性。  相似文献   

10.
广义模糊逻辑和锁语义归结原理   总被引:9,自引:0,他引:9  
将命题的真值取在格上的模糊逻辑,我们称为广义模糊逻辑。本文讨论了这种广义模糊逻辑的性质,并证明了,对于一阶谓词公式,在广义模糊逻辑中的不可满足性和在二值逻辑中的不可满足性是等价的。还证明了原始的归结原理在广义模糊逻辑中是完备的。 最后,在模糊逻辑中讨论了涉及子句真值的语义归结原理,对于在广义模糊逻辑中的不可满足配锁子句集,在任意一个模糊解释下,使用语义归结原理,总可演绎出空子句。  相似文献   

11.
In this paper, we consider the optimal loop scheduling and minimum storage allocation problems based on the argument-fetching dataflow architecture model. Under the argument-fetching model, the result generated by a node is stored in a unique location which is addressable by its successors. The main contribution of this paper includes: for loops containing no loop-carried dependences, we prove that the problem of allocating minimum storage required to support rate-optimal loop scheduling can be solved in polynomial time. The polynomial time algorithm is based on the fact that the constraint matrix in the formulation is totally unimodular. Since the instruction processing unit of an argument-fetching dataflow architecture is very much like a conventional processor architecture without a program counter, the solution of the optimal loop storage allocation problem for the former will also be useful for the latter.  相似文献   

12.
软件流水的低功耗编译技术研究   总被引:4,自引:1,他引:4       下载免费PDF全文
对具有可动态独立调整运行频率/电压的多功能部件配置结构M,基于全局调度的循环依赖关系,使用ILP形式化框架,研究了对给定循环L进行动态频率/电压调整的低功耗软件流水调度的编译优化技术.提出了一种合理而有效的低功耗最优化软件流水调度方法,使其在运行时保持性能不变而消耗的功耗/能量最小.  相似文献   

13.
A method for executing nested loops with constant loop-carried dependencies in parallel on message-passing multiprocessor systems to reduce communication overhead is presented. In the partitioning phase, the nested loop is divided into blocks that reduce the interblock communication, without regard to the machine topology. The execution ordering of the iterations is defined by a given time function based on L. Lamport's (1974) hyperplane method. The iterations are then partitioned into blocks so that the execution ordering is not disturbed, and the amount of interblock communication is minimized. In the mapping phase, the partitioned blocks are mapped onto a fixed-size multiprocessor system in such a manner that the blocks that have to exchange data frequently are allocated to the same processor or neighboring processors. A heuristic mapping algorithm for hypercube machines is proposed  相似文献   

14.
Precise value-based data dependence analysis for scalars is useful for advanced compiler optimizations. The new method presented here for flow and output dependence uses Factored Use and Def chains (FUD chains), our interpretation and extension of Static Single Assignment. It is precise with respect to conditional control flow and dependence vectors. Our method detects dependences which are independent with respect to arbitrary loop nesting, as well as loop-carried dependences. A loop-carried dependence is further classified as being carried from the previous iteration, with distance 1, or from any previous iteration, with direction <. This precision cannot be achieved by traditional analysis, such as dominator information or reaching definitions. To compute anti- and input dependence, we use Factored Redef-Use chains, which are related to FUD chains. We are not aware of any prior work which explicitly deals with scalar data dependence utilizing a sparse graph representation. A preliminary version of this paper appeared in theSeventh Anual Workshop on Languages and Compilers for Parallel Computing, August 1994. Supported in part by NSF Grant CCR-9113885 and a grant from Intel Corporation and the Oregon Advanced Computing Institute.  相似文献   

15.
For pt.I see ibid., p.470-85. A methodology for designing pipelined data-parallel algorithms on multicomputers is studied. The design procedure starts with a sequential algorithm which can be expressed as a nested loop with constant loop-carried dependencies. The procedure's main focus is on partitioning the loop by grouping related iterations together. Grouping is necessary to balance the communication overhead with the available parallelism and to produce pipelined execution patterns, which result in pipelined data-parallel computations. The grouping should satisfy dependence relationships among the iterations and also allow the granularity to be controlled. Various properties of grouping are studied, and methods for generating communication-efficient grouping are given. Given a grouping and an assignment of the groups to the processors, an analytic model is combined with the grouping results to describe the behavior and to estimate the performance of the resultant parallel program. Expressions characterizing the performance are derived  相似文献   

16.
This paper deals with task scheduling, where each task is one particular iteration of a DO loop with partial loop-carried dependencies. Independent iterations of such loops can be scheduled in an order different from the one of classical serial execution, so as to increase program performance.The approach that we present is based both on the use of a directive added to the High Performance Fortran (HPF2) language, which specifies the dependencies between iterations, and on inspector/executor support, implemented in the CoLUMBO library, which builds the task graph and schedules tasks associated with iterations. We validate our approach by showing results achieved on an IBM SP2 for a sparse Cholesky factorization algorithm applied to real problems.  相似文献   

17.
Parallel loops account for the greatest amount of parallelism in numerical programs.Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems.However,in parallel processing systems with caches or local memories in memory hierarchies,“thrashing problemmay”may arise whenever data move back and forth between the caches or local memories in different processors.Previous techniques can only deal with the rather simple cases with one linear function in the perfactly nested loop.In this paper,we present a parallel program optimizing technique called hybri loop interchange(HLI)for the cases with multiple linear functions and loop-carried data dependences in the nested loop.With HLI we can easily eliminate or reduce the thrashing phenomena without reucing the program parallelism.  相似文献   

18.
消除VLIW结构上的循环体间冗余流相关   总被引:2,自引:1,他引:1  
容红波  汤志忠 《软件学报》2000,11(1):126-132
数据相关是并行处理的基本依据.该文指出,VLIW(very long instruction word)特有的锁步性质使其数据相关性分析具有与众不同的特点.同一体差上的流相关形成一个线序集合,多体差上的特征流相关之间也存在包含关系.据此,提出一种用于VLIW的消除循环体间冗余流相关的方法.该方法是完备的,可以去除所有冗余的体间流相关,从而减轻循环调度的负担.文章给出判定单体差和多体差存在冗余的充分必要条件,以及消除冗余的线性复杂度的算法.这种方法具有普遍意义,可作为VLIW上软件流水和多指令流调度的基础.  相似文献   

19.
This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd-Steinberg dithering algorithm, the Transitive Closure algorithm, and the FSBM motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms UPC by 5-95% depending on the test case.  相似文献   

20.
本文讨论有限自动机显表出定义函数f的线性本原圈积分解问题。在该圈积分解之下,函数f被表成线性外函数因子fL与线性本原内涵数因子fN的圈积,这里函数fL定义了一个线性弱可逆有限自动机MfL,而函数fN定义了一个非线性有限自动机MfN且其不能再分解成一非平凡的线性外函数与一非线性内函数的圈积。本文证明了一函数f的任两线性本原内因子互为对方的线性本原内因子,从而证明了函数f的线性本原圈积分解的唯一性。本  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号