1.

Evaluating the Effects of Predicated Execution on Branch Prediction

Gary Tyson, Matthew Farrens 《International Journal of Parallel Programming》1996,24(2):159-186

As microprocessor designs move towards deeper pipelines and support for multiple instruction issue, steps must be taken to alleviate the negative impact of branch operations on processor performance. One approach is to use branch prediction hardware and perform speculative execution of the instructions following an unresolved branch. Another technique is to eliminate certain branch instructions altogether by translating the instructions following a forward branch into predicate form. Both techniques are employed in many current processor designs. This paper investigates the relationship between branch prediction techniques and branch predication. In particular, we are interested in how using predication to remove a certain class of poorly predicted branches affects the prediction accuracy of the remaining branches. A variety of existing predication models for eliminating branch operations are presented, and the effect that eliminating branches has on branch prediction schemes, ranging from simple prediction mechanisms to newer, more sophisticated branch predictors, is studied. We also examine the impact of predication on basic block size, and how the two techniques used together affect overall processor performance.
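As a rough illustration of what translating a forward branch into predicate form means, the Python sketch below writes the same computation twice: once with a conditional branch, and once in an if-converted style in which both arms execute unconditionally and a predicate selects the result. Python has no predicate registers, so the predicate is modeled as a 0/1 integer and the select is done arithmetically; the function names are hypothetical and the sketch is not taken from the paper.

```python
def branchy_abs_diff(a: int, b: int) -> int:
    # Branching form: a conditional forward branch picks one of two paths.
    if a > b:
        d = a - b
    else:
        d = b - a
    return d


def predicated_abs_diff(a: int, b: int) -> int:
    # If-converted form (hypothetical illustration): define a predicate once,
    # execute both arms, and let the predicate decide which result is kept.
    p = int(a > b)          # predicate define, modeled as 0 or 1
    t1 = a - b              # "guarded" by p
    t2 = b - a              # "guarded" by not p
    return p * t1 + (1 - p) * t2  # arithmetic select instead of a branch
```

The trade-off the abstract studies is visible even here: the predicated version has no branch to mispredict, but it always executes both arms.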

2.

Branch Misprediction Prediction: Complementary Branch Predictors

Sendag R., Yi J.J., Peng-fei Chuang 《Computer Architecture Letters》2007,6(2):49-52

In this paper, we propose a new class of branch predictors, complementary branch predictors, which can be easily added to any branch predictor to improve the overall prediction accuracy. This mechanism differs from conventional branch predictors in that it focuses only on mispredicted branches. As a result, it is scalable and flexible (it can be implemented with any branch predictor), and it is not on the critical path. More specifically, this mechanism improves branch prediction accuracy by predicting which future branch will be mispredicted next and when that will occur, and then changing the predicted direction at the predicted time. Our results show that a branch predictor with the branch misprediction predictor achieves the same prediction accuracy as a conventional branch predictor that is 4 to 16 times larger, without significantly increasing the overall complexity or lengthening the critical path.

3.

An Accurate Microprocessor Model with Branch Prediction (total citations: 3; self-citations: 0; citations by others: 3)

Chen Yueyue, Zhou Xingming 《Journal of Computer Research and Development》2003,40(5):741-745

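In today's deeply pipelined, wide-issue microprocessors, accurate branch prediction is an indispensable technique for achieving high performance. Branch mispredictions waste many clock cycles and prevent out-of-order execution from delivering its full benefit. The effective performance of a wide-issue microprocessor also depends on the size of the instruction window and the instruction fetch width. This paper proposes a new, more accurate microprocessor model that accounts for branch prediction and the cycle penalty of branch mispredictions. Based on the statistical rule that execution bandwidth equals the square root of the number of instructions available in the instruction window, it gives an algorithm that more accurately describes the relationship among fetch bandwidth, branch prediction accuracy, branch misprediction penalty, instruction window size, and IPC, and it discusses the trade-offs among these parameters and their influence on program IPC. From this, the fetch bandwidth threshold, which depends on several microprocessor parameters, and the choice of several key microprocessor parameters can be determined.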
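The paper's model rests on the rule that execution bandwidth grows as the square root of the number of instructions in the window. A much cruder first-order estimate in the same spirit can be sketched in Python as follows. It assumes, for illustration only and not as the paper's algorithm, that ideal IPC is the smaller of the fetch width and the square root of the window size, and that each mispredicted branch adds a fixed number of stall cycles; all parameter names are hypothetical.

```python
import math


def estimate_ipc(window_size: int, fetch_width: int, branch_fraction: float,
                 mispredict_rate: float, mispredict_penalty: float) -> float:
    """First-order IPC estimate (hypothetical model, not the paper's).

    - ideal issue rate = min(fetch width, sqrt(window size))
    - every mispredicted branch adds mispredict_penalty stall cycles
    """
    ideal_ipc = min(fetch_width, math.sqrt(window_size))
    # Base CPI plus the average misprediction stall per instruction.
    cpi = 1.0 / ideal_ipc + branch_fraction * mispredict_rate * mispredict_penalty
    return 1.0 / cpi


# Example: 16-entry window, 4-wide fetch, 20% branches, 5% mispredicted,
# 10-cycle penalty gives CPI = 0.25 + 0.1 = 0.35, i.e. IPC of about 2.86.
print(estimate_ipc(16, 4, 0.2, 0.05, 10))
```

Even this simple form shows the interaction the abstract discusses: once sqrt(window) exceeds the fetch width, enlarging the window no longer helps, while the misprediction term caps IPC regardless of width.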

4.

Design Schemes for Dynamic Branch Predictors on Simultaneous Multithreading Processors

Ren Jian, An Hong, Lu Fang, Liang Bo 《Computer Science》2006,33(3):239-243

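A simultaneous multithreading (SMT) processor can issue instructions from multiple threads each cycle, which greatly increases the instruction throughput of a superscalar microprocessor, but executing multiple threads simultaneously also causes many conflicts over shared hardware resources. In particular, having multiple threads share the branch prediction hardware can significantly affect branch prediction accuracy. Studying how the branch handling scheme of an SMT processor affects overall processor performance is therefore important for guiding SMT processor design. Using an SMT processor simulator, this paper experimentally evaluates several well-known branch prediction schemes on an SMT architecture in which each thread runs an independent application. It analyzes how these schemes affect branch prediction accuracy and overall processor performance in the single-threaded and multithreaded cases, and concludes that in such an SMT architecture, giving each thread its own predictor is a good choice; because each independent predictor can use a small, simple structure, this does not incur much hardware overhead.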

5.

Research on the TBHBP Branch Predictor for Simultaneous Multithreading

Li Jingmei, Guan Haiyang 《Computer Science》2012,39(9):307-311

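To address the shortcomings of conventional branch predictors, namely mixed-up branch prediction information, aliasing conflicts between branch instructions, and high capacity-conflict rates, this paper proposes TBHBP, a branch predictor for simultaneous multithreading processors. The predictor uses combined history information, consisting of thread history together with address-indexed local history, as the index into the pattern history table (PHT); each thread has its own thread history register and branch history register; and a new branch result output table is added to speed up the branch-predicted execution of instructions. Results show that the TBHBP predictor effectively solves the problems of stale branch information, branch aliasing, and capacity conflicts. Compared with the Gshare branch predictor, it increases instruction throughput by 12.5% and lowers the branch misprediction rate and the wrong-path fetch rate by 0.5% and 2.1%, respectively.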
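The abstract does not give TBHBP's exact indexing function, so the following Python sketch is only one plausible way to build a PHT index from per-thread history plus address-indexed local history. It takes the low bits of each history, concatenates them, and XORs the result with the branch address. The table size, the field split, and the names `ThreadState`, `pht_index` and `record_outcome` are all assumptions for illustration, and the branch result output table is not modeled.

```python
from dataclasses import dataclass, field

PHT_BITS = 12                 # hypothetical: 4096-entry PHT
HALF = PHT_BITS // 2          # hypothetical split: 6 bits from each history
MASK = (1 << PHT_BITS) - 1
HALF_MASK = (1 << HALF) - 1


@dataclass
class ThreadState:
    # Each thread owns its own history registers, as the abstract describes.
    thread_history: int = 0
    local_histories: dict = field(default_factory=dict)  # pc -> local history


def pht_index(state: ThreadState, pc: int) -> int:
    """Hypothetical combined index: (thread hist | local hist) XOR address."""
    local = state.local_histories.get(pc, 0)
    combined = ((state.thread_history & HALF_MASK) << HALF) | (local & HALF_MASK)
    return (combined ^ (pc >> 2)) & MASK


def record_outcome(state: ThreadState, pc: int, taken: bool) -> None:
    """Shift the resolved outcome into the thread and per-branch histories."""
    bit = int(taken)
    state.thread_history = ((state.thread_history << 1) | bit) & MASK
    state.local_histories[pc] = ((state.local_histories.get(pc, 0) << 1) | bit) & MASK
```

Because the histories live in per-thread state, two threads executing the same branch address with different histories generally land in different PHT entries.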

6.

Evaluation and Choice of Various Branch Predictors for Low-Power Embedded Processor (total citations: 2; self-citations: 0; citations by others: 2)

Fan DongRui, Yang HongBo, Gao GuangRong, Zhao RongCai 《Journal of Computer Science and Technology》2003,18(6):833-838

Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited from both the performance and the power angle. This paper studies one of them: the branch predictor. As is well known, branch prediction is critical for exploiting instruction-level parallelism effectively, but it may incur additional power consumption due to the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, which is a dual-issue, out-of-order processor with a deep pipeline that supports the MIPS instruction set.

7.

Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution

Po-Yung Chang, Eric Hao, Yale N. Patt, Pohua P. Chang 《International Journal of Parallel Programming》1996,24(3):209-234

Conditional branches incur a severe performance penalty in wide-issue, deeply pipelined processors. Speculative execution [1, 2] and predicated execution [3–9] are two mechanisms that have been proposed for reducing this penalty. Speculative execution can completely eliminate the penalty associated with a particular branch, but requires accurate branch prediction to be effective. Predicated execution does not require accurate branch prediction to eliminate the branch penalty, but is not applicable to all branches and can increase the latencies within the program. This paper examines the performance benefit of using both mechanisms to reduce the branch execution penalty. Predicated execution is used to handle the hard-to-predict branches and speculative execution is used to handle the remaining branches. The hard-to-predict branches within the program are determined by profiling. We show that this approach can significantly reduce the branch execution penalty suffered by wide-issue processors.
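The profiling step that separates hard-to-predict branches from the rest can be sketched as a simple filter over per-branch counts. In the Python sketch below, the profile format, the 10% misprediction threshold, and the minimum execution count are hypothetical tuning choices, not values from the paper; a real compiler would additionally check that each selected branch's region can legally be if-converted.

```python
def select_for_predication(profile, threshold=0.10, min_executions=1000):
    """Return branch addresses whose profiled misprediction rate exceeds threshold.

    profile: dict mapping branch pc -> (executions, mispredictions).
    threshold and min_executions are hypothetical knobs; rarely executed
    branches are skipped because their measured rate is unreliable.
    """
    selected = []
    for pc, (executions, mispredictions) in sorted(profile.items()):
        if executions >= min_executions and mispredictions / executions > threshold:
            selected.append(pc)
    return selected


# Hypothetical profile: only 0x100 is both frequent and hard to predict.
profile = {0x100: (10000, 2500), 0x200: (10000, 100), 0x300: (50, 40)}
print([hex(pc) for pc in select_for_predication(profile)])
```

The selected branches would then be if-converted, and every other branch left to the hardware predictor and speculative execution.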

8.

Selective Branch Inversion: Confidence Estimation for Branch Predictors

Artur Klauser, Srilatha Manne, Dirk Grunwald 《International Journal of Parallel Programming》2001,29(1):81-110

This paper describes a family of branch predictors that use confidence estimation to improve the performance of an underlying branch predictor. This method, referred to as Selective Branch Inversion (SBI), uses a confidence estimator to determine when the branch direction prediction is likely to be incorrect; branch decisions for these low-confidence branches are inverted. SBI with an underlying Gshare branch predictor outperforms other equal-sized predictors, such as the best-history-length Gshare predictor, as well as equally complex McFarling and Bi-Mode predictors. Our analysis shows that SBI achieves its performance through conflict detection and correction, rather than through conflict avoidance as some previously proposed predictors, such as Bi-Mode and Agree, do. We also show that SBI is applicable to other underlying predictors, such as the McFarling Combined predictor. Finally, we show that Dynamic Inversion Monitoring (DIM) can be used as a safeguard to turn off SBI in cases where it degrades the overall performance.
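A minimal Python sketch of the inversion idea, under stated assumptions: the base predictor here is a bimodal table rather than Gshare (only to keep the sketch short), and the confidence estimator is a hypothetical 2-bit counter per entry that counts up when the base prediction was wrong and down when it was right. When that counter saturates, the prediction is treated as low-confidence and inverted. The real SBI estimator and the DIM safeguard are not modeled.

```python
class SelectiveInversion:
    """Hypothetical SBI sketch: bimodal base predictor plus an inversion estimator."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.base = [1] * entries        # 2-bit counters of the base predictor
        self.confidence = [0] * entries  # 2-bit "base is likely wrong" counters

    def predict(self, pc: int) -> bool:
        i = pc % self.entries
        base_pred = self.base[i] >= 2
        # A saturated estimator marks the base prediction as low-confidence.
        return (not base_pred) if self.confidence[i] == 3 else base_pred

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        base_correct = (self.base[i] >= 2) == taken
        # The estimator is trained on the *base* predictor's correctness.
        if base_correct:
            self.confidence[i] = max(0, self.confidence[i] - 1)
        else:
            self.confidence[i] = min(3, self.confidence[i] + 1)
        # The base predictor is trained as usual.
        if taken:
            self.base[i] = min(3, self.base[i] + 1)
        else:
            self.base[i] = max(0, self.base[i] - 1)


def count_correct(predictor, pc: int, outcomes) -> int:
    """Run a single-branch outcome sequence and count correct final predictions."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct
```

On a strictly alternating branch, a weakly initialized bimodal counter oscillates and is wrong every time; in this sketch the estimator saturates after three misses, and inverting the base prediction makes the remaining 97 of 100 predictions correct.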

9.

Modeling Control Speculation for Timing Analysis

Li Xianfeng, Mitra Tulika, Roychoudhury Abhik 《Real-Time Systems》2005,29(1):27-58

The schedulability analysis of real-time embedded systems requires worst case execution time (WCET) analysis for the individual tasks. Bounding WCET involves not only language-level program path analysis, but also modeling the performance impact of complex micro-architectural features present in modern processors. In this paper, we statically analyze the execution time of embedded software on processors with speculative execution. The speculation of conditional branch outcomes (branch prediction) significantly improves a program's execution time. Thus, accurate modeling of control speculation is important for calculating tight WCET estimates. We present a parameterized framework to model the different branch prediction schemes. We further consider the complex interaction between speculative execution and instruction cache performance, that is, the fact that speculatively executed blocks can generate additional cache hits/misses. We extend our modeling to capture this effect of branch prediction on cache performance. Starting with the control flow graph of a program, our technique uses integer linear programming to estimate the program's WCET. The accuracy of our method is demonstrated by tight estimates obtained on realistic benchmarks.
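For orientation, the basic integer linear program that CFG-based WCET analyses of this kind typically start from (the implicit path enumeration formulation) can be written as below. The symbols are assumptions of this sketch: B is the set of basic blocks, c_i the worst-case cost of block i, x_i its execution count, f_e the execution count of CFG edge e, e_0 a virtual edge into the entry block (with a matching virtual edge out of the exit block), and n_L a bound on the iterations of loop L. The paper's framework adds further variables and constraints for branch prediction and its interaction with the instruction cache, which are not shown here.

```latex
\begin{aligned}
\mathrm{WCET} = \max \;& \sum_{i \in B} c_i \, x_i \\
\text{subject to} \;& x_i = \sum_{e \in \mathrm{in}(i)} f_e = \sum_{e \in \mathrm{out}(i)} f_e \quad \forall i \in B, \\
& f_{e_0} = 1, \\
& \sum_{e \in \mathrm{back}(L)} f_e \le n_L \sum_{e \in \mathrm{entry}(L)} f_e \quad \text{for every loop } L, \\
& x_i \in \mathbb{Z}_{\ge 0}, \quad f_e \in \mathbb{Z}_{\ge 0}.
\end{aligned}
```

The flow-conservation constraints encode all feasible paths implicitly, so the solver never enumerates paths; micro-architectural effects enter by refining the costs c_i or splitting them into additional variables.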

10.

Construction of speculative optimization algorithms

A. A. Belevantsev, S. S. Gaisaryan, V. P. Ivannikov 《Programming and Computer Software》2008,34(3):138-153

In modern processors, instructions are often executed before it is known whether their results are needed. This technique, called speculative execution, helps expose instruction-level parallelism. In EPIC architectures, speculative execution is controlled entirely by the compiler, which makes it possible to avoid complex hardware mechanisms for supporting speculative instruction issue. Moreover, the compiler can apply the idea of speculative execution in machine-independent optimizations. The paper describes a scheme for constructing speculative optimizations that is based on identifying the control-flow and data-flow properties that matter for the optimization and estimating the probabilities that they hold. These probabilities are used to search for and construct profitable speculative and bookkeeping transformations. For optimizations that consist only of speculative upward motion of instructions along the control flow graph, a method based on this scheme has been developed; it includes algorithms for computing the probabilities of data and control dependences, for estimating the benefit of speculative motion, and for constructing recovery code. Based on this method, an algorithm for speculative instruction scheduling for the Intel Itanium architecture has been developed and implemented. Specific features of its implementation and experimental results are described.
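To make the phrase "estimating the benefit of speculative movements" concrete, here is a deliberately simple expected-value model in Python. It is a hypothetical cost model, not the paper's: hoisting saves `cycles_saved` when the hoisted result is actually used (probability `p_needed`), wastes `wasted_cost` when it is not, and pays `recovery_cost` when a speculated data dependence turns out to be violated.

```python
def speculative_move_benefit(p_needed: float, cycles_saved: float,
                             wasted_cost: float, p_dep_violated: float = 0.0,
                             recovery_cost: float = 0.0) -> float:
    """Expected benefit (in cycles) of hoisting an instruction speculatively.

    Hypothetical model: gain when the result is used, lose the cost of a
    useless execution otherwise, and pay for recovery code when a data
    dependence assumed absent actually occurs.
    """
    return (p_needed * cycles_saved
            - (1.0 - p_needed) * wasted_cost
            - p_dep_violated * recovery_cost)


# A scheduler would hoist only when the expected benefit is positive.
print(speculative_move_benefit(0.5, 4, 2))          # 1.0: worth hoisting
print(speculative_move_benefit(0.5, 4, 2, 0.25, 4)) # 0.0: break-even
print(speculative_move_benefit(0.25, 4, 2))         # -0.5: leave in place
```

The probabilities themselves would come from the dependence and control-flow analyses the abstract describes; here they are just inputs.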

11.

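Branch Control Techniques for Exploiting Instruction-Level Parallelism (total citations: 1; self-citations: 1; citations by others: 0)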

Wang Xinhui, Wang Jianxin 《Computer Engineering and Applications》1999,35(12):25-27,35

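Increasing instruction-level parallelism is one of the goals of modern computers, and control branches pose a challenging problem for exploiting it. To exploit instruction-level parallelism, modern computers use two branch control techniques: speculative execution and predicated execution. This article systematically analyzes how these two techniques are implemented, using the Merced chip as an example.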

12.

Symbolic Execution Based on a Branch Aliasing Algorithm (total citations: 1; self-citations: 1; citations by others: 0)

Guo Chenkai, Ji Xiujuan, Xu Jing 《Computer Science》2012,39(9):115-119

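Symbolic execution is a commonly used technique in static analysis, and the aliasing of array elements is one of the key factors limiting its performance. By analyzing the nature of array aliasing, this paper proposes a branch aliasing algorithm that resolves aliasing while executing symbolically, allowing relatively complex array problems to be handled. The strategy uses on-the-fly constraint solving to promptly prune unreachable aliasing branches. Combining symbolic execution with constraint solving, a prototype tool, ASym, based on the branch aliasing algorithm was developed. Preliminary experiments show that the branch aliasing algorithm can handle array aliasing problems that involve branch structures, avoids the array semantic errors introduced by deferred substitution, and greatly reduces the number of branches, improving execution efficiency.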
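The core step, forking a symbolic array read on whether the indices alias and pruning infeasible forks immediately, can be sketched in Python. Everything here is an assumption for illustration rather than ASym's algorithm: constraints are Python predicates over an environment, the program fragment is `a[i] = v; x = a[j]`, and a brute-force search over a small integer domain stands in for a real constraint solver.

```python
from itertools import product


def feasible(constraints, variables, domain=range(-4, 5)):
    """Brute-force satisfiability over a small integer domain.

    A stand-in for a real constraint solver (assumption of this sketch).
    """
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            return True
    return False


def read_after_write(path, variables):
    """Symbolically evaluate `a[i] = v; x = a[j]` by forking on aliasing.

    Returns the surviving (constraints, value of x) cases; forks whose
    path condition is unsatisfiable are pruned right away.
    """
    cases = [
        (path + [lambda e: e["i"] == e["j"]], "v"),        # i and j alias
        (path + [lambda e: e["i"] != e["j"]], "old a[j]"),  # no alias
    ]
    return [(cs, x) for cs, x in cases if feasible(cs, variables)]


# Under the path condition i < j, only the non-aliasing fork survives.
print([x for _, x in read_after_write([lambda e: e["i"] < e["j"]], ["i", "j"])])
```

Pruning at the fork, rather than substituting later, is what keeps the number of live branches small in this style of analysis.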

13.

A Power-Aware Branch Predictor by Accessing the BTB Selectively

Cheol Hong Kim, Sung Woo Chung and Chu Shik Jhon 《Journal of Computer Science and Technology》2005,20(5):607-614

Microarchitects should consider power consumption, together with accuracy, when designing a branch predictor, especially for embedded processors. This paper proposes a power-aware branch predictor, based on the gshare predictor, that accesses the BTB (Branch Target Buffer) selectively. To enable selective access to the BTB, the PHT (Pattern History Table) in the proposed predictor is accessed one cycle earlier than a traditional PHT when the program executes sequentially without branch instructions. As a side effect, two predictions are obtained from a single PHT access, resulting in further power savings. In the proposed predictor, if the previous instruction was not a branch and the PHT predicts not-taken, the BTB is not accessed, which reduces power consumption. If the previous instruction was a branch, the BTB is always accessed, regardless of the PHT prediction, to avoid additional delay or loss of accuracy. The proposed predictor reduces power consumption with little hardware overhead, without incurring additional delay or reducing prediction accuracy. Simulation results show that it reduces power consumption by 29-47%.
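The BTB access policy stated in the abstract reduces to a two-input decision, sketched below in Python. The function names and the event encoding are hypothetical; the policy itself follows the abstract: after a non-branch instruction the early PHT prediction is available, so a not-taken prediction lets the BTB lookup be skipped, while after a branch the BTB is always read.

```python
def should_access_btb(prev_was_branch: bool, pht_predicts_taken: bool) -> bool:
    """BTB access decision following the policy described in the abstract."""
    if prev_was_branch:
        # No early PHT result is available: always read the BTB to avoid
        # adding delay or losing accuracy.
        return True
    # Early PHT result available: skip the BTB when predicted not-taken.
    return pht_predicts_taken


def btb_accesses(events) -> int:
    """Count BTB reads over (prev_was_branch, pht_predicts_taken) fetch events."""
    return sum(should_access_btb(b, t) for b, t in events)
```

Because not-taken predictions after sequential code are common, most of the savings come from the second branch of the decision.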

14.

An Algorithm for Finding Key Branches in Branch Testing

Shi Dongmei 《Computer & Digital Engineering》2011,39(9):16-19,91

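Branch testing has proven in practice to be one of the most cost-effective structural testing methods available today. For determining the branch coverage achieved by test cases, this paper studies the properties of DD-graphs in depth and proposes a key-branch finding algorithm built on the dominator tree and implication tree derived from the generated DD-graph. It can simply and quickly find the minimal subset of branches that determines a program's branch coverage: during execution, only the execution status of the key branches needs to be recorded in order to compute the coverage of all branches. The algorithm has good time complexity and is effective, and it helps improve the quality and efficiency of software testing.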
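The idea that covering a few key branches implies covering the rest can be demonstrated with a brute-force Python sketch for small acyclic DD-graphs. This is a swapped-in technique, not the paper's dominator/implication-tree algorithm: it enumerates all entry-to-exit paths and says that edge f implies edge e when every complete path through f also passes through e. Key edges are the minimal elements of that relation, with one representative per group of equivalent edges, so if every key edge is covered, every edge is covered. The graph encoding (a dict of successor lists) is an assumption.

```python
def all_paths(graph, entry, exit_node):
    """Enumerate entry-to-exit paths of an acyclic graph as tuples of edges."""
    if entry == exit_node:
        return [()]
    paths = []
    for succ in graph.get(entry, []):
        for rest in all_paths(graph, succ, exit_node):
            paths.append(((entry, succ),) + rest)
    return paths


def key_branches(graph, entry, exit_node):
    """Edges whose coverage implies coverage of every other edge.

    Brute force (exponential in general): for small acyclic graphs only.
    """
    paths = all_paths(graph, entry, exit_node)
    edges = sorted({e for p in paths for e in p})
    # through[e] = indices of the complete paths that traverse edge e.
    through = {e: frozenset(k for k, p in enumerate(paths) if e in p) for e in edges}
    keys, seen = [], set()
    for e in edges:
        # e is implied by f if f's path set is a strict subset of e's.
        if any(through[f] < through[e] for f in edges):
            continue
        if through[e] not in seen:  # keep one edge per equivalence class
            seen.add(through[e])
            keys.append(e)
    return keys


# If-then-else followed by a join: covering A->B and A->C covers all edges.
g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(key_branches(g, "A", "E"))  # [('A', 'B'), ('A', 'C')]
```

The paper's tree-based approach obtains the same kind of subset without enumerating paths, which is what makes it practical for real programs.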

15.

A Jump-Hiding Technique for Short Loops

Bai Feng, Cheng Xu 《Computer Engineering and Applications》2003,39(22):70-71,97

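The widespread adoption of embedded applications such as handheld mobile applications places stringent demands on the cost, performance, and power consumption of embedded systems and of their core component, the embedded processor, yet existing techniques cannot, or can only with difficulty, improve the performance of pipelined embedded processors at low cost. This paper proposes a jump-hiding technique aimed at the short loops that occur frequently in embedded applications and account for most of their execution time; by "hiding" the jump instructions, it improves pipeline efficiency. Analysis shows that the method effectively improves the execution efficiency of short loops at small hardware cost, thereby better meeting the needs that embedded applications place on embedded systems and embedded processors.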
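The abstract does not describe the hardware mechanism, so the following Python model only illustrates where the savings of hiding a loop's backward jump come from. It assumes, hypothetically, a single-issue pipeline, a loop body of `body_len` instructions, a backward jump that costs one fetch slot plus `taken_penalty` bubble cycles on every taken iteration, and hiding hardware (for example loop start/end/count registers) that redirects fetch with no jump and no bubbles.

```python
def loop_cycles(body_len: int, iterations: int, taken_penalty: int,
                hide_jump: bool) -> int:
    """Rough fetch-cycle count for a short loop (hypothetical model).

    Without hiding: each iteration fetches the body plus the backward jump,
    and every taken jump (all but the last) adds taken_penalty bubbles.
    With hiding: hardware redirects fetch itself, so only the body remains.
    """
    if hide_jump:
        return body_len * iterations
    return (body_len + 1) * iterations + taken_penalty * max(iterations - 1, 0)


# A 4-instruction body run 100 times with a 2-cycle taken-jump penalty.
print(loop_cycles(4, 100, 2, hide_jump=False))  # 698
print(loop_cycles(4, 100, 2, hide_jump=True))   # 400
```

The shorter the loop body, the larger the fraction of cycles the jump and its bubbles represent, which is why the technique targets short loops.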

16.

Research on Dynamic Branch Predictors Based on SimpleScalar

Zhang Xiao, Shi Zhanguo, Wu Di 《Microcomputer Applications》2011,27(11):19-21,68,69

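Branch prediction accuracy is an important factor in the performance of modern processors and has been a research focus in both academia and industry over the past decade. To provide performance references for designing dynamic branch predictors for processors in different application settings, several dynamic branch predictors widely used in processor architecture design were simulated and analyzed in the SimpleScalar simulator using SPEC CPU2000. Using prediction accuracy and instructions per clock cycle (IPC) as metrics, and taking hardware cost into account, the paper analyzes which targets and application settings suit each type of branch predictor.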
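A tiny trace-driven comparison in the spirit of such studies can be written in a few lines of Python. The sketch compares two classic dynamic predictors, a bimodal table of 2-bit counters and a gshare predictor (global history XORed with the address), on a hypothetical single-branch trace that strictly alternates taken/not-taken. Table sizes and the trace are assumptions; the numbers say nothing about SPEC CPU2000 behavior.

```python
class Bimodal:
    """2-bit saturating counters indexed by the low bits of the branch address."""

    def __init__(self, bits: int = 10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)  # weakly not-taken

    def index(self, pc: int) -> int:
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


class Gshare(Bimodal):
    """Same counters, indexed by address XOR global branch history."""

    def __init__(self, bits: int = 10):
        super().__init__(bits)
        self.history = 0

    def index(self, pc: int) -> int:
        return (pc ^ self.history) & self.mask

    def update(self, pc: int, taken: bool) -> None:
        super().update(pc, taken)  # trains the entry used for the prediction
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, trace) -> float:
    """trace: list of (pc, taken) pairs, predicted and then trained in order."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


trace = [(0x40, i % 2 == 0) for i in range(100)]  # T, N, T, N, ...
print(accuracy(Bimodal(), trace))  # 0.0
print(accuracy(Gshare(), trace))   # 0.94
```

The bimodal counter oscillates between its two weak states and is always one step behind, while gshare separates the two history contexts into different counters after a short warm-up.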

17.

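Research and Design of a Dynamic Branch Prediction Mechanism for Embedded Processors (total citations: 2; self-citations: 1; citations by others: 1)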

Huang Wei, Wang Yuyan, Zhang Jianxiong 《Computer Engineering》2008,34(21):163-165

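Targeting the specific application environment of embedded processors, this paper improves the traditional neural-network prediction algorithm and combines it with a customized branch target buffer to propose a hybrid dynamic branch prediction mechanism. The mechanism uses global indexing and a custom-designed BTB structure to predict the last branch instruction in loop logic accurately. Experimental results show that this dynamic branch prediction mechanism reduces hardware complexity and improves prediction accuracy.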
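For readers unfamiliar with neural branch prediction, the Python sketch below shows the classic global-history perceptron predictor that such work builds on. It is the textbook algorithm, not the paper's improved version or its customized BTB: history bits are encoded as +1/-1, the prediction is the sign of a dot product of a per-branch weight vector with the history (plus a bias weight), and weights are trained on a misprediction or when the output magnitude is below a threshold theta. The table size and history length are assumptions, and the weights are left unbounded rather than saturating as hardware weights would.

```python
import math


class PerceptronPredictor:
    """Global-history perceptron predictor (classic form, sizes hypothetical)."""

    def __init__(self, history_len: int = 8, entries: int = 64):
        self.entries = entries
        # One weight vector per table entry: bias weight plus one per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(entries)]
        self.history = [-1] * history_len  # +1 = taken, -1 = not taken; newest first
        self.theta = math.floor(1.93 * history_len + 14)  # training threshold

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.entries]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.entries]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = [t] + self.history[:-1]


def run(predictor, pc: int, outcomes):
    """Return a list of booleans: whether each prediction was correct."""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results
```

On a strictly alternating branch, this sketch mispredicts only twice during warm-up and is then always correct, because the pattern is linearly separable on the most recent history bit.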

18.

Evaluation and choice of various branch predictors for low-power embedded processor

Fan DongRui, Yang HongBo, Gao GuangRong, Zhao RongCai 《Journal of Computer Science and Technology》2003,18(6):833-838

Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited from both the performance and the power angle. This paper studies one of them: the branch predictor. As is well known, branch prediction is critical for exploiting instruction-level parallelism effectively, but it may incur additional power consumption due to the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, which is a dual-issue, out-of-order processor with a deep pipeline that supports the MIPS instruction set.

19.

Branch elimination by condition merging

William C. Kreahling, David Whalley, Mark W. Bailey, Xin Yuan, Gang‐Ryung Uh, Robert van Engelen 《Software: Practice and Experience》2005,35(1):51-74

Conditional branches are expensive. Branches require a significant percentage of execution cycles since they occur frequently and cause pipeline flushes when mispredicted. In addition, branches result in forks in the control flow, which can prevent other code‐improving transformations from being applied. In this paper we describe profile‐based techniques for replacing the execution of a set of two or more branches with a single branch on a conventional scalar processor. These sets of branches can include tests of multiple variables. For instance, the test if (p1 != 0 && p2 != 0) , which is testing for NULL pointers, can be replaced with if (p1 & p2 != 0) . Program profiling is performed to target condition merging along frequently executed paths. The results show that eliminating branches by merging conditions can significantly reduce the number of conditional branches executed in non‐numerical applications. Copyright © 2004 John Wiley & Sons, Ltd.
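The basic effect of condition merging, two conditional branches replaced by one test on a combined value, can be checked with a small Python sketch. It uses an identity that is valid for all integers: `(p1 | p2) == 0` holds exactly when both values are zero, so the two-branch test `p1 == 0 && p2 == 0` can become a single branch. A bitwise AND, by contrast, does not in general preserve `p1 != 0 && p2 != 0` (for example, `1 & 2 == 0` although both operands are nonzero), which is why this sketch uses the OR form. The function names are hypothetical, and the profile-driven selection of which paths to merge is not modeled.

```python
def both_null_branchy(p1: int, p2: int) -> bool:
    # Original form: two conditional branches, if (p1 == 0 && p2 == 0).
    if p1 == 0:
        if p2 == 0:
            return True
    return False


def both_null_merged(p1: int, p2: int) -> bool:
    # Merged form: one test. The OR is zero only when both inputs are zero.
    return (p1 | p2) == 0
```

In compiled code the merged form needs one OR and one branch instead of two branches, which is the kind of saving the abstract measures.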

20.

Alloyed Branch History: Combining Global and Local Branch History for Robust Performance

Zhijian Lu, John Lach, Mircea R. Stan, Kevin Skadron 《International Journal of Parallel Programming》2003,31(2):137-177

This paper introduces alloyed prediction, a new hardware-based two-level branch predictor organization that combines global and local history in the same structure, combining the advantages of current two-level predictors with those of hybrid predictors. The alloyed organization is motivated by measurements showing that wrong-history mispredictions are even more important than conflict-induced mispredictions. Wrong-history mispredictions arise because current two-level, history-based predictors provide only global or only local history. The contribution of wrong history to the overall misprediction rate is substantial because most programs have some branches that require global history and others that require local history. This paper explores several ways to implement alloyed prediction, including the previously proposed bi-mode organization. Simulations show that mshare is the best alloyed organization among those we examine, and that mshare gives reliably good prediction compared to bimodal (two-bit), two-level, and hybrid predictors. The robust performance of alloying across a range of predictor sizes stems from its ability to attack wrong-history mispredictions at even very small sizes without subdividing the branch prediction hardware into smaller and less effective components.
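The essence of alloying, one pattern-table index built from both global and local history, can be sketched in Python as a bit-field concatenation. The exact mshare indexing function is not given in the abstract, so the field order (address, then global, then local) and the bit widths below are assumptions chosen only to show how both kinds of history reach a single table without splitting it into separate components.

```python
def alloyed_index(pc: int, global_hist: int, local_hist: int,
                  g_bits: int, l_bits: int, a_bits: int) -> int:
    """Concatenate address, global-history and local-history bits into one index.

    Hypothetical layout: [a_bits of address | g_bits of global | l_bits of local].
    """
    a = pc & ((1 << a_bits) - 1)
    g = global_hist & ((1 << g_bits) - 1)
    l = local_hist & ((1 << l_bits) - 1)
    return (a << (g_bits + l_bits)) | (g << l_bits) | l


# 3 address bits, 2 global bits, 2 local bits: 0b101 | 0b11 | 0b01 -> 0b1011101.
print(alloyed_index(0b101, 0b11, 0b01, g_bits=2, l_bits=2, a_bits=3))  # 93
```

Setting `l_bits=0` or `g_bits=0` degenerates to a purely global or purely local two-level index, which makes the "only global or only local history" limitation the abstract describes easy to see.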