1.

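He Yanxiang (何炎祥), Chen Yong (陈勇), Wu Wei (吴伟), Xu Chao (徐超), Li Qing'an (李清安). 《计算机研究与发展》 (Journal of Computer Research and Development), 2014, 51(8): 1773-1780

Energy consumption is an important aspect of embedded system design that cannot be ignored. Targeting bus energy, one of the main sources of energy consumption in embedded devices, this paper proposes a low-power instruction scheduling method based on bus-invert coding. Guided by profile information on program execution frequency, the method schedules instructions with a data-randomness enhancement algorithm to obtain an instruction sequence suited to bus-invert coding. This both reduces the number of bus transitions and yields a more balanced bus utilization, ultimately saving energy. Comparative experiments on the MiBench benchmark suite show that the method effectively reduces the number of bus transitions. Relative to arm-linux-gcc instruction sequences without coding optimization, the average improvement is about 26%; relative to the VSI+BI method, the average improvement still exceeds 10%.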
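The cost metric behind this abstract, the number of bus transitions, can be made concrete with a small sketch. The code below counts bit toggles on a bus with and without classic bus-invert coding (send the complemented word and raise an extra invert line whenever more than half of the data lines would otherwise toggle). It is an illustration only, not the paper's scheduling method: the function names, the default 32-bit width, and the assumption that the bus starts at all zeros are hypothetical choices made for the example.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def raw_transitions(words, width=32):
    """Total bit toggles when words are driven onto the bus unencoded.

    Assumption: the bus starts at all zeros.
    """
    mask = (1 << width) - 1
    bus, total = 0, 0
    for w in words:
        w &= mask
        total += hamming(bus, w)
        bus = w
    return total


def bus_invert_transitions(words, width=32):
    """Total toggles (data lines plus invert line) under bus-invert coding.

    Assumption: the bus and the invert line both start at zero.
    """
    mask = (1 << width) - 1
    bus, inv, total = 0, 0, 0
    for w in words:
        w &= mask
        # Invert when more than half of the data lines would toggle.
        new_inv = 1 if hamming(bus, w) > width // 2 else 0
        out = (w ^ mask) if new_inv else w
        # Count data-line toggles and a toggle of the invert line itself.
        total += hamming(bus, out) + (new_inv != inv)
        bus, inv = out, new_inv
    return total


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x00, 0xFF]
    print(raw_transitions(words, width=8))
    print(bus_invert_transitions(words, width=8))
```

An instruction scheduler aiming at this metric would compare candidate orderings of independent instructions by the value `bus_invert_transitions` returns for their encodings and prefer the ordering with the lower count.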

2.

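A power optimization framework based on a retargetable compiler [total citations: 1; self-citations: 0; citations by others: 1]

Xu Burong (徐步荣), Li Xi (李曦), Wei Lianghui (魏亮辉). 《计算机仿真》 (Computer Simulation), 2007, 24(4): 306-309, 325

Low-power design has become one of the key issues in system design, and low-power optimization at compile time has become an important part of it. To address the lack of generality in traditional power optimizations, this paper proposes a power optimization framework based on a retargetable compiler. The framework horizontally reschedules the binary object code produced by the compiler to reduce the number of high/low voltage-level switches on the instruction bus, thereby lowering system power. With xpADL support, the framework can be given different architecture descriptions and can thus generate power-optimized code for different architectures. Taking the IA-64 architecture as an example, extensive experiments were carried out on its simulator Ski. They show that the optimization reaches about 25% for static code and more than 30% for dynamic code. The framework's optimization is therefore effective and quite extensible.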

3.

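An exploration of bus-oriented low-power optimization methods

Liu Qian (刘钱), He Yanxiang (何炎祥), Liao Ximi (廖希密), Chen Yong (陈勇). 《计算机工程与应用》 (Computer Engineering and Applications), 2014, 50(12): 42-47

As society becomes increasingly information-driven and the information industry develops rapidly, energy consumption keeps rising. In particular, growing chip integration and increasingly complex applications make power consumption a key problem that embedded systems must face. Hardware-only power optimization can no longer meet requirements, while software-based power optimization has achieved good results. At compile time, system power is lowered by reducing the number of bus transitions. For the instruction address bus, a genetic algorithm allocates function segments and is combined with a suitable encoding strategy to reduce bus transitions and hence power. For the data bus, an ant colony algorithm schedules instructions and 0-1 invert coding is applied, which effectively reduces bus transitions and power. Experiments on a specific platform show that this optimization for the data and address buses reduces bus power by about 25%.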

4.

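Development of an instruction-level power model for LS-RISC

Feng Guochen (冯国臣), Shen Xubang (沈绪榜), Zheng Xinjian (郑新建), Liu Xingwang (刘兴旺). 《计算机技术与发展》 (Computer Technology and Development), 2005, 15(9)

For the LS-RISC microprocessor developed by the authors, this paper discusses the development of its instruction-level power model. To reduce the complexity that inter-instruction effects add to power analysis, instructions are reclassified according to the functional units they pass through during execution, which reduces the complexity of the analysis from O(n²) to O(n). The successful development of the power model lays a foundation for low-power compilation and software power optimization.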

5.

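Research on embedded software power optimization based on instruction clustering and instruction scheduling

Chen Jia (陈嘉), Dong Yuan (董渊), Yang Yang (杨阳), Dai Guilan (戴桂兰), Wang Shengyuan (王生原). 《小型微型计算机系统》 (Journal of Chinese Computer Systems), 2006, 27(1): 175-179

Using an instruction-level energy estimation model, this paper proposes and validates a power optimization scheme based on instruction clustering and instruction scheduling. The scheme uses depth-first search to find a locally optimal solution, selecting an instruction sequence with lower energy consumption. To balance measurement effort against accuracy, instructions with similar energy consumption are grouped into the same class, which effectively reduces the otherwise excessive workload of obtaining the switching-energy parameters between adjacent instructions. Analysis of experimental results on the SimpleScalar/Wattch simulator indicates that instruction scheduling alone has limited effect on instruction-level power optimization; to improve optimization efficiency, power estimation and optimization must be carried out at a higher level.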

6.

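An effective method for reducing instruction memory power: the loop buffer

Hu Dinglei (胡定磊), Chen Shuming (陈书明). 《计算机工程与科学》 (Computer Engineering & Science), 2007, 29(6): 93-96

In digital signal processors with a very long instruction word (VLIW) architecture, the instruction memory accounts for a large share of power consumption. Given the characteristics of digital signal processing applications, however, a loop buffer can be used to reduce instruction memory power. This paper proposes a compiler-controlled loop buffer technique in which the compiler selects suitable loop code to place in the loop buffer, thereby reducing the instruction memory power consumed during instruction fetch. It presents the architectural design of the loop buffer, a power analysis, and a compilation method that makes effective use of the loop buffer, and finally validates the approach with a function-level power model.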

7.

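Development of an instruction-level power model for LS-RISC [total citations: 1; self-citations: 0; citations by others: 1]

Feng Guochen (冯国臣), Shen Xubang (沈绪榜), Zheng Xinjian (郑新建), Liu Xingwang (刘兴旺). 《微机发展》 (Microcomputer Development), 2005, 15(9): 104-107

For the LS-RISC microprocessor developed by the authors, this paper discusses the development of its instruction-level power model. To reduce the complexity that inter-instruction effects add to power analysis, instructions are reclassified according to the functional units they pass through during execution, which reduces the complexity of the analysis from O(n²) to O(n). The successful development of the power model lays a foundation for low-power compilation and software power optimization.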

8.

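Optimizing instruction scheduling on the clustered VLIW 魂芯 (Hunxin) DSP

《微型机与应用》 (Microcomputer & Its Applications), 2017, (11)

The 魂芯 (Hunxin) DSP is a 32-bit, statically superscalar, clustered VLIW processor with SIMD support. The chip has 4 execution clusters and 3 memory blocks, but inter-cluster data transfer and addressing consume bus bandwidth. Each cluster contains a large number of compute units, yet the instruction scheduling algorithms in the existing compiler framework target non-clustered architectures, so they cannot fully exploit the clustered structure of the 魂芯 DSP to generate efficient instruction-level parallel code. Based on the clustered nature of the processor architecture, this paper proposes heuristic algorithms for instruction cluster assignment and instruction scheduling on the 魂芯 DSP and implements them in the open-source Open64 compiler framework. Experimental results show that the implementation in the 魂芯 DSP compiler significantly improves the performance of some compute-intensive programs on the DSP.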

9.

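A lightweight instruction scheduling algorithm for just-in-time compilers

Shi Xiaohua (史晓华), Liu Chao (刘超), Jin Maozhong (金茂忠), Guo Peng (郭鹏). 《计算机工程》 (Computer Engineering), 2007, 33(15): 3-6

This paper presents a lightweight, linear-complexity instruction scheduling algorithm designed for just-in-time compilers and for systems constrained in time and space. Rather than relying on a traditional DAG or expression tree, the algorithm schedules instructions using an original data structure, the extended association matrix (扩展关联矩阵). Even in the worst case its time complexity is strictly linear in the total instruction length, and it uses less than 1 KB of memory. The algorithm has been adopted as the default instruction scheduling algorithm in the just-in-time compiler of XORP, a high-performance J2ME virtual machine designed by Intel for XScale.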

10.

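Data prefetching and optimization methods for instruction-level parallel compilers [total citations: 6; self-citations: 0; citations by others: 6]

Lian Ruiqi (连瑞琦), Zhang Zhaoqing (张兆庆), Qiao Ruliang (乔如良). 《计算机学报》 (Chinese Journal of Computers), 2000, 23(6): 576-584

Microprocessors are becoming ever more powerful, but memory speed lags far behind, which leaves overall system performance unsatisfactory. To address this problem, compilers have developed techniques such as locality optimization and data prefetching. This paper describes a data prefetching technique for ILP (Instruction-Level Parallelism) optimizing compilers, together with an optimization method that uses the register file to reduce the number of main-memory accesses. Together they improve average memory performance and are quite effective for scientific and engineering computing applications.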

11.

Building a retargetable local instruction scheduler

Vicki Allan, Steven J. Beaty, Bogong Su, Philip H. Sweany. 《Software: Practice and Experience》, 1998, 28(3): 249-283

While high-performance architectures have included some Instruction-Level Parallelism (ILP) for at least 25 years, recent computer designs have exploited ILP to a significant degree. Although a local scheduler is not sufficient for generation of excellent ILP code, it is necessary as many global scheduling and software pipelining techniques rely on a local scheduler. Global scheduling techniques are well-documented, yet practical discussions of local schedulers are notable in their absence. This paper strives to remedy that disparity by describing a list scheduling framework and several important practical details that, taken together, allow implementation of an efficient local instruction scheduler that is easily retargetable for ILP architectures. The foundation of our machine-independent instruction scheduler is a timing model that allows easy retargetability to a wide range of architectures. In addition to describing how a general list-scheduler can be implemented within the framework of our timing model, experimental results indicate that lookahead scheduling can profoundly improve a scheduler's ability to produce a legal schedule. Further experimental data shows that deciding to schedule a data dependence DAG (DDD) in forward or reverse order depends significantly upon that target architecture, suggesting the possibility of scheduling in each direction and using the best of the two schedules. In contrast, experiments demonstrate little difference in code quality for schedules generated by either instruction-driven or operation-driven schedulers. Thus, the inherent flexibility of operation-driven methods suggests including that approach in a retargetable instruction scheduler. List scheduling is, of course, a heuristic scheduling method. A variety of scheduling heuristics are presented. 
In addition, the paper describes a method, using a genetic algorithm search, to ‘fine-tune’ the weights of twenty-four individual heuristics to form a DDD-node heuristic tuned to a specific architecture. © 1998 John Wiley & Sons, Ltd.
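For readers unfamiliar with list scheduling, the following is a minimal textbook-style sketch of a forward list scheduler over a data dependence DAG. It is not the framework or timing model described in the abstract: the single priority heuristic (longest latency-weighted path to a leaf), the `issue_width` parameter, and the dictionary-based graph representation are all assumptions chosen to keep the example short. The input is assumed to be acyclic.

```python
from functools import lru_cache


def list_schedule(latency, succs, issue_width=1):
    """Forward list scheduling of a dependence DAG.

    latency: dict mapping each operation name to its latency in cycles.
    succs:   dict mapping an operation to the operations that depend on it.
    Returns a dict mapping each operation to its issue cycle.
    Assumption: the graph is acyclic and operation names are sortable.
    """
    # Count unscheduled predecessors of every operation.
    preds_left = {op: 0 for op in latency}
    for op, deps in succs.items():
        for s in deps:
            preds_left[s] += 1

    @lru_cache(maxsize=None)
    def height(op):
        # Priority: longest latency-weighted path from op to any leaf.
        return latency[op] + max(
            (height(s) for s in succs.get(op, ())), default=0
        )

    ready_at = {op: 0 for op in latency}  # earliest cycle operands are available
    ready = [op for op, n in preds_left.items() if n == 0]
    schedule = {}
    cycle = 0
    while len(schedule) < len(latency):
        # Operations whose operands are available, highest priority first.
        candidates = sorted(
            (op for op in ready if ready_at[op] <= cycle),
            key=lambda op: (-height(op), op),
        )
        for op in candidates[:issue_width]:
            schedule[op] = cycle
            ready.remove(op)
            for s in succs.get(op, ()):
                ready_at[s] = max(ready_at[s], cycle + latency[op])
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    ready.append(s)
        cycle += 1
    return schedule


if __name__ == "__main__":
    latency = {"load_a": 2, "load_b": 2, "add": 1, "store": 1}
    succs = {"load_a": ["add"], "load_b": ["add"], "add": ["store"]}
    print(list_schedule(latency, succs, issue_width=1))
    print(list_schedule(latency, succs, issue_width=2))
```

Replacing `height` with a weighted sum of several heuristics is where the tuning described in the abstract would apply; the sketch uses a single heuristic only.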

12.

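A DVS-based low-power algorithm for real-time multi-core embedded systems [total citations: 2; self-citations: 0; citations by others: 2]

Wang Lisheng (王力生), Guo Zhenke (郭振轲). 《计算机应用研究》 (Application Research of Computers), 2009, 26(1): 127-128

Dynamic voltage scaling (DVS) is the most fundamental technique in low-power design. However, most algorithms target single-processor platforms and consider only mutually independent tasks, and in that setting DVS often fails to deliver good results. This paper proposes a DVS-based loop rotation scheduling technique to reduce power: loops in the program are restructured so that power is minimized while deadlines are met, and the time and power consumed by voltage transitions are also taken into account.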

13.

Efficient instruction scheduling using finite state automata

Vasanth Bala, Norman Rubin. 《International Journal of Parallel Programming》, 1997, 25(2): 53-82

Modern compilers employ sophisticated instruction scheduling techniques to shorten the number of cycles taken to execute the instruction stream. In addition to correctness, the instruction scheduler must also ensure that hardware resources are not oversubscribed in any cycle. For a contemporary processor implementation with multiple pipelines and complex resource usage restrictions, this is not an easy task. The complexity involved in reasoning about such resource hazards is one of the primary factors that constrain the instruction scheduler from performing certain kinds of transformations that can result in improved schedules. We extend a technique for detecting pipeline resource hazards based on finite state automata, to support the efficient implementation of such transformations that are essential for aggressive instruction scheduling beyond basic blocks. Although similar code transformations can be supported by other schemes such as reservation tables, our scheme is superior in terms of space and time. A global instruction scheduler using these techniques was implemented in the KSR compiler. This work was begun while the authors were at Kendall Square Research (KSR).

14.

Data-Dependency Graph Transformations for Instruction Scheduling

Mark Heffernan, Kent Wilken. 《Journal of Scheduling》, 2005, 8(5): 427-451

This paper presents a set of efficient graph transformations for local instruction scheduling. These transformations to the data-dependency graph prune redundant and inferior schedules from the solution space of the problem. Optimally scheduling the transformed problems using an enumerative scheduler is faster and the number of problems solved to optimality within a bounded time is increased. Furthermore, heuristic scheduling of the transformed problems often yields improved schedules for hard problems. The basic node-based transformation runs in O(ne) time, where n is the number of nodes and e is the number of edges in the graph. A generalized subgraph-based transformation runs in O(n²e) time. The transformations are implemented within the GNU Compiler Collection (GCC) and are evaluated experimentally using the SPEC CPU2000 floating-point benchmarks targeted to various processor models. The results show that the transformations are fast and improve the results of both heuristic and optimal scheduling.

15.

Fast, frequency-based, integrated register allocation and instruction scheduling

Ioana Cutcutache, Weng-Fai Wong. 《Software: Practice and Experience》, 2008, 38(11): 1105-1126

Instruction scheduling and register allocation are two of the most important optimization phases in modern compilers as they have a significant impact on the quality of the generated code. Unfortunately, the objectives of these two optimizations are in conflict with one another. The instruction scheduler attempts to exploit instruction-level parallelism and requires many operands to be available in registers. On the other hand, the register allocator wants register pressure to be kept low so that the amount of spill code can be minimized. Currently these two phases are done separately, typically in three passes: prepass scheduling, register allocation and postpass scheduling. But this separation can lead to poor results. Previous works attempted to solve the phase-ordering problem by combining the instruction scheduler with graph-coloring-based register allocators. The latter tend to be computationally expensive. Linear-scan register allocators, on the other hand, are simple, fast and efficient. In this paper, we describe our effort to integrate instruction scheduling with a linear-scan allocator. Furthermore, our integrated optimizer is able to take advantage of execution frequencies obtained through profiling. Our integrated register allocator and instruction scheduler achieved good code quality with significantly reduced compilation times. On the SPEC2000 benchmarks running on a 900 MHz Itanium II, compared with OpenIMPACT, we halved the time spent in instruction scheduling and register allocation with negligible impact on execution times. Copyright © 2007 John Wiley & Sons, Ltd.

16.

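An overview of fully integrated low-voltage, low-power CMOS phase-locked loop design [total citations: 1; self-citations: 1; citations by others: 1]

Xu Wei (徐伟), Huang Letian (黄乐天), Ding Zhaoming (丁召明), Li Qiang (李强). 《电子技术应用》 (Application of Electronic Technique), 2015, 41(5): 21-24

The phase-locked loop (PLL) is a very important part of modern circuit systems, especially communication systems, and its performance largely determines the overall performance of the system. As demand for low-voltage, low-power operation grows, low-voltage, low-power PLLs have become a very active research direction. This paper summarizes representative techniques and solutions from recent years in this area. Current matching in the charge pump at low voltage, the realization of a low-voltage, low-power voltage-controlled oscillator, and the design of reasonably fast frequency dividers at low voltage are all challenges that low-voltage, low-power PLL design must face.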

17.

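Multi-objective scheduling optimization with an improved chaotic fireworks algorithm [total citations: 1; self-citations: 0; citations by others: 1]

Bao Xiaoxiao (包晓晓), Ye Chunming (叶春明), Ji Lei (计磊), Huang Xia (黄霞). 《计算机应用研究》 (Application Research of Computers), 2016, 33(9)

To meet different production requirements, a multi-objective optimization model is built with three objective functions: minimizing completion time, minimizing total job tardiness, and minimizing total machine idle time. An improved chaotic fireworks algorithm is proposed that generates chaotic sequences through a logistic self-map to keep the algorithm from being trapped in local optima, and a method for constructing the Pareto non-dominated solution set is designed that combines binary tournament selection with dynamic elimination. Simulation tests on six benchmark problems of different sizes verify that the algorithm achieves high solution accuracy and stability on multi-objective job-shop problems.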

18.

Shifted gray encoding to reduce instruction memory address bus switching for low-power embedded systems

Hui Guo, Sri Parameswaran. 《Journal of Systems Architecture》, 2010, 56(4-6): 180-190

Gray code bus encoding is a simple approach to reduce instruction address bus switching. It requires little encoding hardware and no additional bus lines. Our analytical study reveals that with Gray encoding the address bus switching can be reduced by nearly 50%, for long, sequentially accessed code. However, existing Gray bus encoding techniques involve decoding the Gray coded bus, which is expensive in terms of performance and area, and has stymied efforts to implement such a coding scheme in real systems. Furthermore, based on our experimental investigation on a set of benchmarks, the Gray bus encoding may be much less effective than expected – the switching reduction rate can be as low as under 30%. This paper presents a design approach to enable the use of Gray encoding by avoiding the bus decoding operation and to enhance the switching reduction efficiency by using a shifted Gray code for a given application. The experiment results show that our design approach can improve bus switching reduction rate by up to 22.55%, with little overhead on design logic and performance.
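The "nearly 50%" figure for long sequential code can be checked with a short sketch that compares bit switching for plain binary addresses against the standard reflected Gray code. This illustrates ordinary Gray coding only, not the shifted Gray code the paper proposes. The helper names and the choice of 1024 consecutive word addresses (stepping by 1 rather than by the instruction size in bytes) are assumptions made for the example.

```python
def to_gray(n: int) -> int:
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def switching(addresses) -> int:
    """Total bit toggles between consecutive values driven on the bus."""
    return sum(
        bin(a ^ b).count("1") for a, b in zip(addresses, addresses[1:])
    )


if __name__ == "__main__":
    # Assumption: a straight-line run of 1024 consecutive word addresses.
    seq = list(range(1024))
    binary = switching(seq)
    gray = switching([to_gray(a) for a in seq])
    print(binary, gray, f"reduction = {1 - gray / binary:.1%}")
```

For this purely sequential run the Gray-coded bus toggles exactly one bit per step. Real programs contain branches and calls, which is one reason the measured reduction reported in the abstract can fall well below the analytical figure.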