Similar literature
 18 similar articles found
1.
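A design method based on a split-comparison cache is proposed. Its key technique is a fully associative cache that stores the low four bits of the original tag, combined with a split tag comparator, so that high performance and low energy consumption are obtained together. SPEC95 simulation results show that the split-comparison cache saves 13% of the access time and 45%~60% of the energy consumption of a conventional four-way set-associative cache.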
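The abstract does not describe how the split comparator is organized, so the following is only a minimal sketch of one plausible reading: the low four tag bits act as a cheap first-stage filter, and the remaining high bits are compared only for ways that pass it. The function name `split_compare`, the constant `LOW_BITS`, and the list-based set model are all assumptions made for illustration, not the paper's design.

```python
# Hypothetical sketch: two-stage tag comparison within one cache set.
LOW_BITS = 4                      # the abstract keeps the low four tag bits separately
LOW_MASK = (1 << LOW_BITS) - 1


def split_compare(set_tags, lookup_tag):
    """Return (hit_way, full_compares) for one set.

    set_tags: list of stored tags (int), or None for an invalid way.
    full_compares counts how many high-bit comparisons were needed.
    """
    # Stage 1: compare only the low four bits of every valid way.
    candidates = [
        way for way, tag in enumerate(set_tags)
        if tag is not None and (tag & LOW_MASK) == (lookup_tag & LOW_MASK)
    ]
    # Stage 2: compare the remaining high bits only for the candidates.
    full_compares = 0
    for way in candidates:
        full_compares += 1
        if set_tags[way] >> LOW_BITS == lookup_tag >> LOW_BITS:
            return way, full_compares
    return None, full_compares
```

In this model a lookup whose low bits match no way performs no high-bit comparison at all, which is where an energy saving of this kind would come from.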

2.
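Caches are among the most energy-consuming components of modern microprocessors. This paper studies the organization of fully associative caches, presents an architecture-level power estimation model for them, validates the model, and quantitatively analyzes the power characteristics of the fully associative cache organization.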

3.
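Based on an analysis of existing architecture-level low-power cache designs, a low-power design strategy for mixed caches is proposed. A tag field added to the conventional mixed cache structure distinguishes instructions from data within a cache set, which limits the number of ways the processor accesses each time and thereby lowers power. The principle and hardware implementation of the method are described in detail, and the method is applied to the independently developed 龙腾C2 microprocessor. Experimental results show that the method does not degrade cache performance, costs only 1.45% in area, and reduces total power by 23.1%.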

4.
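A low-power dynamically reconfigurable cache scheme   Cited by: 1 (self-citations: 0, by others: 1)
赵欢  苏小昆  李仁发 《计算机应用》2009,29(5):1446-1451
Processor power is a major concern in embedded systems, and studies show that cache memory accounts for 30%~60% of total processor power in such systems. A low-power dynamically reconfigurable cache scheme, Tournament cache, is therefore proposed. It adds three counters and one register to the conventional cache structure and, while the program runs, uses the counters' statistics to adjust the cache associativity dynamically among 1, 2, and 4 ways, matching the needs of different program phases and thereby reducing system power. Experimental results show that the scheme saves more than 40% of the energy of a conventional four-way set-associative cache, with almost negligible performance loss.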

5.
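Research on a low-power dynamically reconfigurable cache algorithm   Cited by: 1 (self-citations: 0, by others: 1)
The dynamically reconfigurable cache algorithm monitors changes of program phase according to the instruction time count and decides on capacity adjustments. Within a program phase, a state machine predicts cache accesses from the average access time, and the prediction determines the cache configuration for the current phase. Experimental results show that the algorithm reduces power by 61% compared with a conventional four-way set-associative cache, with a performance loss of only about 2%. Compared with existing algorithms, both power and performance are further improved.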

6.
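Based on an analysis of existing architecture-level low-power cache designs, a low-power design strategy for mixed caches is proposed. A tag field added to the conventional mixed cache structure distinguishes instructions from data within a cache set, which limits the number of ways the processor accesses each time and thereby lowers power. The principle and hardware implementation of the method are described in detail, and the method is applied to the independently developed 龙腾C2 microprocessor. Experimental results show that the method does not degrade cache performance, costs only 1.45% in area, and reduces total power by 23.1%.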

7.
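A low-power dynamically reconfigurable cache design   Cited by: 1 (self-citations: 0, by others: 1)
In embedded microprocessor design, the cache improves performance but also becomes a major source of power consumption. A non-uniform, dynamically reconfigurable low-power cache structure and a dynamic reconfiguration algorithm, DAS (Dynamic Associativity Selection), are proposed to reduce power by reconfiguring the cache at run time. MiBench-based simulation results show that the reconfigurable cache structure performs better and consumes less energy than an ordinary cache structure: the instruction and data cache hit rates rise by 2.1% and 1.4% on average, respectively, and the average energy of the memory system falls by 8.1%.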

8.
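Virtual multi-bank cache: an efficient scheme for implementing high-associativity caches   Cited by: 1 (self-citations: 0, by others: 1)
High-associativity caches have the advantage of a low miss rate, and high associativity is very important in many situations. A prominent problem of high-associativity caches, however, is their long access time. The virtual multi-bank cache proposed in this paper addresses this problem well. The paper presents the idea of the virtual multi-bank cache and two concrete schemes, SMC-Cache and PMC-Cache, together with detailed performance simulation results. The simulations show that both schemes improve cache performance very effectively. With a cache capacity of 4 KB and an associativity of 4, they reduce the average access time by 9.8% and 10.8%, respectively, compared with a direct-mapped cache.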

9.
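An algorithm is proposed to solve the cache coherence and synchronization problems in PACT01, a new processor architecture that incorporates a dynamically programmable logic array (DPGA), and to address fast context switching for multithreading support as well as fast user-level operations. The memory replacement mechanism is a hardware method for solving the cache coherence problem and for replacing data from local or remote memory into the cache on a cache miss. Conflicts arise when multiple threads running in parallel write and read the same location, or read or write the same location. An associative mapping policy is chosen, together with the least recently used (LRU) algorithm: on a cache miss, the least recently used block is replaced. To implement LRU, a counter is associated with each block.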
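The abstract only states that LRU is implemented with one counter per block; it does not give the update rule. The sketch below uses the common age-counter formulation (0 means most recently used) as an assumed stand-in. The class name `CounterLRUSet` and its methods are hypothetical and do not come from the paper.

```python
class CounterLRUSet:
    """One associative set with an age counter per block (assumed scheme)."""

    def __init__(self, ways):
        self.tags = [None] * ways
        # Age counters: 0 = most recently used, ways - 1 = least recently used.
        self.age = list(range(ways))

    def _touch(self, way):
        # Blocks younger than the touched block age by one step;
        # the touched block becomes the most recently used.
        old = self.age[way]
        for w in range(len(self.age)):
            if self.age[w] < old:
                self.age[w] += 1
        self.age[way] = 0

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        # Miss: the block with the largest age counter is the LRU victim.
        victim = self.age.index(max(self.age))
        self.tags[victim] = tag
        self._touch(victim)
        return False
```

Because the counters always form a permutation of 0..ways-1, the victim is unique and no timestamp overflow handling is needed, which is one reason this formulation is popular in hardware descriptions.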

10.
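A cache structure that uses a pre-comparison method is introduced. Pre-comparing tag segments avoids accesses to irrelevant tag and data segments and thus reduces access power. An inverted clock is introduced to optimize access timing, keeping the average access latency below one cycle. Experiments show that, with the hit rate preserved, the memory-access optimization behaves consistently across the test programs, and the power advantage grows as associativity increases. Compared with a prediction-based structure, power is reduced by 28.5% on average at 8-way associativity.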

11.
In multiprocessor systems-on-chip (MPSoCs) that use snoop-based cache coherency protocols, a miss in the data cache triggers the broadcast of a coherency request to all the remote caches, to keep all data coherent. However, the majority of these requests are unnecessary because remote caches do not have the matching blocks and so their tag lookups fail. Both the coherency requests and the tag lookups corresponding to a remote miss consume unnecessary energy. We propose an architecture-level technique for snoop energy reduction, called broadcast filtering, which prevents unnecessary coherency requests from being broadcast to remote caches, and thus reduces snoop energy consumption by both the cache and bus. Broadcast filtering is implemented using a snooping cache and a split bus. The snooping cache checks if a block that cannot be obtained locally exists in remote caches before broadcasting a coherency request. If no remote cache has the matching block, there is no broadcast; and if broadcasting is necessary, the split bus allows coherency requests to be broadcast selectively to the remote caches which have matching blocks. Experimental results show reductions of 90% in cache lookups, 60% in bus usage, and 40% in snoop energy consumption, at a small cost in performance. An analysis based on the energy model shows that the broadcast filtering technique can reduce energy consumption per cache coherency operation by up to 55%.
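The filtering decision described above can be illustrated with a small software model. This is a sketch under stated assumptions, not the paper's hardware: the snooping cache is modeled as an exact dictionary from block address to the set of caches holding the block, and the class `SnoopFilter` with its methods `record_fill`, `record_evict`, and `targets` is hypothetical.

```python
class SnoopFilter:
    """Tracks which caches hold each block (a stand-in for the snooping cache)."""

    def __init__(self):
        self.holders = {}  # block address -> set of cache ids holding it

    def record_fill(self, cache_id, block):
        # A cache has loaded the block.
        self.holders.setdefault(block, set()).add(cache_id)

    def record_evict(self, cache_id, block):
        # A cache has dropped the block; forget empty entries.
        caches = self.holders.get(block)
        if caches is not None:
            caches.discard(cache_id)
            if not caches:
                del self.holders[block]

    def targets(self, requester, block):
        """Remote caches that should receive the coherency request.

        An empty set means the broadcast can be suppressed entirely.
        """
        return self.holders.get(block, set()) - {requester}
```

An empty result corresponds to the "no broadcast" case, and a non-empty result corresponds to the selective broadcast that the split bus makes possible.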

12.
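In modern microprocessors, reading and comparing tags accounts for a large share of instruction cache energy. An inference-based low-energy instruction cache, the asymmetric instruction cache, is proposed. Exploiting the low proportion of jump instructions, the structure handles jump instructions and sequential instructions differently and uses simplified tag management bits that do not fully correspond to the data. It adopts two new techniques, hit inference and variable-length instruction fetch: hit inference addresses the excessive tag comparisons on instruction cache hits, and variable-length instruction fetch raises the hit rate of sequential instruction blocks. Experimental results show that, for the selected SPEC2006 benchmarks, the asymmetric instruction cache lowers fetch energy by 40%~60% relative to a conventional L1 instruction cache and by 9% relative to the tagless instruction cache TH IC. In fetch ED2P, it improves on the conventional L1 instruction cache by about 50% and on TH IC by about 17%.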

13.
The on-chip instruction cache is a potentially power-hungry component in embedded systems due to its large chip area and high access frequency. Aiming at reducing the power consumption of the on-chip cache, we propose a Reduced One-Bit Tag Instruction Cache (ROBTIC), where the cache size is judiciously reduced and the cache tag field only contains the least significant bit of the full tag. We develop a cache operational control scheme for ROBTIC so that with the one-bit cache tag, the program locality can still be efficiently exploited. For applications where most of the memory accesses are localized, our cache can achieve performance similar to that of a traditional full-tag cache; however, the power consumption of the cache can be significantly reduced due to the much smaller cache size, narrower tag array (just one bit), and smaller tag comparison circuit. Experiments on a set of benchmarks implemented in CMOS 180 nm process technology demonstrate that our proposed design can reduce dynamic power consumption by up to 27.3% and area by up to 30.9% relative to the traditional cache when the cache size is fixed at 32 instructions, which outperforms the existing partial-tag based cache design. With cache size customization, a further 47.8% power saving can be achieved. Our experimental results also show that when implemented in deep sub-micron technologies where the leakage power is not negligible, our design is still efficient: a consistent power saving trend (about 22%) has been observed for technologies from 130 nm down to 65 nm.

14.
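The V-Way Cache exploits the non-uniform distribution of memory accesses across sets and adjusts set associativity dynamically on demand, using resources more effectively than a conventional cache structure. However, this adjustment comes at the cost of a larger tag array, which adds area and power overhead, and the tag array is poorly utilized. This work optimizes the V-Way Cache and proposes HV-Way Cache, a low-overhead heterogeneous variable-associativity cache structure. HV-Way Cache uses a heterogeneous tag array organization that lets multiple sets share tag entries, shrinking the capacity of the tag ways. Tag replacement information is organized per set, and the least recently used entry is chosen as the victim. Simulation experiments with the Cacti and Simics simulators show that HV-Way Cache greatly reduces area and power overhead with very little performance loss.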

15.
Processors in embedded systems mostly employ cache architectures in order to alleviate the access latency gap between processors and memory systems. Caches in embedded systems usually occupy a major fraction of the implemented chip area. The power dissipation of the cache system thus constitutes a significant fraction of the power dissipated by the entire processor in embedded systems. In this paper, we propose the compressed tag architecture to reduce the power dissipation of the tag store in cache systems. We introduce a new tag-matching mechanism by using a locality buffer and a tag compression technique. The main power reduction feature of our proposal is the use of small tag space matching instead of full tag matching, with modest additional hardware costs. The simulation results show that the proposed model provides a power and energy-delay product reduction of up to 27.8% and 26.5%, respectively, while still providing a comparable level of system performance to regular cache systems.

16.
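Aiming to optimize the replacement policy of compressed caches, MLRU-C, an optimized compressed-cache replacement policy based on modified LRU, is proposed. MLRU-C uses the extra tag resources in a compressed cache to form a shadow tag mechanism that detects and corrects wrong replacement decisions made by the LRU policy, improving the performance of the compressed-cache replacement policy. Experimental results show that, compared with the conventional LRU replacement policy, MLRU-C reduces the L2 compressed-cache miss rate by 12.3% on average.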

17.
The L1 cache in today’s high-performance processors accesses all ways of a selected set in parallel. This constitutes a major source of energy inefficiency: at most one of the N fetched blocks can be useful in an N-way set-associative cache. The other N-1 cachelines will all be tag mismatches and subsequently discarded. We propose to eliminate unnecessary associative fetches by exploiting certain software semantics in cache design, thus reducing dynamic power consumption. Specifically, we use memory region information to eliminate unnecessary fetches in the data cache, and ring level information to optimize fetches in the instruction cache. We present a design that is performance-neutral, transparent to applications, and incurs a space overhead of a mere 0.41% of the L1 cache. We show significantly reduced cache lookups with benchmarks including SPEC CPU, SPECjbb, SPECjAppServer, PARSEC, and Apache. For example, for SPEC CPU 2006, the proposed mechanism helps to reduce cache block fetches from the data and instruction caches by an average of 29% and 53% respectively, resulting in power savings of 17% and 35% in the caches, compared to the aggressively clock-gated baselines.

18.
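The use of caches in embedded processors greatly improves processor performance, but caches, and instruction caches in particular, account for a large part of processor power. Disabling unnecessary accesses to the tag SRAM and data SRAM can reduce power substantially. A pipelined instruction cache access mechanism is proposed that disables unnecessary data SRAM accesses. In addition, by recording instruction cache line information and predicting the next cache line, a sliding window of cache lines is formed that disables unnecessary tag SRAM accesses. The proposed method incurs no performance loss, and power analysis in the SMIC 90nm process shows that instruction access power is reduced by 50%.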
