Similar Documents
20 similar documents found (search time: 31 ms)
1.
Circuit and microarchitectural techniques for reducing cache leakage power   (Cited: 2; self-citations: 0; by others: 2)
On-chip caches represent a sizable fraction of the total power consumption of microprocessors. As feature sizes shrink, the dominant component of this power consumption will be leakage. However, during a fixed period of time, the activity in a data cache is centered on only a small subset of the lines. This behavior can be exploited to cut the leakage power of large data caches by putting the cold cache lines into a state-preserving, low-power drowsy mode. In this paper, we investigate policies and circuit techniques for implementing drowsy data caches. We show that with simple microarchitectural techniques, about 80%-90% of the data cache lines can be maintained in a drowsy state without affecting performance by more than 0.6%, even though moving lines into and out of a drowsy state incurs a slight performance loss. According to our projections, in a 70-nm complementary metal-oxide-semiconductor process, drowsy data caches will be able to reduce the total leakage energy consumed in the caches by 60%-75%. In addition, we extend the drowsy cache concept to reduce leakage power of instruction caches without significant impact on execution time. Our results show that data and instruction caches require different control strategies for efficient execution. In order to enable drowsy instruction caches, we propose a technique called cache subbank prediction, which is used to selectively wake up only the necessary parts of the instruction cache, while allowing most of the cache to stay in a low-leakage drowsy mode. This prediction technique reduces the negative performance impact by 78% compared with the no-prediction policy. Our technique works well even with small predictor sizes and enables a 75% reduction of leakage energy in a 32-kB instruction cache.
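A minimal behavioral sketch of the kind of simple periodic policy the abstract alludes to (all lines are put into drowsy mode at a fixed interval, and touching a drowsy line pays a wakeup penalty); the window size, penalty, and trace below are illustrative assumptions, not values taken from the paper:

```python
# Minimal sketch of a periodic drowsy-cache policy (illustrative values only).
from dataclasses import dataclass, field

WINDOW = 4000          # assumed update window, in cycles
WAKEUP_PENALTY = 1     # assumed extra cycles to wake a drowsy line

@dataclass
class DrowsyCacheModel:
    num_lines: int
    drowsy: list = field(default_factory=list)
    cycle: int = 0
    extra_cycles: int = 0
    drowsy_line_cycles: int = 0   # proxy for leakage savings

    def __post_init__(self):
        self.drowsy = [False] * self.num_lines

    def tick(self, accessed_line=None):
        """Advance one cycle, optionally accessing one cache line."""
        self.cycle += 1
        self.drowsy_line_cycles += sum(self.drowsy)
        if accessed_line is not None and self.drowsy[accessed_line]:
            self.extra_cycles += WAKEUP_PENALTY       # wake the line before use
            self.drowsy[accessed_line] = False
        if self.cycle % WINDOW == 0:                  # periodic global drowsy
            self.drowsy = [True] * self.num_lines

# Toy usage: a program whose accesses hit only a small set of hot lines.
cache = DrowsyCacheModel(num_lines=512)
for c in range(100_000):
    cache.tick(accessed_line=c % 16)                  # hot lines 0..15
print("fraction of line-cycles spent drowsy: %.2f"
      % (cache.drowsy_line_cycles / (cache.cycle * cache.num_lines)))
print("extra wakeup cycles:", cache.extra_cycles)
```

On a trace with a small hot working set, most line-cycles end up drowsy while the extra wakeup cycles stay a tiny fraction of the run, which is the effect the 80%-90% / 0.6% figures above describe.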

2.
In this paper, we propose a novel integrated circuit and architectural level technique to reduce leakage power consumption in high-performance cache memories using a single-V_t (transistor threshold voltage) process. We utilize the concept of gated ground (an nMOS transistor inserted between the ground line and the SRAM cell) to achieve a reduction in leakage energy without significantly affecting performance. Experimental results show that the gated-ground cache (DRG-Cache) retains data even when the memory is put into the standby mode of operation. Data is restored when the gated-ground transistor is turned on, while turning off the gated-ground transistor gives a large reduction in leakage power. This technique requires no extra circuitry; the row decoder itself can be used to control the gated-ground transistor. The technique is applicable to data and instruction caches as well as to different levels of the cache hierarchy, such as the L1, L2, or L3 caches. We fabricated a test chip in TSMC 0.25-μm technology to demonstrate the data retention capability and the cell stability of the DRG-Cache. Our simulation results for 100-nm and 70-nm processes (Berkeley Predictive Technology Model) show 16.5% and 27% reductions in consumed energy in the L1 cache and 50% and 47% reductions in the L2 cache, respectively, with less than 5% impact on execution time and less than 4% area overhead.

3.
On-chip L1 and L2 caches represent a sizeable fraction of the total power consumption of microprocessors. In nanometer-scale technology, subthreshold leakage power is becoming one of the dominant components of the total power consumption of those caches. In this study, we present optimization techniques to reduce the subthreshold leakage power of on-chip caches assuming that multiple threshold voltages (V_T's) are available. First, we show a cache leakage optimization technique that examines the tradeoff between access time and subthreshold leakage power by assigning distinct V_T's to each of the four main cache components: address bus drivers, data bus drivers, decoders, and static random access memory (SRAM) cell arrays with sense amplifiers. Second, we show optimization techniques to reduce the leakage power of L1 and L2 on-chip caches without affecting the average memory access time. The key results are: 1) two additional high V_T's are enough to minimize leakage in a single cache (three V_T's if we include a nominal low V_T for the microprocessor core logic); 2) if the L1 size is fixed, increasing the L2 size can result in much lower leakage without reducing the average memory access time; 3) if the L2 size is fixed, reducing the L1 size may result in lower leakage without loss of average memory access time for the SPEC2K benchmarks; and 4) smaller L1 and larger L2 caches than are typical in today's processors result in significant leakage and dynamic power reduction without affecting the average memory access time.
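The first technique can be pictured as a small discrete optimization: choose one V_T per component to minimize leakage subject to an access-time budget. The sketch below does this by brute force; the delay/leakage multipliers, base values, and budget are invented placeholders, not numbers from the paper.

```python
# Illustrative sketch: pick a V_T (from a small discrete set) for each of the
# four cache components so that leakage is minimized while total access time
# stays within a budget.  All numbers are made-up placeholders.
from itertools import product

COMPONENTS = ["addr_bus_drivers", "data_bus_drivers", "decoders", "sram_array"]
# V_T label -> (relative delay multiplier, relative leakage multiplier)
VT_OPTIONS = {"low": (1.00, 1.00), "high1": (1.10, 0.20), "high2": (1.25, 0.05)}

BASE_DELAY = {"addr_bus_drivers": 0.2, "data_bus_drivers": 0.2,
              "decoders": 0.3, "sram_array": 0.5}           # ns, assumed
BASE_LEAK = {"addr_bus_drivers": 5, "data_bus_drivers": 5,
             "decoders": 10, "sram_array": 80}               # mW, assumed
DELAY_BUDGET = 1.35                                           # ns, assumed

best = None
for choice in product(VT_OPTIONS, repeat=len(COMPONENTS)):
    delay = sum(BASE_DELAY[c] * VT_OPTIONS[v][0] for c, v in zip(COMPONENTS, choice))
    leak = sum(BASE_LEAK[c] * VT_OPTIONS[v][1] for c, v in zip(COMPONENTS, choice))
    if delay <= DELAY_BUDGET and (best is None or leak < best[0]):
        best = (leak, delay, dict(zip(COMPONENTS, choice)))

print("min leakage (mW): %.1f  at delay %.2f ns" % (best[0], best[1]))
print("assignment:", best[2])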

4.
Static energy reduction techniques for microprocessor caches   (Cited: 1; self-citations: 0; by others: 1)
Microprocessor performance has been improved by increasing the capacity of on-chip caches. However, the performance gain comes at the price of static energy consumption due to subthreshold leakage current in cache memory arrays. This paper compares three techniques for reducing static energy consumption in on-chip level-1 and level-2 caches. One technique employs low-leakage transistors in the memory cell. Another technique, power supply switching, can be used to turn off memory cells and discard their contents. A third alternative is dynamic threshold modulation, which places memory cells in a standby state that preserves cell contents. In our experiments, we explore the energy and performance tradeoffs of these techniques. We also investigate the sensitivity of microprocessor performance and energy consumption to additional cache latency caused by leakage-reduction techniques.

5.
Leakage power is becoming the dominant component of chip power consumption with continued CMOS scaling. An important but commonly overlooked fact is that leaky transistors act as resistors that help dampen mid-frequency power supply noise. This paper focuses on the damping effect of various on-chip current components, including the leakage current, which becomes significant in scaled technologies. By developing physics-based damping models for active and leakage currents, we show that leakage, particularly gate tunneling leakage, provides more damping than strong-inversion current. The proposed models were validated in a 32-nm predictive CMOS technology under process–voltage–temperature (PVT) variations. Examples on large circuits such as SRAM caches are shown to illustrate the application of the proposed model. Simulation results show that the leakage-induced damping effect can compensate for the speed degradation at high temperatures by 7% or offer a 61% saving in decap area and leakage power.
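The damping effect can be illustrated with a deliberately simplified lumped model (this is not the paper's physics-based model): treat the package/grid inductance and on-die decoupling capacitance as an LC tank and lump leakage into an equivalent parallel resistance R_leak = Vdd / I_leak, so the damping ratio is roughly (1/(2·R_leak))·sqrt(L/C). All element values below are assumptions chosen only for illustration.

```python
# Simplified lumped power-grid model: series inductance L, on-die decap C,
# and leakage lumped as a parallel resistance R_leak = Vdd / I_leak.
import math

VDD = 0.9            # V, assumed
L_GRID = 80e-12      # H, assumed package + grid inductance
C_DECAP = 200e-9     # F, assumed on-die decoupling capacitance

def damping_ratio(i_leak_amps):
    r_leak = VDD / i_leak_amps                 # leaky transistors as a resistor
    return 0.5 * math.sqrt(L_GRID / C_DECAP) / r_leak

f_res = 1.0 / (2 * math.pi * math.sqrt(L_GRID * C_DECAP))
print("mid-frequency resonance ~ %.1f MHz" % (f_res / 1e6))
for i_leak in (0.5, 2.0, 8.0):                 # leakage current in A, illustrative
    print("I_leak = %4.1f A -> damping ratio = %.3f" % (i_leak, damping_ratio(i_leak)))
```

Higher leakage current gives a smaller R_leak and therefore a larger damping ratio at the mid-frequency resonance, which is the qualitative point of the abstract.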

6.
Various architectural power reduction techniques have been proposed for on-chip caches in the last decade. In this paper, we first show that these power reduction techniques can be suboptimal when thermal effects are considered. Then, we propose a thermal-aware cache power-down technique that minimizes the power density of the active parts by turning off alternating rows of memory cells instead of entire banks. The decrease in power density lowers the temperature, which then exponentially reduces the leakage. Thus, leakage power of the active parts is reduced in addition to the power eliminated from the parts that are turned off. Simulations based on SPEC2000, NetBench, and MediaBench applications in a 70-nm technology show that the proposed thermal-aware architecture can reduce the total energy consumption by 53% compared to a conventional cache, and by 14% compared to a cache architecture with a thermal-unaware power reduction scheme. Second, we show a block permutation scheme that can be used during the design of the caches to maximize the distance between blocks with consecutive addresses. Because of spatial locality, blocks with consecutive addresses are likely to be accessed within a short time interval. By maximizing the distance between such blocks, we minimize the power density of the hot spots in the cache, and hence reduce the peak temperature. This, in turn, results in an average leakage power reduction of 8.7% compared to a conventional cache without affecting the dynamic power or the latency. Overall, both of our architectures add no extra run-time penalty compared to the thermal-unaware power reduction schemes, yet they result in a significant reduction in the total energy consumption of a cache.
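The electro-thermal feedback behind the alternating-row idea can be sketched with a toy model: leakage grows roughly exponentially with temperature, and the steady-state temperature depends on the power density of the active region, so spreading the same active cells over twice the area (rows interleaved with gated-off rows) instead of packing them into one powered bank lowers both temperature and leakage. Every constant below is an assumption chosen only to make the feedback visible.

```python
# Toy electro-thermal feedback: leakage is exponential in temperature and
# temperature rise is proportional to the active-region power density.
import math

T_AMBIENT = 45.0        # deg C, assumed
THETA = 15.0            # deg C per (W/cm^2) of active-region power density, assumed
P_DYN_DENSITY = 1.2     # W/cm^2 dynamic power density of the active cells, assumed
LEAK0 = 0.4             # W/cm^2 leakage density at T_REF, assumed
T_REF, T_SLOPE = 45.0, 30.0   # exponential leakage-temperature model, assumed

def steady_state(spread_factor):
    """spread_factor = area the active cells are spread over / their own area."""
    t, leak = T_AMBIENT, LEAK0
    for _ in range(200):                  # iterate the feedback to steady state
        leak = LEAK0 * math.exp((t - T_REF) / T_SLOPE)
        t = T_AMBIENT + THETA * (P_DYN_DENSITY + leak) / spread_factor
    return t, leak

for name, spread in [("banks gated off (active cells packed)", 1.0),
                     ("alternating rows gated off (spread out)", 2.0)]:
    t, leak = steady_state(spread)
    print("%-40s T = %5.1f C, leakage = %.2f W/cm^2" % (name, t, leak))
```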

7.
In this paper, we propose a scheme for reducing the power consumption of memory components by conforming memory contents to a precharging value. The scheme targets memories with a single-bitline structure. It selectively stores normal or inverted data to reduce the number of bit accesses whose values differ from the precharging value, which reduces overall bitline toggling and ultimately contributes to the power reduction of the memory.
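A minimal sketch of the selective-inversion idea, assuming bitlines precharged to '1', a 32-bit word, and a one-bit invert flag per word (the actual cell and bitline organization is in the paper):

```python
# Before writing a word, compare it with the bitline precharge value; if more
# than half of the bits differ, store the inverted word plus a flag so that
# fewer cells deviate from the precharge value.
WORD_BITS = 32
PRECHARGE = (1 << WORD_BITS) - 1     # assume bitlines precharge to '1'

def encode(word):
    """Return (stored_word, invert_flag) minimizing bits != precharge value."""
    mismatches = bin((word ^ PRECHARGE) & PRECHARGE).count("1")
    if mismatches > WORD_BITS // 2:
        return (~word) & PRECHARGE, 1
    return word, 0

def decode(stored, flag):
    return (~stored) & PRECHARGE if flag else stored

# Example: a mostly-zero word differs from an all-ones precharge in most bits,
# so it is stored inverted.
for w in (0x00000003, 0xFFFF0000, 0xFFFFFFF0):
    s, f = encode(w)
    assert decode(s, f) == w
    print("word=%08X stored=%08X invert=%d" % (w, s, f))
```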

8.
The energy consumption due to input-output pins is a substantial part of the overall chip consumption. To reduce this energy, this work presents the working-zone encoding (WZE) method for encoding an external address bus, based on the conjecture that programs favor a few working zones of their address space at any given time. In such cases, the method identifies these zones and sends through the bus only the offset of the current reference with respect to the previous reference to that zone, along with an identifier of the current working zone. This is combined with a one-hot encoding for the offset. Several improvements to this basic strategy are also described. The approach has been applied to several address streams, broken down into instruction-only, data-only, and instruction-data traces, to evaluate the effect on separate and shared address buses. Moreover, the effect of instruction and data caches is evaluated. For the case without caches, the proposed scheme is especially beneficial for data-address and shared buses, which are the cases where other codings are less effective. On the other hand, for the case with caches, the best scheme for the instruction-only and data-only traces is the WZE, whereas for the instruction-data traces it is either the WZE or the bus-invert with four groups (depending on the energy overhead of these techniques).
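A behavioral sketch of the working-zone idea (simplified: a small zone table, a round-robin victim, and byte-granularity offsets; the paper's actual table management, offset granularity, and one-hot code width differ):

```python
# Behavioral sketch of working-zone encoding for an address bus.
NUM_ZONES = 4
OFFSET_RANGE = 16          # offsets representable with a one-hot code word

class WZEncoder:
    def __init__(self):
        self.prev = [None] * NUM_ZONES   # previous reference per working zone
        self.victim = 0

    def encode(self, addr):
        for zone, base in enumerate(self.prev):
            if base is not None and 0 <= addr - base < OFFSET_RANGE:
                offset = addr - base
                self.prev[zone] = addr                  # zone tracks its last reference
                return ("hit", zone, 1 << offset)       # zone id + one-hot offset
        # miss: send the full address and install a new working zone
        zone = self.victim
        self.victim = (self.victim + 1) % NUM_ZONES     # simple round-robin replacement
        self.prev[zone] = addr
        return ("miss", zone, addr)

# Toy trace interleaving two working zones (e.g., code and a data array).
enc = WZEncoder()
for a in [0x1000, 0x1004, 0x2000, 0x1008, 0x2004, 0x200C, 0x100C]:
    print(hex(a), "->", enc.encode(a))
```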

9.
This paper presents and evaluates three novel memory decoder designs which reduce energy consumption and delay by using selective precharging. These three designs, the AND-NOR, Sense-Amp, and AND decoders, range in selectivity and select-line swing; they charge and discharge fewer select-lines, which in turn consumes less energy than nonselective address decoders that charge and discharge all select-lines each cycle. The three decoding schemes are comprehensively simulated and compared to the conventional nonselective NOR decoder using 65-nm CMOS technology. Energy, delay, and area calculations are provided for all four 4-to-16 decoders under analysis. The most selective AND decoder performs best, dissipating between 61% and 99% less energy (73% less on average) than the NOR decoder; the selective Sense-Amp decoder performs only slightly worse, dissipating between 58% and 75% less (66% less on average). The AND-NOR decoder dissipates between 15% less and 20% more energy (6% more on average) than the NOR decoder. In addition, the AND decoder is 7.5% faster and the Sense-Amp decoder is 5.0% faster than the NOR decoder, whereas the AND-NOR decoder is 1.7% slower.

10.
Traditionally, state-encoding strategies targeting minimization of area, dynamic power, or a combination of them have been utilized in finite state machine (FSM) synthesis. With the drastic scaling down of devices at recent technology nodes, leakage power has also become an important design parameter to be considered during synthesis. A genetic algorithm-based state encoding, targeting area- and power-minimized FSMs, is proposed in this paper. A unified technique to reduce both static (leakage) power and dynamic power, along with an area trade-off, has been carried out for FSM synthesis, targeting static CMOS NAND-NAND PLA, dynamic CMOS NOR-NOR PLA, and pseudo-NMOS NOR-NOR PLA implementations. Suitable weights for area, leakage power, and dynamic power to minimize power density have also been explored. Simulation with MCNC benchmarks shows average improvements of 31%, 26%, and 29% in leakage power consumption, dynamic power consumption, and area requirement, respectively, over the NOVA-based state assignment technique in the case of the dynamic CMOS PLA implementation. Improvements of 30% in leakage power and 15% in area have been obtained for the pseudo-NMOS PLA implementation. For the static CMOS case, the improvements are about 29% in leakage power consumption, 14% in dynamic power consumption, and 18% in area requirement.
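A toy evolutionary search in the spirit of the proposed GA-based encoding, with a deliberately crude fitness: transition-frequency-weighted Hamming distance between state codes as a dynamic-power proxy, plus a count of '1' bits as a stand-in for PLA area/leakage. The example FSM, weights, and cost model are assumptions, not the paper's.

```python
# Toy evolutionary state-encoding search (illustrative cost model only).
import random

STATES = ["S0", "S1", "S2", "S3", "S4", "S5"]
BITS = 3
# (src, dst, relative transition frequency) -- assumed example FSM
TRANS = [("S0","S1",9), ("S1","S2",7), ("S2","S0",6), ("S2","S3",2),
         ("S3","S4",2), ("S4","S5",1), ("S5","S0",1)]
W_DYN, W_STATIC = 1.0, 0.5          # assumed weights

def cost(enc):
    dyn = sum(f * bin(enc[a] ^ enc[b]).count("1") for a, b, f in TRANS)
    static = sum(bin(c).count("1") for c in enc.values())
    return W_DYN * dyn + W_STATIC * static

def random_encoding():
    codes = random.sample(range(2 ** BITS), len(STATES))   # distinct codes
    return dict(zip(STATES, codes))

def mutate(enc):
    child = dict(enc)
    a, b = random.sample(STATES, 2)
    child[a], child[b] = child[b], child[a]                # swap two state codes
    return child

random.seed(1)
pop = [random_encoding() for _ in range(40)]
for _ in range(200):
    pop.sort(key=cost)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(30)]
best = min(pop, key=cost)
print("cost %.1f  encoding %s" % (cost(best), {s: format(c, "03b") for s, c in best.items()}))
```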

11.
Microelectronics Journal, 2015, 46(11): 1020-1032
This paper describes a new memristor crossbar architecture that is proposed for use in a high-density cache design. This design has less than 10% of the write energy consumption of a simple memristor crossbar. It also has up to 3 times the bit density of an STT-MRAM system and up to 11 times the bit density of an SRAM architecture. The proposed architecture is analyzed using a detailed SPICE analysis that accounts for the resistance of the wires in the memristor structure. Additionally, the memristor model used in this work has been matched to specific device characterization data to provide accurate results in terms of energy, area, and timing. The proposed memory system was analyzed by modeling two different devices that vary in resistance range and switching time. This system does not require that the memristor devices have inherent diode effects to limit alternate current paths; therefore, it is capable of utilizing a much broader class of devices. An architectural analysis has also been completed that shows how the memory system may perform as a cache memory. A hybrid cache structure was used to alleviate the long write latencies of memristor devices: the tag array is built from SRAM cells while the data array uses the proposed memristor circuit. This hybrid scheme allows multiple reads and writes to concurrently access different sub-arrays within a cache. The performance of these novel memristor-based caches was compared to SRAM- and STT-MRAM-based caches through detailed simulations. The results show that the memristor caches are denser and allow better performance along with lower system power when compared to the STT-MRAM and SRAM caches.

12.
This paper describes a single-bit-line cross-point cell activation (SCPA) architecture, which has been developed to reduce active power consumption and to avoid an increase in the size of high-density SRAM chips, such as 16-Mb SRAMs and beyond. A new PMOS precharging boost circuit, introduced to realize the single-bit-line structure, is also discussed. This circuit is suitable for operation under low-voltage power supply conditions. The SCPA architecture with the new word-line boost circuit is demonstrated with an experimental device fabricated in a 0.4-μm CMOS wafer process technology.

13.
Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is not switching. Estimates suggest a five-fold increase in leakage energy in every future generation. In modern microarchitectures, much of the leakage energy is dissipated in large on-chip cache memory structures with high transistor densities. While cache utilization varies both within and across applications, modern cache designs are fixed in size, resulting in transistor leakage inefficiencies. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy in instruction caches (i-caches). At the architecture level, we propose the Dynamically ResIzable i-cache (DRI i-cache), a novel i-cache design that dynamically resizes and adapts to an application's required size. At the circuit level, we use gated-Vdd, a novel mechanism that effectively turns off the supply voltage to, and eliminates leakage in, the SRAM cells in a DRI i-cache's unused sections. Architectural and circuit-level simulation results indicate that a DRI i-cache successfully and robustly exploits the cache size variability both within and across applications. Compared to a conventional i-cache using an aggressively scaled threshold voltage, a 64 K DRI i-cache reduces both the leakage energy-delay product and the cache size by 62% on average, with less than 4% impact on execution time. Our results also indicate that a wide NMOS dual-Vt gated-Vdd transistor with a charge pump offers the best gating implementation and virtually eliminates leakage energy with minimal increase in SRAM cell read time and area compared to an i-cache with an aggressively scaled threshold voltage.
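A minimal sketch of an interval-based resizing controller in the spirit of the DRI i-cache: at the end of each interval the miss count is compared against a bound and the powered-on portion of the cache is halved or doubled, with the unused sections assumed to be gated off via gated-Vdd. The interval length, miss bound, hysteresis, and size limits are illustrative assumptions.

```python
# Interval-based cache-resizing controller sketch (assumed parameters).
INTERVAL = 100_000        # instructions per sense interval, assumed
MISS_BOUND = 400          # target misses per interval, assumed
MIN_SIZE_KB, MAX_SIZE_KB = 8, 64

class DriController:
    def __init__(self):
        self.size_kb = MAX_SIZE_KB   # start with the full cache powered on

    def end_of_interval(self, misses):
        """Resize; unused sections would have their supply gated off (gated-Vdd)."""
        if misses > MISS_BOUND and self.size_kb < MAX_SIZE_KB:
            self.size_kb *= 2        # working set too big: power sections back on
        elif misses < MISS_BOUND // 2 and self.size_kb > MIN_SIZE_KB:
            self.size_kb //= 2       # plenty of slack: gate off half the cache
        return self.size_kb

# Toy phase behavior: a small loop kernel followed by a code-heavy phase.
ctrl = DriController()
miss_trace = [50, 40, 60, 45, 700, 650, 300, 120, 80]
print([ctrl.end_of_interval(m) for m in miss_trace])
```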

14.
Energy consumption is one of the important parameters to be optimized during the design of portable embedded systems. Thus, most contemporary portable devices feature low-power processors coupled with on-chip memories (e.g., caches, scratchpads). Scratchpads are better than traditional caches in terms of power, performance, area, and predictability. However, unlike caches, they depend upon software allocation techniques for their utilization. In this paper, we present scratchpad overlay techniques which analyze the application and insert instructions to dynamically copy both variables and code segments onto the scratchpad at runtime. We demonstrate that the problem of overlaying the scratchpad is an extension of the global register allocation problem. We present optimal and near-optimal approaches for solving the scratchpad overlay problem. The near-optimal scratchpad overlay approach achieves results close to the optimal ones and is significantly faster than the optimal approach. Our approaches improve upon the previously known static allocation technique for assigning both variables and code segments onto the scratchpad. The evaluation of the approaches for an ARM7 processor reports average energy and execution time reductions of 26% and 14%, respectively, over the static approach. Additional experiments comparing the overlayed scratchpads against unified caches of the same size report average energy and execution time savings of 20% and 10%, respectively. We also report data memory energy reductions of 45%-57% due to the insertion of a 1024-byte scratchpad memory in the memory hierarchy of a digital signal processor (DSP).
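For contrast with the overlay approach, the static baseline it improves upon can be sketched as a 0/1 knapsack: pick the set of variables and code segments whose combined size fits the scratchpad and whose estimated energy savings are maximal. Object names, sizes, and savings below are made up for illustration.

```python
# Simplified static scratchpad allocation as a 0/1 knapsack (illustrative data).
from functools import lru_cache

SCRATCHPAD_BYTES = 1024
# (name, size in bytes, estimated energy saved if placed on-chip)
OBJECTS = [("buf_a", 512, 90), ("coeffs", 256, 70), ("loop_kernel", 384, 120),
           ("lut", 512, 60), ("state", 128, 35)]

@lru_cache(maxsize=None)
def best(i, capacity):
    """Max savings using objects i.. with 'capacity' bytes left."""
    if i == len(OBJECTS):
        return 0, ()
    name, size, gain = OBJECTS[i]
    skip_val, skip_set = best(i + 1, capacity)
    if size <= capacity:
        take_val, take_set = best(i + 1, capacity - size)
        if take_val + gain > skip_val:
            return take_val + gain, (name,) + take_set
    return skip_val, skip_set

savings, chosen = best(0, SCRATCHPAD_BYTES)
print("estimated savings:", savings, "placed on scratchpad:", chosen)
```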

15.
Soft errors issues in low-power caches   (Cited: 1; self-citations: 0; by others: 1)
As technology scales, reducing leakage power and improving the reliability of data stored in memory cells is both important and challenging. While lower threshold voltages increase leakage, lower supply voltages and smaller nodal capacitances reduce energy consumption but increase soft error rates. In this work, we present a comprehensive study of soft error rates in low-power cache design. First, we study the effect of circuit-level techniques used to reduce the leakage energy consumption on soft error rates. Our results using custom designs show that many of these approaches may increase the soft error rates compared to a standard 6T SRAM. We also validate the effects of voltage scaling on soft error rate by performing accelerated tests on off-the-shelf SRAM-based chips using a neutron beam source. Next, we study the impact of cache decay and drowsy caches, two commonly used architectural-level leakage reduction approaches, on cache reliability. Our results indicate that these leakage optimization techniques change the reliability of cache memory. More importantly, we demonstrate that there is a tradeoff between optimizing for leakage power and improving immunity to soft errors. We also study the impact of error correcting codes on soft error rates. Based on this study, we propose an adaptive error correcting scheme to reduce the leakage energy consumption and improve reliability.

16.
Most microprocessors employ on-chip caches to bridge the performance gap between the processor and the main memory. However, cache accesses usually contribute significantly to the total power consumption of the chip. Based on the observation that an overwhelming majority of the values written to the cache are "0", in this paper we propose a zero-aware SRAM cell with an asymmetric inverter pair, called the ZA cell, to minimize the cache power consumption when writing "0". The ZA cell uses a circuit-level technique that is software independent and orthogonal to other low-power techniques at the architecture level. Compared to the conventional SRAM cell, experimental results based on SPEC2000 and MediaBench traces show that, without compromising performance or stability, the ZA cell can reduce the average cache write power consumption by over 60% for both the baseline instruction and data caches. In particular, the ZA cell is attractive in data caches, which exhibit a high write-zero rate.
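The observation the ZA cell exploits can be checked with a few lines of trace analysis: count the fraction of written bits that are zero. The trace below is a made-up stand-in for the SPEC2000/MediaBench write traces used in the paper.

```python
# Count what fraction of the bits written to a cache are zero (toy trace).
WORD_BITS = 32

def zero_bit_fraction(write_trace):
    total = len(write_trace) * WORD_BITS
    ones = sum(bin(w & 0xFFFFFFFF).count("1") for w in write_trace)
    return (total - ones) / total

# Typical program values: small integers, pointers with high-order zeros, flags.
trace = [0, 1, 4, 0, 255, 0x0000F000, 3, 0, 16, 0x00000040]
print("fraction of written bits that are zero: %.2f" % zero_bit_fraction(trace))
```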

17.
The effectiveness of previous SRAM leakage reduction techniques varies significantly as leakage variation worsens with process and temperature fluctuation. This paper proposes a simple circuit technique that adaptively trades off overhead energy for maximum leakage savings under severe leakage variations. The proposed run-time leakage reduction technique for on-die SRAM caches considers architectural access behavior to determine how often the SRAM blocks should enter a sleep mode. A self-decay circuit generates a periodic sleep pulse with an adaptive pulse period, which puts the SRAM array into sleep mode more frequently at high-leakage conditions (fast process, high temperature) and vice versa. A 0.18-μm 1.8-V 16-kbyte SRAM test chip shows a 94.2% reduction in SRAM cell leakage at a performance penalty of less than 2%. Measurement results also indicate that our proposed memory cell improves the SRAM static noise margin by 25%.
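A sketch of the adaptive behavior described above, with the sleep-pulse period simply taken as inversely proportional to an estimated leakage current (the real self-decay circuit derives this from leakage-sensitive circuit behavior); the corner multipliers and temperature model are assumptions.

```python
# Adaptive sleep-pulse period: leakier conditions -> shorter period -> more sleep.
import math

I_LEAK_NOM = 1.0          # normalized leakage at the nominal corner, 25 C
K_PERIOD = 200.0          # microseconds of sleep period at nominal leakage, assumed

def leakage(corner_factor, temp_c, t_ref=25.0, t_slope=35.0):
    """Crude leakage estimate: process-corner multiplier x exponential in T."""
    return I_LEAK_NOM * corner_factor * math.exp((temp_c - t_ref) / t_slope)

def sleep_period_us(corner_factor, temp_c):
    return K_PERIOD / leakage(corner_factor, temp_c)

for corner, name in [(0.6, "slow"), (1.0, "typical"), (1.8, "fast")]:
    for temp in (25, 85, 110):
        print("%-7s corner, %3d C -> sleep pulse every %6.1f us"
              % (name, temp, sleep_period_us(corner, temp)))
```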

18.
This paper discusses in detail the mechanisms behind static power consumption in current nanometer-scale low-power design, as well as the various circuit design theories for reducing leakage-current power and their characteristics, with the aim of providing a useful reference for researchers and designers in this area.

19.
We introduce a novel family of asymmetric dual-V_t static random access memory cell designs that reduce leakage power in caches while maintaining low access latency. Our designs exploit the strong bias toward zero at the bit level exhibited by the memory value stream of ordinary programs. Compared to conventional symmetric high-performance cells, our cells offer significant leakage reduction in the zero state and, in some cases, also in the one state, albeit to a lesser extent. A novel sense amplifier, in combination with dummy bitlines, allows for read times to be on par with conventional symmetric cells. With one cell design, leakage is reduced by 7× (in the zero state) with no performance degradation, but with a stability degradation of 6%. Another cell design reduces leakage by 2× (in the zero state) with no performance or stability loss. An alternative cell design reduces leakage by 58× (in the zero state) with a performance degradation of 1%, an area increase of 2.4%, and no stability degradation.

20.
A circuit technique is presented for reducing the subthreshold leakage energy consumption of domino logic circuits. Sleep switch transistors are proposed to place an idle dual threshold voltage domino logic circuit into a low leakage state. The circuit technique enhances the effectiveness of a dual threshold voltage CMOS technology to reduce the subthreshold leakage current by strongly turning off all of the high threshold voltage transistors. The sleep switch circuit technique significantly reduces the subthreshold leakage energy as compared to both standard low-threshold voltage and dual threshold voltage domino logic circuits. A domino adder enters and leaves a low leakage sleep mode within a single clock cycle. The energy overhead of the circuit technique is low, justifying the activation of the proposed sleep scheme by providing a net savings in total power consumption during short idle periods.
