Similar literature (20 results)
1.
Bus-invert coding for low-power I/O (total citations: 1; self-citations: 0; citations by others: 1)
Technology trends, and especially portable applications, drive the quest for low-power VLSI design. Solutions that involve algorithmic, structural, or physical transformations are sought. The focus is on developing low-power circuits without unduly affecting performance (area, latency, period). For CMOS circuits, most power is dissipated as dynamic power for charging and discharging node capacitances. This is why many promising results in low-power design are obtained by minimizing the number of transitions inside the CMOS circuit. While it is generally accepted that, because of the large capacitances involved, much of the power dissipated by an IC is at the I/O, little has been done specifically to decrease I/O power dissipation. We propose the bus-invert method of coding the I/O, which lowers the bus activity and thus decreases the I/O peak power dissipation by 50% and the I/O average power dissipation by up to 25%. The method is general but applies best to buses. This is fortunate because buses are indeed most likely to have very large capacitances associated with them and consequently dissipate a lot of power.
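
The decision rule behind bus-invert coding can be illustrated with a minimal Python sketch. It assumes an 8-bit bus that starts at all zeros and a separate invert line, and it measures the Hamming distance on the data lines only; the function names are hypothetical, and the abstract itself does not specify these details.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Return a list of (bus_value, invert_bit) pairs.

    Assumption: the bus starts at 0 and the invert line is carried
    alongside the data lines as one extra wire.
    """
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        word &= mask
        if hamming(prev, word) > width // 2:
            # Sending the complement toggles fewer data lines.
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        prev = bus
    return encoded


def bus_invert_decode(pairs, width=8):
    """Undo the encoding: complement the bus value when invert is set."""
    mask = (1 << width) - 1
    return [(~bus) & mask if invert else bus for bus, invert in pairs]
```

With this rule, no transfer toggles more than half of the data lines, which is consistent with the halving of peak bus activity described above; the invert line itself can add one further transition per transfer.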

2.
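An improved high-level power estimation method (total citations: 1; self-citations: 0; citations by others: 1)
This paper discusses high-level power estimation methods for integrated circuits and, for the linear bit-correlation-coefficient model, proposes a more accurate power estimation method that can be used in power-driven VLSI high-level synthesis. Experimental results show that, when the signal is a stationary time series with a finite number of parameters, this method is more accurate than the original approach of substituting the signal's correlation coefficient for the correlation coefficients of strongly correlated bits, thereby improving the accuracy of module power estimation in VLSI high-level design.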

3.
With every new VLSI technology generation, interconnects increasingly limit the performance, area, and power dissipation of new processors. Consequently, it is necessary to devise efficient interconnect design techniques to reduce the impact of VLSI interconnects on overall system design. New optimizations of a wave-pipelined multiplexed (WPM) interconnect routing circuit are described in this paper. These WPM circuits can be used with current interconnect repeater circuits to further reduce interconnect delay, interconnect area, transistor area, and/or power dissipation. For example, new area-constrained WPM circuit optimizations illustrate that the interconnect circuit power can be reduced by 26% or the interconnect performance can be improved by 74%. Moreover, in both these cases, because a significant number of repeaters are eliminated, the transistor area can be reduced by 41% or 29%, respectively. Finally, the tolerance of WPM circuits to crosstalk noise, power supply noise, clock skew, and manufacturing variations is also presented. This study of tolerance levels defines the conditions under which the WPM circuit will function correctly, and it is shown in this paper for the first time that WPM circuits are robust enough to operate with the variability that can be encountered in deep submicrometer technologies.

4.
Network on chip (NoC) has been proposed as an appropriate solution for today's on-chip communication challenges. Power dissipation has become a key factor in NoCs because of their shrinking sizes. In this paper, we propose a new encoding approach aimed at power reduction by decreasing the switching activity on the buses. This approach assigns symbols to data words in such a way that the more frequent words are sent with lower power consumption. The algorithm dedicates the symbols with fewer ones to high-probability data and uses transition signaling to transmit data. The proposed method, unlike existing low-power encodings, does not rely on spatial redundancy and keeps the width of the bus constant. Experimental evaluations show that our approach reduces the power dissipation by up to 46%, with 2.70%, 0.51%, and 15.43% power, critical path, and area overhead in the NoCs, respectively.
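
The combination of frequency-ranked codewords and transition signaling can be sketched as follows. This is an illustrative Python sketch, not the paper's algorithm: the codebook construction, the training sample, the all-zero initial bus state, and all function names are assumptions made here. Under transition signaling each 1 in a codeword toggles one wire, so codewords with fewer ones cost fewer transitions.

```python
from collections import Counter


def build_codebook(sample, width):
    """Map frequent data words to codewords with few ones.

    `sample` is a hypothetical training trace of data words in
    range(2**width); words absent from the sample get the
    remaining (heavier) codewords.
    """
    codes = sorted(range(1 << width), key=lambda c: (bin(c).count("1"), c))
    ranked = [w for w, _ in Counter(sample).most_common()]
    seen = set(ranked)
    ranked += [w for w in range(1 << width) if w not in seen]
    return {word: codes[i] for i, word in enumerate(ranked)}


def transmit(words, codebook):
    """Transition signaling: XOR each codeword onto the current bus state."""
    bus, states = 0, []  # assumed all-zero initial bus state
    for word in words:
        bus ^= codebook[word]  # every 1 in the codeword toggles one wire
        states.append(bus)
    return states


def receive(states, codebook):
    """Recover data words from successive bus states."""
    inverse = {code: word for word, code in codebook.items()}
    prev, words = 0, []
    for state in states:
        words.append(inverse[state ^ prev])
        prev = state
    return words
```

Because the codebook is a one-to-one mapping over all words of the given width, the bus width stays constant, which matches the property the abstract emphasizes.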

5.
With the development of mobile communication and portable electronic devices, minimising the average power dissipation has become a primary concern for very large scale integrated (VLSI) circuit design. The authors map a circuit into a weighted acyclic graph (WAG), and propose a criterion for partitioning the WAG into two disjoint graphs with a minimum ratio-cut cost.

6.
Multiple-valued buses have been proposed as a way of overcoming the interconnection complexity of VLSI. In this paper we present efficient new encoder-decoder circuits for four-valued bus signalling in clocked CMOS VLSI systems. The important advantages of our designs are that they can be implemented by standard binary CMOS processes, and are considerably simpler than earlier designs. Furthermore, they have no static power dissipation. The circuits have been extensively simulated using SPICE and have been found to operate reliably.

7.
Design of a 20-Mb/s 256-state Viterbi decoder (total citations: 1; self-citations: 0; citations by others: 1)
The design of high-throughput large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data transfer oriented design methodology to implement a low-power 256-state rate-1/3 Viterbi decoder. Our architectural level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in early design stages. In comparison with other published Viterbi decoders, our approach reduces the global data transfers by up to 75% and decreases the number of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the register-transfer level (RTL) implementation, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation with provably no coding performance degradation. Designed using a 0.25 μm standard cell library, our decoder achieves a throughput of 20 Mb/s in simulation and dissipates only 0.45 W.

8.
Reducing the power dissipated by buses has become one of the most important elements in low-power VLSI design. A new coding scheme called sequence-switch coding (SSC) is proposed in this paper. It is a general-purpose coding scheme that exploits the sequence of data to reduce the number of transitions on buses. A simple switching algorithm is presented to show the feasibility of SSC. According to simulations, this algorithm reduces bus transitions by around 10% in the transmission of benchmark files. SSC can be used for burst data transfer in any application. In particular, it is suitable for internet and multimedia applications that have a stream-type data transfer pattern.

9.
In modern-day VLSI systems, performance and manufacturing costs are being driven by on-chip wiring needs due to the continuous increase in the number of transistors. This paper proposes a low-overhead wave-pipelined multiplexed (WPM) routing technique that harnesses the inherent intraclock-period interconnect idleness to implement wire sharing throughout the various hierarchical levels of design. It is illustrated in this paper that the WPM network can be readily incorporated into future gigascale integration (GSI) systems to reduce the number of interconnect routing channels in an attempt to contain escalating manufacturing costs. Both a system-level analysis and a circuit-level verification of this WPM routing are presented in this paper. A multilevel interconnect network design simulator (MINDS) that uses system level interconnect prediction (SLIP) techniques and HSPICE circuit simulations for optimizing the interconnect dimensions has been used to assess the opportunities for application of WPM wire circuits in high-performance digital designs. A custom routing example highlights the ease with which the WPM routing technique can be incorporated into existing VLSI systems. In addition, for a 40 million transistor system case study, this system-level analysis reveals that the use of a WPM network could result in an almost 20% decrease in the number of metal layers for less than a 4% increase in dynamic power with no loss of communication throughput performance. The key virtues of WPM routing are its flexibility, robustness, implementation simplicity, and low overhead requirements.

10.
An efficient state-sequential very large scale integration (VLSI) architecture and low-power design methodologies ranging from the system-level to the layout-level are presented for a large-constraint-length Viterbi decoder for code division multiple access (CDMA) digital cellular/personal communication services (PCS) applications. The low-power design approaches are also applicable to many other systems and algorithms. VLSI implementation issues and prototype fabrication results for a state-sequential Viterbi decoder for convolutional codes of rate 1/2 and constraint-length 9 are also described. The chip's core, consisting of approximately 65 k transistors, occupies 1.9 mm by 3.4 mm in a 0.8-μm triple-layer-metal n-well CMOS technology. The chip's measured total power dissipation is 0.24 mW at a 14.4 kb/s data-rate with 0.9216 MHz clocking at a supply voltage of 1.65 V. The Viterbi decoder presented here is the lowest power and smallest area core in its class, to the best of our knowledge.

11.
We consider the problem of reducing computation cost by introducing redundancy in the number of ports as well as in the input and output sequences of computation modules. In our formulation, the classical "communication scenario" is the case in which a computation module has to recompute the input sequence at a different location or time with high fidelity and low bit-error rates. We then consider communication with a computational cost objective different from that given by the bit-error rate. An example is communication over deep submicrometer very-large-scale integration (VLSI) buses, where the expected energy consumption per communicated information bit is the cost of computation. We treat this scenario using tools from information theory and establish fundamental bounds on the achievable expected energy consumption per bit in deep submicrometer VLSI buses as a function of their utilization. Some of our results also shed light on coding schemes that achieve these bounds. We then prove that the best tradeoff between the expected energy consumption per bit and bus utilization can be achieved using codes constructed from typical sequences of Markov stationary ergodic processes. We use this observation to give a closed-form expression for the best tradeoff between the expected energy consumption per bit and the utilization of the bus. This expression, in principle, can be computed using standard numerical methods. The methodology developed here naturally extends to more general computation scenarios.

12.
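Electromagnetic compatibility design of vehicle power buses (total citations: 1; self-citations: 1; citations by others: 0)
In vehicle-mounted communication systems, buses come in many kinds and types. Every device in the vehicle needs a power supply in order to operate, so the power bus plays a very important role in the vehicle communication system; in practical design, however, the electromagnetic compatibility (EMC) design of the power bus is easily overlooked by designers. For these reasons, this paper briefly analyzes vehicle buses from the perspective of the signal transmission characteristics of the power lines and, based on that analysis, tests the conducted emissions of the power bus, concluding that using twisted-pair cable as the power bus benefits signal transmission. This conclusion provides important guidance for engineering practice.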

13.
Advances in VLSI technology have enabled the implementation of complex digital circuits in a single chip, reducing system size and power consumption. In deep submicron low-power CMOS VLSI design, the main cause of energy dissipation is the charging and discharging of internal node capacitances due to transition activity. Transition activity is also one of the major factors that affect dynamic power dissipation. This paper proposes power reduction analyzed at the algorithm and logic circuit levels. At the algorithm level, the key to reducing power dissipation is minimizing transition activity, which is achieved by introducing a data coding technique. A novel multi-coding technique is therefore introduced to improve the efficiency of transition activity by up to 52.3% on the bus lines, which in turn reduces the dynamic power dissipation. In addition, 1-bit full adders are introduced in the Hamming distance estimator block, which reduces the device count. This coding method is implemented using Verilog HDL. The overall performance is analyzed using Modelsim and Xilinx tools. In total, a 38.2% power saving is achieved compared to other existing methods.
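
The link between transition activity and dynamic power described above is commonly expressed with the standard first-order CMOS switching-power model. This is a textbook relation rather than a formula taken from the abstract, and the symbols are introduced here for illustration only:

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Here $\alpha$ is the probability of a power-consuming ($0 \to 1$) transition per clock cycle, $C_L$ is the switched load capacitance, $V_{DD}$ is the supply voltage, and $f$ is the clock frequency. Under this model, with $C_L$, $V_{DD}$, and $f$ held fixed, dynamic power scales linearly with $\alpha$, which is why bus coding schemes target the transition count.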

14.
The Power Factor Approximation (PFA) power estimation method is reviewed and applied to VLSI array processing systems. The power dissipation of 1-, 2-, and 3-dimensional algorithms implemented on linear, hexagonal, and cubic processor arrays is investigated. Closed-form equations are developed which show how the overall power dissipation is influenced by an algorithm's size and dimensionality, the target array processor's size and dimensionality, and the adopted partitioning strategy. The power estimation methods developed in this paper can be applied in the early phases of VLSI algorithm/architecture design, selection, and partitioning. The power dissipation of a matrix-matrix multiplication operation is estimated as an example application. This work was supported in part by the Hughes Aircraft Company fellowship program and the NSF initiation grant MIP-99-10437.

15.
Device scaling is an important part of very large scale integration (VLSI) design and has driven the success of the VLSI industry by enabling denser and faster integration of devices. As the technology node moves into the very deep submicron region, leakage current and circuit reliability become the key issues. Both worsen with each new technology generation and affect the performance of the overall logic circuit. VLSI designers must balance power dissipation against circuit performance as devices are scaled. In this paper, different scaling methods are studied first, and their effects on the power dissipation and propagation delay of a CMOS buffer circuit are identified. To mitigate the power dissipation in scaled devices, we propose a reliable leakage-reduction low power transmission gate (LPTG) approach and test it on a complementary metal oxide semiconductor (CMOS) buffer circuit. All simulation results are obtained with the HSPICE tool using Berkeley predictive technology model (BPTM) BSIM4 bulk CMOS files. The LPTG CMOS buffer reduces power dissipation by 95.16% with an 84.20% improvement in figure of merit at the 32 nm technology node. Various process, voltage, and temperature variations are analyzed to demonstrate the robustness of the proposed approach. Leakage current uncertainty decreases from 0.91 to 0.43 in the CMOS buffer circuit, which results in high circuit reliability.

16.
The use of deep-submicrometer (DSM) technology increases the capacitive coupling between adjacent wires, leading to severe crosstalk noise, which causes power dissipation and may also lead to malfunction of a chip. In this paper, we present a technique that reduces crosstalk noise on instruction buses. Previous research focuses primarily on address buses, and little of that work can be applied efficiently to instruction buses. This is due to the complex transition behavior of instruction streams. Based on instruction sequence profiling, we exploit an architecture that encodes pairs of bus wires and permutes them in order to optimize power and noise. A close-to-optimal architecture configuration is obtained using a genetic algorithm. Unlike previous bus encoding approaches, crosstalk reduction can be balanced with delay and area overhead. Moreover, if delay (or area) is most critical, our architecture can be tailored to add nearly no overhead to the design. For our experiments, we used instruction bus traces obtained from 12 SPEC2000 benchmark programs. The results show that our approach can reduce crosstalk by up to 50.79% and power consumption by up to 55% on instruction buses.

17.
18.
This paper describes an area and power-efficient VLSI approach for implementing the discrete wavelet transform on streaming multielectrode neurophysiological data in real time. The VLSI implementation is based on the lifting scheme for wavelet computation using the symmlet4 basis with quantized coefficients and integer fixed-point data precision to minimize hardware demands. The proposed design is driven by the need to compress neural signals recorded with high-density microelectrode arrays implanted in the cortex prior to data telemetry. Our results indicate that signal integrity is not compromised by quantization down to 5-bit filter coefficient and 10-bit data precision at intermediate stages. Furthermore, results from analog simulation and modeling show that a hardware-minimized computational core executing filter steps sequentially is advantageous over the pipeline approach commonly used in DWT implementations. The design is compared to that of a B-spline approach that minimizes the number of multipliers at the expense of increasing the number of adders. The performance demonstrates that in vivo real-time DWT computation is feasible prior to data telemetry, permitting large savings in bandwidth requirements and communication costs given the severe limitations on size, energy consumption and power dissipation of an implantable device.
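
The structure of a lifting step (split, predict, update) with integer arithmetic can be illustrated with a minimal Python sketch. Note that it uses the integer Haar wavelet (the S-transform), not the symmlet4 basis of the paper, whose quantized coefficients are not given in the abstract; the function names are hypothetical.

```python
def lifting_forward(x):
    """One level of the integer Haar (S) transform via lifting.

    Illustrative stand-in for a lifting-based DWT stage; the paper's
    symmlet4 filter would use more predict/update steps.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]                            # split
    detail = [o - e for e, o in zip(even, odd)]             # predict
    approx = [e + (d >> 1) for e, d in zip(even, detail)]   # update
    return approx, detail


def lifting_inverse(approx, detail):
    """Exactly invert lifting_forward by undoing the steps in reverse."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]   # undo update
    odd = [d + e for e, d in zip(even, detail)]             # undo predict
    x = [0] * (2 * len(even))
    x[0::2], x[1::2] = even, odd                            # merge
    return x
```

Because each step only adds or subtracts an integer function of the other channel, the transform is exactly invertible in integer arithmetic, which is one reason lifting suits fixed-point hardware.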

19.
With the advent of portable and high-density microelectronic devices, the power dissipation of very large scale integrated (VLSI) circuits is becoming a critical concern. Accurate and efficient power estimation during the design phase is required in order to meet the power specifications without a costly redesign process. In this paper, we present a review of the power estimation techniques that have recently been proposed.

20.
Rotary clock is a resonant clocking technique that delivers on-chip clock signal distribution with very low power dissipation. Since it can only generate clock signals with multiple phases that are spatially distributed, rotary clock is often considered not applicable to industrial very large scale integration (VLSI) designs. This paper presents the first rotary-clock-based nontrivial digital circuit. Our design, a low-power and high-speed finite-impulse response (FIR) filter, is fully digital and generated using CMOS standard cells in 0.18 μm technology. We show that the proposed FIR filter is seamlessly integrated with the rotary clock technique. It uses the spatially distributed multiple clock phases of rotary clock and achieves high power savings. Simulation results demonstrate that our rotary-clock-based FIR filter can operate successfully at 610 MHz, providing a throughput of 39 Gb/s. In comparison with the conventional clock-tree-based design, our design achieves a 34.6% clocking power saving and a 12.8% overall circuit power saving. In addition, the peak current consumed by the rotary-clock-based filter is lower by 40% on average. Our study takes a crucial step toward the application of the rotary clock technique to a broad range of VLSI designs.
