Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
The voltage/frequency island (VFI) design paradigm is a practical architecture for energy-efficient network-on-chip (NoC) systems. In VFI-based NoC systems, each island can operate at a different voltage and clock frequency, so it is important to partition processing elements (PEs) into islands carefully, based on their workloads and communications. In this paper, we propose an energy-efficient design scheme that optimizes energy consumption and hardware costs in VFI-based NoC systems. Since on-chip networks take up a substantial portion of the system power budget in NoC-based systems, the proposed scheme uses communication-aware VFI partitioning and tile mapping/routing algorithms to minimize inter-VFI communication. Experimental results show that the proposed design technique can reduce communication energy consumption by 32–51% over existing techniques and total energy consumption by 3–14%.
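The partitioning objective in this abstract can be illustrated with a small sketch. The traffic table, the PE names, and the two candidate partitions are hypothetical, and summing inter-island traffic volume is a simplified stand-in for the paper's energy model, which the abstract does not specify.

```python
def inter_island_traffic(traffic, island_of):
    """Sum the traffic volume on flows whose endpoints sit in different islands."""
    return sum(
        volume
        for (src, dst), volume in traffic.items()
        if island_of[src] != island_of[dst]
    )

# Hypothetical traffic between four processing elements (arbitrary units).
traffic = {
    ("pe0", "pe1"): 80,
    ("pe1", "pe2"): 5,
    ("pe2", "pe3"): 60,
    ("pe0", "pe3"): 10,
}

# Two hypothetical ways of splitting the PEs into two islands.
comm_aware = {"pe0": 0, "pe1": 0, "pe2": 1, "pe3": 1}  # keeps heavy flows inside an island
comm_blind = {"pe0": 0, "pe1": 1, "pe2": 0, "pe3": 1}  # ignores the traffic pattern

aware_cost = inter_island_traffic(traffic, comm_aware)
blind_cost = inter_island_traffic(traffic, comm_blind)
print(aware_cost, blind_cost)
```

In this toy case the communication-aware split leaves only the two light flows crossing an island boundary, which is the kind of effect a communication-aware partitioner aims for.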

2.
A Globally Asynchronous, Locally Synchronous (GALS) system with dynamic voltage and frequency scaling can use the slowest frequency possible to accomplish a task with minimal power consumption. With the mechanism for implementing dynamic voltage scaling at each synchronous domain left up to the designer, our Globally Asynchronous, Locally Dynamic System (GALDS) provides a top-down, system-level means to maximize power reduction in an integrated circuit and facilitate system-on-a-chip (SoC) design. Our solution includes three distinct components: a novel bidirectional asynchronous FIFO to communicate between independently clocked synchronous blocks, an all-digital dynamic clock generator to switch between frequencies quickly and without glitches, and a digitally controlled oscillator to generate the global fixed-frequency clocks required by the all-digital dynamic clock generator. In addition to reducing power consumption when combined with dynamic voltage scaling, a GALDS design offers numerous other advantages, such as simplified clock distribution, high-performance operation, and faster time-to-market through the modular nature of the architecture.
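The reason the slowest sufficient frequency saves energy only when the voltage is lowered with it can be sketched with the standard CMOS switching-power estimate, P = a·C·V²·f. This is a textbook approximation rather than the GALDS authors' model; the capacitance, cycle count, and both operating points are hypothetical, and leakage is ignored.

```python
def dynamic_power(c_eff, vdd, freq_hz, activity=1.0):
    """Textbook CMOS switching-power estimate: P = activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq_hz

def task_energy(c_eff, vdd, freq_hz, cycles):
    """Energy for a fixed cycle count: power multiplied by run time (cycles / f)."""
    return dynamic_power(c_eff, vdd, freq_hz) * (cycles / freq_hz)

C_EFF = 1e-9        # hypothetical effective switched capacitance, in farads
CYCLES = 1_000_000  # hypothetical task length, in clock cycles

# Hypothetical operating points: halving f alone leaves energy per task
# unchanged; the saving comes from the lower supply voltage it permits.
fast = task_energy(C_EFF, vdd=1.2, freq_hz=200e6, cycles=CYCLES)
slow = task_energy(C_EFF, vdd=0.9, freq_hz=100e6, cycles=CYCLES)
same_v = task_energy(C_EFF, vdd=1.2, freq_hz=100e6, cycles=CYCLES)

print(f"fast={fast:.3e} J  slow={slow:.3e} J  slow/fast={slow / fast:.4f}")
```

Under these assumptions the energy ratio equals the square of the voltage ratio, (0.9 / 1.2)².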

3.
Enabled by the continuous advancement in fabrication technology, present-day synchronous microprocessors include more than 100 million transistors and have clock speeds well in excess of the 1-GHz mark. Distributing a low-skew clock signal in this frequency range to all areas of a large chip is a task of growing complexity. As a solution to this problem, designers have recently suggested the use of frequency islands that are locally clocked and externally communicate with each other using mixed-clock communication schemes. Such a design style fits nicely with the recently proposed concept of voltage islands that, in addition, can potentially enable fine-grain dynamic power management by simultaneous voltage and frequency scaling. This paper proposes a design exploration framework for application-adaptive multiple-clock processors, which provides the means for analyzing and identifying the right interdomain communication scheme and the proper granularity for the choice of voltage/frequency islands in the case of superscalar, out-of-order processors. In addition, the presented design exploration framework allows for comparative analysis of newly proposed or already published application-driven dynamic power management strategies. Such a design exploration framework and accompanying results can help designers and computer architects in choosing the right design strategy for achieving better power-performance tradeoffs in multiple-clock high-end processors.

4.
Power dissipation is becoming a prime design constraint in VLSI systems. The new key words for evaluating a design's performance are low power and high speed. This requires an overall system design review that considers suitable algorithms, architectures, circuits, and technology. In synchronous systems, the clocking network sets the frame that contains the whole design. It must be simple and robust. Power consumption in the clock distribution network has usually been a substantial part of the total system power consumption. New true single-phase latches and flip-flops are presented that are slope-insensitive, fast, and have data-dependent power consumption. Flip-flops are presented that work between DC and 1.7 GHz clock frequencies in a 1 μm CMOS technology. Methods are given that save power in the clock system by halving the clock rate for the same data throughput at the system level.

5.
A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating, and low-voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way to design a true RNG is presented, which contributes to both low power and secure data transactions. The digital processor is verified on an integrated FPGA platform and implemented with the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 μm process of Chartered Semiconductor.

6.
Chip multiprocessors with globally asynchronous locally synchronous (GALS) clocking styles are promising candidates for processing computationally intensive and energy-constrained workloads. The GALS methodology simplifies clock tree design, provides opportunities to use clock and voltage scaling jointly in system submodules to achieve high energy efficiencies, and can also result in easily scalable clocking systems. However, its use typically also introduces performance penalties due to additional communication latency between clock domains. We show that GALS chip multiprocessors (CMPs) with large inter-processor first-in, first-out (FIFO) buffers can inherently hide much of the GALS performance penalty while executing applications that have been mapped with few communication loops. In fact, the penalty can be driven to zero with sufficiently large FIFOs and the removal of multiple-loop communication links. We present an example mesh-connected GALS chip multiprocessor and show it has a less than 1% performance (throughput) reduction on average compared to the corresponding synchronous system for many DSP workloads. Furthermore, adaptive clock and voltage scaling for each processor provides approximately 40% power savings without any performance reduction. These results compare favorably with the GALS uniprocessor, which, compared to the corresponding synchronous uniprocessor, has a reported greater than 10% performance (throughput) reduction and an energy savings of approximately 25% using dynamic clock and voltage scaling for many general-purpose applications.

7.
The application-specific multiprocessor system-on-a-chip is a promising design alternative because of its high degree of flexibility, short development time, and potentially high performance attributed to application-specific optimizations. However, designing an optimal application-specific multiprocessor system is still challenging because there are a number of important metrics, such as throughput, latency, and resource usage, which need to be explored and optimized. This paper addresses the problem of synthesizing an application-specific multiprocessor system for stream-oriented embedded applications to minimize system latency under a throughput constraint. We employ a novel framework for this problem, similar to that of technology mapping in the logic synthesis domain, and develop a set of efficient algorithms, including labeling and clustering, for efficient generation of the multiprocessor architecture with application-specific optimized latency. Specifically, the result of our algorithm is latency-optimal for directed acyclic task graphs. Application of our approach to the Motion JPEG example on Xilinx's Virtex II Pro platform FPGA shows interesting design tradeoffs.

8.
Conventional interconnections for digital clock distribution pose a severe power consumption problem for GHz clock distribution due to transmission line losses. Therefore, we have proposed an RF clock distribution (RCD) scheme for high-speed digital applications, in particular a multiprocessor system using global clocking. This paper first reports system power and signal integrity analysis results, including skew, jitter, impedance mismatch, and noise for RF clock distribution, especially in the GHz range. Based on this analysis, a novel signal integrity design methodology for RF clock distribution systems is proposed. The clock skew created by process parameter variations is modeled and predicted. The system comprises an RF clock transmitter as a clock generator, an H-tree with junction couplers as a clock distribution network, and an RF receiver as a digital clock-recovery module. Flip-chip interconnections for the chip-to-substrate assembly and 0.35 μm TSMC CMOS technology for the RF clock receiver are assumed. EMI analysis for 2 GHz 16-node board-level RF clock distribution networks is conducted using 3D full-wave EM simulation. Finally, the RCD as a low-power and high-performance clocking method is demonstrated using HP's Advanced Design System (ADS) simulation, considering microwave frequency interconnection models and process parameter variations. In addition, test vehicles for both 2 GHz 16-node and 5 GHz 64-node board-level RF clock distribution networks were implemented and measured using a thin, low-loss, low-permittivity Rogers RO3003 high-frequency organic substrate.

9.
In this paper, we describe resource-efficient hardware architectures for software-defined radio (SDR) front-ends. These architectures are made efficient by using a polyphase channelizer that performs arbitrary sample rate changes, frequency selection, and bandwidth control. We discuss area, time, and power optimization for field programmable gate array (FPGA) based architectures in an M-path polyphase filter bank with a modified N-path polyphase filter. Such systems allow resampling by arbitrary ratios while simultaneously performing baseband aliasing from center frequencies at Nyquist zones that are not multiples of the output sample rate. A non-maximally decimated polyphase filter bank, where the number of data loads is not equal to the number of M subfilters, processes M subfilters in a time period that is either less than or greater than the M data-load time period. We present a load-process architecture (LPA) and a runtime architecture (RA) (based on a serial polyphase structure), which have different scheduling. In LPA, N subfilters are loaded, and then M subfilters are processed at a clock rate that is a multiple of the input data rate. This is necessary to meet the output time constraint of the down-sampled data. In RA, M subfilter processes are efficiently scheduled within N data-load time while simultaneously loading N subfilters. This requires reduced clock rates compared with LPA, and potentially less power is consumed. A polyphase filter bank that uses different resampling factors for maximally decimated, under-decimated, over-decimated, and combined up- and down-sampled scenarios is used as a case study, and an analysis of area, time, and power for their FPGA architectures is given. For resource-optimized SDR front-ends, RA is superior for reducing operating clock rates and dynamic power consumption. RA is also superior for reducing area resources, except when indices are pre-stored in LUTs.
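The polyphase idea behind this abstract can be sketched for the simplest case only: a maximally decimated, integer-ratio decimator. The sketch does not model the M-path/N-path arbitrary-ratio structure or the LPA and RA schedules described above; the filter taps and input samples are hypothetical, and integers are used so that the two results can be compared exactly.

```python
def fir_then_decimate(x, h, m):
    """Reference: full-rate FIR filter, then keep every m-th output sample."""
    full = [
        sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
        for n in range(len(x))
    ]
    return full[::m]

def polyphase_decimate(x, h, m):
    """Same outputs from m subfilters that each work at the low output rate."""
    out = []
    for n in range((len(x) + m - 1) // m):
        acc = 0
        for p in range(m):
            subfilter = h[p::m]                 # e_p[j] = h[j*m + p]
            for j, tap in enumerate(subfilter):
                idx = (n - j) * m - p           # branch p is fed x[n*m - p]
                if 0 <= idx < len(x):
                    acc += tap * x[idx]
        out.append(acc)
    return out

x = list(range(1, 13))   # hypothetical input samples
h = [1, 2, 3, 2, 1]      # hypothetical low-pass taps
print(polyphase_decimate(x, h, 3))
print(fir_then_decimate(x, h, 3))
```

The two functions compute the same samples; the polyphase form simply never evaluates the outputs that decimation would discard, which is why each subfilter can run at the lower rate.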

10.
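余乐, 陈岩, 李洋洋, 吴超, 王瑶, 苏童, 谢元禄. Acta Electronica Sinica (《电子学报》), 2017, 45(7): 1686-1694
Building on a parameterized model of the key structural dimensions of the FPGA clock network (Clock Distributed Network, CDN), this paper proposes a design and optimization method for full-custom FPGA CDNs. The parameterized model divides the structural dimensions into two classes, topology and circuit-and-interconnect, and gives design principles for the parameters in each class. In a standard 0.13 μm CMOS process, two sets of structural parameters, representing the designs before and after optimization, were created for three types of clock network (H-tree, fishbone, and hybrid), and their delay, skew, power, and area were compared. Experimental results show that the hybrid structure achieves the largest reductions in absolute delay and clock skew, 20.89% and 63.20% respectively; the fishbone structure reduces area by 50.14%; and the H-tree structure reduces absolute delay and power by 7.37% and 8.33% respectively. These results demonstrate the effectiveness of the proposed design optimization method.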

11.
On-FPGA communication is becoming more problematic as the performance of long interconnects deteriorates with technology scaling. In this paper, we address this issue by proposing a novel wave-pipelined signaling scheme to achieve substantial throughput improvement in FPGAs. A new analytical model capturing the electrical characteristics of FPGA interconnects is presented. Based on the model, the throughput and power consumption of a wave-pipelined link are derived analytically and compared to those of conventional synchronous links. Two circuit designs are proposed to realize a wave-pipelined link using FPGA fabrics. The proposed approaches are also compared with conventional synchronous and asynchronous pipelining techniques. It is shown that the wave-pipelined approach can achieve up to 5.7 times improvement in throughput and 13% improvement in power consumption versus conventional delay-based on-chip communication schemes. Trade-offs among power, throughput, and area for the proposed and conventional designs are also studied. The wave-pipelining approach provides a new alternative for on-FPGA communications and can potentially become a promising solution to mitigate the future interconnect scaling challenge.

12.
The conflicting demands for faster and larger designs are increasingly difficult to meet through advances in solid-state technology alone. At some point, it is expected that designers and manufacturers will have to give up the traditional synchronous design methodology for a Globally Asynchronous Locally Synchronous (GALS) one. Such changes imply more synchronization constraints, but also more flexibility. Consequently, this paper proposes a novel Field-Programmable Gate Array (FPGA) architecture that is compatible with existing devices and that can also support GALS designs. The main objective is simple: the proposed architecture must appear unchanged for synchronous design, but it must also include a minimal set of basic components that prevent metastability and enable efficient asynchronous communication. Thus, the paper presents the constraint equations required to implement such a circuit. It also presents a pausible clock generator application and simulation results for the proposed architecture. All results demonstrate that with a few additional customized circuits, a standard FPGA cell can become appropriate for GALS methodologies.

13.
As technology evolves into the deep submicron level, synchronous circuit designs based on a single global clock have incurred problems in such areas as timing closure and power consumption. An asynchronous circuit design methodology is one of the strong candidates to solve such problems. To verify the feasibility and efficiency of a large‐scale asynchronous circuit, we design a fully clockless 32‐bit processor. We model the processor using an asynchronous HDL and synthesize it using a tool specialized for asynchronous circuits with a top‐down design approach. In this paper, two microarchitectures, basic and enhanced, are explored. The results from a pre‐layout simulation utilizing 0.13‐μm CMOS technology show that the performance and power consumption of the enhanced microarchitecture are respectively improved by 109% and 30% with respect to the basic architecture. Furthermore, the measured power efficiency is about 238 μW/MHz and is comparable to that of a synchronous counterpart.

14.
The implementation of interconnect is becoming a significant challenge in modern integrated circuit (IC) design. Both synchronous and asynchronous strategies have been suggested to manage this problem. Creating a low skew clock tree for synchronous inter-block pipeline stages is a significant challenge. Asynchronous interconnect does not require a global clock, and therefore, it has a potential advantage in terms of design effort. This paper presents an asynchronous interconnect design that can be implemented using a standard application-specific IC flow. This design is considered across a range of IC interconnect scenarios. The results demonstrate that there is a region of the design space where the implementation provides an advantage over a synchronous interconnect by removing the need for clocked inter-block pipeline stages, while maintaining high throughput. Further results demonstrate a computer-aided design tool enhancement that would significantly increase this space. A detailed comparison of power, area, and latency of the two strategies is also provided for a range of IC scenarios.

15.
The design of many-core systems-on-chip (SoCs) has become increasingly challenging due to high levels of integration, excessive energy consumption, and clock distribution problems. To deal with these issues, we consider network-on-chip (NoC) architectures partitioned into several voltage-frequency islands (VFIs) and propose a design methodology for runtime energy management. The proposed approach minimizes the energy consumption subject to performance constraints. Then, we present efficient techniques for on-the-fly workload monitoring and management to ensure that the system can cope with variability in the workload and various technology-related parameters. Simulation results demonstrate the effectiveness of our approach in reducing the overall system energy consumption for a real video application. Finally, the results and functional correctness are validated using a field-programmable gate array (FPGA) prototype for an NoC with multiple VFIs.

16.
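To address the weaknesses of analog phase-locked loops (poor interference immunity, limited reliability, and high production cost), an FPGA-based all-digital phase-locked loop was designed in Verilog and simulated with the Quartus II software. The loop quickly extracts a bit-synchronization clock from the input digital signal, and it has been applied to clock synchronization extraction on a software-defined radio hardware platform built around an Altera Cyclone III series FPGA [1].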

17.
18.
In large-scale and high-speed digital systems, global synchronization has frequently been used to protect clocked I/O from data failure due to metastability. Synchronous design styles are widely used, easy to grasp and to implement, and also well supported by logic synthesis tools. There are many drawbacks with global synchronization. Most important is the relationship between physical size and maximum clock frequency, which will approach its limit as clock frequency and system size increase simultaneously. The purpose of the proposed Globally Updated Mesochronous (GUM) design style is to overcome those drawbacks by identifying all global signal links in the system and adding synchronization circuits to these. System-level simplicity, inherited from synchronous design and its tool support, is retained. In this paper, the GUM design style is described, analyzed, and demonstrated. Experimental results from a large-scale high-speed system using three 0.8-μm BiCMOS chips are given. The GUM design style is scalable and suitable for future system-on-chip applications both on and among chips.

19.
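FPGA hardware implementation of a dual-compute-core lifting wavelet transform   Total citations: 1, self-citations: 0, citations by others: 1
A biorthogonal wavelet transform is implemented using the lifting scheme. The derivation that uses factorization to decompose a conventional wavelet filter into elementary lifting steps is given. The wavelet transform module is implemented on an FPGA hardware platform using two compute cores. With a single clock, the system meets real-time processing requirements without increasing design complexity or power consumption. The system has been verified by simulation and operates stably and reliably.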
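The lifting steps mentioned in this abstract can be sketched in software. The abstract does not name the wavelet, so the integer 5/3 biorthogonal filter pair used here is an assumption, as are the function names and the sample data; the sketch also does not model the dual-core FPGA arrangement.

```python
def lift_53_forward(x):
    """One level of an integer 5/3 lifting transform on an even-length list."""
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict step: detail = odd sample minus the floor-average of its even
    # neighbours (the last neighbour is mirrored at the right boundary).
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update step: approximation = even sample plus a rounded quarter of the
    # two nearest details (the first detail is mirrored at the left boundary).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d

def lift_53_inverse(s, d):
    """Undo the lifting steps in reverse order with the signs flipped."""
    n = len(d)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

samples = [3, 7, 1, 8, 2, 9, 4, 6]   # hypothetical input
approx, detail = lift_53_forward(samples)
print(approx, detail)
print(lift_53_inverse(approx, detail) == samples)
```

Because each lifting step only adds a function of the other channel, the inverse recovers the input exactly even with integer rounding, and both steps need only additions and shifts, which is one reason lifting suits hardware implementations.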

20.
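Power consumption has long been one of the foremost concerns in network-on-chip design. The introduction of the voltage-frequency island (VFI) mechanism, based on globally asynchronous, locally synchronous (GALS) design, not only makes large reductions in on-chip power possible but also removes the bottleneck of distributing a single clock across the chip. This paper improves on two existing methods for VFI partitioning, core mapping, and route allocation, proposes a better integrated solution, and verifies it. Simulation results show that the proposed scheme significantly reduces system power consumption while also improving network-on-chip performance.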
