1.

Tian Yi (田祎), Yan Jun (颜军) 《电子设计工程》 (Electronic Design Engineering) 2012,20(12):13-15,20

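The floating-point adder is the core arithmetic component of a floating-point unit and the basis for implementing the operations of floating-point instructions, so optimizing its design is key to improving the speed and accuracy of floating-point computation. This article presents a design method from the perspectives of the floating-point adder algorithm and its circuit implementation, with design and verification carried out in VHDL in Quartus II. The adder uses a state machine to control its operation, which effectively reduces power consumption, increases speed, and improves performance.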

2.

A fully pipelined single-precision floating-point unit in the synergistic processor element of a CELL processor

Hwa-Joon Oh, Mueller S.M., Jacobi C., Tran K.D., Cottier S.R., Michael B.W., Nishikawa H., Totsuka Y., Namatame T., Yano N., Machida T., Dhong S.H. 《Solid-State Circuits, IEEE Journal of》 2006,41(4):759-771

The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated SOI in 90-nm technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56 °C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.

3.

An improved parallel prefix adder based on the Kogge-Stone structure

Zhao Cuihua (赵翠华), Lou Mian (娄冕), Zhang Xunying (张洵颖), Shen Xubang (沈绪榜) 《微电子学与计算机》 (Microelectronics & Computer) 2011,28(2):47-50

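Based on the Kogge-Stone structure of the parallel prefix algorithm, an improved parallel prefix adder is proposed by modifying the logic circuits at each level of the structure. Compared with the conventional circuit, the adder reduces area, power consumption, and delay, and its advantage grows as the bit width increases, making it well suited to wide operands.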
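
For orientation, the textbook Kogge-Stone prefix network that this entry starts from can be modeled at the bit level as below. This is only a sketch of the conventional structure, not the improved circuit of the paper; the function name `kogge_stone_add` and the default width are assumptions made for illustration.

```python
def kogge_stone_add(a: int, b: int, width: int = 16) -> int:
    """Bit-level model of a conventional Kogge-Stone adder (unsigned, modulo 2**width)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b            # per-bit generate
    p = a ^ b            # per-bit propagate
    half_sum = p         # kept for the final sum stage
    shift = 1
    # log2(width) prefix levels: each bit combines with the group `shift` positions below it
    while shift < width:
        g = (g | (p & (g << shift))) & mask
        p = (p & (p << shift)) & mask
        shift <<= 1
    carries = (g << 1) & mask    # carry into bit i is the group generate of bits i-1..0
    return (half_sum ^ carries) & mask
```

Each loop iteration corresponds to one level of the prefix tree, so a 64-bit adder needs six levels regardless of operand values.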

4.

Field programmable gate arrays and floating point arithmetic

Fagin B., Renard C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》 1994,2(3):365-367

We present empirical results describing the implementation of an IEEE Standard 754 compliant floating-point adder/multiplier using field programmable gate arrays. The use of FPGA's permits fast and accurate quantitative evaluation of a variety of circuit design tradeoffs for addition and multiplication. FPGA's also permit accurate assessments of the area and time costs associated with various features of the IEEE floating-point standard, including rounding and gradual underflow. These costs are analyzed, along with the effects of architectural correlation, a phenomenon that occurs when the cost of combining architectural features exceeds the sum of separate implementation. We conclude with an assessment of the strengths and weaknesses of using FPGA's for floating-point arithmetic.

5.

Design of a 4 GHz 64-bit dynamic adder compatible with static circuits

Wang Zhiyuan (王志远), Gao Zhuo (高茁) 《微电子学与计算机》 (Microelectronics & Computer) 2008,25(3):159-162

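A 64-bit dynamic adder compatible with static circuits is designed. Dynamic flip-flops with embedded logic and a multi-phase clocking technique provide the interfaces to the preceding and following static circuit stages. Inside the adder, a sparse carry-lookahead strategy balances logic path lengths to reduce internal loading and improve performance. In an STMicro 90 nm CMOS process, the adder operates at a 4 GHz clock with a power consumption of 45.9 mW.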

6.

FPGA-based implementation of non-uniformity correction for linear array detectors

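Gao Wenqing (高文清), Xu Shiwei (徐世伟), Liu Yanyan (刘严严), Ding Yanyan (丁艳艳) 《光电技术应用》 (Electro-Optic Technology Application) 2012,27(3):77-81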

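Making full use of FPGA hardware resources, a two-point correction implemented in FPGA logic is proposed. Using the FPGA's floating-point adder, floating-point divider, and floating-point multiplier together with internal RAM and ROM, the correction coefficients are computed in real time and then applied to correct the non-uniformity of a linear array infrared detector, which ensures correction accuracy. In addition, the FPGA's strong parallel processing capability is exploited so that the coefficients and image data are read within one clock cycle.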
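
The arithmetic behind a two-point correction can be sketched as follows. This is the commonly used formulation (a per-pixel gain and offset derived from two uniform reference exposures), assumed here because the abstract does not give its exact equations; the function names and the sample values are hypothetical.

```python
def two_point_coefficients(low, high):
    """Per-pixel gain/offset from responses to a low and a high uniform reference.

    `low` and `high` are lists of raw pixel responses (assumed calibration data).
    """
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    gains, offsets = [], []
    for lo, hi in zip(low, high):
        gain = (mean_high - mean_low) / (hi - lo)   # one subtract pair and one divide
        gains.append(gain)
        offsets.append(mean_low - gain * lo)        # one multiply and one subtract
    return gains, offsets


def correct_line(raw, gains, offsets):
    """Apply the correction to one line of pixels: one multiply and one add each."""
    return [g * x + o for x, g, o in zip(raw, gains, offsets)]


# Hypothetical three-pixel detector with mismatched responsivity
gains, offsets = two_point_coefficients([10.0, 12.0, 8.0], [110.0, 132.0, 88.0])
flat = correct_line([110.0, 132.0, 88.0], gains, offsets)
```

The per-pixel work reduces to additions, multiplications, and one division during calibration, which matches the floating-point units the abstract lists.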

7.

FPGA implementation of a high-speed floating-point arithmetic unit

Zhang Xiaoyan (张小妍), Shao Jie (邵杰) 《电子工程师》 (Electronic Engineer) 2009,35(11):24-27

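Pipelining is used to optimize the design of single-precision floating-point multiplication and addition units. The floating-point adder adopts an improved dual-path structure, with the shifter and the leading-one detection unit being the focus of optimization. The floating-point multiplier applies Booth encoding to the multiplicand and then builds a Wallace tree from improved 4-2 compressors, which simplifies the logic while raising system throughput. Simulation shows that on a Virtex-4 series FPGA (field programmable gate array), the floating-point adder reaches a maximum operating rate of 405 MHz and the floating-point multiplier 429 MHz.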

8.

Systematic IEEE rounding method for high-speed floating-point multipliers (cited by: 1 total; 0 self-citations; 1 by others)

Quach N.T., Takagi N., Flynn M.J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》 2004,12(5):511-521

For performance reasons, many high-speed floating-point multipliers today precompute multiple significand values (SVs) in advance. The final normalization and rounding steps are then performed by selecting the appropriate SV. While having speed advantages, this integrated rounding method complicates the development of the rounding logic significantly, hence, requiring a systematic rounding method. The systematic rounding method, presented in this paper, has three steps: 1) constructing a rounding table; 2) developing a prediction scheme; and 3) performing rounding digits selection (RDS). The rounding table lists all possible SVs that need to be precomputed. Prediction reduces the number of these SVs for efficient hardware implementation while RDS reduces the complexity of the rounding logic. Both prediction and RDS depend on the specifics of the hardware implementation. Two hardware implementations are described. The first one is modeled after that reported by Santoro et al. and the second improved one supports all IEEE rounding modes. Besides allowing systematic hardware optimization, this rounding method has the added advantage that verification and generalization are straightforward.

9.

Leading-zero anticipatory logic for high-speed floating point addition

Suzuki H., Morinaka H., Makino H., Nakase Y., Mashiko K., Sumi T. 《Solid-State Circuits, IEEE Journal of》 1996,31(8):1157-1164

This paper describes a new leading-zero anticipatory (LZA) logic for high-speed floating-point addition (FADD). This logic carries out the pre-decoding for normalization concurrently with addition for the significand. It also performs the shift operation of normalization in parallel with the rounding operation. The use of simple Boolean algebra allows the proposed logic to be constructed from a simple CMOS circuit. Its area penalty is as small as 30% of the conventional LZA method. The FADD core using the proposed logic was fabricated by 0.5 μm CMOS technology with triple metal interconnections and runs at 164 MHz under the condition of V_DD=3.3 V.

10.

A Low Power Approach to Floating Point Adder Design for DSP Applications

R.V.K. Pillai, D. Al-Khalili, A.J. Al-Khalili, S.Y.A. Shah 《The Journal of VLSI Signal Processing》 2001,27(3):195-213

The demand for high performance, low power floating point adder cores has been on the rise during the recent years particularly for DSP applications. In this paper, we present a new architecture for a low power, IEEE compatible, floating point adder, that is fast and has low latency. The functional partitioning of the adder into three distinct, clock gated data paths allows activity reduction. The switching activity function of the proposed adder is represented as a three state FSM. During any given operation cycle, only one of the data paths is active, during which time, the logic assertion status of the circuit nodes of the other data paths are held at their previous states. Critical path delay and latency are reduced by incorporating speculative rounding and pseudo leading zero anticipatory logic as well as data path simplifications. In contrast to conventional high speed floating point adders that use leading zero anticipatory logic, the proposed scheme offers a worst case power reduction of 50%.

11.

A novel IEEE rounding algorithm for high-speed floating-point multipliers

Mustafa 《Integration, the VLSI Journal》2007,40(4):549-560

Modern floating-point multipliers perform rounding in compliance with the IEEE 754 standard. Since rounding is on the critical path, high-speed rounding algorithms are used to increase the performance for floating-point multiplication. To achieve high performance with minimum increase in hardware, existing rounding algorithms generate two consecutive values in parallel, and compute the rounded product using these values. This paper presents a novel IEEE rounding algorithm which generates two nonconsecutive values in parallel to compute the rounded product. Synthesis results for double precision operands show that the proposed algorithm has approximately 24–41% less delay than previous high-speed rounding algorithms presented elsewhere. The verification of the new algorithm is also presented in a simple and straightforward manner.
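
The baseline scheme mentioned in this abstract, in which two consecutive candidates are produced and one is selected, can be illustrated for round-to-nearest-even as below. This is a simplified sketch that ignores post-normalization and the other IEEE rounding modes, and it does not model the paper's nonconsecutive-value algorithm; the function name and argument layout are assumptions.

```python
def round_nearest_even(product: int, drop: int) -> int:
    """Round an integer significand product after discarding `drop` low bits (drop >= 1)."""
    truncated = product >> drop          # candidate 1: truncated value
    incremented = truncated + 1          # candidate 2: next consecutive value
    guard = (product >> (drop - 1)) & 1  # most significant discarded bit
    sticky = (product & ((1 << (drop - 1)) - 1)) != 0  # OR of the remaining discarded bits
    lsb = truncated & 1
    round_up = bool(guard and (sticky or lsb))
    # In hardware both candidates exist before round_up is known; here we just select.
    return incremented if round_up else truncated
```

Because both candidates are available before the guard and sticky bits are resolved, the selection reduces to a multiplexer at the end of the critical path.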

12.

A sparse-tree adder and its structural design (cited by: 1 total; 0 self-citations; 1 by others)

Wang Qian (王骞), Ding Tiefu (丁铁夫) 《电子器件》 (Chinese Journal of Electron Devices) 2005,28(2):312-314

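A sparse-tree adder is proposed. Based on the parallel prefix adder, it trades area and delay in the pre-processing and post-processing stages for area and delay in the parallel prefix carry stage. The approach can be applied to most parallel prefix adders and, for long operands, saves area while reducing critical path delay. Taking the Sklansky, Brent-Kung, Kogge-Stone, and Han-Carlson parallel prefix adders as examples, their area and delay are analyzed theoretically. Finally, a Sklansky adder is implemented in a hardware description language.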

13.

Reconfigurable design and implementation of modulo 2^n±1 multiplication

Li Jun (李军), Lü Yongqi (吕永其) 《信息安全与通信保密》 (Information Security and Communications Privacy) 2008,(2):97-99

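This paper proposes an algorithm, together with its VLSI architecture, that supports both modulo 2^n+1 and modulo 2^n-1 multiplication. By modifying conventional parallel prefix adders and multipliers and adding a small amount of logic on top of ordinary addition and multiplication, modulo 2^n±1 multiplication is realized (n = 8, 16, 32). Compared with similar designs, this design achieves a high degree of reuse of conventional adder and multiplier resources and offers relatively high performance.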
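
The reason ordinary adders and multipliers can be reused is that reduction modulo 2^n±1 only needs the two halves of the plain product. The sketch below shows the standard identities (2^n ≡ 1 mod 2^n-1 and 2^n ≡ -1 mod 2^n+1); it is not the paper's architecture, and the function names are hypothetical.

```python
def mul_mod_2n_minus_1(a: int, b: int, n: int) -> int:
    """(a * b) mod (2**n - 1), folding the high bits back with an end-around add."""
    m = (1 << n) - 1
    p = a * b                       # ordinary multiplier
    while p > m:
        p = (p & m) + (p >> n)      # 2**n is congruent to 1, so high part is added to low part
    return 0 if p == m else p       # the all-ones pattern also represents zero


def mul_mod_2n_plus_1(a: int, b: int, n: int) -> int:
    """(a * b) mod (2**n + 1), subtracting the high part from the low part."""
    m = (1 << n) + 1
    p = a * b                       # ordinary multiplier
    low = p & ((1 << n) - 1)
    high = p >> n
    return (low - high) % m         # 2**n is congruent to -1
```

For example, with n = 8 the moduli are 255 and 257, and both functions agree with `(a * b) % m` for operands below the modulus.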

14.

Low-power design techniques for high-performance CMOS adders

Uming Ko, Balsara T., Wai Lee 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》 1995,3(2):327-333

A high-performance adder is one of the most critical components of a processor which determines its throughput, as it is used in the ALU, the floating-point unit, and for address generation in case of cache or memory access. In this paper, low-power design techniques for various digital circuit families are studied for implementing high-performance adders, with the objective to optimize performance per watt or energy efficiency as well as silicon area efficiency. While the investigation is done using 100 MHz, 32 b carry lookahead (CLA) adders in a 0.6 μm CMOS technology, most techniques presented here can also be applied to other parallel adder algorithms such as carry-select adders (CSA) and other energy efficient CMOS circuits. Among the techniques presented here, the double pass-transistor logic (DPL) is found to be the most energy efficient while the single-rail domino and complementary pass-transistor logic (CPL) result in the best performance and the most area efficient adders, respectively. The impact of transistor threshold voltage scaling on energy efficiency is also examined when the supply voltage is scaled from 3.5 V down to 1.0 V.

15.

High performance asynchronous design using single-track full-buffer standard cells

Ferretti M., Beerel P.A. 《Solid-State Circuits, IEEE Journal of》 2006,41(6):1444-1454

This paper presents a high-performance asynchronous template, single-track full-buffer (STFB), which achieves close to full-custom performance using a standard cell design flow and industry standard CAD tools to perform schematic capture, simulation, cell layout, and automatic placement and routing. This template and flow is demonstrated and evaluated with the implementation of a 64-bit asynchronous prefix adder, and its test circuitry, using the TSMC 0.25-μm process. The 64-bit asynchronous prefix adder layout requires 0.96 mm² and the entire 260-k transistor test chip reaches a measured throughput of 1.45 GHz. The design demonstrates that the STFB template can yield three times higher throughput with approximately half of the area of comparable quasi-delay-insensitive (QDI) templates, requires less timing assumptions than ultra-high-speed GasP bundled-data circuits, and can be designed with an automated place and route flow.

16.

Designing High-Speed Adders in Power-Constrained Environments

《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(2):172-176

Data-driven dynamic logic (D3L) is very efficient when low-power constraints are mandatory. Unfortunately, this advantage is typically obtained at the expense of speed performances. This paper presents a novel technique to realize D3L parallel prefix tree adders without significantly compromising speed performance. When applied to a 64-bit Kogge–Stone adder realized with 90-nm complementary metal–oxide–semiconductor (CMOS) technology, the proposed technique leads to an energy-delay product that is 29% and 21% lower than its standard domino logic and conventional D3L counterparts, respectively. It also shows a worst case delay that is 10% lower than that of the D3L approach and only 5% higher than that of the conventional domino logic.

17.

A high-speed four-bit full adder with a resistor coupled Josephson logic

《Electron Device Letters, IEEE》1983,4(12):428-429

A four-bit full adder circuit implemented in resistor coupled Josephson logic (RCJL) has been designed and successfully tested with 173-ps critical path delay. The full adder circuit uses dual rail logic with emphasis on high-speed operation. An experimental four-bit adder circuit was fabricated using lead-alloy Josephson IC technology with a 5-µm minimum feature size and a 7-µm minimum junction diameter. The circuit consists of 80 devices with 264 junctions. The minimum critical path delay for the ripple carry adder was measured to be 173 ps/4 bits. This result demonstrates the RCJL potential for high-speed digital applications.

18.

Design of the LOD circuit for the floating-point adder in a DSP chip

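Che Deliang (车德亮), Huang Shitan (黄士坦), Liu Junhua (刘军华), Tang Wei (唐威), Duan Laicang (段来仓) 《微电子学与计算机》 (Microelectronics & Computer) 2003,20(4):60-62,65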

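The speed of the floating-point adder limits the operating speed of the whole DSP chip, and the speed of the LOD circuit is in turn the bottleneck of the floating-point adder. The performance of the whole DSP chip can therefore be improved by improving the LOD circuit. We design the LOD from two aspects, its structure and its logic, and implement a fast, efficient LOD circuit. It targets the TMS320C3X extended-precision floating-point data format.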
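
Assuming LOD here means leading-one detector, its job is to find the most significant set bit of the adder result so the normalization shift can be derived. A hierarchical search of the kind often used to shorten this path can be modeled as follows; this is a generic sketch, not the circuit of the paper or the TMS320C3X format, and the function names are hypothetical.

```python
def leading_one_position(x: int, width: int = 32) -> int:
    """Index of the most significant 1 in a `width`-bit word, or -1 if x is zero.

    `width` is assumed to be a power of two; each loop step halves the search window,
    mirroring one level of a tree-structured detector.
    """
    x &= (1 << width) - 1
    if x == 0:
        return -1
    pos = 0
    half = width // 2
    while half >= 1:
        upper = x >> half
        if upper != 0:          # leading one is in the upper half
            pos += half
            x = upper
        else:                   # leading one is in the lower half
            x &= (1 << half) - 1
        half //= 2
    return pos


def normalization_shift(x: int, width: int = 32) -> int:
    """Left shift needed to move the leading one of a non-zero word to the top bit."""
    return width - 1 - leading_one_position(x, width)
```

A 32-bit word is resolved in five halving steps, which is why the structure of the detector matters for the adder's critical path.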

19.

Enhancing Accuracy and Dynamic Range of Scientific Data Analytics by Implementing Posit Arithmetic on FPGA

Hou Junjie, Zhu Yongxin, Du Sen, Song Shijin 《Journal of Signal Processing Systems》 2019,91(10):1137-1148

The high performance, power efficiency and reconfigurable characteristic of FPGA attract more and more attention in big data processing. In scientific data analytics, besides computing performance, accuracy of the results and dynamic range of data representation are critical features that must be considered. At present, the floating-point IP cores in FPGA design use the IEEE standard for floating-point arithmetic, IEEE 754. For FPGA based scientific data applications, improving existing floating-point IP cores is a significant way to obtain better results. Posit is a floating-point arithmetic format first proposed by John L. Gustafson in 2017. In posit, the variable precision and efficient representation of the exponent contribute to higher accuracy and larger dynamic range than IEEE 754. This work studies the FPGA implementation of posit arithmetic for extending floating-point IP cores for FPGA based scientific data analytics. We design the logic for hardware implementation and implement it on FPGA. We compare the precision representation, dynamic range and performance of the implemented posit FPU (Floating-Point Unit) with IEEE 754 floating-point IP cores. Posit is superior to IEEE 754 in precision representation and dynamic range, and through further optimization of the implementation, posit can be a good candidate for floating-point IP cores.

20.

VHDL design of a floating-point adder IP core

He Qingping (何清平), Liu Zuolian (刘佐濂), Lin Shaowei (林少伟) 《山西电子技术》 (Shanxi Electronic Technology) 2006,(4):34-36

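Addition is the most frequently used floating-point operation. Combining VHDL with FPGA programmable technology, a VHDL design of a floating-point adder IP core is completed; the core has a 5-stage pipeline, conforms to the IEEE 754 floating-point standard, and can be parameterized for single or double precision.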
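
As a rough software model of what such an adder computes, the sketch below adds two IEEE 754 single-precision bit patterns in five steps. The split into unpack, align, add, normalize, and round is an assumed textbook partition, not necessarily the pipeline of the paper; the model handles only normal, non-zero inputs with results in the normal range, uses round-to-nearest-even, and aligns with exact integer shifts instead of guard and sticky bits. All function names are hypothetical.

```python
import struct


def to_bits(x: float) -> int:
    """Round a Python float to binary32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fadd32(x_bits: int, y_bits: int) -> int:
    """Add two normal binary32 bit patterns (no zeros, subnormals, infinities, or NaNs)."""
    # Step 1: unpack sign, biased exponent, and significand with the hidden bit restored
    sx, ex, mx = x_bits >> 31, (x_bits >> 23) & 0xFF, (x_bits & 0x7FFFFF) | 0x800000
    sy, ey, my = y_bits >> 31, (y_bits >> 23) & 0xFF, (y_bits & 0x7FFFFF) | 0x800000
    # Step 2: align both significands to the smaller exponent (exact, by left shifts)
    e = min(ex, ey)
    vx, vy = mx << (ex - e), my << (ey - e)
    # Step 3: signed addition
    total = (-vx if sx else vx) + (-vy if sy else vy)
    if total == 0:
        return 0
    sign, mag = (1 if total < 0 else 0), abs(total)
    # Step 4: normalize to a 24-bit significand
    shift = mag.bit_length() - 24
    if shift <= 0:
        sig = mag << -shift
    else:
        # Step 5: round to nearest, ties to even
        sig = mag >> shift
        rem, half = mag & ((1 << shift) - 1), 1 << (shift - 1)
        if rem > half or (rem == half and sig & 1):
            sig += 1
        if sig == 1 << 24:      # rounding overflowed into a 25th bit
            sig >>= 1
            shift += 1
    return (sign << 31) | ((e + shift) << 23) | (sig & 0x7FFFFF)
```

For instance, `from_bits(fadd32(to_bits(1.5), to_bits(2.25)))` evaluates to 3.75. A hardware design would bound the alignment shift and track guard, round, and sticky bits rather than rely on unbounded integers.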