Similar literature
1.
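This paper presents the design of a 32-bit floating-point multiplier for high-performance DSPs. A tree of 4-2 compressors with modified Booth encoding raises speed and lowers power consumption. The structure is regular and well suited to VLSI implementation, and it completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single cycle. The whole design is described at the structural level in Verilog HDL and synthesized with a 0.25 µm cell library; one multiplication takes 24.30 ns.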

2.
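This paper presents the design of a 32-bit floating-point multiplier for high-performance DSPs. A tree of 4-2 compressors with modified Booth encoding raises speed and lowers power consumption. The structure is regular and well suited to VLSI implementation, and it completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single cycle. The whole design is described at the structural level in Verilog HDL and synthesized with a 0.25 µm cell library; one multiplication takes 24.30 ns.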

3.
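Liu Qiang, Wang Rongsheng, Computer Engineering (《计算机工程》), 2005, 31(6): 200-202
Using an improved radix-4 Booth encoding scheme, a high-speed 32×32-bit fixed-/floating-point parallel multiplier is designed. The multiplier circuit is implemented in CPL logic. By optimizing the (4:2) compressors and the 64-bit adder on the critical delay path, one multiplication completes within 20 ns. The design is implemented in a 0.45 µm double-metal CMOS process, operates at 3.3 V, and is used for adaptive digital filtering.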
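
The radix-4 Booth encoding named in the abstract above can be illustrated with a small software model. This is only a sketch of the textbook recoding, not the improved scheme of the paper, whose details the abstract does not give; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; digit i carries weight 4**i.
    Assumes `multiplier` fits in `bits` signed bits.
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    u = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        # Textbook rule: digit = y[2i-1] + y[2i] - 2*y[2i+1]
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int = 32) -> int:
    """Multiply by summing one partial product per Booth digit."""
    partial_products = [
        (d * a) << (2 * i) for i, d in enumerate(booth_radix4_digits(b, bits))
    ]
    return sum(partial_products)
```

The point of the recoding is that a 32-bit multiplier yields 16 partial products instead of 32, which is what makes the compressor tree behind it shallower.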

4.
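FPGA-based high-speed demodulation and filtering design for laser gyro signals   Total citations: 3, self-citations: 0, citations by others: 3
The IEEE single-precision 32-bit floating-point format commonly used by DSPs and computers is implemented in an FPGA; through modular design, the corresponding floating-point addition and multiplication can be performed. Using internal logic elements, multipliers, ROM, RAM and other resources, with correct logic control and reliable timing design, a dedicated filter is built that filters laser gyro signals at high speed and with high precision, and that makes it easier for a downstream DSP or computer to handle the format of the filtered data.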

5.
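A fast floating-point multiplier architecture   Total citations: 2, self-citations: 0, citations by others: 2
A fully pipelined floating-point multiplier supporting the IEEE 754 floating-point standard is presented. It introduces a new dual-path floating-point multiplication structure. Compared with a full-size multiplier, this structure shortens the critical-path delay of the multiplication tree by 13.6% without increasing area, raising the multiplier's operating frequency. The multiplier has a latency of 3 cycles and can accept one single- or double-precision floating-point multiply instruction per cycle. It was verified on an FPGA and implemented with standard cells. In a 0.18 μm static CMOS process, the operating frequency is 384 MHz and the area is 732902.25 μm². Comparison with other multiplier structures under the same process conditions shows that the structure is effective.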

6.
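This paper describes the design features of the 150-AP floating-point multiplier. It uses 4-bit × 4-bit multiplier components, each of which multiplies two 4-bit operands into an 8-bit product, and then accumulates the partial products with a carry-save adder tree. The multiplier has a simple structure with few levels; a complete multiplication takes 2 clock beats (143 ns per beat). The multiplier is pipelined and can produce one result per beat. The paper also describes special arrangements for strongly interdependent instructions that can chain onto themselves to form closed-loop pipelined operation, which improves multiplication efficiency.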

7.
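He Jun, Huang Yongqin, Zhu Ying, Computer Science (《计算机科学》), 2013, 40(12): 15-18, 51
Reducing the hardware cost and latency of quadruple-precision floating-point arithmetic is an important problem. To reduce the hardware cost of a quadruple-precision multiply-add unit, a new quadruple-precision floating-point fused multiply-add unit (QPFMA) is designed and implemented on top of a double-precision floating-point SIMD FMA unit supporting 4 × 64 bits. It supports four kinds of floating-point multiply-add operations as well as multiplication, addition/subtraction, and comparison, with a latency of 7 cycles. The 113-bit × 113-bit quadruple-precision significand multiplier is decomposed into four 57-bit × 57-bit multipliers so that it can share the 53-bit × 53-bit multipliers of the double-precision SIMD FMA unit, which significantly reduces the hardware cost of the QPFMA. Logic synthesis in a 65 nm process shows that the QPFMA reaches 1.1 GHz and occupies 42.71% of the area of a conventional QPFMA design, roughly the same as a single double-precision floating-point multiply-add unit. Compared with existing QPFMA designs at comparable process and frequency, latency is reduced by 3 cycles and gate count by 65.96%.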
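
The decomposition of a 113-bit × 113-bit significand product into four 57-bit × 57-bit products can be checked with a few lines of integer arithmetic. The sketch below assumes the ordinary schoolbook split into high and low halves; the abstract does not say how the paper recombines the sub-products, and `split_multiply` is a hypothetical name.

```python
HALF = 57  # operand width of each sub-multiplier, as stated in the abstract


def split_multiply(a: int, b: int, half: int = HALF) -> int:
    """Multiply two unsigned significands using four half-width products."""
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Four sub-products, each small enough for a half x half multiplier
    # when a and b are at most 2*half bits wide.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Recombine with the appropriate weights.
    return ll + ((lh + hl) << half) + (hh << (2 * half))
```

With 113-bit inputs the low half is 57 bits and the high half only 56 bits, so every sub-product fits a 57 × 57 multiplier.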

8.
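Design of an FPGA-based high-speed pipelined floating-point multiplier   Total citations: 1, self-citations: 0, citations by others: 1
A 32-bit high-speed pipelined floating-point multiplier supporting the IEEE 754 floating-point standard is designed. It uses a new radix-4 Booth algorithm, an improved 4:2 compressor structure, and a partial-product summation circuit to compress the partial products into carry-save form; a carry look-ahead adder then produces the product. Timing simulation shows that the multiplier runs stably at 80 MHz, and it has been used successfully in a floating-point FFT processor.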
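
The flow in the abstract above (compress partial products into carry-save form, then perform one final carry-propagating addition) can be modeled at word level. This is a minimal sketch under stated assumptions: it uses plain unsigned shift-and-add partial products rather than Booth-encoded ones, builds each 4:2 compressor from two 3:2 carry-save adders, and lets Python's `+` stand in for the carry look-ahead adder. All function names are hypothetical.

```python
def csa_3to2(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save adder: three unsigned words in, (sum, carry) words out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def compress_4to2(w: int, x: int, y: int, z: int) -> tuple[int, int]:
    """4:2 compressor modeled as two chained 3:2 carry-save adders."""
    s1, c1 = csa_3to2(w, x, y)
    return csa_3to2(s1, c1, z)


def reduce_to_two(rows: list[int]) -> list[int]:
    """Reduce partial products level by level until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        i = 0
        while len(rows) - i >= 4:
            nxt.extend(compress_4to2(*rows[i:i + 4]))
            i += 4
        rest = rows[i:]
        if len(rest) == 3:
            nxt.extend(csa_3to2(*rest))
        else:
            nxt.extend(rest)
        rows = nxt
    return rows


def multiply(a: int, b: int, bits: int = 24) -> int:
    """Unsigned multiply: carry-save reduction, then one final addition."""
    partial_products = [a << i for i in range(bits) if (b >> i) & 1]
    return sum(reduce_to_two(partial_products))  # final carry-propagate add
```

No carry ripples across a word inside the reduction loop; carry propagation is deferred entirely to the last addition, which is why the final adder sits on the critical path.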

9.
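FPGA implementation of a multi-input floating-point multiplier with an efficient structure   Total citations: 1, self-citations: 0, citations by others: 1
Conventional multi-input floating-point multiplication is implemented by cascading two-input floating-point multipliers. This structure inevitably multiplies both the latency and the logic resources required, making it hard to meet the demands of high-speed digital signal processing. This paper proposes a floating-point data format suited to FPGA implementation and an efficient multi-input floating-point multiplier structure that completes within a three-stage pipeline, and reports test data on Xilinx Virtex-series devices.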

10.
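Existing floating-point multipliers built on the Booth algorithm and a Wallace tree suffer from slow operation and complex placement and routing. To address this, an FPGA-based pipelined floating-point multiplier is designed. It uses the regular structure of the Vedic algorithm, which resolves the placement and routing complexity; it adds the partial products in parallel with carry look-ahead adders (CLA) to reduce path delay; and, with an optimized 4-stage pipeline, it passed compilation, synthesis, and simulation on the Xilinx ISE 14.7 development platform. The results show that, under the same hardware conditions, the ratio of clock cycles consumed by the proposed multiplier to those consumed by a radix-4 Booth floating-point multiplier is about 1.56 times the ratio of the hardware resources the two consume.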

11.
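Hu Zhengwei, Zhong Shun'an, Chen He, Computer Engineering (《计算机工程》), 2007, 33(21): 237-239
The pipelined read/write principle of the register file in a VelociTI-architecture floating-point digital signal processor is studied, and a design method is proposed. For single-operand double-precision floating-point instructions, the method reads the source operand in one pipeline cycle over two 32-bit data paths; for two-operand double-precision instructions, it locks the decode unit and reads the source operands over several pipeline cycles. A write-control vector is used to perform write operations across multiple pipeline cycles. The method correctly transfers IEEE 754 double-precision floating-point data between the register file and the functional units over 32-bit data paths, and simulation results confirm its correctness.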

12.
In conventional floating-point multipliers, the rounding stage is usually built around a high-speed adder for the increment operation, which increases overall execution time and occupies a large amount of chip area. It may also incur additional execution time and hardware for the renormalization that an overflow in the rounding operation can trigger. A floating-point multiplier that performs addition and IEEE rounding in parallel is designed by optimizing the operational flow based on the characteristics of floating-point multiplication. A hardware model for the multiplier is proposed, and its operational model is analyzed algebraically. The proposed multiplier requires neither additional execution time nor a high-speed adder for the rounding operation. In addition, no renormalization step is required because rounding is performed before normalization. This approach therefore improves performance and yields a cost-effective design.

13.
The floating-point number is the most commonly used real-number representation for digital computation because of its high precision. It is used on computers and in single-chip applications such as DSP chips. Double-precision (64-bit) representations allow a wider range of real numbers to be denoted, whereas single-precision (32-bit) operations are more efficient. Recently, there has been increasing interest in mixed-precision computations, which take advantage of single-precision efficiency on 64-bit numbers. This calls for the ability to convert between the two formats. In this paper, an algorithm that converts floating-point numbers from 64-bit to 32-bit representation is presented. The algorithm was implemented in Verilog and tested on a field-programmable gate array (FPGA) using the Quartus II DE2 board and an Agilent 16821A portable logic analyzer. Results indicate that the algorithm performs the conversion reliably and accurately within a constant execution time of 25 ns with a 20 MHz clock frequency, regardless of the number being converted.
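
A 64-bit to 32-bit conversion of the kind the abstract above describes can be sketched at bit level. The abstract does not give the paper's algorithm or its rounding mode, so the sketch below assumes standard IEEE 754 narrowing with round-to-nearest-even; `double_to_single_bits` is a hypothetical name, and Python's `struct` module is used only to get at the raw bit patterns.

```python
import struct


def double_to_single_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern nearest to the double `x`.

    Assumes round-to-nearest-even; overflow goes to infinity.
    """
    d = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = ((d >> 63) & 1) << 31
    exp = (d >> 52) & 0x7FF
    frac = d & ((1 << 52) - 1)

    if exp == 0x7FF:  # infinity or NaN
        if frac == 0:
            return sign | 0x7F800000
        return sign | 0x7FC00000 | (frac >> 29)  # quiet NaN, top payload bits
    if exp == 0:  # zero or double subnormal: far below single range
        return sign

    e = exp - 1023 + 127  # re-bias the exponent
    sig = (1 << 52) | frac  # 53-bit significand with hidden bit
    shift = 29  # 52 - 23 fraction bits to drop
    if e <= 0:  # result is subnormal (or zero) in single precision
        shift += 1 - e
        e = 1

    kept = sig >> shift
    rem = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1  # round to nearest, ties to even

    # For normal results `kept` still holds the hidden bit, so adding it to
    # (e - 1) << 23 also absorbs a rounding carry into the exponent.
    magnitude = ((e - 1) << 23) + kept
    if magnitude >= 0x7F800000:  # exponent overflowed
        return sign | 0x7F800000
    return sign | magnitude
```

In hardware the same steps (field extraction, exponent re-bias, shift, round, repack) are a fixed chain of combinational logic, which is consistent with the constant execution time the abstract reports.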

14.
15.
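To improve multimedia data processing, high-performance DSPs have widely adopted SIMD techniques, and the multiplier, as a key component of the DSP, must support them as well. This paper studies the implementation of SIMD multipliers in depth and proposes a new SIMD multiplier architecture. It uses two 16×8 multipliers and, by sign-extending and concatenating their operands and results, implements a 16-bit FT-SIMD multiplier simply and efficiently. The architecture can also be extended to 32-bit and 64-bit SIMD multipliers.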

16.
The design of a floating-point matrix-vector multiplication processor array for VLSI, which has an optimal area-time complexity product, is presented. This processor array is capable of performing the function (where n = 1,…, N) and can be applied in many digital signal processing applications simply by changing the matrix coefficients stored in the array. Each processor element of the array, with an N-bit mantissa and an M-bit exponent (N, M), comprises a mantissa multiplier/adder circuit and hardware to handle the floating-point control. The multiplier/adder circuit is implemented by a new optimal algorithm, which is regular, recursive, and fast. The algorithm also offers a highly local and regular interconnection network, which is a fundamental requirement in VLSI circuit design methodology.

17.
Performing computations with a low-bit number representation results in a faster implementation that uses less silicon, and hence allows an algorithm to be implemented in smaller and cheaper processors without loss of performance. We propose a novel formulation to efficiently exploit the low (or non-standard) precision number representation of some computer architectures when computing the solution to constrained LQR problems, such as those that arise in predictive control. The main idea is to include suitably-defined decision variables in the quadratic program, in addition to the states and the inputs, to allow for smaller roundoff errors in the solver. This enables one to trade off the number of bits used for data representation against speed and/or hardware resources, so that smaller numerical errors can be achieved for the same number of bits (same silicon area). Because of data dependencies, the algorithm complexity, in terms of computation time and hardware resources, does not necessarily increase despite the larger number of decision variables. Examples show that a 10-fold reduction in hardware resources is possible compared to using double precision floating point, without loss of closed-loop performance.

18.
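Huang Huisheng, Process Automation Instrumentation (《自动化仪表》), 2001, 22(3): 14-15, 18
A "decimal floating-point number" format is proposed for smart instruments. It offers a wide value range and exact values, and can replace integers, long integers, three-byte floating-point numbers, four-byte floating-point numbers, and BCD-coded numbers. Using it for communication simplifies and unifies instrument communication protocols and lays the groundwork for a general-purpose instrument communication protocol standard.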

19.
This paper presents hardware designs, arithmetic algorithms, and numerical applications for variable-precision, interval arithmetic coprocessors. These coprocessors give the programmer the ability to set the initial precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. Variable-precision, interval arithmetic algorithms are used to reduce the execution times of numerical applications. Three hardware designs with data paths of 16, 32, and 64 bits are examined. These designs are compared based on their estimated chip area, cycle time, and execution times for various numerical applications. Each coprocessor can be implemented on a single chip with a cycle time that is comparable to IEEE double-precision floating point coprocessors. For certain numerical applications, the coprocessors are two to four orders of magnitude faster than a conventional software package for variable-precision, interval arithmetic.

20.
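This paper proposes a parallel sticky-bit computation method based on preprocessing and logic transformation. The method has been applied successfully in the floating-point unit of a 64-bit high-performance CPU and effectively improves the unit's delay performance.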
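
For orientation, the sticky bit is the logical OR of all result bits discarded below the rounding position. The abstract above does not describe its preprocessing or logic transformation, so the sketch below is not the paper's method. It only illustrates one well-known way to obtain a multiplier's sticky bit from the operands, without waiting for the full product, by using trailing-zero counts. All function names are hypothetical.

```python
def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits of a positive integer."""
    return (x & -x).bit_length() - 1


def sticky_from_product(product: int, k: int) -> int:
    """Reference definition: OR of the k low-order bits of the product."""
    return int(product & ((1 << k) - 1) != 0)


def sticky_from_operands(a: int, b: int, k: int) -> int:
    """Sticky bit of a*b over its k low bits, computed from a and b alone.

    Uses the fact that trailing_zeros(a*b) == trailing_zeros(a) +
    trailing_zeros(b) for non-zero unsigned a and b.
    """
    if a == 0 or b == 0:
        return 0
    return int(trailing_zeros(a) + trailing_zeros(b) < k)
```

Because `sticky_from_operands` never looks at the product, a hardware counterpart could run alongside the multiplier array instead of after it.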
