1.

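FPGA Design of an Adaptive Equalizer (total citations: 1; self-citations: 0; citations by others: 1)

和玮苹 张会生 《信息安全与通信保密》2006,(8):146-148

This paper presents an FPGA design of an adaptive blind equalizer. It describes in detail the equalizer's core arithmetic unit, a high-performance multiply-accumulate (MAC) unit designed with the Booth encoding algorithm.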

2.

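侯华敏 杨虹 《微电子学》2005,35(5):509-512,516

A 16-bit high-performance multiply-accumulate unit is designed. The circuit performs multiply-add and multiply-subtract operations on both signed and unsigned integers in a single cycle, and supports saturating arithmetic. The unit uses modified Booth-encoded multiplication. The "add 1" step of two's-complement negation (invert, then add 1) is treated as one partial product, the accumulated operand is treated as another partial product, and the compensation value obtained after reducing the sign-extension bits is a constant. Partial products are summed with 4-2 compressors, and the carry-propagate addition uses a Brent-Kung adder, which keeps the structure symmetric and compact. The unit is implemented in an HHNEC 0.25 μm process, with a critical-path delay of 4 ns.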
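The abstract above combines radix-4 (modified) Booth recoding with a saturating accumulate. As a rough behavioural illustration only, the Python sketch below models that arithmetic for signed operands. The function names, the 32-bit accumulator width, and the restriction to signed inputs are assumptions made for this example; the sketch does not model the paper's 4-2 compressors, Brent-Kung adder, or unsigned mode.

```python
# Behavioural sketch (not the paper's circuit): radix-4 Booth MAC with saturation.
# All names and the accumulator width are hypothetical choices for illustration.

BOOTH_DIGIT = {0: 0, 1: 1, 2: 1, 3: 2, 4: -2, 5: -1, 6: -1, 7: 0}


def booth_radix4_digits(multiplier: int, bits: int = 16) -> list[int]:
    """Recode a signed two's-complement multiplier into digits in {-2,-1,0,1,2}."""
    assert bits % 2 == 0, "radix-4 recoding consumes two bits per digit"
    pattern = (multiplier & ((1 << bits) - 1)) << 1  # append the implicit y[-1] = 0
    digits = []
    for i in range(bits // 2):
        triplet = (pattern >> (2 * i)) & 0b111  # bits y[2i+1], y[2i], y[2i-1]
        digits.append(BOOTH_DIGIT[triplet])
    return digits


def saturate(value: int, bits: int) -> int:
    """Clamp value to the signed range representable in `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def booth_mac(acc: int, a: int, b: int, bits: int = 16,
              acc_bits: int = 32, subtract: bool = False) -> int:
    """Return saturate(acc +/- a*b), forming a*b from Booth partial products."""
    partial_products = [(digit * a) << (2 * i)
                        for i, digit in enumerate(booth_radix4_digits(b, bits))]
    product = sum(partial_products)  # hardware would use a compressor tree here
    result = acc - product if subtract else acc + product
    return saturate(result, acc_bits)
```

For example, `booth_mac(10, -3, 7)` accumulates the product -21 onto 10. A 16-bit multiplier yields only eight partial products instead of sixteen, which is the main reason Booth recoding is attractive for MAC units.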

3.

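Optimized Design of a High-Speed Arithmetic Unit for a Blind Equalizer

何江海 郭宪锋 许家栋 张会生 耿光辉 《信息安全与通信保密》2011,9(10):69-71

In the design of an adaptive blind equalizer, the signed multiply-accumulate (MAC) operation lies on the critical path. The paper describes in detail the Booth encoding algorithm behind the equalizer's core arithmetic unit, then uses it to optimize a high-performance MAC unit and implement it on a field-programmable gate array (FPGA). System simulation results show that the optimized design equalizes well, and that an equalizer built on it can effectively eliminate inter-symbol...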

4.

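Design and Implementation of a Fully Pipelined Floating-Point Multiply-Accumulator on FPGA

《电子技术与软件工程》2016,(2)

To improve the pipelined performance of floating-point multiply-accumulation, this paper proposes a design and implementation method for a fully pipelined floating-point multiply-accumulator on FPGA. Using techniques such as stall-free pipelined accumulation and serial full addition, it performs multiply-accumulation over single-precision floating-point complex vectors of arbitrary length, with no pipeline gap between adjacent vectors. The accumulator is implemented on a Xilinx XC7VX690T FPGA, consumes less than 1% of the multiplier and logic resources, and runs at up to 279 MHz.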

5.

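Analysis and Design of a 32-Bit DSP Multiplier

陈爽 陈雷 孙国欣 刘闪 刘茂华 辛向利 《电子工程师》2007,33(11):49-51

An important measure of the performance of a DSP (digital signal processor) chip is the number of multiply-accumulate operations it completes per unit time, so a faster multiply-accumulate makes the whole chip faster. By analyzing several designs for the multiplier in the data path, the paper arrives at a multiplier suited to a 32-bit floating-point DSP architecture, laying the groundwork for a better multiply-accumulate design.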

6.

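Design and Implementation of a Delay-Optimized Multiply-Accumulate Unit for a High-Speed DSP

Sheraz Anjum 陈杰 李海军 《电子器件》2007,30(4):1375-1379

The multiply-accumulate (MAC) unit is a key part of the data path of any digital signal processor (DSP), and hardware engineers have worked for years on optimizing and improving it. This paper describes the design and implementation of a speed-optimized MAC unit. The unit is designed for a high-speed VLIW DSP core and performs 16×16+40 operations on unsigned and signed two's-complement operands. The authors report that its critical-path delay is better than that of any other MAC unit implemented with the same or different arithmetic techniques. The unit was implemented successfully with Synopsys tools and compared with MAC units of the same bit width in the Synopsys DesignWare library. The comparison shows that it is faster than any other implementation in the DesignWare library, which makes it suitable for DSP cores that need high throughput. Note: the comparison was run in Design Compiler with the same attributes and switches.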

7.

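VLSI Implementation of Large-Number Modular Exponentiation (total citations: 5; self-citations: 0; citations by others: 5)

陈弘毅 盖伟新 《电子学报》1999,27(2):8-17

Information encryption, digital signatures, and identity authentication are important topics in information security, and only public-key cryptosystems solve these problems well. Large-number modular exponentiation is the core operation of many public-key cryptosystems and the bottleneck in improving their efficiency. Based on the Montgomery modular multiplication transform, a new systolic-array modular multiplier is constructed. Combined with the simple binary exponentiation algorithm, a 256-bit modular exponentiation unit, THM256, was designed and fabricated in a 0.8 μm CMOS process, with a circuit size of 18,677 gates and a chip area of 17.63 mm².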
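The abstract above pairs Montgomery modular multiplication with simple binary exponentiation. The Python sketch below is a word-level software illustration of that pairing under stated assumptions: the function names and the default `k = 256` (so that R = 2^256) are hypothetical, the modulus must be odd and smaller than R, and nothing here reflects the systolic-array structure of the THM256 chip. It relies on `pow(n, -1, r)` for the modular inverse, which requires Python 3.8 or later.

```python
# Software sketch of Montgomery multiplication plus left-to-right binary
# exponentiation. Names and parameters are illustrative assumptions.

def montgomery_setup(n: int, k: int) -> tuple[int, int]:
    """For odd n < R = 2**k, return (-n^-1 mod R, R^2 mod n)."""
    assert n % 2 == 1 and 1 < n < (1 << k)
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return n_prime, (r * r) % n


def mont_mul(a: int, b: int, n: int, n_prime: int, k: int) -> int:
    """Return a*b*R^-1 mod n without dividing by n (only shifts and masks by R)."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # single conditional subtraction


def mont_pow(base: int, exponent: int, n: int, k: int = 256) -> int:
    """Compute base**exponent mod n using Montgomery-form arithmetic."""
    n_prime, r2 = montgomery_setup(n, k)
    x = mont_mul(base % n, r2, n, n_prime, k)   # base in Montgomery form
    result = mont_mul(1, r2, n, n_prime, k)     # 1 in Montgomery form (R mod n)
    for bit in bin(exponent)[2:]:               # scan exponent bits, MSB first
        result = mont_mul(result, result, n, n_prime, k)
        if bit == "1":
            result = mont_mul(result, x, n, n_prime, k)
    return mont_mul(result, 1, n, n_prime, k)   # convert back out of Montgomery form
```

The point of the transform is visible in `mont_mul`: reduction modulo `n` is replaced by masking and shifting by a power of two, which is what makes the operation attractive for hardware.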

8.

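A Novel High-Speed, Low-Power, Multifunction Multiply-Accumulate Unit for DSP Processors

高健 陈杰 《电子器件》2006,29(1):48-52,57

This paper introduces a multifunction, high-speed, low-power multiply-accumulate (MAC) unit with a novel structure for DSP processors. The design uses an asynchronous interlocked pipeline technique, which greatly reduces power consumption. In the critical path of the design, the partial-product generation stage, complementary partial product word correction (CPPWC) and the three-dimensional compression method (TDM) optimize the design and increase its speed. A DSP processor embedding this MAC unit was taped out in the SMIC 0.18 μm CMOS process. In testing, the design outperformed comparable designs with a conventional structure, with a delay of 3.34 ns and a power consumption of 13.9247 mW.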

9.

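Microarchitecture Design of the MAC in a DSP

周昔平 高德远 樊晓桠 荆元利 沈戈 《微电子学与计算机》2004,21(3):92-96

The design of a practical high-performance fixed-point digital signal processor (DSP) often calls for a functionally complex multiply-accumulator. The unit must handle not only the usual signed and unsigned multiply-add and multiply-subtract operations, but also integer and fractional multiply-add, unbiased rounding, saturation, and other functions. In addition, to resolve data dependences in the DSP, the multiply-accumulator is often required to finish all of these operations in a single cycle, which makes a fast, low-cost implementation hard to find. The paper first sets out the functions a multiply-accumulator must provide in a typical high-performance fixed-point DSP. It then proposes and implements a 16-bit high-performance multiply-accumulator that integrates all of these functions and completes them in a single cycle, using only three levels of 4:2 compression and one carry-lookahead addition. The multiply-accumulator is implemented in a 0.35 μm process, has been embedded in a digital signal processor, and has been used successfully in real engineering projects.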

10.

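Design of a Floating-Point Coprocessor Based on a High-Performance Floating-Point Multiply-Accumulator

邹翠 谢憬 谢鑫君 《黑龙江电子技术》2014,(7):121-124

Complex computations often involve floating-point data with a wide range and high precision. Typical low-end embedded cores have no floating-point hardware, and emulating floating-point arithmetic in software often cannot meet real-time requirements. This work studies the design and implementation of a general-purpose floating-point coprocessor based on a high-performance floating-point multiply-accumulator, focusing on key techniques for raising floating-point throughput and reducing hardware overhead. Experimental results show that the vector floating-point coprocessor reduces execution cycles by more than 40%.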

11.

A 600-MHz VLIW DSP

Agarwala S. Anderson T. Hill A. Ales M.D. Damodaran R. Wiley P. Mullinnix S. Leach J. Lell A. Gill M. Rajagopal A. Chachad A. Agarwala M. Apostol J. Krishnan M. Duc Bui Quang An Nagaraj N.S. Wolf T. Elappuparackal T.T. 《Solid-State Circuits, IEEE Journal of》2002,37(11):1532-1544

A 600-MHz VLIW digital signal processor (DSP) delivers 4800 MIPS, 2400 (16 b) or 4800 (8 b) million multiply accumulates (MMACs) at 0.3 mW/MMAC (16 b). The chip has 64M transistors and dissipates 719 mW at 600 MHz and 1.2 V, and 200 mW at 300 MHz and 0.9 V. It has an eight-way VLIW DSP core, a two-level memory system, and an I/O bandwidth of 2.4 GB/s. The chip integrates a C64x DSP core with Viterbi and turbo decoders. Architectural and circuit design approaches to achieve high performance and low power using a semi-custom standard cell methodology, while maintaining backward compatibility, are described. The chip is implemented in a 0.13-μm CMOS process with six layers of copper interconnect.

12.

Floating-point division and square root using a Taylor-series expansion algorithm

Taek-Jun Kwon Jeffrey Draper 《Microelectronics Journal》2009,40(11):1601-1605

Hardware support for floating-point (FP) arithmetic is a mandatory feature of modern microprocessor design. Although division and square root are relatively infrequent operations in traditional general-purpose applications, they are indispensable and becoming increasingly important in many modern applications. Therefore, overall performance can be greatly affected by the algorithms and the implementations used for designing FP-Div and FP-Sqrt units. In this paper, a single-precision fused floating-point multiply/divide/square root unit based on a Taylor-series expansion algorithm is proposed. We extended an existing multiply/divide fused unit to incorporate the square root function with little area and latency overhead, since Taylor's theorem enables us to compute approximations for many well-known functions with very similar forms. The implementation results of the proposed fused unit, based on standard cell methodology in IBM 90 nm technology, show that incorporating the square root function into an existing multiply/divide unit requires only a modest 18% area increase, and that the same low latency (12 cycles) can be achieved for both divide and square root operations. The proposed arithmetic unit exhibits a reasonably good area-performance balance.
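As a hedged illustration of how a Taylor-series expansion can drive division, the Python sketch below approximates a reciprocal from a coarse table lookup followed by a short power series, using 1/b = y0 / (1 - m) = y0 (1 + m + m^2 + ...) with m = 1 - b*y0. The function names, the 8-bit table index, and the three-term series are assumptions chosen for this example; the sketch works on Python floats and does not reproduce the fused unit, its datapath widths, or its 12-cycle latency.

```python
# Illustrative model of Taylor-series reciprocal and division.
# Table size, term count, and names are assumptions, not the paper's parameters.

def taylor_reciprocal(b: float, terms: int = 3, table_bits: int = 8) -> float:
    """Approximate 1/b for a normalized significand b in [1, 2)."""
    assert 1.0 <= b < 2.0, "normalize the operand to [1, 2) first"
    step = 2.0 ** -table_bits
    # Emulate a small ROM: index by the top fraction bits of b and return the
    # reciprocal of the midpoint of that interval as the initial estimate y0.
    index = int((b - 1.0) / step)
    y0 = 1.0 / (1.0 + index * step + step / 2.0)
    m = 1.0 - b * y0            # small residual, |m| is roughly at most 2**-(table_bits+1)
    series, power = 1.0, 1.0
    for _ in range(terms - 1):  # accumulate 1 + m + m**2 + ...
        power *= m
        series += power
    return y0 * series


def taylor_divide(a: float, b: float, terms: int = 3) -> float:
    """Approximate a / b for b in [1, 2) as a * (1/b)."""
    return a * taylor_reciprocal(b, terms)
```

With an 8-bit table the residual `m` is on the order of 2^-9, so truncating after the `m**2` term leaves an error on the order of 2^-27, which is below single-precision resolution. Each extra term costs one more multiply and add, which is why such schemes map well onto an existing multiplier.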

13.

A Predictably Low-Leakage ASIC Design Style

Jayakumar N. Khatri S. P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(3):276-285

In this paper, we describe a new low-leakage standard cell based application-specific integrated circuit (ASIC) design methodology. This design is based on the use of modified standard cells, designed to reduce leakage currents (by almost two orders of magnitude) in standby mode and also allow precise estimation of leakage current. For each cell in a standard cell library, two low-leakage variants of the cell are designed. If the inputs of a cell during the standby mode of operation are such that the output has a high value, we minimize the leakage in the pull-down network, and similarly we minimize leakage in the pull-up network if the output has a low value. In this manner, two low-leakage variants of each standard cell are obtained. While technology mapping a circuit, we determine the particular variant to utilize in each instance, so as to minimize leakage of the final mapped design. We have performed experiments to compare placed-and-routed area, leakage and delays of this new methodology against Multithreshold CMOS (MTCMOS) and a regular standard cell based design style. The results show that our new methodology (which we call the "HL" methodology) has better speed and area characteristics than MTCMOS implementations. The leakage current for HL designs can be dramatically lower than the worst-case leakage of MTCMOS based designs, and two orders of magnitude lower than the leakage of traditional standard cells. An ASIC design implemented in MTCMOS would require the use of separate power and ground supplies for latches and combinational logic, while our methodology does away with such a requirement. Another advantage of our methodology is that the leakage is precisely estimable, in contrast with MTCMOS. Our primary contribution in this paper is a new low leakage design style for static CMOS designs. In addition, we also discuss techniques to reduce leakage in dynamic (domino logic) designs.

14.

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS (total citations: 3; self-citations: 0; citations by others: 3)

Vangal S.R. Howard J. Ruhl G. Dighe S. Wilson H. Tschanz J. Finan D. Singh A. Jacob T. Jain S. Erraguntla V. Roberts C. Hoskote Y. Borkar N. Borkar S. 《Solid-State Circuits, IEEE Journal of》2008,43(1):29-41

This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8×10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.

15.

AE32000B: a Fully Synthesizable 32‐Bit Embedded Microprocessor Core

Hyun‐Gyu Kim Dae‐Young Jung Hyun‐Sup Jung Young‐Min Choi Jung‐Su Han Byung‐Gueon Min Hyeong‐Cheol Oh 《ETRI Journal》2003,25(5):337-344

In this paper, we introduce a fully synthesizable 32‐bit embedded microprocessor core called the AE32000B. The AE32000B core is based on the extendable instruction set computer architecture, so it has high code density and a low memory access rate. In order to improve the performance of the core, we developed and adopted various design options, including the load extension register instruction (LERI) folding unit, a high performance multiply and accumulate (MAC) unit, various DSP units, and an efficient coprocessor interface. The instructions per cycle count of the Dhrystone 2.1 benchmark for the designed core is about 0.86. We verified the synthesizability and the area and time performances of our design using two CMOS standard cell libraries: a 0.35‐μm library and a 0.18‐μm library. With the 0.35‐μm library, the core can be synthesized with about 47,000 gates and operate at 70 MHz or higher, while it can be synthesized with about 53,000 gates and operate at 120 MHz or higher with the 0.18‐μm library.

16.

A 2.5-GFLOPS, 6.5 million polygons per second, four-way VLIW geometry processor with SIMD instructions and a software bypass mechanism

Kubosawa H. Higaki N. Ando S. Takahashi H. Asada Y. Anbutsu H. Sato T. Sakate M. Suga A. Kimura M. Miyake H. Okano H. Asato A. Kimura Y. Nakayama H. Kimoto M. Hirochi K. Saito H. Kaido N. Nakagawa Y. Shimada T. 《Solid-State Circuits, IEEE Journal of》1999,34(11):1619-1626

A four-way very long instruction word (VLIW), 312-MHz geometry processor with a peripheral component interconnect/accelerated graphics port bus bridge was implemented in a 0.21-μm, 2.5-V, three-layer-metal CMOS process. We adopted (1) a software bypass mechanism, (2) single-instruction multiple-data stream instructions, (3) four sets of floating-point multiply add and accumulate execution units, (4) special condition code registers and a branch condition generator for a clipping operation, and (5) automatic clock delay tuning methodology. As a result of these features, we achieved a performance of 2.5 GFLOPS and 6.5 million polygons per second for a three-dimensional geometry processor, which is the highest published performance for a single geometry processor. The processor is applicable to computer-aided-design systems that require very high graphics performance.

17.

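Application of Distributed Arithmetic in Implementing FIR Digital Filters (total citations: 2; self-citations: 1; citations by others: 1)

LI Mei 王兰勋 《通信技术》2008,41(8)

This paper proposes a design for implementing an FIR digital filter on an FPGA, applying distributed arithmetic (DA) in the design. FPGAs have regular internal logic arrays and abundant routing resources, which makes them particularly well suited to digital signal processing tasks. Distributed arithmetic is an important FPGA technique: it converts the multiply-add operation, the key computation of an FIR filter on an FPGA, into table lookups, greatly increasing the filter's speed. The paper gives the VHDL program and simulation waveforms.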
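To make concrete how distributed arithmetic replaces the multiply-add of an FIR filter with table lookups, the Python sketch below models the technique bit-serially for integer coefficients and 8-bit two's-complement samples. It is a behavioural model only: the function names and the 8-bit sample width are assumptions for this example, and it is not the VHDL given in the paper.

```python
# Behavioural sketch of a distributed-arithmetic (DA) FIR filter.
# Names and the sample width are illustrative assumptions.

def build_da_lut(coeffs: list[int]) -> list[int]:
    """Entry `addr` holds the sum of the coefficients selected by the bits of addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut: list[int], samples: list[int], bits: int = 8) -> int:
    """Compute sum(coeffs[k] * samples[k]) using only lookups, shifts, and adds."""
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]          # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        addr = 0
        for k, word in enumerate(words):         # gather bit b of every sample
            addr |= ((word >> b) & 1) << k
        term = lut[addr] << b
        acc += -term if b == bits - 1 else term  # the sign bit carries weight -2**(bits-1)
    return acc


def da_fir(coeffs: list[int], signal: list[int], bits: int = 8) -> list[int]:
    """Filter `signal` with taps `coeffs`; samples must fit in `bits` signed bits."""
    lut = build_da_lut(coeffs)
    history = [0] * len(coeffs)                  # history[k] holds x[t - k]
    output = []
    for x in signal:
        history = [x] + history[:-1]
        output.append(da_inner_product(lut, history, bits))
    return output
```

The lookup table grows as 2^N for N taps, so practical designs usually split long filters into several smaller tables whose outputs are added together.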

18.

MAC units for matched filters in DS-CDMA systems

Premkumar A.B. Madhukumar A.S. Lau C.T. 《Broadcasting, IEEE Transactions on》2002,48(1):52-57

High data rates in wireless radio environments are becoming common. Current system designs are inadequate to meet the speed requirements that are anticipated in newer services. The computational complexity in the subsystems of wireless radio increases in direct proportion to the data rate. In this correspondence, we propose alternative multiply accumulate (MAC) units for the pulse shaping filters that use a new representation for their coefficients. Consequently, these new structures are fast, efficient and dissipate less power. The proposed filters take into account constraints such as intersymbol interference and response characteristics in their design methodology.

19.

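Optimized Design of the Connector for a Broadband GTEM Cell

周忠元 汤仕平 蒋全兴 景莘慧 《电波科学学报》2005,20(2):143-146

In developing a broadband gigahertz transverse electromagnetic (GTEM) cell, the design of the high-frequency input connector is especially critical. The finite integration technique is used to compute the voltage standing wave ratio of the input adapter, and a simulation model built in software is used to optimize the inner-conductor transition section, the dielectric spacer structure, and the round-to-square transition. This resolves the difficulty of designing and fabricating the high-frequency input adapter and secures the overall performance of the GTEM cell. Test results show that a cell built with this connector has an upper operating frequency of 18 GHz and a voltage standing wave ratio below 1.5.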