Similar literature
Found 20 similar documents; search took 31 ms
1.
Hou Junjie, Zhu Yongxin, Du Sen, Song Shijin 《Journal of Signal Processing Systems》 2019, 91(10): 1137-1148

The high performance, power efficiency, and reconfigurability of FPGAs are attracting growing attention in big data processing. In scientific data analytics, besides computing performance, the accuracy of the results and the dynamic range of the data representation are critical. At present, floating-point IP cores in FPGA designs use the IEEE standard for floating-point arithmetic, IEEE 754. For FPGA-based scientific data applications, improving existing floating-point IP cores is a significant way to obtain better results. Posit is a floating-point arithmetic format first proposed by John L. Gustafson in 2017. In posit, variable precision and an efficient exponent representation contribute to higher accuracy and a larger dynamic range than IEEE 754. This work studies the FPGA implementation of posit arithmetic to extend floating-point IP cores for FPGA-based scientific data analytics. We design the logic for the hardware implementation and implement it on an FPGA. We compare the precision, dynamic range, and performance of the implemented posit FPU (Floating-Point Unit) with IEEE 754 floating-point IP cores. Posit shows better precision and dynamic range than IEEE 754, and with further optimization of the implementation, posit can be a good candidate for floating-point IP cores.
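
To make the posit layout mentioned in this abstract concrete, here is a minimal software sketch of decoding a posit bit pattern (sign, run-length-encoded regime, up to `es` exponent bits, then fraction). It is not the paper's FPGA design; the function name `decode_posit` and its parameters are assumptions chosen for illustration, and the layout follows the commonly published posit definition.

```python
from fractions import Fraction
from typing import Optional


def decode_posit(bits: int, n: int = 8, es: int = 0) -> Optional[Fraction]:
    """Decode an n-bit posit with es exponent bits (hypothetical helper).

    Returns an exact Fraction, or None for NaR (not a real).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # NaR: sign bit set, all other bits zero

    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    body = format(bits, f"0{n}b")[1:]  # the n-1 bits after the sign bit
    first = body[0]
    run = len(body) - len(body.lstrip(first))  # length of the regime run
    k = run - 1 if first == "1" else -run      # regime value

    rest = body[run + 1:]      # skip the bit that terminates the regime
    exp_bits = rest[:es]
    frac_bits = rest[es:]

    # Exponent bits cut off by a long regime are treated as zeros.
    e = int(exp_bits.ljust(es, "0"), 2) if es > 0 else 0
    frac = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)

    # value = useed**k * 2**e * (1 + f), with useed = 2**(2**es)
    scale = k * (1 << es) + e
    value = (1 + frac) * Fraction(2) ** scale
    return -value if negative else value
```

Because the regime grows or shrinks with the magnitude, values near 1 keep more fraction bits while very large or very small values trade fraction bits for range, which is the variable precision the abstract refers to.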

2.
In this work we present an implementation of the exponential function in double precision, in a unit that supports IEEE floating-point arithmetic. As in existing proposals, the implementation is based on the use of a floating-point multiplier and additional hardware. We decompose the computation into three subexponentials. The first and third subexponentials are computed in a conventional way (table look-up and polynomial approximation). The second subexponential is computed by transforming the slow radix-2 digit-recurrence algorithm into a fast computation that uses the multiplier and additional hardware. We present a design process that permits the selection of the most convenient trade-off between hardware complexity and latency. We discuss the algorithm and the implementation, and perform a rough comparison with three proposed designs. Our estimates indicate that the implementation proposed in this work offers a better trade-off between hardware complexity and latency than the compared designs.

3.
4.
With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.

5.
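According to the IEEE 754/854 standards, the floating-point unit of a microprocessor has many exception types, and both the causes of exceptions and the ways of handling them are fairly complex. As a result, floating-point unit designs often face the problem that exceptions are hard to catch. This paper proposes a high-precision, high-instruction-density open exception handling method that guarantees complete exception detection, reduces area and power consumption, and speeds up instruction execution.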

6.
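Testing of FPGA logic resources based on a test system   Total citations: 6, self-citations: 1, citations by others: 5
Tang Hengbiao (唐恒标), Feng Jianhua (冯建华), Feng Jianke (冯建科) 《微电子学》 (Microelectronics) 2006, 36(3): 292-295, 299
FPGAs are widely used in many fields, and the problem of testing them has become increasingly prominent. Targeting the characteristics of SRAM-based FPGAs and taking Xilinx's XC4000 series chips as an example, this article uses a test method that detects mixed faults across multiple configurable logic blocks (CLBs) of the programmable logic resources, and describes how to implement online configuration as well as functional and parametric testing of FPGAs on the BC3192V50 test system. It is a general, test-system-based method for FPGA configuration and testing.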

7.
Increasing chip densities and transistor counts provide more room for designers to add functionality for important application domains into future microprocessors. As a result of rapid growth in financial, commercial, and Internet-based applications, hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers, and specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE P754). In this paper, we present an efficient arithmetic algorithm and hardware design for decimal floating-point division. The design uses an efficient piecewise linear approximation, a modified Newton–Raphson iteration, a specialized rounding technique, and a simplified decimal incrementer and decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the decimal divider, which is compliant with the current version of IEEE P754, has an estimated critical path delay of 0.69 ns (around 13 FO4 inverter delays) when implemented using LSI Logic’s 0.11 micron Gflx-P standard cell library.
Michael J. Schulte
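
As a rough software illustration of the general idea named in this abstract (a piecewise linear initial estimate of the reciprocal refined by Newton–Raphson steps), the sketch below divides two decimal numbers to 16 digits. It uses the textbook iteration rather than the paper's modified one, and it does not reproduce the paper's specialized rounding; `nr_divide`, the chord-based estimate, the guard-digit count, and the iteration count are all assumptions made for this example.

```python
from decimal import Decimal, localcontext


def nr_divide(dividend: Decimal, divisor: Decimal,
              digits: int = 16, iterations: int = 6) -> Decimal:
    """Approximate dividend / divisor to `digits` significant digits (illustrative)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    with localcontext() as ctx:
        ctx.prec = digits + 6            # a few guard digits (assumed value)
        sign = -1 if divisor < 0 else 1
        d = abs(divisor)

        # Normalise the divisor: d = m * 10**shift with 1 <= m < 10.
        shift = d.adjusted()
        m = d.scaleb(-shift)

        # Piecewise linear estimate of 1/m: on each segment [a, a+1] use the
        # chord through (a, 1/a) and (a+1, 1/(a+1)). In hardware the slope and
        # intercept per segment would be tabulated constants.
        a = int(m)                       # leading digit, 1..9
        x = (Decimal(2 * a + 1) - m) / Decimal(a * (a + 1))

        # Newton-Raphson refinement of the reciprocal: x <- x * (2 - m * x).
        # The relative error is roughly squared at every step.
        for _ in range(iterations):
            x = x * (2 - m * x)

        q = (dividend * x).scaleb(-shift) * sign
        ctx.prec = digits
        return +q                        # unary plus rounds to the context precision
```

The chord estimate is off by at most about 12.5% (on the segment [1, 2]), so six quadratic refinement steps are more than enough for the working precision used here; the final digit is not guaranteed to be correctly rounded.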

8.
This paper deals with the optimization of iterative algorithms with matrix operations or nested loops for hardware implementation in Field Programmable Gate Arrays (FPGA), using Integer Linear Programming (ILP). The method is demonstrated on an implementation of the Finite Interval Constant Modulus Algorithm. It is an equalization algorithm, suitable for modern communication systems (4G and beyond). For the floating-point calculations required in the algorithm, two arithmetic libraries were used in the FPGA implementation: one based on the logarithmic number system, the other using the floating-point number system in the standard IEEE format. Both libraries use pipelined modules. Traditional approaches to the scheduling of nested loops lead to relatively large code, which is unsuitable for FPGA implementation. This paper presents a new high-level synthesis methodology, which models both iterative loops and imperfectly nested loops by means of a system of linear inequalities. Moreover, memory access is considered as an additional resource constraint. Since solving ILP-formulated problems is known to be computationally intensive, an important part of the article is devoted to the reduction of the problem size.
Jan Schier

9.
A high-performance data execution unit suitable for computation-intensive digital signal processing systems is described. This unit uses the hybrid number system approach to speed up the basic arithmetic operations while remaining compatible with a standard IEEE 32-b floating-point format. However, all the arithmetic operations are performed in the 32-b logarithmic number system (LNS) domain. This chip is designed using a 3.4 V 0.8 μm CMOS technology with double-layer metallization. Conversion algorithms, chip architecture, design methodology, and major circuit components are discussed. A macrocell design methodology is adopted in order to achieve high-performance custom design circuits with the convenience of an automatic layout system. Computer simulations indicate that all the 32-b floating-point arithmetic operations (multiplication, division, squaring, and square root) can be executed in 10 ns. Extension of this unit into a 64-b double-precision floating-point system and multiply-accumulation applications are also presented.

10.
This paper demonstrates how IEEE 754 floating-point standard compliant rounding can be merged with carry-propagate addition in floating-point unit (FPU) designs by using a novel adaptation of the prefix adder. The paper considers add/subtract, multiply, and SRT divide operations and demonstrates that in every case a generic rounding architecture based on a prefix adder with a small amount of additional logic is sufficient to cover all the rounding modes. Critical path analysis shows that the proposed architecture is compatible with contemporary pipelined FPU design practice, while using significantly less logic.

11.
The design of the WE32106 Math Accelerator Unit, which provides the WE32100 microprocessor with IEEE standard (Draft 10) floating-point capabilities, is described. The chip implements a host of floating-point operations in single, double, and double-extended precision, as well as the complete set of IEEE standard requirements for fault and exception handling. The chip provides a high-speed co-processor interface to the WE32100 microprocessor, as well as a general-purpose memory-mapped peripheral-mode interface to other microprocessors. The chip is implemented in 1.5 μm twin-tub CMOS III technology.

12.
A new floating-point division architecture that complies with the IEEE 754-1985 standard is proposed in this paper. This architecture is based on the New Svoboda-Tung (NST) division algorithm and the radix-4 MROR (maximally redundant maximally recoded) signed digit number system. In NST division, the divisor and dividend must be prescaled. We summarize a general systematic method to accomplish the prescaling, and we also propose a hardware scheme such that the timing complexity is constant regardless of the bit length of the divisor. For the divider implementation, a new carry-free MROR signed digit adder is proposed for addition and subtraction, and this adder can improve the cycle time significantly. A 32-b/32-b radix-4 divider is thus designed in Verilog HDL; the simulation results show that this architecture is implementable using currently available libraries. The hardware complexity and performance of this divider are competitive with conventional SRT dividers.

13.
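Performance analysis and application of IEEE 802.11g   Total citations: 1, self-citations: 1, citations by others: 0
This article gives a comprehensive introduction to wireless LANs based on the IEEE 802.11g standard. It describes in detail the concepts, background, features, components, architecture, and development prospects of the IEEE 802.11g draft standard, discusses the key technologies needed to implement an IEEE 802.11g WLAN and its dual-band multi-mode application modes, and analyzes the network performance of the IEEE 802.11g standard.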

14.
IEEE 802.17 resilient packet ring tutorial   Total citations: 11, self-citations: 0, citations by others: 11
IEEE Working Group 802.17 is standardizing a new ring topology network architecture, called the resilient packet ring, to be used mainly in metropolitan and wide area networks. This article presents a technology background, gives an overview, and explains some of the design choices behind RPR. Some major architectural features are illustrated and compared by showing performance evaluation results using the RPR simulator developed at Simula Research Laboratory using the OPNET modeler simulation environment.

15.
Based on the floating-point representation and taking advantage of scaling factor indetermination in blind source separation (BSS) processing, we propose a scaling technique applied to the separation matrix, to avoid saturation or weakness in the recovered source signals. This technique performs automatic gain control in an on-line BSS environment. We demonstrate the effectiveness of this technique by using the implementation of a division-free BSS algorithm with two inputs and two outputs. Compared to Euclidean normalisation, the proposed technique is computationally cheaper and efficient for a hardware implementation.

16.
This paper focuses on the design and implementation of CLASS, a Cross-Layer Association scheme for IEEE 802.11-based multi-hop wireless mesh networks. The widely-used association strategy in traditional IEEE 802.11 wireless LANs allows a Mobile Station (MS) to scan wireless access links and then associate with the Access Point (AP) that has the best Received Signal Strength Indication (RSSI) value. Unlike traditional wireless LANs, IEEE 802.11-based wireless mesh networks consist of a multi-hop wireless backhaul. As such, the performance experienced by an MS after association with a specific Mesh Access Point (MAP) depends heavily on the conditions of both the access link (e.g., traffic load of associated stations, the frame error rate between an MS and an MAP) and the mesh backhaul (e.g., end-to-end latency and asymmetric uplink/downlink transportation costs). That is, selecting the MAP that yields the “best” performance depends on several factors and cannot be determined solely from the RSSI of the MS-MAP access link. CLASS uses an end-to-end airtime cost metric to determine the MAP to which an MS should associate. The airtime cost metric is based on IEEE 802.11s, and comprises the access link airtime cost and the backhaul airtime cost. The proposed association scheme considers the frame error rate for various packet sizes, the available bandwidth on the access link after the association of the new MS, and the asymmetric uplink and downlink transportation costs on the backhaul. All experimental results are based on an actual Linux-based testbed implementation. We also implement a general Cross-Layer Service Middleware (CLSM) module that is used to monitor network conditions and gather relevant metrics and factor values. Experimental results show that the proposed association scheme is able to identify the MAP which yields the highest end-to-end network performance for the mobile stations after their associations.
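
The selection idea in this abstract can be sketched as follows, assuming the end-to-end cost is simply the sum of an access-link airtime cost and a given backhaul cost. The link cost uses the general form of the IEEE 802.11s airtime link metric, (O + B_t / r) / (1 - e_f); the overhead value, the `Candidate` type, and the function names are placeholders for illustration and are not taken from the paper, which also accounts for packet sizes, available bandwidth, and uplink/downlink asymmetry.

```python
from dataclasses import dataclass
from typing import Iterable

TEST_FRAME_BITS = 8192  # B_t, the test frame size used by the 802.11s airtime metric


def airtime_cost(rate_mbps: float, frame_error_rate: float,
                 overhead_us: float = 100.0) -> float:
    """Airtime cost in microseconds: (O + B_t / r) / (1 - e_f).

    overhead_us is a placeholder; the real overhead depends on the PHY.
    """
    if not 0.0 <= frame_error_rate < 1.0:
        raise ValueError("frame_error_rate must be in [0, 1)")
    # bits divided by Mbit/s gives microseconds
    return (overhead_us + TEST_FRAME_BITS / rate_mbps) / (1.0 - frame_error_rate)


@dataclass
class Candidate:
    """A hypothetical mesh access point as seen by one mobile station."""
    name: str
    access_rate_mbps: float
    access_fer: float
    backhaul_cost_us: float  # assumed to be reported by the mesh backhaul


def select_map(candidates: Iterable[Candidate]) -> Candidate:
    """Pick the candidate with the lowest access-plus-backhaul airtime cost."""
    return min(
        candidates,
        key=lambda c: airtime_cost(c.access_rate_mbps, c.access_fer) + c.backhaul_cost_us,
    )
```

With these assumptions, a MAP with a fast access link but an expensive backhaul path can lose to a MAP with a slower access link and a cheap backhaul, which is the situation RSSI-only association cannot detect.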

17.
The FPC controller and the AMD Am29325 32-bit floating-point mathematics processor form a two-chip cell designed for one- or two-dimensional systolic arrays which can be used to implement a wide variety of signal processing applications. The FPC controls the Am29325, routes data to and from it, and routes data and control to other cells in the array. Unique architectural features include two interchangeable data memories, an input port which can be used as either a local or global port, and a 32-bit instruction word that allows concurrent use of all cell resources. Additional features include a program memory, two data streams, and three control streams.

18.
19.
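Floating-point addition is the most frequently used floating-point operation. Combining VHDL with FPGA programmable technology, this work completes the VHDL design of a floating-point adder IP core that has a five-stage pipeline, complies with the IEEE 754 floating-point standard, and can be parameterized for single or double precision.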

20.
The ability to provide flexibility and allow fine-grain circuit specialization makes field programmable gate arrays (FPGAs) ideal candidates for computing elements within application-specific architectures. The benefits of gate-level specialization and reconfigurability can be extended by reconfiguring circuit resources at run-time. This technique, termed run-time reconfiguration (RTR), allows the exploitation of dynamic conditions or temporal locality within application-specific problems. For several applications, this technique has been shown to reduce the hardware resources required for computation. The use of this technique on conventional FPGAs, however, requires additional time for circuit reconfiguration. A functional density metric is introduced that balances the advantages of RTR against its associated reconfiguration costs. This metric is used to justify run-time reconfiguration against other more conventional approaches. Several run-time reconfigured applications are presented and analyzed using this approach.
