
1.

Parallel Memory Accessing for FFT Architectures

V. Kitsakis K. Nakos D. Reisis N. Vlassopoulos 《Journal of Signal Processing Systems》2018,90(11):1593-1607

This paper introduces an efficient technique for parallel data addressing in FFT architectures that perform in-place computations. The novel addressing organization provides parallel load and store of the data involved in radix-r butterfly computations and leads to an efficient architecture when r is a power of 2. The addressing scheme is based on a permutation of the FFT data, which improves the address generation circuit and the butterfly processor control. Moreover, the proposed technique is suitable for mixed-radix applications, especially for radixes that are powers of 2, and for straightforward continuous-flow implementation. The paper presents the technique and the resulting FFT architecture and shows the advantages of the architecture compared to previously published results. Implementations of the in-place radix-8 FFT architectures on a Xilinx Virtex-7 VC707 FPGA, with input sizes of 64 and 512 complex points, validate the results.
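
The abstract does not spell out the permutation it uses, so the following is only a minimal sketch of the general idea of conflict-free parallel access for in-place radix-r butterflies. It assumes the classic digit-sum bank mapping (bank = sum of the base-r digits of the address, modulo r), which is a stand-in and not the paper's scheme. The names `bank_of`, `butterfly_operands`, and `is_conflict_free` are hypothetical.

```python
def digits(index, radix, num_digits):
    """Return the base-`radix` digits of `index`, least significant first."""
    out = []
    for _ in range(num_digits):
        out.append(index % radix)
        index //= radix
    return out


def bank_of(index, radix, num_digits):
    """Assumed bank mapping: sum of the base-r digits, modulo r."""
    return sum(digits(index, radix, num_digits)) % radix


def butterfly_operands(stage, group, radix, num_digits):
    """Indices of the r operands of one in-place butterfly.

    At `stage`, the operands differ only in base-r digit number `stage`.
    `group` (0 .. N/r - 1) supplies all the other digits.
    `num_digits` is unused here and kept only for a uniform signature.
    """
    stride = radix ** stage
    low = group % stride            # digits below the varying one
    high = group // stride          # digits above the varying one
    base = high * stride * radix + low
    return [base + d * stride for d in range(radix)]


def is_conflict_free(radix, num_digits):
    """Check that every butterfly of every stage touches r distinct banks."""
    n = radix ** num_digits
    for stage in range(num_digits):
        for group in range(n // radix):
            operands = butterfly_operands(stage, group, radix, num_digits)
            banks = {bank_of(i, radix, num_digits) for i in operands}
            if len(banks) != radix:
                return False
    return True
```

Calling `is_conflict_free(8, 2)` and `is_conflict_free(8, 3)` checks the 64-point and 512-point radix-8 cases mentioned in the abstract. The mapping works because the r operands of a butterfly differ in exactly one digit, so their digit sums are distinct modulo r.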

2.

Design and Implementation of a Mixed-Radix FFT Algorithm Based on FPGA

侯晓晨孟骁陈昊《太赫兹科学与电子信息学报》2021,19(2):303-307

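Resource-efficient, low-complexity design techniques for the mixed-radix fast Fourier transform (FFT) are currently of significant practical value. This paper proposes and implements a new mixed-radix FFT decomposition algorithm on a field-programmable gate array (FPGA) platform. The algorithm is designed around an in-place memory structure and uses a hybrid decomposition that combines prime-factor decomposition with Cooley-Tukey decomposition, which eliminates one step of twiddle-factor multiplication and also reduces storage and computation. A general-purpose butterfly unit module lets the algorithm handle radix-2, radix-3, and radix-4 FFT operations. Simulation results show that the algorithm greatly improves the flexibility of supported FFT sizes and effectively saves computational resources.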
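
A minimal sketch of why a prime-factor decomposition removes a twiddle-factor multiplication step, assuming the standard Good-Thomas index maps (the abstract does not state the exact maps the authors use). For N = n1 × n2 with coprime factors, the input and output indices are remapped so that the transform becomes a plain two-dimensional DFT with no twiddle factors between the two passes. The names `dft` and `pfa_dft` are hypothetical, and `pow(x, -1, m)` needs Python 3.8 or later.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, used here as the small-length building block."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def pfa_dft(x, n1, n2):
    """Good-Thomas prime-factor DFT of length n1*n2 (n1, n2 coprime)."""
    n = n1 * n2
    if len(x) != n or math.gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1*n2 with coprime n1, n2")

    # Input map: n = (n2*i1 + n1*i2) mod N
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)]
            for i1 in range(n1)]

    # Length-n2 DFTs along rows, then length-n1 DFTs along columns.
    # No twiddle multiplication is needed between the two passes.
    rows = [dft(row) for row in grid]
    cols = [dft([rows[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]

    # Output map (Chinese remainder theorem):
    # k = (n2*inv(n2 mod n1)*k1 + n1*inv(n1 mod n2)*k2) mod N
    a = pow(n2, -1, n1)
    b = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * a * k1 + n1 * b * k2) % n] = cols[k2][k1]
    return out
```

For example, `pfa_dft(x, 3, 4)` computes a 12-point DFT from radix-3 and radix-4 sub-transforms. A Cooley-Tukey split of the same length would need an extra twiddle multiplication between the two passes, but it also works when the factors share a common divisor.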

3.

Design of an FFT/IFFT Processor Based on 802.11a

吴斌姜鑫周玉梅《微电子学与计算机》2011,28(4):61-64

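A 64-point FFT/IFFT processor for 802.11a is designed. It uses a single-butterfly, 4-way parallel structure, and a 4-way parallel conflict-free address generation method is proposed, which effectively improves throughput: a 64-point FFT/IFFT takes only 63 clock cycles. The proposed dual ping-pong RAM structure buffers continuous input and output data streams. The design implements both the 64-point FFT and the IFFT, and the word width can be configured freely to suit the system. To improve computational precision, the design uses a block floating-point algorithm, trading off precision against resources. At a 16-bit word width, synthesized in an HJTC 0.18 μm CMOS process, the core area is 0.6267 mm², the chip area is 1.35 mm × 1.27 mm, the maximum operating frequency reaches 300 MHz, and the power consumption is 126.17 mW.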
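
As a rough illustration of the block floating-point idea mentioned in the abstract: a whole block of fixed-point values shares one exponent, and the block is shifted right only when some value no longer fits the word width. This is a generic sketch, not the processor's actual scaling logic. The function name `bfp_scale` and the scale-after-overflow policy are assumptions.

```python
def bfp_scale(values, word_bits=16):
    """Scale a block of integers to fit signed `word_bits`-bit words.

    Returns (scaled_values, shared_exponent), where each original value is
    approximately scaled_value * 2**shared_exponent.
    """
    limit = 1 << (word_bits - 1)      # signed range is [-limit, limit - 1]
    scaled = list(values)
    exponent = 0
    while any(v >= limit or v < -limit for v in scaled):
        # Arithmetic right shift of the whole block (rounds toward -inf).
        scaled = [v >> 1 for v in scaled]
        exponent += 1
    return scaled, exponent
```

In an FFT, this check would typically run once per stage on the butterfly outputs, so small signals keep their low-order bits while large ones are prevented from overflowing.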

4.

A new radix-2/8 FFT algorithm for length-q×2^m DFTs

Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(9):1723-1732

In this paper, a new radix-2/8 fast Fourier transform (FFT) algorithm is proposed for computing the discrete Fourier transform of an arbitrary length N = q×2^m, where q is an odd integer. It substantially reduces operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FFT algorithms. It is shown that the arithmetic complexity (multiplications + additions) of the proposed algorithm is, in most cases, the same as that of the existing split-radix FFT algorithm. The basic idea behind the proposed algorithm is the use of a mixture of radix-2 and radix-8 index maps. The algorithm is expressed in a simple matrix form, thereby facilitating an easy implementation of the algorithm and allowing for an extension to the multidimensional case. As for structural complexity, the important properties of the Cooley-Tukey approach, such as the use of the butterfly scheme and in-place computation, are preserved by the proposed algorithm.

5.

Implementation of an Efficient Embedded Floating-Point FFT Processor (Total citations: 2; self-citations: 0; citations by others: 2)

杨靓黄士坦《信号处理》2003,19(2):161-165

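The FFT is a very important algorithm in digital signal processing. This paper constructs a locally pipelined radix-16 FFT processor structure suited to embedded applications, and designs and implements an efficient radix-4 butterfly module. The work applies the ideas of local pipelining and feedback, so that the radix-16 FFT butterfly module is realized as a feedback pipeline circuit built from two radix-4/radix-2 butterfly modules, which simplifies the structure while raising processing speed. The arithmetic units in the radix-4 butterfly module reach 100% utilization, and the module saves more than 60% of the resources of a conventional radix-4 butterfly module.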

6.

Design and Implementation of a Mixed-Radix Reconfigurable FFT Processor

宋宇鲲曲双双徐礼晗张多利《微电子学与计算机》2020,(1):87-92,98

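This paper proposes a new mixed-radix reconfigurable FFT processor consisting of a new reconfigurable butterfly unit that supports radix-2/3 FFTs and a multi-way parallel conflict-free memory, achieving multi-way data parallelism and continuous operation during the FFT. The design reaches a maximum frequency of 1.06 GHz in a TSMC 28 nm process, and a hardware test system for the mixed-radix FFT processor was built on a Xilinx XC7V2000T FPGA. FPGA hardware tests show that the design supports radix-2, radix-3, and mixed radix-2/3 FFT modes, and that its execution speed reaches the theoretical cycle count for the given number of butterfly multipliers. For single-precision floating-point data, the mixed-radix FFT processor provides a result precision of 10⁻⁵.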

7.

FPGA Design and Implementation of the WFTA Algorithm

魏鹏孙磊王华力《通信技术》2011,44(4):167-169

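The Winograd Fourier transform algorithm (WFTA) exploits the properties of the twiddle factor W to decompose it, minimizing the number of multiplications in the FFT, and is an efficient FFT implementation with relatively low resource usage. Taking the decomposition of a 256-point transform into a two-dimensional 16×16-point small-array WFTA as an example, the paper presents an FPGA design and implementation scheme for large-array WFTA. Simulation tests show that the multiplier resource consumption of the designed 256-point FFT processor is only 1/2 that of a radix-2 FFT and 2/3 that of a radix-4 FFT, and that the computation completes in only 5.8 μs at a 100 MHz main clock frequency, meeting the high-speed real-time requirements of an FFT processor.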

8.

Design and Implementation of an FPGA-Based FFT Processor with a Shift-Register Pipeline Structure

郝小龙韦高刘娜《现代电子技术》2010,33(9):172-176

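A 256-point fixed-point FFT processor based on an FPGA is designed and implemented. Built on the radix-2 algorithm, the processor uses an efficient two-input shift-register pipeline structure, which improves the efficiency of the butterfly unit, reduces register usage, raises the maximum operating frequency, increases data throughput, and gives the processor good scalability. The algorithm structure of the design and the implementation of each module are described in detail. The design uses Verilog HDL as the hardware description language and Quartus II for design, synthesis, and simulation. Simulation results show that the processor operates at 72 MHz and is an efficient FFT processor IP core.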

9.

An accurate error analysis model for fast Fourier transform

Yutai Ma 《Signal Processing, IEEE Transactions on》1997,45(6):1641-1645

An error propagation model is proposed for the in-place decimation-in-time version of the radix-2 FFT algorithm. With the model, an accurate error expression and error variance for the computation of the FFT are derived. This correspondence deals with fixed-point and block floating-point arithmetic. Simulation results agree closely with the theoretically predicted ones. We find that some roundoff errors at different stages correlate with each other. The density of correlations is closely associated with the roundoff approach used in butterfly calculations.

10.

A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems (Total citations: 1; self-citations: 0; citations by others: 1)

Yuan Chen Yu-Wei Lin Yu-Chi Tsao Chen-Yi Lee 《Solid-State Circuits, IEEE Journal of》2008,43(5):1260-1273

This paper presents a new dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO OFDM applications. With the proposed multimode multipath-delay-feedback (MMDF) architecture, our FFT processor can process 1-8-stream 256-point FFTs or a high-speed 256-point FFT in two processing domains at minimum clock frequency for DVFS operations. A parallelized radix-2⁴ FFT algorithm is also employed to reduce the power consumption and hardware cost of complex multipliers. Furthermore, a novel open-loop voltage detection and scaling (OLVDS) mechanism is proposed for fast and robust voltage management. With these schemes, the proposed FFT processor can operate at adequate voltage/frequency under different configurations to support the power-aware feature. A test chip of the proposed FFT processor has been fabricated using a UMC 90 nm single-poly nine-metal CMOS process with a core area of 1.88 × 1.88 mm². The SQNR performance of this FFT chip is over 35.8 dB for QPSK/16-QAM modulation. Power dissipation of 2.4 Gsample/s 256-point FFT computations is about 119.7 mW at 0.85 V. Depending on the operation mode, power can be reduced by 18%-43% with voltage scaling in the TT corner.

11.

Design and Implementation of a Pipelined FFT/IFFT Processor (Total citations: 1; self-citations: 0; citations by others: 1)

何星张铁军侯朝焕《微电子学与计算机》2007,24(4):141-143,147

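To meet the demands of real-time, high-speed signal processing, an efficient FFT processor is designed and implemented. Based on an analysis of the complexity of FFT algorithms and of hardware implementation structures, the processor adopts a decimation-in-frequency radix-4 algorithm, a staged pipeline, and a fixed-point arithmetic structure. It can be configured as required to perform a 4^p-point FFT or IFFT. The processor can run FFTs on multiple input sequences back to back, eliminating the effect of data input and output on latency. On average, each N-point FFT requires only N clock cycles. The entire design is modular, written in Verilog HDL, and implemented on an Altera Cyclone II device.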

12.

A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM (Total citations: 1; self-citations: 0; citations by others: 1)

Maharatna K. Grass E. Jagdhold U. 《Solid-State Circuits, IEEE Journal of》2004,39(3):484-493

In this paper, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for application in an OFDM-based IEEE 802.11a wireless LAN baseband processor. The 64-point FFT is realized by decomposing it into a two-dimensional structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use a two-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25-μm BiCMOS technology. The core area of this chip is 6.8 mm². The average dynamic power consumption is 41 mW at 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i.e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption.
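
To make the shift-and-add idea concrete, here is a small sketch under stated assumptions. The only non-trivial constant inside an 8-point FFT is 1/√2, and the code approximates it as 2⁻¹ + 2⁻³ + 2⁻⁴ + 2⁻⁶ + 2⁻⁸ = 0.70703125. This particular bit pattern and the function names `times_inv_sqrt2` and `rotate_w8` are illustrative choices; the abstract does not give the chip's actual constants.

```python
def times_inv_sqrt2(x):
    """Approximate x / sqrt(2) for an integer x using shifts and adds only.

    Uses 1/sqrt(2) ~= 2**-1 + 2**-3 + 2**-4 + 2**-6 + 2**-8 = 0.70703125.
    Each right shift truncates toward -infinity, so the result can be a few
    least-significant bits below the exact product.
    """
    return (x >> 1) + (x >> 3) + (x >> 4) + (x >> 6) + (x >> 8)


def rotate_w8(re, im):
    """Multiply the fixed-point complex number re + j*im by W_8 = (1 - j)/sqrt(2).

    (re + j*im) * (1 - j) = (re + im) + j*(im - re), followed by the
    multiplierless scaling by 1/sqrt(2).
    """
    return times_inv_sqrt2(re + im), times_inv_sqrt2(im - re)
```

For instance, `times_inv_sqrt2(4096)` evaluates to 2896, against an exact value of about 2896.3. The other 8-point twiddle factors (±1, ±j) need only sign changes and real/imaginary swaps.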

13.

An efficient locally pipelined FFT processor

Liang Yang Kewei Zhang Hongxia Liu Jin Huang Shitan Huang 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2006,53(7):585-589

The fast Fourier transform (FFT) is a very important algorithm in digital signal processing. The locally pipelined (LPPL) architecture is an efficient structure for FFT processor design in real-time embedded systems. Two basic building blocks of the LPPL FFT processor, the pipelined butterfly and address generation, are discussed in this brief. Based on "deep" feedback to butterfly-2, a novel approach for pipelined architectures, the radix-2 single-path deep delay feedback architecture is proposed. For length-N discrete Fourier transform computation, the dominant hardware requirements are minimal: log₄N − 1 complex multipliers and 2log₄N adders. As an integral part of the LPPL FFT processor design, address generation and coefficient store-load structures are also presented.

14.

Design and Implementation of a Configurable FFT/IFFT for WiMAX Systems

刘德福雷天民马卓《电子科技》2010,23(3):17-19

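To address the variable number of subcarriers in WiMAX systems, a high-speed configurable FFT/IFFT is implemented with a mixed radix-2/radix-4 algorithm using a pipelined ping-pong structure. The twiddle factors for the different FFT sizes are stored in a unified table and the RAM units are optimized, saving memory. In addition, the radix-4 butterfly unit is optimized to reduce the number of adders and multipliers. Simulation and synthesis results show that the design meets the FFT/IFFT requirements of high-speed WiMAX systems at different bandwidths.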

15.

Radix-2 FFT butterfly processor using distributed arithmetic

MacTaggart I.R. Jack M.A. 《Electronics letters》1983,19(2):43-44

A parallel-data VLSI architecture for computation of the fast Fourier transform (FFT) is described. The processor is based on a computationally efficient vector rotate algorithm. Use of a 2-dimensional pipeline configuration allows a radix-2 butterfly operation to be performed once every system clock cycle (250 ns) to generate real or imaginary transform components. The architecture is considered to be a computationally efficient VLSI approach for high-bandwidth computation of the FFT. The design and performance of an 8-bit FFT butterfly processor are described.

16.

Cost-Effective Triple-Mode Reconfigurable Pipeline FFT/IFFT/2-D DCT Processor

Chin-Teng Lin Yuan-Chu Yu Lan-Da Van 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(8):1058-1071

This investigation proposes a novel radix-4² algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-4² single delay feedback path (R4²SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of the common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8×8 2D discrete cosine transform (DCT) modes, together with a highly efficient feedback shift register architecture. The segment shift register (SSR) and overturn shift register (OSR) structures are adopted to minimize the register cost of the input re-ordering and post-computation operations in the 8×8 2D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size using the complex conjugate symmetry rule and subexpression elimination. To further decrease the chip cost, a finite wordlength analysis is provided to indicate that the proposed architecture only requires a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8×8 2D DCT mode. The comprehensive comparison results indicate that the proposed cost-effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2D DCT triple-mode chip consumes 22.37 mW at 100 MHz at a 1.2-V supply voltage in a TSMC 0.13-μm CMOS process, which is very appropriate for the RSoC IP of next-generation handheld devices.

17.

Design of a High-Performance FFT Processor Based on Conflict-Free Address Generation

王江黑勇郑晓燕仇玉林《微电子学与计算机》2007,24(3):15-19

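An FFT processor design method based on an interleaved memory architecture is proposed, together with a conflict-free address generation algorithm for the radix-8 FFT; data are processed frame by frame. Each memory is divided into 8 independent banks. By decoding a circular shift register, the butterfly unit reads and writes its operands in parallel without conflicts, and complex multiplications are performed in parallel on the 8 input channels. Each stage is fully pipelined, reducing the clock-cycle overhead, and the inequality conditions that a locally pipelined design must satisfy are derived. The input and output memories operate in ping-pong fashion, alternating by frame, so the FFT accepts input and produces output continuously. The sampling frequency equals the system operating frequency, giving good real-time performance, and precision is ensured by block floating point. The design method can be extended to radix-16 FFT processors.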

18.

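Fast Correlation Coefficient Algorithm for Image Matching and Its Implementation (Total citations: 1; self-citations: 0; citations by others: 1)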

刘红侠杨靓黄巾黄士坦《微电子学与计算机》2007,24(2):32-35

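Maximum normalized cross-correlation is a commonly used image matching algorithm; its key step is computing the correlation coefficient between the live image and the reference image. Because computing the correlation coefficient is expensive, the paper analyzes the relationship between the FFT radix and FFT processing speed, as well as the characteristics of the radix-16 FFT algorithm, and proposes computing the correlation coefficient with a radix-16 FFT, which greatly reduces the processing time. To address the design complexity and inflexibility of high-radix butterfly units, it proposes implementing the main radix-16 butterfly unit by cascading, which lowers the processor's design complexity. Experiments show that using the main radix-16 FFT processor to compute the correlation coefficient brings the processing speed of maximum normalized cross-correlation image matching to an internationally leading level.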
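
The reason an FFT helps here is the correlation theorem: the cross-correlation sums that dominate the correlation coefficient can be obtained from a product in the frequency domain. The sketch below shows this for one-dimensional real sequences with circular indexing, using a naive DFT in place of the paper's radix-16 FFT. It covers only the raw correlation sum, not the mean removal and normalization that the full coefficient needs. All function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT or inverse DFT (stand-in for a fast radix-16 FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circular_xcorr_direct(a, b):
    """c[m] = sum_j a[j] * b[(j + m) mod N], computed directly in O(N^2)."""
    n = len(a)
    return [sum(a[j] * b[(j + m) % n] for j in range(n)) for m in range(n)]


def circular_xcorr_fft(a, b):
    """Same correlation for real a, b via C[k] = conj(A[k]) * B[k]."""
    fa, fb = dft(a), dft(b)
    product = [u.conjugate() * v for u, v in zip(fa, fb)]
    return [v.real for v in dft(product, inverse=True)]
```

With a true FFT in place of `dft`, the frequency-domain route costs O(N log N) instead of O(N²), which is where the speed-up described in the abstract comes from.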

19.

Research on Design Techniques for a 16-Point Radix-4 FFT Chip (Total citations: 2; self-citations: 0; citations by others: 2)

丁晓磊朱恩赵梅《信息技术》2007,31(1):64-67,71

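The FFT is one of the key algorithms in high-speed real-time signal processing and is widely used in many fields. This paper uses a radix-4 decimation-in-time FFT algorithm to design a 16-point fixed-point complex FFT with a 32-bit word length. The radix-4 butterfly unit uses a 32-bit Booth multiplier and a 3-stage pipeline and processes four input data paths in parallel, greatly increasing FFT processing speed. The design is partitioned into several functional modules, all described in Verilog HDL and verified by simulation.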
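
For reference, the core of a radix-4 butterfly is a 4-point DFT that needs only additions, subtractions, and multiplications by ±j (a swap of real and imaginary parts with a sign change). The sketch below shows that core in floating point. It leaves out the twiddle-factor multipliers, the fixed-point arithmetic, and the pipelining described in the abstract, and the function name `radix4_butterfly` is hypothetical.

```python
def times_minus_j(z):
    """Multiply by -j without a multiplier: (re, im) -> (im, -re)."""
    return complex(z.imag, -z.real)


def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only adds and a -j rotation.

    X0 = a + b + c + d
    X1 = a - j*b - c + j*d
    X2 = a - b + c - d
    X3 = a + j*b - c - j*d
    """
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = times_minus_j(b - d)       # -j * (b - d)
    return t0 + t2, t1 + t3, t0 - t2, t1 - t3
```

In a decimation-in-time 16-point FFT, two stages of four such butterflies would be used, with the inputs of the second stage multiplied by twiddle factors first.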

20.

FPGA Implementation of a Fast Fourier Transform Algorithm

蒋华《太赫兹科学与电子信息学报》2008,6(6)

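Four typical structures for the fast Fourier transform (FFT) algorithm are analyzed, and a recursive structure with a single decimation-in-time radix-2 butterfly unit is proposed. A 64-point FFT was verified by simulation; implemented on a Cyclone EP1C6T144C7, it occupies 967 logic elements and reaches a maximum frequency of 56.47 MHz. Reducing the number of multiplications in the butterfly unit and using a ping-pong RAM structure saves hardware resources and speeds up the FFT.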
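
A software analogue of a single-butterfly structure is an in-place FFT loop that reuses one butterfly routine for every pair of operands in every stage. The sketch below is a textbook radix-2 decimation-in-time FFT in floating point, offered only as an illustration. It does not model the paper's reduced-multiplication butterfly or its ping-pong RAM, and the function names are hypothetical.

```python
import cmath


def bit_reverse_permute(data):
    """Reorder `data` in place into bit-reversed index order."""
    n = len(data)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:          # propagate the carry of a reversed increment
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            data[i], data[j] = data[j], data[i]


def butterfly(a, b, w):
    """The single radix-2 DIT butterfly reused for the whole transform."""
    t = b * w
    return a + t, a - t


def fft_radix2_dit(data):
    """In-place radix-2 DIT FFT; len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bit_reverse_permute(data)
    half = 1
    while half < n:                                  # one pass per stage
        step = cmath.exp(-1j * cmath.pi / half)      # W_{2*half}
        for start in range(0, n, 2 * half):
            w = 1 + 0j
            for k in range(half):
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], w)
                w *= step
        half *= 2
    return data
```

A 64-point transform, as in the abstract, runs 6 stages of 32 butterflies each, so a design with one butterfly unit performs 192 butterfly operations per transform.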