期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

蒋蓝祥《电视技术》2013,37(1)

提出一种新的非2-基N点FFT的素因子算法.该方案与原素因子分解算法比较,实现了各个小点数DFT的同址顺序运算,并通过简单的地址模加运算得到顺序的输出,省去了多余的整序运算,是一种通用N点FFT算法.设计结构规整简单,利于硬件实现.以中国数字电视广播地面传输标准(DTMB)规定的3 780点FFT为例,结合WFTA算法和混合基算法,介绍了算法的具体设计与实现方案. 相似文献

2.

混合基FFT算法运算量分析

下载免费PDF全文

程志鹏马琪竺红卫《太赫兹科学与电子信息学报》2016,14(6):925-928

通过理论推导、定量分析和实验设计的研究方法分析了非2整数次幂点数N的混合基快速傅里叶变换(FFT)算法运算量大小与N分解因子的不同组合方式以及组合次序的关系。实验结果表明,在一定条件下,对于相同FFT点数N的混合基FFT的不同分解因子组合,其算法运算量与所有分解因子总和K的大小有关,但与因子的组合次序无关。最后提出了建立混合基FFT最小运算量的分解因子匹配库作为使用混合基FFT时的分解因子组合选择参考表的设想。为相关研究和实际应用的工程人员提供一定参考。相似文献

3.

高速3780点FFT处理器的设计与实现

解春云刘光辉牛俊峰《电视技术》2012,36(5):1-4,11

介绍了地面数字电视多媒体广播传输系统(DTMB)中3 780点FFT处理器的设计与实现。通过综合利用混合基算法、素因子算法和WFTA算法,来完成3 780点FFT的算法设计。采用流水线结构进行硬件实现,为进一步提高系统吞吐率,其内部3,4,5,7,9点WFTA运算单元均采用并行数据处理方式。相似文献

4.

基于CORDIC旋转器的基-3 FFT算法高效设计

下载免费PDF全文

周群群许思耀姚亚峰付东兵《电子器件》2023,46(2):342-348

设计出一种可以用于FPGA高效实现的基-3 FFT算法,采用改进的三端前馈延迟转换器结构,优化了延迟和运算过程。针对蝶形运算中复数乘法器占据大量内存的问题,引入了CORDIC旋转器实现输入与旋转因子相乘的运算,可以降低乘法运算的复杂度,该CORDIC旋转器采用改进的高基CORDIC算法,解决了传统的CORDIC算法迭代次数多、延迟大的问题,从而达到高吞吐率要求。该基-3 FFT算法以寻址变序、流水处理的方式,可以满足最高运行频率为404 MHz的FFT处理要求。与基于传统复数乘法器的基-3 FFT算法相比,基于CORDIC旋转器的基-3 FFT算法使功耗平均减少了22%,使总延迟平均减少了29%。相似文献

5.

基于反馈控制结构的FFT信号处理器设计与FPGA实现

宗爱华张双吴恙杨维明《信息通信》2012,(3):49-51

研究了基于FPGA的基-2 FFT算法的设计与实现。为减小硬件资源开销,论文采用蝶形运算单元和控制器单元构成的反馈结构对基-2 FFT处理器的硬件j结构进行了总体设计,采用时序控制方法完成蝶形运算电路设计,采用同步有限状态机(FSM,finite state machine)方法实现了旋转因子系数的产生与控制。并基于Quartus II软件平台,完成了整个FFT处理器电路的FPGA实现,最后通过仿真验证了设计方案的正确性。相似文献

6.

一种基于高阶近似核DFT的快速实现算法 总被引：1，自引：0，他引：1

吕远唐斌祝俊《电子信息对抗技术》2009,24(2):19-22

理论分析优化近似核和基2DIT—FFT结构,提出并实现了一种高阶近似核DFT的快速算法。算法基于高阶近似核,无需三角运算实现FFT并提高了动态范围,基于DIT—FFT算法对DFT进行分解和蝶形运算,有效减少了运算量。理论分析和实验结果验证了方法的有效性,DSP硬件验证了算法的快速性。算法简单且具有广泛的适用性。相似文献

7.

OFDM系统中FFT算法的FPGA设计与实现

陈安乐国世超韩方景《信息技术》2009,33(12):70-73

针对FFT算法在OFDM系统中的应用,对一般的FFT算法进行比较分析,设计了一种便于FPGA硬件实现的基4 FFT算法结构。该实现结构的设计以简化电路结构,节省硬件资源,便于扩展维护为目的,以第一级运算为基础实现多级FFT运算,采用了电路复用技术,以一种新的数据排序方式实现正序输入,正序输出,简化旋转因子的排列,并对一些相关的关键技术进行了设计改进。本设计在ISE10.1平台采用VHDL语言编程实现,并通过了仿真验证。相似文献

8.

FFT算法的一种FPGA设计

陆旦前陈建平陈晓勇《现代电子技术》2007,30(6):178-181

在分析了快速傅里叶算法理论的基础上,提出了一种频率抽取基4FFT的FPGA设计方案,针对现有FFT的FPGA实现过程中蝶形运算需要频繁乘以多个旋转因子提出了改进方法,减少了旋转因子的乘法次数和存储空间,加快了蝶形运算的速度,设计的地址映射方法,无需运算即可得到所需数据的存放地址,并结合采用乒乓结构和流水线方式,来提高快速傅里叶变换(FFT)FPGA实现的速度,为实现FFT算法提供了一定的参考价值。相似文献

9.

WFTA算法的FPGA设计与实现

魏鹏孙磊王华力《通信技术》2011,44(4):167-169

Winograd傅里叶变换算法（WFTA）利用旋转因子W的特性对其进行分解,能够把FFT运算中乘法次数降到最低,是一种高效且资源占用相对较少的FFT实现方法。以256点分解为两维16×16点的小数组WFTA进行运算为例介绍了大数组WFTA算法的FPGA设计与实现方案。仿真测试表明,所设计的256点FFT处理器,乘法器资源消耗仅为基-2FFT的1/2、基-4FFT的2/3,且在100 MHz主时钟频率下完成运算仅需5.8μs,满足FFT处理器的高速实时性要求。相似文献

10.

基于FPGA的4k点基-16 FFT模块的实现

苏彦鹏张汉富韩磊《电子与封装》2007,7(9):8-11

针对高速实时处理的要求,提出了4096点快速傅立叶变换(FFT)模块在现场可编程门阵列(FPGA)中的设计和实现。在运算模块中,基于按频率抽取基-4算法提出了一种新型的基-16蝶型算法,并采用八级流水结构和四路转换器来实现。本文采用块浮点和循环存储结构,避免了溢出和节省了大量的硬件资源。实验结果表明,该方法在保证了运算精度和实现复杂度的同时,使运算速度相对于基-4算法提高了1倍。相似文献

11.

一种按时间抽取的混合基实序列高效FFT算法 总被引：2，自引：1，他引：1

张卉刘永刚阎跃鹏《微电子学与计算机》2008,25(11)

针对2N点实序列FFT的实现,分析了FFT运算的基本原理,并在基本原理的基础上介绍了一种按时间抽取的混合基FFT算法.此算法采用"包装"算法和基2-基4混合算法结合的方法进行运算.通过复杂度分析,显示了此算法与传统的单一基2或基4的FFT相比,大大减少了计算过程中所需的实加法的个数;当点数大于1024时,所需实乘法的个数也有所减少.这是一种实序列FFT的高效低复杂度算法. 相似文献

12.

New continuous-flow mixed-radix (CFMR) FFT Processor using novel in-place strategy 总被引：1，自引：0，他引：1

Jo B.G. Sunwoo M.H. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(5):911-919

The paper proposes a new continuous-flow mixed-radix (CFMR) fast Fourier transform (FFT) processor that uses the MR (radix-4/2) algorithm and a novel in-place strategy. The existing in-place strategy supports only a fixed-radix FFT algorithm. In contrast, the proposed in-place strategy can support the MR algorithm, which allows CF FFT computations regardless of the length of FFT. The novel in-place strategy is made by interchanging storage locations of butterfly outputs. The CFMR FFT processor provides the MR algorithm, the in-place strategy, and the CF FFT computations at the same time. The CFMR FFT processor requires only two N-word memories due to the proposed in-place strategy. In addition, it uses one butterfly unit that can perform either one radix-4 butterfly or two radix-2 butterflies. The CFMR FFT processor using the 0.18 /spl mu/m SEC cell library consists of 37,000 gates excluding memories, requires only 640 clock cycles for a 512-point FFT and runs at 100 MHz. Therefore, the CFMR FFT processor can reduce hardware complexity and computation cycles compared with existing FFT processors. 相似文献

13.

Mixed-radix,virtually scaling-free CORDIC algorithm based rotator for DSP applications

《Integration, the VLSI Journal》2021

In this work, we proposed a novel Coordinate Rotation DIgital Computer (CORDIC) rotator algorithm that converges faster by performing radix-2,4 and 16 CORDIC iterations while maintaining the scale factor implicitly constant. A mixed-radix is used to achieve convergence faster to reduce the computational latency of the CORDIC algorithm. The main concern of the higher radix CORDIC algorithm is the compensation of a variable scale factor. To solve this problem, the Taylor series approximation of sine and cosine is proposed for a higher radix CORDIC algorithm to achieve the scaling-free rotation of the two-dimensional vector. The scaling-free rotation of the proposed CORDIC algorithm removes the read-only memory (ROM) needed to store scale factor of higher radix CORDIC algorithm. Further, the proposed CORDIC algorithm is designed in rotation mode and optimized by removing the Z datapath for the digital signal processing (DSP) applications for which the angle of rotation is known in advance. Finally, the multipath delay commutator (MDC) fast Fourier transform (FFT) algorithm is implemented with the proposed CORDIC algorithm based rotator on FPGA. The proposed design is compared with existing designs. In a comparison between the radix-16 CORDIC rotator based FFT implementation and our proposed implementation, it has been found out that implementation proposed in this article has used 17% fewer resources. 相似文献

14.

Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing

Lee H.-Y. Park I.-C. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(4):889-900

This paper presents an area-efficient algorithm for the pipelined processing of fast Fourier transform (FFT). The proposed algorithm is to decompose a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored into tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length to make the transform lengths of sub-DFTs resulting from the decomposition as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-2² algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator of twiddle factor and add/subtract units combined with the two's complement operation 相似文献

15.

A 2.4-Gsample/s DVFS FFT Processor for MIMO OFDM Communication Systems 总被引：1，自引：0，他引：1

Yuan Chen Yu-Wei Lin Yu-Chi Tsao Chen-Yi Lee 《Solid-State Circuits, IEEE Journal of》2008,43(5):1260-1273

This paper presents a new dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO OFDM applications. By the proposed multimode multipath-delay-feedback (MMDF) architecture, our FFT processor can process 1-8-stream 256-point FFTs or a high-speed 256-point FFT in two processing domains at minimum clock frequency for DVFS operations. A parallelized radix-2⁴ FFT algorithm is also employed to save the power consumption and hardware cost of complex multipliers. Furthermore, a novel open-loop voltage detection and scaling (OLVDS) mechanism is proposed for fast and robust voltage management. With these schemes, the proposed FFT processor can operate at adequate voltage/frequency under different configurations to support the power-aware feature. A test chip of the proposed FFT processor has been fabricated using UMC 90 nm single-poly nine-metal CMOS process with a core area of 1.88 times1.88 mm² . The SQNR performance of this FFT chip is over 35.8 dB for QPSK/16-QAM modulation. Power dissipation of 2.4 Gsample/s 256-point FFT computations is about 119.7 mW at 0.85 V. Depending on the operation mode, power can be saved by 18%-43% with voltage scaling in TT corner. 相似文献

16.

Designing Fast Fourier Transform Accelerators for Orthogonal Frequency-Division Multiplexing Systems

Waqar Hussain Fabio Garzia Tapani Ahonen Jari Nurmi 《Journal of Signal Processing Systems》2012,69(2):161-171

Designing accelerators for the real-time computation of Fast Fourier Transform (FFT) algorithms for state-of-the-art Orthogonal Frequency-Division Multiplexing (OFDM) demodulators has always been challenging. We have scaled-up a template-based Coarse-Grain Reconfigurable Array device for faster FFT processing that generates special purpose accelerators based on the user input. Using a basic and a scaled-up version, we have generated a radix-4 and mixed-radix (2, 4) FFT accelerator to process different length and types of algorithms. Our implementation results show that these accelerators satisfy not only the execution time requirements of FFT processing for Single Input Single Output (SISO) wireless standards that are IEEE-802.11 a/g and 3GPP-LTE but also for Multiple Input Multiple Output (MIMO) IEEE-802.11n standard. 相似文献

17.

快速图像匹配相关系数算法及实现 总被引：1，自引：0，他引：1

刘红侠杨靓黄巾黄士坦《微电子学与计算机》2007,24(2):32-35

最大归一互相关图像匹配算法是图像匹配中的常用算法,其关键是解算活动图与基准图间的相关系数。针对相关系数计算量大的特点,分析了FFT的基与FFT处理速度之间的关系以及基16FFT算法特点,提出用基16FFT算法计算相关系数,相关系数的处理时间大幅减小;同时针对高基蝶形单元设计复杂、使用不灵活等特点,提出采用级连思想实现主基16蝶形单元,使处理器的设计复杂度降低。实验证明,将主基16FFT处理器用于相关系数的计算中,使最大归一互相关图像匹配处理速度达到国际领先水平。相似文献

18.

Mixed-Radix Algorithm for the Computation of Forward and Inverse MDCTs

Jiasong Wu Huazhong Shu Senhadji L. Limin Luo 《IEEE transactions on circuits and systems. I, Regular papers》2009,56(4):784-794

The modified discrete cosine transform (MDCT) and inverse MDCT (IMDCT) are two of the most computationally intensive operations in MPEG audio coding standards. A new mixed-radix algorithm for efficiently computing the MDCT/IMDCT is presented. The proposed mixed-radix MDCT algorithm is composed of two recursive algorithms. The first algorithm, called the radix-2 decimation-in-frequency algorithm, is obtained by decomposing an N-point MDCT into two MDCTs with the length N/2. The second algorithm, called the radix-3 decimation-in-time algorithm, is obtained by decomposing an N -point MDCT into three MDCTs with the length N/3. Since the proposed MDCT algorithm is also expressed in the form of a simple sparse matrix factorization, the corresponding IMDCT algorithm can be easily derived by simply transposing the matrix factorization. Comparison of the proposed algorithm with some existing ones shows that our proposed algorithm is more suitable for parallel implementation and particularly suitable for the layer III of MPEG-1 and MPEG-2 audio encoding and decoding. Moreover, the proposed algorithm can be easily extended to the multidimensional case by using the vector-radix method. 相似文献

19.

FPGA implementation of high-performance,resource-efficient Radix-16 CORDIC rotator based FFT algorithm

《Integration, the VLSI Journal》2020

The fast Fourier transform (FFT) is an algorithm widely used to compute the discrete Fourier transform (DFT) in real-time digital signal processing. High-performance with fewer resources is highly desirable for any real-time application. Our proposed work presents the implementation of the radix-2 decimation-in-frequency (R2DIF) FFT algorithm based on the modified feed-forward double-path delay commutator (DDC) architecture on FPGA device. Need for a complex multiplier to carry out the multiplication of complex twiddle factors and large memory to store the twiddle factors are the main concerns for FFT implementation. Propose work aims to address these issues. In this work, a high-performance radix-16 COordinate Rotational DIgital Computer (CORDIC) algorithm based rotator is proposed to carry out the complex twiddle factor multiplication. Further, CORDIC needs only rotational angles to carry out complex multiplication, which reduces the need for large memory to store the twiddle factors. To compute the total rotation for n-bit precision, our proposed radix-16 CORDIC algorithm takes n/4 iteration as compared to n iteration of the radix-2 CORDIC algorithm. Our proposed architecture of the radix-2 decimation-in-frequency (R2DIF) algorithm is implemented on a Virtex−7 series FPGA. Further, the detailed comparison is presented between our proposed FFT implementation and other recently proposed FFT implementations. Experimental results suggest that proposed implementation has less latency and hardware utilization as compared to recently proposed implementations. 相似文献