1.

Scalable and modular memory-based systolic architectures for discrete Hartley transform

Meher P.K. Srikanthan T. Patra J.C. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(5):1065-1077

In this paper, we present a design framework for scalable memory-based implementation of the discrete Hartley transform (DHT) using simple and efficient systolic and systolic-like structures for short and prime transform lengths, as well as for lengths 4 and 8. We have used the proposed short-length structures to construct highly modular architectures for higher transform lengths by a new prime-factor implementation approach. The structures proposed for the prime-factor DHT, interestingly, do not involve any transposition hardware/time. Besides, it is shown here that an N-point DHT can be computed efficiently from two (N/2)-point DHTs of its even- and odd-indexed input subsequences in a recursive manner using a ROM-based multiplication stage. Apart from flexibility of implementation, the proposed structures offer significantly lower area-time complexity compared with the existing structures. The proposed schemes of computation of the DHT can conveniently be scaled not only for higher transform lengths but also according to the hardware constraint or the throughput requirement of the application.
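
The even/odd recursion mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the standard radix-2 decimation-in-time identity for the DHT, not the authors' architecture: the function names `dht_direct` and `dht_recursive` are hypothetical, and the floating-point multiplications stand in for the ROM-based multiplication stage described in the paper.

```python
import math

def dht_direct(x):
    """O(N^2) DHT: H[k] = sum_n x[n] * cas(2*pi*n*k/N), cas(t) = cos(t) + sin(t)."""
    n = len(x)
    return [
        sum(x[j] * (math.cos(2 * math.pi * j * k / n) + math.sin(2 * math.pi * j * k / n))
            for j in range(n))
        for k in range(n)
    ]

def dht_recursive(x):
    """DHT built from the DHTs of the even- and odd-indexed subsequences."""
    n = len(x)
    if n == 1:
        return [float(x[0])]
    if n % 2:
        # Odd lengths cannot be halved; fall back to the direct sum.
        return dht_direct(x)
    half = n // 2
    he = dht_recursive(x[0::2])  # (N/2)-point DHT of even-indexed samples
    ho = dht_recursive(x[1::2])  # (N/2)-point DHT of odd-indexed samples
    out = []
    for k in range(n):
        c = math.cos(2 * math.pi * k / n)
        s = math.sin(2 * math.pi * k / n)
        # cas(a + b) = cos(b) * cas(a) + sin(b) * cas(-a) yields the combine step.
        out.append(he[k % half] + c * ho[k % half] + s * ho[(-k) % half])
    return out
```

Each output needs both `ho[k]` and `ho[-k]`, which is the main structural difference from the corresponding FFT butterfly.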

2.

New Systolic Algorithm and Array Architecture for Prime-Length Discrete Sine Transform

Meher P.K. Swamy M.N.S. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(3):262-266

Using a simple input-regeneration approach and index-transformation techniques, a new formulation is presented in this paper for computing an N-point prime-length discrete sine transform (DST) through two pairs of [(N-1)/4]-point cyclic convolutions, where [(N-1)/4] is an odd number. The cyclic convolution-based algorithm is used further to obtain a simple, regular, and locally connected linear systolic array for concurrent pipelined implementation of the DST. It is shown that the proposed systolic structure involves significantly less area-time complexity compared with that of the existing structures.

3.

Hardware-Efficient Systolic-Like Modular Design for Two-Dimensional Discrete Wavelet Transform

Meher P.K. Mohanty B.K. Chandra Patra J. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2008,55(2):151-155

A systolic-like modular architecture is presented for hardware-efficient implementation of two-dimensional (2-D) discrete wavelet transform (DWT). The overall computation is decomposed into two distinct stages, where column processing is performed in stage-1, while row processing is performed in stage-2. Using a new data-access scheme and a novel folding technique, the computations of both stages are performed concurrently for transposition-free implementation of 2-D DWT. The proposed design can offer nearly the same throughput rate, and requires the same number of, or fewer, adders and multipliers as the best of the existing structures. The storage space is found to occupy most of the area in the existing 2-D DWT structures, but the proposed structure does not require any on-chip or off-chip storage of input samples or storage/transposition of intermediate output. The proposed one, therefore, involves considerably less hardware complexity compared with the existing structures. Apart from that, it has a shorter cycle period than the existing structures, and has a latency of cycles while all the existing structures have latency of cycles, the filter order being small compared to the input size.

4.

Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform

Tao L Kwan HK 《IEEE transactions on image processing》2012,21(7):3306-3311

Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which makes them attractive for real-time image processing.

5.

An Integer Approximation Method for Discrete Sinusoidal Transforms

R. J. Cintra 《Circuits, Systems, and Signal Processing》2011,30(6):1481-1501

Approximate methods have been considered as a means of evaluating discrete transforms. In this work, we propose and analyze a class of integer transforms for the discrete Fourier, Hartley, and cosine transforms (DFT, DHT, and DCT), based on simple dyadic rational approximation methods. The introduced method is general and applicable to several blocklengths, whereas existing approaches are usually dedicated to specific transform sizes. The suggested approximate transforms enjoy low multiplicative complexity, and the orthogonality property is achievable via matrix polar decomposition. We show that the obtained transforms are competitive with archived methods in the literature. New 8-point square wave approximate transforms for the DFT, DHT, and DCT are also introduced as particular cases of the introduced methodology.
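
To make the idea of a dyadic rational approximation concrete, the sketch below rounds every entry of the exact DHT matrix to the nearest multiple of 2^-bits. The rounding rule and the names `dht_matrix`, `dyadic_round`, and `approx_dht_matrix` are assumptions chosen for illustration; the paper's actual approximation method and its polar-decomposition step are not reproduced here.

```python
import math
from fractions import Fraction

def dht_matrix(n):
    """Exact (floating-point) DHT matrix with entries cas(2*pi*j*k/n)."""
    return [
        [math.cos(2 * math.pi * j * k / n) + math.sin(2 * math.pi * j * k / n)
         for j in range(n)]
        for k in range(n)
    ]

def dyadic_round(value, bits):
    """Nearest rational of the form m / 2**bits (an assumed rounding rule)."""
    scale = 1 << bits
    return Fraction(round(value * scale), scale)

def approx_dht_matrix(n, bits):
    """DHT matrix whose entries are all dyadic rationals."""
    return [[dyadic_round(v, bits) for v in row] for row in dht_matrix(n)]
```

Because every entry has a power-of-two denominator, multiplying by it reduces to integer additions and bit shifts, which is the source of the low multiplicative complexity such approximations aim for. The per-entry error of this particular rounding rule is at most 2^-(bits+1).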

6.

Multi-mode parallel and folded VLSI architectures for 1D-fast Fourier transform

《Integration, the VLSI Journal》2016

Modern real-time applications such as orthogonal frequency division multiplexing demand high-performance fast Fourier transform (FFT) designs with less area and fewer clock cycles. This paper proposes efficient FFT VLSI architectures using folded/parallel implementation. In the proposed folded FFT architecture, the number of cycles required to complete the operation is less than that of single path delay feedback (SDF)/multi-path delay commutator (MDC) architectures. In the proposed parallel FFT architecture, an N-point FFT is implemented by using one N/2-point FFT without much extra hardware. Both the proposed architectures are implemented for radix-2, 2², and 4 using a 45 nm technology library. The proposed parallel architecture achieves 56.7% and 40.6% area reduction compared with the existing parallel-architecture-based 16-point radix-2 and radix-2² DIF FFTs, respectively. The proposed folded architecture achieves 65.5%, 51.1%, and 35.8% worst-path-delay reduction compared with the existing SDF-based 16-point radix-2, radix-2², and radix-4 DIF FFTs, respectively.

7.

RAPID PROTOTYPING - Framework for FPGA-based discrete biorthogonal wavelet transforms implementation

Uzun I.S. Amira A. 《Vision, Image and Signal Processing, IEE Proceedings -》2006,153(6):721-734

The discrete wavelet transform has taken its place at the forefront of research for the development of signal and image processing applications. These wavelet-based approaches have outperformed existing strategies in many areas including telecommunication, numerical analysis and, most notably, image/video compression. The authors present an investigation into the design and implementation of 1-D and 2-D discrete biorthogonal wavelet transforms (DBWTs) using a field programmable gate array (FPGA)-based rapid prototyping environment. The proposed architectures for DBWTs are scalable, modular and have less area and time complexity when compared with existing structures. FPGA implementation results based on a Xilinx Virtex-2000E device have shown that the proposed system provides an efficient solution for the processing of DBWTs in real time.

8.

Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST

Chiper D.F. Swamy M.N.S. Ahmad M.O. Stouraitis T. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(6):1125-1137

In this paper, an efficient design approach for a unified very large-scale integration (VLSI) implementation of the discrete cosine transform/discrete sine transform/inverse discrete cosine transform/inverse discrete sine transform based on an appropriate formulation of the four transforms into cyclic convolution structures is presented. This formulation allows an efficient memory-based systolic array implementation of the unified architecture using dual-port ROMs and appropriate hardware sharing methods. The performance of the unified design is compared to that of some of the existing ones. It is found that the proposed design provides a superior performance in terms of the hardware complexity, speed, and I/O costs, in addition to such features as regularity, modularity, pipelining capability, and local connectivity, which make the unified structure well suited for VLSI implementation.

9.

A new split-radix FHT algorithm for length-q*2^m DHTs

Bouguezel S. Ahmad M.O. Swamy M.N.S. 《IEEE transactions on circuits and systems. I, Regular papers》2004,51(10):2031-2043

In this paper, a new split-radix fast Hartley transform (FHT) algorithm is proposed for computing the discrete Hartley transform (DHT) of an arbitrary length N=q*2^m, where q is an odd integer. The basic idea behind the proposed FHT algorithm is that a mixture of radix-2 and radix-8 index maps is used in the decomposition of the DHT. This idea and the use of an efficient indexing process lead to a new decomposition different from that of the existing split-radix FHT algorithms, since the existing ones are all based on the use of a mixture of radix-2 and radix-4 index maps. The proposed algorithm substantially reduces operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which contribute significantly to the execution time of FHT algorithms. It is shown that the arithmetic complexity (multiplications+additions) of the proposed algorithm is, in almost all cases, the same as that of the existing split-radix FHT algorithm for length-q*2^m DHTs. Since the proposed algorithm is expressed in a simple matrix form, it facilitates an easy implementation of the algorithm, and allows for an extension to the multidimensional case.

10.

Radix-3×3 Algorithm for the 2-D Discrete Hartley Transform

Wu J.S. Shu H.Z. Senhadji L. Luo L.M. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2008,55(6):566-570

In this correspondence, we propose a vector-radix algorithm for the fast computation of a 2-D discrete Hartley transform (DHT). For data sequences whose length is a power of three, a radix-3×3 decimation-in-frequency algorithm is developed. It decomposes a length-N×N DHT into nine length-(N/3)×(N/3) DHTs. Comparison of the computational complexity with known algorithms shows that the proposed algorithm, in some cases, significantly reduces the number of arithmetic operations.

11.

Low Complexity Discrete Hartley Transform Precoded OFDM System over Frequency‐Selective Fading Channel

Xing Ouyang Jiyu Jin Guiyue Jin Peng Li 《ETRI Journal》2015,37(1):32-42

Orthogonal frequency‐division multiplexing (OFDM) suffers from spectral nulls of frequency‐selective fading channels. Linear precoded (LP‐) OFDM is an effective method that guarantees symbol detectability by spreading the frequency‐domain symbols over the whole spectrum. This paper proposes a computationally efficient and low‐cost implementation for discrete Hartley transform (DHT) precoded OFDM systems. Compared to conventional DHT‐OFDM systems, at the transmitter, both the DHT and the inverse discrete Fourier transform are replaced by a one‐level butterfly structure that involves only one addition per symbol to generate the time‐domain DHT‐OFDM signal. At the receiver, only the DHT is required to recover the distorted signal with a single‐tap equalizer in contrast to both the DHT and the DFT in the conventional DHT‐OFDM. Theoretical analysis of DHT‐OFDM with linear equalizers is presented and confirmed by numerical simulation. It is shown that the proposed DHT‐OFDM system achieves similar performance when compared to other LP‐OFDMs but exhibits a lower implementation complexity and peak‐to‐average power ratio.

12.

CORDIC based fast algorithm for power-of-two point DCT and its efficient VLSI implementation

《Microelectronics Journal》2014,45(11):1480-1488

In this paper, we present a coordinate rotation digital computer (CORDIC) based fast algorithm for power-of-two point DCT, and develop its corresponding efficient VLSI implementation. The proposed algorithm has some distinct advantages, such as regular Cooley-Tukey FFT-like data flow, identical post-scaling factor, and arithmetic-sequence rotation angles. By using the trigonometric formula, the number of CORDIC types is reduced dramatically. This leads to an efficient method for overcoming the lack of synchronization among CORDICs with different rotation angles. By fully reusing the uniform processing element (PE), for 8-point DCT, only four carry-save-adder (CSA)-based PEs of two different types are required. Compared with other known architectures, the proposed 8-point DCT architecture has higher modularity, lower hardware complexity, higher throughput and better synchronization.
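
For readers unfamiliar with CORDIC, the following sketch shows the generic rotation-mode iteration that such architectures build on. It is a textbook floating-point illustration, not the paper's DCT algorithm or its processing elements; the function name `cordic_rotate` and the default of 24 iterations are assumptions. In hardware, the multiplications by `shift` become wired bit shifts and the division by `gain` becomes a single post-scaling step.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians using shift-and-add micro-rotations.

    Converges for |angle| up to roughly 1.74 rad (the sum of all atan(2**-i)).
    """
    # Elementary rotation angles and the accumulated gain of the micro-rotations.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    # Compensate the constant gain with one post-scaling step.
    return x / gain, y / gain
```

The gain depends only on the number of iterations, not on the angle, which is why a single identical post-scaling factor can be shared across rotators.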

13.

A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic

《Integration, the VLSI Journal》2016

In this paper, we have analyzed the register complexity of direct-form and transpose-form structures of FIR filter and explored the possibility of register reuse. We find that the direct-form structure involves significantly fewer registers than the transpose-form structure, and it allows register reuse in parallel implementation. We analyze further the LUT consumption and other resources of DA-based parallel FIR filter structures, and find that the input delay unit, coefficient storage unit and partial product generation unit are also shared besides LUT words when multiple filter outputs are computed in parallel. Based on these findings, we propose a design approach, and use it to derive a DA-based architecture for reconfigurable block-based FIR filter, which is scalable for larger block-sizes and higher filter-lengths. Interestingly, the number of registers of the proposed structure does not increase proportionately with the block-size. This is a major advantage for area-delay and energy efficient high-throughput implementation of reconfigurable FIR filters of higher block-sizes. Theoretical comparison shows that the proposed structure for block-size 8 and filter-length 64 involves 60% more flip-flops, 6.2 times more adders, 3.5 times more AND-OR gates, and offers 8 times higher throughput. ASIC synthesis result shows that the proposed structure for block-size 8 and filter-length 64 involves 1.8 times less area-delay product (ADP) and energy per sample (EPS) than the existing design, and it can support 8 times higher throughput. The proposed structure for block sizes 4 and 8, respectively, consumes 38% and 50% less power than the existing structure for the same throughput rates on average for different supply voltages.
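
The distributed-arithmetic (DA) principle behind such filters can be illustrated with a small bit-serial inner product. The sketch assumes unsigned input samples, integer coefficients, and a single lookup table covering all taps; the names `build_lut` and `da_inner_product` are hypothetical, and none of the paper's register-sharing or block-processing structure is modeled.

```python
def build_lut(coeffs):
    """LUT entry `addr` holds the sum of the coefficients whose bit is set in `addr`."""
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << len(coeffs))
    ]

def da_inner_product(lut, samples, bits):
    """Compute sum(coeffs[i] * samples[i]) with LUT reads, shifts, and adds only.

    Assumes every sample is an unsigned integer below 2**bits.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for i, s in enumerate(samples):
            addr |= ((s >> b) & 1) << i
        # Weight the partial sum by the significance of bit b.
        acc += lut[addr] << b
    return acc
```

For example, with `coeffs = [3, -1, 4, 2]` and `samples = [5, 7, 0, 9]`, `da_inner_product(build_lut(coeffs), samples, 4)` matches the direct sum of products. The LUT grows as 2 to the number of taps, which is why practical designs partition long filters into smaller tables.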

14.

Cost-Effective Triple-Mode Reconfigurable Pipeline FFT/IFFT/2-D DCT Processor

Chin-Teng Lin Yuan-Chu Yu Lan-Da Van 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(8):1058-1071

This investigation proposes a novel radix-4² algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-4² single delay feedback path (R4²SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8×8 2D discrete cosine transform (DCT) modes, together with a highly efficient feedback shift-register architecture. The segment shift register (SSR) and overturn shift register (OSR) structures are adopted to minimize the register cost for the input re-ordering and post-computation operations in the 8×8 2D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size with the complex conjugate symmetry rule and subexpression elimination technique. To further decrease the chip cost, a finite wordlength analysis is provided to indicate that the proposed architecture only requires a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8×8 2D DCT mode. The comprehensive comparison results indicate that the proposed cost-effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for the FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2D DCT triple-mode chip consumes 22.37 mW at 100 MHz at 1.2-V supply voltage in TSMC 0.13-μm CMOS process, which is very appropriate for the RSoCs IP of next-generation handheld devices.

15.

Fast Parallel Approach for 2-D DHT-Based Real-Valued Discrete Gabor Transform

《IEEE transactions on image processing》2009,18(12):2790-2796

Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real-time image processing.

16.

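A Double-Layer Parallel Lattice Structure Implementation of Time-Recursive Algorithms for the 2-D Real-Valued Discrete Gabor Transform

Tao Liang (陶亮) Zhuang Zhenquan (庄镇泉) 《Journal of Circuits and Systems (电路与系统学报)》2002,7(4):31-36

This paper first briefly reviews the 2-D real-valued discrete Gabor transform previously proposed by the authors and its simple relationship with the complex-valued discrete Gabor transform. It then focuses on the fast computation of the 2-D real-valued discrete Gabor transform, proposing a time-recursive algorithm for computing the transform coefficients and a block time-recursive algorithm for reconstructing the original image from the transform coefficients, and studies how to implement these algorithms with a double-layer parallel lattice structure. Computational complexity analysis and comparison with other algorithms demonstrate the advantages of the double-layer parallel lattice structure implementation for real-time processing.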

17.

Hardware-Efficient Systolization of DA-Based Calculation of Finite Digital Convolution

《Circuits and Systems II: Express Briefs, IEEE Transactions on》2006,53(8):707-711

Novel one- and two-dimensional systolic structures are designed for computation of circular convolution using distributed arithmetic (DA). The proposed structures involve significantly less memory and less area-delay complexity compared with the existing DA-based structures for circular convolution. Besides, it is shown that the proposed systolic designs for circular convolution can be used for computation of linear convolution as well.
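
One common way to obtain a linear convolution from a circular-convolution engine is to zero-pad both sequences to the full output length. The sketch below shows that textbook construction; it is an assumption that this matches the mapping used in the paper, and the function names are hypothetical.

```python
def circular_convolution(a, b):
    """N-point circular convolution: y[k] = sum_j a[j] * b[(k - j) mod N]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both sequences must have the same length")
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def linear_convolution(a, b):
    """Linear convolution computed with a circular convolution of zero-padded inputs."""
    n = len(a) + len(b) - 1
    # Padding to the full output length prevents wrap-around terms from overlapping.
    pa = list(a) + [0] * (n - len(a))
    pb = list(b) + [0] * (n - len(b))
    return circular_convolution(pa, pb)
```

With `a = [1, 2, 3]` and `b = [4, 5]`, `linear_convolution(a, b)` gives `[4, 13, 22, 15]`, the same result as multiplying the polynomials 1 + 2x + 3x² and 4 + 5x.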

18.

A new approximation of the discrete Hilbert transformer

《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1979,67(1):174-175

A class of discrete Hilbert transformers (DHT) is presented, which requires less coefficient storage than the optimal DHT, and whose performance is only slightly inferior to that of the optimal DHT. With respect to coefficient quantization error, the new DHT is found to be as good as, if not better than, the optimal one, especially for small word-lengths.

19.

FHT algorithm for length N=q.2^m

Vijayakumar N. Prabhu K.M.M. 《Electronics letters》1999,35(12):966-968

Of late, the discrete Hartley transform (DHT) has become an important real-valued transform. Many fast algorithms for computing the DHT of sequence length N=2^m have been reported. Fast computation of the DHT of length N=q.2^m, where q is an odd integer, is proposed. The key feature of the algorithm is its flexibility in the choice of sequence length N, where N need not necessarily be a power of 2, while giving rise to a substantial reduction in computational complexity when compared to other algorithms.

20.

Full-rate full-diversity 2 × 2 space-time codes of reduced decoder complexity

Sezginer S. Sari H. 《Communications Letters, IEEE》2007,11(12):973-975

Multiple-input multiple-output (MIMO) techniques have become an essential part of broadband wireless communications systems. For example, the recently developed IEEE 802.16e specifications for broadband wireless access include three MIMO profiles employing 2 × 2 space-time codes (STCs), and two of those are mandatory on the downlink of Mobile WiMAX systems. Conventional approaches to STC design are based on performance criteria such as coding gain, diversity gain, and multiplexing gain, and ignore the decoder complexity. In this paper, we take an alternative approach and present a full-rate full-diversity 2 × 2 STC design leading to substantially lower complexity of the optimum detector than existing schemes. This makes the implementation of high performance full-rate codes realistic in practical systems.