1.

Efficient architectures for two-dimensional discrete wavelet transform using lifting scheme.

Chengyi Xiong Jinwen Tian Jian Liu 《IEEE transactions on image processing》2007,16(3):607-614

Novel architectures for 1-D and 2-D discrete wavelet transform (DWT) by using lifting schemes are presented in this paper. An embedded decimation technique is exploited to optimize the architecture for 1-D DWT, which is designed to receive an input and generate an output with the low- and high-frequency components of original data being available alternately. Based on this 1-D DWT architecture, an efficient line-based architecture for 2-D DWT is further proposed by employing parallel and pipeline techniques, which is mainly composed of two horizontal filter modules and one vertical filter module, working in parallel and pipeline fashion with 100% hardware utilization. This 2-D architecture is called fast architecture (FA) and can perform J levels of decomposition for an N × N image in approximately $2N^2(1 - 4^{-J})/3$ internal clock cycles. Moreover, another efficient generic line-based 2-D architecture is proposed by exploiting the parallelism among four subband transforms in lifting-based 2-D DWT, which can perform J levels of decomposition for an N × N image in approximately $N^2(1 - 4^{-J})/3$ internal clock cycles; hence, it is called high-speed architecture. The throughput rate of the latter is twice that of the former 2-D architecture, with only a small additional hardware cost. Compared with the works reported in previous literature, the proposed architectures for 2-D DWT are efficient alternatives in the tradeoff among hardware cost, throughput rate, output latency, and control complexity.
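
The two cycle counts quoted above can be checked with a little arithmetic. The sketch below assumes (this is inferred from the closed forms, not stated in the abstract) that level j works on an (N/2^j) × (N/2^j) image and costs a fixed number of cycles per sample: 1/2 for the fast architecture and 1/4 for the high-speed architecture. The function names are hypothetical.

```python
def dwt2d_cycles_by_level(n, levels, cycles_per_sample):
    """Sum cycles_per_sample * (n / 2**j)**2 over j = 0 .. levels - 1.

    Assumption: each level halves the image side and the per-sample cost
    is constant across levels.
    """
    total = 0.0
    size = n
    for _ in range(levels):
        total += cycles_per_sample * size * size
        size //= 2
    return total


def dwt2d_cycles_closed_form(n, levels, cycles_per_sample):
    """Geometric-series closed form: c * (4/3) * n^2 * (1 - 4^-J)."""
    return cycles_per_sample * 4 * n * n * (1 - 4.0 ** (-levels)) / 3


# cycles_per_sample = 0.5 reproduces 2N^2(1 - 4^-J)/3 (fast architecture);
# cycles_per_sample = 0.25 reproduces N^2(1 - 4^-J)/3 (high-speed architecture).
fa = dwt2d_cycles_by_level(512, 3, 0.5)
hs = dwt2d_cycles_by_level(512, 3, 0.25)
print(fa, hs)
```

Under these assumptions the level-by-level sum and the closed form agree, which is why the cycle count approaches a constant multiple of N² as J grows rather than scaling with J.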

2.

A Lifting-Based VLSI Architecture for the Two-Dimensional Discrete Wavelet Transform

王超曹鹏李杰黄伟达《现代电子技术》2007,30(14):114-118

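The discrete wavelet transform (DWT) requires a large amount of computation and considerable memory. To make it suitable for real-time image processing applications, dedicated architectures and chips must be developed to improve DWT computational performance. For the lifting-based 2-D DWT, this paper proposes a new VLSI structure, the LLSP architecture, which combines the characteristics of level-by-level and line-based architectures. It reduces hardware cost and memory requirements, and it can be extended to multiple lifting steps and to multi-level 2-D DWT.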

3.

Memory-efficient high-speed VLSI implementation of multi-level discrete wavelet transform

《Journal of Visual Communication and Image Representation》2016

Memory requirements and critical path are essential for 2-D Discrete Wavelet Transform (DWT). In this paper, we address this problem and develop a memory-efficient high-speed architecture for multi-level two-dimensional DWT. First, a dual data scanning technique is adopted in the 2-D 9/7 DWT processing unit to perform lifting operations, which doubles the throughput per cycle. Second, for the 2-D DWT architecture, the proposed Row Transform Unit and Column Transform Unit take advantage of input sample availability and provision computing resources accordingly to optimize the processing speed, and the number of processors is further optimized to significantly reduce the hardware cost. Third, to address the high cost of memory for the immediate computing results from each level and the computation time as the resolution level increases, multiple proposed 2-D DWT units are combined to build a parallel multi-level architecture, which can perform up to six levels of 2-D DWT in a resolution-level-parallel way on arbitrary image sizes at competitive hardware cost. Experimental results demonstrate that the proposed scheme achieves improved hardware performance with significantly reduced on-chip memory and computation time, outperforming state-of-the-art schemes and making it desirable in memory-constrained real-time application systems.

4.

High-Throughput Memory-Based Architecture for DHT Using a New Convolutional Formulation

Meher P.K. Patra J.C. Swamy M.N.S. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(7):606-610

A new formulation is presented for the computation of an N-point discrete Hartley transform (DHT) from two pairs of [(N/2-1)/2]-point cyclic convolutions, and further used to obtain modular structures consisting of simple and regular memory-based systolic arrays for concurrent pipelined realization of the DHT. The proposed structures for direct-memory-based implementation are found to involve nearly the same hardware complexity as the existing structures, but offer two to four times more throughput and two to four times less latency compared with others. The distributed-arithmetic (DA)-based implementation is also found to offer much lower memory complexity and considerably lower area-delay complexity compared with the existing DA-based structures.
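
For reference, the transform being accelerated can be written directly from its textbook definition. The sketch below is a plain O(N²) evaluation of the DHT, useful only as a correctness baseline; it is not the convolution-based formulation of the paper, and the function name is hypothetical.

```python
import math


def dht(x):
    """Direct N-point discrete Hartley transform.

    H[k] = sum_n x[n] * cas(2*pi*n*k/N), where cas(t) = cos(t) + sin(t).
    """
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for m in range(n):
            theta = 2.0 * math.pi * m * k / n
            acc += x[m] * (math.cos(theta) + math.sin(theta))
        out.append(acc)
    return out


# The DHT is its own inverse up to a factor of N, which gives a quick self-check.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
recovered = [v / len(signal) for v in dht(dht(signal))]
print(recovered)
```

Any faster structure, including a cyclic-convolution one, should produce the same coefficients as this baseline up to rounding.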

5.

Efficient Systolic Implementation of DFT Using a Low-Complexity Convolution-Like Formulation

《Circuits and Systems II: Express Briefs, IEEE Transactions on》2006,53(8):702-706

A reduced-complexity algorithm is presented for computation of the discrete Fourier transform, where an $N$-point transform is computed from eight nearly $(N/8)$-point circular-convolution-like operations. A systolic architecture is also derived for very large-scale integration circuit implementation of the proposed algorithm. The proposed architecture is fully pipelined and contains regular and simple locally connected processing elements. It is devoid of complex control structure and is scalable for higher transform lengths. It is observed that the proposed systolic structure involves either less or nearly the same hardware complexity compared with the corresponding existing systolic structures. In addition, it offers eight times more throughput and significantly lower latency compared with the others.

6.

Efficient architectures for 1-D and 2-D lifting-based wavelet transforms (cited by: 4 total, 0 self-citations, 4 by others)

Hongyu Liao Mandal M.Kr. Cockburn B.F. 《Signal Processing, IEEE Transactions on》2004,52(5):1315-1326

The lifting scheme reduces the computational complexity of the discrete wavelet transform (DWT) by factoring the wavelet filters into cascades of simple lifting steps that process the input samples in pairs. We propose four compact and efficient hardware architectures for implementing lifting-based DWTs, namely, one-dimensional (1-D) and two-dimensional (2-D) versions of what we call recursive and dual scan architectures. The 1-D recursive architecture exploits interdependencies among the wavelet coefficients by interleaving, on alternate clock cycles using the same datapath hardware, the calculation of higher order coefficients along with that of the first-stage coefficients. The resulting hardware utilization exceeds 90% in the typical case of a five-stage 1-D DWT operating on 1024 samples. The 1-D dual scan architecture achieves 100% datapath hardware utilization by processing two independent data streams together using shared functional blocks. The recursive and dual scan architectures can be readily extended to the 2-D case. The 2-D recursive architecture is roughly 25% faster than conventional implementations, and it requires a buffer that stores only a few rows of the data array instead of a fixed fraction (typically 25% or more) of the entire array. The 2-D dual scan architecture processes the column and row transforms simultaneously, and the memory buffer size is comparable to existing architectures.
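
To make the "lifting steps that process the input samples in pairs" concrete, here is a minimal software sketch of one level of a lifting-based 1-D DWT. It assumes the integer LeGall 5/3 filter with mirrored boundaries, which is a common choice but is not named in the abstract; the function names are hypothetical and the code says nothing about the recursive or dual scan hardware.

```python
def lift_53_forward(x):
    """One level of the integer 5/3 lifting transform on an even-length list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    even, odd = list(x[0::2]), list(x[1::2])  # split samples into pairs
    n = len(even)
    # Predict step: each odd sample minus the average of its even neighbours.
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the lifting steps in reverse order."""
    n = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


samples = [3, 7, 1, 8, 2, 9, 4, 6]
low, high = lift_53_forward(samples)
print(low, high, lift_53_inverse(low, high) == samples)
```

Because each step only adds a function of the other half of the samples, the inverse subtracts the same quantity, so reconstruction is exact in integer arithmetic regardless of the rounding used.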

7.

Efficient Systolic Designs for 1- and 2-Dimensional DFT of General Transform-Lengths for High-Speed Wireless Communication Applications

Pramod K. Meher Jagdish C. Patra A. P. Vinod 《Journal of Signal Processing Systems》2010,60(1):1-14

In wireless communication, multiple receive-antennas are used with orthogonal frequency division multiplexing (OFDM) to improve the system capacity and performance. The discrete Fourier transform (DFT) plays an important part in such a system since the DFTs are required to be performed for the output of all those antennas separately. This paper presents area-time efficient systolic structures for one-dimensional (1-D) and two-dimensional (2-D) DFTs of general lengths. A low-complexity recursive algorithm based on Clenshaw’s recurrence relation is formulated for the computation of 1-D DFT. The proposed algorithm is used further to derive a linear systolic array for the DFT. The concurrency of computation is enhanced and complexity is minimized by the proposed algorithm, where an N-point DFT is computed via four inner-products of real-valued data of length ≈ (N/2). The proposed 1-D structure offers significantly lower latency and twice the throughput, and involves nearly the same area-time complexity as the corresponding existing structures. The proposed algorithm for 1-D DFT is extended further to obtain a 2-D systolic structure for the 2-D DFT without involving any transposition operation.

8.

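A Two-Layer Parallel Lattice Structure Implementation of the Time-Recursive Algorithm for the 2-D Real-Valued Discrete Gabor Transform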

陶亮庄镇泉《电路与系统学报》2002,7(4):31-36

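This paper first briefly reviews the 2-D real-valued discrete Gabor transform previously proposed by the authors and its simple relationship to the complex-valued discrete Gabor transform. It then focuses on fast computation of the 2-D real-valued discrete Gabor transform, proposing a time-recursive algorithm for computing the transform coefficients and a block time-recursive algorithm for reconstructing the original image from those coefficients. A method for implementing the algorithms with a two-layer parallel lattice structure is studied. Computational complexity analysis and comparison with other algorithms demonstrate the advantages of the two-layer parallel lattice implementation for real-time processing.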

9.

An Efficient Pipeline Architecture and Memory Bit-Width Analysis for Discrete Wavelet Transform of the 9/7 Filter for JPEG 2000

Chung-Fu Lin Pei-Kung Huang Bing-Fei Wu 《Journal of Signal Processing Systems》2010,59(3):245-253

In this paper, we propose an efficient pipeline architecture for the DWT 9/7 filter defined in JPEG 2000. The proposed architecture is composed of column and row processors to perform the separable 2-D DWT. Based on the rescheduling DWT algorithm, we derive a new data flow graph to shorten the critical path. The proposed 1-D column processor requires fewer pipeline registers to achieve about the same critical path compared with other lifting-based architectures. For the row processor, the data dependency of each lifting step is reduced to only two computation nodes, and therefore more pipeline registers can be applied to achieve higher processing speed without increasing the internal memory size in the 2-D case. That is, for an N × N image, it only requires 4N internal memory to perform the row-wise transform. For the memory bit-width analysis, we use software simulation to reduce the memory bit-width for various compression ratios. Since a portion of information from the least significant bits of DWT coefficients would be discarded after EBCOT-tier2 processing, one can decrease the data width of internal memory to perform various compression ratios of JPEG 2000 coding, especially at low bit rates. Our simulation results suggest that it is practically possible to design an energy-aware memory architecture to further reduce the power consumption in future work.

10.

Three-dimensional discrete wavelet transform architectures (cited by: 2 total, 0 self-citations, 2 by others)

Weeks M. Bayoumi M.A. 《Signal Processing, IEEE Transactions on》2002,50(8):2050-2063

The three-dimensional (3-D) discrete wavelet transform (DWT) suits compression applications well, allowing for better compression on 3-D data as compared with two-dimensional (2-D) methods. This paper describes two architectures for the 3-D DWT, called the 3DW-I and the 3DW-II. The first architecture (3DW-I) is based on folding, whereas the 3DW-II architecture is block-based. Potential applications for these architectures include high definition television (HDTV) and medical data compression, such as magnetic resonance imaging (MRI). The 3DW-I architecture is an implementation of the 3-D DWT similar to folded 1-D and 2-D designs. It allows even distribution of the processing load onto 3 sets of filters, with each set performing the calculations for one dimension. The control for this design is very simple, since the data are operated on in a row-column-slice fashion. Due to pipelining, all filters are utilized 100% of the time, except for the start up and wind-down times. The 3DW-II architecture uses block inputs to reduce the requirement of on-chip memory. It has a central control unit to select which coefficients to pass on to the lowpass and highpass filters. The memory on the chip will be small compared with the input size since it depends solely on the filter sizes. The 3DW-I and 3DW-II architectures are compared according to memory requirements, number of clock cycles, and processing of frames per second. The two architectures described are the first 3-D DWT architectures

11.

A nonseparable VLSI architecture for two-dimensional discrete periodized wavelet transform

King-Chu Hung Yao-Shan Hung Yu-Jung Huang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(5):565-576

A modified two-dimensional (2-D) discrete periodized wavelet transform (DPWT) based on the homeomorphic high-pass filter and the 2-D operator correlation algorithm is developed in this paper. The advantages of this modified 2-D DPWT are that it can reduce the multiplication counts and the complexity of boundary data processing in comparison to other conventional 2-D DPWT for perfect reconstruction. In addition, a parallel-pipeline architecture of the nonseparable computation algorithm is also proposed to implement this modified 2-D DPWT. This architecture has properties of noninterleaving input data, short bus width request, and short latency. The analysis of the finite precision performance shows that nearly half of the bit length can be saved by using this nonseparable computation algorithm. The operation of the boundary data processing is also described in detail. In the three-stage decomposition of an N×N image, the latency is found to be N²+2N+18

12.

A Memory-Access Reduction Method for Vector-Radix 2-D DCT Pruning on DSPs

刘项洋许勇郑孝遥陈付龙《电子学报》2019,47(3):757-763

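This paper proposes a memory-access reduction method for vector-radix 2-D DCT pruning. The method aims to reduce the memory accesses caused by weighting factors and signal inputs during computation. It first uses properties of the weighting factors to merge the butterfly units in every two adjacent stages of the computation flow graph, and then computes with fewer weighting factors. A general-purpose DSP processor is used to verify the effectiveness of the method for the vector-radix 2-D DCT pruning algorithm. Experimental results show that, compared with the conventional method, the proposed method substantially reduces the number of clock cycles required, lowers the volume of memory accesses during computation, and occupies less memory.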

13.

An Enhanced Memory Address Mapping Scheme for Improved Memory Access Performance of 2-D DWT Processing Systems

Sze-Wei Lee Soon-Chieh Lim 《The Journal of VLSI Signal Processing》2007,47(3):201-221

The implementation of the memory for storing image and transform coefficients in 2-D DWT processing systems using a more cost-effective external memory module such as DDR DRAM is shown to suffer from an effective memory bandwidth that is significantly lower than the memory system peak bandwidth if the conventional direct logical-to-physical memory address mapping is adopted. The low effective memory bandwidth is caused by the high occurrence of memory overhead cycles, which in turn is closely related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for the 2-D DWT processing of video. An analysis of the logical memory access patterns of multi-level 2-D DWT is carried out, and an enhanced logical-to-physical memory mapping scheme that minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated, and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping scheme.

14.

New matrix formulation for two-dimensional DCT/IDCT computation and its distributed-memory VLSI implementation (cited by: 1 total, 0 self-citations, 1 by others)

Hsiao S.-F. Tseng J.-M. 《Vision, Image and Signal Processing, IEE Proceedings -》2002,149(2):97-107

A direct method for the computation of 2-D DCT/IDCT on a linear-array architecture is presented. The 2-D DCT/IDCT is first converted into its corresponding 1-D DCT/IDCT problem through proper input/output index reordering. Then, a new coefficient matrix factorisation is derived, leading to a cascade of several basic computation blocks. Unlike other previously proposed high-speed 2-D $N \times N$ DCT/IDCT processors that usually require intermediate transpose memory and have computation complexity $O(N^3)$, the proposed hardware-efficient architecture with distributed memory structure has computation complexity $O(N^2 \log_2 N)$ and requires only $\log_2 N$ multipliers. The new pipelinable and scalable 2-D DCT/IDCT processor uses storage elements local to the processing elements and thus does not require any address generation hardware or global memory-to-array routing.

15.

High-Speed CORDIC Based on an Overlapped Architecture and a Novel σ-Prediction Method

Jae-Hyuck Kwak Jae hun Choi Earl E. Swartzlander Jr. 《The Journal of VLSI Signal Processing》2000,25(2):167-177

This paper presents architectural and algorithmic approaches for achieving high-speed CORDIC processing in both of the two operating modes: vectoring and rotation. For vectoring mode CORDIC processing, a modified architecture is proposed, which aims at reduction of computation time by overlapping the stages for redundant addition and selection of rotation direction. In addition, a novel rotation direction prediction scheme for rotation mode CORDIC is presented. The method is based on approximation of the binary angle input to a number with the arctangent weights ($\tan^{-1} 2^{-i}$). The implementation is designed to keep the fast timing characteristics of redundant arithmetic in the x/y path of the CORDIC processing. The characteristics are analyzed with respect to latency time and area, and compared with those obtained by conventional CORDIC implementations. The results show that the proposed techniques reduce not only the block latency but also the overall computation time. Thus, they achieve higher throughput in pipelining.
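
As background for the rotation mode mentioned above, the following is a minimal floating-point sketch of conventional rotation-mode CORDIC, where the direction of each micro-rotation is chosen from the sign of the residual angle and the angle steps are the arctangent weights tan^-1(2^-i). It does not implement the paper's σ-prediction or redundant arithmetic; the function name and the default iteration count are illustrative assumptions.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate the vector (x, y) by `angle` radians with conventional CORDIC.

    Converges for |angle| up to roughly 1.74 rad (the sum of all arctangent weights).
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated scale factor of the micro-rotations
    for i in range(iterations):
        sigma = 1.0 if z >= 0.0 else -1.0   # rotation direction for this stage
        shift = 2.0 ** -i                   # a right shift by i bits in hardware
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)       # subtract the arctangent weight
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain               # compensate the CORDIC gain


c, s = cordic_rotate(1.0, 0.0, 0.5)
print(c, s)  # close to cos(0.5), sin(0.5)
```

In this conventional form each direction decision depends on the previous stage's residual angle, which is the sequential dependency that direction-prediction schemes aim to remove.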

16.

High Throughput Parallel-Pipeline 2-D DCT/IDCT Processor Chip

G. A. Ruiz J. A. Michell A. Burón 《The Journal of VLSI Signal Processing》2006,45(3):161-175

This paper presents a 2-D DCT/IDCT processor chip for high data rate image processing and video coding. It uses a fully pipelined row–column decomposition method based on two 1-D DCT processors and a transpose buffer based on D-type flip-flops with a double serial input/output data-flow. The proposed architecture allows the main processing elements and arithmetic units to operate in parallel at half the frequency of the data input rate. The main characteristics are: high throughput, parallel processing, reduced internal storage, and maximum efficiency in computational elements. The processor has been implemented using standard cell design methodology in 0.35 μm CMOS technology. It measures 6.25 mm² (the core is 3 mm²) and contains a total of 11.7 k gates. The maximum frequency is 300 MHz with a latency of 172 cycles for 2-D DCT and 178 cycles for 2-D IDCT. The computing time of a block is close to 580 ns. It has been designed to meet the demands of IEEE Std. 1180-1990 used in different video codecs. The good performance in computing speed and hardware cost indicates that this processor is suitable for HDTV applications. This work was supported by the Spanish Ministry of Science and Technology (TIC2000-1289).
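
The row–column decomposition named above can be illustrated in software: apply a 1-D DCT to every row, transpose, apply the 1-D DCT again, and transpose back. The sketch below assumes the orthonormal DCT-II and models the transpose buffer as a plain list transpose; it is a behavioural illustration with hypothetical function names, not a model of the chip's pipeline or fixed-point arithmetic.

```python
import math


def dct_1d(v):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def transpose(m):
    """Swap rows and columns (the role played by the transpose buffer)."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT by row-column decomposition: rows, transpose, rows, transpose."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # a constant block has only a DC coefficient
```

Separability is what lets a design reuse one kind of 1-D processor for both passes, at the price of storing intermediate results until a full column is available.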

17.

A Unified FPGA-Based System Architecture for 2-D Discrete Wavelet Transform

Ishmael Sameen Yoong Choon Chang Mow Song Ng Bok-Min Goi Chee-Pun Ooi 《Journal of Signal Processing Systems》2013,71(2):123-142

This paper presents a novel unified and programmable 2-D Discrete Wavelet Transform (DWT) system architecture, which was implemented using a Field Programmable Gate Array (FPGA)-based Nios II soft-core processor working in combination with custom hardware accelerators generated through high-level synthesis. The proposed system architecture, synthesized on an Altera DE3 Stratix III FPGA board, was developed through an iterative design space exploration methodology using Altera’s C2H compiler. Experimental results show that the proposed system architecture is capable of real-time video processing performance for grayscale image resolutions of up to 1920 × 1080 (1080p) when run on the Altera DE3 board, and it outperforms the existing 2-D DWT architecture implementations known in the literature by a considerable margin in terms of throughput. While the proposed 2-D DWT system architecture satisfies real-time performance constraints, it can also perform both forward and inverse DWT, support a number of popular DWT filters used for image and video compression, and provide architecture programmability in terms of number of levels of decomposition as well as image width and height. Based on the design principles used to implement the proposed 2-D DWT system architecture, a system design guideline can be formulated for SOC designs which plan to incorporate dedicated 2-D DWT hardware acceleration.

18.

RAPID PROTOTYPING - Framework for FPGA-based discrete biorthogonal wavelet transforms implementation

Uzun I.S. Amira A. 《Vision, Image and Signal Processing, IEE Proceedings -》2006,153(6):721-734

The discrete wavelet transform has taken its place at the forefront of research for the development of signal and image processing applications. These wavelet-based approaches have outperformed existing strategies in many areas including telecommunication, numerical analysis and, most notably, image/video compression. The authors present an investigation into the design and implementation of 1-D and 2-D discrete biorthogonal wavelet transforms (DBWTs) using a field programmable gate array (FPGA)-based rapid prototyping environment. The proposed architectures for DBWTs are scalable, modular and have less area and time complexity when compared with existing structures. FPGA implementation results based on a Xilinx Virtex-2000E device have shown that the proposed system provides an efficient solution for the processing of DBWTs in real-time

19.

VLSI Implementation of the Discrete Wavelet Transform (cited by: 3 total, 0 self-citations, 3 by others)

乔世杰王国裕《微电子学》2001,31(2):143-145

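The discrete wavelet transform is widely used in signal processing. However, real-time wavelet transforms require a large amount of computation, so the design of dedicated wavelet transform chips has become a key technology in signal processing. This paper proposes a VLSI structure for the recursive pyramid algorithm of the wavelet transform, which uses a set of input delay units and one control unit and performs the wavelet transform with a bank of parallel filters. The corresponding Verilog HDL modules were written, and simulation and logic synthesis were carried out.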

20.

A Very Efficient Storage Structure for DWT and IDWT Filters

Robert M. Owens Mohan Vishwanath 《The Journal of VLSI Signal Processing》1998,19(3):215-225

In this paper, we present an area-efficient storage and routing structure to be used as part of either a DWT or an IDWT filter. Such efficient structures are necessary for the single chip implementation of multidimensional DWT and IDWT filters for processing images and video. While the storage structures described in previously published architectures were adequate for the 1D DWT/IDWT filter, they do not scale well to a multidimensional implementation. The storage structure design and implementation described in this paper utilizes a combination of well-known efficient RAM cells with simple control to achieve compact size and scalability. When compared to other alternatives, the structure uses less power. In this paper, we examine the problem of constructing, on a single chip, filters for both the multidimensional Discrete Wavelet Transform (DWT) and the multidimensional Inverse Discrete Wavelet Transform (IDWT). We will use the following example to illustrate where the difficulty lies in constructing such a chip. Consider a filter that executes transforms on 2D images at the rate of 30 images per second. Furthermore, the size N × N of the images is 1024 × 1024, the length L of the filter is 8, the number of octaves O to be generated is 4, and the arithmetic precision P is 24. In image compression, such a filter would be a good candidate for the replacement of the filters presently used to perform the block Discrete Cosine Transform (DCT).