Found 20 similar documents (search time: 508 ms)
1.
2.
A new two-dimensional fast cosine transform algorithm Total citations: 1, self-citations: 0, cited by others: 1
The two-dimensional discrete cosine transform (2-D DCT) algorithm is based on a one-dimensional fast cosine transform (1-D FCT) algorithm. Instead of computing the 2-D transform using the row-column method, the 1-D algorithm is extended by means of the vector-radix approach. Derivations based on both the sequence-splitting and Kronecker matrix product methods are discussed. The sequence-splitting approach has the advantage that all the underlying operations are shown clearly, while the matrix product representations are more compact and readily generalized to higher dimensions. The bit-reversal operations are placed before the recursive additions so that the recursive operations can be performed in a very regular manner. This greatly simplifies the indexing problem in the software implementation of the algorithms. The vector-radix algorithm saves 25% of the multiplications compared with the row-column method.
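For orientation, the row-column baseline that the abstract compares against can be sketched as follows. This is a minimal, unoptimized illustration and not the vector-radix algorithm of the paper; the function names and the unnormalized DCT-II convention are assumptions made here.

```python
import math

def dct2_1d(x):
    """Unnormalized 1-D DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_pts)
    ]

def dct2_row_column(block):
    """2-D DCT-II by the row-column method: 1-D transforms on rows, then on columns."""
    rows = [dct2_1d(row) for row in block]
    # zip(*rows) walks the columns of the row-transformed block.
    cols = [dct2_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][vertical frequency]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]

def dct2_direct(block):
    """2-D DCT-II from the double-sum definition, used only as a cross-check."""
    m_pts, n_pts = len(block), len(block[0])
    return [
        [
            sum(
                block[m][n]
                * math.cos(math.pi * (2 * m + 1) * k / (2 * m_pts))
                * math.cos(math.pi * (2 * n + 1) * l / (2 * n_pts))
                for m in range(m_pts)
                for n in range(n_pts)
            )
            for l in range(n_pts)
        ]
        for k in range(m_pts)
    ]
```

Both functions should agree to within rounding error; the row-column version simply reuses a 1-D routine, which is the structure a vector-radix method replaces with a genuinely 2-D decomposition.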
3.
This paper investigates efficient hardware architectures for implementation of 1-D and 2-D discrete wavelet transforms (DWTs).
The architectures are based on the lifting scheme. We propose a general structure to minimize the number of multipliers and
adders for 1-D DWTs. Compared to previous conventional architectures, the architecture presented here is more efficient in
terms of the required arithmetic units. Moreover, we describe a new frame scan method for a block-based 2-D DWT structure
which provides a flexible trade-off between the required internal memory size and external memory access. In contrast, other
2-D DWT structures require a fixed memory size.
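As a rough illustration of what a lifting-based 1-D DWT stage looks like, the sketch below implements one level of the reversible integer 5/3 lifting steps (predict, then update). The abstract does not say which filters the paper uses, so the 5/3 filter, the mirrored boundary handling, and the function names are assumptions chosen for illustration.

```python
def lift_53_forward(x):
    """One level of integer 5/3 lifting on an even-length list of ints.

    Returns (s, d): the lowpass (approximation) and highpass (detail) halves.
    """
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    # Predict step: subtract the floor-average of the two even neighbours.
    # The right neighbour is mirrored at the end of the signal.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update step: add a rounded quarter of the two neighbouring details.
    # The left detail is mirrored at the start of the signal.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d

def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the same steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Each step uses only additions and shifts and is inverted by subtracting exactly what was added, which is why lifting structures tend to need few arithmetic units.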
4.
Wu J.S. Shu H.Z. Senhadji L. Luo L.M. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2008,55(6):566-570
In this correspondence, we propose a vector-radix algorithm for the fast computation of a 2-D discrete Hartley transform (DHT). For data sequences whose length is a power of three, a radix-3 × 3 decimation-in-frequency algorithm is developed. It decomposes a length-N × N DHT into nine length-(N/3) × (N/3) DHTs. Comparison of the computational complexity with known algorithms shows that the proposed algorithm, in some cases, significantly reduces the number of arithmetic operations.
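A direct evaluation of a 2-D DHT can serve as a reference when checking any fast algorithm. The sketch below assumes the non-separable kernel cas(2π(ur + vc)/N) with cas(t) = cos(t) + sin(t); the abstract does not state which 2-D DHT definition the paper adopts, and the function names are hypothetical.

```python
import math

def cas(theta):
    """Hartley kernel: cas(t) = cos(t) + sin(t)."""
    return math.cos(theta) + math.sin(theta)

def dht2_direct(x):
    """Direct 2-D DHT of a square N x N array, costing O(N^4) operations."""
    n = len(x)
    return [
        [
            sum(
                x[r][c] * cas(2 * math.pi * (u * r + v * c) / n)
                for r in range(n)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]
```

With this kernel the transform is its own inverse up to a factor of N², so applying `dht2_direct` twice and dividing by N² should return the input, which gives a convenient correctness check on a 3 × 3 or 9 × 9 block.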
5.
Fast algorithm for the 3-D DCT-II Total citations: 1, self-citations: 0, cited by others: 1
Recently, many applications for three-dimensional (3-D) image and video compression have been proposed using 3-D discrete cosine transforms (3-D DCTs). Among the different types of DCTs, the type-II DCT (DCT-II) is the most used. In order to use the 3-D DCTs in practical applications, fast 3-D algorithms are essential. Therefore, in this paper, the 3-D vector-radix decimation-in-frequency (3-D VR DIF) algorithm that calculates the 3-D DCT-II directly is introduced. The mathematical analysis and the implementation of the developed algorithm are presented, showing that this algorithm possesses a regular structure, can be implemented in-place for efficient use of memory, and is faster than the conventional row-column-frame (RCF) approach. Furthermore, a 3-D video compression application based on the 3-D DCT-II is implemented using the new 3-D algorithm. This has led to a substantial speed improvement for 3-D DCT-II-based compression systems and proved the validity of the developed algorithm.
6.
Hui Zhang Mingxin Xia Guangshu Hu 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(2):200-204
Two-dimensional (2-D) convolution is widely used in image and video processing. Although the operation is simple, 2-D convolution is both computationally expensive and memory-intensive. Field-programmable-gate-array (FPGA)-based parallel processing architectures were proposed to accelerate calculations for 2-D convolution, and data buffers implemented with FPGA on-chip resources were used to avoid direct access to external memories. Full buffering and partial buffering (PB) schemes were adopted in previous works. The former would consume a large amount of FPGA resources, while the latter would cause a sharp increase in external memory bus bandwidth. In this brief, we present a multiwindow PB scheme for FPGA-based 2-D convolvers. Compared with the aforementioned methods, the new buffering strategy exhibits a good balance between on-chip resource utilization and external memory bus bandwidth, and therefore is suitable for low-cost FPGA implementation.
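To make the cost concrete, a plain software version of 2-D convolution is sketched below. It is not the multiwindow buffering scheme of the brief, only the underlying operation; the function name and the "valid" output region (no padding) are assumptions.

```python
def conv2d_valid(image, kernel):
    """Direct 2-D convolution over the region where the kernel fits entirely."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # True convolution flips the kernel in both directions before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    return [
        [
            sum(
                image[r + i][c + j] * flipped[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

Every output pixel reads a full kh × kw window, so neighbouring outputs re-read mostly the same input pixels. Buffering schemes such as those discussed in the abstract exist to serve that reuse from on-chip storage rather than from external memory.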
7.
The implementation of the memory for storing image and transform coefficients in 2-D DWT processing systems using the more
cost-effective external memory module such as DDR DRAM is shown to suffer from effective memory bandwidth which is significantly
lower than the memory system peak bandwidth if the conventional direct logical-to-physical memory address mapping is adopted.
The low effective memory bandwidth is caused by the high level of memory overhead cycle occurrence which in turn is closely
related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for the 2-D DWT processing
of video. An analysis on the logical memory access patterns of multi-level 2-D DWT is carried out and an enhanced logical-to-physical
memory mapping scheme which minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated
and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping
scheme.
Soon-Chieh Lim
8.
A new family of two-dimensional (2-D) wavelength/time optical orthogonal codes (OOCs) for asynchronous optical code division multiple access (OCDMA) systems is proposed. The construction scheme uses the difference family (DF), which is an assemblage of difference sets in combinatorial theory. It is proven that the proposed codewords satisfy the correlation properties required for asynchronous OCDMA systems. The code dimension of the proposed codes is more flexible than that of the conventional 2-D codewords. The performance of the system with the proposed codes is analyzed by using the Markov-chain method. Numerical results show that the bit error rate (BER) has a minimal value given the number of simultaneous users. It is also observed that the maximum number of simultaneous users of the system can be achieved by properly choosing both the code weight and cross correlation of the 2-D OOCs.
9.
Song Chen Postula A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):74-83
This paper presents a novel approach to the synthesis of interleaved memory systems that is especially suited for application-specific processors. Our synthesis system generates the optimized interleaved memories for a specific algorithm and finds the best mapping of arrays in that algorithm onto the memory system to achieve high performance. The design space is four-dimensional (4-D) and comprises the number of memory banks, the type of memory components, the storage scheme, and the range of clock period in the system. Optimal designs are found among the Pareto points (a set of nondominated points in the design space) computed for our memory model under the performance and cost criteria set by the designer. The memory model includes all the components of an interleaved memory system and covers a lookup table-based address generation with data alignment. The synthesis is based on a general periodic storage scheme, which enables efficient handling of irregular and overlapped access patterns. The synthesis process is an exhaustive search of the heavily pruned design space, and the pruning is based on mathematically proven properties of periodic storage schemes. This paper presents the theorems, the synthesis algorithm, and the methods of effective word and bank address generation. Examples are given to illustrate the effectiveness of our method.
10.
11.
《IEEE transactions on circuits and systems. I, Regular papers》2006,53(9):1966-1976
In this paper, we propose a three-dimensional (3-D) split vector-radix fast Hartley transform (FHT) algorithm. The main idea behind the proposed algorithm is that the radix-2/4 approach is introduced in the decomposition of the 3-D discrete Hartley transform by using an appropriate index mapping and the Kronecker product. This provides an algorithm based on a mixture of radix-(2 × 2 × 2) and radix-(4 × 4 × 4) index maps and has a butterfly that is characterized by simple closed-form expressions. This algorithm offers substantial reductions in the numbers of multiplications, additions, data transfers, and twiddle factor evaluations or accesses to the look-up table, without a significant increase in the structural complexity compared to that of the existing 3-D vector radix FHT algorithm.
12.
Three-dimensional biorthogonal multiresolution time-domain method and its application to electromagnetic scattering problems Total citations: 1, self-citations: 0, cited by others: 1
Xianyang Zhu Dogaru T. Carin L. 《Antennas and Propagation, IEEE Transactions on》2003,51(5):1085-1092
A three-dimensional (3-D) multiresolution time-domain (MRTD) analysis is presented based on a biorthogonal-wavelet expansion, with application to electromagnetic-scattering problems. We employ the Cohen-Daubechies-Feauveau (CDF) biorthogonal wavelet basis, characterized by the maximum number of vanishing moments for a given support. We utilize wavelets and scaling functions of compact support, yielding update equations involving a small number of proximate field components. A detailed analysis is presented on algorithm implementation, with example numerical results compared to data computed via the conventional finite-difference time-domain (FDTD) method. It is demonstrated that for 3-D scattering problems the CDF-based MRTD often provides significant computational savings (in computer memory and run time) relative to FDTD, while retaining numerical accuracy.
13.
A new family of 2-D optical orthogonal codes and analysis of its performance in optical CDMA access networks Total citations: 5, self-citations: 0, cited by others: 5
Sun Shurong Hongxi Yin Ziyu Wang Anshi Xu 《Lightwave Technology, Journal of》2006,24(4):1646-1653
A new family of two-dimensional optical orthogonal codes (2-D OOCs), one-coincidence frequency hop code (OCFHC)/OOC, which employs OCFHC and OOC as wavelength-hopping and time-spreading patterns, respectively, is proposed in this paper. In contrast to previously constructed 2-D OOCs, OCFHC/OOC provides more choices for the number of available wavelengths, and its cardinality achieves the theoretical upper bound without sacrificing good auto- and cross-correlation properties, i.e., the correlation properties of the code are still ideal. Meanwhile, in addition to the conventional bit error rate measure, we utilize a new method, called effective normalized throughput, to compare the performance of diverse codes applicable to optical code division multiple access (OCDMA) systems. The results indicate that our code performs better than obtained OCDMA codes, is truly applicable to OCDMA networks as a multiaccess code, and will greatly facilitate the implementation of OCDMA access networks.
14.
Three-dimensional discrete wavelet transform architectures Total citations: 2, self-citations: 0, cited by others: 2
The three-dimensional (3-D) discrete wavelet transform (DWT) suits compression applications well, allowing for better compression on 3-D data as compared with two-dimensional (2-D) methods. This paper describes two architectures for the 3-D DWT, called the 3DW-I and the 3DW-II. The first architecture (3DW-I) is based on folding, whereas the 3DW-II architecture is block-based. Potential applications for these architectures include high definition television (HDTV) and medical data compression, such as magnetic resonance imaging (MRI). The 3DW-I architecture is an implementation of the 3-D DWT similar to folded 1-D and 2-D designs. It allows even distribution of the processing load onto 3 sets of filters, with each set performing the calculations for one dimension. The control for this design is very simple, since the data are operated on in a row-column-slice fashion. Due to pipelining, all filters are utilized 100% of the time, except for the start up and wind-down times. The 3DW-II architecture uses block inputs to reduce the requirement of on-chip memory. It has a central control unit to select which coefficients to pass on to the lowpass and highpass filters. The memory on the chip will be small compared with the input size since it depends solely on the filter sizes. The 3DW-I and 3DW-II architectures are compared according to memory requirements, number of clock cycles, and processing of frames per second. The two architectures described are the first 3-D DWT architectures.
15.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(3):403-416
16.
Jiasong Wu Huazhong Shu Senhadji L. Limin Luo 《IEEE transactions on circuits and systems. I, Regular papers》2009,56(4):784-794
The modified discrete cosine transform (MDCT) and inverse MDCT (IMDCT) are two of the most computationally intensive operations in MPEG audio coding standards. A new mixed-radix algorithm for efficiently computing the MDCT/IMDCT is presented. The proposed mixed-radix MDCT algorithm is composed of two recursive algorithms. The first algorithm, called the radix-2 decimation-in-frequency algorithm, is obtained by decomposing an N-point MDCT into two MDCTs with the length N/2. The second algorithm, called the radix-3 decimation-in-time algorithm, is obtained by decomposing an N-point MDCT into three MDCTs with the length N/3. Since the proposed MDCT algorithm is also expressed in the form of a simple sparse matrix factorization, the corresponding IMDCT algorithm can be easily derived by simply transposing the matrix factorization. Comparison of the proposed algorithm with some existing ones shows that our proposed algorithm is more suitable for parallel implementation and particularly suitable for the layer III of MPEG-1 and MPEG-2 audio encoding and decoding. Moreover, the proposed algorithm can be easily extended to the multidimensional case by using the vector-radix method.
17.
18.
Fast 2-dimensional 4 × 4 forward integer transform implementation for H.264/AVC Total citations: 1, self-citations: 0, cited by others: 1
Chih-Peng Fan 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2006,53(3):174-177
In this paper, a novel two-dimensional (2-D) fast algorithm for realization of the 4 × 4 forward integer transform in H.264 is proposed. Based on matrix operations with Kronecker product and direct sum, the efficient fast 2-D 4 × 4 forward integer transform can be derived from the proposed one-dimensional fast 4 × 4 forward integer transform through matrix decompositions. The proposed fast 2-D 4 × 4 forward integer transform design does not need transpose memory for direct parallel pipelined architecture. The fast 2-D 4 × 4 forward integer transform requires fewer latency delays than the state-of-the-art methods. With regular modularity, the proposed fast algorithm is suitable for VLSI implementation to achieve real-time H.264/advanced video coding (AVC) signal processing.
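The link between the separable 2-D transform and a Kronecker product can be checked numerically. The sketch below uses the standard H.264 4 × 4 forward core matrix and shows that computing Cf · X · Cfᵀ equals multiplying the row-major vectorized block by Cf ⊗ Cf. It illustrates only this identity, not the paper's fast decomposition or hardware design; the helper names are hypothetical.

```python
# H.264 4x4 forward core transform matrix (scaling/quantization not included).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Kronecker product: entry ((i, k), (j, l)) equals a[i][j] * b[k][l]."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]

def forward_2d(x):
    """Separable form: Y = CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))

def forward_2d_kron(x):
    """Single 16x16 matrix-vector product on the row-major flattened block."""
    vec = [v for row in x for v in row]
    big = kron(CF, CF)
    out = [sum(big[r][c] * vec[c] for c in range(16)) for r in range(16)]
    return [out[4 * i:4 * i + 4] for i in range(4)]
```

Writing the 2-D transform as one 16-point operator is what allows it to be factored directly, which is one way to avoid the intermediate transpose step of a row-column design.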
19.
20.
A VLSI Architecture for Separable 2-D Discrete Wavelet Transform Total citations: 2, self-citations: 0, cited by others: 2
In this paper, an efficient semi-systolic array architecture for separable 2-D Discrete Wavelet Transform (DWT) is introduced. The semi-systolic array is applicable to any convolution that requires an arbitrary subsampling function. The semi-systolic array presents a better implementation of the convolution function of DWT. This kind of implementation offers a higher efficiency compared to regular systolic implementation when applied to 2-D DWT. The architecture has an efficiency of at least 91%, which increases in proportion to the number of octaves with no change in the architecture design except for minor modifications to the control logic and memory size. The proposed architecture is scalable for different filter sizes and different numbers of octaves. The communication routing is minimal since data transfers are limited to immediately neighboring processors. The components of the architecture are fairly regular and consist of a minimum number of computational units, which makes it a good candidate for VLSI implementation.