Similar Documents
20 similar documents found.
1.
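This paper proposes a memory-access reduction method for vector-radix 2-D DCT pruning. The method aims to reduce the memory accesses caused by weight factors and signal inputs during computation. It first exploits properties of the weight factors to merge the butterfly units in every two adjacent stages of the computation flow graph, and then performs the computation with fewer weight factors. A general-purpose DSP processor is used to verify the effectiveness of the method for the vector-radix 2-D DCT pruning algorithm. Experimental results show that, compared with the conventional method, the proposed method substantially reduces the number of clock cycles required, reduces the number of memory accesses during computation, and occupies less memory.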

2.
A new two-dimensional fast cosine transform algorithm   Citations: 1 (self: 0, others: 1)
The proposed algorithm for the two-dimensional discrete cosine transform (2-D DCT) is based on a one-dimensional fast cosine transform (1-D FCT) algorithm. Instead of computing the 2-D transform using the row-column method, the 1-D algorithm is extended by means of the vector-radix approach. Derivations based on both the sequence-splitting method and the Kronecker matrix product method are discussed. The sequence-splitting approach has the advantage that all the underlying operations are shown clearly, while the matrix product representations are more compact and readily generalized to higher dimensions. The bit-reversal operations are placed before the recursive additions so that the recursive operations can be performed in a very regular manner. This greatly simplifies the indexing problem in software implementations of the algorithms. The vector-radix algorithm saves 25% of the multiplications compared with the row-column method.
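The row-column method that the vector-radix algorithm is compared against applies a 1-D DCT to every row and then to every column. The following is a minimal pure-Python sketch of that baseline only, not of the vector-radix algorithm; it assumes an unnormalized DCT-II, uses direct O(N²) 1-D transforms for clarity, and the function names are our own. It is checked against the direct 2-D definition. Applying the same separable pattern along a third axis gives the row-column-frame approach mentioned in entry 5.

```python
import math


def dct_1d(x):
    """Unnormalized 1-D DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len)) for n in range(n_len))
        for k in range(n_len)
    ]


def dct_2d_row_column(block):
    """Row-column method: 1-D DCT on every row, then on every column."""
    rows = [dct_1d(row) for row in block]  # rows[r][k2]
    n1, n2 = len(rows), len(rows[0])
    cols = [dct_1d([rows[r][c] for r in range(n1)]) for c in range(n2)]  # cols[k2][k1]
    # Transpose back so that the result is indexed [k1][k2].
    return [[cols[c][r] for c in range(n2)] for r in range(n1)]


def dct_2d_direct(block):
    """Direct double sum of the 2-D DCT-II definition, used as a reference."""
    n1, n2 = len(block), len(block[0])
    out = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1][k2] = sum(
                block[a][b]
                * math.cos(math.pi * (2 * a + 1) * k1 / (2 * n1))
                * math.cos(math.pi * (2 * b + 1) * k2 / (2 * n2))
                for a in range(n1)
                for b in range(n2)
            )
    return out
```

With direct 1-D transforms the row-column method costs about 2N³ multiplications for an N×N block instead of N⁴; fast 1-D algorithms and the vector-radix approach reduce this further.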

3.
This paper investigates efficient hardware architectures for implementation of 1-D and 2-D discrete wavelet transforms (DWTs). The architectures are based on the lifting scheme. We propose a general structure to minimize the number of multipliers and adders for 1-D DWTs. Compared to previous conventional architectures, the architecture presented here is more efficient in terms of the required arithmetic units. Moreover, we describe a new frame scan method for a block-based 2-D DWT structure which provides a flexible trade-off between the required internal memory size and external memory access. In contrast, other 2-D DWT structures require a fixed memory size.
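To make the lifting idea concrete, here is a minimal sketch of one level of a 1-D DWT computed with lifting steps. It assumes the reversible integer LeGall 5/3 filter with symmetric border extension (our choice for illustration; the paper's general structure is not limited to this filter). For this filter the predict and update steps need only additions and shifts, which is the kind of saving in arithmetic units the abstract refers to.

```python
def lift_53_forward(x):
    """One level of the integer LeGall 5/3 DWT via lifting (even-length input).

    Returns (s, d): lowpass and highpass halves. Borders use symmetric
    extension, i.e. x[n] mirrors to x[n-2] and d[-1] mirrors to d[0].
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: d[i] = x[2i+1] - floor((x[2i] + x[2i+2]) / 2)
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    # Update step: s[i] = x[2i] + floor((d[i-1] + d[i] + 2) / 4)
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((left + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((left + d[i] + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + ((x[2 * i] + right) >> 1)
    return x
```

Because each lifting step only adds a function of the other polyphase component, the inverse simply subtracts the same integer quantities, so perfect reconstruction holds even with rounding.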

4.
In this correspondence, we propose a vector-radix algorithm for the fast computation of the 2-D discrete Hartley transform (DHT). For data sequences whose length is a power of three, a radix-3×3 decimation-in-frequency algorithm is developed. It decomposes a length-N×N DHT into nine length-(N/3)×(N/3) DHTs. A comparison of the computational complexity with known algorithms shows that the proposed algorithm, in some cases, significantly reduces the number of arithmetic operations.
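Fast DHT algorithms such as the radix-3×3 decomposition above are usually validated against a direct evaluation. The sketch below assumes the common non-separable 2-D DHT kernel cas(2π(k₁n₁/N₁ + k₂n₂/N₂)), with cas θ = cos θ + sin θ, and computes it two ways: by the O(N⁴) definition and from the 2-D DFT through H = Re X − Im X. It is a reference for checking, not the paper's fast algorithm; the function names are our own.

```python
import cmath
import math


def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht_2d(x):
    """Direct 2-D DHT with the cas(2*pi*(k1*n1/N1 + k2*n2/N2)) kernel."""
    n1, n2 = len(x), len(x[0])
    return [
        [
            sum(
                x[a][b] * cas(2 * math.pi * (k1 * a / n1 + k2 * b / n2))
                for a in range(n1)
                for b in range(n2)
            )
            for k2 in range(n2)
        ]
        for k1 in range(n1)
    ]


def dht_2d_via_dft(x):
    """Same transform obtained from the 2-D DFT: H = Re(X) - Im(X)."""
    n1, n2 = len(x), len(x[0])
    out = []
    for k1 in range(n1):
        row = []
        for k2 in range(n2):
            X = sum(
                x[a][b] * cmath.exp(-2j * math.pi * (k1 * a / n1 + k2 * b / n2))
                for a in range(n1)
                for b in range(n2)
            )
            row.append(X.real - X.imag)
        out.append(row)
    return out
```

A useful extra check is that this DHT is its own inverse up to scale: applying `dht_2d` twice returns N₁N₂ times the input.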

5.
Fast algorithm for the 3-D DCT-II   Citations: 1 (self: 0, others: 1)
Recently, many applications for three-dimensional (3-D) image and video compression have been proposed using 3-D discrete cosine transforms (3-D DCTs). Among the different types of DCT, the type-II DCT (DCT-II) is the most widely used. Fast 3-D algorithms are essential for using 3-D DCTs in practical applications. Therefore, this paper introduces the 3-D vector-radix decimation-in-frequency (3-D VR DIF) algorithm, which calculates the 3-D DCT-II directly. The mathematical analysis and the implementation of the developed algorithm are presented, showing that the algorithm has a regular structure, can be implemented in place for efficient use of memory, and is faster than the conventional row-column-frame (RCF) approach. Furthermore, a 3-D DCT-II-based 3-D video compression application is implemented using the new 3-D algorithm. This led to a substantial speed improvement for 3-D DCT-II-based compression systems and confirmed the validity of the developed algorithm.

6.
Two-dimensional (2-D) convolution is widely used in image and video processing. Although the operation is simple, 2-D convolution is both computationally expensive and memory-intensive. Field-programmable gate array (FPGA)-based parallel processing architectures have been proposed to accelerate 2-D convolution, and data buffers implemented with FPGA on-chip resources have been used to avoid direct access to external memories. Previous works adopted full buffering and partial buffering (PB) schemes. The former consumes a large amount of FPGA resources, while the latter causes a sharp increase in external memory bus bandwidth. In this brief, we present a multiwindow PB scheme for FPGA-based 2-D convolvers. Compared with the aforementioned methods, the new buffering strategy achieves a good balance between on-chip resource utilization and external memory bus bandwidth, and is therefore suitable for low-cost FPGA implementation.
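The trade-off above comes from how many image rows are kept on chip. The sketch below models a line-buffer scheme in which the last K−1 complete rows are held on chip, so every pixel is read from external memory exactly once; we take this to correspond to the full-buffering style, and the paper's multiwindow PB scheme is not reproduced. The streaming version is checked against a direct "valid"-mode convolution; all names are hypothetical.

```python
from collections import deque


def conv2d_direct(img, ker):
    """'Valid' 2-D convolution (kernel flipped); reference implementation."""
    h, w, k = len(img), len(img[0]), len(ker)
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += img[r + i][c + j] * ker[k - 1 - i][k - 1 - j]
            row.append(acc)
        out.append(row)
    return out


def conv2d_line_buffered(pixel_rows, ker):
    """Same result, but input rows arrive as a stream and are read once.

    The deque holds the k-1 buffered rows plus the incoming row, modelling
    on-chip line buffers; no earlier row is ever fetched again.
    """
    k = len(ker)
    lines = deque(maxlen=k)
    out = []
    for row in pixel_rows:  # raster-order stream, one row at a time
        lines.append(row)
        if len(lines) < k:
            continue  # not enough rows buffered yet
        window_rows = list(lines)
        out_row = []
        for c in range(len(row) - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += window_rows[i][c + j] * ker[k - 1 - i][k - 1 - j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

On an FPGA the on-chip cost of this approach grows with the image width times K−1, which is the resource pressure that partial-buffering schemes trade against external bandwidth.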

7.
When the memory that stores image and transform coefficients in 2-D DWT processing systems is implemented with more cost-effective external memory modules such as DDR DRAM, the effective memory bandwidth is shown to be significantly lower than the peak bandwidth of the memory system if the conventional direct logical-to-physical memory address mapping is adopted. The low effective bandwidth is caused by a high rate of memory overhead cycles, which in turn is closely related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for 2-D DWT processing of video. An analysis of the logical memory access patterns of multilevel 2-D DWT is carried out, and an enhanced logical-to-physical memory mapping scheme that minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated, and its effective memory access bandwidth is evaluated and compared with that of the conventional direct mapping scheme.
Soon-Chieh Lim

8.
A new family of two-dimensional (2-D) wavelength/time optical orthogonal codes (OOCs) for asynchronous optical code division multiple access (OCDMA) systems is proposed. The construction scheme uses the difference family (DF), which is an assemblage of difference sets in the combinatorial theory. It is proven that the proposed codewords satisfy the correlation properties required for the asynchronous OCDMA systems. The code dimension of the proposed codes is more flexible than that of the conventional 2-D codewords. The performance of the system with the proposed codes is analyzed by using the Markov-chain method. Numerical results show that the bit error rate (BER) has a minimal value given the number of simultaneous users. It is also observed that the maximum number of simultaneous users of the system can be achieved by properly choosing both the code weight and cross correlation of the 2-D OOCs.
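The link between difference families and OOC correlation can be checked numerically. The sketch below uses the simpler 1-D (time-only) case: the blocks {0, 1, 4} and {0, 2, 7} form a cyclic (13, 3, 1) difference family, so the binary codewords built from them have off-peak autocorrelation and cross-correlation of at most 1. The example family and all function names are our own illustration; the paper's 2-D wavelength/time construction is not reproduced.

```python
def difference_family_ok(blocks, n):
    """True if no nonzero difference mod n occurs more than once over all blocks."""
    seen = set()
    for blk in blocks:
        for a in blk:
            for b in blk:
                if a != b:
                    diff = (a - b) % n
                    if diff in seen:
                        return False
                    seen.add(diff)
    return True


def to_code(block, n):
    """Binary codeword of length n with pulses at the positions in `block`."""
    return [1 if t in block else 0 for t in range(n)]


def periodic_corr(x, y, shift):
    """Periodic correlation of x with y cyclically shifted by `shift` chips."""
    n = len(x)
    return sum(x[t] * y[(t + shift) % n] for t in range(n))


def max_correlations(codes):
    """(max off-peak autocorrelation, max cross-correlation) over all shifts."""
    n = len(codes[0])
    auto = max(periodic_corr(c, c, s) for c in codes for s in range(1, n))
    cross = max(
        periodic_corr(a, b, s)
        for i, a in enumerate(codes)
        for b in codes[i + 1:]
        for s in range(n)
    )
    return auto, cross
```

The reasoning is short: two coinciding pulses under a shift would imply the same difference appearing twice, which a difference family rules out.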

9.
This paper presents a novel approach to the synthesis of interleaved memory systems that is especially suited for application-specific processors. Our synthesis system generates the optimized interleaved memories for a specific algorithm and finds the best mapping of arrays in that algorithm onto the memory system to achieve high performance. The design space is four-dimensional (4-D) and comprises the number of memory banks, the type of memory components, the storage scheme, and the range of clock periods in the system. Optimal designs are found among the Pareto points (the set of nondominated points in the design space) computed for our memory model under the performance and cost criteria set by the designer. The memory model includes all the components of an interleaved memory system and covers lookup-table-based address generation with data alignment. The synthesis is based on a general periodic storage scheme, which enables efficient handling of irregular and overlapped access patterns. The synthesis process is an exhaustive search of the heavily pruned design space, and the pruning is based on mathematically proven properties of periodic storage schemes. This paper presents the theorems, the synthesis algorithm, and the methods of effective word and bank address generation. Examples are given to illustrate the effectiveness of our method.
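Why the storage scheme and the number of banks matter can be seen with the simplest periodic storage scheme, low-order interleaving, where word `a` goes to bank `a mod B`. This is our own illustrative assumption, far simpler than the paper's general periodic scheme. The sketch checks whether B accesses with a given stride land in distinct banks; for this scheme that happens exactly when the stride and B are coprime.

```python
from math import gcd


def bank_of(addr, banks):
    """Low-order interleaving: consecutive words go to consecutive banks."""
    return addr % banks


def conflict_free(base, stride, count, banks):
    """True if accesses base, base+stride, ... (count of them) hit distinct banks."""
    used = [bank_of(base + i * stride, banks) for i in range(count)]
    return len(set(used)) == len(used)


def predicted_conflict_free(stride, banks):
    """Closed-form rule for `banks` strided accesses under modulo interleaving."""
    return gcd(stride, banks) == 1
```

For example, with 8 banks a stride-4 access pattern touches only 2 banks, whereas a prime number of banks (such as 7) avoids conflicts for every stride that is not a multiple of 7; a synthesis tool can weigh such effects against the cost of extra banks.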

10.
Three-dimensional (3-D) chip stacking technology provides a new approach to the so-called memory wall problem. Memory-processor chip stacking mitigates the memory wall, permitting faster clock rates (with suitable processor logic) or multicore access to shared memory through a large number of vertical vias between tiers in the stack, which provide ultrawide transfer of data and address information to and from the various levels of cache. Although a limited amount of parallel access is possible with conventional two-dimensional (2-D) memory-processor chip approaches, 3-D memory-processor stacking greatly extends this to much larger memory capacities. We evaluate high-clock-rate processors as well as shared-memory processors with a large number of cores. Various architectural design options that reduce the impact of the memory wall on processor performance are explored and validated through simulations. Certain architectural features can be implemented in a 3-D chip, such as an ultrawide, ultrashort vertical bus with low parasitic resistance, and the elimination of the electrostatic discharge and packaging parasitics required in multiple-package 2-D solutions. The objective is to reduce the clocks-per-instruction figure of merit at high clock speeds in order to deliver significant performance levels. High-clock-rate processors can be designed with SiGe heterostructure bipolar transistors to obtain processors operating on the order of 16 or 32 GHz.

11.
A Split Vector-Radix Algorithm for the 3-D Discrete Hartley Transform   Citations: 1 (self: 0, others: 1)
In this paper, we propose a three-dimensional (3-D) split vector-radix fast Hartley transform (FHT) algorithm. The main idea behind the proposed algorithm is to introduce the radix-2/4 approach into the decomposition of the 3-D discrete Hartley transform by using an appropriate index mapping and the Kronecker product. This yields an algorithm based on a mixture of radix-(2×2×2) and radix-(4×4×4) index maps, with a butterfly characterized by simple closed-form expressions. Compared with the existing 3-D vector-radix FHT algorithm, it offers substantial reductions in the numbers of multiplications, additions, data transfers, and twiddle-factor evaluations or look-up table accesses, without a significant increase in structural complexity.

12.
A three-dimensional (3-D) multiresolution time-domain (MRTD) analysis is presented based on a biorthogonal-wavelet expansion, with application to electromagnetic-scattering problems. We employ the Cohen-Daubechies-Feauveau (CDF) biorthogonal wavelet basis, characterized by the maximum number of vanishing moments for a given support. We utilize wavelets and scaling functions of compact support, yielding update equations involving a small number of proximate field components. A detailed analysis is presented on algorithm implementation, with example numerical results compared to data computed via the conventional finite-difference time-domain (FDTD) method. It is demonstrated that for 3-D scattering problems the CDF-based MRTD often provides significant computational savings (in computer memory and run time) relative to FDTD, while retaining numerical accuracy.

13.
A new family of two-dimensional optical orthogonal codes (2-D OOCs), the one-coincidence frequency hop code (OCFHC)/OOC, which employs OCFHC and OOC as the wavelength-hopping and time-spreading patterns, respectively, is proposed in this paper. In contrast to previously constructed 2-D OOCs, OCFHC/OOC offers more choices for the number of available wavelengths, and its cardinality achieves the theoretical upper bound without sacrificing good auto- and cross-correlation properties; that is, the correlation properties of the code remain ideal. We also use a new measure, called effective normalized throughput, in addition to the conventional bit error rate, to compare the performance of various codes applicable to optical code division multiple access (OCDMA) systems. The results indicate that our code performs better than existing OCDMA codes, is well suited to OCDMA networks as a multiaccess code, and will greatly facilitate the implementation of OCDMA access networks.

14.
Three-dimensional discrete wavelet transform architectures   Citations: 2 (self: 0, others: 2)
The three-dimensional (3-D) discrete wavelet transform (DWT) suits compression applications well, allowing better compression of 3-D data than two-dimensional (2-D) methods. This paper describes two architectures for the 3-D DWT, called 3DW-I and 3DW-II. The first architecture (3DW-I) is based on folding, whereas the 3DW-II architecture is block-based. Potential applications for these architectures include high-definition television (HDTV) and medical data compression, such as magnetic resonance imaging (MRI). The 3DW-I architecture is an implementation of the 3-D DWT similar to folded 1-D and 2-D designs. It distributes the processing load evenly over three sets of filters, each set performing the calculations for one dimension. The control for this design is very simple, since the data are processed in row-column-slice order. Owing to pipelining, all filters are utilized 100% of the time except during start-up and wind-down. The 3DW-II architecture uses block inputs to reduce the on-chip memory requirement. It has a central control unit that selects which coefficients to pass to the lowpass and highpass filters. The on-chip memory is small compared with the input size, since it depends solely on the filter sizes. The 3DW-I and 3DW-II architectures are compared in terms of memory requirements, number of clock cycles, and frames processed per second. The two architectures described are the first 3-D DWT architectures.
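The row-column-slice processing order of 3DW-I can be illustrated in software: a separable 3-D DWT applies a 1-D filter pair along rows, then along columns, then across slices, which is why three filter sets (one per dimension) suffice. This one-level sketch assumes the orthonormal Haar filter purely for brevity; it is not the paper's filter choice or hardware design, and the function names are hypothetical.

```python
import math


def haar_1d(v):
    """One level of the orthonormal Haar DWT: [lowpass half | highpass half]."""
    r = 1 / math.sqrt(2)
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) * r for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) * r for i in range(half)]
    return low + high


def haar_3d(cube):
    """Separable one-level 3-D DWT of cube[z][y][x] (all sizes even)."""
    d, h, w = len(cube), len(cube[0]), len(cube[0][0])
    # 1) Along rows (x direction); builds a new array, the input is untouched.
    out = [[haar_1d(cube[z][y]) for y in range(h)] for z in range(d)]
    # 2) Along columns (y direction).
    for z in range(d):
        for x in range(w):
            col = haar_1d([out[z][y][x] for y in range(h)])
            for y in range(h):
                out[z][y][x] = col[y]
    # 3) Across slices (z direction).
    for y in range(h):
        for x in range(w):
            line = haar_1d([out[z][y][x] for z in range(d)])
            for z in range(d):
                out[z][y][x] = line[z]
    return out
```

Because the Haar filters used here are orthonormal, the transform preserves signal energy, and a constant 2×2×2 cube maps entirely into the single lowpass-lowpass-lowpass coefficient.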

15.
One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. This paper presents an efficient memory allocation to minimize the number of memory modules and processing elements with a parallel access capability when multiple windows with arbitrary shapes are specified. This paper also presents an efficient search method based on regularity of window-type image processing. We give some practical examples including a stereo-matching processor for acquiring 3-D information, and an optical-flow processor for motion estimation. These examples show that the numbers of memory modules are reduced to 2.7% and 10%, respectively, in comparison with a basic approach. It is also shown that the search time is less than 1 ms for practical image sizes and window sizes.
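For a single rectangular p×q window, a simple allocation gives conflict-free parallel access with p·q modules: pixel (x, y) goes to module (x mod p) + p·(y mod q), so any p×q window covers every residue pair exactly once. The sketch below checks this property exhaustively. It is our own simple baseline for illustration; the paper's allocation, which reduces the module count for multiple windows of arbitrary shape, is not reproduced.

```python
def module_of(x, y, p, q):
    """Assign pixel (x, y) to one of p*q memory modules."""
    return (x % p) + p * (y % q)


def window_conflict_free(width, height, p, q):
    """Check that every p-by-q window in the image touches p*q distinct modules."""
    for y0 in range(height - q + 1):
        for x0 in range(width - p + 1):
            mods = {
                module_of(x0 + dx, y0 + dy, p, q)
                for dx in range(p)
                for dy in range(q)
            }
            if len(mods) != p * q:
                return False
    return True
```

Such a check is cheap enough to run inside a search over candidate allocations, which is the role the abstract's search method plays for arbitrary window shapes.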

16.
The modified discrete cosine transform (MDCT) and inverse MDCT (IMDCT) are two of the most computationally intensive operations in MPEG audio coding standards. A new mixed-radix algorithm for efficiently computing the MDCT/IMDCT is presented. The proposed mixed-radix MDCT algorithm is composed of two recursive algorithms. The first, called the radix-2 decimation-in-frequency algorithm, is obtained by decomposing an N-point MDCT into two MDCTs of length N/2. The second, called the radix-3 decimation-in-time algorithm, is obtained by decomposing an N-point MDCT into three MDCTs of length N/3. Since the proposed MDCT algorithm is also expressed in the form of a simple sparse matrix factorization, the corresponding IMDCT algorithm can be derived simply by transposing the matrix factorization. Comparison with some existing algorithms shows that the proposed algorithm is more suitable for parallel implementation and particularly suitable for Layer III of MPEG-1 and MPEG-2 audio encoding and decoding. Moreover, the proposed algorithm can easily be extended to the multidimensional case by using the vector-radix method.
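Fast MDCT algorithms such as the mixed-radix one above are normally validated against the direct definition. This sketch assumes the common convention X[k] = Σ x[n] cos(π/N (n + ½ + N/2)(k + ½)) for 2N inputs and N outputs, with a 1/N scale on the IMDCT, and a sine window; these choices are ours, not the paper's. It demonstrates time-domain aliasing cancellation: a single IMDCT(MDCT(·)) is aliased, but windowed 50%-overlap-add reconstructs the interior samples exactly.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N inputs -> N coefficients."""
    n = len(block) // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Direct IMDCT with 1/N scaling: N coefficients -> 2N aliased samples."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for i in range(2 * n)
    ]


def sine_window(two_n):
    """Symmetric window satisfying w[i]^2 + w[i+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]


def mdct_roundtrip(signal, n):
    """Windowed analysis/synthesis with 50% overlap; returns the overlap-add output."""
    win = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = [signal[start + i] * win[i] for i in range(2 * n)]
        rec = imdct(mdct(frame))
        for i in range(2 * n):
            # Factor 2 compensates for the 1/N IMDCT scale combined with a
            # window whose squared halves sum to 1.
            out[start + i] += 2.0 * rec[i] * win[i]
    return out
```

Only samples covered by two overlapping frames are reconstructed; the first and last N samples of the signal are not.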

17.
Wu Zhihua (吴智华), Luo Rong (罗嵘), Yang Huazhong (杨华中). Microelectronics (《微电子学》), 2007, 37(6): 878-881, 886
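To improve the memory bandwidth utilization of MPEG-2 decoder chips, a new memory storage structure is proposed based on the inherent characteristics of SDRAM and the memory read patterns of MPEG-2 decoding. The method greatly reduces the redundant time in memory reads and writes, thereby meeting the design requirements of an MPEG-2 MP@HL decoder. Compared with direct mapping, the proposed method reduces the number of row activations by at least 82.6% and the memory read/write time by more than 59%.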
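The effect of the storage structure on row activations can be estimated with a toy model. The sketch assumes a single-bank SDRAM that keeps one row open, a hypothetical 1024-byte row, a 1920-pixel-wide 8-bit frame, and 64×16 tiles that each fill exactly one row; none of these parameters come from the paper, and its actual structure is not reproduced. Under the conventional raster (direct) mapping, every line of a 16×16 macroblock falls in a different DRAM row, whereas with the tiled mapping an aligned macroblock stays inside one row.

```python
def raster_addr(x, y, width):
    """Conventional direct mapping: pixels stored line after line."""
    return y * width + x


def tiled_addr(x, y, width, tile_w, tile_h):
    """Tiled mapping: each tile_w x tile_h tile occupies one contiguous row."""
    tiles_per_line = width // tile_w
    tile_index = (y // tile_h) * tiles_per_line + (x // tile_w)
    offset = (y % tile_h) * tile_w + (x % tile_w)
    return tile_index * tile_w * tile_h + offset


def row_activations(addresses, row_size):
    """Count row opens for a single bank that keeps one row open at a time."""
    count, open_row = 0, None
    for a in addresses:
        row = a // row_size
        if row != open_row:
            count += 1
            open_row = row
    return count


def block_addresses(x0, y0, bw, bh, mapping):
    """Addresses of a bw x bh block at (x0, y0), read line by line."""
    return [mapping(x0 + dx, y0 + dy) for dy in range(bh) for dx in range(bw)]


# Usage with the hypothetical parameters above:
WIDTH, ROW_SIZE = 1920, 1024
raster = lambda x, y: raster_addr(x, y, WIDTH)
tiled = lambda x, y: tiled_addr(x, y, WIDTH, 64, 16)
print(row_activations(block_addresses(64, 32, 16, 16, raster), ROW_SIZE))
print(row_activations(block_addresses(64, 32, 16, 16, tiled), ROW_SIZE))
```

An unaligned reference block used in motion compensation can straddle up to four tiles, so real savings also depend on access ordering and bank assignment, which this model ignores.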

18.
In this paper, a novel two-dimensional (2-D) fast algorithm for realizing the 4×4 forward integer transform in H.264 is proposed. Based on matrix operations with the Kronecker product and direct sum, the efficient fast 2-D 4×4 forward integer transform is derived from the proposed one-dimensional fast 4×4 forward integer transform through matrix decompositions. The proposed fast 2-D 4×4 forward integer transform design does not need transpose memory in a direct parallel pipelined architecture, and it requires fewer latency delays than state-of-the-art methods. With its regular modularity, the proposed fast algorithm is suitable for VLSI implementation to achieve real-time H.264/advanced video coding (AVC) signal processing.
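For reference, the H.264 4×4 forward core transform is Y = C X Cᵀ with the integer matrix `CF` below, and each 1-D pass can be computed with an add/shift butterfly. The sketch is the conventional row-column baseline: the intermediate `rows` array plays the role of the transpose memory that the proposed direct 2-D algorithm avoids. It is checked against plain matrix multiplication; helper names are our own.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def fwd4_butterfly(x):
    """1-D 4-point forward integer transform using only adds and shifts."""
    s03, d03 = x[0] + x[3], x[0] - x[3]
    s12, d12 = x[1] + x[2], x[1] - x[2]
    return [s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1)]


def fwd4x4(block):
    """2-D transform Y = CF * X * CF^T computed row-column with butterflies."""
    rows = [fwd4_butterfly(r) for r in block]  # X * CF^T (intermediate buffer)
    cols = [fwd4_butterfly([rows[i][j] for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]  # transpose back


def matmul(a, b):
    """Plain integer matrix product, used as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]
```

Each 1-D butterfly needs 8 additions and 2 shifts, so the row-column version uses 64 additions and 16 shifts per 4×4 block.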

19.
Otsu Image Segmentation Based on Accurate Partitioning of the 2-D Histogram and Its Fast Implementation   Citations: 6 (self: 0, others: 6)
Zhang Xinming (张新明), Sun Yinjie (孙印杰), Zheng Yanbin (郑延斌). Acta Electronica Sinica (《电子学报》), 2011, 39(8): 1778-1784
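The traditional 2-D Otsu method produces insufficiently accurate segmentation results, mainly because it assumes that the probabilities of the regions along the main diagonal of the 2-D histogram sum to approximately 1. To address this problem, a fast Otsu image segmentation method based on accurate partitioning of the 2-D histogram is proposed. (1) An appropriate neighborhood template is selected to construct the 2-D histogram, and the Otsu thresholding method is applied to this histogram to improve segmentation performance; (2) in the Otsu formulas for the object and background regions on the main diagonal of the 2-D histogram, the corresponding...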
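For comparison, here is a sketch of the conventional 2-D Otsu method that the abstract criticizes, not the proposed accurate-partition method. Each pixel contributes the pair (gray level, floor of its 3×3 neighborhood mean) to a 2-D histogram, and the threshold (s, t) maximizes tr S_B = [(w₀μ_Ti − μ_i)² + (w₀μ_Tj − μ_j)²] / (w₀(1 − w₀)), a formula that relies on the approximation w₁ ≈ 1 − w₀. The 3×3 template with edge replication and the function names are our assumptions.

```python
def neighborhood_mean(img):
    """Floor of the 3x3 mean with edge replication (second histogram axis)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total // 9
    return out


def otsu_2d(img, levels=256):
    """Conventional 2-D Otsu threshold (s, t) using the w1 ~= 1 - w0 assumption."""
    h, w = len(img), len(img[0])
    avg = neighborhood_mean(img)
    hist = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            hist[img[y][x]][avg[y][x]] += 1.0 / (h * w)
    # 2-D prefix sums of p, i*p and j*p over the rectangle [0..s] x [0..t].
    P = [[0.0] * (levels + 1) for _ in range(levels + 1)]
    MI = [[0.0] * (levels + 1) for _ in range(levels + 1)]
    MJ = [[0.0] * (levels + 1) for _ in range(levels + 1)]
    for i in range(levels):
        for j in range(levels):
            p = hist[i][j]
            P[i + 1][j + 1] = p + P[i][j + 1] + P[i + 1][j] - P[i][j]
            MI[i + 1][j + 1] = i * p + MI[i][j + 1] + MI[i + 1][j] - MI[i][j]
            MJ[i + 1][j + 1] = j * p + MJ[i][j + 1] + MJ[i + 1][j] - MJ[i][j]
    mu_ti, mu_tj = MI[levels][levels], MJ[levels][levels]
    best, best_st = -1.0, (0, 0)
    for s in range(levels):
        for t in range(levels):
            w0 = P[s + 1][t + 1]
            if w0 <= 1e-12 or w0 >= 1 - 1e-12:
                continue  # criterion undefined when one class is empty
            num = (w0 * mu_ti - MI[s + 1][t + 1]) ** 2 + (w0 * mu_tj - MJ[s + 1][t + 1]) ** 2
            score = num / (w0 * (1 - w0))
            if score > best:
                best, best_st = score, (s, t)
    return best_st
```

The prefix sums make the search O(L²) for L gray levels. The pixels that fall off the main diagonal (here, those near object boundaries) are simply ignored by this criterion, which is the source of inaccuracy the paper targets.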

20.
A VLSI Architecture for Separable 2-D Discrete Wavelet Transform   Citations: 2 (self: 0, others: 2)
In this paper, an efficient semi-systolic array architecture for the separable 2-D discrete wavelet transform (DWT) is introduced. The semi-systolic array is applicable to any convolution that requires an arbitrary subsampling function, and it provides a better implementation of the DWT convolution. This kind of implementation is more efficient than a regular systolic implementation when applied to the 2-D DWT. The architecture has an efficiency of at least 91%, which increases with the number of octaves, with no change to the architecture design except for minor modifications to the control logic and memory size. The proposed architecture is scalable to different filter sizes and different numbers of octaves. Communication routing is minimal, since data transfers are limited to immediately neighboring processors. The components of the architecture are fairly regular and consist of a minimal number of computational units, which makes it a good candidate for VLSI implementation.
