1.

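VLSI Architecture Design of a Parallel-Array Wavelet Filter for JPEG2000 Cited by: 2 (self-citations: 0, citations by others: 2)

Lan Xuguang, Zheng Nanning, Mei Kuizhi, Liu Yuehu 《Acta Electronica Sinica》2004,32(11):1806-1809

This paper proposes a parallel-array VLSI architecture design method that uses the lifting algorithm to implement the two-dimensional discrete wavelet transform (DWT) in a JPEG2000 encoding system. The resulting architecture consists of two row processors, one column processor, and a small number of row buffers. Each row and column processor is built from a parallel array of processing elements, so row and column filtering can proceed simultaneously, and optimized shift-add operations replace multiplications. The whole architecture is pipelined. At the same precision, it greatly reduces the amount of computation and raises hardware utilization to nearly 100%, which speeds up the transform and reduces circuit size. For an N×N image, the processing time is O(N²/2) clock cycles. The 2-D discrete wavelet filter architecture has been verified on an FPGA and can serve as a standalone IP core in a JPEG2000 image codec chip under development.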

2.

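Hardware Implementation of the Lifting-Based Discrete Wavelet Transform

Yang Ming, Zhang Jun, Hu Dejun 《Application of IC》2006,(2):27-29,33

This paper presents a hardware implementation of the lifting-based wavelet transform that can be configured for either the 5/3 or the 9/7 wavelet transform and used in JPEG2000. The implementation adopts a folded architecture to reduce hardware cost and raise hardware utilization. Its multipliers use canonical signed digit (CSD) encoding to convert multiplications into shift-and-add/subtract operations, which speeds up the transform. Data extension is performed by embedded extension, which also speeds up computation and reduces storage requirements. The whole architecture is implemented in VHDL and verified by simulation.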
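The CSD idea mentioned in this abstract can be illustrated in software. The sketch below is a minimal illustration, not the authors' circuit: it assumes the constant has already been quantized to a non-negative integer, and the function names `to_csd` and `csd_multiply` are hypothetical. It recodes the constant into digits from {-1, 0, +1} with no two adjacent nonzero digits, then multiplies using only shifts, additions, and subtractions.

```python
def to_csd(n):
    """Return the canonical signed digit form of a non-negative integer.

    Digits are listed least significant first; each is -1, 0, or +1,
    and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n % 4 == 1 -> digit +1; n % 4 == 3 -> digit -1 (carry upward).
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by the CSD-encoded constant using shifts and add/subtract only."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    digits = to_csd(7)              # 7 = 8 - 1, so one subtraction replaces two additions
    print(digits)                   # [-1, 0, 0, 1]
    print(csd_multiply(5, digits))  # 35
```

For the constant 7, plain binary (111) needs two additions, while the CSD form 8 - 1 needs a single subtraction. This reduction in nonzero digits is what makes the encoding attractive for fixed-coefficient multipliers.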

3.

A GPGPU-Based JPEG2000 Image Compression Method

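Li Yufeng, Wu Wei, Wang Kai, Cui Yingwei 《Chinese Journal of Electron Devices》2013,36(2)

To further accelerate JPEG2000 compression, the JPEG2000 standard is studied, and the analysis shows that the data independence within the discrete wavelet transform (DWT), the core algorithm of JPEG2000, makes it suitable for parallel processing. NVIDIA's recently released CUDA (Compute Unified Device Architecture) is a hardware and software development platform well suited to large-scale data-parallel computation. The DWT is parallelized with CUDA on a general-purpose graphics processing unit (GPGPU) and optimized for the characteristics of GPGPU memory. Experimental results show that the CUDA-parallelized method effectively increases DWT computation speed.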

4.

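Design and Implementation of an FPGA-Based Wavelet Transform Core Cited by: 1 (self-citations: 0, citations by others: 1)

Cui Wei, Liu Bo, Cao Jianzhong, Wang Huawei, Liu Kai, Wang Xin 《Electronics Optics & Control》2009,16(3)

Based on the framework of the lifting wavelet, a design and implementation scheme for an FPGA-based wavelet transform core is proposed. Following a top-down design approach and using on-chip FPGA memory, the row and column transforms execute in parallel. The architecture consists of one row processor and one column processor, which filter simultaneously through time-division multiplexing, and optimized shift-add operations replace multiplications. A pipelined design reduces the amount of computation and raises hardware utilization. The whole module is designed in VHDL and was compiled and simulated in Quartus II. Verification shows that the system works reliably and fully meets real-time processing requirements.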

5.

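A Unified VLSI Architecture for the Discrete Wavelet Transform Suited to JPEG2000 Cited by: 7 (self-citations: 0, citations by others: 7)

Hua Lin, Zhu Ke, Zhou Xiaofang, Zhang Qianling 《Microelectronics》2003,33(4):280-283,287

A unified architecture for the lifting-based discrete wavelet transform (DWT) is proposed. It needs no separate boundary extension stage and, once configured, supports both the lossless and the lossy wavelet transforms of JPEG2000. Embedding boundary extension in the DWT itself lowers power consumption and reduces memory requirements. Pipelining and a folded architecture are used to achieve higher processing speed and hardware utilization. This efficient, compact DWT architecture suits JPEG2000 encoders and a range of real-time image and video applications.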

6.

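VLSI Architecture Design for the Integer 5/3 Wavelet Transform

Tian Hua, Chang Qing 《Modern Electronics Technique》2005,28(20):99-102

In JPEG 2000, lossless image compression is implemented with the integer 5/3 wavelet transform, and the standard also specifies a lifting-based algorithm for the 5/3 wavelet. The lifting-based integer 5/3 wavelet transform algorithm is studied, and a VLSI architecture is proposed for the two-dimensional transform. The architecture consists of four modules that run in parallel, each pipelined internally. For multi-level transforms, the computations of successive levels can also be interleaved, which exploits the advantages of the lifting method and considerably improves hardware efficiency. Its main advantages are low resource consumption and high speed, and it also applies to other integer wavelet transforms.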
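As a software illustration of the integer 5/3 lifting transform this abstract refers to, the sketch below applies one predict step and one update step in place, with whole-sample symmetric extension folded into the index calculation. It is a minimal one-dimensional sketch, not the four-module architecture of the paper: the function names are hypothetical, and inputs are assumed to be integer lists of length 2 or more.

```python
def mirror(i, n):
    """Reflect index i into [0, n-1] (whole-sample symmetric extension, n >= 2)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return i if i < n else period - i


def forward_53(x):
    """One level of the integer 5/3 lifting transform; returns (low, high)."""
    n = len(x)
    y = list(x)
    # Predict: each odd sample becomes a detail (high-pass) coefficient.
    for i in range(1, n, 2):
        y[i] -= (y[mirror(i - 1, n)] + y[mirror(i + 1, n)]) >> 1
    # Update: each even sample becomes an approximation (low-pass) coefficient.
    for i in range(0, n, 2):
        y[i] += (y[mirror(i - 1, n)] + y[mirror(i + 1, n)] + 2) >> 2
    return y[0::2], y[1::2]


def inverse_53(low, high):
    """Undo forward_53 by running the lifting steps in reverse order with opposite signs."""
    n = len(low) + len(high)
    y = [0] * n
    y[0::2] = low
    y[1::2] = high
    for i in range(0, n, 2):
        y[i] -= (y[mirror(i - 1, n)] + y[mirror(i + 1, n)] + 2) >> 2
    for i in range(1, n, 2):
        y[i] += (y[mirror(i - 1, n)] + y[mirror(i + 1, n)]) >> 1
    return y


if __name__ == "__main__":
    signal = [10, 12, 14, 200, 7, 3, 0, 255, 9]
    low, high = forward_53(signal)
    print(inverse_53(low, high) == signal)  # True: the integer transform is reversible
```

Both steps use only additions and arithmetic shifts, and the inverse subtracts exactly what the forward pass added. This is why the transform is lossless and needs no multipliers in hardware.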

7.

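Parallel VLSI Architecture Design for the 2-D Discrete 5/3 Wavelet Transform

Du Huibin, Zhou Xu, Zhang Xueqing, Wu Xiaojuan 《Radio Communications Technology》2006,32(6):39-41

An efficient parallel VLSI architecture design method for the lifting-based two-dimensional discrete 5/3 wavelet transform (DWT) is proposed. The method lets the row and column filters operate simultaneously and uses a pipelined design. At the same precision, it greatly reduces the amount of computation, speeds up the transform, and saves hardware resources. The method has passed behavioral-level simulation in Verilog HDL and can serve as a standalone IP core in JPEG2000 image encoder and decoder chips. The architecture can be extended to the 9/7 wavelet lifting structure.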
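The row and column filtering described here follows the usual separable decomposition, which can be sketched in software as a row pass followed by a column pass. The code below is a sequential illustration under stated assumptions: it does not model the simultaneous row and column operation or the pipelining that the paper proposes, the function names are hypothetical, and the image is assumed to be a list of equal-length integer rows with at least two rows and two columns.

```python
def lift_53(x):
    """1-D integer 5/3 lifting with symmetric extension; returns (low, high)."""
    n = len(x)
    y = list(x)

    def m(i):
        # Mirror an out-of-range neighbour index back into [0, n-1].
        if i < 0:
            return -i
        return 2 * (n - 1) - i if i >= n else i

    for i in range(1, n, 2):   # predict step on odd samples
        y[i] -= (y[m(i - 1)] + y[m(i + 1)]) >> 1
    for i in range(0, n, 2):   # update step on even samples
        y[i] += (y[m(i - 1)] + y[m(i + 1)] + 2) >> 2
    return y[0::2], y[1::2]


def dwt2_53(image):
    """One-level 2-D 5/3 DWT; returns the subbands (LL, HL, LH, HH)."""
    # Row pass: split every row into horizontal low-pass and high-pass halves.
    rows = [lift_53(r) for r in image]
    h_low = [lo for lo, _ in rows]
    h_high = [hi for _, hi in rows]

    def column_pass(block):
        # Filter each column, then transpose the results back to row-major form.
        cols = [lift_53(list(c)) for c in zip(*block)]
        v_low = [list(r) for r in zip(*[lo for lo, _ in cols])]
        v_high = [list(r) for r in zip(*[hi for _, hi in cols])]
        return v_low, v_high

    LL, LH = column_pass(h_low)    # LH: horizontal low-pass, vertical high-pass
    HL, HH = column_pass(h_high)   # HL: horizontal high-pass, vertical low-pass
    return LL, HL, LH, HH


if __name__ == "__main__":
    flat = [[7] * 4 for _ in range(4)]
    LL, HL, LH, HH = dwt2_53(flat)
    print(LL)  # [[7, 7], [7, 7]]: a flat image keeps all its energy in LL
    print(HH)  # [[0, 0], [0, 0]]
```

In this sketch the column pass can start only after the row pass has finished. A hardware design that filters rows and columns simultaneously instead lets the column filter start consuming rows as soon as enough of them have been produced.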

8.

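Implementation of an OpenMP-Based Parallel JPEG2000 Decoding Algorithm Cited by: 1 (self-citations: 1, citations by others: 0)

Wu Hao, Deng Jiaxian, Huang Yan 《Communications Technology》2011,44(4):10-12,15

To increase JPEG2000 decoding speed, high-speed parallel JPEG2000 decoding is implemented with OpenMP (Open specifications for Multi Processing) on a multi-core processor platform. OpenMP is used for multi-way parallel decoding in the T1 decoder and the inverse discrete wavelet transform, reducing the run time of these two stages and hence the overall JPEG2000 decoding time. Experimental results show that OpenMP is a simple and effective parallel programming tool and that, with decoded image quality unchanged, the proposed parallel decoding algorithm is significantly faster than the single-threaded serial algorithm.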

9.

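An FPGA-Based Optimized 5/3 Lifting Wavelet Algorithm

Chen Zhanliang, Jin Longxu, Tao Hongjiang, Han Shuangli, Zhang Min 《Video Engineering》2015,39(11):113-116

To compress images from a linear-array CCD space camera in real time, a two-dimensional lifting wavelet transform architecture and implementation scheme suited to FPGAs is proposed on the basis of the lifting algorithm. The system uses on-chip FPGA memory and ping-pong operation to buffer and transfer data between the row and column transforms, which lowers power consumption and raises hardware utilization and speed. To keep pace with the hardware, wavelet boundary handling needs no separate boundary extension stage, which greatly reduces algorithmic complexity. The whole module is designed in Verilog HDL and was simulated in QuestaSim. Experimental results show that the system works stably and reliably, fully meets real-time processing requirements, and supports the multi-level two-dimensional 5/3 wavelet transform of JPEG2000.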

10.

Hardware Design of the JPEG2000 Wavelet Lifting Algorithm Cited by: 7 (self-citations: 1, citations by others: 6)

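Dong Wenhui, Liu Mingye 《Acta Electronica Sinica》2003,31(11):1674-1677

The discrete wavelet transform underlies many current image processing and compression techniques and has been adopted by JPEG2000, the latest ISO/IEC still-image compression standard. The lifting-based discrete wavelet transform requires less computation than the traditional convolution-based one. We propose a hardware architecture for the wavelet lifting algorithm in JPEG2000 that offers high overall speed, low storage requirements, and low hardware resource consumption. We implement boundary extension outside the data path to reduce data-path complexity and improve computational efficiency, and we use pipelining to improve the efficiency of the hardware design further.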

11.

A VLSI architecture for lifting-based forward and inverse wavelet transform

Andra K. Chakrabarti C. Acharya T. 《Signal Processing, IEEE Transactions on》2002,50(4):966-977

We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μm technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.

12.

An Efficient Pipeline Architecture and Memory Bit-Width Analysis for Discrete Wavelet Transform of the 9/7 Filter for JPEG 2000

Chung-Fu Lin Pei-Kung Huang Bing-Fei Wu 《Journal of Signal Processing Systems》2010,59(3):245-253

In this paper, we propose an efficient pipeline architecture for the DWT 9/7 filter defined in JPEG 2000. The proposed architecture is composed of column and row processors to perform the separable 2-D DWT. Based on the rescheduling DWT algorithm, we derive a new data flow graph to shorten the critical path. The proposed 1-D column processor requires fewer pipeline registers to achieve about the same critical path compared with other lifting-based architectures. For the row processor, the data dependency of each lifting step is reduced to only two computation nodes, and therefore more pipeline registers can be applied to achieve higher processing speed without increasing the internal memory size in the 2-D case. That is, for an N × N image, it requires only 4N internal memory to perform the row-wise transform. For the memory bit-width analysis, we use software simulation to reduce the memory bit-width for various compression ratios. Since a portion of the information in the least significant bits of the DWT coefficients is discarded after EBCOT tier-2 processing, one can decrease the data width of the internal memory for various compression ratios of JPEG 2000 coding, especially at low bit rates. Our simulation results suggest that it is practically possible to design an energy-aware memory architecture to further reduce power consumption in future work.
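For orientation, the lifting steps that a 9/7 row or column processor evaluates can be written out in floating point. The sketch below is plain software, not the rescheduled data flow graph of the paper. The function names are hypothetical; the coefficients are the commonly quoted lifting constants for the 9/7 filter, rounded to nine decimal places; and the scaling convention (low-pass divided by K, high-pass multiplied by K) is an assumption, since conventions differ between implementations.

```python
# Commonly quoted 9/7 lifting constants (rounded); the scaling convention is an assumption.
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852
K = 1.230174105


def _mirror(i, n):
    """Reflect index i into [0, n-1] (whole-sample symmetric extension, n >= 2)."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return i if i < n else period - i


def _lift(y, start, coeff):
    """Add coeff * (left neighbour + right neighbour) to every second sample from start."""
    n = len(y)
    for i in range(start, n, 2):
        y[i] += coeff * (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)])


def forward_97(x):
    """One level of the 9/7 lifting transform; returns (low, high) as float lists."""
    y = [float(v) for v in x]
    _lift(y, 1, ALPHA)   # first predict step (odd samples)
    _lift(y, 0, BETA)    # first update step (even samples)
    _lift(y, 1, GAMMA)   # second predict step
    _lift(y, 0, DELTA)   # second update step
    return [v / K for v in y[0::2]], [v * K for v in y[1::2]]


def inverse_97(low, high):
    """Undo forward_97: remove the scaling, then reverse the steps with negated coefficients."""
    n = len(low) + len(high)
    y = [0.0] * n
    y[0::2] = [v * K for v in low]
    y[1::2] = [v / K for v in high]
    _lift(y, 0, -DELTA)
    _lift(y, 1, -GAMMA)
    _lift(y, 0, -BETA)
    _lift(y, 1, -ALPHA)
    return y


if __name__ == "__main__":
    signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    low, high = forward_97(signal)
    restored = inverse_97(low, high)
    print(max(abs(a - b) for a, b in zip(signal, restored)) < 1e-9)  # True
```

Each of the four steps reads only the two nearest neighbours of a sample. That short dependency is what lets a hardware design insert pipeline registers between steps without a large increase in internal memory.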

13.

VLSI Architecture of Line-Based Lifting Wavelet Transform for Motion JPEG2000 Cited by: 1 (self-citations: 0, citations by others: 1)

Seo Y.-H. Kim D.-W. 《Solid-State Circuits, IEEE Journal of》2007,42(2):431-440

In this paper, we propose a new lifting processor architecture for JPEG2000 and implement it on both FPGA and ASIC. It includes a new cell structure that executes one unit of lifting calculation, to suit the repetitive arithmetic of the lifting process. After analyzing the operational sequence of the lifting arithmetic in detail and imposing causality for hardware implementation, the unit cell was optimized. A new, simple lifting kernel was organized by repeatedly arranging the unit cells, and a lifting processor for Motion JPEG2000 was realized with the kernel. The proposed processor can handle tiles of any size and supports both lossy and lossless operation, with the (9,7) filter and the (5,3) filter, respectively. It also has the same throughput rate as the input and can continuously output the wavelet coefficients of the four types (LL, LH, HL, HH) simultaneously. The lifting processor was implemented in a 0.35 μm CMOS fabrication process; the result occupied about 90 000 gates and operated stably at about 150 MHz.

14.

Two-Symbol FPGA Architecture for Fast Arithmetic Encoding in JPEG 2000

Nandini Ramesh Kumar Wei Xiang Yafeng Wang 《Journal of Signal Processing Systems》2012,69(2):213-224

JPEG 2000 is one of the most popular image compression standards, offering significant performance advantages over previous image standards. The high computational complexity of the JPEG 2000 algorithms makes it necessary to employ methods that overcome the bottlenecks of the system, and hence an efficient solution is imperative. One such crucial algorithm in JPEG 2000 is arithmetic coding, which is based entirely on bit-level operations. In this paper, an efficient hardware implementation of arithmetic coding is proposed which uses efficient pipelining and parallel processing for intermediate blocks. The idea is to provide a two-symbol coding engine that is efficient in terms of performance, memory, and hardware. This architecture is implemented in the Verilog hardware description language and synthesized on an Altera field-programmable gate array. The only memory unit used in this design is a FIFO (first in, first out) of 256 bits to store the CX-D pairs at the input, which is negligible compared to existing arithmetic coding hardware designs. The simulation and synthesis results show that the operating frequency of the proposed architecture is greater than 100 MHz and that it achieves a throughput of 212 Msymbols/sec, which is double the throughput of a conventional one-symbol implementation and at least a 50% throughput increase over existing two-symbol architectures.

15.

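VLSI Design of the 2-D 9/7 Wavelet Transform

Zhu Binjie, Du Huimin, Yang Xiaoqiang, Han Jungang 《International Electronic Elements》2009,17(2):11-13,16

To increase JPEG2000 image compression speed, a VLSI design with a mesh architecture is proposed for the lifting-based two-dimensional discrete 9/7 wavelet transform (DWT). A VLSI with this mesh architecture can process all pixels of an image in parallel. The parallel mesh architecture increases the speed of the wavelet transform circuit and of image compression.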