15 similar documents found (search time: 250 ms).
1.
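A parallel, array-based VLSI architecture design method is proposed for implementing the two-dimensional discrete wavelet transform (DWT) of a JPEG2000 encoding system with the lifting scheme. The resulting architecture consists of two row processors, one column processor, and a small number of line buffers; each row and column processor is built from a parallel array of processing elements. Row and column filtering proceed simultaneously, and multiplications are replaced by optimized shift-and-add operations. The whole architecture is pipelined. At the same precision, it greatly reduces the amount of computation and raises hardware utilization to nearly 100%, which speeds up the transform and also reduces circuit size. For an N×N image, the architecture completes the transform in O(N²/2) clock cycles. The 2-D DWT filter architecture has been verified on an FPGA and can be used as a standalone IP core in a JPEG2000 image codec chip that is under development.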
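The abstract does not say how its shift-and-add units are derived. One common way to build a multiplier-free constant multiplication, shown here as an assumption rather than the paper's method, is to recode the constant in non-adjacent form (NAF, a canonical signed-digit form) so that each nonzero digit contributes one shifted add or subtract. In hardware the constant would be a fixed-point version of a lifting coefficient; the function names below are hypothetical.

```python
def naf_digits(n):
    """Non-adjacent form of n >= 0, least-significant digit first; digits are -1, 0 or +1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n & 3)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def shift_add_multiply(x, n):
    """Compute x * n with shifts, adds and subtracts only; each nonzero NAF digit is one term."""
    acc = 0
    for shift, d in enumerate(naf_digits(n)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For example, 255 = 2^8 - 2^0 has only two nonzero NAF digits, so `shift_add_multiply(x, 255)` reduces to `(x << 8) - x` instead of eight shifted additions.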
2.
3.
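To further speed up JPEG2000 compression, the JPEG2000 standard was studied, and the analysis showed that the data independence within the discrete wavelet transform (DWT), a core JPEG2000 algorithm, makes it well suited to parallel processing. CUDA (Compute Unified Device Architecture), recently released by NVIDIA, is a hardware and software development platform well suited to massively data-parallel computation. The DWT was parallelized and accelerated with CUDA on a general-purpose graphics processing unit (GPGPU), with optimizations targeting the characteristics of the GPGPU memory spaces. The experimental results show that the CUDA-optimized parallel method effectively increases DWT computation speed.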
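The data independence the abstract relies on is visible in the reversible 5/3 lifting steps of JPEG2000: every high-pass output depends only on input samples, and every low-pass output depends only on inputs and finished high-pass outputs. A minimal sketch, assuming one CUDA thread per output coefficient and two kernel launches (or a barrier) between the steps, simulated serially in Python; it omits the GPGPU memory optimizations the abstract mentions, and all names are hypothetical.

```python
def mirror(i, n):
    """Whole-sample symmetric extension for one sample past either edge (needs n >= 2)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def predict_kernel(t, x, d):
    """'Thread' t computes high-pass d[t]; it reads only the input x, so all t are independent."""
    n = len(x)
    d[t] = x[2 * t + 1] - (x[2 * t] + x[mirror(2 * t + 2, n)]) // 2


def update_kernel(t, x, d, s):
    """'Thread' t computes low-pass s[t] from x and the already finished d."""
    n = len(x)
    # Odd positions 2t-1 and 2t+1 hold d[t-1] and d[t]; mirror() handles the borders.
    left = d[(mirror(2 * t - 1, n) - 1) // 2]
    right = d[(mirror(2 * t + 1, n) - 1) // 2]
    s[t] = x[2 * t] + (left + right + 2) // 4


def dwt53_forward_parallel(x):
    """One level of the 5/3 forward DWT as two data-parallel passes; returns (low, high)."""
    n = len(x)
    d = [0] * (n // 2)
    s = [0] * ((n + 1) // 2)
    for t in range(len(d)):  # "kernel launch" 1: one thread per high-pass coefficient
        predict_kernel(t, x, d)
    # Barrier: every update reads two entries of d, so all of d must be finished first.
    for t in range(len(s)):  # "kernel launch" 2: one thread per low-pass coefficient
        update_kernel(t, x, d, s)
    return s, d
```

Because neither loop body reads anything written by another iteration of the same loop, each loop can be mapped directly onto a grid of GPU threads.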
4.
5.
6.
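In JPEG 2000, lossless image compression is implemented with the integer 5/3 wavelet transform, and JPEG 2000 also specifies a lifting-based algorithm for the 5/3 wavelet. This work studies the lifting-based integer 5/3 wavelet transform and proposes a VLSI architecture for the two-dimensional transform. The architecture consists of four modules that run in parallel, each internally pipelined. For multi-level transforms, the computations of successive levels can also be interleaved, which exploits an advantage of the lifting scheme and considerably improves hardware efficiency. Its main advantages are low resource consumption and high computation speed, and it is also applicable to other integer wavelet transforms.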
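Losslessness comes from the fact that each integer lifting step can be undone exactly by applying the same rounded update with the opposite sign. A minimal sketch of one level of the 1-D integer 5/3 transform in its in-place (interleaved) form, using whole-sample symmetric extension at the borders; the function names are hypothetical, and a multi-level transform would reapply `fwd53` to the even (low-pass) samples.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension for one sample past either edge (needs n >= 2)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def fwd53(x):
    """Forward integer 5/3 lifting; low-pass ends up at even indices, high-pass at odd."""
    y = list(x)
    n = len(y)
    for i in range(1, n, 2):  # predict step
        y[i] -= (y[i - 1] + y[_mirror(i + 1, n)]) // 2
    for i in range(0, n, 2):  # update step
        y[i] += (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    return y


def inv53(y):
    """Inverse: undo the update, then the predict, with identical integer rounding."""
    x = list(y)
    n = len(x)
    for i in range(0, n, 2):
        x[i] -= (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)] + 2) // 4
    for i in range(1, n, 2):
        x[i] += (x[i - 1] + x[_mirror(i + 1, n)]) // 2
    return x
```

For example, `fwd53(list(range(8)))` gives `[0, 0, 2, 0, 4, 0, 6, 1]`, and `inv53` of that list returns the original ramp exactly. Python's `//` is floor division, which matches the floor operations of the integer lifting steps for negative values as well.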
7.
8.
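Implementation of a Parallel JPEG2000 Decoding Algorithm Based on OpenMP (total citations: 1; self-citations: 1; citations by others: 0)
To increase JPEG2000 decoding speed, high-speed parallel JPEG2000 decoding was implemented on a multi-core processor platform using OpenMP (Open specifications for Multi Processing). Specifically, OpenMP is used to run the T1 decoder and the inverse discrete wavelet transform of the JPEG2000 decoding process as multiple parallel threads, which reduces the running time of these two stages and therefore the overall JPEG2000 decoding time. The experimental results show that OpenMP is a simple and effective parallel programming tool: with decoded image quality unchanged, the proposed parallel decoding algorithm is significantly faster than the single-threaded serial algorithm.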
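In JPEG2000, Tier-1 (EBCOT) entropy coding is applied to each code-block independently, which is what makes the T1 decoder easy to split across threads; an OpenMP version would typically place `#pragma omp parallel for` on the loop over code-blocks. The sketch below expresses the same fork-join pattern with Python's `concurrent.futures`. `decode_code_block` is a hypothetical placeholder, not a real Tier-1 decoder, and under CPython a `ProcessPoolExecutor` would be needed for an actual CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_code_block(block):
    """Hypothetical stand-in for a Tier-1 decoder: returns (block_id, samples)."""
    block_id, payload = block
    # Placeholder work only; a real decoder would run the MQ decoder and coding passes here.
    return block_id, [byte ^ 0xFF for byte in payload]


def decode_all(blocks, workers=4):
    """Decode independent code-blocks in parallel (fork-join, like an OpenMP parallel for)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the input blocks.
        return list(pool.map(decode_code_block, blocks))
```

The `with` block acts as the implicit barrier at the end of an OpenMP parallel region: `decode_all` returns only after every code-block has been processed.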
9.
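To enable real-time compression of images from a linear-array CCD space camera, a two-dimensional lifting wavelet transform architecture and implementation scheme suited to FPGAs is proposed on the basis of the lifting algorithm. The system uses the FPGA's on-chip memory resources and ping-pong operation to buffer and transfer data between the row and column transforms, which lowers power consumption and improves hardware utilization and computation speed. In addition, to keep pace with the hardware implementation, wavelet boundary handling requires no separate boundary-extension step, which greatly reduces the complexity of the algorithm. The entire module is designed in Verilog HDL and was simulated in QuestaSim. The experimental results show that the system operates stably and reliably, fully meets real-time processing requirements, and is suitable for the multi-level 2-D 5/3 wavelet transform of JPEG2000.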
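Ping-pong buffering lets the row stage fill one on-chip buffer while the column stage drains the other, after which the two swap roles. A behavioral sketch, assuming the image arrives as blocks of rows and that `row_stage` and `col_stage` are hypothetical callables standing in for the row and column transforms. In hardware both stages act in the same clock step; the serial order here is safe because they never touch the same buffer within a step.

```python
def ping_pong_pipeline(blocks, row_stage, col_stage):
    """Simulate two-buffer (ping-pong) handoff between a row stage and a column stage."""
    buffers = [None, None]  # the "ping" and "pong" buffers
    results = []
    for step in range(len(blocks) + 1):  # one extra step drains the last filled buffer
        if step > 0:
            # Column stage reads the buffer the row stage filled on the previous step.
            results.append(col_stage(buffers[(step - 1) % 2]))
        if step < len(blocks):
            # Row stage fills the other buffer during the same step.
            buffers[step % 2] = row_stage(blocks[step])
    return results
```

For instance, with `row_stage=lambda b: [2 * v for v in b]` and `col_stage=sum`, the blocks `[[1, 2], [3, 4], [5, 6]]` produce `[6, 14, 22]`, one result per block, delayed by one step.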
10.
11.
We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μm technology is 2.8 mm², and the estimated frequency of operation is 200 MHz.
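For the irreversible 9/7 filter used for lossy coding in JPEG2000 Part 1, the lifting factorization has four lifting steps followed by a scaling step, which is where a multiplier is required. A minimal floating-point sketch using the 9/7 lifting constants and whole-sample symmetric extension. The scaling convention (high-pass multiplied by K, low-pass divided by K, giving a low-pass DC gain of 1) is one common choice and is an assumption here; the function names are hypothetical.

```python
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911076
DELTA = 0.443506852
K = 1.230174105


def _mirror(i, n):
    """Whole-sample symmetric extension for one sample past either edge (needs n >= 2)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def _lift(y, parity, coeff):
    """In place: add coeff * (left + right neighbour) to every sample of the given parity."""
    n = len(y)
    for i in range(parity, n, 2):
        y[i] += coeff * (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)])


def fwd97(x):
    """One level of the 1-D 9/7 lifting DWT; returns (low, high)."""
    y = [float(v) for v in x]
    _lift(y, 1, ALPHA)  # predict 1 (odd samples)
    _lift(y, 0, BETA)   # update 1 (even samples)
    _lift(y, 1, GAMMA)  # predict 2
    _lift(y, 0, DELTA)  # update 2
    return [v / K for v in y[0::2]], [v * K for v in y[1::2]]


def inv97(low, high):
    """Invert fwd97: undo the scaling, then run the lifting steps in reverse with negated constants."""
    y = [0.0] * (len(low) + len(high))
    y[0::2] = [v * K for v in low]
    y[1::2] = [v / K for v in high]
    _lift(y, 0, -DELTA)
    _lift(y, 1, -GAMMA)
    _lift(y, 0, -BETA)
    _lift(y, 1, -ALPHA)
    return y
```

Each lifting step reads only samples of the opposite parity, which it does not modify, so subtracting the same quantity in reverse order reconstructs the input up to floating-point rounding. A constant input yields low-pass values of about 1 and high-pass values of about 0.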
12.
In this paper, we propose an efficient pipeline architecture for the 9/7 DWT filter defined in JPEG 2000. The proposed architecture is composed of column and row processors that perform the separable 2-D DWT. Based on the rescheduling DWT algorithm, we derive a new data flow graph that shortens the critical path. Compared with other lifting-based architectures, the proposed 1-D column processor requires fewer pipeline registers to achieve about the same critical path. In the row processor, the data dependency of each lifting step is reduced to only two computation nodes, so more pipeline registers can be inserted to achieve higher processing speed without increasing the internal memory size in the 2-D case: for an N × N image, the row-wise transform requires only 4N internal memory. For the memory bit-width analysis, we use software simulation to reduce the memory bit-width for various compression ratios. Because part of the information in the least significant bits of the DWT coefficients is discarded after EBCOT Tier-2 processing, the data width of the internal memory can be reduced when JPEG 2000 coding targets various compression ratios, especially low bit rates. Our simulation results suggest that an energy-aware memory architecture that further reduces power consumption is practically feasible, which we leave to future work.
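The bit-width argument can be made concrete: if the k least-significant bits of a coefficient would be discarded later anyway, storing `c >> k` saves k bits per memory word at the cost of an error below 2^k. A minimal sketch with hypothetical names; how large k may be for a given compression ratio is what the paper determines by simulation, and that is not modeled here.

```python
def truncate_lsbs(coeffs, k):
    """Drop the k least-significant bits (arithmetic shift, i.e. floor division by 2**k)."""
    return [c >> k for c in coeffs]


def restore(stored, k):
    """Re-align truncated values to the original scale; the dropped bits come back as zeros."""
    return [s << k for s in stored]


def signed_width(values):
    """A sufficient two's-complement word width: magnitude bits plus one sign bit."""
    return max(abs(v).bit_length() for v in values) + 1
```

For coefficients in the range -300 to 300, `signed_width` gives 10 bits; after `truncate_lsbs(..., 3)` it gives 7 bits, and every restored value differs from the original by 0 to 7.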
13.
In this paper, we propose a new lifting-processor architecture for JPEG2000 and implement it on both FPGA and ASIC. It includes a new cell structure that executes one unit of lifting calculation, meeting the requirements of the repetitive arithmetic of the lifting process. The unit cell was optimized after analyzing the operational sequence of the lifting arithmetic in detail and imposing causality so that it can be implemented in hardware. A new, simple lifting kernel was built by repeatedly arranging the unit cells, and a lifting processor for Motion JPEG2000 was realized with this kernel. The proposed processor can handle tiles of any size and supports both lossy operation with the (9,7) filter and lossless operation with the (5,3) filter. It also has the same throughput rate as the input and can continuously output the four types of wavelet coefficients (LL, LH, HL, HH) simultaneously. The lifting processor was implemented in a 0.35-μm CMOS fabrication process, where it occupied about 90,000 gates and operated stably at about 150 MHz.
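How one 2-D decomposition level yields the four subbands can be sketched with the reversible 5/3 filter: filter every row, then filter every column of each row result. This is a behavioral sketch of the separable transform, not the paper's cell-based kernel; the function names are hypothetical, and because LH/HL naming differs between references, here the first letter is the horizontal (row) filter and the second the vertical (column) filter.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension for one sample past either edge (needs n >= 2)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift53(line):
    """One level of the reversible 5/3 lifting DWT on a 1-D list; returns (low, high)."""
    y = list(line)
    n = len(y)
    for i in range(1, n, 2):  # predict: odd samples become high-pass
        y[i] -= (y[i - 1] + y[_mirror(i + 1, n)]) // 2
    for i in range(0, n, 2):  # update: even samples become low-pass
        y[i] += (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    return y[0::2], y[1::2]


def _filter_columns(band):
    """Apply lift53 down each column; return (vertical low, vertical high) as lists of rows."""
    lows, highs = zip(*(lift53(list(col)) for col in zip(*band)))
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def dwt2d_53(image):
    """One 2-D level on a list of rows (at least 2x2); returns (LL, HL, LH, HH)."""
    row_low, row_high = zip(*(lift53(row) for row in image))
    ll, lh = _filter_columns(row_low)   # horizontally low-pass half
    hl, hh = _filter_columns(row_high)  # horizontally high-pass half
    return ll, hl, lh, hh
```

On a 4×6 image whose rows are all `[0, 1, 2, 3, 4, 5]`, the only nonzero detail coefficients land in HL (horizontal high-pass), since the image varies only horizontally; LH and HH are all zero.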
14.
JPEG 2000 is one of the most popular image compression standards, offering significant performance advantages over previous image standards. The high computational complexity of the JPEG 2000 algorithms makes it necessary to employ methods that overcome the bottlenecks of the system, so an efficient solution is imperative. One such crucial algorithm in JPEG 2000 is arithmetic coding, which is based entirely on bit-level operations. In this paper, an efficient hardware implementation of arithmetic coding is proposed that uses efficient pipelining and parallel processing for intermediate blocks. The idea is to provide a two-symbol coding engine that is efficient in terms of performance, memory, and hardware. The architecture is implemented in the Verilog hardware description language and synthesized for an Altera field-programmable gate array. The only memory unit used in this design is a 256-bit FIFO (first in, first out) that stores the CX-D pairs at the input, which is negligible compared with existing arithmetic coding hardware designs. The simulation and synthesis results show that the operating frequency of the proposed architecture is greater than 100 MHz and that it achieves a throughput of 212 Msymbols/s, which is double the throughput of a conventional one-symbol implementation and at least a 50% increase over existing two-symbol architectures.
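JPEG2000's arithmetic coder is the adaptive MQ coder, which uses fixed-width registers, renormalization, and a probability-state table; none of that is reproduced here. The sketch below shows only the underlying interval-subdivision idea, with exact fractions and a fixed probability, plus a variant that consumes two binary decisions per iteration as a loose analogy to a two-symbol engine. All names and the fixed-probability model are assumptions.

```python
from fractions import Fraction


def encode(bits, p0):
    """Exact binary arithmetic coding with fixed P(bit == 0) = p0; returns (low, width)."""
    p0 = Fraction(p0)
    low, width = Fraction(0), Fraction(1)
    for b in bits:
        w0 = width * p0
        if b == 0:
            width = w0  # keep the lower sub-interval
        else:
            low, width = low + w0, width - w0  # keep the upper sub-interval
    return low, width


def encode_two(bits, p0):
    """Same final interval as encode(), but narrows it once per pair of decisions."""
    if len(bits) % 2:
        raise ValueError("encode_two expects an even number of bits")
    p0 = Fraction(p0)
    p1 = 1 - p0
    order = [(0, 0), (0, 1), (1, 0), (1, 1)]
    probs = {(0, 0): p0 * p0, (0, 1): p0 * p1, (1, 0): p1 * p0, (1, 1): p1 * p1}
    low, width = Fraction(0), Fraction(1)
    for i in range(0, len(bits), 2):
        pair = (bits[i], bits[i + 1])
        # Skip past the sub-intervals of every pair ordered before this one.
        for q in order[:order.index(pair)]:
            low += width * probs[q]
        width *= probs[pair]
    return low, width


def decode(value, n, p0):
    """Recover n bits from any value inside the final interval produced by encode()."""
    p0 = Fraction(p0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        w0 = width * p0
        if value < low + w0:
            out.append(0)
            width = w0
        else:
            out.append(1)
            low, width = low + w0, width - w0
    return out
```

The two-symbol loop reaches exactly the same interval because the four pair probabilities partition the current interval in the same order that two single-symbol steps would; hardware engines exploit the same equivalence with finite-precision registers.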