
1.

邸志雄, 史江义, 刘凯, 李云松, 马佩军, 都跃. Acta Electronica Sinica (电子学报), 2013, 41(5): 918-925

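Because of its low efficiency, the MQ (Multiple Quantization) coder has become the performance bottleneck of JPEG2000. This paper extracts the context relationships in the MQ coding algorithm, separates the start-up states from the non-transient states in the index table, and proposes a method for predicting index values. In addition, the more-probable and less-probable events that arise in the renormalization operation are separated, so that two contexts can be encoded in parallel. Based on this algorithm, the paper proposes a VLSI architecture for an MQ encoder that processes multiple contexts in parallel. Experimental results show that the proposed MQ encoder operates at 286.80 MHz with a throughput of 573.60 Msymbols/s. Compared with the Brute Force with Modified Byteout architecture proposed by Dyer, throughput increases by about 35% and area decreases by 78%.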
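As a consistency check on the reported figures, dividing the throughput by the clock frequency gives the average number of symbols encoded per cycle. The result matches the two-context parallel encoding:

```latex
\frac{573.60\ \text{Msymbols/s}}{286.80\ \text{MHz}} = 2\ \text{symbols/cycle}
```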

2.

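Algorithm optimization and VLSI design of the JPEG2000 arithmetic coder (total citations: 1; self-citations: 1; citations by others: 0)

刘文松, 朱恩, 王健, 徐龙涛, 林叶. Acta Electronica Sinica (电子学报), 2011, 39(11): 2486-2491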

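This paper studies the algorithm and circuit implementation of the JPEG2000 arithmetic coder. It proposes a new sequential structure for the renormalization procedure. An independent procedure predicts the total number of shifts, so the coding algorithm can finish processing the current context sequentially in a single pass. On this basis, a three-stage pipelined circuit with an auxiliary (slave) pipeline is designed. The main pipeline handles the common case in which no coded byte is output, and the slave pipeline separately handles the output of coded bytes. This effectively shortens the critical-path delay of each circuit stage...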
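The idea of predicting the total shift count can be illustrated with a minimal Python sketch of the interval-register (A) update of the standard JPEG2000 MQ encoder (ITU-T T.800, Annex C). The shift count is computed in one step from the leading-zero count of A, rather than by shifting one bit at a time. The function name `mq_interval_step` is hypothetical. The C-register byte-out logic and the probability-state update are deliberately left out, so this is not the paper's circuit.

```python
from __future__ import annotations


def mq_interval_step(a: int, qe: int, is_mps: bool) -> tuple[int, int, int]:
    """Code one symbol in the 16-bit MQ interval register A.

    Returns (new_a, c_increment, shifts). `shifts` is the total number of
    renormalization left-shifts, obtained in one step from the leading-zero
    count of A instead of a shift-by-one loop. The C register's byte-out
    logic and the probability-state update are omitted in this sketch.
    """
    assert 0x8000 <= a <= 0xFFFF and 0 < qe < 0x8000
    a -= qe
    c_inc = 0
    if is_mps:
        if a & 0x8000:            # A still normalized: no renormalization
            return a, qe, 0
        if a < qe:                # conditional exchange: MPS takes the Qe interval
            a = qe
        else:
            c_inc = qe
    else:
        if a < qe:                # conditional exchange: LPS takes the A - Qe interval
            c_inc = qe
        else:
            a = qe
    shifts = 16 - a.bit_length()  # leading zeros of A in a 16-bit word
    return a << shifts, c_inc, shifts
```

For example, with A = 0x8000 and Qe = 0x5601, coding an LPS leaves A - Qe = 0x29FF. Its two leading zeros give shifts = 2 and a renormalized A of 0xA7FC.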

3.

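A VLSI implementation of a high-speed MQ encoder for JPEG2000 (total citations: 6; self-citations: 0; citations by others: 6)

华林, 朱珂, 周晓方, 俞军, 章倩苓. Research & Progress of Solid State Electronics (固体电子学研究与进展), 2003, 23(4): 421-426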

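The MQ coder is a very effective method for lossless data compression and has been adopted by the JPEG2000 standard. However, the coding algorithm is highly complex and slow to execute. This paper proposes a high-performance VLSI architecture for an MQ encoder based on dynamic pipelining. To achieve high-speed processing, the authors first analyze the software flow of the MQ coding algorithm in the JPEG2000 standard and modify it for hardware implementation. They then adopt a "dynamic pipelining" technique that schedules pipeline operations in real time according to the varying workload. Implemented on a Xilinx FPGA, the MQ encoder reaches a processing speed of about 0.625 bit/cycle (32.83 Mbit/s).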
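The two reported figures imply the clock frequency of the FPGA implementation. This is a derived value; the abstract does not state it directly:

```latex
f_{\text{clk}} = \frac{32.83\ \text{Mbit/s}}{0.625\ \text{bit/cycle}} \approx 52.5\ \text{MHz}
```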

4.

VLSI architecture design of the MQ encoder in the JPEG2000 standard

尚中祥, 宋学瑞. Modern Electronics Technique (现代电子技术), 2009, 32(14): 83-86

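The MQ coder is an important lossless compression algorithm in the JPEG 2000 standard and achieves high compression efficiency. However, its high algorithmic complexity and slow execution severely limit its use. To achieve high-speed processing, this paper designs a VLSI architecture for a high-speed MQ encoder. The design uses a three-stage pipeline, optimizes the algorithm, and improves the contents of the probability estimation table. It is written in Verilog and simulated with ModelSim 6.1. Experimental results show that the design greatly increases coding speed. The work is relevant to practical applications of JPEG 2000.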
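The paper modifies the probability estimation table, but the abstract does not give the modified contents. For reference, the sketch below encodes the standard JPEG2000 MQ table (ITU-T T.800, Table C.2) and its update rule in Python. `MQ_TABLE` and `next_state` are hypothetical names, and the values are the standard ones, not the paper's improved table.

```python
from __future__ import annotations

# (Qe, NMPS, NLPS, SWITCH) for state indices 0..46, from ITU-T T.800 Table C.2.
MQ_TABLE = [
    (0x5601, 1, 1, 1), (0x3401, 2, 6, 0), (0x1801, 3, 9, 0), (0x0AC1, 4, 12, 0),
    (0x0521, 5, 29, 0), (0x0221, 38, 33, 0), (0x5601, 7, 6, 1), (0x5401, 8, 14, 0),
    (0x4801, 9, 14, 0), (0x3801, 10, 14, 0), (0x3001, 11, 17, 0), (0x2401, 12, 18, 0),
    (0x1C01, 13, 20, 0), (0x1601, 29, 21, 0), (0x5601, 15, 14, 1), (0x5401, 16, 14, 0),
    (0x5101, 17, 15, 0), (0x4801, 18, 16, 0), (0x3801, 19, 17, 0), (0x3401, 20, 18, 0),
    (0x3001, 21, 19, 0), (0x2801, 22, 19, 0), (0x2401, 23, 20, 0), (0x2201, 24, 21, 0),
    (0x1C01, 25, 22, 0), (0x1801, 26, 23, 0), (0x1601, 27, 24, 0), (0x1401, 28, 25, 0),
    (0x1201, 29, 26, 0), (0x1101, 30, 27, 0), (0x0AC1, 31, 28, 0), (0x09C1, 32, 29, 0),
    (0x08A1, 33, 30, 0), (0x0521, 34, 31, 0), (0x0441, 35, 32, 0), (0x02A1, 36, 33, 0),
    (0x0221, 37, 34, 0), (0x0141, 38, 35, 0), (0x0111, 39, 36, 0), (0x0085, 40, 37, 0),
    (0x0049, 41, 38, 0), (0x0025, 42, 39, 0), (0x0015, 43, 40, 0), (0x0009, 44, 41, 0),
    (0x0005, 45, 42, 0), (0x0001, 45, 43, 0), (0x5601, 46, 46, 0),
]


def next_state(index: int, mps: int, coded_mps: bool, renormalized: bool) -> tuple[int, int]:
    """Return (index, mps) for a context after coding one symbol.

    An MPS advances the state only if it triggered renormalization. An LPS
    (which always triggers renormalization) moves to NLPS and flips the
    MPS sense when SWITCH is set.
    """
    _, nmps, nlps, switch = MQ_TABLE[index]
    if coded_mps:
        return (nmps, mps) if renormalized else (index, mps)
    return (nlps, (1 - mps) if switch else mps)
```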

5.

A joint image compression and encryption algorithm based on the MQ coder

谢凯明, 邓家先. Video Engineering (电视技术), 2014, (9)

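To compress and encrypt images at the same time, this paper improves the embedded zerotree wavelet compression algorithm with an MQ coder. A hybrid chaotic sequence serves as a keystream that modifies the contexts and decisions produced by bit-plane coding. The modified values are then fed to the MQ coder for entropy coding. Simulation results show that, compared with the original compression algorithm, the PSNR of the reconstructed image improves by at least 1 dB. The algorithm also resists attacks well and encrypts and decrypts quickly. It supports resolution-selective encryption and performs encryption within the arithmetic coder while the data is being compressed.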
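The keystream idea can be sketched as follows. The sketch rests on several assumptions: a single logistic map stands in for the paper's hybrid chaotic sequence, only the decisions (not the contexts) are scrambled, and the function names are hypothetical. It is an illustration, not a secure cipher.

```python
from __future__ import annotations


def logistic_keystream(x0: float, n: int, r: float = 3.99, burn_in: int = 100) -> list[int]:
    """Derive n key bits from the logistic map x <- r * x * (1 - x), with 0 < x0 < 1."""
    assert 0.0 < x0 < 1.0
    x = x0
    for _ in range(burn_in):              # discard the initial transient
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)  # threshold the chaotic orbit
    return bits


def scramble_decisions(decisions: list[int], key: list[int]) -> list[int]:
    """XOR each binary decision with one key bit; the same call also descrambles."""
    assert len(decisions) == len(key)
    return [d ^ k for d, k in zip(decisions, key)]
```

In this sketch the decoder regenerates the same keystream from the secret initial value x0 and applies `scramble_decisions` again after MQ decoding.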

6.

VLSI architecture design and implementation of a high-performance Tier-1 encoder for JPEG2000

徐伟哲, 苏阳平, 许旌阳, 王进祥. Microelectronics & Computer (微电子学与计算机), 2014, (3)

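To meet the hardware requirements of JPEG2000 encoders, this paper proposes an efficient hardware architecture for the Tier-1 encoder, the most complex and time-consuming part. The architecture uses a pass-parallel bit-plane coder. Within each pass it applies a column-based sample-skipping algorithm, which speeds up bit-plane coding. The MQ encoder is coupled to the bit-plane coder through a five-stage dynamic pipeline, which further improves coding efficiency. FPGA verification shows that a Tier-1 encoder using this architecture improves coding efficiency by 70% while increasing hardware cost by only 18.2%.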
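The stripe-oriented scan below follows JPEG2000: the code block is divided into stripes of four rows, each stripe is scanned column by column, and each column is scanned top to bottom. The skipping rule jumps over whole stripe columns that contain no sample of the current pass. That rule is an assumed simplification of the paper's column-based sample skipping, and `scan_with_column_skip` is a hypothetical name.

```python
from __future__ import annotations


def scan_with_column_skip(member: list[list[bool]]) -> tuple[list[tuple[int, int]], int]:
    """Visit the samples flagged in `member` in JPEG2000 stripe order.

    member[r][c] is True if sample (r, c) belongs to the current coding pass.
    A stripe column (up to 4 vertically adjacent samples) with no member is
    skipped as a single step. Returns (visited positions, skipped columns).
    """
    height, width = len(member), len(member[0])
    visited: list[tuple[int, int]] = []
    skipped = 0
    for top in range(0, height, 4):                  # stripes of 4 rows
        rows = range(top, min(top + 4, height))
        for c in range(width):                       # columns left to right
            hits = [r for r in rows if member[r][c]]
            if not hits:
                skipped += 1                         # nothing to code: skip column
                continue
            visited.extend((r, c) for r in hits)     # top to bottom in the column
    return visited, skipped
```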

7.

Research and implementation of a high-speed MQ arithmetic encoder for JPEG2000

周赟, 支琤, 王峰, 陈磊. Information Technology (信息技术), 2007, (10): 49-52

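This paper proposes a pipelined VLSI architecture for a high-speed MQ arithmetic encoder. It uses table expansion and ping-pong buffered output, and it optimizes and adjusts the standard coding flow for high-speed VLSI implementation. The architecture is divided into three pipeline stages, which greatly increases processing speed. Verified on a Xilinx FPGA, it reaches a processing speed of 1 bit/cycle (47.292 Mbit/s).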
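A ping-pong buffer lets the encoder write output bytes into one bank while the other bank is drained. The Python class below is a simplified software model of that behavior: draining happens instantly at each swap, whereas in hardware both banks are accessed concurrently. `PingPongBuffer` is a hypothetical name.

```python
from __future__ import annotations


class PingPongBuffer:
    """Two fixed-size byte banks: the encoder fills one while the other is drained."""

    def __init__(self, size: int):
        self.size = size
        self.banks = [bytearray(), bytearray()]
        self.write_bank = 0
        self.drained: list[bytes] = []   # stands in for the downstream consumer

    def push(self, byte: int) -> None:
        bank = self.banks[self.write_bank]
        bank.append(byte)
        if len(bank) == self.size:       # bank full: swap the roles of the banks
            self._swap()

    def _swap(self) -> None:
        full = self.write_bank
        self.write_bank ^= 1             # the encoder continues in the other bank
        self.drained.append(bytes(self.banks[full]))  # consumer reads the full bank
        self.banks[full].clear()

    def flush(self) -> bytes:
        """Drain any partially filled bank and return all bytes in order."""
        if self.banks[self.write_bank]:
            self._swap()
        return b"".join(self.drained)
```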

8.

VLSI architecture design of bit-plane coding in JPEG2000

乔世杰, 张益民, 高勇. Chinese Journal of Electron Devices (电子器件), 2007, 30(6): 2229-2232

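Bit-plane coding encodes code blocks of quantized discrete wavelet transform coefficients. The authors analyze the bit-plane coding algorithm and verify it in C. They then present VLSI implementations of the four basic coding operations and the three coding passes. The bit-plane coder's VLSI architecture was simulated and synthesized, and logic-analyzer measurements on an image verification system agree with the simulation results. The bit-plane coder can encode a 32×32 code block at a 50 MHz clock. It has been used as a standalone IP core in a JPEG2000 image-coding chip currently under development.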
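The three coding passes can be sketched in Python as a classifier. It assigns each sample of one bit-plane to the significance propagation pass (SPP), the magnitude refinement pass (MRP) or the cleanup pass (CUP), following the standard's membership rules. The sketch is simplified: the four coding operations, context labels and the stripe-causal option are not modeled, and the function names are hypothetical.

```python
from __future__ import annotations


def stripe_order(h: int, w: int):
    """JPEG2000 scan: stripes of 4 rows, column by column, top to bottom."""
    for top in range(0, h, 4):
        for c in range(w):
            for r in range(top, min(top + 4, h)):
                yield r, c


def classify_passes(mag: list[list[int]], sig: list[list[bool]],
                    p: int) -> dict[str, list[tuple[int, int]]]:
    """Assign the samples of bit-plane p to the three coding passes.

    mag: coefficient magnitudes; sig: significance state before this plane
    (updated in place). Returns the samples of each pass in coding order.
    """
    h, w = len(mag), len(mag[0])
    was_sig = [row[:] for row in sig]            # significant before plane p
    coded = [[False] * w for _ in range(h)]
    passes = {"SPP": [], "MRP": [], "CUP": []}

    def has_sig_neighbour(r: int, c: int) -> bool:   # 8-connected neighbourhood
        return any(sig[rr][cc]
                   for rr in range(max(r - 1, 0), min(r + 2, h))
                   for cc in range(max(c - 1, 0), min(c + 2, w))
                   if (rr, cc) != (r, c))

    # 1) significance propagation: insignificant samples with a significant neighbour
    for r, c in stripe_order(h, w):
        if not sig[r][c] and has_sig_neighbour(r, c):
            passes["SPP"].append((r, c))
            coded[r][c] = True
            if (mag[r][c] >> p) & 1:
                sig[r][c] = True                 # becomes significant immediately
    # 2) magnitude refinement: samples that were significant before this plane
    for r, c in stripe_order(h, w):
        if was_sig[r][c]:
            passes["MRP"].append((r, c))
    # 3) cleanup: every remaining sample not yet coded in this plane
    for r, c in stripe_order(h, w):
        if not was_sig[r][c] and not coded[r][c]:
            passes["CUP"].append((r, c))
            if (mag[r][c] >> p) & 1:
                sig[r][c] = True
    return passes
```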

9.

Research and progress on VLSI architectures for block-matching motion estimation

郑兆青, 桑红石, 沈绪榜. China Integrated Circuit (中国集成电路), 2006, 15(10)

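Block-matching motion estimation is the most computation- and memory-access-intensive module of a video encoder. To meet real-time coding requirements, it is commonly implemented with dedicated VLSI architectures. This paper systematically reviews VLSI architectures for block-matching motion estimation and proposes directions for improvement.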
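For reference, the exhaustive (full-search) block-matching computation, a common baseline for such architectures, fits in a few lines of Python. The sum of absolute differences (SAD) cost and the name `full_search` are assumptions chosen for illustration; the review itself covers a range of architectures.

```python
def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching with a SAD cost.

    Compares the n x n block of `cur` whose top-left corner is (by, bx) with
    every candidate in `ref` displaced by at most `search_range` pixels.
    Returns (dy, dx, sad) of the best candidate (the first one found on ties).
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue                  # candidate leaves the reference frame
            sad = sum(abs(cur[by + i][bx + j] - ref[y + i][x + j])
                      for i in range(n) for j in range(n))
            if best is None or sad < best[2]:
                best = (dy, dx, sad)
    return best
```

The inner SAD sum is the part that VLSI designs typically spread across an array of processing elements.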

10.

An adaptive rate-control algorithm based on FBRM

许磊, 王胜利, 谢慧, 王丽丽. Electronic Science and Technology (电子科技), 2010, 23(9): 77-79, 82

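In EBCOT, the MQ coder consumes a large share of the coding time and resources. To address this, the paper proposes an adaptive MQ rate-control algorithm based on rate feedback. Coding passes are adaptively selected for MQ arithmetic coding according to the characteristics of the wavelet subbands. The coding passes that enter the bitstream first are fed back to control the passes not yet sent to the MQ coder. The algorithm then locates truncation points and discards the segments of coding passes that contribute nothing to the final bitstream. The method has almost no effect on image compression quality while greatly improving the overall coding efficiency of EBCOT. Experimental results show that it effectively reduces the computation and storage of the MQ coder in EBCOT and is easy to implement in hardware.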
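The abstract does not describe the FBRM algorithm in enough detail to reproduce it. As background, the sketch below implements the standard EBCOT post-compression rate-distortion step, which finds the truncation points on the convex hull of a code block's rate-distortion curve, plus a simple budget-based choice. The function names are hypothetical.

```python
from __future__ import annotations


def hull_truncation_points(rates: list[float], dists: list[float]) -> list[int]:
    """Indices of truncation points on the lower convex hull of the R-D curve.

    rates[i], dists[i]: cumulative length and distortion if the code block is
    truncated after i coding passes (i = 0 means nothing is sent).
    """
    assert rates[0] == 0 and all(a < b for a, b in zip(rates, rates[1:]))
    hull, slopes = [0], [float("inf")]
    for i in range(1, len(rates)):
        while True:
            j = hull[-1]
            gain = dists[j] - dists[i]
            if gain <= 0:
                break                           # no distortion reduction: skip i
            slope = gain / (rates[i] - rates[j])
            if slope >= slopes[-1]:
                hull.pop()                      # j lies below the hull: drop it
                slopes.pop()
                continue
            hull.append(i)
            slopes.append(slope)
            break
    return hull


def truncate_for_budget(rates: list[float], hull: list[int], budget: float) -> int:
    """Largest hull truncation point whose cumulative rate fits in the budget."""
    return max(i for i in hull if rates[i] <= budget)
```

Passes after the chosen truncation point never reach the final bitstream. Avoiding MQ work on such passes is the saving the FBRM approach targets. How the paper predicts the truncation point before those passes are coded is not reproduced here.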

11.

Hardware implementation of context-based adaptive binary arithmetic coding

陈光化, 陆桂富, 武凯. Microelectronics & Computer (微电子学与计算机), 2006, 23(11): 16-18, 25

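This paper proposes a VLSI implementation of an adaptive arithmetic encoder for the H.264 standard. It improves the arithmetic-coder structure, replaces multiplication with table lookup, and uses a pipelined structure to achieve high throughput. The encoding module was described in Verilog and simulated and verified on an Altera field-programmable gate array (FPGA). Experiments show that this pipelined arithmetic encoder achieves high coding speed.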
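The multiplication-free interval update can be sketched as follows. As in H.264 CABAC, the LPS sub-range is looked up using the probability state and two bits of the 9-bit range. The table values here, however, are computed from the CABAC probability model (p_s = 0.5 * alpha^s with alpha = (0.01875/0.5)^(1/63)) and are not the standard's `rangeTabLPS` entries. Bit output and state transitions are omitted, and all names are hypothetical.

```python
from __future__ import annotations

# Ratio between successive LPS probabilities in the 64-state CABAC model.
ALPHA = (0.01875 / 0.5) ** (1 / 63)


def build_lps_table(num_states: int = 64) -> list[list[int]]:
    """Precompute rLPS ~ R * p_LPS for each state and each quarter of [256, 511]."""
    table = []
    for s in range(num_states):
        p_lps = 0.5 * ALPHA ** s
        # Representative range: centre of the quarter selected by (R >> 6) & 3.
        table.append([round((256 + 64 * q + 32) * p_lps) for q in range(4)])
    return table


LPS_TABLE = build_lps_table()


def encode_bin(low: int, rng: int, state: int, is_lps: bool) -> tuple[int, int, int]:
    """One interval update using a table lookup instead of R * p_LPS."""
    assert 256 <= rng <= 510
    r_lps = LPS_TABLE[state][(rng >> 6) & 3]
    rng -= r_lps                          # MPS keeps the lower part of the range
    if is_lps:
        low += rng                        # LPS takes the upper r_lps values
        rng = r_lps
    shifts = 0
    while rng < 256:                      # renormalize to a 9-bit range
        rng <<= 1
        low <<= 1
        shifts += 1
    return low, rng, shifts
```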

12.

JAGUAR: a fully pipelined VLSI architecture for JPEG image compression standard

Kovac M., Ranganathan N. Proceedings of the IEEE, 1995, 83(2): 247-258

In this paper, we describe a fully pipelined single-chip VLSI architecture for implementing the JPEG baseline image compression standard. The architecture exploits pipelining and parallelism to the maximum extent in order to obtain high speed and throughput. The architectures for the discrete cosine transform and the entropy encoder are based on efficient algorithms designed for high-speed VLSI implementation. The entire architecture can be implemented on a single VLSI chip with a clock rate of about 100 MHz, which would allow an input rate of 30 frames per second for 1024×1024 color images.
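The discrete cosine transform at the front of the pipeline can be illustrated with a direct 8×8 DCT-II in Python using the JPEG normalization. This direct separable form is for clarity only. It is not the fast DCT algorithm the paper uses, and it leaves out the level shift by 128.

```python
import math


def dct_8x8(block):
    """2-D DCT-II of an 8x8 block with JPEG normalization:
    F(u, v) = 1/4 C(u) C(v) sum_{x,y} f(x, y) cos((2x+1)u pi/16) cos((2y+1)v pi/16),
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. block[x][y] holds f(x, y).
    """
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    def dct_1d(vec):  # 1-D transform; two passes give the 1/4 C(u) C(v) factor
        return [0.5 * scale(u) * sum(vec[i] * math.cos((2 * i + 1) * u * math.pi / 16)
                                     for i in range(8))
                for u in range(8)]

    rows = [dct_1d(row) for row in block]                              # along y
    cols = [dct_1d([rows[x][v] for x in range(8)]) for v in range(8)]  # along x
    return [[cols[v][u] for v in range(8)] for u in range(8)]          # F[u][v]
```

A constant block puts all of its energy in the DC coefficient F[0][0]. This energy compaction is what the downstream quantizer and entropy encoder exploit.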

13.

Parallel interleaver design and VLSI architecture for low-latency MAP turbo decoders

Dobkin R., Peleg M., Ginosar R. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005, 13(4): 427-438

Standard VLSI implementations of turbo decoding require substantial memory and incur a long latency, which cannot be tolerated in some applications. This paper presents a parallel VLSI architecture for low-latency turbo decoding, in which multiple single-input single-output (SISO) elements operate jointly on one turbo-coded block, and compares it with sequential architectures. A parallel interleaver is essential for processing multiple concurrent SISO outputs. A novel parallel interleaver and an algorithm for its design are presented; they achieve the same error-correction performance as the standard architecture. Relative to sequential decoders using the same silicon area, latency is reduced by up to 20 times and throughput for large blocks is increased up to sixfold, while a very high coding gain is maintained. The parallel architecture scales favorably: latency and throughput improve with increased block size and chip area.
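Whether a given interleaver lets M SISO elements work without memory conflicts can be checked mechanically. The sketch assumes that the block is split into M equal windows of W symbols and that memory is banked by window (bank = address // W). It is a generic contention check, not the paper's design algorithm, and `is_contention_free` is a hypothetical name.

```python
def is_contention_free(perm, num_windows):
    """Check that parallel SISOs never access the same memory bank in one step.

    The block of length N = M * W is split into M windows of W symbols. At step
    t, SISO j handles position t + j*W and accesses interleaved address
    perm[t + j*W]. Memory is banked by window: bank = address // W.
    """
    n = len(perm)
    assert n % num_windows == 0
    w = n // num_windows
    for t in range(w):
        banks = {perm[t + j * w] // w for j in range(num_windows)}
        if len(banks) < num_windows:      # two SISOs hit the same bank
            return False
    return True
```

For example, the quadratic permutation polynomial interleaver perm[x] = (3x + 10x^2) mod 40 passes this check with four windows, while the permutation [0, 4, 1, 5, 2, 6, 3, 7] fails it with two.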

14.

High performance VLSI architecture for division and square root

McQuillan S.E., McCanny J.V., Woods R.F. Electronics Letters, 1991, 27(1): 19-21

A novel high-performance bit-parallel architecture to perform square root and division is proposed. Relevant VLSI design issues are addressed. By employing redundant arithmetic and a semisystolic schedule, the throughput is made independent of the size of the array.
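The role of redundant arithmetic can be illustrated with radix-2 SRT division, whose quotient digits come from {-1, 0, 1}. Because the digit set is redundant, each digit is chosen by comparing the partial remainder only with ±1/2, so in hardware an estimate from a few leading bits is enough. The sketch uses exact fractions for clarity and covers division only. It is not the paper's combined division and square-root array.

```python
from fractions import Fraction


def srt_divide(x, d, n):
    """Radix-2 SRT division with redundant quotient digits in {-1, 0, 1}.

    Requires 1/2 <= d < 1 and |x| <= d. Returns q with |x/d - q| <= 2**-n.
    """
    x, d = Fraction(x), Fraction(d)
    assert Fraction(1, 2) <= d < 1 and abs(x) <= d
    w, q = x, Fraction(0)
    for j in range(1, n + 1):
        two_w = 2 * w
        # Digit selection only compares 2w with +1/2 and -1/2.
        if two_w >= Fraction(1, 2):
            digit = 1
        elif two_w < Fraction(-1, 2):
            digit = -1
        else:
            digit = 0
        w = two_w - digit * d          # partial remainder stays in [-d, d]
        q += Fraction(digit, 2 ** j)
    return q
```

After n steps, x = q·d + w·2^-n with |w| <= d, which gives the stated error bound.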

15.

Custom design of a VLSI PCM-FDM transmultiplexer from system specifications to circuit layout using a computer-aided design system

IEEE Journal of Solid-State Circuits, 1986, 21(1): 73-85

The computer-aided design of a VLSI PCM-FDM transmultiplexer is presented. The entire design process, from system specifications to integrated-circuit layout, is carried out with specialized computer programs for analysis, synthesis, and optimization at each design level: the filter network, the architecture, and the circuit layout. These CAD tools support a top-down custom design methodology based on bit-serial architectures and standard cells. A customized architecture is constructed and integrated using a 5-μm CMOS cell library. The results are compared with a fully manual design and demonstrate the power of architecture-based computer-aided design methodologies for VLSI filtering. By combining synthesis and optimization aids at each design level, it is possible to achieve a high degree of automation while retaining efficient use of silicon area, high throughput, and moderate power consumption.
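Bit-serial arithmetic, on which this design methodology is based, processes one bit per clock cycle with a small circuit reused every cycle. A minimal Python model of a bit-serial adder (hypothetical helper names) shows the idea:

```python
def to_bits(value, width):
    """Unsigned integer -> list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two unsigned numbers supplied LSB first, one bit pair per clock cycle.

    A single full adder plus a carry flip-flop is reused every cycle, which is
    what keeps bit-serial datapaths small.
    """
    carry = 0                                   # carry flip-flop, cleared per word
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)               # sum bit for this cycle
        carry = (a & b) | (carry & (a ^ b))     # carry into the next cycle
    out.append(carry)                           # final carry-out as the MSB
    return out
```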

16.

Pipelining flat CORDIC based trigonometric function generators

Microelectronics Journal, 2002, 33(1-2): 77-89

Despite refinements of the CORDIC algorithm in terms of circuit latency and performance, such as redundant arithmetic and higher-radix CORDIC techniques, its iterative nature remains the major bottleneck for further optimization. A technique known as flat CORDIC can be used to eliminate the iterative process: the conventional X and Y recurrences are successively substituted to express the final vectors in terms of the initial vectors. This paper presents techniques for the efficient VLSI implementation of a pipelined 16-bit sine-cosine generator based on flat CORDIC. Three possible schemes for pipelining the 16-bit flat CORDIC design are presented to demonstrate the suitability of the proposed method for high-throughput implementations. The 16-bit architecture has been synthesized with a 0.35 μm CMOS process library using Synopsys. Finally, a detailed comparison with other major contributions shows that sine-cosine generators based on flat CORDIC are, on average, 30% faster and occupy about 30% less silicon area.
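For contrast, the conventional iterative CORDIC that flat CORDIC removes can be sketched in Python. Each iteration's rotation direction depends on the residual angle left by the previous one. Flat CORDIC removes this serial dependency by substituting the X and Y recurrences. The sketch uses floating point and 16 iterations; both are assumptions, not the paper's 16-bit fixed-point datapath.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Conventional iterative CORDIC in rotation mode, for |theta| <= pi/2.

    Returns (sin(theta), cos(theta)) to within roughly atan(2**-(iterations-1)).
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta            # pre-scale by the CORDIC gain
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0        # direction depends on the last residual
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```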

17.

An ultra-reduced processor architecture for on-chip parallel computing arrays

周韧研, 刘雷波, 魏少军. Journal of Circuits and Systems (电路与系统学报), 2012, 17(2): 1-5

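This paper proposes an ultra-reduced processing-element (PE) architecture based on an operate-and-branch single-instruction processor model. Through instruction optimization and accelerators on the internal bus, the PE can perform efficient bit operations, which are difficult for conventional arithmetic-type single-instruction processors. It can also perform data-move operations, which such processors execute inefficiently. Large on-chip parallel computing arrays built from this PE suit computing tasks with strong locality and strict real-time requirements, such as image processing. A 16×16 prototype array built from this PE has been implemented on an FPGA, achieving 30.7 GOPS at 120 MHz with an average power consumption of 39.5 mW.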
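The abstract does not specify the instruction format. To illustrate the operate-and-branch idea, the sketch below interprets SUBLEQ, a classic one-instruction computer whose single instruction subtracts and branches if the result is less than or equal to zero. SUBLEQ is a stand-in chosen here, not the paper's instruction set.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Interpret SUBLEQ: each instruction is three words a, b, c.

    mem[b] -= mem[a]; if the result is <= 0 jump to c, otherwise fall
    through to the next instruction. Execution halts when pc becomes negative.
    """
    steps = 0
    while pc >= 0:
        if steps == max_steps:
            raise RuntimeError("step limit reached")
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Add the word at address 9 into the word at address 10, using address 11 as zero.
program = [9, 11, 3,    # Z -= X       (Z = -X)
           11, 10, 6,   # Y -= Z       (Y = Y + X)
           11, 11, -1,  # Z -= Z, halt (Z = 0)
           7, 5, 0]     # data: X = 7, Y = 5, Z = 0
```

Bit-level operations such as AND take long instruction sequences on a machine like this. That is the kind of inefficiency the paper's bus accelerators target.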

18.

A high throughput pass parallel block decoder architecture for JPEG 2000 that prevents stalling in the decoding process

Integration, the VLSI Journal, 2020

The block decoder (BD), an indispensable component of the JPEG 2000 image compression standard, has the highest computational complexity and determines the speed of the overall decoder system. This paper proposes a high-throughput pass-parallel BD architecture that can decode more than one bit per clock cycle. In a BD, the dependency between context generation and the arithmetic decoding unit introduces stalls and reduces the throughput of the decoding process. The proposed selective byte input and synchronous sample skipping techniques prevent these stalls. The proposed architecture achieves 86% higher throughput than the best available serial BD architecture at a 50% higher hardware cost. Compared with the best available pass-parallel architecture, throughput improves almost 8.2-fold at a 61% higher hardware cost. The extra hardware is mainly due to the speed-up techniques. The figure of merit, defined as the ratio of throughput to hardware cost, is higher for the proposed design than for the available BD architectures at the typical code-block (CB) size of 32 × 32. The ASIC implementation of the proposed design consumes 66 mW at its maximum operating frequency.
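Taking the abstract's percentages at face value, the relative figure of merit follows directly:

```latex
\frac{\mathrm{FoM}_{\text{proposed}}}{\mathrm{FoM}_{\text{serial}}} = \frac{1.86}{1.50} = 1.24,
\qquad
\frac{\mathrm{FoM}_{\text{proposed}}}{\mathrm{FoM}_{\text{pass-parallel}}} = \frac{8.2}{1.61} \approx 5.1
```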