1.

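Hardware Architecture Design for Half-Pixel-Accuracy Motion Estimation in the H.264 Standard  Total citations: 1 (self-citations: 1; citations by others: 0)

张鹏程, 张志杰 《电视技术》2010,34(11)

A real-time, variable-block-size, half-pixel-accuracy motion estimation module is designed, comprising a half-pixel interpolation module and a half-pixel search module. The interpolation module uses a 6-tap FIR filter, and the search module uses a hierarchical search algorithm. The module is used in the encoding part of H.264 portable video devices. It is written in Verilog and uses a Xilinx XC4VSX25 FPGA as the computational core of the hardware implementation.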
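The 6-tap FIR interpolation named in this abstract can be illustrated with a short sketch. It assumes the standard H.264 luma half-sample taps (1, -5, 20, 20, -5, 1) with rounding and a right shift by 5; the abstract gives no filter details, so the function names, the one-dimensional layout, and the border replication are illustrative assumptions, not the authors' Verilog design.

```python
# Assumed: standard H.264 luma half-sample filter taps (sum = 32).
TAPS = (1, -5, 20, 20, -5, 1)


def clip255(value):
    """Clamp a filtered value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel_row(row):
    """Return the half-sample values lying between row[i] and row[i + 1].

    Border samples are replicated when the 6-tap window runs off the row
    (an assumption made to keep the sketch self-contained).
    """
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(TAPS):
            j = min(max(i - 2 + k, 0), n - 1)  # window covers i-2 .. i+3
            acc += tap * row[j]
        out.append(clip255((acc + 16) >> 5))  # round, then divide by 32
    return out
```

In hardware the multiplications by 5 and 20 are commonly replaced by shifts and adds, which is one reason this filter maps well onto an FPGA.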

2.

Hardware design focusing in the tradeoff cost versus quality for the H.264/AVC fractional motion estimation targeting high definition videos

Gustavo Sanchez, Marcel Corrêa, Diego Noble, Marcelo Porto, Sergio Bampi, Luciano Agostini 《Analog Integrated Circuits and Signal Processing》2012,73(3):931-944

This article presents an architecture for the fractional motion estimation (FME) of the H.264/AVC video coding standard, focusing on a good tradeoff between hardware cost and video quality. Support for FME guarantees high quality in the motion estimation process. The applied algorithmic simplifications, together with a multiplierless implementation and a well-balanced pipeline, allow a low-cost, high-throughput solution. The architecture was also designed to avoid redundant external memory accesses when computing the FME. The design is divided into two main modules: integer motion estimation (with the diamond search algorithm) and fractional refinement (half-pixel and quarter-pixel interpolation and search). The architecture was described in VHDL and synthesized to an Altera Stratix III FPGA, where it reaches 260 MHz. In the worst-case scenario, this operating frequency allows a processing rate of 43 HD 1080p (1,920 × 1,080 pixels) frames per second, surpassing the requirements for real-time processing. In comparison with related works, the architecture achieves a good tradeoff among hardware cost, video quality and processing rate.
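The throughput figures quoted here (260 MHz, 43 frames per second at 1080p in the worst case) can be turned into a rough per-macroblock cycle budget. The sketch below is a back-of-the-envelope calculation, not a figure from the paper: it assumes 16 × 16 macroblocks and a frame height padded up to a whole number of macroblocks, and the function names are hypothetical.

```python
import math


def macroblocks_per_frame(width, height, mb_size=16):
    """Number of macroblocks, padding each dimension up to a multiple of mb_size."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def cycles_per_macroblock(clock_hz, width, height, fps, mb_size=16):
    """Average clock cycles available per macroblock at a given frame rate."""
    return clock_hz / (macroblocks_per_frame(width, height, mb_size) * fps)


# Figures taken from the abstract: 260 MHz clock, 1920 x 1080 frames, 43 fps.
budget = cycles_per_macroblock(260e6, 1920, 1080, 43)
print(f"{macroblocks_per_frame(1920, 1080)} macroblocks, about {budget:.0f} cycles each")
```

Under these assumptions the worst-case rate corresponds to roughly 740 cycles per macroblock for integer search plus fractional refinement.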

3.

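VLSI Implementation of a Fast Half-Pixel Interpolation Algorithm Based on H.264

宋宇鲲, 陈效波 《电子科技》2014,27(7):1-4

A VLSI hardware architecture for a fast half-pixel interpolation algorithm is proposed. The architecture makes full use of intermediate data and effectively reduces the computation required for fractional-pixel motion estimation. Verified on an Altera FPGA development platform, the system runs stably at a clock frequency of 135 MHz and encodes and decodes 1080P@25fps high-definition video in real time, meeting the system's real-time requirements.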

4.

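A Fast Half-Pixel Search Algorithm for Motion Estimation Suited to Hardware Implementation

赵波, 吴成柯, 张方 《电路与系统学报》2005,10(2):146-150

The relationship between integer-pixel and half-pixel block matching under the sum of absolute differences (SAD) criterion is studied. A mathematical curve is fitted to the SAD values, and the curve is used to predict the half-pixel position of the minimum SAD, which yields the best-matching half-pixel vector. Three different concave-function prediction models are analysed, and a fast half-pixel search algorithm for motion estimation that is suited to hardware implementation is proposed. The algorithm derives the half-pixel result directly from the integer-pixel motion estimation result, which greatly reduces the computational complexity of half-pixel motion estimation and so facilitates real-time low-bit-rate video coding. Experimental results show that the algorithm achieves good reconstructed image quality. It is easy to implement in hardware, can be readily integrated into existing video encoders, and has good practical value.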
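The idea of predicting the half-pixel minimum from integer-pixel SAD values can be sketched with a simple quadratic fit. The abstract does not say which of its three prediction models is adopted, so the upward-opening parabola used below is only one plausible choice, and the function names and the 0.25 decision threshold are assumptions made for illustration.

```python
def parabola_offset(sad_minus, sad_zero, sad_plus):
    """Vertex of the parabola through SAD values at offsets -1, 0 and +1.

    sad_zero is expected to be the integer-pixel minimum, so the vertex
    falls within [-0.5, +0.5]. Returns 0.0 if the fit has no minimum.
    """
    curvature = sad_minus - 2 * sad_zero + sad_plus
    if curvature <= 0:
        return 0.0
    return 0.5 * (sad_minus - sad_plus) / curvature


def half_pel_offset(sad_minus, sad_zero, sad_plus):
    """Snap the fitted vertex to the nearest of -0.5, 0 or +0.5."""
    x = parabola_offset(sad_minus, sad_zero, sad_plus)
    if x > 0.25:
        return 0.5
    if x < -0.25:
        return -0.5
    return 0.0
```

Applying the fit once horizontally and once vertically gives a half-pixel vector without interpolating any reference samples, which is where the complexity saving described in the abstract comes from.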

5.

High performance fractional motion estimation in H.264/AVC based on one-step algorithm and 8×4 element block processing

Nam Thang Ta, Jun Rim Choi 《Signal Processing: Image Communication》2011,26(2):85-92

The conventional two-step algorithm, the long latency of interpolation, and the large number of motion vectors are the three main factors that make fractional motion estimation computationally complex and prevent it from encoding high-definition video. To overcome these obstacles, this paper proposes a high-performance fractional motion estimation engine built on three techniques. First, a one-step algorithm is proposed, based on the high correlation between the motion vector of a block and that of its upper layer as well as the relationship among integer candidates. Second, 8×4 element block processing is adopted, which eliminates almost all redundancy in interpolation while still ensuring hardware reusability. Finally, a scheme for processing 4×4 and 4×8 blocks without extra cycles is presented, so that the number of motion vectors can be reduced by up to 59%. Experimental results show that the proposed design needs only 50% of the gate count and 56% of the cycles of the previous design while nearly maintaining coding performance.

6.

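Research on Hardware Architectures for H.264/AVC Motion Estimation Algorithms

王巍, 林涛, 谢玉亭, 杨丽君, 牟茂 《数字通信》2013,(3):1-4

Motion estimation is an important part of the H.264/AVC encoder and accounts for 60% to 90% of the encoder's total computation time. Several fast search algorithms for H.264/AVC motion estimation are analysed and compared, and on this basis an advanced hexagon search algorithm is proposed. A general hardware architecture for fast motion estimation search algorithms is given, and from it a hardware architecture for the advanced hexagon search algorithm with pipelined parallel processing is derived. Experimental results show that the architecture reaches a system operating frequency of 109.06 MHz, which fully meets the requirements of real-time high-definition video applications.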

7.

Fast two-step half-pixel accuracy motion vector prediction

Kyeong-Hwan Lee, Jung-Hyun Choi, Bub-Ki Lee, Duk-Gyoo Kim 《Electronics letters》2000,36(7):625-627

A new fast algorithm is presented for reducing the computational complexity of half-pixel accuracy motion estimation. A reduction in the required number of computations is obtained by limiting the number of candidate blocks by predicting the direction of minimum error. It reduces the necessary computational overhead to ~38% of that for the conventional method.

8.

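A New Algorithm-Programmable Motion Estimation Coprocessor

刘锋, 庄奕琪, 何威 《电路与系统学报》2007,12(5):126-130,125

To meet the requirements of fast search algorithms for block-matching motion estimation, an algorithm-programmable coprocessor for motion estimation and motion compensation is designed. The coprocessor adopts a hardware/software co-processing structure. By combining a flexible instruction set with efficient parallel hardware execution units, it offers the advantages of both a programmable processor architecture and a tree-structured motion estimation VLSI architecture, and so balances the demands for high processing efficiency and flexibility in motion estimation algorithms. The coprocessor is not tied to any particular fast search algorithm: by changing its internal program code it can implement a variety of fast motion estimation algorithms, including TSS, DS, HEXBS, MVFAST and EPZS, and it is highly extensible. Compared with similar designs, it is efficient, flexible and algorithm-configurable, and it also consumes substantially fewer hardware resources.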

9.

Half-pixel filter of MC-DCT compressed video  Total citations: 1 (self-citations: 0; citations by others: 1)

Cao G., Li J., Zhang Y. 《Electronics letters》2003,39(17):1243-1245

A novel half-pixel filter is proposed to extract one block with a half-pixel precision motion vector in the DCT domain. The proposed filter reduces the computational complexity by integrating the interpolation and translation into a single step, while improving the video quality compared with the existing half-pixel filter.

10.

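A Hierarchical Search Algorithm for Motion Estimation and Its FPGA Implementation  Total citations: 3 (self-citations: 0; citations by others: 3)

汤华莲, 庄奕琪 《现代电子技术》2004,27(4):84-86,89

For low-bit-rate video coding such as H.263 and MPEG4 SP, a new motion estimation algorithm suited to hardware implementation is proposed on the basis of the full-search block matching algorithm, together with a hardware architecture that implements it. The architecture makes full use of hardware resources and adopts a parallel structure and data reuse, which greatly reduces computation time. For CIF-format images with a motion vector search range of -16 to +15.5, the frame rate reaches 25 frames/s.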

11.

Efficient VLSI Architectures for Multiuser Channel Estimation in Wireless Base-Station Receivers

Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, Behnaam Aazhang 《The Journal of VLSI Signal Processing》2002,31(2):143-156

This paper presents a reduced-complexity, fixed-point algorithm and efficient real-time VLSI architectures for multiuser channel estimation, one of the core baseband processing operations in wireless base-station receivers for CDMA. Future wireless base-station receivers will need to use sophisticated algorithms to support extremely high data rates and multimedia. Current DSP implementations of these algorithms are unable to meet real-time requirements. However, these algorithms contain massive parallelism and bit-level arithmetic that can be exposed and efficiently implemented in a VLSI architecture. We re-design an existing channel estimation algorithm from an implementation perspective for a reduced-complexity, fixed-point hardware implementation. Fixed-point simulations are presented to evaluate the precision requirements of the algorithm. A dependence graph of the algorithm is presented and area-time trade-offs are developed. An area-constrained architecture achieves low data rates with minimum hardware and may be used in pico-cell base-stations. A time-constrained solution exploits all of the available parallelism and determines the maximum theoretical data processing rates. An area-time efficient architecture meets real-time requirements with minimum area overhead.

12.

Interpolation-Free Fractional-Pixel Motion Estimation Algorithms with Efficient Hardware Implementation  Total citations: 1 (self-citations: 0; citations by others: 1)

Mohammed S. Sayed, Wael Badawy, Graham Jullien 《Journal of Signal Processing Systems》2012,67(2):139-155

This paper presents interpolation-free fractional-pixel motion estimation (FME) algorithms and an efficient hardware prototype of one of the proposed FME algorithms. The proposed algorithms use a mathematical model to approximate the matching error at fractional-pixel locations instead of using the block matching algorithm to evaluate the actual matching error. Hence, no interpolation is required at fractional-pixel locations. The matching error values at integer-pixel locations are used to evaluate the mathematical model coefficients. The performance of the proposed algorithms has been compared with several FME algorithms including the full quarter-pixel search (FQPS) algorithm, which is used as part of the H.264 reference software. The computational cost and the performance analysis show that the proposed algorithms have about 90% less computational complexity than the FQPS algorithm with comparable reconstructed video quality (i.e., approximately 0.2 dB lower reconstruction PSNR values). In addition, a hardware prototype of one of the proposed algorithms is presented. The proposed architecture has been prototyped using the TSMC 0.18 μm CMOS technology. It has a maximum clock frequency of 312.5 MHz, at which it can process more than 70 HDTV 1080p frames per second. The architecture has only 13,650 gates. The proposed architecture shows superior performance when compared with several FME architectures.

13.

Analysis of spatial transform motion estimation with overlapped compensation and fractional-pixel accuracy

Lopes F.J.P., Ghanbari M. 《Vision, Image and Signal Processing, IEE Proceedings -》1999,146(6):339-344

The authors investigate how the performance of spatial transform motion estimation can be significantly improved by incorporating overlapped compensation and fractional-pixel accuracy. An overlapped spatial transformation (OST) motion model is developed, which successfully addresses the inability of the conventional block matching technique to compensate for complex motion and inside-block motion field discontinuities. Simulation results show that the motion compensated prediction error of this method is reduced by 1.1 dB, compared with the conventional overlapped block matching motion estimation, for the same generated motion vector overhead. To improve the performance, the overlap must be used in both the motion estimation and compensation processes. Further improvement can be obtained using half-pixel precision motion vectors. However, this improvement is comparatively less than the gain that conventional block matching obtains from a half-pixel search.

14.

Prototyping of Tightly Coupled Hardware/Software-Systems

Wolfram Hardt, Wolfgang Rosenstiel 《Design Automation for Embedded Systems》1997,2(3-4):283-317

Verification and test issues raise the need for rapid prototyping of complex systems, especially hardware/software systems. We tackle this problem by integrating hardware/software codesign and prototyping. First we define the concept of the entire system architecture. This concept directs the hardware/software partitioning process. Our prototyping environment reflects the architecture concept as well. In this overview the architecture concept and all important design tasks (hardware/software partitioning, speed-up estimation before HW synthesis, and prototyping of the entire hardware/software system) are presented and compared with several approaches from the literature. Thus a substantial overview of the prototyping problem is given. The latter part of this presentation illustrates our approach with a case study and presents the results. Our automated design process generates a tightly coupled hardware/software system with very good performance characteristics. The case study focuses on the prototyping of a ciphering algorithm. The reported approach leads to a reasonable overall system speed-up of 10 percent. Similar results have been found for further examples as well.

15.

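A High-Speed Adaptive Hardware Architecture for AVS Interpolation Prediction

黄有源, 何明华 《微电子学与计算机》2012,29(4):126-130

To address the high computational complexity of interpolation prediction in AVS decoders, a high-speed, adaptively pipelined hardware architecture for luma and chroma interpolation is proposed. Exploiting the symmetry of the luma interpolation algorithm, a transposed filter-bank structure is proposed that reduces the number of filters and the buffer size needed for luma interpolation. In addition, the computation units that can be reused in chroma interpolation are extracted, which saves hardware resources. Synthesised with the SMIC 0.18 μm process library, the design reaches a maximum clock frequency of 200 MHz and occupies about 82k logic gates; with two reference frames, predicting one macroblock takes at most 512 clock cycles. Simulation and synthesis results show that the architecture greatly increases processing speed and meets the requirements of real-time 1080p@30fps AVS-P2 video decoding.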

16.

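A Motion-Compensated Frame Interpolation Method Suited to the Occlusion Problem

赵建伟, 王朋, 刘重庆 《电子与信息学报》2004,26(5):771-776

To address the occlusion problem that often arises in two-dimensional motion estimation, a motion-compensated adaptive frame interpolation technique based on a mesh model is proposed. The image frame is divided into three types of region, and motion estimation and motion-compensated interpolation are carried out on each according to its characteristics. Provided the occluded regions are detected accurately, handling them separately effectively reduces the motion estimation error. A feature-window motion-compensated matching method is proposed for estimating the motion vectors of mesh nodes; it removes the blocking artefacts caused by conventional block matching and achieves sub-pixel matching accuracy. Experiments show that the method is simple and practical, effectively solves motion estimation in occluded regions, and produces interpolated images with good visual quality.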

17.

Computationally Efficient Architecture for Accurate Frequency Estimation with Fourier Interpolation

Dongpei Liu, Hengzhu Liu, Li Zhou, Jianfeng Zhang, Botao Zhang 《Circuits, Systems, and Signal Processing》2014,33(3):781-797

A simplified DFT-based algorithm and its VLSI implementation for accurate frequency estimation of a single-tone complex sinusoidal signal are investigated. The proposed algorithm estimates frequency by interpolation using Fourier coefficients. It consists of a coarse search followed by a fine search, and its performance closely approaches the Cramér–Rao lower bound (CRLB) even in the low-SNR region. Moreover, a pipelined triple-mode CORDIC architecture is designed to efficiently support complex multiplication, complex magnitude calculation and real division. The triple-mode CORDIC-based radix-4 architecture is employed for the hardware implementation of the frequency estimator, and is suitable not only for the fast Fourier transform but also for accurate frequency estimation. A frequency estimator with 1024-point samples is implemented and verified on FPGA. It works at 215 MHz on a Xilinx XC6VLX240T FPGA device, and uses 4,161 registers and 6,986 slice LUTs. ASIC synthesis results show that it requires an area of 60K equivalent NAND2 gates with a clock rate of 500 MHz in SMIC 0.18 μm technology. The total latency of the frequency estimator is 2336 cycles. The proposed architecture provides a good trade-off between hardware overhead, estimation performance and computation latency.
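The coarse-then-fine structure described in this abstract can be illustrated in a few lines. The abstract does not state which interpolation formula the paper uses, so the sketch below substitutes a well-known three-bin estimator (usually attributed to Jacobsen) for the fine step; the direct O(N²) DFT and all function names are likewise assumptions chosen to keep the example dependency-free, not the paper's CORDIC-based design.

```python
import cmath


def dft_bin(samples, k):
    """Single DFT coefficient X[k] of a complex sequence (direct evaluation)."""
    n = len(samples)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, v in enumerate(samples))


def estimate_frequency(samples):
    """Estimate the frequency of a single complex tone, in cycles per sample.

    Coarse search: locate the DFT bin with the largest magnitude.
    Fine search: interpolate between that bin and its two neighbours.
    """
    n = len(samples)
    spectrum = [dft_bin(samples, k) for k in range(n)]
    k0 = max(range(n), key=lambda k: abs(spectrum[k]))
    x_prev = spectrum[(k0 - 1) % n]
    x_peak = spectrum[k0]
    x_next = spectrum[(k0 + 1) % n]
    denom = 2 * x_peak - x_prev - x_next
    # Fractional bin offset from the three complex coefficients.
    delta = ((x_prev - x_next) / denom).real if denom != 0 else 0.0
    return (k0 + delta) / n
```

The fine step needs one complex division and a few additions, which hints at why the paper's architecture provides hardware support for complex multiplication and real division.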

18.

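VLSI Implementation of the Quarter-Pixel-Accuracy Interpolation Algorithm in an H.264 Encoder

陈光化 (CHEN Guang-hua), 翟海华, 石旭利, 张兆杨, 万芬芳 《微电子学与计算机》2008,25(2):176-180

The H.264 video coding standard introduces a quarter-pixel-accuracy interpolation algorithm, which greatly improves compression efficiency but also increases computational complexity and memory bandwidth. To address these problems from the motion estimation perspective, a one-step interpolation method and data reuse are adopted, which reduce bandwidth by 26% and processing cycles by 45%. A corresponding hardware architecture is designed: a 5-stage pipeline implements the one-step interpolation algorithm, an input buffer unit provides reuse of the reference data, and a ping-pong structure handles the large volume of data produced during interpolation so that data is passed on in time. The architecture significantly reduces bandwidth and increases throughput, and is fully applicable to real-time encoders.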

19.

Power efficient SDS motion estimation architecture using dynamic iteration control and hierarchical adder compressors for real time HDTV video coding

Marcelo Porto, João Altermann, Eduardo Costa, Luciano Agostini, Sergio Bampi 《Analog Integrated Circuits and Signal Processing》2012,73(3):919-930

This paper presents a high-performance, power-efficient and low-cost hardware architecture for motion estimation (ME) targeting portable consumer applications. This hardware uses the Sub-sampled Diamond Search algorithm (SDS) with a Dynamic Iteration Control (DIC). The SDS–DIC algorithm can significantly reduce the number of SAD (Sum of Absolute Differences) calculations for block matching, thus enabling the development of an efficient hardware design for the ME. The DIC technique allows the required throughput to be achieved by restricting the number of iterations, which contributes to the reduction in the overall number of clock cycles needed for the motion vector calculation. The processing units (PU) of the ME were developed using efficient hierarchical adder-compressors, which can add more than two operands simultaneously. The results show that, by using both the adder-compressors in the PU and the DIC technique, it is possible to obtain an efficient ME architecture with higher performance and reduced power consumption. The architecture that implements this algorithm and the PUs was described in VHDL. Hardware synthesis results are presented for a 0.18 μm CMOS standard cell library. The architecture can reach real time for HDTV 1080p with less than 40 mW of power consumption.
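A software sketch can show how sub-sampling and an iteration cap each reduce the SAD workload. The abstract does not define how SDS sub-samples or how DIC chooses its limit, so the sketch assumes 2:1 pixel sub-sampling inside the block, a small-diamond (four-neighbour) refinement, and a fixed `max_iters` cap; these choices and all names are hypothetical stand-ins rather than the paper's algorithm.

```python
def sad(cur, ref, bx, by, dx, dy, size=8, step=2):
    """SAD between the block at (bx, by) in cur and the block displaced by
    (dx, dy) in ref, visiting only every step-th pixel in each direction."""
    total = 0
    for y in range(0, size, step):
        for x in range(0, size, step):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_search(cur, ref, bx, by, size=8, step=2, max_iters=4, search=4):
    """Greedy small-diamond search with a hard cap on the iteration count.

    Returns ((dx, dy), cost) for the best displacement found.
    """
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        # Candidate must stay within the search range and inside the frame.
        return (abs(dx) <= search and abs(dy) <= search
                and 0 <= bx + dx and bx + dx + size <= width
                and 0 <= by + dy and by + dy + size <= height)

    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size, step)
    for _ in range(max_iters):  # the cap bounds the worst-case cycle count
        centre = best
        for ox, oy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (centre[0] + ox, centre[1] + oy)
            if not valid(*cand):
                continue
            cost = sad(cur, ref, bx, by, cand[0], cand[1], size, step)
            if cost < best_cost:
                best, best_cost = cand, cost
        if best == centre:  # no neighbour improved: converged early
            break
    return best, best_cost
```

With `step=2` each SAD touches a quarter of the block's pixels, and lowering `max_iters` shortens the worst case at the risk of stopping before the minimum is reached.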

20.

Efficient Hierarchical Motion Estimation Algorithm and Its VLSI Architecture

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(10):1385-1398

This paper addresses the development and hardware implementation of an efficient hierarchical motion estimation algorithm, HMEA, using multiresolution frames to reduce the computational complexity. Excellent estimation performance is ensured using an averaging filter to downsample the original image. At the smallest resolution, the least two motion vector candidates are selected using a full-search block matching algorithm. At the middle level, these two candidate motion vectors are employed as the center points for small-range local searches. Then, at the original resolution, the final motion vector is obtained by performing a local search around the single candidate from the middle level. HMEA exhibits regular data flow and is suitable for hardware implementation. An efficient VLSI architecture that includes an averaging filter to downsample the image and two 2-D semisystolic processing element arrays to compute the sum of absolute differences in a pipeline is also presented. Simulation results indicate that HMEA is more area-efficient and faster than many full-search and multiresolution architectures while maintaining high video quality. This architecture with 59K gates and 1393 bytes of RAM is implemented for a search range of [-16.0, +15.5].