1.
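Hardware architecture design for half-pixel-accuracy motion estimation in the H.264 standard  Total citations: 1, self-citations: 1, citations by others: 0
A real-time, variable-block-size motion estimation module with half-pixel accuracy is designed, comprising a half-pixel interpolation module and a half-pixel search module. The interpolation module uses a 6-tap FIR filter, and the search module uses a hierarchical search algorithm. The module is intended for the encoder of H.264 portable video devices. It is written in Verilog and uses a Xilinx XC4VSX25 FPGA as the computational core of the hardware implementation.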
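The 6-tap FIR interpolation mentioned in this abstract can be illustrated with a small software model. The sketch below is not the authors' Verilog design. It assumes the standard H.264 luma half-sample taps (1, -5, 20, 20, -5, 1) with rounding and a divide by 32, works on a single row of 8-bit samples, and replicates edge pixels. The function name `half_pel_row` is hypothetical.

```python
def half_pel_row(row):
    """Return the horizontal half-pixel samples between row[i] and row[i + 1].

    Assumes 8-bit samples and the H.264 luma half-sample taps; samples
    outside the row are replaced by the nearest edge sample.
    """
    taps = (1, -5, 20, 20, -5, 1)
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(taps):
            j = min(max(i - 2 + k, 0), n - 1)  # edge replication
            acc += tap * row[j]
        # Round, divide by 32, and clip to the 8-bit range.
        out.append(min(max((acc + 16) >> 5, 0), 255))
    return out
```

Because every tap is a small integer and the divisor is a power of two, a model like this maps to shifts and adds, which is one reason the filter suits FPGA implementation.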
2.
Gustavo Sanchez Marcel Corrêa Diego Noble Marcelo Porto Sergio Bampi Luciano Agostini 《Analog Integrated Circuits and Signal Processing》2012,73(3):931-944
This article presents an architecture for the fractional motion estimation (FME) of the H.264/AVC video coding standard, focusing on a good tradeoff between hardware cost and video quality. Support for FME guarantees high quality in the motion estimation process. The applied algorithmic simplifications, together with the multiplierless implementation and a well-balanced pipeline, allow a low-cost, high-throughput solution. The architecture was also designed to avoid redundant external memory accesses when computing the FME. The design was divided into two main modules: integer motion estimation (with the diamond search algorithm) and fractional refinement (half-pixel and quarter-pixel interpolation and search). The architecture was described in VHDL and synthesized for an Altera Stratix III FPGA, where it reaches 260 MHz. In the worst-case scenario, this operating frequency allows a processing rate of 43 HD 1080p (1,920 × 1,080 pixels) frames per second, surpassing the requirements for real-time processing. In comparison to related works, the developed architecture achieves a good tradeoff among hardware cost, video quality and processing rate.
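The quoted clock frequency and frame rate imply a per-macroblock cycle budget, which the following back-of-the-envelope sketch computes. It assumes 16×16 macroblocks and a frame padded up to a whole number of macroblocks. These are assumptions about how the figures relate, not numbers taken from the abstract, and the function name is hypothetical.

```python
import math


def cycles_per_macroblock(clock_hz, fps, width=1920, height=1080, mb_size=16):
    """Average clock cycles available per macroblock at a given frame rate."""
    # Frames are padded up to a whole number of macroblocks (1080 -> 1088 rows).
    macroblocks = math.ceil(width / mb_size) * math.ceil(height / mb_size)
    return clock_hz / (fps * macroblocks)


# 260 MHz at 43 frames/s of 1080p leaves roughly 741 cycles per macroblock.
budget = cycles_per_macroblock(260e6, 43)
```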
3.
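A VLSI hardware architecture for a fast half-pixel interpolation algorithm is proposed. The architecture makes full use of intermediate data and effectively reduces the computational load of fractional-pixel motion estimation. Verified on an Altera FPGA development platform, the system runs stably at a 135 MHz clock frequency and encodes and decodes 1080p@25fps high-definition video in real time, meeting the system's real-time requirements.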
4.
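The relationship between integer-pixel and half-pixel block matching under the sum of absolute differences (SAD) criterion is studied. The SAD is fitted with a suitable mathematical curve, and the curve is used to predict the half-pixel position of the minimum SAD, which yields the best half-pixel matching vector. Three different concave-function prediction models are analyzed, and a fast half-pixel search algorithm for motion estimation that is suited to hardware implementation is proposed. The algorithm derives the half-pixel motion estimation result directly from the integer-pixel motion estimation result, which greatly reduces the computational complexity of half-pixel motion estimation and thus facilitates real-time implementation of low-bit-rate video coding. Experimental results show that the algorithm achieves good reconstructed image quality. It is easy to implement in hardware, can be readily integrated into existing video encoders, and has good practical value.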
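The idea of predicting the half-pixel minimum from integer-pixel SAD values can be sketched along one axis. The example below assumes a parabola through three neighbouring integer-position SADs. The abstract does not name the three models it analyzes, so this is one plausible choice rather than the paper's method, and the function name and the 0.25 decision threshold are illustrative.

```python
def predict_half_pel_offset(sad_minus, sad_center, sad_plus):
    """Predict the half-pixel offset (-0.5, 0.0 or +0.5) along one axis.

    Inputs are the SADs at integer offsets -1, 0 and +1, where offset 0 is
    the best integer-pixel match. A parabola is fitted through the three
    points and its vertex is snapped to the nearest half-pixel position.
    """
    curvature = sad_minus - 2 * sad_center + sad_plus
    if curvature <= 0:
        return 0.0  # flat or degenerate fit: stay at the integer position
    vertex = (sad_minus - sad_plus) / (2.0 * curvature)
    if vertex > 0.25:
        return 0.5
    if vertex < -0.25:
        return -0.5
    return 0.0
```

Applying the same function to the vertical neighbours gives the second vector component. No interpolated reference pixels are needed, which is the source of the complexity reduction the abstract describes.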
5.
The conventional two-step algorithm, the long latency of interpolation, and the variety of motion vectors are the three main factors behind the high computational complexity of fractional motion estimation, and they also prevent it from being used to encode high-definition video. To overcome these obstacles, this paper proposes a high-performance fractional motion engine based on three techniques. First, a one-step algorithm is proposed, based on the high correlation between the motion vector of a block and that of its upper layer, as well as the relationship among integer candidates. Second, 8×4 element block processing is adopted, which eliminates almost all redundancy in interpolation while still ensuring hardware reusability. Finally, a scheme for processing 4×4 and 4×8 blocks at no extra cycle cost is presented, so that the number of motion vectors can be reduced by up to 59%. Experimental results show that the proposed design needs only 50% of the gate count and 56% of the cycles of the previous design while nearly maintaining coding performance.
6.
7.
Kyeong-Hwan Lee Jung-Hyun Choi Bub-Ki Lee Duk-Gyoo Kim 《Electronics letters》2000,36(7):625-627
A new fast algorithm is presented for reducing the computational complexity of half-pixel accuracy motion estimation. A reduction in the required number of computations is obtained by limiting the number of candidate blocks by predicting the direction of minimum error. It reduces the necessary computational overhead to ~38% of that for the conventional method.
8.
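To meet the requirements of fast search algorithms for block-matching motion estimation, this paper designs an algorithm-programmable coprocessor for motion estimation and motion compensation. The coprocessor adopts a hardware/software co-processing structure. By combining a flexible instruction set with efficient parallel hardware execution units, it offers the advantages of both a programmable processor structure and a tree-structured motion estimation VLSI structure, balancing the high processing efficiency and the flexibility that motion estimation algorithms require. The coprocessor is not tied to any particular fast search algorithm: by changing its internal program code, it can implement a variety of fast motion estimation algorithms, including TSS, DS, HEXBS, MVFAST, and EPZS, and it is also highly extensible. Compared with similar designs, this design is efficient, flexible, and algorithm-configurable, and it consumes substantially fewer hardware resources.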
9.
Half-pixel filter of MC-DCT compressed video  Total citations: 1, self-citations: 0, citations by others: 1
A novel half-pixel filter is proposed to extract one block with a half-pixel precision motion vector in the DCT domain. The proposed filter reduces the computational complexity by integrating the interpolation and translation into a single step, while improving the video quality compared with the existing half-pixel filter.
10.
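Hierarchical search algorithm for motion estimation and its FPGA implementation  Total citations: 3, self-citations: 0, citations by others: 3
Targeting the characteristics of low-bit-rate video coding such as H.263 and MPEG-4 SP, a new motion estimation algorithm suited to hardware implementation is proposed on the basis of the full-search block-matching algorithm, together with a hardware architecture that implements it. The architecture makes full use of hardware resources and adopts a parallel structure and data reuse, which greatly reduces computation time. For CIF-format images with a motion vector search range of -16 to +15.5, the frame rate reaches 25 frames/s.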
11.
Sridhar Rajagopal Srikrishna Bhashyam Joseph R. Cavallaro Behnaam Aazhang 《The Journal of VLSI Signal Processing》2002,31(2):143-156
This paper presents a reduced-complexity, fixed-point algorithm and efficient real-time VLSI architectures for multiuser channel estimation, one of the core baseband processing operations in wireless base-station receivers for CDMA. Future wireless base-station receivers will need to use sophisticated algorithms to support extremely high data rates and multimedia. Current DSP implementations of these algorithms are unable to meet real-time requirements. However, these algorithms contain massive parallelism and bit-level arithmetic that can be exposed and efficiently implemented in a VLSI architecture. We redesign an existing channel estimation algorithm from an implementation perspective for a reduced-complexity, fixed-point hardware implementation. Fixed-point simulations are presented to evaluate the precision requirements of the algorithm. A dependence graph of the algorithm is presented and area-time trade-offs are developed. An area-constrained architecture achieves low data rates with minimum hardware, which may be used in pico-cell base-stations. A time-constrained solution exploits the entire available parallelism and determines the maximum theoretical data processing rates. An area-time efficient architecture meets real-time requirements with minimum area overhead.
12.
Interpolation-Free Fractional-Pixel Motion Estimation Algorithms with Efficient Hardware Implementation  Total citations: 1, self-citations: 0, citations by others: 1
Mohammed S. Sayed Wael Badawy Graham Jullien 《Journal of Signal Processing Systems》2012,67(2):139-155
This paper presents interpolation-free fractional-pixel motion estimation (FME) algorithms and an efficient hardware prototype of one of the proposed FME algorithms. The proposed algorithms use a mathematical model to approximate the matching error at fractional-pixel locations instead of using the block matching algorithm to evaluate the actual matching error. Hence, no interpolation is required at fractional-pixel locations. The matching error values at integer-pixel locations are used to evaluate the mathematical model coefficients. The performance of the proposed algorithms has been compared with several FME algorithms including the full quarter-pixel search (FQPS) algorithm, which is used as part of the H.264 reference software. The computational cost and the performance analysis show that the proposed algorithms have about 90% less computational complexity than the FQPS algorithm with comparable reconstruction video quality (i.e., approximately 0.2 dB lower reconstruction PSNR values). In addition, a hardware prototype of one of the proposed algorithms is presented. The proposed architecture has been prototyped using the TSMC 0.18 μm CMOS technology. It has a maximum clock frequency of 312.5 MHz, at which it can process more than 70 HDTV 1080p frames per second. The architecture has only 13,650 gates. The proposed architecture shows superior performance when compared with several FME architectures.
13.
Lopes F.J.P. Ghanbari M. 《Vision, Image and Signal Processing, IEE Proceedings -》1999,146(6):339-344
The authors investigate how the performance of spatial transform motion estimation can be significantly improved by incorporating overlapped compensation and fractional-pixel accuracy. An overlapped spatial transformation (OST) motion model is developed, which successfully addresses the inability of the conventional block matching technique to compensate for complex motion and inside-block motion field discontinuities. Simulation results show that the motion compensated prediction error of this method is reduced by 1.1 dB, compared with the conventional overlapped block matching motion estimation, for the same generated motion vector overhead. To improve the performance, the overlap must be used in both the motion estimation and compensation processes. Further improvement can be obtained using half-pixel precision motion vectors. However, this improvement is comparatively less than the gain of conventional block matching from a half-pixel search.
14.
Verification and test issues raise the need for rapid prototyping of complex systems, especially hardware/software systems. We tackle this problem by integrating hardware/software codesign and prototyping. First we define the concept of the entire system architecture. This concept directs the hardware/software partitioning process. Our prototyping environment reflects the architecture concept as well. In this overview the architecture concept and all important design tasks (hardware/software partitioning, speed-up estimation before HW synthesis, and prototyping of the entire hardware/software system) are presented and compared to several approaches from the literature. Thus a substantial overview of the prototyping problem is given. The latter part of this presentation illustrates our approach with a case study and presents the results. Our automated design process generates a tightly coupled hardware/software system with very good performance characteristics. The case study focuses on the prototyping of a ciphering algorithm. The reported approach leads to a reasonable overall system speed-up of 10 percent. Similar results have been found for further examples as well.
15.
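To address the high computational complexity of interpolation prediction in AVS decoders, a high-speed, adaptively pipelined hardware architecture for luma and chroma interpolation is proposed. Exploiting the symmetry of the luma interpolation algorithm, a transposed filter bank structure is proposed that reduces the number of filters and the buffer size needed for luma interpolation. In addition, the computation units reused in chroma interpolation are extracted, which saves hardware resources. Synthesized with the SMIC 0.18 μm process library, the design reaches a maximum clock frequency of 200 MHz and occupies about 82k logic gates. With 2 reference frames, predicting one macroblock takes at most 512 clock cycles. Simulation and synthesis results show that the architecture greatly increases processing speed and meets the requirements of real-time decoding of 1080p@30fps AVS-P2 video.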
16.
17.
Dongpei Liu Hengzhu Liu Li Zhou Jianfeng Zhang Botao Zhang 《Circuits, Systems, and Signal Processing》2014,33(3):781-797
A simplified DFT-based algorithm and its VLSI implementation for accurate frequency estimation of a single-tone complex sinusoid signal are investigated. The proposed algorithm estimates frequency by interpolation using Fourier coefficients. It consists of a coarse search followed by a fine search, and its performance closely approaches the Cramér–Rao lower bound (CRLB) even in the low-SNR region. Moreover, a pipelined triple-mode CORDIC architecture is designed to efficiently support complex multiplication, complex magnitude calculation and real division. The triple-mode CORDIC-based radix-4 architecture is employed for the hardware implementation of the frequency estimator, and is suitable not only for the fast Fourier transform but also for accurate frequency estimation. A frequency estimator with 1024-point samples is implemented and verified on FPGA. It works at 215 MHz on a Xilinx XC6VLX240T FPGA device, and uses 4,161 registers and 6,986 slice LUTs. ASIC synthesis results show that it requires an area of 60K equivalent NAND2 gates with a clock rate of 500 MHz in SMIC 0.18 μm technology. The total latency of the frequency estimator is 2336 cycles. The proposed architecture provides a good trade-off between hardware overhead, estimation performance and computation latency.
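The coarse-then-fine structure described in this abstract can be sketched in software. The example below is not the paper's algorithm or its CORDIC datapath. It assumes a noise-free single complex tone, uses a direct O(N²) DFT for the coarse peak search, and refines the peak with a generic interpolator that evaluates two Fourier coefficients half a bin to either side of the current estimate. The function names are hypothetical.

```python
import cmath
import math


def dft_at(x, k):
    """DFT coefficient of sequence x at a (possibly fractional) bin k."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def estimate_frequency(x, iterations=2):
    """Estimate the frequency (cycles per sample) of a single complex tone."""
    n = len(x)
    # Coarse search: integer bin with the largest DFT magnitude.
    coarse = max(range(n), key=lambda k: abs(dft_at(x, k)))
    # Fine search: interpolate on coefficients half a bin to either side.
    delta = 0.0
    for _ in range(iterations):
        upper = dft_at(x, coarse + delta + 0.5)
        lower = dft_at(x, coarse + delta - 0.5)
        delta += 0.5 * ((upper + lower) / (upper - lower)).real
    return (coarse + delta) / n
```

A hardware design would replace the direct sums with an FFT for the coarse stage. The fine stage needs only complex multiplications and one real division per iteration, which are the kinds of operation the abstract assigns to its CORDIC unit.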
18.
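The H.264 video coding standard introduces a quarter-pixel-accuracy interpolation algorithm, which greatly improves compression efficiency but also increases computational complexity and memory bandwidth. To address these problems from the motion estimation perspective, a one-step interpolation method and data reuse are adopted, reducing bandwidth by 26% and processing cycles by 45%. A corresponding hardware architecture is designed: a 5-stage pipeline implements the one-step interpolation algorithm, and an input buffer unit provides reuse of the reference data. To handle the large amount of data generated during interpolation, a ping-pong structure is used to ensure timely data transfer. The architecture significantly reduces bandwidth, increases throughput, and is fully suitable for use in real-time encoders.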
19.
Marcelo Porto João Altermann Eduardo Costa Luciano Agostini Sergio Bampi 《Analog Integrated Circuits and Signal Processing》2012,73(3):919-930
This paper presents a high performance, power efficient and low hardware cost architecture for motion estimation (ME) targeting portable consumer applications. This hardware uses the Sub-sampled Diamond Search algorithm (SDS) with a Dynamic Iteration Control (DIC). The SDS–DIC algorithm can significantly reduce the number of SAD (Sum of Absolute Difference) calculations for block matching, thus enabling the development of an efficient hardware design for the ME. The DIC technique allows for the required throughput to be achieved with a restriction in the number of iterations, which contributes to the reduction in the overall number of clock cycles needed for the motion vector calculation. The processing units (PU) of the ME were developed by using efficient hierarchical adder-compressors, where simultaneous additions of more than two operands can be performed. The results we present show that, by using both the adder compressors in the PU and the DIC technique, it is possible to obtain an efficient ME architecture with higher performance and reduced power consumption. The architecture that implements this algorithm and the PUs was described in VHDL. Hardware synthesis results are presented for a 0.18 μm CMOS standard cell library. The architecture can reach real time for HDTV 1080p with less than 40 mW of power consumption.
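A rough software model shows how sub-sampling and an iteration limit reduce the number of SAD operations. It makes three assumptions that the abstract does not confirm: the sub-sampling is 2:1 pixel decimation inside the SAD, the search uses only the small diamond pattern, and the Dynamic Iteration Control is approximated by a fixed cap `max_iters`. All names are hypothetical.

```python
def subsampled_sad(cur, ref, bx, by, dx, dy, size=8, step=2):
    """SAD between the current block and a displaced reference block,
    visiting only every `step`-th pixel in each direction."""
    total = 0
    for y in range(0, size, step):
        for x in range(0, size, step):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def capped_diamond_search(cur, ref, bx, by, size=8, step=2, max_iters=4):
    """Small-diamond search with a hard limit on the number of iterations.

    Returns ((dx, dy), cost) for the best displacement found.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = subsampled_sad(cur, ref, bx, by, 0, 0, size, step)
    for _ in range(max_iters):
        cx, cy = best
        improved = False
        for ox, oy in ((0, -1), (-1, 0), (1, 0), (0, 1)):
            dx, dy = cx + ox, cy + oy
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # candidate block would fall outside the frame
            cost = subsampled_sad(cur, ref, bx, by, dx, dy, size, step)
            if cost < best_cost:
                best, best_cost, improved = (dx, dy), cost, True
        if not improved:
            break  # the centre is already the local minimum
    return best, best_cost
```

With `step=2` each SAD touches a quarter of the block's pixels, and the cap bounds the worst-case cycle count per block. The cost of a tight cap is that the search may stop before it reaches the true minimum.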
20.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(10):1385-1398