
High efficient pipeline design and implementation for sub-pixel interpolation process in H.264/AVC

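Cite this article:	LI Chun-shu, HUANG Kai, XIU Si-wen, MA De, GE Hai-tong, YAN Xiao-lang. High efficient pipeline design and implementation for sub-pixel interpolation process in H.264/AVC[J]. 浙江大学学报(自然科学版), 2011, 45(7): 1187-1193.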

Authors:	LI Chun-shu (李春澍), HUANG Kai (黄凯), XIU Si-wen (修思文), MA De (马德), GE Hai-tong (葛海通), YAN Xiao-lang (严晓浪)

Affiliations:	1. Institute of VLSI Design, Zhejiang University, Hangzhou, Zhejiang 310027; 2. Hangzhou C-SKY Microsystems Co., Ltd., Hangzhou, Zhejiang 310027

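Abstract:	To address the high complexity of sub-pixel interpolation in H.264/AVC video decoding systems, a two-level pipeline design method for sub-pixel interpolation is proposed. The first-level pipeline applies when the four 4×4 blocks inside an 8×8 partition share the same motion information: a two-stage pipeline of reference-pixel fetching and interpolation computation, organized per 4×4 block, lets the interpolation of different 4×4 blocks proceed in parallel. The second-level pipeline exploits the absence of dependencies among half-pixel values in the interpolation algorithm, together with the symmetry between horizontal and vertical interpolation, to accelerate interpolation at each sub-pixel position. The core interpolation unit comprises 13 six-tap filters, 4 bilinear interpolation filters, and 4 chroma interpolation filters. The parallel pipelining of the interpolation computation cuts interpolation time by at least 75%. Experimental results show that, compared with other work in the field, the proposed architecture has a lower hardware cost, reduces external memory accesses by 47%, and improves sub-pixel interpolation performance by 30%.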
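The first-level scheme overlaps the reference-pixel fetch of one 4×4 block with the interpolation of the previous one. The sketch below is a minimal timing model of such a two-stage pipeline, not the paper's hardware. The function names and the per-block cycle counts are hypothetical, and it assumes the fetch stage may run ahead of the compute stage (that is, fetched pixels can be buffered).

```python
def sequential_cycles(n_blocks: int, fetch: int, compute: int) -> int:
    """Fetch and interpolate each 4x4 block back to back, with no overlap."""
    return n_blocks * (fetch + compute)


def pipelined_cycles(n_blocks: int, fetch: int, compute: int) -> int:
    """Two-stage pipeline: fetching block k+1 overlaps interpolating block k.

    Assumption: fetched reference pixels can be buffered, so the fetch
    stage never stalls waiting for the compute stage.
    """
    fetch_done = 0    # cycle at which the current block's fetch completes
    compute_done = 0  # cycle at which the current block's interpolation completes
    for _ in range(n_blocks):
        fetch_done += fetch
        # Interpolation starts once the pixels are available and the
        # interpolation unit has finished the previous block.
        compute_done = max(fetch_done, compute_done) + compute
    return compute_done


if __name__ == "__main__":
    # Hypothetical figures: four 4x4 blocks, 10 cycles per stage.
    print(sequential_cycles(4, 10, 10), pipelined_cycles(4, 10, 10))
```

With these illustrative numbers the four blocks take 80 cycles sequentially and 50 cycles pipelined; the figures are chosen for the example only and are not results from the paper.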
High efficient pipeline design and implementation for sub-pixel interpolation process in H.264/AVC

LI Chun-shu, HUANG Kai, XIU Si-wen, MA De, GE Hai-tong, YAN Xiao-lang. High efficient pipeline design and implementation for sub-pixel interpolation process in H.264/AVC[J]. Journal of Zhejiang University (Engineering Science), 2011, 45(7): 1187-1193.

Authors:	LI Chun-shu, HUANG Kai, XIU Si-wen, MA De, GE Hai-tong, YAN Xiao-lang

Abstract:	A two-level pipeline architecture is proposed to reduce the high complexity of the sub-pixel interpolation process in H.264/AVC decoding systems. The first-level pipeline applies when the four 4×4 blocks inside one 8×8 block share the same motion information; it overlaps two stages, fetching a 4×4 block's reference pixels and computing its interpolation, so that the interpolation of different 4×4 blocks proceeds in parallel. The second-level pipeline accelerates the interpolation of individual sub-pixel positions by exploiting the independence of adjacent half-pixel values and the symmetry between horizontal and vertical interpolation. The core interpolation unit is implemented with 13 six-tap filters, 4 bilinear interpolation filters, and 4 chroma interpolation filters. Pipelining and parallelism in the interpolation computation reduce computation time by at least 75%. Experimental results show that, compared with other designs, the proposed architecture reduces external memory bandwidth by 47% and improves sub-pixel interpolation performance by 30% at a lower hardware cost.
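For readers unfamiliar with the three filter types named in the abstract, the following sketch shows the arithmetic each one performs, as defined by the H.264/AVC standard for 8-bit samples. It is a software illustration only: the function names are illustrative, and nothing here reflects how the paper arranges its 13 six-tap, 4 bilinear, and 4 chroma filters in hardware.

```python
def clip1(x: int, bit_depth: int = 8) -> int:
    """Clamp a value to the valid sample range [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, x))


def six_tap(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Unrounded six-tap FIR with coefficients (1, -5, 20, 20, -5, 1)."""
    return e - 5 * f + 20 * g + 20 * h - 5 * i + j


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-pixel luma sample between g and h (horizontal or vertical)."""
    return clip1((six_tap(e, f, g, h, i, j) + 16) >> 5)


def center_half_pel(rows: list[list[int]]) -> int:
    """Diagonal half-pixel sample from a 6x6 window of integer samples.

    The six-tap filter is applied to each row without rounding, then once
    more across the six intermediate values, with a single final rounding.
    """
    intermediate = [six_tap(*row) for row in rows]
    return clip1((six_tap(*intermediate) + 512) >> 10)


def quarter_pel(p: int, q: int) -> int:
    """Quarter-pixel luma sample: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1


def chroma_sample(a: int, b: int, c: int, d: int, x_frac: int, y_frac: int) -> int:
    """Chroma sample at eighth-pixel offset (x_frac, y_frac), each in 0..7.

    a, b are the top-left and top-right integer samples; c, d the bottom pair.
    """
    return ((8 - x_frac) * (8 - y_frac) * a + x_frac * (8 - y_frac) * b
            + (8 - x_frac) * y_frac * c + x_frac * y_frac * d + 32) >> 6
```

The horizontal and vertical half-pixel cases use the same `half_pel` arithmetic on a row or a column of samples, which is the symmetry the second-level pipeline is described as exploiting.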

Keywords:
