
Fast H.264 Deblocking Filter Algorithm on the CUDA Architecture

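Cite this article:	LIU Hu, SUN Zhao-min, CHEN Qi-mei. Fast H.264 deblocking filter algorithm on the CUDA architecture[J]. Journal of Computer Applications (计算机应用), 2010, 30(12): 3252-3254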

Authors:	LIU Hu (刘虎), SUN Zhao-min (孙召敏), CHEN Qi-mei (陈启美)

Affiliations:	1. Nanjing University 2.

Funding:	Jiangsu Province Major High-Tech Research Project; Jiangsu Province Transportation Science Research Program

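Abstract:	The deblocking filter in the H.264/AVC video coding standard is computationally complex and very time-consuming. To address this problem, a fast parallel H.264 deblocking filter algorithm based on the NVIDIA Compute Unified Device Architecture (CUDA) platform is proposed. The hardware structure and software development flow of the CUDA platform are introduced. Following the concurrent architecture of the graphics processing unit (GPU), the boundary strength (BS) determination and the filtering computation are parallelized and optimized to reduce algorithm complexity, and shared memory is used to speed up data access, so that the deblocking filter runs in parallel. Experimental results show that, with image quality essentially unchanged, the GPU algorithm markedly increases processing speed, with an average speedup of about 20 times.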
Keywords:	Compute Unified Device Architecture; H.264; deblocking filter; parallel computing
Received:	2010-05-20
Revised:	2010-07-09
Algorithm of H.264 fast deblocking filter on CUDA

LIU Hu,SUN Zhao-min,CHEN Qi-mei. Algorithm of H.264 fast deblocking filter on CUDA[J]. Journal of Computer Applications, 2010, 30(12): 3252-3254

Authors:	LIU Hu, SUN Zhao-min, CHEN Qi-mei

Abstract:	In the H.264/AVC video coding standard, the deblocking filter is used to enhance coding efficiency, but it is computationally complex and time-consuming. A fast algorithm and efficient implementation of the H.264 deblocking filter based on the NVIDIA Compute Unified Device Architecture (CUDA) is proposed. The parallel hardware architecture and software development process of the Graphics Processing Unit (GPU) are introduced first. On the basis of the GPU's parallel architecture and hardware characteristics, the BS computation and the filtering computation are parallelized and optimized to reduce complexity and increase computing speed, and shared memory is used to improve data access efficiency. The experimental results show that, at essentially the same image quality, the average speedup is about 20 times, and the GPU algorithm achieves better performance.

Keywords:	Compute Unified Device Architecture (CUDA); H.264; deblocking filter; parallel computing
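
The abstract names BS determination as one of the two steps that were parallelized but does not show the paper's kernel. As a rough illustration of why this step suits a GPU, the following is a minimal sketch of the commonly described H.264 boundary-strength rule, written as a pure function of one block edge so that each edge could be handled by an independent thread. It assumes progressive frames with a single reference picture and one motion vector per block; `BlockInfo`, its fields, and `boundary_strength` are hypothetical names chosen for this example, not taken from the paper.

```cpp
#include <cstdlib>

// Hypothetical per-4x4-block summary; field names are illustrative only.
struct BlockInfo {
    bool intra;       // block belongs to an intra-coded macroblock
    bool has_coeffs;  // block contains non-zero transform coefficients
    int  ref_idx;     // reference picture index (single-reference simplification)
    int  mv_x, mv_y;  // motion vector in quarter-sample units
};

// Boundary strength (0..4) for the edge between neighbouring blocks p and q.
// Assumption: progressive frames, one reference and one motion vector per block.
int boundary_strength(const BlockInfo& p, const BlockInfo& q, bool macroblock_edge) {
    // Intra blocks get the strongest filtering, strongest of all on macroblock edges.
    if (p.intra || q.intra) return macroblock_edge ? 4 : 3;
    // Coded residual on either side of the edge.
    if (p.has_coeffs || q.has_coeffs) return 2;
    // Different reference pictures.
    if (p.ref_idx != q.ref_idx) return 1;
    // Motion differs by at least one full luma sample (4 quarter-samples).
    if (std::abs(p.mv_x - q.mv_x) >= 4 || std::abs(p.mv_y - q.mv_y) >= 4) return 1;
    return 0;  // edge is left unfiltered
}
```

Because the result depends only on the two blocks adjacent to the edge, no edge has to wait for another during this step; in a CUDA port the same body could plausibly be marked `__device__` and called once per thread, though how the paper actually maps edges to threads is not stated in the abstract.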
This article is indexed in Wanfang Data and other databases.