
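Algorithm of H.264 fast deblocking filter on CUDA
Cite this article:LIU Hu, SUN Zhao-min, CHEN Qi-mei. Algorithm of H.264 fast deblocking filter on CUDA[J]. Journal of Computer Applications, 2010, 30(12): 3252-3254
Authors:LIU Hu  SUN Zhao-min  CHEN Qi-mei
Affiliations:1. Nanjing University 2.
Funding:Major High-Tech Research Project of Jiangsu Province; Transportation Science Research Program of Jiangsu Province
Abstract:The deblocking filter in the H.264/AVC video coding standard is computationally complex and very time-consuming. To address this, a fast parallel H.264 deblocking filter algorithm based on the NVIDIA Compute Unified Device Architecture (CUDA) platform is proposed. The hardware structure of the CUDA platform and its software development process are introduced. Following the concurrent structure of the graphics processing unit (GPU), the boundary strength (BS) decision and the filtering computation are optimized for parallel execution, which reduces the algorithm's complexity; shared memory is used to raise the data access rate, and the deblocking filter is thereby processed in parallel. Experimental results show that, with image quality essentially unchanged, the GPU algorithm markedly increases computation speed, with an average speedup of about 20 times.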

Keywords:Compute Unified Device Architecture (CUDA)  H.264  deblocking filter  parallel computing
Received:2010-05-20
Revised:2010-07-09

Algorithm of H.264 fast deblocking filter on CUDA
LIU Hu,SUN Zhao-min,CHEN Qi-mei. Algorithm of H.264 fast deblocking filter on CUDA[J]. Journal of Computer Applications, 2010, 30(12): 3252-3254
Authors:LIU Hu  SUN Zhao-min  CHEN Qi-mei
Abstract:In the H.264/AVC video coding standard, the deblocking filter improves coding efficiency, but it is computationally complex and time-consuming. A fast algorithm and an efficient implementation of the H.264 deblocking filter based on the NVIDIA Compute Unified Device Architecture (CUDA) are proposed. The parallel hardware architecture of the graphics processing unit (GPU) and the CUDA software development process are introduced first. Building on the parallel architecture and hardware characteristics of the GPU, the boundary strength (BS) computation and the filtering step are parallelized to reduce complexity and improve computing speed, and shared memory is used to improve data access efficiency. The experimental results show that, with essentially the same image quality, the algorithm on the GPU is markedly faster, with an average speedup of about 20 times.
Keywords:Compute Unified Device Architecture (CUDA)   H.264   deblocking filter   parallel computing
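
The BS decision named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it follows the commonly described H.264 boundary-strength rules for progressive frames with a single reference picture per block, and the `BlockInfo` struct, its field names, and the function name `boundaryStrength` are hypothetical. The value for an edge depends only on its two neighbouring 4x4 blocks, which is what makes the step a natural candidate for one-thread-per-edge execution on a GPU.

```cpp
#include <cstdlib>  // std::abs

// Hypothetical side information for one 4x4 block (illustrative field names).
struct BlockInfo {
    bool intra;      // block lies in an intra-coded macroblock
    bool hasCoeffs;  // block carries non-zero transform coefficients
    int  refFrame;   // reference picture index (single reference assumed)
    int  mvx;        // motion vector x, in quarter-sample units
    int  mvy;        // motion vector y, in quarter-sample units
};

// Boundary strength (0..4) for the edge between neighbouring blocks p and q.
// Simplified: progressive frames only, no field or MBAFF special cases.
int boundaryStrength(const BlockInfo& p, const BlockInfo& q, bool macroblockEdge) {
    // Intra blocks get the strongest filtering, strongest of all on macroblock edges.
    if (p.intra || q.intra) return macroblockEdge ? 4 : 3;
    // Residual coefficients on either side of the edge.
    if (p.hasCoeffs || q.hasCoeffs) return 2;
    // Different reference pictures.
    if (p.refFrame != q.refFrame) return 1;
    // Motion vectors differ by at least one full sample (4 quarter samples).
    if (std::abs(p.mvx - q.mvx) >= 4 || std::abs(p.mvy - q.mvy) >= 4) return 1;
    // Edge is left unfiltered.
    return 0;
}
```

Because the function reads only `p` and `q` and writes nothing shared, a GPU port could in principle call the same logic from each thread with its own pair of blocks; how the paper actually maps edges to threads and shared memory is not stated in this abstract.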
This article is indexed in Wanfang Data and other databases.