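Research on Optimization of the Image Remapping Algorithm Based on OpenCL
Authors:Wu Zailong  Zhang Yunquan  Long Guoping  Xu Jianliang  Jia Haipeng
Affiliation:1. College of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China; 2. Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 3. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China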
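Abstract:Image remapping (Remap) is a typical image transformation algorithm, widely used in image scaling, warping, rotation, and related tasks. As image size and resolution keep increasing, the performance demands on image remapping algorithms keep rising. Taking full account of the hardware architecture differences between GPU platforms, this paper systematically studies efficient implementations of the Remap algorithm on different GPU platforms under the OpenCL framework. It examines how different optimization methods affect performance on different GPU platforms from several angles, including off-chip memory access optimization, vectorized computation, and reduction of dynamic instructions, and it proposes the possibility of performance portability across GPU platforms. Experimental results show that, excluding data transfer time, the optimized algorithm on an AMD HD5850 GPU achieves a speedup of 114.3 to 491.5 times over the CPU version and 1.01 to 1.86 times over the CUDA version (the existing GPU implementation); on an NVIDIA C2050 GPU, it achieves a speedup of 100.7 to 369.8 times over the CPU version and 0.95 to 1.58 times over the CUDA version. These results verify the effectiveness and performance portability of the proposed optimization methods.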

Keywords:OpenCL  General-purpose computing  Image remapping algorithm  Cross-platform
Received:2012-10-22

Research on Image Remap Algorithm Optimization Based on OpenCL
Authors:Wu Zailong  Zhang Yunquan  Long Guoping  Xu Jianliang  Jia Haipeng
Affiliation:1. School of Information Science and Technology, Ocean University of China, Qingdao, Shandong 266100, China; 2. Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 3. State Key Laboratory of Computing Science, Chinese Academy of Sciences, Beijing 100190, China
Abstract:Remap is a typical image transformation algorithm, widely used in image zooming, warping, rotation, and related operations. As image size and resolution continue to grow, image remapping algorithms face increasingly high performance demands. Taking full account of the hardware architecture differences among GPU platforms, this paper systematically studies how an OpenCL-based remap algorithm can run efficiently on different GPU platforms. We apply global memory access optimization, vectorized computation, branch reduction, and other optimizations, investigate the effect of each on different platforms, and suggest that performance portability across platforms is achievable. Experimental results show that, excluding data transfer time, the optimized algorithm on an AMD HD5850 GPU achieves a speedup of 114.3 to 491.5 times over the CPU version and 1.01 to 1.86 times over the CUDA version (the existing GPU implementation); on an NVIDIA C2050 GPU, the speedup is 100.7 to 369.8 times over the CPU version and 0.95 to 1.58 times over the CUDA version. These results support the validity and portability of the optimization methods proposed in this paper.
Keywords:OpenCL  Parallel computing  Image remap  Cross-platform  
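
For readers unfamiliar with the operation, the following is a minimal CPU reference sketch of a remap in Python. It is not the authors' OpenCL implementation: the function name `remap_nearest`, the nearest-neighbor sampling, and the constant border value are all assumptions made for illustration. Each destination pixel is read from the source position given by two coordinate maps, and every output pixel is computed independently of the others, which is the property that makes the operation a natural fit for GPU parallelization.

```python
import math


def remap_nearest(src, map_x, map_y, border=0):
    """Hypothetical reference remap: dst[y][x] = src[map_y[y][x]][map_x[y][x]].

    src    -- source image as a list of rows
    map_x  -- per-output-pixel source column coordinates (may be fractional)
    map_y  -- per-output-pixel source row coordinates (may be fractional)
    border -- value used when the source coordinate falls outside src
    """
    src_h = len(src)
    src_w = len(src[0])
    dst = []
    for row_x, row_y in zip(map_x, map_y):
        out_row = []
        for fx, fy in zip(row_x, row_y):
            # Nearest-neighbor sampling: round to the closest source pixel.
            sx = math.floor(fx + 0.5)
            sy = math.floor(fy + 0.5)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                out_row.append(src[sy][sx])
            else:
                out_row.append(border)
        dst.append(out_row)
    return dst


# Usage: a horizontal flip expressed as a remap of a 2x3 image.
image = [[1, 2, 3],
         [4, 5, 6]]
flip_x = [[2, 1, 0],
          [2, 1, 0]]
flip_y = [[0, 0, 0],
          [1, 1, 1]]
flipped = remap_nearest(image, flip_x, flip_y)
```

In a GPU version, the two nested loops would typically be replaced by one work-item per output pixel; the memory access and vectorization optimizations named in the abstract concern how those work-items read `src` and the maps.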