
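An Automatic Vectorization Method for Multimedia Extension Instruction Sets and Real-World Multimedia Programs
Cite this article: JIANG Wei-hua, MEI Chao, GUO Yi, ZHU Jia-hua, ZANG Bin-yu, ZHU Chuan-qi. An Automatic Vectorization Method for Multimedia Extension Instruction Sets and Real-World Multimedia Programs [J]. Chinese Journal of Computers (计算机学报), 2005, 28(8): 1255-1266.
Authors: JIANG Wei-hua  MEI Chao  GUO Yi  ZHU Jia-hua  ZANG Bin-yu  ZHU Chuan-qi
Affiliation: Parallel Processing Institute, Fudan University, Shanghai 200433, China
Funding: This work was supported by the National Natural Science Foundation of China (60273046), the Key Basic Research Project Fund of the Science and Technology Commission of Shanghai Municipality (02JC14013), and the Intel University Collaboration Program (compiler optimization for the Intel NetBurst microarchitecture). Acknowledgments: The authors thank Dr. HUANG Bo and Dr. LI Jianhui of Intel for their interest in and guidance of this research, and for their valuable suggestions during the writing of this paper.
Abstract: Automatic vectorizing compilation is an ideal tool for using a processor's multimedia extension instruction set to improve the performance of multimedia programs. However, current research cannot effectively accelerate real programs. There are two main reasons: vectorizing ordinary arithmetic operations does not necessarily improve performance, and typical multimedia operations cannot be fully vectorized because they take many different forms in source code. To solve this problem, the paper improves the classic vectorization algorithm so that it vectorizes both kinds of operations flexibly and uniformly. The main improvement is the addition of two steps: unifying the different forms of an operation, and identifying the operations that are worth vectorizing. The improved algorithm can make full use of the instruction set to generate efficient code, and therefore works well on real multimedia programs. The algorithm is also highly extensible.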

Keywords: automatic vectorizing compilation  multimedia extension instruction set  typical multimedia operations
Received: 2004-05-26
Revised: 2004-05-26

Vectorization for Real-Life Multimedia Applications on Processors' Multimedia Extensions
JIANG Wei-hua, MEI Chao, GUO Yi, ZHU Jia-hua, ZANG Bin-Yu, ZHU Chuan-Qi. Vectorization for Real-Life Multimedia Applications on Processors' Multimedia Extensions [J]. Chinese Journal of Computers, 2005, 28(8): 1255-1266.
Authors:JIANG Wei-hua  MEI Chao  GUO Yi  ZHU Jia-hua  ZANG Bin-Yu  ZHU Chuan-Qi
Abstract: Almost all vendors have added multimedia extensions (MMEs) to their processors to speed up multimedia applications. However, research on automatic compiler vectorization has so far not fully utilized these MMEs to boost the performance of real-life multimedia applications. There are two reasons: existing work focuses on vectorizing normal arithmetic operations, which rarely yields a speedup, and it fails to fully exploit the MME support for multimedia-specific operations. These multimedia-specific operations take various forms in source code, and are often expressed in multiple statements scattered through the program, which greatly hinders their vectorization. In this paper, the authors resolve this problem by enhancing the classic vectorization algorithm to flexibly and uniformly vectorize beneficial normal arithmetic operations and multimedia-specific operations. The authors add two extra steps: one to unify the different forms of an operation, and the other to recognize the operations that are worth vectorizing. Experiments show that the algorithm achieves satisfactory performance improvements on several real-life multimedia applications, reaching a maximum speedup of 43.9% and an average speedup of 7.4% on the Accelerating Suite of Berkeley Multimedia Workload. Furthermore, any system based on the proposed algorithm can be extended to vectorize more complicated cases simply by adding the corresponding rules.
Keywords:automatic vectorization of compiler  multimedia extension  multimedia specific operation
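
The abstract does not name the multimedia-specific operations it has in mind. As a hedged illustration only, assume one of them is unsigned 8-bit saturating addition, which x86 SSE2 supports directly with the PADDUSB instruction. The sketch below (all function names are hypothetical and not taken from the paper) shows three source-level forms of that single operation: a conditional expression, a multi-statement form, and a wraparound test. A step that unifies the forms of an operation would have to map all three to the same internal representation before a vector instruction could be selected.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: widen, add, then clamp with a conditional expression. */
static uint8_t sat_add_ternary(uint8_t a, uint8_t b) {
    int sum = (int)a + (int)b;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Form 2: the same operation spread over several statements. */
static uint8_t sat_add_if(uint8_t a, uint8_t b) {
    int sum = a;
    sum += b;
    if (sum > 255)
        sum = 255;
    return (uint8_t)sum;
}

/* Form 3: add in 8 bits and detect overflow through wraparound. */
static uint8_t sat_add_wrap(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);      /* wraps modulo 256 */
    return (uint8_t)(sum < a ? 255 : sum);
}

/* Count the input pairs on which the three forms disagree. */
static int count_mismatches(void) {
    int mismatches = 0;
    for (int a = 0; a <= 255; a++) {
        for (int b = 0; b <= 255; b++) {
            uint8_t r1 = sat_add_ternary((uint8_t)a, (uint8_t)b);
            uint8_t r2 = sat_add_if((uint8_t)a, (uint8_t)b);
            uint8_t r3 = sat_add_wrap((uint8_t)a, (uint8_t)b);
            if (r1 != r2 || r1 != r3)
                mismatches++;
        }
    }
    return mismatches;
}

/* Exhaustive self-check: all three forms are the same operation. */
static void check_forms_agree(void) {
    assert(count_mismatches() == 0);
}
```

Calling `check_forms_agree()` from a `main` function exercises all 65,536 input pairs. The point of the sketch is only that syntactically different code can denote one operation; how the paper's algorithm actually performs the unification is not described in the abstract.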
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.