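Porting and Optimization of LLVM Automatic Vectorization for Domestic Platforms
Cite as:LI Jia'nan, HAN Lin, CHAI Yunda. 面向国产平台的LLVM自动向量化移植与优化[J]. 计算机工程, 2022, 48(1): 142-148.
Authors:LI Jia'nan  HAN Lin  CHAI Yunda
Affiliation:1. School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China;2. National Supercomputing Center in Zhengzhou, Zhengzhou 450000, China
Funding:National Key Research and Development Program of China, "Research on Key Technologies of the Global Earth Observation Results Management and Sharing Service System" (2018YFB0505000)
Abstract:Automatic vectorization, an important means of exploiting SIMD extension units, has already been implemented in the LLVM compiler. However, because of differences in vector length and instruction set functionality, domestic platforms tend to miss vectorization opportunities during automatic vectorization, or to run slower after vectorization than before. To make full use of SIMD, this work refines the instruction cost information according to the instruction set characteristics of the domestic platform. This improves the accuracy of the benefit analysis, so that automatic vectorization generates vector instructions that the back end supports and that are concise and efficient. On this basis, an improved control-flow vectorization method is proposed, which improves the adaptability of automatic vectorization by adding instruction cost information. Together these form an LLVM automatic vectorization system for domestic platforms. Experimental results show that, compared with the state before automatic vectorization was ported, porting and optimizing with this method improves overall performance on the SPEC tests by 10.8%, the speedup on the TSVC test suite by 16%, the speedup under precise cost guidance by 42%, and the speedup under control-flow vectorization by 51%.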

Keywords:automatic vectorization  vectorization benefit  porting  LLVM compiler  domestic platform
Received:2020-12-09
Revised:2021-01-19
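
The benefit analysis described in the abstract can be pictured as a comparison between the estimated cost of the scalar loop and the estimated cost of its vectorized form. The C sketch below is only an illustration of that idea under stated assumptions: the `LoopCost` struct, the function names, and every cost value are hypothetical, and none of them come from the paper or from LLVM's cost tables.

```c
/* Hypothetical cost summary for one loop. Real numbers would come from
   the target's instruction cost tables, which the abstract does not list. */
typedef struct {
    int scalar_cost; /* estimated cost of one scalar iteration */
    int vector_cost; /* estimated cost of one vector iteration */
    int vf;          /* vectorization factor: elements per vector iteration */
} LoopCost;

/* One vector iteration replaces vf scalar iterations, so vectorization
   is predicted to pay off only when vf scalar iterations cost more. */
int should_vectorize(LoopCost c) {
    return c.scalar_cost * c.vf > c.vector_cost;
}
```

With these made-up numbers, a loop with `scalar_cost = 4`, `vector_cost = 6`, `vf = 4` would be vectorized, while the same loop with `vector_cost = 20` would be left scalar. If the cost table underestimates `vector_cost` for an instruction the target handles poorly, the check passes and the vectorized loop runs slower, which is the slowdown the abstract attributes to imprecise cost information.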

Automatic Vectorization Transplant and Optimization of LLVM for Domestic Processors
LI Jia'nan,HAN Lin,CHAI Yunda.Automatic Vectorization Transplant and Optimization of LLVM for Domestic Processors[J].Computer Engineering,2022,48(1):142-148.
Authors:LI Jia'nan  HAN Lin  CHAI Yunda
Affiliation:1. School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China;2. National Supercomputing Center in Zhengzhou, Zhengzhou 450000, China
Abstract:Automatic vectorization is an important means of exploiting SIMD extension units and has been implemented in the LLVM compiler. However, differences in vector length and instruction set functionality can cause domestic processors to miss vectorization opportunities during automatic vectorization, or to run slower after vectorization than before. To make full use of SIMD, this paper refines the instruction cost information according to the instruction set features of domestic processors, which increases the accuracy of the benefit analysis, so that automatic vectorization generates concise and efficient vector instructions supported by the back end. On this basis, the paper proposes an improved control-flow vectorization method that improves the adaptability of automatic vectorization by adding instruction cost information. Together these form an LLVM-based automatic vectorization system for domestic platforms. The experimental results show that, compared with the state before automatic vectorization was ported, porting and optimizing with the proposed method provides a 10.8% overall performance improvement in the SPEC tests, a 16% speedup improvement on the TSVC test suite, a 42% speedup improvement under precise cost guidance, and a 51% speedup improvement under control-flow vectorization.
Keywords:automatic vectorization  vectorization cost  transplant  LLVM compiler  domestic processor
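
For readers unfamiliar with control-flow vectorization, the C sketch below shows the general idea using generic if-conversion, not the paper's improved method, which the abstract does not detail. A branch inside a loop body is rewritten as a per-element select so that every iteration runs the same instruction sequence. The function names and the doubling operation are hypothetical.

```c
#include <stddef.h>

/* Original form: the branch is control flow inside the loop body. */
void scale_positive_branchy(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (b[i] > 0.0f)
            a[i] = b[i] * 2.0f;
    }
}

/* If-converted form: the condition becomes a mask and the result is
   chosen with a select, which maps onto vector compare and blend
   operations. Note that a[i] is now written on every iteration. */
void scale_positive_select(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int mask = b[i] > 0.0f;
        float updated = b[i] * 2.0f;
        a[i] = mask ? updated : a[i];
    }
}
```

Both functions leave the same values in `a`. The select form does extra work on elements where the condition is false, so whether it is worth vectorizing depends on the cost of the compare and blend instructions on the target.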
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).