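Tree-Matching Vectorization Algorithm for Multi-cluster DSPs
Cite this article:GUO Lian-Wei, ZHENG Qi-Long, HUANG Sheng-Bing, XU Hua-Ye. Tree-Matching Vectorization Algorithm for Multi-cluster DSPs[J]. Computer Systems & Applications, 2015, 24(10):142-147.
Authors:GUO Lian-Wei  ZHENG Qi-Long  HUANG Sheng-Bing  XU Hua-Ye
Affiliation (all authors):Anhui Province Key Laboratory of High Performance Computing, Hefei 230026; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027
Funding:National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2012ZX01034-00-001)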
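Abstract:BWDSP is a new processor designed for high-performance computing. It combines a multi-cluster very long instruction word (VLIW) architecture with SIMD support and offers a rich instruction set. To make full use of the vectorization resources that BWDSP provides, a vectorization algorithm is urgently needed. This paper studies and implements, on top of Open64, a SIMD compiler optimization algorithm for multi-cluster VLIW DSPs. The algorithm operates on WHIRL, the intermediate language of Open64, and can make full use of BWDSP's rich hardware resources and vector instructions. Experimental results show that, for loops whose operations can be packed into double-word and single-word form, the optimization achieves average speedups of 6x and 4x, respectively.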

Keywords:single instruction multiple data (SIMD)  WHIRL tree  multi-cluster  very long instruction word (VLIW)  instruction parallelism
Received:2015-02-06
Revised:2015-04-26

SIMD Algorithm Based on Tree Matching for Multi-cluster and VLIW DSP
GUO Lian-Wei, ZHENG Qi-Long, HUANG Sheng-Bing and XU Hua-Ye. SIMD Algorithm Based on Tree Matching for Multi-cluster and VLIW DSP[J]. Computer Systems & Applications, 2015, 24(10):142-147.
Authors:GUO Lian-Wei  ZHENG Qi-Long  HUANG Sheng-Bing  XU Hua-Ye
Affiliation (all authors):Anhui High Performance Computing Key Laboratory at Hefei, USTC, Hefei 230026, China; School of Computer Science and Technology, USTC, Hefei 230027, China
Abstract:BWDSP is a new processor designed for high-performance computing. It combines a multi-cluster VLIW structure with a SIMD architecture and provides a rich instruction set. To make full use of the vectorization resources of BWDSP, a SIMD vectorization algorithm is needed. This paper studies and implements a SIMD compiler optimization algorithm for multi-cluster VLIW DSPs on the Open64 infrastructure. The algorithm operates on WHIRL, the intermediate language of Open64, and can make full use of the rich hardware resources and the vector instruction set. Experimental results show that the vectorization algorithm achieves, on average, a 6x speedup for loops that can be vectorized into double-word operations and a 4x speedup for loops that can be vectorized into single-word operations.
Keywords:SIMD  WHIRL tree  multi-cluster  VLIW  instruction-level parallelism
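
The abstract does not spell out the matching procedure, so the following is only a generic sketch of the idea behind tree-matching vectorization, not the authors' algorithm. It assumes that each unrolled loop iteration is represented as a small expression tree and that consecutive iterations can be packed into one vector operation when their trees have the same shape and their memory accesses are adjacent. The names `Load`, `BinOp`, `match`, `pack_width`, and `emit`, and the printed pseudo-instructions, are hypothetical; they are not WHIRL node types or BWDSP instructions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Load:
    array: str   # name of the array being read
    offset: int  # index relative to the loop variable i


@dataclass(frozen=True)
class BinOp:
    op: str      # operator name, e.g. "add" or "mul"
    left: "Node"
    right: "Node"


Node = Union[Load, BinOp]


def match(a: Node, b: Node, stride: int = 1) -> bool:
    """True if b has the same shape as a and every load is shifted by `stride`."""
    if isinstance(a, Load) and isinstance(b, Load):
        return a.array == b.array and b.offset - a.offset == stride
    if isinstance(a, BinOp) and isinstance(b, BinOp):
        return (a.op == b.op
                and match(a.left, b.left, stride)
                and match(a.right, b.right, stride))
    return False


def pack_width(trees: list) -> int:
    """Number of trees that can be packed together: all of them, or 1 on any mismatch."""
    for prev, cur in zip(trees, trees[1:]):
        if not match(prev, cur):
            return 1
    return len(trees)


def emit(tree: Node, width: int) -> str:
    """Render one vector pseudo-instruction for the first tree of a matched group."""
    if isinstance(tree, Load):
        return f"vload{width}({tree.array}[i+{tree.offset}])"
    return f"v{tree.op}{width}({emit(tree.left, width)}, {emit(tree.right, width)})"


# Four unrolled iterations of a[i+k] + b[i+k], k = 0..3.
trees = [BinOp("add", Load("a", k), Load("b", k)) for k in range(4)]
width = pack_width(trees)
print(emit(trees[0], width))
```

In this sketch a single mismatching tree (a different operator or a non-adjacent access) makes the whole group fall back to scalar code; a real compiler pass would also have to check data dependences and alignment, which are omitted here.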