
A vectorization method of QR decomposition based on Matrix
Cite this article: LU Qing nan, LIU Zhong. A vectorization method of QR decomposition based on Matrix[J]. Computer Engineering & Science, 2016, 38(2): 210-216.
Authors: LU Qing nan, LIU Zhong
Affiliation: College of Computer, National University of Defense Technology, Changsha 410073, China
Funding: Research fund for the shared-memory architecture of thousand-core general-purpose microprocessors (61472432)
Abstract: We propose a vectorization method for QR decomposition based on Givens rotations on Matrix processors. Guided by the characteristics of the Matrix architecture, we optimize vector memory access and vector computation so that the work is evenly distributed across all vector processing elements. We also design a double-buffered DMA transfer scheme that fully overlaps kernel computation with DMA data movement, so that the kernel keeps computing at its peak rate and the best computational efficiency is achieved. Experimental results show that the method achieves high computational efficiency and a high performance speedup.
Keywords: QR decomposition; vector processor; Givens rotation; software pipeline
Received: 2015-02-28
Revised: 2016-02-25
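
For readers unfamiliar with the underlying algorithm, the following is a minimal scalar sketch of QR decomposition by Givens rotations. It is a textbook reference version in plain Python, not the authors' implementation for Matrix: the abstract does not give the vectorized kernel or the DMA scheme, so neither is reproduced here. The function names `givens` and `qr_givens` are hypothetical and chosen only for this illustration. The sketch does show why the method suits a vector processor: each rotation updates just two rows, and the update is independent for every column.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0  # nothing to eliminate
    r = math.hypot(a, b)
    return a / r, b / r


def qr_givens(A):
    """Factor an m x n matrix (list of rows) as A = Q R using Givens rotations.

    Illustrative scalar version; names and structure are assumptions,
    not taken from the paper.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    # Qt accumulates the product of all rotations, i.e. Q transposed.
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(min(m - 1, n)):
        # Zero the entries below the diagonal of column j, bottom up.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            if s == 0.0:
                continue
            # Rotate rows i-1 and i of both R and Qt. Every column is
            # updated independently, which is the data-parallel part.
            for M in (R, Qt):
                upper, lower = M[i - 1], M[i]
                M[i - 1] = [c * u + s * v for u, v in zip(upper, lower)]
                M[i] = [-s * u + c * v for u, v in zip(upper, lower)]
            R[i][j] = 0.0  # exact zero instead of rounding residue

    Q = [list(col) for col in zip(*Qt)]  # transpose Qt to obtain Q
    return Q, R
```

Calling `qr_givens(A)` on a list-of-rows matrix returns an orthogonal `Q` and an upper-triangular `R` whose product reproduces `A` up to floating-point rounding. On a vector machine the two list comprehensions in the inner loop would presumably become vector multiply-add operations over row segments.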