Optimized Parallelization of Loop Nests for Multi-core Array Architecture
Cite this article: YANG Zi-yu, YAN Ming, ZHAO Peng. Optimized Parallelization of Loop Nests for Multi-core Array Architecture[J]. Computer Engineering & Science (计算机工程与科学), 2009, 31(Z1). DOI: 10.3969/j.issn.1007-130X.2009.A1.035
Authors: YANG Zi-yu  YAN Ming  ZHAO Peng
Affiliation: College of Computer, National University of Defense Technology, Changsha, Hunan 410073
Funding: Supported by the National Natural Science Foundation of China
Abstract: Multi-core processors are widely used in high-performance computing, yet converting traditional sequential programs into parallel code and reducing the time spent in nested loops remain challenging problems. This paper first analyzes the dependences of a loop nest in the polyhedral model and tiles its iteration space, from which coarse-grained parallel code is generated automatically. To suit the structure of multi-core array processors, a genetic algorithm then produces a communication-optimized sequence of tile tasks, and an effective task-scheduling model is built on that sequence. Finally, the method is applied to LU decomposition in simulation; the results show that, compared with traditional scheduling algorithms, it improves data locality and load balance across cores.

Keywords: Nested Loops  Automatic Parallelization  Task Scheduling  Polyhedral Model  Genetic Algorithm
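
The tiling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' generated code: it assumes LU decomposition without pivoting and simply blocks the trailing update of each elimination step into fixed-size tiles. The names `lu_reference`, `lu_tiled`, `tiles`, and the `tile` parameter are hypothetical. Within one step `k`, every tile reads only row `k` and column `k` and writes its own block, so the tiles are independent and could be handed to different cores as coarse-grained tasks.

```python
# Hypothetical sketch (not from the paper): tiling the update loops of
# in-place LU decomposition without pivoting.

def lu_reference(a):
    """Untiled right-looking LU. Returns a matrix holding L below the
    diagonal (unit diagonal implied) and U on and above it."""
    n = len(a)
    m = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            m[i][k] /= m[k][k]
            for j in range(k + 1, n):
                m[i][j] -= m[i][k] * m[k][j]
    return m


def tiles(lo, hi, size):
    """Yield (start, stop) ranges that cover [lo, hi) in blocks of `size`."""
    for start in range(lo, hi, size):
        yield start, min(start + size, hi)


def lu_tiled(a, tile=2):
    """Same factorization, with the trailing update split into tiles."""
    n = len(a)
    m = [row[:] for row in a]
    for k in range(n):
        # Scale column k first so every tile sees the final multipliers.
        for i in range(k + 1, n):
            m[i][k] /= m[k][k]
        # Each (i0..i1, j0..j1) tile is an independent task for this k.
        for i0, i1 in tiles(k + 1, n, tile):
            for j0, j1 in tiles(k + 1, n, tile):
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        m[i][j] -= m[i][k] * m[k][j]
    return m
```

Here the tiles are visited in a fixed row-major order. The order in which tile tasks are issued to cores is the part the paper optimizes with a genetic algorithm; that scheduler is not sketched here.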
This article is indexed in Wanfang Data (万方数据) and other databases.