Design and Implementation of an Extended Collectives Library for Unified Parallel C
Authors:Carlos Teijeiro  Guillermo L. Taboada  Juan Touriño  Ramón Doallo  José C. Mouriño  Damián A. Mallón  Brian Wibecan
Affiliation:Carlos Teijeiro 1, Student Member, IEEE, Guillermo L. Taboada 1, Juan Touriño 1, Senior Member, IEEE, Member, ACM, Ramón Doallo 1, Member, IEEE, José C. Mouriño 2, Damián A. Mallón 3, and Brian Wibecan 4. 1 Computer Architecture Group, University of A Coruña, A Coruña 15071, Spain; 2 Galicia Supercomputing Center, Santiago de Compostela 15705, Spain; 3 Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich D-52425, Germany; 4 Industry Standard Servers Group, Hewlett-Packard Company, Montgomery, Alabama 36117, U.S.A.
Abstract:Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared-memory view that simplifies code development while taking advantage of the scalability of distributed-memory architectures. UPC therefore allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, more productively, accessing remote memory through high-level language constructs such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or the definition of the amount of data transferred by each particular thread. The library fulfills demands made by the UPC developer community and implements portable algorithms that are independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results confirm that the new library makes programming easier without trading off performance, thus achieving high productivity in parallel programming while harnessing the performance of hybrid shared/distributed memory architectures in high performance computing.
Keywords:
This article is indexed in CNKI, SpringerLink, and other databases.