20 similar documents found; search took 31 ms
1.
Mathias Bourgoin Emmanuel Chailloux Jean-Luc Lamotte 《International journal of parallel programming》2014,42(4):583-600
General-purpose (GP)GPU programming demands coupling highly parallel computing units with classic CPUs to obtain high performance. Heterogeneous systems lead to complex designs that combine multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library, which handles GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernels: through interoperability with common low-level extensions (from the Cuda and OpenCL frameworks) but also via an embedded DSL for OCaml. Using simple benchmarks as well as real-world HPC software, we show that SPOC can offer high performance while easing development. To allow better abstractions over tasks and data, we introduce parallel skeletons built upon SPOC, as well as composition constructs over those skeletons.
2.
The computing power of graphics processing units (GPU) has increased rapidly, and there has been extensive research on general-purpose computing on GPU (GPGPU) for cryptographic algorithms such as RSA, the Elliptic Curve Cryptosystem (ECC), NTRU, and the Advanced Encryption Standard. With the rise of GPGPU, commodity computers have become complex heterogeneous GPU+CPU systems. This new architecture poses new challenges and opportunities in high-performance computing. In this paper, we present high-speed parallel implementations of the rainbow method based on perfect tables, known as the most efficient time-memory trade-off, on a heterogeneous GPU+CPU system. We give a complete analysis of the effect of multiple checkpoints on reducing the cost of false alarms and take advantage of it for load balancing between the GPU and the CPU. For the GTX460, our implementation is about 1.86 and 3.25 times faster than other GPU-accelerated implementations, RainbowCrack and Cryptohaze, respectively, and for the GTX580, 1.53 and 2.40 times faster. Copyright © 2014 John Wiley & Sons, Ltd.
3.
4.
Cellular automata simulation of urban dynamics through GPGPU
In recent years, urban models based on Cellular Automata (CA) have become increasingly sophisticated and are being applied to real-world problems covering large geographical areas. As a result, they often require extended computing times. However, despite the improved availability of parallel computing facilities, applications in the field of urban and regional dynamics are almost always based on sequential algorithms. This paper contributes toward a wider use of high-performance computing techniques based on General-Purpose computing on Graphics Processing Units (GPGPU) in the field of geosimulation. In particular, we investigate the parallel speedup achieved by applying GPGPU to a popular constrained urban CA model. The major contribution of this work is the specific modeling we propose, which achieves significant gains in computing time while maintaining the most relevant features of the traditional sequential model.
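The constrained-CA idea (cell transitions ranked by neighbourhood potential, with a global constraint on how many cells may change per step) can be sketched as follows. The grid size, potential function, and demand value here are invented for illustration and are not the model studied in the paper:

```python
def step(grid, demand):
    """One constrained-CA step: rank undeveloped cells by the number of
    developed neighbours, then develop only the `demand` best-ranked cells
    (the global constraint typical of constrained urban CA)."""
    n = len(grid)

    def developed_neighbours(i, j):
        return sum(
            grid[a][b]
            for a in range(max(0, i - 1), min(n, i + 2))
            for b in range(max(0, j - 1), min(n, j + 2))
            if (a, b) != (i, j)
        )

    candidates = [
        (developed_neighbours(i, j), i, j)
        for i in range(n) for j in range(n) if grid[i][j] == 0
    ]
    candidates.sort(reverse=True)               # highest potential first
    new = [row[:] for row in grid]
    for _, i, j in candidates[:demand]:
        new[i][j] = 1
    return new

# seed: a single developed cell in the centre of a 9x9 grid
g = [[0] * 9 for _ in range(9)]
g[4][4] = 1
for _ in range(3):
    g = step(g, demand=4)                       # develop 4 cells per step
total = sum(map(sum, g))                        # 1 seed + 3 steps x 4 cells
```

The per-cell potential computation is independent across cells, which is precisely what a GPGPU version parallelises; only the ranking under the global constraint needs coordination.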
5.
6.
7.
Advances in semiconductor processes have steadily increased the number of transistors integrated on a chip, and the storage and computing capabilities of graphics processors have grown ever more powerful. The peak computing power of GPUs now far exceeds that of mainstream CPUs, and their potential in non-graphics computing, particularly high-performance computing, has attracted growing attention from researchers. This paper introduces the principles of using GPUs for general-purpose computation and surveys the latest academic and industrial research on GPGPU architectures and programming models.
8.
Mathias Bourgoin Emmanuel Chailloux Jean-Luc Lamotte 《International journal of parallel programming》2017,45(2):242-261
To increase software performance, it is now common to use hardware accelerators. Currently, GPUs are the most widespread accelerators that can handle general computations, which requires using GPGPU frameworks such as Cuda or OpenCL. Both are very low-level and make the benefits of GPGPU programming difficult to achieve. In particular, they require writing programs as a combination of two subprograms and manually managing devices and memory transfers, which increases the complexity of the overall software design. The idea we develop in this paper is to guarantee expressiveness and safety for CPU and GPU computations and memory management through high-level data structures and static type checking. We present how statically typed languages, compilers, and libraries help harness high-level GPGPU programming. In particular, we show how we added high-level user-defined data structures to a GPGPU programming framework based on a statically typed programming language: OCaml. We describe the introduction of records and tagged unions shared between the host program and GPGPU kernels, described via a domain-specific language, as well as a simple pattern-matching control structure to manage them. Examples, practical tests, and comparisons with state-of-the-art tools show that our solutions improve code design, productivity, and safety while providing a high level of performance.
9.
WENO (weighted essentially non-oscillatory) schemes are high-order numerical schemes widely used in computational fluid dynamics. Because of the complexity of both the algorithm itself and heterogeneous-computing programming, research on the automatic generation of heterogeneous computing code is needed to accelerate more applications. Based on Physis, a domain-specific language framework, this paper implements automatic heterogeneous code generation for an astronomical application using a three-dimensional fifth-order WENO computation. Test results on the "Yuan" supercomputer show that the automatically generated heterogeneous code scales well and reaches 72% of the performance of hand-optimized heterogeneous code, providing a reference for heterogeneous code generation in related fluid computations.
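As a reference for what such a generated kernel computes at each grid point, here is the classic fifth-order WENO reconstruction of Jiang and Shu in plain Python. This is a single-point sketch; the paper's generated code applies a computation of this kind across a whole 3D grid on the accelerator:

```python
def weno5(fm2, fm1, f0, fp1, fp2, eps=1e-6):
    """Fifth-order WENO reconstruction of the interface value f_{i+1/2}
    from the five cell values f_{i-2}..f_{i+2} (Jiang & Shu 1996)."""
    # candidate third-order reconstructions on the three sub-stencils
    q0 = (2 * fm2 - 7 * fm1 + 11 * f0) / 6
    q1 = (-fm1 + 5 * f0 + 2 * fp1) / 6
    q2 = (2 * f0 + 5 * fp1 - fp2) / 6
    # smoothness indicators: large where a sub-stencil is non-smooth
    b0 = 13/12 * (fm2 - 2*fm1 + f0)**2 + 1/4 * (fm2 - 4*fm1 + 3*f0)**2
    b1 = 13/12 * (fm1 - 2*f0 + fp1)**2 + 1/4 * (fm1 - fp1)**2
    b2 = 13/12 * (f0 - 2*fp1 + fp2)**2 + 1/4 * (3*f0 - 4*fp1 + fp2)**2
    # nonlinear weights built from the ideal linear weights (1/10, 6/10, 3/10)
    a0 = 0.1 / (eps + b0)**2
    a1 = 0.6 / (eps + b1)**2
    a2 = 0.3 / (eps + b2)**2
    s = a0 + a1 + a2
    return (a0 * q0 + a1 * q1 + a2 * q2) / s
```

On smooth data the nonlinear weights collapse to the ideal ones and the three third-order stencils combine into a fifth-order reconstruction; near a discontinuity the weight of the offending stencil is driven toward zero, which suppresses oscillations.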
10.
Zhiyong Yuan Weixin Si Xiangyun Liao Zhaoliang Duan Yihua Ding Jianhui Zhao 《The Journal of supercomputing》2012,61(1):84-102
Open Computing Language (OpenCL) is an open, royalty-free standard for general-purpose parallel programming across Central Processing Units (CPUs), Graphics Processing Units (GPUs), and other processors. This paper introduces OpenCL to implement real-time smoke simulation in a virtual surgery training system. First, Computational Fluid Dynamics (CFD) is adopted to construct the real-time smoke simulation model based on the Navier–Stokes (N-S) equations for an incompressible fluid at normal temperature and pressure. We then propose a parallel computing technique based on OpenCL to carry out the smoke simulation model on the CPU and GPU, respectively. Finally, we render the smoke in real time using a three-dimensional (3D) texture volume rendering method. Experimental results show that the proposed parallel computing technique achieves satisfactory image quality and rendering rates on both CPU and GPU.
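A typical building block of such incompressible N-S solvers is an implicit diffusion substep solved pointwise over the grid (in the style of Stam's "stable fluids"), and this per-cell relaxation is exactly the kind of computation that maps to one OpenCL work-item per cell. A minimal sequential Python sketch on an invented toy grid, not the paper's solver:

```python
def diffuse(field, nu_dt, iters=20):
    """Jacobi iterations for implicit diffusion: each interior cell relaxes
    toward the average of its four neighbours, blended with its source value.
    On a GPU, each cell update in an iteration is an independent work-item."""
    n = len(field)
    a = nu_dt * n * n                     # per-cell diffusion coefficient
    x = [row[:] for row in field]
    for _ in range(iters):
        nxt = [row[:] for row in x]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                nxt[i][j] = (field[i][j] + a * (x[i-1][j] + x[i+1][j]
                                                + x[i][j-1] + x[i][j+1])) / (1 + 4 * a)
        x = nxt
    return x

grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0                          # a puff of smoke density
out = diffuse(grid, nu_dt=0.0001)         # density spreads to neighbours
```

The Jacobi form is preferred over Gauss-Seidel on GPUs precisely because every cell in an iteration reads only the previous iterate, so all work-items can run concurrently.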
11.
Data redistribution is an important means of achieving load balance in message-passing environments. This paper proposes a model problem for interleaved data distribution together with a parallel computing model for it, analyzes the implementation of the model problem in a message-passing environment, discusses its performance and conditions of applicability, presents the analysis results, and examines the overlapping of communication with computation. Applying the interleaved-redistribution load-balancing technique to the parallel solution of non-equilibrium stiff dynamical equation systems yields very good load-balancing results.
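Why interleaving balances load can be shown with a small sketch comparing block and cyclic ownership; the per-row cost model below is invented for illustration, mimicking a workload whose cost grows monotonically along the rows:

```python
def block_owner(i, n, p):
    """Owner of row i under a block distribution of n rows over p processes."""
    return i // -(-n // p)          # ceil(n/p) consecutive rows per process

def cyclic_owner(i, p):
    """Owner of row i under an interleaved (cyclic) distribution."""
    return i % p

def imbalance(costs, p, owner):
    """Max per-process load divided by the ideal (perfectly even) share."""
    loads = [0.0] * p
    for i, c in enumerate(costs):
        loads[owner(i)] += c
    return max(loads) / (sum(costs) / p)

# toy workload: cost grows along the rows, so a block distribution
# overloads the last process while cyclic interleaving evens things out
costs = [1 + i for i in range(64)]
p = 4
block = imbalance(costs, p, lambda i: block_owner(i, 64, p))   # ~1.74
cyclic = imbalance(costs, p, lambda i: cyclic_owner(i, p))     # ~1.05
```

An imbalance of 1.0 is perfect balance; the cyclic distribution sits close to it because each process samples rows from across the whole cost gradient.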
12.
This paper presents a general-purpose simulation approach integrating a set of technological developments and algorithmic methods in the cellular automata (CA) domain. The approach provides a general-purpose computing on graphics processing units (GPGPU) implementation for computing and multiple rendering of any direct-neighbor three-dimensional (3D) CA. The major contributions of this paper are: the CA processing and visualization of large 3D matrices computed in real time; an original method to encode and transmit large CA functions to the graphics processing units in real time; and clarification of the notions of top-down and bottom-up approaches to CA, which non-CA experts often confuse. Additionally, a practical technique to simplify the finding of CA functions is implemented using a 3D symmetric configuration on an interactive user interface with simultaneous inside and surface visualizations. The interactive user interface allows for testing the system with different project ideas and serves as a test bed for performance evaluation. To illustrate the flexibility of the proposed method, visual outputs from diverse areas are demonstrated. Computational performance data are also provided to demonstrate the method's efficiency. Results indicate that when large matrices are processed, computations on the GPU are two to three hundred times faster than the identical algorithms on the CPU.
13.
In recent years, the GPU (graphics processing unit) has evolved into an extremely powerful and flexible processor, and it now represents an attractive platform for general-purpose computation. Moreover, changes to the design and programmability of GPUs provide the opportunity to perform general-purpose computation on a GPU (GPGPU). Even though many programming languages, software tools, and libraries have been proposed to facilitate GPGPU programming, the unusual and specific programming model of the GPU remains a significant barrier to writing GPGPU programs. In this paper, we introduce a novel compiler-based approach to GPGPU programming. Compiler directives are used to label code fragments that are to be executed on the GPU. Our GPGPU compiler, Guru, converts the labeled code fragments into ISO-compliant C code that contains the appropriate OpenGL and Cg API calls. A native C compiler can then be used to compile it into executable code for the GPU. Our compiler is implemented on top of the Open64 compiler infrastructure. Preliminary experimental results on selected benchmarks show that our compiler produces significant performance improvements for programs that exhibit a high degree of data parallelism.
14.
Salvatore Di Gregorio Giuseppe Filippone William Spataro Giuseppe A. Trunfio 《Journal of Parallel and Distributed Computing》2013
In the field of wildfire risk management, so-called burn probability maps (BPMs) are increasingly used to estimate the probability of each point of a landscape being burned under certain environmental conditions. Such BPMs are usually computed through the explicit simulation of thousands of fires using fast and accurate models. However, even with the most optimized algorithms, building simulation-based BPMs for large areas is a highly intensive computational process that makes the use of high-performance computing mandatory. In this paper, General-Purpose computation on Graphics Processing Units (GPGPU) is applied, in conjunction with a wildfire simulation model based on the Cellular Automata approach, to the BPM building process. Using three different GPGPU devices, the paper illustrates several implementation strategies to speed up the overall mapping process and discusses numerical results obtained on a real landscape.
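The BPM construction itself is an embarrassingly parallel Monte Carlo aggregation, which is what makes it a natural GPGPU target. A tiny sequential sketch with a made-up stochastic spread rule standing in for the paper's CA fire model:

```python
import random

def simulate_fire(n, ignition, spread_prob, rng):
    """One stochastic fire on an n x n grid: burning cells ignite their
    4-neighbours with probability spread_prob (a toy spread rule)."""
    burned = {ignition}
    frontier = [ignition]
    while frontier:
        nxt = []
        for (i, j) in frontier:
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= a < n and 0 <= b < n and (a, b) not in burned \
                        and rng.random() < spread_prob:
                    burned.add((a, b))
                    nxt.append((a, b))
        frontier = nxt
    return burned

def burn_probability_map(n, runs, spread_prob, seed=0):
    """Average the burn indicator over many simulated fires with random
    ignition points; each run is independent, hence trivially parallel."""
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        ignition = (rng.randrange(n), rng.randrange(n))
        for (i, j) in simulate_fire(n, ignition, spread_prob, rng):
            counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]

bpm = burn_probability_map(n=16, runs=200, spread_prob=0.3)
```

Because the runs never communicate, a GPGPU implementation can assign whole fire simulations (or groups of cells within one simulation) to independent threads and reduce the burn counts at the end.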
15.
The general-purpose graphics processing unit (GPGPU) is a popular accelerator for general applications such as scientific computing, because such applications are massively parallel and can exploit the significant parallel computing power of GPUs. However, distributing the workload among the large number of cores, i.e., choosing the execution configuration of a GPGPU kernel, is still a manual trial-and-error process: programmers try out a few configurations by hand and may settle for a sub-optimal one, leading to poor performance and/or high power consumption. This paper presents an auto-tuning approach for GPGPU applications based on performance and power models. First, a model-based analytic approach for estimating the performance and power consumption of kernels is proposed. Second, an auto-tuning framework is proposed for automatically obtaining a near-optimal configuration for a kernel computation. We formulate finding an optimal configuration as a constrained optimization problem and solve it using either simulated annealing (SA) or a genetic algorithm (GA). Experimental results show that the fidelity of the proposed models for performance and energy consumption is 0.86 and 0.89, respectively. Further, the optimization algorithms yield a normalized optimality offset of 0.94% and 0.79% for SA and GA, respectively.
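The simulated-annealing search over execution configurations can be sketched as below. The configuration space and the cost function are invented stand-ins for the paper's analytic performance/power models (here the "sweet spot" is arbitrarily placed at 256 threads × 8 blocks):

```python
import math
import random

# candidate execution configurations: (threads per block, number of blocks)
CONFIGS = [(t, b) for t in (32, 64, 128, 256, 512) for b in (1, 2, 4, 8, 16)]

def cost(cfg):
    """Invented stand-in for a model-based time/energy estimate:
    penalises configurations far from 256 threads x 8 blocks."""
    t, b = cfg
    return abs(math.log2(t) - 8) + abs(math.log2(b) - 3)

def anneal(configs, cost, steps=2000, t0=2.0, seed=1):
    """Simulated annealing: accept worse configurations with a probability
    that shrinks as the temperature cools, to escape local minima."""
    rng = random.Random(seed)
    cur = rng.choice(configs)
    best = cur
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9
        cand = rng.choice(configs)          # neighbour: any other configuration
        d = cost(cand) - cost(cur)
        if d <= 0 or rng.random() < math.exp(-d / temp):
            cur = cand
        if cost(cur) < cost(best):
            best = cur
    return best

best = anneal(CONFIGS, cost)
```

With a real model in place of `cost`, the same loop searches the kernel's launch-configuration space without ever running the kernel, which is the point of the model-based approach.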
16.
Konstantinidis EI Frantzidis CA Pappas C Bamidis PD 《Computer methods and programs in biomedicine》2012,107(1):16-27
In this paper, the feasibility of adopting graphics processing units for real-time emotion-aware computing is investigated, with the aim of boosting the time-consuming computations employed in such applications. The proposed methodology was applied to the analysis of encephalographic and electrodermal data gathered while participants passively viewed emotionally evocative stimuli. The effectiveness of the GPU in processing electroencephalographic and electrodermal recordings is demonstrated by comparing the execution time of chaos/complexity analysis through nonlinear dynamics (multi-channel correlation dimension/D2) and of signal processing algorithms (computation of the skin conductance level/SCL) across several popular programming environments. Apart from the beneficial role of parallel programming, the adoption of careful design techniques for memory management may further enhance the speedup, which approaches a factor of 30 in comparison with ANSI C (single-core sequential execution). Therefore, the use of the GPU's parallel capabilities offers a reliable and robust solution for real-time sensing of the user's affective state.
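The D2 computation the GPU accelerates is built on the Grassberger-Procaccia correlation sum, whose all-pairs distance loop is naturally parallel. A minimal sketch over a toy scalar series; the delay-embedding parameters and the sine signal are invented for illustration, not taken from the paper's EEG pipeline:

```python
import math

def embed(series, dim, delay):
    """Takens delay embedding of a scalar series into dim-dimensional vectors."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]

def correlation_sum(points, r):
    """Grassberger-Procaccia C(r): fraction of distinct point pairs closer
    than r.  The all-pairs loop is the part that parallelises trivially
    on a GPU; D2 is then estimated from the slope of log C(r) vs log r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return 2 * close / (n * (n - 1))

series = [math.sin(0.3 * t) for t in range(300)]
points = embed(series, dim=3, delay=2)
c_small = correlation_sum(points, 0.1)
c_large = correlation_sum(points, 1.0)       # C(r) is non-decreasing in r
```

On a GPU, each thread can evaluate one row (or tile) of the pairwise-distance matrix and the counts are combined with a parallel reduction.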
17.
With the continuing development of GPGPU computing technology, the architecture of HPC systems is quietly undergoing a transformation that offers a new direction for high-performance computing. CUDA is a C-language programming platform provided by NVIDIA for developing parallel computing applications on GPGPUs. With it, the high-performance computing capability of suitable graphics cards can be harnessed for large-scale high-performance computation, effectively improving the utilization of computer systems. This paper surveys the current state of GPU development and describes how to develop parallel computing software using CUDA.
18.
《Journal of Systems Architecture》2014,60(5):420-430
General-purpose graphics processing units (GPGPUs) play an important role in massively parallel computing nowadays. A GPGPU core typically holds thousands of threads, with hardware threads organized into warps. With the single instruction, multiple thread (SIMT) pipeline, GPGPUs can achieve high performance, but threads taking different branches within the same warp violate the SIMD execution style and cause branch divergence. To support divergent branches, a hardware stack is used to execute all branches sequentially, so branch divergence leads to performance degradation. This article represents the PDOM (post-dominator) stack as a binary tree, with each leaf corresponding to a branch target. We propose a new PDOM stack called PDOM-ASI, which can schedule all the tree leaves. The new stack can hide more long-operation latencies with more schedulable warps, without the problem of warp over-subdivision. Besides, a multi-level warp scheduling policy is proposed, which lets part of the warps run ahead and creates more opportunities to hide latencies. Simulation results show that our policies achieve a 10.5% performance improvement over the baseline policies with only 1.33% hardware area overhead.
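The divergence problem can be illustrated by simulating a warp with an active mask and a reconvergence stack. This is a toy interpreter for a single if/else (a baseline PDOM-style mechanism), not the article's PDOM-ASI hardware design:

```python
def run_warp(values):
    """Simulate a 4-thread warp on: `if v % 2 == 0: v += 10 else: v -= 10`.
    The stack serialises the two branch targets; `cycles` counts how many
    times the warp issues the branch-body instruction, so a divergent warp
    pays for both paths while a uniform warp pays for one."""
    mask = [True] * len(values)
    taken = [v % 2 == 0 for v in values]
    then_mask = [m and t for m, t in zip(mask, taken)]
    else_mask = [m and not t for m, t in zip(mask, taken)]

    # PDOM-style stack: each entry is (active mask, branch body as a delta)
    stack = []
    if any(then_mask):
        stack.append((then_mask, +10))
    if any(else_mask):
        stack.append((else_mask, -10))

    out = values[:]
    cycles = 0
    while stack:                      # branch targets execute one at a time
        active, delta = stack.pop()
        cycles += 1                   # the whole warp issues; only active lanes write
        out = [v + delta if a else v for v, a in zip(out, active)]
    return out, cycles

out, cycles = run_warp([1, 2, 3, 4])  # divergent: both paths issued
```

In this baseline model only the top stack entry is runnable; the article's contribution is making all leaves of the branch tree schedulable so that their latencies can overlap.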
19.
FuzzyCLIPS is a rule-based programming language well suited to developing fuzzy expert systems, but it usually requires much longer execution times than algorithmic languages such as C and Java. To address this problem, we propose a parallel version of FuzzyCLIPS that parallelizes the execution of a fuzzy expert system with data dependence on a cluster system. We have designed extended parallel syntax following the original FuzzyCLIPS style. To simplify the programming model of parallel FuzzyCLIPS, we hide, as much as possible, the tasks of parallel processing from programmers and implement them in the inference engine using MPI, the de facto standard for parallel programming on cluster systems. Furthermore, a load-balancing function has been implemented in the inference engine to adapt to the heterogeneity of computing nodes. It intelligently allocates different amounts of workload to different computing nodes according to the results of dynamic performance monitoring; the programmer only needs to invoke the function in the program for better load balancing. To verify our design and evaluate the performance, we have implemented a human-resource website. Experimental results show that the proposed parallel FuzzyCLIPS can achieve superlinear speedup and provide a more reasonable response time.
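The load-balancing function described above amounts to splitting work in proportion to measured node speeds. A sketch of such a proportional allocation (the speed figures are invented; the real system would obtain them from dynamic performance monitoring):

```python
def allocate(total, speeds):
    """Split `total` work units proportionally to node speeds, using
    largest-remainder rounding so every unit is assigned exactly once."""
    s = sum(speeds)
    shares = [total * sp / s for sp in speeds]
    alloc = [int(x) for x in shares]                 # floor of each share
    leftover = total - sum(alloc)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:                # hand out the remainder
        alloc[i] += 1
    return alloc

# three heterogeneous nodes: the node measured twice as fast
# receives roughly half of the facts/rules to process
alloc = allocate(100, speeds=[2.0, 1.0, 1.0])
```

Re-running the allocation whenever the monitored speeds change gives the adaptive behaviour the abstract describes, without the programmer managing node assignments by hand.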