Similar literature
1.
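Yang Jixiang (杨际祥). Computer Science (《计算机科学》), 2016, 43(4): 188-191
Development efficiency and speedup are two important issues affecting the further development of multi-core parallel programming. To address them, a lightweight multi-core multithreading library (UCMLib) was designed and implemented. Based on the concept of task primitives, the library provides two modes for expressing logical parallelism: data parallelism and task parallelism. It encapsulates and abstracts the complexity of multithreaded programming, giving developers a high-level programming approach that does not require explicit handling of locks and races, which lowers the difficulty of parallel programming and improves development efficiency. UCMLib's task scheduler improves the speedup of parallel programs by effectively constructing and managing task queues and worker threads. Performance tests show that, as the computation scale grows, UCMLib achieves slightly better speedup than the TPL library in both data parallelism and task parallelism. Finally, possible performance improvements and open problems for further research are presented.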

2.
Multi-core systems are present in virtually every computing device nowadays, and stream processing applications are becoming recurrent workloads that demand parallelism to achieve the desired quality of service. As soon as data, tasks, or requests arrive, they must be computed, analyzed, or processed. Since building such applications is not a trivial task, the software industry must adopt parallel APIs (Application Programming Interfaces) that simplify the exploitation of parallelism in hardware and so shorten time-to-market. In recent years, research efforts in academia and industry have provided a set of parallel APIs, increasing the productivity of software developers. However, only a few studies have sought to assess the usability of these interfaces. In this work, we present a parallel programming assessment of the usability of parallel APIs for expressing parallelism in the stream processing application domain on multi-core systems. To this end, we conducted an empirical study with beginners in parallel application development. The study covered three parallel APIs and reports several quantitative and qualitative indicators involving developers. Our contribution also comprises a parallel programming assessment methodology, which can be replicated in future assessments. The study revealed important insights, such as recurrent compile-time and programming logic errors made by beginners in parallel programming, as well as the programming effort, challenges, and learning curve. Moreover, we collected the participants’ opinions about their experience in this study to better understand the results achieved.

3.
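Li Shigang (李士刚), Hu Changjun (胡长军), Wang Jue (王珏), Li Jianjiang (李建江). Journal of Software (《软件学报》), 2013, 24(12): 2782-2796
Low power consumption and low cost give heterogeneous multi-core processors an important share of supercomputer computing resources. However, heterogeneous multi-core processors are characterized by high bandwidth and loosely coupled coherence, so achieving ideal memory and computing performance requires more attention to low-level hardware details. This paper implements CellMLP, a multi-level parallel model for the Cell BE processor, a typical heterogeneous multi-core processor. By extending the C language with compiler directives, it supports data-parallel, task-parallel, and pipeline-parallel programming models, improving the productivity of parallel programming. For runtime support optimization, data parallelism uses SPE parallel data transfer, double buffering, and other techniques to increase data transfer bandwidth; task parallelism uses a new hybrid task queue to support asynchronous task stealing, which reduces contention among SPE threads and improves the scalability of task parallelism; and pipeline parallelism, for the first time, uses a blocking signal transmission mechanism to implement low-overhead synchronization between SPE threads. Experiments tested applications including Stream, NASBenchmark, and BOTS, and the results show that CellMLP efficiently supports a variety of typical parallel applications. A performance comparison with the similar existing programming models SARC and CellSs shows that CellMLP has clear advantages in actual data transfer bandwidth and in support for irregular applications.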

4.
Task-based programming models are beneficial for the development of parallel programs for several reasons. They decouple the specification of parallelism from the scheduling and mapping to the execution resources of a specific hardware platform, thus allowing a flexible and individual mapping. For platforms with a distributed address space, the use of parallel tasks, instead of sequential tasks, has the additional advantage of structuring the program into communication domains, which can help to reduce the overall communication overhead.

5.
Marco Vanneschi. Parallel Computing, 2002, 28(12): 595-1732
A software development system based upon integrated skeleton technology (ASSIST) is a proposal of a new programming environment oriented to the development of parallel and distributed high-performance applications according to a unified approach. The main goals are: high-level programmability and software productivity for complex multidisciplinary applications, including data-intensive and interactive software; performance portability across different platforms, in particular large-scale platforms and grids; effective reuse of parallel software; efficient evolution of applications through versions that scale according to the underlying technologies.

The purpose of this paper is to show the principles of the proposed approach in terms of the programming model (successive papers will deal with the environment implementation and with performance evaluation). The features and the characteristics of the ASSIST programming model are described according to an operational semantics style and using examples to drive the presentation, to show the expressive power and to discuss the research issues.

Building on our previous experience in structured parallel programming, in ASSIST we wish to overcome some limitations of the classical skeletons approach to improve generality and flexibility, expressive power and efficiency for irregular, dynamic and interactive applications, as well as for complex combinations of task and data parallelism. A new paradigm, called “parallel module” (parmod), is defined which, in addition to expressing the semantics of several skeletons as particular cases, is able to express more general parallel and distributed program structures, including both data-flow and nondeterministic reactive computations. ASSIST allows the programmer to design the applications in the form of generic graphs of parallel components. Another distinguishing feature is that ASSIST modules are able to utilize external objects, including shared data structures and abstract objects (e.g. CORBA), with standard interfacing mechanisms. In turn, an ASSIST application can be reused and exported as a component for other applications, possibly expressed in different formalisms.


6.
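LilyTask is a task-parallel computing model and programming model in which the task is the basic unit. The LilyTask system described in this paper is based on the LilyTask model and makes a useful attempt at implementing a task pool system in a distributed-memory environment. It effectively addresses problems that task parallelism encounters in distributed-memory environments, such as inter-task relationships, nested tasks, and consistency. In addition, the system uses function indexing to implement task migration in a distributed environment and achieves load balancing through a task-stealing strategy.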
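The task-stealing idea mentioned in the abstract above can be illustrated with a minimal, single-threaded simulation. This is only a sketch under stated assumptions: it is not LilyTask's distributed implementation, the function name `run_work_stealing` and the round-robin scheduling loop are hypothetical, and all tasks are deliberately placed on worker 0 so that the effect of stealing is visible.

```python
from collections import deque
import random


def run_work_stealing(initial_tasks, num_workers, seed=0):
    """Simulate work stealing; return (tasks executed per worker, results)."""
    rng = random.Random(seed)
    # One double-ended queue per worker. All tasks start on worker 0
    # (an assumption of this sketch) to create an obvious imbalance.
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(initial_tasks)
    executed = [0] * num_workers
    results = []

    while any(queues):
        for worker in range(num_workers):
            if queues[worker]:
                # The owner takes work from the tail of its own queue.
                task = queues[worker].pop()
            else:
                # An idle worker picks a random non-empty victim and
                # steals one task from the head of the victim's queue.
                victims = [v for v in range(num_workers)
                           if v != worker and queues[v]]
                if not victims:
                    continue
                task = queues[rng.choice(victims)].popleft()
            results.append(task())
            executed[worker] += 1
    return executed, results


tasks = [lambda i=i: i * i for i in range(8)]
executed, results = run_work_stealing(tasks, num_workers=4)
```

Owners and thieves work on opposite ends of a queue, which is the usual way to keep them from competing for the same task; a real system would add synchronization and, in a distributed setting, task migration.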

7.
Task parallelism is an approach to parallel programming that has recently gained traction because of its compatibility with the predominant object‐oriented languages and its low overhead compared to threading approaches. Parallel Task is an Open Source task‐parallel compiler and runtime system for object‐oriented languages, in particular Java. It is very flexible and expressive, as demonstrated by the fact that it can be directly employed to implement most parallel computing patterns. The only notable exception has been the pipeline pattern, where many data items are streamed through a number of processing stages. This is not surprising, as task parallelism is generally not compatible with the pipeline pattern. In this paper, we investigate how the pipeline pattern can be elegantly and efficiently implemented in a task‐parallel environment. To do so, we extend Parallel Task with the concept of implicit futures to allow creating pipelines in an intuitive and object‐oriented manner. Our experimental evaluation uses the extended Parallel Task to implement pipelines of different lengths and characteristics and compares them with manual implementations. The evaluation demonstrates very good performance and scalability of the proposed task‐parallel pipeline approach. Copyright © 2014 John Wiley & Sons, Ltd.
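As a rough illustration of building a pipeline out of tasks and futures, the sketch below chains stages with Python's standard `concurrent.futures` module. It is not the Parallel Task API (which targets Java), and it uses explicit futures where the abstract describes implicit ones; the function name `run_pipeline` and the choice of one single-threaded executor per stage are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(items, stages):
    """Stream items through stages; each stage runs on its own worker thread."""
    # One single-threaded executor per stage keeps items in order within
    # a stage while letting different stages overlap on different items.
    executors = [ThreadPoolExecutor(max_workers=1) for _ in stages]
    try:
        final_futures = []
        for item in items:
            future = None
            for executor, stage in zip(executors, stages):
                if future is None:
                    # The first stage consumes the raw item.
                    future = executor.submit(stage, item)
                else:
                    # Later stages receive the previous stage's future and
                    # resolve it only when they actually run.
                    future = executor.submit(
                        lambda f=future, s=stage: s(f.result())
                    )
            final_futures.append(future)
        return [f.result() for f in final_futures]
    finally:
        for executor in executors:
            executor.shutdown()


output = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2, str])
```

Because each stage waits only on futures produced by an earlier stage running on a different thread, the chain cannot deadlock in this arrangement.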

8.
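Research and Progress on Task-Parallel Programming Models   (total citations: 1; self-citations: 0; citations by others: 1)
The task-parallel programming model has been widely studied and used on multi-core platforms in recent years; it aims to simplify parallel programming and improve multi-core utilization. This paper first introduces the basic programming interfaces and supporting mechanisms of task-parallel programming models. It then presents the research problems, difficulties, and latest results from three perspectives: expression of parallelism, data management, and task scheduling. Finally, it outlines future research directions for task parallelism.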

9.
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed‐memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task‐parallel programs executed on hybrid distributed‐memory CPU‐graphics processing unit (GPU) systems in a global‐address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU‐GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state‐of‐the‐art CCSD(T) application module from the computational chemistry domain. Copyright © 2016 John Wiley & Sons, Ltd.

10.
The Block Conjugate Gradient algorithm (Block‐CG) was developed to solve sparse linear systems of equations that have multiple right‐hand sides. We have adapted it for use in heterogeneous, geographically distributed, parallel architectures. Once the main operations of the Block‐CG (Tasks) have been collected into smaller groups (subjobs), each subjob is matched by the middleware MJMS (MPI Jobs Management System) with a suitable resource selected among those which are available. Moreover, within each subjob, concurrency is introduced at two different levels and with two different granularities: coarse‐grained parallelism to perform independent tasks and fine‐grained parallelism within the execution of a task. We refer to this algorithm as the multi‐grained distributed implementation of the parallel Block‐CG. We compare the performance of a parallel implementation with that of the distributed implementation running on a variety of Grid computing environments. The middleware MJMS, developed by some of the authors and built on top of Globus Toolkit and Condor‐G, was used for co‐allocation, synchronization, scheduling and resource selection. Copyright © 2010 John Wiley & Sons, Ltd.

11.
Heterogeneous computing systems are promising computing platforms, since systems based on a single parallel architecture may not be sufficient to exploit the parallelism available in the running applications. In some cases, heterogeneous distributed computing (HDC) systems can achieve higher performance at lower cost than single-machine supersystems. However, in HDC systems, processors and networks are not failure free, and any kind of failure may be critical to the running applications. One way of dealing with such failures is to employ a reliable scheduling algorithm. Unfortunately, most existing scheduling algorithms for precedence constrained tasks in HDC systems do not adequately consider the reliability requirements of inter-dependent tasks. In this paper, we design a reliability-driven scheduling architecture that can effectively measure system reliability, based on an optimal reliability communication path search algorithm, and then we introduce the reliability priority rank (RRank) to estimate a task’s priority by considering reliability overheads. Furthermore, based on a directed acyclic graph (DAG), we propose a reliability-aware scheduling algorithm for precedence constrained tasks, which can achieve high reliability for applications. The comparison studies, based on both randomly generated graphs and the graphs of some real applications, show that our scheduling algorithm outperforms the existing scheduling algorithms in terms of makespan, scheduling length ratio, and reliability. At the same time, the improvement gained by our algorithm increases as the data communication among tasks increases.

12.
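Although data parallelism has been widely applied, some applications, such as tree-structured algorithms, are still not suited to the parallel model of data-parallel languages. Combining data parallelism with task parallelism can solve these problems well. This paper mainly discusses the problems encountered when introducing task parallelism into data parallelism, including shared variables, code generation, and processor allocation, and compares and analyzes compiler-based, language-based, and coordination-library-based approaches.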

13.
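A Dynamic Load Balancing Algorithm for Networks of Workstations   (total citations: 3; self-citations: 0; citations by others: 3)
Most problems in mathematical and scientific computing can exploit parallelism through data-parallel programs. In a network-of-workstations environment, however, load fluctuates widely, and load balancing is an important factor affecting efficiency. This paper proposes a dynamic load balancing algorithm that allows a data-parallel program to adjust its load dynamically at run time, and presents experimental results for the algorithm.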
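The abstract above does not describe the algorithm itself, so the following is only a generic sketch of one common way a data-parallel program can adjust load at run time: redistributing work items in proportion to each workstation's recently measured processing rate. The function name `rebalance` and the use of integer rates (for example, items processed per second) are assumptions of this example, not details taken from the paper.

```python
def rebalance(total_items, measured_rates):
    """Split total_items across workers in proportion to measured_rates.

    measured_rates holds one positive integer per worker, e.g. the number
    of items that worker processed during the last measurement interval.
    """
    total_rate = sum(measured_rates)
    # Integer division gives each worker its guaranteed share.
    shares = [total_items * rate // total_rate for rate in measured_rates]
    # Distribute the remainder left by integer division, fastest first.
    leftover = total_items - sum(shares)
    by_speed = sorted(range(len(measured_rates)),
                      key=lambda i: -measured_rates[i])
    for i in by_speed[:leftover]:
        shares[i] += 1
    return shares


# A worker measured as twice as fast receives twice as many items.
new_shares = rebalance(100, [1, 1, 2])
```

Calling such a function periodically with fresh measurements lets the partition follow load fluctuations; how often to remeasure and how to move the data are separate design questions that this sketch does not address.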

14.
With the increasing amount of parallelism obtainable on multicore platforms, stream programming has been proposed as an effective solution for exposing distributed parallelization. Nonetheless, there is a pressing demand for scheduling of task and data parallelism in stream programming that can achieve robust multicore performance in the face of varying application characteristics. This paper addresses the problem of scheduling task and data parallelism in stream programming. We present StreamMDE, an asynchronous concurrency stream programming framework which offers a novel parallel programming model for scheduling task and data parallelism in the message-driven execution paradigm. A key property of this framework is exposing controlled-grained parallelism, which allows us to control the granularity of task and data parallelism in the stream graph. Our empirical evaluation of StreamMDE shows that higher efficiency of mixed task and data parallelism in stream programming can be achieved with appropriate granularity control. The framework bridges the gap between the parallel scale and the architecture of stream programs and facilitates designing and coding stream features in different schedules.

15.
This research defines and analyzes a methodology for deriving a performance model for SPMD hybrid parallel applications. Hybrid parallelism combines shared memory and message passing computing models. This work extends the current practice of application performance modelling by developing a methodology for hybrid applications with these procedures:
  • Creation of a model based on complexity analysis of an application code and its data structures.
  • Enhancement of a static complexity model by dynamic factors to capture execution time phenomena, such as memory hierarchy effects.
  • Quantitative analysis of model characteristics and the effects of perturbations in measured parameters.
These research results are presented in the context of a hybrid parallel implementation of a sparse linear algebra kernel. A model for this kernel is derived and analyzed using the methodology. Application of the model on two large parallel computing platforms provides case studies for the methodology. Operating system issues, machine balance factor, and memory hierarchy effects on model accuracy are examined. Copyright © 2007 John Wiley & Sons, Ltd.

16.
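To study parallel graphics rendering techniques, this paper introduces the graphics rendering pipeline, analyzes its inherent parallelism, and examines how parallel rendering can be implemented, including pipeline parallelism, data parallelism, and job parallelism, as well as pre-distribution, mid-distribution, and post-distribution tiled compositing. It also discusses the main problems facing parallel rendering and its development trends.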

17.
Models of parallel computation: a survey and classification   (total citations: 5; self-citations: 1; citations by others: 5)
In this paper, the state of the art in parallel computational model research is reviewed. We introduce various models that were developed during the past decades. According to the features of their target architectures, especially memory organization, we classify these parallel computational models into three generations. The models and their characteristics are discussed based on this three-generation classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating a non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the complexity of model analysis, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

18.
GOP is a graph‐oriented programming model which aims at providing high‐level abstractions for configuring and programming cooperative parallel processes. With GOP, the programmer can configure the logical structure of a parallel/distributed program by constructing a logical graph to represent the communication and synchronization between the local programs in a distributed processing environment. This paper describes a visual programming environment, called VisualGOP, for the design, coding, and execution of GOP programs. VisualGOP applies visual techniques to provide the programmer with automated and intelligent assistance throughout the program design and construction process. It provides a graphical interface with support for interactive graph drawing and editing, visual programming functions and automation facilities for program mapping and execution. VisualGOP is a generic programming environment independent of programming languages and platforms. GOP programs constructed under VisualGOP can run in heterogeneous parallel/distributed systems. Copyright © 2005 John Wiley & Sons, Ltd.

19.
The use of Graphics Processing Units (GPUs) for high‐performance computing has gained growing momentum in recent years. Unfortunately, GPU‐programming platforms like Compute Unified Device Architecture (CUDA) are complex, user unfriendly, and increase the complexity of developing high‐performance parallel applications. In addition, runtime systems that execute those applications often fail to fully utilize the parallelism of modern CPU‐GPU systems. Typically, parallel kernels run entirely on the most powerful device available, leaving other devices idle. These observations sparked research in two directions: (1) high‐level approaches to software development for GPUs, which strike a balance between performance and ease of programming; and (2) task partitioning to fully utilize the available devices. In this paper, we propose a framework, called PSkel, that provides a single high‐level abstraction for stencil programming on heterogeneous CPU‐GPU systems, while allowing the programmer to partition and assign data and computation to both CPU and GPU. Our current implementation uses parallel skeletons to transparently leverage Intel Threading Building Blocks (Intel Corporation, Santa Clara, CA, USA) and NVIDIA CUDA (Nvidia Corporation, Santa Clara, CA, USA). In our experiments, we observed that parallel applications with task partitioning can improve average performance by up to 76% and 28% compared with CPU‐only and GPU‐only parallel applications, respectively. Copyright © 2015 John Wiley & Sons, Ltd.
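The idea of partitioning a stencil computation between two devices can be sketched in plain Python. This is not the PSkel API: the functions `stencil_1d` and `partitioned_stencil`, the 3-point averaging stencil, and the `cpu_fraction` parameter are all hypothetical, and both partitions simply run on the CPU here. The point is only that splitting the index range, while letting each partition read its neighbours' boundary values, gives the same result as the unpartitioned computation.

```python
def stencil_1d(values, start, stop):
    """Apply a 3-point average to indices [start, stop); edges are clamped."""
    n = len(values)
    out = []
    for i in range(start, stop):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        out.append((left + values[i] + right) / 3)
    return out


def partitioned_stencil(values, cpu_fraction):
    """Split the index range into two partitions and compute each separately."""
    split = int(len(values) * cpu_fraction)
    # In a heterogeneous system the first call would be assigned to the CPU
    # and the second to the GPU; each reads one neighbouring input value
    # across the split point (the halo).
    first_part = stencil_1d(values, 0, split)
    second_part = stencil_1d(values, split, len(values))
    return first_part + second_part


values = [0.0, 3.0, 6.0, 9.0]
full = stencil_1d(values, 0, len(values))
split_result = partitioned_stencil(values, cpu_fraction=0.5)
```

In practice the split fraction would be tuned to the relative speed of the devices, and the halo values would have to be copied between the separate CPU and GPU memories.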

20.