
1.

An experiment to measure the usability of parallel programming systems

Duane Szafron Jonathan Schaeffer 《Concurrency and Computation》1996,8(2):147-166

The growth of commercial and academic interest in parallel and distributed computing during the past 15 years has been accompanied by a corresponding increase in the number of available parallel programming systems (PPS). However, little work has been done to evaluate their usability, or to develop criteria for such evaluations. As a result, the usability of a typical PPS is based on how easily a small set of trivially parallel algorithms can be implemented by its authors. The paper discusses the design and results of an experiment to compare objectively the usability of two PPS. Half of the students in a graduate parallel and distributed computing course solved a problem using the Enterprise PPS while the other half used a PVM-like library of message-passing routines. The objective was to measure usability. The experiment provided valuable feedback as to what features of PPS are useful and the benefits they provide during the development of parallel programs. Although many usability experiments have been conducted for sequential programming languages and environments, they are rare in the parallel programming domain. Such experiments are necessary to help narrow the gap between what parallel programmers want and what current PPSs provide.

2.

Automated tuning of parallel I/O systems: an approach to portable I/O performance for scientific applications

Ying Chen Winslett M. 《IEEE transactions on pattern analysis and machine intelligence》2000,26(4):362-383


3.

From patterns to frameworks to parallel programs (total citations: 1; self-citations: 0; citations by others: 1)

S. MacDonald J. Anvik S. Bromling J. Schaeffer D. Szafron K. Tan 《Parallel Computing》2002,28(12):1663-1683

Object-oriented programming, design patterns, and frameworks are abstraction techniques that have been used to reduce the complexity of sequential programming. This paper describes our approach of applying these three techniques to the more difficult parallel programming domain. The Parallel Design Patterns (PDP) process, the basis of the CO₂P₃S parallel programming system, combines these techniques in a layered development model. The result is a new approach to parallel programming that addresses correctness and openness in a unique way. At the topmost development layer, a customized framework is generated from a design pattern specification of the parallel structure of the program. This framework encapsulates all of the structural details of the pattern, including communication and synchronization, to prevent programmer errors and ensure correctness. Lower layers are used only for performance tuning to make the code as efficient as necessary. This paper describes CO₂P₃S, based on the PDP process, and demonstrates it using an example application. We also provide results from a usability study of CO₂P₃S.

4.

Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives

José M. Andión Manuel Arenaz François Bodin Gabriel Rodríguez Juan Touriño 《International journal of parallel programming》2016,44(3):620-643

The use of GPUs for general purpose computation has increased dramatically in recent years due to the rising demand for computing power and their tremendous computing capacity at low cost. Hence, new programming models have been developed to integrate these accelerators with high-level programming languages, giving rise to heterogeneous computing systems. Unfortunately, this heterogeneity is also exposed to the programmer, complicating its exploitation. This paper presents a new technique to automatically rewrite sequential programs into a parallel counterpart targeting GPU-based heterogeneous systems. The original source code is analyzed through domain-independent computational kernels, which hide the complexity of the implementation details by presenting a non-statement-based, high-level, hierarchical representation of the application. Next, a locality-aware technique based on standard compiler transformations is applied to the original code through OpenHMPP directives. Two representative case studies from scientific applications have been selected: the three-dimensional discrete convolution and the single-precision general matrix multiplication. The effectiveness of our technique is corroborated by a performance evaluation on NVIDIA GPUs.

5.

High‐level specifications for automatically generating parallel code

Alejandro Acosta Francisco Almeida Ignacio Pelez 《Concurrency and Computation》2013,25(7):989-1012

The arrival of multicore systems, along with the speed‐up potential available in graphics processing units, has given us unprecedented low‐cost computing power. These systems address some of the known architecture problems but at the expense of considerably increased programming complexity. Heterogeneity, at both the architectural and programming levels, poses a great challenge to programmers. Many proposals have been put forth to facilitate the job of programmers. Leaving aside proposals based on the development of new programming languages because of the effort this represents for the user (effort to learn and reuse code), the remaining proposals are based on transforming sequential code into parallel code, or on transforming parallel code designed for one architecture into parallel code designed for another. A different approach relies on the use of skeletons. The programmer has available a set of parallel standards that form the basis for developing parallel code while programming sequential code. In this context, we propose a methodology for developing an automatic source‐to‐source transformation in a specific domain. This methodology is instantiated in a framework aimed at solving dynamic programming problems. Using this framework, the final user (a physician, mathematician, biologist, etc.) can express her problem using an equation in LaTeX, and the system will automatically generate the optimal parallel code for homogeneous or heterogeneous architectures. This approach allows for great portability toward these new emerging architectures and for great productivity, as evidenced by the computational results. Copyright © 2012 John Wiley & Sons, Ltd.

6.

Parallel computing in networks of workstations with Paralex

Davoli R. Giachini L.-A. Babaoglu O. Amoroso A. Alvisi L. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(4):371-384

Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that need to be addressed in exploiting the parallelism inherent in a distributed system include heterogeneity, high-latency communication, fault tolerance and dynamic load balancing. Current software systems for parallel programming provide little or no automatic support for these issues and require users to be experts in fault-tolerant distributed computing. The Paralex system is aimed at exploring the extent to which the parallel application programmer can be liberated from the complexities of distributed systems. Paralex is a complete programming environment and makes extensive use of graphics to define, edit, execute, and debug parallel scientific applications. All of the necessary code for distributing the computation across a network and replicating it to achieve fault tolerance and dynamic load balancing is automatically generated by the system. In this paper we give an overview of Paralex and present our experiences with a prototype implementation.

7.

A parallel programming assessment for stream processing applications on multi-core systems

《Computer Standards & Interfaces》2023

Multi-core systems are found in nearly every computing device nowadays, and stream processing applications are becoming recurrent workloads, demanding parallelism to achieve the desired quality of service. As soon as data, tasks, or requests arrive, they must be computed, analyzed, or processed. Since building such applications is not a trivial task, the software industry must adopt parallel APIs (Application Programming Interfaces) that simplify the exploitation of parallelism in hardware to accelerate time-to-market. In recent years, research efforts in academia and industry have provided a set of parallel APIs, increasing productivity for software developers. However, few studies seek to prove the usability of these interfaces. In this work, we present a parallel programming assessment of the usability of parallel APIs for expressing parallelism in the stream processing application domain on multi-core systems. To this end, we conducted an empirical study with beginners in parallel application development. The study covered three parallel APIs, reporting several quantitative and qualitative indicators involving developers. Our contribution also comprises a parallel programming assessment methodology, which can be replicated in future assessments. This study revealed important insights such as recurrent compile-time and programming logic errors made by beginners in parallel programming, as well as the programming effort, challenges, and learning curve. Moreover, we collected the participants’ opinions about their experience in this study to gain a deeper understanding of the results achieved.

8.

Dynamic load balancing on heterogeneous multi-GPU systems

Alejandro Acosta Vicente Blanco Francisco Almeida 《Computers & Electrical Engineering》2013

Current HPC systems are composed of multicore processors and powerful graphics processing units. Adapting existing code and libraries to these new systems is a fundamental problem because of the substantial increase in programming difficulty. Heterogeneity, at both the architectural and programming levels, raises the programmability wall. The performance of the code is affected by the strong interdependence between the code and the parallel architecture. We have developed a dynamic load balancing library that allows parallel code to be adapted to a wide variety of heterogeneous systems. The overhead introduced by our system is minimal and the cost to the programmer negligible. This system has been successfully applied to solve load imbalance problems appearing in homogeneous and heterogeneous multi-GPU platforms. We use the dynamic programming technique as a case study to validate our proposals in different heterogeneous scenarios on multi-GPU systems.

9.

A framework for efficient performance prediction of distributed applications in heterogeneous systems

Bogdan Florin Cornea Julien Bourgeois 《The Journal of supercomputing》2012,62(3):1609-1634

Predicting distributed application performance is a constant challenge to researchers, with an increased difficulty when heterogeneous systems are involved. Research conducted so far is limited by application type, programming language, or targeted system. The employed models become too complex and prediction cost increases significantly. We propose dPerf, a new performance prediction tool. In dPerf, we extended existing methods from the frameworks Rose and SimGrid. New methods have also been proposed and implemented such that dPerf would perform (i) static code analysis and (ii) trace-based simulation. Based on these two phases, dPerf predicts the performance of C, C++ and Fortran applications communicating using MPI or P2PSAP. Neither of the frameworks used was developed explicitly for performance prediction, making dPerf a novel tool. dPerf accuracy is validated by a sequential Laplace code and a parallel NAS benchmark. For a low prediction cost and a high gain, dPerf yields accurate results.

10.

Integrating Learning Supports into the Design of Visual Programming Systems

《Journal of Visual Languages and Computing》2001,12(5):501-524


11.

Software framework concept with visual programming and digital twin for intuitive process creation with multiple robotic systems

《Robotics and Computer》2023

With the progressive digitalization in industrial manufacturing, the usage of complex robotic systems in both intralogistics and production is expected to increase. This poses a challenge for planners and shop floor workers, as programming and interacting with these various systems leads to a high cognitive load. In particular, the broad range of manufacturer-specific software leads to a number of problems, e.g. program synchronization between different systems and the workshops that workers often need. These problems can lead to inefficient programming and planning operations, poor worker satisfaction and human errors. In this paper, we present a modular, system-agnostic and human-centered software framework that unifies the programming of different systems, to enable centralized and intuitive system programming for non-expert operators. Our software framework utilizes visual programming concepts together with an integrated digital twin of the factory and a novel graph-based programming interface. We explain our concept in detail and describe our validation through integration into a realistic industrial setup with three different systems. In addition, we provide an evaluation of our concept's usability with an experimental user study and discuss the results of the study and the software implementation. Our study results show that even non-technical users are able to use our software after a brief introduction to create complex processes that involve multiple machines working in parallel. All users reported high usability and expert users reported that the visual process editor has enough features to create processes for industrial applications. Finally, we conclude this paper by providing an outlook on future work and use-cases of our software.

12.

A visual representation of cellular automata-like systems

《Journal of Visual Languages and Computing》2004,15(6):409-438

Cellular automata (CA) models and corresponding algorithms have a rich theoretical basis, and have also been used in a great variety of applications. A number of programming languages and systems have been developed to support the implementation of the CA models. However, these languages focus on computational and performance issues, and do not pay enough attention to programming productivity, usability, understandability, and other aspects of software engineering. In this paper, we describe a new special-purpose programming language developed for visual specification, presentation, and explanation of CA systems within a visual programming environment, as well as for programming them. This language is based on using visual patterns, colors, and animation for representing the CA system structures and operations on these structures, and for performing editing and composing manipulations with corresponding software components. Examples of the CA algorithm representations and some details of the environment implementation are presented.

13.

Systems programming in Java

Ritchie S. 《Micro, IEEE》1997,17(3):30-35

The Java programming language has been widely accepted as a general purpose language for developing portable applications, toolkits, and applets. With so much activity in industry and academia in these user-level areas, is it surprising that Java is also an equally capable systems programming language? This article describes our experiences at JavaSoft with using Java as a systems-level programming language during the development of JavaOS. The author discusses the motivations for using Java and shows code examples to demonstrate various system-level primitives, including an Ethernet device driver.

14.

Airshed Pollution Modeling in an HPF Style Environment

《Journal of Parallel and Distributed Computing》2000,60(6):690-715

In this paper, we describe our experience with developing Airshed, a large pollution modeling application, in the Fx programming environment. We demonstrate that high level parallel programming languages like Fx and High Performance Fortran offer a simple and attractive model for developing portable and efficient parallel applications. Performance results are presented for the Airshed application executing on Intel Paragon and Cray T3D and T3E parallel computers. The results demonstrate that the application is “performance portable,” i.e., it achieves good and consistent performance across different architectures, and that the performance can be explained and predicted using a simple model for the communication and computation phases in the program. We also show how task parallelism was used to alleviate I/O related bottlenecks, an important consideration in many applications. Finally, we demonstrate how external parallel modules developed using different parallelization methods can be integrated in a relatively simple and flexible way with modules developed in the Fx compiler framework. Overall, our experience demonstrates that a high level parallel programming environment based on a language like HPF is suitable for developing complex multidisciplinary applications.

15.

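A description method and performance estimation model for multi-paradigm parallel application systems

Hu Changjun (胡长军) Zhang Suqin (张素琴) Tian Jinlan (田金兰) 《Chinese Journal of Computers (计算机学报)》2003,26(12):1671-1677

Multi-paradigm parallelism is an essential characteristic of large-scale parallel application systems. Describing parallel application systems in a standardized way and building a performance estimation model are important for improving both the development efficiency and the runtime efficiency of multi-paradigm parallel application systems. This paper proposes a description method based on modules and their composition relationships, together with a model for computing system execution cost. The model not only describes the multi-paradigm characteristics of a parallel application system but also accounts for the costs incurred when modules of different parallel paradigms are composed. The costs considered include switching the parallel execution mode, converting the data distribution, and converting the programming paradigm, which makes the model more accurate. An application example of the description and cost estimation is given, illustrating the importance of standardized description and cost estimation for choosing a parallelization strategy, as well as the accuracy of the model.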

16.

Supporting asynchronization in OpenMP for event-driven programming

《Parallel Computing》2019

The event-driven programming pattern is pervasive in a wide range of modern software applications. Unfortunately, it is not easy to achieve good performance and responsiveness when developing event-driven applications. Traditional approaches require a great amount of programmer effort to restructure and refactor code, to achieve the performance speedup from parallelism and asynchronization. Not only does this restructuring require a lot of development time, it also makes the code harder to debug and understand. We propose an asynchronous programming model based on the philosophy of OpenMP, which does not require code restructuring of the original sequential code. This asynchronous programming model is complementary to the existing OpenMP fork-join model. The coexistence of the two models has the potential to decrease development time for parallel event-driven programs, since it avoids major code refactoring. In addition to its programming simplicity, evaluations show that this approach achieves good performance improvements consistent with more traditional event-driven parallelization.

17.

Extending OpenMP to Survive the Heterogeneous Multi-Core Era

Eduard Ayguadé Rosa M. Badia Pieter Bellens Daniel Cabrera Alejandro Duran Roger Ferrer Marc Gonzàlez Francisco Igual Daniel Jiménez-González Jesús Labarta Luis Martinell Xavier Martorell Rafael Mayo Josep M. Pérez Judit Planas Enrique S. Quintana-Ortí 《International journal of parallel programming》2010,38(5-6):440-459

This paper advances the state of the art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired by the StarSs programming model. The proposed extensions allow the programmer to write portable code easily for a number of different platforms, relieving him/her from developing the specific code to off-load tasks to the accelerators and the synchronization of tasks. Our results obtained from the StarSs instantiations for SMPs, the Cell, and GPUs report reasonable parallel performance. However, the real impact of our approach is the productivity gains it yields for the programmer.

18.

Raising the level of abstraction for developing message passing applications

Arora Ritu Bangalore Purushotham Mernik Marjan 《The Journal of supercomputing》2012,59(2):1079-1100

Message Passing Interface (MPI) is the most popular standard for writing portable and scalable parallel applications for distributed memory architectures. Writing efficient parallel applications using MPI is a complex task, mainly due to the extra burden on programmers to explicitly handle all the complexities of message-passing (viz., inter-process communication, data distribution, load-balancing, and synchronization). The main goal of our research is to raise the level of abstraction of explicit parallelization using MPI such that the effort involved in developing parallel applications is significantly reduced in terms of the reduction in the amount of code written manually while avoiding intrusive changes to existing sequential programs. In this research, generative programming tools and techniques are combined with a domain-specific language, Hi-PaL (High-Level Parallelization Language), for automating the process of generating and inserting the required code for parallelization into the existing sequential applications. The results show that the performance of the generated applications is comparable to the manually written versions of the applications, while requiring no explicit changes to the existing sequential code.

19.

Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite

Jorge González-Domínguez Guillermo L. Taboada Basilio B. Fraguela María J. Martín Juan Touriño 《Computers & Electrical Engineering》2012,38(2):258-269

Servet is a suite of benchmarks focused on detecting a set of parameters with high influence on the overall performance of multicore systems. These parameters can be used for autotuning codes to increase their performance on multicore clusters. Although Servet has been shown to accurately detect cache hierarchies, bandwidths and bottlenecks in memory accesses, as well as the communication overhead among cores, the impact of using this information on application performance optimization has not been assessed until now. This paper presents a novel algorithm that automatically uses Servet for mapping parallel applications on multicore systems and analyzes its impact on three testbeds using three different parallel programming models: message-passing, shared memory and partitioned global address space (PGAS). Our results show that a suitable mapping policy based on the data provided by this tool can significantly improve the performance of parallel applications without source code modification.

20.

From Design Patterns to Parallel Architectural Skeletons

《Journal of Parallel and Distributed Computing》2002,62(4):669-695

The concept of design patterns has been extensively studied and applied in the context of object-oriented software design. Similar ideas are being explored in other areas of computing as well. Over the past several years, researchers have been experimenting with the feasibility of employing design-pattern-related concepts in the parallel computing domain. In the past, several pattern-based systems have been developed with the intention to facilitate faster parallel application development through the use of preimplemented and reusable components that are based on frequently used parallel computing design patterns. However, most of these systems face several serious limitations such as limited flexibility, zero extensibility, and the ad hoc nature of their components. Lack of flexibility in a parallel programming system limits a programmer to using only the high-level components provided by the system. Lack of extensibility here refers to the fact that most of the existing pattern-based parallel programming systems come with a set of prebuilt patterns integrated into the system. However, the system provides no obvious way of increasing the repertoire of patterns when the need arises. Also, most of these systems do not offer any generic view of a parallel computing pattern, a fact which may be at the root of several of their shortcomings. This research proposes a generic (i.e., pattern- and application-independent) model for realizing and using parallel design patterns. The term “parallel architectural skeleton” is used to represent the set of generic attributes associated with a pattern. The Parallel Architectural Skeleton Model (PASM) is based on the message-passing paradigm, which makes it suitable for a LAN of workstations and PCs. The model is flexible as it allows the intermixing of high-level patterns with low-level message-passing primitives.
An object-oriented and library-based implementation of the model has been completed using C++ and MPI, without necessitating any language extension. The generic model and the library-based implementation allow new patterns to be defined and included into the system. The skeleton library serves as a framework for the systematic, hierarchical development of network-oriented parallel applications.