Similar literature
1.
With the proliferation of multi-processor core systems, parallel programming poses a difficult challenge, and current solutions are far from efficient. To alleviate the difficulty of parallel programming, we propose a scheduler, which is part of a master–slave RTOS, to efficiently manage the parallel programs running on a multi-processor core system. We also propose an efficient protocol that serves as the interface between the operating system and application programs. This interface protocol runs on a dedicated control subnet to cut down the synchronization overhead between the parallel tasks. Such synchronization overhead in multi-core parallel systems has been recognized as one of the most severe limiting factors when pushing up the performance envelope. Experimental results, obtained from register-transfer level simulations of various benchmark parallel programs, show that the proposed protocol and the control subnet can improve the system efficiency by up to 33.5%. This protocol, designed to be compatible with the minimum subset of the message-passing interface (MPI) functions, scales well with the number of cores.

2.
Clusters of SMPs are hybrid-parallel architectures that combine the main concepts of distributed-memory and shared-memory parallel machines. Although SMP clusters are widely used in the high performance computing community, there exists no single programming paradigm that allows exploiting the hierarchical structure of these machines. Most parallel applications deployed on SMP clusters are based on MPI, the standard API for distributed-memory parallel programming, and thus may miss a number of optimization opportunities offered by the shared memory available within SMP nodes. In this paper we present extensions to the data parallel programming language HPF and associated compilation techniques for optimizing HPF programs on clusters of SMPs. The proposed extensions enable programmers to control key aspects of distributed-memory and shared-memory parallelization at a high level of abstraction. Based on these language extensions, a compiler can adopt a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by automatically exploiting shared-memory parallelism based on OpenMP within cluster nodes and distributed-memory parallelism utilizing MPI across nodes. We describe the implementation of these features in the VFC compiler and present experimental results which show the effectiveness of these techniques.

3.
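OpenTM introduces transactional syntax and semantics on top of OpenMP, providing a directive-based programming interface for transactional memory programming. Taking LU, an application from the standard parallel benchmark suite NPB, as an example, this paper parallelizes the pipelined algorithm using the speculative parallel execution capability of transactional memory and the OpenTM interface. Experiments show that OpenTM programs are simple to write, avoid the complexity of lock-based approaches, and can play a significant role in scientific computing.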

4.
An island model is a typical implementation of genetic programming on parallel computers with distributed memory. The island model has a migration facility that sends/receives some individuals in an island to/from another island to maintain diversity. The island model requires synchronization to migrate same-generation individuals between islands, and this synchronization increases computation time. This article proposes a new parallel genetic programming implementation based on the island model with asynchronous migration. Most recent computers are equipped with one or more multi-core processors and are suitable for multi-threading. We therefore employ a communication thread for migration between islands. The communication thread on a processor communicates with the communication thread on another processor to migrate individuals at appropriate intervals. Since the migration and other genetic operations can be processed independently on each core, and since we allow the exchange of individuals of different generations, no synchronization is needed in our implementation. In addition, the fitness calculation is executed in parallel by the remaining cores. Experimental results show that, using 40 threads, the proposed method reduces the computation time to about 17% of that of serial GP.
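The asynchronous migration idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `fitness` function, the ring topology, the population size, and all function names are assumptions made for the example. In CPython the global interpreter lock also limits true parallelism between threads, so the sketch shows the structure of the scheme rather than its speedup.

```python
import queue
import random
import threading


def fitness(x: float) -> float:
    """Toy fitness with its maximum at x = 3 (a stand-in for a real GP fitness)."""
    return -(x - 3.0) ** 2


def run_island(index, inbox, outbox, generations, interval, results):
    """Evolve one island; migrants are absorbed without waiting for neighbours."""
    rng = random.Random(index)  # the island index doubles as the seed (assumption)
    population = [rng.uniform(-10.0, 10.0) for _ in range(20)]
    for generation in range(generations):
        # Accept whatever migrants have arrived, regardless of their generation.
        while True:
            try:
                population.append(inbox.get_nowait())
            except queue.Empty:
                break
        # Elitist selection: keep the 10 best, add one mutated child per parent.
        population.sort(key=fitness, reverse=True)
        parents = population[:10]
        children = [p + rng.gauss(0.0, 0.5) for p in parents]
        population = parents + children
        # Hand the current best to the communication thread at fixed intervals.
        if generation % interval == 0:
            outbox.put(max(population, key=fitness))
    results[index] = max(population, key=fitness)


def communicate(outbox, neighbour_inbox):
    """Communication thread: forward emigrants until a None sentinel arrives."""
    while True:
        migrant = outbox.get()
        if migrant is None:
            break
        neighbour_inbox.put(migrant)


def evolve(num_islands=4, generations=50, interval=5):
    """Run all islands in a ring and return the best individual of each."""
    inboxes = [queue.Queue() for _ in range(num_islands)]
    outboxes = [queue.Queue() for _ in range(num_islands)]
    results = [None] * num_islands
    comm_threads = [
        threading.Thread(target=communicate,
                         args=(outboxes[i], inboxes[(i + 1) % num_islands]))
        for i in range(num_islands)
    ]
    workers = [
        threading.Thread(target=run_island,
                         args=(i, inboxes[i], outboxes[i], generations, interval, results))
        for i in range(num_islands)
    ]
    for thread in comm_threads + workers:
        thread.start()
    for thread in workers:
        thread.join()
    for box in outboxes:
        box.put(None)  # tell each communication thread to stop
    for thread in comm_threads:
        thread.join()
    return results
```

Calling `evolve()` returns one best individual per island. No barrier appears anywhere in the loop: an island never blocks on its neighbour, which is the property the abstract attributes to asynchronous migration.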

5.
Homer: the communication protocol of the SNOW system   Total citations: 2 (self-citations: 1, citations by others: 1)
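1. Introduction. A NOW (Network of Workstations) system is a parallel computer system that uses commercial workstations or high-end microcomputers as processing nodes, interconnected by a high-speed commercial or dedicated network; it is commonly called a parallel cluster system. Its advantages include good scalability, a high performance-to-price ratio, low investment risk for users, and abundant software resources. By the national ...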

6.
Multiprocessors in which a shared bus is used by the processors to communicate with common memory are an emerging class of machines where there is a need to support parallel programming languages. A language construct that is found in a number of parallel programming languages to support synchronization and communication is the interprocess rendezvous. Shared-bus multiprocessors require a protocol to keep the data in their caches coherent. There are two major categories of these protocols: invalidation and write-broadcast. This paper examines the requirements for cache coherence protocols to support efficient interprocess rendezvous. The approach taken is to examine the memory referencing patterns to the run-time data structures during rendezvous execution. The appropriate coherence protocol is shown to be a function of the processor scheduling strategy used by the run-time system at synchronization points during the rendezvous. When processes migrate freely as a result of the scheduling strategy, invalidation protocols are found to be more efficient. When migration is restricted by the scheduler, write-broadcast protocols are more efficient.
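The trade-off between the two protocol families can be illustrated with a toy bus-traffic counter. This sketch is not the paper's methodology: it models a single cache line, charges one bus transaction per fetch, invalidation, or broadcast, and uses two made-up access traces (`migrating` and `pinned`) as stand-ins for free and restricted process migration.

```python
def bus_transactions(trace, protocol):
    """Count bus transactions for one cache line under a toy coherence model.

    trace is a list of (processor, op) pairs with op either "R" or "W".
    """
    holders = set()  # processors whose cache currently holds a valid copy
    bus = 0
    for processor, op in trace:
        if protocol == "invalidation":
            if op == "R":
                if processor not in holders:
                    bus += 1  # read miss: fetch the line
                    holders.add(processor)
            elif holders != {processor}:
                bus += 1  # obtain the only valid copy, invalidating the others
                holders = {processor}
        elif protocol == "write-broadcast":
            if processor not in holders:
                bus += 1  # miss: fetch the line
                holders.add(processor)
            if op == "W" and len(holders) > 1:
                bus += 1  # broadcast the new value to every other copy
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    return bus


# A process that migrates: the same data is written from three processors in turn.
migrating = [(p, "W") for p in (0, 1, 2) for _ in range(4)]

# Processes pinned to processors: processor 0 produces, processor 1 consumes.
pinned = [(0, "W"), (1, "R")] * 4
```

Under these assumptions the migrating trace costs 3 transactions with invalidation and 11 with write-broadcast, because stale copies left behind on earlier processors keep receiving updates. The pinned trace costs 8 with invalidation and 5 with write-broadcast, because the consumer's copy stays valid. The direction of both results matches the abstract's conclusion, but the numbers themselves are artifacts of the toy model.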

7.
The Web of Things (WoT) makes it possible to connect a tremendous number of embedded devices to the web in the Representational State Transfer (REST) style. Some lightweight RESTful protocols have been proposed for the WoT to replace HTTP on embedded devices. However, they keep the principal characteristic of the REST style: they support one-to-one requests in client-server mode through four standard RESTful methods (GET, PUT, POST, and DELETE). This characteristic is inconsistent with practical networks of embedded devices, which typically perform group operations. To meet the requirement of group communication in the WoT, we propose a resource-oriented protocol called SeaHttp that extends the REST style with two new methods, BRANCH and COMBINE. SeaHttp supports parallel processing of group requests by splitting and merging them. In addition, SeaHttp adds spatiotemporal attributes to the standard URI for naming a dynamic request group of physical resources. Experimental results show that SeaHttp can reduce the average energy consumption of group communication in the WoT by 18.5% compared with the Constrained Application Protocol (CoAP).

8.
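Software-defined networking enables unified deployment and management of network resources, which not only reduces the manual configuration workload but also limits the impact of network faults caused by manual configuration errors. When programming against network resources, Python scripts create HTTP request methods such as GET and PUT to communicate with the programming interfaces exposed by network devices and to create, delete, update, and query device resources, thereby automating resource deployment and management.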
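A minimal sketch of how such GET and PUT requests might be built with Python's standard library is shown below. The base URL, resource paths, and JSON payload are hypothetical, since the abstract does not name a device API; a real device would define its own paths, authentication, and data model.

```python
import json
import urllib.request

# Hypothetical REST endpoint of a network device (assumption for illustration).
BASE_URL = "https://device.example/api"


def build_get(resource: str) -> urllib.request.Request:
    """Build a GET request that queries a device resource."""
    return urllib.request.Request(
        f"{BASE_URL}/{resource}",
        method="GET",
        headers={"Accept": "application/json"},
    )


def build_put(resource: str, config: dict) -> urllib.request.Request:
    """Build a PUT request that creates or updates a device resource."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{resource}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request, timeout: float = 5.0):
    """Send a prepared request and return (status code, raw response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

A script could then call, for example, `send(build_put("interfaces/eth0", {"enabled": True}))` against a reachable device. Separating request construction from sending keeps the configuration logic testable without network access.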

9.
Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty of predicting memory access patterns or execution paths. Within an irregular application, fine-grained synchronization is one technique for managing the coordination of work; in practice, however, the actual performance for irregular problems depends on the input, the access pattern to shared data structures, the relative speed of processors, and the hardware support for synchronization primitives. In this paper, we focus on lock-free and mutual exclusion protocols for handling fine-grained synchronization. Mutual exclusion and lock-free protocols have received a fair amount of attention in coordinating accesses to shared data structures from concurrent processes. Mutual exclusion offers a simple programming abstraction, while lock-free data structures provide better fault tolerance and eliminate problems associated with critical sections such as priority inversion and deadlock. These synchronization protocols, however, are seldom used in parallel algorithm designs, especially for algorithms under the SPMD paradigm, as their implementations are highly hardware dependent and their costs are hard to characterize. Using graph-theoretic algorithms for illustrative purposes, we show experimental results on two shared-memory multiprocessors, the IBM pSeries 570 and the Sun Enterprise 4500, indicating that irregular parallel algorithms with efficient fine-grained synchronization may yield good performance.

10.
Multicore computers are ubiquitous. Expert developers as well as developers with little experience in parallelism are now asked to create multithreaded software to exploit parallelism in mainstream shared‐memory hardware. However, finding and fixing parallel programming errors is a complex and arduous task. Programmers thus rely on tools such as race detectors that typically focus on reporting errors due to incorrect usage of synchronization constructs or due to missing synchronization. This arsenal of debugging techniques, however, is incomplete. This article presents a new perspective and addresses a largely unexplored direction of defect localization where a wrong usage of nonparallel programming constructs might cause wrong parallel application behavior. In particular, we make a contribution by showing how to use data‐mining techniques to locate defects in multithreaded shared‐memory programs. Our technique analyzes execution anomalies in a condensed representation of the dynamic call graphs of a multithreaded object‐oriented application and identifies methods that contain a defect. Compared with race detectors that concentrate on finding incorrect synchronization, our method is able to reveal a wider range of defects that affect the control flow of a parallel program. Results from controlled experiments show that our data‐mining approach finds not only race conditions in different types of multicore applications but also other errors that cause incorrect parallel program behavior. Data‐mining techniques offer a fruitful new ground for parallel program debugging, and we also discuss long‐term directions for this interesting field. Copyright © 2012 John Wiley & Sons, Ltd.

11.
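PRISMA/DB is a parallel, main-memory relational database management system. Its design rests on two main ideas: first, the entire database is stored in main memory to achieve high performance; second, the system is implemented modularly in an object-oriented programming language, so that its flexible organization lends itself to analysis and experimentation on functionality and performance. A prototype has been implemented and runs on a multiprocessor with 100 nodes. This paper gives a preliminary account of its design and implementation details.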

12.
Aquarius-II is a cache coherent multiprocessor system designed for the parallel execution of Prolog programs. It contains two tiers of memory: synchronization memory and high bandwidth (HB) memory. The synchronization memory consists of snooping caches connected to a bus and is used to store rendezvous points, synchronization bits, synchronization variables such as locks and semaphores, and most of the write-shared data. The HB memory is used to store the bulk of the application program code and data. It contains caches and an inexpensive VLSI chip based crossbar interconnection network to memory. The caches connected to the crossbar do not have full snooping capability. The architecture is evaluated by a full simulation of parallel execution of Prolog programs on Aquarius-II. The design details of the components of the architecture and simulation results are presented. Simulation results indicate that the two-tier memory system significantly reduces memory interference and speeds up synchronization when compared to a single bus multi. This shared memory multiprocessor architecture has the potential to support other parallel programming paradigms.

13.
More and more aspects of concurrency and concurrent programming are becoming part of mainstream programming and software engineering, due to several factors such as the widespread availability of multi-core/parallel architectures and Internet-based systems. This leads to the extension of mainstream object-oriented programming languages and platforms (Java is a main example) with libraries providing fine-grained mechanisms and idioms to support concurrent programming, in particular for building efficient programs. Besides this fine-grained support, a main research goal in this context is to devise higher-level, coarse-grained abstractions that would help in building concurrent programs, as pure object-oriented abstractions help in building large component-based programs. To this end, in this paper we present simpA, a Java-based framework that provides programmers with agent-oriented abstractions on top of the basic OO layer, as a means to organize and structure concurrent applications. We first describe the application programming interface (API) and annotation framework provided to Java programmers for building simpA applications, and then we discuss the main features of the approach from a software engineering point of view, by showing some programming examples. Finally, we define an operational semantics formalizing the main aspects of this programming model.

14.
Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures. Therefore, UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory by means of different high-level language constructs, such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or defining the amount of data transferred by each particular thread. This library fulfills the demands made by the UPC developers community and implements portable algorithms, independent of the specific UPC compiler/runtime being used. The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results obtained confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing.

15.
Optical technologies can support thousands of high bandwidth optical channels to/from a single CMOS integrated circuit, and can thus allow for the construction of novel bandwidth-intensive computing architectures which are no longer constrained by conventional electronic wiring limitations. In this paper, the architecture of a dynamically reconfigurable Intelligent Optical Backplane is described. The backplane consists of a large number of parallel optical channels (typically 1000–10,000 bits) spaced a few hundred micrometers apart. The optical channels are arranged into upstream and downstream rings, where the channel access protocols are implemented by “smart pixel arrays.” The architecture exploits the bandwidth advantage of the optical domain and can be dynamically reconfigured to embed conventional interconnection networks, including multiple busses, rings, and meshes. Unlike all-optical and passive optical systems, the proposed backplane is intelligent and can support communication primitives used in shared memory multiprocessing, including broadcasting, multicasting, acknowledgment, flow and error-control, buffering, shared memory caching, and synchronization. The backplane is also manufacturable using existing optoelectronic technologies. A second generation backplane supporting a distributed shared memory multi-processor is under development.

16.
High-level parallel programming models supporting dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to object caching. Object caching techniques must be able to tolerate unresponsive processors and protocol handler occupancy delays. This paper examines whether these challenges can be offset by leveraging responsive general-purpose communication architectural features (such as remote memory access and atomic operations), possibly compensating for the lack of more sophisticated hardware primitives by relying upon increased involvement of the run-time system and the compiler. A detailed performance analysis of four irregular applications, using the Illinois Concert System on the Cray T3D and the SGI Origin 2000, finds that existing software distributed shared memory (DSM) systems are capable of delivering good performance only in the presence of a high level of responsive communication architecture support (specifically, support for remote atomic operations). Recognizing that this situation stems from the synchronous request–reply nature of DSM protocols, we present a composable object caching framework, called view caching, which exploits knowledge of application data access semantics to construct custom protocols that require reduced processor synchronization. View caching protocols are more tolerant to responsiveness and occupancy delays and are able to exploit even lower level responsive communication primitives (such as nonatomic remote memory accesses) for a performance benefit.

17.
Zhou Jie (周杰), Li Wenjing (李文敬). Computer Science (《计算机科学》), 2017, 44(Z11): 586-591, 595
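To solve the deadlock problem that arises when MPI+OpenMP hybrid programming is used for synchronization while parallelizing Petri nets on multi-core clusters, a Petri net parallel algorithm based on a three-layer hybrid programming model is proposed. First, exploiting the synchronization advantages of transactional memory, a three-layer MPI+OpenMP+STM programming model is built for the multi-core cluster environment. Then, the parallelization of the geometric and algebraic models of Petri nets is analyzed, a Petri net parallel model with the three-layer MPI+OpenMP+STM structure is established, and the Petri net parallel algorithm for the three-layer hybrid programming model is designed and analyzed. Finally, programming experiments on examples verify that the algorithm runs markedly more efficiently than other programming modes, and that the larger the Petri net, the more pronounced the benefit of parallel computation. The algorithm is therefore an efficient and feasible way to simulate the parallel execution of Petri nets on multi-core clusters.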

18.
Martinez J.F., Torrellas J. IEEE Micro, 2003, 23(6): 126-134
Proper synchronization is vital to ensuring that parallel applications execute correctly. A common practice is to place synchronization conservatively so as to produce simpler code in less time. Unfortunately, this practice frequently results in suboptimal performance because it stalls threads unnecessarily. Speculative synchronization overcomes this problem by allowing threads to speculatively execute past active barriers, busy locks, and unset flags. The result is high performance.

19.
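PRISMA/DB is a parallel, main-memory relational database management system. Its design rests on two main ideas: first, the entire database is stored in main memory to achieve high performance; second, the system is implemented modularly in an object-oriented programming language, so that its flexible organization lends itself to analysis and experimentation on functionality and performance. A prototype has been implemented and runs on a multiprocessor with 100 nodes. This paper gives a preliminary account of its design and implementation details.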

20.
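With the development of industrial Ethernet, the security of clock synchronization protocols, the core technology that guarantees its real-time behavior, has become critical. To address the security of clock synchronization protocols, this paper first proposes a security analysis method based on colored Petri nets. It then builds a colored Petri net model of the protocol and uses tools such as the state equation to decide the reachability of unsafe states, thereby analyzing the security of the protocol. Finally, it analyzes a clock synchronization protocol based on the Precision Time Protocol (PTP) together with a master clock spoofing attack against that protocol, verifying the effectiveness of the proposed method.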
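For readers unfamiliar with the state equation mentioned above, the sketch below applies the standard form M = M0 + C·σ to an ordinary place/transition net. It is an illustration only: the paper works with colored Petri nets, whose state equation is richer, and the two-place net used in the usage note is a made-up example rather than the paper's PTP model.

```python
def apply_state_equation(initial_marking, incidence, firing_counts):
    """Return M = M0 + C * sigma for an ordinary place/transition net.

    incidence[p][t] is the net change in tokens of place p when transition t
    fires once; firing_counts[t] is how many times transition t fires.
    """
    return [
        tokens + sum(row[t] * firing_counts[t] for t in range(len(firing_counts)))
        for tokens, row in zip(initial_marking, incidence)
    ]


def consistent_with(initial_marking, incidence, firing_counts, target):
    """Check whether the given firing counts produce the target marking.

    Satisfying the state equation is a necessary condition for reachability,
    not a sufficient one: the firing order may still be infeasible.
    """
    return apply_state_equation(initial_marking, incidence, firing_counts) == target
```

As a usage example, take a net with two places and one transition that moves a token from the first place to the second: `apply_state_equation([1, 0], [[-1], [1]], [1])` gives `[0, 1]`. If no non-negative integer firing-count vector satisfies the equation for an unsafe marking, that marking is unreachable, which is how the equation can help rule out unsafe states.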
