期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

基干MPI的并行I/0方法

李小卫罗省贤《微型机与应用》2003,22(3):13-14

基于MPI-2规范的并行I/O方法,以并行矩阵乘法问题为例,比较了并行I/O和串行I/O的性能,给出了并行I/O方法的应用实例. 相似文献

2.

基于MPI的并行I/O方法 总被引：3，自引：0，他引：3

李小卫罗省贤《微型机与应用》2003,(3)

基于MPI-2规范的并行I/O方法,以并行矩阵乘法问题为例,比较了并行I/O和串行I/O的性能,给出了并行I/O方法的应用实例。相似文献

3.

基于HDF5的结构网格计算流体动力学程序并行I/O技术

杨丽鹏车永刚《计算机应用》2013,33(9):2423-2427

大规模计算流体动力学(CFD)计算对数据I/O能力提出了很高需求。层次式文件格式(HDF5)可有效管理大规模科学数据,并对并行I/O具有良好的支持。针对结构网格CFD并行程序,设计了其数据文件的HDF5存储模式,并基于HDF5并行I/O编程接口实现了其数据文件的并行I/O,在并行计算机系统上进行了性能测试与分析。结果表明,在使用4~32个进程时,基于HDF5并行I/O方式的写文件性能比每进程独立写普通文件的方式高6.9~16.1倍;基于HDF5并行I/O方式的读文件性能不及后者,为后者的20%~70%,但是读文件的时间开销远小于写文件的时间开销,因此对总体性能的影响较小。相似文献

4.

基于Linux的SMP机群环境中并行I/O模型研究 总被引：1，自引：0，他引：1

曾碧卿陈志刚吕西红刘安丰《计算机工程与应用》2004,40(26):79-81

提出了一个基于数据通路的波浪推进式并行I/O模型框架,并在基于Linux的SMP机群系统中,根据波浪推进式并行I/O模型框架对各个数据通路进行建模,具体分析了这个波浪推进式并行I/O模型,从思路上解决了刻画并行I/O性能的并行I/O模型问题。相似文献

5.

多模式并行I/O测试程序Jetter的研究与实现

下载免费PDF全文

曹立强罗红兵张晓霞《计算机工程与应用》2008,44(4):97-100

并行I/O系统有多种存取模式,它们有各自的存取特点和适用范围。为了获得不同模式下的系统性能,并行I/O测试中往往要综合使用多种微测试程序。这不仅要求用户深入了解并行I/O的特点,而且要求他们熟悉各种并行I/O微测试程序的输入与输出。提出并实现了一个并行I/O测试Jetter,它从接口类型、存取模式和进程－文件关系的角度划分了并行I/O接口,不仅能够测试I/O系统在上述模式下的性能,而且简化了测试工作。实际应用Jetter表明,并行I/O系统对不同模式的支持效果不同,最高差异可以达到两个数量级以上,这些测试结论有助于用户开发高质量的并行程序。相似文献

6.

基于FFT并行算法的并行I/O性能的研究

ZHANG Guan-hong GAO Ling-ling 《数字社区&智能家居》2008,(33)

该研究对象为并行计算机的I/O性能,将任务分发给不同的处理结点,通过进程间的相互协调、有序合作完成FFT并行算法的实现。在完成任务的过程中,通过记录I/O时间与计算时间,求出I/O性能与计算性能,通过分析比较数据从而认识I/O性能的重要性。研究计算机的I/O性能对于如何进一步改进系统以及提高资源利用率具有重要意义。相似文献

7.

I/O受限的并行加速比模型与可扩展I/O体系结构

李琼杜云飞杨学军《计算机工程与科学》2011,33(3):28

为了缓解I/O瓶颈问题,可以从应用程序、可扩展算法、编译器和语言、运行时库、操作系统和体系结构六方面展开研究。其中,I/O体系结构是所有技术途径的关键支撑。当前并行I/O性能分析缺乏科学的理论模型为I/O体系结构设计提供理论依据。本文针对并行计算机系统的可扩展性问题,研究了I/O负载对并行计算机系统可扩展性的影响,建立了I/O受限的并行加速比性能模型,对目前大规模并行计算机系统中三种常用I/O体系结构的可扩展性进行了分析;以此为理论依据,提出了一种面向高性能计算的可扩展并行I/O系统结构。同时,还提出了几种有效降低I/O操作服务时间的策略,从而达到增强系统可扩展性的目的,为后续研究奠定了基础。相似文献

8.

科学计算双路并行I/O优化方法

曹立强莫则尧沈卫超夏芳陈军《计算机学报》2015,38(5)

科学计算数据集由数据和元数据组成.一般条件下,数据的尺寸较大,元数据尺寸较小.传统的高性能计算机并行文件系统可以高效率地读写大块连续数据,但是无法高效率地读写大量较小块的元数据.一旦大块数据和小块元数据两类读写特征混杂在一起,元数据将较严重地干扰并行I/O,造成性能的下降.为此,文中提出数据与元数据分治的双路并行I/O方法.该方法在高层I/O库中建立内存文件系统与并行文件系统两级存储,在存储资源之间并行迁移科学计算元数据.一方面降低较频繁读写元数据的I/O延迟,另一方面改变科学计算数据的存储特征与存储模式,从而提高科学计算应用、尤其是数据分析与可视化等读入密集型应用的I/O效率.测试表明,双路并行I/o方法可提高写性能8％～13％,提高读性能89％到1.01倍. 相似文献

9.

集群环境下I/O文件传输调度器的设计与实现

刘冰段富《电脑开发与应用》2012,25(6):33-35

近年来研究人员对高性能计算中的并行I/O问题进行了深入研究,然而这些研究主要针对MPP问题,而对集群计算机系统中并行I/O问题的研究不多。因此,对于集群计算中的并行I/O系统进行研究是一个重要的研究课题。对集群计算中的并行I/O传输调度效率进行研究,设计了一个文件传输调度器,可以实现文件传输最快捷,节点资源最大利用,显著提高I/O节点吞吐率和反应时间。经过大量数据的测试和实验证明该调度器的有效性和适用性。相似文献

10.

基于HD F5实现多区结构网格CFD程序的并行I/O

杨丽鹏车永刚《计算机研究与发展》2015,(4):861-868

计算流体动力学（computational fluid dynamics ,CFD）是高性能计算重要应用领域之一,其计算涉及大量数据访问．在大规模并行计算情况下,串行I/O的性能与计算能力不匹配,I/O成为性能瓶颈．并行I/O 是解决这一问题的主要途径之一．针对一个真实多区结构网格CFD 并行程序 HOSTA （high‐order simulator for aerodynamics）,基于HDF5（hierarchical data format v5）数据存储格式及其并行I/O编程接口,实现了其主要数据的并行I/O．在一套有6个I/O服务器结点的高性能计算机系统上,采用实际C FD算例进行了性能测试．对一个三角翼算例,并行I/O相对于串行I/O的性能加速比达到21．27,最高获得5．81 GBps的I/O吞吐率,并使程序整体性能提高10％以上;对一个网格规模更大的简单翼型算例,并行I/O最高获得了6．72 GBps的I/O吞吐率．相似文献

11.

Parallel programming interface for distributed data

Manhui Wang Andrew J. May Peter J. Knowles 《Computer Physics Communications》2009,180(12):2673-2679

The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.Program summaryProgram title: PPIDDCatalogue identifier: AEEF_v1_0Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEF_1_0.htmlProgram obtainable from: CPC Program Library, Queen's University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 17 698No. of bytes in distributed program, including test data, etc.: 166 173Distribution format: tar.gzProgramming language: Fortran, CComputer: Many parallel systemsOperating system: VariousHas the code been vectorised or parallelized?: Yes. 2–256 processors usedRAM: 50 MbytesClassification: 6.5External routines: Global Arrays or MPI-2Nature of problem: Many scientific applications require management and communication of data that is global, and the standard MPI-2 protocol provides only low-level methods for the required one-sided remote memory access.Solution method: The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.Running time: Problem dependent. The test provided with the distribution takes only a few seconds to run. 相似文献

12.

An O(1) time algorithm for the 3D Euclidean distance transform on the CRCW PRAM model

Yuh-Rau Wang Shi-Jinn Horng 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(10):973-982

We develop a parallel algorithm for the 2D Euclidean distance transform (2D/spl I.bar/EDT, for short) of a binary image of size N /spl times/ N in O(1) time using N/sup 2+/spl delta/+/spl epsi// CRCW processors and a parallel algorithm for the 3D Euclidean distance transform (3D/spl I.bar/EDT, for short) of a binary image of size N /spl times/ N /spl times/ N in O(1) time using N/sup 3+/spl delta/+/spl epsi// CRCW processors, where /spl delta/=1/, /spl epsi/=1/(2/sup c+1/-1), h, and are constants and positive integers. Our 2D/spl I.bar/EDT (3D/spl I.bar/EDT) parallel algorithm can be used to build up Voronoi diagram and Voronoi polygons (polyhedra) in a 2D (3D) binary image also. All of these parallel algorithms can be performed in O(1) time using N/sup 2+/spl delta/+/spl epsi// (N/sup 3+/spl delta/+/spl epsi//) CRCW processors. To the best of our knowledge, all results derived above are the best O(1) time algorithms known. 相似文献

13.

An adaptive cache coherence protocol specification for parallel input/output systems

Garcia-Carballeira F. Carretero J. Calderon A. Perez J.M. Garcia J.D. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(6):533-545

Caching has been intensively used in memory and traditional file systems to improve system performance. However, the use of caching in parallel file systems and I/O libraries has been limited to I/O nodes to avoid cache coherence problems. We specify an adaptive cache coherence protocol that is very suitable for parallel file systems and parallel I/O libraries. This model exploits the use of caching, both at processing and I/O nodes, providing performance improvement mechanisms such as aggressive prefetching and delayed-write techniques. The cache coherence problem is solved by using a dynamic scheme of cache coherence protocols with different sizes and shapes of granularity. The proposed model is very appropriate for parallel I/O interfaces, such as MPI-IO. Performance results, obtained on an IBM SP2, are presented to demonstrate the advantages offered by the cache management methods proposed. 相似文献

14.

地理栅格数据的并行访问方法研究 总被引：1，自引：0，他引：1

欧阳柳熊伟程果陈宏盛陈荤《计算机科学》2012,39(11):116-121

在海量地理栅格数据处理中,数据I/O性能是影响处理算法程序整体性能的关键。目前针对地理栅格数据 I/O优化问题的研究成果还很有限,通过对并行程序中的数据I/O模式进行深入分析,结合栅格数据逻辑模型和物理模型的特点,提出了面向地理栅格数据的并行I/O框架;基于消息传递模型,实现了4种并行访问方法。实验证明,并行访问方法优于传统的串行访问方法和分时多进程访问方法。该研究成果可以提高并行栅格处理程序的I/()访问效率,进而提高其整体并行性能。相似文献

15.

Improved version of parallel programming interface for distributed data with multiple helper servers

Manhui Wang Andrew J. May Peter J. Knowles 《Computer Physics Communications》2011,182(7):1502-1506

We present an improved version of the Parallel Programming Interface for Distributed Data with Multiple Helper Servers (PPIDDv2) library, which provides a common application programming interface that is based on the most frequently used functionality of both MPI-2 and GA. Compared with the previous version, the PPIDDv2 library introduces multiple helper servers to facilitate global data structures, and allows programmers to make heavy use of large global data structures efficiently.

Program summary

Program title: PPIDDv2Catalogue identifier: AEEF_v2_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEF_v2_0.htmlProgram obtainable from: CPC Program Library, Queen?s University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 22 997No. of bytes in distributed program, including test data, etc.: 184 477Distribution format: tar.gzProgramming language: Fortran, CComputer: Many parallel systemsOperating system: VariousHas the code been vectorised or parallelised?: Yes. 2–1024 processors usedRAM: 50 MbytesClassification: 6.5External routines: Global Arrays or MPI-2Catalogue identifier of previous version: AEEF_v1_0Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2673Does the new version supersede the previous version?: YesNature of problem: Many scientific applications require management and communication of data that is global, and the standard MPI-2 protocol provides only low-level methods for the required one-sided remote memory access.Solution method: The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.Reasons for new version: In the previous version, functionality in global data structure was mainly implemented by MPI-2 passive one-sided operations. In real applications which make heavy use of global data structures, very poor performance was observed.Summary of revisions: Multiple helper servers are introduced to facilitate the manipulation and management of global data structure. Mutual exclusion is also implemented by the help of a data server, and becomes much more robust and efficient. In addition, flexible options are provided to choose different settings for helper servers. Significant improvement has been seen in performance tests.Running time: Problem-dependent. The test provided with the distribution takes only a few seconds to run. 相似文献

16.

Object-oriented analysis and design of the Message Passing Interface

Anthony Skjellum Diane G. Wooley Ziyang Lu Michael Wolf Purushotham V. Bangalore Andrew Lumsdaine Jeffrey M. Squyres Brian McCandless 《Concurrency and Computation》2001,13(4):245-292

The major contribution of this paper is the application of modern analysis techniques to the important Message Passing Interface standard, work done in order to obtain information useful in designing both application programmer interfaces for object-oriented languages, and message passing systems. Recognition of ‘Design Patterns’ within MPI is an important discernment of this work. A further contribution is a comparative discussion of the design and evolution of three actual object-oriented designs for the Message Passing Interface ( MPI-1SF ) application programmer interface (API), two of which have influenced the standardization of C++ explicit parallel programming with MPI-2, and which strongly indicate the value of a priori object-oriented design and analysis of such APIs. Knowledge of design patterns is assumed herein. Discussion provided here includes systems developed at Mississippi State University (MPI++), the University of Notre Dame (OOMPI), and the merger of these systems that results in a standard binding within the MPI-2 standard. Commentary concerning additional opportunities for further object-oriented analysis and design of message passing systems and APIs, such as MPI-2 and MPI/RT, are mentioned in conclusion. Connection of modern software design and engineering principles to high performance computing programming approaches is a new and important further contribution of this work. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献

17.

MPI程序的Petri网模型及其验证 总被引：1，自引：0，他引：1

眭聃王力生叶青《计算机应用与软件》2007,24(10):205-206,209

利用PVM程序中抽取Petri网的方法实现了MPI程序的部分功能语句的Petri网抽象,并分别针对MPI-1和MPI-2在通信方式上的新特性给出对应的Petri网模型抽象方法,使利用Pertri网模型对MPI程序正确性进行理论验证成为可能. 相似文献

18.

Performance Evaluation of a Parallel Pipeline Computational Model for Space-Time Adaptive Processing

Wei-Keng Liao Alok Choudhary Donald Weiner Pramod Varshney 《The Journal of supercomputing》2005,31(2):137-160

This paper presents further results on the design and implementation of various optimizations based on our earlier work of developing a parallel pipelined model for the computational intensive applications that have multiple processing tasks. Performance evaluation of this model was done by using a real-time airborne radar application that employs a Space-Time Adaptive Processing (STAP) algorithm. This paper focuses on the following four issues: (1) The tradeoffs between increasing the throughput and reducing the latency are examined in more detail when allocating processors among different processing tasks. (2) A multi-threaded design is incorporated into the pipeline model and implemented on a massively parallel computer with symmetric multi-processor nodes, which shows enhanced performance. (3) The disk I/O is incorporated into the parallel pipeline to study its effect on performance in which two I/O task designs have been implemented: embedding I/O in the pipeline or having a separate I/O task. By using a double buffering approach together with the asynchronous I/O, the overall pipeline performance scales well as the number of processors increases. (4) From the comparison of the two I/O implementations, it is discovered that the latency may be improved when merging multiple tasks into a single task. The effect of reorganizing the task structure of the pipeline is discussed in detail. All the performance results shown in this work demonstrate the linear scalability the parallel pipeline model can achieve using a production radar application. Although this paper focuses on the implementation of the parallel pipeline model and uses the results from a STAP application to support the claims of the discovered properties for this pipeline, this model is also applicable to many other types of applications with similar computational characteristics. 相似文献

19.

基于RDMA操作的MPI-2单边通信的设计与实现

江海昇范辉《计算机应用》2006,26(3):550-0552

MPI-2单边通信存在很高的通信开销以及对通信进程中远程进程的依赖。为此提出了在InfiniBand体系结构上的高性能MPI-2单边通信设计方法。其中，MPI-2单边通信操作，比如MPI_Put, MPI_Get以及MPI_Accumulate将对应于InfiniBand远程直接内存访问(Remote Direct Memory Access, RDMA)操作。设计是基于MPICH2的在InfiniBand上的应用，可以很好地实现通信和计算的重叠处理。相似文献