1.

周涤宇刘杰《计算机工程与科学》2008,30(4):62-65

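A scalable parallel algorithm for solving the particle transport equation on unstructured grids is a pressing open problem. Building on the parallel pipelined sweep algorithm of reference [1], this paper proposes an improved algorithm. The improved algorithm effectively reduces the original algorithm's dependence on the communication latency of the parallel machine and cuts the program's communication time, thereby shortening parallel computation time and improving parallel performance. Numerical experiments on two-dimensional particle transport problems show that the speedup grows linearly when scaling from 64 to 256 processors, and that the improved algorithm reduces parallel computation time by up to 19% compared with the original.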

2.

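Synchronous and Asynchronous PSO Algorithms in a Parallel Environment

职为梅王芳范明杨勇《计算机技术与发展》2009,19(3)

Parallel computing can effectively reduce the time needed to solve large-scale problems. After introducing the particle swarm optimization (PSO) algorithm, this paper analyzes the synchronous and asynchronous models of PSO and presents synchronous and asynchronous PSO algorithms for a parallel environment. The parallel algorithms were tested on a Lenovo DeepComp 1800 (联想深腾1800) large-scale computer. Experiments show that PSO is highly parallelizable and that the parallel algorithms markedly speed up the solution.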
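
The difference between the synchronous and asynchronous PSO models can be illustrated with a small sequential emulation. This is only a sketch under stated assumptions, not the paper's algorithm: the objective `sphere`, the coefficients `w`, `c1`, `c2`, and the search range are illustrative choices. In synchronous mode every particle in an iteration is guided by the same frozen global best (as if a barrier separated iterations); in asynchronous mode a particle sees the global best as soon as any other particle improves it.

```python
import random

def sphere(x):
    """Illustrative objective (an assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(objective, dim=2, n_particles=10, iters=50, synchronous=True, seed=0):
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    initial_best = gbest_val

    for _ in range(iters):
        frozen = gbest[:]  # global best as of the start of this iteration
        for i in range(n_particles):
            # Synchronous: all particles use the frozen copy.
            # Asynchronous: each particle uses the latest global best.
            guide = frozen if synchronous else gbest
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (guide[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val < gbest_val:
                gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, initial_best
```

In a real parallel run the inner loop over particles would be distributed across processors; the synchronous variant then needs a barrier per iteration, while the asynchronous variant trades that barrier for possibly stale information.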

3.

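I/O Characteristics of Parallel File Systems

赵铁柱《网络安全技术与应用》2013,(8):61-63

The parallel file system is the storage subsystem of a parallel computing system, and I/O performance is an important aspect of parallel computing systems research. This paper analyzes the difficulties of parallel file system I/O research, examines the I/O characteristics and key technologies of parallel file systems, and points out future directions for research on parallel file system I/O performance, providing an important reference for the design and optimization of parallel computing systems.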

4.

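A Parallel SQP Method for Multibody Dynamics Optimization

刘小蒙魏继卿鲁玉祥《计算机仿真》2015,32(5)

Advances in parallel computing have greatly improved computational efficiency and reduced computation time. For optimization problems in multibody dynamics, the parallelism of three methods for computing sensitivities is analyzed, and parallel algorithms are built for the finite difference method and the direct differentiation method. Combined with a parallel Armijo line search, these form a complete parallel sequential quadratic programming (SQP) algorithm. The algorithm is applied to the optimization of a slider-crank mechanism and compared with serial SQP, confirming that parallel SQP can greatly reduce computation time. This work offers a parallel solution approach for multibody dynamics optimization.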

5.

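A Parallel AES Encryption Algorithm Based on MapReduce

付雅丹杨庚胡持闵兆娥《计算机应用》2015,35(11):3079-3082

To address privacy protection in cloud computing environments, storing data in encrypted form is a feasible option. To speed up data encryption and decryption, a parallel AES encryption scheme is designed that combines the parallel computing characteristics of the cloud environment with the AES algorithm. The concrete parallel algorithm is given, its performance is analyzed, and its effectiveness is demonstrated experimentally. The results show that, under the MapReduce model on a 16-core, 4-node cloud computing cluster, the parallel algorithm achieves a speedup of 15.9 and reduces total encryption time by 72.7%.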

6.

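A Parallel Algorithm Based on Time Decomposition for Time-Dependent Problems

李永刚欧阳洁肖曼玉《数值计算与计算机应用》2007,28(1):27-37

Based on the Parareal model established by Lions et al., an improved time-decomposition parallel algorithm is proposed and its convergence is proved. A general MPI algorithm flow is constructed using the master-slave pattern, and the optimal ratio of coarse to fine grid step sizes is derived by analyzing the algorithm's parallel speedup. Parallel computations of the heat conduction equation and the convection-diffusion equation on a cluster system show that the algorithm adapts and scales well for both linear and nonlinear problems. Numerical simulations show that the time-decomposition parallel algorithm attains high accuracy in very few iterations, with good parallel speedup and parallel efficiency.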
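
For readers unfamiliar with Parareal, the basic (unimproved) iteration can be sketched on a scalar test problem. Everything below is an assumption made for illustration: the test equation y' = lam * y, explicit Euler as both coarse and fine propagator, and the names `euler`, `time_grid`, `serial_fine`, and `parareal`. It does not reproduce the paper's improved algorithm or its MPI master-slave flow; the fine solves that would run in parallel are simply executed in a loop here.

```python
def euler(y, t0, t1, steps, lam):
    """Explicit Euler for y' = lam * y from t0 to t1 in `steps` steps."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y = y + h * lam * y
    return y

def time_grid(T, n_slices):
    return [T * n / n_slices for n in range(n_slices + 1)]

def serial_fine(y0, T, n_slices, lam=-1.0, fine_steps=100):
    """Reference: fine propagator applied slice after slice, serially."""
    ts = time_grid(T, n_slices)
    u = [y0]
    for n in range(n_slices):
        u.append(euler(u[n], ts[n], ts[n + 1], fine_steps, lam))
    return u

def parareal(y0, T, n_slices, n_iters, lam=-1.0, fine_steps=100):
    ts = time_grid(T, n_slices)

    def coarse(y, n):  # one Euler step per time slice
        return euler(y, ts[n], ts[n + 1], 1, lam)

    def fine(y, n):    # fine_steps Euler steps per time slice
        return euler(y, ts[n], ts[n + 1], fine_steps, lam)

    # Initial guess: one cheap serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], n))

    for _ in range(n_iters):
        # Independent per slice: this is the work a parallel code distributes.
        f_old = [fine(u[n], n) for n in range(n_slices)]
        g_old = [coarse(u[n], n) for n in range(n_slices)]
        # Serial correction sweep: U[n+1] = F(u[n]) + G(U[n]) - G(u[n]).
        new = [y0]
        for n in range(n_slices):
            new.append(f_old[n] + (coarse(new[n], n) - g_old[n]))
        u = new
    return u
```

Here `fine_steps` plays the role of the coarse-to-fine step ratio mentioned in the abstract. After as many iterations as there are slices, the iterate coincides with the serial fine solution, so a practical run is only worthwhile when far fewer iterations suffice.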

7.

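A Selection Algorithm and Its Parallelization

武继刚《计算机工程与设计》1996,17(5):60-64,F003

Using the idea of merge selection together with the optimal algorithm on heaps, this paper gives a new algorithm for the selection problem and a corresponding parallelization. The complexity of the serial merge-selection algorithm is reduced from n log k + O(n) to (n log k)/2 + (n log log k)/2 + O(n), while the structure of the original parallel algorithm is preserved. On the parallel computing model of a SIMD tree machine, the parallel run ...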
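
As background for the n log k term, a textbook heap-based selection is sketched below. It is not the paper's merge-selection algorithm; it is a common baseline that finds the k-th smallest element in O(n log k) comparisons by keeping the k smallest values seen so far in a max-heap. The function name `kth_smallest` is hypothetical.

```python
import heapq

def kth_smallest(values, k):
    """Return the k-th smallest element (1-based) using a heap of size k."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # heapq is a min-heap, so store negated values to emulate a max-heap.
    heap = [-v for v in values[:k]]
    heapq.heapify(heap)
    for v in values[k:]:
        if v < -heap[0]:                 # smaller than the current k-th smallest
            heapq.heapreplace(heap, -v)  # O(log k) per replacement
    return -heap[0]
```

For example, `kth_smallest([7, 2, 9, 4, 1, 8], 3)` returns 4.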

8.

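Parallel Performance Analysis and Application of Krylov Subspace Methods for Solving Linear Systems

刘青昆舒继武《计算机工程与应用》1999,35(6):33-36

For many parallel computing problems, the parallel performance and scalability of an algorithm must be analyzed in light of the specific architecture of the parallel machine. This analysis determines whether the algorithm is effective for the problem at hand and, further, whether the parallel computing system in use meets the requirements of the problem. This article analyzes the performance of the parallel algorithms formed on one class of parallel systems by two effective Krylov subspace algorithms, PCG and GMRES(m), and gives the relationship among problem size, number of processors, and speedup. The results show that GMRES(m) is better suited to parallelization than PCG.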

9.

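A Parallel Scheduling Algorithm for Solving the Transport Equation

周涤宇刘杰《计算机学报》2010,33(5)

The efficient parallel sweep problem is a subset of the scheduling problem, which is NP-complete. Given the characteristics of transport problems, scheduling local mesh cells in a particular computation order so as to guarantee the best computation and communication performance is very difficult. This paper designs a local depth-first priority algorithm (PDFDS), which features locality, low communication volume, and well-formed priority queues. Applied to a program that solves the two-dimensional particle transport equation, PDFDS gives better parallel performance than existing scheduling algorithms; for large-scale problems it scales to 1024 processors with a parallel efficiency of 96% relative to 64 processors.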
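
The efficiency figure relative to 64 processors can be unpacked with a short calculation. This assumes the usual definition of parallel efficiency relative to a baseline processor count, which the abstract does not spell out; T_p denotes the wall-clock time on p processors.

```latex
E_{64 \to 1024} = \frac{64\, T_{64}}{1024\, T_{1024}} = 0.96
\quad\Longrightarrow\quad
\frac{T_{64}}{T_{1024}} = 0.96 \times \frac{1024}{64} = 15.36
```

Under that assumption, using 16 times as many processors shortens the run by a factor of about 15.4 compared with the 64-processor run.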

10.

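An I/O-Bounded Parallel Speedup Model and a Scalable I/O Architecture

李琼杜云飞杨学军《计算机工程与科学》2011,33(3):28

To alleviate the I/O bottleneck, research can proceed on six fronts: applications, scalable algorithms, compilers and languages, runtime libraries, operating systems, and architecture. Among these, I/O architecture is the key support for all the other approaches. Current parallel I/O performance analysis lacks a sound theoretical model to guide I/O architecture design. Addressing the scalability of parallel computer systems, this paper studies how I/O load affects scalability, establishes an I/O-bounded parallel speedup performance model, and analyzes the scalability of three I/O architectures commonly used in today's large-scale parallel computers. On this theoretical basis, a scalable parallel I/O system architecture for high-performance computing is proposed. Several strategies that effectively reduce I/O service time are also proposed, enhancing system scalability and laying a foundation for follow-up research.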

11.

Parallelizing the Data Cube

Frank Dehne Todd Eavis Susanne Hambrusch Andrew Rau-Chaplin 《Distributed and Parallel Databases》2002,11(2):181-201

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight processor cluster, using a variety of different data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p.
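
The idea of assigning a few coarse, weighted tasks to p processors so that loads are balanced can be sketched with a generic greedy heuristic. This is a stand-in, not the paper's tree-partitioning method: it uses the well-known longest-processing-time-first rule, and the task names and weights in the usage note are made-up stand-ins for estimated group-by costs.

```python
import heapq

def lpt_partition(weights, p):
    """Assign tasks to p processors: heaviest task first, always to the least-loaded processor."""
    loads = [(0, i) for i in range(p)]  # (current load, processor id)
    heapq.heapify(loads)
    assignment = {i: [] for i in range(p)}
    # sorted() is stable, so equal-weight tasks keep their original order.
    for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(loads)
        assignment[i].append(name)
        heapq.heappush(loads, (load + w, i))
    totals = {i: sum(weights[n] for n in names) for i, names in assignment.items()}
    return assignment, totals
```

With the hypothetical weights `{"ABC": 8, "AB": 5, "AC": 4, "BC": 3, "A": 2, "B": 2}` and `p = 2`, the two processors end up with total loads of 13 and 11.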

12.

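Adaptive Parallel Monte Carlo Computation for Unsteady Particle Transport

邓力袁国兴黄正丰许海燕王瑞宏李树《数值计算与计算机应用》2003,24(2):111-115

§1. Introduction. For solving the Boltzmann equation, Monte Carlo simulation (hereafter MC) with continuous cross sections and exact angular distributions can yield ideal results. However, the long computation time of MC is its greatest shortcoming relative to other methods; parallel computing with high speedup is a feasible way to overcome it.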

13.

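A Parallel Sn Algorithm for Particle Transport on Unstructured Grids

迟利华刘杰田平《计算机工程与科学》2010,32(10):85-89

Based on mesh domain decomposition, this paper proposes a new parallel Sn algorithm for particle transport on unstructured grids that computes multiple angular directions and multiple energy groups simultaneously. During the computation no priority calculation or priority queue maintenance is needed; the parallel computation is simply organized in the order of the computation queue. Taking into account the data dependencies of all directions and all mesh points, and combining them with B-level priorities, a priority calculation method is proposed that computes tasks needing to send data first and delays tasks needing to receive data, reducing processor wait time and overlapping computation with communication. Numerical experiments on two-dimensional particle transport problems using this Sn parallel algorithm and priority queue show good parallel speedup: when scaled to 1024 processors, parallel efficiency relative to 64 processors reaches 52%.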
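
The B-level priority mentioned above is commonly defined as the length of the longest path from a task to an exit task in the dependency graph, counting the task's own cost. A minimal sketch under that textbook definition follows; it ignores communication costs on edges, and the task names, costs, and the function `b_levels` are hypothetical. It is not the paper's combined priority method.

```python
from functools import lru_cache

def b_levels(cost, succ):
    """cost: task -> compute cost; succ: task -> list of successor tasks (a DAG)."""
    @lru_cache(maxsize=None)
    def level(task):
        # Own cost plus the longest remaining path through any successor.
        return cost[task] + max((level(s) for s in succ.get(task, ())), default=0)
    return {task: level(task) for task in cost}

# Hypothetical diamond-shaped dependency graph: a -> b, a -> c, b -> d, c -> d.
cost = {"a": 1, "b": 3, "c": 1, "d": 2}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
levels = b_levels(cost, succ)
# Tasks with a higher B-level lie on longer dependency chains and are scheduled first.
order = sorted(cost, key=lambda t: -levels[t])
```

In this example the B-levels are a = 6, b = 5, c = 3, d = 2, so the schedule order is a, b, c, d.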

14.

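A Parallel Adjacency Matrix Algorithm Without Memory Conflicts

李朝鹏成运《数字社区&智能家居》2009,(25)

Adjacency matrix algorithms have extremely important applications in scientific computing and information processing and are part of the foundations of graph theory. Existing adjacency matrix algorithms are mostly serial, or based on parallel SIMD models that cannot resolve memory conflicts. To address this, a parallel adjacency matrix algorithm based on the SIMD-EREW shared-memory model is proposed. Using O(p) parallel processing units, the algorithm computes the adjacency matrix of n data points in O(n^2/p) time. A performance comparison with existing algorithms shows that the proposed algorithm clearly improves on results in the literature and is a parallel adjacency matrix algorithm free of memory conflicts.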
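
The O(n^2/p) bound corresponds to splitting the n^2 matrix entries evenly over p workers. The sketch below shows one simple way to do that, by giving each worker a contiguous block of rows so that no two workers write the same entry. Several things are assumptions: the abstract does not say how adjacency of data points is defined, so a distance threshold `eps` is used; the function names are hypothetical; and Python threads only illustrate the partitioning, not an SIMD-EREW machine (exclusive reads are not modeled).

```python
import math
from concurrent.futures import ThreadPoolExecutor

def adjacency_rows(points, eps, rows):
    """Compute the adjacency rows owned by one worker."""
    block = []
    for i in rows:
        xi, yi = points[i]
        block.append([
            1 if i != j and math.hypot(xi - xj, yi - yj) <= eps else 0
            for j, (xj, yj) in enumerate(points)
        ])
    return block

def parallel_adjacency(points, eps, p):
    n = len(points)
    size = max(1, -(-n // p))  # rows per worker, rounded up
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        # Each worker writes only its own rows, so writes never collide.
        parts = list(pool.map(lambda rows: adjacency_rows(points, eps, rows), blocks))
    return [row for part in parts for row in part]
```

For the points `[(0, 0), (1, 0), (5, 5)]` with `eps = 1.5` and `p = 2`, only the first two points are adjacent to each other.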

15.

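ParGIP: A Parallel Large-Scale Geographic Image Processing System on a Software DSM Cluster

史岗张福新胡伟武韩承德《计算机研究与发展》2003,40(1):53-59

ParGIP (parallel geographical image processing) is a prototype system for large-scale parallel geographic image processing on a PC cluster based on shared virtual memory (SVM). It adopts a client-server computing model and uses a software distributed shared memory (software DSM) middle layer to organize the PC cluster into a parallel computing platform with logically shared memory. Geographic image processing can exploit the large shared memory and parallel processing capability that ParGIP provides to improve performance and shorten processing cycles, addressing the memory shortage and insufficient computing power of traditional single-machine serial processing. ParGIP further organizes the disks distributed across the cluster nodes to provide the massive storage space and parallel I/O capability required by a geographic image library. Tests show that ParGIP's parallel I/O bandwidth on 8 machines reaches 102.6 MB/s, and typical image processing algorithms achieve near-linear speedup.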

16.

A message combining approach for efficient array redistribution in non-all-to-all communication networks

《国际计算机数学杂志》2012,89(11):1609-1619

The array redistribution problem is the heart of a number of applications in parallel computing. This paper presents a message combining approach for scheduling runtime array redistribution of one-dimensional arrays. The important contribution of the proposed scheme is that it eliminates the need for local data reorganization, as noted by Sundar in 2001; the blocks destined for each processor are combined in a series of messages exchanged between neighbouring nodes, so that the receiving processors do not need to reorganize the incoming data blocks before storing them to memory locations. Local data reorganization is of great importance, especially in networks where there is no direct communication between all nodes (like tori, meshes, and trees). Thus, a block must travel through a number of relays before reaching the target processor. This requires a higher number of messages generated, therefore, a higher number of data permutations within the memory of each target processor should be made to assure correct data order. The strategy is based on a relation between groups of communicating processor pairs called superclasses.

17.

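A Parallel Adjacency Matrix Algorithm Without Memory Conflicts

李朝鹏成运《数字社区&智能家居》2009,5(9):7201-7202

Adjacency matrix algorithms have extremely important applications in scientific computing and information processing and are part of the foundations of graph theory. Existing adjacency matrix algorithms are mostly serial, or based on parallel SIMD models that cannot resolve memory conflicts. To address this, a parallel adjacency matrix algorithm based on the SIMD-EREW shared-memory model is proposed. Using O(p) parallel processing units, the algorithm computes the adjacency matrix of n data points in O(n^2/p) time. A performance comparison with existing algorithms shows that the proposed algorithm clearly improves on results in the literature and is a parallel adjacency matrix algorithm free of memory conflicts.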

18.

A new parallel algorithm for parsing arithmetic infix expressions

Y. N. Srikant Priti Shankar 《Parallel Computing》1987,4(3):291-304

A new parallel algorithm for transforming an arithmetic infix expression into a parse tree is presented. The technique is based on a result due to Fischer (1980) which enables the construction of the parse tree, by appropriately scanning the vector of precedence values associated with the elements of the expression. The algorithm presented here is suitable for execution on a shared memory model of an SIMD machine with no read/write conflicts permitted. It uses O(n) processors and has a time complexity of O(log² n) where n is the expression length. Parallel algorithms for generating code for an SIMD machine are also presented.
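
The general idea of recovering a parse tree from a vector of precedence values can be illustrated sequentially. The sketch below is neither the paper's parallel algorithm nor Fischer's construction: it is a simple recursive version that assumes left-associative binary operators, raises the precedence of operators nested inside parentheses, and picks the rightmost lowest-precedence operator of each range as the subtree root. `PREC`, `precedence_vector`, and `build_tree` are hypothetical names.

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
LEVELS = 2  # number of distinct precedence levels in PREC

def precedence_vector(tokens):
    """Drop parentheses; each nesting level raises operator precedence by LEVELS."""
    items, prec, depth = [], [], 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif tok in PREC:
            items.append(tok)
            prec.append(PREC[tok] + LEVELS * depth)
        else:
            items.append(tok)
            prec.append(None)  # operands carry no precedence
    return items, prec

def build_tree(items, prec, lo=0, hi=None):
    """Return a nested (operator, left, right) tuple for items[lo:hi]."""
    if hi is None:
        hi = len(items)
    ops = [i for i in range(lo, hi) if prec[i] is not None]
    if not ops:
        return items[lo]  # a single operand
    lowest = min(prec[i] for i in ops)
    root = max(i for i in ops if prec[i] == lowest)  # rightmost: left associativity
    return (items[root],
            build_tree(items, prec, lo, root),
            build_tree(items, prec, root + 1, hi))
```

For the token list `"a + b * ( c - d )".split()`, the precedence vector puts `+` lowest, so the result is `("+", "a", ("*", "b", ("-", "c", "d")))`.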

19.

An optimal parallel algorithm for planar cycle separators

Ming-Yang Kao Shang-Hua Teng K. Toyama 《Algorithmica》1995,14(5):398-408

We present an optimal parallel algorithm for computing a cycle separator of an n-vertex embedded planar undirected graph in O(log n) time on n/log n processors. As a consequence, we also obtain an improved parallel algorithm for constructing a depth-first search tree rooted at any given vertex in a connected planar undirected graph in O(log² n) time on n/log n processors. The best previous algorithms for computing depth-first search trees and cycle separators achieved the same time complexities, but with n processors. Our algorithms run on a parallel random access machine that permits concurrent reads and concurrent writes in its shared memory and allows an arbitrary processor to succeed in case of a write conflict. A preliminary version of this paper appeared as Improved Parallel Depth-First Search in Undirected Planar Graphs in the Proceedings of the Third Workshop on Algorithms and Data Structures, 1993, pp. 407–420. Supported in part by NSF Grant CCR-9101385.