共查询到20条相似文献,搜索用时 9 毫秒
1.
In this article, we present a multigrid algorithm for parallel computers, the chopped parallel multigrid (CPMG) algorithm. The CPMG algorithm improves the processor utilization by reducing the work load on coarse grids without affecting the convergence rate of the algorithm. This is in contrast to earlier approaches (Gannon and van Rosendale, 1986; Frederickson and McBryan, 1989), where unutilized processors are used to improve the convergence rate. The CPMG algorithm reduces the coarse grid work bychopping the alternate cycles of multigrid. Using analytical results and simulations on sequential machines we show that the CPMG can achieve almost the same convergence rate as standard MG for many cases. Analytically we show that the advantage gained by CPMG over standard MG on a mesh connected massively parallel machine is 33% in hardware utilization, 50% in communication overheads and 38% in overall execution time. We have also evaluated the performance of CPMG on an actual massively parallel machine, the DAP-510. The advantage gained by CPMG over standard MG is 35% in overall execution time. Moreover, the CPMG can be integrated with other parallel multigrid algorithms, such as the PSMG algorithm (Frederickson and McBryan, 1989) and Decker's algorithm (Decker, 1990). 相似文献
2.
3.
A novel parallel delayed least-mean-square (PDLMS) algorithm is proposed by introduc-ing the parallel processing method into delayed LMS (DLMS) algorithm. Compared with DLMS, the al-gorithm presented has the property of smaller delay, higher throughput rate and faster convergence speed, while it also exhibits some de-correlation effect for the correlated input sequence. These prop-erties make it more suitable for the cases of high order filter with high convergence speed. At the same time, it can be mapped onto the high-speed and/or high-pipelined hardware structure directly. 相似文献
4.
5.
6.
《国际计算机数学杂志》2012,89(3-4):435-440
This paper presents a parallel algorithm for solving the implicit diffusion difference equations. The basic idea is based on vectorization of the tridiagonal Toeplitz difference equations. This method is superior to the algorithm showed by H. Stone [8]. We computed some examples on an NEC SX-3/44R supercomputer by our method. The results showed a good parallelism with this algorithm. 相似文献
7.
《国际计算机数学杂志》2012,89(3-4):249-270
We study the parallel implementation of two diagonalization methods for solving dense linear systems: the well known Gauss-Jordan method and a new one introduced by Huard. The number of arithmetic operations performed by the Huard method is the same as for Gaussian elimination, namely 2n 3/3, less than for the Jordan method, namely n 3. We introduce parallel versions of these methods, compare their performances and study their complexity. We assume a shared memory computer with a number of processors p of the order of n, the size of the problem to be solved, We show that the best parallel version for Jordan's method is by rows whereas the best one for Huard's method is by columns. Our main result states that for a small number of processors the parallel Huard method is faster than the parallel Jordan method and slower otherwise. The separation is obtained for p = 0.44n. 相似文献
8.
尚月强 《计算机工程与应用》2007,43(19):61-63
针对基于PVM的由桌面PC机联网而成的网络并行计算环境中,处理机的运算速度较快而处理机间的通信相对较慢的实际情况,给出了一种局域网求解三角形方程组的并行算法,该算法将三角形方程组的系数矩阵及右端项按行分块,然后将分块的系数矩阵及右端项按卷帘方式存储在各处理机,通过循环传送已求出的解的部分分量以减少处理机间的通信开销,实现较容易。并在1-4台桌面PC机联成的局域网,PVM 3.4 on Windows2000,VC 6.0并行计算平台上编程对该算法进行了数值试验,试验结果表明该算法是有效的。 相似文献
9.
基于平衡划分的并行投影算法 总被引:2,自引:2,他引:0
沈燕芬 《计算机工程与设计》2005,26(10):2762-2764
基于DL算法,提出并分析了平衡划分并行投影算法PROJECT-DL。在PROJECT-DL算法中,数据被平均划分并分配给所有处理机,因而每个处理机具有相同的工作负载。给出了网络并行计算环境下的实验结果,并与PROJECT-S、PROJECT-NS算法进行了对比。理论分析和实验结果表明,PROJECT-DL算法是一种高并行效率、高扩展性的并行投影算法。 相似文献
10.
Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as, the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and/or networks. In other words, we provided an achievable parallel run-time that can not be beaten by any algorithm (known or unknown) for solving this problem. In addition, any algorithm that claims to be optimal must attain this run-time. In order to obtain results that are general and useful throughout a span of machines, we base our results on the well-known LogP model. Furthermore, three important criteria must be considered in order to determine the running time of a parallel algorithm; namely, (i) local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We provide optimality results by first proving general lower bounds on parallel run-time. These lower bounds lead to significant insights on (i)–(iii) above. In particular, we present what types of data layouts and communication schedules are needed in order to obtain optimal run-times. We prove that no one data layout can achieve optimal running times for all cases. Instead, optimal layouts depend on the dimensions of each matrix, and on the number of processors. Lastly, optimal algorithms are provided. 相似文献
11.
颜启华 《计算机工程与设计》2008,29(10):2570-2572
提出了一种新的并行并操作算法PUDL,充分利用DL子算法能精确定位多个划分点的特性,使得划分后各个处理机要处理的子关系大小相等.因而算法具有较高的负载平衡性、可扩展性.最后给出了基于PC集群的实验结果,并把该结果与UNION-S、UNION-NS算法作了比较. 相似文献
12.
张哲 《计算机工程与应用》2010,46(29):47-49
在对称多处理机系统上,提出了一种求解稀疏对称有限元线性系统的正规化精确并行逆算法。该算法以一种避免数据依赖的反对角运动方法为基础,使用OpenMP编译指导来实现。诸如加速比和效率等数值实验结果的推出,说明在一个对称多处理机系统上,所提出的算法求解方法能更好地提高性能,获得更大的加速。 相似文献
13.
Thepaging problem is that of deciding which pages to keep in a memory ofk pages in order to minimize the number of page faults. We develop thepartitioning algorithm, a randomized on-line algorithm for the paging problem. We prove that its expected cost on any sequence of requests is within a factor ofH
k of optimum. (H
k is thekth harmonic number, which is about ln(k).) No on-line algorithm can perform better by this measure. Our result improves by a factor of two the best previous algorithm.Partial support for D. D. Sleator was provided by DARPA, ARPA Order 4976, Amendment 20, monitored by the Air Force Avionics Laboratory under Contract F33615-87-C-1499, and by the National Science Foundation under Grant CCR-8658139. 相似文献
14.
若干并行计算模型上的N体问题求解算法 总被引:1,自引:0,他引:1
从在实际中广泛应用的N体问题入手,研究如何在几种实际的并行计算模型(PRAM、APRAM、BSP、LogP、NHBL)上设计具体的并行算法;给出了这些模型上的并行算法的设计模式,分析不同模型上算法的性能,比较各个模型上算法设计风格以及算法性能的差异,并对这些并行计算模型做一个综合的评价。 相似文献
15.
为能够在大规模地形实时渲染中提高渲染及数据压缩的速率,提出一种利用GPU并行优化的快速EZC-DCT地形压缩算法。采用二维快速DCT变换代替EZC-DCT算法中的DCT变换,在利用GPU对算法进行并行加速的基础之上,对算法的并行方案进行优化改进,更加有效地利用GPU强大的并行计算能力,分担CPU的负荷,快速完成相关计算。实验结果表明,该算法帧速率比原EZC-DCT方法提升约10个百分点,满足地形渲染的实时性要求。 相似文献
16.
强化学习是一种重要的机器学习方法,然而在实际应用中,收敛速度缓慢是其主要不足之一。为了提高强化学习的效率,提出了一种并行强化学习算法。多个同时学习,在各自学习一定周期后,利用D-S证据利用对学习结果进行融合,然后在融合结果的基础上,各进行下一周期的学习,从而实现提高整个系统学习效率的目的。实验结果表明了该方法的可行性和有效性。 相似文献
17.
Virginia Niculescu 《The Journal of supercomputing》2004,29(1):5-25
Data distributions have a serious impact on time complexity of parallel programs, developed based on domain decomposition. A new kind of distributions—set distributions, based on set-valued mappings, is introduced. These distributions assign a data object to more than one process. The set distributions can be used especially when the number of processes is greater than the data input size, but, sometimes using set distributions can lead to efficient general parallel algorithms. The work-load properties of these distributions and their impact on the number of communications are discussed. In order to illustrate the implications of data distributions in the construction of parallel programs, some examples are presented. Two parallel algorithms for computation of Lagrange interpolation polynomial are developed, starting from simple distributions and set distributions. 相似文献
18.
PAML是一款利用最大似然法进行系统发育分析的软件包,被广泛使用.然而,由于模型复杂、参数众多,PAML的计算过程非常耗时.对PAML中最重要的codeml程序进行了并行算法研究,通过算法分析和程序Profiling确定程序瓶颈.在此基础上,利用现代CPU的多核并行能力和SIMD并行机制优化程序瓶颈,从而提高了程序整体的运行速度.实际数据集和人工数据集上的实验表明并行算法有效提高了codeml的计算速度,加速比最高达7.94倍. 相似文献
19.
结合0-1整数规划的隐式枚举法对目标排序法进行分析.引入PSRS(并行正则采样排序)算法对目标排序法的核心运算进行并行化,并改进PSRS算法的数据收集策略以适应0-1整数规划的并行隐式枚举.最后给出了基于改进的PSRS的并行0-1整数规划的求解算法,并对算法的时间复杂度进行了分析. 相似文献