首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 9 毫秒
1.
In this article, we present a multigrid algorithm for parallel computers, the chopped parallel multigrid (CPMG) algorithm. The CPMG algorithm improves the processor utilization by reducing the work load on coarse grids without affecting the convergence rate of the algorithm. This is in contrast to earlier approaches (Gannon and van Rosendale, 1986; Frederickson and McBryan, 1989), where unutilized processors are used to improve the convergence rate. The CPMG algorithm reduces the coarse grid work bychopping the alternate cycles of multigrid. Using analytical results and simulations on sequential machines we show that the CPMG can achieve almost the same convergence rate as standard MG for many cases. Analytically we show that the advantage gained by CPMG over standard MG on a mesh connected massively parallel machine is 33% in hardware utilization, 50% in communication overheads and 38% in overall execution time. We have also evaluated the performance of CPMG on an actual massively parallel machine, the DAP-510. The advantage gained by CPMG over standard MG is 35% in overall execution time. Moreover, the CPMG can be integrated with other parallel multigrid algorithms, such as the PSMG algorithm (Frederickson and McBryan, 1989) and Decker's algorithm (Decker, 1990).  相似文献   

2.
并行算法研究方法学   总被引:17,自引:0,他引:17  
并行算法是计算机科学中重要的研究内容,已有几十年的发展历程.回顾一下其研究历程,既有高潮也有低谷,究其原因是,它没有形成自身的一套研究方法学.为此文中提出并行算法研究要建立起一套完整的"理论-设计-实现-应用"的学科体系,也就是所谓的并行算法研究的生态环境.只有这样才能够保持并行算法研究稳定、可持续发展,并使得并行算法的研究成果更加实用,从而更富有生命力.  相似文献   

3.
A novel parallel delayed least-mean-square (PDLMS) algorithm is proposed by introduc-ing the parallel processing method into delayed LMS (DLMS) algorithm. Compared with DLMS, the al-gorithm presented has the property of smaller delay, higher throughput rate and faster convergence speed, while it also exhibits some de-correlation effect for the correlated input sequence. These prop-erties make it more suitable for the cases of high order filter with high convergence speed. At the same time, it can be mapped onto the high-speed and/or high-pipelined hardware structure directly.  相似文献   

4.
针对显式反馈信息作出的推荐在准确率和数据稀疏性处理上还存在缺陷的问题,引入隐式反馈信息,设计和实现了一种引入隐式反馈的多维度推荐算法(i MCF)。该算法涵盖用户、项目和隐式反馈三个维度的信息。对于前两个维度的信息,通过云模型相似度建模;而隐式反馈维度的信息主要是结合概率矩阵分解模型进行处理。之后再把这三个维度得出的预测评分根据权值进行平衡,得出最终预测评分并作出推荐。实验数据表明,该算法在召回率和准确率上的表现相对于其他算法有了较为明显的提升,且适合大数据环境。  相似文献   

5.
基于数据预处理的并行分层聚类算法*   总被引:3,自引:0,他引:3  
分层聚类技术在图像处理、入侵检测和生物信息学等方面有着极为重要的应用,是数据挖掘领域的研究热点之一。针对目前基于SIMD模型的并行分层聚类算法处理海量数据时效果不理想的问题,提出一种基于数据预处理的自适应并行分层聚类算法,在O((λn)2/p)的时间内对n个输入数据点进行聚类。其中1≤p≤n/log n,0.1≤λ≤0.3。将提出的算法与现有文献结论进行的性能对比分析表明,本算法明显改进了现有文献的研究结果。  相似文献   

6.
《国际计算机数学杂志》2012,89(3-4):435-440
This paper presents a parallel algorithm for solving the implicit diffusion difference equations. The basic idea is based on vectorization of the tridiagonal Toeplitz difference equations. This method is superior to the algorithm showed by H. Stone [8]. We computed some examples on an NEC SX-3/44R supercomputer by our method. The results showed a good parallelism with this algorithm.  相似文献   

7.
《国际计算机数学杂志》2012,89(3-4):249-270
We study the parallel implementation of two diagonalization methods for solving dense linear systems: the well known Gauss-Jordan method and a new one introduced by Huard. The number of arithmetic operations performed by the Huard method is the same as for Gaussian elimination, namely 2n 3/3, less than for the Jordan method, namely n 3. We introduce parallel versions of these methods, compare their performances and study their complexity. We assume a shared memory computer with a number of processors p of the order of n, the size of the problem to be solved, We show that the best parallel version for Jordan's method is by rows whereas the best one for Huard's method is by columns. Our main result states that for a small number of processors the parallel Huard method is faster than the parallel Jordan method and slower otherwise. The separation is obtained for p = 0.44n.  相似文献   

8.
针对基于PVM的由桌面PC机联网而成的网络并行计算环境中,处理机的运算速度较快而处理机间的通信相对较慢的实际情况,给出了一种局域网求解三角形方程组的并行算法,该算法将三角形方程组的系数矩阵及右端项按行分块,然后将分块的系数矩阵及右端项按卷帘方式存储在各处理机,通过循环传送已求出的解的部分分量以减少处理机间的通信开销,实现较容易。并在1-4台桌面PC机联成的局域网,PVM 3.4 on Windows2000,VC 6.0并行计算平台上编程对该算法进行了数值试验,试验结果表明该算法是有效的。  相似文献   

9.
基于平衡划分的并行投影算法   总被引:2,自引:2,他引:0  
基于DL算法,提出并分析了平衡划分并行投影算法PROJECT-DL。在PROJECT-DL算法中,数据被平均划分并分配给所有处理机,因而每个处理机具有相同的工作负载。给出了网络并行计算环境下的实验结果,并与PROJECT-S、PROJECT-NS算法进行了对比。理论分析和实验结果表明,PROJECT-DL算法是一种高并行效率、高扩展性的并行投影算法。  相似文献   

10.
Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as, the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and/or networks. In other words, we provided an achievable parallel run-time that can not be beaten by any algorithm (known or unknown) for solving this problem. In addition, any algorithm that claims to be optimal must attain this run-time. In order to obtain results that are general and useful throughout a span of machines, we base our results on the well-known LogP model. Furthermore, three important criteria must be considered in order to determine the running time of a parallel algorithm; namely, (i) local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We provide optimality results by first proving general lower bounds on parallel run-time. These lower bounds lead to significant insights on (i)–(iii) above. In particular, we present what types of data layouts and communication schedules are needed in order to obtain optimal run-times. We prove that no one data layout can achieve optimal running times for all cases. Instead, optimal layouts depend on the dimensions of each matrix, and on the number of processors. Lastly, optimal algorithms are provided.  相似文献   

11.
提出了一种新的并行并操作算法PUDL,充分利用DL子算法能精确定位多个划分点的特性,使得划分后各个处理机要处理的子关系大小相等.因而算法具有较高的负载平衡性、可扩展性.最后给出了基于PC集群的实验结果,并把该结果与UNION-S、UNION-NS算法作了比较.  相似文献   

12.
在对称多处理机系统上,提出了一种求解稀疏对称有限元线性系统的正规化精确并行逆算法。该算法以一种避免数据依赖的反对角运动方法为基础,使用OpenMP编译指导来实现。诸如加速比和效率等数值实验结果的推出,说明在一个对称多处理机系统上,所提出的算法求解方法能更好地提高性能,获得更大的加速。  相似文献   

13.
Thepaging problem is that of deciding which pages to keep in a memory ofk pages in order to minimize the number of page faults. We develop thepartitioning algorithm, a randomized on-line algorithm for the paging problem. We prove that its expected cost on any sequence of requests is within a factor ofH k of optimum. (H k is thekth harmonic number, which is about ln(k).) No on-line algorithm can perform better by this measure. Our result improves by a factor of two the best previous algorithm.Partial support for D. D. Sleator was provided by DARPA, ARPA Order 4976, Amendment 20, monitored by the Air Force Avionics Laboratory under Contract F33615-87-C-1499, and by the National Science Foundation under Grant CCR-8658139.  相似文献   

14.
若干并行计算模型上的N体问题求解算法   总被引:1,自引:0,他引:1  
从在实际中广泛应用的N体问题入手,研究如何在几种实际的并行计算模型(PRAM、APRAM、BSP、LogP、NHBL)上设计具体的并行算法;给出了这些模型上的并行算法的设计模式,分析不同模型上算法的性能,比较各个模型上算法设计风格以及算法性能的差异,并对这些并行计算模型做一个综合的评价。  相似文献   

15.
为能够在大规模地形实时渲染中提高渲染及数据压缩的速率,提出一种利用GPU并行优化的快速EZC-DCT地形压缩算法。采用二维快速DCT变换代替EZC-DCT算法中的DCT变换,在利用GPU对算法进行并行加速的基础之上,对算法的并行方案进行优化改进,更加有效地利用GPU强大的并行计算能力,分担CPU的负荷,快速完成相关计算。实验结果表明,该算法帧速率比原EZC-DCT方法提升约10个百分点,满足地形渲染的实时性要求。  相似文献   

16.
并行强化学习算法及其应用研究   总被引:2,自引:0,他引:2       下载免费PDF全文
强化学习是一种重要的机器学习方法,然而在实际应用中,收敛速度缓慢是其主要不足之一。为了提高强化学习的效率,提出了一种并行强化学习算法。多个同时学习,在各自学习一定周期后,利用D-S证据利用对学习结果进行融合,然后在融合结果的基础上,各进行下一周期的学习,从而实现提高整个系统学习效率的目的。实验结果表明了该方法的可行性和有效性。  相似文献   

17.
Data distributions have a serious impact on time complexity of parallel programs, developed based on domain decomposition. A new kind of distributions—set distributions, based on set-valued mappings, is introduced. These distributions assign a data object to more than one process. The set distributions can be used especially when the number of processes is greater than the data input size, but, sometimes using set distributions can lead to efficient general parallel algorithms. The work-load properties of these distributions and their impact on the number of communications are discussed. In order to illustrate the implications of data distributions in the construction of parallel programs, some examples are presented. Two parallel algorithms for computation of Lagrange interpolation polynomial are developed, starting from simple distributions and set distributions.  相似文献   

18.
PAML是一款利用最大似然法进行系统发育分析的软件包,被广泛使用.然而,由于模型复杂、参数众多,PAML的计算过程非常耗时.对PAML中最重要的codeml程序进行了并行算法研究,通过算法分析和程序Profiling确定程序瓶颈.在此基础上,利用现代CPU的多核并行能力和SIMD并行机制优化程序瓶颈,从而提高了程序整体的运行速度.实际数据集和人工数据集上的实验表明并行算法有效提高了codeml的计算速度,加速比最高达7.94倍.  相似文献   

19.
结合0-1整数规划的隐式枚举法对目标排序法进行分析.引入PSRS(并行正则采样排序)算法对目标排序法的核心运算进行并行化,并改进PSRS算法的数据收集策略以适应0-1整数规划的并行隐式枚举.最后给出了基于改进的PSRS的并行0-1整数规划的求解算法,并对算法的时间复杂度进行了分析.  相似文献   

20.
一种基于学习机制的并行遗传算法   总被引:6,自引:0,他引:6  
基于生物学群落的概念,提出了一个群落-种群-个体的三层模型,并在该模型上发展了一种基于学习机制的并行遗传算法(PGABL)。算法引入黑板模型作为控制和交互的数据结构,采用群内、群间、群落三个学习算子,将遗传进化和遗传学习相结合,有效地改善了遗传算法的性能。实验结果表明,该算法具有良好的适应性和稳定性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号