Similar literature
 18 similar documents found; search took 149 ms
1.
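Fu Chaojiang (付朝江), Chen Hongjun (陈洪均). Journal of Computer Applications (《计算机应用》), 2015, 35(12): 3387-3391
To address the high computational cost of finite element analysis of elastoplastic problems, a parallel substructure preconditioned conjugate gradient algorithm with residual smoothing is proposed for a Message Passing Interface (MPI) cluster environment. Using domain decomposition, each substructure is treated as an independent finite element model through interface conditions. In the global analysis, each processor stores only the information of its own substructures and generates the local stiffness matrices. Using diagonal storage and the minimal residual smoothing method, a parallel substructure preconditioned conjugate gradient (PCG) algorithm combined with minimal residual (MR) smoothing is designed. Load balancing is discussed and interprocessor communication is optimized. The elastoplastic stress-strain relations are integrated with a substepping scheme that automatically adjusts the size of each substep according to a prescribed tolerance to control the integration error. Numerical examples were run on a workstation cluster, and the performance of the algorithm was analyzed and compared with that of the conventional PCG algorithm. The examples show that the proposed algorithm achieves good speedup and efficiency, outperforms the conventional PCG algorithm, and is an effective parallel solver for finite element analysis of elastoplastic problems.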
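For readers unfamiliar with the PCG iteration named in this abstract, the following Python sketch shows a plain serial conjugate gradient solver with a Jacobi (diagonal) preconditioner. It is only an illustration under stated assumptions: the function name `pcg` and the dense list-of-lists matrix are hypothetical choices, and the sketch includes neither the substructure preconditioner, the MR smoothing, nor the MPI parallelism described by the authors.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    Illustrative serial sketch with a Jacobi preconditioner (an assumption,
    not the substructure preconditioner of the paper).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    m_inv = [1.0 / A[i][i] for i in range(n)]   # inverse of diag(A)
    x = [0.0] * n                               # initial guess
    r = list(b)                                 # residual b - A x
    z = [m_inv[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)                                 # search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:             # converged
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                 # step length
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                      # direction update factor
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

In a distributed version, the matrix-vector product and the dot products are the steps that would require communication between processors.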

2.
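This paper proposes a parallel conjugate gradient algorithm for large-scale sparse matrix eigenvalue problems. To improve parallel efficiency, a load-balanced row partitioning scheme is designed, a sparse matrix reordering method that overlaps computation with communication is implemented, and preprocessing is used to reduce the volume of message-passing communication among processes during the computation. In addition, for high-performance parallel computing on multicore processors, a hybrid parallel algorithm combining MPI with fine-grained (thread-level) OpenMP is implemented. Tests on the DeepComp 7800 (深腾7800) parallel computer show that the communication time of the parallel algorithm remains stable as the number of processes increases, that the algorithm scales well on parallel computers, and that it is suitable for large-scale sparse eigenvalue problems.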
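One common way to obtain a load-balanced row partition for a sparse matrix is to split the rows into contiguous blocks holding roughly equal numbers of nonzeros. The Python sketch below illustrates that idea only; the abstract does not say which balancing criterion the authors use, so the nonzero-count criterion and the name `partition_rows` are assumptions.

```python
def partition_rows(nnz_per_row, n_procs):
    """Split rows into n_procs contiguous blocks with similar nonzero counts.

    Returns a list of n_procs + 1 boundaries; process p owns rows
    bounds[p] .. bounds[p + 1] - 1. Hypothetical helper for illustration.
    """
    total = sum(nnz_per_row)
    bounds = [0]
    running = 0
    for i, nnz in enumerate(nnz_per_row):
        running += nnz
        # Close a block once the cumulative count reaches the next
        # multiple of total / n_procs (compared without division).
        while len(bounds) < n_procs and running * n_procs >= total * len(bounds):
            bounds.append(i + 1)
    # The last block always ends at the final row.
    while len(bounds) < n_procs + 1:
        bounds.append(len(nnz_per_row))
    return bounds
```

A very dense row can leave a neighbouring block empty in this simple scheme; a production partitioner would handle that case explicitly.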

3.
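Efficient algorithms for a parallel matrix eigenproblem solver on SMP cluster systems   Total citations: 2, self-citations: 0, citations by others: 2
Tridiagonalization of a symmetric matrix and the eigenvalue solution of the resulting symmetric tridiagonal matrix are the key steps of a parallel eigensolver for dense symmetric matrices. Targeting the multilevel architecture of SMP cluster systems, hybrid MPI/OpenMP parallel algorithms are presented for Householder-based tridiagonalization and for the divide-and-conquer algorithm for the tridiagonal eigenvalue problem. The study focuses on load balancing, communication overhead, and performance evaluation in an SMP cluster environment. The hybrid algorithms combine coarse-grained thread parallelism with a dynamic invocation scheme based on task sharing, which improves the load balance of the MPI algorithms and reduces communication overhead. Experiments on the DeepComp 6800 (深腾6800) show that the solver based on the hybrid parallel algorithms achieves better performance and scalability than the pure MPI version.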

4.
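Parallel computing is becoming a new trend in scientific and engineering computation. A parallel finite element method based on domain decomposition is applied in the distributed parallel environment of a workstation cluster. A parallel conjugate gradient algorithm based on element-wise domain decomposition is proposed. A dam structure is solved on a workstation cluster, and the parallel performance is analyzed.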

5.
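Based on networked clusters, a new parallel environment, and the Message Passing Interface (MPI), two square-root-free parallel Cholesky factorization algorithms are presented. The algorithms use a row-wrapped (row-cyclic) storage scheme and a send-ahead strategy, which reduces load imbalance, increases the overlap of computation and communication, and shortens communication time. Both theoretical analysis and numerical experiments show that the algorithms achieve high parallel speedup and efficiency.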

6.
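Implementation of a parallel LDL^T factorization algorithm for symmetric positive definite matrices   Total citations: 1, self-citations: 0, citations by others: 1
Based on networked clusters, a new parallel environment, and the Message Passing Interface (MPI), two square-root-free parallel Cholesky factorization algorithms are presented. The algorithms use a row-wrapped (row-cyclic) storage scheme and a send-ahead strategy, which reduces load imbalance, increases the overlap of computation and communication, and shortens communication time. Both theoretical analysis and numerical experiments show that the algorithms achieve high parallel speedup and efficiency.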
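The square-root-free Cholesky (LDL^T) factorization referred to here writes a symmetric positive definite matrix as A = L D L^T, with L unit lower triangular and D diagonal. The Python sketch below is a serial illustration only; the function name `ldlt` and the dense list-of-lists storage are assumptions, and the row-wrapped distribution and send-ahead communication of the paper are not reproduced.

```python
def ldlt(A):
    """Factor a symmetric positive definite matrix as A = L * diag(D) * L^T.

    Returns (L, D) with L unit lower triangular (list of lists) and D a
    list of diagonal entries. No square roots are taken. Serial sketch.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of earlier columns.
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D
```

With a row-cyclic layout on p processes, row i would typically be owned by process i mod p, so that the shrinking active part of the matrix stays spread across all processes.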

7.
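As a preliminary study for parallelizing discrete element software for granular materials, a finite difference solver for the two-dimensional steady-state heat conduction problem was parallelized. The parallel algorithm divides the computational domain into several subdomains and assigns the iterative computation on each subdomain to a corresponding processor. The algorithm also takes load balancing into account and overlaps computation with communication to improve parallel efficiency. The serial and parallel programs for the two-dimensional steady-state temperature field were run on a Dawning (曙光) TC2600 blade server, and a comparison of the results verifies the effectiveness of the parallel method. The experiments show that the larger the ratio of computation time to communication time, the higher the parallel efficiency.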
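The subdomain idea in this abstract can be illustrated with a Jacobi iteration for the 2-D Laplace equation in which the interior rows are split into strips. The Python sketch below is a serial emulation under stated assumptions: the names `split_rows`, `jacobi_sweep`, and `solve` are hypothetical, each strip stands in for one processor, and the rows directly above and below a strip play the role of halo data that an MPI code would exchange. The abstract does not state which iteration the authors use.

```python
def split_rows(n_interior, n_parts):
    """Split interior rows 1..n_interior into near-equal contiguous strips."""
    base, extra = divmod(n_interior, n_parts)
    strips, start = [], 1
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        strips.append((start, start + size))   # half-open range [start, stop)
        start += size
    return strips


def jacobi_sweep(T, strips):
    """One Jacobi sweep; each strip reads its own rows plus one halo row per side."""
    new = [row[:] for row in T]
    ncols = len(T[0])
    for start, stop in strips:                 # one strip per (emulated) processor
        for i in range(start, stop):
            for j in range(1, ncols - 1):
                new[i][j] = 0.25 * (T[i - 1][j] + T[i + 1][j]
                                    + T[i][j - 1] + T[i][j + 1])
    return new


def solve(T, n_parts=2, tol=1e-8, max_sweeps=10000):
    """Iterate until the largest change in a sweep falls below tol."""
    strips = split_rows(len(T) - 2, n_parts)
    for _ in range(max_sweeps):
        new = jacobi_sweep(T, strips)
        diff = max(abs(new[i][j] - T[i][j])
                   for i in range(len(T)) for j in range(len(T[0])))
        T = new
        if diff < tol:
            break
    return T
```

Because the strips differ in size by at most one row, the work per emulated processor is nearly equal, which is the simplest form of the load balancing the abstract mentions.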

8.
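Parallel algorithms for a bicubic numerical weather prediction model are studied. A one-dimensional domain decomposition is adopted, and a data transpose communication algorithm is designed and implemented by drawing on the matrix transpose algorithm for block checkerboard partitioning. Computation is overlapped with communication to reduce the impact of communication time on parallel efficiency. The resulting parallel algorithm for the bicubic numerical weather prediction model was implemented, and its parallel performance was tested and evaluated on a cluster system. The results show that the parallel algorithm achieves high parallel efficiency and that both the data transpose communication algorithm and the overlap of computation with communication are effective.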

9.
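Optimization and parallelization of molecular dynamics simulation   Total citations: 3, self-citations: 1, citations by others: 2
The algorithmic features and computational characteristics of molecular dynamics simulation are analyzed and discussed, and the serial program is optimized and made suitable for parallelization. Domain decomposition is applied to the simulated system, with partially overlapping regions kept between compute nodes. Using the message-passing MPI platform, the program was parallelized on a scalable cluster and achieved a parallel efficiency above 90%.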

10.
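A parallel extraction algorithm for chip-level 3-D parasitic capacitance   Total citations: 1, self-citations: 0, citations by others: 1
With the growing availability of multicore CPUs and distributed clusters, parallel computing is increasingly used in science and engineering to solve complex numerical simulation problems. A parallel extraction algorithm for chip-level 3-D parasitic capacitance is proposed. It is based on the 3-D hierarchical block boundary element method and uses a bidirectional overlapping combination approach to divide the chip into four types of "windows" of different sizes. A task scheduling method that combines static and dynamic scheduling with a variable-length dynamic hybrid queue assigns all windows to different processes, and techniques that improve parallel efficiency are applied to sparse matrix summation and to the reduction sum across processes, achieving good load balance and high speedup. Experiments using Message Passing Interface programming on a distributed cluster verify the effectiveness of the algorithm.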

11.
This paper presents parallel computational strategies to implement explicit nonlinear finite element analysis code on distributed memory parallel computers for solving large-scale problems in structural dynamics. Implementation on both homogeneous and heterogeneous parallel processing environments is considered in detail. Implementation of an explicit nonlinear finite element dynamic analysis code on homogeneous systems is discussed first, and this is later moved onto heterogeneous systems. Domain decomposition with explicit message passing is preferred for parallel implementation. The message passing implementation in the parallel algorithm is based on MPI (Message Passing Interface) libraries. Implementation aspects of overlapped and non-overlapped domain decomposition techniques, Dynamic Task Allocation (DTA), and clustering techniques for DTA, together with their relative merits, are presented. The interprocessor communications are optimised by overlapping them with computations to improve the performance of the domain decomposition based explicit dynamic analysis finite element code. The issues related to implementation of finite element code for nonlinear dynamic analysis on a heterogeneous parallel computing environment are then presented. A new dynamic load-balancing algorithm is developed for this purpose and is integrated with the domain decomposition based parallel explicit finite element code to test the algorithms on a coarse-grain heterogeneous cluster of workstations. Numerical experiments have been carried out on PARAM-10000, an Indian parallel computer, and also on a cluster of Unix workstations.

12.
Some highlights of a recently developed finite element program capable of simulating the impact of a fast moving flexible or rigid object on a deforming substrate are briefly discussed. A finite element model specially tailored for this application and a parallel explicit solver using domain decomposition and message passing technologies were developed. A typical numerical example is presented.

13.
This paper presents a parallel mixed time integration algorithm formulated by synthesising the implicit and explicit time integration techniques. The proposed algorithm is an extension of the mixed time integration algorithms [Comput. Meth. Appl. Mech. Engng 17/18 (1979) 259; Int. J. Numer. Meth. Engng 12 (1978) 1575] that have been successfully employed for solving media-structure interaction problems. The parallel algorithm for the nonlinear dynamic response of structures employing the mixed time integration technique has been devised within the broad framework of domain decomposition. Concurrency is introduced into this algorithm by integrating the interface nodes with an explicit time integration technique and then solving the local submeshes with an implicit algorithm. A flexible parallel data structure has been devised to implement the parallel mixed time integration algorithm. A parallel finite element code has been developed using the portable Message Passing Interface software development environment. Numerical studies have been conducted on PARAM-10000 (an Indian parallel supercomputer) to test the accuracy and the performance of the proposed algorithm. The numerical studies indicate that the proposed algorithm is highly adaptive for parallel processing.

14.
Structural dynamics methods for concurrent processing computers   Total citations: 3, self-citations: 0, citations by others: 3
In the area of crash impact, research is urgently required on the development and evaluation of parallel methods for crash dynamics analysis of complex nonlinear finite element and/or finite difference structural problems. An investigation of selected nonlinear dynamics algorithms appropriate for parallel computers is reported. Implicit methods such as those of the Newmark type, which build on the Cholesky decomposition strategy, and explicit methods such as the central difference time integration method are included. Both implicit and explicit dynamics algorithms are investigated on two significantly different parallel computers, the FLEX/32 shared memory multicomputer and the INTEL iPSC Hypercube local memory computer.
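As an illustration of the explicit central difference method named in this abstract, the Python sketch below integrates a single undamped degree of freedom, m u'' + k u = 0. The single-DOF linear model and the name `central_difference` are assumptions chosen for brevity; the abstract concerns nonlinear multi-degree-of-freedom structural problems on parallel machines.

```python
import math


def central_difference(m, k, u0, v0, dt, n_steps):
    """Integrate m*u'' + k*u = 0 with the explicit central difference method.

    Returns the displacement history [u_0, u_1, ..., u_n_steps].
    For this undamped model the scheme is stable only if dt < 2 * sqrt(m / k).
    """
    a0 = -k * u0 / m
    # Fictitious displacement at t = -dt from a Taylor expansion.
    u_prev = u0 - dt * v0 + 0.5 * dt * dt * a0
    u = u0
    history = [u0]
    for _ in range(n_steps):
        a = -k * u / m                         # acceleration from equilibrium
        u_next = 2.0 * u - u_prev + dt * dt * a
        u_prev, u = u, u_next
        history.append(u)
    return history


if __name__ == "__main__":
    hist = central_difference(m=1.0, k=1.0, u0=1.0, v0=0.0, dt=0.01, n_steps=100)
    # Compare the numerical displacement at t = 1 with the exact cos(1).
    print(hist[-1], math.cos(1.0))
```

Each step needs only the current state and no linear solve, which is why explicit schemes of this kind map naturally onto domain-decomposed parallel codes.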

15.
We present a new parallel semiconductor device simulation using the dynamic load balancing approach. This semiconductor device simulation, based on the adaptive finite volume method with a posteriori error estimation, has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library. A constructive monotone iterative technique is also applied to solve the system of nonlinear algebraic equations. Two different parallel versions of the algorithm to perform a complete device simulation are proposed. The first is a dynamic parallel domain decomposition approach, and the second is a parallel current-voltage characteristic points simulation. This implementation shows that a well-designed load balancing simulation can reduce the execution time by up to an order of magnitude. Numerical results on various submicron VLSI devices are presented and compared with measured data to show the accuracy and efficiency of the method.

16.
Two- and three-dimensional turbomachinery flows in stationary and rotating compressor cascades are studied by using a one-level inexact explicit Schwarz method and a cubic eddy viscosity turbulence closure. The message passing paradigm is used for the parallel implementation of the domain decomposition algorithm, allowing the solver to be ported to different parallel platforms. A convergence accelerator is proposed, based on a condensed cycle structure that merges the additive Schwarz iterations with the fixed point non-linear ones. The use of a stable finite element formulation on higher-order Q2-Q1 elements is addressed as a means of retaining non-oscillatory and accurate solutions. Furthermore, the elementwise quadratic approximation is used to enable the exact implementation of higher-order integrals arising in the anisotropic turbulence closure adopted. Numerical campaigns are carried out on IBM SP2 and SP3 and CRAY T3E architectures in order to demonstrate the portability. The accompanying performance improvement is assessed. Finally, the predictive capabilities are discussed with reference to challenging turbomachinery test cases: a transitional linear compressor cascade and an isolated compressor rotor designed for non-free vortex operation. Convergence speed-up in such configurations is discussed.

17.
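This paper considers the fractional-step method for the three-dimensional incompressible flow equations, in which the momentum equations are solved iteratively with the BiCGSTAB algorithm and the pressure Poisson equation is solved directly with a Fourier transform method. A parallel version of the method for cluster platforms is studied. Starting from domain decomposition, the computation and communication loads on each processor are analyzed for one-, two-, and three-dimensional partitioning, and two-dimensional domain decomposition is adopted on the basis of this analysis. Methods for porting the BiCGSTAB algorithm and the Fourier transform Poisson solver to a GPGPU heterogeneous platform are then analyzed. Finally, the parallel performance of the two algorithms on a CPU cluster and on the GPGPU heterogeneous platform is analyzed.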

18.
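Based on the parallel characteristics of traffic network simulation, the framework of a parallel traffic simulation system is designed using domain decomposition. The traffic network is divided into several subnetworks, and each node of the cluster system is responsible for one of them. A network partitioning algorithm based on vehicle-count load is proposed to balance the load across subnetworks, and the communication mechanism between subnetworks is analyzed. The designed parallel simulation system is implemented on an MPI-based parallel computing platform. Examples show that the proposed parallel algorithm greatly improves the speed and efficiency of traffic network simulation.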
