Similar literature
 19 similar documents found (search time: 71 ms)
1.
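UNIPIC-3D, a three-dimensional parallel fully electromagnetic PIC simulation code developed in-house, is capable of simulating high-power microwave devices. The code implements parallel three-dimensional FDTD, particle-push algorithms, and boundary-condition handling. It reads an input file and partitions the domain in either a regular or an irregular manner. Electromagnetic fields and particles are parallelized with MPI so that computation and communication for particles and fields stay synchronized. The parallel efficiency of the code was tested on a high-performance parallel computer. Comparison with results from the 2.5-dimensional UNIPIC code verified the correctness of the parallel module of UNIPIC-3D.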

2.
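Building on domain decomposition with the Metis package and compressed row storage of large sparse matrices, a parallel finite element program is developed that solves the finite element equations with the preconditioned conjugate gradient method. A three-dimensional geological model is simulated, and the computed displacement field of the near-fault seismic hazard field of the Zhangdian-Renhe fault shows that the parallel program achieves a good speedup.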
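The combination of compressed row storage and preconditioned conjugate gradients named in this abstract can be sketched serially as follows. This is only an illustration under assumptions: the abstract does not say which preconditioner is used, so a Jacobi (diagonal) preconditioner is assumed here, the 3x3 test matrix is made up, and the Metis partitioning and parallel communication are left out entirely.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-stored sparse matrix by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def csr_diagonal(values, col_idx, row_ptr):
    """Extract the diagonal, used here as an (assumed) Jacobi preconditioner."""
    n = len(row_ptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive definite system."""
    diag = csr_diagonal(values, col_idx, row_ptr)
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical SPD test matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [1.0, 2.0, 3.0]
x = pcg(values, col_idx, row_ptr, b)
```

In a domain-decomposed version, each process would typically hold only its own rows of the CSR arrays, and the dot products and matrix-vector product would need communication; that part is not shown.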

3.
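Parallel implementation of stage-wise domain decomposition for three-dimensional variational assimilation of meteorological data   Cited by: 2 (self-citations: 0, by others: 2)
Because variational assimilation markedly improves assimilation quality, it is becoming the mainstream assimilation method in numerical weather prediction. This work studies parallel computing for three-dimensional variational assimilation and proposes stage-wise domain decomposition, an adaptive partitioning algorithm for observation data, matrix transposition and halo-region communication that overlap computation with communication, and a file I/O method. On this basis an MPI-parallel three-dimensional variational prototype system is implemented, which reaches a parallel speedup of 11.9 on a Linux cluster of 8 dual-CPU nodes.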
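As a rough reading of the reported speedup, the parallel efficiency can be estimated from the usual definition. The abstract does not state how many processes were run, so the figure below assumes that all 16 CPUs (8 nodes with 2 CPUs each) were used; it is an estimate, not a number from the paper.

```latex
E = \frac{S}{p} = \frac{11.9}{16} \approx 0.74
```

Here $S$ is the reported speedup and $p$ is the assumed processor count.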

4.
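Starting from a numerical simulation program that solves the Boltzmann model equation for three-dimensional flows around bodies, this work studies a domain-decomposition parallel strategy, introduces input/output, communication, and cache optimizations, and ports the program to MPI with high-performance tuning. Using the flow around a vehicle in the high-altitude rarefied transitional regime as a test case, large-scale MPI parallel runs confirm the correctness of the MPI domain-decomposition strategy and of the program optimizations. The study shows that the parallel implementation markedly shortens the computation time and achieves good results.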

5.
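In recent years, parallel flood-routing simulation has developed rapidly and plays an important role in flood control and disaster mitigation. Taking into account the numerical methods, parallel modes, and programming techniques of flood-routing models, several representative models are selected, the technical details of homogeneous and heterogeneous parallel flood-routing models are analysed, and the technical difficulties of developing parallel models are set out together with solutions. Finally, focal points for future development of parallel flood-routing models are proposed: heterogeneous parallelism for unstructured-grid models...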

6.
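Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2019, (11): 1852-1863
Three-dimensional sono-elasticity theory and its computational methods provide the theoretical basis for the fluid-structure coupled vibration, acoustic radiation, and ocean sound propagation of elastic floating marine structures, and are highly influential in research on such structures. According to the computational density of the different stages of a three-dimensional sono-elastic calculation, the application software THAFTS-Acoustic was parallelized at multiple levels and optimized on the Sunway TaihuLight supercomputer. The optimizations include loop splitting, loop fusion, direct memory access (DMA), mutual hiding of communication and computation, and Sunway TaihuLight vectorization (SIMD). Tests show that the multi-level heterogeneous parallel implementation has good MPI scalability and good many-core acceleration: the core section is accelerated by up to 18 times, and with 64 processes the whole program runs 5.5 times faster than the original parallel program. The implementation makes effective use of the computing power of Sunway TaihuLight and further supports THAFTS-Acoustic in very large-scale, higher-precision parallel computation.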

7.
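Chen Jiang, Zhao Yonghua, Chi Xuebin. Computer Engineering (《计算机工程》), 2005, 31(22): 58-60, 94
COUPL+ is a parallel library based on the message-passing model. It encapsulates in its functions the data partitioning and the message-passing calls that a parallel program must handle, which simplifies writing mesh-based applications for distributed-memory parallel machines. This paper briefly introduces the basic principles of COUPL+ and compares its features with those of MPI, OpenMP, and HPF. It also uses COUPL+ to implement two tasks common in parallel computing, the conjugate gradient method and structured-grid computation, and compares the resulting performance with that of MPI and HPF.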

8.
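Monte Carlo (MC) simulation is widely used in nuclear engineering and nuclear safety calculations, but when a high confidence level is required it involves a large amount of computation and long run times, which makes it hard to meet engineering schedules. After analysing the serial algorithm, and targeting the architecture of the large SMP server Oracle M9000, the algorithm was parallelized with OpenMP and tested experimentally. The results show that multithreaded parallelism suits both the Monte Carlo method and the M9000 architecture, achieves very high speedup, and gives parallel results identical to the serial results. This offers a way to meet the high-confidence, short-turnaround requirements of engineering calculations.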

9.
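Cheng Jie, Zhang Linbo. Computer Science (《计算机科学》), 2012, 39(5): 278-281
This paper introduces PHG-Solid, an open-source parallel adaptive finite element software package for three-dimensional structural analysis. It is built on the parallel adaptive finite element platform PHG and supports parallel adaptive finite element analysis of purely three-dimensional structures. Compared with existing commercial and open-source structural finite element software, the features and advantages of PHG-Solid are: 1) it supports fully automatic, highly parallel adaptive finite element computation; 2) it solves large-scale problems robustly and efficiently, with good scalability in problem size; 3) it is easy to extend, and users can add computational modules as needed. Several large numerical examples demonstrate the computing capability and parallel scalability of the software; the largest problem exceeds 500 million degrees of freedom, and the largest parallel run uses 1024 MPI processes.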

10.
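This paper describes the key techniques for parallelizing UTCHEM, a numerical simulation program for chemical combination flooding, on SMP-CLUSTER, a distributed-memory multicomputer parallel system. The parallel model follows the single program, multiple data (SPMD) programming model and uses domain decomposition to split the whole solution domain into subdomains, so that several compute nodes solve a single simulation problem at the same time. The compute nodes exchange the shared data of the overlap regions by message passing in order to coordinate the computation between nodes. At present only the solution of the pressure equations has been parallelized. Test results show good parallel efficiency.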
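The idea of subdomains that coordinate through shared overlap data can be illustrated with a toy problem. The sketch below is not UTCHEM's pressure solver: it assumes a 1D Laplace equation, a Jacobi iteration, two subdomains with one ghost cell each, and plain Python assignments standing in for the message passing.

```python
def jacobi_sweep(u):
    """One Jacobi sweep for the 1D Laplace equation; the two end values stay fixed."""
    return [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]


n = 10
u_global = [0.0] * (n - 1) + [1.0]      # boundary values u[0] = 0 and u[-1] = 1

# Two subdomains: the left one owns cells 0..4, the right one owns cells 5..9.
# Each also stores one ghost cell copied from its neighbour (the overlap region).
left = u_global[:6]     # cells 0..5, where cell 5 is a ghost
right = u_global[4:]    # cells 4..9, where cell 4 is a ghost

for _ in range(200):
    u_global = jacobi_sweep(u_global)                      # reference: undivided domain
    left, right = jacobi_sweep(left), jacobi_sweep(right)  # independent local work
    # Stand-in for message passing: refresh each ghost with the neighbour's owned value.
    left[-1], right[0] = right[1], left[-2]

# Gather the owned cells of both subdomains back into one array.
u_split = left[:-1] + right[1:]
```

Because the ghost cells are refreshed after every sweep, the decomposed iteration reproduces the undivided one cell for cell; in an SPMD program each subdomain would live in its own process and the ghost refresh would be a send/receive pair.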

11.
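A parallelization method based on decomposing the computational domain is proposed for building the stiffness matrix and right-hand-side vector in isogeometric analysis. The computational domain is split into several disjoint subregions, and each subregion is assigned to one processor. All processors carry out the computation on their subregions in parallel, and once they have finished, a fast merge algorithm assembles the linear system. Experiments show that the method reaches a speedup of 6.46 on an 8-core machine and computes 6.8 million matrix entries in about 4 seconds. Using the Intel MKL sparse solver for the linear system, the isogeometric solver handles 520,000 degrees of freedom in about 10 seconds, and the method is on the order of ten thousand times faster than ISOGAT.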

12.
Yu and Wang [1, 2] implemented the first theoretically exact spiral cone-beam reconstruction algorithm, developed by Katsevich [3, 4]. The algorithm becomes computationally expensive as the amount of data grows. Here we study a parallel computing scheme for the Katsevich algorithm to speed up image reconstruction. Based on the proposed parallel algorithm, several numerical tests are conducted on a high-performance computing (HPC) cluster with thirty-two 64-bit AMD Opteron processors. The standard phantom data [5] are used to establish the performance benchmarks. The results show that our parallel algorithm significantly reduces the reconstruction time, achieving high speedup and efficiency.

13.
Multibody Analysis of Controlled Aeroelastic Systems on Parallel Computers   Cited by: 1 (self-citations: 0, by others: 1)
The paper describes the application of parallel techniques to a multibody multidisciplinary formulation. The problem is stated in terms of a system of nonlinear Differential-Algebraic Equations (DAE). The parallel solution is obtained using a sub-structuring domain decomposition method that is able to exploit the characteristic quasi-monodimensional topology that multibody models usually present. The presence of explicit constraints in the form of algebraic equations requires particular care in the treatment of the related unknowns, to avoid local singularity problems. The code has been successfully tested on different computer architectures. Special attention has been dedicated to producing a code that will work efficiently on a cluster of PCs. Results of three test problems, regarding the simulation of a nonlinear beam bending and of complex aeroservomechanical systems such as a helicopter rotor and a tiltrotor aircraft, are presented.

14.
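§1. Introduction. In many important research fields, numerical simulation is quite complex. The result of a simulation depends on the choice of numerical method, the quality of the computational mesh, the boundary treatment, and so on, and the complexity shows up in the physical properties, the mathematical model, and the irregular geometry of the computational domain. This is the case when the physical properties of the parts of the domain differ widely, for example in multi-material fluid motion where the flow field varies unevenly, very gently in some parts and extremely sharply in others; or when the domain is extremely irregular, as in the flow-field computation of an inlet system in aerodynamics or the numerical analysis of flow around complex shapes. If the whole domain is computed at once, not only is it hard to describe the variation of the flow field accurately, but the computation is also limited by the computer's processing speed, …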

15.
International Journal of Computer Mathematics, 2012, 89(15): 2047-2060
The large spatial scale associated with the modelling of strong ground motion in three dimensions requires enormous computational resources. For this reason, the simulation of soil shaking requires high-performance computing. The aim of this work is to present a new parallel approach for this kind of problem, based on a domain decomposition technique. The main idea is to subdivide the original problem into local ones, which makes it possible to investigate large-scale problems that cannot be solved by a serial code. The performance of our parallel algorithm has been examined by analysing computational times, speed-up, and efficiency. Results of this approach are shown and discussed.

16.
We explore an approach due to Nievergelt of decomposing a time-evolution equation along the time dimension and solving it in parallel with as little communication as possible between the processors. This method computes a map from initial conditions to final conditions locally on slices of the time domain, and then patches these operators together into a global solution using a single communication step. A basic error analysis is given, and some comparisons are made with other parallel-in-time methods. Based on the assumption that parallel computation is cheap but communication is very expensive, it is shown that this method can be competitive for some problems. We present numerical simulations on graphics chips and on traditional parallel clusters using hundreds of processors for a variety of problems to show the practicality and scalability of the proposed method.
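The two phases described in this abstract, local maps from initial to final conditions followed by a single patching step, can be sketched for a linear special case. Everything specific below is an assumption made for illustration: the harmonic-oscillator system, the RK4 integrator, the slice counts, and the use of a thread pool in place of separate processors. For a linear problem the local map is simply a matrix, which is what makes this toy version short; the paper's general treatment is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
import math

A = [[0.0, 1.0], [-1.0, 0.0]]   # harmonic oscillator y' = A y (illustrative choice)


def rhs(y):
    return [A[0][0] * y[0] + A[0][1] * y[1], A[1][0] * y[0] + A[1][1] * y[1]]


def rk4_integrate(y, dt, steps):
    """Advance y by `steps` classical RK4 steps of size dt."""
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([y[i] + 0.5 * dt * k1[i] for i in range(2)])
        k3 = rhs([y[i] + 0.5 * dt * k2[i] for i in range(2)])
        k4 = rhs([y[i] + dt * k3[i] for i in range(2)])
        y = [y[i] + dt / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
    return y


def slice_propagator(args):
    """Local map from initial to final condition on one time slice.

    For a linear problem the map is a matrix whose columns are the images
    of the unit vectors, so no data from other slices is needed.
    """
    dt, steps = args
    c0 = rk4_integrate([1.0, 0.0], dt, steps)
    c1 = rk4_integrate([0.0, 1.0], dt, steps)
    return [[c0[0], c1[0]], [c0[1], c1[1]]]


def apply(M, y):
    return [M[0][0] * y[0] + M[0][1] * y[1], M[1][0] * y[0] + M[1][1] * y[1]]


T, n_slices, steps_per_slice = 2.0, 4, 50
dt = T / (n_slices * steps_per_slice)

# Phase 1: every slice builds its propagator independently (no communication).
with ThreadPoolExecutor() as pool:
    propagators = list(pool.map(slice_propagator, [(dt, steps_per_slice)] * n_slices))

# Phase 2: one patching pass chains the local maps into the global solution.
y = [1.0, 0.0]
for P in propagators:
    y = apply(P, y)

# Serial reference with the same step size, for comparison.
y_serial = rk4_integrate([1.0, 0.0], dt, n_slices * steps_per_slice)
```

Since this system is autonomous, all four slices happen to produce the same matrix; with time-dependent coefficients each slice would yield a different one, and the patching step would be unchanged.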

17.
Particle tracking methods are a versatile computational technique central to the simulation of a wide range of scientific applications. In this paper, we present a new parallel particle tracking framework for scientific computing applications. The framework includes the in-element particle tracking method, which is based on the assumption that particle trajectories are computed from problem data localized to individual elements, as well as the dynamic partitioning of particle-mesh computational systems. The ultimate goal of this research is to develop a parallel in-element particle tracking framework capable of interfacing with ordinary differential equation (ODE) solvers of different orders of accuracy. The parallel efficiency of such particle-mesh systems depends on the partitioning of both the mesh elements and the particles; this distribution can change dramatically because of movement of the particles and adaptive refinement of the mesh. To address this problem we introduce a combined load function that is a function of both the particle and mesh element distributions. We present experimental results that detail the performance of this parallel load balancing approach for a three-dimensional particle-mesh test problem on an unstructured, adaptive mesh, and demonstrate the ability to interface with different ODE solvers.
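A combined load function of this kind might look like the sketch below. The abstract does not give the actual function, so the linear form, the weights `w_elem` and `w_part`, the helper names, and the greedy contiguous partitioner are all hypothetical and serve only to show why counting particles as well as elements changes the partition.

```python
def element_loads(particles_per_element, w_elem=1.0, w_part=0.5):
    """Combined load per element: mesh work plus particle work (assumed weights)."""
    return [w_elem + w_part * n for n in particles_per_element]


def partition(loads, n_parts):
    """Split an ordered element list into contiguous parts of similar total load.

    Assumes len(loads) >= n_parts so that every part receives at least one element.
    """
    target = sum(loads) / n_parts
    parts, current, acc = [], [], 0.0
    for idx, load in enumerate(loads):
        current.append(idx)
        acc += load
        remaining_elems = len(loads) - idx - 1
        remaining_parts = n_parts - len(parts) - 1
        # Close the part once it reaches the target load, or when the elements
        # left are only just enough to give each remaining part one element.
        if len(parts) < n_parts - 1 and (acc >= target or remaining_elems == remaining_parts):
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts


# Eight elements; the particles are clustered in elements 4 and 5.
particle_counts = [0, 0, 0, 0, 20, 20, 0, 0]
loads = element_loads(particle_counts)
parts = partition(loads, 2)
part_loads = [sum(loads[i] for i in p) for p in parts]
```

Splitting the same eight elements by element count alone would put both particle-heavy elements in one half, giving loads of 4 and 24 under these weights, whereas the combined load yields 15 and 13. When particles move or the mesh is refined, the loads would be recomputed and the partition rebuilt.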

18.
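Molecular dynamics simulation models the motion of microscopic molecular and atomic systems in time and space, and is a powerful way to understand the macroscopic properties of a system from its microscopic nature. To address the question of how to improve the performance of parallel molecular dynamics simulation, this paper takes the well-known software GROMACS as an example and analyses its implementation strategies for parallel molecular dynamics computation. Combining the key principles of molecular dynamics simulation with test cases, it proposes strategies for optimizing computing performance in an MPI+OpenMP parallel environment, providing a theoretical and practical reference for achieving the best performance of molecular dynamics simulation in parallel computing environments. For heterogeneous GPU parallel environments, the paper also gives some theoretical guidance and examples on how to combine MPI, OpenMP, and GPUs for the best performance.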

19.
This paper proposes two viable computing strategies for distributed parallel systems: domain division with sub-domain overlapping and asynchronous communication. We have implemented a parallel computing procedure for simulating the growth of a Ti thin film in a system of 1000 x 1000 atoms by means of the Monte Carlo (MC) method. This approach greatly reduces the computation time for simulation of large-scale thin film growth under realistic deposition rates. The multi-lattice MC model of deposition comprises two basic events: deposition and surface diffusion. Since diffusion constitutes more than 90% of the total simulation time of the whole deposition process at high temperature, we concentrated on implementing a new parallel diffusion simulation that reduces communication time during simulation. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. The parallel algorithms we propose can simulate the thin
