
1.

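陈再高 王玥 王建国 张殿辉 付梅艳 乔海亮 袁媛 《计算机工程与科学》2009,31(11)

UNIPIC-3D, a three-dimensional parallel fully electromagnetic particle-in-cell (PIC) simulation code developed in-house, is capable of simulating high-power microwave devices. The code implements a parallel three-dimensional FDTD solver, a particle-push algorithm, and boundary-condition handling. It reads an input file and partitions the domain in one of two ways, regular or irregular. The electromagnetic fields and the particles are parallelized with MPI so that computation and communication for both proceed in step. The parallel efficiency of the code was measured on a high-performance parallel computer, and a comparison with results from the 2.5-dimensional UNIPIC code verified the correctness of the UNIPIC-3D parallel modules.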

2.

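Three-dimensional parallel numerical simulation of near-fault hazard fields

程海英 谢江 邵华钢 《计算机工程》2010,36(22):37-39

Building on domain decomposition with the Metis package and compressed row storage for large sparse matrices, a parallel finite element program is developed that solves the finite element equations with the preconditioned conjugate gradient method. A three-dimensional geological model is simulated, and the computed displacement field of the near-fault seismic hazard field for the Zhangdian-Renhe fault shows that the parallel program achieves a good speedup.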
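The two numerical ingredients named in this abstract, compressed row storage and the preconditioned conjugate gradient method, can be sketched in a few lines. This is a serial illustration only, not the authors' program: the Jacobi (diagonal) preconditioner, the function names, and the 3×3 test matrix are assumptions, and the Metis partitioning step is left out.

```python
def to_csr(dense):
    """Store a dense matrix row by row as CSR arrays: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(float(a))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(csr, x):
    """Multiply a CSR matrix by a vector, touching only the stored nonzeros."""
    values, col_idx, row_ptr = csr
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(csr, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with a diagonal preconditioner (an assumed choice)."""
    values, col_idx, row_ptr = csr
    n = len(b)
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]
    x = [0.0] * n
    r = list(b)                                   # residual for the zero initial guess
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = csr_matvec(csr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric positive definite system; b equals A times [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
csr = to_csr(A)
x = jacobi_pcg(csr, b)
```

In a distributed version each process would hold the CSR rows of its own subdomain, and the dot products and matrix-vector products would need communication between neighbouring subdomains.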

3.

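Parallel implementation of staged domain decomposition for three-dimensional variational assimilation of meteorological data (total citations: 2; self-citations: 0; citations by others: 2)

张卫民 朱小谦 赵军 《计算机研究与发展》2005,42(6):1059-1064

Because it markedly improves assimilation quality, variational assimilation is becoming the mainstream assimilation method in numerical weather prediction. This paper studies parallel computation for three-dimensional variational assimilation and proposes staged domain decomposition, an adaptive partitioning algorithm for observation data, matrix transposition and halo-region communication overlapped with computation, and a file I/O method. On this basis an MPI-parallel three-dimensional variational prototype system was implemented; on a Linux cluster of 8 dual-CPU nodes it reached a parallel speedup of 11.9.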
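To put the reported speedup in context, the short sketch below converts it into parallel efficiency and into the Karp-Flatt experimentally determined serial fraction. It assumes that all 16 CPUs (8 nodes × 2 CPUs) took part in the run that produced the speedup of 11.9, which the abstract does not state explicitly; the function names are illustrative.

```python
def parallel_efficiency(speedup, n_procs):
    """Efficiency is speedup divided by the number of processors used."""
    return speedup / n_procs


def karp_flatt(speedup, n_procs):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


n_procs = 8 * 2    # assumption: every CPU of the 8 dual-CPU nodes was used
speedup = 11.9     # value reported in the abstract
eff = parallel_efficiency(speedup, n_procs)
serial_fraction = karp_flatt(speedup, n_procs)
print(f"efficiency = {eff:.3f}, serial fraction = {serial_fraction:.3f}")
```

Under that assumption the efficiency is about 74%, and the implied serial-plus-overhead fraction is a little over 2%.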

4.

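A review of research on parallelized flood routing simulation

李健 张大伟 姜晓明 向立云 《计算机工程与应用》2021,57(13):1-7

In recent years, parallelized flood routing simulation has developed rapidly and plays an important role in flood control and disaster mitigation. Taking into account the numerical methods, parallel modes, and programming techniques of flood routing models, the paper selects several representative models, analyzes the technical details of homogeneous and heterogeneous parallel flood routing models, and identifies the technical difficulties of developing parallel models together with ways to resolve them. Finally, it proposes where future development of parallel flood routing models should focus: heterogeneous parallelism for unstructured-grid models...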

5.

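Application of MPI parallel debugging and optimization strategies to gas-kinetic-theory numerical simulation of three-dimensional flows around bodies

徐金秀 李志辉 尹万旺 《计算机科学》2012,39(5):300-303

Starting from a numerical simulation program that solves the Boltzmann model equation for three-dimensional flows around bodies, the authors study a domain-decomposition parallel strategy, introduce input/output, communication, and cache optimizations, and port the program to MPI with high-performance tuning. Using the flow around a vehicle in the high-altitude rarefied transitional regime as a test case, large-scale MPI parallel runs confirm the correctness of the MPI domain-decomposition strategy and of the program optimizations. The study shows that the parallel implementation markedly shortens the model's computing time and achieves good results.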

6.

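Parallel optimization strategies for three-dimensional ship sono-elasticity simulation software

《计算机科学与探索》2019,13(11):1852-1863

Three-dimensional sono-elasticity theory and its computational methods provide a theoretical basis for the fluid-structure-coupled vibration, acoustic radiation, and ocean sound propagation of elastic floating marine structures, and are highly influential in research on such structures. Based on the computational-density characteristics of the different computing stages of three-dimensional sono-elasticity, multi-level parallelization and optimization of the three-dimensional sono-elasticity application software (THAFTS-Acoustic) was completed on the Sunway TaihuLight supercomputer. The optimization techniques include loop fission, loop fusion, direct memory access (DMA), mutual hiding of communication and computation, and Sunway TaihuLight vectorization (SIMD). Tests show that the multi-level heterogeneous parallel implementation has good MPI scalability and many-core acceleration: the core section is accelerated by up to 18 times, and with 64 processes the parallel program as a whole runs 5.5 times faster than the original program. The implementation makes effective use of the computing power of Sunway TaihuLight and further supports THAFTS-Acoustic in parallel computations of very large scale and higher precision.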

7.

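COUPL+: a parallel PDE solver library

陈江 赵永华 迟学斌 《计算机工程》2005,31(22):58-60,94

COUPL+ is a parallel library based on the message-passing model. It encapsulates within its functions the data partitioning and the message-passing calls that a parallel program has to handle, which simplifies the task of writing mesh-based applications for distributed-memory parallel machines. The paper briefly introduces the basic principles of COUPL+ and compares its features with those of MPI, OpenMP, and HPF. It also uses COUPL+ to implement two tasks common in parallel computing, the conjugate gradient method and structured-grid computation, and compares the performance with that obtained using MPI and HPF.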

8.

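Parallel implementation of Monte Carlo simulation and a study of its parallel efficiency

《计算机应用与软件》2018,(1)

Monte Carlo (MC) simulation is widely used in nuclear engineering and nuclear safety calculations, but when a high confidence level is required the computation is heavy and the turnaround long, which makes engineering schedules hard to meet. After analyzing the serial algorithm, and targeting the architecture of the large SMP server Oracle M9000, the authors parallelized it with OpenMP and ran experiments. The results show that multithreaded parallelism suits both the Monte Carlo method and the M9000 architecture, delivers very high speedup, and gives parallel results identical to the serial ones. This offers a way to meet the high-confidence, short-turnaround demands of engineering computation.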
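One common way to make a parallel Monte Carlo run reproduce the serial result exactly is to tie each random stream to a work chunk rather than to a thread. The sketch below illustrates that idea on a toy problem (estimating π); it is not the paper's nuclear-engineering code, and it uses Python threads only as a stand-in for OpenMP threads. The chunking scheme, the seeds, and the function names are assumptions. In CPython the threads will not give a real speedup for this loop; the point is the determinism.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, n_samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)   # private generator: the stream depends only on the chunk
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_chunks, samples_per_chunk, workers):
    """Split the samples into fixed chunks and process them with a pool of workers."""
    seeds = range(n_chunks)     # illustrative seeds, one per chunk
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, seeds, [samples_per_chunk] * n_chunks))
    return 4.0 * hits / (n_chunks * samples_per_chunk)


serial = estimate_pi(n_chunks=8, samples_per_chunk=2000, workers=1)
parallel = estimate_pi(n_chunks=8, samples_per_chunk=2000, workers=4)
print(serial, parallel, serial == parallel)
```

Because the hit counts are integers and each chunk always sees the same random stream, the sum does not depend on how many workers run or in what order they finish. A production code would use a generator designed for independent parallel streams rather than consecutive integer seeds.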

9.

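PHG-Solid: parallel adaptive finite element software for three-dimensional structural analysis

成杰 张林波 《计算机科学》2012,39(5):278-281

This paper introduces PHG-Solid, an open-source parallel adaptive finite element code for three-dimensional structural analysis developed by the authors. It is built on the parallel adaptive finite element platform PHG and supports parallel adaptive finite element analysis on purely three-dimensional structures. Compared with existing commercial and open-source structural finite element software, the features and advantages of PHG-Solid are: 1) it supports fully automatic, highly parallel adaptive finite element computation; 2) it solves large-scale problems robustly and efficiently, with good scalability in problem size; 3) it is easy to extend, and users can add computing modules as needed. Several large numerical examples demonstrate the software's computing capability and parallel scalability; the largest problem exceeds 500 million degrees of freedom, and the largest run uses 1024 MPI processes.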

10.

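LBM computational fluid dynamics software for ultra-large-scale parallel simulation

吕小敬 刘钊 褚学森 石树鹏 孟虹松 黄震春 《计算机科学》2020,47(4):13-17

The lattice Boltzmann method (LBM) is a computational fluid dynamics method based on the mesoscopic simulation scale and is widely used in theoretical research and engineering. Improving the parallel simulation capability of LBM fluid software is an important topic in high-performance computing and its applications. Based on the Sunway TaihuLight supercomputer, this work designs and implements an efficient and scalable LBM computational fluid dynamics code....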

11.

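A parallelization method for isogeometric analysis using computational domain decomposition

郭利财 黄章进 顾乃杰 《小型微型计算机系统》2013,34(6)

A parallelization method based on decomposing the computational domain is proposed for building the stiffness matrix and right-hand-side vector in isogeometric analysis. The computational domain is decomposed into several disjoint subdomains and each subdomain is assigned a processor. All processors compute on their subdomains in parallel, and once they have finished, a fast merge algorithm assembles the linear system. Experiments show that the method reaches a speedup of 6.46 on an 8-core machine and computes 6.8 million matrix entries in about 4 seconds. Using the Intel MKL sparse solver for the linear system, the isogeometric analysis solver handles 520,000 degrees of freedom in about 10 seconds, on the order of ten thousand times faster than ISOGAT.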
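The assemble-per-subdomain-then-merge pattern described in this abstract can be illustrated on a much simpler problem. The sketch below swaps in 1D linear finite elements on the unit interval for the paper's isogeometric basis, and a plain dictionary sum for its fast merge algorithm; both substitutions, and all function names, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_subdomain(elements, h):
    """Assemble stiffness contributions of 1D linear elements into a local sparse dict."""
    local = {}
    k = 1.0 / h
    for e in elements:
        nodes = (e, e + 1)                    # element e connects nodes e and e+1
        for a, i in enumerate(nodes):
            for c, j in enumerate(nodes):
                entry = k if a == c else -k   # element matrix (1/h) * [[1, -1], [-1, 1]]
                local[(i, j)] = local.get((i, j), 0.0) + entry
    return local


def merge(parts):
    """Sum the local contributions; entries on subdomain interfaces receive two terms."""
    total = {}
    for part in parts:
        for key, val in part.items():
            total[key] = total.get(key, 0.0) + val
    return total


def assemble_parallel(n_elements, n_subdomains):
    h = 1.0 / n_elements
    size = -(-n_elements // n_subdomains)     # ceiling division
    chunks = [list(range(s, min(s + size, n_elements)))
              for s in range(0, n_elements, size)]
    with ThreadPoolExecutor(max_workers=n_subdomains) as pool:
        parts = list(pool.map(assemble_subdomain, chunks, [h] * len(chunks)))
    return merge(parts)


K = assemble_parallel(n_elements=8, n_subdomains=4)
K_serial = assemble_parallel(n_elements=8, n_subdomains=1)
```

The subdomains share no elements, so the workers never write to the same local dictionary; only the merge step touches entries that belong to nodes on a subdomain boundary.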

12.

A Parallel Implementation of the Katsevich Algorithm for 3-D CT Image Reconstruction

Junjun Deng Hengyong Yu Jun Ni Tao He Shiying Zhao Lihe Wang Ge Wang 《The Journal of supercomputing》2006,38(1):35-47

Yu and Wang [1, 2] implemented the first theoretically exact spiral cone-beam reconstruction algorithm developed by Katsevich [3, 4]. This algorithm is computationally expensive when the amount of data becomes large. Here we study a parallel computing scheme for the Katsevich algorithm to facilitate the image reconstruction. Based on the proposed parallel algorithm, several numerical tests are conducted on a high performance computing (HPC) cluster with thirty-two 64-bit AMD-based Opteron processors. The standard phantom data [5] is used to establish the performance benchmarks. The results show that our parallel algorithm significantly reduces the reconstruction time, achieving high speedup and efficiency.

13.

Multibody Analysis of Controlled Aeroelastic Systems on Parallel Computers (total citations: 1; self-citations: 0; citations by others: 1)

Quaranta Giuseppe Masarati Pierangelo Mantegazza Paolo 《Multibody System Dynamics》2002,8(1):71-102

The paper describes the application of parallel techniques to a multibody multidisciplinary formulation. The problem is stated in terms of a system of nonlinear Differential-Algebraic Equations (DAE). The parallel solution is obtained using a sub-structuring domain decomposition method that is able to exploit the characteristic quasi-monodimensional topology that multibody models usually present. The presence of explicit constraints in the form of algebraic equations requires particular care in the treatment of the related unknowns, to avoid local singularity problems. The code has been successfully tested on different computer architectures. Special attention has been dedicated to producing a code that will work efficiently on a cluster of PCs. Results of three test problems, regarding the simulation of a nonlinear beam bending and of complex aeroservomechanical systems such as a helicopter rotor and a tiltrotor aircraft, are presented.

14.

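Parallel design for partitioned-domain computation problems

袁国兴 杨朝霞 莫则尧 《数值计算与计算机应用》2001,22(1):22-28

1. Introduction. In many important research fields numerical simulation is highly complex. Its results depend on the choice of numerical method, the quality of the computational mesh, the boundary treatment, and so on, and the complexity shows up in the physical properties, the mathematical model, and the irregular geometry of the computational domain. The physical properties of different parts of the domain may differ greatly, as in multi-material fluid flow where the flow field varies unevenly, very gently in some parts and extremely sharply in others. Alternatively, the domain may be highly irregular, as in flow-field computation for inlet systems in aerodynamics or numerical analysis of flow around complex shapes. In such cases, computing over the whole domain at once not only makes it hard to describe the flow-field variation accurately, but is also limited by computer speed, …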

15.

A novel parallel approach for 3D seismological problems

《国际计算机数学杂志》2012,89(15):2047-2060

The large spatial scale associated with the modelling of strong ground motion in three dimensions requires enormous computational resources. For this reason, the simulation of soil shaking requires high-performance computing. The aim of this work is to present a new parallel approach for this kind of problem based on the domain decomposition technique. The main idea is to subdivide the original problem into local ones, which makes it possible to investigate large-scale problems that cannot be solved by a serial code. The performance of our parallel algorithm has been examined by analysing computational times, speed-up and efficiency. Results of this approach are shown and discussed.

16.

A Parallel Particle Tracking Framework for Applications in Scientific Computing

Jing-Ru C. Cheng Paul E. Plassmann 《The Journal of supercomputing》2004,28(2):149-164

Particle tracking methods are a versatile computational technique central to the simulation of a wide range of scientific applications. In this paper, we present a new parallel particle tracking framework for applications in scientific computing. The framework includes the in-element particle tracking method, which is based on the assumption that particle trajectories are computed from problem data localized to individual elements, as well as the dynamic partitioning of particle-mesh computational systems. The ultimate goal of this research is to develop a parallel in-element particle tracking framework capable of interfacing with ordinary differential equation (ODE) solvers of different orders of accuracy. The parallel efficiency of such particle-mesh systems depends on the partitioning of both the mesh elements and the particles; this distribution can change dramatically because of movement of the particles and adaptive refinement of the mesh. To address this problem we introduce a combined load function that is a function of both the particle and mesh element distributions. We present experimental results that detail the performance of this parallel load balancing approach for a three-dimensional particle-mesh test problem on an unstructured, adaptive mesh, and demonstrate the ability to interface with different ODE solvers.
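A combined load function of this kind can be sketched as follows. The abstract does not give the form of the function, so the weighted sum used here, the weights `w_element` and `w_particle`, and the greedy largest-first assignment are all assumptions chosen only to show how particle counts and element counts can be balanced together.

```python
import heapq


def element_load(n_particles, w_element=1.0, w_particle=1.0):
    """Hypothetical combined load: a fixed cost per element plus a cost per particle."""
    return w_element + w_particle * n_particles


def partition(particles_per_element, n_parts, w_element=1.0, w_particle=1.0):
    """Greedy balancing: hand the heaviest remaining element to the lightest part."""
    loads = [element_load(c, w_element, w_particle) for c in particles_per_element]
    order = sorted(range(len(loads)), key=lambda e: -loads[e])
    heap = [(0.0, p) for p in range(n_parts)]   # (current load, part id)
    owner = [None] * len(loads)
    for e in order:
        load, p = heapq.heappop(heap)
        owner[e] = p
        heapq.heappush(heap, (load + loads[e], p))
    part_loads = [0.0] * n_parts
    for load, p in heap:
        part_loads[p] = load
    return owner, part_loads


# Eight elements; most particles sit in element 0, some in elements 4 and 5.
counts = [10, 0, 0, 0, 5, 5, 0, 0]
owner, part_loads = partition(counts, n_parts=2)
print(owner, part_loads)
```

Because particles move and the mesh is refined adaptively, such a partition would have to be recomputed as the simulation evolves, which is the dynamic partitioning the abstract refers to.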

17.

Parallel computing for lattice Monte Carlo simulation of large-scale thin film growth

舒继武 郑纬民 陆勤 黄汉臣 黄伟安 《中国科学F辑(英文版)》2002,45(2):103-110

This paper proposes two viable computing strategies for distributed parallel systems: domain division with sub-domain overlapping and asynchronous communication. We have implemented a parallel computing procedure for simulating the Ti thin film growth process of a system with 1000 × 1000 atoms by means of the Monte Carlo (MC) method. This approach greatly reduces the computation time for simulation of large-scale thin film growth under realistic deposition rates. The multi-lattice MC model of deposition comprises two basic events: deposition and surface diffusion. Since diffusion constitutes more than 90% of the total simulation time of the whole deposition process at high temperature, we concentrated on implementing a new parallel diffusion simulation that reduces communication time during simulation. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. The parallel algorithms we propose can simulate the thin

18.

SCMP: A Single-Chip Message-Passing Parallel Computer

Baker James M. Gold Brian Bucciero Mark Bennett Sidney Mahajan Rajneesh Ramachandran Priyadarshini Shah Jignesh 《The Journal of supercomputing》2004,30(2):133-149

As technology improves and transistor feature sizes continue to shrink, the effects of on-chip interconnect wire latencies on processor clock speeds will become more important. In addition, as we reach the limits of instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that can efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest neighbor connections. Memory is included on-chip with the processors and the architecture includes hardware support for communication and the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires will be determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.
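The 2-D mesh with nearest-neighbor connections mentioned in this abstract can be made concrete with a small sketch. It assumes the 64 processors are laid out as an 8 × 8 grid and numbered row by row, neither of which the abstract specifies; the function names are illustrative.

```python
def mesh_neighbors(rank, rows=8, cols=8):
    """Return the directly wired neighbours of a node in a 2-D mesh without wraparound."""
    r, c = divmod(rank, cols)
    out = []
    if r > 0:
        out.append(rank - cols)   # node above
    if c > 0:
        out.append(rank - 1)      # node to the left
    if c < cols - 1:
        out.append(rank + 1)      # node to the right
    if r < rows - 1:
        out.append(rank + cols)   # node below
    return out


def hop_count(src, dst, cols=8):
    """Minimum number of links a message crosses: the Manhattan distance in the grid."""
    r1, c1 = divmod(src, cols)
    r2, c2 = divmod(dst, cols)
    return abs(r1 - r2) + abs(c1 - c2)


print(mesh_neighbors(0))    # corner node
print(mesh_neighbors(9))    # interior node
print(hop_count(0, 63))     # opposite corners of the 8 x 8 grid
```

Every link joins adjacent processors, so wire length depends on the size of one processor tile rather than on the size of the chip, which is the property the abstract emphasizes.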

19.

Solution-Domain-Decomposition Method for Heat Transfer Problem Using Parallel Distributed Computing

Seyoung Oh Seungho Paik Hoa D. Nguyen 《Journal of scientific computing》1997,12(2):187-204

The solution-domain-decomposition (SDD) method is formulated for solving heat transfer problems and generalized for solving multi-domain problems. A generalized algorithm is suggested for parallel and distributed computation. Chebyshev expansion of the dependent variables is used for pseudospectral approximation of the governing equation in this study. The linear superposition principle is adapted to incorporate the interactions between the subdomains. By effective subdivision of the computational domain, significant computational efficiency and memory savings are accomplished without losing spectral accuracy of the solution. Owing to the independent characteristics of the subdomains, the scheme is well suited for multi-processor machines. A convergence study reveals that spectral accuracy is still conserved for the multi-domain calculation. The calculation domain is divided into up to 8 subdomains and the calculation is distributed across independent CPUs. A significant speed-up ratio is obtained by distributing the subtasks through the network.