Similar literature
 20 similar documents found (search time: 15 ms)
1.
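Parallel finite element computation for elastoplastic problems using domain decomposition is studied in an MPI cluster environment. An algorithm is proposed that integrates the stress-strain relation with third- and fourth-order Runge-Kutta methods. The substep size is adjusted automatically during integration to control the integration error. A parallel substructure preconditioned conjugate gradient solver using minimal residual smoothing is developed. The algorithm is implemented in a parallel environment based on a workstation cluster. The computed results show that the algorithm achieves good parallel speedup and efficiency and is an effective parallel solution algorithm.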
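The error-controlled substepping mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' scheme: it uses step doubling with the classical fourth-order Runge-Kutta method in place of a third/fourth-order pair, and the scalar rate function `f`, the tolerance `tol`, and the step-size limits are hypothetical choices.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate_adaptive(f, t0, t1, y0, tol=1e-8):
    """Integrate from t0 to t1 with automatically sized substeps.

    Each trial substep is computed twice (one step of size h, two steps of
    size h/2); the difference serves as the local error estimate.
    Returns the final value and the number of accepted substeps.
    """
    t, y = t0, y0
    h = t1 - t0  # start by trying the whole increment as one substep
    accepted = 0
    while t1 - t > 1e-14:
        h = min(h, t1 - t)
        full = rk4_step(f, t, y, h)
        half = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)
        err = abs(half - full)
        if err <= tol:
            # Accept the more accurate two-half-step result.
            t, y = t + h, half
            accepted += 1
        # Rescale the substep (exponent 1/5 for a fourth-order method),
        # with a safety factor and limits on growth and shrinkage.
        factor = 0.9 * (tol / err) ** 0.2 if err > 0 else 2.0
        h *= min(2.0, max(0.1, factor))
    return y, accepted


if __name__ == "__main__":
    # Toy rate equation dy/dt = -y, standing in for a stress-strain rate law.
    y, n = integrate_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0)
    print(y, math.exp(-1.0), n)
```

In a stress update, `y` would be the stress state and the interval from `t0` to `t1` a pseudo-time over one strain increment; the scalar version above only shows the accept/reject logic.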

2.
《Computers & Structures》1987,26(4):551-559
The development of general-purpose finite element computer software systems has provided the capability to analyze a wide range of linear and non-linear structural problems. However, these software systems are severely limited for non-linear response calculations by the speed available on current sequential computers. Recent and projected advances in parallel multiple instruction multiple data (MIMD) computers provide an opportunity for significant gains in computing speed and for broadening the range of structural problems which may be solved. The key to these gains is the effective selection and implementation of algorithms which exploit parallel computing. This paper documents experience in solving transient response calculations on an experimental MIMD computer, termed the Finite Element Machine. The paper describes the algorithm used, its implementation for parallel computations, and results for representative one- and two-dimensional dynamic response test problems. The results show computation speedups of up to 7.83 for eight processors, and indicate that significant speedups of solution time are possible for non-linear dynamic response calculations through the use of many processors and appropriate parallel integration algorithms. The results are extremely encouraging and suggest that significant speedups in structural computations can be achieved through advances in parallel computers.
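To put the reported speedup of 7.83 on eight processors in context, the short sketch below computes the corresponding parallel efficiency (speedup divided by processor count, about 97.9% here) and the Karp-Flatt experimentally determined serial fraction. Only the two input numbers come from the abstract; the function names are illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Efficiency = speedup / number of processors."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup, processors):
    """Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    s, p = 7.83, 8  # values reported in the abstract
    print(f"efficiency      = {parallel_efficiency(s, p):.1%}")
    print(f"serial fraction = {karp_flatt_serial_fraction(s, p):.4f}")
```

A serial fraction this small is what one would expect for an explicit transient algorithm whose work is dominated by independent element-level computations.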

3.
Ma Zhiqiang, Lou Yunfeng, Li Junjie, Jin Xianlong 《Engineering with Computers》2020,36(2):443-453
The finite element analysis of complex structures often requires a refined mesh in some local domain. To reduce the computation time, an explicit asynchronous step...

4.
Desktop PCs offer a great opportunity to increase the performance of multimedia data processing by exploiting task- and data-parallelism on multi-core CPUs and many-core GPUs. This paper presents a high-performance parallel implementation of the 2D DCT in this heterogeneous computing environment. For this purpose, Intel TBB (Threading Building Blocks) and OpenCL (Open Computing Language) are used for task- and data-parallelism, respectively. The simulation results show that the parallel DCT implementations far outperform the serial ones in processing speed. In particular, the OpenCL implementation shows a linear speedup as the 2D data sets grow, a typical SIMD characteristic.
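The data-parallel structure of a 2D DCT can be illustrated with the separable row-column form: every row transform is independent of the others, and so is every column transform. The sketch below is an assumption-laden stand-in, not the paper's TBB or OpenCL code: it uses the orthonormal DCT-II and a Python thread pool purely to mark where the independent work items are (in standard CPython, threads will not actually speed up this CPU-bound loop).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(block, workers=4):
    """Separable 2D DCT: transform all rows, then all columns.

    Each call to dct_1d is an independent work item, which is what a
    data-parallel implementation distributes across cores or GPU lanes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(dct_1d, block))
        cols = list(pool.map(dct_1d, zip(*rows)))  # zip(*rows) yields columns
    # cols[j][k] holds vertical frequency k of column j; transpose back.
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]
    for row in dct_2d(flat):
        print([round(v, 6) for v in row])
```

For a constant block, all the energy ends up in the top-left (DC) coefficient, which is a quick sanity check on the normalization.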

5.
The mpC language, designed specifically for programming high-performance computations on heterogeneous networks, is described. An mpC program explicitly defines an abstract computing network and distributes data, computations, and communications over it. At runtime, the mpC programming environment uses this information, together with information about the actual network, to distribute the processes over the actual network so as to execute the program in the most efficient way. Experience in using mpC for solving problems on local networks consisting of heterogeneous workstations is discussed.

6.
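Liu Yu, Yuan Hongchun, Liang Zheng 《计算机应用》 (Journal of Computer Applications) 2008,28(2):279-282
In a heterogeneous local area network with multiple operating systems, Message Passing Interface (MPI) programs built for different operating environments lack interoperability, which makes it difficult for parallel finite-difference time-domain (FDTD) computation to make full use of the computing resources on the network. To address this, the application-layer Server Message Block (SMB) protocol is used to implement heterogeneous FDTD computation, and in-memory file access, memory-mapped arrays, and redundant computation are introduced to mitigate the effect of SMB communication latency on parallel performance. Numerical simulation examples verify the feasibility and correctness of the new method; the resulting speedup, parallel efficiency, and other performance metrics are roughly comparable to those of the conventional homogeneous MPI message-passing approach.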

7.
The goal of this paper is to explore parallel methodologies with the desired flexibility, generality and accuracy for nonlinear dynamic finite element analysis on massively parallel computers. This paper tests the generality of the concurrent element processing approach and proposes a basic software design strategy to take full advantage of features available in massively parallel computers having a hierarchical ring architecture. As a testbed, a large-scale general-purpose code, DYNA3D, was used and modified as appropriate to test the proposed parallel design concepts on a KSR1 parallel computer.

8.
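Parallel computing is becoming a new trend in scientific and engineering computation. A parallel finite element method using domain decomposition is applied in the distributed parallel environment of a workstation cluster. A parallel conjugate gradient algorithm based on element-wise domain decomposition is proposed. A dam structure is solved on the workstation cluster, and the parallel performance is analyzed.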
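For readers unfamiliar with the conjugate gradient iteration that this abstract parallelizes, here is a minimal serial sketch. It is not the authors' algorithm: the `matvec` callback, the tolerance, and the small test matrix are assumptions. The comments mark the two places where a domain-decomposed version would need communication.

```python
def dot(a, b):
    # In a distributed version this is a global reduction over all subdomains.
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only matvec."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        # In a distributed version each subdomain applies its own elements,
        # then interface contributions are exchanged with neighbours.
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    mv = lambda v: [dot(row, v) for row in a]
    print(conjugate_gradient(mv, [1.0, 2.0]))
```

Because the solver touches the matrix only through `matvec`, the same loop works whether the product is formed from an assembled matrix or accumulated element by element within each subdomain.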

9.
In order to exploit the efficient computing power of many integrated cores on heterogeneous clusters, a multi-level and multi-granularity collaborative parallel computing method is proposed for finite element structural mechanical analysis. Computing tasks are divided into three levels: inter-node parallelism, inter-device parallelism and inter-core parallelism. By mapping decomposable computing jobs to different hardware layers of a heterogeneous MIC system, the proposed method not only effectively resolves the load balancing problem between CPU and MIC devices, but also significantly reduces the communication overhead of the system. Experiments on different engineering simulation cases for large-scale parallel computing were conducted on the “Tianhe 2” supercomputer. Up to 39000 CPU+MIC cores were employed and the finite element size of the analysis was more than 100 million units. Test results show that the proposed method can achieve good speedup and parallel computing efficiency in large-scale parallel computing of finite element structural analysis. The optimized adaptation of finite element structural analysis to the heterogeneous MIC computing platform is realized, which can provide a reference for parallel porting and performance optimization of similar applications.

10.
11.
Concepts and implementation of parallel finite element analysis   Total citations: 1, self-citations: 0, citations by others: 1
The design of complex engineering systems such as advanced aircraft structures and offshore platforms requires continually increasing levels of detail in supporting analysis. The finite element method is widely used as a computational method with which to model physical systems in various engineering problems. For detailed analyses of complex designs, structural models composed of several thousands of degrees of freedom are no longer uncommon. Such design activities require large order finite element and/or finite difference models and excessive computation demands in both calculation speed and information management. The computer simulation of the nonlinear dynamic response of structures and the implementation of parallel FEM systems on a high speed multiprocessor have received considerable attention in recent years. The driving forces of these activities included the reliable simulation of automotive and aircraft crash phenomena, and the increased performance of computers. Most existing major structural analysis software systems were designed 10–20 years ago and have been optimized for current sequential computers. Such systems often are not well structured to take maximum advantage of the recent and continuing revolution in parallel vector computing capabilities. These parallel vector computer architectures not only occur in the form of large supercomputers, but are now also occurring for minicomputers and even engineering workstations. To benefit from advances in parallel computers, software must be developed which takes maximum advantage of the parallel processing feature.

12.
Optimum structural design with parallel finite element analysis   Total citations: 3, self-citations: 0, citations by others: 3
Structural analysis is an important part of the optimum structural design process. Therefore, extra effort should be devoted to making this part as efficient as possible. Since finite element analysis is the most powerful and widely used tool in the structural analysis field, a new method for structural optimization by the parallel finite element method is presented in this paper. This method divides the original structure into several substructures and assigns each substructure to one processor. Each processor handles its finite element calculation independently with limited communication between processors. Some numerical examples on the Cray X-MP multiprocessor system with their obtained speedups are presented.

13.
Masonry shear walls containing reinforcement at spacings between 800 mm and 2000 mm, referred to as wide spaced reinforced masonry (WSRM), are modelled by explicit finite element analysis (FEA) using macroscopic material characteristics for the unreinforced masonry (URM) panels and damaged concrete plasticity for the grouted cores containing reinforcement. The material model and some basic principles of the explicit finite element algorithm are briefly discussed. It is shown that by minimising the kinetic energy and using an appropriate time scaling and/or damping factor, the FEA can provide reasonable and efficient predictions of the behaviour of the WSRM and URM shear walls.

14.
Energy-efficient resource allocation within clusters and data centers is important because of the growing cost of energy. We study the problem of energy-constrained dynamic allocation of tasks to a heterogeneous cluster computing environment. Our goal is to complete as many tasks by their individual deadlines and within the system energy constraint as possible given that task execution times are uncertain and the system is oversubscribed at times. We use Dynamic Voltage and Frequency Scaling (DVFS) to balance the energy consumption and execution time of each task. We design and evaluate (via simulation) a set of heuristics and filtering mechanisms for making allocations in our system. We show that the appropriate choice of filtering mechanisms improves performance more than the choice of heuristic (among the heuristics we tested).

15.
We present a high-order method employing Jacobi polynomial-based shape functions, as an alternative to the typical Legendre polynomial-based shape functions in solid mechanics, for solving dynamic three-dimensional geometrically nonlinear elasticity problems. We demonstrate that the method has an exponential convergence rate spatially and a second-order accuracy temporally for the four classes of problems of linear/geometrically nonlinear elastostatics/elastodynamics. The method is parallelized through domain decomposition and message passing interface (MPI), and is scaled to over 2000 processors with high parallel performance.

16.
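GPUs and integrated CPU-GPU architectures, with their strong parallel processing capability and programmable pipelines, have become a research focus in the database field. To fully exploit the parallel computing capability of heterogeneous platforms and improve the query performance of column-store systems, and building on a study of the structural characteristics of heterogeneous platforms, a data partitioning strategy for joins on GPU multithreaded platforms, ICMD (Improved CMD), is first proposed, in which GPU stream processors handle the joins over the individual subspaces in parallel. A task evaluation and allocation model is then used to distribute the query load dynamically, so that query operations execute efficiently in parallel on multi-core CPUs and GPUs. An on-chip global synchronization mechanism and local memory reuse are also used to optimize the ICMD join algorithm. Finally, tests with the SSB benchmark show that parallel join queries on the Intel® HD Graphics 4600 platform achieve a 35% performance improvement over the CPU version and an 18% improvement over the GPU query engine Ocelot.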

17.
A novel finite element methodology capable of analyzing the geometrically nonlinear behavior of thin-walled framed structures composed of non-prismatic members is developed. The pertinent element matrices are formulated on the basis of a modified version of the variational theorem of Hellinger and Reissner. Finite geometry changes are consistently described by using an updated Lagrangian (U.L.) formulation. The validity, accuracy and reliability of the proposed scheme are examined on the basis of several well-selected test examples.

18.
Existing procedures for nonlinear finite element analysis are reviewed. Common computational steps among existing methods are identified. Parallel-vector solution strategies for the generation and assembly of element matrices, solution of the resulting system of linear equations, calculations of the unbalanced loads, displacements and stresses are all incorporated into the Newton-Raphson (NR), modified Newton-Raphson (mNR), and BFGS methods. Furthermore, a mixed parallel-vector Choleski-Preconditioned Conjugate Gradient (C-PCG) equation solver is also developed and incorporated into the piecewise linear procedure for nonlinear finite element analysis. Numerical results have indicated that the Newton-Raphson method is the most effective nonlinear procedure and the mixed C-PCG equation solver offers substantial computational advantages in a parallel-vector computer environment.
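The difference between the Newton-Raphson (NR) and modified Newton-Raphson (mNR) procedures named in this abstract can be shown on a one-degree-of-freedom problem. The sketch below is illustrative only: the softening spring law `f_int(u) = 10u - u^2`, the load of 9, and the tolerances are assumptions chosen so that both variants converge to u = 1. NR re-forms the tangent stiffness at every iteration, while mNR keeps the initial tangent and pays for that with more iterations.

```python
def solve_equilibrium(f_int, tangent, load, u0=0.0, modified=False,
                      tol=1e-10, max_iter=200):
    """Find u with f_int(u) = load by NR (modified=False) or mNR (True).

    Returns the solution and the number of displacement updates performed.
    """
    u = u0
    k = tangent(u)  # initial tangent stiffness, reused throughout by mNR
    for updates in range(max_iter):
        residual = load - f_int(u)  # unbalanced load
        if abs(residual) <= tol:
            return u, updates
        if not modified:
            k = tangent(u)  # full NR: re-form the tangent every iteration
        u += residual / k
    raise RuntimeError("no convergence within max_iter updates")


if __name__ == "__main__":
    f_int = lambda u: 10.0 * u - u * u      # hypothetical softening spring
    tangent = lambda u: 10.0 - 2.0 * u      # its derivative
    print("NR :", solve_equilibrium(f_int, tangent, 9.0))
    print("mNR:", solve_equilibrium(f_int, tangent, 9.0, modified=True))
```

In a real finite element code the tangent is a large matrix that must be assembled and factorized, so the trade-off between fewer iterations and fewer factorizations depends on problem size and hardware.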

19.
《Parallel Computing》1997,23(9):1365-1377
A finite element fluid analysis code, which is based on the matrix-storage free formulation and the element-by-element computation strategy, is developed. The code has reduced memory requirements due to the matrix-storage free formulation. Simulations involving one million elements can be carried out with less than 208 Mbytes of memory. The code is implemented on the massively parallel computers, KSR1 and CRAY T3D. In the case of KSR1, high parallel efficiency is achieved, i.e. 95.9% with 16 CPUs. In the case of T3D, excellent scalability is achieved. Each time step of a 3D cavity flow problem with one million elements required 36.3, 18.7 and 9.8 s of CPU time by using 32, 64 and 128 processors, respectively.
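The idea behind a matrix-storage free, element-by-element strategy is that the product of the global matrix with a vector is accumulated from element contributions, so the global matrix is never stored. The following sketch shows this for a hypothetical chain of two-node bar elements; the element type, connectivity, and stiffness values are assumptions for illustration and have nothing to do with the fluid formulation of the paper.

```python
def ebe_matvec(elements, stiffness, u):
    """Compute y = K u element by element without assembling K.

    elements  : list of (i, j) node index pairs, one per element
    stiffness : list of element stiffness values k, one per element
    u         : nodal vector
    """
    y = [0.0] * len(u)
    for (i, j), k in zip(elements, stiffness):
        # Local 2x2 matrix k * [[1, -1], [-1, 1]] applied to (u[i], u[j]).
        f = k * (u[i] - u[j])
        y[i] += f
        y[j] -= f
    return y


if __name__ == "__main__":
    elements = [(0, 1), (1, 2), (2, 3)]
    stiffness = [1.0, 1.0, 1.0]
    print(ebe_matvec(elements, stiffness, [0.0, 1.0, 2.0, 3.0]))
```

Memory use is proportional to the number of elements rather than to the number of nonzeros in an assembled matrix, and the loop over elements can be split across processors, with only the shared nodes requiring communication.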

20.
To efficiently execute a finite element program on a 2D torus, we need to map nodes of the corresponding finite element graph to processors of a 2D torus such that each processor has approximately the same amount of computational load and the communication among processors is minimized. If nodes of a finite element graph do not increase during the execution of a program, the mapping only needs to be performed once. However, if a finite element graph is solution-adaptive, that is, nodes of a finite element graph increase discretely due to the refinement of some finite elements during the execution of a program, a dynamic load-balancing algorithm has to be performed many times in order to balance the computational load of processors while keeping the communication cost as low as possible. In this paper we propose a parallel dynamic load-balancing algorithm (LB) to deal with the load imbalance problem of a solution-adaptive finite element program on a 2D torus. The algorithm uses an iterative approach to achieve load balancing. We have implemented the proposed algorithm along with two parallel mapping algorithms, parallel orthogonal recursive bisection (ORB) and parallel recursive mincut bipartitioning (MC), on a simulated 2D torus. Three criteria, the execution time of load-balancing algorithms, the computation time of an application program under different load-balancing algorithms, and the total execution time of an application program (under several refinement phases) are used for performance evaluation. Simulation results show that (1) the execution of LB is faster than those of MC and ORB; (2) the mappings of LB are better than those of ORB and MC; and (3) the speedups of LB are better than those of ORB and MC.
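Orthogonal recursive bisection (ORB), one of the baseline mapping algorithms in this abstract, can be sketched in a few lines. This is a serial, purely geometric version under stated assumptions (2D node coordinates, equal node weights, a split at the median along the axis of larger extent); it is not the parallel implementation evaluated in the paper, and it does not model the torus mapping or the proposed LB algorithm.

```python
def orb(points, levels):
    """Split 2D points into up to 2**levels parts of nearly equal size.

    At each level the point set is cut at the median along the coordinate
    axis with the larger extent, and both halves are split recursively.
    """
    if levels == 0 or len(points) <= 1:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return orb(ordered[:mid], levels - 1) + orb(ordered[mid:], levels - 1)


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for part in orb(grid, 2):
        print(part)
```

When refinement adds nodes in one region, rerunning a partitioner like this from scratch rebalances the load but may move many nodes between processors, which is the cost an incremental load-balancing algorithm tries to avoid.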
