Similar literature
 20 similar documents found (search time: 15 ms)
1.
The graphics processing unit (GPU) is well suited to problems involving data-parallel computation. A serial CPU-based program for the dynamic analysis of multi-body systems is rebuilt as a parallel program that exploits the GPU's advantages. We developed an analysis code named GMAP to investigate how the dynamic analysis algorithm for multi-body systems can be implemented with GPU parallel programming. The numerical accuracy of GMAP is compared with that of the commercial program MSC/ADAMS, and its numerical efficiency is compared with that of the sequential CPU-based program. Multiple pendulums made of bodies and joints and a net-shaped system of bodies and spring-dampers are used for the simulations. The results indicate that GMAP's solution is as accurate as that of ADAMS. For the net-shaped system with 2370 spring-dampers, GMAP saves about 566.7 seconds of computation time (a 24.7% improvement). The larger the system, the better the time efficiency.

2.
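To address the low computational efficiency of sheet metal forming simulation and the poor geometric approximation of quadrilateral elements, a parallel computing method for sheet metal forming is proposed, based on the edge-based smoothed triangular shell element (EST) and the graphics processing unit (GPU). In line with the characteristics of the EST shell element and of the explicit solution of the forming process, the method computes arrays by mapping each smallest unit of computation to one thread, and computes single values by parallel reduction, achieving fine-grained parallelism over the whole computation. Given the characteristics of GPU parallel computing systems, the program architecture uses the CPU for control and the GPU for the numerical solution, with Compute Unified Device Architecture (CUDA) as the GPU programming environment. Numerical examples show that, compared with the traditional serial CPU method and at the same accuracy, the GPU-based parallel method achieves a speedup of more than 35 times when the model has more than 20,000 elements, markedly reducing the computation time of sheet metal forming simulation.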
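
The parallel reduction mentioned in this abstract can be illustrated with a minimal sketch. It is written in plain Python rather than CUDA, and the function name `tree_reduce` and the example values are assumptions made for illustration, not the authors' code. Each pass of the inner loop stands in for one set of GPU threads that would run concurrently.

```python
import operator


def tree_reduce(values, op):
    """Combine values pairwise in a tree pattern, as a GPU parallel reduction does."""
    data = list(values)
    n = len(data)
    if n == 0:
        raise ValueError("cannot reduce an empty sequence")
    stride = 1
    while stride < n:
        # Every iteration of this inner loop is independent of the others,
        # so on a GPU each one would be handled by its own thread.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
    return data[0]


# Hypothetical per-element quantities (one value per finite element).
element_values = [0.4, 0.9, 0.1, 0.7, 0.3]
total = tree_reduce(element_values, operator.add)
smallest = tree_reduce(element_values, min)
```

The number of passes grows with the logarithm of the array length, which is why this pattern suits single-value results such as a global sum or minimum.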

3.
Numerical optimization of tribological elements usually demands extensive computation. The particle swarm optimization (PSO) method is known for its simple implementation and high efficiency in solving multifactor optimization problems. In this study, several parallel computing schemes using PSO for air foil bearing design are compared. The parallel programming models applied are multicore computing with OpenMP and many-core graphics processing unit (GPU) computing with Compute Unified Device Architecture (CUDA) and OpenACC. The best case was obtained when OpenMP was applied at the algorithm level of the optimization. The performance of CUDA was found to be comparable to that of OpenMP when parallel computing was used to solve the bearing model. Owing to excessive data communication, computing with OpenACC was significantly slower than the other approaches. The parallel computing scheme recommended in this study is independent of PSO and is applicable to tribological studies requiring global optimization analysis.

4.
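全吉成, 王平, 王宏伟. 《光学精密工程》 (Optics and Precision Engineering), 2016, 24(11): 2863-2871
A GPU-accelerated parallel algorithm for the orthorectification of optical aerial images is proposed, to meet the real-time requirements of optical aerial image acquisition and to improve on the efficiency of serial CPU orthorectification of massive image data. The principle of the orthorectification algorithm for optical images and its parallelization are introduced. To reduce the computational load on the GPU, the concept of an "effective pixel region" is introduced and an improved GPU parallel rectification algorithm is designed. Execution efficiency is further improved through configuration selection and memory access optimization. Finally, the accuracy of the GPU parallel algorithm is analyzed and the influence of noise on the algorithm is verified. Experimental results show that the optimized, improved GPU parallel algorithm markedly increases orthorectification speed: for a 5,000×5,000 image, the speedup over the serial CPU algorithm reaches more than 223 times. Although GPU single-precision computation and noise reduce rectification accuracy somewhat, the error remains within the allowable range. The algorithm orthorectifies optical aerial images quickly, and the rectified images meet practical application requirements.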

5.
Numerical analysis of thermohydrodynamic (THD) lubrication is complex and time consuming because the energy equation must be solved on a three-dimensional numerical domain. Many methods are available for reducing the execution time of bearing design problems. Parallel computing technology is crucial for achieving high computational capability and involves workstations equipped with multicore central processing units and many-core graphics processing units (GPUs). High-performance GPUs have emerged as powerful computing tools for modeling in engineering and science. High-end GPUs contain thousands of shaders, that is, processor cores. In this study, the traditional successive overrelaxation (SOR) method for solving the Reynolds equation and the energy equation was replaced with two- and three-dimensional red-black SOR methods so that the governing equations could be solved in parallel on a GPU system. The results show that a high-end GPU can be used to increase the parallel computing speed when solving THD problems.
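
A minimal sketch of red-black SOR follows. It is not the authors' solver: as a stand-in for the Reynolds and energy equations it relaxes the two-dimensional Laplace equation with fixed boundary values, and the function name `red_black_sor`, the relaxation factor, and the sweep count are assumptions made for illustration. The point it shows is that cells of one colour depend only on cells of the other colour, so each half-sweep could be updated in parallel.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """Relax the interior of grid u (a list of lists) in place; boundary values stay fixed."""
    rows, cols = len(u), len(u[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = red cells, 1 = black cells
            # All cells of one colour read only cells of the other colour,
            # so on a GPU each cell of this half-sweep could be one thread.
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    average = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
                    u[i][j] += omega * (average - u[i][j])
    return u


# Boundary values follow u = x; the interior starts at zero.
rows, cols = 6, 7
grid = [[float(j) if i in (0, rows - 1) or j in (0, cols - 1) else 0.0
         for j in range(cols)] for i in range(rows)]
red_black_sor(grid)
```

With a plain (lexicographic) SOR sweep each cell depends on a neighbour updated just before it, which forces serial execution; the colouring removes that dependency within a half-sweep.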

6.
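Parallel algorithm for molecular dynamics simulation of the monocrystalline silicon grinding process   Total citations: 2, self-citations: 0, citations by others: 2
A three-dimensional molecular dynamics simulation model of the ultra-precision grinding of monocrystalline silicon is established. The characteristics of the serial molecular dynamics program and the feasibility of parallel simulation are analyzed, and a parallel molecular dynamics algorithm based on secondary domain partitioning is proposed. A parallel simulation program is written and molecular dynamics simulations are run; the machining mechanism of ultra-precision grinding of monocrystalline silicon is analyzed from snapshots of atomic positions. The parallel results are compared with those of the serial program, and the correctness of the parallel program is verified in terms of atomic position snapshots and system energy. The parallel program has a large advantage in simulation scale and computation time, which shows that it is effective and can be applied to molecular dynamics simulations of different atomic scales.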

7.
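As the geometric complexity and volume of additive manufacturing models keep increasing, the time needed to slice a model grows substantially, which severely affects data processing efficiency. An adaptive, load-balanced heterogeneous parallel slicing algorithm is proposed. The traditional slicing algorithm is parallelized on the GPU to exploit its steadily growing parallel computing power, and a simulated annealing algorithm is used to pack slicing tasks into load-balanced bins so that the workload is the same across threads. Experimental results show that the algorithm improves slicing efficiency and is particularly suited to fast slicing of large or very large 3D models.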

8.
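Foreign fibers in cotton cause considerable losses to cotton textile enterprises, so many of them urgently need a high-efficiency foreign fiber sorting machine. Existing sorters can remove some foreign fibers, but because all image processing runs on the CPU of an industrial PC, only relatively simple recognition algorithms can be used if online real-time detection is to be achieved, and recognition performance is unsatisfactory. This paper studies a cotton foreign fiber recognition system based on a CPU+GPU heterogeneous system, in which the GPU runs the recognition algorithm in parallel. This greatly improves computational efficiency and effectively reduces the algorithm's running time, which makes it possible to use complex recognition algorithms in such systems and thereby raises the removal rate of foreign fiber impurities. Experiments show that the CPU+GPU heterogeneous system speeds the algorithm up by a factor of more than ten.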

9.
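Following the principle of kinematic equivalence, an equivalent serial robot and branch-equivalent serial robots are introduced for a parallel robot, and an algorithm for the forward kinematics solution of the parallel robot is established with the equivalent generalized coordinates as intermediate variables. The algorithm handles the motion coupling caused by the structure effectively, and the resulting software generates initial iteration points automatically, avoids multiple solutions, and is convenient for practical use, thereby providing theoretical support for the structural design and innovation of parallel robots.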

10.
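Based on discrete element method theory, the linear vibration screening process is simulated numerically. The logical structure of the program is given, and the collision parameters of the collision model are determined through discussion. A hybrid CPU-GPU parallel computing method is introduced to improve the efficiency of the numerical simulation, and the effect of parallel computing on three-dimensional discrete element simulation is discussed on the basis of the results.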

11.
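Based on discrete element method theory, the linear vibration screening process is simulated numerically. The logical structure of the program is given, and the collision parameters of the collision model are determined through discussion. A hybrid CPU-GPU parallel computing method is introduced to improve the efficiency of the numerical simulation, and the effect of parallel computing on three-dimensional discrete element simulation is discussed on the basis of the results.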

12.
Based on multi-body system theory, the mainshaft system of a precision NC lathe is treated as a coupled rigid-flexible multi-body system made up of rigid and elastic bodies connected in a particular way, and a dynamic model is established. Its vibration characteristics are computed with the multi-body system transfer matrix method. The results show that the mainshaft system of the NC lathe always operates in a stable and reliable working region. The method is simple and conceptually clear, and it can readily be applied to other multi-body systems.

13.
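A frequent pattern mining algorithm based on the graphics processing unit   Total citations: 1, self-citations: 1, citations by others: 0
Frequent pattern mining is a core problem in data mining. Traditionally, parallel frequent pattern mining has mainly been carried out on clusters, and parallel mining on shared-memory multiprocessor systems has received less attention. A parallel mining method based on breadth-first search and a direct counting strategy is studied and implemented on CUDA (Compute Unified Device Architecture), the latest unified computing architecture for the graphics processing unit (GPU). GPU-based FPMA uses the CPU to control the search process; on the GPU's multiprocessors it adopts a data partitioning strategy, counts in a sequential data-stream manner suited to the GPU, and dynamically prunes the transaction dataset according to candidate length. Experimental results show that GPU-based FPMA is on average more than 10 times faster than the CPU version.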
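
The combination of breadth-first search, direct counting, and length-based pruning of transactions described in this abstract can be sketched as follows. This is a serial Python illustration under assumed names (`frequent_itemsets`, `min_support`), not the GPU-based FPMA implementation; the candidate-versus-transaction tests in the counting step are the independent work a GPU version would distribute across threads.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (breadth-first) mining; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        # Direct counting: test every candidate against every transaction.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = [c for c in level if counts[c] >= min_support]
        frequent.update((c, counts[c]) for c in survivors)
        # Breadth-first step: join size-k survivors into size-(k + 1) candidates.
        k = len(level[0]) + 1
        level = list({a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == k})
        # Prune transactions too short to contain any candidate of length k.
        transactions = [t for t in transactions if len(t) >= k]
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = frequent_itemsets(baskets, min_support=2)
```

The sketch omits the usual subset-based candidate pruning; infrequent candidates are simply eliminated by the counting step at the next level.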

14.
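Because existing large-scale dynamic vision measurement systems, which rely on large volumes of data and computation, process data slowly, this paper builds a high-speed large-scale dynamic vision measurement system and studies the parallelization of the algorithms it involves, including feature point center localization, coded point recognition, and camera orientation. First, the time consumed by each main algorithm under different measurement conditions and the parallelism of each are analyzed. Then, the conventional feature point center localization and coded point recognition algorithms are introduced, fast parallel algorithms for feature point center localization and coded point recognition are proposed, and their implementation principles are explained in detail. Finally, to deal with the large number of atomic operations, an optimization based on warp-level collective atomic operations is proposed. Experimental results show that, without loss of localization accuracy or recognition rate, the parallel scheme reduces time cost by 42% relative to the serial scheme when the image contains 300 points, and by more than 91% when the number of points reaches 20,000. The proposed parallel design thus effectively increases processing speed and resolves the poor real-time performance of large-scale dynamic vision measurement systems.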

15.
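A real-time restoration algorithm for differential-velocity image motion in aerial oblique imaging   Total citations: 1, self-citations: 0, citations by others: 1
李仕. 《光学精密工程》 (Optics and Precision Engineering), 2009, 17(4): 895-900
This paper proposes a compensation algorithm for differential-velocity image motion, addressing the compensation of such motion on the imaging plane when an aerial camera works in oblique view. The complex and changing flight attitude of the carrier aircraft affects the working state of the aerial camera and causes various kinds of image motion blur. When the carrier aircraft flies sideways, the aerial camera is in an oblique-view state, and image motion at several different velocities appears on the focal plane at the same time. Based on an analysis of the mechanism that produces differential-velocity image motion, the image is divided into several regions according to the amount of image motion. To avoid the ringing effect produced by the two-dimensional Fourier transform, each region is further subdivided into pixel lines, each pixel line is restored in parallel with a one-dimensional Wiener filter, and the results are merged into the restored image. Experiments show that, for motion-blurred images taken in oblique view, the PSNR of the restored result is 30.469 and image details are restored effectively. On a GPU parallel platform, the parallel scheme restores one 2048×2048 grayscale blurred frame in 17 ms, which solves the real-time problem of image restoration.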

16.
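Real-time restoration of motion-blurred video images on a graphics processing unit platform   Total citations: 1, self-citations: 0, citations by others: 1
王晶, 李仕. 《光学精密工程》 (Optics and Precision Engineering), 2010, 18(10): 2262-2268
A GPU-optimized programming method is proposed for real-time restoration of motion-blurred video images. The numbers of thread blocks and threads on the GPU are configured according to the hardware characteristics of the Compute Unified Device Architecture (CUDA), and an automatic coalesced memory access method is introduced, so that the GPU's hardware resources are fully used. Redundant information is removed by exploiting the symmetry of the image spectrum, which reduces the amount of data handled during spectral filtering and the number of GPU memory accesses, and thus improves the algorithm's efficiency. Experiments show that the proposed GPU scheme outperforms the traditional CPU scheme by an order of magnitude in computing performance, and that the half-spectrum filtering design reduces the total time cost by more than 20%, which demonstrates the feasibility and effectiveness of the scheme.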

17.
This study investigated the performance of parallel optimization by means of a genetic algorithm (GA) for lubrication analysis. An air-bearing design was used as the illustrative example, and the parallel computation was conducted in a single system image (SSI) cluster, a system of loosely network-connected desktop computers. The main advantages of GAs as optimization tools are their suitability for multi-objective optimization and their high probability of reaching the global optimum in a complex problem. To prevent premature convergence in the early stage of evolution in multi-objective optimization, Pareto optimality was used as an effective criterion in offspring selection. Since the GA's search for the optimum is population-based, the computations can be performed in parallel. For cases of uneven computational load, a simple dynamic load-balancing scheme is proposed to optimize the parallel efficiency. It is demonstrated that the huge computing demand of the GA for complex multi-objective optimization problems can be dealt with effectively by parallel computing in an SSI cluster.
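
The Pareto optimality criterion used for offspring selection can be sketched in a few lines. This is an illustration only: the names `dominates` and `pareto_front`, the convention that every objective is minimised, and the sample objective vectors are assumptions, not details from the study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(population):
    """Keep the objective vectors that no other member of the population dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


# Hypothetical (objective_1, objective_2) pairs for five candidate designs.
designs = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
front = pareto_front(designs)
```

Each candidate's objectives can be evaluated independently of the others, which is the population-level parallelism the abstract refers to.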

18.
This paper describes a promising parallel algorithm and software system for structural modal analysis. The supercomputer ShenWei I is used as the parallel computing environment and the finite element code MSC.NASTRAN as the development platform. The purpose of the research and development is to parallelize the dominant computing work of modal analysis and, moreover, to couple the parallel solver tightly with the original analysis code. The integrated software system includes a serial FEA (finite element analysis) code and a parallel solver and has a friendly interface. It can significantly increase the scale and speed of structural modal analysis, and it contributes to the dynamic performance analysis of massive structures.

19.
Frequency response analysis is an important computational tool for simulating and understanding the dynamic behavior of structures. However, for more target frequencies and/or larger-scale structures, the runtime increases greatly. Furthermore, problems with ever more degrees of freedom, intended to improve the accuracy of the analysis results, lead to longer runtimes. In this paper, we present a method that reduces the runtime of frequency response analysis on an NVIDIA GPU using the compute unified device architecture (CUDA) programming environment. The proposed method is based on the sparse conjugate gradient method with a Jacobi preconditioner. Numerical examples implemented with three different FE models are used to verify its validity. The results show that the GPU parallel implementation achieves a significant speedup compared with a single CPU processor. These results show that GPU parallel implementation can make frequency response analysis more efficient by reducing the solution time.
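
A minimal sketch of the conjugate gradient method with a Jacobi (diagonal) preconditioner is given below. It assumes a symmetric positive definite matrix stored as one `{column: value}` dictionary per row; the function name `jacobi_pcg`, the storage format, and the small test system are illustrative assumptions and do not reproduce the paper's CUDA implementation.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a sparse SPD matrix A given as a list of {column: value} rows."""
    n = len(b)

    def matvec(v):
        # Each row product is independent, so a GPU can assign rows to threads.
        return [sum(a_ij * v[j] for j, a_ij in row.items()) for row in A]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small SPD example: the right-hand side is A times [1, 2, 3].
A = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 3.0, 2: 1.0}, {1: 1.0, 2: 2.0}]
b = [6.0, 10.0, 8.0]
solution = jacobi_pcg(A, b)
```

The Jacobi preconditioner only needs the matrix diagonal, so applying it is an element-wise multiplication with no data dependencies between entries.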

20.
In the additive manufacturing field, current research on data processing focuses mainly on slicing large STL files or complicated CAD models. A parallel algorithm has great advantages for improving efficiency and reducing slicing time, but traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. The parallel algorithm is designed around a pipeline mode, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated in a series of experiments. The experimental results show that both numbers have a marked effect on the speedup ratio. Speedup increases with the number of threads in a way that agrees closely with Amdahl's law, and it also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. A second parallel algorithm based on data parallelism is used in the experiments to show that the pipeline parallel mode is more efficient. A final case study demonstrates the performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process. Compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
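
The two laws cited in this abstract can be stated compactly. The sketch below uses the standard textbook forms; the function names and the 90% parallel fraction in the usage lines are assumptions chosen for illustration and are not values from the paper.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup for a fixed problem size."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Gustafson's law: scaled speedup when the problem grows with the worker count."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers


# With 90% of the work parallelisable, compare both laws for 4 and 16 threads.
for threads in (4, 16):
    fixed_size = amdahl_speedup(0.9, threads)
    scaled_size = gustafson_speedup(0.9, threads)
```

Amdahl's law bounds the speedup by the reciprocal of the serial fraction however many threads are added, whereas Gustafson's law describes how speedup keeps growing when the amount of parallel work (here, the number of layers) grows too.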
