Parallelization of the finite-element method (FEM) has been explored by the scientific and high-performance computing community for over a decade. Most of the computation in the FEM is linear algebra, namely matrix and vector operations. These operations follow the single-instruction multiple-data (SIMD) pattern, which suits shared-memory parallel architectures. General-purpose graphics processing units (GPGPUs) have been used to parallelize FEM computations since 2007. The solver step of the FEM is often carried out with conjugate gradient (CG)-type iterative methods because of their fast convergence and ample opportunities for parallelization. Although the SIMD patterns in the FEM map naturally onto GPUs, there are pitfalls, such as thread underutilization, uncoalesced memory access, low arithmetic intensity, limited fast memory on GPUs, and synchronization overhead. Nevertheless, FEM applications have been successfully deployed on GPUs over the last 10 years with significant performance improvements. This paper presents a comprehensive review of the parallel optimization strategies applied in each step of the FEM and discusses the pitfalls and trade-offs linked to each step. Furthermore, some unconventional methods that exploit the tremendous computing power of a GPU are also discussed. The review is not limited to a single field of engineering; rather, it applies to all fields of engineering and science in which FEM-based simulations are needed.
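The CG solver step mentioned in this abstract can be illustrated with the textbook, unpreconditioned conjugate gradient iteration for a symmetric positive-definite system. This is a minimal serial sketch in plain Python, not a GPU implementation; the function names (`matvec`, `dot`, `conjugate_gradient`) are hypothetical. The dense matrix-vector product and the dot-product reductions are the kernels that GPU implementations parallelize, and the reductions are where synchronization is needed.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product y = A x (the per-iteration SIMD-friendly kernel)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Inner product; on a GPU this is a parallel reduction needing synchronization."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: int = 1000) -> Vector:
    """Solve A x = b for a symmetric positive-definite A (no preconditioner)."""
    x = [0.0] * len(b)
    r = b[:]               # residual r = b - A x, with x = 0 initially
    p = r[:]               # first search direction equals the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)  # the only access to A in each iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, with the 4 × 4 tridiagonal matrix that has 2 on the diagonal and -1 off it (a 1D Laplacian) and `b = matvec(A, [1.0, 2.0, 3.0, 4.0])`, `conjugate_gradient(A, b)` recovers the vector `[1, 2, 3, 4]` to within rounding, in at most four iterations in exact arithmetic.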
Simple, rapid, catalyst-free synthesis of complex patterns of long, vertically aligned multiwalled carbon nanotubes, strictly confined within mechanically written features on a Si(1 0 0) surface, is reported. It is shown that dense arrays of nanotubes can nucleate in and completely fill the features when the low-temperature microwave plasma is in direct contact with the surface. This eliminates additional nanofabrication steps and the inevitable contact losses in applications that rely on carbon nanotube patterns.
This paper presents a methodology to identify and locate critical links in a grid street network for feeder transit services. A ‘critical’ link is defined as a link that, when removed from or added to an existing network, would cause the largest change in network connectivity and, consequently, in transit performance. The main contribution of this study is a simple analytical approach for locating the critical link(s) in a grid street network of any size with uniform passenger demand across the service area. The distance between demand points is used as the basic measure of impedance in the derived closed-form equations. The easily computable formulas for identifying critical links have been validated by simulation analyses using the street network of the City of St. Joseph, Missouri. The analytical derivations and simulation results indicate that link criticality decreases monotonically from centrally located links to those at the periphery of a grid street network.
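The definition of a critical link above can be made concrete with a brute-force check (not the paper's closed-form equations). This sketch assumes an n × n grid with unit-length links, uniform demand between every pair of intersections, and shortest-path hop distance as the impedance; a link's criticality is taken here, as an illustrative assumption, to be the increase in total shortest-path distance when that link is removed. All function names are hypothetical.

```python
from collections import deque
from itertools import product


def grid_edges(n: int) -> set:
    """Undirected links of an n x n grid; nodes are (row, col) tuples."""
    edges = set()
    for r, c in product(range(n), repeat=2):
        if c + 1 < n:
            edges.add(((r, c), (r, c + 1)))   # horizontal link
        if r + 1 < n:
            edges.add(((r, c), (r + 1, c)))   # vertical link
    return edges


def total_distance(n: int, edges: set) -> int:
    """Sum of shortest-path hop distances over all ordered node pairs.

    Assumes the network stays connected (true for a grid with n >= 2
    after removing any single link).
    """
    adj = {node: [] for node in product(range(n), repeat=2)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    for src in adj:                # one breadth-first search per origin
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total


def link_criticality(n: int) -> dict:
    """Increase in total distance caused by removing each link."""
    edges = grid_edges(n)
    base = total_distance(n, edges)
    return {e: total_distance(n, edges - {e}) - base for e in edges}
```

For example, `link_criticality(5)` scores the 40 links of a 5 × 5 grid. Under this simple measure, links near the middle of a grid line come out more critical than those at its ends, because more origin-destination pairs on that line must detour around a removed middle link.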
Mass transport models are developed to predict the rate of decarburization in RH vessels. Experimental data from several RH vessels are simulated with these models to estimate the volumetric mass transfer coefficient. At carbon levels above 20–30 ppm, it suffices for the kinetic model to assume control by carbon mass transport alone. The estimated reaction area is 10–50 times larger than the nominal area of the vacuum vessel. The correlation between volumetric mass transfer and circulation rate is investigated for different industrial vessels. It should now be possible to predict the volumetric mass transfer coefficient for any vessel from the argon flow rate, snorkel diameter and vessel pressure. Carryover slag in the ladle, when rich in FeO, can contribute more than 100 ppm of oxygen to the metal.
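How a volumetric mass transfer coefficient and a circulation rate combine into a decarburization rate can be sketched with a common two-zone RH formulation, assumed here (the paper's own model may differ). Assume a well-mixed ladle of volume V, steel circulating through the vacuum vessel at rate Q, a quasi-steady vessel in which carbon removal is controlled by carbon mass transport with volumetric coefficient ak, and an equilibrium carbon level C_eq. The vessel balance Q(C − C_v) = ak(C_v − C_eq) and the ladle balance V dC/dt = Q(C_v − C) give dC/dt = −K(C − C_eq) with K = (Q/V)·ak/(Q + ak), so carbon decays exponentially toward C_eq. The parameter names are hypothetical; Q and ak must be in the same units (for example m³/s) with V in m³.

```python
import math


def apparent_rate_constant(q: float, ak: float, v: float) -> float:
    """K = (Q / V) * ak / (Q + ak), in 1/s.

    q  : circulation rate (m^3/s), assumed value
    ak : volumetric mass transfer coefficient (m^3/s), assumed value
    v  : steel volume in the ladle (m^3), assumed value
    """
    return (q / v) * ak / (q + ak)


def carbon_content(t: float, c0: float, c_eq: float,
                   q: float, ak: float, v: float) -> float:
    """Ladle carbon (ppm) after t seconds: first-order approach to c_eq
    under the assumed carbon-mass-transport control."""
    k = apparent_rate_constant(q, ak, v)
    return c_eq + (c0 - c_eq) * math.exp(-k * t)
```

The form of K shows the two limits: when ak is much larger than Q, K tends to Q/V and circulation controls the rate; when ak is much smaller than Q, K tends to ak/V and mass transport in the vessel controls it. Any numbers passed to these functions are illustrative, not data from the paper.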
Assembly-free FEM bypasses the assembly step and solves the system of linear equations at the element level using a conjugate gradient (CG)-type iterative solver. The small dense matrix-vector products (MvPs) inside the CG solver are computed either at the element level or at the degree-of-freedom (DoF) level. Both strategies exploit the computing power of the GPU effectively, but their performance is limited by uncoalesced global memory access. This paper proposes an improved MvP strategy for assembly-free FEM that achieves coalesced global memory access by using the fast on-chip shared memory and the texture cache on the GPU. Because GPU shared memory is limited (a few KB), the proposed technique suffers from low occupancy. Despite this, the proposed strategy outperforms both the element-based and DoF-based MvP strategies on the GPU. Numerical experiments comparing it with the element-level and DoF-level strategies on the GPU show that the GPU implementation of the proposed MvP outperforms them by factors of approximately 7 and 1.5, respectively.
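The two baseline MvP strategies compared in this abstract can be sketched serially, assuming a hypothetical 1D mesh of two-node bar elements with element stiffness k = EA/h. In the element-level version each loop iteration stands in for one GPU thread per element, which computes its local product and scatter-adds into the result (on a GPU this needs atomic adds, since neighbouring elements share DoFs). In the DoF-level version each iteration stands in for one thread per DoF, which gathers contributions from the elements touching it, avoiding write conflicts. The sketch does not model shared memory, coalescing or the texture cache, which are the paper's actual contribution; all names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical 1D mesh: element e connects global DoFs conn[e] = (i, j).
Conn = List[Tuple[int, int]]


def element_matrix(k: float) -> List[List[float]]:
    """2x2 stiffness matrix of a linear bar element with stiffness k = EA/h."""
    return [[k, -k], [-k, k]]


def mvp_element_level(conn: Conn, ks: List[float], x: List[float]) -> List[float]:
    """One 'thread' per element: gather x_e, compute K_e x_e, scatter-add into y."""
    y = [0.0] * len(x)
    for (i, j), k in zip(conn, ks):
        ke = element_matrix(k)
        xe = (x[i], x[j])                               # gather element DoFs
        y[i] += ke[0][0] * xe[0] + ke[0][1] * xe[1]     # scatter (atomic on GPU)
        y[j] += ke[1][0] * xe[0] + ke[1][1] * xe[1]
    return y


def mvp_dof_level(conn: Conn, ks: List[float], x: List[float]) -> List[float]:
    """One 'thread' per DoF: sum the rows of K_e belonging to this DoF."""
    n = len(x)
    dof_to_elems: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for e, (i, j) in enumerate(conn):
        dof_to_elems[i].append((e, 0))   # DoF i is local node 0 of element e
        dof_to_elems[j].append((e, 1))   # DoF j is local node 1 of element e
    y = [0.0] * n
    for d in range(n):
        for e, a in dof_to_elems[d]:
            i, j = conn[e]
            ke = element_matrix(ks[e])
            y[d] += ke[a][0] * x[i] + ke[a][1] * x[j]   # no write conflicts
    return y
```

For example, with `conn = [(0, 1), (1, 2), (2, 3)]`, `ks = [1.0, 2.0, 3.0]` and `x = [1.0, 2.0, 4.0, 8.0]`, both functions return the same vector as the assembled global stiffness matrix times `x`, without ever forming that matrix. Either function can serve as the matrix-vector product inside an assembly-free CG solver.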
Although the flow of molten metal plays a major role in forming a uniform cylinder in centrifugal casting, little information is available on this topic. The flow of the molten metal differs at different rotational speeds, which in turn affects the final casting. In this paper, the influence of the molten-metal flow of hypereutectic Al-2Si alloys at various rotational speeds is discussed. At an optimum speed of 800 rpm, a uniform cylinder was formed. At rotational speeds below and above this speed, an irregularly shaped casting was formed, mainly because of the melt flow. Primary α-Al particles formed at the tube periphery at low rotational speed, and their size and shape changed with rotational speed. Wear tests on the inner surface of the casting showed better wear properties for the casting produced at the optimum rotational speed.