Similar Articles
1.
Because of the computational power of today's GPUs, they are increasingly harnessed to assist CPUs in high-performance computing. In addition, a growing number of today's state-of-the-art supercomputers include commodity GPUs, which deliver unprecedented levels of performance in terms of raw GFLOPS and GFLOPS/cost. In this work, we present a GPU implementation of an image processing application of growing popularity: the 2D fast wavelet transform (2D-FWT). Based on a pair of Quadrature Mirror Filters, a complete set of application-specific optimizations is developed from a CUDA perspective to achieve outstanding speedup factors over a highly optimized version of the 2D-FWT run on the CPU. An alternative approach based on the Lifting Scheme is also described in Franco et al. (Acceleration of the 2D wavelet transform for CUDA-enabled Devices, 2010). We then investigate hardware improvements such as multicores on the CPU side, and exploit them through thread-level parallelism using the OpenMP API and pthreads. Overall, the GPU exhibits better scalability and parallel performance on large-scale images, making it a solid alternative to thread-level methods run on emerging multicore architectures for computing the 2D-FWT.
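A minimal sketch of one level of a separable 2D fast wavelet transform, assuming the Haar pair as the quadrature mirror filters (the abstract does not say which QMF pair the authors use) and an image with even dimensions. Every row is filtered and downsampled first, then every column; because each row and each column is processed independently, these two passes are the natural units of work for a CUDA kernel or an OpenMP/pthreads loop. The function names are illustrative only.

```python
import math

# Assumed QMF pair: orthonormal Haar low-pass (S, S) and high-pass (S, -S).
S = 1.0 / math.sqrt(2.0)


def haar_1d(signal):
    """One analysis step: returns (approximation, detail), each half the input length."""
    n = len(signal)
    if n % 2:
        raise ValueError("length must be even")
    approx = [S * (signal[2 * k] + signal[2 * k + 1]) for k in range(n // 2)]
    detail = [S * (signal[2 * k] - signal[2 * k + 1]) for k in range(n // 2)]
    return approx, detail


def fwt2d_level(image):
    """One level of the separable 2D FWT on a list-of-lists image.

    The approximation subband ends up in the top-left quadrant and the
    three detail subbands in the remaining quadrants.
    """
    # Horizontal pass: each row is transformed independently.
    rows = []
    for row in image:
        a, d = haar_1d(row)
        rows.append(a + d)
    # Vertical pass: each column of the row-transformed image is transformed.
    h, w = len(rows), len(rows[0])
    out = [[0.0] * w for _ in range(h)]
    for j in range(w):
        a, d = haar_1d([rows[i][j] for i in range(h)])
        for i, v in enumerate(a + d):
            out[i][j] = v
    return out
```

For example, `fwt2d_level([[1.0, 2.0], [3.0, 4.0]])` gives approximately `[[5.0, -1.0], [-2.0, 0.0]]`; since the Haar pair is orthonormal, the sum of squares (30) is preserved.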

2.
High-order path-conservative schemes have been developed for solving nonconservative hyperbolic systems (Parés, SIAM J. Numer. Anal. 44:300–321, 2006; Castro et al., Math. Comput. 75:1103–1134, 2006; J. Sci. Comput. 39:67–114, 2009). Recently, Abgrall and Karni (J. Comput. Phys. 229:2759–2763, 2010) observed that this approach may have some computational issues and shortcomings. In this paper, a modification to the high-order path-conservative scheme of Castro et al. (Math. Comput. 75:1103–1134, 2006) is proposed to improve its computational performance and to overcome some of these shortcomings. The modification is based on the high-order finite volume WENO scheme with subcell resolution, and it uses an exact Riemann solver to capture the correct paths at discontinuities. An application to one-dimensional compressible two-medium flows governed by the nonconservative (primitive) Euler equations is carried out to show the effectiveness of the new approach.

3.
Heterogeneous multiprocessor systems, in which commodity multicore processors are coupled with graphics processing units (GPUs), have been widely used in high-performance computing (HPC). In this work, we focus on the design and optimization of Computational Fluid Dynamics (CFD) applications on such HPC platforms. To fully utilize the computational power of these heterogeneous platforms, we propose designing the performance-critical part of CFD applications, namely the linear equation solvers, in a hybrid way. A hybrid linear solver includes both a CPU version and a GPU version of the code for solving a system of linear equations. When a hybrid linear equation solver is invoked during the CFD simulation, the CPU and GPU portions run in parallel on their respective processing devices according to the execution configuration. Furthermore, we propose building functional performance models (FPMs) of the processing devices and using an FPM-based heterogeneous decomposition method to distribute the workload between them, in order to balance the workload and optimize communication overhead. The efficiency of this approach is demonstrated by numerical simulations of lid-driven cavity flow on both a hybrid server and a hybrid cluster.
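The FPM-based decomposition can be illustrated with a small sketch. Here each device i is assumed to have a speed function s_i(x), in work units per second when given x units, and we look for a split whose predicted times x_i / s_i(x_i) are balanced. This is an assumed formulation, not the authors' algorithm: it bisects on a common target time T, gives each device the largest share it can finish within T, and assumes x / s(x) increases with x. The speed functions in the usage check are made up.

```python
from typing import Callable, List

Speed = Callable[[int], float]  # hypothetical: work units per second at problem size x


def max_units_within(speed: Speed, t: float, limit: int) -> int:
    """Largest x in [0, limit] with x / speed(x) <= t (assumes time grows with x)."""
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid / speed(mid) <= t:
            lo = mid
        else:
            hi = mid - 1
    return lo


def fpm_partition(n: int, speeds: List[Speed], iters: int = 100) -> List[int]:
    """Split n work units among devices so that predicted times are balanced."""
    # At t_hi every device could process all n units, so the shares cover n.
    t_lo, t_hi = 0.0, max(n / s(n) for s in speeds)
    for _ in range(iters):
        t = 0.5 * (t_lo + t_hi)
        if sum(max_units_within(s, t, n) for s in speeds) >= n:
            t_hi = t
        else:
            t_lo = t
    shares = [max_units_within(s, t_hi, n) for s in speeds]
    # Integer rounding can overshoot: take units back from the slowest-finishing device.
    while sum(shares) > n:
        k = max((i for i in range(len(shares)) if shares[i] > 0),
                key=lambda i: shares[i] / speeds[i](shares[i]))
        shares[k] -= 1
    return shares
```

With constant speeds the result reduces to a proportional split, e.g. `fpm_partition(100, [lambda x: 3.0, lambda x: 1.0])` returns `[75, 25]`; the size-dependent speed functions are what distinguish an FPM from a single-number performance model.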

4.
In this document, we present an alternative to the method introduced by Ebner (Pattern Recognit 60–67, 2003; J Parallel Distrib Comput 64(1):79–88, 2004; Color constancy using local color shifts, pp 276–287, 2004; Color Constancy, 2007; Mach Vis Appl 20(5):283–301, 2009) for computing the local space average color. We show that when the problem is framed as a linear system and the resulting series is solved, there is a solution based on LU decomposition that reduces the computing time by at least an order of magnitude.
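To make the "linear system plus LU" idea concrete, here is a one-dimensional sketch under an assumed model (not necessarily Ebner's exact update or the authors' formulation): the local average a is the fixed point of a_i = (1 - p) * mean(a over the neighbours of i) + p * c_i, where c is one color channel and p is a small hypothetical mixing parameter. Rearranged, this is a linear system A a = p c whose matrix is strictly diagonally dominant, so a Doolittle LU factorization without pivoting is well defined.

```python
from typing import List


def build_system(c: List[float], p: float):
    """Matrix A and right-hand side b for the assumed 1D fixed point
    a_i = (1 - p) * mean(a over neighbours of i) + p * c_i  (needs len(c) >= 2)."""
    n = len(c)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        A[i][i] = 1.0
        for j in nbrs:
            A[i][j] = -(1.0 - p) / len(nbrs)
    b = [p * ci for ci in c]
    return A, b


def lu_decompose(A):
    """Doolittle LU without pivoting (safe here: A is strictly diagonally dominant)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            U[i][k] = A[i][k] - sum(L[i][m] * U[m][k] for m in range(i))
        for k in range(i + 1, n):
            L[k][i] = (A[k][i] - sum(L[k][m] * U[m][i] for m in range(i))) / U[i][i]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = b[i] - sum(L[i][m] * y[m] for m in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U x = y
        x[i] = (y[i] - sum(U[i][m] * x[m] for m in range(i + 1, n))) / U[i][i]
    return x
```

In this model, A depends only on the image geometry and p, not on the pixel values. The factorization can therefore be computed once and reused for every color channel, which is one plausible source of the speedup over repeated iterative averaging.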

5.
Research on a Hadoop-Based High-Performance Platform for Massive Data Processing   Cited: 2 (self-citations: 0, by others: 2)
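High-performance computing on massive data holds great application value, but current cloud computing architectures provide only massive-data processing capability and lack sufficient high-performance computing capability. By integrating GPUs, which have very strong parallel computing capability, with cloud computing, we propose a heterogeneous high-performance cloud computing architecture based on CPU/GPU cooperation. Built on open-source Hadoop, the platform uses annotations to mark the parts of MapReduce functions that need to be parallelized. A customized GPU class loader converts the marked code into CUDA code, which is then compiled and run dynamically. The platform integrates GPU computing power into the MapReduce framework and can process massive data efficiently.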

6.
We propose a numerical approach to solving variational problems on manifolds represented by the grid based particle method (GBPM) recently developed in Leung et al. (J. Comput. Phys. 230(7):2540–2561, 2011) and Leung and Zhao (J. Comput. Phys. 228:7706–7728, 2009a, J. Comput. Phys. 228:2993–3024, 2009b, Commun. Comput. Phys. 8:758–796, 2010). In particular, we propose a splitting algorithm for image segmentation on manifolds represented by unconnected sampling particles. To obtain a fast minimization algorithm, we propose a new splitting method that generalizes the augmented Lagrangian method. To implement the resulting method efficiently, we incorporate the local polynomial approximations of the manifold used in the GBPM. The resulting method is flexible enough for segmentation on various manifolds, including closed surfaces, open surfaces, and even non-orientable surfaces.

7.
The stochastic collocation method (Babuška et al. in SIAM J Numer Anal 45(3):1005–1034, 2007; Nobile et al. in SIAM J Numer Anal 46(5):2411–2442, 2008a; SIAM J Numer Anal 46(5):2309–2345, 2008b; Xiu and Hesthaven in SIAM J Sci Comput 27(3):1118–1139, 2005) has recently been applied to stochastic problems that can be transformed into parametric systems. Meanwhile, the reduced basis method (Maday et al. in Comptes Rendus Mathematique 335(3):289–294, 2002; Patera and Rozza in Reduced basis approximation and a posteriori error estimation for parametrized partial differential equations Version 1.0. Copyright MIT, http://augustine.mit.edu, 2007; Rozza et al. in Arch Comput Methods Eng 15(3):229–275, 2008), primarily developed for solving parametric systems, has recently been used to deal with stochastic problems (Boyaval et al. in Comput Methods Appl Mech Eng 198(41–44):3187–3206, 2009; Arch Comput Methods Eng 17:435–454, 2010). In this work, we compare the performance of the two methods when applied to the solution of linear stochastic elliptic problems. Two important comparison criteria are considered: (1) convergence of the approximation error; (2) computational costs for both offline construction and online evaluation. Numerical experiments are performed for problems ranging from low dimensions $O(1)$ through moderate dimensions $O(10)$ to high dimensions $O(100)$. The main result of our comparison is that, for smooth problems, the reduced basis method converges better in theory and faster in practice than the stochastic collocation method, and that it is more suitable for large-scale and high-dimensional stochastic problems when computational costs are taken into account.
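As a toy illustration of stochastic collocation, assume a one-parameter problem: -(a u')' = 1 on (0, 1) with u(0) = u(1) = 0 and a hypothetical coefficient a(y) = 1 + 0.5 y, where y is uniform on [-1, 1]. The method solves the deterministic problem at Gauss-Legendre nodes in y and combines the results with the quadrature weights. Here the closed-form solution u = x(1 - x) / (2a) stands in for a finite element solve; the comparison problems in the paper are much larger.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of n-point Gauss-Legendre quadrature on [-1, 1] (Newton iteration)."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))  # standard initial guess for root k
        for _ in range(100):
            p_prev, p = 1.0, x  # P_0(x), P_1(x)
            for j in range(2, n + 1):  # three-term recurrence up to P_n
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # P_n'(x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def solve_deterministic(x, y):
    """Hypothetical 'solver': exact solution of -(a u')' = 1, u(0) = u(1) = 0,
    for the constant coefficient a(y) = 1 + 0.5 * y."""
    a = 1.0 + 0.5 * y
    return x * (1.0 - x) / (2.0 * a)


def collocation_mean(x, n_nodes):
    """Estimate E[u(x, Y)] for Y ~ Uniform(-1, 1) from n_nodes deterministic solves."""
    nodes, weights = gauss_legendre(n_nodes)
    # The uniform density on [-1, 1] is 1/2, hence the factor 0.5.
    return 0.5 * sum(w * solve_deterministic(x, y) for y, w in zip(nodes, weights))
```

For this toy problem the exact mean at x = 0.5 is 0.125 * ln 3, and eight collocation points already reproduce it to well below 1e-6. This fast convergence for smooth parameter dependence is the regime that the comparison in the abstract addresses.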

8.
The weakly coupled WKB system captures high-frequency wave dynamics in many applications. For such a system, a level set framework has recently been developed to compute multi-valued solutions to the Hamilton-Jacobi equation and to evaluate the position density accordingly. In this paper we propose two approaches for computing multi-valued quantities related to density, momentum and energy. Within this level set framework, we show that the physical observables evaluated in Jin et al. (J. Comput. Phys. 210(2):497–518, 2005; J. Comput. Phys. 205(1):222–241, 2005) are simply the superposition of their multi-valued counterparts. A series of numerical tests is performed to compute multi-valued quantities and to validate the established superposition properties.

9.
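To improve the parallel efficiency of large-scale parallel computing and to make full use of the respective strengths of CPUs and GPUs, especially the powerful computing capability of GPUs, we propose connecting a group of GPUs through the Message Passing Interface (MPI), combining general-purpose GPU computing with the lattice Boltzmann method (LBM) from computational fluid dynamics. Based on the principles of general-purpose GPU computing and of the LBM algorithm, MPI serves as the mechanism for distributing the computation and CUDA (Compute Unified Device Architecture) as the main computation engine. A CUDA-enabled GPU cluster is built, on which the D2Q9 model of the LBM is used to simulate two-dimensional square cavity flow. Experimental results show that the simulation on the GPU group agrees with the CPU simulation, makes fuller use of the GPUs' computing power, and improves parallel efficiency.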
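The per-node arithmetic of a D2Q9 LBM step can be sketched as follows. This assumes the common single-relaxation-time (BGK) collision with the standard D2Q9 weights; the abstract does not state which collision operator the authors use. In a CUDA implementation each lattice node would run this update in its own thread, while MPI distributes subdomains across GPUs. The relaxation time tau is a free parameter.

```python
from typing import List, Tuple

# Standard D2Q9 lattice velocities and weights (lattice units, c_s^2 = 1/3).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho: float, ux: float, uy: float) -> List[float]:
    """Second-order equilibrium distributions f_i^eq for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def macroscopic(f: List[float]) -> Tuple[float, float, float]:
    """Density and velocity as the zeroth and first moments of f."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    return rho, ux, uy


def collide(f: List[float], tau: float) -> List[float]:
    """Assumed BGK collision at one node: relax f toward equilibrium with time tau."""
    rho, ux, uy = macroscopic(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

Because the equilibrium has the same density and momentum as f, the collision conserves both. This gives a cheap correctness check when comparing GPU and CPU results, in the spirit of the agreement reported in the abstract.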

10.
A coupled level set and moment of fluid method (CLSMOF) is described for computing solutions to incompressible two-phase flows. The local piecewise linear interface reconstruction (the CLSMOF reconstruction) uses information from the level set function, the volume-of-fluid function, and the reference centroid to produce a slope and an intercept for the local reconstruction. The level set function is coupled to the volume-of-fluid function and reference centroid by being maintained as the signed distance to the CLSMOF piecewise linear reconstructed interface. The nonlinear terms in the momentum equations are solved using the sharp interface approach recently developed by Raessi and Pitsch (Annual Research Brief, 2009). We have modified the algorithm of Raessi and Pitsch from a staggered grid method to a collocated grid method, and we combine their treatment of the nonlinear terms with the variable-density, collocated pressure projection algorithm developed by Kwatra et al. (J. Comput. Phys. 228:4146–4161, 2009). A collocated grid method makes it convenient to use block-structured adaptive mesh refinement (AMR) grids. Many 2D and 3D numerical simulations of bubbles, jets, drops, and waves on a block-structured adaptive grid are presented to demonstrate the capabilities of our new method.

11.
Computational fluid dynamics simulations using the WENO method and the level set method are applied to high-Mach-number nonrelativistic astrophysical jets, including the effects of radiative cooling. The WENO methods introduced in Liu et al. (J. Comput. Phys., 115:200–212, 1994) have allowed us to simulate HH 1-2 astrophysical jets at Mach numbers well above 80 (Ha et al. in J. Sci. Comput. 24:29–44, 2005). Simulations at high Mach numbers and with radiative cooling are essential for achieving detailed agreement with astrophysical images. This paper considers simulations of the interaction between astrophysical jets and their environment using level set methods.
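The core of a WENO method is a nonlinear, smoothness-weighted combination of candidate stencils. The sketch below shows the widely used fifth-order reconstruction with the Jiang-Shu smoothness indicators and linear weights (1/10, 6/10, 3/10). This is a standard variant chosen for illustration, and it may differ in detail from the schemes used in the cited papers. It reconstructs the left-biased value at the interface x_{i+1/2} from five cell averages.

```python
def weno5_reconstruct(v, eps=1e-6):
    """Left-biased fifth-order WENO value at x_{i+1/2}
    from cell averages v = (v[i-2], v[i-1], v[i], v[i+1], v[i+2])."""
    vm2, vm1, v0, vp1, vp2 = v
    # Third-order candidate reconstructions on the three sub-stencils.
    q0 = (2 * vm2 - 7 * vm1 + 11 * v0) / 6
    q1 = (-vm1 + 5 * v0 + 2 * vp1) / 6
    q2 = (2 * v0 + 5 * vp1 - vp2) / 6
    # Smoothness indicators: large where a sub-stencil crosses a discontinuity.
    b0 = 13 / 12 * (vm2 - 2 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4 * vm1 + 3 * v0) ** 2
    b1 = 13 / 12 * (vm1 - 2 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13 / 12 * (v0 - 2 * vp1 + vp2) ** 2 + 0.25 * (3 * v0 - 4 * vp1 + vp2) ** 2
    # Nonlinear weights derived from the optimal linear weights 1/10, 6/10, 3/10.
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    return (a0 * q0 + a1 * q1 + a2 * q2) / (a0 + a1 + a2)
```

On smooth data such as `[1, 2, 3, 4, 5]` the result is 3.5, the exact interface value. Near a jump such as `[0, 0, 0, 1, 1]`, almost all of the weight goes to the smooth left stencil and the result stays close to 0, which avoids oscillations at the strong shocks found in high-Mach-number jets.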

12.
This paper presents an optimized low-dissipation monotonicity-preserving (MP-LD) scheme for numerical simulations of high-speed turbulent flows with shock waves. By using the bandwidth dissipation optimization method (BDOM), the linear dissipation of the original MP scheme of Suresh and Huynh (J. Comput. Phys. 136, 83–99, 1997) is significantly reduced in the newly developed MP-LD scheme. Meanwhile, to reduce the nonlinear dissipation and errors, the shock sensor of Ducros et al. (J. Comput. Phys. 152, 517–549, 1999) is adopted to avoid the activation of the MP limiter in regions away from shock waves. Simulations of turbulent flows with and without shock waves indicate that, in comparison with the original MP scheme, the MP-LD scheme has the same capability in capturing shock waves but a better performance in resolving small-scale turbulence fluctuations without introducing excessive numerical dissipation, which implies the MP-LD scheme is a valuable tool for the direct numerical simulation and large eddy simulation of high-speed turbulent flows with shock waves.
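The Ducros sensor compares dilatation with vorticity: theta = (div u)^2 / ((div u)^2 + |curl u|^2 + eps). It is close to 1 in compression-dominated regions such as shocks and close to 0 in vorticity-dominated turbulence. The 2D sketch below evaluates it with central differences on a uniform grid. How the paper thresholds theta to switch the MP limiter is not stated in the abstract, so no threshold is assumed here.

```python
def ducros_sensor(dudx, dudy, dvdx, dvdy, eps=1e-30):
    """2D Ducros sensor from the velocity gradient components."""
    div = dudx + dvdy
    vort = dvdx - dudy  # z-component of vorticity in 2D
    return div * div / (div * div + vort * vort + eps)


def ducros_field(u, v, dx, dy, eps=1e-30):
    """Sensor at interior points of 2D velocity arrays indexed as u[j][i], v[j][i]."""
    ny, nx = len(u), len(u[0])
    theta = [[0.0] * nx for _ in range(ny)]  # boundary points left at 0
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Second-order central differences.
            dudx = (u[j][i + 1] - u[j][i - 1]) / (2 * dx)
            dudy = (u[j + 1][i] - u[j - 1][i]) / (2 * dy)
            dvdx = (v[j][i + 1] - v[j][i - 1]) / (2 * dx)
            dvdy = (v[j + 1][i] - v[j - 1][i]) / (2 * dy)
            theta[j][i] = ducros_sensor(dudx, dudy, dvdx, dvdy, eps)
    return theta
```

For a rigid rotation (u = -y, v = x) the sensor is exactly 0, and for a uniform compression (u = -x, v = -y) it is essentially 1. A limiter gated on theta therefore stays off in vortical turbulence and turns on near shocks.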

13.
While hype around the benefits of ‘cloud computing’ increases, challenges in maintaining data security and data privacy have also been recognised as significant vulnerabilities (Ristenpart et al. in Proceedings of the 14th ACM conference on computer and communications security, pp 103–115, 2009; Pearson in CLOUD’09, pp 44–52, 2009; Vouk in J Comput Inf Technol 4:235–246, 2008). These vulnerabilities raise a range of questions about the capacity of organisations that rely on cloud solutions to manage risk effectively. This is particularly the case as the threats faced by organisations have moved increasingly away from indiscriminate malware towards more targeted cyber-attack tools. From a forensic computing perspective, it has also been recognised that ‘cloud solutions’ pose additional challenges for forensic computing specialists, including discoverability and chain of evidence (Ruan et al. in Adv Digital Forensics VII:35–46, 2011; Reilly et al. in Int J Multimedia Image Process 1:26–34, 2011). However, to date there has been little consideration of how the differences between indiscriminate malware and targeted cyber-attack tools further problematize the capacity of organisations to manage risk. This paper considers these risks and differentiates between the technical, legal and ethical dilemmas they pose. It also highlights the need for organisations to be aware of these issues when deciding to move to cloud solutions.

14.
Given a graph G=(V,E), a vertex v of G is a median vertex if it minimizes the sum of the distances to all other vertices of G. The median problem consists of finding the set of all median vertices of G. In this note, we present self-stabilizing algorithms for the median problem in partial rectangular grids and related graphs. Our algorithms are based on the fact that partial rectangular grids can be isometrically embedded into the Cartesian product of two trees, to which we apply the algorithms proposed by Antonoiu and Srimani (J. Comput. Syst. Sci. 58:215–221, 1999) and Bruell et al. (SIAM J. Comput. 29:600–614, 1999) for computing the medians of trees. We then extend our approach from partial rectangular grids to a more general class of plane quadrangulations. We also show that the characterization of medians of trees given by Gerstel and Zaks (Networks 24:23–29, 1994) extends to cube-free median graphs, a class of graphs that includes these quadrangulations.
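The definition of a median vertex can be checked directly with a centralized brute-force computation: run breadth-first search from every vertex, sum the distances, and keep the minimizers. This sketch only illustrates the definition on small unweighted connected graphs. It is not the self-stabilizing, distributed algorithm of the paper, which avoids this all-pairs work by exploiting the tree-product embedding.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency lists of an undirected, connected graph


def bfs_distances(g: Graph, source: int) -> Dict[int, int]:
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def median_vertices(g: Graph) -> Set[int]:
    """All vertices minimizing the total distance to the other vertices."""
    totals = {v: sum(bfs_distances(g, v).values()) for v in g}
    best = min(totals.values())
    return {v for v, t in totals.items() if t == best}
```

On a path with five vertices the median is the middle vertex, on a path with four vertices both central vertices are medians, and on a 3 x 3 grid (a simple partial rectangular grid) the centre is the unique median.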

15.
16.
In the past decades, Model Order Reduction (MOR) has demonstrated its robustness and wide applicability for simulating large-scale mathematical models in engineering and the sciences. Recently, MOR has been developed intensively for increasingly complex dynamical systems. MOR has found wide application not only in simulation but also in optimization and control. In this survey paper, we review some popular MOR methods for linear and nonlinear large-scale dynamical systems, mainly used in electrical and control engineering, in computational electromagnetics, and in the design of micro- and nano-electro-mechanical systems. This complements recent surveys on generating reduced-order models for parameter-dependent problems (Benner et al. 2013; Boyaval et al. in Arch Comput Methods Eng 17(4):435–454, 2010; Rozza et al. in Arch Comput Methods Eng 15(3):229–275, 2008), which we do not consider here. Besides reviewing existing methods and the computational techniques needed to implement them, we discuss open issues and propose some new results.

17.
We present a new kind of high-order reconstruction operator of polynomial type, which is used in combination with the scheme presented in Castro et al. (J. Sci. Comput. 39:67–114, 2009) for solving nonconservative hyperbolic systems. The scheme is implemented on Graphics Processing Units (GPUs), achieving a substantial speedup with respect to conventional CPUs. As an application, the two-dimensional shallow water equations with a geometric source term due to the bottom slope are considered.

18.
In this article, we are interested in simulating phase transition in compressible flows using the isothermal Euler system closed by the van der Waals model. We formulate the problem as a hyperbolic system with a source term located at the interface between liquid and vapour. The numerical scheme is based on (Abgrall and Saurel, J. Comput. Phys. 186(2):361–396, 2003; Le Métayer et al., J. Comput. Phys. 205(2):567–610, 2005). Compared with previous discretizations of the van der Waals system, the novelty of this algorithm is that it is fully conservative. Its Godunov-type formulation allows an easy implementation on multi-dimensional unstructured meshes.
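The difficulty with a van der Waals closure shows up in the pressure law itself. In reduced units (critical point at rho = T = p = 1) it reads p = 8 rho T / (3 - rho) - 3 rho^2. For the isothermal Euler system the characteristic speeds are u ± sqrt(dp/drho), so the system is hyperbolic only where dp/drho > 0. Below the critical temperature there is a spinodal density interval where dp/drho < 0. This is presumably what motivates the paper's reformulation with an interface source term. The sketch only evaluates these quantities and is not the authors' scheme.

```python
def vdw_pressure(rho: float, T: float) -> float:
    """Reduced van der Waals pressure; valid for 0 <= rho < 3."""
    return 8.0 * rho * T / (3.0 - rho) - 3.0 * rho * rho


def vdw_dp_drho(rho: float, T: float) -> float:
    """dp/drho at fixed T: the squared isothermal sound speed where it is positive."""
    return 24.0 * T / (3.0 - rho) ** 2 - 6.0 * rho


def is_hyperbolic(rho: float, T: float) -> bool:
    """True where u +/- sqrt(dp/drho) are real and distinct (outside the spinodal zone)."""
    return vdw_dp_drho(rho, T) > 0.0
```

At the critical point the pressure is 1 and dp/drho vanishes. At T = 0.9, the state rho = 1 falls inside the spinodal zone, while a vapour-like state (rho = 0.2) and a liquid-like state (rho = 2.5) are both hyperbolic.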

19.
20.
Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible-flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm for solving the incompressible fluid flow equations. Kernels are placed in different memory spaces of the GPU depending on their arithmetic intensity, and this memory-hierarchy-specific implementation yields significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and compare single- and double-precision computational performance on the GPU. Using a quad-GPU platform for single-precision computations, we observe a speedup of two orders of magnitude relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective, small-footprint parallel computing platform that substantially accelerates computational fluid dynamics (CFD) simulations.
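In a projection method, the most expensive step is usually the pressure Poisson solve, lap(p) = f. The abstract does not say which Poisson solver the authors' kernels use. The sketch below assumes plain Jacobi iteration with the 5-point Laplacian on a uniform grid, where the boundary values of p are held fixed. Every interior update in a sweep depends only on the previous iterate, so a CUDA kernel can assign one thread per grid point. With several GPUs, each would own a block of rows and exchange halo rows after every sweep.

```python
def jacobi_sweep(p, f, h):
    """One Jacobi sweep for lap(p) = f (5-point stencil, spacing h, fixed boundary values)."""
    ny, nx = len(p), len(p[0])
    new = [row[:] for row in p]  # updates read only the old iterate
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            new[j][i] = 0.25 * (p[j][i - 1] + p[j][i + 1] + p[j - 1][i] + p[j + 1][i]
                                - h * h * f[j][i])
    return new


def residual_max(p, f, h):
    """Maximum absolute residual |lap(p) - f| over the interior points."""
    ny, nx = len(p), len(p[0])
    r = 0.0
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            lap = (p[j][i - 1] + p[j][i + 1] + p[j - 1][i] + p[j + 1][i]
                   - 4.0 * p[j][i]) / (h * h)
            r = max(r, abs(lap - f[j][i]))
    return r
```

Jacobi converges slowly on fine grids, so production GPU solvers often use faster schemes such as multigrid or preconditioned conjugate gradients. Jacobi is used here only because its data-parallel structure is easy to see.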


Copyright © Beijing Qinyun Technology Development Co., Ltd.    Beijing ICP Registration No. 09084417-23
